Message ID | 5264EA79.3050505@vmware.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
op 21-10-13 10:48, Thomas Hellstrom schreef: > Hi! > > As discussed previously the current locking order in TTM of these locks is bo::reserve -> vm::mmap_sem. This leads to a hack in > the TTM fault() handle to try and revert the locking order. If a tryreserve failed, we tried to have the vm code release the mmap_sem() and then schedule, to give the holder of bo::reserve a chance to release the lock. This solution is no longer legal, since we've been more or less kindly asked to remove the set_need_resched() call. > > Maarten has proposed to invert the locking order. I've previously said I had no strong preference. The current locking order dates back from the time when TTM wasn't using unmap_mapping_range() but walked the page tables itself, updating PTEs as needed. Furthermore it was needed for user bos that used get_user_pages() in the TTM populate and swap-in methods. User-bos were removed some time ago but I'm looking at re-adding them. They would suite the VMware model of cached-only pages very well. I see uses both in the gallium API, XA's DMA functionality and openCL. > > We would then need a somewhat nicer way to invert the locking order. I've attached a solution that ups the mmap_sem and then reserves, but due to how the fault API is done, we then need to release the reserve and retry the fault. This of course opens up for starvation, but I don't think starvation at this point is very likely: One thread being refused to write or read from a buffer object because the GPU is continously busy with it. If this *would* become a problem, it's probably possible to modify the fault code to allow us to hold locks until the retried fault, but that would be a bit invasive, since it touches the arch code.... > > Basically I'm proposing to keep the current locking order. I'm not sure why we have to worry about mmap_sem lock being taken before bo::reserve. If we already hold mmap_sem, no extra locking is needed for get_user_pages. Releasing it is a bit silly. I think we should keep mmap_sem as outer lock, and have bo::reserve as inner, even if it might complicate support for user-bo's. I'm not sure what you can do with user-bo's that can't be done by allocating the same bo from kernel first and map + populate it. ~Maarten
On Mon, Oct 21, 2013 at 10:48 AM, Thomas Hellstrom <thellstrom@vmware.com> wrote: > Hi! > > As discussed previously the current locking order in TTM of these locks is > bo::reserve -> vm::mmap_sem. This leads to a hack in > the TTM fault() handle to try and revert the locking order. If a tryreserve > failed, we tried to have the vm code release the mmap_sem() and then > schedule, to give the holder of bo::reserve a chance to release the lock. > This solution is no longer legal, since we've been more or less kindly asked > to remove the set_need_resched() call. > > Maarten has proposed to invert the locking order. I've previously said I had > no strong preference. The current locking order dates back from the time > when TTM wasn't using unmap_mapping_range() but walked the page tables > itself, updating PTEs as needed. Furthermore it was needed for user bos that > used get_user_pages() in the TTM populate and swap-in methods. User-bos were > removed some time ago but I'm looking at re-adding them. They would suite > the VMware model of cached-only pages very well. I see uses both in the > gallium API, XA's DMA functionality and openCL. In particular OpenCL's clEnqueueWriteBuffer[1] function and friends which can move quite a bit of data in and out of VRAM. > > We would then need a somewhat nicer way to invert the locking order. I've > attached a solution that ups the mmap_sem and then reserves, but due to how > the fault API is done, we then need to release the reserve and retry the > fault. This of course opens up for starvation, but I don't think starvation > at this point is very likely: One thread being refused to write or read from > a buffer object because the GPU is continously busy with it. If this *would* > become a problem, it's probably possible to modify the fault code to allow > us to hold locks until the retried fault, but that would be a bit invasive, > since it touches the arch code.... > > Basically I'm proposing to keep the current locking order. Cheers, Jakob. [1] http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/clEnqueueWriteBuffer.html
On 10/21/2013 11:01 AM, Maarten Lankhorst wrote: > op 21-10-13 10:48, Thomas Hellstrom schreef: >> Hi! >> >> As discussed previously the current locking order in TTM of these locks is bo::reserve -> vm::mmap_sem. This leads to a hack in >> the TTM fault() handle to try and revert the locking order. If a tryreserve failed, we tried to have the vm code release the mmap_sem() and then schedule, to give the holder of bo::reserve a chance to release the lock. This solution is no longer legal, since we've been more or less kindly asked to remove the set_need_resched() call. >> >> Maarten has proposed to invert the locking order. I've previously said I had no strong preference. The current locking order dates back from the time when TTM wasn't using unmap_mapping_range() but walked the page tables itself, updating PTEs as needed. Furthermore it was needed for user bos that used get_user_pages() in the TTM populate and swap-in methods. User-bos were removed some time ago but I'm looking at re-adding them. They would suite the VMware model of cached-only pages very well. I see uses both in the gallium API, XA's DMA functionality and openCL. >> >> We would then need a somewhat nicer way to invert the locking order. I've attached a solution that ups the mmap_sem and then reserves, but due to how the fault API is done, we then need to release the reserve and retry the fault. This of course opens up for starvation, but I don't think starvation at this point is very likely: One thread being refused to write or read from a buffer object because the GPU is continously busy with it. If this *would* become a problem, it's probably possible to modify the fault code to allow us to hold locks until the retried fault, but that would be a bit invasive, since it touches the arch code.... >> >> Basically I'm proposing to keep the current locking order. > I'm not sure why we have to worry about mmap_sem lock being taken before bo::reserve. If we already hold mmap_sem, > no extra locking is needed for get_user_pages. Typically, they are populated outside of fault, as part of execbuf, where we don't hold and don't want to hold mmap_sem(). In fact, user bo's should not be remappable through the TTM VM system. Anyway, we need to grab the mmap_sem inside ttm_populate for user buffers. > Releasing it is a bit silly. I think we should keep mmap_sem as outer > lock, and have bo::reserve as inner, even if it might complicate support for user-bo's. I'm not sure what you can do > with user-bo's that can't be done by allocating the same bo from kernel first and map + populate it. > > ~Maarten Using DMA API analogy, user BOs correspond to using streaming DMA whereas normal BOs correspond to alloced DMA memory buffers. We boost performance and save resources. Thanks, /Thomas > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel
op 21-10-13 11:37, Thomas Hellstrom schreef: > On 10/21/2013 11:01 AM, Maarten Lankhorst wrote: >> op 21-10-13 10:48, Thomas Hellstrom schreef: >>> Hi! >>> >>> As discussed previously the current locking order in TTM of these locks is bo::reserve -> vm::mmap_sem. This leads to a hack in >>> the TTM fault() handle to try and revert the locking order. If a tryreserve failed, we tried to have the vm code release the mmap_sem() and then schedule, to give the holder of bo::reserve a chance to release the lock. This solution is no longer legal, since we've been more or less kindly asked to remove the set_need_resched() call. >>> >>> Maarten has proposed to invert the locking order. I've previously said I had no strong preference. The current locking order dates back from the time when TTM wasn't using unmap_mapping_range() but walked the page tables itself, updating PTEs as needed. Furthermore it was needed for user bos that used get_user_pages() in the TTM populate and swap-in methods. User-bos were removed some time ago but I'm looking at re-adding them. They would suite the VMware model of cached-only pages very well. I see uses both in the gallium API, XA's DMA functionality and openCL. >>> >>> We would then need a somewhat nicer way to invert the locking order. I've attached a solution that ups the mmap_sem and then reserves, but due to how the fault API is done, we then need to release the reserve and retry the fault. This of course opens up for starvation, but I don't think starvation at this point is very likely: One thread being refused to write or read from a buffer object because the GPU is continously busy with it. If this *would* become a problem, it's probably possible to modify the fault code to allow us to hold locks until the retried fault, but that would be a bit invasive, since it touches the arch code.... >>> >>> Basically I'm proposing to keep the current locking order. >> I'm not sure why we have to worry about mmap_sem lock being taken before bo::reserve. If we already hold mmap_sem, >> no extra locking is needed for get_user_pages. > > Typically, they are populated outside of fault, as part of execbuf, where we don't hold and don't want to hold mmap_sem(). In fact, > user bo's should not be remappable through the TTM VM system. Anyway, we need to grab the mmap_sem inside ttm_populate for user buffers. If we don't allow mmapping user bo's through TTM, we can use special lockdep annotation when user-bo's are used. Normal bo's would have mmap_sem outer lock, bo::reserve inner lock, while those bo's would have the other way around. This might complicate validation a little, since you would have to reserve and validate all user-bo's before any normal bo's are reserved. But since this is meant to be a vmwgfx specific optimization I think it might be worth it. >> Releasing it is a bit silly. I think we should keep mmap_sem as outer >> lock, and have bo::reserve as inner, even if it might complicate support for user-bo's. I'm not sure what you can do >> with user-bo's that can't be done by allocating the same bo from kernel first and map + populate it. >> >> ~Maarten > Using DMA API analogy, user BOs correspond to using streaming DMA whereas normal BOs correspond to alloced DMA memory buffers. > We boost performance and save resources. Yeah but it's vmwgfx specific. Nouveau and radeon have dedicated copy engines that can be used. Flushing the vm and stalling is probably more expensive than performing a memcpy.
On 10/21/2013 11:48 AM, Maarten Lankhorst wrote: > op 21-10-13 11:37, Thomas Hellstrom schreef: >> On 10/21/2013 11:01 AM, Maarten Lankhorst wrote: >>> op 21-10-13 10:48, Thomas Hellstrom schreef: >>>> Hi! >>>> >>>> As discussed previously the current locking order in TTM of these locks is bo::reserve -> vm::mmap_sem. This leads to a hack in >>>> the TTM fault() handle to try and revert the locking order. If a tryreserve failed, we tried to have the vm code release the mmap_sem() and then schedule, to give the holder of bo::reserve a chance to release the lock. This solution is no longer legal, since we've been more or less kindly asked to remove the set_need_resched() call. >>>> >>>> Maarten has proposed to invert the locking order. I've previously said I had no strong preference. The current locking order dates back from the time when TTM wasn't using unmap_mapping_range() but walked the page tables itself, updating PTEs as needed. Furthermore it was needed for user bos that used get_user_pages() in the TTM populate and swap-in methods. User-bos were removed some time ago but I'm looking at re-adding them. They would suite the VMware model of cached-only pages very well. I see uses both in the gallium API, XA's DMA functionality and openCL. >>>> >>>> We would then need a somewhat nicer way to invert the locking order. I've attached a solution that ups the mmap_sem and then reserves, but due to how the fault API is done, we then need to release the reserve and retry the fault. This of course opens up for starvation, but I don't think starvation at this point is very likely: One thread being refused to write or read from a buffer object because the GPU is continously busy with it. If this *would* become a problem, it's probably possible to modify the fault code to allow us to hold locks until the retried fault, but that would be a bit invasive, since it touches the arch code.... >>>> >>>> Basically I'm proposing to keep the current locking order. >>> I'm not sure why we have to worry about mmap_sem lock being taken before bo::reserve. If we already hold mmap_sem, >>> no extra locking is needed for get_user_pages. >> Typically, they are populated outside of fault, as part of execbuf, where we don't hold and don't want to hold mmap_sem(). In fact, >> user bo's should not be remappable through the TTM VM system. Anyway, we need to grab the mmap_sem inside ttm_populate for user buffers. > If we don't allow mmapping user bo's through TTM, we can use special lockdep annotation when user-bo's are used. Normal bo's would have > mmap_sem outer lock, bo::reserve inner lock, while those bo's would have the other way around. > > This might complicate validation a little, since you would have to reserve and validate all user-bo's before any normal bo's are reserved. But since this > is meant to be a vmwgfx specific optimization I think it might be worth it. Would that work (lockdep-wise) with user BO swapout as part of a normal BO validation? During user BO swapout, we don't need mmap_sem, but the BO needs to be reserved. But I guess it would work, since we use tryreserve when walking the LRU lists? > >>> Releasing it is a bit silly. I think we should keep mmap_sem as outer >>> lock, and have bo::reserve as inner, even if it might complicate support for user-bo's. I'm not sure what you can do >>> with user-bo's that can't be done by allocating the same bo from kernel first and map + populate it. >>> >>> ~Maarten >> Using DMA API analogy, user BOs correspond to using streaming DMA whereas normal BOs correspond to alloced DMA memory buffers. >> We boost performance and save resources. > Yeah but it's vmwgfx specific. Nouveau and radeon have dedicated copy engines that can be used. Flushing the vm and stalling is probably more > expensive than performing a memcpy In the end, I'm not sure it will be vmwgfx specific once there is a working example, and there are user-space APIs that will benefit from it. There are other examples out there today that uses streaming DMA to feed the DMA engines, although not through TTM, see for example via_dmablit.c. Also if we need a separate locking order for User BOs, what would be the big benefit of having the locking order mmap_sem()->bo_reserve() ? /Thomas
op 21-10-13 12:10, Thomas Hellstrom schreef: > On 10/21/2013 11:48 AM, Maarten Lankhorst wrote: >> op 21-10-13 11:37, Thomas Hellstrom schreef: >>> On 10/21/2013 11:01 AM, Maarten Lankhorst wrote: >>>> op 21-10-13 10:48, Thomas Hellstrom schreef: >>>>> Hi! >>>>> >>>>> As discussed previously the current locking order in TTM of these locks is bo::reserve -> vm::mmap_sem. This leads to a hack in >>>>> the TTM fault() handle to try and revert the locking order. If a tryreserve failed, we tried to have the vm code release the mmap_sem() and then schedule, to give the holder of bo::reserve a chance to release the lock. This solution is no longer legal, since we've been more or less kindly asked to remove the set_need_resched() call. >>>>> >>>>> Maarten has proposed to invert the locking order. I've previously said I had no strong preference. The current locking order dates back from the time when TTM wasn't using unmap_mapping_range() but walked the page tables itself, updating PTEs as needed. Furthermore it was needed for user bos that used get_user_pages() in the TTM populate and swap-in methods. User-bos were removed some time ago but I'm looking at re-adding them. They would suite the VMware model of cached-only pages very well. I see uses both in the gallium API, XA's DMA functionality and openCL. >>>>> >>>>> We would then need a somewhat nicer way to invert the locking order. I've attached a solution that ups the mmap_sem and then reserves, but due to how the fault API is done, we then need to release the reserve and retry the fault. This of course opens up for starvation, but I don't think starvation at this point is very likely: One thread being refused to write or read from a buffer object because the GPU is continously busy with it. If this *would* become a problem, it's probably possible to modify the fault code to allow us to hold locks until the retried fault, but that would be a bit invasive, since it touches the arch code.... >>>>> >>>>> Basically I'm proposing to keep the current locking order. >>>> I'm not sure why we have to worry about mmap_sem lock being taken before bo::reserve. If we already hold mmap_sem, >>>> no extra locking is needed for get_user_pages. >>> Typically, they are populated outside of fault, as part of execbuf, where we don't hold and don't want to hold mmap_sem(). In fact, >>> user bo's should not be remappable through the TTM VM system. Anyway, we need to grab the mmap_sem inside ttm_populate for user buffers. >> If we don't allow mmapping user bo's through TTM, we can use special lockdep annotation when user-bo's are used. Normal bo's would have >> mmap_sem outer lock, bo::reserve inner lock, while those bo's would have the other way around. >> >> This might complicate validation a little, since you would have to reserve and validate all user-bo's before any normal bo's are reserved. But since this >> is meant to be a vmwgfx specific optimization I think it might be worth it. > > Would that work (lockdep-wise) with user BO swapout as part of a normal BO validation? During user BO swapout, we don't need mmap_sem, but the BO needs to be reserved. But I guess it would work, since we use tryreserve when walking the LRU lists? Correct. >> >>>> Releasing it is a bit silly. I think we should keep mmap_sem as outer >>>> lock, and have bo::reserve as inner, even if it might complicate support for user-bo's. I'm not sure what you can do >>>> with user-bo's that can't be done by allocating the same bo from kernel first and map + populate it. >>>> >>>> ~Maarten >>> Using DMA API analogy, user BOs correspond to using streaming DMA whereas normal BOs correspond to alloced DMA memory buffers. >>> We boost performance and save resources. >> Yeah but it's vmwgfx specific. Nouveau and radeon have dedicated copy engines that can be used. Flushing the vm and stalling is probably more >> expensive than performing a memcpy > In the end, I'm not sure it will be vmwgfx specific once there is a working example, and there are user-space APIs that will benefit from it. There are other examples out there today that uses streaming DMA to feed the DMA engines, although not through TTM, see for example via_dmablit.c. > > Also if we need a separate locking order for User BOs, what would be the big benefit of having the locking order mmap_sem()->bo_reserve() ? Deterministic locking. I fear about livelocks that could happen otherwise with the right timing. Especially but not exclusively on -rt kernels. ~Maarten
On 10/21/2013 12:24 PM, Maarten Lankhorst wrote: > op 21-10-13 12:10, Thomas Hellstrom schreef: >> On 10/21/2013 11:48 AM, Maarten Lankhorst wrote: >>> op 21-10-13 11:37, Thomas Hellstrom schreef: >>>> On 10/21/2013 11:01 AM, Maarten Lankhorst wrote: >>>>> op 21-10-13 10:48, Thomas Hellstrom schreef: >>>>>> Hi! >>>>>> >>>>>> As discussed previously the current locking order in TTM of these locks is bo::reserve -> vm::mmap_sem. This leads to a hack in >>>>>> the TTM fault() handle to try and revert the locking order. If a tryreserve failed, we tried to have the vm code release the mmap_sem() and then schedule, to give the holder of bo::reserve a chance to release the lock. This solution is no longer legal, since we've been more or less kindly asked to remove the set_need_resched() call. >>>>>> >>>>>> Maarten has proposed to invert the locking order. I've previously said I had no strong preference. The current locking order dates back from the time when TTM wasn't using unmap_mapping_range() but walked the page tables itself, updating PTEs as needed. Furthermore it was needed for user bos that used get_user_pages() in the TTM populate and swap-in methods. User-bos were removed some time ago but I'm looking at re-adding them. They would suite the VMware model of cached-only pages very well. I see uses both in the gallium API, XA's DMA functionality and openCL. >>>>>> >>>>>> We would then need a somewhat nicer way to invert the locking order. I've attached a solution that ups the mmap_sem and then reserves, but due to how the fault API is done, we then need to release the reserve and retry the fault. This of course opens up for starvation, but I don't think starvation at this point is very likely: One thread being refused to write or read from a buffer object because the GPU is continously busy with it. If this *would* become a problem, it's probably possible to modify the fault code to allow us to hold locks until the retried fault, but that would be a bit invasive, since it touches the arch code.... >>>>>> >>>>>> Basically I'm proposing to keep the current locking order. >>>>> I'm not sure why we have to worry about mmap_sem lock being taken before bo::reserve. If we already hold mmap_sem, >>>>> no extra locking is needed for get_user_pages. >>>> Typically, they are populated outside of fault, as part of execbuf, where we don't hold and don't want to hold mmap_sem(). In fact, >>>> user bo's should not be remappable through the TTM VM system. Anyway, we need to grab the mmap_sem inside ttm_populate for user buffers. >>> If we don't allow mmapping user bo's through TTM, we can use special lockdep annotation when user-bo's are used. Normal bo's would have >>> mmap_sem outer lock, bo::reserve inner lock, while those bo's would have the other way around. >>> >>> This might complicate validation a little, since you would have to reserve and validate all user-bo's before any normal bo's are reserved. But since this >>> is meant to be a vmwgfx specific optimization I think it might be worth it. >> Would that work (lockdep-wise) with user BO swapout as part of a normal BO validation? During user BO swapout, we don't need mmap_sem, but the BO needs to be reserved. But I guess it would work, since we use tryreserve when walking the LRU lists? > Correct. > >>>>> Releasing it is a bit silly. I think we should keep mmap_sem as outer >>>>> lock, and have bo::reserve as inner, even if it might complicate support for user-bo's. I'm not sure what you can do >>>>> with user-bo's that can't be done by allocating the same bo from kernel first and map + populate it. >>>>> >>>>> ~Maarten >>>> Using DMA API analogy, user BOs correspond to using streaming DMA whereas normal BOs correspond to alloced DMA memory buffers. >>>> We boost performance and save resources. >>> Yeah but it's vmwgfx specific. Nouveau and radeon have dedicated copy engines that can be used. Flushing the vm and stalling is probably more >>> expensive than performing a memcpy >> In the end, I'm not sure it will be vmwgfx specific once there is a working example, and there are user-space APIs that will benefit from it. There are other examples out there today that uses streaming DMA to feed the DMA engines, although not through TTM, see for example via_dmablit.c. >> >> Also if we need a separate locking order for User BOs, what would be the big benefit of having the locking order mmap_sem()->bo_reserve() ? > Deterministic locking. I fear about livelocks that could happen otherwise with the right timing. Especially but not exclusively on -rt kernels. But livelocks wouldn't be an issue anymore since we use a waiting reserve in the retry path, right? The only issue we might theoretically be facing is starvation in the fault path. With a two-step validation scheme I think we're up to real issues, like bouncing ordinary BOs in and out of swap during a single execbuf... /Thomas > > ~Maarten
op 21-10-13 13:10, Thomas Hellstrom schreef: > On 10/21/2013 12:24 PM, Maarten Lankhorst wrote: >> op 21-10-13 12:10, Thomas Hellstrom schreef: >>> On 10/21/2013 11:48 AM, Maarten Lankhorst wrote: >>>> op 21-10-13 11:37, Thomas Hellstrom schreef: >>>>> On 10/21/2013 11:01 AM, Maarten Lankhorst wrote: >>>>>> op 21-10-13 10:48, Thomas Hellstrom schreef: >>>>>>> Hi! >>>>>>> >>>>>>> As discussed previously the current locking order in TTM of these locks is bo::reserve -> vm::mmap_sem. This leads to a hack in >>>>>>> the TTM fault() handle to try and revert the locking order. If a tryreserve failed, we tried to have the vm code release the mmap_sem() and then schedule, to give the holder of bo::reserve a chance to release the lock. This solution is no longer legal, since we've been more or less kindly asked to remove the set_need_resched() call. >>>>>>> >>>>>>> Maarten has proposed to invert the locking order. I've previously said I had no strong preference. The current locking order dates back from the time when TTM wasn't using unmap_mapping_range() but walked the page tables itself, updating PTEs as needed. Furthermore it was needed for user bos that used get_user_pages() in the TTM populate and swap-in methods. User-bos were removed some time ago but I'm looking at re-adding them. They would suite the VMware model of cached-only pages very well. I see uses both in the gallium API, XA's DMA functionality and openCL. >>>>>>> >>>>>>> We would then need a somewhat nicer way to invert the locking order. I've attached a solution that ups the mmap_sem and then reserves, but due to how the fault API is done, we then need to release the reserve and retry the fault. This of course opens up for starvation, but I don't think starvation at this point is very likely: One thread being refused to write or read from a buffer object because the GPU is continously busy with it. If this *would* become a problem, it's probably possible to modify the fault code to allow us to hold locks until the retried fault, but that would be a bit invasive, since it touches the arch code.... >>>>>>> >>>>>>> Basically I'm proposing to keep the current locking order. >>>>>> I'm not sure why we have to worry about mmap_sem lock being taken before bo::reserve. If we already hold mmap_sem, >>>>>> no extra locking is needed for get_user_pages. >>>>> Typically, they are populated outside of fault, as part of execbuf, where we don't hold and don't want to hold mmap_sem(). In fact, >>>>> user bo's should not be remappable through the TTM VM system. Anyway, we need to grab the mmap_sem inside ttm_populate for user buffers. >>>> If we don't allow mmapping user bo's through TTM, we can use special lockdep annotation when user-bo's are used. Normal bo's would have >>>> mmap_sem outer lock, bo::reserve inner lock, while those bo's would have the other way around. >>>> >>>> This might complicate validation a little, since you would have to reserve and validate all user-bo's before any normal bo's are reserved. But since this >>>> is meant to be a vmwgfx specific optimization I think it might be worth it. >>> Would that work (lockdep-wise) with user BO swapout as part of a normal BO validation? During user BO swapout, we don't need mmap_sem, but the BO needs to be reserved. But I guess it would work, since we use tryreserve when walking the LRU lists? >> Correct. >> >>>>>> Releasing it is a bit silly. I think we should keep mmap_sem as outer >>>>>> lock, and have bo::reserve as inner, even if it might complicate support for user-bo's. I'm not sure what you can do >>>>>> with user-bo's that can't be done by allocating the same bo from kernel first and map + populate it. >>>>>> >>>>>> ~Maarten >>>>> Using DMA API analogy, user BOs correspond to using streaming DMA whereas normal BOs correspond to alloced DMA memory buffers. >>>>> We boost performance and save resources. >>>> Yeah but it's vmwgfx specific. Nouveau and radeon have dedicated copy engines that can be used. Flushing the vm and stalling is probably more >>>> expensive than performing a memcpy >>> In the end, I'm not sure it will be vmwgfx specific once there is a working example, and there are user-space APIs that will benefit from it. There are other examples out there today that uses streaming DMA to feed the DMA engines, although not through TTM, see for example via_dmablit.c. >>> >>> Also if we need a separate locking order for User BOs, what would be the big benefit of having the locking order mmap_sem()->bo_reserve() ? >> Deterministic locking. I fear about livelocks that could happen otherwise with the right timing. Especially but not exclusively on -rt kernels. > But livelocks wouldn't be an issue anymore since we use a waiting reserve in the retry path, right? The only issue we might theoretically be facing is starvation in the fault path. > > With a two-step validation scheme I think we're up to real issues, like bouncing ordinary BOs in and out of swap during a single execbuf... I would need to see some code, but I think it can be avoided by having a slowpath similar to i915 if it fails. Anyway have 2 threads fault on the same bo in different offsets, so fault handler is called twice on the same bo. You could end up a loop, first one grabbed the bo lock without mmap_sem, second one does trylock, which fails then unlocks mmap_sem, and acquires the bo lock again. At which point the first thread acquired mmap_sem and does a trylock. So the same thing can theoretically happen infinitely over and over again. ~Maarten
On 10/21/2013 03:21 PM, Maarten Lankhorst wrote: > op 21-10-13 13:10, Thomas Hellstrom schreef: >> On 10/21/2013 12:24 PM, Maarten Lankhorst wrote: >>> op 21-10-13 12:10, Thomas Hellstrom schreef: >>>> On 10/21/2013 11:48 AM, Maarten Lankhorst wrote: >>>>> op 21-10-13 11:37, Thomas Hellstrom schreef: >>>>>> On 10/21/2013 11:01 AM, Maarten Lankhorst wrote: >>>>>>> op 21-10-13 10:48, Thomas Hellstrom schreef: >>>>>>>> Hi! >>>>>>>> >>>>>>>> As discussed previously the current locking order in TTM of these locks is bo::reserve -> vm::mmap_sem. This leads to a hack in >>>>>>>> the TTM fault() handle to try and revert the locking order. If a tryreserve failed, we tried to have the vm code release the mmap_sem() and then schedule, to give the holder of bo::reserve a chance to release the lock. This solution is no longer legal, since we've been more or less kindly asked to remove the set_need_resched() call. >>>>>>>> >>>>>>>> Maarten has proposed to invert the locking order. I've previously said I had no strong preference. The current locking order dates back from the time when TTM wasn't using unmap_mapping_range() but walked the page tables itself, updating PTEs as needed. Furthermore it was needed for user bos that used get_user_pages() in the TTM populate and swap-in methods. User-bos were removed some time ago but I'm looking at re-adding them. They would suite the VMware model of cached-only pages very well. I see uses both in the gallium API, XA's DMA functionality and openCL. >>>>>>>> >>>>>>>> We would then need a somewhat nicer way to invert the locking order. I've attached a solution that ups the mmap_sem and then reserves, but due to how the fault API is done, we then need to release the reserve and retry the fault. This of course opens up for starvation, but I don't think starvation at this point is very likely: One thread being refused to write or read from a buffer object because the GPU is continously busy with it. If this *would* become a problem, it's probably possible to modify the fault code to allow us to hold locks until the retried fault, but that would be a bit invasive, since it touches the arch code.... >>>>>>>> >>>>>>>> Basically I'm proposing to keep the current locking order. >>>>>>> I'm not sure why we have to worry about mmap_sem lock being taken before bo::reserve. If we already hold mmap_sem, >>>>>>> no extra locking is needed for get_user_pages. >>>>>> Typically, they are populated outside of fault, as part of execbuf, where we don't hold and don't want to hold mmap_sem(). In fact, >>>>>> user bo's should not be remappable through the TTM VM system. Anyway, we need to grab the mmap_sem inside ttm_populate for user buffers. >>>>> If we don't allow mmapping user bo's through TTM, we can use special lockdep annotation when user-bo's are used. Normal bo's would have >>>>> mmap_sem outer lock, bo::reserve inner lock, while those bo's would have the other way around. >>>>> >>>>> This might complicate validation a little, since you would have to reserve and validate all user-bo's before any normal bo's are reserved. But since this >>>>> is meant to be a vmwgfx specific optimization I think it might be worth it. >>>> Would that work (lockdep-wise) with user BO swapout as part of a normal BO validation? During user BO swapout, we don't need mmap_sem, but the BO needs to be reserved. But I guess it would work, since we use tryreserve when walking the LRU lists? >>> Correct. >>> >>>>>>> Releasing it is a bit silly. I think we should keep mmap_sem as outer >>>>>>> lock, and have bo::reserve as inner, even if it might complicate support for user-bo's. I'm not sure what you can do >>>>>>> with user-bo's that can't be done by allocating the same bo from kernel first and map + populate it. >>>>>>> >>>>>>> ~Maarten >>>>>> Using DMA API analogy, user BOs correspond to using streaming DMA whereas normal BOs correspond to alloced DMA memory buffers. >>>>>> We boost performance and save resources. >>>>> Yeah but it's vmwgfx specific. Nouveau and radeon have dedicated copy engines that can be used. Flushing the vm and stalling is probably more >>>>> expensive than performing a memcpy >>>> In the end, I'm not sure it will be vmwgfx specific once there is a working example, and there are user-space APIs that will benefit from it. There are other examples out there today that uses streaming DMA to feed the DMA engines, although not through TTM, see for example via_dmablit.c. >>>> >>>> Also if we need a separate locking order for User BOs, what would be the big benefit of having the locking order mmap_sem()->bo_reserve() ? >>> Deterministic locking. I fear about livelocks that could happen otherwise with the right timing. Especially but not exclusively on -rt kernels. >> But livelocks wouldn't be an issue anymore since we use a waiting reserve in the retry path, right? The only issue we might theoretically be facing is starvation in the fault path. >> >> With a two-step validation scheme I think we're up to real issues, like bouncing ordinary BOs in and out of swap during a single execbuf... > I would need to see some code, but I think it can be avoided by having a slowpath similar to i915 if it fails. There's no slowpath needed, but the following situation might happen. Let's assume there is a memory shortage. 1. user_bo reference 2. bo reference 3. user_bo validate. bos are swapped out. 4. bo validate bos are swapped in again. While it's not an error, it's not a situation we should get into. > > Anyway have 2 threads fault on the same bo in different offsets, so fault handler is called twice on the same bo. > > You could end up a loop, first one grabbed the bo lock without mmap_sem, second one does trylock, which fails then unlocks mmap_sem, and acquires the bo lock again. At which point the first thread acquired mmap_sem and does a trylock. So the same thing can theoretically happen infinitely over and over again. Indeed. This situation would be extremely unlikely but could however be easily fixed with wait_for_unreserved semantics instead of reserve; unreserve. Like in mm/filemap.c line 666 and onwards. Either by an API addition or using the current API. The latter would, however require an extra mutex per bo. /Thomas > > ~Maarten
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 1006c15..55c487d 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -61,13 +61,22 @@ static int ttm_bo_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf) /* * Work around locking order reversal in fault / nopfn * between mmap_sem and bo_reserve: Perform a trylock operation - * for reserve, and if it fails, retry the fault after scheduling. + * for reserve, and if it fails, retry the fault after releasing + * the mmap_sem and waiting. */ ret = ttm_bo_reserve(bo, true, true, false, 0); if (unlikely(ret != 0)) { - if (ret == -EBUSY) - set_need_resched(); + if (ret == -EBUSY) { + if ((vmf->flags & FAULT_FLAG_ALLOW_RETRY) && + !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) { + up_read(&vma->vm_mm->mmap_sem); + ttm_bo_reserve_nolru(bo, true, false, false, + NULL); + ww_mutex_unlock(&bo->resv->lock); + return VM_FAULT_RETRY; + } + } return VM_FAULT_NOPAGE; }