Message ID | 50B658D0.2060800@canonical.com (mailing list archive)
---|---
State | New, archived
On 11/28/2012 07:32 PM, Maarten Lankhorst wrote: > Op 28-11-12 16:10, Thomas Hellstrom schreef: >> On 11/28/2012 03:46 PM, Maarten Lankhorst wrote: >>> Op 28-11-12 15:23, Thomas Hellstrom schreef: >>>> On 11/28/2012 02:55 PM, Maarten Lankhorst wrote: >>>>> Op 28-11-12 14:21, Thomas Hellstrom schreef: >>>>>> On 11/28/2012 01:15 PM, Maarten Lankhorst wrote: >>>>>>> Op 28-11-12 12:54, Thomas Hellstrom schreef: >>>>>>>> On 11/28/2012 12:25 PM, Maarten Lankhorst wrote: >>>>>>>>> By removing the unlocking of lru and retaking it immediately, a race is >>>>>>>>> removed where the bo is taken off the swap list or the lru list between >>>>>>>>> the unlock and relock. As such the cleanup_refs code can be simplified, >>>>>>>>> it will attempt to call ttm_bo_wait non-blockingly, and if it fails >>>>>>>>> it will drop the locks and perform a blocking wait, or return an error >>>>>>>>> if no_wait_gpu was set. >>>>>>>>> >>>>>>>>> The need for looping is also eliminated, since swapout and evict_mem_first >>>>>>>>> will always follow the destruction path, so no new fence is allowed >>>>>>>>> to be attached. As far as I can see this may already have been the case, >>>>>>>>> but the unlocking / relocking required a complicated loop to deal with >>>>>>>>> re-reservation. >>>>>>>>> >>>>>>>>> The downside is that ttm_bo_cleanup_memtype_use is no longer called with >>>>>>>>> reservation held, so drivers must be aware that move_notify with a null >>>>>>>>> parameter doesn't require a reservation. >>>>>>>> Why can't we unreserve *after* ttm_bo_cleanup_memtype_use? That's not >>>>>>>> immediately clear from this patch. >>>>>>> Because we would hold the reservation while waiting and with the object still >>>>>>> on swap and lru lists still, that would defeat the whole purpose of keeping >>>>>>> the object on multiple lists, plus break current code that assumes bo's on the >>>>>>> those lists can always be reserved. >>>>>>> >>>>>>> the if (ret && !no_wait_gpu) path has to drop the reservation and lru lock, and >>>>>>> isn't guaranteed to be able to retake it. Maybe it could be guaranteed now, but >>>>>>> I'm sure that would not be the case if the reservations were shared across >>>>>>> devices. >>>>>> The evict path removes the BO from the LRU lists, drops the LRU lock but hangs on to the reservation, >>>>>> and in case the wait goes wrong, re-adds the bo to the LRU lists and returns an error. >>>>> If you really want to, we could hang on to the !no_wait_gpu path, wait shouldn't ever fail there, so I suppose >>>>> leaving it off the lru lists and not re-add on any list in case of wait fail is fine. It's still on the ddestroy list in that >>>>> case, so not adding it back to the other lists is harmless. >>>>> >>>> Well I'm a bit afraid that theoretically, other callers may have a bo reserved, while cleanup_refs_and_unlock >>>> more or less runs the whole destroy path on that buffer. Sure, we have control over those other reservers, >>>> but it may come back and bite us. >>> That's why initially I moved all the destruction to ttm_bo_release_list, to have all destruction in >>> only 1 place. But even now it's serialized with the lru lock, while the destruction may not happen >>> right away, it still happens before last list ref to the bo is dropped. >>> >>> But it's your call, just choose the approach you want and I'll resubmit this. :-) >>> >>>> Also the wait might fail if a signal is hit, so it's definitely possible, and even likely in the case of the X server process. 
>>>> >>>> Anyway, I prefer if we could try to keep the reservation across the ttm_cleanup_memtype_use function, and as far >>>> as I can tell, the only thing preventing that is the reservation release in the (!no_wait_gpu) path. So if we alter that to >>>> do the same as the evict path I think without looking to deeply into the consequences that we should be safe. >>> I think returning success early without removing off ddestroy list if re-reserving fails >>> with lru lock held would be better. >>> >>> We completed the wait and attempt to reserve the bo, which failed. Without the lru >>> lock atomicity, this can't happen since the only places that would do it call this with >>> the lru lock held. >>> >>> With the atomicity removal, the only place that could do this is ttm_bo_delayed_delete >>> with remove_all set to true. And even if that happened the destruction code would run >>> *anyway* since we completed the waiting part already, it would just not necessarily be >>> run from this thread, but that guarantee didn't exist anyway. >>>> Then we should be able to skip patch 2 as well. >>> If my tryreserve approach sounds sane, second patch should still be skippable. :-) >> Sure, Lets go for that approach. > Ok updated patch below, you only need to check if you like the approach or not, > since I haven't tested it yet beyond compiling. Rest of series (minus patch 2) > should still apply without modification. > > drm/ttm: call ttm_bo_cleanup_refs with reservation and lru lock held, v2 > > By removing the unlocking of lru and retaking it immediately, a race is > removed where the bo is taken off the swap list or the lru list between > the unlock and relock. As such the cleanup_refs code can be simplified, > it will attempt to call ttm_bo_wait non-blockingly, and if it fails > it will drop the locks and perform a blocking wait, or return an error > if no_wait_gpu was set. > > The need for looping is also eliminated, since swapout and evict_mem_first > will always follow the destruction path, no new fence is allowed > to be attached. As far as I can see this may already have been the case, > but the unlocking / relocking required a complicated loop to deal with > re-reservation. > > Changes since v1: > - Simplify no_wait_gpu case by folding it in with empty ddestroy. > - Hold a reservation while calling ttm_bo_cleanup_memtype_use again. > > Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> > > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c > index 202fc20..e9f01fe 100644 > --- a/drivers/gpu/drm/ttm/ttm_bo.c > +++ b/drivers/gpu/drm/ttm/ttm_bo.c > @@ -488,12 +488,16 @@ static void ttm_bo_cleanup_memtype_use(struct ttm_buffer_object *bo) > ttm_bo_mem_put(bo, &bo->mem); > > atomic_set(&bo->reserved, 0); > + wake_up_all(&bo->event_queue); > > /* > - * Make processes trying to reserve really pick it up. > + * Since the final reference to this bo may not be dropped by > + * the current task we have to put a memory barrier here to make > + * sure the changes done in this function are always visible. > + * > + * This function only needs protection against the final kref_put. 
> */ > - smp_mb__after_atomic_dec(); > - wake_up_all(&bo->event_queue); > + smp_mb__before_atomic_dec(); > } > > static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo) > @@ -543,68 +547,95 @@ static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo) > } > > /** > - * function ttm_bo_cleanup_refs > + * function ttm_bo_cleanup_refs_and_unlock > * If bo idle, remove from delayed- and lru lists, and unref. > * If not idle, do nothing. > * > + * Must be called with lru_lock and reservation held, this function > + * will drop both before returning. > + * > * @interruptible Any sleeps should occur interruptibly. > - * @no_wait_reserve Never wait for reserve. Return -EBUSY instead. > * @no_wait_gpu Never wait for gpu. Return -EBUSY instead. > */ > > -static int ttm_bo_cleanup_refs(struct ttm_buffer_object *bo, > - bool interruptible, > - bool no_wait_reserve, > - bool no_wait_gpu) > +static int ttm_bo_cleanup_refs_and_unlock(struct ttm_buffer_object *bo, > + bool interruptible, > + bool no_wait_gpu) > { > struct ttm_bo_device *bdev = bo->bdev; > + struct ttm_bo_driver *driver = bdev->driver; > struct ttm_bo_global *glob = bo->glob; > int put_count; > - int ret = 0; > + int ret; > > -retry: > spin_lock(&bdev->fence_lock); > - ret = ttm_bo_wait(bo, false, interruptible, no_wait_gpu); > - spin_unlock(&bdev->fence_lock); > + ret = ttm_bo_wait(bo, false, false, true); > > - if (unlikely(ret != 0)) > - return ret; > + if (ret && !no_wait_gpu) { > + void *sync_obj; > > -retry_reserve: > - spin_lock(&glob->lru_lock); > + /* > + * Take a reference to the fence and unreserve, > + * at this point the buffer should be dead, so > + * no new sync objects can be attached. > + */ > + sync_obj = driver->sync_obj_ref(&bo->sync_obj); > + spin_unlock(&bdev->fence_lock); > > - if (unlikely(list_empty(&bo->ddestroy))) { > + put_count = ttm_bo_del_from_lru(bo); > + > + atomic_set(&bo->reserved, 0); > + wake_up_all(&bo->event_queue); > spin_unlock(&glob->lru_lock); > - return 0; > - } > > - ret = ttm_bo_reserve_locked(bo, false, true, false, 0); > + ttm_bo_list_ref_sub(bo, put_count, true); > > - if (unlikely(ret == -EBUSY)) { > - spin_unlock(&glob->lru_lock); > - if (likely(!no_wait_reserve)) > - ret = ttm_bo_wait_unreserved(bo, interruptible); > - if (unlikely(ret != 0)) > + ret = driver->sync_obj_wait(sync_obj, false, interruptible); > + driver->sync_obj_unref(&sync_obj); > + if (ret) { > + /* > + * Either the wait returned -ERESTARTSYS, or -EDEADLK > + * (radeon lockup) here. No effort is made to re-add > + * this bo to any lru list. Instead the bo only > + * appears on the delayed destroy list. > + */ > return ret; > + } Actually, we *must* re-add the bo to LRU lists here, because otherwise when a driver needs to evict a memory type completely, there's a large chance it will miss this bo. So I think either we need to keep the reservation, or keep the bo on the LRU lists. /Thomas
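[Editorial note: a minimal user-space sketch of the concern Thomas raises above. fake_bo and evict_all_from_lru are made-up stand-ins, not TTM code; the point is only that a bo which has been taken off the LRU but still sits on the delayed-destroy list is never seen by an eviction pass that walks the LRU list.]

/* toy model, not kernel code */
#include <stdbool.h>
#include <stdio.h>

struct fake_bo {
    const char *name;
    bool on_lru;        /* modelled membership in an LRU list        */
    bool on_ddestroy;   /* modelled membership in the ddestroy list  */
};

/* An "evict everything" pass that, like a force-clean of a memory type,
 * only considers what is currently linked on the LRU list. */
static void evict_all_from_lru(struct fake_bo *bos, int n)
{
    for (int i = 0; i < n; i++) {
        if (bos[i].on_lru)
            printf("evicted %s\n", bos[i].name);
        else
            printf("MISSED %s (only on ddestroy)\n", bos[i].name);
    }
}

int main(void)
{
    struct fake_bo bos[] = {
        { "bo_a", true,  false },
        /* taken off the LRU before a failed wait and never re-added: */
        { "bo_b", false, true  },
    };

    evict_all_from_lru(bos, 2);
    return 0;
}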
Op 28-11-12 20:21, Thomas Hellstrom schreef: > On 11/28/2012 07:32 PM, Maarten Lankhorst wrote: >> Op 28-11-12 16:10, Thomas Hellstrom schreef: >>> On 11/28/2012 03:46 PM, Maarten Lankhorst wrote: >>>> Op 28-11-12 15:23, Thomas Hellstrom schreef: >>>>> On 11/28/2012 02:55 PM, Maarten Lankhorst wrote: >>>>>> Op 28-11-12 14:21, Thomas Hellstrom schreef: >>>>>>> On 11/28/2012 01:15 PM, Maarten Lankhorst wrote: >>>>>>>> Op 28-11-12 12:54, Thomas Hellstrom schreef: >>>>>>>>> On 11/28/2012 12:25 PM, Maarten Lankhorst wrote: >>>>>>>>>> By removing the unlocking of lru and retaking it immediately, a race is >>>>>>>>>> removed where the bo is taken off the swap list or the lru list between >>>>>>>>>> the unlock and relock. As such the cleanup_refs code can be simplified, >>>>>>>>>> it will attempt to call ttm_bo_wait non-blockingly, and if it fails >>>>>>>>>> it will drop the locks and perform a blocking wait, or return an error >>>>>>>>>> if no_wait_gpu was set. >>>>>>>>>> >>>>>>>>>> The need for looping is also eliminated, since swapout and evict_mem_first >>>>>>>>>> will always follow the destruction path, so no new fence is allowed >>>>>>>>>> to be attached. As far as I can see this may already have been the case, >>>>>>>>>> but the unlocking / relocking required a complicated loop to deal with >>>>>>>>>> re-reservation. >>>>>>>>>> >>>>>>>>>> The downside is that ttm_bo_cleanup_memtype_use is no longer called with >>>>>>>>>> reservation held, so drivers must be aware that move_notify with a null >>>>>>>>>> parameter doesn't require a reservation. >>>>>>>>> Why can't we unreserve *after* ttm_bo_cleanup_memtype_use? That's not >>>>>>>>> immediately clear from this patch. >>>>>>>> Because we would hold the reservation while waiting and with the object still >>>>>>>> on swap and lru lists still, that would defeat the whole purpose of keeping >>>>>>>> the object on multiple lists, plus break current code that assumes bo's on the >>>>>>>> those lists can always be reserved. >>>>>>>> >>>>>>>> the if (ret && !no_wait_gpu) path has to drop the reservation and lru lock, and >>>>>>>> isn't guaranteed to be able to retake it. Maybe it could be guaranteed now, but >>>>>>>> I'm sure that would not be the case if the reservations were shared across >>>>>>>> devices. >>>>>>> The evict path removes the BO from the LRU lists, drops the LRU lock but hangs on to the reservation, >>>>>>> and in case the wait goes wrong, re-adds the bo to the LRU lists and returns an error. >>>>>> If you really want to, we could hang on to the !no_wait_gpu path, wait shouldn't ever fail there, so I suppose >>>>>> leaving it off the lru lists and not re-add on any list in case of wait fail is fine. It's still on the ddestroy list in that >>>>>> case, so not adding it back to the other lists is harmless. >>>>>> >>>>> Well I'm a bit afraid that theoretically, other callers may have a bo reserved, while cleanup_refs_and_unlock >>>>> more or less runs the whole destroy path on that buffer. Sure, we have control over those other reservers, >>>>> but it may come back and bite us. >>>> That's why initially I moved all the destruction to ttm_bo_release_list, to have all destruction in >>>> only 1 place. But even now it's serialized with the lru lock, while the destruction may not happen >>>> right away, it still happens before last list ref to the bo is dropped. >>>> >>>> But it's your call, just choose the approach you want and I'll resubmit this. 
:-) >>>> >>>>> Also the wait might fail if a signal is hit, so it's definitely possible, and even likely in the case of the X server process. >>>>> >>>>> Anyway, I prefer if we could try to keep the reservation across the ttm_cleanup_memtype_use function, and as far >>>>> as I can tell, the only thing preventing that is the reservation release in the (!no_wait_gpu) path. So if we alter that to >>>>> do the same as the evict path I think without looking to deeply into the consequences that we should be safe. >>>> I think returning success early without removing off ddestroy list if re-reserving fails >>>> with lru lock held would be better. >>>> >>>> We completed the wait and attempt to reserve the bo, which failed. Without the lru >>>> lock atomicity, this can't happen since the only places that would do it call this with >>>> the lru lock held. >>>> >>>> With the atomicity removal, the only place that could do this is ttm_bo_delayed_delete >>>> with remove_all set to true. And even if that happened the destruction code would run >>>> *anyway* since we completed the waiting part already, it would just not necessarily be >>>> run from this thread, but that guarantee didn't exist anyway. >>>>> Then we should be able to skip patch 2 as well. >>>> If my tryreserve approach sounds sane, second patch should still be skippable. :-) >>> Sure, Lets go for that approach. >> Ok updated patch below, you only need to check if you like the approach or not, >> since I haven't tested it yet beyond compiling. Rest of series (minus patch 2) >> should still apply without modification. >> >> drm/ttm: call ttm_bo_cleanup_refs with reservation and lru lock held, v2 >> By removing the unlocking of lru and retaking it immediately, a race is >> removed where the bo is taken off the swap list or the lru list between >> the unlock and relock. As such the cleanup_refs code can be simplified, >> it will attempt to call ttm_bo_wait non-blockingly, and if it fails >> it will drop the locks and perform a blocking wait, or return an error >> if no_wait_gpu was set. >> The need for looping is also eliminated, since swapout and evict_mem_first >> will always follow the destruction path, no new fence is allowed >> to be attached. As far as I can see this may already have been the case, >> but the unlocking / relocking required a complicated loop to deal with >> re-reservation. >> Changes since v1: >> - Simplify no_wait_gpu case by folding it in with empty ddestroy. >> - Hold a reservation while calling ttm_bo_cleanup_memtype_use again. >> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> >> >> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c >> index 202fc20..e9f01fe 100644 >> --- a/drivers/gpu/drm/ttm/ttm_bo.c >> +++ b/drivers/gpu/drm/ttm/ttm_bo.c >> @@ -488,12 +488,16 @@ static void ttm_bo_cleanup_memtype_use(struct ttm_buffer_object *bo) >> ttm_bo_mem_put(bo, &bo->mem); >> atomic_set(&bo->reserved, 0); >> + wake_up_all(&bo->event_queue); >> /* >> - * Make processes trying to reserve really pick it up. >> + * Since the final reference to this bo may not be dropped by >> + * the current task we have to put a memory barrier here to make >> + * sure the changes done in this function are always visible. >> + * >> + * This function only needs protection against the final kref_put. 
>> */ >> - smp_mb__after_atomic_dec(); >> - wake_up_all(&bo->event_queue); >> + smp_mb__before_atomic_dec(); >> } >> static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo) >> @@ -543,68 +547,95 @@ static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo) >> } >> /** >> - * function ttm_bo_cleanup_refs >> + * function ttm_bo_cleanup_refs_and_unlock >> * If bo idle, remove from delayed- and lru lists, and unref. >> * If not idle, do nothing. >> * >> + * Must be called with lru_lock and reservation held, this function >> + * will drop both before returning. >> + * >> * @interruptible Any sleeps should occur interruptibly. >> - * @no_wait_reserve Never wait for reserve. Return -EBUSY instead. >> * @no_wait_gpu Never wait for gpu. Return -EBUSY instead. >> */ >> -static int ttm_bo_cleanup_refs(struct ttm_buffer_object *bo, >> - bool interruptible, >> - bool no_wait_reserve, >> - bool no_wait_gpu) >> +static int ttm_bo_cleanup_refs_and_unlock(struct ttm_buffer_object *bo, >> + bool interruptible, >> + bool no_wait_gpu) >> { >> struct ttm_bo_device *bdev = bo->bdev; >> + struct ttm_bo_driver *driver = bdev->driver; >> struct ttm_bo_global *glob = bo->glob; >> int put_count; >> - int ret = 0; >> + int ret; >> -retry: >> spin_lock(&bdev->fence_lock); >> - ret = ttm_bo_wait(bo, false, interruptible, no_wait_gpu); >> - spin_unlock(&bdev->fence_lock); >> + ret = ttm_bo_wait(bo, false, false, true); >> - if (unlikely(ret != 0)) >> - return ret; >> + if (ret && !no_wait_gpu) { >> + void *sync_obj; >> -retry_reserve: >> - spin_lock(&glob->lru_lock); >> + /* >> + * Take a reference to the fence and unreserve, >> + * at this point the buffer should be dead, so >> + * no new sync objects can be attached. >> + */ >> + sync_obj = driver->sync_obj_ref(&bo->sync_obj); >> + spin_unlock(&bdev->fence_lock); >> - if (unlikely(list_empty(&bo->ddestroy))) { >> + put_count = ttm_bo_del_from_lru(bo); >> + >> + atomic_set(&bo->reserved, 0); >> + wake_up_all(&bo->event_queue); >> spin_unlock(&glob->lru_lock); >> - return 0; >> - } >> - ret = ttm_bo_reserve_locked(bo, false, true, false, 0); >> + ttm_bo_list_ref_sub(bo, put_count, true); >> - if (unlikely(ret == -EBUSY)) { >> - spin_unlock(&glob->lru_lock); >> - if (likely(!no_wait_reserve)) >> - ret = ttm_bo_wait_unreserved(bo, interruptible); >> - if (unlikely(ret != 0)) >> + ret = driver->sync_obj_wait(sync_obj, false, interruptible); >> + driver->sync_obj_unref(&sync_obj); >> + if (ret) { >> + /* >> + * Either the wait returned -ERESTARTSYS, or -EDEADLK >> + * (radeon lockup) here. No effort is made to re-add >> + * this bo to any lru list. Instead the bo only >> + * appears on the delayed destroy list. >> + */ >> return ret; >> + } > Actually, we *must* re-add the bo to LRU lists here, because otherwise when a driver needs > to evict a memory type completely, there's a large chance it will miss this bo. > > So I think either we need to keep the reservation, or keep the bo on the LRU lists. The second option is what v1 did, except I never bothered to re-take the reservation. ;-) It shouldn't cause troubles to leave it on the lru lists if we drop the the reservation, we can keep handling re-reservation failure in the same way as in v2. In that case would v3 be the same as v2 of this patch, except with those 2 lines from the ret && !no_wait_gpu branch removed: put_count = ttm_bo_del_from_lru(bo); ttm_bo_list_ref_sub(bo, put_count, true); And of course the comment after sync_obj_wait failure would no longer apply. ~Maarten
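[Editorial note: a toy model of the flow Maarten describes, with a pthread mutex standing in for the reservation. toy_bo, toy_cleanup_refs and toy_wait_fence are illustrative names only, not the proposed patch: the reservation is dropped for the blocking wait while the bo stays on the LRU, and a failed tryreserve afterwards is treated as success because whoever now holds the reservation will finish the destroy.]

/* toy model, not kernel code; build with -pthread */
#include <pthread.h>
#include <stdio.h>

struct toy_bo {
    pthread_mutex_t reserve;    /* stands in for bo->reserved            */
    int fence_busy;             /* stands in for an unsignaled sync_obj  */
};

static void toy_wait_fence(struct toy_bo *bo)
{
    bo->fence_busy = 0;         /* pretend the GPU finished              */
}

static int toy_cleanup_refs(struct toy_bo *bo)
{
    /* caller enters with the reservation held */
    if (bo->fence_busy) {
        pthread_mutex_unlock(&bo->reserve);     /* drop the reservation  */
        toy_wait_fence(bo);                     /* blocking wait         */

        if (pthread_mutex_trylock(&bo->reserve)) {
            /* lost the race: someone else reserved the bo and will run
             * the destroy path, so there is nothing left for us to do. */
            return 0;
        }
    }
    printf("destroying bo\n");                  /* cleanup_memtype_use   */
    pthread_mutex_unlock(&bo->reserve);
    return 0;
}

int main(void)
{
    struct toy_bo bo = { PTHREAD_MUTEX_INITIALIZER, 1 };

    pthread_mutex_lock(&bo.reserve);            /* caller reserves first */
    return toy_cleanup_refs(&bo);
}

The design point being modelled is that keeping the bo on the LRU during the wait means any other reserver that beats us to it takes over the destruction, so giving up after a failed tryreserve is safe rather than an error.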
On 11/28/2012 11:26 PM, Maarten Lankhorst wrote: > Op 28-11-12 20:21, Thomas Hellstrom schreef: >> On 11/28/2012 07:32 PM, Maarten Lankhorst wrote: >>> Op 28-11-12 16:10, Thomas Hellstrom schreef: >>>> On 11/28/2012 03:46 PM, Maarten Lankhorst wrote: >>>>> Op 28-11-12 15:23, Thomas Hellstrom schreef: >>>>>> On 11/28/2012 02:55 PM, Maarten Lankhorst wrote: >>>>>>> Op 28-11-12 14:21, Thomas Hellstrom schreef: >>>>>>>> On 11/28/2012 01:15 PM, Maarten Lankhorst wrote: >>>>>>>>> Op 28-11-12 12:54, Thomas Hellstrom schreef: >>>>>>>>>> On 11/28/2012 12:25 PM, Maarten Lankhorst wrote: >>>>>>>>>>> By removing the unlocking of lru and retaking it immediately, a race is >>>>>>>>>>> removed where the bo is taken off the swap list or the lru list between >>>>>>>>>>> the unlock and relock. As such the cleanup_refs code can be simplified, >>>>>>>>>>> it will attempt to call ttm_bo_wait non-blockingly, and if it fails >>>>>>>>>>> it will drop the locks and perform a blocking wait, or return an error >>>>>>>>>>> if no_wait_gpu was set. >>>>>>>>>>> >>>>>>>>>>> The need for looping is also eliminated, since swapout and evict_mem_first >>>>>>>>>>> will always follow the destruction path, so no new fence is allowed >>>>>>>>>>> to be attached. As far as I can see this may already have been the case, >>>>>>>>>>> but the unlocking / relocking required a complicated loop to deal with >>>>>>>>>>> re-reservation. >>>>>>>>>>> >>>>>>>>>>> The downside is that ttm_bo_cleanup_memtype_use is no longer called with >>>>>>>>>>> reservation held, so drivers must be aware that move_notify with a null >>>>>>>>>>> parameter doesn't require a reservation. >>>>>>>>>> Why can't we unreserve *after* ttm_bo_cleanup_memtype_use? That's not >>>>>>>>>> immediately clear from this patch. >>>>>>>>> Because we would hold the reservation while waiting and with the object still >>>>>>>>> on swap and lru lists still, that would defeat the whole purpose of keeping >>>>>>>>> the object on multiple lists, plus break current code that assumes bo's on the >>>>>>>>> those lists can always be reserved. >>>>>>>>> >>>>>>>>> the if (ret && !no_wait_gpu) path has to drop the reservation and lru lock, and >>>>>>>>> isn't guaranteed to be able to retake it. Maybe it could be guaranteed now, but >>>>>>>>> I'm sure that would not be the case if the reservations were shared across >>>>>>>>> devices. >>>>>>>> The evict path removes the BO from the LRU lists, drops the LRU lock but hangs on to the reservation, >>>>>>>> and in case the wait goes wrong, re-adds the bo to the LRU lists and returns an error. >>>>>>> If you really want to, we could hang on to the !no_wait_gpu path, wait shouldn't ever fail there, so I suppose >>>>>>> leaving it off the lru lists and not re-add on any list in case of wait fail is fine. It's still on the ddestroy list in that >>>>>>> case, so not adding it back to the other lists is harmless. >>>>>>> >>>>>> Well I'm a bit afraid that theoretically, other callers may have a bo reserved, while cleanup_refs_and_unlock >>>>>> more or less runs the whole destroy path on that buffer. Sure, we have control over those other reservers, >>>>>> but it may come back and bite us. >>>>> That's why initially I moved all the destruction to ttm_bo_release_list, to have all destruction in >>>>> only 1 place. But even now it's serialized with the lru lock, while the destruction may not happen >>>>> right away, it still happens before last list ref to the bo is dropped. 
>>>>> >>>>> But it's your call, just choose the approach you want and I'll resubmit this. :-) >>>>> >>>>>> Also the wait might fail if a signal is hit, so it's definitely possible, and even likely in the case of the X server process. >>>>>> >>>>>> Anyway, I prefer if we could try to keep the reservation across the ttm_cleanup_memtype_use function, and as far >>>>>> as I can tell, the only thing preventing that is the reservation release in the (!no_wait_gpu) path. So if we alter that to >>>>>> do the same as the evict path I think without looking to deeply into the consequences that we should be safe. >>>>> I think returning success early without removing off ddestroy list if re-reserving fails >>>>> with lru lock held would be better. >>>>> >>>>> We completed the wait and attempt to reserve the bo, which failed. Without the lru >>>>> lock atomicity, this can't happen since the only places that would do it call this with >>>>> the lru lock held. >>>>> >>>>> With the atomicity removal, the only place that could do this is ttm_bo_delayed_delete >>>>> with remove_all set to true. And even if that happened the destruction code would run >>>>> *anyway* since we completed the waiting part already, it would just not necessarily be >>>>> run from this thread, but that guarantee didn't exist anyway. >>>>>> Then we should be able to skip patch 2 as well. >>>>> If my tryreserve approach sounds sane, second patch should still be skippable. :-) >>>> Sure, Lets go for that approach. >>> Ok updated patch below, you only need to check if you like the approach or not, >>> since I haven't tested it yet beyond compiling. Rest of series (minus patch 2) >>> should still apply without modification. >>> >>> drm/ttm: call ttm_bo_cleanup_refs with reservation and lru lock held, v2 >>> By removing the unlocking of lru and retaking it immediately, a race is >>> removed where the bo is taken off the swap list or the lru list between >>> the unlock and relock. As such the cleanup_refs code can be simplified, >>> it will attempt to call ttm_bo_wait non-blockingly, and if it fails >>> it will drop the locks and perform a blocking wait, or return an error >>> if no_wait_gpu was set. >>> The need for looping is also eliminated, since swapout and evict_mem_first >>> will always follow the destruction path, no new fence is allowed >>> to be attached. As far as I can see this may already have been the case, >>> but the unlocking / relocking required a complicated loop to deal with >>> re-reservation. >>> Changes since v1: >>> - Simplify no_wait_gpu case by folding it in with empty ddestroy. >>> - Hold a reservation while calling ttm_bo_cleanup_memtype_use again. >>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> >>> >>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c >>> index 202fc20..e9f01fe 100644 >>> --- a/drivers/gpu/drm/ttm/ttm_bo.c >>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c >>> @@ -488,12 +488,16 @@ static void ttm_bo_cleanup_memtype_use(struct ttm_buffer_object *bo) >>> ttm_bo_mem_put(bo, &bo->mem); >>> atomic_set(&bo->reserved, 0); >>> + wake_up_all(&bo->event_queue); >>> /* >>> - * Make processes trying to reserve really pick it up. >>> + * Since the final reference to this bo may not be dropped by >>> + * the current task we have to put a memory barrier here to make >>> + * sure the changes done in this function are always visible. >>> + * >>> + * This function only needs protection against the final kref_put. 
>>> */ >>> - smp_mb__after_atomic_dec(); >>> - wake_up_all(&bo->event_queue); >>> + smp_mb__before_atomic_dec(); >>> } >>> static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo) >>> @@ -543,68 +547,95 @@ static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo) >>> } >>> /** >>> - * function ttm_bo_cleanup_refs >>> + * function ttm_bo_cleanup_refs_and_unlock >>> * If bo idle, remove from delayed- and lru lists, and unref. >>> * If not idle, do nothing. >>> * >>> + * Must be called with lru_lock and reservation held, this function >>> + * will drop both before returning. >>> + * >>> * @interruptible Any sleeps should occur interruptibly. >>> - * @no_wait_reserve Never wait for reserve. Return -EBUSY instead. >>> * @no_wait_gpu Never wait for gpu. Return -EBUSY instead. >>> */ >>> -static int ttm_bo_cleanup_refs(struct ttm_buffer_object *bo, >>> - bool interruptible, >>> - bool no_wait_reserve, >>> - bool no_wait_gpu) >>> +static int ttm_bo_cleanup_refs_and_unlock(struct ttm_buffer_object *bo, >>> + bool interruptible, >>> + bool no_wait_gpu) >>> { >>> struct ttm_bo_device *bdev = bo->bdev; >>> + struct ttm_bo_driver *driver = bdev->driver; >>> struct ttm_bo_global *glob = bo->glob; >>> int put_count; >>> - int ret = 0; >>> + int ret; >>> -retry: >>> spin_lock(&bdev->fence_lock); >>> - ret = ttm_bo_wait(bo, false, interruptible, no_wait_gpu); >>> - spin_unlock(&bdev->fence_lock); >>> + ret = ttm_bo_wait(bo, false, false, true); >>> - if (unlikely(ret != 0)) >>> - return ret; >>> + if (ret && !no_wait_gpu) { >>> + void *sync_obj; >>> -retry_reserve: >>> - spin_lock(&glob->lru_lock); >>> + /* >>> + * Take a reference to the fence and unreserve, >>> + * at this point the buffer should be dead, so >>> + * no new sync objects can be attached. >>> + */ >>> + sync_obj = driver->sync_obj_ref(&bo->sync_obj); >>> + spin_unlock(&bdev->fence_lock); >>> - if (unlikely(list_empty(&bo->ddestroy))) { >>> + put_count = ttm_bo_del_from_lru(bo); >>> + >>> + atomic_set(&bo->reserved, 0); >>> + wake_up_all(&bo->event_queue); >>> spin_unlock(&glob->lru_lock); >>> - return 0; >>> - } >>> - ret = ttm_bo_reserve_locked(bo, false, true, false, 0); >>> + ttm_bo_list_ref_sub(bo, put_count, true); >>> - if (unlikely(ret == -EBUSY)) { >>> - spin_unlock(&glob->lru_lock); >>> - if (likely(!no_wait_reserve)) >>> - ret = ttm_bo_wait_unreserved(bo, interruptible); >>> - if (unlikely(ret != 0)) >>> + ret = driver->sync_obj_wait(sync_obj, false, interruptible); >>> + driver->sync_obj_unref(&sync_obj); >>> + if (ret) { >>> + /* >>> + * Either the wait returned -ERESTARTSYS, or -EDEADLK >>> + * (radeon lockup) here. No effort is made to re-add >>> + * this bo to any lru list. Instead the bo only >>> + * appears on the delayed destroy list. >>> + */ >>> return ret; >>> + } >> Actually, we *must* re-add the bo to LRU lists here, because otherwise when a driver needs >> to evict a memory type completely, there's a large chance it will miss this bo. >> >> So I think either we need to keep the reservation, or keep the bo on the LRU lists. > The second option is what v1 did, except I never bothered to re-take the reservation. ;-) Yes, I know ;) > It shouldn't cause troubles to leave it on the lru lists if we drop the the reservation, > we can keep handling re-reservation failure in the same way as in v2. 
>
> In that case would v3 be the same as v2 of this patch, except with those 2 lines from the
> ret && !no_wait_gpu branch removed:
>
> put_count = ttm_bo_del_from_lru(bo);
> ttm_bo_list_ref_sub(bo, put_count, true);
>
> And of course the comment after sync_obj_wait failure would no longer apply.

Yes, sounds good, although note that if the tryreserve fails in the !no_wait_gpu case and we
give up returning 0, that may cause a similar problem (ttm_bo_force_list_clean() not *ensuring*
that a bo was removed, at least not by the time the function completes). But if we make sure
the while(!list_empty()) in ttm_bo_force_list_clean() remains, that won't be a problem either.

/Thomas

>
> ~Maarten
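[Editorial note: a sketch of the drain pattern Thomas refers to, under the assumption that ttm_bo_force_list_clean() keeps looping while the list is non-empty. try_destroy_one is a hypothetical stand-in: a bo whose tryreserve fails on one pass is simply retried on a later pass, so returning 0 from the cleanup helper stays safe.]

/* toy model, not kernel code */
#include <stdbool.h>
#include <stdio.h>

static int pending = 3;         /* bos still on the list                  */
static int flaky = 1;           /* pretend one tryreserve fails once      */

static bool try_destroy_one(void)
{
    if (flaky) {                /* simulated lost tryreserve race         */
        flaky = 0;
        return false;
    }
    pending--;
    return true;
}

int main(void)
{
    /* while (!list_empty(&man->lru)) ... in the real code */
    while (pending > 0) {
        if (!try_destroy_one())
            printf("tryreserve failed, retrying on next pass\n");
    }
    printf("list drained\n");
    return 0;
}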
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 202fc20..e9f01fe 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -488,12 +488,16 @@ static void ttm_bo_cleanup_memtype_use(struct ttm_buffer_object *bo)
 	ttm_bo_mem_put(bo, &bo->mem);
 
 	atomic_set(&bo->reserved, 0);
+	wake_up_all(&bo->event_queue);
 
 	/*
-	 * Make processes trying to reserve really pick it up.
+	 * Since the final reference to this bo may not be dropped by
+	 * the current task we have to put a memory barrier here to make
+	 * sure the changes done in this function are always visible.
+	 *
+	 * This function only needs protection against the final kref_put.
 	 */
-	smp_mb__after_atomic_dec();
-	wake_up_all(&bo->event_queue);
+	smp_mb__before_atomic_dec();
 }
 
 static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo)
@@ -543,68 +547,95 @@ static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo)
 }
 
 /**
- * function ttm_bo_cleanup_refs
+ * function ttm_bo_cleanup_refs_and_unlock
  * If bo idle, remove from delayed- and lru lists, and unref.
  * If not idle, do nothing.
  *
+ * Must be called with lru_lock and reservation held, this function
+ * will drop both before returning.
+ *
  * @interruptible         Any sleeps should occur interruptibly.
- * @no_wait_reserve       Never wait for reserve. Return -EBUSY instead.
 * @no_wait_gpu           Never wait for gpu. Return -EBUSY instead.
  */
 
-static int ttm_bo_cleanup_refs(struct ttm_buffer_object *bo,
-			       bool interruptible,
-			       bool no_wait_reserve,
-			       bool no_wait_gpu)
+static int ttm_bo_cleanup_refs_and_unlock(struct ttm_buffer_object *bo,
+					  bool interruptible,
+					  bool no_wait_gpu)
 {
 	struct ttm_bo_device *bdev = bo->bdev;
+	struct ttm_bo_driver *driver = bdev->driver;
 	struct ttm_bo_global *glob = bo->glob;
 	int put_count;
-	int ret = 0;
+	int ret;
 
-retry:
 	spin_lock(&bdev->fence_lock);
-	ret = ttm_bo_wait(bo, false, interruptible, no_wait_gpu);
-	spin_unlock(&bdev->fence_lock);
+	ret = ttm_bo_wait(bo, false, false, true);
 
-	if (unlikely(ret != 0))
-		return ret;
+	if (ret && !no_wait_gpu) {
+		void *sync_obj;
 
-retry_reserve:
-	spin_lock(&glob->lru_lock);
+		/*
+		 * Take a reference to the fence and unreserve,
+		 * at this point the buffer should be dead, so
+		 * no new sync objects can be attached.
+		 */
+		sync_obj = driver->sync_obj_ref(&bo->sync_obj);
+		spin_unlock(&bdev->fence_lock);
 
-	if (unlikely(list_empty(&bo->ddestroy))) {
+		put_count = ttm_bo_del_from_lru(bo);
+
+		atomic_set(&bo->reserved, 0);
+		wake_up_all(&bo->event_queue);
 		spin_unlock(&glob->lru_lock);
-		return 0;
-	}
 
-	ret = ttm_bo_reserve_locked(bo, false, true, false, 0);
+		ttm_bo_list_ref_sub(bo, put_count, true);
 
-	if (unlikely(ret == -EBUSY)) {
-		spin_unlock(&glob->lru_lock);
-		if (likely(!no_wait_reserve))
-			ret = ttm_bo_wait_unreserved(bo, interruptible);
-		if (unlikely(ret != 0))
+		ret = driver->sync_obj_wait(sync_obj, false, interruptible);
+		driver->sync_obj_unref(&sync_obj);
+		if (ret) {
+			/*
+			 * Either the wait returned -ERESTARTSYS, or -EDEADLK
+			 * (radeon lockup) here. No effort is made to re-add
+			 * this bo to any lru list. Instead the bo only
+			 * appears on the delayed destroy list.
+			 */
 			return ret;
+		}
 
-		goto retry_reserve;
-	}
+		/*
+		 * remove sync_obj with ttm_bo_wait, the wait should be
+		 * finished, and no new wait object should have been added.
+		 */
+		spin_lock(&bdev->fence_lock);
+		ret = ttm_bo_wait(bo, false, false, true);
+		WARN_ON(ret);
+		spin_unlock(&bdev->fence_lock);
+		if (ret)
+			return ret;
 
-	BUG_ON(ret != 0);
+		spin_lock(&glob->lru_lock);
+		ret = ttm_bo_reserve_locked(bo, false, true, false, 0);
 
-	/**
-	 * We can re-check for sync object without taking
-	 * the bo::lock since setting the sync object requires
-	 * also bo::reserved. A busy object at this point may
-	 * be caused by another thread recently starting an accelerated
-	 * eviction.
-	 */
+		/*
+		 * We raced, and lost, someone else holds the reservation now,
+		 * and is probably busy in ttm_bo_cleanup_memtype_use.
+		 *
+		 * Even if it's not the case, because we finished waiting any
+		 * delayed destruction would succeed, so just return success
+		 * here.
+		 */
+		if (ret) {
+			spin_unlock(&glob->lru_lock);
+			return 0;
+		}
+	} else
+		spin_unlock(&bdev->fence_lock);
 
-	if (unlikely(bo->sync_obj)) {
+	if (ret || unlikely(list_empty(&bo->ddestroy))) {
 		atomic_set(&bo->reserved, 0);
 		wake_up_all(&bo->event_queue);
 		spin_unlock(&glob->lru_lock);
-		goto retry;
+		return ret;
 	}
 
 	put_count = ttm_bo_del_from_lru(bo);
@@ -647,9 +678,13 @@ static int ttm_bo_delayed_delete(struct ttm_bo_device *bdev, bool remove_all)
 			kref_get(&nentry->list_kref);
 		}
 
-		spin_unlock(&glob->lru_lock);
-		ret = ttm_bo_cleanup_refs(entry, false, !remove_all,
-					  !remove_all);
+		ret = ttm_bo_reserve_locked(entry, false, !remove_all, false, 0);
+		if (!ret)
+			ret = ttm_bo_cleanup_refs_and_unlock(entry, false,
+							     !remove_all);
+		else
+			spin_unlock(&glob->lru_lock);
+
 		kref_put(&entry->list_kref, ttm_bo_release_list);
 		entry = nentry;
 
@@ -803,9 +838,13 @@ retry:
 	kref_get(&bo->list_kref);
 
 	if (!list_empty(&bo->ddestroy)) {
-		spin_unlock(&glob->lru_lock);
-		ret = ttm_bo_cleanup_refs(bo, interruptible,
-					  no_wait_reserve, no_wait_gpu);
+		ret = ttm_bo_reserve_locked(bo, interruptible, no_wait_reserve, false, 0);
+		if (!ret)
+			ret = ttm_bo_cleanup_refs_and_unlock(bo, interruptible,
+							     no_wait_gpu);
+		else
+			spin_unlock(&glob->lru_lock);
+
 		kref_put(&bo->list_kref, ttm_bo_release_list);
 
 		return ret;
@@ -1799,8 +1838,9 @@ static int ttm_bo_swapout(struct ttm_mem_shrink *shrink)
 	kref_get(&bo->list_kref);
 
 	if (!list_empty(&bo->ddestroy)) {
-		spin_unlock(&glob->lru_lock);
-		(void) ttm_bo_cleanup_refs(bo, false, false, false);
+		ttm_bo_reserve_locked(bo, false, false, false, 0);
+		ttm_bo_cleanup_refs_and_unlock(bo, false, false);
+
 		kref_put(&bo->list_kref, ttm_bo_release_list);
 		spin_lock(&glob->lru_lock);
 		continue;