Message ID | 50B868C1.8000703@canonical.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 11/30/2012 09:05 AM, Maarten Lankhorst wrote: > Just use the return error from ttm_mem_evict_first instead. > > Changes since v1: > - Add warning if list is not empty, nothing else we can do here. Marten, when this function is called, all cross-device reservers have been shut out with the TTM lock or whatever similar mechanism is in place. It's a critical function that must succeed, unless there is a hardware failure to evict. As mentioned in the comments on the previous patch, ttm_bo_evict_first() may return 0 (OK) if it failed to reclaim the trylock in cleanup_refs_and_unlock(), with the assumption that another process will destroy the bo anyway (possibly at a later time which we know nothing about). The lru list needs to be empty when this function returns. This means we must either keep the while(list_empty) or perhaps better, retry instead of WARN if list_empty() is detected at the end of list. /Thomas > Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> > --- > drivers/gpu/drm/ttm/ttm_bo.c | 30 +++++++++++++++--------------- > 1 file changed, 15 insertions(+), 15 deletions(-) > > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c > index 8ab23ae..9028327 100644 > --- a/drivers/gpu/drm/ttm/ttm_bo.c > +++ b/drivers/gpu/drm/ttm/ttm_bo.c > @@ -1323,25 +1323,25 @@ static int ttm_bo_force_list_clean(struct ttm_bo_device *bdev, > struct ttm_bo_global *glob = bdev->glob; > int ret; > > - /* > - * Can't use standard list traversal since we're unlocking. > - */ > - > - spin_lock(&glob->lru_lock); > - while (!list_empty(&man->lru)) { > - spin_unlock(&glob->lru_lock); > + do { > ret = ttm_mem_evict_first(bdev, mem_type, false, false); > - if (ret) { > - if (allow_errors) { > - return ret; > - } else { > - pr_err("Cleanup eviction failed\n"); > - } > + if (ret && ret != -EBUSY && !allow_errors) { > + pr_err("Cleanup eviction failed with %i\n", ret); > + ret = 0; > } > + } while (!ret); > + > + if (likely(ret == -EBUSY)) { > + /* > + * lru list should be empty, verify this is the case. > + */ > spin_lock(&glob->lru_lock); > + WARN_ON(!list_empty(&man->lru)); > + if (list_empty(&man->lru) || !allow_errors) > + ret = 0; > + spin_unlock(&glob->lru_lock); > } > - spin_unlock(&glob->lru_lock); > - return 0; > + return ret; > } > > int ttm_bo_clean_mm(struct ttm_bo_device *bdev, unsigned mem_type)
Op 30-11-12 10:04, Thomas Hellstrom schreef: > On 11/30/2012 09:05 AM, Maarten Lankhorst wrote: >> Just use the return error from ttm_mem_evict_first instead. >> >> Changes since v1: >> - Add warning if list is not empty, nothing else we can do here. > > Marten, when this function is called, all cross-device reservers have been shut out with the TTM lock or whatever similar > mechanism is in place. It's a critical function that must succeed, unless there is a hardware failure to evict. > > As mentioned in the comments on the previous patch, ttm_bo_evict_first() may return 0 (OK) if it failed to reclaim the > trylock in cleanup_refs_and_unlock(), with the assumption that another process will destroy the bo anyway > (possibly at a later time which we know nothing about). The lru list needs to be empty when this function returns. > > This means we must either keep the while(list_empty) or perhaps better, retry instead of WARN if list_empty() is detected at the end of list. As long as evict_first returns 0, that function is called over and over again. But regarding the race.. Even if in this case it returns 0 early, that bo would no longer be on the lru list anyway, since I always take the reservation with lru lock held in the cleanup code, so either this is a cross-device reservation (needs more thought on how to handle that correctly), or that other function is inside the cleanup_memtype_use function, and has already taken the bo off all lists before dropping lru lock. Now it may seem that blocking on reservation can help in this case, maybe it would, but it doesn't close the race entirely..If another function also called ttm_bo_cleanup_refs_and_unlock from another context BEFORE ttm_mem_evict_first was called, at the time ttm_mem_evict_first gets the lru lock the buffer was already taken off the lru lists, and evict_first wouldn't know anything about it and return -EBUSY too. But the lru list is still empty in any of those cases, so the WARN wouldn't trigger.. ~Maarten
On 11/30/2012 11:16 AM, Maarten Lankhorst wrote: > Op 30-11-12 10:04, Thomas Hellstrom schreef: >> On 11/30/2012 09:05 AM, Maarten Lankhorst wrote: >>> Just use the return error from ttm_mem_evict_first instead. >>> >>> Changes since v1: >>> - Add warning if list is not empty, nothing else we can do here. >> Marten, when this function is called, all cross-device reservers have been shut out with the TTM lock or whatever similar >> mechanism is in place. It's a critical function that must succeed, unless there is a hardware failure to evict. >> >> As mentioned in the comments on the previous patch, ttm_bo_evict_first() may return 0 (OK) if it failed to reclaim the >> trylock in cleanup_refs_and_unlock(), with the assumption that another process will destroy the bo anyway >> (possibly at a later time which we know nothing about). The lru list needs to be empty when this function returns. >> >> This means we must either keep the while(list_empty) or perhaps better, retry instead of WARN if list_empty() is detected at the end of list. > As long as evict_first returns 0, that function is called over and over again Ah, ok. You're right. Reviewed-by: Thomas Hellstrom<thellstrom@vmware.com>
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 8ab23ae..9028327 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -1323,25 +1323,25 @@ static int ttm_bo_force_list_clean(struct ttm_bo_device *bdev, struct ttm_bo_global *glob = bdev->glob; int ret; - /* - * Can't use standard list traversal since we're unlocking. - */ - - spin_lock(&glob->lru_lock); - while (!list_empty(&man->lru)) { - spin_unlock(&glob->lru_lock); + do { ret = ttm_mem_evict_first(bdev, mem_type, false, false); - if (ret) { - if (allow_errors) { - return ret; - } else { - pr_err("Cleanup eviction failed\n"); - } + if (ret && ret != -EBUSY && !allow_errors) { + pr_err("Cleanup eviction failed with %i\n", ret); + ret = 0; } + } while (!ret); + + if (likely(ret == -EBUSY)) { + /* + * lru list should be empty, verify this is the case. + */ spin_lock(&glob->lru_lock); + WARN_ON(!list_empty(&man->lru)); + if (list_empty(&man->lru) || !allow_errors) + ret = 0; + spin_unlock(&glob->lru_lock); } - spin_unlock(&glob->lru_lock); - return 0; + return ret; } int ttm_bo_clean_mm(struct ttm_bo_device *bdev, unsigned mem_type)
Just use the return error from ttm_mem_evict_first instead. Changes since v1: - Add warning if list is not empty, nothing else we can do here. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> --- drivers/gpu/drm/ttm/ttm_bo.c | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-)