diff mbox

drm/ttm: do not check if list is empty in ttm_bo_force_list_clean, v2

Message ID 50B868C1.8000703@canonical.com (mailing list archive)
State New, archived
Headers show

Commit Message

Maarten Lankhorst Nov. 30, 2012, 8:05 a.m. UTC
Just use the return error from ttm_mem_evict_first instead.

Changes since v1:
- Add warning if list is not empty, nothing else we can do here.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c |   30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

Comments

Thomas Hellstrom Nov. 30, 2012, 9:04 a.m. UTC | #1
On 11/30/2012 09:05 AM, Maarten Lankhorst wrote:
> Just use the return error from ttm_mem_evict_first instead.
>
> Changes since v1:
> - Add warning if list is not empty, nothing else we can do here.

Marten, when this function is called, all cross-device reservers have 
been shut out with the TTM lock or whatever similar
mechanism is in place. It's a critical function that must succeed, 
unless there is a hardware failure to evict.

As mentioned in the comments on the previous patch, ttm_bo_evict_first() 
may return 0 (OK) if it failed to reclaim the
trylock in cleanup_refs_and_unlock(), with the assumption that another 
process will destroy the bo anyway
(possibly at a later time which we know nothing about). The lru list 
needs to be empty when this function returns.

This means we must either keep the while(list_empty) or perhaps better, 
retry instead of WARN if list_empty() is detected at the end of list.

/Thomas

> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c |   30 +++++++++++++++---------------
>   1 file changed, 15 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 8ab23ae..9028327 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -1323,25 +1323,25 @@ static int ttm_bo_force_list_clean(struct ttm_bo_device *bdev,
>   	struct ttm_bo_global *glob = bdev->glob;
>   	int ret;
>   
> -	/*
> -	 * Can't use standard list traversal since we're unlocking.
> -	 */
> -
> -	spin_lock(&glob->lru_lock);
> -	while (!list_empty(&man->lru)) {
> -		spin_unlock(&glob->lru_lock);
> +	do {
>   		ret = ttm_mem_evict_first(bdev, mem_type, false, false);
> -		if (ret) {
> -			if (allow_errors) {
> -				return ret;
> -			} else {
> -				pr_err("Cleanup eviction failed\n");
> -			}
> +		if (ret && ret != -EBUSY && !allow_errors) {
> +			pr_err("Cleanup eviction failed with %i\n", ret);
> +			ret = 0;
>   		}
> +	} while (!ret);
> +
> +	if (likely(ret == -EBUSY)) {
> +		/*
> +		* lru list should be empty, verify this is the case.
> +		*/
>   		spin_lock(&glob->lru_lock);
> +		WARN_ON(!list_empty(&man->lru));
> +		if (list_empty(&man->lru) || !allow_errors)
> +			ret = 0;
> +		spin_unlock(&glob->lru_lock);
>   	}
> -	spin_unlock(&glob->lru_lock);
> -	return 0;
> +	return ret;
>   }
>   
>   int ttm_bo_clean_mm(struct ttm_bo_device *bdev, unsigned mem_type)
Maarten Lankhorst Nov. 30, 2012, 10:16 a.m. UTC | #2
Op 30-11-12 10:04, Thomas Hellstrom schreef:
> On 11/30/2012 09:05 AM, Maarten Lankhorst wrote:
>> Just use the return error from ttm_mem_evict_first instead.
>>
>> Changes since v1:
>> - Add warning if list is not empty, nothing else we can do here.
>
> Marten, when this function is called, all cross-device reservers have been shut out with the TTM lock or whatever similar
> mechanism is in place. It's a critical function that must succeed, unless there is a hardware failure to evict.
>
> As mentioned in the comments on the previous patch, ttm_bo_evict_first() may return 0 (OK) if it failed to reclaim the
> trylock in cleanup_refs_and_unlock(), with the assumption that another process will destroy the bo anyway
> (possibly at a later time which we know nothing about). The lru list needs to be empty when this function returns.
>
> This means we must either keep the while(list_empty) or perhaps better, retry instead of WARN if list_empty() is detected at the end of list.
As long as evict_first returns 0, that function is called over and over again.

But regarding the race.. Even if in this case it returns 0 early, that bo would no longer be on the lru list
anyway, since I always take the reservation with lru lock held in the cleanup code, so either this is a
cross-device reservation (needs more thought on how to handle that correctly), or that other function
is inside the cleanup_memtype_use function, and has already taken the bo off all lists before dropping
lru lock.

Now it may seem that blocking on reservation can help in this case, maybe it would, but it doesn't close
the race entirely..If another function also called ttm_bo_cleanup_refs_and_unlock from another context
BEFORE ttm_mem_evict_first was called, at the time ttm_mem_evict_first gets the lru lock the buffer was
already taken off the lru lists, and evict_first wouldn't know anything about it and return -EBUSY too.

But the lru list is still empty in any of those cases, so the WARN wouldn't trigger..

~Maarten
Thomas Hellstrom Nov. 30, 2012, 11:55 a.m. UTC | #3
On 11/30/2012 11:16 AM, Maarten Lankhorst wrote:
> Op 30-11-12 10:04, Thomas Hellstrom schreef:
>> On 11/30/2012 09:05 AM, Maarten Lankhorst wrote:
>>> Just use the return error from ttm_mem_evict_first instead.
>>>
>>> Changes since v1:
>>> - Add warning if list is not empty, nothing else we can do here.
>> Marten, when this function is called, all cross-device reservers have been shut out with the TTM lock or whatever similar
>> mechanism is in place. It's a critical function that must succeed, unless there is a hardware failure to evict.
>>
>> As mentioned in the comments on the previous patch, ttm_bo_evict_first() may return 0 (OK) if it failed to reclaim the
>> trylock in cleanup_refs_and_unlock(), with the assumption that another process will destroy the bo anyway
>> (possibly at a later time which we know nothing about). The lru list needs to be empty when this function returns.
>>
>> This means we must either keep the while(list_empty) or perhaps better, retry instead of WARN if list_empty() is detected at the end of list.
> As long as evict_first returns 0, that function is called over and over again

Ah, ok. You're right.

Reviewed-by: Thomas Hellstrom<thellstrom@vmware.com>
diff mbox

Patch

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 8ab23ae..9028327 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1323,25 +1323,25 @@  static int ttm_bo_force_list_clean(struct ttm_bo_device *bdev,
 	struct ttm_bo_global *glob = bdev->glob;
 	int ret;
 
-	/*
-	 * Can't use standard list traversal since we're unlocking.
-	 */
-
-	spin_lock(&glob->lru_lock);
-	while (!list_empty(&man->lru)) {
-		spin_unlock(&glob->lru_lock);
+	do {
 		ret = ttm_mem_evict_first(bdev, mem_type, false, false);
-		if (ret) {
-			if (allow_errors) {
-				return ret;
-			} else {
-				pr_err("Cleanup eviction failed\n");
-			}
+		if (ret && ret != -EBUSY && !allow_errors) {
+			pr_err("Cleanup eviction failed with %i\n", ret);
+			ret = 0;
 		}
+	} while (!ret);
+
+	if (likely(ret == -EBUSY)) {
+		/*
+		* lru list should be empty, verify this is the case.
+		*/
 		spin_lock(&glob->lru_lock);
+		WARN_ON(!list_empty(&man->lru));
+		if (list_empty(&man->lru) || !allow_errors)
+			ret = 0;
+		spin_unlock(&glob->lru_lock);
 	}
-	spin_unlock(&glob->lru_lock);
-	return 0;
+	return ret;
 }
 
 int ttm_bo_clean_mm(struct ttm_bo_device *bdev, unsigned mem_type)