diff mbox

[1/2] drm/ttm: never add BO that failed to validate to the LRU list

Message ID 20170214091845.1225-2-nhaehnle@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Nicolai Hähnle Feb. 14, 2017, 9:18 a.m. UTC
From: Nicolai Hähnle <nicolai.haehnle@amd.com>

Fixes a potential race condition in amdgpu that looks as follows:

Task 1: attempt ttm_bo_init, but ttm_bo_validate fails
Task 1: add BO to global list anyway
Task 2: grabs hold of the BO, waits on its reservation lock
Task 1: releases its reference of the BO; never gives up the
        reservation lock

The patch "drm/amdgpu: fix a potential deadlock in
amdgpu_bo_create_restricted()" attempts to fix that by releasing
the reservation lock in amdgpu code; unfortunately, it introduces
a use-after-free when this race _doesn't_ happen.

This patch should fix the race properly by never adding the BO
to the global list in the first place.

Cc: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: zhoucm1 <david1.zhou@amd.com>
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

Comments

Christian König Feb. 14, 2017, 10:38 a.m. UTC | #1
Am 14.02.2017 um 10:18 schrieb Nicolai Hähnle:
> From: Nicolai Hähnle <nicolai.haehnle@amd.com>
>
> Fixes a potential race condition in amdgpu that looks as follows:
>
> Task 1: attempt ttm_bo_init, but ttm_bo_validate fails
> Task 1: add BO to global list anyway
> Task 2: grabs hold of the BO, waits on its reservation lock
> Task 1: releases its reference of the BO; never gives up the
>          reservation lock
>
> The patch "drm/amdgpu: fix a potential deadlock in
> amdgpu_bo_create_restricted()" attempts to fix that by releasing
> the reservation lock in amdgpu code; unfortunately, it introduces
> a use-after-free when this race _doesn't_ happen.
>
> This patch should fix the race properly by never adding the BO
> to the global list in the first place.
>
> Cc: Samuel Pitoiset <samuel.pitoiset@gmail.com>
> Cc: zhoucm1 <david1.zhou@amd.com>
> Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

NAK, that is actually not correct either.

The previous implementation doesn't have an use after free, but actually 
a memory leak.

I completely agree that adding the BO to the LRU in case of an error is 
incorrect, but ttm_bo_add_to_lru() will add some references to the BO.

So the following ttm_bo_unref() will just drop the initial reference.

Let's clean up this mess by moving the freeing of the BO in case of an 
error to the caller.

Regards,
Christian.

> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 12 +++++++-----
>   1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 239a957..76bee42 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -1215,18 +1215,20 @@ int ttm_bo_init(struct ttm_bo_device *bdev,
>   	if (likely(!ret))
>   		ret = ttm_bo_validate(bo, placement, interruptible, false);
>   
> -	if (!resv) {
> +	if (!resv)
>   		ttm_bo_unreserve(bo);
>   
> -	} else if (!(bo->mem.placement & TTM_PL_FLAG_NO_EVICT)) {
> +	if (unlikely(ret)) {
> +		ttm_bo_unref(&bo);
> +		return ret;
> +	}
> +
> +	if (resv && !(bo->mem.placement & TTM_PL_FLAG_NO_EVICT)) {
>   		spin_lock(&bo->glob->lru_lock);
>   		ttm_bo_add_to_lru(bo);
>   		spin_unlock(&bo->glob->lru_lock);
>   	}
>   
> -	if (unlikely(ret))
> -		ttm_bo_unref(&bo);
> -
>   	return ret;
>   }
>   EXPORT_SYMBOL(ttm_bo_init);
Nicolai Hähnle Feb. 14, 2017, 11:37 a.m. UTC | #2
On 14.02.2017 11:38, Christian König wrote:
> Am 14.02.2017 um 10:18 schrieb Nicolai Hähnle:
>> From: Nicolai Hähnle <nicolai.haehnle@amd.com>
>>
>> Fixes a potential race condition in amdgpu that looks as follows:
>>
>> Task 1: attempt ttm_bo_init, but ttm_bo_validate fails
>> Task 1: add BO to global list anyway
>> Task 2: grabs hold of the BO, waits on its reservation lock
>> Task 1: releases its reference of the BO; never gives up the
>>          reservation lock
>>
>> The patch "drm/amdgpu: fix a potential deadlock in
>> amdgpu_bo_create_restricted()" attempts to fix that by releasing
>> the reservation lock in amdgpu code; unfortunately, it introduces
>> a use-after-free when this race _doesn't_ happen.
>>
>> This patch should fix the race properly by never adding the BO
>> to the global list in the first place.
>>
>> Cc: Samuel Pitoiset <samuel.pitoiset@gmail.com>
>> Cc: zhoucm1 <david1.zhou@amd.com>
>> Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
>
> NAK, that is actually not correct either.
>
> The previous implementation doesn't have an use after free, but actually
> a memory leak.
>
> I completely agree that adding the BO to the LRU in case of an error is
> incorrect, but ttm_bo_add_to_lru() will add some references to the BO.

Yes, but those are bo->list_krefs, not bo->krefs. ttm_bo_unref will put 
the bo->kref, which should lead to ttm_bo_release. This will then add 
the buffer to the ddestroy list and eventually put all the list_krefs. 
(Or possibly, if the ttm_bo_wait in ttm_bo_cleanup_refs_or_queue fails, 
it will explicitly delete the buffer from the LRU, which should amount 
to the same, I think).

Or perhaps I'm still missing something, I'm not too familiar with the 
whole BO reference counting.


> So the following ttm_bo_unref() will just drop the initial reference.
>
> Let's clean up this mess by moving the freeing of the BO in case of an
> error to the caller.

Yes, see my other patches.

Thanks,
Nicolai


>
> Regards,
> Christian.
>
>> ---
>>   drivers/gpu/drm/ttm/ttm_bo.c | 12 +++++++-----
>>   1 file changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>> index 239a957..76bee42 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>> @@ -1215,18 +1215,20 @@ int ttm_bo_init(struct ttm_bo_device *bdev,
>>       if (likely(!ret))
>>           ret = ttm_bo_validate(bo, placement, interruptible, false);
>>   -    if (!resv) {
>> +    if (!resv)
>>           ttm_bo_unreserve(bo);
>>   -    } else if (!(bo->mem.placement & TTM_PL_FLAG_NO_EVICT)) {
>> +    if (unlikely(ret)) {
>> +        ttm_bo_unref(&bo);
>> +        return ret;
>> +    }
>> +
>> +    if (resv && !(bo->mem.placement & TTM_PL_FLAG_NO_EVICT)) {
>>           spin_lock(&bo->glob->lru_lock);
>>           ttm_bo_add_to_lru(bo);
>>           spin_unlock(&bo->glob->lru_lock);
>>       }
>>   -    if (unlikely(ret))
>> -        ttm_bo_unref(&bo);
>> -
>>       return ret;
>>   }
>>   EXPORT_SYMBOL(ttm_bo_init);
>
>
Christian König Feb. 14, 2017, 12:53 p.m. UTC | #3
Am 14.02.2017 um 12:37 schrieb Nicolai Hähnle:
> On 14.02.2017 11:38, Christian König wrote:
>> Am 14.02.2017 um 10:18 schrieb Nicolai Hähnle:
>>> From: Nicolai Hähnle <nicolai.haehnle@amd.com>
>>>
>>> Fixes a potential race condition in amdgpu that looks as follows:
>>>
>>> Task 1: attempt ttm_bo_init, but ttm_bo_validate fails
>>> Task 1: add BO to global list anyway
>>> Task 2: grabs hold of the BO, waits on its reservation lock
>>> Task 1: releases its reference of the BO; never gives up the
>>>          reservation lock
>>>
>>> The patch "drm/amdgpu: fix a potential deadlock in
>>> amdgpu_bo_create_restricted()" attempts to fix that by releasing
>>> the reservation lock in amdgpu code; unfortunately, it introduces
>>> a use-after-free when this race _doesn't_ happen.
>>>
>>> This patch should fix the race properly by never adding the BO
>>> to the global list in the first place.
>>>
>>> Cc: Samuel Pitoiset <samuel.pitoiset@gmail.com>
>>> Cc: zhoucm1 <david1.zhou@amd.com>
>>> Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
>>
>> NAK, that is actually not correct either.
>>
>> The previous implementation doesn't have an use after free, but actually
>> a memory leak.
>>
>> I completely agree that adding the BO to the LRU in case of an error is
>> incorrect, but ttm_bo_add_to_lru() will add some references to the BO.
>
> Yes, but those are bo->list_krefs, not bo->krefs. ttm_bo_unref will 
> put the bo->kref, which should lead to ttm_bo_release. This will then 
> add the buffer to the ddestroy list and eventually put all the 
> list_krefs. (Or possibly, if the ttm_bo_wait in 
> ttm_bo_cleanup_refs_or_queue fails, it will explicitly delete the 
> buffer from the LRU, which should amount to the same, I think).
>
> Or perhaps I'm still missing something, I'm not too familiar with the 
> whole BO reference counting.

Ah, yes of course, never mind.

>
>
>> So the following ttm_bo_unref() will just drop the initial reference.
>>
>> Let's clean up this mess by moving the freeing of the BO in case of an
>> error to the caller.
>
> Yes, see my other patches.

In this case the set is Reviewed-by: Christian König 
<christian.koenig@amd.com>.

Regards,
Christian.

>
> Thanks,
> Nicolai
>
>
>>
>> Regards,
>> Christian.
>>
>>> ---
>>>   drivers/gpu/drm/ttm/ttm_bo.c | 12 +++++++-----
>>>   1 file changed, 7 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c 
>>> b/drivers/gpu/drm/ttm/ttm_bo.c
>>> index 239a957..76bee42 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>> @@ -1215,18 +1215,20 @@ int ttm_bo_init(struct ttm_bo_device *bdev,
>>>       if (likely(!ret))
>>>           ret = ttm_bo_validate(bo, placement, interruptible, false);
>>>   -    if (!resv) {
>>> +    if (!resv)
>>>           ttm_bo_unreserve(bo);
>>>   -    } else if (!(bo->mem.placement & TTM_PL_FLAG_NO_EVICT)) {
>>> +    if (unlikely(ret)) {
>>> +        ttm_bo_unref(&bo);
>>> +        return ret;
>>> +    }
>>> +
>>> +    if (resv && !(bo->mem.placement & TTM_PL_FLAG_NO_EVICT)) {
>>>           spin_lock(&bo->glob->lru_lock);
>>>           ttm_bo_add_to_lru(bo);
>>>           spin_unlock(&bo->glob->lru_lock);
>>>       }
>>>   -    if (unlikely(ret))
>>> -        ttm_bo_unref(&bo);
>>> -
>>>       return ret;
>>>   }
>>>   EXPORT_SYMBOL(ttm_bo_init);
>>
>>
>
diff mbox

Patch

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 239a957..76bee42 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1215,18 +1215,20 @@  int ttm_bo_init(struct ttm_bo_device *bdev,
 	if (likely(!ret))
 		ret = ttm_bo_validate(bo, placement, interruptible, false);
 
-	if (!resv) {
+	if (!resv)
 		ttm_bo_unreserve(bo);
 
-	} else if (!(bo->mem.placement & TTM_PL_FLAG_NO_EVICT)) {
+	if (unlikely(ret)) {
+		ttm_bo_unref(&bo);
+		return ret;
+	}
+
+	if (resv && !(bo->mem.placement & TTM_PL_FLAG_NO_EVICT)) {
 		spin_lock(&bo->glob->lru_lock);
 		ttm_bo_add_to_lru(bo);
 		spin_unlock(&bo->glob->lru_lock);
 	}
 
-	if (unlikely(ret))
-		ttm_bo_unref(&bo);
-
 	return ret;
 }
 EXPORT_SYMBOL(ttm_bo_init);