drm/ttm: fix potential null ptr deref when mem space alloc fails

Message ID 20220318195004.416539-1-bob.beckett@collabora.com
State New, archived
Series drm/ttm: fix potential null ptr deref when mem space alloc fails

Commit Message

Bob Beckett March 18, 2022, 7:50 p.m. UTC
When allocating a resource in place, it is common to free the buffer's
resource and then allocate a new resource in a different placement.

For example, amdgpu_bo_create_kernel_at() calls ttm_resource_free() and
then ttm_bo_mem_space().

In this situation, bo->resource is NULL because it was cleared when the
previous resource was freed. If ttm_bo_mem_space() then fails, its error
path dereferences bo->resource, leading to a NULL pointer dereference.

Fixes: d3116756a710 ("drm/ttm: rename bo->mem and make it a pointer")

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
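To make the failure mode concrete, here is a small userspace model of the sequence described in the commit message. The types and helpers (`struct buffer_object`, `resource_free()`, `move_to_lru_tail()`, `mem_space()`) are simplified, hypothetical stand-ins, not the real TTM API; `mem_space()` only models the `error:` path of `ttm_bo_mem_space()` with the patch's NULL check applied.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified, hypothetical stand-ins for the TTM types (not the kernel API). */
enum mem_type { PL_SYSTEM, PL_VRAM };

struct resource {
	enum mem_type mem_type;
};

struct buffer_object {
	struct resource *resource; /* NULL after resource_free() */
	unsigned int pin_count;
	int lru_moves;             /* counts calls to move_to_lru_tail() */
};

/* Models ttm_resource_free(): releases the resource and clears the pointer. */
static void resource_free(struct buffer_object *bo)
{
	free(bo->resource);
	bo->resource = NULL;
}

/* Stand-in for ttm_bo_move_to_lru_tail_unlocked(); just records the call. */
static void move_to_lru_tail(struct buffer_object *bo)
{
	bo->lru_moves++;
}

/*
 * Models only the failure path of ttm_bo_mem_space(): 'alloc_fails'
 * simulates the allocation failing, e.g. under memory pressure.
 */
static int mem_space(struct buffer_object *bo, bool alloc_fails)
{
	int ret;

	if (!alloc_fails)
		return 0; /* success path (assigning a new resource) omitted */

	ret = -ENOMEM;

	/* error: path with the patch applied; && stops before the deref. */
	if (bo->resource && bo->resource->mem_type == PL_SYSTEM && !bo->pin_count)
		move_to_lru_tail(bo);

	return ret;
}
```

Because `&&` short-circuits, `bo->resource->mem_type` is never evaluated once `bo->resource` is NULL, so a failed call after `resource_free()` returns `-ENOMEM` instead of crashing.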

Comments

Christian König March 21, 2022, 9:51 a.m. UTC | #1
On 18.03.22 at 20:50, Robert Beckett wrote:
> when allocating a resource in place it is common to free the buffer's
> resource, then allocate a new resource in a different placement.
>
> e.g. amdgpu_bo_create_kernel_at calls ttm_resource_free, then calls
> ttm_bo_mem_space.

Well, yes, I'm working the drivers towards this, but NAK for now.
Currently, bo->resource is never expected to be NULL.

And yes, I've been searching for this bug in amdgpu for quite a while.
Where exactly does that happen?

Amdgpu is supposed to allocate a new resource first, then do a swap and
then free the old one.

Thanks,
Christian.
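The ordering Christian describes (allocate the new resource, swap it in, then free the old one) can be sketched as follows. This is a simplified userspace model with hypothetical names (`resource_alloc()`, `bo_replace_resource()`), not amdgpu or TTM code; the point it illustrates is that on failure `bo->resource` still points at the old, valid resource.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified types; not the TTM structures. */
enum mem_type { PL_SYSTEM, PL_VRAM };

struct resource {
	enum mem_type mem_type;
};

struct buffer_object {
	struct resource *resource;
};

/* Hypothetical allocator; 'fail' simulates an allocation failure. */
static struct resource *resource_alloc(enum mem_type type, bool fail)
{
	struct resource *res;

	if (fail)
		return NULL;
	res = malloc(sizeof(*res));
	if (res)
		res->mem_type = type;
	return res;
}

/*
 * Allocate first, then swap, then free the old resource. If the
 * allocation fails, bo->resource is left untouched and never NULL.
 */
static int bo_replace_resource(struct buffer_object *bo, enum mem_type type,
			       bool fail)
{
	struct resource *new_res = resource_alloc(type, fail);
	struct resource *old_res;

	if (!new_res)
		return -ENOMEM; /* old resource is still attached */

	old_res = bo->resource;
	bo->resource = new_res; /* swap in the new placement */
	free(old_res);          /* only now release the old one */
	return 0;
}
```

With this ordering the error path of the allocator never observes a NULL `bo->resource`, which is why no extra check is needed as long as every caller follows it.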

Bob Beckett March 21, 2022, 3:44 p.m. UTC | #2
On 21/03/2022 09:51, Christian König wrote:
> On 18.03.22 at 20:50, Robert Beckett wrote:
>> when allocating a resource in place it is common to free the buffer's
>> resource, then allocate a new resource in a different placement.
>>
>> e.g. amdgpu_bo_create_kernel_at calls ttm_resource_free, then calls
>> ttm_bo_mem_space.
> 
> Well yes I'm working the drivers towards this, but NAK at the moment. 
> Currently bo->resource is never expected to be NULL.
> 
> And yes I'm searching for this bug in amdgpu for quite a while. Where 
> exactly does that happen?

In my case, I am writing new code for i915 that does this. I will switch
it to allocate the new resource first, then free the old one if successful.

For the existing amdgpu case, see:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c?h=v5.17#n384

amdgpu_bo_create_kernel_at() calls ttm_resource_free() and then
ttm_bo_mem_space(). If the ttm_bo_mem_space() call fails (e.g. due to
memory pressure), the error path tries to dereference bo->resource, which
is NULL at that point.

To fix this, I honestly don't see a reason not to have a NULL safety
check there as well. It could also check early and return an error if
bo->resource is NULL. I think defensive programming makes sense here; it
is better than a NULL deref if someone gets the ordering wrong.

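Bob's alternative, checking early and returning an error, might look roughly like this in a simplified userspace model. `mem_space_checked()` is a hypothetical name and the choice of `-EINVAL` is an assumption, not something proposed in the thread; note that with this variant a free-then-allocate caller gets an error instead of a crash, so such callers would still need to change their ordering.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified types; not the TTM structures. */
enum mem_type { PL_SYSTEM, PL_VRAM };

struct resource {
	enum mem_type mem_type;
};

struct buffer_object {
	struct resource *resource;
	unsigned int pin_count;
	int lru_moves; /* counts LRU bumps on the error path */
};

/*
 * Hypothetical variant that rejects a BO without a current resource
 * up front instead of guarding the error path.
 */
static int mem_space_checked(struct buffer_object *bo, bool alloc_fails)
{
	/* Early check: there is no current resource to fall back to. */
	if (!bo->resource)
		return -EINVAL;

	if (!alloc_fails)
		return 0; /* success path omitted */

	/* Error path: bo->resource is known to be non-NULL here. */
	if (bo->resource->mem_type == PL_SYSTEM && !bo->pin_count)
		bo->lru_moves++;

	return -ENOMEM;
}
```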
Christian König March 22, 2022, 7:17 a.m. UTC | #3
On 21.03.22 at 16:44, Robert Beckett wrote:
>
>
> On 21/03/2022 09:51, Christian König wrote:
>> On 18.03.22 at 20:50, Robert Beckett wrote:
>>> when allocating a resource in place it is common to free the buffer's
>>> resource, then allocate a new resource in a different placement.
>>>
>>> e.g. amdgpu_bo_create_kernel_at calls ttm_resource_free, then calls
>>> ttm_bo_mem_space.
>>
>> Well yes I'm working the drivers towards this, but NAK at the moment. 
>> Currently bo->resource is never expected to be NULL.
>>
>> And yes I'm searching for this bug in amdgpu for quite a while. Where 
>> exactly does that happen?
>
> in my case, I am writing new code for i915 that does this. I will 
> switch it to allocate the new resource first, then free the old one if 
> successful.
>
> For the existing amd case, see 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c?h=v5.17#n384
>
>
> amdgpu_bo_create_kernel_at calls ttm_resource_free, then calls 
> ttm_bo_mem_space. If the ttm_bo_mem_space call fails (e.g. due to 
> memory pressure), then the error path will try to deref bo->resource, 
> which will be null at that point.

Yeah, but that's special handling that is only used during driver startup.
Somehow we also hit this on systems with DMA-buf sharing.

>
> to fix this, I honestly don't see a reason to not also have the safety 
> check for null there. It could check early and return an error if it 
> is null. I think that defensive programming here makes sense, better 
> than a null deref if someone programs it wrong.

Having the check here is fine; the problem is that you would need it in
tons of other places as well.

Maybe I should send you my WIP patch set for this? If you handle all the
other cases as well, I'm perfectly fine with this.

Regards,
Christian.
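One hypothetical way to avoid repeating the NULL check at every one of the "other places" Christian mentions is to route the common tests through a small helper that treats a NULL resource as matching no memory type. The helper names below are made up for illustration and are not taken from Christian's WIP series or from TTM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified types; not the TTM structures. */
enum mem_type { PL_SYSTEM, PL_VRAM };

struct resource {
	enum mem_type mem_type;
};

struct buffer_object {
	struct resource *resource; /* may be NULL */
	unsigned int pin_count;
};

/* Hypothetical helper: a NULL resource matches no memory type. */
static bool bo_in_mem_type(const struct buffer_object *bo, enum mem_type type)
{
	return bo->resource && bo->resource->mem_type == type;
}

/* Example call sites that no longer need their own NULL checks. */
static bool bo_should_bump_lru(const struct buffer_object *bo)
{
	return bo_in_mem_type(bo, PL_SYSTEM) && !bo->pin_count;
}

static bool bo_is_in_vram(const struct buffer_object *bo)
{
	return bo_in_mem_type(bo, PL_VRAM);
}
```

The helper only keeps the check in one place; whether a NULL `bo->resource` should be tolerated silently at each such site is a separate decision.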


Patch

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index db3dc7ef5382..62b29ee7d040 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -875,7 +875,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
 	}
 
 error:
-	if (bo->resource->mem_type == TTM_PL_SYSTEM && !bo->pin_count)
+	if (bo->resource && bo->resource->mem_type == TTM_PL_SYSTEM && !bo->pin_count)
 		ttm_bo_move_to_lru_tail_unlocked(bo);
 
 	return ret;