Message ID | 20220318195004.416539-1-bob.beckett@collabora.com (mailing list archive) |
---|---|
State | New, archived |
Series | drm/ttm: fix potential null ptr deref in when mem space alloc fails |
On 18.03.22 at 20:50, Robert Beckett wrote:
> when allocating a resource in place it is common to free the buffer's
> resource, then allocate a new resource in a different placement.
>
> e.g. amdgpu_bo_create_kernel_at calls ttm_resource_free, then calls
> ttm_bo_mem_space.

Well yes I'm working the drivers towards this, but NAK at the moment.
Currently bo->resource is never expected to be NULL.

And yes I'm searching for this bug in amdgpu for quite a while. Where
exactly does that happen?

Amdgpu is supposed to allocate a new resource first, then do a swap and
then free the old one.

Thanks,
Christian.

>
> In this situation, bo->resource will be null as it is cleared during
> the initial freeing of the previous resource.
> This leads to a null deref.
>
> Fixes: d3116756a710 (drm/ttm: rename bo->mem and make it a pointer)
>
> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
> ---
>  drivers/gpu/drm/ttm/ttm_bo.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index db3dc7ef5382..62b29ee7d040 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -875,7 +875,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
>  	}
>
>  error:
> -	if (bo->resource->mem_type == TTM_PL_SYSTEM && !bo->pin_count)
> +	if (bo->resource && bo->resource->mem_type == TTM_PL_SYSTEM && !bo->pin_count)
>  		ttm_bo_move_to_lru_tail_unlocked(bo);
>
>  	return ret;
On 21/03/2022 09:51, Christian König wrote:
> On 18.03.22 at 20:50, Robert Beckett wrote:
>> when allocating a resource in place it is common to free the buffer's
>> resource, then allocate a new resource in a different placement.
>>
>> e.g. amdgpu_bo_create_kernel_at calls ttm_resource_free, then calls
>> ttm_bo_mem_space.
>
> Well yes I'm working the drivers towards this, but NAK at the moment.
> Currently bo->resource is never expected to be NULL.
>
> And yes I'm searching for this bug in amdgpu for quite a while. Where
> exactly does that happen?

in my case, I am writing new code for i915 that does this. I will switch
it to allocate the new resource first, then free the old one if
successful.

For the existing amd case, see
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c?h=v5.17#n384

amdgpu_bo_create_kernel_at calls ttm_resource_free, then calls
ttm_bo_mem_space. If the ttm_bo_mem_space call fails (e.g. due to memory
pressure), then the error path will try to deref bo->resource, which
will be null at that point.

to fix this, I honestly don't see a reason to not also have the safety
check for null there. It could check early and return an error if it is
null. I think that defensive programming here makes sense, better than a
null deref if someone programs it wrong.

>
> Amdgpu is supposed to allocate a new resource first, then do a swap and
> then free the old one.
>
> Thanks,
> Christian.
>
>>
>> In this situation, bo->resource will be null as it is cleared during
>> the initial freeing of the previous resource.
>> This leads to a null deref.
>>
>> Fixes: d3116756a710 (drm/ttm: rename bo->mem and make it a pointer)
>>
>> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
>> ---
>>  drivers/gpu/drm/ttm/ttm_bo.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>> index db3dc7ef5382..62b29ee7d040 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>> @@ -875,7 +875,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
>>  	}
>>  error:
>> -	if (bo->resource->mem_type == TTM_PL_SYSTEM && !bo->pin_count)
>> +	if (bo->resource && bo->resource->mem_type == TTM_PL_SYSTEM &&
>> !bo->pin_count)
>>  		ttm_bo_move_to_lru_tail_unlocked(bo);
>>  	return ret;
>
On 21.03.22 at 16:44, Robert Beckett wrote:
>
>
> On 21/03/2022 09:51, Christian König wrote:
>> On 18.03.22 at 20:50, Robert Beckett wrote:
>>> when allocating a resource in place it is common to free the buffer's
>>> resource, then allocate a new resource in a different placement.
>>>
>>> e.g. amdgpu_bo_create_kernel_at calls ttm_resource_free, then calls
>>> ttm_bo_mem_space.
>>
>> Well yes I'm working the drivers towards this, but NAK at the moment.
>> Currently bo->resource is never expected to be NULL.
>>
>> And yes I'm searching for this bug in amdgpu for quite a while. Where
>> exactly does that happen?
>
> in my case, I am writing new code for i915 that does this. I will
> switch it to allocate the new resource first, then free the old one if
> successful.
>
> For the existing amd case, see
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c?h=v5.17#n384
>
>
> amdgpu_bo_create_kernel_at calls ttm_resource_free, then calls
> ttm_bo_mem_space. If the ttm_bo_mem_space call fails (e.g. due to
> memory pressure), then the error path will try to deref bo->resource,
> which will be null at that point.

Yeah, but that's a special handling only used during driver startup. We
somehow have this on systems with DMA-buf sharing as well.

>
> to fix this, I honestly don't see a reason to not also have the safety
> check for null there. It could check early and return an error if it
> is null. I think that defensive programming here makes sense, better
> than a null deref if someone programs it wrong.

Having it here is fine, the problem is you need to have that at tons of
other places as well.

Maybe I should send you my WIP patch set for this? If you handle all the
other cases as well I'm perfectly fine with this.

Regards,
Christian.

>
>
>
>>
>> Amdgpu is supposed to allocate a new resource first, then do a swap
>> and then free the old one.
>>
>> Thanks,
>> Christian.
>>
>>>
>>> In this situation, bo->resource will be null as it is cleared during
>>> the initial freeing of the previous resource.
>>> This leads to a null deref.
>>>
>>> Fixes: d3116756a710 (drm/ttm: rename bo->mem and make it a pointer)
>>>
>>> Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
>>> ---
>>>  drivers/gpu/drm/ttm/ttm_bo.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
>>> b/drivers/gpu/drm/ttm/ttm_bo.c
>>> index db3dc7ef5382..62b29ee7d040 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>> @@ -875,7 +875,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
>>>  	}
>>>  error:
>>> -	if (bo->resource->mem_type == TTM_PL_SYSTEM && !bo->pin_count)
>>> +	if (bo->resource && bo->resource->mem_type == TTM_PL_SYSTEM &&
>>> !bo->pin_count)
>>>  		ttm_bo_move_to_lru_tail_unlocked(bo);
>>>  	return ret;
>>
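The ordering discussed in the thread (allocate the new resource first, swap it in, then free the old one) can be illustrated with a small userspace sketch. Everything here is a hypothetical stand-in: `sketch_resource`, `sketch_bo`, `sketch_resource_alloc` and `sketch_bo_replace_resource` are not TTM or amdgpu symbols, and the `fail` flag only simulates an allocation failure such as memory pressure. The point is that the buffer's resource pointer is never NULL at any step, even when the allocation fails.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration only; not the real TTM types. */
struct sketch_resource {
	int mem_type;
};

struct sketch_bo {
	struct sketch_resource *resource; /* invariant: never NULL */
};

/* Toy allocator; a non-zero 'fail' simulates an allocation failure. */
static struct sketch_resource *sketch_resource_alloc(int mem_type, int fail)
{
	struct sketch_resource *res;

	if (fail)
		return NULL;

	res = malloc(sizeof(*res));
	if (res)
		res->mem_type = mem_type;
	return res;
}

/*
 * Move the buffer to a new placement: allocate first, swap, then free.
 * Returns 0 on success and -1 on failure. On failure the buffer keeps
 * its old resource, so bo->resource stays valid on every path.
 */
static int sketch_bo_replace_resource(struct sketch_bo *bo, int mem_type,
				      int fail)
{
	struct sketch_resource *new_res, *old_res;

	assert(bo != NULL);

	new_res = sketch_resource_alloc(mem_type, fail);
	if (!new_res)
		return -1;	/* old resource is untouched */

	old_res = bo->resource;
	bo->resource = new_res;	/* swap */
	free(old_res);		/* free the old one last */
	return 0;
}
```

With this ordering, an error path that reads `bo->resource->mem_type` after a failed allocation still sees the old, valid resource. Freeing first and allocating second is what opens the window in which the pointer is NULL.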
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index db3dc7ef5382..62b29ee7d040 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -875,7 +875,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
 	}
 
 error:
-	if (bo->resource->mem_type == TTM_PL_SYSTEM && !bo->pin_count)
+	if (bo->resource && bo->resource->mem_type == TTM_PL_SYSTEM && !bo->pin_count)
 		ttm_bo_move_to_lru_tail_unlocked(bo);
 
 	return ret;
when allocating a resource in place it is common to free the buffer's
resource, then allocate a new resource in a different placement.

e.g. amdgpu_bo_create_kernel_at calls ttm_resource_free, then calls
ttm_bo_mem_space.

In this situation, bo->resource will be null as it is cleared during
the initial freeing of the previous resource.
This leads to a null deref.

Fixes: d3116756a710 (drm/ttm: rename bo->mem and make it a pointer)

Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
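
As a rough illustration of why the added `bo->resource &&` term avoids the dereference, the condition from the patch can be modelled in plain C. This is only a sketch: `demo_resource`, `demo_bo`, `DEMO_PL_SYSTEM` and `demo_should_bump_lru` are hypothetical names, not kernel symbols, and the value chosen for `DEMO_PL_SYSTEM` is an assumption made for the example. Because `&&` evaluates left to right and stops at the first false operand, `mem_type` is never read when the resource pointer is NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for TTM_PL_SYSTEM; the value is an assumption. */
#define DEMO_PL_SYSTEM 0

/* Hypothetical stand-ins for illustration only; not the real TTM types. */
struct demo_resource {
	int mem_type;
};

struct demo_bo {
	struct demo_resource *resource;	/* NULL after the resource is freed */
	unsigned int pin_count;
};

/*
 * Mirrors the guarded condition from the patch. The first operand
 * short-circuits the expression, so a NULL resource is never
 * dereferenced and the function simply returns false.
 */
static bool demo_should_bump_lru(const struct demo_bo *bo)
{
	assert(bo != NULL);

	return bo->resource &&
	       bo->resource->mem_type == DEMO_PL_SYSTEM &&
	       !bo->pin_count;
}
```

As noted in the review, a guard like this covers only this one error path. Other code that assumes a non-NULL resource would need the same treatment.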