Message ID | 20170214091845.1225-2-nhaehnle@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Am 14.02.2017 um 10:18 schrieb Nicolai Hähnle: > From: Nicolai Hähnle <nicolai.haehnle@amd.com> > > Fixes a potential race condition in amdgpu that looks as follows: > > Task 1: attempt ttm_bo_init, but ttm_bo_validate fails > Task 1: add BO to global list anyway > Task 2: grabs hold of the BO, waits on its reservation lock > Task 1: releases its reference of the BO; never gives up the > reservation lock > > The patch "drm/amdgpu: fix a potential deadlock in > amdgpu_bo_create_restricted()" attempts to fix that by releasing > the reservation lock in amdgpu code; unfortunately, it introduces > a use-after-free when this race _doesn't_ happen. > > This patch should fix the race properly by never adding the BO > to the global list in the first place. > > Cc: Samuel Pitoiset <samuel.pitoiset@gmail.com> > Cc: zhoucm1 <david1.zhou@amd.com> > Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com> NAK, that is actually not correct either. The previous implementation doesn't have an use after free, but actually a memory leak. I completely agree that adding the BO to the LRU in case of an error is incorrect, but ttm_bo_add_to_lru() will add some references to the BO. So the following ttm_bo_unref() will just drop the initial reference. Let's clean up this mess by moving the freeing of the BO in case of an error to the caller. Regards, Christian. > --- > drivers/gpu/drm/ttm/ttm_bo.c | 12 +++++++----- > 1 file changed, 7 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c > index 239a957..76bee42 100644 > --- a/drivers/gpu/drm/ttm/ttm_bo.c > +++ b/drivers/gpu/drm/ttm/ttm_bo.c > @@ -1215,18 +1215,20 @@ int ttm_bo_init(struct ttm_bo_device *bdev, > if (likely(!ret)) > ret = ttm_bo_validate(bo, placement, interruptible, false); > > - if (!resv) { > + if (!resv) > ttm_bo_unreserve(bo); > > - } else if (!(bo->mem.placement & TTM_PL_FLAG_NO_EVICT)) { > + if (unlikely(ret)) { > + ttm_bo_unref(&bo); > + return ret; > + } > + > + if (resv && !(bo->mem.placement & TTM_PL_FLAG_NO_EVICT)) { > spin_lock(&bo->glob->lru_lock); > ttm_bo_add_to_lru(bo); > spin_unlock(&bo->glob->lru_lock); > } > > - if (unlikely(ret)) > - ttm_bo_unref(&bo); > - > return ret; > } > EXPORT_SYMBOL(ttm_bo_init);
On 14.02.2017 11:38, Christian König wrote: > Am 14.02.2017 um 10:18 schrieb Nicolai Hähnle: >> From: Nicolai Hähnle <nicolai.haehnle@amd.com> >> >> Fixes a potential race condition in amdgpu that looks as follows: >> >> Task 1: attempt ttm_bo_init, but ttm_bo_validate fails >> Task 1: add BO to global list anyway >> Task 2: grabs hold of the BO, waits on its reservation lock >> Task 1: releases its reference of the BO; never gives up the >> reservation lock >> >> The patch "drm/amdgpu: fix a potential deadlock in >> amdgpu_bo_create_restricted()" attempts to fix that by releasing >> the reservation lock in amdgpu code; unfortunately, it introduces >> a use-after-free when this race _doesn't_ happen. >> >> This patch should fix the race properly by never adding the BO >> to the global list in the first place. >> >> Cc: Samuel Pitoiset <samuel.pitoiset@gmail.com> >> Cc: zhoucm1 <david1.zhou@amd.com> >> Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com> > > NAK, that is actually not correct either. > > The previous implementation doesn't have an use after free, but actually > a memory leak. > > I completely agree that adding the BO to the LRU in case of an error is > incorrect, but ttm_bo_add_to_lru() will add some references to the BO. Yes, but those are bo->list_krefs, not bo->krefs. ttm_bo_unref will put the bo->kref, which should lead to ttm_bo_release. This will then add the buffer to the ddestroy list and eventually put all the list_krefs. (Or possibly, if the ttm_bo_wait in ttm_bo_cleanup_refs_or_queue fails, it will explicitly delete the buffer from the LRU, which should amount to the same, I think). Or perhaps I'm still missing something, I'm not too familiar with the whole BO reference counting. > So the following ttm_bo_unref() will just drop the initial reference. > > Let's clean up this mess by moving the freeing of the BO in case of an > error to the caller. Yes, see my other patches. Thanks, Nicolai > > Regards, > Christian. > >> --- >> drivers/gpu/drm/ttm/ttm_bo.c | 12 +++++++----- >> 1 file changed, 7 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c >> index 239a957..76bee42 100644 >> --- a/drivers/gpu/drm/ttm/ttm_bo.c >> +++ b/drivers/gpu/drm/ttm/ttm_bo.c >> @@ -1215,18 +1215,20 @@ int ttm_bo_init(struct ttm_bo_device *bdev, >> if (likely(!ret)) >> ret = ttm_bo_validate(bo, placement, interruptible, false); >> - if (!resv) { >> + if (!resv) >> ttm_bo_unreserve(bo); >> - } else if (!(bo->mem.placement & TTM_PL_FLAG_NO_EVICT)) { >> + if (unlikely(ret)) { >> + ttm_bo_unref(&bo); >> + return ret; >> + } >> + >> + if (resv && !(bo->mem.placement & TTM_PL_FLAG_NO_EVICT)) { >> spin_lock(&bo->glob->lru_lock); >> ttm_bo_add_to_lru(bo); >> spin_unlock(&bo->glob->lru_lock); >> } >> - if (unlikely(ret)) >> - ttm_bo_unref(&bo); >> - >> return ret; >> } >> EXPORT_SYMBOL(ttm_bo_init); > >
Am 14.02.2017 um 12:37 schrieb Nicolai Hähnle: > On 14.02.2017 11:38, Christian König wrote: >> Am 14.02.2017 um 10:18 schrieb Nicolai Hähnle: >>> From: Nicolai Hähnle <nicolai.haehnle@amd.com> >>> >>> Fixes a potential race condition in amdgpu that looks as follows: >>> >>> Task 1: attempt ttm_bo_init, but ttm_bo_validate fails >>> Task 1: add BO to global list anyway >>> Task 2: grabs hold of the BO, waits on its reservation lock >>> Task 1: releases its reference of the BO; never gives up the >>> reservation lock >>> >>> The patch "drm/amdgpu: fix a potential deadlock in >>> amdgpu_bo_create_restricted()" attempts to fix that by releasing >>> the reservation lock in amdgpu code; unfortunately, it introduces >>> a use-after-free when this race _doesn't_ happen. >>> >>> This patch should fix the race properly by never adding the BO >>> to the global list in the first place. >>> >>> Cc: Samuel Pitoiset <samuel.pitoiset@gmail.com> >>> Cc: zhoucm1 <david1.zhou@amd.com> >>> Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com> >> >> NAK, that is actually not correct either. >> >> The previous implementation doesn't have an use after free, but actually >> a memory leak. >> >> I completely agree that adding the BO to the LRU in case of an error is >> incorrect, but ttm_bo_add_to_lru() will add some references to the BO. > > Yes, but those are bo->list_krefs, not bo->krefs. ttm_bo_unref will > put the bo->kref, which should lead to ttm_bo_release. This will then > add the buffer to the ddestroy list and eventually put all the > list_krefs. (Or possibly, if the ttm_bo_wait in > ttm_bo_cleanup_refs_or_queue fails, it will explicitly delete the > buffer from the LRU, which should amount to the same, I think). > > Or perhaps I'm still missing something, I'm not too familiar with the > whole BO reference counting. Ah, yes of course, never mind. > > >> So the following ttm_bo_unref() will just drop the initial reference. >> >> Let's clean up this mess by moving the freeing of the BO in case of an >> error to the caller. > > Yes, see my other patches. In this case the set is Reviewed-by: Christian König <christian.koenig@amd.com>. Regards, Christian. > > Thanks, > Nicolai > > >> >> Regards, >> Christian. >> >>> --- >>> drivers/gpu/drm/ttm/ttm_bo.c | 12 +++++++----- >>> 1 file changed, 7 insertions(+), 5 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c >>> b/drivers/gpu/drm/ttm/ttm_bo.c >>> index 239a957..76bee42 100644 >>> --- a/drivers/gpu/drm/ttm/ttm_bo.c >>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c >>> @@ -1215,18 +1215,20 @@ int ttm_bo_init(struct ttm_bo_device *bdev, >>> if (likely(!ret)) >>> ret = ttm_bo_validate(bo, placement, interruptible, false); >>> - if (!resv) { >>> + if (!resv) >>> ttm_bo_unreserve(bo); >>> - } else if (!(bo->mem.placement & TTM_PL_FLAG_NO_EVICT)) { >>> + if (unlikely(ret)) { >>> + ttm_bo_unref(&bo); >>> + return ret; >>> + } >>> + >>> + if (resv && !(bo->mem.placement & TTM_PL_FLAG_NO_EVICT)) { >>> spin_lock(&bo->glob->lru_lock); >>> ttm_bo_add_to_lru(bo); >>> spin_unlock(&bo->glob->lru_lock); >>> } >>> - if (unlikely(ret)) >>> - ttm_bo_unref(&bo); >>> - >>> return ret; >>> } >>> EXPORT_SYMBOL(ttm_bo_init); >> >> >
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 239a957..76bee42 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -1215,18 +1215,20 @@ int ttm_bo_init(struct ttm_bo_device *bdev, if (likely(!ret)) ret = ttm_bo_validate(bo, placement, interruptible, false); - if (!resv) { + if (!resv) ttm_bo_unreserve(bo); - } else if (!(bo->mem.placement & TTM_PL_FLAG_NO_EVICT)) { + if (unlikely(ret)) { + ttm_bo_unref(&bo); + return ret; + } + + if (resv && !(bo->mem.placement & TTM_PL_FLAG_NO_EVICT)) { spin_lock(&bo->glob->lru_lock); ttm_bo_add_to_lru(bo); spin_unlock(&bo->glob->lru_lock); } - if (unlikely(ret)) - ttm_bo_unref(&bo); - return ret; } EXPORT_SYMBOL(ttm_bo_init);