Message ID | 20170218225007.20754-1-nhaehnle@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Am 18.02.2017 um 23:50 schrieb Nicolai Hähnle: > From: Nicolai Hähnle <nicolai.haehnle@amd.com> > > The vm fault handler relies on the fact that the VMA owns a reference > to the BO. However, once mmap_sem is released, other tasks are free to > destroy the VMA, which can lead to the BO being freed. Fix two code > paths where that can happen, both related to vm fault retries. > > Found via a lock debugging warning which flagged &bo->wu_mutex as > locked while being destroyed. > > Fixes: cbe12e74ee4e ("drm/ttm: Allow vm fault retries") > Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Good catch! Patch is Reviewed-by: Christian König <christian.koenig@amd.com> > -- > This does not fix the random memory corruption I've been seeing. > --- > drivers/gpu/drm/ttm/ttm_bo_vm.c | 12 ++++++++++++ > 1 file changed, 12 insertions(+) > > diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c > index a6ed9d5..750733a 100644 > --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c > +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c > @@ -66,8 +66,11 @@ static int ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo, > if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT) > goto out_unlock; > > + ttm_bo_reference(bo); > up_read(&vma->vm_mm->mmap_sem); > (void) fence_wait(bo->moving, true); > + ttm_bo_unreserve(bo); > + ttm_bo_unref(&bo); > goto out_unlock; > } > > @@ -120,8 +123,10 @@ static int ttm_bo_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf) > > if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) { > if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) { > + ttm_bo_reference(bo); > up_read(&vma->vm_mm->mmap_sem); > (void) ttm_bo_wait_unreserved(bo); > + ttm_bo_unref(&bo); > } > > return VM_FAULT_RETRY; > @@ -166,6 +171,13 @@ static int ttm_bo_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf) > ret = ttm_bo_vm_fault_idle(bo, vma, vmf); > if (unlikely(ret != 0)) { > retval = ret; > + > + if (retval == VM_FAULT_RETRY && > + !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) { > + /* The BO has already been unreserved. */ > + return retval; > + } > + > goto out_unlock; > } >
On Sun, Feb 19, 2017 at 10:32:43AM +0100, Christian König wrote: > Am 18.02.2017 um 23:50 schrieb Nicolai Hähnle: > > From: Nicolai Hähnle <nicolai.haehnle@amd.com> > > > > The vm fault handler relies on the fact that the VMA owns a reference > > to the BO. However, once mmap_sem is released, other tasks are free to > > destroy the VMA, which can lead to the BO being freed. Fix two code > > paths where that can happen, both related to vm fault retries. > > > > Found via a lock debugging warning which flagged &bo->wu_mutex as > > locked while being destroyed. > > > > Fixes: cbe12e74ee4e ("drm/ttm: Allow vm fault retries") > > Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com> > > Good catch! Patch is Reviewed-by: Christian König <christian.koenig@amd.com> Since you have commit rights and all, care to push to drm-misc.git? -Daniel > > > -- > > This does not fix the random memory corruption I've been seeing. > > --- > > drivers/gpu/drm/ttm/ttm_bo_vm.c | 12 ++++++++++++ > > 1 file changed, 12 insertions(+) > > > > diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c > > index a6ed9d5..750733a 100644 > > --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c > > +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c > > @@ -66,8 +66,11 @@ static int ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo, > > if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT) > > goto out_unlock; > > + ttm_bo_reference(bo); > > up_read(&vma->vm_mm->mmap_sem); > > (void) fence_wait(bo->moving, true); > > + ttm_bo_unreserve(bo); > > + ttm_bo_unref(&bo); > > goto out_unlock; > > } > > @@ -120,8 +123,10 @@ static int ttm_bo_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf) > > if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) { > > if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) { > > + ttm_bo_reference(bo); > > up_read(&vma->vm_mm->mmap_sem); > > (void) ttm_bo_wait_unreserved(bo); > > + ttm_bo_unref(&bo); > > } > > return VM_FAULT_RETRY; > > @@ -166,6 +171,13 @@ static int ttm_bo_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf) > > ret = ttm_bo_vm_fault_idle(bo, vma, vmf); > > if (unlikely(ret != 0)) { > > retval = ret; > > + > > + if (retval == VM_FAULT_RETRY && > > + !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) { > > + /* The BO has already been unreserved. */ > > + return retval; > > + } > > + > > goto out_unlock; > > } > > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel
Am 26.02.2017 um 22:35 schrieb Daniel Vetter: > On Sun, Feb 19, 2017 at 10:32:43AM +0100, Christian König wrote: >> Am 18.02.2017 um 23:50 schrieb Nicolai Hähnle: >>> From: Nicolai Hähnle <nicolai.haehnle@amd.com> >>> >>> The vm fault handler relies on the fact that the VMA owns a reference >>> to the BO. However, once mmap_sem is released, other tasks are free to >>> destroy the VMA, which can lead to the BO being freed. Fix two code >>> paths where that can happen, both related to vm fault retries. >>> >>> Found via a lock debugging warning which flagged &bo->wu_mutex as >>> locked while being destroyed. >>> >>> Fixes: cbe12e74ee4e ("drm/ttm: Allow vm fault retries") >>> Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com> >> Good catch! Patch is Reviewed-by: Christian König <christian.koenig@amd.com> > Since you have commit rights and all, care to push to drm-misc.git? Do I have to use dim or could I just push the patches using standard git? See my problem with installing an extra tool in my dev environment is that I have 5+ hard disks setup from an image with all the neat stuff I need. Distributing something new in there would be rather painful for me. On the other hand I could just setup my laptop and use that one as a bridge for pushing into drm-misc. That would work for single bug fixes like this, but would break my usual development workflow. Christian. > -Daniel > >>> -- >>> This does not fix the random memory corruption I've been seeing. >>> --- >>> drivers/gpu/drm/ttm/ttm_bo_vm.c | 12 ++++++++++++ >>> 1 file changed, 12 insertions(+) >>> >>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c >>> index a6ed9d5..750733a 100644 >>> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c >>> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c >>> @@ -66,8 +66,11 @@ static int ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo, >>> if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT) >>> goto out_unlock; >>> + ttm_bo_reference(bo); >>> up_read(&vma->vm_mm->mmap_sem); >>> (void) fence_wait(bo->moving, true); >>> + ttm_bo_unreserve(bo); >>> + ttm_bo_unref(&bo); >>> goto out_unlock; >>> } >>> @@ -120,8 +123,10 @@ static int ttm_bo_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf) >>> if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) { >>> if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) { >>> + ttm_bo_reference(bo); >>> up_read(&vma->vm_mm->mmap_sem); >>> (void) ttm_bo_wait_unreserved(bo); >>> + ttm_bo_unref(&bo); >>> } >>> return VM_FAULT_RETRY; >>> @@ -166,6 +171,13 @@ static int ttm_bo_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf) >>> ret = ttm_bo_vm_fault_idle(bo, vma, vmf); >>> if (unlikely(ret != 0)) { >>> retval = ret; >>> + >>> + if (retval == VM_FAULT_RETRY && >>> + !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) { >>> + /* The BO has already been unreserved. */ >>> + return retval; >>> + } >>> + >>> goto out_unlock; >>> } >> >> _______________________________________________ >> dri-devel mailing list >> dri-devel@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/dri-devel
On Mon, Feb 27, 2017 at 09:56:56AM +0100, Christian König wrote: > Am 26.02.2017 um 22:35 schrieb Daniel Vetter: > > On Sun, Feb 19, 2017 at 10:32:43AM +0100, Christian König wrote: > > > Am 18.02.2017 um 23:50 schrieb Nicolai Hähnle: > > > > From: Nicolai Hähnle <nicolai.haehnle@amd.com> > > > > > > > > The vm fault handler relies on the fact that the VMA owns a reference > > > > to the BO. However, once mmap_sem is released, other tasks are free to > > > > destroy the VMA, which can lead to the BO being freed. Fix two code > > > > paths where that can happen, both related to vm fault retries. > > > > > > > > Found via a lock debugging warning which flagged &bo->wu_mutex as > > > > locked while being destroyed. > > > > > > > > Fixes: cbe12e74ee4e ("drm/ttm: Allow vm fault retries") > > > > Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com> > > > Good catch! Patch is Reviewed-by: Christian König <christian.koenig@amd.com> > > Since you have commit rights and all, care to push to drm-misc.git? > > Do I have to use dim or could I just push the patches using standard git? > > See my problem with installing an extra tool in my dev environment is that I > have 5+ hard disks setup from an image with all the neat stuff I need. > Distributing something new in there would be rather painful for me. > > On the other hand I could just setup my laptop and use that one as a bridge > for pushing into drm-misc. That would work for single bug fixes like this, > but would break my usual development workflow. Atm the big magic in dim is in the integration tree construction and faning out conflict handling to everyone. There's also the upshot of being able to share sanity checks for silly fumbles, but we don't yet have much of these, so dim's indeed needed. I know it's annoying. I can apply this one here if you don't want to fiddle with it all for now. -Daniel
On Mon, Feb 27, 2017 at 10:08:47AM +0100, Daniel Vetter wrote: > On Mon, Feb 27, 2017 at 09:56:56AM +0100, Christian König wrote: > > Am 26.02.2017 um 22:35 schrieb Daniel Vetter: > > > On Sun, Feb 19, 2017 at 10:32:43AM +0100, Christian König wrote: > > > > Am 18.02.2017 um 23:50 schrieb Nicolai Hähnle: > > > > > From: Nicolai Hähnle <nicolai.haehnle@amd.com> > > > > > > > > > > The vm fault handler relies on the fact that the VMA owns a reference > > > > > to the BO. However, once mmap_sem is released, other tasks are free to > > > > > destroy the VMA, which can lead to the BO being freed. Fix two code > > > > > paths where that can happen, both related to vm fault retries. > > > > > > > > > > Found via a lock debugging warning which flagged &bo->wu_mutex as > > > > > locked while being destroyed. > > > > > > > > > > Fixes: cbe12e74ee4e ("drm/ttm: Allow vm fault retries") > > > > > Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com> > > > > Good catch! Patch is Reviewed-by: Christian König <christian.koenig@amd.com> > > > Since you have commit rights and all, care to push to drm-misc.git? > > > > Do I have to use dim or could I just push the patches using standard git? > > > > See my problem with installing an extra tool in my dev environment is that I > > have 5+ hard disks setup from an image with all the neat stuff I need. > > Distributing something new in there would be rather painful for me. > > > > On the other hand I could just setup my laptop and use that one as a bridge > > for pushing into drm-misc. That would work for single bug fixes like this, > > but would break my usual development workflow. > > Atm the big magic in dim is in the integration tree construction and > faning out conflict handling to everyone. There's also the upshot of being > able to share sanity checks for silly fumbles, but we don't yet have much > of these, so dim's indeed needed. I know it's annoying. > > I can apply this one here if you don't want to fiddle with it all for now. Well just noticed that Alex already applied this to his fixes pull, so all moot. -Daniel
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index a6ed9d5..750733a 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -66,8 +66,11 @@ static int ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo, if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT) goto out_unlock; + ttm_bo_reference(bo); up_read(&vma->vm_mm->mmap_sem); (void) fence_wait(bo->moving, true); + ttm_bo_unreserve(bo); + ttm_bo_unref(&bo); goto out_unlock; } @@ -120,8 +123,10 @@ static int ttm_bo_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf) if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) { if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) { + ttm_bo_reference(bo); up_read(&vma->vm_mm->mmap_sem); (void) ttm_bo_wait_unreserved(bo); + ttm_bo_unref(&bo); } return VM_FAULT_RETRY; @@ -166,6 +171,13 @@ static int ttm_bo_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf) ret = ttm_bo_vm_fault_idle(bo, vma, vmf); if (unlikely(ret != 0)) { retval = ret; + + if (retval == VM_FAULT_RETRY && + !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) { + /* The BO has already been unreserved. */ + return retval; + } + goto out_unlock; }