drm/ttm: fix use-after-free races in vm fault handling

Message ID	20170218225007.20754-1-nhaehnle@gmail.com (mailing list archive)
State	New, archived
Headers	Return-Path: <dri-devel-bounces@lists.freedesktop.org>
	From: Nicolai Hähnle <nhaehnle@gmail.com>
	To: amd-gfx@lists.freedesktop.org
	Subject: [PATCH] drm/ttm: fix use-after-free races in vm fault handling
	Date: Sat, 18 Feb 2017 23:50:07 +0100
	Message-Id: <20170218225007.20754-1-nhaehnle@gmail.com>
	MIME-Version: 1.0
	Cc: Nicolai Hähnle <nicolai.haehnle@amd.com>, dri-devel@lists.freedesktop.org
	Precedence: list
	Content-Type: text/plain; charset="utf-8"
	Content-Transfer-Encoding: base64
	Errors-To: dri-devel-bounces@lists.freedesktop.org
	Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Nicolai Hähnle Feb. 18, 2017, 10:50 p.m. UTC

From: Nicolai Hähnle <nicolai.haehnle@amd.com>

The vm fault handler relies on the fact that the VMA owns a reference
to the BO. However, once mmap_sem is released, other tasks are free to
destroy the VMA, which can lead to the BO being freed. Fix two code
paths where that can happen, both related to vm fault retries.

Found via a lock debugging warning which flagged &bo->wu_mutex as
locked while being destroyed.

Fixes: cbe12e74ee4e ("drm/ttm: Allow vm fault retries")
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
--
This does not fix the random memory corruption I've been seeing.
---
 drivers/gpu/drm/ttm/ttm_bo_vm.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
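
The race described in the commit message can be illustrated with a small user-space sketch. This is not the TTM code: `fake_bo`, `fake_bo_create`, `fake_bo_get`, `fake_bo_put` and `simulate_retry` are hypothetical names invented for the example, and the concurrent VMA teardown is modelled as a plain sequential step rather than a second task. It only shows why taking an extra reference before dropping the lock keeps the object alive across the wait.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a buffer object; NOT struct ttm_buffer_object. */
struct fake_bo {
	atomic_int refcount;
	bool *freed;	/* external flag, set when the object is released */
};

static struct fake_bo *fake_bo_create(bool *freed)
{
	struct fake_bo *bo = malloc(sizeof(*bo));

	assert(bo != NULL);
	atomic_init(&bo->refcount, 1);	/* the reference owned by the "VMA" */
	bo->freed = freed;
	*freed = false;
	return bo;
}

static struct fake_bo *fake_bo_get(struct fake_bo *bo)
{
	atomic_fetch_add(&bo->refcount, 1);
	return bo;
}

/* Drops one reference and clears the caller's pointer. */
static void fake_bo_put(struct fake_bo **p_bo)
{
	struct fake_bo *bo = *p_bo;

	*p_bo = NULL;
	if (atomic_fetch_sub(&bo->refcount, 1) == 1) {
		*bo->freed = true;
		free(bo);
	}
}

/*
 * Models the retry path. Returns true if the object is still alive at the
 * point where the fault handler would be waiting on it.
 */
static bool simulate_retry(bool take_extra_ref)
{
	bool freed;
	struct fake_bo *vma_ref = fake_bo_create(&freed);
	struct fake_bo *handler_ref = NULL;
	bool alive_during_wait;

	/* The fix: pin the object before the lock is released. */
	if (take_extra_ref)
		handler_ref = fake_bo_get(vma_ref);

	/* Lock released here; another task tears down the VMA. */
	fake_bo_put(&vma_ref);

	/* The handler would now wait on the object. Is it still there? */
	alive_during_wait = !freed;

	if (handler_ref)
		fake_bo_put(&handler_ref);

	return alive_during_wait;
}
```

With these definitions, `simulate_retry(true)` reports the object as still alive during the wait, while `simulate_retry(false)` reports that it was already freed, which corresponds to the use-after-free window the patch closes.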

Christian König Feb. 19, 2017, 9:32 a.m. UTC | #1

Am 18.02.2017 um 23:50 schrieb Nicolai Hähnle:
> From: Nicolai Hähnle <nicolai.haehnle@amd.com>
>
> The vm fault handler relies on the fact that the VMA owns a reference
> to the BO. However, once mmap_sem is released, other tasks are free to
> destroy the VMA, which can lead to the BO being freed. Fix two code
> paths where that can happen, both related to vm fault retries.
>
> Found via a lock debugging warning which flagged &bo->wu_mutex as
> locked while being destroyed.
>
> Fixes: cbe12e74ee4e ("drm/ttm: Allow vm fault retries")
> Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

Good catch! Patch is Reviewed-by: Christian König <christian.koenig@amd.com>

> --
> This does not fix the random memory corruption I've been seeing.
> ---
>   drivers/gpu/drm/ttm/ttm_bo_vm.c | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> index a6ed9d5..750733a 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> @@ -66,8 +66,11 @@ static int ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo,
>   		if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
>   			goto out_unlock;
>   
> +		ttm_bo_reference(bo);
>   		up_read(&vma->vm_mm->mmap_sem);
>   		(void) fence_wait(bo->moving, true);
> +		ttm_bo_unreserve(bo);
> +		ttm_bo_unref(&bo);
>   		goto out_unlock;
>   	}
>   
> @@ -120,8 +123,10 @@ static int ttm_bo_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
>   
>   		if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) {
>   			if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) {
> +				ttm_bo_reference(bo);
>   				up_read(&vma->vm_mm->mmap_sem);
>   				(void) ttm_bo_wait_unreserved(bo);
> +				ttm_bo_unref(&bo);
>   			}
>   
>   			return VM_FAULT_RETRY;
> @@ -166,6 +171,13 @@ static int ttm_bo_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
>   	ret = ttm_bo_vm_fault_idle(bo, vma, vmf);
>   	if (unlikely(ret != 0)) {
>   		retval = ret;
> +
> +		if (retval == VM_FAULT_RETRY &&
> +		    !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) {
> +			/* The BO has already been unreserved. */
> +			return retval;
> +		}
> +
>   		goto out_unlock;
>   	}
>
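
The last hunk exists because, on the blocking retry path, the callee has already unreserved the BO and dropped its reference, so the caller must not run its own cleanup. The sketch below models that ownership hand-off in user space under stated assumptions: `fake_res`, `fake_unreserve`, `fake_fault_idle` and `fake_fault` are hypothetical names, and a boolean stands in for the reservation lock. It is not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

enum fault_result { FAULT_OK = 0, FAULT_RETRY = 1 };

/* Hypothetical resource with a single "reserved" lock bit. */
struct fake_res {
	bool reserved;
	bool busy;		/* true while a move is still in flight */
	int unreserve_calls;	/* counts releases, for checking */
};

static void fake_unreserve(struct fake_res *res)
{
	assert(res->reserved);	/* a double unreserve would trip this */
	res->reserved = false;
	res->unreserve_calls++;
}

/* Callee: on the blocking retry path it releases the reservation itself. */
static enum fault_result fake_fault_idle(struct fake_res *res, bool nowait)
{
	if (!res->busy)
		return FAULT_OK;
	if (nowait)
		return FAULT_RETRY;	/* still reserved, caller cleans up */

	/* A blocking wait would happen here, with the lock dropped. */
	fake_unreserve(res);
	return FAULT_RETRY;
}

/* Caller: skips its own cleanup when the callee already did it. */
static enum fault_result fake_fault(struct fake_res *res, bool nowait)
{
	enum fault_result ret;

	res->reserved = true;
	ret = fake_fault_idle(res, nowait);

	if (ret == FAULT_RETRY && !nowait)
		return ret;	/* already unreserved by the callee */

	fake_unreserve(res);	/* the usual exit path */
	return ret;
}
```

In all three combinations (idle, busy with nowait, busy with a blocking wait) the resource ends up unreserved exactly once; without the early return, the blocking case would release it twice.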

Daniel Vetter Feb. 26, 2017, 9:35 p.m. UTC | #2

On Sun, Feb 19, 2017 at 10:32:43AM +0100, Christian König wrote:
> Am 18.02.2017 um 23:50 schrieb Nicolai Hähnle:
> > From: Nicolai Hähnle <nicolai.haehnle@amd.com>
> > 
> > The vm fault handler relies on the fact that the VMA owns a reference
> > to the BO. However, once mmap_sem is released, other tasks are free to
> > destroy the VMA, which can lead to the BO being freed. Fix two code
> > paths where that can happen, both related to vm fault retries.
> > 
> > Found via a lock debugging warning which flagged &bo->wu_mutex as
> > locked while being destroyed.
> > 
> > Fixes: cbe12e74ee4e ("drm/ttm: Allow vm fault retries")
> > Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
> 
> Good catch! Patch is Reviewed-by: Christian König <christian.koenig@amd.com>

Since you have commit rights and all, care to push to drm-misc.git?
-Daniel

Christian König Feb. 27, 2017, 8:56 a.m. UTC | #3

Am 26.02.2017 um 22:35 schrieb Daniel Vetter:
> On Sun, Feb 19, 2017 at 10:32:43AM +0100, Christian König wrote:
>> Am 18.02.2017 um 23:50 schrieb Nicolai Hähnle:
>>> From: Nicolai Hähnle <nicolai.haehnle@amd.com>
>>>
>>> The vm fault handler relies on the fact that the VMA owns a reference
>>> to the BO. However, once mmap_sem is released, other tasks are free to
>>> destroy the VMA, which can lead to the BO being freed. Fix two code
>>> paths where that can happen, both related to vm fault retries.
>>>
>>> Found via a lock debugging warning which flagged &bo->wu_mutex as
>>> locked while being destroyed.
>>>
>>> Fixes: cbe12e74ee4e ("drm/ttm: Allow vm fault retries")
>>> Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
>> Good catch! Patch is Reviewed-by: Christian König <christian.koenig@amd.com>
> Since you have commit rights and all, care to push to drm-misc.git?

Do I have to use dim or could I just push the patches using standard git?

See, my problem with installing an extra tool in my dev environment is 
that I have 5+ hard disks set up from an image with all the neat stuff I 
need. Distributing something new in there would be rather painful for me.

On the other hand, I could just set up my laptop and use that one as a 
bridge for pushing into drm-misc. That would work for single bug fixes 
like this, but would break my usual development workflow.

Christian.

Daniel Vetter Feb. 27, 2017, 9:08 a.m. UTC | #4

On Mon, Feb 27, 2017 at 09:56:56AM +0100, Christian König wrote:
> Am 26.02.2017 um 22:35 schrieb Daniel Vetter:
> > On Sun, Feb 19, 2017 at 10:32:43AM +0100, Christian König wrote:
> > > Am 18.02.2017 um 23:50 schrieb Nicolai Hähnle:
> > > > From: Nicolai Hähnle <nicolai.haehnle@amd.com>
> > > > 
> > > > The vm fault handler relies on the fact that the VMA owns a reference
> > > > to the BO. However, once mmap_sem is released, other tasks are free to
> > > > destroy the VMA, which can lead to the BO being freed. Fix two code
> > > > paths where that can happen, both related to vm fault retries.
> > > > 
> > > > Found via a lock debugging warning which flagged &bo->wu_mutex as
> > > > locked while being destroyed.
> > > > 
> > > > Fixes: cbe12e74ee4e ("drm/ttm: Allow vm fault retries")
> > > > Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
> > > Good catch! Patch is Reviewed-by: Christian König <christian.koenig@amd.com>
> > Since you have commit rights and all, care to push to drm-misc.git?
> 
> Do I have to use dim or could I just push the patches using standard git?
> 
> See, my problem with installing an extra tool in my dev environment is that I
> have 5+ hard disks set up from an image with all the neat stuff I need.
> Distributing something new in there would be rather painful for me.
> 
> On the other hand, I could just set up my laptop and use that one as a bridge
> for pushing into drm-misc. That would work for single bug fixes like this,
> but would break my usual development workflow.

Atm the big magic in dim is in the integration tree construction and
fanning out conflict handling to everyone. There's also the upshot of being
able to share sanity checks for silly fumbles, but we don't yet have many
of these, so dim's indeed needed. I know it's annoying.

I can apply this one here if you don't want to fiddle with it all for now.
-Daniel

Daniel Vetter Feb. 27, 2017, 9:10 a.m. UTC | #5

On Mon, Feb 27, 2017 at 10:08:47AM +0100, Daniel Vetter wrote:
> On Mon, Feb 27, 2017 at 09:56:56AM +0100, Christian König wrote:
> > Am 26.02.2017 um 22:35 schrieb Daniel Vetter:
> > > On Sun, Feb 19, 2017 at 10:32:43AM +0100, Christian König wrote:
> > > > Am 18.02.2017 um 23:50 schrieb Nicolai Hähnle:
> > > > > From: Nicolai Hähnle <nicolai.haehnle@amd.com>
> > > > > 
> > > > > The vm fault handler relies on the fact that the VMA owns a reference
> > > > > to the BO. However, once mmap_sem is released, other tasks are free to
> > > > > destroy the VMA, which can lead to the BO being freed. Fix two code
> > > > > paths where that can happen, both related to vm fault retries.
> > > > > 
> > > > > Found via a lock debugging warning which flagged &bo->wu_mutex as
> > > > > locked while being destroyed.
> > > > > 
> > > > > Fixes: cbe12e74ee4e ("drm/ttm: Allow vm fault retries")
> > > > > Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
> > > > Good catch! Patch is Reviewed-by: Christian König <christian.koenig@amd.com>
> > > Since you have commit rights and all, care to push to drm-misc.git?
> > 
> > Do I have to use dim or could I just push the patches using standard git?
> > 
> > See, my problem with installing an extra tool in my dev environment is that I
> > have 5+ hard disks set up from an image with all the neat stuff I need.
> > Distributing something new in there would be rather painful for me.
> > 
> > On the other hand, I could just set up my laptop and use that one as a bridge
> > for pushing into drm-misc. That would work for single bug fixes like this,
> > but would break my usual development workflow.
> 
> Atm the big magic in dim is in the integration tree construction and
> fanning out conflict handling to everyone. There's also the upshot of being
> able to share sanity checks for silly fumbles, but we don't yet have many
> of these, so dim's indeed needed. I know it's annoying.
> 
> I can apply this one here if you don't want to fiddle with it all for now.

Well just noticed that Alex already applied this to his fixes pull, so all
moot.
-Daniel
