drm/i915: Fix unhandled deadlock in grab_vma()

Message ID	20221110053133.2433412-1-mani@chromium.org (mailing list archive)
State	New, archived
Headers	show Return-Path: <intel-gfx-bounces@lists.freedesktop.org> From: Mani Milani <mani@chromium.org> To: LKML <linux-kernel@vger.kernel.org> Date: Thu, 10 Nov 2022 16:31:33 +1100 Message-Id: <20221110053133.2433412-1-mani@chromium.org> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Subject: [Intel-gfx] [PATCH] drm/i915: Fix unhandled deadlock in grab_vma() Precedence: list Cc: dri-devel@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, =?utf-8?q?Thomas_Hellstr=C3=B6m?= <thomas.hellstrom@linux.intel.com>, Mani Milani <mani@chromium.org>, Chris Wilson <chris@chris-wilson.co.uk>, Matthew Auld <matthew.auld@intel.com>, Daniel Vetter <daniel@ffwll.ch>, Rodrigo Vivi <rodrigo.vivi@intel.com>, David Airlie <airlied@gmail.com>, =?utf-8?q?Christian_K=C3=B6nig?= <christian.koenig@amd.com>, Nirmoy Das <nirmoy.das@intel.com> Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>
Series	drm/i915: Fix unhandled deadlock in grab_vma() \| expand drm/i915: Fix unhandled deadlock in grab_vma()

Mani Milani Nov. 10, 2022, 5:31 a.m. UTC

At present, the gpu thread crashes at times when grab_vma() attempts to
acquire a gem object lock when in a deadlock state.

Problems:
I identified the following 4 issues in the current code:
1. Since grab_vma() calls i915_gem_object_trylock(), which consequently
   calls ww_mutex_trylock(), to acquire lock, it does not perform any
   -EDEADLK handling; And -EALREADY handling is also unreliable,
   according to the description of ww_mutex_trylock().
2. Since the return value of grab_vma() is a boolean showing
   success/failure, it does not provide any extra information on the
   failure reason, and therefore does not provide any mechanism to its
   caller to take any action to fix a potential deadlock.
3. Current grab_vma() implementation produces inconsistent behaviour
   depending on the refcount value, without informing the caller. If
   refcount is already zero, grab_vma() neither acquires lock nor
   increments the refcount, but still returns 'true' for success! This
   means that grab_vma() returning true (for success) does not always
   mean that the gem obj is actually safely accessible.
4. Currently, calling "i915_gem_object_lock(obj,ww)" is meant to be
   followed by a consequent "i915_gem_object_unlock(obj)" ONLY if the
   original 'ww' object pointer was NULL, or otherwise not be called and
   leave the houskeeping to "i915_gem_ww_ctx_fini(ww)". There are a few
   issues with this:
   - This is not documented anywhere in the code (that I could find),
     but only explained in an older commit message.
   - This produces an inconsistent usage of the lock/unlock functions,
     increasing the chance of mistakes and issues.
   - This is not a clean design as it requires any new code that calls
     these lock/unlock functions to know their internals, as well as the
     internals of the functions calling the new code being added.

Fix:
To fix the issues above, this patch:
1. Changes grab_vma() to call i915_gem_object_lock() instead of
   i915_gem_object_trylock(), to handle -EDEADLK and -EALREADY cases.
   This should not cause any issue since the PIN_NONBLOCK flag is
   checked beforehand in the 2 cases grab_vma() is called.
2. Changes grab_vma() to return the actual error code, instead of bool.
3. Changes grab_vma() to behave consistently when returning success, by
   both incrementing the refcount and acquiring lock at all times.
4. Changes i915_gem_object_unlock() to pair with i915_gem_object_lock()
   nicely in all cases and do the housekeeping without the need for the
   caller to do anything other than simply calling lock and unlock.
5. Ensures the gem obj->obj_link is initialized and deleted from the ww
   list such that it can be tested for emptiness using list_empty().

Signed-off-by: Mani Milani <mani@chromium.org>
---

 drivers/gpu/drm/i915/gem/i915_gem_object.c |  2 +
 drivers/gpu/drm/i915/gem/i915_gem_object.h | 10 ++++-
 drivers/gpu/drm/i915/i915_gem_evict.c      | 48 ++++++++++++----------
 drivers/gpu/drm/i915/i915_gem_ww.c         |  8 ++--
 4 files changed, 41 insertions(+), 27 deletions(-)

Matthew Auld Nov. 10, 2022, 2:49 p.m. UTC | #1

On 10/11/2022 05:31, Mani Milani wrote:
> At present, the gpu thread crashes at times when grab_vma() attempts to
> acquire a gem object lock when in a deadlock state.
> 
> Problems:
> I identified the following 4 issues in the current code:
> 1. Since grab_vma() calls i915_gem_object_trylock(), which consequently
>     calls ww_mutex_trylock(), to acquire lock, it does not perform any
>     -EDEADLK handling; And -EALREADY handling is also unreliable,
>     according to the description of ww_mutex_trylock().
> 2. Since the return value of grab_vma() is a boolean showing
>     success/failure, it does not provide any extra information on the
>     failure reason, and therefore does not provide any mechanism to its
>     caller to take any action to fix a potential deadlock.
> 3. Current grab_vma() implementation produces inconsistent behaviour
>     depending on the refcount value, without informing the caller. If
>     refcount is already zero, grab_vma() neither acquires lock nor
>     increments the refcount, but still returns 'true' for success! This
>     means that grab_vma() returning true (for success) does not always
>     mean that the gem obj is actually safely accessible.
> 4. Currently, calling "i915_gem_object_lock(obj,ww)" is meant to be
>     followed by a consequent "i915_gem_object_unlock(obj)" ONLY if the
>     original 'ww' object pointer was NULL, or otherwise not be called and
>     leave the houskeeping to "i915_gem_ww_ctx_fini(ww)". There are a few
>     issues with this:
>     - This is not documented anywhere in the code (that I could find),
>       but only explained in an older commit message.
>     - This produces an inconsistent usage of the lock/unlock functions,
>       increasing the chance of mistakes and issues.
>     - This is not a clean design as it requires any new code that calls
>       these lock/unlock functions to know their internals, as well as the
>       internals of the functions calling the new code being added.
> 
> Fix:
> To fix the issues above, this patch:
> 1. Changes grab_vma() to call i915_gem_object_lock() instead of
>     i915_gem_object_trylock(), to handle -EDEADLK and -EALREADY cases.
>     This should not cause any issue since the PIN_NONBLOCK flag is
>     checked beforehand in the 2 cases grab_vma() is called.
> 2. Changes grab_vma() to return the actual error code, instead of bool.
> 3. Changes grab_vma() to behave consistently when returning success, by
>     both incrementing the refcount and acquiring lock at all times.
> 4. Changes i915_gem_object_unlock() to pair with i915_gem_object_lock()
>     nicely in all cases and do the housekeeping without the need for the
>     caller to do anything other than simply calling lock and unlock.
> 5. Ensures the gem obj->obj_link is initialized and deleted from the ww
>     list such that it can be tested for emptiness using list_empty().
> 
> Signed-off-by: Mani Milani <mani@chromium.org>
> ---
> 
>   drivers/gpu/drm/i915/gem/i915_gem_object.c |  2 +
>   drivers/gpu/drm/i915/gem/i915_gem_object.h | 10 ++++-
>   drivers/gpu/drm/i915/i915_gem_evict.c      | 48 ++++++++++++----------
>   drivers/gpu/drm/i915/i915_gem_ww.c         |  8 ++--
>   4 files changed, 41 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index 369006c5317f..69d013b393fb 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -78,6 +78,8 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
>   
>   	INIT_LIST_HEAD(&obj->mm.link);
>   
> +	INIT_LIST_HEAD(&obj->obj_link);
> +
>   	INIT_LIST_HEAD(&obj->lut_list);
>   	spin_lock_init(&obj->lut_lock);
>   
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> index 1723af9b0f6a..7e7a61bdf52c 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> @@ -219,7 +219,7 @@ static inline bool i915_gem_object_trylock(struct drm_i915_gem_object *obj,
>   		return ww_mutex_trylock(&obj->base.resv->lock, &ww->ctx);
>   }
>   
> -static inline void i915_gem_object_unlock(struct drm_i915_gem_object *obj)
> +static inline void __i915_gem_object_unlock(struct drm_i915_gem_object *obj)
>   {
>   	if (obj->ops->adjust_lru)
>   		obj->ops->adjust_lru(obj);
> @@ -227,6 +227,14 @@ static inline void i915_gem_object_unlock(struct drm_i915_gem_object *obj)
>   	dma_resv_unlock(obj->base.resv);
>   }
>   
> +static inline void i915_gem_object_unlock(struct drm_i915_gem_object *obj)
> +{
> +	if (list_empty(&obj->obj_link))
> +		__i915_gem_object_unlock(obj);
> +	else
> +		i915_gem_ww_unlock_single(obj);
> +}
> +
>   static inline void
>   i915_gem_object_set_readonly(struct drm_i915_gem_object *obj)
>   {
> diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
> index f025ee4fa526..3eb514b4eddc 100644
> --- a/drivers/gpu/drm/i915/i915_gem_evict.c
> +++ b/drivers/gpu/drm/i915/i915_gem_evict.c
> @@ -55,29 +55,33 @@ static int ggtt_flush(struct intel_gt *gt)
>   	return intel_gt_wait_for_idle(gt, MAX_SCHEDULE_TIMEOUT);
>   }
>   
> -static bool grab_vma(struct i915_vma *vma, struct i915_gem_ww_ctx *ww)
> +static int grab_vma(struct i915_vma *vma, struct i915_gem_ww_ctx *ww)
>   {
> +	int err;
> +
> +	/* Dead objects don't need pins */
> +	if (dying_vma(vma))
> +		atomic_and(~I915_VMA_PIN_MASK, &vma->flags);
> +
> +	err = i915_gem_object_lock(vma->obj, ww);

AFAIK the issue here is that we are already holding the vm->mutex, so 
this can potentially deadlock, which I guess is why this was trylock.

We typically grab a bunch of object locks during execbuf, and then grab 
the vm->mutex, before binding the vma for each object. So vm->mutex is 
always our inner lock, and the object lock is the outer one. Using a 
full lock here then inverts that locking AFAICT. Like say if one process 
is holding object A + vm->mutex and then tries to grab object B here in 
grab_vma(), but another process is already holding object B + waiting to 
grab vm->mutex?

> +
>   	/*
>   	 * We add the extra refcount so the object doesn't drop to zero until
> -	 * after ungrab_vma(), this way trylock is always paired with unlock.
> +	 * after ungrab_vma(), this way lock is always paired with unlock.
>   	 */
> -	if (i915_gem_object_get_rcu(vma->obj)) {
> -		if (!i915_gem_object_trylock(vma->obj, ww)) {
> -			i915_gem_object_put(vma->obj);
> -			return false;
> -		}
> -	} else {
> -		/* Dead objects don't need pins */
> -		atomic_and(~I915_VMA_PIN_MASK, &vma->flags);
> -	}
> +	if (!err)
> +		i915_gem_object_get(vma->obj);
>   
> -	return true;
> +	return err;
>   }
>   
>   static void ungrab_vma(struct i915_vma *vma)
>   {
> -	if (dying_vma(vma))
> +	if (dying_vma(vma)) {
> +		/* Dead objects don't need pins */
> +		atomic_and(~I915_VMA_PIN_MASK, &vma->flags);
>   		return;
> +	}
>   
>   	i915_gem_object_unlock(vma->obj);
>   	i915_gem_object_put(vma->obj);
> @@ -93,10 +97,11 @@ mark_free(struct drm_mm_scan *scan,
>   	if (i915_vma_is_pinned(vma))
>   		return false;
>   
> -	if (!grab_vma(vma, ww))
> +	if (grab_vma(vma, ww))
>   		return false;
>   
>   	list_add(&vma->evict_link, unwind);
> +
>   	return drm_mm_scan_add_block(scan, &vma->node);
>   }
>   
> @@ -284,10 +289,12 @@ i915_gem_evict_something(struct i915_address_space *vm,
>   		vma = container_of(node, struct i915_vma, node);
>   
>   		/* If we find any non-objects (!vma), we cannot evict them */
> -		if (vma->node.color != I915_COLOR_UNEVICTABLE &&
> -		    grab_vma(vma, ww)) {
> -			ret = __i915_vma_unbind(vma);
> -			ungrab_vma(vma);
> +		if (vma->node.color != I915_COLOR_UNEVICTABLE) {
> +			ret = grab_vma(vma, ww);
> +			if (!ret) {
> +				ret = __i915_vma_unbind(vma);
> +				ungrab_vma(vma);
> +			}
>   		} else {
>   			ret = -ENOSPC;
>   		}
> @@ -382,10 +389,9 @@ int i915_gem_evict_for_node(struct i915_address_space *vm,
>   			break;
>   		}
>   
> -		if (!grab_vma(vma, ww)) {
> -			ret = -ENOSPC;
> +		ret = grab_vma(vma, ww);
> +		if (ret)
>   			break;
> -		}
>   
>   		/*
>   		 * Never show fear in the face of dragons!
> diff --git a/drivers/gpu/drm/i915/i915_gem_ww.c b/drivers/gpu/drm/i915/i915_gem_ww.c
> index 3f6ff139478e..937b279f50fc 100644
> --- a/drivers/gpu/drm/i915/i915_gem_ww.c
> +++ b/drivers/gpu/drm/i915/i915_gem_ww.c
> @@ -19,16 +19,14 @@ static void i915_gem_ww_ctx_unlock_all(struct i915_gem_ww_ctx *ww)
>   	struct drm_i915_gem_object *obj;
>   
>   	while ((obj = list_first_entry_or_null(&ww->obj_list, struct drm_i915_gem_object, obj_link))) {
> -		list_del(&obj->obj_link);
> -		i915_gem_object_unlock(obj);
> -		i915_gem_object_put(obj);
> +		i915_gem_ww_unlock_single(obj);
>   	}
>   }
>   
>   void i915_gem_ww_unlock_single(struct drm_i915_gem_object *obj)
>   {
> -	list_del(&obj->obj_link);
> -	i915_gem_object_unlock(obj);
> +	list_del_init(&obj->obj_link);
> +	__i915_gem_object_unlock(obj);
>   	i915_gem_object_put(obj);
>   }
>

Thomas Hellstrom Nov. 10, 2022, 3:21 p.m. UTC | #2

On Thu, 2022-11-10 at 14:49 +0000, Matthew Auld wrote:
> On 10/11/2022 05:31, Mani Milani wrote:
> > At present, the gpu thread crashes at times when grab_vma()
> > attempts to
> > acquire a gem object lock when in a deadlock state.
> > 
> > Problems:
> > I identified the following 4 issues in the current code:
> > 1. Since grab_vma() calls i915_gem_object_trylock(), which
> > consequently
> >     calls ww_mutex_trylock(), to acquire lock, it does not perform
> > any
> >     -EDEADLK handling; And -EALREADY handling is also unreliable,
> >     according to the description of ww_mutex_trylock().
> > 2. Since the return value of grab_vma() is a boolean showing
> >     success/failure, it does not provide any extra information on
> > the
> >     failure reason, and therefore does not provide any mechanism to
> > its
> >     caller to take any action to fix a potential deadlock.
> > 3. Current grab_vma() implementation produces inconsistent
> > behaviour
> >     depending on the refcount value, without informing the caller.
> > If
> >     refcount is already zero, grab_vma() neither acquires lock nor
> >     increments the refcount, but still returns 'true' for success!
> > This
> >     means that grab_vma() returning true (for success) does not
> > always
> >     mean that the gem obj is actually safely accessible.
> > 4. Currently, calling "i915_gem_object_lock(obj,ww)" is meant to be
> >     followed by a consequent "i915_gem_object_unlock(obj)" ONLY if
> > the
> >     original 'ww' object pointer was NULL, or otherwise not be
> > called and
> >     leave the houskeeping to "i915_gem_ww_ctx_fini(ww)". There are
> > a few
> >     issues with this:
> >     - This is not documented anywhere in the code (that I could
> > find),
> >       but only explained in an older commit message.
> >     - This produces an inconsistent usage of the lock/unlock
> > functions,
> >       increasing the chance of mistakes and issues.
> >     - This is not a clean design as it requires any new code that
> > calls
> >       these lock/unlock functions to know their internals, as well
> > as the
> >       internals of the functions calling the new code being added.
> > 
> > Fix:
> > To fix the issues above, this patch:
> > 1. Changes grab_vma() to call i915_gem_object_lock() instead of
> >     i915_gem_object_trylock(), to handle -EDEADLK and -EALREADY
> > cases.
> >     This should not cause any issue since the PIN_NONBLOCK flag is
> >     checked beforehand in the 2 cases grab_vma() is called.
> > 2. Changes grab_vma() to return the actual error code, instead of
> > bool.
> > 3. Changes grab_vma() to behave consistently when returning
> > success, by
> >     both incrementing the refcount and acquiring lock at all times.
> > 4. Changes i915_gem_object_unlock() to pair with
> > i915_gem_object_lock()
> >     nicely in all cases and do the housekeeping without the need
> > for the
> >     caller to do anything other than simply calling lock and
> > unlock.
> > 5. Ensures the gem obj->obj_link is initialized and deleted from
> > the ww
> >     list such that it can be tested for emptiness using
> > list_empty().
> > 
> > Signed-off-by: Mani Milani <mani@chromium.org>
> > ---
> > 
> >   drivers/gpu/drm/i915/gem/i915_gem_object.c |  2 +
> >   drivers/gpu/drm/i915/gem/i915_gem_object.h | 10 ++++-
> >   drivers/gpu/drm/i915/i915_gem_evict.c      | 48 ++++++++++++-----
> > -----
> >   drivers/gpu/drm/i915/i915_gem_ww.c         |  8 ++--
> >   4 files changed, 41 insertions(+), 27 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > index 369006c5317f..69d013b393fb 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > @@ -78,6 +78,8 @@ void i915_gem_object_init(struct
> > drm_i915_gem_object *obj,
> >   
> >         INIT_LIST_HEAD(&obj->mm.link);
> >   
> > +       INIT_LIST_HEAD(&obj->obj_link);
> > +
> >         INIT_LIST_HEAD(&obj->lut_list);
> >         spin_lock_init(&obj->lut_lock);
> >   
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > index 1723af9b0f6a..7e7a61bdf52c 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> > @@ -219,7 +219,7 @@ static inline bool
> > i915_gem_object_trylock(struct drm_i915_gem_object *obj,
> >                 return ww_mutex_trylock(&obj->base.resv->lock, &ww-
> > >ctx);
> >   }
> >   
> > -static inline void i915_gem_object_unlock(struct
> > drm_i915_gem_object *obj)
> > +static inline void __i915_gem_object_unlock(struct
> > drm_i915_gem_object *obj)
> >   {
> >         if (obj->ops->adjust_lru)
> >                 obj->ops->adjust_lru(obj);
> > @@ -227,6 +227,14 @@ static inline void
> > i915_gem_object_unlock(struct drm_i915_gem_object *obj)
> >         dma_resv_unlock(obj->base.resv);
> >   }
> >   
> > +static inline void i915_gem_object_unlock(struct
> > drm_i915_gem_object *obj)
> > +{
> > +       if (list_empty(&obj->obj_link))
> > +               __i915_gem_object_unlock(obj);
> > +       else
> > +               i915_gem_ww_unlock_single(obj);
> > +}
> > +
> >   static inline void
> >   i915_gem_object_set_readonly(struct drm_i915_gem_object *obj)
> >   {
> > diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c
> > b/drivers/gpu/drm/i915/i915_gem_evict.c
> > index f025ee4fa526..3eb514b4eddc 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_evict.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_evict.c
> > @@ -55,29 +55,33 @@ static int ggtt_flush(struct intel_gt *gt)
> >         return intel_gt_wait_for_idle(gt, MAX_SCHEDULE_TIMEOUT);
> >   }
> >   
> > -static bool grab_vma(struct i915_vma *vma, struct i915_gem_ww_ctx
> > *ww)
> > +static int grab_vma(struct i915_vma *vma, struct i915_gem_ww_ctx
> > *ww)
> >   {
> > +       int err;
> > +
> > +       /* Dead objects don't need pins */
> > +       if (dying_vma(vma))
> > +               atomic_and(~I915_VMA_PIN_MASK, &vma->flags);
> > +
> > +       err = i915_gem_object_lock(vma->obj, ww);
> 
> AFAIK the issue here is that we are already holding the vm->mutex, so
> this can potentially deadlock, which I guess is why this was trylock.
> 
> We typically grab a bunch of object locks during execbuf, and then
> grab 
> the vm->mutex, before binding the vma for each object. So vm->mutex
> is 
> always our inner lock, and the object lock is the outer one. Using a 
> full lock here then inverts that locking AFAICT. Like say if one
> process 
> is holding object A + vm->mutex and then tries to grab object B here
> in 
> grab_vma(), but another process is already holding object B + waiting
> to 
> grab vm->mutex?

Indeed. 

IIRC the assumption here was that a ww mutex trylock would be similar
in behaviour to the previous code which instead checked for pinned
vmas.

One difference, though, is that we lock most objects upfront and then
try to make space for VMAs avoiding evicting vmas that is needed anyway
for the batch. The old code would instead evict and rebind.

But there should be a catch-all evict everything and rebind if eviction
fails so it would be beneficial to see the "gpu thread crash". I think
there is an issue filed in gitlab where even  this catch-all evict
everything fails that might needs looking at.

/Thomas

> 
> > +
> >         /*
> >          * We add the extra refcount so the object doesn't drop to
> > zero until
> > -        * after ungrab_vma(), this way trylock is always paired
> > with unlock.
> > +        * after ungrab_vma(), this way lock is always paired with
> > unlock.
> >          */
> > -       if (i915_gem_object_get_rcu(vma->obj)) {
> > -               if (!i915_gem_object_trylock(vma->obj, ww)) {
> > -                       i915_gem_object_put(vma->obj);
> > -                       return false;
> > -               }
> > -       } else {
> > -               /* Dead objects don't need pins */
> > -               atomic_and(~I915_VMA_PIN_MASK, &vma->flags);
> > -       }
> > +       if (!err)
> > +               i915_gem_object_get(vma->obj);
> >   
> > -       return true;
> > +       return err;
> >   }
> >   
> >   static void ungrab_vma(struct i915_vma *vma)
> >   {
> > -       if (dying_vma(vma))
> > +       if (dying_vma(vma)) {
> > +               /* Dead objects don't need pins */
> > +               atomic_and(~I915_VMA_PIN_MASK, &vma->flags);
> >                 return;
> > +       }
> >   
> >         i915_gem_object_unlock(vma->obj);
> >         i915_gem_object_put(vma->obj);
> > @@ -93,10 +97,11 @@ mark_free(struct drm_mm_scan *scan,
> >         if (i915_vma_is_pinned(vma))
> >                 return false;
> >   
> > -       if (!grab_vma(vma, ww))
> > +       if (grab_vma(vma, ww))
> >                 return false;
> >   
> >         list_add(&vma->evict_link, unwind);
> > +
> >         return drm_mm_scan_add_block(scan, &vma->node);
> >   }
> >   
> > @@ -284,10 +289,12 @@ i915_gem_evict_something(struct
> > i915_address_space *vm,
> >                 vma = container_of(node, struct i915_vma, node);
> >   
> >                 /* If we find any non-objects (!vma), we cannot
> > evict them */
> > -               if (vma->node.color != I915_COLOR_UNEVICTABLE &&
> > -                   grab_vma(vma, ww)) {
> > -                       ret = __i915_vma_unbind(vma);
> > -                       ungrab_vma(vma);
> > +               if (vma->node.color != I915_COLOR_UNEVICTABLE) {
> > +                       ret = grab_vma(vma, ww);
> > +                       if (!ret) {
> > +                               ret = __i915_vma_unbind(vma);
> > +                               ungrab_vma(vma);
> > +                       }
> >                 } else {
> >                         ret = -ENOSPC;
> >                 }
> > @@ -382,10 +389,9 @@ int i915_gem_evict_for_node(struct
> > i915_address_space *vm,
> >                         break;
> >                 }
> >   
> > -               if (!grab_vma(vma, ww)) {
> > -                       ret = -ENOSPC;
> > +               ret = grab_vma(vma, ww);
> > +               if (ret)
> >                         break;
> > -               }
> >   
> >                 /*
> >                  * Never show fear in the face of dragons!
> > diff --git a/drivers/gpu/drm/i915/i915_gem_ww.c
> > b/drivers/gpu/drm/i915/i915_gem_ww.c
> > index 3f6ff139478e..937b279f50fc 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_ww.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_ww.c
> > @@ -19,16 +19,14 @@ static void i915_gem_ww_ctx_unlock_all(struct
> > i915_gem_ww_ctx *ww)
> >         struct drm_i915_gem_object *obj;
> >   
> >         while ((obj = list_first_entry_or_null(&ww->obj_list,
> > struct drm_i915_gem_object, obj_link))) {
> > -               list_del(&obj->obj_link);
> > -               i915_gem_object_unlock(obj);
> > -               i915_gem_object_put(obj);
> > +               i915_gem_ww_unlock_single(obj);
> >         }
> >   }
> >   
> >   void i915_gem_ww_unlock_single(struct drm_i915_gem_object *obj)
> >   {
> > -       list_del(&obj->obj_link);
> > -       i915_gem_object_unlock(obj);
> > +       list_del_init(&obj->obj_link);
> > +       __i915_gem_object_unlock(obj);
> >         i915_gem_object_put(obj);
> >   }
> >

Mani Milani Nov. 14, 2022, 2:16 a.m. UTC | #3

Thank you for your comments.

To Thomas's point, the crash always seems to happen when the following
sequence of events occurs:

1. When inside "i915_gem_evict_vm()", the call to
"i915_gem_object_trylock(vma->obj, ww)" fails (due to deadlock), and
eviction of a vma is skipped as a result. Basically if the code
reaches here:
https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/i915/i915_gem_evict.c#L468
And here is the stack dump for this scenario:
    Call Trace:
    <TASK>
    dump_stack_lvl+0x68/0x95
    i915_gem_evict_vm+0x1d2/0x369
    eb_validate_vmas+0x54a/0x6ae
    eb_relocate_parse+0x4b/0xdb
    i915_gem_execbuffer2_ioctl+0x6f5/0xab6
    ? i915_gem_object_prepare_write+0xfb/0xfb
    drm_ioctl_kernel+0xda/0x14d
    drm_ioctl+0x27f/0x3b7
    ? i915_gem_object_prepare_write+0xfb/0xfb
    __se_sys_ioctl+0x7a/0xbc
    do_syscall_64+0x56/0xa1
    ? exit_to_user_mode_prepare+0x3d/0x8c
    entry_SYSCALL_64_after_hwframe+0x61/0xcb
    RIP: 0033:0x78302de5fae7
    Code: c0 0f 89 74 ff ff ff 48 83 c4 08 49 c7 c4 ff ff ff ff 5b 4c
89 e0 41 5c 41 5d 5d c3 0f 1f 80 00 00 00 00 b8 10 00 00 00 0f 05 <48>
3d 01 f0 ff ff 73 01 c3 48 8b 0d 51 c3 0c 00 f7 d8 64 89 01 48
    RSP: 002b:00007ffe64b87f78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
    RAX: ffffffffffffffda RBX: 000003cc00470000 RCX: 000078302de5fae7
    RDX: 00007ffe64b87fd0 RSI: 0000000040406469 RDI: 000000000000000d
    RBP: 00007ffe64b87fa0 R08: 0000000000000013 R09: 000003cc004d0950
    R10: 0000000000000200 R11: 0000000000000246 R12: 000000000000000d
    R13: 0000000000000000 R14: 00007ffe64b87fd0 R15: 0000000040406469
    </TASK>
It is worth noting that "i915_gem_evict_vm()" still returns success in
this case.

2. After step 1 occurs, the next call to "grab_vma()" always fails
(with "i915_gem_object_trylock(vma->obj, ww)" failing also due to
deadlock), which then results in the crash.
Here is the stack dump for this scenario:
    Call Trace:
    <TASK>
    dump_stack_lvl+0x68/0x95
    grab_vma+0x6c/0xd0
    i915_gem_evict_for_node+0x178/0x23b
    i915_gem_gtt_reserve+0x5a/0x82
    i915_vma_insert+0x295/0x29e
    i915_vma_pin_ww+0x41e/0x5c7
    eb_validate_vmas+0x5f5/0x6ae
    eb_relocate_parse+0x4b/0xdb
    i915_gem_execbuffer2_ioctl+0x6f5/0xab6
    ? i915_gem_object_prepare_write+0xfb/0xfb
    drm_ioctl_kernel+0xda/0x14d
    drm_ioctl+0x27f/0x3b7
    ? i915_gem_object_prepare_write+0xfb/0xfb
    __se_sys_ioctl+0x7a/0xbc
    do_syscall_64+0x56/0xa1
    ? exit_to_user_mode_prepare+0x3d/0x8c
    entry_SYSCALL_64_after_hwframe+0x61/0xcb
    RIP: 0033:0x78302de5fae7
    Code: c0 0f 89 74 ff ff ff 48 83 c4 08 49 c7 c4 ff ff ff ff 5b 4c
89 e0 41 5c 41 5d 5d c3 0f 1f 80 00 00 00 00 b8 10 00 00 00 0f 05 <48>
3d 01 f0 ff ff 73 01 c3 48 8b 0d 51 c3 0c 00 f7 d8 64 89 01 48
    RSP: 002b:00007ffe64b87f78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
    RAX: ffffffffffffffda RBX: 000003cc00470000 RCX: 000078302de5fae7
    RDX: 00007ffe64b87fd0 RSI: 0000000040406469 RDI: 000000000000000d
    RBP: 00007ffe64b87fa0 R08: 0000000000000013 R09: 000003cc004d0950
    R10: 0000000000000200 R11: 0000000000000246 R12: 000000000000000d
    R13: 0000000000000000 R14: 00007ffe64b87fd0 R15: 0000000040406469
    </TASK>

My Notes:
- I verified the two "i915_gem_object_trylock()" failures I mentioned
above are due to deadlock by slightly modifying the code to call
"i915_gem_object_lock()" only in those exact cases and subsequent to
the trylock failure, only to look at the return error code.
- The two cases mentioned above, are the only cases where
"i915_gem_object_trylock(obj, ww)" is called with the second argument
not being forced to NULL.
- When in either of the two cases above (i.e. inside "grab_vma()" or
"i915_gem_evict_vm") I replace calling "i915_gem_object_trylock" with
"i915_gem_object_lock", the issue gets resolved (because deadlock is
detected and resolved).

So if this could matches the design better, another solution could be
for "grab_vma" to continue to call "i915_gem_object_trylock", but for
"i915_gem_evict_vm" to call "i915_gem_object_lock" instead.

Further info:
- Would you like any further info on the crash? If so, could you
please advise 1) what exactly you need and 2) how I can share with you
especially if it is big dumps?

Thanks.

Thomas Hellstrom Nov. 14, 2022, 12:48 p.m. UTC | #4

Hi, Mani.

On 11/14/22 03:16, Mani Milani wrote:
> Thank you for your comments.
>
> To Thomas's point, the crash always seems to happen when the following
> sequence of events occurs:
>
> 1. When inside "i915_gem_evict_vm()", the call to
> "i915_gem_object_trylock(vma->obj, ww)" fails (due to deadlock), and
> eviction of a vma is skipped as a result. Basically if the code
> reaches here:
> https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/i915/i915_gem_evict.c#L468
> And here is the stack dump for this scenario:
>      Call Trace:
>      <TASK>
>      dump_stack_lvl+0x68/0x95
>      i915_gem_evict_vm+0x1d2/0x369
>      eb_validate_vmas+0x54a/0x6ae
>      eb_relocate_parse+0x4b/0xdb
>      i915_gem_execbuffer2_ioctl+0x6f5/0xab6
>      ? i915_gem_object_prepare_write+0xfb/0xfb
>      drm_ioctl_kernel+0xda/0x14d
>      drm_ioctl+0x27f/0x3b7
>      ? i915_gem_object_prepare_write+0xfb/0xfb
>      __se_sys_ioctl+0x7a/0xbc
>      do_syscall_64+0x56/0xa1
>      ? exit_to_user_mode_prepare+0x3d/0x8c
>      entry_SYSCALL_64_after_hwframe+0x61/0xcb
>      RIP: 0033:0x78302de5fae7
>      Code: c0 0f 89 74 ff ff ff 48 83 c4 08 49 c7 c4 ff ff ff ff 5b 4c
> 89 e0 41 5c 41 5d 5d c3 0f 1f 80 00 00 00 00 b8 10 00 00 00 0f 05 <48>
> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 51 c3 0c 00 f7 d8 64 89 01 48
>      RSP: 002b:00007ffe64b87f78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>      RAX: ffffffffffffffda RBX: 000003cc00470000 RCX: 000078302de5fae7
>      RDX: 00007ffe64b87fd0 RSI: 0000000040406469 RDI: 000000000000000d
>      RBP: 00007ffe64b87fa0 R08: 0000000000000013 R09: 000003cc004d0950
>      R10: 0000000000000200 R11: 0000000000000246 R12: 000000000000000d
>      R13: 0000000000000000 R14: 00007ffe64b87fd0 R15: 0000000040406469
>      </TASK>
> It is worth noting that "i915_gem_evict_vm()" still returns success in
> this case.
>
> 2. After step 1 occurs, the next call to "grab_vma()" always fails
> (with "i915_gem_object_trylock(vma->obj, ww)" failing also due to
> deadlock), which then results in the crash.
> Here is the stack dump for this scenario:
>      Call Trace:
>      <TASK>
>      dump_stack_lvl+0x68/0x95
>      grab_vma+0x6c/0xd0
>      i915_gem_evict_for_node+0x178/0x23b
>      i915_gem_gtt_reserve+0x5a/0x82
>      i915_vma_insert+0x295/0x29e
>      i915_vma_pin_ww+0x41e/0x5c7
>      eb_validate_vmas+0x5f5/0x6ae
>      eb_relocate_parse+0x4b/0xdb
>      i915_gem_execbuffer2_ioctl+0x6f5/0xab6
>      ? i915_gem_object_prepare_write+0xfb/0xfb
>      drm_ioctl_kernel+0xda/0x14d
>      drm_ioctl+0x27f/0x3b7
>      ? i915_gem_object_prepare_write+0xfb/0xfb
>      __se_sys_ioctl+0x7a/0xbc
>      do_syscall_64+0x56/0xa1
>      ? exit_to_user_mode_prepare+0x3d/0x8c
>      entry_SYSCALL_64_after_hwframe+0x61/0xcb
>      RIP: 0033:0x78302de5fae7
>      Code: c0 0f 89 74 ff ff ff 48 83 c4 08 49 c7 c4 ff ff ff ff 5b 4c
> 89 e0 41 5c 41 5d 5d c3 0f 1f 80 00 00 00 00 b8 10 00 00 00 0f 05 <48>
> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 51 c3 0c 00 f7 d8 64 89 01 48
>      RSP: 002b:00007ffe64b87f78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>      RAX: ffffffffffffffda RBX: 000003cc00470000 RCX: 000078302de5fae7
>      RDX: 00007ffe64b87fd0 RSI: 0000000040406469 RDI: 000000000000000d
>      RBP: 00007ffe64b87fa0 R08: 0000000000000013 R09: 000003cc004d0950
>      R10: 0000000000000200 R11: 0000000000000246 R12: 000000000000000d
>      R13: 0000000000000000 R14: 00007ffe64b87fd0 R15: 0000000040406469
>      </TASK>
>
> My Notes:
> - I verified the two "i915_gem_object_trylock()" failures I mentioned
> above are due to deadlock by slightly modifying the code to call
> "i915_gem_object_lock()" only in those exact cases and subsequent to
> the trylock failure, only to look at the return error code.
> - The two cases mentioned above, are the only cases where
> "i915_gem_object_trylock(obj, ww)" is called with the second argument
> not being forced to NULL.
> - When in either of the two cases above (i.e. inside "grab_vma()" or
> "i915_gem_evict_vm") I replace calling "i915_gem_object_trylock" with
> "i915_gem_object_lock", the issue gets resolved (because deadlock is
> detected and resolved).
>
> So if this could matches the design better, another solution could be
> for "grab_vma" to continue to call "i915_gem_object_trylock", but for
> "i915_gem_evict_vm" to call "i915_gem_object_lock" instead.

No, i915_gem_object_lock() is not allowed when the vm mutex is held.


>
> Further info:
> - Would you like any further info on the crash? If so, could you
> please advise 1) what exactly you need and 2) how I can share with you
> especially if it is big dumps?

Yes, I would like to know how the crash manifests itself. Is it a kernel 
BUG or a kernel WARNING or is it the user-space application that crashes 
due to receiveing an -ENOSPC?

Thanks,

Thomas



>
> Thanks.

Mani Milani Nov. 15, 2022, 11:54 p.m. UTC | #5

Hi Thomas,

It is a user-space application that crashes due to receiving an -ENOSPC.

This occurs after code reaches the line below where grab_vma() fails
(due to deadlock) and consequently i915_gem_evict_for_node() returns
-ENOSPC. I provided the call stack for when this happens in my
previous message:
https://github.com/torvalds/linux/blame/59d0d52c30d4991ac4b329f049cc37118e00f5b0/drivers/gpu/drm/i915/i915_gem_evict.c#L386

Context:
This crash is happening on an intel GeminiLake chromebook, when
running a video seek h264 stress test, and it is reproducible 100%. To
troubleshoot, I did a git bisect and found the following commit to be
the culprit (which is when grab_vma() has been introduced):
https://github.com/torvalds/linux/commit/7e00897be8bf13ef9c68c95a8e386b714c29ad95

I also have crash dumps and further logs that I can send you if
needed. But please let me know how to share those with you, since
pasting them here does not seem reasonable to me.

Thanks,
Mani

On Mon, Nov 14, 2022 at 11:48 PM Thomas Hellström
<thomas.hellstrom@linux.intel.com> wrote:
>
> Hi, Mani.
>
> On 11/14/22 03:16, Mani Milani wrote:
> > Thank you for your comments.
> >
> > To Thomas's point, the crash always seems to happen when the following
> > sequence of events occurs:
> >
> > 1. When inside "i915_gem_evict_vm()", the call to
> > "i915_gem_object_trylock(vma->obj, ww)" fails (due to deadlock), and
> > eviction of a vma is skipped as a result. Basically if the code
> > reaches here:
> > https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/i915/i915_gem_evict.c#L468
> > And here is the stack dump for this scenario:
> >      Call Trace:
> >      <TASK>
> >      dump_stack_lvl+0x68/0x95
> >      i915_gem_evict_vm+0x1d2/0x369
> >      eb_validate_vmas+0x54a/0x6ae
> >      eb_relocate_parse+0x4b/0xdb
> >      i915_gem_execbuffer2_ioctl+0x6f5/0xab6
> >      ? i915_gem_object_prepare_write+0xfb/0xfb
> >      drm_ioctl_kernel+0xda/0x14d
> >      drm_ioctl+0x27f/0x3b7
> >      ? i915_gem_object_prepare_write+0xfb/0xfb
> >      __se_sys_ioctl+0x7a/0xbc
> >      do_syscall_64+0x56/0xa1
> >      ? exit_to_user_mode_prepare+0x3d/0x8c
> >      entry_SYSCALL_64_after_hwframe+0x61/0xcb
> >      RIP: 0033:0x78302de5fae7
> >      Code: c0 0f 89 74 ff ff ff 48 83 c4 08 49 c7 c4 ff ff ff ff 5b 4c
> > 89 e0 41 5c 41 5d 5d c3 0f 1f 80 00 00 00 00 b8 10 00 00 00 0f 05 <48>
> > 3d 01 f0 ff ff 73 01 c3 48 8b 0d 51 c3 0c 00 f7 d8 64 89 01 48
> >      RSP: 002b:00007ffe64b87f78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> >      RAX: ffffffffffffffda RBX: 000003cc00470000 RCX: 000078302de5fae7
> >      RDX: 00007ffe64b87fd0 RSI: 0000000040406469 RDI: 000000000000000d
> >      RBP: 00007ffe64b87fa0 R08: 0000000000000013 R09: 000003cc004d0950
> >      R10: 0000000000000200 R11: 0000000000000246 R12: 000000000000000d
> >      R13: 0000000000000000 R14: 00007ffe64b87fd0 R15: 0000000040406469
> >      </TASK>
> > It is worth noting that "i915_gem_evict_vm()" still returns success in
> > this case.
> >
> > 2. After step 1 occurs, the next call to "grab_vma()" always fails
> > (with "i915_gem_object_trylock(vma->obj, ww)" failing also due to
> > deadlock), which then results in the crash.
> > Here is the stack dump for this scenario:
> >      Call Trace:
> >      <TASK>
> >      dump_stack_lvl+0x68/0x95
> >      grab_vma+0x6c/0xd0
> >      i915_gem_evict_for_node+0x178/0x23b
> >      i915_gem_gtt_reserve+0x5a/0x82
> >      i915_vma_insert+0x295/0x29e
> >      i915_vma_pin_ww+0x41e/0x5c7
> >      eb_validate_vmas+0x5f5/0x6ae
> >      eb_relocate_parse+0x4b/0xdb
> >      i915_gem_execbuffer2_ioctl+0x6f5/0xab6
> >      ? i915_gem_object_prepare_write+0xfb/0xfb
> >      drm_ioctl_kernel+0xda/0x14d
> >      drm_ioctl+0x27f/0x3b7
> >      ? i915_gem_object_prepare_write+0xfb/0xfb
> >      __se_sys_ioctl+0x7a/0xbc
> >      do_syscall_64+0x56/0xa1
> >      ? exit_to_user_mode_prepare+0x3d/0x8c
> >      entry_SYSCALL_64_after_hwframe+0x61/0xcb
> >      RIP: 0033:0x78302de5fae7
> >      Code: c0 0f 89 74 ff ff ff 48 83 c4 08 49 c7 c4 ff ff ff ff 5b 4c
> > 89 e0 41 5c 41 5d 5d c3 0f 1f 80 00 00 00 00 b8 10 00 00 00 0f 05 <48>
> > 3d 01 f0 ff ff 73 01 c3 48 8b 0d 51 c3 0c 00 f7 d8 64 89 01 48
> >      RSP: 002b:00007ffe64b87f78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> >      RAX: ffffffffffffffda RBX: 000003cc00470000 RCX: 000078302de5fae7
> >      RDX: 00007ffe64b87fd0 RSI: 0000000040406469 RDI: 000000000000000d
> >      RBP: 00007ffe64b87fa0 R08: 0000000000000013 R09: 000003cc004d0950
> >      R10: 0000000000000200 R11: 0000000000000246 R12: 000000000000000d
> >      R13: 0000000000000000 R14: 00007ffe64b87fd0 R15: 0000000040406469
> >      </TASK>
> >
> > My Notes:
> > - I verified the two "i915_gem_object_trylock()" failures I mentioned
> > above are due to deadlock by slightly modifying the code to call
> > "i915_gem_object_lock()" only in those exact cases and subsequent to
> > the trylock failure, only to look at the return error code.
> > - The two cases mentioned above, are the only cases where
> > "i915_gem_object_trylock(obj, ww)" is called with the second argument
> > not being forced to NULL.
> > - When in either of the two cases above (i.e. inside "grab_vma()" or
> > "i915_gem_evict_vm") I replace calling "i915_gem_object_trylock" with
> > "i915_gem_object_lock", the issue gets resolved (because deadlock is
> > detected and resolved).
> >
> > So if this could matches the design better, another solution could be
> > for "grab_vma" to continue to call "i915_gem_object_trylock", but for
> > "i915_gem_evict_vm" to call "i915_gem_object_lock" instead.
>
> No, i915_gem_object_lock() is not allowed when the vm mutex is held.
>
>
> >
> > Further info:
> > - Would you like any further info on the crash? If so, could you
> > please advise 1) what exactly you need and 2) how I can share with you
> > especially if it is big dumps?
>
> Yes, I would like to know how the crash manifests itself. Is it a kernel
> BUG or a kernel WARNING or is it the user-space application that crashes
> due to receiveing an -ENOSPC?
>
> Thanks,
>
> Thomas
>
>
>
> >
> > Thanks.

drm/i915: Fix unhandled deadlock in grab_vma()

Commit Message

Comments

Patch