diff mbox

drm/i915: Gracefully handle obj not bound to GGTT in is_pin_display

Message ID 1396462861-16396-1-git-send-email-oscar.mateo@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

oscar.mateo@intel.com April 2, 2014, 6:21 p.m. UTC
From: Oscar Mateo <oscar.mateo@intel.com>

Otherwise, we do a NULL pointer dereference.

I've seen this happen while handling an error in
i915_gem_object_pin_to_display_plane():

If i915_gem_object_set_cache_level() fails, we call is_pin_display() to
handle the error. At this point, the object is still not pinned to GGTT
and maybe not even bound, so we have to check before we dereference its
GGTT vma.

Issue: VIZ-3772
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Chris Wilson April 2, 2014, 5:59 p.m. UTC | #1
On Wed, Apr 02, 2014 at 07:21:01PM +0100, oscar.mateo@intel.com wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> Otherwise, we do a NULL pointer dereference.

Or we could just do
was_pin_display = obj->pin_display;
obj->pin_display = true;

err_unpin_display:
obj->pin_display = was_pin_display;

And then we wouldn't even need the essay... However, it is almost a
theory of operation and quite useful...
-Chris
Daniel Vetter April 3, 2014, 9:34 a.m. UTC | #2
On Wed, Apr 02, 2014 at 07:21:01PM +0100, oscar.mateo@intel.com wrote:
> From: Oscar Mateo <oscar.mateo@intel.com>
> 
> Otherwise, we do a NULL pointer dereference.
> 
> I've seen this happen while handling an error in
> i915_gem_object_pin_to_display_plane():
> 
> If i915_gem_object_set_cache_level() fails, we call is_pin_display() to
> handle the error. At this point, the object is still not pinned to GGTT
> and maybe not even bound, so we have to check before we dereference its
> GGTT vma.
> 
> Issue: VIZ-3772
> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>

Have you looked into provoking this with an igt testcase? On a hunch a
busy load (to extend the race window) plus the usual interruptor trick to
jump out of wait_seqno calls should be able to make this go kaboom on
command. But I haven't analyzed the bug in detail.

Thanks, Daniel
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index c70121d..1d161c7 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3619,6 +3619,10 @@ unlock:
>  
>  static bool is_pin_display(struct drm_i915_gem_object *obj)
>  {
> +	struct i915_vma *vma = i915_gem_obj_to_ggtt(obj);
> +	if (!vma)
> +		return false;
> +
>  	/* There are 3 sources that pin objects:
>  	 *   1. The display engine (scanouts, sprites, cursors);
>  	 *   2. Reservations for execbuffer;
> @@ -3630,7 +3634,7 @@ static bool is_pin_display(struct drm_i915_gem_object *obj)
>  	 * subtracting the potential reference by the user, any pin_count
>  	 * remains, it must be due to another use by the display engine.
>  	 */
> -	return i915_gem_obj_to_ggtt(obj)->pin_count - !!obj->user_pin_count;
> +	return vma->pin_count - !!obj->user_pin_count;
>  }
>  
>  /*
> -- 
> 1.9.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
oscar.mateo@intel.com May 12, 2014, 9:05 a.m. UTC | #3
Hi Daniel,

Sorry, this fell through the cracks:

> Subject: Re: [Intel-gfx] [PATCH] drm/i915: Gracefully handle obj not bound to
> GGTT in is_pin_display
> 
> On Wed, Apr 02, 2014 at 07:21:01PM +0100, oscar.mateo@intel.com wrote:
> > From: Oscar Mateo <oscar.mateo@intel.com>
> >
> > Otherwise, we do a NULL pointer dereference.
> >
> > I've seen this happen while handling an error in
> > i915_gem_object_pin_to_display_plane():
> >
> > If i915_gem_object_set_cache_level() fails, we call is_pin_display()
> > to handle the error. At this point, the object is still not pinned to
> > GGTT and maybe not even bound, so we have to check before we
> > dereference its GGTT vma.
> >
> > Issue: VIZ-3772
> > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> 
> Have you looked into provoking this with an igt testcase? On a hunch a busy
> load (to extend the race window) plus the usual interruptor trick to jump out of
> wait_seqno calls should be able to make this go kaboom on command. But I
> haven't analyzed the bug in detail.

AFAICT, the only sequence where this likely to happen (because we are handling a recently created object) is:

intelfb_alloc -> intel_pin_and_fence_fb_obj -> i915_gem_object_pin_to_display_plane -> i915_gem_object_set_cache_level -> is_pin_display

Now, I think intelfb_alloc only happens during bootup, so reproducing it on IGT would be difficult.
I still feel the fix is valid though :)

-- Oscar
Chris Wilson May 12, 2014, 10:09 a.m. UTC | #4
On Mon, May 12, 2014 at 09:05:45AM +0000, Mateo Lozano, Oscar wrote:
> Hi Daniel,
> 
> Sorry, this fell through the cracks:
> 
> > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Gracefully handle obj not bound to
> > GGTT in is_pin_display
> > 
> > On Wed, Apr 02, 2014 at 07:21:01PM +0100, oscar.mateo@intel.com wrote:
> > > From: Oscar Mateo <oscar.mateo@intel.com>
> > >
> > > Otherwise, we do a NULL pointer dereference.
> > >
> > > I've seen this happen while handling an error in
> > > i915_gem_object_pin_to_display_plane():
> > >
> > > If i915_gem_object_set_cache_level() fails, we call is_pin_display()
> > > to handle the error. At this point, the object is still not pinned to
> > > GGTT and maybe not even bound, so we have to check before we
> > > dereference its GGTT vma.
> > >
> > > Issue: VIZ-3772
> > > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > 
> > Have you looked into provoking this with an igt testcase? On a hunch a busy
> > load (to extend the race window) plus the usual interruptor trick to jump out of
> > wait_seqno calls should be able to make this go kaboom on command. But I
> > haven't analyzed the bug in detail.
> 
> AFAICT, the only sequence where this likely to happen (because we are handling a recently created object) is:
> 
> intelfb_alloc -> intel_pin_and_fence_fb_obj -> i915_gem_object_pin_to_display_plane -> i915_gem_object_set_cache_level -> is_pin_display
> 
> Now, I think intelfb_alloc only happens during bootup, so reproducing it on IGT would be difficult.
> I still feel the fix is valid though :)

Did you consider my alternative fix of restoring the old value in the
error path?
-Chris
oscar.mateo@intel.com May 12, 2014, 10:30 a.m. UTC | #5
---------------------------------------------------------------------
Intel Corporation (UK) Limited
Registered No. 1134945 (England)
Registered Office: Pipers Way, Swindon SN3 1RJ
VAT No: 860 2173 47


> -----Original Message-----
> From: Chris Wilson [mailto:chris@chris-wilson.co.uk]
> Sent: Monday, May 12, 2014 11:09 AM
> To: Mateo Lozano, Oscar
> Cc: Daniel Vetter; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH] drm/i915: Gracefully handle obj not bound to
> GGTT in is_pin_display
> 
> On Mon, May 12, 2014 at 09:05:45AM +0000, Mateo Lozano, Oscar wrote:
> > Hi Daniel,
> >
> > Sorry, this fell through the cracks:
> >
> > > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Gracefully handle obj not
> > > bound to GGTT in is_pin_display
> > >
> > > On Wed, Apr 02, 2014 at 07:21:01PM +0100, oscar.mateo@intel.com
> wrote:
> > > > From: Oscar Mateo <oscar.mateo@intel.com>
> > > >
> > > > Otherwise, we do a NULL pointer dereference.
> > > >
> > > > I've seen this happen while handling an error in
> > > > i915_gem_object_pin_to_display_plane():
> > > >
> > > > If i915_gem_object_set_cache_level() fails, we call
> > > > is_pin_display() to handle the error. At this point, the object is
> > > > still not pinned to GGTT and maybe not even bound, so we have to
> > > > check before we dereference its GGTT vma.
> > > >
> > > > Issue: VIZ-3772
> > > > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > >
> > > Have you looked into provoking this with an igt testcase? On a hunch
> > > a busy load (to extend the race window) plus the usual interruptor
> > > trick to jump out of wait_seqno calls should be able to make this go
> > > kaboom on command. But I haven't analyzed the bug in detail.
> >
> > AFAICT, the only sequence where this likely to happen (because we are
> handling a recently created object) is:
> >
> > intelfb_alloc -> intel_pin_and_fence_fb_obj ->
> > i915_gem_object_pin_to_display_plane ->
> > i915_gem_object_set_cache_level -> is_pin_display
> >
> > Now, I think intelfb_alloc only happens during bootup, so reproducing it on
> IGT would be difficult.
> > I still feel the fix is valid though :)
> 
> Did you consider my alternative fix of restoring the old value in the error path?
> -Chris

Is that directed to Daniel or me? Restoring the old value is way easier, but I thought you wanted to keep is_pin_display as a theory of operation? And Daniel might still want an IGT test, I suppose...
Chris Wilson May 12, 2014, 10:37 a.m. UTC | #6
On Mon, May 12, 2014 at 10:30:11AM +0000, Mateo Lozano, Oscar wrote:
> > Did you consider my alternative fix of restoring the old value in the error path?
> 
> Is that directed to Daniel or me? Restoring the old value is way easier, but I thought you wanted to keep is_pin_display as a theory of operation? And Daniel might still want an IGT test, I suppose...
> 

I'd like to be able to kill is_pin_display(), it's very fragile but I
don't have a better alternative yet.
-Chris
Daniel Vetter May 12, 2014, 4:11 p.m. UTC | #7
On Mon, May 12, 2014 at 09:05:45AM +0000, Mateo Lozano, Oscar wrote:
> Hi Daniel,
> 
> Sorry, this fell through the cracks:
> 
> > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Gracefully handle obj not bound to
> > GGTT in is_pin_display
> > 
> > On Wed, Apr 02, 2014 at 07:21:01PM +0100, oscar.mateo@intel.com wrote:
> > > From: Oscar Mateo <oscar.mateo@intel.com>
> > >
> > > Otherwise, we do a NULL pointer dereference.
> > >
> > > I've seen this happen while handling an error in
> > > i915_gem_object_pin_to_display_plane():
> > >
> > > If i915_gem_object_set_cache_level() fails, we call is_pin_display()
> > > to handle the error. At this point, the object is still not pinned to
> > > GGTT and maybe not even bound, so we have to check before we
> > > dereference its GGTT vma.
> > >
> > > Issue: VIZ-3772
> > > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > 
> > Have you looked into provoking this with an igt testcase? On a hunch a busy
> > load (to extend the race window) plus the usual interruptor trick to jump out of
> > wait_seqno calls should be able to make this go kaboom on command. But I
> > haven't analyzed the bug in detail.
> 
> AFAICT, the only sequence where this likely to happen (because we are handling a recently created object) is:
> 
> intelfb_alloc -> intel_pin_and_fence_fb_obj -> i915_gem_object_pin_to_display_plane -> i915_gem_object_set_cache_level -> is_pin_display

Pageflipping to a freshly allocated BO without ever touching it beforehand
should be able to achive the same. If this is really all that's needed.

But looking at the code a better way should be:
1. Create new bo, wrap it in a kms fb.
2. Slap busy load onto that bo, e.g. reapeatedly fill it with the blitter.
3. Enable evil interruptor (igt_fork_signal_helper).
4. Submit pageflip

-> Boom since the set_cache_level will block, get interrupted and exit
early with -EINTR.

Given sufficient overkill in 2. this should be 100% reliable to reproduce.

Cheers, Daniel
Daniel Vetter May 12, 2014, 4:14 p.m. UTC | #8
On Mon, May 12, 2014 at 06:11:18PM +0200, Daniel Vetter wrote:
> On Mon, May 12, 2014 at 09:05:45AM +0000, Mateo Lozano, Oscar wrote:
> > Hi Daniel,
> > 
> > Sorry, this fell through the cracks:
> > 
> > > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Gracefully handle obj not bound to
> > > GGTT in is_pin_display
> > > 
> > > On Wed, Apr 02, 2014 at 07:21:01PM +0100, oscar.mateo@intel.com wrote:
> > > > From: Oscar Mateo <oscar.mateo@intel.com>
> > > >
> > > > Otherwise, we do a NULL pointer dereference.
> > > >
> > > > I've seen this happen while handling an error in
> > > > i915_gem_object_pin_to_display_plane():
> > > >
> > > > If i915_gem_object_set_cache_level() fails, we call is_pin_display()
> > > > to handle the error. At this point, the object is still not pinned to
> > > > GGTT and maybe not even bound, so we have to check before we
> > > > dereference its GGTT vma.
> > > >
> > > > Issue: VIZ-3772
> > > > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
> > > 
> > > Have you looked into provoking this with an igt testcase? On a hunch a busy
> > > load (to extend the race window) plus the usual interruptor trick to jump out of
> > > wait_seqno calls should be able to make this go kaboom on command. But I
> > > haven't analyzed the bug in detail.
> > 
> > AFAICT, the only sequence where this likely to happen (because we are handling a recently created object) is:
> > 
> > intelfb_alloc -> intel_pin_and_fence_fb_obj -> i915_gem_object_pin_to_display_plane -> i915_gem_object_set_cache_level -> is_pin_display
> 
> Pageflipping to a freshly allocated BO without ever touching it beforehand
> should be able to achive the same. If this is really all that's needed.
> 
> But looking at the code a better way should be:
> 1. Create new bo, wrap it in a kms fb.
> 2. Slap busy load onto that bo, e.g. reapeatedly fill it with the blitter.
> 3. Enable evil interruptor (igt_fork_signal_helper).
> 4. Submit pageflip
> 
> -> Boom since the set_cache_level will block, get interrupted and exit
> early with -EINTR.
> 
> Given sufficient overkill in 2. this should be 100% reliable to reproduce.

Aside: Those kinds of tricks are a big reason why I think igts aren't just
useful as testcases, but also to really understand how a bug comes about.
At least ime finally figuring out the last ingredient to make an igt fully
reliably often resulted in a suddenly much clearer understanding of the
bug at hand.

I call this "review by asking for an igt" ;-)
-Daniel
oscar.mateo@intel.com May 12, 2014, 5:10 p.m. UTC | #9
> I call this "review by asking for an igt" ;-) -Daniel

Ok, I´ll give it a try. At least I will learn something about the kms code, a.k.a. "learning by igt" :D
oscar.mateo@intel.com May 15, 2014, 1:14 p.m. UTC | #10
> > But looking at the code a better way should be:
> > 1. Create new bo, wrap it in a kms fb.
> > 2. Slap busy load onto that bo, e.g. reapeatedly fill it with the blitter.
> > 3. Enable evil interruptor (igt_fork_signal_helper).
> > 4. Submit pageflip
> >
> > -> Boom since the set_cache_level will block, get interrupted and exit
> > early with -EINTR.
> >
> > Given sufficient overkill in 2. this should be 100% reliable to reproduce.

As soon as I execbuffer to the bo, it gets a vma for the GGTT vm:

	vm = ctx->vm;
	if (!USES_FULL_PPGTT(dev))
		vm = &dev_priv->gtt.base;

	...

	/* Look up object handles */
	ret = eb_lookup_vmas(eb, exec, args, vm, file);
	if (ret)
		goto err;

And then it becomes impossible to reproduce the problem :(
Is there any other trick to make set_cache_level fail?

-- Oscar
Ville Syrjala May 15, 2014, 1:33 p.m. UTC | #11
On Thu, May 15, 2014 at 01:14:54PM +0000, Mateo Lozano, Oscar wrote:
> > > But looking at the code a better way should be:
> > > 1. Create new bo, wrap it in a kms fb.
> > > 2. Slap busy load onto that bo, e.g. reapeatedly fill it with the blitter.
> > > 3. Enable evil interruptor (igt_fork_signal_helper).
> > > 4. Submit pageflip
> > >
> > > -> Boom since the set_cache_level will block, get interrupted and exit
> > > early with -EINTR.
> > >
> > > Given sufficient overkill in 2. this should be 100% reliable to reproduce.
> 
> As soon as I execbuffer to the bo, it gets a vma for the GGTT vm:
> 
> 	vm = ctx->vm;
> 	if (!USES_FULL_PPGTT(dev))
> 		vm = &dev_priv->gtt.base;
> 
> 	...
> 
> 	/* Look up object handles */
> 	ret = eb_lookup_vmas(eb, exec, args, vm, file);
> 	if (ret)
> 		goto err;
> 
> And then it becomes impossible to reproduce the problem :(
> Is there any other trick to make set_cache_level fail?

What if you make the pin_to_ggtt fail instead? Can you create an object
that's too big for the ggtt?
Daniel Vetter May 15, 2014, 1:45 p.m. UTC | #12
On Thu, May 15, 2014 at 01:14:54PM +0000, Mateo Lozano, Oscar wrote:
> > > But looking at the code a better way should be:
> > > 1. Create new bo, wrap it in a kms fb.
> > > 2. Slap busy load onto that bo, e.g. reapeatedly fill it with the blitter.
> > > 3. Enable evil interruptor (igt_fork_signal_helper).
> > > 4. Submit pageflip
> > >
> > > -> Boom since the set_cache_level will block, get interrupted and exit
> > > early with -EINTR.
> > >
> > > Given sufficient overkill in 2. this should be 100% reliable to reproduce.
> 
> As soon as I execbuffer to the bo, it gets a vma for the GGTT vm:
> 
> 	vm = ctx->vm;
> 	if (!USES_FULL_PPGTT(dev))
> 		vm = &dev_priv->gtt.base;
> 
> 	...
> 
> 	/* Look up object handles */
> 	ret = eb_lookup_vmas(eb, exec, args, vm, file);
> 	if (ret)
> 		goto err;
> 
> And then it becomes impossible to reproduce the problem :(
> Is there any other trick to make set_cache_level fail?

i915.ppgtt=2 should still make this blow up. The bug kinda doesn't exist
without full ppgtt I think ...
-Daniel
oscar.mateo@intel.com May 16, 2014, 10:43 a.m. UTC | #13
> -----Original Message-----
> From: Ville Syrjälä [mailto:ville.syrjala@linux.intel.com]
> Sent: Thursday, May 15, 2014 2:34 PM
> To: Mateo Lozano, Oscar
> Cc: Daniel Vetter; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH] drm/i915: Gracefully handle obj not bound to
> GGTT in is_pin_display
> 
> On Thu, May 15, 2014 at 01:14:54PM +0000, Mateo Lozano, Oscar wrote:
> > > > But looking at the code a better way should be:
> > > > 1. Create new bo, wrap it in a kms fb.
> > > > 2. Slap busy load onto that bo, e.g. reapeatedly fill it with the blitter.
> > > > 3. Enable evil interruptor (igt_fork_signal_helper).
> > > > 4. Submit pageflip
> > > >
> > > > -> Boom since the set_cache_level will block, get interrupted and
> > > > -> exit
> > > > early with -EINTR.
> > > >
> > > > Given sufficient overkill in 2. this should be 100% reliable to reproduce.
> >
> > As soon as I execbuffer to the bo, it gets a vma for the GGTT vm:
> >
> > 	vm = ctx->vm;
> > 	if (!USES_FULL_PPGTT(dev))
> > 		vm = &dev_priv->gtt.base;
> >
> > 	...
> >
> > 	/* Look up object handles */
> > 	ret = eb_lookup_vmas(eb, exec, args, vm, file);
> > 	if (ret)
> > 		goto err;
> >
> > And then it becomes impossible to reproduce the problem :( Is there
> > any other trick to make set_cache_level fail?
> 
> What if you make the pin_to_ggtt fail instead? Can you create an object that's
> too big for the ggtt?

Thanks Ville: that worked like a charm ;)
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c70121d..1d161c7 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3619,6 +3619,10 @@  unlock:
 
 static bool is_pin_display(struct drm_i915_gem_object *obj)
 {
+	struct i915_vma *vma = i915_gem_obj_to_ggtt(obj);
+	if (!vma)
+		return false;
+
 	/* There are 3 sources that pin objects:
 	 *   1. The display engine (scanouts, sprites, cursors);
 	 *   2. Reservations for execbuffer;
@@ -3630,7 +3634,7 @@  static bool is_pin_display(struct drm_i915_gem_object *obj)
 	 * subtracting the potential reference by the user, any pin_count
 	 * remains, it must be due to another use by the display engine.
 	 */
-	return i915_gem_obj_to_ggtt(obj)->pin_count - !!obj->user_pin_count;
+	return vma->pin_count - !!obj->user_pin_count;
 }
 
 /*