Message ID | 1396462861-16396-1-git-send-email-oscar.mateo@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Apr 02, 2014 at 07:21:01PM +0100, oscar.mateo@intel.com wrote: > From: Oscar Mateo <oscar.mateo@intel.com> > > Otherwise, we do a NULL pointer dereference. Or we could just do was_pin_display = obj->pin_display; obj->pin_display = true; err_unpin_display: obj->pin_display = was_pin_display; And then we wouldn't even need the essay... However, it is almost a theory of operation and quite useful... -Chris
On Wed, Apr 02, 2014 at 07:21:01PM +0100, oscar.mateo@intel.com wrote: > From: Oscar Mateo <oscar.mateo@intel.com> > > Otherwise, we do a NULL pointer dereference. > > I've seen this happen while handling an error in > i915_gem_object_pin_to_display_plane(): > > If i915_gem_object_set_cache_level() fails, we call is_pin_display() to > handle the error. At this point, the object is still not pinned to GGTT > and maybe not even bound, so we have to check before we dereference its > GGTT vma. > > Issue: VIZ-3772 > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Have you looked into provoking this with an igt testcase? On a hunch a busy load (to extend the race window) plus the usual interruptor trick to jump out of wait_seqno calls should be able to make this go kaboom on command. But I haven't analyzed the bug in detail. Thanks, Daniel > --- > drivers/gpu/drm/i915/i915_gem.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > index c70121d..1d161c7 100644 > --- a/drivers/gpu/drm/i915/i915_gem.c > +++ b/drivers/gpu/drm/i915/i915_gem.c > @@ -3619,6 +3619,10 @@ unlock: > > static bool is_pin_display(struct drm_i915_gem_object *obj) > { > + struct i915_vma *vma = i915_gem_obj_to_ggtt(obj); > + if (!vma) > + return false; > + > /* There are 3 sources that pin objects: > * 1. The display engine (scanouts, sprites, cursors); > * 2. Reservations for execbuffer; > @@ -3630,7 +3634,7 @@ static bool is_pin_display(struct drm_i915_gem_object *obj) > * subtracting the potential reference by the user, any pin_count > * remains, it must be due to another use by the display engine. > */ > - return i915_gem_obj_to_ggtt(obj)->pin_count - !!obj->user_pin_count; > + return vma->pin_count - !!obj->user_pin_count; > } > > /* > -- > 1.9.0 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Hi Daniel, Sorry, this fell through the cracks: > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Gracefully handle obj not bound to > GGTT in is_pin_display > > On Wed, Apr 02, 2014 at 07:21:01PM +0100, oscar.mateo@intel.com wrote: > > From: Oscar Mateo <oscar.mateo@intel.com> > > > > Otherwise, we do a NULL pointer dereference. > > > > I've seen this happen while handling an error in > > i915_gem_object_pin_to_display_plane(): > > > > If i915_gem_object_set_cache_level() fails, we call is_pin_display() > > to handle the error. At this point, the object is still not pinned to > > GGTT and maybe not even bound, so we have to check before we > > dereference its GGTT vma. > > > > Issue: VIZ-3772 > > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> > > Have you looked into provoking this with an igt testcase? On a hunch a busy > load (to extend the race window) plus the usual interruptor trick to jump out of > wait_seqno calls should be able to make this go kaboom on command. But I > haven't analyzed the bug in detail. AFAICT, the only sequence where this likely to happen (because we are handling a recently created object) is: intelfb_alloc -> intel_pin_and_fence_fb_obj -> i915_gem_object_pin_to_display_plane -> i915_gem_object_set_cache_level -> is_pin_display Now, I think intelfb_alloc only happens during bootup, so reproducing it on IGT would be difficult. I still feel the fix is valid though :) -- Oscar
On Mon, May 12, 2014 at 09:05:45AM +0000, Mateo Lozano, Oscar wrote: > Hi Daniel, > > Sorry, this fell through the cracks: > > > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Gracefully handle obj not bound to > > GGTT in is_pin_display > > > > On Wed, Apr 02, 2014 at 07:21:01PM +0100, oscar.mateo@intel.com wrote: > > > From: Oscar Mateo <oscar.mateo@intel.com> > > > > > > Otherwise, we do a NULL pointer dereference. > > > > > > I've seen this happen while handling an error in > > > i915_gem_object_pin_to_display_plane(): > > > > > > If i915_gem_object_set_cache_level() fails, we call is_pin_display() > > > to handle the error. At this point, the object is still not pinned to > > > GGTT and maybe not even bound, so we have to check before we > > > dereference its GGTT vma. > > > > > > Issue: VIZ-3772 > > > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> > > > > Have you looked into provoking this with an igt testcase? On a hunch a busy > > load (to extend the race window) plus the usual interruptor trick to jump out of > > wait_seqno calls should be able to make this go kaboom on command. But I > > haven't analyzed the bug in detail. > > AFAICT, the only sequence where this likely to happen (because we are handling a recently created object) is: > > intelfb_alloc -> intel_pin_and_fence_fb_obj -> i915_gem_object_pin_to_display_plane -> i915_gem_object_set_cache_level -> is_pin_display > > Now, I think intelfb_alloc only happens during bootup, so reproducing it on IGT would be difficult. > I still feel the fix is valid though :) Did you consider my alternative fix of restoring the old value in the error path? -Chris
--------------------------------------------------------------------- Intel Corporation (UK) Limited Registered No. 1134945 (England) Registered Office: Pipers Way, Swindon SN3 1RJ VAT No: 860 2173 47 > -----Original Message----- > From: Chris Wilson [mailto:chris@chris-wilson.co.uk] > Sent: Monday, May 12, 2014 11:09 AM > To: Mateo Lozano, Oscar > Cc: Daniel Vetter; intel-gfx@lists.freedesktop.org > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Gracefully handle obj not bound to > GGTT in is_pin_display > > On Mon, May 12, 2014 at 09:05:45AM +0000, Mateo Lozano, Oscar wrote: > > Hi Daniel, > > > > Sorry, this fell through the cracks: > > > > > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Gracefully handle obj not > > > bound to GGTT in is_pin_display > > > > > > On Wed, Apr 02, 2014 at 07:21:01PM +0100, oscar.mateo@intel.com > wrote: > > > > From: Oscar Mateo <oscar.mateo@intel.com> > > > > > > > > Otherwise, we do a NULL pointer dereference. > > > > > > > > I've seen this happen while handling an error in > > > > i915_gem_object_pin_to_display_plane(): > > > > > > > > If i915_gem_object_set_cache_level() fails, we call > > > > is_pin_display() to handle the error. At this point, the object is > > > > still not pinned to GGTT and maybe not even bound, so we have to > > > > check before we dereference its GGTT vma. > > > > > > > > Issue: VIZ-3772 > > > > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> > > > > > > Have you looked into provoking this with an igt testcase? On a hunch > > > a busy load (to extend the race window) plus the usual interruptor > > > trick to jump out of wait_seqno calls should be able to make this go > > > kaboom on command. But I haven't analyzed the bug in detail. > > > > AFAICT, the only sequence where this likely to happen (because we are > handling a recently created object) is: > > > > intelfb_alloc -> intel_pin_and_fence_fb_obj -> > > i915_gem_object_pin_to_display_plane -> > > i915_gem_object_set_cache_level -> is_pin_display > > > > Now, I think intelfb_alloc only happens during bootup, so reproducing it on > IGT would be difficult. > > I still feel the fix is valid though :) > > Did you consider my alternative fix of restoring the old value in the error path? > -Chris Is that directed to Daniel or me? Restoring the old value is way easier, but I thought you wanted to keep is_pin_display as a theory of operation? And Daniel might still want an IGT test, I suppose...
On Mon, May 12, 2014 at 10:30:11AM +0000, Mateo Lozano, Oscar wrote: > > Did you consider my alternative fix of restoring the old value in the error path? > > Is that directed to Daniel or me? Restoring the old value is way easier, but I thought you wanted to keep is_pin_display as a theory of operation? And Daniel might still want an IGT test, I suppose... > I'd like to be able to kill is_pin_display(), it's very fragile but I don't have a better alternative yet. -Chris
On Mon, May 12, 2014 at 09:05:45AM +0000, Mateo Lozano, Oscar wrote: > Hi Daniel, > > Sorry, this fell through the cracks: > > > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Gracefully handle obj not bound to > > GGTT in is_pin_display > > > > On Wed, Apr 02, 2014 at 07:21:01PM +0100, oscar.mateo@intel.com wrote: > > > From: Oscar Mateo <oscar.mateo@intel.com> > > > > > > Otherwise, we do a NULL pointer dereference. > > > > > > I've seen this happen while handling an error in > > > i915_gem_object_pin_to_display_plane(): > > > > > > If i915_gem_object_set_cache_level() fails, we call is_pin_display() > > > to handle the error. At this point, the object is still not pinned to > > > GGTT and maybe not even bound, so we have to check before we > > > dereference its GGTT vma. > > > > > > Issue: VIZ-3772 > > > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> > > > > Have you looked into provoking this with an igt testcase? On a hunch a busy > > load (to extend the race window) plus the usual interruptor trick to jump out of > > wait_seqno calls should be able to make this go kaboom on command. But I > > haven't analyzed the bug in detail. > > AFAICT, the only sequence where this likely to happen (because we are handling a recently created object) is: > > intelfb_alloc -> intel_pin_and_fence_fb_obj -> i915_gem_object_pin_to_display_plane -> i915_gem_object_set_cache_level -> is_pin_display Pageflipping to a freshly allocated BO without ever touching it beforehand should be able to achive the same. If this is really all that's needed. But looking at the code a better way should be: 1. Create new bo, wrap it in a kms fb. 2. Slap busy load onto that bo, e.g. reapeatedly fill it with the blitter. 3. Enable evil interruptor (igt_fork_signal_helper). 4. Submit pageflip -> Boom since the set_cache_level will block, get interrupted and exit early with -EINTR. Given sufficient overkill in 2. this should be 100% reliable to reproduce. Cheers, Daniel
On Mon, May 12, 2014 at 06:11:18PM +0200, Daniel Vetter wrote: > On Mon, May 12, 2014 at 09:05:45AM +0000, Mateo Lozano, Oscar wrote: > > Hi Daniel, > > > > Sorry, this fell through the cracks: > > > > > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Gracefully handle obj not bound to > > > GGTT in is_pin_display > > > > > > On Wed, Apr 02, 2014 at 07:21:01PM +0100, oscar.mateo@intel.com wrote: > > > > From: Oscar Mateo <oscar.mateo@intel.com> > > > > > > > > Otherwise, we do a NULL pointer dereference. > > > > > > > > I've seen this happen while handling an error in > > > > i915_gem_object_pin_to_display_plane(): > > > > > > > > If i915_gem_object_set_cache_level() fails, we call is_pin_display() > > > > to handle the error. At this point, the object is still not pinned to > > > > GGTT and maybe not even bound, so we have to check before we > > > > dereference its GGTT vma. > > > > > > > > Issue: VIZ-3772 > > > > Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> > > > > > > Have you looked into provoking this with an igt testcase? On a hunch a busy > > > load (to extend the race window) plus the usual interruptor trick to jump out of > > > wait_seqno calls should be able to make this go kaboom on command. But I > > > haven't analyzed the bug in detail. > > > > AFAICT, the only sequence where this likely to happen (because we are handling a recently created object) is: > > > > intelfb_alloc -> intel_pin_and_fence_fb_obj -> i915_gem_object_pin_to_display_plane -> i915_gem_object_set_cache_level -> is_pin_display > > Pageflipping to a freshly allocated BO without ever touching it beforehand > should be able to achive the same. If this is really all that's needed. > > But looking at the code a better way should be: > 1. Create new bo, wrap it in a kms fb. > 2. Slap busy load onto that bo, e.g. reapeatedly fill it with the blitter. > 3. Enable evil interruptor (igt_fork_signal_helper). > 4. Submit pageflip > > -> Boom since the set_cache_level will block, get interrupted and exit > early with -EINTR. > > Given sufficient overkill in 2. this should be 100% reliable to reproduce. Aside: Those kinds of tricks are a big reason why I think igts aren't just useful as testcases, but also to really understand how a bug comes about. At least ime finally figuring out the last ingredient to make an igt fully reliably often resulted in a suddenly much clearer understanding of the bug at hand. I call this "review by asking for an igt" ;-) -Daniel
> I call this "review by asking for an igt" ;-) -Daniel
Ok, I´ll give it a try. At least I will learn something about the kms code, a.k.a. "learning by igt" :D
> > But looking at the code a better way should be: > > 1. Create new bo, wrap it in a kms fb. > > 2. Slap busy load onto that bo, e.g. reapeatedly fill it with the blitter. > > 3. Enable evil interruptor (igt_fork_signal_helper). > > 4. Submit pageflip > > > > -> Boom since the set_cache_level will block, get interrupted and exit > > early with -EINTR. > > > > Given sufficient overkill in 2. this should be 100% reliable to reproduce. As soon as I execbuffer to the bo, it gets a vma for the GGTT vm: vm = ctx->vm; if (!USES_FULL_PPGTT(dev)) vm = &dev_priv->gtt.base; ... /* Look up object handles */ ret = eb_lookup_vmas(eb, exec, args, vm, file); if (ret) goto err; And then it becomes impossible to reproduce the problem :( Is there any other trick to make set_cache_level fail? -- Oscar
On Thu, May 15, 2014 at 01:14:54PM +0000, Mateo Lozano, Oscar wrote: > > > But looking at the code a better way should be: > > > 1. Create new bo, wrap it in a kms fb. > > > 2. Slap busy load onto that bo, e.g. reapeatedly fill it with the blitter. > > > 3. Enable evil interruptor (igt_fork_signal_helper). > > > 4. Submit pageflip > > > > > > -> Boom since the set_cache_level will block, get interrupted and exit > > > early with -EINTR. > > > > > > Given sufficient overkill in 2. this should be 100% reliable to reproduce. > > As soon as I execbuffer to the bo, it gets a vma for the GGTT vm: > > vm = ctx->vm; > if (!USES_FULL_PPGTT(dev)) > vm = &dev_priv->gtt.base; > > ... > > /* Look up object handles */ > ret = eb_lookup_vmas(eb, exec, args, vm, file); > if (ret) > goto err; > > And then it becomes impossible to reproduce the problem :( > Is there any other trick to make set_cache_level fail? What if you make the pin_to_ggtt fail instead? Can you create an object that's too big for the ggtt?
On Thu, May 15, 2014 at 01:14:54PM +0000, Mateo Lozano, Oscar wrote: > > > But looking at the code a better way should be: > > > 1. Create new bo, wrap it in a kms fb. > > > 2. Slap busy load onto that bo, e.g. reapeatedly fill it with the blitter. > > > 3. Enable evil interruptor (igt_fork_signal_helper). > > > 4. Submit pageflip > > > > > > -> Boom since the set_cache_level will block, get interrupted and exit > > > early with -EINTR. > > > > > > Given sufficient overkill in 2. this should be 100% reliable to reproduce. > > As soon as I execbuffer to the bo, it gets a vma for the GGTT vm: > > vm = ctx->vm; > if (!USES_FULL_PPGTT(dev)) > vm = &dev_priv->gtt.base; > > ... > > /* Look up object handles */ > ret = eb_lookup_vmas(eb, exec, args, vm, file); > if (ret) > goto err; > > And then it becomes impossible to reproduce the problem :( > Is there any other trick to make set_cache_level fail? i915.ppgtt=2 should still make this blow up. The bug kinda doesn't exist without full ppgtt I think ... -Daniel
> -----Original Message----- > From: Ville Syrjälä [mailto:ville.syrjala@linux.intel.com] > Sent: Thursday, May 15, 2014 2:34 PM > To: Mateo Lozano, Oscar > Cc: Daniel Vetter; intel-gfx@lists.freedesktop.org > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Gracefully handle obj not bound to > GGTT in is_pin_display > > On Thu, May 15, 2014 at 01:14:54PM +0000, Mateo Lozano, Oscar wrote: > > > > But looking at the code a better way should be: > > > > 1. Create new bo, wrap it in a kms fb. > > > > 2. Slap busy load onto that bo, e.g. reapeatedly fill it with the blitter. > > > > 3. Enable evil interruptor (igt_fork_signal_helper). > > > > 4. Submit pageflip > > > > > > > > -> Boom since the set_cache_level will block, get interrupted and > > > > -> exit > > > > early with -EINTR. > > > > > > > > Given sufficient overkill in 2. this should be 100% reliable to reproduce. > > > > As soon as I execbuffer to the bo, it gets a vma for the GGTT vm: > > > > vm = ctx->vm; > > if (!USES_FULL_PPGTT(dev)) > > vm = &dev_priv->gtt.base; > > > > ... > > > > /* Look up object handles */ > > ret = eb_lookup_vmas(eb, exec, args, vm, file); > > if (ret) > > goto err; > > > > And then it becomes impossible to reproduce the problem :( Is there > > any other trick to make set_cache_level fail? > > What if you make the pin_to_ggtt fail instead? Can you create an object that's > too big for the ggtt? Thanks Ville: that worked like a charm ;)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index c70121d..1d161c7 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -3619,6 +3619,10 @@ unlock: static bool is_pin_display(struct drm_i915_gem_object *obj) { + struct i915_vma *vma = i915_gem_obj_to_ggtt(obj); + if (!vma) + return false; + /* There are 3 sources that pin objects: * 1. The display engine (scanouts, sprites, cursors); * 2. Reservations for execbuffer; @@ -3630,7 +3634,7 @@ static bool is_pin_display(struct drm_i915_gem_object *obj) * subtracting the potential reference by the user, any pin_count * remains, it must be due to another use by the display engine. */ - return i915_gem_obj_to_ggtt(obj)->pin_count - !!obj->user_pin_count; + return vma->pin_count - !!obj->user_pin_count; } /*