Message ID | 20210806201852.1345184-1-daniel.vetter@ffwll.ch (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/i915: Release ctx->syncobj on final put, not on ctx close | expand |
On August 6, 2021 15:18:59 Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > gem context refcounting is another exercise in least locking design it > seems, where most things get destroyed upon context closure (which can > race with anything really). Only the actual memory allocation and the > locks survive while holding a reference. > > This tripped up Jason when reimplementing the single timeline feature > in > > commit 00dae4d3d35d4f526929633b76e00b0ab4d3970d > Author: Jason Ekstrand <jason@jlekstrand.net> > Date: Thu Jul 8 10:48:12 2021 -0500 > > drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4) > > We could fix the bug by holding ctx->mutex, but it's cleaner to just What bug is this fixing, exactly? --Jason > > make the context object actually invariant over its _entire_ lifetime. > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> > Fixes: 00dae4d3d35d ("drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)") > Cc: Jason Ekstrand <jason@jlekstrand.net> > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > Cc: Matthew Brost <matthew.brost@intel.com> > Cc: Matthew Auld <matthew.auld@intel.com> > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> > Cc: "Thomas Hellström" <thomas.hellstrom@intel.com> > Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> > Cc: Dave Airlie <airlied@redhat.com> > --- > drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c > b/drivers/gpu/drm/i915/gem/i915_gem_context.c > index 754b9b8d4981..93ba0197d70a 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c > @@ -940,6 +940,9 @@ void i915_gem_context_release(struct kref *ref) > trace_i915_context_free(ctx); > GEM_BUG_ON(!i915_gem_context_is_closed(ctx)); > > + if (ctx->syncobj) > + drm_syncobj_put(ctx->syncobj); > + > mutex_destroy(&ctx->engines_mutex); > mutex_destroy(&ctx->lut_mutex); > > @@ -1159,9 +1162,6 @@ static void context_close(struct i915_gem_context *ctx) > if (vm) > i915_vm_close(vm); > > - if (ctx->syncobj) > - drm_syncobj_put(ctx->syncobj); > - > ctx->file_priv = ERR_PTR(-EBADF); > > /* > -- > 2.32.0
On Sun, Aug 8, 2021 at 2:56 AM Jason Ekstrand <jason@jlekstrand.net> wrote: > > On August 6, 2021 15:18:59 Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > >> gem context refcounting is another exercise in least locking design it >> seems, where most things get destroyed upon context closure (which can >> race with anything really). Only the actual memory allocation and the >> locks survive while holding a reference. >> >> This tripped up Jason when reimplementing the single timeline feature >> in >> >> commit 00dae4d3d35d4f526929633b76e00b0ab4d3970d >> Author: Jason Ekstrand <jason@jlekstrand.net> >> Date: Thu Jul 8 10:48:12 2021 -0500 >> >> drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4) >> >> We could fix the bug by holding ctx->mutex, but it's cleaner to just > > > What bug is this fixing, exactly? Oh lol I was all busy ranting and not explaining :-) I found it while auditing other context stuff, so that other patch has the longer commit message with more history, but that patch is also now tied into the vm-dercuify, so short summary: You put the syncobj in context close (i.e. CTX_DESTRY ioctl or close(drmfd)), not in the final kref_put. Which means you're open to a use-after-free if you race against an execbuf. ctx->vm is equally broken (but for other ioctl), once this fix here is merged I send out the ctx->vm fix because that's tied into the vm-dercuify now due to conflicts. -Daniel > > --Jason > >> >> make the context object actually invariant over its _entire_ lifetime. >> >> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> >> Fixes: 00dae4d3d35d ("drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)") >> Cc: Jason Ekstrand <jason@jlekstrand.net> >> Cc: Chris Wilson <chris@chris-wilson.co.uk> >> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> >> Cc: Matthew Brost <matthew.brost@intel.com> >> Cc: Matthew Auld <matthew.auld@intel.com> >> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> >> Cc: "Thomas Hellström" <thomas.hellstrom@intel.com> >> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> >> Cc: Dave Airlie <airlied@redhat.com> >> --- >> drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 +++--- >> 1 file changed, 3 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c >> index 754b9b8d4981..93ba0197d70a 100644 >> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c >> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c >> @@ -940,6 +940,9 @@ void i915_gem_context_release(struct kref *ref) >> trace_i915_context_free(ctx); >> GEM_BUG_ON(!i915_gem_context_is_closed(ctx)); >> >> + if (ctx->syncobj) >> + drm_syncobj_put(ctx->syncobj); >> + >> mutex_destroy(&ctx->engines_mutex); >> mutex_destroy(&ctx->lut_mutex); >> >> @@ -1159,9 +1162,6 @@ static void context_close(struct i915_gem_context *ctx) >> if (vm) >> i915_vm_close(vm); >> >> - if (ctx->syncobj) >> - drm_syncobj_put(ctx->syncobj); >> - >> ctx->file_priv = ERR_PTR(-EBADF); >> >> /* >> -- >> 2.32.0 > >
On Mon, Aug 09, 2021 at 08:47:14PM +0200, Daniel Vetter wrote: > On Sun, Aug 8, 2021 at 2:56 AM Jason Ekstrand <jason@jlekstrand.net> wrote: > > > > On August 6, 2021 15:18:59 Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > > > >> gem context refcounting is another exercise in least locking design it > >> seems, where most things get destroyed upon context closure (which can > >> race with anything really). Only the actual memory allocation and the > >> locks survive while holding a reference. > >> > >> This tripped up Jason when reimplementing the single timeline feature > >> in > >> > >> commit 00dae4d3d35d4f526929633b76e00b0ab4d3970d > >> Author: Jason Ekstrand <jason@jlekstrand.net> > >> Date: Thu Jul 8 10:48:12 2021 -0500 > >> > >> drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4) > >> > >> We could fix the bug by holding ctx->mutex, but it's cleaner to just > > > > > > What bug is this fixing, exactly? > > Oh lol I was all busy ranting and not explaining :-) I found it while > auditing other context stuff, so that other patch has the longer > commit message with more history, but that patch is also now tied into > the vm-dercuify, so short summary: You put the syncobj in context > close (i.e. CTX_DESTRY ioctl or close(drmfd)), not in the final > kref_put. Which means you're open to a use-after-free if you race > against an execbuf. ctx->vm is equally broken (but for other ioctl), > once this fix here is merged I send out the ctx->vm fix because that's > tied into the vm-dercuify now due to conflicts. CI caught more, so just explaining what I'm fixing here isn't going to be enough. The troubel is that drm_syncobj_put is now called from very awkward places, and I need to see whether we can fix that. Or whether we need more work_struct (like we have already for i915_address_space) for the final release. -Daniel > -Daniel > > > > > --Jason > > > >> > >> make the context object actually invariant over its _entire_ lifetime. > >> > >> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> > >> Fixes: 00dae4d3d35d ("drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)") > >> Cc: Jason Ekstrand <jason@jlekstrand.net> > >> Cc: Chris Wilson <chris@chris-wilson.co.uk> > >> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > >> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > >> Cc: Matthew Brost <matthew.brost@intel.com> > >> Cc: Matthew Auld <matthew.auld@intel.com> > >> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> > >> Cc: "Thomas Hellström" <thomas.hellstrom@intel.com> > >> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> > >> Cc: Dave Airlie <airlied@redhat.com> > >> --- > >> drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 +++--- > >> 1 file changed, 3 insertions(+), 3 deletions(-) > >> > >> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c > >> index 754b9b8d4981..93ba0197d70a 100644 > >> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c > >> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c > >> @@ -940,6 +940,9 @@ void i915_gem_context_release(struct kref *ref) > >> trace_i915_context_free(ctx); > >> GEM_BUG_ON(!i915_gem_context_is_closed(ctx)); > >> > >> + if (ctx->syncobj) > >> + drm_syncobj_put(ctx->syncobj); > >> + > >> mutex_destroy(&ctx->engines_mutex); > >> mutex_destroy(&ctx->lut_mutex); > >> > >> @@ -1159,9 +1162,6 @@ static void context_close(struct i915_gem_context *ctx) > >> if (vm) > >> i915_vm_close(vm); > >> > >> - if (ctx->syncobj) > >> - drm_syncobj_put(ctx->syncobj); > >> - > >> ctx->file_priv = ERR_PTR(-EBADF); > >> > >> /* > >> -- > >> 2.32.0 > > > > > > > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 754b9b8d4981..93ba0197d70a 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -940,6 +940,9 @@ void i915_gem_context_release(struct kref *ref) trace_i915_context_free(ctx); GEM_BUG_ON(!i915_gem_context_is_closed(ctx)); + if (ctx->syncobj) + drm_syncobj_put(ctx->syncobj); + mutex_destroy(&ctx->engines_mutex); mutex_destroy(&ctx->lut_mutex); @@ -1159,9 +1162,6 @@ static void context_close(struct i915_gem_context *ctx) if (vm) i915_vm_close(vm); - if (ctx->syncobj) - drm_syncobj_put(ctx->syncobj); - ctx->file_priv = ERR_PTR(-EBADF); /*
gem context refcounting is another exercise in least locking design it seems, where most things get destroyed upon context closure (which can race with anything really). Only the actual memory allocation and the locks survive while holding a reference. This tripped up Jason when reimplementing the single timeline feature in commit 00dae4d3d35d4f526929633b76e00b0ab4d3970d Author: Jason Ekstrand <jason@jlekstrand.net> Date: Thu Jul 8 10:48:12 2021 -0500 drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4) We could fix the bug by holding ctx->mutex, but it's cleaner to just make the context object actually invariant over its _entire_ lifetime. Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Fixes: 00dae4d3d35d ("drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)") Cc: Jason Ekstrand <jason@jlekstrand.net> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: "Thomas Hellström" <thomas.hellstrom@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Dave Airlie <airlied@redhat.com> --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)