Message ID | xmsgfynkhycw3cf56akp4he2ffg44vuratocsysaowbsnhutzi@augnqbm777at (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v6] drm/i915: Fix NULL pointer dereference in capture_engine | expand |
On 03.12.2024 15:54, Eugene Kobyak wrote: > When the intel_context structure contains NULL, > it raises a NULL pointer dereference error in drm_info(). > > Fixes: e8a3319c31a1 ("drm/i915: Allow error capture without a request") > Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12309 > Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> > Cc: John Harrison <John.C.Harrison@Intel.com> > Cc: <stable@vger.kernel.org> # v6.3+ > Signed-off-by: Eugene Kobyak <eugene.kobyak@intel.com> > --- > v2: > - return drm_info to separate condition > v3: > - create separate condition which generate string if intel_context exist > v4: > - rollback and add check intel_context in log condition > v5: > - create separate string with guc_id if intel_context exist > v6: > - print changed log if intel_context exist > > drivers/gpu/drm/i915/i915_gpu_error.c | 18 +++++++++++++++--- > 1 file changed, 15 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c > index 135ded17334e..d88cefb889c3 100644 > --- a/drivers/gpu/drm/i915/i915_gpu_error.c > +++ b/drivers/gpu/drm/i915/i915_gpu_error.c > @@ -1643,9 +1643,21 @@ capture_engine(struct intel_engine_cs *engine, > return NULL; > > intel_engine_get_hung_entity(engine, &ce, &rq); > - if (rq && !i915_request_started(rq)) > - drm_info(&engine->gt->i915->drm, "Got hung context on %s with active request %lld:%lld [0x%04X] not yet started\n", > - engine->name, rq->fence.context, rq->fence.seqno, ce->guc_id.id); > + if (rq && !i915_request_started(rq)) { > + /* > + * We want to know also what is the gcu_id of the context, typo: guc_id > + * but if we don't have the context reference, then skip > + * printing it. > + */ but IMO this comment is redundant as it's quite obvious that without context pointer you can't print guc_id member > + if (ce) > + drm_info(&engine->gt->i915->drm, > + "Got hung context on %s with active request %lld:%lld [0x%04X] not yet started\n", > + engine->name, rq->fence.context, rq->fence.seqno, ce->guc_id.id); > + else > + drm_info(&engine->gt->i915->drm, > + "Got hung context on %s with active request %lld:%lld not yet started\n", > + engine->name, rq->fence.context, rq->fence.seqno); since you are touching drm_info() where we use engine->gt then maybe it's good time to switch to gt_info() to get better per-GT message? > + } > > if (rq) { > capture = intel_engine_coredump_add_request(ee, rq, ATOMIC_MAYFAIL);
Hi Michal, > > + if (rq && !i915_request_started(rq)) { > > + /* > > + * We want to know also what is the gcu_id of the context, > > typo: guc_id > > > + * but if we don't have the context reference, then skip > > + * printing it. > > + */ > > but IMO this comment is redundant as it's quite obvious that without > context pointer you can't print guc_id member I recommended to add a comment because there is some code redundancy that, I think, needs some explanation; someone might not spot immediately the need for ce, unless goes a the end of the drm_info parameter's list. > > + if (ce) > > + drm_info(&engine->gt->i915->drm, > > + "Got hung context on %s with active request %lld:%lld [0x%04X] not yet started\n", > > + engine->name, rq->fence.context, rq->fence.seqno, ce->guc_id.id); > > + else > > + drm_info(&engine->gt->i915->drm, > > + "Got hung context on %s with active request %lld:%lld not yet started\n", > > + engine->name, rq->fence.context, rq->fence.seqno); > > since you are touching drm_info() where we use engine->gt then maybe > it's good time to switch to gt_info() to get better per-GT message? I think the original reason for using drm_info is because we are outside the GT. But, because the engine belongs to the GT, it makes also sense to use gt_info(), I don't oppose. It would make more sense to move this function completely into gt/. Thanks, Andi
On Wed, 04 Dec 2024, Andi Shyti <andi.shyti@linux.intel.com> wrote: > Hi Michal, > >> > + if (rq && !i915_request_started(rq)) { >> > + /* >> > + * We want to know also what is the gcu_id of the context, >> >> typo: guc_id >> >> > + * but if we don't have the context reference, then skip >> > + * printing it. >> > + */ >> >> but IMO this comment is redundant as it's quite obvious that without >> context pointer you can't print guc_id member > > I recommended to add a comment because there is some code > redundancy that, I think, needs some explanation; someone might > not spot immediately the need for ce, unless goes a the end of > the drm_info parameter's list. > >> > + if (ce) >> > + drm_info(&engine->gt->i915->drm, >> > + "Got hung context on %s with active request %lld:%lld [0x%04X] not yet started\n", >> > + engine->name, rq->fence.context, rq->fence.seqno, ce->guc_id.id); >> > + else >> > + drm_info(&engine->gt->i915->drm, >> > + "Got hung context on %s with active request %lld:%lld not yet started\n", >> > + engine->name, rq->fence.context, rq->fence.seqno); >> >> since you are touching drm_info() where we use engine->gt then maybe >> it's good time to switch to gt_info() to get better per-GT message? > > I think the original reason for using drm_info is because we are > outside the GT. But, because the engine belongs to the GT, it > makes also sense to use gt_info(), I don't oppose. > > It would make more sense to move this function completely into > gt/. Can we converge on the patch instead of diverge, please? It's a Cc: stable null pointer dereference fix. It's already been iterated for two weeks to reach v6. Fix the comment typo while applying, but there's nothing inherently wrong here AFAICT. Merge it and move on. BR, Jani.
Hi Jani, On Wed, Dec 04, 2024 at 12:51:56PM +0200, Jani Nikula wrote: > On Wed, 04 Dec 2024, Andi Shyti <andi.shyti@linux.intel.com> wrote: > > Hi Michal, > > > >> > + if (rq && !i915_request_started(rq)) { > >> > + /* > >> > + * We want to know also what is the gcu_id of the context, > >> > >> typo: guc_id > >> > >> > + * but if we don't have the context reference, then skip > >> > + * printing it. > >> > + */ > >> > >> but IMO this comment is redundant as it's quite obvious that without > >> context pointer you can't print guc_id member > > > > I recommended to add a comment because there is some code > > redundancy that, I think, needs some explanation; someone might > > not spot immediately the need for ce, unless goes a the end of > > the drm_info parameter's list. > > > >> > + if (ce) > >> > + drm_info(&engine->gt->i915->drm, > >> > + "Got hung context on %s with active request %lld:%lld [0x%04X] not yet started\n", > >> > + engine->name, rq->fence.context, rq->fence.seqno, ce->guc_id.id); > >> > + else > >> > + drm_info(&engine->gt->i915->drm, > >> > + "Got hung context on %s with active request %lld:%lld not yet started\n", > >> > + engine->name, rq->fence.context, rq->fence.seqno); > >> > >> since you are touching drm_info() where we use engine->gt then maybe > >> it's good time to switch to gt_info() to get better per-GT message? > > > > I think the original reason for using drm_info is because we are > > outside the GT. But, because the engine belongs to the GT, it > > makes also sense to use gt_info(), I don't oppose. > > > > It would make more sense to move this function completely into > > gt/. > > Can we converge on the patch instead of diverge, please? > > It's a Cc: stable null pointer dereference fix. > > It's already been iterated for two weeks to reach v6. > > Fix the comment typo while applying, but there's nothing inherently > wrong here AFAICT. Merge it and move on. Thanks for the feedback, will go ahead and merge. All other gt/ adjustments can be done later. Andi
Hi Eugene, On Tue, Dec 03, 2024 at 02:54:06PM +0000, Eugene Kobyak wrote: > When the intel_context structure contains NULL, > it raises a NULL pointer dereference error in drm_info(). > > Fixes: e8a3319c31a1 ("drm/i915: Allow error capture without a request") > Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12309 > Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> > Cc: John Harrison <John.C.Harrison@Intel.com> > Cc: <stable@vger.kernel.org> # v6.3+ > Signed-off-by: Eugene Kobyak <eugene.kobyak@intel.com> merged to drm-intel-next. Thank you, Andi
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index 135ded17334e..d88cefb889c3 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -1643,9 +1643,21 @@ capture_engine(struct intel_engine_cs *engine, return NULL; intel_engine_get_hung_entity(engine, &ce, &rq); - if (rq && !i915_request_started(rq)) - drm_info(&engine->gt->i915->drm, "Got hung context on %s with active request %lld:%lld [0x%04X] not yet started\n", - engine->name, rq->fence.context, rq->fence.seqno, ce->guc_id.id); + if (rq && !i915_request_started(rq)) { + /* + * We want to know also what is the gcu_id of the context, + * but if we don't have the context reference, then skip + * printing it. + */ + if (ce) + drm_info(&engine->gt->i915->drm, + "Got hung context on %s with active request %lld:%lld [0x%04X] not yet started\n", + engine->name, rq->fence.context, rq->fence.seqno, ce->guc_id.id); + else + drm_info(&engine->gt->i915->drm, + "Got hung context on %s with active request %lld:%lld not yet started\n", + engine->name, rq->fence.context, rq->fence.seqno); + } if (rq) { capture = intel_engine_coredump_add_request(ee, rq, ATOMIC_MAYFAIL);