diff mbox series

[v4] drm/i915: Fix NULL pointer dereference in capture_engine

Message ID sspixcrejd5nbhijuo7o3z5agbgyq637wc7gph7ajrs3lnls6g@5ljanfmvccyf (mailing list archive)
State New, archived
Headers show
Series [v4] drm/i915: Fix NULL pointer dereference in capture_engine | expand

Commit Message

Eugene Kobyak Nov. 28, 2024, 9:28 p.m. UTC
When the intel_context structure contains NULL,
it raises a NULL pointer dereference error in drm_info().

Fixes: e8a3319c31a1 ("drm/i915: Allow error capture without a request")
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12309
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: <stable@vger.kernel.org> # v6.3+
Signed-off-by: Eugene Kobyak <eugene.kobyak@intel.com>
---
v2:
  - return drm_info to separate condition
v3:
  - create separate condition which generate string if intel_context exist
v4:
  - rollback and add check intel_context in log condition
 drivers/gpu/drm/i915/i915_gpu_error.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Andi Shyti Nov. 28, 2024, 11:06 p.m. UTC | #1
Hi Eugene,

On Thu, Nov 28, 2024 at 09:28:43PM +0000, Eugene Kobyak wrote:
> When the intel_context structure contains NULL,
> it raises a NULL pointer dereference error in drm_info().
> 
> Fixes: e8a3319c31a1 ("drm/i915: Allow error capture without a request")
> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12309
> Cc: John Harrison <John.C.Harrison@Intel.com>
> Cc: <stable@vger.kernel.org> # v6.3+
> Signed-off-by: Eugene Kobyak <eugene.kobyak@intel.com>

Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

Thanks,
Andi
John Harrison Nov. 29, 2024, 2:32 a.m. UTC | #2
On 11/28/2024 13:28, Eugene Kobyak wrote:
> When the intel_context structure contains NULL,
> it raises a NULL pointer dereference error in drm_info().
>
> Fixes: e8a3319c31a1 ("drm/i915: Allow error capture without a request")
> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12309
> Cc: John Harrison <John.C.Harrison@Intel.com>
> Cc: <stable@vger.kernel.org> # v6.3+
> Signed-off-by: Eugene Kobyak <eugene.kobyak@intel.com>
> ---
> v2:
>    - return drm_info to separate condition
> v3:
>    - create separate condition which generate string if intel_context exist
> v4:
>    - rollback and add check intel_context in log condition
>   drivers/gpu/drm/i915/i915_gpu_error.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 135ded17334e..56f05a18870c 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1643,7 +1643,7 @@ capture_engine(struct intel_engine_cs *engine,
>   		return NULL;
>   
>   	intel_engine_get_hung_entity(engine, &ce, &rq);
> -	if (rq && !i915_request_started(rq))
> +	if (rq && !i915_request_started(rq) && ce)
>   		drm_info(&engine->gt->i915->drm, "Got hung context on %s with active request %lld:%lld [0x%04X] not yet started\n",
>   			 engine->name, rq->fence.context, rq->fence.seqno, ce->guc_id.id);
>   
But now you don't get a message when there is a request without the 
context pointer. Which can clearly happen because otherwise you wouldn't 
have hit a null pointer dereference in the first place.

John.
Andi Shyti Nov. 29, 2024, 12:12 p.m. UTC | #3
On Thu, Nov 28, 2024 at 06:32:28PM -0800, John Harrison wrote:
> On 11/28/2024 13:28, Eugene Kobyak wrote:
> > When the intel_context structure contains NULL,
> > it raises a NULL pointer dereference error in drm_info().
> > 
> > Fixes: e8a3319c31a1 ("drm/i915: Allow error capture without a request")
> > Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12309
> > Cc: John Harrison <John.C.Harrison@Intel.com>
> > Cc: <stable@vger.kernel.org> # v6.3+
> > Signed-off-by: Eugene Kobyak <eugene.kobyak@intel.com>
> > ---
> > v2:
> >    - return drm_info to separate condition
> > v3:
> >    - create separate condition which generate string if intel_context exist
> > v4:
> >    - rollback and add check intel_context in log condition
> >   drivers/gpu/drm/i915/i915_gpu_error.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> > index 135ded17334e..56f05a18870c 100644
> > --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> > +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> > @@ -1643,7 +1643,7 @@ capture_engine(struct intel_engine_cs *engine,
> >   		return NULL;
> >   	intel_engine_get_hung_entity(engine, &ce, &rq);
> > -	if (rq && !i915_request_started(rq))
> > +	if (rq && !i915_request_started(rq) && ce)
> >   		drm_info(&engine->gt->i915->drm, "Got hung context on %s with active request %lld:%lld [0x%04X] not yet started\n",
> >   			 engine->name, rq->fence.context, rq->fence.seqno, ce->guc_id.id);
> But now you don't get a message when there is a request without the context
> pointer. Which can clearly happen because otherwise you wouldn't have hit a
> null pointer dereference in the first place.

do we need the guc_id at all?

Andi
John Harrison Dec. 2, 2024, 7:28 p.m. UTC | #4
On 11/29/2024 04:12, Andi Shyti wrote:
> On Thu, Nov 28, 2024 at 06:32:28PM -0800, John Harrison wrote:
>> On 11/28/2024 13:28, Eugene Kobyak wrote:
>>> When the intel_context structure contains NULL,
>>> it raises a NULL pointer dereference error in drm_info().
>>>
>>> Fixes: e8a3319c31a1 ("drm/i915: Allow error capture without a request")
>>> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12309
>>> Cc: John Harrison <John.C.Harrison@Intel.com>
>>> Cc: <stable@vger.kernel.org> # v6.3+
>>> Signed-off-by: Eugene Kobyak <eugene.kobyak@intel.com>
>>> ---
>>> v2:
>>>     - return drm_info to separate condition
>>> v3:
>>>     - create separate condition which generate string if intel_context exist
>>> v4:
>>>     - rollback and add check intel_context in log condition
>>>    drivers/gpu/drm/i915/i915_gpu_error.c | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
>>> index 135ded17334e..56f05a18870c 100644
>>> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
>>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
>>> @@ -1643,7 +1643,7 @@ capture_engine(struct intel_engine_cs *engine,
>>>    		return NULL;
>>>    	intel_engine_get_hung_entity(engine, &ce, &rq);
>>> -	if (rq && !i915_request_started(rq))
>>> +	if (rq && !i915_request_started(rq) && ce)
>>>    		drm_info(&engine->gt->i915->drm, "Got hung context on %s with active request %lld:%lld [0x%04X] not yet started\n",
>>>    			 engine->name, rq->fence.context, rq->fence.seqno, ce->guc_id.id);
>> But now you don't get a message when there is a request without the context
>> pointer. Which can clearly happen because otherwise you wouldn't have hit a
>> null pointer dereference in the first place.
> do we need the guc_id at all?
>
> Andi
Huh?

Do you need any debug output at all? No. The driver will still work 
without it. However trying to debug problems because significantly more 
difficult. When using GuC submission, the guc_id is the unique 
identifier that lets you track a context/submission through the entire 
system. When tracking through H2G or G2H messaging or in the GuC log 
itself, the guc_id is the only identifier available.

John.
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 135ded17334e..56f05a18870c 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1643,7 +1643,7 @@  capture_engine(struct intel_engine_cs *engine,
 		return NULL;
 
 	intel_engine_get_hung_entity(engine, &ce, &rq);
-	if (rq && !i915_request_started(rq))
+	if (rq && !i915_request_started(rq) && ce)
 		drm_info(&engine->gt->i915->drm, "Got hung context on %s with active request %lld:%lld [0x%04X] not yet started\n",
 			 engine->name, rq->fence.context, rq->fence.seqno, ce->guc_id.id);