diff mbox

drm/vc4: Fix NULL pointer dereference in vc4_save_hang_state()

Message ID 20180110151825.23477-1-boris.brezillon@free-electrons.com (mailing list archive)
State New, archived
Headers show

Commit Message

Boris BREZILLON Jan. 10, 2018, 3:18 p.m. UTC
When saving BOs in the hang state we skip one entry of the
kernel_state->bo[] array, thus leaving it to NULL. This leads to a NULL
pointer dereference when, later in this function, we iterate over all
BOs to check their ->madv state.

Fixes: ca26d28bbaa3 ("drm/vc4: improve throughput by pipelining binning and rendering jobs")
Cc: <stable@vger.kernel.org>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
---
 drivers/gpu/drm/vc4/vc4_gem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Eric Anholt Jan. 17, 2018, 8:03 p.m. UTC | #1
Boris Brezillon <boris.brezillon@free-electrons.com> writes:

> When saving BOs in the hang state we skip one entry of the
> kernel_state->bo[] array, thus leaving it to NULL. This leads to a NULL
> pointer dereference when, later in this function, we iterate over all
> BOs to check their ->madv state.
>
> Fixes: ca26d28bbaa3 ("drm/vc4: improve throughput by pipelining binning and rendering jobs")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
> ---
>  drivers/gpu/drm/vc4/vc4_gem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/vc4/vc4_gem.c b/drivers/gpu/drm/vc4/vc4_gem.c
> index 6c32c89a83a9..19ac7fe0e5db 100644
> --- a/drivers/gpu/drm/vc4/vc4_gem.c
> +++ b/drivers/gpu/drm/vc4/vc4_gem.c
> @@ -208,7 +208,7 @@ vc4_save_hang_state(struct drm_device *dev)
>  			kernel_state->bo[j + prev_idx] = &bo->base.base;
>  			j++;
>  		}
> -		prev_idx = j + 1;
> +		prev_idx = j;

Could we replace the whole "[j + prev_idx]" with a "[k++]" and maybe a
WARN_ON_ONCE(k != state->bo_count) at the end?

I really need to figure out if I can come up with a way to make IGT
cases for GPU hangs on vc4, despite the validation.  I found a bug in
GPU reset due to BCL hangs when doing vc5, but I don't have a testcase.
Maybe some submit flags that overwrite the BCL or RCL to do an infinite
loop?
Boris BREZILLON Jan. 17, 2018, 8:32 p.m. UTC | #2
On Wed, 17 Jan 2018 12:03:15 -0800
Eric Anholt <eric@anholt.net> wrote:

> Boris Brezillon <boris.brezillon@free-electrons.com> writes:
> 
> > When saving BOs in the hang state we skip one entry of the
> > kernel_state->bo[] array, thus leaving it to NULL. This leads to a NULL
> > pointer dereference when, later in this function, we iterate over all
> > BOs to check their ->madv state.
> >
> > Fixes: ca26d28bbaa3 ("drm/vc4: improve throughput by pipelining binning and rendering jobs")
> > Cc: <stable@vger.kernel.org>
> > Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
> > ---
> >  drivers/gpu/drm/vc4/vc4_gem.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/vc4/vc4_gem.c b/drivers/gpu/drm/vc4/vc4_gem.c
> > index 6c32c89a83a9..19ac7fe0e5db 100644
> > --- a/drivers/gpu/drm/vc4/vc4_gem.c
> > +++ b/drivers/gpu/drm/vc4/vc4_gem.c
> > @@ -208,7 +208,7 @@ vc4_save_hang_state(struct drm_device *dev)
> >  			kernel_state->bo[j + prev_idx] = &bo->base.base;
> >  			j++;
> >  		}
> > -		prev_idx = j + 1;
> > +		prev_idx = j;  
> 
> Could we replace the whole "[j + prev_idx]" with a "[k++]" and maybe a
> WARN_ON_ONCE(k != state->bo_count) at the end?

Sure.

> 
> I really need to figure out if I can come up with a way to make IGT
> cases for GPU hangs on vc4, despite the validation.

I managed to trigger the NULL pointer dereference while debugging the
perfmon stuff, but it's fixed now, so I don't have a way to easily
force a reset.

> I found a bug in
> GPU reset due to BCL hangs when doing vc5, but I don't have a testcase.
> Maybe some submit flags that overwrite the BCL or RCL to do an infinite
> loop?
Daniel Vetter Jan. 17, 2018, 9:29 p.m. UTC | #3
On Wed, Jan 17, 2018 at 12:03:15PM -0800, Eric Anholt wrote:
> Boris Brezillon <boris.brezillon@free-electrons.com> writes:
> 
> > When saving BOs in the hang state we skip one entry of the
> > kernel_state->bo[] array, thus leaving it to NULL. This leads to a NULL
> > pointer dereference when, later in this function, we iterate over all
> > BOs to check their ->madv state.
> >
> > Fixes: ca26d28bbaa3 ("drm/vc4: improve throughput by pipelining binning and rendering jobs")
> > Cc: <stable@vger.kernel.org>
> > Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
> > ---
> >  drivers/gpu/drm/vc4/vc4_gem.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/vc4/vc4_gem.c b/drivers/gpu/drm/vc4/vc4_gem.c
> > index 6c32c89a83a9..19ac7fe0e5db 100644
> > --- a/drivers/gpu/drm/vc4/vc4_gem.c
> > +++ b/drivers/gpu/drm/vc4/vc4_gem.c
> > @@ -208,7 +208,7 @@ vc4_save_hang_state(struct drm_device *dev)
> >  			kernel_state->bo[j + prev_idx] = &bo->base.base;
> >  			j++;
> >  		}
> > -		prev_idx = j + 1;
> > +		prev_idx = j;
> 
> Could we replace the whole "[j + prev_idx]" with a "[k++]" and maybe a
> WARN_ON_ONCE(k != state->bo_count) at the end?
> 
> I really need to figure out if I can come up with a way to make IGT
> cases for GPU hangs on vc4, despite the validation.  I found a bug in
> GPU reset due to BCL hangs when doing vc5, but I don't have a testcase.
> Maybe some submit flags that overwrite the BCL or RCL to do an infinite
> loop?

What we currently do for i915 is an endless chain of batches (since no
command parser we can get away with that). Previously we did a special
debugfs mode which blocked out updating the ring head (but left all the
other command submission handling in place). Except for the very minor
change nothing needed to be adjusted in the kernel, and from the kernel's
pov it very much looked like the gpu simply died.
-Daniel
diff mbox

Patch

diff --git a/drivers/gpu/drm/vc4/vc4_gem.c b/drivers/gpu/drm/vc4/vc4_gem.c
index 6c32c89a83a9..19ac7fe0e5db 100644
--- a/drivers/gpu/drm/vc4/vc4_gem.c
+++ b/drivers/gpu/drm/vc4/vc4_gem.c
@@ -208,7 +208,7 @@  vc4_save_hang_state(struct drm_device *dev)
 			kernel_state->bo[j + prev_idx] = &bo->base.base;
 			j++;
 		}
-		prev_idx = j + 1;
+		prev_idx = j;
 	}
 
 	if (exec[0])