Message ID | 20180110151825.23477-1-boris.brezillon@free-electrons.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Boris Brezillon <boris.brezillon@free-electrons.com> writes: > When saving BOs in the hang state we skip one entry of the > kernel_state->bo[] array, thus leaving it to NULL. This leads to a NULL > pointer dereference when, later in this function, we iterate over all > BOs to check their ->madv state. > > Fixes: ca26d28bbaa3 ("drm/vc4: improve throughput by pipelining binning and rendering jobs") > Cc: <stable@vger.kernel.org> > Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> > --- > drivers/gpu/drm/vc4/vc4_gem.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/vc4/vc4_gem.c b/drivers/gpu/drm/vc4/vc4_gem.c > index 6c32c89a83a9..19ac7fe0e5db 100644 > --- a/drivers/gpu/drm/vc4/vc4_gem.c > +++ b/drivers/gpu/drm/vc4/vc4_gem.c > @@ -208,7 +208,7 @@ vc4_save_hang_state(struct drm_device *dev) > kernel_state->bo[j + prev_idx] = &bo->base.base; > j++; > } > - prev_idx = j + 1; > + prev_idx = j; Could we replace the whole "[j + prev_idx]" with a "[k++]" and maybe a WARN_ON_ONCE(k != state->bo_count) at the end? I really need to figure out if I can come up with a way to make IGT cases for GPU hangs on vc4, despite the validation. I found a bug in GPU reset due to BCL hangs when doing vc5, but I don't have a testcase. Maybe some submit flags that overwrite the BCL or RCL to do an infinite loop?
On Wed, 17 Jan 2018 12:03:15 -0800 Eric Anholt <eric@anholt.net> wrote: > Boris Brezillon <boris.brezillon@free-electrons.com> writes: > > > When saving BOs in the hang state we skip one entry of the > > kernel_state->bo[] array, thus leaving it to NULL. This leads to a NULL > > pointer dereference when, later in this function, we iterate over all > > BOs to check their ->madv state. > > > > Fixes: ca26d28bbaa3 ("drm/vc4: improve throughput by pipelining binning and rendering jobs") > > Cc: <stable@vger.kernel.org> > > Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> > > --- > > drivers/gpu/drm/vc4/vc4_gem.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/vc4/vc4_gem.c b/drivers/gpu/drm/vc4/vc4_gem.c > > index 6c32c89a83a9..19ac7fe0e5db 100644 > > --- a/drivers/gpu/drm/vc4/vc4_gem.c > > +++ b/drivers/gpu/drm/vc4/vc4_gem.c > > @@ -208,7 +208,7 @@ vc4_save_hang_state(struct drm_device *dev) > > kernel_state->bo[j + prev_idx] = &bo->base.base; > > j++; > > } > > - prev_idx = j + 1; > > + prev_idx = j; > > Could we replace the whole "[j + prev_idx]" with a "[k++]" and maybe a > WARN_ON_ONCE(k != state->bo_count) at the end? Sure. > > I really need to figure out if I can come up with a way to make IGT > cases for GPU hangs on vc4, despite the validation. I managed to trigger the NULL pointer dereference while debugging the perfmon stuff, but it's fixed now, so I don't have a way to easily force a reset. > I found a bug in > GPU reset due to BCL hangs when doing vc5, but I don't have a testcase. > Maybe some submit flags that overwrite the BCL or RCL to do an infinite > loop?
On Wed, Jan 17, 2018 at 12:03:15PM -0800, Eric Anholt wrote: > Boris Brezillon <boris.brezillon@free-electrons.com> writes: > > > When saving BOs in the hang state we skip one entry of the > > kernel_state->bo[] array, thus leaving it to NULL. This leads to a NULL > > pointer dereference when, later in this function, we iterate over all > > BOs to check their ->madv state. > > > > Fixes: ca26d28bbaa3 ("drm/vc4: improve throughput by pipelining binning and rendering jobs") > > Cc: <stable@vger.kernel.org> > > Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> > > --- > > drivers/gpu/drm/vc4/vc4_gem.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/vc4/vc4_gem.c b/drivers/gpu/drm/vc4/vc4_gem.c > > index 6c32c89a83a9..19ac7fe0e5db 100644 > > --- a/drivers/gpu/drm/vc4/vc4_gem.c > > +++ b/drivers/gpu/drm/vc4/vc4_gem.c > > @@ -208,7 +208,7 @@ vc4_save_hang_state(struct drm_device *dev) > > kernel_state->bo[j + prev_idx] = &bo->base.base; > > j++; > > } > > - prev_idx = j + 1; > > + prev_idx = j; > > Could we replace the whole "[j + prev_idx]" with a "[k++]" and maybe a > WARN_ON_ONCE(k != state->bo_count) at the end? > > I really need to figure out if I can come up with a way to make IGT > cases for GPU hangs on vc4, despite the validation. I found a bug in > GPU reset due to BCL hangs when doing vc5, but I don't have a testcase. > Maybe some submit flags that overwrite the BCL or RCL to do an infinite > loop? What we currently do for i915 is an endless chain of batches (since no command parser we can get away with that). Previously we did a special debugfs mode which blocked out updating the ring head (but left all the other command submission handling in place). Except for the very minor change nothing needed to be adjusted in the kernel, and from the kernel's pov it very much looked like the gpu simply died. -Daniel
diff --git a/drivers/gpu/drm/vc4/vc4_gem.c b/drivers/gpu/drm/vc4/vc4_gem.c index 6c32c89a83a9..19ac7fe0e5db 100644 --- a/drivers/gpu/drm/vc4/vc4_gem.c +++ b/drivers/gpu/drm/vc4/vc4_gem.c @@ -208,7 +208,7 @@ vc4_save_hang_state(struct drm_device *dev) kernel_state->bo[j + prev_idx] = &bo->base.base; j++; } - prev_idx = j + 1; + prev_idx = j; } if (exec[0])
When saving BOs in the hang state we skip one entry of the kernel_state->bo[] array, thus leaving it to NULL. This leads to a NULL pointer dereference when, later in this function, we iterate over all BOs to check their ->madv state. Fixes: ca26d28bbaa3 ("drm/vc4: improve throughput by pipelining binning and rendering jobs") Cc: <stable@vger.kernel.org> Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> --- drivers/gpu/drm/vc4/vc4_gem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)