drm/i915: Actually capture PP_DIR_BASE on error

Message ID	1394863318-17424-1-git-send-email-benjamin.widawsky@intel.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <intel-gfx-bounces@lists.freedesktop.org> From: Ben Widawsky <benjamin.widawsky@intel.com> To: Intel GFX <intel-gfx@lists.freedesktop.org> Date: Fri, 14 Mar 2014 23:01:58 -0700 Message-Id: <1394863318-17424-1-git-send-email-benjamin.widawsky@intel.com> Cc: Ben Widawsky <ben@bwidawsk.net>, Ben Widawsky <benjamin.widawsky@intel.com> Subject: [Intel-gfx] [PATCH] drm/i915: Actually capture PP_DIR_BASE on error Precedence: list MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>

Message ID

1394863318-17424-1-git-send-email-benjamin.widawsky@intel.com (mailing list archive)

State

New, archived

Headers

From: Ben Widawsky <benjamin.widawsky@intel.com>
To: Intel GFX <intel-gfx@lists.freedesktop.org>
Date: Fri, 14 Mar 2014 23:01:58 -0700
Message-Id: <1394863318-17424-1-git-send-email-benjamin.widawsky@intel.com>
Cc: Ben Widawsky <ben@bwidawsk.net>,
	Ben Widawsky <benjamin.widawsky@intel.com>
Subject: [Intel-gfx] [PATCH] drm/i915: Actually capture PP_DIR_BASE on error
Precedence: list
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>

Commit Message

Ben Widawsky March 15, 2014, 6:01 a.m. UTC

I have been seeing this for a long time, but ignored it because it's
typically not terribly important. Recently, I really needed this info,
and it was garbage. Proof that I should have fixed it sooner. Originally
wrong from:

commit 6c7a01ec3743a5a6ce9e53a69d7a6c2d8c715eb1
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date:   Thu Jan 30 00:19:40 2014 -0800

    drm/i915: Capture PPGTT info on error capture

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Daniel Vetter March 15, 2014, 11:53 a.m. UTC | #1

On Fri, Mar 14, 2014 at 11:01:58PM -0700, Ben Widawsky wrote:
> I have been seeing this for a long time, but ignored it because it's
> typically not terribly important. Recently, I really needed this info,
> and it was garbage. Proof that I should have fixed it sooner. Originally
> wrong from:
> 
> commit 6c7a01ec3743a5a6ce9e53a69d7a6c2d8c715eb1
> Author: Ben Widawsky <benjamin.widawsky@intel.com>
> Date:   Thu Jan 30 00:19:40 2014 -0800
> 
>     drm/i915: Capture PPGTT info on error capture
> 
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

Oops. I wonder whether we should start to have some functional checks for
the error states. Fumbles like this crop up fairly often, and they're a
pain if a bug reporter shows up with such a busted batch. Other examples
are the woes where we've dumped nothingess due to ppgtt.

Most of the stuff could be easily tested I think, e.g.
- Check whether the presumed offset of the batch returned by the kernel
  shows up anywhere in the right ring. If we first allocate a randomized
  large bo and then a batch the offset should be sufficiently unique to
  avoid false positives. This would check whether we correctly dump ring
  buffers.
- Add some unique values after MI_BB_END and check the dump for them.
  This would check whether we correctly dump batches.
- Special checks for the various register values we dump, e.g. we could
  fish information like the pp base out of debugfs. Not sure how useful
  this is ...

Anyway just food for thought. I'll probably volunteer the next guy who
breaks the error dumper as a victim for this.

Queued for -next, thanks for the patch.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gpu_error.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 7a9bba1..d7ac688 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -899,10 +899,12 @@ static void i915_record_ring_state(struct drm_device *dev,
>  			}
>  			break;
>  		case 7:
> -			ering->vm_info.pp_dir_base = RING_PP_DIR_BASE(ring);
> +			ering->vm_info.pp_dir_base =
> +				I915_READ(RING_PP_DIR_BASE(ring));
>  			break;
>  		case 6:
> -			ering->vm_info.pp_dir_base = RING_PP_DIR_BASE_READ(ring);
> +			ering->vm_info.pp_dir_base =
> +				I915_READ(RING_PP_DIR_BASE_READ(ring));
>  			break;
>  		}
>  	}
> -- 
> 1.9.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 7a9bba1..d7ac688 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -899,10 +899,12 @@  static void i915_record_ring_state(struct drm_device *dev,
 			}
 			break;
 		case 7:
-			ering->vm_info.pp_dir_base = RING_PP_DIR_BASE(ring);
+			ering->vm_info.pp_dir_base =
+				I915_READ(RING_PP_DIR_BASE(ring));
 			break;
 		case 6:
-			ering->vm_info.pp_dir_base = RING_PP_DIR_BASE_READ(ring);
+			ering->vm_info.pp_dir_base =
+				I915_READ(RING_PP_DIR_BASE_READ(ring));
 			break;
 		}
 	}

drm/i915: Actually capture PP_DIR_BASE on error

Commit Message

Comments

Patch