diff mbox

[1/7] drm/i915: add struct i915_ctx_hang_stats

Message ID 1371029734-10355-1-git-send-email-mika.kuoppala@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Mika Kuoppala June 12, 2013, 9:35 a.m. UTC
To count context losses, add struct i915_ctx_hang_stats for
both i915_hw_context and drm_i915_file_private.
drm_i915_file_private is used when there is no context.

v2: renamed and cleaned up the struct (Chris Wilson, Ian Romanick)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_dma.c |    2 +-
 drivers/gpu/drm/i915/i915_drv.h |   10 ++++++++++
 2 files changed, 11 insertions(+), 1 deletion(-)

Comments

Chris Wilson June 12, 2013, 10:33 a.m. UTC | #1
On Wed, Jun 12, 2013 at 12:35:28PM +0300, Mika Kuoppala wrote:
> To count context losses, add struct i915_ctx_hang_stats for
> both i915_hw_context and drm_i915_file_private.
> drm_i915_file_private is used when there is no context.
> 
> v2: renamed and cleaned up the struct (Chris Wilson, Ian Romanick)
> 
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>

Only minor bikesheds on the series, so
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
Ben Widawsky June 12, 2013, 9:13 p.m. UTC | #2
On Wed, Jun 12, 2013 at 12:35:28PM +0300, Mika Kuoppala wrote:
> To count context losses, add struct i915_ctx_hang_stats for
> both i915_hw_context and drm_i915_file_private.
> drm_i915_file_private is used when there is no context.
> 
> v2: renamed and cleaned up the struct (Chris Wilson, Ian Romanick)
> 
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
>
I don't have time to do a proper review before Daniel wants to merge
these, and Chris has already reviewed it.

1-6 are:
Acked-by: Ben Widawsky <ben@bwidawsk.net>

I don't really like the behavior of 7. At least, I'd like to make it
something that can be disabled via debugfs, sysfs, or module parameter.
(I'd very much prefer it to be opt-in also).  TBH , I only read it very
fast, and I'm not horribly opposed to it, just a bunch of complexity for
IMO little gain. Presumably the problem it's trying to solve should be
fixed with a fix to ddx, mesa, libva, client, whatever.  In the embedded
case, the same thing applies.  Banning the guilty doesn't make the user
experience any better. So the only thing I see is DoS, but we've never
*really* made that our priority anyway, so, meh.
Chris Wilson June 12, 2013, 10:44 p.m. UTC | #3
On Wed, Jun 12, 2013 at 02:13:26PM -0700, Ben Widawsky wrote:
> On Wed, Jun 12, 2013 at 12:35:28PM +0300, Mika Kuoppala wrote:
> > To count context losses, add struct i915_ctx_hang_stats for
> > both i915_hw_context and drm_i915_file_private.
> > drm_i915_file_private is used when there is no context.
> > 
> > v2: renamed and cleaned up the struct (Chris Wilson, Ian Romanick)
> > 
> > Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> >
> I don't have time to do a proper review before Daniel wants to merge
> these, and Chris has already reviewed it.
> 
> 1-6 are:
> Acked-by: Ben Widawsky <ben@bwidawsk.net>
> 
> I don't really like the behavior of 7. At least, I'd like to make it
> something that can be disabled via debugfs, sysfs, or module parameter.
> (I'd very much prefer it to be opt-in also).  TBH , I only read it very
> fast, and I'm not horribly opposed to it, just a bunch of complexity for
> IMO little gain. Presumably the problem it's trying to solve should be
> fixed with a fix to ddx, mesa, libva, client, whatever.  In the embedded
> case, the same thing applies.  Banning the guilty doesn't make the user
> experience any better. So the only thing I see is DoS, but we've never
> *really* made that our priority anyway, so, meh.

Right, it is policy. But it is existing policy. Ultimately we want to
get as much of that decision out of the kernel.
-Chris
Daniel Vetter June 13, 2013, 9:53 a.m. UTC | #4
On Wed, Jun 12, 2013 at 11:44:51PM +0100, Chris Wilson wrote:
> On Wed, Jun 12, 2013 at 02:13:26PM -0700, Ben Widawsky wrote:
> > On Wed, Jun 12, 2013 at 12:35:28PM +0300, Mika Kuoppala wrote:
> > > To count context losses, add struct i915_ctx_hang_stats for
> > > both i915_hw_context and drm_i915_file_private.
> > > drm_i915_file_private is used when there is no context.
> > > 
> > > v2: renamed and cleaned up the struct (Chris Wilson, Ian Romanick)
> > > 
> > > Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> > >
> > I don't have time to do a proper review before Daniel wants to merge
> > these, and Chris has already reviewed it.
> > 
> > 1-6 are:
> > Acked-by: Ben Widawsky <ben@bwidawsk.net>
> > 
> > I don't really like the behavior of 7. At least, I'd like to make it
> > something that can be disabled via debugfs, sysfs, or module parameter.
> > (I'd very much prefer it to be opt-in also).  TBH , I only read it very
> > fast, and I'm not horribly opposed to it, just a bunch of complexity for
> > IMO little gain. Presumably the problem it's trying to solve should be
> > fixed with a fix to ddx, mesa, libva, client, whatever.  In the embedded
> > case, the same thing applies.  Banning the guilty doesn't make the user
> > experience any better. So the only thing I see is DoS, but we've never
> > *really* made that our priority anyway, so, meh.
> 
> Right, it is policy. But it is existing policy. Ultimately we want to
> get as much of that decision out of the kernel.

Merged the entire series with a little note added to this patch explaining
the justification for it. Thanks for patches and review.
-Daniel
Daniel Vetter June 13, 2013, 8:59 p.m. UTC | #5
On Thu, Jun 13, 2013 at 11:53:13AM +0200, Daniel Vetter wrote:
> On Wed, Jun 12, 2013 at 11:44:51PM +0100, Chris Wilson wrote:
> > On Wed, Jun 12, 2013 at 02:13:26PM -0700, Ben Widawsky wrote:
> > > On Wed, Jun 12, 2013 at 12:35:28PM +0300, Mika Kuoppala wrote:
> > > > To count context losses, add struct i915_ctx_hang_stats for
> > > > both i915_hw_context and drm_i915_file_private.
> > > > drm_i915_file_private is used when there is no context.
> > > > 
> > > > v2: renamed and cleaned up the struct (Chris Wilson, Ian Romanick)
> > > > 
> > > > Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> > > >
> > > I don't have time to do a proper review before Daniel wants to merge
> > > these, and Chris has already reviewed it.
> > > 
> > > 1-6 are:
> > > Acked-by: Ben Widawsky <ben@bwidawsk.net>
> > > 
> > > I don't really like the behavior of 7. At least, I'd like to make it
> > > something that can be disabled via debugfs, sysfs, or module parameter.
> > > (I'd very much prefer it to be opt-in also).  TBH , I only read it very
> > > fast, and I'm not horribly opposed to it, just a bunch of complexity for
> > > IMO little gain. Presumably the problem it's trying to solve should be
> > > fixed with a fix to ddx, mesa, libva, client, whatever.  In the embedded
> > > case, the same thing applies.  Banning the guilty doesn't make the user
> > > experience any better. So the only thing I see is DoS, but we've never
> > > *really* made that our priority anyway, so, meh.
> > 
> > Right, it is policy. But it is existing policy. Ultimately we want to
> > get as much of that decision out of the kernel.
> 
> Merged the entire series with a little note added to this patch explaining
> the justification for it. Thanks for patches and review.

I think a careful review of the locking would be good here. I see a few
cases:
- dev_priv->hangcheck stats which are only touched by the hangcheck timer
  ever. Safe since the hangcheck timer is non-reentrant, but would be good
  to add a comment to those things.
- Data shared between hangcheck and reset work. Probably needs spinlock
  protection, I'd add a new one. I know that it's rather unlikely that
  we'll race, but if we e.g. increase the hangcheck rate a lot we might
  fire the next hangcheck before the reset work has completed. Or the
  scheduler could be really annoying and not give the work cpu time for a
  while.
- Guilty context stats in the context structs. Atm this is protected by
  dev->struct_mutex I think, but I'd like to not proliferate the use of
  that lock if possible (there's _way_ too much legacy bagadge attached to
  this thing). So I'd vote for the introduction of a new mutex.
- Anyting else I've missed in my quick review.

I think a patch for each class of data explaining how it's accessed and
how it's protected exactly (both in the commit message and with a short
comment in the headers) would be really fine.

Volunteered?

Cheers, Daniel
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index fd8898c..8e628dc 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1814,7 +1814,7 @@  int i915_driver_open(struct drm_device *dev, struct drm_file *file)
 	struct drm_i915_file_private *file_priv;
 
 	DRM_DEBUG_DRIVER("\n");
-	file_priv = kmalloc(sizeof(*file_priv), GFP_KERNEL);
+	file_priv = kzalloc(sizeof(*file_priv), GFP_KERNEL);
 	if (!file_priv)
 		return -ENOMEM;
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index eaa04a6..5f3da39 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -498,6 +498,13 @@  struct i915_hw_ppgtt {
 	void (*cleanup)(struct i915_hw_ppgtt *ppgtt);
 };
 
+struct i915_ctx_hang_stats {
+	/* This context had batch pending when hang was declared */
+	unsigned batch_pending;
+
+	/* This context had batch active when hang was declared */
+	unsigned batch_active;
+};
 
 /* This must match up with the value previously used for execbuf2.rsvd1. */
 #define DEFAULT_CONTEXT_ID 0
@@ -508,6 +515,7 @@  struct i915_hw_context {
 	struct drm_i915_file_private *file_priv;
 	struct intel_ring_buffer *ring;
 	struct drm_i915_gem_object *obj;
+	struct i915_ctx_hang_stats hang_stats;
 };
 
 enum no_fbc_reason {
@@ -1367,6 +1375,8 @@  struct drm_i915_file_private {
 		struct list_head request_list;
 	} mm;
 	struct idr context_idr;
+
+	struct i915_ctx_hang_stats hang_stats;
 };
 
 #define INTEL_INFO(dev)	(((struct drm_i915_private *) (dev)->dev_private)->info)