Message ID | 1371029734-10355-1-git-send-email-mika.kuoppala@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Jun 12, 2013 at 12:35:28PM +0300, Mika Kuoppala wrote: > To count context losses, add struct i915_ctx_hang_stats for > both i915_hw_context and drm_i915_file_private. > drm_i915_file_private is used when there is no context. > > v2: renamed and cleaned up the struct (Chris Wilson, Ian Romanick) > > Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Only minor bikesheds on the series, so Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> -Chris
On Wed, Jun 12, 2013 at 12:35:28PM +0300, Mika Kuoppala wrote: > To count context losses, add struct i915_ctx_hang_stats for > both i915_hw_context and drm_i915_file_private. > drm_i915_file_private is used when there is no context. > > v2: renamed and cleaned up the struct (Chris Wilson, Ian Romanick) > > Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> > I don't have time to do a proper review before Daniel wants to merge these, and Chris has already reviewed it. 1-6 are: Acked-by: Ben Widawsky <ben@bwidawsk.net> I don't really like the behavior of 7. At least, I'd like to make it something that can be disabled via debugfs, sysfs, or module parameter. (I'd very much prefer it to be opt-in also). TBH , I only read it very fast, and I'm not horribly opposed to it, just a bunch of complexity for IMO little gain. Presumably the problem it's trying to solve should be fixed with a fix to ddx, mesa, libva, client, whatever. In the embedded case, the same thing applies. Banning the guilty doesn't make the user experience any better. So the only thing I see is DoS, but we've never *really* made that our priority anyway, so, meh.
On Wed, Jun 12, 2013 at 02:13:26PM -0700, Ben Widawsky wrote: > On Wed, Jun 12, 2013 at 12:35:28PM +0300, Mika Kuoppala wrote: > > To count context losses, add struct i915_ctx_hang_stats for > > both i915_hw_context and drm_i915_file_private. > > drm_i915_file_private is used when there is no context. > > > > v2: renamed and cleaned up the struct (Chris Wilson, Ian Romanick) > > > > Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> > > > I don't have time to do a proper review before Daniel wants to merge > these, and Chris has already reviewed it. > > 1-6 are: > Acked-by: Ben Widawsky <ben@bwidawsk.net> > > I don't really like the behavior of 7. At least, I'd like to make it > something that can be disabled via debugfs, sysfs, or module parameter. > (I'd very much prefer it to be opt-in also). TBH , I only read it very > fast, and I'm not horribly opposed to it, just a bunch of complexity for > IMO little gain. Presumably the problem it's trying to solve should be > fixed with a fix to ddx, mesa, libva, client, whatever. In the embedded > case, the same thing applies. Banning the guilty doesn't make the user > experience any better. So the only thing I see is DoS, but we've never > *really* made that our priority anyway, so, meh. Right, it is policy. But it is existing policy. Ultimately we want to get as much of that decision out of the kernel. -Chris
On Wed, Jun 12, 2013 at 11:44:51PM +0100, Chris Wilson wrote: > On Wed, Jun 12, 2013 at 02:13:26PM -0700, Ben Widawsky wrote: > > On Wed, Jun 12, 2013 at 12:35:28PM +0300, Mika Kuoppala wrote: > > > To count context losses, add struct i915_ctx_hang_stats for > > > both i915_hw_context and drm_i915_file_private. > > > drm_i915_file_private is used when there is no context. > > > > > > v2: renamed and cleaned up the struct (Chris Wilson, Ian Romanick) > > > > > > Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> > > > > > I don't have time to do a proper review before Daniel wants to merge > > these, and Chris has already reviewed it. > > > > 1-6 are: > > Acked-by: Ben Widawsky <ben@bwidawsk.net> > > > > I don't really like the behavior of 7. At least, I'd like to make it > > something that can be disabled via debugfs, sysfs, or module parameter. > > (I'd very much prefer it to be opt-in also). TBH , I only read it very > > fast, and I'm not horribly opposed to it, just a bunch of complexity for > > IMO little gain. Presumably the problem it's trying to solve should be > > fixed with a fix to ddx, mesa, libva, client, whatever. In the embedded > > case, the same thing applies. Banning the guilty doesn't make the user > > experience any better. So the only thing I see is DoS, but we've never > > *really* made that our priority anyway, so, meh. > > Right, it is policy. But it is existing policy. Ultimately we want to > get as much of that decision out of the kernel. Merged the entire series with a little note added to this patch explaining the justification for it. Thanks for patches and review. -Daniel
On Thu, Jun 13, 2013 at 11:53:13AM +0200, Daniel Vetter wrote: > On Wed, Jun 12, 2013 at 11:44:51PM +0100, Chris Wilson wrote: > > On Wed, Jun 12, 2013 at 02:13:26PM -0700, Ben Widawsky wrote: > > > On Wed, Jun 12, 2013 at 12:35:28PM +0300, Mika Kuoppala wrote: > > > > To count context losses, add struct i915_ctx_hang_stats for > > > > both i915_hw_context and drm_i915_file_private. > > > > drm_i915_file_private is used when there is no context. > > > > > > > > v2: renamed and cleaned up the struct (Chris Wilson, Ian Romanick) > > > > > > > > Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> > > > > > > > I don't have time to do a proper review before Daniel wants to merge > > > these, and Chris has already reviewed it. > > > > > > 1-6 are: > > > Acked-by: Ben Widawsky <ben@bwidawsk.net> > > > > > > I don't really like the behavior of 7. At least, I'd like to make it > > > something that can be disabled via debugfs, sysfs, or module parameter. > > > (I'd very much prefer it to be opt-in also). TBH , I only read it very > > > fast, and I'm not horribly opposed to it, just a bunch of complexity for > > > IMO little gain. Presumably the problem it's trying to solve should be > > > fixed with a fix to ddx, mesa, libva, client, whatever. In the embedded > > > case, the same thing applies. Banning the guilty doesn't make the user > > > experience any better. So the only thing I see is DoS, but we've never > > > *really* made that our priority anyway, so, meh. > > > > Right, it is policy. But it is existing policy. Ultimately we want to > > get as much of that decision out of the kernel. > > Merged the entire series with a little note added to this patch explaining > the justification for it. Thanks for patches and review. I think a careful review of the locking would be good here. I see a few cases: - dev_priv->hangcheck stats which are only touched by the hangcheck timer ever. Safe since the hangcheck timer is non-reentrant, but would be good to add a comment to those things. - Data shared between hangcheck and reset work. Probably needs spinlock protection, I'd add a new one. I know that it's rather unlikely that we'll race, but if we e.g. increase the hangcheck rate a lot we might fire the next hangcheck before the reset work has completed. Or the scheduler could be really annoying and not give the work cpu time for a while. - Guilty context stats in the context structs. Atm this is protected by dev->struct_mutex I think, but I'd like to not proliferate the use of that lock if possible (there's _way_ too much legacy bagadge attached to this thing). So I'd vote for the introduction of a new mutex. - Anyting else I've missed in my quick review. I think a patch for each class of data explaining how it's accessed and how it's protected exactly (both in the commit message and with a short comment in the headers) would be really fine. Volunteered? Cheers, Daniel
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c index fd8898c..8e628dc 100644 --- a/drivers/gpu/drm/i915/i915_dma.c +++ b/drivers/gpu/drm/i915/i915_dma.c @@ -1814,7 +1814,7 @@ int i915_driver_open(struct drm_device *dev, struct drm_file *file) struct drm_i915_file_private *file_priv; DRM_DEBUG_DRIVER("\n"); - file_priv = kmalloc(sizeof(*file_priv), GFP_KERNEL); + file_priv = kzalloc(sizeof(*file_priv), GFP_KERNEL); if (!file_priv) return -ENOMEM; diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index eaa04a6..5f3da39 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -498,6 +498,13 @@ struct i915_hw_ppgtt { void (*cleanup)(struct i915_hw_ppgtt *ppgtt); }; +struct i915_ctx_hang_stats { + /* This context had batch pending when hang was declared */ + unsigned batch_pending; + + /* This context had batch active when hang was declared */ + unsigned batch_active; +}; /* This must match up with the value previously used for execbuf2.rsvd1. */ #define DEFAULT_CONTEXT_ID 0 @@ -508,6 +515,7 @@ struct i915_hw_context { struct drm_i915_file_private *file_priv; struct intel_ring_buffer *ring; struct drm_i915_gem_object *obj; + struct i915_ctx_hang_stats hang_stats; }; enum no_fbc_reason { @@ -1367,6 +1375,8 @@ struct drm_i915_file_private { struct list_head request_list; } mm; struct idr context_idr; + + struct i915_ctx_hang_stats hang_stats; }; #define INTEL_INFO(dev) (((struct drm_i915_private *) (dev)->dev_private)->info)
To count context losses, add struct i915_ctx_hang_stats for both i915_hw_context and drm_i915_file_private. drm_i915_file_private is used when there is no context. v2: renamed and cleaned up the struct (Chris Wilson, Ian Romanick) Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> --- drivers/gpu/drm/i915/i915_dma.c | 2 +- drivers/gpu/drm/i915/i915_drv.h | 10 ++++++++++ 2 files changed, 11 insertions(+), 1 deletion(-)