Message ID | 1363276337-12509-5-git-send-email-mika.kuoppala@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Mar 14, 2013 at 05:52:05PM +0200, Mika Kuoppala wrote: > From: Chris Wilson <chris@chris-wilson.co.uk> > > Once we thought we got semaphores working, we disabled kicking the ring > if hangcheck fired whilst waiting upon a ring as it was doing more harm > than good: > > commit 4e0e90dcb8a7df1229c69e30abebb59b0b3c2a1f > Author: Daniel Vetter <daniel.vetter@ffwll.ch> > Date: Wed Dec 14 13:56:58 2011 +0100 > > drm/i915: kicking rings stuck on semaphores considered harmful > > However, life is never that easy and semaphores are still causing > problems whereby the value written by one ring (bcs) is not being > propagated to the waiter (rcs). Thus the waiter never wakes up and we > declare the GPU hung, which often has unfortunate consequences, even if > we successfully reset the GPU. > > But the GPU is idle as it has completed the work, just didn't notify its > clients. So we can detect the incomplete wait during hang check and > probe the target ring to see if has indeed emitted the breadcrumb seqno > following the work and then and only then kick the waiter. > > Based on a suggestion by Ben Widawsky. > > v2: cross-check wait with iphdr. fix signaller calculation. > > References: https://bugs.freedesktop.org/show_bug.cgi?id=54226 > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch> > Cc: Ben Widawsky <ben@bwidawsk.net> > Acked-by: Ben Widawsky <ben@bwidawsk.net> I'll throw in the towel, let's all hail duct-tape. Queued for -next, thanks for the patch. -Daniel
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 2139714..ce7efa8 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -1787,6 +1787,37 @@ static bool i915_hangcheck_ring_idle(struct intel_ring_buffer *ring, bool *err) return false; } +static bool semaphore_passed(struct intel_ring_buffer *ring) +{ + struct drm_i915_private *dev_priv = ring->dev->dev_private; + u32 acthd = intel_ring_get_active_head(ring) & HEAD_ADDR; + struct intel_ring_buffer *signaller; + u32 cmd, ipehr, acthd_min; + + ipehr = I915_READ(RING_IPEHR(ring->mmio_base)); + if ((ipehr & ~(0x3 << 16)) != + (MI_SEMAPHORE_MBOX | MI_SEMAPHORE_COMPARE | MI_SEMAPHORE_REGISTER)) + return false; + + /* ACTHD is likely pointing to the dword after the actual command, + * so scan backwards until we find the MBOX. + */ + acthd_min = max((int)acthd - 3 * 4, 0); + do { + cmd = ioread32(ring->virtual_start + acthd); + if (cmd == ipehr) + break; + + acthd -= 4; + if (acthd < acthd_min) + return false; + } while (1); + + signaller = &dev_priv->ring[(ring->id + (((ipehr >> 17) & 1) + 1)) % 3]; + return i915_seqno_passed(signaller->get_seqno(signaller, false), + ioread32(ring->virtual_start+acthd+4)+1); +} + static bool kick_ring(struct intel_ring_buffer *ring) { struct drm_device *dev = ring->dev; @@ -1798,6 +1829,15 @@ static bool kick_ring(struct intel_ring_buffer *ring) I915_WRITE_CTL(ring, tmp); return true; } + + if (INTEL_INFO(dev)->gen >= 6 && + tmp & RING_WAIT_SEMAPHORE && + semaphore_passed(ring)) { + DRM_ERROR("Kicking stuck semaphore on %s\n", + ring->name); + I915_WRITE_CTL(ring, tmp); + return true; + } return false; }