
drm/i915/bdw: MI_FLUSH_DW a qword instead of dword

Message ID 1393954736-1397-1-git-send-email-benjamin.widawsky@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Ben Widawsky March 4, 2014, 5:38 p.m. UTC
The actual post sync op is "Write Immediate Data QWord." It is therefore
arguable that we should have always done a qword write.

The actual impetus for this patch is that our decoder complains when we
write a dword, and I was trying to eliminate the spurious errors. With
this patch, I've noticed that a really strange reproducible error turns
into a different strange reproducible error - so it does indeed have
some effect.

This was also recommended to me by someone who is familiar with the
Windows driver.

It's based on top of the semaphore series, so it won't be easily
applied, I'd guess.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 94 +++++++++++++++++++++++++--------
 1 file changed, 73 insertions(+), 21 deletions(-)

Comments

Chris Wilson March 5, 2014, 9:24 a.m. UTC | #1
On Tue, Mar 04, 2014 at 09:38:56AM -0800, Ben Widawsky wrote:
> The actual post sync op is "Write Immediate Data QWord." It is therefore
> arguable that we should have always done a qword write.

Not really since the spec explicitly says that we can choose either a
dword or qword write. Note that qword writes also currently require a
64 byte alignment.
-Chris
Daniel Vetter March 5, 2014, 6:33 p.m. UTC | #2
On Wed, Mar 05, 2014 at 09:24:34AM +0000, Chris Wilson wrote:
> On Tue, Mar 04, 2014 at 09:38:56AM -0800, Ben Widawsky wrote:
> > The actual post sync op is "Write Immediate Data QWord." It is therefore
> > arguable that we should have always done a qword write.
> 
> Not really since the spec explicitly says that we can choose either a
> dword or qword write. Note that qword writes also currently require a
> 64 byte alignment.

Yeah, that's also my reading of the spec - the length field selects
whether the hw does a qword or dword write, and the qword needs to be
specially aligned.
-Daniel
Ben Widawsky March 5, 2014, 7:05 p.m. UTC | #3
On Wed, Mar 05, 2014 at 07:33:11PM +0100, Daniel Vetter wrote:
> On Wed, Mar 05, 2014 at 09:24:34AM +0000, Chris Wilson wrote:
> > On Tue, Mar 04, 2014 at 09:38:56AM -0800, Ben Widawsky wrote:
> > > The actual post sync op is "Write Immediate Data QWord." It is therefore
> > > arguable that we should have always done a qword write.
> > 
> > Not really since the spec explicitly says that we can choose either a
> > dword or qword write. Note that qword writes also currently require a
> > 64 byte alignment.
> 
> Yeah, that's also my reading of the spec - the length field selects
> whether the hw does a qword or dword write, and the qword needs to be
> specially aligned.
> -Daniel

I think both of you only read this sentence, where I said it was
"arguable." The rest of the commit message was what actually mattered.
Chris Wilson March 5, 2014, 10:30 p.m. UTC | #4
On Wed, Mar 05, 2014 at 11:05:15AM -0800, Ben Widawsky wrote:
> On Wed, Mar 05, 2014 at 07:33:11PM +0100, Daniel Vetter wrote:
> > On Wed, Mar 05, 2014 at 09:24:34AM +0000, Chris Wilson wrote:
> > > On Tue, Mar 04, 2014 at 09:38:56AM -0800, Ben Widawsky wrote:
> > > > The actual post sync op is "Write Immediate Data QWord." It is therefore
> > > > arguable that we should have always done a qword write.
> > > 
> > > Not really since the spec explicitly says that we can choose either a
> > > dword or qword write. Note that qword writes also currently require a
> > > 64 byte alignment.
> > 
> > Yeah, that's also my reading of the spec - the length field selects
> > whether the hw does a qword or dword write, and the qword needs to be
> > specially aligned.
> > -Daniel
> 
> I think both of you only read this sentence, where I said it was
> "arguable." The rest of the commit message was what actually mattered.

I'm just arguing that the changelog is misleading. What we are doing is
papering over an elephant, and more importantly I think it overlooked
the extra restrictions imposed upon qwords (though it looks like we
fortuitously are ok). The changelog also implies that all our other
code is similarly flawed.

The actual patch of splitting the code up into separate gen8 routines I
thought was a nice improvement in readability.
-Chris
Ben Widawsky March 5, 2014, 10:38 p.m. UTC | #5
On Wed, Mar 05, 2014 at 10:30:21PM +0000, Chris Wilson wrote:
> On Wed, Mar 05, 2014 at 11:05:15AM -0800, Ben Widawsky wrote:
> > On Wed, Mar 05, 2014 at 07:33:11PM +0100, Daniel Vetter wrote:
> > > On Wed, Mar 05, 2014 at 09:24:34AM +0000, Chris Wilson wrote:
> > > > On Tue, Mar 04, 2014 at 09:38:56AM -0800, Ben Widawsky wrote:
> > > > > The actual post sync op is "Write Immediate Data QWord." It is therefore
> > > > > arguable that we should have always done a qword write.
> > > > 
> > > > Not really since the spec explicitly says that we can choose either a
> > > > dword or qword write. Note that qword writes also currently require a
> > > > 64 byte alignment.
> > > 
> > > Yeah, that's also my reading of the spec - the length field selects
> > > whether the hw does a qword or dword write, and the qword needs to be
> > > specially aligned.
> > > -Daniel
> > 
> > I think both of you only read this sentence, where I said it was
> > "arguable." The rest of the commit message was what actually mattered.
> 
> I'm just arguing that the changelog is misleading. What we are doing is
> papering over an elephant, and more importantly I think it overlooked
> the extra restrictions imposed upon qwords (though it looks like we
> fortuitously are ok). The changelog also implies that all our other
> code is similarly flawed.

It wasn't completely fortuitous - I did check; I was lucky you found my
check satisfactory, though. I agree it makes future code somewhat
risky, so maybe some safeguard is needed. I also have/had a patch to
lengthen MI_STORE_DATA_INDEX. The decoder does not complain about that
one, however, and neither did the Windows team, so I didn't want to
change it for the sake of change.

I think the reasons for FLUSH_DW are valid, but as it seems unrelated to
the actual root cause of the bug, I'll leave this one to the fates.

> 
> The actual patch of splitting the code up into separate gen8 routines I
> thought was a nice improvement in readability.
> -Chris
Lespiau, Damien March 5, 2014, 11:13 p.m. UTC | #6
On Wed, Mar 05, 2014 at 02:38:50PM -0800, Ben Widawsky wrote:
> It wasn't completely fortuitous, I did check. I was lucky you think my
> check was satisfactory though. I agree it makes future code somewhat
> risky so maybe some improvement is needed to safeguard. I also have/had
> a patch to lengthen MI_STORE_DATA_INDEX. The decoder however does not
> complain about that one, and the windows team did neither. So I didn't
> want to change it for the sake of change.

Note that the decoder warns here because the shorter length is not valid
if you believe what's in the spec. It could actually work ok if the CS
hardware respects the length field, doesn't go fetch the missing dword,
and, as we're not using it, doesn't care too much (which seems to be the
case, since we correctly jump to the next instruction). But I'm not
quite sure how well defined this is.

Patch

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 5f7bee8..2f47abb 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -684,7 +684,7 @@  static int gen8_rcs_signal(struct intel_ring_buffer *signaller,
 static int gen8_xcs_signal(struct intel_ring_buffer *signaller,
 			   unsigned int num_dwords)
 {
-#define MBOX_UPDATE_DWORDS 6
+#define MBOX_UPDATE_DWORDS 8
 	struct drm_device *dev = signaller->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_ring_buffer *waiter;
@@ -704,15 +704,18 @@  static int gen8_xcs_signal(struct intel_ring_buffer *signaller,
 		if (gtt_offset == MI_SEMAPHORE_SYNC_INVALID)
 			continue;
 
-		intel_ring_emit(signaller, (MI_FLUSH_DW + 1) |
+		intel_ring_emit(signaller, (MI_FLUSH_DW + 2) |
 					   MI_FLUSH_DW_OP_STOREDW);
 		intel_ring_emit(signaller, lower_32_bits(gtt_offset) |
 					   MI_FLUSH_DW_USE_GTT);
 		intel_ring_emit(signaller, upper_32_bits(gtt_offset));
 		intel_ring_emit(signaller, signaller->outstanding_lazy_seqno);
+		intel_ring_emit(signaller, 0); /* upper dword */
+
 		intel_ring_emit(signaller, MI_SEMAPHORE_SIGNAL |
 					   MI_SEMAPHORE_TARGET(waiter->id));
 		intel_ring_emit(signaller, 0);
+		intel_ring_emit(signaller, MI_NOOP);
 	}
 
 	WARN_ON(i != num_rings);
@@ -1830,6 +1833,35 @@  static void gen6_bsd_ring_write_tail(struct intel_ring_buffer *ring,
 		   _MASKED_BIT_DISABLE(GEN6_BSD_SLEEP_MSG_DISABLE));
 }
 
+static int gen8_bsd_ring_flush(struct intel_ring_buffer *ring,
+			       u32 invalidate, u32 flush)
+{
+	uint32_t cmd;
+	int ret;
+
+	ret = intel_ring_begin(ring, 6);
+	if (ret)
+		return ret;
+
+	cmd = MI_FLUSH_DW + 2;
+	/*
+	 * Bspec vol 1c.5 - video engine command streamer:
+	 * "If ENABLED, all TLBs will be invalidated once the flush
+	 * operation is complete. This bit is only valid when the
+	 * Post-Sync Operation field is a value of 1h or 3h."
+	 */
+	if (invalidate & I915_GEM_GPU_DOMAINS)
+		cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD |
+			MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
+	intel_ring_emit(ring, cmd);
+	intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT);
+	intel_ring_emit(ring, 0); /* upper addr */
+	intel_ring_emit(ring, 0); /* value */
+	intel_ring_emit(ring, 0); /* value */
+	intel_ring_emit(ring, MI_NOOP);
+	intel_ring_advance(ring);
+	return 0;
+}
 static int gen6_bsd_ring_flush(struct intel_ring_buffer *ring,
 			       u32 invalidate, u32 flush)
 {
@@ -1841,8 +1873,6 @@  static int gen6_bsd_ring_flush(struct intel_ring_buffer *ring,
 		return ret;
 
 	cmd = MI_FLUSH_DW;
-	if (INTEL_INFO(ring->dev)->gen >= 8)
-		cmd += 1;
 	/*
 	 * Bspec vol 1c.5 - video engine command streamer:
 	 * "If ENABLED, all TLBs will be invalidated once the flush
@@ -1854,13 +1884,8 @@  static int gen6_bsd_ring_flush(struct intel_ring_buffer *ring,
 			MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
 	intel_ring_emit(ring, cmd);
 	intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT);
-	if (INTEL_INFO(ring->dev)->gen >= 8) {
-		intel_ring_emit(ring, 0); /* upper addr */
-		intel_ring_emit(ring, 0); /* value */
-	} else  {
-		intel_ring_emit(ring, 0);
-		intel_ring_emit(ring, MI_NOOP);
-	}
+	intel_ring_emit(ring, 0);
+	intel_ring_emit(ring, MI_NOOP);
 	intel_ring_advance(ring);
 	return 0;
 }
@@ -1931,8 +1956,38 @@  gen6_ring_dispatch_execbuffer(struct intel_ring_buffer *ring,
 	return 0;
 }
 
-/* Blitter support (SandyBridge+) */
+static int gen8_ring_flush(struct intel_ring_buffer *ring,
+			   u32 invalidate, u32 flush)
+{
+	uint32_t cmd;
+	int ret;
+
+	ret = intel_ring_begin(ring, 6);
+	if (ret)
+		return ret;
+
+	cmd = MI_FLUSH_DW + 2;
+	/*
+	 * Bspec vol 1c.3 - blitter engine command streamer:
+	 * "If ENABLED, all TLBs will be invalidated once the flush
+	 * operation is complete. This bit is only valid when the
+	 * Post-Sync Operation field is a value of 1h or 3h."
+	 */
+	if (invalidate & I915_GEM_DOMAIN_RENDER)
+		cmd |= MI_INVALIDATE_TLB | MI_FLUSH_DW_STORE_INDEX |
+			MI_FLUSH_DW_OP_STOREDW;
+	intel_ring_emit(ring, cmd);
+	intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT);
+	intel_ring_emit(ring, 0); /* upper addr */
+	intel_ring_emit(ring, 0); /* value */
+	intel_ring_emit(ring, 0); /* value */
+	intel_ring_emit(ring, MI_NOOP);
+	intel_ring_advance(ring);
+
+	return 0;
+}
 
+/* Blitter support (SandyBridge+) */
 static int gen6_ring_flush(struct intel_ring_buffer *ring,
 			   u32 invalidate, u32 flush)
 {
@@ -1945,8 +2000,7 @@  static int gen6_ring_flush(struct intel_ring_buffer *ring,
 		return ret;
 
 	cmd = MI_FLUSH_DW;
-	if (INTEL_INFO(ring->dev)->gen >= 8)
-		cmd += 1;
+
 	/*
 	 * Bspec vol 1c.3 - blitter engine command streamer:
 	 * "If ENABLED, all TLBs will be invalidated once the flush
@@ -1958,13 +2012,8 @@  static int gen6_ring_flush(struct intel_ring_buffer *ring,
 			MI_FLUSH_DW_OP_STOREDW;
 	intel_ring_emit(ring, cmd);
 	intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT);
-	if (INTEL_INFO(ring->dev)->gen >= 8) {
-		intel_ring_emit(ring, 0); /* upper addr */
-		intel_ring_emit(ring, 0); /* value */
-	} else  {
-		intel_ring_emit(ring, 0);
-		intel_ring_emit(ring, MI_NOOP);
-	}
+	intel_ring_emit(ring, 0);
+	intel_ring_emit(ring, MI_NOOP);
 	intel_ring_advance(ring);
 
 	if (IS_GEN7(dev) && !invalidate && flush)
@@ -2190,6 +2239,7 @@  int intel_init_bsd_ring_buffer(struct drm_device *dev)
 		if (INTEL_INFO(dev)->gen >= 8) {
 			ring->irq_enable_mask =
 				GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
+			ring->flush = gen8_bsd_ring_flush;
 			ring->irq_get = gen8_ring_get_irq;
 			ring->irq_put = gen8_ring_put_irq;
 			ring->dispatch_execbuffer =
@@ -2257,6 +2307,7 @@  int intel_init_blt_ring_buffer(struct drm_device *dev)
 	if (INTEL_INFO(dev)->gen >= 8) {
 		ring->irq_enable_mask =
 			GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
+		ring->flush = gen8_ring_flush;
 		ring->irq_get = gen8_ring_get_irq;
 		ring->irq_put = gen8_ring_put_irq;
 		ring->dispatch_execbuffer = gen8_ring_dispatch_execbuffer;
@@ -2306,6 +2357,7 @@  int intel_init_vebox_ring_buffer(struct drm_device *dev)
 	if (INTEL_INFO(dev)->gen >= 8) {
 		ring->irq_enable_mask =
 			GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
+		ring->flush = gen8_ring_flush;
 		ring->irq_get = gen8_ring_get_irq;
 		ring->irq_put = gen8_ring_put_irq;
 		ring->dispatch_execbuffer = gen8_ring_dispatch_execbuffer;