Message ID | 1343346523-25255-1-git-send-email-ben@bwidawsk.net (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
On Thu, Jul 26, 2012 at 04:48:43PM -0700, Ben Widawsky wrote: > The IVB simulator really doesn't like a TLB invalidate with no post-sync > operation, in fact it blows up in an assertion failure. The > documentation states that we must issue the TLB invalidate with a CS > stall: "Also Requires stall bit ([20] of DW1) set." This patch doesn't > comply with the docs, but we're able to satisfy the simulator with this > very small change, and I think simulator has historically trumped docs. > > Note, I don't think this belongs in stable as our TLB invalidation > should be correct since we use the global invalidation per batch. Using > TLB invalidation is itself only a requirement of HW contexts. > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> I have another patch from Chris that kills the non-zero post-sync op workaround for ivb ... So I guess we can't do this this easily. -Daniel > --- > drivers/gpu/drm/i915/intel_ringbuffer.c | 10 +++++++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c > index c58f1b9..339712a 100644 > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > @@ -198,8 +198,12 @@ intel_emit_post_sync_nonzero_flush(struct intel_ring_buffer *ring) > if (ret) > return ret; > > + /* NOTE: we want the TLB invalidate for render ring flush, but it must > + * be sent with a non-zero post sync op. So it can be stuffed in here > + * for convenience. > + */ > intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(5)); > - intel_ring_emit(ring, PIPE_CONTROL_QW_WRITE); > + intel_ring_emit(ring, PIPE_CONTROL_QW_WRITE | PIPE_CONTROL_TLB_INVALIDATE); > intel_ring_emit(ring, scratch_addr | PIPE_CONTROL_GLOBAL_GTT); /* address */ > intel_ring_emit(ring, 0); > intel_ring_emit(ring, 0); > @@ -225,10 +229,10 @@ gen6_render_ring_flush(struct intel_ring_buffer *ring, > > /* Just flush everything. Experiments have shown that reducing the > * number of bits based on the write domains has little performance > - * impact. > + * impact. Note to copy/pasters: the TLB invlidate we want is tucked > + * into intel_emit_post_sync_nonzero_flush(). > */ > flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH; > - flags |= PIPE_CONTROL_TLB_INVALIDATE; > flags |= PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE; > flags |= PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE; > flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH; > -- > 1.7.11.3 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Fri, 27 Jul 2012 07:17:36 +0200 Daniel Vetter <daniel@ffwll.ch> wrote: > On Thu, Jul 26, 2012 at 04:48:43PM -0700, Ben Widawsky wrote: > > The IVB simulator really doesn't like a TLB invalidate with no > > post-sync operation, in fact it blows up in an assertion failure. > > The documentation states that we must issue the TLB invalidate with > > a CS stall: "Also Requires stall bit ([20] of DW1) set." This patch > > doesn't comply with the docs, but we're able to satisfy the > > simulator with this very small change, and I think simulator has > > historically trumped docs. > > > > Note, I don't think this belongs in stable as our TLB invalidation > > should be correct since we use the global invalidation per batch. > > Using TLB invalidation is itself only a requirement of HW contexts. > > > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> > > I have another patch from Chris that kills the non-zero post-sync op > workaround for ivb ... So I guess we can't do this this easily. > -Daniel Rethink this. The need to emit the TLB invalidation as the simulator dictates essentially means we cannot remove the workaround as Chris' patch does. > > --- > > drivers/gpu/drm/i915/intel_ringbuffer.c | 10 +++++++--- > > 1 file changed, 7 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c > > b/drivers/gpu/drm/i915/intel_ringbuffer.c index c58f1b9..339712a > > 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > > @@ -198,8 +198,12 @@ intel_emit_post_sync_nonzero_flush(struct > > intel_ring_buffer *ring) if (ret) > > return ret; > > > > + /* NOTE: we want the TLB invalidate for render ring flush, > > but it must > > + * be sent with a non-zero post sync op. So it can be > > stuffed in here > > + * for convenience. > > + */ > > intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(5)); > > - intel_ring_emit(ring, PIPE_CONTROL_QW_WRITE); > > + intel_ring_emit(ring, PIPE_CONTROL_QW_WRITE | > > PIPE_CONTROL_TLB_INVALIDATE); intel_ring_emit(ring, scratch_addr | > > PIPE_CONTROL_GLOBAL_GTT); /* address */ intel_ring_emit(ring, 0); > > intel_ring_emit(ring, 0); > > @@ -225,10 +229,10 @@ gen6_render_ring_flush(struct > > intel_ring_buffer *ring, > > /* Just flush everything. Experiments have shown that > > reducing the > > * number of bits based on the write domains has little > > performance > > - * impact. > > + * impact. Note to copy/pasters: the TLB invlidate we want > > is tucked > > + * into intel_emit_post_sync_nonzero_flush(). > > */ > > flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH; > > - flags |= PIPE_CONTROL_TLB_INVALIDATE; > > flags |= PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE; > > flags |= PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE; > > flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH; > > -- > > 1.7.11.3 > > > > _______________________________________________ > > Intel-gfx mailing list > > Intel-gfx@lists.freedesktop.org > > http://lists.freedesktop.org/mailman/listinfo/intel-gfx >
On Fri, Jul 27, 2012 at 7:57 PM, Ben Widawsky <ben@bwidawsk.net> wrote: > On Fri, 27 Jul 2012 07:17:36 +0200 > Daniel Vetter <daniel@ffwll.ch> wrote: > >> On Thu, Jul 26, 2012 at 04:48:43PM -0700, Ben Widawsky wrote: >> > The IVB simulator really doesn't like a TLB invalidate with no >> > post-sync operation, in fact it blows up in an assertion failure. >> > The documentation states that we must issue the TLB invalidate with >> > a CS stall: "Also Requires stall bit ([20] of DW1) set." This patch >> > doesn't comply with the docs, but we're able to satisfy the >> > simulator with this very small change, and I think simulator has >> > historically trumped docs. >> > >> > Note, I don't think this belongs in stable as our TLB invalidation >> > should be correct since we use the global invalidation per batch. >> > Using TLB invalidation is itself only a requirement of HW contexts. >> > >> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> >> >> I have another patch from Chris that kills the non-zero post-sync op >> workaround for ivb ... So I guess we can't do this this easily. >> -Daniel > > Rethink this. The need to emit the TLB invalidation as the simulator > dictates essentially means we cannot remove the workaround as > Chris' patch does. Afaik that "tlb needs a non-zero post-sync op" just means that we need to have a non-zero post-sync op in the pipe_control cmd with the tlb set. Whereas the gen6 wa dictatest that we need a nonzero postsync op _before_ the pipe_control that sets the render flush bit. So I'm still playing the dense here and don't see the connection (besides that we can abuse the w/a nonzero postsync op to also make the tlb flush happy). So can't we just set add w/a nonzeor postsync op for gen7 to make the simulator happy? -Daniel
On 2012-08-05 11:32, Daniel Vetter wrote: > On Fri, Jul 27, 2012 at 7:57 PM, Ben Widawsky <ben@bwidawsk.net> > wrote: >> On Fri, 27 Jul 2012 07:17:36 +0200 >> Daniel Vetter <daniel@ffwll.ch> wrote: >> >>> On Thu, Jul 26, 2012 at 04:48:43PM -0700, Ben Widawsky wrote: >>> > The IVB simulator really doesn't like a TLB invalidate with no >>> > post-sync operation, in fact it blows up in an assertion failure. >>> > The documentation states that we must issue the TLB invalidate >>> with >>> > a CS stall: "Also Requires stall bit ([20] of DW1) set." This >>> patch >>> > doesn't comply with the docs, but we're able to satisfy the >>> > simulator with this very small change, and I think simulator has >>> > historically trumped docs. >>> > >>> > Note, I don't think this belongs in stable as our TLB >>> invalidation >>> > should be correct since we use the global invalidation per batch. >>> > Using TLB invalidation is itself only a requirement of HW >>> contexts. >>> > >>> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> >>> >>> I have another patch from Chris that kills the non-zero post-sync >>> op >>> workaround for ivb ... So I guess we can't do this this easily. >>> -Daniel >> >> Rethink this. The need to emit the TLB invalidation as the simulator >> dictates essentially means we cannot remove the workaround as >> Chris' patch does. > > Afaik that "tlb needs a non-zero post-sync op" just means that we > need > to have a non-zero post-sync op in the pipe_control cmd with the tlb > set. Whereas the gen6 wa dictatest that we need a nonzero postsync op > _before_ the pipe_control that sets the render flush bit. So I'm > still > playing the dense here and don't see the connection (besides that we > can abuse the w/a nonzero postsync op to also make the tlb flush > happy). > > So can't we just set add w/a nonzeor postsync op for gen7 to make the > simulator happy? > -Daniel That is certainly a possibility. I'll double check the bspec or simulator before I submit anything though. I'd like to make sure this doesn't break some other rule. For some reason I was under the assumption that we couldn't just emit a post sync op without several workarounds even on gen7...
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index c58f1b9..339712a 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -198,8 +198,12 @@ intel_emit_post_sync_nonzero_flush(struct intel_ring_buffer *ring) if (ret) return ret; + /* NOTE: we want the TLB invalidate for render ring flush, but it must + * be sent with a non-zero post sync op. So it can be stuffed in here + * for convenience. + */ intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(5)); - intel_ring_emit(ring, PIPE_CONTROL_QW_WRITE); + intel_ring_emit(ring, PIPE_CONTROL_QW_WRITE | PIPE_CONTROL_TLB_INVALIDATE); intel_ring_emit(ring, scratch_addr | PIPE_CONTROL_GLOBAL_GTT); /* address */ intel_ring_emit(ring, 0); intel_ring_emit(ring, 0); @@ -225,10 +229,10 @@ gen6_render_ring_flush(struct intel_ring_buffer *ring, /* Just flush everything. Experiments have shown that reducing the * number of bits based on the write domains has little performance - * impact. + * impact. Note to copy/pasters: the TLB invlidate we want is tucked + * into intel_emit_post_sync_nonzero_flush(). */ flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH; - flags |= PIPE_CONTROL_TLB_INVALIDATE; flags |= PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE; flags |= PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE; flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
The IVB simulator really doesn't like a TLB invalidate with no post-sync operation, in fact it blows up in an assertion failure. The documentation states that we must issue the TLB invalidate with a CS stall: "Also Requires stall bit ([20] of DW1) set." This patch doesn't comply with the docs, but we're able to satisfy the simulator with this very small change, and I think simulator has historically trumped docs. Note, I don't think this belongs in stable as our TLB invalidation should be correct since we use the global invalidation per batch. Using TLB invalidation is itself only a requirement of HW contexts. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> --- drivers/gpu/drm/i915/intel_ringbuffer.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)