| Message ID | 1395741214-26440-1-git-send-email-sourab.gupta@intel.com (mailing list archive) |
|---|---|
| State | New, archived |
On Tue, Mar 25, 2014 at 03:23:34PM +0530, sourab.gupta@intel.com wrote:
> From: Akash Goel <akash.goel@intel.com>
>
> Added a new rendering specific Workaround 'WaTlbInvalidateStoreDataBefore'.
> This workaround has to be applied before doing TLB Invalidation on render ring.
> In this WA, before pipecontrol with TLB invalidate set, need to add 2 MI
> Store data commands.
> Without this, hardware cannot guarantee the command after the PIPE_CONTROL
> with TLB inv will not use the old TLB values.

Note that our command programming sequence already has multiple dword
writes between the flush of the last batch and the invalidation of the
next.

Is this w/a still required? Why?
-Chris
On Tue, 2014-03-25 at 10:59 +0000, Chris Wilson wrote:
> On Tue, Mar 25, 2014 at 03:23:34PM +0530, sourab.gupta@intel.com wrote:
> > From: Akash Goel <akash.goel@intel.com>
> >
> > Added a new rendering specific Workaround 'WaTlbInvalidateStoreDataBefore'.
> > This workaround has to be applied before doing TLB Invalidation on render ring.
> > In this WA, before pipecontrol with TLB invalidate set, need to add 2 MI
> > Store data commands.
> > Without this, hardware cannot guarantee the command after the PIPE_CONTROL
> > with TLB inv will not use the old TLB values.
>
> Note, that our command programming sequence already has multiple dword
> writes between the flush of the last batch and the invalidation of the
> next.
>
> Is this w/a still required? Why?
> -Chris
>
Hi Chris,
Are you referring to the MI_STORE_DWORD_INDEX commands emitted as a part
of the add_request functions? We couldn't find any other place where we
are storing dwords between the flush of the last batch and the
invalidation of the next. In that case, we agree that as a part of the
command programming sequence, we'll have one set of store dwords emitted.

But, as per the spec, it is required to emit 2 sets of store dwords before
the TLB invalidation.

Also, our motive for having this w/a is just being paranoid, and not
assuming that dwords would already have been emitted before doing TLB
invalidation. So, we try to explicitly ensure the same through our w/a.

Regards,
Sourab
On Wed, Mar 26, 2014 at 05:14:05AM +0000, Gupta, Sourab wrote:
> On Tue, 2014-03-25 at 10:59 +0000, Chris Wilson wrote:
> > On Tue, Mar 25, 2014 at 03:23:34PM +0530, sourab.gupta@intel.com wrote:
> > > From: Akash Goel <akash.goel@intel.com>
> > >
> > > Added a new rendering specific Workaround 'WaTlbInvalidateStoreDataBefore'.
> > > This workaround has to be applied before doing TLB Invalidation on render ring.
> > > In this WA, before pipecontrol with TLB invalidate set, need to add 2 MI
> > > Store data commands.
> > > Without this, hardware cannot guarantee the command after the PIPE_CONTROL
> > > with TLB inv will not use the old TLB values.
> >
> > Note, that our command programming sequence already has multiple dword
> > writes between the flush of the last batch and the invalidation of the
> > next.
> >
> > Is this w/a still required? Why?
> > -Chris
> >
> Hi Chris,
> Are you referring to the MI_STORE_DWORD_INDEX commands emitted as a part
> of add_request functions?
> We couldn't find any other place where we are storing dwords between
> flush of last batch and invalidation of next.
> In that case, we agree that as a part of command programming sequence,
> we'll have one set of store dwords emitted.
>
> But, as per spec, it is required to emit 2 sets of store dwords before
> the tlb invalidation.

At this moment, I am upset by how vague this recommendation is. Why don't
the LRI count? Is there a timing requirement? Do the stores have to be to
different pages to flush a TLB, etc.?

> Also, our motive for having this w/a is just being paranoid, and not
> assuming that dwords would already have been emitted before doing tlb
> invalidation. So, we try to explicitly ensure the same through our w/a.

Which would be as easy as doubling up the STORE_DW(seqno).
-Chris
On Wed, 2014-03-26 at 07:54 +0000, Chris Wilson wrote:
> On Wed, Mar 26, 2014 at 05:14:05AM +0000, Gupta, Sourab wrote:
> > On Tue, 2014-03-25 at 10:59 +0000, Chris Wilson wrote:
> > > On Tue, Mar 25, 2014 at 03:23:34PM +0530, sourab.gupta@intel.com wrote:
> > > > From: Akash Goel <akash.goel@intel.com>
> > > >
> > > > Added a new rendering specific Workaround 'WaTlbInvalidateStoreDataBefore'.
> > > > This workaround has to be applied before doing TLB Invalidation on render ring.
> > > > In this WA, before pipecontrol with TLB invalidate set, need to add 2 MI
> > > > Store data commands.
> > > > Without this, hardware cannot guarantee the command after the PIPE_CONTROL
> > > > with TLB inv will not use the old TLB values.
> > >
> > > Note, that our command programming sequence already has multiple dword
> > > writes between the flush of the last batch and the invalidation of the
> > > next.
> > >
> > > Is this w/a still required? Why?
> > > -Chris
> > >
> > Hi Chris,
> > Are you referring to the MI_STORE_DWORD_INDEX commands emitted as a part
> > of add_request functions?
> > We couldn't find any other place where we are storing dwords between
> > flush of last batch and invalidation of next.
> > In that case, we agree that as a part of command programming sequence,
> > we'll have one set of store dwords emitted.
> >
> > But, as per spec, it is required to emit 2 sets of store dwords before
> > the tlb invalidation.
>
> At this moment, I am upset how vague this recommendation is. Why don't
> the LRI count? Is there a timing requirement? Do the stores have to
> be different pages to flush a TLB etc?
>
Hi Chris,
I see your point. We do see the MI_LOAD_REGISTER_IMM calls, but we were
not sure whether they would serve the purpose (forgive our limited
visibility into the low-level pipeline details).
If the LRI + store dword combination does the job, then this w/a may not
be required and we can drop it.

Regards,
Sourab

> > Also, our motive for having this w/a is just being paranoid, and not
> > assuming that dwords would already have been emitted before doing tlb
> > invalidation. So, we try to explicitly ensure the same through our w/a.
>
> Which would be as easy as doubling up the STORE_DW(seqno).
> -Chris
>
```diff
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 87d1a2d..816137f 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -208,6 +208,31 @@ intel_emit_post_sync_nonzero_flush(struct intel_ring_buffer *ring)
 }
 
 static int
+gen6_tlb_invalidate_wa(struct intel_ring_buffer *ring)
+{
+	/*
+	 * WaTlbInvalidateStoreDataBefore:gen6,gen7
+	 * This workaround has to be applied before doing TLB invalidation
+	 * on the render ring. Before pipecontrol with TLB invalidate set,
+	 * need 2 store data commands (such as MI_STORE_DATA_IMM or
+	 * MI_STORE_DATA_INDEX). Without this, hardware cannot guarantee
+	 * the command after the PIPE_CONTROL with TLB inv will not use
+	 * the old TLB values.
+	 */
+	int i, ret;
+	ret = intel_ring_begin(ring, 3 * 2);
+	if (ret)
+		return ret;
+	for (i = 0; i < 2; i++) {
+		intel_ring_emit(ring, MI_STORE_DWORD_INDEX);
+		intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_ADDR);
+		intel_ring_emit(ring, 0);
+	}
+	intel_ring_advance(ring);
+	return 0;
+}
+
+static int
 gen6_render_ring_flush(struct intel_ring_buffer *ring,
                        u32 invalidate_domains, u32 flush_domains)
 {
@@ -215,6 +240,13 @@ gen6_render_ring_flush(struct intel_ring_buffer *ring,
 	u32 scratch_addr = ring->scratch.gtt_offset + 128;
 	int ret;
 
+	/* Apply WaTlbInvalidateStoreDataBefore workaround */
+	if (invalidate_domains) {
+		ret = gen6_tlb_invalidate_wa(ring);
+		if (ret)
+			return ret;
+	}
+
 	/* Force SNB workarounds for PIPE_CONTROL flushes */
 	ret = intel_emit_post_sync_nonzero_flush(ring);
 	if (ret)
@@ -309,6 +341,13 @@ gen7_render_ring_flush(struct intel_ring_buffer *ring,
 	u32 scratch_addr = ring->scratch.gtt_offset + 128;
 	int ret;
 
+	/* Apply WaTlbInvalidateStoreDataBefore workaround */
+	if (invalidate_domains) {
+		ret = gen6_tlb_invalidate_wa(ring);
+		if (ret)
+			return ret;
+	}
+
 	/*
 	 * Ensure that any following seqno writes only happen when the render
 	 * cache is indeed flushed.
```