Message ID | 1390362310-15963-3-git-send-email-akash.goel@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Jan 22, 2014 at 09:15:06AM +0530, akash.goel@intel.com wrote: > From: Akash Goel <akash.goel@intel.com> > > Added a new rendering specific Workaround 'WaReadAfterWriteHazard'. > In this WA, need to add 12 MI Store Dword commands to ensure proper > flush of h/w pipeline. > > Signed-off-by: Akash Goel <akash.goel@intel.com> > --- > drivers/gpu/drm/i915/intel_ringbuffer.c | 25 +++++++++++++++++++++++++ > 1 file changed, 25 insertions(+) > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c > index 133d273..e8ec536 100644 > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > @@ -2167,6 +2167,31 @@ intel_ring_flush_all_caches(struct intel_ring_buffer *ring) > > trace_i915_gem_ring_flush(ring, 0, I915_GEM_GPU_DOMAINS); > > + if (IS_VALLEYVIEW(ring->dev)) { > + /* > + * WaReadAfterWriteHazard > + * Send a number of Store Data commands here to finish > + * flushing hardware pipeline.This is needed in the case > + * where the next workload tries reading from the same > + * surface that this batch writes to. Without these StoreDWs, > + * not all of the data will actually be flushd to the surface > + * by the time the next batch starts reading it, possibly > + * causing a small amount of corruption. > + */ > + int i; > + ret = intel_ring_begin(ring, 4 * 12); BSpec says 8 is enough. Is Bspec incorrect. Also this workaround is also listed for everything SNB+. > + if (ret) > + return ret; > + for (i = 0; i < 12; i++) { > + intel_ring_emit(ring, MI_STORE_DWORD_INDEX); > + intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_INDEX << > + MI_STORE_DWORD_INDEX_SHIFT); > + intel_ring_emit(ring, 0); > + intel_ring_emit(ring, MI_NOOP); > + } > + intel_ring_advance(ring); > + } > + > ring->gpu_caches_dirty = false; > return 0; > } > -- > 1.8.5.2 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Wed, Jan 22, 2014 at 12:54:51PM +0200, Ville Syrjälä wrote: > On Wed, Jan 22, 2014 at 09:15:06AM +0530, akash.goel@intel.com wrote: > > From: Akash Goel <akash.goel@intel.com> > > > > Added a new rendering specific Workaround 'WaReadAfterWriteHazard'. > > In this WA, need to add 12 MI Store Dword commands to ensure proper > > flush of h/w pipeline. > > > > Signed-off-by: Akash Goel <akash.goel@intel.com> > > --- > > drivers/gpu/drm/i915/intel_ringbuffer.c | 25 +++++++++++++++++++++++++ > > 1 file changed, 25 insertions(+) > > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c > > index 133d273..e8ec536 100644 > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > > @@ -2167,6 +2167,31 @@ intel_ring_flush_all_caches(struct intel_ring_buffer *ring) > > > > trace_i915_gem_ring_flush(ring, 0, I915_GEM_GPU_DOMAINS); > > > > + if (IS_VALLEYVIEW(ring->dev)) { > > + /* > > + * WaReadAfterWriteHazard > > + * Send a number of Store Data commands here to finish > > + * flushing hardware pipeline.This is needed in the case > > + * where the next workload tries reading from the same > > + * surface that this batch writes to. Without these StoreDWs, > > + * not all of the data will actually be flushd to the surface > > + * by the time the next batch starts reading it, possibly > > + * causing a small amount of corruption. > > + */ > > + int i; > > + ret = intel_ring_begin(ring, 4 * 12); > > BSpec says 8 is enough. Is Bspec incorrect. No, these are just figures they plucked out of the air. Last I heard they were using 32... > Also this workaround is also listed for everything SNB+. And we already have a more effective workaround. -Chris
On Wed, 2014-01-22 at 11:11 +0000, Chris Wilson wrote: > On Wed, Jan 22, 2014 at 12:54:51PM +0200, Ville Syrjälä wrote: > > On Wed, Jan 22, 2014 at 09:15:06AM +0530, akash.goel@intel.com wrote: > > > From: Akash Goel <akash.goel@intel.com> > > > > > > Added a new rendering specific Workaround 'WaReadAfterWriteHazard'. > > > In this WA, need to add 12 MI Store Dword commands to ensure proper > > > flush of h/w pipeline. > > > > > > Signed-off-by: Akash Goel <akash.goel@intel.com> > > > --- > > > drivers/gpu/drm/i915/intel_ringbuffer.c | 25 +++++++++++++++++++++++++ > > > 1 file changed, 25 insertions(+) > > > > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c > > > index 133d273..e8ec536 100644 > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > > > @@ -2167,6 +2167,31 @@ intel_ring_flush_all_caches(struct intel_ring_buffer *ring) > > > > > > trace_i915_gem_ring_flush(ring, 0, I915_GEM_GPU_DOMAINS); > > > > > > + if (IS_VALLEYVIEW(ring->dev)) { > > > + /* > > > + * WaReadAfterWriteHazard > > > + * Send a number of Store Data commands here to finish > > > + * flushing hardware pipeline.This is needed in the case > > > + * where the next workload tries reading from the same > > > + * surface that this batch writes to. Without these StoreDWs, > > > + * not all of the data will actually be flushd to the surface > > > + * by the time the next batch starts reading it, possibly > > > + * causing a small amount of corruption. > > > + */ > > > + int i; > > > + ret = intel_ring_begin(ring, 4 * 12); > > > > BSpec says 8 is enough. Is Bspec incorrect. > > No, these are just figures they plucked out of the air. Last I heard > they were using 32... > > > Also this workaround is also listed for everything SNB+. > > And we already have a more effective workaround. > -Chris Can you please let us know whether this workaround patch is required or not. If not, then how is this currently handled. > > -- > Chris Wilson, Intel Open Source Technology Centre
On Fri, Mar 21, 2014 at 11:53:40AM +0000, Gupta, Sourab wrote: > On Wed, 2014-01-22 at 11:11 +0000, Chris Wilson wrote: > > On Wed, Jan 22, 2014 at 12:54:51PM +0200, Ville Syrjälä wrote: > > > On Wed, Jan 22, 2014 at 09:15:06AM +0530, akash.goel@intel.com wrote: > > > > From: Akash Goel <akash.goel@intel.com> > > > > > > > > Added a new rendering specific Workaround 'WaReadAfterWriteHazard'. > > > > In this WA, need to add 12 MI Store Dword commands to ensure proper > > > > flush of h/w pipeline. > > > > > > > > Signed-off-by: Akash Goel <akash.goel@intel.com> > > > > --- > > > > drivers/gpu/drm/i915/intel_ringbuffer.c | 25 +++++++++++++++++++++++++ > > > > 1 file changed, 25 insertions(+) > > > > > > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c > > > > index 133d273..e8ec536 100644 > > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c > > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > > > > @@ -2167,6 +2167,31 @@ intel_ring_flush_all_caches(struct intel_ring_buffer *ring) > > > > > > > > trace_i915_gem_ring_flush(ring, 0, I915_GEM_GPU_DOMAINS); > > > > > > > > + if (IS_VALLEYVIEW(ring->dev)) { > > > > + /* > > > > + * WaReadAfterWriteHazard > > > > + * Send a number of Store Data commands here to finish > > > > + * flushing hardware pipeline.This is needed in the case > > > > + * where the next workload tries reading from the same > > > > + * surface that this batch writes to. Without these StoreDWs, > > > > + * not all of the data will actually be flushd to the surface > > > > + * by the time the next batch starts reading it, possibly > > > > + * causing a small amount of corruption. > > > > + */ > > > > + int i; > > > > + ret = intel_ring_begin(ring, 4 * 12); > > > > > > BSpec says 8 is enough. Is Bspec incorrect. > > > > No, these are just figures they plucked out of the air. Last I heard > > they were using 32... > > > > > Also this workaround is also listed for everything SNB+. > > > > And we already have a more effective workaround. > > -Chris > > Can you please let us know whether this workaround patch is required or > not. If not, then how is this currently handled. We emit XY_SETUP_BLT before certain blt operations to insert a sufficiently long stall. The underlying bug this works around is that the cache controller of the cpu falls over in certain very specific rwm cycles. The official w/a is a full pipeline flush when switching between reading and writing to a surface, which has a horribly perf impact on the blitter. -Daniel
On Fri, 2014-03-21 at 14:58 +0000, Daniel Vetter wrote: > On Fri, Mar 21, 2014 at 11:53:40AM +0000, Gupta, Sourab wrote: > > On Wed, 2014-01-22 at 11:11 +0000, Chris Wilson wrote: > > > On Wed, Jan 22, 2014 at 12:54:51PM +0200, Ville Syrjälä wrote: > > > > On Wed, Jan 22, 2014 at 09:15:06AM +0530, akash.goel@intel.com wrote: > > > > > From: Akash Goel <akash.goel@intel.com> > > > > > > > > > > Added a new rendering specific Workaround 'WaReadAfterWriteHazard'. > > > > > In this WA, need to add 12 MI Store Dword commands to ensure proper > > > > > flush of h/w pipeline. > > > > > > > > > > Signed-off-by: Akash Goel <akash.goel@intel.com> > > > > > --- > > > > > drivers/gpu/drm/i915/intel_ringbuffer.c | 25 +++++++++++++++++++++++++ > > > > > 1 file changed, 25 insertions(+) > > > > > > > > > > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c > > > > > index 133d273..e8ec536 100644 > > > > > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c > > > > > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c > > > > > @@ -2167,6 +2167,31 @@ intel_ring_flush_all_caches(struct intel_ring_buffer *ring) > > > > > > > > > > trace_i915_gem_ring_flush(ring, 0, I915_GEM_GPU_DOMAINS); > > > > > > > > > > + if (IS_VALLEYVIEW(ring->dev)) { > > > > > + /* > > > > > + * WaReadAfterWriteHazard > > > > > + * Send a number of Store Data commands here to finish > > > > > + * flushing hardware pipeline.This is needed in the case > > > > > + * where the next workload tries reading from the same > > > > > + * surface that this batch writes to. Without these StoreDWs, > > > > > + * not all of the data will actually be flushd to the surface > > > > > + * by the time the next batch starts reading it, possibly > > > > > + * causing a small amount of corruption. > > > > > + */ > > > > > + int i; > > > > > + ret = intel_ring_begin(ring, 4 * 12); > > > > > > > > BSpec says 8 is enough. Is Bspec incorrect. > > > > > > No, these are just figures they plucked out of the air. Last I heard > > > they were using 32... > > > > > > > Also this workaround is also listed for everything SNB+. > > > > > > And we already have a more effective workaround. > > > -Chris > > > > Can you please let us know whether this workaround patch is required or > > not. If not, then how is this currently handled. > > We emit XY_SETUP_BLT before certain blt operations to insert a > sufficiently long stall. The underlying bug this works around is that the > cache controller of the cpu falls over in certain very specific rwm > cycles. The official w/a is a full pipeline flush when switching between > reading and writing to a surface, which has a horribly perf impact on the > blitter. > -Daniel Hi Daniel, In the kernel code, we're not able to see any such operation during the batchbuffer submission. Is the XY_SETUP_BLT emit done through the userspace, in these specific usecases? Does this mean that this patch is not admissible as such for this underlying bug? Regards, Sourab
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 133d273..e8ec536 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -2167,6 +2167,31 @@ intel_ring_flush_all_caches(struct intel_ring_buffer *ring) trace_i915_gem_ring_flush(ring, 0, I915_GEM_GPU_DOMAINS); + if (IS_VALLEYVIEW(ring->dev)) { + /* + * WaReadAfterWriteHazard + * Send a number of Store Data commands here to finish + * flushing hardware pipeline.This is needed in the case + * where the next workload tries reading from the same + * surface that this batch writes to. Without these StoreDWs, + * not all of the data will actually be flushd to the surface + * by the time the next batch starts reading it, possibly + * causing a small amount of corruption. + */ + int i; + ret = intel_ring_begin(ring, 4 * 12); + if (ret) + return ret; + for (i = 0; i < 12; i++) { + intel_ring_emit(ring, MI_STORE_DWORD_INDEX); + intel_ring_emit(ring, I915_GEM_HWS_SCRATCH_INDEX << + MI_STORE_DWORD_INDEX_SHIFT); + intel_ring_emit(ring, 0); + intel_ring_emit(ring, MI_NOOP); + } + intel_ring_advance(ring); + } + ring->gpu_caches_dirty = false; return 0; }