Message ID | 20171101163235.10795-1-rafael.antognolli@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Nov 01, 2017 at 04:32:35PM +0000, Rafael Antognolli wrote: > The workaround for this is described as: > > "if RenderSurfaceState.Num_Multisamples > 1, disable RCC clock gating if > RenderSurfaceState.Num_Multisamples == 1, set 0x7010[14] = 1" > > So it looks like the userspace should be responsible for setting these, > based on the number of multisamples dependency. However, the register > that controls RCC clock gating is not a context register, and cannot be > set by userspace. > > Since we would end up setting one or another based on the number of > multisamples anyway, it seems harmless to just set both all the time. > > This change (specially the GEN10_READ_HIT_WRITEONLY_DISABLE bit) I wonder if we shouldn't stay only with this bit. For me it looks like one or another. > improves CNL stability by avoiding some of the hangs seen in the > platform. But this is what matters. If this is the safest option for us let's do it. > > Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> > --- > drivers/gpu/drm/i915/i915_reg.h | 2 ++ > drivers/gpu/drm/i915/intel_engine_cs.c | 5 +++++ > 2 files changed, 7 insertions(+) > > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h > index 8c775e96b4e4..d9a65cebefaa 100644 > --- a/drivers/gpu/drm/i915/i915_reg.h > +++ b/drivers/gpu/drm/i915/i915_reg.h > @@ -3837,6 +3837,7 @@ enum { > */ > #define SLICE_UNIT_LEVEL_CLKGATE _MMIO(0x94d4) > #define SARBUNIT_CLKGATE_DIS (1 << 5) > +#define RCCUNIT_CLKGATE_DIS (1 << 7) > > /* > * Display engine regs > @@ -7016,6 +7017,7 @@ enum { > #define GEN7_COMMON_SLICE_CHICKEN1 _MMIO(0x7010) > # define GEN7_CSC1_RHWO_OPT_DISABLE_IN_RCC ((1<<10) | (1<<26)) > # define GEN9_RHWO_OPTIMIZATION_DISABLE (1<<14) > +# define GEN10_READ_HIT_WRITEONLY_DISABLE (1<<14) I don't believe you need to redefine this. It is same as GEN9_RHWO_OPTIMIZATION_DISABLE. RCC Read Hit Write Only Optimization Disabled, SKL+ o spec. > #define COMMON_SLICE_CHICKEN2 _MMIO(0x7014) > # define GEN9_PBE_COMPRESSED_HASH_SELECTION (1<<13) > # define GEN9_DISABLE_GATHER_AT_SET_SHADER_COMMON_SLICE (1<<12) > diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c > index f31f2d6384c3..0d8e25a4623a 100644 > --- a/drivers/gpu/drm/i915/intel_engine_cs.c > +++ b/drivers/gpu/drm/i915/intel_engine_cs.c > @@ -1320,6 +1320,11 @@ static int cnl_init_workarounds(struct intel_engine_cs *engine) > WA_SET_FIELD_MASKED(GEN8_CS_CHICKEN1, GEN9_PREEMPT_GPGPU_LEVEL_MASK, > GEN9_PREEMPT_GPGPU_COMMAND_LEVEL); > > + /* ReadHitWriteOnlyDisable: cnl */ I was going to complain about the name but I saw on bspec it is really ReadHitWriteOnlyDisable while on wa_database it is WaReadHitWriteOnlyDisable I would tend to prefer the second one, but with the first one the search on Bspec works and search on wa_database also works... while second one it would be found on BSpec. So let it be: ReadHitWriteOnlyDisable Thanks, Rodrigo. > + WA_SET_BIT_MASKED(GEN7_COMMON_SLICE_CHICKEN1, > + GEN10_READ_HIT_WRITEONLY_DISABLE); > + WA_SET_BIT_MASKED(SLICE_UNIT_LEVEL_CLKGATE, RCCUNIT_CLKGATE_DIS); > + > /* WaEnablePreemptionGranularityControlByUMD:cnl */ > I915_WRITE(GEN7_FF_SLICE_CS_CHICKEN1, > _MASKED_BIT_ENABLE(GEN9_FFSC_PERCTX_PREEMPT_CTRL)); > -- > 2.13.6 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Wed, Nov 01, 2017 at 08:56:59PM +0000, Patchwork wrote: > == Series Details == > > Series: drm/i915: Implement ReadHitWriteOnlyDisable. > URL : https://patchwork.freedesktop.org/series/32991/ > State : failure > > == Summary == > > Test kms_flip: > Subgroup plain-flip-fb-recreate-interruptible: > pass -> FAIL (shard-hsw) fdo#100368 > Subgroup modeset-vs-vblank-race-interruptible: > fail -> PASS (shard-hsw) fdo#103060 > Test kms_setmode: > Subgroup basic: > fail -> PASS (shard-hsw) fdo#99912 > Test perf: > Subgroup oa-exponents: > fail -> PASS (shard-hsw) fdo#102254 > Test drv_module_reload: > Subgroup basic-reload: > pass -> DMESG-WARN (shard-hsw) fdo#102707 > Test kms_frontbuffer_tracking: > Subgroup fbc-1p-pri-indfb-multidraw: > pass -> FAIL (shard-hsw) A false positive here. I triggered the patch retest. Altough we don't cnl on full igt run anyways... > > fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368 > fdo#103060 https://bugs.freedesktop.org/show_bug.cgi?id=103060 > fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912 > fdo#102254 https://bugs.freedesktop.org/show_bug.cgi?id=102254 > fdo#102707 https://bugs.freedesktop.org/show_bug.cgi?id=102707 > > shard-hsw total:2539 pass:1430 dwarn:2 dfail:0 fail:10 skip:1097 time:9227s > > == Logs == > > For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_6300/shards.html > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Wed, Nov 01, 2017 at 02:11:05PM -0700, Rodrigo Vivi wrote: > On Wed, Nov 01, 2017 at 04:32:35PM +0000, Rafael Antognolli wrote: > > The workaround for this is described as: > > > > "if RenderSurfaceState.Num_Multisamples > 1, disable RCC clock gating if > > RenderSurfaceState.Num_Multisamples == 1, set 0x7010[14] = 1" > > > > So it looks like the userspace should be responsible for setting these, > > based on the number of multisamples dependency. However, the register > > that controls RCC clock gating is not a context register, and cannot be > > set by userspace. > > > > Since we would end up setting one or another based on the number of > > multisamples anyway, it seems harmless to just set both all the time. > > > > This change (specially the GEN10_READ_HIT_WRITEONLY_DISABLE bit) > > I wonder if we shouldn't stay only with this bit. For me it looks like > one or another. I would say we need at least the GEN10_READ_HIT_WRITEONLY_DISABLE bit, and then we can decide if we are adding the other one or not. Within the same program (I think sometimes even within a single draw call), it's possible that we have some surfaces that are multisampled while others are not, so in that case we would probably have to set both anyway. However, I don't have a test case that proves that the workaround for the multisampled case is required... > > improves CNL stability by avoiding some of the hangs seen in the > > platform. > > But this is what matters. If this is the safest option for us > let's do it. > > > > > Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> > > --- > > drivers/gpu/drm/i915/i915_reg.h | 2 ++ > > drivers/gpu/drm/i915/intel_engine_cs.c | 5 +++++ > > 2 files changed, 7 insertions(+) > > > > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h > > index 8c775e96b4e4..d9a65cebefaa 100644 > > --- a/drivers/gpu/drm/i915/i915_reg.h > > +++ b/drivers/gpu/drm/i915/i915_reg.h > > @@ -3837,6 +3837,7 @@ enum { > > */ > > #define SLICE_UNIT_LEVEL_CLKGATE _MMIO(0x94d4) > > #define SARBUNIT_CLKGATE_DIS (1 << 5) > > +#define RCCUNIT_CLKGATE_DIS (1 << 7) > > > > /* > > * Display engine regs > > @@ -7016,6 +7017,7 @@ enum { > > #define GEN7_COMMON_SLICE_CHICKEN1 _MMIO(0x7010) > > # define GEN7_CSC1_RHWO_OPT_DISABLE_IN_RCC ((1<<10) | (1<<26)) > > # define GEN9_RHWO_OPTIMIZATION_DISABLE (1<<14) > > +# define GEN10_READ_HIT_WRITEONLY_DISABLE (1<<14) > > I don't believe you need to redefine this. > It is same as GEN9_RHWO_OPTIMIZATION_DISABLE. > > RCC Read Hit Write Only Optimization Disabled, SKL+ o spec. Oh, I haven't noticed that! Will fix it in the next version... > > #define COMMON_SLICE_CHICKEN2 _MMIO(0x7014) > > # define GEN9_PBE_COMPRESSED_HASH_SELECTION (1<<13) > > # define GEN9_DISABLE_GATHER_AT_SET_SHADER_COMMON_SLICE (1<<12) > > diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c > > index f31f2d6384c3..0d8e25a4623a 100644 > > --- a/drivers/gpu/drm/i915/intel_engine_cs.c > > +++ b/drivers/gpu/drm/i915/intel_engine_cs.c > > @@ -1320,6 +1320,11 @@ static int cnl_init_workarounds(struct intel_engine_cs *engine) > > WA_SET_FIELD_MASKED(GEN8_CS_CHICKEN1, GEN9_PREEMPT_GPGPU_LEVEL_MASK, > > GEN9_PREEMPT_GPGPU_COMMAND_LEVEL); > > > > + /* ReadHitWriteOnlyDisable: cnl */ > > I was going to complain about the name but I saw on bspec it is really > ReadHitWriteOnlyDisable while on wa_database it is WaReadHitWriteOnlyDisable > > I would tend to prefer the second one, but with the first one the search on Bspec works > and search on wa_database also works... while second one it would be found on BSpec. > So let it be: ReadHitWriteOnlyDisable > > Thanks, > Rodrigo. > > > + WA_SET_BIT_MASKED(GEN7_COMMON_SLICE_CHICKEN1, > > + GEN10_READ_HIT_WRITEONLY_DISABLE); > > + WA_SET_BIT_MASKED(SLICE_UNIT_LEVEL_CLKGATE, RCCUNIT_CLKGATE_DIS); > > + > > > /* WaEnablePreemptionGranularityControlByUMD:cnl */ > > I915_WRITE(GEN7_FF_SLICE_CS_CHICKEN1, > > _MASKED_BIT_ENABLE(GEN9_FFSC_PERCTX_PREEMPT_CTRL)); > > -- > > 2.13.6 > > > > _______________________________________________ > > Intel-gfx mailing list > > Intel-gfx@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 8c775e96b4e4..d9a65cebefaa 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -3837,6 +3837,7 @@ enum { */ #define SLICE_UNIT_LEVEL_CLKGATE _MMIO(0x94d4) #define SARBUNIT_CLKGATE_DIS (1 << 5) +#define RCCUNIT_CLKGATE_DIS (1 << 7) /* * Display engine regs @@ -7016,6 +7017,7 @@ enum { #define GEN7_COMMON_SLICE_CHICKEN1 _MMIO(0x7010) # define GEN7_CSC1_RHWO_OPT_DISABLE_IN_RCC ((1<<10) | (1<<26)) # define GEN9_RHWO_OPTIMIZATION_DISABLE (1<<14) +# define GEN10_READ_HIT_WRITEONLY_DISABLE (1<<14) #define COMMON_SLICE_CHICKEN2 _MMIO(0x7014) # define GEN9_PBE_COMPRESSED_HASH_SELECTION (1<<13) # define GEN9_DISABLE_GATHER_AT_SET_SHADER_COMMON_SLICE (1<<12) diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c index f31f2d6384c3..0d8e25a4623a 100644 --- a/drivers/gpu/drm/i915/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/intel_engine_cs.c @@ -1320,6 +1320,11 @@ static int cnl_init_workarounds(struct intel_engine_cs *engine) WA_SET_FIELD_MASKED(GEN8_CS_CHICKEN1, GEN9_PREEMPT_GPGPU_LEVEL_MASK, GEN9_PREEMPT_GPGPU_COMMAND_LEVEL); + /* ReadHitWriteOnlyDisable: cnl */ + WA_SET_BIT_MASKED(GEN7_COMMON_SLICE_CHICKEN1, + GEN10_READ_HIT_WRITEONLY_DISABLE); + WA_SET_BIT_MASKED(SLICE_UNIT_LEVEL_CLKGATE, RCCUNIT_CLKGATE_DIS); + /* WaEnablePreemptionGranularityControlByUMD:cnl */ I915_WRITE(GEN7_FF_SLICE_CS_CHICKEN1, _MASKED_BIT_ENABLE(GEN9_FFSC_PERCTX_PREEMPT_CTRL));
The workaround for this is described as: "if RenderSurfaceState.Num_Multisamples > 1, disable RCC clock gating if RenderSurfaceState.Num_Multisamples == 1, set 0x7010[14] = 1" So it looks like the userspace should be responsible for setting these, based on the number of multisamples dependency. However, the register that controls RCC clock gating is not a context register, and cannot be set by userspace. Since we would end up setting one or another based on the number of multisamples anyway, it seems harmless to just set both all the time. This change (specially the GEN10_READ_HIT_WRITEONLY_DISABLE bit) improves CNL stability by avoiding some of the hangs seen in the platform. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> --- drivers/gpu/drm/i915/i915_reg.h | 2 ++ drivers/gpu/drm/i915/intel_engine_cs.c | 5 +++++ 2 files changed, 7 insertions(+)