Message ID | 20180126230524.17429-1-rafael.antognolli@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Quoting Rafael Antognolli (2018-01-26 23:05:24) > This workaround should prevent a bug that can be hit on a context > restore. To avoid the issue, we must emit a PIPE_CONTROL with CS stall > (0x7a000004 0x00100000 0x00000000 0x00000000) followed by 12DW's of > NOOP(0x0) in the indirect context batch buffer, to ensure the engine is > idle prior to programming 3DSTATE_SAMPLE_PATTERN. > > References: HSD#1939868 > > v2: > - More descriptive changelog and comments. > - Fix math when counting dwords. > > Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> > Cc: Chris Wilson <chris@chris-wilson.co.uk> > --- > drivers/gpu/drm/i915/intel_lrc.c | 34 +++++++++++++++++++++++++++++++++- > 1 file changed, 33 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c > index 51e61b04a555..46c130d106d7 100644 > --- a/drivers/gpu/drm/i915/intel_lrc.c > +++ b/drivers/gpu/drm/i915/intel_lrc.c > @@ -1366,6 +1366,36 @@ static u32 *gen9_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch) > return batch; > } > > +static u32 *gen10_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch) > +{ > + int i; > + > + /* > + * WaPipeControlBefore3DStateSamplePattern: cnl > + * > + * Ensure the engine is idle prior to programming a > + * 3DSTATE_SAMPLE_PATTERN during a context restore. > + */ > + batch = gen8_emit_pipe_control(batch, > + PIPE_CONTROL_CS_STALL, > + 0); > + /* > + * WaPipeControlBefore3DStateSamplePattern says we need 4 dwords for > + * the PIPE_CONTROL followed by 12 dwords of 0x0, so 16 dwords in > + * total. Since gen8_emit_pipe_control() already advances the batch by > + * 6 dwords, we advance the other 10 here. > + */ > + for (i = 0; i < 10; i++) > + *batch++ = MI_NOOP; If the spec says emit 12 nops, emit 12 nops. Otherwise if we add another w/a here it may break this magic. I am not going to ask what the magic nops do, it just gives a 72 cycle delay in the CS parser. My guess is that they are trying to isolate a cacheline, in which case they are expecting the pipecontrol to be cacheline aligned. (I would get that clarified and add an assert to document such a requirement.) Or they just mean that the indirect ctx is cacheline aligned and the nops are just the normal padding and nothing to do with the w/a. s/10/12/ here and tweak the comment to remove the assumed cacheline tail, but please see if someone can clarify if the nops are indeed part of the w/a or just the normal filler. With s/10/12/, Acked-by: Chris Wilson <chris@chris-wilson.co.uk> -Chris
On Fri, Jan 26, 2018 at 11:17:01PM +0000, Chris Wilson wrote: > Quoting Rafael Antognolli (2018-01-26 23:05:24) > > This workaround should prevent a bug that can be hit on a context > > restore. To avoid the issue, we must emit a PIPE_CONTROL with CS stall > > (0x7a000004 0x00100000 0x00000000 0x00000000) followed by 12DW's of > > NOOP(0x0) in the indirect context batch buffer, to ensure the engine is > > idle prior to programming 3DSTATE_SAMPLE_PATTERN. > > > > References: HSD#1939868 > > > > v2: > > - More descriptive changelog and comments. > > - Fix math when counting dwords. > > > > Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> > > Cc: Chris Wilson <chris@chris-wilson.co.uk> > > --- > > drivers/gpu/drm/i915/intel_lrc.c | 34 +++++++++++++++++++++++++++++++++- > > 1 file changed, 33 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c > > index 51e61b04a555..46c130d106d7 100644 > > --- a/drivers/gpu/drm/i915/intel_lrc.c > > +++ b/drivers/gpu/drm/i915/intel_lrc.c > > @@ -1366,6 +1366,36 @@ static u32 *gen9_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch) > > return batch; > > } > > > > +static u32 *gen10_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch) > > +{ > > + int i; > > + > > + /* > > + * WaPipeControlBefore3DStateSamplePattern: cnl > > + * > > + * Ensure the engine is idle prior to programming a > > + * 3DSTATE_SAMPLE_PATTERN during a context restore. > > + */ > > + batch = gen8_emit_pipe_control(batch, > > + PIPE_CONTROL_CS_STALL, > > + 0); > > + /* > > + * WaPipeControlBefore3DStateSamplePattern says we need 4 dwords for > > + * the PIPE_CONTROL followed by 12 dwords of 0x0, so 16 dwords in > > + * total. Since gen8_emit_pipe_control() already advances the batch by > > + * 6 dwords, we advance the other 10 here. > > + */ > > + for (i = 0; i < 10; i++) > > + *batch++ = MI_NOOP; > > If the spec says emit 12 nops, emit 12 nops. Otherwise if we add another > w/a here it may break this magic. So, in the bspec, there's the following text: "The address above needs to be a GGTT address and contain a pipe control with CS stall(0x7a000004 0x00100000 0x00000000 0x00000000)followed by 12DW’s of NOOP(0x00000000)" while on an internal ticket pointed by it, it is: "The address above needs to be a GGTT address.. It would contain the following value: 7A000004 00100000 followed by 14 DW’s of zero." That's why I think it is really 16 DW's in total. And gen8_emit_pipe_control() already emits 6 of them. How are we supposed to emit 12 after that? I don't know if they expect the pipe control to be cacheline aligned, but I can ask and try to find out. > I am not going to ask what the magic nops do, it just gives a 72 cycle > delay in the CS parser. My guess is that they are trying to isolate a > cacheline, in which case they are expecting the pipecontrol to be > cacheline aligned. (I would get that clarified and add an assert to > document such a requirement.) Or they just mean that the indirect ctx is > cacheline aligned and the nops are just the normal padding and nothing > to do with the w/a. > > s/10/12/ here and tweak the comment to remove the assumed cacheline > tail, but please see if someone can clarify if the nops are indeed part > of the w/a or just the normal filler. > With s/10/12/, > Acked-by: Chris Wilson <chris@chris-wilson.co.uk> > -Chris
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 51e61b04a555..46c130d106d7 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -1366,6 +1366,36 @@ static u32 *gen9_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch) return batch; } +static u32 *gen10_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch) +{ + int i; + + /* + * WaPipeControlBefore3DStateSamplePattern: cnl + * + * Ensure the engine is idle prior to programming a + * 3DSTATE_SAMPLE_PATTERN during a context restore. + */ + batch = gen8_emit_pipe_control(batch, + PIPE_CONTROL_CS_STALL, + 0); + /* + * WaPipeControlBefore3DStateSamplePattern says we need 4 dwords for + * the PIPE_CONTROL followed by 12 dwords of 0x0, so 16 dwords in + * total. Since gen8_emit_pipe_control() already advances the batch by + * 6 dwords, we advance the other 10 here. + */ + for (i = 0; i < 10; i++) + *batch++ = MI_NOOP; + + /* Pad to end of cacheline */ + while ((unsigned long)batch % CACHELINE_BYTES) + *batch++ = MI_NOOP; + + return batch; +} + + #define CTX_WA_BB_OBJ_SIZE (PAGE_SIZE) static int lrc_setup_wa_ctx(struct intel_engine_cs *engine) @@ -1419,7 +1449,9 @@ static int intel_init_workaround_bb(struct intel_engine_cs *engine) switch (INTEL_GEN(engine->i915)) { case 10: - return 0; + wa_bb_fn[0] = gen10_init_indirectctx_bb; + wa_bb_fn[1] = NULL; + break; case 9: wa_bb_fn[0] = gen9_init_indirectctx_bb; wa_bb_fn[1] = NULL;
This workaround should prevent a bug that can be hit on a context restore. To avoid the issue, we must emit a PIPE_CONTROL with CS stall (0x7a000004 0x00100000 0x00000000 0x00000000) followed by 12DW's of NOOP(0x0) in the indirect context batch buffer, to ensure the engine is idle prior to programming 3DSTATE_SAMPLE_PATTERN. References: HSD#1939868 v2: - More descriptive changelog and comments. - Fix math when counting dwords. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> --- drivers/gpu/drm/i915/intel_lrc.c | 34 +++++++++++++++++++++++++++++++++- 1 file changed, 33 insertions(+), 1 deletion(-)