Message ID | 20180126012634.3552-1-rafael.antognolli@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Quoting Rafael Antognolli (2018-01-26 01:26:34) > Write a PIPE_CONTROL with CS stall followed by 14 dwords of 0 in the > indirect context wa bb. 14 MI_NOOPS following? That isn't what you wrote in the code, but the main thing you haven't explained is why. A normal batch will already have a flush in its setup, but more to the point would be the only reason this is required is because of an implicit 3DSTATE inside the context image on preemption. Right? Otherwise it seems to be a purely userspace problem. -Chris
On Fri, Jan 26, 2018 at 08:23:13AM +0000, Chris Wilson wrote: > Quoting Rafael Antognolli (2018-01-26 01:26:34) > > Write a PIPE_CONTROL with CS stall followed by 14 dwords of 0 in the > > indirect context wa bb. > > 14 MI_NOOPS following? That isn't what you wrote in the code, but the Agreed, sorry. The workarounds says: 0x7a000004 0x00100000 + 14 dwords of 0. I counted 6 dwords from gen8_emit_pipe_control() and figured "just 8 more to 14". I think I had it right on my first version of this patch, but... > main thing you haven't explained is why. A normal batch will already > have a flush in its setup, but more to the point would be the only > reason this is required is because of an implicit 3DSTATE inside the > context image on preemption. Right? Otherwise it seems to be a purely > userspace problem. This is the text from the workaround: "This bug can be hit on a context restore. To avoid the issue the following must be programmed by SW to ensure the engine is idle prior to programming 3DSTATE_SAMPLE_PATTERN: With RS context enabled: 0x21c8 = 0x000085c0 With RS context disabled:0x21c8 = 0x00001ac0 The above program specifes the offset to insert driver programmed commands 0x21c4[31:6] = 0x<Addresses> 0x21c4[5:0] = 0x<N> N=Size of CL needed to fit Workaround The above programming is the GGTT address of the driver programmed commands and the size(# of CL) to execute. The address above needs to be a GGTT address and contain a pipe control with CS stall(0x7a000004 0x00100000 0x00000000 0x00000000)followed by 12DW’s of NOOP(0x00000000)" Since it needs to be in a GGTT address, and it's specifically talking about the Indirect Context Pointer, we figured it should be in the kernel. I can update the commit message with the above text if it helps. I originally implemented this trying to fix my GPU hang, but it turns out the issue was something else and this commit doesn't help at all. Still, I see no reason to not have it there just in case it prevent any future hangs... Rafael
On Fri, Jan 26, 2018 at 09:55:58AM -0800, Rafael Antognolli wrote: > On Fri, Jan 26, 2018 at 08:23:13AM +0000, Chris Wilson wrote: > > Quoting Rafael Antognolli (2018-01-26 01:26:34) > > > Write a PIPE_CONTROL with CS stall followed by 14 dwords of 0 in the > > > indirect context wa bb. > > > > 14 MI_NOOPS following? That isn't what you wrote in the code, but the > > Agreed, sorry. The workarounds says: > > 0x7a000004 0x00100000 + 14 dwords of 0. I counted 6 dwords from > gen8_emit_pipe_control() and figured "just 8 more to 14". I think I had > it right on my first version of this patch, but... > > > main thing you haven't explained is why. A normal batch will already > > have a flush in its setup, but more to the point would be the only > > reason this is required is because of an implicit 3DSTATE inside the > > context image on preemption. Right? Otherwise it seems to be a purely > > userspace problem. > > This is the text from the workaround: > > "This bug can be hit on a context restore. To avoid the issue the > following must be programmed by SW to ensure the engine is idle prior to > programming 3DSTATE_SAMPLE_PATTERN: > > With RS context enabled: 0x21c8 = 0x000085c0 > > With RS context disabled:0x21c8 = 0x00001ac0 > > The above program specifes the offset to insert driver programmed > commands > > 0x21c4[31:6] = 0x<Addresses> > > 0x21c4[5:0] = 0x<N> > > N=Size of CL needed to fit Workaround > > The above programming is the GGTT address of the driver programmed > commands and the size(# of CL) to execute. > > The address above needs to be a GGTT address and contain a pipe control > with CS stall(0x7a000004 0x00100000 0x00000000 0x00000000)followed by > 12DW’s of NOOP(0x00000000)" > > Since it needs to be in a GGTT address, and it's specifically talking > about the Indirect Context Pointer, we figured it should be in the > kernel. I can update the commit message with the above text if it helps. > > I originally implemented this trying to fix my GPU hang, but it turns > out the issue was something else and this commit doesn't help at all. > Still, I see no reason to not have it there just in case it prevent any > future hangs... Oh, and this workaround is actually a little longer, and there is a part that describe a similar PIPE_CONTROL before 3DSTATE_SAMPLE_PATTERN. That part is already implemented in userspace.
Quoting Rafael Antognolli (2018-01-26 17:58:29) > On Fri, Jan 26, 2018 at 09:55:58AM -0800, Rafael Antognolli wrote: > > On Fri, Jan 26, 2018 at 08:23:13AM +0000, Chris Wilson wrote: > > > Quoting Rafael Antognolli (2018-01-26 01:26:34) > > > > Write a PIPE_CONTROL with CS stall followed by 14 dwords of 0 in the > > > > indirect context wa bb. > > > > > > 14 MI_NOOPS following? That isn't what you wrote in the code, but the > > > > Agreed, sorry. The workarounds says: > > > > 0x7a000004 0x00100000 + 14 dwords of 0. I counted 6 dwords from > > gen8_emit_pipe_control() and figured "just 8 more to 14". I think I had > > it right on my first version of this patch, but... > > > > > main thing you haven't explained is why. A normal batch will already > > > have a flush in its setup, but more to the point would be the only > > > reason this is required is because of an implicit 3DSTATE inside the > > > context image on preemption. Right? Otherwise it seems to be a purely > > > userspace problem. > > > > This is the text from the workaround: > > > > "This bug can be hit on a context restore. To avoid the issue the > > following must be programmed by SW to ensure the engine is idle prior to > > programming 3DSTATE_SAMPLE_PATTERN: > > > > With RS context enabled: 0x21c8 = 0x000085c0 > > > > With RS context disabled:0x21c8 = 0x00001ac0 > > > > The above program specifes the offset to insert driver programmed > > commands > > > > 0x21c4[31:6] = 0x<Addresses> > > > > 0x21c4[5:0] = 0x<N> > > > > N=Size of CL needed to fit Workaround > > > > The above programming is the GGTT address of the driver programmed > > commands and the size(# of CL) to execute. > > > > The address above needs to be a GGTT address and contain a pipe control > > with CS stall(0x7a000004 0x00100000 0x00000000 0x00000000)followed by > > 12DW’s of NOOP(0x00000000)" > > > > Since it needs to be in a GGTT address, and it's specifically talking > > about the Indirect Context Pointer, we figured it should be in the > > kernel. I can update the commit message with the above text if it helps. > > > > I originally implemented this trying to fix my GPU hang, but it turns > > out the issue was something else and this commit doesn't help at all. > > Still, I see no reason to not have it there just in case it prevent any > > future hangs... > > Oh, and this workaround is actually a little longer, and there is a part > that describe a similar PIPE_CONTROL before 3DSTATE_SAMPLE_PATTERN. That > part is already implemented in userspace. Cool, I really just wanted confirmation that it was indeed required before the context switch. Hence justifying placement in the indirect_ctx bb, and explaining why it's not just a regular pre-batch flush. Please do try and distill as much as the workaround note into the changelog and a reminder inside the code; especially the rationale on why it's in the indirect ctx bb. -Chris
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 51e61b04a555..fd7e7ffde570 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -1366,6 +1366,25 @@ static u32 *gen9_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch) return batch; } +static u32 *gen10_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch) +{ + int i; + + /* WaPipeControlBefore3DStateSamplePattern: cnl */ + batch = gen8_emit_pipe_control(batch, + PIPE_CONTROL_CS_STALL, + 0); + for (i = 0; i < 8; i++) + *batch++ = MI_NOOP; + + /* Pad to end of cacheline */ + while ((unsigned long)batch % CACHELINE_BYTES) + *batch++ = MI_NOOP; + + return batch; +} + + #define CTX_WA_BB_OBJ_SIZE (PAGE_SIZE) static int lrc_setup_wa_ctx(struct intel_engine_cs *engine) @@ -1419,7 +1438,9 @@ static int intel_init_workaround_bb(struct intel_engine_cs *engine) switch (INTEL_GEN(engine->i915)) { case 10: - return 0; + wa_bb_fn[0] = gen10_init_indirectctx_bb; + wa_bb_fn[1] = NULL; + break; case 9: wa_bb_fn[0] = gen9_init_indirectctx_bb; wa_bb_fn[1] = NULL;
Write a PIPE_CONTROL with CS stall followed by 14 dwords of 0 in the indirect context wa bb. References: HSD#1939868 Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> --- drivers/gpu/drm/i915/intel_lrc.c | 23 ++++++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-)