Message ID | 20240812180736.2013233-2-matthew.d.roper@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/xe: Name and document Wa_14019789679 | expand |
Woops, this should have gone to the Xe list, not the i915 list. I'll re-send. Matt On Mon, Aug 12, 2024 at 11:07:37AM -0700, Matt Roper wrote: > Early in the development of Xe we identified an issue with SVG state > handling on DG2 and MTL (and later on Xe2 as well). In > commit 72ac304769dd ("drm/xe: Emit SVG state on RCS during driver load > on DG2 and MTL") and commit fb24b858a20d ("drm/xe/xe2: Update SVG state > handling") we implemented our own workaround to prevent SVG state from > leaking from context A to context B in cases where context B never > issues a specific state setting. > > The hardware teams have now created official workaround Wa_14019789679 > to cover this issue. The workaround description only requires emitting > 3DSTATE_MESH_CONTROL, since they believe that's the only SVG instruction > that would potentially remain unset by a context B, but still cause > notable issues if unwanted values were inherited from context A. > However since we already have a more extensive implementation that emits > the entire SVG state and prevents _any_ SVG state from unintentionally > leaking, we'll stick with our existing implementation just to be safe. > > Signed-off-by: Matt Roper <matthew.d.roper@intel.com> > --- > drivers/gpu/drm/xe/xe_lrc.c | 35 +++++++++++++++++++++--------- > drivers/gpu/drm/xe/xe_wa_oob.rules | 2 ++ > 2 files changed, 27 insertions(+), 10 deletions(-) > > diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c > index 58121821f081..974a9cd8c379 100644 > --- a/drivers/gpu/drm/xe/xe_lrc.c > +++ b/drivers/gpu/drm/xe/xe_lrc.c > @@ -5,6 +5,8 @@ > > #include "xe_lrc.h" > > +#include <generated/xe_wa_oob.h> > + > #include <linux/ascii85.h> > > #include "instructions/xe_mi_commands.h" > @@ -24,6 +26,7 @@ > #include "xe_memirq.h" > #include "xe_sriov.h" > #include "xe_vm.h" > +#include "xe_wa.h" > > #define LRC_VALID BIT_ULL(0) > #define LRC_PRIVILEGE BIT_ULL(8) > @@ -1581,19 +1584,31 @@ void xe_lrc_emit_hwe_state_instructions(struct xe_exec_queue *q, struct xe_bb *b > int state_table_size = 0; > > /* > - * At the moment we only need to emit non-register state for the RCS > - * engine. > + * Wa_14019789679 > + * > + * If the driver doesn't explicitly emit the SVG instructions while > + * setting up the default LRC, the context switch will write 0's > + * (noops) into the LRC memory rather than the expected instruction > + * headers. Application contexts start out as a copy of the default > + * LRC, and if they also do not emit specific settings for some SVG > + * state, then on context restore they'll unintentionally inherit > + * whatever state setting the previous context had programmed into the > + * hardware (i.e., the lack of a 3DSTATE_* instruction in the LRC will > + * prevent the hardware from resetting that state back to any specific > + * value). > + * > + * The official workaround only requires emitting 3DSTATE_MESH_CONTROL > + * since that's a specific state setting that can easily cause GPU > + * hangs if unintentionally inherited. However to be safe we'll > + * continue to emit all of the SVG state since it's best not to leak > + * any of the state between contexts, even if that leakage is harmless. > */ > - if (q->hwe->class != XE_ENGINE_CLASS_RENDER) > - return; > - > - switch (GRAPHICS_VERx100(xe)) { > - case 1255: > - case 1270 ... 2004: > + if (XE_WA(gt, 14019789679) && q->hwe->class == XE_ENGINE_CLASS_RENDER) { > state_table = xe_hpg_svg_state; > state_table_size = ARRAY_SIZE(xe_hpg_svg_state); > - break; > - default: > + } > + > + if (!state_table) { > xe_gt_dbg(gt, "No non-register state to emit on graphics ver %d.%02d\n", > GRAPHICS_VER(xe), GRAPHICS_VERx100(xe) % 100); > return; > diff --git a/drivers/gpu/drm/xe/xe_wa_oob.rules b/drivers/gpu/drm/xe/xe_wa_oob.rules > index 5cf27ff27ce6..920ca5060146 100644 > --- a/drivers/gpu/drm/xe/xe_wa_oob.rules > +++ b/drivers/gpu/drm/xe/xe_wa_oob.rules > @@ -35,3 +35,5 @@ > GRAPHICS_VERSION(2001) > 22019338487_display PLATFORM(LUNARLAKE) > 16023588340 GRAPHICS_VERSION(2001) > +14019789679 GRAPHICS_VERSION(1255) > + GRAPHICS_VERSION_RANGE(1270, 2004) > -- > 2.45.2 >
diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c index 58121821f081..974a9cd8c379 100644 --- a/drivers/gpu/drm/xe/xe_lrc.c +++ b/drivers/gpu/drm/xe/xe_lrc.c @@ -5,6 +5,8 @@ #include "xe_lrc.h" +#include <generated/xe_wa_oob.h> + #include <linux/ascii85.h> #include "instructions/xe_mi_commands.h" @@ -24,6 +26,7 @@ #include "xe_memirq.h" #include "xe_sriov.h" #include "xe_vm.h" +#include "xe_wa.h" #define LRC_VALID BIT_ULL(0) #define LRC_PRIVILEGE BIT_ULL(8) @@ -1581,19 +1584,31 @@ void xe_lrc_emit_hwe_state_instructions(struct xe_exec_queue *q, struct xe_bb *b int state_table_size = 0; /* - * At the moment we only need to emit non-register state for the RCS - * engine. + * Wa_14019789679 + * + * If the driver doesn't explicitly emit the SVG instructions while + * setting up the default LRC, the context switch will write 0's + * (noops) into the LRC memory rather than the expected instruction + * headers. Application contexts start out as a copy of the default + * LRC, and if they also do not emit specific settings for some SVG + * state, then on context restore they'll unintentionally inherit + * whatever state setting the previous context had programmed into the + * hardware (i.e., the lack of a 3DSTATE_* instruction in the LRC will + * prevent the hardware from resetting that state back to any specific + * value). + * + * The official workaround only requires emitting 3DSTATE_MESH_CONTROL + * since that's a specific state setting that can easily cause GPU + * hangs if unintentionally inherited. However to be safe we'll + * continue to emit all of the SVG state since it's best not to leak + * any of the state between contexts, even if that leakage is harmless. */ - if (q->hwe->class != XE_ENGINE_CLASS_RENDER) - return; - - switch (GRAPHICS_VERx100(xe)) { - case 1255: - case 1270 ... 2004: + if (XE_WA(gt, 14019789679) && q->hwe->class == XE_ENGINE_CLASS_RENDER) { state_table = xe_hpg_svg_state; state_table_size = ARRAY_SIZE(xe_hpg_svg_state); - break; - default: + } + + if (!state_table) { xe_gt_dbg(gt, "No non-register state to emit on graphics ver %d.%02d\n", GRAPHICS_VER(xe), GRAPHICS_VERx100(xe) % 100); return; diff --git a/drivers/gpu/drm/xe/xe_wa_oob.rules b/drivers/gpu/drm/xe/xe_wa_oob.rules index 5cf27ff27ce6..920ca5060146 100644 --- a/drivers/gpu/drm/xe/xe_wa_oob.rules +++ b/drivers/gpu/drm/xe/xe_wa_oob.rules @@ -35,3 +35,5 @@ GRAPHICS_VERSION(2001) 22019338487_display PLATFORM(LUNARLAKE) 16023588340 GRAPHICS_VERSION(2001) +14019789679 GRAPHICS_VERSION(1255) + GRAPHICS_VERSION_RANGE(1270, 2004)
Early in the development of Xe we identified an issue with SVG state handling on DG2 and MTL (and later on Xe2 as well). In commit 72ac304769dd ("drm/xe: Emit SVG state on RCS during driver load on DG2 and MTL") and commit fb24b858a20d ("drm/xe/xe2: Update SVG state handling") we implemented our own workaround to prevent SVG state from leaking from context A to context B in cases where context B never issues a specific state setting. The hardware teams have now created official workaround Wa_14019789679 to cover this issue. The workaround description only requires emitting 3DSTATE_MESH_CONTROL, since they believe that's the only SVG instruction that would potentially remain unset by a context B, but still cause notable issues if unwanted values were inherited from context A. However since we already have a more extensive implementation that emits the entire SVG state and prevents _any_ SVG state from unintentionally leaking, we'll stick with our existing implementation just to be safe. Signed-off-by: Matt Roper <matthew.d.roper@intel.com> --- drivers/gpu/drm/xe/xe_lrc.c | 35 +++++++++++++++++++++--------- drivers/gpu/drm/xe/xe_wa_oob.rules | 2 ++ 2 files changed, 27 insertions(+), 10 deletions(-)