diff mbox series

drm/xe: Name and document Wa_14019789679

Message ID 20240812180736.2013233-2-matthew.d.roper@intel.com (mailing list archive)
State New, archived
Headers show
Series drm/xe: Name and document Wa_14019789679 | expand

Commit Message

Matt Roper Aug. 12, 2024, 6:07 p.m. UTC
Early in the development of Xe we identified an issue with SVG state
handling on DG2 and MTL (and later on Xe2 as well).  In
commit 72ac304769dd ("drm/xe: Emit SVG state on RCS during driver load
on DG2 and MTL") and commit fb24b858a20d ("drm/xe/xe2: Update SVG state
handling") we implemented our own workaround to prevent SVG state from
leaking from context A to context B in cases where context B never
issues a specific state setting.

The hardware teams have now created official workaround Wa_14019789679
to cover this issue.  The workaround description only requires emitting
3DSTATE_MESH_CONTROL, since they believe that's the only SVG instruction
that would potentially remain unset by a context B, but still cause
notable issues if unwanted values were inherited from context A.
However since we already have a more extensive implementation that emits
the entire SVG state and prevents _any_ SVG state from unintentionally
leaking, we'll stick with our existing implementation just to be safe.

Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
---
 drivers/gpu/drm/xe/xe_lrc.c        | 35 +++++++++++++++++++++---------
 drivers/gpu/drm/xe/xe_wa_oob.rules |  2 ++
 2 files changed, 27 insertions(+), 10 deletions(-)

Comments

Matt Roper Aug. 12, 2024, 6:10 p.m. UTC | #1
Woops, this should have gone to the Xe list, not the i915 list.  I'll
re-send.


Matt

On Mon, Aug 12, 2024 at 11:07:37AM -0700, Matt Roper wrote:
> Early in the development of Xe we identified an issue with SVG state
> handling on DG2 and MTL (and later on Xe2 as well).  In
> commit 72ac304769dd ("drm/xe: Emit SVG state on RCS during driver load
> on DG2 and MTL") and commit fb24b858a20d ("drm/xe/xe2: Update SVG state
> handling") we implemented our own workaround to prevent SVG state from
> leaking from context A to context B in cases where context B never
> issues a specific state setting.
> 
> The hardware teams have now created official workaround Wa_14019789679
> to cover this issue.  The workaround description only requires emitting
> 3DSTATE_MESH_CONTROL, since they believe that's the only SVG instruction
> that would potentially remain unset by a context B, but still cause
> notable issues if unwanted values were inherited from context A.
> However since we already have a more extensive implementation that emits
> the entire SVG state and prevents _any_ SVG state from unintentionally
> leaking, we'll stick with our existing implementation just to be safe.
> 
> Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_lrc.c        | 35 +++++++++++++++++++++---------
>  drivers/gpu/drm/xe/xe_wa_oob.rules |  2 ++
>  2 files changed, 27 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
> index 58121821f081..974a9cd8c379 100644
> --- a/drivers/gpu/drm/xe/xe_lrc.c
> +++ b/drivers/gpu/drm/xe/xe_lrc.c
> @@ -5,6 +5,8 @@
>  
>  #include "xe_lrc.h"
>  
> +#include <generated/xe_wa_oob.h>
> +
>  #include <linux/ascii85.h>
>  
>  #include "instructions/xe_mi_commands.h"
> @@ -24,6 +26,7 @@
>  #include "xe_memirq.h"
>  #include "xe_sriov.h"
>  #include "xe_vm.h"
> +#include "xe_wa.h"
>  
>  #define LRC_VALID				BIT_ULL(0)
>  #define LRC_PRIVILEGE				BIT_ULL(8)
> @@ -1581,19 +1584,31 @@ void xe_lrc_emit_hwe_state_instructions(struct xe_exec_queue *q, struct xe_bb *b
>  	int state_table_size = 0;
>  
>  	/*
> -	 * At the moment we only need to emit non-register state for the RCS
> -	 * engine.
> +	 * Wa_14019789679
> +	 *
> +	 * If the driver doesn't explicitly emit the SVG instructions while
> +	 * setting up the default LRC, the context switch will write 0's
> +	 * (noops) into the LRC memory rather than the expected instruction
> +	 * headers.  Application contexts start out as a copy of the default
> +	 * LRC, and if they also do not emit specific settings for some SVG
> +	 * state, then on context restore they'll unintentionally inherit
> +	 * whatever state setting the previous context had programmed into the
> +	 * hardware (i.e., the lack of a 3DSTATE_* instruction in the LRC will
> +	 * prevent the hardware from resetting that state back to any specific
> +	 * value).
> +	 *
> +	 * The official workaround only requires emitting 3DSTATE_MESH_CONTROL
> +	 * since that's a specific state setting that can easily cause GPU
> +	 * hangs if unintentionally inherited.  However to be safe we'll
> +	 * continue to emit all of the SVG state since it's best not to leak
> +	 * any of the state between contexts, even if that leakage is harmless.
>  	 */
> -	if (q->hwe->class != XE_ENGINE_CLASS_RENDER)
> -		return;
> -
> -	switch (GRAPHICS_VERx100(xe)) {
> -	case 1255:
> -	case 1270 ... 2004:
> +	if (XE_WA(gt, 14019789679) && q->hwe->class == XE_ENGINE_CLASS_RENDER) {
>  		state_table = xe_hpg_svg_state;
>  		state_table_size = ARRAY_SIZE(xe_hpg_svg_state);
> -		break;
> -	default:
> +	}
> +
> +	if (!state_table) {
>  		xe_gt_dbg(gt, "No non-register state to emit on graphics ver %d.%02d\n",
>  			  GRAPHICS_VER(xe), GRAPHICS_VERx100(xe) % 100);
>  		return;
> diff --git a/drivers/gpu/drm/xe/xe_wa_oob.rules b/drivers/gpu/drm/xe/xe_wa_oob.rules
> index 5cf27ff27ce6..920ca5060146 100644
> --- a/drivers/gpu/drm/xe/xe_wa_oob.rules
> +++ b/drivers/gpu/drm/xe/xe_wa_oob.rules
> @@ -35,3 +35,5 @@
>  		GRAPHICS_VERSION(2001)
>  22019338487_display	PLATFORM(LUNARLAKE)
>  16023588340	GRAPHICS_VERSION(2001)
> +14019789679	GRAPHICS_VERSION(1255)
> +		GRAPHICS_VERSION_RANGE(1270, 2004)
> -- 
> 2.45.2
>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
index 58121821f081..974a9cd8c379 100644
--- a/drivers/gpu/drm/xe/xe_lrc.c
+++ b/drivers/gpu/drm/xe/xe_lrc.c
@@ -5,6 +5,8 @@ 
 
 #include "xe_lrc.h"
 
+#include <generated/xe_wa_oob.h>
+
 #include <linux/ascii85.h>
 
 #include "instructions/xe_mi_commands.h"
@@ -24,6 +26,7 @@ 
 #include "xe_memirq.h"
 #include "xe_sriov.h"
 #include "xe_vm.h"
+#include "xe_wa.h"
 
 #define LRC_VALID				BIT_ULL(0)
 #define LRC_PRIVILEGE				BIT_ULL(8)
@@ -1581,19 +1584,31 @@  void xe_lrc_emit_hwe_state_instructions(struct xe_exec_queue *q, struct xe_bb *b
 	int state_table_size = 0;
 
 	/*
-	 * At the moment we only need to emit non-register state for the RCS
-	 * engine.
+	 * Wa_14019789679
+	 *
+	 * If the driver doesn't explicitly emit the SVG instructions while
+	 * setting up the default LRC, the context switch will write 0's
+	 * (noops) into the LRC memory rather than the expected instruction
+	 * headers.  Application contexts start out as a copy of the default
+	 * LRC, and if they also do not emit specific settings for some SVG
+	 * state, then on context restore they'll unintentionally inherit
+	 * whatever state setting the previous context had programmed into the
+	 * hardware (i.e., the lack of a 3DSTATE_* instruction in the LRC will
+	 * prevent the hardware from resetting that state back to any specific
+	 * value).
+	 *
+	 * The official workaround only requires emitting 3DSTATE_MESH_CONTROL
+	 * since that's a specific state setting that can easily cause GPU
+	 * hangs if unintentionally inherited.  However to be safe we'll
+	 * continue to emit all of the SVG state since it's best not to leak
+	 * any of the state between contexts, even if that leakage is harmless.
 	 */
-	if (q->hwe->class != XE_ENGINE_CLASS_RENDER)
-		return;
-
-	switch (GRAPHICS_VERx100(xe)) {
-	case 1255:
-	case 1270 ... 2004:
+	if (XE_WA(gt, 14019789679) && q->hwe->class == XE_ENGINE_CLASS_RENDER) {
 		state_table = xe_hpg_svg_state;
 		state_table_size = ARRAY_SIZE(xe_hpg_svg_state);
-		break;
-	default:
+	}
+
+	if (!state_table) {
 		xe_gt_dbg(gt, "No non-register state to emit on graphics ver %d.%02d\n",
 			  GRAPHICS_VER(xe), GRAPHICS_VERx100(xe) % 100);
 		return;
diff --git a/drivers/gpu/drm/xe/xe_wa_oob.rules b/drivers/gpu/drm/xe/xe_wa_oob.rules
index 5cf27ff27ce6..920ca5060146 100644
--- a/drivers/gpu/drm/xe/xe_wa_oob.rules
+++ b/drivers/gpu/drm/xe/xe_wa_oob.rules
@@ -35,3 +35,5 @@ 
 		GRAPHICS_VERSION(2001)
 22019338487_display	PLATFORM(LUNARLAKE)
 16023588340	GRAPHICS_VERSION(2001)
+14019789679	GRAPHICS_VERSION(1255)
+		GRAPHICS_VERSION_RANGE(1270, 2004)