diff mbox series

drm/i915/gt: Fix CCS id's calculation for CCS mode setting

Message ID 20240517090616.242529-1-andi.shyti@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series drm/i915/gt: Fix CCS id's calculation for CCS mode setting | expand

Commit Message

Andi Shyti May 17, 2024, 9:06 a.m. UTC
The whole point of the previous fixes has been to change the CCS
hardware configuration to generate only one stream available to
the compute users. We did this by changing the info.engine_mask
that is set during device probe, reset during the detection of
the fused engines, and finally reset again when choosing the CCS
mode.

We can't use the engine_mask variable anymore, as with the
current configuration, it imposes only one CCS no matter what the
hardware configuration is.

Before changing the engine_mask for the third time, save it and
use it for calculating the CCS mode.

After the previous changes, the user reported a performance drop
to around 1/4. We have tested that the compute operations, with
the current patch, have improved by the same factor.

Fixes: 6db31251bb26 ("drm/i915/gt: Enable only one CCS for compute workload")
Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Cc: Gnattu OC <gnattuoc@me.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Tested-by: Jian Ye <jian.ye@intel.com>
---
Hi,

This ensures that all four CCS engines work properly. However,
during the tests, Jian detected that the performance during
memory copy assigned to the CCS engines is negatively impacted.

I believe this might be expected, considering that based on the
engines' availability, the media user might decide to reduce the
copy in multitasking.

With the upcoming work that will give the user the chance to
configure the CCS mode, this might improve.

Gnattu, can I use your kindness to ask for a test on this patch
and check whether the performance improve on your side as well?

Thanks,
Andi

 drivers/gpu/drm/i915/gt/intel_engine_cs.c   | 6 ++++++
 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 2 +-
 drivers/gpu/drm/i915/gt/intel_gt_types.h    | 8 ++++++++
 3 files changed, 15 insertions(+), 1 deletion(-)

Comments

Andi Shyti May 17, 2024, 10:34 a.m. UTC | #1
Hi,

On Fri, May 17, 2024 at 11:06:16AM +0200, Andi Shyti wrote:
> The whole point of the previous fixes has been to change the CCS
> hardware configuration to generate only one stream available to
> the compute users. We did this by changing the info.engine_mask
> that is set during device probe, reset during the detection of
> the fused engines, and finally reset again when choosing the CCS
> mode.
> 
> We can't use the engine_mask variable anymore, as with the
> current configuration, it imposes only one CCS no matter what the
> hardware configuration is.
> 
> Before changing the engine_mask for the third time, save it and
> use it for calculating the CCS mode.
> 
> After the previous changes, the user reported a performance drop
> to around 1/4. We have tested that the compute operations, with
> the current patch, have improved by the same factor.
> 
> Fixes: 6db31251bb26 ("drm/i915/gt: Enable only one CCS for compute workload")
> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> Cc: Gnattu OC <gnattuoc@me.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Tested-by: Jian Ye <jian.ye@intel.com>

of course:

Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>

Andi
Umesh Nerlige Ramappa May 17, 2024, 8:43 p.m. UTC | #2
On Fri, May 17, 2024 at 11:06:16AM +0200, Andi Shyti wrote:
>The whole point of the previous fixes has been to change the CCS
>hardware configuration to generate only one stream available to
>the compute users. We did this by changing the info.engine_mask
>that is set during device probe, reset during the detection of
>the fused engines, and finally reset again when choosing the CCS
>mode.
>
>We can't use the engine_mask variable anymore, as with the
>current configuration, it imposes only one CCS no matter what the
>hardware configuration is.
>
>Before changing the engine_mask for the third time, save it and
>use it for calculating the CCS mode.
>
>After the previous changes, the user reported a performance drop
>to around 1/4. We have tested that the compute operations, with
>the current patch, have improved by the same factor.
>
>Fixes: 6db31251bb26 ("drm/i915/gt: Enable only one CCS for compute workload")
>Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
>Cc: Gnattu OC <gnattuoc@me.com>
>Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>Cc: Matt Roper <matthew.d.roper@intel.com>
>Tested-by: Jian Ye <jian.ye@intel.com>
>---
>Hi,
>
>This ensures that all four CCS engines work properly. However,
>during the tests, Jian detected that the performance during
>memory copy assigned to the CCS engines is negatively impacted.
>
>I believe this might be expected, considering that based on the
>engines' availability, the media user might decide to reduce the
>copy in multitasking.
>
>With the upcoming work that will give the user the chance to
>configure the CCS mode, this might improve.
>
>Gnattu, can I use your kindness to ask for a test on this patch
>and check whether the performance improve on your side as well?
>
>Thanks,
>Andi
>
> drivers/gpu/drm/i915/gt/intel_engine_cs.c   | 6 ++++++
> drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 2 +-
> drivers/gpu/drm/i915/gt/intel_gt_types.h    | 8 ++++++++
> 3 files changed, 15 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>index 5c8e9ee3b008..3b740ca25000 100644
>--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>@@ -885,6 +885,12 @@ static intel_engine_mask_t init_engine_mask(struct intel_gt *gt)
> 	if (IS_DG2(gt->i915)) {
> 		u8 first_ccs = __ffs(CCS_MASK(gt));
>
>+		/*
>+		 * Store the number of active cslices before
>+		 * changing the CCS engine configuration
>+		 */
>+		gt->ccs.cslices = CCS_MASK(gt);
>+
> 		/* Mask off all the CCS engine */
> 		info->engine_mask &= ~GENMASK(CCS3, CCS0);
> 		/* Put back in the first CCS engine */
>diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
>index 99b71bb7da0a..3c62a44e9106 100644
>--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
>+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
>@@ -19,7 +19,7 @@ unsigned int intel_gt_apply_ccs_mode(struct intel_gt *gt)
>
> 	/* Build the value for the fixed CCS load balancing */
> 	for (cslice = 0; cslice < I915_MAX_CCS; cslice++) {
>-		if (CCS_MASK(gt) & BIT(cslice))
>+		if (gt->ccs.cslices & BIT(cslice))
> 			/*
> 			 * If available, assign the cslice
> 			 * to the first available engine...
>diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
>index def7dd0eb6f1..cfdd2ad5e954 100644
>--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
>+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
>@@ -207,6 +207,14 @@ struct intel_gt {
> 					    [MAX_ENGINE_INSTANCE + 1];
> 	enum intel_submission_method submission_method;
>
>+	struct {
>+		/*
>+		 * Mask of the non fused CCS slices
>+		 * to be used for the load balancing
>+		 */
>+		intel_engine_mask_t cslices;
>+	} ccs;
>+

LGTM,

Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>

> 	/*
> 	 * Default address space (either GGTT or ppGTT depending on arch).
> 	 *
>-- 
>2.43.0
>
Gnattu OC May 19, 2024, 3:34 p.m. UTC | #3
> On May 17, 2024, at 17:06, Andi Shyti <andi.shyti@linux.intel.com> wrote:
> 
> The whole point of the previous fixes has been to change the CCS
> hardware configuration to generate only one stream available to
> the compute users. We did this by changing the info.engine_mask
> that is set during device probe, reset during the detection of
> the fused engines, and finally reset again when choosing the CCS
> mode.
> 
> We can't use the engine_mask variable anymore, as with the
> current configuration, it imposes only one CCS no matter what the
> hardware configuration is.
> 
> Before changing the engine_mask for the third time, save it and
> use it for calculating the CCS mode.
> 
> After the previous changes, the user reported a performance drop
> to around 1/4. We have tested that the compute operations, with
> the current patch, have improved by the same factor.
> 
> Fixes: 6db31251bb26 ("drm/i915/gt: Enable only one CCS for compute workload")
> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> Cc: Gnattu OC <gnattuoc@me.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Tested-by: Jian Ye <jian.ye@intel.com>
> ---
> Hi,
> 
> This ensures that all four CCS engines work properly. However,
> during the tests, Jian detected that the performance during
> memory copy assigned to the CCS engines is negatively impacted.
> 
> I believe this might be expected, considering that based on the
> engines' availability, the media user might decide to reduce the
> copy in multitasking.
> 
> With the upcoming work that will give the user the chance to
> configure the CCS mode, this might improve.
> 
> Gnattu, can I use your kindness to ask for a test on this patch
> and check whether the performance improve on your side as well?
> 
> Thanks,
> Andi
> 
> drivers/gpu/drm/i915/gt/intel_engine_cs.c   | 6 ++++++
> drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 2 +-
> drivers/gpu/drm/i915/gt/intel_gt_types.h    | 8 ++++++++
> 3 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index 5c8e9ee3b008..3b740ca25000 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -885,6 +885,12 @@ static intel_engine_mask_t init_engine_mask(struct intel_gt *gt)
> 	if (IS_DG2(gt->i915)) {
> 		u8 first_ccs = __ffs(CCS_MASK(gt));
> 
> +		/*
> +		 * Store the number of active cslices before
> +		 * changing the CCS engine configuration
> +		 */
> +		gt->ccs.cslices = CCS_MASK(gt);
> +
> 		/* Mask off all the CCS engine */
> 		info->engine_mask &= ~GENMASK(CCS3, CCS0);
> 		/* Put back in the first CCS engine */
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
> index 99b71bb7da0a..3c62a44e9106 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
> @@ -19,7 +19,7 @@ unsigned int intel_gt_apply_ccs_mode(struct intel_gt *gt)
> 
> 	/* Build the value for the fixed CCS load balancing */
> 	for (cslice = 0; cslice < I915_MAX_CCS; cslice++) {
> -		if (CCS_MASK(gt) & BIT(cslice))
> +		if (gt->ccs.cslices & BIT(cslice))
> 			/*
> 			 * If available, assign the cslice
> 			 * to the first available engine...
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> index def7dd0eb6f1..cfdd2ad5e954 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> @@ -207,6 +207,14 @@ struct intel_gt {
> 					    [MAX_ENGINE_INSTANCE + 1];
> 	enum intel_submission_method submission_method;
> 
> +	struct {
> +		/*
> +		 * Mask of the non fused CCS slices
> +		 * to be used for the load balancing
> +		 */
> +		intel_engine_mask_t cslices;
> +	} ccs;
> +
> 	/*
> 	 * Default address space (either GGTT or ppGTT depending on arch).
> 	 *
> -- 
> 2.43.0

Hi Andi,

I can confirm that this patch restores most of the performance we had before the CCS change. 

I do notice a reduction in memcpy performance, but it is good enough for our use case since our video processing pipeline is zero-copy once the video is loaded to the VRAM.

Tested-by: Gnattu OC <gnattuoc@me.com <mailto:gnattuoc@me.com>>
Andi Shyti May 22, 2024, 8:51 a.m. UTC | #4
> The whole point of the previous fixes has been to change the CCS
> hardware configuration to generate only one stream available to
> the compute users. We did this by changing the info.engine_mask
> that is set during device probe, reset during the detection of
> the fused engines, and finally reset again when choosing the CCS
> mode.
> 
> We can't use the engine_mask variable anymore, as with the
> current configuration, it imposes only one CCS no matter what the
> hardware configuration is.
> 
> Before changing the engine_mask for the third time, save it and
> use it for calculating the CCS mode.
> 
> After the previous changes, the user reported a performance drop
> to around 1/4. We have tested that the compute operations, with
> the current patch, have improved by the same factor.
> 
> Fixes: 6db31251bb26 ("drm/i915/gt: Enable only one CCS for compute workload")
> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> Cc: Gnattu OC <gnattuoc@me.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Tested-by: Jian Ye <jian.ye@intel.com>

Thanks everyone for testing and reviewing, pushed in
drm-intel-gt-next.

Andi
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 5c8e9ee3b008..3b740ca25000 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -885,6 +885,12 @@  static intel_engine_mask_t init_engine_mask(struct intel_gt *gt)
 	if (IS_DG2(gt->i915)) {
 		u8 first_ccs = __ffs(CCS_MASK(gt));
 
+		/*
+		 * Store the number of active cslices before
+		 * changing the CCS engine configuration
+		 */
+		gt->ccs.cslices = CCS_MASK(gt);
+
 		/* Mask off all the CCS engine */
 		info->engine_mask &= ~GENMASK(CCS3, CCS0);
 		/* Put back in the first CCS engine */
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
index 99b71bb7da0a..3c62a44e9106 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
@@ -19,7 +19,7 @@  unsigned int intel_gt_apply_ccs_mode(struct intel_gt *gt)
 
 	/* Build the value for the fixed CCS load balancing */
 	for (cslice = 0; cslice < I915_MAX_CCS; cslice++) {
-		if (CCS_MASK(gt) & BIT(cslice))
+		if (gt->ccs.cslices & BIT(cslice))
 			/*
 			 * If available, assign the cslice
 			 * to the first available engine...
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index def7dd0eb6f1..cfdd2ad5e954 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -207,6 +207,14 @@  struct intel_gt {
 					    [MAX_ENGINE_INSTANCE + 1];
 	enum intel_submission_method submission_method;
 
+	struct {
+		/*
+		 * Mask of the non fused CCS slices
+		 * to be used for the load balancing
+		 */
+		intel_engine_mask_t cslices;
+	} ccs;
+
 	/*
 	 * Default address space (either GGTT or ppGTT depending on arch).
 	 *