diff mbox series

[7/7] drm/i915/uc: Don't forget to prepare GuC for the reset

Message ID 20190517162227.6436-8-michal.wajdeczko@intel.com (mailing list archive)
State New, archived
Headers show
Series GuC fixes | expand

Commit Message

Michal Wajdeczko May 17, 2019, 4:22 p.m. UTC
When we reset engines using ALL_ENGINES mask, we will do
full GPU reset and GuC will be also affected. Let GuC be
prepared for upcoming reset.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Chris Wilson May 17, 2019, 4:27 p.m. UTC | #1
Quoting Michal Wajdeczko (2019-05-17 17:22:27)
> When we reset engines using ALL_ENGINES mask, we will do
> full GPU reset and GuC will be also affected. Let GuC be
> prepared for upcoming reset.
> 
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index 464369bc55ad..ca6e40b6b4e2 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -564,6 +564,10 @@ static int gen8_reset_engines(struct drm_i915_private *i915,
>                  */
>         }
>  
> +       /* We are about to do full GPU reset, don't forget about GuC */
> +       if (engine_mask == ALL_ENGINES)
> +               intel_uc_reset_prepare(i915);

Eh, this is done in reset_prepare already. The only other path to call
intel_gpu_reset() directly is along sanitization, which should also have
already sanitized the guc as well. No?
-Chris
Michal Wajdeczko May 17, 2019, 4:54 p.m. UTC | #2
On Fri, 17 May 2019 18:27:44 +0200, Chris Wilson  
<chris@chris-wilson.co.uk> wrote:

> Quoting Michal Wajdeczko (2019-05-17 17:22:27)
>> When we reset engines using ALL_ENGINES mask, we will do
>> full GPU reset and GuC will be also affected. Let GuC be
>> prepared for upcoming reset.
>>
>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> ---
>>  drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c  
>> b/drivers/gpu/drm/i915/gt/intel_reset.c
>> index 464369bc55ad..ca6e40b6b4e2 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
>> @@ -564,6 +564,10 @@ static int gen8_reset_engines(struct  
>> drm_i915_private *i915,
>>                  */
>>         }
>>
>> +       /* We are about to do full GPU reset, don't forget about GuC */
>> +       if (engine_mask == ALL_ENGINES)
>> +               intel_uc_reset_prepare(i915);
>
> Eh, this is done in reset_prepare already. The only other path to call
> intel_gpu_reset() directly is along sanitization, which should also have
> already sanitized the guc as well. No?

There is igt_atomic_reset selftest which does not call reset_prepare.
And since we lost GuC in gen6_reset_engines due to GEN6_GRDOM_FULL,
our later graceful goodbye with GuC was not working.

This is hidden with current GuC fw, but with new ICL fw with CT is was  
visible as:

<3> [508.613024] [drm:intel_guc_send_mmio [i915]] *ERROR* MMIO: GuC action  
0x4506 failed with error -110 0x4506
<3> [508.613142] [drm:guc_action_deregister_ct_buffer [i915]] *ERROR* CT:  
deregister SEND buffer failed; owner=0 err=-110
<3> [508.623521] [drm:intel_guc_send_mmio [i915]] *ERROR* MMIO: GuC action  
0x4506 failed with error -110 0x4506
<3> [508.623573] [drm:guc_action_deregister_ct_buffer [i915]] *ERROR* CT:  
deregister RECV buffer failed; owner=0 err=-110

Michal
Chris Wilson May 17, 2019, 5:01 p.m. UTC | #3
Quoting Michal Wajdeczko (2019-05-17 17:54:53)
> On Fri, 17 May 2019 18:27:44 +0200, Chris Wilson  
> <chris@chris-wilson.co.uk> wrote:
> 
> > Quoting Michal Wajdeczko (2019-05-17 17:22:27)
> >> When we reset engines using ALL_ENGINES mask, we will do
> >> full GPU reset and GuC will be also affected. Let GuC be
> >> prepared for upcoming reset.
> >>
> >> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> >> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> >> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> >> ---
> >>  drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++++
> >>  1 file changed, 4 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c  
> >> b/drivers/gpu/drm/i915/gt/intel_reset.c
> >> index 464369bc55ad..ca6e40b6b4e2 100644
> >> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> >> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> >> @@ -564,6 +564,10 @@ static int gen8_reset_engines(struct  
> >> drm_i915_private *i915,
> >>                  */
> >>         }
> >>
> >> +       /* We are about to do full GPU reset, don't forget about GuC */
> >> +       if (engine_mask == ALL_ENGINES)
> >> +               intel_uc_reset_prepare(i915);
> >
> > Eh, this is done in reset_prepare already. The only other path to call
> > intel_gpu_reset() directly is along sanitization, which should also have
> > already sanitized the guc as well. No?
> 
> There is igt_atomic_reset selftest which does not call reset_prepare.
> And since we lost GuC in gen6_reset_engines due to GEN6_GRDOM_FULL,
> our later graceful goodbye with GuC was not working.

Ok, the intent there was to avoid letting any sleeps slip into the
device reset itself (should we need to put it inside a heavy hammer like
stop_machine() to serialise it with direct user access to the hw). I
think it is fair to put that under reset_prepare / reset_finish. Would
also mean moving it out of selftest_hangcheck and into selftest_reset.
-Chris
Chris Wilson May 17, 2019, 5:04 p.m. UTC | #4
Quoting Michal Wajdeczko (2019-05-17 17:54:53)
> On Fri, 17 May 2019 18:27:44 +0200, Chris Wilson  
> <chris@chris-wilson.co.uk> wrote:
> 
> > Quoting Michal Wajdeczko (2019-05-17 17:22:27)
> >> When we reset engines using ALL_ENGINES mask, we will do
> >> full GPU reset and GuC will be also affected. Let GuC be
> >> prepared for upcoming reset.
> >>
> >> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> >> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> >> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> >> ---
> >>  drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++++
> >>  1 file changed, 4 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c  
> >> b/drivers/gpu/drm/i915/gt/intel_reset.c
> >> index 464369bc55ad..ca6e40b6b4e2 100644
> >> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> >> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> >> @@ -564,6 +564,10 @@ static int gen8_reset_engines(struct  
> >> drm_i915_private *i915,
> >>                  */
> >>         }
> >>
> >> +       /* We are about to do full GPU reset, don't forget about GuC */
> >> +       if (engine_mask == ALL_ENGINES)
> >> +               intel_uc_reset_prepare(i915);
> >
> > Eh, this is done in reset_prepare already. The only other path to call
> > intel_gpu_reset() directly is along sanitization, which should also have
> > already sanitized the guc as well. No?
> 
> There is igt_atomic_reset selftest which does not call reset_prepare.
> And since we lost GuC in gen6_reset_engines due to GEN6_GRDOM_FULL,
> our later graceful goodbye with GuC was not working.

(Apologies if resent.)

Ah, so the intent there was to prevent sleeps slipping into the
device-level reset (so that we could pull it under a heavy hammer like
stop_machine() to serialise the reset with direct user access). At one
point, I hoped we could make i915_reset() completely fast and lockless,
but that was a fleeting fancy.

It looks reasonable to put that under a reset_prepare/reset_finish and
keep the test semantics. That would also mean we have to move it out of
the increasingly misnamed selftest_hangcheck.c into a selftest_reset.c
-Chris
Chris Wilson May 17, 2019, 5:11 p.m. UTC | #5
Quoting Michal Wajdeczko (2019-05-17 17:54:53)
> On Fri, 17 May 2019 18:27:44 +0200, Chris Wilson  
> <chris@chris-wilson.co.uk> wrote:
> 
> > Quoting Michal Wajdeczko (2019-05-17 17:22:27)
> >> When we reset engines using ALL_ENGINES mask, we will do
> >> full GPU reset and GuC will be also affected. Let GuC be
> >> prepared for upcoming reset.
> >>
> >> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> >> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> >> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> >> ---
> >>  drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++++
> >>  1 file changed, 4 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c  
> >> b/drivers/gpu/drm/i915/gt/intel_reset.c
> >> index 464369bc55ad..ca6e40b6b4e2 100644
> >> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> >> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> >> @@ -564,6 +564,10 @@ static int gen8_reset_engines(struct  
> >> drm_i915_private *i915,
> >>                  */
> >>         }
> >>
> >> +       /* We are about to do full GPU reset, don't forget about GuC */
> >> +       if (engine_mask == ALL_ENGINES)
> >> +               intel_uc_reset_prepare(i915);
> >
> > Eh, this is done in reset_prepare already. The only other path to call
> > intel_gpu_reset() directly is along sanitization, which should also have
> > already sanitized the guc as well. No?
> 
> There is igt_atomic_reset selftest which does not call reset_prepare.
> And since we lost GuC in gen6_reset_engines due to GEN6_GRDOM_FULL,
> our later graceful goodbye with GuC was not working.
> 
> This is hidden with current GuC fw, but with new ICL fw with CT is was  
> visible as:

Imagine we fix the selftest, is there a GEM_BUG_ON(intel_uc_active()) we
could put here to enforce sanitization first? Does such a assert want to
be here or up a level in intel_gpu_reset()? 
-Chris
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 464369bc55ad..ca6e40b6b4e2 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -564,6 +564,10 @@  static int gen8_reset_engines(struct drm_i915_private *i915,
 		 */
 	}
 
+	/* We are about to do full GPU reset, don't forget about GuC */
+	if (engine_mask == ALL_ENGINES)
+		intel_uc_reset_prepare(i915);
+
 	if (INTEL_GEN(i915) >= 11)
 		ret = gen11_reset_engines(i915, engine_mask, retry);
 	else