Message ID | 20190517162227.6436-8-michal.wajdeczko@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | GuC fixes | expand |
Quoting Michal Wajdeczko (2019-05-17 17:22:27) > When we reset engines using ALL_ENGINES mask, we will do > full GPU reset and GuC will be also affected. Let GuC be > prepared for upcoming reset. > > Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> > --- > drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c > index 464369bc55ad..ca6e40b6b4e2 100644 > --- a/drivers/gpu/drm/i915/gt/intel_reset.c > +++ b/drivers/gpu/drm/i915/gt/intel_reset.c > @@ -564,6 +564,10 @@ static int gen8_reset_engines(struct drm_i915_private *i915, > */ > } > > + /* We are about to do full GPU reset, don't forget about GuC */ > + if (engine_mask == ALL_ENGINES) > + intel_uc_reset_prepare(i915); Eh, this is done in reset_prepare already. The only other path to call intel_gpu_reset() directly is along sanitization, which should also have already sanitized the guc as well. No? -Chris
On Fri, 17 May 2019 18:27:44 +0200, Chris Wilson <chris@chris-wilson.co.uk> wrote: > Quoting Michal Wajdeczko (2019-05-17 17:22:27) >> When we reset engines using ALL_ENGINES mask, we will do >> full GPU reset and GuC will be also affected. Let GuC be >> prepared for upcoming reset. >> >> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> >> Cc: Chris Wilson <chris@chris-wilson.co.uk> >> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> >> --- >> drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++++ >> 1 file changed, 4 insertions(+) >> >> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c >> b/drivers/gpu/drm/i915/gt/intel_reset.c >> index 464369bc55ad..ca6e40b6b4e2 100644 >> --- a/drivers/gpu/drm/i915/gt/intel_reset.c >> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c >> @@ -564,6 +564,10 @@ static int gen8_reset_engines(struct >> drm_i915_private *i915, >> */ >> } >> >> + /* We are about to do full GPU reset, don't forget about GuC */ >> + if (engine_mask == ALL_ENGINES) >> + intel_uc_reset_prepare(i915); > > Eh, this is done in reset_prepare already. The only other path to call > intel_gpu_reset() directly is along sanitization, which should also have > already sanitized the guc as well. No? There is igt_atomic_reset selftest which does not call reset_prepare. And since we lost GuC in gen6_reset_engines due to GEN6_GRDOM_FULL, our later graceful goodbye with GuC was not working. This is hidden with current GuC fw, but with new ICL fw with CT is was visible as: <3> [508.613024] [drm:intel_guc_send_mmio [i915]] *ERROR* MMIO: GuC action 0x4506 failed with error -110 0x4506 <3> [508.613142] [drm:guc_action_deregister_ct_buffer [i915]] *ERROR* CT: deregister SEND buffer failed; owner=0 err=-110 <3> [508.623521] [drm:intel_guc_send_mmio [i915]] *ERROR* MMIO: GuC action 0x4506 failed with error -110 0x4506 <3> [508.623573] [drm:guc_action_deregister_ct_buffer [i915]] *ERROR* CT: deregister RECV buffer failed; owner=0 err=-110 Michal
Quoting Michal Wajdeczko (2019-05-17 17:54:53) > On Fri, 17 May 2019 18:27:44 +0200, Chris Wilson > <chris@chris-wilson.co.uk> wrote: > > > Quoting Michal Wajdeczko (2019-05-17 17:22:27) > >> When we reset engines using ALL_ENGINES mask, we will do > >> full GPU reset and GuC will be also affected. Let GuC be > >> prepared for upcoming reset. > >> > >> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> > >> Cc: Chris Wilson <chris@chris-wilson.co.uk> > >> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> > >> --- > >> drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++++ > >> 1 file changed, 4 insertions(+) > >> > >> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c > >> b/drivers/gpu/drm/i915/gt/intel_reset.c > >> index 464369bc55ad..ca6e40b6b4e2 100644 > >> --- a/drivers/gpu/drm/i915/gt/intel_reset.c > >> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c > >> @@ -564,6 +564,10 @@ static int gen8_reset_engines(struct > >> drm_i915_private *i915, > >> */ > >> } > >> > >> + /* We are about to do full GPU reset, don't forget about GuC */ > >> + if (engine_mask == ALL_ENGINES) > >> + intel_uc_reset_prepare(i915); > > > > Eh, this is done in reset_prepare already. The only other path to call > > intel_gpu_reset() directly is along sanitization, which should also have > > already sanitized the guc as well. No? > > There is igt_atomic_reset selftest which does not call reset_prepare. > And since we lost GuC in gen6_reset_engines due to GEN6_GRDOM_FULL, > our later graceful goodbye with GuC was not working. Ok, the intent there was to avoid letting any sleeps slip into the device reset itself (should we need to put it inside a heavy hammer like stop_machine() to serialise it with direct user access to the hw). I think it is fair to put that under reset_prepare / reset_finish. Would also mean moving it out of selftest_hangcheck and into selftest_reset. -Chris
Quoting Michal Wajdeczko (2019-05-17 17:54:53) > On Fri, 17 May 2019 18:27:44 +0200, Chris Wilson > <chris@chris-wilson.co.uk> wrote: > > > Quoting Michal Wajdeczko (2019-05-17 17:22:27) > >> When we reset engines using ALL_ENGINES mask, we will do > >> full GPU reset and GuC will be also affected. Let GuC be > >> prepared for upcoming reset. > >> > >> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> > >> Cc: Chris Wilson <chris@chris-wilson.co.uk> > >> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> > >> --- > >> drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++++ > >> 1 file changed, 4 insertions(+) > >> > >> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c > >> b/drivers/gpu/drm/i915/gt/intel_reset.c > >> index 464369bc55ad..ca6e40b6b4e2 100644 > >> --- a/drivers/gpu/drm/i915/gt/intel_reset.c > >> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c > >> @@ -564,6 +564,10 @@ static int gen8_reset_engines(struct > >> drm_i915_private *i915, > >> */ > >> } > >> > >> + /* We are about to do full GPU reset, don't forget about GuC */ > >> + if (engine_mask == ALL_ENGINES) > >> + intel_uc_reset_prepare(i915); > > > > Eh, this is done in reset_prepare already. The only other path to call > > intel_gpu_reset() directly is along sanitization, which should also have > > already sanitized the guc as well. No? > > There is igt_atomic_reset selftest which does not call reset_prepare. > And since we lost GuC in gen6_reset_engines due to GEN6_GRDOM_FULL, > our later graceful goodbye with GuC was not working. (Apologies if resent.) Ah, so the intent there was to prevent sleeps slipping into the device-level reset (so that we could pull it under a heavy hammer like stop_machine() to serialise the reset with direct user access). At one point, I hoped we could make i915_reset() completely fast and lockless, but that was a fleeting fancy. It looks reasonable to put that under a reset_prepare/reset_finish and keep the test semantics. That would also mean we have to move it out of the increasingly misnamed selftest_hangcheck.c into a selftest_reset.c -Chris
Quoting Michal Wajdeczko (2019-05-17 17:54:53) > On Fri, 17 May 2019 18:27:44 +0200, Chris Wilson > <chris@chris-wilson.co.uk> wrote: > > > Quoting Michal Wajdeczko (2019-05-17 17:22:27) > >> When we reset engines using ALL_ENGINES mask, we will do > >> full GPU reset and GuC will be also affected. Let GuC be > >> prepared for upcoming reset. > >> > >> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> > >> Cc: Chris Wilson <chris@chris-wilson.co.uk> > >> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> > >> --- > >> drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++++ > >> 1 file changed, 4 insertions(+) > >> > >> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c > >> b/drivers/gpu/drm/i915/gt/intel_reset.c > >> index 464369bc55ad..ca6e40b6b4e2 100644 > >> --- a/drivers/gpu/drm/i915/gt/intel_reset.c > >> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c > >> @@ -564,6 +564,10 @@ static int gen8_reset_engines(struct > >> drm_i915_private *i915, > >> */ > >> } > >> > >> + /* We are about to do full GPU reset, don't forget about GuC */ > >> + if (engine_mask == ALL_ENGINES) > >> + intel_uc_reset_prepare(i915); > > > > Eh, this is done in reset_prepare already. The only other path to call > > intel_gpu_reset() directly is along sanitization, which should also have > > already sanitized the guc as well. No? > > There is igt_atomic_reset selftest which does not call reset_prepare. > And since we lost GuC in gen6_reset_engines due to GEN6_GRDOM_FULL, > our later graceful goodbye with GuC was not working. > > This is hidden with current GuC fw, but with new ICL fw with CT is was > visible as: Imagine we fix the selftest, is there a GEM_BUG_ON(intel_uc_active()) we could put here to enforce sanitization first? Does such a assert want to be here or up a level in intel_gpu_reset()? -Chris
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c index 464369bc55ad..ca6e40b6b4e2 100644 --- a/drivers/gpu/drm/i915/gt/intel_reset.c +++ b/drivers/gpu/drm/i915/gt/intel_reset.c @@ -564,6 +564,10 @@ static int gen8_reset_engines(struct drm_i915_private *i915, */ } + /* We are about to do full GPU reset, don't forget about GuC */ + if (engine_mask == ALL_ENGINES) + intel_uc_reset_prepare(i915); + if (INTEL_GEN(i915) >= 11) ret = gen11_reset_engines(i915, engine_mask, retry); else
When we reset engines using ALL_ENGINES mask, we will do full GPU reset and GuC will be also affected. Let GuC be prepared for upcoming reset. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> --- drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++++ 1 file changed, 4 insertions(+)