Message ID | 20230124110515.17017-1-nirmoy.das@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | drm/i915/gt: Clear wedged status upon suspend |
Forgot to add the drm issue as a reference.

On 1/24/2023 12:05 PM, Nirmoy Das wrote:
> From: Chris Wilson <chris.p.wilson@linux.intel.com>
>
> Currently we use set-wedged on suspend if the workload is not responding
> in order to allow a fast suspend (albeit at the cost of discarding the
> current userspace). This may leave the device wedged during suspend,
> where we may require the device available in order to swapout CPU
> inaccessible device memory. Clear any temporary wedged-status after
> flushing userspace off the device so we can use the blitter ourselves
> inside suspend.
>
> Testcase: igt/gem_eio/in-flight-suspend
References: https://gitlab.freedesktop.org/drm/intel/-/issues/7896
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Chris Wilson <chris.p.wilson@linux.intel.com>
> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_gt_pm.c | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> index cef3d6f5c34e..74d1dd3793f9 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> @@ -317,19 +317,17 @@ int intel_gt_resume(struct intel_gt *gt)
>
>  static void wait_for_suspend(struct intel_gt *gt)
>  {
> -	if (!intel_gt_pm_is_awake(gt))
> -		return;
> -
> -	if (intel_gt_wait_for_idle(gt, I915_GT_SUSPEND_IDLE_TIMEOUT) == -ETIME) {
> +	if (intel_gt_wait_for_idle(gt, I915_GT_SUSPEND_IDLE_TIMEOUT) == -ETIME)
>  		/*
>  		 * Forcibly cancel outstanding work and leave
>  		 * the gpu quiet.
>  		 */
>  		intel_gt_set_wedged(gt);
> -		intel_gt_retire_requests(gt);
> -	}
>
>  	intel_gt_pm_wait_for_idle(gt);
> +
> +	/* Make the GPU available again for swapout */
> +	intel_gt_unset_wedged(gt);
>  }
>
>  void intel_gt_suspend_prepare(struct intel_gt *gt)
On Tue, Jan 24, 2023 at 12:07:19PM +0100, Das, Nirmoy wrote:
> Forgot to add the drm issue as a reference.
>
> On 1/24/2023 12:05 PM, Nirmoy Das wrote:
> > From: Chris Wilson <chris.p.wilson@linux.intel.com>
> >
> > Currently we use set-wedged on suspend if the workload is not responding
> > in order to allow a fast suspend (albeit at the cost of discarding the
> > current userspace). This may leave the device wedged during suspend,
> > where we may require the device available in order to swapout CPU
> > inaccessible device memory. Clear any temporary wedged-status after
> > flushing userspace off the device so we can use the blitter ourselves
> > inside suspend.

This seems a very good move. But this explains the unset_wedged part,
not the removal of the retire_requests. Why don't we need to retire them
anymore?

Also, what are the chances of races here? I mean, we are marking
the gpu as not wedged anymore. Do we have any guarantee at this point
that no further request will arrive?

Shouldn't we have a way to differentiate between the totally wedged
and blocked for user submission?
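Rodrigo's last question, whether "totally wedged" and "blocked for user submission" should be separate states, can be illustrated with a small userspace sketch. This is not what the patch does and not i915 code: `struct toy_gt_state`, `enum toy_submit_source`, and `toy_submit()` are hypothetical names invented here, and returning `-EIO` is an illustrative choice.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical: who is trying to submit work. */
enum toy_submit_source { TOY_FROM_USER, TOY_FROM_KERNEL };

/* Hypothetical model state; not the layout of struct intel_gt. */
struct toy_gt_state {
	bool wedged;       /* GPU unusable for every submitter */
	bool user_blocked; /* only userspace submission is refused */
};

/* Returns 0 if the submission would be accepted, -EIO otherwise. */
static int toy_submit(const struct toy_gt_state *st, enum toy_submit_source src)
{
	assert(st);

	if (st->wedged)
		return -EIO; /* nobody may submit */
	if (st->user_blocked && src == TOY_FROM_USER)
		return -EIO; /* kernel-internal work (e.g. swapout) still allowed */
	return 0;
}
```

With a split like this, suspend could keep userspace out while still letting kernel-internal copies run, instead of clearing a single wedged status and relying on no new requests arriving.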
Hi Rodrigo,

On 1/24/2023 8:26 PM, Rodrigo Vivi wrote:
> On Tue, Jan 24, 2023 at 12:07:19PM +0100, Das, Nirmoy wrote:
>> Forgot to add the drm issue as a reference.
>>
>> On 1/24/2023 12:05 PM, Nirmoy Das wrote:
>>> From: Chris Wilson <chris.p.wilson@linux.intel.com>
>>>
>>> Currently we use set-wedged on suspend if the workload is not responding
>>> in order to allow a fast suspend (albeit at the cost of discarding the
>>> current userspace). This may leave the device wedged during suspend,
>>> where we may require the device available in order to swapout CPU
>>> inaccessible device memory. Clear any temporary wedged-status after
>>> flushing userspace off the device so we can use the blitter ourselves
>>> inside suspend.
> This seems a very good move. But this explains the unset_wedged part,
> not the removal of the retire_requests. Why don't we need to retire them
> anymore?

Thanks for noticing that. This is on me: I missed another patch which
moved the intel_gt_retire_requests() inside of intel_gt_set_wedged().

>
> Also, what are the chances of races here? I mean, we are marking
> the gpu as not wedged anymore. Do we have any guarantee at this point
> that no further request will arrive?

The assumption was that this runs in the single-threaded suspend
"context", so we should be fine, but we just realized that this is
getting called at pm prepare time. Thanks for raising this; it seems I
need to refactor i915_gem_backup_suspend() as well, which should be
called much later on.

Regards,
Nirmoy

>
> Shouldn't we have a way to differentiate between the totally wedged
> and blocked for user submission?
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index cef3d6f5c34e..74d1dd3793f9 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -317,19 +317,17 @@ int intel_gt_resume(struct intel_gt *gt)
 
 static void wait_for_suspend(struct intel_gt *gt)
 {
-	if (!intel_gt_pm_is_awake(gt))
-		return;
-
-	if (intel_gt_wait_for_idle(gt, I915_GT_SUSPEND_IDLE_TIMEOUT) == -ETIME) {
+	if (intel_gt_wait_for_idle(gt, I915_GT_SUSPEND_IDLE_TIMEOUT) == -ETIME)
 		/*
 		 * Forcibly cancel outstanding work and leave
 		 * the gpu quiet.
 		 */
 		intel_gt_set_wedged(gt);
-		intel_gt_retire_requests(gt);
-	}
 
 	intel_gt_pm_wait_for_idle(gt);
+
+	/* Make the GPU available again for swapout */
+	intel_gt_unset_wedged(gt);
 }
 
 void intel_gt_suspend_prepare(struct intel_gt *gt)
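
The order of operations in the patched wait_for_suspend() can be followed in a small userspace model. This is only a sketch under stated assumptions: `struct toy_gt` and every `toy_*` function are hypothetical stand-ins, not the driver's types or API, and the model assumes (per Nirmoy's reply) that setting wedged also retires outstanding requests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct intel_gt; not the real layout. */
struct toy_gt {
	bool wedged;     /* models the wedged status */
	int inflight;    /* outstanding requests */
	bool responsive; /* whether the workload drains on its own */
};

/* Models intel_gt_wait_for_idle(): 0 when idle, -1 on timeout. */
static int toy_wait_for_idle(struct toy_gt *gt)
{
	if (gt->responsive)
		gt->inflight = 0;
	return gt->inflight ? -1 : 0;
}

/* Models intel_gt_set_wedged(), assumed to also retire requests. */
static void toy_set_wedged(struct toy_gt *gt)
{
	gt->wedged = true;
	gt->inflight = 0; /* cancel and retire outstanding work */
}

/* Models intel_gt_unset_wedged(). */
static void toy_unset_wedged(struct toy_gt *gt)
{
	gt->wedged = false;
}

/*
 * Mirrors the patched sequence: wait, wedge on timeout, confirm idle,
 * then clear the wedged status. Returns true if work was cancelled.
 */
static bool toy_wait_for_suspend(struct toy_gt *gt)
{
	bool cancelled = false;

	if (toy_wait_for_idle(gt) != 0) {
		toy_set_wedged(gt);
		cancelled = true;
	}

	/* Stands in for intel_gt_pm_wait_for_idle(): nothing left in flight. */
	assert(gt->inflight == 0);

	/* Make the (model) GPU available again for swapout. */
	toy_unset_wedged(gt);
	return cancelled;
}
```

Calling `toy_wait_for_suspend()` on a model with `responsive == false` takes the cancel path and still returns with `wedged == false`, which is the behaviour the patch aims for. The model is single-threaded, so it does not capture the race Rodrigo asks about.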