Message ID | 1505650654-11091-8-git-send-email-sagar.a.kamble@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
<SNIP> On 09/17/2017 05:17 AM, Sagar Arun Kamble wrote: > Teardown of GuC HW/SW state was not properly done in unload path. > guc_submission_disable was called as part of intel_uc_fini_hw which > happens post gem_unload in the i915_driver_unload path. > s/i915_gem_fini/i915_gem_cleanup as it looks more suitable as that > function does cleanup. > To differentiate the tasks during suspend and unload w.r.t GuC this > patch introduces new function i915_gem_fini which in addition to > disabling GuC interfaces also disables GuC submission during which > communication with GuC is needed for destroying doorbell. > i915_gem_fini is copy of i915_gem_suspend with difference w.r.t > GuC operations. To achieve this, new helpers i915_gem_context_suspend > and i915_gem_suspend_complete are prepared. > Sagar, I just realized this patch would make this comment in guc_client_free superfluous, right?: /* FIXME: in many cases, by the time we get here the GuC has been * reset, so we cannot destroy the doorbell properly. Ignore the * error message for now */ Can you make sure you remove it in this patch? Thanks!
On 9/22/2017 12:03 AM, Oscar Mateo wrote: > <SNIP> > > > On 09/17/2017 05:17 AM, Sagar Arun Kamble wrote: >> Teardown of GuC HW/SW state was not properly done in unload path. >> guc_submission_disable was called as part of intel_uc_fini_hw which >> happens post gem_unload in the i915_driver_unload path. >> s/i915_gem_fini/i915_gem_cleanup as it looks more suitable as that >> function does cleanup. >> To differentiate the tasks during suspend and unload w.r.t GuC this >> patch introduces new function i915_gem_fini which in addition to >> disabling GuC interfaces also disables GuC submission during which >> communication with GuC is needed for destroying doorbell. >> i915_gem_fini is copy of i915_gem_suspend with difference w.r.t >> GuC operations. To achieve this, new helpers i915_gem_context_suspend >> and i915_gem_suspend_complete are prepared. >> > > Sagar, I just realized this patch would make this comment in > guc_client_free superfluous, right?: > > /* FIXME: in many cases, by the time we get here the GuC has been > * reset, so we cannot destroy the doorbell properly. Ignore the > * error message for now */ > > Can you make sure you remove it in this patch? > > Thanks! Yes. Will need to remove this comment :) Will make this change as part of https://patchwork.freedesktop.org/patch/178189/ How about another comment before it? i915_gem_suspend is ensuring no outstanding submissions before coming to uc_suspend. So shall we remove that as well? /* * XXX: wait for any outstanding submissions before freeing memory. * Be sure to drop any locks */
On 09/21/2017 12:09 PM, Sagar Arun Kamble wrote: > > > On 9/22/2017 12:03 AM, Oscar Mateo wrote: >> <SNIP> >> >> >> On 09/17/2017 05:17 AM, Sagar Arun Kamble wrote: >>> Teardown of GuC HW/SW state was not properly done in unload path. >>> guc_submission_disable was called as part of intel_uc_fini_hw which >>> happens post gem_unload in the i915_driver_unload path. >>> s/i915_gem_fini/i915_gem_cleanup as it looks more suitable as that >>> function does cleanup. >>> To differentiate the tasks during suspend and unload w.r.t GuC this >>> patch introduces new function i915_gem_fini which in addition to >>> disabling GuC interfaces also disables GuC submission during which >>> communication with GuC is needed for destroying doorbell. >>> i915_gem_fini is copy of i915_gem_suspend with difference w.r.t >>> GuC operations. To achieve this, new helpers i915_gem_context_suspend >>> and i915_gem_suspend_complete are prepared. >>> >> >> Sagar, I just realized this patch would make this comment in >> guc_client_free superfluous, right?: >> >> /* FIXME: in many cases, by the time we get here the GuC has been >> * reset, so we cannot destroy the doorbell properly. Ignore the >> * error message for now */ >> >> Can you make sure you remove it in this patch? >> >> Thanks! > Yes. Will need to remove this comment :) > Will make this change as part of > https://patchwork.freedesktop.org/patch/178189/ > How about another comment before it? i915_gem_suspend is ensuring no > outstanding submissions before coming to > uc_suspend. So shall we remove that as well? > > /* > * XXX: wait for any outstanding submissions before freeing > memory. > * Be sure to drop any locks > */ I have the impression that this comment was here as a future design guide for Direct Submission. It seems to be the case for almost all the "XXX" comments in that file, like: /* XXX: wait for any interrupts */ /* XXX: wait for workqueue to drain */ Since Direct Submission is not being considered at the moment, maybe we can just drop of all them (or transform them into something more innocuous than a "XXX"). What do people think?
On 9/22/2017 1:19 AM, Oscar Mateo wrote: > > > On 09/21/2017 12:09 PM, Sagar Arun Kamble wrote: >> >> >> On 9/22/2017 12:03 AM, Oscar Mateo wrote: >>> <SNIP> >>> >>> >>> On 09/17/2017 05:17 AM, Sagar Arun Kamble wrote: >>>> Teardown of GuC HW/SW state was not properly done in unload path. >>>> guc_submission_disable was called as part of intel_uc_fini_hw which >>>> happens post gem_unload in the i915_driver_unload path. >>>> s/i915_gem_fini/i915_gem_cleanup as it looks more suitable as that >>>> function does cleanup. >>>> To differentiate the tasks during suspend and unload w.r.t GuC this >>>> patch introduces new function i915_gem_fini which in addition to >>>> disabling GuC interfaces also disables GuC submission during which >>>> communication with GuC is needed for destroying doorbell. >>>> i915_gem_fini is copy of i915_gem_suspend with difference w.r.t >>>> GuC operations. To achieve this, new helpers i915_gem_context_suspend >>>> and i915_gem_suspend_complete are prepared. >>>> >>> >>> Sagar, I just realized this patch would make this comment in >>> guc_client_free superfluous, right?: >>> >>> /* FIXME: in many cases, by the time we get here the GuC has been >>> * reset, so we cannot destroy the doorbell properly. Ignore the >>> * error message for now */ >>> >>> Can you make sure you remove it in this patch? >>> >>> Thanks! >> Yes. Will need to remove this comment :) >> Will make this change as part of >> https://patchwork.freedesktop.org/patch/178189/ >> How about another comment before it? i915_gem_suspend is ensuring no >> outstanding submissions before coming to >> uc_suspend. So shall we remove that as well? >> >> /* >> * XXX: wait for any outstanding submissions before freeing >> memory. >> * Be sure to drop any locks >> */ > > I have the impression that this comment was here as a future design > guide for Direct Submission. It seems to be the case for almost all > the "XXX" comments in that file, like: > > /* XXX: wait for any interrupts */ > /* XXX: wait for workqueue to drain */ > > Since Direct Submission is not being considered at the moment, maybe > we can just drop of all them (or transform them into something more > innocuous than a "XXX"). > What do people think? > I feel it makes sense to keep for future changes then. If needed, let us update these in separate patch. Thanks for the review Oscar.
This was reply to older series hence I replied patch there. Will update w.r.t new series. On 9/22/2017 12:03 AM, Oscar Mateo wrote: > <SNIP> > > > On 09/17/2017 05:17 AM, Sagar Arun Kamble wrote: >> Teardown of GuC HW/SW state was not properly done in unload path. >> guc_submission_disable was called as part of intel_uc_fini_hw which >> happens post gem_unload in the i915_driver_unload path. >> s/i915_gem_fini/i915_gem_cleanup as it looks more suitable as that >> function does cleanup. >> To differentiate the tasks during suspend and unload w.r.t GuC this >> patch introduces new function i915_gem_fini which in addition to >> disabling GuC interfaces also disables GuC submission during which >> communication with GuC is needed for destroying doorbell. >> i915_gem_fini is copy of i915_gem_suspend with difference w.r.t >> GuC operations. To achieve this, new helpers i915_gem_context_suspend >> and i915_gem_suspend_complete are prepared. >> > > Sagar, I just realized this patch would make this comment in > guc_client_free superfluous, right?: > > /* FIXME: in many cases, by the time we get here the GuC has been > * reset, so we cannot destroy the doorbell properly. Ignore the > * error message for now */ > > Can you make sure you remove it in this patch? > > Thanks!
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 4751679..f9622fc 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -596,13 +596,13 @@ static bool i915_switcheroo_can_switch(struct pci_dev *pdev) .can_switch = i915_switcheroo_can_switch, }; -static void i915_gem_fini(struct drm_i915_private *dev_priv) +static void i915_gem_cleanup(struct drm_i915_private *dev_priv) { /* Flush any outstanding unpin_work. */ i915_gem_drain_workqueue(dev_priv); mutex_lock(&dev_priv->drm.struct_mutex); - intel_uc_fini_hw(dev_priv); + intel_uc_cleanup(dev_priv); i915_gem_cleanup_engines(dev_priv); i915_gem_contexts_fini(dev_priv); i915_gem_cleanup_userptr(dev_priv); @@ -683,9 +683,9 @@ static int i915_load_modeset_init(struct drm_device *dev) return 0; cleanup_gem: - if (i915_gem_suspend(dev_priv)) + if (i915_gem_fini(dev_priv)) DRM_ERROR("failed to idle hardware; continuing to unload!\n"); - i915_gem_fini(dev_priv); + i915_gem_cleanup(dev_priv); cleanup_uc: intel_uc_fini_fw(dev_priv); cleanup_irq: @@ -1376,7 +1376,7 @@ void i915_driver_unload(struct drm_device *dev) i915_driver_unregister(dev_priv); - if (i915_gem_suspend(dev_priv)) + if (i915_gem_fini(dev_priv)) DRM_ERROR("failed to idle hardware; continuing to unload!\n"); intel_display_power_get(dev_priv, POWER_DOMAIN_INIT); @@ -1410,7 +1410,7 @@ void i915_driver_unload(struct drm_device *dev) cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work); i915_reset_error_state(dev_priv); - i915_gem_fini(dev_priv); + i915_gem_cleanup(dev_priv); intel_uc_fini_fw(dev_priv); intel_fbc_cleanup_cfb(dev_priv); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index a3b7435..341f2ba 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -3671,6 +3671,7 @@ void i915_gem_reset_engine(struct intel_engine_cs *engine, int i915_gem_wait_for_idle(struct drm_i915_private *dev_priv, unsigned int flags); int __must_check i915_gem_suspend(struct drm_i915_private *dev_priv); +int __must_check i915_gem_fini(struct drm_i915_private *dev_priv); void i915_gem_resume(struct drm_i915_private *dev_priv); int i915_gem_fault(struct vm_fault *vmf); int i915_gem_object_wait(struct drm_i915_gem_object *obj, diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 7354307..99f55bd 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -4545,12 +4545,11 @@ void i915_gem_sanitize(struct drm_i915_private *i915) } } -int i915_gem_suspend(struct drm_i915_private *dev_priv) +static int i915_gem_contexts_suspend(struct drm_i915_private *dev_priv) { struct drm_device *dev = &dev_priv->drm; int ret; - intel_runtime_pm_get(dev_priv); intel_suspend_gt_powersave(dev_priv); mutex_lock(&dev->struct_mutex); @@ -4577,8 +4576,15 @@ int i915_gem_suspend(struct drm_i915_private *dev_priv) i915_gem_contexts_lost(dev_priv); mutex_unlock(&dev->struct_mutex); - intel_uc_system_suspend(dev_priv); + return 0; + +err_unlock: + mutex_unlock(&dev->struct_mutex); + return ret; +} +static void i915_gem_suspend_complete(struct drm_i915_private *dev_priv) +{ cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work); cancel_delayed_work_sync(&dev_priv->gt.retire_work); @@ -4615,12 +4621,48 @@ int i915_gem_suspend(struct drm_i915_private *dev_priv) * machine in an unusable condition. */ i915_gem_sanitize(dev_priv); +} + +int i915_gem_suspend(struct drm_i915_private *dev_priv) +{ + int ret; + + intel_runtime_pm_get(dev_priv); + + ret = i915_gem_contexts_suspend(dev_priv); + if (ret && ret != -EIO) + goto err_suspend; + + intel_uc_system_suspend(dev_priv); + + i915_gem_suspend_complete(dev_priv); intel_runtime_pm_put(dev_priv); return 0; -err_unlock: - mutex_unlock(&dev->struct_mutex); +err_suspend: + intel_runtime_pm_put(dev_priv); + return ret; +} + +int i915_gem_fini(struct drm_i915_private *dev_priv) +{ + int ret; + + intel_runtime_pm_get(dev_priv); + + ret = i915_gem_contexts_suspend(dev_priv); + if (ret && ret != -EIO) + goto err_suspend; + + intel_uc_fini_hw(dev_priv); + + i915_gem_suspend_complete(dev_priv); + + intel_runtime_pm_put(dev_priv); + return 0; + +err_suspend: intel_runtime_pm_put(dev_priv); return ret; } diff --git a/drivers/gpu/drm/i915/intel_guc.c b/drivers/gpu/drm/i915/intel_guc.c index 904cc58..a288c70 100644 --- a/drivers/gpu/drm/i915/intel_guc.c +++ b/drivers/gpu/drm/i915/intel_guc.c @@ -292,3 +292,44 @@ int intel_guc_system_resume(struct intel_guc *guc) */ return ret; } + +void intel_guc_fini_hw(struct intel_guc *guc) +{ + struct drm_i915_private *dev_priv = guc_to_i915(guc); + struct drm_device *dev = &dev_priv->drm; + + if (i915.enable_guc_submission) { + mutex_lock(&dev->struct_mutex); + i915_guc_submission_disable(dev_priv); + mutex_unlock(&dev->struct_mutex); + intel_guc_reset_prepare(guc); + } +} + +void intel_guc_capture_load_err_log(struct intel_guc *guc) +{ + if (!guc->log.vma || i915.guc_log_level < 0) + return; + + if (!guc->load_err_log) + guc->load_err_log = i915_gem_object_get(guc->log.vma->obj); +} + +static void guc_free_load_err_log(struct intel_guc *guc) +{ + if (guc->load_err_log) + i915_gem_object_put(guc->load_err_log); +} + +void intel_guc_cleanup(struct intel_guc *guc) +{ + struct drm_i915_private *dev_priv = guc_to_i915(guc); + + if (!i915.enable_guc_loading) + return; + + guc_free_load_err_log(guc); + + if (i915.enable_guc_submission) + i915_guc_submission_fini(dev_priv); +} diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h index e912667..4d7561c 100644 --- a/drivers/gpu/drm/i915/intel_guc.h +++ b/drivers/gpu/drm/i915/intel_guc.h @@ -153,6 +153,9 @@ static inline u32 guc_ggtt_offset(struct i915_vma *vma) int intel_guc_runtime_resume(struct intel_guc *guc); int intel_guc_system_suspend(struct intel_guc *guc); int intel_guc_system_resume(struct intel_guc *guc); +void intel_guc_fini_hw(struct intel_guc *guc); +void intel_guc_capture_load_err_log(struct intel_guc *guc); +void intel_guc_cleanup(struct intel_guc *guc); /* intel_guc_loader.c */ int intel_guc_select_fw(struct intel_guc *guc); diff --git a/drivers/gpu/drm/i915/intel_uc.c b/drivers/gpu/drm/i915/intel_uc.c index 759d8f4e..52e55e1 100644 --- a/drivers/gpu/drm/i915/intel_uc.c +++ b/drivers/gpu/drm/i915/intel_uc.c @@ -250,23 +250,6 @@ void intel_uc_fini_fw(struct drm_i915_private *dev_priv) __intel_uc_fw_fini(&dev_priv->huc.fw); } -static void guc_capture_load_err_log(struct intel_guc *guc) -{ - if (!guc->log.vma || i915.guc_log_level < 0) - return; - - if (!guc->load_err_log) - guc->load_err_log = i915_gem_object_get(guc->log.vma->obj); - - return; -} - -static void guc_free_load_err_log(struct intel_guc *guc) -{ - if (guc->load_err_log) - i915_gem_object_put(guc->load_err_log); -} - static int guc_enable_communication(struct intel_guc *guc) { struct drm_i915_private *dev_priv = guc_to_i915(guc); @@ -377,7 +360,7 @@ int intel_uc_init_hw(struct drm_i915_private *dev_priv) guc_disable_communication(guc); gen9_disable_guc_interrupts(dev_priv); err_log_capture: - guc_capture_load_err_log(guc); + intel_guc_capture_load_err_log(guc); err_submission: if (i915.enable_guc_submission) i915_guc_submission_fini(dev_priv); @@ -403,22 +386,12 @@ int intel_uc_init_hw(struct drm_i915_private *dev_priv) void intel_uc_fini_hw(struct drm_i915_private *dev_priv) { - guc_free_load_err_log(&dev_priv->guc); - - if (!i915.enable_guc_loading) - return; - - if (i915.enable_guc_submission) - i915_guc_submission_disable(dev_priv); - - guc_disable_communication(&dev_priv->guc); - - if (i915.enable_guc_submission) { - gen9_disable_guc_interrupts(dev_priv); - i915_guc_submission_fini(dev_priv); - } + intel_guc_fini_hw(&dev_priv->guc); +} - i915_ggtt_disable_guc(dev_priv); +void intel_uc_cleanup(struct drm_i915_private *dev_priv) +{ + intel_guc_cleanup(&dev_priv->guc); } void intel_uc_reset_prepare(struct drm_i915_private *dev_priv) diff --git a/drivers/gpu/drm/i915/intel_uc.h b/drivers/gpu/drm/i915/intel_uc.h index 4d49a19..9fd8626 100644 --- a/drivers/gpu/drm/i915/intel_uc.h +++ b/drivers/gpu/drm/i915/intel_uc.h @@ -105,6 +105,7 @@ struct intel_uc_fw { void intel_uc_fini_fw(struct drm_i915_private *dev_priv); int intel_uc_init_hw(struct drm_i915_private *dev_priv); void intel_uc_fini_hw(struct drm_i915_private *dev_priv); +void intel_uc_cleanup(struct drm_i915_private *dev_priv); void intel_uc_reset_prepare(struct drm_i915_private *dev_priv); void intel_uc_runtime_suspend(struct drm_i915_private *dev_priv); int intel_uc_runtime_resume(struct drm_i915_private *dev_priv);
Teardown of GuC HW/SW state was not properly done in unload path. guc_submission_disable was called as part of intel_uc_fini_hw which happens post gem_unload in the i915_driver_unload path. s/i915_gem_fini/i915_gem_cleanup as it looks more suitable as that function does cleanup. To differentiate the tasks during suspend and unload w.r.t GuC this patch introduces new function i915_gem_fini which in addition to disabling GuC interfaces also disables GuC submission during which communication with GuC is needed for destroying doorbell. i915_gem_fini is copy of i915_gem_suspend with difference w.r.t GuC operations. To achieve this, new helpers i915_gem_context_suspend and i915_gem_suspend_complete are prepared. This patch updates the functions responsibilities as follows: 1. intel_uc_fini_hw: Disable all things that involve communication with GuC and internal operation of GuC firmware, currently submission. Post this disable all other state like guc_ggtt_invalidate, guc_communication and guc_interrupts. 2. intel_uc_cleanup: Free up all UC related memory and other data. v2: Prepared i915_gem_unload. (Michal) v3: Moved guc_free_load_err_log past i915.enable_guc_loading check in intel_uc_cleanup_hw. Prepared i915_gem_contexts_suspend and i915_gem_suspend_complete that are used in the two paths i915_gem_suspend and i915_gem_unload. Unload specific actions like disabling GuC submission and GuC communication are being done in i915_gem_unload. Commit message update. (Michał Winiarski) v4: prepared i915_gem_cleanup and intel_uc_cleanup. Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com> --- drivers/gpu/drm/i915/i915_drv.c | 12 +++++----- drivers/gpu/drm/i915/i915_drv.h | 1 + drivers/gpu/drm/i915/i915_gem.c | 52 ++++++++++++++++++++++++++++++++++++---- drivers/gpu/drm/i915/intel_guc.c | 41 +++++++++++++++++++++++++++++++ drivers/gpu/drm/i915/intel_guc.h | 3 +++ drivers/gpu/drm/i915/intel_uc.c | 39 +++++------------------------- drivers/gpu/drm/i915/intel_uc.h | 1 + 7 files changed, 105 insertions(+), 44 deletions(-)