Message ID | 20250325114306.3740022-2-maciej.falkowski@linux.intel.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | PM fixes in Metric Steamer code | expand |
On 3/25/25 04:43, Maciej Falkowski wrote: > From: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> > > Fix deadlock in ivpu_ms_cleanup() by preventing runtime resume after > file_priv->ms_lock is acquired. > > During a failure in runtime resume, a cold boot is executed, which > calls ivpu_ms_cleanup_all(). This function calls ivpu_ms_cleanup() > that acquires file_priv->ms_lock and causes the deadlock. > > Fixes: cdfad4db7756 ("accel/ivpu: Add NPU profiling support") > Cc: <stable@vger.kernel.org> # v6.11+ > Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> > Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com> > --- > drivers/accel/ivpu/ivpu_ms.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/drivers/accel/ivpu/ivpu_ms.c b/drivers/accel/ivpu/ivpu_ms.c > index ffe7b10f8a76..eb485cf15ad6 100644 > --- a/drivers/accel/ivpu/ivpu_ms.c > +++ b/drivers/accel/ivpu/ivpu_ms.c > @@ -4,6 +4,7 @@ > */ > > #include <drm/drm_file.h> > +#include <linux/pm_runtime.h> > > #include "ivpu_drv.h" > #include "ivpu_gem.h" > @@ -281,6 +282,9 @@ int ivpu_ms_get_info_ioctl(struct drm_device *dev, void *data, struct drm_file * > void ivpu_ms_cleanup(struct ivpu_file_priv *file_priv) > { > struct ivpu_ms_instance *ms, *tmp; > + struct ivpu_device *vdev = file_priv->vdev; > + > + pm_runtime_get_sync(vdev->drm.dev); Could get_sync() be failed here? Maybe it is better to add warning for failure? Lizhi > > mutex_lock(&file_priv->ms_lock); > > @@ -293,6 +297,8 @@ void ivpu_ms_cleanup(struct ivpu_file_priv *file_priv) > free_instance(file_priv, ms); > > mutex_unlock(&file_priv->ms_lock); > + > + pm_runtime_put_autosuspend(vdev->drm.dev); > } > > void ivpu_ms_cleanup_all(struct ivpu_device *vdev)
Hi, On 3/25/2025 9:50 PM, Lizhi Hou wrote: > > On 3/25/25 04:43, Maciej Falkowski wrote: >> From: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> >> >> Fix deadlock in ivpu_ms_cleanup() by preventing runtime resume after >> file_priv->ms_lock is acquired. >> >> During a failure in runtime resume, a cold boot is executed, which >> calls ivpu_ms_cleanup_all(). This function calls ivpu_ms_cleanup() >> that acquires file_priv->ms_lock and causes the deadlock. >> >> Fixes: cdfad4db7756 ("accel/ivpu: Add NPU profiling support") >> Cc: <stable@vger.kernel.org> # v6.11+ >> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> >> Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com> >> --- >> drivers/accel/ivpu/ivpu_ms.c | 6 ++++++ >> 1 file changed, 6 insertions(+) >> >> diff --git a/drivers/accel/ivpu/ivpu_ms.c b/drivers/accel/ivpu/ivpu_ms.c >> index ffe7b10f8a76..eb485cf15ad6 100644 >> --- a/drivers/accel/ivpu/ivpu_ms.c >> +++ b/drivers/accel/ivpu/ivpu_ms.c >> @@ -4,6 +4,7 @@ >> */ >> #include <drm/drm_file.h> >> +#include <linux/pm_runtime.h> >> #include "ivpu_drv.h" >> #include "ivpu_gem.h" >> @@ -281,6 +282,9 @@ int ivpu_ms_get_info_ioctl(struct drm_device *dev, void *data, struct drm_file * >> void ivpu_ms_cleanup(struct ivpu_file_priv *file_priv) >> { >> struct ivpu_ms_instance *ms, *tmp; >> + struct ivpu_device *vdev = file_priv->vdev; >> + >> + pm_runtime_get_sync(vdev->drm.dev); > > Could get_sync() be failed here? Maybe it is better to add warning for failure? Yes, this could fail but we already have detailed warnings in runtime resume callback (ivpu_pm_runtime_resume_cb()). > >> mutex_lock(&file_priv->ms_lock); >> @@ -293,6 +297,8 @@ void ivpu_ms_cleanup(struct ivpu_file_priv *file_priv) >> free_instance(file_priv, ms); >> mutex_unlock(&file_priv->ms_lock); >> + >> + pm_runtime_put_autosuspend(vdev->drm.dev); >> } >> void ivpu_ms_cleanup_all(struct ivpu_device *vdev) Regards, Jacek
On 3/26/25 01:06, Jacek Lawrynowicz wrote: > Hi, > > On 3/25/2025 9:50 PM, Lizhi Hou wrote: >> On 3/25/25 04:43, Maciej Falkowski wrote: >>> From: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> >>> >>> Fix deadlock in ivpu_ms_cleanup() by preventing runtime resume after >>> file_priv->ms_lock is acquired. >>> >>> During a failure in runtime resume, a cold boot is executed, which >>> calls ivpu_ms_cleanup_all(). This function calls ivpu_ms_cleanup() >>> that acquires file_priv->ms_lock and causes the deadlock. >>> >>> Fixes: cdfad4db7756 ("accel/ivpu: Add NPU profiling support") >>> Cc: <stable@vger.kernel.org> # v6.11+ >>> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> >>> Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com> >>> --- >>> drivers/accel/ivpu/ivpu_ms.c | 6 ++++++ >>> 1 file changed, 6 insertions(+) >>> >>> diff --git a/drivers/accel/ivpu/ivpu_ms.c b/drivers/accel/ivpu/ivpu_ms.c >>> index ffe7b10f8a76..eb485cf15ad6 100644 >>> --- a/drivers/accel/ivpu/ivpu_ms.c >>> +++ b/drivers/accel/ivpu/ivpu_ms.c >>> @@ -4,6 +4,7 @@ >>> */ >>> #include <drm/drm_file.h> >>> +#include <linux/pm_runtime.h> >>> #include "ivpu_drv.h" >>> #include "ivpu_gem.h" >>> @@ -281,6 +282,9 @@ int ivpu_ms_get_info_ioctl(struct drm_device *dev, void *data, struct drm_file * >>> void ivpu_ms_cleanup(struct ivpu_file_priv *file_priv) >>> { >>> struct ivpu_ms_instance *ms, *tmp; >>> + struct ivpu_device *vdev = file_priv->vdev; >>> + >>> + pm_runtime_get_sync(vdev->drm.dev); >> Could get_sync() be failed here? Maybe it is better to add warning for failure? > Yes, this could fail but we already have detailed warnings in runtime resume callback (ivpu_pm_runtime_resume_cb()). Will the deadlock still happens if this function fails? Lizhi >>> mutex_lock(&file_priv->ms_lock); >>> @@ -293,6 +297,8 @@ void ivpu_ms_cleanup(struct ivpu_file_priv *file_priv) >>> free_instance(file_priv, ms); >>> mutex_unlock(&file_priv->ms_lock); >>> + >>> + pm_runtime_put_autosuspend(vdev->drm.dev); >>> } >>> void ivpu_ms_cleanup_all(struct ivpu_device *vdev) > Regards, > Jacek >
On 3/27/2025 6:38 PM, Lizhi Hou wrote: > > On 3/26/25 01:06, Jacek Lawrynowicz wrote: >> Hi, >> >> On 3/25/2025 9:50 PM, Lizhi Hou wrote: >>> On 3/25/25 04:43, Maciej Falkowski wrote: >>>> From: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> >>>> >>>> Fix deadlock in ivpu_ms_cleanup() by preventing runtime resume after >>>> file_priv->ms_lock is acquired. >>>> >>>> During a failure in runtime resume, a cold boot is executed, which >>>> calls ivpu_ms_cleanup_all(). This function calls ivpu_ms_cleanup() >>>> that acquires file_priv->ms_lock and causes the deadlock. >>>> >>>> Fixes: cdfad4db7756 ("accel/ivpu: Add NPU profiling support") >>>> Cc: <stable@vger.kernel.org> # v6.11+ >>>> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> >>>> Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com> >>>> --- >>>> drivers/accel/ivpu/ivpu_ms.c | 6 ++++++ >>>> 1 file changed, 6 insertions(+) >>>> >>>> diff --git a/drivers/accel/ivpu/ivpu_ms.c b/drivers/accel/ivpu/ivpu_ms.c >>>> index ffe7b10f8a76..eb485cf15ad6 100644 >>>> --- a/drivers/accel/ivpu/ivpu_ms.c >>>> +++ b/drivers/accel/ivpu/ivpu_ms.c >>>> @@ -4,6 +4,7 @@ >>>> */ >>>> #include <drm/drm_file.h> >>>> +#include <linux/pm_runtime.h> >>>> #include "ivpu_drv.h" >>>> #include "ivpu_gem.h" >>>> @@ -281,6 +282,9 @@ int ivpu_ms_get_info_ioctl(struct drm_device *dev, void *data, struct drm_file * >>>> void ivpu_ms_cleanup(struct ivpu_file_priv *file_priv) >>>> { >>>> struct ivpu_ms_instance *ms, *tmp; >>>> + struct ivpu_device *vdev = file_priv->vdev; >>>> + >>>> + pm_runtime_get_sync(vdev->drm.dev); >>> Could get_sync() be failed here? Maybe it is better to add warning for failure? >> Yes, this could fail but we already have detailed warnings in runtime resume callback (ivpu_pm_runtime_resume_cb()). > > Will the deadlock still happens if this function fails? No. The deadlock was caused by runtime resume in free_instance(). pm_runtime_get_sync() will always bump PM usage counter, so there will be no resume regardless if it fails or not. >>>> mutex_lock(&file_priv->ms_lock); >>>> @@ -293,6 +297,8 @@ void ivpu_ms_cleanup(struct ivpu_file_priv *file_priv) >>>> free_instance(file_priv, ms); >>>> mutex_unlock(&file_priv->ms_lock); >>>> + >>>> + pm_runtime_put_autosuspend(vdev->drm.dev); >>>> } >>>> void ivpu_ms_cleanup_all(struct ivpu_device *vdev) >> Regards, >> Jacek >>
On 3/28/25 01:23, Jacek Lawrynowicz wrote: > > On 3/27/2025 6:38 PM, Lizhi Hou wrote: >> On 3/26/25 01:06, Jacek Lawrynowicz wrote: >>> Hi, >>> >>> On 3/25/2025 9:50 PM, Lizhi Hou wrote: >>>> On 3/25/25 04:43, Maciej Falkowski wrote: >>>>> From: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> >>>>> >>>>> Fix deadlock in ivpu_ms_cleanup() by preventing runtime resume after >>>>> file_priv->ms_lock is acquired. >>>>> >>>>> During a failure in runtime resume, a cold boot is executed, which >>>>> calls ivpu_ms_cleanup_all(). This function calls ivpu_ms_cleanup() >>>>> that acquires file_priv->ms_lock and causes the deadlock. >>>>> >>>>> Fixes: cdfad4db7756 ("accel/ivpu: Add NPU profiling support") >>>>> Cc: <stable@vger.kernel.org> # v6.11+ >>>>> Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> >>>>> Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com> >>>>> --- >>>>> drivers/accel/ivpu/ivpu_ms.c | 6 ++++++ >>>>> 1 file changed, 6 insertions(+) >>>>> >>>>> diff --git a/drivers/accel/ivpu/ivpu_ms.c b/drivers/accel/ivpu/ivpu_ms.c >>>>> index ffe7b10f8a76..eb485cf15ad6 100644 >>>>> --- a/drivers/accel/ivpu/ivpu_ms.c >>>>> +++ b/drivers/accel/ivpu/ivpu_ms.c >>>>> @@ -4,6 +4,7 @@ >>>>> */ >>>>> #include <drm/drm_file.h> >>>>> +#include <linux/pm_runtime.h> >>>>> #include "ivpu_drv.h" >>>>> #include "ivpu_gem.h" >>>>> @@ -281,6 +282,9 @@ int ivpu_ms_get_info_ioctl(struct drm_device *dev, void *data, struct drm_file * >>>>> void ivpu_ms_cleanup(struct ivpu_file_priv *file_priv) >>>>> { >>>>> struct ivpu_ms_instance *ms, *tmp; >>>>> + struct ivpu_device *vdev = file_priv->vdev; >>>>> + >>>>> + pm_runtime_get_sync(vdev->drm.dev); >>>> Could get_sync() be failed here? Maybe it is better to add warning for failure? >>> Yes, this could fail but we already have detailed warnings in runtime resume callback (ivpu_pm_runtime_resume_cb()). >> Will the deadlock still happens if this function fails? > No. The deadlock was caused by runtime resume in free_instance(). > pm_runtime_get_sync() will always bump PM usage counter, so there will be no resume regardless if it fails or not. Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> >>>>> mutex_lock(&file_priv->ms_lock); >>>>> @@ -293,6 +297,8 @@ void ivpu_ms_cleanup(struct ivpu_file_priv *file_priv) >>>>> free_instance(file_priv, ms); >>>>> mutex_unlock(&file_priv->ms_lock); >>>>> + >>>>> + pm_runtime_put_autosuspend(vdev->drm.dev); >>>>> } >>>>> void ivpu_ms_cleanup_all(struct ivpu_device *vdev) >>> Regards, >>> Jacek >>>
diff --git a/drivers/accel/ivpu/ivpu_ms.c b/drivers/accel/ivpu/ivpu_ms.c index ffe7b10f8a76..eb485cf15ad6 100644 --- a/drivers/accel/ivpu/ivpu_ms.c +++ b/drivers/accel/ivpu/ivpu_ms.c @@ -4,6 +4,7 @@ */ #include <drm/drm_file.h> +#include <linux/pm_runtime.h> #include "ivpu_drv.h" #include "ivpu_gem.h" @@ -281,6 +282,9 @@ int ivpu_ms_get_info_ioctl(struct drm_device *dev, void *data, struct drm_file * void ivpu_ms_cleanup(struct ivpu_file_priv *file_priv) { struct ivpu_ms_instance *ms, *tmp; + struct ivpu_device *vdev = file_priv->vdev; + + pm_runtime_get_sync(vdev->drm.dev); mutex_lock(&file_priv->ms_lock); @@ -293,6 +297,8 @@ void ivpu_ms_cleanup(struct ivpu_file_priv *file_priv) free_instance(file_priv, ms); mutex_unlock(&file_priv->ms_lock); + + pm_runtime_put_autosuspend(vdev->drm.dev); } void ivpu_ms_cleanup_all(struct ivpu_device *vdev)