Message ID | 20241113154257.1971284-3-boris.brezillon@collabora.com (mailing list archive) |
---|---|
State | New, archived |
Series | drm/panthor: Be robust against failures in the resume path |
On Wed, 13 Nov 2024 16:42:54 +0100
Boris Brezillon <boris.brezillon@collabora.com> wrote:

> @@ -74,7 +76,8 @@ void panthor_device_unplug(struct panthor_device *ptdev)
>  	 */
>  	mutex_unlock(&ptdev->unplug.lock);
>
> -	drm_WARN_ON(&ptdev->base, pm_runtime_get_sync(ptdev->base.dev) < 0);
> +	ret = pm_runtime_get_sync(ptdev->base.dev);
> +	drm_WARN_ON(&ptdev->base, ret < 0);
>
>  	/* Now, try to cleanly shutdown the GPU before the device resources
>  	 * get reclaimed.
> @@ -85,7 +88,10 @@ void panthor_device_unplug(struct panthor_device *ptdev)
>  	panthor_gpu_unplug(ptdev);
>
>  	pm_runtime_dont_use_autosuspend(ptdev->base.dev);
> -	pm_runtime_put_sync_suspend(ptdev->base.dev);
> +
> +	/* If the resume failed, we don't need to suspend here. */
> +	if (!ret)
> +		pm_runtime_put_sync_suspend(ptdev->base.dev);

Okay, I always get confused by pm_runtime_get_sync(). Turns out the
refcount is incremented even if pm_runtime_get_sync() fails, so we
should call pm_runtime_put_sync_suspend() unconditionally here.
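The refcount remark above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct model_dev`, `model_get_sync()` and `model_put_sync_suspend()` are hypothetical stand-ins, not kernel APIs, and they model just the one property discussed here, namely that pm_runtime_get_sync() leaves the usage counter incremented even when the resume fails.

```c
#include <stdbool.h>

/* Hypothetical userspace model of a device's runtime PM bookkeeping. */
struct model_dev {
	int usage_count;   /* models the runtime PM usage counter */
	bool resume_fails; /* when true, the modelled resume reports an error */
	bool active;       /* models the "device is resumed" state */
};

/*
 * Models pm_runtime_get_sync(): the counter is bumped before the resume
 * is attempted, and it is NOT dropped again when the resume fails.
 */
static int model_get_sync(struct model_dev *dev)
{
	dev->usage_count++;
	if (dev->resume_fails)
		return -5; /* stands in for a negative errno */
	dev->active = true;
	return 0;
}

/* Models pm_runtime_put_sync_suspend(): drop one reference, suspend at zero. */
static void model_put_sync_suspend(struct model_dev *dev)
{
	if (--dev->usage_count == 0)
		dev->active = false;
}

/* Unplug as in the posted hunk: the put is skipped when the resume failed. */
static int unplug_conditional_put(struct model_dev *dev)
{
	int ret = model_get_sync(dev);

	if (!ret)
		model_put_sync_suspend(dev);
	return dev->usage_count; /* references still held after unplug */
}

/* Unplug as suggested in the reply: the put always balances the get. */
static int unplug_unconditional_put(struct model_dev *dev)
{
	model_get_sync(dev);
	model_put_sync_suspend(dev);
	return dev->usage_count; /* references still held after unplug */
}
```

In this model, a device with `resume_fails` set comes out of `unplug_conditional_put()` with one reference still held, while `unplug_unconditional_put()` brings the counter back to zero in both the success and the failure case.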
On Wed, 13 Nov 2024 16:42:54 +0100
Boris Brezillon <boris.brezillon@collabora.com> wrote:

> @@ -541,17 +547,4 @@ int panthor_device_suspend(struct device *dev)
>  	clk_disable_unprepare(ptdev->clks.core);
>  	atomic_set(&ptdev->pm.state, PANTHOR_DEVICE_PM_STATE_SUSPENDED);
>  	return 0;
> -
> -err_set_active:
> -	/* If something failed and we have to revert back to an
> -	 * active state, we also need to clear the MMIO userspace
> -	 * mappings, so any dumb pages that were mapped while we
> -	 * were trying to suspend gets invalidated.
> -	 */
> -	mutex_lock(&ptdev->pm.mmio_lock);
> -	atomic_set(&ptdev->pm.state, PANTHOR_DEVICE_PM_STATE_ACTIVE);
> -	unmap_mapping_range(ptdev->base.anon_inode->i_mapping,
> -			    DRM_PANTHOR_USER_MMIO_OFFSET, 0, 1);
> -	mutex_unlock(&ptdev->pm.mmio_lock);
> -	return ret;
>  }

This chunk should be in patch 3.
On Wed, Nov 13, 2024 at 04:42:54PM +0100, Boris Brezillon wrote:
> The runtime PM resume operation is not guaranteed to succeed, but if it
> fails, the device should be in a suspended state. Make sure we're robust
> to resume failures in the unplug path.
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> ---
>  drivers/gpu/drm/panthor/panthor_device.c | 23 ++++++++---------------
>  drivers/gpu/drm/panthor/panthor_fw.c     | 14 +++++++++-----
>  drivers/gpu/drm/panthor/panthor_gpu.c    |  3 ++-
>  drivers/gpu/drm/panthor/panthor_mmu.c    |  3 ++-
>  4 files changed, 21 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
> index db7ba40f771d..8b5d54b2bbb4 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.c
> +++ b/drivers/gpu/drm/panthor/panthor_device.c
> @@ -48,6 +48,8 @@ static int panthor_clk_init(struct panthor_device *ptdev)
>
>  void panthor_device_unplug(struct panthor_device *ptdev)
>  {
> +	int ret;
> +
>  	/* This function can be called from two different path: the reset work
>  	 * and the platform device remove callback. drm_dev_unplug() doesn't
>  	 * deal with concurrent callers, so we have to protect drm_dev_unplug()
> @@ -74,7 +76,8 @@ void panthor_device_unplug(struct panthor_device *ptdev)
>  	 */
>  	mutex_unlock(&ptdev->unplug.lock);
>
> -	drm_WARN_ON(&ptdev->base, pm_runtime_get_sync(ptdev->base.dev) < 0);
> +	ret = pm_runtime_get_sync(ptdev->base.dev);
> +	drm_WARN_ON(&ptdev->base, ret < 0);
>
>  	/* Now, try to cleanly shutdown the GPU before the device resources
>  	 * get reclaimed.
> @@ -85,7 +88,10 @@ void panthor_device_unplug(struct panthor_device *ptdev)
>  	panthor_gpu_unplug(ptdev);
>
>  	pm_runtime_dont_use_autosuspend(ptdev->base.dev);
> -	pm_runtime_put_sync_suspend(ptdev->base.dev);
> +
> +	/* If the resume failed, we don't need to suspend here. */
> +	if (!ret)
> +		pm_runtime_put_sync_suspend(ptdev->base.dev);
>
>  	/* If PM is disabled, we need to call the suspend handler manually. */
>  	if (!IS_ENABLED(CONFIG_PM))
> @@ -541,17 +547,4 @@ int panthor_device_suspend(struct device *dev)
>  	clk_disable_unprepare(ptdev->clks.core);
>  	atomic_set(&ptdev->pm.state, PANTHOR_DEVICE_PM_STATE_SUSPENDED);
>  	return 0;
> -
> -err_set_active:

Label is being deleted here, but I don't see where the
`goto err_set_active` is being removed.

Best regards,
Liviu

> -	/* If something failed and we have to revert back to an
> -	 * active state, we also need to clear the MMIO userspace
> -	 * mappings, so any dumb pages that were mapped while we
> -	 * were trying to suspend gets invalidated.
> -	 */
> -	mutex_lock(&ptdev->pm.mmio_lock);
> -	atomic_set(&ptdev->pm.state, PANTHOR_DEVICE_PM_STATE_ACTIVE);
> -	unmap_mapping_range(ptdev->base.anon_inode->i_mapping,
> -			    DRM_PANTHOR_USER_MMIO_OFFSET, 0, 1);
> -	mutex_unlock(&ptdev->pm.mmio_lock);
> -	return ret;
>  }
> diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
> index 2060085cc9f3..df74750cf1b7 100644
> --- a/drivers/gpu/drm/panthor/panthor_fw.c
> +++ b/drivers/gpu/drm/panthor/panthor_fw.c
> @@ -12,6 +12,7 @@
>  #include <linux/iosys-map.h>
>  #include <linux/mutex.h>
>  #include <linux/platform_device.h>
> +#include <linux/pm_runtime.h>
>
>  #include <drm/drm_drv.h>
>  #include <drm/drm_managed.h>
> @@ -1188,11 +1189,13 @@ void panthor_fw_unplug(struct panthor_device *ptdev)
>
>  	cancel_delayed_work_sync(&ptdev->fw->watchdog.ping_work);
>
> -	/* Make sure the IRQ handler can be called after that point. */
> -	if (ptdev->fw->irq.irq)
> -		panthor_job_irq_suspend(&ptdev->fw->irq);
> +	if (!IS_ENABLED(CONFIG_PM) || pm_runtime_active(ptdev->base.dev)) {
> +		/* Make sure the IRQ handler can be called after that point. */
> +		if (ptdev->fw->irq.irq)
> +			panthor_job_irq_suspend(&ptdev->fw->irq);
>
> -	panthor_fw_stop(ptdev);
> +		panthor_fw_stop(ptdev);
> +	}
>
>  	list_for_each_entry(section, &ptdev->fw->sections, node)
>  		panthor_kernel_bo_destroy(section->mem);
> @@ -1205,7 +1208,8 @@ void panthor_fw_unplug(struct panthor_device *ptdev)
>  	panthor_vm_put(ptdev->fw->vm);
>  	ptdev->fw->vm = NULL;
>
> -	panthor_gpu_power_off(ptdev, L2, ptdev->gpu_info.l2_present, 20000);
> +	if (!IS_ENABLED(CONFIG_PM) || pm_runtime_active(ptdev->base.dev))
> +		panthor_gpu_power_off(ptdev, L2, ptdev->gpu_info.l2_present, 20000);
>  }
>
>  /**
> diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c
> index 2d3529a0b156..27702bc62dd6 100644
> --- a/drivers/gpu/drm/panthor/panthor_gpu.c
> +++ b/drivers/gpu/drm/panthor/panthor_gpu.c
> @@ -174,7 +174,8 @@ void panthor_gpu_unplug(struct panthor_device *ptdev)
>  	unsigned long flags;
>
>  	/* Make sure the IRQ handler is not running after that point. */
> -	panthor_gpu_irq_suspend(&ptdev->gpu->irq);
> +	if (!IS_ENABLED(CONFIG_PM) || pm_runtime_active(ptdev->base.dev))
> +		panthor_gpu_irq_suspend(&ptdev->gpu->irq);
>
>  	/* Wake-up all waiters. */
>  	spin_lock_irqsave(&ptdev->gpu->reqs_lock, flags);
> diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
> index c2262c72e8e2..fee6c7d9fe0a 100644
> --- a/drivers/gpu/drm/panthor/panthor_mmu.c
> +++ b/drivers/gpu/drm/panthor/panthor_mmu.c
> @@ -2674,7 +2674,8 @@ int panthor_vm_prepare_mapped_bos_resvs(struct drm_exec *exec, struct panthor_vm
>   */
>  void panthor_mmu_unplug(struct panthor_device *ptdev)
>  {
> -	panthor_mmu_irq_suspend(&ptdev->mmu->irq);
> +	if (!IS_ENABLED(CONFIG_PM) || pm_runtime_active(ptdev->base.dev))
> +		panthor_mmu_irq_suspend(&ptdev->mmu->irq);
>
>  	mutex_lock(&ptdev->mmu->as.slots_lock);
>  	for (u32 i = 0; i < ARRAY_SIZE(ptdev->mmu->as.slots); i++) {
> --
> 2.46.2
>
On Thu, 14 Nov 2024 11:13:29 +0000
Liviu Dudau <liviu.dudau@arm.com> wrote:

> On Wed, Nov 13, 2024 at 04:42:54PM +0100, Boris Brezillon wrote:
> > The runtime PM resume operation is not guaranteed to succeed, but if it
> > fails, the device should be in a suspended state. Make sure we're robust
> > to resume failures in the unplug path.
> >
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > ---
> >  drivers/gpu/drm/panthor/panthor_device.c | 23 ++++++++---------------
> >  drivers/gpu/drm/panthor/panthor_fw.c     | 14 +++++++++-----
> >  drivers/gpu/drm/panthor/panthor_gpu.c    |  3 ++-
> >  drivers/gpu/drm/panthor/panthor_mmu.c    |  3 ++-
> >  4 files changed, 21 insertions(+), 22 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
> > index db7ba40f771d..8b5d54b2bbb4 100644
> > --- a/drivers/gpu/drm/panthor/panthor_device.c
> > +++ b/drivers/gpu/drm/panthor/panthor_device.c
> > @@ -48,6 +48,8 @@ static int panthor_clk_init(struct panthor_device *ptdev)
> >
> >  void panthor_device_unplug(struct panthor_device *ptdev)
> >  {
> > +	int ret;
> > +
> >  	/* This function can be called from two different path: the reset work
> >  	 * and the platform device remove callback. drm_dev_unplug() doesn't
> >  	 * deal with concurrent callers, so we have to protect drm_dev_unplug()
> > @@ -74,7 +76,8 @@ void panthor_device_unplug(struct panthor_device *ptdev)
> >  	 */
> >  	mutex_unlock(&ptdev->unplug.lock);
> >
> > -	drm_WARN_ON(&ptdev->base, pm_runtime_get_sync(ptdev->base.dev) < 0);
> > +	ret = pm_runtime_get_sync(ptdev->base.dev);
> > +	drm_WARN_ON(&ptdev->base, ret < 0);
> >
> >  	/* Now, try to cleanly shutdown the GPU before the device resources
> >  	 * get reclaimed.
> > @@ -85,7 +88,10 @@ void panthor_device_unplug(struct panthor_device *ptdev)
> >  	panthor_gpu_unplug(ptdev);
> >
> >  	pm_runtime_dont_use_autosuspend(ptdev->base.dev);
> > -	pm_runtime_put_sync_suspend(ptdev->base.dev);
> > +
> > +	/* If the resume failed, we don't need to suspend here. */
> > +	if (!ret)
> > +		pm_runtime_put_sync_suspend(ptdev->base.dev);
> >
> >  	/* If PM is disabled, we need to call the suspend handler manually. */
> >  	if (!IS_ENABLED(CONFIG_PM))
> > @@ -541,17 +547,4 @@ int panthor_device_suspend(struct device *dev)
> >  	clk_disable_unprepare(ptdev->clks.core);
> >  	atomic_set(&ptdev->pm.state, PANTHOR_DEVICE_PM_STATE_SUSPENDED);
> >  	return 0;
> > -
> > -err_set_active:
>
> Label is being deleted here, but I don't see where the
> `goto err_set_active` is being removed.

Yes, as mentioned in my own reply, this chunk should be in patch 3.
On Thu, Nov 14, 2024 at 12:27:55PM +0100, Boris Brezillon wrote:
> On Thu, 14 Nov 2024 11:13:29 +0000
> Liviu Dudau <liviu.dudau@arm.com> wrote:
>
> > On Wed, Nov 13, 2024 at 04:42:54PM +0100, Boris Brezillon wrote:
> > > The runtime PM resume operation is not guaranteed to succeed, but if it
> > > fails, the device should be in a suspended state. Make sure we're robust
> > > to resume failures in the unplug path.
> > >
> > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > > ---
> > >  drivers/gpu/drm/panthor/panthor_device.c | 23 ++++++++---------------
> > >  drivers/gpu/drm/panthor/panthor_fw.c     | 14 +++++++++-----
> > >  drivers/gpu/drm/panthor/panthor_gpu.c    |  3 ++-
> > >  drivers/gpu/drm/panthor/panthor_mmu.c    |  3 ++-
> > >  4 files changed, 21 insertions(+), 22 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
> > > index db7ba40f771d..8b5d54b2bbb4 100644
> > > --- a/drivers/gpu/drm/panthor/panthor_device.c
> > > +++ b/drivers/gpu/drm/panthor/panthor_device.c
> > > @@ -48,6 +48,8 @@ static int panthor_clk_init(struct panthor_device *ptdev)
> > >
> > >  void panthor_device_unplug(struct panthor_device *ptdev)
> > >  {
> > > +	int ret;
> > > +
> > >  	/* This function can be called from two different path: the reset work
> > >  	 * and the platform device remove callback. drm_dev_unplug() doesn't
> > >  	 * deal with concurrent callers, so we have to protect drm_dev_unplug()
> > > @@ -74,7 +76,8 @@ void panthor_device_unplug(struct panthor_device *ptdev)
> > >  	 */
> > >  	mutex_unlock(&ptdev->unplug.lock);
> > >
> > > -	drm_WARN_ON(&ptdev->base, pm_runtime_get_sync(ptdev->base.dev) < 0);
> > > +	ret = pm_runtime_get_sync(ptdev->base.dev);
> > > +	drm_WARN_ON(&ptdev->base, ret < 0);
> > >
> > >  	/* Now, try to cleanly shutdown the GPU before the device resources
> > >  	 * get reclaimed.
> > > @@ -85,7 +88,10 @@ void panthor_device_unplug(struct panthor_device *ptdev)
> > >  	panthor_gpu_unplug(ptdev);
> > >
> > >  	pm_runtime_dont_use_autosuspend(ptdev->base.dev);
> > > -	pm_runtime_put_sync_suspend(ptdev->base.dev);
> > > +
> > > +	/* If the resume failed, we don't need to suspend here. */
> > > +	if (!ret)
> > > +		pm_runtime_put_sync_suspend(ptdev->base.dev);
> > >
> > >  	/* If PM is disabled, we need to call the suspend handler manually. */
> > >  	if (!IS_ENABLED(CONFIG_PM))
> > > @@ -541,17 +547,4 @@ int panthor_device_suspend(struct device *dev)
> > >  	clk_disable_unprepare(ptdev->clks.core);
> > >  	atomic_set(&ptdev->pm.state, PANTHOR_DEVICE_PM_STATE_SUSPENDED);
> > >  	return 0;
> > > -
> > > -err_set_active:
> >
> > Label is being deleted here, but I don't see where the
> > `goto err_set_active` is being removed.
>
> Yes, as mentioned in my own reply, this chunk should be in patch 3.

Sorry, I did look at patch 3 before sending the email but somehow missed
the line with the goto being deleted.

With the change: Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>

Best regards,
Liviu
diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
index db7ba40f771d..8b5d54b2bbb4 100644
--- a/drivers/gpu/drm/panthor/panthor_device.c
+++ b/drivers/gpu/drm/panthor/panthor_device.c
@@ -48,6 +48,8 @@ static int panthor_clk_init(struct panthor_device *ptdev)
 
 void panthor_device_unplug(struct panthor_device *ptdev)
 {
+	int ret;
+
 	/* This function can be called from two different path: the reset work
 	 * and the platform device remove callback. drm_dev_unplug() doesn't
 	 * deal with concurrent callers, so we have to protect drm_dev_unplug()
@@ -74,7 +76,8 @@ void panthor_device_unplug(struct panthor_device *ptdev)
 	 */
 	mutex_unlock(&ptdev->unplug.lock);
 
-	drm_WARN_ON(&ptdev->base, pm_runtime_get_sync(ptdev->base.dev) < 0);
+	ret = pm_runtime_get_sync(ptdev->base.dev);
+	drm_WARN_ON(&ptdev->base, ret < 0);
 
 	/* Now, try to cleanly shutdown the GPU before the device resources
 	 * get reclaimed.
@@ -85,7 +88,10 @@ void panthor_device_unplug(struct panthor_device *ptdev)
 	panthor_gpu_unplug(ptdev);
 
 	pm_runtime_dont_use_autosuspend(ptdev->base.dev);
-	pm_runtime_put_sync_suspend(ptdev->base.dev);
+
+	/* If the resume failed, we don't need to suspend here. */
+	if (!ret)
+		pm_runtime_put_sync_suspend(ptdev->base.dev);
 
 	/* If PM is disabled, we need to call the suspend handler manually. */
 	if (!IS_ENABLED(CONFIG_PM))
@@ -541,17 +547,4 @@ int panthor_device_suspend(struct device *dev)
 	clk_disable_unprepare(ptdev->clks.core);
 	atomic_set(&ptdev->pm.state, PANTHOR_DEVICE_PM_STATE_SUSPENDED);
 	return 0;
-
-err_set_active:
-	/* If something failed and we have to revert back to an
-	 * active state, we also need to clear the MMIO userspace
-	 * mappings, so any dumb pages that were mapped while we
-	 * were trying to suspend gets invalidated.
-	 */
-	mutex_lock(&ptdev->pm.mmio_lock);
-	atomic_set(&ptdev->pm.state, PANTHOR_DEVICE_PM_STATE_ACTIVE);
-	unmap_mapping_range(ptdev->base.anon_inode->i_mapping,
-			    DRM_PANTHOR_USER_MMIO_OFFSET, 0, 1);
-	mutex_unlock(&ptdev->pm.mmio_lock);
-	return ret;
 }
diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
index 2060085cc9f3..df74750cf1b7 100644
--- a/drivers/gpu/drm/panthor/panthor_fw.c
+++ b/drivers/gpu/drm/panthor/panthor_fw.c
@@ -12,6 +12,7 @@
 #include <linux/iosys-map.h>
 #include <linux/mutex.h>
 #include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
 
 #include <drm/drm_drv.h>
 #include <drm/drm_managed.h>
@@ -1188,11 +1189,13 @@ void panthor_fw_unplug(struct panthor_device *ptdev)
 
 	cancel_delayed_work_sync(&ptdev->fw->watchdog.ping_work);
 
-	/* Make sure the IRQ handler can be called after that point. */
-	if (ptdev->fw->irq.irq)
-		panthor_job_irq_suspend(&ptdev->fw->irq);
+	if (!IS_ENABLED(CONFIG_PM) || pm_runtime_active(ptdev->base.dev)) {
+		/* Make sure the IRQ handler can be called after that point. */
+		if (ptdev->fw->irq.irq)
+			panthor_job_irq_suspend(&ptdev->fw->irq);
 
-	panthor_fw_stop(ptdev);
+		panthor_fw_stop(ptdev);
+	}
 
 	list_for_each_entry(section, &ptdev->fw->sections, node)
 		panthor_kernel_bo_destroy(section->mem);
@@ -1205,7 +1208,8 @@ void panthor_fw_unplug(struct panthor_device *ptdev)
 	panthor_vm_put(ptdev->fw->vm);
 	ptdev->fw->vm = NULL;
 
-	panthor_gpu_power_off(ptdev, L2, ptdev->gpu_info.l2_present, 20000);
+	if (!IS_ENABLED(CONFIG_PM) || pm_runtime_active(ptdev->base.dev))
+		panthor_gpu_power_off(ptdev, L2, ptdev->gpu_info.l2_present, 20000);
 }
 
 /**
diff --git a/drivers/gpu/drm/panthor/panthor_gpu.c b/drivers/gpu/drm/panthor/panthor_gpu.c
index 2d3529a0b156..27702bc62dd6 100644
--- a/drivers/gpu/drm/panthor/panthor_gpu.c
+++ b/drivers/gpu/drm/panthor/panthor_gpu.c
@@ -174,7 +174,8 @@ void panthor_gpu_unplug(struct panthor_device *ptdev)
 	unsigned long flags;
 
 	/* Make sure the IRQ handler is not running after that point. */
-	panthor_gpu_irq_suspend(&ptdev->gpu->irq);
+	if (!IS_ENABLED(CONFIG_PM) || pm_runtime_active(ptdev->base.dev))
+		panthor_gpu_irq_suspend(&ptdev->gpu->irq);
 
 	/* Wake-up all waiters. */
 	spin_lock_irqsave(&ptdev->gpu->reqs_lock, flags);
diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c b/drivers/gpu/drm/panthor/panthor_mmu.c
index c2262c72e8e2..fee6c7d9fe0a 100644
--- a/drivers/gpu/drm/panthor/panthor_mmu.c
+++ b/drivers/gpu/drm/panthor/panthor_mmu.c
@@ -2674,7 +2674,8 @@ int panthor_vm_prepare_mapped_bos_resvs(struct drm_exec *exec, struct panthor_vm
  */
 void panthor_mmu_unplug(struct panthor_device *ptdev)
 {
-	panthor_mmu_irq_suspend(&ptdev->mmu->irq);
+	if (!IS_ENABLED(CONFIG_PM) || pm_runtime_active(ptdev->base.dev))
+		panthor_mmu_irq_suspend(&ptdev->mmu->irq);
 
 	mutex_lock(&ptdev->mmu->as.slots_lock);
 	for (u32 i = 0; i < ARRAY_SIZE(ptdev->mmu->as.slots); i++) {
The runtime PM resume operation is not guaranteed to succeed, but if it
fails, the device should be in a suspended state. Make sure we're robust
to resume failures in the unplug path.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panthor/panthor_device.c | 23 ++++++++---------------
 drivers/gpu/drm/panthor/panthor_fw.c     | 14 +++++++++-----
 drivers/gpu/drm/panthor/panthor_gpu.c    |  3 ++-
 drivers/gpu/drm/panthor/panthor_mmu.c    |  3 ++-
 4 files changed, 21 insertions(+), 22 deletions(-)
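The guard this patch adds to the unplug helpers (`!IS_ENABLED(CONFIG_PM) || pm_runtime_active(...)`) can be sketched in userspace as below. This is an illustrative model only, not driver code: `struct model_gpu`, `model_hw_accessible()` and `model_unplug()` are hypothetical names, and the two booleans merely stand in for the compile-time `IS_ENABLED(CONFIG_PM)` check and the runtime `pm_runtime_active()` query.

```c
#include <stdbool.h>

/* Hypothetical model of the state the unplug helpers look at. */
struct model_gpu {
	bool pm_enabled;  /* stands in for IS_ENABLED(CONFIG_PM) */
	bool rpm_active;  /* stands in for pm_runtime_active(dev) */
	int hw_accesses;  /* counts modelled register accesses */
	int sw_teardowns; /* counts modelled software-only cleanup steps */
};

/*
 * Mirrors the condition used in the patch: hardware is touched when PM
 * support is compiled out, or when runtime PM reports the device active.
 */
static bool model_hw_accessible(const struct model_gpu *gpu)
{
	return !gpu->pm_enabled || gpu->rpm_active;
}

/*
 * Models one unplug helper: the hardware step (e.g. suspending an IRQ or
 * powering off the L2) is skipped after a failed resume, while the
 * software cleanup (e.g. releasing buffers, waking waiters) always runs.
 */
static void model_unplug(struct model_gpu *gpu)
{
	if (model_hw_accessible(gpu))
		gpu->hw_accesses++;
	gpu->sw_teardowns++;
}
```

The point of the split is that a device left suspended by a failed resume still gets its software state torn down, but no register access is attempted on it.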