Message ID | 20230112-acpi_nfit_lockdep-v1-1-660be4dd10be@intel.com (mailing list archive) |
---|---|
State | Accepted |
Commit | fb6df4366f86dd252bfa3049edffa52d17e7b895 |
Headers | show |
Series | ACPI: NFIT: fix a potential deadlock during NFIT teardown | expand |
Vishal Verma wrote: > Lockdep reports that acpi_nfit_shutdown() may deadlock against an > opportune acpi_nfit_scrub(). acpi_nfit_scrub () is run from inside a > 'work' and therefore has already acquired workqueue-internal locks. It > also acquiires acpi_desc->init_mutex. acpi_nfit_shutdown() first > acquires init_mutex, and was subsequently attempting to cancel any > pending workqueue items. This reversed locking order causes a potential > deadlock: > > ====================================================== > WARNING: possible circular locking dependency detected > 6.2.0-rc3 #116 Tainted: G O N > ------------------------------------------------------ > libndctl/1958 is trying to acquire lock: > ffff888129b461c0 ((work_completion)(&(&acpi_desc->dwork)->work)){+.+.}-{0:0}, at: __flush_work+0x43/0x450 > > but task is already holding lock: > ffff888129b460e8 (&acpi_desc->init_mutex){+.+.}-{3:3}, at: acpi_nfit_shutdown+0x87/0xd0 [nfit] > > which lock already depends on the new lock. > > ... > > Possible unsafe locking scenario: > > CPU0 CPU1 > ---- ---- > lock(&acpi_desc->init_mutex); > lock((work_completion)(&(&acpi_desc->dwork)->work)); > lock(&acpi_desc->init_mutex); > lock((work_completion)(&(&acpi_desc->dwork)->work)); > > *** DEADLOCK *** > > Since the workqueue manipulation is protected by its own internal locking, > the cancellation of pending work doesn't need to be done under > acpi_desc->init_mutex. Move cancel_delayed_work_sync() outside the > init_mutex to fix the deadlock. Any work that starts after > acpi_nfit_shutdown() drops the lock will see ARS_CANCEL, and the > cancel_delayed_work_sync() will safely flush it out. > > Reported-by: Dan Williams <dan.j.williams@intel.com> > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> > --- > drivers/acpi/nfit/core.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c > index f1cc5ec6a3b6..4e48d6db05eb 100644 > --- a/drivers/acpi/nfit/core.c > +++ b/drivers/acpi/nfit/core.c > @@ -3297,8 +3297,8 @@ void acpi_nfit_shutdown(void *data) > > mutex_lock(&acpi_desc->init_mutex); > set_bit(ARS_CANCEL, &acpi_desc->scrub_flags); > - cancel_delayed_work_sync(&acpi_desc->dwork); > mutex_unlock(&acpi_desc->init_mutex); > + cancel_delayed_work_sync(&acpi_desc->dwork); Looks good, applied.
diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c index f1cc5ec6a3b6..4e48d6db05eb 100644 --- a/drivers/acpi/nfit/core.c +++ b/drivers/acpi/nfit/core.c @@ -3297,8 +3297,8 @@ void acpi_nfit_shutdown(void *data) mutex_lock(&acpi_desc->init_mutex); set_bit(ARS_CANCEL, &acpi_desc->scrub_flags); - cancel_delayed_work_sync(&acpi_desc->dwork); mutex_unlock(&acpi_desc->init_mutex); + cancel_delayed_work_sync(&acpi_desc->dwork); /* * Bounce the nvdimm bus lock to make sure any in-flight
Lockdep reports that acpi_nfit_shutdown() may deadlock against an opportune acpi_nfit_scrub(). acpi_nfit_scrub () is run from inside a 'work' and therefore has already acquired workqueue-internal locks. It also acquiires acpi_desc->init_mutex. acpi_nfit_shutdown() first acquires init_mutex, and was subsequently attempting to cancel any pending workqueue items. This reversed locking order causes a potential deadlock: ====================================================== WARNING: possible circular locking dependency detected 6.2.0-rc3 #116 Tainted: G O N ------------------------------------------------------ libndctl/1958 is trying to acquire lock: ffff888129b461c0 ((work_completion)(&(&acpi_desc->dwork)->work)){+.+.}-{0:0}, at: __flush_work+0x43/0x450 but task is already holding lock: ffff888129b460e8 (&acpi_desc->init_mutex){+.+.}-{3:3}, at: acpi_nfit_shutdown+0x87/0xd0 [nfit] which lock already depends on the new lock. ... Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&acpi_desc->init_mutex); lock((work_completion)(&(&acpi_desc->dwork)->work)); lock(&acpi_desc->init_mutex); lock((work_completion)(&(&acpi_desc->dwork)->work)); *** DEADLOCK *** Since the workqueue manipulation is protected by its own internal locking, the cancellation of pending work doesn't need to be done under acpi_desc->init_mutex. Move cancel_delayed_work_sync() outside the init_mutex to fix the deadlock. Any work that starts after acpi_nfit_shutdown() drops the lock will see ARS_CANCEL, and the cancel_delayed_work_sync() will safely flush it out. Reported-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> --- drivers/acpi/nfit/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- base-commit: b7bfaa761d760e72a969d116517eaa12e404c262 change-id: 20230112-acpi_nfit_lockdep-264d7f41e6c7 Best regards,