diff mbox

PCI / PM: Crashes in PME scan during system suspend

Message ID 20170414082249.GA5417@wunner.de (mailing list archive)
State Superseded, archived
Headers show

Commit Message

Lukas Wunner April 14, 2017, 8:22 a.m. UTC
On Tue, Feb 14, 2017 at 12:26:01PM +0100, Rafael J. Wysocki wrote:
> On Tuesday, February 14, 2017 10:31:38 AM Geert Uytterhoeven wrote:
> > Laurent Pinchart reported that r8a7790/Lager crashes during suspend tests.
> > 
> > I managed to reproduce the issue on r8a7791/koelsch:
> >   - It only happens during suspend tests, after writing either "platform"
> >     or "processors" to /sys/power/pm_test,
> >   - It does not (or is less likely) to happen during full system suspend
> >     ("core" or "none").
> > 
> > More investigation shows this happens when the PME scan runs, once per
> > second.  During PME scan, the PCI host bridge (rcar-pci) registers are
> > accessed while the host bridge's module clock has already been disabled,
> > leading to a crash.
> 
> OK, so clearly PME scans should be suspended before the host bridge
> registers become inaccessible.
> 
> Another question, though, is whether or not PME scans are actually necessary
> on the affected platforms at all.

I'm not seeing a fix for this in linux-next, am I missing something?
Has anyone looked into it or is the issue still open?

Below is a tentative patch which moves PME polling to a freezable
workqueue, so it is frozen before the host bridge is suspended.
Geert, Laurent, could you test this?

The patch may be problematic in that pci_pme_list_scan() acquires
pci_pme_list_mutex, which is also acquired by pci_pme_active(),
which gets called when devices are suspended -- *after* the worker
has been frozen.  I'm not really familiar with the freezer, can it
happen that the worker is frozen while holding the mutex?  If so
this would deadlock.  Rafael?

Alternative approaches would be to
(a) skip devices in pci_pme_list_scan() if their is_prepared or
    is_suspended flags are set, or
(b) disable PME polling via a PM notifier.

The latter seems preferable performance-wise.  (To avoid checking
these flags once per second.)

Best regards,

Lukas

-- >8 --

Comments

Rafael J. Wysocki April 14, 2017, 10:27 p.m. UTC | #1
On Friday, April 14, 2017 10:22:49 AM Lukas Wunner wrote:
> On Tue, Feb 14, 2017 at 12:26:01PM +0100, Rafael J. Wysocki wrote:
> > On Tuesday, February 14, 2017 10:31:38 AM Geert Uytterhoeven wrote:
> > > Laurent Pinchart reported that r8a7790/Lager crashes during suspend tests.
> > > 
> > > I managed to reproduce the issue on r8a7791/koelsch:
> > >   - It only happens during suspend tests, after writing either "platform"
> > >     or "processors" to /sys/power/pm_test,
> > >   - It does not (or is less likely) to happen during full system suspend
> > >     ("core" or "none").
> > > 
> > > More investigation shows this happens when the PME scan runs, once per
> > > second.  During PME scan, the PCI host bridge (rcar-pci) registers are
> > > accessed while the host bridge's module clock has already been disabled,
> > > leading to a crash.
> > 
> > OK, so clearly PME scans should be suspended before the host bridge
> > registers become inaccessible.
> > 
> > Another question, though, is whether or not PME scans are actually necessary
> > on the affected platforms at all.
> 
> I'm not seeing a fix for this in linux-next, am I missing something?
> Has anyone looked into it or is the issue still open?

It is still open AFAICS.

> Below is a tentative patch which moves PME polling to a freezable
> workqueue, so it is frozen before the host bridge is suspended.
> Geert, Laurent, could you test this?
> 
> The patch may be problematic in that pci_pme_list_scan() acquires
> pci_pme_list_mutex, which is also acquired by pci_pme_active(),
> which gets called when devices are suspended -- *after* the worker
> has been frozen.  I'm not really familiar with the freezer, can it
> happen that the worker is frozen while holding the mutex?  If so
> this would deadlock.  Rafael?

That depends on the worker, precisely on where it calls try_to_freeze().

That said I think it won't do that while holding any locks. :-)

Thanks,
Rafael
diff mbox

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 7904d02..d35c016 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1782,8 +1782,8 @@  static void pci_pme_list_scan(struct work_struct *work)
 		}
 	}
 	if (!list_empty(&pci_pme_list))
-		schedule_delayed_work(&pci_pme_work,
-				      msecs_to_jiffies(PME_TIMEOUT));
+		queue_delayed_work(system_freezable_wq, &pci_pme_work,
+				   msecs_to_jiffies(PME_TIMEOUT));
 	mutex_unlock(&pci_pme_list_mutex);
 }
 
@@ -1848,8 +1848,9 @@  void pci_pme_active(struct pci_dev *dev, bool enable)
 			mutex_lock(&pci_pme_list_mutex);
 			list_add(&pme_dev->list, &pci_pme_list);
 			if (list_is_singular(&pci_pme_list))
-				schedule_delayed_work(&pci_pme_work,
-						      msecs_to_jiffies(PME_TIMEOUT));
+				queue_delayed_work(system_freezable_wq,
+						   &pci_pme_work,
+						   msecs_to_jiffies(PME_TIMEOUT));
 			mutex_unlock(&pci_pme_list_mutex);
 		} else {
 			mutex_lock(&pci_pme_list_mutex);