Message ID | 20241022130243.263737-1-acelan.kao@canonical.com (mailing list archive) |
---|---|
State | New |
Delegated to: | Bjorn Helgaas |
Series | [v2] PCI: pciehp: Fix system hang during resume with daisy-chained hotplug controllers |
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index ff458e692fed..56bf23d55c41 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -332,7 +332,7 @@ static int pciehp_resume_noirq(struct pcie_device *dev)
 			ctrl_dbg(ctrl, "device replaced during system sleep\n");
 			pci_walk_bus(ctrl->pcie->port->subordinate,
 				     pci_dev_set_disconnected, NULL);
-			pciehp_request(ctrl, PCI_EXP_SLTSTA_PDC);
+			atomic_or(PCI_EXP_SLTSTA_PDC, &ctrl->pending_events);
 		}
 	}
 
A system hang occurs when multiple PCIe hotplug controllers in a
daisy-chained setup (like a Thunderbolt dock with NVMe storage) resume
from system sleep. This happens when both the dock and its downstream
devices try to process PDC events at the same time through
pciehp_request().

This patch changes pciehp_request() to atomic_or(), which adds the PDC
event to ctrl->pending_events atomically. This change prevents the race
condition by making the event handling atomic across multiple hotplug
controllers during resume.

The bug was found with an Intel Thunderbolt 4 Bridge (8086:0b26) dock
and a Phison NVMe controller (1987:5012), where the system would hang
if both devices tried to handle presence detect changes during resume.

Changes:
v2:
 * Replace pciehp_request() with atomic_or() to fix race condition
v1:
 * https://lore.kernel.org/lkml/Zvf7xYEA32VgLRJ6@wunner.de/T/
 * Remove pci_walk_bus() call
 * Fix appeared to work due to lower reproduction rate

Fixes: 9d573d19547b ("PCI: pciehp: Detect device replacement during system sleep")
Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
---
 drivers/pci/hotplug/pciehp_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
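
To make the one-line change easier to reason about, here is a minimal userspace sketch of the two calls. It is not kernel code. It assumes that pciehp_request() records the event with atomic_or() on ctrl->pending_events and then wakes the controller's IRQ thread, and that the handler later collects the mask with an atomic exchange; those assumptions should be checked against the tree being patched. The names struct model_ctrl, model_request(), model_mark_pending() and model_collect() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Same value as PCI_EXP_SLTSTA_PDC (Presence Detect Changed). */
#define MODEL_SLTSTA_PDC 0x0008

/* Hypothetical stand-in for struct controller; only what the sketch needs. */
struct model_ctrl {
	atomic_int pending_events;	/* bitmask of events not yet handled */
	int wakeups;			/* counts simulated IRQ thread wakeups */
};

/* Models the removed line (assumption): record the event, then wake the handler. */
static void model_request(struct model_ctrl *ctrl, int events)
{
	atomic_fetch_or(&ctrl->pending_events, events);
	ctrl->wakeups++;	/* stands in for waking the IRQ thread */
}

/* Models the added line: record the event only, with no wakeup. */
static void model_mark_pending(struct model_ctrl *ctrl, int events)
{
	atomic_fetch_or(&ctrl->pending_events, events);
}

/* Models the handler fetching and clearing everything recorded so far. */
static int model_collect(struct model_ctrl *ctrl)
{
	return atomic_exchange(&ctrl->pending_events, 0);
}
```

Under these assumptions both variants leave the PDC bit in the pending mask. The difference is that the patched line does not schedule the handler during the noirq resume phase, so the event waits until the handler next runs for some other reason.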