Message ID | 20240612181625.3604512-3-kbusch@meta.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | pcie hotplug and error fixes | expand |
On Wed, Jun 12, 2024 at 11:16:25AM -0700, Keith Busch wrote: > DPC and AER handling access their subordinate bus devices. If pciehp > should happen to also trigger during this handling, it will remove > all the subordinate buses, then dereferecing any children may be a > use-after-free. That may lead to kernel panics like the below. I assume the crash occurs because the struct pci_bus accessed by pci_bus_read_config_dword() has already been freed? Generally the solution to issues like this is to hold references on the structs being accessed, not to acquire locks that happen to prevent the structs from being freed. Question is, which struct ref needs to be held and where? Holding a ref on a struct pci_dev also holds the pci_bus it resides on in place. So I suspect we need to call pci_dev_get() somewhere. The stack trace looks incomplete for some reason: > ? pci_bus_read_config_dword+0x17/0x50 > pci_dev_wait+0x107/0x190 > ? dpc_completed+0x50/0x50 > dpc_reset_link+0x4e/0xd0 > pcie_do_recovery+0xb2/0x2d0 I'd expect a call to pci_bridge_wait_for_secondary_bus() from dpc_reset_link(), which in turn calls pci_dev_wait(). Indeed pci_bridge_wait_for_secondary_bus() does something fishy: It takes the first entry from the devices list without acquiring a ref: child = list_first_entry(&dev->subordinate->devices, struct pci_dev, bus_list); Below is a small patch which acquires a ref on child. Maybe this already does the trick? -- >8 -- diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 2a8063e..82db9a8 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4753,7 +4753,7 @@ static int pci_bus_max_d3cold_delay(const struct pci_bus *bus) */ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type) { - struct pci_dev *child; + struct pci_dev *child __free(pci_dev_put) = NULL; int delay; if (pci_dev_is_disconnected(dev)) @@ -4782,8 +4782,9 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type) return 0; } - child = list_first_entry(&dev->subordinate->devices, struct pci_dev, - bus_list); + + child = pci_dev_get(list_first_entry(&dev->subordinate->devices, + struct pci_dev, bus_list)); up_read(&pci_bus_sem); /*
On Sat, Jun 15, 2024 at 04:09:41PM +0200, Lukas Wunner wrote: > Below is a small patch which acquires a ref on child. Maybe this > already does the trick? Thanks, it appears your idea works! I've run it through 100 test iterations successfully; previously it would reliably panic within 5 or fewer iterations. I also need the first patch from this series, though, otherwise it inevitably deadlocks. I'd be interested to hear if you have any thoughts on that one. Reviewed-by: Keith Busch <kbusch@kernel.org> Tested-by: Keith Busch <kbusch@kernel.org> > -- >8 -- > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index 2a8063e..82db9a8 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -4753,7 +4753,7 @@ static int pci_bus_max_d3cold_delay(const struct pci_bus *bus) > */ > int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type) > { > - struct pci_dev *child; > + struct pci_dev *child __free(pci_dev_put) = NULL; > int delay; > > if (pci_dev_is_disconnected(dev)) > @@ -4782,8 +4782,9 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type) > return 0; > } > > - child = list_first_entry(&dev->subordinate->devices, struct pci_dev, > - bus_list); > + > + child = pci_dev_get(list_first_entry(&dev->subordinate->devices, > + struct pci_dev, bus_list)); > up_read(&pci_bus_sem); > > /*
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index 31090770fffcc..5355fc0fbf910 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -192,7 +192,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, pci_channel_state_t state, pci_ers_result_t (*reset_subordinates)(struct pci_dev *pdev)) { - int type = pci_pcie_type(dev); + int type = pci_pcie_type(dev), ret; struct pci_dev *bridge; pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); @@ -214,6 +214,10 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, else bridge = pci_upstream_bridge(dev); + + ret = pci_trylock_rescan_remove(bridge); + if (!ret) + return PCI_ERS_RESULT_DISCONNECT; pci_walk_bridge(bridge, pci_pm_runtime_get_sync, NULL); pci_dbg(bridge, "broadcast error_detected message\n"); @@ -262,12 +266,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, } pci_walk_bridge(bridge, pci_pm_runtime_put, NULL); + pci_unlock_rescan_remove(); pci_info(bridge, "device recovery successful\n"); return status; failed: pci_walk_bridge(bridge, pci_pm_runtime_put, NULL); + pci_unlock_rescan_remove(); pci_uevent_ers(bridge, PCI_ERS_RESULT_DISCONNECT);