Message ID | 20240610220304.3162895-2-kbusch@meta.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Bjorn Helgaas |
Series | pcie hotplug and error fixes |
Hi Keith,

kernel test robot noticed the following build errors:

[auto build test ERROR on pci/next]
[also build test ERROR on pci/for-linus linus/master v6.10-rc3 next-20240607]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Keith-Busch/PCI-pciehp-fix-concurrent-sub-tree-removal-deadlock/20240611-060555
base:   https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git next
patch link:    https://lore.kernel.org/r/20240610220304.3162895-2-kbusch%40meta.com
patch subject: [PATCH 1/2] PCI: pciehp: fix concurrent sub-tree removal deadlock
config: arm-allnoconfig (https://download.01.org/0day-ci/archive/20240611/202406111313.UuWo45kC-lkp@intel.com/config)
compiler: clang version 19.0.0git (https://github.com/llvm/llvm-project 4403cdbaf01379de96f8d0d6ea4f51a085e37766)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240611/202406111313.UuWo45kC-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202406111313.UuWo45kC-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from drivers/pci/of.c:11:
   In file included from include/linux/pci.h:2672:
   In file included from include/linux/dma-mapping.h:11:
   In file included from include/linux/scatterlist.h:8:
   In file included from include/linux/mm.h:2253:
   include/linux/vmstat.h:514:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     514 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
   In file included from drivers/pci/of.c:16:
>> drivers/pci/pci.h:416:2: error: call to undeclared function 'pci_notify_disconnected'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     416 |         pci_notify_disconnected();
         |         ^
   drivers/pci/pci.h:416:2: note: did you mean 'pci_doe_disconnected'?
   drivers/pci/pci.h:376:20: note: 'pci_doe_disconnected' declared here
     376 | static inline void pci_doe_disconnected(struct pci_dev *pdev) { }
         |                    ^
   1 warning and 1 error generated.

vim +/pci_notify_disconnected +416 drivers/pci/pci.h

   411
   412	static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused)
   413	{
   414		pci_dev_set_io_state(dev, pci_channel_io_perm_failure);
   415		pci_doe_disconnected(dev);
 > 416		pci_notify_disconnected();
   417
   418		return 0;
   419	}
   420

Hi Keith,

kernel test robot noticed the following build errors:

[auto build test ERROR on pci/next]
[also build test ERROR on pci/for-linus linus/master v6.10-rc3 next-20240607]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Keith-Busch/PCI-pciehp-fix-concurrent-sub-tree-removal-deadlock/20240611-060555
base:   https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git next
patch link:    https://lore.kernel.org/r/20240610220304.3162895-2-kbusch%40meta.com
patch subject: [PATCH 1/2] PCI: pciehp: fix concurrent sub-tree removal deadlock
config: arc-allnoconfig (https://download.01.org/0day-ci/archive/20240611/202406111416.XIxUrEy4-lkp@intel.com/config)
compiler: arc-elf-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240611/202406111416.XIxUrEy4-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202406111416.XIxUrEy4-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from drivers/pci/of.c:16:
   drivers/pci/pci.h: In function 'pci_dev_set_disconnected':
>> drivers/pci/pci.h:416:9: error: implicit declaration of function 'pci_notify_disconnected'; did you mean 'pci_doe_disconnected'? [-Werror=implicit-function-declaration]
     416 |         pci_notify_disconnected();
         |         ^~~~~~~~~~~~~~~~~~~~~~~
         |         pci_doe_disconnected
   cc1: some warnings being treated as errors

vim +416 drivers/pci/pci.h

   411
   412	static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused)
   413	{
   414		pci_dev_set_io_state(dev, pci_channel_io_perm_failure);
   415		pci_doe_disconnected(dev);
 > 416		pci_notify_disconnected();
   417
   418		return 0;
   419	}
   420

On Tue, Jun 11, 2024 at 02:19:09PM +0800, kernel test robot wrote:
> url:    https://github.com/intel-lab-lkp/linux/commits/Keith-Busch/PCI-pciehp-fix-concurrent-sub-tree-removal-deadlock/20240611-060555
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git next
> patch link:    https://lore.kernel.org/r/20240610220304.3162895-2-kbusch%40meta.com
> patch subject: [PATCH 1/2] PCI: pciehp: fix concurrent sub-tree removal deadlock
> config: arc-allnoconfig (https://download.01.org/0day-ci/archive/20240611/202406111416.XIxUrEy4-lkp@intel.com/config)
> compiler: arc-elf-gcc (GCC) 13.2.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240611/202406111416.XIxUrEy4-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202406111416.XIxUrEy4-lkp@intel.com/
>
> All errors (new ones prefixed by >>):
>
>    In file included from drivers/pci/of.c:16:
>    drivers/pci/pci.h: In function 'pci_dev_set_disconnected':
> >> drivers/pci/pci.h:416:9: error: implicit declaration of function 'pci_notify_disconnected'; did you mean 'pci_doe_disconnected'? [-Werror=implicit-function-declaration]
>      416 |         pci_notify_disconnected();
>          |         ^~~~~~~~~~~~~~~~~~~~~~~
>          |         pci_doe_disconnected
>    cc1: some warnings being treated as errors

Interesting, drivers/pci/of.c depends on CONFIG_OF, but not CONFIG_PCI.

Okay, I can stub the function in that case.

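For readers following the thread, the stub mentioned in the reply above could look roughly like the sketch below. This is a user-space illustration, not the follow-up patch: it assumes the failure comes from the declaration being visible only when CONFIG_PCI is set, and the `wakeups` counter and `set_disconnected_sketch()` are hypothetical names added so the effect of each variant can be observed.

```c
/* Hypothetical counter so the effect of each variant is observable. */
static int wakeups;

#ifdef CONFIG_PCI
/* Stand-in for the real function, which wakes lock waiters in probe.c. */
static void pci_notify_disconnected(void)
{
	wakeups++;
}
#else
/* Empty stub: callers still compile when PCI support is not built in. */
static inline void pci_notify_disconnected(void) { }
#endif

/* Simplified caller modelled on pci_dev_set_disconnected(). */
static int set_disconnected_sketch(void)
{
	pci_notify_disconnected();
	return 0;
}
```

Built without -DCONFIG_PCI, the caller compiles against the empty stub and the counter stays at zero, which is the behaviour an allnoconfig build needs.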
diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c
index ad12515a4a121..ca6237b0732c8 100644
--- a/drivers/pci/hotplug/pciehp_pci.c
+++ b/drivers/pci/hotplug/pciehp_pci.c
@@ -34,9 +34,12 @@ int pciehp_configure_device(struct controller *ctrl)
 	struct pci_dev *dev;
 	struct pci_dev *bridge = ctrl->pcie->port;
 	struct pci_bus *parent = bridge->subordinate;
-	int num, ret = 0;
+	int num, ret;
 
-	pci_lock_rescan_remove();
+	ret = pci_trylock_rescan_remove(bridge);
+	if (!ret)
+		return -ENODEV;
+	ret = 0;
 
 	dev = pci_get_slot(parent, PCI_DEVFN(0, 0));
 	if (dev) {
@@ -93,6 +96,7 @@ void pciehp_unconfigure_device(struct controller *ctrl, bool presence)
 	struct pci_dev *dev, *temp;
 	struct pci_bus *parent = ctrl->pcie->port->subordinate;
 	u16 command;
+	int ret;
 
 	ctrl_dbg(ctrl, "%s: domain:bus:dev = %04x:%02x:00\n",
 		 __func__, pci_domain_nr(parent), parent->number);
@@ -100,7 +104,9 @@ void pciehp_unconfigure_device(struct controller *ctrl, bool presence)
 	if (!presence)
 		pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
 
-	pci_lock_rescan_remove();
+	ret = pci_trylock_rescan_remove(parent->self);
+	if (!ret)
+		return;
 
 	/*
 	 * Stopping an SR-IOV PF device removes all the associated VFs,
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index fd44565c47562..f525490a02122 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -370,6 +370,7 @@ static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused)
 {
 	pci_dev_set_io_state(dev, pci_channel_io_perm_failure);
 	pci_doe_disconnected(dev);
+	pci_notify_disconnected();
 
 	return 0;
 }
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 5fbabb4e3425f..d2e19a1d1a45b 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -3302,6 +3302,7 @@ EXPORT_SYMBOL_GPL(pci_rescan_bus);
  * routines should always be executed under this mutex.
  */
 static DEFINE_MUTEX(pci_rescan_remove_lock);
+static DECLARE_WAIT_QUEUE_HEAD(pci_lock_wq);
 
 void pci_lock_rescan_remove(void)
 {
@@ -3309,12 +3310,35 @@ void pci_lock_rescan_remove(void)
 }
 EXPORT_SYMBOL_GPL(pci_lock_rescan_remove);
 
+/*
+ * pci_trylock_rescan_remove() - keep trying to take the lock until successful
+ * or notified the device is disconnected
+ *
+ * Returns 1 if the lock was successfully taken, 0 otherwise.
+ */
+bool pci_trylock_rescan_remove(struct pci_dev *dev)
+{
+	int ret;
+
+	wait_event(pci_lock_wq,
+		   (ret = mutex_trylock(&pci_rescan_remove_lock)) == 1 ||
+		   pci_dev_is_disconnected(dev));
+
+	return ret;
+}
+
 void pci_unlock_rescan_remove(void)
 {
 	mutex_unlock(&pci_rescan_remove_lock);
+	wake_up_all(&pci_lock_wq);
 }
 EXPORT_SYMBOL_GPL(pci_unlock_rescan_remove);
 
+void pci_notify_disconnected(void)
+{
+	wake_up_all(&pci_lock_wq);
+}
+
 static int __init pci_sort_bf_cmp(const struct device *d_a,
 				  const struct device *d_b)
 {
diff --git a/include/linux/pci.h b/include/linux/pci.h
index cafc5ab1cbcb4..b05aaf9aac6c8 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1442,7 +1442,9 @@ void set_pcie_hotplug_bridge(struct pci_dev *pdev);
 unsigned int pci_rescan_bus_bridge_resize(struct pci_dev *bridge);
 unsigned int pci_rescan_bus(struct pci_bus *bus);
 void pci_lock_rescan_remove(void);
+bool pci_trylock_rescan_remove(struct pci_dev *dev);
 void pci_unlock_rescan_remove(void);
+void pci_notify_disconnected(void);
 
 /* Vital Product Data routines */
 ssize_t pci_read_vpd(struct pci_dev *dev, loff_t pos, size_t count, void *buf);
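
The locking scheme in the patch above (retry the trylock on every wake-up, and give up once the device is flagged disconnected) can be tried out in user space with POSIX threads. The following is only a sketch of the pattern under stated assumptions, not kernel code: `struct fake_dev`, `wq_lock`, `wq_cond` and the unprefixed function names are hypothetical, and a condition variable stands in for the kernel wait queue.

```c
#include <pthread.h>
#include <stdbool.h>

/* The lock being contended, analogous to pci_rescan_remove_lock. */
static pthread_mutex_t rescan_remove_lock = PTHREAD_MUTEX_INITIALIZER;

/* wq_lock and wq_cond together play the role of the wait queue. */
static pthread_mutex_t wq_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wq_cond = PTHREAD_COND_INITIALIZER;

/* Hypothetical device with only the state this sketch needs. */
struct fake_dev {
	bool disconnected;
};

/* Retry the trylock on each wake-up; stop if the device went away. */
bool trylock_rescan_remove(struct fake_dev *dev)
{
	bool locked = false;

	pthread_mutex_lock(&wq_lock);
	for (;;) {
		if (pthread_mutex_trylock(&rescan_remove_lock) == 0) {
			locked = true;
			break;
		}
		if (dev->disconnected)
			break;
		/* Atomically drops wq_lock while sleeping. */
		pthread_cond_wait(&wq_cond, &wq_lock);
	}
	pthread_mutex_unlock(&wq_lock);
	return locked;
}

/* Release the lock, then wake every waiter so it can retry. */
void unlock_rescan_remove(void)
{
	pthread_mutex_unlock(&rescan_remove_lock);
	pthread_mutex_lock(&wq_lock);
	pthread_cond_broadcast(&wq_cond);
	pthread_mutex_unlock(&wq_lock);
}

/* Flag the device and wake waiters so they can bail out. */
void notify_disconnected(struct fake_dev *dev)
{
	pthread_mutex_lock(&wq_lock);
	dev->disconnected = true;
	pthread_cond_broadcast(&wq_cond);
	pthread_mutex_unlock(&wq_lock);
}
```

As in the patch, the lock attempt is evaluated before the disconnect check, so a caller can still take the lock for a disconnected device when the lock happens to be free. Both wake-up paths take `wq_lock` before broadcasting, which is what keeps a waiter from missing a wake-up between its failed trylock and going to sleep.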