Message ID | 20210720150145.640727-1-schnelle@linux.ibm.com (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | [v3] PCI: Move pci_dev_is/assign_added() to pci.h | expand |
On Tue, Jul 20, 2021 at 05:01:45PM +0200, Niklas Schnelle wrote: > The helper function pci_dev_is_added() from drivers/pci/pci.h is used in > PCI arch code of both s390 and powerpc leading to awkward relative > includes. Move it to the global include/linux/pci.h and get rid of these > includes just for that one function. I agree the includes are awkward. But the arch code *using* pci_dev_is_added() seems awkward, too. AFAICS, in powerpc, pci_dev_is_added() is only used by pnv_pci_ioda_fixup_iov() and pseries_pci_fixup_iov_resources(). Those are only called from pcibios_add_device(), which is only called from pci_device_add(). Is it even possible for pci_dev_is_added() to be true in that path? s390 uses pci_dev_is_added() in recover_store(), but I don't know what that is (looks like a sysfs file, but it's not documented) or why s390 is the only arch that does this. Maybe we should make powerpc and s390 less special? > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> > --- > Since v1 (and bad v2): > - Fixed accidental removal of PCI_DPC_RECOVERED, PCI_DPC_RECOVERING > defines and also move these to include/linux/pci.h > > arch/powerpc/platforms/powernv/pci-sriov.c | 3 --- > arch/powerpc/platforms/pseries/setup.c | 1 - > arch/s390/pci/pci_sysfs.c | 2 -- > drivers/pci/hotplug/acpiphp_glue.c | 1 - > drivers/pci/pci.h | 15 --------------- > include/linux/pci.h | 15 +++++++++++++++ > 6 files changed, 15 insertions(+), 22 deletions(-) > > diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c b/arch/powerpc/platforms/powernv/pci-sriov.c > index 28aac933a439..2e0ca5451e85 100644 > --- a/arch/powerpc/platforms/powernv/pci-sriov.c > +++ b/arch/powerpc/platforms/powernv/pci-sriov.c > @@ -9,9 +9,6 @@ > > #include "pci.h" > > -/* for pci_dev_is_added() */ > -#include "../../../../drivers/pci/pci.h" > - > /* > * The majority of the complexity in supporting SR-IOV on PowerNV comes from > * the need to put the MMIO space for each VF into a separate PE. Internally > diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c > index 631a0d57b6cd..17585ec9f955 100644 > --- a/arch/powerpc/platforms/pseries/setup.c > +++ b/arch/powerpc/platforms/pseries/setup.c > @@ -74,7 +74,6 @@ > #include <asm/hvconsole.h> > > #include "pseries.h" > -#include "../../../../drivers/pci/pci.h" > > DEFINE_STATIC_KEY_FALSE(shared_processor); > EXPORT_SYMBOL_GPL(shared_processor); > diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c > index 6e2450c2b9c1..8dbe54ef8f8e 100644 > --- a/arch/s390/pci/pci_sysfs.c > +++ b/arch/s390/pci/pci_sysfs.c > @@ -13,8 +13,6 @@ > #include <linux/stat.h> > #include <linux/pci.h> > > -#include "../../../drivers/pci/pci.h" > - > #include <asm/sclp.h> > > #define zpci_attr(name, fmt, member) \ > diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c > index f031302ad401..4cb963f88183 100644 > --- a/drivers/pci/hotplug/acpiphp_glue.c > +++ b/drivers/pci/hotplug/acpiphp_glue.c > @@ -38,7 +38,6 @@ > #include <linux/slab.h> > #include <linux/acpi.h> > > -#include "../pci.h" > #include "acpiphp.h" > > static LIST_HEAD(bridge_list); > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > index 93dcdd431072..a159cd0f6f05 100644 > --- a/drivers/pci/pci.h > +++ b/drivers/pci/pci.h > @@ -383,21 +383,6 @@ static inline bool pci_dev_is_disconnected(const struct pci_dev *dev) > return dev->error_state == pci_channel_io_perm_failure; > } > > -/* pci_dev priv_flags */ > -#define PCI_DEV_ADDED 0 > -#define PCI_DPC_RECOVERED 1 > -#define PCI_DPC_RECOVERING 2 > - > -static inline void pci_dev_assign_added(struct pci_dev *dev, bool added) > -{ > - assign_bit(PCI_DEV_ADDED, &dev->priv_flags, added); > -} > - > -static inline bool pci_dev_is_added(const struct pci_dev *dev) > -{ > - return test_bit(PCI_DEV_ADDED, &dev->priv_flags); > -} > - > #ifdef CONFIG_PCIEAER > #include <linux/aer.h> > > diff --git a/include/linux/pci.h b/include/linux/pci.h > index 540b377ca8f6..ea0e23dbc8ec 100644 > --- a/include/linux/pci.h > +++ b/include/linux/pci.h > @@ -507,6 +507,21 @@ struct pci_dev { > unsigned long priv_flags; /* Private flags for the PCI driver */ > }; > > +/* pci_dev priv_flags */ > +#define PCI_DEV_ADDED 0 > +#define PCI_DPC_RECOVERED 1 > +#define PCI_DPC_RECOVERING 2 > + > +static inline void pci_dev_assign_added(struct pci_dev *dev, bool added) > +{ > + assign_bit(PCI_DEV_ADDED, &dev->priv_flags, added); > +} > + > +static inline bool pci_dev_is_added(const struct pci_dev *dev) > +{ > + return test_bit(PCI_DEV_ADDED, &dev->priv_flags); > +} > + > static inline struct pci_dev *pci_physfn(struct pci_dev *dev) > { > #ifdef CONFIG_PCI_IOV > -- > 2.25.1 >
On Fri, 2021-08-20 at 17:37 -0500, Bjorn Helgaas wrote: > On Tue, Jul 20, 2021 at 05:01:45PM +0200, Niklas Schnelle wrote: > > The helper function pci_dev_is_added() from drivers/pci/pci.h is used in > > PCI arch code of both s390 and powerpc leading to awkward relative > > includes. Move it to the global include/linux/pci.h and get rid of these > > includes just for that one function. > > I agree the includes are awkward. > > But the arch code *using* pci_dev_is_added() seems awkward, too. See below for my interpretation why s390 has some driver like functionality in its arch code which isn't necessarily awkward. Independent from that I have found pci_dev_is_added() as the only way deal with the case that one might be looking at a struct pci_dev reference that has been removed via pci_stop_and_remove_bus_device() or has never been fully scanned. This is quite useful when handling error events which on s390 are part of the adapter event mechanism shared with channel I/O devices. > > AFAICS, in powerpc, pci_dev_is_added() is only used by > pnv_pci_ioda_fixup_iov() and pseries_pci_fixup_iov_resources(). Those > are only called from pcibios_add_device(), which is only called from > pci_device_add(). > > Is it even possible for pci_dev_is_added() to be true in that path? > > s390 uses pci_dev_is_added() in recover_store() I'm actually looking into this as I'm working on an s390 implementation of the PCI recovery flow described in Documentation/PCI/pci-error- recovery.rst that would also call pci_dev_is_added() because when we get a platform notification of a PCI reset done by firmware it may be that the sturct pci_dev is going away i.e. we still have a ref count but it is not added to the PCI bus anymore. And pci_dev_is_added() is the only way I've found to check for this state. > , but I don't know what > that is (looks like a sysfs file, but it's not documented) or why s390 > is the only arch that does this. Good point about this not being documented, I'll look into adding docs. This is a sysfs attribute that basically removes the pci_dev and re- adds it. This has the complication that since the attribute sits at /sys/bus/pci/devices/<dev>/recover it deletes its own parent directory which requires extra caution and means concurrent accesses block on pci_lock_rescan_remove() instead of a kernfs lock. Long story short when concurrently triggering the attribute one thread proceeds into the pci_lock_rescan_remove() section and does the removal, while others would block on pci_lock_rescan_remove(). Now when the threads unblock the removal is done. In this case there is a new struct pci_dev found in the rescan but the previously blocked threads still have references to the old struct pci_dev which was removed and as far as I could tell can only be distinguihsed by checking pci_dev_is_added(). > > Maybe we should make powerpc and s390 less special? On s390, as I see it, the reason for this is that all of the PCI functionality is directly defined in the Architecture as special CPU instructions which are kind of hypercalls but also an ISA extension. These instructions range from the basic PCI memory accesses (no real MMIO) to enumeration of the devices and on to reporting of hot-plug and and resets/recovery events. Importantly we do not have any kind of direct access to a real or virtual PCI controller and the architecture has no concept of a comparable entity. So in my opinion while there is some of the functionality of a PCI controller in arch/s390/pci the cut off between controller functionality and arch support isn't clear at all and exposing PCI support as CPU instructions doesn't map well to the controller concept. That said, in principle I'm open to moving some of that into drivers/pci/controller/ if you think that would improve things and we can find a good argument what should go where. One possible cut off would be to have arch/s390/pci/ provide wrappers to the PCI instructions but move all their uses to e.g. drivers/pci/controller/s390/. This would of course be a major refactoring and none of that code would be useful on any other architecture but it would move a lot the accesses to PCI common code functionality out of the arch code. > > > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> > > --- > > Since v1 (and bad v2): > > - Fixed accidental removal of PCI_DPC_RECOVERED, PCI_DPC_RECOVERING > > defines and also move these to include/linux/pci.h > > > > arch/powerpc/platforms/powernv/pci-sriov.c | 3 --- > > arch/powerpc/platforms/pseries/setup.c | 1 - > > arch/s390/pci/pci_sysfs.c | 2 -- > > drivers/pci/hotplug/acpiphp_glue.c | 1 - > > drivers/pci/pci.h | 15 --------------- > > include/linux/pci.h | 15 +++++++++++++++ > > 6 files changed, 15 insertions(+), 22 deletions(-) > > > > diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c b/arch/powerpc/platforms/powernv/pci-sriov.c > > index 28aac933a439..2e0ca5451e85 100644 > > --- a/arch/powerpc/platforms/powernv/pci-sriov.c > > +++ b/arch/powerpc/platforms/powernv/pci-sriov.c > > @@ -9,9 +9,6 @@ > > > > #include "pci.h" > > > > -/* for pci_dev_is_added() */ > > -#include "../../../../drivers/pci/pci.h" > > .. snip ..
On Mon, Aug 23, 2021 at 12:53:39PM +0200, Niklas Schnelle wrote: > On Fri, 2021-08-20 at 17:37 -0500, Bjorn Helgaas wrote: > > On Tue, Jul 20, 2021 at 05:01:45PM +0200, Niklas Schnelle wrote: > > > The helper function pci_dev_is_added() from drivers/pci/pci.h is used in > > > PCI arch code of both s390 and powerpc leading to awkward relative > > > includes. Move it to the global include/linux/pci.h and get rid of these > > > includes just for that one function. > > > > I agree the includes are awkward. > > > > But the arch code *using* pci_dev_is_added() seems awkward, too. > > See below for my interpretation why s390 has some driver like > functionality in its arch code which isn't necessarily awkward. > > Independent from that I have found pci_dev_is_added() as the only way > deal with the case that one might be looking at a struct pci_dev > reference that has been removed via pci_stop_and_remove_bus_device() or > has never been fully scanned. This is quite useful when handling error > events which on s390 are part of the adapter event mechanism shared > with channel I/O devices. > > > AFAICS, in powerpc, pci_dev_is_added() is only used by > > pnv_pci_ioda_fixup_iov() and pseries_pci_fixup_iov_resources(). Those > > are only called from pcibios_add_device(), which is only called from > > pci_device_add(). > > > > Is it even possible for pci_dev_is_added() to be true in that path? If the pci_dev_is_added() in powerpc is unreachable, we can remove it and at least reduce this to an s390-only problem. > > s390 uses pci_dev_is_added() in recover_store() > > I'm actually looking into this as I'm working on an s390 implementation > of the PCI recovery flow described in Documentation/PCI/pci-error- > recovery.rst that would also call pci_dev_is_added() because when we > get a platform notification of a PCI reset done by firmware it may be > that the struct pci_dev is going away i.e. we still have a ref count > but it is not added to the PCI bus anymore. And pci_dev_is_added() is > the only way I've found to check for this state. > > > , but I don't know what > > that is (looks like a sysfs file, but it's not documented) or why s390 > > is the only arch that does this. > > Good point about this not being documented, I'll look into adding docs. > > This is a sysfs attribute that basically removes the pci_dev and re- > adds it. This has the complication that since the attribute sits at > /sys/bus/pci/devices/<dev>/recover it deletes its own parent directory > which requires extra caution and means concurrent accesses block on > pci_lock_rescan_remove() instead of a kernfs lock. > Long story short when concurrently triggering the attribute one thread > proceeds into the pci_lock_rescan_remove() section and does the > removal, while others would block on pci_lock_rescan_remove(). Now when > the threads unblock the removal is done. In this case there is a new > struct pci_dev found in the rescan but the previously blocked threads > still have references to the old struct pci_dev which was removed and > as far as I could tell can only be distinguished by checking > pci_dev_is_added(). Is this locking issue different from concurrently writing to /sys/.../remove on other architectures? > > Maybe we should make powerpc and s390 less special? > > On s390, as I see it, the reason for this is that all of the PCI > functionality is directly defined in the Architecture as special CPU > instructions which are kind of hypercalls but also an ISA extension. > > These instructions range from the basic PCI memory accesses (no real > MMIO) to enumeration of the devices and on to reporting of hot-plug and > and resets/recovery events. Importantly we do not have any kind of > direct access to a real or virtual PCI controller and the architecture > has no concept of a comparable entity. > > So in my opinion while there is some of the functionality of a PCI > controller in arch/s390/pci the cut off between controller > functionality and arch support isn't clear at all and exposing PCI > support as CPU instructions doesn't map well to the controller concept. > > That said, in principle I'm open to moving some of that into > drivers/pci/controller/ if you think that would improve things and we > can find a good argument what should go where. One possible cut off > would be to have arch/s390/pci/ provide wrappers to the PCI > instructions but move all their uses to e.g. > drivers/pci/controller/s390/. This would of course be a major > refactoring and none of that code would be useful on any other > architecture but it would move a lot the accesses to PCI common code > functionality out of the arch code. Looks like hotplug is already in drivers/pci/hotplug/s390_pci_hpc.c. Might be worth considering putting the other PCI core-ish code in drivers/pci as well, though it doesn't feel urgent to me. Maybe a good internship or mentoring project. I'm not sure this juggling around is worth it basically to just clean up the include path. The downside to me is exposing pci_dev_is_added() to outside the PCI core, because I don't want to encourage any other users. > > > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> > > > --- > > > Since v1 (and bad v2): > > > - Fixed accidental removal of PCI_DPC_RECOVERED, PCI_DPC_RECOVERING > > > defines and also move these to include/linux/pci.h > > > > > > arch/powerpc/platforms/powernv/pci-sriov.c | 3 --- > > > arch/powerpc/platforms/pseries/setup.c | 1 - > > > arch/s390/pci/pci_sysfs.c | 2 -- > > > drivers/pci/hotplug/acpiphp_glue.c | 1 - > > > drivers/pci/pci.h | 15 --------------- > > > include/linux/pci.h | 15 +++++++++++++++ > > > 6 files changed, 15 insertions(+), 22 deletions(-) > > > > > > diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c b/arch/powerpc/platforms/powernv/pci-sriov.c > > > index 28aac933a439..2e0ca5451e85 100644 > > > --- a/arch/powerpc/platforms/powernv/pci-sriov.c > > > +++ b/arch/powerpc/platforms/powernv/pci-sriov.c > > > @@ -9,9 +9,6 @@ > > > > > > #include "pci.h" > > > > > > -/* for pci_dev_is_added() */ > > > -#include "../../../../drivers/pci/pci.h" > > > > .. snip .. >
On Wed, 2021-08-25 at 14:04 -0500, Bjorn Helgaas wrote: > On Mon, Aug 23, 2021 at 12:53:39PM +0200, Niklas Schnelle wrote: > > On Fri, 2021-08-20 at 17:37 -0500, Bjorn Helgaas wrote: > > > On Tue, Jul 20, 2021 at 05:01:45PM +0200, Niklas Schnelle wrote: > > > > The helper function pci_dev_is_added() from drivers/pci/pci.h is used in > > > > PCI arch code of both s390 and powerpc leading to awkward relative > > > > includes. Move it to the global include/linux/pci.h and get rid of these > > > > includes just for that one function. > > > > > > I agree the includes are awkward. > > > > > > But the arch code *using* pci_dev_is_added() seems awkward, too. > > > > See below for my interpretation why s390 has some driver like > > functionality in its arch code which isn't necessarily awkward. > > > > Independent from that I have found pci_dev_is_added() as the only way > > deal with the case that one might be looking at a struct pci_dev > > reference that has been removed via pci_stop_and_remove_bus_device() or > > has never been fully scanned. This is quite useful when handling error > > events which on s390 are part of the adapter event mechanism shared > > with channel I/O devices. > > > > > AFAICS, in powerpc, pci_dev_is_added() is only used by > > > pnv_pci_ioda_fixup_iov() and pseries_pci_fixup_iov_resources(). Those > > > are only called from pcibios_add_device(), which is only called from > > > pci_device_add(). > > > > > > Is it even possible for pci_dev_is_added() to be true in that path? > > If the pci_dev_is_added() in powerpc is unreachable, we can remove it > and at least reduce this to an s390-only problem. Ok. I might be missing something but I agree it does look these are called from within pcibios_add_device() only so pci_dev_is_added() can never be true. This looks pretty clear as pci_dev_assign_added() is called after pcibios_add_device() in pci_bus_add_device(). > > > > s390 uses pci_dev_is_added() in recover_store() > > > > I'm actually looking into this as I'm working on an s390 implementation > > of the PCI recovery flow described in Documentation/PCI/pci-error- > > recovery.rst that would also call pci_dev_is_added() because when we > > get a platform notification of a PCI reset done by firmware it may be > > that the struct pci_dev is going away i.e. we still have a ref count > > but it is not added to the PCI bus anymore. And pci_dev_is_added() is > > the only way I've found to check for this state. > > > > > , but I don't know what > > > that is (looks like a sysfs file, but it's not documented) or why s390 > > > is the only arch that does this. > > > > Good point about this not being documented, I'll look into adding docs. > > > > This is a sysfs attribute that basically removes the pci_dev and re- > > adds it. This has the complication that since the attribute sits at > > /sys/bus/pci/devices/<dev>/recover it deletes its own parent directory > > which requires extra caution and means concurrent accesses block on > > pci_lock_rescan_remove() instead of a kernfs lock. > > Long story short when concurrently triggering the attribute one thread > > proceeds into the pci_lock_rescan_remove() section and does the > > removal, while others would block on pci_lock_rescan_remove(). Now when > > the threads unblock the removal is done. In this case there is a new > > struct pci_dev found in the rescan but the previously blocked threads > > still have references to the old struct pci_dev which was removed and > > as far as I could tell can only be distinguished by checking > > pci_dev_is_added(). > > Is this locking issue different from concurrently writing to > /sys/.../remove on other architectures? In principle it is very similar except that we re-scan and thus the removed pdev may co-exist with a new pdev for the same actual device if there are other references to the pdev. There is however also a significant difference in locking that fixes a possible deadlock that I just confirmed also affects /sys/../remove where it is hidden by a lockdep ignore, see lockdep splash below. For /sys/../recover this was fixed by my commit dd712f0a53ca ("s390/pci: Fix possible deadlock in recover_store()") which also introduced the need for pci_dev_is_added(). The change in the above commit is very similar to what is done in drivers/scsi/scsi_sysfs.c:sdev_store_delete() which also has a self deletion and in commit 0ee223b2e1f6 ("scsi: core: Avoid that SCSI device removal through sysfs triggers a deadlock") fixed a very very similar lock order problem. Like the SCSI code I use sysfs_break_active_protection() which allows a concurrent access to /sys/../recover to advance into recover_store(). It may then block on pci_lock_rescan_remove() instead of the kernfs lock it would block on with just delete_remove_file_self(). Thus we have to handle unblocking from pci_lock_rescan_remove() while holding the stale struct pci_dev of the removed device. Together with the re- scan this means I have to be able to determine after unblocking if I'm looking at the old pdev. I just tested that when removing the lockdep ignore from remove_store() and do /sys/../power and /sys/../remove I do hit the same lock order inversion issue there for remove. Note that unlike in the commit introducing the ignore this is a completely flat hiearchy: [ 234.196509] ====================================================== [ 234.196511] WARNING: possible circular locking dependency detected [ 234.196512] 5.14.0-rc7-08025-gf6d7568b37df-dirty #18 Not tainted [ 234.196514] ------------------------------------------------------ [ 234.196516] zsh/1214 is trying to acquire lock: [ 234.196518] 000000013f296e30 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_stop_and_remove_bus_device_locked+0x26/0x48 [ 234.196529] but task is already holding lock: [ 234.196530] 000000009b6c3a10 (kn->active#363){++++}-{0:0}, at: sysfs_remove_file_self+0x42/0x78 [ 234.196538] which lock already depends on the new lock. [ 234.196539] the existing dependency chain (in reverse order) is: [ 234.196541] -> #1 (kn->active#363){++++}-{0:0}: [ 234.196545] validate_chain+0x9ca/0xde8 [ 234.196548] __lock_acquire+0x64c/0xc40 [ 234.196550] lock_acquire.part.0+0xec/0x258 [ 234.196552] lock_acquire+0xb0/0x200 [ 234.196554] kernfs_drain+0x17a/0x1c8 [ 234.196556] __kernfs_remove+0x1f4/0x230 [ 234.196558] kernfs_remove_by_name_ns+0x5c/0xa8 [ 234.196560] remove_files+0x46/0x90 [ 234.196563] sysfs_remove_group+0x5a/0xb8 [ 234.196565] sysfs_remove_groups+0x46/0x68 [ 234.196567] device_remove_attrs+0x90/0xb8 [ 234.196598] device_del+0x192/0x3f8 [ 234.196600] pci_remove_bus_device+0x8a/0x128 [ 234.196602] pci_stop_and_remove_bus_device_locked+0x3a/0x48 [ 234.196604] zpci_bus_remove_device+0x68/0xa8 [ 234.196607] zpci_deconfigure_device+0x3a/0xe0 [ 234.196610] power_write_file+0x7c/0x130 [ 234.196615] kernfs_fop_write_iter+0x13e/0x1e0 [ 234.196617] new_sync_write+0x100/0x190 [ 234.196620] vfs_write+0x21e/0x2d0 [ 234.196622] ksys_write+0x6c/0xf8 [ 234.196624] __do_syscall+0x1c2/0x1f0 [ 234.196628] system_call+0x78/0xa0 [ 234.196632] -> #0 (pci_rescan_remove_lock){+.+.}-{3:3}: [ 234.196636] check_noncircular+0x168/0x188 [ 234.196637] check_prev_add+0xe0/0xed8 [ 234.196639] validate_chain+0x9ca/0xde8 [ 234.196641] __lock_acquire+0x64c/0xc40 [ 234.196642] lock_acquire.part.0+0xec/0x258 [ 234.196644] lock_acquire+0xb0/0x200 [ 234.196646] __mutex_lock+0xa2/0x8d8 [ 234.196648] mutex_lock_nested+0x32/0x40 [ 234.196649] pci_stop_and_remove_bus_device_locked+0x26/0x48 [ 234.196652] remove_store+0x7a/0x88 [ 234.196654] kernfs_fop_write_iter+0x13e/0x1e0 [ 234.196656] new_sync_write+0x100/0x190 [ 234.196657] vfs_write+0x21e/0x2d0 [ 234.196659] ksys_write+0x6c/0xf8 [ 234.196661] __do_syscall+0x1c2/0x1f0 [ 234.196663] system_call+0x78/0xa0 [ 234.196665] other info that might help us debug this: [ 234.196666] Possible unsafe locking scenario: [ 234.196668] CPU0 CPU1 [ 234.196669] ---- ---- [ 234.196670] lock(kn->active#363); [ 234.196672] lock(pci_rescan_remove_lock); [ 234.196674] lock(kn->active#363); [ 234.196677] lock(pci_rescan_remove_lock); [ 234.196679] *** DEADLOCK *** [ 234.196680] 3 locks held by zsh/1214: [ 234.196682] #0: 0000000099d44498 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x6c/0xf8 [ 234.196688] #1: 00000000b214ee90 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x102/0x1e0 [ 234.196693] #2: 000000009b6c3a10 (kn->active#363){++++}-{0:0}, at: sysfs_remove_file_self+0x42/0x78 [ 234.196699] stack backtrace: [ 234.196701] CPU: 6 PID: 1214 Comm: zsh Not tainted 5.14.0-rc7-08025-gf6d7568b37df-dirty #18 [ 234.196703] Hardware name: IBM 8561 T01 703 (LPAR) [ 234.196705] Call Trace: [ 234.196706] [<000000013ebba5d0>] show_stack+0x90/0xf8 [ 234.196711] [<000000013ebcc28e>] dump_stack_lvl+0x8e/0xc8 [ 234.196714] [<000000013dfbe908>] check_noncircular+0x168/0x188 [ 234.196716] [<000000013dfbf9c8>] check_prev_add+0xe0/0xed8 [ 234.196718] [<000000013dfc118a>] validate_chain+0x9ca/0xde8 [ 234.196720] [<000000013dfc4184>] __lock_acquire+0x64c/0xc40 [ 234.196721] [<000000013dfc2d8c>] lock_acquire.part.0+0xec/0x258 [ 234.196724] [<000000013dfc2fa8>] lock_acquire+0xb0/0x200 [ 234.196725] [<000000013ebdac7a>] __mutex_lock+0xa2/0x8d8 [ 234.196727] [<000000013ebdb4e2>] mutex_lock_nested+0x32/0x40 [ 234.196729] [<000000013e7daaf6>] pci_stop_and_remove_bus_device_locked+0x26/0x48 [ 234.196732] [<000000013e7e753a>] remove_store+0x7a/0x88 [ 234.196734] [<000000013e35b4be>] kernfs_fop_write_iter+0x13e/0x1e0 [ 234.196736] [<000000013e264098>] new_sync_write+0x100/0x190 [ 234.196738] [<000000013e266f06>] vfs_write+0x21e/0x2d0 [ 234.196740] [<000000013e267224>] ksys_write+0x6c/0xf8 [ 234.196742] [<000000013ebcf9da>] __do_syscall+0x1c2/0x1f0 [ 234.196744] [<000000013ebe2238>] system_call+0x78/0xa0 > > > > Maybe we should make powerpc and s390 less special? > > > > On s390, as I see it, the reason for this is that all of the PCI > > functionality is directly defined in the Architecture as special CPU > > instructions which are kind of hypercalls but also an ISA extension. > > > > These instructions range from the basic PCI memory accesses (no real > > MMIO) to enumeration of the devices and on to reporting of hot-plug and > > and resets/recovery events. Importantly we do not have any kind of > > direct access to a real or virtual PCI controller and the architecture > > has no concept of a comparable entity. > > > > So in my opinion while there is some of the functionality of a PCI > > controller in arch/s390/pci the cut off between controller > > functionality and arch support isn't clear at all and exposing PCI > > support as CPU instructions doesn't map well to the controller concept. > > > > That said, in principle I'm open to moving some of that into > > drivers/pci/controller/ if you think that would improve things and we > > can find a good argument what should go where. One possible cut off > > would be to have arch/s390/pci/ provide wrappers to the PCI > > instructions but move all their uses to e.g. > > drivers/pci/controller/s390/. This would of course be a major > > refactoring and none of that code would be useful on any other > > architecture but it would move a lot the accesses to PCI common code > > functionality out of the arch code. > > Looks like hotplug is already in drivers/pci/hotplug/s390_pci_hpc.c. > > Might be worth considering putting the other PCI core-ish code in > drivers/pci as well, though it doesn't feel urgent to me. Maybe a > good internship or mentoring project. I agree > > I'm not sure this juggling around is worth it basically to just clean > up the include path. The downside to me is exposing > pci_dev_is_added() to outside the PCI core, because I don't want > to encourage any other users. Hmm, for me the big question how to then handle the need for pci_dev_is_added() in my upcoming recovery code, that would be a new user outside the PCI core unless I move arch/s390/pci/pci_event.c to drivers/pci. That said I'm not sure I understand yet what makes pci_dev_is_added() unsuitable for outside the PCI core. I do understand that struct pci_dev::priv_flags is supposed to be private to PCI drivers but on the other hand the functionality of checking whether a PCI device has been added to a PCI bus seems rather basic and useful whenever working with a PCI device. > > > > > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> > > > > --- > > > > Since v1 (and bad v2): > > > > - Fixed accidental removal of PCI_DPC_RECOVERED, PCI_DPC_RECOVERING > > > > defines and also move these to include/linux/pci.h > > > > > > > > arch/powerpc/platforms/powernv/pci-sriov.c | 3 --- > > > > arch/powerpc/platforms/pseries/setup.c | 1 - > > > > arch/s390/pci/pci_sysfs.c | 2 -- > > > > drivers/pci/hotplug/acpiphp_glue.c | 1 - > > > > drivers/pci/pci.h | 15 --------------- > > > > include/linux/pci.h | 15 +++++++++++++++ > > > > 6 files changed, 15 insertions(+), 22 deletions(-) > > > > > > > > diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c b/arch/powerpc/platforms/powernv/pci-sriov.c > > > > index 28aac933a439..2e0ca5451e85 100644 > > > > --- a/arch/powerpc/platforms/powernv/pci-sriov.c > > > > +++ b/arch/powerpc/platforms/powernv/pci-sriov.c > > > > @@ -9,9 +9,6 @@ > > > > > > > > #include "pci.h" > > > > > > > > -/* for pci_dev_is_added() */ > > > > -#include "../../../../drivers/pci/pci.h" > > > > > > .. snip .. > >
On Tue, Jul 20, 2021 at 05:01:45PM +0200, Niklas Schnelle wrote: > The helper function pci_dev_is_added() from drivers/pci/pci.h is used in > PCI arch code of both s390 and powerpc leading to awkward relative > includes. Move it to the global include/linux/pci.h and get rid of these > includes just for that one function. > > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> > --- > Since v1 (and bad v2): > - Fixed accidental removal of PCI_DPC_RECOVERED, PCI_DPC_RECOVERING > defines and also move these to include/linux/pci.h > > arch/powerpc/platforms/powernv/pci-sriov.c | 3 --- > arch/powerpc/platforms/pseries/setup.c | 1 - > arch/s390/pci/pci_sysfs.c | 2 -- > drivers/pci/hotplug/acpiphp_glue.c | 1 - > drivers/pci/pci.h | 15 --------------- > include/linux/pci.h | 15 +++++++++++++++ > 6 files changed, 15 insertions(+), 22 deletions(-) I dropped this one because I think a subsequent patch removed the use in arch/powerpc, so if you still need this, it probably needs to be updated to at least drop those hunks.
diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c b/arch/powerpc/platforms/powernv/pci-sriov.c index 28aac933a439..2e0ca5451e85 100644 --- a/arch/powerpc/platforms/powernv/pci-sriov.c +++ b/arch/powerpc/platforms/powernv/pci-sriov.c @@ -9,9 +9,6 @@ #include "pci.h" -/* for pci_dev_is_added() */ -#include "../../../../drivers/pci/pci.h" - /* * The majority of the complexity in supporting SR-IOV on PowerNV comes from * the need to put the MMIO space for each VF into a separate PE. Internally diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index 631a0d57b6cd..17585ec9f955 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -74,7 +74,6 @@ #include <asm/hvconsole.h> #include "pseries.h" -#include "../../../../drivers/pci/pci.h" DEFINE_STATIC_KEY_FALSE(shared_processor); EXPORT_SYMBOL_GPL(shared_processor); diff --git a/arch/s390/pci/pci_sysfs.c b/arch/s390/pci/pci_sysfs.c index 6e2450c2b9c1..8dbe54ef8f8e 100644 --- a/arch/s390/pci/pci_sysfs.c +++ b/arch/s390/pci/pci_sysfs.c @@ -13,8 +13,6 @@ #include <linux/stat.h> #include <linux/pci.h> -#include "../../../drivers/pci/pci.h" - #include <asm/sclp.h> #define zpci_attr(name, fmt, member) \ diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c index f031302ad401..4cb963f88183 100644 --- a/drivers/pci/hotplug/acpiphp_glue.c +++ b/drivers/pci/hotplug/acpiphp_glue.c @@ -38,7 +38,6 @@ #include <linux/slab.h> #include <linux/acpi.h> -#include "../pci.h" #include "acpiphp.h" static LIST_HEAD(bridge_list); diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index 93dcdd431072..a159cd0f6f05 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -383,21 +383,6 @@ static inline bool pci_dev_is_disconnected(const struct pci_dev *dev) return dev->error_state == pci_channel_io_perm_failure; } -/* pci_dev priv_flags */ -#define PCI_DEV_ADDED 0 -#define PCI_DPC_RECOVERED 1 -#define PCI_DPC_RECOVERING 2 - -static inline void pci_dev_assign_added(struct pci_dev *dev, bool added) -{ - assign_bit(PCI_DEV_ADDED, &dev->priv_flags, added); -} - -static inline bool pci_dev_is_added(const struct pci_dev *dev) -{ - return test_bit(PCI_DEV_ADDED, &dev->priv_flags); -} - #ifdef CONFIG_PCIEAER #include <linux/aer.h> diff --git a/include/linux/pci.h b/include/linux/pci.h index 540b377ca8f6..ea0e23dbc8ec 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -507,6 +507,21 @@ struct pci_dev { unsigned long priv_flags; /* Private flags for the PCI driver */ }; +/* pci_dev priv_flags */ +#define PCI_DEV_ADDED 0 +#define PCI_DPC_RECOVERED 1 +#define PCI_DPC_RECOVERING 2 + +static inline void pci_dev_assign_added(struct pci_dev *dev, bool added) +{ + assign_bit(PCI_DEV_ADDED, &dev->priv_flags, added); +} + +static inline bool pci_dev_is_added(const struct pci_dev *dev) +{ + return test_bit(PCI_DEV_ADDED, &dev->priv_flags); +} + static inline struct pci_dev *pci_physfn(struct pci_dev *dev) { #ifdef CONFIG_PCI_IOV
The helper function pci_dev_is_added() from drivers/pci/pci.h is used in PCI arch code of both s390 and powerpc leading to awkward relative includes. Move it to the global include/linux/pci.h and get rid of these includes just for that one function. Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> --- Since v1 (and bad v2): - Fixed accidental removal of PCI_DPC_RECOVERED, PCI_DPC_RECOVERING defines and also move these to include/linux/pci.h arch/powerpc/platforms/powernv/pci-sriov.c | 3 --- arch/powerpc/platforms/pseries/setup.c | 1 - arch/s390/pci/pci_sysfs.c | 2 -- drivers/pci/hotplug/acpiphp_glue.c | 1 - drivers/pci/pci.h | 15 --------------- include/linux/pci.h | 15 +++++++++++++++ 6 files changed, 15 insertions(+), 22 deletions(-)