Message ID | 171709637423.1568751.11773767969980847536.stgit@dwillia2-xfh.jf.intel.com |
---|---|
State | Superseded |
Headers | show |
Series | PCI: Revert / replace the cfg_access_lock lockdep mechanism | expand |
Dan Williams wrote: > While the experiment did reveal that there are additional places that > are missing the lock during secondary bus reset, one of the places that > needs to take cfg_access_lock (pci_bus_lock()) is not prepared for > lockdep annotation. > > Specifically, pci_bus_lock() takes pci_dev_lock() recursively and is > currently dependent on the fact that the device_lock() is marked > lockdep_set_novalidate_class(&dev->mutex). Otherwise, without that > annotation, pci_bus_lock() would need to use something like a new > pci_dev_lock_nested() helper, a scheme to track a PCI device's depth in > the topology, and a hope that the depth of a PCI tree never exceeds the > max value for a lockdep subclass. > > The alternative to ripping out the lockdep coverage would be to deploy a > dynamic lock key for every PCI device. Unfortunately, there is evidence > that increasing the number of keys that lockdep needs to track to be > per-PCI-device is prohibitively expensive for something like the > cfg_access_lock. > > The main motivation for adding the annotation in the first place was to > catch unlocked secondary bus resets, not necessarily catch lock ordering > problems between cfg_access_lock and other locks. > > Replace the lockdep tracking with a pci_warn_once() for that primary > concern. > > Fixes: 7e89efc6e9e4 ("PCI: Lock upstream bridge for pci_reset_function()") > Reported-by: Imre Deak <imre.deak@intel.com> > Closes: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_134186v1/shard-dg2-1/igt@device_reset@unbind-reset-rebind.html > Cc: Jani Saarinen <jani.saarinen@intel.com> > Cc: Dave Jiang <dave.jiang@intel.com> > Cc: Bjorn Helgaas <bhelgaas@google.com> > Signed-off-by: Dan Williams <dan.j.williams@intel.com> Bjorn, this against mainline, not your tree where I see you already have "PCI: Make cfg_access_lock lockdep key a singleton" queued up. The "overkill" justification for making it singleton is valid, but then means that it has all the same problems as the device lock that needs to be marked lockdep_set_novalidate_class(). Let me know if you want this rebased on your for-linus branch. Note that the pci_warn_once() will trigger on all pci_bus_reset() users unless / until pci_bus_lock() additionally locks the bridge itself ala: http://lore.kernel.org/r/6657833b3b5ae_14984b29437@dwillia2-xfh.jf.intel.com.notmuch Apologies for the thrash, this has been a useful exercise for finding some of these gaps, but ultimately not possible to carry forward without more invasive changes.
On Thu, May 30, 2024 at 12:53:46PM -0700, Dan Williams wrote: > Dan Williams wrote: > > While the experiment did reveal that there are additional places that > > are missing the lock during secondary bus reset, one of the places that > > needs to take cfg_access_lock (pci_bus_lock()) is not prepared for > > lockdep annotation. > > > > Specifically, pci_bus_lock() takes pci_dev_lock() recursively and is > > currently dependent on the fact that the device_lock() is marked > > lockdep_set_novalidate_class(&dev->mutex). Otherwise, without that > > annotation, pci_bus_lock() would need to use something like a new > > pci_dev_lock_nested() helper, a scheme to track a PCI device's depth in > > the topology, and a hope that the depth of a PCI tree never exceeds the > > max value for a lockdep subclass. > > > > The alternative to ripping out the lockdep coverage would be to deploy a > > dynamic lock key for every PCI device. Unfortunately, there is evidence > > that increasing the number of keys that lockdep needs to track to be > > per-PCI-device is prohibitively expensive for something like the > > cfg_access_lock. > > > > The main motivation for adding the annotation in the first place was to > > catch unlocked secondary bus resets, not necessarily catch lock ordering > > problems between cfg_access_lock and other locks. > > > > Replace the lockdep tracking with a pci_warn_once() for that primary > > concern. > > > > Fixes: 7e89efc6e9e4 ("PCI: Lock upstream bridge for pci_reset_function()") > > Reported-by: Imre Deak <imre.deak@intel.com> > > Closes: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_134186v1/shard-dg2-1/igt@device_reset@unbind-reset-rebind.html > > Cc: Jani Saarinen <jani.saarinen@intel.com> > > Cc: Dave Jiang <dave.jiang@intel.com> > > Cc: Bjorn Helgaas <bhelgaas@google.com> > > Signed-off-by: Dan Williams <dan.j.williams@intel.com> > > Bjorn, this against mainline, not your tree where I see you already have > "PCI: Make cfg_access_lock lockdep key a singleton" queued up. The > "overkill" justification for making it singleton is valid, but then > means that it has all the same problems as the device lock that needs to > be marked lockdep_set_novalidate_class(). > > Let me know if you want this rebased on your for-linus branch. > > Note that the pci_warn_once() will trigger on all pci_bus_reset() users > unless / until pci_bus_lock() additionally locks the bridge itself ala: > > http://lore.kernel.org/r/6657833b3b5ae_14984b29437@dwillia2-xfh.jf.intel.com.notmuch > > Apologies for the thrash, this has been a useful exercise for finding > some of these gaps, but ultimately not possible to carry forward > without more invasive changes. No problem, this is a complicated locking scenario. These fixes are the only thing on my for-linus branch (which I regard as a draft rather than being immutable) and I haven't asked Linus to pull them yet, so I'll just drop both: ac445566fcf9 ("PCI: Make cfg_access_lock lockdep key a singleton") f941b9182c54 ("PCI: Fix missing lockdep annotation for pci_cfg_access_trylock()") I think the clearest way to do this would be to do a simple revert of 7e89efc6e9e4, followed by a second patch to add the pci_warn_once(). The revert would definitely be v6.10 material. The pci_warn_once() might be v6.11 material. Or if you think it will find significant bugs, maybe that's v6.10 material as well, but it'll be easier to make that argument if it's in a separate patch. Bjorn
On 5/30/24 1:52 PM, Bjorn Helgaas wrote: > On Thu, May 30, 2024 at 12:53:46PM -0700, Dan Williams wrote: >> Dan Williams wrote: >>> While the experiment did reveal that there are additional places that >>> are missing the lock during secondary bus reset, one of the places that >>> needs to take cfg_access_lock (pci_bus_lock()) is not prepared for >>> lockdep annotation. >>> >>> Specifically, pci_bus_lock() takes pci_dev_lock() recursively and is >>> currently dependent on the fact that the device_lock() is marked >>> lockdep_set_novalidate_class(&dev->mutex). Otherwise, without that >>> annotation, pci_bus_lock() would need to use something like a new >>> pci_dev_lock_nested() helper, a scheme to track a PCI device's depth in >>> the topology, and a hope that the depth of a PCI tree never exceeds the >>> max value for a lockdep subclass. >>> >>> The alternative to ripping out the lockdep coverage would be to deploy a >>> dynamic lock key for every PCI device. Unfortunately, there is evidence >>> that increasing the number of keys that lockdep needs to track to be >>> per-PCI-device is prohibitively expensive for something like the >>> cfg_access_lock. >>> >>> The main motivation for adding the annotation in the first place was to >>> catch unlocked secondary bus resets, not necessarily catch lock ordering >>> problems between cfg_access_lock and other locks. >>> >>> Replace the lockdep tracking with a pci_warn_once() for that primary >>> concern. >>> >>> Fixes: 7e89efc6e9e4 ("PCI: Lock upstream bridge for pci_reset_function()") >>> Reported-by: Imre Deak <imre.deak@intel.com> >>> Closes: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_134186v1/shard-dg2-1/igt@device_reset@unbind-reset-rebind.html >>> Cc: Jani Saarinen <jani.saarinen@intel.com> >>> Cc: Dave Jiang <dave.jiang@intel.com> >>> Cc: Bjorn Helgaas <bhelgaas@google.com> >>> Signed-off-by: Dan Williams <dan.j.williams@intel.com> >> >> Bjorn, this against mainline, not your tree where I see you already have >> "PCI: Make cfg_access_lock lockdep key a singleton" queued up. The >> "overkill" justification for making it singleton is valid, but then >> means that it has all the same problems as the device lock that needs to >> be marked lockdep_set_novalidate_class(). >> >> Let me know if you want this rebased on your for-linus branch. >> >> Note that the pci_warn_once() will trigger on all pci_bus_reset() users >> unless / until pci_bus_lock() additionally locks the bridge itself ala: >> >> http://lore.kernel.org/r/6657833b3b5ae_14984b29437@dwillia2-xfh.jf.intel.com.notmuch >> >> Apologies for the thrash, this has been a useful exercise for finding >> some of these gaps, but ultimately not possible to carry forward >> without more invasive changes. > > No problem, this is a complicated locking scenario. These fixes are > the only thing on my for-linus branch (which I regard as a draft > rather than being immutable) and I haven't asked Linus to pull them > yet, so I'll just drop both: > > ac445566fcf9 ("PCI: Make cfg_access_lock lockdep key a singleton") > f941b9182c54 ("PCI: Fix missing lockdep annotation for pci_cfg_access_trylock()") > > I think the clearest way to do this would be to do a simple revert of > 7e89efc6e9e4, followed by a second patch to add the pci_warn_once(). Complete revert of 7e89efc6e9e4 will also remove the bridge locking which I think we want to keep right? > > The revert would definitely be v6.10 material. The pci_warn_once() > might be v6.11 material. Or if you think it will find significant > bugs, maybe that's v6.10 material as well, but it'll be easier to make > that argument if it's in a separate patch. > > Bjorn
On Thu, May 30, 2024 at 02:03:09PM -0700, Dave Jiang wrote: > On 5/30/24 1:52 PM, Bjorn Helgaas wrote: > > On Thu, May 30, 2024 at 12:53:46PM -0700, Dan Williams wrote: > >> Dan Williams wrote: > >>> While the experiment did reveal that there are additional places that > >>> are missing the lock during secondary bus reset, one of the places that > >>> needs to take cfg_access_lock (pci_bus_lock()) is not prepared for > >>> lockdep annotation. > >>> > >>> Specifically, pci_bus_lock() takes pci_dev_lock() recursively and is > >>> currently dependent on the fact that the device_lock() is marked > >>> lockdep_set_novalidate_class(&dev->mutex). Otherwise, without that > >>> annotation, pci_bus_lock() would need to use something like a new > >>> pci_dev_lock_nested() helper, a scheme to track a PCI device's depth in > >>> the topology, and a hope that the depth of a PCI tree never exceeds the > >>> max value for a lockdep subclass. > >>> > >>> The alternative to ripping out the lockdep coverage would be to deploy a > >>> dynamic lock key for every PCI device. Unfortunately, there is evidence > >>> that increasing the number of keys that lockdep needs to track to be > >>> per-PCI-device is prohibitively expensive for something like the > >>> cfg_access_lock. > >>> > >>> The main motivation for adding the annotation in the first place was to > >>> catch unlocked secondary bus resets, not necessarily catch lock ordering > >>> problems between cfg_access_lock and other locks. > >>> > >>> Replace the lockdep tracking with a pci_warn_once() for that primary > >>> concern. > >>> > >>> Fixes: 7e89efc6e9e4 ("PCI: Lock upstream bridge for pci_reset_function()") > >>> Reported-by: Imre Deak <imre.deak@intel.com> > >>> Closes: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_134186v1/shard-dg2-1/igt@device_reset@unbind-reset-rebind.html > >>> Cc: Jani Saarinen <jani.saarinen@intel.com> > >>> Cc: Dave Jiang <dave.jiang@intel.com> > >>> Cc: Bjorn Helgaas <bhelgaas@google.com> > >>> Signed-off-by: Dan Williams <dan.j.williams@intel.com> > >> > >> Bjorn, this against mainline, not your tree where I see you already have > >> "PCI: Make cfg_access_lock lockdep key a singleton" queued up. The > >> "overkill" justification for making it singleton is valid, but then > >> means that it has all the same problems as the device lock that needs to > >> be marked lockdep_set_novalidate_class(). > >> > >> Let me know if you want this rebased on your for-linus branch. > >> > >> Note that the pci_warn_once() will trigger on all pci_bus_reset() users > >> unless / until pci_bus_lock() additionally locks the bridge itself ala: > >> > >> http://lore.kernel.org/r/6657833b3b5ae_14984b29437@dwillia2-xfh.jf.intel.com.notmuch > >> > >> Apologies for the thrash, this has been a useful exercise for finding > >> some of these gaps, but ultimately not possible to carry forward > >> without more invasive changes. > > > > No problem, this is a complicated locking scenario. These fixes are > > the only thing on my for-linus branch (which I regard as a draft > > rather than being immutable) and I haven't asked Linus to pull them > > yet, so I'll just drop both: > > > > ac445566fcf9 ("PCI: Make cfg_access_lock lockdep key a singleton") > > f941b9182c54 ("PCI: Fix missing lockdep annotation for pci_cfg_access_trylock()") > > > > I think the clearest way to do this would be to do a simple revert of > > 7e89efc6e9e4, followed by a second patch to add the pci_warn_once(). > > Complete revert of 7e89efc6e9e4 will also remove the bridge locking > which I think we want to keep right? I dunno, you tell me. If we want to revert just part of 7e89efc6e9e4, it would be clearer to do that by itself, then add the new stuff separately. > > The revert would definitely be v6.10 material. The pci_warn_once() > > might be v6.11 material. Or if you think it will find significant > > bugs, maybe that's v6.10 material as well, but it'll be easier to make > > that argument if it's in a separate patch. > > > > Bjorn
On 5/30/24 2:08 PM, Bjorn Helgaas wrote: > On Thu, May 30, 2024 at 02:03:09PM -0700, Dave Jiang wrote: >> On 5/30/24 1:52 PM, Bjorn Helgaas wrote: >>> On Thu, May 30, 2024 at 12:53:46PM -0700, Dan Williams wrote: >>>> Dan Williams wrote: >>>>> While the experiment did reveal that there are additional places that >>>>> are missing the lock during secondary bus reset, one of the places that >>>>> needs to take cfg_access_lock (pci_bus_lock()) is not prepared for >>>>> lockdep annotation. >>>>> >>>>> Specifically, pci_bus_lock() takes pci_dev_lock() recursively and is >>>>> currently dependent on the fact that the device_lock() is marked >>>>> lockdep_set_novalidate_class(&dev->mutex). Otherwise, without that >>>>> annotation, pci_bus_lock() would need to use something like a new >>>>> pci_dev_lock_nested() helper, a scheme to track a PCI device's depth in >>>>> the topology, and a hope that the depth of a PCI tree never exceeds the >>>>> max value for a lockdep subclass. >>>>> >>>>> The alternative to ripping out the lockdep coverage would be to deploy a >>>>> dynamic lock key for every PCI device. Unfortunately, there is evidence >>>>> that increasing the number of keys that lockdep needs to track to be >>>>> per-PCI-device is prohibitively expensive for something like the >>>>> cfg_access_lock. >>>>> >>>>> The main motivation for adding the annotation in the first place was to >>>>> catch unlocked secondary bus resets, not necessarily catch lock ordering >>>>> problems between cfg_access_lock and other locks. >>>>> >>>>> Replace the lockdep tracking with a pci_warn_once() for that primary >>>>> concern. >>>>> >>>>> Fixes: 7e89efc6e9e4 ("PCI: Lock upstream bridge for pci_reset_function()") >>>>> Reported-by: Imre Deak <imre.deak@intel.com> >>>>> Closes: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_134186v1/shard-dg2-1/igt@device_reset@unbind-reset-rebind.html >>>>> Cc: Jani Saarinen <jani.saarinen@intel.com> >>>>> Cc: Dave Jiang <dave.jiang@intel.com> >>>>> Cc: Bjorn Helgaas <bhelgaas@google.com> >>>>> Signed-off-by: Dan Williams <dan.j.williams@intel.com> >>>> >>>> Bjorn, this against mainline, not your tree where I see you already have >>>> "PCI: Make cfg_access_lock lockdep key a singleton" queued up. The >>>> "overkill" justification for making it singleton is valid, but then >>>> means that it has all the same problems as the device lock that needs to >>>> be marked lockdep_set_novalidate_class(). >>>> >>>> Let me know if you want this rebased on your for-linus branch. >>>> >>>> Note that the pci_warn_once() will trigger on all pci_bus_reset() users >>>> unless / until pci_bus_lock() additionally locks the bridge itself ala: >>>> >>>> http://lore.kernel.org/r/6657833b3b5ae_14984b29437@dwillia2-xfh.jf.intel.com.notmuch >>>> >>>> Apologies for the thrash, this has been a useful exercise for finding >>>> some of these gaps, but ultimately not possible to carry forward >>>> without more invasive changes. >>> >>> No problem, this is a complicated locking scenario. These fixes are >>> the only thing on my for-linus branch (which I regard as a draft >>> rather than being immutable) and I haven't asked Linus to pull them >>> yet, so I'll just drop both: >>> >>> ac445566fcf9 ("PCI: Make cfg_access_lock lockdep key a singleton") >>> f941b9182c54 ("PCI: Fix missing lockdep annotation for pci_cfg_access_trylock()") >>> >>> I think the clearest way to do this would be to do a simple revert of >>> 7e89efc6e9e4, followed by a second patch to add the pci_warn_once(). >> >> Complete revert of 7e89efc6e9e4 will also remove the bridge locking >> which I think we want to keep right? > > I dunno, you tell me. If we want to revert just part of 7e89efc6e9e4, > it would be clearer to do that by itself, then add the new stuff > separately. Unless Dan objects I think we should do a partial revert and only remove the lockdep bits. > >>> The revert would definitely be v6.10 material. The pci_warn_once() >>> might be v6.11 material. Or if you think it will find significant >>> bugs, maybe that's v6.10 material as well, but it'll be easier to make >>> that argument if it's in a separate patch. >>> >>> Bjorn
diff --git a/drivers/pci/access.c b/drivers/pci/access.c index 30f031de9cfe..b123da16b63b 100644 --- a/drivers/pci/access.c +++ b/drivers/pci/access.c @@ -289,8 +289,6 @@ void pci_cfg_access_lock(struct pci_dev *dev) { might_sleep(); - lock_map_acquire(&dev->cfg_access_lock); - raw_spin_lock_irq(&pci_lock); if (dev->block_cfg_access) pci_wait_cfg(dev); @@ -345,8 +343,6 @@ void pci_cfg_access_unlock(struct pci_dev *dev) raw_spin_unlock_irqrestore(&pci_lock, flags); wake_up_all(&pci_cfg_wait); - - lock_map_release(&dev->cfg_access_lock); } EXPORT_SYMBOL_GPL(pci_cfg_access_unlock); diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 59e0949fb079..8df32a3a0979 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4883,7 +4883,9 @@ void __weak pcibios_reset_secondary_bus(struct pci_dev *dev) */ int pci_bridge_secondary_bus_reset(struct pci_dev *dev) { - lock_map_assert_held(&dev->cfg_access_lock); + if (!dev->block_cfg_access) + pci_warn_once(dev, "unlocked secondary bus reset via: %pS\n", + __builtin_return_address(0)); pcibios_reset_secondary_bus(dev); return pci_bridge_wait_for_secondary_bus(dev, "bus reset"); diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 8e696e547565..5fbabb4e3425 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -2546,9 +2546,6 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus) dev->dev.dma_mask = &dev->dma_mask; dev->dev.dma_parms = &dev->dma_parms; dev->dev.coherent_dma_mask = 0xffffffffull; - lockdep_register_key(&dev->cfg_access_key); - lockdep_init_map(&dev->cfg_access_lock, dev_name(&dev->dev), - &dev->cfg_access_key, 0); dma_set_max_seg_size(&dev->dev, 65536); dma_set_seg_boundary(&dev->dev, 0xffffffff); diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index 5e51b0de4c4b..08b0d1d9d78b 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -297,9 +297,6 @@ extern void lock_unpin_lock(struct lockdep_map *lock, struct pin_cookie); .wait_type_inner = _wait_type, \ .lock_type = LD_LOCK_WAIT_OVERRIDE, } -#define lock_map_assert_held(l) \ - lockdep_assert(lock_is_held(l) != LOCK_STATE_NOT_HELD) - #else /* !CONFIG_LOCKDEP */ static inline void lockdep_init_task(struct task_struct *task) @@ -391,8 +388,6 @@ extern int lockdep_is_held(const void *); #define DEFINE_WAIT_OVERRIDE_MAP(_name, _wait_type) \ struct lockdep_map __maybe_unused _name = {} -#define lock_map_assert_held(l) do { (void)(l); } while (0) - #endif /* !LOCKDEP */ #ifdef CONFIG_PROVE_LOCKING diff --git a/include/linux/pci.h b/include/linux/pci.h index fb004fd4e889..cafc5ab1cbcb 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -413,8 +413,6 @@ struct pci_dev { struct resource driver_exclusive_resource; /* driver exclusive resource ranges */ bool match_driver; /* Skip attaching driver */ - struct lock_class_key cfg_access_key; - struct lockdep_map cfg_access_lock; unsigned int transparent:1; /* Subtractive decode bridge */ unsigned int io_window:1; /* Bridge has I/O window */
While the experiment did reveal that there are additional places that are missing the lock during secondary bus reset, one of the places that needs to take cfg_access_lock (pci_bus_lock()) is not prepared for lockdep annotation. Specifically, pci_bus_lock() takes pci_dev_lock() recursively and is currently dependent on the fact that the device_lock() is marked lockdep_set_novalidate_class(&dev->mutex). Otherwise, without that annotation, pci_bus_lock() would need to use something like a new pci_dev_lock_nested() helper, a scheme to track a PCI device's depth in the topology, and a hope that the depth of a PCI tree never exceeds the max value for a lockdep subclass. The alternative to ripping out the lockdep coverage would be to deploy a dynamic lock key for every PCI device. Unfortunately, there is evidence that increasing the number of keys that lockdep needs to track to be per-PCI-device is prohibitively expensive for something like the cfg_access_lock. The main motivation for adding the annotation in the first place was to catch unlocked secondary bus resets, not necessarily catch lock ordering problems between cfg_access_lock and other locks. Replace the lockdep tracking with a pci_warn_once() for that primary concern. Fixes: 7e89efc6e9e4 ("PCI: Lock upstream bridge for pci_reset_function()") Reported-by: Imre Deak <imre.deak@intel.com> Closes: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_134186v1/shard-dg2-1/igt@device_reset@unbind-reset-rebind.html Cc: Jani Saarinen <jani.saarinen@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- drivers/pci/access.c | 4 ---- drivers/pci/pci.c | 4 +++- drivers/pci/probe.c | 3 --- include/linux/lockdep.h | 5 ----- include/linux/pci.h | 2 -- 5 files changed, 3 insertions(+), 15 deletions(-)