Message ID | 20190311133122.11417-3-s.miroshnichenko@yadro.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | PCI: Allow BAR movement during hotplug | expand |
[+cc Srinath, Marta, LKML] On Mon, Mar 11, 2019 at 04:31:03PM +0300, Sergey Miroshnichenko wrote: > CPU0 CPU1 > > pci_enable_device_mem() pci_enable_device_mem() > pci_enable_bridge() pci_enable_bridge() > pci_is_enabled() > return false; > atomic_inc_return(enable_cnt) > Start actual enabling the bridge > ... pci_is_enabled() > ... return true; > ... Start memory requests <-- FAIL > ... > Set the PCI_COMMAND_MEMORY bit <-- Must wait for this > > This patch protects the pci_enable/disable_device() and pci_enable_bridge() > with mutexes. This is a subtle issue that we've tried to fix before, but we've never had a satisfactory solution, so I hope you've figured out the right fix. I'll include some links to previous discussion. This patch is very similar to [2], which we didn't actually apply. We did apply the patch from [3] as 40f11adc7cd9 ("PCI: Avoid race while enabling upstream bridges"), but it caused the regressions reported in [4,5], so we reverted it with 0f50a49e3008 ("Revert "PCI: Avoid race while enabling upstream bridges""). I think the underlying design problem is that we have a driver for device B calling pci_enable_device(), and it is changing the state of device A (an upstream bridge). The model generally is that a driver should only touch the device it is bound to. It's tricky to get the locking right when several children of device A all need to operate on A. That's all to say I'll have to think carefully about this particular patch, so I'll go on to the others and come back to this one. Bjorn [1] https://lore.kernel.org/linux-pci/1494256190-28993-1-git-send-email-srinath.mannam@broadcom.com/T/#u [RFC PATCH] pci: Concurrency issue in NVMe Init through PCIe switch [2] https://lore.kernel.org/linux-pci/1496135297-19680-1-git-send-email-srinath.mannam@broadcom.com/T/#u [RFC PATCH v2] pci: Concurrency issue in NVMe Init through PCIe switch [3] https://lore.kernel.org/linux-pci/1501858648-22228-1-git-send-email-srinath.mannam@broadcom.com/T/#u [RFC PATCH v3] pci: Concurrency issue during pci enable bridge [4] https://lore.kernel.org/linux-pci/150547971091.977464.16294045866179907260.stgit@buzz/T/#u [PATCH bisected regression in 4.14] PCI: fix race while enabling upstream bridges concurrently [5] https://lore.kernel.org/linux-wireless/04c9b578-693c-1dc6-9f0f-904580231b21@kernel.dk/T/#u iwlwifi firmware load broken in current -git [6] https://lore.kernel.org/linux-pci/744877924.5841545.1521630049567.JavaMail.zimbra@kalray.eu/T/#u [RFC PATCH] nvme: avoid race-conditions when enabling devices > Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com> > --- > drivers/pci/pci.c | 26 ++++++++++++++++++++++---- > drivers/pci/probe.c | 1 + > include/linux/pci.h | 1 + > 3 files changed, 24 insertions(+), 4 deletions(-) > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index f006068be209..895201d4c9e6 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -1615,6 +1615,8 @@ static void pci_enable_bridge(struct pci_dev *dev) > struct pci_dev *bridge; > int retval; > > + mutex_lock(&dev->enable_mutex); > + > bridge = pci_upstream_bridge(dev); > if (bridge) > pci_enable_bridge(bridge); > @@ -1622,6 +1624,7 @@ static void pci_enable_bridge(struct pci_dev *dev) > if (pci_is_enabled(dev)) { > if (!dev->is_busmaster) > pci_set_master(dev); > + mutex_unlock(&dev->enable_mutex); > return; > } > > @@ -1630,11 +1633,14 @@ static void pci_enable_bridge(struct pci_dev *dev) > pci_err(dev, "Error enabling bridge (%d), continuing\n", > retval); > pci_set_master(dev); > + mutex_unlock(&dev->enable_mutex); > } > > static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags) > { > struct pci_dev *bridge; > + /* Enable-locking of bridges is performed within the pci_enable_bridge() */ > + bool need_lock = !dev->subordinate; > int err; > int i, bars = 0; > > @@ -1650,8 +1656,13 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags) > dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK); > } > > - if (atomic_inc_return(&dev->enable_cnt) > 1) > + if (need_lock) > + mutex_lock(&dev->enable_mutex); > + if (pci_is_enabled(dev)) { > + if (need_lock) > + mutex_unlock(&dev->enable_mutex); > return 0; /* already enabled */ > + } > > bridge = pci_upstream_bridge(dev); > if (bridge) > @@ -1666,8 +1677,10 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags) > bars |= (1 << i); > > err = do_pci_enable_device(dev, bars); > - if (err < 0) > - atomic_dec(&dev->enable_cnt); > + if (err >= 0) > + atomic_inc(&dev->enable_cnt); > + if (need_lock) > + mutex_unlock(&dev->enable_mutex); > return err; > } > > @@ -1910,15 +1923,20 @@ void pci_disable_device(struct pci_dev *dev) > if (dr) > dr->enabled = 0; > > + mutex_lock(&dev->enable_mutex); > dev_WARN_ONCE(&dev->dev, atomic_read(&dev->enable_cnt) <= 0, > "disabling already-disabled device"); > > - if (atomic_dec_return(&dev->enable_cnt) != 0) > + if (atomic_dec_return(&dev->enable_cnt) != 0) { > + mutex_unlock(&dev->enable_mutex); > return; > + } > > do_pci_disable_device(dev); > > dev->is_busmaster = 0; > + > + mutex_unlock(&dev->enable_mutex); > } > EXPORT_SYMBOL(pci_disable_device); > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > index 2ec0df04e0dc..977a127ce791 100644 > --- a/drivers/pci/probe.c > +++ b/drivers/pci/probe.c > @@ -2267,6 +2267,7 @@ struct pci_dev *pci_alloc_dev(struct pci_bus *bus) > INIT_LIST_HEAD(&dev->bus_list); > dev->dev.type = &pci_dev_type; > dev->bus = pci_bus_get(bus); > + mutex_init(&dev->enable_mutex); > > return dev; > } > diff --git a/include/linux/pci.h b/include/linux/pci.h > index 77448215ef5b..cb2760a31fe2 100644 > --- a/include/linux/pci.h > +++ b/include/linux/pci.h > @@ -419,6 +419,7 @@ struct pci_dev { > unsigned int no_vf_scan:1; /* Don't scan for VFs after IOV enablement */ > pci_dev_flags_t dev_flags; > atomic_t enable_cnt; /* pci_enable_device has been called */ > + struct mutex enable_mutex; > > u32 saved_config_space[16]; /* Config space saved at suspend time */ > struct hlist_head saved_cap_space; > -- > 2.20.1 >
On 3/26/19 10:00 PM, Bjorn Helgaas wrote: > [+cc Srinath, Marta, LKML] > > On Mon, Mar 11, 2019 at 04:31:03PM +0300, Sergey Miroshnichenko wrote: >> CPU0 CPU1 >> >> pci_enable_device_mem() pci_enable_device_mem() >> pci_enable_bridge() pci_enable_bridge() >> pci_is_enabled() >> return false; >> atomic_inc_return(enable_cnt) >> Start actual enabling the bridge >> ... pci_is_enabled() >> ... return true; >> ... Start memory requests <-- FAIL >> ... >> Set the PCI_COMMAND_MEMORY bit <-- Must wait for this >> >> This patch protects the pci_enable/disable_device() and pci_enable_bridge() >> with mutexes. > > This is a subtle issue that we've tried to fix before, but we've never > had a satisfactory solution, so I hope you've figured out the right > fix. > > I'll include some links to previous discussion. This patch is very > similar to [2], which we didn't actually apply. We did apply the > patch from [3] as 40f11adc7cd9 ("PCI: Avoid race while enabling > upstream bridges"), but it caused the regressions reported in [4,5], > so we reverted it with 0f50a49e3008 ("Revert "PCI: Avoid race while > enabling upstream bridges""). > Thanks for the links, I wasn't aware of these discussions and patches! On PowerNV this issue is partially hidden by db2173198b95 ("powerpc/powernv/pci: Work around races in PCI bridge enabling"), and on x86 BIOS pre-initializes all the bridges, so it doesn't reproduce until hotplugging in a hotplugged bridge. This patch is indeed similar to 40f11adc7cd9 ("PCI: Avoid race while enabling upstream bridges"), but instead of a single static mutex it adds per-device mutexes and prevents the dev->enable_cnt from incrementing too early. So it's not needed anymore to carefully select a moment safe enough to enable the device. Serge > I think the underlying design problem is that we have a driver for > device B calling pci_enable_device(), and it is changing the state of > device A (an upstream bridge). The model generally is that a driver > should only touch the device it is bound to. > > It's tricky to get the locking right when several children of device A > all need to operate on A. > > That's all to say I'll have to think carefully about this particular > patch, so I'll go on to the others and come back to this one. > > Bjorn > > [1] https://lore.kernel.org/linux-pci/1494256190-28993-1-git-send-email-srinath.mannam@broadcom.com/T/#u > [RFC PATCH] pci: Concurrency issue in NVMe Init through PCIe switch > > [2] https://lore.kernel.org/linux-pci/1496135297-19680-1-git-send-email-srinath.mannam@broadcom.com/T/#u > [RFC PATCH v2] pci: Concurrency issue in NVMe Init through PCIe switch > > [3] https://lore.kernel.org/linux-pci/1501858648-22228-1-git-send-email-srinath.mannam@broadcom.com/T/#u > [RFC PATCH v3] pci: Concurrency issue during pci enable bridge > > [4] https://lore.kernel.org/linux-pci/150547971091.977464.16294045866179907260.stgit@buzz/T/#u > [PATCH bisected regression in 4.14] PCI: fix race while enabling upstream bridges concurrently > > [5] https://lore.kernel.org/linux-wireless/04c9b578-693c-1dc6-9f0f-904580231b21@kernel.dk/T/#u > iwlwifi firmware load broken in current -git > > [6] https://lore.kernel.org/linux-pci/744877924.5841545.1521630049567.JavaMail.zimbra@kalray.eu/T/#u > [RFC PATCH] nvme: avoid race-conditions when enabling devices > >> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com> >> --- >> drivers/pci/pci.c | 26 ++++++++++++++++++++++---- >> drivers/pci/probe.c | 1 + >> include/linux/pci.h | 1 + >> 3 files changed, 24 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c >> index f006068be209..895201d4c9e6 100644 >> --- a/drivers/pci/pci.c >> +++ b/drivers/pci/pci.c >> @@ -1615,6 +1615,8 @@ static void pci_enable_bridge(struct pci_dev *dev) >> struct pci_dev *bridge; >> int retval; >> >> + mutex_lock(&dev->enable_mutex); >> + >> bridge = pci_upstream_bridge(dev); >> if (bridge) >> pci_enable_bridge(bridge); >> @@ -1622,6 +1624,7 @@ static void pci_enable_bridge(struct pci_dev *dev) >> if (pci_is_enabled(dev)) { >> if (!dev->is_busmaster) >> pci_set_master(dev); >> + mutex_unlock(&dev->enable_mutex); >> return; >> } >> >> @@ -1630,11 +1633,14 @@ static void pci_enable_bridge(struct pci_dev *dev) >> pci_err(dev, "Error enabling bridge (%d), continuing\n", >> retval); >> pci_set_master(dev); >> + mutex_unlock(&dev->enable_mutex); >> } >> >> static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags) >> { >> struct pci_dev *bridge; >> + /* Enable-locking of bridges is performed within the pci_enable_bridge() */ >> + bool need_lock = !dev->subordinate; >> int err; >> int i, bars = 0; >> >> @@ -1650,8 +1656,13 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags) >> dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK); >> } >> >> - if (atomic_inc_return(&dev->enable_cnt) > 1) >> + if (need_lock) >> + mutex_lock(&dev->enable_mutex); >> + if (pci_is_enabled(dev)) { >> + if (need_lock) >> + mutex_unlock(&dev->enable_mutex); >> return 0; /* already enabled */ >> + } >> >> bridge = pci_upstream_bridge(dev); >> if (bridge) >> @@ -1666,8 +1677,10 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags) >> bars |= (1 << i); >> >> err = do_pci_enable_device(dev, bars); >> - if (err < 0) >> - atomic_dec(&dev->enable_cnt); >> + if (err >= 0) >> + atomic_inc(&dev->enable_cnt); >> + if (need_lock) >> + mutex_unlock(&dev->enable_mutex); >> return err; >> } >> >> @@ -1910,15 +1923,20 @@ void pci_disable_device(struct pci_dev *dev) >> if (dr) >> dr->enabled = 0; >> >> + mutex_lock(&dev->enable_mutex); >> dev_WARN_ONCE(&dev->dev, atomic_read(&dev->enable_cnt) <= 0, >> "disabling already-disabled device"); >> >> - if (atomic_dec_return(&dev->enable_cnt) != 0) >> + if (atomic_dec_return(&dev->enable_cnt) != 0) { >> + mutex_unlock(&dev->enable_mutex); >> return; >> + } >> >> do_pci_disable_device(dev); >> >> dev->is_busmaster = 0; >> + >> + mutex_unlock(&dev->enable_mutex); >> } >> EXPORT_SYMBOL(pci_disable_device); >> >> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c >> index 2ec0df04e0dc..977a127ce791 100644 >> --- a/drivers/pci/probe.c >> +++ b/drivers/pci/probe.c >> @@ -2267,6 +2267,7 @@ struct pci_dev *pci_alloc_dev(struct pci_bus *bus) >> INIT_LIST_HEAD(&dev->bus_list); >> dev->dev.type = &pci_dev_type; >> dev->bus = pci_bus_get(bus); >> + mutex_init(&dev->enable_mutex); >> >> return dev; >> } >> diff --git a/include/linux/pci.h b/include/linux/pci.h >> index 77448215ef5b..cb2760a31fe2 100644 >> --- a/include/linux/pci.h >> +++ b/include/linux/pci.h >> @@ -419,6 +419,7 @@ struct pci_dev { >> unsigned int no_vf_scan:1; /* Don't scan for VFs after IOV enablement */ >> pci_dev_flags_t dev_flags; >> atomic_t enable_cnt; /* pci_enable_device has been called */ >> + struct mutex enable_mutex; >> >> u32 saved_config_space[16]; /* Config space saved at suspend time */ >> struct hlist_head saved_cap_space; >> -- >> 2.20.1 >>
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index f006068be209..895201d4c9e6 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -1615,6 +1615,8 @@ static void pci_enable_bridge(struct pci_dev *dev) struct pci_dev *bridge; int retval; + mutex_lock(&dev->enable_mutex); + bridge = pci_upstream_bridge(dev); if (bridge) pci_enable_bridge(bridge); @@ -1622,6 +1624,7 @@ static void pci_enable_bridge(struct pci_dev *dev) if (pci_is_enabled(dev)) { if (!dev->is_busmaster) pci_set_master(dev); + mutex_unlock(&dev->enable_mutex); return; } @@ -1630,11 +1633,14 @@ static void pci_enable_bridge(struct pci_dev *dev) pci_err(dev, "Error enabling bridge (%d), continuing\n", retval); pci_set_master(dev); + mutex_unlock(&dev->enable_mutex); } static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags) { struct pci_dev *bridge; + /* Enable-locking of bridges is performed within the pci_enable_bridge() */ + bool need_lock = !dev->subordinate; int err; int i, bars = 0; @@ -1650,8 +1656,13 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags) dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK); } - if (atomic_inc_return(&dev->enable_cnt) > 1) + if (need_lock) + mutex_lock(&dev->enable_mutex); + if (pci_is_enabled(dev)) { + if (need_lock) + mutex_unlock(&dev->enable_mutex); return 0; /* already enabled */ + } bridge = pci_upstream_bridge(dev); if (bridge) @@ -1666,8 +1677,10 @@ static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags) bars |= (1 << i); err = do_pci_enable_device(dev, bars); - if (err < 0) - atomic_dec(&dev->enable_cnt); + if (err >= 0) + atomic_inc(&dev->enable_cnt); + if (need_lock) + mutex_unlock(&dev->enable_mutex); return err; } @@ -1910,15 +1923,20 @@ void pci_disable_device(struct pci_dev *dev) if (dr) dr->enabled = 0; + mutex_lock(&dev->enable_mutex); dev_WARN_ONCE(&dev->dev, atomic_read(&dev->enable_cnt) <= 0, "disabling already-disabled device"); - if (atomic_dec_return(&dev->enable_cnt) != 0) + if (atomic_dec_return(&dev->enable_cnt) != 0) { + mutex_unlock(&dev->enable_mutex); return; + } do_pci_disable_device(dev); dev->is_busmaster = 0; + + mutex_unlock(&dev->enable_mutex); } EXPORT_SYMBOL(pci_disable_device); diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 2ec0df04e0dc..977a127ce791 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -2267,6 +2267,7 @@ struct pci_dev *pci_alloc_dev(struct pci_bus *bus) INIT_LIST_HEAD(&dev->bus_list); dev->dev.type = &pci_dev_type; dev->bus = pci_bus_get(bus); + mutex_init(&dev->enable_mutex); return dev; } diff --git a/include/linux/pci.h b/include/linux/pci.h index 77448215ef5b..cb2760a31fe2 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -419,6 +419,7 @@ struct pci_dev { unsigned int no_vf_scan:1; /* Don't scan for VFs after IOV enablement */ pci_dev_flags_t dev_flags; atomic_t enable_cnt; /* pci_enable_device has been called */ + struct mutex enable_mutex; u32 saved_config_space[16]; /* Config space saved at suspend time */ struct hlist_head saved_cap_space;
CPU0 CPU1 pci_enable_device_mem() pci_enable_device_mem() pci_enable_bridge() pci_enable_bridge() pci_is_enabled() return false; atomic_inc_return(enable_cnt) Start actual enabling the bridge ... pci_is_enabled() ... return true; ... Start memory requests <-- FAIL ... Set the PCI_COMMAND_MEMORY bit <-- Must wait for this This patch protects the pci_enable/disable_device() and pci_enable_bridge() with mutexes. Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com> --- drivers/pci/pci.c | 26 ++++++++++++++++++++++---- drivers/pci/probe.c | 1 + include/linux/pci.h | 1 + 3 files changed, 24 insertions(+), 4 deletions(-)