Message ID | 20170523054202.7985-2-hch@lst.de (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
Hi, On Tue, May 23, 2017 at 07:42:02AM +0200, Christoph Hellwig wrote: > Without this ->notify_reset instance may race with ->remove calls, > which can be easily triggered in NVMe. > Any input on this sometime before next -rc release would be great ? > Reported-by: Rakesh Pandit <rakesh@tuxera.com> > Tested-by: Rakesh Pandit <rakesh@tuxera.com> > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > drivers/pci/pci.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index b01bd5bba8e6..b61ad77dc322 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -4275,11 +4275,13 @@ int pci_reset_function(struct pci_dev *dev) > if (rc) > return rc; > > + pci_dev_lock(dev); > pci_dev_save_and_disable(dev); > > - rc = pci_dev_reset(dev, 0); > + rc = __pci_dev_reset(dev, 0); > > pci_dev_restore(dev); > + pci_dev_unlock(dev); > > return rc; > } > -- > 2.11.0 >
[+cc Alex] On Tue, May 23, 2017 at 07:42:02AM +0200, Christoph Hellwig wrote: > Without this ->notify_reset instance may race with ->remove calls, Do you mean the .reset_notify() method in struct pci_error_handlers? I don't see a "notify_reset" symbol. Can you elaborate on exactly how this race happens? I'm trying to figure out whether this is also a problem or potential problem with other reset paths like pci_try_reset_function(), pci_reset_bus(), pci_try_reset_bus(), pci_reset_slot(), and pci_try_reset_slot(). What does the race look like when it happens? Oops, panic, etc? Can this also be triggered via the sysfs "reset" file? > which can be easily triggered in NVMe. > > Reported-by: Rakesh Pandit <rakesh@tuxera.com> > Tested-by: Rakesh Pandit <rakesh@tuxera.com> > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > drivers/pci/pci.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index b01bd5bba8e6..b61ad77dc322 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -4275,11 +4275,13 @@ int pci_reset_function(struct pci_dev *dev) > if (rc) > return rc; > > + pci_dev_lock(dev); > pci_dev_save_and_disable(dev); > > - rc = pci_dev_reset(dev, 0); > + rc = __pci_dev_reset(dev, 0); > > pci_dev_restore(dev); > + pci_dev_unlock(dev); > > return rc; > } > -- > 2.11.0 >
On Tue, May 30, 2017 at 05:28:44PM -0500, Bjorn Helgaas wrote: > [+cc Alex] > > On Tue, May 23, 2017 at 07:42:02AM +0200, Christoph Hellwig wrote: > > Without this ->notify_reset instance may race with ->remove calls, > > Do you mean the .reset_notify() method in struct pci_error_handlers? > I don't see a "notify_reset" symbol. Yes. > Can you elaborate on exactly how this race happens? I'm trying to > figure out whether this is also a problem or potential problem with > other reset paths like pci_try_reset_function(), pci_reset_bus(), > pci_try_reset_bus(), pci_reset_slot(), and pci_try_reset_slot(). > > What does the race look like when it happens? Oops, panic, etc? > > Can this also be triggered via the sysfs "reset" file? Here is the original bug report: http://lists.infradead.org/pipermail/linux-nvme/2017-May/010345.html the issue is triggered through sysfs.
On Wed, May 31, 2017 at 06:58:48AM +0200, Christoph Hellwig wrote: > On Tue, May 30, 2017 at 05:28:44PM -0500, Bjorn Helgaas wrote: > > [+cc Alex] > > > > On Tue, May 23, 2017 at 07:42:02AM +0200, Christoph Hellwig wrote: > > > Without this ->notify_reset instance may race with ->remove calls, > > > > Do you mean the .reset_notify() method in struct pci_error_handlers? > > I don't see a "notify_reset" symbol. > > Yes. > > > Can you elaborate on exactly how this race happens? I'm trying to > > figure out whether this is also a problem or potential problem with > > other reset paths like pci_try_reset_function(), pci_reset_bus(), > > pci_try_reset_bus(), pci_reset_slot(), and pci_try_reset_slot(). > > > > What does the race look like when it happens? Oops, panic, etc? > > > > Can this also be triggered via the sysfs "reset" file? > > Here is the original bug report: > > http://lists.infradead.org/pipermail/linux-nvme/2017-May/010345.html > > the issue is triggered through sysfs. Thanks. Do you have any thoughts on whether the paths I mentioned above are also susceptible? I don't want to apply a patch that fixes one issue while leaving similar ones unfixed. Bjorn
On Wed, May 31, 2017 at 11:51:31AM -0500, Bjorn Helgaas wrote: > Do you have any thoughts on whether the paths I mentioned above are > also susceptible? At least in theory ever method call that doesn't have the device locked is susceptible, so yes.
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index b01bd5bba8e6..b61ad77dc322 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4275,11 +4275,13 @@ int pci_reset_function(struct pci_dev *dev) if (rc) return rc; + pci_dev_lock(dev); pci_dev_save_and_disable(dev); - rc = pci_dev_reset(dev, 0); + rc = __pci_dev_reset(dev, 0); pci_dev_restore(dev); + pci_dev_unlock(dev); return rc; }