Message ID | 1527043490-17268-1-git-send-email-okaya@codeaurora.org (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
[-cc Gabriele (invalid email address)] [+cc Don, esc.storagedev, linux-scsi since hpsa is involved] Background for newcomers: Ryan reported a panic on shutdown/reboot [1] on DL360 Gen9. I think the problem is that the shutdown path clears PCI_COMMAND_MASTER on the Root Port leading to an hpsa device, the hpsa device attempts a DMA, the Root Port treats the DMA as an Unsupported Request (a fatal error), and that leads to a panic. This patch avoids the problem by changing the Root Port shutdown path so it doesn't clear PCI_COMMAND_MASTER. [1] https://bugzilla.kernel.org/show_bug.cgi?id=199779 On Tue, May 22, 2018 at 10:44:46PM -0400, Sinan Kaya wrote: > 'Commit cc27b735ad3a ("PCI/portdrv: Turn off PCIe services during > shutdown")' has been added to kernel to shutdown pending PCIe port > service interrupts during reboot so that a newly started kexec kernel > wouldn't observe pending interrupts. > > pcie_port_device_remove() is disabling the root port and switches by > calling pci_disable_device() after all PCIe service drivers are shutdown. It's interesting and annoying that pci_enable_device() and pci_disable_device() sound like they should be inverses, or at least related, but they really aren't. pci_enable_device() basically turns on PCI_COMMAND_MEMORY and/or PCI_COMMAND_IO so the device will respond to programmed I/O. But pci_disable_device() leaves programmed I/O enabled and turns off PCI_COMMAND_MASTER, which keeps the device from performing DMA or forwarding transactions upstream (in the case of bridges). Not an issue for this patch, just an observation :) > pci_disable_device() has a much wider impact then port service itself and > it prevents all inbound transactions to reach to the system and impacts > the entire PCI traffic behind the bridge. > > Issue is that pcie_port_device_remove() doesn't maintain any coordination > with the rest of the PCI device drivers in the system before clearing the > bus master bit. I am interested in this part because if there really isn't any coordination, that's a good argument for getting rid of portdrv, which I want to do anyway. But I haven't dug into this enough to verify it. The portdrv does register as a regular PCI driver (pcie_portdriver), and I'm pretty sure the driver core is smart enough to call the driver entry points in a postorder traversal (children before parents), which means the hpsa .shutdown() should be called before the portdrv .shutdown(). The crash seems to indicate that the hpsa device attempted a DMA after we cleared the Root Port's PCI_COMMAND_MASTER, which means hpsa_shutdown() didn't stop DMA from the device (it looks like *most* shutdown methods don't disable device DMA, so it's in good company). > This has been found to cause crashes on HP DL360 Gen9 machines during > reboot. Besides, kexec is already clearing the bus master bit in > pci_device_shutdown() after all PCI drivers are removed. The original path was: pci_device_shutdown(hpsa) drv->shutdown hpsa_shutdown # hpsa_pci_driver.shutdown ... pci_device_shutdown(RP) # root port drv->shutdown pcie_portdrv_remove # pcie_portdriver.shutdown pcie_port_device_remove pci_disable_device do_pci_disable_device # clear RP PCI_COMMAND_MASTER if (kexec) pci_clear_master(RP) # clear RP PCI_COMMAND_MASTER If I understand correctly, the new path after this patch is: pci_device_shutdown(hpsa) drv->shutdown hpsa_shutdown # hpsa_pci_driver.shutdown ... pci_device_shutdown(RP) # root port drv->shutdown pcie_portdrv_shutdown # pcie_portdriver.shutdown __pcie_portdrv_remove(RP, false) pcie_port_device_remove(RP, false) # do NOT clear RP PCI_COMMAND_MASTER if (kexec) pci_clear_master(RP) # clear RP PCI_COMMAND_MASTER I guess this patch avoids the panic during reboot because we're not in the kexec path, so we never clear PCI_COMMAND_MASTER for the Root Port, so the hpsa device can DMA happily until the lights go out. But DMA continuing for some random amount of time before the reboot or shutdown happens makes me a little queasy. That doesn't sound safe. The more I think about this, the more confused I get. What am I missing? > Just remove the extra clear in shutdown path by seperating the remove and > shutdown APIs in the PORTDRV. > > Signed-off-by: Sinan Kaya <okaya@codeaurora.org> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=199779 > Fixes: cc27b735ad3a ("PCI/portdrv: Turn off PCIe services during shutdown") > Cc: stable@vger.kernel.org > Reported-by: Ryan Finnie <ryan@finnie.org> > --- > drivers/pci/pcie/portdrv.h | 2 +- > drivers/pci/pcie/portdrv_core.c | 5 +++-- > drivers/pci/pcie/portdrv_pci.c | 16 +++++++++++++--- > 3 files changed, 17 insertions(+), 6 deletions(-) > > diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h > index d0c6783..f6e88fe 100644 > --- a/drivers/pci/pcie/portdrv.h > +++ b/drivers/pci/pcie/portdrv.h > @@ -86,7 +86,7 @@ int pcie_port_device_register(struct pci_dev *dev); > int pcie_port_device_suspend(struct device *dev); > int pcie_port_device_resume(struct device *dev); > #endif > -void pcie_port_device_remove(struct pci_dev *dev); > +void pcie_port_device_remove(struct pci_dev *dev, bool disable); > int __must_check pcie_port_bus_register(void); > void pcie_port_bus_unregister(void); > > diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c > index c9c0663..f35341e 100644 > --- a/drivers/pci/pcie/portdrv_core.c > +++ b/drivers/pci/pcie/portdrv_core.c > @@ -405,11 +405,12 @@ static int remove_iter(struct device *dev, void *data) > * Remove PCI Express port service devices associated with given port and > * disable MSI-X or MSI for the port. > */ > -void pcie_port_device_remove(struct pci_dev *dev) > +void pcie_port_device_remove(struct pci_dev *dev, bool disable) > { > device_for_each_child(&dev->dev, NULL, remove_iter); > pci_free_irq_vectors(dev); > - pci_disable_device(dev); > + if (disable) > + pci_disable_device(dev); > } > > /** > diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c > index 973f1b8..29afaf3 100644 > --- a/drivers/pci/pcie/portdrv_pci.c > +++ b/drivers/pci/pcie/portdrv_pci.c > @@ -137,7 +137,7 @@ static int pcie_portdrv_probe(struct pci_dev *dev, > return 0; > } > > -static void pcie_portdrv_remove(struct pci_dev *dev) > +static void __pcie_portdrv_remove(struct pci_dev *dev, bool disable) > { > if (pci_bridge_d3_possible(dev)) { > pm_runtime_forbid(&dev->dev); > @@ -145,7 +145,17 @@ static void pcie_portdrv_remove(struct pci_dev *dev) > pm_runtime_dont_use_autosuspend(&dev->dev); > } > > - pcie_port_device_remove(dev); > + pcie_port_device_remove(dev, disable); > +} > + > +static void pcie_portdrv_remove(struct pci_dev *dev) > +{ > + return __pcie_portdrv_remove(dev, true); > +} > + > +static void pcie_portdrv_shutdown(struct pci_dev *dev) > +{ > + return __pcie_portdrv_remove(dev, false); > } > > static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev, > @@ -218,7 +228,7 @@ static struct pci_driver pcie_portdriver = { > > .probe = pcie_portdrv_probe, > .remove = pcie_portdrv_remove, > - .shutdown = pcie_portdrv_remove, > + .shutdown = pcie_portdrv_shutdown, What are the circumstances when we call .remove() vs .shutdown()? I guess the main (maybe only) way to call .remove() is to hot-remove the port? And .shutdown() is basically used in the reboot and kexec paths? > .err_handler = &pcie_portdrv_err_handler, > > -- > 2.7.4 >
On 5/23/2018 5:32 PM, Bjorn Helgaas wrote: > > The crash seems to indicate that the hpsa device attempted a DMA after > we cleared the Root Port's PCI_COMMAND_MASTER, which means > hpsa_shutdown() didn't stop DMA from the device (it looks like *most* > shutdown methods don't disable device DMA, so it's in good company). All drivers are expected to shutdown DMA and interrupts in their shutdown() routines. They can skip removing threads, data structures etc. but DMA and interrupt disabling are required. This is the difference between shutdown() and remove() callbacks. If you see that this is not being done in HPSA, then that is where the bugfix should be. Counter argument is that if shutdown() is not implemented, at least remove() should be called. Expecting all drivers to implement shutdown() callbacks is just bad by design in my opinion. Code should have fallen back to remove() if shutdown() doesn't exist. I can propose a patch for this but this is yet another story to chase. > >> This has been found to cause crashes on HP DL360 Gen9 machines during >> reboot. Besides, kexec is already clearing the bus master bit in >> pci_device_shutdown() after all PCI drivers are removed. > > The original path was: > > pci_device_shutdown(hpsa) > drv->shutdown > hpsa_shutdown # hpsa_pci_driver.shutdown > ... > pci_device_shutdown(RP) # root port > drv->shutdown > pcie_portdrv_remove # pcie_portdriver.shutdown > pcie_port_device_remove > pci_disable_device > do_pci_disable_device > # clear RP PCI_COMMAND_MASTER > if (kexec) > pci_clear_master(RP) > # clear RP PCI_COMMAND_MASTER > > If I understand correctly, the new path after this patch is: > > pci_device_shutdown(hpsa) > drv->shutdown > hpsa_shutdown # hpsa_pci_driver.shutdown > ... > pci_device_shutdown(RP) # root port > drv->shutdown > pcie_portdrv_shutdown # pcie_portdriver.shutdown > __pcie_portdrv_remove(RP, false) > pcie_port_device_remove(RP, false) > # do NOT clear RP PCI_COMMAND_MASTER yup > if (kexec) > pci_clear_master(RP) > # clear RP PCI_COMMAND_MASTER > > I guess this patch avoids the panic during reboot because we're not in > the kexec path, so we never clear PCI_COMMAND_MASTER for the Root > Port, so the hpsa device can DMA happily until the lights go out. > > But DMA continuing for some random amount of time before the reboot or > shutdown happens makes me a little queasy. That doesn't sound safe. > The more I think about this, the more confused I get. What am I > missing? see above. > >> Just remove the extra clear in shutdown path by seperating the remove and >> shutdown APIs in the PORTDRV. >> >> static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev, >> @@ -218,7 +228,7 @@ static struct pci_driver pcie_portdriver = { >> >> .probe = pcie_portdrv_probe, >> .remove = pcie_portdrv_remove, >> - .shutdown = pcie_portdrv_remove, >> + .shutdown = pcie_portdrv_shutdown, > > What are the circumstances when we call .remove() vs .shutdown()? > > I guess the main (maybe only) way to call .remove() is to hot-remove > the port? And .shutdown() is basically used in the reboot and kexec > paths? Correct. shutdown() is only called during reboot/shutdown calls. If you echo 1 into the remove file, remove() gets called. Handy for hotplug use cases. It needs to be the exact opposite of the probe. It needs to clean up resources etc. and have the HW in a state where it can be reinitialized via probe again. > >> .err_handler = &pcie_portdrv_err_handler, >> >> -- >> 2.7.4 >> >
On Wed, May 23, 2018 at 06:57:18PM -0400, Sinan Kaya wrote: > On 5/23/2018 5:32 PM, Bjorn Helgaas wrote: > > > > The crash seems to indicate that the hpsa device attempted a DMA after > > we cleared the Root Port's PCI_COMMAND_MASTER, which means > > hpsa_shutdown() didn't stop DMA from the device (it looks like *most* > > shutdown methods don't disable device DMA, so it's in good company). > > All drivers are expected to shutdown DMA and interrupts in their shutdown() > routines. They can skip removing threads, data structures etc. but DMA and > interrupt disabling are required. This is the difference between shutdown() > and remove() callbacks. > > If you see that this is not being done in HPSA, then that is where the > bugfix should be. > > Counter argument is that if shutdown() is not implemented, at least remove() > should be called. Expecting all drivers to implement shutdown() callbacks > is just bad by design in my opinion. > > Code should have fallen back to remove() if shutdown() doesn't exist. > I can propose a patch for this but this is yet another story to chase. That sounds like a reasonable idea, and it is definitely another can of worms. I looked briefly at some of the .shutdown() cases: - device_shutdown() doesn't fall back to remove(). - It looks like most bus_types don't implement .shutdown() at all (I didn't look at them all). - Of the bus_types that do implement .shutdown(), most do not fall back to .remove(). ps3_system_bus_type() is an example of one that *does* fall back to a driver's .remove() if there is no .shutdown(). Implement shutdown (no fallback unless indicated): ecard_bus_type gio_bus_type ps3_system_bus_type # does fallback to remove ibmebus_bus_type isa_bus_type platform_bus_type # not direct implementation fmc_bus_type # fmc_shutdown() looks spurious mipi_dsi_bus_type hv_bus > >> This has been found to cause crashes on HP DL360 Gen9 machines during > >> reboot. Besides, kexec is already clearing the bus master bit in > >> pci_device_shutdown() after all PCI drivers are removed. > > > > The original path was: > > > > pci_device_shutdown(hpsa) > > drv->shutdown > > hpsa_shutdown # hpsa_pci_driver.shutdown > > ... > > pci_device_shutdown(RP) # root port > > drv->shutdown > > pcie_portdrv_remove # pcie_portdriver.shutdown > > pcie_port_device_remove > > pci_disable_device > > do_pci_disable_device > > # clear RP PCI_COMMAND_MASTER > > if (kexec) > > pci_clear_master(RP) > > # clear RP PCI_COMMAND_MASTER > > > > If I understand correctly, the new path after this patch is: > > > > pci_device_shutdown(hpsa) > > drv->shutdown > > hpsa_shutdown # hpsa_pci_driver.shutdown > > ... > > pci_device_shutdown(RP) # root port > > drv->shutdown > > pcie_portdrv_shutdown # pcie_portdriver.shutdown > > __pcie_portdrv_remove(RP, false) > > pcie_port_device_remove(RP, false) > > # do NOT clear RP PCI_COMMAND_MASTER > > yup > > > if (kexec) > > pci_clear_master(RP) > > # clear RP PCI_COMMAND_MASTER > > > > I guess this patch avoids the panic during reboot because we're not in > > the kexec path, so we never clear PCI_COMMAND_MASTER for the Root > > Port, so the hpsa device can DMA happily until the lights go out. > > > > But DMA continuing for some random amount of time before the reboot or > > shutdown happens makes me a little queasy. That doesn't sound safe. > > The more I think about this, the more confused I get. What am I > > missing? > > see above. > > > > >> Just remove the extra clear in shutdown path by seperating the remove and > >> shutdown APIs in the PORTDRV. > >> > >> static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev, > >> @@ -218,7 +228,7 @@ static struct pci_driver pcie_portdriver = { > >> > >> .probe = pcie_portdrv_probe, > >> .remove = pcie_portdrv_remove, > >> - .shutdown = pcie_portdrv_remove, > >> + .shutdown = pcie_portdrv_shutdown, > > > > What are the circumstances when we call .remove() vs .shutdown()? > > > > I guess the main (maybe only) way to call .remove() is to hot-remove > > the port? And .shutdown() is basically used in the reboot and kexec > > paths? > > Correct. shutdown() is only called during reboot/shutdown calls. If you echo > 1 into the remove file, remove() gets called. Handy for hotplug use cases. > It needs to be the exact opposite of the probe. It needs to clean up resources > etc. and have the HW in a state where it can be reinitialized via probe again. > > > > >> .err_handler = &pcie_portdrv_err_handler, > >> > >> -- > >> 2.7.4 > >> > > > > > -- > Sinan Kaya > Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. > Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
On 5/24/2018 2:35 PM, Bjorn Helgaas wrote: > That sounds like a reasonable idea, and it is definitely another can > of worms. I looked briefly at some of the .shutdown() cases: should we throw it into 4.18 and see what happens?
On Fri, May 25, 2018 at 09:30:59AM -0400, Sinan Kaya wrote: > On 5/24/2018 2:35 PM, Bjorn Helgaas wrote: > > That sounds like a reasonable idea, and it is definitely another can > > of worms. I looked briefly at some of the .shutdown() cases: > > should we throw it into 4.18 and see what happens? It wouldn't solve this particular problem because hpsa *does* have a .shutdown() method. The problem is that it doesn't work -- it's supposed to stop DMA and interrupts but it apparently doesn't. I think we need to fix hpsa first.
On 2018-05-25 15:10, Bjorn Helgaas wrote: > On Fri, May 25, 2018 at 09:30:59AM -0400, Sinan Kaya wrote: >> On 5/24/2018 2:35 PM, Bjorn Helgaas wrote: >> > That sounds like a reasonable idea, and it is definitely another can >> > of worms. I looked briefly at some of the .shutdown() cases: >> >> should we throw it into 4.18 and see what happens? > > It wouldn't solve this particular problem because hpsa *does* have a > .shutdown() method. The problem is that it doesn't work -- it's > supposed to stop DMA and interrupts but it apparently doesn't. > > I think we need to fix hpsa first. Absolutely, the othet patch is a parallel issue.
Bjorn, On 5/25/2018 3:10 PM, Bjorn Helgaas wrote: > On Fri, May 25, 2018 at 09:30:59AM -0400, Sinan Kaya wrote: >> On 5/24/2018 2:35 PM, Bjorn Helgaas wrote: >>> That sounds like a reasonable idea, and it is definitely another can >>> of worms. I looked briefly at some of the .shutdown() cases: >> >> should we throw it into 4.18 and see what happens? > > It wouldn't solve this particular problem because hpsa *does* have a > .shutdown() method. The problem is that it doesn't work -- it's > supposed to stop DMA and interrupts but it apparently doesn't. > > I think we need to fix hpsa first. > I don't know if you followed my latest V3 patch but I found a way to put these two together to reach to an, IMO, more acceptable solution. Sinan
diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h index d0c6783..f6e88fe 100644 --- a/drivers/pci/pcie/portdrv.h +++ b/drivers/pci/pcie/portdrv.h @@ -86,7 +86,7 @@ int pcie_port_device_register(struct pci_dev *dev); int pcie_port_device_suspend(struct device *dev); int pcie_port_device_resume(struct device *dev); #endif -void pcie_port_device_remove(struct pci_dev *dev); +void pcie_port_device_remove(struct pci_dev *dev, bool disable); int __must_check pcie_port_bus_register(void); void pcie_port_bus_unregister(void); diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c index c9c0663..f35341e 100644 --- a/drivers/pci/pcie/portdrv_core.c +++ b/drivers/pci/pcie/portdrv_core.c @@ -405,11 +405,12 @@ static int remove_iter(struct device *dev, void *data) * Remove PCI Express port service devices associated with given port and * disable MSI-X or MSI for the port. */ -void pcie_port_device_remove(struct pci_dev *dev) +void pcie_port_device_remove(struct pci_dev *dev, bool disable) { device_for_each_child(&dev->dev, NULL, remove_iter); pci_free_irq_vectors(dev); - pci_disable_device(dev); + if (disable) + pci_disable_device(dev); } /** diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c index 973f1b8..29afaf3 100644 --- a/drivers/pci/pcie/portdrv_pci.c +++ b/drivers/pci/pcie/portdrv_pci.c @@ -137,7 +137,7 @@ static int pcie_portdrv_probe(struct pci_dev *dev, return 0; } -static void pcie_portdrv_remove(struct pci_dev *dev) +static void __pcie_portdrv_remove(struct pci_dev *dev, bool disable) { if (pci_bridge_d3_possible(dev)) { pm_runtime_forbid(&dev->dev); @@ -145,7 +145,17 @@ static void pcie_portdrv_remove(struct pci_dev *dev) pm_runtime_dont_use_autosuspend(&dev->dev); } - pcie_port_device_remove(dev); + pcie_port_device_remove(dev, disable); +} + +static void pcie_portdrv_remove(struct pci_dev *dev) +{ + return __pcie_portdrv_remove(dev, true); +} + +static void pcie_portdrv_shutdown(struct pci_dev *dev) +{ + return __pcie_portdrv_remove(dev, false); } static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev, @@ -218,7 +228,7 @@ static struct pci_driver pcie_portdriver = { .probe = pcie_portdrv_probe, .remove = pcie_portdrv_remove, - .shutdown = pcie_portdrv_remove, + .shutdown = pcie_portdrv_shutdown, .err_handler = &pcie_portdrv_err_handler,
'Commit cc27b735ad3a ("PCI/portdrv: Turn off PCIe services during shutdown")' has been added to kernel to shutdown pending PCIe port service interrupts during reboot so that a newly started kexec kernel wouldn't observe pending interrupts. pcie_port_device_remove() is disabling the root port and switches by calling pci_disable_device() after all PCIe service drivers are shutdown. pci_disable_device() has a much wider impact then port service itself and it prevents all inbound transactions to reach to the system and impacts the entire PCI traffic behind the bridge. Issue is that pcie_port_device_remove() doesn't maintain any coordination with the rest of the PCI device drivers in the system before clearing the bus master bit. This has been found to cause crashes on HP DL360 Gen9 machines during reboot. Besides, kexec is already clearing the bus master bit in pci_device_shutdown() after all PCI drivers are removed. Just remove the extra clear in shutdown path by seperating the remove and shutdown APIs in the PORTDRV. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Link: https://bugzilla.kernel.org/show_bug.cgi?id=199779 Fixes: cc27b735ad3a ("PCI/portdrv: Turn off PCIe services during shutdown") Cc: stable@vger.kernel.org Reported-by: Ryan Finnie <ryan@finnie.org> --- drivers/pci/pcie/portdrv.h | 2 +- drivers/pci/pcie/portdrv_core.c | 5 +++-- drivers/pci/pcie/portdrv_pci.c | 16 +++++++++++++--- 3 files changed, 17 insertions(+), 6 deletions(-)