Message ID | 20180918235848.26694-1-keith.busch@intel.com (mailing list archive) |
---|---|
Headers | show |
Series | error handling and pciehp maintenance | expand |
On Tue, Sep 18, 2018 at 05:58:36PM -0600, Keith Busch wrote: > I ran into a lot of trouble testing error handling, and this series is > just trying to simplify some things. The first 4 fix up aer_inject, and > the rest are cleanup to make better use of kernel APIs. > > Keith Busch (12): > PCI: Set PCI bus accessors to noinline > PCI/AER: Covertly inject errors > PCI/AER: Reuse existing service device lookup > PCI/AER: Abstract AER interrupt handling > PCI/AER: Remove dead code > PCI/AER: Remove error source from aer struct > PCI/AER: Use kfifo for tracking events > PCI/AER: Use kfifo helper inserting locked elements > PCI/AER: Don't read upstream ports below fatal errors > PCI/AER: Use threaded IRQ for bottom half > PCI/AER: Use managed resource allocations > PCI/pciehp: Use device managed allocations > > drivers/pci/access.c | 4 +- > drivers/pci/hotplug/pciehp_core.c | 14 +- > drivers/pci/hotplug/pciehp_hpc.c | 48 ++---- > drivers/pci/pcie/Kconfig | 2 +- > drivers/pci/pcie/aer.c | 219 ++++++--------------------- > drivers/pci/pcie/aer_inject.c | 306 ++++++++++++++++++++------------------ > drivers/pci/pcie/portdrv.h | 4 - > drivers/pci/pcie/portdrv_core.c | 1 + > 8 files changed, 227 insertions(+), 371 deletions(-) Thanks a lot for doing this! I applied these to pci/hotplug for v4.20, except for "PCI/AER: Don't read upstream ports below fatal errors", which seems to be already there via another posting, and "PCI/pciehp: Use device managed allocations", which needs a few tweaks.
On Thu, Oct 04, 2018 at 04:40:15PM -0500, Bjorn Helgaas wrote: > On Tue, Sep 18, 2018 at 05:58:36PM -0600, Keith Busch wrote: > > I ran into a lot of trouble testing error handling, and this series is > > just trying to simplify some things. The first 4 fix up aer_inject, and > > the rest are cleanup to make better use of kernel APIs. > > > > Keith Busch (12): > > PCI: Set PCI bus accessors to noinline > > PCI/AER: Covertly inject errors > > PCI/AER: Reuse existing service device lookup > > PCI/AER: Abstract AER interrupt handling > > PCI/AER: Remove dead code > > PCI/AER: Remove error source from aer struct > > PCI/AER: Use kfifo for tracking events > > PCI/AER: Use kfifo helper inserting locked elements > > PCI/AER: Don't read upstream ports below fatal errors > > PCI/AER: Use threaded IRQ for bottom half > > PCI/AER: Use managed resource allocations > > PCI/pciehp: Use device managed allocations > > > > drivers/pci/access.c | 4 +- > > drivers/pci/hotplug/pciehp_core.c | 14 +- > > drivers/pci/hotplug/pciehp_hpc.c | 48 ++---- > > drivers/pci/pcie/Kconfig | 2 +- > > drivers/pci/pcie/aer.c | 219 ++++++--------------------- > > drivers/pci/pcie/aer_inject.c | 306 ++++++++++++++++++++------------------ > > drivers/pci/pcie/portdrv.h | 4 - > > drivers/pci/pcie/portdrv_core.c | 1 + > > 8 files changed, 227 insertions(+), 371 deletions(-) > > Thanks a lot for doing this! I applied these to pci/hotplug for > v4.20, except for "PCI/AER: Don't read upstream ports below fatal > errors", which seems to be already there via another posting, and > "PCI/pciehp: Use device managed allocations", which needs a few > tweaks. Sounds good, and thanks for applying! In case this went unnoticed, patch 2's aer_inject using ftrace hooks to pci config accessors is really cool and fixes several kernel crashes I encountered, but it may not work on every architecture. I'm not sure how widely aer_inject is used, so maybe there are no concerns with the DYNAMIC_FTRACE_WITH_REGS dependency, but I just want to reemphasize that dependency in case there are valid objections.
[+cc arm64 folks, LKML: This conversation is about this patch: https://lore.kernel.org/linux-pci/20180918235848.26694-3-keith.busch@intel.com which fixes some PCIe AER error injection bugs, but also makes the error injector dependent on DYNAMIC_FTRACE_WITH_REGS, which not all arches support. Note that this question is only about the error *injection* module used for testing. It doesn't affect AER support itself.] On Thu, Oct 04, 2018 at 04:11:37PM -0600, Keith Busch wrote: > On Thu, Oct 04, 2018 at 04:40:15PM -0500, Bjorn Helgaas wrote: > > On Tue, Sep 18, 2018 at 05:58:36PM -0600, Keith Busch wrote: > > > I ran into a lot of trouble testing error handling, and this series is > > > just trying to simplify some things. The first 4 fix up aer_inject, and > > > the rest are cleanup to make better use of kernel APIs. > > > > > > Keith Busch (12): > > > PCI: Set PCI bus accessors to noinline > > > PCI/AER: Covertly inject errors > > > PCI/AER: Reuse existing service device lookup > > > PCI/AER: Abstract AER interrupt handling > > > PCI/AER: Remove dead code > > > PCI/AER: Remove error source from aer struct > > > PCI/AER: Use kfifo for tracking events > > > PCI/AER: Use kfifo helper inserting locked elements > > > PCI/AER: Don't read upstream ports below fatal errors > > > PCI/AER: Use threaded IRQ for bottom half > > > PCI/AER: Use managed resource allocations > > > PCI/pciehp: Use device managed allocations > > > > > > drivers/pci/access.c | 4 +- > > > drivers/pci/hotplug/pciehp_core.c | 14 +- > > > drivers/pci/hotplug/pciehp_hpc.c | 48 ++---- > > > drivers/pci/pcie/Kconfig | 2 +- > > > drivers/pci/pcie/aer.c | 219 ++++++--------------------- > > > drivers/pci/pcie/aer_inject.c | 306 ++++++++++++++++++++------------------ > > > drivers/pci/pcie/portdrv.h | 4 - > > > drivers/pci/pcie/portdrv_core.c | 1 + > > > 8 files changed, 227 insertions(+), 371 deletions(-) > > > > Thanks a lot for doing this! I applied these to pci/hotplug for > > v4.20, except for "PCI/AER: Don't read upstream ports below fatal > > errors", which seems to be already there via another posting, and > > "PCI/pciehp: Use device managed allocations", which needs a few > > tweaks. > > Sounds good, and thanks for applying! > > In case this went unnoticed, patch 2's aer_inject using ftrace hooks > to pci config accessors is really cool and fixes several kernel crashes > I encountered, but it may not work on every architecture. I'm not sure > how widely aer_inject is used, so maybe there are no concerns with the > DYNAMIC_FTRACE_WITH_REGS dependency, but I just want to reemphasize that > dependency in case there are valid objections. Oh, indeed, I hadn't noticed this arch dependency. AFAICT, the new DYNAMIC_FTRACE_WITH_REGS dependency means aer_inject will work only on these arches: arm # if (!XIP_KERNEL) && !CPU_ENDIAN_BE32 && MMU powerpc # if PPC64 && CPU_LITTLE_ENDIAN riscv # ARCH_RV64I only s390 x86 Notably missing is arm64, which has DYNAMIC_FTRACE but not DYNAMIC_FTRACE_WITH_REGS. Bjorn
On Fri, Oct 05, 2018 at 12:31:45PM -0500, Bjorn Helgaas wrote: > [+cc arm64 folks, LKML: This conversation is about this patch: > > https://lore.kernel.org/linux-pci/20180918235848.26694-3-keith.busch@intel.com > > which fixes some PCIe AER error injection bugs, but also makes the error > injector dependent on DYNAMIC_FTRACE_WITH_REGS, which not all arches > support. Note that this question is only about the error *injection* > module used for testing. It doesn't affect AER support itself.] > > On Thu, Oct 04, 2018 at 04:11:37PM -0600, Keith Busch wrote: > > On Thu, Oct 04, 2018 at 04:40:15PM -0500, Bjorn Helgaas wrote: > > > On Tue, Sep 18, 2018 at 05:58:36PM -0600, Keith Busch wrote: > > > > I ran into a lot of trouble testing error handling, and this series is > > > > just trying to simplify some things. The first 4 fix up aer_inject, and > > > > the rest are cleanup to make better use of kernel APIs. > > > > > > > > Keith Busch (12): > > > > PCI: Set PCI bus accessors to noinline > > > > PCI/AER: Covertly inject errors > > > > PCI/AER: Reuse existing service device lookup > > > > PCI/AER: Abstract AER interrupt handling > > > > PCI/AER: Remove dead code > > > > PCI/AER: Remove error source from aer struct > > > > PCI/AER: Use kfifo for tracking events > > > > PCI/AER: Use kfifo helper inserting locked elements > > > > PCI/AER: Don't read upstream ports below fatal errors > > > > PCI/AER: Use threaded IRQ for bottom half > > > > PCI/AER: Use managed resource allocations > > > > PCI/pciehp: Use device managed allocations > > > > > > > > drivers/pci/access.c | 4 +- > > > > drivers/pci/hotplug/pciehp_core.c | 14 +- > > > > drivers/pci/hotplug/pciehp_hpc.c | 48 ++---- > > > > drivers/pci/pcie/Kconfig | 2 +- > > > > drivers/pci/pcie/aer.c | 219 ++++++--------------------- > > > > drivers/pci/pcie/aer_inject.c | 306 ++++++++++++++++++++------------------ > > > > drivers/pci/pcie/portdrv.h | 4 - > > > > drivers/pci/pcie/portdrv_core.c | 1 + > > > > 8 files changed, 227 insertions(+), 371 deletions(-) > > > > > > Thanks a lot for doing this! I applied these to pci/hotplug for > > > v4.20, except for "PCI/AER: Don't read upstream ports below fatal > > > errors", which seems to be already there via another posting, and > > > "PCI/pciehp: Use device managed allocations", which needs a few > > > tweaks. > > > > Sounds good, and thanks for applying! > > > > In case this went unnoticed, patch 2's aer_inject using ftrace hooks > > to pci config accessors is really cool and fixes several kernel crashes > > I encountered, but it may not work on every architecture. I'm not sure > > how widely aer_inject is used, so maybe there are no concerns with the > > DYNAMIC_FTRACE_WITH_REGS dependency, but I just want to reemphasize that > > dependency in case there are valid objections. > > Oh, indeed, I hadn't noticed this arch dependency. AFAICT, the new > DYNAMIC_FTRACE_WITH_REGS dependency means aer_inject will work only > on these arches: > > arm # if (!XIP_KERNEL) && !CPU_ENDIAN_BE32 && MMU > powerpc # if PPC64 && CPU_LITTLE_ENDIAN > riscv # ARCH_RV64I only > s390 > x86 > > Notably missing is arm64, which has DYNAMIC_FTRACE but not > DYNAMIC_FTRACE_WITH_REGS. > > Bjorn Looks like the kbuild bot found an ARM kernel config that has DYNAMIC_FTRACE_WITH_REGS set, and then the module can't compile there. I'll need to update this patch regardless, and I think the right thing to do is maintain the "old" way with conditional compiling for any arch specific features.
On Mon, Oct 08, 2018 at 10:18:47AM -0600, Keith Busch wrote: > On Fri, Oct 05, 2018 at 12:31:45PM -0500, Bjorn Helgaas wrote: > > [+cc arm64 folks, LKML: This conversation is about this patch: > > > > https://lore.kernel.org/linux-pci/20180918235848.26694-3-keith.busch@intel.com > > > > which fixes some PCIe AER error injection bugs, but also makes the error > > injector dependent on DYNAMIC_FTRACE_WITH_REGS, which not all arches > > support. Note that this question is only about the error *injection* > > module used for testing. It doesn't affect AER support itself.] > > > > On Thu, Oct 04, 2018 at 04:11:37PM -0600, Keith Busch wrote: > > > On Thu, Oct 04, 2018 at 04:40:15PM -0500, Bjorn Helgaas wrote: > > > > On Tue, Sep 18, 2018 at 05:58:36PM -0600, Keith Busch wrote: > > > > > I ran into a lot of trouble testing error handling, and this series is > > > > > just trying to simplify some things. The first 4 fix up aer_inject, and > > > > > the rest are cleanup to make better use of kernel APIs. > > > > > > > > > > Keith Busch (12): > > > > > PCI: Set PCI bus accessors to noinline > > > > > PCI/AER: Covertly inject errors > > > > > PCI/AER: Reuse existing service device lookup > > > > > PCI/AER: Abstract AER interrupt handling > > > > > PCI/AER: Remove dead code > > > > > PCI/AER: Remove error source from aer struct > > > > > PCI/AER: Use kfifo for tracking events > > > > > PCI/AER: Use kfifo helper inserting locked elements > > > > > PCI/AER: Don't read upstream ports below fatal errors > > > > > PCI/AER: Use threaded IRQ for bottom half > > > > > PCI/AER: Use managed resource allocations > > > > > PCI/pciehp: Use device managed allocations > > > > > > > > > > drivers/pci/access.c | 4 +- > > > > > drivers/pci/hotplug/pciehp_core.c | 14 +- > > > > > drivers/pci/hotplug/pciehp_hpc.c | 48 ++---- > > > > > drivers/pci/pcie/Kconfig | 2 +- > > > > > drivers/pci/pcie/aer.c | 219 ++++++--------------------- > > > > > drivers/pci/pcie/aer_inject.c | 306 ++++++++++++++++++++------------------ > > > > > drivers/pci/pcie/portdrv.h | 4 - > > > > > drivers/pci/pcie/portdrv_core.c | 1 + > > > > > 8 files changed, 227 insertions(+), 371 deletions(-) > > > > > > > > Thanks a lot for doing this! I applied these to pci/hotplug for > > > > v4.20, except for "PCI/AER: Don't read upstream ports below fatal > > > > errors", which seems to be already there via another posting, and > > > > "PCI/pciehp: Use device managed allocations", which needs a few > > > > tweaks. > > > > > > Sounds good, and thanks for applying! > > > > > > In case this went unnoticed, patch 2's aer_inject using ftrace hooks > > > to pci config accessors is really cool and fixes several kernel crashes > > > I encountered, but it may not work on every architecture. I'm not sure > > > how widely aer_inject is used, so maybe there are no concerns with the > > > DYNAMIC_FTRACE_WITH_REGS dependency, but I just want to reemphasize that > > > dependency in case there are valid objections. > > > > Oh, indeed, I hadn't noticed this arch dependency. AFAICT, the new > > DYNAMIC_FTRACE_WITH_REGS dependency means aer_inject will work only > > on these arches: > > > > arm # if (!XIP_KERNEL) && !CPU_ENDIAN_BE32 && MMU > > powerpc # if PPC64 && CPU_LITTLE_ENDIAN > > riscv # ARCH_RV64I only > > s390 > > x86 > > > > Notably missing is arm64, which has DYNAMIC_FTRACE but not > > DYNAMIC_FTRACE_WITH_REGS. > > > > Bjorn > > Looks like the kbuild bot found an ARM kernel config that has > DYNAMIC_FTRACE_WITH_REGS set, and then the module can't compile > there. I'll need to update this patch regardless, and I think the right > thing to do is maintain the "old" way with conditional compiling for > any arch specific features. Sounds messy, but probably the best route. I dropped these patches for now: PCI/AER: Covertly inject errors with ftrace hooks PCI/AER: Reuse existing pcie_port_find_device() interface PCI/AER: Abstract AER interrupt handling
Hi Bjorn, On Fri, Oct 05, 2018 at 12:31:45PM -0500, Bjorn Helgaas wrote: > [+cc arm64 folks, LKML: This conversation is about this patch: > > https://lore.kernel.org/linux-pci/20180918235848.26694-3-keith.busch@intel.com > > which fixes some PCIe AER error injection bugs, but also makes the error > injector dependent on DYNAMIC_FTRACE_WITH_REGS, which not all arches > support. Note that this question is only about the error *injection* > module used for testing. It doesn't affect AER support itself.] > > On Thu, Oct 04, 2018 at 04:11:37PM -0600, Keith Busch wrote: > > In case this went unnoticed, patch 2's aer_inject using ftrace hooks > > to pci config accessors is really cool and fixes several kernel crashes > > I encountered, but it may not work on every architecture. I'm not sure > > how widely aer_inject is used, so maybe there are no concerns with the > > DYNAMIC_FTRACE_WITH_REGS dependency, but I just want to reemphasize that > > dependency in case there are valid objections. > > Oh, indeed, I hadn't noticed this arch dependency. AFAICT, the new > DYNAMIC_FTRACE_WITH_REGS dependency means aer_inject will work only > on these arches: > > arm # if (!XIP_KERNEL) && !CPU_ENDIAN_BE32 && MMU > powerpc # if PPC64 && CPU_LITTLE_ENDIAN > riscv # ARCH_RV64I only > s390 > x86 > > Notably missing is arm64, which has DYNAMIC_FTRACE but not > DYNAMIC_FTRACE_WITH_REGS. Thanks for the heads-up here. This feature is currently in development for arm64: http://lkml.kernel.org/r/20181001141648.1DBED68BDF@newverein.lst.de but the latest review comments suggest that it's a fair way from ready yet, so I wouldn't hold your breath. Will
On Mon, Oct 08, 2018 at 10:18:47AM -0600, Keith Busch wrote: > On Fri, Oct 05, 2018 at 12:31:45PM -0500, Bjorn Helgaas wrote: > > [+cc arm64 folks, LKML: This conversation is about this patch: > > > > https://lore.kernel.org/linux-pci/20180918235848.26694-3-keith.busch@intel.com > > > > which fixes some PCIe AER error injection bugs, but also makes the error > > injector dependent on DYNAMIC_FTRACE_WITH_REGS, which not all arches > > support. Note that this question is only about the error *injection* > > module used for testing. It doesn't affect AER support itself.] > > > > On Thu, Oct 04, 2018 at 04:11:37PM -0600, Keith Busch wrote: > > > On Thu, Oct 04, 2018 at 04:40:15PM -0500, Bjorn Helgaas wrote: > > > > On Tue, Sep 18, 2018 at 05:58:36PM -0600, Keith Busch wrote: > > > > > I ran into a lot of trouble testing error handling, and this series is > > > > > just trying to simplify some things. The first 4 fix up aer_inject, and > > > > > the rest are cleanup to make better use of kernel APIs. > > > > > > > > > > Keith Busch (12): > > > > > PCI: Set PCI bus accessors to noinline > > > > > PCI/AER: Covertly inject errors > > > > > PCI/AER: Reuse existing service device lookup > > > > > PCI/AER: Abstract AER interrupt handling > > > > > PCI/AER: Remove dead code > > > > > PCI/AER: Remove error source from aer struct > > > > > PCI/AER: Use kfifo for tracking events > > > > > PCI/AER: Use kfifo helper inserting locked elements > > > > > PCI/AER: Don't read upstream ports below fatal errors > > > > > PCI/AER: Use threaded IRQ for bottom half > > > > > PCI/AER: Use managed resource allocations > > > > > PCI/pciehp: Use device managed allocations > > > > > > > > > > drivers/pci/access.c | 4 +- > > > > > drivers/pci/hotplug/pciehp_core.c | 14 +- > > > > > drivers/pci/hotplug/pciehp_hpc.c | 48 ++---- > > > > > drivers/pci/pcie/Kconfig | 2 +- > > > > > drivers/pci/pcie/aer.c | 219 ++++++--------------------- > > > > > drivers/pci/pcie/aer_inject.c | 306 ++++++++++++++++++++------------------ > > > > > drivers/pci/pcie/portdrv.h | 4 - > > > > > drivers/pci/pcie/portdrv_core.c | 1 + > > > > > 8 files changed, 227 insertions(+), 371 deletions(-) > > > > > > > > Thanks a lot for doing this! I applied these to pci/hotplug for > > > > v4.20, except for "PCI/AER: Don't read upstream ports below fatal > > > > errors", which seems to be already there via another posting, and > > > > "PCI/pciehp: Use device managed allocations", which needs a few > > > > tweaks. > > > > > > Sounds good, and thanks for applying! > > > > > > In case this went unnoticed, patch 2's aer_inject using ftrace hooks > > > to pci config accessors is really cool and fixes several kernel crashes > > > I encountered, but it may not work on every architecture. I'm not sure > > > how widely aer_inject is used, so maybe there are no concerns with the > > > DYNAMIC_FTRACE_WITH_REGS dependency, but I just want to reemphasize that > > > dependency in case there are valid objections. > > > > Oh, indeed, I hadn't noticed this arch dependency. AFAICT, the new > > DYNAMIC_FTRACE_WITH_REGS dependency means aer_inject will work only > > on these arches: > > > > arm # if (!XIP_KERNEL) && !CPU_ENDIAN_BE32 && MMU > > powerpc # if PPC64 && CPU_LITTLE_ENDIAN > > riscv # ARCH_RV64I only > > s390 > > x86 > > > > Notably missing is arm64, which has DYNAMIC_FTRACE but not > > DYNAMIC_FTRACE_WITH_REGS. > > > > Bjorn > > Looks like the kbuild bot found an ARM kernel config that has > DYNAMIC_FTRACE_WITH_REGS set, and then the module can't compile > there. I'll need to update this patch regardless, and I think the right > thing to do is maintain the "old" way with conditional compiling for > any arch specific features. The question is whether we really need to dynamically patch the kernel with ftrace to achieve what that patch does. Furthermore, it would also be good to report what bugs we are actually fixing, from what you are writing falling back to the current method if !DYNAMIC_FTRACE_WITH_REGS is broken in many ways and I would start with fixing the current behaviour with something that does not depend on arch features that may not even be implemented. Anyway - please CC me in in the follow up series, I am happy to help you test it. Thanks, Lorenzo
On Tue, Nov 06, 2018 at 04:34:08PM +0000, Lorenzo Pieralisi wrote: > The question is whether we really need to dynamically patch the kernel > with ftrace to achieve what that patch does. > > Furthermore, it would also be good to report what bugs we are actually > fixing, from what you are writing falling back to the current method if > !DYNAMIC_FTRACE_WITH_REGS is broken in many ways and I would start with > fixing the current behaviour with something that does not depend on arch > features that may not even be implemented. There are two problems with the current method: 1. It may dereference pci_dev after it was freed 2. The pci_dev's children inherit its fake pci_bus's ops on enumeration Both result in kernel panic. The dynamic kernel patch just seemed like a cool way to inject errors without messing with the driver's structures. But if there's a more elegant way to do it, I'm all for it.
On Tue, Nov 06, 2018 at 09:47:52AM -0700, Keith Busch wrote: > On Tue, Nov 06, 2018 at 04:34:08PM +0000, Lorenzo Pieralisi wrote: > > The question is whether we really need to dynamically patch the kernel > > with ftrace to achieve what that patch does. > > > > Furthermore, it would also be good to report what bugs we are actually > > fixing, from what you are writing falling back to the current method if > > !DYNAMIC_FTRACE_WITH_REGS is broken in many ways and I would start with > > fixing the current behaviour with something that does not depend on arch > > features that may not even be implemented. > > There are two problems with the current method: > > 1. It may dereference pci_dev after it was freed > 2. The pci_dev's children inherit its fake pci_bus's ops on > enumeration > > Both result in kernel panic. That's my point, current test module is not robust, I wanted to ask if there is a way to fix it that does not depend on arch features, because if there is a dependency that is not met we are still not fixing the current code, using it as a fallback can still create issues. > The dynamic kernel patch just seemed like a cool way to inject errors > without messing with the driver's structures. But if there's a more > elegant way to do it, I'm all for it. If you have a simple reproducer for the bugs I am happy to help you test it (I can also apply arm64 DYNAMIC_FTRACE_WITH_REGS patches and test that new code path if that's the final direction we are taking). Thanks, Lorenzo
On Tue, Nov 06, 2018 at 05:21:00PM +0000, Lorenzo Pieralisi wrote: > If you have a simple reproducer for the bugs I am happy to help you test > it (I can also apply arm64 DYNAMIC_FTRACE_WITH_REGS patches and test that > new code path if that's the final direction we are taking). The easiest way to reproduce is load the aer_inject module, inject an error into a bridge, then remove the bridge and rescan.