Message ID | 1530571967-19099-4-git-send-email-okaya@codeaurora.org (mailing list archive)
---|---
State | New, archived
Delegated to: | Bjorn Helgaas
On Mon, Jul 02, 2018 at 06:52:47PM -0400, Sinan Kaya wrote:
> If a bridge supports hotplug and observes a PCIe fatal error, the
> following events happen:
>
> 1. AER driver removes the devices from the PCI tree on fatal error
> 2. AER driver brings down the link by issuing a secondary bus reset and
>    waits for the link to come up.
> 3. Hotplug driver observes a link down interrupt
> 4. Hotplug driver tries to remove the devices waiting for the rescan
>    lock, but the devices are already removed by the AER driver, and the
>    AER driver is waiting for the link to come back up.
> 5. AER driver tries to re-enumerate devices after polling for the link
>    state to go up.
> 6. Hotplug driver obtains the lock and tries to remove the devices
>    again.
>
> If a bridge is a hotplug capable bridge, mask hotplug interrupts before
> the reset and unmask afterwards.

Would it work for you if you just amended the AER driver to skip removal
and re-enumeration of devices if the port is a hotplug bridge? Just
check for is_hotplug_bridge in struct pci_dev.

That would seem like a much simpler solution, given that it is known
that the link will flap on reset, causing the hotplug driver to remove
and re-enumerate devices. That would also cover cases where hotplug is
handled by a different driver than pciehp, or by the platform firmware.

Thanks,

Lukas
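Concretely, a hedged sketch of the check Lukas is suggesting, assuming
it would gate the removal loop in pcie_do_fatal_recovery() in
drivers/pci/pcie/err.c. is_hotplug_bridge is a real field of struct
pci_dev, but the placement and loop body below are illustrative only,
not a tested patch:

	/*
	 * Sketch: skip removal and re-enumeration when the upstream
	 * port is a hotplug bridge, since pciehp will tear down and
	 * re-probe the devices anyway when the link flaps during the
	 * secondary bus reset.
	 */
	if (!udev->is_hotplug_bridge) {
		pci_lock_rescan_remove();
		list_for_each_entry_safe(pdev, temp, &parent->devices,
					 bus_list)
			pci_stop_and_remove_bus_device(pdev);
		pci_unlock_rescan_remove();
	}

	result = reset_link(udev, service);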
On 2018-07-03 04:34, Lukas Wunner wrote:
> On Mon, Jul 02, 2018 at 06:52:47PM -0400, Sinan Kaya wrote:
>> If a bridge supports hotplug and observes a PCIe fatal error, the
>> following events happen:
>> [...]
>
> Would it work for you if you just amended the AER driver to skip
> removal and re-enumeration of devices if the port is a hotplug bridge?
> Just check for is_hotplug_bridge in struct pci_dev.

The reason why we want to remove devices before the secondary bus reset
is to quiesce PCIe bus traffic before issuing the reset.

Skipping this step might cause transactions to be lost in the middle of
the reset, as there will be active traffic flowing and drivers will
suddenly start reading all-0xFF values.

I don't think we can skip this step.

> That would seem like a much simpler solution, given that it is known
> that the link will flap on reset, causing the hotplug driver to remove
> and re-enumerate devices. That would also cover cases where hotplug is
> handled by a different driver than pciehp, or by the platform firmware.
>
> Thanks,
>
> Lukas
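As an aside, the "reading all-0xFF" symptom is the usual way drivers
detect a dead link: MMIO reads that race with the reset complete with
all ones. A generic illustration follows; the register offset is a
hypothetical name, not something from this patch:

	/*
	 * Illustrative idiom only: reads from a device behind a downed
	 * link complete with ~0, so treat that value as "device gone"
	 * rather than as valid data.
	 */
	u32 val = readl(base + STATUS_REG);	/* STATUS_REG is hypothetical */

	if (val == ~0U)
		return -ENODEV;	/* device unreachable; back off */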
On 2018-07-03 17:00, okaya@codeaurora.org wrote:
> On 2018-07-03 04:34, Lukas Wunner wrote:
>> Would it work for you if you just amended the AER driver to skip
>> removal and re-enumeration of devices if the port is a hotplug
>> bridge? Just check for is_hotplug_bridge in struct pci_dev.
>
> The reason why we want to remove devices before the secondary bus
> reset is to quiesce PCIe bus traffic before issuing the reset.
>
> Skipping this step might cause transactions to be lost in the middle
> of the reset, as there will be active traffic flowing and drivers will
> suddenly start reading all-0xFF values.
>
> I don't think we can skip this step.

What if we only have conditional enumeration (leaving removing devices
followed by SBR as is)?

The following code does a little more extra work than our normal
ERR_FATAL path. pciehp_unconfigure_device does a little more than
enumeration to quiesce the bus:

	/*
	 * Ensure that no new Requests will be generated from
	 * the device.
	 */
	if (presence) {
		pci_read_config_word(dev, PCI_COMMAND, &command);
		command &= ~(PCI_COMMAND_MASTER | PCI_COMMAND_SERR);
		command |= PCI_COMMAND_INTX_DISABLE;
		pci_write_config_word(dev, PCI_COMMAND, command);
	}
>> What if we only have conditional enumeration (leaving removing
>> devices followed by SBR as is)?

If we leave it as is, the hotplug driver observes a link down interrupt
as soon as the secondary bus reset is issued. The hotplug driver will
try device removal while the AER driver is working on the currently
observed FATAL error, causing a race condition. The hotplug driver
waits for the rescan lock to be obtained.

>> The following code does a little more extra work than our normal
>> ERR_FATAL path.

See this comment and the pseudo code below:

	/*
	 * pciehp has a 1:1 bus:slot relationship so we ultimately want a
	 * secondary bus reset of the bridge, but at the same time we want
	 * to ensure that it is not seen as a hot-unplug, followed by the
	 * hot-plug of the device. Thus, disable link state notification
	 * and presence detection change notification momentarily, if we
	 * see that they could interfere. Also, clear any spurious events
	 * after.
	 */
	int pciehp_reset_slot(struct slot *slot, int probe)
	{
		<mask hotplug interrupts>
		<issue secondary bus reset>
		<unmask hotplug interrupts>
	}

This patch is trying to mask the interrupt before the secondary bus
reset in the AER path, not after. Otherwise, the hotplug driver will
remove the devices as soon as it obtains the rescan lock following AER
recovery, since hotplug interrupts will be pending.

>> pciehp_unconfigure_device does a little more than enumeration to
>> quiesce the bus:
>> [...]
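For context, here is a minimal sketch of how that mask/reset/unmask
sequence can be written against the standard Slot Control and Slot
Status bits. This is a hedged illustration rather than the in-tree
pciehp_reset_slot(): pci_reset_bridge_secondary_bus() and the
PCI_EXP_SLTCTL_*/PCI_EXP_SLTSTA_* defines exist in this era's kernel,
but the body below is deliberately simplified:

	/* Hedged sketch; the real pciehp_reset_slot() differs in detail. */
	static int reset_slot_sketch(struct pci_dev *bridge)
	{
		u16 mask = PCI_EXP_SLTCTL_DLLSCE | PCI_EXP_SLTCTL_PDCE;

		/* Mask link-change and presence-change notifications. */
		pcie_capability_clear_word(bridge, PCI_EXP_SLTCTL, mask);

		pci_reset_bridge_secondary_bus(bridge);

		/* Clear DLLSC/PDC events latched during the reset ... */
		pcie_capability_write_word(bridge, PCI_EXP_SLTSTA,
					   PCI_EXP_SLTSTA_DLLSC |
					   PCI_EXP_SLTSTA_PDC);

		/* ... then unmask the notifications again. */
		pcie_capability_set_word(bridge, PCI_EXP_SLTCTL, mask);
		return 0;
	}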
On 7/3/2018 9:25 AM, Sinan Kaya wrote:
>> What if we only have conditional enumeration (leaving removing
>> devices followed by SBR as is)?

Sorry, I think I didn't quite get your question. Are you asking about
the enumeration following link up while keeping the current code as is?

The issue is observing a hotplug link down event in the middle of AER
recovery, as in my previous reply.

If we mask hotplug interrupts before the secondary bus reset via my
patch, then the hotplug driver will observe neither the link up nor the
link down interrupt.

If we don't mask hotplug interrupts, we have a race condition.

I hope I answered your question.
On Tue, Jul 03, 2018 at 09:31:24AM -0400, Sinan Kaya wrote:
> The issue is observing a hotplug link down event in the middle of AER
> recovery, as in my previous reply.
>
> If we mask hotplug interrupts before the secondary bus reset via my
> patch, then the hotplug driver will observe neither the link up nor
> the link down interrupt.
>
> If we don't mask hotplug interrupts, we have a race condition.

I assume that a bus reset not only causes a link and presence event but
also clears the Presence Detect State bit in the Slot Status register
and the Data Link Layer Link Active bit in the Link Status register
momentarily.

pciehp may access those two bits concurrently to the AER driver
performing a slot reset. So it may not be sufficient to mask the
interrupt.

I've posted this patch to address the issue:
https://patchwork.ozlabs.org/patch/930391/

Thanks,

Lukas
On 2018-07-03 19:29, Lukas Wunner wrote:
> I assume that a bus reset not only causes a link and presence event
> but also clears the Presence Detect State bit in the Slot Status
> register and the Data Link Layer Link Active bit in the Link Status
> register momentarily.
>
> pciehp may access those two bits concurrently to the AER driver
> performing a slot reset. So it may not be sufficient to mask the
> interrupt.

I was just wondering: you are protecting the Presence Detect State bit
with reset_lock, mainly in pciehp_ist, but with the hotplug interrupt
disabled, is there another way that the hotplug code gets activated?

> I've posted this patch to address the issue:
> https://patchwork.ozlabs.org/patch/930391/
On Tue, Jul 03, 2018 at 07:30:28AM -0400, okaya@codeaurora.org wrote:
> On 2018-07-03 04:34, Lukas Wunner wrote:
>> Would it work for you if you just amended the AER driver to skip
>> removal and re-enumeration of devices if the port is a hotplug
>> bridge? Just check for is_hotplug_bridge in struct pci_dev.
>
> The reason why we want to remove devices before the secondary bus
> reset is to quiesce PCIe bus traffic before issuing the reset.
>
> Skipping this step might cause transactions to be lost in the middle
> of the reset, as there will be active traffic flowing and drivers will
> suddenly start reading all-0xFF values.

Interesting, I think that merits a code comment.

FWIW, macOS has a "PCI pause" callback to quiesce a device:
https://opensource.apple.com/source/IOPCIFamily/IOPCIFamily-239.1.2/pause.rtf

They're using it to reconfigure a device's BAR and bus number at
runtime (sic!), e.g. if mmio windows need to be moved around on
Thunderbolt hotplug if there's insufficient space:

"During pause reconfiguration, the following may be changed:
 - device BAR registers
 - the devices bus number
 - registry properties reflecting these values ("ranges",
   "assigned-addresses", "reg")
 - device MSI block values for address and value, but not the number
   of MSIs allocated"

Conceptually, "PCI pause" is similar to putting the device in a suspend
state. I'm wondering if suspending the devices below the bridge would
make more sense than removing them in the AER driver.

Lukas
On Tue, Jul 03, 2018 at 07:40:46PM +0530, poza@codeaurora.org wrote:
> I was just wondering: you are protecting the Presence Detect State bit
> with reset_lock, mainly in pciehp_ist, but with the hotplug interrupt
> disabled, is there another way that the hotplug code gets activated?

The user may turn the slot on/off via sysfs. If an Attention Button is
present, the user may also press that button to turn the slot on/off
after 5 seconds. Either way, it may cause pciehp's IRQ thread to run
concurrently to a reset initiated by the AER driver, independently of
any events signalled by the slot.

Thanks,

Lukas
On 2018-07-03 19:42, Lukas Wunner wrote:
> FWIW, macOS has a "PCI pause" callback to quiesce a device:
> https://opensource.apple.com/source/IOPCIFamily/IOPCIFamily-239.1.2/pause.rtf
> [...]
> Conceptually, "PCI pause" is similar to putting the device in a
> suspend state. I'm wondering if suspending the devices below the
> bridge would make more sense than removing them in the AER driver.

The code is shared by not only AER but also DPC; if a DPC event
happens, the devices are removed as well. Also, if the bridge is
hotplug capable, then the devices beneath might have changed, and
resume might break.
On Mon, Jul 02, 2018 at 06:52:47PM -0400, Sinan Kaya wrote:
> @@ -308,8 +310,17 @@ void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service)
>  		pci_dev_put(pdev);
>  	}
>  
> +	hpsvc = pcie_port_find_service(udev, PCIE_PORT_SERVICE_HP);
> +	hpdev = pcie_port_find_device(udev, PCIE_PORT_SERVICE_HP);
> +
> +	if (hpdev && hpsvc)
> +		hpsvc->mask_irq(to_pcie_device(hpdev));
> +
>  	result = reset_link(udev, service);
>  
> +	if (hpdev && hpsvc)
> +		hpsvc->unmask_irq(to_pcie_device(hpdev));
> +

We've already got the ->reset_slot callback in struct hotplug_slot_ops,
I'm wondering if we really need additional ones for this use case.

If you just do...

	if (!pci_probe_reset_slot(dev->slot))
		pci_reset_slot(dev->slot);
	else {
		/* regular code path */
	}

would that not be equivalent?

(It's worth noting though that pciehp is the only hotplug driver
implementing the ->reset_slot callback. If hotplug is handled by a
different driver or by the platform firmware, devices may still be
removed and re-enumerated twice.)

Thanks,

Lukas
On 2018-07-03 20:04, Lukas Wunner wrote:
> We've already got the ->reset_slot callback in struct
> hotplug_slot_ops, I'm wondering if we really need additional ones for
> this use case.
>
> If you just do...
>
> 	if (!pci_probe_reset_slot(dev->slot))
> 		pci_reset_slot(dev->slot);
> 	else {
> 		/* regular code path */
> 	}
>
> would that not be equivalent?

pcie_do_fatal_recovery() calls reset_link(), which calls
dpc_reset_link() or aer_root_reset() depending on the event, and
dpc_reset_link() and aer_root_reset() do their own cleanup. In fact,
aer_root_reset() is the only function which actually triggers
pci_reset_bridge_secondary_bus().

So if we try to fit in your suggestion:

	if (!pci_probe_reset_slot(dev->slot)) {
		pci_reset_slot(dev->slot);
		result = reset_link(udev, service);
		/* in this case aer_root_reset must not call
		 * pci_reset_bridge_secondary_bus() */
	} else {
		result = reset_link(udev, service);
		/* in this case aer_root_reset must call
		 * pci_reset_bridge_secondary_bus(), since the bridge
		 * is not hotplug capable */
	}

Did I get your suggestion right?

Regards,
Oza.

> (It's worth noting though that pciehp is the only hotplug driver
> implementing the ->reset_slot callback. If hotplug is handled by a
> different driver or by the platform firmware, devices may still be
> removed and re-enumerated twice.)
On 7/3/2018 9:59 AM, Lukas Wunner wrote:
> I assume that a bus reset not only causes a link and presence event
> but also clears the Presence Detect State bit in the Slot Status
> register and the Data Link Layer Link Active bit in the Link Status
> register momentarily.
>
> pciehp may access those two bits concurrently to the AER driver
> performing a slot reset. So it may not be sufficient to mask the
> interrupt.
>
> I've posted this patch to address the issue:
> https://patchwork.ozlabs.org/patch/930391/

Very interesting! I missed this completely.

I know for a fact that a bus reset clears the Data Link Layer Active
bit as soon as the link goes down. It gets set again following link up.

Presence detect depends on the HW implementation. QDT root ports don't
change presence detect, for instance, since nobody actually removed the
card. If an implementation supports in-band presence detect, the answer
is yes: as soon as the link goes down, the presence detect bit will get
cleared until recovery.

It sounds like we need to update your lock change with my proposal:
lock() in mask_irq() and unlock() in unmask_irq().
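A rough sketch of that combination: mask_irq()/unmask_irq() as proposed
in this series, holding the reset_lock from Lukas's patch across the
masked window so pciehp_ist and sysfs-initiated slot operations cannot
observe the intermediate register state. The field and helper names
(ctrl->reset_lock, ctrl_dev(), get_service_data()) follow pciehp's
internal conventions but are assumptions here, not final in-tree API:

	/* Hedged sketch: combine IRQ masking with the reset_lock. */
	static void pciehp_mask_irq(struct pcie_device *dev)
	{
		struct controller *ctrl = get_service_data(dev);

		down_write(&ctrl->reset_lock);	/* assumed field name */
		pcie_capability_clear_word(ctrl_dev(ctrl), PCI_EXP_SLTCTL,
					   PCI_EXP_SLTCTL_DLLSCE |
					   PCI_EXP_SLTCTL_PDCE);
	}

	static void pciehp_unmask_irq(struct pcie_device *dev)
	{
		struct controller *ctrl = get_service_data(dev);

		/* Clear events latched while masked, then re-enable. */
		pcie_capability_write_word(ctrl_dev(ctrl), PCI_EXP_SLTSTA,
					   PCI_EXP_SLTSTA_DLLSC |
					   PCI_EXP_SLTSTA_PDC);
		pcie_capability_set_word(ctrl_dev(ctrl), PCI_EXP_SLTCTL,
					 PCI_EXP_SLTCTL_DLLSCE |
					 PCI_EXP_SLTCTL_PDCE);
		up_write(&ctrl->reset_lock);
	}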
On 7/3/2018 10:34 AM, Lukas Wunner wrote:
> We've already got the ->reset_slot callback in struct
> hotplug_slot_ops, I'm wondering if we really need additional ones for
> this use case.
>
> If you just do...
> [...]
> would that not be equivalent?

As I have informed you before in my previous reply, pdev->slot is only
valid for children objects such as endpoints, not for a bridge, when
using pciehp. The pointer is NULL for the host bridge itself.

I reached out to the reset_slot() callback in v4 of this
implementation:
https://patchwork.kernel.org/patch/10494971/

However, as Oza explained, FATAL error handling gets called from two
different paths: AER and DPC.

If the link goes down due to DPC, calling pci_reset_slot() would be a
mistake, as DPC has its own recovery mechanism by programming the DPC
capabilities. pci_reset_slot() performs a secondary bus reset following
the hotplug interrupt mask. Issuing a secondary bus reset in response
to a DPC event would be a mistake for recovery.

That's why I extracted the hotplug mask and unmask IRQ calls into the
service layer, so that I can mask hotplug interrupts regardless of the
source of the FATAL error, whether it is DPC or AER.

If the error source is DPC, it still goes to the DPC driver's
reset_link() callback for DPC-specific cleanup. If the error source is
AER, it still goes to the AER driver's reset_link() callback for the
secondary bus reset.

Remember that the AER driver completely bypasses pci_reset_slot()
today. The lock mechanism you are adding will not be useful for the
FATAL error case where pci_reset_bridge_secondary_bus() is called
directly. pci_reset_slot() only gets called from external drivers such
as VFIO to initiate a reset to the slot if hotplug is supported.

> (It's worth noting though that pciehp is the only hotplug driver
> implementing the ->reset_slot callback. If hotplug is handled by a
> different driver or by the platform firmware, devices may still be
> removed and re-enumerated twice.)
On 7/3/2018 11:12 AM, poza@codeaurora.org wrote:
> 	if (!pci_probe_reset_slot(dev->slot)) {
> 		pci_reset_slot(dev->slot);
> 		result = reset_link(udev, service);
> 		/* in this case aer_root_reset must not call
> 		 * pci_reset_bridge_secondary_bus() */
> 	} else {
> [...]

Here are the two different flows for the two different FATAL error
sources:

dpc_irq (link is down due to DPC):
	pcie_do_fatal_recovery()
		pci_reset_slot()
			mask hotplug IRQ
			secondary bus reset <- undefined behavior, as the
			                       link went down due to DPC
			unmask hotplug IRQ
		dpc_reset_link() <- undefined behavior: a secondary bus
			reset happened while a DPC event is pending; the
			link may or may not be up at this moment; recover
			the link the DPC way if HW can cope with this
			undefined behavior

aer_irq (link is up):
	pcie_do_fatal_recovery()
		pci_reset_slot()
			mask hotplug IRQ
			secondary bus reset
			unmask hotplug IRQ
			link goes up
		aer_reset_link()
			secondary bus reset
			-> hotplug link down interrupt again

I tried to change pci_reset_slot() such that we do:

	mask hotplug IRQ
	go to the AER/DPC reset_link based on the error source
	unmask hotplug IRQ
On 7/8/18, Lukas Wunner <lukas@wunner.de> wrote:
> On Tue, Jul 03, 2018 at 11:43:26AM -0400, Sinan Kaya wrote:
>> On 7/3/2018 10:34 AM, Lukas Wunner wrote:
>>> We've already got the ->reset_slot callback in struct
>>> hotplug_slot_ops, I'm wondering if we really need additional ones
>>> for this use case.
>>
>> As I have informed you before in my previous reply, pdev->slot is
>> only valid for children objects such as endpoints, not for a bridge,
>> when using pciehp.
>>
>> The pointer is NULL for the host bridge itself.
>
> Right, sorry, I had misremembered how this works. So essentially the
> pointer is only set for the devices "in" the slot, but not for the
> bridge "to" that slot. If the slot isn't occupied, *no* pci_dev points
> to the struct pci_slot. Seems counter-intuitive to be honest.

That is true. There is a bit of history here. Back in PCI days, each
PCI device number would be a hotplug slot. There would be multiple
slots under a bridge: a one-to-many relationship.

> Thanks for the explanation of the various reset codepaths, I'm afraid
> my understanding of that portion of the PCI core is limited.
>
> Browsing through drivers/pci/pci.c, I notice pci_dev_lock() and
> pci_dev_save_and_disable(), both are called from reset codepaths and
> apparently seek to stop access to the device during reset. I'm
> wondering why DPC and AER remove devices in order to avoid accesses
> to them during reset, instead of utilizing these two functions?

This was the behavior until 4.18. Since 4.18, all devices are being
removed and re-enumerated following a fatal error condition.

> My guess is that a lot of the reset code is historically grown and
> could be simplified and made more consistent, but that requires
> digging into the code and getting a complete picture. I've sort of
> done that for pciehp, I think I'm now intimately familiar with 90% of
> it, so I'll be happy to review patches for it and answer questions,
> but I'm pretty much stumped when it comes to reset code in the PCI
> core.
>
> I treated the ->reset_slot() callback as one possible entry point
> into pciehp and asked myself if it's properly serialized with the
> rest of the driver and whether driver ->probe and ->remove is ordered
> such that the driver is always properly initialized when the entry
> point might be taken. I did not look too closely at the codepaths
> actually leading to invocation of the ->reset_slot() callback.

Sure, this path gets called from VFIO today, and possibly via a reset
file in sysfs. A bunch of things need to fail before hitting the slot
reset path. However, VFIO is a direct caller if hotplug is supported.

>> I was curious if we could use a single work queue for all PCIe
>> portdrv services. This would also eliminate the need for the locks
>> that Lukas is adding.
>>
>> If hotplug starts first, hotplug code would run to completion before
>> the AER and DPC services start recovery.
>>
>> If DPC/AER starts first, my patch would mask the hotplug interrupts.
>>
>> My solution doesn't help if a link down interrupt is observed before
>> the AER or DPC services.
>
> If pciehp gets an interrupt quicker than dpc/aer, it will (at least
> with my patches) remove all devices, check if the presence bit is
> set,

Yup.

> and if so, try to bring the slot up again.

The hotplug driver should only observe a link down interrupt. The link
would come up in response to a secondary bus reset initiated by the AER
driver.

Can you point me to the code that would bring up the link in hp code?
Maybe I am missing something.
> My (limited) understanding is that the link will stay down until
> dpc/aer react. pciehp_check_link_status() will wait 1 sec for the
> link, wait another 100 msec, then poll the vendor register for 1 sec
> before giving up. So if dpc/aer are locked out for this long, they
> will only be able to reset the slot after 2100 msec.
>
> I've had a brief look at the PCIe r4.0 base spec and could not find
> anything about how pciehp and dpc/aer should coordinate. Maybe that's
> an oversight, or the PCISIG just leaves this to the OS.
>
>> Another possibility is to add synchronization logic between these
>> threads, obviously.
>
> Maybe call pci_channel_offline() in the poll loops of
> pcie_wait_for_link() and pci_bus_check_dev() to avoid waiting for the
> link if an error needs to be acted upon first?

Let me think about this.

> Thanks,
>
> Lukas
On Mon, Jul 09, 2018 at 08:48:44AM -0600, Sinan Kaya wrote:
> On 7/8/18, Lukas Wunner <lukas@wunner.de> wrote:
>> If pciehp gets an interrupt quicker than dpc/aer, it will (at least
>> with my patches) remove all devices, check if the presence bit is
>> set, and if so, try to bring the slot up again.
>
> The hotplug driver should only observe a link down interrupt. The
> link would come up in response to a secondary bus reset initiated by
> the AER driver.

PCIe hotplug doesn't have separate Link Down and Link Up interrupts,
there is only a Link State *Changed* event.

> Can you point me to the code that would bring up the link in hp code?

I was referring to the situation with my recently posted pciehp patches
applied, in particular patch [21/32] ("PCI: pciehp: Become resilient to
missed events"):
https://patchwork.ozlabs.org/patch/930389/

When I get a presence or link changed event, I turn the slot off. That
includes removing all devices in the slot. Because even if the slot is
still occupied or the link is up, there was definitely a change, and
the safe behavior is to assume that the card in the slot is now a
different one than before.

Afterwards, I check if the slot is occupied or the link is up. If
either of those conditions is true, I try to bring the slot up again.

Thanks,

Lukas
On Mon, Jul 9, 2018 at 12:00 PM, Lukas Wunner <lukas@wunner.de> wrote:
> PCIe hotplug doesn't have separate Link Down and Link Up interrupts,
> there is only a Link State *Changed* event.
> [...]
> When I get a presence or link changed event, I turn the slot off.
> That includes removing all devices in the slot. Because even if the
> slot is still occupied or the link is up, there was definitely a
> change, and the safe behavior is to assume that the card in the slot
> is now a different one than before.

We do have a bit of a mess, unfortunately. Error handling and hotplug
drivers do not play nicely with each other.

When the hotplug driver observes a link down, we are not checking
whether the link down happened because the user really wanted to remove
a card, or because it was originated by an error handling service such
as AER/DPC.

I'm thinking that we could potentially check if a hotplug event is
pending at the entrance of fatal error handling. If it is pending, we
could poll until the status bit clears. That should flush the link down
event.

Even then, the link down indication of hotplug seems to turn off slot
power and the LED. If the AER/DPC service runs after the hotplug
driver, the link won't come back up, as the power to the slot is turned
off.

I'd like to hear Bjorn's opinion before we throw something else into
this problem.

> Afterwards, I check if the slot is occupied or the link is up. If
> either of those conditions is true, I try to bring the slot up again.
On Tue, Jul 10, 2018 at 02:30:11PM -0400, Sinan Kaya wrote:
> We do have a bit of a mess, unfortunately. Error handling and hotplug
> drivers do not play nicely with each other.
>
> When the hotplug driver observes a link down, we are not checking
> whether the link down happened because the user really wanted to
> remove a card, or because it was originated by an error handling
> service such as AER/DPC.
> [...]
> I'd like to hear Bjorn's opinion before we throw something else into
> this problem.

You guys know way more about this than I do.

I think the separation of AER/DPC/pciehp into separate drivers is
somewhat artificial because there are many interdependencies. The
driver model doesn't apply very well because there's only one
underlying piece of hardware, which forces us to use the portdrv as
sort of a multiplexer. The fact that portdrv claims these bridges also
means normal drivers (e.g., for performance counters) can't use the
usual model.

All that is to say that if integrating these services more tightly
would help solve this problem, I'd be open to that.

Bjorn
On 7/20/2018 1:01 PM, Bjorn Helgaas wrote:
> I think the separation of AER/DPC/pciehp into separate drivers is
> somewhat artificial because there are many interdependencies. The
> driver model doesn't apply very well because there's only one
> underlying piece of hardware, which forces us to use the portdrv as
> sort of a multiplexer. The fact that portdrv claims these bridges
> also means normal drivers (e.g., for performance counters) can't use
> the usual model.
>
> All that is to say that if integrating these services more tightly
> would help solve this problem, I'd be open to that.

I was looking at how to get rid of the portdrv for a while. It looks
like a much bigger task, to be honest. There are multiple levels of
abstraction in the code, as you highlighted.

My patch solves the problem if the AER interrupt happens before the
hotplug interrupt. We are masking the data link layer active interrupt,
so AER/DPC can perform their link operations without racing with the
hotplug driver.
We need to figure out how to gracefully return inside the hotplug
driver if a link down happened and there is an error pending.

My first question is why the hotplug driver is reacting to the link
event if there was not an actual device insertion/removal.

Would it help to keep track of presence changed interrupts since the
last link event? If the counter is 0 and the device is present, the
hotplug driver bails out silently, as an example.
On 7/20/2018 7:58 PM, Sinan Kaya wrote:
> We need to figure out how to gracefully return inside the hotplug
> driver if a link down happened and there is an error pending.

How about adding the following into the hotplug ISR (sketched below)?

1. Check if firmware-first is disabled.
2. Check if there is a fatal error pending in the Device Status
   register of the PCI Express capability on the root port.
3. Bail out from the hotplug routine if this is the case.
4. Otherwise, existing behavior.
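A hedged sketch of steps 1-3: PCI_EXP_DEVSTA_FED and
pcie_capability_read_word() are standard, and
pcie_aer_get_firmware_first() is the kernel's internal firmware-first
query of this era, but the helper name and the exact bail-out point in
pciehp's ISR are illustrative only:

	/* Sketch: is a fatal error pending on this port? */
	static bool port_fatal_error_pending(struct pci_dev *port)
	{
		u16 sta;

		pcie_capability_read_word(port, PCI_EXP_DEVSTA, &sta);
		return sta & PCI_EXP_DEVSTA_FED;
	}

	/* In the hotplug ISR, before acting on DLLSC/PDC events: */
	if (!pcie_aer_get_firmware_first(port) &&
	    port_fatal_error_pending(port))
		return IRQ_HANDLED;	/* let AER/DPC drive the recovery */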
On 2018-07-21 11:37, Sinan Kaya wrote:
> How about adding the following into the hotplug ISR?
>
> 1. Check if firmware-first is disabled.
> 2. Check if there is a fatal error pending in the Device Status
>    register of the PCI Express capability on the root port.
> 3. Bail out from the hotplug routine if this is the case.
> 4. Otherwise, existing behavior.

This makes sense. From Lukas's text:

"The user may turn the slot on/off via sysfs. If an Attention Button is
present, the user may also press that button to turn the slot on/off
after 5 seconds. Either way, it may cause pciehp's IRQ thread to run
concurrently to a reset initiated by the AER driver, independently of
any events signalled by the slot."

So if a device gets removed and re-enumerated other than via the
hotplug ISR, that path, or any other similar path, also has to take
care of checking the ERR_FATAL status.

Regards,
Oza.
On Tue, Jul 03, 2018 at 07:30:28AM -0400, okaya@codeaurora.org wrote:
> The reason why we want to remove devices before the secondary bus
> reset is to quiesce PCIe bus traffic before issuing the reset.
>
> Skipping this step might cause transactions to be lost in the middle
> of the reset, as there will be active traffic flowing and drivers
> will suddenly start reading all-0xFF values.

That's par for the course for any hotplug bridge with support for
surprise removal (e.g. Thunderbolt), and drivers must be able to cope
with it. Nothing to worry about IMHO.

Thanks,

Lukas
On Tue, Jul 03, 2018 at 06:41:33PM +0530, poza@codeaurora.org wrote:
> pciehp_unconfigure_device does a little more than enumeration to
> quiesce the bus:
>
> 	/*
> 	 * Ensure that no new Requests will be generated from
> 	 * the device.
> 	 */
> 	if (presence) {
> 		pci_read_config_word(dev, PCI_COMMAND, &command);
> 		command &= ~(PCI_COMMAND_MASTER | PCI_COMMAND_SERR);
> 		command |= PCI_COMMAND_INTX_DISABLE;
> 		pci_write_config_word(dev, PCI_COMMAND, command);
> 	}

That piece of code is supposed to be executed on safe removal via sysfs
or an Attention Button press: the card remains in the slot, even though
the slot is brought down. So the card is quiesced.

However IIUC, on fatal error the link goes down, and for pciehp that's
essentially a surprise removal. In that case, the above code is not
intended to be executed; rather, the devices below the hotplug bridge
are marked disconnected. See this patch I posted yesterday:
https://www.spinics.net/lists/linux-pci/msg74763.html

Thanks,

Lukas
On Fri, Jul 20, 2018 at 07:58:20PM -0700, Sinan Kaya wrote:
> My patch solves the problem if the AER interrupt happens before the
> hotplug interrupt. We are masking the data link layer active
> interrupt, so AER/DPC can perform their link operations without
> racing with the hotplug driver.
>
> We need to figure out how to gracefully return inside the hotplug
> driver if a link down happened and there is an error pending.
>
> My first question is why the hotplug driver is reacting to the link
> event if there was not an actual device insertion/removal.
>
> Would it help to keep track of presence changed interrupts since the
> last link event?
>
> If the counter is 0 and the device is present, the hotplug driver
> bails out silently, as an example.

Counting PDC events doesn't work reliably if multiple such events occur
in very short succession, as the interrupt handler may not run quickly
enough. See this commit message, which shows unbalanced Link Up / Link
Down events:
https://patchwork.ozlabs.org/patch/867418/

And on Thunderbolt, interrupts can be signaled even though the port and
its parents are in D3hot (sic!). A Thunderbolt daisy chain can consist
of up to 6 devices, each comprising a PCI switch, so there's a cascade
of over a dozen Upstream / Downstream Ports between the Root Port and
the hotplug port at the end of the daisy chain. God knows how many
events have occurred by the time all the parents are resumed to D0 and
the Slot Status register of the hotplug port is read/written.

That was really the motivation for the event handling rework.

Thanks,

Lukas
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index ae72f88..5a2d410 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -285,10 +285,12 @@ static pci_ers_result_t broadcast_error_message(struct pci_dev *dev,
  */
 void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service)
 {
+	struct pcie_port_service_driver *hpsvc;
 	struct pci_dev *udev;
 	struct pci_bus *parent;
 	struct pci_dev *pdev, *temp;
 	pci_ers_result_t result;
+	struct device *hpdev;
 
 	if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
 		udev = dev;
@@ -308,8 +310,17 @@ void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service)
 		pci_dev_put(pdev);
 	}
 
+	hpsvc = pcie_port_find_service(udev, PCIE_PORT_SERVICE_HP);
+	hpdev = pcie_port_find_device(udev, PCIE_PORT_SERVICE_HP);
+
+	if (hpdev && hpsvc)
+		hpsvc->mask_irq(to_pcie_device(hpdev));
+
 	result = reset_link(udev, service);
 
+	if (hpdev && hpsvc)
+		hpsvc->unmask_irq(to_pcie_device(hpdev));
+
 	if ((service == PCIE_PORT_SERVICE_AER) &&
 	    (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE)) {
 		/*
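The mask_irq/unmask_irq hooks used above are new fields this series
adds to struct pcie_port_service_driver; they are not existing portdrv
API. A hedged sketch of how pciehp might register them — the existing
fields mirror pciehp's service driver of this era, while the two new
hooks and their handler names are the series' proposal:

	static struct pcie_port_service_driver hpdriver_portdrv = {
		.name		= "pciehp",
		.port_type	= PCIE_ANY_PORT,
		.service	= PCIE_PORT_SERVICE_HP,

		.probe		= pciehp_probe,
		.remove		= pciehp_remove,

		/* Proposed additions for AER/DPC fatal error recovery: */
		.mask_irq	= pciehp_mask_irq,
		.unmask_irq	= pciehp_unmask_irq,
	};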
If a bridge supports hotplug and observes a PCIe fatal error, the
following events happen:

1. AER driver removes the devices from the PCI tree on fatal error
2. AER driver brings down the link by issuing a secondary bus reset and
   waits for the link to come up.
3. Hotplug driver observes a link down interrupt
4. Hotplug driver tries to remove the devices waiting for the rescan
   lock, but the devices are already removed by the AER driver, and the
   AER driver is waiting for the link to come back up.
5. AER driver tries to re-enumerate devices after polling for the link
   state to go up.
6. Hotplug driver obtains the lock and tries to remove the devices
   again.

If a bridge is a hotplug capable bridge, mask hotplug interrupts before
the reset and unmask afterwards.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/pci/pcie/err.c | 11 +++++++++++
 1 file changed, 11 insertions(+)