| Message ID | 20180728091324.154795-2-okaya@kernel.org (mailing list archive) |
|---|---|
| State | New, archived |
| Delegated to | Bjorn Helgaas |
| Series | PCI: Mask and unmask hotplug interrupts during reset |
On Sat, Jul 28, 2018 at 02:13:24AM -0700, Sinan Kaya wrote:
> We need to figure out how to gracefully return inside hotplug driver
> if link down happened and there is an error pending. Fatal error needs
> to be serviced by AER/DPC drivers. Hotplug driver is observing link
> events while AER/DPC drivers are performing link recovery today and
> are causing confusion for the hotplug state machine.
>
> 1. check if there is a fatal error pending in the device_status register
>    of the PCI Express capability on the root port.
> 2. bail out from hotplug routine if this is the case.
> 3. otherwise, existing behavior.
>
> If fatal error is pending and a fatal error service such as DPC or AER
> is running, it is the responsibility of the fatal error service to
> recover the link.
[snip]
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -612,10 +612,15 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
>           * and cause the wrong event to queue.
>           */
>          if (events & PCI_EXP_SLTSTA_DLLSC) {
> -                ctrl_info(ctrl, "Slot(%s): Link %s\n", slot_name(slot),
> -                          link ? "Up" : "Down");
> -                pciehp_queue_interrupt_event(slot, link ? INT_LINK_UP :
> -                                             INT_LINK_DOWN);
> +                if (pci_fatal_error_pending(pdev, PCI_ERR_UNC_SURPDN))
> +                        ctrl_info(ctrl, "Slot(%s): Ignoring Link %s event due to detected Fatal Error\n",
> +                                  slot_name(slot), link ? "Up" : "Down");
> +                else {
> +                        ctrl_info(ctrl, "Slot(%s): Link %s\n", slot_name(slot),
> +                                  link ? "Up" : "Down");
> +                        pciehp_queue_interrupt_event(slot, link ? INT_LINK_UP :
> +                                                     INT_LINK_DOWN);
> +                }

The above is still based on the event handling in v4.18, so the patch
doesn't apply to what's currently queued on Bjorn's pci/hotplug branch.

That said, a problem I see with the approach you're suggesting is that
recovery from the fatal error may fail in pcie_do_fatal_recovery().
The devices below the hotplug port will then have been removed, but
internally in pciehp the slot will remain in ON_STATE and slot power
will remain enabled. That feels wrong.

After re-reading all the e-mails we've exchanged since early July,
the approach Oza suggested in this e-mail seems the most sensible to me:
https://lkml.kernel.org/r/324f8cf2fe6f7bdc43ca8a646eea908d@codeaurora.org

He suggested skipping removal of devices in pcie_do_fatal_recovery()
for hotplug ports.

The rationale is that when a PCIe port raises a fatal error, that port
will normally not have the ability to remove its children from the
hierarchy unless it's a hotplug port. Thus, if the port is a hotplug
port, pcie_do_fatal_recovery() should let pciehp handle removal of the
devices. Only if it's not a hotplug port should pcie_do_fatal_recovery()
remove the devices. My understanding is that after the error has been
cleared, the link should automatically come back up, is that correct?
pciehp will then bring up the slot on its own.

Thanks,

Lukas
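[Editorial note: a minimal sketch of what the check Oza suggested could
look like; the helper name is invented here and no such hunk exists in
the posted series. The Hot-Plug Capable bit lives in the port's Slot
Capabilities register, and "udev" is the port variable that v4.18's
pcie_do_fatal_recovery() already uses.]

        /* Hypothetical helper: would pciehp handle removal for this port? */
        static bool port_is_hotplug_capable(struct pci_dev *port)
        {
                u32 slot_cap;

                if (!pci_is_pcie(port))
                        return false;

                pcie_capability_read_dword(port, PCI_EXP_SLTCAP, &slot_cap);
                return slot_cap & PCI_EXP_SLTCAP_HPC;
        }

        /* In pcie_do_fatal_recovery(), before removing child devices: */
        if (!port_is_hotplug_capable(udev)) {
                /* non-hotplug port: remove devices below udev as before */
        }
        /* else: let pciehp observe the Link Down and remove the children */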
On 7/29/18, Lukas Wunner <lukas@wunner.de> wrote:
> On Sat, Jul 28, 2018 at 02:13:24AM -0700, Sinan Kaya wrote:
>> We need to figure out how to gracefully return inside hotplug driver
>> if link down happened and there is an error pending. Fatal error needs
>> to be serviced by AER/DPC drivers. Hotplug driver is observing link
>> events while AER/DPC drivers are performing link recovery today and
>> are causing confusion for the hotplug state machine.
>>
>> 1. check if there is a fatal error pending in the device_status register
>>    of the PCI Express capability on the root port.
>> 2. bail out from hotplug routine if this is the case.
>> 3. otherwise, existing behavior.
>>
>> If fatal error is pending and a fatal error service such as DPC or AER
>> is running, it is the responsibility of the fatal error service to
>> recover the link.
> [snip]
>> --- a/drivers/pci/hotplug/pciehp_hpc.c
>> +++ b/drivers/pci/hotplug/pciehp_hpc.c
>> @@ -612,10 +612,15 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
>>           * and cause the wrong event to queue.
>>           */
>>          if (events & PCI_EXP_SLTSTA_DLLSC) {
>> -                ctrl_info(ctrl, "Slot(%s): Link %s\n", slot_name(slot),
>> -                          link ? "Up" : "Down");
>> -                pciehp_queue_interrupt_event(slot, link ? INT_LINK_UP :
>> -                                             INT_LINK_DOWN);
>> +                if (pci_fatal_error_pending(pdev, PCI_ERR_UNC_SURPDN))
>> +                        ctrl_info(ctrl, "Slot(%s): Ignoring Link %s event due to detected Fatal Error\n",
>> +                                  slot_name(slot), link ? "Up" : "Down");
>> +                else {
>> +                        ctrl_info(ctrl, "Slot(%s): Link %s\n", slot_name(slot),
>> +                                  link ? "Up" : "Down");
>> +                        pciehp_queue_interrupt_event(slot, link ? INT_LINK_UP :
>> +                                                     INT_LINK_DOWN);
>> +                }
>
> The above is still based on the event handling in v4.18, so the patch
> doesn't apply to what's currently queued on Bjorn's pci/hotplug branch.
>
> That said, a problem I see with the approach you're suggesting is that
> recovery from the fatal error may fail in pcie_do_fatal_recovery().
> The devices below the hotplug port will then have been removed, but
> internally in pciehp the slot will remain in ON_STATE and slot power
> will remain enabled. That feels wrong.
>
> After re-reading all the e-mails we've exchanged since early July,
> the approach Oza suggested in this e-mail seems the most sensible to me:
> https://lkml.kernel.org/r/324f8cf2fe6f7bdc43ca8a646eea908d@codeaurora.org
>
> He suggested skipping removal of devices in pcie_do_fatal_recovery()
> for hotplug ports.
>
> The rationale is that when a PCIe port raises a fatal error, that port
> will normally not have the ability to remove its children from the
> hierarchy unless it's a hotplug port. Thus, if the port is a hotplug
> port, pcie_do_fatal_recovery() should let pciehp handle removal of the
> devices. Only if it's not a hotplug port should pcie_do_fatal_recovery()
> remove the devices. My understanding is that after the error has been
> cleared, the link should automatically come back up, is that correct?
> pciehp will then bring up the slot on its own.

Sending to the list. If anybody knows how to send text email from gmail
via the mobile app, I am interested to hear from you. Webmail sucks...

I understand your slot power concerns. I don't have an answer at this
moment.

Here is the issue with mixing the pciehp link handler and fatal error
recovery paths:

1. Fatal error happens.
2. pciehp ISR executes and turns off slot power.
3. Fatal error recovery executes either a secondary bus reset or a DPC
   status trigger via the reset_link() function to recover the link.
4. Link doesn't recover because power is off.

> Thanks,
>
> Lukas
On Sun, Jul 29, 2018 at 09:44:50AM -0700, Sinan Kaya wrote:
> On 7/29/18, Lukas Wunner <lukas@wunner.de> wrote:
> > After re-reading all the e-mails we've exchanged since early July,
> > the approach Oza suggested in this e-mail seems the most sensible to me:
> > https://lkml.kernel.org/r/324f8cf2fe6f7bdc43ca8a646eea908d@codeaurora.org
> >
> > He suggested skipping removal of devices in pcie_do_fatal_recovery()
> > for hotplug ports.
> >
> > The rationale is that when a PCIe port raises a fatal error, that port
> > will normally not have the ability to remove its children from the
> > hierarchy unless it's a hotplug port. Thus, if the port is a hotplug
> > port, pcie_do_fatal_recovery() should let pciehp handle removal of the
> > devices. Only if it's not a hotplug port should pcie_do_fatal_recovery()
> > remove the devices. My understanding is that after the error has been
> > cleared, the link should automatically come back up, is that correct?
> > pciehp will then bring up the slot on its own.
>
> I understand your slot power concerns. I don't have an answer at this
> moment.
>
> Here is the issue with mixing the pciehp link handler and fatal error
> recovery paths:
>
> 1. Fatal error happens.
> 2. pciehp ISR executes and turns off slot power.
> 3. Fatal error recovery executes either a secondary bus reset or a DPC
>    status trigger via the reset_link() function to recover the link.
> 4. Link doesn't recover because power is off.

With the new code on the pci/hotplug branch, the expected behavior is:

4. Having brought the slot down after the DLLSC event,
   pciehp_handle_presence_or_link_change() notices that the slot is
   occupied ("Slot(...): Card present") and immediately tries to bring
   the slot up again: slot power is re-enabled, the green LED is set to
   blinking. pciehp_check_link_status() then waits 1 second for the
   link to go up. If the link does not come up within 1 second, an
   error message is logged but the procedure does not abort yet. We
   then wait for 100 ms and poll for 1 second whether the vendor ID of
   slot 0 function 0 becomes readable. Only if the link has not come up
   at this point is the procedure aborted.

So the question is:

a. Is DPC/AER able to recover from the error if slot power is disabled?
   Or do we need to synchronize with pciehp to keep slot power enabled
   so that the error can be handled?

b. How fast is DPC/AER able to recover from a fatal error?
   If it's more than 2.1 seconds, pciehp will not bring the slot up
   automatically and the user is required to bring it up manually via
   sysfs. If we know that DPC/AER need a specific amount of time that
   is longer than 2.1 seconds, we can amend board_added() to poll until
   recovery from the fatal error (or use a waitqueue which is woken on
   recovery), with a sufficient timeout.

Does this work for you?

Thanks,

Lukas
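[Editorial note: for reference, the roughly 2.1-second bring-up window
described above amounts to the following; a simplified sketch, not the
actual pciehp code, with LED/power handling and logging omitted.]

        /* Sketch of the bring-up timing described in the mail above */
        static int sketch_check_link_status(struct pci_dev *port)
        {
                u32 vid;

                /* Wait up to 1 s for the Data Link Layer Link Active bit;
                 * on timeout an error is logged, but we do not abort yet. */
                pcie_wait_for_link(port, true);

                msleep(100);    /* settle time before first config access */

                /* Poll about 1 s for slot 0 function 0's vendor ID; the
                 * helper retries internally while the device returns CRS. */
                if (!pci_bus_read_dev_vendor_id(port->subordinate,
                                                PCI_DEVFN(0, 0), &vid, 1000))
                        return -ENODEV; /* only now is bring-up aborted */

                return 0;
        }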
On 7/29/2018 10:39 AM, Lukas Wunner wrote:
>> Here is the issue with mixing the pciehp link handler and fatal error
>> recovery paths:

Let me put all the use cases on the table:

Use case A:
1. DPC fatal error happens. Link is taken down in HW by DPC.
2. pciehp ISR executes first and turns off slot power.
3. DPC fatal error recovery executes the DPC status trigger via the
   reset_link() function to recover the link.
4. Link doesn't recover because power is off.

Use case B:
1. AER fatal error happens. Link is up.
2. AER ISR executes first.
3. AER fatal error recovery executes a secondary bus reset (a warm
   reset) via AER error recovery in reset_link() to recover the link.
4. Link goes down.
5. pciehp ISR executes and turns off slot power.
6. AER link recovery fails because power is off.

Use case C:
1. AER fatal error happens. Link goes down after observing the fatal
   error, before the ISR runs.
2. pciehp ISR executes and turns off slot power.
3. AER ISR executes.
4. AER fatal error recovery executes a secondary bus reset (warm reset)
   via AER error recovery in reset_link() to recover the link.
5. AER link recovery fails because power is off.

> With the new code on the pci/hotplug branch, the expected behavior is:
>
> 4. Having brought the slot down after the DLLSC event,
>    pciehp_handle_presence_or_link_change() notices that the slot is
>    occupied ("Slot(...): Card present") and immediately tries to
>    bring the slot up again:
>    Slot power is re-enabled, green LED is set to blinking.
>    pciehp_check_link_status() then waits 1 second for the link to
>    go up. If the link does not come up within 1 second, an error
>    message is logged but the procedure does not abort yet. We then
>    wait for 100 ms and poll for 1 second whether the vendor ID of
>    slot 0 function 0 becomes readable. Only if the link has not come
>    up at this point is the procedure aborted.

Thanks for the heads up. I didn't realize HP was trying to recover the
link as well. This is the situation that I'm trying to avoid.

AER and DPC resets are known as warm resets. HP link recovery is a cold
reset via power-off and power-on commands. In the middle of a warm-reset
operation (AER/DPC), we are:

1. turning off the slot power
2. performing a cold reset

causing fatal error recovery to fail.

> So the question is:
>
> a. Is DPC/AER able to recover from the error if slot power is disabled?
>    Or do we need to synchronize with pciehp to keep slot power enabled
>    so that the error can be handled?

Yes, slot power needs to be kept on.

> b. How fast is DPC/AER able to recover from a fatal error?
>    If it's more than 2.1 seconds, pciehp will not bring the slot up
>    automatically and the user is required to bring it up manually via
>    sysfs. If we know that DPC/AER need a specific amount of time that
>    is longer than 2.1 seconds, we can amend board_added() to poll until
>    recovery from the fatal error (or use a waitqueue which is woken on
>    recovery), with a sufficient timeout.

pciehp shouldn't attempt recovery.

If the link goes down due to a DPC event, it should be recovered by the
DPC status trigger. Injecting a cold reset in the middle can cause a HW
lockup, as it is undefined behavior.

Similarly, if the link goes down due to an AER secondary bus reset, it
should be recovered by HW. Injecting a cold reset in the middle of a
secondary bus reset can cause a HW lockup, as it is undefined behavior.

> Does this work for you?

Not much, but along the lines of your thinking, I can propose this as
an alternative. Maybe this helps:

1. HP ISR observes the link down interrupt.
2. HP ISR checks that there is a fatal error pending; it doesn't touch
   the link.
3. HP ISR waits until link recovery happens.
4. HP ISR calls the read vendor ID function.

DPC link recovery is very quick (100 ms at most). Secondary bus reset
recovery should be contained within 1 second for most cases, but the
spec allows a device to extend the vendor ID read as much as it wants
via CRS responses. We poll up to an additional 60 seconds in the read
vendor ID function.
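[Editorial note: a minimal sketch of step 3 of the proposal above,
polling the hotplug port's Device Status register for the Fatal Error
Detected bit; the helper name and the 20 ms poll interval are invented
here.]

        /* Sketch: poll until the fatal error service has cleared the
         * Fatal Error Detected bit, or the time budget runs out. */
        static bool sketch_wait_fatal_error_cleared(struct pci_dev *port,
                                                    unsigned int timeout_ms)
        {
                u16 sta;

                for (;;) {
                        pcie_capability_read_word(port, PCI_EXP_DEVSTA, &sta);
                        if (!(sta & PCI_EXP_DEVSTA_FED))
                                return true;    /* error serviced */
                        if (timeout_ms < 20)
                                return false;   /* give up */
                        msleep(20);
                        timeout_ms -= 20;
                }
        }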
On Sun, Jul 29, 2018 at 11:30:09AM -0700, Sinan Kaya wrote:
> Yes, slot power needs to be kept on.
>
> pciehp shouldn't attempt recovery.
>
> If the link goes down due to a DPC event, it should be recovered by
> the DPC status trigger. Injecting a cold reset in the middle can cause
> a HW lockup, as it is undefined behavior.
>
> Similarly, if the link goes down due to an AER secondary bus reset, it
> should be recovered by HW. Injecting a cold reset in the middle of a
> secondary bus reset can cause a HW lockup, as it is undefined behavior.

Thanks a lot for the explanation, understood now.

> Maybe this helps:
>
> 1. HP ISR observes the link down interrupt.
> 2. HP ISR checks that there is a fatal error pending; it doesn't touch
>    the link.
> 3. HP ISR waits until link recovery happens.
> 4. HP ISR calls the read vendor ID function.
>
> DPC link recovery is very quick (100 ms at most). Secondary bus reset
> recovery should be contained within 1 second for most cases, but the
> spec allows a device to extend the vendor ID read as much as it wants
> via CRS responses. We poll up to an additional 60 seconds in the read
> vendor ID function.

Yes, that proposal makes a lot of sense to me. This should also work
regardless of whether pciehp or DPC/AER reacts first to the Link Down.
Could you rebase your patch on the current pci/hotplug branch and
insert the procedure you've outlined above at the top of
pciehp_handle_presence_or_link_change() in pciehp_ctrl.c, or put it in
a helper that's called at the top of that function?

Your patch "[PATCH v6 1/1] PCI: pciehp: Ignore link events when there
is a fatal error pending" only checks once for a pending fatal error;
it should poll until either the fatal error is gone or a timeout is
hit. If the fatal error is gone and the link is up, you can just return
from pciehp_handle_presence_or_link_change(). Else (in the timeout
case) fall back to the normal handling of a Link Down, i.e. let it
bring down the slot.

Please add a code comment in pciehp_handle_presence_or_link_change()
along the lines of

        /* If a fatal error is pending, wait for AER or DPC to handle it. */

The information in your e-mail that a cold reset would incorrectly
interfere with error recovery is a crucial piece of information that
should be included at least in the commit message. (I was unaware of
that.)

If you have any further questions on pciehp, ask away.

Thanks!

Lukas
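[Editorial note: a hedged sketch of the flow requested above, reusing
the hypothetical sketch_wait_fatal_error_cleared() helper from the
earlier note; the call sites are illustrative and the 60-second budget
matches the figure quoted in the thread.]

        /* At the top of pciehp_handle_presence_or_link_change(): */
        struct pci_dev *pdev = ctrl_dev(ctrl);

        /* If a fatal error is pending, wait for AER or DPC to handle it. */
        if (pci_fatal_error_pending(pdev, PCI_ERR_UNC_SURPDN)) {
                if (sketch_wait_fatal_error_cleared(pdev, 60 * 1000) &&
                    pciehp_check_link_active(ctrl))
                        return; /* error serviced, link back up: done */
                /* timeout: fall through to normal Link Down handling */
        }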
On 7/29/2018 12:07 PM, Lukas Wunner wrote:
> On Sun, Jul 29, 2018 at 11:30:09AM -0700, Sinan Kaya wrote:
>> Yes, slot power needs to be kept on.
>>
>> pciehp shouldn't attempt recovery.
>>
>> If the link goes down due to a DPC event, it should be recovered by
>> the DPC status trigger. Injecting a cold reset in the middle can
>> cause a HW lockup, as it is undefined behavior.
>>
>> Similarly, if the link goes down due to an AER secondary bus reset,
>> it should be recovered by HW. Injecting a cold reset in the middle of
>> a secondary bus reset can cause a HW lockup, as it is undefined
>> behavior.
>
> Thanks a lot for the explanation, understood now.
>
>> Maybe this helps:
>>
>> 1. HP ISR observes the link down interrupt.
>> 2. HP ISR checks that there is a fatal error pending; it doesn't
>>    touch the link.
>> 3. HP ISR waits until link recovery happens.
>> 4. HP ISR calls the read vendor ID function.
>>
>> DPC link recovery is very quick (100 ms at most). Secondary bus reset
>> recovery should be contained within 1 second for most cases, but the
>> spec allows a device to extend the vendor ID read as much as it wants
>> via CRS responses. We poll up to an additional 60 seconds in the read
>> vendor ID function.
>
> Yes, that proposal makes a lot of sense to me. This should also work
> regardless of whether pciehp or DPC/AER reacts first to the Link Down.
> Could you rebase your patch on the current pci/hotplug branch and
> insert the procedure you've outlined above at the top of
> pciehp_handle_presence_or_link_change() in pciehp_ctrl.c, or put it in
> a helper that's called at the top of that function?
>
> Your patch "[PATCH v6 1/1] PCI: pciehp: Ignore link events when there
> is a fatal error pending" only checks once for a pending fatal error;
> it should poll until either the fatal error is gone or a timeout is
> hit. If the fatal error is gone and the link is up, you can just
> return from pciehp_handle_presence_or_link_change(). Else (in the
> timeout case) fall back to the normal handling of a Link Down, i.e.
> let it bring down the slot.
>
> Please add a code comment in pciehp_handle_presence_or_link_change()
> along the lines of
>
>         /* If a fatal error is pending, wait for AER or DPC to handle it. */
>
> The information in your e-mail that a cold reset would incorrectly
> interfere with error recovery is a crucial piece of information that
> should be included at least in the commit message. (I was unaware of
> that.)
>
> If you have any further questions on pciehp, ask away.

Great, let me take one more iteration on this after rebasing to the new
branch.

> Thanks!
>
> Lukas
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 718b6073afad..776566ab7583 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -612,10 +612,15 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
          * and cause the wrong event to queue.
          */
         if (events & PCI_EXP_SLTSTA_DLLSC) {
-                ctrl_info(ctrl, "Slot(%s): Link %s\n", slot_name(slot),
-                          link ? "Up" : "Down");
-                pciehp_queue_interrupt_event(slot, link ? INT_LINK_UP :
-                                             INT_LINK_DOWN);
+                if (pci_fatal_error_pending(pdev, PCI_ERR_UNC_SURPDN))
+                        ctrl_info(ctrl, "Slot(%s): Ignoring Link %s event due to detected Fatal Error\n",
+                                  slot_name(slot), link ? "Up" : "Down");
+                else {
+                        ctrl_info(ctrl, "Slot(%s): Link %s\n", slot_name(slot),
+                                  link ? "Up" : "Down");
+                        pciehp_queue_interrupt_event(slot, link ? INT_LINK_UP :
+                                                     INT_LINK_DOWN);
+                }
         } else if (events & PCI_EXP_SLTSTA_PDC) {
                 present = !!(status & PCI_EXP_SLTSTA_PDS);
                 ctrl_info(ctrl, "Slot(%s): Card %spresent\n", slot_name(slot),
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 882f1f9596df..7494b2c0c5ff 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -360,6 +360,7 @@ void pci_enable_acs(struct pci_dev *dev);
 /* PCI error reporting and recovery */
 void pcie_do_fatal_recovery(struct pci_dev *dev, u32 service);
 void pcie_do_nonfatal_recovery(struct pci_dev *dev);
+bool pci_fatal_error_pending(struct pci_dev *pdev, u32 usr_mask);
 bool pcie_wait_for_link(struct pci_dev *pdev, bool active);

 #ifdef CONFIG_PCIEASPM
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index f7ce0cb0b0b7..316b2d2750b9 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -386,3 +386,38 @@ void pcie_do_nonfatal_recovery(struct pci_dev *dev)
         /* TODO: Should kernel panic here? */
         pci_info(dev, "AER: Device recovery failed\n");
 }
+
+bool pci_fatal_error_pending(struct pci_dev *pdev, u32 usr_mask)
+{
+        u16 err_status = 0;
+        u32 status, mask;
+        int rc;
+
+        if (!pci_is_pcie(pdev))
+                return false;
+
+        rc = pcie_capability_read_word(pdev, PCI_EXP_DEVSTA, &err_status);
+        if (rc)
+                return false;
+
+        if (!(err_status & PCI_EXP_DEVSTA_FED))
+                return false;
+
+        if (!pdev->aer_cap)
+                return false;
+
+        rc = pci_read_config_dword(pdev, pdev->aer_cap + PCI_ERR_UNCOR_STATUS,
+                                   &status);
+        if (rc)
+                return false;
+
+        rc = pci_read_config_dword(pdev, pdev->aer_cap + PCI_ERR_UNCOR_MASK,
+                                   &mask);
+        if (rc)
+                return false;
+
+        status &= mask;
+        status &= ~usr_mask;
+
+        return !!status;
+}
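[Editorial note on the usr_mask parameter above: bits passed in are
dropped from consideration (status &= ~usr_mask), so the pciehp hunk's
PCI_ERR_UNC_SURPDN argument means a Surprise Down error alone does not
count as a pending fatal error, presumably because a surprise link down
is expected during ordinary hot removal. A caller that wants to defer
to AER/DPC would therefore read as:]

        /* Defer to the fatal error service unless the only pending
         * uncorrectable error is Surprise Down. */
        if (pci_fatal_error_pending(pdev, PCI_ERR_UNC_SURPDN))
                return; /* let AER/DPC drive link recovery */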
We need to figure out how to gracefully return inside the hotplug
driver if a link down happened while there is an error pending. A fatal
error needs to be serviced by the AER/DPC drivers. Today the hotplug
driver observes link events while the AER/DPC drivers are performing
link recovery, which confuses the hotplug state machine.

1. Check if there is a fatal error pending in the device_status
   register of the PCI Express capability on the root port.
2. Bail out from the hotplug routine if this is the case.
3. Otherwise, keep the existing behavior.

If a fatal error is pending and a fatal error service such as DPC or
AER is running, it is the responsibility of the fatal error service to
recover the link.

Signed-off-by: Sinan Kaya <okaya@kernel.org>
---
 drivers/pci/hotplug/pciehp_hpc.c | 13 ++++++++----
 drivers/pci/pci.h                |  1 +
 drivers/pci/pcie/err.c           | 35 ++++++++++++++++++++++++++++++++
 3 files changed, 45 insertions(+), 4 deletions(-)