Message ID | d97541df3b44822e0d085ffa058e9e7c0ba05214.1602663397.git.sathyanarayanan.kuppuswamy@linux.intel.com (mailing list archive) |
---|---|
State | Superseded, archived |
Series | [v6,1/2] PCI/ERR: Call pci_bus_reset() before calling ->slot_reset() callback |
On Wed, Oct 14, 2020 at 5:00 PM Kuppuswamy Sathyanarayanan <sathyanarayanan.nkuppuswamy@gmail.com> wrote: > > Commit bdb5ac85777d ("PCI/ERR: Handle fatal error recovery") > merged fatal and non-fatal error recovery paths, and also made > recovery code depend on hotplug handler for "remove affected > device + rescan" support. But this change also complicated the > error recovery path and which in turn led to the following > issues. > > 1. We depend on hotplug handler for removing the affected > devices/drivers on DLLSC LINK down event (on DPC event > trigger) and DPC handler for handling the error recovery. Since > both handlers operate on same set of affected devices, it leads > to race condition, which in turn leads to NULL pointer > exceptions or error recovery failures.You can find more details > about this issue in following link. > > https://lore.kernel.org/linux-pci/20201007113158.48933-1-haifeng.zhao@intel.com/T/#t > > 2. For non-hotplug capable devices fatal (DPC) error recovery > is currently broken. Current fatal error recovery implementation > relies on PCIe hotplug (pciehp) handler for detaching and > re-enumerating the affected devices/drivers. So when dealing with > non-hotplug capable devices, recovery code does not restore the state > of the affected devices correctly. You can find more details about > this issue in the following links. > > https://lore.kernel.org/linux-pci/20200527083130.4137-1-Zhiqiang.Hou@nxp.com/ > https://lore.kernel.org/linux-pci/12115.1588207324@famine/ > https://lore.kernel.org/linux-pci/0e6f89cd6b9e4a72293cc90fafe93487d7c2d295.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com/ > > In order to fix the above two issues, we should stop relying on hotplug Yes, it doesn't rely on hotplug handler to remove and rescan the device, but it couldn't prevent hotplug drivers from doing another replicated removal/rescanning. it doesn't make sense to leave another useless removal/rescanning there. 
Maybe that's why these two paths were merged to one and made it rely on hotplug. > handler for cleaning the affected devices/drivers and let error recovery > handler own this functionality. So this patch reverts Commit bdb5ac85777d > ("PCI/ERR: Handle fatal error recovery") and re-introduce the "remove > affected device + rescan" functionality in fatal error recovery handler. > > Also holding pci_lock_rescan_remove() will prevent the race between hotplug > and DPC handler. > > Fixes: bdb5ac85777d ("PCI/ERR: Handle fatal error recovery") > Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> > Reviewed-by: Sinan Kaya <okaya@kernel.org> > --- > Changes since v4: > * Added new interfaces for error recovery (pcie_do_fatal_recovery() > and pcie_do_nonfatal_recovery()). > > Changes since v5: > * Fixed static/non-static declartion issue. > > Documentation/PCI/pci-error-recovery.rst | 47 ++++++++++------ > Documentation/PCI/pcieaer-howto.rst | 2 +- > drivers/pci/pci.h | 5 +- > drivers/pci/pcie/aer.c | 10 ++-- > drivers/pci/pcie/dpc.c | 2 +- > drivers/pci/pcie/edr.c | 2 +- > drivers/pci/pcie/err.c | 70 ++++++++++++++++++------ > 7 files changed, 94 insertions(+), 44 deletions(-) > > diff --git a/Documentation/PCI/pci-error-recovery.rst b/Documentation/PCI/pci-error-recovery.rst > index 84ceebb08cac..830c8af5838b 100644 > --- a/Documentation/PCI/pci-error-recovery.rst > +++ b/Documentation/PCI/pci-error-recovery.rst > @@ -115,7 +115,7 @@ The actual steps taken by a platform to recover from a PCI error > event will be platform-dependent, but will follow the general > sequence described below. > > -STEP 0: Error Event > +STEP 0: Error Event: ERR_NONFATAL > ------------------- > A PCI bus error is detected by the PCI hardware. On powerpc, the slot > is isolated, in that all I/O is blocked: all reads return 0xffffffff, > @@ -160,10 +160,10 @@ particular, if the platform doesn't isolate slots), and recovery > proceeds to STEP 2 (MMIO Enable). 
> > If any driver requested a slot reset (by returning PCI_ERS_RESULT_NEED_RESET), > -then recovery proceeds to STEP 4 (Slot Reset). > +then recovery proceeds to STEP 3 (Slot Reset). > > If the platform is unable to recover the slot, the next step > -is STEP 6 (Permanent Failure). > +is STEP 5 (Permanent Failure). > > .. note:: > > @@ -198,7 +198,7 @@ reset or some such, but not restart operations. This callback is made if > all drivers on a segment agree that they can try to recover and if no automatic > link reset was performed by the HW. If the platform can't just re-enable IOs > without a slot reset or a link reset, it will not call this callback, and > -instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset) > +instead will have gone directly to STEP 3 (Slot Reset) > > .. note:: > > @@ -233,18 +233,12 @@ The driver should return one of the following result codes: > > The next step taken depends on the results returned by the drivers. > If all drivers returned PCI_ERS_RESULT_RECOVERED, then the platform > -proceeds to either STEP3 (Link Reset) or to STEP 5 (Resume Operations). > +proceeds to STEP 4 (Resume Operations). > > If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform > -proceeds to STEP 4 (Slot Reset) > +proceeds to STEP 3 (Slot Reset) > > -STEP 3: Link Reset > ------------------- > -The platform resets the link. This is a PCI-Express specific step > -and is done whenever a fatal error has been detected that can be > -"solved" by resetting the link. > - > -STEP 4: Slot Reset > +STEP 3: Slot Reset > ------------------ > > In response to a return value of PCI_ERS_RESULT_NEED_RESET, the > @@ -322,7 +316,7 @@ PCI card types:: > + pdev->needs_freset = 1; > + > > -Platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent > +Platform proceeds either to STEP 4 (Resume Operations) or STEP 5 (Permanent > Failure). > > .. note:: > @@ -332,7 +326,7 @@ Failure). > However, it probably should. 
> > > -STEP 5: Resume Operations > +STEP 4: Resume Operations > ------------------------- > The platform will call the resume() callback on all affected device > drivers if all drivers on the segment have returned > @@ -344,7 +338,7 @@ a result code. > At this point, if a new error happens, the platform will restart > a new error recovery sequence. > > -STEP 6: Permanent Failure > +STEP 5: Permanent Failure > ------------------------- > A "permanent failure" has occurred, and the platform cannot recover > the device. The platform will call error_detected() with a > @@ -367,6 +361,27 @@ errors. See the discussion in powerpc/eeh-pci-error-recovery.txt > for additional detail on real-life experience of the causes of > software errors. > > +STEP 0: Error Event: ERR_FATAL > +-------------------- > +PCI bus error is detected by the PCI hardware. On powerpc, the slot is > +isolated, in that all I/O is blocked: all reads return 0xffffffff, all > +writes are ignored. > + > +STEP 1: Remove devices > +--------------------- > +Platform removes the devices depending on the error agent, it could be > +this port for all subordinates or upstream component (likely downstream > +port) > + > +STEP 2: Reset link > +--------------------- > +The platform resets the link. This is a PCI-Express specific step and is > +done whenever a fatal error has been detected that can be "solved" by > +resetting the link. > + > +STEP 3: Re-enumerate the devices > +--------------------- > +Initiates the re-enumeration. > > Conclusion; General Remarks > --------------------------- > diff --git a/Documentation/PCI/pcieaer-howto.rst b/Documentation/PCI/pcieaer-howto.rst > index 0b36b9ebfa4b..9528cfd9449b 100644 > --- a/Documentation/PCI/pcieaer-howto.rst > +++ b/Documentation/PCI/pcieaer-howto.rst > @@ -206,7 +206,7 @@ error_detected(dev, pci_channel_io_frozen) to all drivers within > a hierarchy in question. Then, performing link reset at upstream is > necessary. 
As different kinds of devices might use different approaches > to reset link, AER port service driver is required to provide the > -function to reset link via callback parameter of pcie_do_recovery() > +function to reset link via callback parameter of pcie_do_fatal_recovery() > function. If reset_link is not NULL, recovery function will use it > to reset the link. If error_detected returns PCI_ERS_RESULT_CAN_RECOVER > and reset_link returns PCI_ERS_RESULT_RECOVERED, the error handling goes > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > index fa12f7cbc1a0..b3e1571107b5 100644 > --- a/drivers/pci/pci.h > +++ b/drivers/pci/pci.h > @@ -555,8 +555,9 @@ static inline int pci_dev_specific_disable_acs_redir(struct pci_dev *dev) > #endif > > /* PCI error reporting and recovery */ > -pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > - pci_channel_state_t state, > +pci_ers_result_t pcie_do_nonfatal_recovery(struct pci_dev *dev); > + > +pci_ers_result_t pcie_do_fatal_recovery(struct pci_dev *dev, > pci_ers_result_t (*reset_link)(struct pci_dev *pdev)); > > bool pcie_wait_for_link(struct pci_dev *pdev, bool active); > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > index 65dff5f3457a..f3e70bb9b30d 100644 > --- a/drivers/pci/pcie/aer.c > +++ b/drivers/pci/pcie/aer.c > @@ -947,9 +947,9 @@ static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info) > if (pcie_aer_is_native(dev)) > pcie_clear_device_status(dev); > } else if (info->severity == AER_NONFATAL) > - pcie_do_recovery(dev, pci_channel_io_normal, aer_root_reset); > + pcie_do_nonfatal_recovery(dev); > else if (info->severity == AER_FATAL) > - pcie_do_recovery(dev, pci_channel_io_frozen, aer_root_reset); > + pcie_do_fatal_recovery(dev, aer_root_reset); > pci_dev_put(dev); > } > > @@ -985,11 +985,9 @@ static void aer_recover_work_func(struct work_struct *work) > } > cper_print_aer(pdev, entry.severity, entry.regs); > if (entry.severity == AER_NONFATAL) > - 
pcie_do_recovery(pdev, pci_channel_io_normal, > - aer_root_reset); > + pcie_do_nonfatal_recovery(pdev); > else if (entry.severity == AER_FATAL) > - pcie_do_recovery(pdev, pci_channel_io_frozen, > - aer_root_reset); > + pcie_do_fatal_recovery(pdev, aer_root_reset); > pci_dev_put(pdev); > } > } > diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c > index daa9a4153776..74e7d1da3cf0 100644 > --- a/drivers/pci/pcie/dpc.c > +++ b/drivers/pci/pcie/dpc.c > @@ -233,7 +233,7 @@ static irqreturn_t dpc_handler(int irq, void *context) > dpc_process_error(pdev); > > /* We configure DPC so it only triggers on ERR_FATAL */ > - pcie_do_recovery(pdev, pci_channel_io_frozen, dpc_reset_link); > + pcie_do_fatal_recovery(pdev, dpc_reset_link); > > return IRQ_HANDLED; > } > diff --git a/drivers/pci/pcie/edr.c b/drivers/pci/pcie/edr.c > index a6b9b479b97a..87379bc566f6 100644 > --- a/drivers/pci/pcie/edr.c > +++ b/drivers/pci/pcie/edr.c > @@ -183,7 +183,7 @@ static void edr_handle_event(acpi_handle handle, u32 event, void *data) > * or ERR_NONFATAL, since the link is already down, use the FATAL > * error recovery path for both cases. 
> */ > - estate = pcie_do_recovery(edev, pci_channel_io_frozen, dpc_reset_link); > + estate = pcie_do_fatal_recovery(edev, dpc_reset_link); > > send_ost: > > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c > index 315a4d559c4c..fa50366a9632 100644 > --- a/drivers/pci/pcie/err.c > +++ b/drivers/pci/pcie/err.c > @@ -79,11 +79,6 @@ static int report_error_detected(struct pci_dev *dev, > return 0; > } > > -static int report_frozen_detected(struct pci_dev *dev, void *data) > -{ > - return report_error_detected(dev, pci_channel_io_frozen, data); > -} > - > static int report_normal_detected(struct pci_dev *dev, void *data) > { > return report_error_detected(dev, pci_channel_io_normal, data); > @@ -146,9 +141,59 @@ static int report_resume(struct pci_dev *dev, void *data) > return 0; > } > > -pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > - pci_channel_state_t state, > +pci_ers_result_t pcie_do_fatal_recovery(struct pci_dev *dev, > pci_ers_result_t (*reset_link)(struct pci_dev *pdev)) > +{ > + struct pci_dev *udev; > + struct pci_bus *parent; > + struct pci_dev *pdev, *temp; > + pci_ers_result_t result; > + > + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) > + udev = dev; > + else > + udev = dev->bus->self; > + > + parent = udev->subordinate; > + pci_walk_bus(parent, pci_dev_set_disconnected, NULL); > + > + pci_lock_rescan_remove(); Though here you have lock, but hotplug will do another 'pci_stop_and_remove_bus_device()' without merging it with the hotplug driver, you have no way to remove the replicated actions in hotplug handler. 
Thanks,
Ethan

> +	pci_dev_get(dev);
> +	list_for_each_entry_safe_reverse(pdev, temp, &parent->devices,
> +					 bus_list) {
> +		pci_stop_and_remove_bus_device(pdev);
> +	}
> +
> +	result = reset_link(udev);
> +
> +	if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
> +		/*
> +		 * If the error is reported by a bridge, we think this error
> +		 * is related to the downstream link of the bridge, so we
> +		 * do error recovery on all subordinates of the bridge instead
> +		 * of the bridge and clear the error status of the bridge.
> +		 */
> +		pci_aer_clear_fatal_status(dev);
> +		if (pcie_aer_is_native(dev))
> +			pcie_clear_device_status(dev);
> +	}
> +
> +	if (result == PCI_ERS_RESULT_RECOVERED) {
> +		if (pcie_wait_for_link(udev, true))

And another pci_rescan_bus() like in the hotplug handler.

> +			pci_rescan_bus(udev->bus);
> +		pci_info(dev, "Device recovery from fatal error successful\n");
> +	} else {
> +		pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
> +		pci_info(dev, "Device recovery from fatal error failed\n");
> +	}
> +
> +	pci_dev_put(dev);
> +	pci_unlock_rescan_remove();
> +
> +	return result;
> +}
> +
> +pci_ers_result_t pcie_do_nonfatal_recovery(struct pci_dev *dev)
>  {
>  	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
>  	struct pci_bus *bus;
> @@ -164,16 +209,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>  	bus = dev->subordinate;
>
>  	pci_dbg(dev, "broadcast error_detected message\n");
> -	if (state == pci_channel_io_frozen) {
> -		pci_walk_bus(bus, report_frozen_detected, &status);
> -		status = reset_link(dev);
> -		if (status != PCI_ERS_RESULT_RECOVERED) {
> -			pci_warn(dev, "link reset failed\n");
> -			goto failed;
> -		}
> -	} else {
> -		pci_walk_bus(bus, report_normal_detected, &status);
> -	}
> +	pci_walk_bus(bus, report_normal_detected, &status);
>
>  	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
>  		status = PCI_ERS_RESULT_RECOVERED;
> --
> 2.17.1
>
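
For readers following the patch, the ordering inside pcie_do_fatal_recovery() (pick the upstream port, remove everything below it, reset the link, then rescan only if the reset succeeded) can be sketched in user space. This is only a rough model under stated assumptions: every name below (model_dev, model_bus, pick_upstream, model_fatal_recovery) is hypothetical and not kernel API, the device list is reduced to a counter, and the rescan is modeled on the child bus only, whereas the patch calls pci_rescan_bus(udev->bus).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct pci_dev / struct pci_bus. */
struct model_bus;

struct model_dev {
	const char *name;
	bool is_bridge;			/* models hdr_type == PCI_HEADER_TYPE_BRIDGE */
	struct model_bus *bus;		/* bus the device sits on */
	struct model_bus *subordinate;	/* bus below a bridge, NULL for endpoints */
};

struct model_bus {
	struct model_dev *self;		/* bridge that leads to this bus */
	int ndevs;			/* devices currently enumerated on this bus */
};

enum model_result { MODEL_RECOVERED, MODEL_DISCONNECT };

/* A bridge is its own recovery root; an endpoint uses the port above it. */
static struct model_dev *pick_upstream(struct model_dev *dev)
{
	return dev->is_bridge ? dev : dev->bus->self;
}

/*
 * link_ok models reset_link() returning PCI_ERS_RESULT_RECOVERED.
 * present models how many devices answer once the link is back up.
 */
static enum model_result model_fatal_recovery(struct model_dev *dev,
					      bool link_ok, int present)
{
	struct model_dev *udev = pick_upstream(dev);
	struct model_bus *child = udev->subordinate;

	assert(child != NULL);
	child->ndevs = 0;		/* stop and remove all devices below udev */

	if (!link_ok)
		return MODEL_DISCONNECT;	/* devices stay removed */

	child->ndevs = present;		/* re-enumerate behind the recovered link */
	return MODEL_RECOVERED;
}
```

In this model an endpoint error and a bridge error both resolve to the same port, and a failed link reset leaves the child bus empty, which mirrors the PCI_ERS_RESULT_DISCONNECT branch in the quoted code.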
On 10/14/20 8:07 AM, Ethan Zhao wrote: > On Wed, Oct 14, 2020 at 5:00 PM Kuppuswamy Sathyanarayanan > <sathyanarayanan.nkuppuswamy@gmail.com> wrote: >> >> Commit bdb5ac85777d ("PCI/ERR: Handle fatal error recovery") >> merged fatal and non-fatal error recovery paths, and also made >> recovery code depend on hotplug handler for "remove affected >> device + rescan" support. But this change also complicated the >> error recovery path and which in turn led to the following >> issues. >> >> 1. We depend on hotplug handler for removing the affected >> devices/drivers on DLLSC LINK down event (on DPC event >> trigger) and DPC handler for handling the error recovery. Since >> both handlers operate on same set of affected devices, it leads >> to race condition, which in turn leads to NULL pointer >> exceptions or error recovery failures.You can find more details >> about this issue in following link. >> >> https://lore.kernel.org/linux-pci/20201007113158.48933-1-haifeng.zhao@intel.com/T/#t >> >> 2. For non-hotplug capable devices fatal (DPC) error recovery >> is currently broken. Current fatal error recovery implementation >> relies on PCIe hotplug (pciehp) handler for detaching and >> re-enumerating the affected devices/drivers. So when dealing with >> non-hotplug capable devices, recovery code does not restore the state >> of the affected devices correctly. You can find more details about >> this issue in the following links. >> >> https://lore.kernel.org/linux-pci/20200527083130.4137-1-Zhiqiang.Hou@nxp.com/ >> https://lore.kernel.org/linux-pci/12115.1588207324@famine/ >> https://lore.kernel.org/linux-pci/0e6f89cd6b9e4a72293cc90fafe93487d7c2d295.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com/ >> >> In order to fix the above two issues, we should stop relying on hotplug > Yes, it doesn't rely on hotplug handler to remove and rescan the device, > but it couldn't prevent hotplug drivers from doing another replicated > removal/rescanning. 
> it doesn't make sense to leave another useless removal/rescanning there.
> Maybe that's why these two paths were merged to one and made it rely on
> hotplug.

No, as per PCIe spec, hotplug and DPC have no functional dependency. Hence depending on hotplug to handle part of the DPC recovery is incorrect and leads to issues on non-hotplug capable platforms (which is the case currently).

>
>> +	else
>> +		udev = dev->bus->self;
>> +
>> +	parent = udev->subordinate;
>> +	pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
>> +
>> +	pci_lock_rescan_remove();
> Though here you have lock, but hotplug will do another
> 'pci_stop_and_remove_bus_device()'
> without merging it with the hotplug driver, you have no way to
> remove the replicated actions in
> hotplug handler.

No, the core operation (remove/add device) is synchronized and done in only one thread. Please check the following flow. Even the hotplug handler attempts to hold the pci_lock_rescan_remove() lock before removing the device. So holding the same lock in the DPC handler will synchronize the DPC and hotplug handlers. Also, if one of the threads (DPC or hotplug) removes/adds the affected devices, the other thread will not repeat the same action (since the device is already removed/added).

->pciehp_ist()
  ->pciehp_handle_presence_or_link_change()
    ->pciehp_disable_slot()
      ->__pciehp_disable_slot()
        ->remove_board()
          ->pciehp_unconfigure_device()
            ->pci_lock_rescan_remove()

>
>
> Thanks,
> Ethan
>> +	pci_dev_get(dev);
>> +	list_for_each_entry_safe_reverse(pdev, temp, &parent->devices,
>> +					 bus_list) {
>> +		pci_stop_and_remove_bus_device(pdev);
>> +	}
>> +
>> +	result = reset_link(udev);
>> +
>> +	if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
>> +		/*
>> +		 * If the error is reported by a bridge, we think this error
>> +		 * is related to the downstream link of the bridge, so we
>> +		 * do error recovery on all subordinates of the bridge instead
>> +		 * of the bridge and clear the error status of the bridge.
>> +		 */
>> +		pci_aer_clear_fatal_status(dev);
>> +		if (pcie_aer_is_native(dev))
>> +			pcie_clear_device_status(dev);
>> +	}
>> +
>> +	if (result == PCI_ERS_RESULT_RECOVERED) {
>> +		if (pcie_wait_for_link(udev, true))
> And another pci_rescan_bus() like in the hotplug handler.

As I have mentioned before, holding the same lock should make them synchronized and not repeat the underlying functionality of pci_rescan_bus() in both threads at the same time.

>> +			pci_rescan_bus(udev->bus);
>> +		pci_info(dev, "Device recovery from fatal error successful\n");
>> +	} else {
>> +		pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
>> +		pci_info(dev, "Device recovery from fatal error failed\n");
>> --
>> 2.17.1
>>
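
The argument in this reply (both handlers take the same lock, and whichever runs second finds nothing left to remove) can be illustrated with a small user-space model. It is a single-threaded sketch under stated assumptions, not kernel code: rescan_remove_locked, model_lock(), model_unlock(), model_slot and model_remove_all() are hypothetical names, the lock is a plain flag rather than a mutex, and the two handlers are simply called one after the other. It says nothing about the scalability of a global lock, which is the separate concern raised later in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the global rescan/remove mutex. */
static bool rescan_remove_locked;

static void model_lock(void)
{
	assert(!rescan_remove_locked);	/* single-threaded model: never contended */
	rescan_remove_locked = true;
}

static void model_unlock(void)
{
	assert(rescan_remove_locked);
	rescan_remove_locked = false;
}

/* Hypothetical slot: counts devices still enumerated below one port. */
struct model_slot {
	int ndevs;
};

/*
 * Remove whatever is still present while holding the lock.
 * Returns the number of devices this caller actually removed.
 */
static int model_remove_all(struct model_slot *slot)
{
	int removed;

	model_lock();
	removed = slot->ndevs;
	slot->ndevs = 0;
	model_unlock();

	return removed;
}
```

Calling model_remove_all() twice on the same slot stands in for the DPC path followed by the hotplug path: the first call removes the devices, and the second returns 0 because the slot is already empty.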
On Thu, Oct 15, 2020 at 1:06 AM Kuppuswamy, Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> wrote: > > > > On 10/14/20 8:07 AM, Ethan Zhao wrote: > > On Wed, Oct 14, 2020 at 5:00 PM Kuppuswamy Sathyanarayanan > > <sathyanarayanan.nkuppuswamy@gmail.com> wrote: > >> > >> Commit bdb5ac85777d ("PCI/ERR: Handle fatal error recovery") > >> merged fatal and non-fatal error recovery paths, and also made > >> recovery code depend on hotplug handler for "remove affected > >> device + rescan" support. But this change also complicated the > >> error recovery path and which in turn led to the following > >> issues. > >> > >> 1. We depend on hotplug handler for removing the affected > >> devices/drivers on DLLSC LINK down event (on DPC event > >> trigger) and DPC handler for handling the error recovery. Since > >> both handlers operate on same set of affected devices, it leads > >> to race condition, which in turn leads to NULL pointer > >> exceptions or error recovery failures.You can find more details > >> about this issue in following link. > >> > >> https://lore.kernel.org/linux-pci/20201007113158.48933-1-haifeng.zhao@intel.com/T/#t > >> > >> 2. For non-hotplug capable devices fatal (DPC) error recovery > >> is currently broken. Current fatal error recovery implementation > >> relies on PCIe hotplug (pciehp) handler for detaching and > >> re-enumerating the affected devices/drivers. So when dealing with > >> non-hotplug capable devices, recovery code does not restore the state > >> of the affected devices correctly. You can find more details about > >> this issue in the following links. 
> >> > >> https://lore.kernel.org/linux-pci/20200527083130.4137-1-Zhiqiang.Hou@nxp.com/ > >> https://lore.kernel.org/linux-pci/12115.1588207324@famine/ > >> https://lore.kernel.org/linux-pci/0e6f89cd6b9e4a72293cc90fafe93487d7c2d295.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com/ > >> > >> In order to fix the above two issues, we should stop relying on hotplug > > Yes, it doesn't rely on hotplug handler to remove and rescan the device, > > but it couldn't prevent hotplug drivers from doing another replicated > > removal/rescanning. > > it doesn't make sense to leave another useless removal/rescanning there. > > Maybe that's why these two paths were merged to one and made it rely on > > hotplug. > No, as per PCIe spec, hotplug and DPC has no functional dependency. Hence > depending on it to handle some of its recovery function is in-correct and > would lead to issues in non-hotplug capable platforms (which is true > currently). > > > > >> + else > >> + udev = dev->bus->self; > >> + > >> + parent = udev->subordinate; > >> + pci_walk_bus(parent, pci_dev_set_disconnected, NULL); > >> + > >> + pci_lock_rescan_remove(); > > Though here you have lock, but hotplug will do another > > 'pci_stop_and_remove_bus_device()' > > without merging it with the hotplug driver, you have no way to > > remove the replicated actions in > > hotplug handler. > No, the core operation (remove/add device) is syncronzied and done in > only one thread. Please check the following flow. Even in hotplug pci_lock_rescan_remove() is global lock for PCIe, the mal-functional device's port holds this lock, it prevents the whole system from doing hot-plug operation. Though pciehp is not so hot/scalable and performance critical, but there is per cpu thread to handle hot-plug operation. synchronize all threads make them walk backwards for scalability. > handler, before removing the device, it attempts to hold pci_lock_rescan_remove() > lock. 
So holding the same lock in DPC handler will syncronize the DPC/hotplug > handlers. Also if one of the thread (DPC or hotplug) removes/adds the affected devices, > other thread will not repeat the same action (since the device is already removed/added). > > ->pciehp_ist() > ->pciehp_handle_presence_or_link_change() > ->pciehp_disable_slot() > ->__pciehp_disable_slot() > ->remove_board() > ->pciehp_unconfigure_device() > ->pci_lock_rescan_remove() > > > > > > Thanks, > > Ethan > >> + pci_dev_get(dev); > >> + list_for_each_entry_safe_reverse(pdev, temp, &parent->devices, > >> + bus_list) { > >> + pci_stop_and_remove_bus_device(pdev); > >> + } > >> + > >> + result = reset_link(udev); > >> + > >> + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { > >> + /* > >> + * If the error is reported by a bridge, we think this error > >> + * is related to the downstream link of the bridge, so we > >> + * do error recovery on all subordinates of the bridge instead > >> + * of the bridge and clear the error status of the bridge. > >> + */ > >> + pci_aer_clear_fatal_status(dev); > >> + if (pcie_aer_is_native(dev)) > >> + pcie_clear_device_status(dev); > >> + } > >> + > >> + if (result == PCI_ERS_RESULT_RECOVERED) { > >> + if (pcie_wait_for_link(udev, true)) > > And another pci_rescan_bus() like in the hotplug handler. > As I have mentioned before, holding the same lock should make them synchronized > and not repeat the underlying functionality of pci_rescan_bus() in both threads > at the same time. Yes, it blocked them all. Thanks, Ethan > >> + pci_rescan_bus(udev->bus); > >> + pci_info(dev, "Device recovery from fatal error successful\n"); > >> + } else { > >> + pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT); > >> + pci_info(dev, "Device recovery from fatal error failed\n"); > > >> -- > >> 2.17.1 > >> > > -- > Sathyanarayanan Kuppuswamy > Linux Kernel Developer
On 10/14/20 6:58 PM, Ethan Zhao wrote: > On Thu, Oct 15, 2020 at 1:06 AM Kuppuswamy, Sathyanarayanan > <sathyanarayanan.kuppuswamy@linux.intel.com> wrote: >> >> >> >> On 10/14/20 8:07 AM, Ethan Zhao wrote: >>> On Wed, Oct 14, 2020 at 5:00 PM Kuppuswamy Sathyanarayanan >>> <sathyanarayanan.nkuppuswamy@gmail.com> wrote: >>>> >>>> Commit bdb5ac85777d ("PCI/ERR: Handle fatal error recovery") >>>> merged fatal and non-fatal error recovery paths, and also made >>>> recovery code depend on hotplug handler for "remove affected >>>> device + rescan" support. But this change also complicated the >>>> error recovery path and which in turn led to the following >>>> issues. >>>> >>>> 1. We depend on hotplug handler for removing the affected >>>> devices/drivers on DLLSC LINK down event (on DPC event >>>> trigger) and DPC handler for handling the error recovery. Since >>>> both handlers operate on same set of affected devices, it leads >>>> to race condition, which in turn leads to NULL pointer >>>> exceptions or error recovery failures.You can find more details >>>> about this issue in following link. >>>> >>>> https://lore.kernel.org/linux-pci/20201007113158.48933-1-haifeng.zhao@intel.com/T/#t >>>> >>>> 2. For non-hotplug capable devices fatal (DPC) error recovery >>>> is currently broken. Current fatal error recovery implementation >>>> relies on PCIe hotplug (pciehp) handler for detaching and >>>> re-enumerating the affected devices/drivers. So when dealing with >>>> non-hotplug capable devices, recovery code does not restore the state >>>> of the affected devices correctly. You can find more details about >>>> this issue in the following links. 
>>>>
>>>> https://lore.kernel.org/linux-pci/20200527083130.4137-1-Zhiqiang.Hou@nxp.com/
>>>> https://lore.kernel.org/linux-pci/12115.1588207324@famine/
>>>> https://lore.kernel.org/linux-pci/0e6f89cd6b9e4a72293cc90fafe93487d7c2d295.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com/
>>>>
>>>> In order to fix the above two issues, we should stop relying on hotplug
>>> Yes, it doesn't rely on hotplug handler to remove and rescan the device,
>>> but it couldn't prevent hotplug drivers from doing another replicated
>>> removal/rescanning.
>>> it doesn't make sense to leave another useless removal/rescanning there.
>>> Maybe that's why these two paths were merged to one and made it rely on
>>> hotplug.
>> No, as per PCIe spec, hotplug and DPC has no functional dependency. Hence
>> depending on it to handle some of its recovery function is in-correct and
>> would lead to issues in non-hotplug capable platforms (which is true
>> currently).
>>>
> pci_lock_rescan_remove() is global lock for PCIe, the mal-functional
> device's port holds this lock, it prevents the whole system from doing
> hot-plug operation.

It does not prevent the hotplug operation, but it might delay it. Since both DPC and hotplug operate on the same set of devices, they must be synchronized.

> Though pciehp is not so hot/scalable and performance critical, but there
> is per cpu thread to handle hot-plug operation. synchronize all threads
> make them walk backwards for scalability.

DPC events do not happen at high frequency, so I don't think we should worry about performance here. Even the hotplug handler holds this lock when adding/removing devices, so adding/removing devices is already a serialized operation.

>
>>
>>>> --
>>>> 2.17.1
>>>>
>>
>> --
>> Sathyanarayanan Kuppuswamy
>> Linux Kernel Developer
On Thu, Oct 15, 2020 at 11:04 AM Kuppuswamy, Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> wrote: > > > > On 10/14/20 6:58 PM, Ethan Zhao wrote: > > On Thu, Oct 15, 2020 at 1:06 AM Kuppuswamy, Sathyanarayanan > > <sathyanarayanan.kuppuswamy@linux.intel.com> wrote: > >> > >> > >> > >> On 10/14/20 8:07 AM, Ethan Zhao wrote: > >>> On Wed, Oct 14, 2020 at 5:00 PM Kuppuswamy Sathyanarayanan > >>> <sathyanarayanan.nkuppuswamy@gmail.com> wrote: > >>>> > >>>> Commit bdb5ac85777d ("PCI/ERR: Handle fatal error recovery") > >>>> merged fatal and non-fatal error recovery paths, and also made > >>>> recovery code depend on hotplug handler for "remove affected > >>>> device + rescan" support. But this change also complicated the > >>>> error recovery path and which in turn led to the following > >>>> issues. > >>>> > >>>> 1. We depend on hotplug handler for removing the affected > >>>> devices/drivers on DLLSC LINK down event (on DPC event > >>>> trigger) and DPC handler for handling the error recovery. Since > >>>> both handlers operate on same set of affected devices, it leads > >>>> to race condition, which in turn leads to NULL pointer > >>>> exceptions or error recovery failures.You can find more details > >>>> about this issue in following link. > >>>> > >>>> https://lore.kernel.org/linux-pci/20201007113158.48933-1-haifeng.zhao@intel.com/T/#t > >>>> > >>>> 2. For non-hotplug capable devices fatal (DPC) error recovery > >>>> is currently broken. Current fatal error recovery implementation > >>>> relies on PCIe hotplug (pciehp) handler for detaching and > >>>> re-enumerating the affected devices/drivers. So when dealing with > >>>> non-hotplug capable devices, recovery code does not restore the state > >>>> of the affected devices correctly. You can find more details about > >>>> this issue in the following links. 
> >>>>
> >>>> https://lore.kernel.org/linux-pci/20200527083130.4137-1-Zhiqiang.Hou@nxp.com/
> >>>> https://lore.kernel.org/linux-pci/12115.1588207324@famine/
> >>>> https://lore.kernel.org/linux-pci/0e6f89cd6b9e4a72293cc90fafe93487d7c2d295.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com/
> >>>>
> >>>> In order to fix the above two issues, we should stop relying on hotplug
> >>> Yes, it doesn't rely on hotplug handler to remove and rescan the device,
> >>> but it couldn't prevent hotplug drivers from doing another replicated
> >>> removal/rescanning.
> >>> it doesn't make sense to leave another useless removal/rescanning there.
> >>> Maybe that's why these two paths were merged to one and made it rely on
> >>> hotplug.
> >> No, as per PCIe spec, hotplug and DPC has no functional dependency. Hence
> >> depending on it to handle some of its recovery function is in-correct and
> >> would lead to issues in non-hotplug capable platforms (which is true
> >> currently).
> >>>
>
> > pci_lock_rescan_remove() is global lock for PCIe, the mal-functional
> > device's port holds this lock, it prevents the whole system from doing
> > hot-plug operation.
> It does not prevent the hotplug operation, but it might delay it. Since both
> DPC and hotplug operates on same set of devices, it must be synchronized.

Think about a large system with several PCI domains, where every domain has
some ports and devices attached. Why *must* DPC and hotplug operate on the
same set of devices across different domains? If they must be synchronized,
why make the hotplug handlers threaded?

> > Though pciehp is not so hot/scalable and performance critical, but there
> > is per cpu thread to handle hot-plug operation. synchronize all threads
> > make them walk backwards for scalability.
> DPC events does not happen in high frequency. So I don't think we should

It's holding a global lock. Once a malfunction happens to one device and its
driver, the whole system, everyone who takes that lock, would be blocked from
working.

> worry about the performance here. Even hotplug handler will hold this lock
> when adding/removing the devices. So adding/removing devices is a serialized

You don't worry about performance, but if there is a requirement for more
scalable and reliable hotplug, the effort will be much harder. What to do
then? Choose another OS?

To be honest, I don't like the global lock, pci_lock_rescan_remove().

BTW, I didn't try brute-force FATAL error injection on your patch. Will the
duplicated removal work naturally because it was removed?

Thanks,
Ethan

> operation.
> >
>
> >>
> >>>> --
> >>>> 2.17.1
> >>>>
> >>
> >> --
> >> Sathyanarayanan Kuppuswamy
> >> Linux Kernel Developer
>
> --
> Sathyanarayanan Kuppuswamy
> Linux Kernel Developer
On 10/14/20 10:05 PM, Ethan Zhao wrote: > On Thu, Oct 15, 2020 at 11:04 AM Kuppuswamy, Sathyanarayanan > <sathyanarayanan.kuppuswamy@linux.intel.com> wrote: >> >> >> >> On 10/14/20 6:58 PM, Ethan Zhao wrote: >>> On Thu, Oct 15, 2020 at 1:06 AM Kuppuswamy, Sathyanarayanan >>> <sathyanarayanan.kuppuswamy@linux.intel.com> wrote: >>>> >>>> >>>> >>>> On 10/14/20 8:07 AM, Ethan Zhao wrote: >>>>> On Wed, Oct 14, 2020 at 5:00 PM Kuppuswamy Sathyanarayanan >>>>> <sathyanarayanan.nkuppuswamy@gmail.com> wrote: >>>>>> >>>>>> Commit bdb5ac85777d ("PCI/ERR: Handle fatal error recovery") >>>>>> merged fatal and non-fatal error recovery paths, and also made >>>>>> recovery code depend on hotplug handler for "remove affected >>>>>> device + rescan" support. But this change also complicated the >>>>>> error recovery path and which in turn led to the following >>>>>> issues. >>>>>> >>>>>> 1. We depend on hotplug handler for removing the affected >>>>>> devices/drivers on DLLSC LINK down event (on DPC event >>>>>> trigger) and DPC handler for handling the error recovery. Since >>>>>> both handlers operate on same set of affected devices, it leads >>>>>> to race condition, which in turn leads to NULL pointer >>>>>> exceptions or error recovery failures.You can find more details >>>>>> about this issue in following link. >>>>>> >>>>>> https://lore.kernel.org/linux-pci/20201007113158.48933-1-haifeng.zhao@intel.com/T/#t >>>>>> >>>>>> 2. For non-hotplug capable devices fatal (DPC) error recovery >>>>>> is currently broken. Current fatal error recovery implementation >>>>>> relies on PCIe hotplug (pciehp) handler for detaching and >>>>>> re-enumerating the affected devices/drivers. So when dealing with >>>>>> non-hotplug capable devices, recovery code does not restore the state >>>>>> of the affected devices correctly. You can find more details about >>>>>> this issue in the following links. 
>>>>>> >>>>>> https://lore.kernel.org/linux-pci/20200527083130.4137-1-Zhiqiang.Hou@nxp.com/ >>>>>> https://lore.kernel.org/linux-pci/12115.1588207324@famine/ >>>>>> https://lore.kernel.org/linux-pci/0e6f89cd6b9e4a72293cc90fafe93487d7c2d295.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com/ >>>>>> >>>>>> In order to fix the above two issues, we should stop relying on hotplug >>>>> Yes, it doesn't rely on hotplug handler to remove and rescan the device, >>>>> but it couldn't prevent hotplug drivers from doing another replicated >>>>> removal/rescanning. >>>>> it doesn't make sense to leave another useless removal/rescanning there. >>>>> Maybe that's why these two paths were merged to one and made it rely on >>>>> hotplug. >>>> No, as per PCIe spec, hotplug and DPC has no functional dependency. Hence >>>> depending on it to handle some of its recovery function is in-correct and >>>> would lead to issues in non-hotplug capable platforms (which is true >>>> currently). >>>>> >> > >>> Though pciehp is not so hot/scalable and performance critical, but there >>> is per cpu thread to handle hot-plug operation. synchronize all threads >>> make them walk backwards for scalability. >> DPC events does not happen in high frequency. So I don't think we should > It's holding global lock, once malfunction happens to one device and > it's driver, > the whole system, everyone holds it, would be blocked to work. >> worry about the performance here. Even hotplug handler will hold this lock >> when adding/removing the devices. So adding/removing devices is a serialized > You don't worry about performance, but if there is a requirement needs > more scalable > and reliable hotplug, the effect will be much harder. what to do then ? choose > another OS ? As I have mentioned, all device creation/removal in PCI core code is already protected by this lock (including hotplug code). So the multidomain performance impact you mentioned should exist even now. 
All I am doing is using the same lock to protect device removal/rescan in the
error recovery code.

drivers/pci/xen-pcifront.c:477: pci_lock_rescan_remove();
drivers/pci/xen-pcifront.c:567: pci_lock_rescan_remove();
drivers/pci/xen-pcifront.c:1064: pci_lock_rescan_remove();
drivers/pci/hotplug/rpaphp_core.c:498: pci_lock_rescan_remove();
drivers/pci/hotplug/rpaphp_core.c:520: pci_lock_rescan_remove();
drivers/pci/hotplug/s390_pci_hpc.c:70: pci_lock_rescan_remove();
drivers/pci/hotplug/shpchp_pci.c:31: pci_lock_rescan_remove();
drivers/pci/hotplug/shpchp_pci.c:73: pci_lock_rescan_remove();
drivers/pci/hotplug/pciehp_pci.c:39: pci_lock_rescan_remove();
drivers/pci/hotplug/pciehp_pci.c:96: pci_lock_rescan_remove();
drivers/pci/hotplug/acpiphp_glue.c:762: pci_lock_rescan_remove();
drivers/pci/hotplug/acpiphp_glue.c:787: pci_lock_rescan_remove();
drivers/pci/hotplug/acpiphp_glue.c:975: pci_lock_rescan_remove();
drivers/pci/hotplug/acpiphp_glue.c:1026: pci_lock_rescan_remove();
drivers/pci/hotplug/cpqphp_pci.c:75: pci_lock_rescan_remove();
drivers/pci/hotplug/cpqphp_pci.c:120: pci_lock_rescan_remove();
drivers/pci/hotplug/rpadlpar_core.c:361: pci_lock_rescan_remove();
drivers/pci/hotplug/pnv_php.c:513: pci_lock_rescan_remove();
drivers/pci/hotplug/pnv_php.c:582: pci_lock_rescan_remove();
drivers/pci/hotplug/ibmphp_core.c:668: pci_lock_rescan_remove();
drivers/pci/hotplug/ibmphp_core.c:738: pci_lock_rescan_remove();
drivers/pci/hotplug/cpci_hotplug_pci.c:245: pci_lock_rescan_remove();
drivers/pci/hotplug/cpci_hotplug_pci.c:298: pci_lock_rescan_remove();
drivers/pci/controller/pci-hyperv.c:1866: pci_lock_rescan_remove();
drivers/pci/controller/pci-hyperv.c:2135: pci_lock_rescan_remove();
drivers/pci/controller/pci-hyperv.c:2313: pci_lock_rescan_remove();
drivers/pci/controller/pci-hyperv.c:3300: pci_lock_rescan_remove();
drivers/pci/controller/pci-host-common.c:91: pci_lock_rescan_remove();
drivers/pci/remove.c:123: pci_lock_rescan_remove();
drivers/pci/pci-sysfs.c:410: pci_lock_rescan_remove();
drivers/pci/pci-sysfs.c:444: pci_lock_rescan_remove();
drivers/pci/pci-sysfs.c:479: pci_lock_rescan_remove();
drivers/pci/probe.c:3231:void pci_lock_rescan_remove(void)
drivers/pci/probe.c:3235:EXPORT_SYMBOL_GPL(pci_lock_rescan_remove);

> To be honest, I don't like the global lock/ pci_lock_rescan_remove().
>
> BTW, I didn't try the FATAL errors brute force injection on your
> patch, duplicated
> removal will work naturally because it was removed ?
>
> Thanks,
> Ethan
>> operation.
>>>
>>
>>>>
>>>>>> --
>>>>>> 2.17.1
>>>>>>
>>>>
>>>> --
>>>> Sathyanarayanan Kuppuswamy
>>>> Linux Kernel Developer
>>
>> --
>> Sathyanarayanan Kuppuswamy
>> Linux Kernel Developer
>  /* PCI error reporting and recovery */
> -pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> -		pci_channel_state_t state,
> +pci_ers_result_t pcie_do_nonfatal_recovery(struct pci_dev *dev);
> +
> +pci_ers_result_t pcie_do_fatal_recovery(struct pci_dev *dev,
>  		pci_ers_result_t (*reset_link)(struct pci_dev *pdev));

Now that both functions have descriptive names (thanks for that!), the
"_do" component of the names can be dropped.

> +pci_ers_result_t pcie_do_fatal_recovery(struct pci_dev *dev,
>  		pci_ers_result_t (*reset_link)(struct pci_dev *pdev))
> +{
> +	struct pci_dev *udev;
> +	struct pci_bus *parent;
> +	struct pci_dev *pdev, *temp;
> +	pci_ers_result_t result;
> +
> +	if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
> +		udev = dev;
> +	else
> +		udev = dev->bus->self;
> +
> +	parent = udev->subordinate;
> +	pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
> +
> +	pci_lock_rescan_remove();
> +	pci_dev_get(dev);
> +	list_for_each_entry_safe_reverse(pdev, temp, &parent->devices,
> +					 bus_list) {
> +		pci_stop_and_remove_bus_device(pdev);
> +	}
> +
> +	result = reset_link(udev);

Some of the indentation seems strange here, please use tab based indents
everywhere.
On Wed, Oct 14, 2020 at 5:00 PM Kuppuswamy Sathyanarayanan <sathyanarayanan.nkuppuswamy@gmail.com> wrote: > > Commit bdb5ac85777d ("PCI/ERR: Handle fatal error recovery") > merged fatal and non-fatal error recovery paths, and also made > recovery code depend on hotplug handler for "remove affected > device + rescan" support. But this change also complicated the > error recovery path and which in turn led to the following > issues. > > 1. We depend on hotplug handler for removing the affected > devices/drivers on DLLSC LINK down event (on DPC event > trigger) and DPC handler for handling the error recovery. Since > both handlers operate on same set of affected devices, it leads > to race condition, which in turn leads to NULL pointer > exceptions or error recovery failures.You can find more details > about this issue in following link. > > https://lore.kernel.org/linux-pci/20201007113158.48933-1-haifeng.zhao@intel.com/T/#t > > 2. For non-hotplug capable devices fatal (DPC) error recovery > is currently broken. Current fatal error recovery implementation > relies on PCIe hotplug (pciehp) handler for detaching and > re-enumerating the affected devices/drivers. So when dealing with > non-hotplug capable devices, recovery code does not restore the state > of the affected devices correctly. You can find more details about > this issue in the following links. > > https://lore.kernel.org/linux-pci/20200527083130.4137-1-Zhiqiang.Hou@nxp.com/ > https://lore.kernel.org/linux-pci/12115.1588207324@famine/ > https://lore.kernel.org/linux-pci/0e6f89cd6b9e4a72293cc90fafe93487d7c2d295.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com/ > > In order to fix the above two issues, we should stop relying on hotplug > handler for cleaning the affected devices/drivers and let error recovery > handler own this functionality. 
So this patch reverts Commit bdb5ac85777d > ("PCI/ERR: Handle fatal error recovery") and re-introduce the "remove > affected device + rescan" functionality in fatal error recovery handler. > > Also holding pci_lock_rescan_remove() will prevent the race between hotplug > and DPC handler. > > Fixes: bdb5ac85777d ("PCI/ERR: Handle fatal error recovery") > Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> > Reviewed-by: Sinan Kaya <okaya@kernel.org> > --- > Changes since v4: > * Added new interfaces for error recovery (pcie_do_fatal_recovery() > and pcie_do_nonfatal_recovery()). > > Changes since v5: > * Fixed static/non-static declartion issue. > > Documentation/PCI/pci-error-recovery.rst | 47 ++++++++++------ > Documentation/PCI/pcieaer-howto.rst | 2 +- > drivers/pci/pci.h | 5 +- > drivers/pci/pcie/aer.c | 10 ++-- > drivers/pci/pcie/dpc.c | 2 +- > drivers/pci/pcie/edr.c | 2 +- > drivers/pci/pcie/err.c | 70 ++++++++++++++++++------ > 7 files changed, 94 insertions(+), 44 deletions(-) > > diff --git a/Documentation/PCI/pci-error-recovery.rst b/Documentation/PCI/pci-error-recovery.rst > index 84ceebb08cac..830c8af5838b 100644 > --- a/Documentation/PCI/pci-error-recovery.rst > +++ b/Documentation/PCI/pci-error-recovery.rst > @@ -115,7 +115,7 @@ The actual steps taken by a platform to recover from a PCI error > event will be platform-dependent, but will follow the general > sequence described below. > > -STEP 0: Error Event > +STEP 0: Error Event: ERR_NONFATAL > ------------------- > A PCI bus error is detected by the PCI hardware. On powerpc, the slot > is isolated, in that all I/O is blocked: all reads return 0xffffffff, > @@ -160,10 +160,10 @@ particular, if the platform doesn't isolate slots), and recovery > proceeds to STEP 2 (MMIO Enable). > > If any driver requested a slot reset (by returning PCI_ERS_RESULT_NEED_RESET), > -then recovery proceeds to STEP 4 (Slot Reset). > +then recovery proceeds to STEP 3 (Slot Reset). 
> > If the platform is unable to recover the slot, the next step > -is STEP 6 (Permanent Failure). > +is STEP 5 (Permanent Failure). > > .. note:: > > @@ -198,7 +198,7 @@ reset or some such, but not restart operations. This callback is made if > all drivers on a segment agree that they can try to recover and if no automatic > link reset was performed by the HW. If the platform can't just re-enable IOs > without a slot reset or a link reset, it will not call this callback, and > -instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset) > +instead will have gone directly to STEP 3 (Slot Reset) > > .. note:: > > @@ -233,18 +233,12 @@ The driver should return one of the following result codes: > > The next step taken depends on the results returned by the drivers. > If all drivers returned PCI_ERS_RESULT_RECOVERED, then the platform > -proceeds to either STEP3 (Link Reset) or to STEP 5 (Resume Operations). > +proceeds to STEP 4 (Resume Operations). > > If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform > -proceeds to STEP 4 (Slot Reset) > +proceeds to STEP 3 (Slot Reset) > > -STEP 3: Link Reset > ------------------- > -The platform resets the link. This is a PCI-Express specific step > -and is done whenever a fatal error has been detected that can be > -"solved" by resetting the link. > - > -STEP 4: Slot Reset > +STEP 3: Slot Reset > ------------------ > > In response to a return value of PCI_ERS_RESULT_NEED_RESET, the > @@ -322,7 +316,7 @@ PCI card types:: > + pdev->needs_freset = 1; > + > > -Platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent > +Platform proceeds either to STEP 4 (Resume Operations) or STEP 5 (Permanent > Failure). > > .. note:: > @@ -332,7 +326,7 @@ Failure). > However, it probably should. 
> > > -STEP 5: Resume Operations > +STEP 4: Resume Operations > ------------------------- > The platform will call the resume() callback on all affected device > drivers if all drivers on the segment have returned > @@ -344,7 +338,7 @@ a result code. > At this point, if a new error happens, the platform will restart > a new error recovery sequence. > > -STEP 6: Permanent Failure > +STEP 5: Permanent Failure > ------------------------- > A "permanent failure" has occurred, and the platform cannot recover > the device. The platform will call error_detected() with a > @@ -367,6 +361,27 @@ errors. See the discussion in powerpc/eeh-pci-error-recovery.txt > for additional detail on real-life experience of the causes of > software errors. > > +STEP 0: Error Event: ERR_FATAL > +-------------------- > +PCI bus error is detected by the PCI hardware. On powerpc, the slot is > +isolated, in that all I/O is blocked: all reads return 0xffffffff, all > +writes are ignored. > + > +STEP 1: Remove devices > +--------------------- > +Platform removes the devices depending on the error agent, it could be > +this port for all subordinates or upstream component (likely downstream > +port) > + > +STEP 2: Reset link > +--------------------- > +The platform resets the link. This is a PCI-Express specific step and is > +done whenever a fatal error has been detected that can be "solved" by > +resetting the link. > + > +STEP 3: Re-enumerate the devices > +--------------------- > +Initiates the re-enumeration. > > Conclusion; General Remarks > --------------------------- > diff --git a/Documentation/PCI/pcieaer-howto.rst b/Documentation/PCI/pcieaer-howto.rst > index 0b36b9ebfa4b..9528cfd9449b 100644 > --- a/Documentation/PCI/pcieaer-howto.rst > +++ b/Documentation/PCI/pcieaer-howto.rst > @@ -206,7 +206,7 @@ error_detected(dev, pci_channel_io_frozen) to all drivers within > a hierarchy in question. Then, performing link reset at upstream is > necessary. 
As different kinds of devices might use different approaches > to reset link, AER port service driver is required to provide the > -function to reset link via callback parameter of pcie_do_recovery() > +function to reset link via callback parameter of pcie_do_fatal_recovery() > function. If reset_link is not NULL, recovery function will use it > to reset the link. If error_detected returns PCI_ERS_RESULT_CAN_RECOVER > and reset_link returns PCI_ERS_RESULT_RECOVERED, the error handling goes > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h > index fa12f7cbc1a0..b3e1571107b5 100644 > --- a/drivers/pci/pci.h > +++ b/drivers/pci/pci.h > @@ -555,8 +555,9 @@ static inline int pci_dev_specific_disable_acs_redir(struct pci_dev *dev) > #endif > > /* PCI error reporting and recovery */ > -pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > - pci_channel_state_t state, > +pci_ers_result_t pcie_do_nonfatal_recovery(struct pci_dev *dev); > + > +pci_ers_result_t pcie_do_fatal_recovery(struct pci_dev *dev, > pci_ers_result_t (*reset_link)(struct pci_dev *pdev)); > > bool pcie_wait_for_link(struct pci_dev *pdev, bool active); > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > index 65dff5f3457a..f3e70bb9b30d 100644 > --- a/drivers/pci/pcie/aer.c > +++ b/drivers/pci/pcie/aer.c > @@ -947,9 +947,9 @@ static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info) > if (pcie_aer_is_native(dev)) > pcie_clear_device_status(dev); > } else if (info->severity == AER_NONFATAL) > - pcie_do_recovery(dev, pci_channel_io_normal, aer_root_reset); > + pcie_do_nonfatal_recovery(dev); > else if (info->severity == AER_FATAL) > - pcie_do_recovery(dev, pci_channel_io_frozen, aer_root_reset); > + pcie_do_fatal_recovery(dev, aer_root_reset); > pci_dev_put(dev); > } > > @@ -985,11 +985,9 @@ static void aer_recover_work_func(struct work_struct *work) > } > cper_print_aer(pdev, entry.severity, entry.regs); > if (entry.severity == AER_NONFATAL) > - 
pcie_do_recovery(pdev, pci_channel_io_normal, > - aer_root_reset); > + pcie_do_nonfatal_recovery(pdev); > else if (entry.severity == AER_FATAL) > - pcie_do_recovery(pdev, pci_channel_io_frozen, > - aer_root_reset); > + pcie_do_fatal_recovery(pdev, aer_root_reset); > pci_dev_put(pdev); > } > } > diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c > index daa9a4153776..74e7d1da3cf0 100644 > --- a/drivers/pci/pcie/dpc.c > +++ b/drivers/pci/pcie/dpc.c > @@ -233,7 +233,7 @@ static irqreturn_t dpc_handler(int irq, void *context) > dpc_process_error(pdev); > > /* We configure DPC so it only triggers on ERR_FATAL */ > - pcie_do_recovery(pdev, pci_channel_io_frozen, dpc_reset_link); > + pcie_do_fatal_recovery(pdev, dpc_reset_link); > > return IRQ_HANDLED; > } > diff --git a/drivers/pci/pcie/edr.c b/drivers/pci/pcie/edr.c > index a6b9b479b97a..87379bc566f6 100644 > --- a/drivers/pci/pcie/edr.c > +++ b/drivers/pci/pcie/edr.c > @@ -183,7 +183,7 @@ static void edr_handle_event(acpi_handle handle, u32 event, void *data) > * or ERR_NONFATAL, since the link is already down, use the FATAL > * error recovery path for both cases. 
>  	 */
> -	estate = pcie_do_recovery(edev, pci_channel_io_frozen, dpc_reset_link);
> +	estate = pcie_do_fatal_recovery(edev, dpc_reset_link);
>
>  send_ost:
>
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 315a4d559c4c..fa50366a9632 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -79,11 +79,6 @@ static int report_error_detected(struct pci_dev *dev,
>  	return 0;
>  }
>
> -static int report_frozen_detected(struct pci_dev *dev, void *data)
> -{
> -	return report_error_detected(dev, pci_channel_io_frozen, data);
> -}
> -
>  static int report_normal_detected(struct pci_dev *dev, void *data)
>  {
>  	return report_error_detected(dev, pci_channel_io_normal, data);
> @@ -146,9 +141,59 @@ static int report_resume(struct pci_dev *dev, void *data)
>  	return 0;
>  }
>
> -pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> -		pci_channel_state_t state,
> +pci_ers_result_t pcie_do_fatal_recovery(struct pci_dev *dev,
>  		pci_ers_result_t (*reset_link)(struct pci_dev *pdev))
> +{
> +	struct pci_dev *udev;
> +	struct pci_bus *parent;
> +	struct pci_dev *pdev, *temp;
> +	pci_ers_result_t result;
> +
> +	if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
> +		udev = dev;
> +	else
> +		udev = dev->bus->self;
> +
> +	parent = udev->subordinate;
> +	pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
> +
> +	pci_lock_rescan_remove();
> +	pci_dev_get(dev);
> +	list_for_each_entry_safe_reverse(pdev, temp, &parent->devices,
> +					 bus_list) {
> +		pci_stop_and_remove_bus_device(pdev);

Again, the pciehp driver won't naturally ignore the DLLSC/PDC event, and it
won't skip pcie_disable_slot() just because you call
pci_stop_and_remove_bus_device(pdev) here. When the DPC driver gets its
interrupt and runs into this code, the pciehp driver also gets DLLSC/PDC and
may run into SURPRISE_REMOVAL first, so there would be a duplicated
pci_stop_and_remove_bus_device(pdev).
Don't say it would naturally be okay. Check it.

> +	}
> +
> +	result = reset_link(udev);
> +
> +	if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
> +		/*
> +		 * If the error is reported by a bridge, we think this error
> +		 * is related to the downstream link of the bridge, so we
> +		 * do error recovery on all subordinates of the bridge instead
> +		 * of the bridge and clear the error status of the bridge.
> +		 */
> +		pci_aer_clear_fatal_status(dev);
> +		if (pcie_aer_is_native(dev))
> +			pcie_clear_device_status(dev);
> +	}
> +
> +	if (result == PCI_ERS_RESULT_RECOVERED) {
> +		if (pcie_wait_for_link(udev, true))
> +			pci_rescan_bus(udev->bus);

The same goes for this pci_rescan_bus(udev->bus). You said you won't rely on
the pciehp handler, but how do you eliminate the handler's duplicated actions?

Thanks,
Ethan

> +		pci_info(dev, "Device recovery from fatal error successful\n");
> +	} else {
> +		pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
> +		pci_info(dev, "Device recovery from fatal error failed\n");
> +	}
> +
> +	pci_dev_put(dev);
> +	pci_unlock_rescan_remove();
> +
> +	return result;
> +}
> +
> +pci_ers_result_t pcie_do_nonfatal_recovery(struct pci_dev *dev)
>  {
>  	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
>  	struct pci_bus *bus;
> @@ -164,16 +209,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>  	bus = dev->subordinate;
>
>  	pci_dbg(dev, "broadcast error_detected message\n");
> -	if (state == pci_channel_io_frozen) {
> -		pci_walk_bus(bus, report_frozen_detected, &status);
> -		status = reset_link(dev);
> -		if (status != PCI_ERS_RESULT_RECOVERED) {
> -			pci_warn(dev, "link reset failed\n");
> -			goto failed;
> -		}
> -	} else {
> -		pci_walk_bus(bus, report_normal_detected, &status);
> -	}
> +	pci_walk_bus(bus, report_normal_detected, &status);
>
>  	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
>  		status = PCI_ERS_RESULT_RECOVERED;
> --
> 2.17.1
>
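
[Editorial sketch] The duplicate-action question raised in the message above can be made concrete with a toy user-space model. This is only a sketch under stated assumptions, not kernel code: `toy_bus`, `toy_remove_all()`, `toy_rescan()` and `toy_fatal_recovery()` are hypothetical names invented for illustration, and the model ignores locking, driver callbacks and real enumeration. It mirrors only the ordering in the quoted pcie_do_fatal_recovery() (remove the subordinate devices last-first, reset the link, rescan on success) and shows that, in this model, a second removal or rescan pass finds nothing left to do. Whether pciehp behaves the same way on real hardware is what the review asks to have checked.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_DEVS 8

/* Hypothetical stand-in for a bus; NOT the kernel's struct pci_bus. */
struct toy_bus {
	int devs[TOY_MAX_DEVS];	/* ids of the currently enumerated devices */
	size_t ndevs;		/* number of valid entries in devs[] */
};

enum toy_result { TOY_RECOVERED, TOY_DISCONNECT };

/* Remove every device, last-enumerated first; return how many were removed. */
size_t toy_remove_all(struct toy_bus *bus)
{
	size_t removed = 0;

	while (bus->ndevs > 0) {
		bus->ndevs--;	/* drop the tail entry */
		removed++;
	}
	return removed;
}

/* Add devices until nhw are present (capped); return how many were added. */
size_t toy_rescan(struct toy_bus *bus, size_t nhw)
{
	size_t added = 0;

	while (bus->ndevs < nhw && bus->ndevs < TOY_MAX_DEVS) {
		bus->devs[bus->ndevs] = (int)bus->ndevs;
		bus->ndevs++;
		added++;
	}
	return added;
}

/*
 * Toy fatal path in the same order as the quoted patch:
 * remove devices, "reset the link", rescan only if the reset succeeded.
 * link_reset_ok is an assumed input standing in for the reset_link() result.
 */
enum toy_result toy_fatal_recovery(struct toy_bus *bus, size_t nhw,
				   bool link_reset_ok)
{
	toy_remove_all(bus);
	if (!link_reset_ok)
		return TOY_DISCONNECT;
	toy_rescan(bus, nhw);
	return TOY_RECOVERED;
}
```

In this model, calling toy_remove_all() twice on a three-device bus removes three devices and then zero, and a repeated toy_rescan() adds nothing once all devices are present. The model says nothing about ordering between two concurrent handlers, which is the part the global lock is meant to cover.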
On Thu, Oct 15, 2020 at 1:53 PM Kuppuswamy, Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> wrote: > > > > On 10/14/20 10:05 PM, Ethan Zhao wrote: > > On Thu, Oct 15, 2020 at 11:04 AM Kuppuswamy, Sathyanarayanan > > <sathyanarayanan.kuppuswamy@linux.intel.com> wrote: > >> > >> > >> > >> On 10/14/20 6:58 PM, Ethan Zhao wrote: > >>> On Thu, Oct 15, 2020 at 1:06 AM Kuppuswamy, Sathyanarayanan > >>> <sathyanarayanan.kuppuswamy@linux.intel.com> wrote: > >>>> > >>>> > >>>> > >>>> On 10/14/20 8:07 AM, Ethan Zhao wrote: > >>>>> On Wed, Oct 14, 2020 at 5:00 PM Kuppuswamy Sathyanarayanan > >>>>> <sathyanarayanan.nkuppuswamy@gmail.com> wrote: > >>>>>> > >>>>>> Commit bdb5ac85777d ("PCI/ERR: Handle fatal error recovery") > >>>>>> merged fatal and non-fatal error recovery paths, and also made > >>>>>> recovery code depend on hotplug handler for "remove affected > >>>>>> device + rescan" support. But this change also complicated the > >>>>>> error recovery path and which in turn led to the following > >>>>>> issues. > >>>>>> > >>>>>> 1. We depend on hotplug handler for removing the affected > >>>>>> devices/drivers on DLLSC LINK down event (on DPC event > >>>>>> trigger) and DPC handler for handling the error recovery. Since > >>>>>> both handlers operate on same set of affected devices, it leads > >>>>>> to race condition, which in turn leads to NULL pointer > >>>>>> exceptions or error recovery failures.You can find more details > >>>>>> about this issue in following link. > >>>>>> > >>>>>> https://lore.kernel.org/linux-pci/20201007113158.48933-1-haifeng.zhao@intel.com/T/#t > >>>>>> > >>>>>> 2. For non-hotplug capable devices fatal (DPC) error recovery > >>>>>> is currently broken. Current fatal error recovery implementation > >>>>>> relies on PCIe hotplug (pciehp) handler for detaching and > >>>>>> re-enumerating the affected devices/drivers. 
So when dealing with > >>>>>> non-hotplug capable devices, recovery code does not restore the state > >>>>>> of the affected devices correctly. You can find more details about > >>>>>> this issue in the following links. > >>>>>> > >>>>>> https://lore.kernel.org/linux-pci/20200527083130.4137-1-Zhiqiang.Hou@nxp.com/ > >>>>>> https://lore.kernel.org/linux-pci/12115.1588207324@famine/ > >>>>>> https://lore.kernel.org/linux-pci/0e6f89cd6b9e4a72293cc90fafe93487d7c2d295.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com/ > >>>>>> > >>>>>> In order to fix the above two issues, we should stop relying on hotplug > >>>>> Yes, it doesn't rely on hotplug handler to remove and rescan the device, > >>>>> but it couldn't prevent hotplug drivers from doing another replicated > >>>>> removal/rescanning. > >>>>> it doesn't make sense to leave another useless removal/rescanning there. > >>>>> Maybe that's why these two paths were merged to one and made it rely on > >>>>> hotplug. > >>>> No, as per PCIe spec, hotplug and DPC has no functional dependency. Hence > >>>> depending on it to handle some of its recovery function is in-correct and > >>>> would lead to issues in non-hotplug capable platforms (which is true > >>>> currently). > >>>>> > >> > > > > >>> Though pciehp is not so hot/scalable and performance critical, but there > >>> is per cpu thread to handle hot-plug operation. synchronize all threads > >>> make them walk backwards for scalability. > >> DPC events does not happen in high frequency. So I don't think we should > > It's holding global lock, once malfunction happens to one device and > > it's driver, > > the whole system, everyone holds it, would be blocked to work. > >> worry about the performance here. Even hotplug handler will hold this lock > >> when adding/removing the devices. 
So adding/removing devices is a serialized > > You don't worry about performance, but if there is a requirement needs > > more scalable > > and reliable hotplug, the effect will be much harder. what to do then ? choose > > another OS ? > As I have mentioned, all device creation/removal in PCI core code is already > protected by this lock (including hotplug code). So the multidomain performance > impact you mentioned should exist even now. All I am doing is, using the > same lock for protecting device removal/rescan in error recovery code. > > drivers/pci/xen-pcifront.c:477: pci_lock_rescan_remove(); > drivers/pci/xen-pcifront.c:567: pci_lock_rescan_remove(); > drivers/pci/xen-pcifront.c:1064: pci_lock_rescan_remove(); > drivers/pci/hotplug/rpaphp_core.c:498: pci_lock_rescan_remove(); > drivers/pci/hotplug/rpaphp_core.c:520: pci_lock_rescan_remove(); > drivers/pci/hotplug/s390_pci_hpc.c:70: pci_lock_rescan_remove(); > drivers/pci/hotplug/shpchp_pci.c:31: pci_lock_rescan_remove(); > drivers/pci/hotplug/shpchp_pci.c:73: pci_lock_rescan_remove(); > drivers/pci/hotplug/pciehp_pci.c:39: pci_lock_rescan_remove(); > drivers/pci/hotplug/pciehp_pci.c:96: pci_lock_rescan_remove(); > drivers/pci/hotplug/acpiphp_glue.c:762: pci_lock_rescan_remove(); > drivers/pci/hotplug/acpiphp_glue.c:787: pci_lock_rescan_remove(); > drivers/pci/hotplug/acpiphp_glue.c:975: pci_lock_rescan_remove(); > drivers/pci/hotplug/acpiphp_glue.c:1026: pci_lock_rescan_remove(); > drivers/pci/hotplug/cpqphp_pci.c:75: pci_lock_rescan_remove(); > drivers/pci/hotplug/cpqphp_pci.c:120: pci_lock_rescan_remove(); > drivers/pci/hotplug/rpadlpar_core.c:361: pci_lock_rescan_remove(); > drivers/pci/hotplug/pnv_php.c:513: pci_lock_rescan_remove(); > drivers/pci/hotplug/pnv_php.c:582: pci_lock_rescan_remove(); > drivers/pci/hotplug/ibmphp_core.c:668: pci_lock_rescan_remove(); > drivers/pci/hotplug/ibmphp_core.c:738: pci_lock_rescan_remove(); > drivers/pci/hotplug/cpci_hotplug_pci.c:245: pci_lock_rescan_remove(); > 
drivers/pci/hotplug/cpci_hotplug_pci.c:298: pci_lock_rescan_remove();
> drivers/pci/controller/pci-hyperv.c:1866: pci_lock_rescan_remove();
> drivers/pci/controller/pci-hyperv.c:2135: pci_lock_rescan_remove();
> drivers/pci/controller/pci-hyperv.c:2313: pci_lock_rescan_remove();
> drivers/pci/controller/pci-hyperv.c:3300: pci_lock_rescan_remove();
> drivers/pci/controller/pci-host-common.c:91: pci_lock_rescan_remove();
> drivers/pci/remove.c:123: pci_lock_rescan_remove();
> drivers/pci/pci-sysfs.c:410: pci_lock_rescan_remove();
> drivers/pci/pci-sysfs.c:444: pci_lock_rescan_remove();
> drivers/pci/pci-sysfs.c:479: pci_lock_rescan_remove();
> drivers/pci/probe.c:3231:void pci_lock_rescan_remove(void)
> drivers/pci/probe.c:3235:EXPORT_SYMBOL_GPL(pci_lock_rescan_remove);

The fact that the lock is already used this way isn't a good justification
for using it in more places and making things worse in order to "fix"
something. By the same logic, the PCIe spec not mentioning that DPC should
rely on hotplug doesn't by itself mean that relying on it is correct or
incorrect.

Thanks,
Ethan

>
> > To be honest, I don't like the global lock/ pci_lock_rescan_remove().
> >
> > BTW, I didn't try the FATAL errors brute force injection on your
> > patch, duplicated
> > removal will work naturally because it was removed ?
> >
> > Thanks,
> > Ethan
> >> operation.
> >>>
> >>
> >>>>
> >>>>>> --
> >>>>>> 2.17.1
> >>>>>>
> >>>>
> >>>> --
> >>>> Sathyanarayanan Kuppuswamy
> >>>> Linux Kernel Developer
> >>
> >> --
> >> Sathyanarayanan Kuppuswamy
> >> Linux Kernel Developer
>
> --
> Sathyanarayanan Kuppuswamy
> Linux Kernel Developer
On 10/15/20 6:55 AM, Ethan Zhao wrote: > On Wed, Oct 14, 2020 at 5:00 PM Kuppuswamy Sathyanarayanan > <sathyanarayanan.nkuppuswamy@gmail.com> wrote: >> >> Commit bdb5ac85777d ("PCI/ERR: Handle fatal error recovery") >> merged fatal and non-fatal error recovery paths, and also made >> recovery code depend on hotplug handler for "remove affected >> device + rescan" support. But this change also complicated the >> error recovery path and which in turn led to the following >> issues. >> >> 1. We depend on hotplug handler for removing the affected >> devices/drivers on DLLSC LINK down event (on DPC event >> trigger) and DPC handler for handling the error recovery. Since >> both handlers operate on same set of affected devices, it leads >> to race condition, which in turn leads to NULL pointer >> exceptions or error recovery failures.You can find more details >> about this issue in following link. >> >> https://lore.kernel.org/linux-pci/20201007113158.48933-1-haifeng.zhao@intel.com/T/#t >> >> 2. For non-hotplug capable devices fatal (DPC) error recovery >> is currently broken. Current fatal error recovery implementation >> relies on PCIe hotplug (pciehp) handler for detaching and >> re-enumerating the affected devices/drivers. So when dealing with >> non-hotplug capable devices, recovery code does not restore the state >> of the affected devices correctly. You can find more details about >> this issue in the following links. >> >> https://lore.kernel.org/linux-pci/20200527083130.4137-1-Zhiqiang.Hou@nxp.com/ >> https://lore.kernel.org/linux-pci/12115.1588207324@famine/ >> https://lore.kernel.org/linux-pci/0e6f89cd6b9e4a72293cc90fafe93487d7c2d295.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com/ >> >> In order to fix the above two issues, we should stop relying on hotplug >> handler for cleaning the affected devices/drivers and let error recovery >> handler own this functionality. 
So this patch reverts Commit bdb5ac85777d >> ("PCI/ERR: Handle fatal error recovery") and re-introduces the "remove >> affected device + rescan" functionality in fatal error recovery handler. >> >> Also holding pci_lock_rescan_remove() will prevent the race between hotplug >> and DPC handler. >> >> Fixes: bdb5ac85777d ("PCI/ERR: Handle fatal error recovery") >> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> >> Reviewed-by: Sinan Kaya <okaya@kernel.org> >> --- >> Changes since v4: >> * Added new interfaces for error recovery (pcie_do_fatal_recovery() >> and pcie_do_nonfatal_recovery()). >> >> Changes since v5: >> * Fixed static/non-static declaration issue. >> >> Documentation/PCI/pci-error-recovery.rst | 47 ++++++++++------ >> Documentation/PCI/pcieaer-howto.rst | 2 +- >> drivers/pci/pci.h | 5 +- >> drivers/pci/pcie/aer.c | 10 ++-- >> drivers/pci/pcie/dpc.c | 2 +- >> drivers/pci/pcie/edr.c | 2 +- >> drivers/pci/pcie/err.c | 70 ++++++++++++++++++------ >> 7 files changed, 94 insertions(+), 44 deletions(-) >> >> diff --git a/Documentation/PCI/pci-error-recovery.rst b/Documentation/PCI/pci-error-recovery.rst >> index 84ceebb08cac..830c8af5838b 100644 >> --- a/Documentation/PCI/pci-error-recovery.rst >> +++ b/Documentation/PCI/pci-error-recovery.rst >> @@ -115,7 +115,7 @@ The actual steps taken by a platform to recover from a PCI error >> event will be platform-dependent, but will follow the general >> sequence described below. >> >> -STEP 0: Error Event >> +STEP 0: Error Event: ERR_NONFATAL >> ------------------- >> A PCI bus error is detected by the PCI hardware. On powerpc, the slot >> is isolated, in that all I/O is blocked: all reads return 0xffffffff, >> @@ -160,10 +160,10 @@ particular, if the platform doesn't isolate slots), and recovery >> proceeds to STEP 2 (MMIO Enable). >> >> If any driver requested a slot reset (by returning PCI_ERS_RESULT_NEED_RESET), >> -then recovery proceeds to STEP 4 (Slot Reset). 
>> +then recovery proceeds to STEP 3 (Slot Reset). >> >> If the platform is unable to recover the slot, the next step >> -is STEP 6 (Permanent Failure). >> +is STEP 5 (Permanent Failure). >> >> .. note:: >> >> @@ -198,7 +198,7 @@ reset or some such, but not restart operations. This callback is made if >> all drivers on a segment agree that they can try to recover and if no automatic >> link reset was performed by the HW. If the platform can't just re-enable IOs >> without a slot reset or a link reset, it will not call this callback, and >> -instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset) >> +instead will have gone directly to STEP 3 (Slot Reset) >> >> .. note:: >> >> @@ -233,18 +233,12 @@ The driver should return one of the following result codes: >> >> The next step taken depends on the results returned by the drivers. >> If all drivers returned PCI_ERS_RESULT_RECOVERED, then the platform >> -proceeds to either STEP3 (Link Reset) or to STEP 5 (Resume Operations). >> +proceeds to STEP 4 (Resume Operations). >> >> If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform >> -proceeds to STEP 4 (Slot Reset) >> +proceeds to STEP 3 (Slot Reset) >> >> -STEP 3: Link Reset >> ------------------- >> -The platform resets the link. This is a PCI-Express specific step >> -and is done whenever a fatal error has been detected that can be >> -"solved" by resetting the link. >> - >> -STEP 4: Slot Reset >> +STEP 3: Slot Reset >> ------------------ >> >> In response to a return value of PCI_ERS_RESULT_NEED_RESET, the >> @@ -322,7 +316,7 @@ PCI card types:: >> + pdev->needs_freset = 1; >> + >> >> -Platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent >> +Platform proceeds either to STEP 4 (Resume Operations) or STEP 5 (Permanent >> Failure). >> >> .. note:: >> @@ -332,7 +326,7 @@ Failure). >> However, it probably should. 
>> >> >> -STEP 5: Resume Operations >> +STEP 4: Resume Operations >> ------------------------- >> The platform will call the resume() callback on all affected device >> drivers if all drivers on the segment have returned >> @@ -344,7 +338,7 @@ a result code. >> At this point, if a new error happens, the platform will restart >> a new error recovery sequence. >> >> -STEP 6: Permanent Failure >> +STEP 5: Permanent Failure >> ------------------------- >> A "permanent failure" has occurred, and the platform cannot recover >> the device. The platform will call error_detected() with a >> @@ -367,6 +361,27 @@ errors. See the discussion in powerpc/eeh-pci-error-recovery.txt >> for additional detail on real-life experience of the causes of >> software errors. >> >> +STEP 0: Error Event: ERR_FATAL >> +-------------------- >> +PCI bus error is detected by the PCI hardware. On powerpc, the slot is >> +isolated, in that all I/O is blocked: all reads return 0xffffffff, all >> +writes are ignored. >> + >> +STEP 1: Remove devices >> +--------------------- >> +Platform removes the devices depending on the error agent, it could be >> +this port for all subordinates or upstream component (likely downstream >> +port) >> + >> +STEP 2: Reset link >> +--------------------- >> +The platform resets the link. This is a PCI-Express specific step and is >> +done whenever a fatal error has been detected that can be "solved" by >> +resetting the link. >> + >> +STEP 3: Re-enumerate the devices >> +--------------------- >> +Initiates the re-enumeration. >> >> Conclusion; General Remarks >> --------------------------- >> diff --git a/Documentation/PCI/pcieaer-howto.rst b/Documentation/PCI/pcieaer-howto.rst >> index 0b36b9ebfa4b..9528cfd9449b 100644 >> --- a/Documentation/PCI/pcieaer-howto.rst >> +++ b/Documentation/PCI/pcieaer-howto.rst >> @@ -206,7 +206,7 @@ error_detected(dev, pci_channel_io_frozen) to all drivers within >> a hierarchy in question. 
Then, performing link reset at upstream is >> necessary. As different kinds of devices might use different approaches >> to reset link, AER port service driver is required to provide the >> -function to reset link via callback parameter of pcie_do_recovery() >> +function to reset link via callback parameter of pcie_do_fatal_recovery() >> function. If reset_link is not NULL, recovery function will use it >> to reset the link. If error_detected returns PCI_ERS_RESULT_CAN_RECOVER >> and reset_link returns PCI_ERS_RESULT_RECOVERED, the error handling goes >> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h >> index fa12f7cbc1a0..b3e1571107b5 100644 >> --- a/drivers/pci/pci.h >> +++ b/drivers/pci/pci.h >> @@ -555,8 +555,9 @@ static inline int pci_dev_specific_disable_acs_redir(struct pci_dev *dev) >> #endif >> >> /* PCI error reporting and recovery */ >> -pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >> - pci_channel_state_t state, >> +pci_ers_result_t pcie_do_nonfatal_recovery(struct pci_dev *dev); >> + >> +pci_ers_result_t pcie_do_fatal_recovery(struct pci_dev *dev, >> pci_ers_result_t (*reset_link)(struct pci_dev *pdev)); >> >> bool pcie_wait_for_link(struct pci_dev *pdev, bool active); >> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c >> index 65dff5f3457a..f3e70bb9b30d 100644 >> --- a/drivers/pci/pcie/aer.c >> +++ b/drivers/pci/pcie/aer.c >> @@ -947,9 +947,9 @@ static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info) >> if (pcie_aer_is_native(dev)) >> pcie_clear_device_status(dev); >> } else if (info->severity == AER_NONFATAL) >> - pcie_do_recovery(dev, pci_channel_io_normal, aer_root_reset); >> + pcie_do_nonfatal_recovery(dev); >> else if (info->severity == AER_FATAL) >> - pcie_do_recovery(dev, pci_channel_io_frozen, aer_root_reset); >> + pcie_do_fatal_recovery(dev, aer_root_reset); >> pci_dev_put(dev); >> } >> >> @@ -985,11 +985,9 @@ static void aer_recover_work_func(struct work_struct *work) >> } >> 
cper_print_aer(pdev, entry.severity, entry.regs); >> if (entry.severity == AER_NONFATAL) >> - pcie_do_recovery(pdev, pci_channel_io_normal, >> - aer_root_reset); >> + pcie_do_nonfatal_recovery(pdev); >> else if (entry.severity == AER_FATAL) >> - pcie_do_recovery(pdev, pci_channel_io_frozen, >> - aer_root_reset); >> + pcie_do_fatal_recovery(pdev, aer_root_reset); >> pci_dev_put(pdev); >> } >> } >> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c >> index daa9a4153776..74e7d1da3cf0 100644 >> --- a/drivers/pci/pcie/dpc.c >> +++ b/drivers/pci/pcie/dpc.c >> @@ -233,7 +233,7 @@ static irqreturn_t dpc_handler(int irq, void *context) >> dpc_process_error(pdev); >> >> /* We configure DPC so it only triggers on ERR_FATAL */ >> - pcie_do_recovery(pdev, pci_channel_io_frozen, dpc_reset_link); >> + pcie_do_fatal_recovery(pdev, dpc_reset_link); >> >> return IRQ_HANDLED; >> } >> diff --git a/drivers/pci/pcie/edr.c b/drivers/pci/pcie/edr.c >> index a6b9b479b97a..87379bc566f6 100644 >> --- a/drivers/pci/pcie/edr.c >> +++ b/drivers/pci/pcie/edr.c >> @@ -183,7 +183,7 @@ static void edr_handle_event(acpi_handle handle, u32 event, void *data) >> * or ERR_NONFATAL, since the link is already down, use the FATAL >> * error recovery path for both cases. 
>> */ >> - estate = pcie_do_recovery(edev, pci_channel_io_frozen, dpc_reset_link); >> + estate = pcie_do_fatal_recovery(edev, dpc_reset_link); >> >> send_ost: >> >> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c >> index 315a4d559c4c..fa50366a9632 100644 >> --- a/drivers/pci/pcie/err.c >> +++ b/drivers/pci/pcie/err.c >> @@ -79,11 +79,6 @@ static int report_error_detected(struct pci_dev *dev, >> return 0; >> } >> >> -static int report_frozen_detected(struct pci_dev *dev, void *data) >> -{ >> - return report_error_detected(dev, pci_channel_io_frozen, data); >> -} >> - >> static int report_normal_detected(struct pci_dev *dev, void *data) >> { >> return report_error_detected(dev, pci_channel_io_normal, data); >> @@ -146,9 +141,59 @@ static int report_resume(struct pci_dev *dev, void *data) >> return 0; >> } >> >> -pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >> - pci_channel_state_t state, >> +pci_ers_result_t pcie_do_fatal_recovery(struct pci_dev *dev, >> pci_ers_result_t (*reset_link)(struct pci_dev *pdev)) >> +{ >> + struct pci_dev *udev; >> + struct pci_bus *parent; >> + struct pci_dev *pdev, *temp; >> + pci_ers_result_t result; >> + >> + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) >> + udev = dev; >> + else >> + udev = dev->bus->self; >> + >> + parent = udev->subordinate; >> + pci_walk_bus(parent, pci_dev_set_disconnected, NULL); >> + >> + pci_lock_rescan_remove(); >> + pci_dev_get(dev); >> + list_for_each_entry_safe_reverse(pdev, temp, &parent->devices, >> + bus_list) { >> + pci_stop_and_remove_bus_device(pdev); > > Again , pciehp driver wouldn't naturally ignore the DLLSC/PDC event, > or because you do > pci_stop_and_remove_bus_device(pdev) here, pciehp driver wouldn't do > pcie_disable_slot() > anymore, when DPC driver get its interrupt and run into here, pciehp > driver also gets DLLSC > /PDC and may run into SUPPRISE_REMOVAL first. 
so there would be duplicated > pci_stop_and_remove_bus_device(pdev);

If you check the implementation of pci_stop_and_remove_bus_device(), all it does is go over the list of sub-devices and the bus device, and then do pci_stop_dev() and pci_destroy_dev() operations. My point is, once the device is stopped or removed, another thread that executes pci_stop_and_remove_bus_device() will not find the same device present. Hence we are not repeating the core operation.

Of course there is still some duplication (like DLLSC/PDC event handling). But I am not trying to solve that problem here. I am just making the error recovery handler own this functionality in order to:

1. Fix the recovery problem of non-hotplug capable devices.
2. Remove the dependency on the hotplug handler for handling remove and rescan functionality.
3. Make pcie_do_recovery() less complex.

> > Don't say it would naturally be okay. check it. > >> + } >> + >> + result = reset_link(udev); >> + >> + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { >> + /* >> + * If the error is reported by a bridge, we think this error >> + * is related to the downstream link of the bridge, so we >> + * do error recovery on all subordinates of the bridge instead >> + * of the bridge and clear the error status of the bridge. >> + */ >> + pci_aer_clear_fatal_status(dev); >> + if (pcie_aer_is_native(dev)) >> + pcie_clear_device_status(dev); >> + } >> + >> + if (result == PCI_ERS_RESULT_RECOVERED) { >> + if (pcie_wait_for_link(udev, true)) >> + pci_rescan_bus(udev->bus); > So does to this pci_rescan_bus(udev->bus); > You said you won't rely on pciehp handler, but how do you eliminate the > handler's duplicated actions. 
> > Thanks, > Ethan >> + pci_info(dev, "Device recovery from fatal error successful\n"); >> + } else { >> + pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT); >> + pci_info(dev, "Device recovery from fatal error failed\n"); >> + } >> + >> + pci_dev_put(dev); >> + pci_unlock_rescan_remove(); >> + >> + return result; >> +} >> + >> +pci_ers_result_t pcie_do_nonfatal_recovery(struct pci_dev *dev) >> { >> pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; >> struct pci_bus *bus; >> @@ -164,16 +209,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >> bus = dev->subordinate; >> >> pci_dbg(dev, "broadcast error_detected message\n"); >> - if (state == pci_channel_io_frozen) { >> - pci_walk_bus(bus, report_frozen_detected, &status); >> - status = reset_link(dev); >> - if (status != PCI_ERS_RESULT_RECOVERED) { >> - pci_warn(dev, "link reset failed\n"); >> - goto failed; >> - } >> - } else { >> - pci_walk_bus(bus, report_normal_detected, &status); >> - } >> + pci_walk_bus(bus, report_normal_detected, &status); >> >> if (status == PCI_ERS_RESULT_CAN_RECOVER) { >> status = PCI_ERS_RESULT_RECOVERED; >> -- >> 2.17.1 >>
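The claim in the reply above, that a second caller of pci_stop_and_remove_bus_device() finds nothing left to remove once removal is serialized by a lock, can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: struct fake_dev, struct fake_bus, fake_bus_populate(), fake_remove_all() and the pthread mutex standing in for pci_lock_rescan_remove() are all hypothetical names invented for this illustration. It also does not model pciehp's own DLLSC/PDC event handling or rescans, which is the part still disputed in this thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a device on a bus; not a kernel structure. */
struct fake_dev {
	struct fake_dev *next;
};

/* Hypothetical bus: a singly linked list of devices. */
struct fake_bus {
	struct fake_dev *devices;
};

/* Assumption: a plain mutex models the global rescan/remove lock. */
static pthread_mutex_t fake_rescan_remove_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Remove every device currently on the bus while holding the lock.
 * Returns how many devices this particular caller removed.
 */
static int fake_remove_all(struct fake_bus *bus)
{
	int count = 0;

	pthread_mutex_lock(&fake_rescan_remove_lock);
	while (bus->devices) {
		struct fake_dev *dev = bus->devices;

		bus->devices = dev->next;	/* unlink before freeing */
		free(dev);
		count++;
	}
	pthread_mutex_unlock(&fake_rescan_remove_lock);
	return count;
}

/* Add n devices to the bus; returns 0 on success, -1 on allocation failure. */
static int fake_bus_populate(struct fake_bus *bus, int n)
{
	for (int i = 0; i < n; i++) {
		struct fake_dev *dev = malloc(sizeof(*dev));

		if (!dev) {
			fake_remove_all(bus);	/* undo partial population */
			return -1;
		}
		dev->next = bus->devices;
		bus->devices = dev;
	}
	return 0;
}

/*
 * Model two handlers (say, error recovery and hotplug) that both try to
 * remove the same devices, one after the other. Returns what the second
 * caller removed, or -1 if the bus could not be set up.
 */
static int fake_second_removal(int ndevs)
{
	struct fake_bus bus = { .devices = NULL };
	int first;

	if (fake_bus_populate(&bus, ndevs))
		return -1;
	first = fake_remove_all(&bus);
	assert(first == ndevs);		/* first caller does all the work */
	return fake_remove_all(&bus);	/* second caller sees an empty list */
}
```

In this model the second caller always removes zero devices, because the list is already empty by the time it takes the lock. Whether the real pciehp path behaves the same way depends on details this sketch leaves out.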
diff --git a/Documentation/PCI/pci-error-recovery.rst b/Documentation/PCI/pci-error-recovery.rst index 84ceebb08cac..830c8af5838b 100644 --- a/Documentation/PCI/pci-error-recovery.rst +++ b/Documentation/PCI/pci-error-recovery.rst @@ -115,7 +115,7 @@ The actual steps taken by a platform to recover from a PCI error event will be platform-dependent, but will follow the general sequence described below. -STEP 0: Error Event +STEP 0: Error Event: ERR_NONFATAL ------------------- A PCI bus error is detected by the PCI hardware. On powerpc, the slot is isolated, in that all I/O is blocked: all reads return 0xffffffff, @@ -160,10 +160,10 @@ particular, if the platform doesn't isolate slots), and recovery proceeds to STEP 2 (MMIO Enable). If any driver requested a slot reset (by returning PCI_ERS_RESULT_NEED_RESET), -then recovery proceeds to STEP 4 (Slot Reset). +then recovery proceeds to STEP 3 (Slot Reset). If the platform is unable to recover the slot, the next step -is STEP 6 (Permanent Failure). +is STEP 5 (Permanent Failure). .. note:: @@ -198,7 +198,7 @@ reset or some such, but not restart operations. This callback is made if all drivers on a segment agree that they can try to recover and if no automatic link reset was performed by the HW. If the platform can't just re-enable IOs without a slot reset or a link reset, it will not call this callback, and -instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset) +instead will have gone directly to STEP 3 (Slot Reset) .. note:: @@ -233,18 +233,12 @@ The driver should return one of the following result codes: The next step taken depends on the results returned by the drivers. If all drivers returned PCI_ERS_RESULT_RECOVERED, then the platform -proceeds to either STEP3 (Link Reset) or to STEP 5 (Resume Operations). +proceeds to STEP 4 (Resume Operations). 
If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform -proceeds to STEP 4 (Slot Reset) +proceeds to STEP 3 (Slot Reset) -STEP 3: Link Reset ------------------- -The platform resets the link. This is a PCI-Express specific step -and is done whenever a fatal error has been detected that can be -"solved" by resetting the link. - -STEP 4: Slot Reset +STEP 3: Slot Reset ------------------ In response to a return value of PCI_ERS_RESULT_NEED_RESET, the @@ -322,7 +316,7 @@ PCI card types:: + pdev->needs_freset = 1; + -Platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent +Platform proceeds either to STEP 4 (Resume Operations) or STEP 5 (Permanent Failure). .. note:: @@ -332,7 +326,7 @@ Failure). However, it probably should. -STEP 5: Resume Operations +STEP 4: Resume Operations ------------------------- The platform will call the resume() callback on all affected device drivers if all drivers on the segment have returned @@ -344,7 +338,7 @@ a result code. At this point, if a new error happens, the platform will restart a new error recovery sequence. -STEP 6: Permanent Failure +STEP 5: Permanent Failure ------------------------- A "permanent failure" has occurred, and the platform cannot recover the device. The platform will call error_detected() with a @@ -367,6 +361,27 @@ errors. See the discussion in powerpc/eeh-pci-error-recovery.txt for additional detail on real-life experience of the causes of software errors. +STEP 0: Error Event: ERR_FATAL +-------------------- +PCI bus error is detected by the PCI hardware. On powerpc, the slot is +isolated, in that all I/O is blocked: all reads return 0xffffffff, all +writes are ignored. + +STEP 1: Remove devices +--------------------- +Platform removes the devices depending on the error agent, it could be +this port for all subordinates or upstream component (likely downstream +port) + +STEP 2: Reset link +--------------------- +The platform resets the link. 
This is a PCI-Express specific step and is +done whenever a fatal error has been detected that can be "solved" by +resetting the link. + +STEP 3: Re-enumerate the devices +--------------------- +Initiates the re-enumeration. Conclusion; General Remarks --------------------------- diff --git a/Documentation/PCI/pcieaer-howto.rst b/Documentation/PCI/pcieaer-howto.rst index 0b36b9ebfa4b..9528cfd9449b 100644 --- a/Documentation/PCI/pcieaer-howto.rst +++ b/Documentation/PCI/pcieaer-howto.rst @@ -206,7 +206,7 @@ error_detected(dev, pci_channel_io_frozen) to all drivers within a hierarchy in question. Then, performing link reset at upstream is necessary. As different kinds of devices might use different approaches to reset link, AER port service driver is required to provide the -function to reset link via callback parameter of pcie_do_recovery() +function to reset link via callback parameter of pcie_do_fatal_recovery() function. If reset_link is not NULL, recovery function will use it to reset the link. 
If error_detected returns PCI_ERS_RESULT_CAN_RECOVER and reset_link returns PCI_ERS_RESULT_RECOVERED, the error handling goes diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index fa12f7cbc1a0..b3e1571107b5 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -555,8 +555,9 @@ static inline int pci_dev_specific_disable_acs_redir(struct pci_dev *dev) #endif /* PCI error reporting and recovery */ -pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, - pci_channel_state_t state, +pci_ers_result_t pcie_do_nonfatal_recovery(struct pci_dev *dev); + +pci_ers_result_t pcie_do_fatal_recovery(struct pci_dev *dev, pci_ers_result_t (*reset_link)(struct pci_dev *pdev)); bool pcie_wait_for_link(struct pci_dev *pdev, bool active); diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 65dff5f3457a..f3e70bb9b30d 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -947,9 +947,9 @@ static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info) if (pcie_aer_is_native(dev)) pcie_clear_device_status(dev); } else if (info->severity == AER_NONFATAL) - pcie_do_recovery(dev, pci_channel_io_normal, aer_root_reset); + pcie_do_nonfatal_recovery(dev); else if (info->severity == AER_FATAL) - pcie_do_recovery(dev, pci_channel_io_frozen, aer_root_reset); + pcie_do_fatal_recovery(dev, aer_root_reset); pci_dev_put(dev); } @@ -985,11 +985,9 @@ static void aer_recover_work_func(struct work_struct *work) } cper_print_aer(pdev, entry.severity, entry.regs); if (entry.severity == AER_NONFATAL) - pcie_do_recovery(pdev, pci_channel_io_normal, - aer_root_reset); + pcie_do_nonfatal_recovery(pdev); else if (entry.severity == AER_FATAL) - pcie_do_recovery(pdev, pci_channel_io_frozen, - aer_root_reset); + pcie_do_fatal_recovery(pdev, aer_root_reset); pci_dev_put(pdev); } } diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c index daa9a4153776..74e7d1da3cf0 100644 --- a/drivers/pci/pcie/dpc.c +++ b/drivers/pci/pcie/dpc.c @@ -233,7 +233,7 @@ static 
irqreturn_t dpc_handler(int irq, void *context) dpc_process_error(pdev); /* We configure DPC so it only triggers on ERR_FATAL */ - pcie_do_recovery(pdev, pci_channel_io_frozen, dpc_reset_link); + pcie_do_fatal_recovery(pdev, dpc_reset_link); return IRQ_HANDLED; } diff --git a/drivers/pci/pcie/edr.c b/drivers/pci/pcie/edr.c index a6b9b479b97a..87379bc566f6 100644 --- a/drivers/pci/pcie/edr.c +++ b/drivers/pci/pcie/edr.c @@ -183,7 +183,7 @@ static void edr_handle_event(acpi_handle handle, u32 event, void *data) * or ERR_NONFATAL, since the link is already down, use the FATAL * error recovery path for both cases. */ - estate = pcie_do_recovery(edev, pci_channel_io_frozen, dpc_reset_link); + estate = pcie_do_fatal_recovery(edev, dpc_reset_link); send_ost: diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index 315a4d559c4c..fa50366a9632 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -79,11 +79,6 @@ static int report_error_detected(struct pci_dev *dev, return 0; } -static int report_frozen_detected(struct pci_dev *dev, void *data) -{ - return report_error_detected(dev, pci_channel_io_frozen, data); -} - static int report_normal_detected(struct pci_dev *dev, void *data) { return report_error_detected(dev, pci_channel_io_normal, data); @@ -146,9 +141,59 @@ static int report_resume(struct pci_dev *dev, void *data) return 0; } -pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, - pci_channel_state_t state, +pci_ers_result_t pcie_do_fatal_recovery(struct pci_dev *dev, pci_ers_result_t (*reset_link)(struct pci_dev *pdev)) +{ + struct pci_dev *udev; + struct pci_bus *parent; + struct pci_dev *pdev, *temp; + pci_ers_result_t result; + + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) + udev = dev; + else + udev = dev->bus->self; + + parent = udev->subordinate; + pci_walk_bus(parent, pci_dev_set_disconnected, NULL); + + pci_lock_rescan_remove(); + pci_dev_get(dev); + list_for_each_entry_safe_reverse(pdev, temp, &parent->devices, + bus_list) { + 
pci_stop_and_remove_bus_device(pdev); + } + + result = reset_link(udev); + + if (dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) { + /* + * If the error is reported by a bridge, we think this error + * is related to the downstream link of the bridge, so we + * do error recovery on all subordinates of the bridge instead + * of the bridge and clear the error status of the bridge. + */ + pci_aer_clear_fatal_status(dev); + if (pcie_aer_is_native(dev)) + pcie_clear_device_status(dev); + } + + if (result == PCI_ERS_RESULT_RECOVERED) { + if (pcie_wait_for_link(udev, true)) + pci_rescan_bus(udev->bus); + pci_info(dev, "Device recovery from fatal error successful\n"); + } else { + pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT); + pci_info(dev, "Device recovery from fatal error failed\n"); + } + + pci_dev_put(dev); + pci_unlock_rescan_remove(); + + return result; +} + +pci_ers_result_t pcie_do_nonfatal_recovery(struct pci_dev *dev) { pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; struct pci_bus *bus; @@ -164,16 +209,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, bus = dev->subordinate; pci_dbg(dev, "broadcast error_detected message\n"); - if (state == pci_channel_io_frozen) { - pci_walk_bus(bus, report_frozen_detected, &status); - status = reset_link(dev); - if (status != PCI_ERS_RESULT_RECOVERED) { - pci_warn(dev, "link reset failed\n"); - goto failed; - } - } else { - pci_walk_bus(bus, report_normal_detected, &status); - } + pci_walk_bus(bus, report_normal_detected, &status); if (status == PCI_ERS_RESULT_CAN_RECOVER) { status = PCI_ERS_RESULT_RECOVERED;