Message ID | 20240617200411.1426554-2-terry.bowman@amd.com |
---|---|
State | New, archived |
Headers | show |
Series | Add RAS support for CXL root ports, CXL downstream switch ports, and CXL upstream switch ports | expand |
On Mon, 17 Jun 2024 15:04:03 -0500 Terry Bowman <terry.bowman@amd.com> wrote: > The AER service driver does not currently call a handler for AER > uncorrectable errors (UCE) detected in root ports or downstream > ports. This is not needed in most cases because common PCIe port > functionality is handled by portdrv service drivers. > > CXL root ports include CXL specific RAS registers that need logging > before starting do_recovery() in the UCE case. > > Update the AER service driver to call the UCE handler for root ports > and downstream ports. These PCIe port devices are bound to the portdrv > driver that includes a CE and UCE handler to be called. > > Signed-off-by: Terry Bowman <terry.bowman@amd.com> > Cc: Bjorn Helgaas <bhelgaas@google.com> > Cc: linux-pci@vger.kernel.org > --- > drivers/pci/pcie/err.c | 20 ++++++++++++++++++++ > 1 file changed, 20 insertions(+) > > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c > index 705893b5f7b0..a4db474b2be5 100644 > --- a/drivers/pci/pcie/err.c > +++ b/drivers/pci/pcie/err.c > @@ -203,6 +203,26 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; > struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); > > + /* > + * PCIe ports may include functionality beyond the standard > + * extended port capabilities. This may present a need to log and > + * handle errors not addressed in this driver. Examples are CXL > + * root ports and CXL downstream switch ports using AER UIE to > + * indicate CXL UCE RAS protocol errors. > + */ > + if (type == PCI_EXP_TYPE_ROOT_PORT || > + type == PCI_EXP_TYPE_DOWNSTREAM) { > + struct pci_driver *pdrv = dev->driver; > + > + if (pdrv && pdrv->err_handler && > + pdrv->err_handler->error_detected) { > + const struct pci_error_handlers *err_handler; > + > + err_handler = pdrv->err_handler; > + status = err_handler->error_detected(dev, state); This status is going to get overridden by one of the pci_walk_bridge() calls. Should it be kept around and acted on, or dropped silently? (I'd guess no for silent!). > + } > + } > + > /* > * If the error was detected by a Root Port, Downstream Port, RCEC, > * or RCiEP, recovery runs on the device itself. For Ports, that
Terry Bowman wrote: > The AER service driver does not currently call a handler for AER > uncorrectable errors (UCE) detected in root ports or downstream > ports. This is not needed in most cases because common PCIe port > functionality is handled by portdrv service drivers. > > CXL root ports include CXL specific RAS registers that need logging > before starting do_recovery() in the UCE case. > > Update the AER service driver to call the UCE handler for root ports > and downstream ports. These PCIe port devices are bound to the portdrv > driver that includes a CE and UCE handler to be called. > > Signed-off-by: Terry Bowman <terry.bowman@amd.com> > Cc: Bjorn Helgaas <bhelgaas@google.com> > Cc: linux-pci@vger.kernel.org > --- > drivers/pci/pcie/err.c | 20 ++++++++++++++++++++ > 1 file changed, 20 insertions(+) > > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c > index 705893b5f7b0..a4db474b2be5 100644 > --- a/drivers/pci/pcie/err.c > +++ b/drivers/pci/pcie/err.c > @@ -203,6 +203,26 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; > struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); > > + /* > + * PCIe ports may include functionality beyond the standard > + * extended port capabilities. This may present a need to log and > + * handle errors not addressed in this driver. Examples are CXL > + * root ports and CXL downstream switch ports using AER UIE to > + * indicate CXL UCE RAS protocol errors. > + */ > + if (type == PCI_EXP_TYPE_ROOT_PORT || > + type == PCI_EXP_TYPE_DOWNSTREAM) { > + struct pci_driver *pdrv = dev->driver; > + > + if (pdrv && pdrv->err_handler && > + pdrv->err_handler->error_detected) { > + const struct pci_error_handlers *err_handler; > + > + err_handler = pdrv->err_handler; > + status = err_handler->error_detected(dev, state); > + } > + } > + Would not a more appropriate place for this be pci_walk_bridge() where the ->subordinate == NULL and these type-check cases are unified?
Hi Jonathan, I added a response below. On 6/20/24 06:21, Jonathan Cameron wrote: > On Mon, 17 Jun 2024 15:04:03 -0500 > Terry Bowman <terry.bowman@amd.com> wrote: > >> The AER service driver does not currently call a handler for AER >> uncorrectable errors (UCE) detected in root ports or downstream >> ports. This is not needed in most cases because common PCIe port >> functionality is handled by portdrv service drivers. >> >> CXL root ports include CXL specific RAS registers that need logging >> before starting do_recovery() in the UCE case. >> >> Update the AER service driver to call the UCE handler for root ports >> and downstream ports. These PCIe port devices are bound to the portdrv >> driver that includes a CE and UCE handler to be called. >> >> Signed-off-by: Terry Bowman <terry.bowman@amd.com> >> Cc: Bjorn Helgaas <bhelgaas@google.com> >> Cc: linux-pci@vger.kernel.org >> --- >> drivers/pci/pcie/err.c | 20 ++++++++++++++++++++ >> 1 file changed, 20 insertions(+) >> >> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c >> index 705893b5f7b0..a4db474b2be5 100644 >> --- a/drivers/pci/pcie/err.c >> +++ b/drivers/pci/pcie/err.c >> @@ -203,6 +203,26 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >> pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; >> struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); >> >> + /* >> + * PCIe ports may include functionality beyond the standard >> + * extended port capabilities. This may present a need to log and >> + * handle errors not addressed in this driver. Examples are CXL >> + * root ports and CXL downstream switch ports using AER UIE to >> + * indicate CXL UCE RAS protocol errors. >> + */ >> + if (type == PCI_EXP_TYPE_ROOT_PORT || >> + type == PCI_EXP_TYPE_DOWNSTREAM) { >> + struct pci_driver *pdrv = dev->driver; >> + >> + if (pdrv && pdrv->err_handler && >> + pdrv->err_handler->error_detected) { >> + const struct pci_error_handlers *err_handler; >> + >> + err_handler = pdrv->err_handler; >> + status = err_handler->error_detected(dev, state); > > This status is going to get overridden by one of the pci_walk_bridge() > calls. Should it be kept around and acted on, or dropped silently? > (I'd guess no for silent!). > It should be used later. According to PCI spec "The only method of recovering from an Uncorrectable Internal Error is reset or hardware replacement." I need to make certain that carries through below. Regards, Terry >> + } >> + } >> + >> /* >> * If the error was detected by a Root Port, Downstream Port, RCEC, >> * or RCiEP, recovery runs on the device itself. For Ports, that >
Hi Dan, I added a response below. On 6/21/24 14:17, Dan Williams wrote: > Terry Bowman wrote: >> The AER service driver does not currently call a handler for AER >> uncorrectable errors (UCE) detected in root ports or downstream >> ports. This is not needed in most cases because common PCIe port >> functionality is handled by portdrv service drivers. >> >> CXL root ports include CXL specific RAS registers that need logging >> before starting do_recovery() in the UCE case. >> >> Update the AER service driver to call the UCE handler for root ports >> and downstream ports. These PCIe port devices are bound to the portdrv >> driver that includes a CE and UCE handler to be called. >> >> Signed-off-by: Terry Bowman <terry.bowman@amd.com> >> Cc: Bjorn Helgaas <bhelgaas@google.com> >> Cc: linux-pci@vger.kernel.org >> --- >> drivers/pci/pcie/err.c | 20 ++++++++++++++++++++ >> 1 file changed, 20 insertions(+) >> >> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c >> index 705893b5f7b0..a4db474b2be5 100644 >> --- a/drivers/pci/pcie/err.c >> +++ b/drivers/pci/pcie/err.c >> @@ -203,6 +203,26 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >> pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; >> struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); >> >> + /* >> + * PCIe ports may include functionality beyond the standard >> + * extended port capabilities. This may present a need to log and >> + * handle errors not addressed in this driver. Examples are CXL >> + * root ports and CXL downstream switch ports using AER UIE to >> + * indicate CXL UCE RAS protocol errors. >> + */ >> + if (type == PCI_EXP_TYPE_ROOT_PORT || >> + type == PCI_EXP_TYPE_DOWNSTREAM) { >> + struct pci_driver *pdrv = dev->driver; >> + >> + if (pdrv && pdrv->err_handler && >> + pdrv->err_handler->error_detected) { >> + const struct pci_error_handlers *err_handler; >> + >> + err_handler = pdrv->err_handler; >> + status = err_handler->error_detected(dev, state); >> + } >> + } >> + > > Would not a more appropriate place for this be pci_walk_bridge() where > the ->subordinate == NULL and these type-check cases are unified? It does. I can take a look at moving that. Regards, Terry
On Mon, Jun 24, 2024 at 12:56:29PM -0500, Terry Bowman wrote: > Hi Dan, > > I added a response below. > > On 6/21/24 14:17, Dan Williams wrote: > > Terry Bowman wrote: > >> The AER service driver does not currently call a handler for AER > >> uncorrectable errors (UCE) detected in root ports or downstream > >> ports. This is not needed in most cases because common PCIe port > >> functionality is handled by portdrv service drivers. > >> > >> CXL root ports include CXL specific RAS registers that need logging > >> before starting do_recovery() in the UCE case. > >> > >> Update the AER service driver to call the UCE handler for root ports > >> and downstream ports. These PCIe port devices are bound to the portdrv > >> driver that includes a CE and UCE handler to be called. > >> > >> Signed-off-by: Terry Bowman <terry.bowman@amd.com> > >> Cc: Bjorn Helgaas <bhelgaas@google.com> > >> Cc: linux-pci@vger.kernel.org > >> --- > >> drivers/pci/pcie/err.c | 20 ++++++++++++++++++++ > >> 1 file changed, 20 insertions(+) > >> > >> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c > >> index 705893b5f7b0..a4db474b2be5 100644 > >> --- a/drivers/pci/pcie/err.c > >> +++ b/drivers/pci/pcie/err.c > >> @@ -203,6 +203,26 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > >> pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; > >> struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); > >> > >> + /* > >> + * PCIe ports may include functionality beyond the standard > >> + * extended port capabilities. This may present a need to log and > >> + * handle errors not addressed in this driver. Examples are CXL > >> + * root ports and CXL downstream switch ports using AER UIE to > >> + * indicate CXL UCE RAS protocol errors. > >> + */ > >> + if (type == PCI_EXP_TYPE_ROOT_PORT || > >> + type == PCI_EXP_TYPE_DOWNSTREAM) { > >> + struct pci_driver *pdrv = dev->driver; > >> + > >> + if (pdrv && pdrv->err_handler && > >> + pdrv->err_handler->error_detected) { > >> + const struct pci_error_handlers *err_handler; > >> + > >> + err_handler = pdrv->err_handler; > >> + status = err_handler->error_detected(dev, state); > >> + } > >> + } > >> + > > > > Would not a more appropriate place for this be pci_walk_bridge() where > > the ->subordinate == NULL and these type-check cases are unified? > > It does. I can take a look at moving that. Has that already been handled in pci_walk_bridge? The function pci_walk_bridge() will call report_error_detected, where the err handler will be called. https://elixir.bootlin.com/linux/v6.10-rc6/source/drivers/pci/pcie/err.c#L80 Fan > > Regards, > Terry
Hi Fan, On 7/10/24 15:48, nifan.cxl@gmail.com wrote: > On Mon, Jun 24, 2024 at 12:56:29PM -0500, Terry Bowman wrote: >> Hi Dan, >> >> I added a response below. >> >> On 6/21/24 14:17, Dan Williams wrote: >>> Terry Bowman wrote: >>>> The AER service driver does not currently call a handler for AER >>>> uncorrectable errors (UCE) detected in root ports or downstream >>>> ports. This is not needed in most cases because common PCIe port >>>> functionality is handled by portdrv service drivers. >>>> >>>> CXL root ports include CXL specific RAS registers that need logging >>>> before starting do_recovery() in the UCE case. >>>> >>>> Update the AER service driver to call the UCE handler for root ports >>>> and downstream ports. These PCIe port devices are bound to the portdrv >>>> driver that includes a CE and UCE handler to be called. >>>> >>>> Signed-off-by: Terry Bowman <terry.bowman@amd.com> >>>> Cc: Bjorn Helgaas <bhelgaas@google.com> >>>> Cc: linux-pci@vger.kernel.org >>>> --- >>>> drivers/pci/pcie/err.c | 20 ++++++++++++++++++++ >>>> 1 file changed, 20 insertions(+) >>>> >>>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c >>>> index 705893b5f7b0..a4db474b2be5 100644 >>>> --- a/drivers/pci/pcie/err.c >>>> +++ b/drivers/pci/pcie/err.c >>>> @@ -203,6 +203,26 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >>>> pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; >>>> struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); >>>> >>>> + /* >>>> + * PCIe ports may include functionality beyond the standard >>>> + * extended port capabilities. This may present a need to log and >>>> + * handle errors not addressed in this driver. Examples are CXL >>>> + * root ports and CXL downstream switch ports using AER UIE to >>>> + * indicate CXL UCE RAS protocol errors. >>>> + */ >>>> + if (type == PCI_EXP_TYPE_ROOT_PORT || >>>> + type == PCI_EXP_TYPE_DOWNSTREAM) { >>>> + struct pci_driver *pdrv = dev->driver; >>>> + >>>> + if (pdrv && pdrv->err_handler && >>>> + pdrv->err_handler->error_detected) { >>>> + const struct pci_error_handlers *err_handler; >>>> + >>>> + err_handler = pdrv->err_handler; >>>> + status = err_handler->error_detected(dev, state); >>>> + } >>>> + } >>>> + >>> >>> Would not a more appropriate place for this be pci_walk_bridge() where >>> the ->subordinate == NULL and these type-check cases are unified? >> >> It does. I can take a look at moving that. > > Has that already been handled in pci_walk_bridge? > > The function pci_walk_bridge() will call report_error_detected, where > the err handler will be called. > https://elixir.bootlin.com/linux/v6.10-rc6/source/drivers/pci/pcie/err.c#L80 > > Fan > You would think so but the UCE handler was not called in my testing for the PCIe ports (RP,USP,DSP). The pci_walk_bridge() function has 2 cases: - If there is a subordinate/secondary bus then the callback is called for those downstream devices but not the port itself. - If there is no subordinate/secondary bus then the callback is invoked for the port itself. The function header comment may explain it better: /** * pci_walk_bridge - walk bridges potentially AER affected * @bridge: bridge which may be a Port, an RCEC, or an RCiEP * @cb: callback to be called for each device found * @userdata: arbitrary pointer to be passed to callback * * If the device provided is a bridge, walk the subordinate bus, including * any bridged devices on buses under this bus. Call the provided callback * on each device found. * * If the device provided has no subordinate bus, e.g., an RCEC or RCiEP, * call the callback on the device itself. */ Regards, Terry >> >> Regards, >> Terry
On Wed, Jul 10, 2024 at 04:48:09PM -0500, Terry Bowman wrote: > Hi Fan, > > On 7/10/24 15:48, nifan.cxl@gmail.com wrote: > > On Mon, Jun 24, 2024 at 12:56:29PM -0500, Terry Bowman wrote: > >> Hi Dan, > >> > >> I added a response below. > >> > >> On 6/21/24 14:17, Dan Williams wrote: > >>> Terry Bowman wrote: > >>>> The AER service driver does not currently call a handler for AER > >>>> uncorrectable errors (UCE) detected in root ports or downstream > >>>> ports. This is not needed in most cases because common PCIe port > >>>> functionality is handled by portdrv service drivers. > >>>> > >>>> CXL root ports include CXL specific RAS registers that need logging > >>>> before starting do_recovery() in the UCE case. > >>>> > >>>> Update the AER service driver to call the UCE handler for root ports > >>>> and downstream ports. These PCIe port devices are bound to the portdrv > >>>> driver that includes a CE and UCE handler to be called. > >>>> > >>>> Signed-off-by: Terry Bowman <terry.bowman@amd.com> > >>>> Cc: Bjorn Helgaas <bhelgaas@google.com> > >>>> Cc: linux-pci@vger.kernel.org > >>>> --- > >>>> drivers/pci/pcie/err.c | 20 ++++++++++++++++++++ > >>>> 1 file changed, 20 insertions(+) > >>>> > >>>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c > >>>> index 705893b5f7b0..a4db474b2be5 100644 > >>>> --- a/drivers/pci/pcie/err.c > >>>> +++ b/drivers/pci/pcie/err.c > >>>> @@ -203,6 +203,26 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > >>>> pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; > >>>> struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); > >>>> > >>>> + /* > >>>> + * PCIe ports may include functionality beyond the standard > >>>> + * extended port capabilities. This may present a need to log and > >>>> + * handle errors not addressed in this driver. Examples are CXL > >>>> + * root ports and CXL downstream switch ports using AER UIE to > >>>> + * indicate CXL UCE RAS protocol errors. > >>>> + */ > >>>> + if (type == PCI_EXP_TYPE_ROOT_PORT || > >>>> + type == PCI_EXP_TYPE_DOWNSTREAM) { > >>>> + struct pci_driver *pdrv = dev->driver; > >>>> + > >>>> + if (pdrv && pdrv->err_handler && > >>>> + pdrv->err_handler->error_detected) { > >>>> + const struct pci_error_handlers *err_handler; > >>>> + > >>>> + err_handler = pdrv->err_handler; > >>>> + status = err_handler->error_detected(dev, state); > >>>> + } > >>>> + } > >>>> + > >>> > >>> Would not a more appropriate place for this be pci_walk_bridge() where > >>> the ->subordinate == NULL and these type-check cases are unified? > >> > >> It does. I can take a look at moving that. > > > > Has that already been handled in pci_walk_bridge? > > > > The function pci_walk_bridge() will call report_error_detected, where > > the err handler will be called. > > https://elixir.bootlin.com/linux/v6.10-rc6/source/drivers/pci/pcie/err.c#L80 > > > > Fan > > > > You would think so but the UCE handler was not called in my testing for the PCIe > ports (RP,USP,DSP). The pci_walk_bridge() function has 2 cases: > - If there is a subordinate/secondary bus then the callback is called for > those downstream devices but not the port itself. > - If there is no subordinate/secondary bus then the callback is invoked for the > port itself. > > The function header comment may explain it better: > /** > * pci_walk_bridge - walk bridges potentially AER affected > * @bridge: bridge which may be a Port, an RCEC, or an RCiEP > * @cb: callback to be called for each device found > * @userdata: arbitrary pointer to be passed to callback > * > * If the device provided is a bridge, walk the subordinate bus, including > * any bridged devices on buses under this bus. Call the provided callback > * on each device found. > * > * If the device provided has no subordinate bus, e.g., an RCEC or RCiEP, > * call the callback on the device itself. > */ > > Regards, > Terry OK, interesting. Btw, what is the "state" passed to pcie_do_recovery(...state...)? Fan > > >> > >> Regards, > >> Terry
On Mon, Jun 24, 2024 at 12:56:29PM -0500, Terry Bowman wrote: > Hi Dan, > > I added a response below. > > On 6/21/24 14:17, Dan Williams wrote: > > Terry Bowman wrote: > >> The AER service driver does not currently call a handler for AER > >> uncorrectable errors (UCE) detected in root ports or downstream > >> ports. This is not needed in most cases because common PCIe port > >> functionality is handled by portdrv service drivers. > >> > >> CXL root ports include CXL specific RAS registers that need logging > >> before starting do_recovery() in the UCE case. > >> > >> Update the AER service driver to call the UCE handler for root ports > >> and downstream ports. These PCIe port devices are bound to the portdrv > >> driver that includes a CE and UCE handler to be called. > >> > >> Signed-off-by: Terry Bowman <terry.bowman@amd.com> > >> Cc: Bjorn Helgaas <bhelgaas@google.com> > >> Cc: linux-pci@vger.kernel.org > >> --- > >> drivers/pci/pcie/err.c | 20 ++++++++++++++++++++ > >> 1 file changed, 20 insertions(+) > >> > >> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c > >> index 705893b5f7b0..a4db474b2be5 100644 > >> --- a/drivers/pci/pcie/err.c > >> +++ b/drivers/pci/pcie/err.c > >> @@ -203,6 +203,26 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > >> pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; > >> struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); > >> > >> + /* > >> + * PCIe ports may include functionality beyond the standard > >> + * extended port capabilities. This may present a need to log and > >> + * handle errors not addressed in this driver. Examples are CXL > >> + * root ports and CXL downstream switch ports using AER UIE to > >> + * indicate CXL UCE RAS protocol errors. > >> + */ > >> + if (type == PCI_EXP_TYPE_ROOT_PORT || > >> + type == PCI_EXP_TYPE_DOWNSTREAM) { > >> + struct pci_driver *pdrv = dev->driver; > >> + > >> + if (pdrv && pdrv->err_handler && > >> + pdrv->err_handler->error_detected) { > >> + const struct pci_error_handlers *err_handler; > >> + > >> + err_handler = pdrv->err_handler; > >> + status = err_handler->error_detected(dev, state); > >> + } > >> + } > >> + > > > > Would not a more appropriate place for this be pci_walk_bridge() where > > the ->subordinate == NULL and these type-check cases are unified? > > It does. I can take a look at moving that. > Based on current code logic, the code added here will be executed as long as the type matches (downstream port or root port), and I also noticed the case ->subordinate == NULL never gets touched when I try to inject an error through the aer_inject module and the user space tool. If my way to do error injection is right, it means the behaviour will get changed after the code move. Here is some of my experimental setup: QEMU + cxl topology (one type3 memdev directly attached to a HB with a single root port). 1. Load the cxl related drivers before error injection 2. Do aer inject with aer_inject inside the QEMU VM # aer_inject ~/nonfatal aer inject input file looks like below ----------------------------------------------------- fan:~/cxl/linux-fixes$ cat ~/nonfatal # Inject an uncorrectable/non-fatal training error into the device # with header log words 0 1 2 3. # # Either specify the PCI id on the command-line option or uncomment and edit # the PCI_ID line below using the correct PCI ID. # # Note that system firmware/BIOS may mask certain errors, change their severity # and/or not report header log words. # AER PCI_ID 0000:0c:00.0 UNCOR_STATUS COMP_ABORT HEADER_LOG 0 1 2 3 ----------------------------------------------------- The "lspci" output on the VM looks like below ---------------------------------------------------- Qemu: execute "lspci" on VM 00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller 00:01.0 VGA compatible controller: Device 1234:1111 (rev 02) 00:02.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 03) 00:03.0 Unclassified device [0002]: Red Hat, Inc. Virtio filesystem 00:04.0 Unclassified device [0002]: Red Hat, Inc. Virtio filesystem 00:05.0 Host bridge: Red Hat, Inc. QEMU PCIe Expander bridge 00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface Controller (rev 02) 00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode] (rev 02) 00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02) 0c:00.0 PCI bridge: Intel Corporation Device 7075 0d:00.0 CXL: Intel Corporation Device 0d93 (rev 01) -------------------------------------------------- Fan > Regards, > Terry
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index 705893b5f7b0..a4db474b2be5 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -203,6 +203,26 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; struct pci_host_bridge *host = pci_find_host_bridge(dev->bus); + /* + * PCIe ports may include functionality beyond the standard + * extended port capabilities. This may present a need to log and + * handle errors not addressed in this driver. Examples are CXL + * root ports and CXL downstream switch ports using AER UIE to + * indicate CXL UCE RAS protocol errors. + */ + if (type == PCI_EXP_TYPE_ROOT_PORT || + type == PCI_EXP_TYPE_DOWNSTREAM) { + struct pci_driver *pdrv = dev->driver; + + if (pdrv && pdrv->err_handler && + pdrv->err_handler->error_detected) { + const struct pci_error_handlers *err_handler; + + err_handler = pdrv->err_handler; + status = err_handler->error_detected(dev, state); + } + } + /* * If the error was detected by a Root Port, Downstream Port, RCEC, * or RCiEP, recovery runs on the device itself. For Ports, that
The AER service driver does not currently call a handler for AER uncorrectable errors (UCE) detected in root ports or downstream ports. This is not needed in most cases because common PCIe port functionality is handled by portdrv service drivers. CXL root ports include CXL specific RAS registers that need logging before starting do_recovery() in the UCE case. Update the AER service driver to call the UCE handler for root ports and downstream ports. These PCIe port devices are bound to the portdrv driver that includes a CE and UCE handler to be called. Signed-off-by: Terry Bowman <terry.bowman@amd.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org --- drivers/pci/pcie/err.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)