Message ID | 20250211192444.2292833-2-terry.bowman@amd.com (mailing list archive) |
---|---|
State | Handled Elsewhere |
Series | Enable CXL PCIe port protocol error handling and logging |
On Tue, Feb 11, 2025 at 01:24:28PM -0600, Terry Bowman wrote:
> CXL.io is implemented on top of PCIe Protocol Errors. But, CXL.io and PCIe
> have different handling requirements for uncorrectable errors (UCE).
>
> The PCIe AER service driver may attempt recovering PCIe devices with
> UCE while recovery is not used for CXL.io. Recovery is not used in the
> CXL.io case because of potential corruption on what can be system memory.
>
> Create pci_driver::cxl_err_handlers structure similar to
> pci_driver::error_handler. Create handlers for correctable and
> uncorrectable CXL.io error handling.
>
> The CXL error handlers will be used in future patches adding CXL PCIe
> Port Protocol Error handling.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Fan Ni <fan.ni@samsung.com>
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Reviewed-by: Gregory Price <gourry@gourry.net>

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

> ---
>  include/linux/pci.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 47b31ad724fa..1d62e785ae1f 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -884,6 +884,14 @@ struct pci_error_handlers {
>  	void (*cor_error_detected)(struct pci_dev *dev);
>  };
>
> +/* Compute Express Link (CXL) bus error event callbacks */
> +struct cxl_error_handlers {
> +	/* CXL bus error detected on this device */
> +	pci_ers_result_t (*error_detected)(struct pci_dev *dev);
> +
> +	/* Allow device driver to record more details of a correctable error */
> +	void (*cor_error_detected)(struct pci_dev *dev);
> +};
>
>  struct module;
>
> @@ -929,6 +937,7 @@ struct module;
>   * @sriov_get_vf_total_msix: PF driver callback to get the total number of
>   *              MSI-X vectors available for distribution to the VFs.
>   * @err_handler: See Documentation/PCI/pci-error-recovery.rst
> + * @cxl_err_handler: Compute Express Link specific error handlers.
>   * @groups:	Sysfs attribute groups.
>   * @dev_groups: Attributes attached to the device that will be
>   *              created once it is bound to the driver.
> @@ -954,6 +963,7 @@ struct pci_driver {
>  	int  (*sriov_set_msix_vec_count)(struct pci_dev *vf, int msix_vec_count); /* On PF */
>  	u32  (*sriov_get_vf_total_msix)(struct pci_dev *pf);
>  	const struct pci_error_handlers *err_handler;
> +	const struct cxl_error_handlers *cxl_err_handler;
>  	const struct attribute_group **groups;
>  	const struct attribute_group **dev_groups;
>  	struct device_driver	driver;
> --
> 2.34.1
>
Terry Bowman wrote:
> CXL.io is implemented on top of PCIe Protocol Errors. But, CXL.io and PCIe
> have different handling requirements for uncorrectable errors (UCE).
>
> The PCIe AER service driver may attempt recovering PCIe devices with
> UCE while recovery is not used for CXL.io. Recovery is not used in the
> CXL.io case because of potential corruption on what can be system memory.
>
> Create pci_driver::cxl_err_handlers structure similar to
> pci_driver::error_handler. Create handlers for correctable and
> uncorrectable CXL.io error handling.
>
> The CXL error handlers will be used in future patches adding CXL PCIe
> Port Protocol Error handling.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Fan Ni <fan.ni@samsung.com>
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Reviewed-by: Gregory Price <gourry@gourry.net>

Looks good to me.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 47b31ad724fa..1d62e785ae1f 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -884,6 +884,14 @@ struct pci_error_handlers {
 	void (*cor_error_detected)(struct pci_dev *dev);
 };
 
+/* Compute Express Link (CXL) bus error event callbacks */
+struct cxl_error_handlers {
+	/* CXL bus error detected on this device */
+	pci_ers_result_t (*error_detected)(struct pci_dev *dev);
+
+	/* Allow device driver to record more details of a correctable error */
+	void (*cor_error_detected)(struct pci_dev *dev);
+};
 
 struct module;
 
@@ -929,6 +937,7 @@ struct module;
  * @sriov_get_vf_total_msix: PF driver callback to get the total number of
  *              MSI-X vectors available for distribution to the VFs.
  * @err_handler: See Documentation/PCI/pci-error-recovery.rst
+ * @cxl_err_handler: Compute Express Link specific error handlers.
  * @groups:	Sysfs attribute groups.
  * @dev_groups: Attributes attached to the device that will be
  *              created once it is bound to the driver.
@@ -954,6 +963,7 @@ struct pci_driver {
 	int  (*sriov_set_msix_vec_count)(struct pci_dev *vf, int msix_vec_count); /* On PF */
 	u32  (*sriov_get_vf_total_msix)(struct pci_dev *pf);
 	const struct pci_error_handlers *err_handler;
+	const struct cxl_error_handlers *cxl_err_handler;
 	const struct attribute_group **groups;
 	const struct attribute_group **dev_groups;
 	struct device_driver	driver;
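
For illustration only, here is a user-space sketch of how a driver might populate the new `cxl_err_handler` pointer and how a caller could reach the two callbacks. The patch itself adds only the structure and the `pci_driver` member. The real callers arrive in later patches of the series, so `demo_dispatch()`, the `demo_*` names, the `cor_count` field, and the cut-down `pci_dev`/`pci_driver`/`pci_ers_result_t` stand-ins below are all assumptions made so the example compiles outside the kernel.

```c
/* User-space sketch; every kernel type here is a minimal stand-in. */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the kernel's pci_ers_result_t (subset, illustrative values). */
typedef enum {
	PCI_ERS_RESULT_NONE = 1,
	PCI_ERS_RESULT_NEED_RESET = 3,
} pci_ers_result_t;

struct pci_driver;

/* Stand-in for struct pci_dev; cor_count is a hypothetical demo field. */
struct pci_dev {
	const char *name;
	const struct pci_driver *driver;
	unsigned int cor_count;
};

/* Same two callbacks as the structure added by the patch. */
struct cxl_error_handlers {
	/* CXL bus error detected on this device */
	pci_ers_result_t (*error_detected)(struct pci_dev *dev);
	/* Allow device driver to record more details of a correctable error */
	void (*cor_error_detected)(struct pci_dev *dev);
};

/* Stand-in for struct pci_driver, reduced to the new member. */
struct pci_driver {
	const char *name;
	const struct cxl_error_handlers *cxl_err_handler;
};

/* Hypothetical driver callback for an uncorrectable CXL.io error. */
static pci_ers_result_t demo_error_detected(struct pci_dev *dev)
{
	printf("%s: uncorrectable CXL.io error\n", dev->name);
	return PCI_ERS_RESULT_NEED_RESET;
}

/* Hypothetical driver callback that just counts correctable errors. */
static void demo_cor_error_detected(struct pci_dev *dev)
{
	dev->cor_count++;
}

static const struct cxl_error_handlers demo_cxl_handlers = {
	.error_detected = demo_error_detected,
	.cor_error_detected = demo_cor_error_detected,
};

static const struct pci_driver demo_driver = {
	.name = "demo_cxl",
	.cxl_err_handler = &demo_cxl_handlers,
};

/*
 * Hypothetical dispatcher (not part of the patch): look up the bound
 * driver's CXL handlers and call the one matching the error severity.
 * Missing drivers, handler tables, or callbacks yield PCI_ERS_RESULT_NONE.
 */
static pci_ers_result_t demo_dispatch(struct pci_dev *dev, int uncorrectable)
{
	const struct cxl_error_handlers *h;

	assert(dev != NULL);
	h = dev->driver ? dev->driver->cxl_err_handler : NULL;
	if (!h)
		return PCI_ERS_RESULT_NONE;

	if (uncorrectable)
		return h->error_detected ? h->error_detected(dev)
					 : PCI_ERS_RESULT_NONE;

	if (h->cor_error_detected)
		h->cor_error_detected(dev);
	return PCI_ERS_RESULT_NONE;
}
```

With a `struct pci_dev` whose `driver` points at `demo_driver`, `demo_dispatch(&dev, 1)` invokes the uncorrectable callback and returns its result, while `demo_dispatch(&dev, 0)` only bumps `cor_count`. Keeping the CXL callbacks in their own table, separate from `err_handler`, mirrors the patch's stated motivation: CXL.io uncorrectable errors are not meant to go through the PCIe AER recovery flow.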