Message ID | 20250107143852.3692571-2-terry.bowman@amd.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Bjorn Helgaas |
Series | Enable CXL PCIe port protocol error handling and logging |

Terry Bowman wrote:
> CXL.io provides protocol error handling on top of PCIe Protocol Error
> handling. But, CXL.io and PCIe have different handling requirements
> for uncorrectable errors (UCE).
>
> The PCIe AER service driver may attempt recovering PCIe devices with
> UCE while recovery is not used for CXL.io. Recovery is not used in the
> CXL.io case because of potential corruption on what can be system memory.
>
> Create pci_driver::cxl_err_handlers structure similar to
> pci_driver::error_handler. Create handlers for correctable and
> uncorrectable CXL.io error handling.
>
> The CXL error handlers will be used in future patches adding CXL PCIe
> Port Protocol Error handling.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Fan Ni <fan.ni@samsung.com>

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

[snip]

On Tue, Jan 07, 2025 at 08:38:37AM -0600, Terry Bowman wrote:
> CXL.io provides protocol error handling on top of PCIe Protocol Error
> handling. But, CXL.io and PCIe have different handling requirements
> for uncorrectable errors (UCE).
>
> The PCIe AER service driver may attempt recovering PCIe devices with
> UCE while recovery is not used for CXL.io. Recovery is not used in the
> CXL.io case because of potential corruption on what can be system memory.
>
> Create pci_driver::cxl_err_handlers structure similar to
> pci_driver::error_handler. Create handlers for correctable and
> uncorrectable CXL.io error handling.
>
> The CXL error handlers will be used in future patches adding CXL PCIe
> Port Protocol Error handling.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Fan Ni <fan.ni@samsung.com>

Reviewed-by: Gregory Price <gourry@gourry.net>

On 2/6/2025 11:01 AM, Gregory Price wrote:
> On Tue, Jan 07, 2025 at 08:38:37AM -0600, Terry Bowman wrote:
>> CXL.io provides protocol error handling on top of PCIe Protocol Error
>> handling. But, CXL.io and PCIe have different handling requirements
>> for uncorrectable errors (UCE).
>>
>> The PCIe AER service driver may attempt recovering PCIe devices with
>> UCE while recovery is not used for CXL.io. Recovery is not used in the
>> CXL.io case because of potential corruption on what can be system memory.
>>
>> Create pci_driver::cxl_err_handlers structure similar to
>> pci_driver::error_handler. Create handlers for correctable and
>> uncorrectable CXL.io error handling.
>>
>> The CXL error handlers will be used in future patches adding CXL PCIe
>> Port Protocol Error handling.
>>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>> Reviewed-by: Fan Ni <fan.ni@samsung.com>
> Reviewed-by: Gregory Price <gourry@gourry.net>

Thanks for reviewing and adding the "Reviewed-by".

Terry

diff --git a/include/linux/pci.h b/include/linux/pci.h
index db9b47ce3eef..e2e36f11205c 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -882,6 +882,14 @@ struct pci_error_handlers {
 	void (*cor_error_detected)(struct pci_dev *dev);
 };
 
+/* Compute Express Link (CXL) bus error event callbacks */
+struct cxl_error_handlers {
+	/* CXL bus error detected on this device */
+	bool (*error_detected)(struct pci_dev *dev);
+
+	/* Allow device driver to record more details of a correctable error */
+	void (*cor_error_detected)(struct pci_dev *dev);
+};
 
 struct module;
 
@@ -927,6 +935,7 @@ struct module;
  * @sriov_get_vf_total_msix: PF driver callback to get the total number of
  *		MSI-X vectors available for distribution to the VFs.
  * @err_handler: See Documentation/PCI/pci-error-recovery.rst
+ * @cxl_err_handler: Compute Express Link specific error handlers.
  * @groups:	Sysfs attribute groups.
  * @dev_groups: Attributes attached to the device that will be
  *		created once it is bound to the driver.
@@ -952,6 +961,7 @@ struct pci_driver {
 	int  (*sriov_set_msix_vec_count)(struct pci_dev *vf, int msix_vec_count); /* On PF */
 	u32  (*sriov_get_vf_total_msix)(struct pci_dev *pf);
 	const struct pci_error_handlers	*err_handler;
+	const struct cxl_error_handlers	*cxl_err_handler;
 	const struct attribute_group **groups;
 	const struct attribute_group **dev_groups;
 	struct device_driver	driver;
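
For readers following along, here is a rough userspace sketch of how a driver might populate the new cxl_err_handler pointer and how a caller might dispatch to it. This is not code from the series. The struct pci_dev and struct pci_driver definitions below are cut-down stand-ins for the kernel types, and the demo_* names, the cor_count and fatal fields, and the dispatcher are all hypothetical. The meaning of the bool returned by error_detected is not stated in this patch; the sketch assumes true means the caller should not attempt recovery.

```c
#include <stdbool.h>
#include <stddef.h>

struct pci_driver;

/* Userspace stand-in for the kernel's struct pci_dev (illustration only). */
struct pci_dev {
	const struct pci_driver *driver;
	unsigned int cor_count;	/* hypothetical: correctable errors seen */
	bool fatal;		/* hypothetical: uncorrectable error seen */
};

/* Same layout as the structure added by the patch. */
struct cxl_error_handlers {
	/* CXL bus error detected on this device */
	bool (*error_detected)(struct pci_dev *dev);

	/* Allow device driver to record more details of a correctable error */
	void (*cor_error_detected)(struct pci_dev *dev);
};

/* Stand-in for struct pci_driver with only the field of interest. */
struct pci_driver {
	const char *name;
	const struct cxl_error_handlers *cxl_err_handler;
};

/* Hypothetical driver callback for an uncorrectable CXL.io error. */
static bool demo_cxl_error_detected(struct pci_dev *dev)
{
	dev->fatal = true;
	/* Assumption: true tells the caller not to attempt recovery. */
	return true;
}

/* Hypothetical driver callback for a correctable CXL.io error. */
static void demo_cxl_cor_error_detected(struct pci_dev *dev)
{
	dev->cor_count++;
}

static const struct cxl_error_handlers demo_cxl_err_handlers = {
	.error_detected		= demo_cxl_error_detected,
	.cor_error_detected	= demo_cxl_cor_error_detected,
};

static const struct pci_driver demo_driver = {
	.name			= "demo_cxl_port",
	.cxl_err_handler	= &demo_cxl_err_handlers,
};

/*
 * Hypothetical dispatcher: call the bound driver's CXL handler, if any.
 * Returns the handler's verdict for uncorrectable errors, false otherwise.
 */
static bool demo_dispatch_cxl_error(struct pci_dev *dev, bool correctable)
{
	const struct cxl_error_handlers *h;

	if (!dev->driver || !dev->driver->cxl_err_handler)
		return false;

	h = dev->driver->cxl_err_handler;

	if (correctable) {
		if (h->cor_error_detected)
			h->cor_error_detected(dev);
		return false;
	}

	return h->error_detected ? h->error_detected(dev) : false;
}
```

Keeping the CXL callbacks in their own structure, next to err_handler rather than inside it, lets a driver opt in to CXL.io handling without touching its existing PCIe AER recovery callbacks.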