[v2,01/14] PCI/AER: Introduce 'struct cxl_err_handlers' and add to 'struct pci_driver'

Message ID 20241025210305.27499-2-terry.bowman@amd.com (mailing list archive)
State Superseded
Delegated to: Bjorn Helgaas
Series Enable CXL PCIe port protocol error handling and logging

Commit Message

Bowman, Terry Oct. 25, 2024, 9:02 p.m. UTC
CXL.io provides a PCIe-like protocol error implementation, but CXL.io and
PCIe have different handling requirements.

The PCIe AER service driver may attempt to recover PCIe devices with
uncorrectable errors, whereas recovery is not used for CXL.io because of
the potential for corruption of what can be system memory.

Create pci_driver::cxl_err_handler, similar to pci_driver::err_handler,
with callbacks for correctable and uncorrectable CXL.io error handling.

The CXL error handlers will be used in future patches adding CXL PCIe
port protocol error handling.

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
 include/linux/pci.h | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Jonathan Cameron Oct. 30, 2024, 3:14 p.m. UTC | #1
On Fri, 25 Oct 2024 16:02:52 -0500
Terry Bowman <terry.bowman@amd.com> wrote:

> CXL.io provides PCIe like protocol error implementation, but CXL.io and
> PCIe have different handling requirements.
> 
> The PCIe AER service driver may attempt recovering PCIe devices with
> uncorrectable errors while recovery is not used for CXL.io. Recovery is not
> used in the CXL.io recovery because of the potential for corruption on
> what can be system memory.
> 
> Create pci_driver::cxl_err_handlers similar to pci_driver::error_handler.
> Create handlers for correctable and uncorrectable CXL.io error
> handling.
> 
> The CXL error handlers will be used in future patches adding CXL PCIe
> port protocol error handling.
> 
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Bowman, Terry Oct. 30, 2024, 3:15 p.m. UTC | #2
Hi Jonathan,

Thank you for reviewing.

Regards,
Terry

On 10/30/2024 10:14 AM, Jonathan Cameron wrote:
> On Fri, 25 Oct 2024 16:02:52 -0500
> Terry Bowman <terry.bowman@amd.com> wrote:
>
>> CXL.io provides PCIe like protocol error implementation, but CXL.io and
>> PCIe have different handling requirements.
>>
>> The PCIe AER service driver may attempt recovering PCIe devices with
>> uncorrectable errors while recovery is not used for CXL.io. Recovery is not
>> used in the CXL.io recovery because of the potential for corruption on
>> what can be system memory.
>>
>> Create pci_driver::cxl_err_handlers similar to pci_driver::error_handler.
>> Create handlers for correctable and uncorrectable CXL.io error
>> handling.
>>
>> The CXL error handlers will be used in future patches adding CXL PCIe
>> port protocol error handling.
>>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Dave Jiang Oct. 31, 2024, 4:20 p.m. UTC | #3
On 10/25/24 2:02 PM, Terry Bowman wrote:
> CXL.io provides PCIe like protocol error implementation, but CXL.io and
> PCIe have different handling requirements.
> 
> The PCIe AER service driver may attempt recovering PCIe devices with
> uncorrectable errors while recovery is not used for CXL.io. Recovery is not
> used in the CXL.io recovery because of the potential for corruption on
> what can be system memory.
> 
> Create pci_driver::cxl_err_handlers similar to pci_driver::error_handler.
> Create handlers for correctable and uncorrectable CXL.io error
> handling.
> 
> The CXL error handlers will be used in future patches adding CXL PCIe
> port protocol error handling.
> 
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>  include/linux/pci.h | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 573b4c4c2be6..106ac83e3a7b 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -886,6 +886,14 @@ struct pci_error_handlers {
>  	void (*cor_error_detected)(struct pci_dev *dev);
>  };
>  
> +/* CXL bus error event callbacks */
> +struct cxl_error_handlers {
> +	/* CXL bus error detected on this device */
> +	bool (*error_detected)(struct pci_dev *dev);
> +
> +	/* Allow device driver to record more details of a correctable error */
> +	void (*cor_error_detected)(struct pci_dev *dev);
> +};
>  
>  struct module;
>  
> @@ -956,6 +964,7 @@ struct pci_driver {
>  	int  (*sriov_set_msix_vec_count)(struct pci_dev *vf, int msix_vec_count); /* On PF */
>  	u32  (*sriov_get_vf_total_msix)(struct pci_dev *pf);
>  	const struct pci_error_handlers *err_handler;
> +	const struct cxl_error_handlers *cxl_err_handler;
>  	const struct attribute_group **groups;
>  	const struct attribute_group **dev_groups;
>  	struct device_driver	driver;
Fan Ni Oct. 31, 2024, 8:24 p.m. UTC | #4
On Fri, Oct 25, 2024 at 04:02:52PM -0500, Terry Bowman wrote:
> CXL.io provides PCIe like protocol error implementation, but CXL.io and
> PCIe have different handling requirements.
> 
> The PCIe AER service driver may attempt recovering PCIe devices with
> uncorrectable errors while recovery is not used for CXL.io. Recovery is not
> used in the CXL.io recovery because of the potential for corruption on
> what can be system memory.
> 
> Create pci_driver::cxl_err_handlers similar to pci_driver::error_handler.
> Create handlers for correctable and uncorrectable CXL.io error
> handling.
> 
> The CXL error handlers will be used in future patches adding CXL PCIe
> port protocol error handling.
> 
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> ---

Reviewed-by: Fan Ni <fan.ni@samsung.com>

>  include/linux/pci.h | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 573b4c4c2be6..106ac83e3a7b 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -886,6 +886,14 @@ struct pci_error_handlers {
>  	void (*cor_error_detected)(struct pci_dev *dev);
>  };
>  
> +/* CXL bus error event callbacks */
> +struct cxl_error_handlers {
> +	/* CXL bus error detected on this device */
> +	bool (*error_detected)(struct pci_dev *dev);
> +
> +	/* Allow device driver to record more details of a correctable error */
> +	void (*cor_error_detected)(struct pci_dev *dev);
> +};
>  
>  struct module;
>  
> @@ -956,6 +964,7 @@ struct pci_driver {
>  	int  (*sriov_set_msix_vec_count)(struct pci_dev *vf, int msix_vec_count); /* On PF */
>  	u32  (*sriov_get_vf_total_msix)(struct pci_dev *pf);
>  	const struct pci_error_handlers *err_handler;
> +	const struct cxl_error_handlers *cxl_err_handler;
>  	const struct attribute_group **groups;
>  	const struct attribute_group **dev_groups;
>  	struct device_driver	driver;
> -- 
> 2.34.1
>

Patch

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 573b4c4c2be6..106ac83e3a7b 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -886,6 +886,14 @@  struct pci_error_handlers {
 	void (*cor_error_detected)(struct pci_dev *dev);
 };
 
+/* CXL bus error event callbacks */
+struct cxl_error_handlers {
+	/* CXL bus error detected on this device */
+	bool (*error_detected)(struct pci_dev *dev);
+
+	/* Allow device driver to record more details of a correctable error */
+	void (*cor_error_detected)(struct pci_dev *dev);
+};
 
 struct module;
 
@@ -956,6 +964,7 @@  struct pci_driver {
 	int  (*sriov_set_msix_vec_count)(struct pci_dev *vf, int msix_vec_count); /* On PF */
 	u32  (*sriov_get_vf_total_msix)(struct pci_dev *pf);
 	const struct pci_error_handlers *err_handler;
+	const struct cxl_error_handlers *cxl_err_handler;
 	const struct attribute_group **groups;
 	const struct attribute_group **dev_groups;
 	struct device_driver	driver;
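
For illustration only, here is a minimal userspace sketch of how a driver
might populate the new pci_driver::cxl_err_handler member and how a caller
might dispatch to it. The struct definitions are trimmed stand-ins so the
example compiles outside the kernel. The demo_* names, the per-device
counters, and the dispatch helper are hypothetical and not part of this
patch. The patch does not define the meaning of error_detected()'s bool
return value; the sketch assumes "true" means the driver recorded an
uncorrectable error.

```c
#include <stdbool.h>
#include <stddef.h>

struct pci_driver;

/* Trimmed stand-in for the kernel's struct pci_dev (assumption). */
struct pci_dev {
	const struct pci_driver *driver;
	unsigned int cor_errors;	/* hypothetical counter */
	unsigned int uncor_errors;	/* hypothetical counter */
};

/* Same layout as the struct added by the patch. */
struct cxl_error_handlers {
	/* CXL bus error detected on this device */
	bool (*error_detected)(struct pci_dev *dev);

	/* Allow device driver to record more details of a correctable error */
	void (*cor_error_detected)(struct pci_dev *dev);
};

/* Trimmed stand-in for struct pci_driver, keeping only the new member. */
struct pci_driver {
	const char *name;
	const struct cxl_error_handlers *cxl_err_handler;
};

/* Hypothetical driver callback for uncorrectable CXL.io errors. */
static bool demo_cxl_error_detected(struct pci_dev *dev)
{
	dev->uncor_errors++;
	return true;	/* assumed meaning: error was recorded by the driver */
}

/* Hypothetical driver callback for correctable CXL.io errors. */
static void demo_cxl_cor_error_detected(struct pci_dev *dev)
{
	dev->cor_errors++;
}

static const struct cxl_error_handlers demo_cxl_err_handlers = {
	.error_detected		= demo_cxl_error_detected,
	.cor_error_detected	= demo_cxl_cor_error_detected,
};

static const struct pci_driver demo_driver = {
	.name			= "demo_cxl_port",
	.cxl_err_handler	= &demo_cxl_err_handlers,
};

/*
 * Hypothetical dispatch helper: call the matching callback if the bound
 * driver provides one. Returns false when no uncorrectable handler ran.
 */
static bool demo_dispatch_cxl_error(struct pci_dev *dev, bool correctable)
{
	const struct cxl_error_handlers *h;

	if (!dev->driver || !dev->driver->cxl_err_handler)
		return false;

	h = dev->driver->cxl_err_handler;
	if (correctable) {
		if (h->cor_error_detected)
			h->cor_error_detected(dev);
		return false;
	}

	return h->error_detected ? h->error_detected(dev) : false;
}
```

Keeping the CXL callbacks in their own structure, rather than reusing
struct pci_error_handlers, lets a caller check for cxl_err_handler
independently of err_handler, so a driver can opt in to CXL.io handling
without changing its existing PCIe AER recovery callbacks.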