[v3,15/15] PCI/AER: Enable internal errors for CXL upstream and downstream switch ports

Message ID 20241113215429.3177981-16-terry.bowman@amd.com
State New
Series Enable CXL PCIe port protocol error handling and logging

Commit Message

Bowman, Terry Nov. 13, 2024, 9:54 p.m. UTC
The AER service driver enables uncorrectable internal errors (UIE) and
correctable internal errors (CIE) for CXL root ports and CXL RCECs. UIE
and CIE are used to report CXL protocol errors. The same UIE/CIE
enablement is needed for CXL PCIe upstream and downstream ports in order
to notify the associated root port and OS [1].

Export the AER service driver's pci_aer_unmask_internal_errors() function
to CXL namsespace.

Remove the function's dependency on the CONFIG_PCIEAER_CXL kernel config
because it is now an exported function.

Call pci_aer_unmask_internal_errors() during RAS initialization in:
cxl_uport_init_ras_reporting() and cxl_dport_init_ras_reporting().

[1] PCIe Base Spec r6.2-1.0, 6.2.3.2.2 Masking Individual Errors

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
 drivers/cxl/core/pci.c | 2 ++
 drivers/pci/pcie/aer.c | 5 +++--
 include/linux/aer.h    | 1 +
 3 files changed, 6 insertions(+), 2 deletions(-)

Comments

Lukas Wunner Nov. 18, 2024, 11:54 a.m. UTC | #1
On Wed, Nov 13, 2024 at 03:54:29PM -0600, Terry Bowman wrote:
> Export the AER service driver's pci_aer_unmask_internal_errors() function
> to CXL namsespace.
         ^^^^^^^^^^
	 namespace

> Remove the function's dependency on the CONFIG_PCIEAER_CXL kernel config
> because it is now an exported function.
[...]
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -949,7 +949,6 @@ static bool is_internal_error(struct aer_err_info *info)
>  	return info->status & PCI_ERR_UNC_INTN;
>  }
>  
> -#ifdef CONFIG_PCIEAER_CXL
>  /**
>   * pci_aer_unmask_internal_errors - unmask internal errors
>   * @dev: pointer to the pcie_dev data structure
> @@ -960,7 +959,7 @@ static bool is_internal_error(struct aer_err_info *info)
>   * Note: AER must be enabled and supported by the device which must be
>   * checked in advance, e.g. with pcie_aer_is_native().
>   */
> -static void pci_aer_unmask_internal_errors(struct pci_dev *dev)
> +void pci_aer_unmask_internal_errors(struct pci_dev *dev)

Hm, it seems the reason why you're moving pci_aer_unmask_internal_errors()
outside of "ifdef CONFIG_PCIEAER_CXL" is that drivers/cxl/core/pci.c
is conditional on CONFIG_CXL_BUS, whereas CONFIG_PCIEAER_CXL depends
on CONFIG_CXL_PCI.

In other words, you need this to avoid build breakage if CONFIG_CXL_BUS
is enabled but CONFIG_CXL_PCI is not.

I'm wondering (as a CXL ignoramus) why that can happen in the first
place, i.e. why is drivers/cxl/core/pci.c compiled at all if
CONFIG_CXL_PCI is disabled?

Thanks,

Lukas
Bowman, Terry Nov. 21, 2024, 10:25 p.m. UTC | #2
On 11/18/2024 5:54 AM, Lukas Wunner wrote:
> On Wed, Nov 13, 2024 at 03:54:29PM -0600, Terry Bowman wrote:
>> Export the AER service driver's pci_aer_unmask_internal_errors() function
>> to CXL namsespace.
>          ^^^^^^^^^^
> 	 namespace
Yup, thanks.
>> Remove the function's dependency on the CONFIG_PCIEAER_CXL kernel config
>> because it is now an exported function.
> [...]
>> --- a/drivers/pci/pcie/aer.c
>> +++ b/drivers/pci/pcie/aer.c
>> @@ -949,7 +949,6 @@ static bool is_internal_error(struct aer_err_info *info)
>>  	return info->status & PCI_ERR_UNC_INTN;
>>  }
>>  
>> -#ifdef CONFIG_PCIEAER_CXL
>>  /**
>>   * pci_aer_unmask_internal_errors - unmask internal errors
>>   * @dev: pointer to the pcie_dev data structure
>> @@ -960,7 +959,7 @@ static bool is_internal_error(struct aer_err_info *info)
>>   * Note: AER must be enabled and supported by the device which must be
>>   * checked in advance, e.g. with pcie_aer_is_native().
>>   */
>> -static void pci_aer_unmask_internal_errors(struct pci_dev *dev)
>> +void pci_aer_unmask_internal_errors(struct pci_dev *dev)
> Hm, it seems the reason why you're moving pci_aer_unmask_internal_errors()
> outside of "ifdef CONFIG_PCIEAER_CXL" is that drivers/cxl/core/pci.c
> is conditional on CONFIG_CXL_BUS, whereas CONFIG_PCIEAER_CXL depends
> on CONFIG_CXL_PCI.
>
> In other words, you need this to avoid build breakage if CONFIG_CXL_BUS
> is enabled but CONFIG_CXL_PCI is not.
>
> I'm wondering (as a CXL ignoramus) why that can happen in the first
> place, i.e. why is drivers/cxl/core/pci.c compiled at all if
> CONFIG_CXL_PCI is disabled?
>
> Thanks,
>
> Lukas

I moved the function out of the 'ifdef' block because it would be used in
another subsystem. Bjorn requested in earlier review that functions used across
subsystems should not use ifdef.

The drivers/cxl/Makefile file shows CONFIG_CXL_PCI gates cxl_pci.c build with:
obj-$(CONFIG_CXL_PCI) += cxl_pci.o 

BTW, CONFIG_CXL_PCI was added in the commit (68cdd3d2af69) below.

commit 68cdd3d2af6964dae2f8d9b53ee94f740dcbda35
Author: Ben Widawsky <bwidawsk@kernel.org>
Date:   Sun Jan 23 16:28:44 2022 -0800

    cxl: Rename CXL_MEM to CXL_PCI

    The cxl_mem module was renamed cxl_pci in commit 21e9f76733a8 ("cxl:
    Rename mem to pci"). In preparation for adding an ancillary driver for
    cxl_memdev devices (registered on the cxl bus by cxl_pci), go ahead and
    rename CONFIG_CXL_MEM to CONFIG_CXL_PCI. Free up the CXL_MEM name for
    that new driver to manage CXL.mem endpoint operations.

    Suggested-by: Dan Williams <dan.j.williams@intel.com>
    Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
    Link: https://lore.kernel.org/r/164298412409.3018233.12407355692407890752.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>


Regards,
Terry
Lukas Wunner Nov. 21, 2024, 10:32 p.m. UTC | #3
On Thu, Nov 21, 2024 at 04:25:31PM -0600, Bowman, Terry wrote:
> On 11/18/2024 5:54 AM, Lukas Wunner wrote:
> > Hm, it seems the reason why you're moving pci_aer_unmask_internal_errors()
> > outside of "ifdef CONFIG_PCIEAER_CXL" is that drivers/cxl/core/pci.c
> > is conditional on CONFIG_CXL_BUS, whereas CONFIG_PCIEAER_CXL depends
> > on CONFIG_CXL_PCI.
> >
> > In other words, you need this to avoid build breakage if CONFIG_CXL_BUS
> > is enabled but CONFIG_CXL_PCI is not.
> >
> > I'm wondering (as a CXL ignoramus) why that can happen in the first
> > place, i.e. why is drivers/cxl/core/pci.c compiled at all if
> > CONFIG_CXL_PCI is disabled?
[...]
> The drivers/cxl/Makefile file shows CONFIG_CXL_PCI gates cxl_pci.c build with:
> obj-$(CONFIG_CXL_PCI) += cxl_pci.o

I wasn't referring to drivers/cxl/pci.c, but drivers/cxl/core/pci.c.
That's gated by CONFIG_CXL_BUS, not CONFIG_CXL_PCI, which seems weird.

Thanks,

Lukas
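
The constraint discussed in this thread (a caller built under one Kconfig
symbol, a callee guarded by another) is often handled in kernel headers by
pairing the real declaration with an empty static inline stub, so that
callers never need an #ifdef of their own. The user-space sketch below only
illustrates that general pattern. It is not what this patch does: the patch
moves the function outside the #ifdef instead. CONFIG_DEMO_AER,
struct demo_dev, demo_unmask_internal_errors() and
demo_init_ras_reporting() are hypothetical names chosen for illustration.

```c
#include <assert.h>

/* Hypothetical device type standing in for struct pci_dev. */
struct demo_dev {
	int unmasked;   /* set to 1 once internal errors are unmasked */
};

/*
 * Header side: callers always see a usable symbol. Building with
 * -DCONFIG_DEMO_AER selects the real function; without it, the empty
 * stub is used and the caller still compiles and links.
 */
#ifdef CONFIG_DEMO_AER
void demo_unmask_internal_errors(struct demo_dev *dev);
#else
static inline void demo_unmask_internal_errors(struct demo_dev *dev)
{
	(void)dev;      /* feature compiled out: nothing to do */
}
#endif

/* Implementation side: only built when the feature is enabled. */
#ifdef CONFIG_DEMO_AER
void demo_unmask_internal_errors(struct demo_dev *dev)
{
	dev->unmasked = 1;
}
#endif

/* Caller in another "subsystem": no #ifdef needed at the call site. */
static void demo_init_ras_reporting(struct demo_dev *dev)
{
	demo_unmask_internal_errors(dev);
}
```

Built without -DCONFIG_DEMO_AER, demo_init_ras_reporting() compiles against
the stub and leaves the device untouched; built with it, the same call site
reaches the real function.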

Patch

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index af2ff6936a09..4ede038a7148 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -887,6 +887,7 @@  void cxl_uport_init_ras_reporting(struct cxl_port *port)
 
 	cxl_assign_port_error_handlers(pdev);
 	devm_add_action_or_reset(port->uport_dev, cxl_clear_port_error_handlers, pdev);
+	pci_aer_unmask_internal_errors(pdev);
 }
 EXPORT_SYMBOL_NS_GPL(cxl_uport_init_ras_reporting, CXL);
 
@@ -919,6 +920,7 @@  void cxl_dport_init_ras_reporting(struct cxl_dport *dport)
 
 	cxl_assign_port_error_handlers(pdev);
 	devm_add_action_or_reset(dport_dev, cxl_clear_port_error_handlers, pdev);
+	pci_aer_unmask_internal_errors(pdev);
 }
 EXPORT_SYMBOL_NS_GPL(cxl_dport_init_ras_reporting, CXL);
 
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 87fddd514030..1028814379e4 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -949,7 +949,6 @@  static bool is_internal_error(struct aer_err_info *info)
 	return info->status & PCI_ERR_UNC_INTN;
 }
 
-#ifdef CONFIG_PCIEAER_CXL
 /**
  * pci_aer_unmask_internal_errors - unmask internal errors
  * @dev: pointer to the pcie_dev data structure
@@ -960,7 +959,7 @@  static bool is_internal_error(struct aer_err_info *info)
  * Note: AER must be enabled and supported by the device which must be
  * checked in advance, e.g. with pcie_aer_is_native().
  */
-static void pci_aer_unmask_internal_errors(struct pci_dev *dev)
+void pci_aer_unmask_internal_errors(struct pci_dev *dev)
 {
 	int aer = dev->aer_cap;
 	u32 mask;
@@ -973,7 +972,9 @@  static void pci_aer_unmask_internal_errors(struct pci_dev *dev)
 	mask &= ~PCI_ERR_COR_INTERNAL;
 	pci_write_config_dword(dev, aer + PCI_ERR_COR_MASK, mask);
 }
+EXPORT_SYMBOL_NS_GPL(pci_aer_unmask_internal_errors, CXL);
 
+#ifdef CONFIG_PCIEAER_CXL
 static bool is_cxl_mem_dev(struct pci_dev *dev)
 {
 	/*
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 4b97f38f3fcf..093293f9f12b 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -55,5 +55,6 @@  void pci_print_aer(struct pci_dev *dev, int aer_severity,
 int cper_severity_to_aer(int cper_severity);
 void aer_recover_queue(int domain, unsigned int bus, unsigned int devfn,
 		       int severity, struct aer_capability_regs *aer_regs);
+void pci_aer_unmask_internal_errors(struct pci_dev *dev);
 #endif //_AER_H_
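
For readers unfamiliar with the AER mask registers, the read-modify-write
that pci_aer_unmask_internal_errors() performs can be modelled in user space
as below. This is a minimal sketch, not the kernel implementation.
struct fake_dev, cfg_read32(), cfg_write32() and unmask_internal_errors()
are hypothetical stand-ins for struct pci_dev and the
pci_read_config_dword()/pci_write_config_dword() accessors. Only the
correctable half is visible in the hunk above; the uncorrectable half is
assumed to mirror it. The register offsets and bit values are copied from
include/uapi/linux/pci_regs.h and are assumed to be current.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Values as defined in include/uapi/linux/pci_regs.h (assumed unchanged). */
#define PCI_ERR_UNCOR_MASK   0x08        /* Uncorrectable Error Mask */
#define PCI_ERR_UNC_INTN     0x00400000  /* Uncorrectable Internal Error */
#define PCI_ERR_COR_MASK     0x14        /* Correctable Error Mask */
#define PCI_ERR_COR_INTERNAL 0x00004000  /* Corrected Internal Error */

/* Hypothetical stand-in for struct pci_dev: a flat config space image. */
struct fake_dev {
	int aer_cap;            /* offset of the AER extended capability */
	uint8_t cfg[4096];      /* PCIe extended config space */
};

static uint32_t cfg_read32(const struct fake_dev *dev, int where)
{
	uint32_t val;

	memcpy(&val, &dev->cfg[where], sizeof(val));
	return val;
}

static void cfg_write32(struct fake_dev *dev, int where, uint32_t val)
{
	memcpy(&dev->cfg[where], &val, sizeof(val));
}

/* Clear only the internal-error bit in each mask; other bits are preserved. */
static void unmask_internal_errors(struct fake_dev *dev)
{
	int aer = dev->aer_cap;
	uint32_t mask;

	mask = cfg_read32(dev, aer + PCI_ERR_UNCOR_MASK);
	mask &= ~(uint32_t)PCI_ERR_UNC_INTN;
	cfg_write32(dev, aer + PCI_ERR_UNCOR_MASK, mask);

	mask = cfg_read32(dev, aer + PCI_ERR_COR_MASK);
	mask &= ~(uint32_t)PCI_ERR_COR_INTERNAL;
	cfg_write32(dev, aer + PCI_ERR_COR_MASK, mask);
}
```

A mask bit set to 1 suppresses reporting of that error, so clearing the two
internal-error bits is what lets UIE and CIE reach the root port. Every other
mask bit keeps whatever value firmware or an earlier driver left there.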