Message ID | 20230907191956.674833-3-Benjamin.Cheatham@amd.com |
---|---|
State | Superseded |
Headers | show |
Series | CXL, ACPI, APEI, EINJ: Update EINJ for CXL 1.1 error types | expand |
On Thu, 7 Sep 2023 14:19:55 -0500 Ben Cheatham <Benjamin.Cheatham@amd.com> wrote: > Add support for CXL EINJ error types for CXL 1.1 hosts added in ACPI > v6.5. Because these error types target memory-mapped CXL 1.1 compliant > downstream ports and not physical (normal/persistent) memory, these > error types are not currently allowed through the memory range > validation done by the EINJ driver. > > The MMIO address of a CXL 1.1 downstream port can be found in the > cxl_rcrb_addr file in the corresponding dport directory under > /sys/bus/cxl/devices/portX. CXL 1.1 error types follow the same > procedure as a memory error type, but with param1 set to the > downstream port MMIO address. > > Example usage: > $ cd /sys/kernel/debug/apei/einj > $ cat available_error_type > 0x00000008 Memory Correctable > 0x00000010 Memory Uncorrectable non-fatal > 0x00000020 Memory Uncorrectable fatal > 0x00000040 PCI Express Correctable > 0x00000080 PCI Express Uncorrectable non-fatal > 0x00000100 PCI Express Uncorrectable fatal > 0x00008000 CXL.mem Protocol Correctable > 0x00020000 CXL.mem Protocol Uncorrectable fatal > $ echo 0x8000 > error_type > $ echo 0xfffffffffffff000 > param2 > $ echo 0x3 > flags From einj.rst (and the ACPI spec) bit 0 here is Processor APIC field valid. Why is that relevant here? If it were you'd be writing param3 I think. So probably harmless, but I think this should be 0x2 > flags > $ cat /sys/bus/cxl/devices/portX/dportY/cxl_rcrb_addr > 0xb2f00000 > $ echo 0xb2f00000 > param1 > $ echo 1 > error_inject > > Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com> Hi Ben, Why bother with the config dependent build? It doesn't save that much in code built and right now there are no non ACPI CXL systems so in reality it's always built anyway. Otherwise this LGTM > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c > index 001ab8742e21..f8f300496140 100644 > --- a/drivers/cxl/core/port.c > +++ b/drivers/cxl/core/port.c > @@ -1122,6 +1122,25 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port, > } > EXPORT_SYMBOL_NS_GPL(devm_cxl_add_rch_dport, CXL); > > +#if IS_ENABLED(CONFIG_CXL_ACPI) I'm not a particular fan processor magic down in c files. Do we really care about the saving of not having this in builds where CONFIG_CXL_ACPI isn't set? I'm thinking those don't really exist. > +struct cxl_dport *cxl_find_rch_dport_by_rcrb(resource_size_t rcrb_base) > +{ > + struct cxl_dport *dport; > + unsigned long index; > + > + if (!cxl_root) > + return ERR_PTR(-ENODEV); > + > + xa_for_each(&cxl_root->dports, index, dport) > + if ((dport->rch && dport->rcrb.base != CXL_RESOURCE_NONE) > + && dport->rcrb.base == rcrb_base) > + return dport; > + > + return NULL; > +} > +EXPORT_SYMBOL_NS_GPL(cxl_find_rch_dport_by_rcrb, CXL); > +#endif > + > static int add_ep(struct cxl_ep *new) > { > struct cxl_port *port = new->dport->port;
On 9/12/23 9:00 AM, Jonathan Cameron wrote: > On Thu, 7 Sep 2023 14:19:55 -0500 > Ben Cheatham <Benjamin.Cheatham@amd.com> wrote: > >> Add support for CXL EINJ error types for CXL 1.1 hosts added in ACPI >> v6.5. Because these error types target memory-mapped CXL 1.1 compliant >> downstream ports and not physical (normal/persistent) memory, these >> error types are not currently allowed through the memory range >> validation done by the EINJ driver. >> >> The MMIO address of a CXL 1.1 downstream port can be found in the >> cxl_rcrb_addr file in the corresponding dport directory under >> /sys/bus/cxl/devices/portX. CXL 1.1 error types follow the same >> procedure as a memory error type, but with param1 set to the >> downstream port MMIO address. >> >> Example usage: >> $ cd /sys/kernel/debug/apei/einj >> $ cat available_error_type >> 0x00000008 Memory Correctable >> 0x00000010 Memory Uncorrectable non-fatal >> 0x00000020 Memory Uncorrectable fatal >> 0x00000040 PCI Express Correctable >> 0x00000080 PCI Express Uncorrectable non-fatal >> 0x00000100 PCI Express Uncorrectable fatal >> 0x00008000 CXL.mem Protocol Correctable >> 0x00020000 CXL.mem Protocol Uncorrectable fatal >> $ echo 0x8000 > error_type >> $ echo 0xfffffffffffff000 > param2 >> $ echo 0x3 > flags > > From einj.rst (and the ACPI spec) bit 0 here is > Processor APIC field valid. Why is that relevant here? > If it were you'd be writing param3 I think. So probably > harmless, but I think this should be 0x2 > flags > I don't think it matters in this case, but I agree. I'll change it to use 0x2 instead. >> $ cat /sys/bus/cxl/devices/portX/dportY/cxl_rcrb_addr >> 0xb2f00000 >> $ echo 0xb2f00000 > param1 >> $ echo 1 > error_inject >> >> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com> > Hi Ben, > > Why bother with the config dependent build? > It doesn't save that much in code built and right now there > are no non ACPI CXL systems so in reality it's always built > anyway. > Ok, I'll remove the config dependencies. It'll make my life a lot easier :). > Otherwise this LGTM > > >> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c >> index 001ab8742e21..f8f300496140 100644 >> --- a/drivers/cxl/core/port.c >> +++ b/drivers/cxl/core/port.c >> @@ -1122,6 +1122,25 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port, >> } >> EXPORT_SYMBOL_NS_GPL(devm_cxl_add_rch_dport, CXL); >> >> +#if IS_ENABLED(CONFIG_CXL_ACPI) > I'm not a particular fan processor magic down in c files. > > Do we really care about the saving of not having this in builds where > CONFIG_CXL_ACPI isn't set? I'm thinking those don't really exist. > I don't like it much either, I did it as a quick fix for the build errors from the kernel test bot. Thanks, Ben >> +struct cxl_dport *cxl_find_rch_dport_by_rcrb(resource_size_t rcrb_base) >> +{ >> + struct cxl_dport *dport; >> + unsigned long index; >> + >> + if (!cxl_root) >> + return ERR_PTR(-ENODEV); >> + >> + xa_for_each(&cxl_root->dports, index, dport) >> + if ((dport->rch && dport->rcrb.base != CXL_RESOURCE_NONE) >> + && dport->rcrb.base == rcrb_base) >> + return dport; >> + >> + return NULL; >> +} >> +EXPORT_SYMBOL_NS_GPL(cxl_find_rch_dport_by_rcrb, CXL); >> +#endif >> + >> static int add_ep(struct cxl_ep *new) >> { >> struct cxl_port *port = new->dport->port; >
diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig index 6b18f8bc7be3..eb9cc7157342 100644 --- a/drivers/acpi/apei/Kconfig +++ b/drivers/acpi/apei/Kconfig @@ -55,6 +55,8 @@ config ACPI_APEI_MEMORY_FAILURE config ACPI_APEI_EINJ tristate "APEI Error INJection (EINJ)" depends on ACPI_APEI && DEBUG_FS + imply CXL_BUS + imply CXL_ACPI help EINJ provides a hardware error injection mechanism, it is mainly used for debugging and testing the other parts of diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c index 013eb621dc92..8000417a5f26 100644 --- a/drivers/acpi/apei/einj.c +++ b/drivers/acpi/apei/einj.c @@ -21,6 +21,7 @@ #include <linux/nmi.h> #include <linux/delay.h> #include <linux/mm.h> +#include <linux/cxl.h> #include <asm/unaligned.h> #include "apei-internal.h" @@ -36,6 +37,7 @@ #define MEM_ERROR_MASK (ACPI_EINJ_MEMORY_CORRECTABLE | \ ACPI_EINJ_MEMORY_UNCORRECTABLE | \ ACPI_EINJ_MEMORY_FATAL) +#define CXL_ERROR_MASK GENMASK(17, 12) /* * ACPI version 5 provides a SET_ERROR_TYPE_WITH_ADDRESS action. @@ -512,6 +514,22 @@ static int __einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2, return rc; } +static int is_valid_cxl_addr(u64 addr) +{ + struct cxl_dport *dport; + + if (IS_REACHABLE(CONFIG_CXL_ACPI)) + dport = cxl_find_rch_dport_by_rcrb((resource_size_t) addr); + else + return 0; + + if (!IS_ERR_OR_NULL(dport)) + return 1; + + pr_info("Could not find dport with rcrb 0x%llx\n", addr); + return 0; +} + /* Inject the specified hardware error */ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2, u64 param3, u64 param4) @@ -537,8 +555,11 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2, if (type & ACPI5_VENDOR_BIT) { if (vendor_flags != SETWA_FLAGS_MEM) goto inject; - } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM)) + } else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM)) { goto inject; + } else if (type & CXL_ERROR_MASK && is_valid_cxl_addr(param1)) { + goto inject; + } /* * Disallow crazy address masks that give BIOS leeway to pick @@ -807,3 +828,4 @@ module_exit(einj_exit); MODULE_AUTHOR("Huang Ying"); MODULE_DESCRIPTION("APEI Error INJection support"); MODULE_LICENSE("GPL"); +MODULE_IMPORT_NS(CXL); diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 001ab8742e21..f8f300496140 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -1122,6 +1122,25 @@ struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port, } EXPORT_SYMBOL_NS_GPL(devm_cxl_add_rch_dport, CXL); +#if IS_ENABLED(CONFIG_CXL_ACPI) +struct cxl_dport *cxl_find_rch_dport_by_rcrb(resource_size_t rcrb_base) +{ + struct cxl_dport *dport; + unsigned long index; + + if (!cxl_root) + return ERR_PTR(-ENODEV); + + xa_for_each(&cxl_root->dports, index, dport) + if ((dport->rch && dport->rcrb.base != CXL_RESOURCE_NONE) + && dport->rcrb.base == rcrb_base) + return dport; + + return NULL; +} +EXPORT_SYMBOL_NS_GPL(cxl_find_rch_dport_by_rcrb, CXL); +#endif + static int add_ep(struct cxl_ep *new) { struct cxl_port *port = new->dport->port; diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 4d5bce4bae7e..3e6779dbcd23 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -8,6 +8,7 @@ #include <linux/bitfield.h> #include <linux/bitops.h> #include <linux/log2.h> +#include <linux/cxl.h> #include <linux/io.h> /** diff --git a/include/linux/cxl.h b/include/linux/cxl.h new file mode 100644 index 000000000000..b57a4cc85005 --- /dev/null +++ b/include/linux/cxl.h @@ -0,0 +1,18 @@ +#ifndef _LINUX_CXL_H +#define _LINUX_CXL_H + +#include <linux/xarray.h> +#include <linux/errno.h> + +struct cxl_dport; + +#if IS_ENABLED(CONFIG_CXL_ACPI) +struct cxl_dport *cxl_find_rch_dport_by_rcrb(resource_size_t rcrb_base); +#else +static inline struct cxl_dport *cxl_find_rch_dport_by_rcrb(resource_size_t rcrb_base) +{ + return NULL; +} +#endif + +#endif
Add support for CXL EINJ error types for CXL 1.1 hosts added in ACPI v6.5. Because these error types target memory-mapped CXL 1.1 compliant downstream ports and not physical (normal/persistent) memory, these error types are not currently allowed through the memory range validation done by the EINJ driver. The MMIO address of a CXL 1.1 downstream port can be found in the cxl_rcrb_addr file in the corresponding dport directory under /sys/bus/cxl/devices/portX. CXL 1.1 error types follow the same procedure as a memory error type, but with param1 set to the downstream port MMIO address. Example usage: $ cd /sys/kernel/debug/apei/einj $ cat available_error_type 0x00000008 Memory Correctable 0x00000010 Memory Uncorrectable non-fatal 0x00000020 Memory Uncorrectable fatal 0x00000040 PCI Express Correctable 0x00000080 PCI Express Uncorrectable non-fatal 0x00000100 PCI Express Uncorrectable fatal 0x00008000 CXL.mem Protocol Correctable 0x00020000 CXL.mem Protocol Uncorrectable fatal $ echo 0x8000 > error_type $ echo 0xfffffffffffff000 > param2 $ echo 0x3 > flags $ cat /sys/bus/cxl/devices/portX/dportY/cxl_rcrb_addr 0xb2f00000 $ echo 0xb2f00000 > param1 $ echo 1 > error_inject Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com> --- drivers/acpi/apei/Kconfig | 2 ++ drivers/acpi/apei/einj.c | 24 +++++++++++++++++++++++- drivers/cxl/core/port.c | 19 +++++++++++++++++++ drivers/cxl/cxl.h | 1 + include/linux/cxl.h | 18 ++++++++++++++++++ 5 files changed, 63 insertions(+), 1 deletion(-) create mode 100644 include/linux/cxl.h