Message ID | 20250108215749.181852-1-Benjamin.Cheatham@amd.com |
---|---|
State | Handled Elsewhere |
Headers | show |
Series | [RFC,ndctl] cxl: Add inject-error command | expand |
On Wed, Jan 08, 2025 at 03:57:49PM -0600, Ben Cheatham wrote: > Add inject-error command for injecting CXL errors into CXL devices. > The command currently only has support for injecting CXL protocol > errors into CXL downstream ports via EINJ. Hi Ben, I went through enough to give you some feedback for your v1. Just ran out of time and didn't want to keep you waiting any longer. wrt 'currently only has' - what is the grander scope of this that we might see in the future. Will there only every be one system wide response to --list-errors or will there be error types per type of port. Spec reference? > > The command takes an error type and injects an error of that type into the > specified downstream port. Downstream ports can be specified using the > port's device name with the -d option. Available error types can be obtained > by running "cxl inject-error --list-errors". > > This command requires the kernel to be built with CONFIG_DEBUGFS and > CONFIG_ACPI_APEI_EINJ_CXL enabled. It also requires root privileges to > run due to reading from <debugfs>/cxl/einj_types and writing to > <debugfs>/cxl/<dport>/einj_inject. > > Example usage: > # cxl inject-error --list-errors > cxl.mem_correctable > cxl.mem_fatal > ... > # cxl inject-error -d 0000:00:01.1 cxl.mem_correctable > injected cxl.mem_correctable protocol error > I'll probably ask this again later on, but how does user see list of downstream ports. Does user really think -d dport, or do they think -d device-name, or ? Man page will help here. We don't have cxl-cli support for poison inject, but to future proof this, let's think about the naming. Please split the patch up at least into 2 - libcxl: Add APIs supporting CXL protocol error injection include updates to Documentation/cxl/lib/libcxl.txt cxl: add {list,inject}-protocol-error' commands include man page updates, additiona The 'list-errors' is the the system level error support, right? Could that fit in the existing 'cxl list' hierachy? Would they be a property of the bus? I don't know that 'protocol' is best name, but seems it needs another descriptor. > Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com> > --- > cxl/builtin.h | 1 + > cxl/cxl.c | 1 + > cxl/inject-error.c | 188 +++++++++++++++++++++++++++++++++++++++++++++ > cxl/lib/libcxl.c | 53 +++++++++++++ > cxl/lib/libcxl.sym | 2 + > cxl/libcxl.h | 13 ++++ > cxl/meson.build | 1 + > 7 files changed, 259 insertions(+) > create mode 100644 cxl/inject-error.c > > diff --git a/cxl/builtin.h b/cxl/builtin.h > index c483f30..e82fcb5 100644 > --- a/cxl/builtin.h > +++ b/cxl/builtin.h > @@ -25,6 +25,7 @@ int cmd_create_region(int argc, const char **argv, struct cxl_ctx *ctx); > int cmd_enable_region(int argc, const char **argv, struct cxl_ctx *ctx); > int cmd_disable_region(int argc, const char **argv, struct cxl_ctx *ctx); > int cmd_destroy_region(int argc, const char **argv, struct cxl_ctx *ctx); > +int cmd_inject_error(int argc, const char **argv, struct cxl_ctx *ctx); > #ifdef ENABLE_LIBTRACEFS > int cmd_monitor(int argc, const char **argv, struct cxl_ctx *ctx); > #else > diff --git a/cxl/cxl.c b/cxl/cxl.c > index 1643667..f808926 100644 > --- a/cxl/cxl.c > +++ b/cxl/cxl.c > @@ -79,6 +79,7 @@ static struct cmd_struct commands[] = { > { "enable-region", .c_fn = cmd_enable_region }, > { "disable-region", .c_fn = cmd_disable_region }, > { "destroy-region", .c_fn = cmd_destroy_region }, > + { "inject-error", .c_fn = cmd_inject_error }, > { "monitor", .c_fn = cmd_monitor }, > }; > > diff --git a/cxl/inject-error.c b/cxl/inject-error.c > new file mode 100644 Can this fit in an existing file? port.c ? I didn't review this file yet, so snipping. > index 0000000..3645934 > --- /dev/null > +++ b/cxl/inject-error.c snip > diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c > index 91eedd1..8174c11 100644 > --- a/cxl/lib/libcxl.c > +++ b/cxl/lib/libcxl.c > @@ -3179,6 +3179,59 @@ CXL_EXPORT int cxl_dport_get_id(struct cxl_dport *dport) > return dport->id; > } > > +CXL_EXPORT int cxl_dport_inject_proto_err(struct cxl_dport *dport, > + enum cxl_proto_error_types perr, > + const char *debugfs) > +{ > + struct cxl_port *port = cxl_dport_get_port(dport); > + size_t path_len = strlen(debugfs) + 24; What's the path_len math, +24 here and +16 in next fcn. I notice other path calloc's in this file padding more, ie +100 or +50. > + struct cxl_ctx *ctx = port->ctx; > + char buf[32]; > + char *path; > + int rc; > + > + if (!dport->dev_path) { > + err(ctx, "no dev_path for dport\n"); > + return -EINVAL; > + } > + > + path_len += strlen(dport->dev_path); > + path = calloc(1, path_len); > + if (!path) > + return -ENOMEM; > + > + snprintf(path, path_len, "%s/cxl/%s/einj_inject", debugfs, > + cxl_dport_get_devname(dport)); > + > + snprintf(buf, sizeof(buf), "0x%lx\n", (u64) perr); Here, and in cxl_get_proto_errors(), can we check for the path and fail with 'unsupported' if it doesn't exist. That'll tell folks if feature is not configured, or not enabled ? Are kernel configured and port enabled 2 different levels? There's an example in this file w "if (access(path, F_OK) != 0)" > + rc = sysfs_write_attr(ctx, path, buf); > + if (rc) > + err(ctx, "could not write to %s: %d\n", path, rc); for 'sameness' with most err's in this library, do: err(ctx, "failed write to %s: %s\n", path, strerr(-rc)); > + > + free(path); > + return rc; > +} > + > +CXL_EXPORT int cxl_get_proto_errors(struct cxl_ctx *ctx, char *buf, > + const char *debugfs) > +{ > + size_t path_len = strlen(debugfs) + 16; > + char *path; > + int rc = 0; > + > + path = calloc(1, path_len); > + if (!path) > + return -ENOMEM; > + > + snprintf(path, path_len, "%s/cxl/einj_types", debugfs); > + rc = sysfs_read_attr(ctx, path, buf); > + if (rc) > + err(ctx, "could not read from %s: %d\n", path, rc); > + same comments as previous, check path and make err msg similar > + free(path); > + return rc; > +} > + > CXL_EXPORT struct cxl_port *cxl_dport_get_port(struct cxl_dport *dport) > { > return dport->port; > diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym > index 304d7fa..d39a12d 100644 > --- a/cxl/lib/libcxl.sym > +++ b/cxl/lib/libcxl.sym > @@ -281,4 +281,6 @@ global: > cxl_memdev_get_ram_qos_class; > cxl_region_qos_class_mismatch; > cxl_port_decoders_committed; > + cxl_dport_inject_proto_err; > + cxl_get_proto_errors; > } LIBCXL_6; Start a new section above for these new symbols. Each ndctl release gets a new section - if symbols added. > diff --git a/cxl/libcxl.h b/cxl/libcxl.h > index fc6dd00..867daa4 100644 > --- a/cxl/libcxl.h > +++ b/cxl/libcxl.h > @@ -160,6 +160,15 @@ struct cxl_port *cxl_port_get_next_all(struct cxl_port *port, > for (port = cxl_port_get_first(top); port != NULL; \ > port = cxl_port_get_next_all(port, top)) > > +enum cxl_proto_error_types { > + CXL_CACHE_CORRECTABLE = 1 << 12, > + CXL_CACHE_UNCORRECTABLE = 1 << 13, > + CXL_CACHE_FATAL = 1 << 14, > + CXL_MEM_CORRECTABLE = 1 << 15, > + CXL_MEM_UNCORRECTABLE = 1 << 16, > + CXL_MEM_FATAL = 1 << 17, > +}; Please align like enum util_json_flags Is there a spec reference to add? That's all. -- Alison > + > struct cxl_dport; > struct cxl_dport *cxl_dport_get_first(struct cxl_port *port); > struct cxl_dport *cxl_dport_get_next(struct cxl_dport *dport); > @@ -168,6 +177,10 @@ const char *cxl_dport_get_physical_node(struct cxl_dport *dport); > const char *cxl_dport_get_firmware_node(struct cxl_dport *dport); > struct cxl_port *cxl_dport_get_port(struct cxl_dport *dport); > int cxl_dport_get_id(struct cxl_dport *dport); > +int cxl_dport_inject_proto_err(struct cxl_dport *dport, > + enum cxl_proto_error_types err, > + const char *debugfs); > +int cxl_get_proto_errors(struct cxl_ctx *ctx, char *buf, const char *debugfs); > bool cxl_dport_maps_memdev(struct cxl_dport *dport, struct cxl_memdev *memdev); > struct cxl_dport *cxl_port_get_dport_by_memdev(struct cxl_port *port, > struct cxl_memdev *memdev); > diff --git a/cxl/meson.build b/cxl/meson.build > index 61b4d87..79da4e6 100644 > --- a/cxl/meson.build > +++ b/cxl/meson.build > @@ -7,6 +7,7 @@ cxl_src = [ > 'memdev.c', > 'json.c', > 'filter.c', > + 'inject-error.c', > '../daxctl/json.c', > '../daxctl/filter.c', > ] > -- > 2.34.1 > >
Hi Alison, sorry it took me so long to respond I've been a bit sick. Responses inline. On 2/13/25 6:10 PM, Alison Schofield wrote: > On Wed, Jan 08, 2025 at 03:57:49PM -0600, Ben Cheatham wrote: >> Add inject-error command for injecting CXL errors into CXL devices. >> The command currently only has support for injecting CXL protocol >> errors into CXL downstream ports via EINJ. > > Hi Ben, > I went through enough to give you some feedback for your v1. > Just ran out of time and didn't want to keep you waiting any longer. > > wrt 'currently only has' - what is the grander scope of this that we > might see in the future. Will there only every be one system wide > response to --list-errors or will there be error types per type of > port. > My thinking was that it would be a command to do any type of CXL-related error injection. At the moment, that would be poison and protocol error injection, but I wanted to leave it open-ended in case something else came along in later spec revisions. As for the --list-errors, it'll probably be per type of port. I think I see where you're going with that, I can look into that option taking an argument for the type of the component. > Spec reference? > Protocol EINJ is defined starting in the ACPI v6.5 spec, section 18.6.4. I'll add references in the commit message(s) and code comments. >> >> The command takes an error type and injects an error of that type into the >> specified downstream port. Downstream ports can be specified using the >> port's device name with the -d option. Available error types can be obtained >> by running "cxl inject-error --list-errors". >> >> This command requires the kernel to be built with CONFIG_DEBUGFS and >> CONFIG_ACPI_APEI_EINJ_CXL enabled. It also requires root privileges to >> run due to reading from <debugfs>/cxl/einj_types and writing to >> <debugfs>/cxl/<dport>/einj_inject. >> >> Example usage: >> # cxl inject-error --list-errors >> cxl.mem_correctable >> cxl.mem_fatal >> ... >> # cxl inject-error -d 0000:00:01.1 cxl.mem_correctable >> injected cxl.mem_correctable protocol error >> > > I'll probably ask this again later on, but how does user see > list of downstream ports. Does user really think -d dport, > or do they think -d device-name, or ? > Man page will help here. > I went back and forth on whether to use '-d' or '-s' (to follow lspci), but ended up on '-d' since the dport device name isn't necessarily a PCIe SBDF. I don't mind changing it if you have a better suggestion. I'll also add a man page next time. > We don't have cxl-cli support for poison inject, but to future > proof this, let's think about the naming. > > Please split the patch up at least into 2 - > libcxl: Add APIs supporting CXL protocol error injection > include updates to Documentation/cxl/lib/libcxl.txt > > cxl: add {list,inject}-protocol-error' commands > include man page updates, additiona > Will do. > The 'list-errors' is the the system level error support, right? > Could that fit in the existing 'cxl list' hierachy? > Would they be a property of the bus? The errors listed by '--list-errors' are the CXL-related EINJ error types (ACPI v6.5, table 18.30). These error types are provided by the ACPI EINJ implementation and are system-wide AFAIK. So that wouldn't be a problem for adding them as a bus property, but the issue with that is these error types are meant to be used on CXL-capable PCIe root ports. I think it may be confusing to list these under a bus, and I wouldn't want to muddy up the appropriate dport entries since there can techincally be up to 6 of these error types (and the names are a bit long). If you think that's preferable, or adding it as a bus property is fine, I'd be happy to do so. > > I don't know that 'protocol' is best name, but seems it needs > another descriptor. > I just took the names from the spec, if you have a suggestion here I'd be happy to hear it! > >> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com> >> --- >> cxl/builtin.h | 1 + >> cxl/cxl.c | 1 + >> cxl/inject-error.c | 188 +++++++++++++++++++++++++++++++++++++++++++++ >> cxl/lib/libcxl.c | 53 +++++++++++++ >> cxl/lib/libcxl.sym | 2 + >> cxl/libcxl.h | 13 ++++ >> cxl/meson.build | 1 + >> 7 files changed, 259 insertions(+) >> create mode 100644 cxl/inject-error.c >> >> diff --git a/cxl/builtin.h b/cxl/builtin.h >> index c483f30..e82fcb5 100644 >> --- a/cxl/builtin.h >> +++ b/cxl/builtin.h >> @@ -25,6 +25,7 @@ int cmd_create_region(int argc, const char **argv, struct cxl_ctx *ctx); >> int cmd_enable_region(int argc, const char **argv, struct cxl_ctx *ctx); >> int cmd_disable_region(int argc, const char **argv, struct cxl_ctx *ctx); >> int cmd_destroy_region(int argc, const char **argv, struct cxl_ctx *ctx); >> +int cmd_inject_error(int argc, const char **argv, struct cxl_ctx *ctx); >> #ifdef ENABLE_LIBTRACEFS >> int cmd_monitor(int argc, const char **argv, struct cxl_ctx *ctx); >> #else >> diff --git a/cxl/cxl.c b/cxl/cxl.c >> index 1643667..f808926 100644 >> --- a/cxl/cxl.c >> +++ b/cxl/cxl.c >> @@ -79,6 +79,7 @@ static struct cmd_struct commands[] = { >> { "enable-region", .c_fn = cmd_enable_region }, >> { "disable-region", .c_fn = cmd_disable_region }, >> { "destroy-region", .c_fn = cmd_destroy_region }, >> + { "inject-error", .c_fn = cmd_inject_error }, >> { "monitor", .c_fn = cmd_monitor }, >> }; >> >> diff --git a/cxl/inject-error.c b/cxl/inject-error.c >> new file mode 100644 > > Can this fit in an existing file? port.c ? I just put it into a new file since it's a new command. The functionality in the file ATM could probably fit, but my idea was this command would eventually not just do dport protocol error injection. > > I didn't review this file yet, so snipping. > > >> index 0000000..3645934 >> --- /dev/null >> +++ b/cxl/inject-error.c > > snip > >> diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c >> index 91eedd1..8174c11 100644 >> --- a/cxl/lib/libcxl.c >> +++ b/cxl/lib/libcxl.c >> @@ -3179,6 +3179,59 @@ CXL_EXPORT int cxl_dport_get_id(struct cxl_dport *dport) >> return dport->id; >> } >> >> +CXL_EXPORT int cxl_dport_inject_proto_err(struct cxl_dport *dport, >> + enum cxl_proto_error_types perr, >> + const char *debugfs) >> +{ >> + struct cxl_port *port = cxl_dport_get_port(dport); >> + size_t path_len = strlen(debugfs) + 24; > > What's the path_len math, +24 here and +16 in next fcn. > I notice other path calloc's in this file padding more, ie +100 or +50. > Taking another look at it, I can't remember why they are different :/. If I had to guess I was adding the length of the sysfs attribute (i.e. "einj_inject") + a bit. I'll change this to just add +50 or so when I get to the calloc call instead of doing it at the top (and also make it uniform between functions). > >> + struct cxl_ctx *ctx = port->ctx; >> + char buf[32]; >> + char *path; >> + int rc; >> + >> + if (!dport->dev_path) { >> + err(ctx, "no dev_path for dport\n"); >> + return -EINVAL; >> + } >> + >> + path_len += strlen(dport->dev_path); >> + path = calloc(1, path_len); >> + if (!path) >> + return -ENOMEM; >> + >> + snprintf(path, path_len, "%s/cxl/%s/einj_inject", debugfs, >> + cxl_dport_get_devname(dport)); >> + >> + snprintf(buf, sizeof(buf), "0x%lx\n", (u64) perr); > > Here, and in cxl_get_proto_errors(), can we check for the path and > fail with 'unsupported' if it doesn't exist. That'll tell folks > if feature is not configured, or not enabled ? > Are kernel configured and port enabled 2 different levels? For sure. If you have the kernel configured to use CXL EINJ these files come up as part of CXL driver init (specifically when cxl_dports are initialized). It's possible the CXL driver will come up before EINJ is initialized, in which case the sysfs files won't be visible. I've only seen that happen when both the EINJ module and CXL module(s) are built-in. In general though, if the kernel is configured correctly and the CXL driver doesn't run into any issues these files should be present. > > There's an example in this file w "if (access(path, F_OK) != 0)" > > >> + rc = sysfs_write_attr(ctx, path, buf); >> + if (rc) >> + err(ctx, "could not write to %s: %d\n", path, rc); > > for 'sameness' with most err's in this library, do: > err(ctx, "failed write to %s: %s\n", path, strerr(-rc)); > Will do! >> + >> + free(path); >> + return rc; >> +} >> + >> +CXL_EXPORT int cxl_get_proto_errors(struct cxl_ctx *ctx, char *buf, >> + const char *debugfs) >> +{ >> + size_t path_len = strlen(debugfs) + 16; >> + char *path; >> + int rc = 0; >> + >> + path = calloc(1, path_len); >> + if (!path) >> + return -ENOMEM; >> + >> + snprintf(path, path_len, "%s/cxl/einj_types", debugfs); >> + rc = sysfs_read_attr(ctx, path, buf); >> + if (rc) >> + err(ctx, "could not read from %s: %d\n", path, rc); >> + > > same comments as previous, check path and make err msg similar > Ok! > >> + free(path); >> + return rc; >> +} >> + >> CXL_EXPORT struct cxl_port *cxl_dport_get_port(struct cxl_dport *dport) >> { >> return dport->port; >> diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym >> index 304d7fa..d39a12d 100644 >> --- a/cxl/lib/libcxl.sym >> +++ b/cxl/lib/libcxl.sym >> @@ -281,4 +281,6 @@ global: >> cxl_memdev_get_ram_qos_class; >> cxl_region_qos_class_mismatch; >> cxl_port_decoders_committed; >> + cxl_dport_inject_proto_err; >> + cxl_get_proto_errors; >> } LIBCXL_6; > > Start a new section above for these new symbols. > Each ndctl release gets a new section - if symbols added. > Gotcha, I'll add it. > > >> diff --git a/cxl/libcxl.h b/cxl/libcxl.h >> index fc6dd00..867daa4 100644 >> --- a/cxl/libcxl.h >> +++ b/cxl/libcxl.h >> @@ -160,6 +160,15 @@ struct cxl_port *cxl_port_get_next_all(struct cxl_port *port, >> for (port = cxl_port_get_first(top); port != NULL; \ >> port = cxl_port_get_next_all(port, top)) >> >> +enum cxl_proto_error_types { >> + CXL_CACHE_CORRECTABLE = 1 << 12, >> + CXL_CACHE_UNCORRECTABLE = 1 << 13, >> + CXL_CACHE_FATAL = 1 << 14, >> + CXL_MEM_CORRECTABLE = 1 << 15, >> + CXL_MEM_UNCORRECTABLE = 1 << 16, >> + CXL_MEM_FATAL = 1 << 17, >> +}; > > Please align like enum util_json_flags > Is there a spec reference to add? Will do, and I'll add the reference as well. Thanks, Ben > > That's all. > -- Alison > > > >> + >> struct cxl_dport; >> struct cxl_dport *cxl_dport_get_first(struct cxl_port *port); >> struct cxl_dport *cxl_dport_get_next(struct cxl_dport *dport); >> @@ -168,6 +177,10 @@ const char *cxl_dport_get_physical_node(struct cxl_dport *dport); >> const char *cxl_dport_get_firmware_node(struct cxl_dport *dport); >> struct cxl_port *cxl_dport_get_port(struct cxl_dport *dport); >> int cxl_dport_get_id(struct cxl_dport *dport); >> +int cxl_dport_inject_proto_err(struct cxl_dport *dport, >> + enum cxl_proto_error_types err, >> + const char *debugfs); >> +int cxl_get_proto_errors(struct cxl_ctx *ctx, char *buf, const char *debugfs); >> bool cxl_dport_maps_memdev(struct cxl_dport *dport, struct cxl_memdev *memdev); >> struct cxl_dport *cxl_port_get_dport_by_memdev(struct cxl_port *port, >> struct cxl_memdev *memdev); >> diff --git a/cxl/meson.build b/cxl/meson.build >> index 61b4d87..79da4e6 100644 >> --- a/cxl/meson.build >> +++ b/cxl/meson.build >> @@ -7,6 +7,7 @@ cxl_src = [ >> 'memdev.c', >> 'json.c', >> 'filter.c', >> + 'inject-error.c', >> '../daxctl/json.c', >> '../daxctl/filter.c', >> ] >> -- >> 2.34.1 >> >>
diff --git a/cxl/builtin.h b/cxl/builtin.h index c483f30..e82fcb5 100644 --- a/cxl/builtin.h +++ b/cxl/builtin.h @@ -25,6 +25,7 @@ int cmd_create_region(int argc, const char **argv, struct cxl_ctx *ctx); int cmd_enable_region(int argc, const char **argv, struct cxl_ctx *ctx); int cmd_disable_region(int argc, const char **argv, struct cxl_ctx *ctx); int cmd_destroy_region(int argc, const char **argv, struct cxl_ctx *ctx); +int cmd_inject_error(int argc, const char **argv, struct cxl_ctx *ctx); #ifdef ENABLE_LIBTRACEFS int cmd_monitor(int argc, const char **argv, struct cxl_ctx *ctx); #else diff --git a/cxl/cxl.c b/cxl/cxl.c index 1643667..f808926 100644 --- a/cxl/cxl.c +++ b/cxl/cxl.c @@ -79,6 +79,7 @@ static struct cmd_struct commands[] = { { "enable-region", .c_fn = cmd_enable_region }, { "disable-region", .c_fn = cmd_disable_region }, { "destroy-region", .c_fn = cmd_destroy_region }, + { "inject-error", .c_fn = cmd_inject_error }, { "monitor", .c_fn = cmd_monitor }, }; diff --git a/cxl/inject-error.c b/cxl/inject-error.c new file mode 100644 index 0000000..3645934 --- /dev/null +++ b/cxl/inject-error.c @@ -0,0 +1,188 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (C) 2025 AMD. All rights reserved. */ +#include <ccan/array_size/array_size.h> +#include <util/parse-options.h> +#include <cxl/libcxl.h> +#include <util/log.h> +#include <stdlib.h> +#include <unistd.h> +#include <stdio.h> +#include <errno.h> + +#define EINJ_TYPES_BUF_SIZE 512 + +static struct inject_params { + const char *type; + const char *devname; + const char *debugfs; + bool debug; + bool list; +} param; + +static struct cxl_proto_error { + enum cxl_proto_error_types err_type; + const char *err_str; +} cxl_proto_errors[] = { + { CXL_CACHE_CORRECTABLE, "cxl_cache_correctable", }, + { CXL_CACHE_UNCORRECTABLE, "cxl_cache_uncorrectable" }, + { CXL_CACHE_FATAL, "cxl_cache_fatal" }, + { CXL_MEM_CORRECTABLE, "cxl_mem_correctable" }, + { CXL_MEM_UNCORRECTABLE, "cxl_mem_uncorrectable" }, + { CXL_MEM_FATAL, "cxl_mem_fatal" } +}; + +#define BASE_OPTIONS() \ +OPT_BOOLEAN(0, "debug", ¶m.debug, "turn on debug output"), \ +OPT_BOOLEAN(0, "list-errors", ¶m.list, "list possible error types"), \ +OPT_STRING('m', "mount", ¶m.debugfs, "debugfs mount point", \ + "Mount point for debug file system, defaults to /sys/kernel/debug") + +#define INJECT_OPTIONS() \ +OPT_STRING('d', "device", ¶m.devname, "CXL device name", \ + "Device name of CXL device to inject error into. Protocol errors may only target downstream ports") \ + +static const struct option inject_options[] = { + BASE_OPTIONS(), + INJECT_OPTIONS(), + OPT_END(), +}; + +static struct log_ctx iel; + +static struct cxl_proto_error *find_cxl_proto_err(const char *type) +{ + unsigned long i; + + for (i = 0; i < ARRAY_SIZE(cxl_proto_errors); i++) { + if (!strcmp(type, cxl_proto_errors[i].err_str)) { + return &cxl_proto_errors[i]; + } + } + + log_err(&iel, "Invalid CXL protocol error type: %s\n", type); + return NULL; +} + +static int list_cxl_proto_errors(struct cxl_ctx *ctx, const char *debugfs) +{ + unsigned long i, err_num; + char buf[EINJ_TYPES_BUF_SIZE]; + char *line; + int rc; + + rc = cxl_get_proto_errors(ctx, buf, debugfs); + if (rc) { + log_err(&iel, "Failed to get CXL protocol errors: %d\n", rc); + return rc; + } + + line = strtok(buf, "\n"); + while (line) { + err_num = strtoul(line, NULL, 16); + if (err_num < CXL_CACHE_CORRECTABLE || err_num > CXL_MEM_FATAL) + continue; + + for (i = 0; i < ARRAY_SIZE(cxl_proto_errors); i++) + if (err_num == cxl_proto_errors[i].err_type) + printf("%s\n", cxl_proto_errors[i].err_str); + + line = strtok(NULL, "\n"); + } + + return 0; +} + +static struct cxl_dport *find_cxl_dport(struct cxl_ctx *ctx, const char *devname) +{ + struct cxl_port *port, *top; + struct cxl_dport *dport; + struct cxl_bus *bus; + + cxl_bus_foreach(ctx, bus) { + top = cxl_bus_get_port(bus); + + cxl_port_foreach_all(top, port) + cxl_dport_foreach(port, dport) + if (!strcmp(devname, + cxl_dport_get_devname(dport))) + return dport; + } + + log_err(&iel, "Downstream port \"%s\" not found\n", devname); + return NULL; +} + +static int inject_proto_err(struct cxl_ctx *ctx, const char *devname, + struct cxl_proto_error *perr, const char *debugfs) +{ + struct cxl_dport *dport; + int rc; + + if (!devname) { + log_err(&iel, "No downstream port specified for injection\n"); + return -EINVAL; + } + + dport = find_cxl_dport(ctx, devname); + if (!dport) + return -ENODEV; + + rc = cxl_dport_inject_proto_err(dport, perr->err_type, debugfs); + if (rc) + return rc; + + log_info(&iel, "injected %s protocol error.\n", perr->err_str); + return 0; +} + +static int inject_action(int argc, const char **argv, struct cxl_ctx *ctx, + const struct option *options, const char *usage) +{ + struct cxl_proto_error *perr; + const char * const u[] = { + usage, + NULL + }; + const char *debugfs; + int rc = -EINVAL; + + log_init(&iel, "cxl inject-error", "CXL_INJECT_LOG"); + argc = parse_options(argc, argv, options, u, 0); + + if (param.debug) { + cxl_set_log_priority(ctx, LOG_DEBUG); + iel.log_priority = LOG_DEBUG; + } else { + iel.log_priority = LOG_INFO; + } + + if (param.debugfs) + debugfs = param.debugfs; + else + debugfs = "/sys/kernel/debug"; + + if (param.list) + return list_cxl_proto_errors(ctx, debugfs); + + if (argc != 1) { + usage_with_options(u, options); + return rc; + } + + perr = find_cxl_proto_err(argv[0]); + if (perr) { + rc = inject_proto_err(ctx, param.devname, perr, debugfs); + if (rc) + log_err(&iel, "Failed to inject error: %d\n", rc); + } + + return rc; +} + +int cmd_inject_error(int argc, const char **argv, struct cxl_ctx *ctx) +{ + int rc = inject_action(argc, argv, ctx, inject_options, + "inject-error [<options>] <error-type>"); + + return rc ? EXIT_FAILURE : EXIT_SUCCESS; +} diff --git a/cxl/lib/libcxl.c b/cxl/lib/libcxl.c index 91eedd1..8174c11 100644 --- a/cxl/lib/libcxl.c +++ b/cxl/lib/libcxl.c @@ -3179,6 +3179,59 @@ CXL_EXPORT int cxl_dport_get_id(struct cxl_dport *dport) return dport->id; } +CXL_EXPORT int cxl_dport_inject_proto_err(struct cxl_dport *dport, + enum cxl_proto_error_types perr, + const char *debugfs) +{ + struct cxl_port *port = cxl_dport_get_port(dport); + size_t path_len = strlen(debugfs) + 24; + struct cxl_ctx *ctx = port->ctx; + char buf[32]; + char *path; + int rc; + + if (!dport->dev_path) { + err(ctx, "no dev_path for dport\n"); + return -EINVAL; + } + + path_len += strlen(dport->dev_path); + path = calloc(1, path_len); + if (!path) + return -ENOMEM; + + snprintf(path, path_len, "%s/cxl/%s/einj_inject", debugfs, + cxl_dport_get_devname(dport)); + + snprintf(buf, sizeof(buf), "0x%lx\n", (u64) perr); + rc = sysfs_write_attr(ctx, path, buf); + if (rc) + err(ctx, "could not write to %s: %d\n", path, rc); + + free(path); + return rc; +} + +CXL_EXPORT int cxl_get_proto_errors(struct cxl_ctx *ctx, char *buf, + const char *debugfs) +{ + size_t path_len = strlen(debugfs) + 16; + char *path; + int rc = 0; + + path = calloc(1, path_len); + if (!path) + return -ENOMEM; + + snprintf(path, path_len, "%s/cxl/einj_types", debugfs); + rc = sysfs_read_attr(ctx, path, buf); + if (rc) + err(ctx, "could not read from %s: %d\n", path, rc); + + free(path); + return rc; +} + CXL_EXPORT struct cxl_port *cxl_dport_get_port(struct cxl_dport *dport) { return dport->port; diff --git a/cxl/lib/libcxl.sym b/cxl/lib/libcxl.sym index 304d7fa..d39a12d 100644 --- a/cxl/lib/libcxl.sym +++ b/cxl/lib/libcxl.sym @@ -281,4 +281,6 @@ global: cxl_memdev_get_ram_qos_class; cxl_region_qos_class_mismatch; cxl_port_decoders_committed; + cxl_dport_inject_proto_err; + cxl_get_proto_errors; } LIBCXL_6; diff --git a/cxl/libcxl.h b/cxl/libcxl.h index fc6dd00..867daa4 100644 --- a/cxl/libcxl.h +++ b/cxl/libcxl.h @@ -160,6 +160,15 @@ struct cxl_port *cxl_port_get_next_all(struct cxl_port *port, for (port = cxl_port_get_first(top); port != NULL; \ port = cxl_port_get_next_all(port, top)) +enum cxl_proto_error_types { + CXL_CACHE_CORRECTABLE = 1 << 12, + CXL_CACHE_UNCORRECTABLE = 1 << 13, + CXL_CACHE_FATAL = 1 << 14, + CXL_MEM_CORRECTABLE = 1 << 15, + CXL_MEM_UNCORRECTABLE = 1 << 16, + CXL_MEM_FATAL = 1 << 17, +}; + struct cxl_dport; struct cxl_dport *cxl_dport_get_first(struct cxl_port *port); struct cxl_dport *cxl_dport_get_next(struct cxl_dport *dport); @@ -168,6 +177,10 @@ const char *cxl_dport_get_physical_node(struct cxl_dport *dport); const char *cxl_dport_get_firmware_node(struct cxl_dport *dport); struct cxl_port *cxl_dport_get_port(struct cxl_dport *dport); int cxl_dport_get_id(struct cxl_dport *dport); +int cxl_dport_inject_proto_err(struct cxl_dport *dport, + enum cxl_proto_error_types err, + const char *debugfs); +int cxl_get_proto_errors(struct cxl_ctx *ctx, char *buf, const char *debugfs); bool cxl_dport_maps_memdev(struct cxl_dport *dport, struct cxl_memdev *memdev); struct cxl_dport *cxl_port_get_dport_by_memdev(struct cxl_port *port, struct cxl_memdev *memdev); diff --git a/cxl/meson.build b/cxl/meson.build index 61b4d87..79da4e6 100644 --- a/cxl/meson.build +++ b/cxl/meson.build @@ -7,6 +7,7 @@ cxl_src = [ 'memdev.c', 'json.c', 'filter.c', + 'inject-error.c', '../daxctl/json.c', '../daxctl/filter.c', ]
Add inject-error command for injecting CXL errors into CXL devices. The command currently only has support for injecting CXL protocol errors into CXL downstream ports via EINJ. The command takes an error type and injects an error of that type into the specified downstream port. Downstream ports can be specified using the port's device name with the -d option. Available error types can be obtained by running "cxl inject-error --list-errors". This command requires the kernel to be built with CONFIG_DEBUGFS and CONFIG_ACPI_APEI_EINJ_CXL enabled. It also requires root privileges to run due to reading from <debugfs>/cxl/einj_types and writing to <debugfs>/cxl/<dport>/einj_inject. Example usage: # cxl inject-error --list-errors cxl.mem_correctable cxl.mem_fatal ... # cxl inject-error -d 0000:00:01.1 cxl.mem_correctable injected cxl.mem_correctable protocol error Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com> --- cxl/builtin.h | 1 + cxl/cxl.c | 1 + cxl/inject-error.c | 188 +++++++++++++++++++++++++++++++++++++++++++++ cxl/lib/libcxl.c | 53 +++++++++++++ cxl/lib/libcxl.sym | 2 + cxl/libcxl.h | 13 ++++ cxl/meson.build | 1 + 7 files changed, 259 insertions(+) create mode 100644 cxl/inject-error.c