diff mbox series

[v2,1/6] cxl/memdev: Add support for the Inject Poison mailbox command

Message ID 97a0b128d0d0df56cea1a1a4ead65a40b9cf008e.1674101475.git.alison.schofield@intel.com
State Superseded
Headers show
Series cxl: CXL Inject & Clear Poison | expand

Commit Message

Alison Schofield Jan. 19, 2023, 5 a.m. UTC
From: Alison Schofield <alison.schofield@intel.com>

CXL devices optionally support the INJECT POISON mailbox command. Add
a sysfs attribute and memdev driver support for injecting poison.

When a Device Physical Address (DPA) is written to the inject_poison
sysfs attribute, send an inject poison command to the device for the
specified address.

Per the CXL Specification (3.0 8.2.9.8.4.2), after receiving a valid
inject poison request, the device will return poison when the address
is accessed through the CXL.mem bus. Injecting poison adds the address
to the device's Poison List and the error source is set to Injected.
In addition, the device adds a poison creation event to its internal
Informational Event log, updates the Event Status register, and if
configured, interrupts the host.

Also, per the CXL Specification, it is not an error to inject poison
into an address that already has poison present and no error is
returned from the device.

The inject_poison attribute is only visible for devices supporting
the capability when the kernel is built with CONFIG_CXL_POISON_INJECT.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 Documentation/ABI/testing/sysfs-bus-cxl | 22 ++++++++
 drivers/cxl/Kconfig                     | 10 ++++
 drivers/cxl/core/memdev.c               | 67 +++++++++++++++++++++++++
 drivers/cxl/cxlmem.h                    |  5 ++
 4 files changed, 104 insertions(+)

Comments

Dan Williams Jan. 27, 2023, 11:06 p.m. UTC | #1
alison.schofield@ wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> CXL devices optionally support the INJECT POISON mailbox command. Add
> a sysfs attribute and memdev driver support for injecting poison.
> 
> When a Device Physical Address (DPA) is written to the inject_poison
> sysfs attribute, send an inject poison command to the device for the
> specified address.
> 
> Per the CXL Specification (3.0 8.2.9.8.4.2), after receiving a valid
> inject poison request, the device will return poison when the address
> is accessed through the CXL.mem bus. Injecting poison adds the address
> to the device's Poison List and the error source is set to Injected.
> In addition, the device adds a poison creation event to its internal
> Informational Event log, updates the Event Status register, and if
> configured, interrupts the host.
> 
> Also, per the CXL Specification, it is not an error to inject poison
> into an address that already has poison present and no error is
> returned from the device.
> 
> The inject_poison attribute is only visible for devices supporting
> the capability when the kernel is built with CONFIG_CXL_POISON_INJECT.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
>  Documentation/ABI/testing/sysfs-bus-cxl | 22 ++++++++
>  drivers/cxl/Kconfig                     | 10 ++++
>  drivers/cxl/core/memdev.c               | 67 +++++++++++++++++++++++++
>  drivers/cxl/cxlmem.h                    |  5 ++
>  4 files changed, 104 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index b715a4609718..e9c6dd02bd09 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -416,3 +416,25 @@ Description:
>  		if accessed, and the source of the poison. The retrieved
>  		errors are logged as kernel trace events with the label
>  		'cxl_poison'.
> +
> +
> +What:		/sys/bus/cxl/devices/memX/inject_poison
> +Date:		January, 2023
> +KernelVersion:	v6.3
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(WO) When a Device Physical Address (DPA) is written to this
> +		attribute, the memdev driver sends an inject poison command to
> +		the device for the specified address. The DPA must be 64-byte
> +		aligned and the length of the injected poison is 64-bytes. If
> +		successful, the device returns poison when the address is
> +		accessed through the CXL.mem bus. Injecting poison adds the
> +		address to the device's Poison List and the error source is set
> +		to Injected. In addition, the device adds a poison creation
> +		event to its internal Informational Event log, updates the
> +		Event Status register, and if configured, interrupts the host.
> +		It is not an error to inject poison into an address that
> +		already has poison present and no error is returned. The
> +		inject_poison attribute is only visible for devices supporting
> +		the capability. Kconfig option CXL_POISON_INJECT must be on
> +		to enable this option. The default is off.
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 0ac53c422c31..6541f54725cd 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -129,4 +129,14 @@ config CXL_REGION_INVALIDATION_TEST
>  	  If unsure, or if this kernel is meant for production environments,
>  	  say N.
>  
> +config CXL_POISON_INJECT
> +	bool "CXL: Support CXL Memory Device Poison Inject"
> +	depends on CXL_MEM
> +	help
> +	  Selecting this option creates the sysfs attributes inject_poison
> +	  and clear_poison for CXL memory devices supporting the capability.
> +	  See Documentation/ABI/testing/sysfs-bus-cxl.

Could maybe clarify that this is meant for hardware debug scenarios so
that is the reason it is disabled by default.

> +
> +	  If unsure, say N.
> +
>  endif
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index e0af7e9c9989..226662cf3331 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -142,6 +142,61 @@ static ssize_t trigger_poison_list_store(struct device *dev,
>  }
>  static DEVICE_ATTR_WO(trigger_poison_list);
>  
> +static int cxl_validate_poison_dpa(struct cxl_dev_state *cxlds, u64 dpa)
> +{
> +	if (!resource_size(&cxlds->dpa_res)) {
> +		dev_dbg(cxlds->dev, "device has no dpa resource\n");
> +		return -EINVAL;
> +	}
> +	if (dpa < cxlds->dpa_res.start || dpa > cxlds->dpa_res.end) {
> +		dev_dbg(cxlds->dev, "dpa:0x%llx not in resource:%pR\n",
> +			dpa, &cxlds->dpa_res);
> +		return -EINVAL;
> +	}
> +	if (!IS_ALIGNED(dpa, 64)) {
> +		dev_dbg(cxlds->dev, "dpa:0x%llx is not 64-byte aligned\n",
> +			dpa);
> +		return -EINVAL;
> +	}
> +	return 0;
> +}
> +
> +static ssize_t inject_poison_store(struct device *dev,
> +				   struct device_attribute *attr,
> +				   const char *buf, size_t len)
> +{
> +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +	struct cxl_mbox_inject_poison inject;
> +	struct cxl_mbox_cmd mbox_cmd;
> +	u64 dpa;
> +	int rc;
> +
> +	rc = kstrtou64(buf, 0, &dpa);
> +	if (rc)
> +		return rc;
> +
> +	rc = cxl_validate_poison_dpa(cxlds, dpa);
> +	if (rc)
> +		return rc;
> +
> +	inject = (struct cxl_mbox_inject_poison) {
> +		.address = cpu_to_le64(dpa)
> +	};
> +	mbox_cmd = (struct cxl_mbox_cmd) {
> +		.opcode = CXL_MBOX_OP_INJECT_POISON,
> +		.size_in = sizeof(inject),
> +		.payload_in = &inject,
> +	};
> +
> +	rc = cxl_internal_send_cmd(cxlds, &mbox_cmd);
> +	if (rc)
> +		return rc;
> +
> +	return len;
> +}
> +static DEVICE_ATTR_WO(inject_poison);
> +
>  static struct attribute *cxl_memdev_attributes[] = {
>  	&dev_attr_serial.attr,
>  	&dev_attr_firmware_version.attr,
> @@ -149,6 +204,7 @@ static struct attribute *cxl_memdev_attributes[] = {
>  	&dev_attr_label_storage_size.attr,
>  	&dev_attr_numa_node.attr,
>  	&dev_attr_trigger_poison_list.attr,
> +	&dev_attr_inject_poison.attr,
>  	NULL,
>  };
>  
> @@ -168,6 +224,10 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
>  	if (!IS_ENABLED(CONFIG_NUMA) && a == &dev_attr_numa_node.attr)
>  		return 0;
>  
> +	if (!IS_ENABLED(CONFIG_CXL_POISON_INJECT) &&
> +	    a == &dev_attr_inject_poison.attr)
> +		return 0;
> +
>  	if (a == &dev_attr_trigger_poison_list.attr) {
>  		struct device *dev = kobj_to_dev(kobj);
>  
> @@ -175,6 +235,13 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
>  			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
>  			return 0;
>  	}
> +	if (a == &dev_attr_inject_poison.attr) {
> +		struct device *dev = kobj_to_dev(kobj);

I'd move the IS_ENABLED(CONFIG_CXL_POISON_INJECT) inside here so just
one spot in this function handles the poison attribute.

> +
> +		if (!test_bit(CXL_MEM_COMMAND_ID_INJECT_POISON,
> +			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
> +			return 0;

Ugh, this is a problem. So "inject poison" never should have been
enabled for the ioctl path way back in:

87815ee9d006 cxl/pci: Add media provisioning required commands

All the nice sysfs interface and compile option to turn it off in this
patch is moot since userspace can just send the ioctl if the sysfs
attribute is missing.

On the one hand this is already shipping ABI, but given cxl-cli has not
been enabled it chances are high that it can be deleted without anyone
caring (i.e. breaking deployed configurations). That would need to be a
lead in patch. As for how to detect that the inject poison opcode is
supported, that needs something like a custom "cxlds->has_poison_inject"
flag into cxl_walk_cel(). I.e.  cxlds->enabled_cmds is only about the
enabled ioctl wrapper commands for CXL opcodes not the availability of
opcodes for cxl_internal_send_cmd().


> +	}
>  	return a->mode;
>  }
>  
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 28ba0cd8f2d3..862ca4f4cc06 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -436,6 +436,11 @@ struct cxl_mbox_poison_payload_out {
>  #define CXL_POISON_SOURCE_INJECTED	3
>  #define CXL_POISON_SOURCE_VENDOR	7
>  
> +/* Inject & Clear Poison  CXL 3.0 Spec 8.2.9.8.4.2/3 */
> +struct cxl_mbox_inject_poison {
> +	__le64 address;
> +};
> +
>  /**
>   * struct cxl_mem_command - Driver representation of a memory device command
>   * @info: Command information as it exists for the UAPI
> -- 
> 2.37.3
>
Alison Schofield Jan. 28, 2023, 2:47 a.m. UTC | #2
On Fri, Jan 27, 2023 at 03:06:16PM -0800, Dan Williams wrote:
> alison.schofield@ wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > CXL devices optionally support the INJECT POISON mailbox command. Add
> > a sysfs attribute and memdev driver support for injecting poison.
> > 
> > When a Device Physical Address (DPA) is written to the inject_poison
> > sysfs attribute, send an inject poison command to the device for the
> > specified address.
> > 
> > Per the CXL Specification (3.0 8.2.9.8.4.2), after receiving a valid
> > inject poison request, the device will return poison when the address
> > is accessed through the CXL.mem bus. Injecting poison adds the address
> > to the device's Poison List and the error source is set to Injected.
> > In addition, the device adds a poison creation event to its internal
> > Informational Event log, updates the Event Status register, and if
> > configured, interrupts the host.
> > 
> > Also, per the CXL Specification, it is not an error to inject poison
> > into an address that already has poison present and no error is
> > returned from the device.
> > 
> > The inject_poison attribute is only visible for devices supporting
> > the capability when the kernel is built with CONFIG_CXL_POISON_INJECT.
> > 
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > ---
> >  Documentation/ABI/testing/sysfs-bus-cxl | 22 ++++++++
> >  drivers/cxl/Kconfig                     | 10 ++++
> >  drivers/cxl/core/memdev.c               | 67 +++++++++++++++++++++++++
> >  drivers/cxl/cxlmem.h                    |  5 ++
> >  4 files changed, 104 insertions(+)
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> > index b715a4609718..e9c6dd02bd09 100644
> > --- a/Documentation/ABI/testing/sysfs-bus-cxl
> > +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> > @@ -416,3 +416,25 @@ Description:
> >  		if accessed, and the source of the poison. The retrieved
> >  		errors are logged as kernel trace events with the label
> >  		'cxl_poison'.
> > +
> > +
> > +What:		/sys/bus/cxl/devices/memX/inject_poison
> > +Date:		January, 2023
> > +KernelVersion:	v6.3
> > +Contact:	linux-cxl@vger.kernel.org
> > +Description:
> > +		(WO) When a Device Physical Address (DPA) is written to this
> > +		attribute, the memdev driver sends an inject poison command to
> > +		the device for the specified address. The DPA must be 64-byte
> > +		aligned and the length of the injected poison is 64-bytes. If
> > +		successful, the device returns poison when the address is
> > +		accessed through the CXL.mem bus. Injecting poison adds the
> > +		address to the device's Poison List and the error source is set
> > +		to Injected. In addition, the device adds a poison creation
> > +		event to its internal Informational Event log, updates the
> > +		Event Status register, and if configured, interrupts the host.
> > +		It is not an error to inject poison into an address that
> > +		already has poison present and no error is returned. The
> > +		inject_poison attribute is only visible for devices supporting
> > +		the capability. Kconfig option CXL_POISON_INJECT must be on
> > +		to enable this option. The default is off.
> > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> > index 0ac53c422c31..6541f54725cd 100644
> > --- a/drivers/cxl/Kconfig
> > +++ b/drivers/cxl/Kconfig
> > @@ -129,4 +129,14 @@ config CXL_REGION_INVALIDATION_TEST
> >  	  If unsure, or if this kernel is meant for production environments,
> >  	  say N.
> >  
> > +config CXL_POISON_INJECT
> > +	bool "CXL: Support CXL Memory Device Poison Inject"
> > +	depends on CXL_MEM
> > +	help
> > +	  Selecting this option creates the sysfs attributes inject_poison
> > +	  and clear_poison for CXL memory devices supporting the capability.
> > +	  See Documentation/ABI/testing/sysfs-bus-cxl.
> 
> Could maybe clarify that this is meant for hardware debug scenarios so
> that is the reason it is disabled by default.
>
Got it. Thanks!

> > +
> > +	  If unsure, say N.
> > +
> >  endif
> > diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> > index e0af7e9c9989..226662cf3331 100644
> > --- a/drivers/cxl/core/memdev.c
> > +++ b/drivers/cxl/core/memdev.c
> > @@ -142,6 +142,61 @@ static ssize_t trigger_poison_list_store(struct device *dev,
> >  }
> >  static DEVICE_ATTR_WO(trigger_poison_list);
> >  
> > +static int cxl_validate_poison_dpa(struct cxl_dev_state *cxlds, u64 dpa)
> > +{
> > +	if (!resource_size(&cxlds->dpa_res)) {
> > +		dev_dbg(cxlds->dev, "device has no dpa resource\n");
> > +		return -EINVAL;
> > +	}
> > +	if (dpa < cxlds->dpa_res.start || dpa > cxlds->dpa_res.end) {
> > +		dev_dbg(cxlds->dev, "dpa:0x%llx not in resource:%pR\n",
> > +			dpa, &cxlds->dpa_res);
> > +		return -EINVAL;
> > +	}
> > +	if (!IS_ALIGNED(dpa, 64)) {
> > +		dev_dbg(cxlds->dev, "dpa:0x%llx is not 64-byte aligned\n",
> > +			dpa);
> > +		return -EINVAL;
> > +	}
> > +	return 0;
> > +}
> > +
> > +static ssize_t inject_poison_store(struct device *dev,
> > +				   struct device_attribute *attr,
> > +				   const char *buf, size_t len)
> > +{
> > +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> > +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > +	struct cxl_mbox_inject_poison inject;
> > +	struct cxl_mbox_cmd mbox_cmd;
> > +	u64 dpa;
> > +	int rc;
> > +
> > +	rc = kstrtou64(buf, 0, &dpa);
> > +	if (rc)
> > +		return rc;
> > +
> > +	rc = cxl_validate_poison_dpa(cxlds, dpa);
> > +	if (rc)
> > +		return rc;
> > +
> > +	inject = (struct cxl_mbox_inject_poison) {
> > +		.address = cpu_to_le64(dpa)
> > +	};
> > +	mbox_cmd = (struct cxl_mbox_cmd) {
> > +		.opcode = CXL_MBOX_OP_INJECT_POISON,
> > +		.size_in = sizeof(inject),
> > +		.payload_in = &inject,
> > +	};
> > +
> > +	rc = cxl_internal_send_cmd(cxlds, &mbox_cmd);
> > +	if (rc)
> > +		return rc;
> > +
> > +	return len;
> > +}
> > +static DEVICE_ATTR_WO(inject_poison);
> > +
> >  static struct attribute *cxl_memdev_attributes[] = {
> >  	&dev_attr_serial.attr,
> >  	&dev_attr_firmware_version.attr,
> > @@ -149,6 +204,7 @@ static struct attribute *cxl_memdev_attributes[] = {
> >  	&dev_attr_label_storage_size.attr,
> >  	&dev_attr_numa_node.attr,
> >  	&dev_attr_trigger_poison_list.attr,
> > +	&dev_attr_inject_poison.attr,
> >  	NULL,
> >  };
> >  
> > @@ -168,6 +224,10 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
> >  	if (!IS_ENABLED(CONFIG_NUMA) && a == &dev_attr_numa_node.attr)
> >  		return 0;
> >  
> > +	if (!IS_ENABLED(CONFIG_CXL_POISON_INJECT) &&
> > +	    a == &dev_attr_inject_poison.attr)
> > +		return 0;
> > +
> >  	if (a == &dev_attr_trigger_poison_list.attr) {
> >  		struct device *dev = kobj_to_dev(kobj);
> >  
> > @@ -175,6 +235,13 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
> >  			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
> >  			return 0;
> >  	}
> > +	if (a == &dev_attr_inject_poison.attr) {
> > +		struct device *dev = kobj_to_dev(kobj);
> 
> I'd move the IS_ENABLED(CONFIG_CXL_POISON_INJECT) inside here so just
> one spot in this function handles the poison attribute.

OK

> 
> > +
> > +		if (!test_bit(CXL_MEM_COMMAND_ID_INJECT_POISON,
> > +			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
> > +			return 0;
> 
> Ugh, this is a problem. So "inject poison" never should have been
> enabled for the ioctl path way back in:
> 
> 87815ee9d006 cxl/pci: Add media provisioning required commands
> 
> All the nice sysfs interface and compile option to turn it off in this
> patch is moot since userspace can just send the ioctl if the sysfs
> attribute is missing.
> 
> On the one hand this is already shipping ABI, but given cxl-cli has not
> been enabled it chances are high that it can be deleted without anyone
> caring (i.e. breaking deployed configurations). That would need to be a
> lead in patch.

I'm confused on how to kill it.

I see it in the enum here: include/uapi/linux/cxl_mem.h, and I also
see Ira's patch reminding us not to break backward compatibility on
that enum. Do I replace with dummy entries?

I'll block in raw mode too.

> As for how to detect that the inject poison opcode is
> supported, that needs something like a custom "cxlds->has_poison_inject"
> flag into cxl_walk_cel(). I.e.  cxlds->enabled_cmds is only about the
> enabled ioctl wrapper commands for CXL opcodes not the availability of
> opcodes for cxl_internal_send_cmd().
Sounds like fun ;)

Thanks Dan,
Alison

> 
> 
> > +	}
> >  	return a->mode;
> >  }
> >  
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index 28ba0cd8f2d3..862ca4f4cc06 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -436,6 +436,11 @@ struct cxl_mbox_poison_payload_out {
> >  #define CXL_POISON_SOURCE_INJECTED	3
> >  #define CXL_POISON_SOURCE_VENDOR	7
> >  
> > +/* Inject & Clear Poison  CXL 3.0 Spec 8.2.9.8.4.2/3 */
> > +struct cxl_mbox_inject_poison {
> > +	__le64 address;
> > +};
> > +
> >  /**
> >   * struct cxl_mem_command - Driver representation of a memory device command
> >   * @info: Command information as it exists for the UAPI
> > -- 
> > 2.37.3
> > 
> 
>
Dan Williams Jan. 29, 2023, 3:49 a.m. UTC | #3
Alison Schofield wrote:
> On Fri, Jan 27, 2023 at 03:06:16PM -0800, Dan Williams wrote:
> > alison.schofield@ wrote:
> > > From: Alison Schofield <alison.schofield@intel.com>
> > > 
> > > CXL devices optionally support the INJECT POISON mailbox command. Add
> > > a sysfs attribute and memdev driver support for injecting poison.
> > > 
> > > When a Device Physical Address (DPA) is written to the inject_poison
> > > sysfs attribute, send an inject poison command to the device for the
> > > specified address.
> > > 
> > > Per the CXL Specification (3.0 8.2.9.8.4.2), after receiving a valid
> > > inject poison request, the device will return poison when the address
> > > is accessed through the CXL.mem bus. Injecting poison adds the address
> > > to the device's Poison List and the error source is set to Injected.
> > > In addition, the device adds a poison creation event to its internal
> > > Informational Event log, updates the Event Status register, and if
> > > configured, interrupts the host.
> > > 
> > > Also, per the CXL Specification, it is not an error to inject poison
> > > into an address that already has poison present and no error is
> > > returned from the device.
> > > 
> > > The inject_poison attribute is only visible for devices supporting
> > > the capability when the kernel is built with CONFIG_CXL_POISON_INJECT.
> > > 
> > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > > ---
> > >  Documentation/ABI/testing/sysfs-bus-cxl | 22 ++++++++
> > >  drivers/cxl/Kconfig                     | 10 ++++
> > >  drivers/cxl/core/memdev.c               | 67 +++++++++++++++++++++++++
> > >  drivers/cxl/cxlmem.h                    |  5 ++
> > >  4 files changed, 104 insertions(+)
> > > 
> > > diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> > > index b715a4609718..e9c6dd02bd09 100644
> > > --- a/Documentation/ABI/testing/sysfs-bus-cxl
> > > +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> > > @@ -416,3 +416,25 @@ Description:
> > >  		if accessed, and the source of the poison. The retrieved
> > >  		errors are logged as kernel trace events with the label
> > >  		'cxl_poison'.
> > > +
> > > +
> > > +What:		/sys/bus/cxl/devices/memX/inject_poison
> > > +Date:		January, 2023
> > > +KernelVersion:	v6.3
> > > +Contact:	linux-cxl@vger.kernel.org
> > > +Description:
> > > +		(WO) When a Device Physical Address (DPA) is written to this
> > > +		attribute, the memdev driver sends an inject poison command to
> > > +		the device for the specified address. The DPA must be 64-byte
> > > +		aligned and the length of the injected poison is 64-bytes. If
> > > +		successful, the device returns poison when the address is
> > > +		accessed through the CXL.mem bus. Injecting poison adds the
> > > +		address to the device's Poison List and the error source is set
> > > +		to Injected. In addition, the device adds a poison creation
> > > +		event to its internal Informational Event log, updates the
> > > +		Event Status register, and if configured, interrupts the host.
> > > +		It is not an error to inject poison into an address that
> > > +		already has poison present and no error is returned. The
> > > +		inject_poison attribute is only visible for devices supporting
> > > +		the capability. Kconfig option CXL_POISON_INJECT must be on
> > > +		to enable this option. The default is off.
> > > diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> > > index 0ac53c422c31..6541f54725cd 100644
> > > --- a/drivers/cxl/Kconfig
> > > +++ b/drivers/cxl/Kconfig
> > > @@ -129,4 +129,14 @@ config CXL_REGION_INVALIDATION_TEST
> > >  	  If unsure, or if this kernel is meant for production environments,
> > >  	  say N.
> > >  
> > > +config CXL_POISON_INJECT
> > > +	bool "CXL: Support CXL Memory Device Poison Inject"
> > > +	depends on CXL_MEM
> > > +	help
> > > +	  Selecting this option creates the sysfs attributes inject_poison
> > > +	  and clear_poison for CXL memory devices supporting the capability.
> > > +	  See Documentation/ABI/testing/sysfs-bus-cxl.
> > 
> > Could maybe clarify that this is meant for hardware debug scenarios so
> > that is the reason it is disabled by default.
> >
> Got it. Thanks!
> 
> > > +
> > > +	  If unsure, say N.
> > > +
> > >  endif
> > > diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> > > index e0af7e9c9989..226662cf3331 100644
> > > --- a/drivers/cxl/core/memdev.c
> > > +++ b/drivers/cxl/core/memdev.c
> > > @@ -142,6 +142,61 @@ static ssize_t trigger_poison_list_store(struct device *dev,
> > >  }
> > >  static DEVICE_ATTR_WO(trigger_poison_list);
> > >  
> > > +static int cxl_validate_poison_dpa(struct cxl_dev_state *cxlds, u64 dpa)
> > > +{
> > > +	if (!resource_size(&cxlds->dpa_res)) {
> > > +		dev_dbg(cxlds->dev, "device has no dpa resource\n");
> > > +		return -EINVAL;
> > > +	}
> > > +	if (dpa < cxlds->dpa_res.start || dpa > cxlds->dpa_res.end) {
> > > +		dev_dbg(cxlds->dev, "dpa:0x%llx not in resource:%pR\n",
> > > +			dpa, &cxlds->dpa_res);
> > > +		return -EINVAL;
> > > +	}
> > > +	if (!IS_ALIGNED(dpa, 64)) {
> > > +		dev_dbg(cxlds->dev, "dpa:0x%llx is not 64-byte aligned\n",
> > > +			dpa);
> > > +		return -EINVAL;
> > > +	}
> > > +	return 0;
> > > +}
> > > +
> > > +static ssize_t inject_poison_store(struct device *dev,
> > > +				   struct device_attribute *attr,
> > > +				   const char *buf, size_t len)
> > > +{
> > > +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> > > +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> > > +	struct cxl_mbox_inject_poison inject;
> > > +	struct cxl_mbox_cmd mbox_cmd;
> > > +	u64 dpa;
> > > +	int rc;
> > > +
> > > +	rc = kstrtou64(buf, 0, &dpa);
> > > +	if (rc)
> > > +		return rc;
> > > +
> > > +	rc = cxl_validate_poison_dpa(cxlds, dpa);
> > > +	if (rc)
> > > +		return rc;
> > > +
> > > +	inject = (struct cxl_mbox_inject_poison) {
> > > +		.address = cpu_to_le64(dpa)
> > > +	};
> > > +	mbox_cmd = (struct cxl_mbox_cmd) {
> > > +		.opcode = CXL_MBOX_OP_INJECT_POISON,
> > > +		.size_in = sizeof(inject),
> > > +		.payload_in = &inject,
> > > +	};
> > > +
> > > +	rc = cxl_internal_send_cmd(cxlds, &mbox_cmd);
> > > +	if (rc)
> > > +		return rc;
> > > +
> > > +	return len;
> > > +}
> > > +static DEVICE_ATTR_WO(inject_poison);
> > > +
> > >  static struct attribute *cxl_memdev_attributes[] = {
> > >  	&dev_attr_serial.attr,
> > >  	&dev_attr_firmware_version.attr,
> > > @@ -149,6 +204,7 @@ static struct attribute *cxl_memdev_attributes[] = {
> > >  	&dev_attr_label_storage_size.attr,
> > >  	&dev_attr_numa_node.attr,
> > >  	&dev_attr_trigger_poison_list.attr,
> > > +	&dev_attr_inject_poison.attr,
> > >  	NULL,
> > >  };
> > >  
> > > @@ -168,6 +224,10 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
> > >  	if (!IS_ENABLED(CONFIG_NUMA) && a == &dev_attr_numa_node.attr)
> > >  		return 0;
> > >  
> > > +	if (!IS_ENABLED(CONFIG_CXL_POISON_INJECT) &&
> > > +	    a == &dev_attr_inject_poison.attr)
> > > +		return 0;
> > > +
> > >  	if (a == &dev_attr_trigger_poison_list.attr) {
> > >  		struct device *dev = kobj_to_dev(kobj);
> > >  
> > > @@ -175,6 +235,13 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
> > >  			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
> > >  			return 0;
> > >  	}
> > > +	if (a == &dev_attr_inject_poison.attr) {
> > > +		struct device *dev = kobj_to_dev(kobj);
> > 
> > I'd move the IS_ENABLED(CONFIG_CXL_POISON_INJECT) inside here so just
> > one spot in this function handles the poison attribute.
> 
> OK
> 
> > 
> > > +
> > > +		if (!test_bit(CXL_MEM_COMMAND_ID_INJECT_POISON,
> > > +			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
> > > +			return 0;
> > 
> > Ugh, this is a problem. So "inject poison" never should have been
> > enabled for the ioctl path way back in:
> > 
> > 87815ee9d006 cxl/pci: Add media provisioning required commands
> > 
> > All the nice sysfs interface and compile option to turn it off in this
> > patch is moot since userspace can just send the ioctl if the sysfs
> > attribute is missing.
> > 
> > On the one hand this is already shipping ABI, but given cxl-cli has not
> > been enabled it chances are high that it can be deleted without anyone
> > caring (i.e. breaking deployed configurations). That would need to be a
> > lead in patch.
> 
> I'm confused on how to kill it.
> 
> I see it in the enum here: include/uapi/linux/cxl_mem.h, and I also
> see Ira's patch reminding us not to break backward compatibility on
> that enum. Do I replace with dummy entries?

Right, they need to be converted to "deprecated" entries that keep the
command id order intact.
diff mbox series

Patch

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index b715a4609718..e9c6dd02bd09 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -416,3 +416,25 @@  Description:
 		if accessed, and the source of the poison. The retrieved
 		errors are logged as kernel trace events with the label
 		'cxl_poison'.
+
+
+What:		/sys/bus/cxl/devices/memX/inject_poison
+Date:		January, 2023
+KernelVersion:	v6.3
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(WO) When a Device Physical Address (DPA) is written to this
+		attribute, the memdev driver sends an inject poison command to
+		the device for the specified address. The DPA must be 64-byte
+		aligned and the length of the injected poison is 64-bytes. If
+		successful, the device returns poison when the address is
+		accessed through the CXL.mem bus. Injecting poison adds the
+		address to the device's Poison List and the error source is set
+		to Injected. In addition, the device adds a poison creation
+		event to its internal Informational Event log, updates the
+		Event Status register, and if configured, interrupts the host.
+		It is not an error to inject poison into an address that
+		already has poison present and no error is returned. The
+		inject_poison attribute is only visible for devices supporting
+		the capability. Kconfig option CXL_POISON_INJECT must be on
+		to enable this option. The default is off.
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 0ac53c422c31..6541f54725cd 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -129,4 +129,14 @@  config CXL_REGION_INVALIDATION_TEST
 	  If unsure, or if this kernel is meant for production environments,
 	  say N.
 
+config CXL_POISON_INJECT
+	bool "CXL: Support CXL Memory Device Poison Inject"
+	depends on CXL_MEM
+	help
+	  Selecting this option creates the sysfs attributes inject_poison
+	  and clear_poison for CXL memory devices supporting the capability.
+	  See Documentation/ABI/testing/sysfs-bus-cxl.
+
+	  If unsure, say N.
+
 endif
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index e0af7e9c9989..226662cf3331 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -142,6 +142,61 @@  static ssize_t trigger_poison_list_store(struct device *dev,
 }
 static DEVICE_ATTR_WO(trigger_poison_list);
 
+static int cxl_validate_poison_dpa(struct cxl_dev_state *cxlds, u64 dpa)
+{
+	if (!resource_size(&cxlds->dpa_res)) {
+		dev_dbg(cxlds->dev, "device has no dpa resource\n");
+		return -EINVAL;
+	}
+	if (dpa < cxlds->dpa_res.start || dpa > cxlds->dpa_res.end) {
+		dev_dbg(cxlds->dev, "dpa:0x%llx not in resource:%pR\n",
+			dpa, &cxlds->dpa_res);
+		return -EINVAL;
+	}
+	if (!IS_ALIGNED(dpa, 64)) {
+		dev_dbg(cxlds->dev, "dpa:0x%llx is not 64-byte aligned\n",
+			dpa);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static ssize_t inject_poison_store(struct device *dev,
+				   struct device_attribute *attr,
+				   const char *buf, size_t len)
+{
+	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+	struct cxl_mbox_inject_poison inject;
+	struct cxl_mbox_cmd mbox_cmd;
+	u64 dpa;
+	int rc;
+
+	rc = kstrtou64(buf, 0, &dpa);
+	if (rc)
+		return rc;
+
+	rc = cxl_validate_poison_dpa(cxlds, dpa);
+	if (rc)
+		return rc;
+
+	inject = (struct cxl_mbox_inject_poison) {
+		.address = cpu_to_le64(dpa)
+	};
+	mbox_cmd = (struct cxl_mbox_cmd) {
+		.opcode = CXL_MBOX_OP_INJECT_POISON,
+		.size_in = sizeof(inject),
+		.payload_in = &inject,
+	};
+
+	rc = cxl_internal_send_cmd(cxlds, &mbox_cmd);
+	if (rc)
+		return rc;
+
+	return len;
+}
+static DEVICE_ATTR_WO(inject_poison);
+
 static struct attribute *cxl_memdev_attributes[] = {
 	&dev_attr_serial.attr,
 	&dev_attr_firmware_version.attr,
@@ -149,6 +204,7 @@  static struct attribute *cxl_memdev_attributes[] = {
 	&dev_attr_label_storage_size.attr,
 	&dev_attr_numa_node.attr,
 	&dev_attr_trigger_poison_list.attr,
+	&dev_attr_inject_poison.attr,
 	NULL,
 };
 
@@ -168,6 +224,10 @@  static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
 	if (!IS_ENABLED(CONFIG_NUMA) && a == &dev_attr_numa_node.attr)
 		return 0;
 
+	if (!IS_ENABLED(CONFIG_CXL_POISON_INJECT) &&
+	    a == &dev_attr_inject_poison.attr)
+		return 0;
+
 	if (a == &dev_attr_trigger_poison_list.attr) {
 		struct device *dev = kobj_to_dev(kobj);
 
@@ -175,6 +235,13 @@  static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
 			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
 			return 0;
 	}
+	if (a == &dev_attr_inject_poison.attr) {
+		struct device *dev = kobj_to_dev(kobj);
+
+		if (!test_bit(CXL_MEM_COMMAND_ID_INJECT_POISON,
+			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
+			return 0;
+	}
 	return a->mode;
 }
 
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 28ba0cd8f2d3..862ca4f4cc06 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -436,6 +436,11 @@  struct cxl_mbox_poison_payload_out {
 #define CXL_POISON_SOURCE_INJECTED	3
 #define CXL_POISON_SOURCE_VENDOR	7
 
+/* Inject & Clear Poison  CXL 3.0 Spec 8.2.9.8.4.2/3 */
+struct cxl_mbox_inject_poison {
+	__le64 address;
+};
+
 /**
  * struct cxl_mem_command - Driver representation of a memory device command
  * @info: Command information as it exists for the UAPI