[v2,2/6] cxl/memdev: Add support for the Clear Poison mailbox command

Message ID 3ae253f32602a62fa7521d5787b1b26b1c808275.1674101475.git.alison.schofield@intel.com
State Superseded
Series cxl: CXL Inject & Clear Poison

Commit Message

Alison Schofield Jan. 19, 2023, 5 a.m. UTC
From: Alison Schofield <alison.schofield@intel.com>

CXL devices optionally support the CLEAR POISON mailbox command. Add
a sysfs attribute and memdev driver support for clearing poison.

When a Device Physical Address (DPA) is written to the clear_poison
sysfs attribute, send a clear poison command to the device for the
specified address.

Per the CXL Specification (3.0 8.2.9.8.4.3), after receiving a valid clear
poison request, the device removes the address from the device's Poison
List and writes 0 (zero) for 64 bytes starting at address. If the device
cannot clear poison from the address, it returns a permanent media error
and -ENXIO is returned to the user.

Also per the spec, it is not an error to clear poison of an address
that is not poisoned. No error is returned from the device and the
address is not overwritten.

*Implementation note: Although the CXL specification defines the clear
command to accept 64 bytes of 'write-data' to be used when clearing
the poisoned address, this implementation always uses 0 (zeros) for
the write-data.

The clear_poison attribute is only visible for devices supporting the
capability when the kernel is built with CONFIG_CXL_POISON_INJECT.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 Documentation/ABI/testing/sysfs-bus-cxl | 18 ++++++++
 drivers/cxl/core/memdev.c               | 57 ++++++++++++++++++++++++-
 drivers/cxl/cxlmem.h                    |  6 +++
 3 files changed, 80 insertions(+), 1 deletion(-)
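
As an illustration of the implementation note above, here is a minimal
user-space sketch of the payload layout that the patch adds to cxlmem.h:
an 8-byte little-endian DPA followed by write-data that is always zero.
It is not kernel code. The names clear_poison_payload, put_le64 and
make_clear_poison_payload are hypothetical, and the 64-byte length
assumes that CXL_POISON_LEN_MULT matches the 64 bytes of write-data
described in the commit message.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: mirrors CXL_POISON_LEN_MULT, taken to be 64 bytes. */
#define CLEAR_POISON_WRITE_DATA_LEN 64

/* Hypothetical user-space mirror of struct cxl_mbox_clear_poison. */
struct clear_poison_payload {
	uint8_t address[8];                              /* DPA, little-endian */
	uint8_t write_data[CLEAR_POISON_WRITE_DATA_LEN]; /* always zero here */
};

/* Byte arrays need no padding, so the layout is packed by construction. */
static_assert(sizeof(struct clear_poison_payload) == 72,
	      "payload is 8 address bytes plus 64 write-data bytes");

/* Store a 64-bit value in little-endian order on any host. */
static void put_le64(uint8_t *dst, uint64_t val)
{
	for (int i = 0; i < 8; i++)
		dst[i] = (uint8_t)(val >> (8 * i));
}

/* Build a payload for @dpa with all-zero write-data. */
static struct clear_poison_payload make_clear_poison_payload(uint64_t dpa)
{
	struct clear_poison_payload p;

	memset(&p, 0, sizeof(p));
	put_le64(p.address, dpa);
	return p;
}
```

Zeroing the whole structure first plays the same role as the compound
literal in clear_poison_store(), which leaves write_data zero-filled.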

Comments

Dan Williams Jan. 27, 2023, 11:56 p.m. UTC | #1
alison.schofield@ wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> CXL devices optionally support the CLEAR POISON mailbox command. Add
> a sysfs attribute and memdev driver support for clearing poison.
> 
> When a Device Physical Address (DPA) is written to the clear_poison
> sysfs attribute, send a clear poison command to the device for the
> specified address.
> 
> Per the CXL Specification (3.0 8.2.9.8.4.3), after receiving a valid clear
> poison request, the device removes the address from the device's Poison
> List and writes 0 (zero) for 64 bytes starting at address. If the device
> cannot clear poison from the address, it returns a permanent media error
> and -ENXIO is returned to the user.
> 
> Additionally, and per the spec also, it is not an error to clear poison
> of an address that is not poisoned. No error is returned from the device
> and the address is not overwritten.
> 
> *Implementation note: Although the CXL specification defines the clear
> command to accept 64 bytes of 'write-data' to be used when clearing
> the poisoned address, this implementation always uses 0 (zeros) for
> the write-data.
> 
> The clear_poison attribute is only visible for devices supporting the
> capability when the kernel is built with CONFIG_CXL_POISON_INJECT.
> 
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
>  Documentation/ABI/testing/sysfs-bus-cxl | 18 ++++++++
>  drivers/cxl/core/memdev.c               | 57 ++++++++++++++++++++++++-
>  drivers/cxl/cxlmem.h                    |  6 +++
>  3 files changed, 80 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index e9c6dd02bd09..7e4897e7bc05 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -438,3 +438,21 @@ Description:
>  		inject_poison attribute is only visible for devices supporting
>  		the capability. Kconfig option CXL_POISON_INJECT must be on
>  		to enable this option. The default is off.
> +
> +
> +What:		/sys/bus/cxl/devices/memX/clear_poison
> +Date:		January, 2023
> +KernelVersion:	v6.3
> +Contact:	linux-cxl@vger.kernel.org
> +Description:
> +		(WO) When a Device Physical Address (DPA) is written to this
> +		attribute, the memdev driver sends a clear poison command to
> +		the device for the specified address. Clearing poison removes
> +		the address from the device's Poison List and writes 0 (zero)
> +		for 64 bytes starting at address. It is not an error to clear
> +		poison from an address that does not have poison set, and if
> +		poison was not set, the address is not overwritten. If the
> +		device cannot clear poison from the address, -ENXIO is returned.
> +		The clear_poison attribute is only visible for devices
> +		supporting the capability. Kconfig option CXL_POISON_INJECT
> +		must be on to enable this option. The default is off.

So unlike error inject, this interface leaves me cold because it is
changing the state of data without coordination.

You might say, "inject poison also changes the state of data without
coordination". While that is true, it is expected that media can go bad
without warning. What software does not expect is that memory could be
put back into service without coordination. A memory error wants to be
cleared by the agent that currently owns the memory, like the page
allocator clearing PageHWPoison and putting the page back into service,
or a filesystem restoring a file that was previously quarantined.

The only way this interface can proceed is if it can assert that the
poison to be cleared is not mapped by any decoder, which makes the owner
of the memory the administrator using the sysfs interface. That limits
its utility.

Inside the kernel the expectation is that the core-mm or filesystems are
using facilities like movdir64b to atomically clear poison without
needing to hassle with a CXL mailbox.

This sysfs interface can move forward but it needs the idle checks and
locking before it can issue the command. I would also have an eye
towards skipping the mailbox call on architectures that have a poison
clearing instruction like x86's movdir64b, because as the spec says:
"This provides the same functionality as the host directly writing new
data to the device", so just try to do that by default. However, that can
be a follow-on optimization.
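
To make the idle check above concrete, here is a minimal user-space
sketch, not driver code. It assumes the mapped DPA spans are available
as a flat array; struct mapped_dpa_range and dpa_line_is_idle() are
hypothetical names. A real implementation would walk the decoders and
hold whatever lock keeps the mappings stable until the mailbox command
completes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Clear Poison operates on one 64-byte line. */
#define POISON_LINE_LEN 64ULL

/* Hypothetical model of one decoder's mapped span: [start, start + len). */
struct mapped_dpa_range {
	uint64_t start;
	uint64_t len;
};

/*
 * Return 0 if the 64-byte line at @dpa overlaps none of the @n mapped
 * ranges, or -EBUSY if any range covers part of it. Address wraparound
 * is ignored to keep the sketch short.
 */
static int dpa_line_is_idle(const struct mapped_dpa_range *ranges,
			    size_t n, uint64_t dpa)
{
	assert(ranges != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		uint64_t r_start = ranges[i].start;
		uint64_t r_end = r_start + ranges[i].len; /* exclusive */

		/* Two half-open intervals overlap when each starts before the other ends. */
		if (dpa < r_end && dpa + POISON_LINE_LEN > r_start)
			return -EBUSY;
	}
	return 0;
}
```

With a check like this, the clear command would only be sent when the
function returns 0, so the administrator writing to sysfs is the only
owner of the line being overwritten.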


> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index 226662cf3331..4d86a4565c9e 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -197,6 +197,51 @@ static ssize_t inject_poison_store(struct device *dev,
>  }
>  static DEVICE_ATTR_WO(inject_poison);
>  
> +static ssize_t clear_poison_store(struct device *dev,
> +				  struct device_attribute *attr,
> +				  const char *buf, size_t len)
> +{
> +	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> +	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +	struct cxl_mbox_clear_poison clear;
> +	struct cxl_mbox_cmd mbox_cmd;
> +	u64 dpa;
> +	int rc;
> +
> +	rc = kstrtou64(buf, 0, &dpa);
> +	if (rc)
> +		return rc;
> +
> +	rc = cxl_validate_poison_dpa(cxlds, dpa);
> +	if (rc)
> +		return rc;
> +	/*
> +	 * In CXL 3.0 Spec 8.2.9.8.4.3, the Clear Poison mailbox command
> +	 * is defined to accept 64 bytes of 'write-data', along with the
> +	 * address to clear. The device writes the data into the address
> +	 * atomically, while clearing poison if the location is marked as
> +	 * being poisoned.
> +	 *
> +	 * Always use '0' for the write-data.
> +	 */
> +	clear = (struct cxl_mbox_clear_poison) {
> +		.address = cpu_to_le64(dpa)
> +	};
> +
> +	mbox_cmd = (struct cxl_mbox_cmd) {
> +		.opcode = CXL_MBOX_OP_CLEAR_POISON,
> +		.size_in = sizeof(clear),
> +		.payload_in = &clear,
> +	};
> +
> +	rc = cxl_internal_send_cmd(cxlds, &mbox_cmd);
> +	if (rc)
> +		return rc;
> +
> +	return len;
> +}
> +static DEVICE_ATTR_WO(clear_poison);
> +
>  static struct attribute *cxl_memdev_attributes[] = {
>  	&dev_attr_serial.attr,
>  	&dev_attr_firmware_version.attr,
> @@ -205,6 +250,7 @@ static struct attribute *cxl_memdev_attributes[] = {
>  	&dev_attr_numa_node.attr,
>  	&dev_attr_trigger_poison_list.attr,
>  	&dev_attr_inject_poison.attr,
> +	&dev_attr_clear_poison.attr,
>  	NULL,
>  };
>  
> @@ -225,7 +271,8 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
>  		return 0;
>  
>  	if (!IS_ENABLED(CONFIG_CXL_POISON_INJECT) &&
> -	    a == &dev_attr_inject_poison.attr)
> +	    (a == &dev_attr_inject_poison.attr ||
> +	     a == &dev_attr_clear_poison.attr))
>  		return 0;
>  
>  	if (a == &dev_attr_trigger_poison_list.attr) {
> @@ -242,6 +289,14 @@ static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
>  			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
>  			return 0;
>  	}
> +	if (a == &dev_attr_clear_poison.attr) {
> +		struct device *dev = kobj_to_dev(kobj);
> +
> +		if (!test_bit(CXL_MEM_COMMAND_ID_CLEAR_POISON,
> +			      to_cxl_memdev(dev)->cxlds->enabled_cmds)) {
> +			return 0;

Similar comment as the last patch with respect to the command enabling.

> +		}
> +	}
>  	return a->mode;
>  }
>  
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 862ca4f4cc06..adcbd4a98819 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -441,6 +441,12 @@ struct cxl_mbox_inject_poison {
>  	__le64 address;
>  };
>  
> +/* Clear Poison  CXL 3.0 Spec 8.2.9.8.4.3 */
> +struct cxl_mbox_clear_poison {
> +	__le64 address;
> +	u8 write_data[CXL_POISON_LEN_MULT];
> +} __packed;
> +
>  /**
>   * struct cxl_mem_command - Driver representation of a memory device command
>   * @info: Command information as it exists for the UAPI
> -- 
> 2.37.3
>
Alison Schofield Jan. 28, 2023, 1:17 a.m. UTC | #2
On Fri, Jan 27, 2023 at 03:56:39PM -0800, Dan Williams wrote:
> So unlike error inject, this interface leaves me cold because it is
> changing the state of data without coordination.
> 
> You might say, "inject poison also changes the state of data without
> coordination", while that is true it is expected that media can go bad
> without warning. What software does not expect is that memory could be
> put back into service without coordination. A memory error wants to be
> cleared by the agent that currently owns the memory, like the page
> allocator clearing PageHWPoison and putting the page back into service,
> or a filesystem restoring a file that was previously quarantined.
> 
> The only way this interface can proceed is if it can assert that the
> poison to be cleared is not mapped by any decoder which makes the owner
> of the memory the administrator using the sysfs interface. That limits
> its utility.

It is a debug interface. 'Changing the state of data without
coordination', yes, but isn't that coordination up to the user?
I get putting more protection in place, as in not allowing
inject or clear of a DPA that is mapped.

Do you expect, or fear, that users will 'trigger poison list', and
then issue 'clears' of poisoned DPAs directly, instead of letting
core-mm and filesystems handle poison?

> 
> Inside the kernel the expectation is that the core-mm or filesystems are
> using facilities like movdir64b to atomically clear poison without
> needing to hassle with a CXL mailbox.
> 
> This sysfs interface can move forward but it needs the idle checks and
> locking before it can issue the command. I would also have an eye
> towards skipping the mailbox call on architectures that have poison
> clearing instruction like x86's movdir64b, because as the spec says:
> "This provides the same functionality as the host directly writing new
> data to the device", so just try to do that by default. However that can
> be a follow-on optimization.

This (have an eye towards skipping ...) sounds like you do want
users to be able to clear real poison using the sysfs interface,
and for the driver to be able to coordinate it upwards.

I'll go off and learn more about the upwards coordination, and
more thoughts here are welcome!

Thanks,
Alison

Dan Williams Jan. 28, 2023, 2:19 a.m. UTC | #3
Alison Schofield wrote:
> On Fri, Jan 27, 2023 at 03:56:39PM -0800, Dan Williams wrote:
> > So unlike error inject, this interface leaves me cold because it is
> > changing the state of data without coordination.
> > 
> > You might say, "inject poison also changes the state of data without
> > coordination", while that is true it is expected that media can go bad
> > without warning. What software does not expect is that memory could be
> > put back into service without coordination. A memory error wants to be
> > cleared by the agent that currently owns the memory, like the page
> > allocator clearing PageHWPoison and putting the page back into service,
> > or a filesystem restoring a file that was previously quarantined.
> > 
> > The only way this interface can proceed is if it can assert that the
> > poison to be cleared is not mapped by any decoder which makes the owner
> > of the memory the administrator using the sysfs interface. That limits
> > its utility.
> 
> It is a debug interface.

...but it isn't. It's an arbitrary "write 64 bytes of zeros" primitive.
Nothing ensures that the location to be written is one that had poison
injected into it.

> 'Changing the state of data without the
> coordination' yes, but isn't that coordination up to the user.

There's no way for the user to control that. Consider a block in a pmem
filesystem that is actively adding and removing blocks from files.
There's no mechanism for a user to coordinate that from the driver side.
It needs to be coordinated from the filesystem side.

> I get putting more protection in place, as in not allowing
> inject or clear of a DPA that is mapped. 

That solves most of the concern, and truly makes it a debug interface
since the user action cannot possibly confuse any other part of the
kernel when the DPA is not mapped.

> Do you expect, or fear, that users will 'trigger poison list', and
> then issue 'clears' of poisoned DPA's directly, instead of letting
> core-mm and filesystems to handle poison?

Yes. This policy is also a carryover from pmem enabling, where you see
that "ndctl clear-errors" tears down namespaces before issuing the
clear-error command.

> > Inside the kernel the expectation is that the core-mm or filesystems are
> > using facilities like movdir64b to atomically clear poison without
> > needing to hassle with a CXL mailbox.
> > 
> > This sysfs interface can move forward but it needs the idle checks and
> > locking before it can issue the command. I would also have an eye
> > towards skipping the mailbox call on architectures that have poison
> > clearing instruction like x86's movdir64b, because as the spec says:
> > "This provides the same functionality as the host directly writing new
> > data to the device", so just try to do that by default. However that can
> > be a follow-on optimization.
> 
> This (have an eye towards skipping ...) sounds like you do want
> users to be able to clear real poison using the sysfs interface,
> and for the driver to be able to coordinate it upwards.

No, the reverse, that clearing poison from sysfs is a debug special case
because it cannot be done at runtime.

There are existing efforts to teach other parts of the kernel how to
better deal with poisoned memory in general (nothing specific to CXL) [1].
Correctable poison is a new concept for most of the kernel. Usually
memory_failure() is a one-way trip to offline the page.

[1]: https://lore.kernel.org/all/20220603053738.1218681-7-ruansy.fnst@fujitsu.com/
Patch

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index e9c6dd02bd09..7e4897e7bc05 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -438,3 +438,21 @@  Description:
 		inject_poison attribute is only visible for devices supporting
 		the capability. Kconfig option CXL_POISON_INJECT must be on
 		to enable this option. The default is off.
+
+
+What:		/sys/bus/cxl/devices/memX/clear_poison
+Date:		January, 2023
+KernelVersion:	v6.3
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(WO) When a Device Physical Address (DPA) is written to this
+		attribute, the memdev driver sends a clear poison command to
+		the device for the specified address. Clearing poison removes
+		the address from the device's Poison List and writes 0 (zero)
+		for 64 bytes starting at address. It is not an error to clear
+		poison from an address that does not have poison set, and if
+		poison was not set, the address is not overwritten. If the
+		device cannot clear poison from the address, -ENXIO is returned.
+		The clear_poison attribute is only visible for devices
+		supporting the capability. Kconfig option CXL_POISON_INJECT
+		must be on to enable this option. The default is off.
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 226662cf3331..4d86a4565c9e 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -197,6 +197,51 @@  static ssize_t inject_poison_store(struct device *dev,
 }
 static DEVICE_ATTR_WO(inject_poison);
 
+static ssize_t clear_poison_store(struct device *dev,
+				  struct device_attribute *attr,
+				  const char *buf, size_t len)
+{
+	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+	struct cxl_mbox_clear_poison clear;
+	struct cxl_mbox_cmd mbox_cmd;
+	u64 dpa;
+	int rc;
+
+	rc = kstrtou64(buf, 0, &dpa);
+	if (rc)
+		return rc;
+
+	rc = cxl_validate_poison_dpa(cxlds, dpa);
+	if (rc)
+		return rc;
+	/*
+	 * In CXL 3.0 Spec 8.2.9.8.4.3, the Clear Poison mailbox command
+	 * is defined to accept 64 bytes of 'write-data', along with the
+	 * address to clear. The device writes the data into the address
+	 * atomically, while clearing poison if the location is marked as
+	 * being poisoned.
+	 *
+	 * Always use '0' for the write-data.
+	 */
+	clear = (struct cxl_mbox_clear_poison) {
+		.address = cpu_to_le64(dpa)
+	};
+
+	mbox_cmd = (struct cxl_mbox_cmd) {
+		.opcode = CXL_MBOX_OP_CLEAR_POISON,
+		.size_in = sizeof(clear),
+		.payload_in = &clear,
+	};
+
+	rc = cxl_internal_send_cmd(cxlds, &mbox_cmd);
+	if (rc)
+		return rc;
+
+	return len;
+}
+static DEVICE_ATTR_WO(clear_poison);
+
 static struct attribute *cxl_memdev_attributes[] = {
 	&dev_attr_serial.attr,
 	&dev_attr_firmware_version.attr,
@@ -205,6 +250,7 @@  static struct attribute *cxl_memdev_attributes[] = {
 	&dev_attr_numa_node.attr,
 	&dev_attr_trigger_poison_list.attr,
 	&dev_attr_inject_poison.attr,
+	&dev_attr_clear_poison.attr,
 	NULL,
 };
 
@@ -225,7 +271,8 @@  static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
 		return 0;
 
 	if (!IS_ENABLED(CONFIG_CXL_POISON_INJECT) &&
-	    a == &dev_attr_inject_poison.attr)
+	    (a == &dev_attr_inject_poison.attr ||
+	     a == &dev_attr_clear_poison.attr))
 		return 0;
 
 	if (a == &dev_attr_trigger_poison_list.attr) {
@@ -242,6 +289,14 @@  static umode_t cxl_memdev_visible(struct kobject *kobj, struct attribute *a,
 			      to_cxl_memdev(dev)->cxlds->enabled_cmds))
 			return 0;
 	}
+	if (a == &dev_attr_clear_poison.attr) {
+		struct device *dev = kobj_to_dev(kobj);
+
+		if (!test_bit(CXL_MEM_COMMAND_ID_CLEAR_POISON,
+			      to_cxl_memdev(dev)->cxlds->enabled_cmds)) {
+			return 0;
+		}
+	}
 	return a->mode;
 }
 
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 862ca4f4cc06..adcbd4a98819 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -441,6 +441,12 @@  struct cxl_mbox_inject_poison {
 	__le64 address;
 };
 
+/* Clear Poison  CXL 3.0 Spec 8.2.9.8.4.3 */
+struct cxl_mbox_clear_poison {
+	__le64 address;
+	u8 write_data[CXL_POISON_LEN_MULT];
+} __packed;
+
 /**
  * struct cxl_mem_command - Driver representation of a memory device command
  * @info: Command information as it exists for the UAPI
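
From user space, the ABI entry in the patch above amounts to writing a
DPA string to the clear_poison attribute. The sketch below shows one way
to do that. The helper write_dpa_attr() is a hypothetical name, and the
path in the usage note is only an example of the memX pattern. The hex
format relies on clear_poison_store() parsing with kstrtou64(buf, 0, ...),
which accepts a 0x prefix and a trailing newline.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Write @dpa as a 0x-prefixed hex string to the attribute at @attr_path.
 * Returns 0 on success or a negative errno value on failure.
 */
static int write_dpa_attr(const char *attr_path, uint64_t dpa)
{
	char buf[32];
	ssize_t written;
	int len, fd;

	len = snprintf(buf, sizeof(buf), "0x%" PRIx64 "\n", dpa);
	assert(len > 0 && (size_t)len < sizeof(buf));

	fd = open(attr_path, O_WRONLY);
	if (fd < 0)
		return -errno;

	written = write(fd, buf, (size_t)len);
	if (written < 0) {
		int err = errno; /* save before close() can change errno */

		close(fd);
		return -err;
	}
	close(fd);

	/* A short write means the value was not delivered in full. */
	return written == len ? 0 : -EIO;
}
```

A call such as write_dpa_attr("/sys/bus/cxl/devices/mem0/clear_poison",
0x1000) would, per the ABI text, be expected to return -ENXIO when the
device cannot clear poison from that address.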