diff mbox series

[v15,11/15] EDAC: Add memory repair control feature

Message ID 20241101091735.1465-12-shiju.jose@huawei.com
State Superseded
Headers show
Series EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers | expand

Commit Message

Shiju Jose Nov. 1, 2024, 9:17 a.m. UTC
From: Shiju Jose <shiju.jose@huawei.com>

Add generic EDAC memory repair control, eg. PPR(Post Package Repair),
memory sparing etc, control driver in order to control memory repairs
in the system. Supports sPPR(soft PPR), hPPR(hard PPR), soft/hard memory
sparing, memory sparing at cacheline/row/bank/rank granularity etc.
Device with memory repair features registers with EDAC device driver,
which retrieves memory repair descriptor from EDAC memory repair driver and
exposes the sysfs repair control attributes to userspace in
/sys/bus/edac/devices/<dev-name>/mem_repairX/.

The common memory repair control interface abstracts the control of
arbitrary memory repair functionality into a standardized set of functions.
The sysfs memory repair attribute nodes are only available if the client
driver has implemented the corresponding attribute callback function and
provided operations to the EDAC device driver during registration.

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 .../ABI/testing/sysfs-edac-memory-repair      | 168 ++++++++
 drivers/edac/Makefile                         |   2 +-
 drivers/edac/edac_device.c                    |  32 ++
 drivers/edac/mem_repair.c                     | 367 ++++++++++++++++++
 include/linux/edac.h                          |  87 +++++
 5 files changed, 655 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/ABI/testing/sysfs-edac-memory-repair
 create mode 100755 drivers/edac/mem_repair.c

Comments

Borislav Petkov Nov. 4, 2024, 6:15 a.m. UTC | #1
On Fri, Nov 01, 2024 at 09:17:29AM +0000, shiju.jose@huawei.com wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
> 
> Add generic EDAC memory repair control, eg. PPR(Post Package Repair),
> memory sparing etc, control driver in order to control memory repairs
> in the system. Supports sPPR(soft PPR), hPPR(hard PPR), soft/hard memory
> sparing, memory sparing at cacheline/row/bank/rank granularity etc.
> Device with memory repair features registers with EDAC device driver,
> which retrieves memory repair descriptor from EDAC memory repair driver and
> exposes the sysfs repair control attributes to userspace in
> /sys/bus/edac/devices/<dev-name>/mem_repairX/.
> 
> The common memory repair control interface abstracts the control of
> arbitrary memory repair functionality into a standardized set of functions.
> The sysfs memory repair attribute nodes are only available if the client
> driver has implemented the corresponding attribute callback function and
> provided operations to the EDAC device driver during registration.

What is the valid use case for this thing?

> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> ---
>  .../ABI/testing/sysfs-edac-memory-repair      | 168 ++++++++
>  drivers/edac/Makefile                         |   2 +-
>  drivers/edac/edac_device.c                    |  32 ++
>  drivers/edac/mem_repair.c                     | 367 ++++++++++++++++++
>  include/linux/edac.h                          |  87 +++++
>  5 files changed, 655 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/ABI/testing/sysfs-edac-memory-repair
>  create mode 100755 drivers/edac/mem_repair.c
> 
> diff --git a/Documentation/ABI/testing/sysfs-edac-memory-repair b/Documentation/ABI/testing/sysfs-edac-memory-repair
> new file mode 100644
> index 000000000000..393206b8d418
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-edac-memory-repair
> @@ -0,0 +1,168 @@
> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		The sysfs EDAC bus devices /<dev-name>/mem_repairX subdirectory
> +		pertains to the memory media repair features control, such as
> +		PPR (Post Package Repair), memory sparing etc, where<dev-name>
> +		directory corresponds to a device registered with the EDAC
> +		device driver for the memory repair features.
> +		Memory sparing is a repair function that replaces a portion
> +		of memory (spared memory) with a portion of functional memory.

a portion of *faulty* memory - this is the most important aspect here. You
want to explain why you're doing the replacing in the first place. Memory
which is going bad.

This first paragraph is a good explanation:

https://pubs.lenovo.com/sr850/memory_module_installation_order_sparing

the first hit in a web search here.

> +		Memory sparing has cacheline/row/bank/rank sparing
> +		granularities.
> +		The sysfs attributes nodes for a repair feature are only
> +		present if the parent driver has implemented the corresponding
> +		attr callback function and provided the necessary operations
> +		to the EDAC device driver during registration.
> +
> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/repair_type
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(RO) Type of the repair instance. For eg. sPPR, hPPR, cacheline/

WTH is a "repair instance"? Can we pls make this human readable and not some
crap spec language?

> +		row/bank/rank memory sparing etc.
> +		0 - Soft PPR(sPPR)
> +		1 - Hard PPR(hPPR)

Write out those abbreviations.

> +		2 - Cacheline memory sparing
> +		3 - Row memory sparing
> +		4 - Bank memory sparing
> +		5 - Rank memory sparing
> +		All other values are reserved.
> +
> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/persist_mode_avail
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(RO) Persist repair modes supported in the device.
> +		0 - Soft PPR(sPPR)/soft memory sparing (temporary repair).
> +		1 - Hard PPR(hPPR)/hard memory sparing (permanent repair).

Those need a longer, user-friendly explanation.

> +		All other values are reserved.
> +
> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/persist_mode
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(RW) Current persist repair mode.
> +		0 - Soft PPR(sPPR)/soft memory sparing (temporary repair).
> +		1 - Hard PPR(hPPR)/hard memory sparing (permanent repair).
> +		All other values are reserved.

You can kill persist_mode_avail by making this persist_mode return the
available modes by reading it.

> +
> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/dpa_support
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(RO) True if supports DPA for a repair operation.
> +		E.g. PPR, memory sparing.

DPA? What?!

Why is *that* a sysfs file?

> +
> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/repair_safe_when_in_use
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(RO) True if memory media is accessible and data is retained
> +		during the memory repair operation.

Ditto.

> +
> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/hpa
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(RW) HPA (Host Physical Address) for memory repair.
> +
> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/dpa
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(RW) DPA (Device Physical Address) for memory repair.

Both need more explanation.

> +
> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/nibble_mask
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(RW) Nibble mask for memory repair.

What is that?

> +
> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/bank_group
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(RW) Memory bank group for repair.

Ditto.

> +
> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/bank
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(RW) Memory bank for memory repair.
> +
> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/rank
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(RW) Memory rank for repair.
> +
> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/row
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(RW) Row for memory repair.
> +
> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/column
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(RW) Column for memory repair.
> +
> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/channel
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(RW) Channel for memory repair.
> +
> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/sub_channel
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(RW) Sub-channel for memory repair.

All those above: need better explanation and need a way of finding out how
many of those are in each device so that the user can actually give a sensible
value there.

> +
> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/query
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(WO) Query whether the repair operation is supported for the
> +		memory attributes set. Return failure if resources are
> +		not available to perform repair.
> +		1 - issue query.
> +		All other values are reserved.

What?!

If anything this should be a "status".

> +
> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/repair
> +Date:		Jan 2025
> +KernelVersion:	6.13
> +Contact:	linux-edac@vger.kernel.org
> +Description:
> +		(WO) Start the memory repair operation for the memory attributes
> +		set. Return failure if resources are not available to
> +		perform repair.
> +		1 - issue repair operation.
> +		All other values are reserved.
> +		In some states of system configuration (e.g. before address
> +		decoders have been configured), memory devices (e.g. CXL)
> +		may not have an active mapping in the main host address
> +		physical address map. As such, the memory to repair must be
> +		identified by a device specific physical addressing scheme
> +		using a DPA. The DPA to use will be presented in related
> +		error records.

Yeh, this needs to be part of the interface and not hidden in some obscure
doc.

The whole repairing control needs to be explained somewhere so that the user
can make an informed decision and *actually* use this thing and not break out
a sweat when faced with those gazillion sysfs nodes which don't tell her
a whole lot.

And look into making that interface more user-friendly.

Thx.
Shiju Jose Nov. 4, 2024, 1:05 p.m. UTC | #2
Thanks for reviewing  the patches.
Please find reply inline.

>-----Original Message-----
>From: Borislav Petkov <bp@alien8.de>
>Sent: 04 November 2024 06:16
>To: Shiju Jose <shiju.jose@huawei.com>
>Cc: linux-edac@vger.kernel.org; linux-cxl@vger.kernel.org; linux-
>acpi@vger.kernel.org; linux-mm@kvack.org; linux-kernel@vger.kernel.org;
>tony.luck@intel.com; rafael@kernel.org; lenb@kernel.org;
>mchehab@kernel.org; dan.j.williams@intel.com; dave@stgolabs.net; Jonathan
>Cameron <jonathan.cameron@huawei.com>; gregkh@linuxfoundation.org;
>sudeep.holla@arm.com; jassisinghbrar@gmail.com; dave.jiang@intel.com;
>alison.schofield@intel.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
>david@redhat.com; Vilas.Sridharan@amd.com; leo.duran@amd.com;
>Yazen.Ghannam@amd.com; rientjes@google.com; jiaqiyan@google.com;
>Jon.Grimm@amd.com; dave.hansen@linux.intel.com;
>naoya.horiguchi@nec.com; james.morse@arm.com; jthoughton@google.com;
>somasundaram.a@hpe.com; erdemaktas@google.com; pgonda@google.com;
>duenwen@google.com; gthelen@google.com;
>wschwartz@amperecomputing.com; dferguson@amperecomputing.com;
>wbs@os.amperecomputing.com; nifan.cxl@gmail.com; tanxiaofei
><tanxiaofei@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>; Roberto
>Sassu <roberto.sassu@huawei.com>; kangkang.shen@futurewei.com;
>wanghuiqiang <wanghuiqiang@huawei.com>; Linuxarm
><linuxarm@huawei.com>
>Subject: Re: [PATCH v15 11/15] EDAC: Add memory repair control feature
>
>On Fri, Nov 01, 2024 at 09:17:29AM +0000, shiju.jose@huawei.com wrote:
>> From: Shiju Jose <shiju.jose@huawei.com>
>>
>> Add generic EDAC memory repair control, eg. PPR(Post Package Repair),
>> memory sparing etc, control driver in order to control memory repairs
>> in the system. Supports sPPR(soft PPR), hPPR(hard PPR), soft/hard
>> memory sparing, memory sparing at cacheline/row/bank/rank granularity etc.
>> Device with memory repair features registers with EDAC device driver,
>> which retrieves memory repair descriptor from EDAC memory repair
>> driver and exposes the sysfs repair control attributes to userspace in
>> /sys/bus/edac/devices/<dev-name>/mem_repairX/.
>>
>> The common memory repair control interface abstracts the control of
>> arbitrary memory repair functionality into a standardized set of functions.
>> The sysfs memory repair attribute nodes are only available if the
>> client driver has implemented the corresponding attribute callback
>> function and provided operations to the EDAC device driver during
>registration.
>
>What is the valid use case for this thing?

More detailed explanation of PPR and memory sparing and use cases was
added in Documentation/edac/memory_repair.rst, which is part of the last common
patch ("EDAC: Add documentation for RAS feature control") added for documentation
of various RAS features supported in this series. Was not sure the file to be part of this
patch or not.
>
>> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
>> ---
>>  .../ABI/testing/sysfs-edac-memory-repair      | 168 ++++++++
>>  drivers/edac/Makefile                         |   2 +-
>>  drivers/edac/edac_device.c                    |  32 ++
>>  drivers/edac/mem_repair.c                     | 367 ++++++++++++++++++
>>  include/linux/edac.h                          |  87 +++++
>>  5 files changed, 655 insertions(+), 1 deletion(-)  create mode 100644
>> Documentation/ABI/testing/sysfs-edac-memory-repair
>>  create mode 100755 drivers/edac/mem_repair.c
>>
>> diff --git a/Documentation/ABI/testing/sysfs-edac-memory-repair
>> b/Documentation/ABI/testing/sysfs-edac-memory-repair
>> new file mode 100644
>> index 000000000000..393206b8d418
>> --- /dev/null
>> +++ b/Documentation/ABI/testing/sysfs-edac-memory-repair
>> @@ -0,0 +1,168 @@
>> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX
>> +Date:		Jan 2025
>> +KernelVersion:	6.13
>> +Contact:	linux-edac@vger.kernel.org
>> +Description:
>> +		The sysfs EDAC bus devices /<dev-name>/mem_repairX
>subdirectory
>> +		pertains to the memory media repair features control, such as
>> +		PPR (Post Package Repair), memory sparing etc, where<dev-
>name>
>> +		directory corresponds to a device registered with the EDAC
>> +		device driver for the memory repair features.
>> +		Memory sparing is a repair function that replaces a portion
>> +		of memory (spared memory) with a portion of functional
>memory.
>
>a portion of *faulty* memory - this is the most important aspect here. You want
>to explain why you're doing the replacing in the first place. Memory which is
>going bad.
>
>This first paragraph is a good explanation:
>
>https://pubs.lenovo.com/sr850/memory_module_installation_order_sparing
>
>the first hit in a web search here.
Thanks. I will incorporate  these details.

>
>> +		Memory sparing has cacheline/row/bank/rank sparing
>> +		granularities.
>> +		The sysfs attributes nodes for a repair feature are only
>> +		present if the parent driver has implemented the corresponding
>> +		attr callback function and provided the necessary operations
>> +		to the EDAC device driver during registration.
>> +
>> +What:		/sys/bus/edac/devices/<dev-
>name>/mem_repairX/repair_type
>> +Date:		Jan 2025
>> +KernelVersion:	6.13
>> +Contact:	linux-edac@vger.kernel.org
>> +Description:
>> +		(RO) Type of the repair instance. For eg. sPPR, hPPR, cacheline/
>
>WTH is a "repair instance"? Can we pls make this human readable and not some
>crap spec language?
Sure.
>
>> +		row/bank/rank memory sparing etc.
>> +		0 - Soft PPR(sPPR)
>> +		1 - Hard PPR(hPPR)
>
>Write out those abbreviations.
Sure.
>
>> +		2 - Cacheline memory sparing
>> +		3 - Row memory sparing
>> +		4 - Bank memory sparing
>> +		5 - Rank memory sparing
>> +		All other values are reserved.
>> +
>> +What:		/sys/bus/edac/devices/<dev-
>name>/mem_repairX/persist_mode_avail
>> +Date:		Jan 2025
>> +KernelVersion:	6.13
>> +Contact:	linux-edac@vger.kernel.org
>> +Description:
>> +		(RO) Persist repair modes supported in the device.
>> +		0 - Soft PPR(sPPR)/soft memory sparing (temporary repair).
>> +		1 - Hard PPR(hPPR)/hard memory sparing (permanent repair).
>
>Those need a longer, user-friendly explanation.
I will update.
>
>> +		All other values are reserved.
>> +
>> +What:		/sys/bus/edac/devices/<dev-
>name>/mem_repairX/persist_mode
>> +Date:		Jan 2025
>> +KernelVersion:	6.13
>> +Contact:	linux-edac@vger.kernel.org
>> +Description:
>> +		(RW) Current persist repair mode.
>> +		0 - Soft PPR(sPPR)/soft memory sparing (temporary repair).
>> +		1 - Hard PPR(hPPR)/hard memory sparing (permanent repair).
>> +		All other values are reserved.
>
>You can kill persist_mode_avail by making this persist_mode return the
>available modes by reading it.
persist_mode used to readback the value of persist_mode presently set. 
For eg.  1 - soft memory sparing for a sparing instance, though the CXL memory device supports both
soft and hard sparing, which is configurable. 
persist_mode_avail used to return the temporary and permanent repair capability of the device.  
>
>> +
>> +What:		/sys/bus/edac/devices/<dev-
>name>/mem_repairX/dpa_support
>> +Date:		Jan 2025
>> +KernelVersion:	6.13
>> +Contact:	linux-edac@vger.kernel.org
>> +Description:
>> +		(RO) True if supports DPA for a repair operation.
>> +		E.g. PPR, memory sparing.
>
>DPA? What?!
>
I will update with Device Physical Address, which was actually added for the
next attribute and 'dpa'. 

>Why is *that* a sysfs file?
>
Also I will update here with more details which was given in the last part of this document about DPA. 
Some memory devices (For eg. a CXL memory device) may expect the Device Physical Address(DPA)
for a repair operation instead of Host Physical Address(HPA), because it may not have an active mapping
in the main host address physical address map.  'dpa_support' attribute used to return this info to the user.  
>> +
>> +What:		/sys/bus/edac/devices/<dev-
>name>/mem_repairX/repair_safe_when_in_use
>> +Date:		Jan 2025
>> +KernelVersion:	6.13
>> +Contact:	linux-edac@vger.kernel.org
>> +Description:
>> +		(RO) True if memory media is accessible and data is retained
>> +		during the memory repair operation.
>
>Ditto.
Sure.
>
>> +
>> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/hpa
>> +Date:		Jan 2025
>> +KernelVersion:	6.13
>> +Contact:	linux-edac@vger.kernel.org
>> +Description:
>> +		(RW) HPA (Host Physical Address) for memory repair.
>> +
>> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/dpa
>> +Date:		Jan 2025
>> +KernelVersion:	6.13
>> +Contact:	linux-edac@vger.kernel.org
>> +Description:
>> +		(RW) DPA (Device Physical Address) for memory repair.
>
>Both need more explanation.
Sure.
>
>> +
>> +What:		/sys/bus/edac/devices/<dev-
>name>/mem_repairX/nibble_mask
>> +Date:		Jan 2025
>> +KernelVersion:	6.13
>> +Contact:	linux-edac@vger.kernel.org
>> +Description:
>> +		(RW) Nibble mask for memory repair.
>
>What is that?
The nibble mask actually for CXL memory PPR and memory sparing operations,
which is reported by the device in DRAM Event Record and to the userspace in the
CXL DRAM trace event.
Please see the details from the spec.
I was not sure add or not these CXL specific details in this EDAC document.

CXL spec rev 3.1
1. Table 8-46. DRAM Event Record given details of nibble_mask as
"Nibble Mask: Identifies one or more nibbles in error on the memory bus producing
the event. Nibble Mask bit 0 shall be set if nibble 0 on the memory bus produced the
event, etc. This field should be valid for corrected memory errors." 
 
2. Table 8-103. sPPR Maintenance Input Payload and
8.2.9.7.1.3 hPPR Maintenance Operation
"Nibble Mask: Identifies one or more nibbles on the memory bus. Nibble
mapping is the same of DRAM Event Record nibble mapping, see
Table 8-46. A Nibble Mask bit set to 1 indicates the request to perform sPPR
operation in the specific device. All Nibble Mask bits set to 1 indicates the
request to perform the operation in all devices.
This field is ignored if the Nibble support flag in the sPPR Feature is cleared
to 0 (see Table 8-113), and the sPPR is performed in all devices."

3. Table 8-105. Memory Sparing Input Payload  
"Nibble Mask: This field may be used to specify one or more components to be spared.
See Table 8-46 for definition. It may be provided for all subclasses."

>
>> +
>> +What:		/sys/bus/edac/devices/<dev-
>name>/mem_repairX/bank_group
>> +Date:		Jan 2025
>> +KernelVersion:	6.13
>> +Contact:	linux-edac@vger.kernel.org
>> +Description:
>> +		(RW) Memory bank group for repair.
>
>Ditto.
Ok.
>
>> +
>> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/bank
>> +Date:		Jan 2025
>> +KernelVersion:	6.13
>> +Contact:	linux-edac@vger.kernel.org
>> +Description:
>> +		(RW) Memory bank for memory repair.
>> +
>> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/rank
>> +Date:		Jan 2025
>> +KernelVersion:	6.13
>> +Contact:	linux-edac@vger.kernel.org
>> +Description:
>> +		(RW) Memory rank for repair.
>> +
>> +What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/row
>> +Date:		Jan 2025
>> +KernelVersion:	6.13
>> +Contact:	linux-edac@vger.kernel.org
>> +Description:
>> +		(RW) Row for memory repair.
>> +
>> +What:		/sys/bus/edac/devices/<dev-
>name>/mem_repairX/column
>> +Date:		Jan 2025
>> +KernelVersion:	6.13
>> +Contact:	linux-edac@vger.kernel.org
>> +Description:
>> +		(RW) Column for memory repair.
>> +
>> +What:		/sys/bus/edac/devices/<dev-
>name>/mem_repairX/channel
>> +Date:		Jan 2025
>> +KernelVersion:	6.13
>> +Contact:	linux-edac@vger.kernel.org
>> +Description:
>> +		(RW) Channel for memory repair.
>> +
>> +What:		/sys/bus/edac/devices/<dev-
>name>/mem_repairX/sub_channel
>> +Date:		Jan 2025
>> +KernelVersion:	6.13
>> +Contact:	linux-edac@vger.kernel.org
>> +Description:
>> +		(RW) Sub-channel for memory repair.
>
>All those above: need better explanation and need a way of finding out how
Sure.   
>many of those are in each device so that the user can actually give a sensible
>value there.
The visibility of these control attributes to the user  in sysfs is decided by the
is_visible() callback in the EDAC, which in turn depends on a memory device
support or not the control of a repair attribute. 
>
>> +
>> +What:		/sys/bus/edac/devices/<dev-
>name>/mem_repairX/query
>> +Date:		Jan 2025
>> +KernelVersion:	6.13
>> +Contact:	linux-edac@vger.kernel.org
>> +Description:
>> +		(WO) Query whether the repair operation is supported for the
>> +		memory attributes set. Return failure if resources are
>> +		not available to perform repair.
>> +		1 - issue query.
>> +		All other values are reserved.
>
>What?!
>
>If anything this should be a "status".
This attribute used request to determine availability of resources for a repair operation
(For eg. memory PPR and sparing operation) for a given address and memory attributes set.
The device may return result for this request in different ways.
For example, in CXL device request query resource command for a,  
1. PPR operation returns resource availability as a return code of the command. 
2. memory sparing operation, the device will report the resource availability by producing a
Memory Sparing Event Record and  memory sparing trace event to the userspace.

May be 'dry-run' better name instead of query?
>
>> +
>> +What:		/sys/bus/edac/devices/<dev-
>name>/mem_repairX/repair
>> +Date:		Jan 2025
>> +KernelVersion:	6.13
>> +Contact:	linux-edac@vger.kernel.org
>> +Description:
>> +		(WO) Start the memory repair operation for the memory
>attributes
>> +		set. Return failure if resources are not available to
>> +		perform repair.
>> +		1 - issue repair operation.
>> +		All other values are reserved.
>> +		In some states of system configuration (e.g. before address
>> +		decoders have been configured), memory devices (e.g. CXL)
>> +		may not have an active mapping in the main host address
>> +		physical address map. As such, the memory to repair must be
>> +		identified by a device specific physical addressing scheme
>> +		using a DPA. The DPA to use will be presented in related
>> +		error records.
>
>Yeh, this needs to be part of the interface and not hidden in some obscure doc.
Adding this info in Documentation/edac/memory_repair.rst is sufficient? 
>
>The whole repairing control needs to be explained somewhere so that the user
>can make an informed decision and *actually* use this thing and not break out a
>sweat when faced with those gazillion sysfs nodes which don't tell her a whole
>lot.
The details of the repairing control was added in
Documentation/edac/memory_repair.rst, which is part of the common
patch ("EDAC: Add documentation for RAS feature control").
>
>And look into making that interface more user-friendly.
>
>Thx.
>
>
>--
>Regards/Gruss,
>    Boris.
>

Thanks,
Shiju
Fan Ni Nov. 8, 2024, 4:59 p.m. UTC | #3
On Fri, Nov 01, 2024 at 09:17:29AM +0000, shiju.jose@huawei.com wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
> 
> Add generic EDAC memory repair control, eg. PPR(Post Package Repair),
> memory sparing etc, control driver in order to control memory repairs
> in the system. Supports sPPR(soft PPR), hPPR(hard PPR), soft/hard memory
> sparing, memory sparing at cacheline/row/bank/rank granularity etc.
> Device with memory repair features registers with EDAC device driver,
> which retrieves memory repair descriptor from EDAC memory repair driver and
> exposes the sysfs repair control attributes to userspace in
> /sys/bus/edac/devices/<dev-name>/mem_repairX/.
> 
> The common memory repair control interface abstracts the control of
> arbitrary memory repair functionality into a standardized set of functions.
> The sysfs memory repair attribute nodes are only available if the client
> driver has implemented the corresponding attribute callback function and
> provided operations to the EDAC device driver during registration.
> 
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> ---
>  .../ABI/testing/sysfs-edac-memory-repair      | 168 ++++++++
>  drivers/edac/Makefile                         |   2 +-
>  drivers/edac/edac_device.c                    |  32 ++
>  drivers/edac/mem_repair.c                     | 367 ++++++++++++++++++
>  include/linux/edac.h                          |  87 +++++
>  5 files changed, 655 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/ABI/testing/sysfs-edac-memory-repair
>  create mode 100755 drivers/edac/mem_repair.c
> 
> diff --git a/Documentation/ABI/testing/sysfs-edac-memory-repair b/Documentation/ABI/testing/sysfs-edac-memory-repair
> new file mode 100644
> index 000000000000..393206b8d418
...
 @@ -610,6 +611,7 @@ int edac_dev_register(struct device *parent, char *name,
>  		      const struct edac_dev_feature *ras_features)
>  {
>  	const struct attribute_group **ras_attr_groups;
> +	int mem_repair_cnt = 0, mem_repair_inst = 0;
>  	int scrub_cnt = 0, scrub_inst = 0;
>  	struct edac_dev_data *dev_data;
>  	struct edac_dev_feat_ctx *ctx;
> @@ -626,6 +628,10 @@ int edac_dev_register(struct device *parent, char *name,
>  			attr_gcnt++;
>  			scrub_cnt++;
>  			break;
> +		case RAS_FEAT_MEM_REPAIR:
> +			attr_gcnt++;
> +			mem_repair_cnt++;
> +			break;
>  		case RAS_FEAT_ECS:
>  			attr_gcnt += ras_features[feat].ecs_info.num_media_frus;
>  			break;
> @@ -652,6 +658,14 @@ int edac_dev_register(struct device *parent, char *name,
>  		}
>  	}
>  
> +	if (mem_repair_cnt) {
> +		ctx->mem_repair = kcalloc(mem_repair_cnt, sizeof(*ctx->mem_repair), GFP_KERNEL);
> +		if (!ctx->mem_repair) {
> +			ret = -ENOMEM;
> +			goto groups_free;

If the function returns here, we will have a leak from memory pointed by ctx->scrub.

Fan
> +		}
> +	}
> +
>  	attr_gcnt = 0;
>  	for (feat = 0; feat < num_features; feat++, ras_features++) {
>  		switch (ras_features->ft_type) {
> @@ -686,6 +700,23 @@ int edac_dev_register(struct device *parent, char *name,
>  
>  			attr_gcnt += ras_features->ecs_info.num_media_frus;
>  			break;
> +		case RAS_FEAT_MEM_REPAIR:
> +			if (!ras_features->mem_repair_ops ||
> +			    mem_repair_inst != ras_features->instance)
> +				goto data_mem_free;
> +
> +			dev_data = &ctx->mem_repair[mem_repair_inst];
> +			dev_data->instance = mem_repair_inst;
> +			dev_data->mem_repair_ops = ras_features->mem_repair_ops;
> +			dev_data->private = ras_features->ctx;
> +			ret = edac_mem_repair_get_desc(parent, &ras_attr_groups[attr_gcnt],
> +						       ras_features->instance);
> +			if (ret)
> +				goto data_mem_free;
> +
> +			mem_repair_inst++;
> +			attr_gcnt++;
> +			break;
>  		default:
>  			ret = -EINVAL;
>  			goto data_mem_free;
> @@ -712,6 +743,7 @@ int edac_dev_register(struct device *parent, char *name,
>  	return devm_add_action_or_reset(parent, edac_dev_unreg, &ctx->dev);
>  
>  data_mem_free:
> +	kfree(ctx->mem_repair);
>  	kfree(ctx->scrub);
>  groups_free:
>  	kfree(ras_attr_groups);
> diff --git a/drivers/edac/mem_repair.c b/drivers/edac/mem_repair.c
> new file mode 100755
> index 000000000000..93246ad0c9eb
> --- /dev/null
> +++ b/drivers/edac/mem_repair.c
> @@ -0,0 +1,367 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * The generic EDAC memory repair driver is designed to control the memory
> + * devices with memory repair features, such as Post Package Repair (PPR),
> + * memory sparing etc. The common sysfs memory repair interface abstracts
> + * the control of various arbitrary memory repair functionalities into a
> + * unified set of functions.
> + *
> + * Copyright (c) 2024 HiSilicon Limited.
> + */
> +
> +#include <linux/edac.h>
> +
> +enum edac_mem_repair_attributes {
> +	MEM_REPAIR_TYPE,
> +	MEM_REPAIR_PERSIST_MODE_AVAIL,
> +	MEM_REPAIR_PERSIST_MODE,
> +	MEM_REPAIR_DPA_SUPPORT,
> +	MEM_REPAIR_SAFE_IN_USE,
> +	MEM_REPAIR_HPA,
> +	MEM_REPAIR_DPA,
> +	MEM_REPAIR_NIBBLE_MASK,
> +	MEM_REPAIR_BANK_GROUP,
> +	MEM_REPAIR_BANK,
> +	MEM_REPAIR_RANK,
> +	MEM_REPAIR_ROW,
> +	MEM_REPAIR_COLUMN,
> +	MEM_REPAIR_CHANNEL,
> +	MEM_REPAIR_SUB_CHANNEL,
> +	MEM_REPAIR_QUERY,
> +	MEM_DO_REPAIR,
> +	MEM_REPAIR_MAX_ATTRS
> +};
> +
> +struct edac_mem_repair_dev_attr {
> +	struct device_attribute dev_attr;
> +	u8 instance;
> +};
> +
> +struct edac_mem_repair_context {
> +	char name[EDAC_FEAT_NAME_LEN];
> +	struct edac_mem_repair_dev_attr mem_repair_dev_attr[MEM_REPAIR_MAX_ATTRS];
> +	struct attribute *mem_repair_attrs[MEM_REPAIR_MAX_ATTRS + 1];
> +	struct attribute_group group;
> +};
> +
> +#define TO_MEM_REPAIR_DEV_ATTR(_dev_attr)      \
> +		container_of(_dev_attr, struct edac_mem_repair_dev_attr, dev_attr)
> +
> +#define EDAC_MEM_REPAIR_ATTR_SHOW(attrib, cb, type, format)			\
> +static ssize_t attrib##_show(struct device *ras_feat_dev,			\
> +			     struct device_attribute *attr, char *buf)		\
> +{										\
> +	u8 inst = TO_MEM_REPAIR_DEV_ATTR(attr)->instance;			\
> +	struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);		\
> +	const struct edac_mem_repair_ops *ops =					\
> +				ctx->mem_repair[inst].mem_repair_ops;		\
> +	type data;								\
> +	int ret;								\
> +										\
> +	ret = ops->cb(ras_feat_dev->parent, ctx->mem_repair[inst].private,	\
> +		      &data);							\
> +	if (ret)								\
> +		return ret;							\
> +										\
> +	return sysfs_emit(buf, format, data);					\
> +}
> +
> +EDAC_MEM_REPAIR_ATTR_SHOW(repair_type, get_repair_type, u32, "%u\n")
> +EDAC_MEM_REPAIR_ATTR_SHOW(persist_mode, get_persist_mode, u32, "%u\n")
> +EDAC_MEM_REPAIR_ATTR_SHOW(dpa_support, get_dpa_support, u32, "%u\n")
> +EDAC_MEM_REPAIR_ATTR_SHOW(repair_safe_when_in_use, get_repair_safe_when_in_use, u32, "%u\n")
> +EDAC_MEM_REPAIR_ATTR_SHOW(hpa, get_hpa, u64, "0x%llx\n")
> +EDAC_MEM_REPAIR_ATTR_SHOW(dpa, get_dpa, u64, "0x%llx\n")
> +EDAC_MEM_REPAIR_ATTR_SHOW(nibble_mask, get_nibble_mask, u64, "0x%llx\n")
> +EDAC_MEM_REPAIR_ATTR_SHOW(bank_group, get_bank_group, u32, "%u\n")
> +EDAC_MEM_REPAIR_ATTR_SHOW(bank, get_bank, u32, "%u\n")
> +EDAC_MEM_REPAIR_ATTR_SHOW(rank, get_rank, u32, "%u\n")
> +EDAC_MEM_REPAIR_ATTR_SHOW(row, get_row, u64, "0x%llx\n")
> +EDAC_MEM_REPAIR_ATTR_SHOW(column, get_column, u32, "%u\n")
> +EDAC_MEM_REPAIR_ATTR_SHOW(channel, get_channel, u32, "%u\n")
> +EDAC_MEM_REPAIR_ATTR_SHOW(sub_channel, get_sub_channel, u32, "%u\n")
> +
> +#define EDAC_MEM_REPAIR_ATTR_STORE(attrib, cb, type, conv_func)			\
> +static ssize_t attrib##_store(struct device *ras_feat_dev,			\
> +			      struct device_attribute *attr,			\
> +			      const char *buf, size_t len)			\
> +{										\
> +	u8 inst = TO_MEM_REPAIR_DEV_ATTR(attr)->instance;			\
> +	struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);		\
> +	const struct edac_mem_repair_ops *ops =					\
> +				ctx->mem_repair[inst].mem_repair_ops;		\
> +	type data;								\
> +	int ret;								\
> +										\
> +	ret = conv_func(buf, 0, &data);						\
> +	if (ret < 0)								\
> +		return ret;							\
> +										\
> +	ret = ops->cb(ras_feat_dev->parent, ctx->mem_repair[inst].private,	\
> +		      data);							\
> +	if (ret)								\
> +		return ret;							\
> +										\
> +	return len;								\
> +}
> +
> +EDAC_MEM_REPAIR_ATTR_STORE(persist_mode, set_persist_mode, unsigned long, kstrtoul)
> +EDAC_MEM_REPAIR_ATTR_STORE(hpa, set_hpa, u64, kstrtou64)
> +EDAC_MEM_REPAIR_ATTR_STORE(dpa, set_dpa, u64, kstrtou64)
> +EDAC_MEM_REPAIR_ATTR_STORE(nibble_mask, set_nibble_mask, u64, kstrtou64)
> +EDAC_MEM_REPAIR_ATTR_STORE(bank_group, set_bank_group, unsigned long, kstrtoul)
> +EDAC_MEM_REPAIR_ATTR_STORE(bank, set_bank, unsigned long, kstrtoul)
> +EDAC_MEM_REPAIR_ATTR_STORE(rank, set_rank, unsigned long, kstrtoul)
> +EDAC_MEM_REPAIR_ATTR_STORE(row, set_row, u64, kstrtou64)
> +EDAC_MEM_REPAIR_ATTR_STORE(column, set_column, unsigned long, kstrtoul)
> +EDAC_MEM_REPAIR_ATTR_STORE(channel, set_channel, unsigned long, kstrtoul)
> +EDAC_MEM_REPAIR_ATTR_STORE(sub_channel, set_sub_channel, unsigned long, kstrtoul)
> +
> +#define EDAC_MEM_REPAIR_DO_OP(attrib, cb)						\
> +static ssize_t attrib##_store(struct device *ras_feat_dev,				\
> +			      struct device_attribute *attr,				\
> +			      const char *buf, size_t len)				\
> +{											\
> +	u8 inst = TO_MEM_REPAIR_DEV_ATTR(attr)->instance;				\
> +	struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);			\
> +	const struct edac_mem_repair_ops *ops = ctx->mem_repair[inst].mem_repair_ops;	\
> +	int ret;									\
> +											\
> +	ret = ops->cb(ras_feat_dev->parent, ctx->mem_repair[inst].private);		\
> +	if (ret)									\
> +		return ret;								\
> +											\
> +	return len;									\
> +}
> +
> +EDAC_MEM_REPAIR_DO_OP(query, do_query)
> +EDAC_MEM_REPAIR_DO_OP(repair, do_repair)
> +
> +static ssize_t persist_mode_avail_show(struct device *ras_feat_dev,
> +				       struct device_attribute *attr, char *buf)
> +{
> +	struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> +	u8 inst = TO_MEM_REPAIR_DEV_ATTR(attr)->instance;
> +	const struct edac_mem_repair_ops *ops = ctx->mem_repair[inst].mem_repair_ops;
> +
> +	return ops->get_persist_mode_avail(ras_feat_dev->parent,
> +					   ctx->mem_repair[inst].private, buf);
> +}
> +
> +static umode_t mem_repair_attr_visible(struct kobject *kobj, struct attribute *a, int attr_id)
> +{
> +	struct device *ras_feat_dev = kobj_to_dev(kobj);
> +	struct device_attribute *dev_attr = container_of(a, struct device_attribute, attr);
> +	struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> +	u8 inst = TO_MEM_REPAIR_DEV_ATTR(dev_attr)->instance;
> +	const struct edac_mem_repair_ops *ops = ctx->mem_repair[inst].mem_repair_ops;
> +
> +	switch (attr_id) {
> +	case MEM_REPAIR_TYPE:
> +		if (ops->get_repair_type)
> +			return a->mode;
> +		break;
> +	case MEM_REPAIR_PERSIST_MODE_AVAIL:
> +		if (ops->get_persist_mode_avail)
> +			return a->mode;
> +		break;
> +	case MEM_REPAIR_PERSIST_MODE:
> +		if (ops->get_persist_mode) {
> +			if (ops->set_persist_mode)
> +				return a->mode;
> +			else
> +				return 0444;
> +		}
> +		break;
> +	case MEM_REPAIR_DPA_SUPPORT:
> +		if (ops->get_dpa_support)
> +			return a->mode;
> +		break;
> +	case MEM_REPAIR_SAFE_IN_USE:
> +		if (ops->get_repair_safe_when_in_use)
> +			return a->mode;
> +		break;
> +	case MEM_REPAIR_HPA:
> +		if (ops->get_hpa) {
> +			if (ops->set_hpa)
> +				return a->mode;
> +			else
> +				return 0444;
> +		}
> +		break;
> +	case MEM_REPAIR_DPA:
> +		if (ops->get_dpa) {
> +			if (ops->set_dpa)
> +				return a->mode;
> +			else
> +				return 0444;
> +		}
> +		break;
> +	case MEM_REPAIR_NIBBLE_MASK:
> +		if (ops->get_nibble_mask) {
> +			if (ops->set_nibble_mask)
> +				return a->mode;
> +			else
> +				return 0444;
> +		}
> +		break;
> +	case MEM_REPAIR_BANK_GROUP:
> +		if (ops->get_bank_group) {
> +			if (ops->set_bank_group)
> +				return a->mode;
> +			else
> +				return 0444;
> +		}
> +		break;
> +	case MEM_REPAIR_BANK:
> +		if (ops->get_bank) {
> +			if (ops->set_bank)
> +				return a->mode;
> +			else
> +				return 0444;
> +		}
> +		break;
> +	case MEM_REPAIR_RANK:
> +		if (ops->get_rank) {
> +			if (ops->set_rank)
> +				return a->mode;
> +			else
> +				return 0444;
> +		}
> +		break;
> +	case MEM_REPAIR_ROW:
> +		if (ops->get_row) {
> +			if (ops->set_row)
> +				return a->mode;
> +			else
> +				return 0444;
> +		}
> +		break;
> +	case MEM_REPAIR_COLUMN:
> +		if (ops->get_column) {
> +			if (ops->set_column)
> +				return a->mode;
> +			else
> +				return 0444;
> +		}
> +		break;
> +	case MEM_REPAIR_CHANNEL:
> +		if (ops->get_channel) {
> +			if (ops->set_channel)
> +				return a->mode;
> +			else
> +				return 0444;
> +		}
> +		break;
> +	case MEM_REPAIR_SUB_CHANNEL:
> +		if (ops->get_sub_channel) {
> +			if (ops->set_sub_channel)
> +				return a->mode;
> +			else
> +				return 0444;
> +		}
> +		break;
> +	case MEM_REPAIR_QUERY:
> +		if (ops->do_query)
> +			return a->mode;
> +		break;
> +	case MEM_DO_REPAIR:
> +		if (ops->do_repair)
> +			return a->mode;
> +		break;
> +	default:
> +		break;
> +	}
> +
> +	return 0;
> +}
> +
> +#define EDAC_MEM_REPAIR_ATTR_RO(_name, _instance)       \
> +	((struct edac_mem_repair_dev_attr) { .dev_attr = __ATTR_RO(_name), \
> +					     .instance = _instance })
> +
> +#define EDAC_MEM_REPAIR_ATTR_WO(_name, _instance)       \
> +	((struct edac_mem_repair_dev_attr) { .dev_attr = __ATTR_WO(_name), \
> +					     .instance = _instance })
> +
> +#define EDAC_MEM_REPAIR_ATTR_RW(_name, _instance)       \
> +	((struct edac_mem_repair_dev_attr) { .dev_attr = __ATTR_RW(_name), \
> +					     .instance = _instance })
> +
> +static int mem_repair_create_desc(struct device *dev,
> +				  const struct attribute_group **attr_groups,
> +				  u8 instance)
> +{
> +	struct edac_mem_repair_context *ctx;
> +	struct attribute_group *group;
> +	int i;
> +	struct edac_mem_repair_dev_attr dev_attr[] = {
> +		[MEM_REPAIR_TYPE] = EDAC_MEM_REPAIR_ATTR_RO(repair_type,
> +							    instance),
> +		[MEM_REPAIR_PERSIST_MODE_AVAIL] =
> +				EDAC_MEM_REPAIR_ATTR_RO(persist_mode_avail,
> +							instance),
> +		[MEM_REPAIR_PERSIST_MODE] =
> +				EDAC_MEM_REPAIR_ATTR_RW(persist_mode, instance),
> +		[MEM_REPAIR_DPA_SUPPORT] =
> +				EDAC_MEM_REPAIR_ATTR_RO(dpa_support, instance),
> +		[MEM_REPAIR_SAFE_IN_USE] =
> +				EDAC_MEM_REPAIR_ATTR_RO(repair_safe_when_in_use,
> +							instance),
> +		[MEM_REPAIR_HPA] = EDAC_MEM_REPAIR_ATTR_RW(hpa, instance),
> +		[MEM_REPAIR_DPA] = EDAC_MEM_REPAIR_ATTR_RW(dpa, instance),
> +		[MEM_REPAIR_NIBBLE_MASK] =
> +				EDAC_MEM_REPAIR_ATTR_RW(nibble_mask, instance),
> +		[MEM_REPAIR_BANK_GROUP] =
> +				EDAC_MEM_REPAIR_ATTR_RW(bank_group, instance),
> +		[MEM_REPAIR_BANK] = EDAC_MEM_REPAIR_ATTR_RW(bank, instance),
> +		[MEM_REPAIR_RANK] = EDAC_MEM_REPAIR_ATTR_RW(rank, instance),
> +		[MEM_REPAIR_ROW] = EDAC_MEM_REPAIR_ATTR_RW(row, instance),
> +		[MEM_REPAIR_COLUMN] = EDAC_MEM_REPAIR_ATTR_RW(column, instance),
> +		[MEM_REPAIR_CHANNEL] = EDAC_MEM_REPAIR_ATTR_RW(channel, instance),
> +		[MEM_REPAIR_SUB_CHANNEL] =
> +				EDAC_MEM_REPAIR_ATTR_RW(sub_channel, instance),
> +		[MEM_REPAIR_QUERY] = EDAC_MEM_REPAIR_ATTR_WO(query, instance),
> +		[MEM_DO_REPAIR] = EDAC_MEM_REPAIR_ATTR_WO(repair, instance)
> +	};
> +
> +	ctx = devm_kzalloc(dev, sizeof(*ctx), GFP_KERNEL);
> +	if (!ctx)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < MEM_REPAIR_MAX_ATTRS; i++) {
> +		memcpy(&ctx->mem_repair_dev_attr[i].dev_attr,
> +		       &dev_attr[i], sizeof(dev_attr[i]));
> +		ctx->mem_repair_attrs[i] =
> +				&ctx->mem_repair_dev_attr[i].dev_attr.attr;
> +	}
> +
> +	sprintf(ctx->name, "%s%d", "mem_repair", instance);
> +	group = &ctx->group;
> +	group->name = ctx->name;
> +	group->attrs = ctx->mem_repair_attrs;
> +	group->is_visible  = mem_repair_attr_visible;
> +	attr_groups[0] = group;
> +
> +	return 0;
> +}
> +
> +/**
> + * edac_mem_repair_get_desc - get EDAC memory repair descriptors
> + * @dev: client device with memory repair feature
> + * @attr_groups: pointer to attribute group container
> + * @instance: device's memory repair instance number.
> + *
> + * Return:
> + *  * %0	- Success.
> + *  * %-EINVAL	- Invalid parameters passed.
> + *  * %-ENOMEM	- Dynamic memory allocation failed.
> + */
> +int edac_mem_repair_get_desc(struct device *dev,
> +			     const struct attribute_group **attr_groups, u8 instance)
> +{
> +	if (!dev || !attr_groups)
> +		return -EINVAL;
> +
> +	return mem_repair_create_desc(dev, attr_groups, instance);
> +}
> diff --git a/include/linux/edac.h b/include/linux/edac.h
> index 04385b1a9283..b52730d63088 100644
> --- a/include/linux/edac.h
> +++ b/include/linux/edac.h
> @@ -670,6 +670,7 @@ static inline struct dimm_info *edac_get_dimm(struct mem_ctl_info *mci,
>  enum edac_dev_feat {
>  	RAS_FEAT_SCRUB,
>  	RAS_FEAT_ECS,
> +	RAS_FEAT_MEM_REPAIR,
>  	RAS_FEAT_MAX
>  };
>  
> @@ -731,11 +732,95 @@ int edac_ecs_get_desc(struct device *ecs_dev,
>  		      const struct attribute_group **attr_groups,
>  		      u16 num_media_frus);
>  
> +enum edac_mem_repair_type {
> +	EDAC_TYPE_SPPR,
> +	EDAC_TYPE_HPPR,
> +	EDAC_TYPE_CACHELINE_MEM_SPARING,
> +	EDAC_TYPE_ROW_MEM_SPARING,
> +	EDAC_TYPE_BANK_MEM_SPARING,
> +	EDAC_TYPE_RANK_MEM_SPARING,
> +};
> +
> +enum edac_mem_repair_persist_mode {
> +	EDAC_MEM_REPAIR_SOFT, /* soft memory repair */
> +	EDAC_MEM_REPAIR_HARD, /* hard memory repair */
> +};
> +
> +/**
> + * struct edac_mem_repair_ops - memory repair device operations
> + * (all elements optional)
> + * @get_repair_type: get the memory repair type, listed in enum edac_mem_repair_type.
> + * @get_persist_mode_avail: get the persist modes supported in the device.
> + * @get_persist_mode: get the persist mode of the memory repair instance.
> + * @set_persist_mode: set the persist mode for the memory repair instance.
> + * @get_dpa_support: get dpa support flag.
> + * @get_repair_safe_when_in_use: get whether memory media is accessible and
> + *			       data is retained during repair operation.
> + * @get_hpa: get HPA for memory repair.
> + * @set_hpa: set HPA for memory repair.
> + * @get_dpa: get DPA for memory repair.
> + * @set_dpa: set DPA for memory repair.
> + * @get_nibble_mask: get nibble mask for memory repair.
> + * @set_nibble_mask: set nibble mask for memory repair.
> + * @get_bank_group: get bank group for memory repair.
> + * @set_bank_group: set bank group for memory repair.
> + * @get_bank: get bank for memory repair.
> + * @set_bank: set bank for memory repair.
> + * @get_rank: get rank for memory repair.
> + * @set_rank: set rank for memory repair.
> + * @get_row: get row for memory repair.
> + * @set_row: set row for memory repair.
> + * @get_column: get column for memory repair.
> + * @set_column: set column for memory repair.
> + * @get_channel: get channel for memory repair.
> + * @set_channel: set channel for memory repair.
> + * @get_sub_channel: get sub channel for memory repair.
> + * @set_sub_channel: set sub channel for memory repair.
> + * @do_query: Query memory repair operation for the HPA/DPA/other attrs set
> + *	      is supported or not.
> + * @do_repair: start memory repair operation for the HPA/DPA/other attrs set.
> + */
> +struct edac_mem_repair_ops {
> +	int (*get_repair_type)(struct device *dev, void *drv_data, u32 *val);
> +	int (*get_persist_mode_avail)(struct device *dev, void *drv_data, char *buf);
> +	int (*get_persist_mode)(struct device *dev, void *drv_data, u32 *mode);
> +	int (*set_persist_mode)(struct device *dev, void *drv_data, u32 mode);
> +	int (*get_dpa_support)(struct device *dev, void *drv_data, u32 *val);
> +	int (*get_repair_safe_when_in_use)(struct device *dev, void *drv_data, u32 *val);
> +	int (*get_hpa)(struct device *dev, void *drv_data, u64 *hpa);
> +	int (*set_hpa)(struct device *dev, void *drv_data, u64 hpa);
> +	int (*get_dpa)(struct device *dev, void *drv_data, u64 *dpa);
> +	int (*set_dpa)(struct device *dev, void *drv_data, u64 dpa);
> +	int (*get_nibble_mask)(struct device *dev, void *drv_data, u64 *val);
> +	int (*set_nibble_mask)(struct device *dev, void *drv_data, u64 val);
> +	int (*get_bank_group)(struct device *dev, void *drv_data, u32 *val);
> +	int (*set_bank_group)(struct device *dev, void *drv_data, u32 val);
> +	int (*get_bank)(struct device *dev, void *drv_data, u32 *val);
> +	int (*set_bank)(struct device *dev, void *drv_data, u32 val);
> +	int (*get_rank)(struct device *dev, void *drv_data, u32 *val);
> +	int (*set_rank)(struct device *dev, void *drv_data, u32 val);
> +	int (*get_row)(struct device *dev, void *drv_data, u64 *val);
> +	int (*set_row)(struct device *dev, void *drv_data, u64 val);
> +	int (*get_column)(struct device *dev, void *drv_data, u32 *val);
> +	int (*set_column)(struct device *dev, void *drv_data, u32 val);
> +	int (*get_channel)(struct device *dev, void *drv_data, u32 *val);
> +	int (*set_channel)(struct device *dev, void *drv_data, u32 val);
> +	int (*get_sub_channel)(struct device *dev, void *drv_data, u32 *val);
> +	int (*set_sub_channel)(struct device *dev, void *drv_data, u32 val);
> +	int (*do_query)(struct device *dev, void *drv_data);
> +	int (*do_repair)(struct device *dev, void *drv_data);
> +};
> +
> +int edac_mem_repair_get_desc(struct device *dev,
> +			     const struct attribute_group **attr_groups,
> +			     u8 instance);
> +
>  /* EDAC device feature information structure */
>  struct edac_dev_data {
>  	union {
>  		const struct edac_scrub_ops *scrub_ops;
>  		const struct edac_ecs_ops *ecs_ops;
> +		const struct edac_mem_repair_ops *mem_repair_ops;
>  	};
>  	u8 instance;
>  	void *private;
> @@ -746,6 +831,7 @@ struct edac_dev_feat_ctx {
>  	void *private;
>  	struct edac_dev_data *scrub;
>  	struct edac_dev_data ecs;
> +	struct edac_dev_data *mem_repair;
>  };
>  
>  struct edac_dev_feature {
> @@ -754,6 +840,7 @@ struct edac_dev_feature {
>  	union {
>  		const struct edac_scrub_ops *scrub_ops;
>  		const struct edac_ecs_ops *ecs_ops;
> +		const struct edac_mem_repair_ops *mem_repair_ops;
>  	};
>  	void *ctx;
>  	struct edac_ecs_ex_info ecs_info;
> -- 
> 2.34.1
>
Borislav Petkov Nov. 11, 2024, 11:28 a.m. UTC | #4
On Mon, Nov 04, 2024 at 01:05:31PM +0000, Shiju Jose wrote:
> More detailed explanation of PPR and memory sparing and use cases was added
> in Documentation/edac/memory_repair.rst, which is part of the last common
> patch ("EDAC: Add documentation for RAS feature control") added for
> documentation of various RAS features supported in this series. Was not sure
> the file to be part of this patch or not.

If the commit message doesn't contain a justification for a patch's existence,
why do you even bother sending it?

IOW, no redirections pls - just state here what the use case is in short. You
can always go nuts into details in the docs.

> persist_mode used to readback the value of persist_mode presently set.  For
> eg.  1 - soft memory sparing for a sparing instance, though the CXL memory
> device supports both soft and hard sparing, which is configurable.
> persist_mode_avail used to return the temporary and permanent repair
> capability of the device.  

Wait, sysfs does a one value per file thing. What does persist_mode_avail
give?

Surely you can't dump a list of all available modes...

From that doc:

root@localhost:~# cat /sys/bus/edac/devices/cxl_mem0/mem_repair0/persist_mode_avail
0

Does that mean only sPPR is available?

If only one mode is available, why am I even querying things? There's no other
option.

Catch my drift?

> Also I will update here with more details which was given in the last part
> of this document about DPA.  Some memory devices (For eg. a CXL memory
> device) may expect the Device Physical Address(DPA) for a repair operation
> instead of Host Physical Address(HPA), because it may not have an active
> mapping in the main host address physical address map.  'dpa_support'
> attribute used to return this info to the user.  

All this stuff needs to be documented properly and especially how one is
supposed to use this interface. Not have people go read CXL specs just to be
able to even try to use this. I'd like to see clear steps in the docs what to
do and what they mean.

> The nibble mask actually for CXL memory PPR and memory sparing operations,
> which is reported by the device in DRAM Event Record and to the userspace in the
> CXL DRAM trace event.
> Please see the details from the spec.

This is *exactly* what I mean!

If I have to see the spec in order to use an interface, than that's a major
fail.

> I was not sure add or not these CXL specific details in this EDAC document.

So that document should contain enough info on how to use the interface. You
can always put links to the spec giving people further reading but some
initial how-do-I-use-this-damn-thing example should be there so that people
can find their way around this.

> The visibility of these control attributes to the user  in sysfs is decided
> by the is_visible() callback in the EDAC, which in turn depends on a memory
> device support or not the control of a repair attribute. 

That still doesn't answer my question: what are valid values I can put in all
those?

Try as many as I can until one sticks?

This is not a good interface.

And since sysfs does one-value-per-file, dumping ranges here is kinda wrong.

> This attribute used request to determine availability of resources for a repair operation
> (For eg. memory PPR and sparing operation) for a given address and memory attributes set.
> The device may return result for this request in different ways.
> For example, in CXL device request query resource command for a,  
> 1. PPR operation returns resource availability as a return code of the command. 
> 2. memory sparing operation, the device will report the resource availability by producing a
> Memory Sparing Event Record and  memory sparing trace event to the userspace.
> 
> May be 'dry-run' better name instead of query?

Maybe this should not exist at all: my simple thinking would say that
determining whether resources are available should be part of the actual
repair operation. If none are there, it should return "no resources
available". If there are, it should simply use them and do the repair.

Exposing this as an explicit step sounds silly.

> >Yeh, this needs to be part of the interface and not hidden in some obscure doc.
> Adding this info in Documentation/edac/memory_repair.rst is sufficient? 

Yap, for example. You can always concentrate the whole documentation there and
point to it from everywhere else.

> The details of the repairing control was added in
> Documentation/edac/memory_repair.rst, which is part of the common
> patch ("EDAC: Add documentation for RAS feature control").

Ok, point to it pls in this doc so that people can find it.

Thx.
Shiju Jose Nov. 11, 2024, 4:54 p.m. UTC | #5
>-----Original Message-----
>From: Borislav Petkov <bp@alien8.de>
>Sent: 11 November 2024 11:28
>To: Shiju Jose <shiju.jose@huawei.com>
>Cc: linux-edac@vger.kernel.org; linux-cxl@vger.kernel.org; linux-
>acpi@vger.kernel.org; linux-mm@kvack.org; linux-kernel@vger.kernel.org;
>tony.luck@intel.com; rafael@kernel.org; lenb@kernel.org;
>mchehab@kernel.org; dan.j.williams@intel.com; dave@stgolabs.net; Jonathan
>Cameron <jonathan.cameron@huawei.com>; gregkh@linuxfoundation.org;
>sudeep.holla@arm.com; jassisinghbrar@gmail.com; dave.jiang@intel.com;
>alison.schofield@intel.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
>david@redhat.com; Vilas.Sridharan@amd.com; leo.duran@amd.com;
>Yazen.Ghannam@amd.com; rientjes@google.com; jiaqiyan@google.com;
>Jon.Grimm@amd.com; dave.hansen@linux.intel.com;
>naoya.horiguchi@nec.com; james.morse@arm.com; jthoughton@google.com;
>somasundaram.a@hpe.com; erdemaktas@google.com; pgonda@google.com;
>duenwen@google.com; gthelen@google.com;
>wschwartz@amperecomputing.com; dferguson@amperecomputing.com;
>wbs@os.amperecomputing.com; nifan.cxl@gmail.com; tanxiaofei
><tanxiaofei@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>; Roberto
>Sassu <roberto.sassu@huawei.com>; kangkang.shen@futurewei.com;
>wanghuiqiang <wanghuiqiang@huawei.com>; Linuxarm
><linuxarm@huawei.com>
>Subject: Re: [PATCH v15 11/15] EDAC: Add memory repair control feature
>
>On Mon, Nov 04, 2024 at 01:05:31PM +0000, Shiju Jose wrote:
>> More detailed explanation of PPR and memory sparing and use cases was
>> added in Documentation/edac/memory_repair.rst, which is part of the
>> last common patch ("EDAC: Add documentation for RAS feature control")
>> added for documentation of various RAS features supported in this
>> series. Was not sure the file to be part of this patch or not.
>
>If the commit message doesn't contain a justification for a patch's existence,
>why do you even bother sending it?
Sure. I updated the commit message with example CXL use cases.

>
>IOW, no redirections pls - just state here what the use case is in short. You can
>always go nuts into details in the docs.
>
>> persist_mode used to readback the value of persist_mode presently set.
>> For eg.  1 - soft memory sparing for a sparing instance, though the
>> CXL memory device supports both soft and hard sparing, which is configurable.
>> persist_mode_avail used to return the temporary and permanent repair
>> capability of the device.
>
>Wait, sysfs does a one value per file thing. What does persist_mode_avail give?

Presently, 0 (soft memory repair) and 1 (hard memory repair),  depends on
which mode/s a memory device is supported.  
>
>Surely you can't dump a list of all available modes...
>
>From that doc:
>
>root@localhost:~# cat
>/sys/bus/edac/devices/cxl_mem0/mem_repair0/persist_mode_avail
>0
>
>Does that mean only sPPR is available?
This example was from the CXL soft PPR feature for which persistent mode is non-configurable, as soft repair.
For CXL hard PPR feature also persistent mode is non-configurable, as hard repair.
Thus presently for CXL PPR features, persist_mode_avail is not required.
But there may be some non-CXL memory devices with runtime configurable persistent mode
for PPR feature. 
However for CXL memory sparing feature, the persistent mode is configurable at runtime
for a memory sparing instance, thus both soft and hard sparing are supported.
Example given for CXL memory sparing feature in Documentation/edac/memory_repair.rst,
root@localhost:~# cat /sys/bus/edac/devices/cxl_mem0/mem_repair1/persist_mode_avail
0,1

Kernel sysfs doc mentioned about array of values as follows, though not seen much examples. 
https://docs.kernel.org/filesystems/sysfs.html
"Attributes should be ASCII text files, preferably with only one value per file. It is noted that
it may not be efficient to contain only one value per file, so it is socially acceptable to express
an array of values of the same type."
>
>If only one mode is available, why am I even querying things? There's no other
>option.

>
>Catch my drift?
>
>> Also I will update here with more details which was given in the last
>> part of this document about DPA.  Some memory devices (For eg. a CXL
>> memory
>> device) may expect the Device Physical Address(DPA) for a repair
>> operation instead of Host Physical Address(HPA), because it may not
>> have an active mapping in the main host address physical address map.
>'dpa_support'
>> attribute used to return this info to the user.
>
>All this stuff needs to be documented properly and especially how one is
>supposed to use this interface. Not have people go read CXL specs just to be able
>to even try to use this. I'd like to see clear steps in the docs what to do and what
>they mean.

Sure.

>
>> The nibble mask actually for CXL memory PPR and memory sparing
>> operations, which is reported by the device in DRAM Event Record and
>> to the userspace in the CXL DRAM trace event.
>> Please see the details from the spec.
>
>This is *exactly* what I mean!
>
>If I have to see the spec in order to use an interface, than that's a major fail.
>
>> I was not sure add or not these CXL specific details in this EDAC document.
>
>So that document should contain enough info on how to use the interface. You
>can always put links to the spec giving people further reading but some initial
>how-do-I-use-this-damn-thing example should be there so that people can find
>their way around this.
>
>> The visibility of these control attributes to the user  in sysfs is
>> decided by the is_visible() callback in the EDAC, which in turn
>> depends on a memory device support or not the control of a repair attribute.
>
>That still doesn't answer my question: what are valid values I can put in all
>those?

The values of these attributes are specific to device and portion of the memory to repair. 
For example, In CXL repair features,
CXL memory device identifies a failure on a memory component, device provides the corresponding
values of the attributes (DPA, channel, rank, nibble mask, bank group, bank, row, column or sub-channel etc)
in an event record to the host and to the userspace in the corresponding trace event.  
Userspace shall use these values for the query resource availability and repair operations.

>
>Try as many as I can until one sticks?
>
>This is not a good interface.
>
>And since sysfs does one-value-per-file, dumping ranges here is kinda wrong.
>
>> This attribute used request to determine availability of resources for
>> a repair operation (For eg. memory PPR and sparing operation) for a given
>address and memory attributes set.
>> The device may return result for this request in different ways.
>> For example, in CXL device request query resource command for a, 1.
>> PPR operation returns resource availability as a return code of the command.
>> 2. memory sparing operation, the device will report the resource
>> availability by producing a Memory Sparing Event Record and  memory
>sparing trace event to the userspace.
>>
>> May be 'dry-run' better name instead of query?
>
>Maybe this should not exist at all: my simple thinking would say that
>determining whether resources are available should be part of the actual repair
>operation. If none are there, it should return "no resources available". If there
>are, it should simply use them and do the repair.

This will work for the CXL PPR feature where the result of the query operation for resources  availability
return to the command, however for the CXL memory sparing features,  the result of the query resources 
availability command returned later in a Memory Sparing Event Record from the device. 
Userspace shall issue repair operation with the attributes values received on the Memory Sparing trace event.
Thus for the CXL memory sparing feature, query for resources availability and repair operation 
cannot be combined.

>
>Exposing this as an explicit step sounds silly.
>> >Yeh, this needs to be part of the interface and not hidden in some obscure
>doc.
>> Adding this info in Documentation/edac/memory_repair.rst is sufficient?
>
>Yap, for example. You can always concentrate the whole documentation there
>and point to it from everywhere else.

I merged the corresponding documentations into individual patches for better readability and
to avoid confusion.

>
>> The details of the repairing control was added in
>> Documentation/edac/memory_repair.rst, which is part of the common
>> patch ("EDAC: Add documentation for RAS feature control").
>
>Ok, point to it pls in this doc so that people can find it.
>
>Thx.
>
>--
>Regards/Gruss,
>    Boris.
>
>https://people.kernel.org/tglx/notes-about-netiquette

Thanks,
Shiju
Shiju Jose Nov. 11, 2024, 5:01 p.m. UTC | #6
>-----Original Message-----
>From: Fan Ni <nifan.cxl@gmail.com>
>Sent: 08 November 2024 16:59
>To: Shiju Jose <shiju.jose@huawei.com>
>Cc: linux-edac@vger.kernel.org; linux-cxl@vger.kernel.org; linux-
>acpi@vger.kernel.org; linux-mm@kvack.org; linux-kernel@vger.kernel.org;
>bp@alien8.de; tony.luck@intel.com; rafael@kernel.org; lenb@kernel.org;
>mchehab@kernel.org; dan.j.williams@intel.com; dave@stgolabs.net; Jonathan
>Cameron <jonathan.cameron@huawei.com>; gregkh@linuxfoundation.org;
>sudeep.holla@arm.com; jassisinghbrar@gmail.com; dave.jiang@intel.com;
>alison.schofield@intel.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
>david@redhat.com; Vilas.Sridharan@amd.com; leo.duran@amd.com;
>Yazen.Ghannam@amd.com; rientjes@google.com; jiaqiyan@google.com;
>Jon.Grimm@amd.com; dave.hansen@linux.intel.com;
>naoya.horiguchi@nec.com; james.morse@arm.com; jthoughton@google.com;
>somasundaram.a@hpe.com; erdemaktas@google.com; pgonda@google.com;
>duenwen@google.com; gthelen@google.com;
>wschwartz@amperecomputing.com; dferguson@amperecomputing.com;
>wbs@os.amperecomputing.com; nifan.cxl@gmail.com; tanxiaofei
><tanxiaofei@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>; Roberto
>Sassu <roberto.sassu@huawei.com>; kangkang.shen@futurewei.com;
>wanghuiqiang <wanghuiqiang@huawei.com>; Linuxarm
><linuxarm@huawei.com>
>Subject: Re: [PATCH v15 11/15] EDAC: Add memory repair control feature
>
>On Fri, Nov 01, 2024 at 09:17:29AM +0000, shiju.jose@huawei.com wrote:
>> From: Shiju Jose <shiju.jose@huawei.com>
>>
>> Add generic EDAC memory repair control, eg. PPR(Post Package Repair),
>> memory sparing etc, control driver in order to control memory repairs
>> in the system. Supports sPPR(soft PPR), hPPR(hard PPR), soft/hard
>> memory sparing, memory sparing at cacheline/row/bank/rank granularity etc.
>> Device with memory repair features registers with EDAC device driver,
>> which retrieves memory repair descriptor from EDAC memory repair
>> driver and exposes the sysfs repair control attributes to userspace in
>> /sys/bus/edac/devices/<dev-name>/mem_repairX/.
>>
>> The common memory repair control interface abstracts the control of
>> arbitrary memory repair functionality into a standardized set of functions.
>> The sysfs memory repair attribute nodes are only available if the
>> client driver has implemented the corresponding attribute callback
>> function and provided operations to the EDAC device driver during
>registration.
>>
>> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
>> ---
>>  .../ABI/testing/sysfs-edac-memory-repair      | 168 ++++++++
>>  drivers/edac/Makefile                         |   2 +-
>>  drivers/edac/edac_device.c                    |  32 ++
>>  drivers/edac/mem_repair.c                     | 367 ++++++++++++++++++
>>  include/linux/edac.h                          |  87 +++++
>>  5 files changed, 655 insertions(+), 1 deletion(-)  create mode 100644
>> Documentation/ABI/testing/sysfs-edac-memory-repair
>>  create mode 100755 drivers/edac/mem_repair.c
>>
>> diff --git a/Documentation/ABI/testing/sysfs-edac-memory-repair
>> b/Documentation/ABI/testing/sysfs-edac-memory-repair
>> new file mode 100644
>> index 000000000000..393206b8d418
[...]
>>
>> +	if (mem_repair_cnt) {
>> +		ctx->mem_repair = kcalloc(mem_repair_cnt, sizeof(*ctx-
>>mem_repair), GFP_KERNEL);
>> +		if (!ctx->mem_repair) {
>> +			ret = -ENOMEM;
>> +			goto groups_free;
>
>If the function returns here, we will have a leak from memory pointed by ctx-
>>scrub.
Thanks Fan for reporting. 
Fixed.
>
>Fan
>> +		}
>> +	}
>> +
[...]
>--
>Fan Ni

Thanks,
Shiju
Borislav Petkov Nov. 14, 2024, 1:32 p.m. UTC | #7
On Mon, Nov 11, 2024 at 04:54:48PM +0000, Shiju Jose wrote:
> Presently, 0 (soft memory repair) and 1 (hard memory repair),  depends on
> which mode/s a memory device is supported. 

What if the device supports more than one mode?

> However for CXL memory sparing feature, the persistent mode is configurable at runtime
> for a memory sparing instance, thus both soft and hard sparing are supported.
> Example given for CXL memory sparing feature in Documentation/edac/memory_repair.rst,
> root@localhost:~# cat /sys/bus/edac/devices/cxl_mem0/mem_repair1/persist_mode_avail
> 0,1

Ok, and how is the user supposed to know what those mean?

> Kernel sysfs doc mentioned about array of values as follows, though not seen much examples. 
> https://docs.kernel.org/filesystems/sysfs.html
> "Attributes should be ASCII text files, preferably with only one value per file. It is noted that
> it may not be efficient to contain only one value per file, so it is socially acceptable to express
> an array of values of the same type."

True story. Ok, so there's an exception to that rule.

> The values of these attributes are specific to device and portion of the memory to repair. 
> For example, In CXL repair features,
> CXL memory device identifies a failure on a memory component, device provides the corresponding
> values of the attributes (DPA, channel, rank, nibble mask, bank group, bank, row, column or sub-channel etc)
> in an event record to the host and to the userspace in the corresponding trace event.  
> Userspace shall use these values for the query resource availability and repair operations.

I don't think you're answering my question. Lemme try again:

I am on a machine with such an interface. I do

echo 0xdeadbeef > /sys/devices...
-EINVAL

echo 0xface > ...
-EINVAL

How do I know what the allowed ranges are?

> This will work for the CXL PPR feature where the result of the query operation for resources  availability
> return to the command, however for the CXL memory sparing features,  the result of the query resources 
> availability command returned later in a Memory Sparing Event Record from the device. 
> Userspace shall issue repair operation with the attributes values received on the Memory Sparing trace event.
> Thus for the CXL memory sparing feature, query for resources availability and repair operation 
> cannot be combined.

What happens if the resources availability changes between the query and the
start of the repair operation?

The cat catches fire?
Jonathan Cameron Nov. 15, 2024, 12:14 p.m. UTC | #8
Hi Borislav,

I'll just jump in on one element.

> > This will work for the CXL PPR feature where the result of the query operation for resources  availability
> > return to the command, however for the CXL memory sparing features,  the result of the query resources 
> > availability command returned later in a Memory Sparing Event Record from the device. 
> > Userspace shall issue repair operation with the attributes values received on the Memory Sparing trace event.
> > Thus for the CXL memory sparing feature, query for resources availability and repair operation 
> > cannot be combined.  
> 
> What happens if the resources availability changes between the query and the
> start of the repair operation?
>
Short answer, you get an error return. 

The query is an optional step / optimization. You can just skip it.
There is no point in  querying if you are going to immediately issue the command to repair
(as that will report an error if you can't do it). 

A typical flow where it might be useful is:
1) Lots of corrected errors reported on a particular part of the memory.
2) OS decides enough is enough, that row/bank/nibble should be replaced.
3) Before doing so it checks it can actually replace it - otherwise maybe we will be disrupting a
   gigantic page or similar where the perf cost of just off lining is higher than we want.
4) After query the page is offlined etc (may or may not be necessary depending on the
   hardware design - we may be able to do it 'live').
5) 'Try' to repair. Hopefully no one raced with us and used up the remaining resources.
  Given this is typically only driven by something like RASDaemon that race should be
  a corner case only (very unlikely)
6) If repair fails can just bring the memory back - but this dance was expensive and
   we will carry on working with less than ideal memory (probably schedule some
   real maintenance to swap out the device).
7) If repair succeeds bring the memory back as now we have shiny new memory.

We could drop the query for now and bring it back later once more of the surrounding
infrastructure becomes clearer.  To me it's a useful feature, but I appreciate
this is early days and we shouldn't always try for all the bells and whistles on
day 1.


> The cat catches fire?
Dog person? :) Just a nice normal error return to indicate no resources.

Jonathan

>
Shiju Jose Nov. 15, 2024, 12:21 p.m. UTC | #9
Hi Boris,

Please find reply inline.
>-----Original Message-----
>From: Borislav Petkov <bp@alien8.de>
>Sent: 14 November 2024 13:33
>To: Shiju Jose <shiju.jose@huawei.com>
>Cc: linux-edac@vger.kernel.org; linux-cxl@vger.kernel.org; linux-
>acpi@vger.kernel.org; linux-mm@kvack.org; linux-kernel@vger.kernel.org;
>tony.luck@intel.com; rafael@kernel.org; lenb@kernel.org;
>mchehab@kernel.org; dan.j.williams@intel.com; dave@stgolabs.net; Jonathan
>Cameron <jonathan.cameron@huawei.com>; gregkh@linuxfoundation.org;
>sudeep.holla@arm.com; jassisinghbrar@gmail.com; dave.jiang@intel.com;
>alison.schofield@intel.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
>david@redhat.com; Vilas.Sridharan@amd.com; leo.duran@amd.com;
>Yazen.Ghannam@amd.com; rientjes@google.com; jiaqiyan@google.com;
>Jon.Grimm@amd.com; dave.hansen@linux.intel.com;
>naoya.horiguchi@nec.com; james.morse@arm.com; jthoughton@google.com;
>somasundaram.a@hpe.com; erdemaktas@google.com; pgonda@google.com;
>duenwen@google.com; gthelen@google.com;
>wschwartz@amperecomputing.com; dferguson@amperecomputing.com;
>wbs@os.amperecomputing.com; nifan.cxl@gmail.com; tanxiaofei
><tanxiaofei@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>; Roberto
>Sassu <roberto.sassu@huawei.com>; kangkang.shen@futurewei.com;
>wanghuiqiang <wanghuiqiang@huawei.com>; Linuxarm
><linuxarm@huawei.com>
>Subject: Re: [PATCH v15 11/15] EDAC: Add memory repair control feature
>
>On Mon, Nov 11, 2024 at 04:54:48PM +0000, Shiju Jose wrote:
>> Presently, 0 (soft memory repair) and 1 (hard memory repair),  depends
>> on which mode/s a memory device is supported.
>
>What if the device supports more than one mode?
>
>> However for CXL memory sparing feature, the persistent mode is
>> configurable at runtime for a memory sparing instance, thus both soft and
>hard sparing are supported.
>> Example given for CXL memory sparing feature in
>> Documentation/edac/memory_repair.rst,
>> root@localhost:~# cat
>> /sys/bus/edac/devices/cxl_mem0/mem_repair1/persist_mode_avail
>> 0,1
>
>Ok, and how is the user supposed to know what those mean?

Print in  string format?, may be as 'persist'/'volatile'? 

>
>> Kernel sysfs doc mentioned about array of values as follows, though not seen
>much examples.
>> https://docs.kernel.org/filesystems/sysfs.html
>> "Attributes should be ASCII text files, preferably with only one value
>> per file. It is noted that it may not be efficient to contain only one
>> value per file, so it is socially acceptable to express an array of values of the
>same type."
>
>True story. Ok, so there's an exception to that rule.
>
>> The values of these attributes are specific to device and portion of the memory
>to repair.
>> For example, In CXL repair features,
>> CXL memory device identifies a failure on a memory component, device
>> provides the corresponding values of the attributes (DPA, channel,
>> rank, nibble mask, bank group, bank, row, column or sub-channel etc) in an
>event record to the host and to the userspace in the corresponding trace event.
>> Userspace shall use these values for the query resource availability and repair
>operations.
>
>I don't think you're answering my question. Lemme try again:
>
>I am on a machine with such an interface. I do
>
>echo 0xdeadbeef > /sys/devices...
>-EINVAL
>
>echo 0xface > ...
>-EINVAL
>
>How do I know what the allowed ranges are?

I am  fine with adding the support for expose the ranges of these,
but makes more sense to do it when a driver surfaces that can do it. 
I will use the DPA for CXL features as an example of how it would be done and 
it's the one is supported now.

>> This will work for the CXL PPR feature where the result of the query
[...]
>--
>Regards/Gruss,
>    Boris.
>
>https://people.kernel.org/tglx/notes-about-netiquette

Thanks,
Shiju
Borislav Petkov Nov. 19, 2024, 12:32 p.m. UTC | #10
On Fri, Nov 15, 2024 at 12:14:15PM +0000, Jonathan Cameron wrote:
> We could drop the query for now and bring it back later once more of the surrounding
> infrastructure becomes clearer.  To me it's a useful feature, but I appreciate
> this is early days and we shouldn't always try for all the bells and whistles on
> day 1.

Agreed.

Let's add this only when it is really really needed and it cannot be part of
the actual repair flow. Right now it can be an implicit step of the repair
where latter is first tried, and if it works, it is actually done.

Then, if it really really turns out that one needs the "try" thing as an
explicit step, we can then actually carve it out.

Thx.
Borislav Petkov Nov. 19, 2024, 12:36 p.m. UTC | #11
On Fri, Nov 15, 2024 at 12:21:16PM +0000, Shiju Jose wrote:
> >Ok, and how is the user supposed to know what those mean?
> 
> Print in  string format?, may be as 'persist'/'volatile'? 

That sounds like an abuse of sysfs (Greg?) to me and even if it were possible,
you need explanation what those strings mean.

> I am  fine with adding the support for expose the ranges of these,
> but makes more sense to do it when a driver surfaces that can do it.

So how do you envision to do it otherwise? The user is supposed to guess the
ranges?

That's not a good UI IMO.
diff mbox series

Patch

diff --git a/Documentation/ABI/testing/sysfs-edac-memory-repair b/Documentation/ABI/testing/sysfs-edac-memory-repair
new file mode 100644
index 000000000000..393206b8d418
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-edac-memory-repair
@@ -0,0 +1,168 @@ 
+What:		/sys/bus/edac/devices/<dev-name>/mem_repairX
+Date:		Jan 2025
+KernelVersion:	6.13
+Contact:	linux-edac@vger.kernel.org
+Description:
+		The sysfs EDAC bus devices /<dev-name>/mem_repairX subdirectory
+		pertains to the memory media repair features control, such as
+		PPR (Post Package Repair), memory sparing etc, where<dev-name>
+		directory corresponds to a device registered with the EDAC
+		device driver for the memory repair features.
+		Memory sparing is a repair function that replaces a portion
+		of memory (spared memory) with a portion of functional memory.
+		Memory sparing has cacheline/row/bank/rank sparing
+		granularities.
+		The sysfs attributes nodes for a repair feature are only
+		present if the parent driver has implemented the corresponding
+		attr callback function and provided the necessary operations
+		to the EDAC device driver during registration.
+
+What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/repair_type
+Date:		Jan 2025
+KernelVersion:	6.13
+Contact:	linux-edac@vger.kernel.org
+Description:
+		(RO) Type of the repair instance. For eg. sPPR, hPPR, cacheline/
+		row/bank/rank memory sparing etc.
+		0 - Soft PPR(sPPR)
+		1 - Hard PPR(hPPR)
+		2 - Cacheline memory sparing
+		3 - Row memory sparing
+		4 - Bank memory sparing
+		5 - Rank memory sparing
+		All other values are reserved.
+
+What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/persist_mode_avail
+Date:		Jan 2025
+KernelVersion:	6.13
+Contact:	linux-edac@vger.kernel.org
+Description:
+		(RO) Persist repair modes supported in the device.
+		0 - Soft PPR(sPPR)/soft memory sparing (temporary repair).
+		1 - Hard PPR(hPPR)/hard memory sparing (permanent repair).
+		All other values are reserved.
+
+What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/persist_mode
+Date:		Jan 2025
+KernelVersion:	6.13
+Contact:	linux-edac@vger.kernel.org
+Description:
+		(RW) Current persist repair mode.
+		0 - Soft PPR(sPPR)/soft memory sparing (temporary repair).
+		1 - Hard PPR(hPPR)/hard memory sparing (permanent repair).
+		All other values are reserved.
+
+What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/dpa_support
+Date:		Jan 2025
+KernelVersion:	6.13
+Contact:	linux-edac@vger.kernel.org
+Description:
+		(RO) True if supports DPA for a repair operation.
+		E.g. PPR, memory sparing.
+
+What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/repair_safe_when_in_use
+Date:		Jan 2025
+KernelVersion:	6.13
+Contact:	linux-edac@vger.kernel.org
+Description:
+		(RO) True if memory media is accessible and data is retained
+		during the memory repair operation.
+
+What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/hpa
+Date:		Jan 2025
+KernelVersion:	6.13
+Contact:	linux-edac@vger.kernel.org
+Description:
+		(RW) HPA (Host Physical Address) for memory repair.
+
+What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/dpa
+Date:		Jan 2025
+KernelVersion:	6.13
+Contact:	linux-edac@vger.kernel.org
+Description:
+		(RW) DPA (Device Physical Address) for memory repair.
+
+What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/nibble_mask
+Date:		Jan 2025
+KernelVersion:	6.13
+Contact:	linux-edac@vger.kernel.org
+Description:
+		(RW) Nibble mask for memory repair.
+
+What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/bank_group
+Date:		Jan 2025
+KernelVersion:	6.13
+Contact:	linux-edac@vger.kernel.org
+Description:
+		(RW) Memory bank group for repair.
+
+What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/bank
+Date:		Jan 2025
+KernelVersion:	6.13
+Contact:	linux-edac@vger.kernel.org
+Description:
+		(RW) Memory bank for memory repair.
+
+What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/rank
+Date:		Jan 2025
+KernelVersion:	6.13
+Contact:	linux-edac@vger.kernel.org
+Description:
+		(RW) Memory rank for repair.
+
+What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/row
+Date:		Jan 2025
+KernelVersion:	6.13
+Contact:	linux-edac@vger.kernel.org
+Description:
+		(RW) Row for memory repair.
+
+What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/column
+Date:		Jan 2025
+KernelVersion:	6.13
+Contact:	linux-edac@vger.kernel.org
+Description:
+		(RW) Column for memory repair.
+
+What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/channel
+Date:		Jan 2025
+KernelVersion:	6.13
+Contact:	linux-edac@vger.kernel.org
+Description:
+		(RW) Channel for memory repair.
+
+What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/sub_channel
+Date:		Jan 2025
+KernelVersion:	6.13
+Contact:	linux-edac@vger.kernel.org
+Description:
+		(RW) Sub-channel for memory repair.
+
+What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/query
+Date:		Jan 2025
+KernelVersion:	6.13
+Contact:	linux-edac@vger.kernel.org
+Description:
+		(WO) Query whether the repair operation is supported for the
+		memory attributes set. Return failure if resources are
+		not available to perform repair.
+		1 - issue query.
+		All other values are reserved.
+
+What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/repair
+Date:		Jan 2025
+KernelVersion:	6.13
+Contact:	linux-edac@vger.kernel.org
+Description:
+		(WO) Start the memory repair operation for the memory attributes
+		set. Return failure if resources are not available to
+		perform repair.
+		1 - issue repair operation.
+		All other values are reserved.
+		In some states of system configuration (e.g. before address
+		decoders have been configured), memory devices (e.g. CXL)
+		may not have an active mapping in the main host address
+		physical address map. As such, the memory to repair must be
+		identified by a device specific physical addressing scheme
+		using a DPA. The DPA to use will be presented in related
+		error records.
diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile
index b24c2c112d9c..d037bce88487 100644
--- a/drivers/edac/Makefile
+++ b/drivers/edac/Makefile
@@ -10,7 +10,7 @@  obj-$(CONFIG_EDAC)			:= edac_core.o
 
 edac_core-y	:= edac_mc.o edac_device.o edac_mc_sysfs.o
 edac_core-y	+= edac_module.o edac_device_sysfs.o wq.o
-edac_core-y	+= scrub.o ecs.o
+edac_core-y	+= scrub.o ecs.o mem_repair.o
 
 edac_core-$(CONFIG_EDAC_DEBUG)		+= debugfs.o
 
diff --git a/drivers/edac/edac_device.c b/drivers/edac/edac_device.c
index bbf5b52a1ce1..9575b04a7454 100644
--- a/drivers/edac/edac_device.c
+++ b/drivers/edac/edac_device.c
@@ -576,6 +576,7 @@  static void edac_dev_release(struct device *dev)
 {
 	struct edac_dev_feat_ctx *ctx = container_of(dev, struct edac_dev_feat_ctx, dev);
 
+	kfree(ctx->mem_repair);
 	kfree(ctx->scrub);
 	kfree(ctx->dev.groups);
 	kfree(ctx);
@@ -610,6 +611,7 @@  int edac_dev_register(struct device *parent, char *name,
 		      const struct edac_dev_feature *ras_features)
 {
 	const struct attribute_group **ras_attr_groups;
+	int mem_repair_cnt = 0, mem_repair_inst = 0;
 	int scrub_cnt = 0, scrub_inst = 0;
 	struct edac_dev_data *dev_data;
 	struct edac_dev_feat_ctx *ctx;
@@ -626,6 +628,10 @@  int edac_dev_register(struct device *parent, char *name,
 			attr_gcnt++;
 			scrub_cnt++;
 			break;
+		case RAS_FEAT_MEM_REPAIR:
+			attr_gcnt++;
+			mem_repair_cnt++;
+			break;
 		case RAS_FEAT_ECS:
 			attr_gcnt += ras_features[feat].ecs_info.num_media_frus;
 			break;
@@ -652,6 +658,14 @@  int edac_dev_register(struct device *parent, char *name,
 		}
 	}
 
+	if (mem_repair_cnt) {
+		ctx->mem_repair = kcalloc(mem_repair_cnt, sizeof(*ctx->mem_repair), GFP_KERNEL);
+		if (!ctx->mem_repair) {
+			ret = -ENOMEM;
+			goto groups_free;
+		}
+	}
+
 	attr_gcnt = 0;
 	for (feat = 0; feat < num_features; feat++, ras_features++) {
 		switch (ras_features->ft_type) {
@@ -686,6 +700,23 @@  int edac_dev_register(struct device *parent, char *name,
 
 			attr_gcnt += ras_features->ecs_info.num_media_frus;
 			break;
+		case RAS_FEAT_MEM_REPAIR:
+			if (!ras_features->mem_repair_ops ||
+			    mem_repair_inst != ras_features->instance)
+				goto data_mem_free;
+
+			dev_data = &ctx->mem_repair[mem_repair_inst];
+			dev_data->instance = mem_repair_inst;
+			dev_data->mem_repair_ops = ras_features->mem_repair_ops;
+			dev_data->private = ras_features->ctx;
+			ret = edac_mem_repair_get_desc(parent, &ras_attr_groups[attr_gcnt],
+						       ras_features->instance);
+			if (ret)
+				goto data_mem_free;
+
+			mem_repair_inst++;
+			attr_gcnt++;
+			break;
 		default:
 			ret = -EINVAL;
 			goto data_mem_free;
@@ -712,6 +743,7 @@  int edac_dev_register(struct device *parent, char *name,
 	return devm_add_action_or_reset(parent, edac_dev_unreg, &ctx->dev);
 
 data_mem_free:
+	kfree(ctx->mem_repair);
 	kfree(ctx->scrub);
 groups_free:
 	kfree(ras_attr_groups);
diff --git a/drivers/edac/mem_repair.c b/drivers/edac/mem_repair.c
new file mode 100755
index 000000000000..93246ad0c9eb
--- /dev/null
+++ b/drivers/edac/mem_repair.c
@@ -0,0 +1,367 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * The generic EDAC memory repair driver is designed to control the memory
+ * devices with memory repair features, such as Post Package Repair (PPR),
+ * memory sparing etc. The common sysfs memory repair interface abstracts
+ * the control of various arbitrary memory repair functionalities into a
+ * unified set of functions.
+ *
+ * Copyright (c) 2024 HiSilicon Limited.
+ */
+
+#include <linux/edac.h>
+
+enum edac_mem_repair_attributes {
+	MEM_REPAIR_TYPE,
+	MEM_REPAIR_PERSIST_MODE_AVAIL,
+	MEM_REPAIR_PERSIST_MODE,
+	MEM_REPAIR_DPA_SUPPORT,
+	MEM_REPAIR_SAFE_IN_USE,
+	MEM_REPAIR_HPA,
+	MEM_REPAIR_DPA,
+	MEM_REPAIR_NIBBLE_MASK,
+	MEM_REPAIR_BANK_GROUP,
+	MEM_REPAIR_BANK,
+	MEM_REPAIR_RANK,
+	MEM_REPAIR_ROW,
+	MEM_REPAIR_COLUMN,
+	MEM_REPAIR_CHANNEL,
+	MEM_REPAIR_SUB_CHANNEL,
+	MEM_REPAIR_QUERY,
+	MEM_DO_REPAIR,
+	MEM_REPAIR_MAX_ATTRS
+};
+
+struct edac_mem_repair_dev_attr {
+	struct device_attribute dev_attr;
+	u8 instance;
+};
+
+struct edac_mem_repair_context {
+	char name[EDAC_FEAT_NAME_LEN];
+	struct edac_mem_repair_dev_attr mem_repair_dev_attr[MEM_REPAIR_MAX_ATTRS];
+	struct attribute *mem_repair_attrs[MEM_REPAIR_MAX_ATTRS + 1];
+	struct attribute_group group;
+};
+
+#define TO_MEM_REPAIR_DEV_ATTR(_dev_attr)      \
+		container_of(_dev_attr, struct edac_mem_repair_dev_attr, dev_attr)
+
+#define EDAC_MEM_REPAIR_ATTR_SHOW(attrib, cb, type, format)			\
+static ssize_t attrib##_show(struct device *ras_feat_dev,			\
+			     struct device_attribute *attr, char *buf)		\
+{										\
+	u8 inst = TO_MEM_REPAIR_DEV_ATTR(attr)->instance;			\
+	struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);		\
+	const struct edac_mem_repair_ops *ops =					\
+				ctx->mem_repair[inst].mem_repair_ops;		\
+	type data;								\
+	int ret;								\
+										\
+	ret = ops->cb(ras_feat_dev->parent, ctx->mem_repair[inst].private,	\
+		      &data);							\
+	if (ret)								\
+		return ret;							\
+										\
+	return sysfs_emit(buf, format, data);					\
+}
+
+EDAC_MEM_REPAIR_ATTR_SHOW(repair_type, get_repair_type, u32, "%u\n")
+EDAC_MEM_REPAIR_ATTR_SHOW(persist_mode, get_persist_mode, u32, "%u\n")
+EDAC_MEM_REPAIR_ATTR_SHOW(dpa_support, get_dpa_support, u32, "%u\n")
+EDAC_MEM_REPAIR_ATTR_SHOW(repair_safe_when_in_use, get_repair_safe_when_in_use, u32, "%u\n")
+EDAC_MEM_REPAIR_ATTR_SHOW(hpa, get_hpa, u64, "0x%llx\n")
+EDAC_MEM_REPAIR_ATTR_SHOW(dpa, get_dpa, u64, "0x%llx\n")
+EDAC_MEM_REPAIR_ATTR_SHOW(nibble_mask, get_nibble_mask, u64, "0x%llx\n")
+EDAC_MEM_REPAIR_ATTR_SHOW(bank_group, get_bank_group, u32, "%u\n")
+EDAC_MEM_REPAIR_ATTR_SHOW(bank, get_bank, u32, "%u\n")
+EDAC_MEM_REPAIR_ATTR_SHOW(rank, get_rank, u32, "%u\n")
+EDAC_MEM_REPAIR_ATTR_SHOW(row, get_row, u64, "0x%llx\n")
+EDAC_MEM_REPAIR_ATTR_SHOW(column, get_column, u32, "%u\n")
+EDAC_MEM_REPAIR_ATTR_SHOW(channel, get_channel, u32, "%u\n")
+EDAC_MEM_REPAIR_ATTR_SHOW(sub_channel, get_sub_channel, u32, "%u\n")
+
+#define EDAC_MEM_REPAIR_ATTR_STORE(attrib, cb, type, conv_func)			\
+static ssize_t attrib##_store(struct device *ras_feat_dev,			\
+			      struct device_attribute *attr,			\
+			      const char *buf, size_t len)			\
+{										\
+	u8 inst = TO_MEM_REPAIR_DEV_ATTR(attr)->instance;			\
+	struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);		\
+	const struct edac_mem_repair_ops *ops =					\
+				ctx->mem_repair[inst].mem_repair_ops;		\
+	type data;								\
+	int ret;								\
+										\
+	ret = conv_func(buf, 0, &data);						\
+	if (ret < 0)								\
+		return ret;							\
+										\
+	ret = ops->cb(ras_feat_dev->parent, ctx->mem_repair[inst].private,	\
+		      data);							\
+	if (ret)								\
+		return ret;							\
+										\
+	return len;								\
+}
+
+EDAC_MEM_REPAIR_ATTR_STORE(persist_mode, set_persist_mode, unsigned long, kstrtoul)
+EDAC_MEM_REPAIR_ATTR_STORE(hpa, set_hpa, u64, kstrtou64)
+EDAC_MEM_REPAIR_ATTR_STORE(dpa, set_dpa, u64, kstrtou64)
+EDAC_MEM_REPAIR_ATTR_STORE(nibble_mask, set_nibble_mask, u64, kstrtou64)
+EDAC_MEM_REPAIR_ATTR_STORE(bank_group, set_bank_group, unsigned long, kstrtoul)
+EDAC_MEM_REPAIR_ATTR_STORE(bank, set_bank, unsigned long, kstrtoul)
+EDAC_MEM_REPAIR_ATTR_STORE(rank, set_rank, unsigned long, kstrtoul)
+EDAC_MEM_REPAIR_ATTR_STORE(row, set_row, u64, kstrtou64)
+EDAC_MEM_REPAIR_ATTR_STORE(column, set_column, unsigned long, kstrtoul)
+EDAC_MEM_REPAIR_ATTR_STORE(channel, set_channel, unsigned long, kstrtoul)
+EDAC_MEM_REPAIR_ATTR_STORE(sub_channel, set_sub_channel, unsigned long, kstrtoul)
+
+#define EDAC_MEM_REPAIR_DO_OP(attrib, cb)						\
+static ssize_t attrib##_store(struct device *ras_feat_dev,				\
+			      struct device_attribute *attr,				\
+			      const char *buf, size_t len)				\
+{											\
+	u8 inst = TO_MEM_REPAIR_DEV_ATTR(attr)->instance;				\
+	struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);			\
+	const struct edac_mem_repair_ops *ops = ctx->mem_repair[inst].mem_repair_ops;	\
+	int ret;									\
+											\
+	ret = ops->cb(ras_feat_dev->parent, ctx->mem_repair[inst].private);		\
+	if (ret)									\
+		return ret;								\
+											\
+	return len;									\
+}
+
+EDAC_MEM_REPAIR_DO_OP(query, do_query)
+EDAC_MEM_REPAIR_DO_OP(repair, do_repair)
+
+static ssize_t persist_mode_avail_show(struct device *ras_feat_dev,
+				       struct device_attribute *attr, char *buf)
+{
+	struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+	u8 inst = TO_MEM_REPAIR_DEV_ATTR(attr)->instance;
+	const struct edac_mem_repair_ops *ops = ctx->mem_repair[inst].mem_repair_ops;
+
+	return ops->get_persist_mode_avail(ras_feat_dev->parent,
+					   ctx->mem_repair[inst].private, buf);
+}
+
+static umode_t mem_repair_attr_visible(struct kobject *kobj, struct attribute *a, int attr_id)
+{
+	struct device *ras_feat_dev = kobj_to_dev(kobj);
+	struct device_attribute *dev_attr = container_of(a, struct device_attribute, attr);
+	struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+	u8 inst = TO_MEM_REPAIR_DEV_ATTR(dev_attr)->instance;
+	const struct edac_mem_repair_ops *ops = ctx->mem_repair[inst].mem_repair_ops;
+
+	switch (attr_id) {
+	case MEM_REPAIR_TYPE:
+		if (ops->get_repair_type)
+			return a->mode;
+		break;
+	case MEM_REPAIR_PERSIST_MODE_AVAIL:
+		if (ops->get_persist_mode_avail)
+			return a->mode;
+		break;
+	case MEM_REPAIR_PERSIST_MODE:
+		if (ops->get_persist_mode) {
+			if (ops->set_persist_mode)
+				return a->mode;
+			else
+				return 0444;
+		}
+		break;
+	case MEM_REPAIR_DPA_SUPPORT:
+		if (ops->get_dpa_support)
+			return a->mode;
+		break;
+	case MEM_REPAIR_SAFE_IN_USE:
+		if (ops->get_repair_safe_when_in_use)
+			return a->mode;
+		break;
+	case MEM_REPAIR_HPA:
+		if (ops->get_hpa) {
+			if (ops->set_hpa)
+				return a->mode;
+			else
+				return 0444;
+		}
+		break;
+	case MEM_REPAIR_DPA:
+		if (ops->get_dpa) {
+			if (ops->set_dpa)
+				return a->mode;
+			else
+				return 0444;
+		}
+		break;
+	case MEM_REPAIR_NIBBLE_MASK:
+		if (ops->get_nibble_mask) {
+			if (ops->set_nibble_mask)
+				return a->mode;
+			else
+				return 0444;
+		}
+		break;
+	case MEM_REPAIR_BANK_GROUP:
+		if (ops->get_bank_group) {
+			if (ops->set_bank_group)
+				return a->mode;
+			else
+				return 0444;
+		}
+		break;
+	case MEM_REPAIR_BANK:
+		if (ops->get_bank) {
+			if (ops->set_bank)
+				return a->mode;
+			else
+				return 0444;
+		}
+		break;
+	case MEM_REPAIR_RANK:
+		if (ops->get_rank) {
+			if (ops->set_rank)
+				return a->mode;
+			else
+				return 0444;
+		}
+		break;
+	case MEM_REPAIR_ROW:
+		if (ops->get_row) {
+			if (ops->set_row)
+				return a->mode;
+			else
+				return 0444;
+		}
+		break;
+	case MEM_REPAIR_COLUMN:
+		if (ops->get_column) {
+			if (ops->set_column)
+				return a->mode;
+			else
+				return 0444;
+		}
+		break;
+	case MEM_REPAIR_CHANNEL:
+		if (ops->get_channel) {
+			if (ops->set_channel)
+				return a->mode;
+			else
+				return 0444;
+		}
+		break;
+	case MEM_REPAIR_SUB_CHANNEL:
+		if (ops->get_sub_channel) {
+			if (ops->set_sub_channel)
+				return a->mode;
+			else
+				return 0444;
+		}
+		break;
+	case MEM_REPAIR_QUERY:
+		if (ops->do_query)
+			return a->mode;
+		break;
+	case MEM_DO_REPAIR:
+		if (ops->do_repair)
+			return a->mode;
+		break;
+	default:
+		break;
+	}
+
+	return 0;
+}
+
+#define EDAC_MEM_REPAIR_ATTR_RO(_name, _instance)       \
+	((struct edac_mem_repair_dev_attr) { .dev_attr = __ATTR_RO(_name), \
+					     .instance = _instance })
+
+#define EDAC_MEM_REPAIR_ATTR_WO(_name, _instance)       \
+	((struct edac_mem_repair_dev_attr) { .dev_attr = __ATTR_WO(_name), \
+					     .instance = _instance })
+
+#define EDAC_MEM_REPAIR_ATTR_RW(_name, _instance)       \
+	((struct edac_mem_repair_dev_attr) { .dev_attr = __ATTR_RW(_name), \
+					     .instance = _instance })
+
+static int mem_repair_create_desc(struct device *dev,
+				  const struct attribute_group **attr_groups,
+				  u8 instance)
+{
+	struct edac_mem_repair_context *ctx;
+	struct attribute_group *group;
+	int i;
+	struct edac_mem_repair_dev_attr dev_attr[] = {
+		[MEM_REPAIR_TYPE] = EDAC_MEM_REPAIR_ATTR_RO(repair_type,
+							    instance),
+		[MEM_REPAIR_PERSIST_MODE_AVAIL] =
+				EDAC_MEM_REPAIR_ATTR_RO(persist_mode_avail,
+							instance),
+		[MEM_REPAIR_PERSIST_MODE] =
+				EDAC_MEM_REPAIR_ATTR_RW(persist_mode, instance),
+		[MEM_REPAIR_DPA_SUPPORT] =
+				EDAC_MEM_REPAIR_ATTR_RO(dpa_support, instance),
+		[MEM_REPAIR_SAFE_IN_USE] =
+				EDAC_MEM_REPAIR_ATTR_RO(repair_safe_when_in_use,
+							instance),
+		[MEM_REPAIR_HPA] = EDAC_MEM_REPAIR_ATTR_RW(hpa, instance),
+		[MEM_REPAIR_DPA] = EDAC_MEM_REPAIR_ATTR_RW(dpa, instance),
+		[MEM_REPAIR_NIBBLE_MASK] =
+				EDAC_MEM_REPAIR_ATTR_RW(nibble_mask, instance),
+		[MEM_REPAIR_BANK_GROUP] =
+				EDAC_MEM_REPAIR_ATTR_RW(bank_group, instance),
+		[MEM_REPAIR_BANK] = EDAC_MEM_REPAIR_ATTR_RW(bank, instance),
+		[MEM_REPAIR_RANK] = EDAC_MEM_REPAIR_ATTR_RW(rank, instance),
+		[MEM_REPAIR_ROW] = EDAC_MEM_REPAIR_ATTR_RW(row, instance),
+		[MEM_REPAIR_COLUMN] = EDAC_MEM_REPAIR_ATTR_RW(column, instance),
+		[MEM_REPAIR_CHANNEL] = EDAC_MEM_REPAIR_ATTR_RW(channel, instance),
+		[MEM_REPAIR_SUB_CHANNEL] =
+				EDAC_MEM_REPAIR_ATTR_RW(sub_channel, instance),
+		[MEM_REPAIR_QUERY] = EDAC_MEM_REPAIR_ATTR_WO(query, instance),
+		[MEM_DO_REPAIR] = EDAC_MEM_REPAIR_ATTR_WO(repair, instance)
+	};
+
+	ctx = devm_kzalloc(dev, sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	for (i = 0; i < MEM_REPAIR_MAX_ATTRS; i++) {
+		memcpy(&ctx->mem_repair_dev_attr[i].dev_attr,
+		       &dev_attr[i], sizeof(dev_attr[i]));
+		ctx->mem_repair_attrs[i] =
+				&ctx->mem_repair_dev_attr[i].dev_attr.attr;
+	}
+
+	sprintf(ctx->name, "%s%d", "mem_repair", instance);
+	group = &ctx->group;
+	group->name = ctx->name;
+	group->attrs = ctx->mem_repair_attrs;
+	group->is_visible  = mem_repair_attr_visible;
+	attr_groups[0] = group;
+
+	return 0;
+}
+
+/**
+ * edac_mem_repair_get_desc - get EDAC memory repair descriptors
+ * @dev: client device with memory repair feature
+ * @attr_groups: pointer to attribute group container
+ * @instance: device's memory repair instance number.
+ *
+ * Return:
+ *  * %0	- Success.
+ *  * %-EINVAL	- Invalid parameters passed.
+ *  * %-ENOMEM	- Dynamic memory allocation failed.
+ */
+int edac_mem_repair_get_desc(struct device *dev,
+			     const struct attribute_group **attr_groups, u8 instance)
+{
+	if (!dev || !attr_groups)
+		return -EINVAL;
+
+	return mem_repair_create_desc(dev, attr_groups, instance);
+}
diff --git a/include/linux/edac.h b/include/linux/edac.h
index 04385b1a9283..b52730d63088 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -670,6 +670,7 @@  static inline struct dimm_info *edac_get_dimm(struct mem_ctl_info *mci,
 enum edac_dev_feat {
 	RAS_FEAT_SCRUB,
 	RAS_FEAT_ECS,
+	RAS_FEAT_MEM_REPAIR,
 	RAS_FEAT_MAX
 };
 
@@ -731,11 +732,95 @@  int edac_ecs_get_desc(struct device *ecs_dev,
 		      const struct attribute_group **attr_groups,
 		      u16 num_media_frus);
 
+enum edac_mem_repair_type {
+	EDAC_TYPE_SPPR,
+	EDAC_TYPE_HPPR,
+	EDAC_TYPE_CACHELINE_MEM_SPARING,
+	EDAC_TYPE_ROW_MEM_SPARING,
+	EDAC_TYPE_BANK_MEM_SPARING,
+	EDAC_TYPE_RANK_MEM_SPARING,
+};
+
+enum edac_mem_repair_persist_mode {
+	EDAC_MEM_REPAIR_SOFT, /* soft memory repair */
+	EDAC_MEM_REPAIR_HARD, /* hard memory repair */
+};
+
+/**
+ * struct edac_mem_repair_ops - memory repair device operations
+ * (all elements optional)
+ * @get_repair_type: get the memory repair type, listed in enum edac_mem_repair_type.
+ * @get_persist_mode_avail: get the persist modes supported in the device.
+ * @get_persist_mode: get the persist mode of the memory repair instance.
+ * @set_persist_mode: set the persist mode for the memory repair instance.
+ * @get_dpa_support: get dpa support flag.
+ * @get_repair_safe_when_in_use: get whether memory media is accessible and
+ *			       data is retained during repair operation.
+ * @get_hpa: get HPA for memory repair.
+ * @set_hpa: set HPA for memory repair.
+ * @get_dpa: get DPA for memory repair.
+ * @set_dpa: set DPA for memory repair.
+ * @get_nibble_mask: get nibble mask for memory repair.
+ * @set_nibble_mask: set nibble mask for memory repair.
+ * @get_bank_group: get bank group for memory repair.
+ * @set_bank_group: set bank group for memory repair.
+ * @get_bank: get bank for memory repair.
+ * @set_bank: set bank for memory repair.
+ * @get_rank: get rank for memory repair.
+ * @set_rank: set rank for memory repair.
+ * @get_row: get row for memory repair.
+ * @set_row: set row for memory repair.
+ * @get_column: get column for memory repair.
+ * @set_column: set column for memory repair.
+ * @get_channel: get channel for memory repair.
+ * @set_channel: set channel for memory repair.
+ * @get_sub_channel: get sub channel for memory repair.
+ * @set_sub_channel: set sub channel for memory repair.
+ * @do_query: Query memory repair operation for the HPA/DPA/other attrs set
+ *	      is supported or not.
+ * @do_repair: start memory repair operation for the HPA/DPA/other attrs set.
+ */
+struct edac_mem_repair_ops {
+	int (*get_repair_type)(struct device *dev, void *drv_data, u32 *val);
+	int (*get_persist_mode_avail)(struct device *dev, void *drv_data, char *buf);
+	int (*get_persist_mode)(struct device *dev, void *drv_data, u32 *mode);
+	int (*set_persist_mode)(struct device *dev, void *drv_data, u32 mode);
+	int (*get_dpa_support)(struct device *dev, void *drv_data, u32 *val);
+	int (*get_repair_safe_when_in_use)(struct device *dev, void *drv_data, u32 *val);
+	int (*get_hpa)(struct device *dev, void *drv_data, u64 *hpa);
+	int (*set_hpa)(struct device *dev, void *drv_data, u64 hpa);
+	int (*get_dpa)(struct device *dev, void *drv_data, u64 *dpa);
+	int (*set_dpa)(struct device *dev, void *drv_data, u64 dpa);
+	int (*get_nibble_mask)(struct device *dev, void *drv_data, u64 *val);
+	int (*set_nibble_mask)(struct device *dev, void *drv_data, u64 val);
+	int (*get_bank_group)(struct device *dev, void *drv_data, u32 *val);
+	int (*set_bank_group)(struct device *dev, void *drv_data, u32 val);
+	int (*get_bank)(struct device *dev, void *drv_data, u32 *val);
+	int (*set_bank)(struct device *dev, void *drv_data, u32 val);
+	int (*get_rank)(struct device *dev, void *drv_data, u32 *val);
+	int (*set_rank)(struct device *dev, void *drv_data, u32 val);
+	int (*get_row)(struct device *dev, void *drv_data, u64 *val);
+	int (*set_row)(struct device *dev, void *drv_data, u64 val);
+	int (*get_column)(struct device *dev, void *drv_data, u32 *val);
+	int (*set_column)(struct device *dev, void *drv_data, u32 val);
+	int (*get_channel)(struct device *dev, void *drv_data, u32 *val);
+	int (*set_channel)(struct device *dev, void *drv_data, u32 val);
+	int (*get_sub_channel)(struct device *dev, void *drv_data, u32 *val);
+	int (*set_sub_channel)(struct device *dev, void *drv_data, u32 val);
+	int (*do_query)(struct device *dev, void *drv_data);
+	int (*do_repair)(struct device *dev, void *drv_data);
+};
+
+int edac_mem_repair_get_desc(struct device *dev,
+			     const struct attribute_group **attr_groups,
+			     u8 instance);
+
 /* EDAC device feature information structure */
 struct edac_dev_data {
 	union {
 		const struct edac_scrub_ops *scrub_ops;
 		const struct edac_ecs_ops *ecs_ops;
+		const struct edac_mem_repair_ops *mem_repair_ops;
 	};
 	u8 instance;
 	void *private;
@@ -746,6 +831,7 @@  struct edac_dev_feat_ctx {
 	void *private;
 	struct edac_dev_data *scrub;
 	struct edac_dev_data ecs;
+	struct edac_dev_data *mem_repair;
 };
 
 struct edac_dev_feature {
@@ -754,6 +840,7 @@  struct edac_dev_feature {
 	union {
 		const struct edac_scrub_ops *scrub_ops;
 		const struct edac_ecs_ops *ecs_ops;
+		const struct edac_mem_repair_ops *mem_repair_ops;
 	};
 	void *ctx;
 	struct edac_ecs_ex_info ecs_info;