diff mbox series

[RFC,2/9] memory: scrub: sysfs: Add Documentation entries for set of scrub attributes

Message ID 20230915172818.761-3-shiju.jose@huawei.com (mailing list archive)
State New
Headers show
Series ACPI:RASF: Add support for ACPI RASF, ACPI RAS2 and configure scrubbers | expand

Commit Message

Shiju Jose Sept. 15, 2023, 5:28 p.m. UTC
From: Shiju Jose <shiju.jose@huawei.com>

Add sysfs documentation entries for the set of attributes those are
exposed in /sys/class/scrub/ by the scrub driver. These attributes
support configuring parameters of a scrub device.

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 .../ABI/testing/sysfs-class-scrub-configure   | 82 +++++++++++++++++++
 1 file changed, 82 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-class-scrub-configure

Comments

Jiaqi Yan Sept. 22, 2023, 12:07 a.m. UTC | #1
On Fri, Sep 15, 2023 at 10:29 AM <shiju.jose@huawei.com> wrote:
>
> From: Shiju Jose <shiju.jose@huawei.com>
>
> Add sysfs documentation entries for the set of attributes those are
> exposed in /sys/class/scrub/ by the scrub driver. These attributes
> support configuring parameters of a scrub device.
>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> ---
>  .../ABI/testing/sysfs-class-scrub-configure   | 82 +++++++++++++++++++
>  1 file changed, 82 insertions(+)
>  create mode 100644 Documentation/ABI/testing/sysfs-class-scrub-configure
>
> diff --git a/Documentation/ABI/testing/sysfs-class-scrub-configure b/Documentation/ABI/testing/sysfs-class-scrub-configure
> new file mode 100644
> index 000000000000..347e2167dc62
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-class-scrub-configure
> @@ -0,0 +1,82 @@
> +What:          /sys/class/scrub/
> +Date:          September 2023
> +KernelVersion: 6.7
> +Contact:       linux-kernel@vger.kernel.org
> +Description:
> +               The scrub/ class subdirectory belongs to the
> +               scrubber subsystem.
> +
> +What:          /sys/class/scrub/scrubX/
> +Date:          September 2023
> +KernelVersion: 6.7
> +Contact:       linux-kernel@vger.kernel.org
> +Description:
> +               The /sys/class/scrub/scrub{0,1,2,3,...} directories

This API (sysfs interface) is very specific to the ACPI interface
defined for hardware patrol scrubber. I wonder can we have some
interface that is more generic, for a couple of reasons:

1. I am not aware of any chip/platform hardware that implemented the
hw ps part defined in ACPI RASF/RAS2 spec. So I am curious what the
RAS experts from different hardware vendors think about this. For
example, Tony and Dave from Intel, Jon and Vilas from AMD. Is there
any hardware platform (if allowed to disclose) that implemented ACPI
RASF/RAS2? If so, will vendors continue to support the control of
patrol scrubber using the ACPI spec? If not (as Tony said in [1], will
the vendor consider starting some future platform?

If we are unlikely to get the vendor support, creating this ACPI
specific sysfs API (and the driver implementations) in Linux seems to
have limited meaning.

> +               correspond to each scrub device.
> +
> +What:          /sys/class/scrub/scrubX/name
> +Date:          September 2023
> +KernelVersion: 6.7
> +Contact:       linux-kernel@vger.kernel.org
> +Description:
> +               (RO) name of the memory scrub device
> +
> +What:          /sys/class/scrub/scrubX/regionY/

2. I believe the concept of "region" here is probably from
PATROL_SCRUB defined in “APCI section 5.2.20.5. Parameter Block". It
is indeed powerful: if a process's physical memory spans over multiple
memory controllers, OS can in theory scrub chunks of the memory
belonging to the process. However, from a previous discussion [1],
"From a h/w perspective it might always be complex". IIUC, the address
translation from physical address to channel address is hard to
achieve, and probably that's one of the tech reasons the patrol scrub
ACPI spec is not in practice implemented?

So my take is, control at the granularity of the memory controller is
probably a nice compromise. Both OS and userspace can get a pretty
decent amount of flexibility, start/stop/adjust speed of the scrubbing
on a memory controller; meanwhile it doesn't impose too much pain to
hardware vendors when they provide these features in hardware. In
terms of how these controls/features will be implemented, I imagine it
could be implemented:
* via hardware registers that directly or indirectly control on memory
controllers
* via ACPI if the situation changes in 10 years (and the RASF/RAS2/PCC
drivers implemented in this patchset can be directly plugged into)
* a kernel-thread that uses cpu read to detect memory errors, if
hardware support is unavailable or not good enough

Given these possible backends of scrubbing, I think a more generic
sysfs API that covers and abstracts these backends will be more
valuable right now. I haven’t thought thoroughly, but how about
defining the top-level interface as something like
“/sys/devices/system/memory_controller_scrubX/”, or
“/sys/class/memory_controllerX/scrub”?

[1] https://lore.kernel.org/linux-mm/SJ1PR11MB6083BF93E9A88E659CED5EC4FC3F9@SJ1PR11MB6083.namprd11.prod.outlook.com/T/#m13516ee35caa05b506080ae805bee14f9f958d43

> +Date:          September 2023
> +KernelVersion: 6.7
> +Contact:       linux-kernel@vger.kernel.org
> +Description:
> +               The /sys/class/scrub/scrubX/region{0,1,2,3,...}
> +               directories correspond to each scrub region under a scrub device.
> +               Scrub region is a physical address range for which scrub may be
> +               separately controlled. Regions may overlap in which case the
> +               scrubbing rate of the overlapped memory will be at least that
> +               expected due to each overlapping region.
> +
> +What:          /sys/class/scrub/scrubX/regionY/addr_base
> +Date:          September 2023
> +KernelVersion: 6.7
> +Contact:       linux-kernel@vger.kernel.org
> +Description:
> +               (RW) The base of the address range of the memory region
> +               to be patrol scrubbed.
> +               On reading, returns the base of the memory region for
> +               the actual address range(The platform calculates
> +               the nearest patrol scrub boundary address from where
> +               it can start scrubbing).
> +
> +What:          /sys/class/scrub/scrubX/regionY/addr_size
> +Date:          September 2023
> +KernelVersion: 6.7
> +Contact:       linux-kernel@vger.kernel.org
> +Description:
> +               (RW) The size of the address range to be patrol scrubbed.
> +               On reading, returns the size of the memory region for
> +               the actual address range.
> +
> +What:          /sys/class/scrub/scrubX/regionY/enable
> +Date:          September 2023
> +KernelVersion: 6.7
> +Contact:       linux-kernel@vger.kernel.org
> +Description:
> +               (WO) Start/Stop scrubbing the memory region.
> +               1 - enable the memory scrubbing.
> +               0 - disable the memory scrubbing..
> +
> +What:          /sys/class/scrub/scrubX/regionY/speed_available
> +Date:          September 2023
> +KernelVersion: 6.7
> +Contact:       linux-kernel@vger.kernel.org
> +Description:
> +               (RO) Supported range for the partol speed(scrub rate)
> +               by the scrubber for a memory region.
> +               The unit of the scrub rate vary depends on the scrubber.
> +
> +What:          /sys/class/scrub/scrubX/regionY/speed
> +Date:          September 2023
> +KernelVersion: 6.7
> +Contact:       linux-kernel@vger.kernel.org
> +Description:
> +               (RW) The partol speed(scrub rate) on the memory region specified and
> +               it must be with in the supported range by the scrubber.
> +               The unit of the scrub rate vary depends on the scrubber.
> --
> 2.34.1
>
>
Jonathan Cameron Sept. 22, 2023, 10:20 a.m. UTC | #2
On Thu, 21 Sep 2023 17:07:04 -0700
Jiaqi Yan <jiaqiyan@google.com> wrote:

> On Fri, Sep 15, 2023 at 10:29 AM <shiju.jose@huawei.com> wrote:
> >
> > From: Shiju Jose <shiju.jose@huawei.com>
> >
> > Add sysfs documentation entries for the set of attributes those are
> > exposed in /sys/class/scrub/ by the scrub driver. These attributes
> > support configuring parameters of a scrub device.
> >
> > Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> > ---
> >  .../ABI/testing/sysfs-class-scrub-configure   | 82 +++++++++++++++++++
> >  1 file changed, 82 insertions(+)
> >  create mode 100644 Documentation/ABI/testing/sysfs-class-scrub-configure
> >
> > diff --git a/Documentation/ABI/testing/sysfs-class-scrub-configure b/Documentation/ABI/testing/sysfs-class-scrub-configure
> > new file mode 100644
> > index 000000000000..347e2167dc62
> > --- /dev/null
> > +++ b/Documentation/ABI/testing/sysfs-class-scrub-configure
> > @@ -0,0 +1,82 @@
> > +What:          /sys/class/scrub/
> > +Date:          September 2023
> > +KernelVersion: 6.7
> > +Contact:       linux-kernel@vger.kernel.org
> > +Description:
> > +               The scrub/ class subdirectory belongs to the
> > +               scrubber subsystem.
> > +
> > +What:          /sys/class/scrub/scrubX/
> > +Date:          September 2023
> > +KernelVersion: 6.7
> > +Contact:       linux-kernel@vger.kernel.org
> > +Description:
> > +               The /sys/class/scrub/scrub{0,1,2,3,...} directories  
> 
> This API (sysfs interface) is very specific to the ACPI interface
> defined for hardware patrol scrubber. I wonder can we have some
> interface that is more generic, for a couple of reasons:

Agreed that it makes sense to define a broad interface.  We have
some hardware focused drivers we can't share yet (IP rules until
a release date in the near future) where this is a reasonable fit
- but indeed there are others such as mapping this to DDR ECS
where it isn't a great mapping.

I'd love to come up with an interface that has the right blend
of generality and flexibility.  That is easiest done before we have
any implementation merged.

> 
> 1. I am not aware of any chip/platform hardware that implemented the
> hw ps part defined in ACPI RASF/RAS2 spec. So I am curious what the
> RAS experts from different hardware vendors think about this. For
> example, Tony and Dave from Intel, Jon and Vilas from AMD. Is there
> any hardware platform (if allowed to disclose) that implemented ACPI
> RASF/RAS2? If so, will vendors continue to support the control of
> patrol scrubber using the ACPI spec? If not (as Tony said in [1], will
> the vendor consider starting some future platform?
> 
> If we are unlikely to get the vendor support, creating this ACPI
> specific sysfs API (and the driver implementations) in Linux seems to
> have limited meaning.

There is a bit of a chicken and egg problem here. Until there is 
reasonable support in kernel (or it looks like there will be),
BIOS teams push back on a requirement to add the tables.
I'd encourage no one to bother with RASF - RAS2 is much less
ambiguous.

> 
> > +               correspond to each scrub device.
> > +
> > +What:          /sys/class/scrub/scrubX/name
> > +Date:          September 2023
> > +KernelVersion: 6.7
> > +Contact:       linux-kernel@vger.kernel.org
> > +Description:
> > +               (RO) name of the memory scrub device
> > +
> > +What:          /sys/class/scrub/scrubX/regionY/  
> 
> 2. I believe the concept of "region" here is probably from
> PATROL_SCRUB defined in “APCI section 5.2.20.5. Parameter Block". It
> is indeed powerful: if a process's physical memory spans over multiple
> memory controllers, OS can in theory scrub chunks of the memory
> belonging to the process. However, from a previous discussion [1],
> "From a h/w perspective it might always be complex". IIUC, the address
> translation from physical address to channel address is hard to
> achieve, and probably that's one of the tech reasons the patrol scrub
> ACPI spec is not in practice implemented?

Next bit is kind of an aside as I mostly agree with your conclusions ;)

This obviously depends on your memory interleave. You want to present
physical address ranges as single controllable regions - probably
most interesting being stuff that maps to NUMA nodes.  The short
answer is that any firmware control will end up writing to all the
memory controllers involved in a given PA range - firmware can easily
establish which those are.

A memory controller can support multiple scrub regions
which map from a PA range to a particular set of RAM addresses
- that's implementation specific. The memory controller is getting
the host PA and can carry out appropriate hashing if it wants to.
Many scrub solutions won't do this - in which case it's max one
region per memory controller (mapped by firmware to one region per
interleave set - assuming interleave sets take in whole memory
controllers - which they normally do).

I would expect existing systems (not design with this differentiated
scrub in mind) to only support scrub control by PA range at the
granularity of interleave sets.

Note that this general question of PA based controls also
maps to things like resource control (resctl) where it's only interesting
to partition memory bandwidth such that the partition applies to the
whole interleave set - that's done for ARM systems anyway by having
the userspace interface pretend there is only one control, but
write the settings to all actual controllers involved. Not sure what
x86 does.

Taking a few examples that I know of.  All 4 socket server - with
control of these as bios options ;).
Assuming individual memory controllers don't allow scrub to be
configured by PA range.

1. Full system memory interleave (people do this form of crazy)
   In that case, there is only one set of firmware controls
   that write to the interfaces of every memory controller.  Depending
   on the interleave design that may still allow multiple regions.
  
2. Socket wide memory interleave.  In that case, firmware controls
   need to effect all memory controllers in that socket if the
   requested 'region' maps to them.  So 4 PA regions. 

3. Die wide memory interleave.  Finer granularity of control
   so perhaps 8 PA rgiones.

4. Finer granularity (there are reasons to do this for above mentioned
   bandwidth resource control which you can only do if not interleaving
   across multiple controllers).



> 
> So my take is, control at the granularity of the memory controller is
> probably a nice compromise. 
> Both OS and userspace can get a pretty
> decent amount of flexibility, start/stop/adjust speed of the scrubbing
> on a memory controller; meanwhile it doesn't impose too much pain to
> hardware vendors when they provide these features in hardware. In
> terms of how these controls/features will be implemented, I imagine it
> could be implemented:
> * via hardware registers that directly or indirectly control on memory
> controllers
> * via ACPI if the situation changes in 10 years (and the RASF/RAS2/PCC
> drivers implemented in this patchset can be directly plugged into)
> * a kernel-thread that uses cpu read to detect memory errors, if
> hardware support is unavailable or not good enough
> 

I more or less agree, but would tweak a little.

1) Allow for multiple regions per memory controller - that's needed
   for RASF etc anyway - because as far as they are concerned there
   is only one interface presented.
2) Tie the interface to interleave sets, not memory controllers.
   NUMA nodes often being a good stand in for those.
   Controlling memory controllers separately for parts of an interleave
   isn't something I'd think was very useful.  This will probably get
   messy in the future though and the complexity could be pushed to
   a userspace tool - as long as enough info is available elsewhere
   in sysfs.  So need those memory controller directories you propose
   to include info on the PA ranges that they are involved in backing
   (which to keep things horrible, can change at runtime via memory
    hotplug and remapping of host physical address ranges on CXL etc)

> Given these possible backends of scrubbing, I think a more generic
> sysfs API that covers and abstracts these backends will be more
> valuable right now. I haven’t thought thoroughly, but how about
> defining the top-level interface as something like
> “/sys/devices/system/memory_controller_scrubX/”, or
> “/sys/class/memory_controllerX/scrub”?

No particular harm in the rename of the directory I guess.
Though some of those 'memory_controllers' would be virtual as they
wouldn't correspond to actual memory controllers but rather to
sets of them.

Jonathan

> 
> [1] https://lore.kernel.org/linux-mm/SJ1PR11MB6083BF93E9A88E659CED5EC4FC3F9@SJ1PR11MB6083.namprd11.prod.outlook.com/T/#m13516ee35caa05b506080ae805bee14f9f958d43

> 
> > +Date:          September 2023
> > +KernelVersion: 6.7
> > +Contact:       linux-kernel@vger.kernel.org
> > +Description:
> > +               The /sys/class/scrub/scrubX/region{0,1,2,3,...}
> > +               directories correspond to each scrub region under a scrub device.
> > +               Scrub region is a physical address range for which scrub may be
> > +               separately controlled. Regions may overlap in which case the
> > +               scrubbing rate of the overlapped memory will be at least that
> > +               expected due to each overlapping region.
> > +
> > +What:          /sys/class/scrub/scrubX/regionY/addr_base
> > +Date:          September 2023
> > +KernelVersion: 6.7
> > +Contact:       linux-kernel@vger.kernel.org
> > +Description:
> > +               (RW) The base of the address range of the memory region
> > +               to be patrol scrubbed.
> > +               On reading, returns the base of the memory region for
> > +               the actual address range(The platform calculates
> > +               the nearest patrol scrub boundary address from where
> > +               it can start scrubbing).
> > +
> > +What:          /sys/class/scrub/scrubX/regionY/addr_size
> > +Date:          September 2023
> > +KernelVersion: 6.7
> > +Contact:       linux-kernel@vger.kernel.org
> > +Description:
> > +               (RW) The size of the address range to be patrol scrubbed.
> > +               On reading, returns the size of the memory region for
> > +               the actual address range.
> > +
> > +What:          /sys/class/scrub/scrubX/regionY/enable
> > +Date:          September 2023
> > +KernelVersion: 6.7
> > +Contact:       linux-kernel@vger.kernel.org
> > +Description:
> > +               (WO) Start/Stop scrubbing the memory region.
> > +               1 - enable the memory scrubbing.
> > +               0 - disable the memory scrubbing..
> > +
> > +What:          /sys/class/scrub/scrubX/regionY/speed_available
> > +Date:          September 2023
> > +KernelVersion: 6.7
> > +Contact:       linux-kernel@vger.kernel.org
> > +Description:
> > +               (RO) Supported range for the partol speed(scrub rate)
> > +               by the scrubber for a memory region.
> > +               The unit of the scrub rate vary depends on the scrubber.
> > +
> > +What:          /sys/class/scrub/scrubX/regionY/speed
> > +Date:          September 2023
> > +KernelVersion: 6.7
> > +Contact:       linux-kernel@vger.kernel.org
> > +Description:
> > +               (RW) The partol speed(scrub rate) on the memory region specified and
> > +               it must be with in the supported range by the scrubber.
> > +               The unit of the scrub rate vary depends on the scrubber.
> > --
> > 2.34.1
> >
> >  
>
Jiaqi Yan Sept. 28, 2023, 5:25 a.m. UTC | #3
On Fri, Sep 22, 2023 at 3:20 AM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Thu, 21 Sep 2023 17:07:04 -0700
> Jiaqi Yan <jiaqiyan@google.com> wrote:
>
> > On Fri, Sep 15, 2023 at 10:29 AM <shiju.jose@huawei.com> wrote:
> > >
> > > From: Shiju Jose <shiju.jose@huawei.com>
> > >
> > > Add sysfs documentation entries for the set of attributes those are
> > > exposed in /sys/class/scrub/ by the scrub driver. These attributes
> > > support configuring parameters of a scrub device.
> > >
> > > Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> > > ---
> > >  .../ABI/testing/sysfs-class-scrub-configure   | 82 +++++++++++++++++++
> > >  1 file changed, 82 insertions(+)
> > >  create mode 100644 Documentation/ABI/testing/sysfs-class-scrub-configure
> > >
> > > diff --git a/Documentation/ABI/testing/sysfs-class-scrub-configure b/Documentation/ABI/testing/sysfs-class-scrub-configure
> > > new file mode 100644
> > > index 000000000000..347e2167dc62
> > > --- /dev/null
> > > +++ b/Documentation/ABI/testing/sysfs-class-scrub-configure
> > > @@ -0,0 +1,82 @@
> > > +What:          /sys/class/scrub/
> > > +Date:          September 2023
> > > +KernelVersion: 6.7
> > > +Contact:       linux-kernel@vger.kernel.org
> > > +Description:
> > > +               The scrub/ class subdirectory belongs to the
> > > +               scrubber subsystem.
> > > +
> > > +What:          /sys/class/scrub/scrubX/
> > > +Date:          September 2023
> > > +KernelVersion: 6.7
> > > +Contact:       linux-kernel@vger.kernel.org
> > > +Description:
> > > +               The /sys/class/scrub/scrub{0,1,2,3,...} directories
> >
> > This API (sysfs interface) is very specific to the ACPI interface
> > defined for hardware patrol scrubber. I wonder can we have some
> > interface that is more generic, for a couple of reasons:
>
> Agreed that it makes sense to define a broad interface.  We have
> some hardware focused drivers we can't share yet (IP rules until
> a release date in the near future) where this is a reasonable fit
> - but indeed there are others such as mapping this to DDR ECS
> where it isn't a great mapping.
>
> I'd love to come up with an interface that has the right blend
> of generality and flexibility.  That is easiest done before we have
> any implementation merged.
>
> >
> > 1. I am not aware of any chip/platform hardware that implemented the
> > hw ps part defined in ACPI RASF/RAS2 spec. So I am curious what the
> > RAS experts from different hardware vendors think about this. For
> > example, Tony and Dave from Intel, Jon and Vilas from AMD. Is there
> > any hardware platform (if allowed to disclose) that implemented ACPI
> > RASF/RAS2? If so, will vendors continue to support the control of
> > patrol scrubber using the ACPI spec? If not (as Tony said in [1], will
> > the vendor consider starting some future platform?
> >
> > If we are unlikely to get the vendor support, creating this ACPI
> > specific sysfs API (and the driver implementations) in Linux seems to
> > have limited meaning.
>
> There is a bit of a chicken and egg problem here. Until there is
> reasonable support in kernel (or it looks like there will be),
> BIOS teams push back on a requirement to add the tables.
> I'd encourage no one to bother with RASF - RAS2 is much less
> ambiguous.

Here mainly to re-ping folks from Intel (Tony and Dave)  and AMD (Jon
and Vilas) for your opinion on RAS2.

>
> >
> > > +               correspond to each scrub device.
> > > +
> > > +What:          /sys/class/scrub/scrubX/name
> > > +Date:          September 2023
> > > +KernelVersion: 6.7
> > > +Contact:       linux-kernel@vger.kernel.org
> > > +Description:
> > > +               (RO) name of the memory scrub device
> > > +
> > > +What:          /sys/class/scrub/scrubX/regionY/
> >
> > 2. I believe the concept of "region" here is probably from
> > PATROL_SCRUB defined in “APCI section 5.2.20.5. Parameter Block". It
> > is indeed powerful: if a process's physical memory spans over multiple
> > memory controllers, OS can in theory scrub chunks of the memory
> > belonging to the process. However, from a previous discussion [1],
> > "From a h/w perspective it might always be complex". IIUC, the address
> > translation from physical address to channel address is hard to
> > achieve, and probably that's one of the tech reasons the patrol scrub
> > ACPI spec is not in practice implemented?
>
> Next bit is kind of an aside as I mostly agree with your conclusions ;)
>
> This obviously depends on your memory interleave. You want to present
> physical address ranges as single controllable regions - probably
> most interesting being stuff that maps to NUMA nodes.  The short
> answer is that any firmware control will end up writing to all the
> memory controllers involved in a given PA range - firmware can easily
> establish which those are.
>
> A memory controller can support multiple scrub regions
> which map from a PA range to a particular set of RAM addresses
> - that's implementation specific. The memory controller is getting
> the host PA and can carry out appropriate hashing if it wants to.
> Many scrub solutions won't do this - in which case it's max one
> region per memory controller (mapped by firmware to one region per
> interleave set - assuming interleave sets take in whole memory
> controllers - which they normally do).
>
> I would expect existing systems (not design with this differentiated
> scrub in mind) to only support scrub control by PA range at the
> granularity of interleave sets.
>
> Note that this general question of PA based controls also
> maps to things like resource control (resctl) where it's only interesting
> to partition memory bandwidth such that the partition applies to the
> whole interleave set - that's done for ARM systems anyway by having
> the userspace interface pretend there is only one control, but
> write the settings to all actual controllers involved. Not sure what
> x86 does.
>
> Taking a few examples that I know of.  All 4 socket server - with
> control of these as bios options ;).
> Assuming individual memory controllers don't allow scrub to be
> configured by PA range.
>
> 1. Full system memory interleave (people do this form of crazy)
>    In that case, there is only one set of firmware controls
>    that write to the interfaces of every memory controller.  Depending
>    on the interleave design that may still allow multiple regions.
>
> 2. Socket wide memory interleave.  In that case, firmware controls
>    need to effect all memory controllers in that socket if the
>    requested 'region' maps to them.  So 4 PA regions.
>
> 3. Die wide memory interleave.  Finer granularity of control
>    so perhaps 8 PA rgiones.
>
> 4. Finer granularity (there are reasons to do this for above mentioned
>    bandwidth resource control which you can only do if not interleaving
>    across multiple controllers).
>
>
>
> >
> > So my take is, control at the granularity of the memory controller is
> > probably a nice compromise.
> > Both OS and userspace can get a pretty
> > decent amount of flexibility, start/stop/adjust speed of the scrubbing
> > on a memory controller; meanwhile it doesn't impose too much pain to
> > hardware vendors when they provide these features in hardware. In
> > terms of how these controls/features will be implemented, I imagine it
> > could be implemented:
> > * via hardware registers that directly or indirectly control on memory
> > controllers
> > * via ACPI if the situation changes in 10 years (and the RASF/RAS2/PCC
> > drivers implemented in this patchset can be directly plugged into)
> > * a kernel-thread that uses cpu read to detect memory errors, if
> > hardware support is unavailable or not good enough
> >
>
> I more or less agree, but would tweak a little.
>
> 1) Allow for multiple regions per memory controller - that's needed
>    for RASF etc anyway - because as far as they are concerned there
>    is only one interface presented.
> 2) Tie the interface to interleave sets, not memory controllers.
>    NUMA nodes often being a good stand in for those.

Does you mean /sys/devices/system/node/nodeX/scrub, where scrub is a
virtual concept of scrubbing device that mapps to 1 or several
physical scrubber backends.

For example, starting/stopping the virtual device means issuing
START/STOP cmd to all backends. And...

>    Controlling memory controllers separately for parts of an interleave
>    isn't something I'd think was very useful.  This will probably get
>    messy in the future though and the complexity could be pushed to
>    a userspace tool - as long as enough info is available elsewhere
>    in sysfs.  So need those memory controller directories you propose
>    to include info on the PA ranges that they are involved in backing

is it acceptable if we don't provide PA range or region in the
interface *for now* if it complicates things a lot? I could be wrong,
but the user of scrubber seems would be ok with not being able to
scrub an arbitrary physical address range. In contrast, not knowing
the scrub results seems to be more annoying to users. So simply giving
some progress indicator like how many bytes a scrubber has scrubbed.

When we really need to support PA range or region, under the
/sys/devices/system/node/nodeX/scrub interface, it basically uses NUMA
node X's PA range. Then to scrub node memory in range [PA1, PA2), some
driver that understand all backends (or can talk to all backends'
drivers) needs to translate the PA into the address in backend's
address space, for example, [PA1, PA2) is mapped to 2 device ranges
[DA11, DA12) on backend_1 and [DA21, DA22) on backend_2.

>    (which to keep things horrible, can change at runtime via memory
>     hotplug and remapping of host physical address ranges on CXL etc)

CXL memory locally attached to the host is probably more or less the
same as normal physical memory. I wonder what it would be like for CXL
memory remotely attached through a memory pool. Does it make sense
that the controller/owner of the memory pool takes the responsibility
of controlling the CXL memory controller to control scrubbing? Does
the owner need to provide/mediate scrubbing support for other clients
using the memory pool?

Thanks,
Jiaqi

>
> > Given these possible backends of scrubbing, I think a more generic
> > sysfs API that covers and abstracts these backends will be more
> > valuable right now. I haven’t thought thoroughly, but how about
> > defining the top-level interface as something like
> > “/sys/devices/system/memory_controller_scrubX/”, or
> > “/sys/class/memory_controllerX/scrub”?
>
> No particular harm in the rename of the directory I guess.
> Though some of those 'memory_controllers' would be virtual as they
> wouldn't correspond to actual memory controllers but rather to
> sets of them.
>
> Jonathan
>
> >
> > [1] https://lore.kernel.org/linux-mm/SJ1PR11MB6083BF93E9A88E659CED5EC4FC3F9@SJ1PR11MB6083.namprd11.prod.outlook.com/T/#m13516ee35caa05b506080ae805bee14f9f958d43
>
> >
> > > +Date:          September 2023
> > > +KernelVersion: 6.7
> > > +Contact:       linux-kernel@vger.kernel.org
> > > +Description:
> > > +               The /sys/class/scrub/scrubX/region{0,1,2,3,...}
> > > +               directories correspond to each scrub region under a scrub device.
> > > +               Scrub region is a physical address range for which scrub may be
> > > +               separately controlled. Regions may overlap in which case the
> > > +               scrubbing rate of the overlapped memory will be at least that
> > > +               expected due to each overlapping region.
> > > +
> > > +What:          /sys/class/scrub/scrubX/regionY/addr_base
> > > +Date:          September 2023
> > > +KernelVersion: 6.7
> > > +Contact:       linux-kernel@vger.kernel.org
> > > +Description:
> > > +               (RW) The base of the address range of the memory region
> > > +               to be patrol scrubbed.
> > > +               On reading, returns the base of the memory region for
> > > +               the actual address range(The platform calculates
> > > +               the nearest patrol scrub boundary address from where
> > > +               it can start scrubbing).
> > > +
> > > +What:          /sys/class/scrub/scrubX/regionY/addr_size
> > > +Date:          September 2023
> > > +KernelVersion: 6.7
> > > +Contact:       linux-kernel@vger.kernel.org
> > > +Description:
> > > +               (RW) The size of the address range to be patrol scrubbed.
> > > +               On reading, returns the size of the memory region for
> > > +               the actual address range.
> > > +
> > > +What:          /sys/class/scrub/scrubX/regionY/enable
> > > +Date:          September 2023
> > > +KernelVersion: 6.7
> > > +Contact:       linux-kernel@vger.kernel.org
> > > +Description:
> > > +               (WO) Start/Stop scrubbing the memory region.
> > > +               1 - enable the memory scrubbing.
> > > +               0 - disable the memory scrubbing..
> > > +
> > > +What:          /sys/class/scrub/scrubX/regionY/speed_available
> > > +Date:          September 2023
> > > +KernelVersion: 6.7
> > > +Contact:       linux-kernel@vger.kernel.org
> > > +Description:
> > > +               (RO) Supported range for the partol speed(scrub rate)
> > > +               by the scrubber for a memory region.
> > > +               The unit of the scrub rate vary depends on the scrubber.
> > > +
> > > +What:          /sys/class/scrub/scrubX/regionY/speed
> > > +Date:          September 2023
> > > +KernelVersion: 6.7
> > > +Contact:       linux-kernel@vger.kernel.org
> > > +Description:
> > > +               (RW) The partol speed(scrub rate) on the memory region specified and
> > > +               it must be with in the supported range by the scrubber.
> > > +               The unit of the scrub rate vary depends on the scrubber.
> > > --
> > > 2.34.1
> > >
> > >
> >
>
>
Jonathan Cameron Sept. 28, 2023, 1:14 p.m. UTC | #4
On Wed, 27 Sep 2023 22:25:52 -0700
Jiaqi Yan <jiaqiyan@google.com> wrote:

> On Fri, Sep 22, 2023 at 3:20 AM Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> >
> > On Thu, 21 Sep 2023 17:07:04 -0700
> > Jiaqi Yan <jiaqiyan@google.com> wrote:
> >  
> > > On Fri, Sep 15, 2023 at 10:29 AM <shiju.jose@huawei.com> wrote:  
> > > >
> > > > From: Shiju Jose <shiju.jose@huawei.com>
> > > >
> > > > Add sysfs documentation entries for the set of attributes those are
> > > > exposed in /sys/class/scrub/ by the scrub driver. These attributes
> > > > support configuring parameters of a scrub device.
> > > >
> > > > Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> > > > ---
> > > >  .../ABI/testing/sysfs-class-scrub-configure   | 82 +++++++++++++++++++
> > > >  1 file changed, 82 insertions(+)
> > > >  create mode 100644 Documentation/ABI/testing/sysfs-class-scrub-configure
> > > >
> > > > diff --git a/Documentation/ABI/testing/sysfs-class-scrub-configure b/Documentation/ABI/testing/sysfs-class-scrub-configure
> > > > new file mode 100644
> > > > index 000000000000..347e2167dc62
> > > > --- /dev/null
> > > > +++ b/Documentation/ABI/testing/sysfs-class-scrub-configure
> > > > @@ -0,0 +1,82 @@
> > > > +What:          /sys/class/scrub/
> > > > +Date:          September 2023
> > > > +KernelVersion: 6.7
> > > > +Contact:       linux-kernel@vger.kernel.org
> > > > +Description:
> > > > +               The scrub/ class subdirectory belongs to the
> > > > +               scrubber subsystem.
> > > > +
> > > > +What:          /sys/class/scrub/scrubX/
> > > > +Date:          September 2023
> > > > +KernelVersion: 6.7
> > > > +Contact:       linux-kernel@vger.kernel.org
> > > > +Description:
> > > > +               The /sys/class/scrub/scrub{0,1,2,3,...} directories  
> > >
> > > This API (sysfs interface) is very specific to the ACPI interface
> > > defined for hardware patrol scrubber. I wonder can we have some
> > > interface that is more generic, for a couple of reasons:  
> >
> > Agreed that it makes sense to define a broad interface.  We have
> > some hardware focused drivers we can't share yet (IP rules until
> > a release date in the near future) where this is a reasonable fit
> > - but indeed there are others such as mapping this to DDR ECS
> > where it isn't a great mapping.
> >
> > I'd love to come up with an interface that has the right blend
> > of generality and flexibility.  That is easiest done before we have
> > any implementation merged.
> >  
> > >
> > > 1. I am not aware of any chip/platform hardware that implemented the
> > > hw ps part defined in ACPI RASF/RAS2 spec. So I am curious what the
> > > RAS experts from different hardware vendors think about this. For
> > > example, Tony and Dave from Intel, Jon and Vilas from AMD. Is there
> > > any hardware platform (if allowed to disclose) that implemented ACPI
> > > RASF/RAS2? If so, will vendors continue to support the control of
> > > patrol scrubber using the ACPI spec? If not (as Tony said in [1], will
> > > the vendor consider starting some future platform?
> > >
> > > If we are unlikely to get the vendor support, creating this ACPI
> > > specific sysfs API (and the driver implementations) in Linux seems to
> > > have limited meaning.  
> >
> > There is a bit of a chicken and egg problem here. Until there is
> > reasonable support in kernel (or it looks like there will be),
> > BIOS teams push back on a requirement to add the tables.
> > I'd encourage no one to bother with RASF - RAS2 is much less
> > ambiguous.  
> 
> Here mainly to re-ping folks from Intel (Tony and Dave)  and AMD (Jon
> and Vilas) for your opinion on RAS2.
> 
> >  
> > >  
> > > > +               correspond to each scrub device.
> > > > +
> > > > +What:          /sys/class/scrub/scrubX/name
> > > > +Date:          September 2023
> > > > +KernelVersion: 6.7
> > > > +Contact:       linux-kernel@vger.kernel.org
> > > > +Description:
> > > > +               (RO) name of the memory scrub device
> > > > +
> > > > +What:          /sys/class/scrub/scrubX/regionY/  
> > >
> > > 2. I believe the concept of "region" here is probably from
> > > PATROL_SCRUB defined in “APCI section 5.2.20.5. Parameter Block". It
> > > is indeed powerful: if a process's physical memory spans over multiple
> > > memory controllers, OS can in theory scrub chunks of the memory
> > > belonging to the process. However, from a previous discussion [1],
> > > "From a h/w perspective it might always be complex". IIUC, the address
> > > translation from physical address to channel address is hard to
> > > achieve, and probably that's one of the tech reasons the patrol scrub
> > > ACPI spec is not in practice implemented?  
> >
> > Next bit is kind of an aside as I mostly agree with your conclusions ;)
> >
> > This obviously depends on your memory interleave. You want to present
> > physical address ranges as single controllable regions - probably
> > most interesting being stuff that maps to NUMA nodes.  The short
> > answer is that any firmware control will end up writing to all the
> > memory controllers involved in a given PA range - firmware can easily
> > establish which those are.
> >
> > A memory controller can support multiple scrub regions
> > which map from a PA range to a particular set of RAM addresses
> > - that's implementation specific. The memory controller is getting
> > the host PA and can carry out appropriate hashing if it wants to.
> > Many scrub solutions won't do this - in which case it's max one
> > region per memory controller (mapped by firmware to one region per
> > interleave set - assuming interleave sets take in whole memory
> > controllers - which they normally do).
> >
> > I would expect existing systems (not design with this differentiated
> > scrub in mind) to only support scrub control by PA range at the
> > granularity of interleave sets.
> >
> > Note that this general question of PA based controls also
> > maps to things like resource control (resctl) where it's only interesting
> > to partition memory bandwidth such that the partition applies to the
> > whole interleave set - that's done for ARM systems anyway by having
> > the userspace interface pretend there is only one control, but
> > write the settings to all actual controllers involved. Not sure what
> > x86 does.
> >
> > Taking a few examples that I know of.  All 4 socket server - with
> > control of these as bios options ;).
> > Assuming individual memory controllers don't allow scrub to be
> > configured by PA range.
> >
> > 1. Full system memory interleave (people do this form of crazy)
> >    In that case, there is only one set of firmware controls
> >    that write to the interfaces of every memory controller.  Depending
> >    on the interleave design that may still allow multiple regions.
> >
> > 2. Socket wide memory interleave.  In that case, firmware controls
> >    need to effect all memory controllers in that socket if the
> >    requested 'region' maps to them.  So 4 PA regions.
> >
> > 3. Die wide memory interleave.  Finer granularity of control
> >    so perhaps 8 PA rgiones.
> >
> > 4. Finer granularity (there are reasons to do this for above mentioned
> >    bandwidth resource control which you can only do if not interleaving
> >    across multiple controllers).
> >
> >
> >  
> > >
> > > So my take is, control at the granularity of the memory controller is
> > > probably a nice compromise.
> > > Both OS and userspace can get a pretty
> > > decent amount of flexibility, start/stop/adjust speed of the scrubbing
> > > on a memory controller; meanwhile it doesn't impose too much pain to
> > > hardware vendors when they provide these features in hardware. In
> > > terms of how these controls/features will be implemented, I imagine it
> > > could be implemented:
> > > * via hardware registers that directly or indirectly control on memory
> > > controllers
> > > * via ACPI if the situation changes in 10 years (and the RASF/RAS2/PCC
> > > drivers implemented in this patchset can be directly plugged into)
> > > * a kernel-thread that uses cpu read to detect memory errors, if
> > > hardware support is unavailable or not good enough
> > >  
> >
> > I more or less agree, but would tweak a little.
> >
> > 1) Allow for multiple regions per memory controller - that's needed
> >    for RASF etc anyway - because as far as they are concerned there
> >    is only one interface presented.
> > 2) Tie the interface to interleave sets, not memory controllers.
> >    NUMA nodes often being a good stand in for those.  
> 
> Does you mean /sys/devices/system/node/nodeX/scrub, where scrub is a
> virtual concept of scrubbing device that mapps to 1 or several
> physical scrubber backends.
> 
> For example, starting/stopping the virtual device means issuing
> START/STOP cmd to all backends. And...

That file location would work I think, though in some cases it
may get complex with single memory controllers covering parts of
different interleave sets. I guess mapping that mess is all a
software problem.


> 
> >    Controlling memory controllers separately for parts of an interleave
> >    isn't something I'd think was very useful.  This will probably get
> >    messy in the future though and the complexity could be pushed to
> >    a userspace tool - as long as enough info is available elsewhere
> >    in sysfs.  So need those memory controller directories you propose
> >    to include info on the PA ranges that they are involved in backing  
> 
> is it acceptable if we don't provide PA range or region in the
> interface *for now* if it complicates things a lot? I could be wrong,
> but the user of scrubber seems would be ok with not being able to
> scrub an arbitrary physical address range. In contrast, not knowing
> the scrub results seems to be more annoying to users. So simply giving
> some progress indicator like how many bytes a scrubber has scrubbed.

Progress is usually a non starter given it tends to be continuous in
the hardware controllers I've seen. So you control that it does it every
X hours, but doesn't provide a progress indicator.

I'd stil like PA range being there, but maybe RO fixed values.
Should be easy enough to establish. Also, won't matter if multiple
controllers identify same PA range, assumption would be control them all from
userspace, or use the NUMA type mapping above.

> 
> When we really need to support PA range or region, under the
> /sys/devices/system/node/nodeX/scrub interface, it basically uses NUMA
> node X's PA range. Then to scrub node memory in range [PA1, PA2), some
> driver that understand all backends (or can talk to all backends'
> drivers) needs to translate the PA into the address in backend's
> address space, for example, [PA1, PA2) is mapped to 2 device ranges
> [DA11, DA12) on backend_1 and [DA21, DA22) on backend_2.

Agreed.

> 
> >    (which to keep things horrible, can change at runtime via memory
> >     hotplug and remapping of host physical address ranges on CXL etc)  
> 
> CXL memory locally attached to the host is probably more or less the
> same as normal physical memory. I wonder what it would be like for CXL
> memory remotely attached through a memory pool. Does it make sense
> that the controller/owner of the memory pool takes the responsibility
> of controlling the CXL memory controller to control scrubbing? Does
> the owner need to provide/mediate scrubbing support for other clients
> using the memory pool?

Good and complex questions but lets leave those for a while ;)
beyond comprehending the fact that we can't always map interleave to
particular memory controllers as they may use only a subset of the
memory on that device. 

+CC'd linux-cxl for info, not because I want that discussion to go deep
at the moment.

Jonathan

> 
> Thanks,
> Jiaqi
> 
> >  
> > > Given these possible backends of scrubbing, I think a more generic
> > > sysfs API that covers and abstracts these backends will be more
> > > valuable right now. I haven’t thought thoroughly, but how about
> > > defining the top-level interface as something like
> > > “/sys/devices/system/memory_controller_scrubX/”, or
> > > “/sys/class/memory_controllerX/scrub”?  
> >
> > No particular harm in the rename of the directory I guess.
> > Though some of those 'memory_controllers' would be virtual as they
> > wouldn't correspond to actual memory controllers but rather to
> > sets of them.
> >
> > Jonathan
> >  
> > >
> > > [1] https://lore.kernel.org/linux-mm/SJ1PR11MB6083BF93E9A88E659CED5EC4FC3F9@SJ1PR11MB6083.namprd11.prod.outlook.com/T/#m13516ee35caa05b506080ae805bee14f9f958d43  
> >  
> > >  
> > > > +Date:          September 2023
> > > > +KernelVersion: 6.7
> > > > +Contact:       linux-kernel@vger.kernel.org
> > > > +Description:
> > > > +               The /sys/class/scrub/scrubX/region{0,1,2,3,...}
> > > > +               directories correspond to each scrub region under a scrub device.
> > > > +               Scrub region is a physical address range for which scrub may be
> > > > +               separately controlled. Regions may overlap in which case the
> > > > +               scrubbing rate of the overlapped memory will be at least that
> > > > +               expected due to each overlapping region.
> > > > +
> > > > +What:          /sys/class/scrub/scrubX/regionY/addr_base
> > > > +Date:          September 2023
> > > > +KernelVersion: 6.7
> > > > +Contact:       linux-kernel@vger.kernel.org
> > > > +Description:
> > > > +               (RW) The base of the address range of the memory region
> > > > +               to be patrol scrubbed.
> > > > +               On reading, returns the base of the memory region for
> > > > +               the actual address range(The platform calculates
> > > > +               the nearest patrol scrub boundary address from where
> > > > +               it can start scrubbing).
> > > > +
> > > > +What:          /sys/class/scrub/scrubX/regionY/addr_size
> > > > +Date:          September 2023
> > > > +KernelVersion: 6.7
> > > > +Contact:       linux-kernel@vger.kernel.org
> > > > +Description:
> > > > +               (RW) The size of the address range to be patrol scrubbed.
> > > > +               On reading, returns the size of the memory region for
> > > > +               the actual address range.
> > > > +
> > > > +What:          /sys/class/scrub/scrubX/regionY/enable
> > > > +Date:          September 2023
> > > > +KernelVersion: 6.7
> > > > +Contact:       linux-kernel@vger.kernel.org
> > > > +Description:
> > > > +               (WO) Start/Stop scrubbing the memory region.
> > > > +               1 - enable the memory scrubbing.
> > > > +               0 - disable the memory scrubbing..
> > > > +
> > > > +What:          /sys/class/scrub/scrubX/regionY/speed_available
> > > > +Date:          September 2023
> > > > +KernelVersion: 6.7
> > > > +Contact:       linux-kernel@vger.kernel.org
> > > > +Description:
> > > > +               (RO) Supported range for the partol speed(scrub rate)
> > > > +               by the scrubber for a memory region.
> > > > +               The unit of the scrub rate vary depends on the scrubber.
> > > > +
> > > > +What:          /sys/class/scrub/scrubX/regionY/speed
> > > > +Date:          September 2023
> > > > +KernelVersion: 6.7
> > > > +Contact:       linux-kernel@vger.kernel.org
> > > > +Description:
> > > > +               (RW) The partol speed(scrub rate) on the memory region specified and
> > > > +               it must be with in the supported range by the scrubber.
> > > > +               The unit of the scrub rate vary depends on the scrubber.
> > > > --
> > > > 2.34.1
> > > >
> > > >  
> > >  
> >
> >  
>
David Rientjes Oct. 5, 2023, 3:18 a.m. UTC | #5
On Wed, 27 Sep 2023, Jiaqi Yan wrote:

> > > 1. I am not aware of any chip/platform hardware that implemented the
> > > hw ps part defined in ACPI RASF/RAS2 spec. So I am curious what the
> > > RAS experts from different hardware vendors think about this. For
> > > example, Tony and Dave from Intel, Jon and Vilas from AMD. Is there
> > > any hardware platform (if allowed to disclose) that implemented ACPI
> > > RASF/RAS2? If so, will vendors continue to support the control of
> > > patrol scrubber using the ACPI spec? If not (as Tony said in [1], will
> > > the vendor consider starting some future platform?
> > >
> > > If we are unlikely to get the vendor support, creating this ACPI
> > > specific sysfs API (and the driver implementations) in Linux seems to
> > > have limited meaning.
> >
> > There is a bit of a chicken and egg problem here. Until there is
> > reasonable support in kernel (or it looks like there will be),
> > BIOS teams push back on a requirement to add the tables.
> > I'd encourage no one to bother with RASF - RAS2 is much less
> > ambiguous.
> 
> Here mainly to re-ping folks from Intel (Tony and Dave)  and AMD (Jon
> and Vilas) for your opinion on RAS2.
> 

We'll need to know from vendors, ideally at minimum from both Intel and 
AMD, whether RAS2 is the long-term vision here.  Nothing is set in stone, 
of course, but deciding whether RAS2 is the standard that we should be 
rallying around will help to guide future development including in the 
kernel.

If RAS2 is insufficient for future use cases or we would need to support 
multiple implementations in the kernel for configuring the patrol scrubber 
depending on vendor, that's great feedback to have.

I'd much rather focus on implementing something in the kernel that we have 
some clarity about the vendors supporting, especially when it comes with 
user visible interfaces, as opposed to something that may not be used long 
term.  I think that's a fair ask and that vendor feedback is required 
here?
Jonathan Cameron Oct. 6, 2023, 1:02 p.m. UTC | #6
On Wed, 4 Oct 2023 20:18:12 -0700 (PDT)
David Rientjes <rientjes@google.com> wrote:

> On Wed, 27 Sep 2023, Jiaqi Yan wrote:
> 
> > > > 1. I am not aware of any chip/platform hardware that implemented the
> > > > hw ps part defined in ACPI RASF/RAS2 spec. So I am curious what the
> > > > RAS experts from different hardware vendors think about this. For
> > > > example, Tony and Dave from Intel, Jon and Vilas from AMD. Is there
> > > > any hardware platform (if allowed to disclose) that implemented ACPI
> > > > RASF/RAS2? If so, will vendors continue to support the control of
> > > > patrol scrubber using the ACPI spec? If not (as Tony said in [1], will
> > > > the vendor consider starting some future platform?
> > > >
> > > > If we are unlikely to get the vendor support, creating this ACPI
> > > > specific sysfs API (and the driver implementations) in Linux seems to
> > > > have limited meaning.  
> > >
> > > There is a bit of a chicken and egg problem here. Until there is
> > > reasonable support in kernel (or it looks like there will be),
> > > BIOS teams push back on a requirement to add the tables.
> > > I'd encourage no one to bother with RASF - RAS2 is much less
> > > ambiguous.  
> > 
> > Here mainly to re-ping folks from Intel (Tony and Dave)  and AMD (Jon
> > and Vilas) for your opinion on RAS2.
> >   
> 
> We'll need to know from vendors, ideally at minimum from both Intel and 
> AMD, whether RAS2 is the long-term vision here.  Nothing is set in stone, 
> of course, but deciding whether RAS2 is the standard that we should be 
> rallying around will help to guide future development including in the 
> kernel.
> 
> If RAS2 is insufficient for future use cases or we would need to support 
> multiple implementations in the kernel for configuring the patrol scrubber 
> depending on vendor, that's great feedback to have.
> 
> I'd much rather focus on implementing something in the kernel that we have 
> some clarity about the vendors supporting, especially when it comes with 
> user visible interfaces, as opposed to something that may not be used long 
> term.  I think that's a fair ask and that vendor feedback is required 
> here?

Agreed and happy to have feedback from Intel and AMD + all the other CPU
vendors who make use of ACPI + all the OEMs who add stuff well beyond what
Intel and AMD tell them to :)  I'll just note a lot of the ACPI support in the
kernel covers stuff not used on mainstream x86 platforms because they are
doing something custom and we didn't want 2 + X custom implementations...

Some other interfaces for scrub control (beyond existing embedded ones)
will surface in the next few months where RAS2 is not appropriate.

Jonathan
Sridharan, Vilas Oct. 6, 2023, 1:06 p.m. UTC | #7
[AMD Official Use Only - General]

I do not believe AMD has implemented RASF/RAS2 at all.

We are looking at it, but our initial impression is that it is insufficiently flexible for general use. (Not just for this feature, but for others in the future.)

    -Vilas

-----Original Message-----
From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Sent: Friday, October 6, 2023 9:02 AM
To: David Rientjes <rientjes@google.com>
Cc: Jiaqi Yan <jiaqiyan@google.com>; Luck, Tony <tony.luck@intel.com>; Grimm, Jon <Jon.Grimm@amd.com>; dave.hansen@linux.intel.com; Sridharan, Vilas <Vilas.Sridharan@amd.com>; linuxarm@huawei.com; shiju.jose@huawei.com; linux-acpi@vger.kernel.org; linux-mm@kvack.org; linux-kernel@vger.kernel.org; rafael@kernel.org; lenb@kernel.org; naoya.horiguchi@nec.com; james.morse@arm.com; david@redhat.com; jthoughton@google.com; somasundaram.a@hpe.com; erdemaktas@google.com; pgonda@google.com; duenwen@google.com; mike.malvestuto@intel.com; gthelen@google.com; tanxiaofei@huawei.com; prime.zeng@hisilicon.com
Subject: Re: [RFC PATCH 2/9] memory: scrub: sysfs: Add Documentation entries for set of scrub attributes

Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.


On Wed, 4 Oct 2023 20:18:12 -0700 (PDT)
David Rientjes <rientjes@google.com> wrote:

> On Wed, 27 Sep 2023, Jiaqi Yan wrote:
>
> > > > 1. I am not aware of any chip/platform hardware that implemented
> > > > the hw ps part defined in ACPI RASF/RAS2 spec. So I am curious
> > > > what the RAS experts from different hardware vendors think about
> > > > this. For example, Tony and Dave from Intel, Jon and Vilas from
> > > > AMD. Is there any hardware platform (if allowed to disclose)
> > > > that implemented ACPI RASF/RAS2? If so, will vendors continue to
> > > > support the control of patrol scrubber using the ACPI spec? If
> > > > not (as Tony said in [1], will the vendor consider starting some future platform?
> > > >
> > > > If we are unlikely to get the vendor support, creating this ACPI
> > > > specific sysfs API (and the driver implementations) in Linux
> > > > seems to have limited meaning.
> > >
> > > There is a bit of a chicken and egg problem here. Until there is
> > > reasonable support in kernel (or it looks like there will be),
> > > BIOS teams push back on a requirement to add the tables.
> > > I'd encourage no one to bother with RASF - RAS2 is much less
> > > ambiguous.
> >
> > Here mainly to re-ping folks from Intel (Tony and Dave)  and AMD
> > (Jon and Vilas) for your opinion on RAS2.
> >
>
> We'll need to know from vendors, ideally at minimum from both Intel
> and AMD, whether RAS2 is the long-term vision here.  Nothing is set in
> stone, of course, but deciding whether RAS2 is the standard that we
> should be rallying around will help to guide future development
> including in the kernel.
>
> If RAS2 is insufficient for future use cases or we would need to
> support multiple implementations in the kernel for configuring the
> patrol scrubber depending on vendor, that's great feedback to have.
>
> I'd much rather focus on implementing something in the kernel that we
> have some clarity about the vendors supporting, especially when it
> comes with user visible interfaces, as opposed to something that may
> not be used long term.  I think that's a fair ask and that vendor
> feedback is required here?

Agreed and happy to have feedback from Intel and AMD + all the other CPU vendors who make use of ACPI + all the OEMs who add stuff well beyond what Intel and AMD tell them to :)  I'll just note a lot of the ACPI support in the kernel covers stuff not used on mainstream x86 platforms because they are doing something custom and we didn't want 2 + X custom implementations...

Some other interfaces for scrub control (beyond existing embedded ones) will surface in the next few months where RAS2 is not appropriate.

Jonathan
Jonathan Cameron Oct. 11, 2023, 4:35 p.m. UTC | #8
On Fri, 6 Oct 2023 13:06:53 +0000
"Sridharan, Vilas" <Vilas.Sridharan@amd.com> wrote:

> [AMD Official Use Only - General]
> 
> I do not believe AMD has implemented RASF/RAS2 at all.
> 
> We are looking at it, but our initial impression is that it is insufficiently flexible for general use. (Not just for this feature, but for others in the future.)
> 
>     -Vilas

Hi Vilas,

So obvious question is - worth fixing?

I'm not particularly keen to see 10+ different ways of meeting this requirement.

Probably not too bad if that's 10+ drivers implementing the same userspace ABI, but
definitely don't want 10 drivers and 10 ABIs.

Jonathan

> 
> -----Original Message-----
> From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
> Sent: Friday, October 6, 2023 9:02 AM
> To: David Rientjes <rientjes@google.com>
> Cc: Jiaqi Yan <jiaqiyan@google.com>; Luck, Tony <tony.luck@intel.com>; Grimm, Jon <Jon.Grimm@amd.com>; dave.hansen@linux.intel.com; Sridharan, Vilas <Vilas.Sridharan@amd.com>; linuxarm@huawei.com; shiju.jose@huawei.com; linux-acpi@vger.kernel.org; linux-mm@kvack.org; linux-kernel@vger.kernel.org; rafael@kernel.org; lenb@kernel.org; naoya.horiguchi@nec.com; james.morse@arm.com; david@redhat.com; jthoughton@google.com; somasundaram.a@hpe.com; erdemaktas@google.com; pgonda@google.com; duenwen@google.com; mike.malvestuto@intel.com; gthelen@google.com; tanxiaofei@huawei.com; prime.zeng@hisilicon.com
> Subject: Re: [RFC PATCH 2/9] memory: scrub: sysfs: Add Documentation entries for set of scrub attributes
> 
> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> 
> 
> On Wed, 4 Oct 2023 20:18:12 -0700 (PDT)
> David Rientjes <rientjes@google.com> wrote:
> 
> > On Wed, 27 Sep 2023, Jiaqi Yan wrote:
> >  
> > > > > 1. I am not aware of any chip/platform hardware that implemented
> > > > > the hw ps part defined in ACPI RASF/RAS2 spec. So I am curious
> > > > > what the RAS experts from different hardware vendors think about
> > > > > this. For example, Tony and Dave from Intel, Jon and Vilas from
> > > > > AMD. Is there any hardware platform (if allowed to disclose)
> > > > > that implemented ACPI RASF/RAS2? If so, will vendors continue to
> > > > > support the control of patrol scrubber using the ACPI spec? If
> > > > > not (as Tony said in [1], will the vendor consider starting some future platform?
> > > > >
> > > > > If we are unlikely to get the vendor support, creating this ACPI
> > > > > specific sysfs API (and the driver implementations) in Linux
> > > > > seems to have limited meaning.  
> > > >
> > > > There is a bit of a chicken and egg problem here. Until there is
> > > > reasonable support in kernel (or it looks like there will be),
> > > > BIOS teams push back on a requirement to add the tables.
> > > > I'd encourage no one to bother with RASF - RAS2 is much less
> > > > ambiguous.  
> > >
> > > Here mainly to re-ping folks from Intel (Tony and Dave)  and AMD
> > > (Jon and Vilas) for your opinion on RAS2.
> > >  
> >
> > We'll need to know from vendors, ideally at minimum from both Intel
> > and AMD, whether RAS2 is the long-term vision here.  Nothing is set in
> > stone, of course, but deciding whether RAS2 is the standard that we
> > should be rallying around will help to guide future development
> > including in the kernel.
> >
> > If RAS2 is insufficient for future use cases or we would need to
> > support multiple implementations in the kernel for configuring the
> > patrol scrubber depending on vendor, that's great feedback to have.
> >
> > I'd much rather focus on implementing something in the kernel that we
> > have some clarity about the vendors supporting, especially when it
> > comes with user visible interfaces, as opposed to something that may
> > not be used long term.  I think that's a fair ask and that vendor
> > feedback is required here?  
> 
> Agreed and happy to have feedback from Intel and AMD + all the other CPU vendors who make use of ACPI + all the OEMs who add stuff well beyond what Intel and AMD tell them to :)  I'll just note a lot of the ACPI support in the kernel covers stuff not used on mainstream x86 platforms because they are doing something custom and we didn't want 2 + X custom implementations...
> 
> Some other interfaces for scrub control (beyond existing embedded ones) will surface in the next few months where RAS2 is not appropriate.
> 
> Jonathan
> 
>
Sridharan, Vilas Oct. 12, 2023, 1:41 p.m. UTC | #9
[AMD Official Use Only - General]

+ Leo and Yazen

We looked at RASF and RAS2 again. We don't think RASF is worth fixing. Our preference is to coalesce around RAS2 because we think it can be extended in interesting ways.

The patrol scrub function probably needs some changes to be more general across different types of hardware (there are some baked-in assumptions that don't always hold true).

We will look at some spec changes to fix the patrol scrub function, and we are going to start thinking about other functions that can be added to RAS2.

    -Vilas

-----Original Message-----
From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Sent: Wednesday, October 11, 2023 12:36 PM
To: Sridharan, Vilas <Vilas.Sridharan@amd.com>
Cc: David Rientjes <rientjes@google.com>; Jiaqi Yan <jiaqiyan@google.com>; Luck, Tony <tony.luck@intel.com>; Grimm, Jon <Jon.Grimm@amd.com>; dave.hansen@linux.intel.com; linuxarm@huawei.com; shiju.jose@huawei.com; linux-acpi@vger.kernel.org; linux-mm@kvack.org; linux-kernel@vger.kernel.org; rafael@kernel.org; lenb@kernel.org; naoya.horiguchi@nec.com; james.morse@arm.com; david@redhat.com; jthoughton@google.com; somasundaram.a@hpe.com; erdemaktas@google.com; pgonda@google.com; duenwen@google.com; mike.malvestuto@intel.com; gthelen@google.com; tanxiaofei@huawei.com; prime.zeng@hisilicon.com
Subject: Re: [RFC PATCH 2/9] memory: scrub: sysfs: Add Documentation entries for set of scrub attributes

Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.


On Fri, 6 Oct 2023 13:06:53 +0000
"Sridharan, Vilas" <Vilas.Sridharan@amd.com> wrote:

> [AMD Official Use Only - General]
>
> I do not believe AMD has implemented RASF/RAS2 at all.
>
> We are looking at it, but our initial impression is that it is
> insufficiently flexible for general use. (Not just for this feature,
> but for others in the future.)
>
>     -Vilas

Hi Vilas,

So obvious question is - worth fixing?

I'm not particularly keen to see 10+ different ways of meeting this requirement.

Probably not too bad if that's 10+ drivers implementing the same userspace ABI, but definitely don't want 10 drivers and 10 ABIs.

Jonathan

>
> -----Original Message-----
> From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
> Sent: Friday, October 6, 2023 9:02 AM
> To: David Rientjes <rientjes@google.com>
> Cc: Jiaqi Yan <jiaqiyan@google.com>; Luck, Tony <tony.luck@intel.com>;
> Grimm, Jon <Jon.Grimm@amd.com>; dave.hansen@linux.intel.com;
> Sridharan, Vilas <Vilas.Sridharan@amd.com>; linuxarm@huawei.com;
> shiju.jose@huawei.com; linux-acpi@vger.kernel.org; linux-mm@kvack.org;
> linux-kernel@vger.kernel.org; rafael@kernel.org; lenb@kernel.org;
> naoya.horiguchi@nec.com; james.morse@arm.com; david@redhat.com;
> jthoughton@google.com; somasundaram.a@hpe.com; erdemaktas@google.com;
> pgonda@google.com; duenwen@google.com; mike.malvestuto@intel.com;
> gthelen@google.com; tanxiaofei@huawei.com; prime.zeng@hisilicon.com
> Subject: Re: [RFC PATCH 2/9] memory: scrub: sysfs: Add Documentation
> entries for set of scrub attributes
>
> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
>
>
> On Wed, 4 Oct 2023 20:18:12 -0700 (PDT) David Rientjes
> <rientjes@google.com> wrote:
>
> > On Wed, 27 Sep 2023, Jiaqi Yan wrote:
> >
> > > > > 1. I am not aware of any chip/platform hardware that
> > > > > implemented the hw ps part defined in ACPI RASF/RAS2 spec. So
> > > > > I am curious what the RAS experts from different hardware
> > > > > vendors think about this. For example, Tony and Dave from
> > > > > Intel, Jon and Vilas from AMD. Is there any hardware platform
> > > > > (if allowed to disclose) that implemented ACPI RASF/RAS2? If
> > > > > so, will vendors continue to support the control of patrol
> > > > > scrubber using the ACPI spec? If not (as Tony said in [1], will the vendor consider starting some future platform?
> > > > >
> > > > > If we are unlikely to get the vendor support, creating this
> > > > > ACPI specific sysfs API (and the driver implementations) in
> > > > > Linux seems to have limited meaning.
> > > >
> > > > There is a bit of a chicken and egg problem here. Until there is
> > > > reasonable support in kernel (or it looks like there will be),
> > > > BIOS teams push back on a requirement to add the tables.
> > > > I'd encourage no one to bother with RASF - RAS2 is much less
> > > > ambiguous.
> > >
> > > Here mainly to re-ping folks from Intel (Tony and Dave)  and AMD
> > > (Jon and Vilas) for your opinion on RAS2.
> > >
> >
> > We'll need to know from vendors, ideally at minimum from both Intel
> > and AMD, whether RAS2 is the long-term vision here.  Nothing is set
> > in stone, of course, but deciding whether RAS2 is the standard that
> > we should be rallying around will help to guide future development
> > including in the kernel.
> >
> > If RAS2 is insufficient for future use cases or we would need to
> > support multiple implementations in the kernel for configuring the
> > patrol scrubber depending on vendor, that's great feedback to have.
> >
> > I'd much rather focus on implementing something in the kernel that
> > we have some clarity about the vendors supporting, especially when
> > it comes with user visible interfaces, as opposed to something that
> > may not be used long term.  I think that's a fair ask and that
> > vendor feedback is required here?
>
> Agreed and happy to have feedback from Intel and AMD + all the other CPU vendors who make use of ACPI + all the OEMs who add stuff well beyond what Intel and AMD tell them to :)  I'll just note a lot of the ACPI support in the kernel covers stuff not used on mainstream x86 platforms because they are doing something custom and we didn't want 2 + X custom implementations...
>
> Some other interfaces for scrub control (beyond existing embedded ones) will surface in the next few months where RAS2 is not appropriate.
>
> Jonathan
>
>
Jonathan Cameron Oct. 12, 2023, 3:02 p.m. UTC | #10
On Thu, 12 Oct 2023 13:41:19 +0000
"Sridharan, Vilas" <Vilas.Sridharan@amd.com> wrote:

> [AMD Official Use Only - General]
> 
> + Leo and Yazen

Hi All.

+ Kangkang and Wanghuiqiang (Henson),

> 
> We looked at RASF and RAS2 again. We don't think RASF is worth fixing. Our preference is to coalesce around RAS2 because we think it can be extended in interesting ways.

Absolutely agree. I'm guessing RAS2 was previous go at fixing RASF though I haven't
done the archaeology.

> 
> The patrol scrub function probably needs some changes to be more general across different types of hardware (there are some baked-in assumptions that don't always hold true).

Agreed. One aspect I'd love to see improved is expanded discoverability of what
the hardware can do.

> 
> We will look at some spec changes to fix the patrol scrub function, and we are going to start thinking about other functions that can be added to RAS2.

Feel free to reach out if you want some early input on this. Are you thinking
a code first proposal?  If you think doing this through the standards body is a good idea
then perhaps message back here so we know when to look for further proposals in mantis.

Thanks,

Jonathan
> 
>     -Vilas
> 
> -----Original Message-----
> From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
> Sent: Wednesday, October 11, 2023 12:36 PM
> To: Sridharan, Vilas <Vilas.Sridharan@amd.com>
> Cc: David Rientjes <rientjes@google.com>; Jiaqi Yan <jiaqiyan@google.com>; Luck, Tony <tony.luck@intel.com>; Grimm, Jon <Jon.Grimm@amd.com>; dave.hansen@linux.intel.com; linuxarm@huawei.com; shiju.jose@huawei.com; linux-acpi@vger.kernel.org; linux-mm@kvack.org; linux-kernel@vger.kernel.org; rafael@kernel.org; lenb@kernel.org; naoya.horiguchi@nec.com; james.morse@arm.com; david@redhat.com; jthoughton@google.com; somasundaram.a@hpe.com; erdemaktas@google.com; pgonda@google.com; duenwen@google.com; mike.malvestuto@intel.com; gthelen@google.com; tanxiaofei@huawei.com; prime.zeng@hisilicon.com
> Subject: Re: [RFC PATCH 2/9] memory: scrub: sysfs: Add Documentation entries for set of scrub attributes
> 
> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> 
> 
> On Fri, 6 Oct 2023 13:06:53 +0000
> "Sridharan, Vilas" <Vilas.Sridharan@amd.com> wrote:
> 
> > [AMD Official Use Only - General]
> >
> > I do not believe AMD has implemented RASF/RAS2 at all.
> >
> > We are looking at it, but our initial impression is that it is
> > insufficiently flexible for general use. (Not just for this feature,
> > but for others in the future.)
> >
> >     -Vilas  
> 
> Hi Vilas,
> 
> So obvious question is - worth fixing?
> 
> I'm not particularly keen to see 10+ different ways of meeting this requirement.
> 
> Probably not too bad if that's 10+ drivers implementing the same userspace ABI, but definitely don't want 10 drivers and 10 ABIs.
> 
> Jonathan
> 
> >
> > -----Original Message-----
> > From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
> > Sent: Friday, October 6, 2023 9:02 AM
> > To: David Rientjes <rientjes@google.com>
> > Cc: Jiaqi Yan <jiaqiyan@google.com>; Luck, Tony <tony.luck@intel.com>;
> > Grimm, Jon <Jon.Grimm@amd.com>; dave.hansen@linux.intel.com;
> > Sridharan, Vilas <Vilas.Sridharan@amd.com>; linuxarm@huawei.com;
> > shiju.jose@huawei.com; linux-acpi@vger.kernel.org; linux-mm@kvack.org;
> > linux-kernel@vger.kernel.org; rafael@kernel.org; lenb@kernel.org;
> > naoya.horiguchi@nec.com; james.morse@arm.com; david@redhat.com;
> > jthoughton@google.com; somasundaram.a@hpe.com; erdemaktas@google.com;
> > pgonda@google.com; duenwen@google.com; mike.malvestuto@intel.com;
> > gthelen@google.com; tanxiaofei@huawei.com; prime.zeng@hisilicon.com
> > Subject: Re: [RFC PATCH 2/9] memory: scrub: sysfs: Add Documentation
> > entries for set of scrub attributes
> >
> > Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> >
> >
> > On Wed, 4 Oct 2023 20:18:12 -0700 (PDT) David Rientjes
> > <rientjes@google.com> wrote:
> >  
> > > On Wed, 27 Sep 2023, Jiaqi Yan wrote:
> > >  
> > > > > > 1. I am not aware of any chip/platform hardware that
> > > > > > implemented the hw ps part defined in ACPI RASF/RAS2 spec. So
> > > > > > I am curious what the RAS experts from different hardware
> > > > > > vendors think about this. For example, Tony and Dave from
> > > > > > Intel, Jon and Vilas from AMD. Is there any hardware platform
> > > > > > (if allowed to disclose) that implemented ACPI RASF/RAS2? If
> > > > > > so, will vendors continue to support the control of patrol
> > > > > > scrubber using the ACPI spec? If not (as Tony said in [1], will the vendor consider starting some future platform?
> > > > > >
> > > > > > If we are unlikely to get the vendor support, creating this
> > > > > > ACPI specific sysfs API (and the driver implementations) in
> > > > > > Linux seems to have limited meaning.  
> > > > >
> > > > > There is a bit of a chicken and egg problem here. Until there is
> > > > > reasonable support in kernel (or it looks like there will be),
> > > > > BIOS teams push back on a requirement to add the tables.
> > > > > I'd encourage no one to bother with RASF - RAS2 is much less
> > > > > ambiguous.  
> > > >
> > > > Here mainly to re-ping folks from Intel (Tony and Dave)  and AMD
> > > > (Jon and Vilas) for your opinion on RAS2.
> > > >  
> > >
> > > We'll need to know from vendors, ideally at minimum from both Intel
> > > and AMD, whether RAS2 is the long-term vision here.  Nothing is set
> > > in stone, of course, but deciding whether RAS2 is the standard that
> > > we should be rallying around will help to guide future development
> > > including in the kernel.
> > >
> > > If RAS2 is insufficient for future use cases or we would need to
> > > support multiple implementations in the kernel for configuring the
> > > patrol scrubber depending on vendor, that's great feedback to have.
> > >
> > > I'd much rather focus on implementing something in the kernel that
> > > we have some clarity about the vendors supporting, especially when
> > > it comes with user visible interfaces, as opposed to something that
> > > may not be used long term.  I think that's a fair ask and that
> > > vendor feedback is required here?  
> >
> > Agreed and happy to have feedback from Intel and AMD + all the other CPU vendors who make use of ACPI + all the OEMs who add stuff well beyond what Intel and AMD tell them to :)  I'll just note a lot of the ACPI support in the kernel covers stuff not used on mainstream x86 platforms because they are doing something custom and we didn't want 2 + X custom implementations...
> >
> > Some other interfaces for scrub control (beyond existing embedded ones) will surface in the next few months where RAS2 is not appropriate.
> >
> > Jonathan
> >
> >  
>
Sridharan, Vilas Oct. 12, 2023, 3:44 p.m. UTC | #11
[AMD Official Use Only - General]

> Are you thinking a code first proposal?  If you think doing this through the standards body is a good idea then perhaps message back here so we know when to look for further proposals in mantis.

I am not super familiar with what you mean by 'code first proposal', but we are thinking about crafting an ECN (or a set of ECNs) for ACPI, that will be made public through ACPI's normal process.

    -Vilas

-----Original Message-----
From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Sent: Thursday, October 12, 2023 11:02 AM
To: Sridharan, Vilas <Vilas.Sridharan@amd.com>
Cc: Duran, Leo <leo.duran@amd.com>; Ghannam, Yazen <Yazen.Ghannam@amd.com>; David Rientjes <rientjes@google.com>; Jiaqi Yan <jiaqiyan@google.com>; Luck, Tony <tony.luck@intel.com>; Grimm, Jon <Jon.Grimm@amd.com>; dave.hansen@linux.intel.com; linuxarm@huawei.com; shiju.jose@huawei.com; linux-acpi@vger.kernel.org; linux-mm@kvack.org; linux-kernel@vger.kernel.org; rafael@kernel.org; lenb@kernel.org; naoya.horiguchi@nec.com; james.morse@arm.com; david@redhat.com; jthoughton@google.com; somasundaram.a@hpe.com; erdemaktas@google.com; pgonda@google.com; duenwen@google.com; mike.malvestuto@intel.com; gthelen@google.com; tanxiaofei@huawei.com; prime.zeng@hisilicon.com; kangkang.shen@futurewei.com; wanghuiqiang@huawei.com
Subject: Re: [RFC PATCH 2/9] memory: scrub: sysfs: Add Documentation entries for set of scrub attributes

Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.


On Thu, 12 Oct 2023 13:41:19 +0000
"Sridharan, Vilas" <Vilas.Sridharan@amd.com> wrote:

> [AMD Official Use Only - General]
>
> + Leo and Yazen

Hi All.

+ Kangkang and Wanghuiqiang (Henson),

>
> We looked at RASF and RAS2 again. We don't think RASF is worth fixing. Our preference is to coalesce around RAS2 because we think it can be extended in interesting ways.

Absolutely agree. I'm guessing RAS2 was previous go at fixing RASF though I haven't done the archaeology.

>
> The patrol scrub function probably needs some changes to be more general across different types of hardware (there are some baked-in assumptions that don't always hold true).

Agreed. One aspect I'd love to see improved is expanded discoverability of what the hardware can do.

>
> We will look at some spec changes to fix the patrol scrub function, and we are going to start thinking about other functions that can be added to RAS2.

Feel free to reach out if you want some early input on this. Are you thinking a code first proposal?  If you think doing this through the standards body is a good idea then perhaps message back here so we know when to look for further proposals in mantis.

Thanks,

Jonathan
>
>     -Vilas
>
> -----Original Message-----
> From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
> Sent: Wednesday, October 11, 2023 12:36 PM
> To: Sridharan, Vilas <Vilas.Sridharan@amd.com>
> Cc: David Rientjes <rientjes@google.com>; Jiaqi Yan
> <jiaqiyan@google.com>; Luck, Tony <tony.luck@intel.com>; Grimm, Jon
> <Jon.Grimm@amd.com>; dave.hansen@linux.intel.com; linuxarm@huawei.com;
> shiju.jose@huawei.com; linux-acpi@vger.kernel.org; linux-mm@kvack.org;
> linux-kernel@vger.kernel.org; rafael@kernel.org; lenb@kernel.org;
> naoya.horiguchi@nec.com; james.morse@arm.com; david@redhat.com;
> jthoughton@google.com; somasundaram.a@hpe.com; erdemaktas@google.com;
> pgonda@google.com; duenwen@google.com; mike.malvestuto@intel.com;
> gthelen@google.com; tanxiaofei@huawei.com; prime.zeng@hisilicon.com
> Subject: Re: [RFC PATCH 2/9] memory: scrub: sysfs: Add Documentation
> entries for set of scrub attributes
>
> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
>
>
> On Fri, 6 Oct 2023 13:06:53 +0000
> "Sridharan, Vilas" <Vilas.Sridharan@amd.com> wrote:
>
> > [AMD Official Use Only - General]
> >
> > I do not believe AMD has implemented RASF/RAS2 at all.
> >
> > We are looking at it, but our initial impression is that it is
> > insufficiently flexible for general use. (Not just for this feature,
> > but for others in the future.)
> >
> >     -Vilas
>
> Hi Vilas,
>
> So obvious question is - worth fixing?
>
> I'm not particularly keen to see 10+ different ways of meeting this requirement.
>
> Probably not too bad if that's 10+ drivers implementing the same userspace ABI, but definitely don't want 10 drivers and 10 ABIs.
>
> Jonathan
>
> >
> > -----Original Message-----
> > From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
> > Sent: Friday, October 6, 2023 9:02 AM
> > To: David Rientjes <rientjes@google.com>
> > Cc: Jiaqi Yan <jiaqiyan@google.com>; Luck, Tony
> > <tony.luck@intel.com>; Grimm, Jon <Jon.Grimm@amd.com>;
> > dave.hansen@linux.intel.com; Sridharan, Vilas
> > <Vilas.Sridharan@amd.com>; linuxarm@huawei.com;
> > shiju.jose@huawei.com; linux-acpi@vger.kernel.org;
> > linux-mm@kvack.org; linux-kernel@vger.kernel.org; rafael@kernel.org;
> > lenb@kernel.org; naoya.horiguchi@nec.com; james.morse@arm.com;
> > david@redhat.com; jthoughton@google.com; somasundaram.a@hpe.com;
> > erdemaktas@google.com; pgonda@google.com; duenwen@google.com;
> > mike.malvestuto@intel.com; gthelen@google.com;
> > tanxiaofei@huawei.com; prime.zeng@hisilicon.com
> > Subject: Re: [RFC PATCH 2/9] memory: scrub: sysfs: Add Documentation
> > entries for set of scrub attributes
> >
> > Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> >
> >
> > On Wed, 4 Oct 2023 20:18:12 -0700 (PDT) David Rientjes
> > <rientjes@google.com> wrote:
> >
> > > On Wed, 27 Sep 2023, Jiaqi Yan wrote:
> > >
> > > > > > 1. I am not aware of any chip/platform hardware that
> > > > > > implemented the hw ps part defined in ACPI RASF/RAS2 spec.
> > > > > > So I am curious what the RAS experts from different hardware
> > > > > > vendors think about this. For example, Tony and Dave from
> > > > > > Intel, Jon and Vilas from AMD. Is there any hardware
> > > > > > platform (if allowed to disclose) that implemented ACPI
> > > > > > RASF/RAS2? If so, will vendors continue to support the
> > > > > > control of patrol scrubber using the ACPI spec? If not (as Tony said in [1], will the vendor consider starting some future platform?
> > > > > >
> > > > > > If we are unlikely to get the vendor support, creating this
> > > > > > ACPI specific sysfs API (and the driver implementations) in
> > > > > > Linux seems to have limited meaning.
> > > > >
> > > > > There is a bit of a chicken and egg problem here. Until there
> > > > > is reasonable support in kernel (or it looks like there will
> > > > > be), BIOS teams push back on a requirement to add the tables.
> > > > > I'd encourage no one to bother with RASF - RAS2 is much less
> > > > > ambiguous.
> > > >
> > > > Here mainly to re-ping folks from Intel (Tony and Dave)  and AMD
> > > > (Jon and Vilas) for your opinion on RAS2.
> > > >
> > >
> > > We'll need to know from vendors, ideally at minimum from both
> > > Intel and AMD, whether RAS2 is the long-term vision here.  Nothing
> > > is set in stone, of course, but deciding whether RAS2 is the
> > > standard that we should be rallying around will help to guide
> > > future development including in the kernel.
> > >
> > > If RAS2 is insufficient for future use cases or we would need to
> > > support multiple implementations in the kernel for configuring the
> > > patrol scrubber depending on vendor, that's great feedback to have.
> > >
> > > I'd much rather focus on implementing something in the kernel that
> > > we have some clarity about the vendors supporting, especially when
> > > it comes with user visible interfaces, as opposed to something
> > > that may not be used long term.  I think that's a fair ask and
> > > that vendor feedback is required here?
> >
> > Agreed and happy to have feedback from Intel and AMD + all the other CPU vendors who make use of ACPI + all the OEMs who add stuff well beyond what Intel and AMD tell them to :)  I'll just note a lot of the ACPI support in the kernel covers stuff not used on mainstream x86 platforms because they are doing something custom and we didn't want 2 + X custom implementations...
> >
> > Some other interfaces for scrub control (beyond existing embedded ones) will surface in the next few months where RAS2 is not appropriate.
> >
> > Jonathan
> >
> >
>
Jonathan Cameron Oct. 13, 2023, 9:07 a.m. UTC | #12
On Thu, 12 Oct 2023 15:44:18 +0000
"Sridharan, Vilas" <Vilas.Sridharan@amd.com> wrote:

> [AMD Official Use Only - General]
> 
> > Are you thinking a code first proposal?  If you think doing this through the standards body is a good idea then perhaps message back here so we know when to look for further proposals in mantis.  
> 
> I am not super familiar with what you mean by 'code first proposal', but we are thinking about crafting an ECN (or a set of ECNs) for ACPI, that will be made public through ACPI's normal process.
> 
There are two ways to go about getting an ECN into the specification
and which one is chosen affects the 'made it public' part of the ECN.

One is public from the start and is done via a proposal submitted to the Specification
Updates section of the tianocore bugzilla. This is referred to as "code
first", but actually just means the request came from discussions initially
had outside of the UEFI forum. They are still discussed in ASWG, but review
also occurs in public on the bugzilla.
https://bugzilla.tianocore.org/buglist.cgi?component=Specification%20Update&product=EDK2%20Code%20First&resolution=---

The other is the more traditional method of proposing in private. There the issue
is that the review is limited to those who both engage closely with ASWG and those
who can remember their mantis password. Before we post any software based on changes
going via that route (as its covered by UEFI forum IP rules) we have to wait for a
formal specification release.  So basically the traditional method is typically slower
and doesn't let us do helpful things like ask the kernel community to review the
proposed changes. The code first route was added a few years ago to provide the
options for companies that preferred the flexibility and openness it provides.

As you can see from the link above, there is a lot of activity via the code
first route these days.

Jonathan


>     -Vilas
> 
> -----Original Message-----
> From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
> Sent: Thursday, October 12, 2023 11:02 AM
> To: Sridharan, Vilas <Vilas.Sridharan@amd.com>
> Cc: Duran, Leo <leo.duran@amd.com>; Ghannam, Yazen <Yazen.Ghannam@amd.com>; David Rientjes <rientjes@google.com>; Jiaqi Yan <jiaqiyan@google.com>; Luck, Tony <tony.luck@intel.com>; Grimm, Jon <Jon.Grimm@amd.com>; dave.hansen@linux.intel.com; linuxarm@huawei.com; shiju.jose@huawei.com; linux-acpi@vger.kernel.org; linux-mm@kvack.org; linux-kernel@vger.kernel.org; rafael@kernel.org; lenb@kernel.org; naoya.horiguchi@nec.com; james.morse@arm.com; david@redhat.com; jthoughton@google.com; somasundaram.a@hpe.com; erdemaktas@google.com; pgonda@google.com; duenwen@google.com; mike.malvestuto@intel.com; gthelen@google.com; tanxiaofei@huawei.com; prime.zeng@hisilicon.com; kangkang.shen@futurewei.com; wanghuiqiang@huawei.com
> Subject: Re: [RFC PATCH 2/9] memory: scrub: sysfs: Add Documentation entries for set of scrub attributes
> 
> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> 
> 
> On Thu, 12 Oct 2023 13:41:19 +0000
> "Sridharan, Vilas" <Vilas.Sridharan@amd.com> wrote:
> 
> > [AMD Official Use Only - General]
> >
> > + Leo and Yazen  
> 
> Hi All.
> 
> + Kangkang and Wanghuiqiang (Henson),
> 
> >
> > We looked at RASF and RAS2 again. We don't think RASF is worth fixing. Our preference is to coalesce around RAS2 because we think it can be extended in interesting ways.  
> 
> Absolutely agree. I'm guessing RAS2 was previous go at fixing RASF though I haven't done the archaeology.
> 
> >
> > The patrol scrub function probably needs some changes to be more general across different types of hardware (there are some baked-in assumptions that don't always hold true).  
> 
> Agreed. One aspect I'd love to see improved is expanded discoverability of what the hardware can do.
> 
> >
> > We will look at some spec changes to fix the patrol scrub function, and we are going to start thinking about other functions that can be added to RAS2.  
> 
> Feel free to reach out if you want some early input on this. Are you thinking a code first proposal?  If you think doing this through the standards body is a good idea then perhaps message back here so we know when to look for further proposals in mantis.
> 
> Thanks,
> 
> Jonathan
> >
> >     -Vilas
> >
> > -----Original Message-----
> > From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
> > Sent: Wednesday, October 11, 2023 12:36 PM
> > To: Sridharan, Vilas <Vilas.Sridharan@amd.com>
> > Cc: David Rientjes <rientjes@google.com>; Jiaqi Yan
> > <jiaqiyan@google.com>; Luck, Tony <tony.luck@intel.com>; Grimm, Jon
> > <Jon.Grimm@amd.com>; dave.hansen@linux.intel.com; linuxarm@huawei.com;
> > shiju.jose@huawei.com; linux-acpi@vger.kernel.org; linux-mm@kvack.org;
> > linux-kernel@vger.kernel.org; rafael@kernel.org; lenb@kernel.org;
> > naoya.horiguchi@nec.com; james.morse@arm.com; david@redhat.com;
> > jthoughton@google.com; somasundaram.a@hpe.com; erdemaktas@google.com;
> > pgonda@google.com; duenwen@google.com; mike.malvestuto@intel.com;
> > gthelen@google.com; tanxiaofei@huawei.com; prime.zeng@hisilicon.com
> > Subject: Re: [RFC PATCH 2/9] memory: scrub: sysfs: Add Documentation
> > entries for set of scrub attributes
> >
> > Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> >
> >
> > On Fri, 6 Oct 2023 13:06:53 +0000
> > "Sridharan, Vilas" <Vilas.Sridharan@amd.com> wrote:
> >  
> > > [AMD Official Use Only - General]
> > >
> > > I do not believe AMD has implemented RASF/RAS2 at all.
> > >
> > > We are looking at it, but our initial impression is that it is
> > > insufficiently flexible for general use. (Not just for this feature,
> > > but for others in the future.)
> > >
> > >     -Vilas  
> >
> > Hi Vilas,
> >
> > So obvious question is - worth fixing?
> >
> > I'm not particularly keen to see 10+ different ways of meeting this requirement.
> >
> > Probably not too bad if that's 10+ drivers implementing the same userspace ABI, but definitely don't want 10 drivers and 10 ABIs.
> >
> > Jonathan
> >  
> > >
> > > -----Original Message-----
> > > From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
> > > Sent: Friday, October 6, 2023 9:02 AM
> > > To: David Rientjes <rientjes@google.com>
> > > Cc: Jiaqi Yan <jiaqiyan@google.com>; Luck, Tony
> > > <tony.luck@intel.com>; Grimm, Jon <Jon.Grimm@amd.com>;
> > > dave.hansen@linux.intel.com; Sridharan, Vilas
> > > <Vilas.Sridharan@amd.com>; linuxarm@huawei.com;
> > > shiju.jose@huawei.com; linux-acpi@vger.kernel.org;
> > > linux-mm@kvack.org; linux-kernel@vger.kernel.org; rafael@kernel.org;
> > > lenb@kernel.org; naoya.horiguchi@nec.com; james.morse@arm.com;
> > > david@redhat.com; jthoughton@google.com; somasundaram.a@hpe.com;
> > > erdemaktas@google.com; pgonda@google.com; duenwen@google.com;
> > > mike.malvestuto@intel.com; gthelen@google.com;
> > > tanxiaofei@huawei.com; prime.zeng@hisilicon.com
> > > Subject: Re: [RFC PATCH 2/9] memory: scrub: sysfs: Add Documentation
> > > entries for set of scrub attributes
> > >
> > > Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> > >
> > >
> > > On Wed, 4 Oct 2023 20:18:12 -0700 (PDT) David Rientjes
> > > <rientjes@google.com> wrote:
> > >  
> > > > On Wed, 27 Sep 2023, Jiaqi Yan wrote:
> > > >  
> > > > > > > 1. I am not aware of any chip/platform hardware that
> > > > > > > implemented the hw ps part defined in ACPI RASF/RAS2 spec.
> > > > > > > So I am curious what the RAS experts from different hardware
> > > > > > > vendors think about this. For example, Tony and Dave from
> > > > > > > Intel, Jon and Vilas from AMD. Is there any hardware
> > > > > > > platform (if allowed to disclose) that implemented ACPI
> > > > > > > RASF/RAS2? If so, will vendors continue to support the
> > > > > > > control of patrol scrubber using the ACPI spec? If not (as Tony said in [1], will the vendor consider starting some future platform?
> > > > > > >
> > > > > > > If we are unlikely to get the vendor support, creating this
> > > > > > > ACPI specific sysfs API (and the driver implementations) in
> > > > > > > Linux seems to have limited meaning.  
> > > > > >
> > > > > > There is a bit of a chicken and egg problem here. Until there
> > > > > > is reasonable support in kernel (or it looks like there will
> > > > > > be), BIOS teams push back on a requirement to add the tables.
> > > > > > I'd encourage no one to bother with RASF - RAS2 is much less
> > > > > > ambiguous.  
> > > > >
> > > > > Here mainly to re-ping folks from Intel (Tony and Dave)  and AMD
> > > > > (Jon and Vilas) for your opinion on RAS2.
> > > > >  
> > > >
> > > > We'll need to know from vendors, ideally at minimum from both
> > > > Intel and AMD, whether RAS2 is the long-term vision here.  Nothing
> > > > is set in stone, of course, but deciding whether RAS2 is the
> > > > standard that we should be rallying around will help to guide
> > > > future development including in the kernel.
> > > >
> > > > If RAS2 is insufficient for future use cases or we would need to
> > > > support multiple implementations in the kernel for configuring the
> > > > patrol scrubber depending on vendor, that's great feedback to have.
> > > >
> > > > I'd much rather focus on implementing something in the kernel that
> > > > we have some clarity about the vendors supporting, especially when
> > > > it comes with user visible interfaces, as opposed to something
> > > > that may not be used long term.  I think that's a fair ask and
> > > > that vendor feedback is required here?  
> > >
> > > Agreed and happy to have feedback from Intel and AMD + all the other CPU vendors who make use of ACPI + all the OEMs who add stuff well beyond what Intel and AMD tell them to :)  I'll just note a lot of the ACPI support in the kernel covers stuff not used on mainstream x86 platforms because they are doing something custom and we didn't want 2 + X custom implementations...
> > >
> > > Some other interfaces for scrub control (beyond existing embedded ones) will surface in the next few months where RAS2 is not appropriate.
> > >
> > > Jonathan
> > >
> > >  
> >  
>
diff mbox series

Patch

diff --git a/Documentation/ABI/testing/sysfs-class-scrub-configure b/Documentation/ABI/testing/sysfs-class-scrub-configure
new file mode 100644
index 000000000000..347e2167dc62
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-scrub-configure
@@ -0,0 +1,82 @@ 
+What:		/sys/class/scrub/
+Date:		September 2023
+KernelVersion:	6.7
+Contact:	linux-kernel@vger.kernel.org
+Description:
+		The scrub/ class subdirectory belongs to the
+		scrubber subsystem.
+
+What:		/sys/class/scrub/scrubX/
+Date:		September 2023
+KernelVersion:	6.7
+Contact:	linux-kernel@vger.kernel.org
+Description:
+		The /sys/class/scrub/scrub{0,1,2,3,...} directories
+		correspond to each scrub device.
+
+What:		/sys/class/scrub/scrubX/name
+Date:		September 2023
+KernelVersion:	6.7
+Contact:	linux-kernel@vger.kernel.org
+Description:
+		(RO) name of the memory scrub device
+
+What:		/sys/class/scrub/scrubX/regionY/
+Date:		September 2023
+KernelVersion:	6.7
+Contact:	linux-kernel@vger.kernel.org
+Description:
+		The /sys/class/scrub/scrubX/region{0,1,2,3,...}
+		directories correspond to each scrub region under a scrub device.
+		Scrub region is a physical address range for which scrub may be
+		separately controlled. Regions may overlap in which case the
+		scrubbing rate of the overlapped memory will be at least that
+		expected due to each overlapping region.
+
+What:		/sys/class/scrub/scrubX/regionY/addr_base
+Date:		September 2023
+KernelVersion:	6.7
+Contact:	linux-kernel@vger.kernel.org
+Description:
+		(RW) The base of the address range of the memory region
+		to be patrol scrubbed.
+		On reading, returns the base of the memory region for
+		the actual address range(The platform calculates
+		the nearest patrol scrub boundary address from where
+		it can start scrubbing).
+
+What:		/sys/class/scrub/scrubX/regionY/addr_size
+Date:		September 2023
+KernelVersion:	6.7
+Contact:	linux-kernel@vger.kernel.org
+Description:
+		(RW) The size of the address range to be patrol scrubbed.
+		On reading, returns the size of the memory region for
+		the actual address range.
+
+What:		/sys/class/scrub/scrubX/regionY/enable
+Date:		September 2023
+KernelVersion:	6.7
+Contact:	linux-kernel@vger.kernel.org
+Description:
+		(WO) Start/Stop scrubbing the memory region.
+		1 - enable the memory scrubbing.
+		0 - disable the memory scrubbing..
+
+What:		/sys/class/scrub/scrubX/regionY/speed_available
+Date:		September 2023
+KernelVersion:	6.7
+Contact:	linux-kernel@vger.kernel.org
+Description:
+		(RO) Supported range for the partol speed(scrub rate)
+		by the scrubber for a memory region.
+		The unit of the scrub rate vary depends on the scrubber.
+
+What:		/sys/class/scrub/scrubX/regionY/speed
+Date:		September 2023
+KernelVersion:	6.7
+Contact:	linux-kernel@vger.kernel.org
+Description:
+		(RW) The partol speed(scrub rate) on the memory region specified and
+		it must be with in the supported range by the scrubber.
+		The unit of the scrub rate vary depends on the scrubber.