[v2,0/5] Address translation for HDM decoding

Message ID	20240701174754.967954-1-rrichter@amd.com
Headers	show Received: from NAM12-DM6-obe.outbound.protection.outlook.com (mail-dm6nam12on2087.outbound.protection.outlook.com [40.107.243.87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B5ABD16E88E; Mon, 1 Jul 2024 17:48:19 +0000 (UTC) Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; pr=C From: Robert Richter <rrichter@amd.com> To: Alison Schofield <alison.schofield@intel.com>, Vishal Verma <vishal.l.verma@intel.com>, Ira Weiny <ira.weiny@intel.com>, Dan Williams <dan.j.williams@intel.com>, Jonathan Cameron <jonathan.cameron@huawei.com>, Dave Jiang <dave.jiang@intel.com>, Davidlohr Bueso <dave@stgolabs.net> CC: <linux-cxl@vger.kernel.org>, <linux-kernel@vger.kernel.org>, "Robert Richter" <rrichter@amd.com> Subject: [PATCH v2 0/5] Address translation for HDM decoding Date: Mon, 1 Jul 2024 19:47:48 +0200 Message-ID: <20240701174754.967954-1-rrichter@amd.com> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain
Series	Address translation for HDM decoding \| expand [v2,0/5] Address translation for HDM decoding [v2,1/5] cxl/hdm: Moving HDM specific code to core/hdm.c. [v2,2/5] cxl/hdm: Implement address translation for HDM decoding [v2,3/5] cxl/acpi: Add platform flag for HPA address translation [v2,4/5] cxl/hdm: Setup HPA base for address translation using the HPA window in CFMWS [v2,5/5] cxl/acpi: Enable address translation for Zen4 platforms

Robert Richter July 1, 2024, 5:47 p.m. UTC

Default expectation of Linux is that HPA == SPA, which means that
hardware addresses in the decoders are the same as the kernel sees
them. However, there are platforms where this is not the case and an
address translation between decoder's (HPA) and the system's physical
addresses (SPA) is needed.

This series implements address translation for HDM decoding. The
implementation follows the rule that the representation of hardware
address ranges in the kernel are all SPA. If decoder registers (HDM
decoder cap or register range) are not SPA, a base offset must be
applied. Translation happens when accessing the registers back and
forth. After a read access an address will be converted to SPA and
before a write access the programmed address is translated from an
SPA. The decoder register access can be easily encapsulated by address
translation and thus there are only a few places where translation is
needed and the code must be changed. This is implemented in patch #2,
patch #1 is a prerequisite.

Address translation is restricted to platforms that need it. As such a
platform check is needed and a flag is introduced for this (patch #3).

For address translation the base offset must be determined for the
memory domain. Depending on the platform there are various options for
this. The address range in the CEDT's CFWMS entry of the CXL host
bridge can be used to determine the decoder's base address (patch
#4). This is enabled for AMD Zen4 platforms (patch #5).

Changelog:

v2:
 * Fixed build error for other archs [kbot]


Robert Richter (5):
  cxl/hdm: Moving HDM specific code to core/hdm.c.
  cxl/hdm: Implement address translation for HDM decoding
  cxl/acpi: Add platform flag for HPA address translation
  cxl/hdm: Setup HPA base for address translation using the HPA window
    in CFMWS
  cxl/acpi: Enable address translation for Zen4 platforms

 drivers/cxl/acpi.c      |  16 +++
 drivers/cxl/core/core.h |   2 +
 drivers/cxl/core/hdm.c  | 245 ++++++++++++++++++++++++++++++++++++++--
 drivers/cxl/core/pci.c  | 119 +------------------
 drivers/cxl/cxl.h       |   2 +
 drivers/cxl/cxlmem.h    |   4 +
 drivers/cxl/cxlpci.h    |   3 -
 7 files changed, 262 insertions(+), 129 deletions(-)


base-commit: 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0

Alison Schofield July 11, 2024, 12:02 a.m. UTC | #1

On Mon, Jul 01, 2024 at 07:47:48PM +0200, Robert Richter wrote:
> Default expectation of Linux is that HPA == SPA, which means that
> hardware addresses in the decoders are the same as the kernel sees
> them. However, there are platforms where this is not the case and an
> address translation between decoder's (HPA) and the system's physical
> addresses (SPA) is needed.
> 
> This series implements address translation for HDM decoding. The
> implementation follows the rule that the representation of hardware
> address ranges in the kernel are all SPA. If decoder registers (HDM
> decoder cap or register range) are not SPA, a base offset must be
> applied. Translation happens when accessing the registers back and
> forth. After a read access an address will be converted to SPA and
> before a write access the programmed address is translated from an
> SPA. The decoder register access can be easily encapsulated by address
> translation and thus there are only a few places where translation is
> needed and the code must be changed. This is implemented in patch #2,
> patch #1 is a prerequisite.
> 
> Address translation is restricted to platforms that need it. As such a
> platform check is needed and a flag is introduced for this (patch #3).
> 
> For address translation the base offset must be determined for the
> memory domain. Depending on the platform there are various options for
> this. The address range in the CEDT's CFWMS entry of the CXL host
> bridge can be used to determine the decoder's base address (patch
> #4). This is enabled for AMD Zen4 platforms (patch #5).


Hi Robert,

This HPA->SPA work needs to be done for addresses reported directly by
devices, ie DPAs in poison and other events - right?

For the XOR case, we discover the need for HPA->SPA while parsing the
CFMWS and add a cxl_hpa_to_spa_fn to the struct cxl_root_decoder. Later,
when the driver wants to translate a DPA (to include in a TRACE_EVENT)
it uses that 'extra mile' HPA->SPA function.

See Patch 2 in this series:
https://lore.kernel.org/cover.1719980933.git.alison.schofield@intel.com/T/#m9206e1f872ef252dbb54ce7f0365f0b267179fda

It seems the Zen4 extra mile is a simple offset from the base calc.
Do you think a zen4 hpa->spa function will fit in with what I've done?

FWIW I took this code for a spin through cxl-test on it's own and combined
w the xor address tranlation patch set and no collisions, all humming along
nicely (for non-zen config).

--Alison

> 
snip
>

Alison Schofield July 11, 2024, 12:26 a.m. UTC | #2

On Wed, Jul 10, 2024 at 05:02:46PM -0700, Alison Schofield wrote:
> On Mon, Jul 01, 2024 at 07:47:48PM +0200, Robert Richter wrote:
> > Default expectation of Linux is that HPA == SPA, which means that
> > hardware addresses in the decoders are the same as the kernel sees
> > them. However, there are platforms where this is not the case and an
> > address translation between decoder's (HPA) and the system's physical
> > addresses (SPA) is needed.
> > 
> > This series implements address translation for HDM decoding. The
> > implementation follows the rule that the representation of hardware
> > address ranges in the kernel are all SPA. If decoder registers (HDM
> > decoder cap or register range) are not SPA, a base offset must be
> > applied. Translation happens when accessing the registers back and
> > forth. After a read access an address will be converted to SPA and
> > before a write access the programmed address is translated from an
> > SPA. The decoder register access can be easily encapsulated by address
> > translation and thus there are only a few places where translation is
> > needed and the code must be changed. This is implemented in patch #2,
> > patch #1 is a prerequisite.
> > 
> > Address translation is restricted to platforms that need it. As such a
> > platform check is needed and a flag is introduced for this (patch #3).
> > 
> > For address translation the base offset must be determined for the
> > memory domain. Depending on the platform there are various options for
> > this. The address range in the CEDT's CFWMS entry of the CXL host
> > bridge can be used to determine the decoder's base address (patch
> > #4). This is enabled for AMD Zen4 platforms (patch #5).
> 
> 
> Hi Robert,
> 
> This HPA->SPA work needs to be done for addresses reported directly by
> devices, ie DPAs in poison and other events - right?
> 
> For the XOR case, we discover the need for HPA->SPA while parsing the
> CFMWS and add a cxl_hpa_to_spa_fn to the struct cxl_root_decoder. Later,
> when the driver wants to translate a DPA (to include in a TRACE_EVENT)
> it uses that 'extra mile' HPA->SPA function.
> 
> See Patch 2 in this series:
> https://lore.kernel.org/cover.1719980933.git.alison.schofield@intel.com/T/#m9206e1f872ef252dbb54ce7f0365f0b267179fda

Better link:
https://lore.kernel.org/linux-cxl/cover.1719980933.git.alison.schofield@intel.com/

> 
> It seems the Zen4 extra mile is a simple offset from the base calc.
> Do you think a zen4 hpa->spa function will fit in with what I've done?
> 
> FWIW I took this code for a spin through cxl-test on it's own and combined
> w the xor address tranlation patch set and no collisions, all humming along
> nicely (for non-zen config).
> 
> --Alison
> 
> > 
> snip
> > 
>

Robert Richter July 11, 2024, 7:03 p.m. UTC | #3

Alison,

On 10.07.24 17:02:46, Alison Schofield wrote:
> This HPA->SPA work needs to be done for addresses reported directly by
> devices, ie DPAs in poison and other events - right?

we need it to translate the base address of the HDM decoder cap
registers or the range registers to an HPA when accessing it.

> For the XOR case, we discover the need for HPA->SPA while parsing the
> CFMWS and add a cxl_hpa_to_spa_fn to the struct cxl_root_decoder. Later,

As some platforms allow an even more fine grained HPA/SPA mapping that
may devide a memory domain in separate SPA ranges we might just want
to move it directly into struct cxl_decoder so that endpoints or
switches can use it too.

> when the driver wants to translate a DPA (to include in a TRACE_EVENT)
> it uses that 'extra mile' HPA->SPA function.
> 
> See Patch 2 in this series:
> https://lore.kernel.org/cover.1719980933.git.alison.schofield@intel.com/T/#m9206e1f872ef252dbb54ce7f0365f0b267179fda
> 
> It seems the Zen4 extra mile is a simple offset from the base calc.
> Do you think a zen4 hpa->spa function will fit in with what I've done?

Thanks for the pointer, I will take look into the details here and try
to reuse most of that infrastructure.

> 
> FWIW I took this code for a spin through cxl-test on it's own and combined
> w the xor address tranlation patch set and no collisions, all humming along
> nicely (for non-zen config).

Thanks,

-Robert

Gregory Price July 25, 2024, 10 p.m. UTC | #4

On Mon, Jul 01, 2024 at 07:47:48PM +0200, Robert Richter wrote:
> Default expectation of Linux is that HPA == SPA, which means that
> hardware addresses in the decoders are the same as the kernel sees
> them. However, there are platforms where this is not the case and an
> address translation between decoder's (HPA) and the system's physical
> addresses (SPA) is needed.
> 
> This series implements address translation for HDM decoding. The
> implementation follows the rule that the representation of hardware
> address ranges in the kernel are all SPA. If decoder registers (HDM
> decoder cap or register range) are not SPA, a base offset must be
> applied. Translation happens when accessing the registers back and
> forth. After a read access an address will be converted to SPA and
> before a write access the programmed address is translated from an
> SPA. The decoder register access can be easily encapsulated by address
> translation and thus there are only a few places where translation is
> needed and the code must be changed. This is implemented in patch #2,
> patch #1 is a prerequisite.
> 
> Address translation is restricted to platforms that need it. As such a
> platform check is needed and a flag is introduced for this (patch #3).
> 
> For address translation the base offset must be determined for the
> memory domain. Depending on the platform there are various options for
> this. The address range in the CEDT's CFWMS entry of the CXL host
> bridge can be used to determine the decoder's base address (patch
> #4). This is enabled for AMD Zen4 platforms (patch #5).
> 
> Changelog:
> 
> v2:
>  * Fixed build error for other archs [kbot]
> 

Hi Robert,

I'm looking to test this patch series but saw you were looking at
reworking a portion of it.  Just wanted to inquire as to whether
you think I should wait for a v3 given this is a few weeks old now.

Thanks!
~Gregory

Robert Richter Aug. 6, 2024, 10:54 a.m. UTC | #5

On 25.07.24 18:00:00, Gregory Price wrote:
> On Mon, Jul 01, 2024 at 07:47:48PM +0200, Robert Richter wrote:
> > Default expectation of Linux is that HPA == SPA, which means that
> > hardware addresses in the decoders are the same as the kernel sees
> > them. However, there are platforms where this is not the case and an
> > address translation between decoder's (HPA) and the system's physical
> > addresses (SPA) is needed.
> > 
> > This series implements address translation for HDM decoding. The
> > implementation follows the rule that the representation of hardware
> > address ranges in the kernel are all SPA. If decoder registers (HDM
> > decoder cap or register range) are not SPA, a base offset must be
> > applied. Translation happens when accessing the registers back and
> > forth. After a read access an address will be converted to SPA and
> > before a write access the programmed address is translated from an
> > SPA. The decoder register access can be easily encapsulated by address
> > translation and thus there are only a few places where translation is
> > needed and the code must be changed. This is implemented in patch #2,
> > patch #1 is a prerequisite.
> > 
> > Address translation is restricted to platforms that need it. As such a
> > platform check is needed and a flag is introduced for this (patch #3).
> > 
> > For address translation the base offset must be determined for the
> > memory domain. Depending on the platform there are various options for
> > this. The address range in the CEDT's CFWMS entry of the CXL host
> > bridge can be used to determine the decoder's base address (patch
> > #4). This is enabled for AMD Zen4 platforms (patch #5).
> > 
> > Changelog:
> > 
> > v2:
> >  * Fixed build error for other archs [kbot]
> > 
> 
> Hi Robert,
> 
> I'm looking to test this patch series but saw you were looking at
> reworking a portion of it.  Just wanted to inquire as to whether
> you think I should wait for a v3 given this is a few weeks old now.

Yes, please wait with it.

Thanks,

-Robert

Gregory Price Aug. 16, 2024, 6:32 p.m. UTC | #6

On Mon, Jul 01, 2024 at 07:47:48PM +0200, Robert Richter wrote:
> Default expectation of Linux is that HPA == SPA, which means that
> hardware addresses in the decoders are the same as the kernel sees
> them. However, there are platforms where this is not the case and an
> address translation between decoder's (HPA) and the system's physical
> addresses (SPA) is needed.
> 
> This series implements address translation for HDM decoding. The
> implementation follows the rule that the representation of hardware
> address ranges in the kernel are all SPA. If decoder registers (HDM
> decoder cap or register range) are not SPA, a base offset must be
> applied. Translation happens when accessing the registers back and
> forth. After a read access an address will be converted to SPA and
> before a write access the programmed address is translated from an
> SPA. The decoder register access can be easily encapsulated by address
> translation and thus there are only a few places where translation is
> needed and the code must be changed. This is implemented in patch #2,
> patch #1 is a prerequisite.
> 
> Address translation is restricted to platforms that need it. As such a
> platform check is needed and a flag is introduced for this (patch #3).
> 
> For address translation the base offset must be determined for the
> memory domain. Depending on the platform there are various options for
> this. The address range in the CEDT's CFWMS entry of the CXL host
> bridge can be used to determine the decoder's base address (patch
> #4). This is enabled for AMD Zen4 platforms (patch #5).
> 

Just testing this out, and I'm seeing an inability to actually map the
memory, though the reason escapes me.

Do you have the expected workflow of this down?

For example this fails:

echo ram > /sys/bus/cxl/devices/decoder2.0/mode
echo 0x40000000 > /sys/bus/cxl/devices/decoder2.0/dpa_size
echo region0 > /sys/bus/cxl/devices/decoder0.0/create_ram_region
echo 4096 > /sys/bus/cxl/devices/region0/interleave_granularity
echo 1 > /sys/bus/cxl/devices/region0/interleave_ways
echo 0x40000000 > /sys/bus/cxl/devices/region0/size
echo decoder2.0 > /sys/bus/cxl/devices/region0/target0
^^^^ at this point: -bash: echo: write error: Invalid argument

echo 1 > /sys/bus/cxl/devices/region0/commit

and while the cxl driver sees the correct topology, the current version
of cxl-cli reports the memdevs as "anonymous" and reports a failure to
lookup the targets for a region

without adding too much bulk to the email

[~/ndctl]# ./build/cxl/cxl list -vvvvv
libcxl: cxl_mappings_init: region0 target0:  lookup failure
[
  {
    "anon memdevs":[
      {
        "memdev":"mem0",
        "ram_size":137438953472,
...
      {
        "memdev":"mem1",
        "ram_size":137438953472,
...
  {
    "buses":[
      {
...
        "decoders:root0":[
          {
            "decoder":"decoder0.0",
            "resource":825975898112,
            "size":260382392320,
            "interleave_ways":1,
            "max_available_extent":260382392320,
            "volatile_capable":true,
            "qos_class":1,
            "nr_targets":1,
            "targets":[
              {
                "target":"pci0000:00",
                "alias":"ACPI0016:01",
                "position":0,
                "id":0
              }
            ],
            "regions:decoder0.0":[
              {
                "region":"region0",
                "resource":825975898112,
                "size":260382392320,
                "type":"ram",
                "interleave_ways":1,
                "interleave_granularity":256,
                "decode_state":"reset",
                "state":"disabled",
                "mappings":[
                ]
              }
...

Do you have a sense of what might generate this behavior?

~Gregory

> Changelog:
> 
> v2:
>  * Fixed build error for other archs [kbot]
> 
> 
> Robert Richter (5):
>   cxl/hdm: Moving HDM specific code to core/hdm.c.
>   cxl/hdm: Implement address translation for HDM decoding
>   cxl/acpi: Add platform flag for HPA address translation
>   cxl/hdm: Setup HPA base for address translation using the HPA window
>     in CFMWS
>   cxl/acpi: Enable address translation for Zen4 platforms
> 
>  drivers/cxl/acpi.c      |  16 +++
>  drivers/cxl/core/core.h |   2 +
>  drivers/cxl/core/hdm.c  | 245 ++++++++++++++++++++++++++++++++++++++--
>  drivers/cxl/core/pci.c  | 119 +------------------
>  drivers/cxl/cxl.h       |   2 +
>  drivers/cxl/cxlmem.h    |   4 +
>  drivers/cxl/cxlpci.h    |   3 -
>  7 files changed, 262 insertions(+), 129 deletions(-)
> 
> 
> base-commit: 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0
> -- 
> 2.39.2
>

[v2,0/5] Address translation for HDM decoding

Message

Comments