[0/3] cxl/core: Enable Region creation on x86 with Low Mem Hole

Message ID 20241122155226.2068287-1-fabio.m.de.francesco@linux.intel.com

Message

Fabio M. De Francesco Nov. 22, 2024, 3:51 p.m. UTC
The CXL Fixed Memory Window Structure (CFMWS) describes zero or more Host
Physical Address (HPA) windows that are associated with each CXL Host
Bridge. Each window represents a contiguous HPA range that may be
interleaved across one or more targets (CXL v3.1 - 9.18.1.3).

The x86 Low Memory Hole (LMH) is a range of low physical memory addresses
to which the system cannot send transactions. On such systems, the BIOS
publishes CFMWS entries that communicate the active System Physical
Address (SPA) ranges, which map to only a subset of the Host Physical
Address (HPA) ranges. The SPA range trims out the hole, and the endpoint
capacity in that hole is lost because there is no SPA for it to map to.

In the early stages of CXL region construction and attach on platforms
with Low Memory Holes, the driver fails and returns an error because it
expects the CXL endpoint decoder's range to be a subset of the root
decoder's.

Detect SPA/HPA misalignment and allow CXL region construction and attach
if and only if the misalignment is due to an x86 Low Memory Hole.

- Patch 1/3 changes the calling conventions of three match_*_by_range()
  helpers in preparation for patch 2/3.
- Patch 2/3 detects the x86 LMH and enables CXL region construction and
  attach by trimming HPA by SPA.
- Patch 3/3 simulates an LMH for running the CXL tests on the patched
  driver.

Many thanks to Alison, Dan, and Ira for their help and for their reviews
of my RFC on Intel's internal ML.

Fabio M. De Francesco (3):
  cxl/core: Change match_*_by_range() calling convention
  cxl/core: Enable Region creation on x86 with Low Memory Hole
  cxl/test: Simulate an x86 Low Memory Hole for tests

 drivers/cxl/Kconfig          |  5 +++
 drivers/cxl/core/Makefile    |  1 +
 drivers/cxl/core/lmh.c       | 58 ++++++++++++++++++++++++++
 drivers/cxl/core/region.c    | 80 ++++++++++++++++++++++++++++--------
 drivers/cxl/cxl.h            | 32 +++++++++++++++
 tools/testing/cxl/Kbuild     |  1 +
 tools/testing/cxl/test/cxl.c |  4 +-
 7 files changed, 161 insertions(+), 20 deletions(-)
 create mode 100644 drivers/cxl/core/lmh.c

Comments

Gregory Price Nov. 22, 2024, 7:46 p.m. UTC | #1
On Fri, Nov 22, 2024 at 04:51:51PM +0100, Fabio M. De Francesco wrote:
> The CXL Fixed Memory Window Structure (CFMWS) describes zero or more Host
> Physical Address (HPA) windows that are associated with each CXL Host
> Bridge. Each window represents a contiguous HPA that may be interleaved
> with one or more targets (CXL v3.1 - 9.18.1.3).
> 
> The Low Memory Hole (LMH) of x86 is a range of addresses of physical low
> memory to which systems cannot send transactions. On those systems, BIOS
> publishes CFMWS which communicate the active System Physical Address (SPA)
> ranges that map to a subset of the Host Physical Address (HPA) ranges. The
> SPA range trims out the hole, and capacity in the endpoint is lost with no
> SPA to map to CXL HPA in that hole.
> 
> In the early stages of CXL Regions construction and attach on platforms
> with Low Memory Holes, the driver fails and returns an error because it
> expects that the CXL Endpoint Decoder range is a subset of the Root
> Decoder's.
> 
> Then detect SPA/HPA misalignment and allow CXL Regions construction and 
> attach if and only if the misalignment is due to x86 Low Memory Holes.

+cc Robert Richter and Terry Bowman

This is not the only memory-hole possibility. We may need something
more robust, rather than optimizing for a single memory hole solution.

Robert and Terry may have some additional context here.

~Gregory

Alison Schofield Nov. 25, 2024, 10 p.m. UTC | #2
On Fri, Nov 22, 2024 at 04:51:51PM +0100, Fabio M. De Francesco wrote:
> The CXL Fixed Memory Window Structure (CFMWS) describes zero or more Host
> Physical Address (HPA) windows that are associated with each CXL Host
> Bridge. Each window represents a contiguous HPA that may be interleaved
> with one or more targets (CXL v3.1 - 9.18.1.3).
> 
> The Low Memory Hole (LMH) of x86 is a range of addresses of physical low
> memory to which systems cannot send transactions. On those systems, BIOS
> publishes CFMWS which communicate the active System Physical Address (SPA)
> ranges that map to a subset of the Host Physical Address (HPA) ranges. The
> SPA range trims out the hole, and capacity in the endpoint is lost with no
> SPA to map to CXL HPA in that hole.
> 
> In the early stages of CXL Regions construction and attach on platforms
> with Low Memory Holes, the driver fails and returns an error because it
> expects that the CXL Endpoint Decoder range is a subset of the Root
> Decoder's.
> 
> Then detect SPA/HPA misalignment and allow CXL Regions construction and 
> attach if and only if the misalignment is due to x86 Low Memory Holes.
> 

Hi Fabio,

I took this for a test drive in cxl-test - thanks for that patch!

Here are a couple of observations on what users will see. Just stirring
the pot here, not knowing whether there is, or even needs to be, an
explanation to userspace about the LMH.

1) Users will see that the endpoint decoders intend to map more than the
root decoder. Users may question their trimmed region size.

2) At least in this example, I don't think users can re-create this
region in place, i.e. via hotplug. Once this region is destroyed, we
default to creating a smaller, aligned region in its place.

cxl-cli output is appended showing the auto-created region and its
decoders, and then the creation of a user-requested region, not exactly
in its place.


Upon load of cxl-test:

# cxl list -r region0 --decoders -u
[
  {
    "root decoders":[
      {
        "decoder":"decoder0.0",
        "resource":"0xf010000000",
        "size":"768.00 MiB (805.31 MB)",
        "interleave_ways":1,
        "max_available_extent":0,
        "volatile_capable":true,
        "qos_class":42,
        "nr_targets":1
      }
    ]
  },
  {
    "port decoders":[
      {
        "decoder":"decoder1.0",
        "resource":"0xf010000000",
        "size":"1024.00 MiB (1073.74 MB)",
        "interleave_ways":1,
        "region":"region0",
        "nr_targets":1
      },
      {
        "decoder":"decoder6.0",
        "resource":"0xf010000000",
        "size":"1024.00 MiB (1073.74 MB)",
        "interleave_ways":2,
        "interleave_granularity":4096,
        "region":"region0",
        "nr_targets":2
      }
    ]
  },
  {
    "endpoint decoders":[
      {
        "decoder":"decoder10.0",
        "resource":"0xf010000000",
        "size":"1024.00 MiB (1073.74 MB)",
        "interleave_ways":2,
        "interleave_granularity":4096,
        "region":"region0",
        "dpa_resource":"0",
        "dpa_size":"512.00 MiB (536.87 MB)",
        "mode":"ram"
      },
      {
        "decoder":"decoder13.0",
        "resource":"0xf010000000",
        "size":"1024.00 MiB (1073.74 MB)",
        "interleave_ways":2,
        "interleave_granularity":4096,
        "region":"region0",
        "dpa_resource":"0",
        "dpa_size":"512.00 MiB (536.87 MB)",
        "mode":"ram"
      }
    ]
  }
]

After destroying the auto region, the root decoder shows 768 MiB available:

# cxl list -d decoder0.0 -u
{
  "decoder":"decoder0.0",
  "resource":"0xf010000000",
  "size":"768.00 MiB (805.31 MB)",
  "interleave_ways":1,
  "max_available_extent":"768.00 MiB (805.31 MB)",
  "volatile_capable":true,
  "qos_class":42,
  "nr_targets":1
}


# cxl create-region -d decoder0.0 -m mem5 mem4
{
  "region":"region0",
  "resource":"0xf010000000",
  "size":"512.00 MiB (536.87 MB)",
  "type":"ram",
  "interleave_ways":2,
  "interleave_granularity":256,
  "decode_state":"commit",

snip

# cxl list -r region0 --decoders -u
[
  {
    "root decoders":[
      {
        "decoder":"decoder0.0",
        "resource":"0xf010000000",
        "size":"768.00 MiB (805.31 MB)",
        "interleave_ways":1,
        "max_available_extent":"256.00 MiB (268.44 MB)",
        "volatile_capable":true,
        "qos_class":42,
        "nr_targets":1
      }
    ]
  },
  {
    "port decoders":[
      {
        "decoder":"decoder1.0",
        "resource":"0xf010000000",
        "size":"512.00 MiB (536.87 MB)",
        "interleave_ways":1,
        "region":"region0",
        "nr_targets":1
      },
      {
        "decoder":"decoder6.0",
        "resource":"0xf010000000",
        "size":"512.00 MiB (536.87 MB)",
        "interleave_ways":2,
        "interleave_granularity":256,
        "region":"region0",
        "nr_targets":2
      }
    ]
  },
  {
    "endpoint decoders":[
      {
        "decoder":"decoder10.0",
        "resource":"0xf010000000",
        "size":"512.00 MiB (536.87 MB)",
        "interleave_ways":2,
        "interleave_granularity":256,
        "region":"region0",
        "dpa_resource":"0",
        "dpa_size":"256.00 MiB (268.44 MB)",
        "mode":"ram"
      },
      {
        "decoder":"decoder13.0",
        "resource":"0xf010000000",
        "size":"512.00 MiB (536.87 MB)",
        "interleave_ways":2,
        "interleave_granularity":256,
        "region":"region0",
        "dpa_resource":"0",
        "dpa_size":"256.00 MiB (268.44 MB)",
        "mode":"ram"
      }
    ]
  }
]