
[00/15] cxl: Add support for Restricted CXL hosts (RCD mode)

Message ID 20220831081603.3415-1-rrichter@amd.com

Message

Robert Richter Aug. 31, 2022, 8:15 a.m. UTC
In Restricted CXL Device (RCD) mode (formerly referred to as CXL 1.1)
the PCIe enumeration hierarchy is different from CXL VH enumeration
(formerly referred to as CXL 2.0; for both modes see CXL spec 3.0, 9.11
and 9.12 [1]). This series adds support for RCD mode. It implements
the detection of Restricted CXL Hosts (RCHs) and their corresponding
Restricted CXL Devices (RCDs), does the necessary enumeration of
ports, and connects the endpoints. With all the plumbing in place, an
RCH/RCD pair is registered with the Linux CXL bus and becomes visible
in sysfs in the same way CXL VH hosts and devices already do. RCDs are
brought up as CXL endpoints and bound to subsequent drivers such as
cxl_mem.

For CXL VH, the host driver (cxl_acpi) starts host bridge discovery
once the ACPI0017 CXL root device is detected, and then searches for
ACPI0016 host bridges to enable CXL. In RCD mode, an ACPI0017 device
might not exist, the host bridge may carry the standard PCIe host
bridge ID PNP0A08, and no CXL ports or switches are visible in the
PCIe hierarchy. As such, RCD mode enumeration and host discovery are
very different from CXL VH. See patch #5 for implementation details.

This implementation expects firmware to report the base address of the
host's downstream and upstream port RCRBs through the optional CEDT
CHBS entry of the host bridge (see CXL spec 3.0, 9.17.1.2).
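The address arithmetic implied here can be sketched as follows. This is a hypothetical, standalone illustration rather than code from the series: the macro and function names are made up. It assumes, per my reading of the CXL spec, that the CHBS Base field of a CXL 1.1 host points at the downstream port RCRB, that the upstream port RCRB follows 4 KiB later, and that the component registers are located through MEMBAR0 at the standard BAR0/BAR1 offsets of an RCRB.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed CXL 1.1 RCRB layout: each RCRB is 4 KiB, upstream follows downstream. */
#define RCRB_SIZE          0x1000u
#define RCRB_MEMBAR0_LO    0x10u	/* same offset as PCI BAR0 */
#define RCRB_MEMBAR0_HI    0x14u	/* same offset as PCI BAR1 */

/* Standard PCI BAR flag bits. */
#define BAR_SPACE_IO       0x1u
#define BAR_MEM_TYPE_MASK  0x6u
#define BAR_MEM_TYPE_1M    0x2u	/* legacy below-1M memory type */
#define BAR_MEM_TYPE_64    0x4u
#define BAR_MEM_ADDR_MASK  (~(uint64_t)0xf)

enum rcrb_port { RCRB_DOWNSTREAM, RCRB_UPSTREAM };

/* Physical address of the RCRB for @port, given the CHBS Base value. */
static uint64_t rcrb_base(uint64_t chbs_base, enum rcrb_port port)
{
	return port == RCRB_UPSTREAM ? chbs_base + RCRB_SIZE : chbs_base;
}

/*
 * Decode the component register base from the two MEMBAR0 dwords read
 * from an RCRB. Rejects I/O BARs and the legacy below-1M memory type;
 * the high dword is only used for 64-bit memory BARs.
 */
static bool membar0_to_component_base(uint32_t lo, uint32_t hi, uint64_t *out)
{
	uint64_t base;

	if (lo & BAR_SPACE_IO)
		return false;
	if ((lo & BAR_MEM_TYPE_MASK) == BAR_MEM_TYPE_1M)
		return false;

	base = (uint64_t)lo & BAR_MEM_ADDR_MASK;
	if ((lo & BAR_MEM_TYPE_MASK) == BAR_MEM_TYPE_64)
		base |= (uint64_t)hi << 32;

	*out = base;
	return true;
}
```

In a kernel driver the two MEMBAR0 dwords would be read from the mapped RCRB; here they are plain parameters so the decoding logic can be checked on its own.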

RCD mode does not support hot-plug, so host discovery happens at boot
time only.

Patches #1 to #4 are prerequisites for the series: required fixes and
a rework of the debug messages for port enumeration. They are general
patches and could be applied earlier and independently of the rest,
assuming there are no objections to them. Patches #5 to #15 contain
the actual implementation of RCD mode support.

[1] https://www.computeexpresslink.org/spec-landing

Robert Richter (15):
  cxl/core: Remove duplicate declaration of devm_cxl_iomap_block()
  cxl/core: Check physical address before mapping it in
    devm_cxl_iomap_block()
  cxl: Unify debug messages when calling devm_cxl_add_port()
  cxl: Unify debug messages when calling devm_cxl_add_dport()
  cxl/acpi: Add probe function to detect restricted CXL hosts in RCD
    mode
  PCI/ACPI: Link host bridge to its ACPI fw node
  cxl/acpi: Check RCH's PCIe Host Bridge ACPI ID
  cxl/acpi: Check RCH's CXL DVSEC capabilities
  cxl/acpi: Determine PCI host bridge's ACPI UID
  cxl/acpi: Extract the RCH's RCRB base address from CEDT
  cxl/acpi: Extract the host's component register base address from RCRB
  cxl/acpi: Skip devm_cxl_port_enumerate_dports() when in RCD mode
  cxl/acpi: Rework devm_cxl_enumerate_ports() to support RCD mode
  cxl/acpi: Enumerate ports in RCD mode to enable RCHs and RCDs
  cxl/acpi: Specify module load order dependency for the cxl_acpi module

 drivers/acpi/pci_root.c      |   1 +
 drivers/cxl/acpi.c           | 311 ++++++++++++++++++++++++++++++++++-
 drivers/cxl/core/pci.c       |  22 ++-
 drivers/cxl/core/port.c      | 103 ++++++++----
 drivers/cxl/core/regs.c      |   3 +
 drivers/cxl/cxl.h            |   2 -
 drivers/cxl/mem.c            |   1 +
 tools/testing/cxl/test/cxl.c |   8 +-
 8 files changed, 400 insertions(+), 51 deletions(-)

Comments

Jonathan Cameron Aug. 31, 2022, 12:23 p.m. UTC | #1
On Wed, 31 Aug 2022 10:15:48 +0200
Robert Richter <rrichter@amd.com> wrote:

> In Restricted CXL Device (RCD) mode (formerly referred to as CXL 1.1)
> the PCIe enumeration hierarchy is different from CXL VH Enumeration
> (formerly referred to as 2.0, for both modes see CXL spec 3.0: 9.11
> and 9.12, [1]). This series adds support for RCD mode. It implements
> the detection of Restricted CXL Hosts (RCHs) and its corresponding
> Restricted CXL Devices (RCDs). It does the necessary enumeration of
> ports and connects the endpoints. With all the plumbing an RCH/RCD
> pair is registered at the Linux CXL bus and becomes visible in sysfs
> in the same way as CXL VH hosts and devices do already. RCDs are
> brought up as CXL endpoints and bound to subsequent drivers such as
> cxl_mem.
> 
> For CXL VH the host driver (cxl_acpi) starts host bridge discovery
> once the ACPI0017 CXL root device is detected and then searches for
> ACPI0016 host bridges to enable CXL. In RCD mode an ACPI0017 device
> might not necessarily exist and the host bridge can have a standard
> PCIe host bridge PNP0A08 ID, there aren't any CXL port or switches in
> the PCIe hierarchy visible. As such the RCD mode enumeration and host
> discovery is very different from CXL VH. See patch #5 for
> implementation details.
> 
> This implementation expects the host's downstream and upstream port
> RCRBs base address being reported by firmware using the optional CEDT
> CHBS entry of the host bridge (see CXL spec 3.0, 9.17.1.2).
> 
> RCD mode does not support hot-plug, so host discovery is at boot time
> only.
> 
> Patches #1 to #4 are prerequisites of the series with fixes needed and
> a rework of debug messages for port enumeration. Those are general
> patches and could be applied earlier and independently from the rest
> assuming there are no objections with them. Patches #5 to #15 contain
> the actual implementation of RCD mode support.
> 
> [1] https://www.computeexpresslink.org/spec-landing

Hi Robert,

I'm curious about the aims of this work. Given that the expectation for
RCDs is often that host firmware has set them up before the OS loads, what
functionality do you want to gain by mapping them into the CXL 2.0+ focused
infrastructure?

When I did some analysis a while back on CXL 1.1, I was pretty much assuming
that there was no real reason to let the OS know about it, because it
couldn't do anything much of use with the information. There are some
corners, like RAS, where it might be useful, or perhaps for enabling some of
the CXL 3.0 features that are allowed to be EP-only and so could be relevant
for an older host (e.g. CPMUs).

With my QEMU hat on, I wasn't planning to bother with anything pre-2.0,
but it might be worth considering so that we can exercise this code...

Jonathan


Robert Richter Sept. 1, 2022, 8:19 a.m. UTC | #2
Jonathan,

On 31.08.22 13:23:29, Jonathan Cameron wrote:
> On Wed, 31 Aug 2022 10:15:48 +0200
> Robert Richter <rrichter@amd.com> wrote:
> 
> Hi Robert,
> 
> I'm curious on the aims of this work.  Given expectation for RCDs is often
> that the host firmware has set them up before the OS loads, what functionality
> do you want to gain by mapping these into the CXL 2.0+ focused infrastructure?
> 
> When I did some analysis a while back on CXL 1.1 I was pretty much assuming
> that there was no real reason to let the OS know about it because it
> couldn't do much of any use with the information.  There are some corners
> like RAS where it might be useful or perhaps to enable some of the CXL 3.0
> features that are allowed to be EP only and so could be relevant for
> an older host (e.g. CPMUs).

Though CXL RCD works with a legacy kernel or one without any CXL
functionality added, a CXL-aware kernel can be useful for RCD mode
too. RAS is one topic here, but so is gathering device information
such as status or topology: everything that requires access to the
component register block or the mailbox interface.

Another plus: the CXL hierarchy for RCD mode becomes visible in sysfs
and in the device hierarchy.

Reusing the existing infrastructure for this makes sense. Many
features overlap between both modes (e.g. RAS, again the mailbox, or
topology information).

Thanks again for your review.

-Robert

> 
> With my QEMU hat on I wasn't planning to bother with anything pre 2.0
> but it might be worth considering to let us exercise this code...
> 
> Jonathan
Dan Williams Sept. 8, 2022, 5:43 a.m. UTC | #3
Apologies for the delay in getting to this. I had hoped to finish up
some other DAX work so I could focus on this, but time is getting
short, so I will need to do both in parallel.

Robert Richter wrote:
> In Restricted CXL Device (RCD) mode (formerly referred to as CXL 1.1)
> the PCIe enumeration hierarchy is different from CXL VH Enumeration
> (formerly referred to as 2.0, for both modes see CXL spec 3.0: 9.11
> and 9.12, [1]). This series adds support for RCD mode. It implements
> the detection of Restricted CXL Hosts (RCHs) and its corresponding
> Restricted CXL Devices (RCDs). It does the necessary enumeration of
> ports and connects the endpoints. With all the plumbing an RCH/RCD
> pair is registered at the Linux CXL bus and becomes visible in sysfs
> in the same way as CXL VH hosts and devices do already. RCDs are
> brought up as CXL endpoints and bound to subsequent drivers such as
> cxl_mem.
> 
> For CXL VH the host driver (cxl_acpi) starts host bridge discovery
> once the ACPI0017 CXL root device is detected and then searches for
> ACPI0016 host bridges to enable CXL. In RCD mode an ACPI0017 device
> might not necessarily exist 

That's a broken BIOS as far as I can see. No ACPI0017 == no OS CXL
services and the CXL aspects of the device need to be 100% managed by
the BIOS. You can still run the cxl_pci driver in that case for mailbox
operation, but error handling must be firmware-first without ACPI0017.

> PCIe host bridge PNP0A08 ID, there aren't any CXL port or switches in
> the PCIe hierarchy visible. As such the RCD mode enumeration and host
> discovery is very different from CXL VH. See patch #5 for
> implementation details.
> 
> This implementation expects the host's downstream and upstream port
> RCRBs base address being reported by firmware using the optional CEDT
> CHBS entry of the host bridge (see CXL spec 3.0, 9.17.1.2).
> 
> RCD mode does not support hot-plug, so host discovery is at boot time
> only.
> 
> Patches #1 to #4 are prerequisites of the series with fixes needed and
> a rework of debug messages for port enumeration. Those are general
> patches and could be applied earlier and independently from the rest
> assuming there are no objections with them. Patches #5 to #15 contain
> the actual implementation of RCD mode support.
> 
> [1] https://www.computeexpresslink.org/spec-landing
Dan Williams Sept. 8, 2022, 6:41 a.m. UTC | #4
Robert Richter wrote:
> Jonathan,
> 
> On 31.08.22 13:23:29, Jonathan Cameron wrote:
> > On Wed, 31 Aug 2022 10:15:48 +0200
> > Robert Richter <rrichter@amd.com> wrote:
> > 
> > Hi Robert,
> > 
> > I'm curious on the aims of this work.  Given expectation for RCDs is often
> > that the host firmware has set them up before the OS loads, what functionality
> > do you want to gain by mapping these into the CXL 2.0+ focused infrastructure?
> > 
> > When I did some analysis a while back on CXL 1.1 I was pretty much assuming
> > that there was no real reason to let the OS know about it because it
> > couldn't do much of any use with the information.  There are some corners
> > like RAS where it might be useful or perhaps to enable some of the CXL 3.0
> > features that are allowed to be EP only and so could be relevant for
> > an older host (e.g. CPMUs).
> 
> though CXL RCD works with a legacy kernel or without any CXL
> functionality added, a CXL aware kernel can be useful also for RCD
> mode. RAS is a topic here but also gathering device information such
> as status or topology. Everything where access to the component
> register block or mailbox interface is required.

Unless the BIOS is going to actively enable the standard CXL topology
with ACPI0017, I think the OS should keep its hands off. The maintenance
burden of some of the hacks needed to work around missing BIOS
descriptions is non-trivial, and it is still early days for encouraging
BIOS vendors to enable what is needed and for setting end-user
expectations that these prerequisites exist.

As far as I can see this enabling adds an additional CXL "root" device
and I do not think userspace should need to care if a CXL 2.0 device is
attached to an RCH or not.

> Another plus, the CXL hierarchy becomes visible for RCD mode in sysfs
> and the device hierarchy.
> 
> Reusing the existing infrastructure for this makes sense. Many
> features overlap in both modes (e.g. RAS, mailbox again, or topology
> information).

RAS applies only if OS-first handling is supported by the BIOS. Mailbox
support happens with or without a CXL root device. The topology information
is certainly important in OS-first error handling, but if it's
firmware-first, it's going to have its own FRU ID scheme. Much of the
common-case topology information for the RCH case (like which RCIEP is
hosting which CXL address range) is covered by this pending lspci update:

https://github.com/pciutils/pciutils/pull/59

...although that needs some help to get over the goal line.

Otherwise the topology information is mostly for describing all the
degrees of freedom of a full-blown CXL 2.0 topology with host bridge and
switch interleaving.
Jonathan Zhang Sept. 8, 2022, 6:52 p.m. UTC | #5
> On Sep 7, 2022, at 10:43 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> 
> Apologies for the delay in getting to this I had hoped to be able to
> finish up some other DAX work to focus on this, but time is getting
> short so I will need to do both in parallel.
> 
> Robert Richter wrote:
>> In Restricted CXL Device (RCD) mode (formerly referred to as CXL 1.1)
>> the PCIe enumeration hierarchy is different from CXL VH Enumeration
>> (formerly referred to as 2.0, for both modes see CXL spec 3.0: 9.11
>> and 9.12, [1]). This series adds support for RCD mode. It implements
>> the detection of Restricted CXL Hosts (RCHs) and its corresponding
>> Restricted CXL Devices (RCDs). It does the necessary enumeration of
>> ports and connects the endpoints. With all the plumbing an RCH/RCD
>> pair is registered at the Linux CXL bus and becomes visible in sysfs
>> in the same way as CXL VH hosts and devices do already. RCDs are
>> brought up as CXL endpoints and bound to subsequent drivers such as
>> cxl_mem.
>> 
>> For CXL VH the host driver (cxl_acpi) starts host bridge discovery
>> once the ACPI0017 CXL root device is detected and then searches for
>> ACPI0016 host bridges to enable CXL. In RCD mode an ACPI0017 device
>> might not necessarily exist 
> 
> That's a broken BIOS as far as I can see. No ACPI0017 == no OS CXL
> services and the CXL aspects of the device need to be 100% managed by
> the BIOS. You can still run the cxl_pci driver in that case for mailbox
> operation, but error handling must be firmware-first without ACPI0017.
Firmware-first or OS-first applies to CXL protocol error handling. For CXL
memory error handling, the device generates a DRAM event record, and the OS
parses such a record and acts accordingly. According to the CXL spec
(section 8.2.9.2.1.2, DRAM Event Record), such a record carries the DPA but
not the HPA, so the OS needs to translate the DPA into an HPA to act on it.
I am taking this as an example to show that OS CXL services are needed.
Instead of using ACPI0016 to tell whether the system is in RCH mode,
I suppose one way is to check the “CXL version” field of the CHBS structure
in the CEDT?
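A minimal sketch of that check, assuming the CHBS layout from CXL spec 3.0, 9.17.1.2 as I understand it: a 32-byte record whose CXL Version field is 0 for a CXL 1.1 host (Base pointing at the downstream port RCRB) and 1 for a CXL 2.0 host. All names are hypothetical; in-kernel code would use the ACPICA CEDT definitions rather than decoding raw bytes.

```c
#include <stdint.h>
#include <stddef.h>

#define CEDT_TYPE_CHBS      0
#define CHBS_RECORD_LEN     32
#define CHBS_VERSION_CXL11  0	/* Base is the downstream port RCRB */
#define CHBS_VERSION_CXL20  1	/* Base is the host bridge component registers */

struct chbs {
	uint32_t uid;		/* _UID of the associated host bridge */
	uint32_t cxl_version;
	uint64_t base;
	uint64_t length;
};

/* Read an n-byte little-endian integer (ACPI tables are little-endian). */
static uint64_t le_read(const uint8_t *p, size_t n)
{
	uint64_t v = 0;

	for (size_t i = 0; i < n; i++)
		v |= (uint64_t)p[i] << (8 * i);
	return v;
}

/* Returns 0 on success, -1 if @rec is not a well-formed CHBS record. */
static int parse_chbs(const uint8_t *rec, size_t len, struct chbs *out)
{
	if (len < CHBS_RECORD_LEN || rec[0] != CEDT_TYPE_CHBS)
		return -1;
	if (le_read(rec + 2, 2) < CHBS_RECORD_LEN)	/* Record Length */
		return -1;

	out->uid = (uint32_t)le_read(rec + 4, 4);
	out->cxl_version = (uint32_t)le_read(rec + 8, 4);
	/* bytes 12..15 are reserved */
	out->base = le_read(rec + 16, 8);
	out->length = le_read(rec + 24, 8);
	return 0;
}

/* A CHBS reporting CXL 1.1 identifies a Restricted CXL Host. */
static int chbs_is_rch(const struct chbs *c)
{
	return c->cxl_version == CHBS_VERSION_CXL11;
}
```

Note that this only classifies a host bridge that has a CHBS entry at all; the entry is optional, so its absence does not by itself say anything about the mode.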

> 
>> PCIe host bridge PNP0A08 ID, there aren't any CXL port or switches in
>> the PCIe hierarchy visible. As such the RCD mode enumeration and host
>> discovery is very different from CXL VH. See patch #5 for
>> implementation details.
>> 
>> This implementation expects the host's downstream and upstream port
>> RCRBs base address being reported by firmware using the optional CEDT
>> CHBS entry of the host bridge (see CXL spec 3.0, 9.17.1.2).
>> 
>> RCD mode does not support hot-plug, so host discovery is at boot time
>> only.
>> 
>> Patches #1 to #4 are prerequisites of the series with fixes needed and
>> a rework of debug messages for port enumeration. Those are general
>> patches and could be applied earlier and independently from the rest
>> assuming there are no objections with them. Patches #5 to #15 contain
>> the actual implementation of RCD mode support.
>> 
>> [1] https://www.computeexpresslink.org/spec-landing
Dan Williams Sept. 8, 2022, 7:51 p.m. UTC | #6
Jonathan Zhang (Infra) wrote:
> 
> 
> > On Sep 7, 2022, at 10:43 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> > 
> > Apologies for the delay in getting to this I had hoped to be able to
> > finish up some other DAX work to focus on this, but time is getting
> > short so I will need to do both in parallel.
> > 
> > Robert Richter wrote:
> >> In Restricted CXL Device (RCD) mode (formerly referred to as CXL 1.1)
> >> the PCIe enumeration hierarchy is different from CXL VH Enumeration
> >> (formerly referred to as 2.0, for both modes see CXL spec 3.0: 9.11
> >> and 9.12, [1]). This series adds support for RCD mode. It implements
> >> the detection of Restricted CXL Hosts (RCHs) and its corresponding
> >> Restricted CXL Devices (RCDs). It does the necessary enumeration of
> >> ports and connects the endpoints. With all the plumbing an RCH/RCD
> >> pair is registered at the Linux CXL bus and becomes visible in sysfs
> >> in the same way as CXL VH hosts and devices do already. RCDs are
> >> brought up as CXL endpoints and bound to subsequent drivers such as
> >> cxl_mem.
> >> 
> >> For CXL VH the host driver (cxl_acpi) starts host bridge discovery
> >> once the ACPI0017 CXL root device is detected and then searches for
> >> ACPI0016 host bridges to enable CXL. In RCD mode an ACPI0017 device
> >> might not necessarily exist 
> > 
> > That's a broken BIOS as far as I can see. No ACPI0017 == no OS CXL
> > services and the CXL aspects of the device need to be 100% managed by
> > the BIOS. You can still run the cxl_pci driver in that case for mailbox
> > operation, but error handling must be firmware-first without ACPI0017.
> Firmware-first or OS-first applies to CXL protocol error handling. For CXL 
> memory error handling, the device generates a DRAM error record, the OS
> parses such record and act accordingly. According to CXL spec (section
> 8.2.9.2.1.2 DRAM Event Record), DPA but not HPA is in such record. The OS
> needs to translate such DPA into HPA to act on. I am taking this as an example
> to show that OS CXL services is needed.
> Instead of using ACPI0016 to tell whether the system is under RCH mode,
> I suppose one way is to check “CXL version” field of CHBS structure in CEDT?

Unless the OS has negotiated CXL _OSC, the BIOS owns the event retrieval
and translating it from DPA to HPA. I do want to add OS CXL services to
Linux, but only in the case when the BIOS is actively enabling OS native
address translation which includes populating ACPI0017, CFMWS, and
devices with the HDM decoder capability registers instead of DVSEC range
registers. Everything else is early-gen CXL that is 100% BIOS supported,
similar to DDR where a driver is not expected.
Jonathan Zhang Sept. 8, 2022, 8:36 p.m. UTC | #7
> On Sep 8, 2022, at 12:51 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> 
> Jonathan Zhang (Infra) wrote:
>> 
>> 
>>> On Sep 7, 2022, at 10:43 PM, Dan Williams <dan.j.williams@intel.com> wrote:
>>> 
>>> Apologies for the delay in getting to this I had hoped to be able to
>>> finish up some other DAX work to focus on this, but time is getting
>>> short so I will need to do both in parallel.
>>> 
>>> Robert Richter wrote:
>>>> In Restricted CXL Device (RCD) mode (formerly referred to as CXL 1.1)
>>>> the PCIe enumeration hierarchy is different from CXL VH Enumeration
>>>> (formerly referred to as 2.0, for both modes see CXL spec 3.0: 9.11
>>>> and 9.12, [1]). This series adds support for RCD mode. It implements
>>>> the detection of Restricted CXL Hosts (RCHs) and its corresponding
>>>> Restricted CXL Devices (RCDs). It does the necessary enumeration of
>>>> ports and connects the endpoints. With all the plumbing an RCH/RCD
>>>> pair is registered at the Linux CXL bus and becomes visible in sysfs
>>>> in the same way as CXL VH hosts and devices do already. RCDs are
>>>> brought up as CXL endpoints and bound to subsequent drivers such as
>>>> cxl_mem.
>>>> 
>>>> For CXL VH the host driver (cxl_acpi) starts host bridge discovery
>>>> once the ACPI0017 CXL root device is detected and then searches for
>>>> ACPI0016 host bridges to enable CXL. In RCD mode an ACPI0017 device
>>>> might not necessarily exist 
>>> 
>>> That's a broken BIOS as far as I can see. No ACPI0017 == no OS CXL
>>> services and the CXL aspects of the device need to be 100% managed by
>>> the BIOS. You can still run the cxl_pci driver in that case for mailbox
>>> operation, but error handling must be firmware-first without ACPI0017.
>> Firmware-first or OS-first applies to CXL protocol error handling. For CXL 
>> memory error handling, the device generates a DRAM error record, the OS
>> parses such record and act accordingly. According to CXL spec (section
>> 8.2.9.2.1.2 DRAM Event Record), DPA but not HPA is in such record. The OS
>> needs to translate such DPA into HPA to act on. I am taking this as an example
>> to show that OS CXL services is needed.
>> Instead of using ACPI0016 to tell whether the system is under RCH mode,
>> I suppose one way is to check “CXL version” field of CHBS structure in CEDT?
> 
> Unless the OS has negotiated CXL _OSC the BIOS owns the event retrieval
> and translating it from DPA to HPA. I do want to add OS CXL services to
> Linux, but only in the case when the BIOS is actively enabling OS native
> address translation which includes populating ACPI0017, CFMWS, and
> devices with the HDM decoder capability registers instead of DVSEC range
> registers. Everything else is early-gen CXL that is 100% BIOS supported,
> similar to DDR where a driver is not expected.


It makes sense that the BIOS and OS need to negotiate CXL _OSC so that the
OS takes care of address translation. That being said, only the DVSEC range
registers (not the HDM decoder capability registers) are available when the
device is in RCRB mode (section 9.11.8, figure 9-7) attached to an RCH. This
type of configuration needs to be supported by OS CXL services.
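To illustrate what OS-side translation would look like in the simplest case of this configuration, here is a sketch that maps a DPA to an HPA through a single DVSEC memory range. It assumes one non-interleaved range whose HPA base corresponds to DPA 0; the struct and function names are hypothetical, and reading the range registers out of the DVSEC is omitted.

```c
#include <stdint.h>
#include <stdbool.h>

/* One CXL DVSEC memory range as programmed by firmware (hypothetical type). */
struct dvsec_range {
	uint64_t base;	/* host physical base address of the range */
	uint64_t size;	/* size in bytes; 0 means the range is unused */
};

/*
 * Translate a device physical address to a host physical address,
 * assuming DPA 0 maps to r->base and there is no interleaving.
 * Returns false if the DPA is not covered by the range.
 */
static bool dvsec_dpa_to_hpa(const struct dvsec_range *r, uint64_t dpa,
			     uint64_t *hpa)
{
	if (r->size == 0 || dpa >= r->size)
		return false;

	*hpa = r->base + dpa;
	return true;
}
```

With interleaving or multiple ranges, the mapping is no longer a single offset, which is where the HDM decoder information Dan mentions becomes necessary.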
Dan Williams Sept. 8, 2022, 9:02 p.m. UTC | #8
Jonathan Zhang (Infra) wrote:
> 
> 
> > On Sep 8, 2022, at 12:51 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> > 
> > Jonathan Zhang (Infra) wrote:
> >> 
> >> 
> >>> On Sep 7, 2022, at 10:43 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> >>> 
> >>> Apologies for the delay in getting to this I had hoped to be able to
> >>> finish up some other DAX work to focus on this, but time is getting
> >>> short so I will need to do both in parallel.
> >>> 
> >>> Robert Richter wrote:
> >>>> In Restricted CXL Device (RCD) mode (formerly referred to as CXL 1.1)
> >>>> the PCIe enumeration hierarchy is different from CXL VH Enumeration
> >>>> (formerly referred to as 2.0, for both modes see CXL spec 3.0: 9.11
> >>>> and 9.12, [1]). This series adds support for RCD mode. It implements
> >>>> the detection of Restricted CXL Hosts (RCHs) and its corresponding
> >>>> Restricted CXL Devices (RCDs). It does the necessary enumeration of
> >>>> ports and connects the endpoints. With all the plumbing an RCH/RCD
> >>>> pair is registered at the Linux CXL bus and becomes visible in sysfs
> >>>> in the same way as CXL VH hosts and devices do already. RCDs are
> >>>> brought up as CXL endpoints and bound to subsequent drivers such as
> >>>> cxl_mem.
> >>>> 
> >>>> For CXL VH the host driver (cxl_acpi) starts host bridge discovery
> >>>> once the ACPI0017 CXL root device is detected and then searches for
> >>>> ACPI0016 host bridges to enable CXL. In RCD mode an ACPI0017 device
> >>>> might not necessarily exist 
> >>> 
> >>> That's a broken BIOS as far as I can see. No ACPI0017 == no OS CXL
> >>> services and the CXL aspects of the device need to be 100% managed by
> >>> the BIOS. You can still run the cxl_pci driver in that case for mailbox
> >>> operation, but error handling must be firmware-first without ACPI0017.
> > >> Firmware-first or OS-first applies to CXL protocol error handling. For CXL
> > >> memory error handling, the device generates a DRAM event record; the OS
> > >> parses such records and acts accordingly. According to the CXL spec (section
> > >> 8.2.9.2.1.2 DRAM Event Record), such a record carries the DPA but not the HPA.
> > >> The OS needs to translate that DPA into an HPA to act on it. I am taking this
> > >> as an example to show that OS CXL services are needed.
> > >> Instead of using ACPI0016 to tell whether the system is in RCH mode,
> > >> I suppose one way is to check the “CXL version” field of the CHBS structure in the CEDT?
> > 
> > Unless the OS has negotiated CXL _OSC, the BIOS owns the event retrieval
> > and translating it from DPA to HPA. I do want to add OS CXL services to
> > Linux, but only in the case when the BIOS is actively enabling OS native
> > address translation which includes populating ACPI0017, CFMWS, and
> > devices with the HDM decoder capability registers instead of DVSEC range
> > registers. Everything else is early-gen CXL that is 100% BIOS supported,
> > similar to DDR where a driver is not expected.
> 
> 
> It makes sense that the BIOS and OS need to negotiate CXL _OSC so that the OS
> would take care of address translation. That being said, only the DVSEC range
> registers (not the HDM decoder capability registers) are available when the
> device is in RCRB mode (section 9.11.8, figure 9-7) attached to an RCH. This
> type of configuration needs to be supported by the OS CXL services.
> 

So that figure does have the HDM capability pictured in the RCD upstream
port. However, Table 8-22 does seem to indicate that Type 3 D1 devices
are not permitted to have an HDM Decoder Capability Structure.

However, that then leaves me confused about figure 9-8, as that shows an
HDM decoder capability in the BAR and not the RCRB. Is that picture
wrong with respect to what Table 8-22 indicates?
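The “CXL version” check on the CEDT CHBS that was floated earlier in the thread could look roughly like the C sketch below. It is a hypothetical illustration, not the cxl_acpi code: it assumes the 32-byte little-endian CHBS layout from CXL 3.0 section 9.17.1.2 (type, reserved, record length, UID, CXL Version, reserved, Base, Length), where CXL Version 0 denotes a CXL 1.1 host bridge (an RCH, whose Base is the RCRB base) and 1 denotes a CXL 2.0 host bridge. The names chbs_classify(), enum host_mode and the get_le*() helpers are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CEDT_TYPE_CHBS      0   /* CEDT structure type of a CHBS */
#define CHBS_RECORD_LEN     32  /* CHBS record length in bytes */
#define CHBS_VERSION_CXL11  0   /* RCH: Base is the RCRB base */
#define CHBS_VERSION_CXL20  1   /* CXL 2.0 host bridge: Base is the CHBCR */

/* Hypothetical classification result for one host bridge. */
enum host_mode { HOST_UNKNOWN = -1, HOST_RCH = 0, HOST_VH = 1 };

/* Little-endian field readers (ACPI tables are little-endian). */
static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t get_le64(const uint8_t *p)
{
	return (uint64_t)get_le32(p) | ((uint64_t)get_le32(p + 4) << 32);
}

/*
 * Classify one CEDT record. Offsets follow the assumed CHBS layout:
 * 0 type, 2 record length, 4 UID, 8 CXL Version, 16 Base, 24 Length.
 * On success *base receives the Base field.
 */
static enum host_mode chbs_classify(const uint8_t *rec, size_t len,
				    uint64_t *base)
{
	if (len < CHBS_RECORD_LEN || rec[0] != CEDT_TYPE_CHBS ||
	    get_le16(rec + 2) < CHBS_RECORD_LEN)
		return HOST_UNKNOWN;

	switch (get_le32(rec + 8)) {
	case CHBS_VERSION_CXL11:
		*base = get_le64(rec + 16);
		return HOST_RCH;
	case CHBS_VERSION_CXL20:
		*base = get_le64(rec + 16);
		return HOST_VH;
	default:
		return HOST_UNKNOWN;	/* reserved/unknown CXL Version */
	}
}
```

A CEDT walker would call this once per record and take the RCRB path only for HOST_RCH; whether that path should additionally require ACPI0017 is the policy question being debated here, and the sketch deliberately leaves it out.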
Dan Williams Sept. 16, 2022, 6:16 p.m. UTC | #9
Hi Robert,

I did not see a response to some of my feedback, but wanted to summarize
where I think the next version of this set needs to go:

1/ ACPI0017 is mandatory. If a BIOS does not provide ACPI0017 it is
explicitly opting the OS out of managing anything other than the CXL.io
side of memory expanders.

2/ Per Table 8-22 in CXL 3.0, RCDs are not permitted to have HDM decoders,
so that assumption in the driver needs to be reworked.

3/ It's not even clear that the Register Locator DVSEC has any role to
play in an RCD, as every register the driver needs should be relative to
the RCRB. So the assumptions in the driver need to treat RCRB-located
registers as first-class citizens.
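Treating RCRB-located registers as first-class, as point 3 asks, could start from a lookup like the C sketch below. It is a hypothetical illustration under stated assumptions: the 8 KB region at the CHBS Base holds the downstream port RCRB followed by the upstream port RCRB 4 KB above it, and each RCRB carries a 64-bit memory BAR (MEMBAR0) at offset 0x10 that points at that port's component registers. The RCRB is modeled as a byte array instead of mapped MMIO, and rcrb_to_component() and the rd/wr helpers are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RCRB_SIZE          0x1000u /* each port's RCRB is 4 KB */
#define RCRB_VENDOR_ID     0x00    /* Vendor/Device ID dword */
#define RCRB_COMMAND       0x04
#define RCRB_MEMBAR0_LO    0x10
#define RCRB_MEMBAR0_HI    0x14
#define CMD_MEMORY         0x0002u /* Memory Space Enable */
#define BAR_IO_SPACE       0x0001u
#define BAR_MEM_TYPE_MASK  0x0006u
#define BAR_MEM_TYPE_64    0x0004u
#define BAR_MEM_ADDR_MASK  (~(uint64_t)0xf)

enum rcrb_port { RCRB_DOWNSTREAM, RCRB_UPSTREAM };

/* Little-endian accessors standing in for readw()/readl()/writew()/writel(). */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
	return (uint32_t)rd16(p) | ((uint32_t)rd16(p + 2) << 16);
}
static void wr16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)v; p[1] = (uint8_t)(v >> 8); }
static void wr32(uint8_t *p, uint32_t v)
{
	wr16(p, (uint16_t)v);
	wr16(p + 2, (uint16_t)(v >> 16));
}

/*
 * Return the component register base of one port of an RCH/RCD link,
 * or 0 if nothing responds at that RCRB or MEMBAR0 is unusable.
 */
static uint64_t rcrb_to_component(const uint8_t *rcrb, enum rcrb_port which)
{
	const uint8_t *p = rcrb + (which == RCRB_UPSTREAM ? RCRB_SIZE : 0);
	uint32_t lo, hi;

	if (rd32(p + RCRB_VENDOR_ID) == 0xffffffffu)
		return 0;		/* nothing responds at this RCRB */
	if (!(rd16(p + RCRB_COMMAND) & CMD_MEMORY))
		return 0;		/* MEMBAR0 decoding not enabled */

	lo = rd32(p + RCRB_MEMBAR0_LO);
	hi = rd32(p + RCRB_MEMBAR0_HI);
	if ((lo & BAR_IO_SPACE) || (lo & BAR_MEM_TYPE_MASK) != BAR_MEM_TYPE_64)
		return 0;		/* expect a 64-bit memory BAR */

	return (((uint64_t)hi << 32) | lo) & BAR_MEM_ADDR_MASK;
}
```

In the kernel the same reads would go through ioremap() and readl()/readw() on the mapped RCRB; the byte-array model only keeps the sketch testable outside the kernel.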