Message ID | 166931487492.2104015.15204324083515120776.stgit@dwillia2-xfh.jf.intel.com |
---|---|
Headers | show |
Series | cxl: Add support for Restricted CXL hosts (RCD mode) | expand |
Dan, On 24.11.22 10:34:35, Dan Williams wrote: > Changes since v3 [1]: > - Rework / simplify CXL to LIBNVDIMM coordination to remove a > flush_work() locking dependency from underneath the root device lock. > - Move the root device rescan to a workqueue > - Connect RCDs directly as endpoints reachable through a CXL host bridge > as a dport, i.e. drop the extra dport indirection from v3 > - Add unit test infrastructure for an RCD configuration thank you for this posting. Patches #1-#6 are not really prerequisites (except for a trivial conflict), right? I only reviewed them starting with #6. Thanks, -Robert
On 24.11.22 10:34:35, Dan Williams wrote: > Changes since v3 [1]: > - Rework / simplify CXL to LIBNVDIMM coordination to remove a > flush_work() locking dependency from underneath the root device lock. > - Move the root device rescan to a workqueue > - Connect RCDs directly as endpoints reachable through a CXL host bridge > as a dport, i.e. drop the extra dport indirection from v3 > - Add unit test infrastructure for an RCD configuration > > [1]: http://lore.kernel.org/r/20221109104059.766720-1-rrichter@amd.com/ > > --- > > >From [PATCH v4 10/12] cxl/port: Add RCD endpoint port enumeration > --- > > Dan Williams (9): > cxl/acpi: Simplify cxl_nvdimm_bridge probing > cxl/region: Drop redundant pmem region release handling > cxl/pmem: Refactor nvdimm device registration, delete the workqueue > cxl/pmem: Remove the cxl_pmem_wq and related infrastructure > cxl/acpi: Move rescan to the workqueue > tools/testing/cxl: Make mock CEDT parsing more robust > cxl/mem: Move devm_cxl_add_endpoint() from cxl_core to cxl_mem > cxl/port: Add RCD endpoint port enumeration > tools/testing/cxl: Add an RCH topology > > Robert Richter (2): > cxl/ACPI: Register CXL host ports by bridge device > cxl/acpi: Extract component registers of restricted hosts from RCRB > > Terry Bowman (1): > cxl/acpi: Set ACPI's CXL _OSC to indicate CXL1.1 support > > > drivers/acpi/pci_root.c | 1 > drivers/cxl/acpi.c | 105 +++++++++--- > drivers/cxl/core/core.h | 8 - > drivers/cxl/core/pmem.c | 94 +++++++---- > drivers/cxl/core/port.c | 111 +++++++------ > drivers/cxl/core/region.c | 54 ++++++ > drivers/cxl/core/regs.c | 56 +++++++ > drivers/cxl/cxl.h | 46 +++-- > drivers/cxl/cxlmem.h | 15 ++ > drivers/cxl/mem.c | 72 ++++++++ > drivers/cxl/pci.c | 13 +- > drivers/cxl/pmem.c | 351 +++++------------------------------------ > tools/testing/cxl/Kbuild | 1 > tools/testing/cxl/test/cxl.c | 241 ++++++++++++++++++++++------ > tools/testing/cxl/test/mem.c | 40 ++++- > tools/testing/cxl/test/mock.c | 19 ++ > tools/testing/cxl/test/mock.h | 3 > 17 files changed, 712 insertions(+), 518 deletions(-) I have tested this series and the enumeration is as expected (see sysfs dump below). I see an HDM failure which I am investigating, but that seems unrelated to this series. You can add to this series my: Tested-by: Robert Richter <rrichter@amd.com> The drawback of this topology decision is that the endpoint's port is a child of root0, which is the ACPI0017's device (not the CXL host bridge): /sys/bus/cxl/devices/root0/endpoint1 I understand this is the CXL's RCD view on the pcie hierarchy here. Logically there would be a cxl host port in between, like this: /sys/bus/cxl/devices/root0/port1/endpoint2 Esp. if there are multiple hosts in the system the port relations will be lost and can only be determined using the pci dev's parent bridge. port1 could then also hold the uport's component regs with the upstream port capabilities to access e.g. status regs for RAS. Anyway, let's see how this approach flies. Thanks for looing into this. -Robert --- # find /sys/bus/cxl/devices/ -ls 265293 0 drwxr-xr-x 2 root root 0 Nov 30 14:18 /sys/bus/cxl/devices/ 265493 0 lrwxrwxrwx 1 root root 0 Nov 30 14:21 /sys/bus/cxl/devices/root0 -> ../../../devices/platform/ACPI0017:00/root0 265664 0 lrwxrwxrwx 1 root root 0 Nov 30 14:18 /sys/bus/cxl/devices/endpoint1 -> ../../../devices/platform/ACPI0017:00/root0/endpoint1 265584 0 lrwxrwxrwx 1 root root 0 Nov 30 14:21 /sys/bus/cxl/devices/mem0 -> ../../../devices/pci0000:7f/0000:7f:00.0/mem0 # find /sys/bus/cxl/devices/*/ -ls 265660 0 drwxr-xr-x 2 root root 0 Nov 30 14:18 /sys/bus/cxl/devices/endpoint1/ 265661 0 -rw-r--r-- 1 root root 4096 Nov 30 14:18 /sys/bus/cxl/devices/endpoint1/uevent 265667 0 lrwxrwxrwx 1 root root 0 Nov 30 14:18 /sys/bus/cxl/devices/endpoint1/driver -> ../../../../../bus/cxl/drivers/cxl_port 265669 0 lrwxrwxrwx 1 root root 0 Nov 30 14:21 /sys/bus/cxl/devices/endpoint1/uport -> ../../../../pci0000:7f/0000:7f:00.0/mem0 265668 0 -r-------- 1 root root 0 Nov 30 14:21 /sys/bus/cxl/devices/endpoint1/CDAT 265665 0 lrwxrwxrwx 1 root root 0 Nov 30 14:18 /sys/bus/cxl/devices/endpoint1/subsystem -> ../../../../../bus/cxl 265662 0 -r--r--r-- 1 root root 4096 Nov 30 14:21 /sys/bus/cxl/devices/endpoint1/devtype 265663 0 -r--r--r-- 1 root root 4096 Nov 30 14:21 /sys/bus/cxl/devices/endpoint1/modalias 265573 0 drwxr-xr-x 4 root root 0 Nov 30 14:18 /sys/bus/cxl/devices/mem0/ 265574 0 -rw-r--r-- 1 root root 4096 Nov 30 14:18 /sys/bus/cxl/devices/mem0/uevent 265578 0 -r--r--r-- 1 root root 4096 Nov 30 14:21 /sys/bus/cxl/devices/mem0/label_storage_size 265582 0 drwxr-xr-x 2 root root 0 Nov 30 14:21 /sys/bus/cxl/devices/mem0/pmem 265583 0 -r--r--r-- 1 root root 4096 Nov 30 14:21 /sys/bus/cxl/devices/mem0/pmem/size 265576 0 -r--r--r-- 1 root root 4096 Nov 30 14:21 /sys/bus/cxl/devices/mem0/firmware_version 265579 0 -r--r--r-- 1 root root 4096 Nov 30 14:21 /sys/bus/cxl/devices/mem0/numa_node 265586 0 -r--r--r-- 1 root root 4096 Nov 30 14:21 /sys/bus/cxl/devices/mem0/dev 265659 0 lrwxrwxrwx 1 root root 0 Nov 30 14:21 /sys/bus/cxl/devices/mem0/driver -> ../../../../bus/cxl/drivers/cxl_mem 265580 0 drwxr-xr-x 2 root root 0 Nov 30 14:21 /sys/bus/cxl/devices/mem0/ram 265581 0 -r--r--r-- 1 root root 4096 Nov 30 14:21 /sys/bus/cxl/devices/mem0/ram/size 265585 0 lrwxrwxrwx 1 root root 0 Nov 30 14:18 /sys/bus/cxl/devices/mem0/subsystem -> ../../../../bus/cxl 265575 0 -r--r--r-- 1 root root 4096 Nov 30 14:21 /sys/bus/cxl/devices/mem0/serial 265577 0 -r--r--r-- 1 root root 4096 Nov 30 14:21 /sys/bus/cxl/devices/mem0/payload_max 265489 0 drwxr-xr-x 3 root root 0 Nov 30 14:18 /sys/bus/cxl/devices/root0/ 265490 0 -rw-r--r-- 1 root root 4096 Nov 30 14:18 /sys/bus/cxl/devices/root0/uevent 265660 0 drwxr-xr-x 2 root root 0 Nov 30 14:18 /sys/bus/cxl/devices/root0/endpoint1 265661 0 -rw-r--r-- 1 root root 4096 Nov 30 14:18 /sys/bus/cxl/devices/root0/endpoint1/uevent 265667 0 lrwxrwxrwx 1 root root 0 Nov 30 14:18 /sys/bus/cxl/devices/root0/endpoint1/driver -> ../../../../../bus/cxl/drivers/cxl_port 265669 0 lrwxrwxrwx 1 root root 0 Nov 30 14:21 /sys/bus/cxl/devices/root0/endpoint1/uport -> ../../../../pci0000:7f/0000:7f:00.0/mem0 265668 0 -r-------- 1 root root 0 Nov 30 14:21 /sys/bus/cxl/devices/root0/endpoint1/CDAT 265665 0 lrwxrwxrwx 1 root root 0 Nov 30 14:18 /sys/bus/cxl/devices/root0/endpoint1/subsystem -> ../../../../../bus/cxl 265662 0 -r--r--r-- 1 root root 4096 Nov 30 14:21 /sys/bus/cxl/devices/root0/endpoint1/devtype 265663 0 -r--r--r-- 1 root root 4096 Nov 30 14:21 /sys/bus/cxl/devices/root0/endpoint1/modalias 265495 0 lrwxrwxrwx 1 root root 0 Nov 30 14:21 /sys/bus/cxl/devices/root0/uport -> ../../ACPI0017:00 265496 0 lrwxrwxrwx 1 root root 0 Nov 30 14:21 /sys/bus/cxl/devices/root0/dport4 -> ../../../pci0000:7f 265494 0 lrwxrwxrwx 1 root root 0 Nov 30 14:18 /sys/bus/cxl/devices/root0/subsystem -> ../../../../bus/cxl 265491 0 -r--r--r-- 1 root root 4096 Nov 30 14:21 /sys/bus/cxl/devices/root0/devtype 265492 0 -r--r--r-- 1 root root 4096 Nov 30 14:21 /sys/bus/cxl/devices/root0/modalias
Robert Richter wrote: > Dan, > > On 24.11.22 10:34:35, Dan Williams wrote: > > Changes since v3 [1]: > > - Rework / simplify CXL to LIBNVDIMM coordination to remove a > > flush_work() locking dependency from underneath the root device lock. > > - Move the root device rescan to a workqueue > > - Connect RCDs directly as endpoints reachable through a CXL host bridge > > as a dport, i.e. drop the extra dport indirection from v3 > > - Add unit test infrastructure for an RCD configuration > > thank you for this posting. > > Patches #1-#6 are not really prerequisites (except for a trivial > conflict), right? I only reviewed them starting with #6. In fact they are pre-requisites because of this hunk in: [PATCH v4 10/12] cxl/port: Add RCD endpoint port enumeration http://lore.kernel.org/r/166931493266.2104015.8062923429837042172.stgit@dwillia2-xfh.jf.intel.com/ @@ -119,17 +131,22 @@ static int cxl_mem_probe(struct device *dev) return -ENXIO; } - device_lock(&parent_port->dev); - if (!parent_port->dev.driver) { + if (dport->rch) + endpoint_parent = parent_port->uport; + else + endpoint_parent = &parent_port->dev; + + device_lock(endpoint_parent); + if (!endpoint_parent->driver) { dev_err(dev, "CXL port topology %s not enabled\n", dev_name(&parent_port->dev)); rc = -ENXIO; goto unlock; } - rc = devm_cxl_add_endpoint(cxlmd, dport); + rc = devm_cxl_add_endpoint(endpoint_parent, cxlmd, dport); unlock: - device_unlock(&parent_port->dev); + device_unlock(endpoint_parent); put_device(&parent_port->dev); if (rc) return rc; That device_lock(endpoint_parent) in the RCH case locks the ACPI0017 device so that a devm action can be serialized against ACPI0017 being unbound from its driver. Before RCH support this path only locked cxl_port objects to syncrhonize with the cxl_port driver. In the RCH case the port that needs to be locked while the endpoint is attached is the one owned by the cxl_acpi driver. This all happens while holding device_lock(&cxlmd->dev). It means lockdep complains about cxl_bus_rescan() being done under device_lock(&ACPI0017->dev), since rescan may take device_lock(&cxlmd->dev), and it complains about the flush_work() in unregister_nvb() for a similar entanglement. So the initial patches in this series are all about making it possible setup an unregistration chain when the root CXL device goes through ->remove().
On 30.11.22 12:59:28, Dan Williams wrote: > Robert Richter wrote: > > Dan, > > > > On 24.11.22 10:34:35, Dan Williams wrote: > > > Changes since v3 [1]: > > > - Rework / simplify CXL to LIBNVDIMM coordination to remove a > > > flush_work() locking dependency from underneath the root device lock. > > > - Move the root device rescan to a workqueue > > > - Connect RCDs directly as endpoints reachable through a CXL host bridge > > > as a dport, i.e. drop the extra dport indirection from v3 > > > - Add unit test infrastructure for an RCD configuration > > > > thank you for this posting. > > > > Patches #1-#6 are not really prerequisites (except for a trivial > > conflict), right? I only reviewed them starting with #6. > > In fact they are pre-requisites because of this hunk in: Thanks for the explanation of using device_lock() here. -Robert
Changes since v3 [1]: - Rework / simplify CXL to LIBNVDIMM coordination to remove a flush_work() locking dependency from underneath the root device lock. - Move the root device rescan to a workqueue - Connect RCDs directly as endpoints reachable through a CXL host bridge as a dport, i.e. drop the extra dport indirection from v3 - Add unit test infrastructure for an RCD configuration [1]: http://lore.kernel.org/r/20221109104059.766720-1-rrichter@amd.com/ --- >From [PATCH v4 10/12] cxl/port: Add RCD endpoint port enumeration Unlike a CXL memory expander in a VH topology that has at least one intervening 'struct cxl_port' instance between itself and the CXL root device, an RCD attaches one-level higher. For example: VH ┌──────────┐ │ ACPI0017 │ │ root0 │ └─────┬────┘ │ ┌─────┴────┐ │ dport0 │ ┌─────┤ ACPI0016 ├─────┐ │ │ port1 │ │ │ └────┬─────┘ │ │ │ │ ┌──┴───┐ ┌──┴───┐ ┌───┴──┐ │dport0│ │dport1│ │dport2│ │ RP0 │ │ RP1 │ │ RP2 │ └──────┘ └──┬───┘ └──────┘ │ ┌───┴─────┐ │endpoint0│ │ port2 │ └─────────┘ ...vs: RCH ┌──────────┐ │ ACPI0017 │ │ root0 │ └────┬─────┘ │ ┌───┴────┐ │ dport0 │ │ACPI0016│ └───┬────┘ │ ┌────┴─────┐ │endpoint0 │ │ port1 │ └──────────┘ So arrange for endpoint port in the RCH/RCD case to appear directly connected to the host-bridge in its singular role as a dport. Compare that to the VH case where the host-bridge serves a dual role as a 'cxl_dport' for the CXL root device *and* a 'cxl_port' upstream port for the Root Ports in the Root Complex that are modeled as 'cxl_dport' instances in the CXL topology. Another deviation from the VH case is that RCDs may need to look up their component registers from the Root Complex Register Block (RCRB). That platform firmware specified RCRB area is cached by the cxl_acpi driver and conveyed via the host-bridge dport to the cxl_mem driver to perform the cxl_rcrb_to_component() lookup for the endpoint port (See 9.11.8 CXL Devices Attached to an RCH for the lookup of the upstream port component registers). --- Dan Williams (9): cxl/acpi: Simplify cxl_nvdimm_bridge probing cxl/region: Drop redundant pmem region release handling cxl/pmem: Refactor nvdimm device registration, delete the workqueue cxl/pmem: Remove the cxl_pmem_wq and related infrastructure cxl/acpi: Move rescan to the workqueue tools/testing/cxl: Make mock CEDT parsing more robust cxl/mem: Move devm_cxl_add_endpoint() from cxl_core to cxl_mem cxl/port: Add RCD endpoint port enumeration tools/testing/cxl: Add an RCH topology Robert Richter (2): cxl/ACPI: Register CXL host ports by bridge device cxl/acpi: Extract component registers of restricted hosts from RCRB Terry Bowman (1): cxl/acpi: Set ACPI's CXL _OSC to indicate CXL1.1 support drivers/acpi/pci_root.c | 1 drivers/cxl/acpi.c | 105 +++++++++--- drivers/cxl/core/core.h | 8 - drivers/cxl/core/pmem.c | 94 +++++++---- drivers/cxl/core/port.c | 111 +++++++------ drivers/cxl/core/region.c | 54 ++++++ drivers/cxl/core/regs.c | 56 +++++++ drivers/cxl/cxl.h | 46 +++-- drivers/cxl/cxlmem.h | 15 ++ drivers/cxl/mem.c | 72 ++++++++ drivers/cxl/pci.c | 13 +- drivers/cxl/pmem.c | 351 +++++------------------------------------ tools/testing/cxl/Kbuild | 1 tools/testing/cxl/test/cxl.c | 241 ++++++++++++++++++++++------ tools/testing/cxl/test/mem.c | 40 ++++- tools/testing/cxl/test/mock.c | 19 ++ tools/testing/cxl/test/mock.h | 3 17 files changed, 712 insertions(+), 518 deletions(-) base-commit: 3b39fd6cf12ceda2a2582dcb9b9ee9f4d197b857