diff mbox series

[v4,33/40] cxl/mem: Add the cxl_mem driver

Message ID 164316691403.3437657.5374419213236572727.stgit@dwillia2-desk3.amr.corp.intel.com (mailing list archive)
State New, archived
Headers show
Series None | expand

Commit Message

Dan Williams Jan. 26, 2022, 3:16 a.m. UTC
From: Ben Widawsky <ben.widawsky@intel.com>

At this point the subsystem can enumerate all CXL ports (CXL.mem decode
resources in upstream switch ports and host bridges) in a system. The
last mile is connecting those ports to endpoints.

The cxl_mem driver connects an endpoint device to the platform CXL.mem
protoctol decode-topology. At ->probe() time it walks its
device-topology-ancestry and adds a CXL Port object at every Upstream
Port hop until it gets to CXL root. The CXL root object is only present
after a platform firmware driver registers platform CXL resources. For
ACPI based platform this is managed by the ACPI0017 device and the
cxl_acpi driver.

The ports are registered such that disabling a given port automatically
unregisters all descendant ports, and the chain can only be registered
after the root is established.

Given ACPI device scanning may run asynchronously compared to PCI device
scanning the root driver is tasked with rescanning the bus after the
root successfully probes.

Conversely if any ports in a chain between the root and an endpoint
becomes disconnected it subsequently triggers the endpoint to
unregister. Given lock depenedencies the endpoint unregistration happens
in a workqueue asynchronously. If userspace cares about synchronizing
delayed work after port events the /sys/bus/cxl/flush attribute is
available for that purpose.

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
[djbw: clarify changelog, rework hotplug support]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
Changes in v4:
- Fix HDM init success case in cxl_dvsec_decode_init() (Ben)

 Documentation/ABI/testing/sysfs-bus-cxl         |    9 +
 Documentation/driver-api/cxl/memory-devices.rst |    9 +
 drivers/cxl/Kconfig                             |   16 ++
 drivers/cxl/Makefile                            |    2 
 drivers/cxl/acpi.c                              |    4 
 drivers/cxl/core/core.h                         |    1 
 drivers/cxl/core/memdev.c                       |   18 ++
 drivers/cxl/core/port.c                         |   91 +++++++++
 drivers/cxl/cxl.h                               |    5 +
 drivers/cxl/cxlmem.h                            |    9 +
 drivers/cxl/mem.c                               |  222 +++++++++++++++++++++++
 drivers/cxl/port.c                              |   13 +
 tools/testing/cxl/Kbuild                        |    5 +
 tools/testing/cxl/mock_mem.c                    |   10 +
 14 files changed, 407 insertions(+), 7 deletions(-)
 create mode 100644 drivers/cxl/mem.c
 create mode 100644 tools/testing/cxl/mock_mem.c

Comments

Jonathan Cameron Feb. 1, 2022, 12:45 p.m. UTC | #1
On Tue, 25 Jan 2022 19:16:05 -0800
Dan Williams <dan.j.williams@intel.com> wrote:

> From: Ben Widawsky <ben.widawsky@intel.com>
> 
> At this point the subsystem can enumerate all CXL ports (CXL.mem decode
> resources in upstream switch ports and host bridges) in a system. The
> last mile is connecting those ports to endpoints.
> 
> The cxl_mem driver connects an endpoint device to the platform CXL.mem
> protoctol decode-topology. At ->probe() time it walks its
> device-topology-ancestry and adds a CXL Port object at every Upstream
> Port hop until it gets to CXL root. The CXL root object is only present
> after a platform firmware driver registers platform CXL resources. For
> ACPI based platform this is managed by the ACPI0017 device and the
> cxl_acpi driver.
> 
> The ports are registered such that disabling a given port automatically
> unregisters all descendant ports, and the chain can only be registered
> after the root is established.
> 
> Given ACPI device scanning may run asynchronously compared to PCI device
> scanning the root driver is tasked with rescanning the bus after the
> root successfully probes.
> 
> Conversely if any ports in a chain between the root and an endpoint
> becomes disconnected it subsequently triggers the endpoint to
> unregister. Given lock depenedencies the endpoint unregistration happens
> in a workqueue asynchronously. If userspace cares about synchronizing
> delayed work after port events the /sys/bus/cxl/flush attribute is
> available for that purpose.
> 
> Reported-by: Randy Dunlap <rdunlap@infradead.org>
> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> [djbw: clarify changelog, rework hotplug support]
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---

A few comments inline.

Jonathan

> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index 7bd53dc691ec..df6691d0a6d0 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -314,7 +314,8 @@ static int cxl_acpi_probe(struct platform_device *pdev)
>  	if (rc < 0)
>  		return rc;
>  
> -	return 0;
> +	/* In case PCI is scanned before ACPI re-trigger memdev attach */
> +	return cxl_bus_rescan();
>  }
>  
>  static const struct acpi_device_id cxl_acpi_ids[] = {
> @@ -335,3 +336,4 @@ module_platform_driver(cxl_acpi_driver);
>  MODULE_LICENSE("GPL v2");
>  MODULE_IMPORT_NS(CXL);
>  MODULE_IMPORT_NS(ACPI);
> +MODULE_SOFTDEP("pre: cxl_port");
I think a comment on 'why' would be useful for the SOFTDEP.

They are rare enough that it might surprise people.

> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 1a50c0fc399c..efbaa851929d 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -6,6 +6,7 @@
>  
>  extern const struct device_type cxl_nvdimm_bridge_type;
>  extern const struct device_type cxl_nvdimm_type;
> +extern const struct device_type cxl_memdev_type;
>  
>  extern struct attribute_group cxl_base_attribute_group;
>  
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index b2773664e407..ee0156419d06 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -155,13 +155,19 @@ static const struct attribute_group *cxl_memdev_attribute_groups[] = {
>  	NULL,
>  };
>  
> -static const struct device_type cxl_memdev_type = {
> +const struct device_type cxl_memdev_type = {

Currently this is only exposed for type checking and you also have is_cxl_memdev for
that so seems a bit unnecessary.

>  	.name = "cxl_memdev",
>  	.release = cxl_memdev_release,
>  	.devnode = cxl_memdev_devnode,
>  	.groups = cxl_memdev_attribute_groups,
>  };
>  
> +bool is_cxl_memdev(struct device *dev)
> +{
> +	return dev->type == &cxl_memdev_type;
> +}
> +EXPORT_SYMBOL_NS_GPL(is_cxl_memdev, CXL);
> +
>  /**
>   * set_exclusive_cxl_commands() - atomically disable user cxl commands
>   * @cxlds: The device state to operate on
> @@ -213,6 +219,15 @@ static void cxl_memdev_unregister(void *_cxlmd)
>  	put_device(dev);
>  }
>  
> +static void detach_memdev(struct work_struct *work)
> +{
> +	struct cxl_memdev *cxlmd;
> +
> +	cxlmd = container_of(work, typeof(*cxlmd), detach_work);
> +	device_release_driver(&cxlmd->dev);
> +	put_device(&cxlmd->dev);
> +}
> +
>  static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
>  					   const struct file_operations *fops)
>  {
> @@ -237,6 +252,7 @@ static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
>  	dev->devt = MKDEV(cxl_mem_major, cxlmd->id);
>  	dev->type = &cxl_memdev_type;
>  	device_set_pm_not_required(dev);
> +	INIT_WORK(&cxlmd->detach_work, detach_memdev);
>  
>  	cdev = &cxlmd->cdev;
>  	cdev_init(cdev, fops);
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index b61957636907..75a66540a795 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1,6 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0-only
>  /* Copyright(c) 2020 Intel Corporation. All rights reserved. */
>  #include <linux/io-64-nonatomic-lo-hi.h>
> +#include <linux/workqueue.h>
>  #include <linux/device.h>
>  #include <linux/module.h>
>  #include <linux/pci.h>
> @@ -46,6 +47,8 @@ static int cxl_device_id(struct device *dev)
>  			return CXL_DEVICE_ROOT;
>  		return CXL_DEVICE_PORT;
>  	}
> +	if (dev->type == &cxl_memdev_type)

is_cxl_memdev() ?
Having dong that, is there any need to expose cxl_memdev_type?

> +		return CXL_DEVICE_MEMORY_EXPANDER;
>  	return 0;
>  }
>  
> @@ -320,8 +323,10 @@ static void unregister_port(void *_port)
>  {
>  	struct cxl_port *port = _port;
>  
> -	if (!is_cxl_root(port))
> +	if (!is_cxl_root(port)) {
>  		device_lock_assert(port->dev.parent);
> +		port->uport = NULL;
> +	}
>  
>  	device_unregister(&port->dev);
>  }

...

> +static void delete_endpoint(void *data)
> +{
> +	struct cxl_memdev *cxlmd = data;
> +	struct cxl_port *endpoint = dev_get_drvdata(&cxlmd->dev);
> +	struct cxl_port *parent_port;
> +	struct device *parent;
> +
> +	parent_port = cxl_mem_find_port(cxlmd);
> +	if (!parent_port)
> +		return;
> +	parent = &parent_port->dev;
> +
> +	cxl_device_lock(parent);
> +	if (parent->driver && endpoint->uport) {
> +		devm_release_action(parent, cxl_unlink_uport, endpoint);
> +		devm_release_action(parent, unregister_port, endpoint);
> +	}
> +	cxl_device_unlock(parent);
> +	put_device(parent);
> +	put_device(&endpoint->dev);
> +}
> +
> +int cxl_endpoint_autoremove(struct cxl_memdev *cxlmd, struct cxl_port *endpoint)
> +{
> +	struct device *dev = &cxlmd->dev;
> +
> +	get_device(&endpoint->dev);
> +	dev_set_drvdata(dev, endpoint);

That's a little nasty if it's just to provide
a second parameter to delete_endpoint, but I guess nothing else was using the drvdata..

> +	return devm_add_action_or_reset(dev, delete_endpoint, cxlmd);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_endpoint_autoremove, CXL);

...


> +static BUS_ATTR_WO(flush);
> +
>  static __init int cxl_core_init(void)
>  {
>  	int rc;
> @@ -1329,12 +1395,27 @@ static __init int cxl_core_init(void)
>  	if (rc)
>  		return rc;
>  
> +	cxl_bus_wq = alloc_ordered_workqueue("cxl_port", 0);
> +	if (!cxl_bus_wq) {
> +		rc = -ENOMEM;
> +		goto err_wq;
> +	}
> +
>  	rc = bus_register(&cxl_bus_type);
>  	if (rc)
> -		goto err;
> +		goto err_bus;
> +
> +	rc = bus_create_file(&cxl_bus_type, &bus_attr_flush);

Can't we add this as part of the bus_type?  Always good to avoid
dynamic sysfs file creation if we possibly can.

> +	if (rc)
> +		goto err_flush;
> +
>  	return 0;
>  
> -err:
> +err_flush:
> +	bus_unregister(&cxl_bus_type);
> +err_bus:
> +	destroy_workqueue(cxl_bus_wq);
> +err_wq:
>  	cxl_memdev_exit();
>  	cxl_mbox_exit();
>  	return rc;
> @@ -1342,7 +1423,9 @@ static __init int cxl_core_init(void)
>  
>  static void cxl_core_exit(void)
>  {
> +	bus_remove_file(&cxl_bus_type, &bus_attr_flush);
>  	bus_unregister(&cxl_bus_type);
> +	destroy_workqueue(cxl_bus_wq);
>  	cxl_memdev_exit();
>  	cxl_mbox_exit();
>  }
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index b71d40b68ccd..0bbe394f2f26 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -323,6 +323,8 @@ struct cxl_port *devm_cxl_add_port(struct device *host, struct device *uport,
>  struct cxl_port *find_cxl_root(struct device *dev);
>  int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd);
>  int cxl_bus_rescan(void);
> +struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd);

Should be in previous patch where the function is defined.

> +bool schedule_cxl_memdev_detach(struct cxl_memdev *cxlmd);
>  

> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 0ba0cf8dcdbc..7ba0edb4a1ab 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -35,11 +35,14 @@
>   * @cdev: char dev core object for ioctl operations
>   * @cxlds: The device state backing this device
>   * @id: id number of this memdev instance.
> + * @detach_work: active memdev lost a port in its ancestry
> + * @component_reg_phys: register base of component registers

?

>   */
>  struct cxl_memdev {
>  	struct device dev;
>  	struct cdev cdev;
>  	struct cxl_dev_state *cxlds;
> +	struct work_struct detach_work;
>  	int id;
>  };
>  
> @@ -48,6 +51,12 @@ static inline struct cxl_memdev *to_cxl_memdev(struct device *dev)
>  	return container_of(dev, struct cxl_memdev, dev);
>  }
>  
> +bool is_cxl_memdev(struct device *dev);
> +static inline bool is_cxl_endpoint(struct cxl_port *port)
> +{
> +	return is_cxl_memdev(port->uport);
> +}
> +
>  struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds);
>  
>  /**
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> new file mode 100644
> index 000000000000..27f9dd0d55b6
> --- /dev/null
> +++ b/drivers/cxl/mem.c
> @@ -0,0 +1,222 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
> +#include <linux/device.h>
> +#include <linux/module.h>
> +#include <linux/pci.h>
> +
> +#include "cxlmem.h"
> +#include "cxlpci.h"
> +
> +/**
> + * DOC: cxl mem
> + *
> + * CXL memory endpoint devices and switches are CXL capable devices that are
> + * participating in CXL.mem protocol. Their functionality builds on top of the
> + * CXL.io protocol that allows enumerating and configuring components via
> + * standard PCI mechanisms.
> + *
> + * The cxl_mem driver owns kicking off the enumeration of this CXL.mem
> + * capability. With the detection of a CXL capable endpoint, the driver will
> + * walk up to find the platform specific port it is connected to, and determine
> + * if there are intervening switches in the path. If there are switches, a
> + * secondary action to enumerate those (implemented in cxl_core).

action is to

> Finally the
> + * cxl_mem driver will add the device it is bound to as a CXL port for use in
> + * higher level operations.
> + */

...


> +
> +/**
> + * cxl_dvsec_decode_init() - Setup HDM decoding for the endpoint
> + * @cxlds: Device state
> + *
> + * Additionally, enables global HDM decoding. Warning: don't call this outside
> + * of probe. Once probe is complete, the port driver owns all access to the HDM
> + * decoder registers.
> + *
> + * Returns: false if DVSEC Ranges are being used instead of HDM decoders;
> + *	    otherwise returns true.

It feels like some of the paths below could reflect other problems
rather than an intention to use ranges.  Maybe this nees to have separate
error handling from detection of range register usage?

> + */
> +__mock bool cxl_dvsec_decode_init(struct cxl_dev_state *cxlds)
> +{
> +	struct cxl_endpoint_dvsec_info *info = &cxlds->info;
> +	struct cxl_register_map map;
> +	struct cxl_component_reg_map *cmap = &map.component_map;
> +	bool global_enable, do_hdm_init = false;
> +	void __iomem *crb;
> +	u32 global_ctrl;
> +
> +	/* map hdm decoder */
> +	crb = ioremap(cxlds->component_reg_phys, CXL_COMPONENT_REG_BLOCK_SIZE);
> +	if (!crb) {
> +		dev_dbg(cxlds->dev, "Failed to map component registers\n");
> +		return false;
> +	}
> +
> +	cxl_probe_component_regs(cxlds->dev, crb, cmap);
> +	if (!cmap->hdm_decoder.valid) {
> +		dev_dbg(cxlds->dev, "Invalid HDM decoder registers\n");
> +		goto out;
> +	}
> +
> +	global_ctrl = readl(crb + cmap->hdm_decoder.offset +
> +			    CXL_HDM_DECODER_CTRL_OFFSET);
> +	global_enable = global_ctrl & CXL_HDM_DECODER_ENABLE;
> +	if (!global_enable && info->ranges) {
> +		dev_dbg(cxlds->dev, "DVSEC regions\n");

Perhaps worth adding a little more description to that.
Perhaps

"DVSEC ranges already programmed and HDM decoders not enabled."

> +		goto out;
> +	}
> +
> +	do_hdm_init = true;
> +
> +	/*
> +	 * Turn on global enable now since DVSEC ranges aren't being used and
> +	 * we'll eventually want the decoder enabled. This also prevents special
> +	 * casing in the port driver since this only applies to endpoints.

Possibly worth saying why it isn't worth turning this off again in the remove
path...

> +	 */
> +	if (!global_enable) {
> +		dev_dbg(cxlds->dev, "Enabling HDM decode\n");
> +		writel(global_ctrl | CXL_HDM_DECODER_ENABLE,
> +		       crb + cmap->hdm_decoder.offset +
> +			       CXL_HDM_DECODER_CTRL_OFFSET);
> +	}
> +
> +out:
> +	iounmap(crb);
> +	return do_hdm_init;
> +}
> +

> +MODULE_LICENSE("GPL v2");
> +MODULE_IMPORT_NS(CXL);
> +MODULE_ALIAS_CXL(CXL_DEVICE_MEMORY_EXPANDER);
> +MODULE_SOFTDEP("pre: cxl_port");

As above, a 'why' comment for the softdep would be a good to have.

> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> index ae94a537eccc..27ab7f8d122e 100644
> --- a/drivers/cxl/port.c
> +++ b/drivers/cxl/port.c
> @@ -25,16 +25,27 @@
>   * PCIe topology.
>   */
>  
> +static void schedule_detach(void *cxlmd)
> +{
> +	schedule_cxl_memdev_detach(cxlmd);
> +}
> +
>  static int cxl_port_probe(struct device *dev)
>  {
>  	struct cxl_port *port = to_cxl_port(dev);
>  	struct cxl_hdm *cxlhdm;
>  	int rc;
>  
> +	if (is_cxl_endpoint(port)) {
> +		struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport);
> +
> +		get_device(&cxlmd->dev);
> +		return devm_add_action_or_reset(dev, schedule_detach, cxlmd);
> +	}
> +
>  	rc = devm_cxl_port_enumerate_dports(port);
>  	if (rc < 0)
>  		return rc;
> -

Reasonable to drop this, but not in this patch.


>  	if (rc == 1)
>  		return devm_cxl_add_passthrough_decoder(port);
>
Ben Widawsky Feb. 1, 2022, 5:44 p.m. UTC | #2
On 22-02-01 12:45:06, Jonathan Cameron wrote:
> On Tue, 25 Jan 2022 19:16:05 -0800
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > From: Ben Widawsky <ben.widawsky@intel.com>
> > 
> > At this point the subsystem can enumerate all CXL ports (CXL.mem decode
> > resources in upstream switch ports and host bridges) in a system. The
> > last mile is connecting those ports to endpoints.
> > 
> > The cxl_mem driver connects an endpoint device to the platform CXL.mem
> > protoctol decode-topology. At ->probe() time it walks its
> > device-topology-ancestry and adds a CXL Port object at every Upstream
> > Port hop until it gets to CXL root. The CXL root object is only present
> > after a platform firmware driver registers platform CXL resources. For
> > ACPI based platform this is managed by the ACPI0017 device and the
> > cxl_acpi driver.
> > 
> > The ports are registered such that disabling a given port automatically
> > unregisters all descendant ports, and the chain can only be registered
> > after the root is established.
> > 
> > Given ACPI device scanning may run asynchronously compared to PCI device
> > scanning the root driver is tasked with rescanning the bus after the
> > root successfully probes.
> > 
> > Conversely if any ports in a chain between the root and an endpoint
> > becomes disconnected it subsequently triggers the endpoint to
> > unregister. Given lock depenedencies the endpoint unregistration happens
> > in a workqueue asynchronously. If userspace cares about synchronizing
> > delayed work after port events the /sys/bus/cxl/flush attribute is
> > available for that purpose.
> > 
> > Reported-by: Randy Dunlap <rdunlap@infradead.org>
> > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > [djbw: clarify changelog, rework hotplug support]
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> 
> A few comments inline.
> 
> Jonathan
> 
> > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> > index 7bd53dc691ec..df6691d0a6d0 100644
> > --- a/drivers/cxl/acpi.c
> > +++ b/drivers/cxl/acpi.c
> > @@ -314,7 +314,8 @@ static int cxl_acpi_probe(struct platform_device *pdev)
> >  	if (rc < 0)
> >  		return rc;
> >  
> > -	return 0;
> > +	/* In case PCI is scanned before ACPI re-trigger memdev attach */
> > +	return cxl_bus_rescan();
> >  }
> >  
> >  static const struct acpi_device_id cxl_acpi_ids[] = {
> > @@ -335,3 +336,4 @@ module_platform_driver(cxl_acpi_driver);
> >  MODULE_LICENSE("GPL v2");
> >  MODULE_IMPORT_NS(CXL);
> >  MODULE_IMPORT_NS(ACPI);
> > +MODULE_SOFTDEP("pre: cxl_port");
> I think a comment on 'why' would be useful for the SOFTDEP.
> 
> They are rare enough that it might surprise people.

I wonder with Dan's reworks in core/port if this is actually still needed. I'm
thinking it isn't.

> 
> > diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> > index 1a50c0fc399c..efbaa851929d 100644
> > --- a/drivers/cxl/core/core.h
> > +++ b/drivers/cxl/core/core.h
> > @@ -6,6 +6,7 @@
> >  
> >  extern const struct device_type cxl_nvdimm_bridge_type;
> >  extern const struct device_type cxl_nvdimm_type;
> > +extern const struct device_type cxl_memdev_type;
> >  
> >  extern struct attribute_group cxl_base_attribute_group;
> >  
> > diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> > index b2773664e407..ee0156419d06 100644
> > --- a/drivers/cxl/core/memdev.c
> > +++ b/drivers/cxl/core/memdev.c
> > @@ -155,13 +155,19 @@ static const struct attribute_group *cxl_memdev_attribute_groups[] = {
> >  	NULL,
> >  };
> >  
> > -static const struct device_type cxl_memdev_type = {
> > +const struct device_type cxl_memdev_type = {
> 
> Currently this is only exposed for type checking and you also have is_cxl_memdev for
> that so seems a bit unnecessary.

Looks like this was leftover after some earlier change. I don't see a reason to
expose it either.

> 
> >  	.name = "cxl_memdev",
> >  	.release = cxl_memdev_release,
> >  	.devnode = cxl_memdev_devnode,
> >  	.groups = cxl_memdev_attribute_groups,
> >  };
> >  
> > +bool is_cxl_memdev(struct device *dev)
> > +{
> > +	return dev->type == &cxl_memdev_type;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(is_cxl_memdev, CXL);
> > +
> >  /**
> >   * set_exclusive_cxl_commands() - atomically disable user cxl commands
> >   * @cxlds: The device state to operate on
> > @@ -213,6 +219,15 @@ static void cxl_memdev_unregister(void *_cxlmd)
> >  	put_device(dev);
> >  }
> >  
> > +static void detach_memdev(struct work_struct *work)
> > +{
> > +	struct cxl_memdev *cxlmd;
> > +
> > +	cxlmd = container_of(work, typeof(*cxlmd), detach_work);
> > +	device_release_driver(&cxlmd->dev);
> > +	put_device(&cxlmd->dev);
> > +}
> > +
> >  static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
> >  					   const struct file_operations *fops)
> >  {
> > @@ -237,6 +252,7 @@ static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
> >  	dev->devt = MKDEV(cxl_mem_major, cxlmd->id);
> >  	dev->type = &cxl_memdev_type;
> >  	device_set_pm_not_required(dev);
> > +	INIT_WORK(&cxlmd->detach_work, detach_memdev);
> >  
> >  	cdev = &cxlmd->cdev;
> >  	cdev_init(cdev, fops);
> > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > index b61957636907..75a66540a795 100644
> > --- a/drivers/cxl/core/port.c
> > +++ b/drivers/cxl/core/port.c
> > @@ -1,6 +1,7 @@
> >  // SPDX-License-Identifier: GPL-2.0-only
> >  /* Copyright(c) 2020 Intel Corporation. All rights reserved. */
> >  #include <linux/io-64-nonatomic-lo-hi.h>
> > +#include <linux/workqueue.h>
> >  #include <linux/device.h>
> >  #include <linux/module.h>
> >  #include <linux/pci.h>
> > @@ -46,6 +47,8 @@ static int cxl_device_id(struct device *dev)
> >  			return CXL_DEVICE_ROOT;
> >  		return CXL_DEVICE_PORT;
> >  	}
> > +	if (dev->type == &cxl_memdev_type)
> 
> is_cxl_memdev() ?
> Having dong that, is there any need to expose cxl_memdev_type?
> 

Same as above.

> > +		return CXL_DEVICE_MEMORY_EXPANDER;
> >  	return 0;
> >  }
> >  
> > @@ -320,8 +323,10 @@ static void unregister_port(void *_port)
> >  {
> >  	struct cxl_port *port = _port;
> >  
> > -	if (!is_cxl_root(port))
> > +	if (!is_cxl_root(port)) {
> >  		device_lock_assert(port->dev.parent);
> > +		port->uport = NULL;
> > +	}
> >  
> >  	device_unregister(&port->dev);
> >  }
> 
> ...
> 
> > +static void delete_endpoint(void *data)
> > +{
> > +	struct cxl_memdev *cxlmd = data;
> > +	struct cxl_port *endpoint = dev_get_drvdata(&cxlmd->dev);
> > +	struct cxl_port *parent_port;
> > +	struct device *parent;
> > +
> > +	parent_port = cxl_mem_find_port(cxlmd);
> > +	if (!parent_port)
> > +		return;
> > +	parent = &parent_port->dev;
> > +
> > +	cxl_device_lock(parent);
> > +	if (parent->driver && endpoint->uport) {
> > +		devm_release_action(parent, cxl_unlink_uport, endpoint);
> > +		devm_release_action(parent, unregister_port, endpoint);
> > +	}
> > +	cxl_device_unlock(parent);
> > +	put_device(parent);
> > +	put_device(&endpoint->dev);
> > +}
> > +
> > +int cxl_endpoint_autoremove(struct cxl_memdev *cxlmd, struct cxl_port *endpoint)
> > +{
> > +	struct device *dev = &cxlmd->dev;
> > +
> > +	get_device(&endpoint->dev);
> > +	dev_set_drvdata(dev, endpoint);
> 
> That's a little nasty if it's just to provide
> a second parameter to delete_endpoint, but I guess nothing else was using the drvdata..

I think Dan authored this bit. In my region series I cache the port that the
endpoint created and it could be used instead. I haven't actually looked at the
feedback from Dan on that patch yet.

All the comments below this seem reasonable to me.

> 
> > +	return devm_add_action_or_reset(dev, delete_endpoint, cxlmd);
> > +}
> > +EXPORT_SYMBOL_NS_GPL(cxl_endpoint_autoremove, CXL);
> 
> ...
> 
> 
> > +static BUS_ATTR_WO(flush);
> > +
> >  static __init int cxl_core_init(void)
> >  {
> >  	int rc;
> > @@ -1329,12 +1395,27 @@ static __init int cxl_core_init(void)
> >  	if (rc)
> >  		return rc;
> >  
> > +	cxl_bus_wq = alloc_ordered_workqueue("cxl_port", 0);
> > +	if (!cxl_bus_wq) {
> > +		rc = -ENOMEM;
> > +		goto err_wq;
> > +	}
> > +
> >  	rc = bus_register(&cxl_bus_type);
> >  	if (rc)
> > -		goto err;
> > +		goto err_bus;
> > +
> > +	rc = bus_create_file(&cxl_bus_type, &bus_attr_flush);
> 
> Can't we add this as part of the bus_type?  Always good to avoid
> dynamic sysfs file creation if we possibly can.
> 
> > +	if (rc)
> > +		goto err_flush;
> > +
> >  	return 0;
> >  
> > -err:
> > +err_flush:
> > +	bus_unregister(&cxl_bus_type);
> > +err_bus:
> > +	destroy_workqueue(cxl_bus_wq);
> > +err_wq:
> >  	cxl_memdev_exit();
> >  	cxl_mbox_exit();
> >  	return rc;
> > @@ -1342,7 +1423,9 @@ static __init int cxl_core_init(void)
> >  
> >  static void cxl_core_exit(void)
> >  {
> > +	bus_remove_file(&cxl_bus_type, &bus_attr_flush);
> >  	bus_unregister(&cxl_bus_type);
> > +	destroy_workqueue(cxl_bus_wq);
> >  	cxl_memdev_exit();
> >  	cxl_mbox_exit();
> >  }
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index b71d40b68ccd..0bbe394f2f26 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -323,6 +323,8 @@ struct cxl_port *devm_cxl_add_port(struct device *host, struct device *uport,
> >  struct cxl_port *find_cxl_root(struct device *dev);
> >  int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd);
> >  int cxl_bus_rescan(void);
> > +struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd);
> 
> Should be in previous patch where the function is defined.
> 
> > +bool schedule_cxl_memdev_detach(struct cxl_memdev *cxlmd);
> >  
> 
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index 0ba0cf8dcdbc..7ba0edb4a1ab 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -35,11 +35,14 @@
> >   * @cdev: char dev core object for ioctl operations
> >   * @cxlds: The device state backing this device
> >   * @id: id number of this memdev instance.
> > + * @detach_work: active memdev lost a port in its ancestry
> > + * @component_reg_phys: register base of component registers
> 
> ?
> 
> >   */
> >  struct cxl_memdev {
> >  	struct device dev;
> >  	struct cdev cdev;
> >  	struct cxl_dev_state *cxlds;
> > +	struct work_struct detach_work;
> >  	int id;
> >  };
> >  
> > @@ -48,6 +51,12 @@ static inline struct cxl_memdev *to_cxl_memdev(struct device *dev)
> >  	return container_of(dev, struct cxl_memdev, dev);
> >  }
> >  
> > +bool is_cxl_memdev(struct device *dev);
> > +static inline bool is_cxl_endpoint(struct cxl_port *port)
> > +{
> > +	return is_cxl_memdev(port->uport);
> > +}
> > +
> >  struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds);
> >  
> >  /**
> > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> > new file mode 100644
> > index 000000000000..27f9dd0d55b6
> > --- /dev/null
> > +++ b/drivers/cxl/mem.c
> > @@ -0,0 +1,222 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
> > +#include <linux/device.h>
> > +#include <linux/module.h>
> > +#include <linux/pci.h>
> > +
> > +#include "cxlmem.h"
> > +#include "cxlpci.h"
> > +
> > +/**
> > + * DOC: cxl mem
> > + *
> > + * CXL memory endpoint devices and switches are CXL capable devices that are
> > + * participating in CXL.mem protocol. Their functionality builds on top of the
> > + * CXL.io protocol that allows enumerating and configuring components via
> > + * standard PCI mechanisms.
> > + *
> > + * The cxl_mem driver owns kicking off the enumeration of this CXL.mem
> > + * capability. With the detection of a CXL capable endpoint, the driver will
> > + * walk up to find the platform specific port it is connected to, and determine
> > + * if there are intervening switches in the path. If there are switches, a
> > + * secondary action to enumerate those (implemented in cxl_core).
> 
> action is to
> 
> > Finally the
> > + * cxl_mem driver will add the device it is bound to as a CXL port for use in
> > + * higher level operations.
> > + */
> 
> ...
> 
> 
> > +
> > +/**
> > + * cxl_dvsec_decode_init() - Setup HDM decoding for the endpoint
> > + * @cxlds: Device state
> > + *
> > + * Additionally, enables global HDM decoding. Warning: don't call this outside
> > + * of probe. Once probe is complete, the port driver owns all access to the HDM
> > + * decoder registers.
> > + *
> > + * Returns: false if DVSEC Ranges are being used instead of HDM decoders;
> > + *	    otherwise returns true.
> 
> It feels like some of the paths below could reflect other problems
> rather than an intention to use ranges.  Maybe this nees to have separate
> error handling from detection of range register usage?
> 
> > + */
> > +__mock bool cxl_dvsec_decode_init(struct cxl_dev_state *cxlds)
> > +{
> > +	struct cxl_endpoint_dvsec_info *info = &cxlds->info;
> > +	struct cxl_register_map map;
> > +	struct cxl_component_reg_map *cmap = &map.component_map;
> > +	bool global_enable, do_hdm_init = false;
> > +	void __iomem *crb;
> > +	u32 global_ctrl;
> > +
> > +	/* map hdm decoder */
> > +	crb = ioremap(cxlds->component_reg_phys, CXL_COMPONENT_REG_BLOCK_SIZE);
> > +	if (!crb) {
> > +		dev_dbg(cxlds->dev, "Failed to map component registers\n");
> > +		return false;
> > +	}
> > +
> > +	cxl_probe_component_regs(cxlds->dev, crb, cmap);
> > +	if (!cmap->hdm_decoder.valid) {
> > +		dev_dbg(cxlds->dev, "Invalid HDM decoder registers\n");
> > +		goto out;
> > +	}
> > +
> > +	global_ctrl = readl(crb + cmap->hdm_decoder.offset +
> > +			    CXL_HDM_DECODER_CTRL_OFFSET);
> > +	global_enable = global_ctrl & CXL_HDM_DECODER_ENABLE;
> > +	if (!global_enable && info->ranges) {
> > +		dev_dbg(cxlds->dev, "DVSEC regions\n");
> 
> Perhaps worth adding a little more description to that.
> Perhaps
> 
> "DVSEC ranges already programmed and HDM decoders not enabled."
> 
> > +		goto out;
> > +	}
> > +
> > +	do_hdm_init = true;
> > +
> > +	/*
> > +	 * Turn on global enable now since DVSEC ranges aren't being used and
> > +	 * we'll eventually want the decoder enabled. This also prevents special
> > +	 * casing in the port driver since this only applies to endpoints.
> 
> Possibly worth saying why it isn't worth turning this off again in the remove
> path...
> 
> > +	 */
> > +	if (!global_enable) {
> > +		dev_dbg(cxlds->dev, "Enabling HDM decode\n");
> > +		writel(global_ctrl | CXL_HDM_DECODER_ENABLE,
> > +		       crb + cmap->hdm_decoder.offset +
> > +			       CXL_HDM_DECODER_CTRL_OFFSET);
> > +	}
> > +
> > +out:
> > +	iounmap(crb);
> > +	return do_hdm_init;
> > +}
> > +
> 
> > +MODULE_LICENSE("GPL v2");
> > +MODULE_IMPORT_NS(CXL);
> > +MODULE_ALIAS_CXL(CXL_DEVICE_MEMORY_EXPANDER);
> > +MODULE_SOFTDEP("pre: cxl_port");
> 
> As above, a 'why' comment for the softdep would be a good to have.
> 
> > diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> > index ae94a537eccc..27ab7f8d122e 100644
> > --- a/drivers/cxl/port.c
> > +++ b/drivers/cxl/port.c
> > @@ -25,16 +25,27 @@
> >   * PCIe topology.
> >   */
> >  
> > +static void schedule_detach(void *cxlmd)
> > +{
> > +	schedule_cxl_memdev_detach(cxlmd);
> > +}
> > +
> >  static int cxl_port_probe(struct device *dev)
> >  {
> >  	struct cxl_port *port = to_cxl_port(dev);
> >  	struct cxl_hdm *cxlhdm;
> >  	int rc;
> >  
> > +	if (is_cxl_endpoint(port)) {
> > +		struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport);
> > +
> > +		get_device(&cxlmd->dev);
> > +		return devm_add_action_or_reset(dev, schedule_detach, cxlmd);
> > +	}
> > +
> >  	rc = devm_cxl_port_enumerate_dports(port);
> >  	if (rc < 0)
> >  		return rc;
> > -
> 
> Reasonable to drop this, but not in this patch.
> 
> 
> >  	if (rc == 1)
> >  		return devm_cxl_add_passthrough_decoder(port);
> >  
>
Dan Williams Feb. 3, 2022, 2:49 a.m. UTC | #3
On Tue, Feb 1, 2022 at 4:45 AM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Tue, 25 Jan 2022 19:16:05 -0800
> Dan Williams <dan.j.williams@intel.com> wrote:
>
> > From: Ben Widawsky <ben.widawsky@intel.com>
> >
> > At this point the subsystem can enumerate all CXL ports (CXL.mem decode
> > resources in upstream switch ports and host bridges) in a system. The
> > last mile is connecting those ports to endpoints.
> >
> > The cxl_mem driver connects an endpoint device to the platform CXL.mem
> > protoctol decode-topology. At ->probe() time it walks its
> > device-topology-ancestry and adds a CXL Port object at every Upstream
> > Port hop until it gets to CXL root. The CXL root object is only present
> > after a platform firmware driver registers platform CXL resources. For
> > ACPI based platform this is managed by the ACPI0017 device and the
> > cxl_acpi driver.
> >
> > The ports are registered such that disabling a given port automatically
> > unregisters all descendant ports, and the chain can only be registered
> > after the root is established.
> >
> > Given ACPI device scanning may run asynchronously compared to PCI device
> > scanning the root driver is tasked with rescanning the bus after the
> > root successfully probes.
> >
> > Conversely if any ports in a chain between the root and an endpoint
> > becomes disconnected it subsequently triggers the endpoint to
> > unregister. Given lock depenedencies the endpoint unregistration happens
> > in a workqueue asynchronously. If userspace cares about synchronizing
> > delayed work after port events the /sys/bus/cxl/flush attribute is
> > available for that purpose.
> >
> > Reported-by: Randy Dunlap <rdunlap@infradead.org>
> > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > [djbw: clarify changelog, rework hotplug support]
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
>
> A few comments inline.
>
> Jonathan
>
> > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> > index 7bd53dc691ec..df6691d0a6d0 100644
> > --- a/drivers/cxl/acpi.c
> > +++ b/drivers/cxl/acpi.c
> > @@ -314,7 +314,8 @@ static int cxl_acpi_probe(struct platform_device *pdev)
> >       if (rc < 0)
> >               return rc;
> >
> > -     return 0;
> > +     /* In case PCI is scanned before ACPI re-trigger memdev attach */
> > +     return cxl_bus_rescan();
> >  }
> >
> >  static const struct acpi_device_id cxl_acpi_ids[] = {
> > @@ -335,3 +336,4 @@ module_platform_driver(cxl_acpi_driver);
> >  MODULE_LICENSE("GPL v2");
> >  MODULE_IMPORT_NS(CXL);
> >  MODULE_IMPORT_NS(ACPI);
> > +MODULE_SOFTDEP("pre: cxl_port");
> I think a comment on 'why' would be useful for the SOFTDEP.
>
> They are rare enough that it might surprise people.

Sure.

>
> > diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> > index 1a50c0fc399c..efbaa851929d 100644
> > --- a/drivers/cxl/core/core.h
> > +++ b/drivers/cxl/core/core.h
> > @@ -6,6 +6,7 @@
> >
> >  extern const struct device_type cxl_nvdimm_bridge_type;
> >  extern const struct device_type cxl_nvdimm_type;
> > +extern const struct device_type cxl_memdev_type;
> >
> >  extern struct attribute_group cxl_base_attribute_group;
> >
> > diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> > index b2773664e407..ee0156419d06 100644
> > --- a/drivers/cxl/core/memdev.c
> > +++ b/drivers/cxl/core/memdev.c
> > @@ -155,13 +155,19 @@ static const struct attribute_group *cxl_memdev_attribute_groups[] = {
> >       NULL,
> >  };
> >
> > -static const struct device_type cxl_memdev_type = {
> > +const struct device_type cxl_memdev_type = {
>
> Currently this is only exposed for type checking and you also have is_cxl_memdev for
> that so seems a bit unnecessary.

Yes, might have been a rebase error moving is_cxl_memdev() from a
'static inline' to an out-of-line helper in the core.

>
> >       .name = "cxl_memdev",
> >       .release = cxl_memdev_release,
> >       .devnode = cxl_memdev_devnode,
> >       .groups = cxl_memdev_attribute_groups,
> >  };
> >
> > +bool is_cxl_memdev(struct device *dev)
> > +{
> > +     return dev->type == &cxl_memdev_type;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(is_cxl_memdev, CXL);
> > +
> >  /**
> >   * set_exclusive_cxl_commands() - atomically disable user cxl commands
> >   * @cxlds: The device state to operate on
> > @@ -213,6 +219,15 @@ static void cxl_memdev_unregister(void *_cxlmd)
> >       put_device(dev);
> >  }
> >
> > +static void detach_memdev(struct work_struct *work)
> > +{
> > +     struct cxl_memdev *cxlmd;
> > +
> > +     cxlmd = container_of(work, typeof(*cxlmd), detach_work);
> > +     device_release_driver(&cxlmd->dev);
> > +     put_device(&cxlmd->dev);
> > +}
> > +
> >  static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
> >                                          const struct file_operations *fops)
> >  {
> > @@ -237,6 +252,7 @@ static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
> >       dev->devt = MKDEV(cxl_mem_major, cxlmd->id);
> >       dev->type = &cxl_memdev_type;
> >       device_set_pm_not_required(dev);
> > +     INIT_WORK(&cxlmd->detach_work, detach_memdev);
> >
> >       cdev = &cxlmd->cdev;
> >       cdev_init(cdev, fops);
> > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > index b61957636907..75a66540a795 100644
> > --- a/drivers/cxl/core/port.c
> > +++ b/drivers/cxl/core/port.c
> > @@ -1,6 +1,7 @@
> >  // SPDX-License-Identifier: GPL-2.0-only
> >  /* Copyright(c) 2020 Intel Corporation. All rights reserved. */
> >  #include <linux/io-64-nonatomic-lo-hi.h>
> > +#include <linux/workqueue.h>
> >  #include <linux/device.h>
> >  #include <linux/module.h>
> >  #include <linux/pci.h>
> > @@ -46,6 +47,8 @@ static int cxl_device_id(struct device *dev)
> >                       return CXL_DEVICE_ROOT;
> >               return CXL_DEVICE_PORT;
> >       }
> > +     if (dev->type == &cxl_memdev_type)
>
> is_cxl_memdev() ?
> Having dong that, is there any need to expose cxl_memdev_type?

Yup, good catch.

>
> > +             return CXL_DEVICE_MEMORY_EXPANDER;
> >       return 0;
> >  }
> >
> > @@ -320,8 +323,10 @@ static void unregister_port(void *_port)
> >  {
> >       struct cxl_port *port = _port;
> >
> > -     if (!is_cxl_root(port))
> > +     if (!is_cxl_root(port)) {
> >               device_lock_assert(port->dev.parent);
> > +             port->uport = NULL;
> > +     }
> >
> >       device_unregister(&port->dev);
> >  }
>
> ...
>
> > +static void delete_endpoint(void *data)
> > +{
> > +     struct cxl_memdev *cxlmd = data;
> > +     struct cxl_port *endpoint = dev_get_drvdata(&cxlmd->dev);
> > +     struct cxl_port *parent_port;
> > +     struct device *parent;
> > +
> > +     parent_port = cxl_mem_find_port(cxlmd);
> > +     if (!parent_port)
> > +             return;
> > +     parent = &parent_port->dev;
> > +
> > +     cxl_device_lock(parent);
> > +     if (parent->driver && endpoint->uport) {
> > +             devm_release_action(parent, cxl_unlink_uport, endpoint);
> > +             devm_release_action(parent, unregister_port, endpoint);
> > +     }
> > +     cxl_device_unlock(parent);
> > +     put_device(parent);
> > +     put_device(&endpoint->dev);
> > +}
> > +
> > +int cxl_endpoint_autoremove(struct cxl_memdev *cxlmd, struct cxl_port *endpoint)
> > +{
> > +     struct device *dev = &cxlmd->dev;
> > +
> > +     get_device(&endpoint->dev);
> > +     dev_set_drvdata(dev, endpoint);
>
> That's a little nasty if it's just to provide
> a second parameter to delete_endpoint, but I guess nothing else was using the drvdata..

Nothing else is using drvdata it and this endpoint port object really
is the primary driver state that matters for cxl_mem, so it would be
odd for it *not* to be available via drvdata.

>
> > +     return devm_add_action_or_reset(dev, delete_endpoint, cxlmd);
> > +}
> > +EXPORT_SYMBOL_NS_GPL(cxl_endpoint_autoremove, CXL);
>
> ...
>
>
> > +static BUS_ATTR_WO(flush);
> > +
> >  static __init int cxl_core_init(void)
> >  {
> >       int rc;
> > @@ -1329,12 +1395,27 @@ static __init int cxl_core_init(void)
> >       if (rc)
> >               return rc;
> >
> > +     cxl_bus_wq = alloc_ordered_workqueue("cxl_port", 0);
> > +     if (!cxl_bus_wq) {
> > +             rc = -ENOMEM;
> > +             goto err_wq;
> > +     }
> > +
> >       rc = bus_register(&cxl_bus_type);
> >       if (rc)
> > -             goto err;
> > +             goto err_bus;
> > +
> > +     rc = bus_create_file(&cxl_bus_type, &bus_attr_flush);
>
> Can't we add this as part of the bus_type?  Always good to avoid
> dynamic sysfs file creation if we possibly can.

I agree, and apparently I did sped through this because setting
bus_groups before registration is certainly an option, will fix.

>
> > +     if (rc)
> > +             goto err_flush;
> > +
> >       return 0;
> >
> > -err:
> > +err_flush:
> > +     bus_unregister(&cxl_bus_type);
> > +err_bus:
> > +     destroy_workqueue(cxl_bus_wq);
> > +err_wq:
> >       cxl_memdev_exit();
> >       cxl_mbox_exit();
> >       return rc;
> > @@ -1342,7 +1423,9 @@ static __init int cxl_core_init(void)
> >
> >  static void cxl_core_exit(void)
> >  {
> > +     bus_remove_file(&cxl_bus_type, &bus_attr_flush);
> >       bus_unregister(&cxl_bus_type);
> > +     destroy_workqueue(cxl_bus_wq);
> >       cxl_memdev_exit();
> >       cxl_mbox_exit();
> >  }
> > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > index b71d40b68ccd..0bbe394f2f26 100644
> > --- a/drivers/cxl/cxl.h
> > +++ b/drivers/cxl/cxl.h
> > @@ -323,6 +323,8 @@ struct cxl_port *devm_cxl_add_port(struct device *host, struct device *uport,
> >  struct cxl_port *find_cxl_root(struct device *dev);
> >  int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd);
> >  int cxl_bus_rescan(void);
> > +struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd);
>
> Should be in previous patch where the function is defined.

Not really, because this patch is the first time it is used outside of
core/port.c. I would say convert the previous patch to make it static,
and move the export into this patch, but I'm also tempted to leave
well enough alone here unless there some additional reason to respin
patch 32.

>
> > +bool schedule_cxl_memdev_detach(struct cxl_memdev *cxlmd);
> >
>
> > diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> > index 0ba0cf8dcdbc..7ba0edb4a1ab 100644
> > --- a/drivers/cxl/cxlmem.h
> > +++ b/drivers/cxl/cxlmem.h
> > @@ -35,11 +35,14 @@
> >   * @cdev: char dev core object for ioctl operations
> >   * @cxlds: The device state backing this device
> >   * @id: id number of this memdev instance.
> > + * @detach_work: active memdev lost a port in its ancestry
> > + * @component_reg_phys: register base of component registers
>
> ?

Rebase error.

>
> >   */
> >  struct cxl_memdev {
> >       struct device dev;
> >       struct cdev cdev;
> >       struct cxl_dev_state *cxlds;
> > +     struct work_struct detach_work;
> >       int id;
> >  };
> >
> > @@ -48,6 +51,12 @@ static inline struct cxl_memdev *to_cxl_memdev(struct device *dev)
> >       return container_of(dev, struct cxl_memdev, dev);
> >  }
> >
> > +bool is_cxl_memdev(struct device *dev);
> > +static inline bool is_cxl_endpoint(struct cxl_port *port)
> > +{
> > +     return is_cxl_memdev(port->uport);
> > +}
> > +
> >  struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds);
> >
> >  /**
> > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> > new file mode 100644
> > index 000000000000..27f9dd0d55b6
> > --- /dev/null
> > +++ b/drivers/cxl/mem.c
> > @@ -0,0 +1,222 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
> > +#include <linux/device.h>
> > +#include <linux/module.h>
> > +#include <linux/pci.h>
> > +
> > +#include "cxlmem.h"
> > +#include "cxlpci.h"
> > +
> > +/**
> > + * DOC: cxl mem
> > + *
> > + * CXL memory endpoint devices and switches are CXL capable devices that are
> > + * participating in CXL.mem protocol. Their functionality builds on top of the
> > + * CXL.io protocol that allows enumerating and configuring components via
> > + * standard PCI mechanisms.
> > + *
> > + * The cxl_mem driver owns kicking off the enumeration of this CXL.mem
> > + * capability. With the detection of a CXL capable endpoint, the driver will
> > + * walk up to find the platform specific port it is connected to, and determine
> > + * if there are intervening switches in the path. If there are switches, a
> > + * secondary action to enumerate those (implemented in cxl_core).
>
> action is to

Yup.

>
> > Finally the
> > + * cxl_mem driver will add the device it is bound to as a CXL port for use in
> > + * higher level operations.
> > + */
>
> ...
>
>
> > +
> > +/**
> > + * cxl_dvsec_decode_init() - Setup HDM decoding for the endpoint
> > + * @cxlds: Device state
> > + *
> > + * Additionally, enables global HDM decoding. Warning: don't call this outside
> > + * of probe. Once probe is complete, the port driver owns all access to the HDM
> > + * decoder registers.
> > + *
> > + * Returns: false if DVSEC Ranges are being used instead of HDM decoders;
> > + *       otherwise returns true.
>
> It feels like some of the paths below could reflect other problems
> rather than an intention to use ranges.  Maybe this nees to have separate
> error handling from detection of range register usage?

I don't think they need to be broken out, but I do think the kdoc is a
bit misleading, will clarify.

>
> > + */
> > +__mock bool cxl_dvsec_decode_init(struct cxl_dev_state *cxlds)
> > +{
> > +     struct cxl_endpoint_dvsec_info *info = &cxlds->info;
> > +     struct cxl_register_map map;
> > +     struct cxl_component_reg_map *cmap = &map.component_map;
> > +     bool global_enable, do_hdm_init = false;
> > +     void __iomem *crb;
> > +     u32 global_ctrl;
> > +
> > +     /* map hdm decoder */
> > +     crb = ioremap(cxlds->component_reg_phys, CXL_COMPONENT_REG_BLOCK_SIZE);
> > +     if (!crb) {
> > +             dev_dbg(cxlds->dev, "Failed to map component registers\n");
> > +             return false;
> > +     }
> > +
> > +     cxl_probe_component_regs(cxlds->dev, crb, cmap);
> > +     if (!cmap->hdm_decoder.valid) {
> > +             dev_dbg(cxlds->dev, "Invalid HDM decoder registers\n");
> > +             goto out;
> > +     }
> > +
> > +     global_ctrl = readl(crb + cmap->hdm_decoder.offset +
> > +                         CXL_HDM_DECODER_CTRL_OFFSET);
> > +     global_enable = global_ctrl & CXL_HDM_DECODER_ENABLE;
> > +     if (!global_enable && info->ranges) {
> > +             dev_dbg(cxlds->dev, "DVSEC regions\n");
>
> Perhaps worth adding a little more description to that.
> Perhaps
>
> "DVSEC ranges already programmed and HDM decoders not enabled."

Looks good.

>
> > +             goto out;
> > +     }
> > +
> > +     do_hdm_init = true;
> > +
> > +     /*
> > +      * Turn on global enable now since DVSEC ranges aren't being used and
> > +      * we'll eventually want the decoder enabled. This also prevents special
> > +      * casing in the port driver since this only applies to endpoints.
>
> Possibly worth saying why it isn't worth turning this off again in the remove
> path...

Yeah, unlike PCI where the BARs are programmed at bus scan time and
the driver does the enable, CXL decoders are programmed synchronously
with enabling new interleaves. Also, for devices that have some
capacity already mapped by the BIOS into "Fixed" regions in the CFMWS,
global disable would potentially crash / hang the kernel.

Will add a comment for that and delete the comment about "preventing
special casing" since that's more changelog fodder than helpful inline
code comment.

>
> > +      */
> > +     if (!global_enable) {
> > +             dev_dbg(cxlds->dev, "Enabling HDM decode\n");
> > +             writel(global_ctrl | CXL_HDM_DECODER_ENABLE,
> > +                    crb + cmap->hdm_decoder.offset +
> > +                            CXL_HDM_DECODER_CTRL_OFFSET);
> > +     }
> > +
> > +out:
> > +     iounmap(crb);
> > +     return do_hdm_init;
> > +}
> > +
>
> > +MODULE_LICENSE("GPL v2");
> > +MODULE_IMPORT_NS(CXL);
> > +MODULE_ALIAS_CXL(CXL_DEVICE_MEMORY_EXPANDER);
> > +MODULE_SOFTDEP("pre: cxl_port");
>
> As above, a 'why' comment for the softdep would be a good to have.

Sure.

>
> > diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> > index ae94a537eccc..27ab7f8d122e 100644
> > --- a/drivers/cxl/port.c
> > +++ b/drivers/cxl/port.c
> > @@ -25,16 +25,27 @@
> >   * PCIe topology.
> >   */
> >
> > +static void schedule_detach(void *cxlmd)
> > +{
> > +     schedule_cxl_memdev_detach(cxlmd);
> > +}
> > +
> >  static int cxl_port_probe(struct device *dev)
> >  {
> >       struct cxl_port *port = to_cxl_port(dev);
> >       struct cxl_hdm *cxlhdm;
> >       int rc;
> >
> > +     if (is_cxl_endpoint(port)) {
> > +             struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport);
> > +
> > +             get_device(&cxlmd->dev);
> > +             return devm_add_action_or_reset(dev, schedule_detach, cxlmd);
> > +     }
> > +
> >       rc = devm_cxl_port_enumerate_dports(port);
> >       if (rc < 0)
> >               return rc;
> > -
>
> Reasonable to drop this, but not in this patch.

Ok.


>
>
> >       if (rc == 1)
> >               return devm_cxl_add_passthrough_decoder(port);
> >
>
Jonathan Cameron Feb. 3, 2022, 9:59 a.m. UTC | #4
Hi Dan,

> > > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > > index b71d40b68ccd..0bbe394f2f26 100644
> > > --- a/drivers/cxl/cxl.h
> > > +++ b/drivers/cxl/cxl.h
> > > @@ -323,6 +323,8 @@ struct cxl_port *devm_cxl_add_port(struct device *host, struct device *uport,
> > >  struct cxl_port *find_cxl_root(struct device *dev);
> > >  int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd);
> > >  int cxl_bus_rescan(void);
> > > +struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd);  
> >
> > Should be in previous patch where the function is defined.  
> 
> Not really, because this patch is the first time it is used outside of
> core/port.c. I would say convert the previous patch to make it static,
> and move the export into this patch, but I'm also tempted to leave
> well enough alone here unless there some additional reason to respin
> patch 32.

I hadn't read this when I sent reply to previous patch v4.  Up to you on
whether you tidy up or not.  Though I'm fairly sure you'll get
a missing static warning if you build previous patch without a header definition.
Agreed adding static then removing it again would be an option, but
meh, too much noise...  The one going the other way (defining a function
before it exists) is probably more important to fix.
Dan Williams Feb. 4, 2022, 2:54 p.m. UTC | #5
On Thu, Feb 3, 2022 at 2:00 AM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> Hi Dan,
>
> > > > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> > > > index b71d40b68ccd..0bbe394f2f26 100644
> > > > --- a/drivers/cxl/cxl.h
> > > > +++ b/drivers/cxl/cxl.h
> > > > @@ -323,6 +323,8 @@ struct cxl_port *devm_cxl_add_port(struct device *host, struct device *uport,
> > > >  struct cxl_port *find_cxl_root(struct device *dev);
> > > >  int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd);
> > > >  int cxl_bus_rescan(void);
> > > > +struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd);
> > >
> > > Should be in previous patch where the function is defined.
> >
> > Not really, because this patch is the first time it is used outside of
> > core/port.c. I would say convert the previous patch to make it static,
> > and move the export into this patch, but I'm also tempted to leave
> > well enough alone here unless there some additional reason to respin
> > patch 32.
>
> I hadn't read this when I sent reply to previous patch v4.  Up to you on
> whether you tidy up or not.  Though I'm fairly sure you'll get
> a missing static warning if you build previous patch without a header definition.
> Agreed adding static then removing it again would be an option, but
> meh, too much noise...  The one going the other way (defining a function
> before it exists) is probably more important to fix.

There's no warning about declaring a function that is never defined,
but that's egregious enough to go fixup.
diff mbox series

Patch

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 0b51cfec0c66..7c2b846521f3 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -1,3 +1,12 @@ 
+What:		/sys/bus/cxl/flush
+Date:		Januarry, 2022
+KernelVersion:	v5.18
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(WO) If userspace manually unbinds a port the kernel schedules
+		all descendant memdevs for unbind. Writing '1' to this attribute
+		flushes that work.
+
 What:		/sys/bus/cxl/devices/memX/firmware_version
 Date:		December, 2020
 KernelVersion:	v5.12
diff --git a/Documentation/driver-api/cxl/memory-devices.rst b/Documentation/driver-api/cxl/memory-devices.rst
index 3498d38d7cbd..db476bb170b6 100644
--- a/Documentation/driver-api/cxl/memory-devices.rst
+++ b/Documentation/driver-api/cxl/memory-devices.rst
@@ -325,6 +325,9 @@  CXL Memory Device
 .. kernel-doc:: drivers/cxl/pci.c
    :internal:
 
+.. kernel-doc:: drivers/cxl/mem.c
+   :doc: cxl mem
+
 CXL Port
 --------
 .. kernel-doc:: drivers/cxl/port.c
@@ -344,6 +347,12 @@  CXL Core
 .. kernel-doc:: drivers/cxl/core/port.c
    :identifiers:
 
+.. kernel-doc:: drivers/cxl/core/pci.c
+   :doc: cxl core pci
+
+.. kernel-doc:: drivers/cxl/core/pci.c
+   :identifiers:
+
 .. kernel-doc:: drivers/cxl/core/pmem.c
    :doc: cxl pmem
 
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 4f4f7587f6ca..b88ab956bb7c 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -78,6 +78,22 @@  config CXL_PMEM
 
 	  If unsure say 'm'.
 
+config CXL_MEM
+	tristate "CXL: Memory Expansion"
+	depends on CXL_PCI
+	default CXL_BUS
+	help
+	  The CXL.mem protocol allows a device to act as a provider of "System
+	  RAM" and/or "Persistent Memory" that is fully coherent as if the
+	  memory were attached to the typical CPU memory controller. This is
+	  known as HDM "Host-managed Device Memory".
+
+	  Say 'y/m' to enable a driver that will attach to CXL.mem devices for
+	  memory expansion and control of HDM. See Chapter 9.13 in the CXL 2.0
+	  specification for a detailed description of HDM.
+
+	  If unsure say 'm'.
+
 config CXL_PORT
 	default CXL_BUS
 	tristate
diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile
index 56fcac2323cb..ce267ef11d93 100644
--- a/drivers/cxl/Makefile
+++ b/drivers/cxl/Makefile
@@ -1,10 +1,12 @@ 
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_CXL_BUS) += core/
 obj-$(CONFIG_CXL_PCI) += cxl_pci.o
+obj-$(CONFIG_CXL_MEM) += cxl_mem.o
 obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o
 obj-$(CONFIG_CXL_PMEM) += cxl_pmem.o
 obj-$(CONFIG_CXL_PORT) += cxl_port.o
 
+cxl_mem-y := mem.o
 cxl_pci-y := pci.o
 cxl_acpi-y := acpi.o
 cxl_pmem-y := pmem.o
diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 7bd53dc691ec..df6691d0a6d0 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -314,7 +314,8 @@  static int cxl_acpi_probe(struct platform_device *pdev)
 	if (rc < 0)
 		return rc;
 
-	return 0;
+	/* In case PCI is scanned before ACPI re-trigger memdev attach */
+	return cxl_bus_rescan();
 }
 
 static const struct acpi_device_id cxl_acpi_ids[] = {
@@ -335,3 +336,4 @@  module_platform_driver(cxl_acpi_driver);
 MODULE_LICENSE("GPL v2");
 MODULE_IMPORT_NS(CXL);
 MODULE_IMPORT_NS(ACPI);
+MODULE_SOFTDEP("pre: cxl_port");
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 1a50c0fc399c..efbaa851929d 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -6,6 +6,7 @@ 
 
 extern const struct device_type cxl_nvdimm_bridge_type;
 extern const struct device_type cxl_nvdimm_type;
+extern const struct device_type cxl_memdev_type;
 
 extern struct attribute_group cxl_base_attribute_group;
 
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index b2773664e407..ee0156419d06 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -155,13 +155,19 @@  static const struct attribute_group *cxl_memdev_attribute_groups[] = {
 	NULL,
 };
 
-static const struct device_type cxl_memdev_type = {
+const struct device_type cxl_memdev_type = {
 	.name = "cxl_memdev",
 	.release = cxl_memdev_release,
 	.devnode = cxl_memdev_devnode,
 	.groups = cxl_memdev_attribute_groups,
 };
 
+bool is_cxl_memdev(struct device *dev)
+{
+	return dev->type == &cxl_memdev_type;
+}
+EXPORT_SYMBOL_NS_GPL(is_cxl_memdev, CXL);
+
 /**
  * set_exclusive_cxl_commands() - atomically disable user cxl commands
  * @cxlds: The device state to operate on
@@ -213,6 +219,15 @@  static void cxl_memdev_unregister(void *_cxlmd)
 	put_device(dev);
 }
 
+static void detach_memdev(struct work_struct *work)
+{
+	struct cxl_memdev *cxlmd;
+
+	cxlmd = container_of(work, typeof(*cxlmd), detach_work);
+	device_release_driver(&cxlmd->dev);
+	put_device(&cxlmd->dev);
+}
+
 static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
 					   const struct file_operations *fops)
 {
@@ -237,6 +252,7 @@  static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
 	dev->devt = MKDEV(cxl_mem_major, cxlmd->id);
 	dev->type = &cxl_memdev_type;
 	device_set_pm_not_required(dev);
+	INIT_WORK(&cxlmd->detach_work, detach_memdev);
 
 	cdev = &cxlmd->cdev;
 	cdev_init(cdev, fops);
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index b61957636907..75a66540a795 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1,6 +1,7 @@ 
 // SPDX-License-Identifier: GPL-2.0-only
 /* Copyright(c) 2020 Intel Corporation. All rights reserved. */
 #include <linux/io-64-nonatomic-lo-hi.h>
+#include <linux/workqueue.h>
 #include <linux/device.h>
 #include <linux/module.h>
 #include <linux/pci.h>
@@ -46,6 +47,8 @@  static int cxl_device_id(struct device *dev)
 			return CXL_DEVICE_ROOT;
 		return CXL_DEVICE_PORT;
 	}
+	if (dev->type == &cxl_memdev_type)
+		return CXL_DEVICE_MEMORY_EXPANDER;
 	return 0;
 }
 
@@ -320,8 +323,10 @@  static void unregister_port(void *_port)
 {
 	struct cxl_port *port = _port;
 
-	if (!is_cxl_root(port))
+	if (!is_cxl_root(port)) {
 		device_lock_assert(port->dev.parent);
+		port->uport = NULL;
+	}
 
 	device_unregister(&port->dev);
 }
@@ -412,7 +417,9 @@  struct cxl_port *devm_cxl_add_port(struct device *host, struct device *uport,
 	if (parent_port)
 		port->depth = parent_port->depth + 1;
 	dev = &port->dev;
-	if (parent_port)
+	if (is_cxl_memdev(uport))
+		rc = dev_set_name(dev, "endpoint%d", port->id);
+	else if (parent_port)
 		rc = dev_set_name(dev, "port%d", port->id);
 	else
 		rc = dev_set_name(dev, "root%d", port->id);
@@ -769,6 +776,38 @@  static struct device *grandparent(struct device *dev)
 	return NULL;
 }
 
+static void delete_endpoint(void *data)
+{
+	struct cxl_memdev *cxlmd = data;
+	struct cxl_port *endpoint = dev_get_drvdata(&cxlmd->dev);
+	struct cxl_port *parent_port;
+	struct device *parent;
+
+	parent_port = cxl_mem_find_port(cxlmd);
+	if (!parent_port)
+		return;
+	parent = &parent_port->dev;
+
+	cxl_device_lock(parent);
+	if (parent->driver && endpoint->uport) {
+		devm_release_action(parent, cxl_unlink_uport, endpoint);
+		devm_release_action(parent, unregister_port, endpoint);
+	}
+	cxl_device_unlock(parent);
+	put_device(parent);
+	put_device(&endpoint->dev);
+}
+
+int cxl_endpoint_autoremove(struct cxl_memdev *cxlmd, struct cxl_port *endpoint)
+{
+	struct device *dev = &cxlmd->dev;
+
+	get_device(&endpoint->dev);
+	dev_set_drvdata(dev, endpoint);
+	return devm_add_action_or_reset(dev, delete_endpoint, cxlmd);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_endpoint_autoremove, CXL);
+
 static void delete_switch_port(struct cxl_memdev *cxlmd, struct cxl_port *port,
 			       struct list_head *dports)
 {
@@ -1319,6 +1358,33 @@  struct bus_type cxl_bus_type = {
 };
 EXPORT_SYMBOL_NS_GPL(cxl_bus_type, CXL);
 
+static struct workqueue_struct *cxl_bus_wq;
+
+int cxl_bus_rescan(void)
+{
+	return bus_rescan_devices(&cxl_bus_type);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_bus_rescan, CXL);
+
+bool schedule_cxl_memdev_detach(struct cxl_memdev *cxlmd)
+{
+	return queue_work(cxl_bus_wq, &cxlmd->detach_work);
+}
+EXPORT_SYMBOL_NS_GPL(schedule_cxl_memdev_detach, CXL);
+
+/* for user tooling to ensure port disable work has completed */
+static ssize_t flush_store(struct bus_type *bus, const char *buf, size_t count)
+{
+	if (sysfs_streq(buf, "1")) {
+		flush_workqueue(cxl_bus_wq);
+		return count;
+	}
+
+	return -EINVAL;
+}
+
+static BUS_ATTR_WO(flush);
+
 static __init int cxl_core_init(void)
 {
 	int rc;
@@ -1329,12 +1395,27 @@  static __init int cxl_core_init(void)
 	if (rc)
 		return rc;
 
+	cxl_bus_wq = alloc_ordered_workqueue("cxl_port", 0);
+	if (!cxl_bus_wq) {
+		rc = -ENOMEM;
+		goto err_wq;
+	}
+
 	rc = bus_register(&cxl_bus_type);
 	if (rc)
-		goto err;
+		goto err_bus;
+
+	rc = bus_create_file(&cxl_bus_type, &bus_attr_flush);
+	if (rc)
+		goto err_flush;
+
 	return 0;
 
-err:
+err_flush:
+	bus_unregister(&cxl_bus_type);
+err_bus:
+	destroy_workqueue(cxl_bus_wq);
+err_wq:
 	cxl_memdev_exit();
 	cxl_mbox_exit();
 	return rc;
@@ -1342,7 +1423,9 @@  static __init int cxl_core_init(void)
 
 static void cxl_core_exit(void)
 {
+	bus_remove_file(&cxl_bus_type, &bus_attr_flush);
 	bus_unregister(&cxl_bus_type);
+	destroy_workqueue(cxl_bus_wq);
 	cxl_memdev_exit();
 	cxl_mbox_exit();
 }
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index b71d40b68ccd..0bbe394f2f26 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -323,6 +323,8 @@  struct cxl_port *devm_cxl_add_port(struct device *host, struct device *uport,
 struct cxl_port *find_cxl_root(struct device *dev);
 int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd);
 int cxl_bus_rescan(void);
+struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd);
+bool schedule_cxl_memdev_detach(struct cxl_memdev *cxlmd);
 
 struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port,
 				     struct device *dport, int port_id,
@@ -340,6 +342,8 @@  struct cxl_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
 int cxl_decoder_add(struct cxl_decoder *cxld, int *target_map);
 int cxl_decoder_add_locked(struct cxl_decoder *cxld, int *target_map);
 int cxl_decoder_autoremove(struct device *host, struct cxl_decoder *cxld);
+int cxl_endpoint_autoremove(struct cxl_memdev *cxlmd, struct cxl_port *endpoint);
+
 struct cxl_hdm;
 struct cxl_hdm *devm_cxl_setup_hdm(struct cxl_port *port);
 int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm);
@@ -372,6 +376,7 @@  void cxl_driver_unregister(struct cxl_driver *cxl_drv);
 #define CXL_DEVICE_NVDIMM		2
 #define CXL_DEVICE_PORT			3
 #define CXL_DEVICE_ROOT			4
+#define CXL_DEVICE_MEMORY_EXPANDER	5
 
 #define MODULE_ALIAS_CXL(type) MODULE_ALIAS("cxl:t" __stringify(type) "*")
 #define CXL_MODALIAS_FMT "cxl:t%d"
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 0ba0cf8dcdbc..7ba0edb4a1ab 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -35,11 +35,14 @@ 
  * @cdev: char dev core object for ioctl operations
  * @cxlds: The device state backing this device
  * @id: id number of this memdev instance.
+ * @detach_work: active memdev lost a port in its ancestry
+ * @component_reg_phys: register base of component registers
  */
 struct cxl_memdev {
 	struct device dev;
 	struct cdev cdev;
 	struct cxl_dev_state *cxlds;
+	struct work_struct detach_work;
 	int id;
 };
 
@@ -48,6 +51,12 @@  static inline struct cxl_memdev *to_cxl_memdev(struct device *dev)
 	return container_of(dev, struct cxl_memdev, dev);
 }
 
+bool is_cxl_memdev(struct device *dev);
+static inline bool is_cxl_endpoint(struct cxl_port *port)
+{
+	return is_cxl_memdev(port->uport);
+}
+
 struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds);
 
 /**
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
new file mode 100644
index 000000000000..27f9dd0d55b6
--- /dev/null
+++ b/drivers/cxl/mem.c
@@ -0,0 +1,222 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
+#include <linux/device.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+
+#include "cxlmem.h"
+#include "cxlpci.h"
+
+/**
+ * DOC: cxl mem
+ *
+ * CXL memory endpoint devices and switches are CXL capable devices that are
+ * participating in CXL.mem protocol. Their functionality builds on top of the
+ * CXL.io protocol that allows enumerating and configuring components via
+ * standard PCI mechanisms.
+ *
+ * The cxl_mem driver owns kicking off the enumeration of this CXL.mem
+ * capability. With the detection of a CXL capable endpoint, the driver will
+ * walk up to find the platform specific port it is connected to, and determine
+ * if there are intervening switches in the path. If there are switches, a
+ * secondary action to enumerate those (implemented in cxl_core). Finally the
+ * cxl_mem driver will add the device it is bound to as a CXL port for use in
+ * higher level operations.
+ */
+
+static int wait_for_media(struct cxl_memdev *cxlmd)
+{
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+	struct cxl_endpoint_dvsec_info *info = &cxlds->info;
+	int rc;
+
+	if (!info->mem_enabled)
+		return -EBUSY;
+
+	rc = cxlds->wait_media_ready(cxlds);
+	if (rc)
+		return rc;
+
+	/*
+	 * We know the device is active, and enabled, if any ranges are non-zero
+	 * we'll need to check later before adding the port since that owns the
+	 * HDM decoder registers.
+	 */
+	return 0;
+}
+
+static int create_endpoint(struct cxl_memdev *cxlmd,
+			   struct cxl_port *parent_port)
+{
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+	struct cxl_port *endpoint;
+
+	endpoint = devm_cxl_add_port(&parent_port->dev, &cxlmd->dev,
+				     cxlds->component_reg_phys, parent_port);
+	if (IS_ERR(endpoint))
+		return PTR_ERR(endpoint);
+
+	dev_dbg(&cxlmd->dev, "add: %s\n", dev_name(&endpoint->dev));
+
+	if (!endpoint->dev.driver) {
+		dev_err(&cxlmd->dev, "%s failed probe\n",
+			dev_name(&endpoint->dev));
+		return -ENXIO;
+	}
+
+	return cxl_endpoint_autoremove(cxlmd, endpoint);
+}
+
+/**
+ * cxl_dvsec_decode_init() - Setup HDM decoding for the endpoint
+ * @cxlds: Device state
+ *
+ * Additionally, enables global HDM decoding. Warning: don't call this outside
+ * of probe. Once probe is complete, the port driver owns all access to the HDM
+ * decoder registers.
+ *
+ * Returns: false if DVSEC Ranges are being used instead of HDM decoders;
+ *	    otherwise returns true.
+ */
+__mock bool cxl_dvsec_decode_init(struct cxl_dev_state *cxlds)
+{
+	struct cxl_endpoint_dvsec_info *info = &cxlds->info;
+	struct cxl_register_map map;
+	struct cxl_component_reg_map *cmap = &map.component_map;
+	bool global_enable, do_hdm_init = false;
+	void __iomem *crb;
+	u32 global_ctrl;
+
+	/* map hdm decoder */
+	crb = ioremap(cxlds->component_reg_phys, CXL_COMPONENT_REG_BLOCK_SIZE);
+	if (!crb) {
+		dev_dbg(cxlds->dev, "Failed to map component registers\n");
+		return false;
+	}
+
+	cxl_probe_component_regs(cxlds->dev, crb, cmap);
+	if (!cmap->hdm_decoder.valid) {
+		dev_dbg(cxlds->dev, "Invalid HDM decoder registers\n");
+		goto out;
+	}
+
+	global_ctrl = readl(crb + cmap->hdm_decoder.offset +
+			    CXL_HDM_DECODER_CTRL_OFFSET);
+	global_enable = global_ctrl & CXL_HDM_DECODER_ENABLE;
+	if (!global_enable && info->ranges) {
+		dev_dbg(cxlds->dev, "DVSEC regions\n");
+		goto out;
+	}
+
+	do_hdm_init = true;
+
+	/*
+	 * Turn on global enable now since DVSEC ranges aren't being used and
+	 * we'll eventually want the decoder enabled. This also prevents special
+	 * casing in the port driver since this only applies to endpoints.
+	 */
+	if (!global_enable) {
+		dev_dbg(cxlds->dev, "Enabling HDM decode\n");
+		writel(global_ctrl | CXL_HDM_DECODER_ENABLE,
+		       crb + cmap->hdm_decoder.offset +
+			       CXL_HDM_DECODER_CTRL_OFFSET);
+	}
+
+out:
+	iounmap(crb);
+	return do_hdm_init;
+}
+
+static int cxl_mem_probe(struct device *dev)
+{
+	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+	struct cxl_port *parent_port;
+	int rc;
+
+	/*
+	 * Someone is trying to reattach this device after it lost its port
+	 * connection (an endpoint port previously registered by this memdev was
+	 * disabled). This racy check is ok because if the port is still gone,
+	 * no harm done, and if the port hierarchy comes back it will re-trigger
+	 * this probe. Port rescan and memdev detach work share the same
+	 * single-threaded workqueue.
+	 */
+	if (work_pending(&cxlmd->detach_work))
+		return -EBUSY;
+
+	rc = wait_for_media(cxlmd);
+	if (rc) {
+		dev_err(dev, "Media not active (%d)\n", rc);
+		return rc;
+	}
+
+	/*
+	 * If DVSEC ranges are being used instead of HDM decoder registers there
+	 * is no use in trying to manage those.
+	 */
+	if (!cxl_dvsec_decode_init(cxlds)) {
+		struct cxl_endpoint_dvsec_info *info = &cxlds->info;
+		int i;
+
+		/* */
+		for (i = 0; i < 2; i++) {
+			u64 base, size;
+
+			/*
+			 * Give a nice warning to the user that BIOS has really
+			 * botched things for them if it didn't place DVSEC
+			 * ranges in the memory map.
+			 */
+			base = info->dvsec_range[i].start;
+			size = range_len(&info->dvsec_range[i]);
+			if (size && !region_intersects(base, size,
+						       IORESOURCE_SYSTEM_RAM,
+						       IORES_DESC_NONE)) {
+				dev_err(dev,
+					"DVSEC range %#llx-%#llx must be reserved by BIOS, but isn't\n",
+					base, base + size - 1);
+			}
+		}
+		dev_err(dev,
+			"Active DVSEC range registers in use. Will not bind.\n");
+		return -EBUSY;
+	}
+
+	rc = devm_cxl_enumerate_ports(cxlmd);
+	if (rc)
+		return rc;
+
+	parent_port = cxl_mem_find_port(cxlmd);
+	if (!parent_port) {
+		dev_err(dev, "CXL port topology not found\n");
+		return -ENXIO;
+	}
+
+	cxl_device_lock(&parent_port->dev);
+	if (!parent_port->dev.driver) {
+		dev_err(dev, "CXL port topology %s not enabled\n",
+			dev_name(&parent_port->dev));
+		rc = -ENXIO;
+		goto out;
+	}
+
+	rc = create_endpoint(cxlmd, parent_port);
+out:
+	cxl_device_unlock(&parent_port->dev);
+	put_device(&parent_port->dev);
+	return rc;
+}
+
+static struct cxl_driver cxl_mem_driver = {
+	.name = "cxl_mem",
+	.probe = cxl_mem_probe,
+	.id = CXL_DEVICE_MEMORY_EXPANDER,
+};
+
+module_cxl_driver(cxl_mem_driver);
+
+MODULE_LICENSE("GPL v2");
+MODULE_IMPORT_NS(CXL);
+MODULE_ALIAS_CXL(CXL_DEVICE_MEMORY_EXPANDER);
+MODULE_SOFTDEP("pre: cxl_port");
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index ae94a537eccc..27ab7f8d122e 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -25,16 +25,27 @@ 
  * PCIe topology.
  */
 
+static void schedule_detach(void *cxlmd)
+{
+	schedule_cxl_memdev_detach(cxlmd);
+}
+
 static int cxl_port_probe(struct device *dev)
 {
 	struct cxl_port *port = to_cxl_port(dev);
 	struct cxl_hdm *cxlhdm;
 	int rc;
 
+	if (is_cxl_endpoint(port)) {
+		struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport);
+
+		get_device(&cxlmd->dev);
+		return devm_add_action_or_reset(dev, schedule_detach, cxlmd);
+	}
+
 	rc = devm_cxl_port_enumerate_dports(port);
 	if (rc < 0)
 		return rc;
-
 	if (rc == 1)
 		return devm_cxl_add_passthrough_decoder(port);
 
diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
index 3e2a529875ea..82e49ab0937d 100644
--- a/tools/testing/cxl/Kbuild
+++ b/tools/testing/cxl/Kbuild
@@ -31,6 +31,11 @@  obj-m += cxl_port.o
 cxl_port-y := $(CXL_SRC)/port.o
 cxl_port-y += config_check.o
 
+obj-m += cxl_mem.o
+
+cxl_mem-y := $(CXL_SRC)/mem.o
+cxl_mem-y += mock_mem.o
+cxl_mem-y += config_check.o
 
 obj-m += cxl_core.o
 
diff --git a/tools/testing/cxl/mock_mem.c b/tools/testing/cxl/mock_mem.c
new file mode 100644
index 000000000000..d1dec5845139
--- /dev/null
+++ b/tools/testing/cxl/mock_mem.c
@@ -0,0 +1,10 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2022 Intel Corporation. All rights reserved. */
+
+#include <linux/types.h>
+
+struct cxl_dev_state;
+bool cxl_dvsec_decode_init(struct cxl_dev_state *cxlds)
+{
+	return true;
+}