mbox series

[v2,0/6] cxl: Initialization and shutdown fixes

Message ID 172964779333.81806.8852577918216421011.stgit@dwillia2-xfh.jf.intel.com
Headers show
Series cxl: Initialization and shutdown fixes | expand

Message

Dan Williams Oct. 23, 2024, 1:43 a.m. UTC
Changes since v1 [1]:
- Fix some misspellings missed by checkpatch in changelogs (Jonathan)
- Add comments explaining the order of objects in drivers/cxl/Makefile
  (Jonathan)
- Rename attach_device => cxl_rescan_attach (Jonathan)
- Fixup Zijun's email (Zijun)

[1]: http://lore.kernel.org/172862483180.2150669.5564474284074502692.stgit@dwillia2-xfh.jf.intel.com

---

Original cover:

Gregory's modest proposal to fix CXL cxl_mem_probe() failures due to
delayed arrival of the CXL "root" infrastructure [1] prompted questions
of how the existing mechanism for retrying cxl_mem_probe() could be
failing.

The critical missing piece in the debug was that Gregory's setup had
almost all CXL modules built-in to the kernel.

On the way to that discovery several other bugs and init-order corner
cases were discovered.

The main fix is to make sure the drivers/cxl/Makefile object order
supports root CXL ports being fully initialized upon cxl_acpi_probe()
exit. The modular case has some similar potential holes that are fixed
with MODULE_SOFTDEP() and other fix ups. Finally, an attempt to update
cxl_test to reproduce the original report resulted in the discovery of a
separate long standing use after free bug in cxl_region_detach().

[2]: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net

---

Dan Williams (6):
      cxl/port: Fix CXL port initialization order when the subsystem is built-in
      cxl/port: Fix cxl_bus_rescan() vs bus_rescan_devices()
      cxl/acpi: Ensure ports ready at cxl_acpi_probe() return
      cxl/port: Fix use-after-free, permit out-of-order decoder shutdown
      cxl/port: Prevent out-of-order decoder allocation
      cxl/test: Improve init-order fidelity relative to real-world systems


 drivers/base/core.c          |   35 +++++++
 drivers/cxl/Kconfig          |    1 
 drivers/cxl/Makefile         |   20 +++-
 drivers/cxl/acpi.c           |    7 +
 drivers/cxl/core/hdm.c       |   50 +++++++++--
 drivers/cxl/core/port.c      |   13 ++-
 drivers/cxl/core/region.c    |   91 ++++++++++---------
 drivers/cxl/cxl.h            |    3 -
 include/linux/device.h       |    3 +
 tools/testing/cxl/test/cxl.c |  200 +++++++++++++++++++++++-------------------
 tools/testing/cxl/test/mem.c |    1 
 11 files changed, 269 insertions(+), 155 deletions(-)

base-commit: 8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b

Comments

Robert Richter Oct. 23, 2024, 1:12 p.m. UTC | #1
On 22.10.24 18:43:15, Dan Williams wrote:
> Changes since v1 [1]:
> - Fix some misspellings missed by checkpatch in changelogs (Jonathan)
> - Add comments explaining the order of objects in drivers/cxl/Makefile
>   (Jonathan)
> - Rename attach_device => cxl_rescan_attach (Jonathan)
> - Fixup Zijun's email (Zijun)
> 
> [1]: http://lore.kernel.org/172862483180.2150669.5564474284074502692.stgit@dwillia2-xfh.jf.intel.com
> 
> ---
> 
> Original cover:
> 
> Gregory's modest proposal to fix CXL cxl_mem_probe() failures due to
> delayed arrival of the CXL "root" infrastructure [1] prompted questions
> of how the existing mechanism for retrying cxl_mem_probe() could be
> failing.

I found a similar issue with the region creation. 

A region is created with the first endpoint found and immediately
added as device which triggers cxl_region_probe(). Now, in
interleaving setups the region state comes into commit state only
after the last endpoint was probed. So the probe must be repeated
until all endpoints were enumerated. I ended up with this change:

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index a07b62254596..c78704e435e5 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -3775,8 +3775,8 @@ static int cxl_region_probe(struct device *dev)
 	}
 
 	if (p->state < CXL_CONFIG_COMMIT) {
-		dev_dbg(&cxlr->dev, "config state: %d\n", p->state);
-		rc = -ENXIO;
+		rc = dev_err_probe(&cxlr->dev, -EPROBE_DEFER,
+				"region config state: %d\n", p->state);
 		goto out;
 	}
Gregory Price Oct. 23, 2024, 4 p.m. UTC | #2
On Wed, Oct 23, 2024 at 03:12:07PM +0200, Robert Richter wrote:
> On 22.10.24 18:43:15, Dan Williams wrote:
> > Changes since v1 [1]:
> > - Fix some misspellings missed by checkpatch in changelogs (Jonathan)
> > - Add comments explaining the order of objects in drivers/cxl/Makefile
> >   (Jonathan)
> > - Rename attach_device => cxl_rescan_attach (Jonathan)
> > - Fixup Zijun's email (Zijun)
> > 
> > [1]: http://lore.kernel.org/172862483180.2150669.5564474284074502692.stgit@dwillia2-xfh.jf.intel.com
> > 
> > ---
> > 
> > Original cover:
> > 
> > Gregory's modest proposal to fix CXL cxl_mem_probe() failures due to
> > delayed arrival of the CXL "root" infrastructure [1] prompted questions
> > of how the existing mechanism for retrying cxl_mem_probe() could be
> > failing.
> 
> I found a similar issue with the region creation. 
> 
> A region is created with the first endpoint found and immediately
> added as device which triggers cxl_region_probe(). Now, in
> interleaving setups the region state comes into commit state only
> after the last endpoint was probed. So the probe must be repeated
> until all endpoints were enumerated. I ended up with this change:
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index a07b62254596..c78704e435e5 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -3775,8 +3775,8 @@ static int cxl_region_probe(struct device *dev)
>  	}
>  
>  	if (p->state < CXL_CONFIG_COMMIT) {
> -		dev_dbg(&cxlr->dev, "config state: %d\n", p->state);
> -		rc = -ENXIO;
> +		rc = dev_err_probe(&cxlr->dev, -EPROBE_DEFER,
> +				"region config state: %d\n", p->state);
>  		goto out;
>  	}
> 

I have not experienced any out of order operations since applying Dan's
v1 of this patch set, do you still see this after applying the existing
set?

Probably this is indicative of needing another SOFTDEP / ordering issue.

~Gregory


> -- 
> 2.39.5
> 
> I don't see an init order issue here as the mem module is always up
> before the regions are probed.
> 
> -Robert
> 
> > 
> > The critical missing piece in the debug was that Gregory's setup had
> > almost all CXL modules built-in to the kernel.
> > 
> > On the way to that discovery several other bugs and init-order corner
> > cases were discovered.
> > 
> > The main fix is to make sure the drivers/cxl/Makefile object order
> > supports root CXL ports being fully initialized upon cxl_acpi_probe()
> > exit. The modular case has some similar potential holes that are fixed
> > with MODULE_SOFTDEP() and other fix ups. Finally, an attempt to update
> > cxl_test to reproduce the original report resulted in the discovery of a
> > separate long standing use after free bug in cxl_region_detach().
> > 
> > [2]: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net
> > 
> > ---
> > 
> > Dan Williams (6):
> >       cxl/port: Fix CXL port initialization order when the subsystem is built-in
> >       cxl/port: Fix cxl_bus_rescan() vs bus_rescan_devices()
> >       cxl/acpi: Ensure ports ready at cxl_acpi_probe() return
> >       cxl/port: Fix use-after-free, permit out-of-order decoder shutdown
> >       cxl/port: Prevent out-of-order decoder allocation
> >       cxl/test: Improve init-order fidelity relative to real-world systems
> > 
> > 
> >  drivers/base/core.c          |   35 +++++++
> >  drivers/cxl/Kconfig          |    1 
> >  drivers/cxl/Makefile         |   20 +++-
> >  drivers/cxl/acpi.c           |    7 +
> >  drivers/cxl/core/hdm.c       |   50 +++++++++--
> >  drivers/cxl/core/port.c      |   13 ++-
> >  drivers/cxl/core/region.c    |   91 ++++++++++---------
> >  drivers/cxl/cxl.h            |    3 -
> >  include/linux/device.h       |    3 +
> >  tools/testing/cxl/test/cxl.c |  200 +++++++++++++++++++++++-------------------
> >  tools/testing/cxl/test/mem.c |    1 
> >  11 files changed, 269 insertions(+), 155 deletions(-)
> > 
> > base-commit: 8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b
Dan Williams Oct. 23, 2024, 8:34 p.m. UTC | #3
Robert Richter wrote:
> On 22.10.24 18:43:15, Dan Williams wrote:
> > Changes since v1 [1]:
> > - Fix some misspellings missed by checkpatch in changelogs (Jonathan)
> > - Add comments explaining the order of objects in drivers/cxl/Makefile
> >   (Jonathan)
> > - Rename attach_device => cxl_rescan_attach (Jonathan)
> > - Fixup Zijun's email (Zijun)
> > 
> > [1]: http://lore.kernel.org/172862483180.2150669.5564474284074502692.stgit@dwillia2-xfh.jf.intel.com
> > 
> > ---
> > 
> > Original cover:
> > 
> > Gregory's modest proposal to fix CXL cxl_mem_probe() failures due to
> > delayed arrival of the CXL "root" infrastructure [1] prompted questions
> > of how the existing mechanism for retrying cxl_mem_probe() could be
> > failing.
> 
> I found a similar issue with the region creation. 
> 
> A region is created with the first endpoint found and immediately
> added as device which triggers cxl_region_probe(). Now, in
> interleaving setups the region state comes into commit state only
> after the last endpoint was probed. So the probe must be repeated
> until all endpoints were enumerated. I ended up with this change:
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index a07b62254596..c78704e435e5 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -3775,8 +3775,8 @@ static int cxl_region_probe(struct device *dev)
>  	}
>  
>  	if (p->state < CXL_CONFIG_COMMIT) {
> -		dev_dbg(&cxlr->dev, "config state: %d\n", p->state);
> -		rc = -ENXIO;
> +		rc = dev_err_probe(&cxlr->dev, -EPROBE_DEFER,
> +				"region config state: %d\n", p->state);

I would argue EPROBE_DEFER is not appropriate because there is no
guarantee that the other members of the region show up, and if they do
they will re-trigger probe. So "probe must be repeated until all
endpoints were enumerated" is the case either way. I.e. either more
endpoint arrival triggers re-probe or EPROBE_DEFER triggers extra
redundant probing *and* still results in a probe attempts as endpoints
arrive.

So a dev_dbg() plus -ENXIO return on uncommited region state is
expected.

>  		goto out;
>  	}
>  
> -- 
> 2.39.5
> 
> I don't see an init order issue here as the mem module is always up
> before the regions are probed.

Right, cxl_endpoint_port_probe() triggers region discovery and
cxl_endpoint_port_probe() currently only triggers after cxl_mem has
registered an endpoint port.

The failure this set is address is unwanted cxl_mem_probe() failures.
Robert Richter Oct. 24, 2024, 11:56 a.m. UTC | #4
On 23.10.24 13:34:36, Dan Williams wrote:
> Robert Richter wrote:

> >  	if (p->state < CXL_CONFIG_COMMIT) {
> > -		dev_dbg(&cxlr->dev, "config state: %d\n", p->state);
> > -		rc = -ENXIO;
> > +		rc = dev_err_probe(&cxlr->dev, -EPROBE_DEFER,
> > +				"region config state: %d\n", p->state);
> 
> I would argue EPROBE_DEFER is not appropriate because there is no
> guarantee that the other members of the region show up, and if they do
> they will re-trigger probe. So "probe must be repeated until all
> endpoints were enumerated" is the case either way. I.e. either more
> endpoint arrival triggers re-probe or EPROBE_DEFER triggers extra
> redundant probing *and* still results in a probe attempts as endpoints
> arrive.
> 
> So a dev_dbg() plus -ENXIO return on uncommited region state is
> expected.

So, the region device keeps failing a probe until all endpoints are
collected. This triggered by cxl_add_to_region() after the region went
into CXL_CONFIG_COMMIT state. Looks reasonable.

The setup I was using showed various probe failures so I 'fixed' this
issue without noticing the region device was reprobed later
successfully. Thanks for explaining.

-Robert