cxl/port: Disable decoder setup for endpoints in RCD mode

Message ID 20230208071758.658652-1-rrichter@amd.com
State New, archived

Commit Message

Robert Richter Feb. 8, 2023, 7:17 a.m. UTC
In RCD mode the HDM decoder capability is optional for endpoints and
may not exist. The HDM range registers are used instead. Since the
driver relies on the existence of an HDM decoder capability, its
absence will cause the initialization of a memory card to fail.

Moreover, the driver also tries to enable or reuse enabled memory
ranges. In the worst case this may lead to a system hang due to
disabling system memory that was previously provided and set up by
system firmware.

To solve the issues described, disable decoder setup for RCD endpoints
and instead rely exclusively on system firmware to enable those memory
ranges. Decoders are used by the kernel to set up and configure CXL
memory regions, especially to enable and disable them. Since hot-plug
is not supported for devices in RCD mode, the kernel does not strictly
need the ability to disable that memory through a decoder, so decoders
are not needed in this case.

Fixes: 34e37b4c432c ("cxl/port: Enable HDM Capability after validating DVSEC Ranges")
Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/port.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)


base-commit: 623c0751336e4035ab0047f2c152a02bd26b612b

Comments

Dan Williams Feb. 9, 2023, 5:07 p.m. UTC | #1
Robert Richter wrote:
> In RCD mode the HDM decoder capability is optional for endpoints and
> may not exist. The HDM range registers are used instead. Since the
> driver relies on the existence of an HDM decoder capability, its
> absence will cause the initialization of a memory card to fail.
> 
> Moreover, the driver also tries to enable or reuse enabled memory
> ranges. In the worst case this may lead to a system hang due to
> disabling system memory that was previously provided and setup by
> system firmware.
> 
> To solve the issues described, disable decoder setup for RCD endpoints
> and instead rely exclusively on system firmware to enable those memory
> ranges. Decoders are used by the kernel to setup and configure CXL
> memory regions, esp. to enable and disable them. Since Hot-plug is not
> supported for devices in RCD mode, the ability to disable that memory
> by the kernel using a decoder is not a necessarily requirement,
> decoders are not needed then.
> 
> Fixes: 34e37b4c432c ("cxl/port: Enable HDM Capability after validating DVSEC Ranges")
> Signed-off-by: Robert Richter <rrichter@amd.com>

Does Dave's series address this problem?

https://lore.kernel.org/linux-cxl/167588394236.1155956.8466475582138210344.stgit@djiang5-mobl3.local/

...that is arranging for the driver to carry-on in the absence of the
HDM Decoder Capability.

Robert Richter Feb. 13, 2023, 2:49 p.m. UTC | #2
Dan,

On 09.02.23 09:07:18, Dan Williams wrote:
> Robert Richter wrote:
> > In RCD mode the HDM decoder capability is optional for endpoints and
> > may not exist. The HDM range registers are used instead. Since the
> > driver relies on the existence of an HDM decoder capability, its
> > absence will cause the initialization of a memory card to fail.
> > 
> > Moreover, the driver also tries to enable or reuse enabled memory
> > ranges. In the worst case this may lead to a system hang due to
> > disabling system memory that was previously provided and setup by
> > system firmware.
> > 
> > To solve the issues described, disable decoder setup for RCD endpoints
> > and instead rely exclusively on system firmware to enable those memory
> > ranges. Decoders are used by the kernel to setup and configure CXL
> > memory regions, esp. to enable and disable them. Since Hot-plug is not
> > supported for devices in RCD mode, the ability to disable that memory
> > by the kernel using a decoder is not a necessarily requirement,
> > decoders are not needed then.
> > 
> > Fixes: 34e37b4c432c ("cxl/port: Enable HDM Capability after validating DVSEC Ranges")
> > Signed-off-by: Robert Richter <rrichter@amd.com>
> 
> Does Dave's series address this problem?
> 
> https://lore.kernel.org/linux-cxl/167588394236.1155956.8466475582138210344.stgit@djiang5-mobl3.local/
> 
> ...that is arranging for the driver to carry-on in the absence of the
> HDM Decoder Capability.

It might only solve the missing HDM decoder capability. I need to take
a closer look at whether it also solves a system hang I was debugging,
which is caused by clearing the memory disable bit in the HDM DVSEC
range register. So the best approach would be to use this patch now to
fix decoder initialization in RCD mode and then have Dave's patches on
top. I am going to test the series too.

Thanks,

-Robert

Dan Williams Feb. 14, 2023, 10:28 p.m. UTC | #3
Robert Richter wrote:
> Dan,
> 
> On 09.02.23 09:07:18, Dan Williams wrote:
> > Robert Richter wrote:
> > > In RCD mode the HDM decoder capability is optional for endpoints and
> > > may not exist. The HDM range registers are used instead. Since the
> > > driver relies on the existence of an HDM decoder capability, its
> > > absence will cause the initialization of a memory card to fail.
> > > 
> > > Moreover, the driver also tries to enable or reuse enabled memory
> > > ranges. In the worst case this may lead to a system hang due to
> > > disabling system memory that was previously provided and setup by
> > > system firmware.
> > > 
> > > To solve the issues described, disable decoder setup for RCD endpoints
> > > and instead rely exclusively on system firmware to enable those memory
> > > ranges. Decoders are used by the kernel to setup and configure CXL
> > > memory regions, esp. to enable and disable them. Since Hot-plug is not
> > > supported for devices in RCD mode, the ability to disable that memory
> > > by the kernel using a decoder is not a necessarily requirement,
> > > decoders are not needed then.
> > > 
> > > Fixes: 34e37b4c432c ("cxl/port: Enable HDM Capability after validating DVSEC Ranges")
> > > Signed-off-by: Robert Richter <rrichter@amd.com>
> > 
> > Does Dave's series address this problem?
> > 
> > https://lore.kernel.org/linux-cxl/167588394236.1155956.8466475582138210344.stgit@djiang5-mobl3.local/
> > 
> > ...that is arranging for the driver to carry-on in the absence of the
> > HDM Decoder Capability.
> 
> it might only solve the missing hdm decoder capability. I need to take
> a closer look if that also solves a system hang I was debugging which
> is caused by clearing the memory disable bit in the hdm dvsec range
> register. So the best would be to use this patch now to fix decoder
> initialization in RCD mode and then have Dave's patches on top. I am
> going to test the series too.

My concern with this patch is that it skips HDM decoder enumeration
entirely in RCD mode. The CXL cards I have seen are CXL 1.1+ and do
export the HDM decoder capability.

The driver turns off mem_enable in a few scenarios. One of them indeed
looks buggy, but it does not seem to be the one you addressed. The
driver should only disable mem if it was also the agent that enabled
mem, but it looks like it does not always do that.

Can you confirm if this fixes this issue?

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index c18ed1bbb54d..2db3b5cf41e9 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -385,7 +385,8 @@ int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
         * If the HDM Decoder Capability is already enabled then assume
         * that some other agent like platform firmware set it up.
         */
-       if (global_ctrl & CXL_HDM_DECODER_ENABLE || (!hdm && info->mem_enabled))
+       if (!info->mem_enabled &&
+           (global_ctrl & CXL_HDM_DECODER_ENABLE || !hdm))
                return devm_cxl_enable_mem(&port->dev, cxlds);
        else if (!hdm)
                return -ENODEV;

Otherwise, can you confirm whether the platform provides a CFMWS window
that matches the range-register programming? If this is the problem
then I think this needs a platform quirk to work around a BIOS that
violates kernel expectations.

Robert Richter Feb. 15, 2023, 4:32 p.m. UTC | #4
On 14.02.23 14:28:51, Dan Williams wrote:
> Robert Richter wrote:
> > Dan,
> > 
> > On 09.02.23 09:07:18, Dan Williams wrote:
> > > Robert Richter wrote:
> > > > In RCD mode the HDM decoder capability is optional for endpoints and
> > > > may not exist. The HDM range registers are used instead. Since the
> > > > driver relies on the existence of an HDM decoder capability, its
> > > > absence will cause the initialization of a memory card to fail.
> > > > 
> > > > Moreover, the driver also tries to enable or reuse enabled memory
> > > > ranges. In the worst case this may lead to a system hang due to
> > > > disabling system memory that was previously provided and setup by
> > > > system firmware.
> > > > 
> > > > To solve the issues described, disable decoder setup for RCD endpoints
> > > > and instead rely exclusively on system firmware to enable those memory
> > > > ranges. Decoders are used by the kernel to setup and configure CXL
> > > > memory regions, esp. to enable and disable them. Since Hot-plug is not
> > > > supported for devices in RCD mode, the ability to disable that memory
> > > > by the kernel using a decoder is not a necessarily requirement,
> > > > decoders are not needed then.
> > > > 
> > > > Fixes: 34e37b4c432c ("cxl/port: Enable HDM Capability after validating DVSEC Ranges")
> > > > Signed-off-by: Robert Richter <rrichter@amd.com>
> > > 
> > > Does Dave's series address this problem?
> > > 
> > > https://lore.kernel.org/linux-cxl/167588394236.1155956.8466475582138210344.stgit@djiang5-mobl3.local/
> > > 
> > > ...that is arranging for the driver to carry-on in the absence of the
> > > HDM Decoder Capability.
> > 
> > it might only solve the missing hdm decoder capability. I need to take
> > a closer look if that also solves a system hang I was debugging which
> > is caused by clearing the memory disable bit in the hdm dvsec range
> > register. So the best would be to use this patch now to fix decoder
> > initialization in RCD mode and then have Dave's patches on top. I am
> > going to test the series too.
> 
> My concern with this patch is that it skips HDM decoder enumeration
> entirely in RCD mode. The CXL cards I have seen are CXL 1.1+ and do
> export the HDM decoder capability.
> 
> The driver turns off mem_enable in a few scenarios, one of them indeed
> looks buggy, but does not seem to be the one you addressed. The driver
> should only disable mem if it was also the agent that enabled mem, but
> looks like it does not always do that.
> 
> Can you confirm if this fixes this issue?

I have tested the HDM decoder emulation series (v5) and it fixes the
issue. Looking into the particular change for that, I hope to get a
condensed fix for stable.

Thanks,

-Robert

Patch

diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index 5453771bf330..19591d904bdf 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -43,11 +43,11 @@  static int cxl_port_probe(struct device *dev)
 			return rc;
 		if (rc == 1)
 			return devm_cxl_add_passthrough_decoder(port);
-	}
 
-	cxlhdm = devm_cxl_setup_hdm(port);
-	if (IS_ERR(cxlhdm))
-		return PTR_ERR(cxlhdm);
+		cxlhdm = devm_cxl_setup_hdm(port);
+		if (IS_ERR(cxlhdm))
+			return PTR_ERR(cxlhdm);
+	}
 
 	if (is_cxl_endpoint(port)) {
 		struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport);
@@ -61,6 +61,19 @@  static int cxl_port_probe(struct device *dev)
 		if (rc)
 			return rc;
 
+		/*
+		 * The HDM decoder capability may not exist. Do not
+		 * use decoders in RCD mode, instead rely on firmware
+		 * to setup the range or decoder registers and to
+		 * enable memory.
+		 */
+		if (cxlds->rcd)
+			return cxl_await_media_ready(cxlds);
+
+		cxlhdm = devm_cxl_setup_hdm(port);
+		if (IS_ERR(cxlhdm))
+			return PTR_ERR(cxlhdm);
+
 		rc = cxl_hdm_decode_init(cxlds, cxlhdm);
 		if (rc)
 			return rc;