Message ID | 20230208071758.658652-1-rrichter@amd.com |
---|---|
State | New, archived |
Series | cxl/port: Disable decoder setup for endpoints in RCD mode |
Robert Richter wrote:
> In RCD mode the HDM decoder capability is optional for endpoints and
> may not exist. The HDM range registers are used instead. Since the
> driver relies on the existence of an HDM decoder capability, its
> absence will cause the initialization of a memory card to fail.
>
> Moreover, the driver also tries to enable or reuse enabled memory
> ranges. In the worst case this may lead to a system hang due to
> disabling system memory that was previously provided and setup by
> system firmware.
>
> To solve the issues described, disable decoder setup for RCD endpoints
> and instead rely exclusively on system firmware to enable those memory
> ranges. Decoders are used by the kernel to setup and configure CXL
> memory regions, esp. to enable and disable them. Since Hot-plug is not
> supported for devices in RCD mode, the ability to disable that memory
> by the kernel using a decoder is not a necessarily requirement,
> decoders are not needed then.
>
> Fixes: 34e37b4c432c ("cxl/port: Enable HDM Capability after validating DVSEC Ranges")
> Signed-off-by: Robert Richter <rrichter@amd.com>

Does Dave's series address this problem?

https://lore.kernel.org/linux-cxl/167588394236.1155956.8466475582138210344.stgit@djiang5-mobl3.local/

...that is arranging for the driver to carry-on in the absence of the
HDM Decoder Capability.
Dan,

On 09.02.23 09:07:18, Dan Williams wrote:
> Robert Richter wrote:
> > In RCD mode the HDM decoder capability is optional for endpoints and
> > may not exist. [...]
>
> Does Dave's series address this problem?
>
> https://lore.kernel.org/linux-cxl/167588394236.1155956.8466475582138210344.stgit@djiang5-mobl3.local/
>
> ...that is arranging for the driver to carry-on in the absence of the
> HDM Decoder Capability.

it might only solve the missing hdm decoder capability. I need to take
a closer look if that also solves a system hang I was debugging which
is caused by clearing the memory disable bit in the hdm dvsec range
register. So the best would be to use this patch now to fix decoder
initialization in RCD mode and then have Dave's patches on top. I am
going to test the series too.

Thanks,

-Robert
Robert Richter wrote:
> Dan,
>
> On 09.02.23 09:07:18, Dan Williams wrote:
> > Robert Richter wrote:
> > > In RCD mode the HDM decoder capability is optional for endpoints and
> > > may not exist. [...]
> >
> > Does Dave's series address this problem?
> >
> > https://lore.kernel.org/linux-cxl/167588394236.1155956.8466475582138210344.stgit@djiang5-mobl3.local/
> >
> > ...that is arranging for the driver to carry-on in the absence of the
> > HDM Decoder Capability.
>
> it might only solve the missing hdm decoder capability. I need to take
> a closer look if that also solves a system hang I was debugging which
> is caused by clearing the memory disable bit in the hdm dvsec range
> register. So the best would be to use this patch now to fix decoder
> initialization in RCD mode and then have Dave's patches on top. I am
> going to test the series too.
My concern with this patch is that it skips HDM decoder enumeration
entirely in RCD mode. The CXL cards I have seen are CXL 1.1+ and do
export the HDM decoder capability.

The driver turns off mem_enable in a few scenarios, one of them indeed
looks buggy, but does not seem to be the one you addressed. The driver
should only disable mem if it was also the agent that enabled mem, but
looks like it does not always do that.

Can you confirm if this fixes this issue?

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index c18ed1bbb54d..2db3b5cf41e9 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -385,7 +385,8 @@ int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
 	 * If the HDM Decoder Capability is already enabled then assume
 	 * that some other agent like platform firmware set it up.
 	 */
-	if (global_ctrl & CXL_HDM_DECODER_ENABLE || (!hdm && info->mem_enabled))
+	if (!info->mem_enabled &&
+	    (global_ctrl & CXL_HDM_DECODER_ENABLE || !hdm))
 		return devm_cxl_enable_mem(&port->dev, cxlds);
 	else if (!hdm)
 		return -ENODEV;

Otherwise can you confirm if the platform provides a CFMWS window that
matches the range-register programming? If this is the problem then I
think this needs a platform quirk to work around a BIOS that violates
kernel expectations.
On 14.02.23 14:28:51, Dan Williams wrote:
> Robert Richter wrote:
> > Dan,
> >
> > On 09.02.23 09:07:18, Dan Williams wrote:
> > > Robert Richter wrote:
> > > > In RCD mode the HDM decoder capability is optional for endpoints and
> > > > may not exist. [...]
> > >
> > > Does Dave's series address this problem?
> > >
> > > https://lore.kernel.org/linux-cxl/167588394236.1155956.8466475582138210344.stgit@djiang5-mobl3.local/
> > >
> > > ...that is arranging for the driver to carry-on in the absence of the
> > > HDM Decoder Capability.
> >
> > it might only solve the missing hdm decoder capability. I need to take
> > a closer look if that also solves a system hang I was debugging which
> > is caused by clearing the memory disable bit in the hdm dvsec range
> > register. So the best would be to use this patch now to fix decoder
> > initialization in RCD mode and then have Dave's patches on top. I am
> > going to test the series too.
>
> My concern with this patch is that it skips HDM decoder enumeration
> entirely in RCD mode. The CXL cards I have seen are CXL 1.1+ and do
> export the HDM decoder capability.
>
> The driver turns off mem_enable in a few scenarios, one of them indeed
> looks buggy, but does not seem to be the one you addressed. The driver
> should only disable mem if it was also the agent that enabled mem, but
> looks like it does not always do that.
>
> Can you confirm if this fixes this issue?

I have tested the HDM decoder emulation series (v5) and it fixes the
issue. Looking into the particular change for that, I hope to get a
condensed fix for stable.

Thanks,

-Robert
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index 5453771bf330..19591d904bdf 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -43,11 +43,11 @@ static int cxl_port_probe(struct device *dev)
 			return rc;
 		if (rc == 1)
 			return devm_cxl_add_passthrough_decoder(port);
-	}
 
-	cxlhdm = devm_cxl_setup_hdm(port);
-	if (IS_ERR(cxlhdm))
-		return PTR_ERR(cxlhdm);
+		cxlhdm = devm_cxl_setup_hdm(port);
+		if (IS_ERR(cxlhdm))
+			return PTR_ERR(cxlhdm);
+	}
 
 	if (is_cxl_endpoint(port)) {
 		struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport);
@@ -61,6 +61,19 @@ static int cxl_port_probe(struct device *dev)
 		if (rc)
 			return rc;
 
+		/*
+		 * The HDM decoder capability may not exist. Do not
+		 * use decoders in RCD mode, instead rely on firmware
+		 * to setup the range or decoder registers and to
+		 * enable memory.
+		 */
+		if (cxlds->rcd)
+			return cxl_await_media_ready(cxlds);
+
+		cxlhdm = devm_cxl_setup_hdm(port);
+		if (IS_ERR(cxlhdm))
+			return PTR_ERR(cxlhdm);
+
 		rc = cxl_hdm_decode_init(cxlds, cxlhdm);
 		if (rc)
 			return rc;
In RCD mode the HDM decoder capability is optional for endpoints and
may not exist. The HDM range registers are used instead. Since the
driver relies on the existence of an HDM decoder capability, its
absence will cause the initialization of a memory card to fail.

Moreover, the driver also tries to enable or reuse enabled memory
ranges. In the worst case this may lead to a system hang due to
disabling system memory that was previously provided and set up by
system firmware.

To solve the issues described, disable decoder setup for RCD endpoints
and instead rely exclusively on system firmware to enable those memory
ranges. Decoders are used by the kernel to set up and configure CXL
memory regions, esp. to enable and disable them. Since hot-plug is not
supported for devices in RCD mode, the ability to disable that memory
by the kernel using a decoder is not necessarily a requirement, and
decoders are therefore not needed.

Fixes: 34e37b4c432c ("cxl/port: Enable HDM Capability after validating DVSEC Ranges")
Signed-off-by: Robert Richter <rrichter@amd.com>
---
 drivers/cxl/port.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

base-commit: 623c0751336e4035ab0047f2c152a02bd26b612b