Message ID | 20220831081603.3415-3-rrichter@amd.com |
---|---|
State | Superseded |
Delegated to: | Dan Williams |
Headers | show |
Series | cxl: Add support for Restricted CXL hosts (RCD mode) | expand |
On Wed, 31 Aug 2022 10:15:50 +0200 Robert Richter <rrichter@amd.com> wrote: > The physical base address of a CXL range can be invalid and is then > set to CXL_RESOURCE_NONE. Early check this case before mapping a > memory block in devm_cxl_iomap_block(). An example of when it is invalid would be a useful addition to this description. Feels to me like this is papering over the cracks in something we should have ruled out earlier in the flow. Jonathan > > Signed-off-by: Robert Richter <rrichter@amd.com> > --- > drivers/cxl/core/regs.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c > index 39a129c57d40..f216c017a474 100644 > --- a/drivers/cxl/core/regs.c > +++ b/drivers/cxl/core/regs.c > @@ -165,6 +165,9 @@ void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr, > void __iomem *ret_val; > struct resource *res; > > + if (addr == CXL_RESOURCE_NONE) > + return NULL; > + > res = devm_request_mem_region(dev, addr, length, dev_name(dev)); > if (!res) { > resource_size_t end = addr + length - 1;
On 31.08.22 09:56:51, Jonathan Cameron wrote: > On Wed, 31 Aug 2022 10:15:50 +0200 > Robert Richter <rrichter@amd.com> wrote: > > > The physical base address of a CXL range can be invalid and is then > > set to CXL_RESOURCE_NONE. Early check this case before mapping a > > memory block in devm_cxl_iomap_block(). > > An example of when it is invalid would be a useful addition > to this description. Feels to me like this is papering over the cracks > in something we should have ruled out earlier in the flow. IIRC this happened during code development in cxl_pci_probe() when cxl_map_regs() is called but the base addr was not determined correctly before. It's similar to a NULL pointer check. -Robert
Robert Richter wrote: > The physical base address of a CXL range can be invalid and is then > set to CXL_RESOURCE_NONE. Early check this case before mapping a > memory block in devm_cxl_iomap_block(). > > Signed-off-by: Robert Richter <rrichter@amd.com> > --- > drivers/cxl/core/regs.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c > index 39a129c57d40..f216c017a474 100644 > --- a/drivers/cxl/core/regs.c > +++ b/drivers/cxl/core/regs.c > @@ -165,6 +165,9 @@ void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr, > void __iomem *ret_val; > struct resource *res; > > + if (addr == CXL_RESOURCE_NONE) > + return NULL; > + > res = devm_request_mem_region(dev, addr, length, dev_name(dev)); > if (!res) { > resource_size_t end = addr + length - 1; > -- > 2.30.2 > devm_request_mem_region() succeeds for you when this happens? More details about the failure scenario please.
On 07.09.22 22:48:57, Dan Williams wrote: > Robert Richter wrote: > > The physical base address of a CXL range can be invalid and is then > > set to CXL_RESOURCE_NONE. Early check this case before mapping a > > memory block in devm_cxl_iomap_block(). > > > > Signed-off-by: Robert Richter <rrichter@amd.com> > > --- > > drivers/cxl/core/regs.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c > > index 39a129c57d40..f216c017a474 100644 > > --- a/drivers/cxl/core/regs.c > > +++ b/drivers/cxl/core/regs.c > > @@ -165,6 +165,9 @@ void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr, > > void __iomem *ret_val; > > struct resource *res; > > > > + if (addr == CXL_RESOURCE_NONE) > > + return NULL; > > + > > res = devm_request_mem_region(dev, addr, length, dev_name(dev)); > > if (!res) { > > resource_size_t end = addr + length - 1; > > -- > > 2.30.2 > > > > devm_request_mem_region() succeeds for you when this happens? More > details about the failure scenario please. No, CXL_RESOURCE_NONE (all FFs) is used as address. A broken range is calculated that even overflows. I only vaguely remember the exact error message. This may happen e.g. if the Component Register Block is missing in the DVSEC. cxl_find_regblock() may fail then and returns CXL_RESOURCE_NONE. There are a couple of code paths there component_reg_phys is set to CXL_RESOURCE_NONE without exiting immediately. I saw it during code development, when I pre-inititalized a port with component_reg_phys set to CXL_RESOURCE_NONE. Since that case can generally happen, I think it must be checked. -Robert
Robert Richter wrote: > On 07.09.22 22:48:57, Dan Williams wrote: > > Robert Richter wrote: > > > The physical base address of a CXL range can be invalid and is then > > > set to CXL_RESOURCE_NONE. Early check this case before mapping a > > > memory block in devm_cxl_iomap_block(). > > > > > > Signed-off-by: Robert Richter <rrichter@amd.com> > > > --- > > > drivers/cxl/core/regs.c | 3 +++ > > > 1 file changed, 3 insertions(+) > > > > > > diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c > > > index 39a129c57d40..f216c017a474 100644 > > > --- a/drivers/cxl/core/regs.c > > > +++ b/drivers/cxl/core/regs.c > > > @@ -165,6 +165,9 @@ void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr, > > > void __iomem *ret_val; > > > struct resource *res; > > > > > > + if (addr == CXL_RESOURCE_NONE) > > > + return NULL; > > > + > > > res = devm_request_mem_region(dev, addr, length, dev_name(dev)); > > > if (!res) { > > > resource_size_t end = addr + length - 1; > > > -- > > > 2.30.2 > > > > > > > devm_request_mem_region() succeeds for you when this happens? More > > details about the failure scenario please. > > No, CXL_RESOURCE_NONE (all FFs) is used as address. A broken range is > calculated that even overflows. I only vaguely remember the exact > error message. > > This may happen e.g. if the Component Register Block is missing in the > DVSEC. cxl_find_regblock() may fail then and returns > CXL_RESOURCE_NONE. There are a couple of code paths there > component_reg_phys is set to CXL_RESOURCE_NONE without exiting > immediately. > > I saw it during code development, when I pre-inititalized a port with > component_reg_phys set to CXL_RESOURCE_NONE. Since that case can > generally happen, I think it must be checked. I think Jonathan had it right when we posited that the code should probably have failed before getting to this point. For example, the scenarios where the driver looks for a component register block via the register locator DVSEC are not valid for RCDs in the first instance.
On 16.09.22 11:04:01, Dan Williams wrote: > Robert Richter wrote: > > On 07.09.22 22:48:57, Dan Williams wrote: > > > Robert Richter wrote: > > > > The physical base address of a CXL range can be invalid and is then > > > > set to CXL_RESOURCE_NONE. Early check this case before mapping a > > > > memory block in devm_cxl_iomap_block(). > > > > > > > > Signed-off-by: Robert Richter <rrichter@amd.com> > > > > --- > > > > drivers/cxl/core/regs.c | 3 +++ > > > > 1 file changed, 3 insertions(+) > > > > > > > > diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c > > > > index 39a129c57d40..f216c017a474 100644 > > > > --- a/drivers/cxl/core/regs.c > > > > +++ b/drivers/cxl/core/regs.c > > > > @@ -165,6 +165,9 @@ void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr, > > > > void __iomem *ret_val; > > > > struct resource *res; > > > > > > > > + if (addr == CXL_RESOURCE_NONE) > > > > + return NULL; > > > > + > > > > res = devm_request_mem_region(dev, addr, length, dev_name(dev)); > > > > if (!res) { > > > > resource_size_t end = addr + length - 1; > > > > -- > > > > 2.30.2 > > > > > > > > > > devm_request_mem_region() succeeds for you when this happens? More > > > details about the failure scenario please. > > > > No, CXL_RESOURCE_NONE (all FFs) is used as address. A broken range is > > calculated that even overflows. I only vaguely remember the exact > > error message. > > > > This may happen e.g. if the Component Register Block is missing in the > > DVSEC. cxl_find_regblock() may fail then and returns > > CXL_RESOURCE_NONE. There are a couple of code paths there > > component_reg_phys is set to CXL_RESOURCE_NONE without exiting > > immediately. > > > > I saw it during code development, when I pre-inititalized a port with > > component_reg_phys set to CXL_RESOURCE_NONE. Since that case can > > generally happen, I think it must be checked. > > I think Jonathan had it right when we posited that the code should > probably have failed before getting to this point. For example, the > scenarios where the driver looks for a component register block via the > register locator DVSEC are not valid for RCDs in the first instance. I am ok having the code reviewed to prevent such situations. But the handling of component_reg_phys is by far not trivial as there are many locations it is initialized and others where it is used. Call chain is across multiple functions using various data sources for component_reg_phys, so it is hard to proof this may never happen. E.g. in add_port_attach_ep() I found this: component_reg_phys = find_component_registers(uport_dev); port = devm_cxl_add_port(&parent_port->dev, uport_dev, component_reg_phys, parent_dport); find_component_registers() and subsequent functions (e.g. cxl_regmap_to_base()) may return CXL_RESOURCE_NONE. But it is written to port without any further check in cxl_port_alloc(): port->component_reg_phys = component_reg_phys; It is then later directly used in devm_cxl_setup_hdm() to map io ranges with devm_cxl_iomap_block(). Just an example, I did not go through other code paths. IMO, the check is cheap and prevents weird follow up errors and log messages. But I would be also ok, to just drop the patch. There are no dependencies to this change. What do you think? Thanks, -Robert
Robert Richter wrote: > On 16.09.22 11:04:01, Dan Williams wrote: > > Robert Richter wrote: > > > On 07.09.22 22:48:57, Dan Williams wrote: > > > > Robert Richter wrote: > > > > > The physical base address of a CXL range can be invalid and is then > > > > > set to CXL_RESOURCE_NONE. Early check this case before mapping a > > > > > memory block in devm_cxl_iomap_block(). > > > > > > > > > > Signed-off-by: Robert Richter <rrichter@amd.com> > > > > > --- > > > > > drivers/cxl/core/regs.c | 3 +++ > > > > > 1 file changed, 3 insertions(+) > > > > > > > > > > diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c > > > > > index 39a129c57d40..f216c017a474 100644 > > > > > --- a/drivers/cxl/core/regs.c > > > > > +++ b/drivers/cxl/core/regs.c > > > > > @@ -165,6 +165,9 @@ void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr, > > > > > void __iomem *ret_val; > > > > > struct resource *res; > > > > > > > > > > + if (addr == CXL_RESOURCE_NONE) > > > > > + return NULL; > > > > > + > > > > > res = devm_request_mem_region(dev, addr, length, dev_name(dev)); > > > > > if (!res) { > > > > > resource_size_t end = addr + length - 1; > > > > > -- > > > > > 2.30.2 > > > > > > > > > > > > > devm_request_mem_region() succeeds for you when this happens? More > > > > details about the failure scenario please. > > > > > > No, CXL_RESOURCE_NONE (all FFs) is used as address. A broken range is > > > calculated that even overflows. I only vaguely remember the exact > > > error message. > > > > > > This may happen e.g. if the Component Register Block is missing in the > > > DVSEC. cxl_find_regblock() may fail then and returns > > > CXL_RESOURCE_NONE. There are a couple of code paths there > > > component_reg_phys is set to CXL_RESOURCE_NONE without exiting > > > immediately. > > > > > > I saw it during code development, when I pre-inititalized a port with > > > component_reg_phys set to CXL_RESOURCE_NONE. Since that case can > > > generally happen, I think it must be checked. > > > > I think Jonathan had it right when we posited that the code should > > probably have failed before getting to this point. For example, the > > scenarios where the driver looks for a component register block via the > > register locator DVSEC are not valid for RCDs in the first instance. > > I am ok having the code reviewed to prevent such situations. But the > handling of component_reg_phys is by far not trivial as there are many > locations it is initialized and others where it is used. Call chain is > across multiple functions using various data sources for > component_reg_phys, so it is hard to proof this may never happen. > E.g. in add_port_attach_ep() I found this: > > component_reg_phys = find_component_registers(uport_dev); > port = devm_cxl_add_port(&parent_port->dev, uport_dev, > component_reg_phys, parent_dport); > > find_component_registers() and subsequent functions (e.g. > cxl_regmap_to_base()) may return CXL_RESOURCE_NONE. But it is written > to port without any further check in cxl_port_alloc(): > > port->component_reg_phys = component_reg_phys; > > It is then later directly used in devm_cxl_setup_hdm() to map io > ranges with devm_cxl_iomap_block(). > > Just an example, I did not go through other code paths. > > IMO, the check is cheap and prevents weird follow up errors and log > messages. But I would be also ok, to just drop the patch. There are no > dependencies to this change. What do you think? Good points above... Yeah, the expectation is that only "broken hardware" fails to provide component registers, but the driver is currently deficient in that it assumes everything is CXL 2.0. Now you have me thinking it should go a bit further and do a WARN_ONCE() when CXL_RESOURCE_NONE is passed, as those locations need better error handling than just handling it like an ioremap() failure.
diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c index 39a129c57d40..f216c017a474 100644 --- a/drivers/cxl/core/regs.c +++ b/drivers/cxl/core/regs.c @@ -165,6 +165,9 @@ void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr, void __iomem *ret_val; struct resource *res; + if (addr == CXL_RESOURCE_NONE) + return NULL; + res = devm_request_mem_region(dev, addr, length, dev_name(dev)); if (!res) { resource_size_t end = addr + length - 1;
The physical base address of a CXL range can be invalid and is then set to CXL_RESOURCE_NONE. Early check this case before mapping a memory block in devm_cxl_iomap_block(). Signed-off-by: Robert Richter <rrichter@amd.com> --- drivers/cxl/core/regs.c | 3 +++ 1 file changed, 3 insertions(+)