diff mbox series

cxl/region: Avoid null pointer dereference in region lookup

Message ID 20240604003609.202682-1-alison.schofield@intel.com
State Accepted
Commit 285f2a08841432fc3e498b1cd00cce5216cdf189
Headers show
Series cxl/region: Avoid null pointer dereference in region lookup | expand

Commit Message

Alison Schofield June 4, 2024, 12:36 a.m. UTC
From: Alison Schofield <alison.schofield@intel.com>

cxl_dpa_to_region() looks up a region based on a memdev and DPA.
It wrongly assumes an endpoint found mapping the DPA is also of
a fully assembled region. When not true it leads to a null pointer
dereference looking up the region name.

This appears during testing of region lookup after a failure to
assemble a BIOS defined region or if the lookup raced with the
assembly of the BIOS defined region.

Failure to clean up BIOS defined regions that fail assembly is an
issue in itself and a fix to that problem will alleviate some of
the impact. It will not alleviate the race condition so let's harden
this path.

The behavior change is that the kernel oops due to a null pointer
dereference is replaced with a dev_dbg() message noting that an
endpoint was mapped.

Additional comments are added so that future users of this function
can more clearly understand what it provides.

Fixes: 0a105ab28a4d ("cxl/memdev: Warn of poison inject or clear to a mapped region")
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 drivers/cxl/core/region.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)


base-commit: 49ba7b515c4c0719b866d16f068e62d16a8a3dd1

Comments

Jonathan Cameron June 4, 2024, 4:37 p.m. UTC | #1
On Mon,  3 Jun 2024 17:36:09 -0700
alison.schofield@intel.com wrote:

> From: Alison Schofield <alison.schofield@intel.com>
> 
> cxl_dpa_to_region() looks up a region based on a memdev and DPA.
> It wrongly assumes an endpoint found mapping the DPA is also of
> a fully assembled region. When not true it leads to a null pointer
> dereference looking up the region name.
> 
> This appears during testing of region lookup after a failure to
> assemble a BIOS defined region or if the lookup raced with the
> assembly of the BIOS defined region.
> 
> Failure to clean up BIOS defined regions that fail assembly is an
> issue in itself and a fix to that problem will alleviate some of
> the impact. It will not alleviate the race condition so let's harden
> this path.
> 
> The behavior change is that the kernel oops due to a null pointer
> dereference is replaced with a dev_dbg() message noting that an
> endpoint was mapped.
> 
> Additional comments are added so that future users of this function
> can more clearly understand what it provides.
> 
> Fixes: 0a105ab28a4d ("cxl/memdev: Warn of poison inject or clear to a mapped region")
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Nasty corner case. Fix looks correct to me.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  drivers/cxl/core/region.c | 19 +++++++++++++++----
>  1 file changed, 15 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 3c2b6144be23..88051bb673bf 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2688,22 +2688,33 @@ static int __cxl_dpa_to_region(struct device *dev, void *arg)
>  {
>  	struct cxl_dpa_to_region_context *ctx = arg;
>  	struct cxl_endpoint_decoder *cxled;
> +	struct cxl_region *cxlr;
>  	u64 dpa = ctx->dpa;
>  
>  	if (!is_endpoint_decoder(dev))
>  		return 0;
>  
>  	cxled = to_cxl_endpoint_decoder(dev);
> -	if (!cxled->dpa_res || !resource_size(cxled->dpa_res))
> +	if (!cxled || !cxled->dpa_res || !resource_size(cxled->dpa_res))
>  		return 0;
>  
>  	if (dpa > cxled->dpa_res->end || dpa < cxled->dpa_res->start)
>  		return 0;
>  
> -	dev_dbg(dev, "dpa:0x%llx mapped in region:%s\n", dpa,
> -		dev_name(&cxled->cxld.region->dev));
> +	/*
> +	 * Stop the region search (return 1) when an endpoint mapping is
> +	 * found. The region may not be fully constructed so offering
> +	 * the cxlr in the context structure is not guaranteed.
> +	 */
> +	cxlr = cxled->cxld.region;
> +	if (cxlr)
> +		dev_dbg(dev, "dpa:0x%llx mapped in region:%s\n", dpa,
> +			dev_name(&cxlr->dev));
> +	else
> +		dev_dbg(dev, "dpa:0x%llx mapped in endpoint:%s\n", dpa,
> +			dev_name(dev));
>  
> -	ctx->cxlr = cxled->cxld.region;
> +	ctx->cxlr = cxlr;
>  
>  	return 1;
>  }
> 
> base-commit: 49ba7b515c4c0719b866d16f068e62d16a8a3dd1
Dan Williams June 4, 2024, 5:47 p.m. UTC | #2
Perhaps change the subject with: s/Avoid/Fix/?

alison.schofield@ wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> cxl_dpa_to_region() looks up a region based on a memdev and DPA.
> It wrongly assumes an endpoint found mapping the DPA is also of
> a fully assembled region. When not true it leads to a null pointer
> dereference looking up the region name.

Can you place the a snippet of the Call Trace here, that helps support
that the writeup came to the right conclusion, and it helps folks
hitting this error to find this patch in a web search.

> This appears during testing of region lookup after a failure to
> assemble a BIOS defined region or if the lookup raced with the
> assembly of the BIOS defined region.
> 
> Failure to clean up BIOS defined regions that fail assembly is an
> issue in itself and a fix to that problem will alleviate some of
> the impact. It will not alleviate the race condition so let's harden
> this path.
> 
> The behavior change is that the kernel oops due to a null pointer
> dereference is replaced with a dev_dbg() message noting that an
> endpoint was mapped.
> 
> Additional comments are added so that future users of this function
> can more clearly understand what it provides.
> 
> Fixes: 0a105ab28a4d ("cxl/memdev: Warn of poison inject or clear to a mapped region")
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
>  drivers/cxl/core/region.c | 19 +++++++++++++++----
>  1 file changed, 15 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 3c2b6144be23..88051bb673bf 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2688,22 +2688,33 @@ static int __cxl_dpa_to_region(struct device *dev, void *arg)
>  {
>  	struct cxl_dpa_to_region_context *ctx = arg;
>  	struct cxl_endpoint_decoder *cxled;
> +	struct cxl_region *cxlr;
>  	u64 dpa = ctx->dpa;
>  
>  	if (!is_endpoint_decoder(dev))
>  		return 0;
>  
>  	cxled = to_cxl_endpoint_decoder(dev);
> -	if (!cxled->dpa_res || !resource_size(cxled->dpa_res))
> +	if (!cxled || !cxled->dpa_res || !resource_size(cxled->dpa_res))

Why add the !cxled check? to_cxl_endpoint_decoder() only returns NULL if
someone messes up and sends a !is_endpoint_decoder(@dev) to this
conversion routine and that check was already performed above.

>  		return 0;
>  
>  	if (dpa > cxled->dpa_res->end || dpa < cxled->dpa_res->start)
>  		return 0;
>  
> -	dev_dbg(dev, "dpa:0x%llx mapped in region:%s\n", dpa,
> -		dev_name(&cxled->cxld.region->dev));
> +	/*
> +	 * Stop the region search (return 1) when an endpoint mapping is
> +	 * found. The region may not be fully constructed so offering
> +	 * the cxlr in the context structure is not guaranteed.
> +	 */

Comment looks good...

> +	cxlr = cxled->cxld.region;
> +	if (cxlr)
> +		dev_dbg(dev, "dpa:0x%llx mapped in region:%s\n", dpa,
> +			dev_name(&cxlr->dev));
> +	else
> +		dev_dbg(dev, "dpa:0x%llx mapped in endpoint:%s\n", dpa,
> +			dev_name(dev));

...but I am not sure gymanistics to improve a debug message are worth
it, especially because all of the callers either emit the found region
in a tracepoint or print a warning, so I would just be in favor of
deleting the debug message altogether, or at most just trimming to the
"mapped in endpoint" one.
Alison Schofield June 5, 2024, 12:43 a.m. UTC | #3
On Tue, Jun 04, 2024 at 10:47:35AM -0700, Dan Williams wrote:
> Perhaps change the subject with: s/Avoid/Fix/?

OK

> 
> alison.schofield@ wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > cxl_dpa_to_region() looks up a region based on a memdev and DPA.
> > It wrongly assumes an endpoint found mapping the DPA is also of
> > a fully assembled region. When not true it leads to a null pointer
> > dereference looking up the region name.
> 
> Can you place the a snippet of the Call Trace here, that helps support
> that the writeup came to the right conclusion, and it helps folks
> hitting this error to find this patch in a web search.

Will add in v2

snip...
> 
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > index 3c2b6144be23..88051bb673bf 100644
> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> > @@ -2688,22 +2688,33 @@ static int __cxl_dpa_to_region(struct device *dev, void *arg)
> >  {
> >  	struct cxl_dpa_to_region_context *ctx = arg;
> >  	struct cxl_endpoint_decoder *cxled;
> > +	struct cxl_region *cxlr;
> >  	u64 dpa = ctx->dpa;
> >  
> >  	if (!is_endpoint_decoder(dev))
> >  		return 0;
> >  
> >  	cxled = to_cxl_endpoint_decoder(dev);
> > -	if (!cxled->dpa_res || !resource_size(cxled->dpa_res))
> > +	if (!cxled || !cxled->dpa_res || !resource_size(cxled->dpa_res))
> 
> Why add the !cxled check? to_cxl_endpoint_decoder() only returns NULL if
> someone messes up and sends a !is_endpoint_decoder(@dev) to this
> conversion routine and that check was already performed above.
> 

Abundance of caution I suppose.  I'll remove.

> >  		return 0;
> >  
> >  	if (dpa > cxled->dpa_res->end || dpa < cxled->dpa_res->start)
> >  		return 0;
> >  
> > -	dev_dbg(dev, "dpa:0x%llx mapped in region:%s\n", dpa,
> > -		dev_name(&cxled->cxld.region->dev));
> > +	/*
> > +	 * Stop the region search (return 1) when an endpoint mapping is
> > +	 * found. The region may not be fully constructed so offering
> > +	 * the cxlr in the context structure is not guaranteed.
> > +	 */
> 
> Comment looks good...
> 
> > +	cxlr = cxled->cxld.region;
> > +	if (cxlr)
> > +		dev_dbg(dev, "dpa:0x%llx mapped in region:%s\n", dpa,
> > +			dev_name(&cxlr->dev));
> > +	else
> > +		dev_dbg(dev, "dpa:0x%llx mapped in endpoint:%s\n", dpa,
> > +			dev_name(dev));
> 
> ...but I am not sure gymanistics to improve a debug message are worth
> it, especially because all of the callers either emit the found region
> in a tracepoint or print a warning, so I would just be in favor of
> deleting the debug message altogether, or at most just trimming to the
> "mapped in endpoint" one.

Yeah, probably went overboard here. Previously was offering the name
of the region so wanted to keep same - but seeing as this is a debug
only message, I think the endpoint alone, as you suggest is good.

Thanks for the review,
Alison
diff mbox series

Patch

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 3c2b6144be23..88051bb673bf 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2688,22 +2688,33 @@  static int __cxl_dpa_to_region(struct device *dev, void *arg)
 {
 	struct cxl_dpa_to_region_context *ctx = arg;
 	struct cxl_endpoint_decoder *cxled;
+	struct cxl_region *cxlr;
 	u64 dpa = ctx->dpa;
 
 	if (!is_endpoint_decoder(dev))
 		return 0;
 
 	cxled = to_cxl_endpoint_decoder(dev);
-	if (!cxled->dpa_res || !resource_size(cxled->dpa_res))
+	if (!cxled || !cxled->dpa_res || !resource_size(cxled->dpa_res))
 		return 0;
 
 	if (dpa > cxled->dpa_res->end || dpa < cxled->dpa_res->start)
 		return 0;
 
-	dev_dbg(dev, "dpa:0x%llx mapped in region:%s\n", dpa,
-		dev_name(&cxled->cxld.region->dev));
+	/*
+	 * Stop the region search (return 1) when an endpoint mapping is
+	 * found. The region may not be fully constructed so offering
+	 * the cxlr in the context structure is not guaranteed.
+	 */
+	cxlr = cxled->cxld.region;
+	if (cxlr)
+		dev_dbg(dev, "dpa:0x%llx mapped in region:%s\n", dpa,
+			dev_name(&cxlr->dev));
+	else
+		dev_dbg(dev, "dpa:0x%llx mapped in endpoint:%s\n", dpa,
+			dev_name(dev));
 
-	ctx->cxlr = cxled->cxld.region;
+	ctx->cxlr = cxlr;
 
 	return 1;
 }