diff mbox series

[v2] cxl/region: Fix null pointer dereference in region lookup

Message ID 20240605021928.223287-1-alison.schofield@intel.com
State New, archived
Headers show
Series [v2] cxl/region: Fix null pointer dereference in region lookup | expand

Commit Message

Alison Schofield June 5, 2024, 2:19 a.m. UTC
From: Alison Schofield <alison.schofield@intel.com>

cxl_dpa_to_region() looks up a region based on a memdev and DPA.
It wrongly assumes an endpoint found mapping the DPA is also of
a fully assembled region. When not true it leads to a null pointer
dereference looking up the region name.

BUG: kernel NULL pointer dereference, address: 0000000000000050
RIP: 0010:__cxl_dpa_to_region+0x8c/0xc0 [cxl_core]
Call Trace:
 <TASK>
 ? show_regs+0x5f/0x70
 ? __die+0x1f/0x70
 ? page_fault_oops+0x14b/0x430
 ? __cxl_dpa_to_region+0x8c/0xc0 [cxl_core]
 ? search_exception_tables+0x5b/0x60
 ? fixup_exception+0x22/0x300
 ? kernelmode_fixup_or_oops.constprop.0+0x5a/0x70
 ? __bad_area_nosemaphore+0x166/0x230
 ? up_read+0x43/0x90
 ? bad_area_nosemaphore+0x11/0x20
 ? do_user_addr_fault+0x2cb/0x6b0
 ? find_held_lock+0x31/0x90
 ? exc_page_fault+0x6e/0x220
 ? asm_exc_page_fault+0x27/0x30
 ? __cxl_dpa_to_region+0x8c/0xc0 [cxl_core]
 ? __cxl_dpa_to_region+0x35/0xc0 [cxl_core]
 ? __pfx___cxl_dpa_to_region+0x10/0x10 [cxl_core]
 device_for_each_child+0x4a/0x70
 cxl_dpa_to_region+0x61/0x70 [cxl_core]
 cxl_inject_poison+0xde/0x1e0 [cxl_core]
 cxl_debugfs_poison_inject+0x9/0x10 [cxl_mem]

This appears during testing of region lookup after a failure to
assemble a BIOS defined region or if the lookup raced with the
assembly of the BIOS defined region.

Failure to clean up BIOS defined regions that fail assembly is an
issue in itself and a fix to that problem will alleviate some of
the impact. It will not alleviate the race condition so let's harden
this path.

The behavior change is that the kernel oops due to a null pointer
dereference is replaced with a dev_dbg() message noting that an
endpoint was mapped.

Additional comments are added so that future users of this function
can more clearly understand what it provides.

Fixes: 0a105ab28a4d ("cxl/memdev: Warn of poison inject or clear to a mapped region")
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---

Changes in v2:
- emit endpoint dev_dbg() message only, rm cxlr-ness (Dan)
- rm redundant null cxled check (Dan)
- add stack trace to commit log (Dan)
- s/Avoid/Fix in commit msg (Dan)
- add Reviewed-by Tag (Jonathan)

 drivers/cxl/core/region.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)


base-commit: 49ba7b515c4c0719b866d16f068e62d16a8a3dd1

Comments

Jonathan Cameron June 5, 2024, 12:18 p.m. UTC | #1
On Tue,  4 Jun 2024 19:19:28 -0700
alison.schofield@intel.com wrote:

> From: Alison Schofield <alison.schofield@intel.com>
> 
> cxl_dpa_to_region() looks up a region based on a memdev and DPA.
> It wrongly assumes an endpoint found mapping the DPA is also of
> a fully assembled region. When not true it leads to a null pointer
> dereference looking up the region name.
> 
> BUG: kernel NULL pointer dereference, address: 0000000000000050
> RIP: 0010:__cxl_dpa_to_region+0x8c/0xc0 [cxl_core]
> Call Trace:
>  <TASK>
>  ? show_regs+0x5f/0x70
>  ? __die+0x1f/0x70
>  ? page_fault_oops+0x14b/0x430
>  ? __cxl_dpa_to_region+0x8c/0xc0 [cxl_core]
>  ? search_exception_tables+0x5b/0x60
>  ? fixup_exception+0x22/0x300
>  ? kernelmode_fixup_or_oops.constprop.0+0x5a/0x70
>  ? __bad_area_nosemaphore+0x166/0x230
>  ? up_read+0x43/0x90
>  ? bad_area_nosemaphore+0x11/0x20
>  ? do_user_addr_fault+0x2cb/0x6b0
>  ? find_held_lock+0x31/0x90
>  ? exc_page_fault+0x6e/0x220
>  ? asm_exc_page_fault+0x27/0x30
>  ? __cxl_dpa_to_region+0x8c/0xc0 [cxl_core]
>  ? __cxl_dpa_to_region+0x35/0xc0 [cxl_core]
>  ? __pfx___cxl_dpa_to_region+0x10/0x10 [cxl_core]
>  device_for_each_child+0x4a/0x70
>  cxl_dpa_to_region+0x61/0x70 [cxl_core]
>  cxl_inject_poison+0xde/0x1e0 [cxl_core]
>  cxl_debugfs_poison_inject+0x9/0x10 [cxl_mem]
> 
> This appears during testing of region lookup after a failure to
> assemble a BIOS defined region or if the lookup raced with the
> assembly of the BIOS defined region.
> 
> Failure to clean up BIOS defined regions that fail assembly is an
> issue in itself and a fix to that problem will alleviate some of
> the impact. It will not alleviate the race condition so let's harden
> this path.
> 
> The behavior change is that the kernel oops due to a null pointer
> dereference is replaced with a dev_dbg() message noting that an
> endpoint was mapped.
> 
> Additional comments are added so that future users of this function
> can more clearly understand what it provides.
> 
> Fixes: 0a105ab28a4d ("cxl/memdev: Warn of poison inject or clear to a mapped region")
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

This simplified version works for me so keep the tag... However given the spirit of
the patch changed rather a lot would have been reasonable to drop the RB.

> ---
> 
> Changes in v2:
> - emit endpoint dev_dbg() message only, rm cxlr-ness (Dan)
> - rm redundant null cxled check (Dan)
> - add stack trace to commit log (Dan)
> - s/Avoid/Fix in commit msg (Dan)
> - add Reviewed-by Tag (Jonathan)
> 
>  drivers/cxl/core/region.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 3c2b6144be23..d9819650c529 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2700,9 +2700,13 @@ static int __cxl_dpa_to_region(struct device *dev, void *arg)
>  	if (dpa > cxled->dpa_res->end || dpa < cxled->dpa_res->start)
>  		return 0;
>  
> -	dev_dbg(dev, "dpa:0x%llx mapped in region:%s\n", dpa,
> -		dev_name(&cxled->cxld.region->dev));
> -
> +	/*
> +	 * Stop the region search (return 1) when an endpoint mapping is
> +	 * found. The region may not be fully constructed so offering
> +	 * the cxlr in the context structure is not guaranteed.
> +	 */
> +	dev_dbg(dev, "dpa:0x%llx mapped in endpoint:%s\n", dpa,
> +		dev_name(dev));
>  	ctx->cxlr = cxled->cxld.region;
>  
>  	return 1;
> 
> base-commit: 49ba7b515c4c0719b866d16f068e62d16a8a3dd1
Alison Schofield June 5, 2024, 3:31 p.m. UTC | #2
On Wed, Jun 05, 2024 at 01:18:35PM +0100, Jonathan Cameron wrote:
> On Tue,  4 Jun 2024 19:19:28 -0700
> alison.schofield@intel.com wrote:
> 
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > cxl_dpa_to_region() looks up a region based on a memdev and DPA.
> > It wrongly assumes an endpoint found mapping the DPA is also of
> > a fully assembled region. When not true it leads to a null pointer
> > dereference looking up the region name.
> > 
> > BUG: kernel NULL pointer dereference, address: 0000000000000050
> > RIP: 0010:__cxl_dpa_to_region+0x8c/0xc0 [cxl_core]
> > Call Trace:
> >  <TASK>
> >  ? show_regs+0x5f/0x70
> >  ? __die+0x1f/0x70
> >  ? page_fault_oops+0x14b/0x430
> >  ? __cxl_dpa_to_region+0x8c/0xc0 [cxl_core]
> >  ? search_exception_tables+0x5b/0x60
> >  ? fixup_exception+0x22/0x300
> >  ? kernelmode_fixup_or_oops.constprop.0+0x5a/0x70
> >  ? __bad_area_nosemaphore+0x166/0x230
> >  ? up_read+0x43/0x90
> >  ? bad_area_nosemaphore+0x11/0x20
> >  ? do_user_addr_fault+0x2cb/0x6b0
> >  ? find_held_lock+0x31/0x90
> >  ? exc_page_fault+0x6e/0x220
> >  ? asm_exc_page_fault+0x27/0x30
> >  ? __cxl_dpa_to_region+0x8c/0xc0 [cxl_core]
> >  ? __cxl_dpa_to_region+0x35/0xc0 [cxl_core]
> >  ? __pfx___cxl_dpa_to_region+0x10/0x10 [cxl_core]
> >  device_for_each_child+0x4a/0x70
> >  cxl_dpa_to_region+0x61/0x70 [cxl_core]
> >  cxl_inject_poison+0xde/0x1e0 [cxl_core]
> >  cxl_debugfs_poison_inject+0x9/0x10 [cxl_mem]
> > 
> > This appears during testing of region lookup after a failure to
> > assemble a BIOS defined region or if the lookup raced with the
> > assembly of the BIOS defined region.
> > 
> > Failure to clean up BIOS defined regions that fail assembly is an
> > issue in itself and a fix to that problem will alleviate some of
> > the impact. It will not alleviate the race condition so let's harden
> > this path.
> > 
> > The behavior change is that the kernel oops due to a null pointer
> > dereference is replaced with a dev_dbg() message noting that an
> > endpoint was mapped.
> > 
> > Additional comments are added so that future users of this function
> > can more clearly understand what it provides.
> > 
> > Fixes: 0a105ab28a4d ("cxl/memdev: Warn of poison inject or clear to a mapped region")
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> This simplified version works for me so keep the tag... However given the spirit of
> the patch changed rather a lot would have been reasonable to drop the RB.

Understand, sorry for the oversight.
Thanks for the second review too.
-- Alison
diff mbox series

Patch

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 3c2b6144be23..d9819650c529 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2700,9 +2700,13 @@  static int __cxl_dpa_to_region(struct device *dev, void *arg)
 	if (dpa > cxled->dpa_res->end || dpa < cxled->dpa_res->start)
 		return 0;
 
-	dev_dbg(dev, "dpa:0x%llx mapped in region:%s\n", dpa,
-		dev_name(&cxled->cxld.region->dev));
-
+	/*
+	 * Stop the region search (return 1) when an endpoint mapping is
+	 * found. The region may not be fully constructed so offering
+	 * the cxlr in the context structure is not guaranteed.
+	 */
+	dev_dbg(dev, "dpa:0x%llx mapped in endpoint:%s\n", dpa,
+		dev_name(dev));
 	ctx->cxlr = cxled->cxld.region;
 
 	return 1;