Message ID | 20240605021928.223287-1-alison.schofield@intel.com |
---|---|
State | New, archived |
Headers | show |
Series | [v2] cxl/region: Fix null pointer dereference in region lookup | expand |
On Tue, 4 Jun 2024 19:19:28 -0700 alison.schofield@intel.com wrote: > From: Alison Schofield <alison.schofield@intel.com> > > cxl_dpa_to_region() looks up a region based on a memdev and DPA. > It wrongly assumes an endpoint found mapping the DPA is also of > a fully assembled region. When not true it leads to a null pointer > dereference looking up the region name. > > BUG: kernel NULL pointer dereference, address: 0000000000000050 > RIP: 0010:__cxl_dpa_to_region+0x8c/0xc0 [cxl_core] > Call Trace: > <TASK> > ? show_regs+0x5f/0x70 > ? __die+0x1f/0x70 > ? page_fault_oops+0x14b/0x430 > ? __cxl_dpa_to_region+0x8c/0xc0 [cxl_core] > ? search_exception_tables+0x5b/0x60 > ? fixup_exception+0x22/0x300 > ? kernelmode_fixup_or_oops.constprop.0+0x5a/0x70 > ? __bad_area_nosemaphore+0x166/0x230 > ? up_read+0x43/0x90 > ? bad_area_nosemaphore+0x11/0x20 > ? do_user_addr_fault+0x2cb/0x6b0 > ? find_held_lock+0x31/0x90 > ? exc_page_fault+0x6e/0x220 > ? asm_exc_page_fault+0x27/0x30 > ? __cxl_dpa_to_region+0x8c/0xc0 [cxl_core] > ? __cxl_dpa_to_region+0x35/0xc0 [cxl_core] > ? __pfx___cxl_dpa_to_region+0x10/0x10 [cxl_core] > device_for_each_child+0x4a/0x70 > cxl_dpa_to_region+0x61/0x70 [cxl_core] > cxl_inject_poison+0xde/0x1e0 [cxl_core] > cxl_debugfs_poison_inject+0x9/0x10 [cxl_mem] > > This appears during testing of region lookup after a failure to > assemble a BIOS defined region or if the lookup raced with the > assembly of the BIOS defined region. > > Failure to clean up BIOS defined regions that fail assembly is an > issue in itself and a fix to that problem will alleviate some of > the impact. It will not alleviate the race condition so let's harden > this path. > > The behavior change is that the kernel oops due to a null pointer > dereference is replaced with a dev_dbg() message noting that an > endpoint was mapped. > > Additional comments are added so that future users of this function > can more clearly understand what it provides. > > Fixes: 0a105ab28a4d ("cxl/memdev: Warn of poison inject or clear to a mapped region") > Signed-off-by: Alison Schofield <alison.schofield@intel.com> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> This simplified version works for me so keep the tag... However given the spirit of the patch changed rather a lot would have been reasonable to drop the RB. > --- > > Changes in v2: > - emit endpoint dev_dbg() message only, rm cxlr-ness (Dan) > - rm redundant null cxled check (Dan) > - add stack trace to commit log (Dan) > - s/Avoid/Fix in commit msg (Dan) > - add Reviewed-by Tag (Jonathan) > > drivers/cxl/core/region.c | 10 +++++++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c > index 3c2b6144be23..d9819650c529 100644 > --- a/drivers/cxl/core/region.c > +++ b/drivers/cxl/core/region.c > @@ -2700,9 +2700,13 @@ static int __cxl_dpa_to_region(struct device *dev, void *arg) > if (dpa > cxled->dpa_res->end || dpa < cxled->dpa_res->start) > return 0; > > - dev_dbg(dev, "dpa:0x%llx mapped in region:%s\n", dpa, > - dev_name(&cxled->cxld.region->dev)); > - > + /* > + * Stop the region search (return 1) when an endpoint mapping is > + * found. The region may not be fully constructed so offering > + * the cxlr in the context structure is not guaranteed. > + */ > + dev_dbg(dev, "dpa:0x%llx mapped in endpoint:%s\n", dpa, > + dev_name(dev)); > ctx->cxlr = cxled->cxld.region; > > return 1; > > base-commit: 49ba7b515c4c0719b866d16f068e62d16a8a3dd1
On Wed, Jun 05, 2024 at 01:18:35PM +0100, Jonathan Cameron wrote: > On Tue, 4 Jun 2024 19:19:28 -0700 > alison.schofield@intel.com wrote: > > > From: Alison Schofield <alison.schofield@intel.com> > > > > cxl_dpa_to_region() looks up a region based on a memdev and DPA. > > It wrongly assumes an endpoint found mapping the DPA is also of > > a fully assembled region. When not true it leads to a null pointer > > dereference looking up the region name. > > > > BUG: kernel NULL pointer dereference, address: 0000000000000050 > > RIP: 0010:__cxl_dpa_to_region+0x8c/0xc0 [cxl_core] > > Call Trace: > > <TASK> > > ? show_regs+0x5f/0x70 > > ? __die+0x1f/0x70 > > ? page_fault_oops+0x14b/0x430 > > ? __cxl_dpa_to_region+0x8c/0xc0 [cxl_core] > > ? search_exception_tables+0x5b/0x60 > > ? fixup_exception+0x22/0x300 > > ? kernelmode_fixup_or_oops.constprop.0+0x5a/0x70 > > ? __bad_area_nosemaphore+0x166/0x230 > > ? up_read+0x43/0x90 > > ? bad_area_nosemaphore+0x11/0x20 > > ? do_user_addr_fault+0x2cb/0x6b0 > > ? find_held_lock+0x31/0x90 > > ? exc_page_fault+0x6e/0x220 > > ? asm_exc_page_fault+0x27/0x30 > > ? __cxl_dpa_to_region+0x8c/0xc0 [cxl_core] > > ? __cxl_dpa_to_region+0x35/0xc0 [cxl_core] > > ? __pfx___cxl_dpa_to_region+0x10/0x10 [cxl_core] > > device_for_each_child+0x4a/0x70 > > cxl_dpa_to_region+0x61/0x70 [cxl_core] > > cxl_inject_poison+0xde/0x1e0 [cxl_core] > > cxl_debugfs_poison_inject+0x9/0x10 [cxl_mem] > > > > This appears during testing of region lookup after a failure to > > assemble a BIOS defined region or if the lookup raced with the > > assembly of the BIOS defined region. > > > > Failure to clean up BIOS defined regions that fail assembly is an > > issue in itself and a fix to that problem will alleviate some of > > the impact. It will not alleviate the race condition so let's harden > > this path. > > > > The behavior change is that the kernel oops due to a null pointer > > dereference is replaced with a dev_dbg() message noting that an > > endpoint was mapped. > > > > Additional comments are added so that future users of this function > > can more clearly understand what it provides. > > > > Fixes: 0a105ab28a4d ("cxl/memdev: Warn of poison inject or clear to a mapped region") > > Signed-off-by: Alison Schofield <alison.schofield@intel.com> > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > > This simplified version works for me so keep the tag... However given the spirit of > the patch changed rather a lot would have been reasonable to drop the RB. Understand, sorry for the oversight. Thanks for the second review too. -- Alison
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 3c2b6144be23..d9819650c529 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -2700,9 +2700,13 @@ static int __cxl_dpa_to_region(struct device *dev, void *arg) if (dpa > cxled->dpa_res->end || dpa < cxled->dpa_res->start) return 0; - dev_dbg(dev, "dpa:0x%llx mapped in region:%s\n", dpa, - dev_name(&cxled->cxld.region->dev)); - + /* + * Stop the region search (return 1) when an endpoint mapping is + * found. The region may not be fully constructed so offering + * the cxlr in the context structure is not guaranteed. + */ + dev_dbg(dev, "dpa:0x%llx mapped in endpoint:%s\n", dpa, + dev_name(dev)); ctx->cxlr = cxled->cxld.region; return 1;