Message ID | 20221215170909.2650271-1-fan.ni@samsung.com |
---|---|
State | Accepted |
Commit | 01d2cb2593b17b36e69a697567cb493f0a096c73 |
Series | cxl/region: Fix null pointer dereference for resetting decoder |
On Thu, 15 Dec 2022 17:09:14 +0000
Fan Ni <fan.ni@samsung.com> wrote:

> Not all decoders have a reset callback.
>
> The CXL specification allows a host bridge with a single root port to
> have no explicit HDM decoders. Currently the region driver assumes there
> are none. As such the CXL core creates a special pass through decoder
> instance without a commit/reset callback.
>
> Prior to this patch, the ->reset() callback was called unconditionally when
> calling cxl_region_decode_reset. Thus a configuration with 1 Host Bridge,
> 1 Root Port, and one directly attached CXL type 3 device or multiple CXL
> type 3 devices attached to downstream ports of a switch can cause a null
> pointer dereference.
>
> Before the fix, a kernel crash was observed when we destroy the region, and
> a pass through decoder is reset.
>
> The issue can be reproduced as below,
>     1) create a region with a CXL setup which includes a HB with a
>     single root port under which a memdev is attached directly.
>     2) destroy the region with cxl destroy-region regionX -f.
>
> Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
> Signed-off-by: Fan Ni <fan.ni@samsung.com>

Explanation seems correct to me. Only question (and it's one for the
Maintainers) is whether they prefer optionality here or a stub reset()
implementation for the pass through decoder.

either way
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  drivers/cxl/core/region.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index f9ae5ad284ff..3931793a13ac 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -131,7 +131,7 @@ static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)
>  		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
>  		struct cxl_port *iter = cxled_to_port(cxled);
>  		struct cxl_ep *ep;
> -		int rc;
> +		int rc = 0;
>
>  		while (!is_cxl_root(to_cxl_port(iter->dev.parent)))
>  			iter = to_cxl_port(iter->dev.parent);
> @@ -143,7 +143,8 @@ static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)
>
>  			cxl_rr = cxl_rr_load(iter, cxlr);
>  			cxld = cxl_rr->decoder;
> -			rc = cxld->reset(cxld);
> +			if (cxld->reset)
> +				rc = cxld->reset(cxld);
>  			if (rc)
>  				return rc;
>  		}
> @@ -186,7 +187,8 @@ static int cxl_region_decode_commit(struct cxl_region *cxlr)
>  			     iter = ep->next, ep = cxl_ep_load(iter, cxlmd)) {
>  				cxl_rr = cxl_rr_load(iter, cxlr);
>  				cxld = cxl_rr->decoder;
> -				cxld->reset(cxld);
> +				if (cxld->reset)
> +					cxld->reset(cxld);
>  			}
>
>  			cxled->cxld.reset(&cxled->cxld);

On 12/15/22 10:09 AM, Fan Ni wrote:
> Not all decoders have a reset callback.
>
> The CXL specification allows a host bridge with a single root port to
> have no explicit HDM decoders. Currently the region driver assumes there
> are none. As such the CXL core creates a special pass through decoder
> instance without a commit/reset callback.
>
> Prior to this patch, the ->reset() callback was called unconditionally when
> calling cxl_region_decode_reset. Thus a configuration with 1 Host Bridge,
> 1 Root Port, and one directly attached CXL type 3 device or multiple CXL
> type 3 devices attached to downstream ports of a switch can cause a null
> pointer dereference.
>
> Before the fix, a kernel crash was observed when we destroy the region, and
> a pass through decoder is reset.
>
> The issue can be reproduced as below,
>     1) create a region with a CXL setup which includes a HB with a
>     single root port under which a memdev is attached directly.
>     2) destroy the region with cxl destroy-region regionX -f.
>
> Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
> Signed-off-by: Fan Ni <fan.ni@samsung.com>

Makes sense, especially with the emulated decoders coming w/o ->reset().

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>  drivers/cxl/core/region.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index f9ae5ad284ff..3931793a13ac 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -131,7 +131,7 @@ static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)
>  		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
>  		struct cxl_port *iter = cxled_to_port(cxled);
>  		struct cxl_ep *ep;
> -		int rc;
> +		int rc = 0;
>
>  		while (!is_cxl_root(to_cxl_port(iter->dev.parent)))
>  			iter = to_cxl_port(iter->dev.parent);
> @@ -143,7 +143,8 @@ static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)
>
>  			cxl_rr = cxl_rr_load(iter, cxlr);
>  			cxld = cxl_rr->decoder;
> -			rc = cxld->reset(cxld);
> +			if (cxld->reset)
> +				rc = cxld->reset(cxld);
>  			if (rc)
>  				return rc;
>  		}
> @@ -186,7 +187,8 @@ static int cxl_region_decode_commit(struct cxl_region *cxlr)
>  			     iter = ep->next, ep = cxl_ep_load(iter, cxlmd)) {
>  				cxl_rr = cxl_rr_load(iter, cxlr);
>  				cxld = cxl_rr->decoder;
> -				cxld->reset(cxld);
> +				if (cxld->reset)
> +					cxld->reset(cxld);
>  			}
>
>  			cxled->cxld.reset(&cxled->cxld);

On Thu, 15 Dec 2022, Fan Ni wrote:

>Not all decoders have a reset callback.
>
>The CXL specification allows a host bridge with a single root port to
>have no explicit HDM decoders. Currently the region driver assumes there
>are none. As such the CXL core creates a special pass through decoder
>instance without a commit/reset callback.
>
>Prior to this patch, the ->reset() callback was called unconditionally when
>calling cxl_region_decode_reset. Thus a configuration with 1 Host Bridge,
>1 Root Port, and one directly attached CXL type 3 device or multiple CXL
>type 3 devices attached to downstream ports of a switch can cause a null
>pointer dereference.
>
>Before the fix, a kernel crash was observed when we destroy the region, and
>a pass through decoder is reset.
>
>The issue can be reproduced as below,
>    1) create a region with a CXL setup which includes a HB with a
>    single root port under which a memdev is attached directly.
>    2) destroy the region with cxl destroy-region regionX -f.
>
>Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
>Signed-off-by: Fan Ni <fan.ni@samsung.com>

Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>

Jonathan Cameron wrote:
> On Thu, 15 Dec 2022 17:09:14 +0000
> Fan Ni <fan.ni@samsung.com> wrote:
>
> > Not all decoders have a reset callback.
> >
> > The CXL specification allows a host bridge with a single root port to
> > have no explicit HDM decoders. Currently the region driver assumes there
> > are none. As such the CXL core creates a special pass through decoder
> > instance without a commit/reset callback.
> >
> > Prior to this patch, the ->reset() callback was called unconditionally when
> > calling cxl_region_decode_reset. Thus a configuration with 1 Host Bridge,
> > 1 Root Port, and one directly attached CXL type 3 device or multiple CXL
> > type 3 devices attached to downstream ports of a switch can cause a null
> > pointer dereference.
> >
> > Before the fix, a kernel crash was observed when we destroy the region, and
> > a pass through decoder is reset.
> >
> > The issue can be reproduced as below,
> >     1) create a region with a CXL setup which includes a HB with a
> >     single root port under which a memdev is attached directly.
> >     2) destroy the region with cxl destroy-region regionX -f.
> >
> > Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
>
> Explanation seems correct to me. Only question (and it's one for the
> Maintainers) is whether they prefer optionality here or a stub reset()
> implementation for the pass through decoder.

Yeah, I think this fix as is works for the purposes of the -stable
backport and then a follow-on can add the optionality.

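For readers weighing the two options raised in the review, the "stub reset()" alternative can be sketched outside the kernel as follows. This is only an illustration under stated assumptions: `struct stub_decoder`, `stub_passthrough_reset()`, `stub_init_passthrough()` and `stub_reset_unconditional()` are hypothetical names and do not correspond to functions in drivers/cxl. The idea is that every decoder, including a pass through one, is given a callback, so callers may invoke it without a NULL check.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a decoder object. */
struct stub_decoder {
	int (*reset)(struct stub_decoder *d);
};

/* No-op reset: a pass through decoder has no hardware state to tear down. */
static int stub_passthrough_reset(struct stub_decoder *d)
{
	(void)d;
	return 0;
}

/* Install the stub at creation time so ->reset is never NULL. */
static void stub_init_passthrough(struct stub_decoder *d)
{
	d->reset = stub_passthrough_reset;
}

/* With the stub installed, an unconditional call is safe. */
static int stub_reset_unconditional(struct stub_decoder *d)
{
	return d->reset(d);
}
```

The trade-off is where the invariant lives: a stub keeps call sites simple but requires every decoder constructor to set the callback, whereas the accepted patch tolerates a NULL callback at each call site.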
On Thu, Dec 15, 2022 at 05:09:14PM +0000, Fan Ni wrote:
> Not all decoders have a reset callback.
>
> The CXL specification allows a host bridge with a single root port to
> have no explicit HDM decoders. Currently the region driver assumes there
> are none. As such the CXL core creates a special pass through decoder
> instance without a commit/reset callback.
>
> Prior to this patch, the ->reset() callback was called unconditionally when
> calling cxl_region_decode_reset. Thus a configuration with 1 Host Bridge,
> 1 Root Port, and one directly attached CXL type 3 device or multiple CXL
> type 3 devices attached to downstream ports of a switch can cause a null
> pointer dereference.
>
> Before the fix, a kernel crash was observed when we destroy the region, and
> a pass through decoder is reset.
>
> The issue can be reproduced as below,
>     1) create a region with a CXL setup which includes a HB with a
>     single root port under which a memdev is attached directly.
>     2) destroy the region with cxl destroy-region regionX -f.
>
> Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>  drivers/cxl/core/region.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index f9ae5ad284ff..3931793a13ac 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -131,7 +131,7 @@ static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)
>  		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
>  		struct cxl_port *iter = cxled_to_port(cxled);
>  		struct cxl_ep *ep;
> -		int rc;
> +		int rc = 0;
>
>  		while (!is_cxl_root(to_cxl_port(iter->dev.parent)))
>  			iter = to_cxl_port(iter->dev.parent);
> @@ -143,7 +143,8 @@ static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)
>
>  			cxl_rr = cxl_rr_load(iter, cxlr);
>  			cxld = cxl_rr->decoder;
> -			rc = cxld->reset(cxld);
> +			if (cxld->reset)
> +				rc = cxld->reset(cxld);
>  			if (rc)
>  				return rc;
>  		}
> @@ -186,7 +187,8 @@ static int cxl_region_decode_commit(struct cxl_region *cxlr)
>  			     iter = ep->next, ep = cxl_ep_load(iter, cxlmd)) {
>  				cxl_rr = cxl_rr_load(iter, cxlr);
>  				cxld = cxl_rr->decoder;
> -				cxld->reset(cxld);
> +				if (cxld->reset)
> +					cxld->reset(cxld);
>  			}
>
>  			cxled->cxld.reset(&cxled->cxld);
> --
> 2.25.1

Should we try to get this upstreamed in 6.2-final? Seems like a good
stable addition. Probably doesn't affect real hardware, but it certainly
affects QEMU.

Tested-by: Gregory Price <gregory.price@memverge.com>
Reviewed-by: Gregory Price <gregory.price@memverge.com>

Gregory Price wrote:
[..]
> Should we try to get this upstreamed in 6.2-final? Seems like a good
> stable addition. Probably doesn't affect real hardware, but it certainly
> affects QEMU.

Yes, that's the plan.

https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/commit/?h=fixes&id=01d2cb2593b1

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index f9ae5ad284ff..3931793a13ac 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -131,7 +131,7 @@ static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)
 		struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
 		struct cxl_port *iter = cxled_to_port(cxled);
 		struct cxl_ep *ep;
-		int rc;
+		int rc = 0;
 
 		while (!is_cxl_root(to_cxl_port(iter->dev.parent)))
 			iter = to_cxl_port(iter->dev.parent);
@@ -143,7 +143,8 @@ static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)
 
 			cxl_rr = cxl_rr_load(iter, cxlr);
 			cxld = cxl_rr->decoder;
-			rc = cxld->reset(cxld);
+			if (cxld->reset)
+				rc = cxld->reset(cxld);
 			if (rc)
 				return rc;
 		}
@@ -186,7 +187,8 @@ static int cxl_region_decode_commit(struct cxl_region *cxlr)
 			     iter = ep->next, ep = cxl_ep_load(iter, cxlmd)) {
 				cxl_rr = cxl_rr_load(iter, cxlr);
 				cxld = cxl_rr->decoder;
-				cxld->reset(cxld);
+				if (cxld->reset)
+					cxld->reset(cxld);
 			}
 
 			cxled->cxld.reset(&cxled->cxld);
Not all decoders have a reset callback.

The CXL specification allows a host bridge with a single root port to
have no explicit HDM decoders. Currently the region driver assumes there
are none. As such the CXL core creates a special pass through decoder
instance without a commit/reset callback.

Prior to this patch, the ->reset() callback was called unconditionally when
calling cxl_region_decode_reset. Thus a configuration with 1 Host Bridge,
1 Root Port, and one directly attached CXL type 3 device or multiple CXL
type 3 devices attached to downstream ports of a switch can cause a null
pointer dereference.

Before the fix, a kernel crash was observed when we destroy the region, and
a pass through decoder is reset.

The issue can be reproduced as below,
    1) create a region with a CXL setup which includes a HB with a
    single root port under which a memdev is attached directly.
    2) destroy the region with cxl destroy-region regionX -f.

Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
Signed-off-by: Fan Ni <fan.ni@samsung.com>
---
 drivers/cxl/core/region.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)
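
As a userspace illustration of the pattern this patch applies, the sketch below walks a chain of decoders and calls the reset callback only when one is set, with `rc` initialised to 0 so that a skipped callback counts as success. It is a minimal model under stated assumptions, not kernel code: `struct demo_decoder`, `demo_hdm_reset()` and `demo_decode_reset()` are hypothetical names chosen for the example, and the array stands in for the port hierarchy that the real driver walks.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a decoder with an optional callback. */
struct demo_decoder {
	int (*reset)(struct demo_decoder *d);	/* NULL for a pass through decoder */
	int reset_count;			/* how many times reset ran */
};

/* Example callback for a decoder that does have state to reset. */
static int demo_hdm_reset(struct demo_decoder *d)
{
	d->reset_count++;
	return 0;
}

/*
 * Reset every decoder in the chain, skipping those without a callback.
 * rc starts at 0 so a decoder with no ->reset is treated as success,
 * matching the "int rc = 0" change in the patch.
 */
static int demo_decode_reset(struct demo_decoder **chain, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int rc = 0;

		if (chain[i]->reset)
			rc = chain[i]->reset(chain[i]);
		if (rc)
			return rc;
	}
	return 0;
}
```

Without the NULL check, the first decoder lacking a callback would cause a call through a null function pointer, which is the crash described in the commit message.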