Message ID | 169696311899.1171696.7812961484055097837.stgit@bgt-140510-bm03.eng.stellus.in |
---|---|
State | Superseded |
Series | cxl/region: don't try to cleanup after cxl_region_setup_targets() fails |
On Tue, Oct 10, 2023 at 06:38:39PM +0000, Jim Harris wrote:
> Patch 5e42bcbc ("cxl/region: decrement ->nr_targets on error in
> cxl_region_attach()") tried to avoid 'eiw' initialization errors when
> ->nr_targets exceeded 16, by just decrementing ->nr_targets when
> cxl_region_setup_targets() failed.

I mean that's what I wrote but I'm fairly sure that I was concerned about
->nr_targets getting incremented to an invalid value.

drivers/cxl/core/region.c
        p->targets[pos] = cxled;
        ^^^^^^^^^^^^^^^
This array has CXL_DECODER_MAX_INTERLEAVE (16) elements.

        cxled->pos = pos;
        p->nr_targets++;

        if (p->nr_targets == p->interleave_ways) {
                             ^^^^^^^^^^^^^^^^^^
This is how many we want, but it's capped at 16 so we don't go over.
Like I guess we add one at a time until we hit the max and then when
we get everything added

                rc = cxl_region_setup_targets(cxlr);

Then we register stuff.  So if we decrement and try to attach another
region then my idea was that it would write over the last element in the
array.  But if we don't have the decrement and we try to attach another
region it will go beyond the end of the array.

                if (rc)
                        goto err_decrement;
                p->state = CXL_CONFIG_ACTIVE;
        }

        cxled->cxld.interleave_ways = p->interleave_ways;
        cxled->cxld.interleave_granularity = p->interleave_granularity;
        cxled->cxld.hpa_range = (struct range) {
                .start = p->res->start,
                .end = p->res->end,
        };

        return 0;

err_decrement:
        p->nr_targets--;
        cxled->pos = -1;
        p->targets[pos] = NULL;
        return rc;
}

But I was just going from static analysis and code review and not
testing and obviously you have tested this.  A simple fix for my
concern would be to do this:

regards,
dan carpenter

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 6d63b8798c29..5948c4a01745 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1649,6 +1649,11 @@ static int cxl_region_attach(struct cxl_region *cxlr,
 		return -ENODEV;
 	}
 
+	if (p->nr_targets >= p->interleave_ways) {
+		dev_dbg(&cxlr->dev, "%s too many regions\n", dev_name(&cxled->cxld.dev));
+		return -EINVAL;
+	}
+
 	/* all full of members, or interleave config not established? */
 	if (p->state > CXL_CONFIG_INTERLEAVE_ACTIVE) {
 		dev_dbg(&cxlr->dev, "region already active\n");
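
The effect of the proposed guard can be sketched in a small userspace model. This is only an illustration under stated assumptions, not the driver code: `struct model_region`, `model_attach()` and the `setup_rc` parameter are hypothetical names, and the model indexes the array by `nr_targets` rather than by the real `pos` value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_MAX_INTERLEAVE 16	/* mirrors CXL_DECODER_MAX_INTERLEAVE (16) from the thread */

/* Hypothetical stand-in for the region parameters discussed above. */
struct model_region {
	int interleave_ways;			/* how many targets the region wants */
	int nr_targets;				/* how many are attached so far */
	const void *targets[MODEL_MAX_INTERLEAVE];
};

/*
 * Model of the attach path with the proposed guard: refuse the attach
 * before touching targets[] once the region already holds
 * interleave_ways targets. setup_rc stands in for the return value of
 * cxl_region_setup_targets().
 */
static int model_attach(struct model_region *p, const void *target, int setup_rc)
{
	if (p->nr_targets >= p->interleave_ways)
		return -EINVAL;

	/* Assumption: interleave_ways <= 16, so the guard keeps this in bounds. */
	assert(p->nr_targets < MODEL_MAX_INTERLEAVE);
	p->targets[p->nr_targets] = target;
	p->nr_targets++;

	/* No decrement on failure: the error is returned as-is. */
	if (p->nr_targets == p->interleave_ways && setup_rc)
		return setup_rc;

	return 0;
}
```

In this model a region with `interleave_ways = 2` whose second attach fails in setup keeps `nr_targets == 2`, and a third attach is rejected with `-EINVAL` instead of writing past the populated slots.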
> On Oct 11, 2023, at 7:04 AM, Dan Carpenter <dan.carpenter@linaro.org> wrote:
>
> On Tue, Oct 10, 2023 at 06:38:39PM +0000, Jim Harris wrote:
>> Patch 5e42bcbc ("cxl/region: decrement ->nr_targets on error in
>> cxl_region_attach()") tried to avoid 'eiw' initialization errors when
>> ->nr_targets exceeded 16, by just decrementing ->nr_targets when
>> cxl_region_setup_targets() failed.
>
> I mean that's what I wrote but I'm fairly sure that I was concerned about
> ->nr_targets getting incremented to an invalid value.
>
> drivers/cxl/core/region.c
>         p->targets[pos] = cxled;
>         ^^^^^^^^^^^^^^^
> This array has CXL_DECODER_MAX_INTERLEAVE (16) elements.

Agreed, we need to guard against the array overflow too.

>
> But I was just going from static analysis and code review and not
> testing and obviously you have tested this.  A simple fix for my
> concern would be to do this:
>
> regards,
> dan carpenter
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 6d63b8798c29..5948c4a01745 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1649,6 +1649,11 @@ static int cxl_region_attach(struct cxl_region *cxlr,
> 		return -ENODEV;
> 	}
>
> +	if (p->nr_targets >= p->interleave_ways) {
> +		dev_dbg(&cxlr->dev, "%s too many regions\n", dev_name(&cxled->cxld.dev));
> +		return -EINVAL;
> +	}
> +
> 	/* all full of members, or interleave config not established? */
> 	if (p->state > CXL_CONFIG_INTERLEAVE_ACTIVE) {
> 		dev_dbg(&cxlr->dev, "region already active\n");

I’ll push a v2. I had to convince myself that we didn’t also need a
comparison against CXL_DECODER_MAX_INTERLEAVE. But interleave_ways_store()
will fail with a value > 16 via the ways_to_eiw() call, so the
p->interleave_ways check is sufficient.
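
The reasoning that a single comparison against p->interleave_ways is enough can be checked with a rough userspace model. Everything here is a sketch: `model_check_ways()` and `model_next_index()` are hypothetical helpers, and the set of accepted ways (powers of two up to 16, plus 3, 6 and 12) is an assumption about what ways_to_eiw() accepts. The only property taken from the thread is that values above 16 are rejected.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_MAX_INTERLEAVE 16	/* mirrors CXL_DECODER_MAX_INTERLEAVE (16) */

static bool is_pow2(unsigned int v)
{
	return v && !(v & (v - 1));
}

/* Hypothetical model of the validation done when interleave_ways is stored. */
static int model_check_ways(unsigned int ways)
{
	if (ways > MODEL_MAX_INTERLEAVE)
		return -EINVAL;		/* the property the thread relies on */
	if (is_pow2(ways))
		return 0;		/* assumed valid: 1, 2, 4, 8, 16 */
	if (ways % 3 == 0 && is_pow2(ways / 3))
		return 0;		/* assumed valid: 3, 6, 12 */
	return -EINVAL;
}

/*
 * Slot the next attach would use, or -EINVAL when either the stored
 * ways value or the nr_targets guard rejects it.
 */
static int model_next_index(unsigned int ways, unsigned int nr_targets)
{
	if (model_check_ways(ways) || nr_targets >= ways)
		return -EINVAL;

	/* Follows from ways <= 16 and nr_targets < ways. */
	assert(nr_targets < MODEL_MAX_INTERLEAVE);
	return (int)nr_targets;
}
```

Because every accepted ways value is at most 16, any `nr_targets` that passes `nr_targets < ways` is also a valid index into a 16-element array, so the model never needs a second comparison against the array size.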
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 6d63b8798c29..315ca1640e06 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1750,7 +1750,7 @@ static int cxl_region_attach(struct cxl_region *cxlr,
 	if (p->nr_targets == p->interleave_ways) {
 		rc = cxl_region_setup_targets(cxlr);
 		if (rc)
-			goto err_decrement;
+			return rc;
 		p->state = CXL_CONFIG_ACTIVE;
 	}
 
@@ -1762,12 +1762,6 @@ static int cxl_region_attach(struct cxl_region *cxlr,
 	};
 
 	return 0;
-
-err_decrement:
-	p->nr_targets--;
-	cxled->pos = -1;
-	p->targets[pos] = NULL;
-	return rc;
 }
 
 static int cxl_region_detach(struct cxl_endpoint_decoder *cxled)
Patch 5e42bcbc ("cxl/region: decrement ->nr_targets on error in
cxl_region_attach()") tried to avoid 'eiw' initialization errors when
->nr_targets exceeded 16, by just decrementing ->nr_targets when
cxl_region_setup_targets() failed.

Patch 86987c76 ("cxl/region: Cleanup target list on attach error")
extended that cleanup to also clear cxled->pos and p->targets[pos].

The initialization error was incidentally fixed separately by patch
8d4285425 ("cxl/region: Fix port setup uninitialized variable warnings"),
which was merged a few days after 5e42bcbc.

But now the original cleanup when cxl_region_setup_targets() fails
prevents endpoint and switch decoder resources from being reused:

1) the cleanup does not set the decoder's region to NULL, which results
   in future dpa_size_store() calls returning -EBUSY
2) the decoder is not properly freed, which results in future commit
   errors associated with the upstream switch

Now that the initialization errors were fixed separately, the proper
cleanup for this case is to just return immediately. Then the resources
associated with this target get cleaned up as normal when the failed
region is deleted.

Tested by trying to create an invalid region for a 2 switch * 2 endpoint
topology, and then following up with creating a valid region.

Signed-off-by: Jim Harris <jim.harris@samsung.com>
---
 drivers/cxl/core/region.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)
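
The first reuse problem listed in the commit message can be illustrated with a toy model. This is a sketch under stated assumptions, not the driver's data structures: `struct model_decoder`, `model_resize()`, `model_partial_cleanup()` and `model_region_delete()` are hypothetical, and the only behavior borrowed from the text is that a decoder whose region is still set gets -EBUSY on a later size change, while deleting the failed region releases it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an endpoint decoder. */
struct model_decoder {
	const void *region;	/* non-NULL while the decoder belongs to a region */
	int pos;		/* position in the region's target list, -1 if none */
};

/* Stand-in for the check the commit message attributes to dpa_size_store(). */
static int model_resize(const struct model_decoder *d)
{
	return d->region ? -EBUSY : 0;
}

/* Models the removed error path: resets pos but leaves region set. */
static void model_partial_cleanup(struct model_decoder *d)
{
	d->pos = -1;
}

/* Models the normal teardown that runs when the failed region is deleted. */
static void model_region_delete(struct model_decoder *d)
{
	d->pos = -1;
	d->region = NULL;
}

/* Walks through both outcomes described in the commit message. */
static void model_scenario(void)
{
	int region_token;	/* only its address is used, as a fake region */
	struct model_decoder d = { .region = &region_token, .pos = 0 };

	model_partial_cleanup(&d);
	assert(model_resize(&d) == -EBUSY);	/* decoder cannot be reused */

	model_region_delete(&d);
	assert(model_resize(&d) == 0);		/* decoder is reusable again */
}
```

The point of the sketch is only that a partial cleanup leaves the decoder in a state that looks attached, whereas returning the error and letting region deletion do the teardown leaves nothing half-released.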