Message ID | 168696507968.3590522.14484000711718573626.stgit@dwillia2-xfh.jf.intel.com |
---|---|
State | Accepted |
Commit | adfe19738b71a893da62cb2e30bd6bdb4299ea67 |
Series | cxl/region: Cache management and region decode reset fixes |
On 6/16/23 18:24, Dan Williams wrote:
> Jonathan reports that failed attempts to reset a region (tear down its
> HDM decoder configuration) mistakenly advance the state of the region
> to "not committed". Revert to the previous state of the region on reset
> failure so that the reset can be re-attempted.
>
> Reported-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
> Closes: http://lore.kernel.org/r/20230316171441.0000205b@Huawei.com
> Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> [...]
On Fri, 16 Jun 2023 18:24:39 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> [...]

LGTM - though maybe even nicer if we can be pretty sure this will succeed
before trying it. (same comment as previous patch)

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> [...]
Jonathan Cameron wrote:
> On Fri, 16 Jun 2023 18:24:39 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
>
> > [...]
> LGTM - though maybe even nicer if we can be pretty sure this will succeed
> before trying it. (same comment as previous patch)

I had the same reaction, but satisfied myself that this is something
that userspace can manage, i.e. tooling can effectively predict when the
kernel will complain about this ordering situation and prevent it. In
other words, the only way this happens in practice is if userspace makes
a mistake. It is already the case that partially committed decoders
need to be tolerated by the platform, since setup and teardown are not
atomic. So I think 'cxl destroy-region' is where these follow-on smarts
belong.

> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Thanks for the collaboration as always.
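Jonathan's suggestion (be reasonably sure the reset will succeed before trying it) and Dan's point that tooling can predict the failure can be sketched with a toy model in C. It assumes the "ordering situation" is the HDM decoder rule that a port commits decoders in increasing id order and may only reset its most recently committed decoder. `struct port_model` and the `model_*` functions are hypothetical illustrations, not the kernel's or the `cxl` tool's API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of one port's HDM decoders. commit_end is the id of
 * the most recently committed decoder, or -1 when none are committed.
 * Assumption: this mirrors an in-order commit / reverse-order reset rule.
 */
struct port_model {
	int commit_end;
};

/* Commit decoder @id; only the next id in sequence is accepted. */
int model_commit(struct port_model *port, int id)
{
	assert(port);
	if (id != port->commit_end + 1)
		return -EBUSY;
	port->commit_end = id;
	return 0;
}

/* Reset decoder @id; only the most recently committed one is accepted. */
int model_reset(struct port_model *port, int id)
{
	assert(port);
	if (id != port->commit_end)
		return -EBUSY;
	port->commit_end = id - 1;
	return 0;
}

/*
 * Tool-side pre-check: predict, without changing any state, whether a
 * region that uses decoder ids[i] on ports[i] (for i < nr) can be reset.
 */
bool model_can_reset(const struct port_model *ports, const int *ids, int nr)
{
	for (int i = 0; i < nr; i++)
		if (ids[i] != ports[i].commit_end)
			return false;
	return true;
}
```

For example, with region A on decoder 0 and region B on decoder 1 of the same port, `model_can_reset()` returns false for A until B has been reset. A tool could therefore refuse the request, or tear down B first, instead of provoking the kernel-side reset failure that this patch recovers from.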
Jonathan reports that failed attempts to reset a region (tear down its
HDM decoder configuration) mistakenly advance the state of the region
to "not committed". Revert to the previous state of the region on reset
failure so that the reset can be re-attempted.

Reported-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Closes: http://lore.kernel.org/r/20230316171441.0000205b@Huawei.com
Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

```diff
---
 drivers/cxl/core/region.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 31f498f0fb3a..38db377e13f1 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -296,9 +296,11 @@ static ssize_t commit_store(struct device *dev, struct device_attribute *attr,
 	if (rc)
 		return rc;
 
-	if (commit)
+	if (commit) {
 		rc = cxl_region_decode_commit(cxlr);
-	else {
+		if (rc == 0)
+			p->state = CXL_CONFIG_COMMIT;
+	} else {
 		p->state = CXL_CONFIG_RESET_PENDING;
 		up_write(&cxl_region_rwsem);
 		device_release_driver(&cxlr->dev);
@@ -308,18 +310,20 @@ static ssize_t commit_store(struct device *dev, struct device_attribute *attr,
 		 * The lock was dropped, so need to revalidate that the reset is
 		 * still pending.
 		 */
-		if (p->state == CXL_CONFIG_RESET_PENDING)
+		if (p->state == CXL_CONFIG_RESET_PENDING) {
 			rc = cxl_region_decode_reset(cxlr, p->interleave_ways);
+			/*
+			 * Revert to committed since there may still be active
+			 * decoders associated with this region, or move forward
+			 * to active to mark the reset successful
+			 */
+			if (rc)
+				p->state = CXL_CONFIG_COMMIT;
+			else
+				p->state = CXL_CONFIG_ACTIVE;
+		}
 	}
 
-	if (rc)
-		goto out;
-
-	if (commit)
-		p->state = CXL_CONFIG_COMMIT;
-	else if (p->state == CXL_CONFIG_RESET_PENDING)
-		p->state = CXL_CONFIG_ACTIVE;
-
 out:
 	up_write(&cxl_region_rwsem);
 
```
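A small userspace model makes the state bug concrete by comparing the pre-patch and post-patch flow of `commit_store()`. It is a sketch, not kernel code. The `model_*` names are hypothetical, and both decode calls are replaced by one stub that fails on demand with `-EBUSY`. Locking and `device_release_driver()` are left out. The model also assumes that `commit_store()` (outside the hunks shown) returns early when the region already appears to be in the requested state, treating anything below `CXL_CONFIG_COMMIT` as already reset. The enum order is likewise assumed to follow the kernel's `ACTIVE < RESET_PENDING < COMMIT`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the region states touched by the patch. */
enum model_state {
	MODEL_ACTIVE,        /* stands in for CXL_CONFIG_ACTIVE */
	MODEL_RESET_PENDING, /* stands in for CXL_CONFIG_RESET_PENDING */
	MODEL_COMMIT,        /* stands in for CXL_CONFIG_COMMIT */
};

/* Stub for cxl_region_decode_{commit,reset}(): fails when asked to. */
static int model_decode(bool fail)
{
	return fail ? -EBUSY : 0;
}

/*
 * Assumed early-return check: a request for the state the region is
 * already in is a no-op, and any state below COMMIT counts as "reset".
 */
static bool model_already_in_state(enum model_state s, bool commit)
{
	return commit ? s >= MODEL_COMMIT : s < MODEL_COMMIT;
}

/* Pre-patch flow: state is only updated after a successful operation. */
int model_store_old(enum model_state *s, bool commit, bool fail)
{
	int rc = 0;

	assert(s);
	if (model_already_in_state(*s, commit))
		return 0;
	if (commit) {
		rc = model_decode(fail);
	} else {
		*s = MODEL_RESET_PENDING;
		/* the kernel drops and retakes the lock here, hence the recheck */
		if (*s == MODEL_RESET_PENDING)
			rc = model_decode(fail);
	}
	if (rc)
		return rc; /* a failed reset leaves *s at RESET_PENDING */
	if (commit)
		*s = MODEL_COMMIT;
	else if (*s == MODEL_RESET_PENDING)
		*s = MODEL_ACTIVE;
	return 0;
}

/* Post-patch flow: each branch settles the state itself. */
int model_store_new(enum model_state *s, bool commit, bool fail)
{
	int rc = 0;

	assert(s);
	if (model_already_in_state(*s, commit))
		return 0;
	if (commit) {
		rc = model_decode(fail);
		if (rc == 0)
			*s = MODEL_COMMIT;
	} else {
		*s = MODEL_RESET_PENDING;
		if (*s == MODEL_RESET_PENDING) {
			rc = model_decode(fail);
			/* revert on failure so the reset can be retried */
			*s = rc ? MODEL_COMMIT : MODEL_ACTIVE;
		}
	}
	return rc;
}
```

Driving both versions with a failing reset followed by a retry shows the difference. `model_store_old()` leaves the region in `MODEL_RESET_PENDING`, so the retry returns 0 without touching the decoders. `model_store_new()` falls back to `MODEL_COMMIT`, and the retry completes the reset, ending in `MODEL_ACTIVE`.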