Message ID | 20220128213150.1333552-8-jane.chu@oracle.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | DAX poison recovery | expand |
On Fri, Jan 28, 2022 at 02:31:50PM -0700, Jane Chu wrote: > + if (!bad_pmem) { > write_pmem(pmem_addr, page, page_off, len); > + } else { > + rc = pmem_clear_poison(pmem, pmem_off, len); > + if (rc == BLK_STS_OK) > + write_pmem(pmem_addr, page, page_off, len); > + else > + pr_warn("%s: failed to clear poison\n", > + __func__); This warning probably needs ratelimiting. Also this flow looks a little odd. I'd redo the whole function with a clear bad_mem case: if (unlikely(is_bad_pmem(&pmem->bb, sector, len))) { blk_status_t rc = pmem_clear_poison(pmem, pmem_off, len); if (rc != BLK_STS_OK) { pr_warn("%s: failed to clear poison\n", __func__); return rc; } } flush_dcache_page(page); write_pmem(pmem_addr, page, page_off, len); return BLK_STS_OK;
On 2/2/2022 5:28 AM, Christoph Hellwig wrote: > On Fri, Jan 28, 2022 at 02:31:50PM -0700, Jane Chu wrote: >> + if (!bad_pmem) { >> write_pmem(pmem_addr, page, page_off, len); >> + } else { >> + rc = pmem_clear_poison(pmem, pmem_off, len); >> + if (rc == BLK_STS_OK) >> + write_pmem(pmem_addr, page, page_off, len); >> + else >> + pr_warn("%s: failed to clear poison\n", >> + __func__); > > This warning probably needs ratelimiting. Will do, in case bad hardware is encountered, I can see lots of warnings. > > Also this flow looks a little odd. I'd redo the whole function with a > clear bad_mem case: > > if (unlikely(is_bad_pmem(&pmem->bb, sector, len))) { > blk_status_t rc = pmem_clear_poison(pmem, pmem_off, len); > > if (rc != BLK_STS_OK) { > pr_warn("%s: failed to clear poison\n", __func__); > return rc; > } > } > flush_dcache_page(page); > write_pmem(pmem_addr, page, page_off, len); > return BLK_STS_OK; > > This is much better, thanks for the suggestion! -jane
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index f2d6b34d48de..283890084d58 100644 --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/pmem.c @@ -187,10 +187,15 @@ static blk_status_t pmem_do_write(struct pmem_device *pmem, * after clear poison. */ flush_dcache_page(page); - write_pmem(pmem_addr, page, page_off, len); - if (unlikely(bad_pmem)) { - rc = pmem_clear_poison(pmem, pmem_off, len); + if (!bad_pmem) { write_pmem(pmem_addr, page, page_off, len); + } else { + rc = pmem_clear_poison(pmem, pmem_off, len); + if (rc == BLK_STS_OK) + write_pmem(pmem_addr, page, page_off, len); + else + pr_warn("%s: failed to clear poison\n", + __func__); } return rc;
Since poisoned page is marked as not-present, the first of the two back-to-back write_pmem() calls can only be made when there is no poison in the range, otherwise kernel Oops. Signed-off-by: Jane Chu <jane.chu@oracle.com> --- drivers/nvdimm/pmem.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)