Message ID | 20190127065421.10662-1-hlitz@ucsc.edu (mailing list archive) |
---|---|
State | New, archived |
Series | lightnvm: pblk: fix race condition on GC |
> On 27 Jan 2019, at 07.54, Heiner Litz <hlitz@ucsc.edu> wrote:
>
> This patch fixes a race condition where a write is mapped to the last
> sectors of a line. The write is synced to the device but the L2P is not
> updated yet. When the line is garbage collected before the L2P update is
> performed, the sectors are ignored by the GC logic and the line is freed
> before all sectors are moved. When the L2P is finally updated, it contains
> a mapping to a freed line, subsequent reads of the corresponding LBAs fail.

Hi Heiner,

This has been an interesting issue to debug - good catch!

>
> Note that looking up the L2P and checking the ppa in the write buffer needs
> to be performed atomically, hence the refactor of pblk_lookup_l2p_rand.
>
> Signed-off-by: Heiner Litz <hlitz@ucsc.edu>
> ---
>  drivers/lightnvm/pblk-read.c | 27 +++++++++++++++++++++++++--
>  1 file changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
> index 3789185144da..7c556b2218e4 100644
> --- a/drivers/lightnvm/pblk-read.c
> +++ b/drivers/lightnvm/pblk-read.c
> @@ -529,13 +529,35 @@ static int read_ppalist_rq_gc(struct pblk *pblk, struct nvm_rq *rqd,
>  	int valid_secs = 0;
>  	int i;
>
> -	pblk_lookup_l2p_rand(pblk, ppa_list_l2p, lba_list, nr_secs);
> -
> +	spin_lock(&pblk->trans_lock);
>  	for (i = 0; i < nr_secs; i++) {
>  		if (lba_list[i] == ADDR_EMPTY)
>  			continue;
>
> +		ppa_list_l2p[i] = pblk_trans_map_get(pblk, lba_list[i]);
>  		ppa_gc = addr_to_gen_ppa(pblk, paddr_list_gc[i], line->id);
> +
> +		/* Obtain ppa from cache if the sector has been synced to the
> +		   device but the L2P has not been updated yet */
> +		if(pblk_addr_in_cache(ppa_list_l2p[i])) {
> +			struct pblk_rb *rb = &pblk->rwb;
> +			struct pblk_rb_entry *entry;
> +			struct pblk_w_ctx *w_ctx;
> +			u64 pos = pblk_addr_to_cacheline(ppa_list_l2p[i]);
> +
> +#ifdef CONFIG_NVM_PBLK_DEBUG
> +			/* Ensure that the access will not cause an overflow */
> +			BUG_ON(pos >= rb->nr_entries);
> +#endif
> +
> +			entry = &rb->entries[pos];
> +			w_ctx = &entry->w_ctx;
> +			if (pblk_ppa_comp(w_ctx->ppa, ppa_gc)) {
> +				rqd->ppa_list[valid_secs++] = ppa_gc;
> +				continue;
> +			}
> +		}
> +
>  		if (!pblk_ppa_comp(ppa_list_l2p[i], ppa_gc)) {
>  			paddr_list_gc[i] = lba_list[i] = ADDR_EMPTY;
>  			continue;
> @@ -543,6 +565,7 @@ static int read_ppalist_rq_gc(struct pblk *pblk, struct nvm_rq *rqd,
>
>  		rqd->ppa_list[valid_secs++] = ppa_list_l2p[i];
>  	}
> +	spin_unlock(&pblk->trans_lock);
>
>  #ifdef CONFIG_NVM_PBLK_DEBUG
>  	atomic_long_add(valid_secs, &pblk->inflight_reads);
> --
> 2.17.1

Here is a suggestion: Why not add an atomic counter to the
line stating the sectors that are synced in the L2P table and then
loosely wait (i.e., check and sleep / schedule) until the counter
reaches 0 on pblk_line_close_ws()? This way you guarantee that the line
does not close - and therefore never reaches the GC lists - before all
the L2P entries for that line point to the media. Any other form of
synchronization that puts the burden at pblk_line_close_ws() would also
work for me.

In essence, I would rather pay the price on a per-line basis than
blocking the trans_lock longer for each I/O.

Thoughts?

Javier
> On 29 Jan 2019, at 17.36, Heiner Litz <hlitz@ucsc.edu> wrote:
>
> Javier,
>
> On Tue, Jan 29, 2019 at 12:13 AM Javier González <javier@javigon.com> wrote:
> > On 27 Jan 2019, at 07.54, Heiner Litz <hlitz@ucsc.edu> wrote:
> >
> > This patch fixes a race condition where a write is mapped to the last
> > sectors of a line. The write is synced to the device but the L2P is not
> > updated yet. When the line is garbage collected before the L2P update is
> > performed, the sectors are ignored by the GC logic and the line is freed
> > before all sectors are moved. When the L2P is finally updated, it contains
> > a mapping to a freed line, subsequent reads of the corresponding LBAs fail.
>
> Hi Heiner,
>
> This has been an interesting issue to debug - good catch!
>
> Felt more like a marathon than a catch ;)

Hehehe, I know. It is a good marathon then :)

> >
> > Note that looking up the L2P and checking the ppa in the write buffer needs
> > to be performed atomically, hence the refactor of pblk_lookup_l2p_rand.
> >
> > Signed-off-by: Heiner Litz <hlitz@ucsc.edu>
> > ---
> >  drivers/lightnvm/pblk-read.c | 27 +++++++++++++++++++++++++--
> >  1 file changed, 25 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
> > index 3789185144da..7c556b2218e4 100644
> > --- a/drivers/lightnvm/pblk-read.c
> > +++ b/drivers/lightnvm/pblk-read.c
> > @@ -529,13 +529,35 @@ static int read_ppalist_rq_gc(struct pblk *pblk, struct nvm_rq *rqd,
> >  	int valid_secs = 0;
> >  	int i;
> >
> > -	pblk_lookup_l2p_rand(pblk, ppa_list_l2p, lba_list, nr_secs);
> > -
> > +	spin_lock(&pblk->trans_lock);
> >  	for (i = 0; i < nr_secs; i++) {
> >  		if (lba_list[i] == ADDR_EMPTY)
> >  			continue;
> >
> > +		ppa_list_l2p[i] = pblk_trans_map_get(pblk, lba_list[i]);
> >  		ppa_gc = addr_to_gen_ppa(pblk, paddr_list_gc[i], line->id);
> > +
> > +		/* Obtain ppa from cache if the sector has been synced to the
> > +		   device but the L2P has not been updated yet */
> > +		if(pblk_addr_in_cache(ppa_list_l2p[i])) {
> > +			struct pblk_rb *rb = &pblk->rwb;
> > +			struct pblk_rb_entry *entry;
> > +			struct pblk_w_ctx *w_ctx;
> > +			u64 pos = pblk_addr_to_cacheline(ppa_list_l2p[i]);
> > +
> > +#ifdef CONFIG_NVM_PBLK_DEBUG
> > +			/* Ensure that the access will not cause an overflow */
> > +			BUG_ON(pos >= rb->nr_entries);
> > +#endif
> > +
> > +			entry = &rb->entries[pos];
> > +			w_ctx = &entry->w_ctx;
> > +			if (pblk_ppa_comp(w_ctx->ppa, ppa_gc)) {
> > +				rqd->ppa_list[valid_secs++] = ppa_gc;
> > +				continue;
> > +			}
> > +		}
> > +
> >  		if (!pblk_ppa_comp(ppa_list_l2p[i], ppa_gc)) {
> >  			paddr_list_gc[i] = lba_list[i] = ADDR_EMPTY;
> >  			continue;
> > @@ -543,6 +565,7 @@ static int read_ppalist_rq_gc(struct pblk *pblk, struct nvm_rq *rqd,
> >
> >  		rqd->ppa_list[valid_secs++] = ppa_list_l2p[i];
> >  	}
> > +	spin_unlock(&pblk->trans_lock);
> >
> >  #ifdef CONFIG_NVM_PBLK_DEBUG
> >  	atomic_long_add(valid_secs, &pblk->inflight_reads);
> > --
> > 2.17.1
> >
> Here is a suggestion: Why not add an atomic counter to the
> line stating the sectors that are synced in the L2P table and then
> loosely wait (i.e., check and sleep / schedule) until the counter
> reaches 0 on pblk_line_close_ws()? This way you guarantee that the line
> does not close - and therefore never reaches the GC lists - before all
> the L2P entries for that line point to the media. Any other form of
> synchronization that puts the burden at pblk_line_close_ws() would also
> work for me.
>
> In essence, I would rather pay the price on a per-line basis than
> blocking the trans_lock longer for each I/O.
>
> The patch only adds 2 well predictable branches to the loop so I think the
> impact would be minimal, but I still think your approach is cleaner.
> I suggest checking the proposed sync counter whenever selecting a GC
> candidate. I'll send out a V2
>

Sounds good! Thanks!

Javier
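For readers following the thread, the per-line counter discussed above could look roughly like the userspace sketch below. It is only a model under stated assumptions, not the V2 patch: struct line_model and every function name here are hypothetical, the counter is read as "sectors of this line whose L2P entry does not yet point to the media", and C11 atomics plus sched_yield() stand in for the kernel's atomic_t and scheduling primitives.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a pblk line; not the kernel struct. */
struct line_model {
	/* Sectors written to this line whose L2P entry still points to the cache. */
	atomic_int l2p_pending;
};

static void line_model_init(struct line_model *line)
{
	atomic_init(&line->l2p_pending, 0);
}

/* Assumed hook: a buffered sector has been mapped to this line. */
static void line_sector_mapped(struct line_model *line)
{
	atomic_fetch_add(&line->l2p_pending, 1);
}

/* Assumed hook: the L2P entry for one such sector now points to the media. */
static void line_l2p_updated(struct line_model *line)
{
	int prev = atomic_fetch_sub(&line->l2p_pending, 1);

	assert(prev > 0); /* an update without a matching mapping is a bug */
	(void)prev;
}

/* Heiner's variant: only pick the line as a GC candidate once it is synced. */
static bool line_gc_eligible(struct line_model *line)
{
	return atomic_load(&line->l2p_pending) == 0;
}

/* Javier's variant: loosely wait on the close path until the line is synced. */
static void line_wait_l2p_synced(struct line_model *line)
{
	while (atomic_load(&line->l2p_pending) != 0)
		sched_yield(); /* check and yield instead of holding a lock */
}
```

Either check moves the cost to a per-line event (close or GC selection), so the per-I/O path only pays for one atomic increment and one decrement instead of a longer trans_lock hold.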
diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 3789185144da..7c556b2218e4 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -529,13 +529,35 @@ static int read_ppalist_rq_gc(struct pblk *pblk, struct nvm_rq *rqd,
 	int valid_secs = 0;
 	int i;
 
-	pblk_lookup_l2p_rand(pblk, ppa_list_l2p, lba_list, nr_secs);
-
+	spin_lock(&pblk->trans_lock);
 	for (i = 0; i < nr_secs; i++) {
 		if (lba_list[i] == ADDR_EMPTY)
 			continue;
 
+		ppa_list_l2p[i] = pblk_trans_map_get(pblk, lba_list[i]);
 		ppa_gc = addr_to_gen_ppa(pblk, paddr_list_gc[i], line->id);
+
+		/* Obtain ppa from cache if the sector has been synced to the
+		   device but the L2P has not been updated yet */
+		if(pblk_addr_in_cache(ppa_list_l2p[i])) {
+			struct pblk_rb *rb = &pblk->rwb;
+			struct pblk_rb_entry *entry;
+			struct pblk_w_ctx *w_ctx;
+			u64 pos = pblk_addr_to_cacheline(ppa_list_l2p[i]);
+
+#ifdef CONFIG_NVM_PBLK_DEBUG
+			/* Ensure that the access will not cause an overflow */
+			BUG_ON(pos >= rb->nr_entries);
+#endif
+
+			entry = &rb->entries[pos];
+			w_ctx = &entry->w_ctx;
+			if (pblk_ppa_comp(w_ctx->ppa, ppa_gc)) {
+				rqd->ppa_list[valid_secs++] = ppa_gc;
+				continue;
+			}
+		}
+
 		if (!pblk_ppa_comp(ppa_list_l2p[i], ppa_gc)) {
 			paddr_list_gc[i] = lba_list[i] = ADDR_EMPTY;
 			continue;
@@ -543,6 +565,7 @@ static int read_ppalist_rq_gc(struct pblk *pblk, struct nvm_rq *rqd,
 
 		rqd->ppa_list[valid_secs++] = ppa_list_l2p[i];
 	}
+	spin_unlock(&pblk->trans_lock);
 
 #ifdef CONFIG_NVM_PBLK_DEBUG
 	atomic_long_add(valid_secs, &pblk->inflight_reads);
This patch fixes a race condition where a write is mapped to the last
sectors of a line. The write is synced to the device but the L2P is not
updated yet. When the line is garbage collected before the L2P update is
performed, the sectors are ignored by the GC logic and the line is freed
before all sectors are moved. When the L2P is finally updated, it contains
a mapping to a freed line, subsequent reads of the corresponding LBAs fail.

Note that looking up the L2P and checking the ppa in the write buffer needs
to be performed atomically, hence the refactor of pblk_lookup_l2p_rand.

Signed-off-by: Heiner Litz <hlitz@ucsc.edu>
---
 drivers/lightnvm/pblk-read.c | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)
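The validity decision described in the commit message can be illustrated with a small single-threaded model. This is a sketch under assumptions, not the driver code: struct model_addr, struct model_rb, MODEL_RB_ENTRIES and gc_sector_valid() are hypothetical names, plain integers stand in for struct ppa_addr, and the locking is omitted (in the patch both the L2P lookup and the write-buffer check run under trans_lock).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_RB_ENTRIES 4 /* hypothetical write buffer size */

/* Hypothetical model of one L2P entry. */
struct model_addr {
	bool in_cache;	/* true: pos indexes the write buffer */
	uint64_t pos;	/* write buffer position, valid if in_cache */
	uint64_t ppa;	/* device address, valid if !in_cache */
};

/* Hypothetical model of the write buffer: the device address that each
 * buffered sector has been mapped to (w_ctx->ppa in the patch). */
struct model_rb {
	uint64_t entry_ppa[MODEL_RB_ENTRIES];
};

/* Returns true if the sector at ppa_gc still holds live data for the LBA
 * whose L2P entry is l2p, so GC must move it before the line is freed. */
static bool gc_sector_valid(const struct model_addr *l2p,
			    const struct model_rb *rb, uint64_t ppa_gc)
{
	if (l2p->in_cache) {
		/* L2P not updated yet: consult the write buffer entry. */
		if (l2p->pos >= MODEL_RB_ENTRIES)
			return false; /* the patch uses BUG_ON() in debug builds */
		return rb->entry_ppa[l2p->pos] == ppa_gc;
	}

	/* L2P already points to the media: compare addresses directly. */
	return l2p->ppa == ppa_gc;
}
```

Without the write-buffer branch, a cached L2P entry never compares equal to ppa_gc, so the sector is treated as invalid and skipped, which is the lost-sector case the commit message describes.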