Message ID | 20190314111637.20236-1-hans@owltronix.com (mailing list archive) |
---|---|
State | New, archived |
Series | [RFC] lightnvm: pblk: fix crash in pblk_end_partial_read due to multipage bvecs |
While reading this patch, an idea came to my mind - maybe it would simply be better to get rid of the partial read handling in pblk in its current form altogether and use a bio_split() & bio_chain() combination instead?

This would definitely reduce the amount of this 'kind of complicated' code inside pblk and let the block layer help us with that. Even if the performance of such requests could be a little worse (a few smaller IOs instead of a single vector IO), I believe that a partial read is more a corner case than a typical use case, so maybe this would be the path to go.

Let me know what you think about such an approach - I can make a patch with that if you want.

Igor

On 14.03.2019 12:16, hans@owltronix.com wrote:
> From: Hans Holmberg <hans.holmberg@cnexlabs.com>
>
> Ever since '07173c3ec276 ("block: enable multipage bvecs")' we
> need to handle bios with multipage bvecs in pblk.
>
> Currently, a multipage bvec results in a crash[1].
> Fix this by using bvec iterators instead of direct bvec indexing.
>
> Also add a dcache flush, for the same reasons as in:
> '6e6e811d747b ("block: Add missing flush_dcache_page() call")'
>
> [1] https://github.com/OpenChannelSSD/linux/issues/61
>
> Reported-by: Klaus Jensen <klaus.jensen@cnexlabs.com>
> Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
>
> ---
>
> RFC to get more eyes on this - note that we're doing something
> very similar to bio_copy_data_iter.
>
> This works in my QEMU setup, and I'll start more regression testing now.
>
>  drivers/lightnvm/pblk-read.c | 50 ++++++++++++++++++++----------------
>  1 file changed, 28 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
> index 1f9b319c9737..b8eb6bdb983b 100644
> --- a/drivers/lightnvm/pblk-read.c
> +++ b/drivers/lightnvm/pblk-read.c
> @@ -231,14 +231,14 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>          struct pblk_sec_meta *meta;
>          struct bio *new_bio = rqd->bio;
>          struct bio *bio = pr_ctx->orig_bio;
> -        struct bio_vec src_bv, dst_bv;
>          void *meta_list = rqd->meta_list;
> -        int bio_init_idx = pr_ctx->bio_init_idx;
>          unsigned long *read_bitmap = pr_ctx->bitmap;
> +        struct bvec_iter orig_iter = BVEC_ITER_ALL_INIT;
> +        struct bvec_iter new_iter = BVEC_ITER_ALL_INIT;
>          int nr_secs = pr_ctx->orig_nr_secs;
>          int nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
>          void *src_p, *dst_p;
> -        int hole, i;
> +        int bit, i;
>
>          if (unlikely(nr_holes == 1)) {
>                  struct ppa_addr ppa;
> @@ -257,33 +257,39 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>
>          /* Fill the holes in the original bio */
>          i = 0;
> -        hole = find_first_zero_bit(read_bitmap, nr_secs);
> -        do {
> -                struct pblk_line *line;
> +        for (bit = 0; bit < nr_secs; bit++) {
> +                if (!test_bit(bit, read_bitmap)) {
> +                        struct bio_vec dst_bv, src_bv;
> +                        struct pblk_line *line;
>
> -                line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
> -                kref_put(&line->ref, pblk_line_put);
> +                        line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
> +                        kref_put(&line->ref, pblk_line_put);
>
> -                meta = pblk_get_meta(pblk, meta_list, hole);
> -                meta->lba = cpu_to_le64(pr_ctx->lba_list_media[i]);
> +                        meta = pblk_get_meta(pblk, meta_list, bit);
> +                        meta->lba = cpu_to_le64(pr_ctx->lba_list_media[i]);
>
> -                src_bv = new_bio->bi_io_vec[i++];
> -                dst_bv = bio->bi_io_vec[bio_init_idx + hole];
> +                        dst_bv = bio_iter_iovec(bio, orig_iter);
> +                        src_bv = bio_iter_iovec(new_bio, new_iter);
>
> -                src_p = kmap_atomic(src_bv.bv_page);
> -                dst_p = kmap_atomic(dst_bv.bv_page);
> +                        src_p = kmap_atomic(src_bv.bv_page);
> +                        dst_p = kmap_atomic(dst_bv.bv_page);
>
> -                memcpy(dst_p + dst_bv.bv_offset,
> -                        src_p + src_bv.bv_offset,
> -                        PBLK_EXPOSED_PAGE_SIZE);
> +                        memcpy(dst_p + dst_bv.bv_offset,
> +                                src_p + src_bv.bv_offset,
> +                                PBLK_EXPOSED_PAGE_SIZE);
>
> -                kunmap_atomic(src_p);
> -                kunmap_atomic(dst_p);
> +                        kunmap_atomic(src_p);
> +                        kunmap_atomic(dst_p);
>
> -                mempool_free(src_bv.bv_page, &pblk->page_bio_pool);
> +                        flush_dcache_page(dst_bv.bv_page);
> +                        mempool_free(src_bv.bv_page, &pblk->page_bio_pool);
>
> -                hole = find_next_zero_bit(read_bitmap, nr_secs, hole + 1);
> -        } while (hole < nr_secs);
> +                        bio_advance_iter(new_bio, &new_iter,
> +                                        PBLK_EXPOSED_PAGE_SIZE);
> +                        i++;
> +                }
> +                bio_advance_iter(bio, &orig_iter, PBLK_EXPOSED_PAGE_SIZE);
> +        }
>
>          bio_put(new_bio);
>          kfree(pr_ctx);
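For illustration, here is a minimal sketch of the bio_split()/bio_chain() approach Igor describes above. bio_split(), bio_chain(), bio_sectors() and pblk_submit_read() existed with these shapes around v5.0; pblk_next_run_sectors() and pblk_read_bs are hypothetical placeholders, not existing pblk code, and return-value handling is omitted:

/*
 * Sketch only: split a partial read at every boundary between cached
 * and media-resident sectors, chain each part to the parent, and let
 * the block layer complete the parent bio once all parts are done.
 */
static void pblk_split_partial_read(struct pblk *pblk, struct bio *bio)
{
        unsigned int run;       /* sectors servable from one source */

        while ((run = pblk_next_run_sectors(pblk, bio)) < bio_sectors(bio)) {
                /* Split the first 'run' sectors off into a child bio. */
                struct bio *split = bio_split(bio, run, GFP_NOIO,
                                              &pblk_read_bs);

                /* Parent completes only after all children complete. */
                bio_chain(split, bio);
                pblk_submit_read(pblk, split);
        }
        pblk_submit_read(pblk, bio);    /* last (or only) run */
}

The trade-off Igor mentions is visible here: a worst-case read whose sectors alternate between cache and media degenerates into one child bio per run.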
> On 14 Mar 2019, at 06.49, Igor Konopko <igor.j.konopko@intel.com> wrote:
>
> While reading this patch, an idea came to my mind - maybe it would simply be better to get rid of the partial read handling in pblk in its current form altogether and use a bio_split() & bio_chain() combination instead?
>
> This would definitely reduce the amount of this 'kind of complicated' code inside pblk and let the block layer help us with that. Even if the performance of such requests could be a little worse (a few smaller IOs instead of a single vector IO), I believe that a partial read is more a corner case than a typical use case, so maybe this would be the path to go.
>
> Let me know what you think about such an approach - I can make a patch with that if you want.

I agree with Igor.

As I mentioned offline, we should fix this in a way that survives
further changes in struct bio; either making pblk handling visible to
the outside or rethinking the whole thing.

Igor: If you can send the patch you mention, it would be great. I have
been trying the helper approach for some time, but it is too specific,
as fixing holes in the bvec breaks the bio advance-only semantics. Your
approach seems much better.

Thanks,
Javier
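As a side note on the "advance-only semantics" Javier mentions: a struct bvec_iter can only be moved forward over a bio, so patching holes out of order cannot be expressed as a generic bio helper. A minimal sketch of the forward-only walk, using the v5.0-era iterator API (the loop itself is illustrative):

static void pblk_walk_bio(struct bio *bio)
{
        struct bvec_iter iter = bio->bi_iter;   /* private copy */

        while (iter.bi_size) {
                struct bio_vec bv = bio_iter_iovec(bio, iter);

                /* bv describes the current segment (page, offset, len);
                 * there is no API to step the iterator backwards. */
                bio_advance_iter(bio, &iter, bv.bv_len);
        }
}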
On Thu, Mar 14, 2019 at 7:16 AM Javier González <javier@javigon.com> wrote:
>
> > On 14 Mar 2019, at 06.49, Igor Konopko <igor.j.konopko@intel.com> wrote:
> >
> > While reading this patch, an idea came to my mind - maybe it would simply be better to get rid of the partial read handling in pblk in its current form altogether and use a bio_split() & bio_chain() combination instead?
> >
> > This would definitely reduce the amount of this 'kind of complicated' code inside pblk and let the block layer help us with that. Even if the performance of such requests could be a little worse (a few smaller IOs instead of a single vector IO), I believe that a partial read is more a corner case than a typical use case, so maybe this would be the path to go.
> >
> > Let me know what you think about such an approach - I can make a patch with that if you want.

I like this idea as it will clean up a lot of code and get rid of the
read_bitmap.
Note that a single request can be turned into up to 32 requests if
cached and non-cached sectors alternate, each one probably requiring
an l2p lookup. I still think it's worth doing though.
When you split, note that you have to release all line refs which were
acquired during the l2p lookup.

> I agree with Igor.
>
> As I mentioned offline, we should fix this in a way that survives
> further changes in struct bio; either making pblk handling visible to
> the outside or rethinking the whole thing.
>
> Igor: If you can send the patch you mention, it would be great. I have
> been trying the helper approach for some time, but it is too specific,
> as fixing holes in the bvec breaks the bio advance-only semantics. Your
> approach seems much better.
>
> Thanks,
> Javier
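To make Heiner's last point concrete, here is a hedged sketch of dropping the line references for sectors handed off to a child bio. pblk_ppa_to_line(), pblk_addr_in_cache(), pblk_line_put() and the kref_put() pattern are existing pblk code (the patch above uses them); the function itself and nr_split_secs are illustrative only:

static void pblk_put_split_line_refs(struct pblk *pblk,
                                     struct ppa_addr *ppa_list,
                                     int nr_split_secs)
{
        int i;

        for (i = 0; i < nr_split_secs; i++) {
                struct pblk_line *line;

                /* Cached sectors never took a line reference. */
                if (pblk_addr_in_cache(ppa_list[i]))
                        continue;

                line = pblk_ppa_to_line(pblk, ppa_list[i]);
                kref_put(&line->ref, pblk_line_put);
        }
}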
On Thu, Mar 14, 2019 at 5:15 PM Heiner Litz <hlitz@ucsc.edu> wrote:
>
> On Thu, Mar 14, 2019 at 7:16 AM Javier González <javier@javigon.com> wrote:
> >
> > > On 14 Mar 2019, at 06.49, Igor Konopko <igor.j.konopko@intel.com> wrote:
> > >
> > > While reading this patch, an idea came to my mind - maybe it would simply be better to get rid of the partial read handling in pblk in its current form altogether and use a bio_split() & bio_chain() combination instead?
> > >
> > > This would definitely reduce the amount of this 'kind of complicated' code inside pblk and let the block layer help us with that. Even if the performance of such requests could be a little worse (a few smaller IOs instead of a single vector IO), I believe that a partial read is more a corner case than a typical use case, so maybe this would be the path to go.
> > >
> > > Let me know what you think about such an approach - I can make a patch with that if you want.
>
> I like this idea as it will clean up a lot of code and get rid of the
> read_bitmap.
> Note that a single request can be turned into up to 32 requests if
> cached and non-cached sectors alternate, each one probably requiring
> an l2p lookup. I still think it's worth doing though.
> When you split, note that you have to release all line refs which were
> acquired during the l2p lookup.
>
> >
> > I agree with Igor.
> >

I agree as well - this path deserves some loving care and cleanup.
Let's just try to avoid read tail latency degradation. Simplifying the
code is a great first step, then we can optimize if need be.

Thanks for the feedback, y'all!

> > As I mentioned offline, we should fix this in a way that survives
> > further changes in struct bio; either making pblk handling visible to
> > the outside or rethinking the whole thing.
> >
> > Igor: If you can send the patch you mention, it would be great. I have
> > been trying the helper approach for some time, but it is too specific,
> > as fixing holes in the bvec breaks the bio advance-only semantics. Your
> > approach seems much better.
> >
> > Thanks,
> > Javier
diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 1f9b319c9737..b8eb6bdb983b 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -231,14 +231,14 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
         struct pblk_sec_meta *meta;
         struct bio *new_bio = rqd->bio;
         struct bio *bio = pr_ctx->orig_bio;
-        struct bio_vec src_bv, dst_bv;
         void *meta_list = rqd->meta_list;
-        int bio_init_idx = pr_ctx->bio_init_idx;
         unsigned long *read_bitmap = pr_ctx->bitmap;
+        struct bvec_iter orig_iter = BVEC_ITER_ALL_INIT;
+        struct bvec_iter new_iter = BVEC_ITER_ALL_INIT;
         int nr_secs = pr_ctx->orig_nr_secs;
         int nr_holes = nr_secs - bitmap_weight(read_bitmap, nr_secs);
         void *src_p, *dst_p;
-        int hole, i;
+        int bit, i;

         if (unlikely(nr_holes == 1)) {
                 struct ppa_addr ppa;
@@ -257,33 +257,39 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)

         /* Fill the holes in the original bio */
         i = 0;
-        hole = find_first_zero_bit(read_bitmap, nr_secs);
-        do {
-                struct pblk_line *line;
+        for (bit = 0; bit < nr_secs; bit++) {
+                if (!test_bit(bit, read_bitmap)) {
+                        struct bio_vec dst_bv, src_bv;
+                        struct pblk_line *line;

-                line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
-                kref_put(&line->ref, pblk_line_put);
+                        line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
+                        kref_put(&line->ref, pblk_line_put);

-                meta = pblk_get_meta(pblk, meta_list, hole);
-                meta->lba = cpu_to_le64(pr_ctx->lba_list_media[i]);
+                        meta = pblk_get_meta(pblk, meta_list, bit);
+                        meta->lba = cpu_to_le64(pr_ctx->lba_list_media[i]);

-                src_bv = new_bio->bi_io_vec[i++];
-                dst_bv = bio->bi_io_vec[bio_init_idx + hole];
+                        dst_bv = bio_iter_iovec(bio, orig_iter);
+                        src_bv = bio_iter_iovec(new_bio, new_iter);

-                src_p = kmap_atomic(src_bv.bv_page);
-                dst_p = kmap_atomic(dst_bv.bv_page);
+                        src_p = kmap_atomic(src_bv.bv_page);
+                        dst_p = kmap_atomic(dst_bv.bv_page);

-                memcpy(dst_p + dst_bv.bv_offset,
-                        src_p + src_bv.bv_offset,
-                        PBLK_EXPOSED_PAGE_SIZE);
+                        memcpy(dst_p + dst_bv.bv_offset,
+                                src_p + src_bv.bv_offset,
+                                PBLK_EXPOSED_PAGE_SIZE);

-                kunmap_atomic(src_p);
-                kunmap_atomic(dst_p);
+                        kunmap_atomic(src_p);
+                        kunmap_atomic(dst_p);

-                mempool_free(src_bv.bv_page, &pblk->page_bio_pool);
+                        flush_dcache_page(dst_bv.bv_page);
+                        mempool_free(src_bv.bv_page, &pblk->page_bio_pool);

-                hole = find_next_zero_bit(read_bitmap, nr_secs, hole + 1);
-        } while (hole < nr_secs);
+                        bio_advance_iter(new_bio, &new_iter,
+                                        PBLK_EXPOSED_PAGE_SIZE);
+                        i++;
+                }
+                bio_advance_iter(bio, &orig_iter, PBLK_EXPOSED_PAGE_SIZE);
+        }

         bio_put(new_bio);
         kfree(pr_ctx);
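For comparison with the RFC's remark that this is "very similar to bio_copy_data_iter": the helper follows the same iterator pattern. The following is a close, but not guaranteed verbatim, rendering of the block/bio.c implementation around v5.1, with comments added:

void bio_copy_data_iter(struct bio *dst, struct bvec_iter *dst_iter,
                        struct bio *src, struct bvec_iter *src_iter)
{
        struct bio_vec src_bv, dst_bv;
        void *src_p, *dst_p;
        unsigned bytes;

        while (src_iter->bi_size && dst_iter->bi_size) {
                /* Current segments, bounded by each iterator. */
                src_bv = bio_iter_iovec(src, *src_iter);
                dst_bv = bio_iter_iovec(dst, *dst_iter);

                bytes = min(src_bv.bv_len, dst_bv.bv_len);

                src_p = kmap_atomic(src_bv.bv_page);
                dst_p = kmap_atomic(dst_bv.bv_page);

                memcpy(dst_p + dst_bv.bv_offset,
                       src_p + src_bv.bv_offset,
                       bytes);

                kunmap_atomic(dst_p);
                kunmap_atomic(src_p);

                flush_dcache_page(dst_bv.bv_page);

                /* Iterators only ever move forward. */
                bio_advance_iter(src, src_iter, bytes);
                bio_advance_iter(dst, dst_iter, bytes);
        }
}

The pblk fix differs mainly in that it copies a fixed PBLK_EXPOSED_PAGE_SIZE per hole and advances the destination iterator even over sectors that were served from cache.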