Message ID | 20211201051756.53742-3-wqu@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: split bio at btrfs_map_bio() time | expand |
On 2021/12/1 13:17, Qu Wenruo wrote: > Since block layer will advance bio::bi_iter, at endio time we can no > longer rely on bio::bi_iter for split bio. > > But for the incoming btrfs_bio split at btrfs_map_bio() time, we have to > ensure endio function is only executed for the split range, not the > whole original bio. > > Thus this patch will introduce a new helper, btrfs_bio_save_iter(), to > save bi_iter into btrfs_bio::iter. > > The following call sites need this helper call: > > - btrfs_submit_compressed_read() > For compressed read. For compressed write it doesn't really care as > they use ordered extent. > > - raid56_parity_write() > - raid56_parity_recovery() > For RAID56. > > - submit_stripe_bio() > For all other cases. These are not enough. There are cases where we allocate a bio but without going through btrfs_map_bio(), and error out. In that case, those bios don't have bbio::iter, and can cause errors in generic/475 related to data/metadata writeback failure. Fixed in my github repo, by just adding more btrfs_bio_save_iter() calls in error paths. Thanks, Qu > > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > fs/btrfs/compression.c | 3 +++ > fs/btrfs/raid56.c | 2 ++ > fs/btrfs/volumes.c | 14 ++++++++++++++ > fs/btrfs/volumes.h | 18 ++++++++++++++++++ > 4 files changed, 37 insertions(+) > > diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c > index e776956d5bc9..cc8d13369f53 100644 > --- a/fs/btrfs/compression.c > +++ b/fs/btrfs/compression.c > @@ -870,6 +870,9 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, > /* include any pages we added in add_ra-bio_pages */ > cb->len = bio->bi_iter.bi_size; > > + /* Save bi_iter so that end_bio_extent_readpage() won't freak out. */ > + btrfs_bio_save_iter(btrfs_bio(bio)); > + > while (cur_disk_byte < disk_bytenr + compressed_len) { > u64 offset = cur_disk_byte - disk_bytenr; > unsigned int index = offset >> PAGE_SHIFT; > diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c > index 0e239a4c3b26..13e726c88a81 100644 > --- a/fs/btrfs/raid56.c > +++ b/fs/btrfs/raid56.c > @@ -1731,6 +1731,7 @@ int raid56_parity_write(struct bio *bio, struct btrfs_io_context *bioc, > return PTR_ERR(rbio); > } > bio_list_add(&rbio->bio_list, bio); > + btrfs_bio_save_iter(btrfs_bio(bio)); > rbio->bio_list_bytes = bio->bi_iter.bi_size; > rbio->operation = BTRFS_RBIO_WRITE; > > @@ -2135,6 +2136,7 @@ int raid56_parity_recover(struct bio *bio, struct btrfs_io_context *bioc, > > rbio->operation = BTRFS_RBIO_READ_REBUILD; > bio_list_add(&rbio->bio_list, bio); > + btrfs_bio_save_iter(btrfs_bio(bio)); > rbio->bio_list_bytes = bio->bi_iter.bi_size; > > rbio->faila = find_logical_bio_stripe(rbio, bio); > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > index f38c230111be..b70037cc1a51 100644 > --- a/fs/btrfs/volumes.c > +++ b/fs/btrfs/volumes.c > @@ -6829,6 +6829,20 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio, > BUG(); > } > > + /* > + * At endio time, bi_iter is no longer reliable, thus we have to save > + * current bi_iter into btrfs_bio so that even for split bio we can > + * iterate only the split part. > + * > + * And this has to be done before any bioc error, as endio functions > + * will rely on bbio::iter. > + * > + * For bio create by btrfs_bio_slit() or btrfs_bio_clone*(), it's > + * already set, but we can still have original bio which has its > + * iter not initialized. > + */ > + btrfs_bio_save_iter(btrfs_bio(bio)); > + > for (dev_nr = 0; dev_nr < total_devs; dev_nr++) { > dev = bioc->stripes[dev_nr].dev; > if (!dev || !dev->bdev || test_bit(BTRFS_DEV_STATE_MISSING, > diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h > index 3b8130680749..f9178d2c2fd6 100644 > --- a/fs/btrfs/volumes.h > +++ b/fs/btrfs/volumes.h > @@ -334,6 +334,12 @@ struct btrfs_bio { > struct btrfs_device *device; > u8 *csum; > u8 csum_inline[BTRFS_BIO_INLINE_CSUM_SIZE]; > + /* > + * Saved bio::bi_iter before submission. > + * > + * This allows us to interate the cloned/split bio properly, as at > + * endio time bio::bi_iter is no longer reliable. > + */ > struct bvec_iter iter; > > /* > @@ -356,6 +362,18 @@ static inline void btrfs_bio_free_csum(struct btrfs_bio *bbio) > } > } > > +/* > + * To save bbio::bio->bi_iter into bbio::iter so for callers who need the > + * original bi_iter can access the original part of the bio. > + * This is especially important for the incoming split btrfs_bio, which needs > + * to call its endio for and only for the split range. > + */ > +static inline void btrfs_bio_save_iter(struct btrfs_bio *bbio) > +{ > + if (!bbio->iter.bi_size) > + bbio->iter = bbio->bio.bi_iter; > +} > + > struct btrfs_io_stripe { > struct btrfs_device *dev; > u64 physical; >
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index e776956d5bc9..cc8d13369f53 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -870,6 +870,9 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, /* include any pages we added in add_ra-bio_pages */ cb->len = bio->bi_iter.bi_size; + /* Save bi_iter so that end_bio_extent_readpage() won't freak out. */ + btrfs_bio_save_iter(btrfs_bio(bio)); + while (cur_disk_byte < disk_bytenr + compressed_len) { u64 offset = cur_disk_byte - disk_bytenr; unsigned int index = offset >> PAGE_SHIFT; diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c index 0e239a4c3b26..13e726c88a81 100644 --- a/fs/btrfs/raid56.c +++ b/fs/btrfs/raid56.c @@ -1731,6 +1731,7 @@ int raid56_parity_write(struct bio *bio, struct btrfs_io_context *bioc, return PTR_ERR(rbio); } bio_list_add(&rbio->bio_list, bio); + btrfs_bio_save_iter(btrfs_bio(bio)); rbio->bio_list_bytes = bio->bi_iter.bi_size; rbio->operation = BTRFS_RBIO_WRITE; @@ -2135,6 +2136,7 @@ int raid56_parity_recover(struct bio *bio, struct btrfs_io_context *bioc, rbio->operation = BTRFS_RBIO_READ_REBUILD; bio_list_add(&rbio->bio_list, bio); + btrfs_bio_save_iter(btrfs_bio(bio)); rbio->bio_list_bytes = bio->bi_iter.bi_size; rbio->faila = find_logical_bio_stripe(rbio, bio); diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index f38c230111be..b70037cc1a51 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6829,6 +6829,20 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio, BUG(); } + /* + * At endio time, bi_iter is no longer reliable, thus we have to save + * current bi_iter into btrfs_bio so that even for split bio we can + * iterate only the split part. + * + * And this has to be done before any bioc error, as endio functions + * will rely on bbio::iter. + * + * For bio create by btrfs_bio_slit() or btrfs_bio_clone*(), it's + * already set, but we can still have original bio which has its + * iter not initialized. + */ + btrfs_bio_save_iter(btrfs_bio(bio)); + for (dev_nr = 0; dev_nr < total_devs; dev_nr++) { dev = bioc->stripes[dev_nr].dev; if (!dev || !dev->bdev || test_bit(BTRFS_DEV_STATE_MISSING, diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 3b8130680749..f9178d2c2fd6 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -334,6 +334,12 @@ struct btrfs_bio { struct btrfs_device *device; u8 *csum; u8 csum_inline[BTRFS_BIO_INLINE_CSUM_SIZE]; + /* + * Saved bio::bi_iter before submission. + * + * This allows us to interate the cloned/split bio properly, as at + * endio time bio::bi_iter is no longer reliable. + */ struct bvec_iter iter; /* @@ -356,6 +362,18 @@ static inline void btrfs_bio_free_csum(struct btrfs_bio *bbio) } } +/* + * To save bbio::bio->bi_iter into bbio::iter so for callers who need the + * original bi_iter can access the original part of the bio. + * This is especially important for the incoming split btrfs_bio, which needs + * to call its endio for and only for the split range. + */ +static inline void btrfs_bio_save_iter(struct btrfs_bio *bbio) +{ + if (!bbio->iter.bi_size) + bbio->iter = bbio->bio.bi_iter; +} + struct btrfs_io_stripe { struct btrfs_device *dev; u64 physical;
Since block layer will advance bio::bi_iter, at endio time we can no longer rely on bio::bi_iter for split bio. But for the incoming btrfs_bio split at btrfs_map_bio() time, we have to ensure endio function is only executed for the split range, not the whole original bio. Thus this patch will introduce a new helper, btrfs_bio_save_iter(), to save bi_iter into btrfs_bio::iter. The following call sites need this helper call: - btrfs_submit_compressed_read() For compressed read. For compressed write it doesn't really care as they use ordered extent. - raid56_parity_write() - raid56_parity_recovery() For RAID56. - submit_stripe_bio() For all other cases. Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/compression.c | 3 +++ fs/btrfs/raid56.c | 2 ++ fs/btrfs/volumes.c | 14 ++++++++++++++ fs/btrfs/volumes.h | 18 ++++++++++++++++++ 4 files changed, 37 insertions(+)