Message ID | c0f65f07b18eee7cef4e0b0b439a45ae437a11c6.1583789410.git.osandov@fb.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: read repair/direct I/O improvements | expand |
On Mon, Mar 09, 2020 at 02:32:29PM -0700, Omar Sandoval wrote: > + /* > + * We need to validate each sector individually if the I/O was for > + * multiple sectors. > + */ > + len = 0; > + for (i = 0; i < failed_bio->bi_vcnt; i++) { > + len += failed_bio->bi_io_vec[i].bv_len; > + if (len > inode->i_sb->s_blocksize) { > + need_validation = true; > + break; > + } So given that btrfs is the I/O submitter iterating over all bio_vecs should probably be fine. That being said I deeply dislike the idea of having code like this inside the file system. Maybe we can add a helper like: u32 bio_size_all(struct bio *bio) { u32 len, i; for (i = 0; i < bio->bi_vcnt; i++) len += bio->bi_io_vec[i].bv_len; return len; } in the core block code, with a kerneldoc comment documenting the exact use cases, and then use that?
On Tue, Mar 10, 2020 at 05:33:19PM +0100, Christoph Hellwig wrote: > On Mon, Mar 09, 2020 at 02:32:29PM -0700, Omar Sandoval wrote: > > + /* > > + * We need to validate each sector individually if the I/O was for > > + * multiple sectors. > > + */ > > + len = 0; > > + for (i = 0; i < failed_bio->bi_vcnt; i++) { > > + len += failed_bio->bi_io_vec[i].bv_len; > > + if (len > inode->i_sb->s_blocksize) { > > + need_validation = true; > > + break; > > + } > > So given that btrfs is the I/O submitter iterating over all bio_vecs > should probably be fine. That being said I deeply dislike the idea > of having code like this inside the file system. Maybe we can add > a helper like: > > u32 bio_size_all(struct bio *bio) > { > u32 len, i; > > for (i = 0; i < bio->bi_vcnt; i++) > len += bio->bi_io_vec[i].bv_len; > return len; > } > > in the core block code, with a kerneldoc comment documenting the > exact use cases, and then use that? That works for me. I was microoptimizing here since I can stop iterating once I know that the bio is greater than one sector, but since this is for the rare repair case, it really doesn't matter. On the other hand, a bio_for_each_bvec_all() helper would feel less intrusive into the bio guts and also support bailing early. I'm fine either way. Thoughts?
On Wed, Mar 11, 2020 at 02:07:47AM -0700, Omar Sandoval wrote: > That works for me. I was microoptimizing here since I can stop iterating > once I know that the bio is greater than one sector, but since this is > for the rare repair case, it really doesn't matter. > > On the other hand, a bio_for_each_bvec_all() helper would feel less > intrusive into the bio guts and also support bailing early. I'm fine > either way. Thoughts? I don't really care too much as long as we don't open code the bi_io_vec access.
On 9.03.20 г. 23:32 ч., Omar Sandoval wrote: > From: Omar Sandoval <osandov@fb.com> > > Read repair does two things: it finds a good copy of data to return to > the reader, and it corrects the bad copy on disk. If a read of multiple > sectors has an I/O error, repair does an extra "validation" step that > issues a separate read for each sector. This allows us to find the exact > failing sectors and only rewrite those. > > This heuristic is implemented in > bio_readpage_error()/btrfs_check_repairable() as: > > failed_bio_pages = failed_bio->bi_iter.bi_size >> PAGE_SHIFT; > if (failed_bio_pages > 1) > do validation > > However, at this point, bi_iter may have already been advanced. This > means that we'll skip the validation step and rewrite the entire failed > read. > > Fix it by getting the actual size from the biovec (which we can do > because this is only called for non-cloned bios, although that will > change in a later commit). > > Fixes: 8a2ee44a371c ("btrfs: look at bi_size for repair decisions") > Signed-off-by: Omar Sandoval <osandov@fb.com> > --- > fs/btrfs/extent_io.c | 28 ++++++++++++++++++++++------ > fs/btrfs/extent_io.h | 5 +++-- > 2 files changed, 25 insertions(+), 8 deletions(-) > > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c > index 837262d54e28..279731bff0a8 100644 > --- a/fs/btrfs/extent_io.c > +++ b/fs/btrfs/extent_io.c > @@ -2528,8 +2528,9 @@ int btrfs_get_io_failure_record(struct inode *inode, u64 start, u64 end, > return 0; > } > > -bool btrfs_check_repairable(struct inode *inode, unsigned failed_bio_pages, > - struct io_failure_record *failrec, int failed_mirror) > +bool btrfs_check_repairable(struct inode *inode, bool need_validation, nit: While at it this function can be made static. It's only used in extent_io.c and it's defined before its sole caller - bio_readpage_error.
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 837262d54e28..279731bff0a8 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2528,8 +2528,9 @@ int btrfs_get_io_failure_record(struct inode *inode, u64 start, u64 end, return 0; } -bool btrfs_check_repairable(struct inode *inode, unsigned failed_bio_pages, - struct io_failure_record *failrec, int failed_mirror) +bool btrfs_check_repairable(struct inode *inode, bool need_validation, + struct io_failure_record *failrec, + int failed_mirror) { struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); int num_copies; @@ -2552,7 +2553,7 @@ bool btrfs_check_repairable(struct inode *inode, unsigned failed_bio_pages, * a) deliver good data to the caller * b) correct the bad sectors on disk */ - if (failed_bio_pages > 1) { + if (need_validation) { /* * to fulfill b), we need to know the exact failing sectors, as * we don't want to rewrite any more than the failed ones. thus, @@ -2638,11 +2639,13 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset, struct inode *inode = page->mapping->host; struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree; struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree; + bool need_validation = false; + u64 len; + int i; struct bio *bio; int read_mode = 0; blk_status_t status; int ret; - unsigned failed_bio_pages = failed_bio->bi_iter.bi_size >> PAGE_SHIFT; BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE); @@ -2650,13 +2653,26 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset, if (ret) return ret; - if (!btrfs_check_repairable(inode, failed_bio_pages, failrec, + /* + * We need to validate each sector individually if the I/O was for + * multiple sectors. + */ + len = 0; + for (i = 0; i < failed_bio->bi_vcnt; i++) { + len += failed_bio->bi_io_vec[i].bv_len; + if (len > inode->i_sb->s_blocksize) { + need_validation = true; + break; + } + } + + if (!btrfs_check_repairable(inode, need_validation, failrec, failed_mirror)) { free_io_failure(failure_tree, tree, failrec); return -EIO; } - if (failed_bio_pages > 1) + if (need_validation) read_mode |= REQ_FAILFAST_DEV; phy_offset >>= inode->i_sb->s_blocksize_bits; diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index 234622101230..64e176995af2 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -312,8 +312,9 @@ struct io_failure_record { }; -bool btrfs_check_repairable(struct inode *inode, unsigned failed_bio_pages, - struct io_failure_record *failrec, int fail_mirror); +bool btrfs_check_repairable(struct inode *inode, bool need_validation, + struct io_failure_record *failrec, + int failed_mirror); struct bio *btrfs_create_repair_bio(struct inode *inode, struct bio *failed_bio, struct io_failure_record *failrec, struct page *page, int pg_offset, int icsum,