[13/15] btrfs: simplify direct I/O read repair

Message ID	38cea444fa3f88ca514d161bd979d004c254e969.1583789410.git.osandov@fb.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=wiEv=42=vger.kernel.org=linux-btrfs-owner@kernel.org> From: Omar Sandoval <osandov@osandov.com> To: linux-btrfs@vger.kernel.org Cc: kernel-team@fb.com, Christoph Hellwig <hch@lst.de> Subject: [PATCH 13/15] btrfs: simplify direct I/O read repair Date: Mon, 9 Mar 2020 14:32:39 -0700 Message-Id: <38cea444fa3f88ca514d161bd979d004c254e969.1583789410.git.osandov@fb.com> In-Reply-To: <cover.1583789410.git.osandov@fb.com> References: <cover.1583789410.git.osandov@fb.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk
Series	btrfs: read repair/direct I/O improvements \| expand [00/15] btrfs: read repair/direct I/O improvements [01/15] btrfs: fix error handling when submitting direct I/O bio [02/15] btrfs: fix double __endio_write_update_ordered in direct I/O [03/15] btrfs: look at full bi_io_vec for repair decision [04/15] btrfs: don't do repair validation for checksum errors [05/15] btrfs: clarify btrfs_lookup_bio_sums documentation [06/15] btrfs: rename __readpage_endio_check to check_data_csum [07/15] btrfs: make btrfs_check_repairable() static [08/15] btrfs: move btrfs_dio_private to inode.c [09/15] btrfs: kill btrfs_dio_private->private [10/15] btrfs: convert btrfs_dio_private->pending_bios to refcount_t [11/15] btrfs: put direct I/O checksums in btrfs_dio_private instead of bio [12/15] btrfs: get rid of one layer of bios in direct I/O [13/15] btrfs: simplify direct I/O read repair [14/15] btrfs: get rid of endio_repair_workers [15/15] btrfs: unify buffered and direct I/O read repair

Message ID

38cea444fa3f88ca514d161bd979d004c254e969.1583789410.git.osandov@fb.com (mailing list archive)

State

New, archived

Headers

From: Omar Sandoval <osandov@osandov.com>
To: linux-btrfs@vger.kernel.org
Cc: kernel-team@fb.com, Christoph Hellwig <hch@lst.de>
Subject: [PATCH 13/15] btrfs: simplify direct I/O read repair
Date: Mon,  9 Mar 2020 14:32:39 -0700
Message-Id: 
 <38cea444fa3f88ca514d161bd979d004c254e969.1583789410.git.osandov@fb.com>
In-Reply-To: <cover.1583789410.git.osandov@fb.com>
References: <cover.1583789410.git.osandov@fb.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk

Series

btrfs: read repair/direct I/O improvements | expand

Commit Message

Omar Sandoval March 9, 2020, 9:32 p.m. UTC

From: Omar Sandoval <osandov@fb.com>

Direct I/O read repair is an over-complicated mess. There is major code
duplication between __btrfs_subio_endio_read() (checks checksums and
handles I/O errors for files with checksums),
__btrfs_correct_data_nocsum() (handles I/O errors for files without
checksums), btrfs_retry_endio() (checks checksums and handles I/O errors
for retries of files with checksums), and btrfs_retry_endio_nocsum()
(handles I/O errors for retries of files without checksum). If it sounds
like these should be one function, that's because they should.

After the previous commit getting rid of orig_bio, we can reuse the same
endio callback for repair I/O and the original I/O, we just need to
track the file offset and original iterator in the repair bio. We can
also unify the handling of files with and without checksums and replace
the atrocity that was probably the inspiration for "Go To Statement
Considered Harmful" with normal loops. We also no longer have to wait
for each repair I/O to complete one by one.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/extent_io.c |   2 +
 fs/btrfs/inode.c     | 268 +++++++------------------------------------
 2 files changed, 44 insertions(+), 226 deletions(-)

Comments

Josef Bacik March 11, 2020, 6:16 p.m. UTC | #1

On 3/9/20 5:32 PM, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> Direct I/O read repair is an over-complicated mess. There is major code
> duplication between __btrfs_subio_endio_read() (checks checksums and
> handles I/O errors for files with checksums),
> __btrfs_correct_data_nocsum() (handles I/O errors for files without
> checksums), btrfs_retry_endio() (checks checksums and handles I/O errors
> for retries of files with checksums), and btrfs_retry_endio_nocsum()
> (handles I/O errors for retries of files without checksum). If it sounds
> like these should be one function, that's because they should.
> 
> After the previous commit getting rid of orig_bio, we can reuse the same
> endio callback for repair I/O and the original I/O, we just need to
> track the file offset and original iterator in the repair bio. We can
> also unify the handling of files with and without checksums and replace
> the atrocity that was probably the inspiration for "Go To Statement
> Considered Harmful" with normal loops. We also no longer have to wait
> for each repair I/O to complete one by one.
> 
> Signed-off-by: Omar Sandoval <osandov@fb.com>

Man that was a doozy, took me a while to realize we only ever use our ->logical 
for DIO.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

David Sterba April 3, 2020, 4:40 p.m. UTC | #2

On Mon, Mar 09, 2020 at 02:32:39PM -0700, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> Direct I/O read repair is an over-complicated mess. There is major code
> duplication between __btrfs_subio_endio_read() (checks checksums and
> handles I/O errors for files with checksums),
> __btrfs_correct_data_nocsum() (handles I/O errors for files without
> checksums), btrfs_retry_endio() (checks checksums and handles I/O errors
> for retries of files with checksums), and btrfs_retry_endio_nocsum()
> (handles I/O errors for retries of files without checksum). If it sounds
> like these should be one function, that's because they should.
> 
> After the previous commit getting rid of orig_bio, we can reuse the same
> endio callback for repair I/O and the original I/O, we just need to
> track the file offset and original iterator in the repair bio. We can
> also unify the handling of files with and without checksums and replace
> the atrocity that was probably the inspiration for "Go To Statement
> Considered Harmful" with normal loops. We also no longer have to wait
> for each repair I/O to complete one by one.

This patch looks like a revert of 8b110e393c5a ("Btrfs: implement repair
function when direct read fails"), that probably added the extra layer
you're removing.

So instead of the funny remarks, I'd rather see some analysis that the
issues in the original patch are not coming back.  Quoting from the
changelog:

- When we find the data is not right, we try to read the data from the other
  mirror.
- When the io on the mirror ends, we will insert the endio work into the
  dedicated btrfs workqueue, not common read endio workqueue, because the
  original endio work is still blocked in the btrfs endio workqueue, if we
  insert the endio work of the io on the mirror into that workqueue, deadlock
  would happen.
- After we get right data, we write it back to the corrupted mirror.
- And if the data on the new mirror is still corrupted, we will try next
  mirror until we read right data or all the mirrors are traversed.
- After the above work, we set the uptodate flag according to the result.

It's not too detailed either, but what immediatelly looks suspicious is
the extra workqueue that was added to avoid deadlocks. That is now going
to be removed. This seems like a high level change even for such an old
code (2014) so that its effects are not affected by some other changes
in the dio code.

Omar Sandoval April 3, 2020, 6:05 p.m. UTC | #3

On Fri, Apr 03, 2020 at 06:40:51PM +0200, David Sterba wrote:
> On Mon, Mar 09, 2020 at 02:32:39PM -0700, Omar Sandoval wrote:
> > From: Omar Sandoval <osandov@fb.com>
> > 
> > Direct I/O read repair is an over-complicated mess. There is major code
> > duplication between __btrfs_subio_endio_read() (checks checksums and
> > handles I/O errors for files with checksums),
> > __btrfs_correct_data_nocsum() (handles I/O errors for files without
> > checksums), btrfs_retry_endio() (checks checksums and handles I/O errors
> > for retries of files with checksums), and btrfs_retry_endio_nocsum()
> > (handles I/O errors for retries of files without checksum). If it sounds
> > like these should be one function, that's because they should.
> > 
> > After the previous commit getting rid of orig_bio, we can reuse the same
> > endio callback for repair I/O and the original I/O, we just need to
> > track the file offset and original iterator in the repair bio. We can
> > also unify the handling of files with and without checksums and replace
> > the atrocity that was probably the inspiration for "Go To Statement
> > Considered Harmful" with normal loops. We also no longer have to wait
> > for each repair I/O to complete one by one.
> 
> This patch looks like a revert of 8b110e393c5a ("Btrfs: implement repair
> function when direct read fails"), that probably added the extra layer
> you're removing.
> 
> So instead of the funny remarks, I'd rather see some analysis that the
> issues in the original patch are not coming back.  Quoting from the
> changelog:
> 
> - When we find the data is not right, we try to read the data from the other
>   mirror.
> - When the io on the mirror ends, we will insert the endio work into the
>   dedicated btrfs workqueue, not common read endio workqueue, because the
>   original endio work is still blocked in the btrfs endio workqueue, if we
>   insert the endio work of the io on the mirror into that workqueue, deadlock
>   would happen.
> - After we get right data, we write it back to the corrupted mirror.
> - And if the data on the new mirror is still corrupted, we will try next
>   mirror until we read right data or all the mirrors are traversed.
> - After the above work, we set the uptodate flag according to the result.
> 
> It's not too detailed either, but what immediatelly looks suspicious is
> the extra workqueue that was added to avoid deadlocks. That is now going
> to be removed. This seems like a high level change even for such an old
> code (2014) so that its effects are not affected by some other changes
> in the dio code.

This patch doesn't touch the extra workqueue. The next patch that gets
rid of it has an explanation:

    This was originally added in commit 8b110e393c5a ("Btrfs: implement
    repair function when direct read fails") because the original bio waited
    for the repair bio to complete, so the repair I/O couldn't go through
    the same workqueue. As of the previous commit, this is no longer true,
    so this separate workqueue is unnecessary.

I can expand on that for v2. The deadlock addressed by the original code
is pretty much that while the worker is executing the original bio, it
will be blocked on the repair bio completing, and the repair bio will be
blocked on the worker finishing the original bio. With this rework, the
original bio doesn't block on the repair bio, so the worker becomes
available for the repair bio right away.

David Sterba April 16, 2020, 10:08 a.m. UTC | #4

On Fri, Apr 03, 2020 at 11:05:23AM -0700, Omar Sandoval wrote:
> On Fri, Apr 03, 2020 at 06:40:51PM +0200, David Sterba wrote:
> > On Mon, Mar 09, 2020 at 02:32:39PM -0700, Omar Sandoval wrote:
> > > From: Omar Sandoval <osandov@fb.com>
> > > 
> > > Direct I/O read repair is an over-complicated mess. There is major code
> > > duplication between __btrfs_subio_endio_read() (checks checksums and
> > > handles I/O errors for files with checksums),
> > > __btrfs_correct_data_nocsum() (handles I/O errors for files without
> > > checksums), btrfs_retry_endio() (checks checksums and handles I/O errors
> > > for retries of files with checksums), and btrfs_retry_endio_nocsum()
> > > (handles I/O errors for retries of files without checksum). If it sounds
> > > like these should be one function, that's because they should.
> > > 
> > > After the previous commit getting rid of orig_bio, we can reuse the same
> > > endio callback for repair I/O and the original I/O, we just need to
> > > track the file offset and original iterator in the repair bio. We can
> > > also unify the handling of files with and without checksums and replace
> > > the atrocity that was probably the inspiration for "Go To Statement
> > > Considered Harmful" with normal loops. We also no longer have to wait
> > > for each repair I/O to complete one by one.
> > 
> > This patch looks like a revert of 8b110e393c5a ("Btrfs: implement repair
> > function when direct read fails"), that probably added the extra layer
> > you're removing.
> > 
> > So instead of the funny remarks, I'd rather see some analysis that the
> > issues in the original patch are not coming back.  Quoting from the
> > changelog:
> > 
> > - When we find the data is not right, we try to read the data from the other
> >   mirror.
> > - When the io on the mirror ends, we will insert the endio work into the
> >   dedicated btrfs workqueue, not common read endio workqueue, because the
> >   original endio work is still blocked in the btrfs endio workqueue, if we
> >   insert the endio work of the io on the mirror into that workqueue, deadlock
> >   would happen.
> > - After we get right data, we write it back to the corrupted mirror.
> > - And if the data on the new mirror is still corrupted, we will try next
> >   mirror until we read right data or all the mirrors are traversed.
> > - After the above work, we set the uptodate flag according to the result.
> > 
> > It's not too detailed either, but what immediatelly looks suspicious is
> > the extra workqueue that was added to avoid deadlocks. That is now going
> > to be removed. This seems like a high level change even for such an old
> > code (2014) so that its effects are not affected by some other changes
> > in the dio code.
> 
> This patch doesn't touch the extra workqueue. The next patch that gets
> rid of it has an explanation:
> 
>     This was originally added in commit 8b110e393c5a ("Btrfs: implement
>     repair function when direct read fails") because the original bio waited
>     for the repair bio to complete, so the repair I/O couldn't go through
>     the same workqueue. As of the previous commit, this is no longer true,
>     so this separate workqueue is unnecessary.
> 
> I can expand on that for v2. The deadlock addressed by the original code
> is pretty much that while the worker is executing the original bio, it
> will be blocked on the repair bio completing, and the repair bio will be
> blocked on the worker finishing the original bio. With this rework, the
> original bio doesn't block on the repair bio, so the worker becomes
> available for the repair bio right away.

I haven't noticed that the next patch mentions the commmit 8b110e393c5a,
so for clarity it would be better to have more references in both. The
explanation above sounds good to me.

Please send v2, of the whole patchset, it's post rc1 so time for new
code. I'm testing the dio-iomap code, it's getting better so hopefully
both patchsets can be merged together.

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index aee35d431f91..fad86ef4d09d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2622,6 +2622,8 @@  struct bio *btrfs_create_repair_bio(struct inode *inode, struct bio *failed_bio,
 	}
 
 	bio_add_page(bio, page, failrec->len, pg_offset);
+	btrfs_io_bio(bio)->logical = failrec->start;
+	btrfs_io_bio(bio)->iter = bio->bi_iter;
 
 	return bio;
 }
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 40c1562704e9..ef302b7c6c2d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7463,19 +7463,17 @@  static int btrfs_check_dio_repairable(struct inode *inode,
 
 static blk_status_t dio_read_error(struct inode *inode, struct bio *failed_bio,
 				   struct page *page, unsigned int pgoff,
-				   u64 start, u64 end, int failed_mirror,
-				   bio_end_io_t *repair_endio, void *repair_arg)
+				   u64 start, u64 end, int failed_mirror)
 {
+	struct btrfs_dio_private *dip = failed_bio->bi_private;
 	struct io_failure_record *failrec;
 	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
 	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
 	struct bio *bio;
 	int isector;
 	unsigned int read_mode = 0;
-	int segs;
 	int ret;
 	blk_status_t status;
-	struct bio_vec bvec;
 
 	BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
 
@@ -7490,261 +7488,79 @@  static blk_status_t dio_read_error(struct inode *inode, struct bio *failed_bio,
 		return BLK_STS_IOERR;
 	}
 
-	segs = bio_segments(failed_bio);
-	bio_get_first_bvec(failed_bio, &bvec);
-	if (segs > 1 ||
-	    (bvec.bv_len > btrfs_inode_sectorsize(inode)))
+	if (btrfs_io_bio(failed_bio)->iter.bi_size > inode->i_sb->s_blocksize)
 		read_mode |= REQ_FAILFAST_DEV;
 
 	isector = start - btrfs_io_bio(failed_bio)->logical;
 	isector >>= inode->i_sb->s_blocksize_bits;
-	bio = btrfs_create_repair_bio(inode, failed_bio, failrec, page,
-				pgoff, isector, repair_endio, repair_arg);
+	bio = btrfs_create_repair_bio(inode, failed_bio, failrec, page, pgoff,
+				      isector, failed_bio->bi_end_io, dip);
 	bio->bi_opf = REQ_OP_READ | read_mode;
 
 	btrfs_debug(BTRFS_I(inode)->root->fs_info,
 		    "repair DIO read error: submitting new dio read[%#x] to this_mirror=%d, in_validation=%d",
 		    read_mode, failrec->this_mirror, failrec->in_validation);
 
+	refcount_inc(&dip->refs);
 	status = submit_dio_repair_bio(inode, bio, failrec->this_mirror);
 	if (status) {
 		free_io_failure(failure_tree, io_tree, failrec);
 		bio_put(bio);
+		refcount_dec(&dip->refs);
 	}
 
 	return status;
 }
 
-struct btrfs_retry_complete {
-	struct completion done;
-	struct inode *inode;
-	u64 start;
-	int uptodate;
-};
-
-static void btrfs_retry_endio_nocsum(struct bio *bio)
-{
-	struct btrfs_retry_complete *done = bio->bi_private;
-	struct inode *inode = done->inode;
-	struct bio_vec *bvec;
-	struct extent_io_tree *io_tree, *failure_tree;
-	struct bvec_iter_all iter_all;
-
-	if (bio->bi_status)
-		goto end;
-
-	ASSERT(bio->bi_vcnt == 1);
-	io_tree = &BTRFS_I(inode)->io_tree;
-	failure_tree = &BTRFS_I(inode)->io_failure_tree;
-	ASSERT(bio_first_bvec_all(bio)->bv_len == btrfs_inode_sectorsize(inode));
-
-	done->uptodate = 1;
-	ASSERT(!bio_flagged(bio, BIO_CLONED));
-	bio_for_each_segment_all(bvec, bio, iter_all)
-		clean_io_failure(BTRFS_I(inode)->root->fs_info, failure_tree,
-				 io_tree, done->start, bvec->bv_page,
-				 btrfs_ino(BTRFS_I(inode)), 0);
-end:
-	complete(&done->done);
-	bio_put(bio);
-}
-
-static blk_status_t __btrfs_correct_data_nocsum(struct inode *inode,
-						struct btrfs_io_bio *io_bio)
+static blk_status_t btrfs_check_read_dio_bio(struct inode *inode,
+					     struct btrfs_io_bio *io_bio,
+					     const bool uptodate)
 {
-	struct btrfs_fs_info *fs_info;
+	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
+	u32 sectorsize = fs_info->sectorsize;
+	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
+	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
+	const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM);
 	struct bio_vec bvec;
 	struct bvec_iter iter;
-	struct btrfs_retry_complete done;
-	u64 start;
-	unsigned int pgoff;
-	u32 sectorsize;
-	int nr_sectors;
-	blk_status_t ret;
+	u64 start = io_bio->logical;
+	int icsum = 0;
 	blk_status_t err = BLK_STS_OK;
 
-	fs_info = BTRFS_I(inode)->root->fs_info;
-	sectorsize = fs_info->sectorsize;
-
-	start = io_bio->logical;
-	done.inode = inode;
-	io_bio->bio.bi_iter = io_bio->iter;
+	__bio_for_each_segment(bvec, &io_bio->bio, iter, io_bio->iter) {
+		unsigned int i, nr_sectors, pgoff;
 
-	bio_for_each_segment(bvec, &io_bio->bio, iter) {
 		nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec.bv_len);
 		pgoff = bvec.bv_offset;
-
-next_block_or_try_again:
-		done.uptodate = 0;
-		done.start = start;
-		init_completion(&done.done);
-
-		ret = dio_read_error(inode, &io_bio->bio, bvec.bv_page,
-				pgoff, start, start + sectorsize - 1,
-				io_bio->mirror_num,
-				btrfs_retry_endio_nocsum, &done);
-		if (ret) {
-			err = ret;
-			goto next;
-		}
-
-		wait_for_completion_io(&done.done);
-
-		if (!done.uptodate) {
-			/* We might have another mirror, so try again */
-			goto next_block_or_try_again;
-		}
-
-next:
-		start += sectorsize;
-
-		nr_sectors--;
-		if (nr_sectors) {
-			pgoff += sectorsize;
-			ASSERT(pgoff < PAGE_SIZE);
-			goto next_block_or_try_again;
-		}
-	}
-
-	return err;
-}
-
-static void btrfs_retry_endio(struct bio *bio)
-{
-	struct btrfs_retry_complete *done = bio->bi_private;
-	struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
-	struct extent_io_tree *io_tree, *failure_tree;
-	struct inode *inode = done->inode;
-	struct bio_vec *bvec;
-	int uptodate;
-	int ret;
-	int i = 0;
-	struct bvec_iter_all iter_all;
-
-	if (bio->bi_status)
-		goto end;
-
-	uptodate = 1;
-
-	ASSERT(bio->bi_vcnt == 1);
-	ASSERT(bio_first_bvec_all(bio)->bv_len == btrfs_inode_sectorsize(done->inode));
-
-	io_tree = &BTRFS_I(inode)->io_tree;
-	failure_tree = &BTRFS_I(inode)->io_failure_tree;
-
-	ASSERT(!bio_flagged(bio, BIO_CLONED));
-	bio_for_each_segment_all(bvec, bio, iter_all) {
-		ret = check_data_csum(inode, io_bio, i, bvec->bv_page,
-				      bvec->bv_offset, done->start,
-				      bvec->bv_len);
-		if (!ret)
-			clean_io_failure(BTRFS_I(inode)->root->fs_info,
-					 failure_tree, io_tree, done->start,
-					 bvec->bv_page,
-					 btrfs_ino(BTRFS_I(inode)),
-					 bvec->bv_offset);
-		else
-			uptodate = 0;
-		i++;
-	}
-
-	done->uptodate = uptodate;
-end:
-	complete(&done->done);
-	bio_put(bio);
-}
-
-static blk_status_t __btrfs_subio_endio_read(struct inode *inode,
-		struct btrfs_io_bio *io_bio, blk_status_t err)
-{
-	struct btrfs_fs_info *fs_info;
-	struct bio_vec bvec;
-	struct bvec_iter iter;
-	struct btrfs_retry_complete done;
-	u64 start;
-	u64 offset = 0;
-	u32 sectorsize;
-	int nr_sectors;
-	unsigned int pgoff;
-	int csum_pos;
-	bool uptodate = (err == 0);
-	int ret;
-	blk_status_t status;
-
-	fs_info = BTRFS_I(inode)->root->fs_info;
-	sectorsize = fs_info->sectorsize;
-
-	err = BLK_STS_OK;
-	start = io_bio->logical;
-	done.inode = inode;
-	io_bio->bio.bi_iter = io_bio->iter;
-
-	bio_for_each_segment(bvec, &io_bio->bio, iter) {
-		nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec.bv_len);
-
-		pgoff = bvec.bv_offset;
-next_block:
-		if (uptodate) {
-			csum_pos = BTRFS_BYTES_TO_BLKS(fs_info, offset);
-			ret = check_data_csum(inode, io_bio, csum_pos,
-					      bvec.bv_page, pgoff, start,
-					      sectorsize);
-			if (likely(!ret))
-				goto next;
-		}
-try_again:
-		done.uptodate = 0;
-		done.start = start;
-		init_completion(&done.done);
-
-		status = dio_read_error(inode, &io_bio->bio, bvec.bv_page,
-					pgoff, start, start + sectorsize - 1,
-					io_bio->mirror_num, btrfs_retry_endio,
-					&done);
-		if (status) {
-			err = status;
-			goto next;
-		}
-
-		wait_for_completion_io(&done.done);
-
-		if (!done.uptodate) {
-			/* We might have another mirror, so try again */
-			goto try_again;
-		}
-next:
-		offset += sectorsize;
-		start += sectorsize;
-
-		ASSERT(nr_sectors);
-
-		nr_sectors--;
-		if (nr_sectors) {
+		for (i = 0; i < nr_sectors; i++) {
+			if (uptodate &&
+			    (!csum || !check_data_csum(inode, io_bio, icsum,
+						       bvec.bv_page, pgoff,
+						       start, sectorsize))) {
+				clean_io_failure(fs_info, failure_tree, io_tree,
+						 start, bvec.bv_page,
+						 btrfs_ino(BTRFS_I(inode)),
+						 pgoff);
+			} else {
+				blk_status_t status;
+
+				status = dio_read_error(inode, &io_bio->bio,
+							bvec.bv_page, pgoff,
+							start,
+							start + sectorsize - 1,
+							io_bio->mirror_num);
+				if (status)
+					err = status;
+			}
+			start += sectorsize;
+			icsum++;
 			pgoff += sectorsize;
 			ASSERT(pgoff < PAGE_SIZE);
-			goto next_block;
 		}
 	}
-
 	return err;
 }
 
-static blk_status_t btrfs_check_read_dio_bio(struct inode *inode,
-					     struct btrfs_io_bio *io_bio,
-					     blk_status_t err)
-{
-	bool skip_csum = BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM;
-
-	if (skip_csum) {
-		if (unlikely(err))
-			return __btrfs_correct_data_nocsum(inode, io_bio);
-		else
-			return BLK_STS_OK;
-	} else {
-		return __btrfs_subio_endio_read(inode, io_bio, err);
-	}
-}
-
 static void __endio_write_update_ordered(struct inode *inode,
 					 const u64 offset, const u64 bytes,
 					 const bool uptodate)
@@ -7813,7 +7629,7 @@  static void btrfs_end_dio_bio(struct bio *bio)
 
 	if (bio_op(bio) == REQ_OP_READ) {
 		err = btrfs_check_read_dio_bio(dip->inode, btrfs_io_bio(bio),
-					       err);
+					       !err);
 	}
 
 	if (err)

[13/15] btrfs: simplify direct I/O read repair

Commit Message

Comments

Patch