[v1,4/7] xfs: Add b_rw_hint to xfs_buf

Message ID 1543376991-5764-5-git-send-email-allison.henderson@oracle.com (mailing list archive)
State New, archived
Series Block/XFS: Support alternative mirror device retry

Commit Message

Allison Henderson Nov. 28, 2018, 3:49 a.m. UTC
This patch adds a new field b_rw_hint to xfs_buf.  We will
need this to properly initialize the new bio->bi_rw_hint when
submitting the read request.  When the read completes, we
then store the returned mirror in the b_rw_hint.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/xfs_buf.c | 5 ++++-
 fs/xfs/xfs_buf.h | 7 +++++++
 2 files changed, 11 insertions(+), 1 deletion(-)

Comments

Dave Chinner Nov. 28, 2018, 5:03 a.m. UTC | #1
On Tue, Nov 27, 2018 at 08:49:48PM -0700, Allison Henderson wrote:
> This patch adds a new field b_rw_hint to xfs_buf.  We will
> need this to properly initialize the new bio->bi_rw_hint when
> submitting the read request.  When the read completes, we
> then store the returned mirror in the b_rw_hint.
> 
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/xfs_buf.c | 5 ++++-
>  fs/xfs/xfs_buf.h | 7 +++++++
>  2 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index b21ea2b..dd8ba59 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1322,8 +1322,10 @@ xfs_buf_bio_end_io(
>  	if (!bp->b_error && xfs_buf_is_vmapped(bp) && (bp->b_flags & XBF_READ))
>  		invalidate_kernel_vmap_range(bp->b_addr, xfs_buf_vmap_len(bp));
>  
> -	if (atomic_dec_and_test(&bp->b_io_remaining) == 1)
> +	if (atomic_dec_and_test(&bp->b_io_remaining) == 1) {
> +		bp->b_rw_hint = bio->bi_rw_hint;
>  		xfs_buf_ioend_async(bp);
> +	}
>  	bio_put(bio);
>  }
>  

This will miss setting bp->b_rw_hint for IO that completes before
submission returns to __xfs_buf_submit() (i.e. b_io_remaining is 2
at IO completion).

So I suspect it won't do the right thing on fast or synchronous
block devices like pmem. You should be able to test this with a RAID1
made from two ramdisks...

Cheers,

Dave.
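
The ordering problem described in this reply can be illustrated with a small
user-space model. This is only a sketch under stated assumptions: the
model_* names are hypothetical stand-ins and are not kernel code, and it
assumes the submitter holds one reference on b_io_remaining while bios are
being issued and drops it after submission, as the review comment implies.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins for struct bio and struct xfs_buf. */
struct model_bio {
	unsigned short	bi_rw_hint;
};

struct model_buf {
	atomic_int	b_io_remaining;
	unsigned short	b_rw_hint;
};

/* Mimics atomic_dec_and_test(): true only when the counter reaches zero. */
static bool model_dec_and_test(atomic_int *v)
{
	return atomic_fetch_sub(v, 1) == 1;
}

/* Completion handler shaped like the patched xfs_buf_bio_end_io(). */
static void model_end_io(struct model_buf *bp, const struct model_bio *bio)
{
	if (model_dec_and_test(&bp->b_io_remaining))
		bp->b_rw_hint = bio->bi_rw_hint;
}

/*
 * Model one read with a single bio.  If complete_early is true, the bio
 * completes before the submitter drops its own reference, which is the
 * case expected on fast or synchronous devices.  Returns the hint left
 * in the buffer once both references are gone.
 */
static unsigned short model_submit(bool complete_early, unsigned short mirror)
{
	struct model_buf bp = { .b_rw_hint = 0 };
	struct model_bio bio = { .bi_rw_hint = mirror };

	atomic_init(&bp.b_io_remaining, 1);		/* submitter's reference */
	atomic_fetch_add(&bp.b_io_remaining, 1);	/* one in-flight bio */

	if (complete_early)
		model_end_io(&bp, &bio);		/* 2 -> 1: hint not stored */

	model_dec_and_test(&bp.b_io_remaining);		/* submitter drops its ref */

	if (!complete_early)
		model_end_io(&bp, &bio);		/* 1 -> 0: hint stored */

	return bp.b_rw_hint;
}
```

In this model, model_submit(false, 2) returns 2, while model_submit(true, 2)
returns 0: when completion runs first, the final decrement happens on the
submission side and nothing copies the hint back into the buffer.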

Patch

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index b21ea2b..dd8ba59 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1322,8 +1322,10 @@  xfs_buf_bio_end_io(
 	if (!bp->b_error && xfs_buf_is_vmapped(bp) && (bp->b_flags & XBF_READ))
 		invalidate_kernel_vmap_range(bp->b_addr, xfs_buf_vmap_len(bp));
 
-	if (atomic_dec_and_test(&bp->b_io_remaining) == 1)
+	if (atomic_dec_and_test(&bp->b_io_remaining) == 1) {
+		bp->b_rw_hint = bio->bi_rw_hint;
 		xfs_buf_ioend_async(bp);
+	}
 	bio_put(bio);
 }
 
@@ -1369,6 +1371,7 @@  xfs_buf_ioapply_map(
 	bio->bi_iter.bi_sector = sector;
 	bio->bi_end_io = xfs_buf_bio_end_io;
 	bio->bi_private = bp;
+	bio->bi_rw_hint = bp->b_rw_hint;
 	bio_set_op_attrs(bio, op, op_flags);
 
 	for (; size && nr_pages; nr_pages--, page_index++) {
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index b9f5511..db138e5 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -197,6 +197,13 @@  typedef struct xfs_buf {
 	unsigned long		b_first_retry_time; /* in jiffies */
 	int			b_last_error;
 
+	/*
+	 * If b_rw_hint is set before a read, it specifies an alternate mirror
+	 * to read from.  Upon bio completion, b_rw_hint stores the last mirror
+	 * that was read from
+	 */
+	unsigned short		b_rw_hint;
+
 	const struct xfs_buf_ops	*b_ops;
 } xfs_buf_t;