[RESEND] md: Make flush bios explicitely sync

Message ID 20170524114013.14130-1-jack@suse.cz (mailing list archive)
State Superseded, archived
Delegated to: Mike Snitzer

Commit Message

Jan Kara May 24, 2017, 11:40 a.m. UTC
Commit b685d3d65ac7 "block: treat REQ_FUA and REQ_PREFLUSH as
synchronous" removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...}
definitions.  However, generic_make_request_checks() strips the REQ_FUA
and REQ_PREFLUSH flags from a bio when the storage doesn't report a
volatile write cache, so the write effectively becomes asynchronous,
which can lead to performance regressions.

Fix the problem by making sure all bios which are synchronous are
properly marked with REQ_SYNC.

CC: linux-raid@vger.kernel.org
CC: Shaohua Li <shli@kernel.org>
CC: Mike Snitzer <snitzer@redhat.com>
CC: dm-devel@redhat.com
Fixes: b685d3d65ac791406e0dfd8779cc9b3707fea5a3
Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/md/dm-snap-persistent.c | 3 ++-
 drivers/md/md.c                 | 2 +-
 drivers/md/raid5-cache.c        | 4 ++--
 3 files changed, 5 insertions(+), 4 deletions(-)

Guys, I don't know enough about DM/MD to judge whether I've identified all the
places that want REQ_SYNC right. Can you please have a look?
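
The regression described in the commit message can be illustrated with a
small userspace model. This is only a sketch: the flag values and the
helper names strip_flush_flags(), treated_as_sync() and check_model() are
made up for illustration and are not the kernel's definitions. It assumes,
following the title of commit b685d3d65ac7, that a write carrying REQ_FUA
or REQ_PREFLUSH is treated as synchronous even without REQ_SYNC.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the kernel's real flag definitions differ. */
enum {
	REQ_SYNC     = 1u << 0,
	REQ_FUA      = 1u << 1,
	REQ_PREFLUSH = 1u << 2,
};

/*
 * Hypothetical model of the stripping described in the commit message:
 * without a volatile write cache, FUA and PREFLUSH are dropped.
 */
static unsigned int strip_flush_flags(unsigned int opf, bool has_write_cache)
{
	if (!has_write_cache)
		opf &= ~(unsigned int)(REQ_FUA | REQ_PREFLUSH);
	return opf;
}

/* Assumed rule: any of the three flags makes a write count as sync. */
static bool treated_as_sync(unsigned int opf)
{
	return (opf & (REQ_SYNC | REQ_FUA | REQ_PREFLUSH)) != 0;
}

/* Walks through the unpatched and the patched flag combinations. */
static void check_model(void)
{
	unsigned int unpatched = REQ_PREFLUSH | REQ_FUA;
	unsigned int patched = REQ_SYNC | REQ_PREFLUSH | REQ_FUA;

	/* Before stripping, both variants count as sync. */
	assert(treated_as_sync(unpatched));
	assert(treated_as_sync(patched));

	/* No write cache: the unpatched write silently becomes async. */
	assert(!treated_as_sync(strip_flush_flags(unpatched, false)));

	/* The explicit REQ_SYNC survives the stripping. */
	assert(treated_as_sync(strip_flush_flags(patched, false)));
}
```

In this model only the variant that sets REQ_SYNC explicitly stays
synchronous once the flush flags are stripped, which is what the patch
relies on.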

Comments

Shaohua Li May 24, 2017, 11:22 p.m. UTC | #1
On Wed, May 24, 2017 at 01:40:13PM +0200, Jan Kara wrote:
> Commit b685d3d65ac7 "block: treat REQ_FUA and REQ_PREFLUSH as
> synchronous" removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...}
> definitions.  generic_make_request_checks() however strips REQ_FUA and
> REQ_PREFLUSH flags from a bio when the storage doesn't report volatile
> write cache and thus write effectively becomes asynchronous which can
> lead to performance regressions
> 
> Fix the problem by making sure all bios which are synchronous are
> properly marked with REQ_SYNC.

Hi,

DM and MD are different trees, so you should probably split this into 2
patches. For the md part (md.c, raid5-cache.c), some places which use REQ_FUA
are missed, like raid5.c and raid5-ppl.c.

I can't remember if others asked this question in your first post, sorry, but
why don't we add REQ_SYNC in generic_make_request_checks() if we are going to
strip REQ_FUA and REQ_PREFLUSH? That would be less error prone.

Thanks,
Shaohua

> CC: linux-raid@vger.kernel.org
> CC: Shaohua Li <shli@kernel.org>
> CC: Mike Snitzer <snitzer@redhat.com>
> CC: dm-devel@redhat.com
> Fixes: b685d3d65ac791406e0dfd8779cc9b3707fea5a3
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  drivers/md/dm-snap-persistent.c | 3 ++-
>  drivers/md/md.c                 | 2 +-
>  drivers/md/raid5-cache.c        | 4 ++--
>  3 files changed, 5 insertions(+), 4 deletions(-)
> 
> Guys, I don't know enough about DM/MD to judge whether I've identified all the
> places that want REQ_SYNC right. Can you please have a look?
> 
> diff --git a/drivers/md/dm-snap-persistent.c b/drivers/md/dm-snap-persistent.c
> index b93476c3ba3f..b92ab4cb0710 100644
> --- a/drivers/md/dm-snap-persistent.c
> +++ b/drivers/md/dm-snap-persistent.c
> @@ -741,7 +741,8 @@ static void persistent_commit_exception(struct dm_exception_store *store,
>  	/*
>  	 * Commit exceptions to disk.
>  	 */
> -	if (ps->valid && area_io(ps, REQ_OP_WRITE, REQ_PREFLUSH | REQ_FUA))
> +	if (ps->valid && area_io(ps, REQ_OP_WRITE,
> +				 REQ_SYNC | REQ_PREFLUSH | REQ_FUA))
>  		ps->valid = 0;
>  
>  	/*
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 10367ffe92e3..212a6777ff31 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -765,7 +765,7 @@ void md_super_write(struct mddev *mddev, struct md_rdev *rdev,
>  	    test_bit(FailFast, &rdev->flags) &&
>  	    !test_bit(LastDev, &rdev->flags))
>  		ff = MD_FAILFAST;
> -	bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_FUA | ff;
> +	bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH | REQ_FUA | ff;
>  
>  	atomic_inc(&mddev->pending_writes);
>  	submit_bio(bio);
> diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
> index 4c00bc248287..0a7af8b0a80a 100644
> --- a/drivers/md/raid5-cache.c
> +++ b/drivers/md/raid5-cache.c
> @@ -1782,7 +1782,7 @@ static int r5l_log_write_empty_meta_block(struct r5l_log *log, sector_t pos,
>  	mb->checksum = cpu_to_le32(crc32c_le(log->uuid_checksum,
>  					     mb, PAGE_SIZE));
>  	if (!sync_page_io(log->rdev, pos, PAGE_SIZE, page, REQ_OP_WRITE,
> -			  REQ_FUA, false)) {
> +			  REQ_SYNC | REQ_FUA, false)) {
>  		__free_page(page);
>  		return -EIO;
>  	}
> @@ -2388,7 +2388,7 @@ r5c_recovery_rewrite_data_only_stripes(struct r5l_log *log,
>  		mb->checksum = cpu_to_le32(crc32c_le(log->uuid_checksum,
>  						     mb, PAGE_SIZE));
>  		sync_page_io(log->rdev, ctx->pos, PAGE_SIZE, page,
> -			     REQ_OP_WRITE, REQ_FUA, false);
> +			     REQ_OP_WRITE, REQ_SYNC | REQ_FUA, false);
>  		sh->log_start = ctx->pos;
>  		list_add_tail(&sh->r5c, &log->stripe_in_journal_list);
>  		atomic_inc(&log->stripe_in_journal_count);
> -- 
> 2.12.0
> 

Jan Kara May 25, 2017, 8:11 a.m. UTC | #2
On Wed 24-05-17 16:22:36, Shaohua Li wrote:
> On Wed, May 24, 2017 at 01:40:13PM +0200, Jan Kara wrote:
> > Commit b685d3d65ac7 "block: treat REQ_FUA and REQ_PREFLUSH as
> > synchronous" removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...}
> > definitions.  generic_make_request_checks() however strips REQ_FUA and
> > REQ_PREFLUSH flags from a bio when the storage doesn't report volatile
> > write cache and thus write effectively becomes asynchronous which can
> > lead to performance regressions
> > 
> > Fix the problem by making sure all bios which are synchronous are
> > properly marked with REQ_SYNC.
> 
> DM and MD are different trees, so you should probably split this into 2
> patches.

OK, I can do that.

> For the md part (md.c, raid5-cache.c), some places which use REQ_FUA
> are missed, like raid5.c and raid5-ppl.c.

So ops_run_io() in raid5.c only copies REQ_FUA from some internal raid5
flags. My thinking was that we want to just propagate whatever we were
instructed to do here.

The case in ppl_write_empty_header() is clearly missed, I'll fix that.
Thanks. I'm not quite sure about ppl_submit_iounit() - I don't see a place
where we are waiting for those bios to complete. If it is likely to happen
soon after bio submission, we should add REQ_SYNC there.

> I can't remember if others asked this question in your first post, sorry,
> but why don't we add REQ_SYNC in generic_make_request_checks() if we are
> going to strip REQ_FUA and REQ_PREFLUSH? That would be less error prone.

Well, strictly speaking, users of REQ_FUA do not necessarily have to use
REQ_SYNC. These are two orthogonal things: one is a request to bypass the
disk cache, the other is a hint to the IO scheduler that someone is
waiting for the IO to complete. Most of the time you wait for a
REQ_FUA request immediately, but I can see some uses in filesystems
where we might want to submit a REQ_FUA request in the background (like when
doing background cleaning of the journal).

								Honza
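
To make the trade-off above concrete, here is a rough userspace sketch
comparing the two approaches. Everything in it is hypothetical: the flag
values are illustrative, and strip_only() and strip_and_force_sync() are
invented names that model "callers set REQ_SYNC themselves" versus "the
block layer adds REQ_SYNC whenever it strips flush flags". Neither is
kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the kernel's real flag definitions differ. */
enum {
	REQ_SYNC     = 1u << 0,
	REQ_FUA      = 1u << 1,
	REQ_PREFLUSH = 1u << 2,
};

#define FLUSH_FLAGS ((unsigned int)(REQ_FUA | REQ_PREFLUSH))

/* Model of per-caller marking: stripping leaves REQ_SYNC untouched. */
static unsigned int strip_only(unsigned int opf, bool has_write_cache)
{
	if (!has_write_cache)
		opf &= ~FLUSH_FLAGS;
	return opf;
}

/*
 * Model of the centralized suggestion: whenever a flush flag is
 * stripped, REQ_SYNC is set on the caller's behalf.
 */
static unsigned int strip_and_force_sync(unsigned int opf, bool has_write_cache)
{
	if (!has_write_cache && (opf & FLUSH_FLAGS)) {
		opf &= ~FLUSH_FLAGS;
		opf |= REQ_SYNC;
	}
	return opf;
}

/* Compares both models for a waited-upon and a background FUA write. */
static void compare_models(void)
{
	unsigned int waited = REQ_SYNC | REQ_FUA;   /* someone waits for it */
	unsigned int background = REQ_FUA;          /* nobody waits for it  */

	/* Both models keep the waited-upon write marked as sync. */
	assert(strip_only(waited, false) & REQ_SYNC);
	assert(strip_and_force_sync(waited, false) & REQ_SYNC);

	/* Only per-caller marking keeps the background write unmarked. */
	assert(!(strip_only(background, false) & REQ_SYNC));
	assert(strip_and_force_sync(background, false) & REQ_SYNC);
}
```

In this model the centralized variant cannot tell a background REQ_FUA
write from one that somebody is waiting on, which is the distinction the
reply above wants to preserve.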

> > CC: linux-raid@vger.kernel.org
> > CC: Shaohua Li <shli@kernel.org>
> > CC: Mike Snitzer <snitzer@redhat.com>
> > CC: dm-devel@redhat.com
> > Fixes: b685d3d65ac791406e0dfd8779cc9b3707fea5a3
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  drivers/md/dm-snap-persistent.c | 3 ++-
> >  drivers/md/md.c                 | 2 +-
> >  drivers/md/raid5-cache.c        | 4 ++--
> >  3 files changed, 5 insertions(+), 4 deletions(-)
> > 
> > Guys, I don't know enough about DM/MD to judge whether I've identified all the
> > places that want REQ_SYNC right. Can you please have a look?
> > 
> > diff --git a/drivers/md/dm-snap-persistent.c b/drivers/md/dm-snap-persistent.c
> > index b93476c3ba3f..b92ab4cb0710 100644
> > --- a/drivers/md/dm-snap-persistent.c
> > +++ b/drivers/md/dm-snap-persistent.c
> > @@ -741,7 +741,8 @@ static void persistent_commit_exception(struct dm_exception_store *store,
> >  	/*
> >  	 * Commit exceptions to disk.
> >  	 */
> > -	if (ps->valid && area_io(ps, REQ_OP_WRITE, REQ_PREFLUSH | REQ_FUA))
> > +	if (ps->valid && area_io(ps, REQ_OP_WRITE,
> > +				 REQ_SYNC | REQ_PREFLUSH | REQ_FUA))
> >  		ps->valid = 0;
> >  
> >  	/*
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index 10367ffe92e3..212a6777ff31 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -765,7 +765,7 @@ void md_super_write(struct mddev *mddev, struct md_rdev *rdev,
> >  	    test_bit(FailFast, &rdev->flags) &&
> >  	    !test_bit(LastDev, &rdev->flags))
> >  		ff = MD_FAILFAST;
> > -	bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_FUA | ff;
> > +	bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH | REQ_FUA | ff;
> >  
> >  	atomic_inc(&mddev->pending_writes);
> >  	submit_bio(bio);
> > diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
> > index 4c00bc248287..0a7af8b0a80a 100644
> > --- a/drivers/md/raid5-cache.c
> > +++ b/drivers/md/raid5-cache.c
> > @@ -1782,7 +1782,7 @@ static int r5l_log_write_empty_meta_block(struct r5l_log *log, sector_t pos,
> >  	mb->checksum = cpu_to_le32(crc32c_le(log->uuid_checksum,
> >  					     mb, PAGE_SIZE));
> >  	if (!sync_page_io(log->rdev, pos, PAGE_SIZE, page, REQ_OP_WRITE,
> > -			  REQ_FUA, false)) {
> > +			  REQ_SYNC | REQ_FUA, false)) {
> >  		__free_page(page);
> >  		return -EIO;
> >  	}
> > @@ -2388,7 +2388,7 @@ r5c_recovery_rewrite_data_only_stripes(struct r5l_log *log,
> >  		mb->checksum = cpu_to_le32(crc32c_le(log->uuid_checksum,
> >  						     mb, PAGE_SIZE));
> >  		sync_page_io(log->rdev, ctx->pos, PAGE_SIZE, page,
> > -			     REQ_OP_WRITE, REQ_FUA, false);
> > +			     REQ_OP_WRITE, REQ_SYNC | REQ_FUA, false);
> >  		sh->log_start = ctx->pos;
> >  		list_add_tail(&sh->r5c, &log->stripe_in_journal_list);
> >  		atomic_inc(&log->stripe_in_journal_count);
> > -- 
> > 2.12.0
> >

Patch

diff --git a/drivers/md/dm-snap-persistent.c b/drivers/md/dm-snap-persistent.c
index b93476c3ba3f..b92ab4cb0710 100644
--- a/drivers/md/dm-snap-persistent.c
+++ b/drivers/md/dm-snap-persistent.c
@@ -741,7 +741,8 @@  static void persistent_commit_exception(struct dm_exception_store *store,
 	/*
 	 * Commit exceptions to disk.
 	 */
-	if (ps->valid && area_io(ps, REQ_OP_WRITE, REQ_PREFLUSH | REQ_FUA))
+	if (ps->valid && area_io(ps, REQ_OP_WRITE,
+				 REQ_SYNC | REQ_PREFLUSH | REQ_FUA))
 		ps->valid = 0;
 
 	/*
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 10367ffe92e3..212a6777ff31 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -765,7 +765,7 @@  void md_super_write(struct mddev *mddev, struct md_rdev *rdev,
 	    test_bit(FailFast, &rdev->flags) &&
 	    !test_bit(LastDev, &rdev->flags))
 		ff = MD_FAILFAST;
-	bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_FUA | ff;
+	bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH | REQ_FUA | ff;
 
 	atomic_inc(&mddev->pending_writes);
 	submit_bio(bio);
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 4c00bc248287..0a7af8b0a80a 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -1782,7 +1782,7 @@  static int r5l_log_write_empty_meta_block(struct r5l_log *log, sector_t pos,
 	mb->checksum = cpu_to_le32(crc32c_le(log->uuid_checksum,
 					     mb, PAGE_SIZE));
 	if (!sync_page_io(log->rdev, pos, PAGE_SIZE, page, REQ_OP_WRITE,
-			  REQ_FUA, false)) {
+			  REQ_SYNC | REQ_FUA, false)) {
 		__free_page(page);
 		return -EIO;
 	}
@@ -2388,7 +2388,7 @@  r5c_recovery_rewrite_data_only_stripes(struct r5l_log *log,
 		mb->checksum = cpu_to_le32(crc32c_le(log->uuid_checksum,
 						     mb, PAGE_SIZE));
 		sync_page_io(log->rdev, ctx->pos, PAGE_SIZE, page,
-			     REQ_OP_WRITE, REQ_FUA, false);
+			     REQ_OP_WRITE, REQ_SYNC | REQ_FUA, false);
 		sh->log_start = ctx->pos;
 		list_add_tail(&sh->r5c, &log->stripe_in_journal_list);
 		atomic_inc(&log->stripe_in_journal_count);