Message ID | 20180418040828.18165-5-david@fromorbit.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed 18-04-18 14:08:28, Dave Chinner wrote: > @@ -1012,8 +1035,16 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, > dio->flags |= IOMAP_DIO_DIRTY; > } else { > dio->flags |= IOMAP_DIO_WRITE; > - if (iocb->ki_flags & IOCB_DSYNC) > + if (iocb->ki_flags & IOCB_DSYNC) { > dio->flags |= IOMAP_DIO_NEED_SYNC; > + /* > + * We optimistically try using FUA for this IO. Any > + * non-FUA write that occurs will clear this flag, hence > + * we know before completion whether a cache flush is > + * necessary. > + */ > + dio->flags |= IOMAP_DIO_WRITE_FUA; > + } So I don't think this is quite correct. IOCB_DSYNC gets set also for O_SYNC writes (in that case we also set IOCB_SYNC). And for those we cannot use the FUA optimization AFAICT (definitely IOMAP_F_DIRTY isn't a safe indicator of a need of full fsync for O_SYNC). Other than that the patch looks good to me. Honza
On Sat, Apr 21, 2018 at 02:54:05PM +0200, Jan Kara wrote: > > - if (iocb->ki_flags & IOCB_DSYNC) > > + if (iocb->ki_flags & IOCB_DSYNC) { > > dio->flags |= IOMAP_DIO_NEED_SYNC; > > + /* > > + * We optimistically try using FUA for this IO. Any > > + * non-FUA write that occurs will clear this flag, hence > > + * we know before completion whether a cache flush is > > + * necessary. > > + */ > > + dio->flags |= IOMAP_DIO_WRITE_FUA; > > + } > > So I don't think this is quite correct. IOCB_DSYNC gets set also for O_SYNC > writes (in that case we also set IOCB_SYNC). And for those we cannot use > the FUA optimization AFAICT (definitely IOMAP_F_DIRTY isn't a safe > indicator of a need of full fsync for O_SYNC). Other than that the patch > looks good to me. Oops, good catch. I think the above if should just be if (iocb->ki_flags & (IOCB_DSYNC | IOCB_SYNC) == IOCB_DSYNC)) { and we are fine. > > Honza > -- > Jan Kara <jack@suse.com> > SUSE Labs, CR > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ---end quoted text---
On 04/24/18 19:34, Christoph Hellwig wrote: > On Sat, Apr 21, 2018 at 02:54:05PM +0200, Jan Kara wrote: >>> - if (iocb->ki_flags & IOCB_DSYNC) >>> + if (iocb->ki_flags & IOCB_DSYNC) { >>> dio->flags |= IOMAP_DIO_NEED_SYNC; >>> + /* >>> + * We optimistically try using FUA for this IO. Any >>> + * non-FUA write that occurs will clear this flag, hence >>> + * we know before completion whether a cache flush is >>> + * necessary. >>> + */ >>> + dio->flags |= IOMAP_DIO_WRITE_FUA; >>> + } >> >> So I don't think this is quite correct. IOCB_DSYNC gets set also for O_SYNC >> writes (in that case we also set IOCB_SYNC). And for those we cannot use >> the FUA optimization AFAICT (definitely IOMAP_F_DIRTY isn't a safe >> indicator of a need of full fsync for O_SYNC). Other than that the patch >> looks good to me. > > Oops, good catch. I think the above if should just be > > if (iocb->ki_flags & (IOCB_DSYNC | IOCB_SYNC) == IOCB_DSYNC)) { > > and we are fine. The above line just gives parenthesis salad errors, so why not compromise on: if ((iocb->ki_flags & (IOCB_DSYNC | IOCB_SYNC)) == IOCB_DSYNC) { Unless my bit twiddling has completely left me I think this is what was intended, and it actually compiles too. cheers Holger
On Wed, Apr 25, 2018 at 12:07:07AM +0200, Holger Hoffstätte wrote: > The above line just gives parenthesis salad errors, so why not compromise > on: > > if ((iocb->ki_flags & (IOCB_DSYNC | IOCB_SYNC)) == IOCB_DSYNC) { > > Unless my bit twiddling has completely left me I think this is what was > intended, and it actually compiles too. Yes, that is what was intended :)
On Wed 25-04-18 00:07:07, Holger Hoffstätte wrote: > On 04/24/18 19:34, Christoph Hellwig wrote: > > On Sat, Apr 21, 2018 at 02:54:05PM +0200, Jan Kara wrote: > > > > - if (iocb->ki_flags & IOCB_DSYNC) > > > > + if (iocb->ki_flags & IOCB_DSYNC) { > > > > dio->flags |= IOMAP_DIO_NEED_SYNC; > > > > + /* > > > > + * We optimistically try using FUA for this IO. Any > > > > + * non-FUA write that occurs will clear this flag, hence > > > > + * we know before completion whether a cache flush is > > > > + * necessary. > > > > + */ > > > > + dio->flags |= IOMAP_DIO_WRITE_FUA; > > > > + } > > > > > > So I don't think this is quite correct. IOCB_DSYNC gets set also for O_SYNC > > > writes (in that case we also set IOCB_SYNC). And for those we cannot use > > > the FUA optimization AFAICT (definitely IOMAP_F_DIRTY isn't a safe > > > indicator of a need of full fsync for O_SYNC). Other than that the patch > > > looks good to me. > > > > Oops, good catch. I think the above if should just be > > > > if (iocb->ki_flags & (IOCB_DSYNC | IOCB_SYNC) == IOCB_DSYNC)) { > > > > and we are fine. > > The above line just gives parenthesis salad errors, so why not compromise > on: > > if ((iocb->ki_flags & (IOCB_DSYNC | IOCB_SYNC)) == IOCB_DSYNC) { > > Unless my bit twiddling has completely left me I think this is what was > intended, and it actually compiles too. Yup, I agree this is what needs to happen. Honza
On Tue, Apr 24, 2018 at 10:34:44AM -0700, Christoph Hellwig wrote: > On Sat, Apr 21, 2018 at 02:54:05PM +0200, Jan Kara wrote: > > > - if (iocb->ki_flags & IOCB_DSYNC) > > > + if (iocb->ki_flags & IOCB_DSYNC) { > > > dio->flags |= IOMAP_DIO_NEED_SYNC; > > > + /* > > > + * We optimistically try using FUA for this IO. Any > > > + * non-FUA write that occurs will clear this flag, hence > > > + * we know before completion whether a cache flush is > > > + * necessary. > > > + */ > > > + dio->flags |= IOMAP_DIO_WRITE_FUA; > > > + } > > > > So I don't think this is quite correct. IOCB_DSYNC gets set also for O_SYNC > > writes (in that case we also set IOCB_SYNC). And for those we cannot use > > the FUA optimization AFAICT (definitely IOMAP_F_DIRTY isn't a safe > > indicator of a need of full fsync for O_SYNC). Other than that the patch > > looks good to me. > > Oops, good catch. I think the above if should just be > > if (iocb->ki_flags & (IOCB_DSYNC | IOCB_SYNC) == IOCB_DSYNC)) { > > and we are fine. Ah, not exactly. IOMAP_DIO_NEED_SYNC needs to be set for either DYSNC or SYNC writes, while IOMAP_DIO_WRITE_FUA should only be set for DSYNC. I'll fix this up appropriately. Cheers, Dave.
diff --git a/fs/iomap.c b/fs/iomap.c index 1f59c2d9ade6..62f1f8256da2 100644 --- a/fs/iomap.c +++ b/fs/iomap.c @@ -685,6 +685,7 @@ EXPORT_SYMBOL_GPL(iomap_seek_data); * Private flags for iomap_dio, must not overlap with the public ones in * iomap.h: */ +#define IOMAP_DIO_WRITE_FUA (1 << 28) #define IOMAP_DIO_NEED_SYNC (1 << 29) #define IOMAP_DIO_WRITE (1 << 30) #define IOMAP_DIO_DIRTY (1 << 31) @@ -860,6 +861,7 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length, struct iov_iter iter; struct bio *bio; bool need_zeroout = false; + bool use_fua = false; int nr_pages, ret; size_t copied = 0; @@ -883,8 +885,20 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length, case IOMAP_MAPPED: if (iomap->flags & IOMAP_F_SHARED) dio->flags |= IOMAP_DIO_COW; - if (iomap->flags & IOMAP_F_NEW) + if (iomap->flags & IOMAP_F_NEW) { need_zeroout = true; + } else { + /* + * Use a FUA write if we need datasync semantics, this + * is a pure data IO that doesn't require any metadata + * updates and the underlying device supports FUA. This + * allows us to avoid cache flushes on IO completion. + */ + if (!(iomap->flags & (IOMAP_F_SHARED|IOMAP_F_DIRTY)) && + (dio->flags & IOMAP_DIO_WRITE_FUA) && + blk_queue_fua(bdev_get_queue(iomap->bdev))) + use_fua = true; + } break; default: WARN_ON_ONCE(1); @@ -932,10 +946,14 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length, n = bio->bi_iter.bi_size; if (dio->flags & IOMAP_DIO_WRITE) { - bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_SYNC | REQ_IDLE); + bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_IDLE; + if (use_fua) + bio->bi_opf |= REQ_FUA; + else + dio->flags &= ~IOMAP_DIO_WRITE_FUA; task_io_account_write(n); } else { - bio_set_op_attrs(bio, REQ_OP_READ, 0); + bio->bi_opf = REQ_OP_READ; if (dio->flags & IOMAP_DIO_DIRTY) bio_set_pages_dirty(bio); } @@ -965,7 +983,12 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length, /* * iomap_dio_rw() always completes O_[D]SYNC writes regardless of whether the IO - * is being issued as AIO or not. + * is being issued as AIO or not. This allows us to optimise pure data writes + * to use REQ_FUA rather than requiring generic_write_sync() to issue a + * REQ_FLUSH post write. This is slightly tricky because a single request here + * can be mapped into multiple disjoint IOs and only a subset of the IOs issued + * may be pure data writes. In that case, we still need to do a full data sync + * completion. */ ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, @@ -1012,8 +1035,16 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, dio->flags |= IOMAP_DIO_DIRTY; } else { dio->flags |= IOMAP_DIO_WRITE; - if (iocb->ki_flags & IOCB_DSYNC) + if (iocb->ki_flags & IOCB_DSYNC) { dio->flags |= IOMAP_DIO_NEED_SYNC; + /* + * We optimistically try using FUA for this IO. Any + * non-FUA write that occurs will clear this flag, hence + * we know before completion whether a cache flush is + * necessary. + */ + dio->flags |= IOMAP_DIO_WRITE_FUA; + } flags |= IOMAP_WRITE; } @@ -1070,6 +1101,13 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, if (ret < 0) iomap_dio_set_error(dio, ret); + /* + * If all the writes we issued were FUA, we don't need to flush the + * cache on IO completion. Clear the sync flag for this case. + */ + if (dio->flags & IOMAP_DIO_WRITE_FUA) + dio->flags &= ~IOMAP_DIO_NEED_SYNC; + if (!atomic_dec_and_test(&dio->ref)) { if (!is_sync_kiocb(iocb)) return -EIOCBQUEUED;