diff mbox

[v8,03/10] iomap: Complete partial direct I/O writes synchronously

Message ID 20180604193123.27655-4-agruenba@redhat.com (mailing list archive)
State New, archived

Commit Message

Andreas Gruenbacher June 4, 2018, 7:31 p.m. UTC
According to xfstest generic/240, applications seem to expect direct I/O
writes to either complete as a whole or to fail; short direct I/O writes
are apparently not appreciated.  This means that when only part of an
asynchronous direct I/O write succeeds, we can either fail the entire
write, or we can wait for the partial write to complete and retry the
remaining write as buffered I/O.  The old __blockdev_direct_IO helper
has code for waiting for partial writes to complete; the new
iomap_dio_rw iomap helper does not.

The above-mentioned fallback mode is needed for gfs2, which doesn't
allow block allocations under direct I/O to avoid taking cluster-wide
exclusive locks.  As a consequence, an asynchronous direct I/O write to
a file range that contains a hole will result in a short write.  In that
case, wait for the short write to complete to allow gfs2 to recover.

This will make xfstest generic/240 work on gfs2.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)
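The all-or-nothing expectation the commit message describes can be sketched from the application's side. This is a hypothetical userspace helper, not part of the patch; the name dio_write_all_or_fail and the choice of -EIO for a short write are illustrative assumptions:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper encoding the expectation generic/240 tests:
 * a direct I/O write either completes as a whole or counts as a
 * failure.  A short write (ret >= 0 but ret != len) is reported as
 * an error rather than retried. */
static ssize_t dio_write_all_or_fail(int fd, const void *buf, size_t len)
{
	ssize_t ret = write(fd, buf, len);

	if (ret < 0)
		return -errno;	/* hard failure */
	if ((size_t)ret != len)
		return -EIO;	/* short write: treat as failure */
	return ret;		/* full write */
}
```

With the patch applied, iomap_dio_rw waits for the partial write and the caller retries the remainder as buffered I/O, so a helper like this no longer sees short writes on gfs2.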

Comments

David Sterba June 5, 2018, 12:10 p.m. UTC | #1
On Mon, Jun 04, 2018 at 09:31:16PM +0200, Andreas Gruenbacher wrote:
> @@ -1062,8 +1063,9 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>  	if (ret < 0)
>  		iomap_dio_set_error(dio, ret);
>  
> +	smp_mb__before_atomic();
>  	if (!atomic_dec_and_test(&dio->ref)) {

The barrier should be documented. I tried to do a quick look around the
code if it's clear why it's there but it's not. Thanks.

> -		if (!is_sync_kiocb(iocb))
> +		if (!dio->wait_for_completion)
>  			return -EIOCBQUEUED;
>  
>  		for (;;) {
Andreas Grünbacher June 5, 2018, 12:32 p.m. UTC | #2
2018-06-05 14:10 GMT+02:00 David Sterba <dsterba@suse.cz>:
> On Mon, Jun 04, 2018 at 09:31:16PM +0200, Andreas Gruenbacher wrote:
>> @@ -1062,8 +1063,9 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>>       if (ret < 0)
>>               iomap_dio_set_error(dio, ret);
>>

+       /*
+        * Make sure changes to dio are visible globally before the atomic
+        * counter decrement.
+        */

>> +     smp_mb__before_atomic();
>>       if (!atomic_dec_and_test(&dio->ref)) {
>
> The barrier should be documented. I tried to do a quick look around the
> code if it's clear why it's there but it's not. Thanks.
>
>> -             if (!is_sync_kiocb(iocb))
>> +             if (!dio->wait_for_completion)
>>                       return -EIOCBQUEUED;
>>
>>               for (;;) {

Thanks,
Andreas
Christoph Hellwig June 6, 2018, 10:26 a.m. UTC | #3
On Tue, Jun 05, 2018 at 02:10:24PM +0200, David Sterba wrote:
> On Mon, Jun 04, 2018 at 09:31:16PM +0200, Andreas Gruenbacher wrote:
> > @@ -1062,8 +1063,9 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> >  	if (ret < 0)
> >  		iomap_dio_set_error(dio, ret);
> >  
> > +	smp_mb__before_atomic();
> >  	if (!atomic_dec_and_test(&dio->ref)) {
> 
> The barrier should be documented. I tried to do a quick look around the
> code if it's clear why it's there but it's not. Thanks.

It doesn't really make sense.  smp_mb__before_atomic is for relaxed
atomic operations, which atomic_dec_and_test is not.
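For illustration, the distinction being drawn here can be sketched with C11 atomics. This is an analogy only, not kernel code; the struct and function names are invented, and kernel atomics differ in detail:

```c
#include <stdatomic.h>
#include <stdbool.h>

struct dio_like {
	int error;		/* plain field written before the drop */
	atomic_int ref;
};

/* A fully ordered read-modify-write, like the kernel's
 * atomic_dec_and_test(), already acts as a full barrier: the
 * earlier store to ->error is ordered before the decrement, so a
 * preceding smp_mb() would be redundant. */
static bool drop_ref_ordered(struct dio_like *d)
{
	return atomic_fetch_sub_explicit(&d->ref, 1,
					 memory_order_seq_cst) == 1;
}

/* A relaxed decrement is the case smp_mb__before_atomic() exists
 * for: without the explicit fence, the ->error store may not be
 * visible to whoever observes the dropped reference. */
static void drop_ref_relaxed(struct dio_like *d)
{
	atomic_thread_fence(memory_order_release);
	atomic_fetch_sub_explicit(&d->ref, 1, memory_order_relaxed);
}
```

In kernel terms: smp_mb__before_atomic() pairs with non-value-returning ops such as atomic_dec(), while value-returning ops such as atomic_dec_and_test() are already fully ordered.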
Andreas Gruenbacher June 6, 2018, 11:44 a.m. UTC | #4
On 6 June 2018 at 12:26, Christoph Hellwig <hch@lst.de> wrote:
> On Tue, Jun 05, 2018 at 02:10:24PM +0200, David Sterba wrote:
>> On Mon, Jun 04, 2018 at 09:31:16PM +0200, Andreas Gruenbacher wrote:
>> > @@ -1062,8 +1063,9 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>> >     if (ret < 0)
>> >             iomap_dio_set_error(dio, ret);
>> >
>> > +   smp_mb__before_atomic();
>> >     if (!atomic_dec_and_test(&dio->ref)) {
>>
>> The barrier should be documented. I tried to do a quick look around the
>> code if it's clear why it's there but it's not. Thanks.
>
> It doesn't really make sense.  smp_mb__before_atomic is for relaxed
> atomic operations, which atomic_dec_and_test is not.

After re-reading Documentation/core-api/atomic_ops.rst, I agree the
barrier isn't needed. There is an example in the documentation that
was confusing me.

Thanks,
Andreas

Patch

diff --git a/fs/iomap.c b/fs/iomap.c
index 54b693da3a35..a0d3b7742060 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -696,6 +696,7 @@  struct iomap_dio {
 	atomic_t		ref;
 	unsigned		flags;
 	int			error;
+	bool			wait_for_completion;
 
 	union {
 		/* used during submission and for synchronous completion: */
@@ -797,9 +798,8 @@  static void iomap_dio_bio_end_io(struct bio *bio)
 		iomap_dio_set_error(dio, blk_status_to_errno(bio->bi_status));
 
 	if (atomic_dec_and_test(&dio->ref)) {
-		if (is_sync_kiocb(dio->iocb)) {
+		if (dio->wait_for_completion) {
 			struct task_struct *waiter = dio->submit.waiter;
-
 			WRITE_ONCE(dio->submit.waiter, NULL);
 			wake_up_process(waiter);
 		} else if (dio->flags & IOMAP_DIO_WRITE) {
@@ -990,13 +990,12 @@  iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	dio->end_io = end_io;
 	dio->error = 0;
 	dio->flags = 0;
+	dio->wait_for_completion = is_sync_kiocb(iocb);
 
 	dio->submit.iter = iter;
-	if (is_sync_kiocb(iocb)) {
-		dio->submit.waiter = current;
-		dio->submit.cookie = BLK_QC_T_NONE;
-		dio->submit.last_queue = NULL;
-	}
+	dio->submit.waiter = current;
+	dio->submit.cookie = BLK_QC_T_NONE;
+	dio->submit.last_queue = NULL;
 
 	if (iov_iter_rw(iter) == READ) {
 		if (pos >= dio->i_size)
@@ -1033,7 +1032,7 @@  iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		dio_warn_stale_pagecache(iocb->ki_filp);
 	ret = 0;
 
-	if (iov_iter_rw(iter) == WRITE && !is_sync_kiocb(iocb) &&
+	if (iov_iter_rw(iter) == WRITE && !dio->wait_for_completion &&
 	    !inode->i_sb->s_dio_done_wq) {
 		ret = sb_init_dio_done_wq(inode->i_sb);
 		if (ret < 0)
@@ -1048,8 +1047,10 @@  iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 				iomap_dio_actor);
 		if (ret <= 0) {
 			/* magic error code to fall back to buffered I/O */
-			if (ret == -ENOTBLK)
+			if (ret == -ENOTBLK) {
+				dio->wait_for_completion = true;
 				ret = 0;
+			}
 			break;
 		}
 		pos += ret;
@@ -1062,8 +1063,9 @@  iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	if (ret < 0)
 		iomap_dio_set_error(dio, ret);
 
+	smp_mb__before_atomic();
 	if (!atomic_dec_and_test(&dio->ref)) {
-		if (!is_sync_kiocb(iocb))
+		if (!dio->wait_for_completion)
 			return -EIOCBQUEUED;
 
 		for (;;) {