diff mbox series

[4/6] ext4: Warn if we ever fallback to buffered-io for DIO atomic writes

Message ID 7c4779f1f0c8ead30f660a2cfbdf4d7cc08e405a.1729825985.git.ritesh.list@gmail.com (mailing list archive)
State Deferred, archived
Headers show
Series ext4: Add atomic write support for DIO | expand

Commit Message

Ritesh Harjani (IBM) Oct. 25, 2024, 3:45 a.m. UTC
iomap will not return -ENOTBLK in case of dio atomic writes. But let's
also add a WARN_ON_ONCE and return -EIO as a safety net.

Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 fs/ext4/file.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Darrick J. Wong Oct. 25, 2024, 4:16 p.m. UTC | #1
On Fri, Oct 25, 2024 at 09:15:53AM +0530, Ritesh Harjani (IBM) wrote:
> iomap will not return -ENOTBLK in case of dio atomic writes. But let's
> also add a WARN_ON_ONCE and return -EIO as a safety net.
> 
> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> ---
>  fs/ext4/file.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index f9516121a036..af6ebd0ac0d6 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -576,8 +576,16 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  		iomap_ops = &ext4_iomap_overwrite_ops;
>  	ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops,
>  			   dio_flags, NULL, 0);
> -	if (ret == -ENOTBLK)
> +	if (ret == -ENOTBLK) {
>  		ret = 0;
> +		/*
> +		 * iomap will never return -ENOTBLK if write fails for atomic
> +		 * write. But let's just add a safety net.

I think it can if the pagecache invalidation fails, so you really do
need the safety net.  I suspect that the xfs version of this series
needs it too, though it may have fallen out?

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> +		 */
> +		if (WARN_ON_ONCE(iocb->ki_flags & IOCB_ATOMIC))
> +			ret = -EIO;
> +	}
> +
>  	if (extend) {
>  		/*
>  		 * We always perform extending DIO write synchronously so by
> -- 
> 2.46.0
> 
>
Ritesh Harjani (IBM) Oct. 25, 2024, 5:51 p.m. UTC | #2
"Darrick J. Wong" <djwong@kernel.org> writes:

> On Fri, Oct 25, 2024 at 09:15:53AM +0530, Ritesh Harjani (IBM) wrote:
>> iomap will not return -ENOTBLK in case of dio atomic writes. But let's
>> also add a WARN_ON_ONCE and return -EIO as a safety net.
>> 
>> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
>> ---
>>  fs/ext4/file.c | 10 +++++++++-
>>  1 file changed, 9 insertions(+), 1 deletion(-)
>> 
>> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
>> index f9516121a036..af6ebd0ac0d6 100644
>> --- a/fs/ext4/file.c
>> +++ b/fs/ext4/file.c
>> @@ -576,8 +576,16 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
>>  		iomap_ops = &ext4_iomap_overwrite_ops;
>>  	ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops,
>>  			   dio_flags, NULL, 0);
>> -	if (ret == -ENOTBLK)
>> +	if (ret == -ENOTBLK) {
>>  		ret = 0;
>> +		/*
>> +		 * iomap will never return -ENOTBLK if write fails for atomic
>> +		 * write. But let's just add a safety net.
>
> I think it can if the pagecache invalidation fails, so you really do
> need the safety net.

Ah, right! So in that case I should remove WARN_ON_ONCE and correct
the comment too.

> I suspect that the xfs version of this series
> needs it too, though it may have fallen out?
>

I think so yes. Looks like it got missed.


> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
>
> --D
>

Thanks for pointing that out.

-ritesh

>> +		 */
>> +		if (WARN_ON_ONCE(iocb->ki_flags & IOCB_ATOMIC))
>> +			ret = -EIO;
>> +	}
>> +
>>  	if (extend) {
>>  		/*
>>  		 * We always perform extending DIO write synchronously so by
>> -- 
>> 2.46.0
>> 
>>
Dave Chinner Oct. 27, 2024, 10:26 p.m. UTC | #3
On Fri, Oct 25, 2024 at 09:15:53AM +0530, Ritesh Harjani (IBM) wrote:
> iomap will not return -ENOTBLK in case of dio atomic writes. But let's
> also add a WARN_ON_ONCE and return -EIO as a safety net.
> 
> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> ---
>  fs/ext4/file.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index f9516121a036..af6ebd0ac0d6 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -576,8 +576,16 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  		iomap_ops = &ext4_iomap_overwrite_ops;
>  	ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops,
>  			   dio_flags, NULL, 0);
> -	if (ret == -ENOTBLK)
> +	if (ret == -ENOTBLK) {
>  		ret = 0;
> +		/*
> +		 * iomap will never return -ENOTBLK if write fails for atomic
> +		 * write. But let's just add a safety net.
> +		 */
> +		if (WARN_ON_ONCE(iocb->ki_flags & IOCB_ATOMIC))
> +			ret = -EIO;
> +	}

Why can't the iomap code return EIO in this case for IOCB_ATOMIC?
That way we don't have to put this logic into every filesystem.

When/if we start supporting atomic writes for buffered IO, then it's
worth pushing this out to filesystems, but right now it doesn't seem
necessary...

-Dave.
Ritesh Harjani (IBM) Oct. 28, 2024, 1:09 a.m. UTC | #4
Hi Dave, 

Dave Chinner <david@fromorbit.com> writes:

> On Fri, Oct 25, 2024 at 09:15:53AM +0530, Ritesh Harjani (IBM) wrote:
>> iomap will not return -ENOTBLK in case of dio atomic writes. But let's
>> also add a WARN_ON_ONCE and return -EIO as a safety net.
>> 
>> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
>> ---
>>  fs/ext4/file.c | 10 +++++++++-
>>  1 file changed, 9 insertions(+), 1 deletion(-)
>> 
>> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
>> index f9516121a036..af6ebd0ac0d6 100644
>> --- a/fs/ext4/file.c
>> +++ b/fs/ext4/file.c
>> @@ -576,8 +576,16 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
>>  		iomap_ops = &ext4_iomap_overwrite_ops;
>>  	ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops,
>>  			   dio_flags, NULL, 0);
>> -	if (ret == -ENOTBLK)
>> +	if (ret == -ENOTBLK) {
>>  		ret = 0;
>> +		/*
>> +		 * iomap will never return -ENOTBLK if write fails for atomic
>> +		 * write. But let's just add a safety net.
>> +		 */
>> +		if (WARN_ON_ONCE(iocb->ki_flags & IOCB_ATOMIC))
>> +			ret = -EIO;
>> +	}
>
> Why can't the iomap code return EIO in this case for IOCB_ATOMIC?
> That way we don't have to put this logic into every filesystem.

This was origially intended as a safety net hence the WARN_ON_ONCE.
Later Darrick pointed out that we still might have an unconverted
condition in iomap which can return ENOTBLK for DIO atomic writes (page
cache invalidation).

You pointed it right that it should be fixed in iomap. However do you
think filesystems can still keep this as safety net (maybe no need of
WARN_ON_ONCE).

>
> When/if we start supporting atomic writes for buffered IO, then it's
> worth pushing this out to filesystems, but right now it doesn't seem
> necessary...
>
> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
Dave Chinner Oct. 28, 2024, 5:26 a.m. UTC | #5
On Mon, Oct 28, 2024 at 06:39:36AM +0530, Ritesh Harjani wrote:
> 
> Hi Dave, 
> 
> Dave Chinner <david@fromorbit.com> writes:
> 
> > On Fri, Oct 25, 2024 at 09:15:53AM +0530, Ritesh Harjani (IBM) wrote:
> >> iomap will not return -ENOTBLK in case of dio atomic writes. But let's
> >> also add a WARN_ON_ONCE and return -EIO as a safety net.
> >> 
> >> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> >> ---
> >>  fs/ext4/file.c | 10 +++++++++-
> >>  1 file changed, 9 insertions(+), 1 deletion(-)
> >> 
> >> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> >> index f9516121a036..af6ebd0ac0d6 100644
> >> --- a/fs/ext4/file.c
> >> +++ b/fs/ext4/file.c
> >> @@ -576,8 +576,16 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
> >>  		iomap_ops = &ext4_iomap_overwrite_ops;
> >>  	ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops,
> >>  			   dio_flags, NULL, 0);
> >> -	if (ret == -ENOTBLK)
> >> +	if (ret == -ENOTBLK) {
> >>  		ret = 0;
> >> +		/*
> >> +		 * iomap will never return -ENOTBLK if write fails for atomic
> >> +		 * write. But let's just add a safety net.
> >> +		 */
> >> +		if (WARN_ON_ONCE(iocb->ki_flags & IOCB_ATOMIC))
> >> +			ret = -EIO;
> >> +	}
> >
> > Why can't the iomap code return EIO in this case for IOCB_ATOMIC?
> > That way we don't have to put this logic into every filesystem.
> 
> This was origially intended as a safety net hence the WARN_ON_ONCE.
> Later Darrick pointed out that we still might have an unconverted
> condition in iomap which can return ENOTBLK for DIO atomic writes (page
> cache invalidation).

Yes. That's my point - iomap knows that it's an atomic write, it
knows that invalidation failed, and it knows that there is no such
thing as buffered atomic writes. So there is no possible fallback
here, and it should be returning EIO in the page cache invalidation
failure case and not ENOTBLK.

> You pointed it right that it should be fixed in iomap. However do you
> think filesystems can still keep this as safety net (maybe no need of
> WARN_ON_ONCE).

I don't see any point in adding "impossible to hit" checks into
filesystems just in case some core infrastructure has a bug
introduced....

-Dave.
Ritesh Harjani (IBM) Oct. 28, 2024, 8:43 a.m. UTC | #6
Dave Chinner <david@fromorbit.com> writes:

> On Mon, Oct 28, 2024 at 06:39:36AM +0530, Ritesh Harjani wrote:
>> 
>> Hi Dave, 
>> 
>> Dave Chinner <david@fromorbit.com> writes:
>> 
>> > On Fri, Oct 25, 2024 at 09:15:53AM +0530, Ritesh Harjani (IBM) wrote:
>> >> iomap will not return -ENOTBLK in case of dio atomic writes. But let's
>> >> also add a WARN_ON_ONCE and return -EIO as a safety net.
>> >> 
>> >> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
>> >> ---
>> >>  fs/ext4/file.c | 10 +++++++++-
>> >>  1 file changed, 9 insertions(+), 1 deletion(-)
>> >> 
>> >> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
>> >> index f9516121a036..af6ebd0ac0d6 100644
>> >> --- a/fs/ext4/file.c
>> >> +++ b/fs/ext4/file.c
>> >> @@ -576,8 +576,16 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
>> >>  		iomap_ops = &ext4_iomap_overwrite_ops;
>> >>  	ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops,
>> >>  			   dio_flags, NULL, 0);
>> >> -	if (ret == -ENOTBLK)
>> >> +	if (ret == -ENOTBLK) {
>> >>  		ret = 0;
>> >> +		/*
>> >> +		 * iomap will never return -ENOTBLK if write fails for atomic
>> >> +		 * write. But let's just add a safety net.
>> >> +		 */
>> >> +		if (WARN_ON_ONCE(iocb->ki_flags & IOCB_ATOMIC))
>> >> +			ret = -EIO;
>> >> +	}
>> >
>> > Why can't the iomap code return EIO in this case for IOCB_ATOMIC?
>> > That way we don't have to put this logic into every filesystem.
>> 
>> This was origially intended as a safety net hence the WARN_ON_ONCE.
>> Later Darrick pointed out that we still might have an unconverted
>> condition in iomap which can return ENOTBLK for DIO atomic writes (page
>> cache invalidation).
>
> Yes. That's my point - iomap knows that it's an atomic write, it
> knows that invalidation failed, and it knows that there is no such
> thing as buffered atomic writes. So there is no possible fallback
> here, and it should be returning EIO in the page cache invalidation
> failure case and not ENOTBLK.
>

Sorry my bad. I think I might have looked into a different version of
the code earlier. So the current patch from John already takes care of
the condition where if the page cache invalidation fails we don't return
-ENOTBLK [1]

[1]: https://lore.kernel.org/linux-xfs/Zxnp8bma2KrMDg5m@li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com/T/#m3664bbe00287d98caa690bb04f51d0ef164f52b3

>> You pointed it right that it should be fixed in iomap. However do you
>> think filesystems can still keep this as safety net (maybe no need of
>> WARN_ON_ONCE).
>
> I don't see any point in adding "impossible to hit" checks into
> filesystems just in case some core infrastructure has a bug
> introduced....
>

So even though we have taken care of that case from page cache
invalidation code, however it can still happen if iomap_iter()
ever returns -ENOTBLK.  

e.g. 

    blk_start_plug(&plug);
	while ((ret = iomap_iter(&iomi, ops)) > 0) {
		iomi.processed = iomap_dio_iter(&iomi, dio);

		/*
		 * We can only poll for single bio I/Os.
		 */
		iocb->ki_flags &= ~IOCB_HIPRI;
	}

	blk_finish_plug(&plug);

	/*
	 * We only report that we've read data up to i_size.
	 * Revert iter to a state corresponding to that as some callers (such
	 * as the splice code) rely on it.
	 */
	if (iov_iter_rw(iter) == READ && iomi.pos >= dio->i_size)
		iov_iter_revert(iter, iomi.pos - dio->i_size);

	if (ret == -EFAULT && dio->size && (dio_flags & IOMAP_DIO_PARTIAL)) {
		if (!(iocb->ki_flags & IOCB_NOWAIT))
			wait_for_completion = true;
		ret = 0;
	}

	/* magic error code to fall back to buffered I/O */
	if (ret == -ENOTBLK) {
		wait_for_completion = true;
		ret = 0;
	}

Reviewing the code paths there is a lot of ping pongs between core iomap
and FS. So it's not just core iomap what we are talking about here.

So I am still inclined towards having that check in place as a safety net. 
However - let me take some time to review some of this code paths
please. I wanted to send this email mainly to mention the point that
page cache invalidation case is already taken care in iomap for atomic
writes, so there is no bug there. 

I will get back on rest of the cases after I have looked more closely at it.

> -Dave.
>
> -- 
> Dave Chinner
> david@fromorbit.com

Thanks for the review!
-ritesh
Ritesh Harjani (IBM) Oct. 28, 2024, 6:14 p.m. UTC | #7
Hi Dave, 

Dave Chinner <david@fromorbit.com> writes:

> On Mon, Oct 28, 2024 at 06:39:36AM +0530, Ritesh Harjani wrote:
>> 
>> Hi Dave, 
>> 
>> Dave Chinner <david@fromorbit.com> writes:
>> 
>> > On Fri, Oct 25, 2024 at 09:15:53AM +0530, Ritesh Harjani (IBM) wrote:
>> >> iomap will not return -ENOTBLK in case of dio atomic writes. But let's
>> >> also add a WARN_ON_ONCE and return -EIO as a safety net.
>> >> 
>> >> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
>> >> ---
>> >>  fs/ext4/file.c | 10 +++++++++-
>> >>  1 file changed, 9 insertions(+), 1 deletion(-)
>> >> 
>> >> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
>> >> index f9516121a036..af6ebd0ac0d6 100644
>> >> --- a/fs/ext4/file.c
>> >> +++ b/fs/ext4/file.c
>> >> @@ -576,8 +576,16 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
>> >>  		iomap_ops = &ext4_iomap_overwrite_ops;
>> >>  	ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops,
>> >>  			   dio_flags, NULL, 0);
>> >> -	if (ret == -ENOTBLK)
>> >> +	if (ret == -ENOTBLK) {
>> >>  		ret = 0;
>> >> +		/*
>> >> +		 * iomap will never return -ENOTBLK if write fails for atomic
>> >> +		 * write. But let's just add a safety net.
>> >> +		 */
>> >> +		if (WARN_ON_ONCE(iocb->ki_flags & IOCB_ATOMIC))
>> >> +			ret = -EIO;
>> >> +	}
>> >
>> > Why can't the iomap code return EIO in this case for IOCB_ATOMIC?
>> > That way we don't have to put this logic into every filesystem.
>> 
>> This was origially intended as a safety net hence the WARN_ON_ONCE.
>> Later Darrick pointed out that we still might have an unconverted
>> condition in iomap which can return ENOTBLK for DIO atomic writes (page
>> cache invalidation).
>
> Yes. That's my point - iomap knows that it's an atomic write, it
> knows that invalidation failed, and it knows that there is no such
> thing as buffered atomic writes. So there is no possible fallback
> here, and it should be returning EIO in the page cache invalidation
> failure case and not ENOTBLK.
>

So the iomap DIO can return following as return values which can make
some filesystems fallback to buffered-io (if they implement fallback
logic) - 
(1) -ENOTBLK -> this is only returned for pagecache invalidation failure.
(2) 0 or partial write size -> This can never happen for atomic writes
(since we are only allowing for single fsblock as of now).

Now looking at XFS, it never fallsback to buffered-io ever except just 2
cases - 
1. When pagecache invalidation fails in iomap (can never happen for
atomic writes)
2. On unaligned DIO writes to reflinked CoW (not possible for atomic writes)

So it anyways should never happen that XFS ever fallback to buffered-io
for DIO atomic writes. Even today it does not fallback to buffered-io
for non-atomic short DIO writes.

>> You pointed it right that it should be fixed in iomap. However do you
>> think filesystems can still keep this as safety net (maybe no need of
>> WARN_ON_ONCE).
>
> I don't see any point in adding "impossible to hit" checks into
> filesystems just in case some core infrastructure has a bug
> introduced....

Yes, that is true for XFS. EXT4 however can return -ENOTBLK for short
writes, though it should not happen for current atomic write case where
we are only allowing for 1 fsblock. 

However given there are several places in EXT4 which has got fallback logic,
I would still like to go with a WARN_ON_ONCE where we are calling ext4
buffered-io handling for DIO fallback writes. This is to catch any bugs
even in future when we move to multi-fsblock case (until we have atomic
write support for buffered-io).

Thanks!
-ritesh
Dave Chinner Oct. 29, 2024, 10:29 p.m. UTC | #8
On Mon, Oct 28, 2024 at 11:44:00PM +0530, Ritesh Harjani wrote:
> 
> Hi Dave, 
> 
> Dave Chinner <david@fromorbit.com> writes:
> 
> > On Mon, Oct 28, 2024 at 06:39:36AM +0530, Ritesh Harjani wrote:
> >> 
> >> Hi Dave, 
> >> 
> >> Dave Chinner <david@fromorbit.com> writes:
> >> 
> >> > On Fri, Oct 25, 2024 at 09:15:53AM +0530, Ritesh Harjani (IBM) wrote:
> >> >> iomap will not return -ENOTBLK in case of dio atomic writes. But let's
> >> >> also add a WARN_ON_ONCE and return -EIO as a safety net.
> >> >> 
> >> >> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> >> >> ---
> >> >>  fs/ext4/file.c | 10 +++++++++-
> >> >>  1 file changed, 9 insertions(+), 1 deletion(-)
> >> >> 
> >> >> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> >> >> index f9516121a036..af6ebd0ac0d6 100644
> >> >> --- a/fs/ext4/file.c
> >> >> +++ b/fs/ext4/file.c
> >> >> @@ -576,8 +576,16 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
> >> >>  		iomap_ops = &ext4_iomap_overwrite_ops;
> >> >>  	ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops,
> >> >>  			   dio_flags, NULL, 0);
> >> >> -	if (ret == -ENOTBLK)
> >> >> +	if (ret == -ENOTBLK) {
> >> >>  		ret = 0;
> >> >> +		/*
> >> >> +		 * iomap will never return -ENOTBLK if write fails for atomic
> >> >> +		 * write. But let's just add a safety net.
> >> >> +		 */
> >> >> +		if (WARN_ON_ONCE(iocb->ki_flags & IOCB_ATOMIC))
> >> >> +			ret = -EIO;
> >> >> +	}
> >> >
> >> > Why can't the iomap code return EIO in this case for IOCB_ATOMIC?
> >> > That way we don't have to put this logic into every filesystem.
> >> 
> >> This was origially intended as a safety net hence the WARN_ON_ONCE.
> >> Later Darrick pointed out that we still might have an unconverted
> >> condition in iomap which can return ENOTBLK for DIO atomic writes (page
> >> cache invalidation).
> >
> > Yes. That's my point - iomap knows that it's an atomic write, it
> > knows that invalidation failed, and it knows that there is no such
> > thing as buffered atomic writes. So there is no possible fallback
> > here, and it should be returning EIO in the page cache invalidation
> > failure case and not ENOTBLK.
> >
> 
> So the iomap DIO can return following as return values which can make
> some filesystems fallback to buffered-io (if they implement fallback
> logic) - 
> (1) -ENOTBLK -> this is only returned for pagecache invalidation failure.
> (2) 0 or partial write size -> This can never happen for atomic writes
> (since we are only allowing for single fsblock as of now).

Even when we allow multi-FSB atomic writes, the definition of
atomic write is still "all or nothing". There is no scope for "short
writes" when IOCB_ATOMIC is set - any condition that means we can't
write the entire IO as a single bio, we need to abort and return
EINVAL.

Hence -ENOTBLK should never be returned by iomap for atomic DIO
writes - we need to say -EINVAL if the write could not be issued
atomically for whatever reason it may be so the application knows
that atomic IO submission was not possible for that IO.

> Now looking at XFS, it never fallsback to buffered-io ever except just 2
> cases - 
> 1. When pagecache invalidation fails in iomap (can never happen for
> atomic writes)

Why can't this happen for atomic DIO writes?  It's the same failure
cases as for normal DIO writes, isn't it? (i.e. race with mmap
writes)

My point is that if it's an atomic write, this failure should get
turned into -EINVAL by the iomap code. We do not want a fallback to
buffered IO when this situation happens for atomic IO.

> 2. On unaligned DIO writes to reflinked CoW (not possible for atomic writes)

This path doesn't ever go through iomap - XFS catches that case
before it calls into iomap, so it's not relevant to how iomap
behaves w.r.t atomic IO.

> So it anyways should never happen that XFS ever fallback to buffered-io
> for DIO atomic writes. Even today it does not fallback to buffered-io
> for non-atomic short DIO writes.
> 
> >> You pointed it right that it should be fixed in iomap. However do you
> >> think filesystems can still keep this as safety net (maybe no need of
> >> WARN_ON_ONCE).
> >
> > I don't see any point in adding "impossible to hit" checks into
> > filesystems just in case some core infrastructure has a bug
> > introduced....
> 
> Yes, that is true for XFS. EXT4 however can return -ENOTBLK for short
> writes, though it should not happen for current atomic write case where
> we are only allowing for 1 fsblock. 

Yes, but the -ENOTBLK error returned from ext4_iomap_end() if
nothing was written does not get returned to ext4 from
__iomap_dio_rw(). It is consumed by the iomap code:

	/* magic error code to fall back to buffered I/O */
        if (ret == -ENOTBLK) {
                wait_for_completion = true;
                ret = 0;
	}

This means that all the IO that was issued gets completed before
returning to the caller and that's how the short write comes about.

-ENOTBLK is *not returned to the caller* on a short write -
iomap_dio_rw will return 0 (success).  The caller then has to look
at the iov_iter state to determine if the write was fully completed.
This is exactly what the ext4 code currently does for all DIO
writes, not just those that return -ENOTBLK.

> I would still like to go with a WARN_ON_ONCE where we are calling ext4
> buffered-io handling for DIO fallback writes. This is to catch any bugs
> even in future when we move to multi-fsblock case (until we have atomic
> write support for buffered-io).

Your choice, but please realise that it is not going to catch short
atomic writes at all.

-Dave.
Ritesh Harjani (IBM) Oct. 29, 2024, 11:51 p.m. UTC | #9
Hi Dave, 

Dave Chinner <david@fromorbit.com> writes:

> On Mon, Oct 28, 2024 at 11:44:00PM +0530, Ritesh Harjani wrote:
>> 
>> Hi Dave, 
>> 
>> Dave Chinner <david@fromorbit.com> writes:
>> 
>> > On Mon, Oct 28, 2024 at 06:39:36AM +0530, Ritesh Harjani wrote:
>> >> 
>> >> Hi Dave, 
>> >> 
>> >> Dave Chinner <david@fromorbit.com> writes:
>> >> 
>> >> > On Fri, Oct 25, 2024 at 09:15:53AM +0530, Ritesh Harjani (IBM) wrote:
>> >> >> iomap will not return -ENOTBLK in case of dio atomic writes. But let's
>> >> >> also add a WARN_ON_ONCE and return -EIO as a safety net.
>> >> >> 
>> >> >> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
>> >> >> ---
>> >> >>  fs/ext4/file.c | 10 +++++++++-
>> >> >>  1 file changed, 9 insertions(+), 1 deletion(-)
>> >> >> 
>> >> >> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
>> >> >> index f9516121a036..af6ebd0ac0d6 100644
>> >> >> --- a/fs/ext4/file.c
>> >> >> +++ b/fs/ext4/file.c
>> >> >> @@ -576,8 +576,16 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
>> >> >>  		iomap_ops = &ext4_iomap_overwrite_ops;
>> >> >>  	ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops,
>> >> >>  			   dio_flags, NULL, 0);
>> >> >> -	if (ret == -ENOTBLK)
>> >> >> +	if (ret == -ENOTBLK) {
>> >> >>  		ret = 0;
>> >> >> +		/*
>> >> >> +		 * iomap will never return -ENOTBLK if write fails for atomic
>> >> >> +		 * write. But let's just add a safety net.
>> >> >> +		 */
>> >> >> +		if (WARN_ON_ONCE(iocb->ki_flags & IOCB_ATOMIC))
>> >> >> +			ret = -EIO;
>> >> >> +	}
>> >> >
>> >> > Why can't the iomap code return EIO in this case for IOCB_ATOMIC?
>> >> > That way we don't have to put this logic into every filesystem.
>> >> 
>> >> This was origially intended as a safety net hence the WARN_ON_ONCE.
>> >> Later Darrick pointed out that we still might have an unconverted
>> >> condition in iomap which can return ENOTBLK for DIO atomic writes (page
>> >> cache invalidation).
>> >
>> > Yes. That's my point - iomap knows that it's an atomic write, it
>> > knows that invalidation failed, and it knows that there is no such
>> > thing as buffered atomic writes. So there is no possible fallback
>> > here, and it should be returning EIO in the page cache invalidation
>> > failure case and not ENOTBLK.
>> >
>> 
>> So the iomap DIO can return following as return values which can make
>> some filesystems fallback to buffered-io (if they implement fallback
>> logic) - 
>> (1) -ENOTBLK -> this is only returned for pagecache invalidation failure.
>> (2) 0 or partial write size -> This can never happen for atomic writes
>> (since we are only allowing for single fsblock as of now).
>
> Even when we allow multi-FSB atomic writes, the definition of
> atomic write is still "all or nothing". There is no scope for "short
> writes" when IOCB_ATOMIC is set - any condition that means we can't
> write the entire IO as a single bio, we need to abort and return
> EINVAL.

yes. As long as it is a single bio, I agree even the short write
condition should not hit based on the current iomap code.

>
> Hence -ENOTBLK should never be returned by iomap for atomic DIO
> writes - we need to say -EINVAL if the write could not be issued
> atomically for whatever reason it may be so the application knows
> that atomic IO submission was not possible for that IO.
>

Agreed Dave. That is what iomap is doing today for atomic write code. 
(Except maybe one minor difference where it returns -EAGAIN in case of
page cache invalidation assuming the failure maybe transient and the
request could be tried again).


	
>> Now looking at XFS, it never fallsback to buffered-io ever except just 2
>> cases - 
>> 1. When pagecache invalidation fails in iomap (can never happen for
>> atomic writes)
>
> Why can't this happen for atomic DIO writes?  It's the same failure
> cases as for normal DIO writes, isn't it? (i.e. race with mmap
> writes)
>

I meant after the patch which adds atomic write support in iomap code
from John, make sure we don't return -ENOTBLK in case of atomic write request. 


> My point is that if it's an atomic write, this failure should get
> turned into -EINVAL by the iomap code. We do not want a fallback to
> buffered IO when this situation happens for atomic IO.
>
>> 2. On unaligned DIO writes to reflinked CoW (not possible for atomic writes)
>
> This path doesn't ever go through iomap - XFS catches that case
> before it calls into iomap, so it's not relevant to how iomap
> behaves w.r.t atomic IO.
>

Right.

>> So it anyways should never happen that XFS ever fallback to buffered-io
>> for DIO atomic writes. Even today it does not fallback to buffered-io
>> for non-atomic short DIO writes.
>> 
>> >> You pointed it right that it should be fixed in iomap. However do you
>> >> think filesystems can still keep this as safety net (maybe no need of
>> >> WARN_ON_ONCE).
>> >
>> > I don't see any point in adding "impossible to hit" checks into
>> > filesystems just in case some core infrastructure has a bug
>> > introduced....
>> 
>> Yes, that is true for XFS. EXT4 however can return -ENOTBLK for short
>> writes, though it should not happen for current atomic write case where
>> we are only allowing for 1 fsblock. 
>
> Yes, but the -ENOTBLK error returned from ext4_iomap_end() if
> nothing was written does not get returned to ext4 from
> __iomap_dio_rw(). It is consumed by the iomap code:
>
> 	/* magic error code to fall back to buffered I/O */
>         if (ret == -ENOTBLK) {
>                 wait_for_completion = true;
>                 ret = 0;
> 	}
>
> This means that all the IO that was issued gets completed before
> returning to the caller and that's how the short write comes about.
>
> -ENOTBLK is *not returned to the caller* on a short write -

yes. That's my understanding too of the short write case handling in
iomap.

> iomap_dio_rw will return 0 (success).  The caller then has to look
> at the iov_iter state to determine if the write was fully completed.
> This is exactly what the ext4 code currently does for all DIO
> writes, not just those that return -ENOTBLK.
>

yes. Agreed.

>> I would still like to go with a WARN_ON_ONCE where we are calling ext4
>> buffered-io handling for DIO fallback writes. This is to catch any bugs
>> even in future when we move to multi-fsblock case (until we have atomic
>> write support for buffered-io).
>
> Your choice, but please realise that it is not going to catch short
> atomic writes at all.
>

Thanks Dave. Yes, I would like to maybe keep a WARN_ON_ONCE since ext4
has a fallback handling logic where a short DIO or -ENOTBLK case could
be later handled by buffered-io logic (though I agree iomap won't let it
happen for atomic write case). 

But a WARN_ON_ONCE just before buffered-io fallback handling logic in
ext4 DIO path would be my preferred choice only to make sure we could
catch any unwanted bugs in future too.

So I was thinking of this change instead - 


diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 8116bd78910b..61787a37e9d4 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -599,6 +599,13 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
                ssize_t err;
                loff_t endbyte;

+               /*
+                * There is no support for atomic writes on buffered-io yet,
+                * we should never fallback to buffered-io for DIO atomic
+                * writes.
+                */
+               WARN_ON_ONCE(iocb->ki_flags & IOCB_ATOMIC);
+
                offset = iocb->ki_pos;
                err = ext4_buffered_write_iter(iocb, from);
                if (err < 0)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index fcdee27b9aa2..26b3c84d7f64 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3449,12 +3449,16 @@ static int ext4_iomap_end(struct inode *inode, loff_t offset, loff_t length,
 {
        /*
         * Check to see whether an error occurred while writing out the data to
-        * the allocated blocks. If so, return the magic error code so that we
-        * fallback to buffered I/O and attempt to complete the remainder of
-        * the I/O. Any blocks that may have been allocated in preparation for
-        * the direct I/O will be reused during buffered I/O.
+        * the allocated blocks. If so, return the magic error code for
+        * non-atomic write so that we fallback to buffered I/O and attempt to
+        * complete the remainder of the I/O.
+        * For atomic writes we will simply fail the I/O request if we coudn't
+        * write anything. For non-atomic writes, any blocks that may have been
+        * allocated in preparation for the direct I/O will be reused during
+        * buffered I/O.
         */
-       if (flags & (IOMAP_WRITE | IOMAP_DIRECT) && written == 0)
+       if (!(flags & IOMAP_ATOMIC) && (flags & (IOMAP_WRITE | IOMAP_DIRECT))
+                       && written == 0)
                return -ENOTBLK;

        return 0;


> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

Thanks a lot for the review!

-ritesh
diff mbox series

Patch

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index f9516121a036..af6ebd0ac0d6 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -576,8 +576,16 @@  static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		iomap_ops = &ext4_iomap_overwrite_ops;
 	ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops,
 			   dio_flags, NULL, 0);
-	if (ret == -ENOTBLK)
+	if (ret == -ENOTBLK) {
 		ret = 0;
+		/*
+		 * iomap will never return -ENOTBLK if write fails for atomic
+		 * write. But let's just add a safety net.
+		 */
+		if (WARN_ON_ONCE(iocb->ki_flags & IOCB_ATOMIC))
+			ret = -EIO;
+	}
+
 	if (extend) {
 		/*
 		 * We always perform extending DIO write synchronously so by