Message ID | 7c4779f1f0c8ead30f660a2cfbdf4d7cc08e405a.1729825985.git.ritesh.list@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | ext4: Add atomic write support for DIO | expand |
On Fri, Oct 25, 2024 at 09:15:53AM +0530, Ritesh Harjani (IBM) wrote: > iomap will not return -ENOTBLK in case of dio atomic writes. But let's > also add a WARN_ON_ONCE and return -EIO as a safety net. > > Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> > --- > fs/ext4/file.c | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/fs/ext4/file.c b/fs/ext4/file.c > index f9516121a036..af6ebd0ac0d6 100644 > --- a/fs/ext4/file.c > +++ b/fs/ext4/file.c > @@ -576,8 +576,16 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from) > iomap_ops = &ext4_iomap_overwrite_ops; > ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops, > dio_flags, NULL, 0); > - if (ret == -ENOTBLK) > + if (ret == -ENOTBLK) { > ret = 0; > + /* > + * iomap will never return -ENOTBLK if write fails for atomic > + * write. But let's just add a safety net. I think it can if the pagecache invalidation fails, so you really do need the safety net. I suspect that the xfs version of this series needs it too, though it may have fallen out? Reviewed-by: Darrick J. Wong <djwong@kernel.org> --D > + */ > + if (WARN_ON_ONCE(iocb->ki_flags & IOCB_ATOMIC)) > + ret = -EIO; > + } > + > if (extend) { > /* > * We always perform extending DIO write synchronously so by > -- > 2.46.0 > >
"Darrick J. Wong" <djwong@kernel.org> writes: > On Fri, Oct 25, 2024 at 09:15:53AM +0530, Ritesh Harjani (IBM) wrote: >> iomap will not return -ENOTBLK in case of dio atomic writes. But let's >> also add a WARN_ON_ONCE and return -EIO as a safety net. >> >> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> >> --- >> fs/ext4/file.c | 10 +++++++++- >> 1 file changed, 9 insertions(+), 1 deletion(-) >> >> diff --git a/fs/ext4/file.c b/fs/ext4/file.c >> index f9516121a036..af6ebd0ac0d6 100644 >> --- a/fs/ext4/file.c >> +++ b/fs/ext4/file.c >> @@ -576,8 +576,16 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from) >> iomap_ops = &ext4_iomap_overwrite_ops; >> ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops, >> dio_flags, NULL, 0); >> - if (ret == -ENOTBLK) >> + if (ret == -ENOTBLK) { >> ret = 0; >> + /* >> + * iomap will never return -ENOTBLK if write fails for atomic >> + * write. But let's just add a safety net. > > I think it can if the pagecache invalidation fails, so you really do > need the safety net. Ah, right! So in that case I should remove WARN_ON_ONCE and correct the comment too. > I suspect that the xfs version of this series > needs it too, though it may have fallen out? > I think so yes. Looks like it got missed. > Reviewed-by: Darrick J. Wong <djwong@kernel.org> > > --D > Thanks for pointing that out. -ritesh >> + */ >> + if (WARN_ON_ONCE(iocb->ki_flags & IOCB_ATOMIC)) >> + ret = -EIO; >> + } >> + >> if (extend) { >> /* >> * We always perform extending DIO write synchronously so by >> -- >> 2.46.0 >> >>
On Fri, Oct 25, 2024 at 09:15:53AM +0530, Ritesh Harjani (IBM) wrote: > iomap will not return -ENOTBLK in case of dio atomic writes. But let's > also add a WARN_ON_ONCE and return -EIO as a safety net. > > Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> > --- > fs/ext4/file.c | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/fs/ext4/file.c b/fs/ext4/file.c > index f9516121a036..af6ebd0ac0d6 100644 > --- a/fs/ext4/file.c > +++ b/fs/ext4/file.c > @@ -576,8 +576,16 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from) > iomap_ops = &ext4_iomap_overwrite_ops; > ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops, > dio_flags, NULL, 0); > - if (ret == -ENOTBLK) > + if (ret == -ENOTBLK) { > ret = 0; > + /* > + * iomap will never return -ENOTBLK if write fails for atomic > + * write. But let's just add a safety net. > + */ > + if (WARN_ON_ONCE(iocb->ki_flags & IOCB_ATOMIC)) > + ret = -EIO; > + } Why can't the iomap code return EIO in this case for IOCB_ATOMIC? That way we don't have to put this logic into every filesystem. When/if we start supporting atomic writes for buffered IO, then it's worth pushing this out to filesystems, but right now it doesn't seem necessary... -Dave.
Hi Dave, Dave Chinner <david@fromorbit.com> writes: > On Fri, Oct 25, 2024 at 09:15:53AM +0530, Ritesh Harjani (IBM) wrote: >> iomap will not return -ENOTBLK in case of dio atomic writes. But let's >> also add a WARN_ON_ONCE and return -EIO as a safety net. >> >> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> >> --- >> fs/ext4/file.c | 10 +++++++++- >> 1 file changed, 9 insertions(+), 1 deletion(-) >> >> diff --git a/fs/ext4/file.c b/fs/ext4/file.c >> index f9516121a036..af6ebd0ac0d6 100644 >> --- a/fs/ext4/file.c >> +++ b/fs/ext4/file.c >> @@ -576,8 +576,16 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from) >> iomap_ops = &ext4_iomap_overwrite_ops; >> ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops, >> dio_flags, NULL, 0); >> - if (ret == -ENOTBLK) >> + if (ret == -ENOTBLK) { >> ret = 0; >> + /* >> + * iomap will never return -ENOTBLK if write fails for atomic >> + * write. But let's just add a safety net. >> + */ >> + if (WARN_ON_ONCE(iocb->ki_flags & IOCB_ATOMIC)) >> + ret = -EIO; >> + } > > Why can't the iomap code return EIO in this case for IOCB_ATOMIC? > That way we don't have to put this logic into every filesystem. This was origially intended as a safety net hence the WARN_ON_ONCE. Later Darrick pointed out that we still might have an unconverted condition in iomap which can return ENOTBLK for DIO atomic writes (page cache invalidation). You pointed it right that it should be fixed in iomap. However do you think filesystems can still keep this as safety net (maybe no need of WARN_ON_ONCE). > > When/if we start supporting atomic writes for buffered IO, then it's > worth pushing this out to filesystems, but right now it doesn't seem > necessary... > > -Dave. > -- > Dave Chinner > david@fromorbit.com
On Mon, Oct 28, 2024 at 06:39:36AM +0530, Ritesh Harjani wrote: > > Hi Dave, > > Dave Chinner <david@fromorbit.com> writes: > > > On Fri, Oct 25, 2024 at 09:15:53AM +0530, Ritesh Harjani (IBM) wrote: > >> iomap will not return -ENOTBLK in case of dio atomic writes. But let's > >> also add a WARN_ON_ONCE and return -EIO as a safety net. > >> > >> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> > >> --- > >> fs/ext4/file.c | 10 +++++++++- > >> 1 file changed, 9 insertions(+), 1 deletion(-) > >> > >> diff --git a/fs/ext4/file.c b/fs/ext4/file.c > >> index f9516121a036..af6ebd0ac0d6 100644 > >> --- a/fs/ext4/file.c > >> +++ b/fs/ext4/file.c > >> @@ -576,8 +576,16 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from) > >> iomap_ops = &ext4_iomap_overwrite_ops; > >> ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops, > >> dio_flags, NULL, 0); > >> - if (ret == -ENOTBLK) > >> + if (ret == -ENOTBLK) { > >> ret = 0; > >> + /* > >> + * iomap will never return -ENOTBLK if write fails for atomic > >> + * write. But let's just add a safety net. > >> + */ > >> + if (WARN_ON_ONCE(iocb->ki_flags & IOCB_ATOMIC)) > >> + ret = -EIO; > >> + } > > > > Why can't the iomap code return EIO in this case for IOCB_ATOMIC? > > That way we don't have to put this logic into every filesystem. > > This was origially intended as a safety net hence the WARN_ON_ONCE. > Later Darrick pointed out that we still might have an unconverted > condition in iomap which can return ENOTBLK for DIO atomic writes (page > cache invalidation). Yes. That's my point - iomap knows that it's an atomic write, it knows that invalidation failed, and it knows that there is no such thing as buffered atomic writes. So there is no possible fallback here, and it should be returning EIO in the page cache invalidation failure case and not ENOTBLK. > You pointed it right that it should be fixed in iomap. However do you > think filesystems can still keep this as safety net (maybe no need of > WARN_ON_ONCE). I don't see any point in adding "impossible to hit" checks into filesystems just in case some core infrastructure has a bug introduced.... -Dave.
Dave Chinner <david@fromorbit.com> writes: > On Mon, Oct 28, 2024 at 06:39:36AM +0530, Ritesh Harjani wrote: >> >> Hi Dave, >> >> Dave Chinner <david@fromorbit.com> writes: >> >> > On Fri, Oct 25, 2024 at 09:15:53AM +0530, Ritesh Harjani (IBM) wrote: >> >> iomap will not return -ENOTBLK in case of dio atomic writes. But let's >> >> also add a WARN_ON_ONCE and return -EIO as a safety net. >> >> >> >> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> >> >> --- >> >> fs/ext4/file.c | 10 +++++++++- >> >> 1 file changed, 9 insertions(+), 1 deletion(-) >> >> >> >> diff --git a/fs/ext4/file.c b/fs/ext4/file.c >> >> index f9516121a036..af6ebd0ac0d6 100644 >> >> --- a/fs/ext4/file.c >> >> +++ b/fs/ext4/file.c >> >> @@ -576,8 +576,16 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from) >> >> iomap_ops = &ext4_iomap_overwrite_ops; >> >> ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops, >> >> dio_flags, NULL, 0); >> >> - if (ret == -ENOTBLK) >> >> + if (ret == -ENOTBLK) { >> >> ret = 0; >> >> + /* >> >> + * iomap will never return -ENOTBLK if write fails for atomic >> >> + * write. But let's just add a safety net. >> >> + */ >> >> + if (WARN_ON_ONCE(iocb->ki_flags & IOCB_ATOMIC)) >> >> + ret = -EIO; >> >> + } >> > >> > Why can't the iomap code return EIO in this case for IOCB_ATOMIC? >> > That way we don't have to put this logic into every filesystem. >> >> This was origially intended as a safety net hence the WARN_ON_ONCE. >> Later Darrick pointed out that we still might have an unconverted >> condition in iomap which can return ENOTBLK for DIO atomic writes (page >> cache invalidation). > > Yes. That's my point - iomap knows that it's an atomic write, it > knows that invalidation failed, and it knows that there is no such > thing as buffered atomic writes. So there is no possible fallback > here, and it should be returning EIO in the page cache invalidation > failure case and not ENOTBLK. > Sorry my bad. I think I might have looked into a different version of the code earlier. So the current patch from John already takes care of the condition where if the page cache invalidation fails we don't return -ENOTBLK [1] [1]: https://lore.kernel.org/linux-xfs/Zxnp8bma2KrMDg5m@li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com/T/#m3664bbe00287d98caa690bb04f51d0ef164f52b3 >> You pointed it right that it should be fixed in iomap. However do you >> think filesystems can still keep this as safety net (maybe no need of >> WARN_ON_ONCE). > > I don't see any point in adding "impossible to hit" checks into > filesystems just in case some core infrastructure has a bug > introduced.... > So even though we have taken care of that case from page cache invalidation code, however it can still happen if iomap_iter() ever returns -ENOTBLK. e.g. blk_start_plug(&plug); while ((ret = iomap_iter(&iomi, ops)) > 0) { iomi.processed = iomap_dio_iter(&iomi, dio); /* * We can only poll for single bio I/Os. */ iocb->ki_flags &= ~IOCB_HIPRI; } blk_finish_plug(&plug); /* * We only report that we've read data up to i_size. * Revert iter to a state corresponding to that as some callers (such * as the splice code) rely on it. */ if (iov_iter_rw(iter) == READ && iomi.pos >= dio->i_size) iov_iter_revert(iter, iomi.pos - dio->i_size); if (ret == -EFAULT && dio->size && (dio_flags & IOMAP_DIO_PARTIAL)) { if (!(iocb->ki_flags & IOCB_NOWAIT)) wait_for_completion = true; ret = 0; } /* magic error code to fall back to buffered I/O */ if (ret == -ENOTBLK) { wait_for_completion = true; ret = 0; } Reviewing the code paths there is a lot of ping pongs between core iomap and FS. So it's not just core iomap what we are talking about here. So I am still inclined towards having that check in place as a safety net. However - let me take some time to review some of this code paths please. I wanted to send this email mainly to mention the point that page cache invalidation case is already taken care in iomap for atomic writes, so there is no bug there. I will get back on rest of the cases after I have looked more closely at it. > -Dave. > > -- > Dave Chinner > david@fromorbit.com Thanks for the review! -ritesh
Hi Dave, Dave Chinner <david@fromorbit.com> writes: > On Mon, Oct 28, 2024 at 06:39:36AM +0530, Ritesh Harjani wrote: >> >> Hi Dave, >> >> Dave Chinner <david@fromorbit.com> writes: >> >> > On Fri, Oct 25, 2024 at 09:15:53AM +0530, Ritesh Harjani (IBM) wrote: >> >> iomap will not return -ENOTBLK in case of dio atomic writes. But let's >> >> also add a WARN_ON_ONCE and return -EIO as a safety net. >> >> >> >> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> >> >> --- >> >> fs/ext4/file.c | 10 +++++++++- >> >> 1 file changed, 9 insertions(+), 1 deletion(-) >> >> >> >> diff --git a/fs/ext4/file.c b/fs/ext4/file.c >> >> index f9516121a036..af6ebd0ac0d6 100644 >> >> --- a/fs/ext4/file.c >> >> +++ b/fs/ext4/file.c >> >> @@ -576,8 +576,16 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from) >> >> iomap_ops = &ext4_iomap_overwrite_ops; >> >> ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops, >> >> dio_flags, NULL, 0); >> >> - if (ret == -ENOTBLK) >> >> + if (ret == -ENOTBLK) { >> >> ret = 0; >> >> + /* >> >> + * iomap will never return -ENOTBLK if write fails for atomic >> >> + * write. But let's just add a safety net. >> >> + */ >> >> + if (WARN_ON_ONCE(iocb->ki_flags & IOCB_ATOMIC)) >> >> + ret = -EIO; >> >> + } >> > >> > Why can't the iomap code return EIO in this case for IOCB_ATOMIC? >> > That way we don't have to put this logic into every filesystem. >> >> This was origially intended as a safety net hence the WARN_ON_ONCE. >> Later Darrick pointed out that we still might have an unconverted >> condition in iomap which can return ENOTBLK for DIO atomic writes (page >> cache invalidation). > > Yes. That's my point - iomap knows that it's an atomic write, it > knows that invalidation failed, and it knows that there is no such > thing as buffered atomic writes. So there is no possible fallback > here, and it should be returning EIO in the page cache invalidation > failure case and not ENOTBLK. > So the iomap DIO can return following as return values which can make some filesystems fallback to buffered-io (if they implement fallback logic) - (1) -ENOTBLK -> this is only returned for pagecache invalidation failure. (2) 0 or partial write size -> This can never happen for atomic writes (since we are only allowing for single fsblock as of now). Now looking at XFS, it never fallsback to buffered-io ever except just 2 cases - 1. When pagecache invalidation fails in iomap (can never happen for atomic writes) 2. On unaligned DIO writes to reflinked CoW (not possible for atomic writes) So it anyways should never happen that XFS ever fallback to buffered-io for DIO atomic writes. Even today it does not fallback to buffered-io for non-atomic short DIO writes. >> You pointed it right that it should be fixed in iomap. However do you >> think filesystems can still keep this as safety net (maybe no need of >> WARN_ON_ONCE). > > I don't see any point in adding "impossible to hit" checks into > filesystems just in case some core infrastructure has a bug > introduced.... Yes, that is true for XFS. EXT4 however can return -ENOTBLK for short writes, though it should not happen for current atomic write case where we are only allowing for 1 fsblock. However given there are several places in EXT4 which has got fallback logic, I would still like to go with a WARN_ON_ONCE where we are calling ext4 buffered-io handling for DIO fallback writes. This is to catch any bugs even in future when we move to multi-fsblock case (until we have atomic write support for buffered-io). Thanks! -ritesh
On Mon, Oct 28, 2024 at 11:44:00PM +0530, Ritesh Harjani wrote: > > Hi Dave, > > Dave Chinner <david@fromorbit.com> writes: > > > On Mon, Oct 28, 2024 at 06:39:36AM +0530, Ritesh Harjani wrote: > >> > >> Hi Dave, > >> > >> Dave Chinner <david@fromorbit.com> writes: > >> > >> > On Fri, Oct 25, 2024 at 09:15:53AM +0530, Ritesh Harjani (IBM) wrote: > >> >> iomap will not return -ENOTBLK in case of dio atomic writes. But let's > >> >> also add a WARN_ON_ONCE and return -EIO as a safety net. > >> >> > >> >> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> > >> >> --- > >> >> fs/ext4/file.c | 10 +++++++++- > >> >> 1 file changed, 9 insertions(+), 1 deletion(-) > >> >> > >> >> diff --git a/fs/ext4/file.c b/fs/ext4/file.c > >> >> index f9516121a036..af6ebd0ac0d6 100644 > >> >> --- a/fs/ext4/file.c > >> >> +++ b/fs/ext4/file.c > >> >> @@ -576,8 +576,16 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from) > >> >> iomap_ops = &ext4_iomap_overwrite_ops; > >> >> ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops, > >> >> dio_flags, NULL, 0); > >> >> - if (ret == -ENOTBLK) > >> >> + if (ret == -ENOTBLK) { > >> >> ret = 0; > >> >> + /* > >> >> + * iomap will never return -ENOTBLK if write fails for atomic > >> >> + * write. But let's just add a safety net. > >> >> + */ > >> >> + if (WARN_ON_ONCE(iocb->ki_flags & IOCB_ATOMIC)) > >> >> + ret = -EIO; > >> >> + } > >> > > >> > Why can't the iomap code return EIO in this case for IOCB_ATOMIC? > >> > That way we don't have to put this logic into every filesystem. > >> > >> This was origially intended as a safety net hence the WARN_ON_ONCE. > >> Later Darrick pointed out that we still might have an unconverted > >> condition in iomap which can return ENOTBLK for DIO atomic writes (page > >> cache invalidation). > > > > Yes. That's my point - iomap knows that it's an atomic write, it > > knows that invalidation failed, and it knows that there is no such > > thing as buffered atomic writes. So there is no possible fallback > > here, and it should be returning EIO in the page cache invalidation > > failure case and not ENOTBLK. > > > > So the iomap DIO can return following as return values which can make > some filesystems fallback to buffered-io (if they implement fallback > logic) - > (1) -ENOTBLK -> this is only returned for pagecache invalidation failure. > (2) 0 or partial write size -> This can never happen for atomic writes > (since we are only allowing for single fsblock as of now). Even when we allow multi-FSB atomic writes, the definition of atomic write is still "all or nothing". There is no scope for "short writes" when IOCB_ATOMIC is set - any condition that means we can't write the entire IO as a single bio, we need to abort and return EINVAL. Hence -ENOTBLK should never be returned by iomap for atomic DIO writes - we need to say -EINVAL if the write could not be issued atomically for whatever reason it may be so the application knows that atomic IO submission was not possible for that IO. > Now looking at XFS, it never fallsback to buffered-io ever except just 2 > cases - > 1. When pagecache invalidation fails in iomap (can never happen for > atomic writes) Why can't this happen for atomic DIO writes? It's the same failure cases as for normal DIO writes, isn't it? (i.e. race with mmap writes) My point is that if it's an atomic write, this failure should get turned into -EINVAL by the iomap code. We do not want a fallback to buffered IO when this situation happens for atomic IO. > 2. On unaligned DIO writes to reflinked CoW (not possible for atomic writes) This path doesn't ever go through iomap - XFS catches that case before it calls into iomap, so it's not relevant to how iomap behaves w.r.t atomic IO. > So it anyways should never happen that XFS ever fallback to buffered-io > for DIO atomic writes. Even today it does not fallback to buffered-io > for non-atomic short DIO writes. > > >> You pointed it right that it should be fixed in iomap. However do you > >> think filesystems can still keep this as safety net (maybe no need of > >> WARN_ON_ONCE). > > > > I don't see any point in adding "impossible to hit" checks into > > filesystems just in case some core infrastructure has a bug > > introduced.... > > Yes, that is true for XFS. EXT4 however can return -ENOTBLK for short > writes, though it should not happen for current atomic write case where > we are only allowing for 1 fsblock. Yes, but the -ENOTBLK error returned from ext4_iomap_end() if nothing was written does not get returned to ext4 from __iomap_dio_rw(). It is consumed by the iomap code: /* magic error code to fall back to buffered I/O */ if (ret == -ENOTBLK) { wait_for_completion = true; ret = 0; } This means that all the IO that was issued gets completed before returning to the caller and that's how the short write comes about. -ENOTBLK is *not returned to the caller* on a short write - iomap_dio_rw will return 0 (success). The caller then has to look at the iov_iter state to determine if the write was fully completed. This is exactly what the ext4 code currently does for all DIO writes, not just those that return -ENOTBLK. > I would still like to go with a WARN_ON_ONCE where we are calling ext4 > buffered-io handling for DIO fallback writes. This is to catch any bugs > even in future when we move to multi-fsblock case (until we have atomic > write support for buffered-io). Your choice, but please realise that it is not going to catch short atomic writes at all. -Dave.
Hi Dave, Dave Chinner <david@fromorbit.com> writes: > On Mon, Oct 28, 2024 at 11:44:00PM +0530, Ritesh Harjani wrote: >> >> Hi Dave, >> >> Dave Chinner <david@fromorbit.com> writes: >> >> > On Mon, Oct 28, 2024 at 06:39:36AM +0530, Ritesh Harjani wrote: >> >> >> >> Hi Dave, >> >> >> >> Dave Chinner <david@fromorbit.com> writes: >> >> >> >> > On Fri, Oct 25, 2024 at 09:15:53AM +0530, Ritesh Harjani (IBM) wrote: >> >> >> iomap will not return -ENOTBLK in case of dio atomic writes. But let's >> >> >> also add a WARN_ON_ONCE and return -EIO as a safety net. >> >> >> >> >> >> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> >> >> >> --- >> >> >> fs/ext4/file.c | 10 +++++++++- >> >> >> 1 file changed, 9 insertions(+), 1 deletion(-) >> >> >> >> >> >> diff --git a/fs/ext4/file.c b/fs/ext4/file.c >> >> >> index f9516121a036..af6ebd0ac0d6 100644 >> >> >> --- a/fs/ext4/file.c >> >> >> +++ b/fs/ext4/file.c >> >> >> @@ -576,8 +576,16 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from) >> >> >> iomap_ops = &ext4_iomap_overwrite_ops; >> >> >> ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops, >> >> >> dio_flags, NULL, 0); >> >> >> - if (ret == -ENOTBLK) >> >> >> + if (ret == -ENOTBLK) { >> >> >> ret = 0; >> >> >> + /* >> >> >> + * iomap will never return -ENOTBLK if write fails for atomic >> >> >> + * write. But let's just add a safety net. >> >> >> + */ >> >> >> + if (WARN_ON_ONCE(iocb->ki_flags & IOCB_ATOMIC)) >> >> >> + ret = -EIO; >> >> >> + } >> >> > >> >> > Why can't the iomap code return EIO in this case for IOCB_ATOMIC? >> >> > That way we don't have to put this logic into every filesystem. >> >> >> >> This was origially intended as a safety net hence the WARN_ON_ONCE. >> >> Later Darrick pointed out that we still might have an unconverted >> >> condition in iomap which can return ENOTBLK for DIO atomic writes (page >> >> cache invalidation). >> > >> > Yes. That's my point - iomap knows that it's an atomic write, it >> > knows that invalidation failed, and it knows that there is no such >> > thing as buffered atomic writes. So there is no possible fallback >> > here, and it should be returning EIO in the page cache invalidation >> > failure case and not ENOTBLK. >> > >> >> So the iomap DIO can return following as return values which can make >> some filesystems fallback to buffered-io (if they implement fallback >> logic) - >> (1) -ENOTBLK -> this is only returned for pagecache invalidation failure. >> (2) 0 or partial write size -> This can never happen for atomic writes >> (since we are only allowing for single fsblock as of now). > > Even when we allow multi-FSB atomic writes, the definition of > atomic write is still "all or nothing". There is no scope for "short > writes" when IOCB_ATOMIC is set - any condition that means we can't > write the entire IO as a single bio, we need to abort and return > EINVAL. yes. As long as it is a single bio, I agree even the short write condition should not hit based on the current iomap code. > > Hence -ENOTBLK should never be returned by iomap for atomic DIO > writes - we need to say -EINVAL if the write could not be issued > atomically for whatever reason it may be so the application knows > that atomic IO submission was not possible for that IO. > Agreed Dave. That is what iomap is doing today for atomic write code. (Except maybe one minor difference where it returns -EAGAIN in case of page cache invalidation assuming the failure maybe transient and the request could be tried again). >> Now looking at XFS, it never fallsback to buffered-io ever except just 2 >> cases - >> 1. When pagecache invalidation fails in iomap (can never happen for >> atomic writes) > > Why can't this happen for atomic DIO writes? It's the same failure > cases as for normal DIO writes, isn't it? (i.e. race with mmap > writes) > I meant after the patch which adds atomic write support in iomap code from John, make sure we don't return -ENOTBLK in case of atomic write request. > My point is that if it's an atomic write, this failure should get > turned into -EINVAL by the iomap code. We do not want a fallback to > buffered IO when this situation happens for atomic IO. > >> 2. On unaligned DIO writes to reflinked CoW (not possible for atomic writes) > > This path doesn't ever go through iomap - XFS catches that case > before it calls into iomap, so it's not relevant to how iomap > behaves w.r.t atomic IO. > Right. >> So it anyways should never happen that XFS ever fallback to buffered-io >> for DIO atomic writes. Even today it does not fallback to buffered-io >> for non-atomic short DIO writes. >> >> >> You pointed it right that it should be fixed in iomap. However do you >> >> think filesystems can still keep this as safety net (maybe no need of >> >> WARN_ON_ONCE). >> > >> > I don't see any point in adding "impossible to hit" checks into >> > filesystems just in case some core infrastructure has a bug >> > introduced.... >> >> Yes, that is true for XFS. EXT4 however can return -ENOTBLK for short >> writes, though it should not happen for current atomic write case where >> we are only allowing for 1 fsblock. > > Yes, but the -ENOTBLK error returned from ext4_iomap_end() if > nothing was written does not get returned to ext4 from > __iomap_dio_rw(). It is consumed by the iomap code: > > /* magic error code to fall back to buffered I/O */ > if (ret == -ENOTBLK) { > wait_for_completion = true; > ret = 0; > } > > This means that all the IO that was issued gets completed before > returning to the caller and that's how the short write comes about. > > -ENOTBLK is *not returned to the caller* on a short write - yes. That's my understanding too of the short write case handling in iomap. > iomap_dio_rw will return 0 (success). The caller then has to look > at the iov_iter state to determine if the write was fully completed. > This is exactly what the ext4 code currently does for all DIO > writes, not just those that return -ENOTBLK. > yes. Agreed. >> I would still like to go with a WARN_ON_ONCE where we are calling ext4 >> buffered-io handling for DIO fallback writes. This is to catch any bugs >> even in future when we move to multi-fsblock case (until we have atomic >> write support for buffered-io). > > Your choice, but please realise that it is not going to catch short > atomic writes at all. > Thanks Dave. Yes, I would like to maybe keep a WARN_ON_ONCE since ext4 has a fallback handling logic where a short DIO or -ENOTBLK case could be later handled by buffered-io logic (though I agree iomap won't let it happen for atomic write case). But a WARN_ON_ONCE just before buffered-io fallback handling logic in ext4 DIO path would be my preferred choice only to make sure we could catch any unwanted bugs in future too. So I was thinking of this change instead - diff --git a/fs/ext4/file.c b/fs/ext4/file.c index 8116bd78910b..61787a37e9d4 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -599,6 +599,13 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from) ssize_t err; loff_t endbyte; + /* + * There is no support for atomic writes on buffered-io yet, + * we should never fallback to buffered-io for DIO atomic + * writes. + */ + WARN_ON_ONCE(iocb->ki_flags & IOCB_ATOMIC); + offset = iocb->ki_pos; err = ext4_buffered_write_iter(iocb, from); if (err < 0) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index fcdee27b9aa2..26b3c84d7f64 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3449,12 +3449,16 @@ static int ext4_iomap_end(struct inode *inode, loff_t offset, loff_t length, { /* * Check to see whether an error occurred while writing out the data to - * the allocated blocks. If so, return the magic error code so that we - * fallback to buffered I/O and attempt to complete the remainder of - * the I/O. Any blocks that may have been allocated in preparation for - * the direct I/O will be reused during buffered I/O. + * the allocated blocks. If so, return the magic error code for + * non-atomic write so that we fallback to buffered I/O and attempt to + * complete the remainder of the I/O. + * For atomic writes we will simply fail the I/O request if we coudn't + * write anything. For non-atomic writes, any blocks that may have been + * allocated in preparation for the direct I/O will be reused during + * buffered I/O. */ - if (flags & (IOMAP_WRITE | IOMAP_DIRECT) && written == 0) + if (!(flags & IOMAP_ATOMIC) && (flags & (IOMAP_WRITE | IOMAP_DIRECT)) + && written == 0) return -ENOTBLK; return 0; > -Dave. > -- > Dave Chinner > david@fromorbit.com Thanks a lot for the review! -ritesh
diff --git a/fs/ext4/file.c b/fs/ext4/file.c index f9516121a036..af6ebd0ac0d6 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -576,8 +576,16 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from) iomap_ops = &ext4_iomap_overwrite_ops; ret = iomap_dio_rw(iocb, from, iomap_ops, &ext4_dio_write_ops, dio_flags, NULL, 0); - if (ret == -ENOTBLK) + if (ret == -ENOTBLK) { ret = 0; + /* + * iomap will never return -ENOTBLK if write fails for atomic + * write. But let's just add a safety net. + */ + if (WARN_ON_ONCE(iocb->ki_flags & IOCB_ATOMIC)) + ret = -EIO; + } + if (extend) { /* * We always perform extending DIO write synchronously so by
iomap will not return -ENOTBLK in case of dio atomic writes. But let's also add a WARN_ON_ONCE and return -EIO as a safety net. Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> --- fs/ext4/file.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)