Message ID | 20250227180813.1553404-10-john.g.garry@oracle.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | large atomic writes for xfs with CoW | expand |
On Thu, Feb 27, 2025 at 06:08:10PM +0000, John Garry wrote: > Add xfs_file_dio_write_atomic() for dedicated handling of atomic writes. > > In case of -EAGAIN being returned from iomap_dio_rw(), reissue the write > in CoW-based atomic write mode. > > For CoW-based mode, ensure that we have no outstanding IOs which we > may trample on. > > Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> > Signed-off-by: John Garry <john.g.garry@oracle.com> > --- > fs/xfs/xfs_file.c | 42 ++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 42 insertions(+) > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c > index 258c82cbce12..76ea59c638c3 100644 > --- a/fs/xfs/xfs_file.c > +++ b/fs/xfs/xfs_file.c > @@ -619,6 +619,46 @@ xfs_file_dio_write_aligned( > return ret; > } > > +static noinline ssize_t > +xfs_file_dio_write_atomic( > + struct xfs_inode *ip, > + struct kiocb *iocb, > + struct iov_iter *from) > +{ > + unsigned int iolock = XFS_IOLOCK_SHARED; > + unsigned int dio_flags = 0; > + ssize_t ret; > + > +retry: > + ret = xfs_ilock_iocb_for_write(iocb, &iolock); > + if (ret) > + return ret; > + > + ret = xfs_file_write_checks(iocb, from, &iolock); > + if (ret) > + goto out_unlock; > + > + if (dio_flags & IOMAP_DIO_FORCE_WAIT) > + inode_dio_wait(VFS_I(ip)); > + > + trace_xfs_file_direct_write(iocb, from); > + ret = iomap_dio_rw(iocb, from, &xfs_atomic_write_iomap_ops, > + &xfs_dio_write_ops, dio_flags, NULL, 0); > + > + if (ret == -EAGAIN && !(iocb->ki_flags & IOCB_NOWAIT) && > + !(dio_flags & IOMAP_DIO_ATOMIC_SW)) { > + xfs_iunlock(ip, iolock); > + dio_flags = IOMAP_DIO_ATOMIC_SW | IOMAP_DIO_FORCE_WAIT; One last little nit here: if the filesystem doesn't have reflink, you can't use copy on write as a fallback. /* * The atomic write fallback uses out of place writes * implemented with the COW code, so we must fail the * atomic write if that is not supported. */ if (!xfs_has_reflink(ip->i_mount)) return -EOPNOTSUPP; dio_flags = IOMAP_DIO_ATOMIC_SW | IOMAP_DIO_FORCE_WAIT; You can retain my RVB if you add that. --D > + iolock = XFS_IOLOCK_EXCL; > + goto retry; > + } > + > +out_unlock: > + if (iolock) > + xfs_iunlock(ip, iolock); > + return ret; > +} > + > /* > * Handle block unaligned direct I/O writes > * > @@ -723,6 +763,8 @@ xfs_file_dio_write( > return -EINVAL; > if ((iocb->ki_pos | count) & ip->i_mount->m_blockmask) > return xfs_file_dio_write_unaligned(ip, iocb, from); > + if (iocb->ki_flags & IOCB_ATOMIC) > + return xfs_file_dio_write_atomic(ip, iocb, from); > return xfs_file_dio_write_aligned(ip, iocb, from); > } > > -- > 2.31.1 >
On 28/02/2025 01:19, Darrick J. Wong wrote: >> + if (ret == -EAGAIN && !(iocb->ki_flags & IOCB_NOWAIT) && >> + !(dio_flags & IOMAP_DIO_ATOMIC_SW)) { >> + xfs_iunlock(ip, iolock); >> + dio_flags = IOMAP_DIO_ATOMIC_SW | IOMAP_DIO_FORCE_WAIT; > One last little nit here: if the filesystem doesn't have reflink, you > can't use copy on write as a fallback. > > /* > * The atomic write fallback uses out of place writes > * implemented with the COW code, so we must fail the > * atomic write if that is not supported. > */ > if (!xfs_has_reflink(ip->i_mount)) > return -EOPNOTSUPP; > dio_flags = IOMAP_DIO_ATOMIC_SW | IOMAP_DIO_FORCE_WAIT; > Currently the awu max is limited to 1x FS block if no reflink, and then we check the write length against awu max in xfs_file_write_iter() for IOCB_ATOMIC. And the xfs iomap would not request a SW-based atomic write for 1x FS block. So in a around-about way we are checking it. So let me know if you would still like that additional check - it seems sensible to add it. Cheers, John
On Fri, Feb 28, 2025 at 07:45:59AM +0000, John Garry wrote: > On 28/02/2025 01:19, Darrick J. Wong wrote: > > > + if (ret == -EAGAIN && !(iocb->ki_flags & IOCB_NOWAIT) && > > > + !(dio_flags & IOMAP_DIO_ATOMIC_SW)) { > > > + xfs_iunlock(ip, iolock); > > > + dio_flags = IOMAP_DIO_ATOMIC_SW | IOMAP_DIO_FORCE_WAIT; > > One last little nit here: if the filesystem doesn't have reflink, you > > can't use copy on write as a fallback. > > > > /* > > * The atomic write fallback uses out of place writes > > * implemented with the COW code, so we must fail the > > * atomic write if that is not supported. > > */ > > if (!xfs_has_reflink(ip->i_mount)) > > return -EOPNOTSUPP; > > dio_flags = IOMAP_DIO_ATOMIC_SW | IOMAP_DIO_FORCE_WAIT; > > > > Currently the awu max is limited to 1x FS block if no reflink, and then we > check the write length against awu max in xfs_file_write_iter() for > IOCB_ATOMIC. And the xfs iomap would not request a SW-based atomic write for > 1x FS block. So in a around-about way we are checking it. > > So let me know if you would still like that additional check - it seems > sensible to add it. Yes, please. The more guardrails the better, particularly when someone gets around to enabling software-only RWF_ATOMIC. --D > Cheers, > John > >
On 28/02/2025 15:39, Darrick J. Wong wrote: >>> One last little nit here: if the filesystem doesn't have reflink, you >>> can't use copy on write as a fallback. >>> >>> /* >>> * The atomic write fallback uses out of place writes >>> * implemented with the COW code, so we must fail the >>> * atomic write if that is not supported. >>> */ >>> if (!xfs_has_reflink(ip->i_mount)) >>> return -EOPNOTSUPP; >>> dio_flags = IOMAP_DIO_ATOMIC_SW | IOMAP_DIO_FORCE_WAIT; >>> >> Currently the awu max is limited to 1x FS block if no reflink, and then we >> check the write length against awu max in xfs_file_write_iter() for >> IOCB_ATOMIC. And the xfs iomap would not request a SW-based atomic write for >> 1x FS block. So in a around-about way we are checking it. >> >> So let me know if you would still like that additional check - it seems >> sensible to add it. > Yes, please. The more guardrails the better, particularly when someone > gets around to enabling software-only RWF_ATOMIC. ok, but I think that adding it at the start of xfs_atomic_write_sw_iomap_begin() is a better place (to add it). It seems a bit neater (than adding it here) with the retry handling and locking/unlocking, and would save adding it in another possible future callsite for xfs_atomic_write_sw_iomap_begin(). Cheers, John
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 258c82cbce12..76ea59c638c3 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -619,6 +619,46 @@ xfs_file_dio_write_aligned( return ret; } +static noinline ssize_t +xfs_file_dio_write_atomic( + struct xfs_inode *ip, + struct kiocb *iocb, + struct iov_iter *from) +{ + unsigned int iolock = XFS_IOLOCK_SHARED; + unsigned int dio_flags = 0; + ssize_t ret; + +retry: + ret = xfs_ilock_iocb_for_write(iocb, &iolock); + if (ret) + return ret; + + ret = xfs_file_write_checks(iocb, from, &iolock); + if (ret) + goto out_unlock; + + if (dio_flags & IOMAP_DIO_FORCE_WAIT) + inode_dio_wait(VFS_I(ip)); + + trace_xfs_file_direct_write(iocb, from); + ret = iomap_dio_rw(iocb, from, &xfs_atomic_write_iomap_ops, + &xfs_dio_write_ops, dio_flags, NULL, 0); + + if (ret == -EAGAIN && !(iocb->ki_flags & IOCB_NOWAIT) && + !(dio_flags & IOMAP_DIO_ATOMIC_SW)) { + xfs_iunlock(ip, iolock); + dio_flags = IOMAP_DIO_ATOMIC_SW | IOMAP_DIO_FORCE_WAIT; + iolock = XFS_IOLOCK_EXCL; + goto retry; + } + +out_unlock: + if (iolock) + xfs_iunlock(ip, iolock); + return ret; +} + /* * Handle block unaligned direct I/O writes * @@ -723,6 +763,8 @@ xfs_file_dio_write( return -EINVAL; if ((iocb->ki_pos | count) & ip->i_mount->m_blockmask) return xfs_file_dio_write_unaligned(ip, iocb, from); + if (iocb->ki_flags & IOCB_ATOMIC) + return xfs_file_dio_write_atomic(ip, iocb, from); return xfs_file_dio_write_aligned(ip, iocb, from); }