Message ID | 20250227180813.1553404-6-john.g.garry@oracle.com (mailing list archive) |
---|---|
State | New |
Series | large atomic writes for xfs with CoW |
On Thu, Feb 27, 2025 at 06:08:06PM +0000, John Garry wrote:
> Currently atomic write support requires dedicated HW support. This imposes
> a restriction on the filesystem that disk blocks need to be aligned and
> contiguously mapped to FS blocks to issue atomic writes.
>
> XFS has no method to guarantee FS block alignment for regular,
> non-RT files. As such, atomic writes are currently limited to 1x FS block
> there.
>
> To deal with the scenario that we are issuing an atomic write over
> misaligned or discontiguous data blocks - and raise the atomic write size
> limit - support a SW-based software emulated atomic write mode. For XFS,
> this SW-based atomic writes would use CoW support to issue emulated untorn
> writes.
>
> It is the responsibility of the FS to detect discontiguous atomic writes
> and switch to IOMAP_DIO_ATOMIC_SW mode and retry the write. Indeed,
> SW-based atomic writes could be used always when the mounted bdev does
> not support HW offload, but this strategy is not initially expected to be
> used.
>
> Signed-off-by: John Garry <john.g.garry@oracle.com>

Looks good now, thank you.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>

--D

> ---
>  Documentation/filesystems/iomap/operations.rst | 16 ++++++++++++++--
>  fs/iomap/direct-io.c                           |  4 +++-
>  include/linux/iomap.h                          |  6 ++++++
>  3 files changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
> index 82bfe0e8c08e..b9757fe46641 100644
> --- a/Documentation/filesystems/iomap/operations.rst
> +++ b/Documentation/filesystems/iomap/operations.rst
> @@ -525,8 +525,20 @@ IOMAP_WRITE`` with any combination of the following enhancements:
>     conversion or copy on write), all updates for the entire file range
>     must be committed atomically as well.
>     Only one space mapping is allowed per untorn write.
> -   Untorn writes must be aligned to, and must not be longer than, a
> -   single file block.
> +   Untorn writes may be longer than a single file block. In all cases,
> +   the mapping start disk block must have at least the same alignment as
> +   the write offset.
> +
> + * ``IOMAP_ATOMIC_SW``: This write is being issued with torn-write
> +   protection via a software mechanism provided by the filesystem.
> +   All the disk block alignment and single bio restrictions which apply
> +   to IOMAP_ATOMIC_HW do not apply here.
> +   SW-based untorn writes would typically be used as a fallback when
> +   HW-based untorn writes may not be issued, e.g. the range of the write
> +   covers multiple extents, meaning that it is not possible to issue
> +   a single bio.
> +   All filesystem metadata updates for the entire file range must be
> +   committed atomically as well.
>
>  Callers commonly hold ``i_rwsem`` in shared or exclusive mode before
>  calling this function.
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index f87c4277e738..575bb69db00e 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -644,7 +644,9 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>  			iomi.flags |= IOMAP_OVERWRITE_ONLY;
>  		}
>
> -		if (iocb->ki_flags & IOCB_ATOMIC)
> +		if (dio_flags & IOMAP_DIO_ATOMIC_SW)
> +			iomi.flags |= IOMAP_ATOMIC_SW;
> +		else if (iocb->ki_flags & IOCB_ATOMIC)
>  			iomi.flags |= IOMAP_ATOMIC_HW;
>
>  		/* for data sync or sync, we need sync completion processing */
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index e7aa05503763..4fa716241c46 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -183,6 +183,7 @@ struct iomap_folio_ops {
>  #define IOMAP_DAX		0
>  #endif /* CONFIG_FS_DAX */
>  #define IOMAP_ATOMIC_HW		(1 << 9) /* HW-based torn-write protection */
> +#define IOMAP_ATOMIC_SW		(1 << 10)/* SW-based torn-write protection */
>
>  struct iomap_ops {
>  	/*
> @@ -434,6 +435,11 @@ struct iomap_dio_ops {
>   */
>  #define IOMAP_DIO_PARTIAL		(1 << 2)
>
> +/*
> + * Use software-based torn-write protection.
> + */
> +#define IOMAP_DIO_ATOMIC_SW		(1 << 3)
> +
>  ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>  		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
>  		unsigned int dio_flags, void *private, size_t done_before);
> --
> 2.31.1
>
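For readers following the fs/iomap/direct-io.c hunk, the precedence it sets up can be modelled outside the kernel. The following is only a user-space sketch: the three IOMAP_* values are copied from the include/linux/iomap.h hunk, while SKETCH_IOCB_ATOMIC and sketch_atomic_iomap_flags() are hypothetical names invented here, and the value given to SKETCH_IOCB_ATOMIC is an assumption rather than the kernel's definition.

```c
/* Values copied from the include/linux/iomap.h hunk in this patch. */
#define IOMAP_ATOMIC_HW		(1 << 9)	/* HW-based torn-write protection */
#define IOMAP_ATOMIC_SW		(1 << 10)	/* SW-based torn-write protection */
#define IOMAP_DIO_ATOMIC_SW	(1 << 3)	/* dio_flags: request SW mode */

/* Hypothetical stand-in for IOCB_ATOMIC; the value is an assumption. */
#define SKETCH_IOCB_ATOMIC	(1 << 6)

/*
 * Hypothetical helper mirroring the if / else-if added to __iomap_dio_rw():
 * the SW dio flag is tested first, so it overrides the HW path even when
 * the iocb itself is marked atomic. Returns only the atomic-related flags.
 */
unsigned int sketch_atomic_iomap_flags(int ki_flags, unsigned int dio_flags)
{
	unsigned int flags = 0;

	if (dio_flags & IOMAP_DIO_ATOMIC_SW)
		flags |= IOMAP_ATOMIC_SW;
	else if (ki_flags & SKETCH_IOCB_ATOMIC)
		flags |= IOMAP_ATOMIC_HW;

	return flags;
}
```

With this model, sketch_atomic_iomap_flags(SKETCH_IOCB_ATOMIC, IOMAP_DIO_ATOMIC_SW) returns IOMAP_ATOMIC_SW alone, so the two flags are never set together.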
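The commit message leaves the detect-and-retry step to the filesystem. One way that flow might look is sketched below as a user-space model. Everything prefixed sketch_ is hypothetical, the use of -EAGAIN as the "cannot do this in HW mode" signal is an assumption of this sketch and not something the patch specifies, and the model checks only contiguity (one extent covering the write), not disk block alignment.

```c
#include <errno.h>
#include <stddef.h>

#define IOMAP_DIO_ATOMIC_SW	(1 << 3)	/* from the include/linux/iomap.h hunk */

/* Hypothetical model of one file extent covering [start, start + len). */
struct sketch_extent {
	unsigned long long start;
	unsigned long long len;
};

/*
 * Hypothetical stand-in for the filesystem's mapping step. In HW mode the
 * write must fit inside a single extent; in SW mode any range is accepted,
 * since the filesystem's own mechanism (CoW for XFS) provides the guarantee.
 */
int sketch_map_atomic_write(const struct sketch_extent *ext, size_t nr,
			    unsigned long long pos, unsigned long long len,
			    unsigned int dio_flags)
{
	size_t i;

	if (dio_flags & IOMAP_DIO_ATOMIC_SW)
		return 0;

	for (i = 0; i < nr; i++) {
		if (pos >= ext[i].start &&
		    pos + len <= ext[i].start + ext[i].len)
			return 0;
	}
	return -EAGAIN;	/* assumed signal: not possible in HW mode */
}

/*
 * Try HW mode first, then retry once in SW mode if the mapping step
 * refuses. Returns the dio_flags that were finally used, or a negative
 * errno on failure.
 */
int sketch_atomic_write(const struct sketch_extent *ext, size_t nr,
			unsigned long long pos, unsigned long long len)
{
	unsigned int dio_flags = 0;
	int ret;

	ret = sketch_map_atomic_write(ext, nr, pos, len, dio_flags);
	if (ret == -EAGAIN) {
		dio_flags |= IOMAP_DIO_ATOMIC_SW;
		ret = sketch_map_atomic_write(ext, nr, pos, len, dio_flags);
	}
	return ret ? ret : (int)dio_flags;
}
```

Given two adjacent 4096-byte extents, a write of 4096 bytes at offset 0 stays in HW mode, while a write of 4096 bytes at offset 2048 straddles both extents and is retried with IOMAP_DIO_ATOMIC_SW set.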