Message ID | 20220722071228.146690-2-ebiggers@kernel.org (mailing list archive) |
---|---|
State | Superseded |
Series | make statx() return DIO alignment information |
On Fri, Jul 22, 2022 at 12:12:20AM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
>
> Traditionally, the conditions for when DIO (direct I/O) is supported
> were fairly simple.  For both block devices and regular files, DIO had
> to be aligned to the logical block size of the block device.
>
> However, due to filesystem features that have been added over time (e.g.
> multi-device support, data journalling, inline data, encryption, verity,
> compression, checkpoint disabling, log-structured mode), the conditions
> for when DIO is allowed on a regular file have gotten increasingly
> complex.  Whether a particular regular file supports DIO, and with what
> alignment, can depend on various file attributes and filesystem mount
> options, as well as which block device(s) the file's data is located on.
>
> Moreover, the general rule of DIO needing to be aligned to the block
> device's logical block size is being relaxed to allow user buffers (but
> not file offsets) aligned to the DMA alignment instead
> (https://lore.kernel.org/linux-block/20220610195830.3574005-1-kbusch@fb.com/T/#u).
>
> XFS has an ioctl XFS_IOC_DIOINFO that exposes DIO alignment information.
> Uplifting this to the VFS is one possibility.  However, as discussed
> (https://lore.kernel.org/linux-fsdevel/20220120071215.123274-1-ebiggers@kernel.org/T/#u),
> this ioctl is rarely used and not known to be used outside of
> XFS-specific code.  It was also never intended to indicate when a file
> doesn't support DIO at all, nor was it intended for block devices.
>
> Therefore, let's expose this information via statx().  Add the
> STATX_DIOALIGN flag and two new statx fields associated with it:
>
> * stx_dio_mem_align: the alignment (in bytes) required for user memory
>   buffers for DIO, or 0 if DIO is not supported on the file.
>
> * stx_dio_offset_align: the alignment (in bytes) required for file
>   offsets and I/O segment lengths for DIO, or 0 if DIO is not supported
>   on the file.  This will only be nonzero if stx_dio_mem_align is
>   nonzero, and vice versa.
>
> Note that as with other statx() extensions, if STATX_DIOALIGN isn't set
> in the returned statx struct, then these new fields won't be filled in.
> This will happen if the file is neither a regular file nor a block
> device, or if the file is a regular file and the filesystem doesn't
> support STATX_DIOALIGN.  It might also happen if the caller didn't
> include STATX_DIOALIGN in the request mask, since statx() isn't required
> to return unrequested information.
>
> This commit only adds the VFS-level plumbing for STATX_DIOALIGN.  For
> regular files, individual filesystems will still need to add code to
> support it.  For block devices, a separate commit will wire it up too.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Eric Biggers <ebiggers@google.com>

Looks good to me,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/stat.c                 | 2 ++
>  include/linux/stat.h      | 2 ++
>  include/uapi/linux/stat.h | 4 +++-
>  3 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/fs/stat.c b/fs/stat.c
> index 9ced8860e0f35d..a7930d74448304 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -611,6 +611,8 @@ cp_statx(const struct kstat *stat, struct statx __user *buffer)
>  	tmp.stx_dev_major = MAJOR(stat->dev);
>  	tmp.stx_dev_minor = MINOR(stat->dev);
>  	tmp.stx_mnt_id = stat->mnt_id;
> +	tmp.stx_dio_mem_align = stat->dio_mem_align;
> +	tmp.stx_dio_offset_align = stat->dio_offset_align;
>
>  	return copy_to_user(buffer, &tmp, sizeof(tmp)) ? -EFAULT : 0;
>  }
> diff --git a/include/linux/stat.h b/include/linux/stat.h
> index 7df06931f25d85..ff277ced50e9fd 100644
> --- a/include/linux/stat.h
> +++ b/include/linux/stat.h
> @@ -50,6 +50,8 @@ struct kstat {
>  	struct timespec64 btime;	/* File creation time */
>  	u64 blocks;
>  	u64 mnt_id;
> +	u32 dio_mem_align;
> +	u32 dio_offset_align;
>  };
>
>  #endif
> diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
> index 1500a0f58041ae..7cab2c65d3d7fc 100644
> --- a/include/uapi/linux/stat.h
> +++ b/include/uapi/linux/stat.h
> @@ -124,7 +124,8 @@ struct statx {
>  	__u32 stx_dev_minor;
>  	/* 0x90 */
>  	__u64 stx_mnt_id;
> -	__u64 __spare2;
> +	__u32 stx_dio_mem_align;	/* Memory buffer alignment for direct I/O */
> +	__u32 stx_dio_offset_align;	/* File offset alignment for direct I/O */
>  	/* 0xa0 */
>  	__u64 __spare3[12];	/* Spare space for future expansion */
>  	/* 0x100 */
> @@ -152,6 +153,7 @@ struct statx {
>  #define STATX_BASIC_STATS	0x000007ffU	/* The stuff in the normal stat struct */
>  #define STATX_BTIME		0x00000800U	/* Want/got stx_btime */
>  #define STATX_MNT_ID		0x00001000U	/* Got stx_mnt_id */
> +#define STATX_DIOALIGN		0x00002000U	/* Want/got direct I/O alignment info */
>
>  #define STATX__RESERVED		0x80000000U	/* Reserved for future struct statx expansion */
>
> --
> 2.37.0
>
Eric,

> Therefore, let's expose this information via statx().  Add the
> STATX_DIOALIGN flag and two new statx fields associated with it:
>
> * stx_dio_mem_align: the alignment (in bytes) required for user memory
>   buffers for DIO, or 0 if DIO is not supported on the file.
>
> * stx_dio_offset_align: the alignment (in bytes) required for file
>   offsets and I/O segment lengths for DIO, or 0 if DIO is not supported
>   on the file.  This will only be nonzero if stx_dio_mem_align is
>   nonzero, and vice versa.

Nice to finally have a generic interface for this!

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
diff --git a/fs/stat.c b/fs/stat.c
index 9ced8860e0f35d..a7930d74448304 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -611,6 +611,8 @@ cp_statx(const struct kstat *stat, struct statx __user *buffer)
 	tmp.stx_dev_major = MAJOR(stat->dev);
 	tmp.stx_dev_minor = MINOR(stat->dev);
 	tmp.stx_mnt_id = stat->mnt_id;
+	tmp.stx_dio_mem_align = stat->dio_mem_align;
+	tmp.stx_dio_offset_align = stat->dio_offset_align;
 
 	return copy_to_user(buffer, &tmp, sizeof(tmp)) ? -EFAULT : 0;
 }
diff --git a/include/linux/stat.h b/include/linux/stat.h
index 7df06931f25d85..ff277ced50e9fd 100644
--- a/include/linux/stat.h
+++ b/include/linux/stat.h
@@ -50,6 +50,8 @@ struct kstat {
 	struct timespec64 btime;	/* File creation time */
 	u64 blocks;
 	u64 mnt_id;
+	u32 dio_mem_align;
+	u32 dio_offset_align;
 };
 
 #endif
diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
index 1500a0f58041ae..7cab2c65d3d7fc 100644
--- a/include/uapi/linux/stat.h
+++ b/include/uapi/linux/stat.h
@@ -124,7 +124,8 @@ struct statx {
 	__u32 stx_dev_minor;
 	/* 0x90 */
 	__u64 stx_mnt_id;
-	__u64 __spare2;
+	__u32 stx_dio_mem_align;	/* Memory buffer alignment for direct I/O */
+	__u32 stx_dio_offset_align;	/* File offset alignment for direct I/O */
 	/* 0xa0 */
 	__u64 __spare3[12];	/* Spare space for future expansion */
 	/* 0x100 */
@@ -152,6 +153,7 @@ struct statx {
 #define STATX_BASIC_STATS	0x000007ffU	/* The stuff in the normal stat struct */
 #define STATX_BTIME		0x00000800U	/* Want/got stx_btime */
 #define STATX_MNT_ID		0x00001000U	/* Got stx_mnt_id */
+#define STATX_DIOALIGN		0x00002000U	/* Want/got direct I/O alignment info */
 
 #define STATX__RESERVED		0x80000000U	/* Reserved for future struct statx expansion */
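
For illustration only, here is a rough userspace sketch of how a caller might consume the two new fields described in the commit message. It is not part of the patch. The names `struct dio_align`, `query_dio_align()` and `dio_request_ok()` are hypothetical, and the sketch assumes a libc that provides the statx() wrapper and headers that define STATX_DIOALIGN; where they do not, the query simply fails with ENOSYS.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>      /* AT_FDCWD */
#include <stdbool.h>
#include <stdint.h>
#include <sys/stat.h>   /* statx(), struct statx (needs _GNU_SOURCE) */

/* Hypothetical helper type for this sketch; not defined by the patch. */
struct dio_align {
	uint32_t mem_align;	/* copy of stx_dio_mem_align; 0 = no DIO */
	uint32_t offset_align;	/* copy of stx_dio_offset_align; 0 = no DIO */
};

/*
 * Ask for DIO alignment information on 'path'.
 * Returns 0 and fills *out on success, or -1 with errno set.
 * errno is ENOTSUP when the call worked but STATX_DIOALIGN was not
 * set in the returned stx_mask, i.e. the fields were not filled in.
 */
int query_dio_align(const char *path, struct dio_align *out)
{
#ifdef STATX_DIOALIGN
	struct statx stx;

	if (statx(AT_FDCWD, path, 0, STATX_DIOALIGN, &stx) != 0)
		return -1;
	if (!(stx.stx_mask & STATX_DIOALIGN)) {
		errno = ENOTSUP;
		return -1;
	}
	out->mem_align = stx.stx_dio_mem_align;
	out->offset_align = stx.stx_dio_offset_align;
	return 0;
#else
	/* Assumption: headers too old to know about STATX_DIOALIGN. */
	(void)path;
	(void)out;
	errno = ENOSYS;
	return -1;
#endif
}

/*
 * Check one I/O segment against the reported alignments: the buffer
 * address against mem_align, the file offset and length against
 * offset_align. A zero alignment means DIO is not supported.
 */
bool dio_request_ok(const struct dio_align *a, const void *buf,
		    uint64_t offset, uint64_t len)
{
	if (a->mem_align == 0 || a->offset_align == 0)
		return false;
	return (uintptr_t)buf % a->mem_align == 0 &&
	       offset % a->offset_align == 0 &&
	       len % a->offset_align == 0;
}
```

A caller would typically run `query_dio_align()` once per file before opening it with O_DIRECT, fall back to buffered I/O if it fails or reports zero, and otherwise allocate buffers with something like posix_memalign() using `mem_align`.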