diff mbox series

[man-pages,RFC] statx.2, open.2: document STATX_DIOALIGN

Message ID 20220616202141.125079-1-ebiggers@kernel.org (mailing list archive)
State Superseded
Headers show
Series [man-pages,RFC] statx.2, open.2: document STATX_DIOALIGN | expand

Commit Message

Eric Biggers June 16, 2022, 8:21 p.m. UTC
From: Eric Biggers <ebiggers@google.com>

Document the proposed STATX_DIOALIGN support for statx()
(https://lore.kernel.org/linux-fsdevel/20220616201506.124209-1-ebiggers@kernel.org).

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 man2/open.2  | 43 ++++++++++++++++++++++++++++++++-----------
 man2/statx.2 | 32 +++++++++++++++++++++++++++++++-
 2 files changed, 63 insertions(+), 12 deletions(-)

Comments

Darrick J. Wong June 23, 2022, 4:02 p.m. UTC | #1
On Thu, Jun 16, 2022 at 01:21:41PM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@google.com>
> 
> Document the proposed STATX_DIOALIGN support for statx()
> (https://lore.kernel.org/linux-fsdevel/20220616201506.124209-1-ebiggers@kernel.org).
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> ---
>  man2/open.2  | 43 ++++++++++++++++++++++++++++++++-----------
>  man2/statx.2 | 32 +++++++++++++++++++++++++++++++-
>  2 files changed, 63 insertions(+), 12 deletions(-)
> 
> diff --git a/man2/open.2 b/man2/open.2
> index d1485999f..ef29847c3 100644
> --- a/man2/open.2
> +++ b/man2/open.2
> @@ -1732,21 +1732,42 @@ of user-space buffers and the file offset of I/Os.
>  In Linux alignment
>  restrictions vary by filesystem and kernel version and might be
>  absent entirely.
> -However there is currently no filesystem\-independent
> -interface for an application to discover these restrictions for a given
> -file or filesystem.
> -Some filesystems provide their own interfaces
> -for doing so, for example the
> +The handling of misaligned
> +.B O_DIRECT
> +I/Os also varies; they can either fail with
> +.B EINVAL
> +or fall back to buffered I/O.
> +.PP
> +Since Linux 5.20,
> +.B O_DIRECT
> +support and alignment restrictions for a file can be queried using
> +.BR statx (2),
> +using the
> +.B STATX_DIOALIGN
> +flag.
> +Support for
> +.B STATX_DIOALIGN
> +varies by filesystem; see
> +.BR statx (2).
> +.PP
> +Some filesystems provide their own interfaces for querying
> +.B O_DIRECT
> +alignment restrictions, for example the
>  .B XFS_IOC_DIOINFO
>  operation in
>  .BR xfsctl (3).
> +.B STATX_DIOALIGN
> +should be used instead when it is available.
>  .PP
> -Under Linux 2.4, transfer sizes, the alignment of the user buffer,
> -and the file offset must all be multiples of the logical block size
> -of the filesystem.
> -Since Linux 2.6.0, alignment to the logical block size of the
> -underlying storage (typically 512 bytes) suffices.
> -The logical block size can be determined using the
> +If none of the above is available, then direct I/O support and alignment
> +restrictions can only be assumed from known characteristics of the filesystem,
> +the individual file, the underlying storage device(s), and the kernel version.
> +In Linux 2.4, most block device based filesystems require that the file offset
> +and the length and memory address of all I/O segments be multiples of the
> +filesystem block size (typically 4096 bytes).
> +In Linux 2.6.0, this was relaxed to the logical block size of the block device
> +(typically 512 bytes).
> +A block device's logical block size can be determined using the
>  .BR ioctl (2)
>  .B BLKSSZGET
>  operation or from the shell using the command:
> diff --git a/man2/statx.2 b/man2/statx.2
> index a8620be6f..fff0a63ec 100644
> --- a/man2/statx.2
> +++ b/man2/statx.2
> @@ -61,7 +61,12 @@ struct statx {
>         containing the filesystem where the file resides */
>      __u32 stx_dev_major;   /* Major ID */
>      __u32 stx_dev_minor;   /* Minor ID */
> +
>      __u64 stx_mnt_id;      /* Mount ID */
> +
> +    /* Direct I/O alignment restrictions */
> +    __u32 stx_dio_mem_align;
> +    __u32 stx_dio_offset_align;
>  };
>  .EE
>  .in
> @@ -244,8 +249,11 @@ STATX_SIZE	Want stx_size
>  STATX_BLOCKS	Want stx_blocks
>  STATX_BASIC_STATS	[All of the above]
>  STATX_BTIME	Want stx_btime
> +STATX_ALL	The same as STATX_BASIC_STATS | STATX_BTIME.
> +         	This is deprecated and should not be used.

STATX_ALL is deprecated??  I was under the impression that _ALL meant
all the known bits for that kernel release, but...

>  STATX_MNT_ID	Want stx_mnt_id (since Linux 5.8)

...I guess that is not correct.

> -STATX_ALL	[All currently available fields]
> +STATX_DIOALIGN	Want stx_dio_mem_align and stx_dio_offset_align
> +              	(since Linux 5.20; support varies by filesystem)
>  .TE
>  .in
>  .PP
> @@ -406,6 +414,28 @@ This is the same number reported by
>  .BR name_to_handle_at (2)
>  and corresponds to the number in the first field in one of the records in
>  .IR /proc/self/mountinfo .
> +.TP
> +.I stx_dio_mem_align
> +The alignment (in bytes) required for user memory buffers for direct I/O
> +.BR "" ( O_DIRECT )
> +on this file. or 0 if direct I/O is not supported on this file.

"...on this file, or 0 if..."

> +.IP
> +.B STATX_DIOALIGN
> +.IR "" ( stx_dio_mem_align
> +and
> +.IR stx_dio_offset_align )
> +is supported on block devices since Linux 5.20.
> +The support on regular files varies by filesystem; it is supported by ext4 and
> +f2fs since Linux 5.20.

If the VFS changes don't provoke further bikeshedding, I'll contribute
an XFS patch to go with your series.

--D

> +.TP
> +.I stx_dio_offset_align
> +The alignment (in bytes) required for file offsets and I/O segment lengths for
> +direct I/O
> +.BR "" ( O_DIRECT )
> +on this file, or 0 if direct I/O is not supported on this file.
> +This will only be nonzero if
> +.I stx_dio_mem_align
> +is nonzero, and vice versa.
>  .PP
>  For further information on the above fields, see
>  .BR inode (7).
> -- 
> 2.36.1
>
Andreas Dilger June 23, 2022, 4:27 p.m. UTC | #2
On Jun 23, 2022, at 10:02 AM, Darrick J. Wong <djwong@kernel.org> wrote:
> 
> On Thu, Jun 16, 2022 at 01:21:41PM -0700, Eric Biggers wrote:
>> From: Eric Biggers <ebiggers@google.com>
>> 
>> @@ -244,8 +249,11 @@ STATX_SIZE	Want stx_size
>> STATX_BLOCKS	Want stx_blocks
>> STATX_BASIC_STATS	[All of the above]
>> STATX_BTIME	Want stx_btime
>> +STATX_ALL	The same as STATX_BASIC_STATS | STATX_BTIME.
>> +         	This is deprecated and should not be used.
> 
> STATX_ALL is deprecated??  I was under the impression that _ALL meant
> all the known bits for that kernel release, but...

For userspace STATX_ALL doesn't make sense, and it isn't used by the kernel.

Firstly, that would be a compile-time value for an application, so it
may be incorrect for the kernel the code is actually run on (either too
many or too few bits could be set).

Secondly, it isn't really useful for an app to request "all attributes"
if it doesn't know what they all mean, as that potentially adds useless
overhead.  Better for it to explicitly request the attributes that it
needs.  If that is fewer than the kernel could return it is irrelevant,
since the app would ignore them anyway.

The kernel will already ignore and mask attributes that *it* doesn't
understand, so requesting more is fine and STATX_ALL doesn't help this.

Cheers, Andreas
Eric Biggers June 23, 2022, 5:13 p.m. UTC | #3
On Thu, Jun 23, 2022 at 10:27:19AM -0600, Andreas Dilger wrote:
> On Jun 23, 2022, at 10:02 AM, Darrick J. Wong <djwong@kernel.org> wrote:
> > 
> > On Thu, Jun 16, 2022 at 01:21:41PM -0700, Eric Biggers wrote:
> >> From: Eric Biggers <ebiggers@google.com>
> >> 
> >> @@ -244,8 +249,11 @@ STATX_SIZE	Want stx_size
> >> STATX_BLOCKS	Want stx_blocks
> >> STATX_BASIC_STATS	[All of the above]
> >> STATX_BTIME	Want stx_btime
> >> +STATX_ALL	The same as STATX_BASIC_STATS | STATX_BTIME.
> >> +         	This is deprecated and should not be used.
> > 
> > STATX_ALL is deprecated??  I was under the impression that _ALL meant
> > all the known bits for that kernel release, but...
> 
> For userspace STATX_ALL doesn't make sense, and it isn't used by the kernel.
> 
> Firstly, that would be a compile-time value for an application, so it
> may be incorrect for the kernel the code is actually run on (either too
> many or too few bits could be set).
> 
> Secondly, it isn't really useful for an app to request "all attributes"
> if it doesn't know what they all mean, as that potentially adds useless
> overhead.  Better for it to explicitly request the attributes that it
> needs.  If that is fewer than the kernel could return it is irrelevant,
> since the app would ignore them anyway.
> 
> The kernel will already ignore and mask attributes that *it* doesn't
> understand, so requesting more is fine and STATX_ALL doesn't help this.
> 

What Andreas said.  Note, this discussion really should be happening on my
standalone patch that fixes the documentation for STATX_ALL:
https://lore.kernel.org/r/20220614034459.79889-1-ebiggers@kernel.org.  I folded
it into this RFC one only so that it applies cleanly without a prerequisite.

- Eric
diff mbox series

Patch

diff --git a/man2/open.2 b/man2/open.2
index d1485999f..ef29847c3 100644
--- a/man2/open.2
+++ b/man2/open.2
@@ -1732,21 +1732,42 @@  of user-space buffers and the file offset of I/Os.
 In Linux alignment
 restrictions vary by filesystem and kernel version and might be
 absent entirely.
-However there is currently no filesystem\-independent
-interface for an application to discover these restrictions for a given
-file or filesystem.
-Some filesystems provide their own interfaces
-for doing so, for example the
+The handling of misaligned
+.B O_DIRECT
+I/Os also varies; they can either fail with
+.B EINVAL
+or fall back to buffered I/O.
+.PP
+Since Linux 5.20,
+.B O_DIRECT
+support and alignment restrictions for a file can be queried using
+.BR statx (2),
+using the
+.B STATX_DIOALIGN
+flag.
+Support for
+.B STATX_DIOALIGN
+varies by filesystem; see
+.BR statx (2).
+.PP
+Some filesystems provide their own interfaces for querying
+.B O_DIRECT
+alignment restrictions, for example the
 .B XFS_IOC_DIOINFO
 operation in
 .BR xfsctl (3).
+.B STATX_DIOALIGN
+should be used instead when it is available.
 .PP
-Under Linux 2.4, transfer sizes, the alignment of the user buffer,
-and the file offset must all be multiples of the logical block size
-of the filesystem.
-Since Linux 2.6.0, alignment to the logical block size of the
-underlying storage (typically 512 bytes) suffices.
-The logical block size can be determined using the
+If none of the above is available, then direct I/O support and alignment
+restrictions can only be assumed from known characteristics of the filesystem,
+the individual file, the underlying storage device(s), and the kernel version.
+In Linux 2.4, most block device based filesystems require that the file offset
+and the length and memory address of all I/O segments be multiples of the
+filesystem block size (typically 4096 bytes).
+In Linux 2.6.0, this was relaxed to the logical block size of the block device
+(typically 512 bytes).
+A block device's logical block size can be determined using the
 .BR ioctl (2)
 .B BLKSSZGET
 operation or from the shell using the command:
diff --git a/man2/statx.2 b/man2/statx.2
index a8620be6f..fff0a63ec 100644
--- a/man2/statx.2
+++ b/man2/statx.2
@@ -61,7 +61,12 @@  struct statx {
        containing the filesystem where the file resides */
     __u32 stx_dev_major;   /* Major ID */
     __u32 stx_dev_minor;   /* Minor ID */
+
     __u64 stx_mnt_id;      /* Mount ID */
+
+    /* Direct I/O alignment restrictions */
+    __u32 stx_dio_mem_align;
+    __u32 stx_dio_offset_align;
 };
 .EE
 .in
@@ -244,8 +249,11 @@  STATX_SIZE	Want stx_size
 STATX_BLOCKS	Want stx_blocks
 STATX_BASIC_STATS	[All of the above]
 STATX_BTIME	Want stx_btime
+STATX_ALL	The same as STATX_BASIC_STATS | STATX_BTIME.
+         	This is deprecated and should not be used.
 STATX_MNT_ID	Want stx_mnt_id (since Linux 5.8)
-STATX_ALL	[All currently available fields]
+STATX_DIOALIGN	Want stx_dio_mem_align and stx_dio_offset_align
+              	(since Linux 5.20; support varies by filesystem)
 .TE
 .in
 .PP
@@ -406,6 +414,28 @@  This is the same number reported by
 .BR name_to_handle_at (2)
 and corresponds to the number in the first field in one of the records in
 .IR /proc/self/mountinfo .
+.TP
+.I stx_dio_mem_align
+The alignment (in bytes) required for user memory buffers for direct I/O
+.BR "" ( O_DIRECT )
+on this file. or 0 if direct I/O is not supported on this file.
+.IP
+.B STATX_DIOALIGN
+.IR "" ( stx_dio_mem_align
+and
+.IR stx_dio_offset_align )
+is supported on block devices since Linux 5.20.
+The support on regular files varies by filesystem; it is supported by ext4 and
+f2fs since Linux 5.20.
+.TP
+.I stx_dio_offset_align
+The alignment (in bytes) required for file offsets and I/O segment lengths for
+direct I/O
+.BR "" ( O_DIRECT )
+on this file, or 0 if direct I/O is not supported on this file.
+This will only be nonzero if
+.I stx_dio_mem_align
+is nonzero, and vice versa.
 .PP
 For further information on the above fields, see
 .BR inode (7).