Message ID | 20240801-exportfs-u64-mount-id-v3-1-be5d6283144a@cyphar.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | fhandle: expose u64 mount id to name_to_handle_at(2) | expand |
On Thu, 2024-08-01 at 13:52 +1000, Aleksa Sarai wrote: > Unfortunately, the way we have gone about adding new AT_* flags has > been a little messy. In the beginning, all of the AT_* flags had generic > meanings and so it made sense to share the flag bits indiscriminately. > However, we inevitably ran into syscalls that needed their own > syscall-specific flags. Due to the lack of a planned out policy, we > ended up with the following situations: > > * Existing syscalls adding new features tended to use new AT_* bits, > with some effort taken to try to re-use bits for flags that were so > obviously syscall specific that they only make sense for a single > syscall (such as the AT_EACCESS/AT_REMOVEDIR/AT_HANDLE_FID triplet). > > Given the constraints of bitflags, this works well in practice, but > ideally (to avoid future confusion) we would plan ahead and define a > set of "per-syscall bits" ahead of time so that when allocating new > bits we don't end up with a complete mish-mash of which bits are > supposed to be per-syscall and which aren't. > > * New syscalls dealt with this in several ways: > > - Some syscalls (like renameat2(2), move_mount(2), fsopen(2), and > fspick(2)) created their separate own flag spaces that have no > overlap with the AT_* flags. Most of these ended up allocating > their bits sequentually. > > In the case of move_mount(2) and fspick(2), several flags have > identical meanings to AT_* flags but were allocated in their own > flag space. > > This makes sense for syscalls that will never share AT_* flags, but > for some syscalls this leads to duplication with AT_* flags in a > way that could cause confusion (if renameat2(2) grew a > RENAME_EMPTY_PATH it seems likely that users could mistake it for > AT_EMPTY_PATH since it is an *at(2) syscall). > > - Some syscalls unfortunately ended up both creating their own flag > space while also using bits from other flag spaces. The most > obvious example is open_tree(2), where the standard usage ends up > using flags from *THREE* separate flag spaces: > > open_tree(AT_FDCWD, "/foo", OPEN_TREE_CLONE|O_CLOEXEC|AT_RECURSIVE); > > (Note that O_CLOEXEC is also platform-specific, so several future > OPEN_TREE_* bits are also made unusable in one fell swoop.) > > It's not entirely clear to me what the "right" choice is for new > syscalls. Just saying that all future VFS syscalls should use AT_* flags > doesn't seem practical. openat2(2) has RESOLVE_* flags (many of which > don't make much sense to burn generic AT_* flags for) and move_mount(2) > has separate AT_*-like flags for both the source and target so separate > flags are needed anyway (though it seems possible that renameat2(2) > could grow *_EMPTY_PATH flags at some point, and it's a bit of a shame > they can't be reused). > > But at least for syscalls that _do_ choose to use AT_* flags, we should > explicitly state the policy that 0x2ff is currently intended for > per-syscall flags and that new flags should err on the side of > overlapping with existing flag bits (so we can extend the scope of > generic flags in the future if necessary). > > And add AT_* aliases for the RENAME_* flags to further cement that > renameat2(2) is an *at(2) flag, just with its own per-syscall flags. > > Suggested-by: Amir Goldstein <amir73il@gmail.com> > Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> > --- > include/uapi/linux/fcntl.h | 80 ++++++++++++++------- > tools/perf/trace/beauty/include/uapi/linux/fcntl.h | 83 +++++++++++++++------- > 2 files changed, 115 insertions(+), 48 deletions(-) > This seems like a reasonable proposal. We haven't been as careful about how these flags get allocated as we probably should have in the past. Nothing we can do about the existing ones, but some guidelines going forward are welcome. > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h > index e55a3314bcb0..38a6d66d9e88 100644 > --- a/include/uapi/linux/fcntl.h > +++ b/include/uapi/linux/fcntl.h > @@ -90,37 +90,69 @@ > #define DN_ATTRIB 0x00000020 /* File changed attibutes */ > #define DN_MULTISHOT 0x80000000 /* Don't remove notifier */ > > +#define AT_FDCWD -100 /* Special value for dirfd used to > + indicate openat should use the > + current working directory. */ > + > + > +/* Generic flags for the *at(2) family of syscalls. */ > + > +/* Reserved for per-syscall flags 0xff. */ > +#define AT_SYMLINK_NOFOLLOW 0x100 /* Do not follow symbolic > + links. */ > +/* Reserved for per-syscall flags 0x200 */ > +#define AT_SYMLINK_FOLLOW 0x400 /* Follow symbolic links. */ > +#define AT_NO_AUTOMOUNT 0x800 /* Suppress terminal automount > + traversal. */ > +#define AT_EMPTY_PATH 0x1000 /* Allow empty relative > + pathname to operate on dirfd > + directly. */ > /* > - * The constants AT_REMOVEDIR and AT_EACCESS have the same value. AT_EACCESS is > - * meaningful only to faccessat, while AT_REMOVEDIR is meaningful only to > - * unlinkat. The two functions do completely different things and therefore, > - * the flags can be allowed to overlap. For example, passing AT_REMOVEDIR to > - * faccessat would be undefined behavior and thus treating it equivalent to > - * AT_EACCESS is valid undefined behavior. > + * These flags are currently statx(2)-specific, but they could be made generic > + * in the future and so they should not be used for other per-syscall flags. > */ > -#define AT_FDCWD -100 /* Special value used to indicate > - openat should use the current > - working directory. */ > -#define AT_SYMLINK_NOFOLLOW 0x100 /* Do not follow symbolic links. */ > +#define AT_STATX_SYNC_TYPE 0x6000 /* Type of synchronisation required from statx() */ > +#define AT_STATX_SYNC_AS_STAT 0x0000 /* - Do whatever stat() does */ > +#define AT_STATX_FORCE_SYNC 0x2000 /* - Force the attributes to be sync'd with the server */ > +#define AT_STATX_DONT_SYNC 0x4000 /* - Don't sync attributes with the server */ > + > +#define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */ > + > +/* > + * Per-syscall flags for the *at(2) family of syscalls. > + * > + * These are flags that are so syscall-specific that a user passing these flags > + * to the wrong syscall is so "clearly wrong" that we can safely call such > + * usage "undefined behaviour". > + * > + * For example, the constants AT_REMOVEDIR and AT_EACCESS have the same value. > + * AT_EACCESS is meaningful only to faccessat, while AT_REMOVEDIR is meaningful > + * only to unlinkat. The two functions do completely different things and > + * therefore, the flags can be allowed to overlap. For example, passing > + * AT_REMOVEDIR to faccessat would be undefined behavior and thus treating it > + * equivalent to AT_EACCESS is valid undefined behavior. > + * > + * Note for implementers: When picking a new per-syscall AT_* flag, try to > + * reuse already existing flags first. This leaves us with as many unused bits > + * as possible, so we can use them for generic bits in the future if necessary. > + */ > + > +/* Flags for renameat2(2) (must match legacy RENAME_* flags). */ > +#define AT_RENAME_NOREPLACE 0x0001 > +#define AT_RENAME_EXCHANGE 0x0002 > +#define AT_RENAME_WHITEOUT 0x0004 > + > +/* Flag for faccessat(2). */ > #define AT_EACCESS 0x200 /* Test access permitted for > effective IDs, not real IDs. */ > +/* Flag for unlinkat(2). */ > #define AT_REMOVEDIR 0x200 /* Remove directory instead of > unlinking file. */ > -#define AT_SYMLINK_FOLLOW 0x400 /* Follow symbolic links. */ > -#define AT_NO_AUTOMOUNT 0x800 /* Suppress terminal automount traversal */ > -#define AT_EMPTY_PATH 0x1000 /* Allow empty relative pathname */ > - > -#define AT_STATX_SYNC_TYPE 0x6000 /* Type of synchronisation required from statx() */ > -#define AT_STATX_SYNC_AS_STAT 0x0000 /* - Do whatever stat() does */ > -#define AT_STATX_FORCE_SYNC 0x2000 /* - Force the attributes to be sync'd with the server */ > -#define AT_STATX_DONT_SYNC 0x4000 /* - Don't sync attributes with the server */ > - > -#define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */ > +/* Flags for name_to_handle_at(2). */ > +#define AT_HANDLE_FID 0x200 /* File handle is needed to compare > + object identity and may not be > + usable with open_by_handle_at(2). */ > > -/* Flags for name_to_handle_at(2). We reuse AT_ flag space to save bits... */ > -#define AT_HANDLE_FID AT_REMOVEDIR /* file handle is needed to > - compare object identity and may not > - be usable to open_by_handle_at(2) */ > #if defined(__KERNEL__) > #define AT_GETATTR_NOSEC 0x80000000 > #endif > diff --git a/tools/perf/trace/beauty/include/uapi/linux/fcntl.h b/tools/perf/trace/beauty/include/uapi/linux/fcntl.h > index c0bcc185fa48..38a6d66d9e88 100644 > --- a/tools/perf/trace/beauty/include/uapi/linux/fcntl.h > +++ b/tools/perf/trace/beauty/include/uapi/linux/fcntl.h > @@ -16,6 +16,9 @@ > > #define F_DUPFD_QUERY (F_LINUX_SPECIFIC_BASE + 3) > > +/* Was the file just created? */ > +#define F_CREATED_QUERY (F_LINUX_SPECIFIC_BASE + 4) > + > /* > * Cancel a blocking posix lock; internal use only until we expose an > * asynchronous lock api to userspace: > @@ -87,37 +90,69 @@ > #define DN_ATTRIB 0x00000020 /* File changed attibutes */ > #define DN_MULTISHOT 0x80000000 /* Don't remove notifier */ > > +#define AT_FDCWD -100 /* Special value for dirfd used to > + indicate openat should use the > + current working directory. */ > + > + > +/* Generic flags for the *at(2) family of syscalls. */ > + > +/* Reserved for per-syscall flags 0xff. */ > +#define AT_SYMLINK_NOFOLLOW 0x100 /* Do not follow symbolic > + links. */ > +/* Reserved for per-syscall flags 0x200 */ > +#define AT_SYMLINK_FOLLOW 0x400 /* Follow symbolic links. */ > +#define AT_NO_AUTOMOUNT 0x800 /* Suppress terminal automount > + traversal. */ > +#define AT_EMPTY_PATH 0x1000 /* Allow empty relative > + pathname to operate on dirfd > + directly. */ > +/* > + * These flags are currently statx(2)-specific, but they could be made generic > + * in the future and so they should not be used for other per-syscall flags. > + */ > +#define AT_STATX_SYNC_TYPE 0x6000 /* Type of synchronisation required from statx() */ > +#define AT_STATX_SYNC_AS_STAT 0x0000 /* - Do whatever stat() does */ > +#define AT_STATX_FORCE_SYNC 0x2000 /* - Force the attributes to be sync'd with the server */ > +#define AT_STATX_DONT_SYNC 0x4000 /* - Don't sync attributes with the server */ > + > +#define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */ > + > /* > - * The constants AT_REMOVEDIR and AT_EACCESS have the same value. AT_EACCESS is > - * meaningful only to faccessat, while AT_REMOVEDIR is meaningful only to > - * unlinkat. The two functions do completely different things and therefore, > - * the flags can be allowed to overlap. For example, passing AT_REMOVEDIR to > - * faccessat would be undefined behavior and thus treating it equivalent to > - * AT_EACCESS is valid undefined behavior. > + * Per-syscall flags for the *at(2) family of syscalls. > + * > + * These are flags that are so syscall-specific that a user passing these flags > + * to the wrong syscall is so "clearly wrong" that we can safely call such > + * usage "undefined behaviour". > + * > + * For example, the constants AT_REMOVEDIR and AT_EACCESS have the same value. > + * AT_EACCESS is meaningful only to faccessat, while AT_REMOVEDIR is meaningful > + * only to unlinkat. The two functions do completely different things and > + * therefore, the flags can be allowed to overlap. For example, passing > + * AT_REMOVEDIR to faccessat would be undefined behavior and thus treating it > + * equivalent to AT_EACCESS is valid undefined behavior. > + * > + * Note for implementers: When picking a new per-syscall AT_* flag, try to > + * reuse already existing flags first. This leaves us with as many unused bits > + * as possible, so we can use them for generic bits in the future if necessary. > */ > -#define AT_FDCWD -100 /* Special value used to indicate > - openat should use the current > - working directory. */ > -#define AT_SYMLINK_NOFOLLOW 0x100 /* Do not follow symbolic links. */ > + > +/* Flags for renameat2(2) (must match legacy RENAME_* flags). */ > +#define AT_RENAME_NOREPLACE 0x0001 > +#define AT_RENAME_EXCHANGE 0x0002 > +#define AT_RENAME_WHITEOUT 0x0004 > + > +/* Flag for faccessat(2). */ > #define AT_EACCESS 0x200 /* Test access permitted for > effective IDs, not real IDs. */ > +/* Flag for unlinkat(2). */ > #define AT_REMOVEDIR 0x200 /* Remove directory instead of > unlinking file. */ > -#define AT_SYMLINK_FOLLOW 0x400 /* Follow symbolic links. */ > -#define AT_NO_AUTOMOUNT 0x800 /* Suppress terminal automount traversal */ > -#define AT_EMPTY_PATH 0x1000 /* Allow empty relative pathname */ > - > -#define AT_STATX_SYNC_TYPE 0x6000 /* Type of synchronisation required from statx() */ > -#define AT_STATX_SYNC_AS_STAT 0x0000 /* - Do whatever stat() does */ > -#define AT_STATX_FORCE_SYNC 0x2000 /* - Force the attributes to be sync'd with the server */ > -#define AT_STATX_DONT_SYNC 0x4000 /* - Don't sync attributes with the server */ > - > -#define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */ > +/* Flags for name_to_handle_at(2). */ > +#define AT_HANDLE_FID 0x200 /* File handle is needed to compare > + object identity and may not be > + usable with open_by_handle_at(2). */ > > -/* Flags for name_to_handle_at(2). We reuse AT_ flag space to save bits... */ > -#define AT_HANDLE_FID AT_REMOVEDIR /* file handle is needed to > - compare object identity and may not > - be usable to open_by_handle_at(2) */ > #if defined(__KERNEL__) > #define AT_GETATTR_NOSEC 0x80000000 > #endif > Reviewed-by: Jeff Layton <jlayton@kernel.org>
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h index e55a3314bcb0..38a6d66d9e88 100644 --- a/include/uapi/linux/fcntl.h +++ b/include/uapi/linux/fcntl.h @@ -90,37 +90,69 @@ #define DN_ATTRIB 0x00000020 /* File changed attibutes */ #define DN_MULTISHOT 0x80000000 /* Don't remove notifier */ +#define AT_FDCWD -100 /* Special value for dirfd used to + indicate openat should use the + current working directory. */ + + +/* Generic flags for the *at(2) family of syscalls. */ + +/* Reserved for per-syscall flags 0xff. */ +#define AT_SYMLINK_NOFOLLOW 0x100 /* Do not follow symbolic + links. */ +/* Reserved for per-syscall flags 0x200 */ +#define AT_SYMLINK_FOLLOW 0x400 /* Follow symbolic links. */ +#define AT_NO_AUTOMOUNT 0x800 /* Suppress terminal automount + traversal. */ +#define AT_EMPTY_PATH 0x1000 /* Allow empty relative + pathname to operate on dirfd + directly. */ /* - * The constants AT_REMOVEDIR and AT_EACCESS have the same value. AT_EACCESS is - * meaningful only to faccessat, while AT_REMOVEDIR is meaningful only to - * unlinkat. The two functions do completely different things and therefore, - * the flags can be allowed to overlap. For example, passing AT_REMOVEDIR to - * faccessat would be undefined behavior and thus treating it equivalent to - * AT_EACCESS is valid undefined behavior. + * These flags are currently statx(2)-specific, but they could be made generic + * in the future and so they should not be used for other per-syscall flags. */ -#define AT_FDCWD -100 /* Special value used to indicate - openat should use the current - working directory. */ -#define AT_SYMLINK_NOFOLLOW 0x100 /* Do not follow symbolic links. */ +#define AT_STATX_SYNC_TYPE 0x6000 /* Type of synchronisation required from statx() */ +#define AT_STATX_SYNC_AS_STAT 0x0000 /* - Do whatever stat() does */ +#define AT_STATX_FORCE_SYNC 0x2000 /* - Force the attributes to be sync'd with the server */ +#define AT_STATX_DONT_SYNC 0x4000 /* - Don't sync attributes with the server */ + +#define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */ + +/* + * Per-syscall flags for the *at(2) family of syscalls. + * + * These are flags that are so syscall-specific that a user passing these flags + * to the wrong syscall is so "clearly wrong" that we can safely call such + * usage "undefined behaviour". + * + * For example, the constants AT_REMOVEDIR and AT_EACCESS have the same value. + * AT_EACCESS is meaningful only to faccessat, while AT_REMOVEDIR is meaningful + * only to unlinkat. The two functions do completely different things and + * therefore, the flags can be allowed to overlap. For example, passing + * AT_REMOVEDIR to faccessat would be undefined behavior and thus treating it + * equivalent to AT_EACCESS is valid undefined behavior. + * + * Note for implementers: When picking a new per-syscall AT_* flag, try to + * reuse already existing flags first. This leaves us with as many unused bits + * as possible, so we can use them for generic bits in the future if necessary. + */ + +/* Flags for renameat2(2) (must match legacy RENAME_* flags). */ +#define AT_RENAME_NOREPLACE 0x0001 +#define AT_RENAME_EXCHANGE 0x0002 +#define AT_RENAME_WHITEOUT 0x0004 + +/* Flag for faccessat(2). */ #define AT_EACCESS 0x200 /* Test access permitted for effective IDs, not real IDs. */ +/* Flag for unlinkat(2). */ #define AT_REMOVEDIR 0x200 /* Remove directory instead of unlinking file. */ -#define AT_SYMLINK_FOLLOW 0x400 /* Follow symbolic links. */ -#define AT_NO_AUTOMOUNT 0x800 /* Suppress terminal automount traversal */ -#define AT_EMPTY_PATH 0x1000 /* Allow empty relative pathname */ - -#define AT_STATX_SYNC_TYPE 0x6000 /* Type of synchronisation required from statx() */ -#define AT_STATX_SYNC_AS_STAT 0x0000 /* - Do whatever stat() does */ -#define AT_STATX_FORCE_SYNC 0x2000 /* - Force the attributes to be sync'd with the server */ -#define AT_STATX_DONT_SYNC 0x4000 /* - Don't sync attributes with the server */ - -#define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */ +/* Flags for name_to_handle_at(2). */ +#define AT_HANDLE_FID 0x200 /* File handle is needed to compare + object identity and may not be + usable with open_by_handle_at(2). */ -/* Flags for name_to_handle_at(2). We reuse AT_ flag space to save bits... */ -#define AT_HANDLE_FID AT_REMOVEDIR /* file handle is needed to - compare object identity and may not - be usable to open_by_handle_at(2) */ #if defined(__KERNEL__) #define AT_GETATTR_NOSEC 0x80000000 #endif diff --git a/tools/perf/trace/beauty/include/uapi/linux/fcntl.h b/tools/perf/trace/beauty/include/uapi/linux/fcntl.h index c0bcc185fa48..38a6d66d9e88 100644 --- a/tools/perf/trace/beauty/include/uapi/linux/fcntl.h +++ b/tools/perf/trace/beauty/include/uapi/linux/fcntl.h @@ -16,6 +16,9 @@ #define F_DUPFD_QUERY (F_LINUX_SPECIFIC_BASE + 3) +/* Was the file just created? */ +#define F_CREATED_QUERY (F_LINUX_SPECIFIC_BASE + 4) + /* * Cancel a blocking posix lock; internal use only until we expose an * asynchronous lock api to userspace: @@ -87,37 +90,69 @@ #define DN_ATTRIB 0x00000020 /* File changed attibutes */ #define DN_MULTISHOT 0x80000000 /* Don't remove notifier */ +#define AT_FDCWD -100 /* Special value for dirfd used to + indicate openat should use the + current working directory. */ + + +/* Generic flags for the *at(2) family of syscalls. */ + +/* Reserved for per-syscall flags 0xff. */ +#define AT_SYMLINK_NOFOLLOW 0x100 /* Do not follow symbolic + links. */ +/* Reserved for per-syscall flags 0x200 */ +#define AT_SYMLINK_FOLLOW 0x400 /* Follow symbolic links. */ +#define AT_NO_AUTOMOUNT 0x800 /* Suppress terminal automount + traversal. */ +#define AT_EMPTY_PATH 0x1000 /* Allow empty relative + pathname to operate on dirfd + directly. */ +/* + * These flags are currently statx(2)-specific, but they could be made generic + * in the future and so they should not be used for other per-syscall flags. + */ +#define AT_STATX_SYNC_TYPE 0x6000 /* Type of synchronisation required from statx() */ +#define AT_STATX_SYNC_AS_STAT 0x0000 /* - Do whatever stat() does */ +#define AT_STATX_FORCE_SYNC 0x2000 /* - Force the attributes to be sync'd with the server */ +#define AT_STATX_DONT_SYNC 0x4000 /* - Don't sync attributes with the server */ + +#define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */ + /* - * The constants AT_REMOVEDIR and AT_EACCESS have the same value. AT_EACCESS is - * meaningful only to faccessat, while AT_REMOVEDIR is meaningful only to - * unlinkat. The two functions do completely different things and therefore, - * the flags can be allowed to overlap. For example, passing AT_REMOVEDIR to - * faccessat would be undefined behavior and thus treating it equivalent to - * AT_EACCESS is valid undefined behavior. + * Per-syscall flags for the *at(2) family of syscalls. + * + * These are flags that are so syscall-specific that a user passing these flags + * to the wrong syscall is so "clearly wrong" that we can safely call such + * usage "undefined behaviour". + * + * For example, the constants AT_REMOVEDIR and AT_EACCESS have the same value. + * AT_EACCESS is meaningful only to faccessat, while AT_REMOVEDIR is meaningful + * only to unlinkat. The two functions do completely different things and + * therefore, the flags can be allowed to overlap. For example, passing + * AT_REMOVEDIR to faccessat would be undefined behavior and thus treating it + * equivalent to AT_EACCESS is valid undefined behavior. + * + * Note for implementers: When picking a new per-syscall AT_* flag, try to + * reuse already existing flags first. This leaves us with as many unused bits + * as possible, so we can use them for generic bits in the future if necessary. */ -#define AT_FDCWD -100 /* Special value used to indicate - openat should use the current - working directory. */ -#define AT_SYMLINK_NOFOLLOW 0x100 /* Do not follow symbolic links. */ + +/* Flags for renameat2(2) (must match legacy RENAME_* flags). */ +#define AT_RENAME_NOREPLACE 0x0001 +#define AT_RENAME_EXCHANGE 0x0002 +#define AT_RENAME_WHITEOUT 0x0004 + +/* Flag for faccessat(2). */ #define AT_EACCESS 0x200 /* Test access permitted for effective IDs, not real IDs. */ +/* Flag for unlinkat(2). */ #define AT_REMOVEDIR 0x200 /* Remove directory instead of unlinking file. */ -#define AT_SYMLINK_FOLLOW 0x400 /* Follow symbolic links. */ -#define AT_NO_AUTOMOUNT 0x800 /* Suppress terminal automount traversal */ -#define AT_EMPTY_PATH 0x1000 /* Allow empty relative pathname */ - -#define AT_STATX_SYNC_TYPE 0x6000 /* Type of synchronisation required from statx() */ -#define AT_STATX_SYNC_AS_STAT 0x0000 /* - Do whatever stat() does */ -#define AT_STATX_FORCE_SYNC 0x2000 /* - Force the attributes to be sync'd with the server */ -#define AT_STATX_DONT_SYNC 0x4000 /* - Don't sync attributes with the server */ - -#define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */ +/* Flags for name_to_handle_at(2). */ +#define AT_HANDLE_FID 0x200 /* File handle is needed to compare + object identity and may not be + usable with open_by_handle_at(2). */ -/* Flags for name_to_handle_at(2). We reuse AT_ flag space to save bits... */ -#define AT_HANDLE_FID AT_REMOVEDIR /* file handle is needed to - compare object identity and may not - be usable to open_by_handle_at(2) */ #if defined(__KERNEL__) #define AT_GETATTR_NOSEC 0x80000000 #endif
Unfortunately, the way we have gone about adding new AT_* flags has been a little messy. In the beginning, all of the AT_* flags had generic meanings and so it made sense to share the flag bits indiscriminately. However, we inevitably ran into syscalls that needed their own syscall-specific flags. Due to the lack of a planned out policy, we ended up with the following situations: * Existing syscalls adding new features tended to use new AT_* bits, with some effort taken to try to re-use bits for flags that were so obviously syscall specific that they only make sense for a single syscall (such as the AT_EACCESS/AT_REMOVEDIR/AT_HANDLE_FID triplet). Given the constraints of bitflags, this works well in practice, but ideally (to avoid future confusion) we would plan ahead and define a set of "per-syscall bits" ahead of time so that when allocating new bits we don't end up with a complete mish-mash of which bits are supposed to be per-syscall and which aren't. * New syscalls dealt with this in several ways: - Some syscalls (like renameat2(2), move_mount(2), fsopen(2), and fspick(2)) created their separate own flag spaces that have no overlap with the AT_* flags. Most of these ended up allocating their bits sequentually. In the case of move_mount(2) and fspick(2), several flags have identical meanings to AT_* flags but were allocated in their own flag space. This makes sense for syscalls that will never share AT_* flags, but for some syscalls this leads to duplication with AT_* flags in a way that could cause confusion (if renameat2(2) grew a RENAME_EMPTY_PATH it seems likely that users could mistake it for AT_EMPTY_PATH since it is an *at(2) syscall). - Some syscalls unfortunately ended up both creating their own flag space while also using bits from other flag spaces. The most obvious example is open_tree(2), where the standard usage ends up using flags from *THREE* separate flag spaces: open_tree(AT_FDCWD, "/foo", OPEN_TREE_CLONE|O_CLOEXEC|AT_RECURSIVE); (Note that O_CLOEXEC is also platform-specific, so several future OPEN_TREE_* bits are also made unusable in one fell swoop.) It's not entirely clear to me what the "right" choice is for new syscalls. Just saying that all future VFS syscalls should use AT_* flags doesn't seem practical. openat2(2) has RESOLVE_* flags (many of which don't make much sense to burn generic AT_* flags for) and move_mount(2) has separate AT_*-like flags for both the source and target so separate flags are needed anyway (though it seems possible that renameat2(2) could grow *_EMPTY_PATH flags at some point, and it's a bit of a shame they can't be reused). But at least for syscalls that _do_ choose to use AT_* flags, we should explicitly state the policy that 0x2ff is currently intended for per-syscall flags and that new flags should err on the side of overlapping with existing flag bits (so we can extend the scope of generic flags in the future if necessary). And add AT_* aliases for the RENAME_* flags to further cement that renameat2(2) is an *at(2) flag, just with its own per-syscall flags. Suggested-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> --- include/uapi/linux/fcntl.h | 80 ++++++++++++++------- tools/perf/trace/beauty/include/uapi/linux/fcntl.h | 83 +++++++++++++++------- 2 files changed, 115 insertions(+), 48 deletions(-)