diff mbox series

fs/xattr: actually support O_PATH fds in *xattrat() syscalls

Message ID 20250205204628.49607-1-me@yhndnzj.com (mailing list archive)
State New
Headers show
Series fs/xattr: actually support O_PATH fds in *xattrat() syscalls | expand

Commit Message

Mike Yuan Feb. 5, 2025, 8:47 p.m. UTC
Cited from commit message of original patch [1]:

> One use case will be setfiles(8) setting SELinux file contexts
> ("security.selinux") without race conditions and without a file
> descriptor opened with read access requiring SELinux read permission.

Also, generally all *at() syscalls operate on O_PATH fds, unlike
f*() ones. Yet the O_PATH fds are rejected by *xattrat() syscalls
in the final version merged into tree. Instead, let's switch things
to CLASS(fd_raw).

Note that there's one side effect: f*xattr() starts to work with
O_PATH fds too. It's not clear to me whether this is desirable
(e.g. fstat() accepts O_PATH fds as an outlier).

[1] https://lore.kernel.org/all/20240426162042.191916-1-cgoettsche@seltendoof.de/

Fixes: 6140be90ec70 ("fs/xattr: add *at family syscalls")
Signed-off-by: Mike Yuan <me@yhndnzj.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Göttsche <cgzones@googlemail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: <stable@vger.kernel.org>
---
 fs/xattr.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)


base-commit: a86bf2283d2c9769205407e2b54777c03d012939

Comments

Christian Brauner Feb. 6, 2025, 9:31 a.m. UTC | #1
On Wed, Feb 05, 2025 at 08:47:23PM +0000, Mike Yuan wrote:
> Cited from commit message of original patch [1]:
> 
> > One use case will be setfiles(8) setting SELinux file contexts
> > ("security.selinux") without race conditions and without a file
> > descriptor opened with read access requiring SELinux read permission.
> 
> Also, generally all *at() syscalls operate on O_PATH fds, unlike
> f*() ones. Yet the O_PATH fds are rejected by *xattrat() syscalls
> in the final version merged into tree. Instead, let's switch things
> to CLASS(fd_raw).
> 
> Note that there's one side effect: f*xattr() starts to work with
> O_PATH fds too. It's not clear to me whether this is desirable
> (e.g. fstat() accepts O_PATH fds as an outlier).
> 
> [1] https://lore.kernel.org/all/20240426162042.191916-1-cgoettsche@seltendoof.de/
> 
> Fixes: 6140be90ec70 ("fs/xattr: add *at family syscalls")
> Signed-off-by: Mike Yuan <me@yhndnzj.com>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Christian Göttsche <cgzones@googlemail.com>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: <stable@vger.kernel.org>
> ---

I expanded on this before. O_PATH is intentionally limited in scope and
it should not allow to perform operations that are similar to a read or
write which getting and setting xattrs is.

Patches that further weaken or dilute the semantics of O_PATH are not
acceptable.

>  fs/xattr.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/xattr.c b/fs/xattr.c
> index 02bee149ad96..15df71e56187 100644
> --- a/fs/xattr.c
> +++ b/fs/xattr.c
> @@ -704,7 +704,7 @@ static int path_setxattrat(int dfd, const char __user *pathname,
>  
>  	filename = getname_maybe_null(pathname, at_flags);
>  	if (!filename) {
> -		CLASS(fd, f)(dfd);
> +		CLASS(fd_raw, f)(dfd);
>  		if (fd_empty(f))
>  			error = -EBADF;
>  		else
> @@ -848,7 +848,7 @@ static ssize_t path_getxattrat(int dfd, const char __user *pathname,
>  
>  	filename = getname_maybe_null(pathname, at_flags);
>  	if (!filename) {
> -		CLASS(fd, f)(dfd);
> +		CLASS(fd_raw, f)(dfd);
>  		if (fd_empty(f))
>  			return -EBADF;
>  		return file_getxattr(fd_file(f), &ctx);
> @@ -978,7 +978,7 @@ static ssize_t path_listxattrat(int dfd, const char __user *pathname,
>  
>  	filename = getname_maybe_null(pathname, at_flags);
>  	if (!filename) {
> -		CLASS(fd, f)(dfd);
> +		CLASS(fd_raw, f)(dfd);
>  		if (fd_empty(f))
>  			return -EBADF;
>  		return file_listxattr(fd_file(f), list, size);
> @@ -1079,7 +1079,7 @@ static int path_removexattrat(int dfd, const char __user *pathname,
>  
>  	filename = getname_maybe_null(pathname, at_flags);
>  	if (!filename) {
> -		CLASS(fd, f)(dfd);
> +		CLASS(fd_raw, f)(dfd);
>  		if (fd_empty(f))
>  			return -EBADF;
>  		return file_removexattr(fd_file(f), &kname);
> 
> base-commit: a86bf2283d2c9769205407e2b54777c03d012939
> -- 
> 2.48.1
> 
>
Mike Yuan Feb. 6, 2025, 9:51 a.m. UTC | #2
On 2/6/25 10:31, Christian Brauner <brauner@kernel.org> wrote:

>  On Wed, Feb 05, 2025 at 08:47:23PM +0000, Mike Yuan wrote:
>  > Cited from commit message of original patch [1]:
>  >
>  > > One use case will be setfiles(8) setting SELinux file contexts
>  > > ("security.selinux") without race conditions and without a file
>  > > descriptor opened with read access requiring SELinux read permission.
>  >
>  > Also, generally all *at() syscalls operate on O_PATH fds, unlike
>  > f*() ones. Yet the O_PATH fds are rejected by *xattrat() syscalls
>  > in the final version merged into tree. Instead, let's switch things
>  > to CLASS(fd_raw).
>  >
>  > Note that there's one side effect: f*xattr() starts to work with
>  > O_PATH fds too. It's not clear to me whether this is desirable
>  > (e.g. fstat() accepts O_PATH fds as an outlier).
>  >
>  > [1] https://lore.kernel.org/all/20240426162042.191916-1-cgoettsche@seltendoof.de/
>  >
>  > Fixes: 6140be90ec70 ("fs/xattr: add *at family syscalls")
>  > Signed-off-by: Mike Yuan <me@yhndnzj.com>
>  > Cc: Al Viro <viro@zeniv.linux.org.uk>
>  > Cc: Christian Göttsche <cgzones@googlemail.com>
>  > Cc: Christian Brauner <brauner@kernel.org>
>  > Cc: <stable@vger.kernel.org>
>  > ---
>  
>  I expanded on this before. O_PATH is intentionally limited in scope and
>  it should not allow to perform operations that are similar to a read or
>  write which getting and setting xattrs is.
>  
>  Patches that further weaken or dilute the semantics of O_PATH are not
>  acceptable.

But the *at() variants really should be able to work with O_PATH fds, otherwise they're basically useless? I guess I just need to keep f*() as-is?

>  >  fs/xattr.c | 8 ++++----
>  >  1 file changed, 4 insertions(+), 4 deletions(-)
>  >
>  > diff --git a/fs/xattr.c b/fs/xattr.c
>  > index 02bee149ad96..15df71e56187 100644
>  > --- a/fs/xattr.c
>  > +++ b/fs/xattr.c
>  > @@ -704,7 +704,7 @@ static int path_setxattrat(int dfd, const char __user *pathname,
>  >
>  >  	filename = getname_maybe_null(pathname, at_flags);
>  >  	if (!filename) {
>  > -		CLASS(fd, f)(dfd);
>  > +		CLASS(fd_raw, f)(dfd);
>  >  		if (fd_empty(f))
>  >  			error = -EBADF;
>  >  		else
>  > @@ -848,7 +848,7 @@ static ssize_t path_getxattrat(int dfd, const char __user *pathname,
>  >
>  >  	filename = getname_maybe_null(pathname, at_flags);
>  >  	if (!filename) {
>  > -		CLASS(fd, f)(dfd);
>  > +		CLASS(fd_raw, f)(dfd);
>  >  		if (fd_empty(f))
>  >  			return -EBADF;
>  >  		return file_getxattr(fd_file(f), &ctx);
>  > @@ -978,7 +978,7 @@ static ssize_t path_listxattrat(int dfd, const char __user *pathname,
>  >
>  >  	filename = getname_maybe_null(pathname, at_flags);
>  >  	if (!filename) {
>  > -		CLASS(fd, f)(dfd);
>  > +		CLASS(fd_raw, f)(dfd);
>  >  		if (fd_empty(f))
>  >  			return -EBADF;
>  >  		return file_listxattr(fd_file(f), list, size);
>  > @@ -1079,7 +1079,7 @@ static int path_removexattrat(int dfd, const char __user *pathname,
>  >
>  >  	filename = getname_maybe_null(pathname, at_flags);
>  >  	if (!filename) {
>  > -		CLASS(fd, f)(dfd);
>  > +		CLASS(fd_raw, f)(dfd);
>  >  		if (fd_empty(f))
>  >  			return -EBADF;
>  >  		return file_removexattr(fd_file(f), &kname);
>  >
>  > base-commit: a86bf2283d2c9769205407e2b54777c03d012939
>  > --
>  > 2.48.1
>  >
>  >
>
Christian Brauner Feb. 6, 2025, 10:03 a.m. UTC | #3
On Thu, Feb 06, 2025 at 09:51:33AM +0000, Mike Yuan wrote:
> On 2/6/25 10:31, Christian Brauner <brauner@kernel.org> wrote:
> 
> >  On Wed, Feb 05, 2025 at 08:47:23PM +0000, Mike Yuan wrote:
> >  > Cited from commit message of original patch [1]:
> >  >
> >  > > One use case will be setfiles(8) setting SELinux file contexts
> >  > > ("security.selinux") without race conditions and without a file
> >  > > descriptor opened with read access requiring SELinux read permission.
> >  >
> >  > Also, generally all *at() syscalls operate on O_PATH fds, unlike
> >  > f*() ones. Yet the O_PATH fds are rejected by *xattrat() syscalls
> >  > in the final version merged into tree. Instead, let's switch things
> >  > to CLASS(fd_raw).
> >  >
> >  > Note that there's one side effect: f*xattr() starts to work with
> >  > O_PATH fds too. It's not clear to me whether this is desirable
> >  > (e.g. fstat() accepts O_PATH fds as an outlier).
> >  >
> >  > [1] https://lore.kernel.org/all/20240426162042.191916-1-cgoettsche@seltendoof.de/
> >  >
> >  > Fixes: 6140be90ec70 ("fs/xattr: add *at family syscalls")
> >  > Signed-off-by: Mike Yuan <me@yhndnzj.com>
> >  > Cc: Al Viro <viro@zeniv.linux.org.uk>
> >  > Cc: Christian Göttsche <cgzones@googlemail.com>
> >  > Cc: Christian Brauner <brauner@kernel.org>
> >  > Cc: <stable@vger.kernel.org>
> >  > ---
> >  
> >  I expanded on this before. O_PATH is intentionally limited in scope and
> >  it should not allow to perform operations that are similar to a read or
> >  write which getting and setting xattrs is.
> >  
> >  Patches that further weaken or dilute the semantics of O_PATH are not
> >  acceptable.
> 
> But the *at() variants really should be able to work with O_PATH fds, otherwise they're basically useless? I guess I just need to keep f*() as-is?

I'm confused. If you have:

        filename = getname_maybe_null(pathname, at_flags);
        if (!filename) {
                CLASS(fd, f)(dfd);
                if (fd_empty(f))
                        error = -EBADF;
                else
                        error = file_setxattr(fd_file(f), &ctx);

Then this branch ^^ cannot use fd_raw because you're allowing operations
directly on the O_PATH file descriptor.

Using the O_PATH file descriptor for lookup is obviously fine which is
why the other branch exists:

        } else {
                error = filename_setxattr(dfd, filename, lookup_flags, &ctx);
        }

IOW, your patch makes AT_EMPTY_PATH work with an O_PATH file descriptor
which isn't acceptable. However, it is already perfectly fine to use an
O_PATH file descriptor for lookup.

> 
> >  >  fs/xattr.c | 8 ++++----
> >  >  1 file changed, 4 insertions(+), 4 deletions(-)
> >  >
> >  > diff --git a/fs/xattr.c b/fs/xattr.c
> >  > index 02bee149ad96..15df71e56187 100644
> >  > --- a/fs/xattr.c
> >  > +++ b/fs/xattr.c
> >  > @@ -704,7 +704,7 @@ static int path_setxattrat(int dfd, const char __user *pathname,
> >  >
> >  >  	filename = getname_maybe_null(pathname, at_flags);
> >  >  	if (!filename) {
> >  > -		CLASS(fd, f)(dfd);
> >  > +		CLASS(fd_raw, f)(dfd);
> >  >  		if (fd_empty(f))
> >  >  			error = -EBADF;
> >  >  		else
> >  > @@ -848,7 +848,7 @@ static ssize_t path_getxattrat(int dfd, const char __user *pathname,
> >  >
> >  >  	filename = getname_maybe_null(pathname, at_flags);
> >  >  	if (!filename) {
> >  > -		CLASS(fd, f)(dfd);
> >  > +		CLASS(fd_raw, f)(dfd);
> >  >  		if (fd_empty(f))
> >  >  			return -EBADF;
> >  >  		return file_getxattr(fd_file(f), &ctx);
> >  > @@ -978,7 +978,7 @@ static ssize_t path_listxattrat(int dfd, const char __user *pathname,
> >  >
> >  >  	filename = getname_maybe_null(pathname, at_flags);
> >  >  	if (!filename) {
> >  > -		CLASS(fd, f)(dfd);
> >  > +		CLASS(fd_raw, f)(dfd);
> >  >  		if (fd_empty(f))
> >  >  			return -EBADF;
> >  >  		return file_listxattr(fd_file(f), list, size);
> >  > @@ -1079,7 +1079,7 @@ static int path_removexattrat(int dfd, const char __user *pathname,
> >  >
> >  >  	filename = getname_maybe_null(pathname, at_flags);
> >  >  	if (!filename) {
> >  > -		CLASS(fd, f)(dfd);
> >  > +		CLASS(fd_raw, f)(dfd);
> >  >  		if (fd_empty(f))
> >  >  			return -EBADF;
> >  >  		return file_removexattr(fd_file(f), &kname);
> >  >
> >  > base-commit: a86bf2283d2c9769205407e2b54777c03d012939
> >  > --
> >  > 2.48.1
> >  >
> >  >
> >
Mike Yuan Feb. 6, 2025, 1:25 p.m. UTC | #4
On 2/6/25 11:03, Christian Brauner <brauner@kernel.org> wrote:

>  On Thu, Feb 06, 2025 at 09:51:33AM +0000, Mike Yuan wrote:
>  > On 2/6/25 10:31, Christian Brauner <brauner@kernel.org> wrote:
>  >
>  > >  On Wed, Feb 05, 2025 at 08:47:23PM +0000, Mike Yuan wrote:
>  > >  > Cited from commit message of original patch [1]:
>  > >  >
>  > >  > > One use case will be setfiles(8) setting SELinux file contexts
>  > >  > > ("security.selinux") without race conditions and without a file
>  > >  > > descriptor opened with read access requiring SELinux read permission.
>  > >  >
>  > >  > Also, generally all *at() syscalls operate on O_PATH fds, unlike
>  > >  > f*() ones. Yet the O_PATH fds are rejected by *xattrat() syscalls
>  > >  > in the final version merged into tree. Instead, let's switch things
>  > >  > to CLASS(fd_raw).
>  > >  >
>  > >  > Note that there's one side effect: f*xattr() starts to work with
>  > >  > O_PATH fds too. It's not clear to me whether this is desirable
>  > >  > (e.g. fstat() accepts O_PATH fds as an outlier).
>  > >  >
>  > >  > [1] https://lore.kernel.org/all/20240426162042.191916-1-cgoettsche@seltendoof.de/
>  > >  >
>  > >  > Fixes: 6140be90ec70 ("fs/xattr: add *at family syscalls")
>  > >  > Signed-off-by: Mike Yuan <me@yhndnzj.com>
>  > >  > Cc: Al Viro <viro@zeniv.linux.org.uk>
>  > >  > Cc: Christian Göttsche <cgzones@googlemail.com>
>  > >  > Cc: Christian Brauner <brauner@kernel.org>
>  > >  > Cc: <stable@vger.kernel.org>
>  > >  > ---
>  > >
>  > >  I expanded on this before. O_PATH is intentionally limited in scope and
>  > >  it should not allow to perform operations that are similar to a read or
>  > >  write which getting and setting xattrs is.
>  > >
>  > >  Patches that further weaken or dilute the semantics of O_PATH are not
>  > >  acceptable.
>  >
>  > But the *at() variants really should be able to work with O_PATH fds, otherwise they're basically useless? I guess I just need to keep f*() as-is?
>  
>  I'm confused. If you have:
>  
>          filename = getname_maybe_null(pathname, at_flags);
>          if (!filename) {
>                  CLASS(fd, f)(dfd);
>                  if (fd_empty(f))
>                          error = -EBADF;
>                  else
>                          error = file_setxattr(fd_file(f), &ctx);
>  
>  Then this branch ^^ cannot use fd_raw because you're allowing operations
>  directly on the O_PATH file descriptor.
>  
>  Using the O_PATH file descriptor for lookup is obviously fine which is
>  why the other branch exists:
>  
>          } else {
>                  error = filename_setxattr(dfd, filename, lookup_flags, &ctx);
>          }
>  
>  IOW, your patch makes AT_EMPTY_PATH work with an O_PATH file descriptor
>  which isn't acceptable. However, it is already perfectly fine to use an
>  O_PATH file descriptor for lookup.

Well, again, [1] clearly stated the use case:

> Those can be used to operate on extended attributes,
especially security related ones, either relative to a pinned directory
or [on a file descriptor without read access, avoiding a
/proc/<pid>/fd/<fd> detour, requiring a mounted procfs].

And this surfaced in my PR to systemd:

https://github.com/systemd/systemd/pull/36228/commits/34fe16fb177d2f917570c5f71dfa8f5b9746b9a7

How are *xattrat() syscalls different from e.g. fchmodat2(AT_EMPTY_PATH) in that regard? I can agree that the semantics of f*xattr() ought to be left untouched, yet I fail to grok the case for _at variants.

>  >
>  > >  >  fs/xattr.c | 8 ++++----
>  > >  >  1 file changed, 4 insertions(+), 4 deletions(-)
>  > >  >
>  > >  > diff --git a/fs/xattr.c b/fs/xattr.c
>  > >  > index 02bee149ad96..15df71e56187 100644
>  > >  > --- a/fs/xattr.c
>  > >  > +++ b/fs/xattr.c
>  > >  > @@ -704,7 +704,7 @@ static int path_setxattrat(int dfd, const char __user *pathname,
>  > >  >
>  > >  >  	filename = getname_maybe_null(pathname, at_flags);
>  > >  >  	if (!filename) {
>  > >  > -		CLASS(fd, f)(dfd);
>  > >  > +		CLASS(fd_raw, f)(dfd);
>  > >  >  		if (fd_empty(f))
>  > >  >  			error = -EBADF;
>  > >  >  		else
>  > >  > @@ -848,7 +848,7 @@ static ssize_t path_getxattrat(int dfd, const char __user *pathname,
>  > >  >
>  > >  >  	filename = getname_maybe_null(pathname, at_flags);
>  > >  >  	if (!filename) {
>  > >  > -		CLASS(fd, f)(dfd);
>  > >  > +		CLASS(fd_raw, f)(dfd);
>  > >  >  		if (fd_empty(f))
>  > >  >  			return -EBADF;
>  > >  >  		return file_getxattr(fd_file(f), &ctx);
>  > >  > @@ -978,7 +978,7 @@ static ssize_t path_listxattrat(int dfd, const char __user *pathname,
>  > >  >
>  > >  >  	filename = getname_maybe_null(pathname, at_flags);
>  > >  >  	if (!filename) {
>  > >  > -		CLASS(fd, f)(dfd);
>  > >  > +		CLASS(fd_raw, f)(dfd);
>  > >  >  		if (fd_empty(f))
>  > >  >  			return -EBADF;
>  > >  >  		return file_listxattr(fd_file(f), list, size);
>  > >  > @@ -1079,7 +1079,7 @@ static int path_removexattrat(int dfd, const char __user *pathname,
>  > >  >
>  > >  >  	filename = getname_maybe_null(pathname, at_flags);
>  > >  >  	if (!filename) {
>  > >  > -		CLASS(fd, f)(dfd);
>  > >  > +		CLASS(fd_raw, f)(dfd);
>  > >  >  		if (fd_empty(f))
>  > >  >  			return -EBADF;
>  > >  >  		return file_removexattr(fd_file(f), &kname);
>  > >  >
>  > >  > base-commit: a86bf2283d2c9769205407e2b54777c03d012939
>  > >  > --
>  > >  > 2.48.1
>  > >  >
>  > >  >
>  > >
>
Christian Brauner Feb. 6, 2025, 1:35 p.m. UTC | #5
On Thu, Feb 06, 2025 at 01:25:19PM +0000, Mike Yuan wrote:
> On 2/6/25 11:03, Christian Brauner <brauner@kernel.org> wrote:
> 
> >  On Thu, Feb 06, 2025 at 09:51:33AM +0000, Mike Yuan wrote:
> >  > On 2/6/25 10:31, Christian Brauner <brauner@kernel.org> wrote:
> >  >
> >  > >  On Wed, Feb 05, 2025 at 08:47:23PM +0000, Mike Yuan wrote:
> >  > >  > Cited from commit message of original patch [1]:
> >  > >  >
> >  > >  > > One use case will be setfiles(8) setting SELinux file contexts
> >  > >  > > ("security.selinux") without race conditions and without a file
> >  > >  > > descriptor opened with read access requiring SELinux read permission.
> >  > >  >
> >  > >  > Also, generally all *at() syscalls operate on O_PATH fds, unlike
> >  > >  > f*() ones. Yet the O_PATH fds are rejected by *xattrat() syscalls
> >  > >  > in the final version merged into tree. Instead, let's switch things
> >  > >  > to CLASS(fd_raw).
> >  > >  >
> >  > >  > Note that there's one side effect: f*xattr() starts to work with
> >  > >  > O_PATH fds too. It's not clear to me whether this is desirable
> >  > >  > (e.g. fstat() accepts O_PATH fds as an outlier).
> >  > >  >
> >  > >  > [1] https://lore.kernel.org/all/20240426162042.191916-1-cgoettsche@seltendoof.de/
> >  > >  >
> >  > >  > Fixes: 6140be90ec70 ("fs/xattr: add *at family syscalls")
> >  > >  > Signed-off-by: Mike Yuan <me@yhndnzj.com>
> >  > >  > Cc: Al Viro <viro@zeniv.linux.org.uk>
> >  > >  > Cc: Christian Göttsche <cgzones@googlemail.com>
> >  > >  > Cc: Christian Brauner <brauner@kernel.org>
> >  > >  > Cc: <stable@vger.kernel.org>
> >  > >  > ---
> >  > >
> >  > >  I expanded on this before. O_PATH is intentionally limited in scope and
> >  > >  it should not allow to perform operations that are similar to a read or
> >  > >  write which getting and setting xattrs is.
> >  > >
> >  > >  Patches that further weaken or dilute the semantics of O_PATH are not
> >  > >  acceptable.
> >  >
> >  > But the *at() variants really should be able to work with O_PATH fds, otherwise they're basically useless? I guess I just need to keep f*() as-is?
> >  
> >  I'm confused. If you have:
> >  
> >          filename = getname_maybe_null(pathname, at_flags);
> >          if (!filename) {
> >                  CLASS(fd, f)(dfd);
> >                  if (fd_empty(f))
> >                          error = -EBADF;
> >                  else
> >                          error = file_setxattr(fd_file(f), &ctx);
> >  
> >  Then this branch ^^ cannot use fd_raw because you're allowing operations
> >  directly on the O_PATH file descriptor.
> >  
> >  Using the O_PATH file descriptor for lookup is obviously fine which is
> >  why the other branch exists:
> >  
> >          } else {
> >                  error = filename_setxattr(dfd, filename, lookup_flags, &ctx);
> >          }
> >  
> >  IOW, your patch makes AT_EMPTY_PATH work with an O_PATH file descriptor
> >  which isn't acceptable. However, it is already perfectly fine to use an
> >  O_PATH file descriptor for lookup.
> 
> Well, again, [1] clearly stated the use case:
> 
> > Those can be used to operate on extended attributes,
> especially security related ones, either relative to a pinned directory
> or [on a file descriptor without read access, avoiding a
> /proc/<pid>/fd/<fd> detour, requiring a mounted procfs].
> 
> And this surfaced in my PR to systemd:
> 
> https://github.com/systemd/systemd/pull/36228/commits/34fe16fb177d2f917570c5f71dfa8f5b9746b9a7
> 
> How are *xattrat() syscalls different from e.g. fchmodat2(AT_EMPTY_PATH) in that regard? I can agree that the semantics of f*xattr() ought to be left untouched, yet I fail to grok the case for _at variants.

man openat:

       O_PATH (since Linux 2.6.39)
              Obtain a file descriptor that can be used for two purposes: to indicate a location in the filesystem tree and to perform operations that act purely at the file descriptor level.
              The file itself is not opened, and other file operations (e.g., read(2), write(2), fchmod(2), fchown(2), fgetxattr(2), ioctl(2), mmap(2)) fail with the error EBADF.

              The following operations can be performed on the resulting file descriptor:

              •  close(2).

              •  fchdir(2), if the file descriptor refers to a directory (since Linux 3.5).

              •  fstat(2) (since Linux 3.6).

              •  fstatfs(2) (since Linux 3.12).

              •  Duplicating the file descriptor (dup(2), fcntl(2) F_DUPFD, etc.).

              •  Getting and setting file descriptor flags (fcntl(2) F_GETFD and F_SETFD).

              •  Retrieving open file status flags using the fcntl(2) F_GETFL operation: the returned flags will include the bit O_PATH.

              •  Passing  the  file  descriptor  as the dirfd argument of openat() and the other "*at()" system calls.  This includes linkat(2) with AT_EMPTY_PATH (or via procfs using AT_SYM‐
                 LINK_FOLLOW) even if the file is not a directory.

              •  Passing the file descriptor to another process via a UNIX domain socket (see SCM_RIGHTS in unix(7)).

Both fchownat() and fchmodat() variants have had this behavior which
is a bug. And not a great one because it breaks O_PATH guarantees.
That's no reason to now also open up further holes such as getting and
setting xattrs.

If you want to perform read/write like operations you need a proper file
descriptor for that and not continue to expand the meaning of O_PATH
until it is indistinguishable from a regular file descriptor.
Mike Yuan Feb. 6, 2025, 2:26 p.m. UTC | #6
On 2025-02-06 at 14:35, Christian Brauner <brauner@kernel.org> wrote:
> 
> On Thu, Feb 06, 2025 at 01:25:19PM +0000, Mike Yuan wrote:
> 
> > On 2/6/25 11:03, Christian Brauner brauner@kernel.org wrote:
> > 
> > > On Thu, Feb 06, 2025 at 09:51:33AM +0000, Mike Yuan wrote:
> > > 
> > > > On 2/6/25 10:31, Christian Brauner brauner@kernel.org wrote:
> > > > 
> > > > > On Wed, Feb 05, 2025 at 08:47:23PM +0000, Mike Yuan wrote:
> > > > > 
> > > > > > Cited from commit message of original patch [1]:
> > > > > > 
> > > > > > > One use case will be setfiles(8) setting SELinux file contexts
> > > > > > > ("security.selinux") without race conditions and without a file
> > > > > > > descriptor opened with read access requiring SELinux read permission.
> > > > > > 
> > > > > > Also, generally all at() syscalls operate on O_PATH fds, unlike
> > > > > > f() ones. Yet the O_PATH fds are rejected by *xattrat() syscalls
> > > > > > in the final version merged into tree. Instead, let's switch things
> > > > > > to CLASS(fd_raw).
> > > > > > 
> > > > > > Note that there's one side effect: f*xattr() starts to work with
> > > > > > O_PATH fds too. It's not clear to me whether this is desirable
> > > > > > (e.g. fstat() accepts O_PATH fds as an outlier).
> > > > > > 
> > > > > > [1] https://lore.kernel.org/all/20240426162042.191916-1-cgoettsche@seltendoof.de/
> > > > > > 
> > > > > > Fixes: 6140be90ec70 ("fs/xattr: add *at family syscalls")
> > > > > > Signed-off-by: Mike Yuan me@yhndnzj.com
> > > > > > Cc: Al Viro viro@zeniv.linux.org.uk
> > > > > > Cc: Christian Göttsche cgzones@googlemail.com
> > > > > > Cc: Christian Brauner brauner@kernel.org
> > > > > > Cc: stable@vger.kernel.org
> > > > > > ---
> > > > > 
> > > > > I expanded on this before. O_PATH is intentionally limited in scope and
> > > > > it should not allow to perform operations that are similar to a read or
> > > > > write which getting and setting xattrs is.
> > > > > 
> > > > > Patches that further weaken or dilute the semantics of O_PATH are not
> > > > > acceptable.
> > > > 
> > > > But the at() variants really should be able to work with O_PATH fds, otherwise they're basically useless? I guess I just need to keep f() as-is?
> > > 
> > > I'm confused. If you have:
> > > 
> > > filename = getname_maybe_null(pathname, at_flags);
> > > if (!filename) {
> > > CLASS(fd, f)(dfd);
> > > if (fd_empty(f))
> > > error = -EBADF;
> > > else
> > > error = file_setxattr(fd_file(f), &ctx);
> > > 
> > > Then this branch ^^ cannot use fd_raw because you're allowing operations
> > > directly on the O_PATH file descriptor.
> > > 
> > > Using the O_PATH file descriptor for lookup is obviously fine which is
> > > why the other branch exists:
> > > 
> > > } else {
> > > error = filename_setxattr(dfd, filename, lookup_flags, &ctx);
> > > }
> > > 
> > > IOW, your patch makes AT_EMPTY_PATH work with an O_PATH file descriptor
> > > which isn't acceptable. However, it is already perfectly fine to use an
> > > O_PATH file descriptor for lookup.
> > 
> > Well, again, [1] clearly stated the use case:
> > 
> > > Those can be used to operate on extended attributes,
> > > especially security related ones, either relative to a pinned directory
> > > or [on a file descriptor without read access, avoiding a
> > > /proc/<pid>/fd/<fd> detour, requiring a mounted procfs].
> > 
> > And this surfaced in my PR to systemd:
> > 
> > https://github.com/systemd/systemd/pull/36228/commits/34fe16fb177d2f917570c5f71dfa8f5b9746b9a7
> > 
> > How are xattrat() syscalls different from e.g. fchmodat2(AT_EMPTY_PATH) in that regard? I can agree that the semantics of fxattr() ought to be left untouched, yet I fail to grok the case for _at variants.
> 
> 
> man openat:
> 
> O_PATH (since Linux 2.6.39)
> Obtain a file descriptor that can be used for two purposes: to indicate a location in the filesystem tree and to perform operations that act purely at the file descriptor level.
> The file itself is not opened, and other file operations (e.g., read(2), write(2), fchmod(2), fchown(2), fgetxattr(2), ioctl(2), mmap(2)) fail with the error EBADF.
> 
> The following operations can be performed on the resulting file descriptor:
> 
> • close(2).
> 
> • fchdir(2), if the file descriptor refers to a directory (since Linux 3.5).
> 
> • fstat(2) (since Linux 3.6).
> 
> • fstatfs(2) (since Linux 3.12).
> 
> • Duplicating the file descriptor (dup(2), fcntl(2) F_DUPFD, etc.).
> 
> • Getting and setting file descriptor flags (fcntl(2) F_GETFD and F_SETFD).
> 
> • Retrieving open file status flags using the fcntl(2) F_GETFL operation: the returned flags will include the bit O_PATH.
> 
> • Passing the file descriptor as the dirfd argument of openat() and the other "*at()" system calls. This includes linkat(2) with AT_EMPTY_PATH (or via procfs using AT_SYM‐
> LINK_FOLLOW) even if the file is not a directory.
> 
> • Passing the file descriptor to another process via a UNIX domain socket (see SCM_RIGHTS in unix(7)).
> 
> Both fchownat() and fchmodat() variants have had this behavior which
> is a bug. And not a great one because it breaks O_PATH guarantees.
> That's no reason to now also open up further holes such as getting and
> setting xattrs.

AFAICS that's not really what the kernel development has been after.
fchmodat2(2) in particular was added fairly recently (6.6,
https://github.com/torvalds/linux/commit/09da082b07bbae1c11d9560c8502800039aebcea).
One of the highlights was to add support for O_PATH + AT_EMPTY_PATH).

And to me the semantics seem reasonably OK really: O_PATH fds are a way to
pin the inode (i.e. still within the boundry of getting a reference to
a file system object (without opening it) if I shall put it). All the calls
mentioned above operate on the file system object metadata, not the actual data,
hence O_PATH is just providing a file _location_.

The separation of *at() and regular f*() versions have remained pretty consistent too
(of cource, not with this patch, which is mostly an RFC) - fchmod() and fchown()
refuse O_PATH fds while the at() counterparts accept them. Did all these suddenly
change overnight?

> If you want to perform read/write like operations you need a proper file
> descriptor for that and not continue to expand the meaning of O_PATH
> until it is indistinguishable from a regular file descriptor.

I definitely agree. But xattr is metadata, not data. listxattr() does not even
do any POSIX permission check...
Christian Brauner Feb. 6, 2025, 2:54 p.m. UTC | #7
On Thu, Feb 06, 2025 at 02:26:51PM +0000, Mike Yuan wrote:
> On 2025-02-06 at 14:35, Christian Brauner <brauner@kernel.org> wrote:
> > 
> > On Thu, Feb 06, 2025 at 01:25:19PM +0000, Mike Yuan wrote:
> > 
> > > On 2/6/25 11:03, Christian Brauner brauner@kernel.org wrote:
> > > 
> > > > On Thu, Feb 06, 2025 at 09:51:33AM +0000, Mike Yuan wrote:
> > > > 
> > > > > On 2/6/25 10:31, Christian Brauner brauner@kernel.org wrote:
> > > > > 
> > > > > > On Wed, Feb 05, 2025 at 08:47:23PM +0000, Mike Yuan wrote:
> > > > > > 
> > > > > > > Cited from commit message of original patch [1]:
> > > > > > > 
> > > > > > > > One use case will be setfiles(8) setting SELinux file contexts
> > > > > > > > ("security.selinux") without race conditions and without a file
> > > > > > > > descriptor opened with read access requiring SELinux read permission.
> > > > > > > 
> > > > > > > Also, generally all at() syscalls operate on O_PATH fds, unlike
> > > > > > > f() ones. Yet the O_PATH fds are rejected by *xattrat() syscalls
> > > > > > > in the final version merged into tree. Instead, let's switch things
> > > > > > > to CLASS(fd_raw).
> > > > > > > 
> > > > > > > Note that there's one side effect: f*xattr() starts to work with
> > > > > > > O_PATH fds too. It's not clear to me whether this is desirable
> > > > > > > (e.g. fstat() accepts O_PATH fds as an outlier).
> > > > > > > 
> > > > > > > [1] https://lore.kernel.org/all/20240426162042.191916-1-cgoettsche@seltendoof.de/
> > > > > > > 
> > > > > > > Fixes: 6140be90ec70 ("fs/xattr: add *at family syscalls")
> > > > > > > Signed-off-by: Mike Yuan me@yhndnzj.com
> > > > > > > Cc: Al Viro viro@zeniv.linux.org.uk
> > > > > > > Cc: Christian Göttsche cgzones@googlemail.com
> > > > > > > Cc: Christian Brauner brauner@kernel.org
> > > > > > > Cc: stable@vger.kernel.org
> > > > > > > ---
> > > > > > 
> > > > > > I expanded on this before. O_PATH is intentionally limited in scope and
> > > > > > it should not allow to perform operations that are similar to a read or
> > > > > > write which getting and setting xattrs is.
> > > > > > 
> > > > > > Patches that further weaken or dilute the semantics of O_PATH are not
> > > > > > acceptable.
> > > > > 
> > > > > But the at() variants really should be able to work with O_PATH fds, otherwise they're basically useless? I guess I just need to keep f() as-is?
> > > > 
> > > > I'm confused. If you have:
> > > > 
> > > > filename = getname_maybe_null(pathname, at_flags);
> > > > if (!filename) {
> > > > CLASS(fd, f)(dfd);
> > > > if (fd_empty(f))
> > > > error = -EBADF;
> > > > else
> > > > error = file_setxattr(fd_file(f), &ctx);
> > > > 
> > > > Then this branch ^^ cannot use fd_raw because you're allowing operations
> > > > directly on the O_PATH file descriptor.
> > > > 
> > > > Using the O_PATH file descriptor for lookup is obviously fine which is
> > > > why the other branch exists:
> > > > 
> > > > } else {
> > > > error = filename_setxattr(dfd, filename, lookup_flags, &ctx);
> > > > }
> > > > 
> > > > IOW, your patch makes AT_EMPTY_PATH work with an O_PATH file descriptor
> > > > which isn't acceptable. However, it is already perfectly fine to use an
> > > > O_PATH file descriptor for lookup.
> > > 
> > > Well, again, [1] clearly stated the use case:
> > > 
> > > > Those can be used to operate on extended attributes,
> > > > especially security related ones, either relative to a pinned directory
> > > > or [on a file descriptor without read access, avoiding a
> > > > /proc/<pid>/fd/<fd> detour, requiring a mounted procfs].
> > > 
> > > And this surfaced in my PR to systemd:
> > > 
> > > https://github.com/systemd/systemd/pull/36228/commits/34fe16fb177d2f917570c5f71dfa8f5b9746b9a7
> > > 
> > > How are xattrat() syscalls different from e.g. fchmodat2(AT_EMPTY_PATH) in that regard? I can agree that the semantics of fxattr() ought to be left untouched, yet I fail to grok the case for _at variants.
> > 
> > 
> > man openat:
> > 
> > O_PATH (since Linux 2.6.39)
> > Obtain a file descriptor that can be used for two purposes: to indicate a location in the filesystem tree and to perform operations that act purely at the file descriptor level.
> > The file itself is not opened, and other file operations (e.g., read(2), write(2), fchmod(2), fchown(2), fgetxattr(2), ioctl(2), mmap(2)) fail with the error EBADF.
> > 
> > The following operations can be performed on the resulting file descriptor:
> > 
> > • close(2).
> > 
> > • fchdir(2), if the file descriptor refers to a directory (since Linux 3.5).
> > 
> > • fstat(2) (since Linux 3.6).
> > 
> > • fstatfs(2) (since Linux 3.12).
> > 
> > • Duplicating the file descriptor (dup(2), fcntl(2) F_DUPFD, etc.).
> > 
> > • Getting and setting file descriptor flags (fcntl(2) F_GETFD and F_SETFD).
> > 
> > • Retrieving open file status flags using the fcntl(2) F_GETFL operation: the returned flags will include the bit O_PATH.
> > 
> > • Passing the file descriptor as the dirfd argument of openat() and the other "*at()" system calls. This includes linkat(2) with AT_EMPTY_PATH (or via procfs using AT_SYM‐
> > LINK_FOLLOW) even if the file is not a directory.
> > 
> > • Passing the file descriptor to another process via a UNIX domain socket (see SCM_RIGHTS in unix(7)).
> > 
> > Both fchownat() and fchmodat() variants have had this behavior which
> > is a bug. And not a great one because it breaks O_PATH guarantees.
> > That's no reason to now also open up further holes such as getting and
> > setting xattrs.
> 
> AFAICS that's not really what the kernel development has been after.
> fchmodat2(2) in particular was added fairly recently (6.6,
> https://github.com/torvalds/linux/commit/09da082b07bbae1c11d9560c8502800039aebcea).
> One of the highlights was to add support for O_PATH + AT_EMPTY_PATH).

These system calls had pre-existing inconsistency. And allowing
fchownat() but then not fchmodat2() seemd asymmetric. But given the
discussion now I wish I hadn't even allowed that.

> 
> And to me the semantics seem reasonably OK really: O_PATH fds are a way to
> pin the inode (i.e. still within the boundry of getting a reference to
> a file system object (without opening it) if I shall put it). All the calls
> mentioned above operate on the file system object metadata, not the actual data,
> hence O_PATH is just providing a file _location_.
> 
> The separation of *at() and regular f*() versions have remained pretty consistent too
> (of cource, not with this patch, which is mostly an RFC) - fchmod() and fchown()
> refuse O_PATH fds while the at() counterparts accept them. Did all these suddenly
> change overnight?
> 
> > If you want to perform read/write like operations you need a proper file
> > descriptor for that and not continue to expand the meaning of O_PATH
> > until it is indistinguishable from a regular file descriptor.
> 
> I definitely agree. But xattr is metadata, not data. listxattr() does not even
> do any POSIX permission check...

That it's metadata is an interesting difference but really not that
important. Plus, the manpage also doesn't care about this distinction.
Ownership and mode changes are bad enough, setting and getting random
xattrs seems like a terrible idea.

If I sent an O_PATH file descriptor around today in a sandbox without
procfs mounted then I know no one can suddenly set posix acls using that
file descriptor. With this change they suddenly can.

This won't be enabled.
Mike Yuan Feb. 6, 2025, 3:19 p.m. UTC | #8
On 2025-02-06 at 15:54, Christian Brauner <brauner@kernel.org> wrote:
> 
> On Thu, Feb 06, 2025 at 02:26:51PM +0000, Mike Yuan wrote:
> 
> > On 2025-02-06 at 14:35, Christian Brauner brauner@kernel.org wrote:
> > 
> > > On Thu, Feb 06, 2025 at 01:25:19PM +0000, Mike Yuan wrote:
> > > 
> > > > On 2/6/25 11:03, Christian Brauner brauner@kernel.org wrote:
> > > > 
> > > > > On Thu, Feb 06, 2025 at 09:51:33AM +0000, Mike Yuan wrote:
> > > > > 
> > > > > > On 2/6/25 10:31, Christian Brauner brauner@kernel.org wrote:
> > > > > > 
> > > > > > > On Wed, Feb 05, 2025 at 08:47:23PM +0000, Mike Yuan wrote:
> > > > > > > 
> > > > > > > > Cited from commit message of original patch [1]:
> > > > > > > > 
> > > > > > > > > One use case will be setfiles(8) setting SELinux file contexts
> > > > > > > > > ("security.selinux") without race conditions and without a file
> > > > > > > > > descriptor opened with read access requiring SELinux read permission.
> > > > > > > > 
> > > > > > > > Also, generally all at() syscalls operate on O_PATH fds, unlike
> > > > > > > > f() ones. Yet the O_PATH fds are rejected by *xattrat() syscalls
> > > > > > > > in the final version merged into tree. Instead, let's switch things
> > > > > > > > to CLASS(fd_raw).
> > > > > > > > 
> > > > > > > > Note that there's one side effect: f*xattr() starts to work with
> > > > > > > > O_PATH fds too. It's not clear to me whether this is desirable
> > > > > > > > (e.g. fstat() accepts O_PATH fds as an outlier).
> > > > > > > > 
> > > > > > > > [1] https://lore.kernel.org/all/20240426162042.191916-1-cgoettsche@seltendoof.de/
> > > > > > > > 
> > > > > > > > Fixes: 6140be90ec70 ("fs/xattr: add *at family syscalls")
> > > > > > > > Signed-off-by: Mike Yuan me@yhndnzj.com
> > > > > > > > Cc: Al Viro viro@zeniv.linux.org.uk
> > > > > > > > Cc: Christian Göttsche cgzones@googlemail.com
> > > > > > > > Cc: Christian Brauner brauner@kernel.org
> > > > > > > > Cc: stable@vger.kernel.org
> > > > > > > > ---
> > > > > > > 
> > > > > > > I expanded on this before. O_PATH is intentionally limited in scope and
> > > > > > > it should not allow to perform operations that are similar to a read or
> > > > > > > write which getting and setting xattrs is.
> > > > > > > 
> > > > > > > Patches that further weaken or dilute the semantics of O_PATH are not
> > > > > > > acceptable.
> > > > > > 
> > > > > > But the at() variants really should be able to work with O_PATH fds, otherwise they're basically useless? I guess I just need to keep f() as-is?
> > > > > 
> > > > > I'm confused. If you have:
> > > > > 
> > > > > filename = getname_maybe_null(pathname, at_flags);
> > > > > if (!filename) {
> > > > > CLASS(fd, f)(dfd);
> > > > > if (fd_empty(f))
> > > > > error = -EBADF;
> > > > > else
> > > > > error = file_setxattr(fd_file(f), &ctx);
> > > > > 
> > > > > Then this branch ^^ cannot use fd_raw because you're allowing operations
> > > > > directly on the O_PATH file descriptor.
> > > > > 
> > > > > Using the O_PATH file descriptor for lookup is obviously fine which is
> > > > > why the other branch exists:
> > > > > 
> > > > > } else {
> > > > > error = filename_setxattr(dfd, filename, lookup_flags, &ctx);
> > > > > }
> > > > > 
> > > > > IOW, your patch makes AT_EMPTY_PATH work with an O_PATH file descriptor
> > > > > which isn't acceptable. However, it is already perfectly fine to use an
> > > > > O_PATH file descriptor for lookup.
> > > > 
> > > > Well, again, [1] clearly stated the use case:
> > > > 
> > > > > Those can be used to operate on extended attributes,
> > > > > especially security related ones, either relative to a pinned directory
> > > > > or [on a file descriptor without read access, avoiding a
> > > > > /proc/<pid>/fd/<fd> detour, requiring a mounted procfs].
> > > > 
> > > > And this surfaced in my PR to systemd:
> > > > 
> > > > https://github.com/systemd/systemd/pull/36228/commits/34fe16fb177d2f917570c5f71dfa8f5b9746b9a7
> > > > 
> > > > How are xattrat() syscalls different from e.g. fchmodat2(AT_EMPTY_PATH) in that regard? I can agree that the semantics of fxattr() ought to be left untouched, yet I fail to grok the case for _at variants.
> > > 
> > > man openat:
> > > 
> > > O_PATH (since Linux 2.6.39)
> > > Obtain a file descriptor that can be used for two purposes: to indicate a location in the filesystem tree and to perform operations that act purely at the file descriptor level.
> > > The file itself is not opened, and other file operations (e.g., read(2), write(2), fchmod(2), fchown(2), fgetxattr(2), ioctl(2), mmap(2)) fail with the error EBADF.
> > > 
> > > The following operations can be performed on the resulting file descriptor:
> > > 
> > > • close(2).
> > > 
> > > • fchdir(2), if the file descriptor refers to a directory (since Linux 3.5).
> > > 
> > > • fstat(2) (since Linux 3.6).
> > > 
> > > • fstatfs(2) (since Linux 3.12).
> > > 
> > > • Duplicating the file descriptor (dup(2), fcntl(2) F_DUPFD, etc.).
> > > 
> > > • Getting and setting file descriptor flags (fcntl(2) F_GETFD and F_SETFD).
> > > 
> > > • Retrieving open file status flags using the fcntl(2) F_GETFL operation: the returned flags will include the bit O_PATH.
> > > 
> > > • Passing the file descriptor as the dirfd argument of openat() and the other "*at()" system calls. This includes linkat(2) with AT_EMPTY_PATH (or via procfs using AT_SYM‐
> > > LINK_FOLLOW) even if the file is not a directory.
> > > 
> > > • Passing the file descriptor to another process via a UNIX domain socket (see SCM_RIGHTS in unix(7)).
> > > 
> > > Both fchownat() and fchmodat() variants have had this behavior which
> > > is a bug. And not a great one because it breaks O_PATH guarantees.
> > > That's no reason to now also open up further holes such as getting and
> > > setting xattrs.
> > 
> > AFAICS that's not really what the kernel development has been after.
> > fchmodat2(2) in particular was added fairly recently (6.6,
> > https://github.com/torvalds/linux/commit/09da082b07bbae1c11d9560c8502800039aebcea).
> > One of the highlights was to add support for O_PATH + AT_EMPTY_PATH).
> 
> 
> These system calls had pre-existing inconsistency. And allowing
> fchownat() but then not fchmodat2() seemd asymmetric. But given the
> discussion now I wish I hadn't even allowed that.
> 
> > And to me the semantics seem reasonably OK really: O_PATH fds are a way to
> > pin the inode (i.e. still within the boundry of getting a reference to
> > a file system object (without opening it) if I shall put it). All the calls
> > mentioned above operate on the file system object metadata, not the actual data,
> > hence O_PATH is just providing a file location.
> > 
> > The separation of at() and regular f() versions have remained pretty consistent too
> > (of cource, not with this patch, which is mostly an RFC) - fchmod() and fchown()
> > refuse O_PATH fds while the at() counterparts accept them. Did all these suddenly
> > change overnight?
> > 
> > > If you want to perform read/write like operations you need a proper file
> > > descriptor for that and not continue to expand the meaning of O_PATH
> > > until it is indistinguishable from a regular file descriptor.
> > 
> > I definitely agree. But xattr is metadata, not data. listxattr() does not even
> > do any POSIX permission check...
> 
> 
> That it's metadata is an interesting difference but really not that
> important. Plus, the manpage also doesn't care about this distinction.
> Ownership and mode changes are bad enough, setting and getting random
> xattrs seems like a terrible idea.
> 
> If I sent an O_PATH file descriptor around today in a sandbox without
> procfs mounted then I know no one can suddenly set posix acls using that
> file descriptor. With this change they suddenly can.
> 
> This won't be enabled.

OK, so I see the previous discussions now:

https://lore.kernel.org/all/20220622025715.upflevvao3ttaekj@senku/

> Since the current functionality cannot be retroactively disabled as it
> is being used already through /proc/self/fd/$n, adding *xattrat(AT_EMPTY_PATH)
> doesn't really change what is currently possible by userspace.
>
> I would say we should add *xattrat(2) and then we can add an upgrade
> mask blocking it (and other operations) later.

I suppose some of these points shifted over the years, which is also fine.
It's unfortunate though that the commit message would remain misleading forever.
Christian Brauner Feb. 7, 2025, 9:05 a.m. UTC | #9
On Thu, Feb 06, 2025 at 03:19:03PM +0000, Mike Yuan wrote:
> On 2025-02-06 at 15:54, Christian Brauner <brauner@kernel.org> wrote:
> > 
> > On Thu, Feb 06, 2025 at 02:26:51PM +0000, Mike Yuan wrote:
> > 
> > > On 2025-02-06 at 14:35, Christian Brauner brauner@kernel.org wrote:
> > > 
> > > > On Thu, Feb 06, 2025 at 01:25:19PM +0000, Mike Yuan wrote:
> > > > 
> > > > > On 2/6/25 11:03, Christian Brauner brauner@kernel.org wrote:
> > > > > 
> > > > > > On Thu, Feb 06, 2025 at 09:51:33AM +0000, Mike Yuan wrote:
> > > > > > 
> > > > > > > On 2/6/25 10:31, Christian Brauner brauner@kernel.org wrote:
> > > > > > > 
> > > > > > > > On Wed, Feb 05, 2025 at 08:47:23PM +0000, Mike Yuan wrote:
> > > > > > > > 
> > > > > > > > > Cited from commit message of original patch [1]:
> > > > > > > > > 
> > > > > > > > > > One use case will be setfiles(8) setting SELinux file contexts
> > > > > > > > > > ("security.selinux") without race conditions and without a file
> > > > > > > > > > descriptor opened with read access requiring SELinux read permission.
> > > > > > > > > 
> > > > > > > > > Also, generally all at() syscalls operate on O_PATH fds, unlike
> > > > > > > > > f() ones. Yet the O_PATH fds are rejected by *xattrat() syscalls
> > > > > > > > > in the final version merged into tree. Instead, let's switch things
> > > > > > > > > to CLASS(fd_raw).
> > > > > > > > > 
> > > > > > > > > Note that there's one side effect: f*xattr() starts to work with
> > > > > > > > > O_PATH fds too. It's not clear to me whether this is desirable
> > > > > > > > > (e.g. fstat() accepts O_PATH fds as an outlier).
> > > > > > > > > 
> > > > > > > > > [1] https://lore.kernel.org/all/20240426162042.191916-1-cgoettsche@seltendoof.de/
> > > > > > > > > 
> > > > > > > > > Fixes: 6140be90ec70 ("fs/xattr: add *at family syscalls")
> > > > > > > > > Signed-off-by: Mike Yuan me@yhndnzj.com
> > > > > > > > > Cc: Al Viro viro@zeniv.linux.org.uk
> > > > > > > > > Cc: Christian Göttsche cgzones@googlemail.com
> > > > > > > > > Cc: Christian Brauner brauner@kernel.org
> > > > > > > > > Cc: stable@vger.kernel.org
> > > > > > > > > ---
> > > > > > > > 
> > > > > > > > I expanded on this before. O_PATH is intentionally limited in scope and
> > > > > > > > it should not allow to perform operations that are similar to a read or
> > > > > > > > write which getting and setting xattrs is.
> > > > > > > > 
> > > > > > > > Patches that further weaken or dilute the semantics of O_PATH are not
> > > > > > > > acceptable.
> > > > > > > 
> > > > > > > But the at() variants really should be able to work with O_PATH fds, otherwise they're basically useless? I guess I just need to keep f() as-is?
> > > > > > 
> > > > > > I'm confused. If you have:
> > > > > > 
> > > > > > filename = getname_maybe_null(pathname, at_flags);
> > > > > > if (!filename) {
> > > > > > CLASS(fd, f)(dfd);
> > > > > > if (fd_empty(f))
> > > > > > error = -EBADF;
> > > > > > else
> > > > > > error = file_setxattr(fd_file(f), &ctx);
> > > > > > 
> > > > > > Then this branch ^^ cannot use fd_raw because you're allowing operations
> > > > > > directly on the O_PATH file descriptor.
> > > > > > 
> > > > > > Using the O_PATH file descriptor for lookup is obviously fine which is
> > > > > > why the other branch exists:
> > > > > > 
> > > > > > } else {
> > > > > > error = filename_setxattr(dfd, filename, lookup_flags, &ctx);
> > > > > > }
> > > > > > 
> > > > > > IOW, your patch makes AT_EMPTY_PATH work with an O_PATH file descriptor
> > > > > > which isn't acceptable. However, it is already perfectly fine to use an
> > > > > > O_PATH file descriptor for lookup.
> > > > > 
> > > > > Well, again, [1] clearly stated the use case:
> > > > > 
> > > > > > Those can be used to operate on extended attributes,
> > > > > > especially security related ones, either relative to a pinned directory
> > > > > > or [on a file descriptor without read access, avoiding a
> > > > > > /proc/<pid>/fd/<fd> detour, requiring a mounted procfs].
> > > > > 
> > > > > And this surfaced in my PR to systemd:
> > > > > 
> > > > > https://github.com/systemd/systemd/pull/36228/commits/34fe16fb177d2f917570c5f71dfa8f5b9746b9a7
> > > > > 
> > > > > How are xattrat() syscalls different from e.g. fchmodat2(AT_EMPTY_PATH) in that regard? I can agree that the semantics of fxattr() ought to be left untouched, yet I fail to grok the case for _at variants.
> > > > 
> > > > man openat:
> > > > 
> > > > O_PATH (since Linux 2.6.39)
> > > > Obtain a file descriptor that can be used for two purposes: to indicate a location in the filesystem tree and to perform operations that act purely at the file descriptor level.
> > > > The file itself is not opened, and other file operations (e.g., read(2), write(2), fchmod(2), fchown(2), fgetxattr(2), ioctl(2), mmap(2)) fail with the error EBADF.
> > > > 
> > > > The following operations can be performed on the resulting file descriptor:
> > > > 
> > > > • close(2).
> > > > 
> > > > • fchdir(2), if the file descriptor refers to a directory (since Linux 3.5).
> > > > 
> > > > • fstat(2) (since Linux 3.6).
> > > > 
> > > > • fstatfs(2) (since Linux 3.12).
> > > > 
> > > > • Duplicating the file descriptor (dup(2), fcntl(2) F_DUPFD, etc.).
> > > > 
> > > > • Getting and setting file descriptor flags (fcntl(2) F_GETFD and F_SETFD).
> > > > 
> > > > • Retrieving open file status flags using the fcntl(2) F_GETFL operation: the returned flags will include the bit O_PATH.
> > > > 
> > > > • Passing the file descriptor as the dirfd argument of openat() and the other "*at()" system calls. This includes linkat(2) with AT_EMPTY_PATH (or via procfs using AT_SYM‐
> > > > LINK_FOLLOW) even if the file is not a directory.
> > > > 
> > > > • Passing the file descriptor to another process via a UNIX domain socket (see SCM_RIGHTS in unix(7)).
> > > > 
> > > > Both fchownat() and fchmodat() variants have had this behavior which
> > > > is a bug. And not a great one because it breaks O_PATH guarantees.
> > > > That's no reason to now also open up further holes such as getting and
> > > > setting xattrs.
> > > 
> > > AFAICS that's not really what the kernel development has been after.
> > > fchmodat2(2) in particular was added fairly recently (6.6,
> > > https://github.com/torvalds/linux/commit/09da082b07bbae1c11d9560c8502800039aebcea).
> > > One of the highlights was to add support for O_PATH + AT_EMPTY_PATH).
> > 
> > 
> > These system calls had pre-existing inconsistency. And allowing
> > fchownat() but then not fchmodat2() seemd asymmetric. But given the
> > discussion now I wish I hadn't even allowed that.
> > 
> > > And to me the semantics seem reasonably OK really: O_PATH fds are a way to
> > > pin the inode (i.e. still within the boundry of getting a reference to
> > > a file system object (without opening it) if I shall put it). All the calls
> > > mentioned above operate on the file system object metadata, not the actual data,
> > > hence O_PATH is just providing a file location.
> > > 
> > > The separation of at() and regular f() versions have remained pretty consistent too
> > > (of cource, not with this patch, which is mostly an RFC) - fchmod() and fchown()
> > > refuse O_PATH fds while the at() counterparts accept them. Did all these suddenly
> > > change overnight?
> > > 
> > > > If you want to perform read/write like operations you need a proper file
> > > > descriptor for that and not continue to expand the meaning of O_PATH
> > > > until it is indistinguishable from a regular file descriptor.
> > > 
> > > I definitely agree. But xattr is metadata, not data. listxattr() does not even
> > > do any POSIX permission check...
> > 
> > 
> > That it's metadata is an interesting difference but really not that
> > important. Plus, the manpage also doesn't care about this distinction.
> > Ownership and mode changes are bad enough, setting and getting random
> > xattrs seems like a terrible idea.
> > 
> > If I sent an O_PATH file descriptor around today in a sandbox without
> > procfs mounted then I know no one can suddenly set posix acls using that
> > file descriptor. With this change they suddenly can.
> > 
> > This won't be enabled.
> 
> OK, so I see the previous discussions now:
> 
> https://lore.kernel.org/all/20220622025715.upflevvao3ttaekj@senku/
> 
> > Since the current functionality cannot be retroactively disabled as it
> > is being used already through /proc/self/fd/$n, adding *xattrat(AT_EMPTY_PATH)
> > doesn't really change what is currently possible by userspace.
> >
> > I would say we should add *xattrat(2) and then we can add an upgrade
> > mask blocking it (and other operations) later.
> 
> I suppose some of these points shifted over the years, which is also fine.
> It's unfortunate though that the commit message would remain misleading forever.

I'm tempted to do some light-hearted trolling. I always marvel at the
precise and detailed systemd commit and merge messages. ;)

On a serious note, we often have stuff in commit messages that doesn't
age well. That's just a fact of development.
diff mbox series

Patch

diff --git a/fs/xattr.c b/fs/xattr.c
index 02bee149ad96..15df71e56187 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -704,7 +704,7 @@  static int path_setxattrat(int dfd, const char __user *pathname,
 
 	filename = getname_maybe_null(pathname, at_flags);
 	if (!filename) {
-		CLASS(fd, f)(dfd);
+		CLASS(fd_raw, f)(dfd);
 		if (fd_empty(f))
 			error = -EBADF;
 		else
@@ -848,7 +848,7 @@  static ssize_t path_getxattrat(int dfd, const char __user *pathname,
 
 	filename = getname_maybe_null(pathname, at_flags);
 	if (!filename) {
-		CLASS(fd, f)(dfd);
+		CLASS(fd_raw, f)(dfd);
 		if (fd_empty(f))
 			return -EBADF;
 		return file_getxattr(fd_file(f), &ctx);
@@ -978,7 +978,7 @@  static ssize_t path_listxattrat(int dfd, const char __user *pathname,
 
 	filename = getname_maybe_null(pathname, at_flags);
 	if (!filename) {
-		CLASS(fd, f)(dfd);
+		CLASS(fd_raw, f)(dfd);
 		if (fd_empty(f))
 			return -EBADF;
 		return file_listxattr(fd_file(f), list, size);
@@ -1079,7 +1079,7 @@  static int path_removexattrat(int dfd, const char __user *pathname,
 
 	filename = getname_maybe_null(pathname, at_flags);
 	if (!filename) {
-		CLASS(fd, f)(dfd);
+		CLASS(fd_raw, f)(dfd);
 		if (fd_empty(f))
 			return -EBADF;
 		return file_removexattr(fd_file(f), &kname);