diff mbox series

[v2] vfs: bypass may_create_in_sticky check on newly-created files if task has CAP_FOWNER

Message ID 20220727140014.69091-1-jlayton@kernel.org (mailing list archive)
State New, archived
Headers show
Series [v2] vfs: bypass may_create_in_sticky check on newly-created files if task has CAP_FOWNER | expand

Commit Message

Jeff Layton July 27, 2022, 2 p.m. UTC
From: Christian Brauner <brauner@kernel.org>

NFS server is exporting a sticky directory (mode 01777) with root
squashing enabled. Client has protect_regular enabled and then tries to
open a file as root in that directory. File is created (with ownership
set to nobody:nobody) but the open syscall returns an error. The problem
is may_create_in_sticky which rejects the open even though the file has
already been created.

Add a new condition to may_create_in_sticky. If the file was just
created, then allow bypassing the ownership check if the task has
CAP_FOWNER. With this change, the initial open of a file by root works,
but later opens of the same file will fail.

Note that we can contrive a similar situation by exporting with
all_squash and opening the file as an unprivileged user. This patch does
not fix that case. I suspect that that configuration is likely to be
fundamentally incompatible with the protect_* sysctls enabled on the
clients.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=1976829
Reported-by: Yongchen Yang <yoyang@redhat.com>
Suggested-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/namei.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

Hi Christian,

I left you as author here since this is basically identical to the patch
you suggested. Let me know if that's an issue.

-- Jeff

Comments

Christian Brauner July 27, 2022, 2:33 p.m. UTC | #1
On Wed, Jul 27, 2022 at 10:00:14AM -0400, Jeff Layton wrote:
> From: Christian Brauner <brauner@kernel.org>
> 
> NFS server is exporting a sticky directory (mode 01777) with root
> squashing enabled. Client has protect_regular enabled and then tries to
> open a file as root in that directory. File is created (with ownership
> set to nobody:nobody) but the open syscall returns an error. The problem
> is may_create_in_sticky which rejects the open even though the file has
> already been created.
> 
> Add a new condition to may_create_in_sticky. If the file was just
> created, then allow bypassing the ownership check if the task has
> CAP_FOWNER. With this change, the initial open of a file by root works,
> but later opens of the same file will fail.
> 
> Note that we can contrive a similar situation by exporting with
> all_squash and opening the file as an unprivileged user. This patch does
> not fix that case. I suspect that that configuration is likely to be
> fundamentally incompatible with the protect_* sysctls enabled on the
> clients.
> 
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=1976829
> Reported-by: Yongchen Yang <yoyang@redhat.com>
> Suggested-by: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/namei.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> Hi Christian,
> 
> I left you as author here since this is basically identical to the patch
> you suggested. Let me know if that's an issue.

No, that's fine.

It feels pretty strange to be able to create a file and then not being
able to open it fwiw. But we have that basically with nodev already. And
we implicitly encode this in may_create_in_sticky() for this protected_*
stuff. Relaxing this through CAP_FOWNER makes sense as it's explicitly
thought to "Bypass permission checks on operations that normally require
the filesystem UID of the process to match the UID of the file".

One thing that I'm not sure about is something that Seth pointed out
namely whether there's any NFS server side race window that would render
FMODE_CREATED provided to may_create_in_sticky() inaccurate.
Jeff Layton July 27, 2022, 2:53 p.m. UTC | #2
On Wed, 2022-07-27 at 16:33 +0200, Christian Brauner wrote:
> On Wed, Jul 27, 2022 at 10:00:14AM -0400, Jeff Layton wrote:
> > From: Christian Brauner <brauner@kernel.org>
> > 
> > NFS server is exporting a sticky directory (mode 01777) with root
> > squashing enabled. Client has protect_regular enabled and then tries to
> > open a file as root in that directory. File is created (with ownership
> > set to nobody:nobody) but the open syscall returns an error. The problem
> > is may_create_in_sticky which rejects the open even though the file has
> > already been created.
> > 
> > Add a new condition to may_create_in_sticky. If the file was just
> > created, then allow bypassing the ownership check if the task has
> > CAP_FOWNER. With this change, the initial open of a file by root works,
> > but later opens of the same file will fail.
> > 
> > Note that we can contrive a similar situation by exporting with
> > all_squash and opening the file as an unprivileged user. This patch does
> > not fix that case. I suspect that that configuration is likely to be
> > fundamentally incompatible with the protect_* sysctls enabled on the
> > clients.
> > 
> > Link: https://bugzilla.redhat.com/show_bug.cgi?id=1976829
> > Reported-by: Yongchen Yang <yoyang@redhat.com>
> > Suggested-by: Christian Brauner <brauner@kernel.org>
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >  fs/namei.c | 9 ++++++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> > 
> > Hi Christian,
> > 
> > I left you as author here since this is basically identical to the patch
> > you suggested. Let me know if that's an issue.
> 
> No, that's fine.
> 

Thanks.

> It feels pretty strange to be able to create a file and then not being
> able to open it fwiw. But we have that basically with nodev already. And
> we implicitly encode this in may_create_in_sticky() for this protected_*
> stuff. Relaxing this through CAP_FOWNER makes sense as it's explicitly
> thought to "Bypass permission checks on operations that normally require
> the filesystem UID of the process to match the UID of the file".
> 
> One thing that I'm not sure about is something that Seth pointed out
> namely whether there's any NFS server side race window that would render
> FMODE_CREATED provided to may_create_in_sticky() inaccurate.


In general, permissions enforcement in NFS is done on the _server_.
Trying to enforce permissions/ownership on the client is sketchy at best
and subject to a number of potential race windows.

Practically, it probably depends on the server. With NFSv4 the client
looks at the change attr on the dir before and after the open, and if
they are different then it assumes the file was created.

This is usually non-atomic in most general-purpose server
implementations, but with knfsd I *think* we hold the parent's i_rwsem
for write when creating files, and maybe that's enough to prevent that
sort of race. I'm not certain though.
diff mbox series

Patch

diff --git a/fs/namei.c b/fs/namei.c
index 1f28d3f463c3..26b602d1152b 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1221,7 +1221,8 @@  int may_linkat(struct user_namespace *mnt_userns, struct path *link)
  * Returns 0 if the open is allowed, -ve on error.
  */
 static int may_create_in_sticky(struct user_namespace *mnt_userns,
-				struct nameidata *nd, struct inode *const inode)
+				struct nameidata *nd, struct inode *const inode,
+				bool created)
 {
 	umode_t dir_mode = nd->dir_mode;
 	kuid_t dir_uid = nd->dir_uid;
@@ -1230,7 +1231,8 @@  static int may_create_in_sticky(struct user_namespace *mnt_userns,
 	    (!sysctl_protected_regular && S_ISREG(inode->i_mode)) ||
 	    likely(!(dir_mode & S_ISVTX)) ||
 	    uid_eq(i_uid_into_mnt(mnt_userns, inode), dir_uid) ||
-	    uid_eq(current_fsuid(), i_uid_into_mnt(mnt_userns, inode)))
+	    uid_eq(current_fsuid(), i_uid_into_mnt(mnt_userns, inode)) ||
+	    (created && inode_owner_or_capable(mnt_userns, inode)))
 		return 0;
 
 	if (likely(dir_mode & 0002) ||
@@ -3496,7 +3498,8 @@  static int do_open(struct nameidata *nd,
 		if (d_is_dir(nd->path.dentry))
 			return -EISDIR;
 		error = may_create_in_sticky(mnt_userns, nd,
-					     d_backing_inode(nd->path.dentry));
+					     d_backing_inode(nd->path.dentry),
+					     (file->f_mode & FMODE_CREATED));
 		if (unlikely(error))
 			return error;
 	}