Message ID | 20220727140014.69091-1-jlayton@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] vfs: bypass may_create_in_sticky check on newly-created files if task has CAP_FOWNER | expand |
On Wed, Jul 27, 2022 at 10:00:14AM -0400, Jeff Layton wrote: > From: Christian Brauner <brauner@kernel.org> > > NFS server is exporting a sticky directory (mode 01777) with root > squashing enabled. Client has protect_regular enabled and then tries to > open a file as root in that directory. File is created (with ownership > set to nobody:nobody) but the open syscall returns an error. The problem > is may_create_in_sticky which rejects the open even though the file has > already been created. > > Add a new condition to may_create_in_sticky. If the file was just > created, then allow bypassing the ownership check if the task has > CAP_FOWNER. With this change, the initial open of a file by root works, > but later opens of the same file will fail. > > Note that we can contrive a similar situation by exporting with > all_squash and opening the file as an unprivileged user. This patch does > not fix that case. I suspect that that configuration is likely to be > fundamentally incompatible with the protect_* sysctls enabled on the > clients. > > Link: https://bugzilla.redhat.com/show_bug.cgi?id=1976829 > Reported-by: Yongchen Yang <yoyang@redhat.com> > Suggested-by: Christian Brauner <brauner@kernel.org> > Signed-off-by: Jeff Layton <jlayton@kernel.org> > --- > fs/namei.c | 9 ++++++--- > 1 file changed, 6 insertions(+), 3 deletions(-) > > Hi Christian, > > I left you as author here since this is basically identical to the patch > you suggested. Let me know if that's an issue. No, that's fine. It feels pretty strange to be able to create a file and then not being able to open it fwiw. But we have that basically with nodev already. And we implicitly encode this in may_create_in_sticky() for this protected_* stuff. Relaxing this through CAP_FOWNER makes sense as it's explicitly thought to "Bypass permission checks on operations that normally require the filesystem UID of the process to match the UID of the file". One thing that I'm not sure about is something that Seth pointed out namely whether there's any NFS server side race window that would render FMODE_CREATED provided to may_create_in_sticky() inaccurate.
On Wed, 2022-07-27 at 16:33 +0200, Christian Brauner wrote: > On Wed, Jul 27, 2022 at 10:00:14AM -0400, Jeff Layton wrote: > > From: Christian Brauner <brauner@kernel.org> > > > > NFS server is exporting a sticky directory (mode 01777) with root > > squashing enabled. Client has protect_regular enabled and then tries to > > open a file as root in that directory. File is created (with ownership > > set to nobody:nobody) but the open syscall returns an error. The problem > > is may_create_in_sticky which rejects the open even though the file has > > already been created. > > > > Add a new condition to may_create_in_sticky. If the file was just > > created, then allow bypassing the ownership check if the task has > > CAP_FOWNER. With this change, the initial open of a file by root works, > > but later opens of the same file will fail. > > > > Note that we can contrive a similar situation by exporting with > > all_squash and opening the file as an unprivileged user. This patch does > > not fix that case. I suspect that that configuration is likely to be > > fundamentally incompatible with the protect_* sysctls enabled on the > > clients. > > > > Link: https://bugzilla.redhat.com/show_bug.cgi?id=1976829 > > Reported-by: Yongchen Yang <yoyang@redhat.com> > > Suggested-by: Christian Brauner <brauner@kernel.org> > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > --- > > fs/namei.c | 9 ++++++--- > > 1 file changed, 6 insertions(+), 3 deletions(-) > > > > Hi Christian, > > > > I left you as author here since this is basically identical to the patch > > you suggested. Let me know if that's an issue. > > No, that's fine. > Thanks. > It feels pretty strange to be able to create a file and then not being > able to open it fwiw. But we have that basically with nodev already. And > we implicitly encode this in may_create_in_sticky() for this protected_* > stuff. Relaxing this through CAP_FOWNER makes sense as it's explicitly > thought to "Bypass permission checks on operations that normally require > the filesystem UID of the process to match the UID of the file". > > One thing that I'm not sure about is something that Seth pointed out > namely whether there's any NFS server side race window that would render > FMODE_CREATED provided to may_create_in_sticky() inaccurate. In general, permissions enforcement in NFS is done on the _server_. Trying to enforce permissions/ownership on the client is sketchy at best and subject to a number of potential race windows. Practically, it probably depends on the server. With NFSv4 the client looks at the change attr on the dir before and after the open, and if they are different then it assumes the file was created. This is usually non-atomic in most general-purpose server implementations, but with knfsd I *think* we hold the parent's i_rwsem for write when creating files, and maybe that's enough to prevent that sort of race. I'm not certain though.
diff --git a/fs/namei.c b/fs/namei.c index 1f28d3f463c3..26b602d1152b 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -1221,7 +1221,8 @@ int may_linkat(struct user_namespace *mnt_userns, struct path *link) * Returns 0 if the open is allowed, -ve on error. */ static int may_create_in_sticky(struct user_namespace *mnt_userns, - struct nameidata *nd, struct inode *const inode) + struct nameidata *nd, struct inode *const inode, + bool created) { umode_t dir_mode = nd->dir_mode; kuid_t dir_uid = nd->dir_uid; @@ -1230,7 +1231,8 @@ static int may_create_in_sticky(struct user_namespace *mnt_userns, (!sysctl_protected_regular && S_ISREG(inode->i_mode)) || likely(!(dir_mode & S_ISVTX)) || uid_eq(i_uid_into_mnt(mnt_userns, inode), dir_uid) || - uid_eq(current_fsuid(), i_uid_into_mnt(mnt_userns, inode))) + uid_eq(current_fsuid(), i_uid_into_mnt(mnt_userns, inode)) || + (created && inode_owner_or_capable(mnt_userns, inode))) return 0; if (likely(dir_mode & 0002) || @@ -3496,7 +3498,8 @@ static int do_open(struct nameidata *nd, if (d_is_dir(nd->path.dentry)) return -EISDIR; error = may_create_in_sticky(mnt_userns, nd, - d_backing_inode(nd->path.dentry)); + d_backing_inode(nd->path.dentry), + (file->f_mode & FMODE_CREATED)); if (unlikely(error)) return error; }