Message ID | 20240221-idmap-fscap-refactor-v2-20-3039364623bd@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fs: use type-safe uid representation for filesystem capabilities | expand |
On Wed, Feb 21, 2024 at 03:24:51PM -0600, Seth Forshee (DigitalOcean) wrote: > Add handlers which read fs caps from the lower or upper filesystem and > write/remove fs caps to the upper filesystem, performing copy-up as > necessary. > > While fscaps only really make sense on regular files, the general policy > is to allow most xattr namespaces on all different inode types, so > fscaps handlers are installed in the inode operations for all types of > inodes. > > Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org> > --- > fs/overlayfs/dir.c | 2 ++ > fs/overlayfs/inode.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++ > fs/overlayfs/overlayfs.h | 5 ++++ > 3 files changed, 79 insertions(+) > > diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c > index 0f8b4a719237..4ff360fe10c9 100644 > --- a/fs/overlayfs/dir.c > +++ b/fs/overlayfs/dir.c > @@ -1307,6 +1307,8 @@ const struct inode_operations ovl_dir_inode_operations = { > .get_inode_acl = ovl_get_inode_acl, > .get_acl = ovl_get_acl, > .set_acl = ovl_set_acl, > + .get_fscaps = ovl_get_fscaps, > + .set_fscaps = ovl_set_fscaps, > .update_time = ovl_update_time, > .fileattr_get = ovl_fileattr_get, > .fileattr_set = ovl_fileattr_set, > diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c > index c63b31a460be..7a8978ea6fe1 100644 > --- a/fs/overlayfs/inode.c > +++ b/fs/overlayfs/inode.c > @@ -568,6 +568,72 @@ int ovl_set_acl(struct mnt_idmap *idmap, struct dentry *dentry, > } > #endif > > +int ovl_get_fscaps(struct mnt_idmap *idmap, struct dentry *dentry, > + struct vfs_caps *caps) > +{ > + int err; > + const struct cred *old_cred; > + struct path realpath; > + > + ovl_path_real(dentry, &realpath); > + old_cred = ovl_override_creds(dentry->d_sb); > + err = vfs_get_fscaps(mnt_idmap(realpath.mnt), realpath.dentry, caps); Right, vfs_get_fscaps() returns a struct vfs_caps which contains a vfs{g,u}id and has the lower/upper layer's idmap taken into account. That confused me at first because vfs_get_acl() returns a struct posix_acl which contains k{g,u}id. Reading through this made me realize that we need a few more words about the translations. The reason is that we do distinct things for POSIX ACLs and for fscaps. For POSIX ACLs when we call vfs_get_acl() what we get is a struct posix_acl which contains k{g,u}id_t types. Because struct posix_acl is cached filesytems wide and thus shared among concurrent retrievers from different mounts with different idmappings. Which means that we can't put vfs{g,u}id_t types in there. Instead we perform translations on the fly. We do that in the VFS during path lookup and we do that for overlayfs when it retrieves POSIX ACLs. However, for fscaps we seem to do it differently because they're not cached which is ok because they don't matter during path lookup as POSIX ACLs do. So performance here doesn't matter too much. But that means overall that the translations are quite distinct. And that gets confusing when we have a stacking filesystem in the mix where we have to take into account the privileges of the mounter of the overlayfs instance and the idmap of the lower/upper layer. I only skimmed my old commit but I think that commit 0c5fd887d2bb ("acl: move idmapped mount fixup into vfs_{g,s}etxattr()") contains a detailed explanation of this as I see: > For POSIX ACLs we need to do something similar. However, in contrast to fscaps > we cannot apply the fix directly to the kernel internal posix acl data > structure as this would alter the cached values and would also require a rework > of how we currently deal with POSIX ACLs in general which almost never take the > filesystem idmapping into account (the noteable exception being FUSE but even > there the implementation is special) and instead retrieve the raw values based > on the initial idmapping. Could you please add a diagram/explanation illustrating the translations for fscaps in the general case and for stacking filesystems? It doesn't really matter too much where you put it. Either add a section to Documentation/filesystems/porting.rst or add a section to Documentation/filesystems/idmapping.rst. > + revert_creds(old_cred); > + return err; > +} > + > +int ovl_set_fscaps(struct mnt_idmap *idmap, struct dentry *dentry, > + const struct vfs_caps *caps, int setxattr_flags) > +{ > + int err; > + struct ovl_fs *ofs = OVL_FS(dentry->d_sb); > + struct dentry *upperdentry = ovl_dentry_upper(dentry); > + struct dentry *realdentry = upperdentry ?: ovl_dentry_lower(dentry); > + const struct cred *old_cred; > + > + /* > + * If the fscaps are to be remove from a lower file, check that they > + * exist before copying up. > + */ > + if (!caps && !upperdentry) { > + struct path realpath; > + struct vfs_caps lower_caps; > + > + ovl_path_lower(dentry, &realpath); > + old_cred = ovl_override_creds(dentry->d_sb); > + err = vfs_get_fscaps(mnt_idmap(realpath.mnt), realdentry, > + &lower_caps); > + revert_creds(old_cred); > + if (err) > + goto out; > + } > + > + err = ovl_want_write(dentry); > + if (err) > + goto out; > + > + err = ovl_copy_up(dentry); > + if (err) > + goto out_drop_write; > + upperdentry = ovl_dentry_upper(dentry); > + > + old_cred = ovl_override_creds(dentry->d_sb); > + if (!caps) > + err = vfs_remove_fscaps(ovl_upper_mnt_idmap(ofs), upperdentry); > + else > + err = vfs_set_fscaps(ovl_upper_mnt_idmap(ofs), upperdentry, > + caps, setxattr_flags); > + revert_creds(old_cred); > + > + /* copy c/mtime */ > + ovl_copyattr(d_inode(dentry)); > + > +out_drop_write: > + ovl_drop_write(dentry); > +out: > + return err; > +} > + > int ovl_update_time(struct inode *inode, int flags) > { > if (flags & S_ATIME) { > @@ -747,6 +813,8 @@ static const struct inode_operations ovl_file_inode_operations = { > .get_inode_acl = ovl_get_inode_acl, > .get_acl = ovl_get_acl, > .set_acl = ovl_set_acl, > + .get_fscaps = ovl_get_fscaps, > + .set_fscaps = ovl_set_fscaps, > .update_time = ovl_update_time, > .fiemap = ovl_fiemap, > .fileattr_get = ovl_fileattr_get, > @@ -758,6 +826,8 @@ static const struct inode_operations ovl_symlink_inode_operations = { > .get_link = ovl_get_link, > .getattr = ovl_getattr, > .listxattr = ovl_listxattr, > + .get_fscaps = ovl_get_fscaps, > + .set_fscaps = ovl_set_fscaps, > .update_time = ovl_update_time, > }; > > @@ -769,6 +839,8 @@ static const struct inode_operations ovl_special_inode_operations = { > .get_inode_acl = ovl_get_inode_acl, > .get_acl = ovl_get_acl, > .set_acl = ovl_set_acl, > + .get_fscaps = ovl_get_fscaps, > + .set_fscaps = ovl_set_fscaps, > .update_time = ovl_update_time, > }; > > diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h > index ee949f3e7c77..4f948749ee02 100644 > --- a/fs/overlayfs/overlayfs.h > +++ b/fs/overlayfs/overlayfs.h > @@ -781,6 +781,11 @@ static inline struct posix_acl *ovl_get_acl_path(const struct path *path, > } > #endif > > +int ovl_get_fscaps(struct mnt_idmap *idmap, struct dentry *dentry, > + struct vfs_caps *caps); > +int ovl_set_fscaps(struct mnt_idmap *idmap, struct dentry *dentry, > + const struct vfs_caps *caps, int setxattr_flags); > + > int ovl_update_time(struct inode *inode, int flags); > bool ovl_is_private_xattr(struct super_block *sb, const char *name); > > > -- > 2.43.0 >
On Wed, Feb 21, 2024 at 11:25 PM Seth Forshee (DigitalOcean) <sforshee@kernel.org> wrote: > > Add handlers which read fs caps from the lower or upper filesystem and > write/remove fs caps to the upper filesystem, performing copy-up as > necessary. > > While fscaps only really make sense on regular files, the general policy > is to allow most xattr namespaces on all different inode types, so > fscaps handlers are installed in the inode operations for all types of > inodes. > > Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org> > --- > fs/overlayfs/dir.c | 2 ++ > fs/overlayfs/inode.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++ > fs/overlayfs/overlayfs.h | 5 ++++ > 3 files changed, 79 insertions(+) > > diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c > index 0f8b4a719237..4ff360fe10c9 100644 > --- a/fs/overlayfs/dir.c > +++ b/fs/overlayfs/dir.c > @@ -1307,6 +1307,8 @@ const struct inode_operations ovl_dir_inode_operations = { > .get_inode_acl = ovl_get_inode_acl, > .get_acl = ovl_get_acl, > .set_acl = ovl_set_acl, > + .get_fscaps = ovl_get_fscaps, > + .set_fscaps = ovl_set_fscaps, > .update_time = ovl_update_time, > .fileattr_get = ovl_fileattr_get, > .fileattr_set = ovl_fileattr_set, > diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c > index c63b31a460be..7a8978ea6fe1 100644 > --- a/fs/overlayfs/inode.c > +++ b/fs/overlayfs/inode.c > @@ -568,6 +568,72 @@ int ovl_set_acl(struct mnt_idmap *idmap, struct dentry *dentry, > } > #endif > > +int ovl_get_fscaps(struct mnt_idmap *idmap, struct dentry *dentry, > + struct vfs_caps *caps) > +{ > + int err; > + const struct cred *old_cred; > + struct path realpath; > + > + ovl_path_real(dentry, &realpath); > + old_cred = ovl_override_creds(dentry->d_sb); > + err = vfs_get_fscaps(mnt_idmap(realpath.mnt), realpath.dentry, caps); > + revert_creds(old_cred); > + return err; > +} > + > +int ovl_set_fscaps(struct mnt_idmap *idmap, struct dentry *dentry, > + const struct vfs_caps *caps, int setxattr_flags) > +{ > + int err; > + struct ovl_fs *ofs = OVL_FS(dentry->d_sb); > + struct dentry *upperdentry = ovl_dentry_upper(dentry); > + struct dentry *realdentry = upperdentry ?: ovl_dentry_lower(dentry); > + const struct cred *old_cred; > + > + /* > + * If the fscaps are to be remove from a lower file, check that they > + * exist before copying up. > + */ Don't you need to convert -ENODATA to 0 return value in this case? > + if (!caps && !upperdentry) { > + struct path realpath; > + struct vfs_caps lower_caps; > + > + ovl_path_lower(dentry, &realpath); > + old_cred = ovl_override_creds(dentry->d_sb); > + err = vfs_get_fscaps(mnt_idmap(realpath.mnt), realdentry, > + &lower_caps); > + revert_creds(old_cred); > + if (err) > + goto out; > + } > + > + err = ovl_want_write(dentry); > + if (err) > + goto out; > + ovl_want_write() should after ovl_copy_up(), see: 162d06444070 ("ovl: reorder ovl_want_write() after ovl_inode_lock()") > + err = ovl_copy_up(dentry); > + if (err) > + goto out_drop_write; > + upperdentry = ovl_dentry_upper(dentry); > + > + old_cred = ovl_override_creds(dentry->d_sb); > + if (!caps) > + err = vfs_remove_fscaps(ovl_upper_mnt_idmap(ofs), upperdentry); > + else > + err = vfs_set_fscaps(ovl_upper_mnt_idmap(ofs), upperdentry, > + caps, setxattr_flags); > + revert_creds(old_cred); > + > + /* copy c/mtime */ > + ovl_copyattr(d_inode(dentry)); > + > +out_drop_write: > + ovl_drop_write(dentry); > +out: > + return err; > +} > + > int ovl_update_time(struct inode *inode, int flags) > { > if (flags & S_ATIME) { > @@ -747,6 +813,8 @@ static const struct inode_operations ovl_file_inode_operations = { > .get_inode_acl = ovl_get_inode_acl, > .get_acl = ovl_get_acl, > .set_acl = ovl_set_acl, > + .get_fscaps = ovl_get_fscaps, > + .set_fscaps = ovl_set_fscaps, > .update_time = ovl_update_time, > .fiemap = ovl_fiemap, > .fileattr_get = ovl_fileattr_get, > @@ -758,6 +826,8 @@ static const struct inode_operations ovl_symlink_inode_operations = { > .get_link = ovl_get_link, > .getattr = ovl_getattr, > .listxattr = ovl_listxattr, > + .get_fscaps = ovl_get_fscaps, > + .set_fscaps = ovl_set_fscaps, > .update_time = ovl_update_time, > }; > > @@ -769,6 +839,8 @@ static const struct inode_operations ovl_special_inode_operations = { > .get_inode_acl = ovl_get_inode_acl, > .get_acl = ovl_get_acl, > .set_acl = ovl_set_acl, > + .get_fscaps = ovl_get_fscaps, > + .set_fscaps = ovl_set_fscaps, > .update_time = ovl_update_time, > }; > Sorry, I did not understand the explanation why fscaps ops are needed for non regular files. It does not look right to me. Thanks, Amir.
On Tue, Feb 27, 2024 at 03:28:18PM +0200, Amir Goldstein wrote: > > +int ovl_set_fscaps(struct mnt_idmap *idmap, struct dentry *dentry, > > + const struct vfs_caps *caps, int setxattr_flags) > > +{ > > + int err; > > + struct ovl_fs *ofs = OVL_FS(dentry->d_sb); > > + struct dentry *upperdentry = ovl_dentry_upper(dentry); > > + struct dentry *realdentry = upperdentry ?: ovl_dentry_lower(dentry); > > + const struct cred *old_cred; > > + > > + /* > > + * If the fscaps are to be remove from a lower file, check that they > > + * exist before copying up. > > + */ > > Don't you need to convert -ENODATA to 0 return value in this case? Do you mean that trying to remove an xattr that does not exist should return 0? Standard behavior is to return -ENODATA in this situation. > > > + if (!caps && !upperdentry) { > > + struct path realpath; > > + struct vfs_caps lower_caps; > > + > > + ovl_path_lower(dentry, &realpath); > > + old_cred = ovl_override_creds(dentry->d_sb); > > + err = vfs_get_fscaps(mnt_idmap(realpath.mnt), realdentry, > > + &lower_caps); > > + revert_creds(old_cred); > > + if (err) > > + goto out; > > + } > > + > > + err = ovl_want_write(dentry); > > + if (err) > > + goto out; > > + > > ovl_want_write() should after ovl_copy_up(), see: > 162d06444070 ("ovl: reorder ovl_want_write() after ovl_inode_lock()") Fixed. > > @@ -758,6 +826,8 @@ static const struct inode_operations ovl_symlink_inode_operations = { > > .get_link = ovl_get_link, > > .getattr = ovl_getattr, > > .listxattr = ovl_listxattr, > > + .get_fscaps = ovl_get_fscaps, > > + .set_fscaps = ovl_set_fscaps, > > .update_time = ovl_update_time, > > }; > > > > @@ -769,6 +839,8 @@ static const struct inode_operations ovl_special_inode_operations = { > > .get_inode_acl = ovl_get_inode_acl, > > .get_acl = ovl_get_acl, > > .set_acl = ovl_set_acl, > > + .get_fscaps = ovl_get_fscaps, > > + .set_fscaps = ovl_set_fscaps, > > .update_time = ovl_update_time, > > }; > > > > > Sorry, I did not understand the explanation why fscaps ops are needed > for non regular files. It does not look right to me. The kernel does not forbid XATTR_NAME_CAPS for non-regular files and will internally even try to read them from non-regular files during killpriv checks. If we do not add handlers then we will end up using the normal ovl xattr handlers, which call vfs_*xattr(). These will return an error for fscaps xattrs after this series, which would be a change in behavior for overlayfs and make it behave differently from other filesystems.
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c index 0f8b4a719237..4ff360fe10c9 100644 --- a/fs/overlayfs/dir.c +++ b/fs/overlayfs/dir.c @@ -1307,6 +1307,8 @@ const struct inode_operations ovl_dir_inode_operations = { .get_inode_acl = ovl_get_inode_acl, .get_acl = ovl_get_acl, .set_acl = ovl_set_acl, + .get_fscaps = ovl_get_fscaps, + .set_fscaps = ovl_set_fscaps, .update_time = ovl_update_time, .fileattr_get = ovl_fileattr_get, .fileattr_set = ovl_fileattr_set, diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c index c63b31a460be..7a8978ea6fe1 100644 --- a/fs/overlayfs/inode.c +++ b/fs/overlayfs/inode.c @@ -568,6 +568,72 @@ int ovl_set_acl(struct mnt_idmap *idmap, struct dentry *dentry, } #endif +int ovl_get_fscaps(struct mnt_idmap *idmap, struct dentry *dentry, + struct vfs_caps *caps) +{ + int err; + const struct cred *old_cred; + struct path realpath; + + ovl_path_real(dentry, &realpath); + old_cred = ovl_override_creds(dentry->d_sb); + err = vfs_get_fscaps(mnt_idmap(realpath.mnt), realpath.dentry, caps); + revert_creds(old_cred); + return err; +} + +int ovl_set_fscaps(struct mnt_idmap *idmap, struct dentry *dentry, + const struct vfs_caps *caps, int setxattr_flags) +{ + int err; + struct ovl_fs *ofs = OVL_FS(dentry->d_sb); + struct dentry *upperdentry = ovl_dentry_upper(dentry); + struct dentry *realdentry = upperdentry ?: ovl_dentry_lower(dentry); + const struct cred *old_cred; + + /* + * If the fscaps are to be remove from a lower file, check that they + * exist before copying up. + */ + if (!caps && !upperdentry) { + struct path realpath; + struct vfs_caps lower_caps; + + ovl_path_lower(dentry, &realpath); + old_cred = ovl_override_creds(dentry->d_sb); + err = vfs_get_fscaps(mnt_idmap(realpath.mnt), realdentry, + &lower_caps); + revert_creds(old_cred); + if (err) + goto out; + } + + err = ovl_want_write(dentry); + if (err) + goto out; + + err = ovl_copy_up(dentry); + if (err) + goto out_drop_write; + upperdentry = ovl_dentry_upper(dentry); + + old_cred = ovl_override_creds(dentry->d_sb); + if (!caps) + err = vfs_remove_fscaps(ovl_upper_mnt_idmap(ofs), upperdentry); + else + err = vfs_set_fscaps(ovl_upper_mnt_idmap(ofs), upperdentry, + caps, setxattr_flags); + revert_creds(old_cred); + + /* copy c/mtime */ + ovl_copyattr(d_inode(dentry)); + +out_drop_write: + ovl_drop_write(dentry); +out: + return err; +} + int ovl_update_time(struct inode *inode, int flags) { if (flags & S_ATIME) { @@ -747,6 +813,8 @@ static const struct inode_operations ovl_file_inode_operations = { .get_inode_acl = ovl_get_inode_acl, .get_acl = ovl_get_acl, .set_acl = ovl_set_acl, + .get_fscaps = ovl_get_fscaps, + .set_fscaps = ovl_set_fscaps, .update_time = ovl_update_time, .fiemap = ovl_fiemap, .fileattr_get = ovl_fileattr_get, @@ -758,6 +826,8 @@ static const struct inode_operations ovl_symlink_inode_operations = { .get_link = ovl_get_link, .getattr = ovl_getattr, .listxattr = ovl_listxattr, + .get_fscaps = ovl_get_fscaps, + .set_fscaps = ovl_set_fscaps, .update_time = ovl_update_time, }; @@ -769,6 +839,8 @@ static const struct inode_operations ovl_special_inode_operations = { .get_inode_acl = ovl_get_inode_acl, .get_acl = ovl_get_acl, .set_acl = ovl_set_acl, + .get_fscaps = ovl_get_fscaps, + .set_fscaps = ovl_set_fscaps, .update_time = ovl_update_time, }; diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h index ee949f3e7c77..4f948749ee02 100644 --- a/fs/overlayfs/overlayfs.h +++ b/fs/overlayfs/overlayfs.h @@ -781,6 +781,11 @@ static inline struct posix_acl *ovl_get_acl_path(const struct path *path, } #endif +int ovl_get_fscaps(struct mnt_idmap *idmap, struct dentry *dentry, + struct vfs_caps *caps); +int ovl_set_fscaps(struct mnt_idmap *idmap, struct dentry *dentry, + const struct vfs_caps *caps, int setxattr_flags); + int ovl_update_time(struct inode *inode, int flags); bool ovl_is_private_xattr(struct super_block *sb, const char *name);
Add handlers which read fs caps from the lower or upper filesystem and write/remove fs caps to the upper filesystem, performing copy-up as necessary. While fscaps only really make sense on regular files, the general policy is to allow most xattr namespaces on all different inode types, so fscaps handlers are installed in the inode operations for all types of inodes. Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org> --- fs/overlayfs/dir.c | 2 ++ fs/overlayfs/inode.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++ fs/overlayfs/overlayfs.h | 5 ++++ 3 files changed, 79 insertions(+)