Message ID | 20220829123843.1146874-4-brauner@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Series | acl: rework idmap handling when setting posix acls |
On Mon, Aug 29, 2022 at 02:38:42PM +0200, Christian Brauner wrote:
> Various filesystems store POSIX ACLs on the backing store in their uapi
> format. Such filesystems need to translate from the uapi POSIX ACL
> format into the VFS format during i_op->get_acl(). The VFS provides the
> posix_acl_from_xattr() helper for this task.
>
> But the usage of posix_acl_from_xattr() is currently ambiguous. It is
> intended to transform from a uapi POSIX ACL to the VFS representation.
> For example, when retrieving POSIX ACLs for permission checking during
> lookup or when calling getxattr() to retrieve system.posix_acl_{access,default}.
>
> Calling posix_acl_from_xattr() during i_op->get_acl() will map the raw
> {g,u}id values stored as ACL_{GROUP,USER} entries in the uapi POSIX ACL
> format into k{g,u}id_t in the filesystem's idmapping and return a struct
> posix_acl ready to be returned to the VFS for caching and to perform
> permission checks on.
>
> However, posix_acl_from_xattr() is also called during setxattr() for all
> filesystems that rely on VFS provided posix_acl_{access,default}_xattr_handler.
> The posix_acl_xattr_set() handler which is used for the ->set() method
> of posix_acl_{access,default}_xattr_handler uses posix_acl_from_xattr()
> to translate from the uapi POSIX ACL format to the VFS format so that it
> can be passed to the i_op->set_acl() handler of the filesystem or for
> direct caching in case no i_op->set_acl() handler is defined.
>
> During setxattr() the {g,u}id values stored as ACL_{GROUP,USER} entries
> in the uapi POSIX ACL format aren't raw {g,u}id values that need to be
> mapped according to the filesystem's idmapping. Instead they are {g,u}id
> values in the caller's idmapping which have been generated during
> posix_acl_fix_xattr_from_user(). In other words, they are k{g,u}id_t
> which are passed as raw {g,u}id values abusing the uapi POSIX ACL format
> (Please note that this type safety violation has existed since the
> introduction of k{g,u}id_t. Please see [1] for more details.).
>
> So when posix_acl_from_xattr() is called in posix_acl_xattr_set() the
> filesystem idmapping is completely irrelevant. Instead, we abuse the
> initial idmapping to recover the k{g,u}id_t based on the value stored in
> raw {g,u}id as ACL_{GROUP,USER} in the uapi POSIX ACL format.
>
> We need to clearly distinguish between these two operations as it is
> really easy to confuse for filesystems as can be seen in ntfs3.
>
> In order to do this we factor out make_posix_acl() which takes callbacks
> allowing callers to pass dedicated methods to generate the correct
> k{g,u}id_t. This is just an internal static helper which is not exposed
> to any filesystems but it neatly encapsulates the basic logic of walking
> through a uapi POSIX ACL and returning an allocated VFS POSIX ACL with
> the correct k{g,u}id_t values.
>
> The posix_acl_from_xattr() helper can then be implemented as a simple
> call to make_posix_acl() with callbacks that generate the correct
> k{g,u}id_t from the raw {g,u}id values in ACL_{GROUP,USER} entries in
> the uapi POSIX ACL format as read from the backing store.
>
> For setxattr() we add a new helper vfs_set_acl_prepare() which has
> callbacks to map the POSIX ACLs from the uapi format with the k{g,u}id_t
> values stored in raw {g,u}id format in ACL_{GROUP,USER} entries into the
> correct k{g,u}id_t values in the filesystem idmapping. In contrast to
> posix_acl_from_xattr() the vfs_set_acl_prepare() helper needs to take
> the mount idmapping into account. The differences are explained in more
> detail in the kernel doc for the new functions.
>
> In follow up patches we will remove all abuses of posix_acl_from_xattr()
> for setxattr() operations and replace it with calls to vfs_set_acl_prepare().
>
> The new vfs_set_acl_prepare() helper allows us to deal with the
> ambiguity in how the POSIX ACL uapi struct stores {g,u}id values
> depending on whether this is a getxattr() or setxattr() operation.
>
> This also allows us to remove the posix_acl_setxattr_idmapped_mnt()
> helper reducing the abuse of the POSIX ACL uapi format to pass values
> that should be distinct types in {g,u}id values stored as
> ACL_{GROUP,USER} entries.
>
> The removal of posix_acl_setxattr_idmapped_mnt() in turn allows us to
> re-constify the value parameter of vfs_setxattr() which in turn allows
> us to avoid the nasty cast from a const void pointer to a non-const void
> pointer on ovl_do_setxattr().
>
> Ultimately, the plan is to get rid of the type violations completely and
> never pass the values from k{g,u}id_t as raw {g,u}id in ACL_{GROUP,USER}
> entries in uapi POSIX ACL format. But that's a longer way to go and this
> is a preparatory step.
>
> Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
> Co-Developed-by: Seth Forshee <sforshee@digitalocean.com>
> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>

I can't really give a Reviewed-by as co-author, but this does lgtm. One
nit below however.

> -/*
> - * Convert from extended attribute to in-memory representation.
> +/**
> + * make_posix_acl - convert POSIX ACLs from uapi to VFS format using the
> + *                  provided callbacks to map ACL_{GROUP,USER} entries into the
> + *                  appropriate format
> + * @mnt_userns: the mount's idmapping
> + * @fs_userns: the filesystem's idmapping
> + * @value: the uapi representation of POSIX ACLs
> + * @size: the size of @void

I think you mean "the size of @value"? This appears in a few other
comments too.
On Mon, Aug 29, 2022 at 02:38:42PM +0200, Christian Brauner wrote:
> Various filesystems store POSIX ACLs on the backing store in their uapi
> format. Such filesystems need to translate from the uapi POSIX ACL
> format into the VFS format during i_op->get_acl(). The VFS provides the
> posix_acl_from_xattr() helper for this task.

This has always been rather confusing. Maybe we should add a separate
structure type for the on-disk vs uapi ACL formats? They will be the
same in binary representation, but the extra type safety might make the
core a lot more readable.
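To make the suggestion above concrete, here is a rough userspace sketch of
what "same binary representation, different type" could look like. Everything
in it is an assumption for illustration: struct acl_entry_uapi, struct
acl_entry_disk, acl_entry_to_disk() and toy_map_id() are hypothetical names
that do not exist in the kernel, and the little-endian field types of the real
struct posix_acl_xattr_entry are replaced by plain fixed-width integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical types, not kernel API. Both carry a tag, a permission
 * mask and an id, but the id means something different in each.
 */
struct acl_entry_uapi {		/* e_id is relative to the caller's idmapping */
	uint16_t e_tag;
	uint16_t e_perm;
	uint32_t e_id;
};

struct acl_entry_disk {		/* e_id is the raw id on the backing store */
	uint16_t e_tag;
	uint16_t e_perm;
	uint32_t e_id;
};

/* Same binary layout, checked at compile time. */
_Static_assert(sizeof(struct acl_entry_uapi) == sizeof(struct acl_entry_disk),
	       "uapi and on-disk entries must have the same size");
_Static_assert(offsetof(struct acl_entry_uapi, e_id) ==
	       offsetof(struct acl_entry_disk, e_id),
	       "e_id must sit at the same offset in both layouts");

/* The only way from one type to the other is an explicit id mapping. */
static struct acl_entry_disk
acl_entry_to_disk(const struct acl_entry_uapi *u, uint32_t (*map_id)(uint32_t))
{
	struct acl_entry_disk d;

	assert(u && map_id);
	d.e_tag = u->e_tag;
	d.e_perm = u->e_perm;
	d.e_id = map_id(u->e_id);
	return d;
}

/* Accepts on-disk entries only; a uapi pointer is a type mismatch. */
static uint32_t disk_entry_id(const struct acl_entry_disk *e)
{
	return e->e_id;
}

/* Toy mapping used only to exercise the conversion. */
static uint32_t toy_map_id(uint32_t id)
{
	return id + 100000;
}
```

With distinct types, handing a uapi entry to a helper that expects an on-disk
entry draws a compiler diagnostic instead of silently reinterpreting the id.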
On Tue, Sep 06, 2022 at 06:57:46AM +0200, Christoph Hellwig wrote:
> On Mon, Aug 29, 2022 at 02:38:42PM +0200, Christian Brauner wrote:
> > Various filesystems store POSIX ACLs on the backing store in their uapi
> > format. Such filesystems need to translate from the uapi POSIX ACL
> > format into the VFS format during i_op->get_acl(). The VFS provides the
> > posix_acl_from_xattr() helper for this task.
>
> This has always been rather confusing. Maybe we should add a separate

Absolutely and it's pretty unsafe given that we're storing k{g,u}id_t in
the uapi struct in the form of {g,u}id_t which we then recover later on.
But I've documented this as best as I could in the helpers.

> structure type for the on-disk vs uapi ACL formats? They will be the

We do already have separate format for uapi and VFS ACLs. I'm not sure
if you're suggesting another intermediate format.

I'm currently working on a larger series to get rid of the uapi struct
abuse for POSIX ACLs. Building on that work Seth will get rid of similar
abuses for VFS caps.
I'm fairly close but the rough idea is:

struct xattr_args {
	const char *name;
	union {
		struct posix_acl *kacl;
		const void *kvalue;
		void *buffer;
	};
	size_t size;
};

struct xattr_handler {
	const char *name;
	const char *prefix;
	int flags;
	bool (*list)(struct dentry *dentry);
	int (*get)(const struct xattr_handler *, struct dentry *dentry,
		   struct inode *inode, struct xattr_args *args);
	int (*set)(const struct xattr_handler *,
		   struct user_namespace *mnt_userns, struct dentry *dentry,
		   struct inode *inode, const struct xattr_args *args,
		   int flags);
};

All __vfs_{g,s}etxattr() helpers stay the same and can't be used to
{g,s}et POSIX ACLs anymore. Instead we add:

int vfs_set_acl(struct user_namespace *mnt_userns, struct dentry *dentry,
		const char *acl_name, struct posix_acl *acl, int flags)
{
	struct xattr_args xattr_args = {
		.name = acl_name,
		.kacl = acl,
	};

	if (!is_posix_acl_xattr(acl_name))
		return -EINVAL;

	return set_xattr_args(mnt_userns, dentry, &xattr_args, flags);
}

int vfs_get_acl(struct user_namespace *mnt_userns, struct dentry *dentry,
		const char *acl_name, struct posix_acl **acl)
{
	int error;
	struct xattr_args xattr_args = {
		.name = acl_name,
	};

	if (!is_posix_acl_xattr(acl_name))
		return -EINVAL;

	error = get_xattr_args(mnt_userns, dentry, &xattr_args);
	if (error < 0)
		return error;

	*acl = xattr_args.kacl;
	return 0;
}

These two vfs helpers can then be used by filesystems like overlayfs to
set POSIX ACLs. This gets rid of passing crucial data that the VFS needs
to interpret around in a void * blob, which is causing a lot of issues
currently because often filesystems or security hooks don't have any idea
how to interpret them correctly.
So the internal vfs api for getxattr() itself would then be:

ssize_t do_get_acl(struct user_namespace *mnt_userns, struct dentry *d,
		   struct xattr_ctx *ctx)
{
	struct posix_acl *kacl = NULL;
	ssize_t error;

	error = vfs_get_acl(mnt_userns, d, ctx->kname->name, &kacl);
	if (error)
		return error;

	/* convert to uapi format */
	if (ctx->size)
		error = vfs_posix_acl_to_xattr(mnt_userns, d_inode(d), kacl,
					       ctx->kvalue, ctx->size);
	posix_acl_release(kacl);
	return error;
}

ssize_t do_getxattr(struct user_namespace *mnt_userns, struct dentry *d,
		    struct xattr_ctx *ctx)
{
	ssize_t error;
	char *kname = ctx->kname->name;

	if (ctx->size) {
		if (ctx->size > XATTR_SIZE_MAX)
			ctx->size = XATTR_SIZE_MAX;
		ctx->kvalue = kvzalloc(ctx->size, GFP_KERNEL);
		if (!ctx->kvalue)
			return -ENOMEM;
	}

	if (is_posix_acl_xattr(ctx->kname->name))
		error = do_get_acl(mnt_userns, d, ctx);
	else
		error = vfs_getxattr(mnt_userns, d, kname, ctx->kvalue, ctx->size);
	if (error > 0) {
		if (ctx->size && copy_to_user(ctx->value, ctx->kvalue, error))
			error = -EFAULT;
	} else if (error == -ERANGE && ctx->size >= XATTR_SIZE_MAX) {
		/* The file system tried to return a value bigger
		   than XATTR_SIZE_MAX bytes. Not possible. */
		error = -E2BIG;
	}

	return error;
}

and all the helpers that hack stuff into uapi POSIX ACLs are then gone.

I'm fairly far along but I'm happy to hear alternative ideas.

Christian
On Tue, Sep 06, 2022 at 09:45:32AM +0200, Christian Brauner wrote:
> > structure type for the on-disk vs uapi ACL formats? They will be the
>
> We do already have separate format for uapi and VFS ACLs. I'm not sure
> if you're suggesting another intermediate format.

Right now struct posix_acl_xattr_header and
struct posix_acl_xattr_entry are used both for the UAPI, and the
on-disk format of various file systems, despite the different cases
using different kinds of uids/gids.

> I'm currently working on a larger series to get rid of the uapi struct
> abuse for POSIX ACLs. Building on that work Seth will get rid of similar
> abuses for VFS caps. I'm fairly close but the rough idea is:

Can we just stop accessing ACLs through the xattrs ops at all, and
just have dedicated methods instead? This whole multiplexing of
ACLs through xattrs APIs has been an unmitigated disaster.

Similar for all other "xattrs" that are not just user data and
interpreted by the kernel, but ACLs are by far the worst.
On Tue, Sep 06, 2022 at 09:53:13AM +0200, Christoph Hellwig wrote:
> On Tue, Sep 06, 2022 at 09:45:32AM +0200, Christian Brauner wrote:
> > > structure type for the on-disk vs uapi ACL formats? They will be the
> >
> > We do already have separate format for uapi and VFS ACLs. I'm not sure
> > if you're suggesting another intermediate format.
>
> Right now struct posix_acl_xattr_header and
> struct posix_acl_xattr_entry are used both for the UAPI, and the
> on-disk format of various file systems, despite the different cases
> using different kinds of uids/gids.
>
> > I'm currently working on a larger series to get rid of the uapi struct
> > abuse for POSIX ACLs. Building on that work Seth will get rid of similar
> > abuses for VFS caps. I'm fairly close but the rough idea is:
>
> Can we just stop accessing ACLs through the xattrs ops at all, and
> just have dedicated methods instead? This whole multiplexing of
> ACLs through xattrs APIs has been an unmitigated disaster.

IIUC then this is exactly what I tried to do (I have a still very hacky
version of this approach in
https://gitlab.com/brauner/linux/-/commits/fs.posix_acl.vfsuid/).

I've tried switching all filesystems to simply rely on
i_op->{g,s}et_acl() but this doesn't work for at least 9p and cifs
because they need access to the dentry. cifs hasn't even implemented
i_op->get_acl() and I don't think they can because of the lack of a
dentry argument.

The problem is not just that i_op->{g,s}et_acl() don't take a dentry
argument it's in principle also super annoying to pass it to them
because i_op->get_acl() is used to retrieve POSIX ACLs during permission
checking and thus is called from generic_permission() and thus
inode_permission() and I don't think we want or even can pass down a
dentry everywhere for those. So I stopped short of finishing this
implementation because of that.

So in order to make this work for cifs and 9p we would probably need a
new i_op method that is separate from the i_op->get_acl() one used in
the acl_permission_check() and friends...

>
> Similar for all other "xattrs" that are not just user data and
> interpreted by the kernel, but ACLs are by far the worst.

I absolutely agree.
On Tue, Sep 06, 2022 at 10:07:44AM +0200, Christian Brauner wrote:
> I've tried switching all filesystems to simply rely on
> i_op->{g,s}et_acl() but this doesn't work for at least 9p and cifs
> because they need access to the dentry. cifs hasn't even implemented
> i_op->get_acl() and I don't think they can because of the lack of a
> dentry argument.
>
> The problem is not just that i_op->{g,s}et_acl() don't take a dentry
> argument it's in principle also super annoying to pass it to them
> because i_op->get_acl() is used to retrieve POSIX ACLs during permission
> checking and thus is called from generic_permission() and thus
> inode_permission() and I don't think we want or even can pass down a
> dentry everywhere for those. So I stopped short of finishing this
> implementation because of that.
>
> So in order to make this work for cifs and 9p we would probably need a
> new i_op method that is separate from the i_op->get_acl() one used in
> the acl_permission_check() and friends...

Even if we can't use the existing methods, I think adding new
set_dentry_acl/get_dentry_acl (or whatever we name them) methods is
still better than doing this overload of the xattr methods
(just like the uapi overload instead of separate syscalls, but we
can't fix that).
On Tue, Sep 06, 2022 at 10:15:10AM +0200, Christoph Hellwig wrote:
> On Tue, Sep 06, 2022 at 10:07:44AM +0200, Christian Brauner wrote:
> > I've tried switching all filesystems to simply rely on
> > i_op->{g,s}et_acl() but this doesn't work for at least 9p and cifs
> > because they need access to the dentry. cifs hasn't even implemented
> > i_op->get_acl() and I don't think they can because of the lack of a
> > dentry argument.
> >
> > The problem is not just that i_op->{g,s}et_acl() don't take a dentry
> > argument it's in principle also super annoying to pass it to them
> > because i_op->get_acl() is used to retrieve POSIX ACLs during permission
> > checking and thus is called from generic_permission() and thus
> > inode_permission() and I don't think we want or even can pass down a
> > dentry everywhere for those. So I stopped short of finishing this
> > implementation because of that.
> >
> > So in order to make this work for cifs and 9p we would probably need a
> > new i_op method that is separate from the i_op->get_acl() one used in
> > the acl_permission_check() and friends...
>
> Even if we can't use the existing methods, I think adding new
> set_dentry_acl/get_dentry_acl (or whatever we name them) methods is
> still better than doing this overload of the xattr methods
> (just like the uapi overload instead of separate syscalls, but we
> can't fix that).

Let me explore and see if I can finish the branch using dedicated i_op
methods instead of updating i_op->get_acl().

I think any data that needs to be interpreted by the VFS needs to
have dedicated methods. Seth's branch for example, tries to add
i_op->{g,s}et_vfs_caps() for vfs caps which also store ownership
information instead of hacking it through the xattr api like we do now.
On Tue, Sep 06, 2022 at 10:24:28AM +0200, Christian Brauner wrote:
> I think any data that needs to be interpreted by the VFS needs to
> have dedicated methods. Seth's branch for example, tries to add
> i_op->{g,s}et_vfs_caps() for vfs caps which also store ownership
> information instead of hacking it through the xattr api like we do now.

Yes. Although with LSMs this will become really messy, but then again
creating a completely unreviewable and unauditable mess is what the LSM
infrastructure was created for to start with..
On Tue, Sep 06, 2022 at 10:24:32AM +0200, Christian Brauner wrote:
> On Tue, Sep 06, 2022 at 10:15:10AM +0200, Christoph Hellwig wrote:
> > On Tue, Sep 06, 2022 at 10:07:44AM +0200, Christian Brauner wrote:
> > > I've tried switching all filesystems to simply rely on
> > > i_op->{g,s}et_acl() but this doesn't work for at least 9p and cifs
> > > because they need access to the dentry. cifs hasn't even implemented
> > > i_op->get_acl() and I don't think they can because of the lack of a
> > > dentry argument.
> > >
> > > The problem is not just that i_op->{g,s}et_acl() don't take a dentry
> > > argument it's in principle also super annoying to pass it to them
> > > because i_op->get_acl() is used to retrieve POSIX ACLs during permission
> > > checking and thus is called from generic_permission() and thus
> > > inode_permission() and I don't think we want or even can pass down a
> > > dentry everywhere for those. So I stopped short of finishing this
> > > implementation because of that.
> > >
> > > So in order to make this work for cifs and 9p we would probably need a
> > > new i_op method that is separate from the i_op->get_acl() one used in
> > > the acl_permission_check() and friends...
> >
> > Even if we can't use the existing methods, I think adding new
> > set_dentry_acl/get_dentry_acl (or whatever we name them) methods is
> > still better than doing this overload of the xattr methods
> > (just like the uapi overload instead of separate syscalls, but we
> > can't fix that).
>
> Let me explore and see if I can finish the branch using dedicated i_op
> methods instead of updating i_op->get_acl().
>
> I think any data that needs to be interpreted by the VFS needs to
> have dedicated methods. Seth's branch for example, tries to add
> i_op->{g,s}et_vfs_caps() for vfs caps which also store ownership
> information instead of hacking it through the xattr api like we do now.

I finished a draft of the series. It severely lacks meaningful commit
messages and I won't be able to finish it before Plumbers next week. If
people want to take a look the branch is available on gitlab and
kernel.org:

https://gitlab.com/brauner/linux/-/commits/fs.posix_acl.vfsuid/
https://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping.git/log/?h=fs.posix_acl.vfsuid

This passes xfstests (ext4, xfs, btrfs, overlayfs with and without
idmapped layers, and LTP). I only needed to add i_op->get_dentry_acl()
as it was possible to adapt ->set_acl() to take a dentry argument and
not an inode argument.

So we have a dedicated POSIX ACL api:

struct posix_acl *vfs_get_acl(struct user_namespace *mnt_userns,
			      struct dentry *dentry, const char *acl_name)
int vfs_set_acl(struct user_namespace *mnt_userns, struct dentry *dentry,
		const char *acl_name, struct posix_acl *kacl)
int vfs_remove_acl(struct user_namespace *mnt_userns, struct dentry *dentry,
		   const char *acl_name)

only relying on i_op->get_dentry_acl() and i_op->set_acl() removing the
void * and uapi POSIX ACL abuse completely.
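All three helpers in the message above take an acl_name, and the earlier draft
in this thread rejects anything that is not a POSIX ACL name with -EINVAL. A
minimal userspace sketch of that gate follows. It is an illustration under
assumptions, not the branch's code: is_posix_acl_name(), acl_kind_from_name()
and enum acl_kind are hypothetical names, and only the two xattr names
system.posix_acl_access and system.posix_acl_default come from the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical enum for illustration, not a kernel type. */
enum acl_kind { ACL_KIND_ACCESS, ACL_KIND_DEFAULT };

/* True only for the two xattr names that carry POSIX ACLs. */
static bool is_posix_acl_name(const char *name)
{
	return strcmp(name, "system.posix_acl_access") == 0 ||
	       strcmp(name, "system.posix_acl_default") == 0;
}

/*
 * Translate an xattr name into an ACL kind. Anything that is not a
 * POSIX ACL name is rejected with -EINVAL before any filesystem
 * method would be called.
 */
static int acl_kind_from_name(const char *name, enum acl_kind *kind)
{
	assert(name && kind);

	if (!is_posix_acl_name(name))
		return -EINVAL;

	*kind = strcmp(name, "system.posix_acl_access") == 0 ?
		ACL_KIND_ACCESS : ACL_KIND_DEFAULT;
	return 0;
}
```

Checking the name once at the entry point means the code behind it only ever
sees a typed ACL and never an opaque xattr blob.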
On Fri, Sep 09, 2022 at 10:03:39AM +0200, Christian Brauner wrote:
> This passes xfstests (ext4, xfs, btrfs, overlayfs with and without
> idmapped layers, and LTP). I only needed to add i_op->get_dentry_acl()
> as it was possible to adapt ->set_acl() to take a dentry argument and
> not an inode argument.

This looks pretty nice. Two high level comments:

 - instead of adding lots of stub ->get_dentry_acl implementations
   that wrap ->get_acl, just call ->get_acl if ->get_dentry_acl is not
   implemented in the VFS
 - I think the methods that take a dentry should be named consistently,
   so either ->get_dentry_acl and ->set_dentry_acl vs ->get_acl, or
   ->get_acl and ->set_acl vs ->get_inode_acl or something like that.
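The first comment above (fall back to ->get_acl when ->get_dentry_acl is not
implemented) could look roughly like the following userspace model. It is a
sketch under assumptions: the *_model types, get_acl_with_fallback() and the
toy operations are hypothetical stand-ins for struct inode, struct dentry,
struct inode_operations and real filesystem methods, not the code in the
branch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct posix_acl_model { int marker; };
struct inode_model;
struct dentry_model { struct inode_model *d_inode; };

struct inode_ops_model {
	struct posix_acl_model *(*get_acl)(struct inode_model *inode);
	struct posix_acl_model *(*get_dentry_acl)(struct dentry_model *dentry);
};

struct inode_model { const struct inode_ops_model *i_op; };

/*
 * Prefer the dentry-based method; fall back to the inode-based one so
 * that filesystems which only need the inode require no stub wrapper.
 */
static struct posix_acl_model *
get_acl_with_fallback(struct dentry_model *dentry)
{
	const struct inode_ops_model *ops;

	assert(dentry && dentry->d_inode && dentry->d_inode->i_op);
	ops = dentry->d_inode->i_op;

	if (ops->get_dentry_acl)
		return ops->get_dentry_acl(dentry);
	if (ops->get_acl)
		return ops->get_acl(dentry->d_inode);
	return NULL;	/* filesystem has no ACL support */
}

/* Toy filesystem methods returning distinguishable ACL objects. */
static struct posix_acl_model inode_acl = { 1 };
static struct posix_acl_model dentry_acl = { 2 };

static struct posix_acl_model *toy_get_acl(struct inode_model *inode)
{
	(void)inode;
	return &inode_acl;
}

static struct posix_acl_model *toy_get_dentry_acl(struct dentry_model *dentry)
{
	(void)dentry;
	return &dentry_acl;
}

static const struct inode_ops_model inode_only_ops = {
	.get_acl = toy_get_acl,
};
static const struct inode_ops_model dentry_ops = {
	.get_acl = toy_get_acl,
	.get_dentry_acl = toy_get_dentry_acl,
};
static const struct inode_ops_model no_acl_ops = {
	.get_acl = NULL,
};
```

In this model only filesystems that actually need the dentry, such as the 9p
and cifs cases mentioned earlier in the thread, would have to provide the
dentry-based method.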
diff --git a/fs/posix_acl.c b/fs/posix_acl.c
index abe387700ba9..31eac28e6582 100644
--- a/fs/posix_acl.c
+++ b/fs/posix_acl.c
@@ -857,12 +857,32 @@ void posix_acl_fix_xattr_to_user(void *value, size_t size)
 	posix_acl_fix_xattr_userns(user_ns, &init_user_ns, value, size);
 }
 
-/*
- * Convert from extended attribute to in-memory representation.
+/**
+ * make_posix_acl - convert POSIX ACLs from uapi to VFS format using the
+ *                  provided callbacks to map ACL_{GROUP,USER} entries into the
+ *                  appropriate format
+ * @mnt_userns: the mount's idmapping
+ * @fs_userns: the filesystem's idmapping
+ * @value: the uapi representation of POSIX ACLs
+ * @size: the size of @void
+ * @uid_cb: callback to use for mapping the uid stored in ACL_USER entries
+ * @gid_cb: callback to use for mapping the gid stored in ACL_GROUP entries
+ *
+ * The make_posix_acl() helper is an abstraction to translate from uapi format
+ * into the VFS format allowing the caller to specific callbacks to map
+ * ACL_{GROUP,USER} entries into the expected format. This is used in
+ * posix_acl_from_xattr() and vfs_set_acl_prepare() and avoids pointless code
+ * duplication.
+ *
+ * Return: Allocated struct posix_acl on success, NULL for a valid header but
+ *         without actual POSIX ACL entries, or ERR_PTR() encoded error code.
  */
-struct posix_acl *
-posix_acl_from_xattr(struct user_namespace *user_ns,
-		     const void *value, size_t size)
+static struct posix_acl *make_posix_acl(struct user_namespace *mnt_userns,
+	struct user_namespace *fs_userns, const void *value, size_t size,
+	kuid_t (*uid_cb)(struct user_namespace *, struct user_namespace *,
+			 const struct posix_acl_xattr_entry *),
+	kgid_t (*gid_cb)(struct user_namespace *, struct user_namespace *,
+			 const struct posix_acl_xattr_entry *))
 {
 	const struct posix_acl_xattr_header *header = value;
 	const struct posix_acl_xattr_entry *entry = (const void *)(header + 1), *end;
@@ -893,16 +913,12 @@ posix_acl_from_xattr(struct user_namespace *user_ns,
 				break;
 
 			case ACL_USER:
-				acl_e->e_uid =
-					make_kuid(user_ns,
-						  le32_to_cpu(entry->e_id));
+				acl_e->e_uid = uid_cb(mnt_userns, fs_userns, entry);
 				if (!uid_valid(acl_e->e_uid))
 					goto fail;
 				break;
 			case ACL_GROUP:
-				acl_e->e_gid =
-					make_kgid(user_ns,
-						  le32_to_cpu(entry->e_id));
+				acl_e->e_gid = gid_cb(mnt_userns, fs_userns, entry);
 				if (!gid_valid(acl_e->e_gid))
 					goto fail;
 				break;
@@ -917,6 +933,181 @@ posix_acl_from_xattr(struct user_namespace *user_ns,
 	posix_acl_release(acl);
 	return ERR_PTR(-EINVAL);
 }
+
+/**
+ * vfs_set_acl_prepare_kuid - map ACL_USER uid according to mount- and
+ *                            filesystem idmapping
+ * @mnt_userns: the mount's idmapping
+ * @fs_userns: the filesystem's idmapping
+ * @e: a ACL_USER entry in POSIX ACL uapi format
+ *
+ * The uid stored as ACL_USER entry in @e is a kuid_t stored as a raw {g,u}id
+ * value. The vfs_set_acl_prepare_kuid() will recover the kuid_t through
+ * KUIDT_INIT() and then map it according to the idmapped mount. The resulting
+ * kuid_t is the value which the filesystem can map up into a raw backing store
+ * id in the filesystem's idmapping.
+ *
+ * This is used in vfs_set_acl_prepare() to generate the proper VFS
+ * representation of POSIX ACLs with ACL_USER entries during setxattr().
+ *
+ * Return: A kuid in @fs_userns for the uid stored in @e.
+ */
+static inline kuid_t
+vfs_set_acl_prepare_kuid(struct user_namespace *mnt_userns,
+			 struct user_namespace *fs_userns,
+			 const struct posix_acl_xattr_entry *e)
+{
+	kuid_t kuid = KUIDT_INIT(le32_to_cpu(e->e_id));
+	return from_vfsuid(mnt_userns, fs_userns, VFSUIDT_INIT(kuid));
+}
+
+/**
+ * vfs_set_acl_prepare_kgid - map ACL_GROUP gid according to mount- and
+ *                            filesystem idmapping
+ * @mnt_userns: the mount's idmapping
+ * @fs_userns: the filesystem's idmapping
+ * @e: a ACL_GROUP entry in POSIX ACL uapi format
+ *
+ * The gid stored as ACL_GROUP entry in @e is a kgid_t stored as a raw {g,u}id
+ * value. The vfs_set_acl_prepare_kgid() will recover the kgid_t through
+ * KGIDT_INIT() and then map it according to the idmapped mount. The resulting
+ * kgid_t is the value which the filesystem can map up into a raw backing store
+ * id in the filesystem's idmapping.
+ *
+ * This is used in vfs_set_acl_prepare() to generate the proper VFS
+ * representation of POSIX ACLs with ACL_GROUP entries during setxattr().
+ *
+ * Return: A kgid in @fs_userns for the gid stored in @e.
+ */
+static inline kgid_t
+vfs_set_acl_prepare_kgid(struct user_namespace *mnt_userns,
+			 struct user_namespace *fs_userns,
+			 const struct posix_acl_xattr_entry *e)
+{
+	kgid_t kgid = KGIDT_INIT(le32_to_cpu(e->e_id));
+	return from_vfsgid(mnt_userns, fs_userns, VFSGIDT_INIT(kgid));
+}
+
+/**
+ * vfs_set_acl_prepare - convert POSIX ACLs from uapi to VFS format taking
+ *                       mount and filesystem idmappings into account
+ * @mnt_userns: the mount's idmapping
+ * @fs_userns: the filesystem's idmapping
+ * @value: the uapi representation of POSIX ACLs
+ * @size: the size of @void
+ *
+ * When setting POSIX ACLs with ACL_{GROUP,USER} entries they need to be
+ * mapped according to the relevant mount- and filesystem idmapping. It is
+ * important that the ACL_{GROUP,USER} entries in struct posix_acl will be
+ * mapped into k{g,u}id_t that are supposed to be mapped up in the filesystem
+ * idmapping. This is crucial since the resulting struct posix_acl might be
+ * cached filesystem wide. The vfs_set_acl_prepare() function will take care to
+ * perform all necessary idmappings.
+ *
+ * Note, that since basically forever the {g,u}id values encoded as
+ * ACL_{GROUP,USER} entries in the uapi POSIX ACLs passed via @value contain
+ * values that have been mapped according to the caller's idmapping. In other
+ * words, POSIX ACLs passed in uapi format as @value during setxattr() contain
+ * {g,u}id values in their ACL_{GROUP,USER} entries that should actually have
+ * been stored as k{g,u}id_t.
+ *
+ * This means, vfs_set_acl_prepare() needs to first recover the k{g,u}id_t by
+ * calling K{G,U}IDT_INIT(). Afterwards they can be interpreted as vfs{g,u}id_t
+ * through from_vfs{g,u}id() to account for any idmapped mounts. The
+ * vfs_set_acl_prepare_k{g,u}id() helpers will take care to generate the
+ * correct k{g,u}id_t.
+ *
+ * The filesystem will then receive the POSIX ACLs ready to be cached
+ * filesystem wide and ready to be written to the backing store taking the
+ * filesystem's idmapping into account.
+ *
+ * Return: Allocated struct posix_acl on success, NULL for a valid header but
+ *         without actual POSIX ACL entries, or ERR_PTR() encoded error code.
+ */
+struct posix_acl *vfs_set_acl_prepare(struct user_namespace *mnt_userns,
+				      struct user_namespace *fs_userns,
+				      const void *value, size_t size)
+{
+	return make_posix_acl(mnt_userns, fs_userns, value, size,
+			      vfs_set_acl_prepare_kuid,
+			      vfs_set_acl_prepare_kgid);
+}
+EXPORT_SYMBOL(vfs_set_acl_prepare);
+
+/**
+ * posix_acl_from_xattr_kuid - map ACL_USER uid into filesystem idmapping
+ * @mnt_userns: unused
+ * @fs_userns: the filesystem's idmapping
+ * @e: a ACL_USER entry in POSIX ACL uapi format
+ *
+ * Map the uid stored as ACL_USER entry in @e into the filesystem's idmapping.
+ * This is used in posix_acl_from_xattr() to generate the proper VFS
+ * representation of POSIX ACLs with ACL_USER entries.
+ *
+ * Return: A kuid in @fs_userns for the uid stored in @e.
+ */
+static inline kuid_t
+posix_acl_from_xattr_kuid(struct user_namespace *mnt_userns,
+			  struct user_namespace *fs_userns,
+			  const struct posix_acl_xattr_entry *e)
+{
+	return make_kuid(fs_userns, le32_to_cpu(e->e_id));
+}
+
+/**
+ * posix_acl_from_xattr_kgid - map ACL_GROUP gid into filesystem idmapping
+ * @mnt_userns: unused
+ * @fs_userns: the filesystem's idmapping
+ * @e: a ACL_GROUP entry in POSIX ACL uapi format
+ *
+ * Map the gid stored as ACL_GROUP entry in @e into the filesystem's idmapping.
+ * This is used in posix_acl_from_xattr() to generate the proper VFS
+ * representation of POSIX ACLs with ACL_GROUP entries.
+ *
+ * Return: A kgid in @fs_userns for the gid stored in @e.
+ */
+static inline kgid_t
+posix_acl_from_xattr_kgid(struct user_namespace *mnt_userns,
+			  struct user_namespace *fs_userns,
+			  const struct posix_acl_xattr_entry *e)
+{
+	return make_kgid(fs_userns, le32_to_cpu(e->e_id));
+}
+
+/**
+ * posix_acl_from_xattr - convert POSIX ACLs from backing store to VFS format
+ * @fs_userns: the filesystem's idmapping
+ * @value: the uapi representation of POSIX ACLs
+ * @size: the size of @void
+ *
+ * Filesystems that store POSIX ACLs in the unaltered uapi format should use
+ * posix_acl_from_xattr() when reading them from the backing store and
+ * converting them into the struct posix_acl VFS format. The helper is
+ * specifically intended to be called from the ->get_acl() inode operation.
+ *
+ * The posix_acl_from_xattr() function will map the raw {g,u}id values stored
+ * in ACL_{GROUP,USER} entries into the filesystem idmapping in @fs_userns. The
+ * posix_acl_from_xattr_k{g,u}id() helpers will take care to generate the
+ * correct k{g,u}id_t. The returned struct posix_acl can be cached.
+ *
+ * Note that posix_acl_from_xattr() does not take idmapped mounts into account.
+ * If it did it calling is from the ->get_acl() inode operation would return
+ * POSIX ACLs mapped according to an idmapped mount which would mean that the
+ * value couldn't be cached for the filesystem. Idmapped mounts are taken into
+ * account on the fly during permission checking or right at the VFS -
+ * userspace boundary before reporting them to the user.
+ *
+ * Return: Allocated struct posix_acl on success, NULL for a valid header but
+ *         without actual POSIX ACL entries, or ERR_PTR() encoded error code.
+ */
+struct posix_acl *
+posix_acl_from_xattr(struct user_namespace *fs_userns,
+		     const void *value, size_t size)
+{
+	return make_posix_acl(&init_user_ns, fs_userns, value, size,
+			      posix_acl_from_xattr_kuid,
+			      posix_acl_from_xattr_kgid);
+}
 EXPORT_SYMBOL (posix_acl_from_xattr);
 
 /*
diff --git a/include/linux/posix_acl_xattr.h b/include/linux/posix_acl_xattr.h
index b6bd3eac2bcc..47eca15fd842 100644
--- a/include/linux/posix_acl_xattr.h
+++ b/include/linux/posix_acl_xattr.h
@@ -66,6 +66,9 @@ struct posix_acl *posix_acl_from_xattr(struct user_namespace *user_ns,
 				       const void *value, size_t size);
 int posix_acl_to_xattr(struct user_namespace *user_ns,
 		       const struct posix_acl *acl, void *buffer, size_t size);
+struct posix_acl *vfs_set_acl_prepare(struct user_namespace *mnt_userns,
+				      struct user_namespace *fs_userns,
+				      const void *value, size_t size);
 
 extern const struct xattr_handler posix_acl_access_xattr_handler;
 extern const struct xattr_handler posix_acl_default_xattr_handler;
Various filesystems store POSIX ACLs on the backing store in their uapi
format. Such filesystems need to translate from the uapi POSIX ACL
format into the VFS format during i_op->get_acl(). The VFS provides the
posix_acl_from_xattr() helper for this task.

But the usage of posix_acl_from_xattr() is currently ambiguous. It is
intended to transform from a uapi POSIX ACL to the VFS representation.
For example, when retrieving POSIX ACLs for permission checking during
lookup or when calling getxattr() to retrieve system.posix_acl_{access,default}.

Calling posix_acl_from_xattr() during i_op->get_acl() will map the raw
{g,u}id values stored as ACL_{GROUP,USER} entries in the uapi POSIX ACL
format into k{g,u}id_t in the filesystem's idmapping and return a struct
posix_acl ready to be returned to the VFS for caching and to perform
permission checks on.

However, posix_acl_from_xattr() is also called during setxattr() for all
filesystems that rely on VFS provided posix_acl_{access,default}_xattr_handler.
The posix_acl_xattr_set() handler which is used for the ->set() method
of posix_acl_{access,default}_xattr_handler uses posix_acl_from_xattr()
to translate from the uapi POSIX ACL format to the VFS format so that it
can be passed to the i_op->set_acl() handler of the filesystem or for
direct caching in case no i_op->set_acl() handler is defined.

During setxattr() the {g,u}id values stored as ACL_{GROUP,USER} entries
in the uapi POSIX ACL format aren't raw {g,u}id values that need to be
mapped according to the filesystem's idmapping. Instead they are {g,u}id
values in the caller's idmapping which have been generated during
posix_acl_fix_xattr_from_user(). In other words, they are k{g,u}id_t
which are passed as raw {g,u}id values abusing the uapi POSIX ACL format
(Please note that this type safety violation has existed since the
introduction of k{g,u}id_t. Please see [1] for more details.).

So when posix_acl_from_xattr() is called in posix_acl_xattr_set() the
filesystem idmapping is completely irrelevant. Instead, we abuse the
initial idmapping to recover the k{g,u}id_t based on the value stored in
raw {g,u}id as ACL_{GROUP,USER} in the uapi POSIX ACL format.

We need to clearly distinguish between these two operations as it is
really easy for filesystems to confuse them, as can be seen in ntfs3.

In order to do this we factor out make_posix_acl() which takes callbacks
allowing callers to pass dedicated methods to generate the correct
k{g,u}id_t. This is just an internal static helper which is not exposed
to any filesystems but it neatly encapsulates the basic logic of walking
through a uapi POSIX ACL and returning an allocated VFS POSIX ACL with
the correct k{g,u}id_t values.

The posix_acl_from_xattr() helper can then be implemented as a simple
call to make_posix_acl() with callbacks that generate the correct
k{g,u}id_t from the raw {g,u}id values in ACL_{GROUP,USER} entries in
the uapi POSIX ACL format as read from the backing store.

For setxattr() we add a new helper vfs_set_acl_prepare() which has
callbacks to map the POSIX ACLs from the uapi format with the k{g,u}id_t
values stored in raw {g,u}id format in ACL_{GROUP,USER} entries into the
correct k{g,u}id_t values in the filesystem idmapping. In contrast to
posix_acl_from_xattr() the vfs_set_acl_prepare() helper needs to take
the mount idmapping into account. The differences are explained in more
detail in the kernel doc for the new functions.

In follow-up patches we will remove all abuses of posix_acl_from_xattr()
for setxattr() operations and replace them with calls to
vfs_set_acl_prepare().

The new vfs_set_acl_prepare() helper allows us to deal with the
ambiguity in how the POSIX ACL uapi struct stores {g,u}id values
depending on whether this is a getxattr() or setxattr() operation.
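The callback-based split can be sketched in userspace roughly as follows.
This is an illustration under stated assumptions, not the code from the
patch: all toy_* names are hypothetical, a single id callback stands in
for the separate uid and gid callbacks, and the write-path callback only
loosely models "undo the mount idmapping, then map into the filesystem
idmapping".

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_INVALID_ID UINT32_MAX

/* Toy idmapping (hypothetical): ids [first, first + count) correspond
 * to kernel-internal ids starting at lower_first. */
struct toy_idmap {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

/* Toy ACL entry; one layout stands in for both uapi and VFS entries. */
struct toy_acl_entry {
	uint16_t tag;
	uint16_t perm;
	uint32_t id;
};

/* Map an id as seen through @map up to a kernel-internal id. */
uint32_t toy_map_up(const struct toy_idmap *map, uint32_t id)
{
	if (id < map->first || id - map->first >= map->count)
		return TOY_INVALID_ID;
	return map->lower_first + (id - map->first);
}

/* Map a kernel-internal id down to the id seen through @map. */
uint32_t toy_map_down(const struct toy_idmap *map, uint32_t kid)
{
	if (kid < map->lower_first || kid - map->lower_first >= map->count)
		return TOY_INVALID_ID;
	return map->first + (kid - map->lower_first);
}

typedef uint32_t (*toy_id_cb)(const struct toy_idmap *mnt_map,
			      const struct toy_idmap *fs_map, uint32_t id);

/* Read path: raw on-disk id -> id in the filesystem idmapping. */
uint32_t toy_from_xattr_id(const struct toy_idmap *mnt_map,
			   const struct toy_idmap *fs_map, uint32_t id)
{
	(void)mnt_map; /* the mount idmapping plays no role here */
	return toy_map_up(fs_map, id);
}

/* Write path: the id is already kernel-internal as seen through the
 * mount; undo the mount idmapping, then map into the filesystem's. */
uint32_t toy_set_prepare_id(const struct toy_idmap *mnt_map,
			    const struct toy_idmap *fs_map, uint32_t id)
{
	uint32_t plain = toy_map_down(mnt_map, id);

	if (plain == TOY_INVALID_ID)
		return TOY_INVALID_ID;
	return toy_map_up(fs_map, plain);
}

/* Walk @n entries, converting each id with @cb. Returns 0 on success
 * or -1 if any id cannot be mapped. */
int toy_make_acl(const struct toy_idmap *mnt_map,
		 const struct toy_idmap *fs_map,
		 const struct toy_acl_entry *in, size_t n,
		 struct toy_acl_entry *out, toy_id_cb cb)
{
	for (size_t i = 0; i < n; i++) {
		out[i] = in[i];
		out[i].id = cb(mnt_map, fs_map, in[i].id);
		if (out[i].id == TOY_INVALID_ID)
			return -1;
	}
	return 0;
}
```

The walker itself never decides how an id is interpreted; that choice is
made entirely by the callback the caller passes in. With a toy filesystem
idmapping { 0, 0, 65536 } and a toy mount idmapping { 0, 100000, 65536 },
passing toy_set_prepare_id turns an incoming id of 101000 into 1000,
while passing toy_from_xattr_id for the same input fails because 101000
is not a valid raw id for that filesystem.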
This also allows us to remove the posix_acl_setxattr_idmapped_mnt()
helper, reducing the abuse of the POSIX ACL uapi format to pass values
that should be distinct types in {g,u}id values stored as
ACL_{GROUP,USER} entries.

The removal of posix_acl_setxattr_idmapped_mnt() in turn allows us to
re-constify the value parameter of vfs_setxattr() which in turn allows
us to avoid the nasty cast from a const void pointer to a non-const void
pointer on ovl_do_setxattr().

Ultimately, the plan is to get rid of the type violations completely and
never pass the values from k{g,u}id_t as raw {g,u}id in ACL_{GROUP,USER}
entries in uapi POSIX ACL format. But that's a longer way to go and this
is a preparatory step.

Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
Co-Developed-by: Seth Forshee <sforshee@digitalocean.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
---
 fs/posix_acl.c                  | 213 ++++++++++++++++++++++++++++++--
 include/linux/posix_acl_xattr.h |   3 +
 2 files changed, 205 insertions(+), 11 deletions(-)