mbox series

[v1,0/9] fuse: basic support for idmapped mounts

Message ID 20240108120824.122178-1-aleksandr.mikhalitsyn@canonical.com (mailing list archive)
Headers show
Series fuse: basic support for idmapped mounts | expand

Message

Aleksandr Mikhalitsyn Jan. 8, 2024, 12:08 p.m. UTC
Dear friends,

This patch series aimed to provide support for idmapped mounts
for fuse. We already have idmapped mounts support for almost all
widely-used filesystems:
* local (ext4, btrfs, xfs, fat, vfat, ntfs3, squashfs, f2fs, erofs, ZFS (out-of-tree))
* network (ceph)

Git tree (based on https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git/log/?h=for-next):
v1: https://github.com/mihalicyn/linux/commits/fuse_idmapped_mounts.v1
current: https://github.com/mihalicyn/linux/commits/fuse_idmapped_mounts

Having fuse supported looks like a good next step. At the same time
fuse conceptually close to the network filesystems and supporting it is
a quite challenging task.

Let me briefly explain what was done in this series and which obstacles we have.

With this series, you can use idmapped mounts with fuse if the following
conditions are met:
1. The filesystem daemon declares idmap support (new FUSE_INIT response feature
flags FUSE_OWNER_UID_GID_EXT and FUSE_ALLOW_IDMAP)
2. The filesystem superblock was mounted with the "default_permissions" parameter
3. The filesystem fuse daemon does not perform any UID/GID-based checks internally
and fully trusts the kernel to do that (yes, it's almost the same as 2.)

I have prepared a bunch of real-world examples of the user space modifications
that can be done to use this extension:
- libfuse support
https://github.com/mihalicyn/libfuse/commits/idmap_support
- fuse-overlayfs support:
https://github.com/mihalicyn/fuse-overlayfs/commits/idmap_support
- cephfs-fuse conversion example
https://github.com/mihalicyn/ceph/commits/fuse_idmap
- glusterfs conversion example (there is a conceptual issue)
https://github.com/mihalicyn/glusterfs/commits/fuse_idmap

The glusterfs is a bit problematic, unfortunately, because even if the glusterfs
superblock was mounted with the "default_permissions" parameter (1 and 2 conditions
are satisfied), it fails to satisfy the 3rd condition. The glusterfs fuse daemon sends
caller UIDs/GIDs over the wire and all the permission checks are done twice (first
on the client side (in the fuse kernel module) and second on the glusterfs server side).
Just for demonstration's sake, I found a hacky (but working) solution for glusterfs
that disables these server-side permission checks (see [1]). This allows you to play
with the filesystem and idmapped mounts and it works just fine.

The problem described above is the main problem that we can meet when
working on idmapped mounts support for network-based filesystems (or network-like filesystems
like fuse). When people look at the idmapped mounts feature at first they tend to think
that idmaps are for faking caller UIDs/GIDs, but that's not the case. There was a big
discussion about this in the "ceph: support idmapped mounts" patch series [2], [3].
The brief outcome from this discussion is that we don't want and don't have to fool
filesystem code and map a caller's UID/GID everywhere, but only in VFS i_op's
which are provided with a "struct mnt_idmap *idmap"). For example ->lookup()
callback is not provided with it and that's on purpose! We don't expect the low-level
filesystem code to do any permissions checks inside this callback because everything
was already checked on the higher level (see may_lookup() helper). For local filesystems
this assumption works like a charm, but for network-based, unfortunately, not.
For example, the cephfs kernel client *always* send called UID/GID with *any* request
(->lookup included!) and then *may* (depending on the MDS configuration) perform any
permissions checks on the server side based on these values, which obviously leads
to issues/inconsistencies if VFS idmaps are involved.

Fuse filesystem very-very close to cephfs example, because we have req->in.h.uid/req->in.h.gid
and these values are present in all fuse requests and userspace may use them as it wants.

All of the above explains why we have a "default_permissions" requirement. If filesystem
does not use it, then permission checks will be widespread across all the i_op's like
->lookup, ->unlink, ->readlink instead of being consolidated in the one place (->permission callback).

In this series, my approach is the same as in cephfs [4], [5]. Don't touch req->in.h.uid/req->in.h.gid values
at all (because we can't properly idmap them as we don't have "struct mnt_idmap *idmap" everywhere),
instead, provide the userspace with a new optional (FUSE_OWNER_UID_GID_EXT) UID/GID suitable
only for ->mknod, ->mkdir, ->symlink, ->atomic_open and these values have to be used as the
owner UID and GID for newly created inodes.

Things to discuss:
- we enable idmapped mounts support only if "default_permissions" mode is enabled,
because otherwise, we would need to deal with UID/GID mappings on the userspace side OR
provide the userspace with idmapped req->in.h.uid/req->in.h.gid values which is not
something that we probably want to do. Idmapped mounts philosophy is not about faking
caller uid/gid.

- We have a small offlist discussion with Christian about adding fs_type->allow_idmap
hook. Christian pointed out that it would be nice to have a superblock flag instead like
SB_I_NOIDMAP and we can set this flag during mount time if we see that the filesystem does not
support idmappings. But, unfortunately, I didn't succeed here because the kernel will
know if the filesystem supports idmapping or not after FUSE_INIT request, but FUSE_INIT request
is being sent at the end of the mounting process, so the mount and superblock will exist and
visible by the userspace in that time. It seems like setting SB_I_NOIDMAP flag, in this
case, is too late as a user may do the trick by creating an idmapped mount while it wasn't
restricted by SB_I_NOIDMAP. Alternatively, we can introduce a "positive" version SB_I_ALLOWIDMAP
and a "weak" version of FS_ALLOW_IDMAP like FS_MAY_ALLOW_IDMAP. So if FS_MAY_ALLOW_IDMAP is set,
then SB_I_ALLOWIDMAP has to be set on the superblock to allow the creation of an idmapped mount.
But that's a matter of our discussion.

How to play with it:
1. take any patched filesystem from the list (fuse-overlayfs, cephfs-fuse, glusterfs) and mount it
2. ./mount-idmapped --map-mount b:1000:0:2 /mnt/my_fuse_mount /mnt/my_fuse_mount_idmapped
(maps UID/GIDs as 1000 -> 0, 1001 -> 1)
[ taken from https://raw.githubusercontent.com/brauner/mount-idmapped/master/mount-idmapped.c ]

[1] https://github.com/mihalicyn/glusterfs/commit/ab3ec2c7cbe22618cba9cc94a52a492b1904d0b2
[2] https://lore.kernel.org/lkml/20230608154256.562906-1-aleksandr.mikhalitsyn@canonical.com/
[3] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
[4] https://github.com/ceph/ceph/pull/52575
[5] https://lore.kernel.org/all/20230807132626.182101-4-aleksandr.mikhalitsyn@canonical.com/

Thanks!
Alex

Cc: Christian Brauner <brauner@kernel.org>
Cc: Seth Forshee <sforshee@kernel.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Bernd Schubert <bschubert@ddn.com>
Cc: <linux-fsdevel@vger.kernel.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>

Alexander Mikhalitsyn (9):
  fs/namespace: introduce fs_type->allow_idmap hook
  fs/fuse: add FUSE_OWNER_UID_GID_EXT extension
  fs/fuse: support idmap for mkdir/mknod/symlink/create
  fs/fuse: support idmapped getattr inode op
  fs/fuse: support idmapped ->permission inode op
  fs/fuse: support idmapped ->setattr op
  fs/fuse: drop idmap argument from __fuse_get_acl
  fs/fuse: support idmapped ->set_acl
  fs/fuse: allow idmapped mounts

 fs/fuse/acl.c             |  10 ++-
 fs/fuse/dir.c             | 143 +++++++++++++++++++++++++-------------
 fs/fuse/file.c            |   2 +-
 fs/fuse/fuse_i.h          |  10 ++-
 fs/fuse/inode.c           |  24 ++++++-
 fs/namespace.c            |   3 +-
 include/linux/fs.h        |   1 +
 include/uapi/linux/fuse.h |  24 ++++++-
 8 files changed, 153 insertions(+), 64 deletions(-)

Comments

Christian Brauner Jan. 21, 2024, 5:50 p.m. UTC | #1
On Mon, Jan 08, 2024 at 01:08:15PM +0100, Alexander Mikhalitsyn wrote:
> Dear friends,
> 
> This patch series aimed to provide support for idmapped mounts
> for fuse. We already have idmapped mounts support for almost all
> widely-used filesystems:
> * local (ext4, btrfs, xfs, fat, vfat, ntfs3, squashfs, f2fs, erofs, ZFS (out-of-tree))
> * network (ceph)
> 
> Git tree (based on https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git/log/?h=for-next):
> v1: https://github.com/mihalicyn/linux/commits/fuse_idmapped_mounts.v1
> current: https://github.com/mihalicyn/linux/commits/fuse_idmapped_mounts

Great work!

> Things to discuss:
> - we enable idmapped mounts support only if "default_permissions" mode is enabled,
> because otherwise, we would need to deal with UID/GID mappings on the userspace side OR
> provide the userspace with idmapped req->in.h.uid/req->in.h.gid values which is not
> something that we probably want to do. Idmapped mounts philosophy is not about faking
> caller uid/gid.

Having VFS idmaps but then outsourcing permission checking to userspace
is conceptually strange so requiring default_permissions is the correct
thing to do. 

> - We have a small offlist discussion with Christian about adding fs_type->allow_idmap
> hook. Christian pointed out that it would be nice to have a superblock flag instead like
> SB_I_NOIDMAP and we can set this flag during mount time if we see that the filesystem does not
> support idmappings. But, unfortunately, I didn't succeed here because the kernel will
> know if the filesystem supports idmapping or not after FUSE_INIT request, but FUSE_INIT request
> is being sent at the end of the mounting process, so the mount and superblock will exist and
> visible by the userspace in that time. It seems like setting SB_I_NOIDMAP flag, in this
> case, is too late as a user may do the trick by creating an idmapped mount while it wasn't
> restricted by SB_I_NOIDMAP. Alternatively, we can introduce a "positive" version SB_I_ALLOWIDMAP

I see.

> and a "weak" version of FS_ALLOW_IDMAP like FS_MAY_ALLOW_IDMAP. So if FS_MAY_ALLOW_IDMAP is set,
> then SB_I_ALLOWIDMAP has to be set on the superblock to allow the creation of an idmapped mount.
> But that's a matter of our discussion.

I dislike making adding a struct super_block method. Because it means that we
call into the filesystem from generic mount code and specifically with the
namespace semaphore held. If there's ever any network filesystem that e.g.,
calls to a hung server it will lockup the whole system. So I'm opposed to
calling into the filesystem here at all. It's also ugly because this is really
a vfs level change. The only involvement should be whether the filesystem type
can actually support this ideally.

I think we should handle this within FUSE. So we allow the creation of idmapped
mounts just based on FS_ALLOW_IDMAP. And if the server doesn't support the
FUSE_OWNER_UID_GID_EXT then we simply refuse all creation requests originating
from an idmapped mount. Either we return EOPNOSUPP or we return EOVERFLOW to
indicate that we can't represent the owner correctly because the server is
missing the required extension.

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 3f37ba6a7a10..0726da21150a 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -606,8 +606,16 @@ static int get_create_ext(struct mnt_idmap *idmap,
                err = get_security_context(dentry, mode, &ext);
        if (!err && fc->create_supp_group)
                err = get_create_supp_group(dir, &ext);
-       if (!err && fc->owner_uid_gid_ext)
-               err = get_owner_uid_gid(idmap, fc, &ext);
+       if (!err) {
+               /*
+                * If the server doesn't support FUSE_OWNER_UID_GID_EXT and
+                * this is a creation request from an idmapped mount refuse it.
+                */
+               if (fc->owner_uid_gid_ext)
+                       err = get_owner_uid_gid(idmap, fc, &ext);
+               else if (idmap != &nop_mnt_idmap)
+                       err = -EOPNOTSUPP;
+       }

        if (!err && ext.size) {
                WARN_ON(args->in_numargs >= ARRAY_SIZE(args->in_args));
Seth Forshee (DigitalOcean) Jan. 22, 2024, 9:13 p.m. UTC | #2
On Sun, Jan 21, 2024 at 06:50:57PM +0100, Christian Brauner wrote:
> > - We have a small offlist discussion with Christian about adding fs_type->allow_idmap
> > hook. Christian pointed out that it would be nice to have a superblock flag instead like
> > SB_I_NOIDMAP and we can set this flag during mount time if we see that the filesystem does not
> > support idmappings. But, unfortunately, I didn't succeed here because the kernel will
> > know if the filesystem supports idmapping or not after FUSE_INIT request, but FUSE_INIT request
> > is being sent at the end of the mounting process, so the mount and superblock will exist and
> > visible by the userspace in that time. It seems like setting SB_I_NOIDMAP flag, in this
> > case, is too late as a user may do the trick by creating an idmapped mount while it wasn't
> > restricted by SB_I_NOIDMAP. Alternatively, we can introduce a "positive" version SB_I_ALLOWIDMAP
> 
> I see.
> 
> > and a "weak" version of FS_ALLOW_IDMAP like FS_MAY_ALLOW_IDMAP. So if FS_MAY_ALLOW_IDMAP is set,
> > then SB_I_ALLOWIDMAP has to be set on the superblock to allow the creation of an idmapped mount.
> > But that's a matter of our discussion.
> 
> I dislike making adding a struct super_block method. Because it means that we
> call into the filesystem from generic mount code and specifically with the
> namespace semaphore held. If there's ever any network filesystem that e.g.,
> calls to a hung server it will lockup the whole system. So I'm opposed to
> calling into the filesystem here at all. It's also ugly because this is really
> a vfs level change. The only involvement should be whether the filesystem type
> can actually support this ideally.
> 
> I think we should handle this within FUSE. So we allow the creation of idmapped
> mounts just based on FS_ALLOW_IDMAP. And if the server doesn't support the
> FUSE_OWNER_UID_GID_EXT then we simply refuse all creation requests originating
> from an idmapped mount. Either we return EOPNOSUPP or we return EOVERFLOW to
> indicate that we can't represent the owner correctly because the server is
> missing the required extension.

Could fuse just set SB_I_NOIDMAP initially then clear it if the init
reply indicates idmap support? This is like the "weak" FS_ALLOW_IDMAP
option without requiring another file_system_type flag.
Christian Brauner Jan. 23, 2024, 3:45 p.m. UTC | #3
On Mon, Jan 22, 2024 at 03:13:12PM -0600, Seth Forshee wrote:
> On Sun, Jan 21, 2024 at 06:50:57PM +0100, Christian Brauner wrote:
> > > - We have a small offlist discussion with Christian about adding fs_type->allow_idmap
> > > hook. Christian pointed out that it would be nice to have a superblock flag instead like
> > > SB_I_NOIDMAP and we can set this flag during mount time if we see that the filesystem does not
> > > support idmappings. But, unfortunately, I didn't succeed here because the kernel will
> > > know if the filesystem supports idmapping or not after FUSE_INIT request, but FUSE_INIT request
> > > is being sent at the end of the mounting process, so the mount and superblock will exist and
> > > visible by the userspace in that time. It seems like setting SB_I_NOIDMAP flag, in this
> > > case, is too late as a user may do the trick by creating an idmapped mount while it wasn't
> > > restricted by SB_I_NOIDMAP. Alternatively, we can introduce a "positive" version SB_I_ALLOWIDMAP
> > 
> > I see.
> > 
> > > and a "weak" version of FS_ALLOW_IDMAP like FS_MAY_ALLOW_IDMAP. So if FS_MAY_ALLOW_IDMAP is set,
> > > then SB_I_ALLOWIDMAP has to be set on the superblock to allow the creation of an idmapped mount.
> > > But that's a matter of our discussion.
> > 
> > I dislike making adding a struct super_block method. Because it means that we
> > call into the filesystem from generic mount code and specifically with the
> > namespace semaphore held. If there's ever any network filesystem that e.g.,
> > calls to a hung server it will lockup the whole system. So I'm opposed to
> > calling into the filesystem here at all. It's also ugly because this is really
> > a vfs level change. The only involvement should be whether the filesystem type
> > can actually support this ideally.
> > 
> > I think we should handle this within FUSE. So we allow the creation of idmapped
> > mounts just based on FS_ALLOW_IDMAP. And if the server doesn't support the
> > FUSE_OWNER_UID_GID_EXT then we simply refuse all creation requests originating
> > from an idmapped mount. Either we return EOPNOSUPP or we return EOVERFLOW to
> > indicate that we can't represent the owner correctly because the server is
> > missing the required extension.
> 
> Could fuse just set SB_I_NOIDMAP initially then clear it if the init
> reply indicates idmap support? This is like the "weak" FS_ALLOW_IDMAP
> option without requiring another file_system_type flag.

Would probably works too, yes. I'm just trying to avoid any additional
flag usage. Ideally we'd just do it in FUSE but this way is possibly
also effective. Although it kinda leaks internal FUSE details as in
"SB_I_NOIDMAP" really means "SB_I_INITIALIZED" which I kind of dislike.
Aleksandr Mikhalitsyn Jan. 29, 2024, 4:52 p.m. UTC | #4
On Sun, 21 Jan 2024 18:50:57 +0100
Christian Brauner <brauner@kernel.org> wrote:

> On Mon, Jan 08, 2024 at 01:08:15PM +0100, Alexander Mikhalitsyn wrote:
> > Dear friends,
> > 
> > This patch series aimed to provide support for idmapped mounts
> > for fuse. We already have idmapped mounts support for almost all
> > widely-used filesystems:
> > * local (ext4, btrfs, xfs, fat, vfat, ntfs3, squashfs, f2fs, erofs, ZFS (out-of-tree))
> > * network (ceph)
> > 
> > Git tree (based on https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git/log/?h=for-next):
> > v1: https://github.com/mihalicyn/linux/commits/fuse_idmapped_mounts.v1
> > current: https://github.com/mihalicyn/linux/commits/fuse_idmapped_mounts
> 
> Great work!
> 
> > Things to discuss:
> > - we enable idmapped mounts support only if "default_permissions" mode is enabled,
> > because otherwise, we would need to deal with UID/GID mappings on the userspace side OR
> > provide the userspace with idmapped req->in.h.uid/req->in.h.gid values which is not
> > something that we probably want to do. Idmapped mounts philosophy is not about faking
> > caller uid/gid.
> 
> Having VFS idmaps but then outsourcing permission checking to userspace
> is conceptually strange so requiring default_permissions is the correct
> thing to do. 
> 
> > - We have a small offlist discussion with Christian about adding fs_type->allow_idmap
> > hook. Christian pointed out that it would be nice to have a superblock flag instead like
> > SB_I_NOIDMAP and we can set this flag during mount time if we see that the filesystem does not
> > support idmappings. But, unfortunately, I didn't succeed here because the kernel will
> > know if the filesystem supports idmapping or not after FUSE_INIT request, but FUSE_INIT request
> > is being sent at the end of the mounting process, so the mount and superblock will exist and
> > visible by the userspace in that time. It seems like setting SB_I_NOIDMAP flag, in this
> > case, is too late as a user may do the trick by creating an idmapped mount while it wasn't
> > restricted by SB_I_NOIDMAP. Alternatively, we can introduce a "positive" version SB_I_ALLOWIDMAP
> 
> I see.
> 
> > and a "weak" version of FS_ALLOW_IDMAP like FS_MAY_ALLOW_IDMAP. So if FS_MAY_ALLOW_IDMAP is set,
> > then SB_I_ALLOWIDMAP has to be set on the superblock to allow the creation of an idmapped mount.
> > But that's a matter of our discussion.
> 
> I dislike making adding a struct super_block method. Because it means that we
> call into the filesystem from generic mount code and specifically with the
> namespace semaphore held. If there's ever any network filesystem that e.g.,
> calls to a hung server it will lockup the whole system.So I'm opposed to
> calling into the filesystem here at all. It's also ugly because this is really
> a vfs level change. The only involvement should be whether the filesystem type
> can actually support this ideally.

That's a very interesting point about holding a semaphore. Thanks!

> 
> I think we should handle this within FUSE. So we allow the creation of idmapped
> mounts just based on FS_ALLOW_IDMAP. And if the server doesn't support the
> FUSE_OWNER_UID_GID_EXT then we simply refuse all creation requests originating
> from an idmapped mount. Either we return EOPNOSUPP or we return EOVERFLOW to
> indicate that we can't represent the owner correctly because the server is
> missing the required extension.

Ok, that's effectively the same approach that we already have in cephfs:
https://github.com/torvalds/linux/blob/41bccc98fb7931d63d03f326a746ac4d429c1dd3/fs/ceph/mds_client.c#L3059

I personally comfortable with this too.

It's interesting to hear what Miklos thinks about it.

Kind regards,
Alex

> 
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 3f37ba6a7a10..0726da21150a 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -606,8 +606,16 @@ static int get_create_ext(struct mnt_idmap *idmap,
>                 err = get_security_context(dentry, mode, &ext);
>         if (!err && fc->create_supp_group)
>                 err = get_create_supp_group(dir, &ext);
> -       if (!err && fc->owner_uid_gid_ext)
> -               err = get_owner_uid_gid(idmap, fc, &ext);
> +       if (!err) {
> +               /*
> +                * If the server doesn't support FUSE_OWNER_UID_GID_EXT and
> +                * this is a creation request from an idmapped mount refuse it.
> +                */
> +               if (fc->owner_uid_gid_ext)
> +                       err = get_owner_uid_gid(idmap, fc, &ext);
> +               else if (idmap != &nop_mnt_idmap)
> +                       err = -EOPNOTSUPP;
> +       }
> 
>         if (!err && ext.size) {
>                 WARN_ON(args->in_numargs >= ARRAY_SIZE(args->in_args));