Message ID | 20201115103718.298186-8-christian.brauner@ubuntu.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fs: idmapped mounts | expand |
On Sun, Nov 15, 2020 at 11:36:46AM +0100, Christian Brauner wrote: > +static inline struct user_namespace *mnt_user_ns(const struct vfsmount *mnt) > +{ > + return mnt->mnt_user_ns; > +} I think you might want a READ_ONCE() here. Right now it seems ok, since the mnt_user_ns can't change, but if we ever allow it to change (and I see you have a idmapped_mounts_wip_v2_allow_to_change_idmapping branch on your public tree :D), the pattern of, user_ns = mnt_user_ns(path->mnt); if (mnt_idmapped(path->mnt)) { uid = kuid_from_mnt(user_ns, uid); gid = kgid_from_mnt(user_ns, gid); } could race. Tycho
On Mon, Nov 23, 2020 at 10:47:19AM -0500, Tycho Andersen wrote: > On Sun, Nov 15, 2020 at 11:36:46AM +0100, Christian Brauner wrote: > > +static inline struct user_namespace *mnt_user_ns(const struct vfsmount *mnt) > > +{ > > + return mnt->mnt_user_ns; > > +} > > I think you might want a READ_ONCE() here. Right now it seems ok, since the > mnt_user_ns can't change, but if we ever allow it to change (and I see you have > a idmapped_mounts_wip_v2_allow_to_change_idmapping branch on your public tree > :D), the pattern of, > > user_ns = mnt_user_ns(path->mnt); > if (mnt_idmapped(path->mnt)) { > uid = kuid_from_mnt(user_ns, uid); > gid = kgid_from_mnt(user_ns, gid); > } > > could race. Actually, isn't a race possible now? kuid_from_mnt(mnt_user_ns(path->mnt) /* &init_user_ns */); WRITE_ONCE(mnt->mnt.mnt_user_ns, user_ns); WRITE_ONCE(m->mnt.mnt_flags, flags); kgid_from_mnt(mnt_user_ns(path->mnt) /* the right user ns */); So maybe it should be: if (mnt_idmapped(path->mnt)) { barrier(); user_ns = mnt_user_ns(path->mnt); uid = kuid_from_mnt(user_ns, uid); gid = kgid_from_mnt(user_ns, gid); } since there's no data dependency between mnt_idmapped() and mnt_user_ns()? Tycho
On Mon, Nov 23, 2020 at 11:24:28AM -0500, Tycho Andersen wrote: > On Mon, Nov 23, 2020 at 10:47:19AM -0500, Tycho Andersen wrote: > > On Sun, Nov 15, 2020 at 11:36:46AM +0100, Christian Brauner wrote: > > > +static inline struct user_namespace *mnt_user_ns(const struct vfsmount *mnt) > > > +{ > > > + return mnt->mnt_user_ns; > > > +} > > > > I think you might want a READ_ONCE() here. Right now it seems ok, since the > > mnt_user_ns can't change, but if we ever allow it to change (and I see you have > > a idmapped_mounts_wip_v2_allow_to_change_idmapping branch on your public tree > > :D), the pattern of, > > > > user_ns = mnt_user_ns(path->mnt); > > if (mnt_idmapped(path->mnt)) { > > uid = kuid_from_mnt(user_ns, uid); > > gid = kgid_from_mnt(user_ns, gid); > > } > > > > could race. > > Actually, isn't a race possible now? > > kuid_from_mnt(mnt_user_ns(path->mnt) /* &init_user_ns */); > WRITE_ONCE(mnt->mnt.mnt_user_ns, user_ns); > WRITE_ONCE(m->mnt.mnt_flags, flags); > kgid_from_mnt(mnt_user_ns(path->mnt) /* the right user ns */); > > So maybe it should be: > > if (mnt_idmapped(path->mnt)) { > barrier(); > user_ns = mnt_user_ns(path->mnt); > uid = kuid_from_mnt(user_ns, uid); > gid = kgid_from_mnt(user_ns, gid); > } > > since there's no data dependency between mnt_idmapped() and > mnt_user_ns()? I think I had something to handle this case in another branch of mine. The READ_ONCE() you mentioned in another patch I had originally dropped because I wasn't sure whether it works on pointers but after talking to Jann and David it seems that it handles pointers fine. Let me take a look and fix it in the next version. I just finished porting the test suite to xfstests as Christoph requested and I'm looking at this now. Thanks! Christian
On Tue, Nov 24, 2020 at 01:30:35PM +0100, Christian Brauner wrote: > On Mon, Nov 23, 2020 at 11:24:28AM -0500, Tycho Andersen wrote: > > On Mon, Nov 23, 2020 at 10:47:19AM -0500, Tycho Andersen wrote: > > > On Sun, Nov 15, 2020 at 11:36:46AM +0100, Christian Brauner wrote: > > > > +static inline struct user_namespace *mnt_user_ns(const struct vfsmount *mnt) > > > > +{ > > > > + return mnt->mnt_user_ns; > > > > +} > > > > > > I think you might want a READ_ONCE() here. Right now it seems ok, since the > > > mnt_user_ns can't change, but if we ever allow it to change (and I see you have > > > a idmapped_mounts_wip_v2_allow_to_change_idmapping branch on your public tree > > > :D), the pattern of, > > > > > > user_ns = mnt_user_ns(path->mnt); > > > if (mnt_idmapped(path->mnt)) { > > > uid = kuid_from_mnt(user_ns, uid); > > > gid = kgid_from_mnt(user_ns, gid); > > > } > > > > > > could race. > > > > Actually, isn't a race possible now? > > > > kuid_from_mnt(mnt_user_ns(path->mnt) /* &init_user_ns */); > > WRITE_ONCE(mnt->mnt.mnt_user_ns, user_ns); > > WRITE_ONCE(m->mnt.mnt_flags, flags); > > kgid_from_mnt(mnt_user_ns(path->mnt) /* the right user ns */); > > > > So maybe it should be: > > > > if (mnt_idmapped(path->mnt)) { > > barrier(); > > user_ns = mnt_user_ns(path->mnt); > > uid = kuid_from_mnt(user_ns, uid); > > gid = kgid_from_mnt(user_ns, gid); > > } > > > > since there's no data dependency between mnt_idmapped() and > > mnt_user_ns()? > > I think I had something to handle this case in another branch of mine. > The READ_ONCE() you mentioned in another patch I had originally dropped > because I wasn't sure whether it works on pointers but after talking to > Jann and David it seems that it handles pointers fine. > Let me take a look and fix it in the next version. I just finished > porting the test suite to xfstests as Christoph requested and I'm > looking at this now. Another way would be to just have mnt_idmapped() test mnt_user_ns() != &init_user_ns instead of the flags; then I think you get the data dependency and thus correct ordering for free. Tycho
On Tue, Nov 24, 2020 at 08:37:40AM -0500, Tycho Andersen wrote: > On Tue, Nov 24, 2020 at 01:30:35PM +0100, Christian Brauner wrote: > > On Mon, Nov 23, 2020 at 11:24:28AM -0500, Tycho Andersen wrote: > > > On Mon, Nov 23, 2020 at 10:47:19AM -0500, Tycho Andersen wrote: > > > > On Sun, Nov 15, 2020 at 11:36:46AM +0100, Christian Brauner wrote: > > > > > +static inline struct user_namespace *mnt_user_ns(const struct vfsmount *mnt) > > > > > +{ > > > > > + return mnt->mnt_user_ns; > > > > > +} > > > > > > > > I think you might want a READ_ONCE() here. Right now it seems ok, since the > > > > mnt_user_ns can't change, but if we ever allow it to change (and I see you have > > > > a idmapped_mounts_wip_v2_allow_to_change_idmapping branch on your public tree > > > > :D), the pattern of, > > > > > > > > user_ns = mnt_user_ns(path->mnt); > > > > if (mnt_idmapped(path->mnt)) { > > > > uid = kuid_from_mnt(user_ns, uid); > > > > gid = kgid_from_mnt(user_ns, gid); > > > > } > > > > > > > > could race. > > > > > > Actually, isn't a race possible now? > > > > > > kuid_from_mnt(mnt_user_ns(path->mnt) /* &init_user_ns */); > > > WRITE_ONCE(mnt->mnt.mnt_user_ns, user_ns); > > > WRITE_ONCE(m->mnt.mnt_flags, flags); > > > kgid_from_mnt(mnt_user_ns(path->mnt) /* the right user ns */); > > > > > > So maybe it should be: > > > > > > if (mnt_idmapped(path->mnt)) { > > > barrier(); > > > user_ns = mnt_user_ns(path->mnt); > > > uid = kuid_from_mnt(user_ns, uid); > > > gid = kgid_from_mnt(user_ns, gid); > > > } > > > > > > since there's no data dependency between mnt_idmapped() and > > > mnt_user_ns()? > > > > I think I had something to handle this case in another branch of mine. > > The READ_ONCE() you mentioned in another patch I had originally dropped > > because I wasn't sure whether it works on pointers but after talking to > > Jann and David it seems that it handles pointers fine. > > Let me take a look and fix it in the next version. I just finished > > porting the test suite to xfstests as Christoph requested and I'm > > looking at this now. > > Another way would be to just have mnt_idmapped() test > mnt_user_ns() != &init_user_ns instead of the flags; then I think you > get the data dependency and thus correct ordering for free. I indeed dropped mnt_idmapped() which is unnecessary. :) I think we should still use smp_store_release() in mnt_user_ns() paired with smp_load_acquire() in do_idmap_mount() thought. Christian
On Tue, Nov 24, 2020 at 02:40:35PM +0100, Christian Brauner wrote: > On Tue, Nov 24, 2020 at 08:37:40AM -0500, Tycho Andersen wrote: > > On Tue, Nov 24, 2020 at 01:30:35PM +0100, Christian Brauner wrote: > > > On Mon, Nov 23, 2020 at 11:24:28AM -0500, Tycho Andersen wrote: > > > > On Mon, Nov 23, 2020 at 10:47:19AM -0500, Tycho Andersen wrote: > > > > > On Sun, Nov 15, 2020 at 11:36:46AM +0100, Christian Brauner wrote: > > > > > > +static inline struct user_namespace *mnt_user_ns(const struct vfsmount *mnt) > > > > > > +{ > > > > > > + return mnt->mnt_user_ns; > > > > > > +} > > > > > > > > > > I think you might want a READ_ONCE() here. Right now it seems ok, since the > > > > > mnt_user_ns can't change, but if we ever allow it to change (and I see you have > > > > > a idmapped_mounts_wip_v2_allow_to_change_idmapping branch on your public tree > > > > > :D), the pattern of, > > > > > > > > > > user_ns = mnt_user_ns(path->mnt); > > > > > if (mnt_idmapped(path->mnt)) { > > > > > uid = kuid_from_mnt(user_ns, uid); > > > > > gid = kgid_from_mnt(user_ns, gid); > > > > > } > > > > > > > > > > could race. > > > > > > > > Actually, isn't a race possible now? > > > > > > > > kuid_from_mnt(mnt_user_ns(path->mnt) /* &init_user_ns */); > > > > WRITE_ONCE(mnt->mnt.mnt_user_ns, user_ns); > > > > WRITE_ONCE(m->mnt.mnt_flags, flags); > > > > kgid_from_mnt(mnt_user_ns(path->mnt) /* the right user ns */); > > > > > > > > So maybe it should be: > > > > > > > > if (mnt_idmapped(path->mnt)) { > > > > barrier(); > > > > user_ns = mnt_user_ns(path->mnt); > > > > uid = kuid_from_mnt(user_ns, uid); > > > > gid = kgid_from_mnt(user_ns, gid); > > > > } > > > > > > > > since there's no data dependency between mnt_idmapped() and > > > > mnt_user_ns()? > > > > > > I think I had something to handle this case in another branch of mine. > > > The READ_ONCE() you mentioned in another patch I had originally dropped > > > because I wasn't sure whether it works on pointers but after talking to > > > Jann and David it seems that it handles pointers fine. > > > Let me take a look and fix it in the next version. I just finished > > > porting the test suite to xfstests as Christoph requested and I'm > > > looking at this now. > > > > Another way would be to just have mnt_idmapped() test > > mnt_user_ns() != &init_user_ns instead of the flags; then I think you > > get the data dependency and thus correct ordering for free. > > I indeed dropped mnt_idmapped() which is unnecessary. :) It still might be a nice helper to prevent people from checking the flags and forgetting that there's a memory ordering issue, though. > I think we should still use smp_store_release() in mnt_user_ns() paired > with smp_load_acquire() in do_idmap_mount() thought. Sounds reasonable. Tycho
On Tue, Nov 24, 2020 at 08:44:59AM -0500, Tycho Andersen wrote: > On Tue, Nov 24, 2020 at 02:40:35PM +0100, Christian Brauner wrote: > > On Tue, Nov 24, 2020 at 08:37:40AM -0500, Tycho Andersen wrote: > > > On Tue, Nov 24, 2020 at 01:30:35PM +0100, Christian Brauner wrote: > > > > On Mon, Nov 23, 2020 at 11:24:28AM -0500, Tycho Andersen wrote: > > > > > On Mon, Nov 23, 2020 at 10:47:19AM -0500, Tycho Andersen wrote: > > > > > > On Sun, Nov 15, 2020 at 11:36:46AM +0100, Christian Brauner wrote: > > > > > > > +static inline struct user_namespace *mnt_user_ns(const struct vfsmount *mnt) > > > > > > > +{ > > > > > > > + return mnt->mnt_user_ns; > > > > > > > +} > > > > > > > > > > > > I think you might want a READ_ONCE() here. Right now it seems ok, since the > > > > > > mnt_user_ns can't change, but if we ever allow it to change (and I see you have > > > > > > a idmapped_mounts_wip_v2_allow_to_change_idmapping branch on your public tree > > > > > > :D), the pattern of, > > > > > > > > > > > > user_ns = mnt_user_ns(path->mnt); > > > > > > if (mnt_idmapped(path->mnt)) { > > > > > > uid = kuid_from_mnt(user_ns, uid); > > > > > > gid = kgid_from_mnt(user_ns, gid); > > > > > > } > > > > > > > > > > > > could race. > > > > > > > > > > Actually, isn't a race possible now? > > > > > > > > > > kuid_from_mnt(mnt_user_ns(path->mnt) /* &init_user_ns */); > > > > > WRITE_ONCE(mnt->mnt.mnt_user_ns, user_ns); > > > > > WRITE_ONCE(m->mnt.mnt_flags, flags); > > > > > kgid_from_mnt(mnt_user_ns(path->mnt) /* the right user ns */); > > > > > > > > > > So maybe it should be: > > > > > > > > > > if (mnt_idmapped(path->mnt)) { > > > > > barrier(); > > > > > user_ns = mnt_user_ns(path->mnt); > > > > > uid = kuid_from_mnt(user_ns, uid); > > > > > gid = kgid_from_mnt(user_ns, gid); > > > > > } > > > > > > > > > > since there's no data dependency between mnt_idmapped() and > > > > > mnt_user_ns()? > > > > > > > > I think I had something to handle this case in another branch of mine. > > > > The READ_ONCE() you mentioned in another patch I had originally dropped > > > > because I wasn't sure whether it works on pointers but after talking to > > > > Jann and David it seems that it handles pointers fine. > > > > Let me take a look and fix it in the next version. I just finished > > > > porting the test suite to xfstests as Christoph requested and I'm > > > > looking at this now. > > > > > > Another way would be to just have mnt_idmapped() test > > > mnt_user_ns() != &init_user_ns instead of the flags; then I think you > > > get the data dependency and thus correct ordering for free. > > > > I indeed dropped mnt_idmapped() which is unnecessary. :) > > It still might be a nice helper to prevent people from checking the > flags and forgetting that there's a memory ordering issue, though. I just mentioned this offline but for the record: the flag is gone since we can rely on the pointer alone. :) Christian
diff --git a/fs/namespace.c b/fs/namespace.c index 9fc8b22dba26..15fb0ae3f01f 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -220,6 +220,7 @@ static struct mount *alloc_vfsmnt(const char *name) INIT_HLIST_NODE(&mnt->mnt_mp_list); INIT_LIST_HEAD(&mnt->mnt_umounting); INIT_HLIST_HEAD(&mnt->mnt_stuck_children); + mnt->mnt.mnt_user_ns = &init_user_ns; } return mnt; @@ -559,6 +560,8 @@ int sb_prepare_remount_readonly(struct super_block *sb) static void free_vfsmnt(struct mount *mnt) { + if (mnt_idmapped(&mnt->mnt) && mnt_user_ns(&mnt->mnt) != &init_user_ns) + put_user_ns(mnt_user_ns(&mnt->mnt)); kfree_const(mnt->mnt_devname); #ifdef CONFIG_SMP free_percpu(mnt->mnt_pcp); @@ -1067,6 +1070,9 @@ static struct mount *clone_mnt(struct mount *old, struct dentry *root, mnt->mnt.mnt_flags &= ~(MNT_WRITE_HOLD|MNT_MARKED|MNT_INTERNAL); atomic_inc(&sb->s_active); + mnt->mnt.mnt_user_ns = old->mnt.mnt_user_ns; + if (mnt_user_ns(&old->mnt) != &init_user_ns) + mnt->mnt.mnt_user_ns = get_user_ns(mnt->mnt.mnt_user_ns); mnt->mnt.mnt_sb = sb; mnt->mnt.mnt_root = dget(root); mnt->mnt_mountpoint = mnt->mnt.mnt_root; diff --git a/include/linux/fs.h b/include/linux/fs.h index 9e487cbf0f5c..9e05fb4f997c 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2260,6 +2260,7 @@ struct file_system_type { #define FS_HAS_SUBTYPE 4 #define FS_USERNS_MOUNT 8 /* Can be mounted by userns root */ #define FS_DISALLOW_NOTIFY_PERM 16 /* Disable fanotify permission events */ +#define FS_ALLOW_IDMAP 32 /* FS has been updated to handle vfs idmappings. */ #define FS_THP_SUPPORT 8192 /* Remove once all fs converted */ #define FS_RENAME_DOES_D_MOVE 32768 /* FS will handle d_move() during rename() internally. */ int (*init_fs_context)(struct fs_context *); diff --git a/include/linux/mount.h b/include/linux/mount.h index aaf343b38671..3c7ba1bd4a21 100644 --- a/include/linux/mount.h +++ b/include/linux/mount.h @@ -31,6 +31,7 @@ struct fs_context; #define MNT_RELATIME 0x20 #define MNT_READONLY 0x40 /* does the user want this to be r/o? */ #define MNT_NOSYMFOLLOW 0x80 +#define MNT_IDMAPPED 0x400 #define MNT_SHRINKABLE 0x100 #define MNT_WRITE_HOLD 0x200 @@ -72,8 +73,19 @@ struct vfsmount { struct dentry *mnt_root; /* root of the mounted tree */ struct super_block *mnt_sb; /* pointer to superblock */ int mnt_flags; + struct user_namespace *mnt_user_ns; } __randomize_layout; +static inline bool mnt_idmapped(const struct vfsmount *mnt) +{ + return READ_ONCE(mnt->mnt_flags) & MNT_IDMAPPED; +} + +static inline struct user_namespace *mnt_user_ns(const struct vfsmount *mnt) +{ + return mnt->mnt_user_ns; +} + struct file; /* forward dec */ struct path;
In order to support per-mount idmappings vfsmounts will be marked with user namespaces. The idmapping associated with that user namespace will be used to map the ids of vfs objects when they are accessed through that mount. By default all vfsmounts will be marked with the initial user namespace. The initial user namespace is used to indicate that a mount is not idmapped. All operations behave as before and no performance impact is seen. Based on prior discussions we want to attach the whole user namespace and not just a dedicated idmapping struct. This allows us to reuse all the helpers that already exist for dealing with idmappings instead of introducing a whole new range of helpers. In addition, if we decide in the future that we are confident enough to enable unprivileged user to setup idmapped mounts we can allow already idmapped mounts to be marked with another user namespace. For now, we will enforce in later patches that once a mount has been idmapped it can't be remapped. This keeps permission checking and life-cycle management simple especially since users can always create a new mount with a different idmapping anyway. The idea to attach user namespaces to vfsmounts has been floated around in various forms at Linux Plumbers in ~2018 with the original idea being tracing back to a discussion during a conference in St. Petersburg between Christoph, Tycho, and myself. Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> --- /* v2 */ patch introduced - Christoph Hellwig: - Split internal implementation into separate patch and move syscall implementation later. --- fs/namespace.c | 6 ++++++ include/linux/fs.h | 1 + include/linux/mount.h | 12 ++++++++++++ 3 files changed, 19 insertions(+)