diff mbox

[review,02/13] mnt: Refactor fs_fully_visible into mount_too_revealing

Message ID 20160620172130.15712-2-ebiederm@xmission.com (mailing list archive)
State New, archived
Headers show

Commit Message

Eric W. Biederman June 20, 2016, 5:21 p.m. UTC
Replace the call of fs_fully_visible in do_new_mount from before the
new superblock is allocated with a call of mount_too_revealing after
the superblock is allocated.   This winds up being a much better location
for maintainability of the code.

The first change this enables is the replacement of FS_USERNS_VISIBLE
with SB_I_USERNS_VISIBLE.  Moving the flag from struct filesystem_type
to sb_iflags on the superblock.

Unfortunately mount_too_revealing fundamentally needs to touch
mnt_flags adding several MNT_LOCKED_XXX flags at the appropriate
times.  If the mnt_flags did not need to be touched the code
could be easily moved into the filesystem specific mount code.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/namespace.c     | 38 +++++++++++++++++++++++++-------------
 fs/proc/inode.c    |  1 +
 fs/proc/root.c     |  2 +-
 fs/sysfs/mount.c   |  4 ++--
 include/linux/fs.h |  4 +++-
 5 files changed, 32 insertions(+), 17 deletions(-)

Comments

Andy Lutomirski June 20, 2016, 10:53 p.m. UTC | #1
On Mon, Jun 20, 2016 at 10:21 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Replace the call of fs_fully_visible in do_new_mount from before the
> new superblock is allocated with a call of mount_too_revealing after
> the superblock is allocated.   This winds up being a much better location
> for maintainability of the code.
>
> The first change this enables is the replacement of FS_USERNS_VISIBLE
> with SB_I_USERNS_VISIBLE.  Moving the flag from struct filesystem_type
> to sb_iflags on the superblock.

Why is this useful?
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman June 21, 2016, 6:54 p.m. UTC | #2
Andy Lutomirski <luto@amacapital.net> writes:

> On Mon, Jun 20, 2016 at 10:21 AM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>> Replace the call of fs_fully_visible in do_new_mount from before the
>> new superblock is allocated with a call of mount_too_revealing after
>> the superblock is allocated.   This winds up being a much better location
>> for maintainability of the code.
>>
>> The first change this enables is the replacement of FS_USERNS_VISIBLE
>> with SB_I_USERNS_VISIBLE.  Moving the flag from struct filesystem_type
>> to sb_iflags on the superblock.
>
> Why is this useful?

A couple of reasons.
- It helps clean up do_new_mount which is currently so overloaded by
  special cases that it is difficult to see the core logic.

- It makes the check about the actual superblock that is being mounted
  rather than the superblock that might be mounted.

- The practical place where being about the actual superblock that is
  being mounted helps is that in "11/13 mnt: Simplify mount_too_revealing"
  that removes the MNT_LOCK_NOSUID MNT_LOCK_NOEXEC and MNT_LOCK_NODEV
  tests from the code, while verify that those tests are not needed
  because the sb_iflags contains SB_I_NOEXEC and SB_I_NODEV.

- The conceptual change of testing once the superblock has been
  generated makes changes like the one above much more sensible
  and it helps untangle mount namespace versus superblock concerns.

That last is a big part of what this patchset is about.  When do we care
about the superblock and when do we care about the mount namespace.

Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Seth Forshee June 22, 2016, 7:40 p.m. UTC | #3
On Mon, Jun 20, 2016 at 12:21:19PM -0500, Eric W. Biederman wrote:
> Replace the call of fs_fully_visible in do_new_mount from before the
> new superblock is allocated with a call of mount_too_revealing after
> the superblock is allocated.   This winds up being a much better location
> for maintainability of the code.
> 
> The first change this enables is the replacement of FS_USERNS_VISIBLE
> with SB_I_USERNS_VISIBLE.  Moving the flag from struct filesystem_type
> to sb_iflags on the superblock.
> 
> Unfortunately mount_too_revealing fundamentally needs to touch
> mnt_flags adding several MNT_LOCKED_XXX flags at the appropriate
> times.  If the mnt_flags did not need to be touched the code
> could be easily moved into the filesystem specific mount code.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Acked-by: Seth Forshee <seth.forshee@canonical.com>

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Djalal Harouni June 23, 2016, 9:23 p.m. UTC | #4
On Tue, Jun 21, 2016 at 8:54 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Andy Lutomirski <luto@amacapital.net> writes:
>
>> On Mon, Jun 20, 2016 at 10:21 AM, Eric W. Biederman
>> <ebiederm@xmission.com> wrote:
>>> Replace the call of fs_fully_visible in do_new_mount from before the
>>> new superblock is allocated with a call of mount_too_revealing after
>>> the superblock is allocated.   This winds up being a much better location
>>> for maintainability of the code.
>>>
>>> The first change this enables is the replacement of FS_USERNS_VISIBLE
>>> with SB_I_USERNS_VISIBLE.  Moving the flag from struct filesystem_type
>>> to sb_iflags on the superblock.
>>
>> Why is this useful?
>
> A couple of reasons.
> - It helps clean up do_new_mount which is currently so overloaded by
>   special cases that it is difficult to see the core logic.
>
> - It makes the check about the actual superblock that is being mounted
>   rather than the superblock that might be mounted.
>
> - The practical place where being about the actual superblock that is
>   being mounted helps is that in "11/13 mnt: Simplify mount_too_revealing"
>   that removes the MNT_LOCK_NOSUID MNT_LOCK_NOEXEC and MNT_LOCK_NODEV
>   tests from the code, while verify that those tests are not needed
>   because the sb_iflags contains SB_I_NOEXEC and SB_I_NODEV.

Yes, but it seems in that patch 11/13 the SB_I_NOEXEC and SB_I_NODEV
flags are only enforced and checked in case 'user_ns != init_user_ns' so for
init_user_ns we don't enforce it. Even if we set the flags and things
are correct
now, but as you have noted in your previous patches related to this we try to
commit to never exec on procfs and sysfs... so maybe take that check on its
own and move it before the init_user_ns one ?

> - The conceptual change of testing once the superblock has been
>   generated makes changes like the one above much more sensible
>   and it helps untangle mount namespace versus superblock concerns.
>
> That last is a big part of what this patchset is about.  When do we care
> about the superblock and when do we care about the mount namespace.

Historically fs_fully_visible() or mount_too_revealing() now gathered lot of
security checks... so one may wonder about the implication of moving it
after !?... yes having a clear context about superblocks and mount
namespaces matters... but I'm not sure about the order.

> Eric
>

Thank you!
Serge E. Hallyn June 24, 2016, 6:56 a.m. UTC | #5
On Tue, Jun 21, 2016 at 01:54:21PM -0500, Eric W. Biederman wrote:
> Andy Lutomirski <luto@amacapital.net> writes:
> 
> > On Mon, Jun 20, 2016 at 10:21 AM, Eric W. Biederman
> > <ebiederm@xmission.com> wrote:
> >> Replace the call of fs_fully_visible in do_new_mount from before the
> >> new superblock is allocated with a call of mount_too_revealing after
> >> the superblock is allocated.   This winds up being a much better location
> >> for maintainability of the code.
> >>
> >> The first change this enables is the replacement of FS_USERNS_VISIBLE
> >> with SB_I_USERNS_VISIBLE.  Moving the flag from struct filesystem_type
> >> to sb_iflags on the superblock.
> >
> > Why is this useful?
> 
> A couple of reasons.
> - It helps clean up do_new_mount which is currently so overloaded by
>   special cases that it is difficult to see the core logic.

Agreed, i find this easier to read and reason about.

> - It makes the check about the actual superblock that is being mounted
>   rather than the superblock that might be mounted.
> 
> - The practical place where being about the actual superblock that is
>   being mounted helps is that in "11/13 mnt: Simplify mount_too_revealing"
>   that removes the MNT_LOCK_NOSUID MNT_LOCK_NOEXEC and MNT_LOCK_NODEV
>   tests from the code, while verify that those tests are not needed
>   because the sb_iflags contains SB_I_NOEXEC and SB_I_NODEV.
> 
> - The conceptual change of testing once the superblock has been
>   generated makes changes like the one above much more sensible
>   and it helps untangle mount namespace versus superblock concerns.
> 
> That last is a big part of what this patchset is about.  When do we care
> about the superblock and when do we care about the mount namespace.
> 
> Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/namespace.c b/fs/namespace.c
index 783004af5707..1a69aa786975 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2375,7 +2375,7 @@  unlock:
 	return err;
 }
 
-static bool fs_fully_visible(struct file_system_type *fs_type, int *new_mnt_flags);
+static bool mount_too_revealing(struct vfsmount *mnt, int *new_mnt_flags);
 
 /*
  * create a new mount for userspace and request it to be added into the
@@ -2408,12 +2408,6 @@  static int do_new_mount(struct path *path, const char *fstype, int flags,
 			flags |= MS_NODEV;
 			mnt_flags |= MNT_NODEV | MNT_LOCK_NODEV;
 		}
-		if (type->fs_flags & FS_USERNS_VISIBLE) {
-			if (!fs_fully_visible(type, &mnt_flags)) {
-				put_filesystem(type);
-				return -EPERM;
-			}
-		}
 	}
 
 	mnt = vfs_kern_mount(type, flags, name, data);
@@ -2425,6 +2419,11 @@  static int do_new_mount(struct path *path, const char *fstype, int flags,
 	if (IS_ERR(mnt))
 		return PTR_ERR(mnt);
 
+	if (mount_too_revealing(mnt, &mnt_flags)) {
+		mntput(mnt);
+		return -EPERM;
+	}
+
 	err = do_add_mount(real_mount(mnt), path, mnt_flags);
 	if (err)
 		mntput(mnt);
@@ -3216,22 +3215,19 @@  bool current_chrooted(void)
 	return chrooted;
 }
 
-static bool fs_fully_visible(struct file_system_type *type, int *new_mnt_flags)
+static bool mnt_already_visible(struct mnt_namespace *ns, struct vfsmount *new,
+				int *new_mnt_flags)
 {
-	struct mnt_namespace *ns = current->nsproxy->mnt_ns;
 	int new_flags = *new_mnt_flags;
 	struct mount *mnt;
 	bool visible = false;
 
-	if (unlikely(!ns))
-		return false;
-
 	down_read(&namespace_sem);
 	list_for_each_entry(mnt, &ns->list, mnt_list) {
 		struct mount *child;
 		int mnt_flags;
 
-		if (mnt->mnt.mnt_sb->s_type != type)
+		if (mnt->mnt.mnt_sb->s_type != new->mnt_sb->s_type)
 			continue;
 
 		/* This mount is not fully visible if it's root directory
@@ -3298,6 +3294,22 @@  found:
 	return visible;
 }
 
+static bool mount_too_revealing(struct vfsmount *mnt, int *new_mnt_flags)
+{
+	struct mnt_namespace *ns = current->nsproxy->mnt_ns;
+	unsigned long s_iflags;
+
+	if (ns->user_ns == &init_user_ns)
+		return false;
+
+	/* Can this filesystem be too revealing? */
+	s_iflags = mnt->mnt_sb->s_iflags;
+	if (!(s_iflags & SB_I_USERNS_VISIBLE))
+		return false;
+
+	return !mnt_already_visible(ns, mnt, new_mnt_flags);
+}
+
 static struct ns_common *mntns_get(struct task_struct *task)
 {
 	struct ns_common *ns = NULL;
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 42305ddcbaa0..78fa452d65ed 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -462,6 +462,7 @@  int proc_fill_super(struct super_block *s)
 	struct inode *root_inode;
 	int ret;
 
+	s->s_iflags |= SB_I_USERNS_VISIBLE;
 	s->s_flags |= MS_NODIRATIME | MS_NOSUID | MS_NOEXEC;
 	s->s_blocksize = 1024;
 	s->s_blocksize_bits = 10;
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 55bc7d6c8aac..a1b2860fec62 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -158,7 +158,7 @@  static struct file_system_type proc_fs_type = {
 	.name		= "proc",
 	.mount		= proc_mount,
 	.kill_sb	= proc_kill_sb,
-	.fs_flags	= FS_USERNS_VISIBLE | FS_USERNS_MOUNT,
+	.fs_flags	= FS_USERNS_MOUNT,
 };
 
 void __init proc_root_init(void)
diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index f3db82071cfb..f31e36994dfb 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -42,7 +42,7 @@  static struct dentry *sysfs_mount(struct file_system_type *fs_type,
 		kobj_ns_drop(KOBJ_NS_TYPE_NET, ns);
 	else if (new_sb)
 		/* Userspace would break if executables appear on sysfs */
-		root->d_sb->s_iflags |= SB_I_NOEXEC;
+		root->d_sb->s_iflags |= SB_I_USERNS_VISIBLE | SB_I_NOEXEC;
 
 	return root;
 }
@@ -59,7 +59,7 @@  static struct file_system_type sysfs_fs_type = {
 	.name		= "sysfs",
 	.mount		= sysfs_mount,
 	.kill_sb	= sysfs_kill_sb,
-	.fs_flags	= FS_USERNS_VISIBLE | FS_USERNS_MOUNT,
+	.fs_flags	= FS_USERNS_MOUNT,
 };
 
 int __init sysfs_init(void)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index dd288148a6b1..71988dd3af95 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1328,6 +1328,9 @@  struct mm_struct;
 #define SB_I_CGROUPWB	0x00000001	/* cgroup-aware writeback enabled */
 #define SB_I_NOEXEC	0x00000002	/* Ignore executables on this fs */
 
+/* sb->s_iflags to limit user namespace mounts */
+#define SB_I_USERNS_VISIBLE		0x00000010 /* fstype already mounted */
+
 /* Possible states of 'frozen' field */
 enum {
 	SB_UNFROZEN = 0,		/* FS is unfrozen */
@@ -2011,7 +2014,6 @@  struct file_system_type {
 #define FS_HAS_SUBTYPE		4
 #define FS_USERNS_MOUNT		8	/* Can be mounted by userns root */
 #define FS_USERNS_DEV_MOUNT	16 /* A userns mount does not imply MNT_NODEV */
-#define FS_USERNS_VISIBLE	32	/* FS must already be visible */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move() during rename() internally. */
 	struct dentry *(*mount) (struct file_system_type *, int,
 		       const char *, void *);