diff mbox series

[v7,10/14] securityfs: Extend securityfs with namespacing support

Message ID 20211216054323.1707384-11-stefanb@linux.vnet.ibm.com (mailing list archive)
State New, archived
Headers show
Series ima: Namespace IMA with audit support in IMA-ns | expand

Commit Message

Stefan Berger Dec. 16, 2021, 5:43 a.m. UTC
From: Stefan Berger <stefanb@linux.ibm.com>

Extend 'securityfs' for support of IMA namespacing so that each
IMA (user) namespace can have its own front-end for showing the currently
active policy, the measurement list, number of violations and so on.

Drop the addition dentry reference to enable simple cleanup of dentries
upon umount.

Prevent mounting of an instance of securityfs in another user namespace
than it belongs to. Also, prevent accesses to directories when another
user namespace is active than the one that the instance of securityfs
belongs to.

Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
---
 security/inode.c | 37 ++++++++++++++++++++++++++++++++++---
 1 file changed, 34 insertions(+), 3 deletions(-)

Comments

Christian Brauner Dec. 16, 2021, 1:40 p.m. UTC | #1
On Thu, Dec 16, 2021 at 12:43:19AM -0500, Stefan Berger wrote:
> From: Stefan Berger <stefanb@linux.ibm.com>
> 
> Extend 'securityfs' for support of IMA namespacing so that each
> IMA (user) namespace can have its own front-end for showing the currently
> active policy, the measurement list, number of violations and so on.
> 
> Drop the addition dentry reference to enable simple cleanup of dentries
> upon umount.
> 
> Prevent mounting of an instance of securityfs in another user namespace
> than it belongs to. Also, prevent accesses to directories when another
> user namespace is active than the one that the instance of securityfs
> belongs to.
> 
> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
> ---
>  security/inode.c | 37 ++++++++++++++++++++++++++++++++++---
>  1 file changed, 34 insertions(+), 3 deletions(-)
> 
> diff --git a/security/inode.c b/security/inode.c
> index fee01ff4d831..a0d9f086e3d5 100644
> --- a/security/inode.c
> +++ b/security/inode.c
> @@ -26,6 +26,29 @@
>  static struct vfsmount *init_securityfs_mount;
>  static int init_securityfs_mount_count;
>  
> +static int securityfs_permission(struct user_namespace *mnt_userns,
> +				 struct inode *inode, int mask)
> +{
> +	int err;
> +
> +	err = generic_permission(&init_user_ns, inode, mask);
> +	if (!err) {
> +		if (inode->i_sb->s_user_ns != current_user_ns())
> +			err = -EACCES;

I really think the correct semantics is to grant all callers access
whose user namespace is the same as or an ancestor of the securityfs
userns. It's weird to deny access to callers who are located in an
ancestor userns.

For example, a privileged process on the host should be allowed to setns
to the userns of an unprivileged container and inspect its securityfs
instance.

We're mostly interested to block such as scenarios where two sibling
unprivileged containers are created in the initial userns and an fd
proxy or something funnels a file descriptor from one sibling container
to the another one and the receiving sibling container can use readdir()
or openat() on this fd. (I'm not even convinced that this is actually a
problem but stricter semantics at the beginning can't hurt. We can
always relax this later.)
Christian Brauner Dec. 16, 2021, 4:28 p.m. UTC | #2
On Thu, Dec 16, 2021 at 02:40:27PM +0100, Christian Brauner wrote:
> On Thu, Dec 16, 2021 at 12:43:19AM -0500, Stefan Berger wrote:
> > From: Stefan Berger <stefanb@linux.ibm.com>
> > 
> > Extend 'securityfs' for support of IMA namespacing so that each
> > IMA (user) namespace can have its own front-end for showing the currently
> > active policy, the measurement list, number of violations and so on.
> > 
> > Drop the addition dentry reference to enable simple cleanup of dentries
> > upon umount.
> > 
> > Prevent mounting of an instance of securityfs in another user namespace
> > than it belongs to. Also, prevent accesses to directories when another
> > user namespace is active than the one that the instance of securityfs
> > belongs to.
> > 
> > Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
> > Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
> > ---
> >  security/inode.c | 37 ++++++++++++++++++++++++++++++++++---
> >  1 file changed, 34 insertions(+), 3 deletions(-)
> > 
> > diff --git a/security/inode.c b/security/inode.c
> > index fee01ff4d831..a0d9f086e3d5 100644
> > --- a/security/inode.c
> > +++ b/security/inode.c
> > @@ -26,6 +26,29 @@
> >  static struct vfsmount *init_securityfs_mount;
> >  static int init_securityfs_mount_count;
> >  
> > +static int securityfs_permission(struct user_namespace *mnt_userns,
> > +				 struct inode *inode, int mask)
> > +{
> > +	int err;
> > +
> > +	err = generic_permission(&init_user_ns, inode, mask);
> > +	if (!err) {
> > +		if (inode->i_sb->s_user_ns != current_user_ns())
> > +			err = -EACCES;
> 
> I really think the correct semantics is to grant all callers access
> whose user namespace is the same as or an ancestor of the securityfs
> userns. It's weird to deny access to callers who are located in an
> ancestor userns.
> 
> For example, a privileged process on the host should be allowed to setns
> to the userns of an unprivileged container and inspect its securityfs

s/userns/mntns/

> instance.
> 
> We're mostly interested to block such as scenarios where two sibling
> unprivileged containers are created in the initial userns and an fd
> proxy or something funnels a file descriptor from one sibling container
> to the another one and the receiving sibling container can use readdir()
> or openat() on this fd. (I'm not even convinced that this is actually a
> problem but stricter semantics at the beginning can't hurt. We can
> always relax this later.)
>
Stefan Berger Jan. 3, 2022, 2:09 p.m. UTC | #3
On 12/16/21 08:40, Christian Brauner wrote:
> On Thu, Dec 16, 2021 at 12:43:19AM -0500, Stefan Berger wrote:
>> From: Stefan Berger <stefanb@linux.ibm.com>
>>
>> Extend 'securityfs' for support of IMA namespacing so that each
>> IMA (user) namespace can have its own front-end for showing the currently
>> active policy, the measurement list, number of violations and so on.
>>
>> Drop the addition dentry reference to enable simple cleanup of dentries
>> upon umount.
>>
>> Prevent mounting of an instance of securityfs in another user namespace
>> than it belongs to. Also, prevent accesses to directories when another
>> user namespace is active than the one that the instance of securityfs
>> belongs to.
>>
>> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
>> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
>> ---
>>   security/inode.c | 37 ++++++++++++++++++++++++++++++++++---
>>   1 file changed, 34 insertions(+), 3 deletions(-)
>>
>> diff --git a/security/inode.c b/security/inode.c
>> index fee01ff4d831..a0d9f086e3d5 100644
>> --- a/security/inode.c
>> +++ b/security/inode.c
>> @@ -26,6 +26,29 @@
>>   static struct vfsmount *init_securityfs_mount;
>>   static int init_securityfs_mount_count;
>>   
>> +static int securityfs_permission(struct user_namespace *mnt_userns,
>> +				 struct inode *inode, int mask)
>> +{
>> +	int err;
>> +
>> +	err = generic_permission(&init_user_ns, inode, mask);
>> +	if (!err) {
>> +		if (inode->i_sb->s_user_ns != current_user_ns())
>> +			err = -EACCES;
> I really think the correct semantics is to grant all callers access
> whose user namespace is the same as or an ancestor of the securityfs
> userns. It's weird to deny access to callers who are located in an
> ancestor userns.

Ok, will be using current_in_userns() or the more explicit in_userns() 
for the check.


>
> For example, a privileged process on the host should be allowed to setns
> to the userns of an unprivileged container and inspect its securityfs
> instance.
>
> We're mostly interested to block such as scenarios where two sibling
> unprivileged containers are created in the initial userns and an fd
> proxy or something funnels a file descriptor from one sibling container
> to the another one and the receiving sibling container can use readdir()
> or openat() on this fd. (I'm not even convinced that this is actually a
> problem but stricter semantics at the beginning can't hurt. We can
> always relax this later.)
diff mbox series

Patch

diff --git a/security/inode.c b/security/inode.c
index fee01ff4d831..a0d9f086e3d5 100644
--- a/security/inode.c
+++ b/security/inode.c
@@ -26,6 +26,29 @@ 
 static struct vfsmount *init_securityfs_mount;
 static int init_securityfs_mount_count;
 
+static int securityfs_permission(struct user_namespace *mnt_userns,
+				 struct inode *inode, int mask)
+{
+	int err;
+
+	err = generic_permission(&init_user_ns, inode, mask);
+	if (!err) {
+		if (inode->i_sb->s_user_ns != current_user_ns())
+			err = -EACCES;
+	}
+
+	return err;
+}
+
+const struct inode_operations securityfs_dir_inode_operations = {
+	.permission	= securityfs_permission,
+	.lookup		= simple_lookup,
+};
+
+const struct inode_operations securityfs_file_inode_operations = {
+	.permission	= securityfs_permission,
+};
+
 static void securityfs_free_inode(struct inode *inode)
 {
 	if (S_ISLNK(inode->i_mode))
@@ -41,20 +64,25 @@  static const struct super_operations securityfs_super_operations = {
 static int securityfs_fill_super(struct super_block *sb, struct fs_context *fc)
 {
 	static const struct tree_descr files[] = {{""}};
+	struct user_namespace *ns = fc->user_ns;
 	int error;
 
+	if (WARN_ON(ns != current_user_ns()))
+		return -EINVAL;
+
 	error = simple_fill_super(sb, SECURITYFS_MAGIC, files);
 	if (error)
 		return error;
 
 	sb->s_op = &securityfs_super_operations;
+	sb->s_root->d_inode->i_op = &securityfs_dir_inode_operations;
 
 	return 0;
 }
 
 static int securityfs_get_tree(struct fs_context *fc)
 {
-	return get_tree_single(fc, securityfs_fill_super);
+	return get_tree_keyed(fc, securityfs_fill_super, fc->user_ns);
 }
 
 static const struct fs_context_operations securityfs_context_ops = {
@@ -72,6 +100,7 @@  static struct file_system_type fs_type = {
 	.name =		"securityfs",
 	.init_fs_context = securityfs_init_fs_context,
 	.kill_sb =	kill_litter_super,
+	.fs_flags =	FS_USERNS_MOUNT,
 };
 
 /**
@@ -157,7 +186,7 @@  static struct dentry *securityfs_create_dentry(const char *name, umode_t mode,
 	inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode);
 	inode->i_private = data;
 	if (S_ISDIR(mode)) {
-		inode->i_op = &simple_dir_inode_operations;
+		inode->i_op = &securityfs_dir_inode_operations;
 		inode->i_fop = &simple_dir_operations;
 		inc_nlink(inode);
 		inc_nlink(dir);
@@ -165,10 +194,10 @@  static struct dentry *securityfs_create_dentry(const char *name, umode_t mode,
 		inode->i_op = iops ? iops : &simple_symlink_inode_operations;
 		inode->i_link = data;
 	} else {
+		inode->i_op = &securityfs_file_inode_operations;
 		inode->i_fop = fops;
 	}
 	d_instantiate(dentry, inode);
-	dget(dentry);
 	inode_unlock(dir);
 	return dentry;
 
@@ -316,10 +345,12 @@  void securityfs_remove(struct dentry *dentry)
 	dir = d_inode(dentry->d_parent);
 	inode_lock(dir);
 	if (simple_positive(dentry)) {
+		dget(dentry);
 		if (d_is_dir(dentry))
 			simple_rmdir(dir, dentry);
 		else
 			simple_unlink(dir, dentry);
+		d_delete(dentry);
 		dput(dentry);
 	}
 	inode_unlock(dir);