diff mbox

[RFC,04/10] netns, selinux: create the selinux netlink socket per network namespace

Message ID 20171002155825.28620-5-sds@tycho.nsa.gov (mailing list archive)
State RFC
Headers show

Commit Message

Stephen Smalley Oct. 2, 2017, 3:58 p.m. UTC
The selinux netlink socket is used to notify userspace of changes to
the enforcing mode and policy reloads.  At present, these notifications
are always sent to the initial network namespace.  In order to support
multiple selinux namespaces, each with its own enforcing mode and
policy, we need to create and use a separate selinux netlink socket
for each network namespace.

Without this change, a policy reload in a child selinux namespace
causes a notification to be sent to processes in the init namespace
with a sequence number that may be higher than the policy sequence
number for that namespace.  As a result, userspace AVC instances in
the init namespace will then end up rejecting any further access
vector results from its own security server instance due to the
policy sequence number appearing to regress, which in turn causes
all subsequent uncached access checks to fail.  Similarly,
without this change, changing enforcing mode in the child selinux
namespace triggers a notification to all userspace AVC instances
in the init namespace that will switch their enforcing modes.

This change does alter SELinux behavior, since previously reloading
policy or changing enforcing mode in a non-init network namespace would
trigger a notification to processes in the init network namespace.
However, this behavior is not being relied upon by existing userspace
AFAICT and is arguably wrong regardless.

This change presumes that one will always unshare the network namespace
when unsharing a new selinux namespace (the reverse is not required).
Otherwise, the same inconsistencies could arise between the notifications
and the relevant policy.  At present, nothing enforces this guarantee
at the kernel level; it is left up to userspace (e.g. container runtimes).
It is an open question as to whether this is a good idea or whether
unsharing of the selinux namespace should automatically unshare the network
namespace.  However, keeping them separate is consistent with the handling
of the mount namespace currently, which also should be unshared so that
a private selinuxfs mount can be created.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
---
 include/net/net_namespace.h |  3 +++
 security/selinux/netlink.c  | 31 +++++++++++++++++++++++++------
 2 files changed, 28 insertions(+), 6 deletions(-)

Comments

Serge E. Hallyn Oct. 5, 2017, 5:47 a.m. UTC | #1
On Mon, Oct 02, 2017 at 11:58:19AM -0400, Stephen Smalley wrote:
> The selinux netlink socket is used to notify userspace of changes to
> the enforcing mode and policy reloads.  At present, these notifications
> are always sent to the initial network namespace.  In order to support
> multiple selinux namespaces, each with its own enforcing mode and
> policy, we need to create and use a separate selinux netlink socket
> for each network namespace.

...

> +static int __init selnl_init(void)
> +{
> +	if (register_pernet_subsys(&selnl_net_ops))
> +		panic("Could not register selinux netlink operations\n");
>  	return 0;
>  }

This doesn't seem right to me.  If the socket is only used to send
notifications to userspace, then every net_ns doesn't need a socket,
only the first netns that the selinux ns was associated, right?

So long as there is a way to find the netns to which an selinux ns
is tied, a userspace logger could even setns into that netns to listen
for updates, if it wasn't certain to be in the right ns when it ran.

Otherwise (I haven't peeked ahead) you'll have to keep the *list* of
net_ns which live in a given selinuxfs and copy all messages to all of
those namesapces?

-serge
Stephen Smalley Oct. 5, 2017, 2:06 p.m. UTC | #2
On Thu, 2017-10-05 at 00:47 -0500, Serge E. Hallyn wrote:
> On Mon, Oct 02, 2017 at 11:58:19AM -0400, Stephen Smalley wrote:
> > The selinux netlink socket is used to notify userspace of changes
> > to
> > the enforcing mode and policy reloads.  At present, these
> > notifications
> > are always sent to the initial network namespace.  In order to
> > support
> > multiple selinux namespaces, each with its own enforcing mode and
> > policy, we need to create and use a separate selinux netlink socket
> > for each network namespace.
> 
> ...
> 
> > +static int __init selnl_init(void)
> > +{
> > +	if (register_pernet_subsys(&selnl_net_ops))
> > +		panic("Could not register selinux netlink
> > operations\n");
> >  	return 0;
> >  }
> 
> This doesn't seem right to me.  If the socket is only used to send
> notifications to userspace, then every net_ns doesn't need a socket,
> only the first netns that the selinux ns was associated, right?

What does "the first netns that the selinux ns was associated" mean?
We could unshare them in any order; in the sample command sequence I
gave in the patch description for "selinux: add a selinuxfs interface
to unshare selinux namespace", I unshared the SELinux namespace first,
then the network namespace, but we could just as easily do it in the
reverse order (or at the same time if unshare(2) supported that).  So
you can't assume that the network namespace in which you are running at
the time you unshare selinux namespace is the right one, nor that the
first unshare of the network namespace after unsharing the selinux
namespace is the right one (not that we even have a way to catch that
currently).

> So long as there is a way to find the netns to which an selinux ns
> is tied, a userspace logger could even setns into that netns to
> listen
> for updates, if it wasn't certain to be in the right ns when it ran.
> 
> Otherwise (I haven't peeked ahead) you'll have to keep the *list* of
> net_ns which live in a given selinuxfs and copy all messages to all
> of
> those namesapces?

No, we only deliver to the network namespace of the process that
performed the setenforce or policy load (most commonly init, could also
be an admin running a management command or installing a policy rpm). 
We assume the container runtime properly handles unsharing of the
mount, network, and selinux namespaces before launching the container
init.  A container process that subsequently unshares its network
namespace won't see notifications for any subsequent policy reloads or
setenforce calls.  I don't know if that will prove to be a problem in
practice.
Stephen Smalley Oct. 5, 2017, 2:11 p.m. UTC | #3
On Thu, 2017-10-05 at 10:06 -0400, Stephen Smalley wrote:
> On Thu, 2017-10-05 at 00:47 -0500, Serge E. Hallyn wrote:
> > On Mon, Oct 02, 2017 at 11:58:19AM -0400, Stephen Smalley wrote:
> > > The selinux netlink socket is used to notify userspace of changes
> > > to
> > > the enforcing mode and policy reloads.  At present, these
> > > notifications
> > > are always sent to the initial network namespace.  In order to
> > > support
> > > multiple selinux namespaces, each with its own enforcing mode and
> > > policy, we need to create and use a separate selinux netlink
> > > socket
> > > for each network namespace.
> > 
> > ...
> > 
> > > +static int __init selnl_init(void)
> > > +{
> > > +	if (register_pernet_subsys(&selnl_net_ops))
> > > +		panic("Could not register selinux netlink
> > > operations\n");
> > >  	return 0;
> > >  }
> > 
> > This doesn't seem right to me.  If the socket is only used to send
> > notifications to userspace, then every net_ns doesn't need a
> > socket,
> > only the first netns that the selinux ns was associated, right?
> 
> What does "the first netns that the selinux ns was associated" mean?
> We could unshare them in any order; in the sample command sequence I
> gave in the patch description for "selinux: add a selinuxfs interface
> to unshare selinux namespace", I unshared the SELinux namespace
> first,
> then the network namespace, but we could just as easily do it in the
> reverse order (or at the same time if unshare(2) supported that).  So
> you can't assume that the network namespace in which you are running
> at
> the time you unshare selinux namespace is the right one, nor that the
> first unshare of the network namespace after unsharing the selinux
> namespace is the right one (not that we even have a way to catch that
> currently).
> 
> > So long as there is a way to find the netns to which an selinux ns
> > is tied, a userspace logger could even setns into that netns to
> > listen
> > for updates, if it wasn't certain to be in the right ns when it
> > ran.
> > 
> > Otherwise (I haven't peeked ahead) you'll have to keep the *list*
> > of
> > net_ns which live in a given selinuxfs and copy all messages to all
> > of
> > those namesapces?
> 
> No, we only deliver to the network namespace of the process that
> performed the setenforce or policy load (most commonly init, could
> also
> be an admin running a management command or installing a policy
> rpm). 
> We assume the container runtime properly handles unsharing of the
> mount, network, and selinux namespaces before launching the container
> init.  A container process that subsequently unshares its network
> namespace won't see notifications for any subsequent policy reloads
> or
> setenforce calls.  I don't know if that will prove to be a problem in
> practice.

It should be noted however that this wouldn't be a regression, since
today the netlink notifications are only delivered to the init network
namespace.
James Morris Oct. 6, 2017, 1:07 a.m. UTC | #4
On Mon, 2 Oct 2017, Stephen Smalley wrote:

> This change presumes that one will always unshare the network namespace
> when unsharing a new selinux namespace (the reverse is not required).
> Otherwise, the same inconsistencies could arise between the notifications
> and the relevant policy.  At present, nothing enforces this guarantee
> at the kernel level; it is left up to userspace (e.g. container runtimes).
> It is an open question as to whether this is a good idea or whether
> unsharing of the selinux namespace should automatically unshare the network
> namespace.  

What about logging a kernel warning if just SELinux is unshared?

I think we want to avoid surprising the user by unsharing things for them, 
and yes, it will be possible to mess your system up if you configure it 
badly.

> However, keeping them separate is consistent with the handling
> of the mount namespace currently, which also should be unshared so that
> a private selinuxfs mount can be created.

Right, and this will in practice always be automated and abstracted from 
an end user pov.
Stephen Smalley Oct. 6, 2017, 1:21 p.m. UTC | #5
On Fri, 2017-10-06 at 12:07 +1100, James Morris wrote:
> On Mon, 2 Oct 2017, Stephen Smalley wrote:
> 
> > This change presumes that one will always unshare the network
> > namespace
> > when unsharing a new selinux namespace (the reverse is not
> > required).
> > Otherwise, the same inconsistencies could arise between the
> > notifications
> > and the relevant policy.  At present, nothing enforces this
> > guarantee
> > at the kernel level; it is left up to userspace (e.g. container
> > runtimes).
> > It is an open question as to whether this is a good idea or whether
> > unsharing of the selinux namespace should automatically unshare the
> > network
> > namespace.  
> 
> What about logging a kernel warning if just SELinux is unshared?

As with Serge's suggestion, the problem is that one can unshare them in
any order, and potentially with intervening steps to set up the
namespace or prepare for doing so, so there is no obvious point where
you could detect and issue such a warning.  Without an interface that
allows unsharing them both simultaneously (either unshare(2)-based or
selinuxfs-based), I don't think we can provide such a warning.

I don't think it will prove to be a problem in practice however;
container runtimes just need to do the right thing (and we can help
this by providing helpers in libselinux or the like).  The larger
concern is not that we'll forget to unshare the network namespace when
we unshare the selinux namespace, but that subsequent further unsharing
of the network namespace by itself could cause lossage of
notifications.  The two cases of concern are that a process unshares
its network namespace again (after the original unsharing of both
selinux namespace and network namespace for the container creation) and
subsequently:

1) Does not get any netlink notifications of setenforce or policy load
events for its selinux namespace. This is only an issue if a program
that uses the userspace AVC also unshares its network namespace or
otherwise is launched into its own network namespace separate from that
of its container.  And it isn't a regression, since before this change
notifications would only be sent to the init network namespace ever, so
this change actually represents an improvement in the ability to at
least get notifications when running in the container's network
namespace.

2) Sets enforcing mode or loads policy itself, in which case the
notification for its setenforce or load_policy will only go to its
network namespace and will not be received by other processes in the
same selinux namespace.  This is only an issue if a process running in
a separate network namespace from that of its container sets enforcing
mode or loads policy.  This seems unlikely to me, since such setting of
enforcing mode or loading of policy will conventionally be restricted
to a small set of privileged processes, such as the container init
process, administrator shells, and package installation/updates, and I
wouldn't expect them to run in a separate network namespace than their
container.

> 
> I think we want to avoid surprising the user by unsharing things for
> them, 
> and yes, it will be possible to mess your system up if you configure
> it 
> badly.
> 
> > However, keeping them separate is consistent with the handling
> > of the mount namespace currently, which also should be unshared so
> > that
> > a private selinuxfs mount can be created.
> 
> Right, and this will in practice always be automated and abstracted
> from 
> an end user pov.
Serge E. Hallyn Oct. 6, 2017, 7:24 p.m. UTC | #6
Quoting Stephen Smalley (sds@tycho.nsa.gov):
> On Fri, 2017-10-06 at 12:07 +1100, James Morris wrote:
> > On Mon, 2 Oct 2017, Stephen Smalley wrote:
> > 
> > > This change presumes that one will always unshare the network
> > > namespace
> > > when unsharing a new selinux namespace (the reverse is not
> > > required).
> > > Otherwise, the same inconsistencies could arise between the
> > > notifications
> > > and the relevant policy.  At present, nothing enforces this
> > > guarantee
> > > at the kernel level; it is left up to userspace (e.g. container
> > > runtimes).
> > > It is an open question as to whether this is a good idea or whether
> > > unsharing of the selinux namespace should automatically unshare the
> > > network
> > > namespace.  
> > 
> > What about logging a kernel warning if just SELinux is unshared?
> 
> As with Serge's suggestion, the problem is that one can unshare them in
> any order, and potentially with intervening steps to set up the

But is it going to stay that way?  I thought that was a temporary thing
while you test it out.

Because as it is it seems unacceptably loose.  A few questions:

(thinking as I type here - ok only the last one is the real question)

Assume I'm in my given netns.  I unshare just my selinuxns.  Does the
selinuxns have the authority to decide things about that netns?

If I unshare selinuxns+netns, then the selinuxns clearly "owns" the netns,
so the selinuxns has clear authority.  Likewise, when I unshare the
netns after the selinuxns, from that point on the selinuxns can be said
to have authority over the netns.

I think you want to keep it completely orthogonal, and I guess so long
as the parent selinuxns policy still applies it's ok.

So I unshare my selinuxns but not userns.  I don't have CAP_MAC_ADMIN
against the user_ns, so any selinux related changes will be denied.
Once I unshare userns, they will be allowed.

Ah, right.  Here's the real question.  Policy dictates file transition
rules.  How will those be namespaced?  Will only the init_selinux_ns be
allowed to specify a file context?  You can't let ns_capable(current_selinux_ns,
CAP_MAC_ADMIN) guide that safely, but capable(CAP_MAC_ADMIN) will be
very restrictive.  What's the plan there?  Sorry if it's spelled out
elsewhere.

-serge
Stephen Smalley Oct. 10, 2017, 2:35 p.m. UTC | #7
On Fri, 2017-10-06 at 14:24 -0500, Serge E. Hallyn wrote:
> Quoting Stephen Smalley (sds@tycho.nsa.gov):
> > On Fri, 2017-10-06 at 12:07 +1100, James Morris wrote:
> > > On Mon, 2 Oct 2017, Stephen Smalley wrote:
> > > 
> > > > This change presumes that one will always unshare the network
> > > > namespace
> > > > when unsharing a new selinux namespace (the reverse is not
> > > > required).
> > > > Otherwise, the same inconsistencies could arise between the
> > > > notifications
> > > > and the relevant policy.  At present, nothing enforces this
> > > > guarantee
> > > > at the kernel level; it is left up to userspace (e.g. container
> > > > runtimes).
> > > > It is an open question as to whether this is a good idea or
> > > > whether
> > > > unsharing of the selinux namespace should automatically unshare
> > > > the
> > > > network
> > > > namespace.  
> > > 
> > > What about logging a kernel warning if just SELinux is unshared?
> > 
> > As with Serge's suggestion, the problem is that one can unshare
> > them in
> > any order, and potentially with intervening steps to set up the
> 
> But is it going to stay that way?  I thought that was a temporary
> thing
> while you test it out.
> 
> Because as it is it seems unacceptably loose.  A few questions:
> 
> (thinking as I type here - ok only the last one is the real question)
> 
> Assume I'm in my given netns.  I unshare just my selinuxns.  Does the
> selinuxns have the authority to decide things about that netns?
> 
> If I unshare selinuxns+netns, then the selinuxns clearly "owns" the
> netns,
> so the selinuxns has clear authority.  Likewise, when I unshare the
> netns after the selinuxns, from that point on the selinuxns can be
> said
> to have authority over the netns.
> 
> I think you want to keep it completely orthogonal, and I guess so
> long
> as the parent selinuxns policy still applies it's ok.
> 
> So I unshare my selinuxns but not userns.  I don't have CAP_MAC_ADMIN
> against the user_ns, so any selinux related changes will be denied.
> Once I unshare userns, they will be allowed.
> 
> Ah, right.  Here's the real question.  Policy dictates file
> transition
> rules.  How will those be namespaced?  Will only the init_selinux_ns
> be
> allowed to specify a file context?  You can't let
> ns_capable(current_selinux_ns,
> CAP_MAC_ADMIN) guide that safely, but capable(CAP_MAC_ADMIN) will be
> very restrictive.  What's the plan there?  Sorry if it's spelled out
> elsewhere.

I don't think we know yet.  The current patches support a separate and
distinct in-core inode security context for each namespace, which
supports scenarios such as using context mounts on the host OS to label
each container with a single context with MCS separation while using
per-file xattrs within a container to support a conventional targeted
policy, but we haven't yet resolved how to deal with the actual
persistent file security contexts, e.g. whether there will be only one
(possibly with a label mapping defined) or multiple.
Serge E. Hallyn Oct. 29, 2017, 3:16 a.m. UTC | #8
On Thu, Oct 05, 2017 at 10:06:55AM -0400, Stephen Smalley wrote:
> On Thu, 2017-10-05 at 00:47 -0500, Serge E. Hallyn wrote:
> > On Mon, Oct 02, 2017 at 11:58:19AM -0400, Stephen Smalley wrote:
> > > The selinux netlink socket is used to notify userspace of changes
> > > to
> > > the enforcing mode and policy reloads.  At present, these
> > > notifications
> > > are always sent to the initial network namespace.  In order to
> > > support
> > > multiple selinux namespaces, each with its own enforcing mode and
> > > policy, we need to create and use a separate selinux netlink socket
> > > for each network namespace.
> > 
> > ...
> > 
> > > +static int __init selnl_init(void)
> > > +{
> > > +	if (register_pernet_subsys(&selnl_net_ops))
> > > +		panic("Could not register selinux netlink
> > > operations\n");
> > >  	return 0;
> > >  }
> > 
> > This doesn't seem right to me.  If the socket is only used to send
> > notifications to userspace, then every net_ns doesn't need a socket,
> > only the first netns that the selinux ns was associated, right?
> 
> What does "the first netns that the selinux ns was associated" mean?
> We could unshare them in any order; in the sample command sequence I
> gave in the patch description for "selinux: add a selinuxfs interface
> to unshare selinux namespace", I unshared the SELinux namespace first,
> then the network namespace, but we could just as easily do it in the
> reverse order (or at the same time if unshare(2) supported that).  So
> you can't assume that the network namespace in which you are running at
> the time you unshare selinux namespace is the right one, nor that the
> first unshare of the network namespace after unsharing the selinux
> namespace is the right one (not that we even have a way to catch that
> currently).
> 
> > So long as there is a way to find the netns to which an selinux ns
> > is tied, a userspace logger could even setns into that netns to
> > listen
> > for updates, if it wasn't certain to be in the right ns when it ran.
> > 
> > Otherwise (I haven't peeked ahead) you'll have to keep the *list* of
> > net_ns which live in a given selinuxfs and copy all messages to all
> > of
> > those namesapces?
> 
> No, we only deliver to the network namespace of the process that
> performed the setenforce or policy load (most commonly init, could also

(Oops, I see I never replied to this,)

I see - sounds good then, thanks.
diff mbox

Patch

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 57faa37..e4dd04a 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -149,6 +149,9 @@  struct net {
 #endif
 	struct sock		*diag_nlsk;
 	atomic_t		fnhe_genid;
+#if IS_ENABLED(CONFIG_SECURITY_SELINUX)
+	struct sock		*selnl;
+#endif
 } __randomize_layout;
 
 #include <linux/seq_file_net.h>
diff --git a/security/selinux/netlink.c b/security/selinux/netlink.c
index 828fb6a..679616b 100644
--- a/security/selinux/netlink.c
+++ b/security/selinux/netlink.c
@@ -22,8 +22,6 @@ 
 
 #include "security.h"
 
-static struct sock *selnl;
-
 static int selnl_msglen(int msgtype)
 {
 	int ret = 0;
@@ -69,6 +67,7 @@  static void selnl_add_payload(struct nlmsghdr *nlh, int len, int msgtype, void *
 
 static void selnl_notify(int msgtype, void *data)
 {
+	struct sock *selnl = current->nsproxy->net_ns->selnl;
 	int len;
 	sk_buff_data_t tmp;
 	struct sk_buff *skb;
@@ -108,16 +107,36 @@  void selnl_notify_policyload(u32 seqno)
 	selnl_notify(SELNL_MSG_POLICYLOAD, &seqno);
 }
 
-static int __init selnl_init(void)
+static int __net_init selnl_net_init(struct net *net)
 {
+	struct sock *sk;
 	struct netlink_kernel_cfg cfg = {
 		.groups	= SELNLGRP_MAX,
 		.flags	= NL_CFG_F_NONROOT_RECV,
 	};
 
-	selnl = netlink_kernel_create(&init_net, NETLINK_SELINUX, &cfg);
-	if (selnl == NULL)
-		panic("SELinux:  Cannot create netlink socket.");
+	sk = netlink_kernel_create(net, NETLINK_SELINUX, &cfg);
+	if (!sk)
+		return -ENOMEM;
+	net->selnl = sk;
+	return 0;
+}
+
+static void __net_exit selnl_net_exit(struct net *net)
+{
+	netlink_kernel_release(net->selnl);
+	net->selnl = NULL;
+}
+
+static struct pernet_operations selnl_net_ops = {
+	.init = selnl_net_init,
+	.exit = selnl_net_exit,
+};
+
+static int __init selnl_init(void)
+{
+	if (register_pernet_subsys(&selnl_net_ops))
+		panic("Could not register selinux netlink operations\n");
 	return 0;
 }