Message ID | 20171002155825.28620-5-sds@tycho.nsa.gov (mailing list archive) |
---|---|
State | RFC |
Headers | show |
On Mon, Oct 02, 2017 at 11:58:19AM -0400, Stephen Smalley wrote: > The selinux netlink socket is used to notify userspace of changes to > the enforcing mode and policy reloads. At present, these notifications > are always sent to the initial network namespace. In order to support > multiple selinux namespaces, each with its own enforcing mode and > policy, we need to create and use a separate selinux netlink socket > for each network namespace. ... > +static int __init selnl_init(void) > +{ > + if (register_pernet_subsys(&selnl_net_ops)) > + panic("Could not register selinux netlink operations\n"); > return 0; > } This doesn't seem right to me. If the socket is only used to send notifications to userspace, then every net_ns doesn't need a socket, only the first netns that the selinux ns was associated, right? So long as there is a way to find the netns to which an selinux ns is tied, a userspace logger could even setns into that netns to listen for updates, if it wasn't certain to be in the right ns when it ran. Otherwise (I haven't peeked ahead) you'll have to keep the *list* of net_ns which live in a given selinuxfs and copy all messages to all of those namesapces? -serge
On Thu, 2017-10-05 at 00:47 -0500, Serge E. Hallyn wrote: > On Mon, Oct 02, 2017 at 11:58:19AM -0400, Stephen Smalley wrote: > > The selinux netlink socket is used to notify userspace of changes > > to > > the enforcing mode and policy reloads. At present, these > > notifications > > are always sent to the initial network namespace. In order to > > support > > multiple selinux namespaces, each with its own enforcing mode and > > policy, we need to create and use a separate selinux netlink socket > > for each network namespace. > > ... > > > +static int __init selnl_init(void) > > +{ > > + if (register_pernet_subsys(&selnl_net_ops)) > > + panic("Could not register selinux netlink > > operations\n"); > > return 0; > > } > > This doesn't seem right to me. If the socket is only used to send > notifications to userspace, then every net_ns doesn't need a socket, > only the first netns that the selinux ns was associated, right? What does "the first netns that the selinux ns was associated" mean? We could unshare them in any order; in the sample command sequence I gave in the patch description for "selinux: add a selinuxfs interface to unshare selinux namespace", I unshared the SELinux namespace first, then the network namespace, but we could just as easily do it in the reverse order (or at the same time if unshare(2) supported that). So you can't assume that the network namespace in which you are running at the time you unshare selinux namespace is the right one, nor that the first unshare of the network namespace after unsharing the selinux namespace is the right one (not that we even have a way to catch that currently). > So long as there is a way to find the netns to which an selinux ns > is tied, a userspace logger could even setns into that netns to > listen > for updates, if it wasn't certain to be in the right ns when it ran. > > Otherwise (I haven't peeked ahead) you'll have to keep the *list* of > net_ns which live in a given selinuxfs and copy all messages to all > of > those namesapces? No, we only deliver to the network namespace of the process that performed the setenforce or policy load (most commonly init, could also be an admin running a management command or installing a policy rpm). We assume the container runtime properly handles unsharing of the mount, network, and selinux namespaces before launching the container init. A container process that subsequently unshares its network namespace won't see notifications for any subsequent policy reloads or setenforce calls. I don't know if that will prove to be a problem in practice.
On Thu, 2017-10-05 at 10:06 -0400, Stephen Smalley wrote: > On Thu, 2017-10-05 at 00:47 -0500, Serge E. Hallyn wrote: > > On Mon, Oct 02, 2017 at 11:58:19AM -0400, Stephen Smalley wrote: > > > The selinux netlink socket is used to notify userspace of changes > > > to > > > the enforcing mode and policy reloads. At present, these > > > notifications > > > are always sent to the initial network namespace. In order to > > > support > > > multiple selinux namespaces, each with its own enforcing mode and > > > policy, we need to create and use a separate selinux netlink > > > socket > > > for each network namespace. > > > > ... > > > > > +static int __init selnl_init(void) > > > +{ > > > + if (register_pernet_subsys(&selnl_net_ops)) > > > + panic("Could not register selinux netlink > > > operations\n"); > > > return 0; > > > } > > > > This doesn't seem right to me. If the socket is only used to send > > notifications to userspace, then every net_ns doesn't need a > > socket, > > only the first netns that the selinux ns was associated, right? > > What does "the first netns that the selinux ns was associated" mean? > We could unshare them in any order; in the sample command sequence I > gave in the patch description for "selinux: add a selinuxfs interface > to unshare selinux namespace", I unshared the SELinux namespace > first, > then the network namespace, but we could just as easily do it in the > reverse order (or at the same time if unshare(2) supported that). So > you can't assume that the network namespace in which you are running > at > the time you unshare selinux namespace is the right one, nor that the > first unshare of the network namespace after unsharing the selinux > namespace is the right one (not that we even have a way to catch that > currently). > > > So long as there is a way to find the netns to which an selinux ns > > is tied, a userspace logger could even setns into that netns to > > listen > > for updates, if it wasn't certain to be in the right ns when it > > ran. > > > > Otherwise (I haven't peeked ahead) you'll have to keep the *list* > > of > > net_ns which live in a given selinuxfs and copy all messages to all > > of > > those namesapces? > > No, we only deliver to the network namespace of the process that > performed the setenforce or policy load (most commonly init, could > also > be an admin running a management command or installing a policy > rpm). > We assume the container runtime properly handles unsharing of the > mount, network, and selinux namespaces before launching the container > init. A container process that subsequently unshares its network > namespace won't see notifications for any subsequent policy reloads > or > setenforce calls. I don't know if that will prove to be a problem in > practice. It should be noted however that this wouldn't be a regression, since today the netlink notifications are only delivered to the init network namespace.
On Mon, 2 Oct 2017, Stephen Smalley wrote: > This change presumes that one will always unshare the network namespace > when unsharing a new selinux namespace (the reverse is not required). > Otherwise, the same inconsistencies could arise between the notifications > and the relevant policy. At present, nothing enforces this guarantee > at the kernel level; it is left up to userspace (e.g. container runtimes). > It is an open question as to whether this is a good idea or whether > unsharing of the selinux namespace should automatically unshare the network > namespace. What about logging a kernel warning if just SELinux is unshared? I think we want to avoid surprising the user by unsharing things for them, and yes, it will be possible to mess your system up if you configure it badly. > However, keeping them separate is consistent with the handling > of the mount namespace currently, which also should be unshared so that > a private selinuxfs mount can be created. Right, and this will in practice always be automated and abstracted from an end user pov.
On Fri, 2017-10-06 at 12:07 +1100, James Morris wrote: > On Mon, 2 Oct 2017, Stephen Smalley wrote: > > > This change presumes that one will always unshare the network > > namespace > > when unsharing a new selinux namespace (the reverse is not > > required). > > Otherwise, the same inconsistencies could arise between the > > notifications > > and the relevant policy. At present, nothing enforces this > > guarantee > > at the kernel level; it is left up to userspace (e.g. container > > runtimes). > > It is an open question as to whether this is a good idea or whether > > unsharing of the selinux namespace should automatically unshare the > > network > > namespace. > > What about logging a kernel warning if just SELinux is unshared? As with Serge's suggestion, the problem is that one can unshare them in any order, and potentially with intervening steps to set up the namespace or prepare for doing so, so there is no obvious point where you could detect and issue such a warning. Without an interface that allows unsharing them both simultaneously (either unshare(2)-based or selinuxfs-based), I don't think we can provide such a warning. I don't think it will prove to be a problem in practice however; container runtimes just need to do the right thing (and we can help this by providing helpers in libselinux or the like). The larger concern is not that we'll forget to unshare the network namespace when we unshare the selinux namespace, but that subsequent further unsharing of the network namespace by itself could cause lossage of notifications. The two cases of concern are that a process unshares its network namespace again (after the original unsharing of both selinux namespace and network namespace for the container creation) and subsequently: 1) Does not get any netlink notifications of setenforce or policy load events for its selinux namespace. This is only an issue if a program that uses the userspace AVC also unshares its network namespace or otherwise is launched into its own network namespace separate from that of its container. And it isn't a regression, since before this change notifications would only be sent to the init network namespace ever, so this change actually represents an improvement in the ability to at least get notifications when running in the container's network namespace. 2) Sets enforcing mode or loads policy itself, in which case the notification for its setenforce or load_policy will only go to its network namespace and will not be received by other processes in the same selinux namespace. This is only an issue if a process running in a separate network namespace from that of its container sets enforcing mode or loads policy. This seems unlikely to me, since such setting of enforcing mode or loading of policy will conventionally be restricted to a small set of privileged processes, such as the container init process, administrator shells, and package installation/updates, and I wouldn't expect them to run in a separate network namespace than their container. > > I think we want to avoid surprising the user by unsharing things for > them, > and yes, it will be possible to mess your system up if you configure > it > badly. > > > However, keeping them separate is consistent with the handling > > of the mount namespace currently, which also should be unshared so > > that > > a private selinuxfs mount can be created. > > Right, and this will in practice always be automated and abstracted > from > an end user pov.
Quoting Stephen Smalley (sds@tycho.nsa.gov): > On Fri, 2017-10-06 at 12:07 +1100, James Morris wrote: > > On Mon, 2 Oct 2017, Stephen Smalley wrote: > > > > > This change presumes that one will always unshare the network > > > namespace > > > when unsharing a new selinux namespace (the reverse is not > > > required). > > > Otherwise, the same inconsistencies could arise between the > > > notifications > > > and the relevant policy. At present, nothing enforces this > > > guarantee > > > at the kernel level; it is left up to userspace (e.g. container > > > runtimes). > > > It is an open question as to whether this is a good idea or whether > > > unsharing of the selinux namespace should automatically unshare the > > > network > > > namespace. > > > > What about logging a kernel warning if just SELinux is unshared? > > As with Serge's suggestion, the problem is that one can unshare them in > any order, and potentially with intervening steps to set up the But is it going to stay that way? I thought that was a temporary thing while you test it out. Because as it is it seems unacceptably loose. A few questions: (thinking as I type here - ok only the last one is the real question) Assume I'm in my given netns. I unshare just my selinuxns. Does the selinuxns have the authority to decide things about that netns? If I unshare selinuxns+netns, then the selinuxns clearly "owns" the netns, so the selinuxns has clear authority. Likewise, when I unshare the netns after the selinuxns, from that point on the selinuxns can be said to have authority over the netns. I think you want to keep it completely orthogonal, and I guess so long as the parent selinuxns policy still applies it's ok. So I unshare my selinuxns but not userns. I don't have CAP_MAC_ADMIN against the user_ns, so any selinux related changes will be denied. Once I unshare userns, they will be allowed. Ah, right. Here's the real question. Policy dictates file transition rules. How will those be namespaced? Will only the init_selinux_ns be allowed to specify a file context? You can't let ns_capable(current_selinux_ns, CAP_MAC_ADMIN) guide that safely, but capable(CAP_MAC_ADMIN) will be very restrictive. What's the plan there? Sorry if it's spelled out elsewhere. -serge
On Fri, 2017-10-06 at 14:24 -0500, Serge E. Hallyn wrote: > Quoting Stephen Smalley (sds@tycho.nsa.gov): > > On Fri, 2017-10-06 at 12:07 +1100, James Morris wrote: > > > On Mon, 2 Oct 2017, Stephen Smalley wrote: > > > > > > > This change presumes that one will always unshare the network > > > > namespace > > > > when unsharing a new selinux namespace (the reverse is not > > > > required). > > > > Otherwise, the same inconsistencies could arise between the > > > > notifications > > > > and the relevant policy. At present, nothing enforces this > > > > guarantee > > > > at the kernel level; it is left up to userspace (e.g. container > > > > runtimes). > > > > It is an open question as to whether this is a good idea or > > > > whether > > > > unsharing of the selinux namespace should automatically unshare > > > > the > > > > network > > > > namespace. > > > > > > What about logging a kernel warning if just SELinux is unshared? > > > > As with Serge's suggestion, the problem is that one can unshare > > them in > > any order, and potentially with intervening steps to set up the > > But is it going to stay that way? I thought that was a temporary > thing > while you test it out. > > Because as it is it seems unacceptably loose. A few questions: > > (thinking as I type here - ok only the last one is the real question) > > Assume I'm in my given netns. I unshare just my selinuxns. Does the > selinuxns have the authority to decide things about that netns? > > If I unshare selinuxns+netns, then the selinuxns clearly "owns" the > netns, > so the selinuxns has clear authority. Likewise, when I unshare the > netns after the selinuxns, from that point on the selinuxns can be > said > to have authority over the netns. > > I think you want to keep it completely orthogonal, and I guess so > long > as the parent selinuxns policy still applies it's ok. > > So I unshare my selinuxns but not userns. I don't have CAP_MAC_ADMIN > against the user_ns, so any selinux related changes will be denied. > Once I unshare userns, they will be allowed. > > Ah, right. Here's the real question. Policy dictates file > transition > rules. How will those be namespaced? Will only the init_selinux_ns > be > allowed to specify a file context? You can't let > ns_capable(current_selinux_ns, > CAP_MAC_ADMIN) guide that safely, but capable(CAP_MAC_ADMIN) will be > very restrictive. What's the plan there? Sorry if it's spelled out > elsewhere. I don't think we know yet. The current patches support a separate and distinct in-core inode security context for each namespace, which supports scenarios such as using context mounts on the host OS to label each container with a single context with MCS separation while using per-file xattrs within a container to support a conventional targeted policy, but we haven't yet resolved how to deal with the actual persistent file security contexts, e.g. whether there will be only one (possibly with a label mapping defined) or multiple.
On Thu, Oct 05, 2017 at 10:06:55AM -0400, Stephen Smalley wrote: > On Thu, 2017-10-05 at 00:47 -0500, Serge E. Hallyn wrote: > > On Mon, Oct 02, 2017 at 11:58:19AM -0400, Stephen Smalley wrote: > > > The selinux netlink socket is used to notify userspace of changes > > > to > > > the enforcing mode and policy reloads. At present, these > > > notifications > > > are always sent to the initial network namespace. In order to > > > support > > > multiple selinux namespaces, each with its own enforcing mode and > > > policy, we need to create and use a separate selinux netlink socket > > > for each network namespace. > > > > ... > > > > > +static int __init selnl_init(void) > > > +{ > > > + if (register_pernet_subsys(&selnl_net_ops)) > > > + panic("Could not register selinux netlink > > > operations\n"); > > > return 0; > > > } > > > > This doesn't seem right to me. If the socket is only used to send > > notifications to userspace, then every net_ns doesn't need a socket, > > only the first netns that the selinux ns was associated, right? > > What does "the first netns that the selinux ns was associated" mean? > We could unshare them in any order; in the sample command sequence I > gave in the patch description for "selinux: add a selinuxfs interface > to unshare selinux namespace", I unshared the SELinux namespace first, > then the network namespace, but we could just as easily do it in the > reverse order (or at the same time if unshare(2) supported that). So > you can't assume that the network namespace in which you are running at > the time you unshare selinux namespace is the right one, nor that the > first unshare of the network namespace after unsharing the selinux > namespace is the right one (not that we even have a way to catch that > currently). > > > So long as there is a way to find the netns to which an selinux ns > > is tied, a userspace logger could even setns into that netns to > > listen > > for updates, if it wasn't certain to be in the right ns when it ran. > > > > Otherwise (I haven't peeked ahead) you'll have to keep the *list* of > > net_ns which live in a given selinuxfs and copy all messages to all > > of > > those namesapces? > > No, we only deliver to the network namespace of the process that > performed the setenforce or policy load (most commonly init, could also (Oops, I see I never replied to this,) I see - sounds good then, thanks.
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index 57faa37..e4dd04a 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -149,6 +149,9 @@ struct net { #endif struct sock *diag_nlsk; atomic_t fnhe_genid; +#if IS_ENABLED(CONFIG_SECURITY_SELINUX) + struct sock *selnl; +#endif } __randomize_layout; #include <linux/seq_file_net.h> diff --git a/security/selinux/netlink.c b/security/selinux/netlink.c index 828fb6a..679616b 100644 --- a/security/selinux/netlink.c +++ b/security/selinux/netlink.c @@ -22,8 +22,6 @@ #include "security.h" -static struct sock *selnl; - static int selnl_msglen(int msgtype) { int ret = 0; @@ -69,6 +67,7 @@ static void selnl_add_payload(struct nlmsghdr *nlh, int len, int msgtype, void * static void selnl_notify(int msgtype, void *data) { + struct sock *selnl = current->nsproxy->net_ns->selnl; int len; sk_buff_data_t tmp; struct sk_buff *skb; @@ -108,16 +107,36 @@ void selnl_notify_policyload(u32 seqno) selnl_notify(SELNL_MSG_POLICYLOAD, &seqno); } -static int __init selnl_init(void) +static int __net_init selnl_net_init(struct net *net) { + struct sock *sk; struct netlink_kernel_cfg cfg = { .groups = SELNLGRP_MAX, .flags = NL_CFG_F_NONROOT_RECV, }; - selnl = netlink_kernel_create(&init_net, NETLINK_SELINUX, &cfg); - if (selnl == NULL) - panic("SELinux: Cannot create netlink socket."); + sk = netlink_kernel_create(net, NETLINK_SELINUX, &cfg); + if (!sk) + return -ENOMEM; + net->selnl = sk; + return 0; +} + +static void __net_exit selnl_net_exit(struct net *net) +{ + netlink_kernel_release(net->selnl); + net->selnl = NULL; +} + +static struct pernet_operations selnl_net_ops = { + .init = selnl_net_init, + .exit = selnl_net_exit, +}; + +static int __init selnl_init(void) +{ + if (register_pernet_subsys(&selnl_net_ops)) + panic("Could not register selinux netlink operations\n"); return 0; }
The selinux netlink socket is used to notify userspace of changes to the enforcing mode and policy reloads. At present, these notifications are always sent to the initial network namespace. In order to support multiple selinux namespaces, each with its own enforcing mode and policy, we need to create and use a separate selinux netlink socket for each network namespace. Without this change, a policy reload in a child selinux namespace causes a notification to be sent to processes in the init namespace with a sequence number that may be higher than the policy sequence number for that namespace. As a result, userspace AVC instances in the init namespace will then end up rejecting any further access vector results from its own security server instance due to the policy sequence number appearing to regress, which in turn causes all subsequent uncached access checks to fail. Similarly, without this change, changing enforcing mode in the child selinux namespace triggers a notification to all userspace AVC instances in the init namespace that will switch their enforcing modes. This change does alter SELinux behavior, since previously reloading policy or changing enforcing mode in a non-init network namespace would trigger a notification to processes in the init network namespace. However, this behavior is not being relied upon by existing userspace AFAICT and is arguably wrong regardless. This change presumes that one will always unshare the network namespace when unsharing a new selinux namespace (the reverse is not required). Otherwise, the same inconsistencies could arise between the notifications and the relevant policy. At present, nothing enforces this guarantee at the kernel level; it is left up to userspace (e.g. container runtimes). It is an open question as to whether this is a good idea or whether unsharing of the selinux namespace should automatically unshare the network namespace. However, keeping them separate is consistent with the handling of the mount namespace currently, which also should be unshared so that a private selinuxfs mount can be created. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> --- include/net/net_namespace.h | 3 +++ security/selinux/netlink.c | 31 +++++++++++++++++++++++++------ 2 files changed, 28 insertions(+), 6 deletions(-)