Message ID | 20171221210605.181720-1-zenczykowski@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Maciej Żenczykowski <zenczykowski@gmail.com> writes: > From: Maciej Żenczykowski <maze@google.com> > > This allows locking down user namespaces tighter, > and it could even be considered a security fix. No. This makes no logical sense. A task that enters a user namespace loses all capabilities to everything outside of the user namespace. Capabilities inside a user namespace are only valid for objects created inside that user namespace. So limiting capabilities inside a user namespace when the capability bounding set is already fully honored by not giving the processes any of those capabilities makes no logical sense. If the concern is kernel attack surface versus logical permissions we can look at ways to reduce the attack surface but that needs to be fully discussed in the change log. > Signed-off-by: Maciej Żenczykowski <maze@google.com> > --- > kernel/user_namespace.c | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c > index 246d4d4ce5c7..2354f7ade78a 100644 > --- a/kernel/user_namespace.c > +++ b/kernel/user_namespace.c > @@ -50,11 +50,12 @@ static void set_cred_user_ns(struct cred *cred, struct user_namespace *user_ns) > * anything as the capabilities are bound to the new user namespace. > */ > cred->securebits = SECUREBITS_DEFAULT; > + cred->cap_bset = task_no_new_privs(current) ? current_cred()->cap_bset > + : CAP_FULL_SET; > cred->cap_inheritable = CAP_EMPTY_SET; > - cred->cap_permitted = CAP_FULL_SET; > - cred->cap_effective = CAP_FULL_SET; > + cred->cap_permitted = cred->cap_bset; > + cred->cap_effective = cred->cap_bset; > cred->cap_ambient = CAP_EMPTY_SET; > - cred->cap_bset = CAP_FULL_SET; > #ifdef CONFIG_KEYS > key_put(cred->request_key_auth); > cred->request_key_auth = NULL; -- To unsubscribe from this list: send the line "unsubscribe linux-security-module" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Dec 21, 2017 at 10:44 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > No. This makes no logical sense. > > A task that enters a user namespace loses all capabilities to everything > outside of the user namespace. Capabilities inside a user namespace are > only valid for objects created inside that user namespace. > > So limiting capabilities inside a user namespace when the capability > bounding set is already fully honored by not giving the processes any of > those capabilities makes no logical sense. > > If the concern is kernel attack surface versus logical permissions we > can look at ways to reduce the attack surface but that needs to be fully > discussed in the change log. Here's an example of using user namespaces to read a file you shouldn't be able to. lpk19:~# uname -r 4.15.0-smp-d1ce8ceb8ba8 (we start as true global root) lpk19:~# id uid=0(root) gid=0(root) groups=0(root) (cleanup after previous run) lpk19:~# cd /; chattr -i /immu; rm -f /immu/log; rmdir /immu (now we create an append only logfile owned by target user:group) lpk19:~# cd /; mkdir /immu; touch /immu/log; chown produser:prod /immu/log; chmod a-rwx,u+w /immu/log; chattr +a /immu/log (let's show what things look like) lpk19:~# chattr +i /immu; ls -ld / /immu /immu/log; lsattr -d / /immu /immu/log drwxr-xr-x 22 root root 4096 Dec 21 16:33 / drwxr-xr-x 2 root root 4096 Dec 21 16:23 /immu --w------- 1 produser prod 0 Dec 21 16:23 /immu/log -----------I--e---- / ----i---------e---- /immu -----a--------e---- /immu/log (the immutable bit prevents us from changing permissions on the file) lpk19:/# chmod a+rwx /immu/log chmod: changing permissions of '/immu/log': Operation not permitted (the append only bit prevents us from simply overwriting the file) lpk19:/# echo log1 > /immu/log -bash: /immu/log: Operation not permitted (but we can append to it) lpk19:/# echo log1 >> /immu/log (we're global root with CAP_DAC_OVERRIDE, so we can *still* read it) lpk19:/# cat /immu/log log1 (let's transition to target user) lpk19:/# su - produser produser@lpk19:~$ id uid=2080(produser) gid=620(prod) groups=620(prod) (we can't overwrite it) produser@lpk19:~$ echo log2 > /immu/log -su: /immu/log: Operation not permitted (but we can log to it: as intended) produser@lpk19:~$ echo log2 >> /immu/log (we can't change its permissions, cause it's in an immutable directory) produser@lpk19:~$ chmod u+r /immu/log chmod: changing permissions of '/immu/log': Operation not permitted (we can't dump the file, cause we don't have CAP_DAC_OVERRIDE) produser@lpk19:~$ cat /immu/log cat: /immu/log: Permission denied (or can we?) produser@lpk19:~$ unshare -U -r cat /immu/log log1 log2 ---- Now, of course, the above patch doesn't actually fix this on it's own, since 'su' doesn't (yet?) know to restrict bset or to set no_new_privs. But: it allows the sandbox equivalent of su to drop CAP_DAC_OVERRIDE from it's inh/eff/perm/ambient/bset, and set no_new_privs. Now the unshare won't gain CAP_DAC_OVERRIDE and won't be able to cat the non-readable append-only log file. IMHO the point of having a capability bounding set and/or no_new_privs is to never be able to regain capabilities. Note also that 'no_new_privs' isn't cleared across a unshare(CLONE_NEWUSER) [presumably also applies to setns()]. We can of course argue the implementation details (for example instead of using the existing no_new_privs flag, add a new keep_bset_across_userns_transitions securebits flag)... but *something* has to be done. - Maciej -- To unsubscribe from this list: send the line "unsubscribe linux-security-module" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Maciej Żenczykowski <zenczykowski@gmail.com> writes: > On Thu, Dec 21, 2017 at 10:44 PM, Eric W. Biederman > <ebiederm@xmission.com> wrote: >> No. This makes no logical sense. >> >> A task that enters a user namespace loses all capabilities to everything >> outside of the user namespace. Capabilities inside a user namespace are >> only valid for objects created inside that user namespace. >> >> So limiting capabilities inside a user namespace when the capability >> bounding set is already fully honored by not giving the processes any of >> those capabilities makes no logical sense. >> >> If the concern is kernel attack surface versus logical permissions we >> can look at ways to reduce the attack surface but that needs to be fully >> discussed in the change log. > > Here's an example of using user namespaces to read a file you > shouldn't be able to. > > lpk19:~# uname -r > 4.15.0-smp-d1ce8ceb8ba8 > > (we start as true global root) > lpk19:~# id > uid=0(root) gid=0(root) groups=0(root) > > (cleanup after previous run) > lpk19:~# cd /; chattr -i /immu; rm -f /immu/log; rmdir /immu > > (now we create an append only logfile owned by target user:group) > lpk19:~# cd /; mkdir /immu; touch /immu/log; chown produser:prod > /immu/log; chmod a-rwx,u+w /immu/log; chattr +a /immu/log > > (let's show what things look like) > lpk19:~# chattr +i /immu; ls -ld / /immu /immu/log; lsattr -d / /immu /immu/log > drwxr-xr-x 22 root root 4096 Dec 21 16:33 / > drwxr-xr-x 2 root root 4096 Dec 21 16:23 /immu > --w------- 1 produser prod 0 Dec 21 16:23 /immu/log > -----------I--e---- / > ----i---------e---- /immu > -----a--------e---- /immu/log > > (the immutable bit prevents us from changing permissions on the file) > lpk19:/# chmod a+rwx /immu/log > chmod: changing permissions of '/immu/log': Operation not permitted > > (the append only bit prevents us from simply overwriting the file) > lpk19:/# echo log1 > /immu/log > -bash: /immu/log: Operation not permitted > > (but we can append to it) > lpk19:/# echo log1 >> /immu/log > > (we're global root with CAP_DAC_OVERRIDE, so we can *still* read it) > lpk19:/# cat /immu/log > log1 > > (let's transition to target user) > lpk19:/# su - produser > > produser@lpk19:~$ id > uid=2080(produser) gid=620(prod) groups=620(prod) > > (we can't overwrite it) > produser@lpk19:~$ echo log2 > /immu/log > -su: /immu/log: Operation not permitted > > (but we can log to it: as intended) > produser@lpk19:~$ echo log2 >> /immu/log > > (we can't change its permissions, cause it's in an immutable directory) > produser@lpk19:~$ chmod u+r /immu/log > chmod: changing permissions of '/immu/log': Operation not permitted > > (we can't dump the file, cause we don't have CAP_DAC_OVERRIDE) > produser@lpk19:~$ cat /immu/log > cat: /immu/log: Permission denied > > (or can we?) > produser@lpk19:~$ unshare -U -r cat /immu/log > log1 > log2 > > ---- > > Now, of course, the above patch doesn't actually fix this on it's own, > since 'su' doesn't (yet?) know to restrict bset or to set > no_new_privs. > > But: it allows the sandbox equivalent of su to drop CAP_DAC_OVERRIDE > from it's inh/eff/perm/ambient/bset, and set no_new_privs. > Now the unshare won't gain CAP_DAC_OVERRIDE and won't be able to cat > the non-readable append-only log file. > > IMHO the point of having a capability bounding set and/or no_new_privs > is to never be able to regain capabilities. > Note also that 'no_new_privs' isn't cleared across a > unshare(CLONE_NEWUSER) [presumably also applies to setns()]. > > We can of course argue the implementation details (for example instead > of using the existing no_new_privs flag, add a new > keep_bset_across_userns_transitions securebits flag)... but > *something* has to be done. Good point about CAP_DAC_OVERRIDE on files you own. I think there is an argument that you are playing dangerous games with the permission system there, as it isn't effectively a file you own if you can't read it, and you can't change it's permissions. Given little things like that I can completely see no_new_privs meaning you can't create a user namespace. That seems consistent with the meaning and philosophy of no_new_privs. So simple it is hard to get wrong. We could do more clever things like plug this whole in user namespaces, and that would not hurt my feelings. However unless that is our only choice to avoid badly breaking userspace I would have to have to depend on user namespaces being perfect for no_new_privs to be a proper jail. As a general rule user namespaces are where we tackle the subtle scary things that should work, and no_new_privs is where we implement a simple hard to get wrong jail. Most of the time the effect is the same to an outside observer (bounded permissions), but there is a real difference in difficulty of implementation. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-security-module" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> Good point about CAP_DAC_OVERRIDE on files you own. > > I think there is an argument that you are playing dangerous games with > the permission system there, as it isn't effectively a file you own if > you can't read it, and you can't change it's permissions. Append-only files are useful - particularly for logging. It could also simply be a non-readable file on a R/O filesystem. > Given little things like that I can completely see no_new_privs meaning > you can't create a user namespace. That seems consistent with the > meaning and philosophy of no_new_privs. So simple it is hard to get > wrong. Yes, I could totally buy the argument that no_new_privs should prevent creating a user ns. However, there's also setns() and that's a fair bit harder to reason about. Entirely deny it? But that actually seems potentially useful... Allow it but cap it? That's what this does... > We could do more clever things like plug this whole in user namespaces, > and that would not hurt my feelings. Sure, this particular one wouldn't be all that easy I think... and how many such holes are there? I found this particular one *after* your first reply in this thread. > However unless that is our only > choice to avoid badly breaking userspace I would have to have to depend > on user namespaces being perfect for no_new_privs to be a proper jail. This stuff is ridiculously complex to get right from userspace. :-( > As a general rule user namespaces are where we tackle the subtle scary > things that should work, and no_new_privs is where we implement a simple > hard to get wrong jail. Most of the time the effect is the same to an > outside observer (bounded permissions), but there is a real difference > in difficulty of implementation. So, where to now... Would you accept patches that: - make no_new_priv block user ns creation? - make no_new_priv block user ns transition? Or perhaps we can assume that lack of create privs is sufficient, and if there's a pre-existing user ns for you to enter, then that's acceptable... Although this implies you probably always want to combine no_new_privs with a leaf user ns, or no_new_privs isn't all that useful for root in root ns... This added complexity, probably means it should be blocked... - inherits bset across user ns creation/transition based on X? [this is the one we care about, because there are simply too many bugs in the kernel wrt. certain caps] X could be: - a new flag similar to no_new_priv - a new securebit flag (w/lockbit) [provided securebits survive a userns transition, haven't checked] - or perhaps a new capability - something else? How do we make forward progress? -- To unsubscribe from this list: send the line "unsubscribe linux-security-module" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2017-12-21, Eric W. Biederman <ebiederm@xmission.com> wrote: > Good point about CAP_DAC_OVERRIDE on files you own. > > I think there is an argument that you are playing dangerous games with > the permission system there, as it isn't effectively a file you own if > you can't read it, and you can't change it's permissions. This problem reminds me of the whole "unmapped group" problem. If you have access to a file through an unmapped group you can still access a file -- which to me is wrong. I understand the need for checking unmapped groups in order to fix the "chmod 707" problem, but I think that unmapped groups should only *block* access and never *grant* it. I was working on a patch for that issue a while ago but it touched more VFS than I was comfortable with. Eric, is that a fix you would be interested in?
Maciej Żenczykowski <zenczykowski@gmail.com> writes: >> Good point about CAP_DAC_OVERRIDE on files you own. >> >> I think there is an argument that you are playing dangerous games with >> the permission system there, as it isn't effectively a file you own if >> you can't read it, and you can't change it's permissions. > > Append-only files are useful - particularly for logging. > It could also simply be a non-readable file on a R/O filesystem. > >> Given little things like that I can completely see no_new_privs meaning >> you can't create a user namespace. That seems consistent with the >> meaning and philosophy of no_new_privs. So simple it is hard to get >> wrong. > > Yes, I could totally buy the argument that no_new_privs should prevent > creating a user ns. > > However, there's also setns() and that's a fair bit harder to reason about. > Entirely deny it? But that actually seems potentially useful... > Allow it but cap it? That's what this does... > >> We could do more clever things like plug this whole in user namespaces, >> and that would not hurt my feelings. > > Sure, this particular one wouldn't be all that easy I think... and how > many such holes are there? > I found this particular one *after* your first reply in this thread. > >> However unless that is our only >> choice to avoid badly breaking userspace I would have to have to depend >> on user namespaces being perfect for no_new_privs to be a proper jail. > > This stuff is ridiculously complex to get right from userspace. :-( >> As a general rule user namespaces are where we tackle the subtle scary >> things that should work, and no_new_privs is where we implement a simple >> hard to get wrong jail. Most of the time the effect is the same to an >> outside observer (bounded permissions), but there is a real difference >> in difficulty of implementation. > > So, where to now... > > Would you accept patches that: > > - make no_new_priv block user ns creation? > > - make no_new_priv block user ns transition? Yes. The approach will need to be rethought if there is anything deliberately combining user namespaces and no_new_privs. As regressions are a no-no. So we need wide spread testing, to avoid that. But as much as possible I want no_new_privs to be simple and doing it's job. I will also take and encourage patches that close this minor privilege escalation from the user namespace side. As ideally creating a user namespace should be as safe as no_new_privs. > Or perhaps we can assume that lack of create privs is sufficient, and > if there's a pre-existing user ns for you to enter, then that's > acceptable... > Although this implies you probably always want to combine no_new_privs > with a leaf user ns, or no_new_privs isn't all that useful for root in > root ns... > This added complexity, probably means it should be blocked... Yes. > - inherits bset across user ns creation/transition based on X? > [this is the one we care about, because there are simply too many bugs > in the kernel wrt. certain caps] That was my suspicion, and attack surface reduction is a different discussion. Would no_new_privs preventing a userns transition be enough for the cases you care about? Otherwise this is a different conversation because it is not about semantics but about making the code safer to use. In general if code is simply not safe to user in a user namespace I would prefer to tighten the permission checks, and just not allow that code. Mostly what I have seen in previous conversations is simply concerns about code that is not used or needed, being a problem. > X could be: > - a new flag similar to no_new_priv > - a new securebit flag (w/lockbit) [provided securebits survive a > userns transition, haven't checked] > - or perhaps a new capability > - something else? > > How do we make forward progress? We start by causing no_new_privs to block userns creation and entering. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-security-module" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Aleksa Sarai <asarai@suse.de> writes: > On 2017-12-21, Eric W. Biederman <ebiederm@xmission.com> wrote: >> Good point about CAP_DAC_OVERRIDE on files you own. >> >> I think there is an argument that you are playing dangerous games with >> the permission system there, as it isn't effectively a file you own if >> you can't read it, and you can't change it's permissions. > > This problem reminds me of the whole "unmapped group" problem. If you > have access to a file through an unmapped group you can still access a > file -- which to me is wrong. I understand the need for checking > unmapped groups in order to fix the "chmod 707" problem, but I think > that unmapped groups should only *block* access and never *grant* it. > > I was working on a patch for that issue a while ago but it touched more > VFS than I was comfortable with. Eric, is that a fix you would be > interested in? I am not certain. I don't see how there is a problem with an unmapped group granting permissions. You are talking about a scenario where a more privileged login program set your groups, and uid and gid. The process despite being a user namespace does not have permission to transition them. As such I don't see the harm. But spell it out for me, and deal with ensuring we don't have user space regressions and I will take a patch that improves the security of user namespaces. I think the issue that raised all of this is that dropping a group can in rare instances increase permissions. I have heard people grumble at me that the way I handle it with /etc/subuid might break things on some people's systems. AKA don't allow it by default but allow root to configure a way for people using user namespaces to do that. I have yet to see someone come forward and say that is a problem in the real world. If it actually is a problem I want to hear about it. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-security-module" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Dec 22, 2017 at 08:08:04AM -0600, Eric W. Biederman wrote: > Maciej Żenczykowski <zenczykowski@gmail.com> writes: > > >> Good point about CAP_DAC_OVERRIDE on files you own. > >> > >> I think there is an argument that you are playing dangerous games with > >> the permission system there, as it isn't effectively a file you own if > >> you can't read it, and you can't change it's permissions. > > > > Append-only files are useful - particularly for logging. > > It could also simply be a non-readable file on a R/O filesystem. > > > >> Given little things like that I can completely see no_new_privs meaning > >> you can't create a user namespace. That seems consistent with the > >> meaning and philosophy of no_new_privs. So simple it is hard to get > >> wrong. > > > > Yes, I could totally buy the argument that no_new_privs should prevent > > creating a user ns. > > > > However, there's also setns() and that's a fair bit harder to reason about. > > Entirely deny it? But that actually seems potentially useful... > > Allow it but cap it? That's what this does... > > > >> We could do more clever things like plug this whole in user namespaces, > >> and that would not hurt my feelings. > > > > Sure, this particular one wouldn't be all that easy I think... and how > > many such holes are there? > > I found this particular one *after* your first reply in this thread. > > > >> However unless that is our only > >> choice to avoid badly breaking userspace I would have to have to depend > >> on user namespaces being perfect for no_new_privs to be a proper jail. > > > > This stuff is ridiculously complex to get right from userspace. :-( > > >> As a general rule user namespaces are where we tackle the subtle scary > >> things that should work, and no_new_privs is where we implement a simple > >> hard to get wrong jail. Most of the time the effect is the same to an > >> outside observer (bounded permissions), but there is a real difference > >> in difficulty of implementation. > > > > So, where to now... > > > > Would you accept patches that: > > > > - make no_new_priv block user ns creation? > > > > - make no_new_priv block user ns transition? > > Yes. > > The approach will need to be rethought if there is anything deliberately > combining user namespaces and no_new_privs. As regressions are a no-no. > So we need wide spread testing, to avoid that. Just to be clear: as soon as you set no_new_privs you should not be able to create new user namespace anymore and you shouldn't be able to attach to user namespaces anymore. This should work and not cause regressions for e.g. liblxc were we always set no_new_privs as one of the last steps and especially after we have clone(CLONE_NEWUSER) the container and - for the attach case - after we have already done the setns(fd, CLONE_NEWUSER). These use-cases should be preserved. But it seems to me that they will be. But I'd like to test this once patches are floating around. > > But as much as possible I want no_new_privs to be simple and doing it's > job. > > I will also take and encourage patches that close this minor privilege > escalation from the user namespace side. As ideally creating a user > namespace should be as safe as no_new_privs. > > > > Or perhaps we can assume that lack of create privs is sufficient, and > > if there's a pre-existing user ns for you to enter, then that's > > acceptable... > > Although this implies you probably always want to combine no_new_privs > > with a leaf user ns, or no_new_privs isn't all that useful for root in > > root ns... > > This added complexity, probably means it should be blocked... > > Yes. > > > - inherits bset across user ns creation/transition based on X? > > [this is the one we care about, because there are simply too many bugs > > in the kernel wrt. certain caps] > > That was my suspicion, and attack surface reduction is a different > discussion. Would no_new_privs preventing a userns transition be enough > for the cases you care about? > > Otherwise this is a different conversation because it is not about > semantics but about making the code safer to use. In general if code is > simply not safe to user in a user namespace I would prefer to tighten > the permission checks, and just not allow that code. > > Mostly what I have seen in previous conversations is simply concerns > about code that is not used or needed, being a problem. > > > X could be: > > - a new flag similar to no_new_priv > > - a new securebit flag (w/lockbit) [provided securebits survive a > > userns transition, haven't checked] > > - or perhaps a new capability > > - something else? > > > > How do we make forward progress? > > We start by causing no_new_privs to block userns creation and entering. That sounds reasonable. One thing I thought about is what the interaction between no_new_privs and /proc/sys/user/max_user_namespaces should look like after this change. Both will basically have the same effect, right? So wouldn't it make sense to have no_new_privs imply that /proc/sys/user/max_user_namespaces is 0 and becomes read-only in the in the calling process' user namespace (e.g. returning EINVAL would make sense)? I worry that otherwise we might confuse users when they see that /proc/sys/user/max_user_namespaces is not 0 but they still aren't able to create user namespaces. I know that having multiple security options block the same thing independent of each other is not unprecedented (e.g. in addition to /proc/sys/user/max_user_namespaces CentOS and RHEL have an additional boot parameter to {dis,en}able user namespace creation which often confuses the heck out of users) but I'd rather not make this a common scenario when we can establish a sensible interaction between two security settings. Christian -- To unsubscribe from this list: send the line "unsubscribe linux-security-module" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index 246d4d4ce5c7..2354f7ade78a 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -50,11 +50,12 @@ static void set_cred_user_ns(struct cred *cred, struct user_namespace *user_ns) * anything as the capabilities are bound to the new user namespace. */ cred->securebits = SECUREBITS_DEFAULT; + cred->cap_bset = task_no_new_privs(current) ? current_cred()->cap_bset + : CAP_FULL_SET; cred->cap_inheritable = CAP_EMPTY_SET; - cred->cap_permitted = CAP_FULL_SET; - cred->cap_effective = CAP_FULL_SET; + cred->cap_permitted = cred->cap_bset; + cred->cap_effective = cred->cap_bset; cred->cap_ambient = CAP_EMPTY_SET; - cred->cap_bset = CAP_FULL_SET; #ifdef CONFIG_KEYS key_put(cred->request_key_auth); cred->request_key_auth = NULL;