diff mbox series

LSM: add SafeSetID module that gates setid calls

Message ID 20181031152846.234791-1-mortonm@chromium.org (mailing list archive)
State New, archived
Headers show
Series LSM: add SafeSetID module that gates setid calls | expand

Commit Message

Micah Morton Oct. 31, 2018, 3:28 p.m. UTC
From: Micah Morton <mortonm@chromium.org>

SafeSetID gates the setid family of syscalls to restrict UID/GID
transitions from a given UID/GID to only those approved by a
system-wide whitelist. These restrictions also prohibit the given
UIDs/GIDs from obtaining auxiliary privileges associated with
CAP_SET{U/G}ID, such as allowing a user to set up user namespace UID
mappings. For now, only gating the set*uid family of syscalls is
supported, with support for set*gid coming in a future patch set.

Signed-off-by: Micah Morton <mortonm@chromium.org>
---

NOTE: See the TODO above setuid_syscall() in lsm.c for an aspect of this
code that likely needs improvement before being an acceptable approach.
I'm specifically interested to see if there are better ideas for how
this could be done.

 Documentation/admin-guide/LSM/SafeSetID.rst |  94 ++++++
 Documentation/admin-guide/LSM/index.rst     |   1 +
 arch/Kconfig                                |   5 +
 arch/arm/Kconfig                            |   1 +
 arch/arm64/Kconfig                          |   1 +
 arch/x86/Kconfig                            |   1 +
 security/Kconfig                            |   1 +
 security/Makefile                           |   2 +
 security/safesetid/Kconfig                  |  13 +
 security/safesetid/Makefile                 |   7 +
 security/safesetid/lsm.c                    | 334 ++++++++++++++++++++
 security/safesetid/lsm.h                    |  30 ++
 security/safesetid/securityfs.c             | 189 +++++++++++
 13 files changed, 679 insertions(+)
 create mode 100644 Documentation/admin-guide/LSM/SafeSetID.rst
 create mode 100644 security/safesetid/Kconfig
 create mode 100644 security/safesetid/Makefile
 create mode 100644 security/safesetid/lsm.c
 create mode 100644 security/safesetid/lsm.h
 create mode 100644 security/safesetid/securityfs.c

Comments

Serge E. Hallyn Oct. 31, 2018, 9:02 p.m. UTC | #1
Quoting mortonm@chromium.org (mortonm@chromium.org):
> From: Micah Morton <mortonm@chromium.org>
> 
> SafeSetID gates the setid family of syscalls to restrict UID/GID
> transitions from a given UID/GID to only those approved by a
> system-wide whitelist. These restrictions also prohibit the given
> UIDs/GIDs from obtaining auxiliary privileges associated with
> CAP_SET{U/G}ID, such as allowing a user to set up user namespace UID
> mappings. For now, only gating the set*uid family of syscalls is
> supported, with support for set*gid coming in a future patch set.
> 
> Signed-off-by: Micah Morton <mortonm@chromium.org>
> ---
> 
> NOTE: See the TODO above setuid_syscall() in lsm.c for an aspect of this
> code that likely needs improvement before being an acceptable approach.
> I'm specifically interested to see if there are better ideas for how
> this could be done.
> 
>  Documentation/admin-guide/LSM/SafeSetID.rst |  94 ++++++
>  Documentation/admin-guide/LSM/index.rst     |   1 +
>  arch/Kconfig                                |   5 +
>  arch/arm/Kconfig                            |   1 +
>  arch/arm64/Kconfig                          |   1 +
>  arch/x86/Kconfig                            |   1 +
>  security/Kconfig                            |   1 +
>  security/Makefile                           |   2 +
>  security/safesetid/Kconfig                  |  13 +
>  security/safesetid/Makefile                 |   7 +
>  security/safesetid/lsm.c                    | 334 ++++++++++++++++++++
>  security/safesetid/lsm.h                    |  30 ++
>  security/safesetid/securityfs.c             | 189 +++++++++++
>  13 files changed, 679 insertions(+)
>  create mode 100644 Documentation/admin-guide/LSM/SafeSetID.rst
>  create mode 100644 security/safesetid/Kconfig
>  create mode 100644 security/safesetid/Makefile
>  create mode 100644 security/safesetid/lsm.c
>  create mode 100644 security/safesetid/lsm.h
>  create mode 100644 security/safesetid/securityfs.c
> 
> diff --git a/Documentation/admin-guide/LSM/SafeSetID.rst b/Documentation/admin-guide/LSM/SafeSetID.rst
> new file mode 100644
> index 000000000000..e7d072124424
> --- /dev/null
> +++ b/Documentation/admin-guide/LSM/SafeSetID.rst
> @@ -0,0 +1,94 @@
> +=========
> +SafeSetID
> +=========
> +SafeSetID is an LSM module that gates the setid family of syscalls to restrict
> +UID/GID transitions from a given UID/GID to only those approved by a
> +system-wide whitelist. These restrictions also prohibit the given UIDs/GIDs
> +from obtaining auxiliary privileges associated with CAP_SET{U/G}ID, such as
> +allowing a user to set up user namespace UID mappings.
> +
> +
> +Background
> +==========
> +In absence of file capabilities, processes spawned on a Linux system that need
> +to switch to a different user must be spawned with CAP_SETUID privileges.
> +CAP_SETUID is granted to programs running as root or those running as a non-root
> +user that have been explicitly given the CAP_SETUID runtime capability. It is
> +often preferable to use Linux runtime capabilities rather than file
> +capabilities, since using file capabilities to run a program with elevated
> +privileges opens up possible security holes since any user with access to the
> +file can exec() that program to gain the elevated privileges.

Not true, see inheritable capabilities.  You also might look at ambient
capabilities.

Just to be sure - your end-goal is to have a set of tasks which have
some privileges, including CAP_SETUID, but which cannot transition to
certain uids, perhaps including root?
Kees Cook Oct. 31, 2018, 9:57 p.m. UTC | #2
On Wed, Oct 31, 2018 at 2:02 PM, Serge E. Hallyn <serge@hallyn.com> wrote:
> Just to be sure - your end-goal is to have a set of tasks which have
> some privileges, including CAP_SETUID, but which cannot transition to
> certain uids, perhaps including root?

AIUI, the issue is that CAP_SETUID is TOO permissive. Instead, run
_without_ CAP_SETUID and still allow whitelisted uid transitions.

-Kees
Casey Schaufler Oct. 31, 2018, 10:37 p.m. UTC | #3
On 10/31/2018 2:57 PM, Kees Cook wrote:
> On Wed, Oct 31, 2018 at 2:02 PM, Serge E. Hallyn <serge@hallyn.com> wrote:
>> Just to be sure - your end-goal is to have a set of tasks which have
>> some privileges, including CAP_SETUID, but which cannot transition to
>> certain uids, perhaps including root?
> AIUI, the issue is that CAP_SETUID is TOO permissive. Instead, run
> _without_ CAP_SETUID and still allow whitelisted uid transitions.

I don't like that thought at all at all. You need CAP_SETUID for
some transitions but not all. I can call setreuid() and restore
the saved UID to the effective UID. If this LSM works correctly
(I haven't examined it carefully yet) it should prevent restoring
the effective UID if there isn't an appropriate whitelist entry.

It also violates the "additional restriction" model of LSMs.

That has the potential to introduce a failure when a process tries
to give up privilege. If 0:1000 isn't on the whitelist but 1000:0
is Bad Things can happen. A SUID root program would be unable to
give up its privilege by going back to the real UID in this case.
Micah Morton Nov. 1, 2018, 1:12 a.m. UTC | #4
On Wed, Oct 31, 2018 at 3:37 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> On 10/31/2018 2:57 PM, Kees Cook wrote:
> > On Wed, Oct 31, 2018 at 2:02 PM, Serge E. Hallyn <serge@hallyn.com> wrote:
> >> Just to be sure - your end-goal is to have a set of tasks which have
> >> some privileges, including CAP_SETUID, but which cannot transition to
> >> certain uids, perhaps including root?

Correct, only whitelisted uids can be switched to. This only pertains
to CAP_SETUID, other capabilities are not affected.

> > AIUI, the issue is that CAP_SETUID is TOO permissive. Instead, run
> > _without_ CAP_SETUID and still allow whitelisted uid transitions.

Kees is right that this LSM only pertains to a single capability:
CAP_SETUID (future work could tackle CAP_SETGID in the same fashion)
-- although the idea here is to put in per-user limitations on what a
process running as that user can do even when it _has_ CAP_SETUID. So
it doesn't grant any extra privileges to processes that don't have
CAP_SETUID, only restricts processes that _do_ have CAP_SETUID if the
user they are running under is restricted.

>
> I don't like that thought at all at all. You need CAP_SETUID for
> some transitions but not all. I can call setreuid() and restore
> the saved UID to the effective UID. If this LSM works correctly
> (I haven't examined it carefully yet) it should prevent restoring
> the effective UID if there isn't an appropriate whitelist entry.

Yep, thats how it works. The idea here is that you still need
CAP_SETUID for all transitions, regardless of whether whitelist
policies exist or not.

>
> It also violates the "additional restriction" model of LSMs.
>
> That has the potential to introduce a failure when a process tries
> to give up privilege. If 0:1000 isn't on the whitelist but 1000:0

As above, if a process drops CAP_SETUID it wouldn't be able to do any
transitions (if this is what you mean by give up privilege). The
whitelist is a one-way policy so if one wanted to restrict user 123
but let it switch to 456 and back, 2 policies would need to be added:
123 -> 456 and 456 -> 123.

> is Bad Things can happen. A SUID root program would be unable to
> give up its privilege by going back to the real UID in this case.
>
Serge E. Hallyn Nov. 1, 2018, 6:07 a.m. UTC | #5
On Wed, Oct 31, 2018 at 09:02:45PM +0000, Serge E. Hallyn wrote:
> Quoting mortonm@chromium.org (mortonm@chromium.org):
> > From: Micah Morton <mortonm@chromium.org>
> > 
> > SafeSetID gates the setid family of syscalls to restrict UID/GID
> > transitions from a given UID/GID to only those approved by a
> > system-wide whitelist. These restrictions also prohibit the given
> > UIDs/GIDs from obtaining auxiliary privileges associated with
> > CAP_SET{U/G}ID, such as allowing a user to set up user namespace UID
> > mappings. For now, only gating the set*uid family of syscalls is
> > supported, with support for set*gid coming in a future patch set.
> > 
> > Signed-off-by: Micah Morton <mortonm@chromium.org>
> > ---
> > 
> > NOTE: See the TODO above setuid_syscall() in lsm.c for an aspect of this
> > code that likely needs improvement before being an acceptable approach.
> > I'm specifically interested to see if there are better ideas for how
> > this could be done.
> > 
> >  Documentation/admin-guide/LSM/SafeSetID.rst |  94 ++++++
> >  Documentation/admin-guide/LSM/index.rst     |   1 +
> >  arch/Kconfig                                |   5 +
> >  arch/arm/Kconfig                            |   1 +
> >  arch/arm64/Kconfig                          |   1 +
> >  arch/x86/Kconfig                            |   1 +
> >  security/Kconfig                            |   1 +
> >  security/Makefile                           |   2 +
> >  security/safesetid/Kconfig                  |  13 +
> >  security/safesetid/Makefile                 |   7 +
> >  security/safesetid/lsm.c                    | 334 ++++++++++++++++++++
> >  security/safesetid/lsm.h                    |  30 ++
> >  security/safesetid/securityfs.c             | 189 +++++++++++
> >  13 files changed, 679 insertions(+)
> >  create mode 100644 Documentation/admin-guide/LSM/SafeSetID.rst
> >  create mode 100644 security/safesetid/Kconfig
> >  create mode 100644 security/safesetid/Makefile
> >  create mode 100644 security/safesetid/lsm.c
> >  create mode 100644 security/safesetid/lsm.h
> >  create mode 100644 security/safesetid/securityfs.c
> > 
> > diff --git a/Documentation/admin-guide/LSM/SafeSetID.rst b/Documentation/admin-guide/LSM/SafeSetID.rst
> > new file mode 100644
> > index 000000000000..e7d072124424
> > --- /dev/null
> > +++ b/Documentation/admin-guide/LSM/SafeSetID.rst
> > @@ -0,0 +1,94 @@
> > +=========
> > +SafeSetID
> > +=========
> > +SafeSetID is an LSM module that gates the setid family of syscalls to restrict
> > +UID/GID transitions from a given UID/GID to only those approved by a
> > +system-wide whitelist. These restrictions also prohibit the given UIDs/GIDs
> > +from obtaining auxiliary privileges associated with CAP_SET{U/G}ID, such as
> > +allowing a user to set up user namespace UID mappings.
> > +
> > +
> > +Background
> > +==========
> > +In absence of file capabilities, processes spawned on a Linux system that need
> > +to switch to a different user must be spawned with CAP_SETUID privileges.
> > +CAP_SETUID is granted to programs running as root or those running as a non-root
> > +user that have been explicitly given the CAP_SETUID runtime capability. It is
> > +often preferable to use Linux runtime capabilities rather than file
> > +capabilities, since using file capabilities to run a program with elevated
> > +privileges opens up possible security holes since any user with access to the
> > +file can exec() that program to gain the elevated privileges.
> 
> Not true, see inheritable capabilities.  You also might look at ambient
> capabilities.

So for example with pam_cap.so you could have your N uids each be given
the desired pI, and assign the corrsponding fIs to the files they should
be able to exec with privilege.  No other uids will run those files with
privilege.  *1

Can you give some more details about exactly how you see SafeSetID being
used?

I'm still not quite clear on whether you want N completely unprivileged
uids to be used by some user (i.e. uid 1000), or whether one or more of
those should also have some privileged, or whether one of the uids might
or might not b uid 0.  Years ago I used to use N separate uids to
somewhat segragate workloads on my laptop, and I'd like my browser to
do something like that.  Is that the kind of uid switching you have
in mind?

-serge

*1 And maybe with one of the p9auth/factotem proposals out there you
could have a userspace daemon hand out the tokens for setuid, but that's
getting "out there" and probably derailing this conversation :)
Serge E. Hallyn Nov. 1, 2018, 6:13 a.m. UTC | #6
On Wed, Oct 31, 2018 at 06:12:46PM -0700, Micah Morton wrote:
> On Wed, Oct 31, 2018 at 3:37 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
> >
> > On 10/31/2018 2:57 PM, Kees Cook wrote:
> > > On Wed, Oct 31, 2018 at 2:02 PM, Serge E. Hallyn <serge@hallyn.com> wrote:
> > >> Just to be sure - your end-goal is to have a set of tasks which have
> > >> some privileges, including CAP_SETUID, but which cannot transition to
> > >> certain uids, perhaps including root?
> 
> Correct, only whitelisted uids can be switched to. This only pertains
> to CAP_SETUID, other capabilities are not affected.
> 
> > > AIUI, the issue is that CAP_SETUID is TOO permissive. Instead, run
> > > _without_ CAP_SETUID and still allow whitelisted uid transitions.
> 
> Kees is right that this LSM only pertains to a single capability:
> CAP_SETUID (future work could tackle CAP_SETGID in the same fashion)
> -- although the idea here is to put in per-user limitations on what a
> process running as that user can do even when it _has_ CAP_SETUID. So
> it doesn't grant any extra privileges to processes that don't have
> CAP_SETUID, only restricts processes that _do_ have CAP_SETUID if the
> user they are running under is restricted.
> 
> >
> > I don't like that thought at all at all. You need CAP_SETUID for
> > some transitions but not all. I can call setreuid() and restore
> > the saved UID to the effective UID. If this LSM works correctly
> > (I haven't examined it carefully yet) it should prevent restoring
> > the effective UID if there isn't an appropriate whitelist entry.
> 
> Yep, thats how it works. The idea here is that you still need
> CAP_SETUID for all transitions, regardless of whether whitelist
> policies exist or not.
> 
> >
> > It also violates the "additional restriction" model of LSMs.

Does it, or does the fact that CAP_SETUID is still required in order
to change uids address that?

> > That has the potential to introduce a failure when a process tries
> > to give up privilege. If 0:1000 isn't on the whitelist but 1000:0
> 
> As above, if a process drops CAP_SETUID it wouldn't be able to do any
> transitions (if this is what you mean by give up privilege). The
> whitelist is a one-way policy so if one wanted to restrict user 123
> but let it switch to 456 and back, 2 policies would need to be added:
> 123 -> 456 and 456 -> 123.
> 
> > is Bad Things can happen. A SUID root program would be unable to
> > give up its privilege by going back to the real UID in this case.

Yes, this was the root cause of the "sendmail capabilities bug" - a
privileged daemon which could be made to run with slightly less
privilege in such a way that it failed to drop privilege, then continued
ot run with some privilege.

But the key trigger there was that an unprivileged task could prevent
the more privileged task from dropping its privilege.

Is that the case here?  It might be...  If one of the uid-restricted
tasks running with CAP_SETUID runs a filter over some malicious data
which forces it to run a program which intends to change its uid and
fails to detect that that failed.  It's not quite as cut-and-dried
though, and if we simply do not allow uid 0 to be in the set of uids,
that may prevent any such cases.

-serge
Casey Schaufler Nov. 1, 2018, 3:39 p.m. UTC | #7
On 10/31/2018 11:13 PM, Serge E. Hallyn wrote:
> On Wed, Oct 31, 2018 at 06:12:46PM -0700, Micah Morton wrote:
>> On Wed, Oct 31, 2018 at 3:37 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>>> On 10/31/2018 2:57 PM, Kees Cook wrote:
>>>> On Wed, Oct 31, 2018 at 2:02 PM, Serge E. Hallyn <serge@hallyn.com> wrote:
>>>>> Just to be sure - your end-goal is to have a set of tasks which have
>>>>> some privileges, including CAP_SETUID, but which cannot transition to
>>>>> certain uids, perhaps including root?
>> Correct, only whitelisted uids can be switched to. This only pertains
>> to CAP_SETUID, other capabilities are not affected.
>>
>>>> AIUI, the issue is that CAP_SETUID is TOO permissive. Instead, run
>>>> _without_ CAP_SETUID and still allow whitelisted uid transitions.
>> Kees is right that this LSM only pertains to a single capability:
>> CAP_SETUID (future work could tackle CAP_SETGID in the same fashion)
>> -- although the idea here is to put in per-user limitations on what a
>> process running as that user can do even when it _has_ CAP_SETUID. So
>> it doesn't grant any extra privileges to processes that don't have
>> CAP_SETUID, only restricts processes that _do_ have CAP_SETUID if the
>> user they are running under is restricted.
>>
>>> I don't like that thought at all at all. You need CAP_SETUID for
>>> some transitions but not all. I can call setreuid() and restore
>>> the saved UID to the effective UID. If this LSM works correctly
>>> (I haven't examined it carefully yet) it should prevent restoring
>>> the effective UID if there isn't an appropriate whitelist entry.
>> Yep, thats how it works. The idea here is that you still need
>> CAP_SETUID for all transitions, regardless of whether whitelist
>> policies exist or not.
>>
>>> It also violates the "additional restriction" model of LSMs.
> Does it, or does the fact that CAP_SETUID is still required in order
> to change uids address that?

Yes, it does. Reading Kees' response had me a little concerned.

>>> That has the potential to introduce a failure when a process tries
>>> to give up privilege. If 0:1000 isn't on the whitelist but 1000:0
>> As above, if a process drops CAP_SETUID it wouldn't be able to do any
>> transitions (if this is what you mean by give up privilege). The
>> whitelist is a one-way policy so if one wanted to restrict user 123
>> but let it switch to 456 and back, 2 policies would need to be added:
>> 123 -> 456 and 456 -> 123.
>>
>>> is Bad Things can happen. A SUID root program would be unable to
>>> give up its privilege by going back to the real UID in this case.
> Yes, this was the root cause of the "sendmail capabilities bug"

I'm very familiar with that particular bug, as Bob Mende's
work to convert sendmail to using capabilities was done for
a project I owned. The blowback against all things security
was pretty intense.

>  - a
> privileged daemon which could be made to run with slightly less
> privilege in such a way that it failed to drop privilege, then continued
> ot run with some privilege.
>
> But the key trigger there was that an unprivileged task could prevent
> the more privileged task from dropping its privilege.
>
> Is that the case here?

I think it is reasonably safe to assume that there
are many instances of programs that don't handle errors
from setreuid() in the reset case. Without privilege
setreuid() can be used to swap effective and real UIDs.

>   It might be...  If one of the uid-restricted
> tasks running with CAP_SETUID runs a filter over some malicious data
> which forces it to run a program which intends to change its uid and
> fails to detect that that failed.  It's not quite as cut-and-dried
> though, and if we simply do not allow uid 0 to be in the set of uids,
> that may prevent any such cases.

Alas, UID 0 is not the only case we have to worry about.
If I run a program owned by tss (Trousers) with the setuid
bit set it will change the effective UID to tss. If this
program expects to switch effective UID back to me and
the SafeSetID whitelist prevents it, Bad Things may happen
even though no capabilities or root privilege where ever
involved.

It would be easy for an inexperienced or malicious admin to
include cschaufler:tss in the whitelist but miss on adding
tss:cschaufler.
Serge E. Hallyn Nov. 1, 2018, 3:56 p.m. UTC | #8
Quoting Casey Schaufler (casey@schaufler-ca.com):
> On 10/31/2018 11:13 PM, Serge E. Hallyn wrote:
> > On Wed, Oct 31, 2018 at 06:12:46PM -0700, Micah Morton wrote:
> >> On Wed, Oct 31, 2018 at 3:37 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
> >>> On 10/31/2018 2:57 PM, Kees Cook wrote:
> >>>> On Wed, Oct 31, 2018 at 2:02 PM, Serge E. Hallyn <serge@hallyn.com> wrote:
> >>>>> Just to be sure - your end-goal is to have a set of tasks which have
> >>>>> some privileges, including CAP_SETUID, but which cannot transition to
> >>>>> certain uids, perhaps including root?
> >> Correct, only whitelisted uids can be switched to. This only pertains
> >> to CAP_SETUID, other capabilities are not affected.
> >>
> >>>> AIUI, the issue is that CAP_SETUID is TOO permissive. Instead, run
> >>>> _without_ CAP_SETUID and still allow whitelisted uid transitions.
> >> Kees is right that this LSM only pertains to a single capability:
> >> CAP_SETUID (future work could tackle CAP_SETGID in the same fashion)
> >> -- although the idea here is to put in per-user limitations on what a
> >> process running as that user can do even when it _has_ CAP_SETUID. So
> >> it doesn't grant any extra privileges to processes that don't have
> >> CAP_SETUID, only restricts processes that _do_ have CAP_SETUID if the
> >> user they are running under is restricted.
> >>
> >>> I don't like that thought at all at all. You need CAP_SETUID for
> >>> some transitions but not all. I can call setreuid() and restore
> >>> the saved UID to the effective UID. If this LSM works correctly
> >>> (I haven't examined it carefully yet) it should prevent restoring
> >>> the effective UID if there isn't an appropriate whitelist entry.
> >> Yep, thats how it works. The idea here is that you still need
> >> CAP_SETUID for all transitions, regardless of whether whitelist
> >> policies exist or not.
> >>
> >>> It also violates the "additional restriction" model of LSMs.
> > Does it, or does the fact that CAP_SETUID is still required in order
> > to change uids address that?
> 
> Yes, it does. Reading Kees' response had me a little concerned.
> 
> >>> That has the potential to introduce a failure when a process tries
> >>> to give up privilege. If 0:1000 isn't on the whitelist but 1000:0
> >> As above, if a process drops CAP_SETUID it wouldn't be able to do any
> >> transitions (if this is what you mean by give up privilege). The
> >> whitelist is a one-way policy so if one wanted to restrict user 123
> >> but let it switch to 456 and back, 2 policies would need to be added:
> >> 123 -> 456 and 456 -> 123.
> >>
> >>> is Bad Things can happen. A SUID root program would be unable to
> >>> give up its privilege by going back to the real UID in this case.
> > Yes, this was the root cause of the "sendmail capabilities bug"
> 
> I'm very familiar with that particular bug, as Bob Mende's
> work to convert sendmail to using capabilities was done for
> a project I owned. The blowback against all things security
> was pretty intense.
> 
> >  - a
> > privileged daemon which could be made to run with slightly less
> > privilege in such a way that it failed to drop privilege, then continued
> > ot run with some privilege.
> >
> > But the key trigger there was that an unprivileged task could prevent
> > the more privileged task from dropping its privilege.
> >
> > Is that the case here?
> 
> I think it is reasonably safe to assume that there
> are many instances of programs that don't handle errors
> from setreuid() in the reset case. Without privilege
> setreuid() can be used to swap effective and real UIDs.
> 
> >   It might be...  If one of the uid-restricted
> > tasks running with CAP_SETUID runs a filter over some malicious data
> > which forces it to run a program which intends to change its uid and
> > fails to detect that that failed.  It's not quite as cut-and-dried
> > though, and if we simply do not allow uid 0 to be in the set of uids,
> > that may prevent any such cases.
> 
> Alas, UID 0 is not the only case we have to worry about.
> If I run a program owned by tss (Trousers) with the setuid
> bit set it will change the effective UID to tss. If this
> program expects to switch effective UID back to me and
> the SafeSetID whitelist prevents it, Bad Things may happen
> even though no capabilities or root privilege where ever
> involved.

Yes, but I don't think an unprivileged user can make that happen.
If you look at the patch, you require cap_sys_admin againt your
user namespace in order to limit the uid range.  So either you
were privileged to begin with, or you create a new user namespace.
If you create a new userns, you can only map uids which are delegated
to you - presumably not tss - into that namespace.

> It would be easy for an inexperienced or malicious admin to
> include cschaufler:tss in the whitelist but miss on adding
> tss:cschaufler.

Well, it's also pretty easy for an admin to add 0 or tss into
serge's delegated mappings in /etc/subuid, I suppose...

Now I hadn't noticed the one-way directional nature of these
whitelist entries.  I'd been asuming there was just a set of
ids it was allowed to transition to it.  Not sure which is
better, I can see pros/cons to both.
Micah Morton Nov. 1, 2018, 4:11 p.m. UTC | #9
On Wed, Oct 31, 2018 at 11:07 PM Serge E. Hallyn <serge@hallyn.com> wrote:
>
> On Wed, Oct 31, 2018 at 09:02:45PM +0000, Serge E. Hallyn wrote:
> > Quoting mortonm@chromium.org (mortonm@chromium.org):
> > > From: Micah Morton <mortonm@chromium.org>
> > >
> > > SafeSetID gates the setid family of syscalls to restrict UID/GID
> > > transitions from a given UID/GID to only those approved by a
> > > system-wide whitelist. These restrictions also prohibit the given
> > > UIDs/GIDs from obtaining auxiliary privileges associated with
> > > CAP_SET{U/G}ID, such as allowing a user to set up user namespace UID
> > > mappings. For now, only gating the set*uid family of syscalls is
> > > supported, with support for set*gid coming in a future patch set.
> > >
> > > Signed-off-by: Micah Morton <mortonm@chromium.org>
> > > ---
> > >
> > > NOTE: See the TODO above setuid_syscall() in lsm.c for an aspect of this
> > > code that likely needs improvement before being an acceptable approach.
> > > I'm specifically interested to see if there are better ideas for how
> > > this could be done.
> > >
> > >  Documentation/admin-guide/LSM/SafeSetID.rst |  94 ++++++
> > >  Documentation/admin-guide/LSM/index.rst     |   1 +
> > >  arch/Kconfig                                |   5 +
> > >  arch/arm/Kconfig                            |   1 +
> > >  arch/arm64/Kconfig                          |   1 +
> > >  arch/x86/Kconfig                            |   1 +
> > >  security/Kconfig                            |   1 +
> > >  security/Makefile                           |   2 +
> > >  security/safesetid/Kconfig                  |  13 +
> > >  security/safesetid/Makefile                 |   7 +
> > >  security/safesetid/lsm.c                    | 334 ++++++++++++++++++++
> > >  security/safesetid/lsm.h                    |  30 ++
> > >  security/safesetid/securityfs.c             | 189 +++++++++++
> > >  13 files changed, 679 insertions(+)
> > >  create mode 100644 Documentation/admin-guide/LSM/SafeSetID.rst
> > >  create mode 100644 security/safesetid/Kconfig
> > >  create mode 100644 security/safesetid/Makefile
> > >  create mode 100644 security/safesetid/lsm.c
> > >  create mode 100644 security/safesetid/lsm.h
> > >  create mode 100644 security/safesetid/securityfs.c
> > >
> > > diff --git a/Documentation/admin-guide/LSM/SafeSetID.rst b/Documentation/admin-guide/LSM/SafeSetID.rst
> > > new file mode 100644
> > > index 000000000000..e7d072124424
> > > --- /dev/null
> > > +++ b/Documentation/admin-guide/LSM/SafeSetID.rst
> > > @@ -0,0 +1,94 @@
> > > +=========
> > > +SafeSetID
> > > +=========
> > > +SafeSetID is an LSM module that gates the setid family of syscalls to restrict
> > > +UID/GID transitions from a given UID/GID to only those approved by a
> > > +system-wide whitelist. These restrictions also prohibit the given UIDs/GIDs
> > > +from obtaining auxiliary privileges associated with CAP_SET{U/G}ID, such as
> > > +allowing a user to set up user namespace UID mappings.
> > > +
> > > +
> > > +Background
> > > +==========
> > > +In absence of file capabilities, processes spawned on a Linux system that need
> > > +to switch to a different user must be spawned with CAP_SETUID privileges.
> > > +CAP_SETUID is granted to programs running as root or those running as a non-root
> > > +user that have been explicitly given the CAP_SETUID runtime capability. It is
> > > +often preferable to use Linux runtime capabilities rather than file
> > > +capabilities, since using file capabilities to run a program with elevated
> > > +privileges opens up possible security holes since any user with access to the
> > > +file can exec() that program to gain the elevated privileges.
> >
> > Not true, see inheritable capabilities.  You also might look at ambient
> > capabilities.
>
> So for example with pam_cap.so you could have your N uids each be given
> the desired pI, and assign the corrsponding fIs to the files they should
> be able to exec with privilege.  No other uids will run those files with
> privilege.  *1

Sorry, what are "pl" and "fls" here? "Privilege level" and "files"?

>
> Can you give some more details about exactly how you see SafeSetID being
> used?

Sure. The main use case for this LSM is to allow a non-root program to
transition to other untrusted uids without full blown CAP_SETUID
capabilities. The non-root program would still need CAP_SETUID to do
any kind of transition, but the additional restrictions imposed by
this LSM would mean it is a "safer" version of CAP_SETUID since the
non-root program cannot take advantage of CAP_SETUID to do any
unapproved actions (i.e. setuid to uid 0 or create/enter new user
namespace). The higher level goal is to allow for uid-based sandboxing
of system services without having to give out CAP_SETUID all over the
place just so that non-root programs can drop to
even-further-non-privileged uids. This is especially relevant when one
non-root daemon on the system should be allowed to spawn other
processes as different uids, but its undesirable to give the daemon a
basically-root-equivalent CAP_SETUID.

>
> I'm still not quite clear on whether you want N completely unprivileged
> uids to be used by some user (i.e. uid 1000), or whether one or more of
> those should also have some privileged, or whether one of the uids might
> or might not b uid 0.  Years ago I used to use N separate uids to
> somewhat segragate workloads on my laptop, and I'd like my browser to
> do something like that.  Is that the kind of uid switching you have
> in mind?

"N completely unprivileged uids to be used by some user (i.e. uid
1000)" is the closest description of what this LSM has in mind. For
example, uid 123 is some system service that needs runtime
capabilities X, Y and Z and a bunch of DBus permissions associated
with uid 123, but also wants to spawn another program without any of
these capabilities/permissions. In this case we would like to avoid
giving the system service CAP_SETUID.

>
> -serge
>
> *1 And maybe with one of the p9auth/factotem proposals out there you
> could have a userspace daemon hand out the tokens for setuid, but that's
> getting "out there" and probably derailing this conversation :)
Micah Morton Nov. 1, 2018, 4:18 p.m. UTC | #10
On Thu, Nov 1, 2018 at 8:39 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> On 10/31/2018 11:13 PM, Serge E. Hallyn wrote:
> > On Wed, Oct 31, 2018 at 06:12:46PM -0700, Micah Morton wrote:
> >> On Wed, Oct 31, 2018 at 3:37 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
> >>> On 10/31/2018 2:57 PM, Kees Cook wrote:
> >>>> On Wed, Oct 31, 2018 at 2:02 PM, Serge E. Hallyn <serge@hallyn.com> wrote:
> >>>>> Just to be sure - your end-goal is to have a set of tasks which have
> >>>>> some privileges, including CAP_SETUID, but which cannot transition to
> >>>>> certain uids, perhaps including root?
> >> Correct, only whitelisted uids can be switched to. This only pertains
> >> to CAP_SETUID, other capabilities are not affected.
> >>
> >>>> AIUI, the issue is that CAP_SETUID is TOO permissive. Instead, run
> >>>> _without_ CAP_SETUID and still allow whitelisted uid transitions.
> >> Kees is right that this LSM only pertains to a single capability:
> >> CAP_SETUID (future work could tackle CAP_SETGID in the same fashion)
> >> -- although the idea here is to put in per-user limitations on what a
> >> process running as that user can do even when it _has_ CAP_SETUID. So
> >> it doesn't grant any extra privileges to processes that don't have
> >> CAP_SETUID, only restricts processes that _do_ have CAP_SETUID if the
> >> user they are running under is restricted.
> >>
> >>> I don't like that thought at all at all. You need CAP_SETUID for
> >>> some transitions but not all. I can call setreuid() and restore
> >>> the saved UID to the effective UID. If this LSM works correctly
> >>> (I haven't examined it carefully yet) it should prevent restoring
> >>> the effective UID if there isn't an appropriate whitelist entry.
> >> Yep, thats how it works. The idea here is that you still need
> >> CAP_SETUID for all transitions, regardless of whether whitelist
> >> policies exist or not.
> >>
> >>> It also violates the "additional restriction" model of LSMs.
> > Does it, or does the fact that CAP_SETUID is still required in order
> > to change uids address that?
>
> Yes, it does. Reading Kees' response had me a little concerned.
>
> >>> That has the potential to introduce a failure when a process tries
> >>> to give up privilege. If 0:1000 isn't on the whitelist but 1000:0
> >> As above, if a process drops CAP_SETUID it wouldn't be able to do any
> >> transitions (if this is what you mean by give up privilege). The
> >> whitelist is a one-way policy so if one wanted to restrict user 123
> >> but let it switch to 456 and back, 2 policies would need to be added:
> >> 123 -> 456 and 456 -> 123.
> >>
> >>> is Bad Things can happen. A SUID root program would be unable to
> >>> give up its privilege by going back to the real UID in this case.
> > Yes, this was the root cause of the "sendmail capabilities bug"
>
> I'm very familiar with that particular bug, as Bob Mende's
> work to convert sendmail to using capabilities was done for
> a project I owned. The blowback against all things security
> was pretty intense.
>
> >  - a
> > privileged daemon which could be made to run with slightly less
> > privilege in such a way that it failed to drop privilege, then continued
> > ot run with some privilege.
> >
> > But the key trigger there was that an unprivileged task could prevent
> > the more privileged task from dropping its privilege.
> >
> > Is that the case here?
>
> I think it is reasonably safe to assume that there
> are many instances of programs that don't handle errors
> from setreuid() in the reset case. Without privilege
> setreuid() can be used to swap effective and real UIDs.

This LSM won't interfere with any of the one-off transitions allowed
by the set*uid family of syscalls that don't require CAP_SETUID. See
safesetid_task_fix_setuid in lsm.c.

>
> >   It might be...  If one of the uid-restricted
> > tasks running with CAP_SETUID runs a filter over some malicious data
> > which forces it to run a program which intends to change its uid and
> > fails to detect that that failed.  It's not quite as cut-and-dried
> > though, and if we simply do not allow uid 0 to be in the set of uids,
> > that may prevent any such cases.
>
> Alas, UID 0 is not the only case we have to worry about.
> If I run a program owned by tss (Trousers) with the setuid
> bit set it will change the effective UID to tss. If this
> program expects to switch effective UID back to me and
> the SafeSetID whitelist prevents it, Bad Things may happen
> even though no capabilities or root privilege where ever
> involved.
>
> It would be easy for an inexperienced or malicious admin to
> include cschaufler:tss in the whitelist but miss on adding
> tss:cschaufler.
>

Same as above, this LSM will only affect transitions that would need
CAP_SETUID. AFAICT switching the effective UID back after that
setuid-bit scenario is not something that requires CAP_SETUID, and
thus would continue to work as it always has in Linux.
Micah Morton Nov. 1, 2018, 4:22 p.m. UTC | #11
On Thu, Nov 1, 2018 at 9:11 AM Micah Morton <mortonm@chromium.org> wrote:
>
> On Wed, Oct 31, 2018 at 11:07 PM Serge E. Hallyn <serge@hallyn.com> wrote:
> >
> > On Wed, Oct 31, 2018 at 09:02:45PM +0000, Serge E. Hallyn wrote:
> > > Quoting mortonm@chromium.org (mortonm@chromium.org):
> > > > From: Micah Morton <mortonm@chromium.org>
> > > >
> > > > SafeSetID gates the setid family of syscalls to restrict UID/GID
> > > > transitions from a given UID/GID to only those approved by a
> > > > system-wide whitelist. These restrictions also prohibit the given
> > > > UIDs/GIDs from obtaining auxiliary privileges associated with
> > > > CAP_SET{U/G}ID, such as allowing a user to set up user namespace UID
> > > > mappings. For now, only gating the set*uid family of syscalls is
> > > > supported, with support for set*gid coming in a future patch set.
> > > >
> > > > Signed-off-by: Micah Morton <mortonm@chromium.org>
> > > > ---
> > > >
> > > > NOTE: See the TODO above setuid_syscall() in lsm.c for an aspect of this
> > > > code that likely needs improvement before being an acceptable approach.
> > > > I'm specifically interested to see if there are better ideas for how
> > > > this could be done.
> > > >
> > > >  Documentation/admin-guide/LSM/SafeSetID.rst |  94 ++++++
> > > >  Documentation/admin-guide/LSM/index.rst     |   1 +
> > > >  arch/Kconfig                                |   5 +
> > > >  arch/arm/Kconfig                            |   1 +
> > > >  arch/arm64/Kconfig                          |   1 +
> > > >  arch/x86/Kconfig                            |   1 +
> > > >  security/Kconfig                            |   1 +
> > > >  security/Makefile                           |   2 +
> > > >  security/safesetid/Kconfig                  |  13 +
> > > >  security/safesetid/Makefile                 |   7 +
> > > >  security/safesetid/lsm.c                    | 334 ++++++++++++++++++++
> > > >  security/safesetid/lsm.h                    |  30 ++
> > > >  security/safesetid/securityfs.c             | 189 +++++++++++
> > > >  13 files changed, 679 insertions(+)
> > > >  create mode 100644 Documentation/admin-guide/LSM/SafeSetID.rst
> > > >  create mode 100644 security/safesetid/Kconfig
> > > >  create mode 100644 security/safesetid/Makefile
> > > >  create mode 100644 security/safesetid/lsm.c
> > > >  create mode 100644 security/safesetid/lsm.h
> > > >  create mode 100644 security/safesetid/securityfs.c
> > > >
> > > > diff --git a/Documentation/admin-guide/LSM/SafeSetID.rst b/Documentation/admin-guide/LSM/SafeSetID.rst
> > > > new file mode 100644
> > > > index 000000000000..e7d072124424
> > > > --- /dev/null
> > > > +++ b/Documentation/admin-guide/LSM/SafeSetID.rst
> > > > @@ -0,0 +1,94 @@
> > > > +=========
> > > > +SafeSetID
> > > > +=========
> > > > +SafeSetID is an LSM module that gates the setid family of syscalls to restrict
> > > > +UID/GID transitions from a given UID/GID to only those approved by a
> > > > +system-wide whitelist. These restrictions also prohibit the given UIDs/GIDs
> > > > +from obtaining auxiliary privileges associated with CAP_SET{U/G}ID, such as
> > > > +allowing a user to set up user namespace UID mappings.
> > > > +
> > > > +
> > > > +Background
> > > > +==========
> > > > +In absence of file capabilities, processes spawned on a Linux system that need
> > > > +to switch to a different user must be spawned with CAP_SETUID privileges.
> > > > +CAP_SETUID is granted to programs running as root or those running as a non-root
> > > > +user that have been explicitly given the CAP_SETUID runtime capability. It is
> > > > +often preferable to use Linux runtime capabilities rather than file
> > > > +capabilities, since using file capabilities to run a program with elevated
> > > > +privileges opens up possible security holes since any user with access to the
> > > > +file can exec() that program to gain the elevated privileges.
> > >
> > > Not true, see inheritable capabilities.  You also might look at ambient
> > > capabilities.
> >
> > So for example with pam_cap.so you could have your N uids each be given
> > the desired pI, and assign the corrsponding fIs to the files they should
> > be able to exec with privilege.  No other uids will run those files with
> > privilege.  *1
>
> Sorry, what are "pl" and "fls" here? "Privilege level" and "files"?
>
> >
> > Can you give some more details about exactly how you see SafeSetID being
> > used?
>
> Sure. The main use case for this LSM is to allow a non-root program to
> transition to other untrusted uids without full blown CAP_SETUID
> capabilities. The non-root program would still need CAP_SETUID to do
> any kind of transition, but the additional restrictions imposed by
> this LSM would mean it is a "safer" version of CAP_SETUID since the
> non-root program cannot take advantage of CAP_SETUID to do any
> unapproved actions (i.e. setuid to uid 0 or create/enter new user
> namespace). The higher level goal is to allow for uid-based sandboxing
> of system services without having to give out CAP_SETUID all over the
> place just so that non-root programs can drop to
> even-further-non-privileged uids. This is especially relevant when one
> non-root daemon on the system should be allowed to spawn other
> processes as different uids, but its undesirable to give the daemon a
> basically-root-equivalent CAP_SETUID.
>
> >
> > I'm still not quite clear on whether you want N completely unprivileged
> > uids to be used by some user (i.e. uid 1000), or whether one or more of
> > those should also have some privileged, or whether one of the uids might
> > or might not b uid 0.  Years ago I used to use N separate uids to
> > somewhat segragate workloads on my laptop, and I'd like my browser to
> > do something like that.  Is that the kind of uid switching you have
> > in mind?
>
> "N completely unprivileged uids to be used by some user (i.e. uid
> 1000)" is the closest description of what this LSM has in mind. For
> example, uid 123 is some system service that needs runtime
> capabilities X, Y and Z and a bunch of DBus permissions associated
> with uid 123, but also wants to spawn another program without any of
> these capabilities/permissions. In this case we would like to avoid
> giving the system service CAP_SETUID.

To clarify: "spawn another program *as a different uid* without any..."

>
> >
> > -serge
> >
> > *1 And maybe with one of the p9auth/factotem proposals out there you
> > could have a userspace daemon hand out the tokens for setuid, but that's
> > getting "out there" and probably derailing this conversation :)
Micah Morton Nov. 1, 2018, 4:41 p.m. UTC | #12
On Thu, Nov 1, 2018 at 9:11 AM Micah Morton <mortonm@chromium.org> wrote:
>
> On Wed, Oct 31, 2018 at 11:07 PM Serge E. Hallyn <serge@hallyn.com> wrote:
> >
> > On Wed, Oct 31, 2018 at 09:02:45PM +0000, Serge E. Hallyn wrote:
> > > Quoting mortonm@chromium.org (mortonm@chromium.org):
> > > > From: Micah Morton <mortonm@chromium.org>
> > > >
> > > > SafeSetID gates the setid family of syscalls to restrict UID/GID
> > > > transitions from a given UID/GID to only those approved by a
> > > > system-wide whitelist. These restrictions also prohibit the given
> > > > UIDs/GIDs from obtaining auxiliary privileges associated with
> > > > CAP_SET{U/G}ID, such as allowing a user to set up user namespace UID
> > > > mappings. For now, only gating the set*uid family of syscalls is
> > > > supported, with support for set*gid coming in a future patch set.
> > > >
> > > > Signed-off-by: Micah Morton <mortonm@chromium.org>
> > > > ---
> > > >
> > > > NOTE: See the TODO above setuid_syscall() in lsm.c for an aspect of this
> > > > code that likely needs improvement before being an acceptable approach.
> > > > I'm specifically interested to see if there are better ideas for how
> > > > this could be done.
> > > >
> > > >  Documentation/admin-guide/LSM/SafeSetID.rst |  94 ++++++
> > > >  Documentation/admin-guide/LSM/index.rst     |   1 +
> > > >  arch/Kconfig                                |   5 +
> > > >  arch/arm/Kconfig                            |   1 +
> > > >  arch/arm64/Kconfig                          |   1 +
> > > >  arch/x86/Kconfig                            |   1 +
> > > >  security/Kconfig                            |   1 +
> > > >  security/Makefile                           |   2 +
> > > >  security/safesetid/Kconfig                  |  13 +
> > > >  security/safesetid/Makefile                 |   7 +
> > > >  security/safesetid/lsm.c                    | 334 ++++++++++++++++++++
> > > >  security/safesetid/lsm.h                    |  30 ++
> > > >  security/safesetid/securityfs.c             | 189 +++++++++++
> > > >  13 files changed, 679 insertions(+)
> > > >  create mode 100644 Documentation/admin-guide/LSM/SafeSetID.rst
> > > >  create mode 100644 security/safesetid/Kconfig
> > > >  create mode 100644 security/safesetid/Makefile
> > > >  create mode 100644 security/safesetid/lsm.c
> > > >  create mode 100644 security/safesetid/lsm.h
> > > >  create mode 100644 security/safesetid/securityfs.c
> > > >
> > > > diff --git a/Documentation/admin-guide/LSM/SafeSetID.rst b/Documentation/admin-guide/LSM/SafeSetID.rst
> > > > new file mode 100644
> > > > index 000000000000..e7d072124424
> > > > --- /dev/null
> > > > +++ b/Documentation/admin-guide/LSM/SafeSetID.rst
> > > > @@ -0,0 +1,94 @@
> > > > +=========
> > > > +SafeSetID
> > > > +=========
> > > > +SafeSetID is an LSM module that gates the setid family of syscalls to restrict
> > > > +UID/GID transitions from a given UID/GID to only those approved by a
> > > > +system-wide whitelist. These restrictions also prohibit the given UIDs/GIDs
> > > > +from obtaining auxiliary privileges associated with CAP_SET{U/G}ID, such as
> > > > +allowing a user to set up user namespace UID mappings.
> > > > +
> > > > +
> > > > +Background
> > > > +==========
> > > > +In absence of file capabilities, processes spawned on a Linux system that need
> > > > +to switch to a different user must be spawned with CAP_SETUID privileges.
> > > > +CAP_SETUID is granted to programs running as root or those running as a non-root
> > > > +user that have been explicitly given the CAP_SETUID runtime capability. It is
> > > > +often preferable to use Linux runtime capabilities rather than file
> > > > +capabilities, since using file capabilities to run a program with elevated
> > > > +privileges opens up possible security holes since any user with access to the
> > > > +file can exec() that program to gain the elevated privileges.
> > >
> > > Not true, see inheritable capabilities.  You also might look at ambient
> > > capabilities.
> >
> > So for example with pam_cap.so you could have your N uids each be given
> > the desired pI, and assign the corrsponding fIs to the files they should
> > be able to exec with privilege.  No other uids will run those files with
> > privilege.  *1
>
> Sorry, what are "pl" and "fls" here? "Privilege level" and "files"?
>
> >
> > Can you give some more details about exactly how you see SafeSetID being
> > used?
>
> Sure. The main use case for this LSM is to allow a non-root program to
> transition to other untrusted uids without full blown CAP_SETUID
> capabilities. The non-root program would still need CAP_SETUID to do
> any kind of transition, but the additional restrictions imposed by
> this LSM would mean it is a "safer" version of CAP_SETUID since the
> non-root program cannot take advantage of CAP_SETUID to do any
> unapproved actions (i.e. setuid to uid 0 or create/enter new user
> namespace). The higher level goal is to allow for uid-based sandboxing
> of system services without having to give out CAP_SETUID all over the
> place just so that non-root programs can drop to
> even-further-non-privileged uids. This is especially relevant when one
> non-root daemon on the system should be allowed to spawn other
> processes as different uids, but its undesirable to give the daemon a
> basically-root-equivalent CAP_SETUID.
>
> >
> > I'm still not quite clear on whether you want N completely unprivileged
> > uids to be used by some user (i.e. uid 1000), or whether one or more of
> > those should also have some privileged, or whether one of the uids might
> > or might not b uid 0.  Years ago I used to use N separate uids to
> > somewhat segragate workloads on my laptop, and I'd like my browser to
> > do something like that.  Is that the kind of uid switching you have
> > in mind?
>
> "N completely unprivileged uids to be used by some user (i.e. uid
> 1000)" is the closest description of what this LSM has in mind. For
> example, uid 123 is some system service that needs runtime
> capabilities X, Y and Z and a bunch of DBus permissions associated
> with uid 123, but also wants to spawn another program without any of
> these capabilities/permissions. In this case we would like to avoid
> giving the system service CAP_SETUID.

Another clarification: "avoid giving the system service
_full/unrestricted_ CAP_SETUID capabilities"

>
> >
> > -serge
> >
> > *1 And maybe with one of the p9auth/factotem proposals out there you
> > could have a userspace daemon hand out the tokens for setuid, but that's
> > getting "out there" and probably derailing this conversation :)
Casey Schaufler Nov. 1, 2018, 5:08 p.m. UTC | #13
On 11/1/2018 9:11 AM, Micah Morton wrote:
> On Wed, Oct 31, 2018 at 11:07 PM Serge E. Hallyn <serge@hallyn.com> wrote:
>> On Wed, Oct 31, 2018 at 09:02:45PM +0000, Serge E. Hallyn wrote:
>>> Quoting mortonm@chromium.org (mortonm@chromium.org):
>>>> From: Micah Morton <mortonm@chromium.org>
>>>>
>>>> SafeSetID gates the setid family of syscalls to restrict UID/GID
>>>> transitions from a given UID/GID to only those approved by a
>>>> system-wide whitelist. These restrictions also prohibit the given
>>>> UIDs/GIDs from obtaining auxiliary privileges associated with
>>>> CAP_SET{U/G}ID, such as allowing a user to set up user namespace UID
>>>> mappings. For now, only gating the set*uid family of syscalls is
>>>> supported, with support for set*gid coming in a future patch set.
>>>>
>>>> Signed-off-by: Micah Morton <mortonm@chromium.org>
>>>> ---
>>>>
>>>> NOTE: See the TODO above setuid_syscall() in lsm.c for an aspect of this
>>>> code that likely needs improvement before being an acceptable approach.
>>>> I'm specifically interested to see if there are better ideas for how
>>>> this could be done.
>>>>
>>>>  Documentation/admin-guide/LSM/SafeSetID.rst |  94 ++++++
>>>>  Documentation/admin-guide/LSM/index.rst     |   1 +
>>>>  arch/Kconfig                                |   5 +
>>>>  arch/arm/Kconfig                            |   1 +
>>>>  arch/arm64/Kconfig                          |   1 +
>>>>  arch/x86/Kconfig                            |   1 +
>>>>  security/Kconfig                            |   1 +
>>>>  security/Makefile                           |   2 +
>>>>  security/safesetid/Kconfig                  |  13 +
>>>>  security/safesetid/Makefile                 |   7 +
>>>>  security/safesetid/lsm.c                    | 334 ++++++++++++++++++++
>>>>  security/safesetid/lsm.h                    |  30 ++
>>>>  security/safesetid/securityfs.c             | 189 +++++++++++
>>>>  13 files changed, 679 insertions(+)
>>>>  create mode 100644 Documentation/admin-guide/LSM/SafeSetID.rst
>>>>  create mode 100644 security/safesetid/Kconfig
>>>>  create mode 100644 security/safesetid/Makefile
>>>>  create mode 100644 security/safesetid/lsm.c
>>>>  create mode 100644 security/safesetid/lsm.h
>>>>  create mode 100644 security/safesetid/securityfs.c
>>>>
>>>> diff --git a/Documentation/admin-guide/LSM/SafeSetID.rst b/Documentation/admin-guide/LSM/SafeSetID.rst
>>>> new file mode 100644
>>>> index 000000000000..e7d072124424
>>>> --- /dev/null
>>>> +++ b/Documentation/admin-guide/LSM/SafeSetID.rst
>>>> @@ -0,0 +1,94 @@
>>>> +=========
>>>> +SafeSetID
>>>> +=========
>>>> +SafeSetID is an LSM module that gates the setid family of syscalls to restrict
>>>> +UID/GID transitions from a given UID/GID to only those approved by a
>>>> +system-wide whitelist. These restrictions also prohibit the given UIDs/GIDs
>>>> +from obtaining auxiliary privileges associated with CAP_SET{U/G}ID, such as
>>>> +allowing a user to set up user namespace UID mappings.
>>>> +
>>>> +
>>>> +Background
>>>> +==========
>>>> +In absence of file capabilities, processes spawned on a Linux system that need
>>>> +to switch to a different user must be spawned with CAP_SETUID privileges.
>>>> +CAP_SETUID is granted to programs running as root or those running as a non-root
>>>> +user that have been explicitly given the CAP_SETUID runtime capability. It is
>>>> +often preferable to use Linux runtime capabilities rather than file
>>>> +capabilities, since using file capabilities to run a program with elevated
>>>> +privileges opens up possible security holes since any user with access to the
>>>> +file can exec() that program to gain the elevated privileges.
>>> Not true, see inheritable capabilities.  You also might look at ambient
>>> capabilities.
>> So for example with pam_cap.so you could have your N uids each be given
>> the desired pI, and assign the corrsponding fIs to the files they should
>> be able to exec with privilege.  No other uids will run those files with
>> privilege.  *1
> Sorry, what are "pl" and "fls" here? "Privilege level" and "files"?
>
>> Can you give some more details about exactly how you see SafeSetID being
>> used?
> Sure. The main use case for this LSM is to allow a non-root program to
> transition to other untrusted uids without full blown CAP_SETUID
> capabilities. The non-root program would still need CAP_SETUID to do
> any kind of transition, but the additional restrictions imposed by
> this LSM would mean it is a "safer" version of CAP_SETUID since the
> non-root program cannot take advantage of CAP_SETUID to do any
> unapproved actions (i.e. setuid to uid 0 or create/enter new user
> namespace). The higher level goal is to allow for uid-based sandboxing
> of system services without having to give out CAP_SETUID all over the
> place just so that non-root programs can drop to
> even-further-non-privileged uids. This is especially relevant when one
> non-root daemon on the system should be allowed to spawn other
> processes as different uids, but its undesirable to give the daemon a
> basically-root-equivalent CAP_SETUID.

I don't want to sound stupid(er than usual), but it sounds like
you could do all this using setuid bits prudently. Based on this
description, I don't see that anything new is needed.

>> I'm still not quite clear on whether you want N completely unprivileged
>> uids to be used by some user (i.e. uid 1000), or whether one or more of
>> those should also have some privileged, or whether one of the uids might
>> or might not b uid 0.  Years ago I used to use N separate uids to
>> somewhat segragate workloads on my laptop, and I'd like my browser to
>> do something like that.  Is that the kind of uid switching you have
>> in mind?
> "N completely unprivileged uids to be used by some user (i.e. uid
> 1000)" is the closest description of what this LSM has in mind. For
> example, uid 123 is some system service that needs runtime
> capabilities X, Y and Z and a bunch of DBus permissions associated
> with uid 123, but also wants to spawn another program without any of
> these capabilities/permissions. In this case we would like to avoid
> giving the system service CAP_SETUID.
>
>> -serge
>>
>> *1 And maybe with one of the p9auth/factotem proposals out there you
>> could have a userspace daemon hand out the tokens for setuid, but that's
>> getting "out there" and probably derailing this conversation :)
Micah Morton Nov. 1, 2018, 7:52 p.m. UTC | #14
On Thu, Nov 1, 2018 at 10:08 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> On 11/1/2018 9:11 AM, Micah Morton wrote:
> > On Wed, Oct 31, 2018 at 11:07 PM Serge E. Hallyn <serge@hallyn.com> wrote:
> >> On Wed, Oct 31, 2018 at 09:02:45PM +0000, Serge E. Hallyn wrote:
> >>> Quoting mortonm@chromium.org (mortonm@chromium.org):
> >>>> From: Micah Morton <mortonm@chromium.org>
> >>>>
> >>>> SafeSetID gates the setid family of syscalls to restrict UID/GID
> >>>> transitions from a given UID/GID to only those approved by a
> >>>> system-wide whitelist. These restrictions also prohibit the given
> >>>> UIDs/GIDs from obtaining auxiliary privileges associated with
> >>>> CAP_SET{U/G}ID, such as allowing a user to set up user namespace UID
> >>>> mappings. For now, only gating the set*uid family of syscalls is
> >>>> supported, with support for set*gid coming in a future patch set.
> >>>>
> >>>> Signed-off-by: Micah Morton <mortonm@chromium.org>
> >>>> ---
> >>>>
> >>>> NOTE: See the TODO above setuid_syscall() in lsm.c for an aspect of this
> >>>> code that likely needs improvement before being an acceptable approach.
> >>>> I'm specifically interested to see if there are better ideas for how
> >>>> this could be done.
> >>>>
> >>>>  Documentation/admin-guide/LSM/SafeSetID.rst |  94 ++++++
> >>>>  Documentation/admin-guide/LSM/index.rst     |   1 +
> >>>>  arch/Kconfig                                |   5 +
> >>>>  arch/arm/Kconfig                            |   1 +
> >>>>  arch/arm64/Kconfig                          |   1 +
> >>>>  arch/x86/Kconfig                            |   1 +
> >>>>  security/Kconfig                            |   1 +
> >>>>  security/Makefile                           |   2 +
> >>>>  security/safesetid/Kconfig                  |  13 +
> >>>>  security/safesetid/Makefile                 |   7 +
> >>>>  security/safesetid/lsm.c                    | 334 ++++++++++++++++++++
> >>>>  security/safesetid/lsm.h                    |  30 ++
> >>>>  security/safesetid/securityfs.c             | 189 +++++++++++
> >>>>  13 files changed, 679 insertions(+)
> >>>>  create mode 100644 Documentation/admin-guide/LSM/SafeSetID.rst
> >>>>  create mode 100644 security/safesetid/Kconfig
> >>>>  create mode 100644 security/safesetid/Makefile
> >>>>  create mode 100644 security/safesetid/lsm.c
> >>>>  create mode 100644 security/safesetid/lsm.h
> >>>>  create mode 100644 security/safesetid/securityfs.c
> >>>>
> >>>> diff --git a/Documentation/admin-guide/LSM/SafeSetID.rst b/Documentation/admin-guide/LSM/SafeSetID.rst
> >>>> new file mode 100644
> >>>> index 000000000000..e7d072124424
> >>>> --- /dev/null
> >>>> +++ b/Documentation/admin-guide/LSM/SafeSetID.rst
> >>>> @@ -0,0 +1,94 @@
> >>>> +=========
> >>>> +SafeSetID
> >>>> +=========
> >>>> +SafeSetID is an LSM module that gates the setid family of syscalls to restrict
> >>>> +UID/GID transitions from a given UID/GID to only those approved by a
> >>>> +system-wide whitelist. These restrictions also prohibit the given UIDs/GIDs
> >>>> +from obtaining auxiliary privileges associated with CAP_SET{U/G}ID, such as
> >>>> +allowing a user to set up user namespace UID mappings.
> >>>> +
> >>>> +
> >>>> +Background
> >>>> +==========
> >>>> +In absence of file capabilities, processes spawned on a Linux system that need
> >>>> +to switch to a different user must be spawned with CAP_SETUID privileges.
> >>>> +CAP_SETUID is granted to programs running as root or those running as a non-root
> >>>> +user that have been explicitly given the CAP_SETUID runtime capability. It is
> >>>> +often preferable to use Linux runtime capabilities rather than file
> >>>> +capabilities, since using file capabilities to run a program with elevated
> >>>> +privileges opens up possible security holes since any user with access to the
> >>>> +file can exec() that program to gain the elevated privileges.
> >>> Not true, see inheritable capabilities.  You also might look at ambient
> >>> capabilities.
> >> So for example with pam_cap.so you could have your N uids each be given
> >> the desired pI, and assign the corrsponding fIs to the files they should
> >> be able to exec with privilege.  No other uids will run those files with
> >> privilege.  *1
> > Sorry, what are "pl" and "fls" here? "Privilege level" and "files"?
> >
> >> Can you give some more details about exactly how you see SafeSetID being
> >> used?
> > Sure. The main use case for this LSM is to allow a non-root program to
> > transition to other untrusted uids without full blown CAP_SETUID
> > capabilities. The non-root program would still need CAP_SETUID to do
> > any kind of transition, but the additional restrictions imposed by
> > this LSM would mean it is a "safer" version of CAP_SETUID since the
> > non-root program cannot take advantage of CAP_SETUID to do any
> > unapproved actions (i.e. setuid to uid 0 or create/enter new user
> > namespace). The higher level goal is to allow for uid-based sandboxing
> > of system services without having to give out CAP_SETUID all over the
> > place just so that non-root programs can drop to
> > even-further-non-privileged uids. This is especially relevant when one
> > non-root daemon on the system should be allowed to spawn other
> > processes as different uids, but its undesirable to give the daemon a
> > basically-root-equivalent CAP_SETUID.
>
> I don't want to sound stupid(er than usual), but it sounds like
> you could do all this using setuid bits prudently. Based on this
> description, I don't see that anything new is needed.

There are situations where setuid bits don't get the job done, as
there are many situations where a program just wants to call setuid as
part of its execution (or fork + setuid without exec), instead of
fork/exec'ing a setuid binary. Take the following scenario for
example: init script (as root) spawns a network manager program as uid
1000 and then the network manager spawns OpenVPN. The common mode of
operation for OpenVPN is to start running as the uid it was spawned
with (1000) at startup, but then drop to a lesser-privileged uid (e.g.
2000) after initialization/setup by calling setuid. This is something
setuid bits wouldn't help with, without refactoring OpenVPN. So one
option here is to give the network manager CAP_SETUID, which will be
inherited by OpenVPN, and then OpenVPN drops to uid 2000 and drops
CAP_SETUID (would probably require patching OpenVPN for the capability
dropping). The problem here is that if the network manager itself is
untrusted and exploitable, then giving it unrestricted CAP_SETUID is a
big security risk. Even just sticking with the network manager / VPN
example, strongSwan VPN also uses the same drop-to-user-through-setuid
setup, as do other Linux applications. Refactoring these applications
to fork/exec setuid binaries instead of simply calling setuid is often
infeasible. So a direct call to setuid is often necessary/expected,
and setuid bits don't help here.

Also, use of setuid bits precludes the use of the no_new_privs bit,
which is usually at least a nice-to-have (if not need-to-have) for
sandboxed processes on the system.

>
> >> I'm still not quite clear on whether you want N completely unprivileged
> >> uids to be used by some user (i.e. uid 1000), or whether one or more of
> >> those should also have some privileged, or whether one of the uids might
> >> or might not b uid 0.  Years ago I used to use N separate uids to
> >> somewhat segragate workloads on my laptop, and I'd like my browser to
> >> do something like that.  Is that the kind of uid switching you have
> >> in mind?
> > "N completely unprivileged uids to be used by some user (i.e. uid
> > 1000)" is the closest description of what this LSM has in mind. For
> > example, uid 123 is some system service that needs runtime
> > capabilities X, Y and Z and a bunch of DBus permissions associated
> > with uid 123, but also wants to spawn another program without any of
> > these capabilities/permissions. In this case we would like to avoid
> > giving the system service CAP_SETUID.
> >
> >> -serge
> >>
> >> *1 And maybe with one of the p9auth/factotem proposals out there you
> >> could have a userspace daemon hand out the tokens for setuid, but that's
> >> getting "out there" and probably derailing this conversation :)
>
Casey Schaufler Nov. 2, 2018, 4:05 p.m. UTC | #15
On 11/1/2018 12:52 PM, Micah Morton wrote:
> On Thu, Nov 1, 2018 at 10:08 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
>> On 11/1/2018 9:11 AM, Micah Morton wrote:
>>> On Wed, Oct 31, 2018 at 11:07 PM Serge E. Hallyn <serge@hallyn.com> wrote:
>>>> On Wed, Oct 31, 2018 at 09:02:45PM +0000, Serge E. Hallyn wrote:
>>>>> Quoting mortonm@chromium.org (mortonm@chromium.org):
>>>>>> From: Micah Morton <mortonm@chromium.org>
>>>>>>
>>>>>> SafeSetID gates the setid family of syscalls to restrict UID/GID
>>>>>> transitions from a given UID/GID to only those approved by a
>>>>>> system-wide whitelist. These restrictions also prohibit the given
>>>>>> UIDs/GIDs from obtaining auxiliary privileges associated with
>>>>>> CAP_SET{U/G}ID, such as allowing a user to set up user namespace UID
>>>>>> mappings. For now, only gating the set*uid family of syscalls is
>>>>>> supported, with support for set*gid coming in a future patch set.
>>>>>>
>>>>>> Signed-off-by: Micah Morton <mortonm@chromium.org>
>>>>>> ---
>>>>>>
>>>>>> NOTE: See the TODO above setuid_syscall() in lsm.c for an aspect of this
>>>>>> code that likely needs improvement before being an acceptable approach.
>>>>>> I'm specifically interested to see if there are better ideas for how
>>>>>> this could be done.
>>>>>>
>>>>>>  Documentation/admin-guide/LSM/SafeSetID.rst |  94 ++++++
>>>>>>  Documentation/admin-guide/LSM/index.rst     |   1 +
>>>>>>  arch/Kconfig                                |   5 +
>>>>>>  arch/arm/Kconfig                            |   1 +
>>>>>>  arch/arm64/Kconfig                          |   1 +
>>>>>>  arch/x86/Kconfig                            |   1 +
>>>>>>  security/Kconfig                            |   1 +
>>>>>>  security/Makefile                           |   2 +
>>>>>>  security/safesetid/Kconfig                  |  13 +
>>>>>>  security/safesetid/Makefile                 |   7 +
>>>>>>  security/safesetid/lsm.c                    | 334 ++++++++++++++++++++
>>>>>>  security/safesetid/lsm.h                    |  30 ++
>>>>>>  security/safesetid/securityfs.c             | 189 +++++++++++
>>>>>>  13 files changed, 679 insertions(+)
>>>>>>  create mode 100644 Documentation/admin-guide/LSM/SafeSetID.rst
>>>>>>  create mode 100644 security/safesetid/Kconfig
>>>>>>  create mode 100644 security/safesetid/Makefile
>>>>>>  create mode 100644 security/safesetid/lsm.c
>>>>>>  create mode 100644 security/safesetid/lsm.h
>>>>>>  create mode 100644 security/safesetid/securityfs.c
>>>>>>
>>>>>> diff --git a/Documentation/admin-guide/LSM/SafeSetID.rst b/Documentation/admin-guide/LSM/SafeSetID.rst
>>>>>> new file mode 100644
>>>>>> index 000000000000..e7d072124424
>>>>>> --- /dev/null
>>>>>> +++ b/Documentation/admin-guide/LSM/SafeSetID.rst
>>>>>> @@ -0,0 +1,94 @@
>>>>>> +=========
>>>>>> +SafeSetID
>>>>>> +=========
>>>>>> +SafeSetID is an LSM module that gates the setid family of syscalls to restrict
>>>>>> +UID/GID transitions from a given UID/GID to only those approved by a
>>>>>> +system-wide whitelist. These restrictions also prohibit the given UIDs/GIDs
>>>>>> +from obtaining auxiliary privileges associated with CAP_SET{U/G}ID, such as
>>>>>> +allowing a user to set up user namespace UID mappings.
>>>>>> +
>>>>>> +
>>>>>> +Background
>>>>>> +==========
>>>>>> +In absence of file capabilities, processes spawned on a Linux system that need
>>>>>> +to switch to a different user must be spawned with CAP_SETUID privileges.
>>>>>> +CAP_SETUID is granted to programs running as root or those running as a non-root
>>>>>> +user that have been explicitly given the CAP_SETUID runtime capability. It is
>>>>>> +often preferable to use Linux runtime capabilities rather than file
>>>>>> +capabilities, since using file capabilities to run a program with elevated
>>>>>> +privileges opens up possible security holes since any user with access to the
>>>>>> +file can exec() that program to gain the elevated privileges.
>>>>> Not true, see inheritable capabilities.  You also might look at ambient
>>>>> capabilities.
>>>> So for example with pam_cap.so you could have your N uids each be given
>>>> the desired pI, and assign the corrsponding fIs to the files they should
>>>> be able to exec with privilege.  No other uids will run those files with
>>>> privilege.  *1
>>> Sorry, what are "pl" and "fls" here? "Privilege level" and "files"?
>>>
>>>> Can you give some more details about exactly how you see SafeSetID being
>>>> used?
>>> Sure. The main use case for this LSM is to allow a non-root program to
>>> transition to other untrusted uids without full blown CAP_SETUID
>>> capabilities. The non-root program would still need CAP_SETUID to do
>>> any kind of transition, but the additional restrictions imposed by
>>> this LSM would mean it is a "safer" version of CAP_SETUID since the
>>> non-root program cannot take advantage of CAP_SETUID to do any
>>> unapproved actions (i.e. setuid to uid 0 or create/enter new user
>>> namespace). The higher level goal is to allow for uid-based sandboxing
>>> of system services without having to give out CAP_SETUID all over the
>>> place just so that non-root programs can drop to
>>> even-further-non-privileged uids. This is especially relevant when one
>>> non-root daemon on the system should be allowed to spawn other
>>> processes as different uids, but its undesirable to give the daemon a
>>> basically-root-equivalent CAP_SETUID.
>> I don't want to sound stupid(er than usual), but it sounds like
>> you could do all this using setuid bits prudently. Based on this
>> description, I don't see that anything new is needed.
> There are situations where setuid bits don't get the job done, as
> there are many situations where a program just wants to call setuid as
> part of its execution (or fork + setuid without exec), instead of
> fork/exec'ing a setuid binary.

Yes, I understand that.

> Take the following scenario for
> example: init script (as root) spawns a network manager program as uid
> 1000

So far, so good.

> and then the network manager spawns OpenVPN. The common mode of
> operation for OpenVPN is to start running as the uid it was spawned
> with (1000) at startup, but then drop to a lesser-privileged uid (e.g.
> 2000) after initialization/setup by calling setuid.

OK. That's an operation that does and ought to require privilege.

> This is something
> setuid bits wouldn't help with, without refactoring OpenVPN.

You're correct.

> So one
> option here is to give the network manager CAP_SETUID, which will be
> inherited by OpenVPN, and then OpenVPN drops to uid 2000 and drops
> CAP_SETUID (would probably require patching OpenVPN for the capability
> dropping).

Or, you put CAP_SETUID on the file capabilities for OpenVPN,
which is the way the P1003.1e DRAFT specification would have
you accomplish this. Unfortunately, with all the changes made
to capabilities for namespaces and all I'm not 100% sure I
could say exactly how to set that.

> The problem here is that if the network manager itself is
> untrusted and exploitable, then giving it unrestricted CAP_SETUID is a
> big security risk.

Right. That's why you set the file capabilities on OpenVPN.

> Even just sticking with the network manager / VPN
> example, strongSwan VPN also uses the same drop-to-user-through-setuid
> setup, as do other Linux applications.

Same solution.

> Refactoring these applications
> to fork/exec setuid binaries instead of simply calling setuid is often
> infeasible. So a direct call to setuid is often necessary/expected,
> and setuid bits don't help here.

What is it with kids these days, that they are so
afraid of fixing code that needs fixing? But that's
not necessary in this example.

> Also, use of setuid bits precludes the use of the no_new_privs bit,
> which is usually at least a nice-to-have (if not need-to-have) for
> sandboxed processes on the system.

But you've already said that you *want* to change the security state,
"drop to a lesser-privileged uid", so you're already mucking with the
sandbox. If you're going to say that changing UIDs doesn't count for
sandboxing I'll point out that you brought up the notion of a
lesser-privileged UID.
Micah Morton Nov. 2, 2018, 5:12 p.m. UTC | #16
On Fri, Nov 2, 2018 at 9:05 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> On 11/1/2018 12:52 PM, Micah Morton wrote:
> > On Thu, Nov 1, 2018 at 10:08 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
> >> On 11/1/2018 9:11 AM, Micah Morton wrote:
> >>> On Wed, Oct 31, 2018 at 11:07 PM Serge E. Hallyn <serge@hallyn.com> wrote:
> >>>> On Wed, Oct 31, 2018 at 09:02:45PM +0000, Serge E. Hallyn wrote:
> >>>>> Quoting mortonm@chromium.org (mortonm@chromium.org):
> >>>>>> From: Micah Morton <mortonm@chromium.org>
> >>>>>>
> >>>>>> SafeSetID gates the setid family of syscalls to restrict UID/GID
> >>>>>> transitions from a given UID/GID to only those approved by a
> >>>>>> system-wide whitelist. These restrictions also prohibit the given
> >>>>>> UIDs/GIDs from obtaining auxiliary privileges associated with
> >>>>>> CAP_SET{U/G}ID, such as allowing a user to set up user namespace UID
> >>>>>> mappings. For now, only gating the set*uid family of syscalls is
> >>>>>> supported, with support for set*gid coming in a future patch set.
> >>>>>>
> >>>>>> Signed-off-by: Micah Morton <mortonm@chromium.org>
> >>>>>> ---
> >>>>>>
> >>>>>> NOTE: See the TODO above setuid_syscall() in lsm.c for an aspect of this
> >>>>>> code that likely needs improvement before being an acceptable approach.
> >>>>>> I'm specifically interested to see if there are better ideas for how
> >>>>>> this could be done.
> >>>>>>
> >>>>>>  Documentation/admin-guide/LSM/SafeSetID.rst |  94 ++++++
> >>>>>>  Documentation/admin-guide/LSM/index.rst     |   1 +
> >>>>>>  arch/Kconfig                                |   5 +
> >>>>>>  arch/arm/Kconfig                            |   1 +
> >>>>>>  arch/arm64/Kconfig                          |   1 +
> >>>>>>  arch/x86/Kconfig                            |   1 +
> >>>>>>  security/Kconfig                            |   1 +
> >>>>>>  security/Makefile                           |   2 +
> >>>>>>  security/safesetid/Kconfig                  |  13 +
> >>>>>>  security/safesetid/Makefile                 |   7 +
> >>>>>>  security/safesetid/lsm.c                    | 334 ++++++++++++++++++++
> >>>>>>  security/safesetid/lsm.h                    |  30 ++
> >>>>>>  security/safesetid/securityfs.c             | 189 +++++++++++
> >>>>>>  13 files changed, 679 insertions(+)
> >>>>>>  create mode 100644 Documentation/admin-guide/LSM/SafeSetID.rst
> >>>>>>  create mode 100644 security/safesetid/Kconfig
> >>>>>>  create mode 100644 security/safesetid/Makefile
> >>>>>>  create mode 100644 security/safesetid/lsm.c
> >>>>>>  create mode 100644 security/safesetid/lsm.h
> >>>>>>  create mode 100644 security/safesetid/securityfs.c
> >>>>>>
> >>>>>> diff --git a/Documentation/admin-guide/LSM/SafeSetID.rst b/Documentation/admin-guide/LSM/SafeSetID.rst
> >>>>>> new file mode 100644
> >>>>>> index 000000000000..e7d072124424
> >>>>>> --- /dev/null
> >>>>>> +++ b/Documentation/admin-guide/LSM/SafeSetID.rst
> >>>>>> @@ -0,0 +1,94 @@
> >>>>>> +=========
> >>>>>> +SafeSetID
> >>>>>> +=========
> >>>>>> +SafeSetID is an LSM module that gates the setid family of syscalls to restrict
> >>>>>> +UID/GID transitions from a given UID/GID to only those approved by a
> >>>>>> +system-wide whitelist. These restrictions also prohibit the given UIDs/GIDs
> >>>>>> +from obtaining auxiliary privileges associated with CAP_SET{U/G}ID, such as
> >>>>>> +allowing a user to set up user namespace UID mappings.
> >>>>>> +
> >>>>>> +
> >>>>>> +Background
> >>>>>> +==========
> >>>>>> +In absence of file capabilities, processes spawned on a Linux system that need
> >>>>>> +to switch to a different user must be spawned with CAP_SETUID privileges.
> >>>>>> +CAP_SETUID is granted to programs running as root or those running as a non-root
> >>>>>> +user that have been explicitly given the CAP_SETUID runtime capability. It is
> >>>>>> +often preferable to use Linux runtime capabilities rather than file
> >>>>>> +capabilities, since using file capabilities to run a program with elevated
> >>>>>> +privileges opens up possible security holes since any user with access to the
> >>>>>> +file can exec() that program to gain the elevated privileges.
> >>>>> Not true, see inheritable capabilities.  You also might look at ambient
> >>>>> capabilities.
> >>>> So for example with pam_cap.so you could have your N uids each be given
> >>>> the desired pI, and assign the corrsponding fIs to the files they should
> >>>> be able to exec with privilege.  No other uids will run those files with
> >>>> privilege.  *1
> >>> Sorry, what are "pl" and "fls" here? "Privilege level" and "files"?
> >>>
> >>>> Can you give some more details about exactly how you see SafeSetID being
> >>>> used?
> >>> Sure. The main use case for this LSM is to allow a non-root program to
> >>> transition to other untrusted uids without full blown CAP_SETUID
> >>> capabilities. The non-root program would still need CAP_SETUID to do
> >>> any kind of transition, but the additional restrictions imposed by
> >>> this LSM would mean it is a "safer" version of CAP_SETUID since the
> >>> non-root program cannot take advantage of CAP_SETUID to do any
> >>> unapproved actions (i.e. setuid to uid 0 or create/enter new user
> >>> namespace). The higher level goal is to allow for uid-based sandboxing
> >>> of system services without having to give out CAP_SETUID all over the
> >>> place just so that non-root programs can drop to
> >>> even-further-non-privileged uids. This is especially relevant when one
> >>> non-root daemon on the system should be allowed to spawn other
> >>> processes as different uids, but its undesirable to give the daemon a
> >>> basically-root-equivalent CAP_SETUID.
> >> I don't want to sound stupid(er than usual), but it sounds like
> >> you could do all this using setuid bits prudently. Based on this
> >> description, I don't see that anything new is needed.
> > There are situations where setuid bits don't get the job done, as
> > there are many situations where a program just wants to call setuid as
> > part of its execution (or fork + setuid without exec), instead of
> > fork/exec'ing a setuid binary.
>
> Yes, I understand that.
>
> > Take the following scenario for
> > example: init script (as root) spawns a network manager program as uid
> > 1000
>
> So far, so good.
>
> > and then the network manager spawns OpenVPN. The common mode of
> > operation for OpenVPN is to start running as the uid it was spawned
> > with (1000) at startup, but then drop to a lesser-privileged uid (e.g.
> > 2000) after initialization/setup by calling setuid.
>
> OK. That's an operation that does and ought to require privilege.

Sure, but the idea behind this LSM is that full CAP_SETUID
capabilities are a lot more privilege than is necessary in this
scenario.

>
> > This is something
> > setuid bits wouldn't help with, without refactoring OpenVPN.
>
> You're correct.
>
> > So one
> > option here is to give the network manager CAP_SETUID, which will be
> > inherited by OpenVPN, and then OpenVPN drops to uid 2000 and drops
> > CAP_SETUID (would probably require patching OpenVPN for the capability
> > dropping).
>
> Or, you put CAP_SETUID on the file capabilities for OpenVPN,
> which is the way the P1003.1e DRAFT specification would have
> you accomplish this. Unfortunately, with all the changes made
> to capabilities for namespaces and all I'm not 100% sure I
> could say exactly how to set that.
>
> > The problem here is that if the network manager itself is
> > untrusted and exploitable, then giving it unrestricted CAP_SETUID is a
> > big security risk.
>
> Right. That's why you set the file capabilities on OpenVPN.

So it seems like you're suggesting that any time a program needs to
switch user by calling setuid, that it should get full CAP_SETUID
capabilities (whether that's through setting file capabilities on the
binary or inheriting CAP_SETUID from a parent process or otherwise).
But that brings us back to the basic problem this LSM is trying to
solve. Namely, we don't want to sprinkle unrestricted CAP_SETUID privs
all over the system for binaries that just want to switch to specific
uid[s] and don't need any of the root-equivalent privileges provided
by CAP_SETUID.

>
> > Even just sticking with the network manager / VPN
> > example, strongSwan VPN also uses the same drop-to-user-through-setuid
> > setup, as do other Linux applications.
>
> Same solution.
>
> > Refactoring these applications
> > to fork/exec setuid binaries instead of simply calling setuid is often
> > infeasible. So a direct call to setuid is often necessary/expected,
> > and setuid bits don't help here.
>
> What is it with kids these days, that they are so
> afraid of fixing code that needs fixing? But that's
> not necessary in this example.
>
> > Also, use of setuid bits precludes the use of the no_new_privs bit,
> > which is usually at least a nice-to-have (if not need-to-have) for
> > sandboxed processes on the system.
>
> But you've already said that you *want* to change the security state,
> "drop to a lesser-privileged uid", so you're already mucking with the
> sandbox. If you're going to say that changing UIDs doesn't count for
> sandboxing I'll point out that you brought up the notion of a
> lesser-privileged UID.

There are plenty of ways that non-root processes further restrict
especially vulnerable parts of their code to even lesser-privileged
contexts. But its often easier to reason about the security of such
applications if the no_new_privs bit is set and file capabilities are
avoided, so the application can have full control of which privileges
are given to spawned processes without having to worry about which
privileges are attached to which files. Granted, the no_new_privs
issue is less central to the LSM being proposed here compared to the
discussion above.

>
Stephen Smalley Nov. 2, 2018, 6:07 p.m. UTC | #17
On 10/31/18 11:28 AM, mortonm@chromium.org wrote:
> From: Micah Morton <mortonm@chromium.org>
> 
> SafeSetID gates the setid family of syscalls to restrict UID/GID
> transitions from a given UID/GID to only those approved by a
> system-wide whitelist. These restrictions also prohibit the given
> UIDs/GIDs from obtaining auxiliary privileges associated with
> CAP_SET{U/G}ID, such as allowing a user to set up user namespace UID
> mappings. For now, only gating the set*uid family of syscalls is
> supported, with support for set*gid coming in a future patch set.
> 
> Signed-off-by: Micah Morton <mortonm@chromium.org>
> ---
> 
> NOTE: See the TODO above setuid_syscall() in lsm.c for an aspect of this
> code that likely needs improvement before being an acceptable approach.
> I'm specifically interested to see if there are better ideas for how
> this could be done.

If it were me, I'd modify the callers of ns_capable(..., CAP_SETUID) in 
some manner to let you distinguish rather than trying to test the 
current syscall within the capable hook.  Modify the set*id system calls 
to use a variant interface that passes flags or something; there is 
already precedent for the _noaudit case but it isn't general.  More 
generally, extending ns_capable() and friends to take a variety of 
additional inputs would be useful, e.g. to allow one to pass down the 
inode for CAP_DAC_OVERRIDE/READ_SEARCH checks so that one could 
authorize it for specific files rather than all or nothing. This is 
already partly done via capable_wrt_inode_uidgid() but the inode isn't 
propagated down to ns_capable() and thus cannot be passed down to the 
security hook currently.

> 
>   Documentation/admin-guide/LSM/SafeSetID.rst |  94 ++++++
>   Documentation/admin-guide/LSM/index.rst     |   1 +
>   arch/Kconfig                                |   5 +
>   arch/arm/Kconfig                            |   1 +
>   arch/arm64/Kconfig                          |   1 +
>   arch/x86/Kconfig                            |   1 +
>   security/Kconfig                            |   1 +
>   security/Makefile                           |   2 +
>   security/safesetid/Kconfig                  |  13 +
>   security/safesetid/Makefile                 |   7 +
>   security/safesetid/lsm.c                    | 334 ++++++++++++++++++++
>   security/safesetid/lsm.h                    |  30 ++
>   security/safesetid/securityfs.c             | 189 +++++++++++
>   13 files changed, 679 insertions(+)
>   create mode 100644 Documentation/admin-guide/LSM/SafeSetID.rst
>   create mode 100644 security/safesetid/Kconfig
>   create mode 100644 security/safesetid/Makefile
>   create mode 100644 security/safesetid/lsm.c
>   create mode 100644 security/safesetid/lsm.h
>   create mode 100644 security/safesetid/securityfs.c
> 
> diff --git a/Documentation/admin-guide/LSM/SafeSetID.rst b/Documentation/admin-guide/LSM/SafeSetID.rst
> new file mode 100644
> index 000000000000..e7d072124424
> --- /dev/null
> +++ b/Documentation/admin-guide/LSM/SafeSetID.rst
> @@ -0,0 +1,94 @@
> +=========
> +SafeSetID
> +=========
> +SafeSetID is an LSM module that gates the setid family of syscalls to restrict
> +UID/GID transitions from a given UID/GID to only those approved by a
> +system-wide whitelist. These restrictions also prohibit the given UIDs/GIDs
> +from obtaining auxiliary privileges associated with CAP_SET{U/G}ID, such as
> +allowing a user to set up user namespace UID mappings.
> +
> +
> +Background
> +==========
> +In absence of file capabilities, processes spawned on a Linux system that need
> +to switch to a different user must be spawned with CAP_SETUID privileges.
> +CAP_SETUID is granted to programs running as root or those running as a non-root
> +user that have been explicitly given the CAP_SETUID runtime capability. It is
> +often preferable to use Linux runtime capabilities rather than file
> +capabilities, since using file capabilities to run a program with elevated
> +privileges opens up possible security holes since any user with access to the
> +file can exec() that program to gain the elevated privileges.
> +
> +While it is possible to implement a tree of processes by giving full
> +CAP_SET{U/G}ID capabilities, this is often at odds with the goals of running a
> +tree of processes under non-root user(s) in the first place. Specifically,
> +since CAP_SETUID allows changing to any user on the system, including the root
> +user, it is an overpowered capability for what is needed in this scenario,
> +especially since programs often only call setuid() to drop privileges to a
> +lesser-privileged user -- not elevate privileges. Unfortunately, there is no
> +generally feasible way in Linux to restrict the potential UIDs that a user can
> +switch to through setuid() beyond allowing a switch to any user on the system.
> +This SafeSetID LSM seeks to provide a solution for restricting setid
> +capabilities in such a way.
> +
> +
> +Other Approaches Considered
> +===========================
> +
> +Solve this problem in userspace
> +-------------------------------
> +For candidate applications that would like to have restricted setid capabilities
> +as implemented in this LSM, an alternative option would be to simply take away
> +setid capabilities from the application completely and refactor the process
> +spawning semantics in the application (e.g. by using a privileged helper program
> +to do process spawning and UID/GID transitions). Unfortunately, there are a
> +number of semantics around process spawning that would be affected by this, such
> +as fork() calls where the program doesn’t immediately call exec() after the
> +fork(), parent processes specifying custom environment variables or command line
> +args for spawned child processes, or inheritance of file handles across a
> +fork()/exec(). Because of this, as solution that uses a privileged helper in
> +userspace would likely be less appealing to incorporate into existing projects
> +that rely on certain process-spawning semantics in Linux.
> +
> +Use user namespaces
> +-------------------
> +Another possible approach would be to run a given process tree in its own user
> +namespace and give programs in the tree setid capabilities. In this way,
> +programs in the tree could change to any desired UID/GID in the context of their
> +own user namespace, and only approved UIDs/GIDs could be mapped back to the
> +initial system user namespace, affectively preventing privilege escalation.
> +Unfortunately, it is not generally feasible to use user namespaces in isolation,
> +without pairing them with other namespace types, which is not always an option.
> +Linux checks for capabilities based off of the user namespace that “owns” some
> +entity. For example, Linux has the notion that network namespaces are owned by
> +the user namespace in which they were created. A consequence of this is that
> +capability checks for access to a given network namespace are done by checking
> +whether a task has the given capability in the context of the user namespace
> +that owns the network namespace -- not necessarily the user namespace under
> +which the given task runs. Therefore spawning a process in a new user namespace
> +effectively prevents it from accessing the network namespace owned by the
> +initial namespace. This is a deal-breaker for any application that expects to
> +retain the CAP_NET_ADMIN capability for the purpose of adjusting network
> +configurations. Using user namespaces in isolation causes problems regarding
> +other system interactions, including use of pid namespaces and device creation.
> +
> +Use an existing LSM
> +-------------------
> +None of the other in-tree LSMs have the capability to gate setid transitions, or
> +even employ the security_task_fix_setuid hook at all. SELinux says of that hook:
> +"Since setuid only affects the current process, and since the SELinux controls
> +are not based on the Linux identity attributes, SELinux does not need to control
> +this operation."
> +
> +
> +Directions for use
> +==================
> +This LSM hooks the setid syscalls to make sure transitions are allowed if an
> +applicable restriction policy is in place. Policies are configured through
> +securityfs by writing to the safesetid/add_whitelist_policy and
> +safesetid/flush_whitelist_policies files at the location where securityfs is
> +mounted. The format for adding a policy is '<UID>:<UID>', using literal
> +numbers, such as '123:456'. To flush the policies, any write to the file is
> +sufficient. Again, configuring a policy for a UID will prevent that UID from
> +obtaining auxiliary setid privileges, such as allowing a user to set up user
> +namespace UID mappings.
> diff --git a/Documentation/admin-guide/LSM/index.rst b/Documentation/admin-guide/LSM/index.rst
> index c980dfe9abf1..a0c387649e12 100644
> --- a/Documentation/admin-guide/LSM/index.rst
> +++ b/Documentation/admin-guide/LSM/index.rst
> @@ -39,3 +39,4 @@ the one "major" module (e.g. SELinux) if there is one configured.
>      Smack
>      tomoyo
>      Yama
> +   SafeSetID
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 1aa59063f1fd..c87070807ba2 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -381,6 +381,11 @@ config ARCH_WANT_OLD_COMPAT_IPC
>   	select ARCH_WANT_COMPAT_IPC_PARSE_VERSION
>   	bool
>   
> +config HAVE_SAFESETID
> +	bool
> +	help
> +	  This option enables the SafeSetID LSM.
> +
>   config HAVE_ARCH_SECCOMP_FILTER
>   	bool
>   	help
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 843edfd000be..35b1a772c971 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -92,6 +92,7 @@ config ARM
>   	select HAVE_RCU_TABLE_FREE if (SMP && ARM_LPAE)
>   	select HAVE_REGS_AND_STACK_ACCESS_API
>   	select HAVE_RSEQ
> +	select HAVE_SAFESETID
>   	select HAVE_STACKPROTECTOR
>   	select HAVE_SYSCALL_TRACEPOINTS
>   	select HAVE_UID16
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 42c090cf0292..2c6f5ec3a55e 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -127,6 +127,7 @@ config ARM64
>   	select HAVE_PERF_USER_STACK_DUMP
>   	select HAVE_REGS_AND_STACK_ACCESS_API
>   	select HAVE_RCU_TABLE_FREE
> +	select HAVE_SAFESETID
>   	select HAVE_STACKPROTECTOR
>   	select HAVE_SYSCALL_TRACEPOINTS
>   	select HAVE_KPROBES
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 887d3a7bb646..a6527d6c0426 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -27,6 +27,7 @@ config X86_64
>   	select ARCH_SUPPORTS_INT128
>   	select ARCH_USE_CMPXCHG_LOCKREF
>   	select HAVE_ARCH_SOFT_DIRTY
> +	select HAVE_SAFESETID
>   	select MODULES_USE_ELF_RELA
>   	select NEED_DMA_MAP_STATE
>   	select SWIOTLB
> diff --git a/security/Kconfig b/security/Kconfig
> index c4302067a3ad..7d9008ad5903 100644
> --- a/security/Kconfig
> +++ b/security/Kconfig
> @@ -237,6 +237,7 @@ source security/tomoyo/Kconfig
>   source security/apparmor/Kconfig
>   source security/loadpin/Kconfig
>   source security/yama/Kconfig
> +source security/safesetid/Kconfig
>   
>   source security/integrity/Kconfig
>   
> diff --git a/security/Makefile b/security/Makefile
> index 4d2d3782ddef..88209d827832 100644
> --- a/security/Makefile
> +++ b/security/Makefile
> @@ -10,6 +10,7 @@ subdir-$(CONFIG_SECURITY_TOMOYO)        += tomoyo
>   subdir-$(CONFIG_SECURITY_APPARMOR)	+= apparmor
>   subdir-$(CONFIG_SECURITY_YAMA)		+= yama
>   subdir-$(CONFIG_SECURITY_LOADPIN)	+= loadpin
> +subdir-$(CONFIG_SECURITY_SAFESETID)	+= safesetid
>   
>   # always enable default capabilities
>   obj-y					+= commoncap.o
> @@ -25,6 +26,7 @@ obj-$(CONFIG_SECURITY_TOMOYO)		+= tomoyo/
>   obj-$(CONFIG_SECURITY_APPARMOR)		+= apparmor/
>   obj-$(CONFIG_SECURITY_YAMA)		+= yama/
>   obj-$(CONFIG_SECURITY_LOADPIN)		+= loadpin/
> +obj-$(CONFIG_SECURITY_SAFESETID)	+= safesetid/
>   obj-$(CONFIG_CGROUP_DEVICE)		+= device_cgroup.o
>   
>   # Object integrity file lists
> diff --git a/security/safesetid/Kconfig b/security/safesetid/Kconfig
> new file mode 100644
> index 000000000000..4ff82c7ed273
> --- /dev/null
> +++ b/security/safesetid/Kconfig
> @@ -0,0 +1,13 @@
> +config SECURITY_SAFESETID
> +        bool "Gate setid transitions to limit CAP_SET{U/G}ID capabilities"
> +        depends on HAVE_SAFESETID
> +        default n
> +        help
> +          SafeSetID is an LSM module that gates the setid family of syscalls to
> +          restrict UID/GID transitions from a given UID/GID to only those
> +          approved by a system-wide whitelist. These restrictions also prohibit
> +          the given UIDs/GIDs from obtaining auxiliary privileges associated
> +          with CAP_SET{U/G}ID, such as allowing a user to set up user namespace
> +          UID mappings.
> +
> +          If you are unsure how to answer this question, answer N.
> diff --git a/security/safesetid/Makefile b/security/safesetid/Makefile
> new file mode 100644
> index 000000000000..6b0660321164
> --- /dev/null
> +++ b/security/safesetid/Makefile
> @@ -0,0 +1,7 @@
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Makefile for the safesetid LSM.
> +#
> +
> +obj-$(CONFIG_SECURITY_SAFESETID) := safesetid.o
> +safesetid-y := lsm.o securityfs.o
> diff --git a/security/safesetid/lsm.c b/security/safesetid/lsm.c
> new file mode 100644
> index 000000000000..e30ff06d8e07
> --- /dev/null
> +++ b/security/safesetid/lsm.c
> @@ -0,0 +1,334 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * SafeSetID Linux Security Module
> + *
> + * Author: Micah Morton <mortonm@chromium.org>
> + *
> + * Copyright (C) 2018 The Chromium OS Authors.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2, as
> + * published by the Free Software Foundation.
> + *
> + */
> +
> +#define pr_fmt(fmt) "SafeSetID: " fmt
> +
> +#include <asm/syscall.h>
> +#include <linux/hashtable.h>
> +#include <linux/lsm_hooks.h>
> +#include <linux/module.h>
> +#include <linux/ptrace.h>
> +#include <linux/sched/task_stack.h>
> +#include <linux/security.h>
> +
> +#define NUM_BITS 8 /* 128 buckets in hash table */
> +
> +static DEFINE_HASHTABLE(safesetid_whitelist_hashtable, NUM_BITS);
> +
> +/*
> + * Hash table entry to store safesetid policy signifying that 'parent' user
> + * can setid to 'child' user.
> + */
> +struct entry {
> +	struct hlist_node next;
> +	struct hlist_node dlist; /* for deletion cleanup */
> +	uint64_t parent_kuid;
> +	uint64_t child_kuid;
> +};
> +
> +static DEFINE_SPINLOCK(safesetid_whitelist_hashtable_spinlock);
> +
> +static bool check_setuid_policy_hashtable_key(kuid_t parent)
> +{
> +	struct entry *entry;
> +
> +	rcu_read_lock();
> +	hash_for_each_possible_rcu(safesetid_whitelist_hashtable,
> +				   entry, next, __kuid_val(parent)) {
> +		if (entry->parent_kuid == __kuid_val(parent)) {
> +			rcu_read_unlock();
> +			return true;
> +		}
> +	}
> +	rcu_read_unlock();
> +
> +	return false;
> +}
> +
> +static bool check_setuid_policy_hashtable_key_value(kuid_t parent,
> +						    kuid_t child)
> +{
> +	struct entry *entry;
> +
> +	rcu_read_lock();
> +	hash_for_each_possible_rcu(safesetid_whitelist_hashtable,
> +				   entry, next, __kuid_val(parent)) {
> +		if (entry->parent_kuid == __kuid_val(parent) &&
> +		    entry->child_kuid == __kuid_val(child)) {
> +			rcu_read_unlock();
> +			return true;
> +		}
> +	}
> +	rcu_read_unlock();
> +
> +	return false;
> +}
> +
> +/*
> + * TODO: Figuring out whether the current syscall number (saved on the kernel
> + * stack) is one of the set*uid syscalls is an operation that requires checking
> + * the number against arch-specific constants as seen below. The need for this
> + * LSM to know about arch-specific syscall stuff is not ideal. Is it better to
> + * implement an arch-specific function that gets called from this file and
> + * update arch/Kconfig to mention that the HAVE_SAFESETID symbol should only be
> + * selected for architectures that implement the function? Any other ideas?
> + */
> +static bool setuid_syscall(int num)
> +{
> +#ifdef CONFIG_X86_64
> +#ifdef CONFIG_COMPAT
> +	if (!(num == __NR_setreuid ||
> +	      num == __NR_setuid ||
> +	      num == __NR_setresuid ||
> +	      num == __NR_setfsuid ||
> +	      num == __NR_ia32_setreuid32 ||
> +	      num == __NR_ia32_setuid ||
> +	      num == __NR_ia32_setresuid ||
> +	      num == __NR_ia32_setresuid ||
> +	      num == __NR_ia32_setuid32))
> +		return false;
> +#else
> +	if (!(num == __NR_setreuid ||
> +	      num == __NR_setuid ||
> +	      num == __NR_setresuid ||
> +	      num == __NR_setfsuid))
> +		return false;
> +#endif /* CONFIG_COMPAT */
> +#elif defined CONFIG_ARM64
> +#ifdef CONFIG_COMPAT
> +	if (!(num == __NR_setuid ||
> +	      num == __NR_setreuid ||
> +	      num == __NR_setfsuid ||
> +	      num == __NR_setresuid ||
> +	      num == __NR_setreuid32 ||
> +	      num == __NR_setresuid32 ||
> +	      num == __NR_setuid32 ||
> +	      num == __NR_setfsuid32 ||
> +	      num == __NR_compat_setuid ||
> +	      num == __NR_compat_setreuid ||
> +	      num == __NR_compat_setfsuid ||
> +	      num == __NR_compat_setresuid ||
> +	      num == __NR_compat_setreuid32 ||
> +	      num == __NR_compat_setresuid32 ||
> +	      num == __NR_compat_setuid32 ||
> +	      num == __NR_compat_setfsuid32))
> +		return false;
> +#else
> +	if (!(num == __NR_setuid ||
> +	      num == __NR_setreuid ||
> +	      num == __NR_setfsuid ||
> +	      num == __NR_setresuid))
> +		return false;
> +#endif /* CONFIG_COMPAT */
> +#elif defined CONFIG_ARM
> +	if (!(num == __NR_setreuid32 ||
> +	      num == __NR_setuid32 ||
> +	      num == __NR_setresuid32 ||
> +	      num == __NR_setfsuid32))
> +		return false;
> +#else
> +	BUILD_BUG();
> +#endif
> +	return true;
> +}
> +
> +static int safesetid_security_capable(const struct cred *cred,
> +				      struct user_namespace *ns,
> +				      int cap,
> +				      int audit)
> +{
> +	/* The current->mm check will fail if this is a kernel thread. */
> +	if (cap == CAP_SETUID &&
> +	    current->mm &&
> +	    check_setuid_policy_hashtable_key(cred->uid)) {
> +		/*
> +		 * syscall_get_nr can theoretically return 0 or -1, but that
> +		 * would signify that the syscall is being aborted due to a
> +		 * signal, so we don't need to check for this case here.
> +		 */
> +		if (!(setuid_syscall(syscall_get_nr(current,
> +						    current_pt_regs()))))
> +			/*
> +			 * Deny if we're not in a set*uid() syscall to avoid
> +			 * giving powers gated by CAP_SETUID that are related
> +			 * to functionality other than calling set*uid() (e.g.
> +			 * allowing user to set up userns uid mappings).
> +			 */
> +			return -1;
> +	}
> +	return 0;
> +}
> +
> +static void setuid_policy_warning(kuid_t parent, kuid_t child)
> +{
> +	pr_warn("UID transition (%d -> %d) blocked",
> +		__kuid_val(parent),
> +		__kuid_val(child));
> +}
> +
> +static int check_uid_transition(kuid_t parent, kuid_t child)
> +{
> +	if (check_setuid_policy_hashtable_key_value(parent, child))
> +		return 0;
> +	setuid_policy_warning(parent, child);
> +	return -1;
> +}
> +
> +/*
> + * Check whether there is either an exception for user under old cred struct to
> + * set*uid to user under new cred struct, or the UID transition is allowed (by
> + * Linux set*uid rules) even without CAP_SETUID.
> + */
> +static int safesetid_task_fix_setuid(struct cred *new,
> +				     const struct cred *old,
> +				     int flags)
> +{
> +
> +	/* Do nothing if there are no setuid restrictions for this UID. */
> +	if (!check_setuid_policy_hashtable_key(old->uid))
> +		return 0;
> +
> +	switch (flags) {
> +	case LSM_SETID_RE:
> +		/*
> +		 * Users for which setuid restrictions exist can only set the
> +		 * real UID to the real UID or the effective UID, unless an
> +		 * explicit whitelist policy allows the transition.
> +		 */
> +		if (!uid_eq(old->uid, new->uid) &&
> +			!uid_eq(old->euid, new->uid)) {
> +			return check_uid_transition(old->uid, new->uid);
> +		}
> +		/*
> +		 * Users for which setuid restrictions exist can only set the
> +		 * effective UID to the real UID, the effective UID, or the
> +		 * saved set-UID, unless an explicit whitelist policy allows
> +		 * the transition.
> +		 */
> +		if (!uid_eq(old->uid, new->euid) &&
> +			!uid_eq(old->euid, new->euid) &&
> +			!uid_eq(old->suid, new->euid)) {
> +			return check_uid_transition(old->euid, new->euid);
> +		}
> +		break;
> +	case LSM_SETID_ID:
> +		/*
> +		 * Users for which setuid restrictions exist cannot change the
> +		 * real UID or saved set-UID unless an explicit whitelist
> +		 * policy allows the transition.
> +		 */
> +		if (!uid_eq(old->uid, new->uid))
> +			return check_uid_transition(old->uid, new->uid);
> +		if (!uid_eq(old->suid, new->suid))
> +			return check_uid_transition(old->suid, new->suid);
> +		break;
> +	case LSM_SETID_RES:
> +		/*
> +		 * Users for which setuid restrictions exist cannot change the
> +		 * real UID, effective UID, or saved set-UID to anything but
> +		 * one of: the current real UID, the current effective UID or
> +		 * the current saved set-user-ID unless an explicit whitelist
> +		 * policy allows the transition.
> +		 */
> +		if (!uid_eq(new->uid, old->uid) &&
> +			!uid_eq(new->uid, old->euid) &&
> +			!uid_eq(new->uid, old->suid)) {
> +			return check_uid_transition(old->uid, new->uid);
> +		}
> +		if (!uid_eq(new->euid, old->uid) &&
> +			!uid_eq(new->euid, old->euid) &&
> +			!uid_eq(new->euid, old->suid)) {
> +			return check_uid_transition(old->euid, new->euid);
> +		}
> +		if (!uid_eq(new->suid, old->uid) &&
> +			!uid_eq(new->suid, old->euid) &&
> +			!uid_eq(new->suid, old->suid)) {
> +			return check_uid_transition(old->suid, new->suid);
> +		}
> +		break;
> +	case LSM_SETID_FS:
> +		/*
> +		 * Users for which setuid restrictions exist cannot change the
> +		 * filesystem UID to anything but one of: the current real UID,
> +		 * the current effective UID or the current saved set-UID
> +		 * unless an explicit whitelist policy allows the transition.
> +		 */
> +		if (!uid_eq(new->fsuid, old->uid)  &&
> +			!uid_eq(new->fsuid, old->euid)  &&
> +			!uid_eq(new->fsuid, old->suid) &&
> +			!uid_eq(new->fsuid, old->fsuid)) {
> +			return check_uid_transition(old->fsuid, new->fsuid);
> +		}
> +		break;
> +	}
> +	return 0;
> +}
> +
> +int add_safesetid_whitelist_entry(kuid_t parent, kuid_t child)
> +{
> +	struct entry *new;
> +
> +	/* Return if entry already exists */
> +	if (check_setuid_policy_hashtable_key_value(parent, child))
> +		return 0;
> +
> +	new = kzalloc(sizeof(struct entry), GFP_KERNEL);
> +	if (!new)
> +		return -ENOMEM;
> +	new->parent_kuid = __kuid_val(parent);
> +	new->child_kuid = __kuid_val(child);
> +	spin_lock(&safesetid_whitelist_hashtable_spinlock);
> +	hash_add_rcu(safesetid_whitelist_hashtable,
> +		     &new->next,
> +		     __kuid_val(parent));
> +	spin_unlock(&safesetid_whitelist_hashtable_spinlock);
> +	return 0;
> +}
> +
> +void flush_safesetid_whitelist_entries(void)
> +{
> +	struct entry *entry;
> +	struct hlist_node *hlist_node;
> +	unsigned int bkt_loop_cursor;
> +	HLIST_HEAD(free_list);
> +
> +	/*
> +	 * Could probably use hash_for_each_rcu here instead, but this should
> +	 * be fine as well.
> +	 */
> +	hash_for_each_safe(safesetid_whitelist_hashtable, bkt_loop_cursor,
> +			   hlist_node, entry, next) {
> +		spin_lock(&safesetid_whitelist_hashtable_spinlock);
> +		hash_del_rcu(&entry->next);
> +		spin_unlock(&safesetid_whitelist_hashtable_spinlock);
> +		hlist_add_head(&entry->dlist, &free_list);
> +	}
> +	synchronize_rcu();
> +	hlist_for_each_entry_safe(entry, hlist_node, &free_list, dlist)
> +		kfree(entry);
> +}
> +
> +static struct security_hook_list safesetid_security_hooks[] = {
> +	LSM_HOOK_INIT(task_fix_setuid, safesetid_task_fix_setuid),
> +	LSM_HOOK_INIT(capable, safesetid_security_capable)
> +};
> +
> +static int __init safesetid_security_init(void)
> +{
> +	security_add_hooks(safesetid_security_hooks,
> +			   ARRAY_SIZE(safesetid_security_hooks), "safesetid");
> +
> +	return 0;
> +}
> +security_initcall(safesetid_security_init);
> diff --git a/security/safesetid/lsm.h b/security/safesetid/lsm.h
> new file mode 100644
> index 000000000000..bf78af9bf314
> --- /dev/null
> +++ b/security/safesetid/lsm.h
> @@ -0,0 +1,30 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * SafeSetID Linux Security Module
> + *
> + * Author: Micah Morton <mortonm@chromium.org>
> + *
> + * Copyright (C) 2018 The Chromium OS Authors.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2, as
> + * published by the Free Software Foundation.
> + *
> + */
> +#ifndef _SAFESETID_H
> +#define _SAFESETID_H
> +
> +#include <linux/types.h>
> +
> +/* Function type. */
> +enum safesetid_whitelist_file_write_type {
> +	SAFESETID_WHITELIST_ADD, /* Add whitelist policy. */
> +	SAFESETID_WHITELIST_FLUSH, /* Flush whitelist policies. */
> +};
> +
> +/* Add entry to safesetid whitelist to allow 'parent' to setid to 'child'. */
> +int add_safesetid_whitelist_entry(kuid_t parent, kuid_t child);
> +
> +void flush_safesetid_whitelist_entries(void);
> +
> +#endif /* _SAFESETID_H */
> diff --git a/security/safesetid/securityfs.c b/security/safesetid/securityfs.c
> new file mode 100644
> index 000000000000..ff5fcf2c1b37
> --- /dev/null
> +++ b/security/safesetid/securityfs.c
> @@ -0,0 +1,189 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * SafeSetID Linux Security Module
> + *
> + * Author: Micah Morton <mortonm@chromium.org>
> + *
> + * Copyright (C) 2018 The Chromium OS Authors.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2, as
> + * published by the Free Software Foundation.
> + *
> + */
> +#include <linux/security.h>
> +#include <linux/cred.h>
> +
> +#include "lsm.h"
> +
> +static struct dentry *safesetid_policy_dir;
> +
> +struct safesetid_file_entry {
> +	const char *name;
> +	enum safesetid_whitelist_file_write_type type;
> +	struct dentry *dentry;
> +};
> +
> +static struct safesetid_file_entry safesetid_files[] = {
> +	{.name = "add_whitelist_policy",
> +	 .type = SAFESETID_WHITELIST_ADD},
> +	{.name = "flush_whitelist_policies",
> +	 .type = SAFESETID_WHITELIST_FLUSH},
> +};
> +
> +/*
> + * In the case the input buffer contains one or more invalid UIDs, the kuid_t
> + * variables pointed to by 'parent' and 'child' will get updated but this
> + * function will return an error.
> + */
> +static int parse_safesetid_whitelist_policy(const char __user *buf,
> +					    size_t len,
> +					    kuid_t *parent,
> +					    kuid_t *child)
> +{
> +	char *kern_buf;
> +	char *parent_buf;
> +	char *child_buf;
> +	const char separator[] = ":";
> +	int ret;
> +	size_t first_substring_length;
> +	long parsed_parent;
> +	long parsed_child;
> +
> +	/* Duplicate string from user memory and NULL-terminate */
> +	kern_buf = memdup_user_nul(buf, len);
> +	if (IS_ERR(kern_buf))
> +		return PTR_ERR(kern_buf);
> +
> +	/*
> +	 * Format of |buf| string should be <UID>:<UID>.
> +	 * Find location of ":" in kern_buf (copied from |buf|).
> +	 */
> +	first_substring_length = strcspn(kern_buf, separator);
> +	if (first_substring_length == 0 || first_substring_length == len) {
> +		ret = -EINVAL;
> +		goto free_kern;
> +	}
> +
> +	parent_buf = kmemdup_nul(kern_buf, first_substring_length, GFP_KERNEL);
> +	if (!parent_buf) {
> +		ret = -ENOMEM;
> +		goto free_kern;
> +	}
> +
> +	ret = kstrtol(parent_buf, 0, &parsed_parent);
> +	if (ret)
> +		goto free_both;
> +
> +	child_buf = kern_buf + first_substring_length + 1;
> +	ret = kstrtol(child_buf, 0, &parsed_child);
> +	if (ret)
> +		goto free_both;
> +
> +	*parent = make_kuid(current_user_ns(), parsed_parent);
> +	if (!uid_valid(*parent)) {
> +		ret = -EINVAL;
> +		goto free_both;
> +	}
> +
> +	*child = make_kuid(current_user_ns(), parsed_child);
> +	if (!uid_valid(*child)) {
> +		ret = -EINVAL;
> +		goto free_both;
> +	}
> +
> +free_both:
> +	kfree(parent_buf);
> +free_kern:
> +	kfree(kern_buf);
> +	return ret;
> +}
> +
> +static ssize_t safesetid_file_write(struct file *file,
> +				    const char __user *buf,
> +				    size_t len,
> +				    loff_t *ppos)
> +{
> +	struct safesetid_file_entry *file_entry =
> +		file->f_inode->i_private;
> +	kuid_t parent;
> +	kuid_t child;
> +	int ret;
> +
> +	if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN))
> +		return -EPERM;
> +
> +	if (*ppos != 0)
> +		return -EINVAL;
> +
> +	if (file_entry->type == SAFESETID_WHITELIST_FLUSH) {
> +		flush_safesetid_whitelist_entries();
> +		return len;
> +	}
> +
> +	/*
> +	 * If we get to here, must be the case that file_entry->type equals
> +	 * SAFESETID_WHITELIST_ADD
> +	 */
> +	ret = parse_safesetid_whitelist_policy(buf, len, &parent,
> +							 &child);
> +	if (ret)
> +		return ret;
> +
> +	ret = add_safesetid_whitelist_entry(parent, child);
> +	if (ret)
> +		return ret;
> +
> +	/* Return len on success so caller won't keep trying to write */
> +	return len;
> +}
> +
> +static const struct file_operations safesetid_file_fops = {
> +	.write = safesetid_file_write,
> +};
> +
> +static void safesetid_shutdown_securityfs(void)
> +{
> +	int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(safesetid_files); ++i) {
> +		struct safesetid_file_entry *entry =
> +			&safesetid_files[i];
> +		securityfs_remove(entry->dentry);
> +		entry->dentry = NULL;
> +	}
> +
> +	securityfs_remove(safesetid_policy_dir);
> +	safesetid_policy_dir = NULL;
> +}
> +
> +static int __init safesetid_init_securityfs(void)
> +{
> +	int i;
> +	int ret;
> +
> +	safesetid_policy_dir = securityfs_create_dir("safesetid", NULL);
> +	if (!safesetid_policy_dir) {
> +		ret = PTR_ERR(safesetid_policy_dir);
> +		goto error;
> +	}
> +
> +	for (i = 0; i < ARRAY_SIZE(safesetid_files); ++i) {
> +		struct safesetid_file_entry *entry =
> +			&safesetid_files[i];
> +		entry->dentry = securityfs_create_file(
> +			entry->name, 0200, safesetid_policy_dir,
> +			entry, &safesetid_file_fops);
> +		if (IS_ERR(entry->dentry)) {
> +			ret = PTR_ERR(entry->dentry);
> +			goto error;
> +		}
> +	}
> +
> +	return 0;
> +
> +error:
> +	safesetid_shutdown_securityfs();
> +	return ret;
> +}
> +fs_initcall(safesetid_init_securityfs);
>
Casey Schaufler Nov. 2, 2018, 6:19 p.m. UTC | #18
On 11/2/2018 10:12 AM, Micah Morton wrote:
> On Fri, Nov 2, 2018 at 9:05 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
>> On 11/1/2018 12:52 PM, Micah Morton wrote:
>>> On Thu, Nov 1, 2018 at 10:08 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
>>>> On 11/1/2018 9:11 AM, Micah Morton wrote:
>>>>> On Wed, Oct 31, 2018 at 11:07 PM Serge E. Hallyn <serge@hallyn.com> wrote:
>>>>>> On Wed, Oct 31, 2018 at 09:02:45PM +0000, Serge E. Hallyn wrote:
>>>>>>> Quoting mortonm@chromium.org (mortonm@chromium.org):
>>>>>>>> From: Micah Morton <mortonm@chromium.org>
>>>>>>>>
>>>>>>>> SafeSetID gates the setid family of syscalls to restrict UID/GID
>>>>>>>> transitions from a given UID/GID to only those approved by a
>>>>>>>> system-wide whitelist. These restrictions also prohibit the given
>>>>>>>> UIDs/GIDs from obtaining auxiliary privileges associated with
>>>>>>>> CAP_SET{U/G}ID, such as allowing a user to set up user namespace UID
>>>>>>>> mappings. For now, only gating the set*uid family of syscalls is
>>>>>>>> supported, with support for set*gid coming in a future patch set.
>>>>>>>>
>>>>>>>> Signed-off-by: Micah Morton <mortonm@chromium.org>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> NOTE: See the TODO above setuid_syscall() in lsm.c for an aspect of this
>>>>>>>> code that likely needs improvement before being an acceptable approach.
>>>>>>>> I'm specifically interested to see if there are better ideas for how
>>>>>>>> this could be done.
>>>>>>>>
>>>>>>>>  Documentation/admin-guide/LSM/SafeSetID.rst |  94 ++++++
>>>>>>>>  Documentation/admin-guide/LSM/index.rst     |   1 +
>>>>>>>>  arch/Kconfig                                |   5 +
>>>>>>>>  arch/arm/Kconfig                            |   1 +
>>>>>>>>  arch/arm64/Kconfig                          |   1 +
>>>>>>>>  arch/x86/Kconfig                            |   1 +
>>>>>>>>  security/Kconfig                            |   1 +
>>>>>>>>  security/Makefile                           |   2 +
>>>>>>>>  security/safesetid/Kconfig                  |  13 +
>>>>>>>>  security/safesetid/Makefile                 |   7 +
>>>>>>>>  security/safesetid/lsm.c                    | 334 ++++++++++++++++++++
>>>>>>>>  security/safesetid/lsm.h                    |  30 ++
>>>>>>>>  security/safesetid/securityfs.c             | 189 +++++++++++
>>>>>>>>  13 files changed, 679 insertions(+)
>>>>>>>>  create mode 100644 Documentation/admin-guide/LSM/SafeSetID.rst
>>>>>>>>  create mode 100644 security/safesetid/Kconfig
>>>>>>>>  create mode 100644 security/safesetid/Makefile
>>>>>>>>  create mode 100644 security/safesetid/lsm.c
>>>>>>>>  create mode 100644 security/safesetid/lsm.h
>>>>>>>>  create mode 100644 security/safesetid/securityfs.c
>>>>>>>>
>>>>>>>> diff --git a/Documentation/admin-guide/LSM/SafeSetID.rst b/Documentation/admin-guide/LSM/SafeSetID.rst
>>>>>>>> new file mode 100644
>>>>>>>> index 000000000000..e7d072124424
>>>>>>>> --- /dev/null
>>>>>>>> +++ b/Documentation/admin-guide/LSM/SafeSetID.rst
>>>>>>>> @@ -0,0 +1,94 @@
>>>>>>>> +=========
>>>>>>>> +SafeSetID
>>>>>>>> +=========
>>>>>>>> +SafeSetID is an LSM module that gates the setid family of syscalls to restrict
>>>>>>>> +UID/GID transitions from a given UID/GID to only those approved by a
>>>>>>>> +system-wide whitelist. These restrictions also prohibit the given UIDs/GIDs
>>>>>>>> +from obtaining auxiliary privileges associated with CAP_SET{U/G}ID, such as
>>>>>>>> +allowing a user to set up user namespace UID mappings.
>>>>>>>> +
>>>>>>>> +
>>>>>>>> +Background
>>>>>>>> +==========
>>>>>>>> +In absence of file capabilities, processes spawned on a Linux system that need
>>>>>>>> +to switch to a different user must be spawned with CAP_SETUID privileges.
>>>>>>>> +CAP_SETUID is granted to programs running as root or those running as a non-root
>>>>>>>> +user that have been explicitly given the CAP_SETUID runtime capability. It is
>>>>>>>> +often preferable to use Linux runtime capabilities rather than file
>>>>>>>> +capabilities, since using file capabilities to run a program with elevated
>>>>>>>> +privileges opens up possible security holes since any user with access to the
>>>>>>>> +file can exec() that program to gain the elevated privileges.
>>>>>>> Not true, see inheritable capabilities.  You also might look at ambient
>>>>>>> capabilities.
>>>>>> So for example with pam_cap.so you could have your N uids each be given
>>>>>> the desired pI, and assign the corrsponding fIs to the files they should
>>>>>> be able to exec with privilege.  No other uids will run those files with
>>>>>> privilege.  *1
>>>>> Sorry, what are "pl" and "fls" here? "Privilege level" and "files"?
>>>>>
>>>>>> Can you give some more details about exactly how you see SafeSetID being
>>>>>> used?
>>>>> Sure. The main use case for this LSM is to allow a non-root program to
>>>>> transition to other untrusted uids without full blown CAP_SETUID
>>>>> capabilities. The non-root program would still need CAP_SETUID to do
>>>>> any kind of transition, but the additional restrictions imposed by
>>>>> this LSM would mean it is a "safer" version of CAP_SETUID since the
>>>>> non-root program cannot take advantage of CAP_SETUID to do any
>>>>> unapproved actions (i.e. setuid to uid 0 or create/enter new user
>>>>> namespace). The higher level goal is to allow for uid-based sandboxing
>>>>> of system services without having to give out CAP_SETUID all over the
>>>>> place just so that non-root programs can drop to
>>>>> even-further-non-privileged uids. This is especially relevant when one
>>>>> non-root daemon on the system should be allowed to spawn other
>>>>> processes as different uids, but its undesirable to give the daemon a
>>>>> basically-root-equivalent CAP_SETUID.
>>>> I don't want to sound stupid(er than usual), but it sounds like
>>>> you could do all this using setuid bits prudently. Based on this
>>>> description, I don't see that anything new is needed.
>>> There are situations where setuid bits don't get the job done, as
>>> there are many situations where a program just wants to call setuid as
>>> part of its execution (or fork + setuid without exec), instead of
>>> fork/exec'ing a setuid binary.
>> Yes, I understand that.
>>
>>> Take the following scenario for
>>> example: init script (as root) spawns a network manager program as uid
>>> 1000
>> So far, so good.
>>
>>> and then the network manager spawns OpenVPN. The common mode of
>>> operation for OpenVPN is to start running as the uid it was spawned
>>> with (1000) at startup, but then drop to a lesser-privileged uid (e.g.
>>> 2000) after initialization/setup by calling setuid.
>> OK. That's an operation that does and ought to require privilege.
> Sure, but the idea behind this LSM is that full CAP_SETUID
> capabilities are a lot more privilege than is necessary in this
> scenario.

I'll start by pointing out that CAP_SETUID is about the finest grained
capability there is. It's very precise in what it allows. I think that
your concern is about the worst case scenario, which is setting the
effective UID to 0, and hence gaining all privilege.


>>> This is something
>>> setuid bits wouldn't help with, without refactoring OpenVPN.
>> You're correct.
>>
>>> So one
>>> option here is to give the network manager CAP_SETUID, which will be
>>> inherited by OpenVPN, and then OpenVPN drops to uid 2000 and drops
>>> CAP_SETUID (would probably require patching OpenVPN for the capability
>>> dropping).
>> Or, you put CAP_SETUID on the file capabilities for OpenVPN,
>> which is the way the P1003.1e DRAFT specification would have
>> you accomplish this. Unfortunately, with all the changes made
>> to capabilities for namespaces and all I'm not 100% sure I
>> could say exactly how to set that.
>>
>>> The problem here is that if the network manager itself is
>>> untrusted and exploitable, then giving it unrestricted CAP_SETUID is a
>>> big security risk.
>> Right. That's why you set the file capabilities on OpenVPN.
> So it seems like you're suggesting that any time a program needs to
> switch user by calling setuid,

... in a way that requires CAP_SETUID ...

> that it should get full CAP_SETUID
> capabilities (whether that's through setting file capabilities on the
> binary or inheriting CAP_SETUID from a parent process or otherwise).

Yup. That's correct. With all the duties and responsibilities associated
with the dangers of UID management. Changing UIDs shouldn't be done
lightly and needs to be done carefully.

> But that brings us back to the basic problem this LSM is trying to
> solve. Namely, we don't want to sprinkle unrestricted CAP_SETUID privs
> all over the system for binaries that just want to switch to specific
> uid[s] and don't need any of the root-equivalent privileges provided
> by CAP_SETUID.

I would see marking a program with a list of UIDs it can run with or
that its children can run with as a better solution. You get much
better locality of reference that way.

>>> Even just sticking with the network manager / VPN
>>> example, strongSwan VPN also uses the same drop-to-user-through-setuid
>>> setup, as do other Linux applications.
>> Same solution.
>>
>>> Refactoring these applications
>>> to fork/exec setuid binaries instead of simply calling setuid is often
>>> infeasible. So a direct call to setuid is often necessary/expected,
>>> and setuid bits don't help here.
>> What is it with kids these days, that they are so
>> afraid of fixing code that needs fixing? But that's
>> not necessary in this example.
>>
>>> Also, use of setuid bits precludes the use of the no_new_privs bit,
>>> which is usually at least a nice-to-have (if not need-to-have) for
>>> sandboxed processes on the system.
>> But you've already said that you *want* to change the security state,
>> "drop to a lesser-privileged uid", so you're already mucking with the
>> sandbox. If you're going to say that changing UIDs doesn't count for
>> sandboxing I'll point out that you brought up the notion of a
>> lesser-privileged UID.
> There are plenty of ways that non-root processes further restrict
> especially vulnerable parts of their code to even lesser-privileged
> contexts. But its often easier to reason about the security of such
> applications if the no_new_privs bit is set and file capabilities are
> avoided, so the application can have full control of which privileges
> are given to spawned processes without having to worry about which
> privileges are attached to which files. Granted, the no_new_privs
> issue is less central to the LSM being proposed here compared to the
> discussion above.

Let me suggest a change to the way your LSM works
that would reduce my concerns. Rather than refusing to
make a UID change that isn't on your whitelist, kill a
process that makes a prohibited request. This mitigates
the problem where a process doesn't check for an error
return. Sure, your system will be harder to get running
until your whitelist is complete, but you'll avoid a
whole category of security bugs.
Serge E. Hallyn Nov. 2, 2018, 6:30 p.m. UTC | #19
Quoting Casey Schaufler (casey@schaufler-ca.com):
> On 11/2/2018 10:12 AM, Micah Morton wrote:
> > On Fri, Nov 2, 2018 at 9:05 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
> >> On 11/1/2018 12:52 PM, Micah Morton wrote:
> >>> On Thu, Nov 1, 2018 at 10:08 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
> >>>> On 11/1/2018 9:11 AM, Micah Morton wrote:
> >>>>> On Wed, Oct 31, 2018 at 11:07 PM Serge E. Hallyn <serge@hallyn.com> wrote:
> >>>>>> On Wed, Oct 31, 2018 at 09:02:45PM +0000, Serge E. Hallyn wrote:
> >>>>>>> Quoting mortonm@chromium.org (mortonm@chromium.org):
> >>>>>>>> From: Micah Morton <mortonm@chromium.org>
> >>>>>>>>
> >>>>>>>> SafeSetID gates the setid family of syscalls to restrict UID/GID
> >>>>>>>> transitions from a given UID/GID to only those approved by a
> >>>>>>>> system-wide whitelist. These restrictions also prohibit the given
> >>>>>>>> UIDs/GIDs from obtaining auxiliary privileges associated with
> >>>>>>>> CAP_SET{U/G}ID, such as allowing a user to set up user namespace UID
> >>>>>>>> mappings. For now, only gating the set*uid family of syscalls is
> >>>>>>>> supported, with support for set*gid coming in a future patch set.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Micah Morton <mortonm@chromium.org>
> >>>>>>>> ---
> >>>>>>>>
> >>>>>>>> NOTE: See the TODO above setuid_syscall() in lsm.c for an aspect of this
> >>>>>>>> code that likely needs improvement before being an acceptable approach.
> >>>>>>>> I'm specifically interested to see if there are better ideas for how
> >>>>>>>> this could be done.
> >>>>>>>>
> >>>>>>>>  Documentation/admin-guide/LSM/SafeSetID.rst |  94 ++++++
> >>>>>>>>  Documentation/admin-guide/LSM/index.rst     |   1 +
> >>>>>>>>  arch/Kconfig                                |   5 +
> >>>>>>>>  arch/arm/Kconfig                            |   1 +
> >>>>>>>>  arch/arm64/Kconfig                          |   1 +
> >>>>>>>>  arch/x86/Kconfig                            |   1 +
> >>>>>>>>  security/Kconfig                            |   1 +
> >>>>>>>>  security/Makefile                           |   2 +
> >>>>>>>>  security/safesetid/Kconfig                  |  13 +
> >>>>>>>>  security/safesetid/Makefile                 |   7 +
> >>>>>>>>  security/safesetid/lsm.c                    | 334 ++++++++++++++++++++
> >>>>>>>>  security/safesetid/lsm.h                    |  30 ++
> >>>>>>>>  security/safesetid/securityfs.c             | 189 +++++++++++
> >>>>>>>>  13 files changed, 679 insertions(+)
> >>>>>>>>  create mode 100644 Documentation/admin-guide/LSM/SafeSetID.rst
> >>>>>>>>  create mode 100644 security/safesetid/Kconfig
> >>>>>>>>  create mode 100644 security/safesetid/Makefile
> >>>>>>>>  create mode 100644 security/safesetid/lsm.c
> >>>>>>>>  create mode 100644 security/safesetid/lsm.h
> >>>>>>>>  create mode 100644 security/safesetid/securityfs.c
> >>>>>>>>
> >>>>>>>> diff --git a/Documentation/admin-guide/LSM/SafeSetID.rst b/Documentation/admin-guide/LSM/SafeSetID.rst
> >>>>>>>> new file mode 100644
> >>>>>>>> index 000000000000..e7d072124424
> >>>>>>>> --- /dev/null
> >>>>>>>> +++ b/Documentation/admin-guide/LSM/SafeSetID.rst
> >>>>>>>> @@ -0,0 +1,94 @@
> >>>>>>>> +=========
> >>>>>>>> +SafeSetID
> >>>>>>>> +=========
> >>>>>>>> +SafeSetID is an LSM module that gates the setid family of syscalls to restrict
> >>>>>>>> +UID/GID transitions from a given UID/GID to only those approved by a
> >>>>>>>> +system-wide whitelist. These restrictions also prohibit the given UIDs/GIDs
> >>>>>>>> +from obtaining auxiliary privileges associated with CAP_SET{U/G}ID, such as
> >>>>>>>> +allowing a user to set up user namespace UID mappings.
> >>>>>>>> +
> >>>>>>>> +
> >>>>>>>> +Background
> >>>>>>>> +==========
> >>>>>>>> +In absence of file capabilities, processes spawned on a Linux system that need
> >>>>>>>> +to switch to a different user must be spawned with CAP_SETUID privileges.
> >>>>>>>> +CAP_SETUID is granted to programs running as root or those running as a non-root
> >>>>>>>> +user that have been explicitly given the CAP_SETUID runtime capability. It is
> >>>>>>>> +often preferable to use Linux runtime capabilities rather than file
> >>>>>>>> +capabilities, since using file capabilities to run a program with elevated
> >>>>>>>> +privileges opens up possible security holes since any user with access to the
> >>>>>>>> +file can exec() that program to gain the elevated privileges.
> >>>>>>> Not true, see inheritable capabilities.  You also might look at ambient
> >>>>>>> capabilities.
> >>>>>> So for example with pam_cap.so you could have your N uids each be given
> >>>>>> the desired pI, and assign the corrsponding fIs to the files they should
> >>>>>> be able to exec with privilege.  No other uids will run those files with
> >>>>>> privilege.  *1
> >>>>> Sorry, what are "pl" and "fls" here? "Privilege level" and "files"?
> >>>>>
> >>>>>> Can you give some more details about exactly how you see SafeSetID being
> >>>>>> used?
> >>>>> Sure. The main use case for this LSM is to allow a non-root program to
> >>>>> transition to other untrusted uids without full blown CAP_SETUID
> >>>>> capabilities. The non-root program would still need CAP_SETUID to do
> >>>>> any kind of transition, but the additional restrictions imposed by
> >>>>> this LSM would mean it is a "safer" version of CAP_SETUID since the
> >>>>> non-root program cannot take advantage of CAP_SETUID to do any
> >>>>> unapproved actions (i.e. setuid to uid 0 or create/enter new user
> >>>>> namespace). The higher level goal is to allow for uid-based sandboxing
> >>>>> of system services without having to give out CAP_SETUID all over the
> >>>>> place just so that non-root programs can drop to
> >>>>> even-further-non-privileged uids. This is especially relevant when one
> >>>>> non-root daemon on the system should be allowed to spawn other
> >>>>> processes as different uids, but its undesirable to give the daemon a
> >>>>> basically-root-equivalent CAP_SETUID.
> >>>> I don't want to sound stupid(er than usual), but it sounds like
> >>>> you could do all this using setuid bits prudently. Based on this
> >>>> description, I don't see that anything new is needed.
> >>> There are situations where setuid bits don't get the job done, as
> >>> there are many situations where a program just wants to call setuid as
> >>> part of its execution (or fork + setuid without exec), instead of
> >>> fork/exec'ing a setuid binary.
> >> Yes, I understand that.
> >>
> >>> Take the following scenario for
> >>> example: init script (as root) spawns a network manager program as uid
> >>> 1000
> >> So far, so good.
> >>
> >>> and then the network manager spawns OpenVPN. The common mode of
> >>> operation for OpenVPN is to start running as the uid it was spawned
> >>> with (1000) at startup, but then drop to a lesser-privileged uid (e.g.
> >>> 2000) after initialization/setup by calling setuid.
> >> OK. That's an operation that does and ought to require privilege.
> > Sure, but the idea behind this LSM is that full CAP_SETUID
> > capabilities are a lot more privilege than is necessary in this
> > scenario.
> 
> I'll start by pointing out that CAP_SETUID is about the finest grained
> capability there is. It's very precise in what it allows. I think that
> your concern is about the worst case scenario, which is setting the
> effective UID to 0, and hence gaining all privilege.
> 
> 
> >>> This is something
> >>> setuid bits wouldn't help with, without refactoring OpenVPN.
> >> You're correct.
> >>
> >>> So one
> >>> option here is to give the network manager CAP_SETUID, which will be
> >>> inherited by OpenVPN, and then OpenVPN drops to uid 2000 and drops
> >>> CAP_SETUID (would probably require patching OpenVPN for the capability
> >>> dropping).
> >> Or, you put CAP_SETUID on the file capabilities for OpenVPN,
> >> which is the way the P1003.1e DRAFT specification would have
> >> you accomplish this. Unfortunately, with all the changes made
> >> to capabilities for namespaces and all I'm not 100% sure I
> >> could say exactly how to set that.
> >>
> >>> The problem here is that if the network manager itself is
> >>> untrusted and exploitable, then giving it unrestricted CAP_SETUID is a
> >>> big security risk.
> >> Right. That's why you set the file capabilities on OpenVPN.
> > So it seems like you're suggesting that any time a program needs to
> > switch user by calling setuid,
> 
> ... in a way that requires CAP_SETUID ...
> 
> > that it should get full CAP_SETUID
> > capabilities (whether that's through setting file capabilities on the
> > binary or inheriting CAP_SETUID from a parent process or otherwise).
> 
> Yup. That's correct. With all the duties and responsibilities associated
> with the dangers of UID management. Changing UIDs shouldn't be done
> lightly and needs to be done carefully.
> 
> > But that brings us back to the basic problem this LSM is trying to
> > solve. Namely, we don't want to sprinkle unrestricted CAP_SETUID privs
> > all over the system for binaries that just want to switch to specific
> > uid[s] and don't need any of the root-equivalent privileges provided
> > by CAP_SETUID.
> 
> I would see marking a program with a list of UIDs it can run with or
> that its children can run with as a better solution. You get much
> better locality of reference that way.
> 
> >>> Even just sticking with the network manager / VPN
> >>> example, strongSwan VPN also uses the same drop-to-user-through-setuid
> >>> setup, as do other Linux applications.
> >> Same solution.
> >>
> >>> Refactoring these applications
> >>> to fork/exec setuid binaries instead of simply calling setuid is often
> >>> infeasible. So a direct call to setuid is often necessary/expected,
> >>> and setuid bits don't help here.
> >> What is it with kids these days, that they are so
> >> afraid of fixing code that needs fixing? But that's
> >> not necessary in this example.
> >>
> >>> Also, use of setuid bits precludes the use of the no_new_privs bit,
> >>> which is usually at least a nice-to-have (if not need-to-have) for
> >>> sandboxed processes on the system.
> >> But you've already said that you *want* to change the security state,
> >> "drop to a lesser-privileged uid", so you're already mucking with the
> >> sandbox. If you're going to say that changing UIDs doesn't count for
> >> sandboxing I'll point out that you brought up the notion of a
> >> lesser-privileged UID.
> > There are plenty of ways that non-root processes further restrict
> > especially vulnerable parts of their code to even lesser-privileged
> > contexts. But its often easier to reason about the security of such
> > applications if the no_new_privs bit is set and file capabilities are
> > avoided, so the application can have full control of which privileges
> > are given to spawned processes without having to worry about which
> > privileges are attached to which files. Granted, the no_new_privs
> > issue is less central to the LSM being proposed here compared to the
> > discussion above.
> 
> Let me suggest a change to the way your LSM works
> that would reduce my concerns. Rather than refusing to
> make a UID change that isn't on your whitelist, kill a
> process that makes a prohibited request. This mitigates
> the problem where a process doesn't check for an error
> return. Sure, your system will be harder to get running
> until your whitelist is complete, but you'll avoid a
> whole category of security bugs.

Might also consider not restricting CAP_SETUID, but instead adding a
new CAP_SETUID_RANGE capability.  That way you can be sure there will be
no regressions with any programs which run with CAP_SETUID.

Though that violates what Casey was just arguing halfway up the email.

-serge
Casey Schaufler Nov. 2, 2018, 7:02 p.m. UTC | #20
On 11/2/2018 11:30 AM, Serge E. Hallyn wrote:
> Quoting Casey Schaufler (casey@schaufler-ca.com):
>
>> Let me suggest a change to the way your LSM works
>> that would reduce my concerns. Rather than refusing to
>> make a UID change that isn't on your whitelist, kill a
>> process that makes a prohibited request. This mitigates
>> the problem where a process doesn't check for an error
>> return. Sure, your system will be harder to get running
>> until your whitelist is complete, but you'll avoid a
>> whole category of security bugs.
> Might also consider not restricting CAP_SETUID, but instead adding a
> new CAP_SETUID_RANGE capability.  That way you can be sure there will be
> no regressions with any programs which run with CAP_SETUID.
>
> Though that violates what Casey was just arguing halfway up the email.

I know that it's hard to believe 20 years after the fact,
but the POSIX group worked very hard to ensure that the granularity
of capabilities was correct for the security policy that the
interfaces defined in P1003.1. What would CAP_SETUID_RANGE mean?
Micah Morton Nov. 2, 2018, 7:13 p.m. UTC | #21
On Fri, Nov 2, 2018 at 11:04 AM Stephen Smalley <sds@tycho.nsa.gov> wrote:
>
> On 10/31/18 11:28 AM, mortonm@chromium.org wrote:
> > From: Micah Morton <mortonm@chromium.org>
> >
> > SafeSetID gates the setid family of syscalls to restrict UID/GID
> > transitions from a given UID/GID to only those approved by a
> > system-wide whitelist. These restrictions also prohibit the given
> > UIDs/GIDs from obtaining auxiliary privileges associated with
> > CAP_SET{U/G}ID, such as allowing a user to set up user namespace UID
> > mappings. For now, only gating the set*uid family of syscalls is
> > supported, with support for set*gid coming in a future patch set.
> >
> > Signed-off-by: Micah Morton <mortonm@chromium.org>
> > ---
> >
> > NOTE: See the TODO above setuid_syscall() in lsm.c for an aspect of this
> > code that likely needs improvement before being an acceptable approach.
> > I'm specifically interested to see if there are better ideas for how
> > this could be done.
>
> If it were me, I'd modify the callers of ns_capable(..., CAP_SETUID) in
> some manner to let you distinguish rather than trying to test the
> current syscall within the capable hook.  Modify the set*id system calls
> to use a variant interface that passes flags or something; there is
> already precedent for the _noaudit case but it isn't general.  More
> generally, extending ns_capable() and friends to take a variety of
> additional inputs would be useful, e.g. to allow one to pass down the
> inode for CAP_DAC_OVERRIDE/READ_SEARCH checks so that one could
> authorize it for specific files rather than all or nothing. This is
> already partly done via capable_wrt_inode_uidgid() but the inode isn't
> propagated down to ns_capable() and thus cannot be passed down to the
> security hook currently.

Yeah good point. There are only a few spots in the kernel that call
ns_capable(..., CAP_SETUID), so it would be pretty easy to annotate
all of them with a flag, similar to what is done with the LSM_SETID_*
constants for differentiating the set*uid calls in the
security_task_fix_setuid hook:
https://elixir.bootlin.com/linux/latest/source/include/linux/security.h#L126.
If we are going to add in changes to the kernel beyond registring LSM
hooks, this is probably the way to go.

>
> >
> >   Documentation/admin-guide/LSM/SafeSetID.rst |  94 ++++++
> >   Documentation/admin-guide/LSM/index.rst     |   1 +
> >   arch/Kconfig                                |   5 +
> >   arch/arm/Kconfig                            |   1 +
> >   arch/arm64/Kconfig                          |   1 +
> >   arch/x86/Kconfig                            |   1 +
> >   security/Kconfig                            |   1 +
> >   security/Makefile                           |   2 +
> >   security/safesetid/Kconfig                  |  13 +
> >   security/safesetid/Makefile                 |   7 +
> >   security/safesetid/lsm.c                    | 334 ++++++++++++++++++++
> >   security/safesetid/lsm.h                    |  30 ++
> >   security/safesetid/securityfs.c             | 189 +++++++++++
> >   13 files changed, 679 insertions(+)
> >   create mode 100644 Documentation/admin-guide/LSM/SafeSetID.rst
> >   create mode 100644 security/safesetid/Kconfig
> >   create mode 100644 security/safesetid/Makefile
> >   create mode 100644 security/safesetid/lsm.c
> >   create mode 100644 security/safesetid/lsm.h
> >   create mode 100644 security/safesetid/securityfs.c
> >
> > diff --git a/Documentation/admin-guide/LSM/SafeSetID.rst b/Documentation/admin-guide/LSM/SafeSetID.rst
> > new file mode 100644
> > index 000000000000..e7d072124424
> > --- /dev/null
> > +++ b/Documentation/admin-guide/LSM/SafeSetID.rst
> > @@ -0,0 +1,94 @@
> > +=========
> > +SafeSetID
> > +=========
> > +SafeSetID is an LSM module that gates the setid family of syscalls to restrict
> > +UID/GID transitions from a given UID/GID to only those approved by a
> > +system-wide whitelist. These restrictions also prohibit the given UIDs/GIDs
> > +from obtaining auxiliary privileges associated with CAP_SET{U/G}ID, such as
> > +allowing a user to set up user namespace UID mappings.
> > +
> > +
> > +Background
> > +==========
> > +In absence of file capabilities, processes spawned on a Linux system that need
> > +to switch to a different user must be spawned with CAP_SETUID privileges.
> > +CAP_SETUID is granted to programs running as root or those running as a non-root
> > +user that have been explicitly given the CAP_SETUID runtime capability. It is
> > +often preferable to use Linux runtime capabilities rather than file
> > +capabilities, since using file capabilities to run a program with elevated
> > +privileges opens up possible security holes since any user with access to the
> > +file can exec() that program to gain the elevated privileges.
> > +
> > +While it is possible to implement a tree of processes by giving full
> > +CAP_SET{U/G}ID capabilities, this is often at odds with the goals of running a
> > +tree of processes under non-root user(s) in the first place. Specifically,
> > +since CAP_SETUID allows changing to any user on the system, including the root
> > +user, it is an overpowered capability for what is needed in this scenario,
> > +especially since programs often only call setuid() to drop privileges to a
> > +lesser-privileged user -- not elevate privileges. Unfortunately, there is no
> > +generally feasible way in Linux to restrict the potential UIDs that a user can
> > +switch to through setuid() beyond allowing a switch to any user on the system.
> > +This SafeSetID LSM seeks to provide a solution for restricting setid
> > +capabilities in such a way.
> > +
> > +
> > +Other Approaches Considered
> > +===========================
> > +
> > +Solve this problem in userspace
> > +-------------------------------
> > +For candidate applications that would like to have restricted setid capabilities
> > +as implemented in this LSM, an alternative option would be to simply take away
> > +setid capabilities from the application completely and refactor the process
> > +spawning semantics in the application (e.g. by using a privileged helper program
> > +to do process spawning and UID/GID transitions). Unfortunately, there are a
> > +number of semantics around process spawning that would be affected by this, such
> > +as fork() calls where the program doesn’t immediately call exec() after the
> > +fork(), parent processes specifying custom environment variables or command line
> > +args for spawned child processes, or inheritance of file handles across a
> > +fork()/exec(). Because of this, as solution that uses a privileged helper in
> > +userspace would likely be less appealing to incorporate into existing projects
> > +that rely on certain process-spawning semantics in Linux.
> > +
> > +Use user namespaces
> > +-------------------
> > +Another possible approach would be to run a given process tree in its own user
> > +namespace and give programs in the tree setid capabilities. In this way,
> > +programs in the tree could change to any desired UID/GID in the context of their
> > +own user namespace, and only approved UIDs/GIDs could be mapped back to the
> > +initial system user namespace, affectively preventing privilege escalation.
> > +Unfortunately, it is not generally feasible to use user namespaces in isolation,
> > +without pairing them with other namespace types, which is not always an option.
> > +Linux checks for capabilities based off of the user namespace that “owns” some
> > +entity. For example, Linux has the notion that network namespaces are owned by
> > +the user namespace in which they were created. A consequence of this is that
> > +capability checks for access to a given network namespace are done by checking
> > +whether a task has the given capability in the context of the user namespace
> > +that owns the network namespace -- not necessarily the user namespace under
> > +which the given task runs. Therefore spawning a process in a new user namespace
> > +effectively prevents it from accessing the network namespace owned by the
> > +initial namespace. This is a deal-breaker for any application that expects to
> > +retain the CAP_NET_ADMIN capability for the purpose of adjusting network
> > +configurations. Using user namespaces in isolation causes problems regarding
> > +other system interactions, including use of pid namespaces and device creation.
> > +
> > +Use an existing LSM
> > +-------------------
> > +None of the other in-tree LSMs have the capability to gate setid transitions, or
> > +even employ the security_task_fix_setuid hook at all. SELinux says of that hook:
> > +"Since setuid only affects the current process, and since the SELinux controls
> > +are not based on the Linux identity attributes, SELinux does not need to control
> > +this operation."
> > +
> > +
> > +Directions for use
> > +==================
> > +This LSM hooks the setid syscalls to make sure transitions are allowed if an
> > +applicable restriction policy is in place. Policies are configured through
> > +securityfs by writing to the safesetid/add_whitelist_policy and
> > +safesetid/flush_whitelist_policies files at the location where securityfs is
> > +mounted. The format for adding a policy is '<UID>:<UID>', using literal
> > +numbers, such as '123:456'. To flush the policies, any write to the file is
> > +sufficient. Again, configuring a policy for a UID will prevent that UID from
> > +obtaining auxiliary setid privileges, such as allowing a user to set up user
> > +namespace UID mappings.
> > diff --git a/Documentation/admin-guide/LSM/index.rst b/Documentation/admin-guide/LSM/index.rst
> > index c980dfe9abf1..a0c387649e12 100644
> > --- a/Documentation/admin-guide/LSM/index.rst
> > +++ b/Documentation/admin-guide/LSM/index.rst
> > @@ -39,3 +39,4 @@ the one "major" module (e.g. SELinux) if there is one configured.
> >      Smack
> >      tomoyo
> >      Yama
> > +   SafeSetID
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index 1aa59063f1fd..c87070807ba2 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -381,6 +381,11 @@ config ARCH_WANT_OLD_COMPAT_IPC
> >       select ARCH_WANT_COMPAT_IPC_PARSE_VERSION
> >       bool
> >
> > +config HAVE_SAFESETID
> > +     bool
> > +     help
> > +       This option enables the SafeSetID LSM.
> > +
> >   config HAVE_ARCH_SECCOMP_FILTER
> >       bool
> >       help
> > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> > index 843edfd000be..35b1a772c971 100644
> > --- a/arch/arm/Kconfig
> > +++ b/arch/arm/Kconfig
> > @@ -92,6 +92,7 @@ config ARM
> >       select HAVE_RCU_TABLE_FREE if (SMP && ARM_LPAE)
> >       select HAVE_REGS_AND_STACK_ACCESS_API
> >       select HAVE_RSEQ
> > +     select HAVE_SAFESETID
> >       select HAVE_STACKPROTECTOR
> >       select HAVE_SYSCALL_TRACEPOINTS
> >       select HAVE_UID16
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index 42c090cf0292..2c6f5ec3a55e 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -127,6 +127,7 @@ config ARM64
> >       select HAVE_PERF_USER_STACK_DUMP
> >       select HAVE_REGS_AND_STACK_ACCESS_API
> >       select HAVE_RCU_TABLE_FREE
> > +     select HAVE_SAFESETID
> >       select HAVE_STACKPROTECTOR
> >       select HAVE_SYSCALL_TRACEPOINTS
> >       select HAVE_KPROBES
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 887d3a7bb646..a6527d6c0426 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -27,6 +27,7 @@ config X86_64
> >       select ARCH_SUPPORTS_INT128
> >       select ARCH_USE_CMPXCHG_LOCKREF
> >       select HAVE_ARCH_SOFT_DIRTY
> > +     select HAVE_SAFESETID
> >       select MODULES_USE_ELF_RELA
> >       select NEED_DMA_MAP_STATE
> >       select SWIOTLB
> > diff --git a/security/Kconfig b/security/Kconfig
> > index c4302067a3ad..7d9008ad5903 100644
> > --- a/security/Kconfig
> > +++ b/security/Kconfig
> > @@ -237,6 +237,7 @@ source security/tomoyo/Kconfig
> >   source security/apparmor/Kconfig
> >   source security/loadpin/Kconfig
> >   source security/yama/Kconfig
> > +source security/safesetid/Kconfig
> >
> >   source security/integrity/Kconfig
> >
> > diff --git a/security/Makefile b/security/Makefile
> > index 4d2d3782ddef..88209d827832 100644
> > --- a/security/Makefile
> > +++ b/security/Makefile
> > @@ -10,6 +10,7 @@ subdir-$(CONFIG_SECURITY_TOMOYO)        += tomoyo
> >   subdir-$(CONFIG_SECURITY_APPARMOR)  += apparmor
> >   subdir-$(CONFIG_SECURITY_YAMA)              += yama
> >   subdir-$(CONFIG_SECURITY_LOADPIN)   += loadpin
> > +subdir-$(CONFIG_SECURITY_SAFESETID)  += safesetid
> >
> >   # always enable default capabilities
> >   obj-y                                       += commoncap.o
> > @@ -25,6 +26,7 @@ obj-$(CONFIG_SECURITY_TOMOYO)               += tomoyo/
> >   obj-$(CONFIG_SECURITY_APPARMOR)             += apparmor/
> >   obj-$(CONFIG_SECURITY_YAMA)         += yama/
> >   obj-$(CONFIG_SECURITY_LOADPIN)              += loadpin/
> > +obj-$(CONFIG_SECURITY_SAFESETID)     += safesetid/
> >   obj-$(CONFIG_CGROUP_DEVICE)         += device_cgroup.o
> >
> >   # Object integrity file lists
> > diff --git a/security/safesetid/Kconfig b/security/safesetid/Kconfig
> > new file mode 100644
> > index 000000000000..4ff82c7ed273
> > --- /dev/null
> > +++ b/security/safesetid/Kconfig
> > @@ -0,0 +1,13 @@
> > +config SECURITY_SAFESETID
> > +        bool "Gate setid transitions to limit CAP_SET{U/G}ID capabilities"
> > +        depends on HAVE_SAFESETID
> > +        default n
> > +        help
> > +          SafeSetID is an LSM module that gates the setid family of syscalls to
> > +          restrict UID/GID transitions from a given UID/GID to only those
> > +          approved by a system-wide whitelist. These restrictions also prohibit
> > +          the given UIDs/GIDs from obtaining auxiliary privileges associated
> > +          with CAP_SET{U/G}ID, such as allowing a user to set up user namespace
> > +          UID mappings.
> > +
> > +          If you are unsure how to answer this question, answer N.
> > diff --git a/security/safesetid/Makefile b/security/safesetid/Makefile
> > new file mode 100644
> > index 000000000000..6b0660321164
> > --- /dev/null
> > +++ b/security/safesetid/Makefile
> > @@ -0,0 +1,7 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +#
> > +# Makefile for the safesetid LSM.
> > +#
> > +
> > +obj-$(CONFIG_SECURITY_SAFESETID) := safesetid.o
> > +safesetid-y := lsm.o securityfs.o
> > diff --git a/security/safesetid/lsm.c b/security/safesetid/lsm.c
> > new file mode 100644
> > index 000000000000..e30ff06d8e07
> > --- /dev/null
> > +++ b/security/safesetid/lsm.c
> > @@ -0,0 +1,334 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * SafeSetID Linux Security Module
> > + *
> > + * Author: Micah Morton <mortonm@chromium.org>
> > + *
> > + * Copyright (C) 2018 The Chromium OS Authors.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2, as
> > + * published by the Free Software Foundation.
> > + *
> > + */
> > +
> > +#define pr_fmt(fmt) "SafeSetID: " fmt
> > +
> > +#include <asm/syscall.h>
> > +#include <linux/hashtable.h>
> > +#include <linux/lsm_hooks.h>
> > +#include <linux/module.h>
> > +#include <linux/ptrace.h>
> > +#include <linux/sched/task_stack.h>
> > +#include <linux/security.h>
> > +
> > +#define NUM_BITS 8 /* 128 buckets in hash table */
> > +
> > +static DEFINE_HASHTABLE(safesetid_whitelist_hashtable, NUM_BITS);
> > +
> > +/*
> > + * Hash table entry to store safesetid policy signifying that 'parent' user
> > + * can setid to 'child' user.
> > + */
> > +struct entry {
> > +     struct hlist_node next;
> > +     struct hlist_node dlist; /* for deletion cleanup */
> > +     uint64_t parent_kuid;
> > +     uint64_t child_kuid;
> > +};
> > +
> > +static DEFINE_SPINLOCK(safesetid_whitelist_hashtable_spinlock);
> > +
> > +static bool check_setuid_policy_hashtable_key(kuid_t parent)
> > +{
> > +     struct entry *entry;
> > +
> > +     rcu_read_lock();
> > +     hash_for_each_possible_rcu(safesetid_whitelist_hashtable,
> > +                                entry, next, __kuid_val(parent)) {
> > +             if (entry->parent_kuid == __kuid_val(parent)) {
> > +                     rcu_read_unlock();
> > +                     return true;
> > +             }
> > +     }
> > +     rcu_read_unlock();
> > +
> > +     return false;
> > +}
> > +
> > +static bool check_setuid_policy_hashtable_key_value(kuid_t parent,
> > +                                                 kuid_t child)
> > +{
> > +     struct entry *entry;
> > +
> > +     rcu_read_lock();
> > +     hash_for_each_possible_rcu(safesetid_whitelist_hashtable,
> > +                                entry, next, __kuid_val(parent)) {
> > +             if (entry->parent_kuid == __kuid_val(parent) &&
> > +                 entry->child_kuid == __kuid_val(child)) {
> > +                     rcu_read_unlock();
> > +                     return true;
> > +             }
> > +     }
> > +     rcu_read_unlock();
> > +
> > +     return false;
> > +}
> > +
> > +/*
> > + * TODO: Figuring out whether the current syscall number (saved on the kernel
> > + * stack) is one of the set*uid syscalls is an operation that requires checking
> > + * the number against arch-specific constants as seen below. The need for this
> > + * LSM to know about arch-specific syscall stuff is not ideal. Is it better to
> > + * implement an arch-specific function that gets called from this file and
> > + * update arch/Kconfig to mention that the HAVE_SAFESETID symbol should only be
> > + * selected for architectures that implement the function? Any other ideas?
> > + */
> > +static bool setuid_syscall(int num)
> > +{
> > +#ifdef CONFIG_X86_64
> > +#ifdef CONFIG_COMPAT
> > +     if (!(num == __NR_setreuid ||
> > +           num == __NR_setuid ||
> > +           num == __NR_setresuid ||
> > +           num == __NR_setfsuid ||
> > +           num == __NR_ia32_setreuid32 ||
> > +           num == __NR_ia32_setuid ||
> > +           num == __NR_ia32_setresuid ||
> > +           num == __NR_ia32_setresuid ||
> > +           num == __NR_ia32_setuid32))
> > +             return false;
> > +#else
> > +     if (!(num == __NR_setreuid ||
> > +           num == __NR_setuid ||
> > +           num == __NR_setresuid ||
> > +           num == __NR_setfsuid))
> > +             return false;
> > +#endif /* CONFIG_COMPAT */
> > +#elif defined CONFIG_ARM64
> > +#ifdef CONFIG_COMPAT
> > +     if (!(num == __NR_setuid ||
> > +           num == __NR_setreuid ||
> > +           num == __NR_setfsuid ||
> > +           num == __NR_setresuid ||
> > +           num == __NR_setreuid32 ||
> > +           num == __NR_setresuid32 ||
> > +           num == __NR_setuid32 ||
> > +           num == __NR_setfsuid32 ||
> > +           num == __NR_compat_setuid ||
> > +           num == __NR_compat_setreuid ||
> > +           num == __NR_compat_setfsuid ||
> > +           num == __NR_compat_setresuid ||
> > +           num == __NR_compat_setreuid32 ||
> > +           num == __NR_compat_setresuid32 ||
> > +           num == __NR_compat_setuid32 ||
> > +           num == __NR_compat_setfsuid32))
> > +             return false;
> > +#else
> > +     if (!(num == __NR_setuid ||
> > +           num == __NR_setreuid ||
> > +           num == __NR_setfsuid ||
> > +           num == __NR_setresuid))
> > +             return false;
> > +#endif /* CONFIG_COMPAT */
> > +#elif defined CONFIG_ARM
> > +     if (!(num == __NR_setreuid32 ||
> > +           num == __NR_setuid32 ||
> > +           num == __NR_setresuid32 ||
> > +           num == __NR_setfsuid32))
> > +             return false;
> > +#else
> > +     BUILD_BUG();
> > +#endif
> > +     return true;
> > +}
> > +
> > +static int safesetid_security_capable(const struct cred *cred,
> > +                                   struct user_namespace *ns,
> > +                                   int cap,
> > +                                   int audit)
> > +{
> > +     /* The current->mm check will fail if this is a kernel thread. */
> > +     if (cap == CAP_SETUID &&
> > +         current->mm &&
> > +         check_setuid_policy_hashtable_key(cred->uid)) {
> > +             /*
> > +              * syscall_get_nr can theoretically return 0 or -1, but that
> > +              * would signify that the syscall is being aborted due to a
> > +              * signal, so we don't need to check for this case here.
> > +              */
> > +             if (!(setuid_syscall(syscall_get_nr(current,
> > +                                                 current_pt_regs()))))
> > +                     /*
> > +                      * Deny if we're not in a set*uid() syscall to avoid
> > +                      * giving powers gated by CAP_SETUID that are related
> > +                      * to functionality other than calling set*uid() (e.g.
> > +                      * allowing user to set up userns uid mappings).
> > +                      */
> > +                     return -1;
> > +     }
> > +     return 0;
> > +}
> > +
> > +static void setuid_policy_warning(kuid_t parent, kuid_t child)
> > +{
> > +     pr_warn("UID transition (%d -> %d) blocked",
> > +             __kuid_val(parent),
> > +             __kuid_val(child));
> > +}
> > +
> > +static int check_uid_transition(kuid_t parent, kuid_t child)
> > +{
> > +     if (check_setuid_policy_hashtable_key_value(parent, child))
> > +             return 0;
> > +     setuid_policy_warning(parent, child);
> > +     return -1;
> > +}
> > +
> > +/*
> > + * Check whether there is either an exception for user under old cred struct to
> > + * set*uid to user under new cred struct, or the UID transition is allowed (by
> > + * Linux set*uid rules) even without CAP_SETUID.
> > + */
> > +static int safesetid_task_fix_setuid(struct cred *new,
> > +                                  const struct cred *old,
> > +                                  int flags)
> > +{
> > +
> > +     /* Do nothing if there are no setuid restrictions for this UID. */
> > +     if (!check_setuid_policy_hashtable_key(old->uid))
> > +             return 0;
> > +
> > +     switch (flags) {
> > +     case LSM_SETID_RE:
> > +             /*
> > +              * Users for which setuid restrictions exist can only set the
> > +              * real UID to the real UID or the effective UID, unless an
> > +              * explicit whitelist policy allows the transition.
> > +              */
> > +             if (!uid_eq(old->uid, new->uid) &&
> > +                     !uid_eq(old->euid, new->uid)) {
> > +                     return check_uid_transition(old->uid, new->uid);
> > +             }
> > +             /*
> > +              * Users for which setuid restrictions exist can only set the
> > +              * effective UID to the real UID, the effective UID, or the
> > +              * saved set-UID, unless an explicit whitelist policy allows
> > +              * the transition.
> > +              */
> > +             if (!uid_eq(old->uid, new->euid) &&
> > +                     !uid_eq(old->euid, new->euid) &&
> > +                     !uid_eq(old->suid, new->euid)) {
> > +                     return check_uid_transition(old->euid, new->euid);
> > +             }
> > +             break;
> > +     case LSM_SETID_ID:
> > +             /*
> > +              * Users for which setuid restrictions exist cannot change the
> > +              * real UID or saved set-UID unless an explicit whitelist
> > +              * policy allows the transition.
> > +              */
> > +             if (!uid_eq(old->uid, new->uid))
> > +                     return check_uid_transition(old->uid, new->uid);
> > +             if (!uid_eq(old->suid, new->suid))
> > +                     return check_uid_transition(old->suid, new->suid);
> > +             break;
> > +     case LSM_SETID_RES:
> > +             /*
> > +              * Users for which setuid restrictions exist cannot change the
> > +              * real UID, effective UID, or saved set-UID to anything but
> > +              * one of: the current real UID, the current effective UID or
> > +              * the current saved set-user-ID unless an explicit whitelist
> > +              * policy allows the transition.
> > +              */
> > +             if (!uid_eq(new->uid, old->uid) &&
> > +                     !uid_eq(new->uid, old->euid) &&
> > +                     !uid_eq(new->uid, old->suid)) {
> > +                     return check_uid_transition(old->uid, new->uid);
> > +             }
> > +             if (!uid_eq(new->euid, old->uid) &&
> > +                     !uid_eq(new->euid, old->euid) &&
> > +                     !uid_eq(new->euid, old->suid)) {
> > +                     return check_uid_transition(old->euid, new->euid);
> > +             }
> > +             if (!uid_eq(new->suid, old->uid) &&
> > +                     !uid_eq(new->suid, old->euid) &&
> > +                     !uid_eq(new->suid, old->suid)) {
> > +                     return check_uid_transition(old->suid, new->suid);
> > +             }
> > +             break;
> > +     case LSM_SETID_FS:
> > +             /*
> > +              * Users for which setuid restrictions exist cannot change the
> > +              * filesystem UID to anything but one of: the current real UID,
> > +              * the current effective UID or the current saved set-UID
> > +              * unless an explicit whitelist policy allows the transition.
> > +              */
> > +             if (!uid_eq(new->fsuid, old->uid)  &&
> > +                     !uid_eq(new->fsuid, old->euid)  &&
> > +                     !uid_eq(new->fsuid, old->suid) &&
> > +                     !uid_eq(new->fsuid, old->fsuid)) {
> > +                     return check_uid_transition(old->fsuid, new->fsuid);
> > +             }
> > +             break;
> > +     }
> > +     return 0;
> > +}
> > +
> > +int add_safesetid_whitelist_entry(kuid_t parent, kuid_t child)
> > +{
> > +     struct entry *new;
> > +
> > +     /* Return if entry already exists */
> > +     if (check_setuid_policy_hashtable_key_value(parent, child))
> > +             return 0;
> > +
> > +     new = kzalloc(sizeof(struct entry), GFP_KERNEL);
> > +     if (!new)
> > +             return -ENOMEM;
> > +     new->parent_kuid = __kuid_val(parent);
> > +     new->child_kuid = __kuid_val(child);
> > +     spin_lock(&safesetid_whitelist_hashtable_spinlock);
> > +     hash_add_rcu(safesetid_whitelist_hashtable,
> > +                  &new->next,
> > +                  __kuid_val(parent));
> > +     spin_unlock(&safesetid_whitelist_hashtable_spinlock);
> > +     return 0;
> > +}
> > +
> > +void flush_safesetid_whitelist_entries(void)
> > +{
> > +     struct entry *entry;
> > +     struct hlist_node *hlist_node;
> > +     unsigned int bkt_loop_cursor;
> > +     HLIST_HEAD(free_list);
> > +
> > +     /*
> > +      * Could probably use hash_for_each_rcu here instead, but this should
> > +      * be fine as well.
> > +      */
> > +     hash_for_each_safe(safesetid_whitelist_hashtable, bkt_loop_cursor,
> > +                        hlist_node, entry, next) {
> > +             spin_lock(&safesetid_whitelist_hashtable_spinlock);
> > +             hash_del_rcu(&entry->next);
> > +             spin_unlock(&safesetid_whitelist_hashtable_spinlock);
> > +             hlist_add_head(&entry->dlist, &free_list);
> > +     }
> > +     synchronize_rcu();
> > +     hlist_for_each_entry_safe(entry, hlist_node, &free_list, dlist)
> > +             kfree(entry);
> > +}
> > +
> > +static struct security_hook_list safesetid_security_hooks[] = {
> > +     LSM_HOOK_INIT(task_fix_setuid, safesetid_task_fix_setuid),
> > +     LSM_HOOK_INIT(capable, safesetid_security_capable)
> > +};
> > +
> > +static int __init safesetid_security_init(void)
> > +{
> > +     security_add_hooks(safesetid_security_hooks,
> > +                        ARRAY_SIZE(safesetid_security_hooks), "safesetid");
> > +
> > +     return 0;
> > +}
> > +security_initcall(safesetid_security_init);
> > diff --git a/security/safesetid/lsm.h b/security/safesetid/lsm.h
> > new file mode 100644
> > index 000000000000..bf78af9bf314
> > --- /dev/null
> > +++ b/security/safesetid/lsm.h
> > @@ -0,0 +1,30 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * SafeSetID Linux Security Module
> > + *
> > + * Author: Micah Morton <mortonm@chromium.org>
> > + *
> > + * Copyright (C) 2018 The Chromium OS Authors.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2, as
> > + * published by the Free Software Foundation.
> > + *
> > + */
> > +#ifndef _SAFESETID_H
> > +#define _SAFESETID_H
> > +
> > +#include <linux/types.h>
> > +
> > +/* Function type. */
> > +enum safesetid_whitelist_file_write_type {
> > +     SAFESETID_WHITELIST_ADD, /* Add whitelist policy. */
> > +     SAFESETID_WHITELIST_FLUSH, /* Flush whitelist policies. */
> > +};
> > +
> > +/* Add entry to safesetid whitelist to allow 'parent' to setid to 'child'. */
> > +int add_safesetid_whitelist_entry(kuid_t parent, kuid_t child);
> > +
> > +void flush_safesetid_whitelist_entries(void);
> > +
> > +#endif /* _SAFESETID_H */
> > diff --git a/security/safesetid/securityfs.c b/security/safesetid/securityfs.c
> > new file mode 100644
> > index 000000000000..ff5fcf2c1b37
> > --- /dev/null
> > +++ b/security/safesetid/securityfs.c
> > @@ -0,0 +1,189 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * SafeSetID Linux Security Module
> > + *
> > + * Author: Micah Morton <mortonm@chromium.org>
> > + *
> > + * Copyright (C) 2018 The Chromium OS Authors.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2, as
> > + * published by the Free Software Foundation.
> > + *
> > + */
> > +#include <linux/security.h>
> > +#include <linux/cred.h>
> > +
> > +#include "lsm.h"
> > +
> > +static struct dentry *safesetid_policy_dir;
> > +
> > +struct safesetid_file_entry {
> > +     const char *name;
> > +     enum safesetid_whitelist_file_write_type type;
> > +     struct dentry *dentry;
> > +};
> > +
> > +static struct safesetid_file_entry safesetid_files[] = {
> > +     {.name = "add_whitelist_policy",
> > +      .type = SAFESETID_WHITELIST_ADD},
> > +     {.name = "flush_whitelist_policies",
> > +      .type = SAFESETID_WHITELIST_FLUSH},
> > +};
> > +
> > +/*
> > + * In the case the input buffer contains one or more invalid UIDs, the kuid_t
> > + * variables pointed to by 'parent' and 'child' will get updated but this
> > + * function will return an error.
> > + */
> > +static int parse_safesetid_whitelist_policy(const char __user *buf,
> > +                                         size_t len,
> > +                                         kuid_t *parent,
> > +                                         kuid_t *child)
> > +{
> > +     char *kern_buf;
> > +     char *parent_buf;
> > +     char *child_buf;
> > +     const char separator[] = ":";
> > +     int ret;
> > +     size_t first_substring_length;
> > +     long parsed_parent;
> > +     long parsed_child;
> > +
> > +     /* Duplicate string from user memory and NULL-terminate */
> > +     kern_buf = memdup_user_nul(buf, len);
> > +     if (IS_ERR(kern_buf))
> > +             return PTR_ERR(kern_buf);
> > +
> > +     /*
> > +      * Format of |buf| string should be <UID>:<UID>.
> > +      * Find location of ":" in kern_buf (copied from |buf|).
> > +      */
> > +     first_substring_length = strcspn(kern_buf, separator);
> > +     if (first_substring_length == 0 || first_substring_length == len) {
> > +             ret = -EINVAL;
> > +             goto free_kern;
> > +     }
> > +
> > +     parent_buf = kmemdup_nul(kern_buf, first_substring_length, GFP_KERNEL);
> > +     if (!parent_buf) {
> > +             ret = -ENOMEM;
> > +             goto free_kern;
> > +     }
> > +
> > +     ret = kstrtol(parent_buf, 0, &parsed_parent);
> > +     if (ret)
> > +             goto free_both;
> > +
> > +     child_buf = kern_buf + first_substring_length + 1;
> > +     ret = kstrtol(child_buf, 0, &parsed_child);
> > +     if (ret)
> > +             goto free_both;
> > +
> > +     *parent = make_kuid(current_user_ns(), parsed_parent);
> > +     if (!uid_valid(*parent)) {
> > +             ret = -EINVAL;
> > +             goto free_both;
> > +     }
> > +
> > +     *child = make_kuid(current_user_ns(), parsed_child);
> > +     if (!uid_valid(*child)) {
> > +             ret = -EINVAL;
> > +             goto free_both;
> > +     }
> > +
> > +free_both:
> > +     kfree(parent_buf);
> > +free_kern:
> > +     kfree(kern_buf);
> > +     return ret;
> > +}
> > +
> > +static ssize_t safesetid_file_write(struct file *file,
> > +                                 const char __user *buf,
> > +                                 size_t len,
> > +                                 loff_t *ppos)
> > +{
> > +     struct safesetid_file_entry *file_entry =
> > +             file->f_inode->i_private;
> > +     kuid_t parent;
> > +     kuid_t child;
> > +     int ret;
> > +
> > +     if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN))
> > +             return -EPERM;
> > +
> > +     if (*ppos != 0)
> > +             return -EINVAL;
> > +
> > +     if (file_entry->type == SAFESETID_WHITELIST_FLUSH) {
> > +             flush_safesetid_whitelist_entries();
> > +             return len;
> > +     }
> > +
> > +     /*
> > +      * If we get to here, must be the case that file_entry->type equals
> > +      * SAFESETID_WHITELIST_ADD
> > +      */
> > +     ret = parse_safesetid_whitelist_policy(buf, len, &parent,
> > +                                                      &child);
> > +     if (ret)
> > +             return ret;
> > +
> > +     ret = add_safesetid_whitelist_entry(parent, child);
> > +     if (ret)
> > +             return ret;
> > +
> > +     /* Return len on success so caller won't keep trying to write */
> > +     return len;
> > +}
> > +
> > +static const struct file_operations safesetid_file_fops = {
> > +     .write = safesetid_file_write,
> > +};
> > +
> > +static void safesetid_shutdown_securityfs(void)
> > +{
> > +     int i;
> > +
> > +     for (i = 0; i < ARRAY_SIZE(safesetid_files); ++i) {
> > +             struct safesetid_file_entry *entry =
> > +                     &safesetid_files[i];
> > +             securityfs_remove(entry->dentry);
> > +             entry->dentry = NULL;
> > +     }
> > +
> > +     securityfs_remove(safesetid_policy_dir);
> > +     safesetid_policy_dir = NULL;
> > +}
> > +
> > +static int __init safesetid_init_securityfs(void)
> > +{
> > +     int i;
> > +     int ret;
> > +
> > +     safesetid_policy_dir = securityfs_create_dir("safesetid", NULL);
> > +     if (!safesetid_policy_dir) {
> > +             ret = PTR_ERR(safesetid_policy_dir);
> > +             goto error;
> > +     }
> > +
> > +     for (i = 0; i < ARRAY_SIZE(safesetid_files); ++i) {
> > +             struct safesetid_file_entry *entry =
> > +                     &safesetid_files[i];
> > +             entry->dentry = securityfs_create_file(
> > +                     entry->name, 0200, safesetid_policy_dir,
> > +                     entry, &safesetid_file_fops);
> > +             if (IS_ERR(entry->dentry)) {
> > +                     ret = PTR_ERR(entry->dentry);
> > +                     goto error;
> > +             }
> > +     }
> > +
> > +     return 0;
> > +
> > +error:
> > +     safesetid_shutdown_securityfs();
> > +     return ret;
> > +}
> > +fs_initcall(safesetid_init_securityfs);
> >
>
Serge E. Hallyn Nov. 2, 2018, 7:22 p.m. UTC | #22
Quoting Casey Schaufler (casey@schaufler-ca.com):
> On 11/2/2018 11:30 AM, Serge E. Hallyn wrote:
> > Quoting Casey Schaufler (casey@schaufler-ca.com):
> >
> >> Let me suggest a change to the way your LSM works
> >> that would reduce my concerns. Rather than refusing to
> >> make a UID change that isn't on your whitelist, kill a
> >> process that makes a prohibited request. This mitigates
> >> the problem where a process doesn't check for an error
> >> return. Sure, your system will be harder to get running
> >> until your whitelist is complete, but you'll avoid a
> >> whole category of security bugs.
> > Might also consider not restricting CAP_SETUID, but instead adding a
> > new CAP_SETUID_RANGE capability.  That way you can be sure there will be
> > no regressions with any programs which run with CAP_SETUID.
> >
> > Though that violates what Casey was just arguing halfway up the email.
> 
> I know that it's hard to believe 20 years after the fact,
> but the POSIX group worked very hard to ensure that the granularity
> of capabilities was correct for the security policy that the
> interfaces defined in P1003.1. What would CAP_SETUID_RANGE mean?

CAP_SETUID would mean you can switch to any uid.

CAP_SETUID_RANGE would mean you could make the transitions which have
been defined through <handwave> some mechanism.  Be it prctl, or some
new security.uidrange xattr, or the mechanism Micah proposed, except
it only applies for CAP_SETUID_RANGE not CAP_SETUID.

-serge
Micah Morton Nov. 2, 2018, 7:28 p.m. UTC | #23
On Fri, Nov 2, 2018 at 11:19 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> On 11/2/2018 10:12 AM, Micah Morton wrote:
> > On Fri, Nov 2, 2018 at 9:05 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
> >> On 11/1/2018 12:52 PM, Micah Morton wrote:
> >>> On Thu, Nov 1, 2018 at 10:08 AM Casey Schaufler <casey@schaufler-ca.com> wrote:
> >>>> On 11/1/2018 9:11 AM, Micah Morton wrote:
> >>>>> On Wed, Oct 31, 2018 at 11:07 PM Serge E. Hallyn <serge@hallyn.com> wrote:
> >>>>>> On Wed, Oct 31, 2018 at 09:02:45PM +0000, Serge E. Hallyn wrote:
> >>>>>>> Quoting mortonm@chromium.org (mortonm@chromium.org):
> >>>>>>>> From: Micah Morton <mortonm@chromium.org>
> >>>>>>>>
> >>>>>>>> SafeSetID gates the setid family of syscalls to restrict UID/GID
> >>>>>>>> transitions from a given UID/GID to only those approved by a
> >>>>>>>> system-wide whitelist. These restrictions also prohibit the given
> >>>>>>>> UIDs/GIDs from obtaining auxiliary privileges associated with
> >>>>>>>> CAP_SET{U/G}ID, such as allowing a user to set up user namespace UID
> >>>>>>>> mappings. For now, only gating the set*uid family of syscalls is
> >>>>>>>> supported, with support for set*gid coming in a future patch set.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Micah Morton <mortonm@chromium.org>
> >>>>>>>> ---
> >>>>>>>>
> >>>>>>>> NOTE: See the TODO above setuid_syscall() in lsm.c for an aspect of this
> >>>>>>>> code that likely needs improvement before being an acceptable approach.
> >>>>>>>> I'm specifically interested to see if there are better ideas for how
> >>>>>>>> this could be done.
> >>>>>>>>
> >>>>>>>>  Documentation/admin-guide/LSM/SafeSetID.rst |  94 ++++++
> >>>>>>>>  Documentation/admin-guide/LSM/index.rst     |   1 +
> >>>>>>>>  arch/Kconfig                                |   5 +
> >>>>>>>>  arch/arm/Kconfig                            |   1 +
> >>>>>>>>  arch/arm64/Kconfig                          |   1 +
> >>>>>>>>  arch/x86/Kconfig                            |   1 +
> >>>>>>>>  security/Kconfig                            |   1 +
> >>>>>>>>  security/Makefile                           |   2 +
> >>>>>>>>  security/safesetid/Kconfig                  |  13 +
> >>>>>>>>  security/safesetid/Makefile                 |   7 +
> >>>>>>>>  security/safesetid/lsm.c                    | 334 ++++++++++++++++++++
> >>>>>>>>  security/safesetid/lsm.h                    |  30 ++
> >>>>>>>>  security/safesetid/securityfs.c             | 189 +++++++++++
> >>>>>>>>  13 files changed, 679 insertions(+)
> >>>>>>>>  create mode 100644 Documentation/admin-guide/LSM/SafeSetID.rst
> >>>>>>>>  create mode 100644 security/safesetid/Kconfig
> >>>>>>>>  create mode 100644 security/safesetid/Makefile
> >>>>>>>>  create mode 100644 security/safesetid/lsm.c
> >>>>>>>>  create mode 100644 security/safesetid/lsm.h
> >>>>>>>>  create mode 100644 security/safesetid/securityfs.c
> >>>>>>>>
> >>>>>>>> diff --git a/Documentation/admin-guide/LSM/SafeSetID.rst b/Documentation/admin-guide/LSM/SafeSetID.rst
> >>>>>>>> new file mode 100644
> >>>>>>>> index 000000000000..e7d072124424
> >>>>>>>> --- /dev/null
> >>>>>>>> +++ b/Documentation/admin-guide/LSM/SafeSetID.rst
> >>>>>>>> @@ -0,0 +1,94 @@
> >>>>>>>> +=========
> >>>>>>>> +SafeSetID
> >>>>>>>> +=========
> >>>>>>>> +SafeSetID is an LSM module that gates the setid family of syscalls to restrict
> >>>>>>>> +UID/GID transitions from a given UID/GID to only those approved by a
> >>>>>>>> +system-wide whitelist. These restrictions also prohibit the given UIDs/GIDs
> >>>>>>>> +from obtaining auxiliary privileges associated with CAP_SET{U/G}ID, such as
> >>>>>>>> +allowing a user to set up user namespace UID mappings.
> >>>>>>>> +
> >>>>>>>> +
> >>>>>>>> +Background
> >>>>>>>> +==========
> >>>>>>>> +In absence of file capabilities, processes spawned on a Linux system that need
> >>>>>>>> +to switch to a different user must be spawned with CAP_SETUID privileges.
> >>>>>>>> +CAP_SETUID is granted to programs running as root or those running as a non-root
> >>>>>>>> +user that have been explicitly given the CAP_SETUID runtime capability. It is
> >>>>>>>> +often preferable to use Linux runtime capabilities rather than file
> >>>>>>>> +capabilities, since using file capabilities to run a program with elevated
> >>>>>>>> +privileges opens up possible security holes since any user with access to the
> >>>>>>>> +file can exec() that program to gain the elevated privileges.
> >>>>>>> Not true, see inheritable capabilities.  You also might look at ambient
> >>>>>>> capabilities.
> >>>>>> So for example with pam_cap.so you could have your N uids each be given
> >>>>>> the desired pI, and assign the corrsponding fIs to the files they should
> >>>>>> be able to exec with privilege.  No other uids will run those files with
> >>>>>> privilege.  *1
> >>>>> Sorry, what are "pl" and "fls" here? "Privilege level" and "files"?
> >>>>>
> >>>>>> Can you give some more details about exactly how you see SafeSetID being
> >>>>>> used?
> >>>>> Sure. The main use case for this LSM is to allow a non-root program to
> >>>>> transition to other untrusted uids without full blown CAP_SETUID
> >>>>> capabilities. The non-root program would still need CAP_SETUID to do
> >>>>> any kind of transition, but the additional restrictions imposed by
> >>>>> this LSM would mean it is a "safer" version of CAP_SETUID since the
> >>>>> non-root program cannot take advantage of CAP_SETUID to do any
> >>>>> unapproved actions (i.e. setuid to uid 0 or create/enter new user
> >>>>> namespace). The higher level goal is to allow for uid-based sandboxing
> >>>>> of system services without having to give out CAP_SETUID all over the
> >>>>> place just so that non-root programs can drop to
> >>>>> even-further-non-privileged uids. This is especially relevant when one
> >>>>> non-root daemon on the system should be allowed to spawn other
> >>>>> processes as different uids, but its undesirable to give the daemon a
> >>>>> basically-root-equivalent CAP_SETUID.
> >>>> I don't want to sound stupid(er than usual), but it sounds like
> >>>> you could do all this using setuid bits prudently. Based on this
> >>>> description, I don't see that anything new is needed.
> >>> There are situations where setuid bits don't get the job done, as
> >>> there are many situations where a program just wants to call setuid as
> >>> part of its execution (or fork + setuid without exec), instead of
> >>> fork/exec'ing a setuid binary.
> >> Yes, I understand that.
> >>
> >>> Take the following scenario for
> >>> example: init script (as root) spawns a network manager program as uid
> >>> 1000
> >> So far, so good.
> >>
> >>> and then the network manager spawns OpenVPN. The common mode of
> >>> operation for OpenVPN is to start running as the uid it was spawned
> >>> with (1000) at startup, but then drop to a lesser-privileged uid (e.g.
> >>> 2000) after initialization/setup by calling setuid.
> >> OK. That's an operation that does and ought to require privilege.
> > Sure, but the idea behind this LSM is that full CAP_SETUID
> > capabilities are a lot more privilege than is necessary in this
> > scenario.
>
> I'll start by pointing out that CAP_SETUID is about the finest grained
> capability there is. It's very precise in what it allows. I think that
> your concern is about the worst case scenario, which is setting the
> effective UID to 0, and hence gaining all privilege.

Yes, that plus the other powers granted by CAP_SETUID apart from
calling one of the set*uid functions:
https://elixir.bootlin.com/linux/latest/ident/CAP_SETUID (e.g. setting
up user ns mappings).

>
>
> >>> This is something
> >>> setuid bits wouldn't help with, without refactoring OpenVPN.
> >> You're correct.
> >>
> >>> So one
> >>> option here is to give the network manager CAP_SETUID, which will be
> >>> inherited by OpenVPN, and then OpenVPN drops to uid 2000 and drops
> >>> CAP_SETUID (would probably require patching OpenVPN for the capability
> >>> dropping).
> >> Or, you put CAP_SETUID on the file capabilities for OpenVPN,
> >> which is the way the P1003.1e DRAFT specification would have
> >> you accomplish this. Unfortunately, with all the changes made
> >> to capabilities for namespaces and all I'm not 100% sure I
> >> could say exactly how to set that.
> >>
> >>> The problem here is that if the network manager itself is
> >>> untrusted and exploitable, then giving it unrestricted CAP_SETUID is a
> >>> big security risk.
> >> Right. That's why you set the file capabilities on OpenVPN.
> > So it seems like you're suggesting that any time a program needs to
> > switch user by calling setuid,
>
> ... in a way that requires CAP_SETUID ...
>
> > that it should get full CAP_SETUID
> > capabilities (whether that's through setting file capabilities on the
> > binary or inheriting CAP_SETUID from a parent process or otherwise).
>
> Yup. That's correct. With all the duties and responsibilities associated
> with the dangers of UID management. Changing UIDs shouldn't be done
> lightly and needs to be done carefully.
>
> > But that brings us back to the basic problem this LSM is trying to
> > solve. Namely, we don't want to sprinkle unrestricted CAP_SETUID privs
> > all over the system for binaries that just want to switch to specific
> > uid[s] and don't need any of the root-equivalent privileges provided
> > by CAP_SETUID.
>
> I would see marking a program with a list of UIDs it can run with or
> that its children can run with as a better solution. You get much
> better locality of reference that way.

AFAICT in this scenario an exploited program could still be tricked
(e.g. command injection) into doing the unapproved actions.

>
> >>> Even just sticking with the network manager / VPN
> >>> example, strongSwan VPN also uses the same drop-to-user-through-setuid
> >>> setup, as do other Linux applications.
> >> Same solution.
> >>
> >>> Refactoring these applications
> >>> to fork/exec setuid binaries instead of simply calling setuid is often
> >>> infeasible. So a direct call to setuid is often necessary/expected,
> >>> and setuid bits don't help here.
> >> What is it with kids these days, that they are so
> >> afraid of fixing code that needs fixing? But that's
> >> not necessary in this example.
> >>
> >>> Also, use of setuid bits precludes the use of the no_new_privs bit,
> >>> which is usually at least a nice-to-have (if not need-to-have) for
> >>> sandboxed processes on the system.
> >> But you've already said that you *want* to change the security state,
> >> "drop to a lesser-privileged uid", so you're already mucking with the
> >> sandbox. If you're going to say that changing UIDs doesn't count for
> >> sandboxing I'll point out that you brought up the notion of a
> >> lesser-privileged UID.
> > There are plenty of ways that non-root processes further restrict
> > especially vulnerable parts of their code to even lesser-privileged
> > contexts. But its often easier to reason about the security of such
> > applications if the no_new_privs bit is set and file capabilities are
> > avoided, so the application can have full control of which privileges
> > are given to spawned processes without having to worry about which
> > privileges are attached to which files. Granted, the no_new_privs
> > issue is less central to the LSM being proposed here compared to the
> > discussion above.
>
> Let me suggest a change to the way your LSM works
> that would reduce my concerns. Rather than refusing to
> make a UID change that isn't on your whitelist, kill a
> process that makes a prohibited request. This mitigates
> the problem where a process doesn't check for an error
> return. Sure, your system will be harder to get running
> until your whitelist is complete, but you'll avoid a
> whole category of security bugs.

That's a valid point. ATM I can't think of any reason I'd be opposed to that.

>
James Morris Nov. 6, 2018, 8:59 p.m. UTC | #24
On Thu, 1 Nov 2018, Micah Morton wrote:

> > Can you give some more details about exactly how you see SafeSetID being
> > used?
> 
> Sure. The main use case for this LSM is to allow a non-root program to
> transition to other untrusted uids without full blown CAP_SETUID
> capabilities. The non-root program would still need CAP_SETUID to do
> any kind of transition, but the additional restrictions imposed by
> this LSM would mean it is a "safer" version of CAP_SETUID since the
> non-root program cannot take advantage of CAP_SETUID to do any
> unapproved actions (i.e. setuid to uid 0 or create/enter new user
> namespace). The higher level goal is to allow for uid-based sandboxing
> of system services without having to give out CAP_SETUID all over the
> place just so that non-root programs can drop to
> even-further-non-privileged uids. This is especially relevant when one
> non-root daemon on the system should be allowed to spawn other
> processes as different uids, but its undesirable to give the daemon a
> basically-root-equivalent CAP_SETUID.

Please include this use-case in the kernel documentation.



- James
Micah Morton Nov. 8, 2018, 8:53 p.m. UTC | #25
It seems like the CAP_SETUID_RANGE idea proposed by Serge is mainly a
way to avoid silently breaking programs that run with CAP_SETUID,
which could cause security vulnerabilities. Serge, does Casey's
suggestion (killing processes that try to perform unapproved
transitions) make sense to you as an alternate way to safeguard
against this? Sure there could be regressions, but they would only
happen to users for which whitelist policies had been configured, and
killing processes should be an effective way of identifying any
missing whitelist policies on the system for some restricted user. One
less attractive thing about adding a CAP_SETUID_RANGE capability would
be that more of the common kernel code would have to be modified to
add a new capability, whereas currently the LSM just uses the LSM
hooks.

The other unresolved aspect here (discussed by Stephen above) is how
to "know" that ns_capable(..., CAP_SETUID) was called by sys_set*uid
and not some other kernel code path. The reason we need to know this
is to be able to distinguish id transitions from other privileged
actions (e.g. create/modify/enter user namespace). Certain transitions
should be allowed for whitelisted users, but the other privileged
actions should be denied (or else the security hardening provided by
this LSM is significantly weakened). Do people think the current
reliance on comparing the return value of syscall_get_nr() to
arch-specific syscall constants (e.g. __NR_setuid) is a deal-breaker
and we should find an arch-independent solution such as the one
proposed by Stephen? Or is checking against arch-specific constants
not a big deal and the code can stay as is?
On Fri, Nov 2, 2018 at 12:22 PM Serge E. Hallyn <serge@hallyn.com> wrote:
>
> Quoting Casey Schaufler (casey@schaufler-ca.com):
> > On 11/2/2018 11:30 AM, Serge E. Hallyn wrote:
> > > Quoting Casey Schaufler (casey@schaufler-ca.com):
> > >
> > >> Let me suggest a change to the way your LSM works
> > >> that would reduce my concerns. Rather than refusing to
> > >> make a UID change that isn't on your whitelist, kill a
> > >> process that makes a prohibited request. This mitigates
> > >> the problem where a process doesn't check for an error
> > >> return. Sure, your system will be harder to get running
> > >> until your whitelist is complete, but you'll avoid a
> > >> whole category of security bugs.
> > > Might also consider not restricting CAP_SETUID, but instead adding a
> > > new CAP_SETUID_RANGE capability.  That way you can be sure there will be
> > > no regressions with any programs which run with CAP_SETUID.
> > >
> > > Though that violates what Casey was just arguing halfway up the email.
> >
> > I know that it's hard to believe 20 years after the fact,
> > but the POSIX group worked very hard to ensure that the granularity
> > of capabilities was correct for the security policy that the
> > interfaces defined in P1003.1. What would CAP_SETUID_RANGE mean?
>
> CAP_SETUID would mean you can switch to any uid.
>
> CAP_SETUID_RANGE would mean you could make the transitions which have
> been defined through <handwave> some mechanism.  Be it prctl, or some
> new security.uidrange xattr, or the mechanism Micah proposed, except
> it only applies for CAP_SETUID_RANGE not CAP_SETUID.
>
> -serge
Casey Schaufler Nov. 8, 2018, 9:34 p.m. UTC | #26
On 11/8/2018 12:53 PM, Micah Morton wrote:
> It seems like the CAP_SETUID_RANGE idea proposed by Serge is mainly a
> way to avoid silently breaking programs that run with CAP_SETUID,
> which could cause security vulnerabilities.

It wouldn't be "silent". You'd have to fix anything that needs
the new capability instead of the old.

> Serge, does Casey's
> suggestion (killing processes that try to perform unapproved
> transitions) make sense to you as an alternate way to safeguard
> against this? Sure there could be regressions, but they would only
> happen to users for which whitelist policies had been configured, and
> killing processes should be an effective way of identifying any
> missing whitelist policies on the system for some restricted user.

I'll point out that seccomp uses the "kill what you deny" approach.
It would be one thing if it was reasonable to assume that programs
are checking their error returns, but we know they're not.

> One
> less attractive thing about adding a CAP_SETUID_RANGE capability would
> be that more of the common kernel code would have to be modified to
> add a new capability, whereas currently the LSM just uses the LSM
> hooks.

That's got to be a secondary consideration. If CAP_SETUID_RANGE
were the right solution the additional work would be worth doing.

> The other unresolved aspect here (discussed by Stephen above) is how
> to "know" that ns_capable(..., CAP_SETUID) was called by sys_set*uid
> and not some other kernel code path.

It sounds like you want a new LSM hook (security_uid_change?) to
put in that/those code path/paths. That would be a lot cleaner than
trying to infer the syscall.

> The reason we need to know this
> is to be able to distinguish id transitions from other privileged
> actions (e.g. create/modify/enter user namespace). Certain transitions
> should be allowed for whitelisted users, but the other privileged
> actions should be denied (or else the security hardening provided by
> this LSM is significantly weakened). Do people think the current
> reliance on comparing the return value of syscall_get_nr() to
> arch-specific syscall constants (e.g. __NR_setuid) is a deal-breaker
> and we should find an arch-independent solution such as the one
> proposed by Stephen?

The problem with inferring the syscall is that the mechanism for
doing so is subject to change and architectural magic.

> Or is checking against arch-specific constants
> not a big deal and the code can stay as is?

It's bad enough when we have to make LSM code check things
like what filesystem an inode relates to. I would say that
you really want a different approach.

> On Fri, Nov 2, 2018 at 12:22 PM Serge E. Hallyn <serge@hallyn.com> wrote:
>> Quoting Casey Schaufler (casey@schaufler-ca.com):
>>> On 11/2/2018 11:30 AM, Serge E. Hallyn wrote:
>>>> Quoting Casey Schaufler (casey@schaufler-ca.com):
>>>>
>>>>> Let me suggest a change to the way your LSM works
>>>>> that would reduce my concerns. Rather than refusing to
>>>>> make a UID change that isn't on your whitelist, kill a
>>>>> process that makes a prohibited request. This mitigates
>>>>> the problem where a process doesn't check for an error
>>>>> return. Sure, your system will be harder to get running
>>>>> until your whitelist is complete, but you'll avoid a
>>>>> whole category of security bugs.
>>>> Might also consider not restricting CAP_SETUID, but instead adding a
>>>> new CAP_SETUID_RANGE capability.  That way you can be sure there will be
>>>> no regressions with any programs which run with CAP_SETUID.
>>>>
>>>> Though that violates what Casey was just arguing halfway up the email.
>>> I know that it's hard to believe 20 years after the fact,
>>> but the POSIX group worked very hard to ensure that the granularity
>>> of capabilities was correct for the security policy that the
>>> interfaces defined in P1003.1. What would CAP_SETUID_RANGE mean?
>> CAP_SETUID would mean you can switch to any uid.
>>
>> CAP_SETUID_RANGE would mean you could make the transitions which have
>> been defined through <handwave> some mechanism.  Be it prctl, or some
>> new security.uidrange xattr, or the mechanism Micah proposed, except
>> it only applies for CAP_SETUID_RANGE not CAP_SETUID.
>>
>> -serge
Micah Morton Nov. 9, 2018, 12:30 a.m. UTC | #27
On Thu, Nov 8, 2018 at 1:34 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> On 11/8/2018 12:53 PM, Micah Morton wrote:
> > It seems like the CAP_SETUID_RANGE idea proposed by Serge is mainly a
> > way to avoid silently breaking programs that run with CAP_SETUID,
> > which could cause security vulnerabilities.
>
> It wouldn't be "silent". You'd have to fix anything that needs
> the new capability instead of the old.
>
> > Serge, does Casey's
> > suggestion (killing processes that try to perform unapproved
> > transitions) make sense to you as an alternate way to safeguard
> > against this? Sure there could be regressions, but they would only
> > happen to users for which whitelist policies had been configured, and
> > killing processes should be an effective way of identifying any
> > missing whitelist policies on the system for some restricted user.
>
> I'll point out that seccomp uses the "kill what you deny" approach.
> It would be one thing if it was reasonable to assume that programs
> are checking their error returns, but we know they're not.

Totally agree

>
> > One
> > less attractive thing about adding a CAP_SETUID_RANGE capability would
> > be that more of the common kernel code would have to be modified to
> > add a new capability, whereas currently the LSM just uses the LSM
> > hooks.
>
> That's got to be a secondary consideration. If CAP_SETUID_RANGE
> were the right solution the additional work would be worth doing.
>
> > The other unresolved aspect here (discussed by Stephen above) is how
> > to "know" that ns_capable(..., CAP_SETUID) was called by sys_set*uid
> > and not some other kernel code path.
>
> It sounds like you want a new LSM hook (security_uid_change?) to
> put in that/those code path/paths. That would be a lot cleaner than
> trying to infer the syscall.

Providing a new LSM hook would be tricky, because there are places
(besides sys_set*id) that call ns_capable(..., CAP_SETUID) which we
would want to fail if there were whitelist restrictions for the
current uid: https://elixir.bootlin.com/linux/latest/ident/CAP_SETUID.
Even if we changed all non-sys_set*id code paths to call some hook
that will deny them if the current uid is restricted, I'm not sure
there would be any way to ensure that code paths added in the future
would call the hook to restrict themselves in the event that the user
is restricted. I guess our security_capable hook could always deny the
CAP_SETUID check for restricted users but then change the lines in
kernel/sys.c (e.g.
https://elixir.bootlin.com/linux/latest/source/kernel/sys.c#L584) to
do something like 'if (ns_capable(old->user_ns, CAP_SETUID) ||
security_uid_change(old->uid, new->uid))'. But that would be weird
since security_uid_change would have to redo all the checks that
ns_capable did (to basically say _yes_ if the current user has
CAP_SETUID _and_ the transition is allowed in the whitelist).

I might be missing something, but Stephen's idea seems to make more
sense to me. We already pass the
SECURITY_CAP_AUDIT/SECURITY_CAP_NOAUDIT flags to the security_capable
hook to let it know whether to write an audit message for the check.
Seems like it would be easy to extend this to define more bits (e.g.
SECURITY_SETID_TRANSITION) in that flag or have a more general
mechanism where we pass some kind of struct to security_capable that
contains these flags and possibly other args like Stephen was
mentioning. I'll share a patch for that approach, assuming I don't
realize there is some reason that actually wouldn't be a good idea.

>
> > The reason we need to know this
> > is to be able to distinguish id transitions from other privileged
> > actions (e.g. create/modify/enter user namespace). Certain transitions
> > should be allowed for whitelisted users, but the other privileged
> > actions should be denied (or else the security hardening provided by
> > this LSM is significantly weakened). Do people think the current
> > reliance on comparing the return value of syscall_get_nr() to
> > arch-specific syscall constants (e.g. __NR_setuid) is a deal-breaker
> > and we should find an arch-independent solution such as the one
> > proposed by Stephen?
>
> The problem with inferring the syscall is that the mechanism for
> doing so is subject to change and architectural magic.
>
> > Or is checking against arch-specific constants
> > not a big deal and the code can stay as is?
>
> It's bad enough when we have to make LSM code check things
> like what filesystem an inode relates to. I would say that
> you really want a different approach.
>
> > On Fri, Nov 2, 2018 at 12:22 PM Serge E. Hallyn <serge@hallyn.com> wrote:
> >> Quoting Casey Schaufler (casey@schaufler-ca.com):
> >>> On 11/2/2018 11:30 AM, Serge E. Hallyn wrote:
> >>>> Quoting Casey Schaufler (casey@schaufler-ca.com):
> >>>>
> >>>>> Let me suggest a change to the way your LSM works
> >>>>> that would reduce my concerns. Rather than refusing to
> >>>>> make a UID change that isn't on your whitelist, kill a
> >>>>> process that makes a prohibited request. This mitigates
> >>>>> the problem where a process doesn't check for an error
> >>>>> return. Sure, your system will be harder to get running
> >>>>> until your whitelist is complete, but you'll avoid a
> >>>>> whole category of security bugs.
> >>>> Might also consider not restricting CAP_SETUID, but instead adding a
> >>>> new CAP_SETUID_RANGE capability.  That way you can be sure there will be
> >>>> no regressions with any programs which run with CAP_SETUID.
> >>>>
> >>>> Though that violates what Casey was just arguing halfway up the email.
> >>> I know that it's hard to believe 20 years after the fact,
> >>> but the POSIX group worked very hard to ensure that the granularity
> >>> of capabilities was correct for the security policy that the
> >>> interfaces defined in P1003.1. What would CAP_SETUID_RANGE mean?
> >> CAP_SETUID would mean you can switch to any uid.
> >>
> >> CAP_SETUID_RANGE would mean you could make the transitions which have
> >> been defined through <handwave> some mechanism.  Be it prctl, or some
> >> new security.uidrange xattr, or the mechanism Micah proposed, except
> >> it only applies for CAP_SETUID_RANGE not CAP_SETUID.
> >>
> >> -serge
>
diff mbox series

Patch

diff --git a/Documentation/admin-guide/LSM/SafeSetID.rst b/Documentation/admin-guide/LSM/SafeSetID.rst
new file mode 100644
index 000000000000..e7d072124424
--- /dev/null
+++ b/Documentation/admin-guide/LSM/SafeSetID.rst
@@ -0,0 +1,94 @@ 
+=========
+SafeSetID
+=========
+SafeSetID is an LSM module that gates the setid family of syscalls to restrict
+UID/GID transitions from a given UID/GID to only those approved by a
+system-wide whitelist. These restrictions also prohibit the given UIDs/GIDs
+from obtaining auxiliary privileges associated with CAP_SET{U/G}ID, such as
+allowing a user to set up user namespace UID mappings.
+
+
+Background
+==========
+In absence of file capabilities, processes spawned on a Linux system that need
+to switch to a different user must be spawned with CAP_SETUID privileges.
+CAP_SETUID is granted to programs running as root or those running as a non-root
+user that have been explicitly given the CAP_SETUID runtime capability. It is
+often preferable to use Linux runtime capabilities rather than file
+capabilities, since using file capabilities to run a program with elevated
+privileges opens up possible security holes since any user with access to the
+file can exec() that program to gain the elevated privileges.
+
+While it is possible to implement a tree of processes by giving full
+CAP_SET{U/G}ID capabilities, this is often at odds with the goals of running a
+tree of processes under non-root user(s) in the first place. Specifically,
+since CAP_SETUID allows changing to any user on the system, including the root
+user, it is an overpowered capability for what is needed in this scenario,
+especially since programs often only call setuid() to drop privileges to a
+lesser-privileged user -- not elevate privileges. Unfortunately, there is no
+generally feasible way in Linux to restrict the potential UIDs that a user can
+switch to through setuid() beyond allowing a switch to any user on the system.
+This SafeSetID LSM seeks to provide a solution for restricting setid
+capabilities in such a way.
+
+
+Other Approaches Considered
+===========================
+
+Solve this problem in userspace
+-------------------------------
+For candidate applications that would like to have restricted setid capabilities
+as implemented in this LSM, an alternative option would be to simply take away
+setid capabilities from the application completely and refactor the process
+spawning semantics in the application (e.g. by using a privileged helper program
+to do process spawning and UID/GID transitions). Unfortunately, there are a
+number of semantics around process spawning that would be affected by this, such
+as fork() calls where the program doesn’t immediately call exec() after the
+fork(), parent processes specifying custom environment variables or command line
+args for spawned child processes, or inheritance of file handles across a
+fork()/exec(). Because of this, as solution that uses a privileged helper in
+userspace would likely be less appealing to incorporate into existing projects
+that rely on certain process-spawning semantics in Linux.
+
+Use user namespaces
+-------------------
+Another possible approach would be to run a given process tree in its own user
+namespace and give programs in the tree setid capabilities. In this way,
+programs in the tree could change to any desired UID/GID in the context of their
+own user namespace, and only approved UIDs/GIDs could be mapped back to the
+initial system user namespace, affectively preventing privilege escalation.
+Unfortunately, it is not generally feasible to use user namespaces in isolation,
+without pairing them with other namespace types, which is not always an option.
+Linux checks for capabilities based off of the user namespace that “owns” some
+entity. For example, Linux has the notion that network namespaces are owned by
+the user namespace in which they were created. A consequence of this is that
+capability checks for access to a given network namespace are done by checking
+whether a task has the given capability in the context of the user namespace
+that owns the network namespace -- not necessarily the user namespace under
+which the given task runs. Therefore spawning a process in a new user namespace
+effectively prevents it from accessing the network namespace owned by the
+initial namespace. This is a deal-breaker for any application that expects to
+retain the CAP_NET_ADMIN capability for the purpose of adjusting network
+configurations. Using user namespaces in isolation causes problems regarding
+other system interactions, including use of pid namespaces and device creation.
+
+Use an existing LSM
+-------------------
+None of the other in-tree LSMs have the capability to gate setid transitions, or
+even employ the security_task_fix_setuid hook at all. SELinux says of that hook:
+"Since setuid only affects the current process, and since the SELinux controls
+are not based on the Linux identity attributes, SELinux does not need to control
+this operation."
+
+
+Directions for use
+==================
+This LSM hooks the setid syscalls to make sure transitions are allowed if an
+applicable restriction policy is in place. Policies are configured through
+securityfs by writing to the safesetid/add_whitelist_policy and
+safesetid/flush_whitelist_policies files at the location where securityfs is
+mounted. The format for adding a policy is '<UID>:<UID>', using literal
+numbers, such as '123:456'. To flush the policies, any write to the file is
+sufficient. Again, configuring a policy for a UID will prevent that UID from
+obtaining auxiliary setid privileges, such as allowing a user to set up user
+namespace UID mappings.
diff --git a/Documentation/admin-guide/LSM/index.rst b/Documentation/admin-guide/LSM/index.rst
index c980dfe9abf1..a0c387649e12 100644
--- a/Documentation/admin-guide/LSM/index.rst
+++ b/Documentation/admin-guide/LSM/index.rst
@@ -39,3 +39,4 @@  the one "major" module (e.g. SELinux) if there is one configured.
    Smack
    tomoyo
    Yama
+   SafeSetID
diff --git a/arch/Kconfig b/arch/Kconfig
index 1aa59063f1fd..c87070807ba2 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -381,6 +381,11 @@  config ARCH_WANT_OLD_COMPAT_IPC
 	select ARCH_WANT_COMPAT_IPC_PARSE_VERSION
 	bool
 
+config HAVE_SAFESETID
+	bool
+	help
+	  This option enables the SafeSetID LSM.
+
 config HAVE_ARCH_SECCOMP_FILTER
 	bool
 	help
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 843edfd000be..35b1a772c971 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -92,6 +92,7 @@  config ARM
 	select HAVE_RCU_TABLE_FREE if (SMP && ARM_LPAE)
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RSEQ
+	select HAVE_SAFESETID
 	select HAVE_STACKPROTECTOR
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_UID16
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 42c090cf0292..2c6f5ec3a55e 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -127,6 +127,7 @@  config ARM64
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RCU_TABLE_FREE
+	select HAVE_SAFESETID
 	select HAVE_STACKPROTECTOR
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_KPROBES
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 887d3a7bb646..a6527d6c0426 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -27,6 +27,7 @@  config X86_64
 	select ARCH_SUPPORTS_INT128
 	select ARCH_USE_CMPXCHG_LOCKREF
 	select HAVE_ARCH_SOFT_DIRTY
+	select HAVE_SAFESETID
 	select MODULES_USE_ELF_RELA
 	select NEED_DMA_MAP_STATE
 	select SWIOTLB
diff --git a/security/Kconfig b/security/Kconfig
index c4302067a3ad..7d9008ad5903 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -237,6 +237,7 @@  source security/tomoyo/Kconfig
 source security/apparmor/Kconfig
 source security/loadpin/Kconfig
 source security/yama/Kconfig
+source security/safesetid/Kconfig
 
 source security/integrity/Kconfig
 
diff --git a/security/Makefile b/security/Makefile
index 4d2d3782ddef..88209d827832 100644
--- a/security/Makefile
+++ b/security/Makefile
@@ -10,6 +10,7 @@  subdir-$(CONFIG_SECURITY_TOMOYO)        += tomoyo
 subdir-$(CONFIG_SECURITY_APPARMOR)	+= apparmor
 subdir-$(CONFIG_SECURITY_YAMA)		+= yama
 subdir-$(CONFIG_SECURITY_LOADPIN)	+= loadpin
+subdir-$(CONFIG_SECURITY_SAFESETID)	+= safesetid
 
 # always enable default capabilities
 obj-y					+= commoncap.o
@@ -25,6 +26,7 @@  obj-$(CONFIG_SECURITY_TOMOYO)		+= tomoyo/
 obj-$(CONFIG_SECURITY_APPARMOR)		+= apparmor/
 obj-$(CONFIG_SECURITY_YAMA)		+= yama/
 obj-$(CONFIG_SECURITY_LOADPIN)		+= loadpin/
+obj-$(CONFIG_SECURITY_SAFESETID)	+= safesetid/
 obj-$(CONFIG_CGROUP_DEVICE)		+= device_cgroup.o
 
 # Object integrity file lists
diff --git a/security/safesetid/Kconfig b/security/safesetid/Kconfig
new file mode 100644
index 000000000000..4ff82c7ed273
--- /dev/null
+++ b/security/safesetid/Kconfig
@@ -0,0 +1,13 @@ 
+config SECURITY_SAFESETID
+        bool "Gate setid transitions to limit CAP_SET{U/G}ID capabilities"
+        depends on HAVE_SAFESETID
+        default n
+        help
+          SafeSetID is an LSM module that gates the setid family of syscalls to
+          restrict UID/GID transitions from a given UID/GID to only those
+          approved by a system-wide whitelist. These restrictions also prohibit
+          the given UIDs/GIDs from obtaining auxiliary privileges associated
+          with CAP_SET{U/G}ID, such as allowing a user to set up user namespace
+          UID mappings.
+
+          If you are unsure how to answer this question, answer N.
diff --git a/security/safesetid/Makefile b/security/safesetid/Makefile
new file mode 100644
index 000000000000..6b0660321164
--- /dev/null
+++ b/security/safesetid/Makefile
@@ -0,0 +1,7 @@ 
+# SPDX-License-Identifier: GPL-2.0
+#
+# Makefile for the safesetid LSM.
+#
+
+obj-$(CONFIG_SECURITY_SAFESETID) := safesetid.o
+safesetid-y := lsm.o securityfs.o
diff --git a/security/safesetid/lsm.c b/security/safesetid/lsm.c
new file mode 100644
index 000000000000..e30ff06d8e07
--- /dev/null
+++ b/security/safesetid/lsm.c
@@ -0,0 +1,334 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * SafeSetID Linux Security Module
+ *
+ * Author: Micah Morton <mortonm@chromium.org>
+ *
+ * Copyright (C) 2018 The Chromium OS Authors.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#define pr_fmt(fmt) "SafeSetID: " fmt
+
+#include <asm/syscall.h>
+#include <linux/hashtable.h>
+#include <linux/lsm_hooks.h>
+#include <linux/module.h>
+#include <linux/ptrace.h>
+#include <linux/sched/task_stack.h>
+#include <linux/security.h>
+
+#define NUM_BITS 8 /* 128 buckets in hash table */
+
+static DEFINE_HASHTABLE(safesetid_whitelist_hashtable, NUM_BITS);
+
+/*
+ * Hash table entry to store safesetid policy signifying that 'parent' user
+ * can setid to 'child' user.
+ */
+struct entry {
+	struct hlist_node next;
+	struct hlist_node dlist; /* for deletion cleanup */
+	uint64_t parent_kuid;
+	uint64_t child_kuid;
+};
+
+static DEFINE_SPINLOCK(safesetid_whitelist_hashtable_spinlock);
+
+static bool check_setuid_policy_hashtable_key(kuid_t parent)
+{
+	struct entry *entry;
+
+	rcu_read_lock();
+	hash_for_each_possible_rcu(safesetid_whitelist_hashtable,
+				   entry, next, __kuid_val(parent)) {
+		if (entry->parent_kuid == __kuid_val(parent)) {
+			rcu_read_unlock();
+			return true;
+		}
+	}
+	rcu_read_unlock();
+
+	return false;
+}
+
+static bool check_setuid_policy_hashtable_key_value(kuid_t parent,
+						    kuid_t child)
+{
+	struct entry *entry;
+
+	rcu_read_lock();
+	hash_for_each_possible_rcu(safesetid_whitelist_hashtable,
+				   entry, next, __kuid_val(parent)) {
+		if (entry->parent_kuid == __kuid_val(parent) &&
+		    entry->child_kuid == __kuid_val(child)) {
+			rcu_read_unlock();
+			return true;
+		}
+	}
+	rcu_read_unlock();
+
+	return false;
+}
+
+/*
+ * TODO: Figuring out whether the current syscall number (saved on the kernel
+ * stack) is one of the set*uid syscalls is an operation that requires checking
+ * the number against arch-specific constants as seen below. The need for this
+ * LSM to know about arch-specific syscall stuff is not ideal. Is it better to
+ * implement an arch-specific function that gets called from this file and
+ * update arch/Kconfig to mention that the HAVE_SAFESETID symbol should only be
+ * selected for architectures that implement the function? Any other ideas?
+ */
+static bool setuid_syscall(int num)
+{
+#ifdef CONFIG_X86_64
+#ifdef CONFIG_COMPAT
+	if (!(num == __NR_setreuid ||
+	      num == __NR_setuid ||
+	      num == __NR_setresuid ||
+	      num == __NR_setfsuid ||
+	      num == __NR_ia32_setreuid32 ||
+	      num == __NR_ia32_setuid ||
+	      num == __NR_ia32_setresuid ||
+	      num == __NR_ia32_setresuid ||
+	      num == __NR_ia32_setuid32))
+		return false;
+#else
+	if (!(num == __NR_setreuid ||
+	      num == __NR_setuid ||
+	      num == __NR_setresuid ||
+	      num == __NR_setfsuid))
+		return false;
+#endif /* CONFIG_COMPAT */
+#elif defined CONFIG_ARM64
+#ifdef CONFIG_COMPAT
+	if (!(num == __NR_setuid ||
+	      num == __NR_setreuid ||
+	      num == __NR_setfsuid ||
+	      num == __NR_setresuid ||
+	      num == __NR_setreuid32 ||
+	      num == __NR_setresuid32 ||
+	      num == __NR_setuid32 ||
+	      num == __NR_setfsuid32 ||
+	      num == __NR_compat_setuid ||
+	      num == __NR_compat_setreuid ||
+	      num == __NR_compat_setfsuid ||
+	      num == __NR_compat_setresuid ||
+	      num == __NR_compat_setreuid32 ||
+	      num == __NR_compat_setresuid32 ||
+	      num == __NR_compat_setuid32 ||
+	      num == __NR_compat_setfsuid32))
+		return false;
+#else
+	if (!(num == __NR_setuid ||
+	      num == __NR_setreuid ||
+	      num == __NR_setfsuid ||
+	      num == __NR_setresuid))
+		return false;
+#endif /* CONFIG_COMPAT */
+#elif defined CONFIG_ARM
+	if (!(num == __NR_setreuid32 ||
+	      num == __NR_setuid32 ||
+	      num == __NR_setresuid32 ||
+	      num == __NR_setfsuid32))
+		return false;
+#else
+	BUILD_BUG();
+#endif
+	return true;
+}
+
+static int safesetid_security_capable(const struct cred *cred,
+				      struct user_namespace *ns,
+				      int cap,
+				      int audit)
+{
+	/* The current->mm check will fail if this is a kernel thread. */
+	if (cap == CAP_SETUID &&
+	    current->mm &&
+	    check_setuid_policy_hashtable_key(cred->uid)) {
+		/*
+		 * syscall_get_nr can theoretically return 0 or -1, but that
+		 * would signify that the syscall is being aborted due to a
+		 * signal, so we don't need to check for this case here.
+		 */
+		if (!(setuid_syscall(syscall_get_nr(current,
+						    current_pt_regs()))))
+			/*
+			 * Deny if we're not in a set*uid() syscall to avoid
+			 * giving powers gated by CAP_SETUID that are related
+			 * to functionality other than calling set*uid() (e.g.
+			 * allowing user to set up userns uid mappings).
+			 */
+			return -1;
+	}
+	return 0;
+}
+
+static void setuid_policy_warning(kuid_t parent, kuid_t child)
+{
+	pr_warn("UID transition (%d -> %d) blocked",
+		__kuid_val(parent),
+		__kuid_val(child));
+}
+
+static int check_uid_transition(kuid_t parent, kuid_t child)
+{
+	if (check_setuid_policy_hashtable_key_value(parent, child))
+		return 0;
+	setuid_policy_warning(parent, child);
+	return -1;
+}
+
+/*
+ * Check whether there is either an exception for user under old cred struct to
+ * set*uid to user under new cred struct, or the UID transition is allowed (by
+ * Linux set*uid rules) even without CAP_SETUID.
+ */
+static int safesetid_task_fix_setuid(struct cred *new,
+				     const struct cred *old,
+				     int flags)
+{
+
+	/* Do nothing if there are no setuid restrictions for this UID. */
+	if (!check_setuid_policy_hashtable_key(old->uid))
+		return 0;
+
+	switch (flags) {
+	case LSM_SETID_RE:
+		/*
+		 * Users for which setuid restrictions exist can only set the
+		 * real UID to the real UID or the effective UID, unless an
+		 * explicit whitelist policy allows the transition.
+		 */
+		if (!uid_eq(old->uid, new->uid) &&
+			!uid_eq(old->euid, new->uid)) {
+			return check_uid_transition(old->uid, new->uid);
+		}
+		/*
+		 * Users for which setuid restrictions exist can only set the
+		 * effective UID to the real UID, the effective UID, or the
+		 * saved set-UID, unless an explicit whitelist policy allows
+		 * the transition.
+		 */
+		if (!uid_eq(old->uid, new->euid) &&
+			!uid_eq(old->euid, new->euid) &&
+			!uid_eq(old->suid, new->euid)) {
+			return check_uid_transition(old->euid, new->euid);
+		}
+		break;
+	case LSM_SETID_ID:
+		/*
+		 * Users for which setuid restrictions exist cannot change the
+		 * real UID or saved set-UID unless an explicit whitelist
+		 * policy allows the transition.
+		 */
+		if (!uid_eq(old->uid, new->uid))
+			return check_uid_transition(old->uid, new->uid);
+		if (!uid_eq(old->suid, new->suid))
+			return check_uid_transition(old->suid, new->suid);
+		break;
+	case LSM_SETID_RES:
+		/*
+		 * Users for which setuid restrictions exist cannot change the
+		 * real UID, effective UID, or saved set-UID to anything but
+		 * one of: the current real UID, the current effective UID or
+		 * the current saved set-user-ID unless an explicit whitelist
+		 * policy allows the transition.
+		 */
+		if (!uid_eq(new->uid, old->uid) &&
+			!uid_eq(new->uid, old->euid) &&
+			!uid_eq(new->uid, old->suid)) {
+			return check_uid_transition(old->uid, new->uid);
+		}
+		if (!uid_eq(new->euid, old->uid) &&
+			!uid_eq(new->euid, old->euid) &&
+			!uid_eq(new->euid, old->suid)) {
+			return check_uid_transition(old->euid, new->euid);
+		}
+		if (!uid_eq(new->suid, old->uid) &&
+			!uid_eq(new->suid, old->euid) &&
+			!uid_eq(new->suid, old->suid)) {
+			return check_uid_transition(old->suid, new->suid);
+		}
+		break;
+	case LSM_SETID_FS:
+		/*
+		 * Users for which setuid restrictions exist cannot change the
+		 * filesystem UID to anything but one of: the current real UID,
+		 * the current effective UID or the current saved set-UID
+		 * unless an explicit whitelist policy allows the transition.
+		 */
+		if (!uid_eq(new->fsuid, old->uid)  &&
+			!uid_eq(new->fsuid, old->euid)  &&
+			!uid_eq(new->fsuid, old->suid) &&
+			!uid_eq(new->fsuid, old->fsuid)) {
+			return check_uid_transition(old->fsuid, new->fsuid);
+		}
+		break;
+	}
+	return 0;
+}
+
+int add_safesetid_whitelist_entry(kuid_t parent, kuid_t child)
+{
+	struct entry *new;
+
+	/* Return if entry already exists */
+	if (check_setuid_policy_hashtable_key_value(parent, child))
+		return 0;
+
+	new = kzalloc(sizeof(struct entry), GFP_KERNEL);
+	if (!new)
+		return -ENOMEM;
+	new->parent_kuid = __kuid_val(parent);
+	new->child_kuid = __kuid_val(child);
+	spin_lock(&safesetid_whitelist_hashtable_spinlock);
+	hash_add_rcu(safesetid_whitelist_hashtable,
+		     &new->next,
+		     __kuid_val(parent));
+	spin_unlock(&safesetid_whitelist_hashtable_spinlock);
+	return 0;
+}
+
+void flush_safesetid_whitelist_entries(void)
+{
+	struct entry *entry;
+	struct hlist_node *hlist_node;
+	unsigned int bkt_loop_cursor;
+	HLIST_HEAD(free_list);
+
+	/*
+	 * Could probably use hash_for_each_rcu here instead, but this should
+	 * be fine as well.
+	 */
+	hash_for_each_safe(safesetid_whitelist_hashtable, bkt_loop_cursor,
+			   hlist_node, entry, next) {
+		spin_lock(&safesetid_whitelist_hashtable_spinlock);
+		hash_del_rcu(&entry->next);
+		spin_unlock(&safesetid_whitelist_hashtable_spinlock);
+		hlist_add_head(&entry->dlist, &free_list);
+	}
+	synchronize_rcu();
+	hlist_for_each_entry_safe(entry, hlist_node, &free_list, dlist)
+		kfree(entry);
+}
+
+static struct security_hook_list safesetid_security_hooks[] = {
+	LSM_HOOK_INIT(task_fix_setuid, safesetid_task_fix_setuid),
+	LSM_HOOK_INIT(capable, safesetid_security_capable)
+};
+
+static int __init safesetid_security_init(void)
+{
+	security_add_hooks(safesetid_security_hooks,
+			   ARRAY_SIZE(safesetid_security_hooks), "safesetid");
+
+	return 0;
+}
+security_initcall(safesetid_security_init);
diff --git a/security/safesetid/lsm.h b/security/safesetid/lsm.h
new file mode 100644
index 000000000000..bf78af9bf314
--- /dev/null
+++ b/security/safesetid/lsm.h
@@ -0,0 +1,30 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * SafeSetID Linux Security Module
+ *
+ * Author: Micah Morton <mortonm@chromium.org>
+ *
+ * Copyright (C) 2018 The Chromium OS Authors.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ *
+ */
+#ifndef _SAFESETID_H
+#define _SAFESETID_H
+
+#include <linux/types.h>
+
+/* Function type. */
+enum safesetid_whitelist_file_write_type {
+	SAFESETID_WHITELIST_ADD, /* Add whitelist policy. */
+	SAFESETID_WHITELIST_FLUSH, /* Flush whitelist policies. */
+};
+
+/* Add entry to safesetid whitelist to allow 'parent' to setid to 'child'. */
+int add_safesetid_whitelist_entry(kuid_t parent, kuid_t child);
+
+void flush_safesetid_whitelist_entries(void);
+
+#endif /* _SAFESETID_H */
diff --git a/security/safesetid/securityfs.c b/security/safesetid/securityfs.c
new file mode 100644
index 000000000000..ff5fcf2c1b37
--- /dev/null
+++ b/security/safesetid/securityfs.c
@@ -0,0 +1,189 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * SafeSetID Linux Security Module
+ *
+ * Author: Micah Morton <mortonm@chromium.org>
+ *
+ * Copyright (C) 2018 The Chromium OS Authors.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ *
+ */
+#include <linux/security.h>
+#include <linux/cred.h>
+
+#include "lsm.h"
+
+static struct dentry *safesetid_policy_dir;
+
+struct safesetid_file_entry {
+	const char *name;
+	enum safesetid_whitelist_file_write_type type;
+	struct dentry *dentry;
+};
+
+static struct safesetid_file_entry safesetid_files[] = {
+	{.name = "add_whitelist_policy",
+	 .type = SAFESETID_WHITELIST_ADD},
+	{.name = "flush_whitelist_policies",
+	 .type = SAFESETID_WHITELIST_FLUSH},
+};
+
+/*
+ * In the case the input buffer contains one or more invalid UIDs, the kuid_t
+ * variables pointed to by 'parent' and 'child' will get updated but this
+ * function will return an error.
+ */
+static int parse_safesetid_whitelist_policy(const char __user *buf,
+					    size_t len,
+					    kuid_t *parent,
+					    kuid_t *child)
+{
+	char *kern_buf;
+	char *parent_buf;
+	char *child_buf;
+	const char separator[] = ":";
+	int ret;
+	size_t first_substring_length;
+	long parsed_parent;
+	long parsed_child;
+
+	/* Duplicate string from user memory and NULL-terminate */
+	kern_buf = memdup_user_nul(buf, len);
+	if (IS_ERR(kern_buf))
+		return PTR_ERR(kern_buf);
+
+	/*
+	 * Format of |buf| string should be <UID>:<UID>.
+	 * Find location of ":" in kern_buf (copied from |buf|).
+	 */
+	first_substring_length = strcspn(kern_buf, separator);
+	if (first_substring_length == 0 || first_substring_length == len) {
+		ret = -EINVAL;
+		goto free_kern;
+	}
+
+	parent_buf = kmemdup_nul(kern_buf, first_substring_length, GFP_KERNEL);
+	if (!parent_buf) {
+		ret = -ENOMEM;
+		goto free_kern;
+	}
+
+	ret = kstrtol(parent_buf, 0, &parsed_parent);
+	if (ret)
+		goto free_both;
+
+	child_buf = kern_buf + first_substring_length + 1;
+	ret = kstrtol(child_buf, 0, &parsed_child);
+	if (ret)
+		goto free_both;
+
+	*parent = make_kuid(current_user_ns(), parsed_parent);
+	if (!uid_valid(*parent)) {
+		ret = -EINVAL;
+		goto free_both;
+	}
+
+	*child = make_kuid(current_user_ns(), parsed_child);
+	if (!uid_valid(*child)) {
+		ret = -EINVAL;
+		goto free_both;
+	}
+
+free_both:
+	kfree(parent_buf);
+free_kern:
+	kfree(kern_buf);
+	return ret;
+}
+
+static ssize_t safesetid_file_write(struct file *file,
+				    const char __user *buf,
+				    size_t len,
+				    loff_t *ppos)
+{
+	struct safesetid_file_entry *file_entry =
+		file->f_inode->i_private;
+	kuid_t parent;
+	kuid_t child;
+	int ret;
+
+	if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (*ppos != 0)
+		return -EINVAL;
+
+	if (file_entry->type == SAFESETID_WHITELIST_FLUSH) {
+		flush_safesetid_whitelist_entries();
+		return len;
+	}
+
+	/*
+	 * If we get to here, must be the case that file_entry->type equals
+	 * SAFESETID_WHITELIST_ADD
+	 */
+	ret = parse_safesetid_whitelist_policy(buf, len, &parent,
+							 &child);
+	if (ret)
+		return ret;
+
+	ret = add_safesetid_whitelist_entry(parent, child);
+	if (ret)
+		return ret;
+
+	/* Return len on success so caller won't keep trying to write */
+	return len;
+}
+
+static const struct file_operations safesetid_file_fops = {
+	.write = safesetid_file_write,
+};
+
+static void safesetid_shutdown_securityfs(void)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(safesetid_files); ++i) {
+		struct safesetid_file_entry *entry =
+			&safesetid_files[i];
+		securityfs_remove(entry->dentry);
+		entry->dentry = NULL;
+	}
+
+	securityfs_remove(safesetid_policy_dir);
+	safesetid_policy_dir = NULL;
+}
+
+static int __init safesetid_init_securityfs(void)
+{
+	int i;
+	int ret;
+
+	safesetid_policy_dir = securityfs_create_dir("safesetid", NULL);
+	if (!safesetid_policy_dir) {
+		ret = PTR_ERR(safesetid_policy_dir);
+		goto error;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(safesetid_files); ++i) {
+		struct safesetid_file_entry *entry =
+			&safesetid_files[i];
+		entry->dentry = securityfs_create_file(
+			entry->name, 0200, safesetid_policy_dir,
+			entry, &safesetid_file_fops);
+		if (IS_ERR(entry->dentry)) {
+			ret = PTR_ERR(entry->dentry);
+			goto error;
+		}
+	}
+
+	return 0;
+
+error:
+	safesetid_shutdown_securityfs();
+	return ret;
+}
+fs_initcall(safesetid_init_securityfs);