[v4,0/4] Introduce security_create_user_ns()

Message ID	20220801180146.1157914-1-fred@cloudflare.com (mailing list archive)
Headers	show Return-Path: <linux-security-module-owner@kernel.org> From: Frederick Lawler <fred@cloudflare.com> To: kpsingh@kernel.org, revest@chromium.org, jackmanb@chromium.org, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, john.fastabend@gmail.com, jmorris@namei.org, serge@hallyn.com, paul@paul-moore.com, stephen.smalley.work@gmail.com, eparis@parisplace.org, shuah@kernel.org, brauner@kernel.org, casey@schaufler-ca.com, ebiederm@xmission.com, bpf@vger.kernel.org, linux-security-module@vger.kernel.org, selinux@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org, kernel-team@cloudflare.com, cgzones@googlemail.com, karl@bigbadwolfsecurity.com, Frederick Lawler <fred@cloudflare.com> Subject: [PATCH v4 0/4] Introduce security_create_user_ns() Date: Mon, 1 Aug 2022 13:01:42 -0500 Message-Id: <20220801180146.1157914-1-fred@cloudflare.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	Introduce security_create_user_ns() \| expand [v4,0/4] Introduce security_create_user_ns() [v4,1/4] security, lsm: Introduce security_create_user_ns() [v4,2/4] bpf-lsm: Make bpf_lsm_userns_create() sleepable [v4,3/4] selftests/bpf: Add tests verifying bpf lsm userns_create hook [v4,4/4] selinux: Implement userns_create hook

Frederick Lawler Aug. 1, 2022, 6:01 p.m. UTC

While creating a LSM BPF MAC policy to block user namespace creation, we
used the LSM cred_prepare hook because that is the closest hook to prevent
a call to create_user_ns().

The calls look something like this:

    cred = prepare_creds()
        security_prepare_creds()
            call_int_hook(cred_prepare, ...
    if (cred)
        create_user_ns(cred)

We noticed that error codes were not propagated from this hook and
introduced a patch [1] to propagate those errors.

The discussion notes that security_prepare_creds()
is not appropriate for MAC policies, and instead the hook is
meant for LSM authors to prepare credentials for mutation. [2]

Ultimately, we concluded that a better course of action is to introduce
a new security hook for LSM authors. [3]

This patch set first introduces a new security_create_user_ns() function
and userns_create LSM hook, then marks the hook as sleepable in BPF.

Links:
1. https://lore.kernel.org/all/20220608150942.776446-1-fred@cloudflare.com/
2. https://lore.kernel.org/all/87y1xzyhub.fsf@email.froward.int.ebiederm.org/
3. https://lore.kernel.org/all/9fe9cd9f-1ded-a179-8ded-5fde8960a586@cloudflare.com/

Past discussions:
V3: https://lore.kernel.org/all/20220721172808.585539-1-fred@cloudflare.com/
V2: https://lore.kernel.org/all/20220707223228.1940249-1-fred@cloudflare.com/
V1: https://lore.kernel.org/all/20220621233939.993579-1-fred@cloudflare.com/

Changes since v3:
- Explicitly set CAP_SYS_ADMIN to test namespace is created given
  permission
- Simplify BPF test to use sleepable hook only
- Prefer unshare() over clone() for tests
Changes since v2:
- Rename create_user_ns hook to userns_create
- Use user_namespace as an object opposed to a generic namespace object
- s/domB_t/domA_t in commit message
Changes since v1:
- Add selftests/bpf: Add tests verifying bpf lsm create_user_ns hook patch
- Add selinux: Implement create_user_ns hook patch
- Change function signature of security_create_user_ns() to only take
  struct cred
- Move security_create_user_ns() call after id mapping check in
  create_user_ns()
- Update documentation to reflect changes

Frederick Lawler (4):
  security, lsm: Introduce security_create_user_ns()
  bpf-lsm: Make bpf_lsm_userns_create() sleepable
  selftests/bpf: Add tests verifying bpf lsm userns_create hook
  selinux: Implement userns_create hook

 include/linux/lsm_hook_defs.h                 |   1 +
 include/linux/lsm_hooks.h                     |   4 +
 include/linux/security.h                      |   6 ++
 kernel/bpf/bpf_lsm.c                          |   1 +
 kernel/user_namespace.c                       |   5 +
 security/security.c                           |   5 +
 security/selinux/hooks.c                      |   9 ++
 security/selinux/include/classmap.h           |   2 +
 .../selftests/bpf/prog_tests/deny_namespace.c | 102 ++++++++++++++++++
 .../selftests/bpf/progs/test_deny_namespace.c |  33 ++++++
 10 files changed, 168 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/deny_namespace.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_deny_namespace.c

Eric W. Biederman Aug. 2, 2022, 2:56 a.m. UTC | #1

Frederick Lawler <fred@cloudflare.com> writes:

> While creating a LSM BPF MAC policy to block user namespace creation, we
> used the LSM cred_prepare hook because that is the closest hook to prevent
> a call to create_user_ns().

Re-nack for all of the same reasons.
AKA This can only break the users of the user namespace.

Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>

You aren't fixing what your problem you are papering over it by denying
access to the user namespace.

Nack Nack Nack.

Stop.

Go back to the drawing board.

Do not pass go.

Do not collect $200.


> The calls look something like this:
>
>     cred = prepare_creds()
>         security_prepare_creds()
>             call_int_hook(cred_prepare, ...
>     if (cred)
>         create_user_ns(cred)
>
> We noticed that error codes were not propagated from this hook and
> introduced a patch [1] to propagate those errors.
>
> The discussion notes that security_prepare_creds()
> is not appropriate for MAC policies, and instead the hook is
> meant for LSM authors to prepare credentials for mutation. [2]
>
> Ultimately, we concluded that a better course of action is to introduce
> a new security hook for LSM authors. [3]
>
> This patch set first introduces a new security_create_user_ns() function
> and userns_create LSM hook, then marks the hook as sleepable in BPF.
>
> Links:
> 1. https://lore.kernel.org/all/20220608150942.776446-1-fred@cloudflare.com/
> 2. https://lore.kernel.org/all/87y1xzyhub.fsf@email.froward.int.ebiederm.org/
> 3. https://lore.kernel.org/all/9fe9cd9f-1ded-a179-8ded-5fde8960a586@cloudflare.com/
>
> Past discussions:
> V3: https://lore.kernel.org/all/20220721172808.585539-1-fred@cloudflare.com/
> V2: https://lore.kernel.org/all/20220707223228.1940249-1-fred@cloudflare.com/
> V1: https://lore.kernel.org/all/20220621233939.993579-1-fred@cloudflare.com/
>
> Changes since v3:
> - Explicitly set CAP_SYS_ADMIN to test namespace is created given
>   permission
> - Simplify BPF test to use sleepable hook only
> - Prefer unshare() over clone() for tests
> Changes since v2:
> - Rename create_user_ns hook to userns_create
> - Use user_namespace as an object opposed to a generic namespace object
> - s/domB_t/domA_t in commit message
> Changes since v1:
> - Add selftests/bpf: Add tests verifying bpf lsm create_user_ns hook patch
> - Add selinux: Implement create_user_ns hook patch
> - Change function signature of security_create_user_ns() to only take
>   struct cred
> - Move security_create_user_ns() call after id mapping check in
>   create_user_ns()
> - Update documentation to reflect changes
>
> Frederick Lawler (4):
>   security, lsm: Introduce security_create_user_ns()
>   bpf-lsm: Make bpf_lsm_userns_create() sleepable
>   selftests/bpf: Add tests verifying bpf lsm userns_create hook
>   selinux: Implement userns_create hook
>
>  include/linux/lsm_hook_defs.h                 |   1 +
>  include/linux/lsm_hooks.h                     |   4 +
>  include/linux/security.h                      |   6 ++
>  kernel/bpf/bpf_lsm.c                          |   1 +
>  kernel/user_namespace.c                       |   5 +
>  security/security.c                           |   5 +
>  security/selinux/hooks.c                      |   9 ++
>  security/selinux/include/classmap.h           |   2 +
>  .../selftests/bpf/prog_tests/deny_namespace.c | 102 ++++++++++++++++++
>  .../selftests/bpf/progs/test_deny_namespace.c |  33 ++++++
>  10 files changed, 168 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/deny_namespace.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_deny_namespace.c

Eric

Paul Moore Aug. 3, 2022, 2:10 a.m. UTC | #2

On Mon, Aug 1, 2022 at 10:56 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> Frederick Lawler <fred@cloudflare.com> writes:
>
> > While creating a LSM BPF MAC policy to block user namespace creation, we
> > used the LSM cred_prepare hook because that is the closest hook to prevent
> > a call to create_user_ns().
>
> Re-nack for all of the same reasons.
> AKA This can only break the users of the user namespace.
>
> Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>
>
> You aren't fixing what your problem you are papering over it by denying
> access to the user namespace.
>
> Nack Nack Nack.
>
> Stop.
>
> Go back to the drawing board.
>
> Do not pass go.
>
> Do not collect $200.

If you want us to take your comments seriously Eric, you need to
provide the list with some constructive feedback that would allow
Frederick to move forward with a solution to the use case that has
been proposed.  You response above may be many things, but it is
certainly not that.

We've heard from different users now that there are very real use
cases for this LSM hook.  I understand you are concerned about adding
additional controls to user namespaces, but these are controls
requested by real users, and the controls being requested (LSM hooks,
with BPF and SELinux implementations) are configurable by the *users*
at *runtime*.  This patchset does not force additional restrictions on
user namespaces, it provides a mechanism that *users* can leverage to
add additional granularity to the access controls surrounding user
namespaces.

Eric, if you have a different approach in mind to adding a LSM hook to
user namespace creation I think we would all very much like to hear
about it.  However, if you do not have any suggestions along those
lines, and simply want to NACK any effort to add a LSM hook to user
namespace creation, I think we all understand your point of view and
respectfully disagree.  Barring any new approaches or suggestions, I
think Frederick's patches look reasonable and I still plan on merging
them into the LSM next branch when the merge window closes.

Eric W. Biederman Aug. 8, 2022, 6:56 p.m. UTC | #3

Paul Moore <paul@paul-moore.com> writes:

> On Mon, Aug 1, 2022 at 10:56 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>> Frederick Lawler <fred@cloudflare.com> writes:
>>
>> > While creating a LSM BPF MAC policy to block user namespace creation, we
>> > used the LSM cred_prepare hook because that is the closest hook to prevent
>> > a call to create_user_ns().
>>
>> Re-nack for all of the same reasons.
>> AKA This can only break the users of the user namespace.
>>
>> Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>
>> You aren't fixing what your problem you are papering over it by denying
>> access to the user namespace.
>>
>> Nack Nack Nack.
>>
>> Stop.
>>
>> Go back to the drawing board.
>>
>> Do not pass go.
>>
>> Do not collect $200.
>
> If you want us to take your comments seriously Eric, you need to
> provide the list with some constructive feedback that would allow
> Frederick to move forward with a solution to the use case that has
> been proposed.  You response above may be many things, but it is
> certainly not that.

I did provide constructive feedback.  My feedback to his problem
was to address the real problem of bugs in the kernel.

It is not a constructive approach to shoot the messenger
and is not a constructive approach to blow me off every time you
reply.

I have proposed that is there is a subsystem that is unduly buggy we
stop it from being enabled with a user-namespaces.

Further this is a hook really should have extra-ordinary requirements,
as all it can do is add additional failure modes to something that
does not really fail.  AKA all it can do is break-userspace.

As such I need to see a justification on why it makes sense to
break-userspace.

> We've heard from different users now that there are very real use
> cases for this LSM hook.  I understand you are concerned about adding
 > additional controls to user namespaces, but these are controls
> requested by real users, and the controls being requested (LSM hooks,
> with BPF and SELinux implementations) are configurable by the *users*
> at *runtime*.  This patchset does not force additional restrictions on
> user namespaces, it provides a mechanism that *users* can leverage to
> add additional granularity to the access controls surrounding user
> namespaces.

But that is not the problem that cloudfare encountered and are trying to
solve.

At least that is not what I was told when I asked early in the review
cycle.

All saying that is user-configurable does is shift the blame from the
kernel maintainers to the users.  Shift the responsibility from people
who should have enough expertise to know what is going on to people
who are by definition have other concerns, so are less likely to be as
well informed, and less likely to come up with good solutions.

> Eric, if you have a different approach in mind to adding a LSM hook to
> user namespace creation I think we would all very much like to hear
> about it.  However, if you do not have any suggestions along those
> lines, and simply want to NACK any effort to add a LSM hook to user
> namespace creation, I think we all understand your point of view and
> respectfully disagree.  Barring any new approaches or suggestions, I
> think Frederick's patches look reasonable and I still plan on merging
> them into the LSM next branch when the merge window closes.

But it is my code you are planning to merge this into, and your are
asking me to support something.

I admit I have not had time to read everything.  I am sick and tired
and quite frankly very tired that people are busy wanting to shoot
the messenger to the fact that there are bugs in the kernel.

I am speaking up and engaging as best as I can with objections that
are not hot-air.

You are very much proposing to merge code that can only cause
regressions and cause me grief.  At least that is all I see.  I don't
see anything in the change descriptions of the change that refutes that.

I don't see any interaction in fact with my concerns.

In fact your last reply was to completely blow off my request on how to
address the concerns that inspired this patch and to say other people
have a use too.

At this point I am happy to turn your request around and ask that you
address my concerns and not blow them off.  As I have seen no
constructive engagement with my concerns.   I think that is reasonable
as by definition I will get the support issues when some LSM has some
ill-thought out idea of how things should work and I get the bug report.

Eric

Paul Moore Aug. 8, 2022, 7:16 p.m. UTC | #4

On Mon, Aug 8, 2022 at 2:56 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> Paul Moore <paul@paul-moore.com> writes:
> > On Mon, Aug 1, 2022 at 10:56 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> >> Frederick Lawler <fred@cloudflare.com> writes:
> >>
> >> > While creating a LSM BPF MAC policy to block user namespace creation, we
> >> > used the LSM cred_prepare hook because that is the closest hook to prevent
> >> > a call to create_user_ns().
> >>
> >> Re-nack for all of the same reasons.
> >> AKA This can only break the users of the user namespace.
> >>
> >> Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>
> >>
> >> You aren't fixing what your problem you are papering over it by denying
> >> access to the user namespace.
> >>
> >> Nack Nack Nack.
> >>
> >> Stop.
> >>
> >> Go back to the drawing board.
> >>
> >> Do not pass go.
> >>
> >> Do not collect $200.
> >
> > If you want us to take your comments seriously Eric, you need to
> > provide the list with some constructive feedback that would allow
> > Frederick to move forward with a solution to the use case that has
> > been proposed.  You response above may be many things, but it is
> > certainly not that.
>
> I did provide constructive feedback.  My feedback to his problem
> was to address the real problem of bugs in the kernel.

We've heard from several people who have use cases which require
adding LSM-level access controls and observability to user namespace
creation.  This is the problem we are trying to solve here; if you do
not like the approach proposed in this patchset please suggest another
implementation that allows LSMs visibility into user namespace
creation.

Eric W. Biederman Aug. 8, 2022, 7:26 p.m. UTC | #5

Paul Moore <paul@paul-moore.com> writes:

>> I did provide constructive feedback.  My feedback to his problem
>> was to address the real problem of bugs in the kernel.
>
> We've heard from several people who have use cases which require
> adding LSM-level access controls and observability to user namespace
> creation.  This is the problem we are trying to solve here; if you do
> not like the approach proposed in this patchset please suggest another
> implementation that allows LSMs visibility into user namespace
> creation.

Please stop, ignoring my feedback, not detailing what problem or
problems you are actually trying to be solved, and threatening to merge
code into files that I maintain that has the express purpose of breaking
my users.

You just artificially constrained the problems, so that no other
solution is acceptable.  On that basis alone I am object to this whole
approach to steam roll over me and my code.

Eric

Eric W. Biederman Aug. 8, 2022, 7:43 p.m. UTC | #6

"Eric W. Biederman" <ebiederm@xmission.com> writes:

> Paul Moore <paul@paul-moore.com> writes:
>
>>> I did provide constructive feedback.  My feedback to his problem
>>> was to address the real problem of bugs in the kernel.
>>
>> We've heard from several people who have use cases which require
>> adding LSM-level access controls and observability to user namespace
>> creation.  This is the problem we are trying to solve here; if you do
>> not like the approach proposed in this patchset please suggest another
>> implementation that allows LSMs visibility into user namespace
>> creation.
>
> Please stop, ignoring my feedback, not detailing what problem or
> problems you are actually trying to be solved, and threatening to merge
> code into files that I maintain that has the express purpose of breaking
> my users.
>
> You just artificially constrained the problems, so that no other
> solution is acceptable.  On that basis alone I am object to this whole
> approach to steam roll over me and my code.

If you want an example of what kind of harm it can cause to introduce a
failure where no failure was before I invite you to look at what
happened with sendmail when setuid was modified to fail, when changing
the user of a process would cause RLIMIT_NPROC to be exceeded.

I am not arguing that what you are proposing is that bad but unexpected
failures cause real problems, and at a minimum that needs a better
response than: "There is at least one user that wants a failure here".

Frankly I would love to see an argument that semantically it ever makes
sense for creating a user namespace to fail.  If that argument has
already been made, my apologies to the person who made as I missed it,
in being sick and tired, and frustrated at being blown off, when
I asked for a proper discuss of the problem at hand.

Eric

Paul Moore Aug. 8, 2022, 7:49 p.m. UTC | #7

On Mon, Aug 8, 2022 at 3:26 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> Paul Moore <paul@paul-moore.com> writes:
> >> I did provide constructive feedback.  My feedback to his problem
> >> was to address the real problem of bugs in the kernel.
> >
> > We've heard from several people who have use cases which require
> > adding LSM-level access controls and observability to user namespace
> > creation.  This is the problem we are trying to solve here; if you do
> > not like the approach proposed in this patchset please suggest another
> > implementation that allows LSMs visibility into user namespace
> > creation.
>
> Please stop, ignoring my feedback, not detailing what problem or
> problems you are actually trying to be solved, and threatening to merge
> code into files that I maintain that has the express purpose of breaking
> my users.

I've heard you talk about bugs being the only reason why people would
want to ever block user namespaces, but I think we've all seen use
cases now where it goes beyond that.  However, even if it didn't, the
need to build high confidence/assurance systems where big chunks of
functionality can be disabled based on a security policy is a very
real use case, and this patchset would help enable that.  I've noticed
you like to talk about these hooks being a source of "regressions",
but access controls are not regressions Eric, they are tools that
system builders, administrators, and users use to secure their
systems.

From my perspective, I believe that addresses your feedback around
"fix the bugs" and "this is a regression", which is the only thing
I've noted from your responses in this thread and others, but if I'm
missing something more technical please let me/us know.

> You just artificially constrained the problems, so that no other
> solution is acceptable.

There is a real need to be able to gain both additional visibility and
access control over user namespace creation, please suggest the
approach(es) you would find acceptable.

> On that basis alone I am object to this whole
> approach to steam roll over me and my code.

I saw that choice of wording in your last email and thought it a bit
curious, so I did a quick git log dump on kernel/user_namespace.c and
I see approximately 31 contributors to that one file.  I've always
thought of the open source maintainer role as more of a "steward" and
less of an "owner", but that's just my opinion.

Paul Moore Aug. 8, 2022, 10:47 p.m. UTC | #8

On Mon, Aug 8, 2022 at 3:43 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> "Eric W. Biederman" <ebiederm@xmission.com> writes:
> > Paul Moore <paul@paul-moore.com> writes:
> >
> >>> I did provide constructive feedback.  My feedback to his problem
> >>> was to address the real problem of bugs in the kernel.
> >>
> >> We've heard from several people who have use cases which require
> >> adding LSM-level access controls and observability to user namespace
> >> creation.  This is the problem we are trying to solve here; if you do
> >> not like the approach proposed in this patchset please suggest another
> >> implementation that allows LSMs visibility into user namespace
> >> creation.
> >
> > Please stop, ignoring my feedback, not detailing what problem or
> > problems you are actually trying to be solved, and threatening to merge
> > code into files that I maintain that has the express purpose of breaking
> > my users.
> >
> > You just artificially constrained the problems, so that no other
> > solution is acceptable.  On that basis alone I am object to this whole
> > approach to steam roll over me and my code.
>
> If you want an example of what kind of harm it can cause to introduce a
> failure where no failure was before I invite you to look at what
> happened with sendmail when setuid was modified to fail, when changing
> the user of a process would cause RLIMIT_NPROC to be exceeded.

I think we are all familiar with the sendmail capabilities bug and the
others like it, but using that as an excuse to block additional access
controls seems very weak.  The Linux Kernel is very different from
when the sendmail bug hit (what was that, ~20 years ago?), with
advancements in capabilities and other discretionary controls, as well
as mandatory access controls which have enabled Linux to be certified
through a number of third party security evaluations.

> I am not arguing that what you are proposing is that bad but unexpected
> failures cause real problems, and at a minimum that needs a better
> response than: "There is at least one user that wants a failure here".

Let me fix that for you: "There are multiple users who want to have
better visibility and access control for user namespace creation."

--
paul-moore.com

Eric W. Biederman Aug. 9, 2022, 4:07 p.m. UTC | #9

Paul Moore <paul@paul-moore.com> writes:

> On Mon, Aug 8, 2022 at 3:43 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>> "Eric W. Biederman" <ebiederm@xmission.com> writes:
>> > Paul Moore <paul@paul-moore.com> writes:
>> >
>> >>> I did provide constructive feedback.  My feedback to his problem
>> >>> was to address the real problem of bugs in the kernel.
>> >>
>> >> We've heard from several people who have use cases which require
>> >> adding LSM-level access controls and observability to user namespace
>> >> creation.  This is the problem we are trying to solve here; if you do
>> >> not like the approach proposed in this patchset please suggest another
>> >> implementation that allows LSMs visibility into user namespace
>> >> creation.
>> >
>> > Please stop, ignoring my feedback, not detailing what problem or
>> > problems you are actually trying to be solved, and threatening to merge
>> > code into files that I maintain that has the express purpose of breaking
>> > my users.
>> >
>> > You just artificially constrained the problems, so that no other
>> > solution is acceptable.  On that basis alone I am object to this whole
>> > approach to steam roll over me and my code.
>>
>> If you want an example of what kind of harm it can cause to introduce a
>> failure where no failure was before I invite you to look at what
>> happened with sendmail when setuid was modified to fail, when changing
>> the user of a process would cause RLIMIT_NPROC to be exceeded.
>
> I think we are all familiar with the sendmail capabilities bug and the
> others like it, but using that as an excuse to block additional access
> controls seems very weak.  The Linux Kernel is very different from
> when the sendmail bug hit (what was that, ~20 years ago?), with
> advancements in capabilities and other discretionary controls, as well
> as mandatory access controls which have enabled Linux to be certified
> through a number of third party security evaluations.

If you are familiar with scenarios like that then why is there not
being due diligence performed to ensure that userspace won't break?

Certainly none of the paperwork you are talking about does that kind
of checking and it most definitely is not happening before the code
gets merged. 

I am saying that performing that due diligence should be a requirement
before anyone even thinks about merging a patch that adds permission
checks where no existed before.

Sometimes changes to fix security bugs can get away with adding new
restrictions because we know with a very very high degree of probability
that the only thing that will break will be exploit code.  In the rare
case when real world applications are broken such changes need to be
reverted or adapted.  No one has even made the argument that only
exploit code will be affected.

So I am sorry I am the one who has to be the one to get in the way of a
broken process with semantic review,  but due diligence has not been
done.  So I am say no way this code should be merged.

In addition to actually breaking existing userspace, I think there is a
very real danger of breaking userspace, I think there is a very real
danger of breaking network effects by making such a large change to the
design of user namespaces.

>> I am not arguing that what you are proposing is that bad but unexpected
>> failures cause real problems, and at a minimum that needs a better
>> response than: "There is at least one user that wants a failure here".
>
> Let me fix that for you: "There are multiple users who want to have
> better visibility and access control for user namespace creation."

Visibility sure.  Design a proper hook for that.  All the proposed hook
can do is print an audit message.  It can't allocate or manage any state
as there is not the corresponding hook when a user namespace is freed.
So the proposed hook is not appropriate for increasing visibility.

Access control.  Not a chance unless it is carefully designed and
reviewed.  There is a very large cost to adding access control where
it has not previously existed.

I talk about that cost as people breaking my users as that is how I see
it.  I don't see any discussion on why I am wrong.

If we are going to add an access controls I want to see someone point
out something that is actually semantically a problem.  What motivates
an access control?

So far the only answer I have received is people want to reduce the
attack surface of the kernel.  I don't possibly see how reducing the
attack surface by removing user namespaces makes the probability of
having an exploitable kernel bug, anything approaching zero.

So I look at the calculus.  Chance of actually breaking userspace, or
preventing people with a legitimate use from using user namespaces > 0%.
Chance of actually preventing a determined attacker from exploiting the
kernel < 1%.  Amount of work to maintain, non-zero, and I really don't
like it.

Lots of work to achieve nothing but breaking some of my users.

So please stop trying to redesign my subsystem and cause me headaches,
unless you are going to do the due diligence necessary to do so
responsibly.

Eric

Eric W. Biederman Aug. 9, 2022, 4:40 p.m. UTC | #10

Paul Moore <paul@paul-moore.com> writes:

> On Mon, Aug 8, 2022 at 3:26 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>> Paul Moore <paul@paul-moore.com> writes:
>> >> I did provide constructive feedback.  My feedback to his problem
>> >> was to address the real problem of bugs in the kernel.
>> >
>> > We've heard from several people who have use cases which require
>> > adding LSM-level access controls and observability to user namespace
>> > creation.  This is the problem we are trying to solve here; if you do
>> > not like the approach proposed in this patchset please suggest another
>> > implementation that allows LSMs visibility into user namespace
>> > creation.
>>
>> Please stop, ignoring my feedback, not detailing what problem or
>> problems you are actually trying to be solved, and threatening to merge
>> code into files that I maintain that has the express purpose of breaking
>> my users.
>
> I've heard you talk about bugs being the only reason why people would
> want to ever block user namespaces, but I think we've all seen use
> cases now where it goes beyond that.

I really have not, and I don't appreciate being called a liar.

> However, even if it didn't, the
> need to build high confidence/assurance systems where big chunks of
> functionality can be disabled based on a security policy is a very
> real use case, and this patchset would help enable that.

Details please.  What does this look like.  What is the overall plan for
attack surface reduction in one of these systems.  How does it differ
from seccomp?

How does this differ from setting /proc/sys/user/max_usernamespaces to 0?

Why is it only the user namespace that needs to be modified to implement
such a system?

Why is there no discussion of that in the change description.

> I've noticed
> you like to talk about these hooks being a source of "regressions",
> but access controls are not regressions Eric, they are tools that
> system builders, administrators, and users use to secure their
> systems.
>
> From my perspective, I believe that addresses your feedback around
> "fix the bugs" and "this is a regression", which is the only thing
> I've noted from your responses in this thread and others, but if I'm
> missing something more technical please let me/us know.

Which is a short way of saying that the using this hook for attack
surface reduction without a larger plan will be ineffective.  If the
attack surface is not sufficiently reduced it will not achieve a
prevention of exploits and the attacks will still happen and be
successful.

With a change that is designed to prevent exploits not actually doing so
all that is left is breaking userspace and causing maintenance problems.

Earlier I asked to confirm that was the only reason cloudfare was
interested in this change.  I have asked that we have an actual
conversation about what is trying to be achieved.

Instead the conversation has simply been about implementation issues
and not about if the code will be worth having.  So far in my book the
code very much does not look worth having.  That is my technical
judgment and I don't see anyone taking about my arguments or even
really engaging in them.

Since I keep getting blown off, instead of having my concerns addressed
I say this code should not go.

>> You just artificially constrained the problems, so that no other
>> solution is acceptable.
>
> There is a real need to be able to gain both additional visibility and
> access control over user namespace creation, please suggest the
> approach(es) you would find acceptable.

The suggested hook is not at all appropriate for visibility.  Either the
user namespace needs to have some state that can be set, or there needs
to be something that is notified when the user namespace goes away.  At
best the hook can print an audit message.  So the proposed hook is
really not appropriate to add visibility to the user namespace.

For the record I don't object to adding visibility, I am just pointing
out the proposed hook is not appropriate to that task.

What is the need to have an access control?

Why do you need to fundamentally change the design of user namespaces?

Those are questions I have not seen any answers to.  Without actual
answers of what the actual problems are I can't have a reasonable
technical conversation.

>> On that basis alone I am object to this whole
>> approach to steam roll over me and my code.
>
> I saw that choice of wording in your last email and thought it a bit
> curious, so I did a quick git log dump on kernel/user_namespace.c and
> I see approximately 31 contributors to that one file.  I've always
> thought of the open source maintainer role as more of a "steward" and
> less of an "owner", but that's just my opinion.

As such it is unfortunately my responsibility to say no to badly thought
out proposals.  Proposals that will negatively affect the people using
the code I maintain.

My apologies if I have not been more elegant right now when I have been
constantly sick, and tired.  People getting scared of user namespaces
for no real reason has been an on-going trend for a decade or so.  This
isn't a new issue, and it irritates me that it is still going on.  I
have addressed real concerns and fixed code, for many many years.

This round of the people being afraid of user namespaces, I have yet to
find any real concerns.

So when I express my concerns that this is a pointless exercise and
people don't address my concern.  I say no.

Eric

Paul Moore Aug. 9, 2022, 4:47 p.m. UTC | #11

On Tue, Aug 9, 2022 at 12:08 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> Paul Moore <paul@paul-moore.com> writes:
> > On Mon, Aug 8, 2022 at 3:43 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> >> "Eric W. Biederman" <ebiederm@xmission.com> writes:
> >> > Paul Moore <paul@paul-moore.com> writes:
> >> >
> >> >>> I did provide constructive feedback.  My feedback to his problem
> >> >>> was to address the real problem of bugs in the kernel.
> >> >>
> >> >> We've heard from several people who have use cases which require
> >> >> adding LSM-level access controls and observability to user namespace
> >> >> creation.  This is the problem we are trying to solve here; if you do
> >> >> not like the approach proposed in this patchset please suggest another
> >> >> implementation that allows LSMs visibility into user namespace
> >> >> creation.
> >> >
> >> > Please stop, ignoring my feedback, not detailing what problem or
> >> > problems you are actually trying to be solved, and threatening to merge
> >> > code into files that I maintain that has the express purpose of breaking
> >> > my users.
> >> >
> >> > You just artificially constrained the problems, so that no other
> >> > solution is acceptable.  On that basis alone I am object to this whole
> >> > approach to steam roll over me and my code.
> >>
> >> If you want an example of what kind of harm it can cause to introduce a
> >> failure where no failure was before I invite you to look at what
> >> happened with sendmail when setuid was modified to fail, when changing
> >> the user of a process would cause RLIMIT_NPROC to be exceeded.
> >
> > I think we are all familiar with the sendmail capabilities bug and the
> > others like it, but using that as an excuse to block additional access
> > controls seems very weak.  The Linux Kernel is very different from
> > when the sendmail bug hit (what was that, ~20 years ago?), with
> > advancements in capabilities and other discretionary controls, as well
> > as mandatory access controls which have enabled Linux to be certified
> > through a number of third party security evaluations.
>
> If you are familiar with scenarios like that then why is there not
> being due diligence performed to ensure that userspace won't break?

What level of due diligence would satisfy you Eric?  This feels very
much like an unbounded ask that can be used to permanently block any
effort to add any form of additional access control around user
namespace creation.  If that isn't the case, and this request is being
made in good faith, please elaborate on what you believe would be
sufficient analysis of this patch?

Personally, the fact that the fork()/clone() variants and the
unshare() syscall all can fail and return error codes to userspace
seems like a reasonable level of safety.  If userspace is currently
ignoring the return values of fork/clone/unshare that is a problem
independent of this patchset.  Even looking at the existing
create_user_ns() function shows several cases where the user namespace
code can forcibly error out due to access controls, memory pressure,
etc.  I also see that additional restrictions were put on user
namespace creation after it was introduced, e.g. the chroot
restriction; I'm assuming that didn't break "your" users?  If you can
describe the analysis you did on that change, perhaps we can do the
same for this patchset and call it sufficient, yes?

> >> I am not arguing that what you are proposing is that bad but unexpected
> >> failures cause real problems, and at a minimum that needs a better
> >> response than: "There is at least one user that wants a failure here".
> >
> > Let me fix that for you: "There are multiple users who want to have
> > better visibility and access control for user namespace creation."
>
> Visibility sure.  Design a proper hook for that.  All the proposed hook
> can do is print an audit message.  It can't allocate or manage any state
> as there is not the corresponding hook when a user namespace is freed.
> So the proposed hook is not appropriate for increasing visibility.

Auditing very much increases visibility.  Look at what the BPF LSM can
do, observability is one of its primary goals.

> Access control.  Not a chance unless it is carefully designed and
> reviewed.

See the above.  The relevant syscalls already have the risk of
failure, if userspace is assuming fork/clone/unshare/etc. will never
fail then that application is broken independent of this discussion.

Casey Schaufler Aug. 9, 2022, 5:43 p.m. UTC | #12

On 8/9/2022 9:07 AM, Eric W. Biederman wrote:
> Paul Moore <paul@paul-moore.com> writes:
>
>> On Mon, Aug 8, 2022 at 3:43 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>> "Eric W. Biederman" <ebiederm@xmission.com> writes:
>>>> Paul Moore <paul@paul-moore.com> writes:
>>>>
>>>>>> I did provide constructive feedback.  My feedback to his problem
>>>>>> was to address the real problem of bugs in the kernel.
>>>>> We've heard from several people who have use cases which require
>>>>> adding LSM-level access controls and observability to user namespace
>>>>> creation.  This is the problem we are trying to solve here; if you do
>>>>> not like the approach proposed in this patchset please suggest another
>>>>> implementation that allows LSMs visibility into user namespace
>>>>> creation.
>>>> Please stop, ignoring my feedback, not detailing what problem or
>>>> problems you are actually trying to be solved, and threatening to merge
>>>> code into files that I maintain that has the express purpose of breaking
>>>> my users.
>>>>
>>>> You just artificially constrained the problems, so that no other
>>>> solution is acceptable.  On that basis alone I am object to this whole
>>>> approach to steam roll over me and my code.
>>> If you want an example of what kind of harm it can cause to introduce a
>>> failure where no failure was before I invite you to look at what
>>> happened with sendmail when setuid was modified to fail, when changing
>>> the user of a process would cause RLIMIT_NPROC to be exceeded.
>> I think we are all familiar with the sendmail capabilities bug and the
>> others like it, but using that as an excuse to block additional access
>> controls seems very weak.  The Linux Kernel is very different from
>> when the sendmail bug hit (what was that, ~20 years ago?), with
>> advancements in capabilities and other discretionary controls, as well
>> as mandatory access controls which have enabled Linux to be certified
>> through a number of third party security evaluations.
> If you are familiar with scenarios like that then why is there not
> being due diligence performed to ensure that userspace won't break?
>
> Certainly none of the paperwork you are talking about does that kind
> of checking and it most definitely is not happening before the code
> gets merged. 
>
> I am saying that performing that due diligence should be a requirement
> before anyone even thinks about merging a patch that adds permission
> checks where no existed before.
>
> Sometimes changes to fix security bugs can get away with adding new
> restrictions because we know with a very very high degree of probability
> that the only thing that will break will be exploit code.  In the rare
> case when real world applications are broken such changes need to be
> reverted or adapted.  No one has even made the argument that only
> exploit code will be affected.
>
> So I am sorry I am the one who has to be the one to get in the way of a
> broken process with semantic review,  but due diligence has not been
> done.  So I am say no way this code should be merged.
>
>
> In addition to actually breaking existing userspace, I think there is a
> very real danger of breaking userspace, I think there is a very real
> danger of breaking network effects by making such a large change to the
> design of user namespaces.
>
>
>>> I am not arguing that what you are proposing is that bad but unexpected
>>> failures cause real problems, and at a minimum that needs a better
>>> response than: "There is at least one user that wants a failure here".
>> Let me fix that for you: "There are multiple users who want to have
>> better visibility and access control for user namespace creation."
> Visibility sure.  Design a proper hook for that.  All the proposed hook
> can do is print an audit message.  It can't allocate or manage any state
> as there is not the corresponding hook when a user namespace is freed.
> So the proposed hook is not appropriate for increasing visibility.
>
>
> Access control.  Not a chance unless it is carefully designed and
> reviewed.  There is a very large cost to adding access control where
> it has not previously existed.
>
> I talk about that cost as people breaking my users as that is how I see
> it.  I don't see any discussion on why I am wrong.
>
> If we are going to add an access controls I want to see someone point
> out something that is actually semantically a problem.  What motivates
> an access control?

Smack has no interest in using the proposed hook because user namespaces
are neither subjects nor objects. They are collections of DAC and/or
privilege configuration alternatives. Or something like that. From the
viewpoint of a security module that only implements old fashioned MAC
there is no value in constraining user namespaces.

SELinux, on the other hand, seeks to be comprehensive well beyond
controlling accesses between subjects and objects. Asking the question
"should there be a control on this operation?" seems sufficient to justify
adding the control to SELinux policy. This is characteristic of
"Fine Grain" control.

So I'm of two minds on this. I don't need the hook, but I also understand
why SELinux and BPF want it. I don't necessarily agree with their logic,
but it is consistent with existing behavior. Any system that uses either
of those security modules needs to be ready for unexpected denials based
on any potential security concern. Keeping namespaces completely orthogonal
to LSM seems doomed to failure eventually.

>
> So far the only answer I have received is people want to reduce the
> attack surface of the kernel.  I don't possibly see how reducing the
> attack surface by removing user namespaces makes the probability of
> having an exploitable kernel bug, anything approaching zero.
>
> So I look at the calculus.  Chance of actually breaking userspace, or
> preventing people with a legitimate use from using user namespaces > 0%.
> Chance of actually preventing a determined attacker from exploiting the
> kernel < 1%.  Amount of work to maintain, non-zero, and I really don't
> like it.
>
> Lots of work to achieve nothing but breaking some of my users.
>
> So please stop trying to redesign my subsystem and cause me headaches,
> unless you are going to do the due diligence necessary to do so
> responsibly.
>
> Eric

Eric W. Biederman Aug. 9, 2022, 9:40 p.m. UTC | #13

Paul Moore <paul@paul-moore.com> writes:
>
> What level of due diligence would satisfy you Eric?

Having a real conversation about what a change is doing and to talk
about it's merits and it's pro's and cons.  I can't promise I would be
convinced but that is the kind of conversation it would take.

I was not trying to place an insurmountable barrier I was simply looking
to see if people had been being careful and doing what is generally
accepted for submitting a kernel patch.  From all I can see that has
completely not happened here.

> If that isn't the case, and this request is being made in good faith

Again you are calling me a liar. I really don't appreciate that.

As for something already returning an error.  The setuid system call
also has error returns, and enforcing RLIMIT_NPROC caused sendmail to
misbehave.

I bring up the past in this way only to illustrate that things can
happen.  That simply examining the kernel and not thinking about
userspace really isn't enough.

I am also concerned about the ecosystem effects of adding random access
control checks to a system call that does not perform access control
checks.

As I said this patch is changing a rather fundamental design decision by
adding an access control.  A design decision that for the most part has
worked out quite well, and has allowed applications to add sandboxing
support to themselves without asking permission to anyone.

Adding an access control all of a sudden means application developers
are having to ask for permission to things that are perfectly safe,
and it means many parts of the kernel gets less love both in use
and in maintenance.

It might be possible to convince me that design decision needs to
change, or that what is being proposed is small enough it does not
practically change that design decision.

Calling me a liar is not the way to change my mind.  Ignoring me
and pushing this through without addressing my concerns is not
the way to change my mind.

I honestly I want what I asked for at the start.  I want discussion of
what problems are being solved so we can talk about the problem or
problems and if this is even the appropriate solution to them.

Eric

Eric W. Biederman Aug. 9, 2022, 9:52 p.m. UTC | #14

Casey Schaufler <casey@schaufler-ca.com> writes:

> Smack has no interest in using the proposed hook because user namespaces
> are neither subjects nor objects. They are collections of DAC and/or
> privilege configuration alternatives. Or something like that. From the
> viewpoint of a security module that only implements old fashioned MAC
> there is no value in constraining user namespaces.

The goal is to simply enable things that become safe when you don't have
to worry about confusing setuid executables.

> SELinux, on the other hand, seeks to be comprehensive well beyond
> controlling accesses between subjects and objects. Asking the question
> "should there be a control on this operation?" seems sufficient to justify
> adding the control to SELinux policy. This is characteristic of
> "Fine Grain" control.

I see that from a theoretical standpoint.  On the flip side I prefer
arguments grounded in something more than what an abstract framework
could appreciate.  We are so far from any theoretical purity in the
linux kernel that I can't see theoretical purity being enough to justify
a decision like this.

> So I'm of two minds on this. I don't need the hook, but I also understand
> why SELinux and BPF want it. I don't necessarily agree with their logic,
> but it is consistent with existing behavior. Any system that uses either
> of those security modules needs to be ready for unexpected denials based
> on any potential security concern. Keeping namespaces completely orthogonal
> to LSM seems doomed to failure eventually.

I can cross that bridge when there is a healthy conversation about such
a change.

Too often I get "ouch! Creating a user namespace was used as the easiest
way to exploit a security bug. Let's solve the issue by denying user
namespaces."  Which leads to half thought out policies made out of fear.

Which is where I think this conversation started.  I haven't seen it
make it's way to any healthy reasons to deny user namespaces yet.

Eric

Paul Moore Aug. 9, 2022, 10:40 p.m. UTC | #15

On Tue, Aug 9, 2022 at 5:41 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> Paul Moore <paul@paul-moore.com> writes:
> >
> > What level of due diligence would satisfy you Eric?
>
> Having a real conversation about what a change is doing and to talk
> about it's merits and it's pro's and cons.  I can't promise I would be
> convinced but that is the kind of conversation it would take.

Earlier today you talked about due diligence to ensure that userspace
won't break and I provided my reasoning on why userspace would not
break (at least not because of this change).  Userspace might be
blocked from creating a new user namespace due to a security policy,
but that would be the expected and desired outcome, not breakage.  As
far as your most recent comment regarding merit and pros/cons, I
believe we have had that discussion (quite a few times already); it
just seems you are not satisfied with the majority's conclusion.

Personally, I'm not sure there is anything more I can do to convince
you that this patchset is reasonable; I'm going to leave it to others
at this point, or we can all simply agree to disagree for the moment.
Just as you haven't heard a compelling argument for this patchset, I
haven't heard a compelling argument against it.  Barring some
significant new discussion point, or opinion, I still plan on merging
this into the LSM next branch when the merge window closes next week
so it has time to go through a full round of linux-next testing.
Assuming no unresolvable problems are found during the additional
testing I plan to send it to Linus during the v6.1 merge window and
I'm guessing we will get to go through this all again.  It's less than
ideal, but I think this is where we are at right now.

Alexei Starovoitov Aug. 10, 2022, 12:51 a.m. UTC | #16

On Tue, Aug 9, 2022 at 3:40 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Tue, Aug 9, 2022 at 5:41 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> > Paul Moore <paul@paul-moore.com> writes:
> > >
> > > What level of due diligence would satisfy you Eric?
> >
> > Having a real conversation about what a change is doing and to talk
> > about it's merits and it's pro's and cons.  I can't promise I would be
> > convinced but that is the kind of conversation it would take.
>
> Earlier today you talked about due diligence to ensure that userspace
> won't break and I provided my reasoning on why userspace would not
> break (at least not because of this change).  Userspace might be
> blocked from creating a new user namespace due to a security policy,
> but that would be the expected and desired outcome, not breakage.  As
> far as your most recent comment regarding merit and pros/cons, I
> believe we have had that discussion (quite a few times already); it
> just seems you are not satisfied with the majority's conclusion.
>
> Personally, I'm not sure there is anything more I can do to convince
> you that this patchset is reasonable; I'm going to leave it to others
> at this point, or we can all simply agree to disagree for the moment.
> Just as you haven't heard a compelling argument for this patchset, I
> haven't heard a compelling argument against it.  Barring some
> significant new discussion point, or opinion, I still plan on merging
> this into the LSM next branch when the merge window closes next week
> so it has time to go through a full round of linux-next testing.
> Assuming no unresolvable problems are found during the additional
> testing I plan to send it to Linus during the v6.1 merge window and
> I'm guessing we will get to go through this all again.  It's less than
> ideal, but I think this is where we are at right now.

+1

Serge E. Hallyn Aug. 14, 2022, 3:55 p.m. UTC | #17

On Mon, Aug 08, 2022 at 03:16:16PM -0400, Paul Moore wrote:
> On Mon, Aug 8, 2022 at 2:56 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> > Paul Moore <paul@paul-moore.com> writes:
> > > On Mon, Aug 1, 2022 at 10:56 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> > >> Frederick Lawler <fred@cloudflare.com> writes:
> > >>
> > >> > While creating a LSM BPF MAC policy to block user namespace creation, we
> > >> > used the LSM cred_prepare hook because that is the closest hook to prevent
> > >> > a call to create_user_ns().
> > >>
> > >> Re-nack for all of the same reasons.
> > >> AKA This can only break the users of the user namespace.
> > >>
> > >> Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>
> > >>
> > >> You aren't fixing what your problem you are papering over it by denying
> > >> access to the user namespace.
> > >>
> > >> Nack Nack Nack.
> > >>
> > >> Stop.
> > >>
> > >> Go back to the drawing board.
> > >>
> > >> Do not pass go.
> > >>
> > >> Do not collect $200.
> > >
> > > If you want us to take your comments seriously Eric, you need to
> > > provide the list with some constructive feedback that would allow
> > > Frederick to move forward with a solution to the use case that has
> > > been proposed.  You response above may be many things, but it is
> > > certainly not that.
> >
> > I did provide constructive feedback.  My feedback to his problem
> > was to address the real problem of bugs in the kernel.
> 
> We've heard from several people who have use cases which require
> adding LSM-level access controls and observability to user namespace
> creation.  This is the problem we are trying to solve here; if you do
> not like the approach proposed in this patchset please suggest another
> implementation that allows LSMs visibility into user namespace
> creation.

Regarding the observability - can someone concisely lay out why just
auditing userns creation would not suffice?  Userspace could decide
what to report based on whether the creating user_ns == /proc/1/ns/user...

Regarding limiting the tweaking of otherwise-privileged code by
unprivileged users, i wonder whether we could instead add smarts to
ns_capable().  Point being, uid mapping would still work, but we'd
break the "privileged against resources you own" part of user
namespaces.  I would want it to default to allow, but then when a
0-day is found which requires reaching ns_capable() code, admins 
could easily prevent exploitation until reboot from a fixed kernel.

Paul Moore Aug. 15, 2022, 2:32 a.m. UTC | #18

On Sun, Aug 14, 2022 at 11:55 AM Serge E. Hallyn <serge@hallyn.com> wrote:
> On Mon, Aug 08, 2022 at 03:16:16PM -0400, Paul Moore wrote:
> > On Mon, Aug 8, 2022 at 2:56 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> > > Paul Moore <paul@paul-moore.com> writes:
> > > > On Mon, Aug 1, 2022 at 10:56 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> > > >> Frederick Lawler <fred@cloudflare.com> writes:
> > > >>
> > > >> > While creating a LSM BPF MAC policy to block user namespace creation, we
> > > >> > used the LSM cred_prepare hook because that is the closest hook to prevent
> > > >> > a call to create_user_ns().
> > > >>
> > > >> Re-nack for all of the same reasons.
> > > >> AKA This can only break the users of the user namespace.
> > > >>
> > > >> Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>
> > > >>
> > > >> You aren't fixing what your problem you are papering over it by denying
> > > >> access to the user namespace.
> > > >>
> > > >> Nack Nack Nack.
> > > >>
> > > >> Stop.
> > > >>
> > > >> Go back to the drawing board.
> > > >>
> > > >> Do not pass go.
> > > >>
> > > >> Do not collect $200.
> > > >
> > > > If you want us to take your comments seriously Eric, you need to
> > > > provide the list with some constructive feedback that would allow
> > > > Frederick to move forward with a solution to the use case that has
> > > > been proposed.  You response above may be many things, but it is
> > > > certainly not that.
> > >
> > > I did provide constructive feedback.  My feedback to his problem
> > > was to address the real problem of bugs in the kernel.
> >
> > We've heard from several people who have use cases which require
> > adding LSM-level access controls and observability to user namespace
> > creation.  This is the problem we are trying to solve here; if you do
> > not like the approach proposed in this patchset please suggest another
> > implementation that allows LSMs visibility into user namespace
> > creation.
>
> Regarding the observability - can someone concisely lay out why just
> auditing userns creation would not suffice?  Userspace could decide
> what to report based on whether the creating user_ns == /proc/1/ns/user...

One of the selling points of the BPF LSM is that it allows for various
different ways of reporting and logging beyond audit.  However, even
if it was limited to just audit I believe that provides some useful
justification as auditing fork()/clone() isn't quite the same and
could be difficult to do at scale in some configurations.  I haven't
personally added a BPF LSM program to the kernel so I can't speak to
the details on what is possible, but I'm sure others on the To/CC line
could help provide more information if that is important to you.

> Regarding limiting the tweaking of otherwise-privileged code by
> unprivileged users, i wonder whether we could instead add smarts to
> ns_capable().

The existing security_capable() hook is eventually called by ns_capable():

  ns_capable()
    ns_capable_common()
      security_capable(const struct cred *cred,
                       struct user_namespace *ns,
                       int cap,
                       unsigned int opts);

... I'm not sure what additional smarts would be useful here?

[side note: SELinux does actually distinguish between capability
checks in the initial user namespace vs child namespaces]

> Point being, uid mapping would still work, but we'd
> break the "privileged against resources you own" part of user
> namespaces.  I would want it to default to allow, but then when a
> 0-day is found which requires reaching ns_capable() code, admins
> could easily prevent exploitation until reboot from a fixed kernel.

That assumes that everything you care about is behind a capability
check, which is probably going to be correct in a lot of the cases,
but I think it would be a mistake to assume that is always going to be
true.

--
paul-moore.com

Serge E. Hallyn Aug. 15, 2022, 3:41 p.m. UTC | #19

On Sun, Aug 14, 2022 at 10:32:51PM -0400, Paul Moore wrote:
> On Sun, Aug 14, 2022 at 11:55 AM Serge E. Hallyn <serge@hallyn.com> wrote:
> > On Mon, Aug 08, 2022 at 03:16:16PM -0400, Paul Moore wrote:
> > > On Mon, Aug 8, 2022 at 2:56 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> > > > Paul Moore <paul@paul-moore.com> writes:
> > > > > On Mon, Aug 1, 2022 at 10:56 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> > > > >> Frederick Lawler <fred@cloudflare.com> writes:
> > > > >>
> > > > >> > While creating a LSM BPF MAC policy to block user namespace creation, we
> > > > >> > used the LSM cred_prepare hook because that is the closest hook to prevent
> > > > >> > a call to create_user_ns().
> > > > >>
> > > > >> Re-nack for all of the same reasons.
> > > > >> AKA This can only break the users of the user namespace.
> > > > >>
> > > > >> Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>
> > > > >>
> > > > >> You aren't fixing what your problem you are papering over it by denying
> > > > >> access to the user namespace.
> > > > >>
> > > > >> Nack Nack Nack.
> > > > >>
> > > > >> Stop.
> > > > >>
> > > > >> Go back to the drawing board.
> > > > >>
> > > > >> Do not pass go.
> > > > >>
> > > > >> Do not collect $200.
> > > > >
> > > > > If you want us to take your comments seriously Eric, you need to
> > > > > provide the list with some constructive feedback that would allow
> > > > > Frederick to move forward with a solution to the use case that has
> > > > > been proposed.  You response above may be many things, but it is
> > > > > certainly not that.
> > > >
> > > > I did provide constructive feedback.  My feedback to his problem
> > > > was to address the real problem of bugs in the kernel.
> > >
> > > We've heard from several people who have use cases which require
> > > adding LSM-level access controls and observability to user namespace
> > > creation.  This is the problem we are trying to solve here; if you do
> > > not like the approach proposed in this patchset please suggest another
> > > implementation that allows LSMs visibility into user namespace
> > > creation.
> >
> > Regarding the observability - can someone concisely lay out why just
> > auditing userns creation would not suffice?  Userspace could decide
> > what to report based on whether the creating user_ns == /proc/1/ns/user...
> 
> One of the selling points of the BPF LSM is that it allows for various
> different ways of reporting and logging beyond audit.  However, even
> if it was limited to just audit I believe that provides some useful
> justification as auditing fork()/clone() isn't quite the same and
> could be difficult to do at scale in some configurations.  I haven't
> personally added a BPF LSM program to the kernel so I can't speak to
> the details on what is possible, but I'm sure others on the To/CC line
> could help provide more information if that is important to you.
> 
> > Regarding limiting the tweaking of otherwise-privileged code by
> > unprivileged users, i wonder whether we could instead add smarts to
> > ns_capable().
> 
> The existing security_capable() hook is eventually called by ns_capable():
> 
>   ns_capable()
>     ns_capable_common()
>       security_capable(const struct cred *cred,
>                        struct user_namespace *ns,
>                        int cap,
>                        unsigned int opts);
> 
> ... I'm not sure what additional smarts would be useful here?

Oh - i wasn't necessarily thinking of an LSM.  I was picturing a
sysctl next to unprivileged_userns_clone.  But you're right, looks
like an LSM could already do this.  Of course, there's an issue early
on in that the root user in the new namespace couldn't setuid, so
the uid mapping is still limited.  So this idea probably isn't worth
the characters we've typed about it so far, sorry.

> [side note: SELinux does actually distinguish between capability
> checks in the initial user namespace vs child namespaces]
> 
> > Point being, uid mapping would still work, but we'd
> > break the "privileged against resources you own" part of user
> > namespaces.  I would want it to default to allow, but then when a
> > 0-day is found which requires reaching ns_capable() code, admins
> > could easily prevent exploitation until reboot from a fixed kernel.
> 
> That assumes that everything you care about is behind a capability
> check, which is probably going to be correct in a lot of the cases,
> but I think it would be a mistake to assume that is always going to be
> true.

I might be thinking wrongly, but if it's not behind a capability check,
then it seems to me it's not an exploit that can only be reached by
becoming root in a user namespace, which means disabling user namespace
creation by unprivileged users will not stop the attack.

Paul Moore Aug. 15, 2022, 4:24 p.m. UTC | #20

On Mon, Aug 15, 2022 at 11:41 AM Serge E. Hallyn <serge@hallyn.com> wrote:
> On Sun, Aug 14, 2022 at 10:32:51PM -0400, Paul Moore wrote:
> > On Sun, Aug 14, 2022 at 11:55 AM Serge E. Hallyn <serge@hallyn.com> wrote:
> > > On Mon, Aug 08, 2022 at 03:16:16PM -0400, Paul Moore wrote:
> > > > On Mon, Aug 8, 2022 at 2:56 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> > > > > Paul Moore <paul@paul-moore.com> writes:
> > > > > > On Mon, Aug 1, 2022 at 10:56 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> > > > > >> Frederick Lawler <fred@cloudflare.com> writes:
> > > > > >>
> > > > > >> > While creating a LSM BPF MAC policy to block user namespace creation, we
> > > > > >> > used the LSM cred_prepare hook because that is the closest hook to prevent
> > > > > >> > a call to create_user_ns().
> > > > > >>
> > > > > >> Re-nack for all of the same reasons.
> > > > > >> AKA This can only break the users of the user namespace.
> > > > > >>
> > > > > >> Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>
> > > > > >>
> > > > > >> You aren't fixing what your problem you are papering over it by denying
> > > > > >> access to the user namespace.
> > > > > >>
> > > > > >> Nack Nack Nack.
> > > > > >>
> > > > > >> Stop.
> > > > > >>
> > > > > >> Go back to the drawing board.
> > > > > >>
> > > > > >> Do not pass go.
> > > > > >>
> > > > > >> Do not collect $200.
> > > > > >
> > > > > > If you want us to take your comments seriously Eric, you need to
> > > > > > provide the list with some constructive feedback that would allow
> > > > > > Frederick to move forward with a solution to the use case that has
> > > > > > been proposed.  You response above may be many things, but it is
> > > > > > certainly not that.
> > > > >
> > > > > I did provide constructive feedback.  My feedback to his problem
> > > > > was to address the real problem of bugs in the kernel.
> > > >
> > > > We've heard from several people who have use cases which require
> > > > adding LSM-level access controls and observability to user namespace
> > > > creation.  This is the problem we are trying to solve here; if you do
> > > > not like the approach proposed in this patchset please suggest another
> > > > implementation that allows LSMs visibility into user namespace
> > > > creation.
> > >
> > > Regarding the observability - can someone concisely lay out why just
> > > auditing userns creation would not suffice?  Userspace could decide
> > > what to report based on whether the creating user_ns == /proc/1/ns/user...
> >
> > One of the selling points of the BPF LSM is that it allows for various
> > different ways of reporting and logging beyond audit.  However, even
> > if it was limited to just audit I believe that provides some useful
> > justification as auditing fork()/clone() isn't quite the same and
> > could be difficult to do at scale in some configurations.  I haven't
> > personally added a BPF LSM program to the kernel so I can't speak to
> > the details on what is possible, but I'm sure others on the To/CC line
> > could help provide more information if that is important to you.
> >
> > > Regarding limiting the tweaking of otherwise-privileged code by
> > > unprivileged users, i wonder whether we could instead add smarts to
> > > ns_capable().
> >
> > The existing security_capable() hook is eventually called by ns_capable():
> >
> >   ns_capable()
> >     ns_capable_common()
> >       security_capable(const struct cred *cred,
> >                        struct user_namespace *ns,
> >                        int cap,
> >                        unsigned int opts);
> >
> > ... I'm not sure what additional smarts would be useful here?
>
> Oh - i wasn't necessarily thinking of an LSM.  I was picturing a
> sysctl next to unprivileged_userns_clone.  But you're right, looks
> like an LSM could already do this.  Of course, there's an issue early
> on in that the root user in the new namespace couldn't setuid, so
> the uid mapping is still limited.  So this idea probably isn't worth
> the characters we've typed about it so far, sorry.

No harm, no foul.  This thread has already reached record lows with
respect to usefulness-vs-characters ratio, a few more isn't going to
hurt anything ;)

> > [side note: SELinux does actually distinguish between capability
> > checks in the initial user namespace vs child namespaces]
> >
> > > Point being, uid mapping would still work, but we'd
> > > break the "privileged against resources you own" part of user
> > > namespaces.  I would want it to default to allow, but then when a
> > > 0-day is found which requires reaching ns_capable() code, admins
> > > could easily prevent exploitation until reboot from a fixed kernel.
> >
> > That assumes that everything you care about is behind a capability
> > check, which is probably going to be correct in a lot of the cases,
> > but I think it would be a mistake to assume that is always going to be
> > true.
>
> I might be thinking wrongly, but if it's not behind a capability check,
> then it seems to me it's not an exploit that can only be reached by
> becoming root in a user namespace, which means disabling user namespace
> creation by unprivileged users will not stop the attack.

I was primarily thinking about two things: subj/obj relationships
which are really not addressed with capability checks, and unrelated
problems which aren't the fault of the user namespace but could be
somehow made easier through some of the unique situations offered by
user namespaces.  There are exploits that often require chaining
together multiple "things" to trigger the necessary flaw, and
sometimes the most immediate way to stop such an attack is to apply
additional controls to one of these intermediate steps.  Frekerick's
work puts the necessary infrastructure in place so we can do that with
user namespaces.

[v4,0/4] Introduce security_create_user_ns()

Message

Comments