[v5,bpf-next,0/3] Introduce CAP_BPF


Message ID

20200508215340.41921-1-alexei.starovoitov@gmail.com (mailing list archive)

Headers

From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: davem@davemloft.net
Cc: daniel@iogearbox.net, netdev@vger.kernel.org, bpf@vger.kernel.org,
        kernel-team@fb.com, linux-security-module@vger.kernel.org,
        acme@redhat.com, jamorris@linux.microsoft.com, jannh@google.com,
        kpsingh@google.com
Subject: [PATCH v5 bpf-next 0/3] Introduce CAP_BPF
Date: Fri,  8 May 2020 14:53:37 -0700
Message-Id: <20200508215340.41921-1-alexei.starovoitov@gmail.com>
Sender: owner-linux-security-module@vger.kernel.org
Precedence: bulk

Series

Introduce CAP_BPF

Message

Alexei Starovoitov May 8, 2020, 9:53 p.m. UTC

From: Alexei Starovoitov <ast@kernel.org>

v4->v5:

Split BPF operations that are allowed under CAP_SYS_ADMIN into a combination of
CAP_BPF, CAP_PERFMON, and CAP_NET_ADMIN, and keep some of them under CAP_SYS_ADMIN.

The user process has to have
- CAP_BPF and CAP_PERFMON to load tracing programs.
- CAP_BPF and CAP_NET_ADMIN to load networking programs.
(or CAP_SYS_ADMIN for backward compatibility).
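
The load-time rule above can be modeled in a few lines of user-space C. This is
only a sketch of the policy as stated in this cover letter, not the kernel code
from the patches: the SK_* bit values, the sk_prog_class enum and the
sk_may_load() helper are hypothetical names invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical capability bits for this sketch only; these are NOT the
 * kernel's numeric CAP_* values. */
enum {
	SK_CAP_SYS_ADMIN = 1u << 0,
	SK_CAP_BPF       = 1u << 1,
	SK_CAP_PERFMON   = 1u << 2,
	SK_CAP_NET_ADMIN = 1u << 3,
};

/* Hypothetical coarse program classes, matching the two bullets above. */
enum sk_prog_class { SK_PROG_TRACING, SK_PROG_NETWORKING };

/* True if every bit in 'want' is present in 'caps'. */
static bool sk_has(unsigned int caps, unsigned int want)
{
	return (caps & want) == want;
}

/* True if a process holding 'caps' may load a program of class 'cls'. */
static bool sk_may_load(unsigned int caps, enum sk_prog_class cls)
{
	if (sk_has(caps, SK_CAP_SYS_ADMIN))
		return true;	/* backward compatibility */
	if (!sk_has(caps, SK_CAP_BPF))
		return false;	/* CAP_BPF is needed in both cases */

	switch (cls) {
	case SK_PROG_TRACING:
		return sk_has(caps, SK_CAP_PERFMON);
	case SK_PROG_NETWORKING:
		return sk_has(caps, SK_CAP_NET_ADMIN);
	}
	return false;
}
```

With this model, sk_may_load(SK_CAP_BPF | SK_CAP_NET_ADMIN, SK_PROG_TRACING)
is false, while the same capability set is sufficient for SK_PROG_NETWORKING.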

CAP_BPF serves three main goals:
1. provides isolation to user space processes that drop CAP_SYS_ADMIN and switch to CAP_BPF.
   More on this below. This is the major difference from the v4 set back from Sep 2019.
2. makes networking BPF progs more secure, since CAP_BPF + CAP_NET_ADMIN
   prevents pointer leaks and arbitrary kernel memory access.
3. enables fuzzers to exercise all of the verifier logic, eventually finding bugs
   and making the BPF infra more secure. Currently fuzzers run unprivileged.
   They will be able to run with CAP_BPF.

This patchset is a long overdue follow-up from the last Plumbers conference.
Compared to what was discussed at LPC, the CAP* checks at attach time are gone.
For tracing progs the CAP_SYS_ADMIN check was done at load time only. There was
no check at attach time. For networking and cgroup progs CAP_SYS_ADMIN was
required at load time and CAP_NET_ADMIN at attach time, but there are several
ways to bypass CAP_NET_ADMIN:
- if a networking prog uses tail_call, writing an FD into prog_array will
  effectively attach it, but bpf_map_update_elem is an unprivileged operation.
- an freplace prog with CAP_SYS_ADMIN can replace a networking prog.

Consolidating all CAP checks at load time makes the security model similar to
the open() syscall. Once the user has an FD, it can do everything with it;
read/write/poll don't check permissions. In the same way, when the bpf_prog_load
command returns an FD, the user can do everything with it (including attaching,
detaching, and bpf_test_run).

The important design decision is to allow the ID->FD transition for
CAP_SYS_ADMIN only. This means that user processes can run
with CAP_BPF and CAP_NET_ADMIN and they will not be able to affect each
other unless they pass FDs via scm_rights or via pinning in bpffs.
ID->FD is a mechanism for human override and introspection.
An admin can do 'sudo bpftool prog ...'. It's possible to enforce via LSM that
only the bpftool binary makes the bpf syscall with CAP_SYS_ADMIN while the rest
of the user space processes make it with CAP_BPF, isolating the bpf objects
(progs, maps, links) owned by such processes from each other.
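
The isolation rule can be summarized as a small decision function. This is a
sketch of the design described here, not code from the patches: the
sk_fd_source enum and sk_may_get_fd() are hypothetical names, and the model
deliberately ignores ordinary file permissions on bpffs paths.

```c
#include <stdbool.h>

/* Hypothetical list of ways a process can end up holding an FD for a
 * BPF object (prog, map or link). */
enum sk_fd_source {
	SK_FD_FROM_LOAD,	/* returned by the process's own load/create command */
	SK_FD_FROM_SCM_RIGHTS,	/* passed explicitly by another process */
	SK_FD_FROM_BPFFS_PIN,	/* opened from an object pinned in bpffs */
	SK_FD_FROM_ID,		/* ID->FD lookup of an arbitrary object */
};

/* True if the FD may be obtained. Only the ID->FD path depends on
 * CAP_SYS_ADMIN; the other paths require the owner to have shared the
 * object on purpose, which is what keeps CAP_BPF processes isolated. */
static bool sk_may_get_fd(enum sk_fd_source src, bool has_cap_sys_admin)
{
	if (src == SK_FD_FROM_ID)
		return has_cap_sys_admin;
	return true;
}
```

In this model a CAP_BPF-only process can never reach another process's object
through SK_FD_FROM_ID, while an admin running bpftool with CAP_SYS_ADMIN can.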

Another significant change from LPC is that the verifier checks are split into
allow_ptr_leaks and bpf_capable flags. allow_ptr_leaks disables the Spectre
defense and allows pointer manipulations, while bpf_capable enables all modern
verifier features like bpf-to-bpf calls, BTF, bounded loops, indirect stack
access, dead code elimination, etc. All the goodness.
These flags are initialized as:
  env->allow_ptr_leaks = perfmon_capable();
  env->bpf_capable = bpf_capable();
That allows networking progs with CAP_BPF + CAP_NET_ADMIN to enjoy modern
verifier features while being more secure.

Some networking progs may need CAP_BPF + CAP_NET_ADMIN + CAP_PERFMON,
since subtracting pointers (like skb->data_end - skb->data) is a pointer leak,
but the verifier may get smarter in the future.
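
The flag split and the pointer-subtraction caveat can be modeled as follows.
This is a user-space sketch only: struct sk_verifier_env and the sk_* helpers
are hypothetical stand-ins, and treating CAP_SYS_ADMIN as implying both flags
is an assumption based on the backward-compatibility note earlier in this
letter.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two verifier flags named above. */
struct sk_verifier_env {
	bool allow_ptr_leaks;	/* pointer manipulation allowed, Spectre defense off */
	bool bpf_capable;	/* modern verifier features enabled */
};

/* Assumption: CAP_SYS_ADMIN satisfies both checks for backward compatibility. */
static struct sk_verifier_env sk_env_init(bool cap_bpf, bool cap_perfmon,
					  bool cap_sys_admin)
{
	struct sk_verifier_env env = {
		.allow_ptr_leaks = cap_perfmon || cap_sys_admin,
		.bpf_capable = cap_bpf || cap_sys_admin,
	};
	return env;
}

/* Subtracting pointers (e.g. skb->data_end - skb->data) counts as a leak. */
static bool sk_accepts_ptr_subtraction(const struct sk_verifier_env *env)
{
	return env->allow_ptr_leaks;
}

/* Bounded loops are one of the modern features gated by bpf_capable. */
static bool sk_accepts_bounded_loops(const struct sk_verifier_env *env)
{
	return env->bpf_capable;
}
```

A networking prog loaded with CAP_BPF + CAP_NET_ADMIN corresponds to
sk_env_init(true, false, false): bounded loops are accepted and pointer
subtraction is rejected until CAP_PERFMON is added.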

Please see patches for more details.

Alexei Starovoitov (3):
  bpf, capability: Introduce CAP_BPF
  bpf: implement CAP_BPF
  selftests/bpf: use CAP_BPF and CAP_PERFMON in tests

 drivers/media/rc/bpf-lirc.c                   |  2 +-
 include/linux/bpf_verifier.h                  |  1 +
 include/linux/capability.h                    |  5 ++
 include/uapi/linux/capability.h               | 34 +++++++-
 kernel/bpf/arraymap.c                         |  2 +-
 kernel/bpf/bpf_struct_ops.c                   |  2 +-
 kernel/bpf/core.c                             |  4 +-
 kernel/bpf/cpumap.c                           |  2 +-
 kernel/bpf/hashtab.c                          |  4 +-
 kernel/bpf/helpers.c                          |  4 +-
 kernel/bpf/lpm_trie.c                         |  2 +-
 kernel/bpf/queue_stack_maps.c                 |  2 +-
 kernel/bpf/reuseport_array.c                  |  2 +-
 kernel/bpf/stackmap.c                         |  2 +-
 kernel/bpf/syscall.c                          | 87 ++++++++++++++-----
 kernel/bpf/verifier.c                         | 24 ++---
 kernel/trace/bpf_trace.c                      |  3 +
 net/core/bpf_sk_storage.c                     |  4 +-
 net/core/filter.c                             |  4 +-
 security/selinux/include/classmap.h           |  4 +-
 tools/testing/selftests/bpf/test_verifier.c   | 44 ++++++++--
 tools/testing/selftests/bpf/verifier/calls.c  | 16 ++--
 .../selftests/bpf/verifier/dead_code.c        | 10 +--
 23 files changed, 191 insertions(+), 73 deletions(-)

Comments

Casey Schaufler May 8, 2020, 10:45 p.m. UTC | #1

On 5/8/2020 2:53 PM, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@kernel.org>
>
> v4->v5:
>
> Split BPF operations that are allowed under CAP_SYS_ADMIN into a combination of
> CAP_BPF, CAP_PERFMON, and CAP_NET_ADMIN, and keep some of them under CAP_SYS_ADMIN.
>
> The user process has to have
> - CAP_BPF and CAP_PERFMON to load tracing programs.
> - CAP_BPF and CAP_NET_ADMIN to load networking programs.
> (or CAP_SYS_ADMIN for backward compatibility).

Is there a case where CAP_BPF is useful in the absence of other capabilities?
I generally object to new capabilities in cases where existing capabilities
are already required.


Alexei Starovoitov May 8, 2020, 11 p.m. UTC | #2

On Fri, May 08, 2020 at 03:45:36PM -0700, Casey Schaufler wrote:
> On 5/8/2020 2:53 PM, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov <ast@kernel.org>
> >
> > v4->v5:
> >
> > Split BPF operations that are allowed under CAP_SYS_ADMIN into a combination of
> > CAP_BPF, CAP_PERFMON, and CAP_NET_ADMIN, and keep some of them under CAP_SYS_ADMIN.
> >
> > The user process has to have
> > - CAP_BPF and CAP_PERFMON to load tracing programs.
> > - CAP_BPF and CAP_NET_ADMIN to load networking programs.
> > (or CAP_SYS_ADMIN for backward compatibility).
> 
> Is there a case where CAP_BPF is useful in the absence of other capabilities?
> I generally object to new capabilities in cases where existing capabilities
> are already required.

You mean beyond what is written about CAP_BPF in include/uapi/linux/capability.h in patch 1?
There are prog types that are neither tracing nor networking.
LIRC2 and cgroup-device, for example, are neither, but they were put under
CAP_SYS_ADMIN + CAP_NET_ADMIN because there was no CAP_BPF. This patch keeps them
under CAP_BPF + CAP_NET_ADMIN for now. Maybe that can be relaxed later. For sure,
future prog types won't have to deal with such a binary decision.