[00/18] bpf: Secure and authenticated preloading of eBPF programs

Message ID	20220328175033.2437312-1-roberto.sassu@huawei.com (mailing list archive)
Headers	show Return-Path: <linux-security-module-owner@kernel.org> From: Roberto Sassu <roberto.sassu@huawei.com> To: <corbet@lwn.net>, <viro@zeniv.linux.org.uk>, <ast@kernel.org>, <daniel@iogearbox.net>, <andrii@kernel.org>, <kpsingh@kernel.org>, <shuah@kernel.org>, <mcoquelin.stm32@gmail.com>, <alexandre.torgue@foss.st.com>, <zohar@linux.ibm.com> CC: <linux-doc@vger.kernel.org>, <linux-fsdevel@vger.kernel.org>, <netdev@vger.kernel.org>, <bpf@vger.kernel.org>, <linux-kselftest@vger.kernel.org>, <linux-stm32@st-md-mailman.stormreply.com>, <linux-arm-kernel@lists.infradead.org>, <linux-integrity@vger.kernel.org>, <linux-security-module@vger.kernel.org>, <linux-kernel@vger.kernel.org>, Roberto Sassu <roberto.sassu@huawei.com> Subject: [PATCH 00/18] bpf: Secure and authenticated preloading of eBPF programs Date: Mon, 28 Mar 2022 19:50:15 +0200 Message-ID: <20220328175033.2437312-1-roberto.sassu@huawei.com> MIME-Version: 1.0 Content-Transfer-Encoding: 7BIT Content-Type: text/plain; charset=US-ASCII Precedence: bulk
Series	bpf: Secure and authenticated preloading of eBPF programs \| expand [00/18] bpf: Secure and authenticated preloading of eBPF programs [01/18] bpf: Export bpf_link_inc() [02/18] bpf-preload: Move bpf_preload.h to include/linux [03/18] bpf-preload: Generalize object pinning from the kernel [04/18] bpf-preload: Export and call bpf_obj_do_pin_kernel() [05/18] bpf-preload: Generate static variables [06/18] bpf-preload: Generate free_objs_and_skel() [07/18] bpf-preload: Generate preload() [08/18] bpf-preload: Generate load_skel() [09/18] bpf-preload: Generate code to pin non-internal maps [10/18] bpf-preload: Generate bpf_preload_ops [11/18] bpf-preload: Store multiple bpf_preload_ops structures in a linked list [12/18] bpf-preload: Implement new registration method for preloading eBPF programs [13/18] bpf-preload: Move pinned links and maps to a dedicated directory in bpffs [14/18] bpf-preload: Switch to new preload registration method [15/18] bpf-preload: Generate code of kernel module to preload [16/18] bpf-preload: Do kernel mount to ensure that pinned objects don't disappear [17/18] bpf-preload/selftests: Add test for automatic generation of preload methods [18/18] bpf-preload/selftests: Preload a test eBPF program and check pinned objects

Roberto Sassu March 28, 2022, 5:50 p.m. UTC

eBPF already allows programs to be preloaded and kept running without
intervention from user space. There is a dedicated kernel module called
bpf_preload, which contains the light skeleton of the iterators_bpf eBPF
program. If this module is enabled in the kernel configuration, its loading
will be triggered when the bpf filesystem is mounted (unless the module is
built-in), and the links of iterators_bpf are pinned in that filesystem
(they will appear as the progs.debug and maps.debug files).

However, the current mechanism, if used to preload an LSM, would not offer
the same security guarantees of LSMs integrated in the security subsystem.
Also, it is not generic enough to be used for preloading arbitrary eBPF
programs, unless the bpf_preload code is heavily modified.

More specifically, the security problems are:
- any program can be pinned to the bpf filesystem without limitations
  (unless a MAC mechanism enforces some restrictions);
- programs being executed can be terminated at any time by deleting the
  pinned objects or unmounting the bpf filesystem.

The usability problems are:
- only a fixed amount of links can be pinned;
- only links can be pinned, other object types are not supported;
- code to pin objects has to be written manually;
- preloading multiple eBPF programs is not practical, bpf_preload has to be
  modified to include additional light skeletons.

Solve the security problems by mounting the bpf filesystem from the kernel,
by preloading authenticated kernel modules (e.g. with module.sig_enforce)
and by pinning objects to that filesystem. This particular filesystem
instance guarantees that desired eBPF programs run until the very end of
the kernel lifecycle, since even root cannot interfere with it.

Solve the usability problems by generalizing the pinning function, to
handle not only links but also maps and progs. Also increment the object
reference count and call the pinning function directly from the preload
method (currently in the bpf_preload kernel module) rather than from the
bpf filesystem code itself, so that a generic eBPF program can do those
operations depending on its objects (this also avoids the limitation of the
fixed-size array for storing the objects to pin).

Then, simplify the process of pinning objects defined by a generic eBPF
program by automatically generating the required methods in the light
skeleton. Also, generate a separate kernel module for each eBPF program to
preload, so that existing ones don't have to be modified. Finally, support
preloading multiple eBPF programs by allowing users to specify a list from
the kernel configuration, at build time, or with the new kernel option
bpf_preload_list=, at run-time.

To summarize, this patch set makes it possible to plug in out-of-tree LSMs
matching the security guarantees of their counterpart in the security
subsystem, without having to modify the kernel itself. The same benefits
are extended to other eBPF program types.

Only one remaining problem is how to support auto-attaching eBPF programs
with LSM type. It will be solved with a separate patch set.

Patches 1-2 export some definitions, to build out-of-tree kernel modules
with eBPF programs to preload. Patches 3-4 allow eBPF programs to pin
objects by themselves. Patches 5-10 automatically generate the methods for
preloading in the light skeleton. Patches 11-14 make it possible to preload
multiple eBPF programs. Patch 15 automatically generates the kernel module
for preloading an eBPF program, patch 16 does a kernel mount of the bpf
filesystem, and finally patches 17-18 test the functionality introduced.

Roberto Sassu (18):
  bpf: Export bpf_link_inc()
  bpf-preload: Move bpf_preload.h to include/linux
  bpf-preload: Generalize object pinning from the kernel
  bpf-preload: Export and call bpf_obj_do_pin_kernel()
  bpf-preload: Generate static variables
  bpf-preload: Generate free_objs_and_skel()
  bpf-preload: Generate preload()
  bpf-preload: Generate load_skel()
  bpf-preload: Generate code to pin non-internal maps
  bpf-preload: Generate bpf_preload_ops
  bpf-preload: Store multiple bpf_preload_ops structures in a linked
    list
  bpf-preload: Implement new registration method for preloading eBPF
    programs
  bpf-preload: Move pinned links and maps to a dedicated directory in
    bpffs
  bpf-preload: Switch to new preload registration method
  bpf-preload: Generate code of kernel module to preload
  bpf-preload: Do kernel mount to ensure that pinned objects don't
    disappear
  bpf-preload/selftests: Add test for automatic generation of preload
    methods
  bpf-preload/selftests: Preload a test eBPF program and check pinned
    objects

 .../admin-guide/kernel-parameters.txt         |   8 +
 fs/namespace.c                                |   1 +
 include/linux/bpf.h                           |   5 +
 include/linux/bpf_preload.h                   |  37 ++
 init/main.c                                   |   2 +
 kernel/bpf/inode.c                            | 295 +++++++++--
 kernel/bpf/preload/Kconfig                    |  25 +-
 kernel/bpf/preload/bpf_preload.h              |  16 -
 kernel/bpf/preload/bpf_preload_kern.c         |  85 +---
 kernel/bpf/preload/iterators/Makefile         |   9 +-
 .../bpf/preload/iterators/iterators.lskel.h   | 466 +++++++++++-------
 kernel/bpf/syscall.c                          |   1 +
 .../bpf/bpftool/Documentation/bpftool-gen.rst |  13 +
 tools/bpf/bpftool/bash-completion/bpftool     |   6 +-
 tools/bpf/bpftool/gen.c                       | 331 +++++++++++++
 tools/bpf/bpftool/main.c                      |   7 +-
 tools/bpf/bpftool/main.h                      |   1 +
 tools/testing/selftests/bpf/Makefile          |  32 +-
 .../bpf/bpf_testmod_preload/.gitignore        |   7 +
 .../bpf/bpf_testmod_preload/Makefile          |  20 +
 .../gen_preload_methods.expected.diff         |  97 ++++
 .../bpf/prog_tests/test_gen_preload_methods.c |  27 +
 .../bpf/prog_tests/test_preload_methods.c     |  69 +++
 .../selftests/bpf/progs/gen_preload_methods.c |  23 +
 24 files changed, 1246 insertions(+), 337 deletions(-)
 create mode 100644 include/linux/bpf_preload.h
 delete mode 100644 kernel/bpf/preload/bpf_preload.h
 create mode 100644 tools/testing/selftests/bpf/bpf_testmod_preload/.gitignore
 create mode 100644 tools/testing/selftests/bpf/bpf_testmod_preload/Makefile
 create mode 100644 tools/testing/selftests/bpf/prog_tests/gen_preload_methods.expected.diff
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_gen_preload_methods.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_preload_methods.c
 create mode 100644 tools/testing/selftests/bpf/progs/gen_preload_methods.c

Andrii Nakryiko March 29, 2022, 11:51 p.m. UTC | #1

On Mon, Mar 28, 2022 at 10:51 AM Roberto Sassu <roberto.sassu@huawei.com> wrote:
>
> eBPF already allows programs to be preloaded and kept running without
> intervention from user space. There is a dedicated kernel module called
> bpf_preload, which contains the light skeleton of the iterators_bpf eBPF
> program. If this module is enabled in the kernel configuration, its loading
> will be triggered when the bpf filesystem is mounted (unless the module is
> built-in), and the links of iterators_bpf are pinned in that filesystem
> (they will appear as the progs.debug and maps.debug files).
>
> However, the current mechanism, if used to preload an LSM, would not offer
> the same security guarantees of LSMs integrated in the security subsystem.
> Also, it is not generic enough to be used for preloading arbitrary eBPF
> programs, unless the bpf_preload code is heavily modified.
>
> More specifically, the security problems are:
> - any program can be pinned to the bpf filesystem without limitations
>   (unless a MAC mechanism enforces some restrictions);
> - programs being executed can be terminated at any time by deleting the
>   pinned objects or unmounting the bpf filesystem.
>
> The usability problems are:
> - only a fixed amount of links can be pinned;
> - only links can be pinned, other object types are not supported;
> - code to pin objects has to be written manually;
> - preloading multiple eBPF programs is not practical, bpf_preload has to be
>   modified to include additional light skeletons.
>
> Solve the security problems by mounting the bpf filesystem from the kernel,
> by preloading authenticated kernel modules (e.g. with module.sig_enforce)
> and by pinning objects to that filesystem. This particular filesystem
> instance guarantees that desired eBPF programs run until the very end of
> the kernel lifecycle, since even root cannot interfere with it.
>
> Solve the usability problems by generalizing the pinning function, to
> handle not only links but also maps and progs. Also increment the object
> reference count and call the pinning function directly from the preload
> method (currently in the bpf_preload kernel module) rather than from the
> bpf filesystem code itself, so that a generic eBPF program can do those
> operations depending on its objects (this also avoids the limitation of the
> fixed-size array for storing the objects to pin).
>
> Then, simplify the process of pinning objects defined by a generic eBPF
> program by automatically generating the required methods in the light
> skeleton. Also, generate a separate kernel module for each eBPF program to
> preload, so that existing ones don't have to be modified. Finally, support
> preloading multiple eBPF programs by allowing users to specify a list from
> the kernel configuration, at build time, or with the new kernel option
> bpf_preload_list=, at run-time.
>
> To summarize, this patch set makes it possible to plug in out-of-tree LSMs
> matching the security guarantees of their counterpart in the security
> subsystem, without having to modify the kernel itself. The same benefits
> are extended to other eBPF program types.
>
> Only one remaining problem is how to support auto-attaching eBPF programs
> with LSM type. It will be solved with a separate patch set.
>
> Patches 1-2 export some definitions, to build out-of-tree kernel modules
> with eBPF programs to preload. Patches 3-4 allow eBPF programs to pin
> objects by themselves. Patches 5-10 automatically generate the methods for
> preloading in the light skeleton. Patches 11-14 make it possible to preload
> multiple eBPF programs. Patch 15 automatically generates the kernel module
> for preloading an eBPF program, patch 16 does a kernel mount of the bpf
> filesystem, and finally patches 17-18 test the functionality introduced.
>

This approach of moving tons of pretty generic code into codegen of
lskel seems suboptimal. Why so much code has to be codegenerated?
Especially that tiny module code?

Can you please elaborate on why it can't be done in a way that doesn't
require such extensive light skeleton codegen changes?


> Roberto Sassu (18):
>   bpf: Export bpf_link_inc()
>   bpf-preload: Move bpf_preload.h to include/linux
>   bpf-preload: Generalize object pinning from the kernel
>   bpf-preload: Export and call bpf_obj_do_pin_kernel()
>   bpf-preload: Generate static variables
>   bpf-preload: Generate free_objs_and_skel()
>   bpf-preload: Generate preload()
>   bpf-preload: Generate load_skel()
>   bpf-preload: Generate code to pin non-internal maps
>   bpf-preload: Generate bpf_preload_ops
>   bpf-preload: Store multiple bpf_preload_ops structures in a linked
>     list
>   bpf-preload: Implement new registration method for preloading eBPF
>     programs
>   bpf-preload: Move pinned links and maps to a dedicated directory in
>     bpffs
>   bpf-preload: Switch to new preload registration method
>   bpf-preload: Generate code of kernel module to preload
>   bpf-preload: Do kernel mount to ensure that pinned objects don't
>     disappear
>   bpf-preload/selftests: Add test for automatic generation of preload
>     methods
>   bpf-preload/selftests: Preload a test eBPF program and check pinned
>     objects

please use proper prefixes: bpf (for kernel-side changes), libbpf,
bpftool, selftests/bpf, etc


>
>  .../admin-guide/kernel-parameters.txt         |   8 +
>  fs/namespace.c                                |   1 +
>  include/linux/bpf.h                           |   5 +
>  include/linux/bpf_preload.h                   |  37 ++
>  init/main.c                                   |   2 +
>  kernel/bpf/inode.c                            | 295 +++++++++--
>  kernel/bpf/preload/Kconfig                    |  25 +-
>  kernel/bpf/preload/bpf_preload.h              |  16 -
>  kernel/bpf/preload/bpf_preload_kern.c         |  85 +---
>  kernel/bpf/preload/iterators/Makefile         |   9 +-
>  .../bpf/preload/iterators/iterators.lskel.h   | 466 +++++++++++-------
>  kernel/bpf/syscall.c                          |   1 +
>  .../bpf/bpftool/Documentation/bpftool-gen.rst |  13 +
>  tools/bpf/bpftool/bash-completion/bpftool     |   6 +-
>  tools/bpf/bpftool/gen.c                       | 331 +++++++++++++
>  tools/bpf/bpftool/main.c                      |   7 +-
>  tools/bpf/bpftool/main.h                      |   1 +
>  tools/testing/selftests/bpf/Makefile          |  32 +-
>  .../bpf/bpf_testmod_preload/.gitignore        |   7 +
>  .../bpf/bpf_testmod_preload/Makefile          |  20 +
>  .../gen_preload_methods.expected.diff         |  97 ++++
>  .../bpf/prog_tests/test_gen_preload_methods.c |  27 +
>  .../bpf/prog_tests/test_preload_methods.c     |  69 +++
>  .../selftests/bpf/progs/gen_preload_methods.c |  23 +
>  24 files changed, 1246 insertions(+), 337 deletions(-)
>  create mode 100644 include/linux/bpf_preload.h
>  delete mode 100644 kernel/bpf/preload/bpf_preload.h
>  create mode 100644 tools/testing/selftests/bpf/bpf_testmod_preload/.gitignore
>  create mode 100644 tools/testing/selftests/bpf/bpf_testmod_preload/Makefile
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/gen_preload_methods.expected.diff
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/test_gen_preload_methods.c
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/test_preload_methods.c
>  create mode 100644 tools/testing/selftests/bpf/progs/gen_preload_methods.c
>
> --
> 2.32.0
>

Roberto Sassu March 30, 2022, 7:21 a.m. UTC | #2

> From: Andrii Nakryiko [mailto:andrii.nakryiko@gmail.com]
> Sent: Wednesday, March 30, 2022 1:51 AM
> On Mon, Mar 28, 2022 at 10:51 AM Roberto Sassu
> <roberto.sassu@huawei.com> wrote:

[...]

> > Patches 1-2 export some definitions, to build out-of-tree kernel modules
> > with eBPF programs to preload. Patches 3-4 allow eBPF programs to pin
> > objects by themselves. Patches 5-10 automatically generate the methods
> for
> > preloading in the light skeleton. Patches 11-14 make it possible to preload
> > multiple eBPF programs. Patch 15 automatically generates the kernel
> module
> > for preloading an eBPF program, patch 16 does a kernel mount of the bpf
> > filesystem, and finally patches 17-18 test the functionality introduced.
> >
> 
> This approach of moving tons of pretty generic code into codegen of
> lskel seems suboptimal. Why so much code has to be codegenerated?
> Especially that tiny module code?

Hi Andrii

the main goal of this patch set is to use the preloading
mechanism to plug in securely LSMs implemented as eBPF
programs.

I have a use case, I want to plug in my eBPF program,
DIGLIM eBPF.

I started to modify the preloading code manually, and
I realized how complicated the process is if you want
to add something more than the existing iterators_bpf
program.

First, you have to look at which objects you want to
preload, then write code for each of them. This process
is repetitive and deterministic, this is why I immediately
thought that it is a good case for automatic code
generation.

My idea is that, if this mechanism is accepted, an
implementer of an LSM wishing to be preloaded at
the very beginning, only has to write his eBPF code,
the kernel and bpftool take care of the rest.
Generation of the preloading code is optional, and
need to be enabled with the -P option, in addition to -L.

The light skeleton of DIGLIM eBPF looks like:

https://github.com/robertosassu/linux/blob/bpf-preload-v1/kernel/bpf/preload/diglim/diglim.lskel.h

The preloading interface is very similar to the one used
by the security subsystem: an ordered list of eBPF
programs to preload set in the kernel configuration,
that can be overwritten with the kernel option
bpf_preload_list=.

The changes that would be required to preload DIGLIM
eBPF look like:

https://github.com/robertosassu/linux/commit/c07e1a78584ee688aeb812f07dc7ab3060ac6152

Thanks

Roberto

HUAWEI TECHNOLOGIES Duesseldorf GmbH, HRB 56063
Managing Director: Li Peng, Zhong Ronghua
 
> Can you please elaborate on why it can't be done in a way that doesn't
> require such extensive light skeleton codegen changes?
> 
> 
> > Roberto Sassu (18):
> >   bpf: Export bpf_link_inc()
> >   bpf-preload: Move bpf_preload.h to include/linux
> >   bpf-preload: Generalize object pinning from the kernel
> >   bpf-preload: Export and call bpf_obj_do_pin_kernel()
> >   bpf-preload: Generate static variables
> >   bpf-preload: Generate free_objs_and_skel()
> >   bpf-preload: Generate preload()
> >   bpf-preload: Generate load_skel()
> >   bpf-preload: Generate code to pin non-internal maps
> >   bpf-preload: Generate bpf_preload_ops
> >   bpf-preload: Store multiple bpf_preload_ops structures in a linked
> >     list
> >   bpf-preload: Implement new registration method for preloading eBPF
> >     programs
> >   bpf-preload: Move pinned links and maps to a dedicated directory in
> >     bpffs
> >   bpf-preload: Switch to new preload registration method
> >   bpf-preload: Generate code of kernel module to preload
> >   bpf-preload: Do kernel mount to ensure that pinned objects don't
> >     disappear
> >   bpf-preload/selftests: Add test for automatic generation of preload
> >     methods
> >   bpf-preload/selftests: Preload a test eBPF program and check pinned
> >     objects
> 
> please use proper prefixes: bpf (for kernel-side changes), libbpf,
> bpftool, selftests/bpf, etc
> 
> 
> >
> >  .../admin-guide/kernel-parameters.txt         |   8 +
> >  fs/namespace.c                                |   1 +
> >  include/linux/bpf.h                           |   5 +
> >  include/linux/bpf_preload.h                   |  37 ++
> >  init/main.c                                   |   2 +
> >  kernel/bpf/inode.c                            | 295 +++++++++--
> >  kernel/bpf/preload/Kconfig                    |  25 +-
> >  kernel/bpf/preload/bpf_preload.h              |  16 -
> >  kernel/bpf/preload/bpf_preload_kern.c         |  85 +---
> >  kernel/bpf/preload/iterators/Makefile         |   9 +-
> >  .../bpf/preload/iterators/iterators.lskel.h   | 466 +++++++++++-------
> >  kernel/bpf/syscall.c                          |   1 +
> >  .../bpf/bpftool/Documentation/bpftool-gen.rst |  13 +
> >  tools/bpf/bpftool/bash-completion/bpftool     |   6 +-
> >  tools/bpf/bpftool/gen.c                       | 331 +++++++++++++
> >  tools/bpf/bpftool/main.c                      |   7 +-
> >  tools/bpf/bpftool/main.h                      |   1 +
> >  tools/testing/selftests/bpf/Makefile          |  32 +-
> >  .../bpf/bpf_testmod_preload/.gitignore        |   7 +
> >  .../bpf/bpf_testmod_preload/Makefile          |  20 +
> >  .../gen_preload_methods.expected.diff         |  97 ++++
> >  .../bpf/prog_tests/test_gen_preload_methods.c |  27 +
> >  .../bpf/prog_tests/test_preload_methods.c     |  69 +++
> >  .../selftests/bpf/progs/gen_preload_methods.c |  23 +
> >  24 files changed, 1246 insertions(+), 337 deletions(-)
> >  create mode 100644 include/linux/bpf_preload.h
> >  delete mode 100644 kernel/bpf/preload/bpf_preload.h
> >  create mode 100644
> tools/testing/selftests/bpf/bpf_testmod_preload/.gitignore
> >  create mode 100644
> tools/testing/selftests/bpf/bpf_testmod_preload/Makefile
> >  create mode 100644
> tools/testing/selftests/bpf/prog_tests/gen_preload_methods.expected.diff
> >  create mode 100644
> tools/testing/selftests/bpf/prog_tests/test_gen_preload_methods.c
> >  create mode 100644
> tools/testing/selftests/bpf/prog_tests/test_preload_methods.c
> >  create mode 100644
> tools/testing/selftests/bpf/progs/gen_preload_methods.c
> >
> > --
> > 2.32.0
> >

Alexei Starovoitov March 31, 2022, 2:27 a.m. UTC | #3

On Mon, Mar 28, 2022 at 07:50:15PM +0200, Roberto Sassu wrote:
> eBPF already allows programs to be preloaded and kept running without
> intervention from user space. There is a dedicated kernel module called
> bpf_preload, which contains the light skeleton of the iterators_bpf eBPF
> program. If this module is enabled in the kernel configuration, its loading
> will be triggered when the bpf filesystem is mounted (unless the module is
> built-in), and the links of iterators_bpf are pinned in that filesystem
> (they will appear as the progs.debug and maps.debug files).
> 
> However, the current mechanism, if used to preload an LSM, would not offer
> the same security guarantees of LSMs integrated in the security subsystem.
> Also, it is not generic enough to be used for preloading arbitrary eBPF
> programs, unless the bpf_preload code is heavily modified.
> 
> More specifically, the security problems are:
> - any program can be pinned to the bpf filesystem without limitations
>   (unless a MAC mechanism enforces some restrictions);
> - programs being executed can be terminated at any time by deleting the
>   pinned objects or unmounting the bpf filesystem.

So many things to untangle here.

The above paragraphs are misleading and incorrect.
The commit log sounds like there are security issues that this
patch set is fixing.
This is not true.
Looks like there is a massive misunderstanding on what bpffs is.
It's a file system to pin and get bpf objects with normal
file access permissions. Nothing else.
Do NOT use it to pin LSM or any other security sensitive bpf programs
and then complain that root can unpin them.
Yes. Root can and should be able to 'rm -rf' anything in bpffs instance.

> The usability problems are:
> - only a fixed amount of links can be pinned;

where do you see this limit?

> - only links can be pinned, other object types are not supported;

really? progs, maps can be pinned as well.

> - code to pin objects has to be written manually;

huh?

> Solve the security problems by mounting the bpf filesystem from the kernel,
> by preloading authenticated kernel modules (e.g. with module.sig_enforce)
> and by pinning objects to that filesystem. This particular filesystem
> instance guarantees that desired eBPF programs run until the very end of
> the kernel lifecycle, since even root cannot interfere with it.

No.

I suspect there is huge confusion on what these two "progs.debug"
and "maps.debug" files are in a bpffs instance.
They are debug files to pretty pring loaded maps and progs for folks who
like to use 'cat' to examine the state of the system instead of 'bpftool'.
The root can remove these files from bpffs.

There is no reason for kernel module to pin its bpf progs.
If you want to develop DIGLIM as a kernel module that uses light skeleton
just do:
#include <linux/init.h>
#include <linux/module.h>
#include "diglim.lskel.h"

static struct diglim_bpf *skel;

static int __init load(void)
{
        skel = diglim_bpf__open_and_load();
        err = diglim_bpf__attach(skel);
}
/* detach skel in __fini */

It's really that short.

Then you will be able to
- insmod diglim.ko -> will load and attach bpf progs.
- rmmod diglim -> will detach them.

Independantly from these two mistunderstandings of bpffs and light skel
we've been talking about auto exposing loaded bpf progs, maps, links
in a bpffs without incrementing refcnt of them.
When progs get unloaded the files will disappear.
Some folks believe that doing 'ls' in a directory and see one file
for each bpf prog loaded and then doing 'cat' on that file would be
useful for debugging. That idea wasn't rejected. We're still thinking
what would be the best way to auto-expose all bpf objects for debugging
and whether it's actually makes sense to do considering that
bpftool already has commands to list all progs, maps, links, etc
with great detail.
It's pretty much an argument between 'cat+ls' believers and
'bpftool' cmdline believers.
That discussion is orthogonal and should not be mixed with bpffs, lsm,
security or anything else.

Roberto Sassu March 31, 2022, 8:25 a.m. UTC | #4

> From: Alexei Starovoitov [mailto:alexei.starovoitov@gmail.com]
> Sent: Thursday, March 31, 2022 4:27 AM
> On Mon, Mar 28, 2022 at 07:50:15PM +0200, Roberto Sassu wrote:
> > eBPF already allows programs to be preloaded and kept running without
> > intervention from user space. There is a dedicated kernel module called
> > bpf_preload, which contains the light skeleton of the iterators_bpf eBPF
> > program. If this module is enabled in the kernel configuration, its loading
> > will be triggered when the bpf filesystem is mounted (unless the module is
> > built-in), and the links of iterators_bpf are pinned in that filesystem
> > (they will appear as the progs.debug and maps.debug files).
> >
> > However, the current mechanism, if used to preload an LSM, would not
> offer
> > the same security guarantees of LSMs integrated in the security
> subsystem.
> > Also, it is not generic enough to be used for preloading arbitrary eBPF
> > programs, unless the bpf_preload code is heavily modified.
> >
> > More specifically, the security problems are:
> > - any program can be pinned to the bpf filesystem without limitations
> >   (unless a MAC mechanism enforces some restrictions);
> > - programs being executed can be terminated at any time by deleting the
> >   pinned objects or unmounting the bpf filesystem.
> 
> So many things to untangle here.

Hi Alexei

thanks for taking the time to provide such detailed
explanation.

> The above paragraphs are misleading and incorrect.
> The commit log sounds like there are security issues that this
> patch set is fixing.
> This is not true.

I reiterate the goal: enforce a mandatory policy with
an out-of-tree LSM (a kernel module is fine), with the
same guarantees of LSMs integrated in the security
subsystem.

The root user is not part of the TCB (i.e. is untrusted),
all the changes that user wants to make must be subject
of decision by the LSM enforcing the mandatory policy.

I thought about adding support for LSMs from kernel
modules via a new built-in LSM (called LoadLSM), but
to me it looks that the bpf LSM is closer to achieve the
same goal. And in addition, eBPF significantly simplifies
with its helpers writing an LSM.

> Looks like there is a massive misunderstanding on what bpffs is.
> It's a file system to pin and get bpf objects with normal
> file access permissions. Nothing else.
> Do NOT use it to pin LSM or any other security sensitive bpf programs
> and then complain that root can unpin them.
> Yes. Root can and should be able to 'rm -rf' anything in bpffs instance.
> 
> > The usability problems are:
> > - only a fixed amount of links can be pinned;
> 
> where do you see this limit?

static int populate_bpffs(struct dentry *parent)
{
        struct bpf_preload_info objs[BPF_PRELOAD_LINKS] = {};

#define BPF_PRELOAD_LINKS 2

> > - only links can be pinned, other object types are not supported;
> 
> really? progs, maps can be pinned as well.

struct bpf_preload_info {
        char link_name[16];
        struct bpf_link *link;
};

> > - code to pin objects has to be written manually;
> 
> huh?

I meant if you want to extend the bpf_preload kernel
module.

> > Solve the security problems by mounting the bpf filesystem from the
> kernel,
> > by preloading authenticated kernel modules (e.g. with
> module.sig_enforce)
> > and by pinning objects to that filesystem. This particular filesystem
> > instance guarantees that desired eBPF programs run until the very end of
> > the kernel lifecycle, since even root cannot interfere with it.
> 
> No.

Ok. How can the goal I stated above be achieved properly?

> I suspect there is huge confusion on what these two "progs.debug"
> and "maps.debug" files are in a bpffs instance.
> They are debug files to pretty pring loaded maps and progs for folks who
> like to use 'cat' to examine the state of the system instead of 'bpftool'.
> The root can remove these files from bpffs.
> 
> There is no reason for kernel module to pin its bpf progs.
> If you want to develop DIGLIM as a kernel module that uses light skeleton
> just do:
> #include <linux/init.h>
> #include <linux/module.h>
> #include "diglim.lskel.h"
> 
> static struct diglim_bpf *skel;
> 
> static int __init load(void)
> {
>         skel = diglim_bpf__open_and_load();
>         err = diglim_bpf__attach(skel);
> }
> /* detach skel in __fini */
> 
> It's really that short.
> 
> Then you will be able to
> - insmod diglim.ko -> will load and attach bpf progs.
> - rmmod diglim -> will detach them.

root can stop the LSM without consulting the security
policy. The goal of having root untrusted is not achieved.

Maybe there is another way to prevent unloading
the kernel module. I didn't find it yet. If there was an
LSM hook called when kernel modules are unloaded,
that would be sufficient, I guess.

My point was that pinning progs seems to be the
recommended way of keeping them running. Pinning
them to unreachable inodes intuitively looked the
way to go for achieving the stated goal. Or maybe I
should just increment the reference count of links
and don't decrement during an rmmod?

If there is something I'm missing, please let me know.

Thanks

Roberto

HUAWEI TECHNOLOGIES Duesseldorf GmbH, HRB 56063
Managing Director: Li Peng, Zhong Ronghua

Alexei Starovoitov April 1, 2022, 11:55 p.m. UTC | #5

On Thu, Mar 31, 2022 at 08:25:22AM +0000, Roberto Sassu wrote:
> > From: Alexei Starovoitov [mailto:alexei.starovoitov@gmail.com]
> > Sent: Thursday, March 31, 2022 4:27 AM
> > On Mon, Mar 28, 2022 at 07:50:15PM +0200, Roberto Sassu wrote:
> > > eBPF already allows programs to be preloaded and kept running without
> > > intervention from user space. There is a dedicated kernel module called
> > > bpf_preload, which contains the light skeleton of the iterators_bpf eBPF
> > > program. If this module is enabled in the kernel configuration, its loading
> > > will be triggered when the bpf filesystem is mounted (unless the module is
> > > built-in), and the links of iterators_bpf are pinned in that filesystem
> > > (they will appear as the progs.debug and maps.debug files).
> > >
> > > However, the current mechanism, if used to preload an LSM, would not
> > offer
> > > the same security guarantees of LSMs integrated in the security
> > subsystem.
> > > Also, it is not generic enough to be used for preloading arbitrary eBPF
> > > programs, unless the bpf_preload code is heavily modified.
> > >
> > > More specifically, the security problems are:
> > > - any program can be pinned to the bpf filesystem without limitations
> > >   (unless a MAC mechanism enforces some restrictions);
> > > - programs being executed can be terminated at any time by deleting the
> > >   pinned objects or unmounting the bpf filesystem.
> > 
> > So many things to untangle here.
> 
> Hi Alexei
> 
> thanks for taking the time to provide such detailed
> explanation.
> 
> > The above paragraphs are misleading and incorrect.
> > The commit log sounds like there are security issues that this
> > patch set is fixing.
> > This is not true.
> 
> I reiterate the goal: enforce a mandatory policy with
> an out-of-tree LSM (a kernel module is fine), with the
> same guarantees of LSMs integrated in the security
> subsystem.

To make it 100% clear:
Any in-kernel feature that benefits out-of-tree module will be rejected.

> The root user is not part of the TCB (i.e. is untrusted),
> all the changes that user wants to make must be subject
> of decision by the LSM enforcing the mandatory policy.
> 
> I thought about adding support for LSMs from kernel
> modules via a new built-in LSM (called LoadLSM), but

Such approach will be rejected. See above.

> > I suspect there is huge confusion on what these two "progs.debug"
> > and "maps.debug" files are in a bpffs instance.
> > They are debug files to pretty pring loaded maps and progs for folks who
> > like to use 'cat' to examine the state of the system instead of 'bpftool'.
> > The root can remove these files from bpffs.
> > 
> > There is no reason for kernel module to pin its bpf progs.
> > If you want to develop DIGLIM as a kernel module that uses light skeleton
> > just do:
> > #include <linux/init.h>
> > #include <linux/module.h>
> > #include "diglim.lskel.h"
> > 
> > static struct diglim_bpf *skel;
> > 
> > static int __init load(void)
> > {
> >         skel = diglim_bpf__open_and_load();
> >         err = diglim_bpf__attach(skel);
> > }
> > /* detach skel in __fini */
> > 
> > It's really that short.
> > 
> > Then you will be able to
> > - insmod diglim.ko -> will load and attach bpf progs.
> > - rmmod diglim -> will detach them.
> 
> root can stop the LSM without consulting the security
> policy. The goal of having root untrusted is not achieved.

Out-of-tree module can do any hack.
For example:
1. don't do detach skel in __fini
  rmmod will remove the module, but bpf progs will keep running.
2. do module_get(THIS_MODULE) in __init
  rmmod will return EBUSY
  and have some out-of-band way of dropping mod refcnt.
3. hack into sys_delete_module. if module_name==diglem return EBUSY.
4. add proper LSM hook to delete_module

> My point was that pinning progs seems to be the
> recommended way of keeping them running. 

Not quite. bpf_link refcnt is what keeps progs attached.
bpffs is mainly used for:
- to pass maps/links from one process to another
when passing fd is not possible.
- to solve the case of crashing user space.
The user space agent will restart and will pick up where
it's left by reading map, link, prog FDs from bpffs.
- pinning bpf iterators that are later used to 'cat' such files.
That is what bpf_preload is doing by creating two debug
files "maps.debug" and "progs.debug".

> Pinning
> them to unreachable inodes intuitively looked the
> way to go for achieving the stated goal. 

We can consider inodes in bpffs that are not unlinkable by root
in the future, but certainly not for this use case.

> Or maybe I
> should just increment the reference count of links
> and don't decrement during an rmmod?

I suggest to abandon out-of-tree goal.
Only then we can help and continue this discussion.

KP Singh April 2, 2022, 1:03 a.m. UTC | #6

On Sat, Apr 2, 2022 at 1:55 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Mar 31, 2022 at 08:25:22AM +0000, Roberto Sassu wrote:
> > > From: Alexei Starovoitov [mailto:alexei.starovoitov@gmail.com]
> > > Sent: Thursday, March 31, 2022 4:27 AM
> > > On Mon, Mar 28, 2022 at 07:50:15PM +0200, Roberto Sassu wrote:
> > > > eBPF already allows programs to be preloaded and kept running without
> > > > intervention from user space. There is a dedicated kernel module called
> > > > bpf_preload, which contains the light skeleton of the iterators_bpf eBPF
> > > > program. If this module is enabled in the kernel configuration, its loading
> > > > will be triggered when the bpf filesystem is mounted (unless the module is
> > > > built-in), and the links of iterators_bpf are pinned in that filesystem
> > > > (they will appear as the progs.debug and maps.debug files).
> > > >
> > > > However, the current mechanism, if used to preload an LSM, would not
> > > offer
> > > > the same security guarantees of LSMs integrated in the security
> > > subsystem.
> > > > Also, it is not generic enough to be used for preloading arbitrary eBPF
> > > > programs, unless the bpf_preload code is heavily modified.
> > > >
> > > > More specifically, the security problems are:
> > > > - any program can be pinned to the bpf filesystem without limitations
> > > >   (unless a MAC mechanism enforces some restrictions);
> > > > - programs being executed can be terminated at any time by deleting the
> > > >   pinned objects or unmounting the bpf filesystem.
> > >
> > > So many things to untangle here.
> >
> > Hi Alexei
> >
> > thanks for taking the time to provide such detailed
> > explanation.
> >
> > > The above paragraphs are misleading and incorrect.
> > > The commit log sounds like there are security issues that this
> > > patch set is fixing.
> > > This is not true.

+1 these are not security issues. They are limitations of your MAC policy.

> >
> > I reiterate the goal: enforce a mandatory policy with
> > an out-of-tree LSM (a kernel module is fine), with the
> > same guarantees of LSMs integrated in the security
> > subsystem.
>
> To make it 100% clear:
> Any in-kernel feature that benefits out-of-tree module will be rejected.
>
> > The root user is not part of the TCB (i.e. is untrusted),
> > all the changes that user wants to make must be subject
> > of decision by the LSM enforcing the mandatory policy.
> >
> > I thought about adding support for LSMs from kernel
> > modules via a new built-in LSM (called LoadLSM), but

Kernel modules cannot implement LSMs, this has already been
proposed on the lists and has been rejected.

>
> Such approach will be rejected. See above.
>
> > > I suspect there is huge confusion on what these two "progs.debug"
> > > and "maps.debug" files are in a bpffs instance.
> > > They are debug files to pretty pring loaded maps and progs for folks who
> > > like to use 'cat' to examine the state of the system instead of 'bpftool'.
> > > The root can remove these files from bpffs.
> > >
> > > There is no reason for kernel module to pin its bpf progs.
> > > If you want to develop DIGLIM as a kernel module that uses light skeleton
> > > just do:
> > > #include <linux/init.h>
> > > #include <linux/module.h>
> > > #include "diglim.lskel.h"
> > >
> > > static struct diglim_bpf *skel;
> > >
> > > static int __init load(void)
> > > {
> > >         skel = diglim_bpf__open_and_load();
> > >         err = diglim_bpf__attach(skel);
> > > }
> > > /* detach skel in __fini */
> > >
> > > It's really that short.
> > >
> > > Then you will be able to
> > > - insmod diglim.ko -> will load and attach bpf progs.
> > > - rmmod diglim -> will detach them.
> >
> > root can stop the LSM without consulting the security
> > policy. The goal of having root untrusted is not achieved.

Ofcourse, this is an issue, if you are using BPF to define a MAC
policy, the policy
needs to be comprehensive to prevent itself from being overridden. This is why
We have so many LSM hooks. If you think some are missing, let's add them.

This is why implementing a policy is not trivial, but we need to allow
users to build
such policies with the help from the kernel and not by using
out-of-tree modules.

I do think we can add some more helpers (e.g. for modifying xattrs
from BPF) that
would help us build complex policies.

>
> Out-of-tree module can do any hack.
> For example:
> 1. don't do detach skel in __fini
>   rmmod will remove the module, but bpf progs will keep running.
> 2. do module_get(THIS_MODULE) in __init
>   rmmod will return EBUSY
>   and have some out-of-band way of dropping mod refcnt.
> 3. hack into sys_delete_module. if module_name==diglem return EBUSY.
> 4. add proper LSM hook to delete_module

+1 I recommend this (but not from an out of tree module)

>
> > My point was that pinning progs seems to be the
> > recommended way of keeping them running.
>
> Not quite. bpf_link refcnt is what keeps progs attached.
> bpffs is mainly used for:
> - to pass maps/links from one process to another
> when passing fd is not possible.
> - to solve the case of crashing user space.
> The user space agent will restart and will pick up where
> it's left by reading map, link, prog FDs from bpffs.
> - pinning bpf iterators that are later used to 'cat' such files.
> That is what bpf_preload is doing by creating two debug
> files "maps.debug" and "progs.debug".
>
> > Pinning
> > them to unreachable inodes intuitively looked the
> > way to go for achieving the stated goal.
>
> We can consider inodes in bpffs that are not unlinkable by root
> in the future, but certainly not for this use case.

Can this not be already done by adding a BPF_LSM program to the
inode_unlink LSM hook?

>
> > Or maybe I
> > should just increment the reference count of links
> > and don't decrement during an rmmod?
>
> I suggest to abandon out-of-tree goal.
> Only then we can help and continue this discussion.

+1

Djalal Harouni April 4, 2022, 7:44 a.m. UTC | #7

On Sun, Apr 3, 2022 at 5:42 PM KP Singh <kpsingh@kernel.org> wrote:
>
> On Sat, Apr 2, 2022 at 1:55 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
...
> >
> > > Pinning
> > > them to unreachable inodes intuitively looked the
> > > way to go for achieving the stated goal.
> >
> > We can consider inodes in bpffs that are not unlinkable by root
> > in the future, but certainly not for this use case.
>
> Can this not be already done by adding a BPF_LSM program to the
> inode_unlink LSM hook?
>

Also, beside of the inode_unlink... and out of curiosity: making sysfs/bpffs/
readonly after pinning, then using bpf LSM hooks sb_mount|remount|unmount...
family combining bpf() LSM hook... isn't this enough to:
1. Restrict who can pin to bpffs without using a full MAC
2. Restrict who can delete or unmount bpf filesystem

?

Roberto Sassu April 4, 2022, 5:20 p.m. UTC | #8

> From: Djalal Harouni [mailto:tixxdz@gmail.com]
> Sent: Monday, April 4, 2022 9:45 AM
> On Sun, Apr 3, 2022 at 5:42 PM KP Singh <kpsingh@kernel.org> wrote:
> >
> > On Sat, Apr 2, 2022 at 1:55 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> ...
> > >
> > > > Pinning
> > > > them to unreachable inodes intuitively looked the
> > > > way to go for achieving the stated goal.
> > >
> > > We can consider inodes in bpffs that are not unlinkable by root
> > > in the future, but certainly not for this use case.
> >
> > Can this not be already done by adding a BPF_LSM program to the
> > inode_unlink LSM hook?
> >
> 
> Also, beside of the inode_unlink... and out of curiosity: making sysfs/bpffs/
> readonly after pinning, then using bpf LSM hooks
> sb_mount|remount|unmount...
> family combining bpf() LSM hook... isn't this enough to:
> 1. Restrict who can pin to bpffs without using a full MAC
> 2. Restrict who can delete or unmount bpf filesystem
> 
> ?

I'm thinking to implement something like this.

First, I add a new program flag called
BPF_F_STOP_ONCONFIRM, which causes the ref count
of the link to increase twice at creation time. In this way,
user space cannot make the link disappear, unless a
confirmation is explicitly sent via the bpf() system call.

Another advantage is that other LSMs can decide
whether or not they allow a program with this flag
(in the bpf security hook).

This would work regardless of the method used to
load the eBPF program (user space or kernel space).

Second, I extend the bpf() system call with a new
subcommand, BPF_LINK_CONFIRM_STOP, which
decreases the ref count for the link of the programs
with the BPF_F_STOP_ONCONFIRM flag. I will also
introduce a new security hook (something like
security_link_confirm_stop), so that an LSM has the
opportunity to deny the stop (the bpf security hook
would not be sufficient to determine exactly for
which link the confirmation is given, an LSM should
be able to deny the stop for its own programs).

What do you think?

Thanks

Roberto

HUAWEI TECHNOLOGIES Duesseldorf GmbH, HRB 56063
Managing Director: Li Peng, Zhong Ronghua

Roberto Sassu April 4, 2022, 5:41 p.m. UTC | #9

> From: KP Singh [mailto:kpsingh@kernel.org]
> Sent: Saturday, April 2, 2022 3:03 AM
> On Sat, Apr 2, 2022 at 1:55 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Mar 31, 2022 at 08:25:22AM +0000, Roberto Sassu wrote:
> > > > From: Alexei Starovoitov [mailto:alexei.starovoitov@gmail.com]
> > > > Sent: Thursday, March 31, 2022 4:27 AM
> > > > On Mon, Mar 28, 2022 at 07:50:15PM +0200, Roberto Sassu wrote:
> > > > > eBPF already allows programs to be preloaded and kept running
> without
> > > > > intervention from user space. There is a dedicated kernel module
> called
> > > > > bpf_preload, which contains the light skeleton of the iterators_bpf
> eBPF
> > > > > program. If this module is enabled in the kernel configuration, its
> loading
> > > > > will be triggered when the bpf filesystem is mounted (unless the
> module is
> > > > > built-in), and the links of iterators_bpf are pinned in that filesystem
> > > > > (they will appear as the progs.debug and maps.debug files).
> > > > >
> > > > > However, the current mechanism, if used to preload an LSM, would
> not
> > > > offer
> > > > > the same security guarantees of LSMs integrated in the security
> > > > subsystem.
> > > > > Also, it is not generic enough to be used for preloading arbitrary eBPF
> > > > > programs, unless the bpf_preload code is heavily modified.
> > > > >
> > > > > More specifically, the security problems are:
> > > > > - any program can be pinned to the bpf filesystem without limitations
> > > > >   (unless a MAC mechanism enforces some restrictions);
> > > > > - programs being executed can be terminated at any time by deleting
> the
> > > > >   pinned objects or unmounting the bpf filesystem.
> > > >
> > > > So many things to untangle here.
> > >
> > > Hi Alexei
> > >
> > > thanks for taking the time to provide such detailed
> > > explanation.
> > >
> > > > The above paragraphs are misleading and incorrect.
> > > > The commit log sounds like there are security issues that this
> > > > patch set is fixing.
> > > > This is not true.
> 
> +1 these are not security issues. They are limitations of your MAC policy.
> 
> > >
> > > I reiterate the goal: enforce a mandatory policy with
> > > an out-of-tree LSM (a kernel module is fine), with the
> > > same guarantees of LSMs integrated in the security
> > > subsystem.
> >
> > To make it 100% clear:
> > Any in-kernel feature that benefits out-of-tree module will be rejected.
> >
> > > The root user is not part of the TCB (i.e. is untrusted),
> > > all the changes that user wants to make must be subject
> > > of decision by the LSM enforcing the mandatory policy.
> > >
> > > I thought about adding support for LSMs from kernel
> > > modules via a new built-in LSM (called LoadLSM), but
> 
> Kernel modules cannot implement LSMs, this has already been
> proposed on the lists and has been rejected.

Looking at commit cb80ddc67152 ("bpf: Convert bpf_preload.ko
to use light skeleton."), I got that it is the most efficient way
to load an eBPF program (does not even require libbpf).

Another advantage was that we get integrity verification
from the module infrastructure. This would have been the
optimal solution in terms of dependencies. Enforcing
integrity could be turned on with the module.sig_enforce
kernel option.

If we switch to user space, the choice would be IMA.
However, in my use case (DIGLIM) it would be used just
for the purpose of doing integrity verifications pre-init.

Thinking which policy could be implemented for such purpose,
maybe something like appraise every process that is not linked
to an executable? And since there are no xattrs in the initial
ram disk, could I append a module signature to an ELF binary?

Thanks

Roberto

HUAWEI TECHNOLOGIES Duesseldorf GmbH, HRB 56063
Managing Director: Li Peng, Zhong Ronghua

> > Such approach will be rejected. See above.
> >
> > > > I suspect there is huge confusion on what these two "progs.debug"
> > > > and "maps.debug" files are in a bpffs instance.
> > > > They are debug files to pretty pring loaded maps and progs for folks
> who
> > > > like to use 'cat' to examine the state of the system instead of 'bpftool'.
> > > > The root can remove these files from bpffs.
> > > >
> > > > There is no reason for kernel module to pin its bpf progs.
> > > > If you want to develop DIGLIM as a kernel module that uses light
> skeleton
> > > > just do:
> > > > #include <linux/init.h>
> > > > #include <linux/module.h>
> > > > #include "diglim.lskel.h"
> > > >
> > > > static struct diglim_bpf *skel;
> > > >
> > > > static int __init load(void)
> > > > {
> > > >         skel = diglim_bpf__open_and_load();
> > > >         err = diglim_bpf__attach(skel);
> > > > }
> > > > /* detach skel in __fini */
> > > >
> > > > It's really that short.
> > > >
> > > > Then you will be able to
> > > > - insmod diglim.ko -> will load and attach bpf progs.
> > > > - rmmod diglim -> will detach them.
> > >
> > > root can stop the LSM without consulting the security
> > > policy. The goal of having root untrusted is not achieved.
> 
> Ofcourse, this is an issue, if you are using BPF to define a MAC
> policy, the policy
> needs to be comprehensive to prevent itself from being overridden. This is
> why
> We have so many LSM hooks. If you think some are missing, let's add them.
> 
> This is why implementing a policy is not trivial, but we need to allow
> users to build
> such policies with the help from the kernel and not by using
> out-of-tree modules.
> 
> I do think we can add some more helpers (e.g. for modifying xattrs
> from BPF) that
> would help us build complex policies.
> 
> >
> > Out-of-tree module can do any hack.
> > For example:
> > 1. don't do detach skel in __fini
> >   rmmod will remove the module, but bpf progs will keep running.
> > 2. do module_get(THIS_MODULE) in __init
> >   rmmod will return EBUSY
> >   and have some out-of-band way of dropping mod refcnt.
> > 3. hack into sys_delete_module. if module_name==diglem return EBUSY.
> > 4. add proper LSM hook to delete_module
> 
> +1 I recommend this (but not from an out of tree module)
> 
> >
> > > My point was that pinning progs seems to be the
> > > recommended way of keeping them running.
> >
> > Not quite. bpf_link refcnt is what keeps progs attached.
> > bpffs is mainly used for:
> > - to pass maps/links from one process to another
> > when passing fd is not possible.
> > - to solve the case of crashing user space.
> > The user space agent will restart and will pick up where
> > it's left by reading map, link, prog FDs from bpffs.
> > - pinning bpf iterators that are later used to 'cat' such files.
> > That is what bpf_preload is doing by creating two debug
> > files "maps.debug" and "progs.debug".
> >
> > > Pinning
> > > them to unreachable inodes intuitively looked the
> > > way to go for achieving the stated goal.
> >
> > We can consider inodes in bpffs that are not unlinkable by root
> > in the future, but certainly not for this use case.
> 
> Can this not be already done by adding a BPF_LSM program to the
> inode_unlink LSM hook?
> 
> >
> > > Or maybe I
> > > should just increment the reference count of links
> > > and don't decrement during an rmmod?
> >
> > I suggest to abandon out-of-tree goal.
> > Only then we can help and continue this discussion.
> 
> +1

Alexei Starovoitov April 4, 2022, 10:49 p.m. UTC | #10

On Mon, Apr 4, 2022 at 10:21 AM Roberto Sassu <roberto.sassu@huawei.com> wrote:
>
> > From: Djalal Harouni [mailto:tixxdz@gmail.com]
> > Sent: Monday, April 4, 2022 9:45 AM
> > On Sun, Apr 3, 2022 at 5:42 PM KP Singh <kpsingh@kernel.org> wrote:
> > >
> > > On Sat, Apr 2, 2022 at 1:55 AM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > ...
> > > >
> > > > > Pinning
> > > > > them to unreachable inodes intuitively looked the
> > > > > way to go for achieving the stated goal.
> > > >
> > > > We can consider inodes in bpffs that are not unlinkable by root
> > > > in the future, but certainly not for this use case.
> > >
> > > Can this not be already done by adding a BPF_LSM program to the
> > > inode_unlink LSM hook?
> > >
> >
> > Also, beside of the inode_unlink... and out of curiosity: making sysfs/bpffs/
> > readonly after pinning, then using bpf LSM hooks
> > sb_mount|remount|unmount...
> > family combining bpf() LSM hook... isn't this enough to:
> > 1. Restrict who can pin to bpffs without using a full MAC
> > 2. Restrict who can delete or unmount bpf filesystem
> >
> > ?
>
> I'm thinking to implement something like this.
>
> First, I add a new program flag called
> BPF_F_STOP_ONCONFIRM, which causes the ref count
> of the link to increase twice at creation time. In this way,
> user space cannot make the link disappear, unless a
> confirmation is explicitly sent via the bpf() system call.
>
> Another advantage is that other LSMs can decide
> whether or not they allow a program with this flag
> (in the bpf security hook).
>
> This would work regardless of the method used to
> load the eBPF program (user space or kernel space).
>
> Second, I extend the bpf() system call with a new
> subcommand, BPF_LINK_CONFIRM_STOP, which
> decreases the ref count for the link of the programs
> with the BPF_F_STOP_ONCONFIRM flag. I will also
> introduce a new security hook (something like
> security_link_confirm_stop), so that an LSM has the
> opportunity to deny the stop (the bpf security hook
> would not be sufficient to determine exactly for
> which link the confirmation is given, an LSM should
> be able to deny the stop for its own programs).
>
> What do you think?

Hack upon a hack? Makes no sense.

KP Singh April 5, 2022, midnight UTC | #11

On Tue, Apr 5, 2022 at 12:49 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Apr 4, 2022 at 10:21 AM Roberto Sassu <roberto.sassu@huawei.com> wrote:
> >
> > > From: Djalal Harouni [mailto:tixxdz@gmail.com]
> > > Sent: Monday, April 4, 2022 9:45 AM
> > > On Sun, Apr 3, 2022 at 5:42 PM KP Singh <kpsingh@kernel.org> wrote:
> > > >
> > > > On Sat, Apr 2, 2022 at 1:55 AM Alexei Starovoitov
> > > > <alexei.starovoitov@gmail.com> wrote:
> > > ...
> > > > >
> > > > > > Pinning
> > > > > > them to unreachable inodes intuitively looked the
> > > > > > way to go for achieving the stated goal.
> > > > >
> > > > > We can consider inodes in bpffs that are not unlinkable by root
> > > > > in the future, but certainly not for this use case.
> > > >
> > > > Can this not be already done by adding a BPF_LSM program to the
> > > > inode_unlink LSM hook?
> > > >
> > >
> > > Also, beside of the inode_unlink... and out of curiosity: making sysfs/bpffs/
> > > readonly after pinning, then using bpf LSM hooks
> > > sb_mount|remount|unmount...
> > > family combining bpf() LSM hook... isn't this enough to:
> > > 1. Restrict who can pin to bpffs without using a full MAC
> > > 2. Restrict who can delete or unmount bpf filesystem
> > >

I like this approach better, you will have to restrict the BPF, if you
want to implement MAC policy using BPF.

Can you please try implementing something using these hooks?

> > > ?
> >
> > I'm thinking to implement something like this.
> >
> > First, I add a new program flag called
> > BPF_F_STOP_ONCONFIRM, which causes the ref count
> > of the link to increase twice at creation time. In this way,
> > user space cannot make the link disappear, unless a
> > confirmation is explicitly sent via the bpf() system call.

I don't like this approach, this just sounds like an intentional
dangling reference, prone to refcounting errors and it does not
really solve the purpose you want to achieve.

And you will still need a policy around the BPF syscall,
so why not just use the LSM hooks as suggested above?

> >
> > Another advantage is that other LSMs can decide
> > whether or not they allow a program with this flag
> > (in the bpf security hook).
> >
> > This would work regardless of the method used to
> > load the eBPF program (user space or kernel space).
> >
> > Second, I extend the bpf() system call with a new
> > subcommand, BPF_LINK_CONFIRM_STOP, which
> > decreases the ref count for the link of the programs
> > with the BPF_F_STOP_ONCONFIRM flag. I will also
> > introduce a new security hook (something like
> > security_link_confirm_stop), so that an LSM has the
> > opportunity to deny the stop (the bpf security hook
> > would not be sufficient to determine exactly for
> > which link the confirmation is given, an LSM should
> > be able to deny the stop for its own programs).
> >
> > What do you think?
>
> Hack upon a hack? Makes no sense.

Casey Schaufler April 5, 2022, 2:49 p.m. UTC | #12

On 4/4/2022 10:20 AM, Roberto Sassu wrote:
>> From: Djalal Harouni [mailto:tixxdz@gmail.com]
>> Sent: Monday, April 4, 2022 9:45 AM
>> On Sun, Apr 3, 2022 at 5:42 PM KP Singh <kpsingh@kernel.org> wrote:
>>> On Sat, Apr 2, 2022 at 1:55 AM Alexei Starovoitov
>>> <alexei.starovoitov@gmail.com> wrote:
>> ...
>>>>> Pinning
>>>>> them to unreachable inodes intuitively looked the
>>>>> way to go for achieving the stated goal.
>>>> We can consider inodes in bpffs that are not unlinkable by root
>>>> in the future, but certainly not for this use case.
>>> Can this not be already done by adding a BPF_LSM program to the
>>> inode_unlink LSM hook?
>>>
>> Also, beside of the inode_unlink... and out of curiosity: making sysfs/bpffs/
>> readonly after pinning, then using bpf LSM hooks
>> sb_mount|remount|unmount...
>> family combining bpf() LSM hook... isn't this enough to:
>> 1. Restrict who can pin to bpffs without using a full MAC
>> 2. Restrict who can delete or unmount bpf filesystem
>>
>> ?
> I'm thinking to implement something like this.
>
> First, I add a new program flag called
> BPF_F_STOP_ONCONFIRM, which causes the ref count
> of the link to increase twice at creation time. In this way,
> user space cannot make the link disappear, unless a
> confirmation is explicitly sent via the bpf() system call.
>
> Another advantage is that other LSMs can decide
> whether or not they allow a program with this flag
> (in the bpf security hook).
>
> This would work regardless of the method used to
> load the eBPF program (user space or kernel space).
>
> Second, I extend the bpf() system call with a new
> subcommand, BPF_LINK_CONFIRM_STOP, which
> decreasres the ref count for the link of the programs
> with the BPF_F_STOP_ONCONFIRM flag. I will also
> introduce a new security hook (something like
> security_link_confirm_stop), so that an LSM has the
> opportunity to deny the stop (the bpf security hook
> would not be sufficient to determine exactly for
> which link the confirmation is given, an LSM should
> be able to deny the stop for its own programs).

Would you please stop referring to a set of eBPF programs
loaded into the BPF LSM as an LSM? Call it a BPF security
module (BSM) if you must use an abbreviation. An LSM is a
provider of security_ hooks. In your case that is BPF. When
you call the set of eBPF programs an LSM it is like calling
an SELinux policy an LSM.

>
> What do you think?
>
> Thanks
>
> Roberto
>
> HUAWEI TECHNOLOGIES Duesseldorf GmbH, HRB 56063
> Managing Director: Li Peng, Zhong Ronghua

Roberto Sassu April 5, 2022, 3:29 p.m. UTC | #13

> From: Casey Schaufler [mailto:casey@schaufler-ca.com]
> Sent: Tuesday, April 5, 2022 4:50 PM
> On 4/4/2022 10:20 AM, Roberto Sassu wrote:
> >> From: Djalal Harouni [mailto:tixxdz@gmail.com]
> >> Sent: Monday, April 4, 2022 9:45 AM
> >> On Sun, Apr 3, 2022 at 5:42 PM KP Singh <kpsingh@kernel.org> wrote:
> >>> On Sat, Apr 2, 2022 at 1:55 AM Alexei Starovoitov
> >>> <alexei.starovoitov@gmail.com> wrote:
> >> ...
> >>>>> Pinning
> >>>>> them to unreachable inodes intuitively looked the
> >>>>> way to go for achieving the stated goal.
> >>>> We can consider inodes in bpffs that are not unlinkable by root
> >>>> in the future, but certainly not for this use case.
> >>> Can this not be already done by adding a BPF_LSM program to the
> >>> inode_unlink LSM hook?
> >>>
> >> Also, beside of the inode_unlink... and out of curiosity: making
> sysfs/bpffs/
> >> readonly after pinning, then using bpf LSM hooks
> >> sb_mount|remount|unmount...
> >> family combining bpf() LSM hook... isn't this enough to:
> >> 1. Restrict who can pin to bpffs without using a full MAC
> >> 2. Restrict who can delete or unmount bpf filesystem
> >>
> >> ?
> > I'm thinking to implement something like this.
> >
> > First, I add a new program flag called
> > BPF_F_STOP_ONCONFIRM, which causes the ref count
> > of the link to increase twice at creation time. In this way,
> > user space cannot make the link disappear, unless a
> > confirmation is explicitly sent via the bpf() system call.
> >
> > Another advantage is that other LSMs can decide
> > whether or not they allow a program with this flag
> > (in the bpf security hook).
> >
> > This would work regardless of the method used to
> > load the eBPF program (user space or kernel space).
> >
> > Second, I extend the bpf() system call with a new
> > subcommand, BPF_LINK_CONFIRM_STOP, which
> > decreasres the ref count for the link of the programs
> > with the BPF_F_STOP_ONCONFIRM flag. I will also
> > introduce a new security hook (something like
> > security_link_confirm_stop), so that an LSM has the
> > opportunity to deny the stop (the bpf security hook
> > would not be sufficient to determine exactly for
> > which link the confirmation is given, an LSM should
> > be able to deny the stop for its own programs).
> 
> Would you please stop referring to a set of eBPF programs
> loaded into the BPF LSM as an LSM? Call it a BPF security
> module (BSM) if you must use an abbreviation. An LSM is a
> provider of security_ hooks. In your case that is BPF. When
> you call the set of eBPF programs an LSM it is like calling
> an SELinux policy an LSM.

An eBPF program could be a provider of security_ hooks
too. The bpf LSM is an aggregator, similarly to your
infrastructure to manage built-in LSMs. Maybe, calling
it second-level LSM or secondary LSM would better
represent this new class.

The only differences are the registration method, (SEC
directive instead of DEFINE_LSM), and what the hook
implementation can access.

The implementation of a security_ hook via eBPF can
follow the same structure of built-in LSMs, i.e. it can be
uniquely responsible for enforcing and be policy-agnostic,
and can retrieve the decisions based on a policy from a
component implemented somewhere else.

Hopefully, I understood the basic principles correctly.
I let the eBPF maintainers comment on this.

Thanks

Roberto

HUAWEI TECHNOLOGIES Duesseldorf GmbH, HRB 56063
Managing Director: Li Peng, Zhong Ronghua

> > What do you think?
> >
> > Thanks
> >
> > Roberto
> >
> > HUAWEI TECHNOLOGIES Duesseldorf GmbH, HRB 56063
> > Managing Director: Li Peng, Zhong Ronghua

Casey Schaufler April 5, 2022, 4:21 p.m. UTC | #14

On 4/5/2022 8:29 AM, Roberto Sassu wrote:
>> From: Casey Schaufler [mailto:casey@schaufler-ca.com]
>> Sent: Tuesday, April 5, 2022 4:50 PM
>> On 4/4/2022 10:20 AM, Roberto Sassu wrote:
>>>> From: Djalal Harouni [mailto:tixxdz@gmail.com]
>>>> Sent: Monday, April 4, 2022 9:45 AM
>>>> On Sun, Apr 3, 2022 at 5:42 PM KP Singh <kpsingh@kernel.org> wrote:
>>>>> On Sat, Apr 2, 2022 at 1:55 AM Alexei Starovoitov
>>>>> <alexei.starovoitov@gmail.com> wrote:
>>>> ...
>>>>>>> Pinning
>>>>>>> them to unreachable inodes intuitively looked the
>>>>>>> way to go for achieving the stated goal.
>>>>>> We can consider inodes in bpffs that are not unlinkable by root
>>>>>> in the future, but certainly not for this use case.
>>>>> Can this not be already done by adding a BPF_LSM program to the
>>>>> inode_unlink LSM hook?
>>>>>
>>>> Also, beside of the inode_unlink... and out of curiosity: making
>> sysfs/bpffs/
>>>> readonly after pinning, then using bpf LSM hooks
>>>> sb_mount|remount|unmount...
>>>> family combining bpf() LSM hook... isn't this enough to:
>>>> 1. Restrict who can pin to bpffs without using a full MAC
>>>> 2. Restrict who can delete or unmount bpf filesystem
>>>>
>>>> ?
>>> I'm thinking to implement something like this.
>>>
>>> First, I add a new program flag called
>>> BPF_F_STOP_ONCONFIRM, which causes the ref count
>>> of the link to increase twice at creation time. In this way,
>>> user space cannot make the link disappear, unless a
>>> confirmation is explicitly sent via the bpf() system call.
>>>
>>> Another advantage is that other LSMs can decide
>>> whether or not they allow a program with this flag
>>> (in the bpf security hook).
>>>
>>> This would work regardless of the method used to
>>> load the eBPF program (user space or kernel space).
>>>
>>> Second, I extend the bpf() system call with a new
>>> subcommand, BPF_LINK_CONFIRM_STOP, which
>>> decreasres the ref count for the link of the programs
>>> with the BPF_F_STOP_ONCONFIRM flag. I will also
>>> introduce a new security hook (something like
>>> security_link_confirm_stop), so that an LSM has the
>>> opportunity to deny the stop (the bpf security hook
>>> would not be sufficient to determine exactly for
>>> which link the confirmation is given, an LSM should
>>> be able to deny the stop for its own programs).
>> Would you please stop referring to a set of eBPF programs
>> loaded into the BPF LSM as an LSM? Call it a BPF security
>> module (BSM) if you must use an abbreviation. An LSM is a
>> provider of security_ hooks. In your case that is BPF. When
>> you call the set of eBPF programs an LSM it is like calling
>> an SELinux policy an LSM.
> An eBPF program could be a provider of security_ hooks
> too.

No, it can't. If I look in /sys/kernel/security/lsm what
you see is "bpf". The LSM is BPF. What BPF does in its
hooks is up to it and its responsibility.

>   The bpf LSM is an aggregator, similarly to your
> infrastructure to manage built-in LSMs. Maybe, calling
> it second-level LSM or secondary LSM would better
> represent this new class.

It isn't an LSM, and adding a qualifier doesn't make it
one and only adds to the confusion.

> The only differences are the registration method, (SEC
> directive instead of DEFINE_LSM), and what the hook
> implementation can access.

Those two things pretty well define what an LSM is.

> The implementation of a security_ hook via eBPF can
> follow the same structure of built-in LSMs, i.e. it can be
> uniquely responsible for enforcing and be policy-agnostic,
> and can retrieve the decisions based on a policy from a
> component implemented somewhere else.

The BPF LSM provides mechanism. The eBPF programs provide policy.

>
> Hopefully, I understood the basic principles correctly.
> I let the eBPF maintainers comment on this.
>
> Thanks
>
> Roberto
>
> HUAWEI TECHNOLOGIES Duesseldorf GmbH, HRB 56063
> Managing Director: Li Peng, Zhong Ronghua
>
>>> What do you think?
>>>
>>> Thanks
>>>
>>> Roberto
>>>
>>> HUAWEI TECHNOLOGIES Duesseldorf GmbH, HRB 56063
>>> Managing Director: Li Peng, Zhong Ronghua

KP Singh April 5, 2022, 4:37 p.m. UTC | #15

On Tue, Apr 5, 2022 at 6:22 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> On 4/5/2022 8:29 AM, Roberto Sassu wrote:
> >> From: Casey Schaufler [mailto:casey@schaufler-ca.com]
> >> Sent: Tuesday, April 5, 2022 4:50 PM
> >> On 4/4/2022 10:20 AM, Roberto Sassu wrote:
> >>>> From: Djalal Harouni [mailto:tixxdz@gmail.com]
> >>>> Sent: Monday, April 4, 2022 9:45 AM
> >>>> On Sun, Apr 3, 2022 at 5:42 PM KP Singh <kpsingh@kernel.org> wrote:
> >>>>> On Sat, Apr 2, 2022 at 1:55 AM Alexei Starovoitov
> >>>>> <alexei.starovoitov@gmail.com> wrote:
> >>>> ...
> >>>>>>> Pinning
> >>>>>>> them to unreachable inodes intuitively looked the
> >>>>>>> way to go for achieving the stated goal.
> >>>>>> We can consider inodes in bpffs that are not unlinkable by root
> >>>>>> in the future, but certainly not for this use case.
> >>>>> Can this not be already done by adding a BPF_LSM program to the
> >>>>> inode_unlink LSM hook?
> >>>>>
> >>>> Also, beside of the inode_unlink... and out of curiosity: making
> >> sysfs/bpffs/
> >>>> readonly after pinning, then using bpf LSM hooks
> >>>> sb_mount|remount|unmount...
> >>>> family combining bpf() LSM hook... isn't this enough to:
> >>>> 1. Restrict who can pin to bpffs without using a full MAC
> >>>> 2. Restrict who can delete or unmount bpf filesystem
> >>>>
> >>>> ?
> >>> I'm thinking to implement something like this.
> >>>
> >>> First, I add a new program flag called
> >>> BPF_F_STOP_ONCONFIRM, which causes the ref count
> >>> of the link to increase twice at creation time. In this way,
> >>> user space cannot make the link disappear, unless a
> >>> confirmation is explicitly sent via the bpf() system call.
> >>>
> >>> Another advantage is that other LSMs can decide
> >>> whether or not they allow a program with this flag
> >>> (in the bpf security hook).
> >>>
> >>> This would work regardless of the method used to
> >>> load the eBPF program (user space or kernel space).
> >>>
> >>> Second, I extend the bpf() system call with a new
> >>> subcommand, BPF_LINK_CONFIRM_STOP, which
> >>> decreasres the ref count for the link of the programs
> >>> with the BPF_F_STOP_ONCONFIRM flag. I will also
> >>> introduce a new security hook (something like
> >>> security_link_confirm_stop), so that an LSM has the
> >>> opportunity to deny the stop (the bpf security hook
> >>> would not be sufficient to determine exactly for
> >>> which link the confirmation is given, an LSM should
> >>> be able to deny the stop for its own programs).
> >> Would you please stop referring to a set of eBPF programs
> >> loaded into the BPF LSM as an LSM? Call it a BPF security
> >> module (BSM) if you must use an abbreviation. An LSM is a
> >> provider of security_ hooks. In your case that is BPF. When
> >> you call the set of eBPF programs an LSM it is like calling
> >> an SELinux policy an LSM.
> > An eBPF program could be a provider of security_ hooks
> > too.
>
> No, it can't. If I look in /sys/kernel/security/lsm what
> you see is "bpf". The LSM is BPF. What BPF does in its
> hooks is up to it and its responsibility.
>
> >   The bpf LSM is an aggregator, similarly to your
> > infrastructure to manage built-in LSMs. Maybe, calling
> > it second-level LSM or secondary LSM would better
> > represent this new class.
>
> It isn't an LSM, and adding a qualifier doesn't make it
> one and only adds to the confusion.
>
> > The only differences are the registration method, (SEC
> > directive instead of DEFINE_LSM), and what the hook
> > implementation can access.
>
> Those two things pretty well define what an LSM is.
>
> > The implementation of a security_ hook via eBPF can
> > follow the same structure of built-in LSMs, i.e. it can be
> > uniquely responsible for enforcing and be policy-agnostic,
> > and can retrieve the decisions based on a policy from a
> > component implemented somewhere else.
>
> The BPF LSM provides mechanism. The eBPF programs provide policy.

Yeah, let's stick what we call an LSM in the kernel, Here,
"bpf" is the LSM like selinux,apparmor and this is what you set in
CONFIG_LSM or pass on cmdline as lsm= and can be seen
in /sys/kernel/security/lsm

Calling your BPF programs an LSM will just confuse people.

>
> >
> > Hopefully, I understood the basic principles correctly.
> > I let the eBPF maintainers comment on this.
> >
> > Thanks
> >
> > Roberto
> >
> > HUAWEI TECHNOLOGIES Duesseldorf GmbH, HRB 56063
> > Managing Director: Li Peng, Zhong Ronghua
> >
> >>> What do you think?
> >>>
> >>> Thanks
> >>>
> >>> Roberto
> >>>
> >>> HUAWEI TECHNOLOGIES Duesseldorf GmbH, HRB 56063
> >>> Managing Director: Li Peng, Zhong Ronghua

[00/18] bpf: Secure and authenticated preloading of eBPF programs

Message

Comments