[v2,bpf-next,0/9] add new acquire/release BPF kfuncs

Message ID	cover.1709675979.git.mattbobrowski@google.com (mailing list archive)
Headers	show Received: from mail-ej1-f41.google.com (mail-ej1-f41.google.com [209.85.218.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C32B252F6B for <linux-fsdevel@vger.kernel.org>; Wed, 6 Mar 2024 07:39:22 +0000 (UTC) Date: Wed, 6 Mar 2024 07:39:14 +0000 From: Matt Bobrowski <mattbobrowski@google.com> To: bpf@vger.kernel.org Cc: ast@kernel.org, andrii@kernel.org, kpsingh@google.com, jannh@google.com, jolsa@kernel.org, daniel@iogearbox.net, brauner@kernel.org, torvalds@linux-foundation.org, linux-fsdevel@vger.kernel.org Subject: [PATCH v2 bpf-next 0/9] add new acquire/release BPF kfuncs Message-ID: <cover.1709675979.git.mattbobrowski@google.com> Precedence: bulk MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline
Series	add new acquire/release BPF kfuncs \| expand [v2,bpf-next,0/9] add new acquire/release BPF kfuncs [v2,bpf-next,1/9] bpf: rename fs_kfunc_set_ids to lsm_kfunc_set_ids [v2,bpf-next,2/9] bpf: add new acquire/release BPF kfuncs for mm_struct [v2,bpf-next,3/9] bpf/selftests: add selftests for mm_struct acquire/release BPF kfuncs [v2,bpf-next,4/9] bpf: add new acquire/release based BPF kfuncs for exe_file [v2,bpf-next,5/9] bpf/selftests: add selftests for exe_file acquire/release BPF kfuncs [v2,bpf-next,6/9] bpf: add acquire/release based BPF kfuncs for fs_struct's paths [v2,bpf-next,7/9] bpf/selftests: add selftests for root/pwd path based BPF kfuncs [v2,bpf-next,9/9] bpf/selftests: adapt selftests test_d_path for BPF kfunc bpf_path_d_path() [v2,bpf-next,8/9] bpf: add trusted d_path() based BPF kfunc bpf_path_d_path()

Matt Bobrowski March 6, 2024, 7:39 a.m. UTC

G'day All,

The original cover letter providing background context and motivating
factors around the needs for the BPF kfuncs introduced within this
patch series can be found here [0], so please do reference that if
need be.

Notably, one of the main contention points within v1 of this patch
series was that we were effectively leaning on some preexisting
in-kernel APIs such as get_task_exe_file() and get_mm_exe_file()
within some of the newly introduced BPF kfuncs. As noted in my
response here [1] though, I struggle to understand the technical
reasoning behind why exposing such in-kernel helpers, specifically
only to BPF LSM program types in the form of BPF kfuncs, is inherently
a terrible idea. So, until someone provides me with a sound technical
explanation as to why this cannot or should not be done, I'll continue
to lean on them. The alternative is to reimplement the necessary
in-kernel APIs within the BPF kfuncs, but that's just nonsensical IMO.

Changes since v1:
   * Dropped the probe-read related patches [2, 3], which focused on
     retroactively fixing bpf_d_path() such that it's susceptability
     to memory corruption issues is drastically reduced. Rightfully so
     though, it was deemed that reimplementing a semi-functional
     variant of d_path() that was effectively backed by
     copy_from_kernel_nofault() is suboptimal.

[0] https://lore.kernel.org/bpf/cover.1708377880.git.mattbobrowski@google.com/
[1] https://lore.kernel.org/bpf/ZdX83H7rTEwMYvs2@google.com/
[2] https://lore.kernel.org/bpf/5643840bd57d0c2345635552ae228dfb2ed3428c.1708377880.git.mattbobrowski@google.com/
[3] https://lore.kernel.org/bpf/18c7b587d43bbc7e80593bf51ea9d3eb99e47bc1.1708377880.git.mattbobrowski@google.com/

Matt Bobrowski (9):
  bpf: rename fs_kfunc_set_ids to lsm_kfunc_set_ids
  bpf: add new acquire/release BPF kfuncs for mm_struct
  bpf/selftests: add selftests for mm_struct acquire/release BPF kfuncs
  bpf: add new acquire/release based BPF kfuncs for exe_file
  bpf/selftests: add selftests for exe_file acquire/release BPF kfuncs
  bpf: add acquire/release based BPF kfuncs for fs_struct's paths
  bpf/selftests: add selftests for root/pwd path based BPF kfuncs
  bpf: add trusted d_path() based BPF kfunc bpf_path_d_path()
  bpf/selftests: adapt selftests test_d_path for BPF kfunc
    bpf_path_d_path()

 kernel/trace/bpf_trace.c                      | 248 +++++++++++++++++-
 .../testing/selftests/bpf/prog_tests/d_path.c |  80 ++++++
 .../selftests/bpf/prog_tests/exe_file_kfunc.c |  49 ++++
 .../selftests/bpf/prog_tests/mm_kfunc.c       |  48 ++++
 .../selftests/bpf/prog_tests/path_kfunc.c     |  48 ++++
 .../selftests/bpf/progs/d_path_common.h       |  35 +++
 .../bpf/progs/d_path_kfunc_failure.c          |  66 +++++
 .../bpf/progs/d_path_kfunc_success.c          |  25 ++
 .../bpf/progs/exe_file_kfunc_common.h         |  23 ++
 .../bpf/progs/exe_file_kfunc_failure.c        | 181 +++++++++++++
 .../bpf/progs/exe_file_kfunc_success.c        |  52 ++++
 .../selftests/bpf/progs/mm_kfunc_common.h     |  19 ++
 .../selftests/bpf/progs/mm_kfunc_failure.c    | 103 ++++++++
 .../selftests/bpf/progs/mm_kfunc_success.c    |  30 +++
 .../selftests/bpf/progs/path_kfunc_common.h   |  20 ++
 .../selftests/bpf/progs/path_kfunc_failure.c  | 114 ++++++++
 .../selftests/bpf/progs/path_kfunc_success.c  |  30 +++
 .../testing/selftests/bpf/progs/test_d_path.c |  20 +-
 .../bpf/progs/test_d_path_check_rdonly_mem.c  |   8 +-
 .../bpf/progs/test_d_path_check_types.c       |   8 +-
 20 files changed, 1160 insertions(+), 47 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/exe_file_kfunc.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/mm_kfunc.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/path_kfunc.c
 create mode 100644 tools/testing/selftests/bpf/progs/d_path_common.h
 create mode 100644 tools/testing/selftests/bpf/progs/d_path_kfunc_failure.c
 create mode 100644 tools/testing/selftests/bpf/progs/d_path_kfunc_success.c
 create mode 100644 tools/testing/selftests/bpf/progs/exe_file_kfunc_common.h
 create mode 100644 tools/testing/selftests/bpf/progs/exe_file_kfunc_failure.c
 create mode 100644 tools/testing/selftests/bpf/progs/exe_file_kfunc_success.c
 create mode 100644 tools/testing/selftests/bpf/progs/mm_kfunc_common.h
 create mode 100644 tools/testing/selftests/bpf/progs/mm_kfunc_failure.c
 create mode 100644 tools/testing/selftests/bpf/progs/mm_kfunc_success.c
 create mode 100644 tools/testing/selftests/bpf/progs/path_kfunc_common.h
 create mode 100644 tools/testing/selftests/bpf/progs/path_kfunc_failure.c
 create mode 100644 tools/testing/selftests/bpf/progs/path_kfunc_success.c

Christian Brauner March 6, 2024, 11:21 a.m. UTC | #1

On Wed, Mar 06, 2024 at 07:39:14AM +0000, Matt Bobrowski wrote:
> G'day All,
> 
> The original cover letter providing background context and motivating
> factors around the needs for the BPF kfuncs introduced within this
> patch series can be found here [0], so please do reference that if
> need be.
> 
> Notably, one of the main contention points within v1 of this patch
> series was that we were effectively leaning on some preexisting
> in-kernel APIs such as get_task_exe_file() and get_mm_exe_file()
> within some of the newly introduced BPF kfuncs. As noted in my
> response here [1] though, I struggle to understand the technical
> reasoning behind why exposing such in-kernel helpers, specifically
> only to BPF LSM program types in the form of BPF kfuncs, is inherently
> a terrible idea. So, until someone provides me with a sound technical
> explanation as to why this cannot or should not be done, I'll continue
> to lean on them. The alternative is to reimplement the necessary
> in-kernel APIs within the BPF kfuncs, but that's just nonsensical IMO.

You may lean as much as you like. What I've reacted to is that you've
(not you specifically, I'm sure) messed up. You've exposed d_path() to
users  without understanding that it wasn't safe apparently.

And now we get patches that use the self-inflicted brokeness as an
argument to expose a bunch of other low-level helpers to fix that.

The fact that it's "just bpf LSM" programs doesn't alleviate any
concerns whatsoever. Not just because that is just an entry vector but
also because we have LSMs induced API abuse that we only ever get to see
the fallout from when we refactor apis and then it causes pain for the vfs.

I'll take another look at the proposed helpers you need as bpf kfuncs
and I'll give my best not to be overly annoyed by all of this. I have no
intention of not helping you quite the opposite but I'm annoyed that
we're here in the first place.

What I want is to stop this madness of exposing stuff to users without
fully understanding it's semantics and required guarantees.

Christian Brauner March 6, 2024, 12:13 p.m. UTC | #2

On Wed, Mar 06, 2024 at 12:21:28PM +0100, Christian Brauner wrote:
> On Wed, Mar 06, 2024 at 07:39:14AM +0000, Matt Bobrowski wrote:
> > G'day All,
> > 
> > The original cover letter providing background context and motivating
> > factors around the needs for the BPF kfuncs introduced within this
> > patch series can be found here [0], so please do reference that if
> > need be.
> > 
> > Notably, one of the main contention points within v1 of this patch
> > series was that we were effectively leaning on some preexisting
> > in-kernel APIs such as get_task_exe_file() and get_mm_exe_file()
> > within some of the newly introduced BPF kfuncs. As noted in my
> > response here [1] though, I struggle to understand the technical
> > reasoning behind why exposing such in-kernel helpers, specifically
> > only to BPF LSM program types in the form of BPF kfuncs, is inherently
> > a terrible idea. So, until someone provides me with a sound technical
> > explanation as to why this cannot or should not be done, I'll continue
> > to lean on them. The alternative is to reimplement the necessary
> > in-kernel APIs within the BPF kfuncs, but that's just nonsensical IMO.
> 
> You may lean as much as you like. What I've reacted to is that you've
> (not you specifically, I'm sure) messed up. You've exposed d_path() to
> users  without understanding that it wasn't safe apparently.
> 
> And now we get patches that use the self-inflicted brokeness as an
> argument to expose a bunch of other low-level helpers to fix that.
> 
> The fact that it's "just bpf LSM" programs doesn't alleviate any
> concerns whatsoever. Not just because that is just an entry vector but
> also because we have LSMs induced API abuse that we only ever get to see
> the fallout from when we refactor apis and then it causes pain for the vfs.
> 
> I'll take another look at the proposed helpers you need as bpf kfuncs
> and I'll give my best not to be overly annoyed by all of this. I have no
> intention of not helping you quite the opposite but I'm annoyed that
> we're here in the first place.
> 
> What I want is to stop this madness of exposing stuff to users without
> fully understanding it's semantics and required guarantees.

So, looking at this series you're now asking us to expose:

(1) mmgrab()
(2) mmput()
(3) fput()
(5) get_mm_exe_file()
(4) get_task_exe_file()
(7) get_task_fs_pwd()
(6) get_task_fs_root()
(8) path_get()
(9) path_put()

in one go and the justification in all patches amounts to "This is
common in some BPF LSM programs".

So, broken stuff got exposed to users or at least a broken BPF LSM
program was written somewhere out there that is susceptible to UAFs
becauase you didn't restrict bpf_d_path() to trusted pointer arguments.
So you're now scrambling to fix this by asking for a bunch of low-level
exports.

What is the guarantee that you don't end up writing another BPF LSM that
abuses these exports in a way that causes even more issues and then
someone else comes back asking for the next round of bpf funcs to be
exposed to fix it.

The difference between a regular LSM asking about this and a BPF LSM
program is that we can see in the hook implementation what the LSM
intends to do with this and we can judge whether that's safe or not.

Here you're asking us to do this blindfolded. So I feel we can't really
ACK a series such as this without actually seeing what is intended to be
done with all these helpers that you want as kfuncs. Not after the
previous brokenness.

In any case, you need separate ACKs from mm for the mmgrab()/mmput()
kfuncs as well.

Because really, all I see immediately supportable is the addition of a
safe variant of bpf making use of the trusted pointer argument
constraint:

[PATCH v2 bpf-next 8/9] bpf: add trusted d_path() based BPF kfunc bpf_path_d_path()

Paul Moore March 6, 2024, 9:44 p.m. UTC | #3

On Wed, Mar 6, 2024 at 7:14 AM Christian Brauner <brauner@kernel.org> wrote:
> On Wed, Mar 06, 2024 at 12:21:28PM +0100, Christian Brauner wrote:
> > On Wed, Mar 06, 2024 at 07:39:14AM +0000, Matt Bobrowski wrote:
> > > G'day All,
> > >
> > > The original cover letter providing background context and motivating
> > > factors around the needs for the BPF kfuncs introduced within this
> > > patch series can be found here [0], so please do reference that if
> > > need be.
> > >
> > > Notably, one of the main contention points within v1 of this patch
> > > series was that we were effectively leaning on some preexisting
> > > in-kernel APIs such as get_task_exe_file() and get_mm_exe_file()
> > > within some of the newly introduced BPF kfuncs. As noted in my
> > > response here [1] though, I struggle to understand the technical
> > > reasoning behind why exposing such in-kernel helpers, specifically
> > > only to BPF LSM program types in the form of BPF kfuncs, is inherently
> > > a terrible idea. So, until someone provides me with a sound technical
> > > explanation as to why this cannot or should not be done, I'll continue
> > > to lean on them. The alternative is to reimplement the necessary
> > > in-kernel APIs within the BPF kfuncs, but that's just nonsensical IMO.
> >
> > You may lean as much as you like. What I've reacted to is that you've
> > (not you specifically, I'm sure) messed up. You've exposed d_path() to
> > users  without understanding that it wasn't safe apparently.
> >
> > And now we get patches that use the self-inflicted brokeness as an
> > argument to expose a bunch of other low-level helpers to fix that.
> >
> > The fact that it's "just bpf LSM" programs doesn't alleviate any
> > concerns whatsoever. Not just because that is just an entry vector but
> > also because we have LSMs induced API abuse that we only ever get to see
> > the fallout from when we refactor apis and then it causes pain for the vfs.

Let's not start throwing stones at the in-tree, native LSM folks;
instead let's just all admit that it's often a challenge working
across subsystem lines, but we usually find a way because we all want
the kernel to move forward.  We've all got our grudges and scars,
let's not start poking each other with sticks.

> > I'll take another look at the proposed helpers you need as bpf kfuncs
> > and I'll give my best not to be overly annoyed by all of this. I have no
> > intention of not helping you quite the opposite but I'm annoyed that
> > we're here in the first place.
> >
> > What I want is to stop this madness of exposing stuff to users without
> > fully understanding it's semantics and required guarantees.
>
> So, looking at this series you're now asking us to expose:
>
> (1) mmgrab()
> (2) mmput()
> (3) fput()
> (5) get_mm_exe_file()
> (4) get_task_exe_file()
> (7) get_task_fs_pwd()
> (6) get_task_fs_root()
> (8) path_get()
> (9) path_put()
>
> in one go and the justification in all patches amounts to "This is
> common in some BPF LSM programs".
>
> So, broken stuff got exposed to users or at least a broken BPF LSM
> program was written somewhere out there that is susceptible to UAFs
> becauase you didn't restrict bpf_d_path() to trusted pointer arguments.
> So you're now scrambling to fix this by asking for a bunch of low-level
> exports.
>
> What is the guarantee that you don't end up writing another BPF LSM that
> abuses these exports in a way that causes even more issues and then
> someone else comes back asking for the next round of bpf funcs to be
> exposed to fix it.
>
> The difference between a regular LSM asking about this and a BPF LSM
> program is that we can see in the hook implementation what the LSM
> intends to do with this and we can judge whether that's safe or not.

I wish things were different, but the current reality is that from a
practical perspective BPF LSMs are not much different than
out-of-tree, native LSMs.  There is no requirement, or even
convention, for BPF LSMs to post their designs, docs, or
implementations on the LSM list for review like we do with other LSMs;
maybe it has happened once or twice, but I can't remember the last
time I saw a serious BPF LSM posted to the LSM list.  Even if someone
did do The Right Thing with their BPF LSM and proposed it for review
just like a native LSM, there is no guarantee someone else couldn't
fork that LSM and do something wrong (intentional or not).

Unless something changes, I'm left to treat BPF LSMs the same as any
other out-of-tree kernel code: I'm not going to make life difficult
for a /out-of-tree/BPF/ LSM simply because it is /out-of-tree/BPF/,
but I'm not going to do anything to negatively impact the current, or
future, state of the in-tree LSMs simply to appease the
/out-of-tree/BPF/ LSMs.

Alexei Starovoitov March 7, 2024, 4:05 a.m. UTC | #4

On Wed, Mar 6, 2024 at 4:13 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Wed, Mar 06, 2024 at 12:21:28PM +0100, Christian Brauner wrote:
> > On Wed, Mar 06, 2024 at 07:39:14AM +0000, Matt Bobrowski wrote:
> > > G'day All,
> > >
> > > The original cover letter providing background context and motivating
> > > factors around the needs for the BPF kfuncs introduced within this
> > > patch series can be found here [0], so please do reference that if
> > > need be.
> > >
> > > Notably, one of the main contention points within v1 of this patch
> > > series was that we were effectively leaning on some preexisting
> > > in-kernel APIs such as get_task_exe_file() and get_mm_exe_file()
> > > within some of the newly introduced BPF kfuncs. As noted in my
> > > response here [1] though, I struggle to understand the technical
> > > reasoning behind why exposing such in-kernel helpers, specifically
> > > only to BPF LSM program types in the form of BPF kfuncs, is inherently
> > > a terrible idea. So, until someone provides me with a sound technical
> > > explanation as to why this cannot or should not be done, I'll continue
> > > to lean on them. The alternative is to reimplement the necessary
> > > in-kernel APIs within the BPF kfuncs, but that's just nonsensical IMO.
> >
> > You may lean as much as you like. What I've reacted to is that you've
> > (not you specifically, I'm sure) messed up. You've exposed d_path() to
> > users  without understanding that it wasn't safe apparently.
> >
> > And now we get patches that use the self-inflicted brokeness as an
> > argument to expose a bunch of other low-level helpers to fix that.
> >
> > The fact that it's "just bpf LSM" programs doesn't alleviate any
> > concerns whatsoever. Not just because that is just an entry vector but
> > also because we have LSMs induced API abuse that we only ever get to see
> > the fallout from when we refactor apis and then it causes pain for the vfs.
> >
> > I'll take another look at the proposed helpers you need as bpf kfuncs
> > and I'll give my best not to be overly annoyed by all of this. I have no
> > intention of not helping you quite the opposite but I'm annoyed that
> > we're here in the first place.
> >
> > What I want is to stop this madness of exposing stuff to users without
> > fully understanding it's semantics and required guarantees.
>
> So, looking at this series you're now asking us to expose:
>
> (1) mmgrab()
> (2) mmput()
> (3) fput()
> (5) get_mm_exe_file()
> (4) get_task_exe_file()
> (7) get_task_fs_pwd()
> (6) get_task_fs_root()
> (8) path_get()
> (9) path_put()
>
> in one go and the justification in all patches amounts to "This is
> common in some BPF LSM programs".
>
> So, broken stuff got exposed to users or at least a broken BPF LSM
> program was written somewhere out there that is susceptible to UAFs
> becauase you didn't restrict bpf_d_path() to trusted pointer arguments.
> So you're now scrambling to fix this by asking for a bunch of low-level
> exports.
>
> What is the guarantee that you don't end up writing another BPF LSM that
> abuses these exports in a way that causes even more issues and then
> someone else comes back asking for the next round of bpf funcs to be
> exposed to fix it.

There is no guarantee.
We made a safety mistake with bpf_d_path() though
we restricted it very tight. And that UAF is tricky.
I'm still amazed how Jann managed to find it.
We all make mistakes.
It's not the first one and not going to be the last.

What Matt is doing is an honest effort to fix it
in the upstream kernel for all bpf users to benefit.
He could have done it with a kernel module.
The above "low level" helpers are all either static inline
in .h or they call EXPORT_SYMBOL[_GPL] or simply inc/dec refcnt.

One can implement such kfuncs in an out of tree kernel module
and be done with it, but in the bpf community we encourage
everyone to upstream their work.

So kudos to Matt for working on these patches.

His bpf-lsm use case is not special.
It just needs a safe way to call d_path.
Let's look at one of the test in patch 9:

+SEC("lsm.s/file_open")
+__failure __msg("R1 must be referenced or trusted")
+int BPF_PROG(path_d_path_kfunc_untrusted_from_current)
+{
+       struct path *pwd;
+       struct task_struct *current;
+
+       current = bpf_get_current_task_btf();
+       /* Walking a trusted pointer returned from bpf_get_current_task_btf()
+        * yields and untrusted pointer. */
+       pwd = &current->fs->pwd;
+       bpf_path_d_path(pwd, buf, sizeof(buf));
+       return 0;
+}

This test checks that such an access pattern is unsafe and
the verifier will catch it.

To make it safe one needs to do:

  current = bpf_get_current_task_btf();
  pwd = bpf_get_task_fs_pwd(current);
  if (!pwd) // error path
  bpf_path_d_path(pwd, ...);
  bpf_put_path(pwd);

these are the kfuncs from patch 6.

And notice that they have KF_ACQUIRE and KF_RELEASE flags.

They tell the verifier to recognize that bpf_get_task_fs_pwd()
kfunc acquires 'struct path *'.
Meaning that bpf prog cannot just return without releasing it.

The bpf prog cannot use-after-free that 'pwd' either
after it was released by bpf_put_path(pwd).

The verifier static analysis catches such UAF-s.
It didn't catch Jann's UAF earlier, because we didn't have
these kfuncs! Hence the fix is to add such kfuncs with
acquire/release semantics.

> The difference between a regular LSM asking about this and a BPF LSM
> program is that we can see in the hook implementation what the LSM
> intends to do with this and we can judge whether that's safe or not.

See above example.
The verifier is doing a much better job than humans when it comes
to safety.

> Here you're asking us to do this blindfolded.

If you don't trust the verifier to enforce safety,
you shouldn't trust Rust compiler to generate safe code either.

In another reply you've compared kfuncs to EXPORT_SYMBOL_GPL.
Such analogy is correct to some extent,
but unlike exported symbols kfuncs are restricted to particular
program types. They don't accept arbitrary pointers,
and reference count is enforced as well.
That's a pretty big difference vs EXPORT_SYMBOL.

> Because really, all I see immediately supportable is the addition of a
> safe variant of bpf making use of the trusted pointer argument
> constraint:
>
> [PATCH v2 bpf-next 8/9] bpf: add trusted d_path() based BPF kfunc bpf_path_d_path()

This kfunc alone is useful in limited cases.
See above example. For simple LSM file_open hook
the bpf prog needs bpf_get_task_fs_pwd() to use this d_path kfunc.

Christian Brauner March 7, 2024, 9:54 a.m. UTC | #5

On Wed, Mar 06, 2024 at 08:05:05PM -0800, Alexei Starovoitov wrote:
> On Wed, Mar 6, 2024 at 4:13 AM Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Wed, Mar 06, 2024 at 12:21:28PM +0100, Christian Brauner wrote:
> > > On Wed, Mar 06, 2024 at 07:39:14AM +0000, Matt Bobrowski wrote:
> > > > G'day All,
> > > >
> > > > The original cover letter providing background context and motivating
> > > > factors around the needs for the BPF kfuncs introduced within this
> > > > patch series can be found here [0], so please do reference that if
> > > > need be.
> > > >
> > > > Notably, one of the main contention points within v1 of this patch
> > > > series was that we were effectively leaning on some preexisting
> > > > in-kernel APIs such as get_task_exe_file() and get_mm_exe_file()
> > > > within some of the newly introduced BPF kfuncs. As noted in my
> > > > response here [1] though, I struggle to understand the technical
> > > > reasoning behind why exposing such in-kernel helpers, specifically
> > > > only to BPF LSM program types in the form of BPF kfuncs, is inherently
> > > > a terrible idea. So, until someone provides me with a sound technical
> > > > explanation as to why this cannot or should not be done, I'll continue
> > > > to lean on them. The alternative is to reimplement the necessary
> > > > in-kernel APIs within the BPF kfuncs, but that's just nonsensical IMO.
> > >
> > > You may lean as much as you like. What I've reacted to is that you've
> > > (not you specifically, I'm sure) messed up. You've exposed d_path() to
> > > users  without understanding that it wasn't safe apparently.
> > >
> > > And now we get patches that use the self-inflicted brokeness as an
> > > argument to expose a bunch of other low-level helpers to fix that.
> > >
> > > The fact that it's "just bpf LSM" programs doesn't alleviate any
> > > concerns whatsoever. Not just because that is just an entry vector but
> > > also because we have LSMs induced API abuse that we only ever get to see
> > > the fallout from when we refactor apis and then it causes pain for the vfs.
> > >
> > > I'll take another look at the proposed helpers you need as bpf kfuncs
> > > and I'll give my best not to be overly annoyed by all of this. I have no
> > > intention of not helping you quite the opposite but I'm annoyed that
> > > we're here in the first place.
> > >
> > > What I want is to stop this madness of exposing stuff to users without
> > > fully understanding it's semantics and required guarantees.
> >
> > So, looking at this series you're now asking us to expose:
> >
> > (1) mmgrab()
> > (2) mmput()
> > (3) fput()
> > (5) get_mm_exe_file()
> > (4) get_task_exe_file()
> > (7) get_task_fs_pwd()
> > (6) get_task_fs_root()
> > (8) path_get()
> > (9) path_put()
> >
> > in one go and the justification in all patches amounts to "This is
> > common in some BPF LSM programs".
> >
> > So, broken stuff got exposed to users or at least a broken BPF LSM
> > program was written somewhere out there that is susceptible to UAFs
> > becauase you didn't restrict bpf_d_path() to trusted pointer arguments.
> > So you're now scrambling to fix this by asking for a bunch of low-level
> > exports.
> >
> > What is the guarantee that you don't end up writing another BPF LSM that
> > abuses these exports in a way that causes even more issues and then
> > someone else comes back asking for the next round of bpf funcs to be
> > exposed to fix it.
> 
> There is no guarantee.
> We made a safety mistake with bpf_d_path() though
> we restricted it very tight. And that UAF is tricky.
> I'm still amazed how Jann managed to find it.
> We all make mistakes.
> It's not the first one and not going to be the last.
> 
> What Matt is doing is an honest effort to fix it
> in the upstream kernel for all bpf users to benefit.
> He could have done it with a kernel module.
> The above "low level" helpers are all either static inline
> in .h or they call EXPORT_SYMBOL[_GPL] or simply inc/dec refcnt.
> 
> One can implement such kfuncs in an out of tree kernel module
> and be done with it, but in the bpf community we encourage
> everyone to upstream their work.
> 
> So kudos to Matt for working on these patches.
> 
> His bpf-lsm use case is not special.
> It just needs a safe way to call d_path.
> 
> +SEC("lsm.s/file_open")
> +__failure __msg("R1 must be referenced or trusted")
> +int BPF_PROG(path_d_path_kfunc_untrusted_from_current)
> +{
> +       struct path *pwd;
> +       struct task_struct *current;
> +
> +       current = bpf_get_current_task_btf();
> +       /* Walking a trusted pointer returned from bpf_get_current_task_btf()
> +        * yields and untrusted pointer. */
> +       pwd = &current->fs->pwd;
> +       bpf_path_d_path(pwd, buf, sizeof(buf));
> +       return 0;
> +}
> 
> This test checks that such an access pattern is unsafe and
> the verifier will catch it.
> 
> To make it safe one needs to do:
> 
>   current = bpf_get_current_task_btf();
>   pwd = bpf_get_task_fs_pwd(current);
>   if (!pwd) // error path
>   bpf_path_d_path(pwd, ...);
>   bpf_put_path(pwd);
> 
> these are the kfuncs from patch 6.
> 
> And notice that they have KF_ACQUIRE and KF_RELEASE flags.
> 
> They tell the verifier to recognize that bpf_get_task_fs_pwd()
> kfunc acquires 'struct path *'.
> Meaning that bpf prog cannot just return without releasing it.
> 
> The bpf prog cannot use-after-free that 'pwd' either
> after it was released by bpf_put_path(pwd).
> 
> The verifier static analysis catches such UAF-s.
> It didn't catch Jann's UAF earlier, because we didn't have
> these kfuncs! Hence the fix is to add such kfuncs with
> acquire/release semantics.
> 
> > The difference between a regular LSM asking about this and a BPF LSM
> > program is that we can see in the hook implementation what the LSM
> > intends to do with this and we can judge whether that's safe or not.
> 
> See above example.
> The verifier is doing a much better job than humans when it comes
> to safety.
> 
> > Here you're asking us to do this blindfolded.
> 
> If you don't trust the verifier to enforce safety,
> you shouldn't trust Rust compiler to generate safe code either.
> 
> In another reply you've compared kfuncs to EXPORT_SYMBOL_GPL.
> Such analogy is correct to some extent,
> but unlike exported symbols kfuncs are restricted to particular
> program types. They don't accept arbitrary pointers,
> and reference count is enforced as well.
> That's a pretty big difference vs EXPORT_SYMBOL.

There's one fundamental question here that we'll need an official answer to:

Is it ok for an out-of-tree BPF LSM program, that nobody has ever seen
to request access to various helpers in the kernel?

Because fundamentally this is what this patchset is asking to be done.

If the ZFS out-of-tree kernel module were to send us a similar patch
series asking us for a list of 9 functions that they'd like us to export
what would the answer to that be? It would be "no" - on principle alone.

So what is different between an out-of-tree BPF LSM program that no one
even has ever seen and an out-of-tree kernel module that one can at
least look at in Github? Why should we reject requests from the latter
but are supposed to accept requests from the former?

If we say yes to the BPF LSM program requests we would have to say yes
to ZFS as well.

Paul Moore March 7, 2024, 8:50 p.m. UTC | #6

On Thu, Mar 7, 2024 at 4:55 AM Christian Brauner <brauner@kernel.org> wrote:
>
> There's one fundamental question here that we'll need an official answer to:
>
> Is it ok for an out-of-tree BPF LSM program, that nobody has ever seen
> to request access to various helpers in the kernel?

Phrased in a slightly different way, and a bit more generalized: do we
treat out-of-tree BPF programs the same as we do with out-of-tree
kernel modules?  I believe that's the real question, and if we answer
that, we should also have our answer for the internal helper function
question.

> Because fundamentally this is what this patchset is asking to be done.
>
> If the ZFS out-of-tree kernel module were to send us a similar patch
> series asking us for a list of 9 functions that they'd like us to export
> what would the answer to that be? It would be "no" - on principle alone.
>
> So what is different between an out-of-tree BPF LSM program that no one
> even has ever seen and an out-of-tree kernel module that one can at
> least look at in Github? Why should we reject requests from the latter
> but are supposed to accept requests from the former?
>
> If we say yes to the BPF LSM program requests we would have to say yes
> to ZFS as well.

Alexei Starovoitov March 8, 2024, 3:11 a.m. UTC | #7

On Thu, Mar 7, 2024 at 1:55 AM Christian Brauner <brauner@kernel.org> wrote:
>
> > >
> > > So, looking at this series you're now asking us to expose:
> > >
> > > (1) mmgrab()
> > > (2) mmput()
> > > (3) fput()
> > > (5) get_mm_exe_file()
> > > (4) get_task_exe_file()
> > > (7) get_task_fs_pwd()
> > > (6) get_task_fs_root()
> > > (8) path_get()
> > > (9) path_put()
> > >
> > > in one go and the justification in all patches amounts to "This is
> > > common in some BPF LSM programs".
> > >
> > > So, broken stuff got exposed to users or at least a broken BPF LSM
> > > program was written somewhere out there that is susceptible to UAFs
> > > becauase you didn't restrict bpf_d_path() to trusted pointer arguments.
> > > So you're now scrambling to fix this by asking for a bunch of low-level
> > > exports.
> > >
> > > What is the guarantee that you don't end up writing another BPF LSM that
> > > abuses these exports in a way that causes even more issues and then
> > > someone else comes back asking for the next round of bpf funcs to be
> > > exposed to fix it.
> >
> > There is no guarantee.
> > We made a safety mistake with bpf_d_path() though
> > we restricted it very tight. And that UAF is tricky.
> > I'm still amazed how Jann managed to find it.
> > We all make mistakes.
> > It's not the first one and not going to be the last.
> >
> > What Matt is doing is an honest effort to fix it
> > in the upstream kernel for all bpf users to benefit.
> > He could have done it with a kernel module.
> > The above "low level" helpers are all either static inline
> > in .h or they call EXPORT_SYMBOL[_GPL] or simply inc/dec refcnt.
> >
> > One can implement such kfuncs in an out of tree kernel module
> > and be done with it, but in the bpf community we encourage
> > everyone to upstream their work.
> >
> > So kudos to Matt for working on these patches.
> >
> > His bpf-lsm use case is not special.
> > It just needs a safe way to call d_path.
> >
> > +SEC("lsm.s/file_open")
> > +__failure __msg("R1 must be referenced or trusted")
> > +int BPF_PROG(path_d_path_kfunc_untrusted_from_current)
> > +{
> > +       struct path *pwd;
> > +       struct task_struct *current;
> > +
> > +       current = bpf_get_current_task_btf();
> > +       /* Walking a trusted pointer returned from bpf_get_current_task_btf()
> > +        * yields and untrusted pointer. */
> > +       pwd = &current->fs->pwd;
> > +       bpf_path_d_path(pwd, buf, sizeof(buf));
> > +       return 0;
> > +}
> >
> > This test checks that such an access pattern is unsafe and
> > the verifier will catch it.
> >
> > To make it safe one needs to do:
> >
> >   current = bpf_get_current_task_btf();
> >   pwd = bpf_get_task_fs_pwd(current);
> >   if (!pwd) // error path
> >   bpf_path_d_path(pwd, ...);
> >   bpf_put_path(pwd);
> >
> > these are the kfuncs from patch 6.
> >
> > And notice that they have KF_ACQUIRE and KF_RELEASE flags.
> >
> > They tell the verifier to recognize that bpf_get_task_fs_pwd()
> > kfunc acquires 'struct path *'.
> > Meaning that bpf prog cannot just return without releasing it.
> >
> > The bpf prog cannot use-after-free that 'pwd' either
> > after it was released by bpf_put_path(pwd).
> >
> > The verifier static analysis catches such UAF-s.
> > It didn't catch Jann's UAF earlier, because we didn't have
> > these kfuncs! Hence the fix is to add such kfuncs with
> > acquire/release semantics.
> >
> > > The difference between a regular LSM asking about this and a BPF LSM
> > > program is that we can see in the hook implementation what the LSM
> > > intends to do with this and we can judge whether that's safe or not.
> >
> > See above example.
> > The verifier is doing a much better job than humans when it comes
> > to safety.
> >
> > > Here you're asking us to do this blindfolded.
> >
> > If you don't trust the verifier to enforce safety,
> > you shouldn't trust Rust compiler to generate safe code either.
> >
> > In another reply you've compared kfuncs to EXPORT_SYMBOL_GPL.
> > Such analogy is correct to some extent,
> > but unlike exported symbols kfuncs are restricted to particular
> > program types. They don't accept arbitrary pointers,
> > and reference count is enforced as well.
> > That's a pretty big difference vs EXPORT_SYMBOL.
>
> There's one fundamental question here that we'll need an official answer to:
>
> Is it ok for an out-of-tree BPF LSM program, that nobody has ever seen
> to request access to various helpers in the kernel?

obviously not.

> Because fundamentally this is what this patchset is asking to be done.

Pardon ?

> If the ZFS out-of-tree kernel module were to send us a similar patch
> series asking us for a list of 9 functions that they'd like us to export
> what would the answer to that be?

This patch set doesn't request any new EXPORT_SYMBOL.
Quoting previous reply:
"
 The above "low level" helpers are all either static inline
 in .h or they call EXPORT_SYMBOL[_GPL] or simply inc/dec refcnt.
"

Let's look at your list:

> > > (1) mmgrab()
> > > (2) mmput()
> > > (3) fput()
> > > (5) get_mm_exe_file()
> > > (4) get_task_exe_file()
> > > (7) get_task_fs_pwd()
> > > (6) get_task_fs_root()
> > > (8) path_get()
> > > (9) path_put()

First of all, there is no such thing as get_task_fs_pwd/root
in the kernel.

The hypothetical out-of-tree ZFS can use the rest just fine.
One can argue that get_mm_exe_file() is not exported,
but it's nothing but rcu_lock-wrap plus get_file_rcu()
which is EXPORT_SYMBOL.

kfunc-s in patch 6 wrap get_fs_root/pwd from linux/fs_struct.h
that calls EXPORT_SYMBOL(path_get).

So upon close examination no new EXPORT_SYMBOL-s were requested.

Christian,

I understand your irritation with anything bpf related
due to our mistake with fd=0, but as I said earlier it was
an honest mistake. There was no malicious intent.
Time to move on.

Alexei Starovoitov March 8, 2024, 3:25 a.m. UTC | #8

On Thu, Mar 7, 2024 at 12:51 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Thu, Mar 7, 2024 at 4:55 AM Christian Brauner <brauner@kernel.org> wrote:
> >
> > There's one fundamental question here that we'll need an official answer to:
> >
> > Is it ok for an out-of-tree BPF LSM program, that nobody has ever seen
> > to request access to various helpers in the kernel?
>
> Phrased in a slightly different way, and a bit more generalized: do we
> treat out-of-tree BPF programs the same as we do with out-of-tree
> kernel modules?  I believe that's the real question, and if we answer
> that, we should also have our answer for the internal helper function
> question.

From 10k ft view bpf programs may look like kernel modules,
but looking closely they are very different.

Modules can read/write any data structure and can call any exported function.
All modules fall into two categories GPL or not.
While bpf progs are divided by program type.
Tracing progs can read any kernel memory safely via probe_read_kernel.
Networking prog can read/write packets, but cannot read kernel memory.
bpf_lsm programs can be called from lsm hooks and
call only kfuncs that were explicitly allowlisted to bpf_lsm prog type.
Furthermore kfuncs have acquire/release semantics enforced by
the verifier.
For example, bpf progs can do bpf_rcu_read_lock() which is
a wrapper around rcu_read_lock() and the verifier will make sure
that bpf_rcu_read_unlock() is called.
Under bpf_rcu_read_lock() bpf programs can dereference __rcu tagged
fields and the verifier will track them as rcu protected objects
until bpf_rcu_read_unlock().
In other words the verifier is doing sparse-on-steroids analysis
and enforcing it.
Kernel modules are not subject to such enforcement.

One more distinction: 99.9% of bpf features require a GPL-ed bpf program.
All kfuncs are GPL only.

Christian Brauner March 8, 2024, 10:35 a.m. UTC | #9

On Thu, Mar 07, 2024 at 07:11:22PM -0800, Alexei Starovoitov wrote:
> On Thu, Mar 7, 2024 at 1:55 AM Christian Brauner <brauner@kernel.org> wrote:
> >
> > > >
> > > > So, looking at this series you're now asking us to expose:
> > > >
> > > > (1) mmgrab()
> > > > (2) mmput()
> > > > (3) fput()
> > > > (5) get_mm_exe_file()
> > > > (4) get_task_exe_file()
> > > > (7) get_task_fs_pwd()
> > > > (6) get_task_fs_root()
> > > > (8) path_get()
> > > > (9) path_put()
> > > >
> > > > in one go and the justification in all patches amounts to "This is
> > > > common in some BPF LSM programs".
> > > >
> > > > So, broken stuff got exposed to users or at least a broken BPF LSM
> > > > program was written somewhere out there that is susceptible to UAFs
> > > > becauase you didn't restrict bpf_d_path() to trusted pointer arguments.
> > > > So you're now scrambling to fix this by asking for a bunch of low-level
> > > > exports.
> > > >
> > > > What is the guarantee that you don't end up writing another BPF LSM that
> > > > abuses these exports in a way that causes even more issues and then
> > > > someone else comes back asking for the next round of bpf funcs to be
> > > > exposed to fix it.
> > >
> > > There is no guarantee.
> > > We made a safety mistake with bpf_d_path() though
> > > we restricted it very tight. And that UAF is tricky.
> > > I'm still amazed how Jann managed to find it.
> > > We all make mistakes.
> > > It's not the first one and not going to be the last.
> > >
> > > What Matt is doing is an honest effort to fix it
> > > in the upstream kernel for all bpf users to benefit.
> > > He could have done it with a kernel module.
> > > The above "low level" helpers are all either static inline
> > > in .h or they call EXPORT_SYMBOL[_GPL] or simply inc/dec refcnt.
> > >
> > > One can implement such kfuncs in an out of tree kernel module
> > > and be done with it, but in the bpf community we encourage
> > > everyone to upstream their work.
> > >
> > > So kudos to Matt for working on these patches.
> > >
> > > His bpf-lsm use case is not special.
> > > It just needs a safe way to call d_path.
> > >
> > > +SEC("lsm.s/file_open")
> > > +__failure __msg("R1 must be referenced or trusted")
> > > +int BPF_PROG(path_d_path_kfunc_untrusted_from_current)
> > > +{
> > > +       struct path *pwd;
> > > +       struct task_struct *current;
> > > +
> > > +       current = bpf_get_current_task_btf();
> > > +       /* Walking a trusted pointer returned from bpf_get_current_task_btf()
> > > +        * yields and untrusted pointer. */
> > > +       pwd = &current->fs->pwd;
> > > +       bpf_path_d_path(pwd, buf, sizeof(buf));
> > > +       return 0;
> > > +}
> > >
> > > This test checks that such an access pattern is unsafe and
> > > the verifier will catch it.
> > >
> > > To make it safe one needs to do:
> > >
> > >   current = bpf_get_current_task_btf();
> > >   pwd = bpf_get_task_fs_pwd(current);
> > >   if (!pwd) // error path
> > >   bpf_path_d_path(pwd, ...);
> > >   bpf_put_path(pwd);
> > >
> > > these are the kfuncs from patch 6.
> > >
> > > And notice that they have KF_ACQUIRE and KF_RELEASE flags.
> > >
> > > They tell the verifier to recognize that bpf_get_task_fs_pwd()
> > > kfunc acquires 'struct path *'.
> > > Meaning that bpf prog cannot just return without releasing it.
> > >
> > > The bpf prog cannot use-after-free that 'pwd' either
> > > after it was released by bpf_put_path(pwd).
> > >
> > > The verifier static analysis catches such UAF-s.
> > > It didn't catch Jann's UAF earlier, because we didn't have
> > > these kfuncs! Hence the fix is to add such kfuncs with
> > > acquire/release semantics.
> > >
> > > > The difference between a regular LSM asking about this and a BPF LSM
> > > > program is that we can see in the hook implementation what the LSM
> > > > intends to do with this and we can judge whether that's safe or not.
> > >
> > > See above example.
> > > The verifier is doing a much better job than humans when it comes
> > > to safety.
> > >
> > > > Here you're asking us to do this blindfolded.
> > >
> > > If you don't trust the verifier to enforce safety,
> > > you shouldn't trust Rust compiler to generate safe code either.
> > >
> > > In another reply you've compared kfuncs to EXPORT_SYMBOL_GPL.
> > > Such analogy is correct to some extent,
> > > but unlike exported symbols kfuncs are restricted to particular
> > > program types. They don't accept arbitrary pointers,
> > > and reference count is enforced as well.
> > > That's a pretty big difference vs EXPORT_SYMBOL.
> >
> > There's one fundamental question here that we'll need an official answer to:
> >
> > Is it ok for an out-of-tree BPF LSM program, that nobody has ever seen
> > to request access to various helpers in the kernel?
> 
> obviously not.
> 
> > Because fundamentally this is what this patchset is asking to be done.
> 
> Pardon ?
> 
> > If the ZFS out-of-tree kernel module were to send us a similar patch
> > series asking us for a list of 9 functions that they'd like us to export
> > what would the answer to that be?
> 
> This patch set doesn't request any new EXPORT_SYMBOL.

In order for that out-of-tree BPF LSM program to get similar guarantees
than an equivalent kernel module we need to export all nine functions
for BPF. Included in that set are functions that aren't currently even
exported to modules.

These exports are specifically for an out-of-tree BPF LSM program that
is not accessible to the public. The question in the other mail stands.

> First of all, there is no such thing as get_task_fs_pwd/root
> in the kernel.

Yeah, we'd need specific helpers for a never seen before out-of-tree BPF
LSM. I don't see how that's different from an out-of-tree kernel module.

> One can argue that get_mm_exe_file() is not exported,
> but it's nothing but rcu_lock-wrap plus get_file_rcu()
> which is EXPORT_SYMBOL.

Oh, good spot. That's an accident. get_file_rcu() definitely shouldn't
be exported. So that'll be removed asap.

> Christian,
> 
> I understand your irritation with anything bpf related
> due to our mistake with fd=0, but as I said earlier it was
> an honest mistake. There was no malicious intent.
> Time to move on.

I understand your concern but I'm neither irritated by bpf nor do I hold
grudges. As I'm writing this bpf's bpffs and token changes are on their
way into mainline for v6.9 which I reviewed and acked immediately after
that discussion.

Rest assures that you can move on from that concern. There's no need to
keep bringing it up in unrelated discussions.

Christian Brauner March 8, 2024, 10:58 a.m. UTC | #10

On Thu, Mar 07, 2024 at 07:25:05PM -0800, Alexei Starovoitov wrote:
> On Thu, Mar 7, 2024 at 12:51 PM Paul Moore <paul@paul-moore.com> wrote:
> >
> > On Thu, Mar 7, 2024 at 4:55 AM Christian Brauner <brauner@kernel.org> wrote:
> > >
> > > There's one fundamental question here that we'll need an official answer to:
> > >
> > > Is it ok for an out-of-tree BPF LSM program, that nobody has ever seen
> > > to request access to various helpers in the kernel?
> >
> > Phrased in a slightly different way, and a bit more generalized: do we
> > treat out-of-tree BPF programs the same as we do with out-of-tree
> > kernel modules?  I believe that's the real question, and if we answer
> > that, we should also have our answer for the internal helper function
> > question.
> 
> From 10k ft view bpf programs may look like kernel modules,
> but looking closely they are very different.
> 
> Modules can read/write any data structure and can call any exported function.
> All modules fall into two categories GPL or not.
> While bpf progs are divided by program type.
> Tracing progs can read any kernel memory safely via probe_read_kernel.
> Networking prog can read/write packets, but cannot read kernel memory.
> bpf_lsm programs can be called from lsm hooks and
> call only kfuncs that were explicitly allowlisted to bpf_lsm prog type.
> Furthermore kfuncs have acquire/release semantics enforced by
> the verifier.
> For example, bpf progs can do bpf_rcu_read_lock() which is
> a wrapper around rcu_read_lock() and the verifier will make sure
> that bpf_rcu_read_unlock() is called.
> Under bpf_rcu_read_lock() bpf programs can dereference __rcu tagged
> fields and the verifier will track them as rcu protected objects
> until bpf_rcu_read_unlock().
> In other words the verifier is doing sparse-on-steroids analysis
> and enforcing it.
> Kernel modules are not subject to such enforcement.
> 
> One more distinction: 99.9% of bpf features require a GPL-ed bpf program.
> All kfuncs are GPL only.

While these are certainly all very nice properties it doesn't change the
core question.

An out of-tree BPF LSM program is requesting us to add nine vfs helpers
to the BPF kfunc api. That out-of-tree BPF LSM program is not even
accessible to the public. There is no meaningful difference to an
out-of-tree kernel module asking us to add nine functions to the export
api. The safety properties don't matter for this. Clearly, they didn't
prevent the previous bpf_d_path() stuff from happening.

We need rules for how this is supposed to be handled. Unless those rules
are clearly specified and make sense we shouldn't be adding BPF kfuncs
based on requests from out-of-tree BPF LSMs that amount to "we need
this".

Alexei Starovoitov March 9, 2024, 1:23 a.m. UTC | #11

On Fri, Mar 8, 2024 at 2:36 AM Christian Brauner <brauner@kernel.org> wrote:
>
>
> These exports are specifically for an out-of-tree BPF LSM program that
> is not accessible to the public. The question in the other mail stands.

The question was already answered. You just don't like the answer.
bpf progs are not equivalent to kernel modules.
They have completely different safety and visibility properties.
The safety part I already talked about.
Sounds like the visibility has to be explained.
Kernel modules are opaque binary blobs.
bpf programs are fully transparent. The intent is known
to the verifier and to anyone with understanding
of bpf assembly.
Those that cannot read bpf asm can read C source code that is
embedded in the bpf program in kernel memory.
It's not the same as "llvm-dwarfdump module.ko" on disk.
The bpf prog source code is loaded into the kernel
at program verification time for debugging and visibility reasons.
If there is a verifier bug and bpf manages to crash the kernel
vmcore will have relevant lines of program C source code right there.

Hence out-of-tree or in-tree bpf makes no practical difference.
The program cannot hide its meaning and doesn't hamper debugging.

Hence adding EXPORT_SYMBOL == Brace for impact!
Expect crashes, api misuse and what not.

While adding bpf_kfunc is a nop for kernel development.
If kfunc is in the way of code refactoring it can be removed
(as we demonstrated several times).
A kfunc won't cause headaches for the kernel code it is
calling (assuming no verifier bugs).
If there is a bug it's on us to fix it as we demonstrated in the past.
For example: bpf_probe_read_kernel().
It's a wrapper of copy_from_kernel_nofault() and over the years
bpf users hit various bugs in copy_from_kernel_nofault(),
reported them, and _bpf developers_ fixed them.
Though copy_from_kernel_nofault() is as generic as it can get
and the same bugs could have been reproduced without bpf
we took care of fixing these parts of the kernel.

Look at path_put().
It's EXPORT_SYMBOL and any kernel module can easily screw up
reference counting, so that sooner or later distro folks
will experience debug pains due to out-of-tree drivers.

kfunc that calls path_put() won't have such consequences.
The verifier will prevent path_put() on a pointer that wasn't
acquired by the same bpf program. No support pains.
It's a nop for vfs folks.

> > First of all, there is no such thing as get_task_fs_pwd/root
> > in the kernel.
>
> Yeah, we'd need specific helpers for a never seen before out-of-tree BPF
> LSM. I don't see how that's different from an out-of-tree kernel module.

Sorry, but you don't seem to understand what bpf can and cannot do,
hence they look similar.

> > One can argue that get_mm_exe_file() is not exported,
> > but it's nothing but rcu_lock-wrap plus get_file_rcu()
> > which is EXPORT_SYMBOL.
>
> Oh, good spot. That's an accident. get_file_rcu() definitely shouldn't
> be exported. So that'll be removed asap.

So, just to make a point that
"Included in that set are functions that aren't currently even
exported to modules"
you want to un-export get_file_rcu() ?

Because as the patch stands today everything that these kfuncs are
calling is EXPORT_SYMBOL.

Christian Brauner March 11, 2024, noon UTC | #12

On Fri, Mar 08, 2024 at 05:23:30PM -0800, Alexei Starovoitov wrote:
> On Fri, Mar 8, 2024 at 2:36 AM Christian Brauner <brauner@kernel.org> wrote:
> >
> >
> > These exports are specifically for an out-of-tree BPF LSM program that
> > is not accessible to the public. The question in the other mail stands.
> 
> The question was already answered. You just don't like the answer.
> bpf progs are not equivalent to kernel modules.
> They have completely different safety and visibility properties.
> The safety part I already talked about.
> Sounds like the visibility has to be explained.
> Kernel modules are opaque binary blobs.
> bpf programs are fully transparent. The intent is known
> to the verifier and to anyone with understanding
> of bpf assembly.
> Those that cannot read bpf asm can read C source code that is
> embedded in the bpf program in kernel memory.
> It's not the same as "llvm-dwarfdump module.ko" on disk.
> The bpf prog source code is loaded into the kernel
> at program verification time for debugging and visibility reasons.
> If there is a verifier bug and bpf manages to crash the kernel
> vmcore will have relevant lines of program C source code right there.
> 
> Hence out-of-tree or in-tree bpf makes no practical difference.
> The program cannot hide its meaning and doesn't hamper debugging.
> 
> Hence adding EXPORT_SYMBOL == Brace for impact!
> Expect crashes, api misuse and what not.
> 
> While adding bpf_kfunc is a nop for kernel development.
> If kfunc is in the way of code refactoring it can be removed
> (as we demonstrated several times).
> A kfunc won't cause headaches for the kernel code it is
> calling (assuming no verifier bugs).
> If there is a bug it's on us to fix it as we demonstrated in the past.
> For example: bpf_probe_read_kernel().
> It's a wrapper of copy_from_kernel_nofault() and over the years
> bpf users hit various bugs in copy_from_kernel_nofault(),
> reported them, and _bpf developers_ fixed them.
> Though copy_from_kernel_nofault() is as generic as it can get
> and the same bugs could have been reproduced without bpf
> we took care of fixing these parts of the kernel.
> 
> Look at path_put().
> It's EXPORT_SYMBOL and any kernel module can easily screw up
> reference counting, so that sooner or later distro folks
> will experience debug pains due to out-of-tree drivers.
> 
> kfunc that calls path_put() won't have such consequences.
> The verifier will prevent path_put() on a pointer that wasn't
> acquired by the same bpf program. No support pains.
> It's a nop for vfs folks.
> 
> > > First of all, there is no such thing as get_task_fs_pwd/root
> > > in the kernel.
> >
> > Yeah, we'd need specific helpers for a never seen before out-of-tree BPF
> > LSM. I don't see how that's different from an out-of-tree kernel module.
> 
> Sorry, but you don't seem to understand what bpf can and cannot do,
> hence they look similar.

Maybe. On the other hand you seem to ignore what I'm saying. You
currently don't have a clear set of rules for when it's ok for someone
to send patches and request access to bpf kfuncs to implement a new BPF
program. This patchset very much illustrates this point. The safety
properties of bpf don't matter for this. And again, your safety
properties very much didn't protect you from your bpf_d_path() mess.

We're not even clearly told where and how these helper are supposed to be
used. That's not ok and will never be ok. As long as there are no clear
criteria to operate under this is highly problematic. This may be fine
from a bpf perspective and one can even understand why because that's
apparently your model or promise to your users. But there's no reason to
expect the same level of laxness from any of the subsystems you're
requesting kfuncs from.

> > > One can argue that get_mm_exe_file() is not exported,
> > > but it's nothing but rcu_lock-wrap plus get_file_rcu()
> > > which is EXPORT_SYMBOL.
> >
> > Oh, good spot. That's an accident. get_file_rcu() definitely shouldn't
> > be exported. So that'll be removed asap.
> 
> So, just to make a point that
> "Included in that set are functions that aren't currently even
> exported to modules"
> you want to un-export get_file_rcu() ?

No. The reason it was exported was because of the drm subsystem and we
already quite disliked that. But it turned out that's not needed so in
commit 61d4fb0b349e ("file, i915: fix file reference for
mmap_singleton()") they were moved away from this helper.

And then we simply forgot to unexport it.

A helper such as get_file_rcu() is called on a file object that is
subject to SLAB_TYPESAFE_BY_RCU semantics where the caller doesn't hold
a reference. The semantics of that are maybe understood by a couple of
people in the kernel. There is absolutely no way that any userspace will
get access to such low-level helpers. They have zero business to be
involved in the lifetimes of objects on this level just as no module has.

So really, this is an orthogonal cleanup.

Matt Bobrowski March 12, 2024, 5:06 p.m. UTC | #13

Hey Christian,

On Mon, Mar 11, 2024 at 01:00:56PM +0100, Christian Brauner wrote:
> On Fri, Mar 08, 2024 at 05:23:30PM -0800, Alexei Starovoitov wrote:
> > On Fri, Mar 8, 2024 at 2:36 AM Christian Brauner <brauner@kernel.org> wrote:
> > >
> > >
> > > These exports are specifically for an out-of-tree BPF LSM program that
> > > is not accessible to the public. The question in the other mail stands.
> > 
> > The question was already answered. You just don't like the answer.
> > bpf progs are not equivalent to kernel modules.
> > They have completely different safety and visibility properties.
> > The safety part I already talked about.
> > Sounds like the visibility has to be explained.
> > Kernel modules are opaque binary blobs.
> > bpf programs are fully transparent. The intent is known
> > to the verifier and to anyone with understanding
> > of bpf assembly.
> > Those that cannot read bpf asm can read C source code that is
> > embedded in the bpf program in kernel memory.
> > It's not the same as "llvm-dwarfdump module.ko" on disk.
> > The bpf prog source code is loaded into the kernel
> > at program verification time for debugging and visibility reasons.
> > If there is a verifier bug and bpf manages to crash the kernel
> > vmcore will have relevant lines of program C source code right there.
> > 
> > Hence out-of-tree or in-tree bpf makes no practical difference.
> > The program cannot hide its meaning and doesn't hamper debugging.
> > 
> > Hence adding EXPORT_SYMBOL == Brace for impact!
> > Expect crashes, api misuse and what not.
> > 
> > While adding bpf_kfunc is a nop for kernel development.
> > If kfunc is in the way of code refactoring it can be removed
> > (as we demonstrated several times).
> > A kfunc won't cause headaches for the kernel code it is
> > calling (assuming no verifier bugs).
> > If there is a bug it's on us to fix it as we demonstrated in the past.
> > For example: bpf_probe_read_kernel().
> > It's a wrapper of copy_from_kernel_nofault() and over the years
> > bpf users hit various bugs in copy_from_kernel_nofault(),
> > reported them, and _bpf developers_ fixed them.
> > Though copy_from_kernel_nofault() is as generic as it can get
> > and the same bugs could have been reproduced without bpf
> > we took care of fixing these parts of the kernel.
> > 
> > Look at path_put().
> > It's EXPORT_SYMBOL and any kernel module can easily screw up
> > reference counting, so that sooner or later distro folks
> > will experience debug pains due to out-of-tree drivers.
> > 
> > kfunc that calls path_put() won't have such consequences.
> > The verifier will prevent path_put() on a pointer that wasn't
> > acquired by the same bpf program. No support pains.
> > It's a nop for vfs folks.
> > 
> > > > First of all, there is no such thing as get_task_fs_pwd/root
> > > > in the kernel.
> > >
> > > Yeah, we'd need specific helpers for a never seen before out-of-tree BPF
> > > LSM. I don't see how that's different from an out-of-tree kernel module.
> > 
> > Sorry, but you don't seem to understand what bpf can and cannot do,
> > hence they look similar.
> 
> Maybe. On the other hand you seem to ignore what I'm saying. You
> currently don't have a clear set of rules for when it's ok for someone
> to send patches and request access to bpf kfuncs to implement a new BPF
> program. This patchset very much illustrates this point. The safety
> properties of bpf don't matter for this. And again, your safety
> properties very much didn't protect you from your bpf_d_path() mess.
> 
> We're not even clearly told where and how these helper are supposed to be
> used. That's not ok and will never be ok. As long as there are no clear
> criteria to operate under this is highly problematic. This may be fine
> from a bpf perspective and one can even understand why because that's
> apparently your model or promise to your users. But there's no reason to
> expect the same level of laxness from any of the subsystems you're
> requesting kfuncs from.

You raise a completely fair point, and I truly do apologies for the
lack of context and in depth explanations around the specific
situations that the proposed BPF kfuncs are intended to be used
from. Admittedly, that's a failure on my part, and I can completely
understand why from a maintainers point of view there would be
reservations around acknowledging requests for adding such invisible
dependencies.

Now, I'm in a little bit of a tough situation as I'm unable to point
you to an open-source BPF LSM implementation that intends to make use
of such newly proposed BPF kfuncs. That's just an unfortunate
constraint and circumstance that I'm having to deal with, so I'm just
going to have to provide heavily redacted and incomplete example to
illustrate how these BPF kfuncs intend to be used from BPF LSM
programs that I personally work on here at Google. Notably though, the
contexts that I do share here may obviously be a nonholistic view on
how these newly introduced BPF kfuncs end up getting used in practice
by some other completely arbitrary open-source BPF LSM programs.

Anyway, as Alexei had pointed out in one of the prior responses, the
core motivating factor behind introducing these newly proposed BPF
kfuncs purely stems from the requirement of needing to call
bpf_d_path() safely on a struct path from the context of a BPF LSM
program, specifically within the security_file_open() and
security_mmap_file() LSM hooks. Now, as noted within the original bug
report [0], it's currently not considered safe to pluck a struct path
out from an arbitrary in-kernel data structure, which in our case was
current->mm->exe_file->f_path, and have it passed to bpf_d_path() from
the aforementioned LSM hook points, or any other LSM hook point for
that matter.

So, without using these newly introduced BPF kfuncs, our BPF LSM
program hanging off security_file_open() looks as follows:

```
int BPF_PROG(file_open, struct file *file)
{
  // Perform a whole bunch of operations on the supplied file argument. This
  // includes some form of policy evaluation, and if there's a violation against
  // policy and auditing is enabled, then we eventually call bpf_d_path() on
  // file->f_path. Calling bpf_d_path() on the file argument isn't problematic
  // as we have a stable path here as the file argument is reference counted.
  struct path *target = &file->f_path;

  // ...

  struct task_struct *current = bpf_get_current_task_btf();

  // ...

  bpf_rcu_read_lock();
  // Reserve a slot on the BPF ring buffer such that the actor's path can be
  // passed back to userspace.
  void *buf = bpf_ringbuf_reserve(&ringbuf, PATH_MAX, 0);
  if (!buf) {
    goto unlock;
  }

  // For contextual purposes when performing an audit we also call bpf_d_path()
  // on the actor, being current->mm->exe_file->f_path.
  struct path *actor = &current->mm->exe_file->f_path;

  // Now perform the path resolution on the actor via bpf_d_path().
  u64 ret = bpf_d_path(actor, buf, PATH_MAX);
  if (ret > 0) {
    bpf_ringbuf_submit(buf, BPF_RB_NO_WAKEUP);
  } else {
    bpf_ringbuf_discard(buf, 0);
  }

unlock:
  bpf_rcu_read_unlock();
  return 0;
}
```

Post landing these BPF kfuncs, the BPF LSM program hanging off
security_file_open() would be updated to make use of the
acquire/release BPF kfuncs as below. Here I'm only making use of
bpf_get_task_exe_file(), but similar usage also extends to
bpf_get_task_fs_root() and bpf_get_task_fs_pwd().

```
int BPF_PROG(file_open, struct file *file)
{
  // Perform a whole bunch of operations on the supplied file argument. This
  // includes some form of policy evaluation, and if there's a violation against
  // policy and auditing is enabled, then we eventually call bpf_path_d_path()
  // on file->f_path. Calling bpf_path_d_path() on the file argument isn't
  // problematic as we have a stable path here as the file argument is trusted
  // and reference counted.
  struct path *target = &file->f_path;

  // ...

  struct task_struct *current = bpf_get_current_task_btf();

  // ...

  // Reserve a slot on the BPF ring buffer such that the actor's path can be
  // passed back to userspace.
  void *buf = bpf_ringbuf_reserve(&ringbuf, PATH_MAX, 0);
  if (!buf) {
    return 0;
  }

  // For contextual purposes when performing an audit we also call
  // bpf_path_d_path() on the actor, being current->mm->exe_file->f_path.
  // Here we're operating on a stable trused and reference counted file,
  // thanks to bpf_get_task_exe_file().
  struct file *exe_file = bpf_get_task_exe_file(current);
  if (!exe_file) {
    bpf_ringbuf_discard(buf, 0);
    return 0;
  }

  // Now perform the path resolution on the actor via bpf_path_d_path(), which
  // only accepts a trusted struct path.
  u64 ret = bpf_path_d_path(&exe_file->f_path, buf, PATH_MAX);
  if (ret > 0) {
    bpf_ringbuf_submit(buf, BPF_RB_NO_WAKEUP);
  } else {
    bpf_ringbuf_discard(buf, 0);
  }

  // Drop the reference on exe_file.
  bpf_put_file(exe_file);
  return 0;
}
```

This is rather incredibly straightforward, but the fundamental
difference between the two implementations is that one allows us to
work on stable file, whereas the other does not. That's really
it. Similarly, we do more or less the same for our BPF LSM program
that hangs off security_mmap_file().

Do you need anything else which illustrates the proposed BPF kfunc
usage?

[0] https://lore.kernel.org/bpf/CAG48ez0ppjcT=QxU-jtCUfb5xQb3mLr=5FcwddF_VKfEBPs_Dg@mail.gmail.com/

/M

Matt Bobrowski March 12, 2024, 8:11 p.m. UTC | #14

On Tue, Mar 12, 2024 at 05:06:36PM +0000, Matt Bobrowski wrote:
> Hey Christian,
> 
> On Mon, Mar 11, 2024 at 01:00:56PM +0100, Christian Brauner wrote:
> > On Fri, Mar 08, 2024 at 05:23:30PM -0800, Alexei Starovoitov wrote:
> > > On Fri, Mar 8, 2024 at 2:36 AM Christian Brauner <brauner@kernel.org> wrote:
> > > >
> > > >
> > > > These exports are specifically for an out-of-tree BPF LSM program that
> > > > is not accessible to the public. The question in the other mail stands.
> > > 
> > > The question was already answered. You just don't like the answer.
> > > bpf progs are not equivalent to kernel modules.
> > > They have completely different safety and visibility properties.
> > > The safety part I already talked about.
> > > Sounds like the visibility has to be explained.
> > > Kernel modules are opaque binary blobs.
> > > bpf programs are fully transparent. The intent is known
> > > to the verifier and to anyone with understanding
> > > of bpf assembly.
> > > Those that cannot read bpf asm can read C source code that is
> > > embedded in the bpf program in kernel memory.
> > > It's not the same as "llvm-dwarfdump module.ko" on disk.
> > > The bpf prog source code is loaded into the kernel
> > > at program verification time for debugging and visibility reasons.
> > > If there is a verifier bug and bpf manages to crash the kernel
> > > vmcore will have relevant lines of program C source code right there.
> > > 
> > > Hence out-of-tree or in-tree bpf makes no practical difference.
> > > The program cannot hide its meaning and doesn't hamper debugging.
> > > 
> > > Hence adding EXPORT_SYMBOL == Brace for impact!
> > > Expect crashes, api misuse and what not.
> > > 
> > > While adding bpf_kfunc is a nop for kernel development.
> > > If kfunc is in the way of code refactoring it can be removed
> > > (as we demonstrated several times).
> > > A kfunc won't cause headaches for the kernel code it is
> > > calling (assuming no verifier bugs).
> > > If there is a bug it's on us to fix it as we demonstrated in the past.
> > > For example: bpf_probe_read_kernel().
> > > It's a wrapper of copy_from_kernel_nofault() and over the years
> > > bpf users hit various bugs in copy_from_kernel_nofault(),
> > > reported them, and _bpf developers_ fixed them.
> > > Though copy_from_kernel_nofault() is as generic as it can get
> > > and the same bugs could have been reproduced without bpf
> > > we took care of fixing these parts of the kernel.
> > > 
> > > Look at path_put().
> > > It's EXPORT_SYMBOL and any kernel module can easily screw up
> > > reference counting, so that sooner or later distro folks
> > > will experience debug pains due to out-of-tree drivers.
> > > 
> > > kfunc that calls path_put() won't have such consequences.
> > > The verifier will prevent path_put() on a pointer that wasn't
> > > acquired by the same bpf program. No support pains.
> > > It's a nop for vfs folks.
> > > 
> > > > > First of all, there is no such thing as get_task_fs_pwd/root
> > > > > in the kernel.
> > > >
> > > > Yeah, we'd need specific helpers for a never seen before out-of-tree BPF
> > > > LSM. I don't see how that's different from an out-of-tree kernel module.
> > > 
> > > Sorry, but you don't seem to understand what bpf can and cannot do,
> > > hence they look similar.
> > 
> > Maybe. On the other hand you seem to ignore what I'm saying. You
> > currently don't have a clear set of rules for when it's ok for someone
> > to send patches and request access to bpf kfuncs to implement a new BPF
> > program. This patchset very much illustrates this point. The safety
> > properties of bpf don't matter for this. And again, your safety
> > properties very much didn't protect you from your bpf_d_path() mess.
> > 
> > We're not even clearly told where and how these helper are supposed to be
> > used. That's not ok and will never be ok. As long as there are no clear
> > criteria to operate under this is highly problematic. This may be fine
> > from a bpf perspective and one can even understand why because that's
> > apparently your model or promise to your users. But there's no reason to
> > expect the same level of laxness from any of the subsystems you're
> > requesting kfuncs from.
> 
> You raise a completely fair point, and I truly do apologies for the
> lack of context and in depth explanations around the specific
> situations that the proposed BPF kfuncs are intended to be used
> from. Admittedly, that's a failure on my part, and I can completely
> understand why from a maintainers point of view there would be
> reservations around acknowledging requests for adding such invisible
> dependencies.
> 
> Now, I'm in a little bit of a tough situation as I'm unable to point
> you to an open-source BPF LSM implementation that intends to make use
> of such newly proposed BPF kfuncs. That's just an unfortunate
> constraint and circumstance that I'm having to deal with, so I'm just
> going to have to provide heavily redacted and incomplete example to
> illustrate how these BPF kfuncs intend to be used from BPF LSM
> programs that I personally work on here at Google. Notably though, the
> contexts that I do share here may obviously be a nonholistic view on
> how these newly introduced BPF kfuncs end up getting used in practice
> by some other completely arbitrary open-source BPF LSM programs.
> 
> Anyway, as Alexei had pointed out in one of the prior responses, the
> core motivating factor behind introducing these newly proposed BPF
> kfuncs purely stems from the requirement of needing to call
> bpf_d_path() safely on a struct path from the context of a BPF LSM
> program, specifically within the security_file_open() and
> security_mmap_file() LSM hooks. Now, as noted within the original bug
> report [0], it's currently not considered safe to pluck a struct path
> out from an arbitrary in-kernel data structure, which in our case was
> current->mm->exe_file->f_path, and have it passed to bpf_d_path() from
> the aforementioned LSM hook points, or any other LSM hook point for
> that matter.
> 
> So, without using these newly introduced BPF kfuncs, our BPF LSM
> program hanging off security_file_open() looks as follows:
> 
> ```
> int BPF_PROG(file_open, struct file *file)
> {
>   // Perform a whole bunch of operations on the supplied file argument. This
>   // includes some form of policy evaluation, and if there's a violation against
>   // policy and auditing is enabled, then we eventually call bpf_d_path() on
>   // file->f_path. Calling bpf_d_path() on the file argument isn't problematic
>   // as we have a stable path here as the file argument is reference counted.
>   struct path *target = &file->f_path;
> 
>   // ...
> 
>   struct task_struct *current = bpf_get_current_task_btf();
> 
>   // ...
>   
>   bpf_rcu_read_lock();
>   // Reserve a slot on the BPF ring buffer such that the actor's path can be
>   // passed back to userspace.
>   void *buf = bpf_ringbuf_reserve(&ringbuf, PATH_MAX, 0);
>   if (!buf) {
>     goto unlock;
>   }
> 
>   // For contextual purposes when performing an audit we also call bpf_d_path()
>   // on the actor, being current->mm->exe_file->f_path.
>   struct path *actor = &current->mm->exe_file->f_path;
> 
>   // Now perform the path resolution on the actor via bpf_d_path().
>   u64 ret = bpf_d_path(actor, buf, PATH_MAX);
>   if (ret > 0) {
>     bpf_ringbuf_submit(buf, BPF_RB_NO_WAKEUP);
>   } else {
>     bpf_ringbuf_discard(buf, 0);
>   }
> 
> unlock:
>   bpf_rcu_read_unlock();
>   return 0;
> }
> ```

Note that we're also aware of the fact that calling bpf_d_path()
within an RCU read-side critical shouldn't be permitted. I have a
patch teed up which addresses this. bpf_path_d_path() OTOH isn't
susceptible to this problem as the BPF verifier ensure that BPF kfuncs
annotated KF_SLEEPABLE can't be called whilst in an RCU read-side
critical section.

/M

Alexei Starovoitov March 13, 2024, 9:05 p.m. UTC | #15

On Mon, Mar 11, 2024 at 5:01 AM Christian Brauner <brauner@kernel.org> wrote:
>
> > > > One can argue that get_mm_exe_file() is not exported,
> > > > but it's nothing but rcu_lock-wrap plus get_file_rcu()
> > > > which is EXPORT_SYMBOL.
> > >
> > > Oh, good spot. That's an accident. get_file_rcu() definitely shouldn't
> > > be exported. So that'll be removed asap.
> >
> > So, just to make a point that
> > "Included in that set are functions that aren't currently even
> > exported to modules"
> > you want to un-export get_file_rcu() ?
>
> No. The reason it was exported was because of the drm subsystem and we
> already quite disliked that. But it turned out that's not needed so in
> commit 61d4fb0b349e ("file, i915: fix file reference for
> mmap_singleton()") they were moved away from this helper.

Arguably that commit 61d4fb0b349e should have had
Fixes: 0ede61d8589c ("file: convert to SLAB_TYPESAFE_BY_RCU")
i915 was buggy before you touched it
and safe_by_rcu exposed the bug.
I can see why you guys looked at it, saw issues,
and decided to look away.
Though your guess in commit 61d4fb0b349e
"
    Now, there might be delays until
    file->f_op->release::singleton_release() is called and
    i915->gem.mmap_singleton is set to NULL.
"
feels unlikely.
I suspect release() delay cannot be that long to cause rcu stall.
In the log prior to the splat there are just two mmap related calls
from selftests in i915_gem_mman_live_selftests():
i915: Running i915_gem_mman_live_selftests/igt_mmap_offset_exhaustion
i915: Running i915_gem_mman_live_selftests/igt_mmap
1st mmap test passed, but 2nd failed.
So it looks like it's not a race, but an issue with cleanup in that driver.
And instead of getting to the bottom of the issue
you've decided to paper over with get_file_active().
I agree with that trade-off.
But the bug in i915 is still there and it's probably an UAF.
get_file_active() is probably operating on a broken 'struct file'
that got to zero, but somehow it still around
or it's just a garbage memory and file->f_count
just happened to be zero.

My point is that it's not ok to have such double standards.
On one side you're arguing that we shouldn't introduce kfunc:
+__bpf_kfunc struct file *bpf_get_task_exe_file(struct task_struct *task)
+{
+ return get_task_exe_file(task);
+}
that cleanly takes ref cnt on task->mm->exe_file and _not_ using lower
level get_file/get_file_rcu/get_file_active api-s directly which
are certainly problematic to expose anywhere, since safe_by_rcu
protocol is delicate.

But on the other side there is buggy i915 that does
questionable dance with get_file_active().
It's EXPORT_SYMBOL_GPL as well and out of tree driver can
ruin safe_by_rcu file properties with hard to debug consequences.

> There is absolutely no way that any userspace will
> get access to such low-level helpers. They have zero business to be
> involved in the lifetimes of objects on this level just as no module has.

correct, and kfuncs do not give bpf prog to do direct get_file*() access
because we saw how tricky safe_by_rcu is.
Hence kfuncs acquire file via get_task_exe_file or get_mm_exe_file
and release via fput.
That's the same pattern that security/tomoyo/util.c is doing:
   exe_file = get_mm_exe_file(mm);
   if (!exe_file)
        return NULL;

   cp = tomoyo_realpath_from_path(&exe_file->f_path);
   fput(exe_file);

in bpf_lsm case it will be:

   exe_file = bpf_get_mm_exe_file(mm);
   if (!exe_file)
   // the verifier will enforce that bpf prog has this NULL check here
   // because we annotate kfunc as:
BTF_ID_FLAGS(func, bpf_get_mm_exe_file, KF_ACQUIRE | KF_TRUSTED_ARGS |
KF_RET_NULL)

 bpf_path_d_path(&exe_file->f_path, ...);
 bpf_put_file(exe_file);
// and the verifier will enforce that bpf_put_file() is called too.
// and there is no path out of this bpf program that can take file refcnt
// without releasing.

So really these kfuncs are a nop from vfs pov.
If there is a bug in the verifier we will debug it and we will fix it.

You keep saying that bpf_d_path() is a mess.
Right. It is a mess now and we're fixing it.
When it was introduced 4 years ago it was safe at that time.
The unrelated verifier "smartness" made it possible to use it in UAF.
We found the issue now and we're fixing it.
Over these years we didn't ask vfs folks to help fix such bugs,
and not asking for help now.
You're being cc-ed on the patches to be aware on how we plan to fix
this bpf_d_path() mess. If you have a viable alternative please suggest.
As it stands the new kfuncs are clean and safe way to solve this mess.

Christian Brauner March 18, 2024, 1:14 p.m. UTC | #16

On Wed, Mar 13, 2024 at 02:05:13PM -0700, Alexei Starovoitov wrote:
> On Mon, Mar 11, 2024 at 5:01 AM Christian Brauner <brauner@kernel.org> wrote:
> >
> > > > > One can argue that get_mm_exe_file() is not exported,
> > > > > but it's nothing but rcu_lock-wrap plus get_file_rcu()
> > > > > which is EXPORT_SYMBOL.
> > > >
> > > > Oh, good spot. That's an accident. get_file_rcu() definitely shouldn't
> > > > be exported. So that'll be removed asap.
> > >
> > > So, just to make a point that
> > > "Included in that set are functions that aren't currently even
> > > exported to modules"
> > > you want to un-export get_file_rcu() ?
> >
> > No. The reason it was exported was because of the drm subsystem and we
> > already quite disliked that. But it turned out that's not needed so in
> > commit 61d4fb0b349e ("file, i915: fix file reference for
> > mmap_singleton()") they were moved away from this helper.
> 
> Arguably that commit 61d4fb0b349e should have had
> Fixes: 0ede61d8589c ("file: convert to SLAB_TYPESAFE_BY_RCU")
> i915 was buggy before you touched it
> and safe_by_rcu exposed the bug.
> I can see why you guys looked at it, saw issues,
> and decided to look away.
> Though your guess in commit 61d4fb0b349e
> "
>     Now, there might be delays until
>     file->f_op->release::singleton_release() is called and
>     i915->gem.mmap_singleton is set to NULL.
> "
> feels unlikely.
> I suspect release() delay cannot be that long to cause rcu stall.
> In the log prior to the splat there are just two mmap related calls
> from selftests in i915_gem_mman_live_selftests():
> i915: Running i915_gem_mman_live_selftests/igt_mmap_offset_exhaustion
> i915: Running i915_gem_mman_live_selftests/igt_mmap
> 1st mmap test passed, but 2nd failed.
> So it looks like it's not a race, but an issue with cleanup in that driver.
> And instead of getting to the bottom of the issue
> you've decided to paper over with get_file_active().
> I agree with that trade-off.
> But the bug in i915 is still there and it's probably an UAF.
> get_file_active() is probably operating on a broken 'struct file'
> that got to zero, but somehow it still around
> or it's just a garbage memory and file->f_count
> just happened to be zero.
> 
> My point is that it's not ok to have such double standards.
> On one side you're arguing that we shouldn't introduce kfunc:
> +__bpf_kfunc struct file *bpf_get_task_exe_file(struct task_struct *task)
> +{
> + return get_task_exe_file(task);
> +}
> that cleanly takes ref cnt on task->mm->exe_file and _not_ using lower
> level get_file/get_file_rcu/get_file_active api-s directly which
> are certainly problematic to expose anywhere, since safe_by_rcu
> protocol is delicate.
> 
> But on the other side there is buggy i915 that does
> questionable dance with get_file_active().
> It's EXPORT_SYMBOL_GPL as well and out of tree driver can
> ruin safe_by_rcu file properties with hard to debug consequences.

You're lending strong support for my earlier point. Because this is a
clear an example where a subsystem got access to a helper that it
shouldn't have had access to. So we fixed the issue.

But this whole polemic just illustrates that you simply didn't bother to
understand how the code works. The way you talk about UAF together with
SLAB_TYPESAFE_BY_RCU is telling. Please read the code instead of
guessing.

So the same way we don't have to take responsibility for bpf
misunderstanding the expectations of d_path() we don't have to take
responsibility for misusing an internal helper by another subsystem.

So your argument here is moot at best and polemical and opportunistic at
worst. It certainly doesn't illustrate what you think it does.

And the above is fundamentally a suspiciously long sideshow. So let's
get back to the core topic: Unless you document your rules when it's ok
for a bpf program to come along and request access to internal apis
patchsets such as this are not acceptable.

> 
> > There is absolutely no way that any userspace will
> > get access to such low-level helpers. They have zero business to be
> > involved in the lifetimes of objects on this level just as no module has.
> 
> correct, and kfuncs do not give bpf prog to do direct get_file*() access
> because we saw how tricky safe_by_rcu is.
> Hence kfuncs acquire file via get_task_exe_file or get_mm_exe_file
> and release via fput.
> That's the same pattern that security/tomoyo/util.c is doing:
>    exe_file = get_mm_exe_file(mm);
>    if (!exe_file)
>         return NULL;
> 
>    cp = tomoyo_realpath_from_path(&exe_file->f_path);
>    fput(exe_file);
> 
> in bpf_lsm case it will be:
> 
>    exe_file = bpf_get_mm_exe_file(mm);
>    if (!exe_file)
>    // the verifier will enforce that bpf prog has this NULL check here
>    // because we annotate kfunc as:
> BTF_ID_FLAGS(func, bpf_get_mm_exe_file, KF_ACQUIRE | KF_TRUSTED_ARGS |
> KF_RET_NULL)
> 
>  bpf_path_d_path(&exe_file->f_path, ...);
>  bpf_put_file(exe_file);
> // and the verifier will enforce that bpf_put_file() is called too.
> // and there is no path out of this bpf program that can take file refcnt
> // without releasing.
> 
> So really these kfuncs are a nop from vfs pov.
> If there is a bug in the verifier we will debug it and we will fix it.
> 
> You keep saying that bpf_d_path() is a mess.
> Right. It is a mess now and we're fixing it.
> When it was introduced 4 years ago it was safe at that time.

Uhm, no it was always sketchy.

> The unrelated verifier "smartness" made it possible to use it in UAF.
> We found the issue now and we're fixing it.
> Over these years we didn't ask vfs folks to help fix such bugs,
> and not asking for help now.
> You're being cc-ed on the patches to be aware on how we plan to fix
> this bpf_d_path() mess. If you have a viable alternative please suggest.

The fix is to export a variant with trusted args.

> As it stands the new kfuncs are clean and safe way to solve this mess.

I will remind you of what you have been told in [1]:

"No. It's not up to maintainers to suggest alternatives. Sometimes it's
simply enough to explain *why* something isn't acceptable.

A plain "no" without explanation isn't sufficient. NAKs need a good
reason. But they don't need more than that.

The onus of coming up with an acceptable solution is on the person who
needs something new."

You've been provided:

a) good reasons why the patchset in it's current form isn't acceptable
   repeated multiple times
b) support for exporting a variant of bpf_d_path() that is safe to use
c) a request that all kfunc exports for the vfs will have to be located
   under fs/, not in kernel/bpf/
d) a path on how to move forward with additional kfunc requests:
   Clear and documented rules when it's ok for someone to come along and
   request access to bpf kfuncs when it's to be rejected and when it's
   ok to be supported.

You repeatedly threatening to go over the heads of people will not make
them more amenable to happily integrate with your subsystem.

[1]: https://lore.kernel.org/all/CAHk-=whD2HMe4ja5nR6WWofUh3nLmhjoSPDvZm2-XMGjeie5Tg@mail.gmail.com

Christian Brauner March 18, 2024, 1:24 p.m. UTC | #17

On Tue, Mar 12, 2024 at 05:06:36PM +0000, Matt Bobrowski wrote:
> Hey Christian,
> 
> On Mon, Mar 11, 2024 at 01:00:56PM +0100, Christian Brauner wrote:
> > On Fri, Mar 08, 2024 at 05:23:30PM -0800, Alexei Starovoitov wrote:
> > > On Fri, Mar 8, 2024 at 2:36 AM Christian Brauner <brauner@kernel.org> wrote:
> > > >
> > > >
> > > > These exports are specifically for an out-of-tree BPF LSM program that
> > > > is not accessible to the public. The question in the other mail stands.
> > > 
> > > The question was already answered. You just don't like the answer.
> > > bpf progs are not equivalent to kernel modules.
> > > They have completely different safety and visibility properties.
> > > The safety part I already talked about.
> > > Sounds like the visibility has to be explained.
> > > Kernel modules are opaque binary blobs.
> > > bpf programs are fully transparent. The intent is known
> > > to the verifier and to anyone with understanding
> > > of bpf assembly.
> > > Those that cannot read bpf asm can read C source code that is
> > > embedded in the bpf program in kernel memory.
> > > It's not the same as "llvm-dwarfdump module.ko" on disk.
> > > The bpf prog source code is loaded into the kernel
> > > at program verification time for debugging and visibility reasons.
> > > If there is a verifier bug and bpf manages to crash the kernel
> > > vmcore will have relevant lines of program C source code right there.
> > > 
> > > Hence out-of-tree or in-tree bpf makes no practical difference.
> > > The program cannot hide its meaning and doesn't hamper debugging.
> > > 
> > > Hence adding EXPORT_SYMBOL == Brace for impact!
> > > Expect crashes, api misuse and what not.
> > > 
> > > While adding bpf_kfunc is a nop for kernel development.
> > > If kfunc is in the way of code refactoring it can be removed
> > > (as we demonstrated several times).
> > > A kfunc won't cause headaches for the kernel code it is
> > > calling (assuming no verifier bugs).
> > > If there is a bug it's on us to fix it as we demonstrated in the past.
> > > For example: bpf_probe_read_kernel().
> > > It's a wrapper of copy_from_kernel_nofault() and over the years
> > > bpf users hit various bugs in copy_from_kernel_nofault(),
> > > reported them, and _bpf developers_ fixed them.
> > > Though copy_from_kernel_nofault() is as generic as it can get
> > > and the same bugs could have been reproduced without bpf
> > > we took care of fixing these parts of the kernel.
> > > 
> > > Look at path_put().
> > > It's EXPORT_SYMBOL and any kernel module can easily screw up
> > > reference counting, so that sooner or later distro folks
> > > will experience debug pains due to out-of-tree drivers.
> > > 
> > > kfunc that calls path_put() won't have such consequences.
> > > The verifier will prevent path_put() on a pointer that wasn't
> > > acquired by the same bpf program. No support pains.
> > > It's a nop for vfs folks.
> > > 
> > > > > First of all, there is no such thing as get_task_fs_pwd/root
> > > > > in the kernel.
> > > >
> > > > Yeah, we'd need specific helpers for a never seen before out-of-tree BPF
> > > > LSM. I don't see how that's different from an out-of-tree kernel module.
> > > 
> > > Sorry, but you don't seem to understand what bpf can and cannot do,
> > > hence they look similar.
> > 
> > Maybe. On the other hand you seem to ignore what I'm saying. You
> > currently don't have a clear set of rules for when it's ok for someone
> > to send patches and request access to bpf kfuncs to implement a new BPF
> > program. This patchset very much illustrates this point. The safety
> > properties of bpf don't matter for this. And again, your safety
> > properties very much didn't protect you from your bpf_d_path() mess.
> > 
> > We're not even clearly told where and how these helper are supposed to be
> > used. That's not ok and will never be ok. As long as there are no clear
> > criteria to operate under this is highly problematic. This may be fine
> > from a bpf perspective and one can even understand why because that's
> > apparently your model or promise to your users. But there's no reason to
> > expect the same level of laxness from any of the subsystems you're
> > requesting kfuncs from.
> 
> You raise a completely fair point, and I truly do apologies for the
> lack of context and in depth explanations around the specific
> situations that the proposed BPF kfuncs are intended to be used
> from. Admittedly, that's a failure on my part, and I can completely
> understand why from a maintainers point of view there would be
> reservations around acknowledging requests for adding such invisible
> dependencies.

Thanks for providing more background.

> Now, I'm in a little bit of a tough situation as I'm unable to point
> you to an open-source BPF LSM implementation that intends to make use
> of such newly proposed BPF kfuncs. That's just an unfortunate
> constraint and circumstance that I'm having to deal with, so I'm just
> going to have to provide heavily redacted and incomplete example to
> illustrate how these BPF kfuncs intend to be used from BPF LSM
> programs that I personally work on here at Google. Notably though, the
> contexts that I do share here may obviously be a nonholistic view on
> how these newly introduced BPF kfuncs end up getting used in practice
> by some other completely arbitrary open-source BPF LSM programs.

I have to say that this to me is wild. Essentially we're supposed to
allow access to our internal APIs based on internal use-cases that
aren't public and likely never will be.

If that's acceptable then I want bpf to document this in their kernel
Documentation and submit this for review to the wider community.

Alexei Starovoitov March 27, 2024, 9:41 p.m. UTC | #18

On Mon, Mar 18, 2024 at 6:14 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Wed, Mar 13, 2024 at 02:05:13PM -0700, Alexei Starovoitov wrote:
>
> But this whole polemic just illustrates that you simply didn't bother to
> understand how the code works. The way you talk about UAF together with
> SLAB_TYPESAFE_BY_RCU is telling. Please read the code instead of
> guessing.

Ok. Fair enough. I've read the code and old threads from Sep-Nov.
I think the concerns about typesafe_by_rcu made folks believe in
races that don't exist.

           if (unlikely(!atomic_long_inc_not_zero(&file->f_count)))
..
            *  (a) the file ref already went down to zero and the
            *      file hasn't been reused yet or the file count
            *      isn't zero but the file has already been reused.

..
            if (unlikely(file != rcu_dereference_raw(*fdentry)) ||

The first part of the comment is partially incorrect.
(it's in the wrong place).

The file ref could have been down to zero and not reused yet,
but it's before atomic_long_inc_not_zero.
Once the code reaches 2nd check the file guaranteed to be reused
and it went through init_file(), because that's the only code
that brings it back from zero.
This race is ok:

cpu0                                    cpu1
file = rcu_dereference_raw(*fdentry);
// file->f_count == 1
                                        rcu_assign_pointer(fdt->fd[fd], NULL);
                                        fput() // reaches zero

atomic_long_inc_not_zero()
// will not succeed.

This race is ok too:
cpu0                                    cpu1
file = rcu_dereference_raw(*fdentry);
// file->f_count == 1
                                        rcu_assign_pointer(fdt->fd[fd], NULL);

atomic_long_inc_not_zero()
// succeeds. f_count == 2
                                        fput() // f_count == 1

file != rcu_dereference_raw(*fdentry)
// will fail
but it doesn't have to.
This is a safe race.
It's no different if cpu0 went through
these steps including successful last check
                                        and then cpu1 did close(fd)

The file held by cpu0 is not on the
path to zero.

Similarly, back then, there was a concern about two parallel __fget_files_rcu()
where one cpu incremented refcnt, failed some check and didn't do fput yet.
In this case the file is not on the path to zero either.
Both cpu-s saw non-zero f_count when they went through atomic_long_inc_not_zero.
The file is not on the path to be reused.

Now the second part of the comment
"the file count isn't zero but the file has already been reused"
is misleading.

The (file != rcu_dereference_raw(*fdentry)) check is racy.
Another cpu could have replaced that slot right after that check.
Example:
cpu0 doing lockless __fget_files_rcu() while cpu1 doing sys_close.
__fget_files_rcu() will be returning a file that doesn't exist in fdt.
And it's safe.

This race is possible:
cpu0                                    cpu1
file = rcu_dereference_raw(*fdentry);
                                        fput() reaches zero
                                        file_free
                                        alloc_empty_file // got same file
                                        init_file // f_count = 1
atomic_long_inc_not_zero()
// succeeds. f_count == 2
file != rcu_dereference_raw(*fdentry))
// preventing a reuse of a file that
was never in this fdt.

The only value of this check is to make sure that this file
either _is_ or _was_ at some point in this fdt.
It's not preventing reuse per-se.

This race is possible:
cpu0                                    cpu1
file = rcu_dereference_raw(*fdentry);
                                        fput() reaches zero
                                        file_free
                                        alloc_empty_file // got same file
                                        init_file // f_count = 1
                                        fd_install
atomic_long_inc_not_zero()
// succeeds. f_count == 2
file == rcu_dereference_raw(*fdentry))
// success, though the file _was reused_.

I suggest to revise the comment.

>
> You've been provided:
>
> a) good reasons why the patchset in it's current form isn't acceptable
>    repeated multiple times

We will improve commit logs in the next revision.

> b) support for exporting a variant of bpf_d_path() that is safe to use

good, but bpf_d_path kfunc alone is not useful.
As Jann noted back in September:
https://lore.kernel.org/all/CAG48ez2d5CW=CDi+fBOU1YqtwHfubN3q6w=1LfD+ss+Q1PWHgQ@mail.gmail.com/

The conversion of files to typesafe_by_rcu broke that verifier
assumption about mm->exe_file and we need kfuncs to safely
acquire/release file reference to fix that breakage.

> c) a request that all kfunc exports for the vfs will have to be located
>    under fs/, not in kernel/bpf/

we've added kfuncs to
net/netfilter/, net/xfrm/, net/ipv4/, kernel/cgroup/, drivers/hid/
because maintainers of those subsystems demonstrated understanding
of what bpf is doing and what these kfuncs are for.

We can put them in fs/, but you need to demonstrate willingness to
understand the problem we're solving instead of arguing
about how hard file typesafe_by_rcu is to understand.

> d) a path on how to move forward with additional kfunc requests:
>    Clear and documented rules when it's ok for someone to come along and
>    request access to bpf kfuncs when it's to be rejected and when it's
>    ok to be supported.

There are ~36500 EXPORT_SYMBOL in the kernel.
Are there "clear documented rules when it's ok for someone to"
add or remove them?
There is a gentleman's agreement that maintainers of subsystems need to
be cc-ed when new EXPORT_SYMBOL-s are added.
In this case no new EXPORT_SYMBOLs are requested.

Compare that to 221 bpf helpers (which are uapi, and for the last
2 years we didn't add a single one) and 151 bpf kfuncs which are
not uapi as clearly documented in Documentation/bpf/kfuncs.rst
When developers want to add them they cc bpf@vger and relevant
subsystems just like we did with netfilter, xfrm, cgroup, hid.
kfunc deprecation rules are also documented in kfunc.rst

> You repeatedly threatening to go over the heads of people will not make
> them more amenable to happily integrate with your subsystem.

This is not it. We made our own mistakes with bpf_d_path safety, and now
file typesafe_by_rcu broke bpf safety assumptions.
We have to fix it one way or the other.
It's bpf that got affected by your changes.
But we don't demand that you fix bpf bits. We're fixing them.
But you have to provide technical reasons why file acquire/release
kfuncs are not suitable.
"Only 3 people understand typesafe_by_rcu" is not it.

[v2,bpf-next,0/9] add new acquire/release BPF kfuncs

Message

Comments