[bpf,v2] bpf: Fix prog_array UAF in __uprobe_perf_func()

Message ID	20241206-bpf-fix-uprobe-uaf-v2-1-4c75c54fe424@google.com (mailing list archive)
State	New
Headers	show Received: from mail-wm1-f43.google.com (mail-wm1-f43.google.com [209.85.128.43]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id ECA0D1953BA for <linux-trace-kernel@vger.kernel.org>; Fri, 6 Dec 2024 20:45:33 +0000 (UTC) From: Jann Horn <jannh@google.com> Date: Fri, 06 Dec 2024 21:45:28 +0100 Subject: [PATCH bpf v2] bpf: Fix prog_array UAF in __uprobe_perf_func() Precedence: bulk MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-Id: <20241206-bpf-fix-uprobe-uaf-v2-1-4c75c54fe424@google.com> To: Alexei Starovoitov <ast@kernel.org>, Daniel Borkmann <daniel@iogearbox.net>, John Fastabend <john.fastabend@gmail.com>, Andrii Nakryiko <andrii@kernel.org>, Martin KaFai Lau <martin.lau@linux.dev>, Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>, Yonghong Song <yonghong.song@linux.dev>, KP Singh <kpsingh@kernel.org>, Stanislav Fomichev <sdf@fomichev.me>, Hao Luo <haoluo@google.com>, Jiri Olsa <jolsa@kernel.org>, Steven Rostedt <rostedt@goodmis.org>, Masami Hiramatsu <mhiramat@kernel.org>, Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, Delyan Kratunov <delyank@fb.com> Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, stable@vger.kernel.org, Jann Horn <jannh@google.com>
Series	[bpf,v2] bpf: Fix prog_array UAF in __uprobe_perf_func() \| expand [bpf,v2] bpf: Fix prog_array UAF in __uprobe_perf_func()

Jann Horn Dec. 6, 2024, 8:45 p.m. UTC

Currently, the pointer stored in call->prog_array is loaded in
__uprobe_perf_func(), with no RCU annotation and no RCU protection, so the
loaded pointer can immediately be dangling. Later,
bpf_prog_run_array_uprobe() starts a RCU-trace read-side critical section,
but this is too late. It then uses rcu_dereference_check(), but this use of
rcu_dereference_check() does not actually dereference anything.

It looks like the intention was to pass a pointer to the member
call->prog_array into bpf_prog_run_array_uprobe() and actually dereference
the pointer in there. Fix the issue by actually doing that.

Fixes: 8c7dcb84e3b7 ("bpf: implement sleepable uprobes by chaining gps")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
---
To reproduce, in include/linux/bpf.h, patch in a mdelay(10000) directly
before the might_fault() in bpf_prog_run_array_uprobe() and add an
include of linux/delay.h.

Build this userspace program:

```
$ cat dummy.c
#include <stdio.h>
int main(void) {
  printf("hello world\n");
}
$ gcc -o dummy dummy.c
```

Then build this BPF program and load it (change the path to point to
the "dummy" binary you built):

```
$ cat bpf-uprobe-kern.c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
char _license[] SEC("license") = "GPL";

SEC("uprobe//home/user/bpf-uprobe-uaf/dummy:main")
int BPF_UPROBE(main_uprobe) {
  bpf_printk("main uprobe triggered\n");
  return 0;
}
$ clang -O2 -g -target bpf -c -o bpf-uprobe-kern.o bpf-uprobe-kern.c
$ sudo bpftool prog loadall bpf-uprobe-kern.o uprobe-test autoattach
```

Then run ./dummy in one terminal, and after launching it, run
`sudo umount uprobe-test` in another terminal. Once the 10-second
mdelay() is over, a use-after-free should occur, which may or may
not crash your kernel at the `prog->sleepable` check in
bpf_prog_run_array_uprobe() depending on your luck.
---
Changes in v2:
- remove diff chunk in patch notes that confuses git
- Link to v1: https://lore.kernel.org/r/20241206-bpf-fix-uprobe-uaf-v1-1-6869c8a17258@google.com
---
 include/linux/bpf.h         | 4 ++--
 kernel/trace/trace_uprobe.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)


---
base-commit: 509df676c2d79c985ec2eaa3e3a3bbe557645861
change-id: 20241206-bpf-fix-uprobe-uaf-53d928bab3d0

Andrii Nakryiko Dec. 6, 2024, 10:15 p.m. UTC | #1

On Fri, Dec 6, 2024 at 12:45 PM Jann Horn <jannh@google.com> wrote:
>
> Currently, the pointer stored in call->prog_array is loaded in
> __uprobe_perf_func(), with no RCU annotation and no RCU protection, so the
> loaded pointer can immediately be dangling. Later,
> bpf_prog_run_array_uprobe() starts a RCU-trace read-side critical section,
> but this is too late. It then uses rcu_dereference_check(), but this use of
> rcu_dereference_check() does not actually dereference anything.
>
> It looks like the intention was to pass a pointer to the member
> call->prog_array into bpf_prog_run_array_uprobe() and actually dereference
> the pointer in there. Fix the issue by actually doing that.
>
> Fixes: 8c7dcb84e3b7 ("bpf: implement sleepable uprobes by chaining gps")
> Cc: stable@vger.kernel.org
> Signed-off-by: Jann Horn <jannh@google.com>
> ---
> To reproduce, in include/linux/bpf.h, patch in a mdelay(10000) directly
> before the might_fault() in bpf_prog_run_array_uprobe() and add an
> include of linux/delay.h.
>
> Build this userspace program:
>
> ```
> $ cat dummy.c
> #include <stdio.h>
> int main(void) {
>   printf("hello world\n");
> }
> $ gcc -o dummy dummy.c
> ```
>
> Then build this BPF program and load it (change the path to point to
> the "dummy" binary you built):
>
> ```
> $ cat bpf-uprobe-kern.c
> #include <linux/bpf.h>
> #include <bpf/bpf_helpers.h>
> #include <bpf/bpf_tracing.h>
> char _license[] SEC("license") = "GPL";
>
> SEC("uprobe//home/user/bpf-uprobe-uaf/dummy:main")
> int BPF_UPROBE(main_uprobe) {
>   bpf_printk("main uprobe triggered\n");
>   return 0;
> }
> $ clang -O2 -g -target bpf -c -o bpf-uprobe-kern.o bpf-uprobe-kern.c
> $ sudo bpftool prog loadall bpf-uprobe-kern.o uprobe-test autoattach
> ```
>
> Then run ./dummy in one terminal, and after launching it, run
> `sudo umount uprobe-test` in another terminal. Once the 10-second
> mdelay() is over, a use-after-free should occur, which may or may
> not crash your kernel at the `prog->sleepable` check in
> bpf_prog_run_array_uprobe() depending on your luck.
> ---
> Changes in v2:
> - remove diff chunk in patch notes that confuses git
> - Link to v1: https://lore.kernel.org/r/20241206-bpf-fix-uprobe-uaf-v1-1-6869c8a17258@google.com
> ---
>  include/linux/bpf.h         | 4 ++--
>  kernel/trace/trace_uprobe.c | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
>

Looking at how similar in spirit bpf_prog_run_array() is meant to be
used, it seems like it is the caller's responsibility to
RCU-dereference array and keep RCU critical section before calling
into bpf_prog_run_array(). So I wonder if it's best to do this instead
(Gmail will butcher the diff, but it's about the idea):

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index eaee2a819f4c..4b8a9edd3727 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2193,26 +2193,25 @@ bpf_prog_run_array(const struct bpf_prog_array *array,
  * rcu-protected dynamically sized maps.
  */
 static __always_inline u32
-bpf_prog_run_array_uprobe(const struct bpf_prog_array __rcu *array_rcu,
+bpf_prog_run_array_uprobe(const struct bpf_prog_array *array,
                          const void *ctx, bpf_prog_run_fn run_prog)
 {
        const struct bpf_prog_array_item *item;
        const struct bpf_prog *prog;
-       const struct bpf_prog_array *array;
        struct bpf_run_ctx *old_run_ctx;
        struct bpf_trace_run_ctx run_ctx;
        u32 ret = 1;

        might_fault();
+       RCU_LOCKDEP_WARN(!rcu_read_lock_trace_held(), "no rcu lock held");
+
+       if (unlikely(!array))
+               goto out;

-       rcu_read_lock_trace();
        migrate_disable();

        run_ctx.is_uprobe = true;

-       array = rcu_dereference_check(array_rcu, rcu_read_lock_trace_held());
-       if (unlikely(!array))
-               goto out;
        old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
        item = &array->items[0];
        while ((prog = READ_ONCE(item->prog))) {
@@ -2229,7 +2228,6 @@ bpf_prog_run_array_uprobe(const struct
bpf_prog_array __rcu *array_rcu,
        bpf_reset_run_ctx(old_run_ctx);
 out:
        migrate_enable();
-       rcu_read_unlock_trace();
        return ret;
 }

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index fed382b7881b..87a2b8fefa90 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -1404,7 +1404,9 @@ static void __uprobe_perf_func(struct trace_uprobe *tu,
        if (bpf_prog_array_valid(call)) {
                u32 ret;

+               rcu_read_lock_trace();
                ret = bpf_prog_run_array_uprobe(call->prog_array,
regs, bpf_prog_run);
+               rcu_read_unlock_trace();
                if (!ret)
                        return;
        }


> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index eaee2a819f4c150a34a7b1075584711609682e4c..00b3c5b197df75a0386233b9885b480b2ce72f5f 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -2193,7 +2193,7 @@ bpf_prog_run_array(const struct bpf_prog_array *array,
>   * rcu-protected dynamically sized maps.
>   */
>  static __always_inline u32
> -bpf_prog_run_array_uprobe(const struct bpf_prog_array __rcu *array_rcu,
> +bpf_prog_run_array_uprobe(struct bpf_prog_array __rcu **array_rcu,
>                           const void *ctx, bpf_prog_run_fn run_prog)
>  {
>         const struct bpf_prog_array_item *item;
> @@ -2210,7 +2210,7 @@ bpf_prog_run_array_uprobe(const struct bpf_prog_array __rcu *array_rcu,
>
>         run_ctx.is_uprobe = true;
>
> -       array = rcu_dereference_check(array_rcu, rcu_read_lock_trace_held());
> +       array = rcu_dereference_check(*array_rcu, rcu_read_lock_trace_held());
>         if (unlikely(!array))
>                 goto out;
>         old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index fed382b7881b82ee3c334ea77860cce77581a74d..c4eef1eb5ddb3c65205aa9d64af1c72d62fab87f 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -1404,7 +1404,7 @@ static void __uprobe_perf_func(struct trace_uprobe *tu,
>         if (bpf_prog_array_valid(call)) {
>                 u32 ret;
>
> -               ret = bpf_prog_run_array_uprobe(call->prog_array, regs, bpf_prog_run);
> +               ret = bpf_prog_run_array_uprobe(&call->prog_array, regs, bpf_prog_run);
>                 if (!ret)
>                         return;
>         }
>
> ---
> base-commit: 509df676c2d79c985ec2eaa3e3a3bbe557645861
> change-id: 20241206-bpf-fix-uprobe-uaf-53d928bab3d0
>
> --
> Jann Horn <jannh@google.com>
>
>

Jann Horn Dec. 6, 2024, 10:25 p.m. UTC | #2

On Fri, Dec 6, 2024 at 11:15 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
> On Fri, Dec 6, 2024 at 12:45 PM Jann Horn <jannh@google.com> wrote:
> >
> > Currently, the pointer stored in call->prog_array is loaded in
> > __uprobe_perf_func(), with no RCU annotation and no RCU protection, so the
> > loaded pointer can immediately be dangling. Later,
> > bpf_prog_run_array_uprobe() starts a RCU-trace read-side critical section,
> > but this is too late. It then uses rcu_dereference_check(), but this use of
> > rcu_dereference_check() does not actually dereference anything.
> >
> > It looks like the intention was to pass a pointer to the member
> > call->prog_array into bpf_prog_run_array_uprobe() and actually dereference
> > the pointer in there. Fix the issue by actually doing that.
> >
> > Fixes: 8c7dcb84e3b7 ("bpf: implement sleepable uprobes by chaining gps")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Jann Horn <jannh@google.com>
> > ---
> > To reproduce, in include/linux/bpf.h, patch in a mdelay(10000) directly
> > before the might_fault() in bpf_prog_run_array_uprobe() and add an
> > include of linux/delay.h.
> >
> > Build this userspace program:
> >
> > ```
> > $ cat dummy.c
> > #include <stdio.h>
> > int main(void) {
> >   printf("hello world\n");
> > }
> > $ gcc -o dummy dummy.c
> > ```
> >
> > Then build this BPF program and load it (change the path to point to
> > the "dummy" binary you built):
> >
> > ```
> > $ cat bpf-uprobe-kern.c
> > #include <linux/bpf.h>
> > #include <bpf/bpf_helpers.h>
> > #include <bpf/bpf_tracing.h>
> > char _license[] SEC("license") = "GPL";
> >
> > SEC("uprobe//home/user/bpf-uprobe-uaf/dummy:main")
> > int BPF_UPROBE(main_uprobe) {
> >   bpf_printk("main uprobe triggered\n");
> >   return 0;
> > }
> > $ clang -O2 -g -target bpf -c -o bpf-uprobe-kern.o bpf-uprobe-kern.c
> > $ sudo bpftool prog loadall bpf-uprobe-kern.o uprobe-test autoattach
> > ```
> >
> > Then run ./dummy in one terminal, and after launching it, run
> > `sudo umount uprobe-test` in another terminal. Once the 10-second
> > mdelay() is over, a use-after-free should occur, which may or may
> > not crash your kernel at the `prog->sleepable` check in
> > bpf_prog_run_array_uprobe() depending on your luck.
> > ---
> > Changes in v2:
> > - remove diff chunk in patch notes that confuses git
> > - Link to v1: https://lore.kernel.org/r/20241206-bpf-fix-uprobe-uaf-v1-1-6869c8a17258@google.com
> > ---
> >  include/linux/bpf.h         | 4 ++--
> >  kernel/trace/trace_uprobe.c | 2 +-
> >  2 files changed, 3 insertions(+), 3 deletions(-)
> >
>
> Looking at how similar in spirit bpf_prog_run_array() is meant to be
> used, it seems like it is the caller's responsibility to
> RCU-dereference array and keep RCU critical section before calling
> into bpf_prog_run_array(). So I wonder if it's best to do this instead
> (Gmail will butcher the diff, but it's about the idea):

Yeah, that's the other option I was considering. That would be more
consistent with the existing bpf_prog_run_array(), but has the
downside of unnecessarily pushing responsibility up to the caller...
I'm fine with either.

> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index eaee2a819f4c..4b8a9edd3727 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -2193,26 +2193,25 @@ bpf_prog_run_array(const struct bpf_prog_array *array,
>   * rcu-protected dynamically sized maps.
>   */
>  static __always_inline u32
> -bpf_prog_run_array_uprobe(const struct bpf_prog_array __rcu *array_rcu,
> +bpf_prog_run_array_uprobe(const struct bpf_prog_array *array,
>                           const void *ctx, bpf_prog_run_fn run_prog)
>  {
>         const struct bpf_prog_array_item *item;
>         const struct bpf_prog *prog;
> -       const struct bpf_prog_array *array;
>         struct bpf_run_ctx *old_run_ctx;
>         struct bpf_trace_run_ctx run_ctx;
>         u32 ret = 1;
>
>         might_fault();
> +       RCU_LOCKDEP_WARN(!rcu_read_lock_trace_held(), "no rcu lock held");
> +
> +       if (unlikely(!array))
> +               goto out;
>
> -       rcu_read_lock_trace();
>         migrate_disable();
>
>         run_ctx.is_uprobe = true;
>
> -       array = rcu_dereference_check(array_rcu, rcu_read_lock_trace_held());
> -       if (unlikely(!array))
> -               goto out;
>         old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
>         item = &array->items[0];
>         while ((prog = READ_ONCE(item->prog))) {
> @@ -2229,7 +2228,6 @@ bpf_prog_run_array_uprobe(const struct
> bpf_prog_array __rcu *array_rcu,
>         bpf_reset_run_ctx(old_run_ctx);
>  out:
>         migrate_enable();
> -       rcu_read_unlock_trace();
>         return ret;
>  }
>
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index fed382b7881b..87a2b8fefa90 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -1404,7 +1404,9 @@ static void __uprobe_perf_func(struct trace_uprobe *tu,
>         if (bpf_prog_array_valid(call)) {
>                 u32 ret;
>
> +               rcu_read_lock_trace();
>                 ret = bpf_prog_run_array_uprobe(call->prog_array,
> regs, bpf_prog_run);

But then this should be something like this (possibly split across
multiple lines with a helper variable or such):

ret = bpf_prog_run_array_uprobe(rcu_dereference_check(call->prog_array,
rcu_read_lock_trace_held()), regs, bpf_prog_run);

> +               rcu_read_unlock_trace();
>                 if (!ret)
>                         return;
>         }
>
>

Andrii Nakryiko Dec. 6, 2024, 10:30 p.m. UTC | #3

On Fri, Dec 6, 2024 at 2:25 PM Jann Horn <jannh@google.com> wrote:
>
> On Fri, Dec 6, 2024 at 11:15 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> > On Fri, Dec 6, 2024 at 12:45 PM Jann Horn <jannh@google.com> wrote:
> > >
> > > Currently, the pointer stored in call->prog_array is loaded in
> > > __uprobe_perf_func(), with no RCU annotation and no RCU protection, so the
> > > loaded pointer can immediately be dangling. Later,
> > > bpf_prog_run_array_uprobe() starts a RCU-trace read-side critical section,
> > > but this is too late. It then uses rcu_dereference_check(), but this use of
> > > rcu_dereference_check() does not actually dereference anything.
> > >
> > > It looks like the intention was to pass a pointer to the member
> > > call->prog_array into bpf_prog_run_array_uprobe() and actually dereference
> > > the pointer in there. Fix the issue by actually doing that.
> > >
> > > Fixes: 8c7dcb84e3b7 ("bpf: implement sleepable uprobes by chaining gps")
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Jann Horn <jannh@google.com>
> > > ---
> > > To reproduce, in include/linux/bpf.h, patch in a mdelay(10000) directly
> > > before the might_fault() in bpf_prog_run_array_uprobe() and add an
> > > include of linux/delay.h.
> > >
> > > Build this userspace program:
> > >
> > > ```
> > > $ cat dummy.c
> > > #include <stdio.h>
> > > int main(void) {
> > >   printf("hello world\n");
> > > }
> > > $ gcc -o dummy dummy.c
> > > ```
> > >
> > > Then build this BPF program and load it (change the path to point to
> > > the "dummy" binary you built):
> > >
> > > ```
> > > $ cat bpf-uprobe-kern.c
> > > #include <linux/bpf.h>
> > > #include <bpf/bpf_helpers.h>
> > > #include <bpf/bpf_tracing.h>
> > > char _license[] SEC("license") = "GPL";
> > >
> > > SEC("uprobe//home/user/bpf-uprobe-uaf/dummy:main")
> > > int BPF_UPROBE(main_uprobe) {
> > >   bpf_printk("main uprobe triggered\n");
> > >   return 0;
> > > }
> > > $ clang -O2 -g -target bpf -c -o bpf-uprobe-kern.o bpf-uprobe-kern.c
> > > $ sudo bpftool prog loadall bpf-uprobe-kern.o uprobe-test autoattach
> > > ```
> > >
> > > Then run ./dummy in one terminal, and after launching it, run
> > > `sudo umount uprobe-test` in another terminal. Once the 10-second
> > > mdelay() is over, a use-after-free should occur, which may or may
> > > not crash your kernel at the `prog->sleepable` check in
> > > bpf_prog_run_array_uprobe() depending on your luck.
> > > ---
> > > Changes in v2:
> > > - remove diff chunk in patch notes that confuses git
> > > - Link to v1: https://lore.kernel.org/r/20241206-bpf-fix-uprobe-uaf-v1-1-6869c8a17258@google.com
> > > ---
> > >  include/linux/bpf.h         | 4 ++--
> > >  kernel/trace/trace_uprobe.c | 2 +-
> > >  2 files changed, 3 insertions(+), 3 deletions(-)
> > >
> >
> > Looking at how similar in spirit bpf_prog_run_array() is meant to be
> > used, it seems like it is the caller's responsibility to
> > RCU-dereference array and keep RCU critical section before calling
> > into bpf_prog_run_array(). So I wonder if it's best to do this instead
> > (Gmail will butcher the diff, but it's about the idea):
>
> Yeah, that's the other option I was considering. That would be more
> consistent with the existing bpf_prog_run_array(), but has the
> downside of unnecessarily pushing responsibility up to the caller...
> I'm fine with either.

there is really just one caller ("legacy" singular uprobe handler), so
I think this should be fine. Unless someone objects I'd keep it
consistent with other "prog_array_run" helpers

>
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index eaee2a819f4c..4b8a9edd3727 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -2193,26 +2193,25 @@ bpf_prog_run_array(const struct bpf_prog_array *array,
> >   * rcu-protected dynamically sized maps.
> >   */
> >  static __always_inline u32
> > -bpf_prog_run_array_uprobe(const struct bpf_prog_array __rcu *array_rcu,
> > +bpf_prog_run_array_uprobe(const struct bpf_prog_array *array,
> >                           const void *ctx, bpf_prog_run_fn run_prog)
> >  {
> >         const struct bpf_prog_array_item *item;
> >         const struct bpf_prog *prog;
> > -       const struct bpf_prog_array *array;
> >         struct bpf_run_ctx *old_run_ctx;
> >         struct bpf_trace_run_ctx run_ctx;
> >         u32 ret = 1;
> >
> >         might_fault();
> > +       RCU_LOCKDEP_WARN(!rcu_read_lock_trace_held(), "no rcu lock held");
> > +
> > +       if (unlikely(!array))
> > +               goto out;
> >
> > -       rcu_read_lock_trace();
> >         migrate_disable();
> >
> >         run_ctx.is_uprobe = true;
> >
> > -       array = rcu_dereference_check(array_rcu, rcu_read_lock_trace_held());
> > -       if (unlikely(!array))
> > -               goto out;
> >         old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> >         item = &array->items[0];
> >         while ((prog = READ_ONCE(item->prog))) {
> > @@ -2229,7 +2228,6 @@ bpf_prog_run_array_uprobe(const struct
> > bpf_prog_array __rcu *array_rcu,
> >         bpf_reset_run_ctx(old_run_ctx);
> >  out:
> >         migrate_enable();
> > -       rcu_read_unlock_trace();
> >         return ret;
> >  }
> >
> > diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> > index fed382b7881b..87a2b8fefa90 100644
> > --- a/kernel/trace/trace_uprobe.c
> > +++ b/kernel/trace/trace_uprobe.c
> > @@ -1404,7 +1404,9 @@ static void __uprobe_perf_func(struct trace_uprobe *tu,
> >         if (bpf_prog_array_valid(call)) {
> >                 u32 ret;
> >
> > +               rcu_read_lock_trace();
> >                 ret = bpf_prog_run_array_uprobe(call->prog_array,
> > regs, bpf_prog_run);
>
> But then this should be something like this (possibly split across
> multiple lines with a helper variable or such):
>
> ret = bpf_prog_run_array_uprobe(rcu_dereference_check(call->prog_array,
> rcu_read_lock_trace_held()), regs, bpf_prog_run);

Yeah, absolutely, forgot to move the RCU dereference part, good catch!
But I wouldn't do the _check() variant here, literally the previous
line does rcu_read_trace_lock(), so this check part seems like just
unnecessary verboseness, I'd go with a simple rcu_dereference().

>
> > +               rcu_read_unlock_trace();
> >                 if (!ret)
> >                         return;
> >         }
> >
> >

Jann Horn Dec. 6, 2024, 10:43 p.m. UTC | #4

On Fri, Dec 6, 2024 at 11:30 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
> On Fri, Dec 6, 2024 at 2:25 PM Jann Horn <jannh@google.com> wrote:
> >
> > On Fri, Dec 6, 2024 at 11:15 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > > On Fri, Dec 6, 2024 at 12:45 PM Jann Horn <jannh@google.com> wrote:
> > > >
> > > > Currently, the pointer stored in call->prog_array is loaded in
> > > > __uprobe_perf_func(), with no RCU annotation and no RCU protection, so the
> > > > loaded pointer can immediately be dangling. Later,
> > > > bpf_prog_run_array_uprobe() starts a RCU-trace read-side critical section,
> > > > but this is too late. It then uses rcu_dereference_check(), but this use of
> > > > rcu_dereference_check() does not actually dereference anything.
> > > >
> > > > It looks like the intention was to pass a pointer to the member
> > > > call->prog_array into bpf_prog_run_array_uprobe() and actually dereference
> > > > the pointer in there. Fix the issue by actually doing that.
> > > >
> > > > Fixes: 8c7dcb84e3b7 ("bpf: implement sleepable uprobes by chaining gps")
> > > > Cc: stable@vger.kernel.org
> > > > Signed-off-by: Jann Horn <jannh@google.com>
> > > > ---
> > > > To reproduce, in include/linux/bpf.h, patch in a mdelay(10000) directly
> > > > before the might_fault() in bpf_prog_run_array_uprobe() and add an
> > > > include of linux/delay.h.
> > > >
> > > > Build this userspace program:
> > > >
> > > > ```
> > > > $ cat dummy.c
> > > > #include <stdio.h>
> > > > int main(void) {
> > > >   printf("hello world\n");
> > > > }
> > > > $ gcc -o dummy dummy.c
> > > > ```
> > > >
> > > > Then build this BPF program and load it (change the path to point to
> > > > the "dummy" binary you built):
> > > >
> > > > ```
> > > > $ cat bpf-uprobe-kern.c
> > > > #include <linux/bpf.h>
> > > > #include <bpf/bpf_helpers.h>
> > > > #include <bpf/bpf_tracing.h>
> > > > char _license[] SEC("license") = "GPL";
> > > >
> > > > SEC("uprobe//home/user/bpf-uprobe-uaf/dummy:main")
> > > > int BPF_UPROBE(main_uprobe) {
> > > >   bpf_printk("main uprobe triggered\n");
> > > >   return 0;
> > > > }
> > > > $ clang -O2 -g -target bpf -c -o bpf-uprobe-kern.o bpf-uprobe-kern.c
> > > > $ sudo bpftool prog loadall bpf-uprobe-kern.o uprobe-test autoattach
> > > > ```
> > > >
> > > > Then run ./dummy in one terminal, and after launching it, run
> > > > `sudo umount uprobe-test` in another terminal. Once the 10-second
> > > > mdelay() is over, a use-after-free should occur, which may or may
> > > > not crash your kernel at the `prog->sleepable` check in
> > > > bpf_prog_run_array_uprobe() depending on your luck.
> > > > ---
> > > > Changes in v2:
> > > > - remove diff chunk in patch notes that confuses git
> > > > - Link to v1: https://lore.kernel.org/r/20241206-bpf-fix-uprobe-uaf-v1-1-6869c8a17258@google.com
> > > > ---
> > > >  include/linux/bpf.h         | 4 ++--
> > > >  kernel/trace/trace_uprobe.c | 2 +-
> > > >  2 files changed, 3 insertions(+), 3 deletions(-)
> > > >
> > >
> > > Looking at how similar in spirit bpf_prog_run_array() is meant to be
> > > used, it seems like it is the caller's responsibility to
> > > RCU-dereference array and keep RCU critical section before calling
> > > into bpf_prog_run_array(). So I wonder if it's best to do this instead
> > > (Gmail will butcher the diff, but it's about the idea):
> >
> > Yeah, that's the other option I was considering. That would be more
> > consistent with the existing bpf_prog_run_array(), but has the
> > downside of unnecessarily pushing responsibility up to the caller...
> > I'm fine with either.
>
> there is really just one caller ("legacy" singular uprobe handler), so
> I think this should be fine. Unless someone objects I'd keep it
> consistent with other "prog_array_run" helpers

Ack, I will make it consistent.

> > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > index eaee2a819f4c..4b8a9edd3727 100644
> > > --- a/include/linux/bpf.h
> > > +++ b/include/linux/bpf.h
> > > @@ -2193,26 +2193,25 @@ bpf_prog_run_array(const struct bpf_prog_array *array,
> > >   * rcu-protected dynamically sized maps.
> > >   */
> > >  static __always_inline u32
> > > -bpf_prog_run_array_uprobe(const struct bpf_prog_array __rcu *array_rcu,
> > > +bpf_prog_run_array_uprobe(const struct bpf_prog_array *array,
> > >                           const void *ctx, bpf_prog_run_fn run_prog)
> > >  {
> > >         const struct bpf_prog_array_item *item;
> > >         const struct bpf_prog *prog;
> > > -       const struct bpf_prog_array *array;
> > >         struct bpf_run_ctx *old_run_ctx;
> > >         struct bpf_trace_run_ctx run_ctx;
> > >         u32 ret = 1;
> > >
> > >         might_fault();
> > > +       RCU_LOCKDEP_WARN(!rcu_read_lock_trace_held(), "no rcu lock held");
> > > +
> > > +       if (unlikely(!array))
> > > +               goto out;
> > >
> > > -       rcu_read_lock_trace();
> > >         migrate_disable();
> > >
> > >         run_ctx.is_uprobe = true;
> > >
> > > -       array = rcu_dereference_check(array_rcu, rcu_read_lock_trace_held());
> > > -       if (unlikely(!array))
> > > -               goto out;
> > >         old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> > >         item = &array->items[0];
> > >         while ((prog = READ_ONCE(item->prog))) {
> > > @@ -2229,7 +2228,6 @@ bpf_prog_run_array_uprobe(const struct
> > > bpf_prog_array __rcu *array_rcu,
> > >         bpf_reset_run_ctx(old_run_ctx);
> > >  out:
> > >         migrate_enable();
> > > -       rcu_read_unlock_trace();
> > >         return ret;
> > >  }
> > >
> > > diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> > > index fed382b7881b..87a2b8fefa90 100644
> > > --- a/kernel/trace/trace_uprobe.c
> > > +++ b/kernel/trace/trace_uprobe.c
> > > @@ -1404,7 +1404,9 @@ static void __uprobe_perf_func(struct trace_uprobe *tu,
> > >         if (bpf_prog_array_valid(call)) {
> > >                 u32 ret;
> > >
> > > +               rcu_read_lock_trace();
> > >                 ret = bpf_prog_run_array_uprobe(call->prog_array,
> > > regs, bpf_prog_run);
> >
> > But then this should be something like this (possibly split across
> > multiple lines with a helper variable or such):
> >
> > ret = bpf_prog_run_array_uprobe(rcu_dereference_check(call->prog_array,
> > rcu_read_lock_trace_held()), regs, bpf_prog_run);
>
> Yeah, absolutely, forgot to move the RCU dereference part, good catch!
> But I wouldn't do the _check() variant here, literally the previous
> line does rcu_read_trace_lock(), so this check part seems like just
> unnecessary verboseness, I'd go with a simple rcu_dereference().

rcu_dereference() is not legal there - that asserts that we are in a
normal RCU read-side critical section, which we are not.
rcu_dereference_raw() would be, but I think it is nice to document the
semantics to make it explicit under which lock we're operating.

I'll send a v3 in a bit after testing it.

Jann Horn Dec. 6, 2024, 11:13 p.m. UTC | #5

On Fri, Dec 6, 2024 at 11:43 PM Jann Horn <jannh@google.com> wrote:
> On Fri, Dec 6, 2024 at 11:30 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> > On Fri, Dec 6, 2024 at 2:25 PM Jann Horn <jannh@google.com> wrote:
> > >
> > > On Fri, Dec 6, 2024 at 11:15 PM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > > On Fri, Dec 6, 2024 at 12:45 PM Jann Horn <jannh@google.com> wrote:
> > > > >
> > > > > Currently, the pointer stored in call->prog_array is loaded in
> > > > > __uprobe_perf_func(), with no RCU annotation and no RCU protection, so the
> > > > > loaded pointer can immediately be dangling. Later,
> > > > > bpf_prog_run_array_uprobe() starts a RCU-trace read-side critical section,
> > > > > but this is too late. It then uses rcu_dereference_check(), but this use of
> > > > > rcu_dereference_check() does not actually dereference anything.
> > > > >
> > > > > It looks like the intention was to pass a pointer to the member
> > > > > call->prog_array into bpf_prog_run_array_uprobe() and actually dereference
> > > > > the pointer in there. Fix the issue by actually doing that.
> > > > >
> > > > > Fixes: 8c7dcb84e3b7 ("bpf: implement sleepable uprobes by chaining gps")
> > > > > Cc: stable@vger.kernel.org
> > > > > Signed-off-by: Jann Horn <jannh@google.com>
> > > > > ---
> > > > > To reproduce, in include/linux/bpf.h, patch in a mdelay(10000) directly
> > > > > before the might_fault() in bpf_prog_run_array_uprobe() and add an
> > > > > include of linux/delay.h.
> > > > >
> > > > > Build this userspace program:
> > > > >
> > > > > ```
> > > > > $ cat dummy.c
> > > > > #include <stdio.h>
> > > > > int main(void) {
> > > > >   printf("hello world\n");
> > > > > }
> > > > > $ gcc -o dummy dummy.c
> > > > > ```
> > > > >
> > > > > Then build this BPF program and load it (change the path to point to
> > > > > the "dummy" binary you built):
> > > > >
> > > > > ```
> > > > > $ cat bpf-uprobe-kern.c
> > > > > #include <linux/bpf.h>
> > > > > #include <bpf/bpf_helpers.h>
> > > > > #include <bpf/bpf_tracing.h>
> > > > > char _license[] SEC("license") = "GPL";
> > > > >
> > > > > SEC("uprobe//home/user/bpf-uprobe-uaf/dummy:main")
> > > > > int BPF_UPROBE(main_uprobe) {
> > > > >   bpf_printk("main uprobe triggered\n");
> > > > >   return 0;
> > > > > }
> > > > > $ clang -O2 -g -target bpf -c -o bpf-uprobe-kern.o bpf-uprobe-kern.c
> > > > > $ sudo bpftool prog loadall bpf-uprobe-kern.o uprobe-test autoattach
> > > > > ```
> > > > >
> > > > > Then run ./dummy in one terminal, and after launching it, run
> > > > > `sudo umount uprobe-test` in another terminal. Once the 10-second
> > > > > mdelay() is over, a use-after-free should occur, which may or may
> > > > > not crash your kernel at the `prog->sleepable` check in
> > > > > bpf_prog_run_array_uprobe() depending on your luck.
> > > > > ---
> > > > > Changes in v2:
> > > > > - remove diff chunk in patch notes that confuses git
> > > > > - Link to v1: https://lore.kernel.org/r/20241206-bpf-fix-uprobe-uaf-v1-1-6869c8a17258@google.com
> > > > > ---
> > > > >  include/linux/bpf.h         | 4 ++--
> > > > >  kernel/trace/trace_uprobe.c | 2 +-
> > > > >  2 files changed, 3 insertions(+), 3 deletions(-)
> > > > >
> > > >
> > > > Looking at how similar in spirit bpf_prog_run_array() is meant to be
> > > > used, it seems like it is the caller's responsibility to
> > > > RCU-dereference array and keep RCU critical section before calling
> > > > into bpf_prog_run_array(). So I wonder if it's best to do this instead
> > > > (Gmail will butcher the diff, but it's about the idea):
> > >
> > > Yeah, that's the other option I was considering. That would be more
> > > consistent with the existing bpf_prog_run_array(), but has the
> > > downside of unnecessarily pushing responsibility up to the caller...
> > > I'm fine with either.
> >
> > there is really just one caller ("legacy" singular uprobe handler), so
> > I think this should be fine. Unless someone objects I'd keep it
> > consistent with other "prog_array_run" helpers
>
> Ack, I will make it consistent.
>
> > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > index eaee2a819f4c..4b8a9edd3727 100644
> > > > --- a/include/linux/bpf.h
> > > > +++ b/include/linux/bpf.h
> > > > @@ -2193,26 +2193,25 @@ bpf_prog_run_array(const struct bpf_prog_array *array,
> > > >   * rcu-protected dynamically sized maps.
> > > >   */
> > > >  static __always_inline u32
> > > > -bpf_prog_run_array_uprobe(const struct bpf_prog_array __rcu *array_rcu,
> > > > +bpf_prog_run_array_uprobe(const struct bpf_prog_array *array,
> > > >                           const void *ctx, bpf_prog_run_fn run_prog)
> > > >  {
> > > >         const struct bpf_prog_array_item *item;
> > > >         const struct bpf_prog *prog;
> > > > -       const struct bpf_prog_array *array;
> > > >         struct bpf_run_ctx *old_run_ctx;
> > > >         struct bpf_trace_run_ctx run_ctx;
> > > >         u32 ret = 1;
> > > >
> > > >         might_fault();
> > > > +       RCU_LOCKDEP_WARN(!rcu_read_lock_trace_held(), "no rcu lock held");
> > > > +
> > > > +       if (unlikely(!array))
> > > > +               goto out;
> > > >
> > > > -       rcu_read_lock_trace();
> > > >         migrate_disable();
> > > >
> > > >         run_ctx.is_uprobe = true;
> > > >
> > > > -       array = rcu_dereference_check(array_rcu, rcu_read_lock_trace_held());
> > > > -       if (unlikely(!array))
> > > > -               goto out;
> > > >         old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> > > >         item = &array->items[0];
> > > >         while ((prog = READ_ONCE(item->prog))) {
> > > > @@ -2229,7 +2228,6 @@ bpf_prog_run_array_uprobe(const struct
> > > > bpf_prog_array __rcu *array_rcu,
> > > >         bpf_reset_run_ctx(old_run_ctx);
> > > >  out:
> > > >         migrate_enable();
> > > > -       rcu_read_unlock_trace();
> > > >         return ret;
> > > >  }
> > > >
> > > > diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> > > > index fed382b7881b..87a2b8fefa90 100644
> > > > --- a/kernel/trace/trace_uprobe.c
> > > > +++ b/kernel/trace/trace_uprobe.c
> > > > @@ -1404,7 +1404,9 @@ static void __uprobe_perf_func(struct trace_uprobe *tu,
> > > >         if (bpf_prog_array_valid(call)) {
> > > >                 u32 ret;
> > > >
> > > > +               rcu_read_lock_trace();
> > > >                 ret = bpf_prog_run_array_uprobe(call->prog_array,
> > > > regs, bpf_prog_run);
> > >
> > > But then this should be something like this (possibly split across
> > > multiple lines with a helper variable or such):
> > >
> > > ret = bpf_prog_run_array_uprobe(rcu_dereference_check(call->prog_array,
> > > rcu_read_lock_trace_held()), regs, bpf_prog_run);
> >
> > Yeah, absolutely, forgot to move the RCU dereference part, good catch!
> > But I wouldn't do the _check() variant here, literally the previous
> > line does rcu_read_trace_lock(), so this check part seems like just
> > unnecessary verboseness, I'd go with a simple rcu_dereference().
>
> rcu_dereference() is not legal there - that asserts that we are in a
> normal RCU read-side critical section, which we are not.
> rcu_dereference_raw() would be, but I think it is nice to document the
> semantics to make it explicit under which lock we're operating.
>
> I'll send a v3 in a bit after testing it.

Actually, now I'm still hitting a page fault with my WIP v3 fix
applied... I'll probably poke at this some more next week.

Andrii Nakryiko Dec. 6, 2024, 11:15 p.m. UTC | #6

On Fri, Dec 6, 2024 at 3:14 PM Jann Horn <jannh@google.com> wrote:
>
> On Fri, Dec 6, 2024 at 11:43 PM Jann Horn <jannh@google.com> wrote:
> > On Fri, Dec 6, 2024 at 11:30 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > > On Fri, Dec 6, 2024 at 2:25 PM Jann Horn <jannh@google.com> wrote:
> > > >
> > > > On Fri, Dec 6, 2024 at 11:15 PM Andrii Nakryiko
> > > > <andrii.nakryiko@gmail.com> wrote:
> > > > > On Fri, Dec 6, 2024 at 12:45 PM Jann Horn <jannh@google.com> wrote:
> > > > > >
> > > > > > Currently, the pointer stored in call->prog_array is loaded in
> > > > > > __uprobe_perf_func(), with no RCU annotation and no RCU protection, so the
> > > > > > loaded pointer can immediately be dangling. Later,
> > > > > > bpf_prog_run_array_uprobe() starts a RCU-trace read-side critical section,
> > > > > > but this is too late. It then uses rcu_dereference_check(), but this use of
> > > > > > rcu_dereference_check() does not actually dereference anything.
> > > > > >
> > > > > > It looks like the intention was to pass a pointer to the member
> > > > > > call->prog_array into bpf_prog_run_array_uprobe() and actually dereference
> > > > > > the pointer in there. Fix the issue by actually doing that.
> > > > > >
> > > > > > Fixes: 8c7dcb84e3b7 ("bpf: implement sleepable uprobes by chaining gps")
> > > > > > Cc: stable@vger.kernel.org
> > > > > > Signed-off-by: Jann Horn <jannh@google.com>
> > > > > > ---
> > > > > > To reproduce, in include/linux/bpf.h, patch in a mdelay(10000) directly
> > > > > > before the might_fault() in bpf_prog_run_array_uprobe() and add an
> > > > > > include of linux/delay.h.
> > > > > >
> > > > > > Build this userspace program:
> > > > > >
> > > > > > ```
> > > > > > $ cat dummy.c
> > > > > > #include <stdio.h>
> > > > > > int main(void) {
> > > > > >   printf("hello world\n");
> > > > > > }
> > > > > > $ gcc -o dummy dummy.c
> > > > > > ```
> > > > > >
> > > > > > Then build this BPF program and load it (change the path to point to
> > > > > > the "dummy" binary you built):
> > > > > >
> > > > > > ```
> > > > > > $ cat bpf-uprobe-kern.c
> > > > > > #include <linux/bpf.h>
> > > > > > #include <bpf/bpf_helpers.h>
> > > > > > #include <bpf/bpf_tracing.h>
> > > > > > char _license[] SEC("license") = "GPL";
> > > > > >
> > > > > > SEC("uprobe//home/user/bpf-uprobe-uaf/dummy:main")
> > > > > > int BPF_UPROBE(main_uprobe) {
> > > > > >   bpf_printk("main uprobe triggered\n");
> > > > > >   return 0;
> > > > > > }
> > > > > > $ clang -O2 -g -target bpf -c -o bpf-uprobe-kern.o bpf-uprobe-kern.c
> > > > > > $ sudo bpftool prog loadall bpf-uprobe-kern.o uprobe-test autoattach
> > > > > > ```
> > > > > >
> > > > > > Then run ./dummy in one terminal, and after launching it, run
> > > > > > `sudo umount uprobe-test` in another terminal. Once the 10-second
> > > > > > mdelay() is over, a use-after-free should occur, which may or may
> > > > > > not crash your kernel at the `prog->sleepable` check in
> > > > > > bpf_prog_run_array_uprobe() depending on your luck.
> > > > > > ---
> > > > > > Changes in v2:
> > > > > > - remove diff chunk in patch notes that confuses git
> > > > > > - Link to v1: https://lore.kernel.org/r/20241206-bpf-fix-uprobe-uaf-v1-1-6869c8a17258@google.com
> > > > > > ---
> > > > > >  include/linux/bpf.h         | 4 ++--
> > > > > >  kernel/trace/trace_uprobe.c | 2 +-
> > > > > >  2 files changed, 3 insertions(+), 3 deletions(-)
> > > > > >
> > > > >
> > > > > Looking at how similar in spirit bpf_prog_run_array() is meant to be
> > > > > used, it seems like it is the caller's responsibility to
> > > > > RCU-dereference array and keep RCU critical section before calling
> > > > > into bpf_prog_run_array(). So I wonder if it's best to do this instead
> > > > > (Gmail will butcher the diff, but it's about the idea):
> > > >
> > > > Yeah, that's the other option I was considering. That would be more
> > > > consistent with the existing bpf_prog_run_array(), but has the
> > > > downside of unnecessarily pushing responsibility up to the caller...
> > > > I'm fine with either.
> > >
> > > there is really just one caller ("legacy" singular uprobe handler), so
> > > I think this should be fine. Unless someone objects I'd keep it
> > > consistent with other "prog_array_run" helpers
> >
> > Ack, I will make it consistent.
> >
> > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > > index eaee2a819f4c..4b8a9edd3727 100644
> > > > > --- a/include/linux/bpf.h
> > > > > +++ b/include/linux/bpf.h
> > > > > @@ -2193,26 +2193,25 @@ bpf_prog_run_array(const struct bpf_prog_array *array,
> > > > >   * rcu-protected dynamically sized maps.
> > > > >   */
> > > > >  static __always_inline u32
> > > > > -bpf_prog_run_array_uprobe(const struct bpf_prog_array __rcu *array_rcu,
> > > > > +bpf_prog_run_array_uprobe(const struct bpf_prog_array *array,
> > > > >                           const void *ctx, bpf_prog_run_fn run_prog)
> > > > >  {
> > > > >         const struct bpf_prog_array_item *item;
> > > > >         const struct bpf_prog *prog;
> > > > > -       const struct bpf_prog_array *array;
> > > > >         struct bpf_run_ctx *old_run_ctx;
> > > > >         struct bpf_trace_run_ctx run_ctx;
> > > > >         u32 ret = 1;
> > > > >
> > > > >         might_fault();
> > > > > +       RCU_LOCKDEP_WARN(!rcu_read_lock_trace_held(), "no rcu lock held");
> > > > > +
> > > > > +       if (unlikely(!array))
> > > > > +               goto out;
> > > > >
> > > > > -       rcu_read_lock_trace();
> > > > >         migrate_disable();
> > > > >
> > > > >         run_ctx.is_uprobe = true;
> > > > >
> > > > > -       array = rcu_dereference_check(array_rcu, rcu_read_lock_trace_held());
> > > > > -       if (unlikely(!array))
> > > > > -               goto out;
> > > > >         old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> > > > >         item = &array->items[0];
> > > > >         while ((prog = READ_ONCE(item->prog))) {
> > > > > @@ -2229,7 +2228,6 @@ bpf_prog_run_array_uprobe(const struct
> > > > > bpf_prog_array __rcu *array_rcu,
> > > > >         bpf_reset_run_ctx(old_run_ctx);
> > > > >  out:
> > > > >         migrate_enable();
> > > > > -       rcu_read_unlock_trace();
> > > > >         return ret;
> > > > >  }
> > > > >
> > > > > diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> > > > > index fed382b7881b..87a2b8fefa90 100644
> > > > > --- a/kernel/trace/trace_uprobe.c
> > > > > +++ b/kernel/trace/trace_uprobe.c
> > > > > @@ -1404,7 +1404,9 @@ static void __uprobe_perf_func(struct trace_uprobe *tu,
> > > > >         if (bpf_prog_array_valid(call)) {
> > > > >                 u32 ret;
> > > > >
> > > > > +               rcu_read_lock_trace();
> > > > >                 ret = bpf_prog_run_array_uprobe(call->prog_array,
> > > > > regs, bpf_prog_run);
> > > >
> > > > But then this should be something like this (possibly split across
> > > > multiple lines with a helper variable or such):
> > > >
> > > > ret = bpf_prog_run_array_uprobe(rcu_dereference_check(call->prog_array,
> > > > rcu_read_lock_trace_held()), regs, bpf_prog_run);
> > >
> > > Yeah, absolutely, forgot to move the RCU dereference part, good catch!
> > > But I wouldn't do the _check() variant here, literally the previous
> > > line does rcu_read_trace_lock(), so this check part seems like just
> > > unnecessary verboseness, I'd go with a simple rcu_dereference().
> >
> > rcu_dereference() is not legal there - that asserts that we are in a
> > normal RCU read-side critical section, which we are not.
> > rcu_dereference_raw() would be, but I think it is nice to document the
> > semantics to make it explicit under which lock we're operating.

sure, I don't mind

> >
> > I'll send a v3 in a bit after testing it.
>
> Actually, now I'm still hitting a page fault with my WIP v3 fix
> applied... I'll probably poke at this some more next week.

OK, that's interesting, keep us posted!

Jann Horn Dec. 9, 2024, 6:21 p.m. UTC | #7

On Sat, Dec 7, 2024 at 12:15 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
> On Fri, Dec 6, 2024 at 3:14 PM Jann Horn <jannh@google.com> wrote:
> > On Fri, Dec 6, 2024 at 11:43 PM Jann Horn <jannh@google.com> wrote:
> > > On Fri, Dec 6, 2024 at 11:30 PM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > > On Fri, Dec 6, 2024 at 2:25 PM Jann Horn <jannh@google.com> wrote:
> > > > >
> > > > > On Fri, Dec 6, 2024 at 11:15 PM Andrii Nakryiko
> > > > > <andrii.nakryiko@gmail.com> wrote:
> > > > > > On Fri, Dec 6, 2024 at 12:45 PM Jann Horn <jannh@google.com> wrote:
> > > > > > >
> > > > > > > Currently, the pointer stored in call->prog_array is loaded in
> > > > > > > __uprobe_perf_func(), with no RCU annotation and no RCU protection, so the
> > > > > > > loaded pointer can immediately be dangling. Later,
> > > > > > > bpf_prog_run_array_uprobe() starts a RCU-trace read-side critical section,
> > > > > > > but this is too late. It then uses rcu_dereference_check(), but this use of
> > > > > > > rcu_dereference_check() does not actually dereference anything.
> > > > > > >
> > > > > > > It looks like the intention was to pass a pointer to the member
> > > > > > > call->prog_array into bpf_prog_run_array_uprobe() and actually dereference
> > > > > > > the pointer in there. Fix the issue by actually doing that.
> > > > > > >
> > > > > > > Fixes: 8c7dcb84e3b7 ("bpf: implement sleepable uprobes by chaining gps")
> > > > > > > Cc: stable@vger.kernel.org
> > > > > > > Signed-off-by: Jann Horn <jannh@google.com>
> > > > > > > ---
> > > > > > > To reproduce, in include/linux/bpf.h, patch in a mdelay(10000) directly
> > > > > > > before the might_fault() in bpf_prog_run_array_uprobe() and add an
> > > > > > > include of linux/delay.h.
> > > > > > >
> > > > > > > Build this userspace program:
> > > > > > >
> > > > > > > ```
> > > > > > > $ cat dummy.c
> > > > > > > #include <stdio.h>
> > > > > > > int main(void) {
> > > > > > >   printf("hello world\n");
> > > > > > > }
> > > > > > > $ gcc -o dummy dummy.c
> > > > > > > ```
> > > > > > >
> > > > > > > Then build this BPF program and load it (change the path to point to
> > > > > > > the "dummy" binary you built):
> > > > > > >
> > > > > > > ```
> > > > > > > $ cat bpf-uprobe-kern.c
> > > > > > > #include <linux/bpf.h>
> > > > > > > #include <bpf/bpf_helpers.h>
> > > > > > > #include <bpf/bpf_tracing.h>
> > > > > > > char _license[] SEC("license") = "GPL";
> > > > > > >
> > > > > > > SEC("uprobe//home/user/bpf-uprobe-uaf/dummy:main")
> > > > > > > int BPF_UPROBE(main_uprobe) {
> > > > > > >   bpf_printk("main uprobe triggered\n");
> > > > > > >   return 0;
> > > > > > > }
> > > > > > > $ clang -O2 -g -target bpf -c -o bpf-uprobe-kern.o bpf-uprobe-kern.c
> > > > > > > $ sudo bpftool prog loadall bpf-uprobe-kern.o uprobe-test autoattach
> > > > > > > ```
> > > > > > >
> > > > > > > Then run ./dummy in one terminal, and after launching it, run
> > > > > > > `sudo umount uprobe-test` in another terminal. Once the 10-second
> > > > > > > mdelay() is over, a use-after-free should occur, which may or may
> > > > > > > not crash your kernel at the `prog->sleepable` check in
> > > > > > > bpf_prog_run_array_uprobe() depending on your luck.
> > > > > > > ---
> > > > > > > Changes in v2:
> > > > > > > - remove diff chunk in patch notes that confuses git
> > > > > > > - Link to v1: https://lore.kernel.org/r/20241206-bpf-fix-uprobe-uaf-v1-1-6869c8a17258@google.com
> > > > > > > ---
> > > > > > >  include/linux/bpf.h         | 4 ++--
> > > > > > >  kernel/trace/trace_uprobe.c | 2 +-
> > > > > > >  2 files changed, 3 insertions(+), 3 deletions(-)
> > > > > > >
> > > > > >
> > > > > > Looking at how similar in spirit bpf_prog_run_array() is meant to be
> > > > > > used, it seems like it is the caller's responsibility to
> > > > > > RCU-dereference array and keep RCU critical section before calling
> > > > > > into bpf_prog_run_array(). So I wonder if it's best to do this instead
> > > > > > (Gmail will butcher the diff, but it's about the idea):
> > > > >
> > > > > Yeah, that's the other option I was considering. That would be more
> > > > > consistent with the existing bpf_prog_run_array(), but has the
> > > > > downside of unnecessarily pushing responsibility up to the caller...
> > > > > I'm fine with either.
> > > >
> > > > there is really just one caller ("legacy" singular uprobe handler), so
> > > > I think this should be fine. Unless someone objects I'd keep it
> > > > consistent with other "prog_array_run" helpers
> > >
> > > Ack, I will make it consistent.
> > >
> > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > > > index eaee2a819f4c..4b8a9edd3727 100644
> > > > > > --- a/include/linux/bpf.h
> > > > > > +++ b/include/linux/bpf.h
> > > > > > @@ -2193,26 +2193,25 @@ bpf_prog_run_array(const struct bpf_prog_array *array,
> > > > > >   * rcu-protected dynamically sized maps.
> > > > > >   */
> > > > > >  static __always_inline u32
> > > > > > -bpf_prog_run_array_uprobe(const struct bpf_prog_array __rcu *array_rcu,
> > > > > > +bpf_prog_run_array_uprobe(const struct bpf_prog_array *array,
> > > > > >                           const void *ctx, bpf_prog_run_fn run_prog)
> > > > > >  {
> > > > > >         const struct bpf_prog_array_item *item;
> > > > > >         const struct bpf_prog *prog;
> > > > > > -       const struct bpf_prog_array *array;
> > > > > >         struct bpf_run_ctx *old_run_ctx;
> > > > > >         struct bpf_trace_run_ctx run_ctx;
> > > > > >         u32 ret = 1;
> > > > > >
> > > > > >         might_fault();
> > > > > > +       RCU_LOCKDEP_WARN(!rcu_read_lock_trace_held(), "no rcu lock held");
> > > > > > +
> > > > > > +       if (unlikely(!array))
> > > > > > +               goto out;
> > > > > >
> > > > > > -       rcu_read_lock_trace();
> > > > > >         migrate_disable();
> > > > > >
> > > > > >         run_ctx.is_uprobe = true;
> > > > > >
> > > > > > -       array = rcu_dereference_check(array_rcu, rcu_read_lock_trace_held());
> > > > > > -       if (unlikely(!array))
> > > > > > -               goto out;
> > > > > >         old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> > > > > >         item = &array->items[0];
> > > > > >         while ((prog = READ_ONCE(item->prog))) {
> > > > > > @@ -2229,7 +2228,6 @@ bpf_prog_run_array_uprobe(const struct
> > > > > > bpf_prog_array __rcu *array_rcu,
> > > > > >         bpf_reset_run_ctx(old_run_ctx);
> > > > > >  out:
> > > > > >         migrate_enable();
> > > > > > -       rcu_read_unlock_trace();
> > > > > >         return ret;
> > > > > >  }
> > > > > >
> > > > > > diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> > > > > > index fed382b7881b..87a2b8fefa90 100644
> > > > > > --- a/kernel/trace/trace_uprobe.c
> > > > > > +++ b/kernel/trace/trace_uprobe.c
> > > > > > @@ -1404,7 +1404,9 @@ static void __uprobe_perf_func(struct trace_uprobe *tu,
> > > > > >         if (bpf_prog_array_valid(call)) {
> > > > > >                 u32 ret;
> > > > > >
> > > > > > +               rcu_read_lock_trace();
> > > > > >                 ret = bpf_prog_run_array_uprobe(call->prog_array,
> > > > > > regs, bpf_prog_run);
> > > > >
> > > > > But then this should be something like this (possibly split across
> > > > > multiple lines with a helper variable or such):
> > > > >
> > > > > ret = bpf_prog_run_array_uprobe(rcu_dereference_check(call->prog_array,
> > > > > rcu_read_lock_trace_held()), regs, bpf_prog_run);
> > > >
> > > > Yeah, absolutely, forgot to move the RCU dereference part, good catch!
> > > > But I wouldn't do the _check() variant here, literally the previous
> > > > line does rcu_read_trace_lock(), so this check part seems like just
> > > > unnecessary verboseness, I'd go with a simple rcu_dereference().
> > >
> > > rcu_dereference() is not legal there - that asserts that we are in a
> > > normal RCU read-side critical section, which we are not.
> > > rcu_dereference_raw() would be, but I think it is nice to document the
> > > semantics to make it explicit under which lock we're operating.
>
> sure, I don't mind
>
> > >
> > > I'll send a v3 in a bit after testing it.
> >
> > Actually, now I'm still hitting a page fault with my WIP v3 fix
> > applied... I'll probably poke at this some more next week.
>
> OK, that's interesting, keep us posted!

If I replace the "uprobe/" in my reproducer with "uprobe.s/", the
reproducer stops crashing even on bpf/master without this fix -
because it happens that handle_swbp() is already holding a
rcu_read_lock_trace() lock way up the stack. So I think this fix
should still be applied, but it probably doesn't need to go into
stable unless there is another path to the buggy code that doesn't
come from handle_swbp(). I guess I probably should resend my patch
with an updated commit message pointing out this caveat?

The problem I'm actually hitting seems to be a use-after-free of a
"struct bpf_prog" because of mismatching RCU flavors. Uprobes always
use bpf_prog_run_array_uprobe() under tasks-trace-RCU protection. But
it is possible to attach a non-sleepable BPF program to a uprobe, and
non-sleepable BPF programs are freed via normal RCU (see
__bpf_prog_put_noref()). And that is what happens with the reproducer
from my initial post
(https://lore.kernel.org/all/20241206-bpf-fix-uprobe-uaf-v1-1-6869c8a17258@google.com/)
- I can see that __bpf_prog_put_noref runs with prog->sleepable==0.

So I think that while I am delaying execution in
bpf_prog_run_array_uprobe(), perf_event_detach_bpf_prog() NULLs out
the event->tp_event->prog_array pointer and does
bpf_prog_array_free_sleepable() followed by bpf_prog_put(), and then
the BPF program can be freed since the reader doesn't hold an RCU read
lock. This seems a bit annoying to fix - there could legitimately be
several versions of the bpf_prog_array that are still used by
tasks-trace-RCU readers, so I think we can't just NULL out the array
entry and use RCU for the bpf_prog_array access on the reader side. I
guess we could add another flag on BPF programs that answers "should
this program be freed via tasks-trace-RCU" (separately from whether
the program is sleepable)?

Andrii Nakryiko Dec. 9, 2024, 10:14 p.m. UTC | #8

On Mon, Dec 9, 2024 at 10:22 AM Jann Horn <jannh@google.com> wrote:
>
> On Sat, Dec 7, 2024 at 12:15 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> > On Fri, Dec 6, 2024 at 3:14 PM Jann Horn <jannh@google.com> wrote:
> > > On Fri, Dec 6, 2024 at 11:43 PM Jann Horn <jannh@google.com> wrote:
> > > > On Fri, Dec 6, 2024 at 11:30 PM Andrii Nakryiko
> > > > <andrii.nakryiko@gmail.com> wrote:
> > > > > On Fri, Dec 6, 2024 at 2:25 PM Jann Horn <jannh@google.com> wrote:
> > > > > >
> > > > > > On Fri, Dec 6, 2024 at 11:15 PM Andrii Nakryiko
> > > > > > <andrii.nakryiko@gmail.com> wrote:
> > > > > > > On Fri, Dec 6, 2024 at 12:45 PM Jann Horn <jannh@google.com> wrote:
> > > > > > > >
> > > > > > > > Currently, the pointer stored in call->prog_array is loaded in
> > > > > > > > __uprobe_perf_func(), with no RCU annotation and no RCU protection, so the
> > > > > > > > loaded pointer can immediately be dangling. Later,
> > > > > > > > bpf_prog_run_array_uprobe() starts a RCU-trace read-side critical section,
> > > > > > > > but this is too late. It then uses rcu_dereference_check(), but this use of
> > > > > > > > rcu_dereference_check() does not actually dereference anything.
> > > > > > > >
> > > > > > > > It looks like the intention was to pass a pointer to the member
> > > > > > > > call->prog_array into bpf_prog_run_array_uprobe() and actually dereference
> > > > > > > > the pointer in there. Fix the issue by actually doing that.
> > > > > > > >
> > > > > > > > Fixes: 8c7dcb84e3b7 ("bpf: implement sleepable uprobes by chaining gps")
> > > > > > > > Cc: stable@vger.kernel.org
> > > > > > > > Signed-off-by: Jann Horn <jannh@google.com>
> > > > > > > > ---
> > > > > > > > To reproduce, in include/linux/bpf.h, patch in a mdelay(10000) directly
> > > > > > > > before the might_fault() in bpf_prog_run_array_uprobe() and add an
> > > > > > > > include of linux/delay.h.
> > > > > > > >
> > > > > > > > Build this userspace program:
> > > > > > > >
> > > > > > > > ```
> > > > > > > > $ cat dummy.c
> > > > > > > > #include <stdio.h>
> > > > > > > > int main(void) {
> > > > > > > >   printf("hello world\n");
> > > > > > > > }
> > > > > > > > $ gcc -o dummy dummy.c
> > > > > > > > ```
> > > > > > > >
> > > > > > > > Then build this BPF program and load it (change the path to point to
> > > > > > > > the "dummy" binary you built):
> > > > > > > >
> > > > > > > > ```
> > > > > > > > $ cat bpf-uprobe-kern.c
> > > > > > > > #include <linux/bpf.h>
> > > > > > > > #include <bpf/bpf_helpers.h>
> > > > > > > > #include <bpf/bpf_tracing.h>
> > > > > > > > char _license[] SEC("license") = "GPL";
> > > > > > > >
> > > > > > > > SEC("uprobe//home/user/bpf-uprobe-uaf/dummy:main")
> > > > > > > > int BPF_UPROBE(main_uprobe) {
> > > > > > > >   bpf_printk("main uprobe triggered\n");
> > > > > > > >   return 0;
> > > > > > > > }
> > > > > > > > $ clang -O2 -g -target bpf -c -o bpf-uprobe-kern.o bpf-uprobe-kern.c
> > > > > > > > $ sudo bpftool prog loadall bpf-uprobe-kern.o uprobe-test autoattach
> > > > > > > > ```
> > > > > > > >
> > > > > > > > Then run ./dummy in one terminal, and after launching it, run
> > > > > > > > `sudo umount uprobe-test` in another terminal. Once the 10-second
> > > > > > > > mdelay() is over, a use-after-free should occur, which may or may
> > > > > > > > not crash your kernel at the `prog->sleepable` check in
> > > > > > > > bpf_prog_run_array_uprobe() depending on your luck.
> > > > > > > > ---
> > > > > > > > Changes in v2:
> > > > > > > > - remove diff chunk in patch notes that confuses git
> > > > > > > > - Link to v1: https://lore.kernel.org/r/20241206-bpf-fix-uprobe-uaf-v1-1-6869c8a17258@google.com
> > > > > > > > ---
> > > > > > > >  include/linux/bpf.h         | 4 ++--
> > > > > > > >  kernel/trace/trace_uprobe.c | 2 +-
> > > > > > > >  2 files changed, 3 insertions(+), 3 deletions(-)
> > > > > > > >
> > > > > > >
> > > > > > > Looking at how similar in spirit bpf_prog_run_array() is meant to be
> > > > > > > used, it seems like it is the caller's responsibility to
> > > > > > > RCU-dereference array and keep RCU critical section before calling
> > > > > > > into bpf_prog_run_array(). So I wonder if it's best to do this instead
> > > > > > > (Gmail will butcher the diff, but it's about the idea):
> > > > > >
> > > > > > Yeah, that's the other option I was considering. That would be more
> > > > > > consistent with the existing bpf_prog_run_array(), but has the
> > > > > > downside of unnecessarily pushing responsibility up to the caller...
> > > > > > I'm fine with either.
> > > > >
> > > > > there is really just one caller ("legacy" singular uprobe handler), so
> > > > > I think this should be fine. Unless someone objects I'd keep it
> > > > > consistent with other "prog_array_run" helpers
> > > >
> > > > Ack, I will make it consistent.
> > > >
> > > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > > > > index eaee2a819f4c..4b8a9edd3727 100644
> > > > > > > --- a/include/linux/bpf.h
> > > > > > > +++ b/include/linux/bpf.h
> > > > > > > @@ -2193,26 +2193,25 @@ bpf_prog_run_array(const struct bpf_prog_array *array,
> > > > > > >   * rcu-protected dynamically sized maps.
> > > > > > >   */
> > > > > > >  static __always_inline u32
> > > > > > > -bpf_prog_run_array_uprobe(const struct bpf_prog_array __rcu *array_rcu,
> > > > > > > +bpf_prog_run_array_uprobe(const struct bpf_prog_array *array,
> > > > > > >                           const void *ctx, bpf_prog_run_fn run_prog)
> > > > > > >  {
> > > > > > >         const struct bpf_prog_array_item *item;
> > > > > > >         const struct bpf_prog *prog;
> > > > > > > -       const struct bpf_prog_array *array;
> > > > > > >         struct bpf_run_ctx *old_run_ctx;
> > > > > > >         struct bpf_trace_run_ctx run_ctx;
> > > > > > >         u32 ret = 1;
> > > > > > >
> > > > > > >         might_fault();
> > > > > > > +       RCU_LOCKDEP_WARN(!rcu_read_lock_trace_held(), "no rcu lock held");
> > > > > > > +
> > > > > > > +       if (unlikely(!array))
> > > > > > > +               goto out;
> > > > > > >
> > > > > > > -       rcu_read_lock_trace();
> > > > > > >         migrate_disable();
> > > > > > >
> > > > > > >         run_ctx.is_uprobe = true;
> > > > > > >
> > > > > > > -       array = rcu_dereference_check(array_rcu, rcu_read_lock_trace_held());
> > > > > > > -       if (unlikely(!array))
> > > > > > > -               goto out;
> > > > > > >         old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> > > > > > >         item = &array->items[0];
> > > > > > >         while ((prog = READ_ONCE(item->prog))) {
> > > > > > > @@ -2229,7 +2228,6 @@ bpf_prog_run_array_uprobe(const struct
> > > > > > > bpf_prog_array __rcu *array_rcu,
> > > > > > >         bpf_reset_run_ctx(old_run_ctx);
> > > > > > >  out:
> > > > > > >         migrate_enable();
> > > > > > > -       rcu_read_unlock_trace();
> > > > > > >         return ret;
> > > > > > >  }
> > > > > > >
> > > > > > > diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> > > > > > > index fed382b7881b..87a2b8fefa90 100644
> > > > > > > --- a/kernel/trace/trace_uprobe.c
> > > > > > > +++ b/kernel/trace/trace_uprobe.c
> > > > > > > @@ -1404,7 +1404,9 @@ static void __uprobe_perf_func(struct trace_uprobe *tu,
> > > > > > >         if (bpf_prog_array_valid(call)) {
> > > > > > >                 u32 ret;
> > > > > > >
> > > > > > > +               rcu_read_lock_trace();
> > > > > > >                 ret = bpf_prog_run_array_uprobe(call->prog_array,
> > > > > > > regs, bpf_prog_run);
> > > > > >
> > > > > > But then this should be something like this (possibly split across
> > > > > > multiple lines with a helper variable or such):
> > > > > >
> > > > > > ret = bpf_prog_run_array_uprobe(rcu_dereference_check(call->prog_array,
> > > > > > rcu_read_lock_trace_held()), regs, bpf_prog_run);
> > > > >
> > > > > Yeah, absolutely, forgot to move the RCU dereference part, good catch!
> > > > > But I wouldn't do the _check() variant here, literally the previous
> > > > > line does rcu_read_trace_lock(), so this check part seems like just
> > > > > unnecessary verboseness, I'd go with a simple rcu_dereference().
> > > >
> > > > rcu_dereference() is not legal there - that asserts that we are in a
> > > > normal RCU read-side critical section, which we are not.
> > > > rcu_dereference_raw() would be, but I think it is nice to document the
> > > > semantics to make it explicit under which lock we're operating.
> >
> > sure, I don't mind
> >
> > > >
> > > > I'll send a v3 in a bit after testing it.
> > >
> > > Actually, now I'm still hitting a page fault with my WIP v3 fix
> > > applied... I'll probably poke at this some more next week.
> >
> > OK, that's interesting, keep us posted!
>
> If I replace the "uprobe/" in my reproducer with "uprobe.s/", the
> reproducer stops crashing even on bpf/master without this fix -
> because it happens that handle_swbp() is already holding a
> rcu_read_lock_trace() lock way up the stack. So I think this fix
> should still be applied, but it probably doesn't need to go into
> stable unless there is another path to the buggy code that doesn't
> come from handle_swbp(). I guess I probably should resend my patch
> with an updated commit message pointing out this caveat?
>
> The problem I'm actually hitting seems to be a use-after-free of a
> "struct bpf_prog" because of mismatching RCU flavors. Uprobes always
> use bpf_prog_run_array_uprobe() under tasks-trace-RCU protection. But
> it is possible to attach a non-sleepable BPF program to a uprobe, and
> non-sleepable BPF programs are freed via normal RCU (see
> __bpf_prog_put_noref()). And that is what happens with the reproducer
> from my initial post
> (https://lore.kernel.org/all/20241206-bpf-fix-uprobe-uaf-v1-1-6869c8a17258@google.com/)
> - I can see that __bpf_prog_put_noref runs with prog->sleepable==0.
>
> So I think that while I am delaying execution in
> bpf_prog_run_array_uprobe(), perf_event_detach_bpf_prog() NULLs out
> the event->tp_event->prog_array pointer and does
> bpf_prog_array_free_sleepable() followed by bpf_prog_put(), and then
> the BPF program can be freed since the reader doesn't hold an RCU read
> lock. This seems a bit annoying to fix - there could legitimately be
> several versions of the bpf_prog_array that are still used by
> tasks-trace-RCU readers, so I think we can't just NULL out the array
> entry and use RCU for the bpf_prog_array access on the reader side. I
> guess we could add another flag on BPF programs that answers "should
> this program be freed via tasks-trace-RCU" (separately from whether
> the program is sleepable)?

Yes, we shouldn't NULL anything out.

This is the same issue we've been solving recently for sleepable
tracepoints, see [0] and other patches in the same patch set. We
solved it for sleepable (aka "faultable") tracepoints, but uprobes
have the same problem where the attachment point is sleepable by
nature (and thus protected by RCU Tasks Trace), but BPF program itself
is non-sleepable (and thus we only wait for synchronize_rcu() before
freeing), which causes a disconnect.

We can easily fix this for BPF link-based uprobes, but legacy uprobes
can be directly attached to perf event, so that's a bit more
cumbersome. Let me think what should be the best way to handle this.

Meanwhile, I agree, please send your original fix (with changes we
discussed), it's good to have them, even if they don't fix your
original issue. I'll CC you on fixes once I have them.

  [0] https://lore.kernel.org/all/20241101181754.782341-1-andrii@kernel.org/

Andrii Nakryiko Dec. 9, 2024, 10:45 p.m. UTC | #9

On Mon, Dec 9, 2024 at 2:14 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Dec 9, 2024 at 10:22 AM Jann Horn <jannh@google.com> wrote:
> >
> > On Sat, Dec 7, 2024 at 12:15 AM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > > On Fri, Dec 6, 2024 at 3:14 PM Jann Horn <jannh@google.com> wrote:
> > > > On Fri, Dec 6, 2024 at 11:43 PM Jann Horn <jannh@google.com> wrote:
> > > > > On Fri, Dec 6, 2024 at 11:30 PM Andrii Nakryiko
> > > > > <andrii.nakryiko@gmail.com> wrote:
> > > > > > On Fri, Dec 6, 2024 at 2:25 PM Jann Horn <jannh@google.com> wrote:
> > > > > > >
> > > > > > > On Fri, Dec 6, 2024 at 11:15 PM Andrii Nakryiko
> > > > > > > <andrii.nakryiko@gmail.com> wrote:
> > > > > > > > On Fri, Dec 6, 2024 at 12:45 PM Jann Horn <jannh@google.com> wrote:
> > > > > > > > >
> > > > > > > > > Currently, the pointer stored in call->prog_array is loaded in
> > > > > > > > > __uprobe_perf_func(), with no RCU annotation and no RCU protection, so the
> > > > > > > > > loaded pointer can immediately be dangling. Later,
> > > > > > > > > bpf_prog_run_array_uprobe() starts a RCU-trace read-side critical section,
> > > > > > > > > but this is too late. It then uses rcu_dereference_check(), but this use of
> > > > > > > > > rcu_dereference_check() does not actually dereference anything.
> > > > > > > > >
> > > > > > > > > It looks like the intention was to pass a pointer to the member
> > > > > > > > > call->prog_array into bpf_prog_run_array_uprobe() and actually dereference
> > > > > > > > > the pointer in there. Fix the issue by actually doing that.
> > > > > > > > >
> > > > > > > > > Fixes: 8c7dcb84e3b7 ("bpf: implement sleepable uprobes by chaining gps")
> > > > > > > > > Cc: stable@vger.kernel.org
> > > > > > > > > Signed-off-by: Jann Horn <jannh@google.com>
> > > > > > > > > ---
> > > > > > > > > To reproduce, in include/linux/bpf.h, patch in a mdelay(10000) directly
> > > > > > > > > before the might_fault() in bpf_prog_run_array_uprobe() and add an
> > > > > > > > > include of linux/delay.h.
> > > > > > > > >
> > > > > > > > > Build this userspace program:
> > > > > > > > >
> > > > > > > > > ```
> > > > > > > > > $ cat dummy.c
> > > > > > > > > #include <stdio.h>
> > > > > > > > > int main(void) {
> > > > > > > > >   printf("hello world\n");
> > > > > > > > > }
> > > > > > > > > $ gcc -o dummy dummy.c
> > > > > > > > > ```
> > > > > > > > >
> > > > > > > > > Then build this BPF program and load it (change the path to point to
> > > > > > > > > the "dummy" binary you built):
> > > > > > > > >
> > > > > > > > > ```
> > > > > > > > > $ cat bpf-uprobe-kern.c
> > > > > > > > > #include <linux/bpf.h>
> > > > > > > > > #include <bpf/bpf_helpers.h>
> > > > > > > > > #include <bpf/bpf_tracing.h>
> > > > > > > > > char _license[] SEC("license") = "GPL";
> > > > > > > > >
> > > > > > > > > SEC("uprobe//home/user/bpf-uprobe-uaf/dummy:main")
> > > > > > > > > int BPF_UPROBE(main_uprobe) {
> > > > > > > > >   bpf_printk("main uprobe triggered\n");
> > > > > > > > >   return 0;
> > > > > > > > > }
> > > > > > > > > $ clang -O2 -g -target bpf -c -o bpf-uprobe-kern.o bpf-uprobe-kern.c
> > > > > > > > > $ sudo bpftool prog loadall bpf-uprobe-kern.o uprobe-test autoattach
> > > > > > > > > ```
> > > > > > > > >
> > > > > > > > > Then run ./dummy in one terminal, and after launching it, run
> > > > > > > > > `sudo umount uprobe-test` in another terminal. Once the 10-second
> > > > > > > > > mdelay() is over, a use-after-free should occur, which may or may
> > > > > > > > > not crash your kernel at the `prog->sleepable` check in
> > > > > > > > > bpf_prog_run_array_uprobe() depending on your luck.
> > > > > > > > > ---
> > > > > > > > > Changes in v2:
> > > > > > > > > - remove diff chunk in patch notes that confuses git
> > > > > > > > > - Link to v1: https://lore.kernel.org/r/20241206-bpf-fix-uprobe-uaf-v1-1-6869c8a17258@google.com
> > > > > > > > > ---
> > > > > > > > >  include/linux/bpf.h         | 4 ++--
> > > > > > > > >  kernel/trace/trace_uprobe.c | 2 +-
> > > > > > > > >  2 files changed, 3 insertions(+), 3 deletions(-)
> > > > > > > > >
> > > > > > > >
> > > > > > > > Looking at how similar in spirit bpf_prog_run_array() is meant to be
> > > > > > > > used, it seems like it is the caller's responsibility to
> > > > > > > > RCU-dereference array and keep RCU critical section before calling
> > > > > > > > into bpf_prog_run_array(). So I wonder if it's best to do this instead
> > > > > > > > (Gmail will butcher the diff, but it's about the idea):
> > > > > > >
> > > > > > > Yeah, that's the other option I was considering. That would be more
> > > > > > > consistent with the existing bpf_prog_run_array(), but has the
> > > > > > > downside of unnecessarily pushing responsibility up to the caller...
> > > > > > > I'm fine with either.
> > > > > >
> > > > > > there is really just one caller ("legacy" singular uprobe handler), so
> > > > > > I think this should be fine. Unless someone objects I'd keep it
> > > > > > consistent with other "prog_array_run" helpers
> > > > >
> > > > > Ack, I will make it consistent.
> > > > >
> > > > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > > > > > index eaee2a819f4c..4b8a9edd3727 100644
> > > > > > > > --- a/include/linux/bpf.h
> > > > > > > > +++ b/include/linux/bpf.h
> > > > > > > > @@ -2193,26 +2193,25 @@ bpf_prog_run_array(const struct bpf_prog_array *array,
> > > > > > > >   * rcu-protected dynamically sized maps.
> > > > > > > >   */
> > > > > > > >  static __always_inline u32
> > > > > > > > -bpf_prog_run_array_uprobe(const struct bpf_prog_array __rcu *array_rcu,
> > > > > > > > +bpf_prog_run_array_uprobe(const struct bpf_prog_array *array,
> > > > > > > >                           const void *ctx, bpf_prog_run_fn run_prog)
> > > > > > > >  {
> > > > > > > >         const struct bpf_prog_array_item *item;
> > > > > > > >         const struct bpf_prog *prog;
> > > > > > > > -       const struct bpf_prog_array *array;
> > > > > > > >         struct bpf_run_ctx *old_run_ctx;
> > > > > > > >         struct bpf_trace_run_ctx run_ctx;
> > > > > > > >         u32 ret = 1;
> > > > > > > >
> > > > > > > >         might_fault();
> > > > > > > > +       RCU_LOCKDEP_WARN(!rcu_read_lock_trace_held(), "no rcu lock held");
> > > > > > > > +
> > > > > > > > +       if (unlikely(!array))
> > > > > > > > +               goto out;
> > > > > > > >
> > > > > > > > -       rcu_read_lock_trace();
> > > > > > > >         migrate_disable();
> > > > > > > >
> > > > > > > >         run_ctx.is_uprobe = true;
> > > > > > > >
> > > > > > > > -       array = rcu_dereference_check(array_rcu, rcu_read_lock_trace_held());
> > > > > > > > -       if (unlikely(!array))
> > > > > > > > -               goto out;
> > > > > > > >         old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> > > > > > > >         item = &array->items[0];
> > > > > > > >         while ((prog = READ_ONCE(item->prog))) {
> > > > > > > > @@ -2229,7 +2228,6 @@ bpf_prog_run_array_uprobe(const struct
> > > > > > > > bpf_prog_array __rcu *array_rcu,
> > > > > > > >         bpf_reset_run_ctx(old_run_ctx);
> > > > > > > >  out:
> > > > > > > >         migrate_enable();
> > > > > > > > -       rcu_read_unlock_trace();
> > > > > > > >         return ret;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> > > > > > > > index fed382b7881b..87a2b8fefa90 100644
> > > > > > > > --- a/kernel/trace/trace_uprobe.c
> > > > > > > > +++ b/kernel/trace/trace_uprobe.c
> > > > > > > > @@ -1404,7 +1404,9 @@ static void __uprobe_perf_func(struct trace_uprobe *tu,
> > > > > > > >         if (bpf_prog_array_valid(call)) {
> > > > > > > >                 u32 ret;
> > > > > > > >
> > > > > > > > +               rcu_read_lock_trace();
> > > > > > > >                 ret = bpf_prog_run_array_uprobe(call->prog_array,
> > > > > > > > regs, bpf_prog_run);
> > > > > > >
> > > > > > > But then this should be something like this (possibly split across
> > > > > > > multiple lines with a helper variable or such):
> > > > > > >
> > > > > > > ret = bpf_prog_run_array_uprobe(rcu_dereference_check(call->prog_array,
> > > > > > > rcu_read_lock_trace_held()), regs, bpf_prog_run);
> > > > > >
> > > > > > Yeah, absolutely, forgot to move the RCU dereference part, good catch!
> > > > > > But I wouldn't do the _check() variant here, literally the previous
> > > > > > line does rcu_read_trace_lock(), so this check part seems like just
> > > > > > unnecessary verboseness, I'd go with a simple rcu_dereference().
> > > > >
> > > > > rcu_dereference() is not legal there - that asserts that we are in a
> > > > > normal RCU read-side critical section, which we are not.
> > > > > rcu_dereference_raw() would be, but I think it is nice to document the
> > > > > semantics to make it explicit under which lock we're operating.
> > >
> > > sure, I don't mind
> > >
> > > > >
> > > > > I'll send a v3 in a bit after testing it.
> > > >
> > > > Actually, now I'm still hitting a page fault with my WIP v3 fix
> > > > applied... I'll probably poke at this some more next week.
> > >
> > > OK, that's interesting, keep us posted!
> >
> > If I replace the "uprobe/" in my reproducer with "uprobe.s/", the
> > reproducer stops crashing even on bpf/master without this fix -
> > because it happens that handle_swbp() is already holding a
> > rcu_read_lock_trace() lock way up the stack. So I think this fix
> > should still be applied, but it probably doesn't need to go into
> > stable unless there is another path to the buggy code that doesn't
> > come from handle_swbp(). I guess I probably should resend my patch
> > with an updated commit message pointing out this caveat?
> >
> > The problem I'm actually hitting seems to be a use-after-free of a
> > "struct bpf_prog" because of mismatching RCU flavors. Uprobes always
> > use bpf_prog_run_array_uprobe() under tasks-trace-RCU protection. But
> > it is possible to attach a non-sleepable BPF program to a uprobe, and
> > non-sleepable BPF programs are freed via normal RCU (see
> > __bpf_prog_put_noref()). And that is what happens with the reproducer
> > from my initial post
> > (https://lore.kernel.org/all/20241206-bpf-fix-uprobe-uaf-v1-1-6869c8a17258@google.com/)
> > - I can see that __bpf_prog_put_noref runs with prog->sleepable==0.
> >
> > So I think that while I am delaying execution in
> > bpf_prog_run_array_uprobe(), perf_event_detach_bpf_prog() NULLs out
> > the event->tp_event->prog_array pointer and does
> > bpf_prog_array_free_sleepable() followed by bpf_prog_put(), and then
> > the BPF program can be freed since the reader doesn't hold an RCU read
> > lock. This seems a bit annoying to fix - there could legitimately be
> > several versions of the bpf_prog_array that are still used by
> > tasks-trace-RCU readers, so I think we can't just NULL out the array
> > entry and use RCU for the bpf_prog_array access on the reader side. I
> > guess we could add another flag on BPF programs that answers "should
> > this program be freed via tasks-trace-RCU" (separately from whether
> > the program is sleepable)?
>
> Yes, we shouldn't NULL anything out.
>
> This is the same issue we've been solving recently for sleepable
> tracepoints, see [0] and other patches in the same patch set. We
> solved it for sleepable (aka "faultable") tracepoints, but uprobes
> have the same problem where the attachment point is sleepable by
> nature (and thus protected by RCU Tasks Trace), but BPF program itself
> is non-sleepable (and thus we only wait for synchronize_rcu() before
> freeing), which causes a disconnect.
>
> We can easily fix this for BPF link-based uprobes, but legacy uprobes
> can be directly attached to perf event, so that's a bit more
> cumbersome. Let me think what should be the best way to handle this.
>
> Meanwhile, I agree, please send your original fix (with changes we
> discussed), it's good to have them, even if they don't fix your
> original issue. I'll CC you on fixes once I have them.

Ok, weeding through the perf/uprobe plumbing for BPF, I think we avoid
this problem with uprobe BPF link because uprobe_unregister_sync()
waits for RCU Tasks Trace GP, and so once we finish uprobe
unregistration, we have a guarantee that there is no more uprobe that
might dereference our BPF program. (I might have thought about this
problem when fixing BPF link for sleepable tracepoints, but I missed
the legacy perf_event attach/detach case).

With legacy perf event perf_event_detach_bpf_prog() we don't do any of
that, we just NULL out pointer and do bpf_prog_put(), not caring
whether this is uprobe, kprobe, or tracepoint...

So one way to solve this is to either teach
perf_event_detach_bpf_prog() to delay bpf_prog_put() until after RCU
Tasks Trace GP (which is what we do with bpf_prog_array, but not the
program itself), or add prog->aux->sleepable_hook flag in addition to
prog->aux->sleepable, and then inside bpf_prog_put() check
(prog->aux->sleepable || prog->aux->sleepable_hook) and do RCU Tasks
Trace delay (in addition to normal call_rcu()).

Third alternative would be to have something like
bpf_prog_put_sleepable() (just like we have
bpf_prog_array_free_sleepable()), where this would do additional
call_rcu_tasks_trace() even if BPF program itself isn't sleepable.

Alexei, Daniel, any thoughts or preferences?

>
>   [0] https://lore.kernel.org/all/20241101181754.782341-1-andrii@kernel.org/

Alexei Starovoitov Dec. 10, 2024, 12:54 a.m. UTC | #10

On Mon, Dec 9, 2024 at 2:45 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> Ok, weeding through the perf/uprobe plumbing for BPF, I think we avoid
> this problem with uprobe BPF link because uprobe_unregister_sync()
> waits for RCU Tasks Trace GP, and so once we finish uprobe
> unregistration, we have a guarantee that there is no more uprobe that
> might dereference our BPF program. (I might have thought about this
> problem when fixing BPF link for sleepable tracepoints, but I missed
> the legacy perf_event attach/detach case).
>
> With legacy perf event perf_event_detach_bpf_prog() we don't do any of
> that, we just NULL out pointer and do bpf_prog_put(), not caring
> whether this is uprobe, kprobe, or tracepoint...
>
> So one way to solve this is to either teach
> perf_event_detach_bpf_prog() to delay bpf_prog_put() until after RCU
> Tasks Trace GP (which is what we do with bpf_prog_array, but not the
> program itself),

since this is a legacy prog detach api I would just add sync_rcu_tt
there. It's a backportable one line change.

> or add prog->aux->sleepable_hook flag in addition to
> prog->aux->sleepable, and then inside bpf_prog_put() check
> (prog->aux->sleepable || prog->aux->sleepable_hook) and do RCU Tasks
> Trace delay (in addition to normal call_rcu()).

That sounds like more work and if we introduce sleepable_hook
we would better generalize it and rely on it everywhere.
Which makes it even more work and certainly not for stable.

> Third alternative would be to have something like
> bpf_prog_put_sleepable() (just like we have
> bpf_prog_array_free_sleepable()), where this would do additional
> call_rcu_tasks_trace() even if BPF program itself isn't sleepable.

Sounds like less work than 2, but conceptually it's the same as 1.
Just call_rcu_tt vs sync_rcu_tt.
And we can wait just fine in that code path.
So I'd go with 1.

[bpf,v2] bpf: Fix prog_array UAF in __uprobe_perf_func()

Commit Message

Comments

Patch