
[v2] selftests/bpf: Update map_kptr examples to reflect real use-cases

Message ID: 20221002171012.3529521-1-void@manifault.com (mailing list archive)
State: Changes Requested
Delegated to: BPF
Series: [v2] selftests/bpf: Update map_kptr examples to reflect real use-cases

Checks

Context Check Description
netdev/tree_selection success Not a local patch
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-1 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-4 success Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-5 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-6 success Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-7 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-8 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-9 fail Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-12 fail Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-13 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-14 success Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-15 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-16 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for test_verifier on x86_64 with llvm-16

Commit Message

David Vernet Oct. 2, 2022, 5:10 p.m. UTC
In the map_kptr selftest, the bpf_kfunc_call_test_kptr_get() kfunc is used
to verify and illustrate a typical use case of kptrs wherein an additional
reference is taken on a referenced kptr that is already stored in a map.
This would be useful for programs that, for example, want to pass the
referenced kptr to a kfunc without removing it from the map.

Unfortunately, the implementation of bpf_kfunc_call_test_kptr_get() isn't
representative of a real kfunc that needs to guard against possible
refcounting races by BPF program callers. bpf_kfunc_call_test_kptr_get()
does a READ_ONCE() on the struct prog_test_ref_kfunc **pp passed by the
user and then calls refcount_inc() if the pointer is non-NULL, but this
can race with another callback in the same program that removes the kptr
from the map and frees it:

1. A BPF program with a referenced kptr in a map passes the kptr to
   bpf_kfunc_call_test_kptr_get() as:

   p = bpf_kfunc_call_test_kptr_get(&v->ref_ptr, 0, 0);

   from CPU 0.

2. bpf_kfunc_call_test_kptr_get() does READ_ONCE(), and sees that the
   struct prog_test_ref_kfunc **pp contains a non-NULL pointer.

3. Another BPF handler on CPU 1 then invokes bpf_kptr_xchg() to remove
   the kptr from the map, and frees it with a call to
   bpf_kfunc_call_test_release(). This drops the final refcount on the
   kptr.

4. CPU 0 then issues refcount_inc() on the kptr with refcount 0, causing
   a use-after-free.
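
For reference, the racy pre-patch implementation (abridged from
net/bpf/test_run.c) looks like this:

   noinline struct prog_test_ref_kfunc *
   bpf_kfunc_call_test_kptr_get(struct prog_test_ref_kfunc **pp, int a, int b)
   {
           struct prog_test_ref_kfunc *p = READ_ONCE(*pp);

           if (!p)
                   return NULL;
           /* Nothing prevents the final reference from being dropped,
            * and the object freed, between the READ_ONCE() above and
            * the refcount_inc() below.
            */
           refcount_inc(&p->cnt);
           return p;
   }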

In the map_kptr selftest, this doesn't cause a use-after-free because
the structure being refcounted is statically allocated, and the
refcounts aren't actually used to control the object lifecycle. In a
kfunc supporting a real use case, the refcount going to 0 would likely
cause the object to be freed, as it does for e.g. struct task_struct.

A more realistic use case would use something like RCU in the kfunc
handler to ensure that the kptr object can be safely accessed, and then
issue a refcount_inc_not_zero() to acquire a refcount on the object.
This patch updates the map_kptr selftest to do this.
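
Abridged, the updated kfunc now does the following (see the full patch
below):

   rcu_read_lock();
   p = READ_ONCE(*pp);
   if (p && !refcount_inc_not_zero(&p->cnt))
           p = NULL;
   rcu_read_unlock();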

Signed-off-by: David Vernet <void@manifault.com>
---
 net/bpf/test_run.c                            | 31 ++++++++++++++++---
 .../selftests/bpf/prog_tests/map_kptr.c       |  4 +--
 .../testing/selftests/bpf/verifier/map_kptr.c |  4 +--
 3 files changed, 31 insertions(+), 8 deletions(-)

Comments

Kumar Kartikeya Dwivedi Oct. 2, 2022, 11:35 p.m. UTC | #1
On Sun, 2 Oct 2022 at 19:10, David Vernet <void@manifault.com> wrote:
>
> In the map_kptr selftest, the bpf_kfunc_call_test_kptr_get() kfunc is used
> to verify and illustrate a typical use case of kptrs wherein an additional
> reference is taken on a referenced kptr that is already stored in a map.
> This would be useful for programs that, for example, want to pass the
> referenced kptr to a kfunc without removing it from the map.
>
> Unfortunately, the implementation of bpf_kfunc_call_test_kptr_get() isn't
> representative of a real kfunc that needs to guard against possible
> refcounting races by BPF program callers. bpf_kfunc_call_test_kptr_get()
> does a READ_ONCE() on the struct prog_test_ref_kfunc **pp passed by the
> user and then calls refcount_inc() if the pointer is non-NULL, but this
> can race with another callback in the same program that removes the kptr
> from the map and frees it:
>
> 1. A BPF program with a referenced kptr in a map passes the kptr to
>    bpf_kfunc_call_test_kptr_get() as:
>
>    p = bpf_kfunc_call_test_kptr_get(&v->ref_ptr, 0, 0);
>
>    from CPU 0.
>
> 2. bpf_kfunc_call_test_kptr_get() does READ_ONCE(), and sees that the
>    struct prog_test_ref_kfunc **pp contains a non-NULL pointer.
>
> 3. Another BPF handler on CPU 1 then invokes bpf_kptr_xchg() to remove
>    the kptr from the map, and frees it with a call to
>    bpf_kfunc_call_test_release(). This drops the final refcount on the
>    kptr.
>
> 4. CPU 0 then issues refcount_inc() on the kptr with refcount 0, causing
>    a use-after-free.
>
> In the map_kptr selftest, this doesn't cause a use-after-free because
> the structure being refcounted is statically allocated, and the
> refcounts aren't actually used to control the object lifecycle. In a
> kfunc supporting a real use case, the refcount going to 0 would likely
> cause the object to be freed, as it does for e.g. struct task_struct.
>
> A more realistic use case would use something like RCU in the kfunc
> handler to ensure that the kptr object can be safely accessed, and then
> issue a refcount_inc_not_zero() to acquire a refcount on the object.
> This patch updates the map_kptr selftest to do this.
>
> Signed-off-by: David Vernet <void@manifault.com>
> ---

In my defense, I did note all this in the commit adding support for
kptr_get, so it seemed overkill to do it for a static struct.
But it's probably not a bad idea to have a real example, given that
such a helper effectively requires reclamation of the object through
RCU, and people probably won't go and read the commit message.

However, some questions below...

>  net/bpf/test_run.c                            | 31 ++++++++++++++++---
>  .../selftests/bpf/prog_tests/map_kptr.c       |  4 +--
>  .../testing/selftests/bpf/verifier/map_kptr.c |  4 +--
>  3 files changed, 31 insertions(+), 8 deletions(-)
>
> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> index 13d578ce2a09..3fe9495abcbe 100644
> --- a/net/bpf/test_run.c
> +++ b/net/bpf/test_run.c
> @@ -565,6 +565,8 @@ struct prog_test_ref_kfunc {
>         int b;
>         struct prog_test_member memb;
>         struct prog_test_ref_kfunc *next;
> +       struct rcu_head rcu;
> +       atomic_t destroyed;
>         refcount_t cnt;
>  };
>
> @@ -572,12 +574,14 @@ static struct prog_test_ref_kfunc prog_test_struct = {
>         .a = 42,
>         .b = 108,
>         .next = &prog_test_struct,
> +       .destroyed = ATOMIC_INIT(0),
>         .cnt = REFCOUNT_INIT(1),
>  };
>
>  noinline struct prog_test_ref_kfunc *
>  bpf_kfunc_call_test_acquire(unsigned long *scalar_ptr)
>  {
> +       WARN_ON_ONCE(atomic_read(&prog_test_struct.destroyed));
>         refcount_inc(&prog_test_struct.cnt);
>         return &prog_test_struct;
>  }
> @@ -589,12 +593,22 @@ bpf_kfunc_call_memb_acquire(void)
>         return NULL;
>  }
>
> +static void delayed_destroy_test_ref_struct(struct rcu_head *rhp)
> +{
> +       struct prog_test_ref_kfunc *p = container_of(rhp, struct prog_test_ref_kfunc, rcu);
> +
> +       WARN_ON_ONCE(refcount_read(&p->cnt) > 0);
> +       atomic_set(&p->destroyed, true);
> +}
> +
>  noinline void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p)
>  {
>         if (!p)
>                 return;
>
> -       refcount_dec(&p->cnt);
> +       WARN_ON_ONCE(atomic_read(&p->destroyed));
> +       if (refcount_dec_and_test(&p->cnt))
> +               call_rcu(&p->rcu, delayed_destroy_test_ref_struct);

I wonder whether this is ever called, I haven't really given this
patch a shot, but I don't see how the refcount can ever drop back to
zero. It's initialized as 1, and then only acquired after that, so
pairing all releases should still preserve refcount as 1.
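
That is, schematically:

   cnt == 1           /* REFCOUNT_INIT(1): reference owned by the kernel */
   N acquires         /* cnt == 1 + N */
   N releases         /* cnt == 1: refcount_dec_and_test() never fires */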

Also, even if you made it work, wouldn't you have the warning once you
run more selftests using prog_test_run, if you just set the destroyed
bit on each test run?
David Vernet Oct. 3, 2022, 1:52 p.m. UTC | #2
On Mon, Oct 03, 2022 at 01:35:49AM +0200, Kumar Kartikeya Dwivedi wrote:
> On Sun, 2 Oct 2022 at 19:10, David Vernet <void@manifault.com> wrote:
> >
> > In the map_kptr selftest, the bpf_kfunc_call_test_kptr_get() kfunc is used
> > to verify and illustrate a typical use case of kptrs wherein an additional
> > reference is taken on a referenced kptr that is already stored in a map.
> > This would be useful for programs that, for example, want to pass the
> > referenced kptr to a kfunc without removing it from the map.
> >
> > Unfortunately, the implementation of bpf_kfunc_call_test_kptr_get() isn't
> > representative of a real kfunc that needs to guard against possible
> > refcounting races by BPF program callers. bpf_kfunc_call_test_kptr_get()
> > does a READ_ONCE() on the struct prog_test_ref_kfunc **pp passed by the
> > user and then calls refcount_inc() if the pointer is non-NULL, but this
> > can race with another callback in the same program that removes the kptr
> > from the map and frees it:
> >
> > 1. A BPF program with a referenced kptr in a map passes the kptr to
> >    bpf_kfunc_call_test_kptr_get() as:
> >
> >    p = bpf_kfunc_call_test_kptr_get(&v->ref_ptr, 0, 0);
> >
> >    from CPU 0.
> >
> > 2. bpf_kfunc_call_test_kptr_get() does READ_ONCE(), and sees that the
> >    struct prog_test_ref_kfunc **pp contains a non-NULL pointer.
> >
> > 3. Another BPF handler on CPU 1 then invokes bpf_kptr_xchg() to remove
> >    the kptr from the map, and frees it with a call to
> >    bpf_kfunc_call_test_release(). This drops the final refcount on the
> >    kptr.
> >
> > 4. CPU 0 then issues refcount_inc() on the kptr with refcount 0, causing
> >    a use-after-free.
> >
> > In the map_kptr selftest, this doesn't cause a use-after-free because
> > the structure being refcounted is statically allocated, and the
> > refcounts aren't actually used to control the object lifecycle. In a
> > kfunc supporting a real use case, the refcount going to 0 would likely
> > cause the object to be freed, as it does for e.g. struct task_struct.
> >
> > A more realistic use case would use something like RCU in the kfunc
> > handler to ensure that the kptr object can be safely accessed, and then
> > issue a refcount_inc_not_zero() to acquire a refcount on the object.
> > This patch updates the map_kptr selftest to do this.
> >
> > Signed-off-by: David Vernet <void@manifault.com>
> > ---
> 
> In my defense, I did note all this in the commit adding support for
> kptr_get, so it seemed overkill to do it for a static struct.
> But it's probably not a bad idea to have a real example, given that
> such a helper effectively requires reclamation of the object through
> RCU, and people probably won't go and read the commit message.

No defense needed, though I of course agree that it's important to have
a real example here. I decided to add it because I got confused about
how it was safe to use refcount_inc() instead of refcount_inc_not_zero()
in bpf_kfunc_call_test_kptr_get().

> However, some questions below...
> 
> >  net/bpf/test_run.c                            | 31 ++++++++++++++++---
> >  .../selftests/bpf/prog_tests/map_kptr.c       |  4 +--
> >  .../testing/selftests/bpf/verifier/map_kptr.c |  4 +--
> >  3 files changed, 31 insertions(+), 8 deletions(-)
> >
> > diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> > index 13d578ce2a09..3fe9495abcbe 100644
> > --- a/net/bpf/test_run.c
> > +++ b/net/bpf/test_run.c
> > @@ -565,6 +565,8 @@ struct prog_test_ref_kfunc {
> >         int b;
> >         struct prog_test_member memb;
> >         struct prog_test_ref_kfunc *next;
> > +       struct rcu_head rcu;
> > +       atomic_t destroyed;
> >         refcount_t cnt;
> >  };
> >
> > @@ -572,12 +574,14 @@ static struct prog_test_ref_kfunc prog_test_struct = {
> >         .a = 42,
> >         .b = 108,
> >         .next = &prog_test_struct,
> > +       .destroyed = ATOMIC_INIT(0),
> >         .cnt = REFCOUNT_INIT(1),
> >  };
> >
> >  noinline struct prog_test_ref_kfunc *
> >  bpf_kfunc_call_test_acquire(unsigned long *scalar_ptr)
> >  {
> > +       WARN_ON_ONCE(atomic_read(&prog_test_struct.destroyed));
> >         refcount_inc(&prog_test_struct.cnt);
> >         return &prog_test_struct;
> >  }
> > @@ -589,12 +593,22 @@ bpf_kfunc_call_memb_acquire(void)
> >         return NULL;
> >  }
> >
> > +static void delayed_destroy_test_ref_struct(struct rcu_head *rhp)
> > +{
> > +       struct prog_test_ref_kfunc *p = container_of(rhp, struct prog_test_ref_kfunc, rcu);
> > +
> > +       WARN_ON_ONCE(refcount_read(&p->cnt) > 0);
> > +       atomic_set(&p->destroyed, true);
> > +}
> > +
> >  noinline void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p)
> >  {
> >         if (!p)
> >                 return;
> >
> > -       refcount_dec(&p->cnt);
> > +       WARN_ON_ONCE(atomic_read(&p->destroyed));
> > +       if (refcount_dec_and_test(&p->cnt))
> > +               call_rcu(&p->rcu, delayed_destroy_test_ref_struct);
> 
> I wonder whether this is ever called, I haven't really given this
> patch a shot, but I don't see how the refcount can ever drop back to
> zero. It's initialized as 1, and then only acquired after that, so
> pairing all releases should still preserve refcount as 1.

Yeah, the call_rcu() path is never hit. If we wanted to update the test
so that this codepath was actually exercised, I think we'd need to add
another kfunc that returned a reference to a dynamically allocated
object rather than using the global, static one. I'm happy to do that if
we think it's useful. The downside to adding more of these test kfuncs
is that they actually do add a small bit of runtime overhead to the
kernel because they're unconditionally registered in the __init function
for test_run.c.

> Also, even if you made it work, wouldn't you have the warning once you
> run more selftests using prog_test_run, if you just set the destroyed
> bit on each test run?

If we want to update the test to have the refcount drop to 0, we would
probably have to instead use dynamically allocated objects. At that
point, we'd probably just crash instead of seeing a warning if we
accidentally let a caller invoke acquire or release after the object had
been destroyed. Maybe the better thing to do here is to just warn
unconditionally in the destructor rather than setting a flag? What we
really want to ensure is that the final refcount that's "owned" by the
main kernel is never dropped.
Kumar Kartikeya Dwivedi Oct. 3, 2022, 4:22 p.m. UTC | #3
On Mon, 3 Oct 2022 at 15:52, David Vernet <void@manifault.com> wrote:
>
> On Mon, Oct 03, 2022 at 01:35:49AM +0200, Kumar Kartikeya Dwivedi wrote:
> > On Sun, 2 Oct 2022 at 19:10, David Vernet <void@manifault.com> wrote:
> > >
> > > In the map_kptr selftest, the bpf_kfunc_call_test_kptr_get() kfunc is used
> > > to verify and illustrate a typical use case of kptrs wherein an additional
> > > reference is taken on a referenced kptr that is already stored in a map.
> > > This would be useful for programs that, for example, want to pass the
> > > referenced kptr to a kfunc without removing it from the map.
> > >
> > > Unfortunately, the implementation of bpf_kfunc_call_test_kptr_get() isn't
> > > representative of a real kfunc that needs to guard against possible
> > > refcounting races by BPF program callers. bpf_kfunc_call_test_kptr_get()
> > > does a READ_ONCE() on the struct prog_test_ref_kfunc **pp passed by the
> > > user and then calls refcount_inc() if the pointer is non-NULL, but this
> > > can race with another callback in the same program that removes the kptr
> > > from the map and frees it:
> > >
> > > 1. A BPF program with a referenced kptr in a map passes the kptr to
> > >    bpf_kfunc_call_test_kptr_get() as:
> > >
> > >    p = bpf_kfunc_call_test_kptr_get(&v->ref_ptr, 0, 0);
> > >
> > >    from CPU 0.
> > >
> > > 2. bpf_kfunc_call_test_kptr_get() does READ_ONCE(), and sees that the
> > >    struct prog_test_ref_kfunc **pp contains a non-NULL pointer.
> > >
> > > 3. Another BPF handler on CPU 1 then invokes bpf_kptr_xchg() to remove
> > >    the kptr from the map, and frees it with a call to
> > >    bpf_kfunc_call_test_release(). This drops the final refcount on the
> > >    kptr.
> > >
> > > 4. CPU 0 then issues refcount_inc() on the kptr with refcount 0, causing
> > >    a use-after-free.
> > >
> > > In the map_kptr selftest, this doesn't cause a use-after-free because
> > > the structure being refcounted is statically allocated, and the
> > > refcounts aren't actually used to control the object lifecycle. In a
> > > kfunc supporting a real use case, the refcount going to 0 would likely
> > > cause the object to be freed, as it does for e.g. struct task_struct.
> > >
> > > A more realistic use case would use something like RCU in the kfunc
> > > handler to ensure that the kptr object can be safely accessed, and then
> > > issue a refcount_inc_not_zero() to acquire a refcount on the object.
> > > This patch updates the map_kptr selftest to do this.
> > >
> > > Signed-off-by: David Vernet <void@manifault.com>
> > > ---
> >
> > In my defense, I did note all this in the commit adding support for
> > kptr_get, so it seemed overkill to do it for a static struct.
> > But it's probably not a bad idea to have a real example, given that
> > such a helper effectively requires reclamation of the object through
> > RCU, and people probably won't go and read the commit message.
>
> No defense needed, though I of course agree that it's important to have
> a real example here. I decided to add it because I got confused about
> how it was safe to use refcount_inc() instead of refcount_inc_not_zero()
> in bpf_kfunc_call_test_kptr_get().
>
> > However, some questions below...
> >
> > >  net/bpf/test_run.c                            | 31 ++++++++++++++++---
> > >  .../selftests/bpf/prog_tests/map_kptr.c       |  4 +--
> > >  .../testing/selftests/bpf/verifier/map_kptr.c |  4 +--
> > >  3 files changed, 31 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> > > index 13d578ce2a09..3fe9495abcbe 100644
> > > --- a/net/bpf/test_run.c
> > > +++ b/net/bpf/test_run.c
> > > @@ -565,6 +565,8 @@ struct prog_test_ref_kfunc {
> > >         int b;
> > >         struct prog_test_member memb;
> > >         struct prog_test_ref_kfunc *next;
> > > +       struct rcu_head rcu;
> > > +       atomic_t destroyed;
> > >         refcount_t cnt;
> > >  };
> > >
> > > @@ -572,12 +574,14 @@ static struct prog_test_ref_kfunc prog_test_struct = {
> > >         .a = 42,
> > >         .b = 108,
> > >         .next = &prog_test_struct,
> > > +       .destroyed = ATOMIC_INIT(0),
> > >         .cnt = REFCOUNT_INIT(1),
> > >  };
> > >
> > >  noinline struct prog_test_ref_kfunc *
> > >  bpf_kfunc_call_test_acquire(unsigned long *scalar_ptr)
> > >  {
> > > +       WARN_ON_ONCE(atomic_read(&prog_test_struct.destroyed));
> > >         refcount_inc(&prog_test_struct.cnt);
> > >         return &prog_test_struct;
> > >  }
> > > @@ -589,12 +593,22 @@ bpf_kfunc_call_memb_acquire(void)
> > >         return NULL;
> > >  }
> > >
> > > +static void delayed_destroy_test_ref_struct(struct rcu_head *rhp)
> > > +{
> > > +       struct prog_test_ref_kfunc *p = container_of(rhp, struct prog_test_ref_kfunc, rcu);
> > > +
> > > +       WARN_ON_ONCE(refcount_read(&p->cnt) > 0);
> > > +       atomic_set(&p->destroyed, true);
> > > +}
> > > +
> > >  noinline void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p)
> > >  {
> > >         if (!p)
> > >                 return;
> > >
> > > -       refcount_dec(&p->cnt);
> > > +       WARN_ON_ONCE(atomic_read(&p->destroyed));
> > > +       if (refcount_dec_and_test(&p->cnt))
> > > +               call_rcu(&p->rcu, delayed_destroy_test_ref_struct);
> >
> > I wonder whether this is ever called, I haven't really given this
> > patch a shot, but I don't see how the refcount can ever drop back to
> > zero. It's initialized as 1, and then only acquired after that, so
> > pairing all releases should still preserve refcount as 1.
>
> Yeah, the call_rcu() path is never hit. If we wanted to update the test
> so that this codepath was actually exercised, I think we'd need to add
> another kfunc that returned a reference to a dynamically allocated
> object rather than using the global, static one. I'm happy to do that if
> we think it's useful. The downside to adding more of these test kfuncs
> is that they actually do add a small bit of runtime overhead to the
> kernel because they're unconditionally registered in the __init function
> for test_run.c.
>

But that only happens once, right? It still happens, so I don't see
what changes.

> > Also, even if you made it work, wouldn't you have the warning once you
> > run more selftests using prog_test_run, if you just set the destroyed
> > bit on each test run?
>
> If we want to update the test to have the refcount drop to 0, we would
> probably have to instead use dynamically allocated objects. At that
> point, we'd probably just crash instead of seeing a warning if we
> accidentally let a caller invoke acquire or release after the object had
> been destroyed. Maybe the better thing to do here is to just warn
> unconditionally in the destructor rather than setting a flag? What we
> really want to ensure is that the final refcount that's "owned" by the
> main kernel is never dropped.

I think the refcount_t API already warns if underflow happens.

To be clear, I don't mind if you want to improve this, it's certainly
a mess right now. Tests can't even run in parallel easily because it's
global. Testing like an actually allocated object might be a good way
to simulate a real scenario. And I totally agree that having a real
example is useful to people who want to add support for more kptrs.

Maybe some comments around the code would also be helpful on what the
expectations are. We should make it easy for people to hook into this
stuff. Nobody reads documentation, people usually only look at
existing examples.
David Vernet Oct. 4, 2022, 3:29 p.m. UTC | #4
On Mon, Oct 03, 2022 at 06:22:55PM +0200, Kumar Kartikeya Dwivedi wrote:

[...]

> > > >  noinline void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p)
> > > >  {
> > > >         if (!p)
> > > >                 return;
> > > >
> > > > -       refcount_dec(&p->cnt);
> > > > +       WARN_ON_ONCE(atomic_read(&p->destroyed));
> > > > +       if (refcount_dec_and_test(&p->cnt))
> > > > +               call_rcu(&p->rcu, delayed_destroy_test_ref_struct);
> > >
> > > I wonder whether this is ever called, I haven't really given this
> > > patch a shot, but I don't see how the refcount can ever drop back to
> > > zero. It's initialized as 1, and then only acquired after that, so
> > > pairing all releases should still preserve refcount as 1.
> >
> > Yeah, the call_rcu() path is never hit. If we wanted to update the test
> > so that this codepath was actually exercised, I think we'd need to add
> > another kfunc that returned a reference to a dynamically allocated
> > object rather than using the global, static one. I'm happy to do that if
> > we think it's useful. The downside to adding more of these test kfuncs
> > is that they actually do add a small bit of runtime overhead to the
> > kernel because they're unconditionally registered in the __init function
> > for test_run.c.
> >
> 
> But that only happens once, right? It still happens, so I don't see
> what changes.

The idea here would be to return a dynamically allocated object with an
initial refcount of 1 that's owned by the _BPF program_, rather than
what we have today where the global struct has an initial refcount
that's owned by the main kernel and is never expected to go to 0. For
all success (i.e. non-fail) testcases that are able to dynamically
allocate this object, the refcount should go to 0 for each of them and
the object will be destroyed after the current RCU grace period. Please
let me know if I've misunderstood your point.

> > > Also, even if you made it work, wouldn't you have the warning once you
> > > run more selftests using prog_test_run, if you just set the destroyed
> > > bit on each test run?
> >
> > If we want to update the test to have the refcount drop to 0, we would
> > probably have to instead use dynamically allocated objects. At that
> > point, we'd probably just crash instead of seeing a warning if we
> > accidentally let a caller invoke acquire or release after the object had
> > been destroyed. Maybe the better thing to do here is to just warn
> > unconditionally in the destructor rather than setting a flag? What we
> > really want to ensure is that the final refcount that's "owned" by the
> > main kernel is never dropped.
> 
> I think the refcount_t API already warns if underflow happens.

Right, a warning would probably show up if we did a release that caused
an underflow, but it would not for an acquire after the refcount dropped
to 0.

> To be clear, I don't mind if you want to improve this, it's certainly
> a mess right now. Tests can't even run in parallel easily because it's
> global. Testing like an actually allocated object might be a good way
> to simulate a real scenario. And I totally agree that having a real
> example is useful to people who want to add support for more kptrs.

Ok, let me update the tests to do two things then:

1. Add a new test kfunc called bpf_kfunc_call_test_alloc() which returns
   a dynamically allocated instance of a prog_test_ref_kfunc *. This is
   similar in intention to bpf_xdp_ct_alloc() (rough sketch below).
2. Update bpf_kfunc_call_test_acquire() and
   bpf_kfunc_call_test_release() to take a trusted pointer to that
   allocated prog_test_ref_kfunc *.
3. Keep the other changes which e.g. use RCU in
   bpf_kfunc_call_test_kptr_get() to synchronize on getting the kptr.
   Once the __rcu kptr variant is landed we can get rid of
   bpf_kfunc_call_test_kptr_get() and make bpf_kfunc_call_test_acquire()
   require an __rcu pointer.
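
For (1), here's a rough sketch of what I have in mind (untested; the
GFP flags and details are just placeholders):

   noinline struct prog_test_ref_kfunc *
   bpf_kfunc_call_test_alloc(void)
   {
           struct prog_test_ref_kfunc *p = kzalloc(sizeof(*p), GFP_ATOMIC);

           if (!p)
                   return NULL;
           /* The initial reference is owned by the BPF program, so
            * unlike the global prog_test_struct, the refcount really
            * can drop to 0 and exercise the call_rcu() release path.
            */
           refcount_set(&p->cnt, 1);
           return p;
   }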

Continuing on point (3) above, we should _probably_ also have an example
for an object that isn't RCU-protected? I imagine that we don't want to
get rid of KF_KPTR_GET entirely once we have __rcu pointers because some
kptr_get() implementations may synchronize in other ways, such as with
spinlocks or whatever. Let's leave this until after __rcu lands though.

Does this all sound good?

> Maybe some comments around the code would also be helpful on what the
> expectations are. We should make it easy for people to hook into this
> stuff. Nobody reads documentation, people usually only look at
> existing examples.

For sure, can do.
Kumar Kartikeya Dwivedi Oct. 11, 2022, 2:40 a.m. UTC | #5
On Tue, 4 Oct 2022 at 20:59, David Vernet <void@manifault.com> wrote:
>
> On Mon, Oct 03, 2022 at 06:22:55PM +0200, Kumar Kartikeya Dwivedi wrote:
>
> [...]
>
> > > > >  noinline void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p)
> > > > >  {
> > > > >         if (!p)
> > > > >                 return;
> > > > >
> > > > > -       refcount_dec(&p->cnt);
> > > > > +       WARN_ON_ONCE(atomic_read(&p->destroyed));
> > > > > +       if (refcount_dec_and_test(&p->cnt))
> > > > > +               call_rcu(&p->rcu, delayed_destroy_test_ref_struct);
> > > >
> > > > I wonder whether this is ever called, I haven't really given this
> > > > patch a shot, but I don't see how the refcount can ever drop back to
> > > > zero. It's initialized as 1, and then only acquired after that, so
> > > > pairing all releases should still preserve refcount as 1.
> > >
> > > Yeah, the call_rcu() path is never hit. If we wanted to update the test
> > > so that this codepath was actually exercised, I think we'd need to add
> > > another kfunc that returned a reference to a dynamically allocated
> > > object rather than using the global, static one. I'm happy to do that if
> > > we think it's useful. The downside to adding more of these test kfuncs
> > > is that they actually do add a small bit of runtime overhead to the
> > > kernel because they're unconditionally registered in the __init function
> > > for test_run.c.
> > >
> >
> > But that only happens once, right? It still happens, so I don't see
> > what changes.
>
> The idea here would be to return a dynamically allocated object with an
> initial refcount of 1 that's owned by the _BPF program_, rather than
> what we have today where the global struct has an initial refcount
> that's owned by the main kernel and is never expected to go to 0. For
> all success (i.e. non-fail) testcases that are able to dynamically
> allocate this object, the refcount should go to 0 for each of them and
> the object will be destroyed after the current RCU grace period. Please
> let me know if I've misunderstood your point.
>
> > > > Also, even if you made it work, wouldn't you have the warning once you
> > > > run more selftests using prog_test_run, if you just set the destroyed
> > > > bit on each test run?
> > >
> > > If we want to update the test to have the refcount drop to 0, we would
> > > probably have to instead use dynamically allocated objects. At that
> > > point, we'd probably just crash instead of seeing a warning if we
> > > accidentally let a caller invoke acquire or release after the object had
> > > been destroyed. Maybe the better thing to do here is to just warn
> > > unconditionally in the destructor rather than setting a flag? What we
> > > really want to ensure is that the final refcount that's "owned" by the
> > > main kernel is never dropped.
> >
> > I think the refcount_t API already warns if underflow happens.
>
> Right, a warning would probably show up if we did a release that caused
> an underflow, but it would not for an acquire after the refcount dropped
> to 0.
>

It should, see REFCOUNT_ADD_UAF in include/linux/refcount.h.
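
For example (contrived sketch):

   refcount_t r = REFCOUNT_INIT(1);

   if (refcount_dec_and_test(&r))   /* 1 -> 0: final reference dropped */
           pr_info("object would be freed here\n");
   refcount_inc(&r);                /* inc-from-0 saturates and WARNs
                                     * (REFCOUNT_ADD_UAF) */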

> > To be clear, I don't mind if you want to improve this, it's certainly
> > a mess right now. Tests can't even run in parallel easily because it's
> > global. Testing like an actually allocated object might be a good way
> > to simulate a real scenario. And I totally agree that having a real
> > example is useful to people who want to add support for more kptrs.
>
> Ok, let me update the tests to do two things then:
>
> 1. Add a new test kfunc called bpf_kfunc_call_test_alloc() which returns
>    a dynamically allocated instance of a prog_test_ref_kfunc *. This is
>    similar in intention to bpf_xdp_ct_alloc().
> 2. Update bpf_kfunc_call_test_acquire() and
>    bpf_kfunc_call_test_release() to take a trusted pointer to that
>    allocated prog_test_ref_kfunc *.

This should work, but you would have to go through a lot of tests,
sadly I assumed it is global in a lot of places to make testing easier
(e.g. observing count after releasing by doing p->next, which gave me
a PTR_TO_BTF_ID that is preserved after release).
Some other way would have to be found to do the same thing.

> 3. Keep the other changes which e.g. use RCU in
>    bpf_kfunc_call_test_kptr_get() to synchronize on getting the kptr.
>    Once the __rcu kptr variant is landed we can get rid of
>    bpf_kfunc_call_test_kptr_get() and make bpf_kfunc_call_test_acquire()
>    require an __rcu pointer.
>

In the case of RCU I don't plan on passing the rcu pointer to acquire
functions, but rather kptr_get stops taking pointer to pointer. I.e.
in your point 3, kptr_get does what you want _acquire to do. It takes
an rcu protected pointer to an object and safely increments its count.
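
Roughly (hypothetical shape, exact naming aside):

   /* Caller must be in an RCU read-side critical section; p points
    * directly at the object, not at the map slot holding it.
    */
   noinline struct prog_test_ref_kfunc *
   bpf_kfunc_call_test_kptr_get(struct prog_test_ref_kfunc *p)
   {
           if (!refcount_inc_not_zero(&p->cnt))
                   return NULL;
           return p;
   }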

> Continuing on point (3) above, we should _probably_ also have an example
> for an object that isn't RCU-protected? I imagine that we don't want to
> get rid of KF_KPTR_GET entirely once we have __rcu pointers because some
> kptr_get() implementations may synchronize in other ways, such as with
> spinlocks or whatever. Let's leave this until after __rcu lands though.
>

I think it's unlikely kptr_get can work without it, spinlocks may be
required internally (e.g. to inspect the object key/generation in
SLAB_TYPESAFE_BY_RCU case without races) but that goes on top of RCU
protection. But yes, it depends, maybe it will work for some special
cases. Though I don't think it's worth adding an example for the
uncommon case.

So you will still need kptr_get helpers, some of them simply do
refcount_inc_not_zero, others may do a little more.
Anyway, let's discuss more when the set is posted.

> Does this all sound good?
>

Yes.
David Vernet Oct. 11, 2022, 3:21 p.m. UTC | #6
On Tue, Oct 11, 2022 at 08:10:26AM +0530, Kumar Kartikeya Dwivedi wrote:

[...]

> > > > > Also, even if you made it work, wouldn't you have the warning once you
> > > > > run more selftests using prog_test_run, if you just set the destroyed
> > > > > bit on each test run?
> > > >
> > > > If we want to update the test to have the refcount drop to 0, we would
> > > > probably have to instead use dynamically allocated objects. At that
> > > > point, we'd probably just crash instead of seeing a warning if we
> > > > accidentally let a caller invoke acquire or release after the object had
> > > > been destroyed. Maybe the better thing to do here is to just warn
> > > > unconditionally in the destructor rather than setting a flag? What we
> > > > really want to ensure is that the final refcount that's "owned" by the
> > > > main kernel is never dropped.
> > >
> > > I think the refcount_t API already warns if underflow happens.
> >
> > Right, a warning would probably show up if we did a release that caused
> > an underflow, but it would not for an acquire after the refcount dropped
> > to 0.
> >
> 
> It should, see REFCOUNT_ADD_UAF in include/linux/refcount.h.

Ahh, right you are, fair enough and thanks for hand holding and pointing
me directly at the relevant code. I now agree that the warns on the
destroyed field are just redundant.

> > > To be clear, I don't mind if you want to improve this, it's certainly
> > > a mess right now. Tests can't even run in parallel easily because it's
> > > global. Testing like an actually allocated object might be a good way
> > > to simulate a real scenario. And I totally agree that having a real
> > > example is useful to people who want to add support for more kptrs.
> >
> > Ok, let me update the tests to do two things then:
> >
> > 1. Add a new test kfunc called bpf_kfunc_call_test_alloc() which returns
> >    a dynamically allocated instance of a prog_test_ref_kfunc *. This is
> >    similar in intention to bpf_xdp_ct_alloc().
> > 2. Update bpf_kfunc_call_test_acquire() and
> >    bpf_kfunc_call_test_release() to take a trusted pointer to that
> >    allocated prog_test_ref_kfunc *.
> 
> This should work, but you would have to go through a lot of tests,
> sadly I assumed it is global in a lot of places to make testing easier
> (e.g. observing count after releasing by doing p->next, which gave me
> a PTR_TO_BTF_ID that is preserved after release).
> Some other way would have to be found to do the same thing.

Hmmm, ok, let me spend a bit of time trying to make all of this dynamic.
If it becomes clear that it's too much of a PITA, I might just drop the
patch; especially considering that your __rcu kptr work will address
what I was really initially trying to fix (which is that the kptr_get
pattern used in the test would likely be racy for a real kfunc). Or if
we want to, we could keep what it has now, but I could just update
delayed_destroy_test_ref_struct() to do nothing other than
WARN_ON_ONCE() and remove the extra warns for
prog_test_struct.destroyed. We can make a final call when I send out v3.

> > 3. Keep the other changes which e.g. use RCU in
> >    bpf_kfunc_call_test_kptr_get() to synchronize on getting the kptr.
> >    Once the __rcu kptr variant is landed we can get rid of
> >    bpf_kfunc_call_test_kptr_get() and make bpf_kfunc_call_test_acquire()
> >    require an __rcu pointer.
> >
> 
> In the case of RCU I don't plan on passing the rcu pointer to acquire
> functions, but rather kptr_get stops taking pointer to pointer. I.e.
> in your point 3, kptr_get does what you want _acquire to do. It takes
> an rcu protected pointer to an object and safely increments its count.

Oh, ok. So the idea is that the acquire function takes a normal trusted
pointer, and kptr_get takes an RCU protected kptr (so it would still
have to do refcount_inc_not_zero()). Makes sense.

> > Continuing on point (3) above, we should _probably_ also have an example
> > for an object that isn't RCU-protected? I imagine that we don't want to
> > get rid of KF_KPTR_GET entirely once we have __rcu pointers because some
> > kptr_get() implementations may synchronize in other ways, such as with
> > spinlocks or whatever. Let's leave this until after __rcu lands though.
> >
> 
> I think it's unlikely kptr_get can work without it, spinlocks may be
> required internally (e.g. to inspect the object key/generation in
> SLAB_TYPESAFE_BY_RCU case without races) but that goes on top of RCU
> protection. But yes, it depends, maybe it will work for some special
> cases. Though I don't think it's worth adding an example for the
> uncommon case.

Yeah, let's leave it off for now until we have a concrete use case in
the kernel that we want to mirror in a testcase.

Patch

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 13d578ce2a09..3fe9495abcbe 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -565,6 +565,8 @@ struct prog_test_ref_kfunc {
 	int b;
 	struct prog_test_member memb;
 	struct prog_test_ref_kfunc *next;
+	struct rcu_head rcu;
+	atomic_t destroyed;
 	refcount_t cnt;
 };
 
@@ -572,12 +574,14 @@ static struct prog_test_ref_kfunc prog_test_struct = {
 	.a = 42,
 	.b = 108,
 	.next = &prog_test_struct,
+	.destroyed = ATOMIC_INIT(0),
 	.cnt = REFCOUNT_INIT(1),
 };
 
 noinline struct prog_test_ref_kfunc *
 bpf_kfunc_call_test_acquire(unsigned long *scalar_ptr)
 {
+	WARN_ON_ONCE(atomic_read(&prog_test_struct.destroyed));
 	refcount_inc(&prog_test_struct.cnt);
 	return &prog_test_struct;
 }
@@ -589,12 +593,22 @@ bpf_kfunc_call_memb_acquire(void)
 	return NULL;
 }
 
+static void delayed_destroy_test_ref_struct(struct rcu_head *rhp)
+{
+	struct prog_test_ref_kfunc *p = container_of(rhp, struct prog_test_ref_kfunc, rcu);
+
+	WARN_ON_ONCE(refcount_read(&p->cnt) > 0);
+	atomic_set(&p->destroyed, true);
+}
+
 noinline void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p)
 {
 	if (!p)
 		return;
 
-	refcount_dec(&p->cnt);
+	WARN_ON_ONCE(atomic_read(&p->destroyed));
+	if (refcount_dec_and_test(&p->cnt))
+		call_rcu(&p->rcu, delayed_destroy_test_ref_struct);
 }
 
 noinline void bpf_kfunc_call_memb_release(struct prog_test_member *p)
@@ -641,11 +655,20 @@ noinline void bpf_kfunc_call_int_mem_release(int *p)
 noinline struct prog_test_ref_kfunc *
 bpf_kfunc_call_test_kptr_get(struct prog_test_ref_kfunc **pp, int a, int b)
 {
-	struct prog_test_ref_kfunc *p = READ_ONCE(*pp);
+	struct prog_test_ref_kfunc *p;
 
-	if (!p)
+	rcu_read_lock();
+	p = READ_ONCE(*pp);
+	if (!p) {
+		rcu_read_unlock();
 		return NULL;
-	refcount_inc(&p->cnt);
+	}
+
+	WARN_ON_ONCE(atomic_read(&p->destroyed));
+	if (!refcount_inc_not_zero(&p->cnt))
+		p = NULL;
+	rcu_read_unlock();
+
 	return p;
 }
 
diff --git a/tools/testing/selftests/bpf/prog_tests/map_kptr.c b/tools/testing/selftests/bpf/prog_tests/map_kptr.c
index fdcea7a61491..1efeec146d8e 100644
--- a/tools/testing/selftests/bpf/prog_tests/map_kptr.c
+++ b/tools/testing/selftests/bpf/prog_tests/map_kptr.c
@@ -16,10 +16,10 @@ struct {
 	{ "non_const_var_off_kptr_xchg", "R1 doesn't have constant offset. kptr has to be" },
 	{ "misaligned_access_write", "kptr access misaligned expected=8 off=7" },
 	{ "misaligned_access_read", "kptr access misaligned expected=8 off=1" },
-	{ "reject_var_off_store", "variable untrusted_ptr_ access var_off=(0x0; 0x1e0)" },
+	{ "reject_var_off_store", "variable untrusted_ptr_ access var_off=(0x0; 0x3f0)" },
 	{ "reject_bad_type_match", "invalid kptr access, R1 type=untrusted_ptr_prog_test_ref_kfunc" },
 	{ "marked_as_untrusted_or_null", "R1 type=untrusted_ptr_or_null_ expected=percpu_ptr_" },
-	{ "correct_btf_id_check_size", "access beyond struct prog_test_ref_kfunc at off 32 size 4" },
+	{ "correct_btf_id_check_size", "access beyond struct prog_test_ref_kfunc at off 48 size 4" },
 	{ "inherit_untrusted_on_walk", "R1 type=untrusted_ptr_ expected=percpu_ptr_" },
 	{ "reject_kptr_xchg_on_unref", "off=8 kptr isn't referenced kptr" },
 	{ "reject_kptr_get_no_map_val", "arg#0 expected pointer to map value" },
diff --git a/tools/testing/selftests/bpf/verifier/map_kptr.c b/tools/testing/selftests/bpf/verifier/map_kptr.c
index 6914904344c0..d7e76cf81362 100644
--- a/tools/testing/selftests/bpf/verifier/map_kptr.c
+++ b/tools/testing/selftests/bpf/verifier/map_kptr.c
@@ -212,13 +212,13 @@ 
 	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
 	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
 	BPF_EXIT_INSN(),
-	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 32),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 48),
 	BPF_EXIT_INSN(),
 	},
 	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	.fixup_map_kptr = { 1 },
 	.result = REJECT,
-	.errstr = "access beyond struct prog_test_ref_kfunc at off 32 size 8",
+	.errstr = "access beyond struct prog_test_ref_kfunc at off 48 size 8",
 },
 {
 	"map_kptr: unref: inherit PTR_UNTRUSTED on struct walk",