Message ID | 20221002171012.3529521-1-void@manifault.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Headers | show |
Series | [v2] selftests/bpf: Update map_kptr examples to reflect real use-cases | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch |
bpf/vmtest-bpf-next-PR | fail | PR summary |
bpf/vmtest-bpf-next-VM_Test-4 | success | Logs for llvm-toolchain |
bpf/vmtest-bpf-next-VM_Test-5 | success | Logs for set-matrix |
bpf/vmtest-bpf-next-VM_Test-2 | success | Logs for build for x86_64 with gcc |
bpf/vmtest-bpf-next-VM_Test-3 | success | Logs for build for x86_64 with llvm-16 |
bpf/vmtest-bpf-next-VM_Test-1 | success | Logs for build for s390x with gcc |
bpf/vmtest-bpf-next-VM_Test-16 | success | Logs for test_verifier on x86_64 with gcc |
bpf/vmtest-bpf-next-VM_Test-17 | success | Logs for test_verifier on x86_64 with llvm-16 |
bpf/vmtest-bpf-next-VM_Test-13 | success | Logs for test_progs_no_alu32 on x86_64 with gcc |
bpf/vmtest-bpf-next-VM_Test-14 | success | Logs for test_progs_no_alu32 on x86_64 with llvm-16 |
bpf/vmtest-bpf-next-VM_Test-7 | success | Logs for test_maps on x86_64 with gcc |
bpf/vmtest-bpf-next-VM_Test-8 | success | Logs for test_maps on x86_64 with llvm-16 |
bpf/vmtest-bpf-next-VM_Test-10 | success | Logs for test_progs on x86_64 with gcc |
bpf/vmtest-bpf-next-VM_Test-11 | success | Logs for test_progs on x86_64 with llvm-16 |
bpf/vmtest-bpf-next-VM_Test-15 | success | Logs for test_verifier on s390x with gcc |
bpf/vmtest-bpf-next-VM_Test-6 | success | Logs for test_maps on s390x with gcc |
bpf/vmtest-bpf-next-VM_Test-12 | fail | Logs for test_progs_no_alu32 on s390x with gcc |
bpf/vmtest-bpf-next-VM_Test-9 | fail | Logs for test_progs on s390x with gcc |
On Sun, 2 Oct 2022 at 19:10, David Vernet <void@manifault.com> wrote: > > In the map_kptr selftest, the bpf_kfunc_call_test_kptr_get() kfunc is used > to verify and illustrate a typical use case of kptrs wherein an additional > reference is taken on a referenced kptr that is already stored in a map. > This would be useful for programs that, for example, want to pass the > referenced kptr to a kfunc without removing it from the map. > > Unfortunately, the implementation of bpf_kfunc_call_test_kptr_get() isn't > representative of a real kfunc that needs to guard against possible > refcounting races by BPF program callers. bpf_kfunc_call_test_kptr_get() > does a READ_ONCE() on the struct prog_test_ref_kfunc **pp passed by the > user and then calls refcount_inc() if the pointer is nonzero, but this > can race with another callback in the same program that removes the kptr > from the map and frees it: > > 1. A BPF program with a referenced kptr in a map passes the kptr to > bpf_kfunc_call_test_kptr_get() as: > > p = bpf_kfunc_call_test_kptr_get(&v->ref_ptr, 0, 0); > > from CPU 0. > > 2. bpf_kfunc_call_test_kptr_get() does READ_ONCE(), and sees that the > struct prog_test_ref_kfunc **pp contains a non-NULL pointer. > > 3. Another BPF handler on CPU 1 then invokes bpf_kptr_xchg() to remove > the kptr from the map, and frees it with a call to > bpf_kfunc_call_test_release(). This drops the final refcount on the > kptr. > > 4. CPU 0 then issues refcount_inc() on the kptr with refcount 0, causing > a use-after-free. > > In the map_kptr selftest, this doesn't cause a use-after-free because > the structure being refcounted is statically allocated, and the > refcounts aren't actually used to control the object lifecycle. In a > kfunc supporting a real use case, the refcount going to 0 would likely > cause the object to be freed, as it does for e.g. struct task_struct. > > A more realistic use-case would use something like RCU in the kfunc > handler to ensure that the kptr object can be safely accessed, and then > issuing a refcount_inc_not_zero() to acquire a refcount on the object. > This patch updates the map_kptr selftest to do this. > > Signed-off-by: David Vernet <void@manifault.com> > --- In my defense, I did note all this in the commit adding support for kptr_get, so it seemed overkill to do it for a static struct. But it's probably not a bad idea to have a real example, given it's a necessity that such a helper requires reclamation of the object through RCU, and people probably won't go and read the commit message. However, some questions below... > net/bpf/test_run.c | 31 ++++++++++++++++--- > .../selftests/bpf/prog_tests/map_kptr.c | 4 +-- > .../testing/selftests/bpf/verifier/map_kptr.c | 4 +-- > 3 files changed, 31 insertions(+), 8 deletions(-) > > diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c > index 13d578ce2a09..3fe9495abcbe 100644 > --- a/net/bpf/test_run.c > +++ b/net/bpf/test_run.c > @@ -565,6 +565,8 @@ struct prog_test_ref_kfunc { > int b; > struct prog_test_member memb; > struct prog_test_ref_kfunc *next; > + struct rcu_head rcu; > + atomic_t destroyed; > refcount_t cnt; > }; > > @@ -572,12 +574,14 @@ static struct prog_test_ref_kfunc prog_test_struct = { > .a = 42, > .b = 108, > .next = &prog_test_struct, > + .destroyed = ATOMIC_INIT(0), > .cnt = REFCOUNT_INIT(1), > }; > > noinline struct prog_test_ref_kfunc * > bpf_kfunc_call_test_acquire(unsigned long *scalar_ptr) > { > + WARN_ON_ONCE(atomic_read(&prog_test_struct.destroyed)); > refcount_inc(&prog_test_struct.cnt); > return &prog_test_struct; > } > @@ -589,12 +593,22 @@ bpf_kfunc_call_memb_acquire(void) > return NULL; > } > > +static void delayed_destroy_test_ref_struct(struct rcu_head *rhp) > +{ > + struct prog_test_ref_kfunc *p = container_of(rhp, struct prog_test_ref_kfunc, rcu); > + > + WARN_ON_ONCE(refcount_read(&p->cnt) > 0); > + atomic_set(&p->destroyed, true); > +} > + > noinline void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p) > { > if (!p) > return; > > - refcount_dec(&p->cnt); > + WARN_ON_ONCE(atomic_read(&p->destroyed)); > + if (refcount_dec_and_test(&p->cnt)) > + call_rcu(&p->rcu, delayed_destroy_test_ref_struct); I wonder whether this is ever called, I haven't really given this patch a shot, but I don't see how the refcount can ever drop back to zero. It's initialized as 1, and then only acquired after that, so pairing all releases should still preserve refcount as 1. Also, even if you made it work, wouldn't you have the warning once you run more selftests using prog_test_run, if you just set the destroyed bit on each test run?
On Mon, Oct 03, 2022 at 01:35:49AM +0200, Kumar Kartikeya Dwivedi wrote: > On Sun, 2 Oct 2022 at 19:10, David Vernet <void@manifault.com> wrote: > > > > In the map_kptr selftest, the bpf_kfunc_call_test_kptr_get() kfunc is used > > to verify and illustrate a typical use case of kptrs wherein an additional > > reference is taken on a referenced kptr that is already stored in a map. > > This would be useful for programs that, for example, want to pass the > > referenced kptr to a kfunc without removing it from the map. > > > > Unfortunately, the implementation of bpf_kfunc_call_test_kptr_get() isn't > > representative of a real kfunc that needs to guard against possible > > refcounting races by BPF program callers. bpf_kfunc_call_test_kptr_get() > > does a READ_ONCE() on the struct prog_test_ref_kfunc **pp passed by the > > user and then calls refcount_inc() if the pointer is nonzero, but this > > can race with another callback in the same program that removes the kptr > > from the map and frees it: > > > > 1. A BPF program with a referenced kptr in a map passes the kptr to > > bpf_kfunc_call_test_kptr_get() as: > > > > p = bpf_kfunc_call_test_kptr_get(&v->ref_ptr, 0, 0); > > > > from CPU 0. > > > > 2. bpf_kfunc_call_test_kptr_get() does READ_ONCE(), and sees that the > > struct prog_test_ref_kfunc **pp contains a non-NULL pointer. > > > > 3. Another BPF handler on CPU 1 then invokes bpf_kptr_xchg() to remove > > the kptr from the map, and frees it with a call to > > bpf_kfunc_call_test_release(). This drops the final refcount on the > > kptr. > > > > 4. CPU 0 then issues refcount_inc() on the kptr with refcount 0, causing > > a use-after-free. > > > > In the map_kptr selftest, this doesn't cause a use-after-free because > > the structure being refcounted is statically allocated, and the > > refcounts aren't actually used to control the object lifecycle. In a > > kfunc supporting a real use case, the refcount going to 0 would likely > > cause the object to be freed, as it does for e.g. struct task_struct. > > > > A more realistic use-case would use something like RCU in the kfunc > > handler to ensure that the kptr object can be safely accessed, and then > > issuing a refcount_inc_not_zero() to acquire a refcount on the object. > > This patch updates the map_kptr selftest to do this. > > > > Signed-off-by: David Vernet <void@manifault.com> > > --- > > In my defense, I did note all this in the commit adding support for > kptr_get, so it seemed overkill to do it for a static struct. > But it's probably not a bad idea to have a real example, given it's a > necessity that such a helper requires reclamation of the object > through RCU, and people probably won't go and read the commit message. No defense needed, though I of course agree that it's important to have a real example here. I decided to add it because I got confused about how it was safe to use refcount_inc() instead of refcount_inc_not_zero() in bpf_kfunc_call_test_kptr_get(). > However, some questions below... > > > net/bpf/test_run.c | 31 ++++++++++++++++--- > > .../selftests/bpf/prog_tests/map_kptr.c | 4 +-- > > .../testing/selftests/bpf/verifier/map_kptr.c | 4 +-- > > 3 files changed, 31 insertions(+), 8 deletions(-) > > > > diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c > > index 13d578ce2a09..3fe9495abcbe 100644 > > --- a/net/bpf/test_run.c > > +++ b/net/bpf/test_run.c > > @@ -565,6 +565,8 @@ struct prog_test_ref_kfunc { > > int b; > > struct prog_test_member memb; > > struct prog_test_ref_kfunc *next; > > + struct rcu_head rcu; > > + atomic_t destroyed; > > refcount_t cnt; > > }; > > > > @@ -572,12 +574,14 @@ static struct prog_test_ref_kfunc prog_test_struct = { > > .a = 42, > > .b = 108, > > .next = &prog_test_struct, > > + .destroyed = ATOMIC_INIT(0), > > .cnt = REFCOUNT_INIT(1), > > }; > > > > noinline struct prog_test_ref_kfunc * > > bpf_kfunc_call_test_acquire(unsigned long *scalar_ptr) > > { > > + WARN_ON_ONCE(atomic_read(&prog_test_struct.destroyed)); > > refcount_inc(&prog_test_struct.cnt); > > return &prog_test_struct; > > } > > @@ -589,12 +593,22 @@ bpf_kfunc_call_memb_acquire(void) > > return NULL; > > } > > > > +static void delayed_destroy_test_ref_struct(struct rcu_head *rhp) > > +{ > > + struct prog_test_ref_kfunc *p = container_of(rhp, struct prog_test_ref_kfunc, rcu); > > + > > + WARN_ON_ONCE(refcount_read(&p->cnt) > 0); > > + atomic_set(&p->destroyed, true); > > +} > > + > > noinline void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p) > > { > > if (!p) > > return; > > > > - refcount_dec(&p->cnt); > > + WARN_ON_ONCE(atomic_read(&p->destroyed)); > > + if (refcount_dec_and_test(&p->cnt)) > > + call_rcu(&p->rcu, delayed_destroy_test_ref_struct); > > I wonder whether this is ever called, I haven't really given this > patch a shot, but I don't see how the refcount can ever drop back to > zero. It's initialized as 1, and then only acquired after that, so > pairing all releases should still preserve refcount as 1. Yeah, the call_rcu() path is never hit. If we wanted to update the test so that this codepath was actually exercised, I think we'd need to add another kfunc that returned a reference to a dynamically allocated object rather than using the global, static one. I'm happy to do that if we think it's useful. The downside to adding more of these test kfuncs is that they actually do add a small bit of runtime overhead to the kernel because they're unconditionally registered in the __init function for test_run.c. > Also, even if you made it work, wouldn't you have the warning once you > run more selftests using prog_test_run, if you just set the destroyed > bit on each test run? If we want to update the test to have the refcount drop to 0, we would probably have to instead use dynamically allocated objects. At that point, we'd probably just crash instead of seeing a warning if we accidentally let a caller invoke acquire or release after the object had been destroyed. Maybe the better thing to do here is to just warn unconditionally in the destructor rather than setting a flag? What we really want to ensure is that the final refcount that's "owned" by the main kernel is never dropped.
On Mon, 3 Oct 2022 at 15:52, David Vernet <void@manifault.com> wrote: > > On Mon, Oct 03, 2022 at 01:35:49AM +0200, Kumar Kartikeya Dwivedi wrote: > > On Sun, 2 Oct 2022 at 19:10, David Vernet <void@manifault.com> wrote: > > > > > > In the map_kptr selftest, the bpf_kfunc_call_test_kptr_get() kfunc is used > > > to verify and illustrate a typical use case of kptrs wherein an additional > > > reference is taken on a referenced kptr that is already stored in a map. > > > This would be useful for programs that, for example, want to pass the > > > referenced kptr to a kfunc without removing it from the map. > > > > > > Unfortunately, the implementation of bpf_kfunc_call_test_kptr_get() isn't > > > representative of a real kfunc that needs to guard against possible > > > refcounting races by BPF program callers. bpf_kfunc_call_test_kptr_get() > > > does a READ_ONCE() on the struct prog_test_ref_kfunc **pp passed by the > > > user and then calls refcount_inc() if the pointer is nonzero, but this > > > can race with another callback in the same program that removes the kptr > > > from the map and frees it: > > > > > > 1. A BPF program with a referenced kptr in a map passes the kptr to > > > bpf_kfunc_call_test_kptr_get() as: > > > > > > p = bpf_kfunc_call_test_kptr_get(&v->ref_ptr, 0, 0); > > > > > > from CPU 0. > > > > > > 2. bpf_kfunc_call_test_kptr_get() does READ_ONCE(), and sees that the > > > struct prog_test_ref_kfunc **pp contains a non-NULL pointer. > > > > > > 3. Another BPF handler on CPU 1 then invokes bpf_kptr_xchg() to remove > > > the kptr from the map, and frees it with a call to > > > bpf_kfunc_call_test_release(). This drops the final refcount on the > > > kptr. > > > > > > 4. CPU 0 then issues refcount_inc() on the kptr with refcount 0, causing > > > a use-after-free. > > > > > > In the map_kptr selftest, this doesn't cause a use-after-free because > > > the structure being refcounted is statically allocated, and the > > > refcounts aren't actually used to control the object lifecycle. In a > > > kfunc supporting a real use case, the refcount going to 0 would likely > > > cause the object to be freed, as it does for e.g. struct task_struct. > > > > > > A more realistic use-case would use something like RCU in the kfunc > > > handler to ensure that the kptr object can be safely accessed, and then > > > issuing a refcount_inc_not_zero() to acquire a refcount on the object. > > > This patch updates the map_kptr selftest to do this. > > > > > > Signed-off-by: David Vernet <void@manifault.com> > > > --- > > > > In my defense, I did note all this in the commit adding support for > > kptr_get, so it seemed overkill to do it for a static struct. > > But it's probably not a bad idea to have a real example, given it's a > > necessity that such a helper requires reclamation of the object > > through RCU, and people probably won't go and read the commit message. > > No defense needed, though I of course agree that it's important to have > a real example here. I decided to add it because I got confused about > how it was safe to use refcount_inc() instead of refcount_inc_not_zero() > in bpf_kfunc_call_test_kptr_get(). > > > However, some questions below... > > > > > net/bpf/test_run.c | 31 ++++++++++++++++--- > > > .../selftests/bpf/prog_tests/map_kptr.c | 4 +-- > > > .../testing/selftests/bpf/verifier/map_kptr.c | 4 +-- > > > 3 files changed, 31 insertions(+), 8 deletions(-) > > > > > > diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c > > > index 13d578ce2a09..3fe9495abcbe 100644 > > > --- a/net/bpf/test_run.c > > > +++ b/net/bpf/test_run.c > > > @@ -565,6 +565,8 @@ struct prog_test_ref_kfunc { > > > int b; > > > struct prog_test_member memb; > > > struct prog_test_ref_kfunc *next; > > > + struct rcu_head rcu; > > > + atomic_t destroyed; > > > refcount_t cnt; > > > }; > > > > > > @@ -572,12 +574,14 @@ static struct prog_test_ref_kfunc prog_test_struct = { > > > .a = 42, > > > .b = 108, > > > .next = &prog_test_struct, > > > + .destroyed = ATOMIC_INIT(0), > > > .cnt = REFCOUNT_INIT(1), > > > }; > > > > > > noinline struct prog_test_ref_kfunc * > > > bpf_kfunc_call_test_acquire(unsigned long *scalar_ptr) > > > { > > > + WARN_ON_ONCE(atomic_read(&prog_test_struct.destroyed)); > > > refcount_inc(&prog_test_struct.cnt); > > > return &prog_test_struct; > > > } > > > @@ -589,12 +593,22 @@ bpf_kfunc_call_memb_acquire(void) > > > return NULL; > > > } > > > > > > +static void delayed_destroy_test_ref_struct(struct rcu_head *rhp) > > > +{ > > > + struct prog_test_ref_kfunc *p = container_of(rhp, struct prog_test_ref_kfunc, rcu); > > > + > > > + WARN_ON_ONCE(refcount_read(&p->cnt) > 0); > > > + atomic_set(&p->destroyed, true); > > > +} > > > + > > > noinline void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p) > > > { > > > if (!p) > > > return; > > > > > > - refcount_dec(&p->cnt); > > > + WARN_ON_ONCE(atomic_read(&p->destroyed)); > > > + if (refcount_dec_and_test(&p->cnt)) > > > + call_rcu(&p->rcu, delayed_destroy_test_ref_struct); > > > > I wonder whether this is ever called, I haven't really given this > > patch a shot, but I don't see how the refcount can ever drop back to > > zero. It's initialized as 1, and then only acquired after that, so > > pairing all releases should still preserve refcount as 1. > > Yeah, the call_rcu() path is never hit. If we wanted to update the test > so that this codepath was actually exercised, I think we'd need to add > another kfunc that returned a reference to a dynamically allocated > object rather than using the global, static one. I'm happy to do that if > we think it's useful. The downside to adding more of these test kfuncs > is that they actually do add a small bit of runtime overhead to the > kernel because they're unconditionally registered in the __init function > for test_run.c. > But that only happens once, right? It still happens, so I don't see what changes. > > Also, even if you made it work, wouldn't you have the warning once you > > run more selftests using prog_test_run, if you just set the destroyed > > bit on each test run? > > If we want to update the test to have the refcount drop to 0, we would > probably have to instead use dynamically allocated objects. At that > point, we'd probably just crash instead of seeing a warning if we > accidentally let a caller invoke acquire or release after the object had > been destroyed. Maybe the better thing to do here is to just warn > unconditionally in the destructor rather than setting a flag? What we > really want to ensure is that the final refcount that's "owned" by the > main kernel is never dropped. I think the refcount_t API already warns if underflow happens. To be clear, I don't mind if you want to improve this, it's certainly a mess right now. Tests can't even run in parallel easily because it's global. Testing like an actually allocated object might be a good way to simulate a real scenario. And I totally agree that having a real example is useful to people who want to add support for more kptrs. Maybe some comments around the code would also be helpful on what the expectations are. We should make it easy for people to hook into this stuff. Nobody reads documentation, people usually only look at existing examples.
On Mon, Oct 03, 2022 at 06:22:55PM +0200, Kumar Kartikeya Dwivedi wrote: [...] > > > > noinline void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p) > > > > { > > > > if (!p) > > > > return; > > > > > > > > - refcount_dec(&p->cnt); > > > > + WARN_ON_ONCE(atomic_read(&p->destroyed)); > > > > + if (refcount_dec_and_test(&p->cnt)) > > > > + call_rcu(&p->rcu, delayed_destroy_test_ref_struct); > > > > > > I wonder whether this is ever called, I haven't really given this > > > patch a shot, but I don't see how the refcount can ever drop back to > > > zero. It's initialized as 1, and then only acquired after that, so > > > pairing all releases should still preserve refcount as 1. > > > > Yeah, the call_rcu() path is never hit. If we wanted to update the test > > so that this codepath was actually exercised, I think we'd need to add > > another kfunc that returned a reference to a dynamically allocated > > object rather than using the global, static one. I'm happy to do that if > > we think it's useful. The downside to adding more of these test kfuncs > > is that they actually do add a small bit of runtime overhead to the > > kernel because they're unconditionally registered in the __init function > > for test_run.c. > > > > But that only happens once, right? It still happens, so I don't see > what changes. The idea here would be to return a dynamically allocated object with an initial refcount of 1 that's owned by the _BPF program_, rather than what we have today where the global struct has an initial refcount that's owned by the main kernel and is never expected to go to 0. For all success (i.e. non-fail) testcases that are able to dynamically allocate this object, the refcount should go to 0 for each of them and the object will be destroyed after the current RCU grace period. Please let me know if I've misunderstood your point. > > > Also, even if you made it work, wouldn't you have the warning once you > > > run more selftests using prog_test_run, if you just set the destroyed > > > bit on each test run? > > > > If we want to update the test to have the refcount drop to 0, we would > > probably have to instead use dynamically allocated objects. At that > > point, we'd probably just crash instead of seeing a warning if we > > accidentally let a caller invoke acquire or release after the object had > > been destroyed. Maybe the better thing to do here is to just warn > > unconditionally in the destructor rather than setting a flag? What we > > really want to ensure is that the final refcount that's "owned" by the > > main kernel is never dropped. > > I think the refcount_t API already warns if underflow happens. Right, a warning would probably show up if we did a release that caused an underflow, but it would not for an acquire after the refcount dropped to 0. > To be clear, I don't mind if you want to improve this, it's certainly > a mess right now. Tests can't even run in parallel easily because it's > global. Testing like an actually allocated object might be a good way > to simulate a real scenario. And I totally agree that having a real > example is useful to people who want to add support for more kptrs. Ok, let me update the tests to do two things then: 1. Add a new test kfunc called bpf_kfunc_call_test_alloc() which returns a dynamically allocated instance of a prog_test_ref_kfunc *. This is similar in intention to bpf_xdp_ct_alloc(). 2. Update bpf_kfunc_call_test_acquire() and bpf_kfunc_call_test_release() to take a trusted pointer to that allocated prog_test_ref_kfunc *. 3. Keep the other changes which e.g. use RCU in bpf_kfunc_call_test_kptr_get() to synchronize on getting the kptr. Once the __rcu kptr variant is landed we can get rid of bpf_kfunc_call_test_kptr_get() and make bpf_kfunc_call_test_acquire() require an __rcu pointer. Continuing on point (3) above, we should _probably_ also have an example for an object that isn't RCU-protected? I imagine that we don't want to get rid of KF_KPTR_GET entirely once we have __rcu pointers because some kptr_get() implementations may synchronize in other ways, such as with spinlocks or whatever. Let's leave this until after __rcu lands though. Does this all sound good? > Maybe some comments around the code would also be helpful on what the > expectations are. We should make it easy for people to hook into this > stuff. Nobody reads documentation, people usually only look at > existing examples. For sure, can do.
On Tue, 4 Oct 2022 at 20:59, David Vernet <void@manifault.com> wrote: > > On Mon, Oct 03, 2022 at 06:22:55PM +0200, Kumar Kartikeya Dwivedi wrote: > > [...] > > > > > > noinline void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p) > > > > > { > > > > > if (!p) > > > > > return; > > > > > > > > > > - refcount_dec(&p->cnt); > > > > > + WARN_ON_ONCE(atomic_read(&p->destroyed)); > > > > > + if (refcount_dec_and_test(&p->cnt)) > > > > > + call_rcu(&p->rcu, delayed_destroy_test_ref_struct); > > > > > > > > I wonder whether this is ever called, I haven't really given this > > > > patch a shot, but I don't see how the refcount can ever drop back to > > > > zero. It's initialized as 1, and then only acquired after that, so > > > > pairing all releases should still preserve refcount as 1. > > > > > > Yeah, the call_rcu() path is never hit. If we wanted to update the test > > > so that this codepath was actually exercised, I think we'd need to add > > > another kfunc that returned a reference to a dynamically allocated > > > object rather than using the global, static one. I'm happy to do that if > > > we think it's useful. The downside to adding more of these test kfuncs > > > is that they actually do add a small bit of runtime overhead to the > > > kernel because they're unconditionally registered in the __init function > > > for test_run.c. > > > > > > > But that only happens once, right? It still happens, so I don't see > > what changes. > > The idea here would be to return a dynamically allocated object with an > initial refcount of 1 that's owned by the _BPF program_, rather than > what we have today where the global struct has an initial refcount > that's owned by the main kernel and is never expected to go to 0. For > all success (i.e. non-fail) testcases that are able to dynamically > allocate this object, the refcount should go to 0 for each of them and > the object will be destroyed after the current RCU grace period. Please > let me know if I've misunderstood your point. > > > > > Also, even if you made it work, wouldn't you have the warning once you > > > > run more selftests using prog_test_run, if you just set the destroyed > > > > bit on each test run? > > > > > > If we want to update the test to have the refcount drop to 0, we would > > > probably have to instead use dynamically allocated objects. At that > > > point, we'd probably just crash instead of seeing a warning if we > > > accidentally let a caller invoke acquire or release after the object had > > > been destroyed. Maybe the better thing to do here is to just warn > > > unconditionally in the destructor rather than setting a flag? What we > > > really want to ensure is that the final refcount that's "owned" by the > > > main kernel is never dropped. > > > > I think the refcount_t API already warns if underflow happens. > > Right, a warning would probably show up if we did a release that caused > an underflow, but it would not for an acquire after the refcount dropped > to 0. > It should, see REFCOUNT_ADD_UAF in include/linux/refcount.h. > > To be clear, I don't mind if you want to improve this, it's certainly > > a mess right now. Tests can't even run in parallel easily because it's > > global. Testing like an actually allocated object might be a good way > > to simulate a real scenario. And I totally agree that having a real > > example is useful to people who want to add support for more kptrs. > > Ok, let me update the tests to do two things then: > > 1. Add a new test kfunc called bpf_kfunc_call_test_alloc() which returns > a dynamically allocated instance of a prog_test_ref_kfunc *. This is > similar in intention to bpf_xdp_ct_alloc(). > 2. Update bpf_kfunc_call_test_acquire() and > bpf_kfunc_call_test_release() to take a trusted pointer to that > allocated prog_test_ref_kfunc *. This should work, but you would have to go through a lot of tests, sadly I assumed it is global in a lot of places to make testing easier (e.g. observing count after releasing by doing p->next, which gave me a PTR_TO_BTF_ID that is preserved after release). Some other way would have to be found to do the same thing. > 3. Keep the other changes which e.g. use RCU in > bpf_kfunc_call_test_kptr_get() to synchronize on getting the kptr. > Once the __rcu kptr variant is landed we can get rid of > bpf_kfunc_call_test_kptr_get() and make bpf_kfunc_call_test_acquire() > require an __rcu pointer. > In the case of RCU I don't plan on passing the rcu pointer to acquire functions, but rather kptr_get stops taking pointer to pointer. I.e. in your point 3, kptr_get does what you want _acquire to do. It takes an rcu protected pointer to an object and safely increments its count. > Continuing on point (3) above, we should _probably_ also have an example > for an object that isn't RCU-protected? I imagine that we don't want to > get rid of KF_KPTR_GET entirely once we have __rcu pointers because some > kptr_get() implementations may synchronize in other ways, such as with > spinlocks or whatever. Let's leave this until after __rcu lands though. > I think it's unlikely kptr_get can work without it, spinlocks may be required internally (e.g. to inspect the object key/generation in SLAB_TYPESAFE_BY_RCU case without races) but that goes on top of RCU protection. But yes, it depends, maybe it will work for some special cases. Though I don't think it's worth adding an example for the uncommon case. So you will still need kptr_get helpers, some of them simply do refcount_inc_not_zero, others may do a little more. Anyway, let's discuss more when the set is posted. > Does this all sound good? > Yes.
On Tue, Oct 11, 2022 at 08:10:26AM +0530, Kumar Kartikeya Dwivedi wrote: [...] > > > > > Also, even if you made it work, wouldn't you have the warning once you > > > > > run more selftests using prog_test_run, if you just set the destroyed > > > > > bit on each test run? > > > > > > > > If we want to update the test to have the refcount drop to 0, we would > > > > probably have to instead use dynamically allocated objects. At that > > > > point, we'd probably just crash instead of seeing a warning if we > > > > accidentally let a caller invoke acquire or release after the object had > > > > been destroyed. Maybe the better thing to do here is to just warn > > > > unconditionally in the destructor rather than setting a flag? What we > > > > really want to ensure is that the final refcount that's "owned" by the > > > > main kernel is never dropped. > > > > > > I think the refcount_t API already warns if underflow happens. > > > > Right, a warning would probably show up if we did a release that caused > > an underflow, but it would not for an acquire after the refcount dropped > > to 0. > > > > It should, see REFCOUNT_ADD_UAF in include/linux/refcount.h. Ahh, right you are, fair enough and thanks for hand holding and pointing me directly at the relevant code. I now agree that the warns on the destroyed field are just redundant. > > > To be clear, I don't mind if you want to improve this, it's certainly > > > a mess right now. Tests can't even run in parallel easily because it's > > > global. Testing like an actually allocated object might be a good way > > > to simulate a real scenario. And I totally agree that having a real > > > example is useful to people who want to add support for more kptrs. > > > > Ok, let me update the tests to do two things then: > > > > 1. Add a new test kfunc called bpf_kfunc_call_test_alloc() which returns > > a dynamically allocated instance of a prog_test_ref_kfunc *. This is > > similar in intention to bpf_xdp_ct_alloc(). > > 2. Update bpf_kfunc_call_test_acquire() and > > bpf_kfunc_call_test_release() to take a trusted pointer to that > > allocated prog_test_ref_kfunc *. > > This should work, but you would have to go through a lot of tests, > sadly I assumed it is global in a lot of places to make testing easier > (e.g. observing count after releasing by doing p->next, which gave me > a PTR_TO_BTF_ID that is preserved after release). > Some other way would have to be found to do the same thing. Hmmm, ok, let me spend a bit of time trying to make all of this dynamic. If it becomes clear that it's too much of a PITA, I might just drop the patch; especially considering that your __rcu kptr work will address what I was really initially trying to fix (which is that the kptr_get pattern used in the test would likely be racy for a real kfunc). Or if we want to, we could keep what it has now, but I could just update delayed_destroy_test_ref_struct() to do nothing other than WARN_ON_ONCE() and remove the extra warns for prog_test_struct.destroyed. We can make a final call when I send out v3. > > 3. Keep the other changes which e.g. use RCU in > > bpf_kfunc_call_test_kptr_get() to synchronize on getting the kptr. > > Once the __rcu kptr variant is landed we can get rid of > > bpf_kfunc_call_test_kptr_get() and make bpf_kfunc_call_test_acquire() > > require an __rcu pointer. > > > > In the case of RCU I don't plan on passing the rcu pointer to acquire > functions, but rather kptr_get stops taking pointer to pointer. I.e. > in your point 3, kptr_get does what you want _acquire to do. It takes > an rcu protected pointer to an object and safely increments its count. Oh, ok. So the idea is that the acquire function takes a normal trusted pointer, and kptr_get takes an RCU protected kptr (so it would still have to do refcount_inc_not_zero()). Makes sense. > > Continuing on point (3) above, we should _probably_ also have an example > > for an object that isn't RCU-protected? I imagine that we don't want to > > get rid of KF_KPTR_GET entirely once we have __rcu pointers because some > > kptr_get() implementations may synchronize in other ways, such as with > > spinlocks or whatever. Let's leave this until after __rcu lands though. > > > > I think it's unlikely kptr_get can work without it, spinlocks may be > required internally (e.g. to inspect the object key/generation in > SLAB_TYPESAFE_BY_RCU case without races) but that goes on top of RCU > protection. But yes, it depends, maybe it will work for some special > cases. Though I don't think it's worth adding an example for the > uncommon case. Yeah, let's leave it off for now until we have a concrete use case in the kernel that we want to mirror in a testcase.
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 13d578ce2a09..3fe9495abcbe 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -565,6 +565,8 @@ struct prog_test_ref_kfunc { int b; struct prog_test_member memb; struct prog_test_ref_kfunc *next; + struct rcu_head rcu; + atomic_t destroyed; refcount_t cnt; }; @@ -572,12 +574,14 @@ static struct prog_test_ref_kfunc prog_test_struct = { .a = 42, .b = 108, .next = &prog_test_struct, + .destroyed = ATOMIC_INIT(0), .cnt = REFCOUNT_INIT(1), }; noinline struct prog_test_ref_kfunc * bpf_kfunc_call_test_acquire(unsigned long *scalar_ptr) { + WARN_ON_ONCE(atomic_read(&prog_test_struct.destroyed)); refcount_inc(&prog_test_struct.cnt); return &prog_test_struct; } @@ -589,12 +593,22 @@ bpf_kfunc_call_memb_acquire(void) return NULL; } +static void delayed_destroy_test_ref_struct(struct rcu_head *rhp) +{ + struct prog_test_ref_kfunc *p = container_of(rhp, struct prog_test_ref_kfunc, rcu); + + WARN_ON_ONCE(refcount_read(&p->cnt) > 0); + atomic_set(&p->destroyed, true); +} + noinline void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p) { if (!p) return; - refcount_dec(&p->cnt); + WARN_ON_ONCE(atomic_read(&p->destroyed)); + if (refcount_dec_and_test(&p->cnt)) + call_rcu(&p->rcu, delayed_destroy_test_ref_struct); } noinline void bpf_kfunc_call_memb_release(struct prog_test_member *p) @@ -641,11 +655,20 @@ noinline void bpf_kfunc_call_int_mem_release(int *p) noinline struct prog_test_ref_kfunc * bpf_kfunc_call_test_kptr_get(struct prog_test_ref_kfunc **pp, int a, int b) { - struct prog_test_ref_kfunc *p = READ_ONCE(*pp); + struct prog_test_ref_kfunc *p; - if (!p) + rcu_read_lock(); + p = READ_ONCE(*pp); + if (!p) { + rcu_read_unlock(); return NULL; - refcount_inc(&p->cnt); + } + + WARN_ON_ONCE(atomic_read(&p->destroyed)); + if (!refcount_inc_not_zero(&p->cnt)) + p = NULL; + rcu_read_unlock(); + return p; } diff --git a/tools/testing/selftests/bpf/prog_tests/map_kptr.c b/tools/testing/selftests/bpf/prog_tests/map_kptr.c index fdcea7a61491..1efeec146d8e 100644 --- a/tools/testing/selftests/bpf/prog_tests/map_kptr.c +++ b/tools/testing/selftests/bpf/prog_tests/map_kptr.c @@ -16,10 +16,10 @@ struct { { "non_const_var_off_kptr_xchg", "R1 doesn't have constant offset. kptr has to be" }, { "misaligned_access_write", "kptr access misaligned expected=8 off=7" }, { "misaligned_access_read", "kptr access misaligned expected=8 off=1" }, - { "reject_var_off_store", "variable untrusted_ptr_ access var_off=(0x0; 0x1e0)" }, + { "reject_var_off_store", "variable untrusted_ptr_ access var_off=(0x0; 0x3f0)" }, { "reject_bad_type_match", "invalid kptr access, R1 type=untrusted_ptr_prog_test_ref_kfunc" }, { "marked_as_untrusted_or_null", "R1 type=untrusted_ptr_or_null_ expected=percpu_ptr_" }, - { "correct_btf_id_check_size", "access beyond struct prog_test_ref_kfunc at off 32 size 4" }, + { "correct_btf_id_check_size", "access beyond struct prog_test_ref_kfunc at off 48 size 4" }, { "inherit_untrusted_on_walk", "R1 type=untrusted_ptr_ expected=percpu_ptr_" }, { "reject_kptr_xchg_on_unref", "off=8 kptr isn't referenced kptr" }, { "reject_kptr_get_no_map_val", "arg#0 expected pointer to map value" }, diff --git a/tools/testing/selftests/bpf/verifier/map_kptr.c b/tools/testing/selftests/bpf/verifier/map_kptr.c index 6914904344c0..d7e76cf81362 100644 --- a/tools/testing/selftests/bpf/verifier/map_kptr.c +++ b/tools/testing/selftests/bpf/verifier/map_kptr.c @@ -212,13 +212,13 @@ BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0), BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1), BPF_EXIT_INSN(), - BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 32), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 48), BPF_EXIT_INSN(), }, .prog_type = BPF_PROG_TYPE_SCHED_CLS, .fixup_map_kptr = { 1 }, .result = REJECT, - .errstr = "access beyond struct prog_test_ref_kfunc at off 32 size 8", + .errstr = "access beyond struct prog_test_ref_kfunc at off 48 size 8", }, { "map_kptr: unref: inherit PTR_UNTRUSTED on struct walk",
In the map_kptr selftest, the bpf_kfunc_call_test_kptr_get() kfunc is used to verify and illustrate a typical use case of kptrs wherein an additional reference is taken on a referenced kptr that is already stored in a map. This would be useful for programs that, for example, want to pass the referenced kptr to a kfunc without removing it from the map. Unfortunately, the implementation of bpf_kfunc_call_test_kptr_get() isn't representative of a real kfunc that needs to guard against possible refcounting races by BPF program callers. bpf_kfunc_call_test_kptr_get() does a READ_ONCE() on the struct prog_test_ref_kfunc **pp passed by the user and then calls refcount_inc() if the pointer is nonzero, but this can race with another callback in the same program that removes the kptr from the map and frees it: 1. A BPF program with a referenced kptr in a map passes the kptr to bpf_kfunc_call_test_kptr_get() as: p = bpf_kfunc_call_test_kptr_get(&v->ref_ptr, 0, 0); from CPU 0. 2. bpf_kfunc_call_test_kptr_get() does READ_ONCE(), and sees that the struct prog_test_ref_kfunc **pp contains a non-NULL pointer. 3. Another BPF handler on CPU 1 then invokes bpf_kptr_xchg() to remove the kptr from the map, and frees it with a call to bpf_kfunc_call_test_release(). This drops the final refcount on the kptr. 4. CPU 0 then issues refcount_inc() on the kptr with refcount 0, causing a use-after-free. In the map_kptr selftest, this doesn't cause a use-after-free because the structure being refcounted is statically allocated, and the refcounts aren't actually used to control the object lifecycle. In a kfunc supporting a real use case, the refcount going to 0 would likely cause the object to be freed, as it does for e.g. struct task_struct. A more realistic use-case would use something like RCU in the kfunc handler to ensure that the kptr object can be safely accessed, and then issuing a refcount_inc_not_zero() to acquire a refcount on the object. This patch updates the map_kptr selftest to do this. Signed-off-by: David Vernet <void@manifault.com> --- net/bpf/test_run.c | 31 ++++++++++++++++--- .../selftests/bpf/prog_tests/map_kptr.c | 4 +-- .../testing/selftests/bpf/verifier/map_kptr.c | 4 +-- 3 files changed, 31 insertions(+), 8 deletions(-)