Message ID: 20221106015152.2556188-2-memxor@gmail.com (mailing list archive)
State: RFC
Delegated to: BPF
Series: Fix deadlock with bpf_timer in fentry/fexit progs
On Sat, Nov 5, 2022 at 6:52 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> Currently, unlike other tracing program types, BPF_PROG_TYPE_TRACING is
> excluded from is_tracing_prog_type checks. This means that such programs
> can use maps containing bpf_spin_lock, bpf_timer, etc. without
> verification failure.
>
> However, allowing fentry/fexit programs to use maps that have a bpf_timer
> in the map value can lead to deadlock.
>
> Suppose that an fentry program is attached to bpf_prog_put, and a TC
> program executes and does bpf_map_update_elem on an array map that both
> progs share. If the fentry program calls bpf_map_update_elem for the
> same key, it will lead to acquiring the same lock from within the
> critical section protecting the timer.
>
> The call chain is:
>
> bpf_prog_test_run_opts() // TC
>   bpf_prog_TC
>     bpf_map_update_elem(array_map, key=0)
>       bpf_obj_free_fields
>         bpf_timer_cancel_and_free
>           __bpf_spin_lock_irqsave
>           drop_prog_refcnt
>             bpf_prog_put
>               bpf_prog_FENTRY // FENTRY
>                 bpf_map_update_elem(array_map, key=0)
>                   bpf_obj_free_fields
>                     bpf_timer_cancel_and_free
>                       __bpf_spin_lock_irqsave // DEADLOCK
>
> The BPF_TRACE_ITER attach type can be excluded because it always executes
> in process context.
>
> Update selftests using bpf_timer in fentry to TC, as they will be broken
> by this change.

which is an obvious red flag and the reason why we cannot do this change.

This specific issue could be addressed with the addition of notrace to
drop_prog_refcnt, bpf_prog_put, and __bpf_prog_put. Other calls from
__bpf_prog_put can stay as-is, since they won't be called in this
convoluted case.

I frankly don't get why you're spending time digging up such odd corner
cases that no one can hit in real use. There are probably other equally
weird corner cases, and sooner or later we will just declare them
'wont-fix'. Not kidding. Please channel your energy into something that
helps. Positive patches are more pleasant to review.
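For reference, the notrace annotation suggested above would look roughly like
the sketch below. This is illustrative only, not a posted patch: the body of
drop_prog_refcnt is reproduced from memory of the kernel tree of that era and
may differ in detail, and bpf_prog_put/__bpf_prog_put in kernel/bpf/syscall.c
would receive the same annotation.

	/* notrace makes the function invisible to ftrace, so an fentry/fexit
	 * prog can no longer attach to it and re-enter the timer's lock.
	 */
	static void notrace drop_prog_refcnt(struct bpf_hrtimer *t)
	{
		struct bpf_prog *prog = t->prog;

		if (prog) {
			bpf_prog_put(prog);
			t->prog = NULL;
			rcu_assign_pointer(t->callback_fn, NULL);
		}
	}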
On Mon, Nov 07, 2022 at 02:50:08AM IST, Alexei Starovoitov wrote:
> On Sat, Nov 5, 2022 at 6:52 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> >
> > [...]
> >
> > Update selftests using bpf_timer in fentry to TC, as they will be broken
> > by this change.
>
> which is an obvious red flag and the reason why we cannot do
> this change.
> This specific issue could be addressed with the addition of
> notrace to drop_prog_refcnt, bpf_prog_put, and __bpf_prog_put.
> Other calls from __bpf_prog_put can stay as-is,
> since they won't be called in this convoluted case.
> I frankly don't get why you're spending time digging up such
> odd corner cases that no one can hit in real use.

I was trying to figure out whether bpf_list_head_free would always be safe to
call from bpf_obj_free_fields during map updates, since it takes the very same
spin lock that a BPF program can also take to update the list.

Map update ops are not allowed in the critical section, so this particular kind
of recursive map update call should not be possible. perf event is already
prevented using is_tracing_prog_type, so an NMI prog cannot interrupt and
update the same map.

But then I went looking into whether it was a problem elsewhere...

FWIW I have updated my patch to do:

	if (btf_record_has_field(map->record, BPF_LIST_HEAD)) {
		if (is_tracing_prog_type(prog_type) ||
		    (prog_type == BPF_PROG_TYPE_TRACING &&
		     env->prog->expected_attach_type != BPF_TRACE_ITER)) {
			verbose(env, "tracing progs cannot use bpf_list_head yet\n");
			return -EINVAL;
		}
	}

v5 coming soon.

> There are probably other equally weird corner cases, and sooner
> or later we will just declare them 'wont-fix'. Not kidding.

Understood.

> Please channel your energy into something that helps.
> Positive patches are more pleasant to review.

Understood.
On Sun, Nov 6, 2022 at 1:44 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> [...]
>
> FWIW I have updated my patch to do:
>
> 	if (btf_record_has_field(map->record, BPF_LIST_HEAD)) {
> 		if (is_tracing_prog_type(prog_type) ||
> 		    (prog_type == BPF_PROG_TYPE_TRACING &&
> 		     env->prog->expected_attach_type != BPF_TRACE_ITER)) {
> 			verbose(env, "tracing progs cannot use bpf_list_head yet\n");
> 			return -EINVAL;
> 		}
> 	}
>
> v5 coming soon.

That is a severe limitation.
Why can't you use the notrace approach?
On Mon, Nov 07, 2022 at 06:01:44AM IST, Alexei Starovoitov wrote:
> On Sun, Nov 6, 2022 at 1:44 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> >
> > [...]
> >
> > FWIW I have updated my patch to do:
> >
> > 	if (btf_record_has_field(map->record, BPF_LIST_HEAD)) {
> > 		if (is_tracing_prog_type(prog_type) ||
> > 		    (prog_type == BPF_PROG_TYPE_TRACING &&
> > 		     env->prog->expected_attach_type != BPF_TRACE_ITER)) {
> > 			verbose(env, "tracing progs cannot use bpf_list_head yet\n");
> > 			return -EINVAL;
> > 		}
> > 	}
>
> That is a severe limitation.
> Why can't you use the notrace approach?

Yes, notrace is indeed an option, but the problem is that everything within
that critical section needs to be notrace. bpf_list_head_free also ends up
calling bpf_obj_free_fields again (the level of recursion, however, won't
exceed 3, since we clamp list_head -> list_head to 2 levels).

So notrace needs to be applied to everything within it, which is not a problem
now. It can be done. BPF_TIMER and BPF_KPTR_REF (the indirect call to the dtor)
won't be triggered, so it probably just needs to be bpf_list_head_free and
bpf_obj_free_fields.

But it can break silently in the future, if e.g. kptr is allowed. The same goes
for drop_prog_refcnt if something changes. Every change to anything they call
(and to the functions those call) needs to keep the restriction in mind.

I was wondering if, in both the bpf_timer and bpf_list_head cases, we can
simply swap out the object locally while holding the lock, and then do
everything outside the spin lock.

For bpf_timer, it would mean moving drop_prog_refcnt outside the spin lock
critical section. hrtimer_cancel is already done after the unlock. For
bpf_list_head, it would mean swapping out the list_head and then draining it
outside the lock.

Then we hopefully don't need to use notrace, and it wouldn't be possible for
any tracing prog to execute while we hold the bpf_spin_lock (unless I missed
something).
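The swap-out idea above, sketched with generic kernel list primitives. This is
a hypothetical illustration of the shape of the approach only: the real
bpf_list_head is not a struct list_head and is protected by a bpf_spin_lock,
and 'struct foo' / 'drain_outside_lock' are made-up names.

	struct foo {				/* hypothetical element type */
		struct list_head node;
	};

	static void drain_outside_lock(struct list_head *head, spinlock_t *lock)
	{
		LIST_HEAD(drain);		/* local, initially empty list */
		struct list_head *pos, *n;
		unsigned long flags;

		spin_lock_irqsave(lock, flags);
		list_splice_init(head, &drain);	/* detach all nodes; head is now empty */
		spin_unlock_irqrestore(lock, flags);

		/* Freeing happens outside the critical section, so a tracing
		 * prog attached to anything called from here cannot deadlock
		 * on 'lock'.
		 */
		list_for_each_safe(pos, n, &drain) {
			list_del(pos);
			kfree(container_of(pos, struct foo, node));
		}
	}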
On Sun, Nov 6, 2022 at 5:48 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> On Mon, Nov 07, 2022 at 06:01:44AM IST, Alexei Starovoitov wrote:
> >
> > [...]
> >
> > That is a severe limitation.
> > Why can't you use the notrace approach?
>
> Yes, notrace is indeed an option, but the problem is that everything within
> that critical section needs to be notrace. bpf_list_head_free also ends up
> calling bpf_obj_free_fields again (the level of recursion, however, won't
> exceed 3, since we clamp list_head -> list_head to 2 levels).
>
> So notrace needs to be applied to everything within it, which is not a
> problem now. It can be done.

let's do it then.

> BPF_TIMER and BPF_KPTR_REF (the indirect call to the dtor) won't be
> triggered, so it probably just needs to be bpf_list_head_free and
> bpf_obj_free_fields.
>
> But it can break silently in the future, if e.g. kptr is allowed. The same
> goes for drop_prog_refcnt if something changes. Every change to anything
> they call (and to the functions those call) needs to keep the restriction
> in mind.

Only funcs that can realistically be called.
Not every function that is in there.

> I was wondering if, in both the bpf_timer and bpf_list_head cases, we can
> simply swap out the object locally while holding the lock, and then do
> everything outside the spin lock.

Maybe, but I wouldn't bother optimizing for convoluted cases.
Use notrace when you can.
Everything that bpf_mem_alloc is doing is notrace for the same reason.

> For bpf_timer, it would mean moving drop_prog_refcnt outside the spin lock
> critical section. hrtimer_cancel is already done after the unlock. For
> bpf_list_head, it would mean swapping out the list_head and then draining
> it outside the lock.

That also works.
drop_prog_refcnt() can be moved after unlock.
Don't see any race.

> Then we hopefully don't need to use notrace, and it wouldn't be possible
> for any tracing prog to execute while we hold the bpf_spin_lock (unless I
> missed something).

yep. spin_lock, linked list, obj_new won't be allowed in
is_tracing_prog_type(). Only talking about prog_type_tracing, since those are
a lot more than tracing. Everything new falls into this category. We might
even create an alias like prog_type_generic to highlight that.
On Sun, Nov 6, 2022 at 6:34 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> > For bpf_timer, it would mean moving drop_prog_refcnt outside the spin lock
> > critical section. hrtimer_cancel is already done after the unlock. For
> > bpf_list_head, it would mean swapping out the list_head and then draining
> > it outside the lock.
>
> That also works.
> drop_prog_refcnt() can be moved after unlock.
> Don't see any race.

I mean not the whole function, obviously.
Instead of

	static void drop_prog_refcnt(struct bpf_hrtimer *t)

it can become

	static struct bpf_prog *drop_prog_refcnt(struct bpf_hrtimer *t)

t->prog and callback_fn should only be manipulated under the lock.
bpf_prog_put itself can happen after unlock.
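Following that suggestion, the cancel path might be restructured along these
lines (a hedged sketch based on the bpf_timer implementation of that era, not
the patch that was eventually applied; error handling and comments in the real
function are more involved):

	static struct bpf_prog *drop_prog_refcnt(struct bpf_hrtimer *t)
	{
		struct bpf_prog *prog = t->prog;

		if (prog) {
			/* Clear under the lock... */
			t->prog = NULL;
			rcu_assign_pointer(t->callback_fn, NULL);
		}
		return prog;
	}

	void bpf_timer_cancel_and_free(void *val)
	{
		struct bpf_timer_kern *timer = val;
		struct bpf_prog *prog = NULL;
		struct bpf_hrtimer *t;

		__bpf_spin_lock_irqsave(&timer->lock);
		t = timer->timer;
		if (t)
			prog = drop_prog_refcnt(t);
		timer->timer = NULL;
		__bpf_spin_unlock_irqrestore(&timer->lock);

		/* ...and put the prog ref outside the critical section, so an
		 * fentry prog attached to bpf_prog_put() can no longer
		 * deadlock on timer->lock.
		 */
		if (prog)
			bpf_prog_put(prog);
		if (!t)
			return;
		hrtimer_cancel(&t->timer);
		kfree(t);
	}

The key property is that nothing traceable runs between the lock and unlock.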
On Mon, Nov 07, 2022 at 09:15:26AM IST, Alexei Starovoitov wrote:
>
> [...]
>
> I mean not the whole function, obviously.
> Instead of
>
> 	static void drop_prog_refcnt(struct bpf_hrtimer *t)
>
> it can become
>
> 	static struct bpf_prog *drop_prog_refcnt(struct bpf_hrtimer *t)
>
> t->prog and callback_fn should only be manipulated under the lock.
> bpf_prog_put itself can happen after unlock.

Right, both t->prog and t->callback_fn need to be set to NULL under the lock.

I will send out the bpf_timer change separately. For now, I have moved the
list draining out of the lock in my series, removed the check on
BPF_PROG_TYPE_TRACING, and posted it.
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d3b75aa0c54d..6e948c695e84 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12784,7 +12784,7 @@ static int check_pseudo_btf_id(struct bpf_verifier_env *env,
 	return err;
 }
 
-static bool is_tracing_prog_type(enum bpf_prog_type type)
+static bool is_tracing_prog_type(enum bpf_prog_type type, enum bpf_attach_type eatype)
 {
 	switch (type) {
 	case BPF_PROG_TYPE_KPROBE:
@@ -12792,6 +12792,9 @@ static bool is_tracing_prog_type(enum bpf_prog_type type)
 	case BPF_PROG_TYPE_PERF_EVENT:
 	case BPF_PROG_TYPE_RAW_TRACEPOINT:
 	case BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE:
+	case BPF_PROG_TYPE_TRACING:
+		if (eatype == BPF_TRACE_ITER)
+			return false;
 		return true;
 	default:
 		return false;
@@ -12803,6 +12806,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 					struct bpf_prog *prog)
 
 {
+	enum bpf_attach_type eatype = env->prog->expected_attach_type;
 	enum bpf_prog_type prog_type = resolve_prog_type(prog);
 
 	if (btf_record_has_field(map->record, BPF_SPIN_LOCK)) {
@@ -12811,7 +12815,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 			return -EINVAL;
 		}
 
-		if (is_tracing_prog_type(prog_type)) {
+		if (is_tracing_prog_type(prog_type, eatype)) {
 			verbose(env, "tracing progs cannot use bpf_spin_lock yet\n");
 			return -EINVAL;
 		}
@@ -12823,7 +12827,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 	}
 
 	if (btf_record_has_field(map->record, BPF_TIMER)) {
-		if (is_tracing_prog_type(prog_type)) {
+		if (is_tracing_prog_type(prog_type, eatype)) {
 			verbose(env, "tracing progs cannot use bpf_timer yet\n");
 			return -EINVAL;
 		}
diff --git a/tools/testing/selftests/bpf/prog_tests/timer.c b/tools/testing/selftests/bpf/prog_tests/timer.c
index 7eb049214859..c0c39da12250 100644
--- a/tools/testing/selftests/bpf/prog_tests/timer.c
+++ b/tools/testing/selftests/bpf/prog_tests/timer.c
@@ -1,25 +1,30 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright (c) 2021 Facebook */
 #include <test_progs.h>
+#include <network_helpers.h>
 #include "timer.skel.h"
 
 static int timer(struct timer *timer_skel)
 {
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		    .data_in = &pkt_v4,
+		    .data_size_in = sizeof(pkt_v4),
+		    .repeat = 1,
+	);
 	int err, prog_fd;
-	LIBBPF_OPTS(bpf_test_run_opts, topts);
-
-	err = timer__attach(timer_skel);
-	if (!ASSERT_OK(err, "timer_attach"))
-		return err;
 
 	ASSERT_EQ(timer_skel->data->callback_check, 52, "callback_check1");
 	ASSERT_EQ(timer_skel->data->callback2_check, 52, "callback2_check1");
 
 	prog_fd = bpf_program__fd(timer_skel->progs.test1);
 	err = bpf_prog_test_run_opts(prog_fd, &topts);
-	ASSERT_OK(err, "test_run");
-	ASSERT_EQ(topts.retval, 0, "test_run");
-	timer__detach(timer_skel);
+	ASSERT_OK(err, "test_run test1");
+	ASSERT_EQ(topts.retval, 0, "test_run retval test1");
+
+	prog_fd = bpf_program__fd(timer_skel->progs.test2);
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+	ASSERT_OK(err, "test_run test2");
+	ASSERT_EQ(topts.retval, 0, "test_run retval test2");
 
 	usleep(50); /* 10 usecs should be enough, but give it extra */
 	/* check that timer_cb1() was executed 10+10 times */
diff --git a/tools/testing/selftests/bpf/prog_tests/timer_crash.c b/tools/testing/selftests/bpf/prog_tests/timer_crash.c
index f74b82305da8..91f2333b92aa 100644
--- a/tools/testing/selftests/bpf/prog_tests/timer_crash.c
+++ b/tools/testing/selftests/bpf/prog_tests/timer_crash.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <test_progs.h>
+#include <network_helpers.h>
 #include "timer_crash.skel.h"
 
 enum {
@@ -9,6 +10,11 @@ enum {
 
 static void test_timer_crash_mode(int mode)
 {
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		    .data_in = &pkt_v4,
+		    .data_size_in = sizeof(pkt_v4),
+		    .repeat = 1,
+	);
 	struct timer_crash *skel;
 
 	skel = timer_crash__open_and_load();
@@ -18,7 +24,8 @@ static void test_timer_crash_mode(int mode)
 	skel->bss->crash_map = mode;
 	if (!ASSERT_OK(timer_crash__attach(skel), "timer_crash__attach"))
 		goto end;
-	usleep(1);
+	ASSERT_OK(bpf_prog_test_run_opts(bpf_program__fd(skel->progs.timer), &topts), "test_run");
+	ASSERT_EQ(topts.retval, 0, "test_run retval");
 end:
 	timer_crash__destroy(skel);
 }
diff --git a/tools/testing/selftests/bpf/progs/timer.c b/tools/testing/selftests/bpf/progs/timer.c
index acda5c9cea93..492f62917898 100644
--- a/tools/testing/selftests/bpf/progs/timer.c
+++ b/tools/testing/selftests/bpf/progs/timer.c
@@ -119,8 +119,8 @@ static int timer_cb1(void *map, int *key, struct bpf_timer *timer)
 	return 0;
 }
 
-SEC("fentry/bpf_fentry_test1")
-int BPF_PROG2(test1, int, a)
+SEC("tc")
+int test1(void *ctx)
 {
 	struct bpf_timer *arr_timer, *lru_timer;
 	struct elem init = {};
@@ -235,8 +235,8 @@ int bpf_timer_test(void)
 	return 0;
 }
 
-SEC("fentry/bpf_fentry_test2")
-int BPF_PROG2(test2, int, a, int, b)
+SEC("tc")
+int test2(void *ctx)
 {
 	struct hmap_elem init = {}, *val;
 	int key = HTAB, key_malloc = HTAB_MALLOC;
diff --git a/tools/testing/selftests/bpf/progs/timer_crash.c b/tools/testing/selftests/bpf/progs/timer_crash.c
index f8f7944e70da..971eb93f485c 100644
--- a/tools/testing/selftests/bpf/progs/timer_crash.c
+++ b/tools/testing/selftests/bpf/progs/timer_crash.c
@@ -26,8 +26,8 @@ struct {
 int pid = 0;
 int crash_map = 0; /* 0 for amap, 1 for hmap */
 
-SEC("fentry/do_nanosleep")
-int sys_enter(void *ctx)
+SEC("tc")
+int timer(void *ctx)
 {
 	struct map_elem *e, value = {};
 	void *map = crash_map ? (void *)&hmap : (void *)&amap;
Currently, unlike other tracing program types, BPF_PROG_TYPE_TRACING is
excluded from is_tracing_prog_type checks. This means that such programs
can use maps containing bpf_spin_lock, bpf_timer, etc. without
verification failure.

However, allowing fentry/fexit programs to use maps that have a bpf_timer
in the map value can lead to deadlock.

Suppose that an fentry program is attached to bpf_prog_put, and a TC
program executes and does bpf_map_update_elem on an array map that both
progs share. If the fentry program calls bpf_map_update_elem for the
same key, it will lead to acquiring the same lock from within the
critical section protecting the timer.

The call chain is:

bpf_prog_test_run_opts() // TC
  bpf_prog_TC
    bpf_map_update_elem(array_map, key=0)
      bpf_obj_free_fields
        bpf_timer_cancel_and_free
          __bpf_spin_lock_irqsave
          drop_prog_refcnt
            bpf_prog_put
              bpf_prog_FENTRY // FENTRY
                bpf_map_update_elem(array_map, key=0)
                  bpf_obj_free_fields
                    bpf_timer_cancel_and_free
                      __bpf_spin_lock_irqsave // DEADLOCK

The BPF_TRACE_ITER attach type can be excluded because it always executes
in process context.

Update selftests using bpf_timer in fentry to TC, as they will be broken
by this change.

Fixes: 5e0bc3082e2e ("bpf: Forbid bpf_ktime_get_coarse_ns and bpf_timer_* in tracing progs")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/verifier.c                          | 10 ++++++---
 .../testing/selftests/bpf/prog_tests/timer.c   | 21 ++++++++++++-------
 .../selftests/bpf/prog_tests/timer_crash.c     |  9 +++++++-
 tools/testing/selftests/bpf/progs/timer.c      |  8 +++----
 .../testing/selftests/bpf/progs/timer_crash.c  |  4 ++--
 5 files changed, 34 insertions(+), 18 deletions(-)