Message ID | 20230509132433.2FSY_6t7@linutronix.de (mailing list archive) |
---|---|
State | RFC |
Delegated to: | BPF |
Headers | show |
Series | [RFC] bpf: Remove in_atomic() from bpf_link_put(). | expand |
On Tue, May 9, 2023 at 6:24 AM Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote: > > bpf_free_inode() is invoked as a RCU callback. Usually RCU callbacks are > invoked within softirq context. By setting rcutree.use_softirq=0 boot > option the RCU callbacks will be invoked in a per-CPU kthread with > bottom halves disabled which implies a RCU read section. > > On PREEMPT_RT the context remains fully preemptible. The RCU read > section however does not allow schedule() invocation. The latter happens > in mutex_lock() performed by bpf_trampoline_unlink_prog() originated > from bpf_link_put(). > > Remove the context checks and use the workqueue unconditionally. > > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > --- Please see [0] and corresponding revert commit. We do want bpf_link_free() to happen synchronously if it's caused by close() syscall. f00f2f7fe860 ("Revert "bpf: Fix potential call bpf_link_free() in atomic context"") [0] https://lore.kernel.org/bpf/CAEf4BzZ9zwA=SrLTx9JT50OeM6fVPg0Py0Gx+K9ah2we8YtCRA@mail.gmail.com/ > The warning can be observed as: > | BUG: sleeping function called from invalid context at kernel/locking/rtmutex_api.c:510 > | in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 47, name: rcuc/3 > | preempt_count: 0, expected: 0 > | RCU nest depth: 2, expected: 0 > | CPU: 3 PID: 47 Comm: rcuc/3 Tainted: G E v6.3-rt12 #1 > | Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.3a 01/06/2021 > | Call Trace: > | <TASK> > | dump_stack_lvl+0x43/0x60 > | __might_resched+0x137/0x190 > | mutex_lock+0x1a/0x50 > | bpf_trampoline_unlink_prog+0x1b/0x100 > | bpf_tracing_link_release+0x12/0x40 > | bpf_link_free+0x70/0x90 > | bpf_free_inode+0x3e/0x80 > | rcu_core+0x4ff/0x7c0 > | rcu_cpu_kthread+0xa9/0x2f0 > | smpboot_thread_fn+0x141/0x2c0 > | kthread+0x110/0x130 > | ret_from_fork+0x2c/0x50 > | </TASK> > > kernel/bpf/syscall.c | 8 ++------ > 1 file changed, 2 insertions(+), 6 deletions(-) > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > index 14f39c1e573ee..0adaa1bfbb0d2 100644 > --- a/kernel/bpf/syscall.c > +++ b/kernel/bpf/syscall.c > @@ -2785,12 +2785,8 @@ void bpf_link_put(struct bpf_link *link) > if (!atomic64_dec_and_test(&link->refcnt)) > return; > > - if (in_atomic()) { > - INIT_WORK(&link->work, bpf_link_put_deferred); > - schedule_work(&link->work); > - } else { > - bpf_link_free(link); > - } > + INIT_WORK(&link->work, bpf_link_put_deferred); > + schedule_work(&link->work); > } > EXPORT_SYMBOL(bpf_link_put); > > -- > 2.40.1 > >
On Wed, May 10, 2023 at 12:19 AM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > On Tue, May 9, 2023 at 6:24 AM Sebastian Andrzej Siewior > <bigeasy@linutronix.de> wrote: > > > > bpf_free_inode() is invoked as a RCU callback. Usually RCU callbacks are > > invoked within softirq context. By setting rcutree.use_softirq=0 boot > > option the RCU callbacks will be invoked in a per-CPU kthread with > > bottom halves disabled which implies a RCU read section. > > > > On PREEMPT_RT the context remains fully preemptible. The RCU read > > section however does not allow schedule() invocation. The latter happens > > in mutex_lock() performed by bpf_trampoline_unlink_prog() originated > > from bpf_link_put(). > > > > Remove the context checks and use the workqueue unconditionally. > > > > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > > --- > > Please see [0] and corresponding revert commit. We do want > bpf_link_free() to happen synchronously if it's caused by close() > syscall. > > f00f2f7fe860 ("Revert "bpf: Fix potential call bpf_link_free() in > atomic context"") > > [0] https://lore.kernel.org/bpf/CAEf4BzZ9zwA=SrLTx9JT50OeM6fVPg0Py0Gx+K9ah2we8YtCRA@mail.gmail.com/ Sebastian, Andrii is correct. We cannot do this unconditionally, but we can do it for IS_ENABLED(CONFIG_PREEMPT_RT) if it's causing issues on RT, but BPF users won't be happy with non deterministic prog detach. Do you see a different way of solving it?
On 2023-05-16 22:26:01 [-0700], Alexei Starovoitov wrote: > Sebastian, Hi Alexei, > Andrii is correct. We cannot do this unconditionally, > but we can do it for IS_ENABLED(CONFIG_PREEMPT_RT) > if it's causing issues on RT, but BPF users won't be happy > with non deterministic prog detach. > Do you see a different way of solving it? Yes. I've been distracted with other things, I get back to it. Sebastian
On Tue, May 16, 2023 at 10:26 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Wed, May 10, 2023 at 12:19 AM Andrii Nakryiko > <andrii.nakryiko@gmail.com> wrote: > > > > On Tue, May 9, 2023 at 6:24 AM Sebastian Andrzej Siewior > > <bigeasy@linutronix.de> wrote: > > > > > > bpf_free_inode() is invoked as a RCU callback. Usually RCU callbacks are > > > invoked within softirq context. By setting rcutree.use_softirq=0 boot > > > option the RCU callbacks will be invoked in a per-CPU kthread with > > > bottom halves disabled which implies a RCU read section. > > > > > > On PREEMPT_RT the context remains fully preemptible. The RCU read > > > section however does not allow schedule() invocation. The latter happens > > > in mutex_lock() performed by bpf_trampoline_unlink_prog() originated > > > from bpf_link_put(). > > > > > > Remove the context checks and use the workqueue unconditionally. > > > > > > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > > > --- > > > > Please see [0] and corresponding revert commit. We do want > > bpf_link_free() to happen synchronously if it's caused by close() > > syscall. > > > > f00f2f7fe860 ("Revert "bpf: Fix potential call bpf_link_free() in > > atomic context"") > > > > [0] https://lore.kernel.org/bpf/CAEf4BzZ9zwA=SrLTx9JT50OeM6fVPg0Py0Gx+K9ah2we8YtCRA@mail.gmail.com/ > > Sebastian, > > Andrii is correct. We cannot do this unconditionally, > but we can do it for IS_ENABLED(CONFIG_PREEMPT_RT) > if it's causing issues on RT, but BPF users won't be happy > with non deterministic prog detach. > Do you see a different way of solving it? Is struct file_operations.release() callback guaranteed to be called from user context? If yes, then perhaps the most straightforward way to guarantee synchronous bpf_link cleanup on close(link_fd) is to have a bpf_link_put() variant that will be only called from user-context and will just do bpf_link_free(link) directly, without checking in_atomic().
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 14f39c1e573ee..0adaa1bfbb0d2 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -2785,12 +2785,8 @@ void bpf_link_put(struct bpf_link *link) if (!atomic64_dec_and_test(&link->refcnt)) return; - if (in_atomic()) { - INIT_WORK(&link->work, bpf_link_put_deferred); - schedule_work(&link->work); - } else { - bpf_link_free(link); - } + INIT_WORK(&link->work, bpf_link_put_deferred); + schedule_work(&link->work); } EXPORT_SYMBOL(bpf_link_put);
bpf_free_inode() is invoked as a RCU callback. Usually RCU callbacks are invoked within softirq context. By setting rcutree.use_softirq=0 boot option the RCU callbacks will be invoked in a per-CPU kthread with bottom halves disabled which implies a RCU read section. On PREEMPT_RT the context remains fully preemptible. The RCU read section however does not allow schedule() invocation. The latter happens in mutex_lock() performed by bpf_trampoline_unlink_prog() originated from bpf_link_put(). Remove the context checks and use the workqueue unconditionally. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- The warning can be observed as: | BUG: sleeping function called from invalid context at kernel/locking/rtmutex_api.c:510 | in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 47, name: rcuc/3 | preempt_count: 0, expected: 0 | RCU nest depth: 2, expected: 0 | CPU: 3 PID: 47 Comm: rcuc/3 Tainted: G E v6.3-rt12 #1 | Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.3a 01/06/2021 | Call Trace: | <TASK> | dump_stack_lvl+0x43/0x60 | __might_resched+0x137/0x190 | mutex_lock+0x1a/0x50 | bpf_trampoline_unlink_prog+0x1b/0x100 | bpf_tracing_link_release+0x12/0x40 | bpf_link_free+0x70/0x90 | bpf_free_inode+0x3e/0x80 | rcu_core+0x4ff/0x7c0 | rcu_cpu_kthread+0xa9/0x2f0 | smpboot_thread_fn+0x141/0x2c0 | kthread+0x110/0x130 | ret_from_fork+0x2c/0x50 | </TASK> kernel/bpf/syscall.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-)