
bpf: defer printk() inside __bpf_prog_run()

Message ID 345098dc-8cb4-4808-98cf-fa9ab3af4fc4@I-love.SAKURA.ne.jp (mailing list archive)
State Changes Requested
Delegated to: BPF
Series bpf: defer printk() inside __bpf_prog_run()

Checks

Context Check Description
netdev/tree_selection success Not a local patch, async
bpf/vmtest-bpf-next-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-next-VM_Test-2 success Logs for Unittests
bpf/vmtest-bpf-next-VM_Test-3 success Logs for Validate matrix.py
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-5 success Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-7 success Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-9 success Logs for s390x-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-10 success Logs for s390x-gcc / test
bpf/vmtest-bpf-next-VM_Test-4 fail Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-11 success Logs for s390x-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-8 fail Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for aarch64-gcc / test
bpf/vmtest-bpf-next-VM_Test-13 fail Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18 fail Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17-O2
bpf/vmtest-bpf-next-VM_Test-20 success Logs for x86_64-llvm-17 / veristat
bpf/vmtest-bpf-next-VM_Test-12 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-14 success Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-15 success Logs for x86_64-gcc / test
bpf/vmtest-bpf-next-VM_Test-17 fail Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-16 success Logs for x86_64-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-19 success Logs for x86_64-llvm-17 / test
bpf/vmtest-bpf-next-VM_Test-22 fail Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18-O2
bpf/vmtest-bpf-next-VM_Test-23 success Logs for x86_64-llvm-18 / test
bpf/vmtest-bpf-next-VM_Test-21 fail Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-24 success Logs for x86_64-llvm-18 / veristat

Commit Message

Tetsuo Handa June 25, 2024, 1:56 p.m. UTC
syzbot is reporting a circular locking dependency inside __bpf_prog_run(),
because fault injection calls printk() while the rq lock is already held.

Guard __bpf_prog_run() using printk_deferred_{enter,exit}() (and
preempt_{disable,enable}() if CONFIG_PREEMPT_RT=n) in order to defer any
printk() messages. (migrate_{disable,enable}() is not needed if
CONFIG_PREEMPT_RT=y because cant_migrate() asserts that the caller has
already disabled migration.)

======================================================
WARNING: possible circular locking dependency detected
6.10.0-rc4-syzkaller-00874-g84562f9953ec #0 Not tainted
------------------------------------------------------
syz-executor.1/25480 is trying to acquire lock:
ffffffff8e328140 (console_owner){..-.}-{0:0}, at: rcu_try_lock_acquire include/linux/rcupdate.h:334 [inline]
ffffffff8e328140 (console_owner){..-.}-{0:0}, at: srcu_read_lock_nmisafe include/linux/srcu.h:232 [inline]
ffffffff8e328140 (console_owner){..-.}-{0:0}, at: console_srcu_read_lock kernel/printk/printk.c:286 [inline]
ffffffff8e328140 (console_owner){..-.}-{0:0}, at: console_flush_all+0x152/0xfd0 kernel/printk/printk.c:2986

but task is already holding lock:
ffff8880b943e798 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x2a/0x140 kernel/sched/core.c:559

which lock already depends on the new lock.

(...snipped...)

Chain exists of:
  console_owner --> &p->pi_lock --> &rq->__lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&rq->__lock);
                               lock(&p->pi_lock);
                               lock(&rq->__lock);
  lock(console_owner);

 *** DEADLOCK ***

6 locks held by syz-executor.1/25480:
 #0: ffffffff8f5e6f48 (rtnl_mutex){+.+.}-{3:3}, at: dev_ioctl+0x706/0x1340 net/core/dev_ioctl.c:785
 #1: ffffffff8f67dd68 (flowtable_lock){+.+.}-{3:3}, at: nf_flow_table_cleanup+0x23/0xb0 net/netfilter/nf_flow_table_core.c:593
 #2: ffff8880b943e798 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x2a/0x140 kernel/sched/core.c:559
 #3: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline]
 #3: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:781 [inline]
 #3: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: __bpf_trace_run kernel/trace/bpf_trace.c:2402 [inline]
 #3: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: bpf_trace_run4+0x244/0x590 kernel/trace/bpf_trace.c:2446
 #4: ffffffff8e20fa60 (console_lock){+.+.}-{0:0}, at: _printk+0xd5/0x120 kernel/printk/printk.c:2370
 #5: ffffffff8e20f690 (console_srcu){....}-{0:0}, at: rcu_try_lock_acquire include/linux/rcupdate.h:334 [inline]
 #5: ffffffff8e20f690 (console_srcu){....}-{0:0}, at: srcu_read_lock_nmisafe include/linux/srcu.h:232 [inline]
 #5: ffffffff8e20f690 (console_srcu){....}-{0:0}, at: console_srcu_read_lock kernel/printk/printk.c:286 [inline]
 #5: ffffffff8e20f690 (console_srcu){....}-{0:0}, at: console_flush_all+0x152/0xfd0 kernel/printk/printk.c:2986

stack backtrace:
CPU: 0 PID: 25480 Comm: syz-executor.1 Not tainted 6.10.0-rc4-syzkaller-00874-g84562f9953ec #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
 check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2187
 check_prev_add kernel/locking/lockdep.c:3134 [inline]
 check_prevs_add kernel/locking/lockdep.c:3253 [inline]
 validate_chain+0x18e0/0x5900 kernel/locking/lockdep.c:3869
 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
 console_lock_spinning_enable kernel/printk/printk.c:1870 [inline]
 console_emit_next_record kernel/printk/printk.c:2922 [inline]
 console_flush_all+0x810/0xfd0 kernel/printk/printk.c:2994
 console_unlock+0x13b/0x4d0 kernel/printk/printk.c:3063
 vprintk_emit+0x5a6/0x770 kernel/printk/printk.c:2345
 _printk+0xd5/0x120 kernel/printk/printk.c:2370
 fail_dump lib/fault-inject.c:45 [inline]
 should_fail_ex+0x391/0x4e0 lib/fault-inject.c:153
 __copy_to_user_inatomic include/linux/uaccess.h:123 [inline]
 copy_to_user_nofault+0x86/0x140 mm/maccess.c:149
 bpf_prog_b0a3dac844962ed2+0x47/0x4d
 bpf_dispatcher_nop_func include/linux/bpf.h:1243 [inline]
 __bpf_prog_run include/linux/filter.h:691 [inline]
 bpf_prog_run include/linux/filter.h:698 [inline]
 __bpf_trace_run kernel/trace/bpf_trace.c:2403 [inline]
 bpf_trace_run4+0x334/0x590 kernel/trace/bpf_trace.c:2446
 __traceiter_sched_switch+0x98/0xd0 include/trace/events/sched.h:222
 trace_sched_switch include/trace/events/sched.h:222 [inline]
 __schedule+0x2587/0x4a20 kernel/sched/core.c:6742
 preempt_schedule_notrace+0x100/0x140 kernel/sched/core.c:7017
 preempt_schedule_notrace_thunk+0x1a/0x30 arch/x86/entry/thunk.S:13
 rcu_is_watching+0x7e/0xb0 kernel/rcu/tree.c:725
 trace_lock_acquire include/trace/events/lock.h:24 [inline]
 lock_acquire+0xe3/0x550 kernel/locking/lockdep.c:5725
 rcu_lock_acquire include/linux/rcupdate.h:329 [inline]
 rcu_read_lock include/linux/rcupdate.h:781 [inline]
 start_flush_work kernel/workqueue.c:4122 [inline]
 __flush_work+0x107/0xd00 kernel/workqueue.c:4181
 flush_work kernel/workqueue.c:4232 [inline]
 flush_delayed_work+0x169/0x1c0 kernel/workqueue.c:4254
 nf_flow_table_gc_cleanup net/netfilter/nf_flow_table_core.c:585 [inline]
 nf_flow_table_cleanup+0x62/0xb0 net/netfilter/nf_flow_table_core.c:595
 flow_offload_netdev_event+0x51/0x70 net/netfilter/nft_flow_offload.c:492
 notifier_call_chain+0x19f/0x3e0 kernel/notifier.c:93
 __dev_notify_flags+0x207/0x400
 dev_change_flags+0xf0/0x1a0 net/core/dev.c:8858
 dev_ifsioc+0x7c8/0xe70 net/core/dev_ioctl.c:529
 dev_ioctl+0x719/0x1340 net/core/dev_ioctl.c:786
 sock_do_ioctl+0x240/0x460 net/socket.c:1236
 sock_ioctl+0x629/0x8e0 net/socket.c:1341
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:907 [inline]
 __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:893
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Reported-by: syzbot <syzbot+f78380e4eae53c64125c@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=f78380e4eae53c64125c
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
Only compile tested.

 include/linux/filter.h | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

John Ogness June 25, 2024, 2:17 p.m. UTC | #1
On 2024-06-25, Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> syzbot is reporting circular locking dependency inside __bpf_prog_run(),
> for fault injection calls printk() despite rq lock is already held.
>
> Guard __bpf_prog_run() using printk_deferred_{enter,exit}() (and
> preempt_{disable,enable}() if CONFIG_PREEMPT_RT=n) in order to defer any
> printk() messages.

What is the reason for disabling preemption?

John Ogness
Tetsuo Handa June 25, 2024, 3:07 p.m. UTC | #2
On 2024/06/25 23:17, John Ogness wrote:
> On 2024-06-25, Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>> syzbot is reporting circular locking dependency inside __bpf_prog_run(),
>> for fault injection calls printk() despite rq lock is already held.
>>
>> Guard __bpf_prog_run() using printk_deferred_{enter,exit}() (and
>> preempt_{disable,enable}() if CONFIG_PREEMPT_RT=n) in order to defer any
>> printk() messages.
> 
> Why is the reason for disabling preemption?

Because kernel/printk/printk_safe.c uses a percpu counter for deferring
printk(), printk_safe_enter() and printk_safe_exit() have to be called from
the same CPU. Calling preempt_disable() before printk_safe_enter() and
preempt_enable() after printk_safe_exit() guarantees that printk_safe_enter()
and printk_safe_exit() are called from the same CPU.
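
To illustrate, here is a minimal sketch of the intended pairing (not the
posted patch; the body is just a placeholder):

	/*
	 * printk_deferred_enter()/printk_deferred_exit() adjust a percpu
	 * counter, so both must run on the same CPU; disabling preemption
	 * around them guarantees exactly that.
	 */
	preempt_disable();
	printk_deferred_enter();
	/* ... code that may end up calling printk() with the rq lock held ... */
	printk_deferred_exit();
	preempt_enable();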
John Ogness June 25, 2024, 3:47 p.m. UTC | #3
On 2024-06-26, Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> On 2024/06/25 23:17, John Ogness wrote:
>> On 2024-06-25, Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>>> syzbot is reporting circular locking dependency inside __bpf_prog_run(),
>>> for fault injection calls printk() despite rq lock is already held.
>>>
>>> Guard __bpf_prog_run() using printk_deferred_{enter,exit}() (and
>>> preempt_{disable,enable}() if CONFIG_PREEMPT_RT=n) in order to defer any
>>> printk() messages.
>> 
>> Why is the reason for disabling preemption?
>
> Because since kernel/printk/printk_safe.c uses a percpu counter for deferring
> printk(), printk_safe_enter() and printk_safe_exit() have to be called from
> the same CPU. preempt_disable() before printk_safe_enter() and preempt_enable()
> after printk_safe_exit() guarantees that printk_safe_enter() and
> printk_safe_exit() are called from the same CPU.

Yes, but we already have cant_migrate(). Are you suggesting there are
configurations where cant_migrate() is true but the context can be
migrated anyway?

John
Tetsuo Handa June 25, 2024, 4:05 p.m. UTC | #4
On 2024/06/26 0:47, John Ogness wrote:
> On 2024-06-26, Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>> On 2024/06/25 23:17, John Ogness wrote:
>>> On 2024-06-25, Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>>>> syzbot is reporting circular locking dependency inside __bpf_prog_run(),
>>>> for fault injection calls printk() despite rq lock is already held.
>>>>
>>>> Guard __bpf_prog_run() using printk_deferred_{enter,exit}() (and
>>>> preempt_{disable,enable}() if CONFIG_PREEMPT_RT=n) in order to defer any
>>>> printk() messages.
>>>
>>> Why is the reason for disabling preemption?
>>
>> Because since kernel/printk/printk_safe.c uses a percpu counter for deferring
>> printk(), printk_safe_enter() and printk_safe_exit() have to be called from
>> the same CPU. preempt_disable() before printk_safe_enter() and preempt_enable()
>> after printk_safe_exit() guarantees that printk_safe_enter() and
>> printk_safe_exit() are called from the same CPU.
> 
> Yes, but we already have cant_migrate(). Are you suggesting there are
> configurations where cant_migrate() is true but the context can be
> migrated anyway?

No, I'm not aware of such a configuration.

Does migrate_disable() imply preempt_disable()?
If yes, we don't need to also call preempt_disable().
My understanding is that migration is about "on which CPU a process runs"
and preemption is about "whether a different process runs on this CPU".
That is, disabling migration and disabling preemption are independent.

Is migrate_disable() alone sufficient for managing a percpu counter?
If yes, we don't need to also call preempt_disable() in order to manage
a percpu counter.
Alexei Starovoitov June 25, 2024, 7:32 p.m. UTC | #5
On Tue, Jun 25, 2024 at 9:05 AM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2024/06/26 0:47, John Ogness wrote:
> > On 2024-06-26, Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> >> On 2024/06/25 23:17, John Ogness wrote:
> >>> On 2024-06-25, Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> >>>> syzbot is reporting circular locking dependency inside __bpf_prog_run(),
> >>>> for fault injection calls printk() despite rq lock is already held.
> >>>>
> >>>> Guard __bpf_prog_run() using printk_deferred_{enter,exit}() (and
> >>>> preempt_{disable,enable}() if CONFIG_PREEMPT_RT=n) in order to defer any
> >>>> printk() messages.
> >>>
> >>> Why is the reason for disabling preemption?
> >>
> >> Because since kernel/printk/printk_safe.c uses a percpu counter for deferring
> >> printk(), printk_safe_enter() and printk_safe_exit() have to be called from
> >> the same CPU. preempt_disable() before printk_safe_enter() and preempt_enable()
> >> after printk_safe_exit() guarantees that printk_safe_enter() and
> >> printk_safe_exit() are called from the same CPU.
> >
> > Yes, but we already have cant_migrate(). Are you suggesting there are
> > configurations where cant_migrate() is true but the context can be
> > migrated anyway?
>
> No, I'm not aware of such configuration.
>
> Does migrate_disable() imply preempt_disable() ?
> If yes, we don't need to also call preempt_disable().
> My understanding is that migration is about "on which CPU a process runs"
> and preemption is about "whether a different process runs on this CPU".
> That is, disabling migration and disabling preemption are independent.
>
> Is migrate_disable() alone sufficient for managing a percpu counter?
> If yes, we don't need to also call preempt_disable() in order to manage
> a percpu counter.

If you want to add printk_deferred_enter(), it probably should be in
should_fail_ex(), not here.
We will not be wrapping all bpf progs this way.
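
A rough sketch of that alternative placement (hypothetical, not a posted
patch; fail_dump() is the lib/fault-inject.c helper that issues the
printk()s seen in the splat above):

	/* Inside should_fail_ex(): defer printk() only around the verbose
	 * dump, so the cost is paid only when a fault is injected and
	 * logged, regardless of which locks the caller already holds. */
	preempt_disable();		/* keep enter/exit on one CPU */
	printk_deferred_enter();
	fail_dump(attr);		/* the actual printk() caller */
	printk_deferred_exit();
	preempt_enable();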

pw-bot: cr
Tetsuo Handa June 25, 2024, 11:52 p.m. UTC | #6
On 2024/06/26 4:32, Alexei Starovoitov wrote:
>>>>> On 2024-06-25, Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>>>>>> syzbot is reporting circular locking dependency inside __bpf_prog_run(),
>>>>>> for fault injection calls printk() despite rq lock is already held.
> 
> If you want to add printk_deferred_enter() it
> probably should be in should_fail_ex(). Not here.
> We will not be wrapping all bpf progs this way.

should_fail_ex() is just one instance.
Three months ago you said "bpf never calls printk()" at
https://lkml.kernel.org/r/CAADnVQLmLMt2bF9aAB26dtBCvy2oUFt+AAKDRgTTrc7Xk_zxJQ@mail.gmail.com ,
but bpf does indirectly call printk() due to debug functionality.

We will be able to stop wrapping with printk_deferred_enter() after the printk
rework completes ( https://lkml.kernel.org/r/ZXBCB2Gv1O-1-T6f@alley ). But we
can't predict how long we need to wait for all console drivers to get converted.

Until the printk rework completes, it is the responsibility of the caller to
guard any possible printk() issued while the rq lock is already held. If you
think that only the individual functions that may call printk() (e.g.
should_fail_ex()) should be wrapped, just saying "We will not be wrapping all
bpf progs this way" does not help, because we would then have to scatter the
migrate_{disable,enable}() overhead as well as printk_deferred_{enter,exit}()
into each such function, even though the majority of callers do not call e.g.
should_fail_ex() with the rq lock already held. Only those who need to call
e.g. should_fail_ex() with the rq lock already held should pay the cost. In
this case, the ones who should pay the cost are the tracing hooks that are
called with the rq lock already held. I don't think it is reasonable to add
overhead for all users when tracing hooks might not be enabled or a bpf
program might not call e.g. should_fail_ex().

If you have a method for predicting whether e.g. should_fail_ex() will be
called, you can wrap only the bpf progs that call e.g. should_fail_ex(). But
then it is your role to maintain the list of functions that might trigger
printk(). I think you don't want such a burden (just as all users don't want
the burden/overhead of adding migrate_{disable,enable}() only for the sake of
the bpf subsystem).
Alexei Starovoitov June 25, 2024, 11:56 p.m. UTC | #7
On Tue, Jun 25, 2024 at 4:52 PM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2024/06/26 4:32, Alexei Starovoitov wrote:
> >>>>> On 2024-06-25, Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> >>>>>> syzbot is reporting circular locking dependency inside __bpf_prog_run(),
> >>>>>> for fault injection calls printk() despite rq lock is already held.
> >
> > If you want to add printk_deferred_enter() it
> > probably should be in should_fail_ex(). Not here.
> > We will not be wrapping all bpf progs this way.
>
> should_fail_ex() is just an instance.
> Three months ago you said "bpf never calls printk()" at
> https://lkml.kernel.org/r/CAADnVQLmLMt2bF9aAB26dtBCvy2oUFt+AAKDRgTTrc7Xk_zxJQ@mail.gmail.com ,
> but bpf does indirectly call printk() due to debug functionality.
>
> We will be able to stop wrapping with printk_deferred_enter() after the printk
> rework completes ( https://lkml.kernel.org/r/ZXBCB2Gv1O-1-T6f@alley ). But we
> can't predict how long we need to wait for all console drivers to get converted.
>
> Until the printk rework completes, it is responsibility of the caller to guard
> whatever possible printk() with rq lock already held. If you think that only
> individual function that may call printk() (e.g. should_fail_ex()) should be
> wrapped, just saying "We will not be wrapping all bpf progs this way" does not
> help, for we need to scatter migrate_{disable,enable}() overhead as well as
> printk_deferred_{enter,exit}() to individual function despite majority of callers
> do not call e.g. should_fail_ex() with rq lock already held. Only who needs to
> call e.g. should_fail_ex() with rq lock already held should pay the cost. In this
> case, the one who should pay the cost is tracing hooks that are called with rq
> lock already held. I don't think that it is reasonable to add overhead to all
> users because tracing hooks might not be enabled or bpf program might not call
> e.g. should_fail_ex().
>
> If you have a method that we can predict whether e.g. should_fail_ex() is called,
> you can wrap only bpf progs that call e.g. should_fail_ex(). But it is your role
> to maintain list of functions that might trigger printk(). I think that you don't
> want such burden (as well as all users don't want burden/overhead of adding
> migrate_{disable,enable}() only for the sake of bpf subsystem).

You are missing the point. The bug has nothing to do with bpf.
It can happen without any bpf loaded. Exactly the same way.
should_fail_usercopy() is called on all user accesses.
Tetsuo Handa June 26, 2024, 12:02 a.m. UTC | #8
On 2024/06/26 8:56, Alexei Starovoitov wrote:
> You are missing the point. The bug has nothing to do with bpf.

The bug is caused by calling tracing hooks with rq lock held.
If tracing hooks do not exist, this bug does not exist.

> It can happen without any bpf loaded. Exactly the same way.
> should_fail_usercopy() is called on all user accesses.

Not all callers of e.g. should_fail_usercopy() are holding rq lock.
Petr Mladek June 26, 2024, 8:18 a.m. UTC | #9
On Tue 2024-06-25 17:53:14, John Ogness wrote:
> On 2024-06-26, Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> > On 2024/06/25 23:17, John Ogness wrote:
> >> On 2024-06-25, Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> >>> syzbot is reporting circular locking dependency inside __bpf_prog_run(),
> >>> for fault injection calls printk() despite rq lock is already held.
> >>>
> >>> Guard __bpf_prog_run() using printk_deferred_{enter,exit}() (and
> >>> preempt_{disable,enable}() if CONFIG_PREEMPT_RT=n) in order to defer any
> >>> printk() messages.
> >> 
> >> Why is the reason for disabling preemption?
> >
> > Because since kernel/printk/printk_safe.c uses a percpu counter for deferring
> > printk(), printk_safe_enter() and printk_safe_exit() have to be called from
> > the same CPU. preempt_disable() before printk_safe_enter() and preempt_enable()
> > after printk_safe_exit() guarantees that printk_safe_enter() and
> > printk_safe_exit() are called from the same CPU.
> 
> Yes, but we already have cant_migrate(). Are you suggesting there are
> configurations where cant_migrate() is true but the context can be
> migrated anyway?

IMHO, we want to enter printk_safe only with preemption disabled.
Otherwise, printk() would stay deferred on the given CPU for any
task scheduled in this section.

Best Regards,
Petr
Petr Mladek June 26, 2024, 8:45 a.m. UTC | #10
On Wed 2024-06-26 08:52:44, Tetsuo Handa wrote:
> On 2024/06/26 4:32, Alexei Starovoitov wrote:
> >>>>> On 2024-06-25, Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> >>>>>> syzbot is reporting circular locking dependency inside __bpf_prog_run(),
> >>>>>> for fault injection calls printk() despite rq lock is already held.
> > 
> > If you want to add printk_deferred_enter() it
> > probably should be in should_fail_ex(). Not here.
> > We will not be wrapping all bpf progs this way.
> 
> should_fail_ex() is just an instance.
> Three months ago you said "bpf never calls printk()" at
> https://lkml.kernel.org/r/CAADnVQLmLMt2bF9aAB26dtBCvy2oUFt+AAKDRgTTrc7Xk_zxJQ@mail.gmail.com ,
> but bpf does indirectly call printk() due to debug functionality.
> 
> We will be able to stop wrapping with printk_deferred_enter() after the printk
> rework completes ( https://lkml.kernel.org/r/ZXBCB2Gv1O-1-T6f@alley ). But we
> can't predict how long we need to wait for all console drivers to get converted.
> 
> Until the printk rework completes, it is responsibility of the caller to guard
> whatever possible printk() with rq lock already held.

Honestly, even the current printk rework does not solve the deadlock
with rq lock completely. The console_lock/console_sem will still be needed for
serialization with early consoles. It might need to be used when
printing emergency messages while there is still a boot console.

I am sure that it might be solved but I am not aware of any plan at
the moment.

I have just got a crazy idea. printk() needs to take the rq lock in
console_unlock() only when there is a waiter for the lock. The problem
might be gone if we offloaded the wakeup into an irq_work.

It is just an idea. I haven't thought much about all the consequences and
scenarios. It might violate some basic locking rule and might not work.
Anyway, it would require a special variant for unlocking the semaphore, to be
used in console_unlock().
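
For what it is worth, a very rough sketch of that offload (all names
hypothetical, ignoring the open questions above):

	/* Do the console_sem wakeup from irq_work context so that
	 * console_unlock() never takes rq/pi locks in the printk() path. */
	static void console_sem_up_func(struct irq_work *work)
	{
		up(&console_sem);	/* may wake a waiter; rq lock taken here */
	}
	static DEFINE_IRQ_WORK(console_sem_up_work, console_sem_up_func);

	/* the special unlock variant used only from console_unlock() */
	static void console_sem_up_deferred(void)
	{
		irq_work_queue(&console_sem_up_work);
	}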

> If you think that only
> individual function that may call printk() (e.g. should_fail_ex()) should be
> wrapped, just saying "We will not be wrapping all bpf progs this way" does not
> help, for we need to scatter migrate_{disable,enable}() overhead as well as
> printk_deferred_{enter,exit}() to individual function despite majority of callers
> do not call e.g. should_fail_ex() with rq lock already held. Only who needs to
> call e.g. should_fail_ex() with rq lock already held should pay the cost. In this
> case, the one who should pay the cost is tracing hooks that are called with rq
> lock already held. I don't think that it is reasonable to add overhead to all
> users because tracing hooks might not be enabled or bpf program might not call
> e.g. should_fail_ex().
> 
> If you have a method that we can predict whether e.g. should_fail_ex() is called,
> you can wrap only bpf progs that call e.g. should_fail_ex(). But it is your role
> to maintain list of functions that might trigger printk(). I think that you don't
> want such burden (as well as all users don't want burden/overhead of adding
> migrate_{disable,enable}() only for the sake of bpf subsystem).

Yeah, converting printk() into printk_deferred() or using
printk_deferred_enter() around particular code paths is a whac-a-mole
game.

Best Regards,
Petr
Steven Rostedt June 26, 2024, 4:27 p.m. UTC | #11
On Wed, 26 Jun 2024 09:02:22 +0900
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:

> On 2024/06/26 8:56, Alexei Starovoitov wrote:
> > You are missing the point. The bug has nothing to do with bpf.  
> 
> The bug is caused by calling tracing hooks with rq lock held.
> If tracing hooks do not exist, this bug does not exist.

Could you expand on this? What tracing hooks are called with the rq lock
held? You mean the scheduling events?

> 
> > It can happen without any bpf loaded. Exactly the same way.
> > should_fail_usercopy() is called on all user accesses.  
> 
> Not all callers of e.g. should_fail_usercopy() are holding rq lock.

Sorry, but if a function is going to call printk and can be called in
any context that has rq locks held, then it should be doing the printk
deferred and preempt disable logic, and not expect the caller of it to
do that dirty work. Otherwise this will expand out of control.

The same goes for calling spin_lock_irq() vs spin_lock_irqsave(). If a
function is sometimes called with interrupts disabled and sometimes not, it
needs the irqsave() version; we don't make all of its callers disable
interrupts.
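
In code form (a generic sketch, nothing from this patch):

	/* A helper that can run both with and without IRQs already disabled
	 * saves and restores the IRQ state itself instead of pushing that
	 * job onto every caller. */
	struct foo {
		spinlock_t lock;
		int count;
	};

	static void foo_inc(struct foo *f)
	{
		unsigned long flags;

		spin_lock_irqsave(&f->lock, flags);	/* safe in any context */
		f->count++;
		spin_unlock_irqrestore(&f->lock, flags);
	}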

-- Steve
Tetsuo Handa June 26, 2024, 10:15 p.m. UTC | #12
On 2024/06/27 1:27, Steven Rostedt wrote:
> On Wed, 26 Jun 2024 09:02:22 +0900
> Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> 
>> On 2024/06/26 8:56, Alexei Starovoitov wrote:
>>> You are missing the point. The bug has nothing to do with bpf.  
>>
>> The bug is caused by calling tracing hooks with rq lock held.
>> If tracing hooks do not exist, this bug does not exist.
> 
> Could you expand on this. What tracing hooks are called with rq lock
> held? You mean the scheduling events?

Yes, trace_sched_switch().
__schedule() calls the trace_sched_switch() hook with the rq lock held.

 #2: ffff8880b943e798 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x2a/0x140 kernel/sched/core.c:559

 __bpf_prog_run include/linux/filter.h:691 [inline]
 bpf_prog_run include/linux/filter.h:698 [inline]
 __bpf_trace_run kernel/trace/bpf_trace.c:2403 [inline]
 bpf_trace_run4+0x334/0x590 kernel/trace/bpf_trace.c:2446
 __traceiter_sched_switch+0x98/0xd0 include/trace/events/sched.h:222
 trace_sched_switch include/trace/events/sched.h:222 [inline]
 __schedule+0x2587/0x4a20 kernel/sched/core.c:6742
 preempt_schedule_notrace+0x100/0x140 kernel/sched/core.c:7017
Steven Rostedt June 26, 2024, 10:33 p.m. UTC | #13
On Thu, 27 Jun 2024 07:15:25 +0900
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:

> On 2024/06/27 1:27, Steven Rostedt wrote:
> > On Wed, 26 Jun 2024 09:02:22 +0900
> > Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> >   
> >> On 2024/06/26 8:56, Alexei Starovoitov wrote:  
> >>> You are missing the point. The bug has nothing to do with bpf.    
> >>
> >> The bug is caused by calling tracing hooks with rq lock held.
> >> If tracing hooks do not exist, this bug does not exist.  
> > 
> > Could you expand on this. What tracing hooks are called with rq lock
> > held? You mean the scheduling events?  
> 
> Yes, trace_sched_switch().
> __schedule() calls trace_sched_switch() hook with rq lock held.
> 
>  #2: ffff8880b943e798 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x2a/0x140 kernel/sched/core.c:559
> 
>  __bpf_prog_run include/linux/filter.h:691 [inline]
>  bpf_prog_run include/linux/filter.h:698 [inline]
>  __bpf_trace_run kernel/trace/bpf_trace.c:2403 [inline]
>  bpf_trace_run4+0x334/0x590 kernel/trace/bpf_trace.c:2446
>  __traceiter_sched_switch+0x98/0xd0 include/trace/events/sched.h:222
>  trace_sched_switch include/trace/events/sched.h:222 [inline]
>  __schedule+0x2587/0x4a20 kernel/sched/core.c:6742
>  preempt_schedule_notrace+0x100/0x140 kernel/sched/core.c:7017

So you are saying that because a BPF hook can attach to a tracepoint
that is called with rq locks held, it should always disable preemption
and call printk_deferred_enter(), because it *might* hit an error path
that will call printk?? In other words, how the BPF hook is used
determines if the rq lock is held or not when it is called.

I can use that same argument for should_fail_ex(). Because how it is
used determines if the rq lock is held or not when it is called. And it
is the function that actually calls printk().

Sorry, but it makes no sense to put the burden of the
printk_deferred_enter() on the BPF hook logic. It should sit solely
with the code that actually calls printk().

-- Steve
Tetsuo Handa June 26, 2024, 11:08 p.m. UTC | #14
On 2024/06/27 7:33, Steven Rostedt wrote:
> So you are saying that because a BPF hook can attach to a tracepoint
> that is called with rq locks held, it should always disable preemption
> and call printk_deferred_enter(), because it *might* hit an error path
> that will call printk?? In other words, how the BPF hook is used
> determines if the rq lock is held or not when it is called.

Yes.

> 
> I can use that same argument for should_fail_ex(). Because how it is
> used determines if the rq lock is held or not when it is called. And it
> is the function that actually calls printk().

Strictly speaking, KASAN/KMSAN/KCSAN etc. *might* call printk() at any location.
In that respect, just wrapping the individual functions that explicitly call
printk() might not be sufficient. We would need to widen the section in which
printk() is deferred, but we don't want to needlessly call migrate_disable()/
preempt_disable()/printk_deferred_enter(), for performance reasons. We need to
find a balanced location for calling migrate_disable()/preempt_disable()/
printk_deferred_enter(), and I consider __bpf_prog_run() to be that balanced
location.

> 
> Sorry, but it makes no sense to put the burden of the
> printk_deferred_enter() on the BPF hook logic. It should sit solely
> with the code that actually calls printk().

How do you respond to Petr Mladek's comment

  Yeah, converting printk() into printk_deferred() or using
  printk_deferred_enter() around particular code paths is a whac-a-mole
  game.

at https://lkml.kernel.org/r/ZnvVQ5cs9F0b7paI@pathway.suse.cz ?
Alexei Starovoitov June 26, 2024, 11:52 p.m. UTC | #15
On Wed, Jun 26, 2024 at 4:09 PM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2024/06/27 7:33, Steven Rostedt wrote:
> > So you are saying that because a BPF hook can attach to a tracepoint
> > that is called with rq locks held, it should always disable preemption
> > and call printk_deferred_enter(), because it *might* hit an error path
> > that will call printk?? In other words, how the BPF hook is used
> > determines if the rq lock is held or not when it is called.
>
> Yes.
>
> >
> > I can use that same argument for should_fail_ex(). Because how it is
> > used determines if the rq lock is held or not when it is called. And it
> > is the function that actually calls printk().
>
> Strictly speaking, KASAN/KMSAN/KCSAN etc. *might* call printk() at any location.
> In that aspect, just wrapping individual function that explicitly calls printk()
> might not be sufficient. We will need to widen section for deferring printk(),
> but we don't want to needlessly call migrate_disable()/preempt_disable()/
> printk_deferred_enter() due to performance reason. We need to find a balanced
> location for calling migrate_disable()/preempt_disable()/printk_deferred_enter().
> I consider __bpf_prog_run() as a balanced location.

Tetsuo,
your repeated invalid arguments are not making this thread productive.
Told you already that the same can happen without bpf in the picture.

> >
> > Sorry, but it makes no sense to put the burden of the
> > printk_deferred_enter() on the BPF hook logic. It should sit solely
> > with the code that actually calls printk().
>
> How do you respond to Petr Mladek's comment
>
>   Yeah, converting printk() into printk_deferred() or using
>   printk_deferred_enter() around particular code paths is a whac-a-mole
>   game.

Exactly. Wrapping bpf with printk_deferred_enter() is such a whac-a-mole.
It doesn't fix the issue.
Tetsuo Handa June 27, 2024, midnight UTC | #16
On 2024/06/27 8:52, Alexei Starovoitov wrote:
>>> Sorry, but it makes no sense to put the burden of the
>>> printk_deferred_enter() on the BPF hook logic. It should sit solely
>>> with the code that actually calls printk().
>>
>> How do you respond to Petr Mladek's comment
>>
>>   Yeah, converting printk() into printk_deferred() or using
>>   printk_deferred_enter() around particular code paths is a whac-a-mole
>>   game.
> 
> Exactly. wrapping bpf with printk_deferred_enter() is such a whac-a-mole.
> It doesn't fix an issue.

Then, what change do you propose?
Steven Rostedt June 27, 2024, 12:09 a.m. UTC | #17
On Thu, 27 Jun 2024 08:08:57 +0900
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:

> How do you respond to Petr Mladek's comment
> 
>   Yeah, converting printk() into printk_deferred() or using
>   printk_deferred_enter() around particular code paths is a whac-a-mole
>   game.
> 
> at https://lkml.kernel.org/r/ZnvVQ5cs9F0b7paI@pathway.suse.cz ?

I agree with that. And your solution is no different from whack-a-mole.
It's just that you used a bigger hammer to whack the mole.

-- Steve
Tetsuo Handa June 27, 2024, 12:21 a.m. UTC | #18
On 2024/06/27 9:09, Steven Rostedt wrote:
> On Thu, 27 Jun 2024 08:08:57 +0900
> Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> 
>> How do you respond to Petr Mladek's comment
>>
>>   Yeah, converting printk() into printk_deferred() or using
>>   printk_deferred_enter() around particular code paths is a whac-a-mole
>>   game.
>>
>> at https://lkml.kernel.org/r/ZnvVQ5cs9F0b7paI@pathway.suse.cz ?
> 
> I agree with that. And your solution is no different than whack-a-mole.
> It's just that you used a bigger hammer to wack the mole.

What change do you propose?

https://syzkaller.appspot.com/text?tag=Patch&x=121c92fe180000 ?
Steven Rostedt June 27, 2024, 12:29 a.m. UTC | #19
On Thu, 27 Jun 2024 09:21:38 +0900
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:

> What change do you propose?
> 
> https://syzkaller.appspot.com/text?tag=Patch&x=121c92fe180000 ?

That's one solution, but you need to call printk_deferred_enter()
*after* taking the lock; otherwise preemption could still be enabled
and you could schedule and migrate between the printk_deferred_enter()
and the taking of the lock.
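
Concretely (a hedged sketch with a made-up lock, not the syzkaller patch
itself):

	/* Wrong: the task can be preempted and migrate between
	 * printk_deferred_enter() and taking the lock, so the matching
	 * printk_deferred_exit() may run on a different CPU. */
	printk_deferred_enter();
	raw_spin_lock_irqsave(&demo_lock, flags);
	/* ... critical section that may end up in printk() ... */
	raw_spin_unlock_irqrestore(&demo_lock, flags);
	printk_deferred_exit();

	/* Right: take the lock first; with IRQs off the task cannot
	 * migrate, so enter/exit are guaranteed to run on one CPU. */
	raw_spin_lock_irqsave(&demo_lock, flags);
	printk_deferred_enter();
	/* ... critical section that may end up in printk() ... */
	printk_deferred_exit();
	raw_spin_unlock_irqrestore(&demo_lock, flags);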

-- Steve

Patch

diff --git a/include/linux/filter.h b/include/linux/filter.h
index dd41a93f06b2e..977ae5b486164 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -674,6 +674,10 @@  static __always_inline u32 __bpf_prog_run(const struct bpf_prog *prog,
 	u32 ret;
 
 	cant_migrate();
+#ifndef CONFIG_PREEMPT_RT
+	preempt_disable();
+#endif
+	printk_deferred_enter();
 	if (static_branch_unlikely(&bpf_stats_enabled_key)) {
 		struct bpf_prog_stats *stats;
 		u64 duration, start = sched_clock();
@@ -690,6 +694,10 @@  static __always_inline u32 __bpf_prog_run(const struct bpf_prog *prog,
 	} else {
 		ret = dfunc(ctx, prog->insnsi, prog->bpf_func);
 	}
+	printk_deferred_exit();
+#ifndef CONFIG_PREEMPT_RT
+	preempt_enable();
+#endif
 	return ret;
 }