Message ID | 345098dc-8cb4-4808-98cf-fa9ab3af4fc4@I-love.SAKURA.ne.jp (mailing list archive)
---|---
State | Changes Requested
Delegated to: | BPF
Series | bpf: defer printk() inside __bpf_prog_run()
Context | Check | Description
---|---|---
netdev/tree_selection | success | Not a local patch, async
bpf/vmtest-bpf-next-VM_Test-0 | success | Logs for Lint
bpf/vmtest-bpf-next-VM_Test-1 | success | Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-2 | success | Logs for Unittests
bpf/vmtest-bpf-next-VM_Test-3 | success | Logs for Validate matrix.py
bpf/vmtest-bpf-next-VM_Test-4 | fail | Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-5 | success | Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-6 | success | Logs for aarch64-gcc / test
bpf/vmtest-bpf-next-VM_Test-7 | success | Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-8 | fail | Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-9 | success | Logs for s390x-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-10 | success | Logs for s390x-gcc / test
bpf/vmtest-bpf-next-VM_Test-11 | success | Logs for s390x-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-12 | success | Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-13 | fail | Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-14 | success | Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-15 | success | Logs for x86_64-gcc / test
bpf/vmtest-bpf-next-VM_Test-16 | success | Logs for x86_64-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-17 | fail | Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-18 | fail | Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17-O2
bpf/vmtest-bpf-next-VM_Test-19 | success | Logs for x86_64-llvm-17 / test
bpf/vmtest-bpf-next-VM_Test-20 | success | Logs for x86_64-llvm-17 / veristat
bpf/vmtest-bpf-next-VM_Test-21 | fail | Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-22 | fail | Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18-O2
bpf/vmtest-bpf-next-VM_Test-23 | success | Logs for x86_64-llvm-18 / test
bpf/vmtest-bpf-next-VM_Test-24 | success | Logs for x86_64-llvm-18 / veristat
bpf/vmtest-bpf-next-PR | fail | PR summary
On 2024-06-25, Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> syzbot is reporting a circular locking dependency inside __bpf_prog_run(),
> because fault injection calls printk() while the rq lock is already held.
>
> Guard __bpf_prog_run() using printk_deferred_{enter,exit}() (and
> preempt_{disable,enable}() if CONFIG_PREEMPT_RT=n) in order to defer any
> printk() messages.

What is the reason for disabling preemption?

John Ogness
On 2024/06/25 23:17, John Ogness wrote:
> On 2024-06-25, Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>> syzbot is reporting a circular locking dependency inside __bpf_prog_run(),
>> because fault injection calls printk() while the rq lock is already held.
>>
>> Guard __bpf_prog_run() using printk_deferred_{enter,exit}() (and
>> preempt_{disable,enable}() if CONFIG_PREEMPT_RT=n) in order to defer any
>> printk() messages.
>
> What is the reason for disabling preemption?

Because kernel/printk/printk_safe.c uses a percpu counter for deferring
printk(), printk_safe_enter() and printk_safe_exit() have to be called from
the same CPU. Calling preempt_disable() before printk_safe_enter() and
preempt_enable() after printk_safe_exit() guarantees that both run on the
same CPU.
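A minimal sketch of the pairing Tetsuo describes, mirroring the
CONFIG_PREEMPT_RT=n path of the patch at the end of this thread (the guarded
region is hypothetical):

	preempt_disable();		/* pin this task to the current CPU */
	printk_deferred_enter();	/* printk() on this CPU is deferred from here */

	/* ... code that may printk() while e.g. the rq lock is held ... */

	printk_deferred_exit();		/* must run on the same CPU as the enter */
	preempt_enable();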
On 2024-06-26, Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
> On 2024/06/25 23:17, John Ogness wrote:
>> What is the reason for disabling preemption?
>
> Because kernel/printk/printk_safe.c uses a percpu counter for deferring
> printk(), printk_safe_enter() and printk_safe_exit() have to be called from
> the same CPU. Calling preempt_disable() before printk_safe_enter() and
> preempt_enable() after printk_safe_exit() guarantees that both run on the
> same CPU.

Yes, but we already have cant_migrate(). Are you suggesting there are
configurations where cant_migrate() is true but the context can be
migrated anyway?

John
On 2024/06/26 0:47, John Ogness wrote:
> On 2024-06-26, Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>> Because kernel/printk/printk_safe.c uses a percpu counter for deferring
>> printk(), printk_safe_enter() and printk_safe_exit() have to be called
>> from the same CPU. [...]
>
> Yes, but we already have cant_migrate(). Are you suggesting there are
> configurations where cant_migrate() is true but the context can be
> migrated anyway?

No, I'm not aware of such a configuration.

Does migrate_disable() imply preempt_disable()? If yes, we don't need to
also call preempt_disable(). My understanding is that migration is about
"on which CPU a process runs" and preemption is about "whether a different
process runs on this CPU". That is, disabling migration and disabling
preemption are independent.

Is migrate_disable() alone sufficient for managing a percpu counter? If yes,
we don't need to also call preempt_disable() in order to manage a percpu
counter.
On Tue, Jun 25, 2024 at 9:05 AM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
> Does migrate_disable() imply preempt_disable()? If yes, we don't need to
> also call preempt_disable(). [...]
>
> Is migrate_disable() alone sufficient for managing a percpu counter? [...]

If you want to add printk_deferred_enter(), it probably should be in
should_fail_ex(). Not here. We will not be wrapping all bpf progs this way.

pw-bot: cr
On 2024/06/26 4:32, Alexei Starovoitov wrote:
>>>>> On 2024-06-25, Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>>>>>> syzbot is reporting a circular locking dependency inside __bpf_prog_run(),
>>>>>> because fault injection calls printk() while the rq lock is already held.
>
> If you want to add printk_deferred_enter(), it probably should be in
> should_fail_ex(). Not here. We will not be wrapping all bpf progs this way.

should_fail_ex() is just one instance. Three months ago you said "bpf never
calls printk()" at
https://lkml.kernel.org/r/CAADnVQLmLMt2bF9aAB26dtBCvy2oUFt+AAKDRgTTrc7Xk_zxJQ@mail.gmail.com ,
but bpf does indirectly call printk() due to debug functionality.

We will be able to stop wrapping with printk_deferred_enter() after the
printk rework completes ( https://lkml.kernel.org/r/ZXBCB2Gv1O-1-T6f@alley ).
But we can't predict how long we need to wait for all console drivers to get
converted.

Until the printk rework completes, it is the responsibility of the caller to
guard any possible printk() while the rq lock is already held. If you think
that only the individual functions that may call printk() (e.g.
should_fail_ex()) should be wrapped, just saying "We will not be wrapping all
bpf progs this way" does not help, for we would need to scatter the
migrate_{disable,enable}() overhead as well as printk_deferred_{enter,exit}()
across individual functions even though the majority of callers do not call
e.g. should_fail_ex() with the rq lock already held. Only those who need to
call e.g. should_fail_ex() with the rq lock already held should pay the cost.
In this case, the ones who should pay the cost are the tracing hooks that are
called with the rq lock already held. I don't think it is reasonable to add
overhead to all users when the tracing hooks might not even be enabled, or
the bpf program might not call e.g. should_fail_ex().

If you have a method by which we can predict whether e.g. should_fail_ex()
is called, you can wrap only the bpf progs that call e.g. should_fail_ex().
But then it is your role to maintain the list of functions that might trigger
printk(). I think that you don't want such a burden (just as all users don't
want the burden/overhead of adding migrate_{disable,enable}() only for the
sake of the bpf subsystem).
On Tue, Jun 25, 2024 at 4:52 PM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
> should_fail_ex() is just one instance. [...]
>
> If you have a method by which we can predict whether e.g. should_fail_ex()
> is called, you can wrap only the bpf progs that call e.g.
> should_fail_ex(). [...]

You are missing the point. The bug has nothing to do with bpf. It can happen
without any bpf loaded. Exactly the same way. should_fail_usercopy() is
called on all user accesses.
On 2024/06/26 8:56, Alexei Starovoitov wrote:
> You are missing the point. The bug has nothing to do with bpf.

The bug is caused by calling tracing hooks with the rq lock held.
If tracing hooks did not exist, this bug would not exist.

> It can happen without any bpf loaded. Exactly the same way.
> should_fail_usercopy() is called on all user accesses.

Not all callers of e.g. should_fail_usercopy() are holding the rq lock.
On Tue 2024-06-25 17:53:14, John Ogness wrote:
> On 2024-06-26, Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:
>> Because kernel/printk/printk_safe.c uses a percpu counter for deferring
>> printk(), printk_safe_enter() and printk_safe_exit() have to be called
>> from the same CPU. [...]
>
> Yes, but we already have cant_migrate(). Are you suggesting there are
> configurations where cant_migrate() is true but the context can be
> migrated anyway?

IMHO, we want to enter printk_safe only with preemption disabled. Otherwise,
printk() would stay deferred on the given CPU for any task scheduled in this
section.

Best Regards,
Petr
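A sketch of the interleaving Petr describes, assuming CONFIG_PREEMPT_RT=n
and a hypothetical caller that only disables migration, not preemption:

	/* Task A on CPU0 */
	migrate_disable();
	printk_deferred_enter();	/* bumps CPU0's printk_context counter */

	/*
	 * Preemption is still enabled, so the scheduler may switch to
	 * task B on CPU0 here. Any printk() from task B is now deferred
	 * too, because CPU0's counter is still elevated by task A.
	 */

	printk_deferred_exit();
	migrate_enable();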
On Wed 2024-06-26 08:52:44, Tetsuo Handa wrote:
> We will be able to stop wrapping with printk_deferred_enter() after the
> printk rework completes ( https://lkml.kernel.org/r/ZXBCB2Gv1O-1-T6f@alley ).
> But we can't predict how long we need to wait for all console drivers to
> get converted.
>
> Until the printk rework completes, it is the responsibility of the caller
> to guard any possible printk() while the rq lock is already held.

Honestly, even the current printk rework does not solve the deadlock with
the rq lock completely. The console_lock/console_sem will still be needed
for serialization with early consoles. It might need to be used when
printing emergency messages while there is still a boot console. I am sure
that it might be solved but I am not aware of any plan at the moment.

I have just got a crazy idea. printk() needs to take the rq lock in
console_unlock() only when there is a waiter for the lock. The problem might
be gone if we offloaded the wakeup into an irq_work.

It is just an idea. I haven't thought much about all the consequences and
scenarios. It might violate some basic locking rule and might not work.
Anyway, it would require a special variant for unlocking the semaphore which
would be used in console_unlock().

> If you think that only the individual functions that may call printk()
> (e.g. should_fail_ex()) should be wrapped, [...] it is your role to
> maintain the list of functions that might trigger printk(). [...]

Yeah, converting printk() into printk_deferred() or using
printk_deferred_enter() around particular code paths is a whac-a-mole game.

Best Regards,
Petr
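For illustration only, one rough shape Petr's irq_work idea could take;
every name below is hypothetical, and as Petr says it might violate a basic
locking rule:

	#include <linux/irq_work.h>
	#include <linux/semaphore.h>

	static struct semaphore console_sem_example;	/* stand-in for console_sem */

	/* Runs later from IRQ context, outside the console_owner section. */
	static void console_up_func(struct irq_work *work)
	{
		up(&console_sem_example);	/* the rq-lock-taking wakeup happens here */
	}

	static DEFINE_IRQ_WORK(console_up_work, console_up_func);

	/* Hypothetical unlock variant for console_unlock(): defer the wakeup. */
	static void console_sem_up_deferred(void)
	{
		irq_work_queue(&console_up_work);
	}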
On Wed, 26 Jun 2024 09:02:22 +0900
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:

> On 2024/06/26 8:56, Alexei Starovoitov wrote:
>> You are missing the point. The bug has nothing to do with bpf.
>
> The bug is caused by calling tracing hooks with the rq lock held.
> If tracing hooks did not exist, this bug would not exist.

Could you expand on this. What tracing hooks are called with the rq lock
held? You mean the scheduling events?

>> It can happen without any bpf loaded. Exactly the same way.
>> should_fail_usercopy() is called on all user accesses.
>
> Not all callers of e.g. should_fail_usercopy() are holding the rq lock.

Sorry, but if a function is going to call printk() and can be called in any
context that has rq locks held, then it should be doing the printk-deferred
and preempt-disable logic itself, and not expect its caller to do that dirty
work. Otherwise this will expand out of control.

The same goes for calling spin_lock_irq() vs spin_lock_irqsave(). If a
function is sometimes called with interrupts disabled and sometimes not, it
needs the irqsave() version. We don't make all of its callers disable
interrupts.

-- Steve
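Steve's spin_lock_irqsave() analogy, sketched with a hypothetical lock and
function name:

	static DEFINE_SPINLOCK(example_lock);

	/* Callable both with and without interrupts already disabled. */
	static void example_any_context(void)
	{
		unsigned long flags;

		spin_lock_irqsave(&example_lock, flags);	/* saves the IRQ state */
		/* ... */
		spin_unlock_irqrestore(&example_lock, flags);	/* restores it */
	}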
On 2024/06/27 1:27, Steven Rostedt wrote:
> Could you expand on this. What tracing hooks are called with the rq lock
> held? You mean the scheduling events?

Yes, trace_sched_switch().
__schedule() calls the trace_sched_switch() hook with the rq lock held.

#2: ffff8880b943e798 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x2a/0x140 kernel/sched/core.c:559

__bpf_prog_run include/linux/filter.h:691 [inline]
bpf_prog_run include/linux/filter.h:698 [inline]
__bpf_trace_run kernel/trace/bpf_trace.c:2403 [inline]
bpf_trace_run4+0x334/0x590 kernel/trace/bpf_trace.c:2446
__traceiter_sched_switch+0x98/0xd0 include/trace/events/sched.h:222
trace_sched_switch include/trace/events/sched.h:222 [inline]
__schedule+0x2587/0x4a20 kernel/sched/core.c:6742
preempt_schedule_notrace+0x100/0x140 kernel/sched/core.c:7017
On Thu, 27 Jun 2024 07:15:25 +0900
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:

> Yes, trace_sched_switch().
> __schedule() calls the trace_sched_switch() hook with the rq lock held.
>
> [backtrace snipped]

So you are saying that because a BPF hook can attach to a tracepoint that is
called with rq locks held, it should always disable preemption and call
printk_deferred_enter(), because it *might* hit an error path that will call
printk()?? In other words, how the BPF hook is used determines whether the
rq lock is held when it is called.

I can use that same argument for should_fail_ex(). Because how it is used
determines whether the rq lock is held when it is called. And it is the
function that actually calls printk().

Sorry, but it makes no sense to put the burden of printk_deferred_enter() on
the BPF hook logic. It should sit solely with the code that actually calls
printk().

-- Steve
On 2024/06/27 7:33, Steven Rostedt wrote:
> So you are saying that because a BPF hook can attach to a tracepoint that
> is called with rq locks held, it should always disable preemption and call
> printk_deferred_enter(), because it *might* hit an error path that will
> call printk()?? In other words, how the BPF hook is used determines
> whether the rq lock is held when it is called.

Yes.

> I can use that same argument for should_fail_ex(). Because how it is used
> determines whether the rq lock is held when it is called. And it is the
> function that actually calls printk().

Strictly speaking, KASAN/KMSAN/KCSAN etc. *might* call printk() at any
location. In that respect, just wrapping each individual function that
explicitly calls printk() might not be sufficient. We would need to widen
the section in which printk() is deferred, but we don't want to needlessly
call migrate_disable()/preempt_disable()/printk_deferred_enter() for
performance reasons. We need to find a balanced location for calling
migrate_disable()/preempt_disable()/printk_deferred_enter(). I consider
__bpf_prog_run() to be such a balanced location.

> Sorry, but it makes no sense to put the burden of printk_deferred_enter()
> on the BPF hook logic. It should sit solely with the code that actually
> calls printk().

How do you respond to Petr Mladek's comment

  Yeah, converting printk() into printk_deferred() or using
  printk_deferred_enter() around particular code paths is a whac-a-mole
  game.

at https://lkml.kernel.org/r/ZnvVQ5cs9F0b7paI@pathway.suse.cz ?
On Wed, Jun 26, 2024 at 4:09 PM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
> Strictly speaking, KASAN/KMSAN/KCSAN etc. *might* call printk() at any
> location. [...] We need to find a balanced location for calling
> migrate_disable()/preempt_disable()/printk_deferred_enter(). I consider
> __bpf_prog_run() to be such a balanced location.

Tetsuo, your repeated invalid arguments are not making this thread
productive. Told you already that the same can happen without bpf in the
picture.

> How do you respond to Petr Mladek's comment
>
>   Yeah, converting printk() into printk_deferred() or using
>   printk_deferred_enter() around particular code paths is a whac-a-mole
>   game.

Exactly. Wrapping bpf with printk_deferred_enter() is such a whac-a-mole.
It doesn't fix the issue.
On 2024/06/27 8:52, Alexei Starovoitov wrote:
> Exactly. Wrapping bpf with printk_deferred_enter() is such a whac-a-mole.
> It doesn't fix the issue.

Then, what change do you propose?
On Thu, 27 Jun 2024 08:08:57 +0900
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:

> How do you respond to Petr Mladek's comment
>
>   Yeah, converting printk() into printk_deferred() or using
>   printk_deferred_enter() around particular code paths is a whac-a-mole
>   game.
>
> at https://lkml.kernel.org/r/ZnvVQ5cs9F0b7paI@pathway.suse.cz ?

I agree with that. And your solution is no different from whack-a-mole.
It's just that you used a bigger hammer to whack the mole.

-- Steve
On 2024/06/27 9:09, Steven Rostedt wrote:
> I agree with that. And your solution is no different from whack-a-mole.
> It's just that you used a bigger hammer to whack the mole.

What change do you propose?

https://syzkaller.appspot.com/text?tag=Patch&x=121c92fe180000 ?
On Thu, 27 Jun 2024 09:21:38 +0900
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> wrote:

> What change do you propose?
>
> https://syzkaller.appspot.com/text?tag=Patch&x=121c92fe180000 ?

That's one solution, but you need to call printk_deferred_enter() *after*
taking the lock, otherwise preemption could still be enabled and you could
schedule and migrate between the printk_deferred_enter() and the taking of
the lock.

-- Steve
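Applied to fail_dump()-style code, the ordering Steve describes would look
roughly like this; fault_lock is a hypothetical stand-in for whatever lock
the linked patch takes:

	unsigned long flags;

	spin_lock_irqsave(&fault_lock, flags);	/* preemption is now off, so ... */
	printk_deferred_enter();		/* ... enter/exit stay on one CPU */
	printk(KERN_NOTICE "FAULT_INJECTION: ...\n");
	printk_deferred_exit();
	spin_unlock_irqrestore(&fault_lock, flags);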
diff --git a/include/linux/filter.h b/include/linux/filter.h
index dd41a93f06b2e..977ae5b486164 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -674,6 +674,10 @@ static __always_inline u32 __bpf_prog_run(const struct bpf_prog *prog,
 	u32 ret;
 
 	cant_migrate();
+#ifndef CONFIG_PREEMPT_RT
+	preempt_disable();
+#endif
+	printk_deferred_enter();
 	if (static_branch_unlikely(&bpf_stats_enabled_key)) {
 		struct bpf_prog_stats *stats;
 		u64 duration, start = sched_clock();
@@ -690,6 +694,10 @@ static __always_inline u32 __bpf_prog_run(const struct bpf_prog *prog,
 	} else {
 		ret = dfunc(ctx, prog->insnsi, prog->bpf_func);
 	}
+	printk_deferred_exit();
+#ifndef CONFIG_PREEMPT_RT
+	preempt_enable();
+#endif
 	return ret;
 }
syzbot is reporting a circular locking dependency inside __bpf_prog_run(),
because fault injection calls printk() while the rq lock is already held.

Guard __bpf_prog_run() using printk_deferred_{enter,exit}() (and
preempt_{disable,enable}() if CONFIG_PREEMPT_RT=n) in order to defer any
printk() messages. (migrate_{disable,enable}() is not needed if
CONFIG_PREEMPT_RT=y because cant_migrate() asserts that the caller already
disabled migration.)

======================================================
WARNING: possible circular locking dependency detected
6.10.0-rc4-syzkaller-00874-g84562f9953ec #0 Not tainted
------------------------------------------------------
syz-executor.1/25480 is trying to acquire lock:
ffffffff8e328140 (console_owner){..-.}-{0:0}, at: rcu_try_lock_acquire include/linux/rcupdate.h:334 [inline]
ffffffff8e328140 (console_owner){..-.}-{0:0}, at: srcu_read_lock_nmisafe include/linux/srcu.h:232 [inline]
ffffffff8e328140 (console_owner){..-.}-{0:0}, at: console_srcu_read_lock kernel/printk/printk.c:286 [inline]
ffffffff8e328140 (console_owner){..-.}-{0:0}, at: console_flush_all+0x152/0xfd0 kernel/printk/printk.c:2986

but task is already holding lock:
ffff8880b943e798 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x2a/0x140 kernel/sched/core.c:559

which lock already depends on the new lock.

(...snipped...)

Chain exists of:
  console_owner --> &p->pi_lock --> &rq->__lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&rq->__lock);
                               lock(&p->pi_lock);
                               lock(&rq->__lock);
  lock(console_owner);

 *** DEADLOCK ***

6 locks held by syz-executor.1/25480:
 #0: ffffffff8f5e6f48 (rtnl_mutex){+.+.}-{3:3}, at: dev_ioctl+0x706/0x1340 net/core/dev_ioctl.c:785
 #1: ffffffff8f67dd68 (flowtable_lock){+.+.}-{3:3}, at: nf_flow_table_cleanup+0x23/0xb0 net/netfilter/nf_flow_table_core.c:593
 #2: ffff8880b943e798 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x2a/0x140 kernel/sched/core.c:559
 #3: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline]
 #3: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:781 [inline]
 #3: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: __bpf_trace_run kernel/trace/bpf_trace.c:2402 [inline]
 #3: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: bpf_trace_run4+0x244/0x590 kernel/trace/bpf_trace.c:2446
 #4: ffffffff8e20fa60 (console_lock){+.+.}-{0:0}, at: _printk+0xd5/0x120 kernel/printk/printk.c:2370
 #5: ffffffff8e20f690 (console_srcu){....}-{0:0}, at: rcu_try_lock_acquire include/linux/rcupdate.h:334 [inline]
 #5: ffffffff8e20f690 (console_srcu){....}-{0:0}, at: srcu_read_lock_nmisafe include/linux/srcu.h:232 [inline]
 #5: ffffffff8e20f690 (console_srcu){....}-{0:0}, at: console_srcu_read_lock kernel/printk/printk.c:286 [inline]
 #5: ffffffff8e20f690 (console_srcu){....}-{0:0}, at: console_flush_all+0x152/0xfd0 kernel/printk/printk.c:2986

stack backtrace:
CPU: 0 PID: 25480 Comm: syz-executor.1 Not tainted 6.10.0-rc4-syzkaller-00874-g84562f9953ec #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
 check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2187
 check_prev_add kernel/locking/lockdep.c:3134 [inline]
 check_prevs_add kernel/locking/lockdep.c:3253 [inline]
 validate_chain+0x18e0/0x5900 kernel/locking/lockdep.c:3869
 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
 console_lock_spinning_enable kernel/printk/printk.c:1870 [inline]
 console_emit_next_record kernel/printk/printk.c:2922 [inline]
 console_flush_all+0x810/0xfd0 kernel/printk/printk.c:2994
 console_unlock+0x13b/0x4d0 kernel/printk/printk.c:3063
 vprintk_emit+0x5a6/0x770 kernel/printk/printk.c:2345
 _printk+0xd5/0x120 kernel/printk/printk.c:2370
 fail_dump lib/fault-inject.c:45 [inline]
 should_fail_ex+0x391/0x4e0 lib/fault-inject.c:153
 __copy_to_user_inatomic include/linux/uaccess.h:123 [inline]
 copy_to_user_nofault+0x86/0x140 mm/maccess.c:149
 bpf_prog_b0a3dac844962ed2+0x47/0x4d
 bpf_dispatcher_nop_func include/linux/bpf.h:1243 [inline]
 __bpf_prog_run include/linux/filter.h:691 [inline]
 bpf_prog_run include/linux/filter.h:698 [inline]
 __bpf_trace_run kernel/trace/bpf_trace.c:2403 [inline]
 bpf_trace_run4+0x334/0x590 kernel/trace/bpf_trace.c:2446
 __traceiter_sched_switch+0x98/0xd0 include/trace/events/sched.h:222
 trace_sched_switch include/trace/events/sched.h:222 [inline]
 __schedule+0x2587/0x4a20 kernel/sched/core.c:6742
 preempt_schedule_notrace+0x100/0x140 kernel/sched/core.c:7017
 preempt_schedule_notrace_thunk+0x1a/0x30 arch/x86/entry/thunk.S:13
 rcu_is_watching+0x7e/0xb0 kernel/rcu/tree.c:725
 trace_lock_acquire include/trace/events/lock.h:24 [inline]
 lock_acquire+0xe3/0x550 kernel/locking/lockdep.c:5725
 rcu_lock_acquire include/linux/rcupdate.h:329 [inline]
 rcu_read_lock include/linux/rcupdate.h:781 [inline]
 start_flush_work kernel/workqueue.c:4122 [inline]
 __flush_work+0x107/0xd00 kernel/workqueue.c:4181
 flush_work kernel/workqueue.c:4232 [inline]
 flush_delayed_work+0x169/0x1c0 kernel/workqueue.c:4254
 nf_flow_table_gc_cleanup net/netfilter/nf_flow_table_core.c:585 [inline]
 nf_flow_table_cleanup+0x62/0xb0 net/netfilter/nf_flow_table_core.c:595
 flow_offload_netdev_event+0x51/0x70 net/netfilter/nft_flow_offload.c:492
 notifier_call_chain+0x19f/0x3e0 kernel/notifier.c:93
 __dev_notify_flags+0x207/0x400
 dev_change_flags+0xf0/0x1a0 net/core/dev.c:8858
 dev_ifsioc+0x7c8/0xe70 net/core/dev_ioctl.c:529
 dev_ioctl+0x719/0x1340 net/core/dev_ioctl.c:786
 sock_do_ioctl+0x240/0x460 net/socket.c:1236
 sock_ioctl+0x629/0x8e0 net/socket.c:1341
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:907 [inline]
 __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:893
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Reported-by: syzbot <syzbot+f78380e4eae53c64125c@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=f78380e4eae53c64125c
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
Only compile tested.

 include/linux/filter.h | 8 ++++++++
 1 file changed, 8 insertions(+)