Message ID | 20241016011144.3058445-1-kuba@kernel.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | configs/debug: make sure PROVE_RCU_LIST=y takes effect | expand |
On Tue, Oct 15, 2024 at 06:11:44PM -0700, Jakub Kicinski wrote: > Commit 0aaa8977acbf ("configs: introduce debug.config for CI-like setup") > added CONFIG_PROVE_RCU_LIST=y to the common CI config, > but RCU_EXPERT is not set, and it's a dependency for > CONFIG_PROVE_RCU_LIST=y. Make sure CIs take advantage > of CONFIG_PROVE_RCU_LIST=y, recent fixes in networking > indicate that it does catch bugs. > > Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> thanks, - Joel > --- > I'd be slightly tempted to still send it to Linux for v6.12. > > CC: paulmck@kernel.org > CC: frederic@kernel.org > CC: neeraj.upadhyay@kernel.org > CC: joel@joelfernandes.org > CC: rcu@vger.kernel.org > CC: linux-kernel@vger.kernel.org > CC: kees@kernel.org > CC: matttbe@kernel.org > --- > kernel/configs/debug.config | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/kernel/configs/debug.config b/kernel/configs/debug.config > index 509ee703de15..20552f163930 100644 > --- a/kernel/configs/debug.config > +++ b/kernel/configs/debug.config > @@ -103,6 +103,7 @@ CONFIG_BUG_ON_DATA_CORRUPTION=y > # > # RCU Debugging > # > +CONFIG_RCU_EXPERT=y > CONFIG_PROVE_RCU=y > CONFIG_PROVE_RCU_LIST=y > # > -- > 2.46.2 >
Hi Jakub, On 16/10/2024 03:11, Jakub Kicinski wrote: > Commit 0aaa8977acbf ("configs: introduce debug.config for CI-like setup") > added CONFIG_PROVE_RCU_LIST=y to the common CI config, > but RCU_EXPERT is not set, and it's a dependency for > CONFIG_PROVE_RCU_LIST=y. Make sure CIs take advantage > of CONFIG_PROVE_RCU_LIST=y, recent fixes in networking > indicate that it does catch bugs. Good catch! I confirm it fixes the issue: Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> > Signed-off-by: Jakub Kicinski <kuba@kernel.org> > --- > I'd be slightly tempted to still send it to Linux for v6.12. Good idea, it sounds like a fix because I guess if it was on the list, it was supposed to be used. But be careful it might detect quite a few issues: I just enabled it on MPTCP tree, and it found issues. It might be good to check the impact first, e.g. at least enabling it when running the Networking tests. Please note that the CI didn't pick up this patch, because it is marked for RCU (PATCH rcu): https://patchwork.kernel.org/project/netdevbpf/patch/20241016011144.3058445-1-kuba@kernel.org/ If the impact is important, it might be better to target linux-next first, no? Cheers, Matt
On 16/10/2024 17:18, Matthieu Baerts wrote: > Hi Jakub, > > On 16/10/2024 03:11, Jakub Kicinski wrote: >> Commit 0aaa8977acbf ("configs: introduce debug.config for CI-like setup") >> added CONFIG_PROVE_RCU_LIST=y to the common CI config, >> but RCU_EXPERT is not set, and it's a dependency for >> CONFIG_PROVE_RCU_LIST=y. Make sure CIs take advantage >> of CONFIG_PROVE_RCU_LIST=y, recent fixes in networking >> indicate that it does catch bugs. > > Good catch! I confirm it fixes the issue: > > Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> > >> Signed-off-by: Jakub Kicinski <kuba@kernel.org> >> --- >> I'd be slightly tempted to still send it to Linux for v6.12. > > Good idea, it sounds like a fix because I guess if it was on the list, > it was supposed to be used. But be careful it might detect quite a few > issues: I just enabled it on MPTCP tree, and it found issues. Sorry, I forgot to mention that it found 3 issues: one specific to MPTCP, one in Netfilter, but also one when shutting down the VM: > # /usr/lib/klibc/bin/poweroff > [ 2360.588763][T11825] > [ 2360.589006][T11825] ============================= > [ 2360.589424][T11825] WARNING: suspicious RCU usage > [ 2360.589952][T11825] 6.12.0-rc2+ #1 Tainted: G N > [ 2360.590522][T11825] ----------------------------- > [ 2360.590896][T11825] kernel/events/core.c:13962 RCU-list traversed in non-reader section!! > [ 2360.592341][T11825] > [ 2360.592341][T11825] other info that might help us debug this: > [ 2360.592341][T11825] > [ 2360.593343][T11825] > [ 2360.593343][T11825] rcu_scheduler_active = 2, debug_locks = 1 > [ 2360.593951][T11825] 3 locks held by poweroff/11825: > [2360.594481][T11825] #0: ffffffff89641e28 (system_transition_mutex){+.+.}-{3:3}, at: __do_sys_reboot (kernel/reboot.c:770) > [2360.594997][T11825] #1: ffffffff8963eab0 ((reboot_notifier_list).rwsem){++++}-{3:3}, at: blocking_notifier_call_chain (kernel/notifier.c:388) > [2360.595859][T11825] #2: ffffffff898d1a68 (pmus_lock){+.+.}-{3:3}, at: perf_event_exit_cpu_context (kernel/events/core.c:13983) > [ 2360.596645][T11825] > [ 2360.596645][T11825] stack backtrace: > [ 2360.597476][T11825] CPU: 3 UID: 0 PID: 11825 Comm: poweroff Tainted: G N 6.12.0-rc2+ #1 > [ 2360.597987][T11825] Tainted: [N]=TEST > [ 2360.598291][T11825] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 > [ 2360.598890][T11825] Call Trace: > [ 2360.599295][T11825] <TASK> > [2360.599568][T11825] dump_stack_lvl (lib/dump_stack.c:123) > [2360.600032][T11825] lockdep_rcu_suspicious (kernel/locking/lockdep.c:6822) > [2360.600518][T11825] perf_event_clear_cpumask (kernel/events/core.c:13962 (discriminator 9)) > [2360.600972][T11825] ? __pfx_perf_event_clear_cpumask (kernel/events/core.c:13939) > [2360.601607][T11825] ? acpi_execute_simple_method (drivers/acpi/utils.c:679) > [2360.601988][T11825] ? __pfx_acpi_execute_simple_method (drivers/acpi/utils.c:679) > [2360.602577][T11825] ? md_notify_reboot (drivers/md/md.c:9860) > [2360.603043][T11825] perf_event_exit_cpu_context (kernel/events/core.c:13984 (discriminator 1)) > [2360.603543][T11825] perf_reboot (kernel/events/core.c:14066 (discriminator 3)) > [2360.603979][T11825] ? trace_notifier_run (include/trace/events/notifier.h:59 (discriminator 2)) > [2360.604454][T11825] notifier_call_chain (kernel/notifier.c:93) > [2360.604833][T11825] blocking_notifier_call_chain (kernel/notifier.c:389) > [2360.605522][T11825] kernel_power_off (kernel/reboot.c:294) > [2360.605908][T11825] __do_sys_reboot (kernel/reboot.c:771) > [2360.606320][T11825] ? __pfx___do_sys_reboot (kernel/reboot.c:717) > [2360.606739][T11825] ? __pfx_ksys_sync (fs/sync.c:98) > [2360.607113][T11825] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1)) > [2360.607551][T11825] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) > [ 2360.607911][T11825] RIP: 0033:0x20e505 > [ 2360.608320][T11825] Code: 48 89 77 38 c3 89 f0 48 8b 1f 48 8b 67 08 48 8b 6f 10 4c 8b 67 18 4c 8b 6f 20 4c 8b 77 28 4c 8b 7f 30 ff 67 38 49 89 ca 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 f7 d8 89 05 d6 8e 00 00 48 83 c8 ff c3 > All code > ======== > 0: 48 89 77 38 mov %rsi,0x38(%rdi) > 4: c3 ret > 5: 89 f0 mov %esi,%eax > 7: 48 8b 1f mov (%rdi),%rbx > a: 48 8b 67 08 mov 0x8(%rdi),%rsp > e: 48 8b 6f 10 mov 0x10(%rdi),%rbp > 12: 4c 8b 67 18 mov 0x18(%rdi),%r12 > 16: 4c 8b 6f 20 mov 0x20(%rdi),%r13 > 1a: 4c 8b 77 28 mov 0x28(%rdi),%r14 > 1e: 4c 8b 7f 30 mov 0x30(%rdi),%r15 > 22: ff 67 38 jmp *0x38(%rdi) > 25: 49 89 ca mov %rcx,%r10 > 28: 0f 05 syscall > 2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <-- trapping instruction > 30: 73 01 jae 0x33 > 32: c3 ret > 33: f7 d8 neg %eax > 35: 89 05 d6 8e 00 00 mov %eax,0x8ed6(%rip) # 0x8f11 > 3b: 48 83 c8 ff or $0xffffffffffffffff,%rax > 3f: c3 ret > > Code starting with the faulting instruction > =========================================== > 0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax > 6: 73 01 jae 0x9 > 8: c3 ret > 9: f7 d8 neg %eax > b: 89 05 d6 8e 00 00 mov %eax,0x8ed6(%rip) # 0x8ee7 > 11: 48 83 c8 ff or $0xffffffffffffffff,%rax > 15: c3 ret > [ 2360.609571][T11825] RSP: 002b:00007ffd898b0d58 EFLAGS: 00000217 ORIG_RAX: 00000000000000a9 > [ 2360.610171][T11825] RAX: ffffffffffffffda RBX: 000000004321fedc RCX: 000000000020e505 > [ 2360.610763][T11825] RDX: 000000004321fedc RSI: 0000000028121969 RDI: 00000000fee1dead > [ 2360.611461][T11825] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000213020 > [ 2360.611987][T11825] R10: 0000000000000000 R11: 0000000000000217 R12: 00007ffd898b0db8 > [ 2360.612583][T11825] R13: 0000000000401000 R14: 0000000000000000 R15: 0000000000000000 > [ 2360.613142][T11825] </TASK> > [ 2360.615641][T11825] ACPI: PM: Preparing to enter system sleep state S5 > [ 2360.617134][T11825] kvm: exiting hardware virtualization > [ 2360.617715][T11825] reboot: Power down I can easily reproduce this one: this means that all "debug" VMs will report issues! Cheers, Matt
On Tue, Oct 15, 2024 at 06:11:44PM -0700, Jakub Kicinski wrote: > Commit 0aaa8977acbf ("configs: introduce debug.config for CI-like setup") > added CONFIG_PROVE_RCU_LIST=y to the common CI config, > but RCU_EXPERT is not set, and it's a dependency for > CONFIG_PROVE_RCU_LIST=y. Make sure CIs take advantage > of CONFIG_PROVE_RCU_LIST=y, recent fixes in networking > indicate that it does catch bugs. > > Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Paul E. McKenney <paulmck@kernel.org> > --- > I'd be slightly tempted to still send it to Linux for v6.12. > > CC: paulmck@kernel.org > CC: frederic@kernel.org > CC: neeraj.upadhyay@kernel.org > CC: joel@joelfernandes.org > CC: rcu@vger.kernel.org > CC: linux-kernel@vger.kernel.org > CC: kees@kernel.org > CC: matttbe@kernel.org > --- > kernel/configs/debug.config | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/kernel/configs/debug.config b/kernel/configs/debug.config > index 509ee703de15..20552f163930 100644 > --- a/kernel/configs/debug.config > +++ b/kernel/configs/debug.config > @@ -103,6 +103,7 @@ CONFIG_BUG_ON_DATA_CORRUPTION=y > # > # RCU Debugging > # > +CONFIG_RCU_EXPERT=y > CONFIG_PROVE_RCU=y > CONFIG_PROVE_RCU_LIST=y > # > -- > 2.46.2 >
On Tue, Oct 15, 2024 at 06:11:44PM -0700, Jakub Kicinski wrote: > Commit 0aaa8977acbf ("configs: introduce debug.config for CI-like setup") > added CONFIG_PROVE_RCU_LIST=y to the common CI config, > but RCU_EXPERT is not set, and it's a dependency for > CONFIG_PROVE_RCU_LIST=y. Make sure CIs take advantage > of CONFIG_PROVE_RCU_LIST=y, recent fixes in networking > indicate that it does catch bugs. > > Signed-off-by: Jakub Kicinski <kuba@kernel.org> I exercised this in conjunction with tools/testing/selftests/net/config and the resulting .config is as described before and after. Reviewed-by: Simon Horman <horms@kernel.org>
On Wed, 16 Oct 2024 17:18:47 +0200 Matthieu Baerts wrote: > https://patchwork.kernel.org/project/netdevbpf/patch/20241016011144.3058445-1-kuba@kernel.org/ > > If the impact is important, it might be better to target linux-next > first, no? Thanks for testing! I didn't anticipate it to be so effective. Looks like it's not in -next, yet, and we got an Ack from Paul and co. so I'll toss it into net-next.
Hi Jakub, On 28/10/2024 18:22, Jakub Kicinski wrote: > On Wed, 16 Oct 2024 17:18:47 +0200 Matthieu Baerts wrote: >> https://patchwork.kernel.org/project/netdevbpf/patch/20241016011144.3058445-1-kuba@kernel.org/ >> >> If the impact is important, it might be better to target linux-next >> first, no? > > Thanks for testing! I didn't anticipate it to be so effective. > > Looks like it's not in -next, yet, and we got an Ack from Paul > and co. so I'll toss it into net-next. Thanks! Please note that the 3 issues I mentioned have been fixed somewhere: - MPTCP: patches have been sent to the Netdev ML [1] - Netfilter: patches have been sent to the Netfilter ML [2] - perf (VM shutdown): patches have been applied in perf/urgent [3] [1] https://lore.kernel.org/20241021-net-mptcp-sched-lock-v1-1-637759cf061c@kernel.org [2] https://lore.kernel.org/20241025133230.22491-2-fw@strlen.de [3] https://lore.kernel.org/20240913162340.2142976-1-kan.liang@linux.intel.com Cheers, Matt
Hello: This patch was applied to netdev/net-next.git (main) by Jakub Kicinski <kuba@kernel.org>: On Tue, 15 Oct 2024 18:11:44 -0700 you wrote: > Commit 0aaa8977acbf ("configs: introduce debug.config for CI-like setup") > added CONFIG_PROVE_RCU_LIST=y to the common CI config, > but RCU_EXPERT is not set, and it's a dependency for > CONFIG_PROVE_RCU_LIST=y. Make sure CIs take advantage > of CONFIG_PROVE_RCU_LIST=y, recent fixes in networking > indicate that it does catch bugs. > > [...] Here is the summary with links: - [rcu] configs/debug: make sure PROVE_RCU_LIST=y takes effect https://git.kernel.org/netdev/net-next/c/a3e4bf7f9675 You are awesome, thank you!
diff --git a/kernel/configs/debug.config b/kernel/configs/debug.config index 509ee703de15..20552f163930 100644 --- a/kernel/configs/debug.config +++ b/kernel/configs/debug.config @@ -103,6 +103,7 @@ CONFIG_BUG_ON_DATA_CORRUPTION=y # # RCU Debugging # +CONFIG_RCU_EXPERT=y CONFIG_PROVE_RCU=y CONFIG_PROVE_RCU_LIST=y #
Commit 0aaa8977acbf ("configs: introduce debug.config for CI-like setup") added CONFIG_PROVE_RCU_LIST=y to the common CI config, but RCU_EXPERT is not set, and it's a dependency for CONFIG_PROVE_RCU_LIST=y. Make sure CIs take advantage of CONFIG_PROVE_RCU_LIST=y, recent fixes in networking indicate that it does catch bugs. Signed-off-by: Jakub Kicinski <kuba@kernel.org> --- I'd be slightly tempted to still send it to Linux for v6.12. CC: paulmck@kernel.org CC: frederic@kernel.org CC: neeraj.upadhyay@kernel.org CC: joel@joelfernandes.org CC: rcu@vger.kernel.org CC: linux-kernel@vger.kernel.org CC: kees@kernel.org CC: matttbe@kernel.org --- kernel/configs/debug.config | 1 + 1 file changed, 1 insertion(+)