Message ID | 20221213172429.7774f4ba@gandalf.local.home (mailing list archive) |
---|---|
State | Accepted |
Commit | fb9f5ee9bf5cf7ebc8731a7033e57d98832dc650 |
Series | tracing: Do not synchronize freeing of trigger filter on boot up |
On Tue, 13 Dec 2022 17:24:29 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
>
> If a trigger filter on the kernel command line fails to apply (due to
> syntax error), it will be freed. The freeing will call
> tracepoint_synchronize_unregister(), but this is not needed during early
> boot up, and will even trigger a lockdep splat.
>
> Avoid calling the synchronization function when system_state is
> SYSTEM_BOOTING.

Shouldn't this be done inside tracepoint_synchronize_unregister()?
Then, it will prevent similar warnings if we expand boot time feature.

Thank you,

>
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
>  kernel/trace/trace_events_trigger.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
> index 19ce9d22bfd7..e535959939d3 100644
> --- a/kernel/trace/trace_events_trigger.c
> +++ b/kernel/trace/trace_events_trigger.c
> @@ -1085,8 +1085,14 @@ int set_trigger_filter(char *filter_str,
>  	rcu_assign_pointer(data->filter, filter);
>
>  	if (tmp) {
> -		/* Make sure the call is done with the filter */
> -		tracepoint_synchronize_unregister();
> +		/*
> +		 * Make sure the call is done with the filter.
> +		 * It is possible that a filter could fail at boot up,
> +		 * and then this path will be called. Avoid the synchronization
> +		 * in that case.
> +		 */
> +		if (system_state != SYSTEM_BOOTING)
> +			tracepoint_synchronize_unregister();
>  		free_event_filter(tmp);
>  	}
>
> --
> 2.35.1
>
On Wed, 14 Dec 2022 08:49:54 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> > Avoid calling the synchronization function when system_state is
> > SYSTEM_BOOTING.
>
> Shouldn't this be done inside tracepoint_synchronize_unregister()?
> Then, it will prevent similar warnings if we expand boot time feature.

I thought about that, but because this is a special case, I was thinking
that we don't just gloss over it and have something call this when it
shouldn't be in early boot up. I was surprised when it was called, and then
had to debug it a bit to figure out why it was. When I discovered the path
it took to get there, I realized it was legitimate.

Right now, I only see this one case as an exception. If more pop up that
are correct, then we can switch it over. But right now, I still want
calling tracepoint_synchronize_unregister() to warn if it is called in
early boot up, because that could be a sign that something went wrong.

-- Steve
On Tue, 13 Dec 2022 19:03:21 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Wed, 14 Dec 2022 08:49:54 +0900
> Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
>
> > > Avoid calling the synchronization function when system_state is
> > > SYSTEM_BOOTING.
> >
> > Shouldn't this be done inside tracepoint_synchronize_unregister()?
> > Then, it will prevent similar warnings if we expand boot time feature.
>
> I thought about that, but because this is a special case, I was thinking
> that we don't just gloss over it and have something call this when it
> shouldn't be in early boot up. I was surprised when it was called, and then
> had to debug it a bit to figure out why it was. When I discovered the path
> it took to get there, I realized it was legitimate.
>
> Right now, I only see this one case as an exception. If more pop up that
> are correct, then we can switch it over. But right now, I still want
> calling tracepoint_synchronize_unregister() to warn if it is called in
> early boot up, because that could be a sign that something went wrong.

Thanks, I got it. :)
So when we find more synchronization is involved, we'll revisit it.

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Thank you,

>
> -- Steve
On Wed, Dec 14, 2022 at 08:49:54AM +0900, Masami Hiramatsu wrote:
> On Tue, 13 Dec 2022 17:24:29 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> >
> > If a trigger filter on the kernel command line fails to apply (due to
> > syntax error), it will be freed. The freeing will call
> > tracepoint_synchronize_unregister(), but this is not needed during early
> > boot up, and will even trigger a lockdep splat.
> >
> > Avoid calling the synchronization function when system_state is
> > SYSTEM_BOOTING.
>
> Shouldn't this be done inside tracepoint_synchronize_unregister()?
> Then, it will prevent similar warnings if we expand boot time feature.

How about the following wide-spectrum fix within RCU_LOCKDEP_WARN()?
Just in case there are ever additional issues of this sort?

							Thanx, Paul

------------------------------------------------------------------------

commit d493ffca2df6c1963bd1d7b8f8c652a172f095ae
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Wed Dec 14 11:41:44 2022 -0800

    rcu: Make RCU_LOCKDEP_WARN() avoid early lockdep checks

    Currently, RCU_LOCKDEP_WARN() checks the condition before checking
    to see if lockdep is still enabled.  This is necessary to avoid the
    false-positive splats fixed by commit 3066820034b5dd ("rcu: Reject
    RCU_LOCKDEP_WARN() false positives").  However, the current state can
    result in false-positive splats during early boot before lockdep is
    fully initialized.  This commit therefore checks debug_lockdep_rcu_enabled()
    both before and after checking the condition, thus avoiding both sets
    of false-positive error reports.

    Reported-by: Steven Rostedt <rostedt@goodmis.org>
    Reported-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index aa1b1c3546d7a..1aec1d53b0c91 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -364,11 +364,18 @@ static inline int debug_lockdep_rcu_enabled(void)
  * RCU_LOCKDEP_WARN - emit lockdep splat if specified condition is met
  * @c: condition to check
  * @s: informative message
+ *
+ * This checks debug_lockdep_rcu_enabled() before checking (c) to
+ * prevent early boot splats due to lockdep not yet being initialized,
+ * and rechecks it after checking (c) to prevent false-positive splats
+ * due to races with lockdep being disabled.  See commit 3066820034b5dd
+ * ("rcu: Reject RCU_LOCKDEP_WARN() false positives") for more detail.
  */
 #define RCU_LOCKDEP_WARN(c, s)						\
 	do {								\
 		static bool __section(".data.unlikely") __warned;	\
-		if ((c) && debug_lockdep_rcu_enabled() && !__warned) {	\
+		if (debug_lockdep_rcu_enabled() && (c) &&		\
+		    debug_lockdep_rcu_enabled() && !__warned) {		\
 			__warned = true;				\
 			lockdep_rcu_suspicious(__FILE__, __LINE__, s);	\
 		}							\
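A note on why the evaluation order inside the macro matters: the condition passed as (c) is usually itself a lockdep query, so evaluating (c) before the debug_lockdep_rcu_enabled() gate means consulting lockdep state that may not be initialized yet during early boot. The sketch below is a simplified, from-memory illustration loosely modeled on the check done by synchronize_rcu(); the function name is made up for the example and is not kernel code.

/*
 * Hedged sketch of a typical RCU_LOCKDEP_WARN() caller. The condition
 * itself queries lockdep via lock_is_held(), which is why the
 * debug_lockdep_rcu_enabled() gate needs to run before it.
 */
static void example_forbidden_in_rcu_reader(void)
{
	RCU_LOCKDEP_WARN(lock_is_held(&rcu_lock_map),
			 "illegal use within an RCU read-side critical section");
	/* ... the actual work goes here ... */
}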
On 2022-12-14 15:03, Paul E. McKenney wrote:
> On Wed, Dec 14, 2022 at 08:49:54AM +0900, Masami Hiramatsu wrote:
>> On Tue, 13 Dec 2022 17:24:29 -0500
>> Steven Rostedt <rostedt@goodmis.org> wrote:
>>
>>> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
>>>
>>> If a trigger filter on the kernel command line fails to apply (due to
>>> syntax error), it will be freed. The freeing will call
>>> tracepoint_synchronize_unregister(), but this is not needed during early
>>> boot up, and will even trigger a lockdep splat.
>>>
>>> Avoid calling the synchronization function when system_state is
>>> SYSTEM_BOOTING.
>>
>> Shouldn't this be done inside tracepoint_synchronize_unregister()?
>> Then, it will prevent similar warnings if we expand boot time feature.
>
> How about the following wide-spectrum fix within RCU_LOCKDEP_WARN()?
> Just in case there are ever additional issues of this sort?

Hi Paul,

Your approach makes sense. Thanks for looking into this.

Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

>
> 							Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit d493ffca2df6c1963bd1d7b8f8c652a172f095ae
> Author: Paul E. McKenney <paulmck@kernel.org>
> Date:   Wed Dec 14 11:41:44 2022 -0800
>
>     rcu: Make RCU_LOCKDEP_WARN() avoid early lockdep checks
>
>     Currently, RCU_LOCKDEP_WARN() checks the condition before checking
>     to see if lockdep is still enabled.  This is necessary to avoid the
>     false-positive splats fixed by commit 3066820034b5dd ("rcu: Reject
>     RCU_LOCKDEP_WARN() false positives").  However, the current state can
>     result in false-positive splats during early boot before lockdep is fully
>     initialized.  This commit therefore checks debug_lockdep_rcu_enabled()
>     both before and after checking the condition, thus avoiding both sets
>     of false-positive error reports.
>
>     Reported-by: Steven Rostedt <rostedt@goodmis.org>
>     Reported-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
>     Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>     Cc: Boqun Feng <boqun.feng@gmail.com>
>     Cc: Matthew Wilcox <willy@infradead.org>
>     Cc: Thomas Gleixner <tglx@linutronix.de>
>
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index aa1b1c3546d7a..1aec1d53b0c91 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -364,11 +364,18 @@ static inline int debug_lockdep_rcu_enabled(void)
>   * RCU_LOCKDEP_WARN - emit lockdep splat if specified condition is met
>   * @c: condition to check
>   * @s: informative message
> + *
> + * This checks debug_lockdep_rcu_enabled() before checking (c) to
> + * prevent early boot splats due to lockdep not yet being initialized,
> + * and rechecks it after checking (c) to prevent false-positive splats
> + * due to races with lockdep being disabled.  See commit 3066820034b5dd
> + * ("rcu: Reject RCU_LOCKDEP_WARN() false positives") for more detail.
>   */
>  #define RCU_LOCKDEP_WARN(c, s)						\
>  	do {								\
>  		static bool __section(".data.unlikely") __warned;	\
> -		if ((c) && debug_lockdep_rcu_enabled() && !__warned) {	\
> +		if (debug_lockdep_rcu_enabled() && (c) &&		\
> +		    debug_lockdep_rcu_enabled() && !__warned) {		\
>  			__warned = true;				\
>  			lockdep_rcu_suspicious(__FILE__, __LINE__, s);	\
>  		}							\
On Wed, Dec 14, 2022 at 03:37:13PM -0500, Mathieu Desnoyers wrote:
> On 2022-12-14 15:03, Paul E. McKenney wrote:
> > On Wed, Dec 14, 2022 at 08:49:54AM +0900, Masami Hiramatsu wrote:
> > > On Tue, 13 Dec 2022 17:24:29 -0500
> > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > >
> > > > From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> > > >
> > > > If a trigger filter on the kernel command line fails to apply (due to
> > > > syntax error), it will be freed. The freeing will call
> > > > tracepoint_synchronize_unregister(), but this is not needed during early
> > > > boot up, and will even trigger a lockdep splat.
> > > >
> > > > Avoid calling the synchronization function when system_state is
> > > > SYSTEM_BOOTING.
> > >
> > > Shouldn't this be done inside tracepoint_synchronize_unregister()?
> > > Then, it will prevent similar warnings if we expand boot time feature.
> >
> > How about the following wide-spectrum fix within RCU_LOCKDEP_WARN()?
> > Just in case there are ever additional issues of this sort?
>
> Hi Paul,
>
> Your approach makes sense. Thanks for looking into this.
>
> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

Thank you, I will apply this on my next rebase.

							Thanx, Paul

> > ------------------------------------------------------------------------
> >
> > commit d493ffca2df6c1963bd1d7b8f8c652a172f095ae
> > Author: Paul E. McKenney <paulmck@kernel.org>
> > Date:   Wed Dec 14 11:41:44 2022 -0800
> >
> >     rcu: Make RCU_LOCKDEP_WARN() avoid early lockdep checks
> >
> >     Currently, RCU_LOCKDEP_WARN() checks the condition before checking
> >     to see if lockdep is still enabled.  This is necessary to avoid the
> >     false-positive splats fixed by commit 3066820034b5dd ("rcu: Reject
> >     RCU_LOCKDEP_WARN() false positives").  However, the current state can
> >     result in false-positive splats during early boot before lockdep is fully
> >     initialized.  This commit therefore checks debug_lockdep_rcu_enabled()
> >     both before and after checking the condition, thus avoiding both sets
> >     of false-positive error reports.
> >
> >     Reported-by: Steven Rostedt <rostedt@goodmis.org>
> >     Reported-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> >     Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> >     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> >     Cc: Boqun Feng <boqun.feng@gmail.com>
> >     Cc: Matthew Wilcox <willy@infradead.org>
> >     Cc: Thomas Gleixner <tglx@linutronix.de>
> >
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index aa1b1c3546d7a..1aec1d53b0c91 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -364,11 +364,18 @@ static inline int debug_lockdep_rcu_enabled(void)
> >   * RCU_LOCKDEP_WARN - emit lockdep splat if specified condition is met
> >   * @c: condition to check
> >   * @s: informative message
> > + *
> > + * This checks debug_lockdep_rcu_enabled() before checking (c) to
> > + * prevent early boot splats due to lockdep not yet being initialized,
> > + * and rechecks it after checking (c) to prevent false-positive splats
> > + * due to races with lockdep being disabled.  See commit 3066820034b5dd
> > + * ("rcu: Reject RCU_LOCKDEP_WARN() false positives") for more detail.
> >   */
> >  #define RCU_LOCKDEP_WARN(c, s)						\
> >  	do {								\
> >  		static bool __section(".data.unlikely") __warned;	\
> > -		if ((c) && debug_lockdep_rcu_enabled() && !__warned) {	\
> > +		if (debug_lockdep_rcu_enabled() && (c) &&		\
> > +		    debug_lockdep_rcu_enabled() && !__warned) {		\
> >  			__warned = true;				\
> >  			lockdep_rcu_suspicious(__FILE__, __LINE__, s);	\
> >  		}							\
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
>
On Wed, 14 Dec 2022 12:03:33 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> > > Avoid calling the synchronization function when system_state is
> > > SYSTEM_BOOTING.
> >
> > Shouldn't this be done inside tracepoint_synchronize_unregister()?
> > Then, it will prevent similar warnings if we expand boot time feature.
>
> How about the following wide-spectrum fix within RCU_LOCKDEP_WARN()?
> Just in case there are ever additional issues of this sort?

Adding more tracing command line parameters is triggering this more. I now
hit:

[    0.213442] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[    0.214653] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
[    0.232023] ------------[ cut here ]------------
[    0.232704] WARNING: CPU: 0 PID: 0 at kernel/rcu/tree.c:1410 rcu_poll_gp_seq_start_unlocked+0x8a/0xa0
[    0.234241] Modules linked in:
[    0.234715] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.0-rc6-test-00046-ga0ca17bfe724-dirty #63
[    0.236069] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
[    0.237549] RIP: 0010:rcu_poll_gp_seq_start_unlocked+0x8a/0xa0
[    0.238495] Code: 02 0f 0b 48 89 ee 5b 48 c7 c7 00 ca b1 b3 5d e9 cc 0f c1 00 65 8b 05 85 f3 dd 4d 85 c0 75 a1 65 8b 05 d6 f0 dd 4d 85 c0 75 96 <0f> 0b eb 92 5b 5d e9 eb 18 fc 00 66 66 2e 0f 1f 84 00 00 00 00 00
[    0.241433] RSP: 0000:ffffffffb3a03d28 EFLAGS: 00010046
[    0.242289] RAX: 0000000000000000 RBX: ffffffffb3bcfc48 RCX: 0000000000000001
[    0.243355] RDX: 0000000080000001 RSI: 0000000000000001 RDI: ffffffffb3bcfc48
[    0.244421] RBP: 0000000000000000 R08: ffff97a3c012b8f8 R09: fffffffffffffffe
[    0.245569] R10: ffffffffb3a03e78 R11: 00000000ffffffff R12: 0000000000000040
[    0.246752] R13: ffffffffb3a30500 R14: ffffffffb3be6180 R15: ffff97a3c0355d10
[    0.247826] FS:  0000000000000000(0000) GS:ffff97a537c00000(0000) knlGS:0000000000000000
[    0.249084] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.250012] CR2: ffff97a51e201000 CR3: 000000025c826001 CR4: 00000000000606b0
[    0.251122] Call Trace:
[    0.251491]  <TASK>
[    0.251810]  synchronize_rcu+0xb6/0x1e0
[    0.252393]  ? smp_call_function_many_cond+0x36d/0x3b0
[    0.253243]  ? enable_trace_buffered_event+0x10/0x10
[    0.254021]  trace_buffered_event_disable+0x7c/0x140
[    0.254786]  __ftrace_event_enable_disable+0x11c/0x250
[    0.255559]  __ftrace_set_clr_event_nolock+0xe3/0x130
[    0.256321]  ftrace_set_clr_event+0x9a/0xe0
[    0.256976]  early_enable_events+0x64/0xa8
[    0.257647]  trace_event_init+0xd6/0x2dd
[    0.258305]  start_kernel+0x760/0x9e0
[    0.258862]  secondary_startup_64_no_verify+0xe5/0xeb
[    0.259634]  </TASK>
[    0.259966] irq event stamp: 0
[    0.260427] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[    0.261448] hardirqs last disabled at (0): [<0000000000000000>] 0x0
[    0.262424] softirqs last  enabled at (0): [<0000000000000000>] 0x0
[    0.263402] softirqs last disabled at (0): [<0000000000000000>] 0x0
[    0.264345] ---[ end trace 0000000000000000 ]---
[    0.265088] ------------[ cut here ]------------

Where kernel/rcu/tree.c:1410 is at:

// Make the polled API aware of the beginning of a grace period, but
// where caller does not hold the root rcu_node structure's lock.
static void rcu_poll_gp_seq_start_unlocked(unsigned long *snap)
{
	unsigned long flags;
	struct rcu_node *rnp = rcu_get_root();

	if (rcu_init_invoked()) {
		lockdep_assert_irqs_enabled();    <<-------------- 1410
		raw_spin_lock_irqsave_rcu_node(rnp, flags);
	}
	rcu_poll_gp_seq_start(snap);
	if (rcu_init_invoked())
		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
}

-- Steve
On Thu, Dec 15, 2022 at 10:02:41AM -0500, Steven Rostedt wrote:
> On Wed, 14 Dec 2022 12:03:33 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
>
> > > > Avoid calling the synchronization function when system_state is
> > > > SYSTEM_BOOTING.
> > >
> > > Shouldn't this be done inside tracepoint_synchronize_unregister()?
> > > Then, it will prevent similar warnings if we expand boot time feature.
> >
> > How about the following wide-spectrum fix within RCU_LOCKDEP_WARN()?
> > Just in case there are ever additional issues of this sort?
>
> Adding more tracing command line parameters is triggering this more. I now
> hit:

Fair point, and apologies for the hassle.

Any chance of getting an official "it is now late enough in boot to
safely invoke lockdep" API?  ;-)

In the meantime, does the (untested and quite crude) patch at the end
of this message help?

							Thanx, Paul

> [    0.213442] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
> [    0.214653] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
> [    0.232023] ------------[ cut here ]------------
> [    0.232704] WARNING: CPU: 0 PID: 0 at kernel/rcu/tree.c:1410 rcu_poll_gp_seq_start_unlocked+0x8a/0xa0
> [    0.234241] Modules linked in:
> [    0.234715] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.0-rc6-test-00046-ga0ca17bfe724-dirty #63
> [    0.236069] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
> [    0.237549] RIP: 0010:rcu_poll_gp_seq_start_unlocked+0x8a/0xa0
> [    0.238495] Code: 02 0f 0b 48 89 ee 5b 48 c7 c7 00 ca b1 b3 5d e9 cc 0f c1 00 65 8b 05 85 f3 dd 4d 85 c0 75 a1 65 8b 05 d6 f0 dd 4d 85 c0 75 96 <0f> 0b eb 92 5b 5d e9 eb 18 fc 00 66 66 2e 0f 1f 84 00 00 00 00 00
> [    0.241433] RSP: 0000:ffffffffb3a03d28 EFLAGS: 00010046
> [    0.242289] RAX: 0000000000000000 RBX: ffffffffb3bcfc48 RCX: 0000000000000001
> [    0.243355] RDX: 0000000080000001 RSI: 0000000000000001 RDI: ffffffffb3bcfc48
> [    0.244421] RBP: 0000000000000000 R08: ffff97a3c012b8f8 R09: fffffffffffffffe
> [    0.245569] R10: ffffffffb3a03e78 R11: 00000000ffffffff R12: 0000000000000040
> [    0.246752] R13: ffffffffb3a30500 R14: ffffffffb3be6180 R15: ffff97a3c0355d10
> [    0.247826] FS:  0000000000000000(0000) GS:ffff97a537c00000(0000) knlGS:0000000000000000
> [    0.249084] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.250012] CR2: ffff97a51e201000 CR3: 000000025c826001 CR4: 00000000000606b0
> [    0.251122] Call Trace:
> [    0.251491]  <TASK>
> [    0.251810]  synchronize_rcu+0xb6/0x1e0
> [    0.252393]  ? smp_call_function_many_cond+0x36d/0x3b0
> [    0.253243]  ? enable_trace_buffered_event+0x10/0x10
> [    0.254021]  trace_buffered_event_disable+0x7c/0x140
> [    0.254786]  __ftrace_event_enable_disable+0x11c/0x250
> [    0.255559]  __ftrace_set_clr_event_nolock+0xe3/0x130
> [    0.256321]  ftrace_set_clr_event+0x9a/0xe0
> [    0.256976]  early_enable_events+0x64/0xa8
> [    0.257647]  trace_event_init+0xd6/0x2dd
> [    0.258305]  start_kernel+0x760/0x9e0
> [    0.258862]  secondary_startup_64_no_verify+0xe5/0xeb
> [    0.259634]  </TASK>
> [    0.259966] irq event stamp: 0
> [    0.260427] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> [    0.261448] hardirqs last disabled at (0): [<0000000000000000>] 0x0
> [    0.262424] softirqs last  enabled at (0): [<0000000000000000>] 0x0
> [    0.263402] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [    0.264345] ---[ end trace 0000000000000000 ]---
> [    0.265088] ------------[ cut here ]------------
>
> Where kernel/rcu/tree.c:1410 is at:
>
> // Make the polled API aware of the beginning of a grace period, but
> // where caller does not hold the root rcu_node structure's lock.
> static void rcu_poll_gp_seq_start_unlocked(unsigned long *snap)
> {
> 	unsigned long flags;
> 	struct rcu_node *rnp = rcu_get_root();
>
> 	if (rcu_init_invoked()) {
> 		lockdep_assert_irqs_enabled();    <<-------------- 1410
> 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
> 	}
> 	rcu_poll_gp_seq_start(snap);
> 	if (rcu_init_invoked())
> 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> }

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index ee8a6a711719a..af8c99920e1a4 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -151,7 +151,6 @@ static void sync_sched_exp_online_cleanup(int cpu);
 static void check_cb_ovld_locked(struct rcu_data *rdp, struct rcu_node *rnp);
 static bool rcu_rdp_is_offloaded(struct rcu_data *rdp);
 static bool rcu_rdp_cpu_online(struct rcu_data *rdp);
-static bool rcu_init_invoked(void);
 static void rcu_cleanup_dead_rnp(struct rcu_node *rnp_leaf);
 static void rcu_init_new_rnp(struct rcu_node *rnp_leaf);
@@ -1314,7 +1313,7 @@ static void rcu_poll_gp_seq_start(unsigned long *snap)
 {
 	struct rcu_node *rnp = rcu_get_root();
 
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_lockdep_assert_held_rcu_node(rnp);
 
 	// If RCU was idle, note beginning of GP.
@@ -1330,7 +1329,7 @@ static void rcu_poll_gp_seq_end(unsigned long *snap)
 {
 	struct rcu_node *rnp = rcu_get_root();
 
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_lockdep_assert_held_rcu_node(rnp);
 
 	// If the previously noted GP is still in effect, record the
@@ -1352,12 +1351,12 @@ static void rcu_poll_gp_seq_start_unlocked(unsigned long *snap)
 	unsigned long flags;
 	struct rcu_node *rnp = rcu_get_root();
 
-	if (rcu_init_invoked()) {
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE) {
 		lockdep_assert_irqs_enabled();
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
 	rcu_poll_gp_seq_start(snap);
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 }
 
@@ -1368,12 +1367,12 @@ static void rcu_poll_gp_seq_end_unlocked(unsigned long *snap)
 	unsigned long flags;
 	struct rcu_node *rnp = rcu_get_root();
 
-	if (rcu_init_invoked()) {
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE) {
 		lockdep_assert_irqs_enabled();
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
 	rcu_poll_gp_seq_end(snap);
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 }
 
@@ -4042,13 +4041,6 @@ EXPORT_SYMBOL_GPL(rcu_lockdep_current_cpu_online);
 
 #endif /* #if defined(CONFIG_PROVE_RCU) && defined(CONFIG_HOTPLUG_CPU) */
 
-// Has rcu_init() been invoked?  This is used (for example) to determine
-// whether spinlocks may be acquired safely.
-static bool rcu_init_invoked(void)
-{
-	return !!rcu_state.n_online_cpus;
-}
-
 /*
  * Near the end of the offline process.  Trace the fact that this CPU
  * is going offline.
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index ed6c3cce28f23..26146530d48f9 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -1019,14 +1019,14 @@ unsigned long start_poll_synchronize_rcu_expedited(void)
 	s = get_state_synchronize_rcu();
 	rdp = per_cpu_ptr(&rcu_data, raw_smp_processor_id());
 	rnp = rdp->mynode;
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_spin_lock_irqsave(&rnp->exp_poll_lock, flags);
 	if (!poll_state_synchronize_rcu(s)) {
 		rnp->exp_seq_poll_rq = s;
-		if (rcu_init_invoked())
+		if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 			queue_work(rcu_gp_wq, &rnp->exp_poll_wq);
 	}
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_spin_unlock_irqrestore(&rnp->exp_poll_lock, flags);
 
 	return s;
On Thu, Dec 15, 2022 at 09:02:56AM -0800, Paul E. McKenney wrote:
> On Thu, Dec 15, 2022 at 10:02:41AM -0500, Steven Rostedt wrote:
> > On Wed, 14 Dec 2022 12:03:33 -0800
> > "Paul E. McKenney" <paulmck@kernel.org> wrote:
> >
> > > > > Avoid calling the synchronization function when system_state is
> > > > > SYSTEM_BOOTING.
> > > >
> > > > Shouldn't this be done inside tracepoint_synchronize_unregister()?
> > > > Then, it will prevent similar warnings if we expand boot time feature.
> > >
> > > How about the following wide-spectrum fix within RCU_LOCKDEP_WARN()?
> > > Just in case there are ever additional issues of this sort?
> >
> > Adding more tracing command line parameters is triggering this more. I now
> > hit:
>
> Fair point, and apologies for the hassle.
>
> Any chance of getting an official "it is now late enough in boot to
> safely invoke lockdep" API?  ;-)
>
> In the meantime, does the (untested and quite crude) patch at the end
> of this message help?

OK, I was clearly not yet awake.  :-/

The more targeted (but still untested) patch below might be more
appropriate...

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index ee8a6a711719a..f627888715dca 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1314,7 +1314,7 @@ static void rcu_poll_gp_seq_start(unsigned long *snap)
 {
 	struct rcu_node *rnp = rcu_get_root();
 
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_lockdep_assert_held_rcu_node(rnp);
 
 	// If RCU was idle, note beginning of GP.
@@ -1330,7 +1330,7 @@ static void rcu_poll_gp_seq_end(unsigned long *snap)
 {
 	struct rcu_node *rnp = rcu_get_root();
 
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_lockdep_assert_held_rcu_node(rnp);
 
 	// If the previously noted GP is still in effect, record the
@@ -1353,7 +1353,8 @@ static void rcu_poll_gp_seq_start_unlocked(unsigned long *snap)
 	struct rcu_node *rnp = rcu_get_root();
 
 	if (rcu_init_invoked()) {
-		lockdep_assert_irqs_enabled();
+		if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
+			lockdep_assert_irqs_enabled();
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
 	rcu_poll_gp_seq_start(snap);
@@ -1369,7 +1370,8 @@ static void rcu_poll_gp_seq_end_unlocked(unsigned long *snap)
 	struct rcu_node *rnp = rcu_get_root();
 
 	if (rcu_init_invoked()) {
-		lockdep_assert_irqs_enabled();
+		if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
+			lockdep_assert_irqs_enabled();
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
 	rcu_poll_gp_seq_end(snap);
On Thu, 15 Dec 2022 09:02:56 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> On Thu, Dec 15, 2022 at 10:02:41AM -0500, Steven Rostedt wrote:
> > On Wed, 14 Dec 2022 12:03:33 -0800
> > "Paul E. McKenney" <paulmck@kernel.org> wrote:
> >
> > > > > Avoid calling the synchronization function when system_state is
> > > > > SYSTEM_BOOTING.
> > > >
> > > > Shouldn't this be done inside tracepoint_synchronize_unregister()?
> > > > Then, it will prevent similar warnings if we expand boot time feature.
> > >
> > > How about the following wide-spectrum fix within RCU_LOCKDEP_WARN()?
> > > Just in case there are ever additional issues of this sort?
> >
> > Adding more tracing command line parameters is triggering this more. I now
> > hit:
>
> Fair point, and apologies for the hassle.
>
> Any chance of getting an official "it is now late enough in boot to
> safely invoke lockdep" API?  ;-)

lockdep API isn't the problem, it's that we are still in the earlyboot stage
where interrupts are disabled, and you can't enable them. Lockdep works
just fine there, and is reporting interrupts being disabled correctly. The
backtrace reported *does* have interrupts disabled.

The thing is, because we are still running on a single CPU with interrupts
disabled, there is no need for synchronization. Even grabbing a mutex is
safe because there's guaranteed to be no contention (unless it's not
released). This is why a lot of warnings are suppressed if system_state is
SYSTEM_BOOTING.

> In the meantime, does the (untested and quite crude) patch at the end
> of this message help?

I'll go and test it, but I'm guessing it will work fine. You could also test
against system_state != SYSTEM_BOOTING, as that gets cleared just before
kernel_init() can continue (it waits for the complete() that is called
after system_state is set to SYSTEM_SCHEDULING). Which happens shortly
after rcu_scheduler_starting().

I wonder if you could even replace RCU_SCHEDULER_RUNNING with
system_state != SYSTEM_BOOTING, and remove the rcu_scheduler_starting()
call.

-- Steve
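The idiom Steve describes here is the same one the tracing patch at the top of this thread uses: skip run-time synchronization while the kernel is still single-CPU with interrupts off, because no concurrent reader can exist yet. Below is a minimal sketch of that idiom; struct foo and replace_and_free() are hypothetical names invented for the example, not kernel code.

struct foo {
	int value;
};

static void replace_and_free(struct foo *old)
{
	/*
	 * During early boot there is one CPU, interrupts are disabled,
	 * and no other task can be inside an RCU read-side critical
	 * section, so the grace-period wait (which would splat) is
	 * unnecessary.
	 */
	if (system_state != SYSTEM_BOOTING)
		synchronize_rcu();
	kfree(old);
}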
On Thu, Dec 15, 2022 at 01:51:02PM -0500, Steven Rostedt wrote:
> On Thu, 15 Dec 2022 09:02:56 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
>
> > On Thu, Dec 15, 2022 at 10:02:41AM -0500, Steven Rostedt wrote:
> > > On Wed, 14 Dec 2022 12:03:33 -0800
> > > "Paul E. McKenney" <paulmck@kernel.org> wrote:
> > >
> > > > > > Avoid calling the synchronization function when system_state is
> > > > > > SYSTEM_BOOTING.
> > > > >
> > > > > Shouldn't this be done inside tracepoint_synchronize_unregister()?
> > > > > Then, it will prevent similar warnings if we expand boot time feature.
> > > >
> > > > How about the following wide-spectrum fix within RCU_LOCKDEP_WARN()?
> > > > Just in case there are ever additional issues of this sort?
> > >
> > > Adding more tracing command line parameters is triggering this more. I now
> > > hit:
> >
> > Fair point, and apologies for the hassle.
> >
> > Any chance of getting an official "it is now late enough in boot to
> > safely invoke lockdep" API?  ;-)
>
> lockdep API isn't the problem, it's that we are still in the earlyboot stage
> where interrupts are disabled, and you can't enable them. Lockdep works
> just fine there, and is reporting interrupts being disabled correctly. The
> backtrace reported *does* have interrupts disabled.
>
> The thing is, because we are still running on a single CPU with interrupts
> disabled, there is no need for synchronization. Even grabbing a mutex is
> safe because there's guaranteed to be no contention (unless it's not
> released). This is why a lot of warnings are suppressed if system_state is
> SYSTEM_BOOTING.
>
> > In the meantime, does the (untested and quite crude) patch at the end
> > of this message help?
>
> I'll go and test it, but I'm guessing it will work fine. You could also test
> against system_state != SYSTEM_BOOTING, as that gets cleared just before
> kernel_init() can continue (it waits for the complete() that is called
> after system_state is set to SYSTEM_SCHEDULING). Which happens shortly
> after rcu_scheduler_starting().
>
> I wonder if you could even replace RCU_SCHEDULER_RUNNING with
> system_state != SYSTEM_BOOTING, and remove the rcu_scheduler_starting()
> call.

In this particular case, agreed, I could use system_state.  But there
are other cases that must change behavior as soon as preemption can
happen, which is upon return from that call to user_mode_thread().
The update to system_state doesn't happen until much later.  So I
don't get to remove that rcu_scheduler_starting() call.

What case?

Here is one:

o	The newly spawned init process does something that uses RCU,
	but is preempted while holding rcu_read_lock().

o	The boot thread, which did the preempting, waits for a grace
	period.  If we use rcu_scheduler_active, all is well because
	synchronize_rcu() will do a real run-time grace period, thus
	waiting for that reader.

	But system_state has not yet been updated, so if synchronize_rcu()
	were instead to pay attention to that one, there might be a
	tragically too-short RCU grace period.

Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

commit 876c5ac113fa66a64fa241e69d9a2251b8daa5ee
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu Dec 15 09:26:09 2022 -0800

    rcu: Don't assert interrupts enabled too early in boot

    The rcu_poll_gp_seq_end() and rcu_poll_gp_seq_end_unlocked() both check
    that interrupts are enabled, as they normally should be when waiting for
    an RCU grace period.  Except that it is legal to wait for grace periods
    during early boot, before interrupts have been enabled for the first
    time, and polling for grace periods is required to work during this time.
    This can result in false-positive lockdep splats in the presence of
    boot-time-initiated tracing.

    This commit therefore conditions those interrupts-enabled checks on
    rcu_scheduler_active having advanced past RCU_SCHEDULER_INACTIVE, by
    which time interrupts have been enabled.

    Reported-by: Steven Rostedt <rostedt@goodmis.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index ee8a6a711719a..f627888715dca 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1314,7 +1314,7 @@ static void rcu_poll_gp_seq_start(unsigned long *snap)
 {
 	struct rcu_node *rnp = rcu_get_root();
 
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_lockdep_assert_held_rcu_node(rnp);
 
 	// If RCU was idle, note beginning of GP.
@@ -1330,7 +1330,7 @@ static void rcu_poll_gp_seq_end(unsigned long *snap)
 {
 	struct rcu_node *rnp = rcu_get_root();
 
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_lockdep_assert_held_rcu_node(rnp);
 
 	// If the previously noted GP is still in effect, record the
@@ -1353,7 +1353,8 @@ static void rcu_poll_gp_seq_start_unlocked(unsigned long *snap)
 	struct rcu_node *rnp = rcu_get_root();
 
 	if (rcu_init_invoked()) {
-		lockdep_assert_irqs_enabled();
+		if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
+			lockdep_assert_irqs_enabled();
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
 	rcu_poll_gp_seq_start(snap);
@@ -1369,7 +1370,8 @@ static void rcu_poll_gp_seq_end_unlocked(unsigned long *snap)
 	struct rcu_node *rnp = rcu_get_root();
 
 	if (rcu_init_invoked()) {
-		lockdep_assert_irqs_enabled();
+		if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
+			lockdep_assert_irqs_enabled();
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
 	rcu_poll_gp_seq_end(snap);
On Thu, 15 Dec 2022 11:01:58 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> What case?
>
> Here is one:
>
> o	The newly spawned init process does something that uses RCU,
> 	but is preempted while holding rcu_read_lock().
>
> o	The boot thread, which did the preempting, waits for a grace
> 	period.  If we use rcu_scheduler_active, all is well because
> 	synchronize_rcu() will do a real run-time grace period, thus
> 	waiting for that reader.
>
> 	But system_state has not yet been updated, so if synchronize_rcu()
> 	were instead to pay attention to that one, there might be a
> 	tragically too-short RCU grace period.

The thing is, preemption is disabled the entire time here.

That is, from:

void kthread_show_list(void);
noinline void __ref rest_init(void)
{
	struct task_struct *tsk;
	int pid;

	rcu_scheduler_starting();

through:

	system_state = SYSTEM_SCHEDULING;

	complete(&kthreadd_done);

Preemption is disabled and other CPUs have not even been started yet.
Although the might_sleep() call might schedule the kernel_init() task but
that will only block on the completion.

In other words, I don't think anything can cause any issues this early in
the boot up.

-- Steve
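To make the ordering argument easier to follow, here is the same rest_init() window annotated with where each of the two state variables changes. This is an approximate reading of the code quoted above (and of rcu_scheduler_starting()) for the kernels under discussion, not an exact copy of either.

noinline void __ref rest_init(void)
{
	/* Preemption stays disabled across this whole window. */
	rcu_scheduler_starting();	/* rcu_scheduler_active leaves RCU_SCHEDULER_INACTIVE here */
	/* ... user_mode_thread() spawns the init task: the first other runnable task now exists ... */
	/* ... kthreadd is spawned ... */
	system_state = SYSTEM_SCHEDULING;	/* only now does system_state move past SYSTEM_BOOTING */
	complete(&kthreadd_done);
	/* ... */
}

This is the gap Paul is pointing at: rcu_scheduler_active has already advanced before the init task exists, while system_state stays at SYSTEM_BOOTING until after it does.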
On Thu, Dec 15, 2022 at 05:39:13PM -0500, Steven Rostedt wrote:
> On Thu, 15 Dec 2022 11:01:58 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
>
> > What case?
> >
> > Here is one:
> >
> > o	The newly spawned init process does something that uses RCU,
> > 	but is preempted while holding rcu_read_lock().
> >
> > o	The boot thread, which did the preempting, waits for a grace
> > 	period.  If we use rcu_scheduler_active, all is well because
> > 	synchronize_rcu() will do a real run-time grace period, thus
> > 	waiting for that reader.
> >
> > 	But system_state has not yet been updated, so if synchronize_rcu()
> > 	were instead to pay attention to that one, there might be a
> > 	tragically too-short RCU grace period.
>
> The thing is, preemption is disabled the entire time here.
>
> That is, from:
>
> void kthread_show_list(void);
> noinline void __ref rest_init(void)
> {
> 	struct task_struct *tsk;
> 	int pid;
>
> 	rcu_scheduler_starting();
>
> through:
>
> 	system_state = SYSTEM_SCHEDULING;
>
> 	complete(&kthreadd_done);
>
> Preemption is disabled and other CPUs have not even been started yet.
> Although the might_sleep() call might schedule the kernel_init() task but
> that will only block on the completion.
>
> In other words, I don't think anything can cause any issues this early in
> the boot up.

The nice thing about the current placement of rcu_scheduler_starting()
is that there is not yet any other task to switch to.  ;-)

							Thanx, Paul
On Thu, 15 Dec 2022 15:10:27 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> The nice thing about the current placement of rcu_scheduler_starting()
> is that there is not yet any other task to switch to.  ;-)

Fair enough. Anyway the last patch appears to do the job.

Reported-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>

-- Steve
On Thu, Dec 15, 2022 at 06:42:02PM -0500, Steven Rostedt wrote:
> On Thu, 15 Dec 2022 15:10:27 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
>
> > The nice thing about the current placement of rcu_scheduler_starting()
> > is that there is not yet any other task to switch to.  ;-)
>
> Fair enough. Anyway the last patch appears to do the job.
>
> Reported-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>

Applied, thank you!

							Thanx, Paul
On Thu, Dec 15, 2022 at 05:01:18PM -0800, Paul E. McKenney wrote:
> On Thu, Dec 15, 2022 at 06:42:02PM -0500, Steven Rostedt wrote:
> > On Thu, 15 Dec 2022 15:10:27 -0800
> > "Paul E. McKenney" <paulmck@kernel.org> wrote:
> >
> > > The nice thing about the current placement of rcu_scheduler_starting()
> > > is that there is not yet any other task to switch to.  ;-)
> >
> > Fair enough. Anyway the last patch appears to do the job.
> >
> > Reported-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> > Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
>
> Applied, thank you!

And I should have checked up on urgency.  Normal process would hold this
one until the v6.3 merge window, but please let me know if you would
like it in v6.2.  It is after all a bug fix.

							Thanx, Paul
On Thu, 15 Dec 2022 20:40:22 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> On Thu, Dec 15, 2022 at 05:01:18PM -0800, Paul E. McKenney wrote:
> > On Thu, Dec 15, 2022 at 06:42:02PM -0500, Steven Rostedt wrote:
> > > On Thu, 15 Dec 2022 15:10:27 -0800
> > > "Paul E. McKenney" <paulmck@kernel.org> wrote:
> > >
> > > > The nice thing about the current placement of rcu_scheduler_starting()
> > > > is that there is not yet any other task to switch to.  ;-)
> > >
> > > Fair enough. Anyway the last patch appears to do the job.
> > >
> > > Reported-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> > > Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> >
> > Applied, thank you!
>
> And I should have checked up on urgency.  Normal process would hold this
> one until the v6.3 merge window, but please let me know if you would
> like it in v6.2.  It is after all a bug fix.
>

Yes, any fix that fixes a lockdep splat is considered urgent and should be
pushed ASAP. As it makes finding other lockdep splats more difficult (as
the first splat shuts down lockdep).

Please send this during the merge window, as I have to carry it internally
until it makes it into mainline.

Thanks!

-- Steve
On Fri, Dec 16, 2022 at 06:08:10AM -0500, Steven Rostedt wrote:
> On Thu, 15 Dec 2022 20:40:22 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
>
> > On Thu, Dec 15, 2022 at 05:01:18PM -0800, Paul E. McKenney wrote:
> > > On Thu, Dec 15, 2022 at 06:42:02PM -0500, Steven Rostedt wrote:
> > > > On Thu, 15 Dec 2022 15:10:27 -0800
> > > > "Paul E. McKenney" <paulmck@kernel.org> wrote:
> > > >
> > > > > The nice thing about the current placement of rcu_scheduler_starting()
> > > > > is that there is not yet any other task to switch to.  ;-)
> > > >
> > > > Fair enough. Anyway the last patch appears to do the job.
> > > >
> > > > Reported-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> > > > Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> > >
> > > Applied, thank you!
> >
> > And I should have checked up on urgency.  Normal process would hold this
> > one until the v6.3 merge window, but please let me know if you would
> > like it in v6.2.  It is after all a bug fix.
>
> Yes, any fix that fixes a lockdep splat is considered urgent and should be
> pushed ASAP. As it makes finding other lockdep splats more difficult (as
> the first splat shuts down lockdep).
>
> Please send this during the merge window, as I have to carry it internally
> until it makes it into mainline.

This is now rebased and marked with tag rcu-urgent.2022.12.17a.  If
testing goes well, I will send it by the middle of next week.

							Thanx, Paul
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index 19ce9d22bfd7..e535959939d3 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -1085,8 +1085,14 @@ int set_trigger_filter(char *filter_str,
 	rcu_assign_pointer(data->filter, filter);
 
 	if (tmp) {
-		/* Make sure the call is done with the filter */
-		tracepoint_synchronize_unregister();
+		/*
+		 * Make sure the call is done with the filter.
+		 * It is possible that a filter could fail at boot up,
+		 * and then this path will be called. Avoid the synchronization
+		 * in that case.
+		 */
+		if (system_state != SYSTEM_BOOTING)
+			tracepoint_synchronize_unregister();
 		free_event_filter(tmp);
 	}