Message ID: 20221019200137.70343645@gandalf.local.home (mailing list archive)
State:      Accepted
Series:     tracing: Add trace_trigger kernel command line option
On Wed, 19 Oct 2022 20:01:37 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> @@ -90,6 +90,10 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb)
>  #ifdef CONFIG_TRACEPOINTS
>  static inline void tracepoint_synchronize_unregister(void)
>  {
> +	/* Early updates do not need synchronization */
> +	if (early_boot_irqs_disabled)
> +		return;
> +
>  	synchronize_srcu(&tracepoint_srcu);
>  	synchronize_rcu();

I wonder if this check should be just moved to the RCU synchronization
code? That is, if early_boot_irqs_disabled is set, do nothing, as there
should be nothing to synchronize against.

It took me a bit to figure out why I was constantly getting:

[    1.072013] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[    1.078888] ------------[ cut here ]------------
[    1.083422] Interrupts were enabled early
[    1.087399] WARNING: CPU: 0 PID: 0 at init/main.c:1061 start_kernel+0x36f/0x4e8
[    1.094656] Modules linked in:
[    1.097683] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.0-rc1-test+ #461
[    1.104509] Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
[    1.113411] RIP: 0010:start_kernel+0x36f/0x4e8
[    1.117817] Code: 48 27 5f e8 fd 7b 03 00 e8 55 72 e5 fe e8 6b 05 03 00 ff 15 2a f9 89 ff 0f ba e0 09 73 0e 48 c7 c7 98 26 31 a0 e8 01 5b df fe <0f> 0b c6 05 c7 2c af ff 00 e8 da 36 47 fe ff 15 14 f9 89 ff e8 56
[    1.136485] RSP: 0000:ffffffffa0603f30 EFLAGS: 00010286
[    1.141670] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[    1.148754] RDX: 0000000000000002 RSI: 00000000ffffdfff RDI: 0000000000000001
[    1.155840] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffffa0603dd8
[    1.162926] R10: 0000000000000003 R11: ffffffffa0734168 R12: 000000000000180a
[    1.170011] R13: 00000000010fecf0 R14: 0000000000000000 R15: 00000000d44e3018
[    1.177097] FS:  0000000000000000(0000) GS:ffff91909aa00000(0000) knlGS:0000000000000000
[    1.185131] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.190835] CR2: ffff91909edff000 CR3: 000000007760a001 CR4: 00000000000606f0
[    1.197924] Call Trace:
[    1.200345]  <TASK>

It was a bit of whack-a-mole in finding all the places that were calling
synchronize RCU.

-- Steve
Hi Steven,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on akpm-mm/mm-everything]
[also build test WARNING on linus/master v6.1-rc1 next-20221020]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Steven-Rostedt/tracing-Add-trace_trigger-kernel-command-line-option/20221020-102230
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20221019200137.70343645%40gandalf.local.home
patch subject: [PATCH] tracing: Add trace_trigger kernel command line option
config: i386-randconfig-a001
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/476ab107174fa2ac06ef7171b093235382008c77
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Steven-Rostedt/tracing-Add-trace_trigger-kernel-command-line-option/20221020-102230
        git checkout 476ab107174fa2ac06ef7171b093235382008c77
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash kernel/trace/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   kernel/trace/trace_events.c: In function '__trace_early_add_new_event':
>> kernel/trace/trace_events.c:2864:13: warning: unused variable 'i' [-Wunused-variable]
    2864 |         int i;
         |             ^

vim +/i +2864 kernel/trace/trace_events.c

  2852	
  2853	/*
  2854	 * Just create a descriptor for early init. A descriptor is required
  2855	 * for enabling events at boot. We want to enable events before
  2856	 * the filesystem is initialized.
  2857	 */
  2858	static int
  2859	__trace_early_add_new_event(struct trace_event_call *call,
  2860	                            struct trace_array *tr)
  2861	{
  2862	        struct trace_event_file *file;
  2863	        int ret;
> 2864	        int i;
  2865	
  2866	        file = trace_create_new_event(call, tr);
  2867	        if (!file)
  2868	                return -ENOMEM;
  2869	
  2870	        ret = event_define_fields(call);
  2871	        if (ret)
  2872	                return ret;
  2873	
Hi Steven,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.1-rc1 next-20221020]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Steven-Rostedt/tracing-Add-trace_trigger-kernel-command-line-option/20221020-102230
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20221019200137.70343645%40gandalf.local.home
patch subject: [PATCH] tracing: Add trace_trigger kernel command line option
config: s390-allmodconfig
compiler: s390-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/476ab107174fa2ac06ef7171b093235382008c77
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Steven-Rostedt/tracing-Add-trace_trigger-kernel-command-line-option/20221020-102230
        git checkout 476ab107174fa2ac06ef7171b093235382008c77
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=s390 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>, old ones prefixed by <<):

>> ERROR: modpost: "early_boot_irqs_disabled" [mm/kasan/kasan_test.ko] undefined!
>> ERROR: modpost: "early_boot_irqs_disabled" [mm/kfence/kfence_test.ko] undefined!
   ERROR: modpost: "devm_ioremap_resource" [drivers/dma/qcom/hdma.ko] undefined!
   ERROR: modpost: "devm_ioremap_resource" [drivers/dma/fsl-edma.ko] undefined!
   ERROR: modpost: "devm_ioremap_resource" [drivers/dma/idma64.ko] undefined!
   ERROR: modpost: "iounmap" [drivers/tty/ipwireless/ipwireless.ko] undefined!
   ERROR: modpost: "ioremap" [drivers/tty/ipwireless/ipwireless.ko] undefined!
   ERROR: modpost: "devm_platform_ioremap_resource" [drivers/char/xillybus/xillybus_of.ko] undefined!
   ERROR: modpost: "devm_memremap" [drivers/misc/open-dice.ko] undefined!
   ERROR: modpost: "devm_memunmap" [drivers/misc/open-dice.ko] undefined!
   WARNING: modpost: suppressed 20 unresolved symbol warnings because there were too many
On Wed, 19 Oct 2022 20:01:37 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
>
> Allow triggers to be enabled at kernel boot up. For example:
>
>   trace_trigger="sched_switch.stacktrace if prev_state == 2"
>
> The above will enable the stacktrace trigger on top of the sched_switch
> event and only trigger if its prev_state is 2 (TASK_UNINTERRUPTIBLE). Then
> at boot up, a stacktrace will trigger and be recorded in the tracing ring
> buffer every time the sched_switch happens where the previous state is
> TASK_INTERRUPTIBLE.
>
> As this calls into tracepoint logic during very early boot (before
> interrupts are enabled), a check has to be done to see if early boot
> interrupts are still disabled, and if so, avoid any call to RCU
> synchronization, as that will enable interrupts and cause boot up issues.

Just out of curiosity, can you do it by boot-time tracer?
(Is it too late for your issue?)

$ cat >> stacktrace.bconf
ftrace.event.sched.sched_switch.actions = "stacktrace if prev_state == 2"
^D
$ bootconfig -a stacktrace.bconf initrd.img

And boot kernel with "bootconfig". Then,

----------
           init-1       [000] d..3.     0.546668: <stack trace>
 => trace_event_raw_event_sched_switch
 => __traceiter_sched_switch
 => __schedule
 => schedule
 => schedule_timeout
 => wait_for_completion_killable
 => __kthread_create_on_node
 => kthread_create_on_node
 => audit_init
 => do_one_initcall
 => kernel_init_freeable
 => kernel_init
 => ret_from_fork
        kauditd-57      [007] d..3.     0.546677: <stack trace>
 => trace_event_raw_event_sched_switch
 => __traceiter_sched_switch
 => __schedule
 => schedule
 => schedule_preempt_disabled
 => kthread
 => ret_from_fork
----------

Thank you,

> 
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
>  .../admin-guide/kernel-parameters.txt | 19 ++++++
>  include/linux/tracepoint.h            |  4 ++
>  kernel/trace/trace.c                  |  3 +-
>  kernel/trace/trace_events.c           | 63 ++++++++++++++++++-
>  kernel/tracepoint.c                   |  6 ++
>  5 files changed, 92 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index a465d5242774..ccf91a4bf113 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -6257,6 +6257,25 @@
>  			See also Documentation/trace/ftrace.rst "trace options"
>  			section.
>  
> +	trace_trigger=[trigger-list]
> +			[FTRACE] Add a event trigger on specific events.
> +			Set a trigger on top of a specific event, with an optional
> +			filter.
> +
> +			The format is is "trace_trigger=<event>.<trigger>[ if <filter>],..."
> +			Where more than one trigger may be specified that are comma deliminated.
> +
> +			For example:
> +
> +			  trace_trigger="sched_switch.stacktrace if prev_state == 2"
> +
> +			The above will enable the "stacktrace" trigger on the "sched_switch"
> +			event but only trigger it if the "prev_state" of the "sched_switch"
> +			event is "2" (TASK_UNINTERUPTIBLE).
> +
> +			See also "Event triggers" in Documentation/trace/events.rst
> +
> +
>  	traceoff_on_warning
>  			[FTRACE] enable this option to disable tracing when a
>  			warning is hit. This turns off "tracing_on". Tracing can
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index 4b33b95eb8be..a5c6b5772897 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -90,6 +90,10 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb)
>  #ifdef CONFIG_TRACEPOINTS
>  static inline void tracepoint_synchronize_unregister(void)
>  {
> +	/* Early updates do not need synchronization */
> +	if (early_boot_irqs_disabled)
> +		return;
> +
>  	synchronize_srcu(&tracepoint_srcu);
>  	synchronize_rcu();
>  }
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 47a44b055a1d..c03fd7037add 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -2749,7 +2749,8 @@ void trace_buffered_event_disable(void)
>  	preempt_enable();
>  
>  	/* Wait for all current users to finish */
> -	synchronize_rcu();
> +	if (!early_boot_irqs_disabled)
> +		synchronize_rcu();
>  
>  	for_each_tracing_cpu(cpu) {
>  		free_page((unsigned long)per_cpu(trace_buffered_event, cpu));
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 0356cae0cf74..06554939252c 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -2796,6 +2796,44 @@ trace_create_new_event(struct trace_event_call *call,
>  	return file;
>  }
>  
> +#ifdef CONFIG_HIST_TRIGGERS
> +#define MAX_BOOT_TRIGGERS 32
> +
> +static struct boot_triggers {
> +	const char	*event;
> +	char		*trigger;
> +} bootup_triggers[MAX_BOOT_TRIGGERS];
> +
> +static char bootup_trigger_buf[COMMAND_LINE_SIZE];
> +static int nr_boot_triggers;
> +
> +static __init int setup_trace_triggers(char *str)
> +{
> +	char *trigger;
> +	char *buf;
> +	int i;
> +
> +	strlcpy(bootup_trigger_buf, str, COMMAND_LINE_SIZE);
> +	ring_buffer_expanded = true;
> +	disable_tracing_selftest("running event triggers");
> +
> +	buf = bootup_trigger_buf;
> +	for (i = 0; i < MAX_BOOT_TRIGGERS; i++) {
> +		trigger = strsep(&buf, ",");
> +		if (!trigger)
> +			break;
> +		bootup_triggers[i].event = strsep(&trigger, ".");
> +		bootup_triggers[i].trigger = strsep(&trigger, ".");
> +		if (!bootup_triggers[i].trigger)
> +			break;
> +	}
> +
> +	nr_boot_triggers = i;
> +	return 1;
> +}
> +__setup("trace_trigger=", setup_trace_triggers);
> +#endif
> +
>  /* Add an event to a trace directory */
>  static int
>  __trace_add_new_event(struct trace_event_call *call, struct trace_array *tr)
> @@ -2822,12 +2860,32 @@ __trace_early_add_new_event(struct trace_event_call *call,
>  			    struct trace_array *tr)
>  {
>  	struct trace_event_file *file;
> +	int ret;
> +	int i;
>  
>  	file = trace_create_new_event(call, tr);
>  	if (!file)
>  		return -ENOMEM;
>  
> -	return event_define_fields(call);
> +	ret = event_define_fields(call);
> +	if (ret)
> +		return ret;
> +
> +#ifdef CONFIG_HIST_TRIGGERS
> +	for (i = 0; i < nr_boot_triggers; i++) {
> +		if (strcmp(trace_event_name(call), bootup_triggers[i].event))
> +			continue;
> +		mutex_lock(&event_mutex);
> +		ret = trigger_process_regex(file, bootup_triggers[i].trigger);
> +		mutex_unlock(&event_mutex);
> +		if (ret)
> +			pr_err("Failed to register trigger '%s' on event %s\n",
> +			       bootup_triggers[i].trigger,
> +			       bootup_triggers[i].event);
> +	}
> +#endif
> +
> +	return 0;
>  }
>  
>  struct ftrace_module_file_ops;
> @@ -3726,6 +3784,8 @@ static __init int event_trace_enable(void)
>  		list_add(&call->list, &ftrace_events);
>  	}
>  
> +	register_trigger_cmds();
> +
>  	/*
>  	 * We need the top trace array to have a working set of trace
>  	 * points at early init, before the debug files and directories
> @@ -3740,7 +3800,6 @@ static __init int event_trace_enable(void)
>  
>  	register_event_cmds();
>  
> -	register_trigger_cmds();
>  
>  	return 0;
>  }
> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> index f23144af5743..f6e4ee1e40b3 100644
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -48,6 +48,9 @@ static void tp_rcu_get_state(enum tp_transition_sync sync)
>  {
>  	struct tp_transition_snapshot *snapshot = &tp_transition_snapshot[sync];
>  
> +	if (early_boot_irqs_disabled)
> +		return;
> +
>  	/* Keep the latest get_state snapshot. */
>  	snapshot->rcu = get_state_synchronize_rcu();
>  	snapshot->srcu = start_poll_synchronize_srcu(&tracepoint_srcu);
> @@ -58,6 +61,9 @@ static void tp_rcu_cond_sync(enum tp_transition_sync sync)
>  {
>  	struct tp_transition_snapshot *snapshot = &tp_transition_snapshot[sync];
>  
> +	if (early_boot_irqs_disabled)
> +		return;
> +
>  	if (!snapshot->ongoing)
>  		return;
>  	cond_synchronize_rcu(snapshot->rcu);
> -- 
> 2.35.1
> 
Hi Steven,

On Wed, Oct 19, 2022 at 8:01 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
>
> Allow triggers to be enabled at kernel boot up. For example:
>
>   trace_trigger="sched_switch.stacktrace if prev_state == 2"
>
> The above will enable the stacktrace trigger on top of the sched_switch
> event and only trigger if its prev_state is 2 (TASK_UNINTERRUPTIBLE). Then
> at boot up, a stacktrace will trigger and be recorded in the tracing ring
> buffer every time the sched_switch happens where the previous state is
> TASK_INTERRUPTIBLE.
>
> As this calls into tracepoint logic during very early boot (before
> interrupts are enabled), a check has to be done to see if early boot
> interrupts are still disabled, and if so, avoid any call to RCU
> synchronization, as that will enable interrupts and cause boot up issues.
>
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---

rest_init() -> rcu_scheduler_starting() sets RCU_SCHEDULER_INIT

start_kernel() disables interrupts and does the dance you mentioned,
while setting that early_ flag. start_kernel() is called from arch
code.

From your crash, the trace trigger is happening before the early_ boot
flag is set to true.

rcu_blocking_is_gp() should return true at the point your trigger fires
because RCU_SCHEDULER_INACTIVE is still set. So I would expect your
synchronize_rcu() to not do anything. There's comments in
rcu_blocking_is_gp() like so:

 * During early boot, any blocking grace-period wait automatically
 * implies a grace period.

So it should be returning straight from here. I wonder where/why it's
re-enabling interrupts for you. What does rcu_blocking_is_gp() return
if you can print that, at the time of the crash?

Cheers,

- J
On Thu, Oct 20, 2022 at 10:43 AM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> Hi Steven,
>
> On Wed, Oct 19, 2022 at 8:01 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> >
> > Allow triggers to be enabled at kernel boot up. For example:
> >
> >   trace_trigger="sched_switch.stacktrace if prev_state == 2"
> >
> > The above will enable the stacktrace trigger on top of the sched_switch
> > event and only trigger if its prev_state is 2 (TASK_UNINTERRUPTIBLE). Then
> > at boot up, a stacktrace will trigger and be recorded in the tracing ring
> > buffer every time the sched_switch happens where the previous state is
> > TASK_INTERRUPTIBLE.
> >
> > As this calls into tracepoint logic during very early boot (before
> > interrupts are enabled), a check has to be done to see if early boot
> > interrupts are still disabled, and if so, avoid any call to RCU
> > synchronization, as that will enable interrupts and cause boot up issues.
> >
> > Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> > ---
>
> rest_init() -> rcu_scheduler_starting() sets RCU_SCHEDULER_INIT
>
> start_kernel() disables interrupts and does the dance you mentioned,
> while setting that early_ flag. start_kernel() is called from arch
> code.
>
> From your crash, the trace trigger is happening before the early_ boot
> flag is set to true.

My apologies. I meant, before early_boot_irqs_disabled is set to false.
On Thu, 20 Oct 2022 23:33:18 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> On Wed, 19 Oct 2022 20:01:37 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> >
> > Allow triggers to be enabled at kernel boot up. For example:
> >
> >   trace_trigger="sched_switch.stacktrace if prev_state == 2"
> >
> > The above will enable the stacktrace trigger on top of the sched_switch
> > event and only trigger if its prev_state is 2 (TASK_UNINTERRUPTIBLE). Then
> > at boot up, a stacktrace will trigger and be recorded in the tracing ring
> > buffer every time the sched_switch happens where the previous state is
> > TASK_INTERRUPTIBLE.
> >
> > As this calls into tracepoint logic during very early boot (before
> > interrupts are enabled), a check has to be done to see if early boot
> > interrupts are still disabled, and if so, avoid any call to RCU
> > synchronization, as that will enable interrupts and cause boot up issues.
>
> Just out of curiosity, can you do it by boot-time tracer?
> (Is it too late for your issue?)

Yeah, I'm looking at adding triggers very early. I could even add a
"traceoff" trigger here too. I'm guessing that bootconfig is done later
in boot up? Especially since I needed to fix RCU calls to get this
working.

> $ cat >> stacktrace.bconf
> ftrace.event.sched.sched_switch.actions = "stacktrace if prev_state == 2"
> ^D
> $ bootconfig -a stacktrace.bconf initrd.img

Not to mention, I'm doing this for Chromebooks, where it's easy to
update the command line (on dev devices), but not as easy to modify the
kernel.

-- Steve
On Wed, Oct 19, 2022 at 08:07:45PM -0400, Steven Rostedt wrote:
> On Wed, 19 Oct 2022 20:01:37 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > @@ -90,6 +90,10 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb)
> >  #ifdef CONFIG_TRACEPOINTS
> >  static inline void tracepoint_synchronize_unregister(void)
> >  {
> > +	/* Early updates do not need synchronization */
> > +	if (early_boot_irqs_disabled)
> > +		return;
> > +
> >  	synchronize_srcu(&tracepoint_srcu);
> >  	synchronize_rcu();
>
> I wonder if this check should be just moved to the RCU synchronization
> code? That is, if early_boot_irqs_disabled is set, do nothing, as there
> should be nothing to synchronize against.

There already is a similar check, but it follows the lockdep check.

Does the following patch help?

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index ca4b5dcec675b..356ef70d5442c 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -1267,14 +1267,15 @@ static void __synchronize_srcu(struct srcu_struct *ssp, bool do_norm)
 {
 	struct rcu_synchronize rcu;
 
+	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
+		return;
+
 	RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
 			 lock_is_held(&rcu_bh_lock_map) ||
 			 lock_is_held(&rcu_lock_map) ||
 			 lock_is_held(&rcu_sched_lock_map),
 			 "Illegal synchronize_srcu() in same-type SRCU (or in RCU) read-side critical section");
 
-	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
-		return;
 	might_sleep();
 	check_init_srcu_struct(ssp);
 	init_completion(&rcu.completion);
On Thu, 20 Oct 2022 08:59:00 -0700
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> On Wed, Oct 19, 2022 at 08:07:45PM -0400, Steven Rostedt wrote:
> > On Wed, 19 Oct 2022 20:01:37 -0400
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > > @@ -90,6 +90,10 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb)
> > >  #ifdef CONFIG_TRACEPOINTS
> > >  static inline void tracepoint_synchronize_unregister(void)
> > >  {
> > > +	/* Early updates do not need synchronization */
> > > +	if (early_boot_irqs_disabled)
> > > +		return;
> > > +
> > >  	synchronize_srcu(&tracepoint_srcu);
> > >  	synchronize_rcu();
> >
> > I wonder if this check should be just moved to the RCU synchronization
> > code? That is, if early_boot_irqs_disabled is set, do nothing, as there
> > should be nothing to synchronize against.
>
> There already is a similar check, but it follows the lockdep check.
>
> Does the following patch help?

Not sure if it would.

I added this:

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index c03fd7037add..79ac31a6a87b 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2748,9 +2748,12 @@ void trace_buffered_event_disable(void)
 			     disable_trace_buffered_event, NULL, 1);
 	preempt_enable();
 
+	if (!irqs_disabled())
+		printk("IRQS DISABLED!! before %s:%d\n", __func__, __LINE__);
 	/* Wait for all current users to finish */
-	if (!early_boot_irqs_disabled)
-		synchronize_rcu();
+	synchronize_rcu();
+	if (!irqs_disabled())
+		printk("IRQS DISABLED!! after %s:%d\n", __func__, __LINE__);
 
 	for_each_tracing_cpu(cpu) {
 		free_page((unsigned long)per_cpu(trace_buffered_event, cpu));

Which produced this:

[    0.972867] ftrace: allocating 47021 entries in 184 pages
[    0.978285] ftrace section at ffffffffacef74c0 sorted properly
[    0.991153] ftrace: allocated 184 pages with 4 groups
[    0.996163] Dynamic Preempt: full
[    0.999354] rcu: Preemptible hierarchical RCU implementation.
[    1.005020] rcu: 	RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=8.
[    1.011680] 	Tasks-RCU CPU stall warnings timeout set to 120000 (rcu_task_stall_timeout).
[    1.019813] 	Trampoline variant of Tasks RCU enabled.
[    1.024831] 	Rude variant of Tasks RCU enabled.
[    1.029330] 	Tracing variant of Tasks RCU enabled.
[    1.034089] rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies.
[    1.041699] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
[    1.052607] IRQS DISABLED!! after trace_buffered_event_disable:2756
[    1.058712] NR_IRQS: 8448, nr_irqs: 488, preallocated irqs: 16
[    1.064670] NO_HZ: Full dynticks CPUs: 3.
[    1.068501] rcu: 	Offload RCU callbacks from CPUs: 3.
[    1.073519] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[    1.080400] ------------[ cut here ]------------
[    1.084938] Interrupts were enabled early
[    1.088919] WARNING: CPU: 0 PID: 0 at init/main.c:1061 start_kernel+0x36f/0x4e8
[    1.096184] Modules linked in:
[    1.099213] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.0-rc1-test+ #468
[    1.106047] Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
[    1.114960] RIP: 0010:start_kernel+0x36f/0x4e8
[    1.119373] Code: 48 27 53 e8 fd 7b 03 00 e8 55 72 e5 fe e8 6b 05 03 00 ff 15 2a f9 89 ff 0f ba e0 09 73 0e 48 c7 c7 98 25 31 ac e8 01 5b df fe <0f> 0b c6 05 c7 2c af ff 00 e8 ba 36 47 fe ff 15 14 f9 89 ff e8 56
[    1.138057] RSP: 0000:ffffffffac603f30 EFLAGS: 00010286
[    1.143248] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[    1.150343] RDX: 0000000000000002 RSI: 00000000ffffdfff RDI: 0000000000000001
[    1.157438] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffffac603dd8
[    1.164537] R10: 0000000000000003 R11: ffffffffac734168 R12: 000000000000180a
[    1.171633] R13: 00000000010fecf0 R14: 0000000000000000 R15: 00000000d44e3018
[    1.178730] FS:  0000000000000000(0000) GS:ffff93ee1aa00000(0000) knlGS:0000000000000000
[    1.186778] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.192489] CR2: ffff93ee1edff000 CR3: 000000002e60a001 CR4: 00000000000606f0
[    1.199584] Call Trace:
[    1.202009]  <TASK>

So I'm not sure how just updating srcu will do anything to prevent this.

-- Steve
> On Oct 20, 2022, at 12:33 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> On Thu, 20 Oct 2022 08:59:00 -0700
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
>>> On Wed, Oct 19, 2022 at 08:07:45PM -0400, Steven Rostedt wrote:
>>> On Wed, 19 Oct 2022 20:01:37 -0400
>>> Steven Rostedt <rostedt@goodmis.org> wrote:
>>> 
>>>> @@ -90,6 +90,10 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb)
>>>>  #ifdef CONFIG_TRACEPOINTS
>>>>  static inline void tracepoint_synchronize_unregister(void)
>>>>  {
>>>> +	/* Early updates do not need synchronization */
>>>> +	if (early_boot_irqs_disabled)
>>>> +		return;
>>>> +
>>>>  	synchronize_srcu(&tracepoint_srcu);
>>>>  	synchronize_rcu();
>>> 
>>> I wonder if this check should be just moved to the RCU synchronization
>>> code? That is, if early_boot_irqs_disabled is set, do nothing, as there
>>> should be nothing to synchronize against.
>> 
>> There already is a similar check, but it follows the lockdep check.
>> 
>> Does the following patch help?
> 
> Not sure if it would.
> 
> I added this:

I'm not working today, so I can't look very closely. But could you remove
the might_sleep() from rcu_gp_is_blocking() and see if it fixes your issue?
I think that function call might be incompatible with early boot. Just a
hunch.

- Joel

> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index c03fd7037add..79ac31a6a87b 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -2748,9 +2748,12 @@ void trace_buffered_event_disable(void)
>  				disable_trace_buffered_event, NULL, 1);
>  	preempt_enable();
>  
> +	if (!irqs_disabled())
> +		printk("IRQS DISABLED!! before %s:%d\n", __func__, __LINE__);
>  	/* Wait for all current users to finish */
> -	if (!early_boot_irqs_disabled)
> -		synchronize_rcu();
> +	synchronize_rcu();
> +	if (!irqs_disabled())
> +		printk("IRQS DISABLED!! after %s:%d\n", __func__, __LINE__);
>  
>  	for_each_tracing_cpu(cpu) {
>  		free_page((unsigned long)per_cpu(trace_buffered_event, cpu));
> 
> Which produced this:
> 
> [    0.972867] ftrace: allocating 47021 entries in 184 pages
> [    0.978285] ftrace section at ffffffffacef74c0 sorted properly
> [    0.991153] ftrace: allocated 184 pages with 4 groups
> [    0.996163] Dynamic Preempt: full
> [    0.999354] rcu: Preemptible hierarchical RCU implementation.
> [    1.005020] rcu: RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=8.
> [    1.011680] Tasks-RCU CPU stall warnings timeout set to 120000 (rcu_task_stall_timeout).
> [    1.019813] Trampoline variant of Tasks RCU enabled.
> [    1.024831] Rude variant of Tasks RCU enabled.
> [    1.029330] Tracing variant of Tasks RCU enabled.
> [    1.034089] rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies.
> [    1.041699] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
> [    1.052607] IRQS DISABLED!! after trace_buffered_event_disable:2756
> [    1.058712] NR_IRQS: 8448, nr_irqs: 488, preallocated irqs: 16
> [    1.064670] NO_HZ: Full dynticks CPUs: 3.
> [    1.068501] rcu: Offload RCU callbacks from CPUs: 3.
> [    1.073519] rcu: srcu_init: Setting srcu_struct sizes based on contention.
> [    1.080400] ------------[ cut here ]------------
> [    1.084938] Interrupts were enabled early
> [    1.088919] WARNING: CPU: 0 PID: 0 at init/main.c:1061 start_kernel+0x36f/0x4e8
> [    1.096184] Modules linked in:
> [    1.099213] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.0-rc1-test+ #468
> [    1.106047] Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
> [    1.114960] RIP: 0010:start_kernel+0x36f/0x4e8
> [    1.119373] Code: 48 27 53 e8 fd 7b 03 00 e8 55 72 e5 fe e8 6b 05 03 00 ff 15 2a f9 89 ff 0f ba e0 09 73 0e 48 c7 c7 98 25 31 ac e8 01 5b df fe <0f> 0b c6 05 c7 2c af ff 00 e8 ba 36 47 fe ff 15 14 f9 89 ff e8 56
> [    1.138057] RSP: 0000:ffffffffac603f30 EFLAGS: 00010286
> [    1.143248] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [    1.150343] RDX: 0000000000000002 RSI: 00000000ffffdfff RDI: 0000000000000001
> [    1.157438] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffffac603dd8
> [    1.164537] R10: 0000000000000003 R11: ffffffffac734168 R12: 000000000000180a
> [    1.171633] R13: 00000000010fecf0 R14: 0000000000000000 R15: 00000000d44e3018
> [    1.178730] FS:  0000000000000000(0000) GS:ffff93ee1aa00000(0000) knlGS:0000000000000000
> [    1.186778] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    1.192489] CR2: ffff93ee1edff000 CR3: 000000002e60a001 CR4: 00000000000606f0
> [    1.199584] Call Trace:
> [    1.202009]  <TASK>
> 
> 
> So I'm not sure how just updating srcu will do anything to prevent this.
> 
> -- Steve
> 
>> 
>> 							Thanx, Paul
>> 
>> ------------------------------------------------------------------------
>> 
>> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
>> index ca4b5dcec675b..356ef70d5442c 100644
>> --- a/kernel/rcu/srcutree.c
>> +++ b/kernel/rcu/srcutree.c
>> @@ -1267,14 +1267,15 @@ static void __synchronize_srcu(struct srcu_struct *ssp, bool do_norm)
>>  {
>>  	struct rcu_synchronize rcu;
>>  
>> +	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
>> +		return;
>> +
>>  	RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
>>  			 lock_is_held(&rcu_bh_lock_map) ||
>>  			 lock_is_held(&rcu_lock_map) ||
>>  			 lock_is_held(&rcu_sched_lock_map),
>>  			 "Illegal synchronize_srcu() in same-type SRCU (or in RCU) read-side critical section");
>>  
>> -	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
>> -		return;
>>  	might_sleep();
>>  	check_init_srcu_struct(ssp);
>>  	init_completion(&rcu.completion);
> 
On Thu, 20 Oct 2022 12:33:57 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index c03fd7037add..79ac31a6a87b 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -2748,9 +2748,12 @@ void trace_buffered_event_disable(void)
>  				disable_trace_buffered_event, NULL, 1);
>  	preempt_enable();
>  
> +	if (!irqs_disabled())
> +		printk("IRQS DISABLED!! before %s:%d\n", __func__, __LINE__);
>  	/* Wait for all current users to finish */
> -	if (!early_boot_irqs_disabled)
> -		synchronize_rcu();
> +	synchronize_rcu();
> +	if (!irqs_disabled())
> +		printk("IRQS DISABLED!! after %s:%d\n", __func__, __LINE__);
>  
>  	for_each_tracing_cpu(cpu) {
>  		free_page((unsigned long)per_cpu(trace_buffered_event, cpu));

Continuing the above printk()s I found the culprit.

	synchronize_rcu() {
		rcu_poll_gp_seq_start_unlocked() {
			struct rcu_node *rnp = rcu_get_root();

			if (rcu_init_invoked()) {
				lockdep_assert_irqs_enabled();
				raw_spin_lock_irq_rcu_node(rnp);
			}
			rcu_poll_gp_seq_start(snap);
			if (rcu_init_invoked())
				raw_spin_unlock_irq_rcu_node(rnp);  <<-- Interrupts enabled here
		}

-- Steve
On Thu, Oct 20, 2022 at 12:33:57PM -0400, Steven Rostedt wrote:
> On Thu, 20 Oct 2022 08:59:00 -0700
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > On Wed, Oct 19, 2022 at 08:07:45PM -0400, Steven Rostedt wrote:
> > > On Wed, 19 Oct 2022 20:01:37 -0400
> > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > > 
> > > > @@ -90,6 +90,10 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb)
> > > >  #ifdef CONFIG_TRACEPOINTS
> > > >  static inline void tracepoint_synchronize_unregister(void)
> > > >  {
> > > > +	/* Early updates do not need synchronization */
> > > > +	if (early_boot_irqs_disabled)
> > > > +		return;
> > > > +
> > > >  	synchronize_srcu(&tracepoint_srcu);
> > > >  	synchronize_rcu();
> > > 
> > > I wonder if this check should be just moved to the RCU synchronization
> > > code? That is, if early_boot_irqs_disabled is set, do nothing, as there
> > > should be nothing to synchronize against.
> > 
> > There already is a similar check, but it follows the lockdep check.
> > 
> > Does the following patch help?
> 
> Not sure if it would.
> 
> I added this:
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index c03fd7037add..79ac31a6a87b 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -2748,9 +2748,12 @@ void trace_buffered_event_disable(void)
>  				disable_trace_buffered_event, NULL, 1);
>  	preempt_enable();
>  
> +	if (!irqs_disabled())
> +		printk("IRQS DISABLED!! before %s:%d\n", __func__, __LINE__);
>  	/* Wait for all current users to finish */
> -	if (!early_boot_irqs_disabled)
> -		synchronize_rcu();
> +	synchronize_rcu();
> +	if (!irqs_disabled())
> +		printk("IRQS DISABLED!! after %s:%d\n", __func__, __LINE__);
>  
>  	for_each_tracing_cpu(cpu) {
>  		free_page((unsigned long)per_cpu(trace_buffered_event, cpu));
> 
> Which produced this:
> 
> [    0.972867] ftrace: allocating 47021 entries in 184 pages
> [    0.978285] ftrace section at ffffffffacef74c0 sorted properly
> [    0.991153] ftrace: allocated 184 pages with 4 groups
> [    0.996163] Dynamic Preempt: full
> [    0.999354] rcu: Preemptible hierarchical RCU implementation.
> [    1.005020] rcu: RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=8.
> [    1.011680] Tasks-RCU CPU stall warnings timeout set to 120000 (rcu_task_stall_timeout).
> [    1.019813] Trampoline variant of Tasks RCU enabled.
> [    1.024831] Rude variant of Tasks RCU enabled.
> [    1.029330] Tracing variant of Tasks RCU enabled.
> [    1.034089] rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies.
> [    1.041699] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
> [    1.052607] IRQS DISABLED!! after trace_buffered_event_disable:2756
> [    1.058712] NR_IRQS: 8448, nr_irqs: 488, preallocated irqs: 16
> [    1.064670] NO_HZ: Full dynticks CPUs: 3.
> [    1.068501] rcu: Offload RCU callbacks from CPUs: 3.
> [    1.073519] rcu: srcu_init: Setting srcu_struct sizes based on contention.
> [    1.080400] ------------[ cut here ]------------
> [    1.084938] Interrupts were enabled early
> [    1.088919] WARNING: CPU: 0 PID: 0 at init/main.c:1061 start_kernel+0x36f/0x4e8
> [    1.096184] Modules linked in:
> [    1.099213] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.0-rc1-test+ #468
> [    1.106047] Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
> [    1.114960] RIP: 0010:start_kernel+0x36f/0x4e8
> [    1.119373] Code: 48 27 53 e8 fd 7b 03 00 e8 55 72 e5 fe e8 6b 05 03 00 ff 15 2a f9 89 ff 0f ba e0 09 73 0e 48 c7 c7 98 25 31 ac e8 01 5b df fe <0f> 0b c6 05 c7 2c af ff 00 e8 ba 36 47 fe ff 15 14 f9 89 ff e8 56
> [    1.138057] RSP: 0000:ffffffffac603f30 EFLAGS: 00010286
> [    1.143248] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [    1.150343] RDX: 0000000000000002 RSI: 00000000ffffdfff RDI: 0000000000000001
> [    1.157438] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffffac603dd8
> [    1.164537] R10: 0000000000000003 R11: ffffffffac734168 R12: 000000000000180a
> [    1.171633] R13: 00000000010fecf0 R14: 0000000000000000 R15: 00000000d44e3018
> [    1.178730] FS:  0000000000000000(0000) GS:ffff93ee1aa00000(0000) knlGS:0000000000000000
> [    1.186778] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    1.192489] CR2: ffff93ee1edff000 CR3: 000000002e60a001 CR4: 00000000000606f0
> [    1.199584] Call Trace:
> [    1.202009]  <TASK>
> 
> 
> So I'm not sure how just updating srcu will do anything to prevent this.

Apologies, SRCU on the brain due to a different email thread...  :-/

							Thanx, Paul

> -- Steve
> 
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > index ca4b5dcec675b..356ef70d5442c 100644
> > --- a/kernel/rcu/srcutree.c
> > +++ b/kernel/rcu/srcutree.c
> > @@ -1267,14 +1267,15 @@ static void __synchronize_srcu(struct srcu_struct *ssp, bool do_norm)
> >  {
> >  	struct rcu_synchronize rcu;
> >  
> > +	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
> > +		return;
> > +
> >  	RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
> >  			 lock_is_held(&rcu_bh_lock_map) ||
> >  			 lock_is_held(&rcu_lock_map) ||
> >  			 lock_is_held(&rcu_sched_lock_map),
> >  			 "Illegal synchronize_srcu() in same-type SRCU (or in RCU) read-side critical section");
> >  
> > -	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
> > -		return;
> >  	might_sleep();
> >  	check_init_srcu_struct(ssp);
> >  	init_completion(&rcu.completion);
> 
On Thu, Oct 20, 2022 at 01:05:28PM -0400, Steven Rostedt wrote:
> On Thu, 20 Oct 2022 12:33:57 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> > index c03fd7037add..79ac31a6a87b 100644
> > --- a/kernel/trace/trace.c
> > +++ b/kernel/trace/trace.c
> > @@ -2748,9 +2748,12 @@ void trace_buffered_event_disable(void)
> >  				disable_trace_buffered_event, NULL, 1);
> >  	preempt_enable();
> >  
> > +	if (!irqs_disabled())
> > +		printk("IRQS DISABLED!! before %s:%d\n", __func__, __LINE__);
> >  	/* Wait for all current users to finish */
> > -	if (!early_boot_irqs_disabled)
> > -		synchronize_rcu();
> > +	synchronize_rcu();
> > +	if (!irqs_disabled())
> > +		printk("IRQS DISABLED!! after %s:%d\n", __func__, __LINE__);
> >  
> >  	for_each_tracing_cpu(cpu) {
> >  		free_page((unsigned long)per_cpu(trace_buffered_event, cpu));
> 
> Continuing the above printk()s I found the culprit.
> 
> 	synchronize_rcu() {
> 		rcu_poll_gp_seq_start_unlocked() {
> 			struct rcu_node *rnp = rcu_get_root();
> 
> 			if (rcu_init_invoked()) {
> 				lockdep_assert_irqs_enabled();
> 				raw_spin_lock_irq_rcu_node(rnp);
> 			}
> 			rcu_poll_gp_seq_start(snap);
> 			if (rcu_init_invoked())
> 				raw_spin_unlock_irq_rcu_node(rnp);  <<-- Interrupts enabled here
> 	}

Thank you for digging into this!

Does the following help?

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index abc615808b6e8..45ad9ef00d647 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1425,30 +1425,32 @@ static void rcu_poll_gp_seq_end(unsigned long *snap)
 // where caller does not hold the root rcu_node structure's lock.
 static void rcu_poll_gp_seq_start_unlocked(unsigned long *snap)
 {
+	unsigned long flags;
 	struct rcu_node *rnp = rcu_get_root();
 
 	if (rcu_init_invoked()) {
 		lockdep_assert_irqs_enabled();
-		raw_spin_lock_irq_rcu_node(rnp);
+		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
 	rcu_poll_gp_seq_start(snap);
 	if (rcu_init_invoked())
-		raw_spin_unlock_irq_rcu_node(rnp);
+		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 }
 
 // Make the polled API aware of the end of a grace period, but where
 // caller does not hold the root rcu_node structure's lock.
 static void rcu_poll_gp_seq_end_unlocked(unsigned long *snap)
 {
+	unsigned long flags;
 	struct rcu_node *rnp = rcu_get_root();
 
 	if (rcu_init_invoked()) {
 		lockdep_assert_irqs_enabled();
-		raw_spin_lock_irq_rcu_node(rnp);
+		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
 	rcu_poll_gp_seq_end(snap);
 	if (rcu_init_invoked())
-		raw_spin_unlock_irq_rcu_node(rnp);
+		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 }
 
 /*
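The bug and the fix in Paul's tree.c patch above can be seen with a toy userspace model of the two lock flavors (a sketch, not the kernel's spinlock implementation): the _irq variant unconditionally re-enables interrupts on unlock, while _irqsave/_irqrestore puts back whatever state the caller had, which is what an early-boot caller running with irqs disabled needs.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code: one flag stands in for the CPU irq state. */
static bool irqs_enabled;

static void lock_irq(void)	{ irqs_enabled = false; }
static void unlock_irq(void)	{ irqs_enabled = true; }	/* always enables! */

static void lock_irqsave(bool *flags)
{
	*flags = irqs_enabled;		/* remember the caller's state */
	irqs_enabled = false;
}

static void unlock_irqrestore(bool flags)
{
	irqs_enabled = flags;		/* put it back, whatever it was */
}

/* Early boot caller: irqs are off before start_kernel() enables them. */
static bool irq_state_after_irq_variant(void)
{
	irqs_enabled = false;
	lock_irq();
	unlock_irq();
	return irqs_enabled;		/* enabled behind the caller's back */
}

static bool irq_state_after_irqsave_variant(void)
{
	bool flags;

	irqs_enabled = false;
	lock_irqsave(&flags);
	unlock_irqrestore(flags);
	return irqs_enabled;		/* caller's disabled state preserved */
}
```

This is exactly why the "Interrupts were enabled early" warning fired: synchronize_rcu() took and released the root rcu_node lock with the _irq variant while start_kernel() was still running with irqs disabled.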
On Thu, 20 Oct 2022 10:55:12 -0700
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> > synchronize_rcu() {
> > 	rcu_poll_gp_seq_start_unlocked() {
> > 		struct rcu_node *rnp = rcu_get_root();
> > 
> > 		if (rcu_init_invoked()) {
> > 			lockdep_assert_irqs_enabled();
> > 			raw_spin_lock_irq_rcu_node(rnp);
> > 		}
> > 		rcu_poll_gp_seq_start(snap);
> > 		if (rcu_init_invoked())
> > 			raw_spin_unlock_irq_rcu_node(rnp);  <<-- Interrupts enabled here
> > }
> 
> Thank you for digging into this!
> 
> Does the following help?

It does indeed!

I can remove the early_boot_irqs_disabled checks if you can add this to
this rc cycle and mark it stable.

-- Steve
On Thu, Oct 20, 2022 at 02:53:40PM -0400, Steven Rostedt wrote:
> On Thu, 20 Oct 2022 10:55:12 -0700
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > > synchronize_rcu() {
> > > 	rcu_poll_gp_seq_start_unlocked() {
> > > 		struct rcu_node *rnp = rcu_get_root();
> > > 
> > > 		if (rcu_init_invoked()) {
> > > 			lockdep_assert_irqs_enabled();
> > > 			raw_spin_lock_irq_rcu_node(rnp);
> > > 		}
> > > 		rcu_poll_gp_seq_start(snap);
> > > 		if (rcu_init_invoked())
> > > 			raw_spin_unlock_irq_rcu_node(rnp);  <<-- Interrupts enabled here
> > > }
> > 
> > Thank you for digging into this!
> > 
> > Does the following help?
> 
> It does indeed!
> 
> I can remove the early_boot_irqs_disabled checks if you can add this to
> this rc cycle and mark it stable.

My plan is to push it in after a bit of testing.  The Fixes tag should
make the right -stable stuff happen.

							Thanx, Paul

------------------------------------------------------------------------

commit 952643575669332565c77066fb42d67987680f12
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu Oct 20 10:58:14 2022 -0700

    rcu: Keep synchronize_rcu() from enabling irqs in early boot
    
    Making polled RCU grace periods account for expedited grace periods
    required acquiring the leaf rcu_node structure's lock during early boot,
    but after rcu_init() was called.  This lock is irq-disabled, but the
    code incorrectly assumes that irqs are always disabled when invoking
    synchronize_rcu().  The exception is early boot before the scheduler
    has started, which means that upon return from synchronize_rcu(), irqs
    will be incorrectly enabled.
    
    This commit fixes this bug by using irqsave/irqrestore locking
    primitives.
    
    Fixes: bf95b2bc3e42 ("rcu: Switch polled grace-period APIs to ->gp_seq_polled")
    Reported-by: Steven Rostedt <rostedt@goodmis.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index abc615808b6e8..45ad9ef00d647 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1425,30 +1425,32 @@ static void rcu_poll_gp_seq_end(unsigned long *snap)
 // where caller does not hold the root rcu_node structure's lock.
 static void rcu_poll_gp_seq_start_unlocked(unsigned long *snap)
 {
+	unsigned long flags;
 	struct rcu_node *rnp = rcu_get_root();
 
 	if (rcu_init_invoked()) {
 		lockdep_assert_irqs_enabled();
-		raw_spin_lock_irq_rcu_node(rnp);
+		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
 	rcu_poll_gp_seq_start(snap);
 	if (rcu_init_invoked())
-		raw_spin_unlock_irq_rcu_node(rnp);
+		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 }
 
 // Make the polled API aware of the end of a grace period, but where
 // caller does not hold the root rcu_node structure's lock.
 static void rcu_poll_gp_seq_end_unlocked(unsigned long *snap)
 {
+	unsigned long flags;
 	struct rcu_node *rnp = rcu_get_root();
 
 	if (rcu_init_invoked()) {
 		lockdep_assert_irqs_enabled();
-		raw_spin_lock_irq_rcu_node(rnp);
+		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
 	rcu_poll_gp_seq_end(snap);
 	if (rcu_init_invoked())
-		raw_spin_unlock_irq_rcu_node(rnp);
+		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 }
 
 /*
On Thu, 20 Oct 2022 15:21:08 -0700
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> My plan is to push it in after a bit of testing.  The Fixes tag should
> make the right -stable stuff happen.

Great, thanks!

I'll resend my patch without the synchronize_rcu() avoidance.

-- Steve
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a465d5242774..ccf91a4bf113 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6257,6 +6257,25 @@
 			See also Documentation/trace/ftrace.rst "trace options"
 			section.
 
+	trace_trigger=[trigger-list]
+			[FTRACE] Add an event trigger on specific events.
+			Set a trigger on top of a specific event, with an
+			optional filter.
+
+			The format is "trace_trigger=<event>.<trigger>[ if <filter>],..."
+			where more than one trigger may be specified,
+			comma-delimited.
+
+			For example:
+
+			  trace_trigger="sched_switch.stacktrace if prev_state == 2"
+
+			The above will enable the "stacktrace" trigger on the
+			"sched_switch" event, but only trigger it if the
+			"prev_state" of the "sched_switch" event is "2"
+			(TASK_UNINTERRUPTIBLE).
+
+			See also "Event triggers" in Documentation/trace/events.rst
+
 	traceoff_on_warning
 			[FTRACE] enable this option to disable tracing when a
 			warning is hit. This turns off "tracing_on". Tracing can
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 4b33b95eb8be..a5c6b5772897 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -90,6 +90,10 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb)
 #ifdef CONFIG_TRACEPOINTS
 static inline void tracepoint_synchronize_unregister(void)
 {
+	/* Early updates do not need synchronization */
+	if (early_boot_irqs_disabled)
+		return;
+
 	synchronize_srcu(&tracepoint_srcu);
 	synchronize_rcu();
 }
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 47a44b055a1d..c03fd7037add 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2749,7 +2749,8 @@ void trace_buffered_event_disable(void)
 	preempt_enable();
 
 	/* Wait for all current users to finish */
-	synchronize_rcu();
+	if (!early_boot_irqs_disabled)
+		synchronize_rcu();
 
 	for_each_tracing_cpu(cpu) {
 		free_page((unsigned long)per_cpu(trace_buffered_event, cpu));
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 0356cae0cf74..06554939252c 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -2796,6 +2796,44 @@ trace_create_new_event(struct trace_event_call *call,
 	return file;
 }
 
+#ifdef CONFIG_HIST_TRIGGERS
+#define MAX_BOOT_TRIGGERS 32
+
+static struct boot_triggers {
+	const char	*event;
+	char		*trigger;
+} bootup_triggers[MAX_BOOT_TRIGGERS];
+
+static char bootup_trigger_buf[COMMAND_LINE_SIZE];
+static int nr_boot_triggers;
+
+static __init int setup_trace_triggers(char *str)
+{
+	char *trigger;
+	char *buf;
+	int i;
+
+	strlcpy(bootup_trigger_buf, str, COMMAND_LINE_SIZE);
+	ring_buffer_expanded = true;
+	disable_tracing_selftest("running event triggers");
+
+	buf = bootup_trigger_buf;
+	for (i = 0; i < MAX_BOOT_TRIGGERS; i++) {
+		trigger = strsep(&buf, ",");
+		if (!trigger)
+			break;
+		bootup_triggers[i].event = strsep(&trigger, ".");
+		bootup_triggers[i].trigger = strsep(&trigger, ".");
+		if (!bootup_triggers[i].trigger)
+			break;
+	}
+
+	nr_boot_triggers = i;
+	return 1;
+}
+__setup("trace_trigger=", setup_trace_triggers);
+#endif
+
 /* Add an event to a trace directory */
 static int
 __trace_add_new_event(struct trace_event_call *call, struct trace_array *tr)
@@ -2822,12 +2860,32 @@ __trace_early_add_new_event(struct trace_event_call *call,
 			    struct trace_array *tr)
 {
 	struct trace_event_file *file;
+	int ret;
+	int i;
 
 	file = trace_create_new_event(call, tr);
 	if (!file)
 		return -ENOMEM;
 
-	return event_define_fields(call);
+	ret = event_define_fields(call);
+	if (ret)
+		return ret;
+
+#ifdef CONFIG_HIST_TRIGGERS
+	for (i = 0; i < nr_boot_triggers; i++) {
+		if (strcmp(trace_event_name(call), bootup_triggers[i].event))
+			continue;
+		mutex_lock(&event_mutex);
+		ret = trigger_process_regex(file, bootup_triggers[i].trigger);
+		mutex_unlock(&event_mutex);
+		if (ret)
+			pr_err("Failed to register trigger '%s' on event %s\n",
+			       bootup_triggers[i].trigger,
+			       bootup_triggers[i].event);
+	}
+#endif
+
+	return 0;
 }
 
 struct ftrace_module_file_ops;
@@ -3726,6 +3784,8 @@ static __init int event_trace_enable(void)
 			list_add(&call->list, &ftrace_events);
 	}
 
+	register_trigger_cmds();
+
 	/*
 	 * We need the top trace array to have a working set of trace
 	 * points at early init, before the debug files and directories
@@ -3740,7 +3800,6 @@ static __init int event_trace_enable(void)
 
 	register_event_cmds();
 
-	register_trigger_cmds();
 	return 0;
 }
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index f23144af5743..f6e4ee1e40b3 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -48,6 +48,9 @@ static void tp_rcu_get_state(enum tp_transition_sync sync)
 {
 	struct tp_transition_snapshot *snapshot = &tp_transition_snapshot[sync];
 
+	if (early_boot_irqs_disabled)
+		return;
+
 	/* Keep the latest get_state snapshot. */
 	snapshot->rcu = get_state_synchronize_rcu();
 	snapshot->srcu = start_poll_synchronize_srcu(&tracepoint_srcu);
@@ -58,6 +61,9 @@ static void tp_rcu_cond_sync(enum tp_transition_sync sync)
 {
 	struct tp_transition_snapshot *snapshot = &tp_transition_snapshot[sync];
 
+	if (early_boot_irqs_disabled)
+		return;
+
 	if (!snapshot->ongoing)
 		return;
 	cond_synchronize_rcu(snapshot->rcu);
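The setup_trace_triggers() parsing in the trace_events.c hunk above can be exercised as a standalone sketch; the helper below mirrors its strsep() logic in userspace (parse_trace_triggers() and struct boot_trigger are hypothetical stand-ins, not the kernel code):

```c
#define _DEFAULT_SOURCE		/* for strsep() in glibc */
#include <assert.h>
#include <string.h>

#define MAX_BOOT_TRIGGERS 32

struct boot_trigger {
	const char *event;	/* e.g. "sched_switch" */
	char *trigger;		/* e.g. "stacktrace if prev_state == 2" */
};

/*
 * Split "event.trigger[ if filter],event.trigger,..." the same way the
 * patch does: first on ',' to get each spec, then on '.' to peel off
 * the event name.  The optional " if <filter>" stays attached to the
 * trigger string (it would be handled later by trigger_process_regex()).
 * Modifies 'buf' in place; returns the number of parsed triggers.
 */
static int parse_trace_triggers(char *buf, struct boot_trigger *out)
{
	char *trigger;
	int i;

	for (i = 0; i < MAX_BOOT_TRIGGERS; i++) {
		trigger = strsep(&buf, ",");
		if (!trigger)
			break;
		out[i].event = strsep(&trigger, ".");
		out[i].trigger = strsep(&trigger, ".");
		if (!out[i].trigger)
			break;
	}
	return i;
}
```

One consequence of the second strsep() on '.' worth noting: a '.' appearing inside the trigger or filter text itself would end the token there, so trigger specs are effectively limited to dot-free bodies.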