Message ID | 20241018214300.6df82178@rorschach (mailing list archive) |
---|---|
State | Accepted |
Commit | 2c02f7375e658ae93d57a31a66f91b62754ef8f1 |
Series | fgraph: Use CPU hotplug mechanism to initialize idle shadow stacks |

On Fri, 18 Oct 2024 21:43:00 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> From: Steven Rostedt <rostedt@goodmis.org>
>
> The function graph infrastructure allocates a shadow stack for every task
> when enabled. This includes the idle tasks. The first time the function
> graph is invoked, the shadow stacks are created and never freed until the
> task exits. This includes the idle tasks.
>
> Only the idle tasks that were for online CPUs had their shadow stacks
> created when function graph tracing started. If function graph tracing is
> enabled and a CPU comes online, the idle task representing that CPU will
> not have its shadow stack created, and all function graph tracing for that
> idle task will be silently dropped.
>
> Instead, use the CPU hotplug mechanism to allocate the idle shadow stacks.
> This will include idle tasks for CPUs that come online during tracing.
>
> This issue can be reproduced by:
>
>   # cd /sys/kernel/tracing
>   # echo 0 > /sys/devices/system/cpu/cpu1/online
>   # echo 0 > set_ftrace_pid
>   # echo function_graph > current_tracer
>   # echo 1 > options/funcgraph-proc
>   # echo 1 > /sys/devices/system/cpu/cpu1/online
>   # grep '<idle>' per_cpu/cpu1/trace | head
>
> Before, nothing would show up.
>
> After:
>  1)    <idle>-0    |   0.811 us    |            __enqueue_entity();
>  1)    <idle>-0    |   5.626 us    |          } /* enqueue_entity */
>  1)    <idle>-0    |               |          dl_server_update_idle_time() {
>  1)    <idle>-0    |               |            dl_scaled_delta_exec() {
>  1)    <idle>-0    |   0.450 us    |              arch_scale_cpu_capacity();
>  1)    <idle>-0    |   1.242 us    |            }
>  1)    <idle>-0    |   1.908 us    |          }
>  1)    <idle>-0    |               |          dl_server_start() {
>  1)    <idle>-0    |               |            enqueue_dl_entity() {
>  1)    <idle>-0    |               |              task_contending() {
>
> Note, if tracing stops and restarts, the old way would then initialize
> the onlined CPUs.
>

Looks good to me, except one comment below;

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

[...]
>  int register_ftrace_graph(struct fgraph_ops *gops)
>  {
> +	static bool fgraph_initialized;
>  	int command = 0;
>  	int ret = 0;
>  	int i = -1;
>
>  	mutex_lock(&ftrace_lock);
>
> +	if (!fgraph_initialized) {
> +		ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "fgraph_idle_init",

Nit: Maybe it is better to call it as "tracing/fgraph:online" ?

Thank you,

> +					fgraph_cpu_init, NULL);
> +		if (ret < 0) {
> +			pr_warn("fgraph: Error to init cpu hotplug support\n");
> +			return ret;
> +		}
> +		fgraph_initialized = true;
> +		ret = 0;
> +	}
> +
>  	if (!fgraph_array[0]) {
>  		/* The array must always have real data on it */
>  		for (i = 0; i < FGRAPH_ARRAY_SIZE; i++)
> --
> 2.45.2
>

On Mon, 21 Oct 2024 14:58:10 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> > +	if (!fgraph_initialized) {
> > +		ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "fgraph_idle_init",
>
> Nit: Maybe it is better to call it as "tracing/fgraph:online" ?

Ah this already went upstream. But yeah, I can rename it for the merge window.

-- Steve

On Mon, 21 Oct 2024 14:58:10 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> > +	if (!fgraph_initialized) {
> > +		ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "fgraph_idle_init",
>
> Nit: Maybe it is better to call it as "tracing/fgraph:online" ?

I'm going to call it "fgraph:online" as it's not technically tracing.

I'll also add it to my urgent branch so it gets into 6.12.

-- Steve
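
The behaviour described in the patch description at the top of the thread can be illustrated with a small userspace model. This is only a sketch, not kernel code: every name in it (`cpu_online`, `idle_has_stack`, `setup_online_callback()`, `bring_cpu_online()` and so on) is hypothetical and merely stands in for the CPU online mask, the idle tasks' shadow stacks, and a `cpuhp_setup_state()`-style registration whose callback runs for CPUs that are already online and again for each CPU that comes online later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4

typedef void (*online_cb_t)(unsigned int cpu);

/* Hypothetical stand-ins for kernel state. */
static bool cpu_online[NR_CPUS];      /* models the CPU online mask */
static bool idle_has_stack[NR_CPUS];  /* models idle_task(cpu)->ret_stack != NULL */
static online_cb_t online_cb;         /* this toy model holds a single callback */

static void alloc_idle_stack(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	idle_has_stack[cpu] = true;
}

/* Old scheme: walk the CPUs that are online at the moment tracing starts. */
static void start_tracing_walk_online(void)
{
	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu_online[cpu])
			alloc_idle_stack(cpu);
}

/* New scheme: register a callback; it runs now for every online CPU... */
static void setup_online_callback(online_cb_t cb)
{
	online_cb = cb;
	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu_online[cpu])
			cb(cpu);
}

/* ...and again whenever another CPU is brought online. */
static void bring_cpu_online(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	cpu_online[cpu] = true;
	if (online_cb)
		online_cb(cpu);
}

/* Start with CPU 1 offline, as in the reproduction steps. */
static void reset_model(void)
{
	online_cb = NULL;
	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++) {
		cpu_online[cpu] = (cpu != 1);
		idle_has_stack[cpu] = false;
	}
}

/* Does CPU 1's idle task have a stack if it comes online after the walk? */
bool cpu1_traced_with_walk(void)
{
	reset_model();
	start_tracing_walk_online();
	bring_cpu_online(1);
	return idle_has_stack[1];
}

/* Same question when a hotplug-style callback is registered instead. */
bool cpu1_traced_with_callback(void)
{
	reset_model();
	setup_online_callback(alloc_idle_stack);
	bring_cpu_online(1);
	return idle_has_stack[1];
}
```

In this model `cpu1_traced_with_walk()` reports that CPU 1 never receives a stack, which corresponds to the "nothing would show up" case, while `cpu1_traced_with_callback()` reports that it does.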

Hi Stephen,

On Sat, Oct 19, 2024 at 3:43 AM Steven Rostedt <rostedt@goodmis.org> wrote:

> The function graph infrastructure allocates a shadow stack for every task
> when enabled. This includes the idle tasks. The first time the function
> graph is invoked, the shadow stacks are created and never freed until the
> task exits. This includes the idle tasks.
(...)
> Cc: stable@vger.kernel.org
> Fixes: 868baf07b1a25 ("ftrace: Fix memory leak with function graph and cpu hotplug")
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

This patch regressed boot-time tracing for me.

How to reproduce:
- Enable CONFIG_FTRACE, CONFIG_FUNCTION_TRACER,
  CONFIG_BOOTTIME_TRACING
- Pass command line
  ftrace=function_graph ftrace_graph_filter=do_idle
  to make ftrace trace this function all through the boot process.

Before this patch:

cd /sys/kernel/debug/tracing
cat trace

gives a nice trace of all invocations of do_idle() during boot.

After this patch:

cd /sys/kernel/debug/tracing
cat trace

Gives an empty trace :(

And:

cat current_tracer
function_graph
cat set_graph_function
do_idle
cat tracing_on
1

So all *is* set up, just not performing

I tried to figure out why this happens but I'm not good with tracing
internals. Any ideas?

Yours,
Linus Walleij

On Tue, 10 Dec 2024 16:11:16 +0100
Linus Walleij <linus.walleij@linaro.org> wrote:

> This patch regressed boot-time tracing for me.
>
> How to reproduce:
> - Enable CONFIG_FTRACE, CONFIG_FUNCTION_TRACER,
>   CONFIG_BOOTTIME_TRACING
> - Pass command line
>   ftrace=function_graph ftrace_graph_filter=do_idle
>   to make ftrace trace this function all through the boot process.
>
> Before this patch:
>
> cd /sys/kernel/debug/tracing
> cat trace
>
> gives a nice trace of all invocations of do_idle() during boot.
>
> After this patch:
>
> cd /sys/kernel/debug/tracing
> cat trace
>
> Gives an empty trace :(
>
> And:
>
> cat current_tracer
> function_graph
> cat set_graph_function
> do_idle
> cat tracing_on
> 1
>
> So all *is* set up, just not performing
>
> I tried to figure out why this happens but I'm not good with tracing
> internals. Any ideas?

Thanks for the report. I'm currently at the ELISA workshop this week, but
will try to reproduce it.

-- Steve

On Tue, 10 Dec 2024 16:11:16 +0100
Linus Walleij <linus.walleij@linaro.org> wrote:

> Hi Stephen,
>
> On Sat, Oct 19, 2024 at 3:43 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > The function graph infrastructure allocates a shadow stack for every task
> > when enabled. This includes the idle tasks. The first time the function
> > graph is invoked, the shadow stacks are created and never freed until the
> > task exits. This includes the idle tasks.
> (...)
> > Cc: stable@vger.kernel.org
> > Fixes: 868baf07b1a25 ("ftrace: Fix memory leak with function graph and cpu hotplug")
> > Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
>
> This patch regressed boot-time tracing for me.
>
> How to reproduce:
> - Enable CONFIG_FTRACE, CONFIG_FUNCTION_TRACER,
>   CONFIG_BOOTTIME_TRACING
> - Pass command line
>   ftrace=function_graph ftrace_graph_filter=do_idle
>   to make ftrace trace this function all through the boot process.
>
> Before this patch:
>
> cd /sys/kernel/debug/tracing
> cat trace
>
> gives a nice trace of all invocations of do_idle() during boot.
>
> After this patch:
>
> cd /sys/kernel/debug/tracing
> cat trace
>
> Gives an empty trace :(
>
> And:
>
> cat current_tracer
> function_graph
> cat set_graph_function
> do_idle
> cat tracing_on
> 1
>
> So all *is* set up, just not performing
>
> I tried to figure out why this happens but I'm not good with tracing
> internals. Any ideas?

Interesting. Does this happen only on boot-time tracing or after boot too?
If it does not work only for boot-time, cpuhp_setup_state() may not work
before starting boot-time function graph tracing.

Thank you,

>
> Yours,
> Linus Walleij

On Wed, Dec 11, 2024 at 12:24 AM Masami Hiramatsu <mhiramat@kernel.org> wrote:

> > cd /sys/kernel/debug/tracing
> > cat trace
> >
> > Gives an empty trace :(
> >
> > And:
> >
> > cat current_tracer
> > function_graph
> > cat set_graph_function
> > do_idle
> > cat tracing_on
> > 1
> >
> > So all *is* set up, just not performing
> >
> > I tried to figure out why this happens but I'm not good with tracing
> > internals. Any ideas?
>
> Interesting. Does this happen only on boot-time tracing or after boot too?
> If it does not work only for boot-time, cpuhp_setup_state() may not work
> before starting boot-time function graph tracing.

If I boot without any tracing enabled from the cmdline and:

echo 0 > tracing_on
echo function_graph > current_tracer
echo do_idle > set_graph_function
echo 1 > tracing_on

I don't get any output either.

It works for other functions, such as

echo ktime_get > set_graph_function

It seems it's the set_graph_function thing that isn't working
with do_idle at all after this patch. Why just this function...
The function is clearly there:

cat available_filter_functions | grep do_idle
do_idle

I can also verify that this function is indeed getting invoked
by adding prints to it (it's invoked all the time on any normal
system). Does this have something to do with the context
where do_idle is called? It's all really confusing...

Yours,
Linus Walleij

On Wed, 11 Dec 2024 15:23:05 +0100
Linus Walleij <linus.walleij@linaro.org> wrote:

> If I boot without any tracing enabled from the cmdline and:
>
> echo 0 > tracing_on
> echo function_graph > current_tracer
> echo do_idle > set_graph_function
> echo 1 > tracing_on
>
> I don't get any output either.
>
> It works for other functions, such as
>
> echo ktime_get > set_graph_function
>
> It seems it's the set_graph_function thing that isn't working
> with do_idle at all after this patch. Why just this function...
> The function is clearly there:
>
> cat available_filter_functions | grep do_idle
> do_idle
>
> I can also verify that this function is indeed getting invoked
> by adding prints to it (it's invoked all the time on any normal
> system). Does this have something to do with the context
> where do_idle is called? It's all really confusing...

Yeah, I figured it out. That commit moved the initialization before fgraph
was registered, and we had in ftrace_graph_init_idle_task():

	if (ftrace_graph_active) {
		unsigned long *ret_stack;

		ret_stack = per_cpu(idle_ret_stack, cpu);
		if (!ret_stack) {
			ret_stack = kmalloc(SHADOW_STACK_SIZE, GFP_KERNEL);
			if (!ret_stack)
				return;
			per_cpu(idle_ret_stack, cpu) = ret_stack;
		}
		graph_init_task(t, ret_stack);
	}

But because ftrace_graph_active was not set yet, the initialization didn't
happen.

Can you try this patch?

-- Steve

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 43f4e3f57438..4706a7dce93a 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -1160,13 +1160,19 @@ void fgraph_update_pid_func(void)
 static int start_graph_tracing(void)
 {
 	unsigned long **ret_stack_list;
-	int ret;
+	int ret, cpu;
 
 	ret_stack_list = kmalloc(SHADOW_STACK_SIZE, GFP_KERNEL);
 
 	if (!ret_stack_list)
 		return -ENOMEM;
 
+	/* The cpu_boot init_task->ret_stack will never be freed */
+	for_each_online_cpu(cpu) {
+		if (!idle_task(cpu)->ret_stack)
+			ftrace_graph_init_idle_task(idle_task(cpu), cpu);
+	}
+
 	do {
 		ret = alloc_retstack_tasklist(ret_stack_list);
 	} while (ret == -EAGAIN);
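
The ordering problem Steve describes can be reproduced in a toy userspace model. This is a sketch under stated assumptions, not the kernel implementation: `graph_active`, `init_idle_task()` and `run_online_callbacks()` are hypothetical stand-ins for `ftrace_graph_active`, `ftrace_graph_init_idle_task()` and the hotplug callback pass, and the model assumes that the loop added to `start_graph_tracing()` runs after the active flag has been set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4

/* Hypothetical stand-ins for kernel state. */
static bool graph_active;             /* models ftrace_graph_active */
static bool idle_has_stack[NR_CPUS];  /* models idle_task(cpu)->ret_stack != NULL */

/* Like the quoted snippet: a no-op unless tracing is already active. */
static void init_idle_task(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	if (graph_active)
		idle_has_stack[cpu] = true;
}

/* Models one pass over all online CPUs (hotplug callback or explicit loop). */
static void run_online_callbacks(void)
{
	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
		if (!idle_has_stack[cpu])
			init_idle_task(cpu);
}

static size_t count_initialized(void)
{
	size_t n = 0;

	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
		if (idle_has_stack[cpu])
			n++;
	return n;
}

static void reset_model(void)
{
	graph_active = false;
	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
		idle_has_stack[cpu] = false;
}

/* Regressed order: callbacks run first, the active flag is set afterwards. */
size_t initialized_regressed_order(void)
{
	reset_model();
	run_online_callbacks();   /* skipped internally: not active yet */
	graph_active = true;
	return count_initialized();
}

/* With the proposed patch: walk the online CPUs again once active. */
size_t initialized_with_second_walk(void)
{
	reset_model();
	run_online_callbacks();   /* still a no-op */
	graph_active = true;
	run_online_callbacks();   /* models the loop in start_graph_tracing() */
	return count_initialized();
}
```

In the model, the regressed order leaves every idle task without a shadow stack, which would match the empty trace for do_idle, while the second walk initializes all of them.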

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index d7d4fb403f6f..43f4e3f57438 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -1160,19 +1160,13 @@ void fgraph_update_pid_func(void)
 static int start_graph_tracing(void)
 {
 	unsigned long **ret_stack_list;
-	int ret, cpu;
+	int ret;
 
 	ret_stack_list = kmalloc(SHADOW_STACK_SIZE, GFP_KERNEL);
 
 	if (!ret_stack_list)
 		return -ENOMEM;
 
-	/* The cpu_boot init_task->ret_stack will never be freed */
-	for_each_online_cpu(cpu) {
-		if (!idle_task(cpu)->ret_stack)
-			ftrace_graph_init_idle_task(idle_task(cpu), cpu);
-	}
-
 	do {
 		ret = alloc_retstack_tasklist(ret_stack_list);
 	} while (ret == -EAGAIN);
@@ -1242,14 +1236,34 @@ static void ftrace_graph_disable_direct(bool disable_branch)
 		fgraph_direct_gops = &fgraph_stub;
 }
 
+/* The cpu_boot init_task->ret_stack will never be freed */
+static int fgraph_cpu_init(unsigned int cpu)
+{
+	if (!idle_task(cpu)->ret_stack)
+		ftrace_graph_init_idle_task(idle_task(cpu), cpu);
+	return 0;
+}
+
 int register_ftrace_graph(struct fgraph_ops *gops)
 {
+	static bool fgraph_initialized;
 	int command = 0;
 	int ret = 0;
 	int i = -1;
 
 	mutex_lock(&ftrace_lock);
 
+	if (!fgraph_initialized) {
+		ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "fgraph_idle_init",
+					fgraph_cpu_init, NULL);
+		if (ret < 0) {
+			pr_warn("fgraph: Error to init cpu hotplug support\n");
+			return ret;
+		}
+		fgraph_initialized = true;
+		ret = 0;
+	}
+
 	if (!fgraph_array[0]) {
 		/* The array must always have real data on it */
 		for (i = 0; i < FGRAPH_ARRAY_SIZE; i++)