Message ID | 20240926061311.25625-1-tatsuya.s2862@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v2] ftrace: Hide a extra entry in stack trace | expand |
On Thu, 26 Sep 2024 15:13:07 +0900 Tatsuya S <tatsuya.s2862@gmail.com> wrote: > A extra entry is shown on stack trace(CONFIG_UNWINDER_ORC=y). > > [003] ..... 110.171589: vfs_write <-__x64_sys_write > [003] ..... 110.171600: <stack trace> > => XXXXXXXXX (Wrong function name) > => vfs_write > => __x64_sys_write > => do_syscall_64 > => entry_SYSCALL_64_after_hwframe OK, I confirmed it; ------ echo 1 > options/func_stack_trace echo "vfs_write" >> set_ftrace_filter echo "function" > current_tracer echo > /dev/null cat trace sh-136 [005] ..... 266.884180: vfs_write <-ksys_write sh-136 [005] ..... 266.884188: <stack trace> => 0xffffffffa0004099 => vfs_write => ksys_write => do_syscall_64 => entry_SYSCALL_64_after_hwframe ------ > > To resolve this, increment skip in __ftrace_trace_stack(). > The reason why skip is incremented in __ftrace_trace_stack() > is because __ftrace_trace_stack() in stack trace is the only function > that wasn't skipped from anywhere. > > Signed-off-by: Tatsuya S <tatsuya.s2862@gmail.com> > --- > kernel/trace/trace.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c > index c3b2c7dfadef..0f2e255f563c 100644 > --- a/kernel/trace/trace.c > +++ b/kernel/trace/trace.c > @@ -2916,10 +2916,8 @@ static void __ftrace_trace_stack(struct trace_buffer *buffer, > * Add one, for this function and the call to save_stack_trace() > * If regs is set, then these functions will not be in the way. > */ Hmm, with this change, the above comment should also be updated. > -#ifndef CONFIG_UNWINDER_ORC > - if (!regs) > + if (IS_ENABLED(CONFIG_UNWINDER_ORC) || !regs) > skip++; > -#endif Also, this solves just one pattern (only enable function tracer) but if there are fprobes (or kprobes) on the same function, it introduces another issue. e.g. (with this fix) ------ echo 1 > options/func_stack_trace echo 1 > options/stacktrace echo "vfs_write" >> set_ftrace_filter echo "function" > current_tracer echo "f:myevent vfs_write" > dynamic_events echo 1 > events/fprobes/myevent/enable echo > /dev/null cat trace ... sh-140 [001] ...1. 18.352601: myevent: (vfs_write+0x4/0x560) sh-140 [001] ...1. 18.352602: <stack trace> => ksys_write => do_syscall_64 => entry_SYSCALL_64_after_hwframe sh-140 [001] ...1. 18.352602: vfs_write <-ksys_write sh-140 [001] ...1. 18.352604: <stack trace> => ftrace_regs_call => vfs_write => ksys_write => do_syscall_64 => entry_SYSCALL_64_after_hwframe ------ As you can see, myevent skips "vfs_write". (and function tracer still have ftrace_regs_call() ) Thank you, > > preempt_disable_notrace(); > > -- > 2.46.1 >
On 9/30/24 8:51 AM, Masami Hiramatsu (Google) wrote: > On Thu, 26 Sep 2024 15:13:07 +0900 > Tatsuya S<tatsuya.s2862@gmail.com> wrote: > >> A extra entry is shown on stack trace(CONFIG_UNWINDER_ORC=y). >> >> [003] ..... 110.171589: vfs_write <-__x64_sys_write >> [003] ..... 110.171600: <stack trace> >> => XXXXXXXXX (Wrong function name) >> => vfs_write >> => __x64_sys_write >> => do_syscall_64 >> => entry_SYSCALL_64_after_hwframe > OK, I confirmed it; > > ------ > echo 1 > options/func_stack_trace > echo "vfs_write" >> set_ftrace_filter > echo "function" > current_tracer > echo > /dev/null > cat trace > sh-136 [005] ..... 266.884180: vfs_write <-ksys_write > sh-136 [005] ..... 266.884188: <stack trace> > => 0xffffffffa0004099 > => vfs_write > => ksys_write > => do_syscall_64 > => entry_SYSCALL_64_after_hwframe > ------ > >> To resolve this, increment skip in __ftrace_trace_stack(). >> The reason why skip is incremented in __ftrace_trace_stack() >> is because __ftrace_trace_stack() in stack trace is the only function >> that wasn't skipped from anywhere. >> >> Signed-off-by: Tatsuya S<tatsuya.s2862@gmail.com> >> --- >> kernel/trace/trace.c | 4 +--- >> 1 file changed, 1 insertion(+), 3 deletions(-) >> >> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c >> index c3b2c7dfadef..0f2e255f563c 100644 >> --- a/kernel/trace/trace.c >> +++ b/kernel/trace/trace.c >> @@ -2916,10 +2916,8 @@ static void __ftrace_trace_stack(struct trace_buffer *buffer, >> * Add one, for this function and the call to save_stack_trace() >> * If regs is set, then these functions will not be in the way. >> */ > Hmm, with this change, the above comment should also be updated. > > >> -#ifndef CONFIG_UNWINDER_ORC >> - if (!regs) >> + if (IS_ENABLED(CONFIG_UNWINDER_ORC) || !regs) >> skip++; >> -#endif > Also, this solves just one pattern (only enable function tracer) but if > there are fprobes (or kprobes) on the same function, it introduces another issue. > e.g. (with this fix) > > ------ > echo 1 > options/func_stack_trace > echo 1 > options/stacktrace > echo "vfs_write" >> set_ftrace_filter > echo "function" > current_tracer > echo "f:myevent vfs_write" > dynamic_events > echo 1 > events/fprobes/myevent/enable > echo > /dev/null > cat trace > ... > sh-140 [001] ...1. 18.352601: myevent: (vfs_write+0x4/0x560) > sh-140 [001] ...1. 18.352602: <stack trace> > => ksys_write > => do_syscall_64 > => entry_SYSCALL_64_after_hwframe > sh-140 [001] ...1. 18.352602: vfs_write <-ksys_write > sh-140 [001] ...1. 18.352604: <stack trace> > => ftrace_regs_call > => vfs_write > => ksys_write > => do_syscall_64 > => entry_SYSCALL_64_after_hwframe > ------ > As you can see, myevent skips "vfs_write". > (and function tracer still have ftrace_regs_call() ) Thanks for the other tests. This issue may be function_trace_call() specific problem. So I will change the place to increment skip number. > Thank you, > >> >> preempt_disable_notrace(); >> >> -- >> 2.46.1 >> Thank you,
On 10/1/24 10:27 PM, ts wrote: > > On 9/30/24 8:51 AM, Masami Hiramatsu (Google) wrote: >> On Thu, 26 Sep 2024 15:13:07 +0900 >> Tatsuya S<tatsuya.s2862@gmail.com> wrote: >> >>> A extra entry is shown on stack trace(CONFIG_UNWINDER_ORC=y). >>> >>> [003] ..... 110.171589: vfs_write <-__x64_sys_write >>> [003] ..... 110.171600: <stack trace> >>> => XXXXXXXXX (Wrong function name) >>> => vfs_write >>> => __x64_sys_write >>> => do_syscall_64 >>> => entry_SYSCALL_64_after_hwframe >> OK, I confirmed it; >> >> ------ >> echo 1 > options/func_stack_trace >> echo "vfs_write" >> set_ftrace_filter >> echo "function" > current_tracer >> echo > /dev/null >> cat trace >> sh-136 [005] ..... 266.884180: vfs_write >> <-ksys_write >> sh-136 [005] ..... 266.884188: <stack trace> >> => 0xffffffffa0004099 >> => vfs_write >> => ksys_write >> => do_syscall_64 >> => entry_SYSCALL_64_after_hwframe >> ------ >> >>> To resolve this, increment skip in __ftrace_trace_stack(). >>> The reason why skip is incremented in __ftrace_trace_stack() >>> is because __ftrace_trace_stack() in stack trace is the only function >>> that wasn't skipped from anywhere. >>> >>> Signed-off-by: Tatsuya S<tatsuya.s2862@gmail.com> >>> --- >>> kernel/trace/trace.c | 4 +--- >>> 1 file changed, 1 insertion(+), 3 deletions(-) >>> >>> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c >>> index c3b2c7dfadef..0f2e255f563c 100644 >>> --- a/kernel/trace/trace.c >>> +++ b/kernel/trace/trace.c >>> @@ -2916,10 +2916,8 @@ static void __ftrace_trace_stack(struct >>> trace_buffer *buffer, >>> * Add one, for this function and the call to save_stack_trace() >>> * If regs is set, then these functions will not be in the way. >>> */ >> Hmm, with this change, the above comment should also be updated. >> >> >>> -#ifndef CONFIG_UNWINDER_ORC >>> - if (!regs) >>> + if (IS_ENABLED(CONFIG_UNWINDER_ORC) || !regs) >>> skip++; >>> -#endif >> Also, this solves just one pattern (only enable function tracer) but if >> there are fprobes (or kprobes) on the same function, it introduces >> another issue. >> e.g. (with this fix) >> >> ------ >> echo 1 > options/func_stack_trace >> echo 1 > options/stacktrace >> echo "vfs_write" >> set_ftrace_filter >> echo "function" > current_tracer >> echo "f:myevent vfs_write" > dynamic_events >> echo 1 > events/fprobes/myevent/enable >> echo > /dev/null >> cat trace >> ... >> sh-140 [001] ...1. 18.352601: myevent: >> (vfs_write+0x4/0x560) >> sh-140 [001] ...1. 18.352602: <stack trace> >> => ksys_write >> => do_syscall_64 >> => entry_SYSCALL_64_after_hwframe >> sh-140 [001] ...1. 18.352602: vfs_write >> <-ksys_write >> sh-140 [001] ...1. 18.352604: <stack trace> >> => ftrace_regs_call >> => vfs_write >> => ksys_write >> => do_syscall_64 >> => entry_SYSCALL_64_after_hwframe >> ------ >> As you can see, myevent skips "vfs_write". >> (and function tracer still have ftrace_regs_call() ) > > Thanks for the other tests. This issue may be function_trace_call() > specific problem. > Not function_trace_call(), function_stack_trace_call(). > So I will change the place to increment skip number. > >> Thank you, >> >>> preempt_disable_notrace(); >>> -- >>> 2.46.1 >>> > Thank you,
On Tue, 1 Oct 2024 22:27:03 +0900 ts <tatsuya.s2862@gmail.com> wrote: > > ... > > sh-140 [001] ...1. 18.352601: myevent: (vfs_write+0x4/0x560) > > sh-140 [001] ...1. 18.352602: <stack trace> > > => ksys_write > > => do_syscall_64 > > => entry_SYSCALL_64_after_hwframe > > sh-140 [001] ...1. 18.352602: vfs_write <-ksys_write > > sh-140 [001] ...1. 18.352604: <stack trace> > > => ftrace_regs_call > > => vfs_write > > => ksys_write > > => do_syscall_64 > > => entry_SYSCALL_64_after_hwframe > > ------ > > As you can see, myevent skips "vfs_write". > > (and function tracer still have ftrace_regs_call() ) > > Thanks for the other tests. This issue may be function_trace_call() > specific problem. > > So I will change the place to increment skip number. My fear is that we are going to just break it elsewhere. The problem with the "skip" is that there's so many configurations when we get here, we may not really know what to skip. If the compiler inlines something, then we may skip something we do not want to. I rather have extra information than not enough. -- Steve
On 10/1/24 10:47 PM, Steven Rostedt wrote: > On Tue, 1 Oct 2024 22:27:03 +0900 > ts <tatsuya.s2862@gmail.com> wrote: > >>> ... >>> sh-140 [001] ...1. 18.352601: myevent: (vfs_write+0x4/0x560) >>> sh-140 [001] ...1. 18.352602: <stack trace> >>> => ksys_write >>> => do_syscall_64 >>> => entry_SYSCALL_64_after_hwframe >>> sh-140 [001] ...1. 18.352602: vfs_write <-ksys_write >>> sh-140 [001] ...1. 18.352604: <stack trace> >>> => ftrace_regs_call >>> => vfs_write >>> => ksys_write >>> => do_syscall_64 >>> => entry_SYSCALL_64_after_hwframe >>> ------ >>> As you can see, myevent skips "vfs_write". >>> (and function tracer still have ftrace_regs_call() ) >> >> Thanks for the other tests. This issue may be function_trace_call() >> specific problem. >> >> So I will change the place to increment skip number. > > My fear is that we are going to just break it elsewhere. The problem with > the "skip" is that there's so many configurations when we get here, we may > not really know what to skip. If the compiler inlines something, then we > may skip something we do not want to. > > I rather have extra information than not enough. > > -- Steve It may not be clean and be bit redundant, but I think it would be more maintainable to treat "skip(and skipped functions)" separately only at the top(parent) of functions that display stack trace. thanks.
On Wed, 2 Oct 2024 15:43:35 +0900 Tatsuya S <tatsuya.s2862@gmail.com> wrote: > > > On 10/1/24 10:47 PM, Steven Rostedt wrote: > > On Tue, 1 Oct 2024 22:27:03 +0900 > > ts <tatsuya.s2862@gmail.com> wrote: > > > >>> ... > >>> sh-140 [001] ...1. 18.352601: myevent: (vfs_write+0x4/0x560) > >>> sh-140 [001] ...1. 18.352602: <stack trace> > >>> => ksys_write > >>> => do_syscall_64 > >>> => entry_SYSCALL_64_after_hwframe > >>> sh-140 [001] ...1. 18.352602: vfs_write <-ksys_write > >>> sh-140 [001] ...1. 18.352604: <stack trace> > >>> => ftrace_regs_call > >>> => vfs_write > >>> => ksys_write > >>> => do_syscall_64 > >>> => entry_SYSCALL_64_after_hwframe > >>> ------ > >>> As you can see, myevent skips "vfs_write". > >>> (and function tracer still have ftrace_regs_call() ) > >> > >> Thanks for the other tests. This issue may be function_trace_call() > >> specific problem. > >> > >> So I will change the place to increment skip number. > > > > My fear is that we are going to just break it elsewhere. The problem with > > the "skip" is that there's so many configurations when we get here, we may > > not really know what to skip. If the compiler inlines something, then we > > may skip something we do not want to. > > > > I rather have extra information than not enough. > > > > -- Steve > > It may not be clean and be bit redundant, but I think it would be more > maintainable to treat > > "skip(and skipped functions)" separately only at the top(parent) of > functions that display stack trace. I think you'd better make a set of test programs which gets the stacktrace with several different conditions (combinations of tracers/probes/kconfgis) at first. Then we can make sure it does not break anything. Thank you,
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index c3b2c7dfadef..0f2e255f563c 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -2916,10 +2916,8 @@ static void __ftrace_trace_stack(struct trace_buffer *buffer, * Add one, for this function and the call to save_stack_trace() * If regs is set, then these functions will not be in the way. */ -#ifndef CONFIG_UNWINDER_ORC - if (!regs) + if (IS_ENABLED(CONFIG_UNWINDER_ORC) || !regs) skip++; -#endif preempt_disable_notrace();
A extra entry is shown on stack trace(CONFIG_UNWINDER_ORC=y). [003] ..... 110.171589: vfs_write <-__x64_sys_write [003] ..... 110.171600: <stack trace> => XXXXXXXXX (Wrong function name) => vfs_write => __x64_sys_write => do_syscall_64 => entry_SYSCALL_64_after_hwframe To resolve this, increment skip in __ftrace_trace_stack(). The reason why skip is incremented in __ftrace_trace_stack() is because __ftrace_trace_stack() in stack trace is the only function that wasn't skipped from anywhere. Signed-off-by: Tatsuya S <tatsuya.s2862@gmail.com> --- kernel/trace/trace.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)