diff mbox series

[v2] ftrace: Hide a extra entry in stack trace

Message ID 20240926061311.25625-1-tatsuya.s2862@gmail.com (mailing list archive)
State Superseded
Headers show
Series [v2] ftrace: Hide a extra entry in stack trace | expand

Commit Message

Tatsuya S Sept. 26, 2024, 6:13 a.m. UTC
A extra entry is shown on stack trace(CONFIG_UNWINDER_ORC=y).

    [003] .....   110.171589: vfs_write <-__x64_sys_write
    [003] .....   110.171600: <stack trace>
 => XXXXXXXXX (Wrong function name)
 => vfs_write
 => __x64_sys_write
 => do_syscall_64
 => entry_SYSCALL_64_after_hwframe

To resolve this, increment skip in __ftrace_trace_stack().
The reason why skip is incremented in __ftrace_trace_stack()
is because __ftrace_trace_stack() in stack trace is the only function
that wasn't skipped from anywhere.

Signed-off-by: Tatsuya S <tatsuya.s2862@gmail.com>
---
 kernel/trace/trace.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

Comments

Masami Hiramatsu (Google) Sept. 29, 2024, 11:51 p.m. UTC | #1
On Thu, 26 Sep 2024 15:13:07 +0900
Tatsuya S <tatsuya.s2862@gmail.com> wrote:

> A extra entry is shown on stack trace(CONFIG_UNWINDER_ORC=y).
> 
>     [003] .....   110.171589: vfs_write <-__x64_sys_write
>     [003] .....   110.171600: <stack trace>
>  => XXXXXXXXX (Wrong function name)
>  => vfs_write
>  => __x64_sys_write
>  => do_syscall_64
>  => entry_SYSCALL_64_after_hwframe

OK, I confirmed it;

------
echo 1 > options/func_stack_trace 
echo "vfs_write" >> set_ftrace_filter
echo "function" > current_tracer
echo > /dev/null
cat trace
              sh-136     [005] .....   266.884180: vfs_write <-ksys_write
              sh-136     [005] .....   266.884188: <stack trace>
 => 0xffffffffa0004099
 => vfs_write
 => ksys_write
 => do_syscall_64
 => entry_SYSCALL_64_after_hwframe
------

> 
> To resolve this, increment skip in __ftrace_trace_stack().
> The reason why skip is incremented in __ftrace_trace_stack()
> is because __ftrace_trace_stack() in stack trace is the only function
> that wasn't skipped from anywhere.
> 
> Signed-off-by: Tatsuya S <tatsuya.s2862@gmail.com>
> ---
>  kernel/trace/trace.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index c3b2c7dfadef..0f2e255f563c 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -2916,10 +2916,8 @@ static void __ftrace_trace_stack(struct trace_buffer *buffer,
>  	 * Add one, for this function and the call to save_stack_trace()
>  	 * If regs is set, then these functions will not be in the way.
>  	 */

Hmm, with this change, the above comment should also be updated.


> -#ifndef CONFIG_UNWINDER_ORC
> -	if (!regs)
> +	if (IS_ENABLED(CONFIG_UNWINDER_ORC) || !regs)
>  		skip++;
> -#endif

Also, this solves just one pattern (only enable function tracer) but if
there are fprobes (or kprobes) on the same function, it introduces another issue.
e.g. (with this fix)

------
echo 1 > options/func_stack_trace
echo 1 > options/stacktrace
echo "vfs_write" >> set_ftrace_filter
echo "function" > current_tracer
echo "f:myevent vfs_write" > dynamic_events
echo 1 > events/fprobes/myevent/enable
echo > /dev/null
cat trace
...
              sh-140     [001] ...1.    18.352601: myevent: (vfs_write+0x4/0x560)
              sh-140     [001] ...1.    18.352602: <stack trace>
 => ksys_write
 => do_syscall_64
 => entry_SYSCALL_64_after_hwframe
              sh-140     [001] ...1.    18.352602: vfs_write <-ksys_write
              sh-140     [001] ...1.    18.352604: <stack trace>
 => ftrace_regs_call
 => vfs_write
 => ksys_write
 => do_syscall_64
 => entry_SYSCALL_64_after_hwframe
------
As you can see, myevent skips "vfs_write".
(and function tracer still have ftrace_regs_call() )

Thank you,

>  
>  	preempt_disable_notrace();
>  
> -- 
> 2.46.1
>
Tatsuya S Oct. 1, 2024, 1:27 p.m. UTC | #2
On 9/30/24 8:51 AM, Masami Hiramatsu (Google) wrote:
> On Thu, 26 Sep 2024 15:13:07 +0900
> Tatsuya S<tatsuya.s2862@gmail.com> wrote:
>
>> A extra entry is shown on stack trace(CONFIG_UNWINDER_ORC=y).
>>
>>      [003] .....   110.171589: vfs_write <-__x64_sys_write
>>      [003] .....   110.171600: <stack trace>
>>   => XXXXXXXXX (Wrong function name)
>>   => vfs_write
>>   => __x64_sys_write
>>   => do_syscall_64
>>   => entry_SYSCALL_64_after_hwframe
> OK, I confirmed it;
>
> ------
> echo 1 > options/func_stack_trace
> echo "vfs_write" >> set_ftrace_filter
> echo "function" > current_tracer
> echo > /dev/null
> cat trace
>                sh-136     [005] .....   266.884180: vfs_write <-ksys_write
>                sh-136     [005] .....   266.884188: <stack trace>
>   => 0xffffffffa0004099
>   => vfs_write
>   => ksys_write
>   => do_syscall_64
>   => entry_SYSCALL_64_after_hwframe
> ------
>
>> To resolve this, increment skip in __ftrace_trace_stack().
>> The reason why skip is incremented in __ftrace_trace_stack()
>> is because __ftrace_trace_stack() in stack trace is the only function
>> that wasn't skipped from anywhere.
>>
>> Signed-off-by: Tatsuya S<tatsuya.s2862@gmail.com>
>> ---
>>   kernel/trace/trace.c | 4 +---
>>   1 file changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
>> index c3b2c7dfadef..0f2e255f563c 100644
>> --- a/kernel/trace/trace.c
>> +++ b/kernel/trace/trace.c
>> @@ -2916,10 +2916,8 @@ static void __ftrace_trace_stack(struct trace_buffer *buffer,
>>   	 * Add one, for this function and the call to save_stack_trace()
>>   	 * If regs is set, then these functions will not be in the way.
>>   	 */
> Hmm, with this change, the above comment should also be updated.
>
>
>> -#ifndef CONFIG_UNWINDER_ORC
>> -	if (!regs)
>> +	if (IS_ENABLED(CONFIG_UNWINDER_ORC) || !regs)
>>   		skip++;
>> -#endif
> Also, this solves just one pattern (only enable function tracer) but if
> there are fprobes (or kprobes) on the same function, it introduces another issue.
> e.g. (with this fix)
>
> ------
> echo 1 > options/func_stack_trace
> echo 1 > options/stacktrace
> echo "vfs_write" >> set_ftrace_filter
> echo "function" > current_tracer
> echo "f:myevent vfs_write" > dynamic_events
> echo 1 > events/fprobes/myevent/enable
> echo > /dev/null
> cat trace
> ...
>                sh-140     [001] ...1.    18.352601: myevent: (vfs_write+0x4/0x560)
>                sh-140     [001] ...1.    18.352602: <stack trace>
>   => ksys_write
>   => do_syscall_64
>   => entry_SYSCALL_64_after_hwframe
>                sh-140     [001] ...1.    18.352602: vfs_write <-ksys_write
>                sh-140     [001] ...1.    18.352604: <stack trace>
>   => ftrace_regs_call
>   => vfs_write
>   => ksys_write
>   => do_syscall_64
>   => entry_SYSCALL_64_after_hwframe
> ------
> As you can see, myevent skips "vfs_write".
> (and function tracer still have ftrace_regs_call() )

Thanks for the other tests. This issue may be function_trace_call() 
specific problem.

So I will change the place to increment skip number.

> Thank you,
>
>>   
>>   	preempt_disable_notrace();
>>   
>> -- 
>> 2.46.1
>>
Thank you,
Tatsuya S Oct. 1, 2024, 1:33 p.m. UTC | #3
On 10/1/24 10:27 PM, ts wrote:
>
> On 9/30/24 8:51 AM, Masami Hiramatsu (Google) wrote:
>> On Thu, 26 Sep 2024 15:13:07 +0900
>> Tatsuya S<tatsuya.s2862@gmail.com> wrote:
>>
>>> A extra entry is shown on stack trace(CONFIG_UNWINDER_ORC=y).
>>>
>>>      [003] .....   110.171589: vfs_write <-__x64_sys_write
>>>      [003] .....   110.171600: <stack trace>
>>>   => XXXXXXXXX (Wrong function name)
>>>   => vfs_write
>>>   => __x64_sys_write
>>>   => do_syscall_64
>>>   => entry_SYSCALL_64_after_hwframe
>> OK, I confirmed it;
>>
>> ------
>> echo 1 > options/func_stack_trace
>> echo "vfs_write" >> set_ftrace_filter
>> echo "function" > current_tracer
>> echo > /dev/null
>> cat trace
>>                sh-136     [005] .....   266.884180: vfs_write 
>> <-ksys_write
>>                sh-136     [005] .....   266.884188: <stack trace>
>>   => 0xffffffffa0004099
>>   => vfs_write
>>   => ksys_write
>>   => do_syscall_64
>>   => entry_SYSCALL_64_after_hwframe
>> ------
>>
>>> To resolve this, increment skip in __ftrace_trace_stack().
>>> The reason why skip is incremented in __ftrace_trace_stack()
>>> is because __ftrace_trace_stack() in stack trace is the only function
>>> that wasn't skipped from anywhere.
>>>
>>> Signed-off-by: Tatsuya S<tatsuya.s2862@gmail.com>
>>> ---
>>>   kernel/trace/trace.c | 4 +---
>>>   1 file changed, 1 insertion(+), 3 deletions(-)
>>>
>>> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
>>> index c3b2c7dfadef..0f2e255f563c 100644
>>> --- a/kernel/trace/trace.c
>>> +++ b/kernel/trace/trace.c
>>> @@ -2916,10 +2916,8 @@ static void __ftrace_trace_stack(struct 
>>> trace_buffer *buffer,
>>>        * Add one, for this function and the call to save_stack_trace()
>>>        * If regs is set, then these functions will not be in the way.
>>>        */
>> Hmm, with this change, the above comment should also be updated.
>>
>>
>>> -#ifndef CONFIG_UNWINDER_ORC
>>> -    if (!regs)
>>> +    if (IS_ENABLED(CONFIG_UNWINDER_ORC) || !regs)
>>>           skip++;
>>> -#endif
>> Also, this solves just one pattern (only enable function tracer) but if
>> there are fprobes (or kprobes) on the same function, it introduces 
>> another issue.
>> e.g. (with this fix)
>>
>> ------
>> echo 1 > options/func_stack_trace
>> echo 1 > options/stacktrace
>> echo "vfs_write" >> set_ftrace_filter
>> echo "function" > current_tracer
>> echo "f:myevent vfs_write" > dynamic_events
>> echo 1 > events/fprobes/myevent/enable
>> echo > /dev/null
>> cat trace
>> ...
>>                sh-140     [001] ...1.    18.352601: myevent: 
>> (vfs_write+0x4/0x560)
>>                sh-140     [001] ...1.    18.352602: <stack trace>
>>   => ksys_write
>>   => do_syscall_64
>>   => entry_SYSCALL_64_after_hwframe
>>                sh-140     [001] ...1.    18.352602: vfs_write 
>> <-ksys_write
>>                sh-140     [001] ...1.    18.352604: <stack trace>
>>   => ftrace_regs_call
>>   => vfs_write
>>   => ksys_write
>>   => do_syscall_64
>>   => entry_SYSCALL_64_after_hwframe
>> ------
>> As you can see, myevent skips "vfs_write".
>> (and function tracer still have ftrace_regs_call() )
>
> Thanks for the other tests. This issue may be function_trace_call() 
> specific problem.
>
Not function_trace_call(),  function_stack_trace_call().
> So I will change the place to increment skip number.
>
>> Thank you,
>>
>>>         preempt_disable_notrace();
>>>   --
>>> 2.46.1
>>>
> Thank you,
Steven Rostedt Oct. 1, 2024, 1:47 p.m. UTC | #4
On Tue, 1 Oct 2024 22:27:03 +0900
ts <tatsuya.s2862@gmail.com> wrote:

> > ...
> >                sh-140     [001] ...1.    18.352601: myevent: (vfs_write+0x4/0x560)
> >                sh-140     [001] ...1.    18.352602: <stack trace>  
> >   => ksys_write
> >   => do_syscall_64
> >   => entry_SYSCALL_64_after_hwframe  
> >                sh-140     [001] ...1.    18.352602: vfs_write <-ksys_write
> >                sh-140     [001] ...1.    18.352604: <stack trace>  
> >   => ftrace_regs_call
> >   => vfs_write
> >   => ksys_write
> >   => do_syscall_64
> >   => entry_SYSCALL_64_after_hwframe  
> > ------
> > As you can see, myevent skips "vfs_write".
> > (and function tracer still have ftrace_regs_call() )  
> 
> Thanks for the other tests. This issue may be function_trace_call() 
> specific problem.
> 
> So I will change the place to increment skip number.

My fear is that we are going to just break it elsewhere. The problem with
the "skip" is that there's so many configurations when we get here, we may
not really know what to skip. If the compiler inlines something, then we
may skip something we do not want to.

I rather have extra information than not enough.

-- Steve
Tatsuya S Oct. 2, 2024, 6:43 a.m. UTC | #5
On 10/1/24 10:47 PM, Steven Rostedt wrote:
> On Tue, 1 Oct 2024 22:27:03 +0900
> ts <tatsuya.s2862@gmail.com> wrote:
> 
>>> ...
>>>                 sh-140     [001] ...1.    18.352601: myevent: (vfs_write+0x4/0x560)
>>>                 sh-140     [001] ...1.    18.352602: <stack trace>
>>>    => ksys_write
>>>    => do_syscall_64
>>>    => entry_SYSCALL_64_after_hwframe
>>>                 sh-140     [001] ...1.    18.352602: vfs_write <-ksys_write
>>>                 sh-140     [001] ...1.    18.352604: <stack trace>
>>>    => ftrace_regs_call
>>>    => vfs_write
>>>    => ksys_write
>>>    => do_syscall_64
>>>    => entry_SYSCALL_64_after_hwframe
>>> ------
>>> As you can see, myevent skips "vfs_write".
>>> (and function tracer still have ftrace_regs_call() )
>>
>> Thanks for the other tests. This issue may be function_trace_call()
>> specific problem.
>>
>> So I will change the place to increment skip number.
> 
> My fear is that we are going to just break it elsewhere. The problem with
> the "skip" is that there's so many configurations when we get here, we may
> not really know what to skip. If the compiler inlines something, then we
> may skip something we do not want to.
> 
> I rather have extra information than not enough.
> 
> -- Steve

It may not be clean and be bit redundant, but I think it would be more 
maintainable to treat

"skip(and skipped functions)" separately only at the top(parent) of 
functions that display stack trace.


thanks.
Masami Hiramatsu (Google) Oct. 2, 2024, 2:02 p.m. UTC | #6
On Wed, 2 Oct 2024 15:43:35 +0900
Tatsuya S <tatsuya.s2862@gmail.com> wrote:

> 
> 
> On 10/1/24 10:47 PM, Steven Rostedt wrote:
> > On Tue, 1 Oct 2024 22:27:03 +0900
> > ts <tatsuya.s2862@gmail.com> wrote:
> > 
> >>> ...
> >>>                 sh-140     [001] ...1.    18.352601: myevent: (vfs_write+0x4/0x560)
> >>>                 sh-140     [001] ...1.    18.352602: <stack trace>
> >>>    => ksys_write
> >>>    => do_syscall_64
> >>>    => entry_SYSCALL_64_after_hwframe
> >>>                 sh-140     [001] ...1.    18.352602: vfs_write <-ksys_write
> >>>                 sh-140     [001] ...1.    18.352604: <stack trace>
> >>>    => ftrace_regs_call
> >>>    => vfs_write
> >>>    => ksys_write
> >>>    => do_syscall_64
> >>>    => entry_SYSCALL_64_after_hwframe
> >>> ------
> >>> As you can see, myevent skips "vfs_write".
> >>> (and function tracer still have ftrace_regs_call() )
> >>
> >> Thanks for the other tests. This issue may be function_trace_call()
> >> specific problem.
> >>
> >> So I will change the place to increment skip number.
> > 
> > My fear is that we are going to just break it elsewhere. The problem with
> > the "skip" is that there's so many configurations when we get here, we may
> > not really know what to skip. If the compiler inlines something, then we
> > may skip something we do not want to.
> > 
> > I rather have extra information than not enough.
> > 
> > -- Steve
> 
> It may not be clean and be bit redundant, but I think it would be more 
> maintainable to treat
> 
> "skip(and skipped functions)" separately only at the top(parent) of 
> functions that display stack trace.

I think you'd better make a set of test programs which gets the stacktrace
with several different conditions (combinations of tracers/probes/kconfgis)
at first. Then we can make sure it does not break anything. 

Thank you,
diff mbox series

Patch

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index c3b2c7dfadef..0f2e255f563c 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2916,10 +2916,8 @@  static void __ftrace_trace_stack(struct trace_buffer *buffer,
 	 * Add one, for this function and the call to save_stack_trace()
 	 * If regs is set, then these functions will not be in the way.
 	 */
-#ifndef CONFIG_UNWINDER_ORC
-	if (!regs)
+	if (IS_ENABLED(CONFIG_UNWINDER_ORC) || !regs)
 		skip++;
-#endif
 
 	preempt_disable_notrace();