diff mbox series

[QUESTION] ftrace_test_recursion_trylock behaviour

Message ID ZUKLnmYyHpthlMEE@krava (mailing list archive)
State Not Applicable
Headers show
Series [QUESTION] ftrace_test_recursion_trylock behaviour | expand

Checks

Context Check Description
netdev/tree_selection success Not a local patch

Commit Message

Jiri Olsa Nov. 1, 2023, 5:32 p.m. UTC
hi,
I'm doing some testing on top of fprobes and noticed that the
ftrace_test_recursion_trylock allows caller from the same context
going through twice.

The change below adds extra fprobe on stack_trace_print, which is
called within the sample_entry_handler and I can see it being executed
with following trace output:

           <...>-457     [003] ...1.    32.352554: sample_entry_handler: Enter <kernel_clone+0x0/0x380> ip = 0xffffffff81177420
           <...>-457     [003] ...2.    32.352578: sample_entry_handler_extra: Enter <stack_trace_print+0x0/0x60> ip = 0xffffffff8127ae70

IOW nested ftrace_test_recursion_trylock call in the same context succeeded.

It seems the reason is the TRACE_CTX_TRANSITION bit logic.

Just making sure it's intentional.. we have kprobe_multi code on top of
fprobe with another re-entry logic and that might behave differently based
on ftrace_test_recursion_trylock logic.

thanks,
jirka


---

Comments

Steven Rostedt Nov. 1, 2023, 5:45 p.m. UTC | #1
On Wed, 1 Nov 2023 18:32:14 +0100
Jiri Olsa <olsajiri@gmail.com> wrote:

> hi,
> I'm doing some testing on top of fprobes and noticed that the
> ftrace_test_recursion_trylock allows caller from the same context
> going through twice.
> 
> The change below adds extra fprobe on stack_trace_print, which is
> called within the sample_entry_handler and I can see it being executed
> with following trace output:
> 
>            <...>-457     [003] ...1.    32.352554: sample_entry_handler:
> Enter <kernel_clone+0x0/0x380> ip = 0xffffffff81177420 <...>-457
> [003] ...2.    32.352578: sample_entry_handler_extra: Enter
> <stack_trace_print+0x0/0x60> ip = 0xffffffff8127ae70
> 
> IOW nested ftrace_test_recursion_trylock call in the same context
> succeeded.
> 
> It seems the reason is the TRACE_CTX_TRANSITION bit logic.
> 
> Just making sure it's intentional.. we have kprobe_multi code on top of
> fprobe with another re-entry logic and that might behave differently based
> on ftrace_test_recursion_trylock logic.

Yes it's intentional, as it's a work around for an issue that may be
cleared up now with Peter Zijlstra's noinstr updates.

The use case for that TRACE_CTX_TRANSITION is when a function is traced
just after an interrupt was triggered but before the preempt count was
updated to let us know that we are in an interrupt context.

Daniel Bristot reported a regression after the trylock was first introduced
where the interrupt entry function was traced sometimes but not always.
That's because if the interrupt happened normally, it would be traced, but
if the interrupt happened when another event was being traced, the recursion
logic would see that the trace of the interrupt was happening in the same
context as the event it interrupted and drop the interrupt trace. But after
the preempt count was updated, the other functions in the interrupt would be
seen. This led to very confusing trace output.

The solution to that was this workaround hack, where the trace recursion
logic would allow a single recursion (the interrupt preempting another
trace before it set preempt count).

But with noinstr, there should be no more instances of this problem and we
can drop that extra bit. But the last I checked, there were a few places
that still could be traced without the preempt_count set. I'll have to
re-investigate.

-- Steve
Jiri Olsa Nov. 2, 2023, 8:50 a.m. UTC | #2
On Wed, Nov 01, 2023 at 01:45:56PM -0400, Steven Rostedt wrote:
> On Wed, 1 Nov 2023 18:32:14 +0100
> Jiri Olsa <olsajiri@gmail.com> wrote:
> 
> > hi,
> > I'm doing some testing on top of fprobes and noticed that the
> > ftrace_test_recursion_trylock allows caller from the same context
> > going through twice.
> > 
> > The change below adds extra fprobe on stack_trace_print, which is
> > called within the sample_entry_handler and I can see it being executed
> > with following trace output:
> > 
> >            <...>-457     [003] ...1.    32.352554: sample_entry_handler:
> > Enter <kernel_clone+0x0/0x380> ip = 0xffffffff81177420 <...>-457
> > [003] ...2.    32.352578: sample_entry_handler_extra: Enter
> > <stack_trace_print+0x0/0x60> ip = 0xffffffff8127ae70
> > 
> > IOW nested ftrace_test_recursion_trylock call in the same context
> > succeeded.
> > 
> > It seems the reason is the TRACE_CTX_TRANSITION bit logic.
> > 
> > Just making sure it's intentional.. we have kprobe_multi code on top of
> > fprobe with another re-entry logic and that might behave differently based
> > on ftrace_test_recursion_trylock logic.
> 
> Yes it's intentional, as it's a work around for an issue that may be
> cleared up now with Peter Zijlstra's noinstr updates.
> 
> The use case for that TRACE_CTX_TRANSITION is when a function is traced
> just after an interrupt was triggered but before the preempt count was
> updated to let us know that we are in an interrupt context.
> 
> Daniel Bristot reported a regression after the trylock was first introduced
> where the interrupt entry function was traced sometimes but not always.
> That's because if the interrupt happened normally, it would be traced, but
> if the interrupt happened when another event was being traced, the recursion
> logic would see that the trace of the interrupt was happening in the same
> context as the event it interrupted and drop the interrupt trace. But after
> the preempt count was updated, the other functions in the interrupt would be
> seen. This led to very confusing trace output.
> 
> The solution to that was this workaround hack, where the trace recursion
> logic would allow a single recursion (the interrupt preempting another
> trace before it set preempt count).
> 
> But with noinstr, there should be no more instances of this problem and we
> can drop that extra bit. But the last I checked, there were a few places
> that still could be traced without the preempt_count set. I'll have to
> re-investigate.

I see, so I'll keep in mind that it could change in the future

thanks,
jirka
diff mbox series

Patch

diff --git a/samples/fprobe/fprobe_example.c b/samples/fprobe/fprobe_example.c
index 64e715e7ed11..531272af0682 100644
--- a/samples/fprobe/fprobe_example.c
+++ b/samples/fprobe/fprobe_example.c
@@ -23,6 +23,9 @@ 
 static struct fprobe sample_probe;
 static unsigned long nhit;
 
+static struct fprobe sample_probe_extra;
+static char *symbol_extra = "stack_trace_print";
+
 static char symbol[MAX_SYMBOL_LEN] = "kernel_clone";
 module_param_string(symbol, symbol, sizeof(symbol), 0644);
 MODULE_PARM_DESC(symbol, "Probed symbol(s), given by comma separated symbols or a wildcard pattern.");
@@ -48,6 +51,15 @@  static void show_backtrace(void)
 	stack_trace_print(stacks, len, 24);
 }
 
+static int sample_entry_handler_extra(struct fprobe *fp, unsigned long ip,
+				      unsigned long ret_ip,
+				      struct pt_regs *regs, void *data)
+{
+	if (use_trace)
+		trace_printk("Enter <%pS> ip = 0x%p\n", (void *)ip, (void *)ip);
+	return 0;
+}
+
 static int sample_entry_handler(struct fprobe *fp, unsigned long ip,
 				unsigned long ret_ip,
 				struct pt_regs *regs, void *data)
@@ -96,6 +108,8 @@  static int __init fprobe_init(void)
 	sample_probe.entry_handler = sample_entry_handler;
 	sample_probe.exit_handler = sample_exit_handler;
 
+	sample_probe_extra.entry_handler = sample_entry_handler_extra;
+
 	if (strchr(symbol, '*')) {
 		/* filter based fprobe */
 		ret = register_fprobe(&sample_probe, symbol,
@@ -137,12 +151,19 @@  static int __init fprobe_init(void)
 	else
 		pr_info("Planted fprobe at %s\n", symbol);
 
+	ret = register_fprobe_syms(&sample_probe_extra, (const char **) &symbol_extra, 1);
+	if (ret < 0)
+		pr_err("register_fprobe extra failed, returned %d\n", ret);
+	else
+		pr_info("Planted extra fprobe at %s\n", symbol_extra);
+
 	return ret;
 }
 
 static void __exit fprobe_exit(void)
 {
 	unregister_fprobe(&sample_probe);
+	unregister_fprobe(&sample_probe_extra);
 
 	pr_info("fprobe at %s unregistered. %ld times hit, %ld times missed\n",
 		symbol, nhit, sample_probe.nmissed);