diff mbox series

[v3] perf,x86: avoid missing caller address in stack traces captured in uprobe

Message ID 20240703040203.3368505-1-andrii@kernel.org (mailing list archive)
State Superseded
Headers show
Series [v3] perf,x86: avoid missing caller address in stack traces captured in uprobe | expand

Commit Message

Andrii Nakryiko July 3, 2024, 4:02 a.m. UTC
When tracing user functions with uprobe functionality, it's common to
install the probe (e.g., a BPF program) at the first instruction of the
function. This is often going to be `push %rbp` instruction in function
preamble, which means that within that function frame pointer hasn't
been established yet. This leads to consistently missing an actual
caller of the traced function, because perf_callchain_user() only
records current IP (capturing traced function) and then following frame
pointer chain (which would be caller's frame, containing the address of
caller's caller).

So when we have target_1 -> target_2 -> target_3 call chain and we are
tracing an entry to target_3, captured stack trace will report
target_1 -> target_3 call chain, which is wrong and confusing.

This patch proposes a x86-64-specific heuristic to detect `push %rbp`
(`push %ebp` on 32-bit architecture) instruction being traced. Given
entire kernel implementation of user space stack trace capturing works
under assumption that user space code was compiled with frame pointer
register (%rbp/%ebp) preservation, it seems pretty reasonable to use
this instruction as a strong indicator that this is the entry to the
function. In that case, return address is still pointed to by %rsp/%esp,
so we fetch it and add to stack trace before proceeding to unwind the
rest using frame pointer-based logic.

We also check for `endbr64` (for 64-bit modes) as another common pattern
for function entry, as suggested by Josh Poimboeuf. Even if we get this
wrong sometimes for uprobes attached not at the function entry, it's OK
because stack trace will still be overall meaningful, just with one
extra bogus entry. If we don't detect this, we end up with guaranteed to
be missing caller function entry in the stack trace, which is worse
overall.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
v2->v3:
  - added endr64 detection and extracted heuristics into a function (Josh);
v1->v2:
  - use native unsigned long for ret_addr (Peter);
  - add same logic for compat logic in perf_callchain_user32 (Peter).

 arch/x86/events/core.c  | 56 +++++++++++++++++++++++++++++++++++++++++
 include/linux/uprobes.h |  2 ++
 kernel/events/uprobes.c |  2 ++
 3 files changed, 60 insertions(+)

Comments

Josh Poimboeuf July 3, 2024, 10:39 p.m. UTC | #1
On Tue, Jul 02, 2024 at 09:02:03PM -0700, Andrii Nakryiko wrote:
> @@ -2833,6 +2858,18 @@ perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry_ctx *ent
>  
>  	fp = compat_ptr(ss_base + regs->bp);
>  	pagefault_disable();
> +
> +#ifdef CONFIG_UPROBES
> +	/* see perf_callchain_user() below for why we do this */
> +	if (current->utask) {
> +		u32 ret_addr;
> +
> +		if (is_uprobe_at_func_entry(regs, current->utask->auprobe) &&
> +		    !__get_user(ret_addr, (const u32 __user *)regs->sp))

Shouldn't the regs->sp value be checked with __access_ok() before
calling __get_user()?

Also, instead of littering functions with ifdefs it would be better to
abstract this out into a separate function which has an "always return
false" version for !CONFIG_UPROBES.  Then the above could be simplified to
something like:

	...
 	pagefault_disable();

	if (is_uprobe_at_func_entry(regs, current) &&
	    __access_ok(regs->sp, 4) &&
	    !__get_user(ret_addr, (const u32 __user *)regs->sp))
	    	perf_callchain_store(entry, ret_addr);
	...

Also it's good practice to wait at least several days before posting new
versions to avoid spamming reviewers and to give them time to digest
what you've already sent.
Andrii Nakryiko July 3, 2024, 11:26 p.m. UTC | #2
On Wed, Jul 3, 2024 at 3:39 PM Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> On Tue, Jul 02, 2024 at 09:02:03PM -0700, Andrii Nakryiko wrote:
> > @@ -2833,6 +2858,18 @@ perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry_ctx *ent
> >
> >       fp = compat_ptr(ss_base + regs->bp);
> >       pagefault_disable();
> > +
> > +#ifdef CONFIG_UPROBES
> > +     /* see perf_callchain_user() below for why we do this */
> > +     if (current->utask) {
> > +             u32 ret_addr;
> > +
> > +             if (is_uprobe_at_func_entry(regs, current->utask->auprobe) &&
> > +                 !__get_user(ret_addr, (const u32 __user *)regs->sp))
>
> Shouldn't the regs->sp value be checked with __access_ok() before
> calling __get_user()?

Ah, it's __get_user vs get_user quirk, right? Should I just use
get_user() here? It seems like existing code is trying to avoid two
__access_ok() checks for two fields of stack_frame, but here we don't
have that optimization opportunity anyways.

>
> Also, instead of littering functions with ifdefs it would be better to
> abstract this out into a separate function which has an "always return
> false" version for !CONFIG_UPROBES.  Then the above could be simplified to
> something like:

Sure, can do.

>
>         ...
>         pagefault_disable();

But I'd leave pagefault_disable() outside of that function, because
caller has to do it either way.

>
>         if (is_uprobe_at_func_entry(regs, current) &&
>             __access_ok(regs->sp, 4) &&
>             !__get_user(ret_addr, (const u32 __user *)regs->sp))
>                 perf_callchain_store(entry, ret_addr);
>         ...
>
> Also it's good practice to wait at least several days before posting new
> versions to avoid spamming reviewers and to give them time to digest
> what you've already sent.

I'm not sure about "at least several days", tbh. I generally try to
not post more often than once a day, and that only if I received some
meaningful reviewing feedback (like in your case). I do wait a few
days for reviews before pinging the mailing list again, though.

Would I get this feedback if I haven't posted v3? Or we'd just be
delaying the inevitable for a few more idle days? This particular
change (in it's initial version before yours and recent Peter's
comments) has been sitting under review since May 8th ([0], and then
posted without changes on May 21st, [1]), so I'm not exactly rushing
the things here.

Either way, I won't get to this until next week, so I won't spam you
too much anymore, sorry.

  [0] https://lore.kernel.org/linux-trace-kernel/20240508212605.4012172-4-andrii@kernel.org/
  [1] https://lore.kernel.org/linux-trace-kernel/20240522013845.1631305-4-andrii@kernel.org/

>
> --
> Josh
diff mbox series

Patch

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 5b0dd07b1ef1..2174a9d2173e 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2813,6 +2813,31 @@  static unsigned long get_segment_base(unsigned int segment)
 	return get_desc_base(desc);
 }
 
+#ifdef CONFIG_UPROBES
+/*
+ * Heuristic-based check if uprobe is installed at the function entry.
+ *
+ * Under assumption of user code being compiled with frame pointers,
+ * `push %rbp/%ebp` is a good indicator that we indeed are.
+ *
+ * Similarly, `endbr64` (assuming 64-bit mode) is also a common pattern.
+ * If we get this wrong, captured stack trace might have one extra bogus
+ * entry, but the rest of stack trace will still be meaningful.
+ */
+static bool is_uprobe_at_func_entry(struct pt_regs *regs, struct arch_uprobe *auprobe)
+{
+	if (!auprobe)
+		return false;
+	/* push %rbp/%ebp */
+	if (auprobe->insn[0] == 0x55)
+		return true;
+	/* endbr64 (64-bit only) */
+	if (user_64bit_mode(regs) && *(u32 *)auprobe->insn == 0xfa1e0ff3)
+		return true;
+	return false;
+}
+#endif
+
 #ifdef CONFIG_IA32_EMULATION
 
 #include <linux/compat.h>
@@ -2833,6 +2858,18 @@  perf_callchain_user32(struct pt_regs *regs, struct perf_callchain_entry_ctx *ent
 
 	fp = compat_ptr(ss_base + regs->bp);
 	pagefault_disable();
+
+#ifdef CONFIG_UPROBES
+	/* see perf_callchain_user() below for why we do this */
+	if (current->utask) {
+		u32 ret_addr;
+
+		if (is_uprobe_at_func_entry(regs, current->utask->auprobe) &&
+		    !__get_user(ret_addr, (const u32 __user *)regs->sp))
+			perf_callchain_store(entry, ret_addr);
+	}
+#endif
+
 	while (entry->nr < entry->max_stack) {
 		if (!valid_user_frame(fp, sizeof(frame)))
 			break;
@@ -2884,6 +2921,25 @@  perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs
 		return;
 
 	pagefault_disable();
+
+#ifdef CONFIG_UPROBES
+	/*
+	 * If we are called from uprobe handler, and we are indeed at the very
+	 * entry to user function (which is normally a `push %rbp` instruction,
+	 * under assumption of application being compiled with frame pointers),
+	 * we should read return address from *regs->sp before proceeding
+	 * to follow frame pointers, otherwise we'll skip immediate caller
+	 * as %rbp is not yet setup.
+	 */
+	if (current->utask) {
+		unsigned long ret_addr;
+
+		if (is_uprobe_at_func_entry(regs, current->utask->auprobe) &&
+		    !__get_user(ret_addr, (const unsigned long __user *)regs->sp))
+			perf_callchain_store(entry, ret_addr);
+	}
+#endif
+
 	while (entry->nr < entry->max_stack) {
 		if (!valid_user_frame(fp, sizeof(frame)))
 			break;
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index b503fafb7fb3..a270a5892ab4 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -76,6 +76,8 @@  struct uprobe_task {
 	struct uprobe			*active_uprobe;
 	unsigned long			xol_vaddr;
 
+	struct arch_uprobe              *auprobe;
+
 	struct return_instance		*return_instances;
 	unsigned int			depth;
 };
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 99be2adedbc0..6e22e4d80f1e 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -2082,6 +2082,7 @@  static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
 	bool need_prep = false; /* prepare return uprobe, when needed */
 
 	down_read(&uprobe->register_rwsem);
+	current->utask->auprobe = &uprobe->arch;
 	for (uc = uprobe->consumers; uc; uc = uc->next) {
 		int rc = 0;
 
@@ -2096,6 +2097,7 @@  static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
 
 		remove &= rc;
 	}
+	current->utask->auprobe = NULL;
 
 	if (need_prep && !remove)
 		prepare_uretprobe(uprobe, regs); /* put bp at return */