diff mbox series

[RFC,-tip,3/3] x86/kprobes,orc: Unwind optprobe trampoline correctly

Message ID 161716949640.721514.14252504351086671126.stgit@devnote2 (mailing list archive)
State RFC
Headers show
Series x86/kprobes,orc: Fix ORC unwinder to unwind stack with optimized probe | expand

Checks

Context Check Description
netdev/tree_selection success Not a local patch

Commit Message

Masami Hiramatsu (Google) March 31, 2021, 5:44 a.m. UTC
ORC Unwinder can not unwind if an optprobe trampoline is on the
stack because optprobe trampoline code has no ORC information.

This uses the ORC information on the template code of the
trampoline to adjust the sp register by ORC information and
extract the correct probed address from the optprobe trampoline
address.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 arch/x86/include/asm/kprobes.h |    6 ++++
 arch/x86/kernel/kprobes/opt.c  |   54 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/unwind_orc.c   |   15 +++++++++--
 3 files changed, 72 insertions(+), 3 deletions(-)

Comments

Josh Poimboeuf March 31, 2021, 3:57 p.m. UTC | #1
On Wed, Mar 31, 2021 at 02:44:56PM +0900, Masami Hiramatsu wrote:
> +#ifdef CONFIG_UNWINDER_ORC
> +unsigned long recover_optprobe_trampoline(unsigned long addr, unsigned long *sp)
> +{
> +	unsigned long offset, entry, probe_addr;
> +	struct optimized_kprobe *op;
> +	struct orc_entry *orc;
> +
> +	entry = find_kprobe_optinsn_slot_entry(addr);
> +	if (!entry)
> +		return addr;
> +
> +	offset = addr - entry;
> +
> +	/* Decode arg1 and get the optprobe */
> +	op = (void *)extract_set_arg1((void *)(entry + TMPL_MOVE_IDX));
> +	if (!op)
> +		return addr;
> +
> +	probe_addr = (unsigned long)op->kp.addr;
> +
> +	if (offset < TMPL_END_IDX) {
> +		orc = orc_find((unsigned long)optprobe_template_func + offset);
> +		if (!orc || orc->sp_reg != ORC_REG_SP)
> +			return addr;
> +		/*
> +		 * Since optprobe trampoline doesn't push caller on the stack,
> +		 * need to decrement 1 stack entry size
> +		 */
> +		*sp += orc->sp_offset - sizeof(long);
> +		return probe_addr;
> +	} else {
> +		return probe_addr + offset - TMPL_END_IDX;
> +	}
> +}
> +#endif

Hm, I'd like to avoid intertwining kprobes and ORC like this.

ORC unwinds other generated code by assuming the generated code uses a
frame pointer.  Could we do that here?

With CONFIG_FRAME_POINTER, unwinding works because SAVE_REGS_STRING has
ENCODE_FRAME_POINTER, but that's not going to work for ORC.

Instead of these patches, can we 'push %rbp; mov %rsp, %rbp' at the
beginning of the template and 'pop %rbp' at the end?

I guess SAVE_REGS_STRING would need to be smart enough to push the
original saved version of %rbp.  Of course then that breaks the
kretprobe_trampoline() usage, so it may need to be a separate macro.

[ Or make the same change to kretprobe_trampoline().  Then the other
  patch set wouldn't be needed either ;-) ]

Of course the downside is, when you get an interrupt during the frame
pointer setup, unwinding is broken.  But I think that's acceptable for
generated code.  We've lived with that limitation for all code, with
CONFIG_FRAME_POINTER, for many years.

Eventually we may want to have a way to register generated code (and the
ORC for it).
Masami Hiramatsu (Google) April 1, 2021, 1:44 a.m. UTC | #2
On Wed, 31 Mar 2021 10:57:36 -0500
Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> On Wed, Mar 31, 2021 at 02:44:56PM +0900, Masami Hiramatsu wrote:
> > +#ifdef CONFIG_UNWINDER_ORC
> > +unsigned long recover_optprobe_trampoline(unsigned long addr, unsigned long *sp)
> > +{
> > +	unsigned long offset, entry, probe_addr;
> > +	struct optimized_kprobe *op;
> > +	struct orc_entry *orc;
> > +
> > +	entry = find_kprobe_optinsn_slot_entry(addr);
> > +	if (!entry)
> > +		return addr;
> > +
> > +	offset = addr - entry;
> > +
> > +	/* Decode arg1 and get the optprobe */
> > +	op = (void *)extract_set_arg1((void *)(entry + TMPL_MOVE_IDX));
> > +	if (!op)
> > +		return addr;
> > +
> > +	probe_addr = (unsigned long)op->kp.addr;
> > +
> > +	if (offset < TMPL_END_IDX) {
> > +		orc = orc_find((unsigned long)optprobe_template_func + offset);
> > +		if (!orc || orc->sp_reg != ORC_REG_SP)
> > +			return addr;
> > +		/*
> > +		 * Since optprobe trampoline doesn't push caller on the stack,
> > +		 * need to decrement 1 stack entry size
> > +		 */
> > +		*sp += orc->sp_offset - sizeof(long);
> > +		return probe_addr;
> > +	} else {
> > +		return probe_addr + offset - TMPL_END_IDX;
> > +	}
> > +}
> > +#endif
> 
> Hm, I'd like to avoid intertwining kprobes and ORC like this.
> 
> ORC unwinds other generated code by assuming the generated code uses a
> frame pointer.  Could we do that here?

No, because the optprobe is not a function call. I considered to make
it call, but since it has to execute copied instructions directly on
the trampoline code (without changing stack frame) it is not possible.

> With CONFIG_FRAME_POINTER, unwinding works because SAVE_REGS_STRING has
> ENCODE_FRAME_POINTER, but that's not going to work for ORC.

Even in that case, the problem is that any interrupt can happen
before doing ENCODE_FRAME_POINTER. I think this ENCODE_FRAME_POINTER
in the SAVE_REGS_STRING is for probing right before the target
function setup a frame pointer.

> Instead of these patches, can we 'push %rbp; mov %rsp, %rbp' at the
> beginning of the template and 'pop %rbp' at the end?

No, since the trampoline code is not called, it is jumped into.
This means there is no "return address" in the stack. If we setup
the frame, there is no return address, thus it might stop there.
(Moreover, optprobe can copy multiple instructins on trampoline
buffer, since relative jump consumes 5bytes. where is the "return address"?)

> 
> I guess SAVE_REGS_STRING would need to be smart enough to push the
> original saved version of %rbp.  Of course then that breaks the
> kretprobe_trampoline() usage, so it may need to be a separate macro.
> 
> [ Or make the same change to kretprobe_trampoline().  Then the other
>   patch set wouldn't be needed either ;-) ]

Hmm, I don't think it is a good idea which making such change on the
optimized (hot) path only for the stack tracing. Moreover, that maybe
not transparent with the stack made by int3.

> Of course the downside is, when you get an interrupt during the frame
> pointer setup, unwinding is broken.  But I think that's acceptable for
> generated code.  We've lived with that limitation for all code, with
> CONFIG_FRAME_POINTER, for many years.

But above code can fix such issue too. To fix a corner case, non-generic
code may be required, even it is not so simple.

> 
> Eventually we may want to have a way to register generated code (and the
> ORC for it).
> 
> -- 
> Josh
>
Masami Hiramatsu (Google) April 1, 2021, 1:54 a.m. UTC | #3
On Wed, 31 Mar 2021 14:44:56 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:


> +#ifdef CONFIG_UNWINDER_ORC
> +unsigned long recover_optprobe_trampoline(unsigned long addr, unsigned long *sp)
> +{
> +	unsigned long offset, entry, probe_addr;
> +	struct optimized_kprobe *op;
> +	struct orc_entry *orc;
> +
> +	entry = find_kprobe_optinsn_slot_entry(addr);
> +	if (!entry)
> +		return addr;
> +
> +	offset = addr - entry;
> +
> +	/* Decode arg1 and get the optprobe */
> +	op = (void *)extract_set_arg1((void *)(entry + TMPL_MOVE_IDX));
> +	if (!op)
> +		return addr;
> +
> +	probe_addr = (unsigned long)op->kp.addr;
> +
> +	if (offset < TMPL_END_IDX) {

Maybe I should add a comment here.

/*
 * Since this is on the trampoline code based on the template code,
 * ORC information on the template code can be used for adjusting
 * stack pointer. The template code may change the stack pointer to
 * store pt_regs.
 */

> +		orc = orc_find((unsigned long)optprobe_template_func + offset);
> +		if (!orc || orc->sp_reg != ORC_REG_SP)
> +			return addr;
> +		/*
> +		 * Since optprobe trampoline doesn't push caller on the stack,
> +		 * need to decrement 1 stack entry size
> +		 */
> +		*sp += orc->sp_offset - sizeof(long);
> +		return probe_addr;
> +	} else {

/*
 * In this case, the address is on the instruction copied from probed
 * address, and jump instruction. Here the stack pointer is exactly
 * the same as the value where it was copied by the kprobes.
 */

> +		return probe_addr + offset - TMPL_END_IDX;
> +	}
> +}
> +#endif
> +


Thank you,
Masami Hiramatsu (Google) April 1, 2021, 2:19 a.m. UTC | #4
On Thu, 1 Apr 2021 10:44:52 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> On Wed, 31 Mar 2021 10:57:36 -0500
> Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> 
> > On Wed, Mar 31, 2021 at 02:44:56PM +0900, Masami Hiramatsu wrote:
> > > +#ifdef CONFIG_UNWINDER_ORC
> > > +unsigned long recover_optprobe_trampoline(unsigned long addr, unsigned long *sp)
> > > +{
> > > +	unsigned long offset, entry, probe_addr;
> > > +	struct optimized_kprobe *op;
> > > +	struct orc_entry *orc;
> > > +
> > > +	entry = find_kprobe_optinsn_slot_entry(addr);
> > > +	if (!entry)
> > > +		return addr;
> > > +
> > > +	offset = addr - entry;
> > > +
> > > +	/* Decode arg1 and get the optprobe */
> > > +	op = (void *)extract_set_arg1((void *)(entry + TMPL_MOVE_IDX));
> > > +	if (!op)
> > > +		return addr;
> > > +
> > > +	probe_addr = (unsigned long)op->kp.addr;
> > > +
> > > +	if (offset < TMPL_END_IDX) {
> > > +		orc = orc_find((unsigned long)optprobe_template_func + offset);
> > > +		if (!orc || orc->sp_reg != ORC_REG_SP)
> > > +			return addr;
> > > +		/*
> > > +		 * Since optprobe trampoline doesn't push caller on the stack,
> > > +		 * need to decrement 1 stack entry size
> > > +		 */
> > > +		*sp += orc->sp_offset - sizeof(long);
> > > +		return probe_addr;
> > > +	} else {
> > > +		return probe_addr + offset - TMPL_END_IDX;
> > > +	}
> > > +}
> > > +#endif
> > 
> > Hm, I'd like to avoid intertwining kprobes and ORC like this.
> > 
> > ORC unwinds other generated code by assuming the generated code uses a
> > frame pointer.  Could we do that here?
> 
> No, because the optprobe is not a function call. I considered to make
> it call, but since it has to execute copied instructions directly on
> the trampoline code (without changing stack frame) it is not possible.
> 
> > With CONFIG_FRAME_POINTER, unwinding works because SAVE_REGS_STRING has
> > ENCODE_FRAME_POINTER, but that's not going to work for ORC.
> 
> Even in that case, the problem is that any interrupt can happen
> before doing ENCODE_FRAME_POINTER. I think this ENCODE_FRAME_POINTER
> in the SAVE_REGS_STRING is for probing right before the target
> function setup a frame pointer.
> 
> > Instead of these patches, can we 'push %rbp; mov %rsp, %rbp' at the
> > beginning of the template and 'pop %rbp' at the end?
> 
> No, since the trampoline code is not called, it is jumped into.
> This means there is no "return address" in the stack. If we setup
> the frame, there is no return address, thus it might stop there.
> (Moreover, optprobe can copy multiple instructins on trampoline
> buffer, since relative jump consumes 5bytes. where is the "return address"?)
> 
> > 
> > I guess SAVE_REGS_STRING would need to be smart enough to push the
> > original saved version of %rbp.  Of course then that breaks the
> > kretprobe_trampoline() usage, so it may need to be a separate macro.
> > 
> > [ Or make the same change to kretprobe_trampoline().  Then the other
> >   patch set wouldn't be needed either ;-) ]
> 
> Hmm, I don't think it is a good idea which making such change on the
> optimized (hot) path only for the stack tracing. Moreover, that maybe
> not transparent with the stack made by int3.
> 
> > Of course the downside is, when you get an interrupt during the frame
> > pointer setup, unwinding is broken.  But I think that's acceptable for
> > generated code.  We've lived with that limitation for all code, with
> > CONFIG_FRAME_POINTER, for many years.
> 
> But above code can fix such issue too. To fix a corner case, non-generic
> code may be required, even it is not so simple.

Hmm, I would like to confirm your policy on ORC unwinder. If it doesn't
care the stacktrace from the interrupt handler, I think your suggestion
is OK. But in that case, from a developer viewpoint, I need to recommend
users to configure CONFIG_UNWIND_FRAME=y when CONFIG_KPROBES=y.

> > Eventually we may want to have a way to register generated code (and the
> > ORC for it).

I see, but the generated code usually does not have a generic way to
handle it. E.g. bpf has a solid entry point, but kretprobe trampoline's
entry point is any "RET", optprobe trampoline's entry point is a jump
which is also generated (patched) ...

Thank you,
diff mbox series

Patch

diff --git a/arch/x86/include/asm/kprobes.h b/arch/x86/include/asm/kprobes.h
index 71ea2eab43d5..9bbc45fcb3f1 100644
--- a/arch/x86/include/asm/kprobes.h
+++ b/arch/x86/include/asm/kprobes.h
@@ -119,9 +119,15 @@  extern int kprobe_exceptions_notify(struct notifier_block *self,
 				    unsigned long val, void *data);
 extern int kprobe_int3_handler(struct pt_regs *regs);
 
+unsigned long recover_optprobe_trampoline(unsigned long addr, unsigned long *sp);
 #else
 
 static inline int kprobe_debug_handler(struct pt_regs *regs) { return 0; }
+static inline unsigned long recover_optprobe_trampoline(unsigned long addr,
+							unsigned long *sp)
+{
+	return addr;
+}
 
 #endif /* CONFIG_KPROBES */
 #endif /* _ASM_X86_KPROBES_H */
diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
index 6d26e5cf2ba2..f91922ba4844 100644
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -30,6 +30,9 @@ 
 #include <asm/set_memory.h>
 #include <asm/sections.h>
 #include <asm/nospec-branch.h>
+#include <asm/orc_types.h>
+
+struct orc_entry *orc_find(unsigned long ip);
 
 #include "common.h"
 
@@ -100,6 +103,21 @@  static void synthesize_set_arg1(kprobe_opcode_t *addr, unsigned long val)
 	*(unsigned long *)addr = val;
 }
 
+/* Extract mov operand */
+static unsigned long extract_set_arg1(kprobe_opcode_t *addr)
+{
+#ifdef CONFIG_X86_64
+	if (addr[0] != 0x48 || addr[1] != 0xbf)
+		return 0;
+	addr += 2;
+#else
+	if (*addr != 0xb8)
+		return 0;
+	addr++;
+#endif
+	return *(unsigned long *)addr;
+}
+
 static void
 optimized_callback(struct optimized_kprobe *op, struct pt_regs *regs);
 
@@ -483,6 +501,42 @@  int arch_prepare_optimized_kprobe(struct optimized_kprobe *op,
 	goto out;
 }
 
+#ifdef CONFIG_UNWINDER_ORC
+unsigned long recover_optprobe_trampoline(unsigned long addr, unsigned long *sp)
+{
+	unsigned long offset, entry, probe_addr;
+	struct optimized_kprobe *op;
+	struct orc_entry *orc;
+
+	entry = find_kprobe_optinsn_slot_entry(addr);
+	if (!entry)
+		return addr;
+
+	offset = addr - entry;
+
+	/* Decode arg1 and get the optprobe */
+	op = (void *)extract_set_arg1((void *)(entry + TMPL_MOVE_IDX));
+	if (!op)
+		return addr;
+
+	probe_addr = (unsigned long)op->kp.addr;
+
+	if (offset < TMPL_END_IDX) {
+		orc = orc_find((unsigned long)optprobe_template_func + offset);
+		if (!orc || orc->sp_reg != ORC_REG_SP)
+			return addr;
+		/*
+		 * Since optprobe trampoline doesn't push caller on the stack,
+		 * need to decrement 1 stack entry size
+		 */
+		*sp += orc->sp_offset - sizeof(long);
+		return probe_addr;
+	} else {
+		return probe_addr + offset - TMPL_END_IDX;
+	}
+}
+#endif
+
 /*
  * Replace breakpoints (INT3) with relative jumps (JMP.d32).
  * Caller must call with locking kprobe_mutex and text_mutex.
diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index c70dfeea4552..9f685f9c2358 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -79,7 +79,7 @@  static struct orc_entry *orc_module_find(unsigned long ip)
 #endif
 
 #ifdef CONFIG_DYNAMIC_FTRACE
-static struct orc_entry *orc_find(unsigned long ip);
+struct orc_entry *orc_find(unsigned long ip);
 
 /*
  * Ftrace dynamic trampolines do not have orc entries of their own.
@@ -142,7 +142,7 @@  static struct orc_entry orc_fp_entry = {
 	.end		= 0,
 };
 
-static struct orc_entry *orc_find(unsigned long ip)
+struct orc_entry *orc_find(unsigned long ip)
 {
 	static struct orc_entry *orc;
 
@@ -537,6 +537,7 @@  bool unwind_next_frame(struct unwind_state *state)
 
 		state->ip = unwind_recover_ret_addr(state, state->ip,
 						    (unsigned long *)ip_p);
+		state->ip = recover_optprobe_trampoline(state->ip, &sp);
 		state->sp = sp;
 		state->regs = NULL;
 		state->prev_regs = NULL;
@@ -558,6 +559,14 @@  bool unwind_next_frame(struct unwind_state *state)
 		 */
 		state->ip = unwind_recover_kretprobe(state, state->ip,
 				(unsigned long *)(state->sp - sizeof(long)));
+
+		/*
+		 * The optprobe trampoline has a unique stackframe. It has
+		 * no caller (probed) address on the stack, Thus it has to
+		 * decode the trampoline code and change the stack pointer
+		 * for the next frame, but not change the pt_regs.
+		 */
+		state->ip = recover_optprobe_trampoline(state->ip, &state->sp);
 		state->regs = (struct pt_regs *)sp;
 		state->prev_regs = NULL;
 		state->full_regs = true;
@@ -573,7 +582,7 @@  bool unwind_next_frame(struct unwind_state *state)
 		/* See UNWIND_HINT_TYPE_REGS case comment. */
 		state->ip = unwind_recover_kretprobe(state, state->ip,
 				(unsigned long *)(state->sp - sizeof(long)));
-
+		state->ip = recover_optprobe_trampoline(state->ip, &state->sp);
 		if (state->full_regs)
 			state->prev_regs = state->regs;
 		state->regs = (void *)sp - IRET_FRAME_OFFSET;