Message ID | 161716949640.721514.14252504351086671126.stgit@devnote2 (mailing list archive) |
---|---|
State | RFC |
Headers | show |
Series | x86/kprobes,orc: Fix ORC unwinder to unwind stack with optimized probe | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch |
On Wed, Mar 31, 2021 at 02:44:56PM +0900, Masami Hiramatsu wrote: > +#ifdef CONFIG_UNWINDER_ORC > +unsigned long recover_optprobe_trampoline(unsigned long addr, unsigned long *sp) > +{ > + unsigned long offset, entry, probe_addr; > + struct optimized_kprobe *op; > + struct orc_entry *orc; > + > + entry = find_kprobe_optinsn_slot_entry(addr); > + if (!entry) > + return addr; > + > + offset = addr - entry; > + > + /* Decode arg1 and get the optprobe */ > + op = (void *)extract_set_arg1((void *)(entry + TMPL_MOVE_IDX)); > + if (!op) > + return addr; > + > + probe_addr = (unsigned long)op->kp.addr; > + > + if (offset < TMPL_END_IDX) { > + orc = orc_find((unsigned long)optprobe_template_func + offset); > + if (!orc || orc->sp_reg != ORC_REG_SP) > + return addr; > + /* > + * Since optprobe trampoline doesn't push caller on the stack, > + * need to decrement 1 stack entry size > + */ > + *sp += orc->sp_offset - sizeof(long); > + return probe_addr; > + } else { > + return probe_addr + offset - TMPL_END_IDX; > + } > +} > +#endif Hm, I'd like to avoid intertwining kprobes and ORC like this. ORC unwinds other generated code by assuming the generated code uses a frame pointer. Could we do that here? With CONFIG_FRAME_POINTER, unwinding works because SAVE_REGS_STRING has ENCODE_FRAME_POINTER, but that's not going to work for ORC. Instead of these patches, can we 'push %rbp; mov %rsp, %rbp' at the beginning of the template and 'pop %rbp' at the end? I guess SAVE_REGS_STRING would need to be smart enough to push the original saved version of %rbp. Of course then that breaks the kretprobe_trampoline() usage, so it may need to be a separate macro. [ Or make the same change to kretprobe_trampoline(). Then the other patch set wouldn't be needed either ;-) ] Of course the downside is, when you get an interrupt during the frame pointer setup, unwinding is broken. But I think that's acceptable for generated code. We've lived with that limitation for all code, with CONFIG_FRAME_POINTER, for many years. Eventually we may want to have a way to register generated code (and the ORC for it).
On Wed, 31 Mar 2021 10:57:36 -0500 Josh Poimboeuf <jpoimboe@redhat.com> wrote: > On Wed, Mar 31, 2021 at 02:44:56PM +0900, Masami Hiramatsu wrote: > > +#ifdef CONFIG_UNWINDER_ORC > > +unsigned long recover_optprobe_trampoline(unsigned long addr, unsigned long *sp) > > +{ > > + unsigned long offset, entry, probe_addr; > > + struct optimized_kprobe *op; > > + struct orc_entry *orc; > > + > > + entry = find_kprobe_optinsn_slot_entry(addr); > > + if (!entry) > > + return addr; > > + > > + offset = addr - entry; > > + > > + /* Decode arg1 and get the optprobe */ > > + op = (void *)extract_set_arg1((void *)(entry + TMPL_MOVE_IDX)); > > + if (!op) > > + return addr; > > + > > + probe_addr = (unsigned long)op->kp.addr; > > + > > + if (offset < TMPL_END_IDX) { > > + orc = orc_find((unsigned long)optprobe_template_func + offset); > > + if (!orc || orc->sp_reg != ORC_REG_SP) > > + return addr; > > + /* > > + * Since optprobe trampoline doesn't push caller on the stack, > > + * need to decrement 1 stack entry size > > + */ > > + *sp += orc->sp_offset - sizeof(long); > > + return probe_addr; > > + } else { > > + return probe_addr + offset - TMPL_END_IDX; > > + } > > +} > > +#endif > > Hm, I'd like to avoid intertwining kprobes and ORC like this. > > ORC unwinds other generated code by assuming the generated code uses a > frame pointer. Could we do that here? No, because the optprobe is not a function call. I considered to make it call, but since it has to execute copied instructions directly on the trampoline code (without changing stack frame) it is not possible. > With CONFIG_FRAME_POINTER, unwinding works because SAVE_REGS_STRING has > ENCODE_FRAME_POINTER, but that's not going to work for ORC. Even in that case, the problem is that any interrupt can happen before doing ENCODE_FRAME_POINTER. I think this ENCODE_FRAME_POINTER in the SAVE_REGS_STRING is for probing right before the target function setup a frame pointer. > Instead of these patches, can we 'push %rbp; mov %rsp, %rbp' at the > beginning of the template and 'pop %rbp' at the end? No, since the trampoline code is not called, it is jumped into. This means there is no "return address" in the stack. If we setup the frame, there is no return address, thus it might stop there. (Moreover, optprobe can copy multiple instructins on trampoline buffer, since relative jump consumes 5bytes. where is the "return address"?) > > I guess SAVE_REGS_STRING would need to be smart enough to push the > original saved version of %rbp. Of course then that breaks the > kretprobe_trampoline() usage, so it may need to be a separate macro. > > [ Or make the same change to kretprobe_trampoline(). Then the other > patch set wouldn't be needed either ;-) ] Hmm, I don't think it is a good idea which making such change on the optimized (hot) path only for the stack tracing. Moreover, that maybe not transparent with the stack made by int3. > Of course the downside is, when you get an interrupt during the frame > pointer setup, unwinding is broken. But I think that's acceptable for > generated code. We've lived with that limitation for all code, with > CONFIG_FRAME_POINTER, for many years. But above code can fix such issue too. To fix a corner case, non-generic code may be required, even it is not so simple. > > Eventually we may want to have a way to register generated code (and the > ORC for it). > > -- > Josh >
On Wed, 31 Mar 2021 14:44:56 +0900 Masami Hiramatsu <mhiramat@kernel.org> wrote: > +#ifdef CONFIG_UNWINDER_ORC > +unsigned long recover_optprobe_trampoline(unsigned long addr, unsigned long *sp) > +{ > + unsigned long offset, entry, probe_addr; > + struct optimized_kprobe *op; > + struct orc_entry *orc; > + > + entry = find_kprobe_optinsn_slot_entry(addr); > + if (!entry) > + return addr; > + > + offset = addr - entry; > + > + /* Decode arg1 and get the optprobe */ > + op = (void *)extract_set_arg1((void *)(entry + TMPL_MOVE_IDX)); > + if (!op) > + return addr; > + > + probe_addr = (unsigned long)op->kp.addr; > + > + if (offset < TMPL_END_IDX) { Maybe I should add a comment here. /* * Since this is on the trampoline code based on the template code, * ORC information on the template code can be used for adjusting * stack pointer. The template code may change the stack pointer to * store pt_regs. */ > + orc = orc_find((unsigned long)optprobe_template_func + offset); > + if (!orc || orc->sp_reg != ORC_REG_SP) > + return addr; > + /* > + * Since optprobe trampoline doesn't push caller on the stack, > + * need to decrement 1 stack entry size > + */ > + *sp += orc->sp_offset - sizeof(long); > + return probe_addr; > + } else { /* * In this case, the address is on the instruction copied from probed * address, and jump instruction. Here the stack pointer is exactly * the same as the value where it was copied by the kprobes. */ > + return probe_addr + offset - TMPL_END_IDX; > + } > +} > +#endif > + Thank you,
On Thu, 1 Apr 2021 10:44:52 +0900 Masami Hiramatsu <mhiramat@kernel.org> wrote: > On Wed, 31 Mar 2021 10:57:36 -0500 > Josh Poimboeuf <jpoimboe@redhat.com> wrote: > > > On Wed, Mar 31, 2021 at 02:44:56PM +0900, Masami Hiramatsu wrote: > > > +#ifdef CONFIG_UNWINDER_ORC > > > +unsigned long recover_optprobe_trampoline(unsigned long addr, unsigned long *sp) > > > +{ > > > + unsigned long offset, entry, probe_addr; > > > + struct optimized_kprobe *op; > > > + struct orc_entry *orc; > > > + > > > + entry = find_kprobe_optinsn_slot_entry(addr); > > > + if (!entry) > > > + return addr; > > > + > > > + offset = addr - entry; > > > + > > > + /* Decode arg1 and get the optprobe */ > > > + op = (void *)extract_set_arg1((void *)(entry + TMPL_MOVE_IDX)); > > > + if (!op) > > > + return addr; > > > + > > > + probe_addr = (unsigned long)op->kp.addr; > > > + > > > + if (offset < TMPL_END_IDX) { > > > + orc = orc_find((unsigned long)optprobe_template_func + offset); > > > + if (!orc || orc->sp_reg != ORC_REG_SP) > > > + return addr; > > > + /* > > > + * Since optprobe trampoline doesn't push caller on the stack, > > > + * need to decrement 1 stack entry size > > > + */ > > > + *sp += orc->sp_offset - sizeof(long); > > > + return probe_addr; > > > + } else { > > > + return probe_addr + offset - TMPL_END_IDX; > > > + } > > > +} > > > +#endif > > > > Hm, I'd like to avoid intertwining kprobes and ORC like this. > > > > ORC unwinds other generated code by assuming the generated code uses a > > frame pointer. Could we do that here? > > No, because the optprobe is not a function call. I considered to make > it call, but since it has to execute copied instructions directly on > the trampoline code (without changing stack frame) it is not possible. > > > With CONFIG_FRAME_POINTER, unwinding works because SAVE_REGS_STRING has > > ENCODE_FRAME_POINTER, but that's not going to work for ORC. > > Even in that case, the problem is that any interrupt can happen > before doing ENCODE_FRAME_POINTER. I think this ENCODE_FRAME_POINTER > in the SAVE_REGS_STRING is for probing right before the target > function setup a frame pointer. > > > Instead of these patches, can we 'push %rbp; mov %rsp, %rbp' at the > > beginning of the template and 'pop %rbp' at the end? > > No, since the trampoline code is not called, it is jumped into. > This means there is no "return address" in the stack. If we setup > the frame, there is no return address, thus it might stop there. > (Moreover, optprobe can copy multiple instructins on trampoline > buffer, since relative jump consumes 5bytes. where is the "return address"?) > > > > > I guess SAVE_REGS_STRING would need to be smart enough to push the > > original saved version of %rbp. Of course then that breaks the > > kretprobe_trampoline() usage, so it may need to be a separate macro. > > > > [ Or make the same change to kretprobe_trampoline(). Then the other > > patch set wouldn't be needed either ;-) ] > > Hmm, I don't think it is a good idea which making such change on the > optimized (hot) path only for the stack tracing. Moreover, that maybe > not transparent with the stack made by int3. > > > Of course the downside is, when you get an interrupt during the frame > > pointer setup, unwinding is broken. But I think that's acceptable for > > generated code. We've lived with that limitation for all code, with > > CONFIG_FRAME_POINTER, for many years. > > But above code can fix such issue too. To fix a corner case, non-generic > code may be required, even it is not so simple. Hmm, I would like to confirm your policy on ORC unwinder. If it doesn't care the stacktrace from the interrupt handler, I think your suggestion is OK. But in that case, from a developer viewpoint, I need to recommend users to configure CONFIG_UNWIND_FRAME=y when CONFIG_KPROBES=y. > > Eventually we may want to have a way to register generated code (and the > > ORC for it). I see, but the generated code usually does not have a generic way to handle it. E.g. bpf has a solid entry point, but kretprobe trampoline's entry point is any "RET", optprobe trampoline's entry point is a jump which is also generated (patched) ... Thank you,
diff --git a/arch/x86/include/asm/kprobes.h b/arch/x86/include/asm/kprobes.h index 71ea2eab43d5..9bbc45fcb3f1 100644 --- a/arch/x86/include/asm/kprobes.h +++ b/arch/x86/include/asm/kprobes.h @@ -119,9 +119,15 @@ extern int kprobe_exceptions_notify(struct notifier_block *self, unsigned long val, void *data); extern int kprobe_int3_handler(struct pt_regs *regs); +unsigned long recover_optprobe_trampoline(unsigned long addr, unsigned long *sp); #else static inline int kprobe_debug_handler(struct pt_regs *regs) { return 0; } +static inline unsigned long recover_optprobe_trampoline(unsigned long addr, + unsigned long *sp) +{ + return addr; +} #endif /* CONFIG_KPROBES */ #endif /* _ASM_X86_KPROBES_H */ diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c index 6d26e5cf2ba2..f91922ba4844 100644 --- a/arch/x86/kernel/kprobes/opt.c +++ b/arch/x86/kernel/kprobes/opt.c @@ -30,6 +30,9 @@ #include <asm/set_memory.h> #include <asm/sections.h> #include <asm/nospec-branch.h> +#include <asm/orc_types.h> + +struct orc_entry *orc_find(unsigned long ip); #include "common.h" @@ -100,6 +103,21 @@ static void synthesize_set_arg1(kprobe_opcode_t *addr, unsigned long val) *(unsigned long *)addr = val; } +/* Extract mov operand */ +static unsigned long extract_set_arg1(kprobe_opcode_t *addr) +{ +#ifdef CONFIG_X86_64 + if (addr[0] != 0x48 || addr[1] != 0xbf) + return 0; + addr += 2; +#else + if (*addr != 0xb8) + return 0; + addr++; +#endif + return *(unsigned long *)addr; +} + static void optimized_callback(struct optimized_kprobe *op, struct pt_regs *regs); @@ -483,6 +501,42 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, goto out; } +#ifdef CONFIG_UNWINDER_ORC +unsigned long recover_optprobe_trampoline(unsigned long addr, unsigned long *sp) +{ + unsigned long offset, entry, probe_addr; + struct optimized_kprobe *op; + struct orc_entry *orc; + + entry = find_kprobe_optinsn_slot_entry(addr); + if (!entry) + return addr; + + offset = addr - entry; + + /* Decode arg1 and get the optprobe */ + op = (void *)extract_set_arg1((void *)(entry + TMPL_MOVE_IDX)); + if (!op) + return addr; + + probe_addr = (unsigned long)op->kp.addr; + + if (offset < TMPL_END_IDX) { + orc = orc_find((unsigned long)optprobe_template_func + offset); + if (!orc || orc->sp_reg != ORC_REG_SP) + return addr; + /* + * Since optprobe trampoline doesn't push caller on the stack, + * need to decrement 1 stack entry size + */ + *sp += orc->sp_offset - sizeof(long); + return probe_addr; + } else { + return probe_addr + offset - TMPL_END_IDX; + } +} +#endif + /* * Replace breakpoints (INT3) with relative jumps (JMP.d32). * Caller must call with locking kprobe_mutex and text_mutex. diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c index c70dfeea4552..9f685f9c2358 100644 --- a/arch/x86/kernel/unwind_orc.c +++ b/arch/x86/kernel/unwind_orc.c @@ -79,7 +79,7 @@ static struct orc_entry *orc_module_find(unsigned long ip) #endif #ifdef CONFIG_DYNAMIC_FTRACE -static struct orc_entry *orc_find(unsigned long ip); +struct orc_entry *orc_find(unsigned long ip); /* * Ftrace dynamic trampolines do not have orc entries of their own. @@ -142,7 +142,7 @@ static struct orc_entry orc_fp_entry = { .end = 0, }; -static struct orc_entry *orc_find(unsigned long ip) +struct orc_entry *orc_find(unsigned long ip) { static struct orc_entry *orc; @@ -537,6 +537,7 @@ bool unwind_next_frame(struct unwind_state *state) state->ip = unwind_recover_ret_addr(state, state->ip, (unsigned long *)ip_p); + state->ip = recover_optprobe_trampoline(state->ip, &sp); state->sp = sp; state->regs = NULL; state->prev_regs = NULL; @@ -558,6 +559,14 @@ bool unwind_next_frame(struct unwind_state *state) */ state->ip = unwind_recover_kretprobe(state, state->ip, (unsigned long *)(state->sp - sizeof(long))); + + /* + * The optprobe trampoline has a unique stackframe. It has + * no caller (probed) address on the stack, Thus it has to + * decode the trampoline code and change the stack pointer + * for the next frame, but not change the pt_regs. + */ + state->ip = recover_optprobe_trampoline(state->ip, &state->sp); state->regs = (struct pt_regs *)sp; state->prev_regs = NULL; state->full_regs = true; @@ -573,7 +582,7 @@ bool unwind_next_frame(struct unwind_state *state) /* See UNWIND_HINT_TYPE_REGS case comment. */ state->ip = unwind_recover_kretprobe(state, state->ip, (unsigned long *)(state->sp - sizeof(long))); - + state->ip = recover_optprobe_trampoline(state->ip, &state->sp); if (state->full_regs) state->prev_regs = state->regs; state->regs = (void *)sp - IRET_FRAME_OFFSET;
ORC Unwinder can not unwind if an optprobe trampoline is on the stack because optprobe trampoline code has no ORC information. This uses the ORC information on the template code of the trampoline to adjust the sp register by ORC information and extract the correct probed address from the optprobe trampoline address. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> --- arch/x86/include/asm/kprobes.h | 6 ++++ arch/x86/kernel/kprobes/opt.c | 54 ++++++++++++++++++++++++++++++++++++++++ arch/x86/kernel/unwind_orc.c | 15 +++++++++-- 3 files changed, 72 insertions(+), 3 deletions(-)