[2/4] x86/kprobes: Fix frame pointer annotations

Message ID	20190508080612.721269814@infradead.org (mailing list archive)
State	New
Headers	show Return-Path: <linux-kselftest-owner@kernel.org> Message-Id: <20190508080612.721269814@infradead.org> User-Agent: quilt/0.65 Date: Wed, 08 May 2019 09:49:03 +0200 From: Peter Zijlstra <peterz@infradead.org> To: linux-kernel@vger.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org>, Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@kernel.org>, Andrew Morton <akpm@linux-foundation.org>, Andy Lutomirski <luto@kernel.org>, Nicolai Stange <nstange@suse.de>, Thomas Gleixner <tglx@linutronix.de>, Borislav Petkov <bp@alien8.de>, "H. Peter Anvin" <hpa@zytor.com>, x86@kernel.org, Josh Poimboeuf <jpoimboe@redhat.com>, Jiri Kosina <jikos@kernel.org>, Miroslav Benes <mbenes@suse.cz>, Petr Mladek <pmladek@suse.com>, Joe Lawrence <joe.lawrence@redhat.com>, Shuah Khan <shuah@kernel.org>, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>, Tim Chen <tim.c.chen@linux.intel.com>, Sebastian Andrzej Siewior <bigeasy@linutronix.de>, Mimi Zohar <zohar@linux.ibm.com>, Juergen Gross <jgross@suse.com>, Nick Desaulniers <ndesaulniers@google.com>, Nayna Jain <nayna@linux.ibm.com>, Masahiro Yamada <yamada.masahiro@socionext.com>, Joerg Roedel <jroedel@suse.de>, linux-kselftest@vger.kernel.org, Masami Hiramatsu <mhiramat@kernel.org> Subject: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations References: <20190508074901.982470324@infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Sender: linux-kselftest-owner@vger.kernel.org Precedence: bulk
Series	x86: int3 fallout \| expand [0/4] x86: int3 fallout [1/4] x86/entry/32: Clean up return from interrupt preemption path [2/4] x86/kprobes: Fix frame pointer annotations [3/4] x86/ftrace: Add pt_regs frame annotations [RFC,4/4] x86_32: Provide consistent pt_regs

Peter Zijlstra May 8, 2019, 7:49 a.m. UTC

The kprobe trampolines have a FRAME_POINTER annotation that makes no
sense. It marks the frame in the middle of pt_regs, at the place of
saving BP.

Change it to mark the pt_regs frame as per the ENCODE_FRAME_POINTER
from the respective entry_*.S.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/kprobes/common.h |   32 +++++++++++++++++++++++---------
 1 file changed, 23 insertions(+), 9 deletions(-)

Josh Poimboeuf May 8, 2019, 11:54 a.m. UTC | #1

On Wed, May 08, 2019 at 09:49:03AM +0200, Peter Zijlstra wrote:
> The kprobe trampolines have a FRAME_POINTER annotation that makes no
> sense. It marks the frame in the middle of pt_regs, at the place of
> saving BP.
> 
> Change it to mark the pt_regs frame as per the ENCODE_FRAME_POINTER
> from the respective entry_*.S.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/kernel/kprobes/common.h |   32 +++++++++++++++++++++++---------
>  1 file changed, 23 insertions(+), 9 deletions(-)
> 
> --- a/arch/x86/kernel/kprobes/common.h
> +++ b/arch/x86/kernel/kprobes/common.h
> @@ -6,14 +6,15 @@
>  
>  #include <asm/asm.h>
>  
> +#ifdef CONFIG_X86_64
> +
>  #ifdef CONFIG_FRAME_POINTER
> -# define SAVE_RBP_STRING "	push %" _ASM_BP "\n" \
> -			 "	mov  %" _ASM_SP ", %" _ASM_BP "\n"
> +#define ENCODE_FRAME_POINTER			\
> +	"	leaq 1(%rsp), %rbp\n"
>  #else
> -# define SAVE_RBP_STRING "	push %" _ASM_BP "\n"
> +#define ENCODE_FRAME_POINTER
>  #endif

> +#ifdef CONFIG_FRAME_POINTER
> +#define ENCODE_FRAME_POINTER			\
> +	"	movl %esp, %ebp\n"		\
> +	"	andl $0x7fffffff, %ebp\n"
> +#else
> +#define ENCODE_FRAME_POINTER
> +#endif

We should put these macros in a header file somewhere (including
stringified versions).

Peter Zijlstra May 8, 2019, 12:04 p.m. UTC | #2

On Wed, May 08, 2019 at 06:54:16AM -0500, Josh Poimboeuf wrote:
> On Wed, May 08, 2019 at 09:49:03AM +0200, Peter Zijlstra wrote:
> > The kprobe trampolines have a FRAME_POINTER annotation that makes no
> > sense. It marks the frame in the middle of pt_regs, at the place of
> > saving BP.
> > 
> > Change it to mark the pt_regs frame as per the ENCODE_FRAME_POINTER
> > from the respective entry_*.S.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  arch/x86/kernel/kprobes/common.h |   32 +++++++++++++++++++++++---------
> >  1 file changed, 23 insertions(+), 9 deletions(-)
> > 
> > --- a/arch/x86/kernel/kprobes/common.h
> > +++ b/arch/x86/kernel/kprobes/common.h
> > @@ -6,14 +6,15 @@
> >  
> >  #include <asm/asm.h>
> >  
> > +#ifdef CONFIG_X86_64
> > +
> >  #ifdef CONFIG_FRAME_POINTER
> > -# define SAVE_RBP_STRING "	push %" _ASM_BP "\n" \
> > -			 "	mov  %" _ASM_SP ", %" _ASM_BP "\n"
> > +#define ENCODE_FRAME_POINTER			\
> > +	"	leaq 1(%rsp), %rbp\n"
> >  #else
> > -# define SAVE_RBP_STRING "	push %" _ASM_BP "\n"
> > +#define ENCODE_FRAME_POINTER
> >  #endif
> 
> > +#ifdef CONFIG_FRAME_POINTER
> > +#define ENCODE_FRAME_POINTER			\
> > +	"	movl %esp, %ebp\n"		\
> > +	"	andl $0x7fffffff, %ebp\n"
> > +#else
> > +#define ENCODE_FRAME_POINTER
> > +#endif
> 
> We should put these macros in a header file somewhere (including
> stringified versions).

Probably a good idea. I'll frob them into asm/frame.h.

Do the x86_64 variants also want some ORC annotation?

Peter Zijlstra May 8, 2019, 12:40 p.m. UTC | #3

On Wed, May 08, 2019 at 02:04:16PM +0200, Peter Zijlstra wrote:
> On Wed, May 08, 2019 at 06:54:16AM -0500, Josh Poimboeuf wrote:

> > We should put these macros in a header file somewhere (including
> > stringified versions).
> 
> Probably a good idea. I'll frob them into asm/frame.h.

---
Subject: x86: Move ENCODE_FRAME_POINTER to asm/frame.h
From: Peter Zijlstra <peterz@infradead.org>
Date: Wed May 8 14:30:48 CEST 2019

In preparation for wider use, move the ENCODE_FRAME_POINTER macros to
a common header and provide inline asm versions.

These macros are used to encode a pt_regs frame for the unwinder; see
unwind_frame.c:decode_frame_pointer().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/entry/calling.h     |   15 --------------
 arch/x86/entry/entry_32.S    |   16 --------------
 arch/x86/include/asm/frame.h |   46 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 46 insertions(+), 31 deletions(-)

--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -172,21 +172,6 @@ For 32-bit we have the following convent
 	.endif
 .endm
 
-/*
- * This is a sneaky trick to help the unwinder find pt_regs on the stack.  The
- * frame pointer is replaced with an encoded pointer to pt_regs.  The encoding
- * is just setting the LSB, which makes it an invalid stack address and is also
- * a signal to the unwinder that it's a pt_regs pointer in disguise.
- *
- * NOTE: This macro must be used *after* PUSH_AND_CLEAR_REGS because it corrupts
- * the original rbp.
- */
-.macro ENCODE_FRAME_POINTER ptregs_offset=0
-#ifdef CONFIG_FRAME_POINTER
-	leaq 1+\ptregs_offset(%rsp), %rbp
-#endif
-.endm
-
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
 
 /*
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -246,22 +246,6 @@
 .Lend_\@:
 .endm
 
-/*
- * This is a sneaky trick to help the unwinder find pt_regs on the stack.  The
- * frame pointer is replaced with an encoded pointer to pt_regs.  The encoding
- * is just clearing the MSB, which makes it an invalid stack address and is also
- * a signal to the unwinder that it's a pt_regs pointer in disguise.
- *
- * NOTE: This macro must be used *after* SAVE_ALL because it corrupts the
- * original rbp.
- */
-.macro ENCODE_FRAME_POINTER
-#ifdef CONFIG_FRAME_POINTER
-	mov %esp, %ebp
-	andl $0x7fffffff, %ebp
-#endif
-.endm
-
 .macro RESTORE_INT_REGS
 	popl	%ebx
 	popl	%ecx
--- a/arch/x86/include/asm/frame.h
+++ b/arch/x86/include/asm/frame.h
@@ -22,6 +22,39 @@
 	pop %_ASM_BP
 .endm
 
+#ifdef CONFIG_X86_64
+/*
+ * This is a sneaky trick to help the unwinder find pt_regs on the stack.  The
+ * frame pointer is replaced with an encoded pointer to pt_regs.  The encoding
+ * is just setting the LSB, which makes it an invalid stack address and is also
+ * a signal to the unwinder that it's a pt_regs pointer in disguise.
+ *
+ * NOTE: This macro must be used *after* PUSH_AND_CLEAR_REGS because it corrupts
+ * the original rbp.
+ */
+.macro ENCODE_FRAME_POINTER ptregs_offset=0
+#ifdef CONFIG_FRAME_POINTER
+	leaq 1+\ptregs_offset(%rsp), %rbp
+#endif
+.endm
+#else /* !CONFIG_X86_64 */
+/*
+ * This is a sneaky trick to help the unwinder find pt_regs on the stack.  The
+ * frame pointer is replaced with an encoded pointer to pt_regs.  The encoding
+ * is just clearing the MSB, which makes it an invalid stack address and is also
+ * a signal to the unwinder that it's a pt_regs pointer in disguise.
+ *
+ * NOTE: This macro must be used *after* SAVE_ALL because it corrupts the
+ * original ebp.
+ */
+.macro ENCODE_FRAME_POINTER
+#ifdef CONFIG_FRAME_POINTER
+	mov %esp, %ebp
+	andl $0x7fffffff, %ebp
+#endif
+.endm
+#endif /* CONFIG_X86_64 */
+
 #else /* !__ASSEMBLY__ */
 
 #define FRAME_BEGIN				\
@@ -30,6 +63,19 @@
 
 #define FRAME_END "pop %" _ASM_BP "\n"
 
+#ifdef CONFIG_FRAME_POINTER
+#ifdef CONFIG_X86_64
+#define ENCODE_FRAME_POINTER			\
+	"lea 1(%rsp), %rbp\n\t"
+#else /* !CONFIG_X86_64 */
+#define ENCODE_FRAME_POINTER			\
+	"movl %esp, %ebp\n\t"			\
+	"andl $0x7fffffff, %ebp\n\t"
+#endif /* CONFIG_X86_64 */
+#else /* CONFIG_FRAME_POINTER */
+#define ENCODE_FRAME_POINTER
+#endif /* CONFIG_FRAME_POINTER */
+
 #endif /* __ASSEMBLY__ */
 
 #define FRAME_OFFSET __ASM_SEL(4, 8)

Josh Poimboeuf May 8, 2019, 12:42 p.m. UTC | #4

On Wed, May 08, 2019 at 02:04:16PM +0200, Peter Zijlstra wrote:
> On Wed, May 08, 2019 at 06:54:16AM -0500, Josh Poimboeuf wrote:
> > On Wed, May 08, 2019 at 09:49:03AM +0200, Peter Zijlstra wrote:
> > > The kprobe trampolines have a FRAME_POINTER annotation that makes no
> > > sense. It marks the frame in the middle of pt_regs, at the place of
> > > saving BP.
> > > 
> > > Change it to mark the pt_regs frame as per the ENCODE_FRAME_POINTER
> > > from the respective entry_*.S.
> > > 
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > ---
> > >  arch/x86/kernel/kprobes/common.h |   32 +++++++++++++++++++++++---------
> > >  1 file changed, 23 insertions(+), 9 deletions(-)
> > > 
> > > --- a/arch/x86/kernel/kprobes/common.h
> > > +++ b/arch/x86/kernel/kprobes/common.h
> > > @@ -6,14 +6,15 @@
> > >  
> > >  #include <asm/asm.h>
> > >  
> > > +#ifdef CONFIG_X86_64
> > > +
> > >  #ifdef CONFIG_FRAME_POINTER
> > > -# define SAVE_RBP_STRING "	push %" _ASM_BP "\n" \
> > > -			 "	mov  %" _ASM_SP ", %" _ASM_BP "\n"
> > > +#define ENCODE_FRAME_POINTER			\
> > > +	"	leaq 1(%rsp), %rbp\n"
> > >  #else
> > > -# define SAVE_RBP_STRING "	push %" _ASM_BP "\n"
> > > +#define ENCODE_FRAME_POINTER
> > >  #endif
> > 
> > > +#ifdef CONFIG_FRAME_POINTER
> > > +#define ENCODE_FRAME_POINTER			\
> > > +	"	movl %esp, %ebp\n"		\
> > > +	"	andl $0x7fffffff, %ebp\n"
> > > +#else
> > > +#define ENCODE_FRAME_POINTER
> > > +#endif
> > 
> > We should put these macros in a header file somewhere (including
> > stringified versions).
> 
> Probably a good idea. I'll frob them into asm/frame.h.
> 
> Do the x86_64 variants also want some ORC annotation?

Maybe so.  Though it looks like regs->ip isn't saved.  The saved
registers might need to be tweaked.  I'll need to look into it.

Peter Zijlstra May 8, 2019, 3:39 p.m. UTC | #5

On Wed, May 08, 2019 at 07:42:48AM -0500, Josh Poimboeuf wrote:
> On Wed, May 08, 2019 at 02:04:16PM +0200, Peter Zijlstra wrote:

> > Do the x86_64 variants also want some ORC annotation?
> 
> Maybe so.  Though it looks like regs->ip isn't saved.  The saved
> registers might need to be tweaked.  I'll need to look into it.

What all these sites do (and maybe we should look at unifying them
somehow) is turn a CALL frame (aka RET-IP) into an exception frame (aka
pt_regs).

So regs->ip will be the return address (which is fixed up to be the CALL
address in the handler).

Josh Poimboeuf May 8, 2019, 6:48 p.m. UTC | #6

On Wed, May 08, 2019 at 05:39:07PM +0200, Peter Zijlstra wrote:
> On Wed, May 08, 2019 at 07:42:48AM -0500, Josh Poimboeuf wrote:
> > On Wed, May 08, 2019 at 02:04:16PM +0200, Peter Zijlstra wrote:
> 
> > > Do the x86_64 variants also want some ORC annotation?
> > 
> > Maybe so.  Though it looks like regs->ip isn't saved.  The saved
> > registers might need to be tweaked.  I'll need to look into it.
> 
> What all these sites do (and maybe we should look at unifying them
> somehow) is turn a CALL frame (aka RET-IP) into an exception frame (aka
> pt_regs).
> 
> So regs->ip will be the return address (which is fixed up to be the CALL
> address in the handler).

But from what I can tell, trampoline_handler() hard-codes regs->ip to
point to kretprobe_trampoline(), and the original return address is
placed in regs->sp.

Masami, is there a reason why regs->ip doesn't have the original return
address and regs->sp doesn't have the original SP?  I think that would
help the unwinder understand things.

Masami Hiramatsu (Google) May 9, 2019, 1:20 a.m. UTC | #7

Hi Josh,

On Wed, 8 May 2019 13:48:48 -0500
Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> On Wed, May 08, 2019 at 05:39:07PM +0200, Peter Zijlstra wrote:
> > On Wed, May 08, 2019 at 07:42:48AM -0500, Josh Poimboeuf wrote:
> > > On Wed, May 08, 2019 at 02:04:16PM +0200, Peter Zijlstra wrote:
> > 
> > > > Do the x86_64 variants also want some ORC annotation?
> > > 
> > > Maybe so.  Though it looks like regs->ip isn't saved.  The saved
> > > registers might need to be tweaked.  I'll need to look into it.
> > 
> > What all these sites do (and maybe we should look at unifying them
> > somehow) is turn a CALL frame (aka RET-IP) into an exception frame (aka
> > pt_regs).
> > 
> > So regs->ip will be the return address (which is fixed up to be the CALL
> > address in the handler).
> 
> But from what I can tell, trampoline_handler() hard-codes regs->ip to
> point to kretprobe_trampoline(), and the original return address is
> placed in regs->sp.
> 
> Masami, is there a reason why regs->ip doesn't have the original return
> address and regs->sp doesn't have the original SP?  I think that would
> help the unwinder understand things.

Yes, for regs->ip, there is a histrical reason. Since previously, we had
an int3 at trampoline, so the user (kretprobe) handler expects that
regs->ip is trampoline address and ri->ret_addr is original return address.
It is better to check the other archs, but I think it is possible to
change the regs->ip to original return address, since no one cares such
"fixed address". :)

For the regs->sp, there are 2 reasons.

For x86-64, it's just for over-optimizing (reduce stack usage).
I think we can make a gap for putting return address, something like

	"kretprobe_trampoline:\n"
#ifdef CONFIG_X86_64
	"	pushq %rsp\n"	/* Make a gap for return address */
	"	pushq 0(%rsp)\n"	/* Copy original stack pointer */
	"	pushfq\n"
	SAVE_REGS_STRING
	"	movq %rsp, %rdi\n"
	"	call trampoline_handler\n"
	/* Push the true return address to the bottom */
	"	movq %rax, 20*8(%rsp)\n"
	RESTORE_REGS_STRING
	"	popfq\n"
	"	addq $8, %rsp\n"	/* Skip original stack pointer */

For i386 (x86-32), there is no other way to keep &regs->sp as
the original stack pointer. It has to be changed with this series,
maybe as same as x86-64.

Thank you,

Peter Zijlstra May 9, 2019, 8:14 a.m. UTC | #8

On Thu, May 09, 2019 at 10:20:30AM +0900, Masami Hiramatsu wrote:
> Hi Josh,
> 
> On Wed, 8 May 2019 13:48:48 -0500
> Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> 
> > On Wed, May 08, 2019 at 05:39:07PM +0200, Peter Zijlstra wrote:
> > > On Wed, May 08, 2019 at 07:42:48AM -0500, Josh Poimboeuf wrote:
> > > > On Wed, May 08, 2019 at 02:04:16PM +0200, Peter Zijlstra wrote:
> > > 
> > > > > Do the x86_64 variants also want some ORC annotation?
> > > > 
> > > > Maybe so.  Though it looks like regs->ip isn't saved.  The saved
> > > > registers might need to be tweaked.  I'll need to look into it.
> > > 
> > > What all these sites do (and maybe we should look at unifying them
> > > somehow) is turn a CALL frame (aka RET-IP) into an exception frame (aka
> > > pt_regs).
> > > 
> > > So regs->ip will be the return address (which is fixed up to be the CALL
> > > address in the handler).
> > 
> > But from what I can tell, trampoline_handler() hard-codes regs->ip to
> > point to kretprobe_trampoline(), and the original return address is
> > placed in regs->sp.
> > 
> > Masami, is there a reason why regs->ip doesn't have the original return
> > address and regs->sp doesn't have the original SP?  I think that would
> > help the unwinder understand things.
> 
> Yes, for regs->ip, there is a histrical reason. Since previously, we had
> an int3 at trampoline, so the user (kretprobe) handler expects that
> regs->ip is trampoline address and ri->ret_addr is original return address.
> It is better to check the other archs, but I think it is possible to
> change the regs->ip to original return address, since no one cares such
> "fixed address". :)
> 
> For the regs->sp, there are 2 reasons.
> 
> For x86-64, it's just for over-optimizing (reduce stack usage).
> I think we can make a gap for putting return address, something like
> 
> 	"kretprobe_trampoline:\n"
> #ifdef CONFIG_X86_64
> 	"	pushq %rsp\n"	/* Make a gap for return address */
> 	"	pushq 0(%rsp)\n"	/* Copy original stack pointer */
> 	"	pushfq\n"
> 	SAVE_REGS_STRING
> 	"	movq %rsp, %rdi\n"
> 	"	call trampoline_handler\n"
> 	/* Push the true return address to the bottom */
> 	"	movq %rax, 20*8(%rsp)\n"
> 	RESTORE_REGS_STRING
> 	"	popfq\n"
> 	"	addq $8, %rsp\n"	/* Skip original stack pointer */
> 
> For i386 (x86-32), there is no other way to keep &regs->sp as
> the original stack pointer. It has to be changed with this series,
> maybe as same as x86-64.

Right; I already fixed that in my patch changing i386's pt_regs.

But what I'd love to do is something like the belwo patch, and make all
the trampolines (very much including ftrace) use that. Such that we then
only have 1 copy of this magic (well, 2 because x86_64 also needs an
implementation of this of course).

Changing ftrace over to this would be a little more work but it can
easily chain things a little to get its original context back:

ENTRY(ftrace_regs_caller)
GLOBAL(ftrace_regs_func)
	push ftrace_stub
	push ftrace_regs_handler
	jmp call_to_exception_trampoline
END(ftrace_regs_caller)

typedef void (*ftrace_func_t)(unsigned long, unsigned long, struct ftrace_op *, struct pt_regs *);

struct ftrace_regs_stack {
	ftrace_func_t func;
	unsigned long parent_ip;
};

void ftrace_regs_handler(struct pr_regs *regs)
{
	struct ftrace_regs_stack *st = (void *)regs->sp;
	ftrace_func_t func = st->func;

	regs->sp += sizeof(long); /* pop func */

	func(regs->ip, st->parent_ip, function_trace_op, regs);
}

Hmm? I didn't look into the function_graph thing, but I imagine it can
be added without too much pain.

---
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -1576,3 +1576,100 @@ ENTRY(rewind_stack_do_exit)
 	call	do_exit
 1:	jmp 1b
 END(rewind_stack_do_exit)
+
+/*
+ * Transforms a CALL frame into an exception frame; IOW it pretends the CALL we
+ * just did was in fact scribbled with an INT3.
+ *
+ * Use this trampoline like:
+ *
+ *   PUSH $func
+ *   JMP call_to_exception_trampoline
+ *
+ * $func will see regs->ip point at the CALL instruction and must therefore
+ * modify regs->ip in order to make progress (just like a normal INT3 scribbled
+ * CALL).
+ *
+ * NOTE: we do not restore any of the segment registers.
+ */
+ENTRY(call_to_exception_trampoline)
+	/*
+	 * On entry the stack looks like:
+	 *
+	 *   2*4(%esp) <previous context>
+	 *   1*4(%esp) RET-IP
+	 *   0*4(%esp) func
+	 *
+	 * transform this into:
+	 *
+	 *  19*4(%esp) <previous context>
+	 *  18*4(%esp) gap / RET-IP
+	 *  17*4(%esp) gap / func
+	 *  16*4(%esp) ss
+	 *  15*4(%esp) sp / <previous context>
+	 *  14*4(%esp) flags
+	 *  13*4(%esp) cs
+	 *  12*4(%esp) ip / RET-IP
+	 *  11*4(%esp) orig_eax
+	 *  10*4(%esp) gs
+	 *   9*4(%esp) fs
+	 *   8*4(%esp) es
+	 *   7*4(%esp) ds
+	 *   6*4(%esp) eax
+	 *   5*4(%esp) ebp
+	 *   4*4(%esp) edi
+	 *   3*4(%esp) esi
+	 *   2*4(%esp) edx
+	 *   1*4(%esp) ecx
+	 *   0*4(%esp) ebx
+	 */
+	pushl	%ss
+	pushl	%esp		# points at ss
+	addl	$3*4, (%esp)	#   point it at <previous context>
+	pushfl
+	pushl	%cs
+	pushl	5*4(%esp)	# RET-IP
+	subl	5, (%esp)	#   point at CALL instruction
+	pushl	$-1
+	pushl	%gs
+	pushl	%fs
+	pushl	%es
+	pushl	%ds
+	pushl	%eax
+	pushl	%ebp
+	pushl	%edi
+	pushl	%esi
+	pushl	%edx
+	pushl	%ecx
+	pushl	%ebx
+
+	ENCODE_FRAME_POINTER
+
+	movl	%esp, %eax	# 1st argument: pt_regs
+
+	movl	17*4(%esp), %ebx	# func
+	CALL_NOSPEC %ebx
+
+	movl	PT_OLDESP(%esp), %eax
+
+	movl	PT_EIP(%esp), %ecx
+	movl	%ecx, -1*4(%eax)
+
+	movl	PT_EFLAGS(%esp), %ecx
+	movl	%ecx, -2*4(%eax)
+
+	movl	PT_EAX(%esp), %ecx
+	movl	%ecx, -3*4(%eax)
+
+	popl	%ebx
+	popl	%ecx
+	popl	%edx
+	popl	%esi
+	popl	%edi
+	popl	%ebp
+
+	lea	-3*4(%eax), %esp
+	popl	%eax
+	popfl
+	ret
+END(call_to_exception_trampoline)
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -731,29 +731,8 @@ asm(
 	".global kretprobe_trampoline\n"
 	".type kretprobe_trampoline, @function\n"
 	"kretprobe_trampoline:\n"
-	/* We don't bother saving the ss register */
-#ifdef CONFIG_X86_64
-	"	pushq %rsp\n"
-	"	pushfq\n"
-	SAVE_REGS_STRING
-	"	movq %rsp, %rdi\n"
-	"	call trampoline_handler\n"
-	/* Replace saved sp with true return address. */
-	"	movq %rax, 19*8(%rsp)\n"
-	RESTORE_REGS_STRING
-	"	popfq\n"
-#else
-	"	pushl %esp\n"
-	"	pushfl\n"
-	SAVE_REGS_STRING
-	"	movl %esp, %eax\n"
-	"	call trampoline_handler\n"
-	/* Replace saved sp with true return address. */
-	"	movl %eax, 15*4(%esp)\n"
-	RESTORE_REGS_STRING
-	"	popfl\n"
-#endif
-	"	ret\n"
+	"push trampoline_handler\n"
+	"jmp call_to_exception_trampoline\n"
 	".size kretprobe_trampoline, .-kretprobe_trampoline\n"
 );
 NOKPROBE_SYMBOL(kretprobe_trampoline);
@@ -791,12 +770,7 @@ static __used void *trampoline_handler(s
 
 	INIT_HLIST_HEAD(&empty_rp);
 	kretprobe_hash_lock(current, &head, &flags);
-	/* fixup registers */
-	regs->cs = __KERNEL_CS;
-#ifdef CONFIG_X86_32
-	regs->cs |= get_kernel_rpl();
-	regs->gs = 0;
-#endif
+
 	/* We use pt_regs->sp for return address holder. */
 	frame_pointer = &regs->sp;
 	regs->ip = trampoline_address;

Peter Zijlstra May 9, 2019, 9:27 a.m. UTC | #9

On Thu, May 09, 2019 at 10:14:31AM +0200, Peter Zijlstra wrote:
> struct ftrace_regs_stack {
> 	ftrace_func_t func;
> 	unsigned long parent_ip;
> };
> 
> void ftrace_regs_handler(struct pr_regs *regs)
> {
> 	struct ftrace_regs_stack *st = (void *)regs->sp;
> 	ftrace_func_t func = st->func;
> 
> 	regs->sp += sizeof(long); /* pop func */
> 
> 	func(regs->ip, st->parent_ip, function_trace_op, regs);
> }

Alternatively we can add things like:

static inline unsigned long int3_emulate_pop(struct pt_regs *regs)
{
	unsigned long val = *(unsigned long *)regs->sp;
	regs->sp += sizeof(unsigned long);
	return val;
}

And do:

	ftrace_func_t func = (void *)int3_emulate_pop(regs);

Josh Poimboeuf May 9, 2019, 2 p.m. UTC | #10

On Thu, May 09, 2019 at 10:14:31AM +0200, Peter Zijlstra wrote:
> But what I'd love to do is something like the belwo patch, and make all
> the trampolines (very much including ftrace) use that. Such that we then
> only have 1 copy of this magic (well, 2 because x86_64 also needs an
> implementation of this of course).
> 
> Changing ftrace over to this would be a little more work but it can
> easily chain things a little to get its original context back:
> 
> ENTRY(ftrace_regs_caller)
> GLOBAL(ftrace_regs_func)
> 	push ftrace_stub
> 	push ftrace_regs_handler
> 	jmp call_to_exception_trampoline
> END(ftrace_regs_caller)
> 
> typedef void (*ftrace_func_t)(unsigned long, unsigned long, struct ftrace_op *, struct pt_regs *);
> 
> struct ftrace_regs_stack {
> 	ftrace_func_t func;
> 	unsigned long parent_ip;
> };
> 
> void ftrace_regs_handler(struct pr_regs *regs)
> {
> 	struct ftrace_regs_stack *st = (void *)regs->sp;
> 	ftrace_func_t func = st->func;
> 
> 	regs->sp += sizeof(long); /* pop func */
> 
> 	func(regs->ip, st->parent_ip, function_trace_op, regs);
> }
> 
> Hmm? I didn't look into the function_graph thing, but I imagine it can
> be added without too much pain.

I like this patch a lot, assuming it can be made to work for the
different users.

Masami Hiramatsu (Google) May 9, 2019, 2:01 p.m. UTC | #11

On Thu, 9 May 2019 10:14:31 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, May 09, 2019 at 10:20:30AM +0900, Masami Hiramatsu wrote:
> > Hi Josh,
> > 
> > On Wed, 8 May 2019 13:48:48 -0500
> > Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > 
> > > On Wed, May 08, 2019 at 05:39:07PM +0200, Peter Zijlstra wrote:
> > > > On Wed, May 08, 2019 at 07:42:48AM -0500, Josh Poimboeuf wrote:
> > > > > On Wed, May 08, 2019 at 02:04:16PM +0200, Peter Zijlstra wrote:
> > > > 
> > > > > > Do the x86_64 variants also want some ORC annotation?
> > > > > 
> > > > > Maybe so.  Though it looks like regs->ip isn't saved.  The saved
> > > > > registers might need to be tweaked.  I'll need to look into it.
> > > > 
> > > > What all these sites do (and maybe we should look at unifying them
> > > > somehow) is turn a CALL frame (aka RET-IP) into an exception frame (aka
> > > > pt_regs).
> > > > 
> > > > So regs->ip will be the return address (which is fixed up to be the CALL
> > > > address in the handler).
> > > 
> > > But from what I can tell, trampoline_handler() hard-codes regs->ip to
> > > point to kretprobe_trampoline(), and the original return address is
> > > placed in regs->sp.
> > > 
> > > Masami, is there a reason why regs->ip doesn't have the original return
> > > address and regs->sp doesn't have the original SP?  I think that would
> > > help the unwinder understand things.
> > 
> > Yes, for regs->ip, there is a histrical reason. Since previously, we had
> > an int3 at trampoline, so the user (kretprobe) handler expects that
> > regs->ip is trampoline address and ri->ret_addr is original return address.
> > It is better to check the other archs, but I think it is possible to
> > change the regs->ip to original return address, since no one cares such
> > "fixed address". :)
> > 
> > For the regs->sp, there are 2 reasons.
> > 
> > For x86-64, it's just for over-optimizing (reduce stack usage).
> > I think we can make a gap for putting return address, something like
> > 
> > 	"kretprobe_trampoline:\n"
> > #ifdef CONFIG_X86_64
> > 	"	pushq %rsp\n"	/* Make a gap for return address */
> > 	"	pushq 0(%rsp)\n"	/* Copy original stack pointer */
> > 	"	pushfq\n"
> > 	SAVE_REGS_STRING
> > 	"	movq %rsp, %rdi\n"
> > 	"	call trampoline_handler\n"
> > 	/* Push the true return address to the bottom */
> > 	"	movq %rax, 20*8(%rsp)\n"
> > 	RESTORE_REGS_STRING
> > 	"	popfq\n"
> > 	"	addq $8, %rsp\n"	/* Skip original stack pointer */
> > 
> > For i386 (x86-32), there is no other way to keep &regs->sp as
> > the original stack pointer. It has to be changed with this series,
> > maybe as same as x86-64.
> 
> Right; I already fixed that in my patch changing i386's pt_regs.

I see it, and it is good to me. :)

> But what I'd love to do is something like the belwo patch, and make all
> the trampolines (very much including ftrace) use that. Such that we then
> only have 1 copy of this magic (well, 2 because x86_64 also needs an
> implementation of this of course).

OK, but I will make kretprobe integrated with func-graph tracer,
since it is inefficient that we have 2 different hidden return stack...

Anyway,

> Changing ftrace over to this would be a little more work but it can
> easily chain things a little to get its original context back:
> 
> ENTRY(ftrace_regs_caller)
> GLOBAL(ftrace_regs_func)
> 	push ftrace_stub
> 	push ftrace_regs_handler
> 	jmp call_to_exception_trampoline
> END(ftrace_regs_caller)
> 
> typedef void (*ftrace_func_t)(unsigned long, unsigned long, struct ftrace_op *, struct pt_regs *);
> 
> struct ftrace_regs_stack {
> 	ftrace_func_t func;
> 	unsigned long parent_ip;
> };
> 
> void ftrace_regs_handler(struct pr_regs *regs)
> {
> 	struct ftrace_regs_stack *st = (void *)regs->sp;
> 	ftrace_func_t func = st->func;
> 
> 	regs->sp += sizeof(long); /* pop func */

Sorry, why pop here? 

> 
> 	func(regs->ip, st->parent_ip, function_trace_op, regs);
> }
> 
> Hmm? I didn't look into the function_graph thing, but I imagine it can
> be added without too much pain.

Yes, that should be good for function_graph trampoline too.
We use very similar technic.

> 
> ---
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -1576,3 +1576,100 @@ ENTRY(rewind_stack_do_exit)
>  	call	do_exit
>  1:	jmp 1b
>  END(rewind_stack_do_exit)
> +
> +/*
> + * Transforms a CALL frame into an exception frame; IOW it pretends the CALL we
> + * just did was in fact scribbled with an INT3.
> + *
> + * Use this trampoline like:
> + *
> + *   PUSH $func
> + *   JMP call_to_exception_trampoline
> + *
> + * $func will see regs->ip point at the CALL instruction and must therefore
> + * modify regs->ip in order to make progress (just like a normal INT3 scribbled
> + * CALL).
> + *
> + * NOTE: we do not restore any of the segment registers.
> + */
> +ENTRY(call_to_exception_trampoline)
> +	/*
> +	 * On entry the stack looks like:
> +	 *
> +	 *   2*4(%esp) <previous context>
> +	 *   1*4(%esp) RET-IP
> +	 *   0*4(%esp) func
> +	 *
> +	 * transform this into:
> +	 *
> +	 *  19*4(%esp) <previous context>
> +	 *  18*4(%esp) gap / RET-IP
> +	 *  17*4(%esp) gap / func
> +	 *  16*4(%esp) ss
> +	 *  15*415*4(%esp) sp / <previous context>

isn't this "&<previous context>" ?

> +	 *  14*4(%esp) flags
> +	 *  13*4(%esp) cs
> +	 *  12*4(%esp) ip / RET-IP
> +	 *  11*4(%esp) orig_eax
> +	 *  10*4(%esp) gs
> +	 *   9*4(%esp) fs
> +	 *   8*4(%esp) es
> +	 *   7*4(%esp) ds
> +	 *   6*4(%esp) eax
> +	 *   5*4(%esp) ebp
> +	 *   4*4(%esp) edi
> +	 *   3*4(%esp) esi
> +	 *   2*4(%esp) edx
> +	 *   1*4(%esp) ecx
> +	 *   0*4(%esp) ebx
> +	 */
> +	pushl	%ss
> +	pushl	%esp		# points at ss
> +	addl	$3*4, (%esp)	#   point it at <previous context>
> +	pushfl
> +	pushl	%cs
> +	pushl	5*4(%esp)	# RET-IP
> +	subl	5, (%esp)	#   point at CALL instruction
> +	pushl	$-1
> +	pushl	%gs
> +	pushl	%fs
> +	pushl	%es
> +	pushl	%ds
> +	pushl	%eax
> +	pushl	%ebp
> +	pushl	%edi
> +	pushl	%esi
> +	pushl	%edx
> +	pushl	%ecx
> +	pushl	%ebx
> +
> +	ENCODE_FRAME_POINTER
> +
> +	movl	%esp, %eax	# 1st argument: pt_regs
> +
> +	movl	17*4(%esp), %ebx	# func
> +	CALL_NOSPEC %ebx
> +
> +	movl	PT_OLDESP(%esp), %eax

Is PT_OLDESP(%esp) "<previous context>" or "&<previous contex>"?

> +
> +	movl	PT_EIP(%esp), %ecx
> +	movl	%ecx, -1*4(%eax)

Ah, OK, so $func must set the true return address to regs->ip
instead of returning it.

> +
> +	movl	PT_EFLAGS(%esp), %ecx
> +	movl	%ecx, -2*4(%eax)
> +
> +	movl	PT_EAX(%esp), %ecx
> +	movl	%ecx, -3*4(%eax)

So, at this point, the stack becomes

 18*4(%esp) RET-IP
 17*4(%esp) eflags
 16*4(%esp) eax

Correct?

> +
> +	popl	%ebx
> +	popl	%ecx
> +	popl	%edx
> +	popl	%esi
> +	popl	%edi
> +	popl	%ebp
> +
> +	lea	-3*4(%eax), %esp
> +	popl	%eax
> +	popfl
> +	ret
> +END(call_to_exception_trampoline)
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -731,29 +731,8 @@ asm(
>  	".global kretprobe_trampoline\n"
>  	".type kretprobe_trampoline, @function\n"
>  	"kretprobe_trampoline:\n"
> -	/* We don't bother saving the ss register */
> -#ifdef CONFIG_X86_64
> -	"	pushq %rsp\n"
> -	"	pushfq\n"
> -	SAVE_REGS_STRING
> -	"	movq %rsp, %rdi\n"
> -	"	call trampoline_handler\n"
> -	/* Replace saved sp with true return address. */
> -	"	movq %rax, 19*8(%rsp)\n"
> -	RESTORE_REGS_STRING
> -	"	popfq\n"
> -#else
> -	"	pushl %esp\n"
> -	"	pushfl\n"
> -	SAVE_REGS_STRING
> -	"	movl %esp, %eax\n"
> -	"	call trampoline_handler\n"
> -	/* Replace saved sp with true return address. */
> -	"	movl %eax, 15*4(%esp)\n"
> -	RESTORE_REGS_STRING
> -	"	popfl\n"
> -#endif
> -	"	ret\n"

Here, we need a gap for storing ret-ip, because kretprobe_trampoline is
the address which is returned from the target function. We have no 
"ret-ip" here at this point. So something like

+	"push $0\n"	/* This is a gap, will be filled with real return address*/

> +	"push trampoline_handler\n"
> +	"jmp call_to_exception_trampoline\n"
>  	".size kretprobe_trampoline, .-kretprobe_trampoline\n"
>  );
>  NOKPROBE_SYMBOL(kretprobe_trampoline);
> @@ -791,12 +770,7 @@ static __used void *trampoline_handler(s
>  
>  	INIT_HLIST_HEAD(&empty_rp);
>  	kretprobe_hash_lock(current, &head, &flags);
> -	/* fixup registers */
> -	regs->cs = __KERNEL_CS;
> -#ifdef CONFIG_X86_32
> -	regs->cs |= get_kernel_rpl();
> -	regs->gs = 0;
> -#endif
> +
>  	/* We use pt_regs->sp for return address holder. */
>  	frame_pointer = &regs->sp;
>  	regs->ip = trampoline_address;

Thank you,

Andy Lutomirski May 9, 2019, 4:20 p.m. UTC | #12

> On May 9, 2019, at 1:14 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
>> On Thu, May 09, 2019 at 10:20:30AM +0900, Masami Hiramatsu wrote:
>> Hi Josh,
>> 
>> On Wed, 8 May 2019 13:48:48 -0500
>> Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>> 
>>>> On Wed, May 08, 2019 at 05:39:07PM +0200, Peter Zijlstra wrote:
>>>>> On Wed, May 08, 2019 at 07:42:48AM -0500, Josh Poimboeuf wrote:
>>>>> On Wed, May 08, 2019 at 02:04:16PM +0200, Peter Zijlstra wrote:
>>>> 
>>>>>> Do the x86_64 variants also want some ORC annotation?
>>>>> 
>>>>> Maybe so.  Though it looks like regs->ip isn't saved.  The saved
>>>>> registers might need to be tweaked.  I'll need to look into it.
>>>> 
>>>> What all these sites do (and maybe we should look at unifying them
>>>> somehow) is turn a CALL frame (aka RET-IP) into an exception frame (aka
>>>> pt_regs).
>>>> 
>>>> So regs->ip will be the return address (which is fixed up to be the CALL
>>>> address in the handler).
>>> 
>>> But from what I can tell, trampoline_handler() hard-codes regs->ip to
>>> point to kretprobe_trampoline(), and the original return address is
>>> placed in regs->sp.
>>> 
>>> Masami, is there a reason why regs->ip doesn't have the original return
>>> address and regs->sp doesn't have the original SP?  I think that would
>>> help the unwinder understand things.
>> 
>> Yes, for regs->ip, there is a histrical reason. Since previously, we had
>> an int3 at trampoline, so the user (kretprobe) handler expects that
>> regs->ip is trampoline address and ri->ret_addr is original return address.
>> It is better to check the other archs, but I think it is possible to
>> change the regs->ip to original return address, since no one cares such
>> "fixed address". :)
>> 
>> For the regs->sp, there are 2 reasons.
>> 
>> For x86-64, it's just for over-optimizing (reduce stack usage).
>> I think we can make a gap for putting return address, something like
>> 
>>    "kretprobe_trampoline:\n"
>> #ifdef CONFIG_X86_64
>>    "    pushq %rsp\n"    /* Make a gap for return address */
>>    "    pushq 0(%rsp)\n"    /* Copy original stack pointer */
>>    "    pushfq\n"
>>    SAVE_REGS_STRING
>>    "    movq %rsp, %rdi\n"
>>    "    call trampoline_handler\n"
>>    /* Push the true return address to the bottom */
>>    "    movq %rax, 20*8(%rsp)\n"
>>    RESTORE_REGS_STRING
>>    "    popfq\n"
>>    "    addq $8, %rsp\n"    /* Skip original stack pointer */
>> 
>> For i386 (x86-32), there is no other way to keep &regs->sp as
>> the original stack pointer. It has to be changed with this series,
>> maybe as same as x86-64.
> 
> Right; I already fixed that in my patch changing i386's pt_regs.
> 
> But what I'd love to do is something like the belwo patch, and make all
> the trampolines (very much including ftrace) use that. Such that we then
> only have 1 copy of this magic (well, 2 because x86_64 also needs an
> implementation of this of course).
> 
> Changing ftrace over to this would be a little more work but it can
> easily chain things a little to get its original context back:
> 
> ENTRY(ftrace_regs_caller)
> GLOBAL(ftrace_regs_func)
>    push ftrace_stub
>    push ftrace_regs_handler
>    jmp call_to_exception_trampoline
> END(ftrace_regs_caller)
> 
> typedef void (*ftrace_func_t)(unsigned long, unsigned long, struct ftrace_op *, struct pt_regs *);
> 
> struct ftrace_regs_stack {
>    ftrace_func_t func;
>    unsigned long parent_ip;
> };
> 
> void ftrace_regs_handler(struct pr_regs *regs)
> {
>    struct ftrace_regs_stack *st = (void *)regs->sp;
>    ftrace_func_t func = st->func;
> 
>    regs->sp += sizeof(long); /* pop func */
> 
>    func(regs->ip, st->parent_ip, function_trace_op, regs);
> }
> 
> Hmm? I didn't look into the function_graph thing, but I imagine it can
> be added without too much pain.
> 
> ---
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -1576,3 +1576,100 @@ ENTRY(rewind_stack_do_exit)
>    call    do_exit
> 1:    jmp 1b
> END(rewind_stack_do_exit)
> +
> +/*
> + * Transforms a CALL frame into an exception frame; IOW it pretends the CALL we
> + * just did was in fact scribbled with an INT3.
> + *
> + * Use this trampoline like:
> + *
> + *   PUSH $func
> + *   JMP call_to_exception_trampoline
> + *
> + * $func will see regs->ip point at the CALL instruction and must therefore
> + * modify regs->ip in order to make progress (just like a normal INT3 scribbled
> + * CALL).
> + *
> + * NOTE: we do not restore any of the segment registers.
> + */
> +ENTRY(call_to_exception_trampoline)
> +    /*
> +     * On entry the stack looks like:
> +     *
> +     *   2*4(%esp) <previous context>
> +     *   1*4(%esp) RET-IP
> +     *   0*4(%esp) func
> +     *
> +     * transform this into:
> +     *
> +     *  19*4(%esp) <previous context>
> +     *  18*4(%esp) gap / RET-IP
> +     *  17*4(%esp) gap / func
> +     *  16*4(%esp) ss
> +     *  15*4(%esp) sp / <previous context>
> +     *  14*4(%esp) flags
> +     *  13*4(%esp) cs
> +     *  12*4(%esp) ip / RET-IP
> +     *  11*4(%esp) orig_eax
> +     *  10*4(%esp) gs
> +     *   9*4(%esp) fs
> +     *   8*4(%esp) es
> +     *   7*4(%esp) ds
> +     *   6*4(%esp) eax
> +     *   5*4(%esp) ebp
> +     *   4*4(%esp) edi
> +     *   3*4(%esp) esi
> +     *   2*4(%esp) edx
> +     *   1*4(%esp) ecx
> +     *   0*4(%esp) ebx
> +     */
> +    pushl    %ss
> +    pushl    %esp        # points at ss
> +    addl    $3*4, (%esp)    #   point it at <previous context>
> +    pushfl
> +    pushl    %cs
> +    pushl    5*4(%esp)    # RET-IP
> +    subl    5, (%esp)    #   point at CALL instruction
> +    pushl    $-1
> +    pushl    %gs
> +    pushl    %fs
> +    pushl    %es
> +    pushl    %ds
> +    pushl    %eax
> +    pushl    %ebp
> +    pushl    %edi
> +    pushl    %esi
> +    pushl    %edx
> +    pushl    %ecx
> +    pushl    %ebx
> +
> +    ENCODE_FRAME_POINTER
> +
> +    movl    %esp, %eax    # 1st argument: pt_regs
> +
> +    movl    17*4(%esp), %ebx    # func
> +    CALL_NOSPEC %ebx
> +
> +    movl    PT_OLDESP(%esp), %eax
> +
> +    movl    PT_EIP(%esp), %ecx
> +    movl    %ecx, -1*4(%eax)
> +
> +    movl    PT_EFLAGS(%esp), %ecx
> +    movl    %ecx, -2*4(%eax)
> +
> +    movl    PT_EAX(%esp), %ecx
> +    movl    %ecx, -3*4(%eax)
> +
> +    popl    %ebx
> +    popl    %ecx
> +    popl    %edx
> +    popl    %esi
> +    popl    %edi
> +    popl    %ebp
> +
> +    lea    -3*4(%eax), %esp
> +    popl    %eax
> +    popfl
> +    ret
> +END(call_to_exception_trampoline)
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -731,29 +731,8 @@ asm(
>    ".global kretprobe_trampoline\n"
>    ".type kretprobe_trampoline, @function\n"
>    "kretprobe_trampoline:\n"
> -    /* We don't bother saving the ss register */
> -#ifdef CONFIG_X86_64
> -    "    pushq %rsp\n"
> -    "    pushfq\n"
> -    SAVE_REGS_STRING
> -    "    movq %rsp, %rdi\n"
> -    "    call trampoline_handler\n"
> -    /* Replace saved sp with true return address. */
> -    "    movq %rax, 19*8(%rsp)\n"
> -    RESTORE_REGS_STRING
> -    "    popfq\n"
> -#else
> -    "    pushl %esp\n"
> -    "    pushfl\n"
> -    SAVE_REGS_STRING
> -    "    movl %esp, %eax\n"
> -    "    call trampoline_handler\n"
> -    /* Replace saved sp with true return address. */
> -    "    movl %eax, 15*4(%esp)\n"
> -    RESTORE_REGS_STRING
> -    "    popfl\n"
> -#endif
> -    "    ret\n"
> +    "push trampoline_handler\n"
> +    "jmp call_to_exception_trampoline\n"
>    ".size kretprobe_trampoline, .-kretprobe_trampoline\n"
> );


Potentially minor nit: you’re doing popfl, but you’re not doing TRACE_IRQ_whatever.  This makes me think that you should either add the tracing (ugh!) or you should maybe just skip the popfl.

Peter Zijlstra May 9, 2019, 5:14 p.m. UTC | #13

On Thu, May 09, 2019 at 11:01:06PM +0900, Masami Hiramatsu wrote:
> On Thu, 9 May 2019 10:14:31 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:

> > But what I'd love to do is something like the belwo patch, and make all
> > the trampolines (very much including ftrace) use that. Such that we then
> > only have 1 copy of this magic (well, 2 because x86_64 also needs an
> > implementation of this of course).
> 
> OK, but I will make kretprobe integrated with func-graph tracer,
> since it is inefficient that we have 2 different hidden return stack...
> 
> Anyway,
> 
> > Changing ftrace over to this would be a little more work but it can
> > easily chain things a little to get its original context back:
> > 
> > ENTRY(ftrace_regs_caller)
> > GLOBAL(ftrace_regs_func)
> > 	push ftrace_stub
> > 	push ftrace_regs_handler
> > 	jmp call_to_exception_trampoline
> > END(ftrace_regs_caller)
> > 
> > typedef void (*ftrace_func_t)(unsigned long, unsigned long, struct ftrace_op *, struct pt_regs *);
> > 
> > struct ftrace_regs_stack {
> > 	ftrace_func_t func;
> > 	unsigned long parent_ip;
> > };
> > 
> > void ftrace_regs_handler(struct pr_regs *regs)
> > {
> > 	struct ftrace_regs_stack *st = (void *)regs->sp;
> > 	ftrace_func_t func = st->func;
> > 
> > 	regs->sp += sizeof(long); /* pop func */
> 
> Sorry, why pop here? 

Otherwise it stays on the return stack and bad things happen. Note how
the below trampoline thing uses regs->sp.

> > 	func(regs->ip, st->parent_ip, function_trace_op, regs);
> > }
> > 
> > Hmm? I didn't look into the function_graph thing, but I imagine it can
> > be added without too much pain.
> 
> Yes, that should be good for function_graph trampoline too.
> We use very similar technic.

Ideally also the optimized kprobe trampoline, but I've not managed to
fully comprehend that one.

> > 
> > ---
> > --- a/arch/x86/entry/entry_32.S
> > +++ b/arch/x86/entry/entry_32.S
> > @@ -1576,3 +1576,100 @@ ENTRY(rewind_stack_do_exit)
> >  	call	do_exit
> >  1:	jmp 1b
> >  END(rewind_stack_do_exit)
> > +
> > +/*
> > + * Transforms a CALL frame into an exception frame; IOW it pretends the CALL we
> > + * just did was in fact scribbled with an INT3.
> > + *
> > + * Use this trampoline like:
> > + *
> > + *   PUSH $func
> > + *   JMP call_to_exception_trampoline
> > + *
> > + * $func will see regs->ip point at the CALL instruction and must therefore
> > + * modify regs->ip in order to make progress (just like a normal INT3 scribbled
> > + * CALL).
> > + *
> > + * NOTE: we do not restore any of the segment registers.
> > + */
> > +ENTRY(call_to_exception_trampoline)
> > +	/*
> > +	 * On entry the stack looks like:
> > +	 *
> > +	 *   2*4(%esp) <previous context>
> > +	 *   1*4(%esp) RET-IP
> > +	 *   0*4(%esp) func
> > +	 *
> > +	 * transform this into:
> > +	 *
> > +	 *  19*4(%esp) <previous context>
> > +	 *  18*4(%esp) gap / RET-IP
> > +	 *  17*4(%esp) gap / func
> > +	 *  16*4(%esp) ss
> > +	 *  15*415*4(%esp) sp / <previous context>
> 
> isn't this "&<previous context>" ?

Yes.

> > +	 *  14*4(%esp) flags
> > +	 *  13*4(%esp) cs
> > +	 *  12*4(%esp) ip / RET-IP
> > +	 *  11*4(%esp) orig_eax
> > +	 *  10*4(%esp) gs
> > +	 *   9*4(%esp) fs
> > +	 *   8*4(%esp) es
> > +	 *   7*4(%esp) ds
> > +	 *   6*4(%esp) eax
> > +	 *   5*4(%esp) ebp
> > +	 *   4*4(%esp) edi
> > +	 *   3*4(%esp) esi
> > +	 *   2*4(%esp) edx
> > +	 *   1*4(%esp) ecx
> > +	 *   0*4(%esp) ebx
> > +	 */
> > +	pushl	%ss
> > +	pushl	%esp		# points at ss
> > +	addl	$3*4, (%esp)	#   point it at <previous context>
> > +	pushfl
> > +	pushl	%cs
> > +	pushl	5*4(%esp)	# RET-IP
> > +	subl	5, (%esp)	#   point at CALL instruction
> > +	pushl	$-1
> > +	pushl	%gs
> > +	pushl	%fs
> > +	pushl	%es
> > +	pushl	%ds
> > +	pushl	%eax
> > +	pushl	%ebp
> > +	pushl	%edi
> > +	pushl	%esi
> > +	pushl	%edx
> > +	pushl	%ecx
> > +	pushl	%ebx
> > +
> > +	ENCODE_FRAME_POINTER
> > +
> > +	movl	%esp, %eax	# 1st argument: pt_regs
> > +
> > +	movl	17*4(%esp), %ebx	# func
> > +	CALL_NOSPEC %ebx
> > +
> > +	movl	PT_OLDESP(%esp), %eax
> 
> Is PT_OLDESP(%esp) "<previous context>" or "&<previous contex>"?

The latter.

> > +
> > +	movl	PT_EIP(%esp), %ecx
> > +	movl	%ecx, -1*4(%eax)
> 
> Ah, OK, so $func must set the true return address to regs->ip
> instead of returning it.

Just so.

> > +
> > +	movl	PT_EFLAGS(%esp), %ecx
> > +	movl	%ecx, -2*4(%eax)
> > +
> > +	movl	PT_EAX(%esp), %ecx
> > +	movl	%ecx, -3*4(%eax)
> 
> So, at this point, the stack becomes
> 
  3*4(%esp) &regs->sp
  2*4(%esp) RET-IP
  1*4(%esp) eflags
  0*4(%esp) eax

> Correct?

Yes, relative to regs->sp, which is why we need to pop 'func', otherwise
it stays on the stack.

> > +
> > +	popl	%ebx
> > +	popl	%ecx
> > +	popl	%edx
> > +	popl	%esi
> > +	popl	%edi
> > +	popl	%ebp
> > +
> > +	lea	-3*4(%eax), %esp
> > +	popl	%eax
> > +	popfl
> > +	ret
> > +END(call_to_exception_trampoline)
> > --- a/arch/x86/kernel/kprobes/core.c
> > +++ b/arch/x86/kernel/kprobes/core.c
> > @@ -731,29 +731,8 @@ asm(
> >  	".global kretprobe_trampoline\n"
> >  	".type kretprobe_trampoline, @function\n"
> >  	"kretprobe_trampoline:\n"
> > -	/* We don't bother saving the ss register */
> > -#ifdef CONFIG_X86_64
> > -	"	pushq %rsp\n"
> > -	"	pushfq\n"
> > -	SAVE_REGS_STRING
> > -	"	movq %rsp, %rdi\n"
> > -	"	call trampoline_handler\n"
> > -	/* Replace saved sp with true return address. */
> > -	"	movq %rax, 19*8(%rsp)\n"
> > -	RESTORE_REGS_STRING
> > -	"	popfq\n"
> > -#else
> > -	"	pushl %esp\n"
> > -	"	pushfl\n"
> > -	SAVE_REGS_STRING
> > -	"	movl %esp, %eax\n"
> > -	"	call trampoline_handler\n"
> > -	/* Replace saved sp with true return address. */
> > -	"	movl %eax, 15*4(%esp)\n"
> > -	RESTORE_REGS_STRING
> > -	"	popfl\n"
> > -#endif
> > -	"	ret\n"
> 
> Here, we need a gap for storing ret-ip, because kretprobe_trampoline is
> the address which is returned from the target function. We have no 
> "ret-ip" here at this point. So something like
> 
> +	"push $0\n"	/* This is a gap, will be filled with real return address*/

The trampoline already provides a gap, trampoline_handler() will need to
use int3_emulate_push() if it wants to inject something on the return
stack.

> > +	"push trampoline_handler\n"
> > +	"jmp call_to_exception_trampoline\n"
> >  	".size kretprobe_trampoline, .-kretprobe_trampoline\n"
> >  );
> >  NOKPROBE_SYMBOL(kretprobe_trampoline);

Peter Zijlstra May 9, 2019, 5:18 p.m. UTC | #14

On Thu, May 09, 2019 at 09:20:06AM -0700, Andy Lutomirski wrote:
> > +ENTRY(call_to_exception_trampoline)
> > +    /*
> > +     * On entry the stack looks like:
> > +     *
> > +     *   2*4(%esp) <previous context>
> > +     *   1*4(%esp) RET-IP
> > +     *   0*4(%esp) func
> > +     *
> > +     * transform this into:
> > +     *
> > +     *  19*4(%esp) <previous context>
> > +     *  18*4(%esp) gap / RET-IP
> > +     *  17*4(%esp) gap / func
> > +     *  16*4(%esp) ss
> > +     *  15*4(%esp) sp / <previous context>
> > +     *  14*4(%esp) flags
> > +     *  13*4(%esp) cs
> > +     *  12*4(%esp) ip / RET-IP
> > +     *  11*4(%esp) orig_eax
> > +     *  10*4(%esp) gs
> > +     *   9*4(%esp) fs
> > +     *   8*4(%esp) es
> > +     *   7*4(%esp) ds
> > +     *   6*4(%esp) eax
> > +     *   5*4(%esp) ebp
> > +     *   4*4(%esp) edi
> > +     *   3*4(%esp) esi
> > +     *   2*4(%esp) edx
> > +     *   1*4(%esp) ecx
> > +     *   0*4(%esp) ebx
> > +     */
> > +    pushl    %ss
> > +    pushl    %esp        # points at ss
> > +    addl    $3*4, (%esp)    #   point it at <previous context>
> > +    pushfl
> > +    pushl    %cs
> > +    pushl    5*4(%esp)    # RET-IP
> > +    subl    5, (%esp)    #   point at CALL instruction
> > +    pushl    $-1
> > +    pushl    %gs
> > +    pushl    %fs
> > +    pushl    %es
> > +    pushl    %ds
> > +    pushl    %eax
> > +    pushl    %ebp
> > +    pushl    %edi
> > +    pushl    %esi
> > +    pushl    %edx
> > +    pushl    %ecx
> > +    pushl    %ebx
> > +
> > +    ENCODE_FRAME_POINTER
> > +
> > +    movl    %esp, %eax    # 1st argument: pt_regs
> > +
> > +    movl    17*4(%esp), %ebx    # func
> > +    CALL_NOSPEC %ebx
> > +
> > +    movl    PT_OLDESP(%esp), %eax
> > +
> > +    movl    PT_EIP(%esp), %ecx
> > +    movl    %ecx, -1*4(%eax)
> > +
> > +    movl    PT_EFLAGS(%esp), %ecx
> > +    movl    %ecx, -2*4(%eax)
> > +
> > +    movl    PT_EAX(%esp), %ecx
> > +    movl    %ecx, -3*4(%eax)
> > +
> > +    popl    %ebx
> > +    popl    %ecx
> > +    popl    %edx
> > +    popl    %esi
> > +    popl    %edi
> > +    popl    %ebp
> > +
> > +    lea    -3*4(%eax), %esp
> > +    popl    %eax
> > +    popfl
> > +    ret
> > +END(call_to_exception_trampoline)

> Potentially minor nit: you’re doing popfl, but you’re not doing
> TRACE_IRQ_whatever.  This makes me think that you should either add
> the tracing (ugh!) or you should maybe just skip the popfl.

Yeah, so we really should not change flags I suppose. If this lives I'll
remove the popfl.

Steven Rostedt May 9, 2019, 5:37 p.m. UTC | #15

On Thu, May 09, 2019 at 10:14:31AM +0200, Peter Zijlstra wrote:
> 
> Right; I already fixed that in my patch changing i386's pt_regs.
> 
> But what I'd love to do is something like the belwo patch, and make all
> the trampolines (very much including ftrace) use that. Such that we then
> only have 1 copy of this magic (well, 2 because x86_64 also needs an
> implementation of this of course).
> 
> Changing ftrace over to this would be a little more work but it can
> easily chain things a little to get its original context back:
> 
> ENTRY(ftrace_regs_caller)
> GLOBAL(ftrace_regs_func)
> 	push ftrace_stub
> 	push ftrace_regs_handler

Note, ftrace_stub is dynamically modified to remove any indirect calls.

> 	jmp call_to_exception_trampoline
> END(ftrace_regs_caller)
> 
> typedef void (*ftrace_func_t)(unsigned long, unsigned long, struct ftrace_op *, struct pt_regs *);
> 
> struct ftrace_regs_stack {
> 	ftrace_func_t func;
> 	unsigned long parent_ip;
> };
> 
> void ftrace_regs_handler(struct pr_regs *regs)
> {
> 	struct ftrace_regs_stack *st = (void *)regs->sp;
> 	ftrace_func_t func = st->func;
> 
> 	regs->sp += sizeof(long); /* pop func */
> 
> 	func(regs->ip, st->parent_ip, function_trace_op, regs);

I try very hard to limit all indirect function calls from the function tracing
path, as they do add noticeable overhead.

-- Steve

> }
> 
> Hmm? I didn't look into the function_graph thing, but I imagine it can
> be added without too much pain.
>

Steven Rostedt May 9, 2019, 5:43 p.m. UTC | #16

On Thu, May 09, 2019 at 09:20:06AM -0700, Andy Lutomirski wrote:
> > +END(call_to_exception_trampoline)
> > --- a/arch/x86/kernel/kprobes/core.c
> > +++ b/arch/x86/kernel/kprobes/core.c
> > @@ -731,29 +731,8 @@ asm(
> >    ".global kretprobe_trampoline\n"
> >    ".type kretprobe_trampoline, @function\n"
> >    "kretprobe_trampoline:\n"
> > -    /* We don't bother saving the ss register */
> > -#ifdef CONFIG_X86_64
> > -    "    pushq %rsp\n"
> > -    "    pushfq\n"
> > -    SAVE_REGS_STRING
> > -    "    movq %rsp, %rdi\n"
> > -    "    call trampoline_handler\n"
> > -    /* Replace saved sp with true return address. */
> > -    "    movq %rax, 19*8(%rsp)\n"
> > -    RESTORE_REGS_STRING
> > -    "    popfq\n"
> > -#else
> > -    "    pushl %esp\n"
> > -    "    pushfl\n"
> > -    SAVE_REGS_STRING
> > -    "    movl %esp, %eax\n"
> > -    "    call trampoline_handler\n"
> > -    /* Replace saved sp with true return address. */
> > -    "    movl %eax, 15*4(%esp)\n"
> > -    RESTORE_REGS_STRING
> > -    "    popfl\n"
> > -#endif
> > -    "    ret\n"
> > +    "push trampoline_handler\n"
> > +    "jmp call_to_exception_trampoline\n"
> >    ".size kretprobe_trampoline, .-kretprobe_trampoline\n"
> > );
> 
> 
> Potentially minor nit: you’re doing popfl, but you’re not doing TRACE_IRQ_whatever.  This makes me think that you should either add the tracing (ugh!) or you should maybe just skip the popfl.


Note, kprobes (and ftrace for that matter) are not saving flags for
interrupts, but because it must not modify the sign, zero and carry flags.

-- Steve

Peter Zijlstra May 9, 2019, 6:26 p.m. UTC | #17

On Thu, May 09, 2019 at 01:37:42PM -0400, Steven Rostedt wrote:
> On Thu, May 09, 2019 at 10:14:31AM +0200, Peter Zijlstra wrote:
> > 
> > Right; I already fixed that in my patch changing i386's pt_regs.
> > 
> > But what I'd love to do is something like the belwo patch, and make all
> > the trampolines (very much including ftrace) use that. Such that we then
> > only have 1 copy of this magic (well, 2 because x86_64 also needs an
> > implementation of this of course).
> > 
> > Changing ftrace over to this would be a little more work but it can
> > easily chain things a little to get its original context back:
> > 
> > ENTRY(ftrace_regs_caller)
> > GLOBAL(ftrace_regs_func)
> > 	push ftrace_stub
> > 	push ftrace_regs_handler
> 
> Note, ftrace_stub is dynamically modified to remove any indirect calls.

Yeah, I realized that a few hours after I send this out; as you might
have seen by the IRC chatter on this.

Still, maybe we can wrap the thing in a .macro and reuse things that
way. Because I really hate there are at least 3 (x2 for x86_64) copies
of this around.

Steven Rostedt May 9, 2019, 6:36 p.m. UTC | #18

On Thu, 9 May 2019 20:26:22 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> Still, maybe we can wrap the thing in a .macro and reuse things that
> way. Because I really hate there are at least 3 (x2 for x86_64) copies
> of this around.

I'm good with something like this. Have a single place that does the
pt_regs saving would be great. Then I could separate ftrace_regs_caller
from ftrace_caller, and simplify them both, as I would only need to
make sure frame pointers still work for ftrace_caller, and
ftrace_regs_caller would depend on the macro to make sure it works.

-- Steve

Masami Hiramatsu (Google) May 10, 2019, 3:21 a.m. UTC | #19

On Thu, 9 May 2019 13:43:16 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Thu, May 09, 2019 at 09:20:06AM -0700, Andy Lutomirski wrote:
> > > +END(call_to_exception_trampoline)
> > > --- a/arch/x86/kernel/kprobes/core.c
> > > +++ b/arch/x86/kernel/kprobes/core.c
> > > @@ -731,29 +731,8 @@ asm(
> > >    ".global kretprobe_trampoline\n"
> > >    ".type kretprobe_trampoline, @function\n"
> > >    "kretprobe_trampoline:\n"
> > > -    /* We don't bother saving the ss register */
> > > -#ifdef CONFIG_X86_64
> > > -    "    pushq %rsp\n"
> > > -    "    pushfq\n"
> > > -    SAVE_REGS_STRING
> > > -    "    movq %rsp, %rdi\n"
> > > -    "    call trampoline_handler\n"
> > > -    /* Replace saved sp with true return address. */
> > > -    "    movq %rax, 19*8(%rsp)\n"
> > > -    RESTORE_REGS_STRING
> > > -    "    popfq\n"
> > > -#else
> > > -    "    pushl %esp\n"
> > > -    "    pushfl\n"
> > > -    SAVE_REGS_STRING
> > > -    "    movl %esp, %eax\n"
> > > -    "    call trampoline_handler\n"
> > > -    /* Replace saved sp with true return address. */
> > > -    "    movl %eax, 15*4(%esp)\n"
> > > -    RESTORE_REGS_STRING
> > > -    "    popfl\n"
> > > -#endif
> > > -    "    ret\n"
> > > +    "push trampoline_handler\n"
> > > +    "jmp call_to_exception_trampoline\n"
> > >    ".size kretprobe_trampoline, .-kretprobe_trampoline\n"
> > > );
> > 
> > 
> > Potentially minor nit: you’re doing popfl, but you’re not doing TRACE_IRQ_whatever.  This makes me think that you should either add the tracing (ugh!) or you should maybe just skip the popfl.
> 
> 
> Note, kprobes (and ftrace for that matter) are not saving flags for
> interrupts, but because it must not modify the sign, zero and carry flags.

Yes, optprobe also has to save and restore the flags.
Above trampline is for kretprobe, which is placed at the function return, so
we don't have to care about flags.

Thanks,

Masami Hiramatsu (Google) May 10, 2019, 4:58 a.m. UTC | #20

On Thu, 9 May 2019 19:14:16 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, May 09, 2019 at 11:01:06PM +0900, Masami Hiramatsu wrote:
> > On Thu, 9 May 2019 10:14:31 +0200
> > Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > > But what I'd love to do is something like the belwo patch, and make all
> > > the trampolines (very much including ftrace) use that. Such that we then
> > > only have 1 copy of this magic (well, 2 because x86_64 also needs an
> > > implementation of this of course).
> > 
> > OK, but I will make kretprobe integrated with func-graph tracer,
> > since it is inefficient that we have 2 different hidden return stack...
> > 
> > Anyway,
> > 
> > > Changing ftrace over to this would be a little more work but it can
> > > easily chain things a little to get its original context back:
> > > 
> > > ENTRY(ftrace_regs_caller)
> > > GLOBAL(ftrace_regs_func)
> > > 	push ftrace_stub
> > > 	push ftrace_regs_handler
> > > 	jmp call_to_exception_trampoline
> > > END(ftrace_regs_caller)
> > > 
> > > typedef void (*ftrace_func_t)(unsigned long, unsigned long, struct ftrace_op *, struct pt_regs *);
> > > 
> > > struct ftrace_regs_stack {
> > > 	ftrace_func_t func;
> > > 	unsigned long parent_ip;
> > > };
> > > 
> > > void ftrace_regs_handler(struct pr_regs *regs)
> > > {
> > > 	struct ftrace_regs_stack *st = (void *)regs->sp;
> > > 	ftrace_func_t func = st->func;
> > > 
> > > 	regs->sp += sizeof(long); /* pop func */
> > 
> > Sorry, why pop here? 
> 
> Otherwise it stays on the return stack and bad things happen. Note how
> the below trampoline thing uses regs->sp.
> 
> > > 	func(regs->ip, st->parent_ip, function_trace_op, regs);
> > > }
> > > 
> > > Hmm? I didn't look into the function_graph thing, but I imagine it can
> > > be added without too much pain.
> > 
> > Yes, that should be good for function_graph trampoline too.
> > We use very similar technic.
> 
> Ideally also the optimized kprobe trampoline, but I've not managed to
> fully comprehend that one.

As you pointed in other reply, save/restore can be a macro, but
each trampoline code is slightly different. Optprobe template has
below parts

(jumped from probed address)
[store regs]
[setup function arguments (pt_regs and probed address)]
[handler call]
[restore regs]
[execute copied instruction]
[jump back to probed address]

Note that there is a limitation that if it is optiomized probe, user
handler can not change regs->ip. (we can not use "ret" after executed
a copied instruction, which must run on same stack)

> 
> > > 
> > > ---
> > > --- a/arch/x86/entry/entry_32.S
> > > +++ b/arch/x86/entry/entry_32.S
> > > @@ -1576,3 +1576,100 @@ ENTRY(rewind_stack_do_exit)
> > >  	call	do_exit
> > >  1:	jmp 1b
> > >  END(rewind_stack_do_exit)
> > > +
> > > +/*
> > > + * Transforms a CALL frame into an exception frame; IOW it pretends the CALL we
> > > + * just did was in fact scribbled with an INT3.
> > > + *
> > > + * Use this trampoline like:
> > > + *
> > > + *   PUSH $func
> > > + *   JMP call_to_exception_trampoline
> > > + *
> > > + * $func will see regs->ip point at the CALL instruction and must therefore
> > > + * modify regs->ip in order to make progress (just like a normal INT3 scribbled
> > > + * CALL).
> > > + *
> > > + * NOTE: we do not restore any of the segment registers.
> > > + */
> > > +ENTRY(call_to_exception_trampoline)
> > > +	/*
> > > +	 * On entry the stack looks like:
> > > +	 *
> > > +	 *   2*4(%esp) <previous context>
> > > +	 *   1*4(%esp) RET-IP
> > > +	 *   0*4(%esp) func
> > > +	 *
> > > +	 * transform this into:
> > > +	 *
> > > +	 *  19*4(%esp) <previous context>
> > > +	 *  18*4(%esp) gap / RET-IP
> > > +	 *  17*4(%esp) gap / func
> > > +	 *  16*4(%esp) ss
> > > +	 *  15*415*4(%esp) sp / <previous context>
> > 
> > isn't this "&<previous context>" ?
> 
> Yes.
> 
> > > +	 *  14*4(%esp) flags
> > > +	 *  13*4(%esp) cs
> > > +	 *  12*4(%esp) ip / RET-IP
> > > +	 *  11*4(%esp) orig_eax
> > > +	 *  10*4(%esp) gs
> > > +	 *   9*4(%esp) fs
> > > +	 *   8*4(%esp) es
> > > +	 *   7*4(%esp) ds
> > > +	 *   6*4(%esp) eax
> > > +	 *   5*4(%esp) ebp
> > > +	 *   4*4(%esp) edi
> > > +	 *   3*4(%esp) esi
> > > +	 *   2*4(%esp) edx
> > > +	 *   1*4(%esp) ecx
> > > +	 *   0*4(%esp) ebx
> > > +	 */
> > > +	pushl	%ss
> > > +	pushl	%esp		# points at ss
> > > +	addl	$3*4, (%esp)	#   point it at <previous context>
> > > +	pushfl
> > > +	pushl	%cs
> > > +	pushl	5*4(%esp)	# RET-IP
> > > +	subl	5, (%esp)	#   point at CALL instruction
> > > +	pushl	$-1
> > > +	pushl	%gs
> > > +	pushl	%fs
> > > +	pushl	%es
> > > +	pushl	%ds
> > > +	pushl	%eax
> > > +	pushl	%ebp
> > > +	pushl	%edi
> > > +	pushl	%esi
> > > +	pushl	%edx
> > > +	pushl	%ecx
> > > +	pushl	%ebx
> > > +
> > > +	ENCODE_FRAME_POINTER
> > > +
> > > +	movl	%esp, %eax	# 1st argument: pt_regs
> > > +
> > > +	movl	17*4(%esp), %ebx	# func
> > > +	CALL_NOSPEC %ebx
> > > +
> > > +	movl	PT_OLDESP(%esp), %eax
> > 
> > Is PT_OLDESP(%esp) "<previous context>" or "&<previous contex>"?
> 
> The latter.
> 
> > > +
> > > +	movl	PT_EIP(%esp), %ecx
> > > +	movl	%ecx, -1*4(%eax)
> > 
> > Ah, OK, so $func must set the true return address to regs->ip
> > instead of returning it.
> 
> Just so.
> 
> > > +
> > > +	movl	PT_EFLAGS(%esp), %ecx
> > > +	movl	%ecx, -2*4(%eax)
> > > +
> > > +	movl	PT_EAX(%esp), %ecx
> > > +	movl	%ecx, -3*4(%eax)
> > 
> > So, at this point, the stack becomes
> > 
>   3*4(%esp) &regs->sp
>   2*4(%esp) RET-IP
>   1*4(%esp) eflags
>   0*4(%esp) eax
> 
> > Correct?
> 
> Yes, relative to regs->sp, which is why we need to pop 'func', otherwise
> it stays on the stack.
> 
> > > +
> > > +	popl	%ebx
> > > +	popl	%ecx
> > > +	popl	%edx
> > > +	popl	%esi
> > > +	popl	%edi
> > > +	popl	%ebp
> > > +
> > > +	lea	-3*4(%eax), %esp
> > > +	popl	%eax
> > > +	popfl
> > > +	ret
> > > +END(call_to_exception_trampoline)
> > > --- a/arch/x86/kernel/kprobes/core.c
> > > +++ b/arch/x86/kernel/kprobes/core.c
> > > @@ -731,29 +731,8 @@ asm(
> > >  	".global kretprobe_trampoline\n"
> > >  	".type kretprobe_trampoline, @function\n"
> > >  	"kretprobe_trampoline:\n"
> > > -	/* We don't bother saving the ss register */
> > > -#ifdef CONFIG_X86_64
> > > -	"	pushq %rsp\n"
> > > -	"	pushfq\n"
> > > -	SAVE_REGS_STRING
> > > -	"	movq %rsp, %rdi\n"
> > > -	"	call trampoline_handler\n"
> > > -	/* Replace saved sp with true return address. */
> > > -	"	movq %rax, 19*8(%rsp)\n"
> > > -	RESTORE_REGS_STRING
> > > -	"	popfq\n"
> > > -#else
> > > -	"	pushl %esp\n"
> > > -	"	pushfl\n"
> > > -	SAVE_REGS_STRING
> > > -	"	movl %esp, %eax\n"
> > > -	"	call trampoline_handler\n"
> > > -	/* Replace saved sp with true return address. */
> > > -	"	movl %eax, 15*4(%esp)\n"
> > > -	RESTORE_REGS_STRING
> > > -	"	popfl\n"
> > > -#endif
> > > -	"	ret\n"
> > 
> > Here, we need a gap for storing ret-ip, because kretprobe_trampoline is
> > the address which is returned from the target function. We have no 
> > "ret-ip" here at this point. So something like
> > 
> > +	"push $0\n"	/* This is a gap, will be filled with real return address*/
> 
> The trampoline already provides a gap, trampoline_handler() will need to
> use int3_emulate_push() if it wants to inject something on the return
> stack.

I guess you mean the int3 case. This trampoline is used as a return destination.
When the target function is called, kretprobe interrupts the first instruction,
and replace the return address with this trampoline. When a "ret" instruction
is done, it returns to this trampoline. Thus the stack frame start with
previous context here. As you described above,

> > > +	 * On entry the stack looks like:
> > > +	 *
> > > +	 *   2*4(%esp) <previous context>
> > > +	 *   1*4(%esp) RET-IP
> > > +	 *   0*4(%esp) func

From this trampoline call, the stack looks like:

	 *   1*4(%esp) <previous context>
	 *   0*4(%esp) func

So we need one more push.

> 
> > > +	"push trampoline_handler\n"
> > > +	"jmp call_to_exception_trampoline\n"
> > >  	".size kretprobe_trampoline, .-kretprobe_trampoline\n"
> > >  );
> > >  NOKPROBE_SYMBOL(kretprobe_trampoline);

Thank you,

Peter Zijlstra May 10, 2019, 12:14 p.m. UTC | #21

On Fri, May 10, 2019 at 12:21:03PM +0900, Masami Hiramatsu wrote:
> On Thu, 9 May 2019 13:43:16 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > On Thu, May 09, 2019 at 09:20:06AM -0700, Andy Lutomirski wrote:
> > > > +END(call_to_exception_trampoline)
> > > > --- a/arch/x86/kernel/kprobes/core.c
> > > > +++ b/arch/x86/kernel/kprobes/core.c
> > > > @@ -731,29 +731,8 @@ asm(
> > > >    ".global kretprobe_trampoline\n"
> > > >    ".type kretprobe_trampoline, @function\n"
> > > >    "kretprobe_trampoline:\n"
> > > > -    /* We don't bother saving the ss register */
> > > > -#ifdef CONFIG_X86_64
> > > > -    "    pushq %rsp\n"
> > > > -    "    pushfq\n"
> > > > -    SAVE_REGS_STRING
> > > > -    "    movq %rsp, %rdi\n"
> > > > -    "    call trampoline_handler\n"
> > > > -    /* Replace saved sp with true return address. */
> > > > -    "    movq %rax, 19*8(%rsp)\n"
> > > > -    RESTORE_REGS_STRING
> > > > -    "    popfq\n"
> > > > -#else
> > > > -    "    pushl %esp\n"
> > > > -    "    pushfl\n"
> > > > -    SAVE_REGS_STRING
> > > > -    "    movl %esp, %eax\n"
> > > > -    "    call trampoline_handler\n"
> > > > -    /* Replace saved sp with true return address. */
> > > > -    "    movl %eax, 15*4(%esp)\n"
> > > > -    RESTORE_REGS_STRING
> > > > -    "    popfl\n"
> > > > -#endif
> > > > -    "    ret\n"
> > > > +    "push trampoline_handler\n"
> > > > +    "jmp call_to_exception_trampoline\n"
> > > >    ".size kretprobe_trampoline, .-kretprobe_trampoline\n"
> > > > );
> > > 
> > > 
> > > Potentially minor nit: you’re doing popfl, but you’re not doing TRACE_IRQ_whatever.  This makes me think that you should either add the tracing (ugh!) or you should maybe just skip the popfl.
> > 
> > 
> > Note, kprobes (and ftrace for that matter) are not saving flags for
> > interrupts, but because it must not modify the sign, zero and carry flags.

Uh, C ABI considers those clobbered over function calls, surely. Relying
on those flags over a CALL would be _insane_.

Peter Zijlstra May 10, 2019, 12:17 p.m. UTC | #22

On Fri, May 10, 2019 at 12:21:03PM +0900, Masami Hiramatsu wrote:
> Yes, optprobe also has to save and restore the flags.
> Above trampline is for kretprobe, which is placed at the function return, so
> we don't have to care about flags.

Sure, optprobe is actually special here, because it branches out at
'random' places and does indeed need to preserve flags.

But both ftrace and retprobes are at C function call boundaries.
Preserving flags doesn't make sense.

Peter Zijlstra May 10, 2019, 12:31 p.m. UTC | #23

On Fri, May 10, 2019 at 01:58:31PM +0900, Masami Hiramatsu wrote:
> On Thu, 9 May 2019 19:14:16 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:

> > Ideally also the optimized kprobe trampoline, but I've not managed to
> > fully comprehend that one.
> 
> As you pointed in other reply, save/restore can be a macro, but
> each trampoline code is slightly different. Optprobe template has
> below parts
> 
> (jumped from probed address)
> [store regs]
> [setup function arguments (pt_regs and probed address)]
> [handler call]
> [restore regs]
> [execute copied instruction]

 instruction_s_ ?

The JMP to this trampoline is likely 5 bytes and could have clobbered
multiple instructions, we'd then have to place them all here, and

> [jump back to probed address]

jump to after whatever instructions were clobbered by the JMP.

> Note that there is a limitation that if it is optiomized probe, user
> handler can not change regs->ip. (we can not use "ret" after executed
> a copied instruction, which must run on same stack)

Changing regs->ip in this case is going to be massively dodgy indeed :-)
But so would changing much else; changing stack layout would also be
somewhat tricky.

Peter Zijlstra May 10, 2019, 12:40 p.m. UTC | #24

On Fri, May 10, 2019 at 01:58:31PM +0900, Masami Hiramatsu wrote:
> On Thu, 9 May 2019 19:14:16 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> > > > --- a/arch/x86/kernel/kprobes/core.c
> > > > +++ b/arch/x86/kernel/kprobes/core.c
> > > > @@ -731,29 +731,8 @@ asm(
> > > >  	".global kretprobe_trampoline\n"
> > > >  	".type kretprobe_trampoline, @function\n"
> > > >  	"kretprobe_trampoline:\n"

> > > Here, we need a gap for storing ret-ip, because kretprobe_trampoline is
> > > the address which is returned from the target function. We have no 
> > > "ret-ip" here at this point. So something like
> > > 
> > > +	"push $0\n"	/* This is a gap, will be filled with real return address*/
> > 
> > The trampoline already provides a gap, trampoline_handler() will need to
> > use int3_emulate_push() if it wants to inject something on the return
> > stack.
> 
> I guess you mean the int3 case. This trampoline is used as a return destination.

> When the target function is called, kretprobe interrupts the first instruction,
> and replace the return address with this trampoline. When a "ret" instruction
> is done, it returns to this trampoline. Thus the stack frame start with
> previous context here. As you described above,

I would prefer to change that to inject an extra return address, instead
of replacing it. With the new exception stuff we can actually do that.

So on entry we then go from:

	<previous context>
	RET-IP

to

	<previous context>
	RET-IP
	return-trampoline

So when the function returns, it falls into the trampoline instead.

> > > > +	 * On entry the stack looks like:
> > > > +	 *
> > > > +	 *   2*4(%esp) <previous context>
> > > > +	 *   1*4(%esp) RET-IP
> > > > +	 *   0*4(%esp) func
> 
> From this trampoline call, the stack looks like:
> 
> 	 *   1*4(%esp) <previous context>
> 	 *   0*4(%esp) func
> 
> So we need one more push.

And then the stack looks just right at this point.

> > > > +	"push trampoline_handler\n"
> > > > +	"jmp call_to_exception_trampoline\n"
> > > >  	".size kretprobe_trampoline, .-kretprobe_trampoline\n"
> > > >  );
> > > >  NOKPROBE_SYMBOL(kretprobe_trampoline);

Steven Rostedt May 10, 2019, 2:54 p.m. UTC | #25

On Fri, 10 May 2019 14:17:20 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> But both ftrace and retprobes are at C function call boundaries.
> Preserving flags doesn't make sense.

I agree, but I did it just because of my OCD and being complete in
"emulating an int3" for ftrace_regs_caller ;-)

Yeah, we can remove the popfl from the ftrace trampoline.

-- Steve

Masami Hiramatsu (Google) May 11, 2019, 12:52 a.m. UTC | #26

On Fri, 10 May 2019 14:31:31 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, May 10, 2019 at 01:58:31PM +0900, Masami Hiramatsu wrote:
> > On Thu, 9 May 2019 19:14:16 +0200
> > Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > > Ideally also the optimized kprobe trampoline, but I've not managed to
> > > fully comprehend that one.
> > 
> > As you pointed in other reply, save/restore can be a macro, but
> > each trampoline code is slightly different. Optprobe template has
> > below parts
> > 
> > (jumped from probed address)
> > [store regs]
> > [setup function arguments (pt_regs and probed address)]
> > [handler call]
> > [restore regs]
> > [execute copied instruction]
> 
>  instruction_s_ ?

Yes.

> 
> The JMP to this trampoline is likely 5 bytes and could have clobbered
> multiple instructions, we'd then have to place them all here, and
> 
> > [jump back to probed address]
> 
> jump to after whatever instructions were clobbered by the JMP.

Right!

> > Note that there is a limitation that if it is optiomized probe, user
> > handler can not change regs->ip. (we can not use "ret" after executed
> > a copied instruction, which must run on same stack)
> 
> Changing regs->ip in this case is going to be massively dodgy indeed :-)
> But so would changing much else; changing stack layout would also be
> somewhat tricky.

Yes, so the stack must be same after [restore regs].

Thank you,

Masami Hiramatsu (Google) May 11, 2019, 12:56 a.m. UTC | #27

On Fri, 10 May 2019 14:40:54 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, May 10, 2019 at 01:58:31PM +0900, Masami Hiramatsu wrote:
> > On Thu, 9 May 2019 19:14:16 +0200
> > Peter Zijlstra <peterz@infradead.org> wrote:
> > > > > --- a/arch/x86/kernel/kprobes/core.c
> > > > > +++ b/arch/x86/kernel/kprobes/core.c
> > > > > @@ -731,29 +731,8 @@ asm(
> > > > >  	".global kretprobe_trampoline\n"
> > > > >  	".type kretprobe_trampoline, @function\n"
> > > > >  	"kretprobe_trampoline:\n"
> 
> > > > Here, we need a gap for storing ret-ip, because kretprobe_trampoline is
> > > > the address which is returned from the target function. We have no 
> > > > "ret-ip" here at this point. So something like
> > > > 
> > > > +	"push $0\n"	/* This is a gap, will be filled with real return address*/
> > > 
> > > The trampoline already provides a gap, trampoline_handler() will need to
> > > use int3_emulate_push() if it wants to inject something on the return
> > > stack.
> > 
> > I guess you mean the int3 case. This trampoline is used as a return destination.
> 
> > When the target function is called, kretprobe interrupts the first instruction,
> > and replace the return address with this trampoline. When a "ret" instruction
> > is done, it returns to this trampoline. Thus the stack frame start with
> > previous context here. As you described above,
> 
> I would prefer to change that to inject an extra return address, instead
> of replacing it. With the new exception stuff we can actually do that.
> 
> So on entry we then go from:
> 
> 	<previous context>
> 	RET-IP
> 
> to
> 
> 	<previous context>
> 	RET-IP
> 	return-trampoline
> 
> So when the function returns, it falls into the trampoline instead.

Is that really possible? On x86-64, most parameters are passed by registers,
but x86-32 (and x86-64 in rare case) some parameters can be passed by stack.
If we change the stack layout in the function prologue, the code in
function body can not access those parameters on stack.

Thank you,

> 
> > > > > +	 * On entry the stack looks like:
> > > > > +	 *
> > > > > +	 *   2*4(%esp) <previous context>
> > > > > +	 *   1*4(%esp) RET-IP
> > > > > +	 *   0*4(%esp) func
> > 
> > From this trampoline call, the stack looks like:
> > 
> > 	 *   1*4(%esp) <previous context>
> > 	 *   0*4(%esp) func
> > 
> > So we need one more push.
> 
> And then the stack looks just right at this point.
> 
> > > > > +	"push trampoline_handler\n"
> > > > > +	"jmp call_to_exception_trampoline\n"
> > > > >  	".size kretprobe_trampoline, .-kretprobe_trampoline\n"
> > > > >  );
> > > > >  NOKPROBE_SYMBOL(kretprobe_trampoline);

Peter Zijlstra May 13, 2019, 8:15 a.m. UTC | #28

On Sat, May 11, 2019 at 09:56:55AM +0900, Masami Hiramatsu wrote:
> On Fri, 10 May 2019 14:40:54 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Fri, May 10, 2019 at 01:58:31PM +0900, Masami Hiramatsu wrote:
> > > On Thu, 9 May 2019 19:14:16 +0200
> > > Peter Zijlstra <peterz@infradead.org> wrote:
> > > > > > --- a/arch/x86/kernel/kprobes/core.c
> > > > > > +++ b/arch/x86/kernel/kprobes/core.c
> > > > > > @@ -731,29 +731,8 @@ asm(
> > > > > >  	".global kretprobe_trampoline\n"
> > > > > >  	".type kretprobe_trampoline, @function\n"
> > > > > >  	"kretprobe_trampoline:\n"
> > 
> > > > > Here, we need a gap for storing ret-ip, because kretprobe_trampoline is
> > > > > the address which is returned from the target function. We have no 
> > > > > "ret-ip" here at this point. So something like
> > > > > 
> > > > > +	"push $0\n"	/* This is a gap, will be filled with real return address*/
> > > > 
> > > > The trampoline already provides a gap, trampoline_handler() will need to
> > > > use int3_emulate_push() if it wants to inject something on the return
> > > > stack.
> > > 
> > > I guess you mean the int3 case. This trampoline is used as a return destination.
> > 
> > > When the target function is called, kretprobe interrupts the first instruction,
> > > and replace the return address with this trampoline. When a "ret" instruction
> > > is done, it returns to this trampoline. Thus the stack frame start with
> > > previous context here. As you described above,
> > 
> > I would prefer to change that to inject an extra return address, instead
> > of replacing it. With the new exception stuff we can actually do that.
> > 
> > So on entry we then go from:
> > 
> > 	<previous context>
> > 	RET-IP
> > 
> > to
> > 
> > 	<previous context>
> > 	RET-IP
> > 	return-trampoline
> > 
> > So when the function returns, it falls into the trampoline instead.
> 
> Is that really possible? On x86-64, most parameters are passed by registers,
> but x86-32 (and x86-64 in rare case) some parameters can be passed by stack.
> If we change the stack layout in the function prologue, the code in
> function body can not access those parameters on stack.

Ooh, I see what you mean... yes that might be trouble indeed. Damn..

[2/4] x86/kprobes: Fix frame pointer annotations

Commit Message

Comments

Patch