
arm: kprobes: Align stack to 8-bytes in test code

Message ID 20170316135359.23019-1-tixy@linaro.org (mailing list archive)
State New, archived

Commit Message

Jon Medhurst (Tixy) March 16, 2017, 1:53 p.m. UTC
kprobes test cases need to have a stack that is aligned to an 8-byte
boundary because they call other functions (and the ARM ABI mandates
that alignment) and because test cases include 64-bit accesses to the
stack. Unfortunately, GCC doesn't ensure this alignment for inline
assembler, and for the code in question it always misaligns the stack by
pushing just the LR register. We therefore need to explicitly align the
stack at the start of each test case.

Without this fix, some test cases will generate alignment faults on
systems where alignment is enforced. Even if the kernel is configured to
handle these faults in software, triggering them is ugly. It also
exposes limitations in the fault handling code which doesn't cope with
writes to the stack. E.g. when handling this instruction

   strd r6, [sp, #-64]!

the fault handling code will write to a stack location below the SP
value at the point the fault occurred, which coincides with where the
exception handler has pushed the saved register context. This results in
corruption of those registers.

Signed-off-by: Jon Medhurst <tixy@linaro.org>
---

I'm assuming the fact the alignment exception handler doesn't cope with
instructions that push things to the stack isn't a problem that we need
to be concerned about, given that compiler-generated code and handwritten
assembler shouldn't trigger this unless it's buggy?

Russell, this is the last of several issues [1] [2] I found when testing
Masami Hiramatsu's kprobe changes [3]. That is a total of 4 kprobes
patches and 3 fixes around code patching. Assuming these are acceptable
I can create a branch and a pull request, or feed them into the patch
tracker, let me know.

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-March/494365.html
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-March/494370.html
[3] https://lkml.org/lkml/2017/2/14/709


 arch/arm/probes/kprobes/test-core.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

Comments

Russell King (Oracle) March 17, 2017, 12:10 p.m. UTC | #1
On Thu, Mar 16, 2017 at 01:53:59PM +0000, Jon Medhurst wrote:
> Without this fix, some test cases will generate alignment faults on
> systems where alignment is enforced. Even if the kernel is configured to
> handle these faults in software, triggering them is ugly. It also
> exposes limitations in the fault handling code which doesn't cope with
> writes to the stack. E.g. when handling this instruction
> 
>    strd r6, [sp, #-64]!
> 
> the fault handling code will write to a stack location below the SP
> value at the point the fault occurred, which coincides with where the
> exception handler has pushed the saved register context. This results in
> corruption of those registers.

The general rule today is that the stack must always be 64-bit aligned,
so an even number of registers must always be pushed to the stack.

> diff --git a/arch/arm/probes/kprobes/test-core.c b/arch/arm/probes/kprobes/test-core.c
> index c893726aa52d..1c98a87786ca 100644
> --- a/arch/arm/probes/kprobes/test-core.c
> +++ b/arch/arm/probes/kprobes/test-core.c
> @@ -977,7 +977,10 @@ static void coverage_end(void)
>  void __naked __kprobes_test_case_start(void)
>  {
>  	__asm__ __volatile__ (
> -		"stmdb	sp!, {r4-r11}				\n\t"
> +		"mov	r2, sp					\n\t"
> +		"bic	r3, r2, #7				\n\t"
> +		"mov	sp, r3					\n\t"
> +		"stmdb	sp!, {r2-r11}				\n\t"

I'm not sure this is where the problem is - on entry, the stack
should be 64-bit aligned.  You're pushing an even number of registers
onto the stack, so it should remain 64-bit aligned.

>  		"sub	sp, sp, #"__stringify(TEST_MEMORY_SIZE)"\n\t"

This looks to be 256 bytes, so that should be fine.

I think the real question is... how is the stack becoming misaligned?
Jon Medhurst (Tixy) March 17, 2017, 12:59 p.m. UTC | #2
On Fri, 2017-03-17 at 12:10 +0000, Russell King - ARM Linux wrote:
> On Thu, Mar 16, 2017 at 01:53:59PM +0000, Jon Medhurst wrote:
> > Without this fix, some test cases will generate alignment faults on
> > systems where alignment is enforced. Even if the kernel is configured to
> > handle these faults in software, triggering them is ugly. It also
> > exposes limitations in the fault handling code which doesn't cope with
> > writes to the stack. E.g. when handling this instruction
> > 
> >    strd r6, [sp, #-64]!
> > 
> > the fault handling code will write to a stack location below the SP
> > value at the point the fault occurred, which coincides with where the
> > exception handler has pushed the saved register context. This results in
> > corruption of those registers.
> 
> The general rule today is that the stack must always be 64-bit aligned,
> so an even number of registers must always be pushed to the stack.
> 
> > diff --git a/arch/arm/probes/kprobes/test-core.c b/arch/arm/probes/kprobes/test-core.c
> > index c893726aa52d..1c98a87786ca 100644
> > --- a/arch/arm/probes/kprobes/test-core.c
> > +++ b/arch/arm/probes/kprobes/test-core.c
> > @@ -977,7 +977,10 @@ static void coverage_end(void)
> >  void __naked __kprobes_test_case_start(void)
> >  {
> >  	__asm__ __volatile__ (
> > -		"stmdb	sp!, {r4-r11}				\n\t"
> > +		"mov	r2, sp					\n\t"
> > +		"bic	r3, r2, #7				\n\t"
> > +		"mov	sp, r3					\n\t"
> > +		"stmdb	sp!, {r2-r11}				\n\t"
> 
> I'm not sure this is where the problem is - on entry, the stack
> should be 64-bit aligned

It isn't, because GCC turns code like this

void foo(void)
{
        asm volatile("bl __kprobes_test_case_start"
                     : : : "r0", "r1", "r2", "r3", "ip", "lr", "memory", "cc");
}

into this...

8010e4ac <foo>:
8010e4ac:       e52de004        push    {lr}            ; (str lr, [sp, #-4]!)
8010e4b0:       eb002c99        bl      8011971c <__kprobes_test_case_start>
8010e4b4:       e49df004        pop     {pc}            ; (ldr pc, [sp], #4)

Perhaps we need a way of telling GCC we are using the stack but I've not
managed to spot a way of doing that.
Russell King (Oracle) March 17, 2017, 2:06 p.m. UTC | #3
On Fri, Mar 17, 2017 at 12:59:02PM +0000, Jon Medhurst (Tixy) wrote:
> On Fri, 2017-03-17 at 12:10 +0000, Russell King - ARM Linux wrote:
> > On Thu, Mar 16, 2017 at 01:53:59PM +0000, Jon Medhurst wrote:
> > > Without this fix, some test cases will generate alignment faults on
> > > systems where alignment is enforced. Even if the kernel is configured to
> > > handle these faults in software, triggering them is ugly. It also
> > > exposes limitations in the fault handling code which doesn't cope with
> > > writes to the stack. E.g. when handling this instruction
> > > 
> > >    strd r6, [sp, #-64]!
> > > 
> > > the fault handling code will write to a stack location below the SP
> > > value at the point the fault occurred, which coincides with where the
> > > exception handler has pushed the saved register context. This results in
> > > corruption of those registers.
> > 
> > The general rule today is that the stack must always be 64-bit aligned,
> > so an even number of registers must always be pushed to the stack.
> > 
> > > diff --git a/arch/arm/probes/kprobes/test-core.c b/arch/arm/probes/kprobes/test-core.c
> > > index c893726aa52d..1c98a87786ca 100644
> > > --- a/arch/arm/probes/kprobes/test-core.c
> > > +++ b/arch/arm/probes/kprobes/test-core.c
> > > @@ -977,7 +977,10 @@ static void coverage_end(void)
> > >  void __naked __kprobes_test_case_start(void)
> > >  {
> > >  	__asm__ __volatile__ (
> > > -		"stmdb	sp!, {r4-r11}				\n\t"
> > > +		"mov	r2, sp					\n\t"
> > > +		"bic	r3, r2, #7				\n\t"
> > > +		"mov	sp, r3					\n\t"
> > > +		"stmdb	sp!, {r2-r11}				\n\t"
> > 
> > I'm not sure this is where the problem is - on entry, the stack
> > should be 64-bit aligned
> 
> It isn't, because GCC turns code like this
> 
> void foo(void)
> {
>         asm volatile("bl __kprobes_test_case_start"
>                      : : : "r0", "r1", "r2", "r3", "ip", "lr", "memory", "cc");
> }
> 
> into this...
> 
> 8010e4ac <foo>:
> 8010e4ac:       e52de004        push    {lr}            ; (str lr, [sp, #-4]!)
> 8010e4b0:       eb002c99        bl      8011971c <__kprobes_test_case_start>
> 8010e4b4:       e49df004        pop     {pc}            ; (ldr pc, [sp], #4)
> 
> Perhaps we need a way of telling GCC we are using the stack but I've not
> managed to spot a way of doing that.

Using which compiler options?
Jon Medhurst (Tixy) March 17, 2017, 2:42 p.m. UTC | #4
On Fri, 2017-03-17 at 14:06 +0000, Russell King - ARM Linux wrote:
> On Fri, Mar 17, 2017 at 12:59:02PM +0000, Jon Medhurst (Tixy) wrote:
> > 
[...]
> > It isn't, because GCC turns code like this
> > 
> > void foo(void)
> > {
> >         asm volatile("bl __kprobes_test_case_start"
> >                      : : : "r0", "r1", "r2", "r3", "ip", "lr", "memory", "cc");
> > }
> > 
> > into this...
> > 
> > 8010e4ac <foo>:
> > 8010e4ac:       e52de004        push    {lr}            ; (str lr, [sp, #-4]!)
> > 8010e4b0:       eb002c99        bl      8011971c <__kprobes_test_case_start>
> > 8010e4b4:       e49df004        pop     {pc}            ; (ldr pc, [sp], #4)
> > 
> > Perhaps we need a way of telling GCC we are using the stack but I've not
> > managed to spot a way of doing that.
> 
> Using which compiler options?

The ones the Linux makefile picks when building with vexpress_defconfig.
I hacked a kernel file to build the above example but the behaviour is
what I observed with the real kprobes code. I've pasted the command line
produced by building with V=1 at the end of this email.

One thing I've noticed playing around just now is that if I add "sp" to
the clobber list, and use a newer GCC (gcc-linaro-6.2.1-2016.11) then it
does align the stack correctly. "sp" makes no difference with GCC 5.3 or
4.8.

That commandline ...

/data/arm32/aarch32-toolchain/gcc-linaro-5.3.1-2016.05-x86_64_arm-linux-gnueabihf/bin/arm-linux-gnueabihf-gcc
-Wp,-MD,arch/arm/kernel/.patch.o.d  -nostdinc -isystem
/data/arm32/aarch32-toolchain/gcc-linaro-5.3.1-2016.05-x86_64_arm-linux-gnueabihf/bin/../lib/gcc/arm-linux-gnueabihf/5.3.1/include
-I./arch/arm/include -I./arch/arm/include/generated/uapi
-I./arch/arm/include/generated  -I./include -I./arch/arm/include/uapi
-I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h
-D__KERNEL__ -mlittle-endian -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs
-fno-strict-aliasing -fno-common -Werror-implicit-function-declaration
-Wno-format-security -std=gnu89 -fno-PIE -fno-dwarf2-cfi-asm -fno-ipa-sra
-mabi=aapcs-linux -mno-thumb-interwork -mfpu=vfp -funwind-tables -marm
-D__LINUX_ARM_ARCH__=7 -march=armv7-a -msoft-float -Uarm
-fno-delete-null-pointer-checks -O2 --param=allow-store-data-races=0
-Wframe-larger-than=1024 -fno-stack-protector -Wno-unused-but-set-variable
-fomit-frame-pointer -fno-var-tracking-assignments -g
-fno-inline-functions-called-once -Wdeclaration-after-statement
-Wno-pointer-sign -fno-strict-overflow -fconserve-stack -Werror=implicit-int
-Werror=strict-prototypes -Werror=date-time -Werror=incompatible-pointer-types
-DCC_HAVE_ASM_GOTO    -DKBUILD_BASENAME='"patch"'  -DKBUILD_MODNAME='"patch"' -c
-o arch/arm/kernel/patch.o arch/arm/kernel/patch.c
Russell King (Oracle) March 17, 2017, 3:05 p.m. UTC | #5
On Fri, Mar 17, 2017 at 02:42:09PM +0000, Jon Medhurst (Tixy) wrote:
> On Fri, 2017-03-17 at 14:06 +0000, Russell King - ARM Linux wrote:
> > On Fri, Mar 17, 2017 at 12:59:02PM +0000, Jon Medhurst (Tixy) wrote:
> > > 
> [...]
> > > It isn't, because GCC turns code like this
> > > 
> > > void foo(void)
> > > {
> > >         asm volatile("bl __kprobes_test_case_start"
> > >                      : : : "r0", "r1", "r2", "r3", "ip", "lr", "memory", "cc");
> > > }
> > > 
> > > into this...
> > > 
> > > 8010e4ac <foo>:
> > > 8010e4ac:       e52de004        push    {lr}            ; (str lr, [sp, #-4]!)
> > > 8010e4b0:       eb002c99        bl      8011971c <__kprobes_test_case_start>
> > > 8010e4b4:       e49df004        pop     {pc}            ; (ldr pc, [sp], #4)
> > > 
> > > Perhaps we need a way of telling GCC we are using the stack but I've not
> > > managed to spot a way of doing that.
> > 
> > Using which compiler options?
> 
> The ones the Linux makefile picks when building with vexpress_defconfig.
> I hacked a kernel file to build the above example but the behaviour is
> what I observed with the real kprobes code. I've pasted the commanline
> produced by building with V=1 at the end of this email.
> 
> One thing I've noticed playing around just now is that if I add "sp" to
> the clobber list, and use a newer GCC (gcc-linaro-6.2.1-2016.11) then it
> does allign the stack correctly. "sp" makes no difference with GCC 5.3 or
> 4.8.

Meanwhile the kernel images I have here all have code to align the stack
after pushing registers in functions, except for __csum_ipv6_magic and
__memzero, which are both assembly.  That's gcc 4.7.4 building iMX6 only
(which is also armv7-a).

I suspect we're into compiler behavioural differences, which can't be
relied upon, so I guess we do need to do something in
__kprobes_test_case_start() to work around it.

I'd do:

	mov	ip, sp
	tst	sp, #4
	subeq	sp, sp, #4		@ need to mis-align as we don't save
	stmfd	sp!, {r4 - r11, ip}	@ an even number of registers to end
	sub	sp, sp, #size		@ up with an aligned stack here

and when restoring:

	add	ip, sp, #size
	ldmfd	ip, {r4 - r11, sp}
	bx	r0
Jon Medhurst (Tixy) March 17, 2017, 5:50 p.m. UTC | #6
On Fri, 2017-03-17 at 15:05 +0000, Russell King - ARM Linux wrote
[...]
> I suspect we're into compiler behavioural differences, which can't be
> relied upon, so I guess we do need to do something in
> __kprobes_test_case_start() to work around it.
> 
> I'd do:
> 
> 	mov	ip, sp
> 	tst	sp, #4
> 	subeq	sp, sp, #4		@ need to mis-align as we don't save
> 	stmfd	sp!, {r4 - r11, ip}	@ an even number of registers to end
> 	sub	sp, sp, #size		@ up with an aligned stack here
> 
> and when restoring:
> 
> 	add	ip, sp, #size
> 	ldmfd	ip, {r4 - r11, sp}
> 	bx	r0

Thumb doesn't allow a lot of operations with SP (like that 'tst sp, #4'
or having SP in the reg list for LDM) which is why I ended up producing
slightly convoluted code. Some of these are deprecated for ARM too,
which means they might not be supported on newer architecture versions,
oh joy.

Patch

diff --git a/arch/arm/probes/kprobes/test-core.c b/arch/arm/probes/kprobes/test-core.c
index c893726aa52d..1c98a87786ca 100644
--- a/arch/arm/probes/kprobes/test-core.c
+++ b/arch/arm/probes/kprobes/test-core.c
@@ -977,7 +977,10 @@  static void coverage_end(void)
 void __naked __kprobes_test_case_start(void)
 {
 	__asm__ __volatile__ (
-		"stmdb	sp!, {r4-r11}				\n\t"
+		"mov	r2, sp					\n\t"
+		"bic	r3, r2, #7				\n\t"
+		"mov	sp, r3					\n\t"
+		"stmdb	sp!, {r2-r11}				\n\t"
 		"sub	sp, sp, #"__stringify(TEST_MEMORY_SIZE)"\n\t"
 		"bic	r0, lr, #1  @ r0 = inline data		\n\t"
 		"mov	r1, sp					\n\t"
@@ -997,7 +1000,8 @@  void __naked __kprobes_test_case_end_32(void)
 		"movne	pc, r0					\n\t"
 		"mov	r0, r4					\n\t"
 		"add	sp, sp, #"__stringify(TEST_MEMORY_SIZE)"\n\t"
-		"ldmia	sp!, {r4-r11}				\n\t"
+		"ldmia	sp!, {r2-r11}				\n\t"
+		"mov	sp, r2					\n\t"
 		"mov	pc, r0					\n\t"
 	);
 }
@@ -1013,7 +1017,8 @@  void __naked __kprobes_test_case_end_16(void)
 		"bxne	r0					\n\t"
 		"mov	r0, r4					\n\t"
 		"add	sp, sp, #"__stringify(TEST_MEMORY_SIZE)"\n\t"
-		"ldmia	sp!, {r4-r11}				\n\t"
+		"ldmia	sp!, {r2-r11}				\n\t"
+		"mov	sp, r2					\n\t"
 		"bx	r0					\n\t"
 	);
 }