| Message ID | 20170316135359.23019-1-tixy@linaro.org (mailing list archive) |
|---|---|
| State | New, archived |
On Thu, Mar 16, 2017 at 01:53:59PM +0000, Jon Medhurst wrote:
> Without this fix, some test cases will generate alignment faults on
> systems where alignment is enforced. Even if the kernel is configured to
> handle these faults in software, triggering them is ugly. It also
> exposes limitations in the fault handling code which doesn't cope with
> writes to the stack. E.g. when handling this instruction
>
>     strd r6, [sp, #-64]!
>
> the fault handling code will write to a stack location below the SP
> value at the point the fault occurred, which coincides with where the
> exception handler has pushed the saved register context. This results in
> corruption of those registers.

The general rule today is that the stack must always be 64-bit aligned,
so an even number of registers must always be pushed to the stack.

> diff --git a/arch/arm/probes/kprobes/test-core.c b/arch/arm/probes/kprobes/test-core.c
> index c893726aa52d..1c98a87786ca 100644
> --- a/arch/arm/probes/kprobes/test-core.c
> +++ b/arch/arm/probes/kprobes/test-core.c
> @@ -977,7 +977,10 @@ static void coverage_end(void)
>  void __naked __kprobes_test_case_start(void)
>  {
>  	__asm__ __volatile__ (
> -	"stmdb	sp!, {r4-r11}	\n\t"
> +	"mov	r2, sp		\n\t"
> +	"bic	r3, r2, #7	\n\t"
> +	"mov	sp, r3		\n\t"
> +	"stmdb	sp!, {r2-r11}	\n\t"

I'm not sure this is where the problem is - on entry, the stack should
be 64-bit aligned. You're pushing an even number of registers onto the
stack, so it should remain 64-bit aligned.

>  	"sub	sp, sp, #"__stringify(TEST_MEMORY_SIZE)"\n\t"

This looks to be 256 bytes, so that should be fine.

I think the real question is... how is the stack becoming misaligned?
On Fri, 2017-03-17 at 12:10 +0000, Russell King - ARM Linux wrote:
> On Thu, Mar 16, 2017 at 01:53:59PM +0000, Jon Medhurst wrote:
> > Without this fix, some test cases will generate alignment faults on
> > systems where alignment is enforced. Even if the kernel is configured to
> > handle these faults in software, triggering them is ugly. It also
> > exposes limitations in the fault handling code which doesn't cope with
> > writes to the stack. E.g. when handling this instruction
> >
> >     strd r6, [sp, #-64]!
> >
> > the fault handling code will write to a stack location below the SP
> > value at the point the fault occurred, which coincides with where the
> > exception handler has pushed the saved register context. This results in
> > corruption of those registers.
>
> The general rule today is that the stack must always be 64-bit aligned,
> so an even number of registers must always be pushed to the stack.
>
> > diff --git a/arch/arm/probes/kprobes/test-core.c b/arch/arm/probes/kprobes/test-core.c
> > index c893726aa52d..1c98a87786ca 100644
> > --- a/arch/arm/probes/kprobes/test-core.c
> > +++ b/arch/arm/probes/kprobes/test-core.c
> > @@ -977,7 +977,10 @@ static void coverage_end(void)
> >  void __naked __kprobes_test_case_start(void)
> >  {
> >  	__asm__ __volatile__ (
> > -	"stmdb	sp!, {r4-r11}	\n\t"
> > +	"mov	r2, sp		\n\t"
> > +	"bic	r3, r2, #7	\n\t"
> > +	"mov	sp, r3		\n\t"
> > +	"stmdb	sp!, {r2-r11}	\n\t"
>
> I'm not sure this is where the problem is - on entry, the stack
> should be 64-bit aligned

It isn't, because GCC turns code like this

void foo(void)
{
	asm volatile("bl __kprobes_test_case_start"
		: : : "r0", "r1", "r2", "r3", "ip", "lr", "memory", "cc");
}

into this...

8010e4ac <foo>:
8010e4ac:	e52de004	push	{lr}		; (str lr, [sp, #-4]!)
8010e4b0:	eb002c99	bl	8011971c <__kprobes_test_case_start>
8010e4b4:	e49df004	pop	{pc}		; (ldr pc, [sp], #4)

Perhaps we need a way of telling GCC we are using the stack but I've not
managed to spot a way of doing that.
On Fri, Mar 17, 2017 at 12:59:02PM +0000, Jon Medhurst (Tixy) wrote:
> On Fri, 2017-03-17 at 12:10 +0000, Russell King - ARM Linux wrote:
> > On Thu, Mar 16, 2017 at 01:53:59PM +0000, Jon Medhurst wrote:
> > > Without this fix, some test cases will generate alignment faults on
> > > systems where alignment is enforced. Even if the kernel is configured to
> > > handle these faults in software, triggering them is ugly. It also
> > > exposes limitations in the fault handling code which doesn't cope with
> > > writes to the stack. E.g. when handling this instruction
> > >
> > >     strd r6, [sp, #-64]!
> > >
> > > the fault handling code will write to a stack location below the SP
> > > value at the point the fault occurred, which coincides with where the
> > > exception handler has pushed the saved register context. This results in
> > > corruption of those registers.
> >
> > The general rule today is that the stack must always be 64-bit aligned,
> > so an even number of registers must always be pushed to the stack.
> >
> > > diff --git a/arch/arm/probes/kprobes/test-core.c b/arch/arm/probes/kprobes/test-core.c
> > > index c893726aa52d..1c98a87786ca 100644
> > > --- a/arch/arm/probes/kprobes/test-core.c
> > > +++ b/arch/arm/probes/kprobes/test-core.c
> > > @@ -977,7 +977,10 @@ static void coverage_end(void)
> > >  void __naked __kprobes_test_case_start(void)
> > >  {
> > >  	__asm__ __volatile__ (
> > > -	"stmdb	sp!, {r4-r11}	\n\t"
> > > +	"mov	r2, sp		\n\t"
> > > +	"bic	r3, r2, #7	\n\t"
> > > +	"mov	sp, r3		\n\t"
> > > +	"stmdb	sp!, {r2-r11}	\n\t"
> >
> > I'm not sure this is where the problem is - on entry, the stack
> > should be 64-bit aligned
>
> It isn't, because GCC turns code like this
>
> void foo(void)
> {
> 	asm volatile("bl __kprobes_test_case_start"
> 		: : : "r0", "r1", "r2", "r3", "ip", "lr", "memory", "cc");
> }
>
> into this...
>
> 8010e4ac <foo>:
> 8010e4ac:	e52de004	push	{lr}		; (str lr, [sp, #-4]!)
> 8010e4b0:	eb002c99	bl	8011971c <__kprobes_test_case_start>
> 8010e4b4:	e49df004	pop	{pc}		; (ldr pc, [sp], #4)
>
> Perhaps we need a way of telling GCC we are using the stack but I've not
> managed to spot a way of doing that.

Using which compiler options?
On Fri, 2017-03-17 at 14:06 +0000, Russell King - ARM Linux wrote:
> On Fri, Mar 17, 2017 at 12:59:02PM +0000, Jon Medhurst (Tixy) wrote:
> > [...]
> > It isn't, because GCC turns code like this
> >
> > void foo(void)
> > {
> > 	asm volatile("bl __kprobes_test_case_start"
> > 		: : : "r0", "r1", "r2", "r3", "ip", "lr", "memory", "cc");
> > }
> >
> > into this...
> >
> > 8010e4ac <foo>:
> > 8010e4ac:	e52de004	push	{lr}		; (str lr, [sp, #-4]!)
> > 8010e4b0:	eb002c99	bl	8011971c <__kprobes_test_case_start>
> > 8010e4b4:	e49df004	pop	{pc}		; (ldr pc, [sp], #4)
> >
> > Perhaps we need a way of telling GCC we are using the stack but I've not
> > managed to spot a way of doing that.
>
> Using which compiler options?

The ones the Linux makefile picks when building with vexpress_defconfig.
I hacked a kernel file to build the above example but the behaviour is
what I observed with the real kprobes code. I've pasted the command line
produced by building with V=1 at the end of this email.

One thing I've noticed playing around just now is that if I add "sp" to
the clobber list, and use a newer GCC (gcc-linaro-6.2.1-2016.11) then it
does align the stack correctly. "sp" makes no difference with GCC 5.3 or
4.8.

That command line ...
/data/arm32/aarch32-toolchain/gcc-linaro-5.3.1-2016.05-x86_64_arm-linux-gnueabihf/bin/arm-linux-gnueabihf-gcc -Wp,-MD,arch/arm/kernel/.patch.o.d -nostdinc -isystem /data/arm32/aarch32-toolchain/gcc-linaro-5.3.1-2016.05-x86_64_arm-linux-gnueabihf/bin/../lib/gcc/arm-linux-gnueabihf/5.3.1/include -I./arch/arm/include -I./arch/arm/include/generated/uapi -I./arch/arm/include/generated -I./include -I./arch/arm/include/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h -D__KERNEL__ -mlittle-endian -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -std=gnu89 -fno-PIE -fno-dwarf2-cfi-asm -fno-ipa-sra -mabi=aapcs-linux -mno-thumb-interwork -mfpu=vfp -funwind-tables -marm -D__LINUX_ARM_ARCH__=7 -march=armv7-a -msoft-float -Uarm -fno-delete-null-pointer-checks -O2 --param=allow-store-data-races=0 -Wframe-larger-than=1024 -fno-stack-protector -Wno-unused-but-set-variable -fomit-frame-pointer -fno-var-tracking-assignments -g -fno-inline-functions-called-once -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fconserve-stack -Werror=implicit-int -Werror=strict-prototypes -Werror=date-time -Werror=incompatible-pointer-types -DCC_HAVE_ASM_GOTO -DKBUILD_BASENAME='"patch"' -DKBUILD_MODNAME='"patch"' -c -o arch/arm/kernel/patch.o arch/arm/kernel/patch.c
On Fri, Mar 17, 2017 at 02:42:09PM +0000, Jon Medhurst (Tixy) wrote:
> On Fri, 2017-03-17 at 14:06 +0000, Russell King - ARM Linux wrote:
> > On Fri, Mar 17, 2017 at 12:59:02PM +0000, Jon Medhurst (Tixy) wrote:
> > > [...]
> > > It isn't, because GCC turns code like this
> > >
> > > void foo(void)
> > > {
> > > 	asm volatile("bl __kprobes_test_case_start"
> > > 		: : : "r0", "r1", "r2", "r3", "ip", "lr", "memory", "cc");
> > > }
> > >
> > > into this...
> > >
> > > 8010e4ac <foo>:
> > > 8010e4ac:	e52de004	push	{lr}		; (str lr, [sp, #-4]!)
> > > 8010e4b0:	eb002c99	bl	8011971c <__kprobes_test_case_start>
> > > 8010e4b4:	e49df004	pop	{pc}		; (ldr pc, [sp], #4)
> > >
> > > Perhaps we need a way of telling GCC we are using the stack but I've not
> > > managed to spot a way of doing that.
> >
> > Using which compiler options?
>
> The ones the Linux makefile picks when building with vexpress_defconfig.
> I hacked a kernel file to build the above example but the behaviour is
> what I observed with the real kprobes code. I've pasted the command line
> produced by building with V=1 at the end of this email.
>
> One thing I've noticed playing around just now is that if I add "sp" to
> the clobber list, and use a newer GCC (gcc-linaro-6.2.1-2016.11) then it
> does align the stack correctly. "sp" makes no difference with GCC 5.3 or
> 4.8.

Meanwhile the kernel images I have here all have code to align the stack
after pushing registers in functions, except for __csum_ipv6_magic and
__memzero, which are both assembly. That's gcc 4.7.4 building iMX6 only
(which is also armv7-a.)

I suspect we're into compiler behavioural differences, which can't be
relied upon, so I guess we do need to do something in
__kprobes_test_case_start() to work around it.
I'd do:

	mov	ip, sp
	tst	sp, #4
	subeq	sp, sp, #4		@ need to mis-align as we don't save
	stmfd	sp!, {r4 - r11, ip}	@ an even number of registers to end
	sub	sp, sp, #size		@ up with an aligned stack here

and when restoring:

	add	ip, sp, #size
	ldmfd	ip, {r4 - r11, sp}
	bx	r0
On Fri, 2017-03-17 at 15:05 +0000, Russell King - ARM Linux wrote:

[...]

> I suspect we're into compiler behavioural differences, which can't be
> relied upon, so I guess we do need to do something in
> __kprobes_test_case_start() to work around it.
>
> I'd do:
>
> 	mov	ip, sp
> 	tst	sp, #4
> 	subeq	sp, sp, #4		@ need to mis-align as we don't save
> 	stmfd	sp!, {r4 - r11, ip}	@ an even number of registers to end
> 	sub	sp, sp, #size		@ up with an aligned stack here
>
> and when restoring:
>
> 	add	ip, sp, #size
> 	ldmfd	ip, {r4 - r11, sp}
> 	bx	r0

Thumb doesn't allow a lot of operations with SP (like that 'tst sp, #4'
or having SP in the reg list for LDM), which is why I ended up producing
slightly convoluted code. Some of these are deprecated for ARM too,
which means they might not be supported on newer architecture versions,
oh joy.
diff --git a/arch/arm/probes/kprobes/test-core.c b/arch/arm/probes/kprobes/test-core.c
index c893726aa52d..1c98a87786ca 100644
--- a/arch/arm/probes/kprobes/test-core.c
+++ b/arch/arm/probes/kprobes/test-core.c
@@ -977,7 +977,10 @@ static void coverage_end(void)
 void __naked __kprobes_test_case_start(void)
 {
 	__asm__ __volatile__ (
-	"stmdb	sp!, {r4-r11}	\n\t"
+	"mov	r2, sp		\n\t"
+	"bic	r3, r2, #7	\n\t"
+	"mov	sp, r3		\n\t"
+	"stmdb	sp!, {r2-r11}	\n\t"
 	"sub	sp, sp, #"__stringify(TEST_MEMORY_SIZE)"\n\t"
 	"bic	r0, lr, #1  @ r0 = inline data	\n\t"
 	"mov	r1, sp	\n\t"
@@ -997,7 +1000,8 @@ void __naked __kprobes_test_case_end_32(void)
 	"movne	pc, r0	\n\t"
 	"mov	r0, r4	\n\t"
 	"add	sp, sp, #"__stringify(TEST_MEMORY_SIZE)"\n\t"
-	"ldmia	sp!, {r4-r11}	\n\t"
+	"ldmia	sp!, {r2-r11}	\n\t"
+	"mov	sp, r2	\n\t"
 	"mov	pc, r0	\n\t"
 	);
 }
@@ -1013,7 +1017,8 @@ void __naked __kprobes_test_case_end_16(void)
 	"bxne	r0	\n\t"
 	"mov	r0, r4	\n\t"
 	"add	sp, sp, #"__stringify(TEST_MEMORY_SIZE)"\n\t"
-	"ldmia	sp!, {r4-r11}	\n\t"
+	"ldmia	sp!, {r2-r11}	\n\t"
+	"mov	sp, r2	\n\t"
 	"bx	r0	\n\t"
 	);
 }
kprobes test cases need to have a stack that is aligned to an 8-byte
boundary because they call other functions (and the ARM ABI mandates
that alignment) and because test cases include 64-bit accesses to the
stack. Unfortunately, GCC doesn't ensure this alignment for inline
assembler and for the code in question seems to always misalign it by
pushing just the LR register onto the stack. We therefore need to
explicitly perform stack alignment at the start of each test case.

Without this fix, some test cases will generate alignment faults on
systems where alignment is enforced. Even if the kernel is configured
to handle these faults in software, triggering them is ugly. It also
exposes limitations in the fault handling code which doesn't cope with
writes to the stack. E.g. when handling this instruction

    strd r6, [sp, #-64]!

the fault handling code will write to a stack location below the SP
value at the point the fault occurred, which coincides with where the
exception handler has pushed the saved register context. This results
in corruption of those registers.

Signed-off-by: Jon Medhurst <tixy@linaro.org>
---
I'm assuming the fact the alignment exception handler doesn't cope with
instructions that push things to the stack isn't a problem that we need
to be concerned about, given that compiler generated code and
handwritten assembler shouldn't trigger this unless it's buggy?

Russell, this is the last of several issues [1] [2] I found when testing
Masami Hiramatsu's kprobe changes [3]. That is a total of 4 kprobes
patches and 3 fixes around code patching. Assuming these are acceptable
I can create a branch and a pull request, or feed them into the patch
tracker, let me know.
[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-March/494365.html
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-March/494370.html
[3] https://lkml.org/lkml/2017/2/14/709

 arch/arm/probes/kprobes/test-core.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)