Message ID | 20201120230211.584929-1-pcc@google.com (mailing list archive)
---|---
State | New, archived
Series | [v2] kasan: arm64: support specialized outlined tag mismatch checks
On Sat, Nov 21, 2020 at 12:02 AM Peter Collingbourne <pcc@google.com> wrote: > > By using outlined checks we can achieve a significant code size > improvement by moving the tag-based ASAN checks into separate > functions. Unlike the existing CONFIG_KASAN_OUTLINE mode these > functions have a custom calling convention that preserves most > registers and is specialized to the register containing the address > and the type of access, and as a result we can eliminate the code > size and performance overhead of a standard calling convention such > as AAPCS for these functions. > > This change depends on a separate series of changes to Clang [1] to > support outlined checks in the kernel, although the change works fine > without them (we just don't get outlined checks). This is because the > flag -mllvm -hwasan-inline-all-checks=0 has no effect until the Clang > changes land. The flag was introduced in the Clang 9.0 timeframe as > part of the support for outlined checks in userspace and because our > minimum Clang version is 10.0 we can pass it unconditionally. > > Outlined checks require a new runtime function with a custom calling > convention. Add this function to arch/arm64/lib. > > I measured the code size of defconfig + tag-based KASAN, as well > as boot time (i.e. time to init launch) on a DragonBoard 845c with > an Android arm64 GKI kernel. The results are below: > > code size boot time > CONFIG_KASAN_INLINE=y before 92824064 6.18s > CONFIG_KASAN_INLINE=y after 38822400 6.65s > CONFIG_KASAN_OUTLINE=y 39215616 11.48s > > We can see straight away that specialized outlined checks beat the > existing CONFIG_KASAN_OUTLINE=y on both code size and boot time > for tag-based ASAN. > > As for the comparison between CONFIG_KASAN_INLINE=y before and after > we saw similar performance numbers in userspace [2] and decided > that since the performance overhead is minimal compared to the > overhead of tag-based ASAN itself as well as compared to the code > size improvements we would just replace the inlined checks with the > specialized outlined checks without the option to select between them, > and that is what I have implemented in this patch. But we may make a > different decision for the kernel such as having CONFIG_KASAN_OUTLINE=y > turn on specialized outlined checks if Clang is new enough. 
> > Signed-off-by: Peter Collingbourne <pcc@google.com> > Link: https://linux-review.googlesource.com/id/I1a30036c70ab3c3ee78d75ed9b87ef7cdc3fdb76 > Link: [1] https://reviews.llvm.org/D90426 > Link: [2] https://reviews.llvm.org/D56954 > --- > v2: > - use calculations in the stack spills and restores > - improve the comment at the top of the function > - add a BTI instruction > > arch/arm64/include/asm/asm-prototypes.h | 6 +++ > arch/arm64/include/asm/module.lds.h | 17 ++++++- > arch/arm64/lib/Makefile | 2 + > arch/arm64/lib/kasan_sw_tags.S | 60 +++++++++++++++++++++++++ > mm/kasan/tags.c | 7 +++ > scripts/Makefile.kasan | 1 + > 6 files changed, 91 insertions(+), 2 deletions(-) > create mode 100644 arch/arm64/lib/kasan_sw_tags.S > > diff --git a/arch/arm64/include/asm/asm-prototypes.h b/arch/arm64/include/asm/asm-prototypes.h > index 1c9a3a0c5fa5..ec1d9655f885 100644 > --- a/arch/arm64/include/asm/asm-prototypes.h > +++ b/arch/arm64/include/asm/asm-prototypes.h > @@ -23,4 +23,10 @@ long long __ashlti3(long long a, int b); > long long __ashrti3(long long a, int b); > long long __lshrti3(long long a, int b); > > +/* > + * This function uses a custom calling convention and cannot be called from C so > + * this prototype is not entirely accurate. > + */ > +void __hwasan_tag_mismatch(unsigned long addr, unsigned long access_info); > + > #endif /* __ASM_PROTOTYPES_H */ > diff --git a/arch/arm64/include/asm/module.lds.h b/arch/arm64/include/asm/module.lds.h > index 691f15af788e..4a6d717f75f3 100644 > --- a/arch/arm64/include/asm/module.lds.h > +++ b/arch/arm64/include/asm/module.lds.h > @@ -1,7 +1,20 @@ > -#ifdef CONFIG_ARM64_MODULE_PLTS > SECTIONS { > +#ifdef CONFIG_ARM64_MODULE_PLTS > .plt (NOLOAD) : { BYTE(0) } > .init.plt (NOLOAD) : { BYTE(0) } > .text.ftrace_trampoline (NOLOAD) : { BYTE(0) } > -} > #endif > + > +#ifdef CONFIG_KASAN_SW_TAGS > + /* > + * Outlined checks go into comdat-deduplicated sections named .text.hot. > + * Because they are in comdats they are not combined by the linker and > + * we otherwise end up with multiple sections with the same .text.hot > + * name in the .ko file. The kernel module loader warns if it sees > + * multiple sections with the same name so we use this sections > + * directive to force them into a single section and silence the > + * warning. > + */ > + .text.hot : { *(.text.hot) } > +#endif > +} > diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile > index d31e1169d9b8..8e60d76a1b47 100644 > --- a/arch/arm64/lib/Makefile > +++ b/arch/arm64/lib/Makefile > @@ -18,3 +18,5 @@ obj-$(CONFIG_CRC32) += crc32.o > obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o > > obj-$(CONFIG_ARM64_MTE) += mte.o > + > +obj-$(CONFIG_KASAN_SW_TAGS) += kasan_sw_tags.o > diff --git a/arch/arm64/lib/kasan_sw_tags.S b/arch/arm64/lib/kasan_sw_tags.S > new file mode 100644 > index 000000000000..5c179b32af71 > --- /dev/null > +++ b/arch/arm64/lib/kasan_sw_tags.S > @@ -0,0 +1,60 @@ > +/* SPDX-License-Identifier: GPL-2.0-only */ > +/* > + * Copyright (C) 2020 Google LLC > + */ > + > +#include <linux/linkage.h> > +#include <asm/assembler.h> > + > +/* > + * Report a tag mismatch detected by tag-based KASAN. > + * > + * This function has a custom calling convention in order to minimize the sizes > + * of the compiler-generated thunks that call it. All registers except for x16 > + * and x17 must be preserved. This includes x0 and x1 which are used by the > + * caller to pass arguments. 
In order to allow these registers to be restored > + * the caller spills x0 and x1 to sp+0 and sp+8. The registers x29 and x30 are > + * spilled to sp+232 and sp+240, and although it is not strictly necessary for > + * the caller to spill them, that is how the ABI for these functions has been > + * defined. The 256 bytes of stack space allocated by the caller must be > + * deallocated on return. > + * > + * This function takes care of transitioning to the standard AAPCS calling > + * convention and calls the C function kasan_tag_mismatch to report the error. > + * > + * Parameters: > + * x0 - the fault address > + * x1 - an encoded description of the faulting access > + */ > +SYM_CODE_START(__hwasan_tag_mismatch) > +#ifdef BTI_C > + BTI_C > +#endif > + add x29, sp, #232 > + stp x2, x3, [sp, #8 * 2] > + stp x4, x5, [sp, #8 * 4] > + stp x6, x7, [sp, #8 * 6] > + stp x8, x9, [sp, #8 * 8] > + stp x10, x11, [sp, #8 * 10] > + stp x12, x13, [sp, #8 * 12] > + stp x14, x15, [sp, #8 * 14] > +#ifndef CONFIG_SHADOW_CALL_STACK > + str x18, [sp, #8 * 18] > +#endif > + mov x2, x30 > + bl kasan_tag_mismatch > + ldp x29, x30, [sp, #8 * 29] > +#ifndef CONFIG_SHADOW_CALL_STACK > + ldr x18, [sp, #8 * 18] > +#endif > + ldp x14, x15, [sp, #8 * 14] > + ldp x12, x13, [sp, #8 * 12] > + ldp x10, x11, [sp, #8 * 10] > + ldp x8, x9, [sp, #8 * 8] > + ldp x6, x7, [sp, #8 * 6] > + ldp x4, x5, [sp, #8 * 4] > + ldp x2, x3, [sp, #8 * 2] > + ldp x0, x1, [sp], #256 > + ret > +SYM_CODE_END(__hwasan_tag_mismatch) > +EXPORT_SYMBOL(__hwasan_tag_mismatch) > diff --git a/mm/kasan/tags.c b/mm/kasan/tags.c > index e02a36a51f42..d00613956c79 100644 > --- a/mm/kasan/tags.c > +++ b/mm/kasan/tags.c > @@ -198,3 +198,10 @@ struct kasan_track *kasan_get_free_track(struct kmem_cache *cache, > > return &alloc_meta->free_track[i]; > } > + > +void kasan_tag_mismatch(unsigned long addr, unsigned long access_info, > + unsigned long ret_ip) > +{ > + kasan_report(addr, 1 << (access_info & 0xf), access_info & 0x10, > + ret_ip); > +} > diff --git a/scripts/Makefile.kasan b/scripts/Makefile.kasan > index 1e000cc2e7b4..1f2cccc264d8 100644 > --- a/scripts/Makefile.kasan > +++ b/scripts/Makefile.kasan > @@ -44,6 +44,7 @@ endif > CFLAGS_KASAN := -fsanitize=kernel-hwaddress \ > -mllvm -hwasan-instrument-stack=$(CONFIG_KASAN_STACK) \ > -mllvm -hwasan-use-short-granules=0 \ > + -mllvm -hwasan-inline-all-checks=0 \ > $(instrumentation_flags) > > endif # CONFIG_KASAN_SW_TAGS > -- > 2.29.2.454.gaff20da3a2-goog > Acked-by: Andrey Konovalov <andreyknvl@google.com>
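For reference, the access_info value that the outlined checks pass in x1, and that the new kasan_tag_mismatch() handler above unpacks, encodes the access in its low bits: bits [3:0] hold log2 of the access size and bit 4 is set for writes, which is what the expressions 1 << (access_info & 0xf) and access_info & 0x10 in the patch rely on. A minimal sketch of that decoding in C; the struct and function names here are illustrative and not part of the kernel tree:

#include <linux/types.h>

struct decoded_access {
	size_t size;		/* access size in bytes */
	bool is_write;		/* true for a store, false for a load */
};

static struct decoded_access decode_access_info(unsigned long access_info)
{
	struct decoded_access d;

	d.size = 1UL << (access_info & 0xf);	/* bits [3:0]: log2 of the access size */
	d.is_write = access_info & 0x10;	/* bit 4: write flag */
	return d;
}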
Hi Peter, Thanks for bearing with me on the ABI bits. My main concerns now is making that clear, and I have some concrete suggestions below. As this depends on your other patches to the stack trace code, could you please make that dependency more explicit (e.g. fold the two into a single series)? That way we can avoid accidental breakage. On Fri, Nov 20, 2020 at 03:02:11PM -0800, Peter Collingbourne wrote: > By using outlined checks we can achieve a significant code size > improvement by moving the tag-based ASAN checks into separate > functions. Unlike the existing CONFIG_KASAN_OUTLINE mode these > functions have a custom calling convention that preserves most > registers and is specialized to the register containing the address > and the type of access, and as a result we can eliminate the code > size and performance overhead of a standard calling convention such > as AAPCS for these functions. > > This change depends on a separate series of changes to Clang [1] to > support outlined checks in the kernel, although the change works fine > without them (we just don't get outlined checks). This is because the > flag -mllvm -hwasan-inline-all-checks=0 has no effect until the Clang > changes land. The flag was introduced in the Clang 9.0 timeframe as > part of the support for outlined checks in userspace and because our > minimum Clang version is 10.0 we can pass it unconditionally. > > Outlined checks require a new runtime function with a custom calling > convention. Add this function to arch/arm64/lib. > > I measured the code size of defconfig + tag-based KASAN, as well > as boot time (i.e. time to init launch) on a DragonBoard 845c with > an Android arm64 GKI kernel. The results are below: > > code size boot time > CONFIG_KASAN_INLINE=y before 92824064 6.18s > CONFIG_KASAN_INLINE=y after 38822400 6.65s > CONFIG_KASAN_OUTLINE=y 39215616 11.48s > > We can see straight away that specialized outlined checks beat the > existing CONFIG_KASAN_OUTLINE=y on both code size and boot time > for tag-based ASAN. > > As for the comparison between CONFIG_KASAN_INLINE=y before and after > we saw similar performance numbers in userspace [2] and decided > that since the performance overhead is minimal compared to the > overhead of tag-based ASAN itself as well as compared to the code > size improvements we would just replace the inlined checks with the > specialized outlined checks without the option to select between them, > and that is what I have implemented in this patch. But we may make a > different decision for the kernel such as having CONFIG_KASAN_OUTLINE=y > turn on specialized outlined checks if Clang is new enough. > > Signed-off-by: Peter Collingbourne <pcc@google.com> > Link: https://linux-review.googlesource.com/id/I1a30036c70ab3c3ee78d75ed9b87ef7cdc3fdb76 > Link: [1] https://reviews.llvm.org/D90426 > Link: [2] https://reviews.llvm.org/D56954 > --- > v2: > - use calculations in the stack spills and restores > - improve the comment at the top of the function > - add a BTI instruction > +/* > + * Report a tag mismatch detected by tag-based KASAN. > + * > + * This function has a custom calling convention in order to minimize the sizes > + * of the compiler-generated thunks that call it. All registers except for x16 > + * and x17 must be preserved. This includes x0 and x1 which are used by the > + * caller to pass arguments. In order to allow these registers to be restored > + * the caller spills x0 and x1 to sp+0 and sp+8. 
The registers x29 and x30 are > + * spilled to sp+232 and sp+240, and although it is not strictly necessary for > + * the caller to spill them, that is how the ABI for these functions has been > + * defined. The 256 bytes of stack space allocated by the caller must be > + * deallocated on return. > + * > + * This function takes care of transitioning to the standard AAPCS calling > + * convention and calls the C function kasan_tag_mismatch to report the error. > + * > + * Parameters: > + * x0 - the fault address > + * x1 - an encoded description of the faulting access > + */ To make this more explicit, would you be happy with the below? /* * Report a tag mismatch detected by tag-based KASAN. * * A compiler-generated thunk calls this with a non-AAPCS calling * convention. Upon entry to this function, registers are as follows: * * x0: fault address (see below for restore) * x1: fault description (see below for restore) * x2 to x15: callee-saved * x16 to x17: safe to clobber * x18 to x30: callee-saved * sp: pre-decremented by 256 bytes (see below for restore) * * The caller has decremented the SP by 256 bytes, and created a * structure on the stack as follows: * * sp + 0..15: x0 and x1 to be restored * sp + 16..231: free for use * sp + 232..247: x29 and x30 (same as in GPRs) * sp + 248..255: free for use * * Note that this is not a struct pt_regs. * * To call a regular AAPCS function we must save x2 to x15 (which we can * store in the gaps), and create a frame record (for which we can use * x29 and x30 spilled by the caller as those match the GPRs). * * The caller expects x0 and x1 to be restored from the structure, and * for the structure to be removed from the stack (i.e. the SP must be * incremented by 256 prior to return). */ > +SYM_CODE_START(__hwasan_tag_mismatch) > +#ifdef BTI_C > + BTI_C > +#endif > + add x29, sp, #232 > + stp x2, x3, [sp, #8 * 2] > + stp x4, x5, [sp, #8 * 4] > + stp x6, x7, [sp, #8 * 6] > + stp x8, x9, [sp, #8 * 8] > + stp x10, x11, [sp, #8 * 10] > + stp x12, x13, [sp, #8 * 12] > + stp x14, x15, [sp, #8 * 14] > +#ifndef CONFIG_SHADOW_CALL_STACK > + str x18, [sp, #8 * 18] > +#endif Can we please add a linespace here... > + mov x2, x30 > + bl kasan_tag_mismatch ... and one here? That'll clearly separate the save/call/restore sequences. > + ldp x29, x30, [sp, #8 * 29] > +#ifndef CONFIG_SHADOW_CALL_STACK > + ldr x18, [sp, #8 * 18] > +#endif > + ldp x14, x15, [sp, #8 * 14] > + ldp x12, x13, [sp, #8 * 12] > + ldp x10, x11, [sp, #8 * 10] > + ldp x8, x9, [sp, #8 * 8] > + ldp x6, x7, [sp, #8 * 6] > + ldp x4, x5, [sp, #8 * 4] > + ldp x2, x3, [sp, #8 * 2] > + ldp x0, x1, [sp], #256 To match what we do elsewhere, please put the restore into ascending order, restoring x29 and x30 last. That'll match our other trampolines, is more forgiving for CPUs that only prefetch forwards, and it makes it easier to compare the save and restore sequences line-by-line. Then we can have a separate: /* remove the structure from the stack */ add sp, sp, #256 ... which is easier to match up with the calling convention description. Thanks, Mark. 
> + ret > +SYM_CODE_END(__hwasan_tag_mismatch) > +EXPORT_SYMBOL(__hwasan_tag_mismatch) > diff --git a/mm/kasan/tags.c b/mm/kasan/tags.c > index e02a36a51f42..d00613956c79 100644 > --- a/mm/kasan/tags.c > +++ b/mm/kasan/tags.c > @@ -198,3 +198,10 @@ struct kasan_track *kasan_get_free_track(struct kmem_cache *cache, > > return &alloc_meta->free_track[i]; > } > + > +void kasan_tag_mismatch(unsigned long addr, unsigned long access_info, > + unsigned long ret_ip) > +{ > + kasan_report(addr, 1 << (access_info & 0xf), access_info & 0x10, > + ret_ip); > +} > diff --git a/scripts/Makefile.kasan b/scripts/Makefile.kasan > index 1e000cc2e7b4..1f2cccc264d8 100644 > --- a/scripts/Makefile.kasan > +++ b/scripts/Makefile.kasan > @@ -44,6 +44,7 @@ endif > CFLAGS_KASAN := -fsanitize=kernel-hwaddress \ > -mllvm -hwasan-instrument-stack=$(CONFIG_KASAN_STACK) \ > -mllvm -hwasan-use-short-granules=0 \ > + -mllvm -hwasan-inline-all-checks=0 \ > $(instrumentation_flags) > > endif # CONFIG_KASAN_SW_TAGS > -- > 2.29.2.454.gaff20da3a2-goog >
Hi Mark, On Tue, Nov 24, 2020 at 1:18 PM Mark Rutland <mark.rutland@arm.com> wrote: > > > Hi Peter, > > Thanks for bearing with me on the ABI bits. My main concerns now is > making that clear, and I have some concrete suggestions below. > > As this depends on your other patches to the stack trace code, could you > please make that dependency more explicit (e.g. fold the two into a > single series)? That way we can avoid accidental breakage. Sure, done in v3. > On Fri, Nov 20, 2020 at 03:02:11PM -0800, Peter Collingbourne wrote: > > By using outlined checks we can achieve a significant code size > > improvement by moving the tag-based ASAN checks into separate > > functions. Unlike the existing CONFIG_KASAN_OUTLINE mode these > > functions have a custom calling convention that preserves most > > registers and is specialized to the register containing the address > > and the type of access, and as a result we can eliminate the code > > size and performance overhead of a standard calling convention such > > as AAPCS for these functions. > > > > This change depends on a separate series of changes to Clang [1] to > > support outlined checks in the kernel, although the change works fine > > without them (we just don't get outlined checks). This is because the > > flag -mllvm -hwasan-inline-all-checks=0 has no effect until the Clang > > changes land. The flag was introduced in the Clang 9.0 timeframe as > > part of the support for outlined checks in userspace and because our > > minimum Clang version is 10.0 we can pass it unconditionally. > > > > Outlined checks require a new runtime function with a custom calling > > convention. Add this function to arch/arm64/lib. > > > > I measured the code size of defconfig + tag-based KASAN, as well > > as boot time (i.e. time to init launch) on a DragonBoard 845c with > > an Android arm64 GKI kernel. The results are below: > > > > code size boot time > > CONFIG_KASAN_INLINE=y before 92824064 6.18s > > CONFIG_KASAN_INLINE=y after 38822400 6.65s > > CONFIG_KASAN_OUTLINE=y 39215616 11.48s > > > > We can see straight away that specialized outlined checks beat the > > existing CONFIG_KASAN_OUTLINE=y on both code size and boot time > > for tag-based ASAN. > > > > As for the comparison between CONFIG_KASAN_INLINE=y before and after > > we saw similar performance numbers in userspace [2] and decided > > that since the performance overhead is minimal compared to the > > overhead of tag-based ASAN itself as well as compared to the code > > size improvements we would just replace the inlined checks with the > > specialized outlined checks without the option to select between them, > > and that is what I have implemented in this patch. But we may make a > > different decision for the kernel such as having CONFIG_KASAN_OUTLINE=y > > turn on specialized outlined checks if Clang is new enough. > > > > Signed-off-by: Peter Collingbourne <pcc@google.com> > > Link: https://linux-review.googlesource.com/id/I1a30036c70ab3c3ee78d75ed9b87ef7cdc3fdb76 > > Link: [1] https://reviews.llvm.org/D90426 > > Link: [2] https://reviews.llvm.org/D56954 > > --- > > v2: > > - use calculations in the stack spills and restores > > - improve the comment at the top of the function > > - add a BTI instruction > > > +/* > > + * Report a tag mismatch detected by tag-based KASAN. > > + * > > + * This function has a custom calling convention in order to minimize the sizes > > + * of the compiler-generated thunks that call it. All registers except for x16 > > + * and x17 must be preserved. 
This includes x0 and x1 which are used by the > > + * caller to pass arguments. In order to allow these registers to be restored > > + * the caller spills x0 and x1 to sp+0 and sp+8. The registers x29 and x30 are > > + * spilled to sp+232 and sp+240, and although it is not strictly necessary for > > + * the caller to spill them, that is how the ABI for these functions has been > > + * defined. The 256 bytes of stack space allocated by the caller must be > > + * deallocated on return. > > + * > > + * This function takes care of transitioning to the standard AAPCS calling > > + * convention and calls the C function kasan_tag_mismatch to report the error. > > + * > > + * Parameters: > > + * x0 - the fault address > > + * x1 - an encoded description of the faulting access > > + */ > > To make this more explicit, would you be happy with the below? > > /* > * Report a tag mismatch detected by tag-based KASAN. > * > * A compiler-generated thunk calls this with a non-AAPCS calling > * convention. Upon entry to this function, registers are as follows: > * > * x0: fault address (see below for restore) > * x1: fault description (see below for restore) > * x2 to x15: callee-saved > * x16 to x17: safe to clobber > * x18 to x30: callee-saved > * sp: pre-decremented by 256 bytes (see below for restore) > * > * The caller has decremented the SP by 256 bytes, and created a > * structure on the stack as follows: > * > * sp + 0..15: x0 and x1 to be restored > * sp + 16..231: free for use > * sp + 232..247: x29 and x30 (same as in GPRs) > * sp + 248..255: free for use > * > * Note that this is not a struct pt_regs. > * > * To call a regular AAPCS function we must save x2 to x15 (which we can > * store in the gaps), and create a frame record (for which we can use > * x29 and x30 spilled by the caller as those match the GPRs). > * > * The caller expects x0 and x1 to be restored from the structure, and > * for the structure to be removed from the stack (i.e. the SP must be > * incremented by 256 prior to return). > */ > > > +SYM_CODE_START(__hwasan_tag_mismatch) > > +#ifdef BTI_C > > + BTI_C > > +#endif > > + add x29, sp, #232 > > + stp x2, x3, [sp, #8 * 2] > > + stp x4, x5, [sp, #8 * 4] > > + stp x6, x7, [sp, #8 * 6] > > + stp x8, x9, [sp, #8 * 8] > > + stp x10, x11, [sp, #8 * 10] > > + stp x12, x13, [sp, #8 * 12] > > + stp x14, x15, [sp, #8 * 14] > > +#ifndef CONFIG_SHADOW_CALL_STACK > > + str x18, [sp, #8 * 18] > > +#endif > > Can we please add a linespace here... > > > + mov x2, x30 > > + bl kasan_tag_mismatch > > ... and one here? That'll clearly separate the save/call/restore > sequences. > > > + ldp x29, x30, [sp, #8 * 29] > > +#ifndef CONFIG_SHADOW_CALL_STACK > > + ldr x18, [sp, #8 * 18] > > +#endif > > + ldp x14, x15, [sp, #8 * 14] > > + ldp x12, x13, [sp, #8 * 12] > > + ldp x10, x11, [sp, #8 * 10] > > + ldp x8, x9, [sp, #8 * 8] > > + ldp x6, x7, [sp, #8 * 6] > > + ldp x4, x5, [sp, #8 * 4] > > + ldp x2, x3, [sp, #8 * 2] > > + ldp x0, x1, [sp], #256 > > To match what we do elsewhere, please put the restore into ascending > order, restoring x29 and x30 last. That'll match our other trampolines, > is more forgiving for CPUs that only prefetch forwards, and it makes it > easier to compare the save and restore sequences line-by-line. > > Then we can have a separate: > > /* remove the structure from the stack */ > add sp, sp, #256 > > ... which is easier to match up with the calling convention description. > > Thanks, > Mark. Thanks for these suggestions. 
They all look good to me, so I've adopted them as-is in v3.

Peter
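To make the calling-convention discussion above a little more concrete, here is a rough sketch of what the slow path of a compiler-generated check thunk has to do to honour this ABI. This is not actual Clang output; the label, the access_info constant and the elided tag-check sequence are purely illustrative, and only the stack layout and the use of x30 follow the contract described in the thread:

__hwasan_check_x0_example:			// illustrative name
	// ... load the shadow tag for x0 and compare it against x0's top byte ...
	b.ne	.Lmismatch
	ret					// tags match: back to the instrumented code

.Lmismatch:
	stp	x0, x1, [sp, #-256]!		// allocate the 256-byte area; x0/x1 land at sp+0/sp+8
	stp	x29, x30, [sp, #232]		// frame record copy at sp+232/sp+240
	mov	x1, #0x02			// illustrative access_info value
	b	__hwasan_tag_mismatch		// tail branch: x30 still holds the instrumented
						// caller's return address, so the runtime can report
						// it as ret_ip and return straight there after
						// restoring registers and adding 256 to sp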
diff --git a/arch/arm64/include/asm/asm-prototypes.h b/arch/arm64/include/asm/asm-prototypes.h
index 1c9a3a0c5fa5..ec1d9655f885 100644
--- a/arch/arm64/include/asm/asm-prototypes.h
+++ b/arch/arm64/include/asm/asm-prototypes.h
@@ -23,4 +23,10 @@ long long __ashlti3(long long a, int b);
 long long __ashrti3(long long a, int b);
 long long __lshrti3(long long a, int b);
 
+/*
+ * This function uses a custom calling convention and cannot be called from C so
+ * this prototype is not entirely accurate.
+ */
+void __hwasan_tag_mismatch(unsigned long addr, unsigned long access_info);
+
 #endif /* __ASM_PROTOTYPES_H */
diff --git a/arch/arm64/include/asm/module.lds.h b/arch/arm64/include/asm/module.lds.h
index 691f15af788e..4a6d717f75f3 100644
--- a/arch/arm64/include/asm/module.lds.h
+++ b/arch/arm64/include/asm/module.lds.h
@@ -1,7 +1,20 @@
-#ifdef CONFIG_ARM64_MODULE_PLTS
 SECTIONS {
+#ifdef CONFIG_ARM64_MODULE_PLTS
 	.plt (NOLOAD) : { BYTE(0) }
 	.init.plt (NOLOAD) : { BYTE(0) }
 	.text.ftrace_trampoline (NOLOAD) : { BYTE(0) }
-}
 #endif
+
+#ifdef CONFIG_KASAN_SW_TAGS
+	/*
+	 * Outlined checks go into comdat-deduplicated sections named .text.hot.
+	 * Because they are in comdats they are not combined by the linker and
+	 * we otherwise end up with multiple sections with the same .text.hot
+	 * name in the .ko file. The kernel module loader warns if it sees
+	 * multiple sections with the same name so we use this sections
+	 * directive to force them into a single section and silence the
+	 * warning.
+	 */
+	.text.hot : { *(.text.hot) }
+#endif
+}
diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index d31e1169d9b8..8e60d76a1b47 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -18,3 +18,5 @@ obj-$(CONFIG_CRC32) += crc32.o
 obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
 
 obj-$(CONFIG_ARM64_MTE) += mte.o
+
+obj-$(CONFIG_KASAN_SW_TAGS) += kasan_sw_tags.o
diff --git a/arch/arm64/lib/kasan_sw_tags.S b/arch/arm64/lib/kasan_sw_tags.S
new file mode 100644
index 000000000000..5c179b32af71
--- /dev/null
+++ b/arch/arm64/lib/kasan_sw_tags.S
@@ -0,0 +1,60 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2020 Google LLC
+ */
+
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+
+/*
+ * Report a tag mismatch detected by tag-based KASAN.
+ *
+ * This function has a custom calling convention in order to minimize the sizes
+ * of the compiler-generated thunks that call it. All registers except for x16
+ * and x17 must be preserved. This includes x0 and x1 which are used by the
+ * caller to pass arguments. In order to allow these registers to be restored
+ * the caller spills x0 and x1 to sp+0 and sp+8. The registers x29 and x30 are
+ * spilled to sp+232 and sp+240, and although it is not strictly necessary for
+ * the caller to spill them, that is how the ABI for these functions has been
+ * defined. The 256 bytes of stack space allocated by the caller must be
+ * deallocated on return.
+ *
+ * This function takes care of transitioning to the standard AAPCS calling
+ * convention and calls the C function kasan_tag_mismatch to report the error.
+ *
+ * Parameters:
+ *	x0 - the fault address
+ *	x1 - an encoded description of the faulting access
+ */
+SYM_CODE_START(__hwasan_tag_mismatch)
+#ifdef BTI_C
+	BTI_C
+#endif
+	add	x29, sp, #232
+	stp	x2, x3, [sp, #8 * 2]
+	stp	x4, x5, [sp, #8 * 4]
+	stp	x6, x7, [sp, #8 * 6]
+	stp	x8, x9, [sp, #8 * 8]
+	stp	x10, x11, [sp, #8 * 10]
+	stp	x12, x13, [sp, #8 * 12]
+	stp	x14, x15, [sp, #8 * 14]
+#ifndef CONFIG_SHADOW_CALL_STACK
+	str	x18, [sp, #8 * 18]
+#endif
+	mov	x2, x30
+	bl	kasan_tag_mismatch
+	ldp	x29, x30, [sp, #8 * 29]
+#ifndef CONFIG_SHADOW_CALL_STACK
+	ldr	x18, [sp, #8 * 18]
+#endif
+	ldp	x14, x15, [sp, #8 * 14]
+	ldp	x12, x13, [sp, #8 * 12]
+	ldp	x10, x11, [sp, #8 * 10]
+	ldp	x8, x9, [sp, #8 * 8]
+	ldp	x6, x7, [sp, #8 * 6]
+	ldp	x4, x5, [sp, #8 * 4]
+	ldp	x2, x3, [sp, #8 * 2]
+	ldp	x0, x1, [sp], #256
+	ret
+SYM_CODE_END(__hwasan_tag_mismatch)
+EXPORT_SYMBOL(__hwasan_tag_mismatch)
diff --git a/mm/kasan/tags.c b/mm/kasan/tags.c
index e02a36a51f42..d00613956c79 100644
--- a/mm/kasan/tags.c
+++ b/mm/kasan/tags.c
@@ -198,3 +198,10 @@ struct kasan_track *kasan_get_free_track(struct kmem_cache *cache,
 
 	return &alloc_meta->free_track[i];
 }
+
+void kasan_tag_mismatch(unsigned long addr, unsigned long access_info,
+			unsigned long ret_ip)
+{
+	kasan_report(addr, 1 << (access_info & 0xf), access_info & 0x10,
+		     ret_ip);
+}
diff --git a/scripts/Makefile.kasan b/scripts/Makefile.kasan
index 1e000cc2e7b4..1f2cccc264d8 100644
--- a/scripts/Makefile.kasan
+++ b/scripts/Makefile.kasan
@@ -44,6 +44,7 @@ endif
 CFLAGS_KASAN := -fsanitize=kernel-hwaddress \
 		-mllvm -hwasan-instrument-stack=$(CONFIG_KASAN_STACK) \
 		-mllvm -hwasan-use-short-granules=0 \
+		-mllvm -hwasan-inline-all-checks=0 \
 		$(instrumentation_flags)
 
 endif # CONFIG_KASAN_SW_TAGS
By using outlined checks we can achieve a significant code size improvement by
moving the tag-based ASAN checks into separate functions. Unlike the existing
CONFIG_KASAN_OUTLINE mode these functions have a custom calling convention
that preserves most registers and is specialized to the register containing
the address and the type of access, and as a result we can eliminate the code
size and performance overhead of a standard calling convention such as AAPCS
for these functions.

This change depends on a separate series of changes to Clang [1] to support
outlined checks in the kernel, although the change works fine without them (we
just don't get outlined checks). This is because the flag -mllvm
-hwasan-inline-all-checks=0 has no effect until the Clang changes land. The
flag was introduced in the Clang 9.0 timeframe as part of the support for
outlined checks in userspace and because our minimum Clang version is 10.0 we
can pass it unconditionally.

Outlined checks require a new runtime function with a custom calling
convention. Add this function to arch/arm64/lib.

I measured the code size of defconfig + tag-based KASAN, as well as boot time
(i.e. time to init launch) on a DragonBoard 845c with an Android arm64 GKI
kernel. The results are below:

                                 code size    boot time
CONFIG_KASAN_INLINE=y before      92824064        6.18s
CONFIG_KASAN_INLINE=y after       38822400        6.65s
CONFIG_KASAN_OUTLINE=y            39215616       11.48s

We can see straight away that specialized outlined checks beat the existing
CONFIG_KASAN_OUTLINE=y on both code size and boot time for tag-based ASAN.

As for the comparison between CONFIG_KASAN_INLINE=y before and after we saw
similar performance numbers in userspace [2] and decided that since the
performance overhead is minimal compared to the overhead of tag-based ASAN
itself as well as compared to the code size improvements we would just replace
the inlined checks with the specialized outlined checks without the option to
select between them, and that is what I have implemented in this patch. But we
may make a different decision for the kernel such as having
CONFIG_KASAN_OUTLINE=y turn on specialized outlined checks if Clang is new
enough.

Signed-off-by: Peter Collingbourne <pcc@google.com>
Link: https://linux-review.googlesource.com/id/I1a30036c70ab3c3ee78d75ed9b87ef7cdc3fdb76
Link: [1] https://reviews.llvm.org/D90426
Link: [2] https://reviews.llvm.org/D56954
---
v2:
- use calculations in the stack spills and restores
- improve the comment at the top of the function
- add a BTI instruction

 arch/arm64/include/asm/asm-prototypes.h |  6 +++
 arch/arm64/include/asm/module.lds.h     | 17 ++++++-
 arch/arm64/lib/Makefile                 |  2 +
 arch/arm64/lib/kasan_sw_tags.S          | 60 +++++++++++++++++++++++++
 mm/kasan/tags.c                         |  7 +++
 scripts/Makefile.kasan                  |  1 +
 6 files changed, 91 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm64/lib/kasan_sw_tags.S
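As background for the code-size numbers above, the check that is either inlined at every access or outlined into shared check functions is conceptually small: compare the tag in the pointer's top byte against the tag recorded in shadow memory for the granule being accessed. A conceptual sketch in C follows; it is illustrative only, since the real check is emitted by the compiler backend, and details such as the match-all tag shortcut may differ between configurations:

#include <linux/kasan.h>
#include <linux/types.h>

static bool tag_check_passes(const void *addr)
{
	u8 ptr_tag = (unsigned long)addr >> 56;		/* tag lives in the pointer's top byte */
	u8 mem_tag = *(u8 *)kasan_mem_to_shadow(addr);	/* one shadow byte per 16-byte granule */

	return ptr_tag == 0xff || ptr_tag == mem_tag;	/* 0xff is the match-all native tag */
}

Repeating even this handful of instructions inline before every checked load and store is what produces the roughly 93 MB CONFIG_KASAN_INLINE=y image measured above; replacing each inline copy with a single call to a specialized outlined check is where the drop to roughly 39 MB comes from.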