Message ID: 20200622193146.2985288-4-keescook@chromium.org (mailing list archive)
State: New, archived
Series: Optionally randomize kernel stack offset each syscall
On 6/22/20 12:31 PM, Kees Cook wrote: > This provides the ability for architectures to enable kernel stack base > address offset randomization. This feature is controlled by the boot > param "randomize_kstack_offset=on/off", with its default value set by > CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT. > > Co-developed-by: Elena Reshetova <elena.reshetova@intel.com> > Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> > Link: https://lore.kernel.org/r/20190415060918.3766-1-elena.reshetova@intel.com > Signed-off-by: Kees Cook <keescook@chromium.org> > --- > Makefile | 4 ++++ > arch/Kconfig | 23 ++++++++++++++++++ > include/linux/randomize_kstack.h | 40 ++++++++++++++++++++++++++++++++ > init/main.c | 23 ++++++++++++++++++ > 4 files changed, 90 insertions(+) > create mode 100644 include/linux/randomize_kstack.h Hi, Please add documentation for the new kernel boot parameter to Documentation/admin-guide/kernel-parameters.txt. > diff --git a/arch/Kconfig b/arch/Kconfig > index 1ea61290900a..1f52c9cfefca 100644 > --- a/arch/Kconfig > +++ b/arch/Kconfig > @@ -883,6 +883,29 @@ config VMAP_STACK > virtual mappings with real shadow memory, and KASAN_VMALLOC must > be enabled. > > +config HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET > + def_bool n > + help > + An arch should select this symbol if it can support kernel stack > + offset randomization with calls to add_random_kstack_offset() > + during syscall entry and choose_random_kstack_offset() during > + syscall exit. Downgrading of -fstack-protector-strong to > + -fstack-protector should also be applied to the entry code and > + closely examined, as the artificial stack bump looks like an array > + to the compiler, so it will attempt to add canary checks regardless > + of the static branch state. > + > +config RANDOMIZE_KSTACK_OFFSET_DEFAULT > + bool "Randomize kernel stack offset on syscall entry" > + depends on HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET > + help > + The kernel stack offset can be randomized (after pt_regs) by > + roughly 5 bits of entropy, frustrating memory corruption > + attacks that depend on stack address determinism or > + cross-syscall address exposures. This feature is controlled > + by kernel boot param "randomize_kstack_offset=on/off", and this > + config chooses the default boot state. thanks.
On Mon, Jun 22, 2020 at 9:31 PM Kees Cook <keescook@chromium.org> wrote:
> This provides the ability for architectures to enable kernel stack base
> address offset randomization. This feature is controlled by the boot
> param "randomize_kstack_offset=on/off", with its default value set by
> CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT.
[...]
> +#define add_random_kstack_offset() do {				\
> +	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
> +				&randomize_kstack_offset)) {		\
> +		u32 offset = this_cpu_read(kstack_offset);		\
> +		u8 *ptr = __builtin_alloca(offset & 0x3FF);		\
> +		asm volatile("" : "=m"(*ptr));				\
> +	}								\
> +} while (0)

clang generates better code here if the mask is stack-aligned -
otherwise it needs to round the stack pointer / the offset:

$ cat alloca_align.c
#include <alloca.h>
void callee(void);

void alloca_blah(unsigned long rand) {
  asm volatile(""::"r"(alloca(rand & MASK)));
  callee();
}
$ clang -O3 -c -o alloca_align.o alloca_align.c -DMASK=0x3ff
$ objdump -d alloca_align.o
[...]
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	81 e7 ff 03 00 00    	and    $0x3ff,%edi
   a:	83 c7 0f             	add    $0xf,%edi
   d:	83 e7 f0             	and    $0xfffffff0,%edi
  10:	48 89 e0             	mov    %rsp,%rax
  13:	48 29 f8             	sub    %rdi,%rax
  16:	48 89 c4             	mov    %rax,%rsp
  19:	e8 00 00 00 00       	callq  1e <alloca_blah+0x1e>
  1e:	48 89 ec             	mov    %rbp,%rsp
  21:	5d                   	pop    %rbp
  22:	c3                   	retq
$ clang -O3 -c -o alloca_align.o alloca_align.c -DMASK=0x3f0
$ objdump -d alloca_align.o
[...]
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	48 89 e0             	mov    %rsp,%rax
   7:	81 e7 f0 03 00 00    	and    $0x3f0,%edi
   d:	48 29 f8             	sub    %rdi,%rax
  10:	48 89 c4             	mov    %rax,%rsp
  13:	e8 00 00 00 00       	callq  18 <alloca_blah+0x18>
  18:	48 89 ec             	mov    %rbp,%rsp
  1b:	5d                   	pop    %rbp
  1c:	c3                   	retq
$

(From a glance at the assembly, gcc seems to always assume that the
length may be misaligned.)

Maybe this should be something along the lines of
__builtin_alloca(offset & (0x3ff & ARCH_STACK_ALIGN_MASK)) (with
appropriate definitions of the stack alignment mask depending on the
architecture's choice of stack alignment for kernel code).
On Mon, Jun 22, 2020 at 12:40:49PM -0700, Randy Dunlap wrote:
> On 6/22/20 12:31 PM, Kees Cook wrote:
> > This provides the ability for architectures to enable kernel stack base
> > address offset randomization. This feature is controlled by the boot
> > param "randomize_kstack_offset=on/off", with its default value set by
> > CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT.
> >
> > Co-developed-by: Elena Reshetova <elena.reshetova@intel.com>
> > Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
> > Link: https://lore.kernel.org/r/20190415060918.3766-1-elena.reshetova@intel.com
> > Signed-off-by: Kees Cook <keescook@chromium.org>
> > ---
> >  Makefile                         |  4 ++++
> >  arch/Kconfig                     | 23 ++++++++++++++++++
> >  include/linux/randomize_kstack.h | 40 ++++++++++++++++++++++++++++++++
> >  init/main.c                      | 23 ++++++++++++++++++
> >  4 files changed, 90 insertions(+)
> >  create mode 100644 include/linux/randomize_kstack.h
>
> Please add documentation for the new kernel boot parameter to
> Documentation/admin-guide/kernel-parameters.txt.

Oops, yes. Thanks for the reminder! (I wonder if checkpatch can notice
"+early_param" and suggest the Doc update hmmm)
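For reference, the requested kernel-parameters.txt entry could look something
like the following; the wording is only a sketch based on the Kconfig help
text in this patch, not text taken from a later revision:

	randomize_kstack_offset=
			[KNL] Enable or disable kernel stack offset
			randomization, which provides roughly 5 bits of
			entropy, frustrating memory corruption attacks
			that depend on stack address determinism or
			cross-syscall address exposures. The default is
			set by CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT.
			Format: on/off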
On Mon, Jun 22, 2020 at 10:07:37PM +0200, Jann Horn wrote: > On Mon, Jun 22, 2020 at 9:31 PM Kees Cook <keescook@chromium.org> wrote: > > This provides the ability for architectures to enable kernel stack base > > address offset randomization. This feature is controlled by the boot > > param "randomize_kstack_offset=on/off", with its default value set by > > CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT. > [...] > > +#define add_random_kstack_offset() do { \ > > + if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \ > > + &randomize_kstack_offset)) { \ > > + u32 offset = this_cpu_read(kstack_offset); \ > > + u8 *ptr = __builtin_alloca(offset & 0x3FF); \ > > + asm volatile("" : "=m"(*ptr)); \ > > + } \ > > +} while (0) > > clang generates better code here if the mask is stack-aligned - > otherwise it needs to round the stack pointer / the offset: Interesting. I was hoping to avoid needing to know the architecture stack alignment (leaving it up to the compiler). > > $ cat alloca_align.c > #include <alloca.h> > void callee(void); > > void alloca_blah(unsigned long rand) { > asm volatile(""::"r"(alloca(rand & MASK))); > callee(); > } > $ clang -O3 -c -o alloca_align.o alloca_align.c -DMASK=0x3ff > $ objdump -d alloca_align.o > [...] > 0: 55 push %rbp > 1: 48 89 e5 mov %rsp,%rbp > 4: 81 e7 ff 03 00 00 and $0x3ff,%edi > a: 83 c7 0f add $0xf,%edi > d: 83 e7 f0 and $0xfffffff0,%edi > 10: 48 89 e0 mov %rsp,%rax > 13: 48 29 f8 sub %rdi,%rax > 16: 48 89 c4 mov %rax,%rsp > 19: e8 00 00 00 00 callq 1e <alloca_blah+0x1e> > 1e: 48 89 ec mov %rbp,%rsp > 21: 5d pop %rbp > 22: c3 retq > $ clang -O3 -c -o alloca_align.o alloca_align.c -DMASK=0x3f0 > $ objdump -d alloca_align.o > [...] > 0: 55 push %rbp > 1: 48 89 e5 mov %rsp,%rbp > 4: 48 89 e0 mov %rsp,%rax > 7: 81 e7 f0 03 00 00 and $0x3f0,%edi > d: 48 29 f8 sub %rdi,%rax > 10: 48 89 c4 mov %rax,%rsp > 13: e8 00 00 00 00 callq 18 <alloca_blah+0x18> > 18: 48 89 ec mov %rbp,%rsp > 1b: 5d pop %rbp > 1c: c3 retq > $ > > (From a glance at the assembly, gcc seems to always assume that the > length may be misaligned.) Right -- this is why I didn't bother with it, since it didn't seem to notice what I'd already done to the alloca() argument. (But from what I could measure on cycle counts, the additional ALU didn't seem to really make much difference ... it _would_ be nice to avoid it, of course.) > Maybe this should be something along the lines of > __builtin_alloca(offset & (0x3ff & ARCH_STACK_ALIGN_MASK)) (with > appropriate definitions of the stack alignment mask depending on the > architecture's choice of stack alignment for kernel code). Is that explicitly selected anywhere in the kernel? I thought the alignment was left up to the compiler (as in I've seen bugs fixed where the kernel had to deal with the alignment choices the compiler was making...)
On Mon, Jun 22, 2020 at 11:30 PM Kees Cook <keescook@chromium.org> wrote: > On Mon, Jun 22, 2020 at 10:07:37PM +0200, Jann Horn wrote: > > On Mon, Jun 22, 2020 at 9:31 PM Kees Cook <keescook@chromium.org> wrote: > > > This provides the ability for architectures to enable kernel stack base > > > address offset randomization. This feature is controlled by the boot > > > param "randomize_kstack_offset=on/off", with its default value set by > > > CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT. > > [...] > > > +#define add_random_kstack_offset() do { \ > > > + if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \ > > > + &randomize_kstack_offset)) { \ > > > + u32 offset = this_cpu_read(kstack_offset); \ > > > + u8 *ptr = __builtin_alloca(offset & 0x3FF); \ > > > + asm volatile("" : "=m"(*ptr)); \ > > > + } \ > > > +} while (0) > > > > clang generates better code here if the mask is stack-aligned - > > otherwise it needs to round the stack pointer / the offset: [...] > > Maybe this should be something along the lines of > > __builtin_alloca(offset & (0x3ff & ARCH_STACK_ALIGN_MASK)) (with > > appropriate definitions of the stack alignment mask depending on the > > architecture's choice of stack alignment for kernel code). > > Is that explicitly selected anywhere in the kernel? I thought the > alignment was left up to the compiler (as in I've seen bugs fixed where > the kernel had to deal with the alignment choices the compiler was > making...) No, at least on x86-64 and x86 Linux overrides the normal ABI. From arch/x86/Makefile: # For gcc stack alignment is specified with -mpreferred-stack-boundary, # clang has the option -mstack-alignment for that purpose. ifneq ($(call cc-option, -mpreferred-stack-boundary=4),) cc_stack_align4 := -mpreferred-stack-boundary=2 cc_stack_align8 := -mpreferred-stack-boundary=3 else ifneq ($(call cc-option, -mstack-alignment=16),) cc_stack_align4 := -mstack-alignment=4 cc_stack_align8 := -mstack-alignment=8 endif [...] ifeq ($(CONFIG_X86_32),y) [...] # Align the stack to the register width instead of using the default # alignment of 16 bytes. This reduces stack usage and the number of # alignment instructions. KBUILD_CFLAGS += $(call cc-option,$(cc_stack_align4)) [...] else [...] # By default gcc and clang use a stack alignment of 16 bytes for x86. # However the standard kernel entry on x86-64 leaves the stack on an # 8-byte boundary. If the compiler isn't informed about the actual # alignment it will generate extra alignment instructions for the # default alignment which keep the stack *mis*aligned. # Furthermore an alignment to the register width reduces stack usage # and the number of alignment instructions. KBUILD_CFLAGS += $(call cc-option,$(cc_stack_align8)) [...] endif Normal x86-64 ABI has 16-byte stack alignment; Linux kernel x86-64 ABI has 8-byte stack alignment. Similarly, the normal Linux 32-bit x86 ABI is 16-byte aligned; meanwhile Linux kernel x86 ABI has 4-byte stack alignment. This is because userspace code wants the stack to be sufficiently aligned for fancy SSE instructions and such; the kernel, on the other hand, never uses those in normal code, and cares about stack usage and such very much.
On Mon, Jun 22, 2020 at 11:42:29PM +0200, Jann Horn wrote:
> No, at least on x86-64 and x86 Linux overrides the normal ABI. From
> arch/x86/Makefile:

Ah! Thanks for the pointer.

>
> # For gcc stack alignment is specified with -mpreferred-stack-boundary,
> # clang has the option -mstack-alignment for that purpose.
> ifneq ($(call cc-option, -mpreferred-stack-boundary=4),)
> cc_stack_align4 := -mpreferred-stack-boundary=2
> cc_stack_align8 := -mpreferred-stack-boundary=3
> else ifneq ($(call cc-option, -mstack-alignment=16),)
> cc_stack_align4 := -mstack-alignment=4
> cc_stack_align8 := -mstack-alignment=8
> endif
> [...]
> ifeq ($(CONFIG_X86_32),y)
> [...]
>         # Align the stack to the register width instead of using the default
>         # alignment of 16 bytes. This reduces stack usage and the number of
>         # alignment instructions.
>         KBUILD_CFLAGS += $(call cc-option,$(cc_stack_align4))
> [...]
> else
> [...]
>         # By default gcc and clang use a stack alignment of 16 bytes for x86.
>         # However the standard kernel entry on x86-64 leaves the stack on an
>         # 8-byte boundary. If the compiler isn't informed about the actual
>         # alignment it will generate extra alignment instructions for the
>         # default alignment which keep the stack *mis*aligned.
>         # Furthermore an alignment to the register width reduces stack usage
>         # and the number of alignment instructions.
>         KBUILD_CFLAGS += $(call cc-option,$(cc_stack_align8))
> [...]
> endif

And it seems that only x86 does this. No other architecture specifies
-mpreferred-stack-boundary...

> Normal x86-64 ABI has 16-byte stack alignment; Linux kernel x86-64 ABI
> has 8-byte stack alignment.
> Similarly, the normal Linux 32-bit x86 ABI is 16-byte aligned;
> meanwhile Linux kernel x86 ABI has 4-byte stack alignment.
>
> This is because userspace code wants the stack to be sufficiently
> aligned for fancy SSE instructions and such; the kernel, on the other
> hand, never uses those in normal code, and cares about stack usage and
> such very much.

This makes it nicer for Clang:

diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h
index 1df0dc52cadc..f7e1f68fb50c 100644
--- a/include/linux/randomize_kstack.h
+++ b/include/linux/randomize_kstack.h
@@ -10,6 +10,14 @@ DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
 			 randomize_kstack_offset);
 DECLARE_PER_CPU(u32, kstack_offset);
 
+#ifdef CONFIG_X86_64
+#define ARCH_STACK_ALIGN_MASK ~((1 << 8) - 1)
+#elif defined(CONFIG_X86_32)
+#define ARCH_STACK_ALIGN_MASK ~((1 << 4) - 1)
+#else
+#define ARCH_STACK_ALIGN_MASK ~(0)
+#endif
+
 /*
  * Do not use this anywhere else in the kernel. This is used here because
  * it provides an arch-agnostic way to grow the stack with correct
@@ -23,7 +31,8 @@ void *__builtin_alloca(size_t size);
 	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
 				&randomize_kstack_offset)) {		\
 		u32 offset = this_cpu_read(kstack_offset);		\
-		u8 *ptr = __builtin_alloca(offset & 0x3FF);		\
+		u8 *ptr = __builtin_alloca(offset & 0x3FF &		\
+					   ARCH_STACK_ALIGN_MASK);	\
 		asm volatile("" : "=m"(*ptr));				\
 	}								\
 } while (0)

But I don't like open-coding the x86-only stack alignment... it should
be in Kconfig or something, I think?
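If the alignment were moved out of the header and into Kconfig, as suggested
above, one possible shape would be the following; this is purely illustrative,
and CONFIG_ARCH_KSTACK_ALIGN is a made-up symbol that an architecture's
Kconfig would set to its kernel stack alignment in bytes (e.g. 8 on x86-64 and
4 on x86-32, per the alignment Jann described):

/* Hypothetical: alignment supplied by a per-arch Kconfig value. */
#ifdef CONFIG_ARCH_KSTACK_ALIGN
#define ARCH_STACK_ALIGN_MASK	(~(CONFIG_ARCH_KSTACK_ALIGN - 1))
#else
#define ARCH_STACK_ALIGN_MASK	~(0)
#endif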
On Mon, Jun 22, 2020 at 12:31:44PM -0700, Kees Cook wrote:
> +
> +#define add_random_kstack_offset() do {				\
> +	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
> +				&randomize_kstack_offset)) {		\
> +		u32 offset = this_cpu_read(kstack_offset);		\
> +		u8 *ptr = __builtin_alloca(offset & 0x3FF);		\
> +		asm volatile("" : "=m"(*ptr));				\
> +	}								\
> +} while (0)

This feels a little fragile. ptr doesn't escape the block, so the
compiler is free to restore the stack immediately after this block. In
fact, given that all you've said is that the asm modifies *ptr, but
nothing uses that output, the compiler could eliminate the whole thing,
no?

https://godbolt.org/z/HT43F5

gcc restores the stack immediately, if no function calls come after it.

clang completely eliminates the code if no function calls come after.

I'm not sure why function calls should affect it, though, given that
those functions can't possibly access ptr or the memory it points to.

A full memory barrier (like barrier_data) should be better -- it gives
the compiler a reason to believe that ptr might escape and be accessed
by any code following the block?
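The fragility described here can be seen with a stand-alone toy; this is a
reconstruction in the spirit of the linked godbolt example, not the exact
code behind the URL:

#include <alloca.h>

void callee(void);

void grow_stack(unsigned long rand)
{
	unsigned char *ptr = alloca(rand & 0x3FF);

	/* "=m" only declares *ptr as an asm output; nothing ever reads it. */
	asm volatile("" : "=m"(*ptr));

	/*
	 * With no later use of the allocation (and no "memory" clobber),
	 * the compiler is free to give the stack space back before the
	 * call, or to shrink the allocation away entirely.
	 */
	callee();
}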
On Mon, Jun 22, 2020 at 06:56:15PM -0400, Arvind Sankar wrote: > On Mon, Jun 22, 2020 at 12:31:44PM -0700, Kees Cook wrote: > > + > > +#define add_random_kstack_offset() do { \ > > + if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \ > > + &randomize_kstack_offset)) { \ > > + u32 offset = this_cpu_read(kstack_offset); \ > > + u8 *ptr = __builtin_alloca(offset & 0x3FF); \ > > + asm volatile("" : "=m"(*ptr)); \ > > + } \ > > +} while (0) > > This feels a little fragile. ptr doesn't escape the block, so the > compiler is free to restore the stack immediately after this block. In > fact, given that all you've said is that the asm modifies *ptr, but > nothing uses that output, the compiler could eliminate the whole thing, > no? > > https://godbolt.org/z/HT43F5 > > gcc restores the stack immediately, if no function calls come after it. > > clang completely eliminates the code if no function calls come after. nothing uses the stack in your example. And adding a barrier (which is what the "=m" is, doesn't change it. > I'm not sure why function calls should affect it, though, given that > those functions can't possibly access ptr or the memory it points to. It seems to work correctly: https://godbolt.org/z/c3W5BW
On Mon, Jun 22, 2020 at 04:07:11PM -0700, Kees Cook wrote: > On Mon, Jun 22, 2020 at 06:56:15PM -0400, Arvind Sankar wrote: > > On Mon, Jun 22, 2020 at 12:31:44PM -0700, Kees Cook wrote: > > > + > > > +#define add_random_kstack_offset() do { \ > > > + if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \ > > > + &randomize_kstack_offset)) { \ > > > + u32 offset = this_cpu_read(kstack_offset); \ > > > + u8 *ptr = __builtin_alloca(offset & 0x3FF); \ > > > + asm volatile("" : "=m"(*ptr)); \ > > > + } \ > > > +} while (0) > > > > This feels a little fragile. ptr doesn't escape the block, so the > > compiler is free to restore the stack immediately after this block. In > > fact, given that all you've said is that the asm modifies *ptr, but > > nothing uses that output, the compiler could eliminate the whole thing, > > no? > > > > https://godbolt.org/z/HT43F5 > > > > gcc restores the stack immediately, if no function calls come after it. > > > > clang completely eliminates the code if no function calls come after. > > nothing uses the stack in your example. And adding a barrier (which is > what the "=m" is, doesn't change it. Yeah, I realized that that was what's going on. And clang isn't actually DCE'ing it, it's taking advantage of the red zone since my alloca was small enough. But I still don't see anything _stopping_ the compiler from optimizing this better in the future. The "=m" is not a barrier: it just informs the compiler that the asm produces an output value in *ptr (and no other outputs). If nothing can consume that output, it doesn't stop the compiler from freeing the allocation immediately after the asm instead of at the end of the function. I'm talking about something like asm volatile("" : : "r" (ptr) : "memory"); which tells the compiler that the asm may change memory arbitrarily. Here, we don't use it really as a barrier, but to tell the compiler that the asm may have stashed the value of ptr somewhere in memory, so it's not free to reuse the space that it pointed to until the function returns (unless it can prove that nothing accesses memory, not just that nothing accesses ptr).
On Mon, Jun 22, 2020 at 08:05:10PM -0400, Arvind Sankar wrote:
> But I still don't see anything _stopping_ the compiler from optimizing
> this better in the future. The "=m" is not a barrier: it just informs
> the compiler that the asm produces an output value in *ptr (and no other
> outputs). If nothing can consume that output, it doesn't stop the
> compiler from freeing the allocation immediately after the asm instead
> of at the end of the function.

Ah, yeah, I get what you mean.

> I'm talking about something like
> 	asm volatile("" : : "r" (ptr) : "memory");
> which tells the compiler that the asm may change memory arbitrarily.

Yeah, I will adjust it.

> Here, we don't use it really as a barrier, but to tell the compiler that
> the asm may have stashed the value of ptr somewhere in memory, so it's
> not free to reuse the space that it pointed to until the function
> returns (unless it can prove that nothing accesses memory, not just that
> nothing accesses ptr).
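Putting the two suggestions together (Jann's alignment mask and Arvind's
escape/clobber barrier), the macro would presumably end up looking roughly
like this; a sketch only, since the thread does not show the final form, and
ARCH_STACK_ALIGN_MASK is as defined in the earlier diff:

#define add_random_kstack_offset() do {					\
	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
				&randomize_kstack_offset)) {		\
		u32 offset = this_cpu_read(kstack_offset);		\
		u8 *ptr = __builtin_alloca(offset & 0x3FF &		\
					   ARCH_STACK_ALIGN_MASK);	\
		/* Pretend ptr escapes so the allocation stays live. */	\
		asm volatile("" : : "r"(ptr) : "memory");		\
	}								\
} while (0)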
On 22.06.2020 22:31, Kees Cook wrote: > As Linux kernel stack protections have been constantly improving > (vmap-based stack allocation with guard pages, removal of thread_info, > STACKLEAK), attackers have had to find new ways for their exploits > to work. They have done so, continuing to rely on the kernel's stack > determinism, in situations where VMAP_STACK and THREAD_INFO_IN_TASK_STRUCT > were not relevant. For example, the following recent attacks would have > been hampered if the stack offset was non-deterministic between syscalls: > > https://repositorio-aberto.up.pt/bitstream/10216/125357/2/374717.pdf > (page 70: targeting the pt_regs copy with linear stack overflow) > > https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html > (leaked stack address from one syscall as a target during next syscall) > > The main idea is that since the stack offset is randomized on each system > call, it is harder for an attack to reliably land in any particular place > on the thread stack, even with address exposures, as the stack base will > change on the next syscall. Also, since randomization is performed after > placing pt_regs, the ptrace-based approach[1] to discover the randomized > offset during a long-running syscall should not be possible. Hello Kees! I would recommend to disable CONFIG_STACKLEAK_METRICS if kernel stack offset randomization is enabled. It is a debugging feature that provides information about kernel stack usage. That info can be useful for calculating the random offset. I would also recommend to check: there might be other kernel features for debugging or getting statistics that can be used to disclose the random stack offset. Best regards, Alexander
From: Kees Cook > Sent: 23 June 2020 01:56 > On Mon, Jun 22, 2020 at 08:05:10PM -0400, Arvind Sankar wrote: > > But I still don't see anything _stopping_ the compiler from optimizing > > this better in the future. The "=m" is not a barrier: it just informs > > the compiler that the asm produces an output value in *ptr (and no other > > outputs). If nothing can consume that output, it doesn't stop the > > compiler from freeing the allocation immediately after the asm instead > > of at the end of the function. > > Ah, yeah, I get what you mean. > > > I'm talking about something like > > asm volatile("" : : "r" (ptr) : "memory"); > > which tells the compiler that the asm may change memory arbitrarily. > > Yeah, I will adjust it. > > > Here, we don't use it really as a barrier, but to tell the compiler that > > the asm may have stashed the value of ptr somewhere in memory, so it's > > not free to reuse the space that it pointed to until the function > > returns (unless it can prove that nothing accesses memory, not just that > > nothing accesses ptr). Do you need another asm volatile("" : : "r" (ptr) : "memory"); (or similar) at the bottom of the function - that the compiler thinks might access the memory whose address it thought got saved earlier? I wonder if it would be easier to allocate the stack space in the asm wrapper? At least as an architecture option. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
diff --git a/Makefile b/Makefile
index b46e91bf0b0e..8cb7a1388950 100644
--- a/Makefile
+++ b/Makefile
@@ -809,6 +809,10 @@ ifdef CONFIG_INIT_STACK_ALL
 KBUILD_CFLAGS	+= -ftrivial-auto-var-init=pattern
 endif
 
+# While VLAs have been removed, GCC produces unreachable stack probes
+# for the randomize_kstack_offset feature. Disable it for all compilers.
+KBUILD_CFLAGS	+= $(call cc-option,-fno-stack-clash-protection,)
+
 DEBUG_CFLAGS	:= $(call cc-option, -fno-var-tracking-assignments)
 
 ifdef CONFIG_DEBUG_INFO
diff --git a/arch/Kconfig b/arch/Kconfig
index 1ea61290900a..1f52c9cfefca 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -883,6 +883,29 @@ config VMAP_STACK
 	  virtual mappings with real shadow memory, and KASAN_VMALLOC must
 	  be enabled.
 
+config HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
+	def_bool n
+	help
+	  An arch should select this symbol if it can support kernel stack
+	  offset randomization with calls to add_random_kstack_offset()
+	  during syscall entry and choose_random_kstack_offset() during
+	  syscall exit. Downgrading of -fstack-protector-strong to
+	  -fstack-protector should also be applied to the entry code and
+	  closely examined, as the artificial stack bump looks like an array
+	  to the compiler, so it will attempt to add canary checks regardless
+	  of the static branch state.
+
+config RANDOMIZE_KSTACK_OFFSET_DEFAULT
+	bool "Randomize kernel stack offset on syscall entry"
+	depends on HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
+	help
+	  The kernel stack offset can be randomized (after pt_regs) by
+	  roughly 5 bits of entropy, frustrating memory corruption
+	  attacks that depend on stack address determinism or
+	  cross-syscall address exposures. This feature is controlled
+	  by kernel boot param "randomize_kstack_offset=on/off", and this
+	  config chooses the default boot state.
+
 config ARCH_OPTIONAL_KERNEL_RWX
 	def_bool n
 
diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h
new file mode 100644
index 000000000000..1df0dc52cadc
--- /dev/null
+++ b/include/linux/randomize_kstack.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _LINUX_RANDOMIZE_KSTACK_H
+#define _LINUX_RANDOMIZE_KSTACK_H
+
+#include <linux/kernel.h>
+#include <linux/jump_label.h>
+#include <linux/percpu-defs.h>
+
+DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
+			 randomize_kstack_offset);
+DECLARE_PER_CPU(u32, kstack_offset);
+
+/*
+ * Do not use this anywhere else in the kernel. This is used here because
+ * it provides an arch-agnostic way to grow the stack with correct
+ * alignment. Also, since this use is being explicitly masked to a max of
+ * 10 bits, stack-clash style attacks are unlikely. For more details see
+ * "VLAs" in Documentation/process/deprecated.rst
+ */
+void *__builtin_alloca(size_t size);
+
+#define add_random_kstack_offset() do {					\
+	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
+				&randomize_kstack_offset)) {		\
+		u32 offset = this_cpu_read(kstack_offset);		\
+		u8 *ptr = __builtin_alloca(offset & 0x3FF);		\
+		asm volatile("" : "=m"(*ptr));				\
+	}								\
+} while (0)
+
+#define choose_random_kstack_offset(rand) do {				\
+	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
+				&randomize_kstack_offset)) {		\
+		u32 offset = this_cpu_read(kstack_offset);		\
+		offset ^= (rand);					\
+		this_cpu_write(kstack_offset, offset);			\
+	}								\
+} while (0)
+
+#endif
diff --git a/init/main.c b/init/main.c
index 0ead83e86b5a..fa8ae0ae3ac2 100644
--- a/init/main.c
+++ b/init/main.c
@@ -822,6 +822,29 @@ static void __init mm_init(void)
 	pti_init();
 }
 
+#ifdef CONFIG_HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
+DEFINE_STATIC_KEY_MAYBE_RO(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
+			   randomize_kstack_offset);
+DEFINE_PER_CPU(u32, kstack_offset);
+
+static int __init early_randomize_kstack_offset(char *buf)
+{
+	int ret;
+	bool bool_result;
+
+	ret = kstrtobool(buf, &bool_result);
+	if (ret)
+		return ret;
+
+	if (bool_result)
+		static_branch_enable(&randomize_kstack_offset);
+	else
+		static_branch_disable(&randomize_kstack_offset);
+	return 0;
+}
+early_param("randomize_kstack_offset", early_randomize_kstack_offset);
+#endif
+
 void __init __weak arch_call_rest_init(void)
 {
 	rest_init();
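For an architecture wiring this up, the Kconfig help text above describes the
intended call pattern: add_random_kstack_offset() during syscall entry and
choose_random_kstack_offset() during syscall exit. A hypothetical C-level
dispatcher might use the hooks like this; arch_dispatch_syscall() and
arch_cycle_counter() are illustrative stand-ins, not real APIs:

#include <linux/ptrace.h>
#include <linux/randomize_kstack.h>

static void do_syscall(struct pt_regs *regs)
{
	/* Consume the offset chosen at the end of the previous syscall. */
	add_random_kstack_offset();

	arch_dispatch_syscall(regs);

	/* Mix fresh per-CPU entropy in for the *next* syscall. */
	choose_random_kstack_offset(arch_cycle_counter());
}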