Message ID | 20191024225132.13410-6-samitolvanen@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | add support for Clang's Shadow Call Stack | expand |
On Thu, Oct 24, 2019 at 03:51:20PM -0700, samitolvanen@google.com wrote: > This change adds generic support for Clang's Shadow Call Stack, > which uses a shadow stack to protect return addresses from being > overwritten by an attacker. Details are available here: > > https://clang.llvm.org/docs/ShadowCallStack.html > > Note that security guarantees in the kernel differ from the > ones documented for user space. The kernel must store addresses > of shadow stacks used by other tasks and interrupt handlers in > memory, which means an attacker capable reading and writing > arbitrary memory may be able to locate them and hijack control > flow by modifying shadow stacks that are not currently in use. > > Signed-off-by: Sami Tolvanen <samitolvanen@google.com> > --- > Makefile | 6 ++ > arch/Kconfig | 33 +++++++ > include/linux/compiler-clang.h | 6 ++ > include/linux/compiler_types.h | 4 + > include/linux/scs.h | 78 +++++++++++++++++ > init/init_task.c | 8 ++ > kernel/Makefile | 1 + > kernel/fork.c | 9 ++ > kernel/sched/core.c | 2 + > kernel/sched/sched.h | 1 + > kernel/scs.c | 155 +++++++++++++++++++++++++++++++++ > 11 files changed, 303 insertions(+) > create mode 100644 include/linux/scs.h > create mode 100644 kernel/scs.c > > diff --git a/Makefile b/Makefile > index 5475cdb6d57d..2b5c59fb18f2 100644 > --- a/Makefile > +++ b/Makefile > @@ -846,6 +846,12 @@ ifdef CONFIG_LIVEPATCH > KBUILD_CFLAGS += $(call cc-option, -flive-patching=inline-clone) > endif > > +ifdef CONFIG_SHADOW_CALL_STACK > +CC_FLAGS_SCS := -fsanitize=shadow-call-stack > +KBUILD_CFLAGS += $(CC_FLAGS_SCS) > +export CC_FLAGS_SCS > +endif > + > # arch Makefile may override CC so keep this after arch Makefile is included > NOSTDINC_FLAGS += -nostdinc -isystem $(shell $(CC) -print-file-name=include) > > diff --git a/arch/Kconfig b/arch/Kconfig > index 5f8a5d84dbbe..5e34cbcd8d6a 100644 > --- a/arch/Kconfig > +++ b/arch/Kconfig > @@ -521,6 +521,39 @@ config STACKPROTECTOR_STRONG > about 20% of all kernel functions, which increases the kernel code > size by about 2%. > > +config ARCH_SUPPORTS_SHADOW_CALL_STACK > + bool > + help > + An architecture should select this if it supports Clang's Shadow > + Call Stack, has asm/scs.h, and implements runtime support for shadow > + stack switching. > + > +config SHADOW_CALL_STACK_VMAP > + bool > + depends on SHADOW_CALL_STACK > + help > + Use virtually mapped shadow call stacks. Selecting this option > + provides better stack exhaustion protection, but increases per-thread > + memory consumption as a full page is allocated for each shadow stack. > + > +config SHADOW_CALL_STACK > + bool "Clang Shadow Call Stack" > + depends on ARCH_SUPPORTS_SHADOW_CALL_STACK > + help > + This option enables Clang's Shadow Call Stack, which uses a > + shadow stack to protect function return addresses from being > + overwritten by an attacker. More information can be found from > + Clang's documentation: > + > + https://clang.llvm.org/docs/ShadowCallStack.html > + > + Note that security guarantees in the kernel differ from the ones > + documented for user space. The kernel must store addresses of shadow > + stacks used by other tasks and interrupt handlers in memory, which > + means an attacker capable reading and writing arbitrary memory may > + be able to locate them and hijack control flow by modifying shadow > + stacks that are not currently in use. > + > config HAVE_ARCH_WITHIN_STACK_FRAMES > bool > help > diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h > index 333a6695a918..afe5e24088b2 100644 > --- a/include/linux/compiler-clang.h > +++ b/include/linux/compiler-clang.h > @@ -42,3 +42,9 @@ > * compilers, like ICC. > */ > #define barrier() __asm__ __volatile__("" : : : "memory") > + > +#if __has_feature(shadow_call_stack) > +# define __noscs __attribute__((no_sanitize("shadow-call-stack"))) > +#else > +# define __noscs > +#endif Huh. I didn't realise it was valid to have a space after the `#` like this. I see we're very inconsistent about style on that front, so this is fine, I'll just have to get used to it. :) > diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h > index 72393a8c1a6c..be5d5be4b1ae 100644 > --- a/include/linux/compiler_types.h > +++ b/include/linux/compiler_types.h > @@ -202,6 +202,10 @@ struct ftrace_likely_data { > # define randomized_struct_fields_end > #endif > > +#ifndef __noscs > +# define __noscs > +#endif > + > #ifndef asm_volatile_goto > #define asm_volatile_goto(x...) asm goto(x) > #endif > diff --git a/include/linux/scs.h b/include/linux/scs.h > new file mode 100644 > index 000000000000..c8b0ccfdd803 > --- /dev/null > +++ b/include/linux/scs.h > @@ -0,0 +1,78 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +/* > + * Shadow Call Stack support. > + * > + * Copyright (C) 2018 Google LLC > + */ > + > +#ifndef _LINUX_SCS_H > +#define _LINUX_SCS_H > + > +#include <linux/gfp.h> > +#include <linux/sched.h> > +#include <asm/page.h> > + > +#ifdef CONFIG_SHADOW_CALL_STACK > + > +#define SCS_SIZE 1024 I think it'd be worth a comment on how this size was chosen. IIRC this empirical? > +#define SCS_END_MAGIC 0xaf0194819b1635f6UL Keyboard smash? ... or is there a prize for whoever figures out the secret? ;) > + > +#define GFP_SCS (GFP_KERNEL | __GFP_ZERO) > + > +static inline void *task_scs(struct task_struct *tsk) > +{ > + return task_thread_info(tsk)->shadow_call_stack; > +} > + > +static inline void task_set_scs(struct task_struct *tsk, void *s) > +{ > + task_thread_info(tsk)->shadow_call_stack = s; > +} This should probably be named get and set, or have: #define task_scs(tsk) (task_thread_info(tsk)->shadow_call_stack) ... which can have a trivial implementation as NULL for the !SCS case. > + > +extern void scs_init(void); > +extern void scs_task_init(struct task_struct *tsk); > +extern void scs_task_reset(struct task_struct *tsk); > +extern int scs_prepare(struct task_struct *tsk, int node); > +extern bool scs_corrupted(struct task_struct *tsk); > +extern void scs_release(struct task_struct *tsk); > + > +#else /* CONFIG_SHADOW_CALL_STACK */ > + > +static inline void *task_scs(struct task_struct *tsk) > +{ > + return 0; > +} For all the trivial wrappers you can put the implementation on the same line as the prototype. That makes it a bit easier to compare against the prototypes on the other side of the ifdeffery. e.g. this lot can be: static inline void *task_scs(struct task_struct *tsk) { return 0; } static inline void task_set_scs(struct task_struct *tsk, void *s) { } static inline void scs_init(void) { } ... > +#endif /* CONFIG_SHADOW_CALL_STACK */ > + > +#endif /* _LINUX_SCS_H */ > diff --git a/init/init_task.c b/init/init_task.c > index 9e5cbe5eab7b..cbd40460e903 100644 > --- a/init/init_task.c > +++ b/init/init_task.c > @@ -11,6 +11,7 @@ > #include <linux/mm.h> > #include <linux/audit.h> > #include <linux/numa.h> > +#include <linux/scs.h> > > #include <asm/pgtable.h> > #include <linux/uaccess.h> > @@ -184,6 +185,13 @@ struct task_struct init_task > }; > EXPORT_SYMBOL(init_task); > > +#ifdef CONFIG_SHADOW_CALL_STACK > +unsigned long init_shadow_call_stack[SCS_SIZE / sizeof(long)] __init_task_data > + __aligned(SCS_SIZE) = { > + [(SCS_SIZE / sizeof(long)) - 1] = SCS_END_MAGIC > +}; > +#endif > + > /* > * Initial thread structure. Alignment of this is handled by a special > * linker map entry. > diff --git a/kernel/Makefile b/kernel/Makefile > index daad787fb795..313dbd44d576 100644 > --- a/kernel/Makefile > +++ b/kernel/Makefile > @@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/ > obj-$(CONFIG_IRQ_WORK) += irq_work.o > obj-$(CONFIG_CPU_PM) += cpu_pm.o > obj-$(CONFIG_BPF) += bpf/ > +obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o > > obj-$(CONFIG_PERF_EVENTS) += events/ > > diff --git a/kernel/fork.c b/kernel/fork.c > index bcdf53125210..ae7ebe9f0586 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -94,6 +94,7 @@ > #include <linux/livepatch.h> > #include <linux/thread_info.h> > #include <linux/stackleak.h> > +#include <linux/scs.h> Nit: alphabetical order, please (this should come before stackleak.h). > > #include <asm/pgtable.h> > #include <asm/pgalloc.h> > @@ -451,6 +452,8 @@ void put_task_stack(struct task_struct *tsk) > > void free_task(struct task_struct *tsk) > { > + scs_release(tsk); > + > #ifndef CONFIG_THREAD_INFO_IN_TASK > /* > * The task is finally done with both the stack and thread_info, > @@ -834,6 +837,8 @@ void __init fork_init(void) > NULL, free_vm_stack_cache); > #endif > > + scs_init(); > + > lockdep_init_task(&init_task); > uprobes_init(); > } > @@ -907,6 +912,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node) > clear_user_return_notifier(tsk); > clear_tsk_need_resched(tsk); > set_task_stack_end_magic(tsk); > + scs_task_init(tsk); > > #ifdef CONFIG_STACKPROTECTOR > tsk->stack_canary = get_random_canary(); > @@ -2022,6 +2028,9 @@ static __latent_entropy struct task_struct *copy_process( > args->tls); > if (retval) > goto bad_fork_cleanup_io; > + retval = scs_prepare(p, node); > + if (retval) > + goto bad_fork_cleanup_thread; Can we please fold scs_prepare() into scs_task_init() and do this in dup_task_struct()? That way we set this up consistently in one place, where we're also allocating the regular stack. Arguably stackleak_task_init() would better fit there too. > > stackleak_task_init(p); > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index dd05a378631a..e7faeb383008 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -6013,6 +6013,8 @@ void init_idle(struct task_struct *idle, int cpu) > raw_spin_lock_irqsave(&idle->pi_lock, flags); > raw_spin_lock(&rq->lock); > > + scs_task_reset(idle); I'm a bit confused by this -- please see comments below on scs_task_reset(). > + > __sched_fork(0, idle); > idle->state = TASK_RUNNING; > idle->se.exec_start = sched_clock(); > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h > index 0db2c1b3361e..c153003a011c 100644 > --- a/kernel/sched/sched.h > +++ b/kernel/sched/sched.h > @@ -58,6 +58,7 @@ > #include <linux/profile.h> > #include <linux/psi.h> > #include <linux/rcupdate_wait.h> > +#include <linux/scs.h> > #include <linux/security.h> > #include <linux/stop_machine.h> > #include <linux/suspend.h> > diff --git a/kernel/scs.c b/kernel/scs.c > new file mode 100644 > index 000000000000..383d29e8c199 > --- /dev/null > +++ b/kernel/scs.c > @@ -0,0 +1,155 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Shadow Call Stack support. > + * > + * Copyright (C) 2019 Google LLC > + */ > + > +#include <linux/cpuhotplug.h> > +#include <linux/mm.h> > +#include <linux/slab.h> > +#include <linux/scs.h> Nit: alphabetical order, please. > +#include <linux/vmalloc.h> > +#include <asm/scs.h> > + > +static inline void *__scs_base(struct task_struct *tsk) > +{ > + return (void *)((uintptr_t)task_scs(tsk) & ~(SCS_SIZE - 1)); > +} We only ever assign the base to task_scs(tsk), with the current live value being in a register that we don't read. Are we expecting arch code to keep this up-to-date with the register value? I would have expected that we just leave this as the base (as we do for the regular stack in the task struct), and it's down to arch code to save/restore the current value where necessary. Am I missing some caveat with that approach? > + > +#ifdef CONFIG_SHADOW_CALL_STACK_VMAP > + > +/* Keep a cache of shadow stacks */ > +#define SCS_CACHE_SIZE 2 > +static DEFINE_PER_CPU(void *, scs_cache[SCS_CACHE_SIZE]); > + > +static void *scs_alloc(int node) > +{ > + int i; > + > + for (i = 0; i < SCS_CACHE_SIZE; i++) { > + void *s; > + > + s = this_cpu_xchg(scs_cache[i], NULL); > + if (s) { > + memset(s, 0, SCS_SIZE); > + return s; > + } > + } > + > + BUILD_BUG_ON(SCS_SIZE > PAGE_SIZE); It's probably worth a comment on why we rely on SCS_SIZE <= PAGE_SIZE. > + > + return __vmalloc_node_range(PAGE_SIZE, SCS_SIZE, > + VMALLOC_START, VMALLOC_END, > + GFP_SCS, PAGE_KERNEL, 0, > + node, __builtin_return_address(0)); > +} > + > +static void scs_free(void *s) > +{ > + int i; > + > + for (i = 0; i < SCS_CACHE_SIZE; i++) { > + if (this_cpu_cmpxchg(scs_cache[i], 0, s) != 0) > + continue; > + > + return; > + } > + > + vfree_atomic(s); > +} > + > +static int scs_cleanup(unsigned int cpu) > +{ > + int i; > + void **cache = per_cpu_ptr(scs_cache, cpu); > + > + for (i = 0; i < SCS_CACHE_SIZE; i++) { > + vfree(cache[i]); > + cache[i] = NULL; > + } > + > + return 0; > +} > + > +void __init scs_init(void) > +{ > + cpuhp_setup_state(CPUHP_BP_PREPARE_DYN, "scs:scs_cache", NULL, > + scs_cleanup); > +} > + > +#else /* !CONFIG_SHADOW_CALL_STACK_VMAP */ > + > +static struct kmem_cache *scs_cache; > + > +static inline void *scs_alloc(int node) > +{ > + return kmem_cache_alloc_node(scs_cache, GFP_SCS, node); > +} > + > +static inline void scs_free(void *s) > +{ > + kmem_cache_free(scs_cache, s); > +} > + > +void __init scs_init(void) > +{ > + scs_cache = kmem_cache_create("scs_cache", SCS_SIZE, SCS_SIZE, > + 0, NULL); > + WARN_ON(!scs_cache); > +} > + > +#endif /* CONFIG_SHADOW_CALL_STACK_VMAP */ > + > +static inline unsigned long *scs_magic(struct task_struct *tsk) > +{ > + return (unsigned long *)(__scs_base(tsk) + SCS_SIZE - sizeof(long)); Slightly simpler as: return (unsigned long *)(__scs_base(tsk) + SCS_SIZE) - 1; Thanks, Mark.
On Thu, Oct 24, 2019 at 3:51 PM <samitolvanen@google.com> wrote: > > This change adds generic support for Clang's Shadow Call Stack, > which uses a shadow stack to protect return addresses from being > overwritten by an attacker. Details are available here: > > https://clang.llvm.org/docs/ShadowCallStack.html > > Note that security guarantees in the kernel differ from the > ones documented for user space. The kernel must store addresses > of shadow stacks used by other tasks and interrupt handlers in > memory, which means an attacker capable reading and writing > arbitrary memory may be able to locate them and hijack control > flow by modifying shadow stacks that are not currently in use. > > Signed-off-by: Sami Tolvanen <samitolvanen@google.com> > --- > Makefile | 6 ++ > arch/Kconfig | 33 +++++++ > include/linux/compiler-clang.h | 6 ++ > include/linux/compiler_types.h | 4 + > include/linux/scs.h | 78 +++++++++++++++++ > init/init_task.c | 8 ++ > kernel/Makefile | 1 + > kernel/fork.c | 9 ++ > kernel/sched/core.c | 2 + > kernel/sched/sched.h | 1 + > kernel/scs.c | 155 +++++++++++++++++++++++++++++++++ > 11 files changed, 303 insertions(+) > create mode 100644 include/linux/scs.h > create mode 100644 kernel/scs.c > > diff --git a/Makefile b/Makefile > index 5475cdb6d57d..2b5c59fb18f2 100644 > --- a/Makefile > +++ b/Makefile > @@ -846,6 +846,12 @@ ifdef CONFIG_LIVEPATCH > KBUILD_CFLAGS += $(call cc-option, -flive-patching=inline-clone) > endif > > +ifdef CONFIG_SHADOW_CALL_STACK > +CC_FLAGS_SCS := -fsanitize=shadow-call-stack > +KBUILD_CFLAGS += $(CC_FLAGS_SCS) > +export CC_FLAGS_SCS > +endif > + > # arch Makefile may override CC so keep this after arch Makefile is included > NOSTDINC_FLAGS += -nostdinc -isystem $(shell $(CC) -print-file-name=include) > > diff --git a/arch/Kconfig b/arch/Kconfig > index 5f8a5d84dbbe..5e34cbcd8d6a 100644 > --- a/arch/Kconfig > +++ b/arch/Kconfig > @@ -521,6 +521,39 @@ config STACKPROTECTOR_STRONG > about 20% of all kernel functions, which increases the kernel code > size by about 2%. > > +config ARCH_SUPPORTS_SHADOW_CALL_STACK > + bool > + help > + An architecture should select this if it supports Clang's Shadow > + Call Stack, has asm/scs.h, and implements runtime support for shadow > + stack switching. > + > +config SHADOW_CALL_STACK_VMAP > + bool > + depends on SHADOW_CALL_STACK > + help > + Use virtually mapped shadow call stacks. Selecting this option > + provides better stack exhaustion protection, but increases per-thread > + memory consumption as a full page is allocated for each shadow stack. > + > +config SHADOW_CALL_STACK > + bool "Clang Shadow Call Stack" > + depends on ARCH_SUPPORTS_SHADOW_CALL_STACK > + help > + This option enables Clang's Shadow Call Stack, which uses a > + shadow stack to protect function return addresses from being > + overwritten by an attacker. More information can be found from > + Clang's documentation: > + > + https://clang.llvm.org/docs/ShadowCallStack.html > + > + Note that security guarantees in the kernel differ from the ones > + documented for user space. The kernel must store addresses of shadow > + stacks used by other tasks and interrupt handlers in memory, which > + means an attacker capable reading and writing arbitrary memory may > + be able to locate them and hijack control flow by modifying shadow > + stacks that are not currently in use. > + > config HAVE_ARCH_WITHIN_STACK_FRAMES > bool > help > diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h > index 333a6695a918..afe5e24088b2 100644 > --- a/include/linux/compiler-clang.h > +++ b/include/linux/compiler-clang.h > @@ -42,3 +42,9 @@ > * compilers, like ICC. > */ > #define barrier() __asm__ __volatile__("" : : : "memory") > + > +#if __has_feature(shadow_call_stack) > +# define __noscs __attribute__((no_sanitize("shadow-call-stack"))) > +#else > +# define __noscs > +#endif > diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h > index 72393a8c1a6c..be5d5be4b1ae 100644 > --- a/include/linux/compiler_types.h > +++ b/include/linux/compiler_types.h > @@ -202,6 +202,10 @@ struct ftrace_likely_data { > # define randomized_struct_fields_end > #endif > > +#ifndef __noscs > +# define __noscs > +#endif > + > #ifndef asm_volatile_goto > #define asm_volatile_goto(x...) asm goto(x) > #endif > diff --git a/include/linux/scs.h b/include/linux/scs.h > new file mode 100644 > index 000000000000..c8b0ccfdd803 > --- /dev/null > +++ b/include/linux/scs.h > @@ -0,0 +1,78 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +/* > + * Shadow Call Stack support. > + * > + * Copyright (C) 2018 Google LLC > + */ > + > +#ifndef _LINUX_SCS_H > +#define _LINUX_SCS_H > + > +#include <linux/gfp.h> > +#include <linux/sched.h> > +#include <asm/page.h> > + > +#ifdef CONFIG_SHADOW_CALL_STACK > + > +#define SCS_SIZE 1024 > +#define SCS_END_MAGIC 0xaf0194819b1635f6UL > + > +#define GFP_SCS (GFP_KERNEL | __GFP_ZERO) > + > +static inline void *task_scs(struct task_struct *tsk) > +{ > + return task_thread_info(tsk)->shadow_call_stack; > +} > + > +static inline void task_set_scs(struct task_struct *tsk, void *s) > +{ > + task_thread_info(tsk)->shadow_call_stack = s; > +} > + > +extern void scs_init(void); > +extern void scs_task_init(struct task_struct *tsk); > +extern void scs_task_reset(struct task_struct *tsk); > +extern int scs_prepare(struct task_struct *tsk, int node); > +extern bool scs_corrupted(struct task_struct *tsk); > +extern void scs_release(struct task_struct *tsk); > + > +#else /* CONFIG_SHADOW_CALL_STACK */ > + > +static inline void *task_scs(struct task_struct *tsk) > +{ > + return 0; > +} > + > +static inline void task_set_scs(struct task_struct *tsk, void *s) > +{ > +} > + > +static inline void scs_init(void) > +{ > +} > + > +static inline void scs_task_init(struct task_struct *tsk) > +{ > +} > + > +static inline void scs_task_reset(struct task_struct *tsk) > +{ > +} > + > +static inline int scs_prepare(struct task_struct *tsk, int node) > +{ > + return 0; > +} > + > +static inline bool scs_corrupted(struct task_struct *tsk) > +{ > + return false; > +} > + > +static inline void scs_release(struct task_struct *tsk) > +{ > +} > + > +#endif /* CONFIG_SHADOW_CALL_STACK */ > + > +#endif /* _LINUX_SCS_H */ > diff --git a/init/init_task.c b/init/init_task.c > index 9e5cbe5eab7b..cbd40460e903 100644 > --- a/init/init_task.c > +++ b/init/init_task.c > @@ -11,6 +11,7 @@ > #include <linux/mm.h> > #include <linux/audit.h> > #include <linux/numa.h> > +#include <linux/scs.h> > > #include <asm/pgtable.h> > #include <linux/uaccess.h> > @@ -184,6 +185,13 @@ struct task_struct init_task > }; > EXPORT_SYMBOL(init_task); > > +#ifdef CONFIG_SHADOW_CALL_STACK > +unsigned long init_shadow_call_stack[SCS_SIZE / sizeof(long)] __init_task_data > + __aligned(SCS_SIZE) = { > + [(SCS_SIZE / sizeof(long)) - 1] = SCS_END_MAGIC > +}; > +#endif > + > /* > * Initial thread structure. Alignment of this is handled by a special > * linker map entry. > diff --git a/kernel/Makefile b/kernel/Makefile > index daad787fb795..313dbd44d576 100644 > --- a/kernel/Makefile > +++ b/kernel/Makefile > @@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/ > obj-$(CONFIG_IRQ_WORK) += irq_work.o > obj-$(CONFIG_CPU_PM) += cpu_pm.o > obj-$(CONFIG_BPF) += bpf/ > +obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o > > obj-$(CONFIG_PERF_EVENTS) += events/ > > diff --git a/kernel/fork.c b/kernel/fork.c > index bcdf53125210..ae7ebe9f0586 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -94,6 +94,7 @@ > #include <linux/livepatch.h> > #include <linux/thread_info.h> > #include <linux/stackleak.h> > +#include <linux/scs.h> > > #include <asm/pgtable.h> > #include <asm/pgalloc.h> > @@ -451,6 +452,8 @@ void put_task_stack(struct task_struct *tsk) > > void free_task(struct task_struct *tsk) > { > + scs_release(tsk); > + > #ifndef CONFIG_THREAD_INFO_IN_TASK > /* > * The task is finally done with both the stack and thread_info, > @@ -834,6 +837,8 @@ void __init fork_init(void) > NULL, free_vm_stack_cache); > #endif > > + scs_init(); > + > lockdep_init_task(&init_task); > uprobes_init(); > } > @@ -907,6 +912,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node) > clear_user_return_notifier(tsk); > clear_tsk_need_resched(tsk); > set_task_stack_end_magic(tsk); > + scs_task_init(tsk); > > #ifdef CONFIG_STACKPROTECTOR > tsk->stack_canary = get_random_canary(); > @@ -2022,6 +2028,9 @@ static __latent_entropy struct task_struct *copy_process( > args->tls); > if (retval) > goto bad_fork_cleanup_io; > + retval = scs_prepare(p, node); > + if (retval) > + goto bad_fork_cleanup_thread; > > stackleak_task_init(p); > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index dd05a378631a..e7faeb383008 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -6013,6 +6013,8 @@ void init_idle(struct task_struct *idle, int cpu) > raw_spin_lock_irqsave(&idle->pi_lock, flags); > raw_spin_lock(&rq->lock); > > + scs_task_reset(idle); > + > __sched_fork(0, idle); > idle->state = TASK_RUNNING; > idle->se.exec_start = sched_clock(); > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h > index 0db2c1b3361e..c153003a011c 100644 > --- a/kernel/sched/sched.h > +++ b/kernel/sched/sched.h > @@ -58,6 +58,7 @@ > #include <linux/profile.h> > #include <linux/psi.h> > #include <linux/rcupdate_wait.h> > +#include <linux/scs.h> > #include <linux/security.h> > #include <linux/stop_machine.h> > #include <linux/suspend.h> > diff --git a/kernel/scs.c b/kernel/scs.c > new file mode 100644 > index 000000000000..383d29e8c199 > --- /dev/null > +++ b/kernel/scs.c > @@ -0,0 +1,155 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Shadow Call Stack support. > + * > + * Copyright (C) 2019 Google LLC > + */ > + > +#include <linux/cpuhotplug.h> > +#include <linux/mm.h> > +#include <linux/slab.h> > +#include <linux/scs.h> > +#include <linux/vmalloc.h> > +#include <asm/scs.h> > + > +static inline void *__scs_base(struct task_struct *tsk) > +{ > + return (void *)((uintptr_t)task_scs(tsk) & ~(SCS_SIZE - 1)); > +} > + > +#ifdef CONFIG_SHADOW_CALL_STACK_VMAP > + > +/* Keep a cache of shadow stacks */ > +#define SCS_CACHE_SIZE 2 > +static DEFINE_PER_CPU(void *, scs_cache[SCS_CACHE_SIZE]); > + > +static void *scs_alloc(int node) > +{ > + int i; > + > + for (i = 0; i < SCS_CACHE_SIZE; i++) { > + void *s; > + > + s = this_cpu_xchg(scs_cache[i], NULL); > + if (s) { > + memset(s, 0, SCS_SIZE); > + return s; > + } > + } > + > + BUILD_BUG_ON(SCS_SIZE > PAGE_SIZE); > + > + return __vmalloc_node_range(PAGE_SIZE, SCS_SIZE, > + VMALLOC_START, VMALLOC_END, > + GFP_SCS, PAGE_KERNEL, 0, > + node, __builtin_return_address(0)); > +} > + > +static void scs_free(void *s) > +{ > + int i; > + > + for (i = 0; i < SCS_CACHE_SIZE; i++) { > + if (this_cpu_cmpxchg(scs_cache[i], 0, s) != 0) > + continue; > + > + return; > + } prefer: for ...: if foo() == 0: return to: for ...: if foo() != 0: continue return > + > + vfree_atomic(s); > +} > + > +static int scs_cleanup(unsigned int cpu) > +{ > + int i; > + void **cache = per_cpu_ptr(scs_cache, cpu); > + > + for (i = 0; i < SCS_CACHE_SIZE; i++) { > + vfree(cache[i]); > + cache[i] = NULL; > + } > + > + return 0; > +} > + > +void __init scs_init(void) > +{ > + cpuhp_setup_state(CPUHP_BP_PREPARE_DYN, "scs:scs_cache", NULL, > + scs_cleanup); > +} > + > +#else /* !CONFIG_SHADOW_CALL_STACK_VMAP */ > + > +static struct kmem_cache *scs_cache; > + > +static inline void *scs_alloc(int node) > +{ > + return kmem_cache_alloc_node(scs_cache, GFP_SCS, node); > +} > + > +static inline void scs_free(void *s) > +{ > + kmem_cache_free(scs_cache, s); > +} > + > +void __init scs_init(void) > +{ > + scs_cache = kmem_cache_create("scs_cache", SCS_SIZE, SCS_SIZE, > + 0, NULL); > + WARN_ON(!scs_cache); > +} > + > +#endif /* CONFIG_SHADOW_CALL_STACK_VMAP */ > + > +static inline unsigned long *scs_magic(struct task_struct *tsk) > +{ > + return (unsigned long *)(__scs_base(tsk) + SCS_SIZE - sizeof(long)); > +} > + > +static inline void scs_set_magic(struct task_struct *tsk) > +{ > + *scs_magic(tsk) = SCS_END_MAGIC; > +} > + > +void scs_task_init(struct task_struct *tsk) > +{ > + task_set_scs(tsk, NULL); > +} > + > +void scs_task_reset(struct task_struct *tsk) > +{ > + task_set_scs(tsk, __scs_base(tsk)); > +} > + > +int scs_prepare(struct task_struct *tsk, int node) > +{ > + void *s; > + > + s = scs_alloc(node); > + if (!s) > + return -ENOMEM; > + > + task_set_scs(tsk, s); > + scs_set_magic(tsk); > + > + return 0; > +} > + > +bool scs_corrupted(struct task_struct *tsk) > +{ > + return *scs_magic(tsk) != SCS_END_MAGIC; > +} > + > +void scs_release(struct task_struct *tsk) > +{ > + void *s; > + > + s = __scs_base(tsk); > + if (!s) > + return; > + > + WARN_ON(scs_corrupted(tsk)); > + > + scs_task_init(tsk); > + scs_free(s); > +} > -- > 2.24.0.rc0.303.g954a862665-goog >
On Fri, Oct 25, 2019 at 3:56 AM Mark Rutland <mark.rutland@arm.com> wrote: > > +#define SCS_SIZE 1024 > > I think it'd be worth a comment on how this size was chosen. IIRC this > empirical? Correct. I'll add a comment. > > +#define SCS_END_MAGIC 0xaf0194819b1635f6UL > > Keyboard smash? ... or is there a prize for whoever figures out the > secret? ;) It's a random number, so if someone figures out a secret in it, they'll definitely deserve a prize. :) > > +static inline void task_set_scs(struct task_struct *tsk, void *s) > > +{ > > + task_thread_info(tsk)->shadow_call_stack = s; > > +} > > This should probably be named get and set, or have: > > #define task_scs(tsk) (task_thread_info(tsk)->shadow_call_stack) > > ... which can have a trivial implementation as NULL for the !SCS case. Sure, sounds good. > For all the trivial wrappers you can put the implementation on the same > line as the prototype. That makes it a bit easier to compare against the > prototypes on the other side of the ifdeffery. > > e.g. this lot can be: > > static inline void *task_scs(struct task_struct *tsk) { return 0; } > static inline void task_set_scs(struct task_struct *tsk, void *s) { } > static inline void scs_init(void) { } Agreed. > > diff --git a/kernel/fork.c b/kernel/fork.c > > index bcdf53125210..ae7ebe9f0586 100644 > > --- a/kernel/fork.c > > +++ b/kernel/fork.c > > @@ -94,6 +94,7 @@ > > #include <linux/livepatch.h> > > #include <linux/thread_info.h> > > #include <linux/stackleak.h> > > +#include <linux/scs.h> > > Nit: alphabetical order, please (this should come before stackleak.h). The includes in kernel/fork.c aren't in alphabetical order, so I just added this to the end here. > > + retval = scs_prepare(p, node); > > + if (retval) > > + goto bad_fork_cleanup_thread; > > Can we please fold scs_prepare() into scs_task_init() and do this in > dup_task_struct()? That way we set this up consistently in one place, > where we're also allocating the regular stack. Yes, that does sound cleaner. I'll change that. > > + scs_task_reset(idle); > > I'm a bit confused by this -- please see comments below on > scs_task_reset(). > > +static inline void *__scs_base(struct task_struct *tsk) > > +{ > > + return (void *)((uintptr_t)task_scs(tsk) & ~(SCS_SIZE - 1)); > > +} > > We only ever assign the base to task_scs(tsk), with the current live > value being in a register that we don't read. Are we expecting arch code > to keep this up-to-date with the register value? > > I would have expected that we just leave this as the base (as we do for > the regular stack in the task struct), and it's down to arch code to > save/restore the current value where necessary. > > Am I missing some caveat with that approach? To keep the address of the currently active shadow stack out of memory, the arm64 implementation clears this field when it loads x18 and saves the current value before a context switch. The generic code doesn't expect the arch code to necessarily do so, but does allow it. This requires us to use __scs_base() when accessing the base pointer and to reset it in idle tasks before they're reused, hence scs_task_reset(). > > + BUILD_BUG_ON(SCS_SIZE > PAGE_SIZE); > > It's probably worth a comment on why we rely on SCS_SIZE <= PAGE_SIZE. Ack. > > +static inline unsigned long *scs_magic(struct task_struct *tsk) > > +{ > > + return (unsigned long *)(__scs_base(tsk) + SCS_SIZE - sizeof(long)); > > Slightly simpler as: > > return (unsigned long *)(__scs_base(tsk) + SCS_SIZE) - 1; Yes, that's a bit cleaner. I'll fix these in v3. Thank you for the review! Sami
On Fri, Oct 25, 2019 at 9:22 AM Nick Desaulniers <ndesaulniers@google.com> wrote: > > +static void scs_free(void *s) > > +{ > > + int i; > > + > > + for (i = 0; i < SCS_CACHE_SIZE; i++) { > > + if (this_cpu_cmpxchg(scs_cache[i], 0, s) != 0) > > + continue; > > + > > + return; > > + } > > prefer: > > for ...: > if foo() == 0: > return > > to: > > for ...: > if foo() != 0: > continue > return This was essentially copied from free_thread_stack in kernel/fork.c, but I agree, your way is cleaner. I'll change this in the next version. Thanks! Sami
On Thu, 2019-10-24 at 15:51 -0700, samitolvanen@google.com wrote: > This change adds generic support for Clang's Shadow Call Stack, > which uses a shadow stack to protect return addresses from being > overwritten by an attacker. Details are available here: [] > diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h [] > @@ -42,3 +42,9 @@ > * compilers, like ICC. > */ > #define barrier() __asm__ __volatile__("" : : : "memory") > + > +#if __has_feature(shadow_call_stack) > +# define __noscs __attribute__((no_sanitize("shadow-call-stack"))) __no_sanitize__
Hi Joe, On Sat, Oct 26, 2019 at 8:57 AM Joe Perches <joe@perches.com> wrote: > > +#if __has_feature(shadow_call_stack) > > +# define __noscs __attribute__((no_sanitize("shadow-call-stack"))) > > __no_sanitize__ Sorry, I missed your earlier message about this. I'm following Clang's documentation for the attribute: https://clang.llvm.org/docs/ShadowCallStack.html#attribute-no-sanitize-shadow-call-stack Although __no_sanitize__ seems to work too. Is there a particular reason to prefer that form over the one in the documentation? Sami
On Mon, Oct 28, 2019 at 8:31 AM Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> wrote: > We decided to do it like that when I introduced compiler_attributes.h. > > Given it is hidden behind a definition, we don't care about which one we use internally; therefore the idea was to avoid clashes as much as possible with other names/definitions/etc. > > The syntax is supported in the compilers we care about (for docs on attributes, the best reference is GCC's by the way). Got it, thank you for explaining. I'll change this to __no_sanitize__ in v3 since Clang seems to be happy with either version. Sami
On Fri, Oct 25, 2019 at 01:49:21PM -0700, Sami Tolvanen wrote: > On Fri, Oct 25, 2019 at 3:56 AM Mark Rutland <mark.rutland@arm.com> wrote: > > > +#define SCS_END_MAGIC 0xaf0194819b1635f6UL > > > > Keyboard smash? ... or is there a prize for whoever figures out the > > secret? ;) > > It's a random number, so if someone figures out a secret in it, > they'll definitely deserve a prize. :) I'll Cc some treasure hunters. :) > > > diff --git a/kernel/fork.c b/kernel/fork.c > > > index bcdf53125210..ae7ebe9f0586 100644 > > > --- a/kernel/fork.c > > > +++ b/kernel/fork.c > > > @@ -94,6 +94,7 @@ > > > #include <linux/livepatch.h> > > > #include <linux/thread_info.h> > > > #include <linux/stackleak.h> > > > +#include <linux/scs.h> > > > > Nit: alphabetical order, please (this should come before stackleak.h). > > The includes in kernel/fork.c aren't in alphabetical order, so I just > added this to the end here. Fair enough. It looked otherwise in the context, and we generally aim for that as a soft rule. [...] > > > +static inline void *__scs_base(struct task_struct *tsk) > > > +{ > > > + return (void *)((uintptr_t)task_scs(tsk) & ~(SCS_SIZE - 1)); > > > +} > > > > We only ever assign the base to task_scs(tsk), with the current live > > value being in a register that we don't read. Are we expecting arch code > > to keep this up-to-date with the register value? > > > > I would have expected that we just leave this as the base (as we do for > > the regular stack in the task struct), and it's down to arch code to > > save/restore the current value where necessary. > > > > Am I missing some caveat with that approach? > > To keep the address of the currently active shadow stack out of > memory, the arm64 implementation clears this field when it loads x18 > and saves the current value before a context switch. The generic code > doesn't expect the arch code to necessarily do so, but does allow it. > This requires us to use __scs_base() when accessing the base pointer > and to reset it in idle tasks before they're reused, hence > scs_task_reset(). Ok. That'd be worth a comment somewhere, since it adds a number of things which would otherwise be unnecessary. IIUC this assumes an adversary who knows the address of a task's thread_info, and has an arbitrary-read (to extract the SCS base from thead_info) and an arbitrary-write (to modify the SCS area). Assuming that's the case, I don't think this buys much. If said adversary controls two userspace threads A and B, they only need to wait until A is context-switched out or in userspace, and read A's SCS base using B. Given that, I'd rather always store the SCS base in the thread_info, and simplify the rest of the code manipulating it. Thanks, Mark.
On Mon, Oct 28, 2019 at 04:35:33PM +0000, Mark Rutland wrote: > On Fri, Oct 25, 2019 at 01:49:21PM -0700, Sami Tolvanen wrote: > > To keep the address of the currently active shadow stack out of > > memory, the arm64 implementation clears this field when it loads x18 > > and saves the current value before a context switch. The generic code > > doesn't expect the arch code to necessarily do so, but does allow it. > > This requires us to use __scs_base() when accessing the base pointer > > and to reset it in idle tasks before they're reused, hence > > scs_task_reset(). > > Ok. That'd be worth a comment somewhere, since it adds a number of > things which would otherwise be unnecessary. > > IIUC this assumes an adversary who knows the address of a task's > thread_info, and has an arbitrary-read (to extract the SCS base from > thead_info) and an arbitrary-write (to modify the SCS area). > > Assuming that's the case, I don't think this buys much. If said > adversary controls two userspace threads A and B, they only need to wait > until A is context-switched out or in userspace, and read A's SCS base > using B. > > Given that, I'd rather always store the SCS base in the thread_info, and > simplify the rest of the code manipulating it. I'd like to keep this as-is since it provides a temporal protection. Having arbitrary kernel read and write at arbitrary time is a very powerful attack primitive, and is, IMO, not very common. Many attacks tend to be chains of bugs that give attackers narrow visibility in to the kernel at specific moments. I would say this design is more about stopping "current" from dumping thread_info (as there are many more opportunities for current to see its own thread_info compared to arbitrary addresses or another task's thread_info). As such, I think it's a reasonable precaution to take.
On Mon, Oct 28, 2019 at 12:57 PM Kees Cook <keescook@chromium.org> wrote: > On Mon, Oct 28, 2019 at 04:35:33PM +0000, Mark Rutland wrote: > > On Fri, Oct 25, 2019 at 01:49:21PM -0700, Sami Tolvanen wrote: > > > To keep the address of the currently active shadow stack out of > > > memory, the arm64 implementation clears this field when it loads x18 > > > and saves the current value before a context switch. The generic code > > > doesn't expect the arch code to necessarily do so, but does allow it. > > > This requires us to use __scs_base() when accessing the base pointer > > > and to reset it in idle tasks before they're reused, hence > > > scs_task_reset(). > > > > Ok. That'd be worth a comment somewhere, since it adds a number of > > things which would otherwise be unnecessary. > > > > IIUC this assumes an adversary who knows the address of a task's > > thread_info, and has an arbitrary-read (to extract the SCS base from > > thead_info) and an arbitrary-write (to modify the SCS area). > > > > Assuming that's the case, I don't think this buys much. If said > > adversary controls two userspace threads A and B, they only need to wait > > until A is context-switched out or in userspace, and read A's SCS base > > using B. > > > > Given that, I'd rather always store the SCS base in the thread_info, and > > simplify the rest of the code manipulating it. > > I'd like to keep this as-is since it provides a temporal protection. > Having arbitrary kernel read and write at arbitrary time is a very > powerful attack primitive, and is, IMO, not very common. Many attacks > tend to be chains of bugs that give attackers narrow visibility in to the > kernel at specific moments. I would say this design is more about stopping > "current" from dumping thread_info (as there are many more opportunities > for current to see its own thread_info compared to arbitrary addresses > or another task's thread_info). As such, I think it's a reasonable > precaution to take. I'm not sure if always storing the base address in thread_info would simplify the code that much. We could remove __scs_base() and scs_task_reset(), which are both trivial, and drop a few instructions in the arch-specific code that clear the field. I do agree that a comment or two would help understand what's going on here though. Sami
diff --git a/Makefile b/Makefile index 5475cdb6d57d..2b5c59fb18f2 100644 --- a/Makefile +++ b/Makefile @@ -846,6 +846,12 @@ ifdef CONFIG_LIVEPATCH KBUILD_CFLAGS += $(call cc-option, -flive-patching=inline-clone) endif +ifdef CONFIG_SHADOW_CALL_STACK +CC_FLAGS_SCS := -fsanitize=shadow-call-stack +KBUILD_CFLAGS += $(CC_FLAGS_SCS) +export CC_FLAGS_SCS +endif + # arch Makefile may override CC so keep this after arch Makefile is included NOSTDINC_FLAGS += -nostdinc -isystem $(shell $(CC) -print-file-name=include) diff --git a/arch/Kconfig b/arch/Kconfig index 5f8a5d84dbbe..5e34cbcd8d6a 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -521,6 +521,39 @@ config STACKPROTECTOR_STRONG about 20% of all kernel functions, which increases the kernel code size by about 2%. +config ARCH_SUPPORTS_SHADOW_CALL_STACK + bool + help + An architecture should select this if it supports Clang's Shadow + Call Stack, has asm/scs.h, and implements runtime support for shadow + stack switching. + +config SHADOW_CALL_STACK_VMAP + bool + depends on SHADOW_CALL_STACK + help + Use virtually mapped shadow call stacks. Selecting this option + provides better stack exhaustion protection, but increases per-thread + memory consumption as a full page is allocated for each shadow stack. + +config SHADOW_CALL_STACK + bool "Clang Shadow Call Stack" + depends on ARCH_SUPPORTS_SHADOW_CALL_STACK + help + This option enables Clang's Shadow Call Stack, which uses a + shadow stack to protect function return addresses from being + overwritten by an attacker. More information can be found from + Clang's documentation: + + https://clang.llvm.org/docs/ShadowCallStack.html + + Note that security guarantees in the kernel differ from the ones + documented for user space. The kernel must store addresses of shadow + stacks used by other tasks and interrupt handlers in memory, which + means an attacker capable reading and writing arbitrary memory may + be able to locate them and hijack control flow by modifying shadow + stacks that are not currently in use. + config HAVE_ARCH_WITHIN_STACK_FRAMES bool help diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h index 333a6695a918..afe5e24088b2 100644 --- a/include/linux/compiler-clang.h +++ b/include/linux/compiler-clang.h @@ -42,3 +42,9 @@ * compilers, like ICC. */ #define barrier() __asm__ __volatile__("" : : : "memory") + +#if __has_feature(shadow_call_stack) +# define __noscs __attribute__((no_sanitize("shadow-call-stack"))) +#else +# define __noscs +#endif diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h index 72393a8c1a6c..be5d5be4b1ae 100644 --- a/include/linux/compiler_types.h +++ b/include/linux/compiler_types.h @@ -202,6 +202,10 @@ struct ftrace_likely_data { # define randomized_struct_fields_end #endif +#ifndef __noscs +# define __noscs +#endif + #ifndef asm_volatile_goto #define asm_volatile_goto(x...) asm goto(x) #endif diff --git a/include/linux/scs.h b/include/linux/scs.h new file mode 100644 index 000000000000..c8b0ccfdd803 --- /dev/null +++ b/include/linux/scs.h @@ -0,0 +1,78 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Shadow Call Stack support. + * + * Copyright (C) 2018 Google LLC + */ + +#ifndef _LINUX_SCS_H +#define _LINUX_SCS_H + +#include <linux/gfp.h> +#include <linux/sched.h> +#include <asm/page.h> + +#ifdef CONFIG_SHADOW_CALL_STACK + +#define SCS_SIZE 1024 +#define SCS_END_MAGIC 0xaf0194819b1635f6UL + +#define GFP_SCS (GFP_KERNEL | __GFP_ZERO) + +static inline void *task_scs(struct task_struct *tsk) +{ + return task_thread_info(tsk)->shadow_call_stack; +} + +static inline void task_set_scs(struct task_struct *tsk, void *s) +{ + task_thread_info(tsk)->shadow_call_stack = s; +} + +extern void scs_init(void); +extern void scs_task_init(struct task_struct *tsk); +extern void scs_task_reset(struct task_struct *tsk); +extern int scs_prepare(struct task_struct *tsk, int node); +extern bool scs_corrupted(struct task_struct *tsk); +extern void scs_release(struct task_struct *tsk); + +#else /* CONFIG_SHADOW_CALL_STACK */ + +static inline void *task_scs(struct task_struct *tsk) +{ + return 0; +} + +static inline void task_set_scs(struct task_struct *tsk, void *s) +{ +} + +static inline void scs_init(void) +{ +} + +static inline void scs_task_init(struct task_struct *tsk) +{ +} + +static inline void scs_task_reset(struct task_struct *tsk) +{ +} + +static inline int scs_prepare(struct task_struct *tsk, int node) +{ + return 0; +} + +static inline bool scs_corrupted(struct task_struct *tsk) +{ + return false; +} + +static inline void scs_release(struct task_struct *tsk) +{ +} + +#endif /* CONFIG_SHADOW_CALL_STACK */ + +#endif /* _LINUX_SCS_H */ diff --git a/init/init_task.c b/init/init_task.c index 9e5cbe5eab7b..cbd40460e903 100644 --- a/init/init_task.c +++ b/init/init_task.c @@ -11,6 +11,7 @@ #include <linux/mm.h> #include <linux/audit.h> #include <linux/numa.h> +#include <linux/scs.h> #include <asm/pgtable.h> #include <linux/uaccess.h> @@ -184,6 +185,13 @@ struct task_struct init_task }; EXPORT_SYMBOL(init_task); +#ifdef CONFIG_SHADOW_CALL_STACK +unsigned long init_shadow_call_stack[SCS_SIZE / sizeof(long)] __init_task_data + __aligned(SCS_SIZE) = { + [(SCS_SIZE / sizeof(long)) - 1] = SCS_END_MAGIC +}; +#endif + /* * Initial thread structure. Alignment of this is handled by a special * linker map entry. diff --git a/kernel/Makefile b/kernel/Makefile index daad787fb795..313dbd44d576 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -102,6 +102,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/ obj-$(CONFIG_IRQ_WORK) += irq_work.o obj-$(CONFIG_CPU_PM) += cpu_pm.o obj-$(CONFIG_BPF) += bpf/ +obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o obj-$(CONFIG_PERF_EVENTS) += events/ diff --git a/kernel/fork.c b/kernel/fork.c index bcdf53125210..ae7ebe9f0586 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -94,6 +94,7 @@ #include <linux/livepatch.h> #include <linux/thread_info.h> #include <linux/stackleak.h> +#include <linux/scs.h> #include <asm/pgtable.h> #include <asm/pgalloc.h> @@ -451,6 +452,8 @@ void put_task_stack(struct task_struct *tsk) void free_task(struct task_struct *tsk) { + scs_release(tsk); + #ifndef CONFIG_THREAD_INFO_IN_TASK /* * The task is finally done with both the stack and thread_info, @@ -834,6 +837,8 @@ void __init fork_init(void) NULL, free_vm_stack_cache); #endif + scs_init(); + lockdep_init_task(&init_task); uprobes_init(); } @@ -907,6 +912,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node) clear_user_return_notifier(tsk); clear_tsk_need_resched(tsk); set_task_stack_end_magic(tsk); + scs_task_init(tsk); #ifdef CONFIG_STACKPROTECTOR tsk->stack_canary = get_random_canary(); @@ -2022,6 +2028,9 @@ static __latent_entropy struct task_struct *copy_process( args->tls); if (retval) goto bad_fork_cleanup_io; + retval = scs_prepare(p, node); + if (retval) + goto bad_fork_cleanup_thread; stackleak_task_init(p); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index dd05a378631a..e7faeb383008 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6013,6 +6013,8 @@ void init_idle(struct task_struct *idle, int cpu) raw_spin_lock_irqsave(&idle->pi_lock, flags); raw_spin_lock(&rq->lock); + scs_task_reset(idle); + __sched_fork(0, idle); idle->state = TASK_RUNNING; idle->se.exec_start = sched_clock(); diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 0db2c1b3361e..c153003a011c 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -58,6 +58,7 @@ #include <linux/profile.h> #include <linux/psi.h> #include <linux/rcupdate_wait.h> +#include <linux/scs.h> #include <linux/security.h> #include <linux/stop_machine.h> #include <linux/suspend.h> diff --git a/kernel/scs.c b/kernel/scs.c new file mode 100644 index 000000000000..383d29e8c199 --- /dev/null +++ b/kernel/scs.c @@ -0,0 +1,155 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Shadow Call Stack support. + * + * Copyright (C) 2019 Google LLC + */ + +#include <linux/cpuhotplug.h> +#include <linux/mm.h> +#include <linux/slab.h> +#include <linux/scs.h> +#include <linux/vmalloc.h> +#include <asm/scs.h> + +static inline void *__scs_base(struct task_struct *tsk) +{ + return (void *)((uintptr_t)task_scs(tsk) & ~(SCS_SIZE - 1)); +} + +#ifdef CONFIG_SHADOW_CALL_STACK_VMAP + +/* Keep a cache of shadow stacks */ +#define SCS_CACHE_SIZE 2 +static DEFINE_PER_CPU(void *, scs_cache[SCS_CACHE_SIZE]); + +static void *scs_alloc(int node) +{ + int i; + + for (i = 0; i < SCS_CACHE_SIZE; i++) { + void *s; + + s = this_cpu_xchg(scs_cache[i], NULL); + if (s) { + memset(s, 0, SCS_SIZE); + return s; + } + } + + BUILD_BUG_ON(SCS_SIZE > PAGE_SIZE); + + return __vmalloc_node_range(PAGE_SIZE, SCS_SIZE, + VMALLOC_START, VMALLOC_END, + GFP_SCS, PAGE_KERNEL, 0, + node, __builtin_return_address(0)); +} + +static void scs_free(void *s) +{ + int i; + + for (i = 0; i < SCS_CACHE_SIZE; i++) { + if (this_cpu_cmpxchg(scs_cache[i], 0, s) != 0) + continue; + + return; + } + + vfree_atomic(s); +} + +static int scs_cleanup(unsigned int cpu) +{ + int i; + void **cache = per_cpu_ptr(scs_cache, cpu); + + for (i = 0; i < SCS_CACHE_SIZE; i++) { + vfree(cache[i]); + cache[i] = NULL; + } + + return 0; +} + +void __init scs_init(void) +{ + cpuhp_setup_state(CPUHP_BP_PREPARE_DYN, "scs:scs_cache", NULL, + scs_cleanup); +} + +#else /* !CONFIG_SHADOW_CALL_STACK_VMAP */ + +static struct kmem_cache *scs_cache; + +static inline void *scs_alloc(int node) +{ + return kmem_cache_alloc_node(scs_cache, GFP_SCS, node); +} + +static inline void scs_free(void *s) +{ + kmem_cache_free(scs_cache, s); +} + +void __init scs_init(void) +{ + scs_cache = kmem_cache_create("scs_cache", SCS_SIZE, SCS_SIZE, + 0, NULL); + WARN_ON(!scs_cache); +} + +#endif /* CONFIG_SHADOW_CALL_STACK_VMAP */ + +static inline unsigned long *scs_magic(struct task_struct *tsk) +{ + return (unsigned long *)(__scs_base(tsk) + SCS_SIZE - sizeof(long)); +} + +static inline void scs_set_magic(struct task_struct *tsk) +{ + *scs_magic(tsk) = SCS_END_MAGIC; +} + +void scs_task_init(struct task_struct *tsk) +{ + task_set_scs(tsk, NULL); +} + +void scs_task_reset(struct task_struct *tsk) +{ + task_set_scs(tsk, __scs_base(tsk)); +} + +int scs_prepare(struct task_struct *tsk, int node) +{ + void *s; + + s = scs_alloc(node); + if (!s) + return -ENOMEM; + + task_set_scs(tsk, s); + scs_set_magic(tsk); + + return 0; +} + +bool scs_corrupted(struct task_struct *tsk) +{ + return *scs_magic(tsk) != SCS_END_MAGIC; +} + +void scs_release(struct task_struct *tsk) +{ + void *s; + + s = __scs_base(tsk); + if (!s) + return; + + WARN_ON(scs_corrupted(tsk)); + + scs_task_init(tsk); + scs_free(s); +}
This change adds generic support for Clang's Shadow Call Stack, which uses a shadow stack to protect return addresses from being overwritten by an attacker. Details are available here: https://clang.llvm.org/docs/ShadowCallStack.html Note that security guarantees in the kernel differ from the ones documented for user space. The kernel must store addresses of shadow stacks used by other tasks and interrupt handlers in memory, which means an attacker capable reading and writing arbitrary memory may be able to locate them and hijack control flow by modifying shadow stacks that are not currently in use. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> --- Makefile | 6 ++ arch/Kconfig | 33 +++++++ include/linux/compiler-clang.h | 6 ++ include/linux/compiler_types.h | 4 + include/linux/scs.h | 78 +++++++++++++++++ init/init_task.c | 8 ++ kernel/Makefile | 1 + kernel/fork.c | 9 ++ kernel/sched/core.c | 2 + kernel/sched/sched.h | 1 + kernel/scs.c | 155 +++++++++++++++++++++++++++++++++ 11 files changed, 303 insertions(+) create mode 100644 include/linux/scs.h create mode 100644 kernel/scs.c