Message ID | 20220929222936.14584-26-rick.p.edgecombe@intel.com (mailing list archive) |
---|---|
State | New |
Series | Shadowstacks for userspace |
On Thu, Sep 29, 2022 at 03:29:22PM -0700, Rick Edgecombe wrote:
> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> 
> When a process is duplicated, but the child shares the address space with
> the parent, there is potential for the threads sharing a single stack to
> cause conflicts for each other. In the normal non-cet case this is handled
> in two ways.
> 
> With regular CLONE_VM a new stack is provided by userspace such that the
> parent and child have different stacks.
> 
> For vfork, the parent is suspended until the child exits. So as long as
> the child doesn't return from the vfork()/CLONE_VFORK calling function and
> sticks to a limited set of operations, the parent and child can share the
> same stack.
> 
> For shadow stack, these scenarios present similar sharing problems. For the
> CLONE_VM case, the child and the parent must have separate shadow stacks.
> Instead of changing clone to take a shadow stack, have the kernel just
> allocate one and switch to it.
> 
> Use stack_size passed from clone3() syscall for thread shadow stack size. A
> compat-mode thread shadow stack size is further reduced to 1/4. This
> allows more threads to run in a 32-bit address space. The clone() does not
> pass stack_size, which was added to clone3(). In that case, use
> RLIMIT_STACK size and cap to 4 GB.
> 
> For shadow stack enabled vfork(), the parent and child can share the same
> shadow stack, like they can share a normal stack. Since the parent is
> suspended until the child terminates, the child will not interfere with
> the parent while executing as long as it doesn't return from the vfork()
> and overwrite up the shadow stack. The child can safely overwrite down
> the shadow stack, as the parent can just overwrite this later. So CET does
> not add any additional limitations for vfork().
> 
> Userspace implementing posix vfork() can actually prevent the child from
> returning from the vfork() calling function, using CET. Glibc does this
> by adjusting the shadow stack pointer in the child, so that the child
> receives a #CP if it tries to return from vfork() calling function.
> 
> Free the shadow stack on thread exit by doing it in mm_release(). Skip
> this when exiting a vfork() child since the stack is shared in the
> parent.
> 
> During this operation, the shadow stack pointer of the new thread needs
> to be updated to point to the newly allocated shadow stack. Since the
> ability to do this is confined to the FPU subsystem, change
> fpu_clone() to take the new shadow stack pointer, and update it
> internally inside the FPU subsystem. This part was suggested by Thomas
> Gleixner.
> 
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> 
> ---
> 
> v2:
>  - Have fpu_clone() take new shadow stack pointer and update SSP in
>    xsave buffer for new task. (tglx)
> 
> v1:
>  - Expand commit log.
>  - Add more comments.
>  - Switch to xsave helpers.
> 
> Yu-cheng v30:
>  - Update comments about clone()/clone3(). (Borislav Petkov)
> 
> Yu-cheng v29:
>  - WARN_ON_ONCE() when get_xsave_addr() returns NULL, and update comments.
>    (Dave Hansen)
> 
>  arch/x86/include/asm/cet.h         |  7 +++++
>  arch/x86/include/asm/fpu/sched.h   |  3 +-
>  arch/x86/include/asm/mmu_context.h |  2 ++
>  arch/x86/kernel/fpu/core.c         | 40 ++++++++++++++++++++++++-
>  arch/x86/kernel/process.c          | 17 ++++++++++-
>  arch/x86/kernel/shstk.c            | 48 +++++++++++++++++++++++++++++-
>  6 files changed, 113 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index 778d3054ccc7..f332e9b42b6d 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -555,8 +555,40 @@ static inline void fpu_inherit_perms(struct fpu *dst_fpu)
>  	}
>  }
> 
> +#ifdef CONFIG_X86_SHADOW_STACK
> +static int update_fpu_shstk(struct task_struct *dst, unsigned long ssp)
> +{
> +	struct cet_user_state *xstate;
> +
> +	/* If ssp update is not needed. */
> +	if (!ssp)
> +		return 0;
> +
> +	xstate = get_xsave_addr(&dst->thread.fpu.fpstate->regs.xsave,
> +				XFEATURE_CET_USER);
> +
> +	/*
> +	 * If there is a non-zero ssp, then 'dst' must be configured with a shadow
> +	 * stack and the fpu state should be up to date since it was just copied
> +	 * from the parent in fpu_clone(). So there must be a valid non-init CET
> +	 * state location in the buffer.
> +	 */
> +	if (WARN_ON_ONCE(!xstate))
> +		return 1;
> +
> +	xstate->user_ssp = (u64)ssp;
> +
> +	return 0;
> +}
> +#else
> +static int update_fpu_shstk(struct task_struct *dst, unsigned long shstk_addr)
> +{
> 
	return 0; ?

> +}
> +#endif
> +
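To make the sizing rule in the commit message above concrete: the per-thread shadow stack size comes from clone3()'s stack_size when one is passed, otherwise from RLIMIT_STACK capped at 4 GB, and compat-mode threads are further reduced to 1/4. A minimal sketch of that policy for illustration only -- the helper name and the exact placement of the compat reduction are assumptions, not code from this patch:

static unsigned long calc_thread_shstk_size(unsigned long stack_size, bool compat)
{
	/* clone() passes no stack_size; fall back to RLIMIT_STACK, capped at 4 GB */
	if (!stack_size)
		stack_size = min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G);

	/* Compat-mode threads get 1/4 the size so more threads fit in a 32-bit address space */
	if (compat)
		stack_size /= 4;

	return PAGE_ALIGN(stack_size);
}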
On Mon, 2022-10-03 at 13:36 +0300, Mike Rapoport wrote:
> > +static int update_fpu_shstk(struct task_struct *dst, unsigned long
> > shstk_addr)
> > +{
> 
> 	return 0; ?
> 
> > +}
> > +#endif
> > +

Oops. It was a last minute change to have update_fpu_shstk() return
int. Thanks.
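Given Mike's catch and the reply above, the !CONFIG_X86_SHADOW_STACK stub presumably just needs an explicit return value now that update_fpu_shstk() returns int. A minimal sketch of the corrected fallback, assuming the fix is exactly the "return 0" Mike suggests:

#else
static int update_fpu_shstk(struct task_struct *dst, unsigned long shstk_addr)
{
	/* No shadow stack support configured: nothing to update, report success. */
	return 0;
}
#endif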
On Thu, Sep 29, 2022 at 03:29:22PM -0700, Rick Edgecombe wrote:
> [...]
> +#ifdef CONFIG_X86_SHADOW_STACK
> +static int update_fpu_shstk(struct task_struct *dst, unsigned long ssp)
> +{
> +	struct cet_user_state *xstate;
> +
> +	/* If ssp update is not needed. */
> +	if (!ssp)
> +		return 0;

My brain will work to undo the collision of Shadow Stack Pointer with
Stack Smashing Protection. ;)

> [...]
> diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
> index a0b8d4adb2bf..db4e53f9fdaf 100644
> --- a/arch/x86/kernel/shstk.c
> +++ b/arch/x86/kernel/shstk.c
> @@ -118,6 +118,46 @@ void reset_thread_shstk(void)
> 	current->thread.features_locked = 0;
>  }
> 
> +int shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long clone_flags,
> +			     unsigned long stack_size, unsigned long *shstk_addr)

Er, arg 3 is "stack_size". From later:

> +	ret = shstk_alloc_thread_stack(p, clone_flags, args->flags, &shstk_addr);
                                                       ^^^^^^^^^^^

clone_flags and args->flags are identical ... this must be accidentally
working. I was expecting 0 there.

> +{
> +	struct thread_shstk *shstk = &tsk->thread.shstk;
> +	unsigned long addr;
> +
> +	/*
> +	 * If shadow stack is not enabled on the new thread, skip any
> +	 * switch to a new shadow stack.
> +	 */
> +	if (!feature_enabled(CET_SHSTK))
> +		return 0;
> +
> +	/*
> +	 * clone() does not pass stack_size, which was added to clone3().
> +	 * Use RLIMIT_STACK and cap to 4 GB.
> +	 */
> +	if (!stack_size)
> +		stack_size = min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G);

Again, perhaps the clamp should happen in alloc_shstk()?

> +
> +	/*
> +	 * For CLONE_VM, except vfork, the child needs a separate shadow
> +	 * stack.
> +	 */
> +	if ((clone_flags & (CLONE_VFORK | CLONE_VM)) != CLONE_VM)
> +		return 0;
> +
> +
> +	stack_size = PAGE_ALIGN(stack_size);

Uhm, I think a line went missing here. :P

"x86/cet/shstk: Introduce map_shadow_stack syscall" adds the missing:

+	addr = alloc_shstk(0, stack_size, 0, false);

Please add back the original. :)

> +	if (IS_ERR_VALUE(addr))
> +		return PTR_ERR((void *)addr);
> +
> +	shstk->base = addr;
> +	shstk->size = stack_size;
> +
> +	*shstk_addr = addr + stack_size;
> +
> +	return 0;
> +}
> +
>  void shstk_free(struct task_struct *tsk)
>  {
>  	struct thread_shstk *shstk = &tsk->thread.shstk;
> @@ -126,7 +166,13 @@ void shstk_free(struct task_struct *tsk)
>  	    !feature_enabled(CET_SHSTK))
>  		return;
> 
> -	if (!tsk->mm)
> +	/*
> +	 * When fork() with CLONE_VM fails, the child (tsk) already has a
> +	 * shadow stack allocated, and exit_thread() calls this function to
> +	 * free it. In this case the parent (current) and the child share
> +	 * the same mm struct.
> +	 */
> +	if (!tsk->mm || tsk->mm != current->mm)
> 		return;
> 
>  	unmap_shadow_stack(shstk->base, shstk->size);
> -- 
> 2.17.1
> 
On Mon, 2022-10-03 at 13:29 -0700, Kees Cook wrote:
> On Thu, Sep 29, 2022 at 03:29:22PM -0700, Rick Edgecombe wrote:
> > [...]
> > +#ifdef CONFIG_X86_SHADOW_STACK
> > +static int update_fpu_shstk(struct task_struct *dst, unsigned long
> > ssp)
> > +{
> > +	struct cet_user_state *xstate;
> > +
> > +	/* If ssp update is not needed. */
> > +	if (!ssp)
> > +		return 0;
> 
> My brain will work to undo the collision of Shadow Stack Pointer with
> Stack Smashing Protection. ;)
> 
> > [...]
> > diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
> > index a0b8d4adb2bf..db4e53f9fdaf 100644
> > --- a/arch/x86/kernel/shstk.c
> > +++ b/arch/x86/kernel/shstk.c
> > @@ -118,6 +118,46 @@ void reset_thread_shstk(void)
> > 	current->thread.features_locked = 0;
> >  }
> > 
> > +int shstk_alloc_thread_stack(struct task_struct *tsk, unsigned
> > long clone_flags,
> > +			     unsigned long stack_size, unsigned long
> > *shstk_addr)
> 
> Er, arg 3 is "stack_size". From later:
> 
> > +	ret = shstk_alloc_thread_stack(p, clone_flags, args->flags,
> > &shstk_addr);
> 
> ^^^^^^^^^^^
> 
> clone_flags and args->flags are identical ... this must be
> accidentally
> working. I was expecting 0 there.

Oh wow. A stack_size used to be passed into copy_thread(), but I messed
up the rebase badly. Thanks for catching it.

> 
> > +{
> > +	struct thread_shstk *shstk = &tsk->thread.shstk;
> > +	unsigned long addr;
> > +
> > +	/*
> > +	 * If shadow stack is not enabled on the new thread, skip any
> > +	 * switch to a new shadow stack.
> > +	 */
> > +	if (!feature_enabled(CET_SHSTK))
> > +		return 0;
> > +
> > +	/*
> > +	 * clone() does not pass stack_size, which was added to
> > clone3().
> > +	 * Use RLIMIT_STACK and cap to 4 GB.
> > +	 */
> > +	if (!stack_size)
> > +		stack_size = min_t(unsigned long long,
> > rlimit(RLIMIT_STACK), SZ_4G);
> 
> Again, perhaps the clamp should happen in alloc_shstk()?

The map_shadow_stack() is kind of like mmap(). I think it shouldn't get
the rlimit restriction. But I can pull the shared logic into a helper
for the other two cases.

> 
> > +
> > +	/*
> > +	 * For CLONE_VM, except vfork, the child needs a separate
> > shadow
> > +	 * stack.
> > +	 */
> > +	if ((clone_flags & (CLONE_VFORK | CLONE_VM)) != CLONE_VM)
> > +		return 0;
> > +
> > +
> > +	stack_size = PAGE_ALIGN(stack_size);
> 
> Uhm, I think a line went missing here. :P
> 
> "x86/cet/shstk: Introduce map_shadow_stack syscall" adds the missing:
> 
> +	addr = alloc_shstk(0, stack_size, 0, false);
> 
> Please add back the original. :)

Yes, more rebase mangling. Thanks.

> 
> > +	if (IS_ERR_VALUE(addr))
> > +		return PTR_ERR((void *)addr);
> > +
> > +	shstk->base = addr;
> > +	shstk->size = stack_size;
> > +
> > +	*shstk_addr = addr + stack_size;
> > +
> > +	return 0;
> > +}
> > +
> >  void shstk_free(struct task_struct *tsk)
> >  {
> >  	struct thread_shstk *shstk = &tsk->thread.shstk;
> > @@ -126,7 +166,13 @@ void shstk_free(struct task_struct *tsk)
> >  	    !feature_enabled(CET_SHSTK))
> >  		return;
> > 
> > -	if (!tsk->mm)
> > +	/*
> > +	 * When fork() with CLONE_VM fails, the child (tsk) already has
> > a
> > +	 * shadow stack allocated, and exit_thread() calls this
> > function to
> > +	 * free it. In this case the parent (current) and the child
> > share
> > +	 * the same mm struct.
> > +	 */
> > +	if (!tsk->mm || tsk->mm != current->mm)
> > 		return;
> > 
> >  	unmap_shadow_stack(shstk->base, shstk->size);
> > -- 
> > 2.17.1
> > 
> 
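As for the args->flags mix-up, the reply above says a stack_size used to be passed into copy_thread(), so the corrected call site would presumably pass the clone3() stack size through instead of repeating the flags. A sketch of what that would look like -- using args->stack_size from struct kernel_clone_args is an assumption here, not something shown in this patch:

	/* Allocate a new shadow stack for pthread if needed */
	ret = shstk_alloc_thread_stack(p, clone_flags, args->stack_size, &shstk_addr);
	if (ret)
		return ret;

	fpu_clone(p, clone_flags, args->fn, shstk_addr);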
diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
index a4a1f4c0089b..924de99e0c61 100644
--- a/arch/x86/include/asm/cet.h
+++ b/arch/x86/include/asm/cet.h
@@ -16,6 +16,9 @@ struct thread_shstk {
 long cet_prctl(struct task_struct *task, int option, unsigned long features);
 int shstk_setup(void);
+int shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags,
+			     unsigned long stack_size,
+			     unsigned long *shstk_addr);
 void shstk_free(struct task_struct *p);
 int shstk_disable(void);
 void reset_thread_shstk(void);
@@ -23,6 +26,10 @@ void reset_thread_shstk(void);
 static inline long cet_prctl(struct task_struct *task, int option,
			     unsigned long features) { return -EINVAL; }
 static inline int shstk_setup(void) { return -EOPNOTSUPP; }
+static inline int shstk_alloc_thread_stack(struct task_struct *p,
+					   unsigned long clone_flags,
+					   unsigned long stack_size,
+					   unsigned long *shstk_addr) { return 0; }
 static inline void shstk_free(struct task_struct *p) {}
 static inline int shstk_disable(void) { return -EOPNOTSUPP; }
 static inline void reset_thread_shstk(void) {}
diff --git a/arch/x86/include/asm/fpu/sched.h b/arch/x86/include/asm/fpu/sched.h
index b2486b2cbc6e..54c9c2fd1907 100644
--- a/arch/x86/include/asm/fpu/sched.h
+++ b/arch/x86/include/asm/fpu/sched.h
@@ -11,7 +11,8 @@ extern void save_fpregs_to_fpstate(struct fpu *fpu);
 extern void fpu__drop(struct fpu *fpu);
-extern int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal);
+extern int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
+		     unsigned long shstk_addr);
 extern void fpu_flush_thread(void);
 
 /*
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index b8d40ddeab00..d29988cbdf20 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -146,6 +146,8 @@ do {						\
 #else
 #define deactivate_mm(tsk, mm)			\
 do {						\
+	if (!tsk->vfork_done)			\
+		shstk_free(tsk);		\
 	load_gs_index(0);			\
 	loadsegment(fs, 0);			\
 } while (0)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 778d3054ccc7..f332e9b42b6d 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -555,8 +555,40 @@ static inline void fpu_inherit_perms(struct fpu *dst_fpu)
 	}
 }
 
+#ifdef CONFIG_X86_SHADOW_STACK
+static int update_fpu_shstk(struct task_struct *dst, unsigned long ssp)
+{
+	struct cet_user_state *xstate;
+
+	/* If ssp update is not needed. */
+	if (!ssp)
+		return 0;
+
+	xstate = get_xsave_addr(&dst->thread.fpu.fpstate->regs.xsave,
+				XFEATURE_CET_USER);
+
+	/*
+	 * If there is a non-zero ssp, then 'dst' must be configured with a shadow
+	 * stack and the fpu state should be up to date since it was just copied
+	 * from the parent in fpu_clone(). So there must be a valid non-init CET
+	 * state location in the buffer.
+	 */
+	if (WARN_ON_ONCE(!xstate))
+		return 1;
+
+	xstate->user_ssp = (u64)ssp;
+
+	return 0;
+}
+#else
+static int update_fpu_shstk(struct task_struct *dst, unsigned long shstk_addr)
+{
+}
+#endif
+
 /* Clone current's FPU state on fork */
-int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal)
+int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
+	      unsigned long ssp)
 {
 	struct fpu *src_fpu = &current->thread.fpu;
 	struct fpu *dst_fpu = &dst->thread.fpu;
@@ -616,6 +648,12 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal)
 	if (use_xsave())
 		dst_fpu->fpstate->regs.xsave.header.xfeatures &= ~XFEATURE_MASK_PASID;
 
+	/*
+	 * Update shadow stack pointer, in case it changed during clone.
+	 */
+	if (update_fpu_shstk(dst, ssp))
+		return 1;
+
 	trace_x86_fpu_copy_src(src_fpu);
 	trace_x86_fpu_copy_dst(dst_fpu);
 
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 034880311e6b..5e63d190becd 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -47,6 +47,7 @@
 #include <asm/frame.h>
 #include <asm/unwind.h>
 #include <asm/tdx.h>
+#include <asm/cet.h>
 
 #include "process.h"
 
@@ -118,6 +119,7 @@ void exit_thread(struct task_struct *tsk)
 
 	free_vm86(t);
 
+	shstk_free(tsk);
 	fpu__drop(fpu);
 }
 
@@ -139,6 +141,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 	struct inactive_task_frame *frame;
 	struct fork_frame *fork_frame;
 	struct pt_regs *childregs;
+	unsigned long shstk_addr = 0;
 	int ret = 0;
 
 	childregs = task_pt_regs(p);
@@ -173,7 +176,12 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 	frame->flags = X86_EFLAGS_FIXED;
 #endif
 
-	fpu_clone(p, clone_flags, args->fn);
+	/* Allocate a new shadow stack for pthread if needed */
+	ret = shstk_alloc_thread_stack(p, clone_flags, args->flags, &shstk_addr);
+	if (ret)
+		return ret;
+
+	fpu_clone(p, clone_flags, args->fn, shstk_addr);
 
 	/* Kernel thread ? */
 	if (unlikely(p->flags & PF_KTHREAD)) {
@@ -219,6 +227,13 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 	if (!ret && unlikely(test_tsk_thread_flag(current, TIF_IO_BITMAP)))
 		io_bitmap_share(p);
 
+	/*
+	 * If copy_thread() if failing, don't leak the shadow stack possibly
+	 * allocated in shstk_alloc_thread_stack() above.
+	 */
+	if (ret)
+		shstk_free(p);
+
 	return ret;
 }
 
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index a0b8d4adb2bf..db4e53f9fdaf 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -118,6 +118,46 @@ void reset_thread_shstk(void)
 	current->thread.features_locked = 0;
 }
 
+int shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long clone_flags,
+			     unsigned long stack_size, unsigned long *shstk_addr)
+{
+	struct thread_shstk *shstk = &tsk->thread.shstk;
+	unsigned long addr;
+
+	/*
+	 * If shadow stack is not enabled on the new thread, skip any
+	 * switch to a new shadow stack.
+	 */
+	if (!feature_enabled(CET_SHSTK))
+		return 0;
+
+	/*
+	 * clone() does not pass stack_size, which was added to clone3().
+	 * Use RLIMIT_STACK and cap to 4 GB.
+	 */
+	if (!stack_size)
+		stack_size = min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G);
+
+	/*
+	 * For CLONE_VM, except vfork, the child needs a separate shadow
+	 * stack.
+	 */
+	if ((clone_flags & (CLONE_VFORK | CLONE_VM)) != CLONE_VM)
+		return 0;
+
+
+	stack_size = PAGE_ALIGN(stack_size);
+	if (IS_ERR_VALUE(addr))
+		return PTR_ERR((void *)addr);
+
+	shstk->base = addr;
+	shstk->size = stack_size;
+
+	*shstk_addr = addr + stack_size;
+
+	return 0;
+}
+
 void shstk_free(struct task_struct *tsk)
 {
 	struct thread_shstk *shstk = &tsk->thread.shstk;
@@ -126,7 +166,13 @@ void shstk_free(struct task_struct *tsk)
 	    !feature_enabled(CET_SHSTK))
 		return;
 
-	if (!tsk->mm)
+	/*
+	 * When fork() with CLONE_VM fails, the child (tsk) already has a
+	 * shadow stack allocated, and exit_thread() calls this function to
+	 * free it. In this case the parent (current) and the child share
+	 * the same mm struct.
+	 */
+	if (!tsk->mm || tsk->mm != current->mm)
 		return;
 
 	unmap_shadow_stack(shstk->base, shstk->size);
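For reference, restoring the line Kees points out went missing in the rebase, the allocation step of shstk_alloc_thread_stack() would presumably read roughly as below. The alloc_shstk() call is taken verbatim from Kees' quote of the "x86/cet/shstk: Introduce map_shadow_stack syscall" patch, so treat its signature as an assumption here rather than something defined by this patch:

	stack_size = PAGE_ALIGN(stack_size);

	/* Actually allocate the new thread's shadow stack before checking 'addr' */
	addr = alloc_shstk(0, stack_size, 0, false);
	if (IS_ERR_VALUE(addr))
		return PTR_ERR((void *)addr);

	shstk->base = addr;
	shstk->size = stack_size;

	*shstk_addr = addr + stack_size;

	return 0;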