| Message ID | 20210427204315.24153-24-yu-cheng.yu@intel.com |
|---|---|
| State | New, archived |
| Series | Control-flow Enforcement: Shadow Stack |
On Tue, Apr 27, 2021 at 01:43:08PM -0700, Yu-cheng Yu wrote:
> @@ -181,6 +184,12 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
>  	if (clone_flags & CLONE_SETTLS)
>  		ret = set_new_tls(p, tls);
>
> +#ifdef CONFIG_X86_64

IS_ENABLED

> +	/* Allocate a new shadow stack for pthread */
> +	if (!ret)
> +		ret = shstk_setup_thread(p, clone_flags, stack_size);
> +#endif
> +

And why is this addition here...

>  	if (!ret && unlikely(test_tsk_thread_flag(current, TIF_IO_BITMAP)))
>  		io_bitmap_share(p);

... instead of here?

<---

> diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
> index c815c7507830..d387df84b7f1 100644
> --- a/arch/x86/kernel/shstk.c
> +++ b/arch/x86/kernel/shstk.c
> @@ -70,6 +70,55 @@ int shstk_setup(void)
>  	return 0;
>  }

> +int shstk_setup_thread(struct task_struct *tsk, unsigned long clone_flags,

Judging by what this function does, its name wants to be

	shstk_alloc_thread_stack()

or so?

> +			       unsigned long stack_size)
> +{
> +	unsigned long addr, size;
> +	struct cet_user_state *state;
> +	struct cet_status *cet = &tsk->thread.cet;

The tip-tree preferred ordering of variable declarations at the
beginning of a function is reverse fir tree order::

	struct long_struct_name *descriptive_name;
	unsigned long foo, bar;
	unsigned int tmp;
	int ret;

The above is faster to parse than the reverse ordering::

	int ret;
	unsigned int tmp;
	unsigned long foo, bar;
	struct long_struct_name *descriptive_name;

And even more so than random ordering::

	unsigned long foo, bar;
	int ret;
	struct long_struct_name *descriptive_name;
	unsigned int tmp;

> +
> +	if (!cet->shstk_size)
> +		return 0;
> +

This check needs a comment.

> +	if ((clone_flags & (CLONE_VFORK | CLONE_VM)) != CLONE_VM)
> +		return 0;
> +
> +	state = get_xsave_addr(&tsk->thread.fpu.state.xsave,
> +			       XFEATURE_CET_USER);

Let that line stick out.

> +
> +	if (!state)
> +		return -EINVAL;
> +
> +	if (stack_size == 0)

	if (!stack_size)

> +		return -EINVAL;

and that test needs to be done first in the function.

> +
> +	/* Cap shadow stack size to 4 GB */

Why?

> +	size = min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G);
> +	size = min(size, stack_size);
> +
> +	/*
> +	 * Compat-mode pthreads share a limited address space.
> +	 * If each function call takes an average of four slots
> +	 * stack space, allocate 1/4 of stack size for shadow stack.
> +	 */
> +	if (in_compat_syscall())
> +		size /= 4;

<---- newline here.

> +	size = round_up(size, PAGE_SIZE);
> +	addr = alloc_shstk(size);
> +

^ Superfluous newline.

> +	if (IS_ERR_VALUE(addr)) {
> +		cet->shstk_base = 0;
> +		cet->shstk_size = 0;
> +		return PTR_ERR((void *)addr);
> +	}
> +
> +	fpu__prepare_write(&tsk->thread.fpu);
> +	state->user_ssp = (u64)(addr + size);

cet_user_state has u64, cet_status has unsigned longs. Make them all u64.

And since cet_status is per thread, but I had suggested struct
shstk_desc, I think now that that should be called

	struct thread_shstk

or so to denote *exactly* what it is.

Thx.
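(One possible shape of the IS_ENABLED() form being asked for — a sketch, not the posted patch. It assumes the !CONFIG_X86_SHADOW_STACK stub of shstk_setup_thread() in cet.h keeps the call compilable on all configurations, so the preprocessor guard can become a plain C condition that the compiler eliminates:

	/* Allocate a new shadow stack for pthread */
	if (!ret && IS_ENABLED(CONFIG_X86_64))
		ret = shstk_setup_thread(p, clone_flags, stack_size);

)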
On 5/10/2021 7:15 AM, Borislav Petkov wrote:
> On Tue, Apr 27, 2021 at 01:43:08PM -0700, Yu-cheng Yu wrote:

[...]

>> diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
>> index c815c7507830..d387df84b7f1 100644
>> --- a/arch/x86/kernel/shstk.c
>> +++ b/arch/x86/kernel/shstk.c
>> @@ -70,6 +70,55 @@ int shstk_setup(void)
>>  	return 0;
>>  }
>
>> +int shstk_setup_thread(struct task_struct *tsk, unsigned long clone_flags,
>
> Judging by what this function does, its name wants to be
>
> 	shstk_alloc_thread_stack()
>
> or so?
>
>> +			       unsigned long stack_size)
>> +{
>> +	unsigned long addr, size;
>> +	struct cet_user_state *state;
>> +	struct cet_status *cet = &tsk->thread.cet;
>
> The tip-tree preferred ordering of variable declarations at the
> beginning of a function is reverse fir tree order::
>
> 	struct long_struct_name *descriptive_name;
> 	unsigned long foo, bar;
> 	unsigned int tmp;
> 	int ret;
>
> The above is faster to parse than the reverse ordering::
>
> 	int ret;
> 	unsigned int tmp;
> 	unsigned long foo, bar;
> 	struct long_struct_name *descriptive_name;
>
> And even more so than random ordering::
>
> 	unsigned long foo, bar;
> 	int ret;
> 	struct long_struct_name *descriptive_name;
> 	unsigned int tmp;
>
>> +
>> +	if (!cet->shstk_size)
>> +		return 0;
>> +
>
> This check needs a comment.
>
>> +	if ((clone_flags & (CLONE_VFORK | CLONE_VM)) != CLONE_VM)
>> +		return 0;
>> +
>> +	state = get_xsave_addr(&tsk->thread.fpu.state.xsave,
>> +			       XFEATURE_CET_USER);
>
> Let that line stick out.
>
>> +
>> +	if (!state)
>> +		return -EINVAL;
>> +
>> +	if (stack_size == 0)
>
> 	if (!stack_size)
>
>> +		return -EINVAL;
>
> and that test needs to be done first in the function.
>
>> +
>> +	/* Cap shadow stack size to 4 GB */
>
> Why?
>

This is not necessary.  I will make it just stack_size, which is passed
in from copy_thread().

>> +	size = min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G);
>> +	size = min(size, stack_size);
>> +
>> +	/*
>> +	 * Compat-mode pthreads share a limited address space.
>> +	 * If each function call takes an average of four slots
>> +	 * stack space, allocate 1/4 of stack size for shadow stack.
>> +	 */
>> +	if (in_compat_syscall())
>> +		size /= 4;
>
> <---- newline here.
>
>> +	size = round_up(size, PAGE_SIZE);
>> +	addr = alloc_shstk(size);
>> +
>
> ^ Superfluous newline.
>
>> +	if (IS_ERR_VALUE(addr)) {
>> +		cet->shstk_base = 0;
>> +		cet->shstk_size = 0;
>> +		return PTR_ERR((void *)addr);
>> +	}
>> +
>> +	fpu__prepare_write(&tsk->thread.fpu);
>> +	state->user_ssp = (u64)(addr + size);
>
> cet_user_state has u64, cet_status has unsigned longs. Make them all u64.
>
> And since cet_status is per thread, but I had suggested struct
> shstk_desc, I think now that that should be called
>
> 	struct thread_shstk
>
> or so to denote *exactly* what it is.
>

So this struct will be:

	struct thread_shstk {
		u64	shstk_base;
		u64	shstk_size;
		u64	locked:1;
		u64	ibt:1;
	};

Ok?

Thanks,
Yu-cheng
On Mon, May 10, 2021 at 03:57:56PM -0700, Yu, Yu-cheng wrote:
> So this struct will be:
>
> 	struct thread_shstk {
> 		u64	shstk_base;
> 		u64	shstk_size;
> 		u64	locked:1;
> 		u64	ibt:1;
> 	};
>
> Ok?

Pretty much.

You can even remove the "shstk_" from the members and when you call the
pointer "shstk", accessing the members will read

	shstk->base
	shstk->size
	...

and all is organic and readable :)

Thx.
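(Put together, the rename being suggested might look like the sketch below — hypothetical; the field name "shstk" in thread_struct is an assumption, not part of the posted patch:

	struct thread_shstk {
		u64	base;		/* was shstk_base */
		u64	size;		/* was shstk_size */
		u64	locked:1;
		u64	ibt:1;
	};

	/* with a pointer named "shstk", accesses then read naturally: */
	struct thread_shstk *shstk = &tsk->thread.shstk;

	shstk->base = addr;
	shstk->size = size;

)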
On 5/10/2021 7:15 AM, Borislav Petkov wrote:
> On Tue, Apr 27, 2021 at 01:43:08PM -0700, Yu-cheng Yu wrote:
>> @@ -181,6 +184,12 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
>>  	if (clone_flags & CLONE_SETTLS)
>>  		ret = set_new_tls(p, tls);
>>
>> +#ifdef CONFIG_X86_64
>
> IS_ENABLED
>
>> +	/* Allocate a new shadow stack for pthread */
>> +	if (!ret)
>> +		ret = shstk_setup_thread(p, clone_flags, stack_size);
>> +#endif
>> +
>
> And why is this addition here...
>
>>  	if (!ret && unlikely(test_tsk_thread_flag(current, TIF_IO_BITMAP)))
>>  		io_bitmap_share(p);
>
> ... instead of here?
>
> <---
>

io_bitmap_share() does refcount_inc(&current->thread.io_bitmap->refcnt),
and the function won't fail.  However, shadow stack allocation can fail.
So, maybe leave io_bitmap_share() at the end?

Thanks,
Yu-cheng
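(In other words, keeping the fallible step ahead of the infallible one means nothing has to be unwound on failure. A sketch of the tail of copy_thread() under that reasoning — illustrative only, not the posted code:

	if (clone_flags & CLONE_SETTLS)
		ret = set_new_tls(p, tls);

	/* May fail: the shadow stack allocation can return an error */
	if (!ret)
		ret = shstk_setup_thread(p, clone_flags, stack_size);

	/* Won't fail: only bumps the io_bitmap refcount, so do it last */
	if (!ret && unlikely(test_tsk_thread_flag(current, TIF_IO_BITMAP)))
		io_bitmap_share(p);

)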
From: Borislav Petkov
> Sent: 11 May 2021 18:10
>
> On Mon, May 10, 2021 at 03:57:56PM -0700, Yu, Yu-cheng wrote:
> > So this struct will be:
> >
> > 	struct thread_shstk {
> > 		u64	shstk_base;
> > 		u64	shstk_size;
> > 		u64	locked:1;
> > 		u64	ibt:1;

No point in bit fields?

> > 	};
> >
> > Ok?
>
> Pretty much.
>
> You can even remove the "shstk_" from the members and when you call the
> pointer "shstk", accessing the members will read
>
> 	shstk->base
> 	shstk->size
> 	...
>
> and all is organic and readable :)

And entirely not greppable.

	David
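(For comparison, a bitfield-free layout along the lines David is hinting at might look like this — hypothetical sketch; two bools pack into the same padding the single-bit fields would occupy:

	struct thread_shstk {
		u64	base;
		u64	size;
		bool	locked;
		bool	ibt;
	};

)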
On Tue, May 11, 2021 at 11:35:03AM -0700, Yu, Yu-cheng wrote:
> io_bitmap_share() does refcount_inc(&current->thread.io_bitmap->refcnt),
> and the function won't fail.  However, shadow stack allocation can fail.
> So, maybe leave io_bitmap_share() at the end?

Yap, makes sense.

Thx.
diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
index aa85d599b184..8b83ded577cc 100644
--- a/arch/x86/include/asm/cet.h
+++ b/arch/x86/include/asm/cet.h
@@ -16,10 +16,15 @@ struct cet_status {
 
 #ifdef CONFIG_X86_SHADOW_STACK
 int shstk_setup(void);
+int shstk_setup_thread(struct task_struct *p, unsigned long clone_flags,
+		       unsigned long stack_size);
 void shstk_free(struct task_struct *p);
 void shstk_disable(void);
 #else
 static inline int shstk_setup(void) { return 0; }
+static inline int shstk_setup_thread(struct task_struct *p,
+				     unsigned long clone_flags,
+				     unsigned long stack_size) { return 0; }
 static inline void shstk_free(struct task_struct *p) {}
 static inline void shstk_disable(void) {}
 #endif
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 27516046117a..53569114aa01 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -11,6 +11,7 @@
 #include <asm/tlbflush.h>
 #include <asm/paravirt.h>
+#include <asm/cet.h>
 #include <asm/debugreg.h>
 
 extern atomic64_t last_mm_ctx_id;
@@ -146,6 +147,8 @@ do {						\
 #else
 #define deactivate_mm(tsk, mm)			\
 do {						\
+	if (!tsk->vfork_done)			\
+		shstk_free(tsk);		\
 	load_gs_index(0);			\
 	loadsegment(fs, 0);			\
 } while (0)
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 9c214d7085a4..fa01e8679d01 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -43,6 +43,7 @@
 #include <asm/io_bitmap.h>
 #include <asm/proto.h>
 #include <asm/frame.h>
+#include <asm/cet.h>
 
 #include "process.h"
 
@@ -109,6 +110,7 @@ void exit_thread(struct task_struct *tsk)
 
 	free_vm86(t);
 
+	shstk_free(tsk);
 	fpu__drop(fpu);
 }
 
@@ -122,8 +124,9 @@ static int set_new_tls(struct task_struct *p, unsigned long tls)
 		return do_set_thread_area_64(p, ARCH_SET_FS, tls);
 }
 
-int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
-		struct task_struct *p, unsigned long tls)
+int copy_thread(unsigned long clone_flags, unsigned long sp,
+		unsigned long stack_size, struct task_struct *p,
+		unsigned long tls)
 {
 	struct inactive_task_frame *frame;
 	struct fork_frame *fork_frame;
@@ -163,7 +166,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
 	/* Kernel thread ? */
 	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
 		memset(childregs, 0, sizeof(struct pt_regs));
-		kthread_frame_init(frame, sp, arg);
+		kthread_frame_init(frame, sp, stack_size);
 		return 0;
 	}
 
@@ -181,6 +184,12 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
 	if (clone_flags & CLONE_SETTLS)
 		ret = set_new_tls(p, tls);
 
+#ifdef CONFIG_X86_64
+	/* Allocate a new shadow stack for pthread */
+	if (!ret)
+		ret = shstk_setup_thread(p, clone_flags, stack_size);
+#endif
+
 	if (!ret && unlikely(test_tsk_thread_flag(current, TIF_IO_BITMAP)))
 		io_bitmap_share(p);
 
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index c815c7507830..d387df84b7f1 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -70,6 +70,55 @@ int shstk_setup(void)
 	return 0;
 }
 
+int shstk_setup_thread(struct task_struct *tsk, unsigned long clone_flags,
+		       unsigned long stack_size)
+{
+	unsigned long addr, size;
+	struct cet_user_state *state;
+	struct cet_status *cet = &tsk->thread.cet;
+
+	if (!cet->shstk_size)
+		return 0;
+
+	if ((clone_flags & (CLONE_VFORK | CLONE_VM)) != CLONE_VM)
+		return 0;
+
+	state = get_xsave_addr(&tsk->thread.fpu.state.xsave,
+			       XFEATURE_CET_USER);
+
+	if (!state)
+		return -EINVAL;
+
+	if (stack_size == 0)
+		return -EINVAL;
+
+	/* Cap shadow stack size to 4 GB */
+	size = min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G);
+	size = min(size, stack_size);
+
+	/*
+	 * Compat-mode pthreads share a limited address space.
+	 * If each function call takes an average of four slots
+	 * stack space, allocate 1/4 of stack size for shadow stack.
+	 */
+	if (in_compat_syscall())
+		size /= 4;
+	size = round_up(size, PAGE_SIZE);
+	addr = alloc_shstk(size);
+
+	if (IS_ERR_VALUE(addr)) {
+		cet->shstk_base = 0;
+		cet->shstk_size = 0;
+		return PTR_ERR((void *)addr);
+	}
+
+	fpu__prepare_write(&tsk->thread.fpu);
+	state->user_ssp = (u64)(addr + size);
+	cet->shstk_base = addr;
+	cet->shstk_size = size;
+	return 0;
+}
+
 void shstk_free(struct task_struct *tsk)
 {
 	struct cet_status *cet = &tsk->thread.cet;
@@ -79,7 +128,13 @@ void shstk_free(struct task_struct *tsk)
 	    !cet->shstk_base)
 		return;
 
-	if (!tsk->mm)
+	/*
+	 * When fork() with CLONE_VM fails, the child (tsk) already has a
+	 * shadow stack allocated, and exit_thread() calls this function to
+	 * free it.  In this case the parent (current) and the child is
+	 * sharing the same mm struct.
+	 */
+	if (!tsk->mm || tsk->mm != current->mm)
 		return;
 
 	while (1) {
For clone() with CLONE_VM specified, the child and the parent must have
separate shadow stacks.  Thus, the kernel allocates a new shadow stack
for the child and frees it on thread exit.

Use the stack_size passed from the clone3() syscall for the thread
shadow stack size, but cap it to min(RLIMIT_STACK, 4 GB).  A
compat-mode thread shadow stack is further reduced to 1/4 of that size,
which allows more threads to run in a 32-bit address space.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/include/asm/cet.h         |  5 +++
 arch/x86/include/asm/mmu_context.h |  3 ++
 arch/x86/kernel/process.c          | 15 ++++++--
 arch/x86/kernel/shstk.c            | 57 +++++++++++++++++++++++++++++-
 4 files changed, 76 insertions(+), 4 deletions(-)
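(As a worked example of this sizing policy, assuming the common 8 MiB RLIMIT_STACK default: a 64-bit thread created via clone3() with stack_size = 2 MiB gets min(2 MiB, 8 MiB, 4 GB) = 2 MiB of shadow stack, while the same thread in compat mode gets 2 MiB / 4 = 512 KiB, each rounded up to a page boundary.)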