Message ID | 20180607143807.3611-5-yu-cheng.yu@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote: > > When fork() specifies CLONE_VM but not CLONE_VFORK, the child > needs a separate program stack and a separate shadow stack. > This patch handles allocation and freeing of the thread shadow > stack. Aha -- you're trying to make this automatic. I'm not convinced this is a good idea. The Linux kernel has a long and storied history of enabling new hardware features in ways that are almost entirely useless for userspace. Florian, do you have any thoughts on how the user/kernel interaction for the shadow stack should work? My intuition would be that all shadow stack management should be entirely controlled by userspace -- newly cloned threads (with CLONE_VM) should have no shadow stack initially, and newly started processes should have no shadow stack until they ask for one. If it would be needed for optimization, there could some indication in an ELF binary that it is requesting an initial shadow stack. But maybe some kind of automation like this patch does is actually reasonable. --Andy
On 06/07/2018 08:21 PM, Andy Lutomirski wrote: > On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote: >> >> When fork() specifies CLONE_VM but not CLONE_VFORK, the child >> needs a separate program stack and a separate shadow stack. >> This patch handles allocation and freeing of the thread shadow >> stack. > > Aha -- you're trying to make this automatic. I'm not convinced this > is a good idea. The Linux kernel has a long and storied history of > enabling new hardware features in ways that are almost entirely > useless for userspace. > > Florian, do you have any thoughts on how the user/kernel interaction > for the shadow stack should work? I have not looked at this in detail, have not played with the emulator, and have not been privy to any discussions before these patches have been posted, however … I believe that we want as little code in userspace for shadow stack management as possible. One concern I have is that even with the code we arguably need for various kinds of stack unwinding, we might have unwittingly built a generic trampoline that leads to full CET bypass. I also expect that we'd only have donor mappings in userspace anyway, and that the memory is not actually accessible from userspace if it is used for a shadow stack. > My intuition would be that all > shadow stack management should be entirely controlled by userspace -- > newly cloned threads (with CLONE_VM) should have no shadow stack > initially, and newly started processes should have no shadow stack > until they ask for one. If the new thread doesn't have a shadow stack, we need to disable signals around clone, and we are very likely forced to rewrite the early thread setup in assembler, to avoid spurious calls (including calls to thunks to get EIP on i386). I wouldn't want to do this If we can avoid it. Just using C and hoping to get away with it doesn't sound greater, either. And obviously there is the matter that the initial thread setup code ends up being that universal trampoline. Thanks, Florian
On Thu, Jun 7, 2018 at 12:47 PM Florian Weimer <fweimer@redhat.com> wrote: > > On 06/07/2018 08:21 PM, Andy Lutomirski wrote: > > On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote: > >> > >> When fork() specifies CLONE_VM but not CLONE_VFORK, the child > >> needs a separate program stack and a separate shadow stack. > >> This patch handles allocation and freeing of the thread shadow > >> stack. > > > > Aha -- you're trying to make this automatic. I'm not convinced this > > is a good idea. The Linux kernel has a long and storied history of > > enabling new hardware features in ways that are almost entirely > > useless for userspace. > > > > Florian, do you have any thoughts on how the user/kernel interaction > > for the shadow stack should work? > > I have not looked at this in detail, have not played with the emulator, > and have not been privy to any discussions before these patches have > been posted, however … > > I believe that we want as little code in userspace for shadow stack > management as possible. One concern I have is that even with the code > we arguably need for various kinds of stack unwinding, we might have > unwittingly built a generic trampoline that leads to full CET bypass. I was imagining an API like "allocate a shadow stack for the current thread, fail if the current thread already has one, and turn on the shadow stack". glibc would call clone and then call this ABI pretty much immediately (i.e. before making any calls from which it expects to return). We definitely want strong enough user control that tools like CRIU can continue to work. I haven't looked at the SDM recently enough to remember for sure, but I'm reasonably confident that user code can learn the address of its own shadow stack. If nothing else, CRIU needs to be able to restore from a context where there's a signal on the stack and the signal frame contains a shadow stack pointer. > > I also expect that we'd only have donor mappings in userspace anyway, > and that the memory is not actually accessible from userspace if it is > used for a shadow stack. > > > My intuition would be that all > > shadow stack management should be entirely controlled by userspace -- > > newly cloned threads (with CLONE_VM) should have no shadow stack > > initially, and newly started processes should have no shadow stack > > until they ask for one. > > If the new thread doesn't have a shadow stack, we need to disable > signals around clone, and we are very likely forced to rewrite the early > thread setup in assembler, to avoid spurious calls (including calls to > thunks to get EIP on i386). I wouldn't want to do this If we can avoid > it. Just using C and hoping to get away with it doesn't sound greater, > either. And obviously there is the matter that the initial thread setup > code ends up being that universal trampoline. > Only if the trampoline works if the shadow stack is already enabled. I could very easily be convinced that automatic shadow stack setup is a good idea, but I still think we need manual control for CRIU and such.
On 06/07/2018 10:53 PM, Andy Lutomirski wrote: > On Thu, Jun 7, 2018 at 12:47 PM Florian Weimer <fweimer@redhat.com> wrote: >> >> On 06/07/2018 08:21 PM, Andy Lutomirski wrote: >>> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote: >>>> >>>> When fork() specifies CLONE_VM but not CLONE_VFORK, the child >>>> needs a separate program stack and a separate shadow stack. >>>> This patch handles allocation and freeing of the thread shadow >>>> stack. >>> >>> Aha -- you're trying to make this automatic. I'm not convinced this >>> is a good idea. The Linux kernel has a long and storied history of >>> enabling new hardware features in ways that are almost entirely >>> useless for userspace. >>> >>> Florian, do you have any thoughts on how the user/kernel interaction >>> for the shadow stack should work? >> >> I have not looked at this in detail, have not played with the emulator, >> and have not been privy to any discussions before these patches have >> been posted, however … >> >> I believe that we want as little code in userspace for shadow stack >> management as possible. One concern I have is that even with the code >> we arguably need for various kinds of stack unwinding, we might have >> unwittingly built a generic trampoline that leads to full CET bypass. > > I was imagining an API like "allocate a shadow stack for the current > thread, fail if the current thread already has one, and turn on the > shadow stack". glibc would call clone and then call this ABI pretty > much immediately (i.e. before making any calls from which it expects > to return). Ahh. So you propose not to enable shadow stack enforcement on the new thread even if it is enabled for the current thread? For the cases where CLONE_VM is involved? It will still need a new assembler wrapper which sets up the shadow stack, and it's probably required to disable signals. I think it should be reasonable safe and actually implementable. But the benefits are not immediately obvious to me. > We definitely want strong enough user control that tools like CRIU can > continue to work. I haven't looked at the SDM recently enough to > remember for sure, but I'm reasonably confident that user code can > learn the address of its own shadow stack. If nothing else, CRIU > needs to be able to restore from a context where there's a signal on > the stack and the signal frame contains a shadow stack pointer. CRIU also needs the shadow stack *contents*, which shouldn't be directly available to the process. So it needs very special interfaces anyway. Does CRIU implement MPX support? Thanks, Florian
On Fri, Jun 8, 2018 at 7:53 AM Florian Weimer <fweimer@redhat.com> wrote: > > On 06/07/2018 10:53 PM, Andy Lutomirski wrote: > > On Thu, Jun 7, 2018 at 12:47 PM Florian Weimer <fweimer@redhat.com> wrote: > >> > >> On 06/07/2018 08:21 PM, Andy Lutomirski wrote: > >>> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote: > >>>> > >>>> When fork() specifies CLONE_VM but not CLONE_VFORK, the child > >>>> needs a separate program stack and a separate shadow stack. > >>>> This patch handles allocation and freeing of the thread shadow > >>>> stack. > >>> > >>> Aha -- you're trying to make this automatic. I'm not convinced this > >>> is a good idea. The Linux kernel has a long and storied history of > >>> enabling new hardware features in ways that are almost entirely > >>> useless for userspace. > >>> > >>> Florian, do you have any thoughts on how the user/kernel interaction > >>> for the shadow stack should work? > >> > >> I have not looked at this in detail, have not played with the emulator, > >> and have not been privy to any discussions before these patches have > >> been posted, however … > >> > >> I believe that we want as little code in userspace for shadow stack > >> management as possible. One concern I have is that even with the code > >> we arguably need for various kinds of stack unwinding, we might have > >> unwittingly built a generic trampoline that leads to full CET bypass. > > > > I was imagining an API like "allocate a shadow stack for the current > > thread, fail if the current thread already has one, and turn on the > > shadow stack". glibc would call clone and then call this ABI pretty > > much immediately (i.e. before making any calls from which it expects > > to return). > > Ahh. So you propose not to enable shadow stack enforcement on the new > thread even if it is enabled for the current thread? For the cases > where CLONE_VM is involved? > > It will still need a new assembler wrapper which sets up the shadow > stack, and it's probably required to disable signals. > > I think it should be reasonable safe and actually implementable. But > the benefits are not immediately obvious to me. Doing it this way would have been my first incliniation. It would avoid all the oddities of the kernel magically creating a VMA when clone() is called, guessing the shadow stack size, etc. But I'm okay with having the kernel do it automatically, too. I think it would be very nice to have a way for user code to find out the size of the shadow stack and change it, though. (And relocate it, but maybe that's impossible. The CET documentation doesn't have a clear description of the shadow stack layout.) > > > We definitely want strong enough user control that tools like CRIU can > > continue to work. I haven't looked at the SDM recently enough to > > remember for sure, but I'm reasonably confident that user code can > > learn the address of its own shadow stack. If nothing else, CRIU > > needs to be able to restore from a context where there's a signal on > > the stack and the signal frame contains a shadow stack pointer. > > CRIU also needs the shadow stack *contents*, which shouldn't be directly > available to the process. So it needs very special interfaces anyway. True. I proposed in a different email that ptrace() have full control of the shadow stack (read, write, lock, unlock, etc). > > Does CRIU implement MPX support? Dunno. But given that MPX seems to be dying, I'm not sure it matters. --Andy
On Fri, 2018-06-08 at 08:01 -0700, Andy Lutomirski wrote: > On Fri, Jun 8, 2018 at 7:53 AM Florian Weimer <fweimer@redhat.com> wrote: > > > > On 06/07/2018 10:53 PM, Andy Lutomirski wrote: > > > On Thu, Jun 7, 2018 at 12:47 PM Florian Weimer <fweimer@redhat.com> wrote: > > >> > > >> On 06/07/2018 08:21 PM, Andy Lutomirski wrote: > > >>> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote: > > >>>> > > >>>> When fork() specifies CLONE_VM but not CLONE_VFORK, the child > > >>>> needs a separate program stack and a separate shadow stack. > > >>>> This patch handles allocation and freeing of the thread shadow > > >>>> stack. > > >>> > > >>> Aha -- you're trying to make this automatic. I'm not convinced this > > >>> is a good idea. The Linux kernel has a long and storied history of > > >>> enabling new hardware features in ways that are almost entirely > > >>> useless for userspace. > > >>> > > >>> Florian, do you have any thoughts on how the user/kernel interaction > > >>> for the shadow stack should work? > > >> > > >> I have not looked at this in detail, have not played with the emulator, > > >> and have not been privy to any discussions before these patches have > > >> been posted, however … > > >> > > >> I believe that we want as little code in userspace for shadow stack > > >> management as possible. One concern I have is that even with the code > > >> we arguably need for various kinds of stack unwinding, we might have > > >> unwittingly built a generic trampoline that leads to full CET bypass. > > > > > > I was imagining an API like "allocate a shadow stack for the current > > > thread, fail if the current thread already has one, and turn on the > > > shadow stack". glibc would call clone and then call this ABI pretty > > > much immediately (i.e. before making any calls from which it expects > > > to return). > > > > Ahh. So you propose not to enable shadow stack enforcement on the new > > thread even if it is enabled for the current thread? For the cases > > where CLONE_VM is involved? > > > > It will still need a new assembler wrapper which sets up the shadow > > stack, and it's probably required to disable signals. > > > > I think it should be reasonable safe and actually implementable. But > > the benefits are not immediately obvious to me. > > Doing it this way would have been my first incliniation. It would > avoid all the oddities of the kernel magically creating a VMA when > clone() is called, guessing the shadow stack size, etc. But I'm okay > with having the kernel do it automatically, too. HJ wanted to add a arch_prctl that allocates a new shadow stack and switches to it. That was mainly for swapcontext. Perhaps we can also use that for threads? HJ, can you comment on this? > I think it would be > very nice to have a way for user code to find out the size of the > shadow stack and change it, though. (And relocate it, but maybe > that's impossible. The CET documentation doesn't have a clear > description of the shadow stack layout.) The shadow stack is vm_mmap'ed from memory and does not have any special layout. We can add a arch_prctl to find out shadow stack's address and size. > > > > > We definitely want strong enough user control that tools like CRIU can > > > continue to work. I haven't looked at the SDM recently enough to > > > remember for sure, but I'm reasonably confident that user code can > > > learn the address of its own shadow stack. If nothing else, CRIU > > > needs to be able to restore from a context where there's a signal on > > > the stack and the signal frame contains a shadow stack pointer. > > > > CRIU also needs the shadow stack *contents*, which shouldn't be directly > > available to the process. So it needs very special interfaces anyway. > > True. I proposed in a different email that ptrace() have full control > of the shadow stack (read, write, lock, unlock, etc). PTRACE can do PTRACE_POKEDATA on shadow stack. We can add lock/unlock. > > > > Does CRIU implement MPX support? > > Dunno. But given that MPX seems to be dying, I'm not sure it matters. > > --Andy
diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h index 5507469cb803..c8fd87e13859 100644 --- a/arch/x86/include/asm/cet.h +++ b/arch/x86/include/asm/cet.h @@ -19,6 +19,7 @@ struct cet_stat { unsigned long cet_get_shstk_ptr(void); int cet_push_shstk(int ia32, unsigned long ssp, unsigned long val); int cet_setup_shstk(void); +int cet_setup_thread_shstk(struct task_struct *p); void cet_disable_shstk(void); void cet_disable_free_shstk(struct task_struct *p); int cet_restore_signal(unsigned long ssp); @@ -28,6 +29,7 @@ static inline unsigned long cet_get_shstk_ptr(void) { return 0; } static inline int cet_push_shstk(int ia32, unsigned long ssp, unsigned long val) { return 0; } static inline int cet_setup_shstk(void) { return 0; } +static inline int cet_setup_thread_shstk(struct task_struct *p) { return 0; } static inline void cet_disable_shstk(void) {} static inline void cet_disable_free_shstk(struct task_struct *p) {} static inline int cet_restore_signal(unsigned long ssp) { return 0; } diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h index cf9911b5a53c..42395257efc3 100644 --- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -13,6 +13,7 @@ #include <asm/tlbflush.h> #include <asm/paravirt.h> #include <asm/mpx.h> +#include <asm/cet.h> extern atomic64_t last_mm_ctx_id; @@ -228,6 +229,8 @@ do { \ #else #define deactivate_mm(tsk, mm) \ do { \ + if (!tsk->vfork_done) \ + cet_disable_free_shstk(tsk); \ load_gs_index(0); \ loadsegment(fs, 0); \ } while (0) diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c index 6f445ce94c83..156f5d88ffd5 100644 --- a/arch/x86/kernel/cet.c +++ b/arch/x86/kernel/cet.c @@ -103,6 +103,40 @@ int cet_setup_shstk(void) return 0; } +int cet_setup_thread_shstk(struct task_struct *tsk) +{ + unsigned long addr, size; + struct cet_user_state *state; + + if (!current->thread.cet.shstk_enabled) + return 0; + + state = get_xsave_addr(&tsk->thread.fpu.state.xsave, + XFEATURE_MASK_SHSTK_USER); + + if (!state) + return -EINVAL; + + size = tsk->thread.cet.shstk_size; + if (size == 0) + size = SHSTK_SIZE; + + addr = shstk_mmap(0, size); + + if (addr >= TASK_SIZE) { + tsk->thread.cet.shstk_base = 0; + tsk->thread.cet.shstk_size = 0; + tsk->thread.cet.shstk_enabled = 0; + return -ENOMEM; + } + + state->user_ssp = (u64)(addr + size - sizeof(u64)); + tsk->thread.cet.shstk_base = addr; + tsk->thread.cet.shstk_size = size; + tsk->thread.cet.shstk_enabled = 1; + return 0; +} + void cet_disable_shstk(void) { u64 r; diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index b3b0b482983a..ae56caee41f9 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -127,6 +127,7 @@ void exit_thread(struct task_struct *tsk) free_vm86(t); + cet_disable_free_shstk(tsk); fpu__drop(fpu); } diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index 12bb445fb98d..6e493b0bcedd 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -317,6 +317,13 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp, if (sp) childregs->sp = sp; + /* Allocate a new shadow stack for pthread */ + if ((clone_flags & (CLONE_VFORK | CLONE_VM)) == CLONE_VM) { + err = cet_setup_thread_shstk(p); + if (err) + goto out; + } + err = -ENOMEM; if (unlikely(test_tsk_thread_flag(me, TIF_IO_BITMAP))) { p->thread.io_bitmap_ptr = kmemdup(me->thread.io_bitmap_ptr,
When fork() specifies CLONE_VM but not CLONE_VFORK, the child needs a separate program stack and a separate shadow stack. This patch handles allocation and freeing of the thread shadow stack. Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> --- arch/x86/include/asm/cet.h | 2 ++ arch/x86/include/asm/mmu_context.h | 3 +++ arch/x86/kernel/cet.c | 34 ++++++++++++++++++++++++++++++++++ arch/x86/kernel/process.c | 1 + arch/x86/kernel/process_64.c | 7 +++++++ 5 files changed, 47 insertions(+)