diff mbox

[04/10] x86/cet: Handle thread shadow stack

Message ID 20180607143807.3611-5-yu-cheng.yu@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Yu-cheng Yu June 7, 2018, 2:38 p.m. UTC
When fork() specifies CLONE_VM but not CLONE_VFORK, the child
needs a separate program stack and a separate shadow stack.
This patch handles allocation and freeing of the thread shadow
stack.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
---
 arch/x86/include/asm/cet.h         |  2 ++
 arch/x86/include/asm/mmu_context.h |  3 +++
 arch/x86/kernel/cet.c              | 34 ++++++++++++++++++++++++++++++++++
 arch/x86/kernel/process.c          |  1 +
 arch/x86/kernel/process_64.c       |  7 +++++++
 5 files changed, 47 insertions(+)

Comments

Andy Lutomirski June 7, 2018, 6:21 p.m. UTC | #1
On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>
> When fork() specifies CLONE_VM but not CLONE_VFORK, the child
> needs a separate program stack and a separate shadow stack.
> This patch handles allocation and freeing of the thread shadow
> stack.

Aha -- you're trying to make this automatic.  I'm not convinced this
is a good idea.  The Linux kernel has a long and storied history of
enabling new hardware features in ways that are almost entirely
useless for userspace.

Florian, do you have any thoughts on how the user/kernel interaction
for the shadow stack should work?  My intuition would be that all
shadow stack management should be entirely controlled by userspace --
newly cloned threads (with CLONE_VM) should have no shadow stack
initially, and newly started processes should have no shadow stack
until they ask for one.  If it would be needed for optimization, there
could some indication in an ELF binary that it is requesting an
initial shadow stack.

But maybe some kind of automation like this patch does is actually reasonable.

--Andy
Florian Weimer June 7, 2018, 7:47 p.m. UTC | #2
On 06/07/2018 08:21 PM, Andy Lutomirski wrote:
> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>>
>> When fork() specifies CLONE_VM but not CLONE_VFORK, the child
>> needs a separate program stack and a separate shadow stack.
>> This patch handles allocation and freeing of the thread shadow
>> stack.
> 
> Aha -- you're trying to make this automatic.  I'm not convinced this
> is a good idea.  The Linux kernel has a long and storied history of
> enabling new hardware features in ways that are almost entirely
> useless for userspace.
> 
> Florian, do you have any thoughts on how the user/kernel interaction
> for the shadow stack should work?

I have not looked at this in detail, have not played with the emulator, 
and have not been privy to any discussions before these patches have 
been posted, however …

I believe that we want as little code in userspace for shadow stack 
management as possible.  One concern I have is that even with the code 
we arguably need for various kinds of stack unwinding, we might have 
unwittingly built a generic trampoline that leads to full CET bypass.

I also expect that we'd only have donor mappings in userspace anyway, 
and that the memory is not actually accessible from userspace if it is 
used for a shadow stack.

> My intuition would be that all
> shadow stack management should be entirely controlled by userspace --
> newly cloned threads (with CLONE_VM) should have no shadow stack
> initially, and newly started processes should have no shadow stack
> until they ask for one.

If the new thread doesn't have a shadow stack, we need to disable 
signals around clone, and we are very likely forced to rewrite the early 
thread setup in assembler, to avoid spurious calls (including calls to 
thunks to get EIP on i386).  I wouldn't want to do this If we can avoid 
it.  Just using C and hoping to get away with it doesn't sound greater, 
either.  And obviously there is the matter that the initial thread setup 
code ends up being that universal trampoline.

Thanks,
Florian
Andy Lutomirski June 7, 2018, 8:53 p.m. UTC | #3
On Thu, Jun 7, 2018 at 12:47 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> On 06/07/2018 08:21 PM, Andy Lutomirski wrote:
> > On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >>
> >> When fork() specifies CLONE_VM but not CLONE_VFORK, the child
> >> needs a separate program stack and a separate shadow stack.
> >> This patch handles allocation and freeing of the thread shadow
> >> stack.
> >
> > Aha -- you're trying to make this automatic.  I'm not convinced this
> > is a good idea.  The Linux kernel has a long and storied history of
> > enabling new hardware features in ways that are almost entirely
> > useless for userspace.
> >
> > Florian, do you have any thoughts on how the user/kernel interaction
> > for the shadow stack should work?
>
> I have not looked at this in detail, have not played with the emulator,
> and have not been privy to any discussions before these patches have
> been posted, however …
>
> I believe that we want as little code in userspace for shadow stack
> management as possible.  One concern I have is that even with the code
> we arguably need for various kinds of stack unwinding, we might have
> unwittingly built a generic trampoline that leads to full CET bypass.

I was imagining an API like "allocate a shadow stack for the current
thread, fail if the current thread already has one, and turn on the
shadow stack".  glibc would call clone and then call this ABI pretty
much immediately (i.e. before making any calls from which it expects
to return).

We definitely want strong enough user control that tools like CRIU can
continue to work.  I haven't looked at the SDM recently enough to
remember for sure, but I'm reasonably confident that user code can
learn the address of its own shadow stack.  If nothing else, CRIU
needs to be able to restore from a context where there's a signal on
the stack and the signal frame contains a shadow stack pointer.


>
> I also expect that we'd only have donor mappings in userspace anyway,
> and that the memory is not actually accessible from userspace if it is
> used for a shadow stack.
>
> > My intuition would be that all
> > shadow stack management should be entirely controlled by userspace --
> > newly cloned threads (with CLONE_VM) should have no shadow stack
> > initially, and newly started processes should have no shadow stack
> > until they ask for one.
>
> If the new thread doesn't have a shadow stack, we need to disable
> signals around clone, and we are very likely forced to rewrite the early
> thread setup in assembler, to avoid spurious calls (including calls to
> thunks to get EIP on i386).  I wouldn't want to do this If we can avoid
> it.  Just using C and hoping to get away with it doesn't sound greater,
> either.  And obviously there is the matter that the initial thread setup
> code ends up being that universal trampoline.
>

Only if the trampoline works if the shadow stack is already enabled.

I could very easily be convinced that automatic shadow stack setup is
a good idea, but I still think we need manual control for CRIU and
such.
Florian Weimer June 8, 2018, 2:53 p.m. UTC | #4
On 06/07/2018 10:53 PM, Andy Lutomirski wrote:
> On Thu, Jun 7, 2018 at 12:47 PM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> On 06/07/2018 08:21 PM, Andy Lutomirski wrote:
>>> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
>>>>
>>>> When fork() specifies CLONE_VM but not CLONE_VFORK, the child
>>>> needs a separate program stack and a separate shadow stack.
>>>> This patch handles allocation and freeing of the thread shadow
>>>> stack.
>>>
>>> Aha -- you're trying to make this automatic.  I'm not convinced this
>>> is a good idea.  The Linux kernel has a long and storied history of
>>> enabling new hardware features in ways that are almost entirely
>>> useless for userspace.
>>>
>>> Florian, do you have any thoughts on how the user/kernel interaction
>>> for the shadow stack should work?
>>
>> I have not looked at this in detail, have not played with the emulator,
>> and have not been privy to any discussions before these patches have
>> been posted, however …
>>
>> I believe that we want as little code in userspace for shadow stack
>> management as possible.  One concern I have is that even with the code
>> we arguably need for various kinds of stack unwinding, we might have
>> unwittingly built a generic trampoline that leads to full CET bypass.
> 
> I was imagining an API like "allocate a shadow stack for the current
> thread, fail if the current thread already has one, and turn on the
> shadow stack".  glibc would call clone and then call this ABI pretty
> much immediately (i.e. before making any calls from which it expects
> to return).

Ahh.  So you propose not to enable shadow stack enforcement on the new 
thread even if it is enabled for the current thread?  For the cases 
where CLONE_VM is involved?

It will still need a new assembler wrapper which sets up the shadow 
stack, and it's probably required to disable signals.

I think it should be reasonable safe and actually implementable.  But 
the benefits are not immediately obvious to me.

> We definitely want strong enough user control that tools like CRIU can
> continue to work.  I haven't looked at the SDM recently enough to
> remember for sure, but I'm reasonably confident that user code can
> learn the address of its own shadow stack.  If nothing else, CRIU
> needs to be able to restore from a context where there's a signal on
> the stack and the signal frame contains a shadow stack pointer.

CRIU also needs the shadow stack *contents*, which shouldn't be directly 
available to the process.  So it needs very special interfaces anyway.

Does CRIU implement MPX support?

Thanks,
Florian
Andy Lutomirski June 8, 2018, 3:01 p.m. UTC | #5
On Fri, Jun 8, 2018 at 7:53 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> On 06/07/2018 10:53 PM, Andy Lutomirski wrote:
> > On Thu, Jun 7, 2018 at 12:47 PM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> On 06/07/2018 08:21 PM, Andy Lutomirski wrote:
> >>> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> >>>>
> >>>> When fork() specifies CLONE_VM but not CLONE_VFORK, the child
> >>>> needs a separate program stack and a separate shadow stack.
> >>>> This patch handles allocation and freeing of the thread shadow
> >>>> stack.
> >>>
> >>> Aha -- you're trying to make this automatic.  I'm not convinced this
> >>> is a good idea.  The Linux kernel has a long and storied history of
> >>> enabling new hardware features in ways that are almost entirely
> >>> useless for userspace.
> >>>
> >>> Florian, do you have any thoughts on how the user/kernel interaction
> >>> for the shadow stack should work?
> >>
> >> I have not looked at this in detail, have not played with the emulator,
> >> and have not been privy to any discussions before these patches have
> >> been posted, however …
> >>
> >> I believe that we want as little code in userspace for shadow stack
> >> management as possible.  One concern I have is that even with the code
> >> we arguably need for various kinds of stack unwinding, we might have
> >> unwittingly built a generic trampoline that leads to full CET bypass.
> >
> > I was imagining an API like "allocate a shadow stack for the current
> > thread, fail if the current thread already has one, and turn on the
> > shadow stack".  glibc would call clone and then call this ABI pretty
> > much immediately (i.e. before making any calls from which it expects
> > to return).
>
> Ahh.  So you propose not to enable shadow stack enforcement on the new
> thread even if it is enabled for the current thread?  For the cases
> where CLONE_VM is involved?
>
> It will still need a new assembler wrapper which sets up the shadow
> stack, and it's probably required to disable signals.
>
> I think it should be reasonable safe and actually implementable.  But
> the benefits are not immediately obvious to me.

Doing it this way would have been my first incliniation.  It would
avoid all the oddities of the kernel magically creating a VMA when
clone() is called, guessing the shadow stack size, etc.  But I'm okay
with having the kernel do it automatically, too.  I think it would be
very nice to have a way for user code to find out the size of the
shadow stack and change it, though.  (And relocate it, but maybe
that's impossible.  The CET documentation doesn't have a clear
description of the shadow stack layout.)

>
> > We definitely want strong enough user control that tools like CRIU can
> > continue to work.  I haven't looked at the SDM recently enough to
> > remember for sure, but I'm reasonably confident that user code can
> > learn the address of its own shadow stack.  If nothing else, CRIU
> > needs to be able to restore from a context where there's a signal on
> > the stack and the signal frame contains a shadow stack pointer.
>
> CRIU also needs the shadow stack *contents*, which shouldn't be directly
> available to the process.  So it needs very special interfaces anyway.

True.  I proposed in a different email that ptrace() have full control
of the shadow stack (read, write, lock, unlock, etc).

>
> Does CRIU implement MPX support?

Dunno.  But given that MPX seems to be dying, I'm not sure it matters.

--Andy
Yu-cheng Yu June 8, 2018, 3:50 p.m. UTC | #6
On Fri, 2018-06-08 at 08:01 -0700, Andy Lutomirski wrote:
> On Fri, Jun 8, 2018 at 7:53 AM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > On 06/07/2018 10:53 PM, Andy Lutomirski wrote:
> > > On Thu, Jun 7, 2018 at 12:47 PM Florian Weimer <fweimer@redhat.com> wrote:
> > >>
> > >> On 06/07/2018 08:21 PM, Andy Lutomirski wrote:
> > >>> On Thu, Jun 7, 2018 at 7:41 AM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> > >>>>
> > >>>> When fork() specifies CLONE_VM but not CLONE_VFORK, the child
> > >>>> needs a separate program stack and a separate shadow stack.
> > >>>> This patch handles allocation and freeing of the thread shadow
> > >>>> stack.
> > >>>
> > >>> Aha -- you're trying to make this automatic.  I'm not convinced this
> > >>> is a good idea.  The Linux kernel has a long and storied history of
> > >>> enabling new hardware features in ways that are almost entirely
> > >>> useless for userspace.
> > >>>
> > >>> Florian, do you have any thoughts on how the user/kernel interaction
> > >>> for the shadow stack should work?
> > >>
> > >> I have not looked at this in detail, have not played with the emulator,
> > >> and have not been privy to any discussions before these patches have
> > >> been posted, however …
> > >>
> > >> I believe that we want as little code in userspace for shadow stack
> > >> management as possible.  One concern I have is that even with the code
> > >> we arguably need for various kinds of stack unwinding, we might have
> > >> unwittingly built a generic trampoline that leads to full CET bypass.
> > >
> > > I was imagining an API like "allocate a shadow stack for the current
> > > thread, fail if the current thread already has one, and turn on the
> > > shadow stack".  glibc would call clone and then call this ABI pretty
> > > much immediately (i.e. before making any calls from which it expects
> > > to return).
> >
> > Ahh.  So you propose not to enable shadow stack enforcement on the new
> > thread even if it is enabled for the current thread?  For the cases
> > where CLONE_VM is involved?
> >
> > It will still need a new assembler wrapper which sets up the shadow
> > stack, and it's probably required to disable signals.
> >
> > I think it should be reasonable safe and actually implementable.  But
> > the benefits are not immediately obvious to me.
> 
> Doing it this way would have been my first incliniation.  It would
> avoid all the oddities of the kernel magically creating a VMA when
> clone() is called, guessing the shadow stack size, etc.  But I'm okay
> with having the kernel do it automatically, too.

HJ wanted to add a arch_prctl that allocates a new shadow stack and
switches to it.  That was mainly for swapcontext.  Perhaps we can also
use that for threads?  HJ, can you comment on this?

> I think it would be
> very nice to have a way for user code to find out the size of the
> shadow stack and change it, though.  (And relocate it, but maybe
> that's impossible.  The CET documentation doesn't have a clear
> description of the shadow stack layout.)

The shadow stack is vm_mmap'ed from memory and does not have any special
layout.  We can add a arch_prctl to find out shadow stack's address and
size.

> >
> > > We definitely want strong enough user control that tools like CRIU can
> > > continue to work.  I haven't looked at the SDM recently enough to
> > > remember for sure, but I'm reasonably confident that user code can
> > > learn the address of its own shadow stack.  If nothing else, CRIU
> > > needs to be able to restore from a context where there's a signal on
> > > the stack and the signal frame contains a shadow stack pointer.
> >
> > CRIU also needs the shadow stack *contents*, which shouldn't be directly
> > available to the process.  So it needs very special interfaces anyway.
> 
> True.  I proposed in a different email that ptrace() have full control
> of the shadow stack (read, write, lock, unlock, etc).

PTRACE can do PTRACE_POKEDATA on shadow stack.  We can add lock/unlock.

> >
> > Does CRIU implement MPX support?
> 
> Dunno.  But given that MPX seems to be dying, I'm not sure it matters.
> 
> --Andy
diff mbox

Patch

diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
index 5507469cb803..c8fd87e13859 100644
--- a/arch/x86/include/asm/cet.h
+++ b/arch/x86/include/asm/cet.h
@@ -19,6 +19,7 @@  struct cet_stat {
 unsigned long cet_get_shstk_ptr(void);
 int cet_push_shstk(int ia32, unsigned long ssp, unsigned long val);
 int cet_setup_shstk(void);
+int cet_setup_thread_shstk(struct task_struct *p);
 void cet_disable_shstk(void);
 void cet_disable_free_shstk(struct task_struct *p);
 int cet_restore_signal(unsigned long ssp);
@@ -28,6 +29,7 @@  static inline unsigned long cet_get_shstk_ptr(void) { return 0; }
 static inline int cet_push_shstk(int ia32, unsigned long ssp,
 				 unsigned long val) { return 0; }
 static inline int cet_setup_shstk(void) { return 0; }
+static inline int cet_setup_thread_shstk(struct task_struct *p) { return 0; }
 static inline void cet_disable_shstk(void) {}
 static inline void cet_disable_free_shstk(struct task_struct *p) {}
 static inline int cet_restore_signal(unsigned long ssp) { return 0; }
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index cf9911b5a53c..42395257efc3 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -13,6 +13,7 @@ 
 #include <asm/tlbflush.h>
 #include <asm/paravirt.h>
 #include <asm/mpx.h>
+#include <asm/cet.h>
 
 extern atomic64_t last_mm_ctx_id;
 
@@ -228,6 +229,8 @@  do {						\
 #else
 #define deactivate_mm(tsk, mm)			\
 do {						\
+	if (!tsk->vfork_done)			\
+		cet_disable_free_shstk(tsk);	\
 	load_gs_index(0);			\
 	loadsegment(fs, 0);			\
 } while (0)
diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c
index 6f445ce94c83..156f5d88ffd5 100644
--- a/arch/x86/kernel/cet.c
+++ b/arch/x86/kernel/cet.c
@@ -103,6 +103,40 @@  int cet_setup_shstk(void)
 	return 0;
 }
 
+int cet_setup_thread_shstk(struct task_struct *tsk)
+{
+	unsigned long addr, size;
+	struct cet_user_state *state;
+
+	if (!current->thread.cet.shstk_enabled)
+		return 0;
+
+	state = get_xsave_addr(&tsk->thread.fpu.state.xsave,
+			       XFEATURE_MASK_SHSTK_USER);
+
+	if (!state)
+		return -EINVAL;
+
+	size = tsk->thread.cet.shstk_size;
+	if (size == 0)
+		size = SHSTK_SIZE;
+
+	addr = shstk_mmap(0, size);
+
+	if (addr >= TASK_SIZE) {
+		tsk->thread.cet.shstk_base = 0;
+		tsk->thread.cet.shstk_size = 0;
+		tsk->thread.cet.shstk_enabled = 0;
+		return -ENOMEM;
+	}
+
+	state->user_ssp = (u64)(addr + size - sizeof(u64));
+	tsk->thread.cet.shstk_base = addr;
+	tsk->thread.cet.shstk_size = size;
+	tsk->thread.cet.shstk_enabled = 1;
+	return 0;
+}
+
 void cet_disable_shstk(void)
 {
 	u64 r;
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index b3b0b482983a..ae56caee41f9 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -127,6 +127,7 @@  void exit_thread(struct task_struct *tsk)
 
 	free_vm86(t);
 
+	cet_disable_free_shstk(tsk);
 	fpu__drop(fpu);
 }
 
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 12bb445fb98d..6e493b0bcedd 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -317,6 +317,13 @@  int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 	if (sp)
 		childregs->sp = sp;
 
+	/* Allocate a new shadow stack for pthread */
+	if ((clone_flags & (CLONE_VFORK | CLONE_VM)) == CLONE_VM) {
+		err = cet_setup_thread_shstk(p);
+		if (err)
+			goto out;
+	}
+
 	err = -ENOMEM;
 	if (unlikely(test_tsk_thread_flag(me, TIF_IO_BITMAP))) {
 		p->thread.io_bitmap_ptr = kmemdup(me->thread.io_bitmap_ptr,