
[v2,25/39] x86/cet/shstk: Handle thread shadow stack

Message ID 20220929222936.14584-26-rick.p.edgecombe@intel.com (mailing list archive)
State New
Series Shadowstacks for userspace

Commit Message

Edgecombe, Rick P Sept. 29, 2022, 10:29 p.m. UTC
From: Yu-cheng Yu <yu-cheng.yu@intel.com>

When a process is duplicated, but the child shares the address space with
the parent, there is potential for the threads sharing a single stack to
cause conflicts for each other. In the normal non-CET case this is handled
in two ways.

With regular CLONE_VM a new stack is provided by userspace such that the
parent and child have different stacks.

For vfork, the parent is suspended until the child exits. So as long as
the child doesn't return from the vfork()/CLONE_VFORK calling function and
sticks to a limited set of operations, the parent and child can share the
same stack.

For shadow stack, these scenarios present similar sharing problems. For the
CLONE_VM case, the child and the parent must have separate shadow stacks.
Instead of changing clone to take a shadow stack, have the kernel just
allocate one and switch to it.

Use the stack_size passed from the clone3() syscall for the thread shadow
stack size. A compat-mode thread's shadow stack size is further reduced to
1/4 of that, which allows more threads to run in a 32-bit address space.
clone() does not pass stack_size, since it was only added with clone3(). In
that case, use the RLIMIT_STACK size, capped at 4 GB.
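
Roughly, the sizing policy described above amounts to the following sketch
(the helper name and the placement of the compat-mode check are illustrative
only, not the exact patch code):

static unsigned long pick_thread_shstk_size(unsigned long stack_size)
{
	/* clone() passes no stack_size: fall back to RLIMIT_STACK, capped at 4 GB */
	if (!stack_size)
		stack_size = min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G);

	/* Compat-mode threads get 1/4 the size to save 32-bit address space */
	if (in_compat_syscall())
		stack_size /= 4;

	return PAGE_ALIGN(stack_size);
}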

For shadow stack enabled vfork(), the parent and child can share the same
shadow stack, just as they can share a normal stack. Since the parent is
suspended until the child terminates, the child will not interfere with
the parent while executing, as long as it doesn't return from the vfork()
calling function and overwrite entries further up the shadow stack. The
child can safely overwrite entries further down the shadow stack, as the
parent can just overwrite these later. So CET does not add any additional
limitations for vfork().

Userspace implementing POSIX vfork() can actually prevent the child from
returning from the vfork() calling function, using CET. Glibc does this
by adjusting the shadow stack pointer in the child, so that the child
receives a #CP if it tries to return from the vfork() calling function.

Free the shadow stack on thread exit by doing it in mm_release(). Skip
this when exiting a vfork() child since the stack is shared with the
parent.

During thread creation, the shadow stack pointer of the new thread needs
to be updated to point to the newly allocated shadow stack. Since the
ability to do this is confined to the FPU subsystem, change
fpu_clone() to take the new shadow stack pointer, and update it
internally inside the FPU subsystem. This part was suggested by Thomas
Gleixner.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>

---

v2:
 - Have fpu_clone() take new shadow stack pointer and update SSP in
   xsave buffer for new task. (tglx)

v1:
 - Expand commit log.
 - Add more comments.
 - Switch to xsave helpers.

Yu-cheng v30:
 - Update comments about clone()/clone3(). (Borislav Petkov)

Yu-cheng v29:
 - WARN_ON_ONCE() when get_xsave_addr() returns NULL, and update comments.
   (Dave Hansen)

 arch/x86/include/asm/cet.h         |  7 +++++
 arch/x86/include/asm/fpu/sched.h   |  3 +-
 arch/x86/include/asm/mmu_context.h |  2 ++
 arch/x86/kernel/fpu/core.c         | 40 ++++++++++++++++++++++++-
 arch/x86/kernel/process.c          | 17 ++++++++++-
 arch/x86/kernel/shstk.c            | 48 +++++++++++++++++++++++++++++-
 6 files changed, 113 insertions(+), 4 deletions(-)

Comments

Mike Rapoport Oct. 3, 2022, 10:36 a.m. UTC | #1
On Thu, Sep 29, 2022 at 03:29:22PM -0700, Rick Edgecombe wrote:
> [...]
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index 778d3054ccc7..f332e9b42b6d 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -555,8 +555,40 @@ static inline void fpu_inherit_perms(struct fpu *dst_fpu)
>  	}
>  }
>  
> +#ifdef CONFIG_X86_SHADOW_STACK
> +static int update_fpu_shstk(struct task_struct *dst, unsigned long ssp)
> +{
> +	struct cet_user_state *xstate;
> +
> +	/* If ssp update is not needed. */
> +	if (!ssp)
> +		return 0;
> +
> +	xstate = get_xsave_addr(&dst->thread.fpu.fpstate->regs.xsave,
> +				XFEATURE_CET_USER);
> +
> +	/*
> +	 * If there is a non-zero ssp, then 'dst' must be configured with a shadow
> +	 * stack and the fpu state should be up to date since it was just copied
> +	 * from the parent in fpu_clone(). So there must be a valid non-init CET
> +	 * state location in the buffer.
> +	 */
> +	if (WARN_ON_ONCE(!xstate))
> +		return 1;
> +
> +	xstate->user_ssp = (u64)ssp;
> +
> +	return 0;
> +}
> +#else
> +static int update_fpu_shstk(struct task_struct *dst, unsigned long shstk_addr)
> +{

return 0; ?

> +}
> +#endif
> +
Edgecombe, Rick P Oct. 3, 2022, 4:57 p.m. UTC | #2
On Mon, 2022-10-03 at 13:36 +0300, Mike Rapoport wrote:
> > +static int update_fpu_shstk(struct task_struct *dst, unsigned long
> > shstk_addr)
> > +{
> 
> return 0; ?
> 
> > +}
> > +#endif
> > +

Oops. It was a last minute change to have update_fpu_shstk() return
int. Thanks.
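
I.e., the !CONFIG_X86_SHADOW_STACK stub would then become simply:

static int update_fpu_shstk(struct task_struct *dst, unsigned long shstk_addr)
{
	return 0;
}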
Kees Cook Oct. 3, 2022, 8:29 p.m. UTC | #3
On Thu, Sep 29, 2022 at 03:29:22PM -0700, Rick Edgecombe wrote:
> [...]
> +#ifdef CONFIG_X86_SHADOW_STACK
> +static int update_fpu_shstk(struct task_struct *dst, unsigned long ssp)
> +{
> +	struct cet_user_state *xstate;
> +
> +	/* If ssp update is not needed. */
> +	if (!ssp)
> +		return 0;

My brain will work to undo the collision of Shadow Stack Pointer with
Stack Smashing Protection. ;)

> [...]
> diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
> index a0b8d4adb2bf..db4e53f9fdaf 100644
> --- a/arch/x86/kernel/shstk.c
> +++ b/arch/x86/kernel/shstk.c
> @@ -118,6 +118,46 @@ void reset_thread_shstk(void)
>  	current->thread.features_locked = 0;
>  }
>  
> +int shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long clone_flags,
> +			     unsigned long stack_size, unsigned long *shstk_addr)

Er, arg 3 is "stack_size". From later:

> +     ret = shstk_alloc_thread_stack(p, clone_flags, args->flags, &shstk_addr);
                                                       ^^^^^^^^^^^

clone_flags and args->flags are identical ... this must be accidentally
working. I was expecting 0 there.

> +{
> +	struct thread_shstk *shstk = &tsk->thread.shstk;
> +	unsigned long addr;
> +
> +	/*
> +	 * If shadow stack is not enabled on the new thread, skip any
> +	 * switch to a new shadow stack.
> +	 */
> +	if (!feature_enabled(CET_SHSTK))
> +		return 0;
> +
> +	/*
> +	 * clone() does not pass stack_size, which was added to clone3().
> +	 * Use RLIMIT_STACK and cap to 4 GB.
> +	 */
> +	if (!stack_size)
> +		stack_size = min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G);

Again, perhaps the clamp should happen in alloc_shstk()?

> +
> +	/*
> +	 * For CLONE_VM, except vfork, the child needs a separate shadow
> +	 * stack.
> +	 */
> +	if ((clone_flags & (CLONE_VFORK | CLONE_VM)) != CLONE_VM)
> +		return 0;
> +
> +
> +	stack_size = PAGE_ALIGN(stack_size);

Uhm, I think a line went missing here. :P

"x86/cet/shstk: Introduce map_shadow_stack syscall" adds the missing:

+	addr = alloc_shstk(0, stack_size, 0, false);

Please add back the original. :)
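
For reference, with that line restored the sequence would presumably read
(taken from the quoted hunk plus the line from the later patch):

	stack_size = PAGE_ALIGN(stack_size);
	addr = alloc_shstk(0, stack_size, 0, false);
	if (IS_ERR_VALUE(addr))
		return PTR_ERR((void *)addr);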

> +	if (IS_ERR_VALUE(addr))
> +		return PTR_ERR((void *)addr);
> +
> +	shstk->base = addr;
> +	shstk->size = stack_size;
> +
> +	*shstk_addr = addr + stack_size;
> +
> +	return 0;
> +}
> +
>  void shstk_free(struct task_struct *tsk)
>  {
>  	struct thread_shstk *shstk = &tsk->thread.shstk;
> @@ -126,7 +166,13 @@ void shstk_free(struct task_struct *tsk)
>  	    !feature_enabled(CET_SHSTK))
>  		return;
>  
> -	if (!tsk->mm)
> +	/*
> +	 * When fork() with CLONE_VM fails, the child (tsk) already has a
> +	 * shadow stack allocated, and exit_thread() calls this function to
> +	 * free it.  In this case the parent (current) and the child share
> +	 * the same mm struct.
> +	 */
> +	if (!tsk->mm || tsk->mm != current->mm)
>  		return;
>  
>  	unmap_shadow_stack(shstk->base, shstk->size);
> -- 
> 2.17.1
>
Edgecombe, Rick P Oct. 4, 2022, 10:09 p.m. UTC | #4
On Mon, 2022-10-03 at 13:29 -0700, Kees Cook wrote:
> On Thu, Sep 29, 2022 at 03:29:22PM -0700, Rick Edgecombe wrote:
> > [...]
> > +#ifdef CONFIG_X86_SHADOW_STACK
> > +static int update_fpu_shstk(struct task_struct *dst, unsigned long
> > ssp)
> > +{
> > +	struct cet_user_state *xstate;
> > +
> > +	/* If ssp update is not needed. */
> > +	if (!ssp)
> > +		return 0;
> 
> My brain will work to undo the collision of Shadow Stack Pointer with
> Stack Smashing Protection. ;)
> 
> > [...]
> > diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
> > index a0b8d4adb2bf..db4e53f9fdaf 100644
> > --- a/arch/x86/kernel/shstk.c
> > +++ b/arch/x86/kernel/shstk.c
> > @@ -118,6 +118,46 @@ void reset_thread_shstk(void)
> >  	current->thread.features_locked = 0;
> >  }
> >  
> > +int shstk_alloc_thread_stack(struct task_struct *tsk, unsigned
> > long clone_flags,
> > +			     unsigned long stack_size, unsigned long
> > *shstk_addr)
> 
> Er, arg 3 is "stack_size". From later:
> 
> > +     ret = shstk_alloc_thread_stack(p, clone_flags, args->flags,
> > &shstk_addr);
> 
>                                                        ^^^^^^^^^^^
> 
> clone_flags and args->flags are identical ... this must be
> accidentally
> working. I was expecting 0 there.

Oh wow. A stack_size used to be passed into copy_thread(), but I messed
up the rebase badly. Thanks for catching it.
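
For illustration, the call in copy_thread() would presumably become:

	ret = shstk_alloc_thread_stack(p, clone_flags, args->stack_size, &shstk_addr);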

> 
> > +{
> > +	struct thread_shstk *shstk = &tsk->thread.shstk;
> > +	unsigned long addr;
> > +
> > +	/*
> > +	 * If shadow stack is not enabled on the new thread, skip any
> > +	 * switch to a new shadow stack.
> > +	 */
> > +	if (!feature_enabled(CET_SHSTK))
> > +		return 0;
> > +
> > +	/*
> > +	 * clone() does not pass stack_size, which was added to
> > clone3().
> > +	 * Use RLIMIT_STACK and cap to 4 GB.
> > +	 */
> > +	if (!stack_size)
> > +		stack_size = min_t(unsigned long long,
> > rlimit(RLIMIT_STACK), SZ_4G);
> 
> Again, perhaps the clamp should happen in alloc_shstk()?

The map_shadow_stack() is kind of like mmap(). I think it shouldn't get
the rlimit restriction. But I can pull the shared logic into a helper
for the other two cases.
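
A rough sketch of such a helper (name hypothetical):

static unsigned long adjust_shstk_size(unsigned long size)
{
	if (size)
		return PAGE_ALIGN(size);

	return PAGE_ALIGN(min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G));
}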

> 
> > +
> > +	/*
> > +	 * For CLONE_VM, except vfork, the child needs a separate
> > shadow
> > +	 * stack.
> > +	 */
> > +	if ((clone_flags & (CLONE_VFORK | CLONE_VM)) != CLONE_VM)
> > +		return 0;
> > +
> > +
> > +	stack_size = PAGE_ALIGN(stack_size);
> 
> Uhm, I think a line went missing here. :P
> 
> "x86/cet/shstk: Introduce map_shadow_stack syscall" adds the missing:
> 
> +	addr = alloc_shstk(0, stack_size, 0, false);
> 
> Please add back the original. :)

Yes, more rebase mangling. Thanks.


Patch

diff --git a/arch/x86/include/asm/cet.h b/arch/x86/include/asm/cet.h
index a4a1f4c0089b..924de99e0c61 100644
--- a/arch/x86/include/asm/cet.h
+++ b/arch/x86/include/asm/cet.h
@@ -16,6 +16,9 @@  struct thread_shstk {
 long cet_prctl(struct task_struct *task, int option,
 		      unsigned long features);
 int shstk_setup(void);
+int shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags,
+			     unsigned long stack_size,
+			     unsigned long *shstk_addr);
 void shstk_free(struct task_struct *p);
 int shstk_disable(void);
 void reset_thread_shstk(void);
@@ -23,6 +26,10 @@  void reset_thread_shstk(void);
 static inline long cet_prctl(struct task_struct *task, int option,
 		      unsigned long features) { return -EINVAL; }
 static inline int shstk_setup(void) { return -EOPNOTSUPP; }
+static inline int shstk_alloc_thread_stack(struct task_struct *p,
+					   unsigned long clone_flags,
+					   unsigned long stack_size,
+					   unsigned long *shstk_addr) { return 0; }
 static inline void shstk_free(struct task_struct *p) {}
 static inline int shstk_disable(void) { return -EOPNOTSUPP; }
 static inline void reset_thread_shstk(void) {}
diff --git a/arch/x86/include/asm/fpu/sched.h b/arch/x86/include/asm/fpu/sched.h
index b2486b2cbc6e..54c9c2fd1907 100644
--- a/arch/x86/include/asm/fpu/sched.h
+++ b/arch/x86/include/asm/fpu/sched.h
@@ -11,7 +11,8 @@ 
 
 extern void save_fpregs_to_fpstate(struct fpu *fpu);
 extern void fpu__drop(struct fpu *fpu);
-extern int  fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal);
+extern int  fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
+		      unsigned long shstk_addr);
 extern void fpu_flush_thread(void);
 
 /*
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index b8d40ddeab00..d29988cbdf20 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -146,6 +146,8 @@  do {						\
 #else
 #define deactivate_mm(tsk, mm)			\
 do {						\
+	if (!tsk->vfork_done)			\
+		shstk_free(tsk);		\
 	load_gs_index(0);			\
 	loadsegment(fs, 0);			\
 } while (0)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 778d3054ccc7..f332e9b42b6d 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -555,8 +555,40 @@  static inline void fpu_inherit_perms(struct fpu *dst_fpu)
 	}
 }
 
+#ifdef CONFIG_X86_SHADOW_STACK
+static int update_fpu_shstk(struct task_struct *dst, unsigned long ssp)
+{
+	struct cet_user_state *xstate;
+
+	/* If ssp update is not needed. */
+	if (!ssp)
+		return 0;
+
+	xstate = get_xsave_addr(&dst->thread.fpu.fpstate->regs.xsave,
+				XFEATURE_CET_USER);
+
+	/*
+	 * If there is a non-zero ssp, then 'dst' must be configured with a shadow
+	 * stack and the fpu state should be up to date since it was just copied
+	 * from the parent in fpu_clone(). So there must be a valid non-init CET
+	 * state location in the buffer.
+	 */
+	if (WARN_ON_ONCE(!xstate))
+		return 1;
+
+	xstate->user_ssp = (u64)ssp;
+
+	return 0;
+}
+#else
+static int update_fpu_shstk(struct task_struct *dst, unsigned long shstk_addr)
+{
+}
+#endif
+
 /* Clone current's FPU state on fork */
-int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal)
+int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal,
+	      unsigned long ssp)
 {
 	struct fpu *src_fpu = &current->thread.fpu;
 	struct fpu *dst_fpu = &dst->thread.fpu;
@@ -616,6 +648,12 @@  int fpu_clone(struct task_struct *dst, unsigned long clone_flags, bool minimal)
 	if (use_xsave())
 		dst_fpu->fpstate->regs.xsave.header.xfeatures &= ~XFEATURE_MASK_PASID;
 
+	/*
+	 * Update shadow stack pointer, in case it changed during clone.
+	 */
+	if (update_fpu_shstk(dst, ssp))
+		return 1;
+
 	trace_x86_fpu_copy_src(src_fpu);
 	trace_x86_fpu_copy_dst(dst_fpu);
 
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 034880311e6b..5e63d190becd 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -47,6 +47,7 @@ 
 #include <asm/frame.h>
 #include <asm/unwind.h>
 #include <asm/tdx.h>
+#include <asm/cet.h>
 
 #include "process.h"
 
@@ -118,6 +119,7 @@  void exit_thread(struct task_struct *tsk)
 
 	free_vm86(t);
 
+	shstk_free(tsk);
 	fpu__drop(fpu);
 }
 
@@ -139,6 +141,7 @@  int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 	struct inactive_task_frame *frame;
 	struct fork_frame *fork_frame;
 	struct pt_regs *childregs;
+	unsigned long shstk_addr = 0;
 	int ret = 0;
 
 	childregs = task_pt_regs(p);
@@ -173,7 +176,12 @@  int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 	frame->flags = X86_EFLAGS_FIXED;
 #endif
 
-	fpu_clone(p, clone_flags, args->fn);
+	/* Allocate a new shadow stack for pthread if needed */
+	ret = shstk_alloc_thread_stack(p, clone_flags, args->flags, &shstk_addr);
+	if (ret)
+		return ret;
+
+	fpu_clone(p, clone_flags, args->fn, shstk_addr);
 
 	/* Kernel thread ? */
 	if (unlikely(p->flags & PF_KTHREAD)) {
@@ -219,6 +227,13 @@  int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 	if (!ret && unlikely(test_tsk_thread_flag(current, TIF_IO_BITMAP)))
 		io_bitmap_share(p);
 
+	/*
+	 * If copy_thread() fails, don't leak the shadow stack possibly
+	 * allocated in shstk_alloc_thread_stack() above.
+	 */
+	if (ret)
+		shstk_free(p);
+
 	return ret;
 }
 
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index a0b8d4adb2bf..db4e53f9fdaf 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -118,6 +118,46 @@  void reset_thread_shstk(void)
 	current->thread.features_locked = 0;
 }
 
+int shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long clone_flags,
+			     unsigned long stack_size, unsigned long *shstk_addr)
+{
+	struct thread_shstk *shstk = &tsk->thread.shstk;
+	unsigned long addr;
+
+	/*
+	 * If shadow stack is not enabled on the new thread, skip any
+	 * switch to a new shadow stack.
+	 */
+	if (!feature_enabled(CET_SHSTK))
+		return 0;
+
+	/*
+	 * clone() does not pass stack_size, which was added to clone3().
+	 * Use RLIMIT_STACK and cap to 4 GB.
+	 */
+	if (!stack_size)
+		stack_size = min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G);
+
+	/*
+	 * For CLONE_VM, except vfork, the child needs a separate shadow
+	 * stack.
+	 */
+	if ((clone_flags & (CLONE_VFORK | CLONE_VM)) != CLONE_VM)
+		return 0;
+
+
+	stack_size = PAGE_ALIGN(stack_size);
+	if (IS_ERR_VALUE(addr))
+		return PTR_ERR((void *)addr);
+
+	shstk->base = addr;
+	shstk->size = stack_size;
+
+	*shstk_addr = addr + stack_size;
+
+	return 0;
+}
+
 void shstk_free(struct task_struct *tsk)
 {
 	struct thread_shstk *shstk = &tsk->thread.shstk;
@@ -126,7 +166,13 @@  void shstk_free(struct task_struct *tsk)
 	    !feature_enabled(CET_SHSTK))
 		return;
 
-	if (!tsk->mm)
+	/*
+	 * When fork() with CLONE_VM fails, the child (tsk) already has a
+	 * shadow stack allocated, and exit_thread() calls this function to
+	 * free it.  In this case the parent (current) and the child share
+	 * the same mm struct.
+	 */
+	if (!tsk->mm || tsk->mm != current->mm)
 		return;
 
 	unmap_shadow_stack(shstk->base, shstk->size);