
[v4,21/36] arm64/mm: Implement map_shadow_stack()

Message ID 20230807-arm64-gcs-v4-21-68cfa37f9069@kernel.org (mailing list archive)
State Superseded
Series arm64/gcs: Provide support for GCS in userspace

Checks

conchuod/tree_selection: fail (failed to apply to next/pending-fixes, riscv/for-next or riscv/master)

Commit Message

Mark Brown Aug. 7, 2023, 10 p.m. UTC
As discussed extensively in the changelog for the addition of this
syscall on x86 ("x86/shstk: Introduce map_shadow_stack syscall") the
existing mmap() and madvise() syscalls do not map entirely well onto the
security requirements for guarded control stacks since they lead to
windows where memory is allocated but not yet protected or stacks which
are not properly and safely initialised. Instead a new syscall
map_shadow_stack() has been defined which allocates and initialises a
shadow stack page.

Implement this for arm64.  Two flags are provided, allowing applications
to request that the stack be initialised with a valid cap token at the
top of the stack and optionally also an end of stack marker above that.
We support requesting an end of stack marker alone, but since the marker
is a NULL pointer this is indistinguishable from not initialising
anything at all.

Since the x86 code has not yet been rebased to v6.5-rc1, this includes
the architecture-neutral parts of Rick Edgecombe's "x86/shstk: Introduce
map_shadow_stack syscall".

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/mm/gcs.c               | 58 ++++++++++++++++++++++++++++++++++++++-
 include/linux/syscalls.h          |  1 +
 include/uapi/asm-generic/unistd.h |  5 +++-
 kernel/sys_ni.c                   |  1 +
 4 files changed, 63 insertions(+), 2 deletions(-)

Comments

Catalin Marinas Aug. 11, 2023, 4:38 p.m. UTC | #1
On Mon, Aug 07, 2023 at 11:00:26PM +0100, Mark Brown wrote:
> As discussed extensively in the changelog for the addition of this
> syscall on x86 ("x86/shstk: Introduce map_shadow_stack syscall") the
> existing mmap() and madvise() syscalls do not map entirely well onto the
> security requirements for guarded control stacks since they lead to
> windows where memory is allocated but not yet protected or stacks which
> are not properly and safely initialised. Instead a new syscall
> map_shadow_stack() has been defined which allocates and initialises a
> shadow stack page.

I guess I need to read the x86 discussion after all ;).

Given that we won't have an mmap(PROT_SHADOW_STACK), are we going to
have restrictions on mprotect()? E.g. it would be useful to reject a
PROT_EXEC on the shadow stack.
Edgecombe, Rick P Aug. 15, 2023, 8:42 p.m. UTC | #2
On Mon, 2023-08-07 at 23:00 +0100, Mark Brown wrote:
> +       if (flags & ~(SHADOW_STACK_SET_TOKEN |
> SHADOW_STACK_SET_MARKER))
> +               return -EINVAL;
> +

Thanks for adding SHADOW_STACK_SET_MARKER. I don't see where it is
defined in these patches though. Might have been left out on accident?
Mark Brown Aug. 15, 2023, 9:01 p.m. UTC | #3
On Tue, Aug 15, 2023 at 08:42:52PM +0000, Edgecombe, Rick P wrote:
> On Mon, 2023-08-07 at 23:00 +0100, Mark Brown wrote:
> > +       if (flags & ~(SHADOW_STACK_SET_TOKEN |
> > SHADOW_STACK_SET_MARKER))
> > +               return -EINVAL;

> Thanks for adding SHADOW_STACK_SET_MARKER. I don't see where it is
> defined in these patches though. Might have been left out on accident?

I added it to the dependency patches I've got which pull bits out of the
x86 series from before you rebased it.  The ABI bits are mixed in with
the x86 architecture changes, and I didn't feel like dealing with
rebasing those, so I pulled out just the ABI portions.  I'll resolve
this properly when I rebase back onto the x86 series (ideally it'll be
in mainline after the next merge window!).  For these that'll probably
boil down to adding defines to prctl.h for the generic prctl API.
Mark Brown Aug. 18, 2023, 5:08 p.m. UTC | #4
On Fri, Aug 11, 2023 at 05:38:24PM +0100, Catalin Marinas wrote:

> Given that we won't have an mmap(PROT_SHADOW_STACK), are we going to
> have restrictions on mprotect()? E.g. it would be useful to reject a
> PROT_EXEC on the shadow stack.

mprotect() uses arch_validate_flags(), which we already have covering
this, so it's already handled.
Catalin Marinas Aug. 22, 2023, 4:40 p.m. UTC | #5
On Fri, Aug 18, 2023 at 06:08:52PM +0100, Mark Brown wrote:
> On Fri, Aug 11, 2023 at 05:38:24PM +0100, Catalin Marinas wrote:
> 
> > Given that we won't have an mmap(PROT_SHADOW_STACK), are we going to
> > have restrictions on mprotect()? E.g. it would be useful to reject a
> > PROT_EXEC on the shadow stack.
> 
> mprotect() uses arch_validate_flags(), which we already have covering
> this, so it's already handled.

I searched the patches and there's no change to the arm64
arch_validate_flags(). Maybe I missed it.
Mark Brown Aug. 22, 2023, 5:05 p.m. UTC | #6
On Tue, Aug 22, 2023 at 05:40:38PM +0100, Catalin Marinas wrote:
> On Fri, Aug 18, 2023 at 06:08:52PM +0100, Mark Brown wrote:

> > mprotect() uses arch_validate_flags(), which we already have covering
> > this, so it's already handled.

> I searched the patches and there's no change to the arm64
> arch_validate_flags(). Maybe I missed it.

It's in v5, the update to arch_validate_flags() was one of your comments
from another patch in the series.

Patch

diff --git a/arch/arm64/mm/gcs.c b/arch/arm64/mm/gcs.c
index 64c9f9a85925..b41700d6695e 100644
--- a/arch/arm64/mm/gcs.c
+++ b/arch/arm64/mm/gcs.c
@@ -52,7 +52,6 @@  unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
 		return 0;
 
 	size = gcs_size(size);
-
 	addr = alloc_gcs(0, size, 0, 0);
 	if (IS_ERR_VALUE(addr))
 		return addr;
@@ -64,6 +63,63 @@  unsigned long gcs_alloc_thread_stack(struct task_struct *tsk,
 	return addr;
 }
 
+SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsigned int, flags)
+{
+	unsigned long alloc_size;
+	unsigned long __user *cap_ptr;
+	unsigned long cap_val;
+	int ret, cap_offset;
+
+	if (!system_supports_gcs())
+		return -EOPNOTSUPP;
+
+	if (flags & ~(SHADOW_STACK_SET_TOKEN | SHADOW_STACK_SET_MARKER))
+		return -EINVAL;
+
+	if (addr % 8)
+		return -EINVAL;
+
+	if (size == 8 || size % 8)
+		return -EINVAL;
+
+	/*
+	 * An overflow would result in attempting to write the restore token
+	 * to the wrong location. Not catastrophic, but just return the right
+	 * error code and block it.
+	 */
+	alloc_size = PAGE_ALIGN(size);
+	if (alloc_size < size)
+		return -EOVERFLOW;
+
+	addr = alloc_gcs(addr, alloc_size, 0, false);
+	if (IS_ERR_VALUE(addr))
+		return addr;
+
+	/*
+	 * Put a cap token at the end of the allocated region so it
+	 * can be switched to.
+	 */
+	if (flags & SHADOW_STACK_SET_TOKEN) {
+		/* Leave an extra empty frame as a top of stack marker? */
+		if (flags & SHADOW_STACK_SET_MARKER)
+			cap_offset = 2;
+		else
+			cap_offset = 1;
+
+		cap_ptr = (unsigned long __user *)(addr + size -
+						   (cap_offset * sizeof(unsigned long)));
+		cap_val = GCS_CAP(cap_ptr);
+
+		ret = copy_to_user_gcs(cap_ptr, &cap_val, 1);
+		if (ret != 0) {
+			vm_munmap(addr, size);
+			return -EFAULT;
+		}
+	}
+
+	return addr;
+}
+
 /*
  * Apply the GCS mode configured for the specified task to the
  * hardware.
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 03e3d0121d5e..7f6dc0988197 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -953,6 +953,7 @@  asmlinkage long sys_set_mempolicy_home_node(unsigned long start, unsigned long l
 asmlinkage long sys_cachestat(unsigned int fd,
 		struct cachestat_range __user *cstat_range,
 		struct cachestat __user *cstat, unsigned int flags);
+asmlinkage long sys_map_shadow_stack(unsigned long addr, unsigned long size, unsigned int flags);
 
 /*
  * Architecture-specific system calls
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index fd6c1cb585db..38885a795ea6 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -820,8 +820,11 @@  __SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node)
 #define __NR_cachestat 451
 __SYSCALL(__NR_cachestat, sys_cachestat)
 
+#define __NR_map_shadow_stack 452
+__SYSCALL(__NR_map_shadow_stack, sys_map_shadow_stack)
+
 #undef __NR_syscalls
-#define __NR_syscalls 452
+#define __NR_syscalls 453
 
 /*
  * 32 bit systems traditionally used different
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 781de7cc6a4e..e137c1385c56 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -274,6 +274,7 @@  COND_SYSCALL(vm86old);
 COND_SYSCALL(modify_ldt);
 COND_SYSCALL(vm86);
 COND_SYSCALL(kexec_file_load);
+COND_SYSCALL(map_shadow_stack);
 
 /* s390 */
 COND_SYSCALL(s390_pci_mmio_read);