diff mbox series

[22/35] x86/mm: Prevent VM_WRITE shadow stacks

Message ID 20220130211838.8382-23-rick.p.edgecombe@intel.com (mailing list archive)
State New
Headers show
Series Shadow stacks for userspace | expand

Commit Message

Rick Edgecombe Jan. 30, 2022, 9:18 p.m. UTC
Shadow stack accesses are writes from handle_mm_fault() perspective. So to
generate the correct PTE, maybe_mkwrite() will rely on the presence of
VM_SHADOW_STACK or VM_WRITE in the vma.

In future patches, when VM_SHADOW_STACK is actually creatable by
userspace, a problem could happen if a user calls
mprotect( , , PROT_WRITE) on VM_SHADOW_STACK shadow stack memory. The code
would then be confused in the event of shadow stack accesses, and create a
writable PTE for a shadow stack access. Then the process would fault in a
loop.

Prevent this from happening by blocking this kind of memory (VM_WRITE and
VM_SHADOW_STACK) from being created, instead of complicating the fault
handler logic to handle it.

Add an x86 arch_validate_flags() implementation to handle the check.
Rename the uapi/asm/mman.h header guard to be able to use it for
arch/x86/include/asm/mman.h where the arch_validate_flags() will be.

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---

v1:
 - New patch.

 arch/x86/include/asm/mman.h      | 21 +++++++++++++++++++++
 arch/x86/include/uapi/asm/mman.h |  6 +++---
 2 files changed, 24 insertions(+), 3 deletions(-)
 create mode 100644 arch/x86/include/asm/mman.h

Comments

Dave Hansen Feb. 11, 2022, 10:19 p.m. UTC | #1
On 1/30/22 13:18, Rick Edgecombe wrote:
> Shadow stack accesses are writes from handle_mm_fault() perspective. So to
> generate the correct PTE, maybe_mkwrite() will rely on the presence of
> VM_SHADOW_STACK or VM_WRITE in the vma.
> 
> In future patches, when VM_SHADOW_STACK is actually creatable by
> userspace, a problem could happen if a user calls
> mprotect( , , PROT_WRITE) on VM_SHADOW_STACK shadow stack memory. The code
> would then be confused in the event of shadow stack accesses, and create a
> writable PTE for a shadow stack access. Then the process would fault in a
> loop.
> 
> Prevent this from happening by blocking this kind of memory (VM_WRITE and
> VM_SHADOW_STACK) from being created, instead of complicating the fault
> handler logic to handle it.
> 
> Add an x86 arch_validate_flags() implementation to handle the check.
> Rename the uapi/asm/mman.h header guard to be able to use it for
> arch/x86/include/asm/mman.h where the arch_validate_flags() will be.

It would be great if this also said:

	There is an existing arch_validate_flags() hook for mmap() and
	mprotect() which allows architectures to reject unwanted
	->vm_flags combinations.  Add an implementation for x86.

That's somewhat implied from what is there already, but making it more
clear would be nice.  There's a much higher bar to add a new arch hook
than to just implement an existing one.


> diff --git a/arch/x86/include/asm/mman.h b/arch/x86/include/asm/mman.h
> new file mode 100644
> index 000000000000..b44fe31deb3a
> --- /dev/null
> +++ b/arch/x86/include/asm/mman.h
> @@ -0,0 +1,21 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_X86_MMAN_H
> +#define _ASM_X86_MMAN_H
> +
> +#include <linux/mm.h>
> +#include <uapi/asm/mman.h>
> +
> +#ifdef CONFIG_X86_SHADOW_STACK
> +static inline bool arch_validate_flags(unsigned long vm_flags)
> +{
> +	if ((vm_flags & VM_SHADOW_STACK) && (vm_flags & VM_WRITE))
> +		return false;
> +
> +	return true;
> +}

The design decision here seems to be that VM_SHADOW_STACK is itself a
pseudo-VM_WRITE flag.  Like you said: "Shadow stack accesses are writes
from handle_mm_fault()".

Very early on, this series seems to have made the decision that shadow
stacks are writable and need lots of write handling behavior, *BUT*
shouldn't have VM_WRITE set.  As a whole, that seems odd.

The alternative would be *requiring* VM_WRITE and VM_SHADOW_STACK be set
together.  I guess the downside is that pte_mkwrite() would need to be
made to work on shadow stack PTEs.

That particular design decision was never discussed.  I think it has a
really big impact on the rest of the series.  What do you think?  Was it
a good idea?  Or would the alternative be more complicated than what you
have now?
Rick Edgecombe Feb. 12, 2022, 1:44 a.m. UTC | #2
On Fri, 2022-02-11 at 14:19 -0800, Dave Hansen wrote:
> On 1/30/22 13:18, Rick Edgecombe wrote:
> > Shadow stack accesses are writes from handle_mm_fault()
> > perspective. So to
> > generate the correct PTE, maybe_mkwrite() will rely on the presence
> > of
> > VM_SHADOW_STACK or VM_WRITE in the vma.
> > 
> > In future patches, when VM_SHADOW_STACK is actually creatable by
> > userspace, a problem could happen if a user calls
> > mprotect( , , PROT_WRITE) on VM_SHADOW_STACK shadow stack memory.
> > The code
> > would then be confused in the event of shadow stack accesses, and
> > create a
> > writable PTE for a shadow stack access. Then the process would
> > fault in a
> > loop.
> > 
> > Prevent this from happening by blocking this kind of memory
> > (VM_WRITE and
> > VM_SHADOW_STACK) from being created, instead of complicating the
> > fault
> > handler logic to handle it.
> > 
> > Add an x86 arch_validate_flags() implementation to handle the
> > check.
> > Rename the uapi/asm/mman.h header guard to be able to use it for
> > arch/x86/include/asm/mman.h where the arch_validate_flags() will
> > be.
> 
> It would be great if this also said:
> 
> 	There is an existing arch_validate_flags() hook for mmap() and
> 	mprotect() which allows architectures to reject unwanted
> 	->vm_flags combinations.  Add an implementation for x86.
> 
> That's somewhat implied from what is there already, but making it
> more
> clear would be nice.  There's a much higher bar to add a new arch
> hook
> than to just implement an existing one.

Ok, makes sense.

> 
> 
> > diff --git a/arch/x86/include/asm/mman.h
> > b/arch/x86/include/asm/mman.h
> > new file mode 100644
> > index 000000000000..b44fe31deb3a
> > --- /dev/null
> > +++ b/arch/x86/include/asm/mman.h
> > @@ -0,0 +1,21 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef _ASM_X86_MMAN_H
> > +#define _ASM_X86_MMAN_H
> > +
> > +#include <linux/mm.h>
> > +#include <uapi/asm/mman.h>
> > +
> > +#ifdef CONFIG_X86_SHADOW_STACK
> > +static inline bool arch_validate_flags(unsigned long vm_flags)
> > +{
> > +	if ((vm_flags & VM_SHADOW_STACK) && (vm_flags & VM_WRITE))
> > +		return false;
> > +
> > +	return true;
> > +}
> 
> The design decision here seems to be that VM_SHADOW_STACK is itself a
> pseudo-VM_WRITE flag.  Like you said: "Shadow stack accesses are
> writes
> from handle_mm_fault()".
> 
> Very early on, this series seems to have made the decision that
> shadow
> stacks are writable and need lots of write handling behavior, *BUT*
> shouldn't have VM_WRITE set.  As a whole, that seems odd.
> 
> The alternative would be *requiring* VM_WRITE and VM_SHADOW_STACK be
> set
> together.  I guess the downside is that pte_mkwrite() would need to
> be
> made to work on shadow stack PTEs.
> 
> That particular design decision was never discussed.  I think it has
> a
> really big impact on the rest of the series.  What do you think?  Was
> it
> a good idea?  Or would the alternative be more complicated than what
> you
> have now?

First of all, thanks again for the deep review of the MM piece. I'm
still pondering the overall problem, which is why I haven't responded
to those yet.

I had originally thought that the MM changes were a bit hard to follow.
I was also somewhat amazed at how naturally normal COW worked. I was
wondering where the big COW stuff would be happening. In the way that
COW was sort of tucked away, overloading writability seemed sort of
aligned. But the names are very confusing, and this patch probably
should have been a hint that there are problems design wise.

For writability, especially with WRSS, I do think it's a bit unnatural
to think of shadow stack memory as anything but writable. Especially
when it comes to COW. But shadow stack accesses are not always writes,
incssp for example. The code will create shadow stack memory for shadow
stack access loads, which of course isn't writing anything, but is
required to make the instruction work. So it calls mkwrite(), which is
weird. But... it does need to leave it in a state that is kind of
writable, so makes a little sense I guess.

I was wondering if maybe the mm code can't be fully sensible for shadow
stacks without creating maybe_mkshstk() and adding it everywhere in a
whole new fault path. Then you have reads, writes and shadow stack
accesses that each have their own logic. It might require so many
additions that better names and comments are preferable. I don't know
though, still trying to come up with a good opinion.
diff mbox series

Patch

diff --git a/arch/x86/include/asm/mman.h b/arch/x86/include/asm/mman.h
new file mode 100644
index 000000000000..b44fe31deb3a
--- /dev/null
+++ b/arch/x86/include/asm/mman.h
@@ -0,0 +1,21 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_MMAN_H
+#define _ASM_X86_MMAN_H
+
+#include <linux/mm.h>
+#include <uapi/asm/mman.h>
+
+#ifdef CONFIG_X86_SHADOW_STACK
+static inline bool arch_validate_flags(unsigned long vm_flags)
+{
+	if ((vm_flags & VM_SHADOW_STACK) && (vm_flags & VM_WRITE))
+		return false;
+
+	return true;
+}
+
+#define arch_validate_flags(vm_flags) arch_validate_flags(vm_flags)
+
+#endif /* CONFIG_X86_SHADOW_STACK */
+
+#endif /* _ASM_X86_MMAN_H */
diff --git a/arch/x86/include/uapi/asm/mman.h b/arch/x86/include/uapi/asm/mman.h
index d4a8d0424bfb..9704e27c4d24 100644
--- a/arch/x86/include/uapi/asm/mman.h
+++ b/arch/x86/include/uapi/asm/mman.h
@@ -1,6 +1,6 @@ 
 /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
-#ifndef _ASM_X86_MMAN_H
-#define _ASM_X86_MMAN_H
+#ifndef _UAPI_ASM_X86_MMAN_H
+#define _UAPI_ASM_X86_MMAN_H
 
 #define MAP_32BIT	0x40		/* only give out 32bit addresses */
 
@@ -28,4 +28,4 @@ 
 
 #include <asm-generic/mman.h>
 
-#endif /* _ASM_X86_MMAN_H */
+#endif /* _UAPI_ASM_X86_MMAN_H */