[v6,19/41] x86/mm: Check shadow stack page fault errors

Message ID	20230218211433.26859-20-rick.p.edgecombe@intel.com (mailing list archive)
State	New
Series	Shadow stacks for userspace
[v6,00/41] Shadow stacks for userspace
[v6,01/41] Documentation/x86: Add CET shadow stack description
[v6,02/41] x86/shstk: Add Kconfig option for shadow stack
[v6,03/41] x86/cpufeatures: Add CPU feature flags for shadow stacks
[v6,04/41] x86/cpufeatures: Enable CET CR4 bit for shadow stack
[v6,05/41] x86/fpu/xstate: Introduce CET MSR and XSAVES supervisor states
[v6,06/41] x86/fpu: Add helper for modifying xstate
[v6,07/41] x86: Move control protection handler to separate file
[v6,08/41] x86/shstk: Add user control-protection fault handler
[v6,09/41] x86/mm: Remove _PAGE_DIRTY from kernel RO pages
[v6,10/41] x86/mm: Move pmd_write(), pud_write() up in the file
[v6,11/41] mm: Introduce pte_mkwrite_kernel()
[v6,12/41] s390/mm: Introduce pmd_mkwrite_kernel()
[v6,13/41] mm: Make pte_mkwrite() take a VMA
[v6,14/41] x86/mm: Introduce _PAGE_SAVED_DIRTY
[v6,15/41] x86/mm: Update ptep/pmdp_set_wrprotect() for _PAGE_SAVED_DIRTY
[v6,16/41] x86/mm: Start actually marking _PAGE_SAVED_DIRTY
[v6,17/41] mm: Move VM_UFFD_MINOR_BIT from 37 to 38
[v6,18/41] mm: Introduce VM_SHADOW_STACK for shadow stack memory
[v6,19/41] x86/mm: Check shadow stack page fault errors
[v6,20/41] x86/mm: Teach pte_mkwrite() about stack memory
[v6,21/41] mm: Add guard pages around a shadow stack.
[v6,22/41] mm/mmap: Add shadow stack pages to memory accounting
[v6,23/41] mm: Re-introduce vm_flags to do_mmap()
[v6,24/41] mm: Don't allow write GUPs to shadow stack memory
[v6,25/41] x86/mm: Introduce MAP_ABOVE4G
[v6,26/41] mm: Warn on shadow stack memory in wrong vma
[v6,27/41] x86/mm: Warn if create Write=0,Dirty=1 with raw prot
[v6,28/41] x86: Introduce userspace API for shadow stack
[v6,29/41] x86/shstk: Add user-mode shadow stack support
[v6,30/41] x86/shstk: Handle thread shadow stack
[v6,31/41] x86/shstk: Introduce routines modifying shstk
[v6,32/41] x86/shstk: Handle signals for shadow stack
[v6,33/41] x86/shstk: Introduce map_shadow_stack syscall
[v6,34/41] x86/shstk: Support WRSS for userspace
[v6,35/41] x86: Expose thread features in /proc/$PID/status
[v6,36/41] x86/shstk: Wire in shadow stack interface
[v6,37/41] selftests/x86: Add shadow stack test
[v6,38/41] x86/fpu: Add helper for initing features
[v6,39/41] x86: Add PTRACE interface for shadow stack
[v6,40/41] x86/shstk: Add ARCH_SHSTK_UNLOCK
[v6,41/41] x86/shstk: Add ARCH_SHSTK_STATUS

Headers

From: Rick Edgecombe <rick.p.edgecombe@intel.com>
To: x86@kernel.org,
	"H . Peter Anvin" <hpa@zytor.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org,
	linux-mm@kvack.org,
	linux-arch@vger.kernel.org,
	linux-api@vger.kernel.org,
	Arnd Bergmann <arnd@arndb.de>,
	Andy Lutomirski <luto@kernel.org>,
	Balbir Singh <bsingharora@gmail.com>,
	Borislav Petkov <bp@alien8.de>,
	Cyrill Gorcunov <gorcunov@gmail.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Eugene Syromiatnikov <esyr@redhat.com>,
	Florian Weimer <fweimer@redhat.com>,
	"H . J . Lu" <hjl.tools@gmail.com>,
	Jann Horn <jannh@google.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Kees Cook <keescook@chromium.org>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Nadav Amit <nadav.amit@gmail.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Pavel Machek <pavel@ucw.cz>,
	Peter Zijlstra <peterz@infradead.org>,
	Randy Dunlap <rdunlap@infradead.org>,
	Weijiang Yang <weijiang.yang@intel.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	John Allen <john.allen@amd.com>,
	kcc@google.com,
	eranian@google.com,
	rppt@kernel.org,
	jamorris@linux.microsoft.com,
	dethoma@microsoft.com,
	akpm@linux-foundation.org,
	Andrew.Cooper3@citrix.com,
	christina.schimpe@intel.com,
	david@redhat.com,
	debug@rivosinc.com
Cc: rick.p.edgecombe@intel.com,
	Yu-cheng Yu <yu-cheng.yu@intel.com>
Subject: [PATCH v6 19/41] x86/mm: Check shadow stack page fault errors
Date: Sat, 18 Feb 2023 13:14:11 -0800
Message-Id: <20230218211433.26859-20-rick.p.edgecombe@intel.com>
In-Reply-To: <20230218211433.26859-1-rick.p.edgecombe@intel.com>
References: <20230218211433.26859-1-rick.p.edgecombe@intel.com>
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Commit Message

Edgecombe, Rick P Feb. 18, 2023, 9:14 p.m. UTC

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

The CPU performs "shadow stack accesses" when it expects to encounter
shadow stack mappings. These accesses can be implicit (via CALL/RET
instructions) or explicit (instructions like WRSS).

Shadow stack accesses to shadow-stack mappings can result in faults in
normal, valid operation just like regular accesses to regular mappings.
Shadow stacks need some of the same features like delayed allocation, swap
and copy-on-write. The kernel needs to use faults to implement those
features.

The architecture has concepts of both shadow stack reads and shadow stack
writes. Any shadow stack access to non-shadow stack memory will generate
a fault with the shadow stack error code bit set.

This means that, unlike normal write protection, the fault handler needs
to create a type of memory that can be written to (with instructions that
generate shadow stack writes), even to fulfill a read access. So in the
case of COW memory, the COW needs to take place even with a shadow stack
read. Otherwise the page will be left (shadow stack) writable in
userspace. So to trigger the appropriate behavior, set FAULT_FLAG_WRITE
for shadow stack accesses, even if the access was a shadow stack read.
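
As a rough illustration of the paragraph above, the mapping from error
code bits to fault flags could be modelled in plain userspace C as
below. This is only a sketch: the X86_PF_* values follow
arch/x86/include/asm/trap_pf.h, but the SKETCH_FAULT_FLAG_* values and
sketch_fault_flags() are hypothetical stand-ins, not the kernel's
definitions.

```c
/* Error code bits, matching arch/x86/include/asm/trap_pf.h. */
enum x86_pf_error_code {
	X86_PF_PROT  = 1 << 0,
	X86_PF_WRITE = 1 << 1,
	X86_PF_USER  = 1 << 2,
	X86_PF_INSTR = 1 << 4,
	X86_PF_SHSTK = 1 << 6,
};

/* Stand-in flag values for this sketch; not the kernel's definitions. */
#define SKETCH_FAULT_FLAG_WRITE       (1u << 0)
#define SKETCH_FAULT_FLAG_INSTRUCTION (1u << 1)

/* Hypothetical helper: derive fault flags from a page fault error code. */
static unsigned int sketch_fault_flags(unsigned long error_code)
{
	unsigned int flags = 0;

	/* A shadow stack access is handled as a write, even if it was a read. */
	if (error_code & X86_PF_SHSTK)
		flags |= SKETCH_FAULT_FLAG_WRITE;
	if (error_code & X86_PF_WRITE)
		flags |= SKETCH_FAULT_FLAG_WRITE;
	if (error_code & X86_PF_INSTR)
		flags |= SKETCH_FAULT_FLAG_INSTRUCTION;

	return flags;
}
```

With this model, a user-mode shadow stack read (X86_PF_USER |
X86_PF_SHSTK, with X86_PF_WRITE clear) still yields the write flag,
which is the behavior the patch relies on.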

For the purpose of making this clearer, consider the following example.
If a process has a shadow stack, and forks, the shadow stack PTEs will
become read-only due to COW. If the CPU in one process performs a shadow
stack read access to the shadow stack, for example executing a RET and
causing the CPU to read the shadow stack copy of the return address, then
in order for the fault to be resolved the PTE will need to be set with
shadow stack permissions. But then the memory would be changeable from
userspace (from CALL, RET, WRSS, etc). So this scenario needs to trigger
COW, otherwise the shared page would be changeable from both processes.
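
The fork example can be played through with a toy userspace model,
sketched below. Everything here (struct sketch_page, struct sketch_pte,
sketch_fork(), sketch_shstk_fault()) is invented for illustration and
greatly simplifies what core MM does; it only shows why a shadow stack
read must break COW before the PTE gets shadow stack permission.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 16
#define SKETCH_MAX_PAGES 4

struct sketch_page {
	int refcount;				/* number of PTEs mapping this page */
	unsigned char data[SKETCH_PAGE_SIZE];
};

struct sketch_pte {
	struct sketch_page *page;
	bool shstk;	/* true: shadow stack permission, false: read-only */
};

static struct sketch_page sketch_pool[SKETCH_MAX_PAGES];
static int sketch_pool_used;

/* Hand out a zeroed page from a small static pool, or NULL if exhausted. */
static struct sketch_page *sketch_alloc_page(void)
{
	struct sketch_page *page;

	if (sketch_pool_used >= SKETCH_MAX_PAGES)
		return NULL;
	page = &sketch_pool[sketch_pool_used++];
	page->refcount = 1;
	memset(page->data, 0, sizeof(page->data));
	return page;
}

/* Model of fork(): both PTEs share the page and become read-only. */
static void sketch_fork(struct sketch_pte *parent, struct sketch_pte *child)
{
	parent->shstk = false;
	child->page = parent->page;
	child->shstk = false;
	parent->page->refcount++;
}

/*
 * Resolve a shadow stack fault, read or write alike. A shared page is
 * copied first, so shadow stack permission is only ever granted on a
 * page that this PTE owns exclusively. Returns 0 on success.
 */
static int sketch_shstk_fault(struct sketch_pte *pte)
{
	if (pte->page->refcount > 1) {
		struct sketch_page *copy = sketch_alloc_page();

		if (!copy)
			return -1;
		memcpy(copy->data, pte->page->data, sizeof(copy->data));
		pte->page->refcount--;
		pte->page = copy;
	}
	pte->shstk = true;
	return 0;
}
```

If sketch_shstk_fault() skipped the copy and only set shstk, the child
could modify the page that the parent still maps, which is the problem
the example describes.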

Shadow stack accesses can also result in errors, such as when a shadow
stack overflows, or if a shadow stack access occurs to a non-shadow-stack
mapping. Also, generate the errors for invalid shadow stack accesses.
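
The error cases can be summarized with a small userspace model of the
two branches this patch touches in access_error(). It is a sketch only:
the SKETCH_VM_* bits and sketch_access_error() are hypothetical
stand-ins, and the remaining checks of the real function (protection
keys, read and exec handling) are left out.

```c
#include <stdbool.h>

/* Error code bits, matching arch/x86/include/asm/trap_pf.h. */
enum {
	X86_PF_WRITE = 1 << 1,
	X86_PF_SHSTK = 1 << 6,
};

/* Stand-in VMA flag bits for this sketch; not the kernel's definitions. */
#define SKETCH_VM_WRITE        (1ul << 0)
#define SKETCH_VM_SHADOW_STACK (1ul << 1)

/* Returns true if the access must be rejected for this VMA. */
static bool sketch_access_error(unsigned long error_code,
				unsigned long vm_flags)
{
	/* Shadow stack accesses are only valid on writable shadow stack VMAs. */
	if (error_code & X86_PF_SHSTK) {
		if (!(vm_flags & SKETCH_VM_SHADOW_STACK))
			return true;
		if (!(vm_flags & SKETCH_VM_WRITE))
			return true;
		return false;
	}

	/* Normal writes must not target shadow stack VMAs. */
	if (error_code & X86_PF_WRITE) {
		if (vm_flags & SKETCH_VM_SHADOW_STACK)
			return true;
		if (!(vm_flags & SKETCH_VM_WRITE))
			return true;
		return false;
	}

	/* Other checks done by the real function are not modelled here. */
	return false;
}
```

The two branches are deliberately asymmetric: a shadow stack access to
a normal writable VMA is an error, and so is a normal write to a shadow
stack VMA.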

Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Tested-by: John Allen <john.allen@amd.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>

---
v6:
 - Update comment due to rename of Cow bit to SavedDirty

v5:
 - Add description of COW example (Boris)
 - Replace "permissioned" (Boris)
 - Remove capitalization of shadow stack (Boris)

v4:
 - Further improve comment talking about FAULT_FLAG_WRITE (Peterz)

v3:
 - Improve comment talking about using FAULT_FLAG_WRITE (Peterz)
---
 arch/x86/include/asm/trap_pf.h |  2 ++
 arch/x86/mm/fault.c            | 38 ++++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+)

Comments

David Hildenbrand Feb. 20, 2023, 12:57 p.m. UTC | #1

On 18.02.23 22:14, Rick Edgecombe wrote:
> From: Yu-cheng Yu <yu-cheng.yu@intel.com>
> 
> The CPU performs "shadow stack accesses" when it expects to encounter
> shadow stack mappings. These accesses can be implicit (via CALL/RET
> instructions) or explicit (instructions like WRSS).
> 
> Shadow stack accesses to shadow-stack mappings can result in faults in
> normal, valid operation just like regular accesses to regular mappings.
> Shadow stacks need some of the same features like delayed allocation, swap
> and copy-on-write. The kernel needs to use faults to implement those
> features.
> 
> The architecture has concepts of both shadow stack reads and shadow stack
> writes. Any shadow stack access to non-shadow stack memory will generate
> a fault with the shadow stack error code bit set.
> 
> This means that, unlike normal write protection, the fault handler needs
> to create a type of memory that can be written to (with instructions that
> generate shadow stack writes), even to fulfill a read access. So in the
> case of COW memory, the COW needs to take place even with a shadow stack
> read. Otherwise the page will be left (shadow stack) writable in
> userspace. So to trigger the appropriate behavior, set FAULT_FLAG_WRITE
> for shadow stack accesses, even if the access was a shadow stack read.
> 
> For the purpose of making this clearer, consider the following example.
> If a process has a shadow stack, and forks, the shadow stack PTEs will
> become read-only due to COW. If the CPU in one process performs a shadow
> stack read access to the shadow stack, for example executing a RET and
> causing the CPU to read the shadow stack copy of the return address, then
> in order for the fault to be resolved the PTE will need to be set with
> shadow stack permissions. But then the memory would be changeable from
> userspace (from CALL, RET, WRSS, etc). So this scenario needs to trigger
> COW, otherwise the shared page would be changeable from both processes.
> 
> Shadow stack accesses can also result in errors, such as when a shadow
> stack overflows, or if a shadow stack access occurs to a non-shadow-stack
> mapping. Also, generate the errors for invalid shadow stack accesses.
> 
> Tested-by: Pengfei Xu <pengfei.xu@intel.com>
> Tested-by: John Allen <john.allen@amd.com>
> Reviewed-by: Kees Cook <keescook@chromium.org>
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
> Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> 
> ---
> v6:
>   - Update comment due to rename of Cow bit to SavedDirty
> 
> v5:
>   - Add description of COW example (Boris)
>   - Replace "permissioned" (Boris)
>   - Remove capitalization of shadow stack (Boris)
> 
> v4:
>   - Further improve comment talking about FAULT_FLAG_WRITE (Peterz)
> 
> v3:
>   - Improve comment talking about using FAULT_FLAG_WRITE (Peterz)
> ---
>   arch/x86/include/asm/trap_pf.h |  2 ++
>   arch/x86/mm/fault.c            | 38 ++++++++++++++++++++++++++++++++++
>   2 files changed, 40 insertions(+)
> 
> diff --git a/arch/x86/include/asm/trap_pf.h b/arch/x86/include/asm/trap_pf.h
> index 10b1de500ab1..afa524325e55 100644
> --- a/arch/x86/include/asm/trap_pf.h
> +++ b/arch/x86/include/asm/trap_pf.h
> @@ -11,6 +11,7 @@
>    *   bit 3 ==				1: use of reserved bit detected
>    *   bit 4 ==				1: fault was an instruction fetch
>    *   bit 5 ==				1: protection keys block access
> + *   bit 6 ==				1: shadow stack access fault
>    *   bit 15 ==				1: SGX MMU page-fault
>    */
>   enum x86_pf_error_code {
> @@ -20,6 +21,7 @@ enum x86_pf_error_code {
>   	X86_PF_RSVD	=		1 << 3,
>   	X86_PF_INSTR	=		1 << 4,
>   	X86_PF_PK	=		1 << 5,
> +	X86_PF_SHSTK	=		1 << 6,
>   	X86_PF_SGX	=		1 << 15,
>   };
>   
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 7b0d4ab894c8..42885d8e2036 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -1138,8 +1138,22 @@ access_error(unsigned long error_code, struct vm_area_struct *vma)
>   				       (error_code & X86_PF_INSTR), foreign))
>   		return 1;
>   
> +	/*
> +	 * Shadow stack accesses (PF_SHSTK=1) are only permitted to
> +	 * shadow stack VMAs. All other accesses result in an error.
> +	 */
> +	if (error_code & X86_PF_SHSTK) {
> +		if (unlikely(!(vma->vm_flags & VM_SHADOW_STACK)))
> +			return 1;
> +		if (unlikely(!(vma->vm_flags & VM_WRITE)))
> +			return 1;
> +		return 0;
> +	}
> +
>   	if (error_code & X86_PF_WRITE) {
>   		/* write, present and write, not present: */
> +		if (unlikely(vma->vm_flags & VM_SHADOW_STACK))
> +			return 1;
>   		if (unlikely(!(vma->vm_flags & VM_WRITE)))
>   			return 1;
>   		return 0;
> @@ -1331,6 +1345,30 @@ void do_user_addr_fault(struct pt_regs *regs,
>   
>   	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
>   
> +	/*
> +	 * When a page becomes COW it changes from a shadow stack permission
> +	 * page (Write=0,Dirty=1) to (Write=0,Dirty=0,SavedDirty=1), which is simply
> +	 * read-only to the CPU. When shadow stack is enabled, a RET would
> +	 * normally pop the shadow stack by reading it with a "shadow stack
> +	 * read" access. However, in the COW case the shadow stack memory does
> +	 * not have shadow stack permissions, it is read-only. So it will
> +	 * generate a fault.
> +	 *
> +	 * For conventionally writable pages, a read can be serviced with a
> +	 * read only PTE, and COW would not have to happen. But for shadow
> +	 * stack, there isn't the concept of read-only shadow stack memory.
> +	 * If it is shadow stack permission, it can be modified via CALL and
> +	 * RET instructions. So COW needs to happen before any memory can be
> +	 * mapped with shadow stack permissions.
> +	 *
> +	 * Shadow stack accesses (read or write) need to be serviced with
> +	 * shadow stack permission memory, so in the case of a shadow stack
> +	 * read access, treat it as a WRITE fault so both COW will happen and
> +	 * the write fault path will tickle maybe_mkwrite() and map the memory
> +	 * shadow stack.
> +	 */

Again, I suggest dropping all details about COW from this comment and 
from the patch description. It's just one such case that can happen.

Edgecombe, Rick P Feb. 22, 2023, 11:07 p.m. UTC | #2

On Mon, 2023-02-20 at 13:57 +0100, David Hildenbrand wrote:
> >    
> > +     /*
> > +      * When a page becomes COW it changes from a shadow stack permission
> > +      * page (Write=0,Dirty=1) to (Write=0,Dirty=0,SavedDirty=1), which is simply
> > +      * read-only to the CPU. When shadow stack is enabled, a RET would
> > +      * normally pop the shadow stack by reading it with a "shadow stack
> > +      * read" access. However, in the COW case the shadow stack memory does
> > +      * not have shadow stack permissions, it is read-only. So it will
> > +      * generate a fault.
> > +      *
> > +      * For conventionally writable pages, a read can be serviced with a
> > +      * read only PTE, and COW would not have to happen. But for shadow
> > +      * stack, there isn't the concept of read-only shadow stack memory.
> > +      * If it is shadow stack permission, it can be modified via CALL and
> > +      * RET instructions. So COW needs to happen before any memory can be
> > +      * mapped with shadow stack permissions.
> > +      *
> > +      * Shadow stack accesses (read or write) need to be serviced with
> > +      * shadow stack permission memory, so in the case of a shadow stack
> > +      * read access, treat it as a WRITE fault so both COW will happen and
> > +      * the write fault path will tickle maybe_mkwrite() and map the memory
> > +      * shadow stack.
> > +      */
> 
> Again, I suggest dropping all details about COW from this comment and
> from the patch description. It's just one such case that can happen.

Hi David,

I was just trying to edit this one to drop COW details, but I think in
this case, one of the major reasons for the code *is* actually COW. We
are not working around the whole inadvertent shadow stack memory piece
here, but something else: Making sure shadow stack memory is faulted in
and doing COW if required to make this possible. I came up with this,
does it seem better?

/*
 * For conventionally writable pages, a read can be serviced with a
 * read only PTE. But for shadow stack, there isn't a concept of
 * read-only shadow stack memory. If a PTE has the shadow stack
 * permission, it can be modified via CALL and RET instructions. So
 * core MM needs to fault in a writable PTE and do things it already
 * does for write faults.
 *
 * Shadow stack accesses (read or write) need to be serviced with
 * shadow stack permission memory, so in the case of a shadow stack
 * read access, treat it as a WRITE fault so both any required COW will
 * happen and the write fault path will tickle maybe_mkwrite() and map
 * the memory shadow stack.
 */

Thanks,
Rick

David Hildenbrand Feb. 23, 2023, 12:55 p.m. UTC | #3

On 23.02.23 00:07, Edgecombe, Rick P wrote:
> On Mon, 2023-02-20 at 13:57 +0100, David Hildenbrand wrote:
>>>     
>>> +     /*
>>> +      * When a page becomes COW it changes from a shadow stack permission
>>> +      * page (Write=0,Dirty=1) to (Write=0,Dirty=0,SavedDirty=1), which is simply
>>> +      * read-only to the CPU. When shadow stack is enabled, a RET would
>>> +      * normally pop the shadow stack by reading it with a "shadow stack
>>> +      * read" access. However, in the COW case the shadow stack memory does
>>> +      * not have shadow stack permissions, it is read-only. So it will
>>> +      * generate a fault.
>>> +      *
>>> +      * For conventionally writable pages, a read can be serviced with a
>>> +      * read only PTE, and COW would not have to happen. But for shadow
>>> +      * stack, there isn't the concept of read-only shadow stack memory.
>>> +      * If it is shadow stack permission, it can be modified via CALL and
>>> +      * RET instructions. So COW needs to happen before any memory can be
>>> +      * mapped with shadow stack permissions.
>>> +      *
>>> +      * Shadow stack accesses (read or write) need to be serviced with
>>> +      * shadow stack permission memory, so in the case of a shadow stack
>>> +      * read access, treat it as a WRITE fault so both COW will happen and
>>> +      * the write fault path will tickle maybe_mkwrite() and map the memory
>>> +      * shadow stack.
>>> +      */
>>
>> Again, I suggest dropping all details about COW from this comment and
>> from the patch description. It's just one such case that can happen.
> 
> Hi David,

Hi Rick,

> 
> I was just trying to edit this one to drop COW details, but I think in
> this case, one of the major reasons for the code *is* actually COW. We
> are not working around the whole inadvertent shadow stack memory piece
> here, but something else: Making sure shadow stack memory is faulted in
> and doing COW if required to make this possible. I came up with this,
> does it seem better?

Regarding the fault handling I completely agree. We have to treat a read 
like a write event. And as read-only shadow stack PTEs don't exist, we 
have to tell the MM to create a writable one for us.

> 
> 
> /*
>  * For conventionally writable pages, a read can be serviced with a
>  * read only PTE. But for shadow stack, there isn't a concept of
>  * read-only shadow stack memory. If a PTE has the shadow stack
>  * permission, it can be modified via CALL and RET instructions. So
>  * core MM needs to fault in a writable PTE and do things it already
>  * does for write faults.
>  *
>  * Shadow stack accesses (read or write) need to be serviced with
>  * shadow stack permission memory, so in the case of a shadow stack
>  * read access, treat it as a WRITE fault so both any required COW will
>  * happen and the write fault path will tickle maybe_mkwrite() and map
>  * the memory shadow stack.
>  */

That sounds good! I'd rewrite the last part slightly.

"
Shadow stack accesses (read or write) need to be serviced with
shadow stack permission memory, which always includes write permissions. 
So in the case of a shadow stack read access, treat it as a WRITE fault. 
This will make sure that MM will prepare everything (e.g., break COW) 
such that maybe_mkwrite() can create a proper shadow stack PTE.
"

diff --git a/arch/x86/include/asm/trap_pf.h b/arch/x86/include/asm/trap_pf.h
index 10b1de500ab1..afa524325e55 100644
--- a/arch/x86/include/asm/trap_pf.h
+++ b/arch/x86/include/asm/trap_pf.h
@@ -11,6 +11,7 @@ 
  *   bit 3 ==				1: use of reserved bit detected
  *   bit 4 ==				1: fault was an instruction fetch
  *   bit 5 ==				1: protection keys block access
+ *   bit 6 ==				1: shadow stack access fault
  *   bit 15 ==				1: SGX MMU page-fault
  */
 enum x86_pf_error_code {
@@ -20,6 +21,7 @@  enum x86_pf_error_code {
 	X86_PF_RSVD	=		1 << 3,
 	X86_PF_INSTR	=		1 << 4,
 	X86_PF_PK	=		1 << 5,
+	X86_PF_SHSTK	=		1 << 6,
 	X86_PF_SGX	=		1 << 15,
 };
 
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 7b0d4ab894c8..42885d8e2036 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1138,8 +1138,22 @@  access_error(unsigned long error_code, struct vm_area_struct *vma)
 				       (error_code & X86_PF_INSTR), foreign))
 		return 1;
 
+	/*
+	 * Shadow stack accesses (PF_SHSTK=1) are only permitted to
+	 * shadow stack VMAs. All other accesses result in an error.
+	 */
+	if (error_code & X86_PF_SHSTK) {
+		if (unlikely(!(vma->vm_flags & VM_SHADOW_STACK)))
+			return 1;
+		if (unlikely(!(vma->vm_flags & VM_WRITE)))
+			return 1;
+		return 0;
+	}
+
 	if (error_code & X86_PF_WRITE) {
 		/* write, present and write, not present: */
+		if (unlikely(vma->vm_flags & VM_SHADOW_STACK))
+			return 1;
 		if (unlikely(!(vma->vm_flags & VM_WRITE)))
 			return 1;
 		return 0;
@@ -1331,6 +1345,30 @@  void do_user_addr_fault(struct pt_regs *regs,
 
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
 
+	/*
+	 * When a page becomes COW it changes from a shadow stack permission
+	 * page (Write=0,Dirty=1) to (Write=0,Dirty=0,SavedDirty=1), which is simply
+	 * read-only to the CPU. When shadow stack is enabled, a RET would
+	 * normally pop the shadow stack by reading it with a "shadow stack
+	 * read" access. However, in the COW case the shadow stack memory does
+	 * not have shadow stack permissions, it is read-only. So it will
+	 * generate a fault.
+	 *
+	 * For conventionally writable pages, a read can be serviced with a
+	 * read only PTE, and COW would not have to happen. But for shadow
+	 * stack, there isn't the concept of read-only shadow stack memory.
+	 * If it is shadow stack permission, it can be modified via CALL and
+	 * RET instructions. So COW needs to happen before any memory can be
+	 * mapped with shadow stack permissions.
+	 *
+	 * Shadow stack accesses (read or write) need to be serviced with
+	 * shadow stack permission memory, so in the case of a shadow stack
+	 * read access, treat it as a WRITE fault so both COW will happen and
+	 * the write fault path will tickle maybe_mkwrite() and map the memory
+	 * shadow stack.
+	 */
+	if (error_code & X86_PF_SHSTK)
+		flags |= FAULT_FLAG_WRITE;
 	if (error_code & X86_PF_WRITE)
 		flags |= FAULT_FLAG_WRITE;
 	if (error_code & X86_PF_INSTR)
