diff mbox series

[RFC,v9,13/51] x86/fault: Handle RMP page faults for user addresses

Message ID 20230612042559.375660-14-michael.roth@amd.com (mailing list archive)
State Not Applicable
Delegated to: Herbert Xu
Headers show
Series Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support | expand

Commit Message

Michael Roth June 12, 2023, 4:25 a.m. UTC
From: Brijesh Singh <brijesh.singh@amd.com>

When SEV-SNP is enabled globally, a write from the host is subject to
checks performed by the hardware against the RMP table (APM2 15.36.10)
at the end of a page walk:

  1. Assigned bit in the RMP table is not set (i.e page is shared).
  2. Immutable bit in the RMP table is not set.
  3. If the page table entry that gives the sPA indicates that the
     target page size is a large page, then all RMP entries for the 4KB
     constituting pages of the target must have the assigned bit 0.

Nothing constructive can come of an attempt by userspace to violate case
1) (which will result in writing garbage due to page encryption) or case
2) (userspace should not ever need or be allowed to write to a page that
the host has specifically needed to mark immutable).

Case 3) is dependent on the hypervisor. In case of KVM, due to how
shared/private pages are partitioned into separate memory pools via
restricted/guarded memory, there should never be a case where a page in
the private pool overlaps with a shared page: either it is a
hugepage-sized allocation and all the sub-pages are private, or it is a
single-page allocation, in which case it cannot overlap with anything
but itself.

Therefore, for all 3 cases, it is appropriate to simply kill the
userspace process if it ever generates an RMP #PF. Implement that logic
here.

Co-developed-by: Jarkko Sakkinen <jarkko.sakkinen@profian.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@profian.com>
Co-developed-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
[mdr: drop all previous page-splitting logic since it is no longer
 needed with restricted/guarded memory, update commit message
 accordingly]
Signed-off-by: Michael Roth <michael.roth@amd.com>
---
 arch/x86/mm/fault.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Dave Hansen June 12, 2023, 4:40 p.m. UTC | #1
On 6/11/23 21:25, Michael Roth wrote:
> From: Brijesh Singh <brijesh.singh@amd.com>
> 
> When SEV-SNP is enabled globally, a write from the host is subject to
> checks performed by the hardware against the RMP table (APM2 15.36.10)
> at the end of a page walk:
> 
>   1. Assigned bit in the RMP table is not set (i.e page is shared).
>   2. Immutable bit in the RMP table is not set.
>   3. If the page table entry that gives the sPA indicates that the
>      target page size is a large page, then all RMP entries for the 4KB
>      constituting pages of the target must have the assigned bit 0.
> 
> Nothing constructive can come of an attempt by userspace to violate case
> 1) (which will result in writing garbage due to page encryption) or case
> 2) (userspace should not ever need or be allowed to write to a page that
> the host has specifically needed to mark immutable).

What does this _mean_?  If nothing constructive can come of it, what
does that mean for the kernel?

> Case 3) is dependent on the hypervisor. In case of KVM, due to how
> shared/private pages are partitioned into separate memory pools via
> restricted/guarded memory, there should never be a case where a page in
> the private pool overlaps with a shared page: either it is a
> hugepage-sized allocation and all the sub-pages are private, or it is a
> single-page allocation, in which case it cannot overlap with anything
> but itself.
> 
> Therefore, for all 3 cases, it is appropriate to simply kill the
> userspace process if it ever generates an RMP #PF. Implement that logic
> here.
...
> +	if (error_code & X86_PF_RMP) {
> +		pr_err("Unexpected RMP page fault for address 0x%lx, terminating process\n",
> +		       address);
> +		do_sigbus(regs, error_code, address, VM_FAULT_SIGBUS);
> +		return;
> +	}
> +

This is special-snowflake code.  You're making the argument that an RMP
fault is a special snowflake and needs special handling.

Why should an RMP violation be any different than, say a write to a
read-only page (that also ends in signal delivery)?

I kinda dislike the entire changelog here.  I really don't know what
point it's making or what it is arguing.
diff mbox series

Patch

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index d46b9cf832b9..6465bff9d1ba 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1329,6 +1329,13 @@  void do_user_addr_fault(struct pt_regs *regs,
 	if (error_code & X86_PF_INSTR)
 		flags |= FAULT_FLAG_INSTRUCTION;
 
+	if (error_code & X86_PF_RMP) {
+		pr_err("Unexpected RMP page fault for address 0x%lx, terminating process\n",
+		       address);
+		do_sigbus(regs, error_code, address, VM_FAULT_SIGBUS);
+		return;
+	}
+
 #ifdef CONFIG_X86_64
 	/*
 	 * Faults in the vsyscall page might need emulation.  The