
KVM: x86: Add GPA limit check to kvm_arch_vcpu_pre_fault_memory()

Message ID f2a46971d37ee3bf32ff33dc730e16bf0f755410.1721091397.git.isaku.yamahata@intel.com (mailing list archive)
State New, archived
Series KVM: x86: Add GPA limit check to kvm_arch_vcpu_pre_fault_memory()

Commit Message

Isaku Yamahata July 16, 2024, 1:22 a.m. UTC
From: Isaku Yamahata <isaku.yamahata@intel.com>

Add a GPA limit check to kvm_arch_vcpu_pre_fault_memory() based on the
guest maxphyaddr and kvm_mmu_max_gfn().

The KVM page fault handler decides which level of TDP to use, 4-level TDP
or 5-level TDP, based on the guest maxphyaddr (CPUID[0x80000008].EAX[7:0]),
the host maxphyaddr, and whether the host supports 5-level TDP or not.
4-level TDP can map GPAs up to 48 bits, and 5-level TDP can map GPAs up to
52 bits.  If the guest maxphyaddr <= 48, KVM uses 4-level TDP even when the
host supports 5-level TDP.
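
As a minimal illustration of that rule (a standalone sketch, not kernel
code; upstream implements the equivalent logic in kvm_mmu_get_tdp_level()):

#include <stdio.h>

/*
 * Even with 5-level TDP available, a guest maxphyaddr that fits in
 * 48 bits keeps KVM on 4-level TDP.
 */
static int tdp_level(int guest_maxphyaddr, int max_tdp_level)
{
	if (max_tdp_level == 5 && guest_maxphyaddr <= 48)
		return 4;
	return max_tdp_level;
}

int main(void)
{
	printf("maxphyaddr 46 -> %d-level TDP\n", tdp_level(46, 5)); /* 4 */
	printf("maxphyaddr 52 -> %d-level TDP\n", tdp_level(52, 5)); /* 5 */
	return 0;
}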

If we pass a GPA beyond the TDP mappable limit to the TDP MMU fault handler
(concretely, a GPA above 48 bits with 4-level TDP), it operates on the GPA
without the upper bits, (GPA & ((1UL << 48) - 1)), not the specified GPA.
That is not the expected behavior: it wrongly maps the GPA without the upper
bits with the page intended for the GPA with the upper bits.
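
The truncation itself is plain masking; a standalone sketch of the
arithmetic (not kernel code):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* A GPA with bits set above the 48-bit 4-level TDP limit. */
	uint64_t gpa = (1ULL << 50) | 0x1000;
	/* What the 4-level walk effectively operates on. */
	uint64_t truncated = gpa & ((1ULL << 48) - 1);

	printf("requested GPA: 0x%llx\n", (unsigned long long)gpa);
	printf("mapped GPA:    0x%llx\n", (unsigned long long)truncated);
	return 0;
}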

KVM_PRE_FAULT_MEMORY calls the x86 KVM page fault handler,
kvm_tdp_page_fault(), with a user-space-supplied GPA without this limit
check, so user space can trigger a WARN_ON_ONCE().  Check the GPA limit to
fix it.
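
A minimal sketch of the triggering path from userspace, assuming headers
that expose KVM_PRE_FAULT_MEMORY and an already-created vCPU fd (a real
caller loops until range.size reaches 0, as the selftest later in this
thread does):

#include <errno.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Ask KVM to pre-fault a GPA above the 48-bit 4-level TDP limit. */
static int prefault_out_of_range(int vcpu_fd)
{
	struct kvm_pre_fault_memory range = {
		.gpa  = 1ULL << 50,
		.size = 0x1000,
	};

	if (ioctl(vcpu_fd, KVM_PRE_FAULT_MEMORY, &range) < 0)
		return -errno;	/* with this patch (guest maxphyaddr <= 48): -E2BIG */
	return 0;
}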

- For the non-TDX case (DEFAULT_VM, SW_PROTECTED_VM, or SEV):
  When the host supports 5-level TDP, KVM decides to use 4-level TDP if
  cpuid_maxphyaddr() <= 48.  The cpuid_maxphyaddr() check prevents
  KVM_PRE_FAULT_MEMORY from passing a GFN beyond the mappable GFN.

- For the TDX case:
  We'd like to exclude the shared bit (or gfn_direct_mask in [1]) from the
  GPA passed to the TDP MMU so that the TDP MMU can handle Secure-EPT or
  Shared-EPT (direct or mirrored in [1]) without explicitly
  setting/clearing the GPA (except when setting up the TDP iterator,
  tdp_iter_refresh_sptep()).  In other patches, we'd like to make
  kvm_mmu_max_gfn() per-VM so that for TDX it is 52 or 47, independent of
  the guest maxphyaddr.
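
For reference, kvm_mmu_max_gfn() is currently global, roughly as below
(paraphrased from arch/x86/kvm/mmu.h at the time of this series; treat it
as a sketch rather than an exact quote):

static inline gfn_t kvm_mmu_max_gfn(void)
{
	/*
	 * With TDP the mappable GPA range is bounded by the host
	 * MAXPHYADDR, not the guest's; without TDP, KVM can map GPAs
	 * up to 52 bits.
	 */
	int max_gpa_bits = likely(tdp_enabled) ? shadow_phys_bits : 52;

	return (1ULL << (max_gpa_bits - PAGE_SHIFT)) - 1;
}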

Fixes: 6e01b7601dfe ("KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory()")
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 arch/x86/kvm/mmu/mmu.c | 5 +++++
 1 file changed, 5 insertions(+)


base-commit: c8b8b8190a80b591aa73c27c70a668799f8db547

Comments

Sean Christopherson July 16, 2024, 8:17 p.m. UTC | #1
On Mon, Jul 15, 2024, isaku.yamahata@intel.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> Add a GPA limit check to kvm_arch_vcpu_pre_fault_memory() based on the
> guest maxphyaddr and kvm_mmu_max_gfn().
> 
> The KVM page fault handler decides which level of TDP to use, 4-level TDP
> or 5-level TDP, based on the guest maxphyaddr (CPUID[0x80000008].EAX[7:0]),
> the host maxphyaddr, and whether the host supports 5-level TDP or not.
> 4-level TDP can map GPAs up to 48 bits, and 5-level TDP can map GPAs up to
> 52 bits.  If the guest maxphyaddr <= 48, KVM uses 4-level TDP even when the
> host supports 5-level TDP.
> 
> If we pass a GPA beyond the TDP mappable limit to the TDP MMU fault handler
> (concretely, a GPA above 48 bits with 4-level TDP), it operates on the GPA
> without the upper bits, (GPA & ((1UL << 48) - 1)), not the specified GPA.
> That is not the expected behavior: it wrongly maps the GPA without the upper
> bits with the page intended for the GPA with the upper bits.
> 
> KVM_PRE_FAULT_MEMORY calls the x86 KVM page fault handler,
> kvm_tdp_page_fault(), with a user-space-supplied GPA without this limit
> check, so user space can trigger a WARN_ON_ONCE().  Check the GPA limit
> to fix it.

Which WARN?

> - For the non-TDX case (DEFAULT_VM, SW_PROTECTED_VM, or SEV):
>   When the host supports 5-level TDP, KVM decides to use 4-level TDP if
>   cpuid_maxphyaddr() <= 48.  The cpuid_maxphyaddr() check prevents
>   KVM_PRE_FAULT_MEMORY from passing a GFN beyond the mappable GFN.

Hardening against cpuid_maxphyaddr() should be out of scope.  We don't enforce
it for guest faults, e.g. KVM doesn't kill the guest if allow_smaller_maxphyaddr
is false and the GPA is supposed to be illegal.  And trying to enforce it here is
a fool's errand since userspace can simply do KVM_SET_CPUID2 to circumvent the
restriction.

> - For the TDX case:
>   We'd like to exclude the shared bit (or gfn_direct_mask in [1]) from the
>   GPA passed to the TDP MMU so that the TDP MMU can handle Secure-EPT or
>   Shared-EPT (direct or mirrored in [1]) without explicitly
>   setting/clearing the GPA (except when setting up the TDP iterator,
>   tdp_iter_refresh_sptep()).  In other patches, we'd like to make
>   kvm_mmu_max_gfn() per-VM so that for TDX it is 52 or 47, independent of
>   the guest maxphyaddr.
> 
> Fixes: 6e01b7601dfe ("KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory()")
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 4e0e9963066f..6ee5af55cee1 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4756,6 +4756,11 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
>  	u64 end;
>  	int r;
>  
> +	if (range->gpa >= (1UL << cpuid_maxphyaddr(vcpu)))
> +		return -E2BIG;
> +	if (gpa_to_gfn(range->gpa) > kvm_mmu_max_gfn())
> +		return -E2BIG;


Related to my thoughts on making kvm_mmu_max_gfn() per-VM and rejecting
aliased memslots, I think we should add a common helper that's used by
kvm_arch_prepare_memory_region() and kvm_arch_vcpu_pre_fault_memory() to
reject GPAs that are disallowed.

https://lore.kernel.org/all/ZpbKqG_ZhCWxl-Fc@google.com
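
E.g. something like the below sketch (the name kvm_is_gpa_mappable() and
the exact check are illustrative, not settled):

/*
 * Hypothetical common helper; @kvm is unused for now, but taking it
 * allows the limit to become per-VM later (e.g. for TDX).
 */
static inline bool kvm_is_gpa_mappable(struct kvm *kvm, gpa_t gpa)
{
	return gpa_to_gfn(gpa) <= kvm_mmu_max_gfn();
}

Both kvm_arch_prepare_memory_region() and kvm_arch_vcpu_pre_fault_memory()
would then reject any GPA for which this returns false.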

> +
>  	/*
>  	 * reload is efficient when called repeatedly, so we can do it on
>  	 * every iteration.
> 
> base-commit: c8b8b8190a80b591aa73c27c70a668799f8db547
> -- 
> 2.45.2
>
Isaku Yamahata July 16, 2024, 11:49 p.m. UTC | #2
On Tue, Jul 16, 2024 at 01:17:27PM -0700,
Sean Christopherson <seanjc@google.com> wrote:

> On Mon, Jul 15, 2024, isaku.yamahata@intel.com wrote:
> > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > 
> > Add a GPA limit check to kvm_arch_vcpu_pre_fault_memory() based on the
> > guest maxphyaddr and kvm_mmu_max_gfn().
> > 
> > The KVM page fault handler decides which level of TDP to use, 4-level TDP
> > or 5-level TDP, based on the guest maxphyaddr (CPUID[0x80000008].EAX[7:0]),
> > the host maxphyaddr, and whether the host supports 5-level TDP or not.
> > 4-level TDP can map GPAs up to 48 bits, and 5-level TDP can map GPAs up to
> > 52 bits.  If the guest maxphyaddr <= 48, KVM uses 4-level TDP even when the
> > host supports 5-level TDP.
> > 
> > If we pass a GPA beyond the TDP mappable limit to the TDP MMU fault handler
> > (concretely, a GPA above 48 bits with 4-level TDP), it operates on the GPA
> > without the upper bits, (GPA & ((1UL << 48) - 1)), not the specified GPA.
> > That is not the expected behavior: it wrongly maps the GPA without the upper
> > bits with the page intended for the GPA with the upper bits.
> > 
> > KVM_PRE_FAULT_MEMORY calls the x86 KVM page fault handler,
> > kvm_tdp_page_fault(), with a user-space-supplied GPA without this limit
> > check, so user space can trigger a WARN_ON_ONCE().  Check the GPA limit
> > to fix it.
> 
> Which WARN?

Sorry, I confused this with local changes for 4/5-level handling.


> 
> > - For the non-TDX case (DEFAULT_VM, SW_PROTECTED_VM, or SEV):
> >   When the host supports 5-level TDP, KVM decides to use 4-level TDP if
> >   cpuid_maxphyaddr() <= 48.  The cpuid_maxphyaddr() check prevents
> >   KVM_PRE_FAULT_MEMORY from passing a GFN beyond the mappable GFN.
> 
> Hardening against cpuid_maxphyaddr() should be out of scope.  We don't enforce
> it for guest faults, e.g. KVM doesn't kill the guest if allow_smaller_maxphyaddr
> is false and the GPA is supposed to be illegal.  And trying to enforce it here is
> a fool's errand since userspace can simply do KVM_SET_CPUID2 to circumvent the
> restriction.

Ok, I'll drop the maxphyaddr check.


> > - For the TDX case:
> >   We'd like to exclude the shared bit (or gfn_direct_mask in [1]) from the
> >   GPA passed to the TDP MMU so that the TDP MMU can handle Secure-EPT or
> >   Shared-EPT (direct or mirrored in [1]) without explicitly
> >   setting/clearing the GPA (except when setting up the TDP iterator,
> >   tdp_iter_refresh_sptep()).  In other patches, we'd like to make
> >   kvm_mmu_max_gfn() per-VM so that for TDX it is 52 or 47, independent of
> >   the guest maxphyaddr.
> > 
> > Fixes: 6e01b7601dfe ("KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory()")
> > Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> > ---
> >  arch/x86/kvm/mmu/mmu.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 4e0e9963066f..6ee5af55cee1 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -4756,6 +4756,11 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
> >  	u64 end;
> >  	int r;
> >  
> > +	if (range->gpa >= (1UL << cpuid_maxphyaddr(vcpu)))
> > +		return -E2BIG;
> > +	if (gpa_to_gfn(range->gpa) > kvm_mmu_max_gfn())
> > +		return -E2BIG;
> 
> 
> Related to my thoughts on making kvm_mmu_max_gfn() per-VM and rejecting
> aliased memslots, I think we should add a common helper that's used by
> kvm_arch_prepare_memory_region() and kvm_arch_vcpu_pre_fault_memory() to
> reject GPAs that are disallowed.
> 
> https://lore.kernel.org/all/ZpbKqG_ZhCWxl-Fc@google.com

I'll look into it.  I'll combine the two patches into a single patch series.
Isaku Yamahata July 18, 2024, 11:38 p.m. UTC | #3
On Tue, Jul 16, 2024 at 04:49:00PM -0700,
Isaku Yamahata <isaku.yamahata@intel.com> wrote:

> > > - For the non-TDX case (DEFAULT_VM, SW_PROTECTED_VM, or SEV):
> > >   When the host supports 5-level TDP, KVM decides to use 4-level TDP if
> > >   cpuid_maxphyaddr() <= 48.  The cpuid_maxphyaddr() check prevents
> > >   KVM_PRE_FAULT_MEMORY from passing a GFN beyond the mappable GFN.
> > 
> > Hardening against cpuid_maxphyaddr() should be out of scope.  We don't enforce
> > it for guest faults, e.g. KVM doesn't kill the guest if allow_smaller_maxphyaddr
> > is false and the GPA is supposed to be illegal.  And trying to enforce it here is
> > a fool's errand since userspace can simply do KVM_SET_CPUID2 to circumvent the
> > restriction.
> 
> Ok, I'll drop the maxphyaddr check.

Now Rick has added a patch to check for aliased GFNs, which makes this patch
and the per-VM mmu_max_gfn unnecessary.  Instead, I came up with an update to
the pre_fault test to cover the no-memslot case:
https://lore.kernel.org/kvm/20240718211230.1492011-19-rick.p.edgecombe@intel.com/

For the non-x86 case, I'm not sure what error to expect.


From d62fc5170b17788041d364e6a17f97f01be4130e Mon Sep 17 00:00:00 2001
Message-ID: <d62fc5170b17788041d364e6a17f97f01be4130e.1721345479.git.isaku.yamahata@intel.com>
From: Isaku Yamahata <isaku.yamahata@intel.com>
Date: Wed, 29 May 2024 12:13:20 -0700
Subject: [PATCH] KVM: selftests: Update pre_fault_memory_test.c to test no
 memslot case

Add test cases that pass a GPA with no memslot assigned and expect ENOENT.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
This test passes on the kvm queue branch, and also on the KVM TDX branch.
---
 .../selftests/kvm/pre_fault_memory_test.c     | 37 ++++++++++++++-----
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/kvm/pre_fault_memory_test.c b/tools/testing/selftests/kvm/pre_fault_memory_test.c
index 0350a8896a2f..8d057a0bc6fd 100644
--- a/tools/testing/selftests/kvm/pre_fault_memory_test.c
+++ b/tools/testing/selftests/kvm/pre_fault_memory_test.c
@@ -30,8 +30,8 @@ static void guest_code(uint64_t base_gpa)
 	GUEST_DONE();
 }
 
-static void pre_fault_memory(struct kvm_vcpu *vcpu, u64 gpa, u64 size,
-			     u64 left)
+static void __pre_fault_memory(struct kvm_vcpu *vcpu, u64 gpa, u64 size,
+			       u64 left, int *ret, int *save_errno)
 {
 	struct kvm_pre_fault_memory range = {
 		.gpa = gpa,
@@ -39,21 +39,28 @@ static void pre_fault_memory(struct kvm_vcpu *vcpu, u64 gpa, u64 size,
 		.flags = 0,
 	};
 	u64 prev;
-	int ret, save_errno;
 
 	do {
 		prev = range.size;
-		ret = __vcpu_ioctl(vcpu, KVM_PRE_FAULT_MEMORY, &range);
-		save_errno = errno;
-		TEST_ASSERT((range.size < prev) ^ (ret < 0),
+		*ret = __vcpu_ioctl(vcpu, KVM_PRE_FAULT_MEMORY, &range);
+		*save_errno = errno;
+		TEST_ASSERT((range.size < prev) ^ (*ret < 0),
 			    "%sexpecting range.size to change on %s",
-			    ret < 0 ? "not " : "",
-			    ret < 0 ? "failure" : "success");
-	} while (ret >= 0 ? range.size : save_errno == EINTR);
+			    *ret < 0 ? "not " : "",
+			    *ret < 0 ? "failure" : "success");
+	} while (*ret >= 0 ? range.size : *save_errno == EINTR);
 
 	TEST_ASSERT(range.size == left,
 		    "Completed with %lld bytes left, expected %" PRId64,
 		    range.size, left);
+}
+
+static void pre_fault_memory(struct kvm_vcpu *vcpu, u64 gpa, u64 size,
+			     u64 left)
+{
+	int ret, save_errno;
+
+	__pre_fault_memory(vcpu, gpa, size, left, &ret, &save_errno);
 
 	if (left == 0)
 		__TEST_ASSERT_VM_VCPU_IOCTL(!ret, "KVM_PRE_FAULT_MEMORY", ret, vcpu->vm);
@@ -77,6 +84,7 @@ static void __test_pre_fault_memory(unsigned long vm_type, bool private)
 	uint64_t guest_test_phys_mem;
 	uint64_t guest_test_virt_mem;
 	uint64_t alignment, guest_page_size;
+	int ret, save_errno;
 
 	vm = vm_create_shape_with_one_vcpu(shape, &vcpu, guest_code);
 
@@ -101,6 +109,17 @@ static void __test_pre_fault_memory(unsigned long vm_type, bool private)
 	pre_fault_memory(vcpu, guest_test_phys_mem + SZ_2M, PAGE_SIZE * 2, PAGE_SIZE);
 	pre_fault_memory(vcpu, guest_test_phys_mem + TEST_SIZE, PAGE_SIZE, PAGE_SIZE);
 
+#ifdef __x86_64__
+	__pre_fault_memory(vcpu, guest_test_phys_mem - guest_page_size,
+			   guest_page_size, guest_page_size, &ret, &save_errno);
+	__TEST_ASSERT_VM_VCPU_IOCTL(ret && save_errno == ENOENT,
+				    "KVM_PRE_FAULT_MEMORY", ret, vcpu->vm);
+	__pre_fault_memory(vcpu, (vm->max_gfn + 1) << vm->page_shift,
+			   guest_page_size, guest_page_size, &ret, &save_errno);
+	__TEST_ASSERT_VM_VCPU_IOCTL(ret && save_errno == ENOENT,
+				    "KVM_PRE_FAULT_MEMORY", ret, vcpu->vm);
+#endif
+
 	vcpu_args_set(vcpu, 1, guest_test_virt_mem);
 	vcpu_run(vcpu);
 

base-commit: c8b8b8190a80b591aa73c27c70a668799f8db547

Patch

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4e0e9963066f..6ee5af55cee1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4756,6 +4756,11 @@  long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
 	u64 end;
 	int r;
 
+	if (range->gpa >= (1UL << cpuid_maxphyaddr(vcpu)))
+		return -E2BIG;
+	if (gpa_to_gfn(range->gpa) > kvm_mmu_max_gfn())
+		return -E2BIG;
+
 	/*
 	 * reload is efficient when called repeatedly, so we can do it on
 	 * every iteration.