Message ID | f2a46971d37ee3bf32ff33dc730e16bf0f755410.1721091397.git.isaku.yamahata@intel.com (mailing list archive)
---|---
State | New, archived
Series | KVM: x86: Add GPA limit check to kvm_arch_vcpu_pre_fault_memory()
On Mon, Jul 15, 2024, isaku.yamahata@intel.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> Add a GPA limit check to kvm_arch_vcpu_pre_fault_memory() using the guest
> maxphyaddr and kvm_mmu_max_gfn().
>
> The KVM page fault handler decides which level of TDP to use, 4-level or
> 5-level, based on the guest maxphyaddr (CPUID[0x80000008].EAX[7:0]), the
> host maxphyaddr, and whether the host supports 5-level TDP. 4-level TDP
> can map GPAs up to 48 bits, and 5-level TDP can map GPAs up to 52 bits.
> If guest maxphyaddr <= 48, KVM uses 4-level TDP even when the host
> supports 5-level TDP.
>
> If a GPA beyond the TDP mappable limit is passed to the TDP MMU fault
> handler (concretely, a GPA above 48 bits with 4-level TDP), it operates on
> the GPA with the upper bits dropped, (GPA & ((1UL << 48) - 1)), not the
> specified GPA. That is not the expected behavior: the page intended for
> the GPA with the upper bits is wrongly mapped at the GPA without them.
>
> KVM_PRE_FAULT_MEMORY calls the x86 KVM page fault handler,
> kvm_tdp_page_fault(), with a user-space-supplied GPA and no limit check,
> so user space can trigger WARN_ON_ONCE(). Check the GPA limit to fix it.

Which WARN?

> - For the non-TDX case (DEFAULT_VM, SW_PROTECTED_VM, or SEV):
>   When the host supports 5-level TDP, KVM decides to use 4-level TDP if
>   cpuid_maxphyaddr() <= 48. The cpuid_maxphyaddr() check prevents
>   KVM_PRE_FAULT_MEMORY from passing a GFN beyond the mappable GFN.

Hardening against cpuid_maxphyaddr() should be out of scope.  We don't enforce
it for guest faults, e.g. KVM doesn't kill the guest if allow_smaller_maxphyaddr
is false and the GPA is supposed to be illegal.  And trying to enforce it here
is a fool's errand since userspace can simply do KVM_SET_CPUID2 to circumvent
the restriction.

> - For the TDX case:
>   We'd like to exclude the shared bit (or gfn_direct_mask in [1]) from the
>   GPA passed to the TDP MMU so that the TDP MMU can handle Secure-EPT or
>   Shared-EPT (direct or mirrored in [1]) without explicitly
>   setting/clearing the GPA (except when setting up the TDP iterator,
>   tdp_iter_refresh_sptep()). We'd like to make kvm_mmu_max_gfn() per-VM,
>   52 or 47 for TDX independent of the guest maxphyaddr, in other patches.
>
> Fixes: 6e01b7601dfe ("KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory()")
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 4e0e9963066f..6ee5af55cee1 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4756,6 +4756,11 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
>  	u64 end;
>  	int r;
>
> +	if (range->gpa >= (1UL << cpuid_maxphyaddr(vcpu)))
> +		return -E2BIG;
> +	if (gpa_to_gfn(range->gpa) > kvm_mmu_max_gfn())
> +		return -E2BIG;

Related to my thoughts on making kvm_mmu_max_gfn() per-VM and rejecting aliased
memslots, I think we should add a common helper that's used by
kvm_arch_prepare_memory_region() and kvm_arch_vcpu_pre_fault_memory() to reject
GPAs that are disallowed.

https://lore.kernel.org/all/ZpbKqG_ZhCWxl-Fc@google.com

> +
> 	/*
> 	 * reload is efficient when called repeatedly, so we can do it on
> 	 * every iteration.
>
> base-commit: c8b8b8190a80b591aa73c27c70a668799f8db547
> --
> 2.45.2
On Tue, Jul 16, 2024 at 01:17:27PM -0700, Sean Christopherson <seanjc@google.com> wrote:
> On Mon, Jul 15, 2024, isaku.yamahata@intel.com wrote:
> > From: Isaku Yamahata <isaku.yamahata@intel.com>
> >
> > Add a GPA limit check to kvm_arch_vcpu_pre_fault_memory() using the guest
> > maxphyaddr and kvm_mmu_max_gfn().
> >
> > The KVM page fault handler decides which level of TDP to use, 4-level or
> > 5-level, based on the guest maxphyaddr (CPUID[0x80000008].EAX[7:0]), the
> > host maxphyaddr, and whether the host supports 5-level TDP. 4-level TDP
> > can map GPAs up to 48 bits, and 5-level TDP can map GPAs up to 52 bits.
> > If guest maxphyaddr <= 48, KVM uses 4-level TDP even when the host
> > supports 5-level TDP.
> >
> > If a GPA beyond the TDP mappable limit is passed to the TDP MMU fault
> > handler (concretely, a GPA above 48 bits with 4-level TDP), it operates on
> > the GPA with the upper bits dropped, (GPA & ((1UL << 48) - 1)), not the
> > specified GPA. That is not the expected behavior: the page intended for
> > the GPA with the upper bits is wrongly mapped at the GPA without them.
> >
> > KVM_PRE_FAULT_MEMORY calls the x86 KVM page fault handler,
> > kvm_tdp_page_fault(), with a user-space-supplied GPA and no limit check,
> > so user space can trigger WARN_ON_ONCE(). Check the GPA limit to fix it.
>
> Which WARN?

Sorry, I confused it with my local changes for 4/5-level.

> > - For the non-TDX case (DEFAULT_VM, SW_PROTECTED_VM, or SEV):
> >   When the host supports 5-level TDP, KVM decides to use 4-level TDP if
> >   cpuid_maxphyaddr() <= 48. The cpuid_maxphyaddr() check prevents
> >   KVM_PRE_FAULT_MEMORY from passing a GFN beyond the mappable GFN.
>
> Hardening against cpuid_maxphyaddr() should be out of scope.  We don't enforce
> it for guest faults, e.g. KVM doesn't kill the guest if allow_smaller_maxphyaddr
> is false and the GPA is supposed to be illegal.  And trying to enforce it here
> is a fool's errand since userspace can simply do KVM_SET_CPUID2 to circumvent
> the restriction.

Ok, I'll drop the maxphyaddr check.

> > - For the TDX case:
> >   We'd like to exclude the shared bit (or gfn_direct_mask in [1]) from the
> >   GPA passed to the TDP MMU so that the TDP MMU can handle Secure-EPT or
> >   Shared-EPT (direct or mirrored in [1]) without explicitly
> >   setting/clearing the GPA (except when setting up the TDP iterator,
> >   tdp_iter_refresh_sptep()). We'd like to make kvm_mmu_max_gfn() per-VM,
> >   52 or 47 for TDX independent of the guest maxphyaddr, in other patches.
> >
> > Fixes: 6e01b7601dfe ("KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory()")
> > Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> > ---
> >  arch/x86/kvm/mmu/mmu.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 4e0e9963066f..6ee5af55cee1 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -4756,6 +4756,11 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
> >  	u64 end;
> >  	int r;
> >
> > +	if (range->gpa >= (1UL << cpuid_maxphyaddr(vcpu)))
> > +		return -E2BIG;
> > +	if (gpa_to_gfn(range->gpa) > kvm_mmu_max_gfn())
> > +		return -E2BIG;
>
> Related to my thoughts on making kvm_mmu_max_gfn() per-VM and rejecting aliased
> memslots, I think we should add a common helper that's used by
> kvm_arch_prepare_memory_region() and kvm_arch_vcpu_pre_fault_memory() to reject
> GPAs that are disallowed.
>
> https://lore.kernel.org/all/ZpbKqG_ZhCWxl-Fc@google.com

I'll look into it. I'll combine the two patches into a single patch series.
On Tue, Jul 16, 2024 at 04:49:00PM -0700, Isaku Yamahata <isaku.yamahata@intel.com> wrote:
> > > - For the non-TDX case (DEFAULT_VM, SW_PROTECTED_VM, or SEV):
> > >   When the host supports 5-level TDP, KVM decides to use 4-level TDP if
> > >   cpuid_maxphyaddr() <= 48. The cpuid_maxphyaddr() check prevents
> > >   KVM_PRE_FAULT_MEMORY from passing a GFN beyond the mappable GFN.
> >
> > Hardening against cpuid_maxphyaddr() should be out of scope.  We don't enforce
> > it for guest faults, e.g. KVM doesn't kill the guest if allow_smaller_maxphyaddr
> > is false and the GPA is supposed to be illegal.  And trying to enforce it here
> > is a fool's errand since userspace can simply do KVM_SET_CPUID2 to circumvent
> > the restriction.
>
> Ok, I'll drop the maxphyaddr check.

Now Rick has added a patch to check for aliased GFNs, so this patch and the
per-VM mmu_max_gfn become unnecessary. Instead, I came up with an update to the
pre_fault test to cover the no-memslot case.

https://lore.kernel.org/kvm/20240718211230.1492011-19-rick.p.edgecombe@intel.com/

For the non-x86 case, I'm not sure which error to expect.

From d62fc5170b17788041d364e6a17f97f01be4130e Mon Sep 17 00:00:00 2001
Message-ID: <d62fc5170b17788041d364e6a17f97f01be4130e.1721345479.git.isaku.yamahata@intel.com>
From: Isaku Yamahata <isaku.yamahata@intel.com>
Date: Wed, 29 May 2024 12:13:20 -0700
Subject: [PATCH] KVM: selftests: Update pre_fault_memory_test.c to test the
 no-memslot case

Add test cases that pass a GPA with no memslot assigned to get ENOENT.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
This test passes on the kvm queue branch, and also with the KVM TDX branch.

 .../selftests/kvm/pre_fault_memory_test.c     | 37 ++++++++++++++-----
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/kvm/pre_fault_memory_test.c b/tools/testing/selftests/kvm/pre_fault_memory_test.c
index 0350a8896a2f..8d057a0bc6fd 100644
--- a/tools/testing/selftests/kvm/pre_fault_memory_test.c
+++ b/tools/testing/selftests/kvm/pre_fault_memory_test.c
@@ -30,8 +30,8 @@ static void guest_code(uint64_t base_gpa)
 	GUEST_DONE();
 }
 
-static void pre_fault_memory(struct kvm_vcpu *vcpu, u64 gpa, u64 size,
-			     u64 left)
+static void __pre_fault_memory(struct kvm_vcpu *vcpu, u64 gpa, u64 size,
+			       u64 left, int *ret, int *save_errno)
 {
 	struct kvm_pre_fault_memory range = {
 		.gpa = gpa,
@@ -39,21 +39,28 @@ static void pre_fault_memory(struct kvm_vcpu *vcpu, u64 gpa, u64 size,
 		.flags = 0,
 	};
 	u64 prev;
-	int ret, save_errno;
 
 	do {
 		prev = range.size;
-		ret = __vcpu_ioctl(vcpu, KVM_PRE_FAULT_MEMORY, &range);
-		save_errno = errno;
-		TEST_ASSERT((range.size < prev) ^ (ret < 0),
+		*ret = __vcpu_ioctl(vcpu, KVM_PRE_FAULT_MEMORY, &range);
+		*save_errno = errno;
+		TEST_ASSERT((range.size < prev) ^ (*ret < 0),
 			    "%sexpecting range.size to change on %s",
-			    ret < 0 ? "not " : "",
-			    ret < 0 ? "failure" : "success");
-	} while (ret >= 0 ? range.size : save_errno == EINTR);
+			    *ret < 0 ? "not " : "",
+			    *ret < 0 ? "failure" : "success");
+	} while (*ret >= 0 ? range.size : *save_errno == EINTR);
 
 	TEST_ASSERT(range.size == left,
 		    "Completed with %lld bytes left, expected %" PRId64,
 		    range.size, left);
+}
+
+static void pre_fault_memory(struct kvm_vcpu *vcpu, u64 gpa, u64 size,
+			     u64 left)
+{
+	int ret, save_errno;
+
+	__pre_fault_memory(vcpu, gpa, size, left, &ret, &save_errno);
 
 	if (left == 0)
 		__TEST_ASSERT_VM_VCPU_IOCTL(!ret, "KVM_PRE_FAULT_MEMORY", ret, vcpu->vm);
@@ -77,6 +84,7 @@ static void __test_pre_fault_memory(unsigned long vm_type, bool private)
 	uint64_t guest_test_phys_mem;
 	uint64_t guest_test_virt_mem;
 	uint64_t alignment, guest_page_size;
+	int ret, save_errno;
 
 	vm = vm_create_shape_with_one_vcpu(shape, &vcpu, guest_code);
 
@@ -101,6 +109,17 @@ static void __test_pre_fault_memory(unsigned long vm_type, bool private)
 	pre_fault_memory(vcpu, guest_test_phys_mem + SZ_2M, PAGE_SIZE * 2, PAGE_SIZE);
 	pre_fault_memory(vcpu, guest_test_phys_mem + TEST_SIZE, PAGE_SIZE, PAGE_SIZE);
 
+#ifdef __x86_64__
+	__pre_fault_memory(vcpu, guest_test_phys_mem - guest_page_size,
+			   guest_page_size, guest_page_size, &ret, &save_errno);
+	__TEST_ASSERT_VM_VCPU_IOCTL(ret && save_errno == ENOENT,
+				    "KVM_PRE_FAULT_MEMORY", ret, vcpu->vm);
+	__pre_fault_memory(vcpu, (vm->max_gfn + 1) << vm->page_shift,
+			   guest_page_size, guest_page_size, &ret, &save_errno);
+	__TEST_ASSERT_VM_VCPU_IOCTL(ret && save_errno == ENOENT,
+				    "KVM_PRE_FAULT_MEMORY", ret, vcpu->vm);
+#endif
+
 	vcpu_args_set(vcpu, 1, guest_test_virt_mem);
 	vcpu_run(vcpu);

base-commit: c8b8b8190a80b591aa73c27c70a668799f8db547
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4e0e9963066f..6ee5af55cee1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4756,6 +4756,11 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
 	u64 end;
 	int r;
 
+	if (range->gpa >= (1UL << cpuid_maxphyaddr(vcpu)))
+		return -E2BIG;
+	if (gpa_to_gfn(range->gpa) > kvm_mmu_max_gfn())
+		return -E2BIG;
+
 	/*
 	 * reload is efficient when called repeatedly, so we can do it on
 	 * every iteration.