Message ID | 20240510211024.556136-14-michael.roth@amd.com (mailing list archive) |
---|---|
State | New, archived |
Series | [PULL,01/19] KVM: MMU: Disable fast path if KVM_EXIT_MEMORY_FAULT is needed |
On Fri, May 10, 2024, Michael Roth wrote:
> Implement a platform hook to do the work of restoring the direct map
> entries of gmem-managed pages and transitioning the corresponding RMP
> table entries back to the default shared/hypervisor-owned state.

...

> +void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
> +{
> +	kvm_pfn_t pfn;
> +
> +	pr_debug("%s: PFN start 0x%llx PFN end 0x%llx\n", __func__, start, end);
> +
> +	for (pfn = start; pfn < end;) {
> +		bool use_2m_update = false;
> +		int rc, rmp_level;
> +		bool assigned;
> +
> +		rc = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
> +		if (WARN_ONCE(rc, "SEV: Failed to retrieve RMP entry for PFN 0x%llx error %d\n",
> +			      pfn, rc))
> +			goto next_pfn;

This is comically trivial to hit, as it fires when running guest_memfd_test on a
!SNP host.  Presumably the correct fix is to simply do nothing for !sev_snp_guest(),
but that's easier said than done due to the lack of a @kvm in .gmem_invalidate().

That too is not a big fix, but that's beside the point.  IMO, the fact that I'm
the first person to (completely inadvertently) hit this rather basic bug is a
good hint that we should wait until 6.11 to merge SNP support.
On Wed, May 15, 2024 at 03:32:31PM -0700, Sean Christopherson wrote:
> On Fri, May 10, 2024, Michael Roth wrote:
> > Implement a platform hook to do the work of restoring the direct map
> > entries of gmem-managed pages and transitioning the corresponding RMP
> > table entries back to the default shared/hypervisor-owned state.
>
> ...
>
> > +void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
> > +{
> > +	kvm_pfn_t pfn;
> > +
> > +	pr_debug("%s: PFN start 0x%llx PFN end 0x%llx\n", __func__, start, end);
> > +
> > +	for (pfn = start; pfn < end;) {
> > +		bool use_2m_update = false;
> > +		int rc, rmp_level;
> > +		bool assigned;
> > +
> > +		rc = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
> > +		if (WARN_ONCE(rc, "SEV: Failed to retrieve RMP entry for PFN 0x%llx error %d\n",
> > +			      pfn, rc))
> > +			goto next_pfn;
>
> This is comically trivial to hit, as it fires when running guest_memfd_test on a
> !SNP host.  Presumably the correct fix is to simply do nothing for !sev_snp_guest(),
> but that's easier said than done due to the lack of a @kvm in .gmem_invalidate().

Yah, the code assumes that SNP is the only SVM user that would use gmem
pages. Unfortunately KVM_X86_SW_PROTECTED_VM is the one other situation
where this can be the case. The minimal fix would be to squash the below
into this patch:

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 176ba117413a..56b0b59b8263 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4675,6 +4675,9 @@ void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
 {
 	kvm_pfn_t pfn;
 
+	if (!cc_platform_has(CC_ATTR_HOST_SEV_SNP))
+		return;
+
 	pr_debug("%s: PFN start 0x%llx PFN end 0x%llx\n", __func__, start, end);
 
 	for (pfn = start; pfn < end;) {

It's not perfect because the callback will still run for
KVM_X86_SW_PROTECTED_VM if SNP is enabled, but in the context of
KVM_X86_SW_PROTECTED_VM being a stand-in for testing SNP/TDX, that might
not be such a bad thing.

Longer term, if we need something more robust, one option would be to
modify the .free_folio callback path to pass along folio->mapping, or to
switch to something else that provides similar functionality. Another
approach might be to set .free_folio dynamically based on the vm_type of
the gmem user when creating the gmem instance.

>
> That too is not a big fix, but that's beside the point.  IMO, the fact that I'm
> the first person to (completely inadvertently) hit this rather basic bug is a
> good hint that we should wait until 6.11 to merge SNP support.

We do regular testing of normal guests with/without SNP enabled, but
unfortunately we've only been doing KST runs on SNP-enabled hosts. I've
retested with the above fix and everything looks good with
SVM/SEV/SEV-ES/SNP/selftests with and without SNP enabled, but I
understand if we still have reservations after this.

-Mike
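
The "set the callback based on vm_type at creation time" idea from the
message above can be pictured with a small userspace model. This is only a
sketch under stated assumptions: the names gmem_instance, gmem_create(),
gmem_free_range() and the vm_type enum are hypothetical stand-ins invented
for illustration, not KVM's structures or API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical VM types for this model only (not KVM's definitions). */
enum vm_type { VM_DEFAULT, VM_SW_PROTECTED, VM_SNP };

typedef void (*invalidate_fn)(unsigned long start, unsigned long end);

/* Number of PFNs the SNP-style callback was asked to process. */
static unsigned long snp_invalidated;

/* Stand-in for the platform hook; it only counts pages. */
static void snp_invalidate(unsigned long start, unsigned long end)
{
	assert(start <= end);
	snp_invalidated += end - start;
}

/* Toy gmem instance that remembers the callback chosen at creation. */
struct gmem_instance {
	enum vm_type type;
	invalidate_fn invalidate;	/* NULL: nothing to do on free */
};

/* Pick the callback once, based on the type of VM that owns the instance. */
static struct gmem_instance gmem_create(enum vm_type type)
{
	struct gmem_instance g = { .type = type, .invalidate = NULL };

	if (type == VM_SNP)
		g.invalidate = snp_invalidate;
	return g;
}

/* Freeing pages only invokes the hook if one was installed. */
static void gmem_free_range(const struct gmem_instance *g,
			    unsigned long start, unsigned long end)
{
	if (g->invalidate)
		g->invalidate(start, end);
}
```

In this model a software-protected instance never reaches the SNP hook, so
no per-call check of host or guest type is needed at free time.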
On Thu, May 16, 2024 at 12:32 AM Sean Christopherson <seanjc@google.com> wrote:
> > +void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
> > +{
> > +	kvm_pfn_t pfn;
> > +
> > +	pr_debug("%s: PFN start 0x%llx PFN end 0x%llx\n", __func__, start, end);
> > +
> > +	for (pfn = start; pfn < end;) {
> > +		bool use_2m_update = false;
> > +		int rc, rmp_level;
> > +		bool assigned;
> > +
> > +		rc = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
> > +		if (WARN_ONCE(rc, "SEV: Failed to retrieve RMP entry for PFN 0x%llx error %d\n",
> > +			      pfn, rc))
> > +			goto next_pfn;
>
> This is comically trivial to hit, as it fires when running guest_memfd_test on a
> !SNP host.  Presumably the correct fix is to simply do nothing for !sev_snp_guest(),
> but that's easier said than done due to the lack of a @kvm in .gmem_invalidate().
>
> That too is not a big fix, but that's beside the point.  IMO, the fact that I'm
> the first person to (completely inadvertently) hit this rather basic bug is a
> good hint that we should wait until 6.11 to merge SNP support.

Of course there is an explanation - I usually run all the tests before
pushing anything to kvm/next, here I did not do it because 1) I was
busy with the merge window and 2) I wanted to give exposure to the
code in linux-next, which was the right call indeed but it's beside
the point.

Between the clang issue and this one, it's clear that even though the
implementation is 99.99% okay (especially considering the size), there
are a few kinks to fix. I'll fix everything up and re-push to kvm/next,
but I agree that we shouldn't rush it any further. What really matters
is that development on userspace can proceed.

This also confirms that it's important to replace kvm/next with
kvm/queue in linux-next, since linux-next doesn't care that much about
branches that rebase.

Paolo
On Thu, May 16, 2024 at 5:12 AM Michael Roth <michael.roth@amd.com> wrote:
> Longer term if we need something more robust would be to modify the
> .free_folio callback path to pass along folio->mapping, or switch to
> something else that provides similar functionality. Another approach
> might be to set .free_folio dynamically based on the vm_type of the
> gmem user when creating the gmem instance.

You need to not warn. Testing CC_ATTR_HOST_SEV_SNP is just an optimization.

Paolo

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index dc00b89404a2..1c57b4535f15 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4676,8 +4676,7 @@ void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
 		bool assigned;
 
 		rc = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
-		if (WARN_ONCE(rc, "SEV: Failed to retrieve RMP entry for PFN 0x%llx error %d\n",
-			      pfn, rc))
+		if (rc)
 			goto next_pfn;
 
 		if (!assigned)
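
The point that the host check is "just an optimization" can be seen in a
small userspace model. This is a sketch, not kernel code: mock_lookup(),
invalidate(), and the counters are hypothetical, and the model assumes
that a lookup on a host without SNP fails with -ENODEV. It shows that
once a failed lookup is skipped silently, the early return changes only
how many lookups are made, not the result.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical host state for this model. */
static bool host_has_snp;
static unsigned long lookups;	/* number of lookup calls made */
static bool warned;		/* set if a warning would have fired */

/* Mock RMP lookup: assumed to fail when the host lacks SNP. */
static int mock_lookup(unsigned long pfn, bool *assigned)
{
	(void)pfn;
	lookups++;
	if (!host_has_snp)
		return -ENODEV;
	*assigned = true;
	return 0;
}

/*
 * Walk [start, end) and return how many pages would be made shared.
 * early_return models the host-capability guard; warn_on_error models
 * warning on a failed lookup instead of skipping it silently.
 */
static unsigned long invalidate(unsigned long start, unsigned long end,
				bool early_return, bool warn_on_error)
{
	unsigned long converted = 0;

	assert(start <= end);
	if (early_return && !host_has_snp)
		return 0;

	for (unsigned long pfn = start; pfn < end; pfn++) {
		bool assigned = false;
		int rc = mock_lookup(pfn, &assigned);

		if (rc) {
			if (warn_on_error)
				warned = true;
			continue;	/* skip this PFN */
		}
		if (assigned)
			converted++;
	}
	return converted;
}
```

With warn_on_error set, the model trips the warning on a host without SNP
whether or not the caller is an SNP guest, which matches the failure mode
reported earlier in the thread.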
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 10768f13b240..2a7f69abcac3 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -138,6 +138,7 @@ config KVM_AMD_SEV
 	select ARCH_HAS_CC_PLATFORM
 	select KVM_GENERIC_PRIVATE_MEM
 	select HAVE_KVM_GMEM_PREPARE
+	select HAVE_KVM_GMEM_INVALIDATE
 	help
 	  Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
 	  with Encrypted State (SEV-ES) on AMD processors.
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2bc4aa91cd31..379ac6efd74e 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4663,3 +4663,67 @@ int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
 
 	return 0;
 }
+
+void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end)
+{
+	kvm_pfn_t pfn;
+
+	pr_debug("%s: PFN start 0x%llx PFN end 0x%llx\n", __func__, start, end);
+
+	for (pfn = start; pfn < end;) {
+		bool use_2m_update = false;
+		int rc, rmp_level;
+		bool assigned;
+
+		rc = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
+		if (WARN_ONCE(rc, "SEV: Failed to retrieve RMP entry for PFN 0x%llx error %d\n",
+			      pfn, rc))
+			goto next_pfn;
+
+		if (!assigned)
+			goto next_pfn;
+
+		use_2m_update = IS_ALIGNED(pfn, PTRS_PER_PMD) &&
+				end >= (pfn + PTRS_PER_PMD) &&
+				rmp_level > PG_LEVEL_4K;
+
+		/*
+		 * If an unaligned PFN corresponds to a 2M region assigned as a
+		 * large page in the RMP table, PSMASH the region into individual
+		 * 4K RMP entries before attempting to convert a 4K sub-page.
+		 */
+		if (!use_2m_update && rmp_level > PG_LEVEL_4K) {
+			/*
+			 * This shouldn't fail, but if it does, report it, but
+			 * still try to update RMP entry to shared and pray this
+			 * was a spurious error that can be addressed later.
+			 */
+			rc = snp_rmptable_psmash(pfn);
+			WARN_ONCE(rc, "SEV: Failed to PSMASH RMP entry for PFN 0x%llx error %d\n",
+				  pfn, rc);
+		}
+
+		rc = rmp_make_shared(pfn, use_2m_update ? PG_LEVEL_2M : PG_LEVEL_4K);
+		if (WARN_ONCE(rc, "SEV: Failed to update RMP entry for PFN 0x%llx error %d\n",
+			      pfn, rc))
+			goto next_pfn;
+
+		/*
+		 * SEV-ES avoids host/guest cache coherency issues through
+		 * WBINVD hooks issued via MMU notifiers during run-time, and
+		 * KVM's VM destroy path at shutdown. Those MMU notifier events
+		 * don't cover gmem since there is no requirement to map pages
+		 * to a HVA in order to use them for a running guest. While the
+		 * shutdown path would still likely cover things for SNP guests,
+		 * userspace may also free gmem pages during run-time via
+		 * hole-punching operations on the guest_memfd, so flush the
+		 * cache entries for these pages before free'ing them back to
+		 * the host.
+		 */
+		clflush_cache_range(__va(pfn_to_hpa(pfn)),
+				    use_2m_update ? PMD_SIZE : PAGE_SIZE);
+next_pfn:
+		pfn += use_2m_update ? PTRS_PER_PMD : 1;
+		cond_resched();
+	}
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b9ecc06f8934..653cdb23a7d1 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5083,6 +5083,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.alloc_apic_backing_page = svm_alloc_apic_backing_page,
 
 	.gmem_prepare = sev_gmem_prepare,
+	.gmem_invalidate = sev_gmem_invalidate,
 };
 
 /*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 4203bd9012e9..3cea024a7c18 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -737,6 +737,7 @@ void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
 void sev_vcpu_unblocking(struct kvm_vcpu *vcpu);
 void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
 int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
+void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
 #else
 static inline struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu) {
 	return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
@@ -757,6 +758,7 @@ static inline int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, in
 {
 	return 0;
 }
+static inline void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end) {}
 #endif
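
The stepping logic in sev_gmem_invalidate() (2M update when the PFN is
2M-aligned, the whole 2M region fits in the range, and the RMP entry is
large; otherwise PSMASH if needed and proceed 4K at a time) can be traced
with a userspace model. This is a sketch under assumptions: region_level[]
is a hypothetical stand-in for the RMP table in which every page is
assigned, LEVEL_4K/LEVEL_2M are local constants, and walk() only counts
the operations the real loop would issue.

```c
#include <assert.h>
#include <stdbool.h>

#define PTRS_PER_PMD	512UL	/* 4K pages per 2M region on x86-64 */
#define NR_REGIONS	4UL

enum { LEVEL_4K = 1, LEVEL_2M = 2 };	/* local to this model */

/* Hypothetical RMP model: one level per 2M region, all pages assigned. */
static int region_level[NR_REGIONS] = {
	LEVEL_2M, LEVEL_2M, LEVEL_2M, LEVEL_2M
};

struct walk_stats {
	unsigned long updates_2m;	/* 2M "make shared" operations */
	unsigned long updates_4k;	/* 4K "make shared" operations */
	unsigned long psmashes;		/* 2M entries split into 4K entries */
};

/* Walk PFNs in [start, end) using the same stepping rule as the patch. */
static struct walk_stats walk(unsigned long start, unsigned long end)
{
	struct walk_stats s = { 0, 0, 0 };

	assert(start <= end && end <= NR_REGIONS * PTRS_PER_PMD);

	for (unsigned long pfn = start; pfn < end;) {
		unsigned long region = pfn / PTRS_PER_PMD;
		int level = region_level[region];
		bool use_2m = (pfn % PTRS_PER_PMD == 0) &&
			      end >= pfn + PTRS_PER_PMD &&
			      level > LEVEL_4K;

		/* Large entry but only part of it is covered: split first. */
		if (!use_2m && level > LEVEL_4K) {
			region_level[region] = LEVEL_4K;
			s.psmashes++;
		}

		if (use_2m)
			s.updates_2m++;
		else
			s.updates_4k++;

		pfn += use_2m ? PTRS_PER_PMD : 1;
	}
	return s;
}
```

For the range [510, 1026), the model splits the first and third regions
(each only partially covered), converts the fully covered second region
with a single 2M update, and handles the remaining four pages at 4K.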