Message ID | 20231016115028.996656-9-michael.roth@amd.com (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Herbert Xu |
Headers | show |
Series | KVM: gmem hooks/changes needed for x86 (other archs?) | expand |
On Mon, Oct 16, 2023, Michael Roth wrote: > For KVM_X86_SNP_VM, only the PFERR_GUEST_ENC_MASK flag is needed to > determine with an #NPF is due to a private/shared access by the guest. > Implement that handling here. Also add handling needed to deal with > SNP guests which in some cases will make MMIO accesses with the > encryption bit. ... > @@ -4356,12 +4357,19 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault > return RET_PF_EMULATE; > } > > - if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) { > + /* > + * In some cases SNP guests will make MMIO accesses with the encryption > + * bit set. Handle these via the normal MMIO fault path. > + */ > + if (!slot && private_fault && kvm_is_vm_type(vcpu->kvm, KVM_X86_SNP_VM)) > + private_fault = false; Why? This is inarguably a guest bug. > + if (private_fault != kvm_mem_is_private(vcpu->kvm, fault->gfn)) { > kvm_mmu_prepare_memory_fault_exit(vcpu, fault); > return -EFAULT; > } > > - if (fault->is_private) > + if (private_fault) > return kvm_faultin_pfn_private(vcpu, fault); > > async = false; > diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h > index 759c8b718201..e5b973051ad9 100644 > --- a/arch/x86/kvm/mmu/mmu_internal.h > +++ b/arch/x86/kvm/mmu/mmu_internal.h > @@ -251,6 +251,24 @@ struct kvm_page_fault { > > int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault); > > +static bool kvm_mmu_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 err) > +{ > + bool private_fault = false; > + > + if (kvm_is_vm_type(kvm, KVM_X86_SNP_VM)) { > + private_fault = !!(err & PFERR_GUEST_ENC_MASK); > + } else if (kvm_is_vm_type(kvm, KVM_X86_SW_PROTECTED_VM)) { > + /* > + * This handling is for gmem self-tests and guests that treat > + * userspace as the authority on whether a fault should be > + * private or not. > + */ > + private_fault = kvm_mem_is_private(kvm, gpa >> PAGE_SHIFT); > + } This can be more simply: if (kvm_is_vm_type(kvm, KVM_X86_SNP_VM)) return !!(err & PFERR_GUEST_ENC_MASK); if (kvm_is_vm_type(kvm, KVM_X86_SW_PROTECTED_VM)) return kvm_mem_is_private(kvm, gpa >> PAGE_SHIFT);
On Tue, Jan 30, 2024 at 05:13:00PM -0800, Sean Christopherson wrote: > On Mon, Oct 16, 2023, Michael Roth wrote: > > For KVM_X86_SNP_VM, only the PFERR_GUEST_ENC_MASK flag is needed to > > determine with an #NPF is due to a private/shared access by the guest. > > Implement that handling here. Also add handling needed to deal with > > SNP guests which in some cases will make MMIO accesses with the > > encryption bit. > > ... > > > @@ -4356,12 +4357,19 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault > > return RET_PF_EMULATE; > > } > > > > - if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) { > > + /* > > + * In some cases SNP guests will make MMIO accesses with the encryption > > + * bit set. Handle these via the normal MMIO fault path. > > + */ > > + if (!slot && private_fault && kvm_is_vm_type(vcpu->kvm, KVM_X86_SNP_VM)) > > + private_fault = false; > > Why? This is inarguably a guest bug. AFAICT this isn't explicitly disallowed by the SNP spec. There was however a set of security mitigations for SEV-ES that resulted in this being behavior being highly discouraged in linux guest code: https://lkml.org/lkml/2020/10/20/464 as well as OVMF guest code: https://edk2.groups.io/g/devel/message/69948 However the OVMF guest code still allows 1 exception for accesses to the local APIC base address, which is the only case I'm aware of that triggers this condition: https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/CcExitLib/CcExitVcHandler.c#L100 I think the rationale there is that if the guest absolutely *knows* that encrypted information is not stored at a particular MMIO address, then it can selectively choose to allow for exceptional cases like these. So KVM would need to allow for these cases in order to be fully compatible with existing SNP guests that do this. > > > + if (private_fault != kvm_mem_is_private(vcpu->kvm, fault->gfn)) { > > kvm_mmu_prepare_memory_fault_exit(vcpu, fault); > > return -EFAULT; > > } > > > > - if (fault->is_private) > > + if (private_fault) > > return kvm_faultin_pfn_private(vcpu, fault); > > > > async = false; > > diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h > > index 759c8b718201..e5b973051ad9 100644 > > --- a/arch/x86/kvm/mmu/mmu_internal.h > > +++ b/arch/x86/kvm/mmu/mmu_internal.h > > @@ -251,6 +251,24 @@ struct kvm_page_fault { > > > > int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault); > > > > +static bool kvm_mmu_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 err) > > +{ > > + bool private_fault = false; > > + > > + if (kvm_is_vm_type(kvm, KVM_X86_SNP_VM)) { > > + private_fault = !!(err & PFERR_GUEST_ENC_MASK); > > + } else if (kvm_is_vm_type(kvm, KVM_X86_SW_PROTECTED_VM)) { > > + /* > > + * This handling is for gmem self-tests and guests that treat > > + * userspace as the authority on whether a fault should be > > + * private or not. > > + */ > > + private_fault = kvm_mem_is_private(kvm, gpa >> PAGE_SHIFT); > > + } > > This can be more simply: > > if (kvm_is_vm_type(kvm, KVM_X86_SNP_VM)) > return !!(err & PFERR_GUEST_ENC_MASK); > > if (kvm_is_vm_type(kvm, KVM_X86_SW_PROTECTED_VM)) > return kvm_mem_is_private(kvm, gpa >> PAGE_SHIFT); > Yes, indeed. But TDX has taken a different approach for SW_PROTECTED_VM case where they do this check in kvm_mmu_page_fault() and then synthesize the PFERR_GUEST_ENC_MASK into error_code before calling kvm_mmu_do_page_fault(). It's not in the v18 patchset AFAICT, but it's in the tdx-upstream git branch that corresponds to it: https://github.com/intel/tdx/commit/3717a903ef453aa7b62e7eb65f230566b7f158d4 Would you prefer that SNP adopt the same approach? -Mike
On Wed, Feb 07, 2024, Michael Roth wrote: > On Tue, Jan 30, 2024 at 05:13:00PM -0800, Sean Christopherson wrote: > > On Mon, Oct 16, 2023, Michael Roth wrote: > > > For KVM_X86_SNP_VM, only the PFERR_GUEST_ENC_MASK flag is needed to > > > determine with an #NPF is due to a private/shared access by the guest. > > > Implement that handling here. Also add handling needed to deal with > > > SNP guests which in some cases will make MMIO accesses with the > > > encryption bit. > > > > ... > > > > > @@ -4356,12 +4357,19 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault > > > return RET_PF_EMULATE; > > > } > > > > > > - if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) { > > > + /* > > > + * In some cases SNP guests will make MMIO accesses with the encryption > > > + * bit set. Handle these via the normal MMIO fault path. > > > + */ > > > + if (!slot && private_fault && kvm_is_vm_type(vcpu->kvm, KVM_X86_SNP_VM)) > > > + private_fault = false; > > > > Why? This is inarguably a guest bug. > > AFAICT this isn't explicitly disallowed by the SNP spec. There are _lots_ of things that aren't explicitly disallowed by the APM, that doesn't mean that _KVM_ needs to actively support them. I am *not* taking on more broken crud in KVM to workaround OVMF's stupidity, the KVM_X86_QUIRK_CD_NW_CLEARED has taken up literally days of my time at this point. > So KVM would need to allow for these cases in order to be fully compatible > with existing SNP guests that do this. No. KVM does not yet support SNP, so as far as KVM's ABI goes, there are no existing guests. Yes, I realize that I am burying my head in the sand to some extent, but it is simply not sustainable for KVM to keep trying to pick up the pieces of poorly defined hardware specs and broken guest firmware. > > > +static bool kvm_mmu_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 err) > > > +{ > > > + bool private_fault = false; > > > + > > > + if (kvm_is_vm_type(kvm, KVM_X86_SNP_VM)) { > > > + private_fault = !!(err & PFERR_GUEST_ENC_MASK); > > > + } else if (kvm_is_vm_type(kvm, KVM_X86_SW_PROTECTED_VM)) { > > > + /* > > > + * This handling is for gmem self-tests and guests that treat > > > + * userspace as the authority on whether a fault should be > > > + * private or not. > > > + */ > > > + private_fault = kvm_mem_is_private(kvm, gpa >> PAGE_SHIFT); > > > + } > > > > This can be more simply: > > > > if (kvm_is_vm_type(kvm, KVM_X86_SNP_VM)) > > return !!(err & PFERR_GUEST_ENC_MASK); > > > > if (kvm_is_vm_type(kvm, KVM_X86_SW_PROTECTED_VM)) > > return kvm_mem_is_private(kvm, gpa >> PAGE_SHIFT); > > > > Yes, indeed. But TDX has taken a different approach for SW_PROTECTED_VM > case where they do this check in kvm_mmu_page_fault() and then synthesize > the PFERR_GUEST_ENC_MASK into error_code before calling > kvm_mmu_do_page_fault(). It's not in the v18 patchset AFAICT, but it's > in the tdx-upstream git branch that corresponds to it: > > https://github.com/intel/tdx/commit/3717a903ef453aa7b62e7eb65f230566b7f158d4 > > Would you prefer that SNP adopt the same approach? Ah, yes, 'twas my suggestion in the first place. FWIW, I was just reviewing the literal code here and wasn't paying much attention to the content. https://lore.kernel.org/all/f474282d701aca7af00e4f7171445abb5e734c6f.1689893403.git.isaku.yamahata@intel.com
On Thu, Feb 8, 2024 at 6:27 PM Sean Christopherson <seanjc@google.com> wrote: > No. KVM does not yet support SNP, so as far as KVM's ABI goes, there are no > existing guests. Yes, I realize that I am burying my head in the sand to some > extent, but it is simply not sustainable for KVM to keep trying to pick up the > pieces of poorly defined hardware specs and broken guest firmware. 101% agreed. There are cases in which we have to and should bend together backwards for guests (e.g. older Linux kernels), but not for code that---according to current practices---is chosen by the host admin. (I am of the opinion that "bring your own firmware" is the only sane way to handle attestation/measurement, but that's not how things are done currently). Paolo > > > > +static bool kvm_mmu_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 err) > > > > +{ > > > > + bool private_fault = false; > > > > + > > > > + if (kvm_is_vm_type(kvm, KVM_X86_SNP_VM)) { > > > > + private_fault = !!(err & PFERR_GUEST_ENC_MASK); > > > > + } else if (kvm_is_vm_type(kvm, KVM_X86_SW_PROTECTED_VM)) { > > > > + /* > > > > + * This handling is for gmem self-tests and guests that treat > > > > + * userspace as the authority on whether a fault should be > > > > + * private or not. > > > > + */ > > > > + private_fault = kvm_mem_is_private(kvm, gpa >> PAGE_SHIFT); > > > > + } > > > > > > This can be more simply: > > > > > > if (kvm_is_vm_type(kvm, KVM_X86_SNP_VM)) > > > return !!(err & PFERR_GUEST_ENC_MASK); > > > > > > if (kvm_is_vm_type(kvm, KVM_X86_SW_PROTECTED_VM)) > > > return kvm_mem_is_private(kvm, gpa >> PAGE_SHIFT); > > > > > > > Yes, indeed. But TDX has taken a different approach for SW_PROTECTED_VM > > case where they do this check in kvm_mmu_page_fault() and then synthesize > > the PFERR_GUEST_ENC_MASK into error_code before calling > > kvm_mmu_do_page_fault(). It's not in the v18 patchset AFAICT, but it's > > in the tdx-upstream git branch that corresponds to it: > > > > https://github.com/intel/tdx/commit/3717a903ef453aa7b62e7eb65f230566b7f158d4 > > > > Would you prefer that SNP adopt the same approach? > > Ah, yes, 'twas my suggestion in the first place. FWIW, I was just reviewing the > literal code here and wasn't paying much attention to the content. > > https://lore.kernel.org/all/f474282d701aca7af00e4f7171445abb5e734c6f.1689893403.git.isaku.yamahata@intel.com >
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 686f88c263a9..10c323e2faa4 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4327,6 +4327,7 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu, static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) { struct kvm_memory_slot *slot = fault->slot; + bool private_fault = fault->is_private; bool async; /* @@ -4356,12 +4357,19 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault return RET_PF_EMULATE; } - if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) { + /* + * In some cases SNP guests will make MMIO accesses with the encryption + * bit set. Handle these via the normal MMIO fault path. + */ + if (!slot && private_fault && kvm_is_vm_type(vcpu->kvm, KVM_X86_SNP_VM)) + private_fault = false; + + if (private_fault != kvm_mem_is_private(vcpu->kvm, fault->gfn)) { kvm_mmu_prepare_memory_fault_exit(vcpu, fault); return -EFAULT; } - if (fault->is_private) + if (private_fault) return kvm_faultin_pfn_private(vcpu, fault); async = false; diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h index 759c8b718201..e5b973051ad9 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -251,6 +251,24 @@ struct kvm_page_fault { int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault); +static bool kvm_mmu_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 err) +{ + bool private_fault = false; + + if (kvm_is_vm_type(kvm, KVM_X86_SNP_VM)) { + private_fault = !!(err & PFERR_GUEST_ENC_MASK); + } else if (kvm_is_vm_type(kvm, KVM_X86_SW_PROTECTED_VM)) { + /* + * This handling is for gmem self-tests and guests that treat + * userspace as the authority on whether a fault should be + * private or not. + */ + private_fault = kvm_mem_is_private(kvm, gpa >> PAGE_SHIFT); + } + + return private_fault; +} + /* * Return values of handle_mmio_page_fault(), mmu.page_fault(), fast_page_fault(), * and of course kvm_mmu_do_page_fault(). @@ -298,7 +316,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, .max_level = KVM_MAX_HUGEPAGE_LEVEL, .req_level = PG_LEVEL_4K, .goal_level = PG_LEVEL_4K, - .is_private = kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT), + .is_private = kvm_mmu_fault_is_private(vcpu->kvm, cr2_or_gpa, err), }; int r;
For KVM_X86_SNP_VM, only the PFERR_GUEST_ENC_MASK flag is needed to determine with an #NPF is due to a private/shared access by the guest. Implement that handling here. Also add handling needed to deal with SNP guests which in some cases will make MMIO accesses with the encryption bit. Signed-off-by: Michael Roth <michael.roth@amd.com> --- arch/x86/kvm/mmu/mmu.c | 12 ++++++++++-- arch/x86/kvm/mmu/mmu_internal.h | 20 +++++++++++++++++++- 2 files changed, 29 insertions(+), 3 deletions(-)