Message ID | 20250401221107.921677-1-seanjc@google.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | KVM: VMX: Add a quirk to (not) honor guest PAT on CPUs that support self-snoop | expand |
On Tue, Apr 01, 2025 at 03:11:07PM -0700, Sean Christopherson wrote: > Add back support for honoring guest PAT on Intel CPUs that support self- > snoop (and don't have errata), but guarded by a quirk so as not to break > existing setups that subtly relied on KVM forcing WB for synthetic > devices. > > This effectively reverts commit 9d70f3fec14421e793ffbc0ec2f739b24e534900 > and reapplies 377b2f359d1f71c75f8cc352b5c81f2210312d83, but with a quirk. > > Cc: Yan Zhao <yan.y.zhao@intel.com> > Signed-off-by: Sean Christopherson <seanjc@google.com> > --- > Hi Sean, > AFAIK, we don't have an answer as to whether the slow UC behavior on CLX+ > is working as intended or a CPU flaw, which Paolo was hoping we would get We did answer the slow UC behavior is working as intended at [1]. "After consulting with CPU architects, it's told that this behavior is expected on ICX/SPR Xeon platforms due to the snooping implementation." Paolo then help update the series to v2 [2] /v3 [3]. Did you overlook those series, or is there something I missed? Thanks Yan [1] https://lore.kernel.org/all/20250224070716.31360-1-yan.y.zhao@intel.com/ [2] https://lore.kernel.org/all/20250301073428.2435768-1-pbonzini@redhat.com/ [3] https://lore.kernel.org/all/20250304060647.2903469-1-pbonzini@redhat.com/ > before adding a quirk. But I don't want to lose sight of honoring guest > PAT, nor am I particularly inclined to force end users to wait for a > definitive answer on hardware they may not even care about. > > Documentation/virt/kvm/api.rst | 25 +++++++++++++++++++++++++ > arch/x86/include/asm/kvm_host.h | 3 ++- > arch/x86/include/uapi/asm/kvm.h | 1 + > arch/x86/kvm/mmu.h | 2 +- > arch/x86/kvm/mmu/mmu.c | 17 +++++++++++++---- > arch/x86/kvm/vmx/vmx.c | 11 +++++++---- > arch/x86/kvm/x86.c | 2 +- > 7 files changed, 50 insertions(+), 11 deletions(-) > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst > index 1f8625b7646a..2a1444d99c37 100644 > --- a/Documentation/virt/kvm/api.rst > +++ b/Documentation/virt/kvm/api.rst > @@ -8158,6 +8158,31 @@ KVM_X86_QUIRK_STUFF_FEATURE_MSRS By default, at vCPU creation, KVM sets the > and 0x489), as KVM does now allow them to > be set by userspace (KVM sets them based on > guest CPUID, for safety purposes). > + > +KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT By default, on Intel CPUs with TDP (EPT) > + enabled, KVM ignores guest PAT unless the > + VM has an assigned non-coherent device, > + even if it is entirely safe/correct for KVM > + to honor guest PAT. When this quirk is > + disabled, and the host CPU fully supports > + selfsnoop (isn't affected by errata), KVM > + honors guest PAT for all VMs. > + > + The only _known_ issue with honoring guest > + PAT is when QEMU's Bochs VGA is exposed to > + a VM on Cascade Lake and later Intel server > + CPUs, and the guest kernel is running an > + outdated driver that maps video RAM as UC. > + Accessing UC memory on the affected Intel > + CPUs is an order of magnitude slower than > + previous generations, to the point where > + the access latency prevents the guest from > + booting. This quirk can likely be disabled > + if the above do not hold true. > + > + Note, KVM always honors guest PAT on AMD > + CPUs when TDP (NPT) is enabled. KVM never > + honors guest PAT when TDP is disabled. > =================================== ============================================ > > 7.32 KVM_CAP_MAX_VCPU_ID > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index a884ab544335..427b906da5cc 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -2409,7 +2409,8 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages); > KVM_X86_QUIRK_FIX_HYPERCALL_INSN | \ > KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS | \ > KVM_X86_QUIRK_SLOT_ZAP_ALL | \ > - KVM_X86_QUIRK_STUFF_FEATURE_MSRS) > + KVM_X86_QUIRK_STUFF_FEATURE_MSRS | \ > + KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT) > > /* > * KVM previously used a u32 field in kvm_run to indicate the hypercall was > diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h > index 460306b35a4b..074e2b74e68c 100644 > --- a/arch/x86/include/uapi/asm/kvm.h > +++ b/arch/x86/include/uapi/asm/kvm.h > @@ -441,6 +441,7 @@ struct kvm_sync_regs { > #define KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS (1 << 6) > #define KVM_X86_QUIRK_SLOT_ZAP_ALL (1 << 7) > #define KVM_X86_QUIRK_STUFF_FEATURE_MSRS (1 << 8) > +#define KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT (1 << 9) > > #define KVM_STATE_NESTED_FORMAT_VMX 0 > #define KVM_STATE_NESTED_FORMAT_SVM 1 > diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h > index 050a0e229a4d..639264635a1a 100644 > --- a/arch/x86/kvm/mmu.h > +++ b/arch/x86/kvm/mmu.h > @@ -231,7 +231,7 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, > return -(u32)fault & errcode; > } > > -bool kvm_mmu_may_ignore_guest_pat(void); > +bool kvm_mmu_may_ignore_guest_pat(struct kvm *kvm); > > int kvm_mmu_post_init_vm(struct kvm *kvm); > void kvm_mmu_pre_destroy_vm(struct kvm *kvm); > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > index 63bb77ee1bb1..16c64e80d946 100644 > --- a/arch/x86/kvm/mmu/mmu.c > +++ b/arch/x86/kvm/mmu/mmu.c > @@ -4835,18 +4835,27 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu, > } > #endif > > -bool kvm_mmu_may_ignore_guest_pat(void) > +bool kvm_mmu_may_ignore_guest_pat(struct kvm *kvm) > { > /* > - * When EPT is enabled (shadow_memtype_mask is non-zero), and the VM > + * When EPT is enabled (shadow_memtype_mask is non-zero), the CPU does > + * not support self-snoop (or is affected by an erratum), and the VM > * has non-coherent DMA (DMA doesn't snoop CPU caches), KVM's ABI is to > * honor the memtype from the guest's PAT so that guest accesses to > * memory that is DMA'd aren't cached against the guest's wishes. As a > * result, KVM _may_ ignore guest PAT, whereas without non-coherent DMA, > - * KVM _always_ ignores guest PAT (when EPT is enabled). > + * KVM _always_ ignores or honors guest PAT, i.e. doesn't toggle SPTE > + * bits in response to non-coherent device (un)registration. > + * > + * Due to an unfortunate confluence of slow hardware, suboptimal guest > + * drivers, and historical use cases, honoring self-snoop and guest PAT > + * is also buried behind a quirk. > */ > - return shadow_memtype_mask; > + return (!static_cpu_has(X86_FEATURE_SELFSNOOP) || > + kvm_check_has_quirk(kvm, KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT)) && > + shadow_memtype_mask; > } > +EXPORT_SYMBOL_GPL(kvm_mmu_may_ignore_guest_pat); > > int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) > { > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > index b70ed72c1783..734db162cab3 100644 > --- a/arch/x86/kvm/vmx/vmx.c > +++ b/arch/x86/kvm/vmx/vmx.c > @@ -7730,11 +7730,14 @@ u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) > > /* > * Force WB and ignore guest PAT if the VM does NOT have a non-coherent > - * device attached. Letting the guest control memory types on Intel > - * CPUs may result in unexpected behavior, and so KVM's ABI is to trust > - * the guest to behave only as a last resort. > + * device attached, and either the CPU doesn't support self-snoop or > + * KVM's quirk to ignore guest PAT is enabled. Letting the guest > + * control memory types on Intel CPUs without self-snoop may result in > + * unexpected behavior, and so KVM's (historical) ABI is to trust the > + * guest to behave only as a last resort. > */ > - if (!kvm_arch_has_noncoherent_dma(vcpu->kvm)) > + if (kvm_mmu_may_ignore_guest_pat(vcpu->kvm) && > + !kvm_arch_has_noncoherent_dma(vcpu->kvm)) > return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT; > > return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT); > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index c841817a914a..4a94eb974f0d 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -13528,7 +13528,7 @@ static void kvm_noncoherent_dma_assignment_start_or_stop(struct kvm *kvm) > * (or last) non-coherent device is (un)registered to so that new SPTEs > * with the correct "ignore guest PAT" setting are created. > */ > - if (kvm_mmu_may_ignore_guest_pat()) > + if (kvm_mmu_may_ignore_guest_pat(kvm)) > kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL)); > } > > > base-commit: 782f9feaa9517caf33186dcdd6b50a8f770ed29b > -- > 2.49.0.504.g3bcea36a83-goog >
On Wed, Apr 02, 2025, Yan Zhao wrote: > On Tue, Apr 01, 2025 at 03:11:07PM -0700, Sean Christopherson wrote: > > Add back support for honoring guest PAT on Intel CPUs that support self- > > snoop (and don't have errata), but guarded by a quirk so as not to break > > existing setups that subtly relied on KVM forcing WB for synthetic > > devices. > > > > This effectively reverts commit 9d70f3fec14421e793ffbc0ec2f739b24e534900 > > and reapplies 377b2f359d1f71c75f8cc352b5c81f2210312d83, but with a quirk. > > > > Cc: Yan Zhao <yan.y.zhao@intel.com> > > Signed-off-by: Sean Christopherson <seanjc@google.com> > > --- > > > Hi Sean, > > > AFAIK, we don't have an answer as to whether the slow UC behavior on CLX+ > > is working as intended or a CPU flaw, which Paolo was hoping we would get > We did answer the slow UC behavior is working as intended at [1]. > > "After consulting with CPU architects, > it's told that this behavior is expected on ICX/SPR Xeon platforms due to > the snooping implementation." > > Paolo then help update the series to v2 [2] /v3 [3]. > > Did you overlook those series, or is there something I missed? Nope, you didn't miss anything. I have that series in my TODO folder, but only glanced at it when it flew by and completely missed that it quirks ignoring guest PAT. Not sure how I missed the cover letter subject though... Anyways, ignore this, my bad. Thanks for the update, and sorry for the noise!
On Tue, Apr 01, 2025 at 05:22:38PM -0700, Sean Christopherson wrote: > On Wed, Apr 02, 2025, Yan Zhao wrote: > > On Tue, Apr 01, 2025 at 03:11:07PM -0700, Sean Christopherson wrote: > > > Add back support for honoring guest PAT on Intel CPUs that support self- > > > snoop (and don't have errata), but guarded by a quirk so as not to break > > > existing setups that subtly relied on KVM forcing WB for synthetic > > > devices. > > > > > > This effectively reverts commit 9d70f3fec14421e793ffbc0ec2f739b24e534900 > > > and reapplies 377b2f359d1f71c75f8cc352b5c81f2210312d83, but with a quirk. > > > > > > Cc: Yan Zhao <yan.y.zhao@intel.com> > > > Signed-off-by: Sean Christopherson <seanjc@google.com> > > > --- > > > > > Hi Sean, > > > > > AFAIK, we don't have an answer as to whether the slow UC behavior on CLX+ > > > is working as intended or a CPU flaw, which Paolo was hoping we would get > > We did answer the slow UC behavior is working as intended at [1]. > > > > "After consulting with CPU architects, > > it's told that this behavior is expected on ICX/SPR Xeon platforms due to > > the snooping implementation." > > > > Paolo then help update the series to v2 [2] /v3 [3]. > > > > Did you overlook those series, or is there something I missed? > > Nope, you didn't miss anything. I have that series in my TODO folder, but only > glanced at it when it flew by and completely missed that it quirks ignoring > guest PAT. Not sure how I missed the cover letter subject though... > > Anyways, ignore this, my bad. Thanks for the update, and sorry for the noise! That's OK :)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 1f8625b7646a..2a1444d99c37 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -8158,6 +8158,31 @@ KVM_X86_QUIRK_STUFF_FEATURE_MSRS By default, at vCPU creation, KVM sets the and 0x489), as KVM does now allow them to be set by userspace (KVM sets them based on guest CPUID, for safety purposes). + +KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT By default, on Intel CPUs with TDP (EPT) + enabled, KVM ignores guest PAT unless the + VM has an assigned non-coherent device, + even if it is entirely safe/correct for KVM + to honor guest PAT. When this quirk is + disabled, and the host CPU fully supports + selfsnoop (isn't affected by errata), KVM + honors guest PAT for all VMs. + + The only _known_ issue with honoring guest + PAT is when QEMU's Bochs VGA is exposed to + a VM on Cascade Lake and later Intel server + CPUs, and the guest kernel is running an + outdated driver that maps video RAM as UC. + Accessing UC memory on the affected Intel + CPUs is an order of magnitude slower than + previous generations, to the point where + the access latency prevents the guest from + booting. This quirk can likely be disabled + if the above do not hold true. + + Note, KVM always honors guest PAT on AMD + CPUs when TDP (NPT) is enabled. KVM never + honors guest PAT when TDP is disabled. =================================== ============================================ 7.32 KVM_CAP_MAX_VCPU_ID diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index a884ab544335..427b906da5cc 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -2409,7 +2409,8 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages); KVM_X86_QUIRK_FIX_HYPERCALL_INSN | \ KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS | \ KVM_X86_QUIRK_SLOT_ZAP_ALL | \ - KVM_X86_QUIRK_STUFF_FEATURE_MSRS) + KVM_X86_QUIRK_STUFF_FEATURE_MSRS | \ + KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT) /* * KVM previously used a u32 field in kvm_run to indicate the hypercall was diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index 460306b35a4b..074e2b74e68c 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -441,6 +441,7 @@ struct kvm_sync_regs { #define KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS (1 << 6) #define KVM_X86_QUIRK_SLOT_ZAP_ALL (1 << 7) #define KVM_X86_QUIRK_STUFF_FEATURE_MSRS (1 << 8) +#define KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT (1 << 9) #define KVM_STATE_NESTED_FORMAT_VMX 0 #define KVM_STATE_NESTED_FORMAT_SVM 1 diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 050a0e229a4d..639264635a1a 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -231,7 +231,7 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, return -(u32)fault & errcode; } -bool kvm_mmu_may_ignore_guest_pat(void); +bool kvm_mmu_may_ignore_guest_pat(struct kvm *kvm); int kvm_mmu_post_init_vm(struct kvm *kvm); void kvm_mmu_pre_destroy_vm(struct kvm *kvm); diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 63bb77ee1bb1..16c64e80d946 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4835,18 +4835,27 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu, } #endif -bool kvm_mmu_may_ignore_guest_pat(void) +bool kvm_mmu_may_ignore_guest_pat(struct kvm *kvm) { /* - * When EPT is enabled (shadow_memtype_mask is non-zero), and the VM + * When EPT is enabled (shadow_memtype_mask is non-zero), the CPU does + * not support self-snoop (or is affected by an erratum), and the VM * has non-coherent DMA (DMA doesn't snoop CPU caches), KVM's ABI is to * honor the memtype from the guest's PAT so that guest accesses to * memory that is DMA'd aren't cached against the guest's wishes. As a * result, KVM _may_ ignore guest PAT, whereas without non-coherent DMA, - * KVM _always_ ignores guest PAT (when EPT is enabled). + * KVM _always_ ignores or honors guest PAT, i.e. doesn't toggle SPTE + * bits in response to non-coherent device (un)registration. + * + * Due to an unfortunate confluence of slow hardware, suboptimal guest + * drivers, and historical use cases, honoring self-snoop and guest PAT + * is also buried behind a quirk. */ - return shadow_memtype_mask; + return (!static_cpu_has(X86_FEATURE_SELFSNOOP) || + kvm_check_has_quirk(kvm, KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT)) && + shadow_memtype_mask; } +EXPORT_SYMBOL_GPL(kvm_mmu_may_ignore_guest_pat); int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) { diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index b70ed72c1783..734db162cab3 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7730,11 +7730,14 @@ u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio) /* * Force WB and ignore guest PAT if the VM does NOT have a non-coherent - * device attached. Letting the guest control memory types on Intel - * CPUs may result in unexpected behavior, and so KVM's ABI is to trust - * the guest to behave only as a last resort. + * device attached, and either the CPU doesn't support self-snoop or + * KVM's quirk to ignore guest PAT is enabled. Letting the guest + * control memory types on Intel CPUs without self-snoop may result in + * unexpected behavior, and so KVM's (historical) ABI is to trust the + * guest to behave only as a last resort. */ - if (!kvm_arch_has_noncoherent_dma(vcpu->kvm)) + if (kvm_mmu_may_ignore_guest_pat(vcpu->kvm) && + !kvm_arch_has_noncoherent_dma(vcpu->kvm)) return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT; return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index c841817a914a..4a94eb974f0d 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -13528,7 +13528,7 @@ static void kvm_noncoherent_dma_assignment_start_or_stop(struct kvm *kvm) * (or last) non-coherent device is (un)registered to so that new SPTEs * with the correct "ignore guest PAT" setting are created. */ - if (kvm_mmu_may_ignore_guest_pat()) + if (kvm_mmu_may_ignore_guest_pat(kvm)) kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL)); }
Add back support for honoring guest PAT on Intel CPUs that support self- snoop (and don't have errata), but guarded by a quirk so as not to break existing setups that subtly relied on KVM forcing WB for synthetic devices. This effectively reverts commit 9d70f3fec14421e793ffbc0ec2f739b24e534900 and reapplies 377b2f359d1f71c75f8cc352b5c81f2210312d83, but with a quirk. Cc: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> --- AFAIK, we don't have an answer as to whether the slow UC behavior on CLX+ is working as intended or a CPU flaw, which Paolo was hoping we would get before adding a quirk. But I don't want to lose sight of honoring guest PAT, nor am I particularly inclined to force end users to wait for a definitive answer on hardware they may not even care about. Documentation/virt/kvm/api.rst | 25 +++++++++++++++++++++++++ arch/x86/include/asm/kvm_host.h | 3 ++- arch/x86/include/uapi/asm/kvm.h | 1 + arch/x86/kvm/mmu.h | 2 +- arch/x86/kvm/mmu/mmu.c | 17 +++++++++++++---- arch/x86/kvm/vmx/vmx.c | 11 +++++++---- arch/x86/kvm/x86.c | 2 +- 7 files changed, 50 insertions(+), 11 deletions(-) base-commit: 782f9feaa9517caf33186dcdd6b50a8f770ed29b