Message ID | 20240517173926.965351-38-seanjc@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: x86: CPUID overhaul, fixes, and caching | expand |
On 5/18/2024 1:39 AM, Sean Christopherson wrote: [...] > index e021681f34ac..ad0168d3aec5 100644 > --- a/arch/x86/kvm/cpuid.h > +++ b/arch/x86/kvm/cpuid.h > @@ -259,10 +259,10 @@ static __always_inline bool kvm_is_governed_feature(unsigned int x86_feature) > static __always_inline void guest_cpu_cap_set(struct kvm_vcpu *vcpu, > unsigned int x86_feature) > { > - BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature)); > + unsigned int x86_leaf = __feature_leaf(x86_feature); > > - __set_bit(kvm_governed_feature_index(x86_feature), > - vcpu->arch.governed_features.enabled); > + reverse_cpuid_check(x86_leaf); This reverse_cpuid_check() seems unnecessary since in patch(17), we already have moved it in __feature_leaf(). But I don't have full source code to double check it now. > + vcpu->arch.cpu_caps[x86_leaf] |= __feature_bit(x86_feature); > } > > static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu, > @@ -275,10 +275,10 @@ static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu, > static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu, > unsigned int x86_feature) > { > - BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature)); > + unsigned int x86_leaf = __feature_leaf(x86_feature); > > - return test_bit(kvm_governed_feature_index(x86_feature), > - vcpu->arch.governed_features.enabled); > + reverse_cpuid_check(x86_leaf); Ditto. > + return vcpu->arch.cpu_caps[x86_leaf] & __feature_bit(x86_feature); > } > > [...]
On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote: > Replace the internals of the governed features framework with a more > comprehensive "guest CPU capabilities" implementation, i.e. with a guest > version of kvm_cpu_caps. Keep the skeleton of governed features around > for now as vmx_adjust_sec_exec_control() relies on detecting governed > features to do the right thing for XSAVES, and switching all guest feature > queries to guest_cpu_cap_has() requires subtle and non-trivial changes, > i.e. is best done as a standalone change. > > Tracking *all* guest capabilities that KVM cares will allow excising the > poorly named "governed features" framework, and effectively optimizes all > KVM queries of guest capabilities, i.e. doesn't require making a > subjective decision as to whether or not a feature is worth "governing", > and doesn't require adding the code to do so. > > The cost of tracking all features is currently 92 bytes per vCPU on 64-bit > kernels: 100 bytes for cpu_caps versus 8 bytes for governed_features. > That cost is well worth paying even if the only benefit was eliminating > the "governed features" terminology. And practically speaking, the real > cost is zero unless those 92 bytes pushes the size of vcpu_vmx or vcpu_svm > into a new order-N allocation, and if that happens there are better ways > to reduce the footprint of kvm_vcpu_arch, e.g. making the PMU and/or MTRR > state separate allocations. > > Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> > Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> > Signed-off-by: Sean Christopherson <seanjc@google.com> > --- > arch/x86/include/asm/kvm_host.h | 45 +++++++++++++++++++++------------ > arch/x86/kvm/cpuid.c | 14 +++++++--- > arch/x86/kvm/cpuid.h | 12 ++++----- > arch/x86/kvm/reverse_cpuid.h | 16 ------------ > 4 files changed, 46 insertions(+), 41 deletions(-) > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index 3003e99155e7..8840d21ee0b5 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -743,6 +743,22 @@ struct kvm_queued_exception { > bool has_payload; > }; > > +/* > + * Hardware-defined CPUID leafs that are either scattered by the kernel or are > + * unknown to the kernel, but need to be directly used by KVM. Note, these > + * word values conflict with the kernel's "bug" caps, but KVM doesn't use those. > + */ > +enum kvm_only_cpuid_leafs { > + CPUID_12_EAX = NCAPINTS, > + CPUID_7_1_EDX, > + CPUID_8000_0007_EDX, > + CPUID_8000_0022_EAX, > + CPUID_7_2_EDX, > + NR_KVM_CPU_CAPS, > + > + NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS, > +}; > + > struct kvm_vcpu_arch { > /* > * rip and regs accesses must go through > @@ -861,23 +877,20 @@ struct kvm_vcpu_arch { > bool is_amd_compatible; > > /* > - * FIXME: Drop this macro and use KVM_NR_GOVERNED_FEATURES directly > - * when "struct kvm_vcpu_arch" is no longer defined in an > - * arch/x86/include/asm header. The max is mostly arbitrary, i.e. > - * can be increased as necessary. > + * cpu_caps holds the effective guest capabilities, i.e. the features > + * the vCPU is allowed to use. Typically, but not always, features can > + * be used by the guest if and only if both KVM and userspace want to > + * expose the feature to the guest. Nitpick: Since even the comment mentions this, wouldn't it be better to call this cpu_effective_caps? or at least cpu_eff_caps, to emphasize that these are indeed effective capabilities, e.g these that both kvm and userspace support? > + * > + * A common exception is for virtualization holes, i.e. when KVM can't > + * prevent the guest from using a feature, in which case the vCPU "has" > + * the feature regardless of what KVM or userspace desires. > + * > + * Note, features that don't require KVM involvement in any way are > + * NOT enforced/sanitized by KVM, i.e. are taken verbatim from the > + * guest CPUID provided by userspace. > */ > -#define KVM_MAX_NR_GOVERNED_FEATURES BITS_PER_LONG > - > - /* > - * Track whether or not the guest is allowed to use features that are > - * governed by KVM, where "governed" means KVM needs to manage state > - * and/or explicitly enable the feature in hardware. Typically, but > - * not always, governed features can be used by the guest if and only > - * if both KVM and userspace want to expose the feature to the guest. > - */ > - struct { > - DECLARE_BITMAP(enabled, KVM_MAX_NR_GOVERNED_FEATURES); > - } governed_features; > + u32 cpu_caps[NR_KVM_CPU_CAPS]; > > u64 reserved_gpa_bits; > int maxphyaddr; > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c > index 286abefc93d5..89c506cf649b 100644 > --- a/arch/x86/kvm/cpuid.c > +++ b/arch/x86/kvm/cpuid.c > @@ -387,9 +387,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) > struct kvm_cpuid_entry2 *best; > bool allow_gbpages; > > - BUILD_BUG_ON(KVM_NR_GOVERNED_FEATURES > KVM_MAX_NR_GOVERNED_FEATURES); > - bitmap_zero(vcpu->arch.governed_features.enabled, > - KVM_MAX_NR_GOVERNED_FEATURES); > + memset(vcpu->arch.cpu_caps, 0, sizeof(vcpu->arch.cpu_caps)); > > kvm_update_cpuid_runtime(vcpu); > > @@ -473,6 +471,7 @@ u64 kvm_vcpu_reserved_gpa_bits_raw(struct kvm_vcpu *vcpu) > static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2, > int nent) > { > + u32 vcpu_caps[NR_KVM_CPU_CAPS]; > int r; > > /* > @@ -480,10 +479,18 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2, > * order to massage the new entries, e.g. to account for dynamic bits > * that KVM controls, without clobbering the current guest CPUID, which > * KVM needs to preserve in order to unwind on failure. > + * > + * Similarly, save the vCPU's current cpu_caps so that the capabilities > + * can be updated alongside the CPUID entries when performing runtime > + * updates. Full initialization is done if and only if the vCPU hasn't > + * run, i.e. only if userspace is potentially changing CPUID features. > */ > swap(vcpu->arch.cpuid_entries, e2); > swap(vcpu->arch.cpuid_nent, nent); > > + memcpy(vcpu_caps, vcpu->arch.cpu_caps, sizeof(vcpu_caps)); > + BUILD_BUG_ON(sizeof(vcpu_caps) != sizeof(vcpu->arch.cpu_caps)); > + > /* > * KVM does not correctly handle changing guest CPUID after KVM_RUN, as > * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't > @@ -527,6 +534,7 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2, > return 0; > > err: > + memcpy(vcpu->arch.cpu_caps, vcpu_caps, sizeof(vcpu_caps)); > swap(vcpu->arch.cpuid_entries, e2); > swap(vcpu->arch.cpuid_nent, nent); > return r; > diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h > index e021681f34ac..ad0168d3aec5 100644 > --- a/arch/x86/kvm/cpuid.h > +++ b/arch/x86/kvm/cpuid.h > @@ -259,10 +259,10 @@ static __always_inline bool kvm_is_governed_feature(unsigned int x86_feature) > static __always_inline void guest_cpu_cap_set(struct kvm_vcpu *vcpu, > unsigned int x86_feature) > { > - BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature)); > + unsigned int x86_leaf = __feature_leaf(x86_feature); > > - __set_bit(kvm_governed_feature_index(x86_feature), > - vcpu->arch.governed_features.enabled); > + reverse_cpuid_check(x86_leaf); Indeed, no need for reverse_cpuid_check here, as already mentioned. > + vcpu->arch.cpu_caps[x86_leaf] |= __feature_bit(x86_feature); > } > > static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu, > @@ -275,10 +275,10 @@ static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu, > static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu, > unsigned int x86_feature) > { > - BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature)); > + unsigned int x86_leaf = __feature_leaf(x86_feature); > > - return test_bit(kvm_governed_feature_index(x86_feature), > - vcpu->arch.governed_features.enabled); > + reverse_cpuid_check(x86_leaf); > + return vcpu->arch.cpu_caps[x86_leaf] & __feature_bit(x86_feature); > } > > static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu, unsigned long cr3) > diff --git a/arch/x86/kvm/reverse_cpuid.h b/arch/x86/kvm/reverse_cpuid.h > index 245f71c16272..63d5735fbc8a 100644 > --- a/arch/x86/kvm/reverse_cpuid.h > +++ b/arch/x86/kvm/reverse_cpuid.h > @@ -6,22 +6,6 @@ > #include <asm/cpufeature.h> > #include <asm/cpufeatures.h> > > -/* > - * Hardware-defined CPUID leafs that are either scattered by the kernel or are > - * unknown to the kernel, but need to be directly used by KVM. Note, these > - * word values conflict with the kernel's "bug" caps, but KVM doesn't use those. > - */ > -enum kvm_only_cpuid_leafs { > - CPUID_12_EAX = NCAPINTS, > - CPUID_7_1_EDX, > - CPUID_8000_0007_EDX, > - CPUID_8000_0022_EAX, > - CPUID_7_2_EDX, > - NR_KVM_CPU_CAPS, > - > - NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS, > -}; > - > /* > * Define a KVM-only feature flag. > * Overall: Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Best regards, Maxim Levitsky
On Thu, Jul 04, 2024, Maxim Levitsky wrote: > On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote: > > @@ -861,23 +877,20 @@ struct kvm_vcpu_arch { > > bool is_amd_compatible; > > > > /* > > - * FIXME: Drop this macro and use KVM_NR_GOVERNED_FEATURES directly > > - * when "struct kvm_vcpu_arch" is no longer defined in an > > - * arch/x86/include/asm header. The max is mostly arbitrary, i.e. > > - * can be increased as necessary. > > + * cpu_caps holds the effective guest capabilities, i.e. the features > > + * the vCPU is allowed to use. Typically, but not always, features can > > + * be used by the guest if and only if both KVM and userspace want to > > + * expose the feature to the guest. > > Nitpick: Since even the comment mentions this, wouldn't it be better to call this > cpu_effective_caps? or at least cpu_eff_caps, to emphasize that these are indeed > effective capabilities, e.g these that both kvm and userspace support? I strongly prefer cpu_caps, in part to match kvm_cpu_caps, but also because adding "effective" to the name incorrectly suggests that there are other guest capabilities that aren't effective. These are the _only_ per-vCPU capabilities as far as KVM is concerned, i.e. they are the single source of truth. kvm_cpu_caps holds KVM's capabilities, boot_cpu_data holds kernel capabilities, and bare metal holds its capabilities somewhere in silicion. E.g. being pedantic, kvm_cpu_caps are also KVM's effective capabilities, as they are a reflection of KVM-the-module's capabilities, module params, kernel capabilities, and CPU capabilities.
On Tue, 2024-07-09 at 11:30 -0700, Sean Christopherson wrote: > On Thu, Jul 04, 2024, Maxim Levitsky wrote: > > On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote: > > > @@ -861,23 +877,20 @@ struct kvm_vcpu_arch { > > > bool is_amd_compatible; > > > > > > /* > > > - * FIXME: Drop this macro and use KVM_NR_GOVERNED_FEATURES directly > > > - * when "struct kvm_vcpu_arch" is no longer defined in an > > > - * arch/x86/include/asm header. The max is mostly arbitrary, i.e. > > > - * can be increased as necessary. > > > + * cpu_caps holds the effective guest capabilities, i.e. the features > > > + * the vCPU is allowed to use. Typically, but not always, features can > > > + * be used by the guest if and only if both KVM and userspace want to > > > + * expose the feature to the guest. > > > > Nitpick: Since even the comment mentions this, wouldn't it be better to call this > > cpu_effective_caps? or at least cpu_eff_caps, to emphasize that these are indeed > > effective capabilities, e.g these that both kvm and userspace support? > > I strongly prefer cpu_caps, in part to match kvm_cpu_caps, but also because adding > "effective" to the name incorrectly suggests that there are other guest capabilities > that aren't effective. These are the _only_ per-vCPU capabilities as far as KVM > is concerned, i.e. they are the single source of truth. kvm_cpu_caps holds KVM's > capabilities, boot_cpu_data holds kernel capabilities, and bare metal holds its > capabilities somewhere in silicion. Looking from this POV, it make sense. > > E.g. being pedantic, kvm_cpu_caps are also KVM's effective capabilities, as they > are a reflection of KVM-the-module's capabilities, module params, kernel capabilities, > and CPU capabilities. > Let it be then, Best regards, Maxim Levitsky
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 3003e99155e7..8840d21ee0b5 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -743,6 +743,22 @@ struct kvm_queued_exception { bool has_payload; }; +/* + * Hardware-defined CPUID leafs that are either scattered by the kernel or are + * unknown to the kernel, but need to be directly used by KVM. Note, these + * word values conflict with the kernel's "bug" caps, but KVM doesn't use those. + */ +enum kvm_only_cpuid_leafs { + CPUID_12_EAX = NCAPINTS, + CPUID_7_1_EDX, + CPUID_8000_0007_EDX, + CPUID_8000_0022_EAX, + CPUID_7_2_EDX, + NR_KVM_CPU_CAPS, + + NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS, +}; + struct kvm_vcpu_arch { /* * rip and regs accesses must go through @@ -861,23 +877,20 @@ struct kvm_vcpu_arch { bool is_amd_compatible; /* - * FIXME: Drop this macro and use KVM_NR_GOVERNED_FEATURES directly - * when "struct kvm_vcpu_arch" is no longer defined in an - * arch/x86/include/asm header. The max is mostly arbitrary, i.e. - * can be increased as necessary. + * cpu_caps holds the effective guest capabilities, i.e. the features + * the vCPU is allowed to use. Typically, but not always, features can + * be used by the guest if and only if both KVM and userspace want to + * expose the feature to the guest. + * + * A common exception is for virtualization holes, i.e. when KVM can't + * prevent the guest from using a feature, in which case the vCPU "has" + * the feature regardless of what KVM or userspace desires. + * + * Note, features that don't require KVM involvement in any way are + * NOT enforced/sanitized by KVM, i.e. are taken verbatim from the + * guest CPUID provided by userspace. */ -#define KVM_MAX_NR_GOVERNED_FEATURES BITS_PER_LONG - - /* - * Track whether or not the guest is allowed to use features that are - * governed by KVM, where "governed" means KVM needs to manage state - * and/or explicitly enable the feature in hardware. Typically, but - * not always, governed features can be used by the guest if and only - * if both KVM and userspace want to expose the feature to the guest. - */ - struct { - DECLARE_BITMAP(enabled, KVM_MAX_NR_GOVERNED_FEATURES); - } governed_features; + u32 cpu_caps[NR_KVM_CPU_CAPS]; u64 reserved_gpa_bits; int maxphyaddr; diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 286abefc93d5..89c506cf649b 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -387,9 +387,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) struct kvm_cpuid_entry2 *best; bool allow_gbpages; - BUILD_BUG_ON(KVM_NR_GOVERNED_FEATURES > KVM_MAX_NR_GOVERNED_FEATURES); - bitmap_zero(vcpu->arch.governed_features.enabled, - KVM_MAX_NR_GOVERNED_FEATURES); + memset(vcpu->arch.cpu_caps, 0, sizeof(vcpu->arch.cpu_caps)); kvm_update_cpuid_runtime(vcpu); @@ -473,6 +471,7 @@ u64 kvm_vcpu_reserved_gpa_bits_raw(struct kvm_vcpu *vcpu) static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2, int nent) { + u32 vcpu_caps[NR_KVM_CPU_CAPS]; int r; /* @@ -480,10 +479,18 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2, * order to massage the new entries, e.g. to account for dynamic bits * that KVM controls, without clobbering the current guest CPUID, which * KVM needs to preserve in order to unwind on failure. + * + * Similarly, save the vCPU's current cpu_caps so that the capabilities + * can be updated alongside the CPUID entries when performing runtime + * updates. Full initialization is done if and only if the vCPU hasn't + * run, i.e. only if userspace is potentially changing CPUID features. */ swap(vcpu->arch.cpuid_entries, e2); swap(vcpu->arch.cpuid_nent, nent); + memcpy(vcpu_caps, vcpu->arch.cpu_caps, sizeof(vcpu_caps)); + BUILD_BUG_ON(sizeof(vcpu_caps) != sizeof(vcpu->arch.cpu_caps)); + /* * KVM does not correctly handle changing guest CPUID after KVM_RUN, as * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't @@ -527,6 +534,7 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2, return 0; err: + memcpy(vcpu->arch.cpu_caps, vcpu_caps, sizeof(vcpu_caps)); swap(vcpu->arch.cpuid_entries, e2); swap(vcpu->arch.cpuid_nent, nent); return r; diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index e021681f34ac..ad0168d3aec5 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -259,10 +259,10 @@ static __always_inline bool kvm_is_governed_feature(unsigned int x86_feature) static __always_inline void guest_cpu_cap_set(struct kvm_vcpu *vcpu, unsigned int x86_feature) { - BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature)); + unsigned int x86_leaf = __feature_leaf(x86_feature); - __set_bit(kvm_governed_feature_index(x86_feature), - vcpu->arch.governed_features.enabled); + reverse_cpuid_check(x86_leaf); + vcpu->arch.cpu_caps[x86_leaf] |= __feature_bit(x86_feature); } static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu, @@ -275,10 +275,10 @@ static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu, static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu, unsigned int x86_feature) { - BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature)); + unsigned int x86_leaf = __feature_leaf(x86_feature); - return test_bit(kvm_governed_feature_index(x86_feature), - vcpu->arch.governed_features.enabled); + reverse_cpuid_check(x86_leaf); + return vcpu->arch.cpu_caps[x86_leaf] & __feature_bit(x86_feature); } static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu, unsigned long cr3) diff --git a/arch/x86/kvm/reverse_cpuid.h b/arch/x86/kvm/reverse_cpuid.h index 245f71c16272..63d5735fbc8a 100644 --- a/arch/x86/kvm/reverse_cpuid.h +++ b/arch/x86/kvm/reverse_cpuid.h @@ -6,22 +6,6 @@ #include <asm/cpufeature.h> #include <asm/cpufeatures.h> -/* - * Hardware-defined CPUID leafs that are either scattered by the kernel or are - * unknown to the kernel, but need to be directly used by KVM. Note, these - * word values conflict with the kernel's "bug" caps, but KVM doesn't use those. - */ -enum kvm_only_cpuid_leafs { - CPUID_12_EAX = NCAPINTS, - CPUID_7_1_EDX, - CPUID_8000_0007_EDX, - CPUID_8000_0022_EAX, - CPUID_7_2_EDX, - NR_KVM_CPU_CAPS, - - NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS, -}; - /* * Define a KVM-only feature flag. *