Message ID | 20191128014016.4389-12-sean.j.christopherson@intel.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
Series | x86/cpu: Clean up handling of VMX features | expand |
On Wed, Nov 27, 2019 at 05:40:08PM -0800, Sean Christopherson wrote: > Add support for generating VMX feature names in capflags.c and use the > resulting x86_vmx_flags to print the VMX flags in /proc/cpuinfo. Don't > print VMX flags if no bits are set in word 0, which holds Pin Controls. > Pin Control's INTR and NMI exiting are fundamental pillars of VMX, if > they are not supported then the CPU is broken, it does not actually > support VMX, or the kernel wasn't built with support for the target CPU. > > Print the features in a dedicated "vmx flags" line to avoid polluting > the common "flags" and to avoid having to prefix all flags with "vmx_", > which results in horrendously long names. > > Keep synthetic VMX flags in cpufeatures to preserve /proc/cpuinfo's ABI > for those flags. This means that "flags" and "vmx flags" will have > duplicate entries for tpr_shadow (virtual_tpr), vnmi (virtual_nmis), > ept, flexpriority, vpid and ept_ad, but caps the pollution of "flags" at > those six VMX features. The vendor specific code that populates the > synthetic flags will be consolidated in a future patch to futher ^ further > +#ifdef CONFIG_X86_VMX_FEATURE_NAMES > + if (cpu_has(c, X86_FEATURE_VMX) && c->vmx_capability[0]) { > + seq_puts(m, "\nvmx flags\t:"); > + for (i = 0; i < 32*NVMXINTS; i++) { > + if (test_bit(i, (unsigned long *)c->vmx_capability) && > + x86_vmx_flags[i] != NULL) > + seq_printf(m, " %s", x86_vmx_flags[i]); > + } > + } > +#endif Oh well, some could be shorter: vmx flags : virtual_nmis preemption_timer invvpid ept_x_only ept_ad ept_1gb flexpriority tsc_offsetting virtual_tpr mtf virt_apic_accesses ept vpid unrestricted_guest ple shadow_vmcs pml mode_based_ept_exec virtual_nmis -> vnmis preemption_timer -> preempt_tmr flexpriority -> flexprio tsc_offsetting -> tsc_ofs virtual_tpr -> vtpr virt_apic_accesses -> vapic unrestricted_guest -> unres_guest and so on. Those are just my examples - I betcha the SDM is more creative here with abbreviations. But you guys are going to grep for them. If it were me, I'd save on typing. :-)
On 12/12/19 13:26, Borislav Petkov wrote: > > vmx flags : virtual_nmis preemption_timer invvpid ept_x_only ept_ad ept_1gb flexpriority tsc_offsetting virtual_tpr mtf virt_apic_accesses ept vpid unrestricted_guest ple shadow_vmcs pml mode_based_ept_exec > > virtual_nmis -> vnmis Even vnmi > preemption_timer -> preempt_tmr I would prefer the full one here. > flexpriority -> flexprio Full name? > tsc_offsetting -> tsc_ofs tsc_offset? > virtual_tpr -> vtpr Do we need this? It's usually included together with flexpriority. > virt_apic_accesses -> vapic apicv > unrestricted_guest -> unres_guest Full? Or just unrestricted In general I would stick to the same names as kvm_intel module parameters (sans "enable_" if applicable) and not even bother publishing the others. Some features are either not used by KVM or available on all VMX processors. Paolo > and so on. Those are just my examples - I betcha the SDM is more > creative here with abbreviations. But you guys are going to grep for > them. If it were me, I'd save on typing. :-)
> On 12 Dec 2019, at 16:13, Paolo Bonzini <pbonzini@redhat.com> wrote: > > On 12/12/19 13:26, Borislav Petkov wrote: >> >> vmx flags : virtual_nmis preemption_timer invvpid ept_x_only ept_ad ept_1gb flexpriority tsc_offsetting virtual_tpr mtf virt_apic_accesses ept vpid unrestricted_guest ple shadow_vmcs pml mode_based_ept_exec >> >> virtual_nmis -> vnmis > > Even vnmi > >> preemption_timer -> preempt_tmr > > I would prefer the full one here. > >> flexpriority -> flexprio > > Full name? > >> tsc_offsetting -> tsc_ofs > > tsc_offset? > >> virtual_tpr -> vtpr > > Do we need this? It's usually included together with flexpriority. > >> virt_apic_accesses -> vapic > > apicv Frankly, I dislike APICv terminology. I prefer to enumerate the various VMX features which are collectively called APICv by KVM. APICv currently represents in KVM terminology the combination of APIC-register virtualization, virtual-interrupt-delivery and posted-interrupts (See cpu_has_vmx_apicv()). In fact, the coupling of “enable_apicv” module parameter have made me multiple times to need to disable entire APICv features when there for example was only a bug in posted-interrupts. Even you got confused as virtualize-apic-access is not part of KVM’s APICv terminology but rather it’s enablement depend on flexpriority_enabled (See cpu_need_virtualize_apic_accesses()). i.e. It can be used for faster intercept handling of accesses to guest xAPIC MMIO page. > >> unrestricted_guest -> unres_guest > > Full? Or just unrestricted I prefer unrestricted_guest. > > In general I would stick to the same names as kvm_intel module > parameters (sans "enable_" if applicable) and not even bother publishing > the others. Some features are either not used by KVM or available on > all VMX processors. > > Paolo > >> and so on. Those are just my examples - I betcha the SDM is more >> creative here with abbreviations. But you guys are going to grep for >> them. If it were me, I'd save on typing. :-) >
On 12/12/19 16:52, Liran Alon wrote: >>> virt_apic_accesses -> vapic >> apicv > Frankly, I dislike APICv terminology. I prefer to enumerate the > various VMX features which are collectively called APICv by KVM. > APICv currently represents in KVM terminology the combination of > APIC-register virtualization, virtual-interrupt-delivery and > posted-interrupts (See cpu_has_vmx_apicv()). > > In fact, the coupling of “enable_apicv” module parameter have made me > multiple times to need to disable entire APICv features when there > for example was only a bug in posted-interrupts. > > Even you got confused as virtualize-apic-access is not part of KVM’s > APICv terminology but rather it’s enablement depend on > flexpriority_enabled (See cpu_need_virtualize_apic_accesses()). i.e. > It can be used for faster intercept handling of accesses to guest > xAPIC MMIO page. Right, I got confused with APIC-register virtualization. Virtualize APIC accesses is another one I wouldn't bother putting in /proc/cpuinfo, since it's usually present together with flexpriority. Paolo
On Thu, Dec 12, 2019 at 04:57:10PM +0100, Paolo Bonzini wrote: > On 12/12/19 16:52, Liran Alon wrote: > >>> virt_apic_accesses -> vapic > >> apicv > > Frankly, I dislike APICv terminology. I prefer to enumerate the > > various VMX features which are collectively called APICv by KVM. > > APICv currently represents in KVM terminology the combination of > > APIC-register virtualization, virtual-interrupt-delivery and > > posted-interrupts (See cpu_has_vmx_apicv()). > > > > In fact, the coupling of “enable_apicv” module parameter have made me > > multiple times to need to disable entire APICv features when there > > for example was only a bug in posted-interrupts. > > > > Even you got confused as virtualize-apic-access is not part of KVM’s > > APICv terminology but rather it’s enablement depend on > > flexpriority_enabled (See cpu_need_virtualize_apic_accesses()). i.e. > > It can be used for faster intercept handling of accesses to guest > > xAPIC MMIO page. > > Right, I got confused with APIC-register virtualization. Virtualize > APIC accesses is another one I wouldn't bother putting in /proc/cpuinfo, > since it's usually present together with flexpriority. Key word being "usually". My intent in printing out partially redundant flags was to help users debug/understand why the combined feature isn't supported. E.g. userspace can already easily (relatively speaking) query flexpriority support via /sys/module/kvm_intel/parameters/flexpriority. But if that comes back "N", the user has no way to determine exactly why flexpriority is disabled.
On 12/12/19 18:43, Sean Christopherson wrote: > Key word being "usually". My intent in printing out partially redundant > flags was to help users debug/understand why the combined feature isn't > supported. E.g. userspace can already easily (relatively speaking) query > flexpriority support via /sys/module/kvm_intel/parameters/flexpriority. > But if that comes back "N", the user has no way to determine exactly why > flexpriority is disabled. There are tools such as vmxcap. It is part of QEMU, but I wouldn't mind moving it into the kernel tree. Paolo
> On 12 Dec 2019, at 19:43, Sean Christopherson <sean.j.christopherson@intel.com> wrote: > > On Thu, Dec 12, 2019 at 04:57:10PM +0100, Paolo Bonzini wrote: >> On 12/12/19 16:52, Liran Alon wrote: >>>>> virt_apic_accesses -> vapic >>>> apicv >>> Frankly, I dislike APICv terminology. I prefer to enumerate the >>> various VMX features which are collectively called APICv by KVM. >>> APICv currently represents in KVM terminology the combination of >>> APIC-register virtualization, virtual-interrupt-delivery and >>> posted-interrupts (See cpu_has_vmx_apicv()). >>> >>> In fact, the coupling of “enable_apicv” module parameter have made me >>> multiple times to need to disable entire APICv features when there >>> for example was only a bug in posted-interrupts. >>> >>> Even you got confused as virtualize-apic-access is not part of KVM’s >>> APICv terminology but rather it’s enablement depend on >>> flexpriority_enabled (See cpu_need_virtualize_apic_accesses()). i.e. >>> It can be used for faster intercept handling of accesses to guest >>> xAPIC MMIO page. >> >> Right, I got confused with APIC-register virtualization. Virtualize >> APIC accesses is another one I wouldn't bother putting in /proc/cpuinfo, >> since it's usually present together with flexpriority. > > Key word being "usually". My intent in printing out partially redundant > flags was to help users debug/understand why the combined feature isn't > supported. E.g. userspace can already easily (relatively speaking) query > flexpriority support via /sys/module/kvm_intel/parameters/flexpriority. > But if that comes back "N", the user has no way to determine exactly why > flexpriority is disabled. +1 on that. /proc/cpuinfo should just dump supported VMX features that kernel is aware of as exposed from CPU. Without further processing.
> On 12 Dec 2019, at 19:47, Paolo Bonzini <pbonzini@redhat.com> wrote: > > On 12/12/19 18:43, Sean Christopherson wrote: >> Key word being "usually". My intent in printing out partially redundant >> flags was to help users debug/understand why the combined feature isn't >> supported. E.g. userspace can already easily (relatively speaking) query >> flexpriority support via /sys/module/kvm_intel/parameters/flexpriority. >> But if that comes back "N", the user has no way to determine exactly why >> flexpriority is disabled. > > There are tools such as vmxcap. It is part of QEMU, but I wouldn't mind > moving it into the kernel tree. > > Paolo > True. But it relies on /dev/msr0 (exposed from msr kernel module) to be able to read any host arbitrary MSR. It’s much more elegant to just report CPU VMX features in a /proc/cpuinfo pseudo-file and don’t require strong privileges. Why should CPU VMX features be treated differently than standard CPUID deduced features? CPU SVM features are also present in /proc/cpuinfo today (Because they are deduced from CPUID leafs). So it should be similar. -Liran
On Thu, Dec 12, 2019 at 9:53 AM Liran Alon <liran.alon@oracle.com> wrote:
> Why should CPU VMX features be treated differently than standard CPUID deduced features?
Do we have the right Intel people on the recipient list to answer this
question? Presumably, Intel felt that this information should be
available in supervisor mode only.
Sean?
> On 12 Dec 2019, at 19:57, Jim Mattson <jmattson@google.com> wrote: > > On Thu, Dec 12, 2019 at 9:53 AM Liran Alon <liran.alon@oracle.com> wrote: > >> Why should CPU VMX features be treated differently than standard CPUID deduced features? > > Do we have the right Intel people on the recipient list to answer this > question? Presumably, Intel felt that this information should be > available in supervisor mode only. > > Sean? Good question. Probably because it just makes sense that Ring3 will never need to use this info as all VMX instructions are privileged. i.e. Can only be executed in Ring0. De-facto in KVM we have discovered this assumption to be problematic BTW, as KVM created an interface to query VMX MSRs values to properly define the requested vCPU model. :P (See kvm_get_msr_feature())
On Thu, Dec 12, 2019 at 03:13:45PM +0100, Paolo Bonzini wrote: > On 12/12/19 13:26, Borislav Petkov wrote: > > > > vmx flags : virtual_nmis preemption_timer invvpid ept_x_only ept_ad ept_1gb flexpriority tsc_offsetting virtual_tpr mtf virt_apic_accesses ept vpid unrestricted_guest ple shadow_vmcs pml mode_based_ept_exec Tying into the consistency comment below, any objection to always prefixing "ept" for relevant controls instead of following the SDM? Specifically, that would yield ept_mode_based_exec and ept_spp > > > > virtual_nmis -> vnmis > > Even vnmi > > > preemption_timer -> preempt_tmr > > I would prefer the full one here. > > > flexpriority -> flexprio > > Full name? > > > tsc_offsetting -> tsc_ofs > > tsc_offset? I'll go with tsc_offset. > > virtual_tpr -> vtpr > > Do we need this? It's usually included together with flexpriority. > > > virt_apic_accesses -> vapic Using v<feature> across the board makes sense to keep things consistent, i.e. vnmi, vtpr, vapic, etc... Anyone have thoughts on how to shorten "APIC-register virtualization" without colliding with vapic or apicv? I currently have apic_reg_virt, which is a bit wordy. apic_regv isn't awful, but I don't love it. The other control that will be awkard is "Virtual Interrupt Delivery". vint_delivery? > > unrestricted_guest -> unres_guest > > Full? Or just unrestricted I prefer unrestricted_guest, a bare unrestricted just makes me wonder "unrestricted what?". But I can live with "unrestricted" if that's the consensus. > In general I would stick to the same names as kvm_intel module > parameters (sans "enable_" if applicable) and not even bother publishing > the others. Some features are either not used by KVM or available on > all VMX processors. IMO there's value in printing features that are not 1:1 with module params. I also think it makes sense to print features of interest even if KVM doesn't (yet) support the feature, e.g. to allow a user/developer to check if they can use/test a KVM build with support for a new feature without having to build and install the new kernel. > Paolo > > > and so on. Those are just my examples - I betcha the SDM is more > > creative here with abbreviations. But you guys are going to grep for > > them. If it were me, I'd save on typing. :-)
On 12/12/19 19:18, Sean Christopherson wrote: > Using v<feature> across the board makes sense to keep things consistent, > i.e. vnmi, vtpr, vapic, etc... > > Anyone have thoughts on how to shorten "APIC-register virtualization" > without colliding with vapic or apicv? I currently have apic_reg_virt, > which is a bit wordy. apic_regv isn't awful, but I don't love it. Perhaps vapic_access and vapic_register? > > The other control that will be awkard is "Virtual Interrupt Delivery". > vint_delivery? We can just use vid I think. And posted_intr. >>> unrestricted_guest -> unres_guest >> >> Full? Or just unrestricted > > I prefer unrestricted_guest, a bare unrestricted just makes me wonder > "unrestricted what?". But I can live with "unrestricted" if that's the > consensus. I do prefer unrestricted_guest actually. >> In general I would stick to the same names as kvm_intel module >> parameters (sans "enable_" if applicable) and not even bother publishing >> the others. Some features are either not used by KVM or available on >> all VMX processors. > > IMO there's value in printing features that are not 1:1 with module params. > > I also think it makes sense to print features of interest even if KVM > doesn't (yet) support the feature, e.g. to allow a user/developer to check > if they can use/test a KVM build with support for a new feature without > having to build and install the new kernel. > >> Paolo >> >>> and so on. Those are just my examples - I betcha the SDM is more >>> creative here with abbreviations. But you guys are going to grep for >>> them. If it were me, I'd save on typing. :-) >
On Thu, Dec 12, 2019 at 08:04:19PM +0200, Liran Alon wrote: > > > > On 12 Dec 2019, at 19:57, Jim Mattson <jmattson@google.com> wrote: > > > > On Thu, Dec 12, 2019 at 9:53 AM Liran Alon <liran.alon@oracle.com> wrote: > > > >> Why should CPU VMX features be treated differently than standard CPUID deduced features? > > > > Do we have the right Intel people on the recipient list to answer this > > question? Presumably, Intel felt that this information should be > > available in supervisor mode only. > > > > Sean? > > Good question. Probably because it just makes sense that Ring3 will never need to use > this info as all VMX instructions are privileged. i.e. Can only be executed in Ring0. I highly doubt ring0 vs. ring3 was a motivating factor. I suspect the MSR interface is primarily driven by VMX's allowed-0 vs. allowed-1 behavior, which would be awkward to encode in CPUID. Reporting via MSR also likely provided more flexibility for updating/fixing CPU behavior, e.g. patching the RDMSR hook is likely far easier than patching CPUID. Even if the architects intended the information to be supervisor-only, that's just their opinion, no? > De-facto in KVM we have discovered this assumption to be problematic BTW, > as KVM created an interface to query VMX MSRs values to properly define the requested > vCPU model. :P (See kvm_get_msr_feature())
On Thu, Dec 12, 2019 at 07:52:55PM +0200, Liran Alon wrote: > Why should CPU VMX features be treated differently than standard CPUID > deduced features? CPU SVM features are also present in /proc/cpuinfo > today (Because they are deduced from CPUID leafs). So it should be > similar. Well, VMX features are not CPUID bits apparently because <reasons>. And Intel hw folk will give you a bunch, I bet. So Sean needs to do all this dancing to get them to be more usable to other kernel code. SVM feature bits are proper CPUID bits and this is what the cpufeatures glue does - mirro CPUID bits, mainly.
On Thu, Dec 12, 2019 at 07:23:46PM +0100, Paolo Bonzini wrote: > On 12/12/19 19:18, Sean Christopherson wrote: > > Using v<feature> across the board makes sense to keep things consistent, > > i.e. vnmi, vtpr, vapic, etc... > > > > Anyone have thoughts on how to shorten "APIC-register virtualization" > > without colliding with vapic or apicv? I currently have apic_reg_virt, > > which is a bit wordy. apic_regv isn't awful, but I don't love it. > > Perhaps vapic_access and vapic_register? I ended up going with vapic and vapic_reg, figured everyone looking at this knows what "reg" is short for, and I like the progression shown by vapic -> vapic_reg.
diff --git a/arch/x86/boot/mkcpustr.c b/arch/x86/boot/mkcpustr.c index 9caa10e82217..da0ccc5de538 100644 --- a/arch/x86/boot/mkcpustr.c +++ b/arch/x86/boot/mkcpustr.c @@ -15,6 +15,7 @@ #include "../include/asm/required-features.h" #include "../include/asm/disabled-features.h" #include "../include/asm/cpufeatures.h" +#include "../include/asm/vmxfeatures.h" #include "../kernel/cpu/capflags.c" int main(void) diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile index 4173b0de7a2f..dba6a83bc349 100644 --- a/arch/x86/kernel/cpu/Makefile +++ b/arch/x86/kernel/cpu/Makefile @@ -57,11 +57,12 @@ obj-$(CONFIG_ACRN_GUEST) += acrn.o ifdef CONFIG_X86_FEATURE_NAMES quiet_cmd_mkcapflags = MKCAP $@ - cmd_mkcapflags = $(CONFIG_SHELL) $(srctree)/$(src)/mkcapflags.sh $< $@ + cmd_mkcapflags = $(CONFIG_SHELL) $(srctree)/$(src)/mkcapflags.sh $@ $^ cpufeature = $(src)/../../include/asm/cpufeatures.h +vmxfeature = $(src)/../../include/asm/vmxfeatures.h -$(obj)/capflags.c: $(cpufeature) $(src)/mkcapflags.sh FORCE +$(obj)/capflags.c: $(cpufeature) $(vmxfeature) $(src)/mkcapflags.sh FORCE $(call if_changed,mkcapflags) endif targets += capflags.c diff --git a/arch/x86/kernel/cpu/mkcapflags.sh b/arch/x86/kernel/cpu/mkcapflags.sh index aed45b8895d5..1db560ed2ca3 100644 --- a/arch/x86/kernel/cpu/mkcapflags.sh +++ b/arch/x86/kernel/cpu/mkcapflags.sh @@ -6,8 +6,7 @@ set -e -IN=$1 -OUT=$2 +OUT=$1 dump_array() { @@ -15,6 +14,7 @@ dump_array() SIZE=$2 PFX=$3 POSTFIX=$4 + IN=$5 PFX_SZ=$(echo $PFX | wc -c) TABS="$(printf '\t\t\t\t\t')" @@ -57,11 +57,18 @@ trap 'rm "$OUT"' EXIT echo "#endif" echo "" - dump_array "x86_cap_flags" "NCAPINTS*32" "X86_FEATURE_" "" + dump_array "x86_cap_flags" "NCAPINTS*32" "X86_FEATURE_" "" $2 echo "" - dump_array "x86_bug_flags" "NBUGINTS*32" "X86_BUG_" "NCAPINTS*32" + dump_array "x86_bug_flags" "NBUGINTS*32" "X86_BUG_" "NCAPINTS*32" $2 + echo "" + echo "#ifdef CONFIG_X86_VMX_FEATURE_NAMES" + echo "#ifndef _ASM_X86_VMXFEATURES_H" + echo "#include <asm/vmxfeatures.h>" + echo "#endif" + dump_array "x86_vmx_flags" "NVMXINTS*32" "VMX_FEATURE_" "" $3 + echo "#endif /* CONFIG_X86_VMX_FEATURE_NAMES */" ) > $OUT trap - EXIT diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c index cb2e49810d68..4eec8889b0ff 100644 --- a/arch/x86/kernel/cpu/proc.c +++ b/arch/x86/kernel/cpu/proc.c @@ -7,6 +7,10 @@ #include "cpu.h" +#ifdef CONFIG_X86_VMX_FEATURE_NAMES +extern const char * const x86_vmx_flags[NVMXINTS*32]; +#endif + /* * Get CPU information for use by the procfs. */ @@ -102,6 +106,17 @@ static int show_cpuinfo(struct seq_file *m, void *v) if (cpu_has(c, i) && x86_cap_flags[i] != NULL) seq_printf(m, " %s", x86_cap_flags[i]); +#ifdef CONFIG_X86_VMX_FEATURE_NAMES + if (cpu_has(c, X86_FEATURE_VMX) && c->vmx_capability[0]) { + seq_puts(m, "\nvmx flags\t:"); + for (i = 0; i < 32*NVMXINTS; i++) { + if (test_bit(i, (unsigned long *)c->vmx_capability) && + x86_vmx_flags[i] != NULL) + seq_printf(m, " %s", x86_vmx_flags[i]); + } + } +#endif + seq_puts(m, "\nbugs\t\t:"); for (i = 0; i < 32*NBUGINTS; i++) { unsigned int bug_bit = 32*NCAPINTS + i;
Add support for generating VMX feature names in capflags.c and use the resulting x86_vmx_flags to print the VMX flags in /proc/cpuinfo. Don't print VMX flags if no bits are set in word 0, which holds Pin Controls. Pin Control's INTR and NMI exiting are fundamental pillars of VMX, if they are not supported then the CPU is broken, it does not actually support VMX, or the kernel wasn't built with support for the target CPU. Print the features in a dedicated "vmx flags" line to avoid polluting the common "flags" and to avoid having to prefix all flags with "vmx_", which results in horrendously long names. Keep synthetic VMX flags in cpufeatures to preserve /proc/cpuinfo's ABI for those flags. This means that "flags" and "vmx flags" will have duplicate entries for tpr_shadow (virtual_tpr), vnmi (virtual_nmis), ept, flexpriority, vpid and ept_ad, but caps the pollution of "flags" at those six VMX features. The vendor specific code that populates the synthetic flags will be consolidated in a future patch to futher minimize the lasting damage. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> --- arch/x86/boot/mkcpustr.c | 1 + arch/x86/kernel/cpu/Makefile | 5 +++-- arch/x86/kernel/cpu/mkcapflags.sh | 15 +++++++++++---- arch/x86/kernel/cpu/proc.c | 15 +++++++++++++++ 4 files changed, 30 insertions(+), 6 deletions(-)