Message ID | 20220119182818.3641304-1-daviddunn@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | [1/3] Provide VM capability to disable PMU virtualization for individual VMs |

On Wed, Jan 19, 2022, David Dunn wrote:
> When PMU virtualization is enabled via the module parameter, usermode
> can disable PMU virtualization on individual VMs using this new
> capability.
>
> This provides a uniform way to disable PMU virtualization on x86. Since
> AMD doesn't have a CPUID bit for PMU support, disabling PMU
> virtualization requires some other state to indicate whether the PMU
> related MSRs are ignored.
>
> Since KVM_GET_SUPPORTED_CPUID reports the maximal CPUID information
> based on module parameters, usermode will need to adjust CPUID when
> disabling PMU virtualization on individual VMs. On Intel CPUs, the
> change to PMU enablement will not alter existing until SET_CPUID2 is
> invoked.
>
> Signed-off-by: David Dunn <daviddunn@google.com>
> ---

I'm not necessarily opposed to this capability, but can't userspace get the same
result by using MSR filtering to inject #GP on the PMU MSRs?

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 55518b7d3b96..9b640c5bb4f6 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4326,6 +4326,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  		if (r < sizeof(struct kvm_xsave))
>  			r = sizeof(struct kvm_xsave);
>  		break;
> +	case KVM_CAP_ENABLE_PMU:
> +		r = enable_pmu;
> +		break;
>  	}
>  	default:
>  		break;
> @@ -5937,6 +5940,13 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>  		kvm->arch.exit_on_emulation_error = cap->args[0];
>  		r = 0;
>  		break;
> +	case KVM_CAP_ENABLE_PMU:
> +		r = -EINVAL;
> +		if (!enable_pmu || cap->args[0] & ~1)

Probably worth adding a #define in uapi/.../kvm.h for bit 0.

> +			break;
> +		kvm->arch.enable_pmu = cap->args[0];
> +		r = 0;
> +		break;
>  	default:
>  		r = -EINVAL;
>  		break;
> @@ -11562,6 +11572,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  	raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
>
>  	kvm->arch.guest_can_read_msr_platform_info = true;
> +	kvm->arch.enable_pmu = true;

Rather than default to "true", just capture the global "enable_pmu" and then all
the sites that check "enable_pmu" in VM context can check _only_ kvm->arch.enable_pmu.
enable_pmu is readonly, so there's no danger of it being toggled after the VM is
created.

>  #if IS_ENABLED(CONFIG_HYPERV)
>  	spin_lock_init(&kvm->arch.hv_root_tdp_lock);
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 9563d294f181..37cbcdffe773 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1133,6 +1133,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
>  #define KVM_CAP_VM_GPA_BITS 207
>  #define KVM_CAP_XSAVE2 208
> +#define KVM_CAP_ENABLE_PMU 209
>
>  #ifdef KVM_CAP_IRQ_ROUTING
>
> diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
> index f066637ee206..e71712c71ab1 100644
> --- a/tools/include/uapi/linux/kvm.h
> +++ b/tools/include/uapi/linux/kvm.h
> @@ -1132,6 +1132,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_ARM_MTE 205
>  #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
>  #define KVM_CAP_XSAVE2 207
> +#define KVM_CAP_ENABLE_PMU 209
>
>  #ifdef KVM_CAP_IRQ_ROUTING
>
> --
> 2.34.1.703.g22d0c6ccf7-goog
>
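
The two review suggestions in the message above (a named constant for bit 0, and capturing the global "enable_pmu" at VM creation so that VM-context code checks only the per-VM flag) can be illustrated with a small user-space model. This is only a sketch and not kernel code: `KVM_PMU_CAP_ENABLE`, `struct vm_arch`, `vm_init`, `vm_enable_pmu_cap` and `vm_pmu_enabled` are hypothetical names chosen for illustration, and the file-scope `enable_pmu` merely stands in for the read-only module parameter.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical uapi name for bit 0 of cap->args[0]; not part of the patch. */
#define KVM_PMU_CAP_ENABLE (1ULL << 0)

/* Stand-in for the module parameter (read-only after load in real KVM). */
static bool enable_pmu = true;

/* Stand-in for the one field of struct kvm_arch that matters here. */
struct vm_arch {
	bool enable_pmu;
};

/* Capture the global once at VM creation instead of defaulting to true. */
static void vm_init(struct vm_arch *arch)
{
	arch->enable_pmu = enable_pmu;
}

/* Models the KVM_CAP_ENABLE_PMU case of kvm_vm_ioctl_enable_cap(). */
static int vm_enable_pmu_cap(struct vm_arch *arch, uint64_t arg)
{
	/* Reject when the module disabled the PMU or unknown bits are set. */
	if (!enable_pmu || (arg & ~KVM_PMU_CAP_ENABLE))
		return -EINVAL;

	arch->enable_pmu = (arg & KVM_PMU_CAP_ENABLE) != 0;
	return 0;
}

/* VM-context checks consult only the per-VM flag. */
static bool vm_pmu_enabled(const struct vm_arch *arch)
{
	return arch->enable_pmu;
}
```

Because the per-VM flag starts out as a copy of the global, a check such as `!enable_pmu || !vcpu->kvm->arch.enable_pmu` would collapse to a test of the per-VM field alone under this scheme.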

Thanks Sean.

On Wed, Jan 19, 2022 at 5:15 PM Sean Christopherson <seanjc@google.com> wrote:

> I'm not necessarily opposed to this capability, but can't userspace get the same
> result by using MSR filtering to inject #GP on the PMU MSRs?

Yes. It is possible for each userspace to inject #GP on Intel and
ignore on AMD. But I think it is less error prone to handle it once
in KVM in the same way we handle the module parameter. No extra
complexity in KVM but it reduces the complexity in clients.

> Probably worth adding a #define in uapi/.../kvm.h for bit 0.

> Rather than default to "true", just capture the global "enable_pmu" and then all
> the sites that check "enable_pmu" in VM context can check _only_ kvm->arch.enable_pmu.
> enable_pmu is readonly, so there's no danger of it being toggled after the VM is
> created.

Thanks for the feedback. I'll incorporate both of these in v2.

Dave Dunn

Hi David,

Thanks for coming to address this.
Please modify the patch(es) subject to follow the convention.

On 20/1/2022 2:28 am, David Dunn wrote:
> When PMU virtualization is enabled via the module parameter, usermode
> can disable PMU virtualization on individual VMs using this new
> capability.

Will user space fail or be notified when enable_pmu says no?

>
> This provides a uniform way to disable PMU virtualization on x86. Since
> AMD doesn't have a CPUID bit for PMU support, disabling PMU

Not entirely absent, such as PERFCTR_CORE.

> virtualization requires some other state to indicate whether the PMU
> related MSRs are ignored.

Not just ignored, but made to disappear altogether.

>
> Since KVM_GET_SUPPORTED_CPUID reports the maximal CPUID information
> based on module parameters, usermode will need to adjust CPUID when
> disabling PMU virtualization on individual VMs. On Intel CPUs, the
> change to PMU enablement will not alter existing until SET_CPUID2 is
> invoked.

Please clarify. Do we have a requirement for the order in which the
SET_CPUID2 and ioctl_enable_cap interfaces are called?

>
> Signed-off-by: David Dunn <daviddunn@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/kvm/svm/pmu.c          |  2 +-
>  arch/x86/kvm/vmx/pmu_intel.c    |  2 +-
>  arch/x86/kvm/x86.c              | 11 +++++++++++
>  include/uapi/linux/kvm.h        |  1 +
>  tools/include/uapi/linux/kvm.h  |  1 +
>  6 files changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 682ad02a4e58..5cdcd4a7671b 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1232,6 +1232,7 @@ struct kvm_arch {
>  	hpa_t hv_root_tdp;
>  	spinlock_t hv_root_tdp_lock;
>  #endif
> +	bool enable_pmu;

The name makes it difficult to distinguish the scope of access to the variable.

Try storing it via "pmu->version == 0".

>  };
>
>  struct kvm_vm_stat {
> diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
> index 5aa45f13b16d..605bcfb55625 100644
> --- a/arch/x86/kvm/svm/pmu.c
> +++ b/arch/x86/kvm/svm/pmu.c
> @@ -101,7 +101,7 @@ static inline struct kvm_pmc *get_gp_pmc_amd(struct kvm_pmu *pmu, u32 msr,
>  {
>  	struct kvm_vcpu *vcpu = pmu_to_vcpu(pmu);
>
> -	if (!enable_pmu)
> +	if (!enable_pmu || !vcpu->kvm->arch.enable_pmu)
>  		return NULL;
>
>  	switch (msr) {
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 466d18fc0c5d..4c3885765027 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -487,7 +487,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
>  	pmu->reserved_bits = 0xffffffff00200000ull;
>
>  	entry = kvm_find_cpuid_entry(vcpu, 0xa, 0);
> -	if (!entry || !enable_pmu)
> +	if (!entry || !vcpu->kvm->arch.enable_pmu || !enable_pmu)
>  		return;
>  	eax.full = entry->eax;
>  	edx.full = entry->edx;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 55518b7d3b96..9b640c5bb4f6 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4326,6 +4326,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  		if (r < sizeof(struct kvm_xsave))
>  			r = sizeof(struct kvm_xsave);
>  		break;
> +	case KVM_CAP_ENABLE_PMU:
> +		r = enable_pmu;
> +		break;
>  	}
>  	default:
>  		break;
> @@ -5937,6 +5940,13 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>  		kvm->arch.exit_on_emulation_error = cap->args[0];
>  		r = 0;
>  		break;
> +	case KVM_CAP_ENABLE_PMU:
> +		r = -EINVAL;
> +		if (!enable_pmu || cap->args[0] & ~1)
> +			break;
> +		kvm->arch.enable_pmu = cap->args[0];
> +		r = 0;
> +		break;
>  	default:
>  		r = -EINVAL;
>  		break;
> @@ -11562,6 +11572,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  	raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
>
>  	kvm->arch.guest_can_read_msr_platform_info = true;
> +	kvm->arch.enable_pmu = true;
>
>  #if IS_ENABLED(CONFIG_HYPERV)
>  	spin_lock_init(&kvm->arch.hv_root_tdp_lock);
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 9563d294f181..37cbcdffe773 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1133,6 +1133,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
>  #define KVM_CAP_VM_GPA_BITS 207
>  #define KVM_CAP_XSAVE2 208
> +#define KVM_CAP_ENABLE_PMU 209

Rename it to KVM_CAP_PMU_CAPABILITY and use bit 0 for *DISABLE_PMU*.

>
>  #ifdef KVM_CAP_IRQ_ROUTING
>
> diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
> index f066637ee206..e71712c71ab1 100644
> --- a/tools/include/uapi/linux/kvm.h
> +++ b/tools/include/uapi/linux/kvm.h
> @@ -1132,6 +1132,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_ARM_MTE 205
>  #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
>  #define KVM_CAP_XSAVE2 207
> +#define KVM_CAP_ENABLE_PMU 209
>
>  #ifdef KVM_CAP_IRQ_ROUTING
>
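
The renaming suggested in the review above (KVM_CAP_PMU_CAPABILITY with bit 0 meaning "disable") inverts the polarity of the argument, so a zero argument leaves the PMU in its default state. A minimal user-space model of that variant might look as follows. Everything here is an assumption made for illustration: the names `KVM_PMU_CAP_DISABLE`, `KVM_PMU_CAP_VALID_MASK`, `struct vm_pmu_state` and both helper functions are hypothetical, and reporting the mask of supported bits from the check-extension path is one possible way to tell user space up front that the module parameter says no, not something the thread settled on.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag name: bit 0 of the capability argument disables the PMU. */
#define KVM_PMU_CAP_DISABLE    (1ULL << 0)
/* All argument bits this sketch understands. */
#define KVM_PMU_CAP_VALID_MASK KVM_PMU_CAP_DISABLE

/* Hypothetical per-VM state; false means "follow the module default". */
struct vm_pmu_state {
	bool pmu_disabled;
};

/* What a check-extension query could report: supported bits, or 0. */
static uint64_t vm_check_pmu_capability(bool module_enable_pmu)
{
	return module_enable_pmu ? KVM_PMU_CAP_VALID_MASK : 0;
}

/* What enabling the capability could do with its argument. */
static int vm_set_pmu_capability(struct vm_pmu_state *vm,
				 bool module_enable_pmu, uint64_t arg)
{
	/* Fail loudly if the module has no PMU or unknown bits are set. */
	if (!module_enable_pmu || (arg & ~KVM_PMU_CAP_VALID_MASK))
		return -EINVAL;

	vm->pmu_disabled = (arg & KVM_PMU_CAP_DISABLE) != 0;
	return 0;
}
```

With a mask-style argument, further PMU-related bits could later be added under the same capability number without changing the meaning of bit 0.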

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 682ad02a4e58..5cdcd4a7671b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1232,6 +1232,7 @@ struct kvm_arch {
 	hpa_t hv_root_tdp;
 	spinlock_t hv_root_tdp_lock;
 #endif
+	bool enable_pmu;
 };

 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 5aa45f13b16d..605bcfb55625 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -101,7 +101,7 @@ static inline struct kvm_pmc *get_gp_pmc_amd(struct kvm_pmu *pmu, u32 msr,
 {
 	struct kvm_vcpu *vcpu = pmu_to_vcpu(pmu);

-	if (!enable_pmu)
+	if (!enable_pmu || !vcpu->kvm->arch.enable_pmu)
 		return NULL;

 	switch (msr) {
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 466d18fc0c5d..4c3885765027 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -487,7 +487,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 	pmu->reserved_bits = 0xffffffff00200000ull;

 	entry = kvm_find_cpuid_entry(vcpu, 0xa, 0);
-	if (!entry || !enable_pmu)
+	if (!entry || !vcpu->kvm->arch.enable_pmu || !enable_pmu)
 		return;
 	eax.full = entry->eax;
 	edx.full = entry->edx;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 55518b7d3b96..9b640c5bb4f6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4326,6 +4326,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		if (r < sizeof(struct kvm_xsave))
 			r = sizeof(struct kvm_xsave);
 		break;
+	case KVM_CAP_ENABLE_PMU:
+		r = enable_pmu;
+		break;
 	}
 	default:
 		break;
@@ -5937,6 +5940,13 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		kvm->arch.exit_on_emulation_error = cap->args[0];
 		r = 0;
 		break;
+	case KVM_CAP_ENABLE_PMU:
+		r = -EINVAL;
+		if (!enable_pmu || cap->args[0] & ~1)
+			break;
+		kvm->arch.enable_pmu = cap->args[0];
+		r = 0;
+		break;
 	default:
 		r = -EINVAL;
 		break;
@@ -11562,6 +11572,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);

 	kvm->arch.guest_can_read_msr_platform_info = true;
+	kvm->arch.enable_pmu = true;

 #if IS_ENABLED(CONFIG_HYPERV)
 	spin_lock_init(&kvm->arch.hv_root_tdp_lock);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 9563d294f181..37cbcdffe773 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1133,6 +1133,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
 #define KVM_CAP_VM_GPA_BITS 207
 #define KVM_CAP_XSAVE2 208
+#define KVM_CAP_ENABLE_PMU 209

 #ifdef KVM_CAP_IRQ_ROUTING

diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
index f066637ee206..e71712c71ab1 100644
--- a/tools/include/uapi/linux/kvm.h
+++ b/tools/include/uapi/linux/kvm.h
@@ -1132,6 +1132,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_ARM_MTE 205
 #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
 #define KVM_CAP_XSAVE2 207
+#define KVM_CAP_ENABLE_PMU 209

 #ifdef KVM_CAP_IRQ_ROUTING

When PMU virtualization is enabled via the module parameter, usermode
can disable PMU virtualization on individual VMs using this new
capability.

This provides a uniform way to disable PMU virtualization on x86. Since
AMD doesn't have a CPUID bit for PMU support, disabling PMU
virtualization requires some other state to indicate whether the PMU
related MSRs are ignored.

Since KVM_GET_SUPPORTED_CPUID reports the maximal CPUID information
based on module parameters, usermode will need to adjust CPUID when
disabling PMU virtualization on individual VMs. On Intel CPUs, the
change to PMU enablement will not alter existing until SET_CPUID2 is
invoked.

Signed-off-by: David Dunn <daviddunn@google.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/svm/pmu.c          |  2 +-
 arch/x86/kvm/vmx/pmu_intel.c    |  2 +-
 arch/x86/kvm/x86.c              | 11 +++++++++++
 include/uapi/linux/kvm.h        |  1 +
 tools/include/uapi/linux/kvm.h  |  1 +
 6 files changed, 16 insertions(+), 2 deletions(-)
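
The commit message says usermode must adjust CPUID itself when it disables PMU virtualization for a VM. On Intel, the patch reads the PMU description from CPUID leaf 0xa (see intel_pmu_refresh()), so one plausible adjustment is to zero that leaf in the table obtained from KVM_GET_SUPPORTED_CPUID before passing it to SET_CPUID2. The sketch below models only that table edit. `struct cpuid_entry` is a simplified stand-in for the real uapi entry type, and `hide_intel_pmu` is a hypothetical helper name; no ioctl is issued here.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a CPUID table entry (assumption for this sketch). */
struct cpuid_entry {
	uint32_t function;
	uint32_t index;
	uint32_t eax, ebx, ecx, edx;
};

/* Architectural performance monitoring leaf looked up by the patch. */
#define CPUID_LEAF_ARCH_PERFMON 0xa

/*
 * Zero every entry for leaf 0xa so the guest sees PMU version 0
 * (the version ID lives in EAX[7:0]). Returns the number of entries
 * that were cleared.
 */
static size_t hide_intel_pmu(struct cpuid_entry *entries, size_t n)
{
	size_t cleared = 0;

	for (size_t i = 0; i < n; i++) {
		if (entries[i].function != CPUID_LEAF_ARCH_PERFMON)
			continue;
		entries[i].eax = 0;
		entries[i].ebx = 0;
		entries[i].ecx = 0;
		entries[i].edx = 0;
		cleared++;
	}
	return cleared;
}
```

A VMM following this approach would run the helper over its CPUID table for each vCPU before SET_CPUID2. On AMD there is no equivalent leaf to clear, which is the gap the per-VM capability is meant to cover.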