Message ID | 20240313003739.3365845-1-mizhang@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: x86/pmu: Return correct value of IA32_PERF_CAPABILITIES for userspace after vCPU has run | expand |
On Wednesday, March 13, 2024 8:38 AM, Mingwei Zhang wrote: > Return correct value of IA32_PERF_CAPABILITIES when userspace tries to read > it after vCPU has already run. Previously, KVM will always return the guest > cached value on get_msr() even if guest CPUID lacks X86_FEATURE_PDCM. The > guest cached value on default is kvm_caps.supported_perf_cap. However, > when userspace sets the value during live migration, the call fails because of > the check on X86_FEATURE_PDCM. Could you point where in the set_msr path that could fail? (I don’t find there is a check of X86_FEATURE_PDCM in vmx_set_msr and kvm_set_msr_common) > > Initially, it sounds like a pure userspace issue. It is not. After vCPU has run, > KVM should faithfully return correct value to satisify legitimate requests from > userspace such as VM suspend/resume and live migrartion. In this case, KVM > should return 0 when guest cpuid lacks X86_FEATURE_PDCM. Some typos above (satisfy, migration, CPUID) Seems the description here isn’t aligned to your code below? The code below prevents userspace from reading the MSR value (not return 0 as the read value) in that case. >So fix the > problem by adding an additional check in vmx_set_msr(). > > Note that IA32_PERF_CAPABILITIES is emulated on AMD side, which is fine > because it set_msr() is guarded by kvm_caps.supported_perf_cap which is > always 0. > > Cc: Aaron Lewis <aaronlewis@google.com> > Cc: Jim Mattson <jmattson@google.com> > Signed-off-by: Mingwei Zhang <mizhang@google.com> > --- > arch/x86/kvm/vmx/vmx.c | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index > 40e3780d73ae..6d8667b56091 100644 > --- a/arch/x86/kvm/vmx/vmx.c > +++ b/arch/x86/kvm/vmx/vmx.c > @@ -2049,6 +2049,17 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, > struct msr_data *msr_info) > msr_info->data = to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash > [msr_info->index - MSR_IA32_SGXLEPUBKEYHASH0]; > break; > + case MSR_IA32_PERF_CAPABILITIES: > + /* > + * Host VMM should not get potentially invalid MSR value if > vCPU > + * has already run but guest cpuid lacks the support for the > + * MSR. > + */ > + if (msr_info->host_initiated && > + kvm_vcpu_has_run(vcpu) && > + !guest_cpuid_has(vcpu, X86_FEATURE_PDCM)) > + return 1; > + break; > case KVM_FIRST_EMULATED_VMX_MSR ... > KVM_LAST_EMULATED_VMX_MSR: > if (!guest_can_use(vcpu, X86_FEATURE_VMX)) > return 1; > > base-commit: fd89499a5151d197ba30f7b801f6d8f4646cf446 > -- > 2.44.0.291.gc1ea87d7ee-goog >
On Wed, Mar 13, 2024, Wei W Wang wrote: > On Wednesday, March 13, 2024 8:38 AM, Mingwei Zhang wrote: > > Return correct value of IA32_PERF_CAPABILITIES when userspace tries to read > > it after vCPU has already run. Previously, KVM will always return the guest > > cached value on get_msr() even if guest CPUID lacks X86_FEATURE_PDCM. The > > guest cached value on default is kvm_caps.supported_perf_cap. However, > > when userspace sets the value during live migration, the call fails because of > > the check on X86_FEATURE_PDCM. > > Could you point where in the set_msr path that could fail? > (I don’t find there is a check of X86_FEATURE_PDCM in vmx_set_msr and > kvm_set_msr_common) The changelog is misleading, it's not the PDCM feature bit, it's the PMU version check in vmx_set_msr(): case MSR_IA32_PERF_CAPABILITIES: if (data && !vcpu_to_pmu(vcpu)->version) return 1; > > Initially, it sounds like a pure userspace issue. It is not. After vCPU has run, > > KVM should faithfully return correct value to satisify legitimate requests from > > userspace such as VM suspend/resume and live migrartion. In this case, KVM > > should return 0 when guest cpuid lacks X86_FEATURE_PDCM. > Some typos above (satisfy, migration, CPUID) > > Seems the description here isn’t aligned to your code below? > The code below prevents userspace from reading the MSR value (not return 0 as the > read value) in that case. Ya. > >So fix the > > problem by adding an additional check in vmx_set_msr(). > > > > Note that IA32_PERF_CAPABILITIES is emulated on AMD side, which is fine > > because it set_msr() is guarded by kvm_caps.supported_perf_cap which is > > always 0. > > > > Cc: Aaron Lewis <aaronlewis@google.com> > > Cc: Jim Mattson <jmattson@google.com> > > Signed-off-by: Mingwei Zhang <mizhang@google.com> > > --- > > arch/x86/kvm/vmx/vmx.c | 11 +++++++++++ > > 1 file changed, 11 insertions(+) > > > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index > > 40e3780d73ae..6d8667b56091 100644 > > --- a/arch/x86/kvm/vmx/vmx.c > > +++ b/arch/x86/kvm/vmx/vmx.c > > @@ -2049,6 +2049,17 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, > > struct msr_data *msr_info) > > msr_info->data = to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash > > [msr_info->index - MSR_IA32_SGXLEPUBKEYHASH0]; > > break; > > + case MSR_IA32_PERF_CAPABILITIES: > > + /* > > + * Host VMM should not get potentially invalid MSR value if > > vCPU > > + * has already run but guest cpuid lacks the support for the > > + * MSR. > > + */ > > + if (msr_info->host_initiated && > > + kvm_vcpu_has_run(vcpu) && > > + !guest_cpuid_has(vcpu, X86_FEATURE_PDCM)) > > + return 1; As Wei pointed out, this doesn't match the changelog. And I don't think this is what we want, at least not in isolation. Making KVM more restrictive on userspace reads doesn't solve the live migration save/restore issue, and the kvm_vcpu_has_run() adds yet another flavor of MSR handling. We discussed this whole MSRs mess at PUCK this morning. I forgot to hit RECORD, but Paolo took notes and will post them soon. Going from memory, the plan is to: 1. Commit to, and document, that userspace must do KVM_SET_CPUID{,2} prior to KVM_SET_MSRS. 2. Go with roughly what I proposed in the CET thread (for unsupported MSRS, read 0 and drop writes of '0')[*]. 3. Add a quire for PERF_CAPABILITIES, ARCH_CAPABILITIES, and PLATFORM_INFO (if quirk==enabled, keep KVM's current behavior; if quirk==disabled, zero- initialize the MSRs). With those pieces in place, KVM can simply check X86_FEATURE_PDCM for both reads and writes to PERF_CAPABILITIES, and the common userspace MSR handling will convert "unsupported" to "success" as appropriate. [*] https://lore.kernel.org/all/ZfDdS8rtVtyEr0UR@google.com
On Wed, Mar 13, 2024 at 3:42 PM Sean Christopherson <seanjc@google.com> wrote: > We discussed this whole MSRs mess at PUCK this morning. I forgot to hit RECORD, > but Paolo took notes and will post them soon. > > Going from memory, the plan is to: > > 1. Commit to, and document, that userspace must do KVM_SET_CPUID{,2} prior to > KVM_SET_MSRS. Correct. > 2. Go with roughly what I proposed in the CET thread (for unsupported MSRS, > read 0 and drop writes of '0')[*]. More precisely, read a sensible default value corresponding to "everything disabled", which generally speaking should be 0. And generally speaking, commit to: - allowing host_initiated reads independent of CPUID - allowing host_initiated writes of the same value that was read - blocking host_initiated writes of nonzero (or nondefault) values if the corresponding guest CPUID bit is clear Right now some MSRs do not allow host initiated writes, for example MSR_KVM_* (check for calls to guest_pv_has), and the VMX MSRs. Generally speaking we want to fix them, unless it's an unholy pain (for example the VMX capabilities MSRs are good candidates for pain, because they have some "must be 1" bits in bits 63:32). All this should be covered by selftests. > 3. Add a quire for PERF_CAPABILITIES, ARCH_CAPABILITIES, and PLATFORM_INFO > (if quirk==enabled, keep KVM's current behavior; if quirk==disabled, zero- > initialize the MSRs). More precisely, even if quirk==enabled we will move the setting of a non-zero default value for the MSR from vCPU creation to KVM_SET_CPUID2, and only set a non-zero default value if the CPUID bit is set. Another small thing in my notes was to look at the duplication between emulated_msrs and msr_based_features_all_except_vmx. Right now MSR_AMD64_DE_CFG is the only one that is not in both and, probably not a coincidence, it's also the only one implemented only for one vendor. There's probably some opportunity for both cleanups and fixes. It looks like svm_has_emulated_msr(MSR_AMD64_DE_CFG) should return true for example. Paolo > With those pieces in place, KVM can simply check X86_FEATURE_PDCM for both reads > and writes to PERF_CAPABILITIES, and the common userspace MSR handling will > convert "unsupported" to "success" as appropriate. > > [*] https://lore.kernel.org/all/ZfDdS8rtVtyEr0UR@google.com
On Wed, Mar 13, 2024, Wang, Wei W wrote: > On Wednesday, March 13, 2024 8:38 AM, Mingwei Zhang wrote: > > Return correct value of IA32_PERF_CAPABILITIES when userspace tries to read > > it after vCPU has already run. Previously, KVM will always return the guest > > cached value on get_msr() even if guest CPUID lacks X86_FEATURE_PDCM. The > > guest cached value on default is kvm_caps.supported_perf_cap. However, > > when userspace sets the value during live migration, the call fails because of > > the check on X86_FEATURE_PDCM. > > Could you point where in the set_msr path that could fail? > (I don’t find there is a check of X86_FEATURE_PDCM in vmx_set_msr and > kvm_set_msr_common) > My memory cheated me... The check was on pmu->version, which not X86_FEATURE_PDCM. Note pmu->version is basically backed by another bits guest CPUID. > > > > Initially, it sounds like a pure userspace issue. It is not. After vCPU has run, > > KVM should faithfully return correct value to satisify legitimate requests from > > userspace such as VM suspend/resume and live migrartion. In this case, KVM > > should return 0 when guest cpuid lacks X86_FEATURE_PDCM. > Some typos above (satisfy, migration, CPUID) > > Seems the description here isn’t aligned to your code below? > The code below prevents userspace from reading the MSR value (not return 0 as the > read value) in that case. > > >So fix the > > problem by adding an additional check in vmx_set_msr(). > > > > Note that IA32_PERF_CAPABILITIES is emulated on AMD side, which is fine > > because it set_msr() is guarded by kvm_caps.supported_perf_cap which is > > always 0. > > > > Cc: Aaron Lewis <aaronlewis@google.com> > > Cc: Jim Mattson <jmattson@google.com> > > Signed-off-by: Mingwei Zhang <mizhang@google.com> > > --- > > arch/x86/kvm/vmx/vmx.c | 11 +++++++++++ > > 1 file changed, 11 insertions(+) > > > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index > > 40e3780d73ae..6d8667b56091 100644 > > --- a/arch/x86/kvm/vmx/vmx.c > > +++ b/arch/x86/kvm/vmx/vmx.c > > @@ -2049,6 +2049,17 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, > > struct msr_data *msr_info) > > msr_info->data = to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash > > [msr_info->index - MSR_IA32_SGXLEPUBKEYHASH0]; > > break; > > + case MSR_IA32_PERF_CAPABILITIES: > > + /* > > + * Host VMM should not get potentially invalid MSR value if > > vCPU > > + * has already run but guest cpuid lacks the support for the > > + * MSR. > > + */ > > + if (msr_info->host_initiated && > > + kvm_vcpu_has_run(vcpu) && > > + !guest_cpuid_has(vcpu, X86_FEATURE_PDCM)) > > + return 1; > > + break; > > case KVM_FIRST_EMULATED_VMX_MSR ... > > KVM_LAST_EMULATED_VMX_MSR: > > if (!guest_can_use(vcpu, X86_FEATURE_VMX)) > > return 1; > > > > base-commit: fd89499a5151d197ba30f7b801f6d8f4646cf446 > > -- > > 2.44.0.291.gc1ea87d7ee-goog > > >
On Wed, Mar 13, 2024, Paolo Bonzini wrote: > On Wed, Mar 13, 2024 at 3:42 PM Sean Christopherson <seanjc@google.com> wrote: > > We discussed this whole MSRs mess at PUCK this morning. I forgot to hit RECORD, > > but Paolo took notes and will post them soon. > > > > Going from memory, the plan is to: > > > > 1. Commit to, and document, that userspace must do KVM_SET_CPUID{,2} prior to > > KVM_SET_MSRS. > > Correct. This is clear to me now. Glad to have the direction settled down. > > > 2. Go with roughly what I proposed in the CET thread (for unsupported MSRS, > > read 0 and drop writes of '0')[*]. > > More precisely, read a sensible default value corresponding to > "everything disabled", which generally speaking should be 0. And > generally speaking, commit to: > - allowing host_initiated reads independent of CPUID > - allowing host_initiated writes of the same value that was read > - blocking host_initiated writes of nonzero (or nondefault) values if > the corresponding guest CPUID bit is clear > > Right now some MSRs do not allow host initiated writes, for example > MSR_KVM_* (check for calls to guest_pv_has), and the VMX MSRs. > > Generally speaking we want to fix them, unless it's an unholy pain > (for example the VMX capabilities MSRs are good candidates for pain, > because they have some "must be 1" bits in bits 63:32). > > All this should be covered by selftests. > > > 3. Add a quire for PERF_CAPABILITIES, ARCH_CAPABILITIES, and PLATFORM_INFO > > (if quirk==enabled, keep KVM's current behavior; if quirk==disabled, zero- > > initialize the MSRs). > > More precisely, even if quirk==enabled we will move the setting of a > non-zero default value for the MSR from vCPU creation to > KVM_SET_CPUID2, and only set a non-zero default value if the CPUID bit > is set. > > Another small thing in my notes was to look at the duplication between > emulated_msrs and msr_based_features_all_except_vmx. Right now > MSR_AMD64_DE_CFG is the only one that is not in both and, probably not > a coincidence, it's also the only one implemented only for one vendor. > There's probably some opportunity for both cleanups and fixes. It > looks like svm_has_emulated_msr(MSR_AMD64_DE_CFG) should return true > for example. > > Paolo > Ack. Thanks. > > With those pieces in place, KVM can simply check X86_FEATURE_PDCM for both reads > > and writes to PERF_CAPABILITIES, and the common userspace MSR handling will > > convert "unsupported" to "success" as appropriate. > > > > [*] https://lore.kernel.org/all/ZfDdS8rtVtyEr0UR@google.com >
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 40e3780d73ae..6d8667b56091 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2049,6 +2049,17 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) msr_info->data = to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash [msr_info->index - MSR_IA32_SGXLEPUBKEYHASH0]; break; + case MSR_IA32_PERF_CAPABILITIES: + /* + * Host VMM should not get potentially invalid MSR value if vCPU + * has already run but guest cpuid lacks the support for the + * MSR. + */ + if (msr_info->host_initiated && + kvm_vcpu_has_run(vcpu) && + !guest_cpuid_has(vcpu, X86_FEATURE_PDCM)) + return 1; + break; case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR: if (!guest_can_use(vcpu, X86_FEATURE_VMX)) return 1;
Return correct value of IA32_PERF_CAPABILITIES when userspace tries to read it after vCPU has already run. Previously, KVM will always return the guest cached value on get_msr() even if guest CPUID lacks X86_FEATURE_PDCM. The guest cached value on default is kvm_caps.supported_perf_cap. However, when userspace sets the value during live migration, the call fails because of the check on X86_FEATURE_PDCM. Initially, it sounds like a pure userspace issue. It is not. After vCPU has run, KVM should faithfully return correct value to satisify legitimate requests from userspace such as VM suspend/resume and live migrartion. In this case, KVM should return 0 when guest cpuid lacks X86_FEATURE_PDCM. So fix the problem by adding an additional check in vmx_set_msr(). Note that IA32_PERF_CAPABILITIES is emulated on AMD side, which is fine because it set_msr() is guarded by kvm_caps.supported_perf_cap which is always 0. Cc: Aaron Lewis <aaronlewis@google.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Mingwei Zhang <mizhang@google.com> --- arch/x86/kvm/vmx/vmx.c | 11 +++++++++++ 1 file changed, 11 insertions(+) base-commit: fd89499a5151d197ba30f7b801f6d8f4646cf446