diff mbox series

KVM: x86/pmu: Return correct value of IA32_PERF_CAPABILITIES for userspace after vCPU has run

Message ID 20240313003739.3365845-1-mizhang@google.com (mailing list archive)
State New, archived
Headers show
Series KVM: x86/pmu: Return correct value of IA32_PERF_CAPABILITIES for userspace after vCPU has run | expand

Commit Message

Mingwei Zhang March 13, 2024, 12:37 a.m. UTC
Return correct value of IA32_PERF_CAPABILITIES when userspace tries to read
it after vCPU has already run. Previously, KVM will always return the guest
cached value on get_msr() even if guest CPUID lacks X86_FEATURE_PDCM. The
guest cached value on default is kvm_caps.supported_perf_cap. However, when
userspace sets the value during live migration, the call fails because of
the check on X86_FEATURE_PDCM.

Initially, it sounds like a pure userspace issue. It is not. After vCPU has
run, KVM should faithfully return correct value to satisify legitimate
requests from userspace such as VM suspend/resume and live migrartion. In
this case, KVM should return 0 when guest cpuid lacks X86_FEATURE_PDCM. So
fix the problem by adding an additional check in vmx_set_msr().

Note that IA32_PERF_CAPABILITIES is emulated on AMD side, which is fine
because it set_msr() is guarded by kvm_caps.supported_perf_cap which is
always 0.

Cc: Aaron Lewis <aaronlewis@google.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
---
 arch/x86/kvm/vmx/vmx.c | 11 +++++++++++
 1 file changed, 11 insertions(+)


base-commit: fd89499a5151d197ba30f7b801f6d8f4646cf446

Comments

Wang, Wei W March 13, 2024, 10:22 a.m. UTC | #1
On Wednesday, March 13, 2024 8:38 AM, Mingwei Zhang wrote:
> Return correct value of IA32_PERF_CAPABILITIES when userspace tries to read
> it after vCPU has already run. Previously, KVM will always return the guest
> cached value on get_msr() even if guest CPUID lacks X86_FEATURE_PDCM. The
> guest cached value on default is kvm_caps.supported_perf_cap. However,
> when userspace sets the value during live migration, the call fails because of
> the check on X86_FEATURE_PDCM.

Could you point where in the set_msr path that could fail?
(I don’t find there is a check of X86_FEATURE_PDCM in vmx_set_msr and
kvm_set_msr_common)

> 
> Initially, it sounds like a pure userspace issue. It is not. After vCPU has run,
> KVM should faithfully return correct value to satisify legitimate requests from
> userspace such as VM suspend/resume and live migrartion. In this case, KVM
> should return 0 when guest cpuid lacks X86_FEATURE_PDCM. 
Some typos above (satisfy, migration, CPUID)

Seems the description here isn’t aligned to your code below?
The code below prevents userspace from reading the MSR value (not return 0 as the
read value) in that case.

>So fix the
> problem by adding an additional check in vmx_set_msr().
> 
> Note that IA32_PERF_CAPABILITIES is emulated on AMD side, which is fine
> because it set_msr() is guarded by kvm_caps.supported_perf_cap which is
> always 0.
> 
> Cc: Aaron Lewis <aaronlewis@google.com>
> Cc: Jim Mattson <jmattson@google.com>
> Signed-off-by: Mingwei Zhang <mizhang@google.com>
> ---
>  arch/x86/kvm/vmx/vmx.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index
> 40e3780d73ae..6d8667b56091 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2049,6 +2049,17 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu,
> struct msr_data *msr_info)
>  		msr_info->data = to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash
>  			[msr_info->index - MSR_IA32_SGXLEPUBKEYHASH0];
>  		break;
> +	case MSR_IA32_PERF_CAPABILITIES:
> +		/*
> +		 * Host VMM should not get potentially invalid MSR value if
> vCPU
> +		 * has already run but guest cpuid lacks the support for the
> +		 * MSR.
> +		 */
> +		if (msr_info->host_initiated &&
> +		    kvm_vcpu_has_run(vcpu) &&
> +		    !guest_cpuid_has(vcpu, X86_FEATURE_PDCM))
> +			return 1;
> +		break;
>  	case KVM_FIRST_EMULATED_VMX_MSR ...
> KVM_LAST_EMULATED_VMX_MSR:
>  		if (!guest_can_use(vcpu, X86_FEATURE_VMX))
>  			return 1;
> 
> base-commit: fd89499a5151d197ba30f7b801f6d8f4646cf446
> --
> 2.44.0.291.gc1ea87d7ee-goog
>
Sean Christopherson March 13, 2024, 2:42 p.m. UTC | #2
On Wed, Mar 13, 2024, Wei W Wang wrote:
> On Wednesday, March 13, 2024 8:38 AM, Mingwei Zhang wrote:
> > Return correct value of IA32_PERF_CAPABILITIES when userspace tries to read
> > it after vCPU has already run. Previously, KVM will always return the guest
> > cached value on get_msr() even if guest CPUID lacks X86_FEATURE_PDCM. The
> > guest cached value on default is kvm_caps.supported_perf_cap. However,
> > when userspace sets the value during live migration, the call fails because of
> > the check on X86_FEATURE_PDCM.
> 
> Could you point where in the set_msr path that could fail?
> (I don’t find there is a check of X86_FEATURE_PDCM in vmx_set_msr and
> kvm_set_msr_common)

The changelog is misleading, it's not the PDCM feature bit, it's the PMU version
check in vmx_set_msr():

	case MSR_IA32_PERF_CAPABILITIES:
		if (data && !vcpu_to_pmu(vcpu)->version)
			return 1;

> > Initially, it sounds like a pure userspace issue. It is not. After vCPU has run,
> > KVM should faithfully return correct value to satisify legitimate requests from
> > userspace such as VM suspend/resume and live migrartion. In this case, KVM
> > should return 0 when guest cpuid lacks X86_FEATURE_PDCM. 
> Some typos above (satisfy, migration, CPUID)
> 
> Seems the description here isn’t aligned to your code below?
> The code below prevents userspace from reading the MSR value (not return 0 as the
> read value) in that case.

Ya.

> >So fix the
> > problem by adding an additional check in vmx_set_msr().
> > 
> > Note that IA32_PERF_CAPABILITIES is emulated on AMD side, which is fine
> > because it set_msr() is guarded by kvm_caps.supported_perf_cap which is
> > always 0.
> > 
> > Cc: Aaron Lewis <aaronlewis@google.com>
> > Cc: Jim Mattson <jmattson@google.com>
> > Signed-off-by: Mingwei Zhang <mizhang@google.com>
> > ---
> >  arch/x86/kvm/vmx/vmx.c | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index
> > 40e3780d73ae..6d8667b56091 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -2049,6 +2049,17 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu,
> > struct msr_data *msr_info)
> >  		msr_info->data = to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash
> >  			[msr_info->index - MSR_IA32_SGXLEPUBKEYHASH0];
> >  		break;
> > +	case MSR_IA32_PERF_CAPABILITIES:
> > +		/*
> > +		 * Host VMM should not get potentially invalid MSR value if
> > vCPU
> > +		 * has already run but guest cpuid lacks the support for the
> > +		 * MSR.
> > +		 */
> > +		if (msr_info->host_initiated &&
> > +		    kvm_vcpu_has_run(vcpu) &&
> > +		    !guest_cpuid_has(vcpu, X86_FEATURE_PDCM))
> > +			return 1;

As Wei pointed out, this doesn't match the changelog.  And I don't think this is
what we want, at least not in isolation.  Making KVM more restrictive on userspace
reads doesn't solve the live migration save/restore issue, and the kvm_vcpu_has_run()
adds yet another flavor of MSR handling.

We discussed this whole MSRs mess at PUCK this morning.  I forgot to hit RECORD,
but Paolo took notes and will post them soon.

Going from memory, the plan is to:

  1. Commit to, and document, that userspace must do KVM_SET_CPUID{,2} prior to
     KVM_SET_MSRS.

  2. Go with roughly what I proposed in the CET thread (for unsupported MSRS,
     read 0 and drop writes of '0')[*].

  3. Add a quire for PERF_CAPABILITIES, ARCH_CAPABILITIES, and PLATFORM_INFO
     (if quirk==enabled, keep KVM's current behavior; if quirk==disabled, zero-
      initialize the MSRs).

With those pieces in place, KVM can simply check X86_FEATURE_PDCM for both reads
and writes to PERF_CAPABILITIES, and the common userspace MSR handling will
convert "unsupported" to "success" as appropriate.

[*] https://lore.kernel.org/all/ZfDdS8rtVtyEr0UR@google.com
Paolo Bonzini March 13, 2024, 3:01 p.m. UTC | #3
On Wed, Mar 13, 2024 at 3:42 PM Sean Christopherson <seanjc@google.com> wrote:
> We discussed this whole MSRs mess at PUCK this morning.  I forgot to hit RECORD,
> but Paolo took notes and will post them soon.
>
> Going from memory, the plan is to:
>
>   1. Commit to, and document, that userspace must do KVM_SET_CPUID{,2} prior to
>      KVM_SET_MSRS.

Correct.

>   2. Go with roughly what I proposed in the CET thread (for unsupported MSRS,
>      read 0 and drop writes of '0')[*].

More precisely, read a sensible default value corresponding to
"everything disabled", which generally speaking should be 0. And
generally speaking, commit to:
- allowing host_initiated reads independent of CPUID
- allowing host_initiated writes of the same value that was read
- blocking host_initiated writes of nonzero (or nondefault) values if
the corresponding guest CPUID bit is clear

Right now some MSRs do not allow host initiated writes, for example
MSR_KVM_* (check for calls to guest_pv_has), and the VMX MSRs.

Generally speaking we want to fix them, unless it's an unholy pain
(for example the VMX capabilities MSRs are good candidates for pain,
because they have some "must be 1" bits in bits 63:32).

All this should be covered by selftests.

>   3. Add a quire for PERF_CAPABILITIES, ARCH_CAPABILITIES, and PLATFORM_INFO
>      (if quirk==enabled, keep KVM's current behavior; if quirk==disabled, zero-
>       initialize the MSRs).

More precisely, even if quirk==enabled we will move the setting of a
non-zero default value for the MSR from vCPU creation to
KVM_SET_CPUID2, and only set a non-zero default value if the CPUID bit
is set.

Another small thing in my notes was to look at the duplication between
emulated_msrs and msr_based_features_all_except_vmx. Right now
MSR_AMD64_DE_CFG is the only one that is not in both and, probably not
a coincidence, it's also the only one implemented only for one vendor.
There's probably some opportunity for both cleanups and fixes. It
looks like svm_has_emulated_msr(MSR_AMD64_DE_CFG) should return true
for example.

Paolo

> With those pieces in place, KVM can simply check X86_FEATURE_PDCM for both reads
> and writes to PERF_CAPABILITIES, and the common userspace MSR handling will
> convert "unsupported" to "success" as appropriate.
>
> [*] https://lore.kernel.org/all/ZfDdS8rtVtyEr0UR@google.com
Mingwei Zhang March 13, 2024, 5:30 p.m. UTC | #4
On Wed, Mar 13, 2024, Wang, Wei W wrote:
> On Wednesday, March 13, 2024 8:38 AM, Mingwei Zhang wrote:
> > Return correct value of IA32_PERF_CAPABILITIES when userspace tries to read
> > it after vCPU has already run. Previously, KVM will always return the guest
> > cached value on get_msr() even if guest CPUID lacks X86_FEATURE_PDCM. The
> > guest cached value on default is kvm_caps.supported_perf_cap. However,
> > when userspace sets the value during live migration, the call fails because of
> > the check on X86_FEATURE_PDCM.
> 
> Could you point where in the set_msr path that could fail?
> (I don’t find there is a check of X86_FEATURE_PDCM in vmx_set_msr and
> kvm_set_msr_common)
> 
My memory cheated me... The check was on pmu->version, which not
X86_FEATURE_PDCM. Note pmu->version is basically backed by another bits
guest CPUID.

> > 
> > Initially, it sounds like a pure userspace issue. It is not. After vCPU has run,
> > KVM should faithfully return correct value to satisify legitimate requests from
> > userspace such as VM suspend/resume and live migrartion. In this case, KVM
> > should return 0 when guest cpuid lacks X86_FEATURE_PDCM. 
> Some typos above (satisfy, migration, CPUID)
> 
> Seems the description here isn’t aligned to your code below?
> The code below prevents userspace from reading the MSR value (not return 0 as the
> read value) in that case.
> 
> >So fix the
> > problem by adding an additional check in vmx_set_msr().
> > 
> > Note that IA32_PERF_CAPABILITIES is emulated on AMD side, which is fine
> > because it set_msr() is guarded by kvm_caps.supported_perf_cap which is
> > always 0.
> > 
> > Cc: Aaron Lewis <aaronlewis@google.com>
> > Cc: Jim Mattson <jmattson@google.com>
> > Signed-off-by: Mingwei Zhang <mizhang@google.com>
> > ---
> >  arch/x86/kvm/vmx/vmx.c | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index
> > 40e3780d73ae..6d8667b56091 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -2049,6 +2049,17 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu,
> > struct msr_data *msr_info)
> >  		msr_info->data = to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash
> >  			[msr_info->index - MSR_IA32_SGXLEPUBKEYHASH0];
> >  		break;
> > +	case MSR_IA32_PERF_CAPABILITIES:
> > +		/*
> > +		 * Host VMM should not get potentially invalid MSR value if
> > vCPU
> > +		 * has already run but guest cpuid lacks the support for the
> > +		 * MSR.
> > +		 */
> > +		if (msr_info->host_initiated &&
> > +		    kvm_vcpu_has_run(vcpu) &&
> > +		    !guest_cpuid_has(vcpu, X86_FEATURE_PDCM))
> > +			return 1;
> > +		break;
> >  	case KVM_FIRST_EMULATED_VMX_MSR ...
> > KVM_LAST_EMULATED_VMX_MSR:
> >  		if (!guest_can_use(vcpu, X86_FEATURE_VMX))
> >  			return 1;
> > 
> > base-commit: fd89499a5151d197ba30f7b801f6d8f4646cf446
> > --
> > 2.44.0.291.gc1ea87d7ee-goog
> > 
>
Mingwei Zhang March 13, 2024, 6:18 p.m. UTC | #5
On Wed, Mar 13, 2024, Paolo Bonzini wrote:
> On Wed, Mar 13, 2024 at 3:42 PM Sean Christopherson <seanjc@google.com> wrote:
> > We discussed this whole MSRs mess at PUCK this morning.  I forgot to hit RECORD,
> > but Paolo took notes and will post them soon.
> >
> > Going from memory, the plan is to:
> >
> >   1. Commit to, and document, that userspace must do KVM_SET_CPUID{,2} prior to
> >      KVM_SET_MSRS.
> 
> Correct.

This is clear to me now. Glad to have the direction settled down.

> 
> >   2. Go with roughly what I proposed in the CET thread (for unsupported MSRS,
> >      read 0 and drop writes of '0')[*].
> 
> More precisely, read a sensible default value corresponding to
> "everything disabled", which generally speaking should be 0. And
> generally speaking, commit to:
> - allowing host_initiated reads independent of CPUID
> - allowing host_initiated writes of the same value that was read
> - blocking host_initiated writes of nonzero (or nondefault) values if
> the corresponding guest CPUID bit is clear

>
> Right now some MSRs do not allow host initiated writes, for example
> MSR_KVM_* (check for calls to guest_pv_has), and the VMX MSRs.
> 
> Generally speaking we want to fix them, unless it's an unholy pain
> (for example the VMX capabilities MSRs are good candidates for pain,
> because they have some "must be 1" bits in bits 63:32).
> 
> All this should be covered by selftests.
> 
> >   3. Add a quire for PERF_CAPABILITIES, ARCH_CAPABILITIES, and PLATFORM_INFO
> >      (if quirk==enabled, keep KVM's current behavior; if quirk==disabled, zero-
> >       initialize the MSRs).
> 
> More precisely, even if quirk==enabled we will move the setting of a
> non-zero default value for the MSR from vCPU creation to
> KVM_SET_CPUID2, and only set a non-zero default value if the CPUID bit
> is set.
> 
> Another small thing in my notes was to look at the duplication between
> emulated_msrs and msr_based_features_all_except_vmx. Right now
> MSR_AMD64_DE_CFG is the only one that is not in both and, probably not
> a coincidence, it's also the only one implemented only for one vendor.
> There's probably some opportunity for both cleanups and fixes. It
> looks like svm_has_emulated_msr(MSR_AMD64_DE_CFG) should return true
> for example.
> 
> Paolo
> 

Ack. Thanks.

> > With those pieces in place, KVM can simply check X86_FEATURE_PDCM for both reads
> > and writes to PERF_CAPABILITIES, and the common userspace MSR handling will
> > convert "unsupported" to "success" as appropriate.
> >
> > [*] https://lore.kernel.org/all/ZfDdS8rtVtyEr0UR@google.com
>
diff mbox series

Patch

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 40e3780d73ae..6d8667b56091 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2049,6 +2049,17 @@  static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		msr_info->data = to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash
 			[msr_info->index - MSR_IA32_SGXLEPUBKEYHASH0];
 		break;
+	case MSR_IA32_PERF_CAPABILITIES:
+		/*
+		 * Host VMM should not get potentially invalid MSR value if vCPU
+		 * has already run but guest cpuid lacks the support for the
+		 * MSR.
+		 */
+		if (msr_info->host_initiated &&
+		    kvm_vcpu_has_run(vcpu) &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_PDCM))
+			return 1;
+		break;
 	case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR:
 		if (!guest_can_use(vcpu, X86_FEATURE_VMX))
 			return 1;