
[v2,02/49] KVM: x86: Explicitly do runtime CPUID updates "after" initial setup

Message ID 20240517173926.965351-3-seanjc@google.com
State New, archived
Series KVM: x86: CPUID overhaul, fixes, and caching

Commit Message

Sean Christopherson May 17, 2024, 5:38 p.m. UTC
Explicitly perform runtime CPUID adjustments as part of the "after set
CPUID" flow to guard against bugs where KVM consumes stale vCPU/CPUID
state during kvm_update_cpuid_runtime().  E.g. see commit 4736d85f0d18
("KVM: x86: Use actual kvm_cpuid.base for clearing KVM_FEATURE_PV_UNHALT").

Whacking each mole individually is not sustainable or robust, e.g. while
the aforementioned commit fixed KVM's PV features, the same issue lurks for
Xen and Hyper-V features; it is latent only because Xen and Hyper-V simply
don't have any runtime features (though spoiler alert, neither should KVM).
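The stale-state problem described above can be made concrete with a minimal, hypothetical C model. This is not KVM code: the `toy_*` names, the `kvm_signature` flag, and the leaf layout are assumptions for illustration only. A runtime fixup that consults a cached hypervisor CPUID base misfires if it runs before that base is recomputed for the new CPUID, which is roughly the shape of the PV_UNHALT bug mentioned in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified model; none of these are real KVM names. */
#define TOY_PV_UNHALT (1u << 7)   /* stand-in for KVM_FEATURE_PV_UNHALT */

struct toy_entry {
	uint32_t function;
	uint32_t eax;
	int kvm_signature;        /* set on the leaf carrying the KVM signature */
};

struct toy_vcpu {
	struct toy_entry *entries;   /* currently installed guest CPUID */
	size_t nent;
	uint32_t kvm_base;           /* cached state derived from installed CPUID */
	int hlt_exits_disabled;      /* runtime condition that hides PV_UNHALT */
};

static struct toy_entry *toy_find(struct toy_entry *e, size_t nent, uint32_t fn)
{
	for (size_t i = 0; i < nent; i++)
		if (e[i].function == fn)
			return &e[i];
	return NULL;
}

/* Runtime fixup: consumes v->kvm_base, i.e. CPUID-derived vCPU state. */
static void toy_update_runtime(struct toy_vcpu *v, struct toy_entry *e, size_t nent)
{
	struct toy_entry *feat;

	if (!v->kvm_base || !v->hlt_exits_disabled)
		return;
	feat = toy_find(e, nent, v->kvm_base + 1);   /* "features" leaf */
	if (feat)
		feat->eax &= ~TOY_PV_UNHALT;
}

/* Recompute cached state from the installed CPUID. */
static void toy_refresh_base(struct toy_vcpu *v)
{
	v->kvm_base = 0;
	for (size_t i = 0; i < v->nent; i++) {
		if (v->entries[i].kvm_signature) {
			v->kvm_base = v->entries[i].function;
			return;
		}
	}
}

/* Old ordering: fix up incoming CPUID before derived state is refreshed. */
static void toy_set_cpuid_old(struct toy_vcpu *v, struct toy_entry *e, size_t nent)
{
	assert(e != NULL || nent == 0);
	toy_update_runtime(v, e, nent);   /* kvm_base may describe the OLD CPUID */
	v->entries = e;
	v->nent = nent;
	toy_refresh_base(v);
}

/* New ordering: install, refresh derived state, then apply runtime fixups. */
static void toy_set_cpuid_new(struct toy_vcpu *v, struct toy_entry *e, size_t nent)
{
	assert(e != NULL || nent == 0);
	v->entries = e;
	v->nent = nent;
	toy_refresh_base(v);
	toy_update_runtime(v, v->entries, v->nent);
}
```

In this toy, if the previous CPUID had the KVM leaves at 0x40000000 and the new one moves them to 0x40000100 (e.g. behind Hyper-V leaves), the old ordering clears bit 7 in the wrong leaf and leaves the intended bit set, while the new ordering clears the intended bit.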

Updating runtime features in the "full" path will also simplify adding a
snapshot of the guest's capabilities, i.e. of caching the intersection of
guest CPUID and kvm_cpu_caps (modulo a few edge cases).
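A minimal sketch of what "caching the intersection of guest CPUID and kvm_cpu_caps" means, assuming capabilities are flattened into an array of 32-bit words. `TOY_NR_WORDS`, `toy_snapshot_caps()` and `toy_guest_cap_has()` are hypothetical, and the "few edge cases" mentioned above are not modeled. Once the snapshot exists, a feature query is a single bit test rather than a CPUID lookup plus a separate check against KVM's supported capabilities.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_WORDS 4   /* hypothetical; real KVM tracks many more words */

/* A feature is usable only if guest CPUID advertises it AND KVM supports it. */
static void toy_snapshot_caps(uint32_t vcpu_caps[TOY_NR_WORDS],
			      const uint32_t guest_cpuid[TOY_NR_WORDS],
			      const uint32_t kvm_caps[TOY_NR_WORDS])
{
	for (int i = 0; i < TOY_NR_WORDS; i++)
		vcpu_caps[i] = guest_cpuid[i] & kvm_caps[i];
}

/* Query the cached snapshot: one bounds check and one bit test. */
static bool toy_guest_cap_has(const uint32_t vcpu_caps[TOY_NR_WORDS],
			      int word, int bit)
{
	assert(word >= 0 && word < TOY_NR_WORDS && bit >= 0 && bit < 32);
	return vcpu_caps[word] & (1u << bit);
}
```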

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

Comments

Maxim Levitsky July 5, 2024, 12:51 a.m. UTC | #1
On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> Explicitly perform runtime CPUID adjustments as part of the "after set
> CPUID" flow to guard against bugs where KVM consumes stale vCPU/CPUID
> state during kvm_update_cpuid_runtime().  E.g. see commit 4736d85f0d18
> ("KVM: x86: Use actual kvm_cpuid.base for clearing KVM_FEATURE_PV_UNHALT").
> 
> Whacking each mole individually is not sustainable or robust, e.g. while
> the aforementioned commit fixed KVM's PV features, the same issue lurks for
> Xen and Hyper-V features, Xen and Hyper-V simply don't have any runtime
> features (though spoiler alert, neither should KVM).

> 
> Updating runtime features in the "full" path will also simplify adding a
> snapshot of the guest's capabilities, i.e. of caching the intersection of
> guest CPUID and kvm_cpu_caps (modulo a few edge cases).
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 2b19ff991ceb..e60ffb421e4b 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -345,6 +345,8 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	bitmap_zero(vcpu->arch.governed_features.enabled,
>  		    KVM_MAX_NR_GOVERNED_FEATURES);
>  
> +	kvm_update_cpuid_runtime(vcpu);
> +
>  	/*
>  	 * If TDP is enabled, let the guest use GBPAGES if they're supported in
>  	 * hardware.  The hardware page walker doesn't let KVM disable GBPAGES,
> @@ -426,8 +428,6 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
>  {
>  	int r;
>  
> -	__kvm_update_cpuid_runtime(vcpu, e2, nent);
> -
>  	/*
>  	 * KVM does not correctly handle changing guest CPUID after KVM_RUN, as
>  	 * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
> @@ -440,6 +440,15 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
>  	 * whether the supplied CPUID data is equal to what's already set.
>  	 */
>  	if (kvm_vcpu_has_run(vcpu)) {
> +		/*
> +		 * Note, runtime CPUID updates may consume other CPUID-driven
> +		 * vCPU state, e.g. KVM or Xen CPUID bases.  Updating runtime
> +		 * state before full CPUID processing is functionally correct
> +		 * only because any change in CPUID is disallowed, i.e. using
> +		 * stale data is ok because KVM will reject the change.
> +		 */

If I understand correctly, the sole reason for the __kvm_update_cpuid_runtime
call below is to ensure that kvm_cpuid_check_equal doesn't fail, because the
current CPUID was also post-processed with runtime updates.

Can we have a comment stating this? Or even better, how about moving the
call to __kvm_update_cpuid_runtime into kvm_cpuid_check_equal
to emphasize this?
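A minimal sketch of that suggestion, using a hypothetical toy model rather than KVM's real types: the comparison helper applies the same runtime fixup to the incoming entries before comparing, so raw userspace CPUID is not spuriously rejected for lacking bits that KVM itself patched into the installed copy. The fixup here only mirrors CR4.OSXSAVE into CPUID.1:ECX[27]; `toy_*` names and the -1 return value are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy types; not struct kvm_cpuid_entry2 or struct kvm_vcpu. */
struct toy_entry {
	uint32_t function;
	uint32_t ecx;
};

struct toy_vcpu {
	const struct toy_entry *entries;   /* installed, already post-processed */
	size_t nent;
	int cr4_osxsave;                   /* runtime state mirrored into CPUID */
};

#define TOY_OSXSAVE (1u << 27)   /* CPUID.1:ECX.OSXSAVE */

/* Toy runtime fixup: reflect CR4.OSXSAVE into CPUID leaf 1. */
static void toy_update_runtime(const struct toy_vcpu *v, struct toy_entry *e, size_t nent)
{
	for (size_t i = 0; i < nent; i++) {
		if (e[i].function != 1)
			continue;
		if (v->cr4_osxsave)
			e[i].ecx |= TOY_OSXSAVE;
		else
			e[i].ecx &= ~TOY_OSXSAVE;
	}
}

/*
 * Apply the runtime fixup to the incoming entries, then compare against the
 * installed CPUID. Returns 0 if equal, -1 otherwise (a kernel version would
 * presumably return an errno such as -EINVAL).
 */
static int toy_check_equal(const struct toy_vcpu *v, struct toy_entry *e2, size_t nent)
{
	assert(e2 != NULL || nent == 0);
	toy_update_runtime(v, e2, nent);

	if (nent != v->nent)
		return -1;
	for (size_t i = 0; i < nent; i++) {
		if (e2[i].function != v->entries[i].function ||
		    e2[i].ecx != v->entries[i].ecx)
			return -1;
	}
	return 0;
}
```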


> +		__kvm_update_cpuid_runtime(vcpu, e2, nent);
> +
>  		r = kvm_cpuid_check_equal(vcpu, e2, nent);
>  		if (r)
>  			return r;



Overall I am not 100% sure which is better.

Before the patch, the flow was roughly:

1. Post-process the user-provided CPUID with bits of KVM runtime state (like XCR0).
   At that point vcpu->arch.cpuid_entries is stale but consistent; it is just the old CPUID.
2. kvm_hv_vcpu_init call (IMHO this call can be moved to kvm_vcpu_after_set_cpuid).
3. kvm_check_cpuid on the user-provided CPUID.
4. Update vcpu->arch.cpuid_entries with the new, post-processed CPUID.
5. kvm_get_hypervisor_cpuid (I think this can also be cosmetically moved to kvm_vcpu_after_set_cpuid).
6. kvm_vcpu_after_set_cpuid itself.

After this change, it works like this:

1. kvm_hv_vcpu_init (again, this belongs more in kvm_vcpu_after_set_cpuid).
2. kvm_check_cpuid on the user CPUID without post-processing; in theory this can cause bugs.
3. Update vcpu->arch.cpuid_entries with the new CPUID, but without post-processing.
4. kvm_get_hypervisor_cpuid.
5. kvm_update_cpuid_runtime.
6. The old kvm_vcpu_after_set_cpuid.

I'm honestly not sure which is better, but IMHO moving kvm_hv_vcpu_init and
kvm_get_hypervisor_cpuid into kvm_vcpu_after_set_cpuid would clean up this
mess a bit regardless of this patch.
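The consolidation proposed here can be sketched as a toy ordering trace, assuming every step that derives state from CPUID moves into a single after-set-CPUID hook. All names are hypothetical and the steps only record their order; error handling for steps that can fail in real KVM (such as Hyper-V vCPU initialization) is ignored. The point of the design is that anything consuming CPUID-derived state runs only after the new CPUID is installed.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_STEPS 8

/* Hypothetical trace of the steps a toy set_cpuid performs, in order. */
struct toy_trace {
	const char *steps[TOY_MAX_STEPS];
	int n;
};

static void toy_step(struct toy_trace *t, const char *name)
{
	assert(t->n < TOY_MAX_STEPS);
	t->steps[t->n++] = name;
}

/* Consolidated "after set CPUID" hook, as suggested in the review. */
static void toy_after_set_cpuid(struct toy_trace *t)
{
	toy_step(t, "get_hypervisor_cpuid");   /* e.g. KVM/Xen bases */
	toy_step(t, "hv_vcpu_init");
	toy_step(t, "update_cpuid_runtime");   /* may consume the bases above */
	toy_step(t, "remaining_after_set_cpuid");
}

static int toy_set_cpuid(struct toy_trace *t, int checks_pass)
{
	toy_step(t, "check_cpuid");
	if (!checks_pass)
		return -1;
	toy_step(t, "install_entries");
	toy_after_set_cpuid(t);
	return 0;
}

/* Index of a step in the trace, or -1 if it never ran. */
static int toy_step_index(const struct toy_trace *t, const char *name)
{
	for (int i = 0; i < t->n; i++)
		if (strcmp(t->steps[i], name) == 0)
			return i;
	return -1;
}
```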

Best regards,
	Maxim Levitsky
Sean Christopherson July 9, 2024, 7:46 p.m. UTC | #2
On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> >  	/*
> >  	 * KVM does not correctly handle changing guest CPUID after KVM_RUN, as
> >  	 * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
> > @@ -440,6 +440,15 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
> >  	 * whether the supplied CPUID data is equal to what's already set.
> >  	 */
> >  	if (kvm_vcpu_has_run(vcpu)) {
> > +		/*
> > +		 * Note, runtime CPUID updates may consume other CPUID-driven
> > +		 * vCPU state, e.g. KVM or Xen CPUID bases.  Updating runtime
> > +		 * state before full CPUID processing is functionally correct
> > +		 * only because any change in CPUID is disallowed, i.e. using
> > +		 * stale data is ok because KVM will reject the change.
> > +		 */
> 
> If I understand correctly the sole reason for the below
> __kvm_update_cpuid_runtime is to ensure that kvm_cpuid_check_equal doesn't
> fail because current cpuid also was post-processed with runtime updates.

Yep.

> Can we have a comment stating this? Or even better how about moving the
> call to __kvm_update_cpuid_runtime into the kvm_cpuid_check_equal,
> to emphasize this?

Ya, I'll do both.

> > +		__kvm_update_cpuid_runtime(vcpu, e2, nent);
> > +
> >  		r = kvm_cpuid_check_equal(vcpu, e2, nent);
> >  		if (r)
> >  			return r;
> 
> 
> 
> Overall I am not 100% sure what is better:
> 
> Before the patch it was roughly like this:
> 
> 1. Post process the user given cpuid with bits of KVM runtime state (like xcr0)
> At that point the vcpu->arch.cpuid_entries is stale but consistent, it is just old CPUID.
> 
> 2. kvm_hv_vcpu_init call (IMHO this call can be moved to kvm_vcpu_after_set_cpuid)
> 
> 3. kvm_check_cpuid on the user provided cpuid
> 
> 4. Update the vcpu->arch.cpuid_entries with new and post processed cpuid
> 
> 5. kvm_get_hypervisor_cpuid - I think this also can be cosmetically moved to kvm_vcpu_after_set_cpuid
> 
> 6. kvm_vcpu_after_set_cpuid itself.
> 
> 
> After this change it works like that:
> 
> 1. kvm_hv_vcpu_init (again this belongs more to kvm_vcpu_after_set_cpuid)
> 2. kvm_check_cpuid on the user cpuid without post processing - in theory this can cause bugs
> 3. Update the vcpu->arch.cpuid_entries with new cpuid but without post-processing
> 4. kvm_get_hypervisor_cpuid
> 5. kvm_update_cpuid_runtime
> 6. The old kvm_vcpu_after_set_cpuid
> 
> I'm honestly not sure what is better but IMHO moving the kvm_hv_vcpu_init and
> kvm_get_hypervisor_cpuid into kvm_vcpu_after_set_cpuid would clean up this
> mess a bit regardless of this patch.

It takes many more patches, but doing the swap() allows for the removal of several
APIs that poke into a "raw" kvm_cpuid_entry2 array, and by the end of the series
(with your above feedback addressed) the code gets to (sans comments):

	swap(vcpu->arch.cpuid_entries, e2);
	swap(vcpu->arch.cpuid_nent, nent);

	memcpy(vcpu_caps, vcpu->arch.cpu_caps, sizeof(vcpu_caps));
	BUILD_BUG_ON(sizeof(vcpu_caps) != sizeof(vcpu->arch.cpu_caps));

	if (kvm_vcpu_has_run(vcpu)) {
		r = kvm_cpuid_check_equal(vcpu, e2, nent);
		if (r)
			goto err;
		goto success;
	}

Those are really just bonuses though, the main goal is to prevent recurrences of
bugs where KVM consumes stale vCPU state[*], which is what prompted this change.

[*] https://lore.kernel.org/all/20240228101837.93642-1-vkuznets@redhat.com
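A minimal sketch of the swap-first flow described above, under stated assumptions: the types are hypothetical toys, a plain field-by-field equality check stands in for kvm_cpuid_check_equal, and the vcpu_caps snapshot/restore from the excerpt is omitted. This is not the series' actual code. Installing the new CPUID first means every later step reads vCPU state that matches the new CPUID; the cost is an explicit rollback on the error path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy types; not KVM's real structures. */
struct toy_entry {
	uint32_t function;
	uint32_t ecx;
};

struct toy_vcpu {
	struct toy_entry *entries;
	size_t nent;
	int has_run;
};

/* Exchange the installed CPUID with the caller's array (stand-in for swap()). */
static void toy_swap_cpuid(struct toy_vcpu *v, struct toy_entry **e2, size_t *nent)
{
	struct toy_entry *old_e = v->entries;
	size_t old_n = v->nent;

	assert(e2 != NULL && nent != NULL);
	v->entries = *e2;
	v->nent = *nent;
	*e2 = old_e;
	*nent = old_n;
}

static int toy_cpuid_equal(const struct toy_entry *a, size_t na,
			   const struct toy_entry *b, size_t nb)
{
	if (na != nb)
		return 0;
	for (size_t i = 0; i < na; i++)
		if (a[i].function != b[i].function || a[i].ecx != b[i].ecx)
			return 0;
	return 1;
}

/*
 * On success, *e2/*nent hold the PREVIOUS CPUID (the caller frees it).
 * On failure, the swap is undone and *e2/*nent hold the rejected CPUID.
 */
static int toy_set_cpuid(struct toy_vcpu *v, struct toy_entry **e2, size_t *nent)
{
	toy_swap_cpuid(v, e2, nent);   /* new CPUID is installed from here on */

	if (v->has_run) {
		/* After KVM_RUN, only re-setting identical CPUID is allowed. */
		if (!toy_cpuid_equal(v->entries, v->nent, *e2, *nent))
			goto err;
		return 0;
	}

	/* Full "after set CPUID" processing of v->entries would run here. */
	return 0;

err:
	toy_swap_cpuid(v, e2, nent);   /* roll back to the previous CPUID */
	return -1;
}
```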
Maxim Levitsky July 24, 2024, 5:24 p.m. UTC | #3
On Tue, 2024-07-09 at 12:46 -0700, Sean Christopherson wrote:

All makes sense, thanks!
Best regards,
	Maxim Levitsky

Patch

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 2b19ff991ceb..e60ffb421e4b 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -345,6 +345,8 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	bitmap_zero(vcpu->arch.governed_features.enabled,
 		    KVM_MAX_NR_GOVERNED_FEATURES);
 
+	kvm_update_cpuid_runtime(vcpu);
+
 	/*
 	 * If TDP is enabled, let the guest use GBPAGES if they're supported in
 	 * hardware.  The hardware page walker doesn't let KVM disable GBPAGES,
@@ -426,8 +428,6 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
 {
 	int r;
 
-	__kvm_update_cpuid_runtime(vcpu, e2, nent);
-
 	/*
 	 * KVM does not correctly handle changing guest CPUID after KVM_RUN, as
 	 * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
@@ -440,6 +440,15 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
 	 * whether the supplied CPUID data is equal to what's already set.
 	 */
 	if (kvm_vcpu_has_run(vcpu)) {
+		/*
+		 * Note, runtime CPUID updates may consume other CPUID-driven
+		 * vCPU state, e.g. KVM or Xen CPUID bases.  Updating runtime
+		 * state before full CPUID processing is functionally correct
+		 * only because any change in CPUID is disallowed, i.e. using
+		 * stale data is ok because KVM will reject the change.
+		 */
+		__kvm_update_cpuid_runtime(vcpu, e2, nent);
+
 		r = kvm_cpuid_check_equal(vcpu, e2, nent);
 		if (r)
 			return r;