diff mbox series

[6/6] KVM: Do compatibility checks on hotplugged CPUs

Message ID 20211227081515.2088920-7-chao.gao@intel.com (mailing list archive)
State New, archived
Headers show
Series Improve KVM's interaction with CPU hotplug | expand

Commit Message

Chao Gao Dec. 27, 2021, 8:15 a.m. UTC
At init time, KVM does compatibility checks to ensure that all online
CPUs support hardware virtualization and a common set of features. But
KVM uses hotplugged CPUs without such compatibility checks. On Intel
CPUs, this leads to #GP if the hotplugged CPU doesn't support VMX or
vmentry failure if the hotplugged CPU doesn't meet minimal feature
requirements.

Do compatibility checks when onlining a CPU. If any VM is running,
KVM hotplug callback returns an error to abort onlining incompatible
CPUs.

But if no VM is running, onlining incompatible CPUs is allowed. Instead,
KVM is prohibited from creating VMs similar to the policy for init-time
compatibility checks.

CPU hotplug is disabled during hardware_enable_all() to prevent the corner
case as shown below. A hotplugged CPU marks itself online in
cpu_online_mask (1) and enables interrupt (2) before invoking callbacks
registered in ONLINE section (3). So, if hardware_enable_all() is invoked
on another CPU right after (2), then on_each_cpu() in hardware_enable_all()
invokes hardware_enable_nolock() on the hotplugged CPU before
kvm_online_cpu() is called. This makes the CPU escape from compatibility
checks, which is risky.

	start_secondary { ...
		set_cpu_online(smp_processor_id(), true); <- 1
		...
		local_irq_enable();  <- 2
		...
		cpu_startup_entry(CPUHP_AP_ONLINE_IDLE); <- 3
	}

Keep compatibility checks at KVM init time. It can help to find
incompatibility issues earlier and refuse to load arch KVM module
(e.g., kvm-intel).

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 virt/kvm/kvm_main.c | 36 ++++++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

Comments

Sean Christopherson Jan. 11, 2022, 12:46 a.m. UTC | #1
On Mon, Dec 27, 2021, Chao Gao wrote:
> At init time, KVM does compatibility checks to ensure that all online
> CPUs support hardware virtualization and a common set of features. But
> KVM uses hotplugged CPUs without such compatibility checks. On Intel
> CPUs, this leads to #GP if the hotplugged CPU doesn't support VMX or
> vmentry failure if the hotplugged CPU doesn't meet minimal feature
> requirements.
> 
> Do compatibility checks when onlining a CPU. If any VM is running,
> KVM hotplug callback returns an error to abort onlining incompatible
> CPUs.
> 
> But if no VM is running, onlining incompatible CPUs is allowed. Instead,
> KVM is prohibited from creating VMs similar to the policy for init-time
> compatibility checks.

...

> ---
>  virt/kvm/kvm_main.c | 36 ++++++++++++++++++++++++++++++++++--
>  1 file changed, 34 insertions(+), 2 deletions(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index c1054604d1e8..0ff80076d48d 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -106,6 +106,8 @@ LIST_HEAD(vm_list);
>  static cpumask_var_t cpus_hardware_enabled;
>  static int kvm_usage_count;
>  static atomic_t hardware_enable_failed;
> +/* Set if hardware becomes incompatible after CPU hotplug */
> +static bool hardware_incompatible;
>  
>  static struct kmem_cache *kvm_vcpu_cache;
>  
> @@ -4855,20 +4857,32 @@ static void hardware_enable_nolock(void *junk)
>  
>  static int kvm_online_cpu(unsigned int cpu)
>  {
> -	int ret = 0;
> +	int ret;
>  
> +	ret = kvm_arch_check_processor_compat();
>  	raw_spin_lock(&kvm_count_lock);
>  	/*
>  	 * Abort the CPU online process if hardware virtualization cannot
>  	 * be enabled. Otherwise running VMs would encounter unrecoverable
>  	 * errors when scheduled to this CPU.
>  	 */
> -	if (kvm_usage_count) {
> +	if (!ret && kvm_usage_count) {
>  		hardware_enable_nolock(NULL);
>  		if (atomic_read(&hardware_enable_failed)) {
>  			ret = -EIO;
>  			pr_info("kvm: abort onlining CPU%d", cpu);
>  		}
> +	} else if (ret && !kvm_usage_count) {
> +		/*
> +		 * Continue onlining an incompatible CPU if no VM is
> +		 * running. KVM should reject creating any VM after this
> +		 * point. Then this CPU can be still used to run non-VM
> +		 * workload.
> +		 */
> +		ret = 0;
> +		hardware_incompatible = true;

This has a fairly big flaw in that it prevents KVM from creating VMs even if the
offending CPU is offlined.  That seems like a very reasonable thing to do, e.g.
admin sees that hotplugging a CPU broke KVM and removes the CPU to remedy the
problem.  And if KVM is built-in, reloading KVM to wipe hardware_incompatible
after offlining the CPU isn't an option.

To make this approach work, I think kvm_offline_cpu() would have to reevaluate
hardware_incompatible if the flag is set.

And should there be a KVM module param to let the admin opt in/out of this
behavior?  E.g. if the primary use case for a system is to run VMs, disabling
KVM just to online a CPU isn't very helpful.

That said, I'm not convinced that continuing with the hotplug in this scenario
is ever the right thing to do.  Either the CPU being hotplugged really is a different
CPU, or it's literally broken.  In both cases, odds are very, very good that running
on the dodgy CPU will hose the kernel sooner or later, i.e. KVM's compatibility checks
are just the canary in the coal mine.

TDX is a different beast as (a) that's purely a security restriction and (b) anyone
trying to run TDX guests darn well better know that TDX doesn't allow hotplug.
In other words, if TDX gets disabled due to hotplug, either someone majorly screwed
up and is going to be unhappy no matter what, or there's no intention of using TDX
and it's a complete don't care.

> +		pr_info("kvm: prohibit VM creation due to incompatible CPU%d",

pr_info() is a bit weak, this should be at least pr_warn() and maybe even pr_err().

> +			cpu);

Eh, I'd omit the newline and let that poke out.

>  	}
>  	raw_spin_unlock(&kvm_count_lock);
>  	return ret;
> @@ -4913,8 +4927,24 @@ static int hardware_enable_all(void)
>  {
>  	int r = 0;
>  
> +	/*
> +	 * During onlining a CPU, cpu_online_mask is set before kvm_online_cpu()
> +	 * is called. on_each_cpu() between them includes the CPU. As a result,
> +	 * hardware_enable_nolock() may get invoked before kvm_online_cpu().
> +	 * This would enable hardware virtualization on that cpu without
> +	 * compatibility checks, which can potentially crash system or break
> +	 * running VMs.
> +	 *
> +	 * Disable CPU hotplug to prevent this case from happening.
> +	 */
> +	cpus_read_lock();
>  	raw_spin_lock(&kvm_count_lock);
>  
> +	if (hardware_incompatible) {

Another error message would likely be helpful here.  Even better would be if KVM
could provide some way for userspace to query which CPU(s) is bad.

> +		r = -EIO;
> +		goto unlock;
> +	}
> +
>  	kvm_usage_count++;
>  	if (kvm_usage_count == 1) {
>  		atomic_set(&hardware_enable_failed, 0);
> @@ -4926,7 +4956,9 @@ static int hardware_enable_all(void)
>  		}
>  	}
>  
> +unlock:
>  	raw_spin_unlock(&kvm_count_lock);
> +	cpus_read_unlock();
>  
>  	return r;
>  }
> -- 
> 2.25.1
>
Chao Gao Jan. 11, 2022, 5:32 a.m. UTC | #2
On Tue, Jan 11, 2022 at 12:46:52AM +0000, Sean Christopherson wrote:
>On Mon, Dec 27, 2021, Chao Gao wrote:
>> At init time, KVM does compatibility checks to ensure that all online
>> CPUs support hardware virtualization and a common set of features. But
>> KVM uses hotplugged CPUs without such compatibility checks. On Intel
>> CPUs, this leads to #GP if the hotplugged CPU doesn't support VMX or
>> vmentry failure if the hotplugged CPU doesn't meet minimal feature
>> requirements.
>> 
>> Do compatibility checks when onlining a CPU. If any VM is running,
>> KVM hotplug callback returns an error to abort onlining incompatible
>> CPUs.
>> 
>> But if no VM is running, onlining incompatible CPUs is allowed. Instead,
>> KVM is prohibited from creating VMs similar to the policy for init-time
>> compatibility checks.
>
>...
>
>> ---
>>  virt/kvm/kvm_main.c | 36 ++++++++++++++++++++++++++++++++++--
>>  1 file changed, 34 insertions(+), 2 deletions(-)
>> 
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index c1054604d1e8..0ff80076d48d 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -106,6 +106,8 @@ LIST_HEAD(vm_list);
>>  static cpumask_var_t cpus_hardware_enabled;
>>  static int kvm_usage_count;
>>  static atomic_t hardware_enable_failed;
>> +/* Set if hardware becomes incompatible after CPU hotplug */
>> +static bool hardware_incompatible;
>>  
>>  static struct kmem_cache *kvm_vcpu_cache;
>>  
>> @@ -4855,20 +4857,32 @@ static void hardware_enable_nolock(void *junk)
>>  
>>  static int kvm_online_cpu(unsigned int cpu)
>>  {
>> -	int ret = 0;
>> +	int ret;
>>  
>> +	ret = kvm_arch_check_processor_compat();
>>  	raw_spin_lock(&kvm_count_lock);
>>  	/*
>>  	 * Abort the CPU online process if hardware virtualization cannot
>>  	 * be enabled. Otherwise running VMs would encounter unrecoverable
>>  	 * errors when scheduled to this CPU.
>>  	 */
>> -	if (kvm_usage_count) {
>> +	if (!ret && kvm_usage_count) {
>>  		hardware_enable_nolock(NULL);
>>  		if (atomic_read(&hardware_enable_failed)) {
>>  			ret = -EIO;
>>  			pr_info("kvm: abort onlining CPU%d", cpu);
>>  		}
>> +	} else if (ret && !kvm_usage_count) {
>> +		/*
>> +		 * Continue onlining an incompatible CPU if no VM is
>> +		 * running. KVM should reject creating any VM after this
>> +		 * point. Then this CPU can be still used to run non-VM
>> +		 * workload.
>> +		 */
>> +		ret = 0;
>> +		hardware_incompatible = true;
>
>This has a fairly big flaw in that it prevents KVM from creating VMs even if the
>offending CPU is offlined.  That seems like a very reasonable thing to do, e.g.
>admin sees that hotplugging a CPU broke KVM and removes the CPU to remedy the
>problem.  And if KVM is built-in, reloading KVM to wipe hardware_incompatible
>after offlining the CPU isn't an option.

Ideally, yes, creation VMs should be allowed after offending CPUs are offlined.
But the problem is kind of foundamental: 

After kernel tries to online a CPU without VMX, boot_cpu_has(X86_FEATURE_VMX)
returns false. So, the current behavior is reloading KVM would fail if
kernel *tried* to bring up a CPU without VMX. So, it looks to me that
boot_cpu_has() doesn't do feature re-evalution either. Given that, I doubt
the value of making KVM able to create VM in this case.

>
>To make this approach work, I think kvm_offline_cpu() would have to reevaluate
>hardware_incompatible if the flag is set.
>
>And should there be a KVM module param to let the admin opt in/out of this
>behavior?  E.g. if the primary use case for a system is to run VMs, disabling
>KVM just to online a CPU isn't very helpful.
>
>That said, I'm not convinced that continuing with the hotplug in this scenario
>is ever the right thing to do.  Either the CPU being hotplugged really is a different
>CPU, or it's literally broken.  In both cases, odds are very, very good that running
>on the dodgy CPU will hose the kernel sooner or later, i.e. KVM's compatibility checks
>are just the canary in the coal mine.

Ok. Then here are two options:
1. KVM always prevents incompatible CPUs from being brought up regardless of running VMs
2. make "disabling KVM on incompatible CPUs" an opt-in feature.

Which one do you think is better?

And as said above, even with option 1, KVM reloading would fail due to
boot_cpu_has(X86_FEATURE_VMX). I suppose it isn't necessary to be fixed in this series.

>
>TDX is a different beast as (a) that's purely a security restriction and (b) anyone
>trying to run TDX guests darn well better know that TDX doesn't allow hotplug.
>In other words, if TDX gets disabled due to hotplug, either someone majorly screwed
>up and is going to be unhappy no matter what, or there's no intention of using TDX
>and it's a complete don't care.
>
>> +		pr_info("kvm: prohibit VM creation due to incompatible CPU%d",
>
>pr_info() is a bit weak, this should be at least pr_warn() and maybe even pr_err().
>
>> +			cpu);
>
>Eh, I'd omit the newline and let that poke out.

Will do.

>
>>  	}
>>  	raw_spin_unlock(&kvm_count_lock);
>>  	return ret;
>> @@ -4913,8 +4927,24 @@ static int hardware_enable_all(void)
>>  {
>>  	int r = 0;
>>  
>> +	/*
>> +	 * During onlining a CPU, cpu_online_mask is set before kvm_online_cpu()
>> +	 * is called. on_each_cpu() between them includes the CPU. As a result,
>> +	 * hardware_enable_nolock() may get invoked before kvm_online_cpu().
>> +	 * This would enable hardware virtualization on that cpu without
>> +	 * compatibility checks, which can potentially crash system or break
>> +	 * running VMs.
>> +	 *
>> +	 * Disable CPU hotplug to prevent this case from happening.
>> +	 */
>> +	cpus_read_lock();
>>  	raw_spin_lock(&kvm_count_lock);
>>  
>> +	if (hardware_incompatible) {
>
>Another error message would likely be helpful here.  Even better would be if KVM
>could provide some way for userspace to query which CPU(s) is bad.

If option 1 is chosen, this check will be removed.

For option 2, will add an error message. And how about a debugfs tunable to provide
the list of bad CPUs?
Sean Christopherson Jan. 12, 2022, 5:52 p.m. UTC | #3
On Tue, Jan 11, 2022, Chao Gao wrote:
> On Tue, Jan 11, 2022 at 12:46:52AM +0000, Sean Christopherson wrote:
> >This has a fairly big flaw in that it prevents KVM from creating VMs even if the
> >offending CPU is offlined.  That seems like a very reasonable thing to do, e.g.
> >admin sees that hotplugging a CPU broke KVM and removes the CPU to remedy the
> >problem.  And if KVM is built-in, reloading KVM to wipe hardware_incompatible
> >after offlining the CPU isn't an option.

...

> >That said, I'm not convinced that continuing with the hotplug in this scenario
> >is ever the right thing to do.  Either the CPU being hotplugged really is a different
> >CPU, or it's literally broken.  In both cases, odds are very, very good that running
> >on the dodgy CPU will hose the kernel sooner or later, i.e. KVM's compatibility checks
> >are just the canary in the coal mine.
> 
> Ok. Then here are two options:
> 1. KVM always prevents incompatible CPUs from being brought up regardless of running VMs
> 2. make "disabling KVM on incompatible CPUs" an opt-in feature.
> 
> Which one do you think is better?

IMO, #1.  It's simpler to implement and document, and is less likely to surprise
the user.  We can always pivot to #2 _if_ anyone requests the ability to dynamically
disable KVM in order to bring up heterogenous CPUs and has a reasonable, sane use
case for doing so.  But that's a big "if" as I would be very surprised if it's even
possible to encounter such a setup without a hardware bug, firmware bug, and/or user
error.
Jim Mattson Jan. 12, 2022, 11:01 p.m. UTC | #4
On Wed, Jan 12, 2022 at 9:54 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Jan 11, 2022, Chao Gao wrote:
> > On Tue, Jan 11, 2022 at 12:46:52AM +0000, Sean Christopherson wrote:
> > >This has a fairly big flaw in that it prevents KVM from creating VMs even if the
> > >offending CPU is offlined.  That seems like a very reasonable thing to do, e.g.
> > >admin sees that hotplugging a CPU broke KVM and removes the CPU to remedy the
> > >problem.  And if KVM is built-in, reloading KVM to wipe hardware_incompatible
> > >after offlining the CPU isn't an option.
>
> ...
>
> > >That said, I'm not convinced that continuing with the hotplug in this scenario
> > >is ever the right thing to do.  Either the CPU being hotplugged really is a different
> > >CPU, or it's literally broken.  In both cases, odds are very, very good that running
> > >on the dodgy CPU will hose the kernel sooner or later, i.e. KVM's compatibility checks
> > >are just the canary in the coal mine.
> >
> > Ok. Then here are two options:
> > 1. KVM always prevents incompatible CPUs from being brought up regardless of running VMs
> > 2. make "disabling KVM on incompatible CPUs" an opt-in feature.
> >
> > Which one do you think is better?
>
> IMO, #1.  It's simpler to implement and document, and is less likely to surprise
> the user.  We can always pivot to #2 _if_ anyone requests the ability to dynamically
> disable KVM in order to bring up heterogenous CPUs and has a reasonable, sane use
> case for doing so.  But that's a big "if" as I would be very surprised if it's even
> possible to encounter such a setup without a hardware bug, firmware bug, and/or user
> error.

How quickly we forget the Woodcrest B/G fiasco.
diff mbox series

Patch

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c1054604d1e8..0ff80076d48d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -106,6 +106,8 @@  LIST_HEAD(vm_list);
 static cpumask_var_t cpus_hardware_enabled;
 static int kvm_usage_count;
 static atomic_t hardware_enable_failed;
+/* Set if hardware becomes incompatible after CPU hotplug */
+static bool hardware_incompatible;
 
 static struct kmem_cache *kvm_vcpu_cache;
 
@@ -4855,20 +4857,32 @@  static void hardware_enable_nolock(void *junk)
 
 static int kvm_online_cpu(unsigned int cpu)
 {
-	int ret = 0;
+	int ret;
 
+	ret = kvm_arch_check_processor_compat();
 	raw_spin_lock(&kvm_count_lock);
 	/*
 	 * Abort the CPU online process if hardware virtualization cannot
 	 * be enabled. Otherwise running VMs would encounter unrecoverable
 	 * errors when scheduled to this CPU.
 	 */
-	if (kvm_usage_count) {
+	if (!ret && kvm_usage_count) {
 		hardware_enable_nolock(NULL);
 		if (atomic_read(&hardware_enable_failed)) {
 			ret = -EIO;
 			pr_info("kvm: abort onlining CPU%d", cpu);
 		}
+	} else if (ret && !kvm_usage_count) {
+		/*
+		 * Continue onlining an incompatible CPU if no VM is
+		 * running. KVM should reject creating any VM after this
+		 * point. Then this CPU can be still used to run non-VM
+		 * workload.
+		 */
+		ret = 0;
+		hardware_incompatible = true;
+		pr_info("kvm: prohibit VM creation due to incompatible CPU%d",
+			cpu);
 	}
 	raw_spin_unlock(&kvm_count_lock);
 	return ret;
@@ -4913,8 +4927,24 @@  static int hardware_enable_all(void)
 {
 	int r = 0;
 
+	/*
+	 * During onlining a CPU, cpu_online_mask is set before kvm_online_cpu()
+	 * is called. on_each_cpu() between them includes the CPU. As a result,
+	 * hardware_enable_nolock() may get invoked before kvm_online_cpu().
+	 * This would enable hardware virtualization on that cpu without
+	 * compatibility checks, which can potentially crash system or break
+	 * running VMs.
+	 *
+	 * Disable CPU hotplug to prevent this case from happening.
+	 */
+	cpus_read_lock();
 	raw_spin_lock(&kvm_count_lock);
 
+	if (hardware_incompatible) {
+		r = -EIO;
+		goto unlock;
+	}
+
 	kvm_usage_count++;
 	if (kvm_usage_count == 1) {
 		atomic_set(&hardware_enable_failed, 0);
@@ -4926,7 +4956,9 @@  static int hardware_enable_all(void)
 		}
 	}
 
+unlock:
 	raw_spin_unlock(&kvm_count_lock);
+	cpus_read_unlock();
 
 	return r;
 }