diff mbox

[21/31] nVMX: vmcs12 checks on nested entry

Message ID 201105161954.p4GJseNX001963@rice.haifa.ibm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Nadav Har'El May 16, 2011, 7:54 p.m. UTC
This patch adds a bunch of tests of the validity of the vmcs12 fields,
according to what the VMX spec and our implementation allows. If fields
we cannot (or don't want to) honor are discovered, an entry failure is
emulated.

According to the spec, there are two types of entry failures: If the problem
was in vmcs12's host state or control fields, the VMLAUNCH instruction simply
fails. But a problem is found in the guest state, the behavior is more
similar to that of an exit.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
---
 arch/x86/include/asm/vmx.h |    8 ++
 arch/x86/kvm/vmx.c         |   94 +++++++++++++++++++++++++++++++++++
 2 files changed, 102 insertions(+)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Tian, Kevin May 25, 2011, 3:01 a.m. UTC | #1
> From: Nadav Har'El
> Sent: Tuesday, May 17, 2011 3:55 AM
> 
> This patch adds a bunch of tests of the validity of the vmcs12 fields,
> according to what the VMX spec and our implementation allows. If fields
> we cannot (or don't want to) honor are discovered, an entry failure is
> emulated.
> 
> According to the spec, there are two types of entry failures: If the problem
> was in vmcs12's host state or control fields, the VMLAUNCH instruction simply
> fails. But a problem is found in the guest state, the behavior is more
> similar to that of an exit.
> 
> Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
> ---
>  arch/x86/include/asm/vmx.h |    8 ++
>  arch/x86/kvm/vmx.c         |   94
> +++++++++++++++++++++++++++++++++++
>  2 files changed, 102 insertions(+)
> 
> --- .before/arch/x86/kvm/vmx.c	2011-05-16 22:36:49.000000000 +0300
> +++ .after/arch/x86/kvm/vmx.c	2011-05-16 22:36:49.000000000 +0300
> @@ -870,6 +870,10 @@ static inline bool nested_cpu_has2(struc
>  		(vmcs12->secondary_vm_exec_control & bit);
>  }
> 
> +static void nested_vmx_entry_failure(struct kvm_vcpu *vcpu,
> +			struct vmcs12 *vmcs12,
> +			u32 reason, unsigned long qualification);
> +
>  static int __find_msr_index(struct vcpu_vmx *vmx, u32 msr)
>  {
>  	int i;
> @@ -6160,6 +6164,79 @@ static int nested_vmx_run(struct kvm_vcp
> 
>  	vmcs12 = get_vmcs12(vcpu);
> 
> +	/*
> +	 * The nested entry process starts with enforcing various prerequisites
> +	 * on vmcs12 as required by the Intel SDM, and act appropriately when
> +	 * they fail: As the SDM explains, some conditions should cause the
> +	 * instruction to fail, while others will cause the instruction to seem
> +	 * to succeed, but return an EXIT_REASON_INVALID_STATE.
> +	 * To speed up the normal (success) code path, we should avoid checking
> +	 * for misconfigurations which will anyway be caught by the processor
> +	 * when using the merged vmcs02.
> +	 */
> +	if (vmcs12->launch_state == launch) {
> +		nested_vmx_failValid(vcpu,
> +			launch ? VMXERR_VMLAUNCH_NONCLEAR_VMCS
> +			       : VMXERR_VMRESUME_NONLAUNCHED_VMCS);
> +		return 1;
> +	}

from SDM:
	ELSIF (VMLAUNCH and launch state of current VMCS is not "clear")
		THEN VMfailValid(VMLAUNCH with non-clear VMCS);
	ELSIF (VMRESUME and launch state of current VMCS is not "launched")
		THEN VMfailValid(VMRESUME with non-launched VMCS);

So it's legal to use VMLAUNCH on a launched VMCS. However here you
changes this behavior. On the other hand, do you want to add a 'clear' state
along with L1 VMCLEAR to catch the failure here?

Thanks
Kevin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nadav Har'El May 25, 2011, 5:38 a.m. UTC | #2
On Wed, May 25, 2011, Tian, Kevin wrote about "RE: [PATCH 21/31] nVMX: vmcs12 checks on nested entry":
> > +	if (vmcs12->launch_state == launch) {
> > +		nested_vmx_failValid(vcpu,
> > +			launch ? VMXERR_VMLAUNCH_NONCLEAR_VMCS
> > +			       : VMXERR_VMRESUME_NONLAUNCHED_VMCS);
> > +		return 1;
> > +	}
> 
> from SDM:
> 	ELSIF (VMLAUNCH and launch state of current VMCS is not "clear")
> 		THEN VMfailValid(VMLAUNCH with non-clear VMCS);
> 	ELSIF (VMRESUME and launch state of current VMCS is not "launched")
> 		THEN VMfailValid(VMRESUME with non-launched VMCS);
> 
> So it's legal to use VMLAUNCH on a launched VMCS. However here you
> changes this behavior. On the other hand, do you want to add a 'clear' state
> along with L1 VMCLEAR to catch the failure here?

I don't understand: I always understood the spec to mean that "clear" and
"launched" the two opposite states of the "launch state" bit? If it isn't,
what does "clear" mean?

Is it really "legal to use a VMLAUNCH on a launched VMCS"?
If it is, why does KVM, for example, go to great lengths to VMLAUNCH the
first time, and VMRESUME all subsequent times?
Tian, Kevin May 25, 2011, 7:33 a.m. UTC | #3
> From: Nadav Har'El [mailto:nyh@math.technion.ac.il]
> Sent: Wednesday, May 25, 2011 1:38 PM
> 
> On Wed, May 25, 2011, Tian, Kevin wrote about "RE: [PATCH 21/31] nVMX:
> vmcs12 checks on nested entry":
> > > +	if (vmcs12->launch_state == launch) {
> > > +		nested_vmx_failValid(vcpu,
> > > +			launch ? VMXERR_VMLAUNCH_NONCLEAR_VMCS
> > > +			       : VMXERR_VMRESUME_NONLAUNCHED_VMCS);
> > > +		return 1;
> > > +	}
> >
> > from SDM:
> > 	ELSIF (VMLAUNCH and launch state of current VMCS is not "clear")
> > 		THEN VMfailValid(VMLAUNCH with non-clear VMCS);
> > 	ELSIF (VMRESUME and launch state of current VMCS is not "launched")
> > 		THEN VMfailValid(VMRESUME with non-launched VMCS);
> >
> > So it's legal to use VMLAUNCH on a launched VMCS. However here you
> > changes this behavior. On the other hand, do you want to add a 'clear' state
> > along with L1 VMCLEAR to catch the failure here?
> 
> I don't understand: I always understood the spec to mean that "clear" and
> "launched" the two opposite states of the "launch state" bit? If it isn't,
> what does "clear" mean?
> 
> Is it really "legal to use a VMLAUNCH on a launched VMCS"?
> If it is, why does KVM, for example, go to great lengths to VMLAUNCH the
> first time, and VMRESUME all subsequent times?
> 

You're correct. I've got my head messed on this point. :-)

Thanks
Kevin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

--- .before/arch/x86/kvm/vmx.c	2011-05-16 22:36:49.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-16 22:36:49.000000000 +0300
@@ -870,6 +870,10 @@  static inline bool nested_cpu_has2(struc
 		(vmcs12->secondary_vm_exec_control & bit);
 }
 
+static void nested_vmx_entry_failure(struct kvm_vcpu *vcpu,
+			struct vmcs12 *vmcs12,
+			u32 reason, unsigned long qualification);
+
 static int __find_msr_index(struct vcpu_vmx *vmx, u32 msr)
 {
 	int i;
@@ -6160,6 +6164,79 @@  static int nested_vmx_run(struct kvm_vcp
 
 	vmcs12 = get_vmcs12(vcpu);
 
+	/*
+	 * The nested entry process starts with enforcing various prerequisites
+	 * on vmcs12 as required by the Intel SDM, and act appropriately when
+	 * they fail: As the SDM explains, some conditions should cause the
+	 * instruction to fail, while others will cause the instruction to seem
+	 * to succeed, but return an EXIT_REASON_INVALID_STATE.
+	 * To speed up the normal (success) code path, we should avoid checking
+	 * for misconfigurations which will anyway be caught by the processor
+	 * when using the merged vmcs02.
+	 */
+	if (vmcs12->launch_state == launch) {
+		nested_vmx_failValid(vcpu,
+			launch ? VMXERR_VMLAUNCH_NONCLEAR_VMCS
+			       : VMXERR_VMRESUME_NONLAUNCHED_VMCS);
+		return 1;
+	}
+
+	if ((vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_MSR_BITMAPS) &&
+			!IS_ALIGNED(vmcs12->msr_bitmap, PAGE_SIZE)) {
+		/*TODO: Also verify bits beyond physical address width are 0*/
+		nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
+		return 1;
+	}
+
+	if (vmcs12->vm_entry_msr_load_count > 0 ||
+	    vmcs12->vm_exit_msr_load_count > 0 ||
+	    vmcs12->vm_exit_msr_store_count > 0) {
+		if (printk_ratelimit())
+			printk(KERN_WARNING
+			  "%s: VMCS MSR_{LOAD,STORE} unsupported\n", __func__);
+		nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
+		return 1;
+	}
+
+	if (!vmx_control_verify(vmcs12->cpu_based_vm_exec_control,
+	      nested_vmx_procbased_ctls_low, nested_vmx_procbased_ctls_high) ||
+	    !vmx_control_verify(vmcs12->secondary_vm_exec_control,
+	      nested_vmx_secondary_ctls_low, nested_vmx_secondary_ctls_high) ||
+	    !vmx_control_verify(vmcs12->pin_based_vm_exec_control,
+	      nested_vmx_pinbased_ctls_low, nested_vmx_pinbased_ctls_high) ||
+	    !vmx_control_verify(vmcs12->vm_exit_controls,
+	      nested_vmx_exit_ctls_low, nested_vmx_exit_ctls_high) ||
+	    !vmx_control_verify(vmcs12->vm_entry_controls,
+	      nested_vmx_entry_ctls_low, nested_vmx_entry_ctls_high))
+	{
+		nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
+		return 1;
+	}
+
+	if (((vmcs12->host_cr0 & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON) ||
+	    ((vmcs12->host_cr4 & VMXON_CR4_ALWAYSON) != VMXON_CR4_ALWAYSON)) {
+		nested_vmx_failValid(vcpu,
+			VMXERR_ENTRY_INVALID_HOST_STATE_FIELD);
+		return 1;
+	}
+
+	if (((vmcs12->guest_cr0 & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON) ||
+	    ((vmcs12->guest_cr4 & VMXON_CR4_ALWAYSON) != VMXON_CR4_ALWAYSON)) {
+		nested_vmx_entry_failure(vcpu, vmcs12,
+			EXIT_REASON_INVALID_STATE, ENTRY_FAIL_DEFAULT);
+		return 1;
+	}
+	if (vmcs12->vmcs_link_pointer != -1ull) {
+		nested_vmx_entry_failure(vcpu, vmcs12,
+			EXIT_REASON_INVALID_STATE, ENTRY_FAIL_VMCS_LINK_PTR);
+		return 1;
+	}
+
+	/*
+	 * We're finally done with prerequisite checking, and can start with
+	 * the nested entry.
+	 */
+
 	enter_guest_mode(vcpu);
 
 	vmx->nested.vmcs01_tsc_offset = vmcs_read64(TSC_OFFSET);
@@ -6460,6 +6537,23 @@  static void nested_vmx_vmexit(struct kvm
 		nested_vmx_succeed(vcpu);
 }
 
+/*
+ * L1's failure to enter L2 is a subset of a normal exit, as explained in
+ * 23.7 "VM-entry failures during or after loading guest state" (this also
+ * lists the acceptable exit-reason and exit-qualification parameters).
+ * It should only be called before L2 actually succeeded to run, and when
+ * vmcs01 is current (it doesn't leave_guest_mode() or switch vmcss).
+ */
+static void nested_vmx_entry_failure(struct kvm_vcpu *vcpu,
+			struct vmcs12 *vmcs12,
+			u32 reason, unsigned long qualification)
+{
+	load_vmcs12_host_state(vcpu, vmcs12);
+	vmcs12->vm_exit_reason = reason | VMX_EXIT_REASONS_FAILED_VMENTRY;
+	vmcs12->exit_qualification = qualification;
+	nested_vmx_succeed(vcpu);
+}
+
 static int vmx_check_intercept(struct kvm_vcpu *vcpu,
 			       struct x86_instruction_info *info,
 			       enum x86_intercept_stage stage)
--- .before/arch/x86/include/asm/vmx.h	2011-05-16 22:36:49.000000000 +0300
+++ .after/arch/x86/include/asm/vmx.h	2011-05-16 22:36:49.000000000 +0300
@@ -427,6 +427,14 @@  struct vmx_msr_entry {
 } __aligned(16);
 
 /*
+ * Exit Qualifications for entry failure during or after loading guest state
+ */
+#define ENTRY_FAIL_DEFAULT		0
+#define ENTRY_FAIL_PDPTE		2
+#define ENTRY_FAIL_NMI			3
+#define ENTRY_FAIL_VMCS_LINK_PTR	4
+
+/*
  * VM-instruction error numbers
  */
 enum vm_instruction_error_number {