From patchwork Wed Dec 8 17:01:29 2010
X-Patchwork-Submitter: Nadav Har'El
X-Patchwork-Id: 391062
Date: Wed, 8 Dec 2010 19:01:29 +0200
Message-Id: <201012081701.oB8H1TI5008577@rice.haifa.ibm.com>
From: "Nadav Har'El"
To: kvm@vger.kernel.org
Cc: gleb@redhat.com, avi@redhat.com
References: <1291827596-nyh@il.ibm.com>
Subject: [PATCH 03/28] nVMX: Implement VMXON and VMXOFF

--- .before/arch/x86/kvm/vmx.c	2010-12-08 18:56:48.000000000 +0200
+++ .after/arch/x86/kvm/vmx.c	2010-12-08 18:56:48.000000000 +0200
@@ -127,6 +127,17 @@ struct shared_msr_entry {
 	u64 mask;
 };
 
+/*
+ * The nested_vmx structure is part of vcpu_vmx, and holds information we need
+ * for correct emulation of VMX (i.e., nested VMX) on this vcpu. For example,
+ * it holds the current VMCS set by L1, a list of the VMCSs used to run the
+ * active L2 guests on the hardware, and more.
+ */
+struct nested_vmx {
+	/* Has the level-1 guest executed VMXON? */
+	bool vmxon;
+};
+
 struct vcpu_vmx {
 	struct kvm_vcpu vcpu;
 	struct list_head local_vcpus_link;
@@ -174,6 +185,9 @@ struct vcpu_vmx {
 	u32 exit_reason;
 
 	bool rdtscp_enabled;
+
+	/* Support for a guest hypervisor (nested VMX) */
+	struct nested_vmx nested;
 };
 
 static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
@@ -3653,6 +3667,90 @@ static int handle_invalid_op(struct kvm_
 }
 
 /*
+ * Emulate the VMXON instruction.
+ * Currently, we just remember that VMX is active, and do not save or even
+ * inspect the argument to VMXON (the so-called "VMXON pointer") because we
+ * do not currently need to store anything in that guest-allocated memory
+ * region. Consequently, VMCLEAR and VMPTRLD also do not verify that their
+ * argument differs from the VMXON pointer (which the spec says they should).
+ */
+static int handle_vmon(struct kvm_vcpu *vcpu)
+{
+	struct kvm_segment cs;
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	/* The Intel VMX Instruction Reference lists several prerequisites
+	 * for executing VMXON; most notably, CR4.VMXE must be set to 1.
+	 * Otherwise, we must fail with #UD. We test these conditions here:
+	 */
+	if (!nested ||
+	    !kvm_read_cr4_bits(vcpu, X86_CR4_VMXE) ||
+	    !kvm_read_cr0_bits(vcpu, X86_CR0_PE) ||
+	    (vmx_get_rflags(vcpu) & X86_EFLAGS_VM)) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return 1;
+	}
+
+	vmx_get_segment(vcpu, &cs, VCPU_SREG_CS);
+	if (is_long_mode(vcpu) && !cs.l) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return 1;
+	}
+
+	if (vmx_get_cpl(vcpu)) {
+		kvm_inject_gp(vcpu, 0);
+		return 1;
+	}
+
+	vmx->nested.vmxon = true;
+
+	skip_emulated_instruction(vcpu);
+	return 1;
+}
+
+/*
+ * Intel's VMX Instruction Reference specifies a common set of prerequisites
+ * for running VMX instructions (except VMXON, whose prerequisites are
+ * slightly different), and which exception to inject when they are not met.
+ */
+static int nested_vmx_check_permission(struct kvm_vcpu *vcpu)
+{
+	struct kvm_segment cs;
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	if (!vmx->nested.vmxon) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return 0;
+	}
+
+	vmx_get_segment(vcpu, &cs, VCPU_SREG_CS);
+	if ((vmx_get_rflags(vcpu) & X86_EFLAGS_VM) ||
+	    (is_long_mode(vcpu) && !cs.l)) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return 0;
+	}
+
+	if (vmx_get_cpl(vcpu)) {
+		kvm_inject_gp(vcpu, 0);
+		return 0;
+	}
+
+	return 1;
+}
+
+/* Emulate the VMXOFF instruction */
+static int handle_vmoff(struct kvm_vcpu *vcpu)
+{
+	if (!nested_vmx_check_permission(vcpu))
+		return 1;
+
+	to_vmx(vcpu)->nested.vmxon = false;
+
+	skip_emulated_instruction(vcpu);
+	return 1;
+}
+
+/*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume. Otherwise they set the kvm_run parameter to indicate what needs
  * to be done to userspace and return 0.
@@ -3680,8 +3778,8 @@ static int (*kvm_vmx_exit_handlers[])(st
 	[EXIT_REASON_VMREAD]                  = handle_vmx_insn,
 	[EXIT_REASON_VMRESUME]                = handle_vmx_insn,
 	[EXIT_REASON_VMWRITE]                 = handle_vmx_insn,
-	[EXIT_REASON_VMOFF]                   = handle_vmx_insn,
-	[EXIT_REASON_VMON]                    = handle_vmx_insn,
+	[EXIT_REASON_VMOFF]                   = handle_vmoff,
+	[EXIT_REASON_VMON]                    = handle_vmon,
 	[EXIT_REASON_TPR_BELOW_THRESHOLD]     = handle_tpr_below_threshold,
 	[EXIT_REASON_APIC_ACCESS]             = handle_apic_access,
 	[EXIT_REASON_WBINVD]                  = handle_wbinvd,
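For reference, the guest (L1) side of what this patch emulates looks roughly
like the sketch below. This is illustrative only and not part of the patch:
the helper names (l1_vmxon, l1_vmxoff) are invented, the VMXON region is
assumed to be identity-mapped, and real guest code must first set CR4.VMXE
and write the VMCS revision identifier from the IA32_VMX_BASIC MSR into the
region's first dword; this handler never inspects the region, but the
architecture still requires a valid 4 KiB-aligned pointer as the operand.

/*
 * Hypothetical L1-guest sketch (not part of this patch): exercise the
 * emulated VMXON/VMXOFF path. Helper names are invented, and physical ==
 * virtual addressing is assumed for vmxon_region.
 */
#include <stdint.h>

static uint8_t vmxon_region[4096] __attribute__((aligned(4096)));

static inline int l1_vmxon(void)
{
	uint64_t pa = (uint64_t)vmxon_region;
	uint8_t fail;

	/*
	 * CR4.VMXE must already be set, or this raises #UD (the first
	 * check in handle_vmon above). VMXON takes the 64-bit physical
	 * address of the region as a memory operand; CF/ZF signal
	 * VMfailInvalid/VMfailValid.
	 */
	asm volatile("vmxon %1; setna %0"
		     : "=q"(fail) : "m"(pa) : "cc", "memory");
	return fail;	/* 0 on success */
}

static inline void l1_vmxoff(void)
{
	asm volatile("vmxoff" ::: "cc");
}

With this patch applied on the L0 host, the pair above should cause
EXIT_REASON_VMON and EXIT_REASON_VMOFF exits that simply toggle
vmx->nested.vmxon, with #UD or #GP injected into L1 if the prerequisite
checks fail.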