From patchwork Sun Jun 13 12:24:06 2010
X-Patchwork-Submitter: Nadav Har'El
X-Patchwork-Id: 105782
Date: Sun, 13 Jun 2010 15:24:06 +0300
Message-Id: <201006131224.o5DCO63N012897@rice.haifa.ibm.com>
From: "Nadav Har'El"
To: avi@redhat.com
Cc: kvm@vger.kernel.org
References: <1276431753-nyh@il.ibm.com>
Subject: [PATCH 3/24] Implement VMXON and VMXOFF

---
--- .before/arch/x86/kvm/vmx.c	2010-06-13 15:01:28.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2010-06-13 15:01:28.000000000 +0300
@@ -117,6 +117,16 @@ struct shared_msr_entry {
 	u64 mask;
 };
 
+/* The nested_vmx structure is part of vcpu_vmx, and holds information we need
+ * for correct emulation of VMX (i.e., nested VMX) on this vcpu. For example,
+ * the current VMCS set by L1, a list of the VMCSs used to run the active
+ * L2 guests on the hardware, and more.
+ */
+struct nested_vmx {
+	/* Has the level-1 guest done VMXON? */
+	bool vmxon;
+};
+
 struct vcpu_vmx {
 	struct kvm_vcpu vcpu;
 	struct list_head local_vcpus_link;
@@ -168,6 +178,9 @@ struct vcpu_vmx {
 	u32 exit_reason;
 
 	bool rdtscp_enabled;
+
+	/* Support for guest hypervisors (nested VMX) */
+	struct nested_vmx nested;
 };
 
 static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
@@ -3353,6 +3366,93 @@ static int handle_vmx_insn(struct kvm_vc
 	return 1;
 }
 
+/* Emulate the VMXON instruction.
+ * Currently, we just remember that VMX is active, and do not save or even
+ * inspect the argument to VMXON (the so-called "VMXON pointer") because we
+ * do not currently need to store anything in that guest-allocated memory
+ * region. Consequently, VMCLEAR and VMPTRLD also do not verify that their
+ * argument is different from the VMXON pointer (which the spec says they do).
+ */
+static int handle_vmon(struct kvm_vcpu *vcpu)
+{
+	struct kvm_segment cs;
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	/* The Intel VMX Instruction Reference lists a bunch of bits that
+	 * are prerequisite to running VMXON, most notably CR4.VMXE must be
+	 * set to 1. Otherwise, we should fail with #UD. We test these now:
+	 */
+	if (!nested) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return 1;
+	}
+
+	if (!(vcpu->arch.cr4 & X86_CR4_VMXE) ||
+	    !(vcpu->arch.cr0 & X86_CR0_PE) ||
+	    (vmx_get_rflags(vcpu) & X86_EFLAGS_VM)) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return 1;
+	}
+
+	vmx_get_segment(vcpu, &cs, VCPU_SREG_CS);
+	if (is_long_mode(vcpu) && !cs.l) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return 1;
+	}
+
+	if (vmx_get_cpl(vcpu)) {
+		kvm_inject_gp(vcpu, 0);
+		return 1;
+	}
+
+	vmx->nested.vmxon = 1;
+
+	skip_emulated_instruction(vcpu);
+	return 1;
+}
+
+/*
+ * Intel's VMX Instruction Reference specifies a common set of prerequisites
+ * for running VMX instructions (except VMXON, whose prerequisites are
+ * slightly different). It also specifies what exception to inject otherwise.
+ */
+static int nested_vmx_check_permission(struct kvm_vcpu *vcpu)
+{
+	struct kvm_segment cs;
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	if (!vmx->nested.vmxon) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return 0;
+	}
+
+	vmx_get_segment(vcpu, &cs, VCPU_SREG_CS);
+	if ((vmx_get_rflags(vcpu) & X86_EFLAGS_VM) ||
+	    (is_long_mode(vcpu) && !cs.l)) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return 0;
+	}
+
+	if (vmx_get_cpl(vcpu)) {
+		kvm_inject_gp(vcpu, 0);
+		return 0;
+	}
+
+	return 1;
+}
+
+/* Emulate the VMXOFF instruction */
+static int handle_vmoff(struct kvm_vcpu *vcpu)
+{
+	if (!nested_vmx_check_permission(vcpu))
+		return 1;
+
+	to_vmx(vcpu)->nested.vmxon = 0;
+
+	skip_emulated_instruction(vcpu);
+	return 1;
+}
+
 static int handle_invlpg(struct kvm_vcpu *vcpu)
 {
 	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
@@ -3642,8 +3742,8 @@ static int (*kvm_vmx_exit_handlers[])(st
 	[EXIT_REASON_VMREAD]                  = handle_vmx_insn,
 	[EXIT_REASON_VMRESUME]                = handle_vmx_insn,
 	[EXIT_REASON_VMWRITE]                 = handle_vmx_insn,
-	[EXIT_REASON_VMOFF]                   = handle_vmx_insn,
-	[EXIT_REASON_VMON]                    = handle_vmx_insn,
+	[EXIT_REASON_VMOFF]                   = handle_vmoff,
+	[EXIT_REASON_VMON]                    = handle_vmon,
 	[EXIT_REASON_TPR_BELOW_THRESHOLD]     = handle_tpr_below_threshold,
 	[EXIT_REASON_APIC_ACCESS]             = handle_apic_access,
 	[EXIT_REASON_WBINVD]                  = handle_wbinvd,
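
For readers unfamiliar with the guest side of this exit, the sketch below shows
the privileged L1 code whose VMXON triggers the EXIT_REASON_VMON exit that
handle_vmon() above intercepts. It is illustrative only and not part of the
patch: l1_vmxon(), rdmsr64() and the vmxon_region_* parameters are hypothetical
names, error handling is elided, and ring-0 execution in a protected-mode or
64-bit L1 guest is assumed (IA32_FEATURE_CONTROL enabling and the CR0/CR4
fixed-bit checks that real VMXON also requires are omitted for brevity).

/*
 * Illustrative sketch, not part of this patch: minimal L1-guest-side VMXON.
 * l1_vmxon() and rdmsr64() are hypothetical helpers; the caller supplies a
 * 4KB-aligned, 4KB-sized VMXON region (virtual and physical address).
 */
#include <stdint.h>

#define MSR_IA32_VMX_BASIC	0x480
#define X86_CR4_VMXE		(1UL << 13)

static inline uint64_t rdmsr64(uint32_t msr)
{
	uint32_t lo, hi;

	asm volatile("rdmsr" : "=a"(lo), "=d"(hi) : "c"(msr));
	return ((uint64_t)hi << 32) | lo;
}

static int l1_vmxon(uint32_t *vmxon_region_va, uint64_t vmxon_region_pa)
{
	unsigned long cr4;
	uint8_t failed;

	/* Set CR4.VMXE first; handle_vmon() injects #UD without it */
	asm volatile("mov %%cr4, %0" : "=r"(cr4));
	cr4 |= X86_CR4_VMXE;
	asm volatile("mov %0, %%cr4" : : "r"(cr4));

	/*
	 * The spec requires the VMXON region to begin with the VMCS
	 * revision identifier from IA32_VMX_BASIC[30:0].
	 */
	vmxon_region_va[0] = (uint32_t)rdmsr64(MSR_IA32_VMX_BASIC);

	/*
	 * VMXON takes the region's physical address as an m64 operand
	 * and sets CF on failure. Run under KVM, this instruction causes
	 * the EXIT_REASON_VMON exit handled by handle_vmon() above.
	 */
	asm volatile("vmxon %1; setc %0"
		     : "=qm"(failed)
		     : "m"(vmxon_region_pa)
		     : "cc", "memory");
	return failed ? -1 : 0;
}

Because this patch deliberately never reads the VMXON pointer, the only checks
an L1 guest must pass to get through handle_vmon() are the CR4.VMXE, CR0.PE,
RFLAGS.VM, CS.L and CPL tests shown above; a well-behaved guest still
initializes the region's revision identifier as the spec (and real hardware)
requires. The handler then merely records vmx->nested.vmxon = 1 and skips the
instruction; later patches in this 24-patch series build on that state.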