From patchwork Sun Oct 17 10:05:06 2010
X-Patchwork-Submitter: Nadav Har'El
X-Patchwork-Id: 259741
Date: Sun, 17 Oct 2010 12:05:06 +0200
Message-Id: <201010171005.o9HA56HQ029295@rice.haifa.ibm.com>
From: "Nadav Har'El"
To: kvm@vger.kernel.org
Cc: gleb@redhat.com, avi@redhat.com
References: <1287309814-nyh@il.ibm.com>
Subject: [PATCH 03/27] nVMX: Implement VMXON and VMXOFF

--- .before/arch/x86/kvm/vmx.c	2010-10-17 11:52:00.000000000 +0200
+++ .after/arch/x86/kvm/vmx.c	2010-10-17 11:52:00.000000000 +0200
@@ -127,6 +127,17 @@ struct shared_msr_entry {
 	u64 mask;
 };
 
+/*
+ * The nested_vmx structure is part of vcpu_vmx, and holds information we need
+ * for correct emulation of VMX (i.e., nested VMX) on this vcpu. For example,
+ * the current VMCS set by L1, a list of the VMCSs used to run the active
+ * L2 guests on the hardware, and more.
+ */
+struct nested_vmx {
+	/* Has the level 1 guest done vmxon? */
+	bool vmxon;
+};
+
 struct vcpu_vmx {
 	struct kvm_vcpu vcpu;
 	struct list_head local_vcpus_link;
@@ -174,6 +185,9 @@ struct vcpu_vmx {
 	u32 exit_reason;
 
 	bool rdtscp_enabled;
+
+	/* Support for a guest hypervisor (nested VMX) */
+	struct nested_vmx nested;
 };
 
 static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
@@ -3364,6 +3378,90 @@ static int handle_vmx_insn(struct kvm_vc
 	return 1;
 }
 
+/*
+ * Emulate the VMXON instruction.
+ * Currently, we just remember that VMX is active, and do not save or even
+ * inspect the argument to VMXON (the so-called "VMXON pointer") because we
+ * do not currently need to store anything in that guest-allocated memory
+ * region. Consequently, VMCLEAR and VMPTRLD also do not verify that their
+ * argument is different from the VMXON pointer (which the spec says they
+ * should).
+ */
+static int handle_vmon(struct kvm_vcpu *vcpu)
+{
+	struct kvm_segment cs;
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	/* The Intel VMX Instruction Reference lists a bunch of bits that
+	 * are prerequisite to running VMXON, most notably CR4.VMXE must be
+	 * set to 1. Otherwise, we should fail with #UD. We test these now:
+	 */
+	if (!nested ||
+	    !kvm_read_cr4_bits(vcpu, X86_CR4_VMXE) ||
+	    !kvm_read_cr0_bits(vcpu, X86_CR0_PE) ||
+	    (vmx_get_rflags(vcpu) & X86_EFLAGS_VM)) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return 1;
+	}
+
+	vmx_get_segment(vcpu, &cs, VCPU_SREG_CS);
+	if (is_long_mode(vcpu) && !cs.l) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return 1;
+	}
+
+	if (vmx_get_cpl(vcpu)) {
+		kvm_inject_gp(vcpu, 0);
+		return 1;
+	}
+
+	vmx->nested.vmxon = true;
+
+	skip_emulated_instruction(vcpu);
+	return 1;
+}
+
+/*
+ * Intel's VMX Instruction Reference specifies a common set of prerequisites
+ * for running VMX instructions (except VMXON, whose prerequisites are
+ * slightly different). It also specifies what exception to inject otherwise.
+ */
+static int nested_vmx_check_permission(struct kvm_vcpu *vcpu)
+{
+	struct kvm_segment cs;
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	if (!vmx->nested.vmxon) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return 0;
+	}
+
+	vmx_get_segment(vcpu, &cs, VCPU_SREG_CS);
+	if ((vmx_get_rflags(vcpu) & X86_EFLAGS_VM) ||
+	    (is_long_mode(vcpu) && !cs.l)) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return 0;
+	}
+
+	if (vmx_get_cpl(vcpu)) {
+		kvm_inject_gp(vcpu, 0);
+		return 0;
+	}
+
+	return 1;
+}
+
+/* Emulate the VMXOFF instruction */
+static int handle_vmoff(struct kvm_vcpu *vcpu)
+{
+	if (!nested_vmx_check_permission(vcpu))
+		return 1;
+
+	to_vmx(vcpu)->nested.vmxon = false;
+
+	skip_emulated_instruction(vcpu);
+	return 1;
+}
+
 static int handle_invlpg(struct kvm_vcpu *vcpu)
 {
 	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
@@ -3673,8 +3771,8 @@ static int (*kvm_vmx_exit_handlers[])(st
 	[EXIT_REASON_VMREAD]                  = handle_vmx_insn,
 	[EXIT_REASON_VMRESUME]                = handle_vmx_insn,
 	[EXIT_REASON_VMWRITE]                 = handle_vmx_insn,
-	[EXIT_REASON_VMOFF]                   = handle_vmx_insn,
-	[EXIT_REASON_VMON]                    = handle_vmx_insn,
+	[EXIT_REASON_VMOFF]                   = handle_vmoff,
+	[EXIT_REASON_VMON]                    = handle_vmon,
 	[EXIT_REASON_TPR_BELOW_THRESHOLD]     = handle_tpr_below_threshold,
 	[EXIT_REASON_APIC_ACCESS]             = handle_apic_access,
 	[EXIT_REASON_WBINVD]                  = handle_wbinvd,
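
As context for the prerequisite checks in handle_vmon() above, here is a
minimal, hypothetical sketch of the L1 (guest hypervisor) side of this
handshake: enabling CR4.VMXE and executing VMXON at CPL 0 in protected mode.
It is not part of the patch; the names l1_enable_vmx, read_cr4/write_cr4 and
the vmxon_region_pa parameter are made up for illustration. Since this patch
does not yet inspect the VMXON pointer, the region's contents do not matter
here, though real hardware expects its first dword to hold the VMCS revision
identifier.

/*
 * Hypothetical L1-guest-side sketch (not part of this patch): satisfy the
 * same prerequisites handle_vmon() tests, then execute VMXON.
 */
#include <stdint.h>

#define CR4_VMXE	(1UL << 13)	/* CR4.VMXE, required before VMXON */

static inline unsigned long read_cr4(void)
{
	unsigned long cr4;
	asm volatile("mov %%cr4, %0" : "=r" (cr4));
	return cr4;
}

static inline void write_cr4(unsigned long cr4)
{
	asm volatile("mov %0, %%cr4" : : "r" (cr4));
}

/*
 * vmxon_region_pa: guest-physical address of a page-aligned 4 KB region.
 * Must be called at CPL 0; VMXON takes an m64 operand holding that address.
 */
static int l1_enable_vmx(uint64_t vmxon_region_pa)
{
	uint8_t failed;

	write_cr4(read_cr4() | CR4_VMXE);

	/* On failure VMXON sets CF or ZF; setna captures either flag. */
	asm volatile("vmxon %1; setna %0"
		     : "=q" (failed)
		     : "m" (vmxon_region_pa)
		     : "cc", "memory");

	return failed ? -1 : 0;
}

Executing VMXOFF later simply undoes this handshake, which is exactly what
handle_vmoff() above emulates by clearing nested.vmxon.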