From patchwork Thu Jan 27 08:31:53 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nadav Har'El X-Patchwork-Id: 510491 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by demeter1.kernel.org (8.14.4/8.14.3) with ESMTP id p0R8Vxcf011652 for ; Thu, 27 Jan 2011 08:31:59 GMT Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752023Ab1A0Ib5 (ORCPT ); Thu, 27 Jan 2011 03:31:57 -0500 Received: from mtagate6.uk.ibm.com ([194.196.100.166]:54392 "EHLO mtagate6.uk.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751747Ab1A0Ib4 (ORCPT ); Thu, 27 Jan 2011 03:31:56 -0500 Received: from d06nrmr1707.portsmouth.uk.ibm.com (d06nrmr1707.portsmouth.uk.ibm.com [9.149.39.225]) by mtagate6.uk.ibm.com (8.13.1/8.13.1) with ESMTP id p0R8VtNq009221 for ; Thu, 27 Jan 2011 08:31:55 GMT Received: from d06av06.portsmouth.uk.ibm.com (d06av06.portsmouth.uk.ibm.com [9.149.37.217]) by d06nrmr1707.portsmouth.uk.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id p0R8Vwmd1822940 for ; Thu, 27 Jan 2011 08:31:58 GMT Received: from d06av06.portsmouth.uk.ibm.com (loopback [127.0.0.1]) by d06av06.portsmouth.uk.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id p0R8Vtra023392 for ; Thu, 27 Jan 2011 01:31:55 -0700 Received: from rice.haifa.ibm.com (rice.haifa.ibm.com [9.148.8.217]) by d06av06.portsmouth.uk.ibm.com (8.14.4/8.13.1/NCO v10.0 AVin) with ESMTP id p0R8VsJC023382 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 27 Jan 2011 01:31:54 -0700 Received: from rice.haifa.ibm.com (lnx-nyh.haifa.ibm.com [127.0.0.1]) by rice.haifa.ibm.com (8.14.4/8.14.4) with ESMTP id p0R8Vrim002413; Thu, 27 Jan 2011 10:31:53 +0200 Received: (from nyh@localhost) by rice.haifa.ibm.com (8.14.4/8.14.4/Submit) id p0R8VrP1002411; Thu, 27 Jan 2011 10:31:53 +0200 Date: Thu, 27 Jan 2011 10:31:53 +0200 Message-Id: <201101270831.p0R8VrP1002411@rice.haifa.ibm.com> X-Authentication-Warning: rice.haifa.ibm.com: nyh set sender to "Nadav Har'El" using -f Cc: gleb@redhat.com, avi@redhat.com To: kvm@vger.kernel.org From: "Nadav Har'El" References: <1296116987-nyh@il.ibm.com> Subject: [PATCH 04/29] nVMX: Introduce vmcs12: a VMCS structure for L1 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Greylist: IP, sender and recipient auto-whitelisted, not delayed by milter-greylist-4.2.6 (demeter1.kernel.org [140.211.167.41]); Thu, 27 Jan 2011 08:32:00 +0000 (UTC) --- .before/arch/x86/kvm/vmx.c 2011-01-26 18:06:03.000000000 +0200 +++ .after/arch/x86/kvm/vmx.c 2011-01-26 18:06:03.000000000 +0200 @@ -131,6 +131,34 @@ struct shared_msr_entry { }; /* + * struct vmcs12 describes the state that our guest hypervisor (L1) keeps for a + * single nested guest (L2), hence the name vmcs12. Any VMX implementation has + * a VMCS structure, and vmcs12 is our emulated VMX's VMCS. This structure is + * stored in guest memory specified by VMPTRLD, but is opaque to the guest, + * which must access it using VMREAD/VMWRITE/VMCLEAR instructions. More + * than one of these structures may exist, if L1 runs multiple L2 guests. + * nested_vmx_run() will use the data here to build a vmcs02: a VMCS for the + * underlying hardware which will be used to run L2. + * This structure is packed in order to preserve the binary content after live + * migration. If there are changes in the content or layout, VMCS12_REVISION + * must be changed. + */ +struct __packed vmcs12 { + /* According to the Intel spec, a VMCS region must start with the + * following two fields. Then follow implementation-specific data. + */ + u32 revision_id; + u32 abort; +}; + +/* + * VMCS12_REVISION is an arbitrary id that should be changed if the content or + * layout of struct vmcs12 is changed. MSR_IA32_VMX_BASIC returns this id, and + * VMPTRLD verifies that the VMCS region that L1 is loading contains this id. + */ +#define VMCS12_REVISION 0x11e57ed0 + +/* * The nested_vmx structure is part of vcpu_vmx, and holds information we need * for correct emulation of VMX (i.e., nested VMX) on this vcpu. For example, * the current VMCS set by L1, a list of the VMCSs used to run the active @@ -139,6 +167,12 @@ struct shared_msr_entry { struct nested_vmx { /* Has the level1 guest done vmxon? */ bool vmxon; + + /* The guest-physical address of the current VMCS L1 keeps for L2 */ + gpa_t current_vmptr; + /* The host-usable pointer to the above */ + struct page *current_vmcs12_page; + struct vmcs12 *current_vmcs12; }; struct vcpu_vmx { @@ -198,6 +232,26 @@ static inline struct vcpu_vmx *to_vmx(st return container_of(vcpu, struct vcpu_vmx, vcpu); } +static struct page *nested_get_page(struct kvm_vcpu *vcpu, gpa_t addr) +{ + struct page *page = gfn_to_page(vcpu->kvm, addr >> PAGE_SHIFT); + if (is_error_page(page)) { + kvm_release_page_clean(page); + return NULL; + } + return page; +} + +static void nested_release_page(struct page *page) +{ + kvm_release_page_dirty(page); +} + +static void nested_release_page_clean(struct page *page) +{ + kvm_release_page_clean(page); +} + static int init_rmode(struct kvm *kvm); static u64 construct_eptp(unsigned long root_hpa); static void kvm_cpu_vmxon(u64 addr); @@ -3865,6 +3919,11 @@ static void free_nested(struct vcpu_vmx if (!vmx->nested.vmxon) return; vmx->nested.vmxon = false; + if (vmx->nested.current_vmptr != -1ull) { + kunmap(vmx->nested.current_vmcs12_page); + nested_release_page(vmx->nested.current_vmcs12_page); + vmx->nested.current_vmptr = -1ull; + } } /* Emulate the VMXOFF instruction */ @@ -4300,6 +4359,7 @@ static void vmx_free_vcpu(struct kvm_vcp struct vcpu_vmx *vmx = to_vmx(vcpu); free_vpid(vmx); + free_nested(vmx); vmx_free_vmcs(vcpu); kfree(vmx->guest_msrs); kvm_vcpu_uninit(vcpu); @@ -4366,6 +4426,9 @@ static struct kvm_vcpu *vmx_create_vcpu( goto free_vmcs; } + vmx->nested.current_vmptr = -1ull; + vmx->nested.current_vmcs12 = NULL; + return &vmx->vcpu; free_vmcs: