From patchwork Wed Dec 8 17:02:31 2010
X-Patchwork-Submitter: Nadav Har'El
X-Patchwork-Id: 391082
Date: Wed, 8 Dec 2010 19:02:31 +0200
From: "Nadav Har'El"
To: kvm@vger.kernel.org
Cc: gleb@redhat.com, avi@redhat.com
Message-Id: <201012081702.oB8H2V6S008602@rice.haifa.ibm.com>
References: <1291827596-nyh@il.ibm.com>
Subject: [PATCH 05/28] nVMX: Introduce vmcs12: a VMCS structure for L1

--- .before/arch/x86/kvm/vmx.c	2010-12-08 18:56:49.000000000 +0200
+++ .after/arch/x86/kvm/vmx.c	2010-12-08 18:56:49.000000000 +0200
@@ -128,6 +128,34 @@ struct shared_msr_entry {
 };
 
 /*
+ * struct vmcs12 describes the state that our guest hypervisor (L1) keeps for a
+ * single nested guest (L2), hence the name vmcs12. Any VMX implementation has
+ * a VMCS structure, and vmcs12 is our emulated VMX's VMCS. This structure is
+ * stored in guest memory specified by VMPTRLD, but is opaque to the guest,
+ * which must access it using VMREAD/VMWRITE/VMCLEAR instructions. More
+ * than one of these structures may exist, if L1 runs multiple L2 guests.
+ * nested_vmx_run() will use the data here to build a vmcs02: a VMCS for the
+ * underlying hardware which will be used to run L2.
+ * This structure is packed in order to preserve the binary content after live
+ * migration. If there are changes in the content or layout, VMCS12_REVISION
+ * must be changed.
+ */
+struct __packed vmcs12 {
+	/* According to the Intel spec, a VMCS region must start with the
+	 * following two fields. Then follow implementation-specific data.
+	 */
+	u32 revision_id;
+	u32 abort;
+};
+
+/*
+ * VMCS12_REVISION is an arbitrary id that should be changed if the content or
+ * layout of struct vmcs12 is changed. MSR_IA32_VMX_BASIC returns this id, and
+ * VMPTRLD verifies that the VMCS region that L1 is loading contains this id.
+ */
+#define VMCS12_REVISION 0x11e57ed0
+
+/*
  * The nested_vmx structure is part of vcpu_vmx, and holds information we need
  * for correct emulation of VMX (i.e., nested VMX) on this vcpu. For example,
  * the current VMCS set by L1, a list of the VMCSs used to run the active
@@ -136,6 +164,12 @@ struct shared_msr_entry {
 struct nested_vmx {
 	/* Has the level1 guest done vmxon? */
 	bool vmxon;
+
+	/* The guest-physical address of the current VMCS L1 keeps for L2 */
+	gpa_t current_vmptr;
+	/* The host-usable pointer to the above */
+	struct page *current_vmcs12_page;
+	struct vmcs12 *current_vmcs12;
 };
 
 struct vcpu_vmx {
@@ -195,6 +229,28 @@ static inline struct vcpu_vmx *to_vmx(st
 	return container_of(vcpu, struct vcpu_vmx, vcpu);
 }
 
+static struct page *nested_get_page(struct kvm_vcpu *vcpu, gpa_t addr)
+{
+	struct page *page = gfn_to_page(vcpu->kvm, addr >> PAGE_SHIFT);
+	if (is_error_page(page)) {
+		kvm_release_page_clean(page);
+		return NULL;
+	}
+	return page;
+}
+
+static void nested_release_page(struct page *page)
+{
+	kunmap(page);
+	kvm_release_page_dirty(page);
+}
+
+static void nested_release_page_clean(struct page *page)
+{
+	kunmap(page);
+	kvm_release_page_clean(page);
+}
+
 static int init_rmode(struct kvm *kvm);
 static u64 construct_eptp(unsigned long root_hpa);
 static void kvm_cpu_vmxon(u64 addr);
@@ -3755,6 +3811,9 @@ static int handle_vmoff(struct kvm_vcpu
 
 	to_vmx(vcpu)->nested.vmxon = false;
 
+	if (to_vmx(vcpu)->nested.current_vmptr != -1ull)
+		nested_release_page(to_vmx(vcpu)->nested.current_vmcs12_page);
+
 	skip_emulated_instruction(vcpu);
 	return 1;
 }
@@ -4183,6 +4242,8 @@ static void vmx_free_vcpu(struct kvm_vcp
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
 	free_vpid(vmx);
+	if (vmx->nested.vmxon && to_vmx(vcpu)->nested.current_vmptr != -1ull)
+		nested_release_page(to_vmx(vcpu)->nested.current_vmcs12_page);
 	vmx_free_vmcs(vcpu);
 	kfree(vmx->guest_msrs);
 	kvm_vcpu_uninit(vcpu);
@@ -4249,6 +4310,9 @@ static struct kvm_vcpu *vmx_create_vcpu(
 		goto free_vmcs;
 	}
 
+	vmx->nested.current_vmptr = -1ull;
+	vmx->nested.current_vmcs12 = NULL;
+
 	return &vmx->vcpu;
 
 free_vmcs:
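
For context, the pieces introduced above (nested_get_page(), the
nested_release_page*() helpers, VMCS12_REVISION, and the current_vmptr /
current_vmcs12 bookkeeping in struct nested_vmx) are only wired together
later in this series, when VMPTRLD emulation is added: the handler maps
the guest page, checks revision_id against VMCS12_REVISION, and caches
the mapping. A rough sketch of how such a handler might look follows;
the function name and error handling are illustrative only, not part of
this patch.

static int example_vmptrld(struct kvm_vcpu *vcpu, gpa_t vmptr)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	struct page *page;
	struct vmcs12 *new_vmcs12;

	/* Nothing to do if L1 reloads the vmcs12 it already has loaded */
	if (vmx->nested.current_vmptr == vmptr)
		return 0;

	/* Translate the guest-physical address into a host page */
	page = nested_get_page(vcpu, vmptr);
	if (page == NULL)
		return -EINVAL;

	new_vmcs12 = kmap(page);
	if (new_vmcs12->revision_id != VMCS12_REVISION) {
		/* Wrong revision id: unmap and release without dirtying */
		nested_release_page_clean(page);
		return -EINVAL;
	}

	/* Drop the previously loaded vmcs12, if any */
	if (vmx->nested.current_vmptr != -1ull)
		nested_release_page(vmx->nested.current_vmcs12_page);

	vmx->nested.current_vmptr = vmptr;
	vmx->nested.current_vmcs12_page = page;
	vmx->nested.current_vmcs12 = new_vmcs12;
	return 0;
}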