From patchwork Wed Aug 1 14:40:48 2012
Subject: [PATCH 08/10] nEPT: Nested INVEPT
From: "Nadav Har'El"
Date: Wed, 1 Aug 2012 17:40:48 +0300
Message-Id: <201208011440.q71EemBU023913@rice.haifa.ibm.com>
References: <1343831766-nyh@il.ibm.com>
To: kvm@vger.kernel.org
Cc: Joerg.Roedel@amd.com, avi@redhat.com, owasserm@redhat.com,
    abelg@il.ibm.com, eddie.dong@intel.com, yang.z.zhang@intel.com

If we let L1 use EPT, we should probably also support the INVEPT
instruction.

In our current nested EPT implementation, when L1 changes its EPT table
for L2 (i.e., EPT12), L0 modifies the shadow EPT table (EPT02), and in
the course of this modification already calls INVEPT. Therefore, when
L1 calls INVEPT, we don't really need to do anything. In particular, we
*don't* need to call the real INVEPT again. All we do in our INVEPT is
verify the validity of the call and of its parameters, and then do
nothing.

At KVM Forum 2010, Dong et al. presented "Nested Virtualization
Friendly KVM" and classified our current nested EPT implementation as
"shadow-like virtual EPT". They recommended instead a different
approach, which they called "VTLB-like virtual EPT".
If we had taken that alternative approach, INVEPT would have had a
bigger role: L0 would only rebuild the shadow EPT table when L1 calls
INVEPT.

Signed-off-by: Nadav Har'El
---
 arch/x86/include/asm/vmx.h |    2 
 arch/x86/kvm/vmx.c         |   87 +++++++++++++++++++++++++++++++++++
 2 files changed, 89 insertions(+)
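A note on the instruction being emulated: INVEPT takes an invalidation
type in a register operand and a 128-bit memory descriptor whose low
quadword is the EPTP. The SDM specifies that the memory operand is
decoded and read for every type, which is why handle_invept() below
fetches it unconditionally. As a rough illustration only (not part of
this patch; the helper name is made up, kernel types are assumed, and
the opcode bytes are hand-encoded because assemblers of this vintage
lack the mnemonic), an L1 hypervisor would issue the instruction along
these lines:

	struct invept_desc {
		u64 eptp;	/* low quadword: the EPT pointer */
		u64 reserved;	/* must be zero */
	};

	static inline void invept(unsigned long type, u64 eptp)
	{
		struct invept_desc desc = { eptp, 0 };

		/* invept (%rax), %rcx -- encoded by hand: 66 0f 38 80 /r */
		asm volatile (".byte 0x66, 0x0f, 0x38, 0x80, 0x08"
			      : : "a" (&desc), "c" (type)
			      : "cc", "memory");
	}

A global invalidation is then invept(VMX_EPT_EXTENT_GLOBAL, 0). On the
resulting VM exit, bits 31:28 of VMX_INSTRUCTION_INFO identify the
register holding the type, which is exactly how handle_invept() below
recovers it.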
--- .before/arch/x86/include/asm/vmx.h	2012-08-01 17:22:47.000000000 +0300
+++ .after/arch/x86/include/asm/vmx.h	2012-08-01 17:22:47.000000000 +0300
@@ -280,6 +280,7 @@ enum vmcs_field {
 #define EXIT_REASON_APIC_ACCESS        44
 #define EXIT_REASON_EPT_VIOLATION      48
 #define EXIT_REASON_EPT_MISCONFIG      49
+#define EXIT_REASON_INVEPT             50
 #define EXIT_REASON_WBINVD             54
 #define EXIT_REASON_XSETBV             55
 #define EXIT_REASON_INVPCID            58
@@ -406,6 +407,7 @@ enum vmcs_field {
 #define VMX_EPTP_WB_BIT                (1ull << 14)
 #define VMX_EPT_2MB_PAGE_BIT           (1ull << 16)
 #define VMX_EPT_1GB_PAGE_BIT           (1ull << 17)
+#define VMX_EPT_INVEPT_BIT             (1ull << 20)
 #define VMX_EPT_AD_BIT                 (1ull << 21)
 #define VMX_EPT_EXTENT_INDIVIDUAL_BIT  (1ull << 24)
 #define VMX_EPT_EXTENT_CONTEXT_BIT     (1ull << 25)
--- .before/arch/x86/kvm/vmx.c	2012-08-01 17:22:47.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2012-08-01 17:22:47.000000000 +0300
@@ -2026,6 +2026,10 @@ static __init void nested_vmx_setup_ctls
 		/* nested EPT: emulate EPT also to L1 */
 		nested_vmx_secondary_ctls_high |= SECONDARY_EXEC_ENABLE_EPT;
 		nested_vmx_ept_caps = VMX_EPT_PAGE_WALK_4_BIT;
+		nested_vmx_ept_caps |=
+			VMX_EPT_INVEPT_BIT | VMX_EPT_EXTENT_GLOBAL_BIT |
+			VMX_EPT_EXTENT_CONTEXT_BIT |
+			VMX_EPT_EXTENT_INDIVIDUAL_BIT;
 		nested_vmx_ept_caps &= vmx_capability.ept;
 	} else
 		nested_vmx_ept_caps = 0;
@@ -5702,6 +5706,87 @@ static int handle_vmptrst(struct kvm_vcp
 	return 1;
 }
 
+/* Emulate the INVEPT instruction */
+static int handle_invept(struct kvm_vcpu *vcpu)
+{
+	u32 vmx_instruction_info;
+	unsigned long type;
+	gva_t gva;
+	struct x86_exception e;
+	struct {
+		u64 eptp, gpa;
+	} operand;
+
+	if (!(nested_vmx_secondary_ctls_high & SECONDARY_EXEC_ENABLE_EPT) ||
+	    !(nested_vmx_ept_caps & VMX_EPT_INVEPT_BIT)) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return 1;
+	}
+
+	if (!nested_vmx_check_permission(vcpu))
+		return 1;
+
+	if (!kvm_read_cr0_bits(vcpu, X86_CR0_PE)) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return 1;
+	}
+
+	/* According to the Intel VMX instruction reference, the memory
+	 * operand is read even if it isn't needed (e.g., for type==global)
+	 */
+	vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
+	if (get_vmx_mem_address(vcpu, vmcs_readl(EXIT_QUALIFICATION),
+			vmx_instruction_info, &gva))
+		return 1;
+	if (kvm_read_guest_virt(&vcpu->arch.emulate_ctxt, gva, &operand,
+				sizeof(operand), &e)) {
+		kvm_inject_page_fault(vcpu, &e);
+		return 1;
+	}
+
+	type = kvm_register_read(vcpu, (vmx_instruction_info >> 28) & 0xf);
+
+	switch (type) {
+	case VMX_EPT_EXTENT_GLOBAL:
+		if (!(nested_vmx_ept_caps & VMX_EPT_EXTENT_GLOBAL_BIT))
+			nested_vmx_failValid(vcpu,
+				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
+		else {
+			/*
+			 * Do nothing: when L1 changes EPT12, we already
+			 * update EPT02 (the shadow EPT table) and call INVEPT.
+			 * So when L1 calls INVEPT, there's nothing left to do.
+			 */
+			nested_vmx_succeed(vcpu);
+		}
+		break;
+	case VMX_EPT_EXTENT_CONTEXT:
+		if (!(nested_vmx_ept_caps & VMX_EPT_EXTENT_CONTEXT_BIT))
+			nested_vmx_failValid(vcpu,
+				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
+		else {
+			/* Do nothing */
+			nested_vmx_succeed(vcpu);
+		}
+		break;
+	case VMX_EPT_EXTENT_INDIVIDUAL_ADDR:
+		if (!(nested_vmx_ept_caps & VMX_EPT_EXTENT_INDIVIDUAL_BIT))
+			nested_vmx_failValid(vcpu,
+				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
+		else {
+			/* Do nothing */
+			nested_vmx_succeed(vcpu);
+		}
+		break;
+	default:
+		nested_vmx_failValid(vcpu,
+			VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
+	}
+
+	skip_emulated_instruction(vcpu);
+	return 1;
+}
+
 /*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume. Otherwise they set the kvm_run parameter to indicate what needs
@@ -5744,6 +5829,7 @@ static int (*kvm_vmx_exit_handlers[])(st
 	[EXIT_REASON_PAUSE_INSTRUCTION]       = handle_pause,
 	[EXIT_REASON_MWAIT_INSTRUCTION]       = handle_invalid_op,
 	[EXIT_REASON_MONITOR_INSTRUCTION]     = handle_invalid_op,
+	[EXIT_REASON_INVEPT]                  = handle_invept,
 };
 
 static const int kvm_vmx_max_exit_handlers =
@@ -5928,6 +6014,7 @@ static bool nested_vmx_exit_handled(stru
 	case EXIT_REASON_VMPTRST: case EXIT_REASON_VMREAD:
 	case EXIT_REASON_VMRESUME: case EXIT_REASON_VMWRITE:
 	case EXIT_REASON_VMOFF: case EXIT_REASON_VMON:
+	case EXIT_REASON_INVEPT:
 		/*
 		 * VMX instructions trap unconditionally. This allows L1 to
 		 * emulate them for its L2 guest, i.e., allows 3-level nesting!
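A closing note on how L1 consumes the capability bits the first hunk
advertises: nested_vmx_setup_ctls stores them in nested_vmx_ept_caps,
which L0 hands to L1 through the IA32_VMX_EPT_VPID_CAP MSR, just as
bare metal would. The following sketch of the L1 side is illustrative
only (the helper is hypothetical; the MSR index and bit positions come
from the SDM and match the VMX_EPT_*_BIT definitions in vmx.h):

	/* Pick the widest INVEPT type the (possibly virtual) CPU offers. */
	static int pick_invept_type(void)
	{
		u64 cap;

		rdmsrl(MSR_IA32_VMX_EPT_VPID_CAP, cap);  /* MSR 0x48c */
		if (!(cap & VMX_EPT_INVEPT_BIT))         /* bit 20 */
			return -1;                       /* no INVEPT */
		if (cap & VMX_EPT_EXTENT_GLOBAL_BIT)     /* bit 26 */
			return VMX_EPT_EXTENT_GLOBAL;
		if (cap & VMX_EPT_EXTENT_CONTEXT_BIT)    /* bit 25 */
			return VMX_EPT_EXTENT_CONTEXT;
		return -1;
	}

Whichever type L1 picks here is the value that eventually reaches the
switch statement in handle_invept() above.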