From patchwork Thu Apr 25 07:52:00 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Nakajima, Jun" <jun.nakajima@intel.com>
X-Patchwork-Id: 2487431
Return-Path: <kvm-owner@vger.kernel.org>
X-Original-To: patchwork-kvm@patchwork.kernel.org
Delivered-To: patchwork-process-083081@patchwork1.kernel.org
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by patchwork1.kernel.org (Postfix) with ESMTP id 33C243FC64
	for <patchwork-kvm@patchwork.kernel.org>;
	Thu, 25 Apr 2013 07:52:07 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756198Ab3DYHwD (ORCPT
	<rfc822;patchwork-kvm@patchwork.kernel.org>);
	Thu, 25 Apr 2013 03:52:03 -0400
Received: from mail-vc0-f173.google.com ([209.85.220.173]:65527 "EHLO
	mail-vc0-f173.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1755416Ab3DYHwB (ORCPT <rfc822; kvm@vger.kernel.org>);
	Thu, 25 Apr 2013 03:52:01 -0400
Received: by mail-vc0-f173.google.com with SMTP id ia10so1219278vcb.4
	for <kvm@vger.kernel.org>; Thu, 25 Apr 2013 00:52:01 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=google.com; s=20120113;
	h=mime-version:x-received:date:message-id:subject:from:to
	:content-type:x-gm-message-state;
	bh=kvnzQqAHv23H0bWDRksX3XDz7rgLhwZ5TAaHe6muY3o=;
	b=Npx22La2MXzyEdGHrqtHthKx7ObZfMEqvYz+nr5pcH2f1vTGvLgP7duAUh1T9gLFAx
	MvBU2EU5Vto7tsx/rnv2azN17/YYmujCrcrZUBbknTDLOyWRCaMFnpj62PdUVd7Paa7f
	xR3UU5arv0L37Ne+UyYYZD3gAz9tOwY9YjGnY/vegY/UmFcZt6eXW9A/RztUtcJ/jebw
	FDwUMTTflzZHKgNPKA3VEtz4BPm+fltWOX/0cLCb+v8TZfdXPfkMlW1uiO0kYP4KcSyC
	SkLT3AtxpekCdj7dK+cnBnhtjhaqpJKudvt4NWtEc7/o2OtZYkeBu94fntfTya0LEh2W
	Du8g==
MIME-Version: 1.0
X-Received: by 10.52.90.112 with SMTP id bv16mr22185230vdb.62.1366876320998;
	Thu, 25 Apr 2013 00:52:00 -0700 (PDT)
Received: by 10.58.64.196 with HTTP; Thu, 25 Apr 2013 00:52:00 -0700 (PDT)
Date: Thu, 25 Apr 2013 00:52:00 -0700
Message-ID: 
 <CAL54oT1M4grk4XTk-W1xQnE=gSPcXHEi41x588x_hMvp3aKekQ@mail.gmail.com>
Subject: [PATCH 08/12] nEPT: Nested INVEPT
From: "Nakajima, Jun" <jun.nakajima@intel.com>
To: "kvm@vger.kernel.org" <kvm@vger.kernel.org>
X-Gm-Message-State: 
 ALoCoQkiwGXWO5MQcneWI+bvGMmAYPTX2SEn0YQj3FAUJy6ONuoHBZ4qixZm0eh2lEZvEbqj/nng
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
List-ID: <kvm.vger.kernel.org>
X-Mailing-List: kvm@vger.kernel.org

If we let L1 use EPT, we should also support the INVEPT instruction.

In our current nested EPT implementation, when L1 changes its EPT table for
L2 (i.e., EPT12), L0 modifies the shadow EPT table (EPT02) and, in the course
of this modification, already calls INVEPT. Therefore, when L1 calls INVEPT,
we don't really need to do anything. In particular, we *don't* need to call
the real INVEPT again. All our INVEPT emulation does is verify the validity
of the call and its parameters, and then do nothing further.
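
The "validate, then do nothing" behaviour described above can be sketched in
isolation. This is a simplified model, not the handler in this patch: the
result codes and the emulate_invept() name are hypothetical, the type values
0/1/2 are assumed to match the existing VMX_EPT_EXTENT_* constants, and the
permission check, CR0.PE check and memory-operand read are left out.

```c
#include <stdint.h>

/* Capability bits, mirroring the VMX_EPT_* defines touched by this patch. */
#define EPT_CAP_INVEPT            (1ull << 20)
#define EPT_CAP_EXTENT_INDIVIDUAL (1ull << 24)
#define EPT_CAP_EXTENT_CONTEXT    (1ull << 25)
#define EPT_CAP_EXTENT_GLOBAL     (1ull << 26)

/* INVEPT types; values assumed to match VMX_EPT_EXTENT_* (0, 1, 2). */
enum invept_type {
	INVEPT_INDIVIDUAL_ADDR = 0,
	INVEPT_CONTEXT = 1,
	INVEPT_GLOBAL = 2,
};

/* Hypothetical result codes standing in for #UD, VMfailValid, VMsucceed. */
enum invept_result { INVEPT_UD, INVEPT_FAIL_VALID, INVEPT_SUCCEED };

/*
 * Check that INVEPT itself and the requested extent are advertised in
 * ept_caps. On success nothing is flushed: the shadow table is assumed
 * to be already up to date.
 */
static enum invept_result emulate_invept(uint64_t ept_caps, unsigned long type)
{
	uint64_t need;

	if (!(ept_caps & EPT_CAP_INVEPT))
		return INVEPT_UD;

	switch (type) {
	case INVEPT_GLOBAL:
		need = EPT_CAP_EXTENT_GLOBAL;
		break;
	case INVEPT_CONTEXT:
		need = EPT_CAP_EXTENT_CONTEXT;
		break;
	case INVEPT_INDIVIDUAL_ADDR:
		need = EPT_CAP_EXTENT_INDIVIDUAL;
		break;
	default:
		return INVEPT_FAIL_VALID;
	}

	return (ept_caps & need) ? INVEPT_SUCCEED : INVEPT_FAIL_VALID;
}
```

With only the INVEPT and global-extent bits set, a global request succeeds in
this model, while a context request or an unknown type fails with the
"invalid operand" style result.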

At KVM Forum 2010, Dong et al. presented "Nested Virtualization Friendly KVM"
and classified our current nested EPT implementation as "shadow-like virtual
EPT". They recommended a different approach instead, which they called
"VTLB-like virtual EPT". Had we taken that alternative approach, INVEPT would
have had a bigger role: L0 would rebuild the shadow EPT table only when L1
calls INVEPT.
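
To make the contrast concrete, here is a toy model of the two approaches as
this message describes them. Everything in it is hypothetical (the struct,
the function names, the fixed-offset l1_to_l0() standing in for L0's own
translation); it only illustrates when the shadow table gets updated, not how
KVM's MMU actually tracks L1's writes.

```c
#include <stdint.h>

#define NENTRIES 4

/* Toy tables: index = L2 guest frame, value = frame number it maps to. */
struct toy_nested_ept {
	uint64_t ept12[NENTRIES];	/* maintained by L1 */
	uint64_t ept02[NENTRIES];	/* shadow, maintained by L0 */
};

/* Hypothetical L1-frame to L0-frame translation (stand-in for EPT01). */
static uint64_t l1_to_l0(uint64_t l1_frame)
{
	return l1_frame + 0x1000;
}

static void shadow_sync_entry(struct toy_nested_ept *t, unsigned int i)
{
	t->ept02[i] = l1_to_l0(t->ept12[i]);
}

/* Shadow-like: L0 observes each EPT12 write and syncs EPT02 eagerly. */
static void shadow_like_write(struct toy_nested_ept *t, unsigned int i,
			      uint64_t frame)
{
	t->ept12[i] = frame;
	shadow_sync_entry(t, i);
}

/* ...so INVEPT from L1 has nothing left to do. */
static void shadow_like_invept(struct toy_nested_ept *t)
{
	(void)t;
}

/* VTLB-like: EPT12 writes go unnoticed and EPT02 becomes stale... */
static void vtlb_like_write(struct toy_nested_ept *t, unsigned int i,
			    uint64_t frame)
{
	t->ept12[i] = frame;
}

/* ...until L1 calls INVEPT, at which point L0 rebuilds the shadow. */
static void vtlb_like_invept(struct toy_nested_ept *t)
{
	unsigned int i;

	for (i = 0; i < NENTRIES; i++)
		shadow_sync_entry(t, i);
}
```

In the shadow-like variant the shadow entry is correct immediately after the
write; in the VTLB-like variant it stays stale until the INVEPT call.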

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>

modified:   arch/x86/include/asm/vmx.h
modified:   arch/x86/kvm/vmx.c
---
 arch/x86/include/asm/vmx.h |  4 ++-
 arch/x86/kvm/vmx.c         | 83 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 86 insertions(+), 1 deletion(-)

 static const int kvm_vmx_max_exit_handlers =
@@ -6106,6 +6188,7 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
  case EXIT_REASON_VMPTRST: case EXIT_REASON_VMREAD:
  case EXIT_REASON_VMRESUME: case EXIT_REASON_VMWRITE:
  case EXIT_REASON_VMOFF: case EXIT_REASON_VMON:
+ case EXIT_REASON_INVEPT:
  /*
  * VMX instructions trap unconditionally. This allows L1 to
  * emulate them for its L2 guest, i.e., allows 3-level nesting!
--
1.8.2.1.610.g562af5b

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index b6fbf86..0ce54f3 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -376,7 +376,9 @@ enum vmcs_field {
 #define VMX_EPTP_WB_BIT (1ull << 14)
 #define VMX_EPT_2MB_PAGE_BIT (1ull << 16)
 #define VMX_EPT_1GB_PAGE_BIT (1ull << 17)
-#define VMX_EPT_AD_BIT    (1ull << 21)
+#define VMX_EPT_INVEPT_BIT (1ull << 20)
+#define VMX_EPT_AD_BIT (1ull << 21)
+#define VMX_EPT_EXTENT_INDIVIDUAL_BIT (1ull << 24)
 #define VMX_EPT_EXTENT_CONTEXT_BIT (1ull << 25)
 #define VMX_EPT_EXTENT_GLOBAL_BIT (1ull << 26)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a5e14d1..10f2a69 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5878,6 +5878,87 @@ static int handle_vmptrst(struct kvm_vcpu *vcpu)
  return 1;
 }

+/* Emulate the INVEPT instruction */
+static int handle_invept(struct kvm_vcpu *vcpu)
+{
+ u32 vmx_instruction_info;
+ unsigned long type;
+ gva_t gva;
+ struct x86_exception e;
+ struct {
+ u64 eptp, gpa;
+ } operand;
+
+ if (!(nested_vmx_secondary_ctls_high & SECONDARY_EXEC_ENABLE_EPT) ||
+    !(nested_vmx_ept_caps & VMX_EPT_INVEPT_BIT)) {
+ kvm_queue_exception(vcpu, UD_VECTOR);
+ return 1;
+ }
+
+ if (!nested_vmx_check_permission(vcpu))
+ return 1;
+
+ if (!kvm_read_cr0_bits(vcpu, X86_CR0_PE)) {
+ kvm_queue_exception(vcpu, UD_VECTOR);
+ return 1;
+ }
+
+ /*
+ * According to the Intel VMX instruction reference, the memory
+ * operand is read even if it isn't needed (e.g., for type==global).
+ */
+ vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
+ if (get_vmx_mem_address(vcpu, vmcs_readl(EXIT_QUALIFICATION),
+ vmx_instruction_info, &gva))
+ return 1;
+ if (kvm_read_guest_virt(&vcpu->arch.emulate_ctxt, gva, &operand,
+ sizeof(operand), &e)) {
+ kvm_inject_page_fault(vcpu, &e);
+ return 1;
+ }
+
+ type = kvm_register_read(vcpu, (vmx_instruction_info >> 28) & 0xf);
+
+ switch (type) {
+ case VMX_EPT_EXTENT_GLOBAL:
+ if (!(nested_vmx_ept_caps & VMX_EPT_EXTENT_GLOBAL_BIT))
+ nested_vmx_failValid(vcpu,
+ VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
+ else {
+ /*
+ * Do nothing: when L1 changes EPT12, we already
+ * update EPT02 (the shadow EPT table) and call INVEPT.
+ * So when L1 calls INVEPT, there's nothing left to do.
+ */
+ nested_vmx_succeed(vcpu);
+ }
+ break;
+ case VMX_EPT_EXTENT_CONTEXT:
+ if (!(nested_vmx_ept_caps & VMX_EPT_EXTENT_CONTEXT_BIT))
+ nested_vmx_failValid(vcpu,
+ VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
+ else {
+ /* Do nothing */
+ nested_vmx_succeed(vcpu);
+ }
+ break;
+ case VMX_EPT_EXTENT_INDIVIDUAL_ADDR:
+ if (!(nested_vmx_ept_caps & VMX_EPT_EXTENT_INDIVIDUAL_BIT))
+ nested_vmx_failValid(vcpu,
+ VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
+ else {
+ /* Do nothing */
+ nested_vmx_succeed(vcpu);
+ }
+ break;
+ default:
+ nested_vmx_failValid(vcpu,
+ VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
+ }
+
+ skip_emulated_instruction(vcpu);
+ return 1;
+}
+
 /*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
@@ -5922,6 +6003,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
  [EXIT_REASON_PAUSE_INSTRUCTION]       = handle_pause,
  [EXIT_REASON_MWAIT_INSTRUCTION]      = handle_invalid_op,
  [EXIT_REASON_MONITOR_INSTRUCTION]     = handle_invalid_op,
+ [EXIT_REASON_INVEPT]                  = handle_invept,
 };
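
For readers following the decode in handle_invept(): the INVEPT type is taken
from the general-purpose register whose index sits in bits 31:28 of the
VM-exit instruction-information field. A minimal standalone sketch of that
step, assuming a plain 16-entry register array in place of
kvm_register_read() (both helper names below are hypothetical):

```c
#include <stdint.h>

/* Index of the register operand: bits 31:28 of the instruction info. */
static unsigned int invept_type_reg(uint32_t vmx_instruction_info)
{
	return (vmx_instruction_info >> 28) & 0xf;
}

/*
 * Read the INVEPT type from a toy register file. regs[] stands in for
 * the vcpu's general-purpose registers.
 */
static unsigned long read_invept_type(const unsigned long regs[16],
				      uint32_t vmx_instruction_info)
{
	return regs[invept_type_reg(vmx_instruction_info)];
}
```

The low bits of the field (scaling, address size, base and index registers)
do not affect this lookup; they are consumed separately when the memory
operand address is computed.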