From patchwork Sun May 19 04:52:29 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Nakajima, Jun" <jun.nakajima@intel.com>
X-Patchwork-Id: 2589721
Return-Path: <kvm-owner@vger.kernel.org>
X-Original-To: patchwork-kvm@patchwork.kernel.org
Delivered-To: patchwork-process-083081@patchwork2.kernel.org
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by patchwork2.kernel.org (Postfix) with ESMTP id C88E1DF24C
	for <patchwork-kvm@patchwork.kernel.org>;
	Sun, 19 May 2013 04:53:08 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752025Ab3ESExD (ORCPT
	<rfc822;patchwork-kvm@patchwork.kernel.org>);
	Sun, 19 May 2013 00:53:03 -0400
Received: from mail-da0-f49.google.com ([209.85.210.49]:33547 "EHLO
	mail-da0-f49.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751371Ab3ESExA (ORCPT <rfc822; kvm@vger.kernel.org>);
	Sun, 19 May 2013 00:53:00 -0400
Received: by mail-da0-f49.google.com with SMTP id p5so3152717dak.8
	for <kvm@vger.kernel.org>; Sat, 18 May 2013 21:52:59 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=google.com; s=20120113;
	h=x-received:from:to:cc:subject:date:message-id:x-mailer:in-reply-to
	:references:x-gm-message-state;
	bh=VSgrcnXLqs0uWUmToGpcK/G1ElvAZlhE5UcPLwVe/G8=;
	b=Uo/FhB3QCy4yuBlArw/+jH/NnupBIGdYtRZOjONCtAxtuhb640v15ZH0MZm4Biyfpz
	Fpay2tsTL+I6E25nEXDYA5aF2y/KiuFCK1u9DggTFYq43cYvvnunQ62Rvb3cXHDE2wMM
	/t9Su+3cE4yRnC2fyiJ8LmaSPviqvM81tCj+lOcTCwPEGOvNILJV1nzJhvrwnfOHjvJJ
	UzN8cDo3CBkr0bi2LRlSbmONQ5HZzgDKKcGSD8DoDsztZWzGtGwfKinRF7YVE30CHIgA
	BpI6mBjYq6sLrBupOZTce7qaIsjTJ5/Yt2Z4S/pe4PbXH1pzz3LbtGHGBeH3OnDOU9dr
	AMFw==
X-Received: by 10.66.188.137 with SMTP id ga9mr56010364pac.9.1368939179863;
	Sat, 18 May 2013 21:52:59 -0700 (PDT)
Received: from localhost (c-98-207-34-191.hsd1.ca.comcast.net.
	[98.207.34.191]) by mx.google.com with ESMTPSA id
	zo4sm18214737pbc.21.2013.05.18.21.52.58 for <multiple recipients>
	(version=TLSv1.2 cipher=RC4-SHA bits=128/128);
	Sat, 18 May 2013 21:52:59 -0700 (PDT)
From: Jun Nakajima <jun.nakajima@intel.com>
To: kvm@vger.kernel.org
Cc: Gleb Natapov <gleb@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>
Subject: [PATCH v3 10/13] nEPT: Nested INVEPT
Date: Sat, 18 May 2013 21:52:29 -0700
Message-Id: <1368939152-11406-10-git-send-email-jun.nakajima@intel.com>
X-Mailer: git-send-email 1.8.2.1.610.g562af5b
In-Reply-To: <1368939152-11406-1-git-send-email-jun.nakajima@intel.com>
References: <1368939152-11406-1-git-send-email-jun.nakajima@intel.com>
X-Gm-Message-State: 
 ALoCoQlQMuAHpTrlw5lb1ST2T4NsaaWZn9UhSnaCDBho7yzIr3M68yFFUMwfhS96HuCy2tfxP7+y
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
List-ID: <kvm.vger.kernel.org>
X-Mailing-List: kvm@vger.kernel.org

From: Nadav Har'El <nyh@il.ibm.com>

If we let L1 use EPT, we should probably also support the INVEPT instruction.

In our current nested EPT implementation, when L1 changes its EPT table for
L2 (i.e., EPT12), L0 modifies the shadow EPT table (EPT02), and in the course
of this modification already calls INVEPT. Therefore, when L1 calls INVEPT,
we don't really need to do anything. In particular we *don't* need to call
the real INVEPT again. All we do in our INVEPT is verify the validity of the
call, and its parameters, and then do nothing.

In KVM Forum 2010, Dong et al. presented "Nested Virtualization Friendly KVM"
and classified our current nested EPT implementation as "shadow-like virtual
EPT". He recommended instead a different approach, which he called "VTLB-like
virtual EPT". If we had taken that alternative approach, INVEPT would have had
a bigger role: L0 would only rebuild the shadow EPT table when L1 calls INVEPT.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Jun Nakajima <jun.nakajima@intel.com>
Signed-off-by: Xinhao Xu <xinhao.xu@intel.com>
---
 arch/x86/include/uapi/asm/vmx.h |  1 +
 arch/x86/kvm/vmx.c              | 83 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 84 insertions(+)
diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index d651082..7a34e8f 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -65,6 +65,7 @@
 #define EXIT_REASON_EOI_INDUCED         45
 #define EXIT_REASON_EPT_VIOLATION       48
 #define EXIT_REASON_EPT_MISCONFIG       49
+#define EXIT_REASON_INVEPT              50
 #define EXIT_REASON_PREEMPTION_TIMER    52
 #define EXIT_REASON_WBINVD              54
 #define EXIT_REASON_XSETBV              55
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1cf8a41..d9d991d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6251,6 +6251,87 @@ static int handle_vmptrst(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+/* Emulate the INVEPT instruction */
+static int handle_invept(struct kvm_vcpu *vcpu)
+{
+	u32 vmx_instruction_info;
+	unsigned long type;
+	gva_t gva;
+	struct x86_exception e;
+	struct {
+		u64 eptp, gpa;
+	} operand;
+
+	if (!(nested_vmx_secondary_ctls_high & SECONDARY_EXEC_ENABLE_EPT) ||
+	    !(nested_vmx_ept_caps & VMX_EPT_INVEPT_BIT)) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return 1;
+	}
+
+	if (!nested_vmx_check_permission(vcpu))
+		return 1;
+
+	if (!kvm_read_cr0_bits(vcpu, X86_CR0_PE)) {
+		kvm_queue_exception(vcpu, UD_VECTOR);
+		return 1;
+	}
+
+	/* According to the Intel VMX instruction reference, the memory
+	 * operand is read even if it isn't needed (e.g., for type==global)
+	 */
+	vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
+	if (get_vmx_mem_address(vcpu, vmcs_readl(EXIT_QUALIFICATION),
+			vmx_instruction_info, &gva))
+		return 1;
+	if (kvm_read_guest_virt(&vcpu->arch.emulate_ctxt, gva, &operand,
+				sizeof(operand), &e)) {
+		kvm_inject_page_fault(vcpu, &e);
+		return 1;
+	}
+
+	type = kvm_register_read(vcpu, (vmx_instruction_info >> 28) & 0xf);
+
+	switch (type) {
+	case VMX_EPT_EXTENT_GLOBAL:
+		if (!(nested_vmx_ept_caps & VMX_EPT_EXTENT_GLOBAL_BIT))
+			nested_vmx_failValid(vcpu,
+				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
+		else {
+			/*
+			 * Do nothing: when L1 changes EPT12, we already
+			 * update EPT02 (the shadow EPT table) and call INVEPT.
+			 * So when L1 calls INVEPT, there's nothing left to do.
+			 */
+			nested_vmx_succeed(vcpu);
+		}
+		break;
+	case VMX_EPT_EXTENT_CONTEXT:
+		if (!(nested_vmx_ept_caps & VMX_EPT_EXTENT_CONTEXT_BIT))
+			nested_vmx_failValid(vcpu,
+				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
+		else {
+			/* Do nothing */
+			nested_vmx_succeed(vcpu);
+		}
+		break;
+	case VMX_EPT_EXTENT_INDIVIDUAL_ADDR:
+		if (!(nested_vmx_ept_caps & VMX_EPT_EXTENT_INDIVIDUAL_BIT))
+			nested_vmx_failValid(vcpu,
+				VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
+		else {
+			/* Do nothing */
+			nested_vmx_succeed(vcpu);
+		}
+		break;
+	default:
+		nested_vmx_failValid(vcpu,
+			VMXERR_INVALID_OPERAND_TO_INVEPT_INVVPID);
+	}
+
+	skip_emulated_instruction(vcpu);
+	return 1;
+}
+
 /*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
@@ -6295,6 +6376,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[EXIT_REASON_PAUSE_INSTRUCTION]       = handle_pause,
 	[EXIT_REASON_MWAIT_INSTRUCTION]	      = handle_invalid_op,
 	[EXIT_REASON_MONITOR_INSTRUCTION]     = handle_invalid_op,
+	[EXIT_REASON_INVEPT]                  = handle_invept,
 };
 
 static const int kvm_vmx_max_exit_handlers =
@@ -6521,6 +6603,7 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
 	case EXIT_REASON_VMPTRST: case EXIT_REASON_VMREAD:
 	case EXIT_REASON_VMRESUME: case EXIT_REASON_VMWRITE:
 	case EXIT_REASON_VMOFF: case EXIT_REASON_VMON:
+	case EXIT_REASON_INVEPT:
 		/*
 		 * VMX instructions trap unconditionally. This allows L1 to
 		 * emulate them for its L2 guest, i.e., allows 3-level nesting!