From patchwork Thu Jul 14 00:16:37 2016
X-Patchwork-Submitter: David Matlack
X-Patchwork-Id: 9228639
From: David Matlack
To: kvm@vger.kernel.org
Cc: pbonzini@redhat.com, jmattson@google.com, pfeiner@google.com,
    linux-kernel@vger.kernel.org, David Matlack
Subject: [PATCH] kvm: x86: nVMX: maintain internal copy of current VMCS
Date: Wed, 13 Jul 2016 17:16:37 -0700
Message-Id: <1468455397-22003-1-git-send-email-dmatlack@google.com>
X-Mailing-List: kvm@vger.kernel.org

KVM maintains L1's current VMCS in guest memory, at the guest physical
page identified by the argument to VMPTRLD. This makes hairy
time-of-check to time-of-use bugs possible, as VCPUs can be writing
the VMCS page in memory while KVM is emulating VMLAUNCH and VMRESUME.

The spec documents that writing to the VMCS page while it is loaded is
"undefined". Therefore it is reasonable to load the entire VMCS into an
internal cache during VMPTRLD and ignore writes to the VMCS page -- the
guest should be using VMREAD and VMWRITE to access the current VMCS.

To adhere to the spec, KVM should flush the current VMCS during
VMPTRLD, and the target VMCS during VMCLEAR (as given by the operand to
VMCLEAR). Since this implementation of VMCS caching maintains only the
current VMCS, VMCLEAR will only do a flush if the operand to VMCLEAR is
the current VMCS pointer.

KVM will also flush during VMXOFF, which is not mandated by the spec,
but is also not in conflict with it.

Signed-off-by: David Matlack
---
Note: a short standalone sketch of the cache/flush discipline described
above is appended after the diff.

 arch/x86/kvm/vmx.c | 31 ++++++++++++++++++++++++++++---
 1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 64a79f2..640ad91 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -398,6 +398,12 @@ struct nested_vmx {
 	/* The host-usable pointer to the above */
 	struct page *current_vmcs12_page;
 	struct vmcs12 *current_vmcs12;
+	/*
+	 * Cache of the guest's VMCS, existing outside of guest memory.
+	 * Loaded from guest memory during VMPTRLD. Flushed to guest
+	 * memory during VMXOFF, VMCLEAR, VMPTRLD.
+	 */
+	struct vmcs12 *cached_vmcs12;
 	struct vmcs *current_shadow_vmcs;
 	/*
 	 * Indicates if the shadow vmcs must be updated with the
@@ -841,7 +847,7 @@ static inline short vmcs_field_to_offset(unsigned long field)
 
 static inline struct vmcs12 *get_vmcs12(struct kvm_vcpu *vcpu)
 {
-	return to_vmx(vcpu)->nested.current_vmcs12;
+	return to_vmx(vcpu)->nested.cached_vmcs12;
 }
 
 static struct page *nested_get_page(struct kvm_vcpu *vcpu, gpa_t addr)
@@ -6866,10 +6872,16 @@ static int handle_vmon(struct kvm_vcpu *vcpu)
 		return 1;
 	}
 
+	vmx->nested.cached_vmcs12 = kmalloc(VMCS12_SIZE, GFP_KERNEL);
+	if (!vmx->nested.cached_vmcs12)
+		return -ENOMEM;
+
 	if (enable_shadow_vmcs) {
 		shadow_vmcs = alloc_vmcs();
-		if (!shadow_vmcs)
+		if (!shadow_vmcs) {
+			kfree(vmx->nested.cached_vmcs12);
 			return -ENOMEM;
+		}
 		/* mark vmcs as shadow */
 		shadow_vmcs->revision_id |= (1u << 31);
 		/* init shadow vmcs */
@@ -6940,6 +6952,11 @@ static inline void nested_release_vmcs12(struct vcpu_vmx *vmx)
 		vmcs_write64(VMCS_LINK_POINTER, -1ull);
 	}
 	vmx->nested.posted_intr_nv = -1;
+
+	/* Flush VMCS12 to guest memory */
+	memcpy(vmx->nested.current_vmcs12, vmx->nested.cached_vmcs12,
+	       VMCS12_SIZE);
+
 	kunmap(vmx->nested.current_vmcs12_page);
 	nested_release_page(vmx->nested.current_vmcs12_page);
 	vmx->nested.current_vmptr = -1ull;
@@ -6960,6 +6977,7 @@ static void free_nested(struct vcpu_vmx *vmx)
 		nested_release_vmcs12(vmx);
 	if (enable_shadow_vmcs)
 		free_vmcs(vmx->nested.current_shadow_vmcs);
+	kfree(vmx->nested.cached_vmcs12);
 	/* Unpin physical memory we referred to in current vmcs02 */
 	if (vmx->nested.apic_access_page) {
 		nested_release_page(vmx->nested.apic_access_page);
@@ -7363,6 +7381,13 @@ static int handle_vmptrld(struct kvm_vcpu *vcpu)
 		vmx->nested.current_vmptr = vmptr;
 		vmx->nested.current_vmcs12 = new_vmcs12;
 		vmx->nested.current_vmcs12_page = page;
+		/*
+		 * Load VMCS12 from guest memory since it is not already
cached. + */ + memcpy(vmx->nested.cached_vmcs12, + vmx->nested.current_vmcs12, VMCS12_SIZE); + if (enable_shadow_vmcs) { vmcs_set_bits(SECONDARY_VM_EXEC_CONTROL, SECONDARY_EXEC_SHADOW_VMCS); @@ -8326,7 +8351,7 @@ static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu, hpa_t hpa) * the next L2->L1 exit. */ if (!is_guest_mode(vcpu) || - !nested_cpu_has2(vmx->nested.current_vmcs12, + !nested_cpu_has2(get_vmcs12(&vmx->vcpu), SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) vmcs_write64(APIC_ACCESS_ADDR, hpa); }