From patchwork Sun May 8 08:19:18 2011
X-Patchwork-Submitter: Nadav Har'El
X-Patchwork-Id: 765142
Date: Sun, 8 May 2011 11:19:18 +0300
Message-Id: <201105080819.p488JIgW017952@rice.haifa.ibm.com>
From: "Nadav Har'El"
To: kvm@vger.kernel.org
Cc: gleb@redhat.com, avi@redhat.com
References: <1304842511-nyh@il.ibm.com>
Subject: [PATCH 08/30] nVMX: Fix local_vcpus_link handling
X-Mailing-List: kvm@vger.kernel.org

In VMX, before we bring down a CPU we must VMCLEAR all VMCSs loaded on it,
because (at least in theory) the processor might not have written all of their
content back to memory. Since a patch from June 26, 2008, this is done using a
per-cpu "vcpus_on_cpu" linked list of vcpus loaded on each CPU.

The problem is that with nested VMX, we no longer have the concept of a vcpu
being loaded on a cpu: a vcpu has multiple VMCSs (one for L1, one for each L2),
and each of those may have been last loaded on a different cpu.

This trivial patch changes the code to keep only L1 VMCSs on vcpus_on_cpu.
This fixes crashes on L1 shutdown caused by incorrectly maintaining the linked
lists. It is not a complete solution, though: it does not flush the inactive
L1 or L2 VMCSs loaded on a CPU which is being shut down. Doing this correctly
will probably require replacing the vcpu linked list with a linked list of
"saved_vmcs" objects (VMCS, cpu and launched), and it is left as a TODO.
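For illustration only, the "saved_vmcs" objects suggested in the TODO above
might look roughly like the sketch below. This is not part of the patch; the
structure layout and the per-cpu list name "vmcss_on_cpu" are assumptions
based only on the fields mentioned (VMCS, cpu and launched):

	/* Hypothetical sketch: one list entry per loaded VMCS (L1 or L2),
	 * instead of one entry per vcpu as in the current code. */
	struct saved_vmcs {
		struct vmcs *vmcs;	/* hardware VMCS region */
		int cpu;		/* CPU this VMCS was last loaded on, or -1 */
		int launched;		/* whether VMLAUNCH was done on this VMCS */
		struct list_head local_link; /* entry in this CPU's list */
	};

	/* Hypothetical per-cpu list of all VMCSs loaded on a CPU; a CPU being
	 * shut down would VMCLEAR every entry here, L1 and L2 alike. */
	static DEFINE_PER_CPU(struct list_head, vmcss_on_cpu);

With such a list, vmclear_local_vcpus() could walk saved_vmcs entries directly
and would not need the is_guest_mode() checks added below.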
Signed-off-by: Nadav Har'El
---
 arch/x86/kvm/vmx.c |   14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

--- .before/arch/x86/kvm/vmx.c	2011-05-08 10:43:18.000000000 +0300
+++ .after/arch/x86/kvm/vmx.c	2011-05-08 10:43:18.000000000 +0300
@@ -638,7 +638,9 @@ static void __vcpu_clear(void *arg)
 	vmcs_clear(vmx->vmcs);
 	if (per_cpu(current_vmcs, cpu) == vmx->vmcs)
 		per_cpu(current_vmcs, cpu) = NULL;
-	list_del(&vmx->local_vcpus_link);
+	/* TODO: currently, local_vcpus_link is just for L1 VMCSs */
+	if (!is_guest_mode(&vmx->vcpu))
+		list_del(&vmx->local_vcpus_link);
 	vmx->vcpu.cpu = -1;
 	vmx->launched = 0;
 }
@@ -1100,8 +1102,10 @@ static void vmx_vcpu_load(struct kvm_vcp
 		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);

 		local_irq_disable();
-		list_add(&vmx->local_vcpus_link,
-			 &per_cpu(vcpus_on_cpu, cpu));
+		/* TODO: currently, local_vcpus_link is just for L1 VMCSs */
+		if (!is_guest_mode(&vmx->vcpu))
+			list_add(&vmx->local_vcpus_link,
+				 &per_cpu(vcpus_on_cpu, cpu));
 		local_irq_enable();

 		/*
@@ -1806,7 +1810,9 @@ static void vmclear_local_vcpus(void)
 	int cpu = raw_smp_processor_id();
 	struct vcpu_vmx *vmx, *n;

 	list_for_each_entry_safe(vmx, n, &per_cpu(vcpus_on_cpu, cpu),
 				 local_vcpus_link)
-		__vcpu_clear(vmx);
+		/* TODO: currently, local_vcpus_link is just for L1 VMCSs */
+		if (!is_guest_mode(&vmx->vcpu))
+			__vcpu_clear(vmx);
 }
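For context (not part of the patch): the vcpus_on_cpu list that these hunks
guard is consumed when VMX is being turned off on a CPU. A simplified sketch,
based on the vmx.c of that era (exact details may differ), looks like this:

	/* Simplified sketch: when a CPU is brought down, every VMCS still
	 * linked on its vcpus_on_cpu list is VMCLEARed before VMXOFF. An
	 * incorrectly maintained entry here is what caused the L1 shutdown
	 * crashes described in the commit message. */
	static void hardware_disable(void *garbage)
	{
		vmclear_local_vcpus();	/* VMCLEAR everything on this CPU's list */
		kvm_cpu_vmxoff();	/* then leave VMX operation */
	}

Until the "saved_vmcs" rework mentioned in the TODO is done, L2 VMCSs are
simply not on this list, so they are not flushed here.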