From patchwork Fri Nov 30 16:47:40 2012
Subject: Re: [PATCH v4 08/14] KVM: ARM: World-switch implementation
From: Christoffer Dall
To: Will Deacon
Cc: Rusty Russell, kvm@vger.kernel.org, Marc Zyngier, Marcelo Tosatti,
    Antonios Motakis, kvmarm@lists.cs.columbia.edu,
    linux-arm-kernel@lists.infradead.org
Date: Fri, 30 Nov 2012 11:47:40 -0500
In-Reply-To: <20121130151500.GC26289@mudshark.cambridge.arm.com>

On Fri, Nov 30, 2012 at 10:15 AM, Will Deacon wrote:
> On Fri, Nov 30, 2012 at 06:37:04AM +0000, Christoffer Dall wrote:
>> On Mon, Nov 19, 2012 at 9:57 AM, Will Deacon wrote:
>> >
>> > I must be missing something here: how do you ensure that a guest running
>> > on multiple CPUs continues to have the same VMID across them after a
>> > rollover?
>> >
>>
>> when a roll-over occurs, there's no problem until someone comes along
>> that doesn't have a valid vmid (need_new_vmid_gen will return true).
>
> Well, a roll-over is triggered by somebody not having a valid VMID and us
> failing to allocate a new one from the current generation.
>
>> In this case, to assign a vmid, we need to start a new generation of
>> ids, and must ensure that all old vmids are no longer used. So how do
>> we ensure that?
>>
>> Well, we increment the kvm_vmid_gen, causing all other cpus that try to
>> run a VM to hit the spin_lock when they exit their VMs. We reserve
>> vmid 1 for the new cpu, and we call on_each_cpu, which sends an IPI
>> to all other physical cpus and waits until they have actually
>> completed reset_vm_context.
>>
>> At this point, once on_each_cpu(reset_vm_context) returns, all other
>> physical CPUs have cleared their data structures of occurrences of old
>> vmids, and the kvm_vmid_gen has been incremented, so no other vcpus
>> can come and claim new vmids until we unlock the spinlock, and
>> everything starts over.
>>
>> Makes sense?
>
> Yes, but I still don't understand how you ensure VMID consistency across
> different vcpus of the same VM. Imagine the following scenario:
>
> We have two VMs:
>
>   VM0: VCPU0 on physical CPU0, VCPU1 on physical CPU1
>   VM1: VCPU0 on physical CPU2
>
> Also assume that VM0 is happily running and we want to schedule VM1 for
> the first time. Finally, also assume that kvm_next_vmid is zero (that is,
> the next allocation will trigger a roll-over).
>
> Now, we want to schedule VM1:
>
>   kvm_arch_init_vm
>     kvm->arch.vmid_gen = 0;
>   kvm_arch_vcpu_ioctl_run
>     update_vttbr
>       need_new_vmid_gen == true
>       lock(kvm_vmid_lock)
>       kvm_vmid_gen++;
>       kvm_next_vmid = 1;
>       on_each_cpu(reset_vm_context);
>
> At this point, the other two (physical) CPUs exit the guest:
>
>   kvm_guest_exit                         // Received IRQ from cross-call
>   local_irq_enable
>   kvm_call_hyp(__kvm_flush_vm_context);  // Invalidate TLB (this is overkill as should be bcast)
>   cond_resched;
>   update_vttbr
>     need_new_vmid_gen == true
>     /* spin on kvm_vmid_lock */
>
> I think the __kvm_flush_vm_context is overkill -- you should check
> tlb_ops_need_broadcast (which is currently only true for 11MPCore).
> However, continuing with this, the other CPU gets its vmid and releases
> the lock:
>
>   /* receives vmid 1, kvm_next_vmid = 2 */
>   unlock(kvm_vmid_lock)
>   /* Back to the guest */
>
> Now, let's say that CPU0 takes an interrupt (which it goes off to handle)
> and CPU1 grabs the lock:
>
>   lock(kvm_vmid_lock)
>   /* CPU1 receives vmid 2, bumps vmid counter */
>   unlock(kvm_vmid_lock)
>   /* Back to the guest */
>
> At this point, VM1 is running and VM0:VCPU1 is running. VM0:VCPU0 is not
> running because physical CPU0 is handling an interrupt. The problem is that
> when VCPU0 *is* resumed, it will update the VMID of VM0 and could be
> scheduled in parallel with VCPU1 but with a different VMID.
>
> How do you avoid this in the current code?
>

I don't. Nice catch.
Please apply your interesting brain to the following fix :)

From 5ba0481b78b5b5e9321cb2bb05d611b74149461c Mon Sep 17 00:00:00 2001
From: Christoffer Dall
Date: Fri, 30 Nov 2012 11:43:24 -0500
Subject: [PATCH] KVM: ARM: Fix race condition in update_vttbr

Will Deacon pointed out another funny race in the update_vttbr code,
where two VCPUs belonging to the same VM could end up using different
VMIDs during a roll-over of the VMID space.

To preserve the lockless vcpu_run loop in the common case, we re-check
the vmid generation after grabbing the spin lock.

Reported-by: Will Deacon
Signed-off-by: Christoffer Dall
---
 arch/arm/kvm/arm.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index c4f631e..dbdee30 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -445,6 +445,16 @@ static void update_vttbr(struct kvm *kvm)
 
 	spin_lock(&kvm_vmid_lock);
 
+	/*
+	 * We need to re-check the vmid_gen here to ensure that if another vcpu
+	 * already allocated a valid vmid for this vm, then this vcpu should
+	 * use the same vmid.
+	 */
+	if (!need_new_vmid_gen(kvm)) {
+		spin_unlock(&kvm_vmid_lock);
+		return;
+	}
+
 	/* First user of a new VMID generation? */
 	if (unlikely(kvm_next_vmid == 0)) {
 		atomic64_inc(&kvm_vmid_gen);
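
For completeness, the patched allocation path amounts to double-checked
locking: the lockless fast path stays in vcpu_run, and the generation is
re-tested once the lock is held. The following is a rough, self-contained
sketch of update_vttbr with the fix applied -- the names
(need_new_vmid_gen, kvm_vmid_lock, kvm_next_vmid, reset_vm_context)
follow the discussion above, but the body is simplified and the VTTBR
register update itself is elided, so treat it as an illustration rather
than the exact contents of arch/arm/kvm/arm.c:

static DEFINE_SPINLOCK(kvm_vmid_lock);
static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
static u8 kvm_next_vmid;	/* VMIDs are 8 bits on ARMv7, wraps to 0 */

/* A vmid is stale iff it was handed out in an older generation. */
static bool need_new_vmid_gen(struct kvm *kvm)
{
	return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
}

static void update_vttbr(struct kvm *kvm)
{
	/* Common case: current vmid is still valid -- stay lockless. */
	if (!need_new_vmid_gen(kvm))
		return;

	spin_lock(&kvm_vmid_lock);

	/*
	 * Re-check under the lock: another vcpu of this VM may have
	 * allocated a vmid for the new generation while we were waiting,
	 * in which case we must reuse it instead of taking a fresh one.
	 */
	if (!need_new_vmid_gen(kvm)) {
		spin_unlock(&kvm_vmid_lock);
		return;
	}

	/* First user of a new VMID generation? */
	if (unlikely(kvm_next_vmid == 0)) {
		atomic64_inc(&kvm_vmid_gen);
		kvm_next_vmid = 1;
		/*
		 * Flush stale translations tagged with old vmids on every
		 * physical cpu; on_each_cpu() waits for reset_vm_context
		 * to finish everywhere before new ids are handed out.
		 */
		on_each_cpu(reset_vm_context, NULL, 1);
	}

	kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
	kvm->arch.vmid = kvm_next_vmid++;

	/* ... recompute VTTBR from the stage-2 pgd and the new vmid ... */

	spin_unlock(&kvm_vmid_lock);
}

With this shape, the scenario above resolves correctly: when VM0:VCPU0 is
resumed and finally takes the lock, the second need_new_vmid_gen() test
sees that VCPU1 already refreshed VM0's generation and returns early, so
both vcpus run with the same vmid.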