From: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Date: Tue, 25 Dec 2012 20:41:09 +0800
To: "Michael S. Tsirkin"
Cc: Paolo Bonzini, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
    hutao@cn.fujitsu.com, linux-scsi@vger.kernel.org,
    virtualization@lists.linux-foundation.org, rusty@rustcorp.com.au,
    asias@redhat.com, stefanha@redhat.com, nab@linux-iscsi.org
Subject: Re: [PATCH v2 5/5] virtio-scsi: introduce multiqueue support
Message-ID: <50D99EE5.2090905@cn.fujitsu.com>
In-Reply-To: <20121218160226.GA28445@redhat.com>

On 12/19/2012 12:02 AM, Michael S. Tsirkin wrote:
> On Tue, Dec 18, 2012 at 04:51:28PM +0100, Paolo Bonzini wrote:
>> On 18/12/2012 16:03, Michael S. Tsirkin wrote:
>>> On Tue, Dec 18, 2012 at 03:08:08PM +0100, Paolo Bonzini wrote:
>>>> On 18/12/2012 14:57, Michael S. Tsirkin wrote:
>>>>>> -static int virtscsi_queuecommand(struct Scsi_Host *sh, struct scsi_cmnd *sc)
>>>>>> +static int virtscsi_queuecommand(struct virtio_scsi *vscsi,
>>>>>> +				 struct virtio_scsi_target_state *tgt,
>>>>>> +				 struct scsi_cmnd *sc)
>>>>>>  {
>>>>>> -	struct virtio_scsi *vscsi = shost_priv(sh);
>>>>>> -	struct virtio_scsi_target_state *tgt = &vscsi->tgt[sc->device->id];
>>>>>>  	struct virtio_scsi_cmd *cmd;
>>>>>> +	struct virtio_scsi_vq *req_vq;
>>>>>>  	int ret;
>>>>>>
>>>>>>  	struct Scsi_Host *shost = virtio_scsi_host(vscsi->vdev);
>>>>>> @@ -461,7 +533,8 @@ static int virtscsi_queuecommand(struct Scsi_Host *sh, struct scsi_cmnd *sc)
>>>>>>  	BUG_ON(sc->cmd_len > VIRTIO_SCSI_CDB_SIZE);
>>>>>>  	memcpy(cmd->req.cmd.cdb, sc->cmnd, sc->cmd_len);
>>>>>>
>>>>>> -	if (virtscsi_kick_cmd(tgt, &vscsi->req_vq, cmd,
>>>>>> +	req_vq = ACCESS_ONCE(tgt->req_vq);
>>>>>
>>>>> This ACCESS_ONCE without a barrier looks strange to me.
>>>>> Can req_vq change? Needs a comment.
>>>>
>>>> Barriers are needed to order two things.  Here I don't have the second
>>>> thing to order against, hence no barrier.
>>>>
>>>> Accessing req_vq locklessly is safe, and there's a comment about it, but
>>>> you still want ACCESS_ONCE to ensure the compiler doesn't play tricks.
>>>
>>> That's just it.
>>> Why don't you want the compiler to play tricks?
>>
>> Because I want the lockless access to occur exactly when I write it.
>
> It doesn't occur when you write it. The CPU can still move accesses
> around. That's why you either need both ACCESS_ONCE and a barrier,
> or neither.
>
>> Otherwise I have one more thing to think about, i.e. what a crazy
>> compiler writer could do with my code.  And having been on the other
>> side of the trench, compiler writers can have *really* crazy ideas.
>>
>> Anyhow, I'll reorganize the code to move the ACCESS_ONCE closer to the
>> write and make it clearer.
>>
>>>>>> +	if (virtscsi_kick_cmd(tgt, req_vq, cmd,
>>>>>>  			      sizeof cmd->req.cmd, sizeof cmd->resp.cmd,
>>>>>>  			      GFP_ATOMIC) == 0)
>>>>>>  		ret = 0;
>>>>>> @@ -472,6 +545,48 @@ out:
>>>>>>  	return ret;
>>>>>>  }
>>>>>>
>>>>>> +static int virtscsi_queuecommand_single(struct Scsi_Host *sh,
>>>>>> +					struct scsi_cmnd *sc)
>>>>>> +{
>>>>>> +	struct virtio_scsi *vscsi = shost_priv(sh);
>>>>>> +	struct virtio_scsi_target_state *tgt = &vscsi->tgt[sc->device->id];
>>>>>> +
>>>>>> +	atomic_inc(&tgt->reqs);
>>>>>
>>>>> And here we don't have a barrier after the atomic? Why? Needs a comment.
>>>>
>>>> Because we don't write req_vq, so there are no two writes to order.
>>>> Barrier against what?
>>>
>>> Between the atomic update and the command. Once you queue the command it
>>> can complete and decrement reqs; if this happens before the increment,
>>> reqs can even become negative.
>>
>> This is not a problem.  Please read Documentation/memory-barriers.txt:
>>
>>     The following also do _not_ imply memory barriers, and so may
>>     require explicit memory barriers under some circumstances
>>     (smp_mb__before_atomic_dec() for instance):
>>
>>         atomic_add();
>>         atomic_sub();
>>         atomic_inc();
>>         atomic_dec();
>>
>>     If they're used for statistics generation, then they probably don't
>>     need memory barriers, unless there's a coupling between statistical
>>     data.
>>
>> This is the single-queue case, so it falls under this case.
>
> Aha, I missed that it's single queue. Correct, but please add a comment.
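For reference, here is the pattern under discussion as a minimal standalone
sketch; the names (example_tgt, example_vq, example_kick) are hypothetical
stand-ins, only the atomic_inc/ACCESS_ONCE usage mirrors the patch:

#include <linux/atomic.h>
#include <linux/compiler.h>	/* ACCESS_ONCE */

struct example_vq;			/* stands in for struct virtio_scsi_vq */
void example_kick(struct example_vq *vq);

struct example_tgt {
	atomic_t reqs;			/* outstanding commands; statistics only */
	struct example_vq *req_vq;	/* written under a lock, read locklessly */
};

static void example_queuecommand(struct example_tgt *tgt)
{
	/* Single-queue case: the counter is pure statistics, so no barrier
	 * is paired with the increment (see memory-barriers.txt on atomic_inc). */
	atomic_inc(&tgt->reqs);

	/* ACCESS_ONCE forces a single, untorn load and keeps the compiler
	 * from refetching req_vq; no CPU barrier is paired with it because
	 * there is no second access it must be ordered against. */
	example_kick(ACCESS_ONCE(tgt->req_vq));
}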
>>>>>>  	/* Discover virtqueues and write information to configuration. */
>>>>>> -	err = vdev->config->find_vqs(vdev, 3, vqs, callbacks, names);
>>>>>> +	err = vdev->config->find_vqs(vdev, num_vqs, vqs, callbacks, names);
>>>>>>  	if (err)
>>>>>>  		return err;
>>>>>>
>>>>>> -	virtscsi_init_vq(&vscsi->ctrl_vq, vqs[0]);
>>>>>> -	virtscsi_init_vq(&vscsi->event_vq, vqs[1]);
>>>>>> -	virtscsi_init_vq(&vscsi->req_vq, vqs[2]);
>>>>>> +	virtscsi_init_vq(&vscsi->ctrl_vq, vqs[0], false);
>>>>>> +	virtscsi_init_vq(&vscsi->event_vq, vqs[1], false);
>>>>>> +	for (i = VIRTIO_SCSI_VQ_BASE; i < num_vqs; i++)
>>>>>> +		virtscsi_init_vq(&vscsi->req_vqs[i - VIRTIO_SCSI_VQ_BASE],
>>>>>> +				 vqs[i], vscsi->num_queues > 1);
>>>>>
>>>>> So affinity is true if >1 vq? I am guessing this is not
>>>>> going to do the right thing unless you have at least
>>>>> as many vqs as CPUs.
>>>>
>>>> Yes, and then you're not setting up the thing correctly.
>>>
>>> Why not just check instead of doing the wrong thing?
>>
>> The right thing could be to set the affinity with a stride, e.g. CPUs
>> 0 and 4 for virtqueue 0, and so on, up to CPUs 3 and 7 for virtqueue 3.
>>
>> Paolo
>
> I think a simple #vqs == #cpus check would be kind of OK for
> starters, otherwise let userspace set affinity.
> Again, need to think about what happens with CPU hotplug.
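The stride mapping described above could be as simple as a modulo rule;
a sketch with a hypothetical helper name, not from any posted patch:

/* With 8 CPUs and 4 request queues this maps CPUs 0 and 4 to virtqueue 0,
 * CPUs 1 and 5 to virtqueue 1, ..., CPUs 3 and 7 to virtqueue 3.  The
 * same rule can pick the submission queue for the issuing CPU. */
static u32 example_vq_for_cpu(u32 cpu, u32 num_queues)
{
	return cpu % num_queues;
}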
How about dynamically setting the affinity this way?

========================================================================

Subject: [PATCH] virtio-scsi: set virtqueue affinity under cpu hotplug

We set the virtqueue affinity when num_queues equals the number of
VCPUs. Register a hotcpu notifier to detect when the number of online
VCPUs changes; when it does, set the virtqueue affinity again.

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
---
 drivers/scsi/virtio_scsi.c | 72 ++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 66 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 3641d5f..1b28e03 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -20,6 +20,7 @@
 #include <linux/virtio_ids.h>
 #include <linux/virtio_config.h>
 #include <linux/virtio_scsi.h>
+#include <linux/cpu.h>
 #include <scsi/scsi_host.h>
 #include <scsi/scsi_device.h>
 #include <scsi/scsi_cmnd.h>
@@ -106,6 +107,9 @@ struct virtio_scsi {
 
 	u32 num_queues;
 
+	/* Is the affinity hint set for the virtqueues? */
+	bool affinity_hint_set;
+
 	struct virtio_scsi_vq ctrl_vq;
 	struct virtio_scsi_vq event_vq;
 	struct virtio_scsi_vq req_vqs[];
@@ -113,6 +117,7 @@
 
 static struct kmem_cache *virtscsi_cmd_cache;
 static mempool_t *virtscsi_cmd_pool;
+static bool cpu_hotplug = false;
 
 static inline struct Scsi_Host *virtio_scsi_host(struct virtio_device *vdev)
 {
@@ -555,6 +560,52 @@ static int virtscsi_queuecommand_single(struct Scsi_Host *sh,
 	return virtscsi_queuecommand(vscsi, tgt, sc);
 }
 
+static void virtscsi_set_affinity(struct virtio_scsi *vscsi,
+				  bool affinity)
+{
+	int i;
+
+	/* In multiqueue mode, when the number of request queues equals the
+	 * number of online CPUs, make each queue private to one CPU by
+	 * setting the affinity hint, to eliminate contention.
+	 */
+	if ((vscsi->num_queues == 1 ||
+	     vscsi->num_queues != num_online_cpus()) && affinity) {
+		if (vscsi->affinity_hint_set)
+			affinity = false;
+		else
+			return;
+	}
+
+	for (i = 0; i < vscsi->num_queues - VIRTIO_SCSI_VQ_BASE; i++) {
+		int cpu = affinity ? i : -1;
+		virtqueue_set_affinity(vscsi->req_vqs[i].vq, cpu);
+	}
+
+	if (affinity)
+		vscsi->affinity_hint_set = true;
+	else
+		vscsi->affinity_hint_set = false;
+}
+
+static int __cpuinit virtscsi_cpu_callback(struct notifier_block *nfb,
+					   unsigned long action, void *hcpu)
+{
+	switch (action) {
+	case CPU_ONLINE:
+	case CPU_ONLINE_FROZEN:
+	case CPU_DEAD:
+	case CPU_DEAD_FROZEN:
+		cpu_hotplug = true;
+	}
+	return NOTIFY_OK;
+}
+
+static struct notifier_block __cpuinitdata virtscsi_cpu_notifier =
+{
+	.notifier_call = virtscsi_cpu_callback,
+};
+
 static int virtscsi_queuecommand_multi(struct Scsi_Host *sh,
 				       struct scsi_cmnd *sc)
 {
@@ -563,6 +614,11 @@ static int virtscsi_queuecommand_multi(struct Scsi_Host *sh,
 	unsigned long flags;
 	u32 queue_num;
 
+	if (unlikely(cpu_hotplug == true)) {
+		virtscsi_set_affinity(vscsi, true);
+		cpu_hotplug = false;
+	}
+
 	/*
 	 * Using an atomic_t for tgt->reqs lets the virtqueue handler
 	 * decrement it without taking the spinlock.
@@ -703,12 +759,10 @@ static struct scsi_host_template virtscsi_host_template_multi = {
 
 static void virtscsi_init_vq(struct virtio_scsi_vq *virtscsi_vq,
-			     struct virtqueue *vq, bool affinity)
+			     struct virtqueue *vq)
 {
 	spin_lock_init(&virtscsi_vq->vq_lock);
 	virtscsi_vq->vq = vq;
-	if (affinity)
-		virtqueue_set_affinity(vq, vq->index - VIRTIO_SCSI_VQ_BASE);
 }
 
 static void virtscsi_init_tgt(struct virtio_scsi *vscsi, int i)
@@ -736,6 +790,8 @@ static void virtscsi_remove_vqs(struct virtio_device *vdev)
 	struct Scsi_Host *sh = virtio_scsi_host(vdev);
 	struct virtio_scsi *vscsi = shost_priv(sh);
 
+	virtscsi_set_affinity(vscsi, false);
+
 	/* Stop all the virtqueues. */
 	vdev->config->reset(vdev);
 
@@ -779,11 +835,14 @@ static int virtscsi_init(struct virtio_device *vdev,
 	if (err)
 		return err;
 
-	virtscsi_init_vq(&vscsi->ctrl_vq, vqs[0], false);
-	virtscsi_init_vq(&vscsi->event_vq, vqs[1], false);
+	virtscsi_init_vq(&vscsi->ctrl_vq, vqs[0]);
+	virtscsi_init_vq(&vscsi->event_vq, vqs[1]);
 	for (i = VIRTIO_SCSI_VQ_BASE; i < num_vqs; i++)
 		virtscsi_init_vq(&vscsi->req_vqs[i - VIRTIO_SCSI_VQ_BASE],
-				 vqs[i], vscsi->num_queues > 1);
+				 vqs[i]);
+
+	virtscsi_set_affinity(vscsi, true);
+	register_hotcpu_notifier(&virtscsi_cpu_notifier);
 
 	virtscsi_config_set(vdev, cdb_size, VIRTIO_SCSI_CDB_SIZE);
 	virtscsi_config_set(vdev, sense_size, VIRTIO_SCSI_SENSE_SIZE);
@@ -882,6 +941,7 @@ static void __devexit virtscsi_remove(struct virtio_device *vdev)
 	struct Scsi_Host *shost = virtio_scsi_host(vdev);
 	struct virtio_scsi *vscsi = shost_priv(shost);
 
+	unregister_hotcpu_notifier(&virtscsi_cpu_notifier);
 	if (virtio_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG))
 		virtscsi_cancel_event_work(vscsi);
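In condensed form, the hotcpu-notifier lifecycle the patch builds on looks
like this; a standalone sketch with hypothetical names, using the pre-3.10
notifier API, with the _FROZEN cases folded via CPU_TASKS_FROZEN instead of
being listed individually:

#include <linux/cpu.h>
#include <linux/module.h>
#include <linux/notifier.h>

static bool example_need_affinity_reset;

static int example_cpu_callback(struct notifier_block *nfb,
				unsigned long action, void *hcpu)
{
	switch (action & ~CPU_TASKS_FROZEN) {
	case CPU_ONLINE:
	case CPU_DEAD:
		/* Only set a flag here; the expensive re-affinity work is
		 * deferred to a context that may take the queue locks. */
		example_need_affinity_reset = true;
		break;
	}
	return NOTIFY_OK;
}

static struct notifier_block example_cpu_notifier = {
	.notifier_call = example_cpu_callback,
};

static int __init example_init(void)
{
	register_hotcpu_notifier(&example_cpu_notifier);
	return 0;
}

static void __exit example_exit(void)
{
	unregister_hotcpu_notifier(&example_cpu_notifier);
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");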