From patchwork Fri Dec 29 02:35:11 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Zhoujian (jay)" <jianjay.zhou@huawei.com>
X-Patchwork-Id: 10136021
Return-Path: 
 <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org
	[172.30.200.125])
	by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id
	7F9696037D for <patchwork-qemu-devel@patchwork.kernel.org>;
	Fri, 29 Dec 2017 02:37:21 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 71CDE2DA08
	for <patchwork-qemu-devel@patchwork.kernel.org>;
	Fri, 29 Dec 2017 02:37:21 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id 66C8F2DA72; Fri, 29 Dec 2017 02:37:21 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI
	autolearn=ham version=3.3.1
Received: from lists.gnu.org (lists.gnu.org [208.118.235.17])
	(using TLSv1 with cipher AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 845D62DA21
	for <patchwork-qemu-devel@patchwork.kernel.org>;
	Fri, 29 Dec 2017 02:37:20 +0000 (UTC)
Received: from localhost ([::1]:49831 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from
	<qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>)
	id 1eUkXv-0005DF-PT for patchwork-qemu-devel@patchwork.kernel.org;
	Thu, 28 Dec 2017 21:37:19 -0500
Received: from eggs.gnu.org ([2001:4830:134:3::10]:50888)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <jianjay.zhou@huawei.com>) id 1eUkWu-0004XL-2f
	for qemu-devel@nongnu.org; Thu, 28 Dec 2017 21:36:17 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <jianjay.zhou@huawei.com>) id 1eUkWq-0006CY-Su
	for qemu-devel@nongnu.org; Thu, 28 Dec 2017 21:36:16 -0500
Received: from [45.249.212.35] (port=57558 helo=huawei.com)
	by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
	(Exim 4.71) (envelope-from <jianjay.zhou@huawei.com>)
	id 1eUkWq-00069L-01
	for qemu-devel@nongnu.org; Thu, 28 Dec 2017 21:36:12 -0500
Received: from DGGEMS409-HUB.china.huawei.com (unknown [172.30.72.58])
	by Forcepoint Email with ESMTP id 6AF49A70A96CC;
	Fri, 29 Dec 2017 10:36:05 +0800 (CST)
Received: from localhost (10.177.19.14) by DGGEMS409-HUB.china.huawei.com
	(10.3.19.209) with Microsoft SMTP Server id 14.3.361.1;
	Fri, 29 Dec 2017 10:35:57 +0800
From: Jay Zhou <jianjay.zhou@huawei.com>
To: <qemu-devel@nongnu.org>
Date: Fri, 29 Dec 2017 10:35:11 +0800
Message-ID: <1514514911-15596-1-git-send-email-jianjay.zhou@huawei.com>
X-Mailer: git-send-email 2.6.1.windows.1
MIME-Version: 1.0
X-Originating-IP: [10.177.19.14]
X-CFilter-Loop: Reflected
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [fuzzy]
X-Received-From: 45.249.212.35
Subject: [Qemu-devel] [PATCH v3] vhost: add used memslot number for
	vhost-user and vhost-kernel separately
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Cc: weidong.huang@huawei.com, mst@redhat.com, wangxinxin.wang@huawei.com,
	Zhe Liu <gary.liuzhe@huawei.com>, arei.gonglei@huawei.com,
	jianjay.zhou@huawei.com, imammedo@redhat.com
Errors-To: 
 qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org
Sender: "Qemu-devel"
	<qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>
X-Virus-Scanned: ClamAV using ClamSMTP

used_memslots is currently set to dev->mem->nregions. That is correct
for vhost-kernel, but not for vhost-user, which only uses the memory
regions that have a file descriptor, and not all memory regions have
one.
Counting the slots per backend is useful in some scenarios. For
example, if used_memslots is 8 but only 5 of those memory slots can
be used by vhost-user, hotplugging a new DIMM fails because
vhost_has_free_slot() returns false, even though the hotplug would
in fact be safe.

Meanwhile, instead of asserting in vhost_user_set_mem_table(), an
error code is now returned so that the device is gracefully prevented
from starting. This fixes the VM crash issue.

Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
Signed-off-by: Zhe Liu <gary.liuzhe@huawei.com>
---
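Note for reviewers (not part of the commit): the 8-versus-5 scenario
from the commit message can be sketched in isolation as below. This is
only an illustration under stated assumptions. The demo_* names, the
demo_region struct and DEMO_USER_SLOTS_LIMIT are hypothetical and are
not QEMU APIs; the "fd > 0" test simply mirrors the check used in the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory region; fd <= 0 means "no fd". */
struct demo_region {
    int fd;
};

/* Assumed slot limit of the vhost-user backend, for this demo only. */
#define DEMO_USER_SLOTS_LIMIT 8

/* vhost-kernel style accounting: every region occupies a slot. */
static unsigned int demo_kernel_used_slots(const struct demo_region *r,
                                           size_t n)
{
    (void)r;
    return (unsigned int)n;
}

/* vhost-user style accounting: only fd-backed regions occupy a slot. */
static unsigned int demo_user_used_slots(const struct demo_region *r,
                                         size_t n)
{
    unsigned int used = 0;

    assert(r != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (r[i].fd > 0) { /* same test as in the patch */
            used++;
        }
    }
    return used;
}

/* A slot is free only while the used count is below the limit. */
static bool demo_has_free_slot(unsigned int used, unsigned int limit)
{
    return used < limit;
}
```

With 8 regions of which 5 are fd-backed, the kernel-style count
reaches the limit while the user-style count still leaves room, which
is the case where DIMM hotplug was refused unnecessarily.
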
 hw/virtio/vhost-backend.c         | 14 +++++++
 hw/virtio/vhost-user.c            | 84 +++++++++++++++++++++++++++++----------
 hw/virtio/vhost.c                 | 16 ++++----
 include/hw/virtio/vhost-backend.h |  4 ++
 4 files changed, 91 insertions(+), 27 deletions(-)
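
Note for reviewers (not part of the commit): the "return an error
instead of asserting" behaviour can be sketched on its own as below.
This is a simplified sketch, not the actual vhost-user code. The
function demo_collect_fds() and the macro DEMO_MAX_REGIONS are
hypothetical names chosen for illustration. As in the patch, counting
continues past the limit so the caller still learns how many fd-backed
entries exist.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed capacity of the output table, for this demo only. */
#define DEMO_MAX_REGIONS 8

/*
 * Copy the fd-backed entries (fd > 0) of fds[] into out[], which must
 * have room for DEMO_MAX_REGIONS entries. The total number of
 * fd-backed entries is stored in *counted, even when it exceeds the
 * capacity. Returns 0 on success and -1 if the capacity was exceeded.
 */
static int demo_collect_fds(const int *fds, size_t n, int *out,
                            size_t *counted)
{
    int r = 0;
    size_t num = 0;

    assert(out != NULL && counted != NULL);
    for (size_t i = 0; i < n; i++) {
        if (fds[i] <= 0) {
            continue; /* not fd-backed, takes no slot */
        }
        if (num < DEMO_MAX_REGIONS) {
            out[num] = fds[i];
        } else {
            r = -1; /* report the overflow instead of aborting */
        }
        num++;
    }
    *counted = num;
    return r;
}
```

A caller that receives -1 can refuse to start the device, while a
caller that only needs the slot count can ignore the return value and
read *counted.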

diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
index 7f09efa..866718c 100644
--- a/hw/virtio/vhost-backend.c
+++ b/hw/virtio/vhost-backend.c
@@ -15,6 +15,8 @@
 #include "hw/virtio/vhost-backend.h"
 #include "qemu/error-report.h"
 
+static unsigned int vhost_kernel_used_memslots;
+
 static int vhost_kernel_call(struct vhost_dev *dev, unsigned long int request,
                              void *arg)
 {
@@ -233,6 +235,16 @@ static void vhost_kernel_set_iotlb_callback(struct vhost_dev *dev,
         qemu_set_fd_handler((uintptr_t)dev->opaque, NULL, NULL, NULL);
 }
 
+static void vhost_kernel_set_used_memslots(struct vhost_dev *dev)
+{
+    vhost_kernel_used_memslots = dev->mem->nregions;
+}
+
+static unsigned int vhost_kernel_get_used_memslots(void)
+{
+    return vhost_kernel_used_memslots;
+}
+
 static const VhostOps kernel_ops = {
         .backend_type = VHOST_BACKEND_TYPE_KERNEL,
         .vhost_backend_init = vhost_kernel_init,
@@ -264,6 +276,8 @@ static const VhostOps kernel_ops = {
 #endif /* CONFIG_VHOST_VSOCK */
         .vhost_set_iotlb_callback = vhost_kernel_set_iotlb_callback,
         .vhost_send_device_iotlb_msg = vhost_kernel_send_device_iotlb_msg,
+        .vhost_set_used_memslots = vhost_kernel_set_used_memslots,
+        .vhost_get_used_memslots = vhost_kernel_get_used_memslots,
 };
 
 int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType backend_type)
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 093675e..0f913be 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -122,6 +122,8 @@ static VhostUserMsg m __attribute__ ((unused));
 /* The version of the protocol we support */
 #define VHOST_USER_VERSION    (0x1)
 
+static unsigned int vhost_user_used_memslots;
+
 struct vhost_user {
     CharBackend *chr;
     int slave_fd;
@@ -289,12 +291,53 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
     return 0;
 }
 
+static int vhost_user_prepare_msg(struct vhost_dev *dev, VhostUserMsg *msg,
+                                  int *fds)
+{
+    int r = 0;
+    int i, fd;
+    size_t fd_num = 0;
+
+    for (i = 0; i < dev->mem->nregions; ++i) {
+        struct vhost_memory_region *reg = dev->mem->regions + i;
+        ram_addr_t offset;
+        MemoryRegion *mr;
+
+        assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
+        mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
+                                     &offset);
+        fd = memory_region_get_fd(mr);
+        if (fd > 0) {
+            if (fd_num < VHOST_MEMORY_MAX_NREGIONS) {
+                msg->payload.memory.nregions++;
+                msg->payload.memory.regions[fd_num].userspace_addr =
+                                                    reg->userspace_addr;
+                msg->payload.memory.regions[fd_num].memory_size =
+                                                    reg->memory_size;
+                msg->payload.memory.regions[fd_num].guest_phys_addr =
+                                                    reg->guest_phys_addr;
+                msg->payload.memory.regions[fd_num].mmap_offset = offset;
+                fds[fd_num] = fd;
+            } else {
+                r = -1;
+            }
+            fd_num++;
+        }
+    }
+
+    /* Save the number of memory slots used by vhost-user so that
+     * vhost_user_get_used_memslots() can return it later.
+     */
+    vhost_user_used_memslots = fd_num;
+
+    return r;
+}
+
 static int vhost_user_set_mem_table(struct vhost_dev *dev,
                                     struct vhost_memory *mem)
 {
     int fds[VHOST_MEMORY_MAX_NREGIONS];
-    int i, fd;
-    size_t fd_num = 0;
+    size_t fd_num;
     bool reply_supported = virtio_has_feature(dev->protocol_features,
                                               VHOST_USER_PROTOCOL_F_REPLY_ACK);
 
@@ -307,26 +350,12 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
         msg.flags |= VHOST_USER_NEED_REPLY_MASK;
     }
 
-    for (i = 0; i < dev->mem->nregions; ++i) {
-        struct vhost_memory_region *reg = dev->mem->regions + i;
-        ram_addr_t offset;
-        MemoryRegion *mr;
-
-        assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
-        mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
-                                     &offset);
-        fd = memory_region_get_fd(mr);
-        if (fd > 0) {
-            msg.payload.memory.regions[fd_num].userspace_addr = reg->userspace_addr;
-            msg.payload.memory.regions[fd_num].memory_size  = reg->memory_size;
-            msg.payload.memory.regions[fd_num].guest_phys_addr = reg->guest_phys_addr;
-            msg.payload.memory.regions[fd_num].mmap_offset = offset;
-            assert(fd_num < VHOST_MEMORY_MAX_NREGIONS);
-            fds[fd_num++] = fd;
-        }
+    if (vhost_user_prepare_msg(dev, &msg, fds) < 0) {
+        error_report("Failed preparing vhost-user memory table msg");
+        return -1;
     }
 
-    msg.payload.memory.nregions = fd_num;
+    fd_num = msg.payload.memory.nregions;
 
     if (!fd_num) {
         error_report("Failed initializing vhost-user memory map, "
@@ -922,6 +951,19 @@ static void vhost_user_set_iotlb_callback(struct vhost_dev *dev, int enabled)
     /* No-op as the receive channel is not dedicated to IOTLB messages. */
 }
 
+static void vhost_user_set_used_memslots(struct vhost_dev *dev)
+{
+    int fds[VHOST_MEMORY_MAX_NREGIONS];
+    VhostUserMsg msg = { 0 };
+
+    vhost_user_prepare_msg(dev, &msg, fds);
+}
+
+static unsigned int vhost_user_get_used_memslots(void)
+{
+    return vhost_user_used_memslots;
+}
+
 const VhostOps user_ops = {
         .backend_type = VHOST_BACKEND_TYPE_USER,
         .vhost_backend_init = vhost_user_init,
@@ -948,4 +990,6 @@ const VhostOps user_ops = {
         .vhost_net_set_mtu = vhost_user_net_set_mtu,
         .vhost_set_iotlb_callback = vhost_user_set_iotlb_callback,
         .vhost_send_device_iotlb_msg = vhost_user_send_device_iotlb_msg,
+        .vhost_set_used_memslots = vhost_user_set_used_memslots,
+        .vhost_get_used_memslots = vhost_user_get_used_memslots,
 };
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index e4290ce..59a32e9 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -43,20 +43,21 @@
 static struct vhost_log *vhost_log;
 static struct vhost_log *vhost_log_shm;
 
-static unsigned int used_memslots;
 static QLIST_HEAD(, vhost_dev) vhost_devices =
     QLIST_HEAD_INITIALIZER(vhost_devices);
 
 bool vhost_has_free_slot(void)
 {
-    unsigned int slots_limit = ~0U;
     struct vhost_dev *hdev;
 
     QLIST_FOREACH(hdev, &vhost_devices, entry) {
-        unsigned int r = hdev->vhost_ops->vhost_backend_memslots_limit(hdev);
-        slots_limit = MIN(slots_limit, r);
+        if (hdev->vhost_ops->vhost_get_used_memslots() >=
+            hdev->vhost_ops->vhost_backend_memslots_limit(hdev)) {
+            return false;
+        }
     }
-    return slots_limit > used_memslots;
+
+    return true;
 }
 
 static void vhost_dev_sync_region(struct vhost_dev *dev,
@@ -606,7 +607,7 @@ static void vhost_set_memory(MemoryListener *listener,
     dev->mem_changed_start_addr = MIN(dev->mem_changed_start_addr, start_addr);
     dev->mem_changed_end_addr = MAX(dev->mem_changed_end_addr, start_addr + size - 1);
     dev->memory_changed = true;
-    used_memslots = dev->mem->nregions;
+    dev->vhost_ops->vhost_set_used_memslots(dev);
 }
 
 static bool vhost_section(MemoryRegionSection *section)
@@ -1251,7 +1252,8 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
         goto fail;
     }
 
-    if (used_memslots > hdev->vhost_ops->vhost_backend_memslots_limit(hdev)) {
+    if (hdev->vhost_ops->vhost_get_used_memslots() >
+        hdev->vhost_ops->vhost_backend_memslots_limit(hdev)) {
         error_report("vhost backend memory slots limit is less"
                 " than current number of present memory slots");
         r = -1;
diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index a7a5f22..19961b5 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -84,6 +84,8 @@ typedef void (*vhost_set_iotlb_callback_op)(struct vhost_dev *dev,
                                            int enabled);
 typedef int (*vhost_send_device_iotlb_msg_op)(struct vhost_dev *dev,
                                               struct vhost_iotlb_msg *imsg);
+typedef void (*vhost_set_used_memslots_op)(struct vhost_dev *dev);
+typedef unsigned int (*vhost_get_used_memslots_op)(void);
 
 typedef struct VhostOps {
     VhostBackendType backend_type;
@@ -118,6 +120,8 @@ typedef struct VhostOps {
     vhost_vsock_set_running_op vhost_vsock_set_running;
     vhost_set_iotlb_callback_op vhost_set_iotlb_callback;
     vhost_send_device_iotlb_msg_op vhost_send_device_iotlb_msg;
+    vhost_set_used_memslots_op vhost_set_used_memslots;
+    vhost_get_used_memslots_op vhost_get_used_memslots;
 } VhostOps;
 
 extern const VhostOps user_ops;