From: Eugenio Pérez <eperezma@redhat.com>
To: qemu-devel@nongnu.org
Subject: [RFC v3 00/29] vDPA software assisted live migration
Date: Wed, 19 May 2021 18:28:34 +0200
Message-Id: <20210519162903.1172366-1-eperezma@redhat.com>
Cc: Parav Pandit, "Michael S. Tsirkin", Jason Wang, Juan Quintela, Markus Armbruster, virtualization@lists.linux-foundation.org, Harpreet Singh Anand, Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Michael Lilja, Stefano Garzarella

This series enables shadow virtqueue for vhost-vdpa devices. This is a new method of vhost device migration: instead of relying on the vhost device's dirty logging capability, SW assisted LM intercepts the dataplane, forwarding the descriptors between VM and device. It is intended for vDPA devices with no dirty memory tracking capabilities.

In this migration mode, qemu offers a new vring to the device to read and write into, disables the vhost notifiers, and processes guest and vhost notifications in qemu. On used buffer relay, qemu marks the dirty memory as it does with plain virtio-net devices. This way, devices do not need to have dirty page logging capability.

This series is a POC doing SW LM for vhost-net and vhost-vdpa devices. The former already has dirty page logging capabilities, but it is both easier to test and uses different code paths in qemu.

For qemu to use shadow virtqueues, the vhost-net devices need to be instantiated:
* With IOMMU (iommu_platform=on,ats=on)
* Without event_idx (event_idx=off)

And the shadow virtqueue needs to be enabled for them with a QMP command like:

{ "execute": "x-vhost-enable-shadow-vq",
  "arguments": { "name": "dev0", "enable": true } }

The series includes some commits to delete in the final version. One of them adds vhost_kernel_vring_pause to vhost kernel devices. It is only intended to work with vhost-net devices, as a way to test the solution, so don't use any other vhost kernel device in the same test.

The vhost-vdpa devices should work the same way. However, vp_vdpa is not working properly with intel iommu unmapping, so this series adds two extra commits to allow testing the solution: enabling SVQ mode from the device start and forbidding any other vhost-vdpa memory mapping. The causes of this are still being debugged.

For testing vhost-vdpa devices, the vp_vdpa device has been used with nested virtualization, using a qemu virtio-net device in L0. To be able to stop and reset status, the features still in RFC status have been implemented in commits 5 and 6. After that, the virtio-net driver in the L0 guest is replaced by the vp_vdpa driver, and a nested qemu instance is launched using it. This vp_vdpa driver also needs to be modified to support the RFCs, mainly allowing it to remove the _S_STOPPED status flag and implementing actual vp_vdpa_set_vq_state and vp_vdpa_get_vq_state callbacks.

Just the notification forwarding (with no descriptor relay) can be achieved with patches 7 and 8, and starting SVQ. A minimal sketch of that kick relay follows.
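For illustration only, this is roughly what the guest->host notification forwarding looks like once qemu owns the notifiers. The struct and function names below are made up for this sketch and are not the ones used in the series; the real code lives in hw/virtio/vhost-shadow-virtqueue.c.

/*
 * Sketch of SVQ guest->host kick forwarding. Names are illustrative,
 * not taken from the series: qemu consumes the guest's kick eventfd
 * and relays the notification to the eventfd the device polls.
 */
#include <stdint.h>
#include <unistd.h>

typedef struct SVQKickSketch {
    int guest_kick_fd;   /* guest writes here (ioeventfd) */
    int device_kick_fd;  /* vhost/vDPA device polls this one */
} SVQKickSketch;

/* Run from qemu's event loop when guest_kick_fd becomes readable */
static void svq_handle_guest_kick(SVQKickSketch *svq)
{
    uint64_t n;

    /* Consume the guest notification... */
    if (read(svq->guest_kick_fd, &n, sizeof(n)) != sizeof(n)) {
        return;
    }
    /* ...and forward it to the device. Descriptor relay would happen
     * here once buffer forwarding (commit 17 onwards) is enabled. */
    n = 1;
    (void)write(svq->device_kick_fd, &n, sizeof(n));
}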
Previous commits are cleanup ones and the declaration of the QMP command.

Commit 17 introduces the buffer forwarding. The previous ones are preparations again, and later ones enable some obvious optimizations. However, it needs the vDPA device to be able to map every IOVA space, and some vDPA devices are not able to do so. Checking for this is added in previous commits.

Later commits allow vhost and the shadow virtqueue to track and translate between qemu virtual addresses and a restricted iommu range (a toy sketch of that translation is included after the changelog below). At the moment it is not able to delete old translations, limit the maximum range it can translate, nor let vhost add new memory regions from the moment SVQ is enabled, but it is somewhat straightforward to add these.

This is a big series, so the idea is to send it in logical chunks once all comments have been collected. As a first complete use case, an SVQ mode with no possibility of going back to regular mode would cover a first use case, and this RFC already has all the ingredients but the internal memory tracking.

It is based on the ideas of DPDK SW assisted LM, in the series at https://patchwork.dpdk.org/cover/48370/ . However, that series does not map the shadow vq in the guest's VA, but in qemu's.

Comments are welcome!

TODO:
* Event, indirect, packed, and other virtio features - waiting for confirmation of the big picture.
* vDPA devices: grow the IOVA tree to track new or deleted memory. Cap the IOVA limit in the tree so it cannot grow forever.
* Separate buffer forwarding into its own AIO context, so we can throw more threads at that task and we don't need to stop the main event loop.
* IOMMU optimizations, so batching and bigger chunks of IOVA can be sent to the device.
* Automatic kick-in on live migration.
* Proper documentation.

Thanks!

Changes from v2 RFC:
* Add vhost-vdpa devices support.
* Fixed some memory leaks pointed out in different comments.

Changes from v1 RFC:
* Use QMP instead of migration to start SVQ mode.
* Only accept IOMMU devices, for closer behavior to the target devices (vDPA).
* Fix invalid masking/unmasking of the vhost call fd.
* Use proper methods for synchronization.
* No need to modify VirtIO device code; all of the changes are contained in vhost code.
* Delete superfluous code.
* An intermediate RFC was sent with only the notification forwarding changes. It can be seen at https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
* v1 at https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html
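As a toy illustration of the translation mentioned above (not the actual implementation, which uses the IOVA tree added in hw/virtio/vhost-iova-tree.c), the shadow virtqueue has to turn qemu virtual addresses into addresses inside the device's restricted IOVA range before exposing descriptors:

/*
 * Illustrative only: a linear list of mappings instead of the real
 * IOVA tree, with made-up names. It shows the qemu VA -> device IOVA
 * lookup SVQ needs before writing a descriptor to the shadow vring.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct IOVAMapSketch {
    uintptr_t qemu_va;   /* mapping start in qemu's address space */
    uint64_t  iova;      /* mapping start in the device IOVA space */
    size_t    size;      /* mapping length */
} IOVAMapSketch;

static bool svq_translate_addr(const IOVAMapSketch *maps, size_t n_maps,
                               uintptr_t va, uint64_t *iova)
{
    for (size_t i = 0; i < n_maps; i++) {
        if (va >= maps[i].qemu_va && va - maps[i].qemu_va < maps[i].size) {
            *iova = maps[i].iova + (va - maps[i].qemu_va);
            return true;   /* address the device can actually reach */
        }
    }
    return false;          /* unmapped: the device cannot use this buffer */
}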
Eugenio Pérez (29):
  virtio: Add virtio_queue_is_host_notifier_enabled
  vhost: Save masked_notifier state
  vhost: Add VhostShadowVirtqueue
  vhost: Add x-vhost-enable-shadow-vq qmp
  virtio: Add VIRTIO_F_QUEUE_STATE
  virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
  vhost: Route guest->host notification through shadow virtqueue
  vhost: Route host->guest notification through shadow virtqueue
  vhost: Avoid re-set masked notifier in shadow vq
  virtio: Add vhost_shadow_vq_get_vring_addr
  vhost: Add vhost_vring_pause operation
  vhost: add vhost_kernel_vring_pause
  vhost: Add vhost_get_iova_range operation
  vhost: add vhost_has_limited_iova_range
  vhost: Add enable_custom_iommu to VhostOps
  vhost-vdpa: Add vhost_vdpa_enable_custom_iommu
  vhost: Shadow virtqueue buffers forwarding
  vhost: Use vhost_enable_custom_iommu to unmap everything if available
  vhost: Check for device VRING_USED_F_NO_NOTIFY at shadow virtqueue kick
  vhost: Use VRING_AVAIL_F_NO_INTERRUPT at device call on shadow virtqueue
  vhost: Add VhostIOVATree
  vhost: Add iova_rev_maps_find_iova to IOVAReverseMaps
  vhost: Use a tree to store memory mappings
  vhost: Add iova_rev_maps_alloc
  vhost: Add custom IOTLB translations to SVQ
  vhost: Map in vdpa-dev
  vhost-vdpa: Implement vhost_vdpa_vring_pause operation
  vhost-vdpa: never map with vDPA listener
  vhost: Start vhost-vdpa SVQ directly

 qapi/net.json                               |  22 +
 hw/virtio/vhost-iova-tree.h                 |  61 ++
 hw/virtio/vhost-shadow-virtqueue.h          |  38 ++
 hw/virtio/virtio-pci.h                      |   1 +
 include/hw/virtio/vhost-backend.h           |  16 +
 include/hw/virtio/vhost-vdpa.h              |   2 +-
 include/hw/virtio/vhost.h                   |  14 +
 include/hw/virtio/virtio.h                  |   5 +-
 .../standard-headers/linux/virtio_config.h  |   5 +
 include/standard-headers/linux/virtio_pci.h |   2 +
 hw/net/virtio-net.c                         |   4 +-
 hw/virtio/vhost-backend.c                   |  42 ++
 hw/virtio/vhost-iova-tree.c                 | 283 ++++
 hw/virtio/vhost-shadow-virtqueue.c          | 643 ++++++++++++++++++
 hw/virtio/vhost-vdpa.c                      |  73 +-
 hw/virtio/vhost.c                           | 459 ++++++++++++-
 hw/virtio/virtio-pci.c                      |   9 +
 hw/virtio/virtio.c                          |   5 +
 hw/virtio/meson.build                       |   2 +-
 hw/virtio/trace-events                      |   1 +
 20 files changed, 1663 insertions(+), 24 deletions(-)
 create mode 100644 hw/virtio/vhost-iova-tree.h
 create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
 create mode 100644 hw/virtio/vhost-iova-tree.c
 create mode 100644 hw/virtio/vhost-shadow-virtqueue.c