From patchwork Thu Oct 19 14:34:37 2023
X-Patchwork-Submitter: Eugenio Perez Martin
X-Patchwork-Id: 13429319
From: Eugenio Pérez <eperezma@redhat.com>
To: qemu-devel@nongnu.org
Cc: Shannon, Parav Pandit, Stefano Garzarella, "Michael S. Tsirkin", yin31149@gmail.com, Jason Wang, Yajun Wu, Zhu Lingshan, Lei Yang, Dragos Tatulea, Juan Quintela, Laurent Vivier, si-wei.liu@oracle.com, Gautam Dawar
Subject: [RFC PATCH 00/18] Map memory at destination .load_setup in vDPA-net migration
Date: Thu, 19 Oct 2023 16:34:37 +0200
Message-Id: <20231019143455.2377694-1-eperezma@redhat.com>

Current memory operations like pinning may take a lot of time at the destination.
Currently they are done after the source of the migration is stopped and before the workload is resumed at the destination. This is a period where neither traffic can flow nor the VM workload can continue (downtime).

We can do better, as we know the memory layout of the guest RAM at the destination from the moment the migration starts. Moving that operation earlier allows QEMU to communicate the maps to the kernel while the workload is still running on the source, so Linux can start mapping them. Ideally, all the IOMMU mappings are configured by then, but even if the vDPA parent driver uses an on-chip IOMMU and .set_map, we still save all the pinning time.

Note that further device setup at the end of the migration may alter the guest memory layout. But, as in the previous point, many operations like memory pinning are still done incrementally, so we save time anyway.

The first batch of patches just reorganizes the code so that memory-related operation parameters are shared between all vhost_vdpa devices. This is because the destination does not know which vhost_vdpa struct will have the registered listener member, so it is easier to place them in a shared struct than to keep them in the vhost_vdpa struct. Future versions may squash or omit these patches.

This is only tested with vdpa_sim. I'm sending it before running full benchmarks, as some work like [1] can be based on it, and Si-Wei agreed to benchmark this series with his experience.

Future directions on top of this series may include:
* Iterative migration of virtio-net devices, as it may reduce downtime per [1]. vhost-vdpa net can apply the configuration through CVQ on the destination while the source is still migrating.
* Moving more things ahead of migration time, like DRIVER_OK.
* Checking that the devices of the destination are valid, and cancelling the migration if they are not.
[1] https://lore.kernel.org/qemu-devel/6c8ebb97-d546-3f1c-4cdd-54e23a566f61@nvidia.com/T/

Eugenio Pérez (18):
  vdpa: add VhostVDPAShared
  vdpa: move iova tree to the shared struct
  vdpa: move iova_range to vhost_vdpa_shared
  vdpa: move shadow_data to vhost_vdpa_shared
  vdpa: use vdpa shared for tracing
  vdpa: move file descriptor to vhost_vdpa_shared
  vdpa: move iotlb_batch_begin_sent to vhost_vdpa_shared
  vdpa: move backend_cap to vhost_vdpa_shared
  vdpa: remove msg type of vhost_vdpa
  vdpa: move iommu_list to vhost_vdpa_shared
  vdpa: use VhostVDPAShared in vdpa_dma_map and unmap
  vdpa: use dev_shared in vdpa_iommu
  vdpa: move memory listener to vhost_vdpa_shared
  vdpa: do not set virtio status bits if unneeded
  vdpa: add vhost_vdpa_load_setup
  vdpa: add vhost_vdpa_net_load_setup NetClient callback
  vdpa: use shadow_data instead of first device v->shadow_vqs_enabled
  virtio_net: register incremental migration handlers

 include/hw/virtio/vhost-vdpa.h |  43 +++++---
 include/net/net.h              |   4 +
 hw/net/virtio-net.c            |  23 +++++
 hw/virtio/vdpa-dev.c           |   7 +-
 hw/virtio/vhost-vdpa.c         | 183 ++++++++++++++++++---------------
 net/vhost-vdpa.c               | 127 ++++++++++++-----------
 hw/virtio/trace-events         |  14 +--
 7 files changed, 239 insertions(+), 162 deletions(-)