From patchwork Tue Jun 18 16:12:19 2024
X-Patchwork-Submitter: "Maciej S. Szmigiero"
X-Patchwork-Id: 13702686
From: "Maciej S. Szmigiero"
To: Peter Xu, Fabiano Rosas
Cc: Alex Williamson, Cédric Le Goater, Eric Blake, Markus Armbruster, Daniel P. Berrangé, Avihai Horon, Joao Martins, qemu-devel@nongnu.org
Subject: [PATCH v1 01/13] vfio/migration: Add save_{iterate,complete_precopy}_started trace events
Date: Tue, 18 Jun 2024 18:12:19 +0200
Message-ID: <8a5b0ed0530bfbecdc1a1a908da7dd7b2eb2687a.1718717584.git.maciej.szmigiero@oracle.com>

From: "Maciej S. Szmigiero"

This way both the start and end points of migrating a particular VFIO device are known.

Also add a vfio_save_iterate_empty_hit trace event so it is known when there is no more data to send for that device.

Signed-off-by: Maciej S.
Szmigiero --- hw/vfio/migration.c | 13 +++++++++++++ hw/vfio/trace-events | 3 +++ include/hw/vfio/vfio-common.h | 3 +++ 3 files changed, 19 insertions(+) diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index 34d4be2ce1b1..93f767e3c2dd 100644 --- a/hw/vfio/migration.c +++ b/hw/vfio/migration.c @@ -472,6 +472,9 @@ static int vfio_save_setup(QEMUFile *f, void *opaque, Error **errp) return -ENOMEM; } + migration->save_iterate_run = false; + migration->save_iterate_empty_hit = false; + if (vfio_precopy_supported(vbasedev)) { switch (migration->device_state) { case VFIO_DEVICE_STATE_RUNNING: @@ -605,9 +608,17 @@ static int vfio_save_iterate(QEMUFile *f, void *opaque) VFIOMigration *migration = vbasedev->migration; ssize_t data_size; + if (!migration->save_iterate_run) { + trace_vfio_save_iterate_started(vbasedev->name); + migration->save_iterate_run = true; + } + data_size = vfio_save_block(f, migration); if (data_size < 0) { return data_size; + } else if (data_size == 0 && !migration->save_iterate_empty_hit) { + trace_vfio_save_iterate_empty_hit(vbasedev->name); + migration->save_iterate_empty_hit = true; } vfio_update_estimated_pending_data(migration, data_size); @@ -633,6 +644,8 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque) int ret; Error *local_err = NULL; + trace_vfio_save_complete_precopy_started(vbasedev->name); + /* We reach here with device state STOP or STOP_COPY only */ ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP_COPY, VFIO_DEVICE_STATE_STOP, &local_err); diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events index 64161bf6f44c..814000796687 100644 --- a/hw/vfio/trace-events +++ b/hw/vfio/trace-events @@ -158,8 +158,11 @@ vfio_migration_state_notifier(const char *name, int state) " (%s) state %d" vfio_save_block(const char *name, int data_size) " (%s) data_size %d" vfio_save_cleanup(const char *name) " (%s)" vfio_save_complete_precopy(const char *name, int ret) " (%s) ret %d" +vfio_save_complete_precopy_started(const char *name) " (%s)" vfio_save_device_config_state(const char *name) " (%s)" vfio_save_iterate(const char *name, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64 +vfio_save_iterate_started(const char *name) " (%s)" +vfio_save_iterate_empty_hit(const char *name) " (%s)" vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer size 0x%"PRIx64 vfio_state_pending_estimate(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64 vfio_state_pending_exact(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t stopcopy_size, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" stopcopy size 0x%"PRIx64" precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64 diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index 4cb1ab8645dc..510818f4dae3 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -71,6 +71,9 @@ typedef struct VFIOMigration { uint64_t precopy_init_size; uint64_t precopy_dirty_size; bool initial_data_sent; + + bool save_iterate_run; + bool save_iterate_empty_hit; } VFIOMigration; struct VFIOGroup; From patchwork Tue Jun 18 16:12:20 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
X-Patchwork-Submitter: "Maciej S. Szmigiero" X-Patchwork-Id: 13702684 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 67816C2BA18 for ; Tue, 18 Jun 2024 16:13:22 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sJbSR-0002tf-AT; Tue, 18 Jun 2024 12:13:19 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sJbSP-0002oU-6J for qemu-devel@nongnu.org; Tue, 18 Jun 2024 12:13:17 -0400 Received: from vps-vb.mhejs.net ([37.28.154.113]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sJbSN-0000p4-O6 for qemu-devel@nongnu.org; Tue, 18 Jun 2024 12:13:16 -0400 Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sJbS6-0001al-Mz; Tue, 18 Jun 2024 18:12:58 +0200 From: "Maciej S. Szmigiero" To: Peter Xu , Fabiano Rosas Cc: Alex Williamson , =?utf-8?q?C=C3=A9dric_Le_G?= =?utf-8?q?oater?= , Eric Blake , Markus Armbruster , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Avihai Horon , Joao Martins , qemu-devel@nongnu.org Subject: [PATCH v1 02/13] migration/ram: Add load start trace event Date: Tue, 18 Jun 2024 18:12:20 +0200 Message-ID: X-Mailer: git-send-email 2.45.1 In-Reply-To: References: MIME-Version: 1.0 Received-SPF: pass client-ip=37.28.154.113; envelope-from=mail@maciej.szmigiero.name; helo=vps-vb.mhejs.net X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: "Maciej S. Szmigiero" There's a RAM load complete trace event but there wasn't its start equivalent. Signed-off-by: Maciej S. 
Szmigiero --- migration/ram.c | 1 + migration/trace-events | 1 + 2 files changed, 2 insertions(+) diff --git a/migration/ram.c b/migration/ram.c index ceea586b06ba..87b0cf86db0c 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -4129,6 +4129,7 @@ static int ram_load_precopy(QEMUFile *f) RAM_SAVE_FLAG_ZERO); } + trace_ram_load_start(); while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) { ram_addr_t addr; void *host = NULL, *host_bak = NULL; diff --git a/migration/trace-events b/migration/trace-events index 0b7c3324fb5e..43dfe4a4bc03 100644 --- a/migration/trace-events +++ b/migration/trace-events @@ -115,6 +115,7 @@ colo_flush_ram_cache_end(void) "" save_xbzrle_page_skipping(void) "" save_xbzrle_page_overflow(void) "" ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations" +ram_load_start(void) "" ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" PRIu64 ram_write_tracking_ramblock_start(const char *block_id, size_t page_size, void *addr, size_t length) "%s: page_size: %zu addr: %p length: %zu" ram_write_tracking_ramblock_stop(const char *block_id, size_t page_size, void *addr, size_t length) "%s: page_size: %zu addr: %p length: %zu" From patchwork Tue Jun 18 16:12:21 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej S. Szmigiero" X-Patchwork-Id: 13702685 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 1BB79C27C4F for ; Tue, 18 Jun 2024 16:13:32 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sJbSS-0002uc-BP; Tue, 18 Jun 2024 12:13:20 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sJbSQ-0002pz-0q for qemu-devel@nongnu.org; Tue, 18 Jun 2024 12:13:18 -0400 Received: from vps-vb.mhejs.net ([37.28.154.113]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sJbSO-0000pF-Lp for qemu-devel@nongnu.org; Tue, 18 Jun 2024 12:13:17 -0400 Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sJbSB-0001av-VK; Tue, 18 Jun 2024 18:13:04 +0200 From: "Maciej S. 
Szmigiero" To: Peter Xu , Fabiano Rosas Cc: Alex Williamson , =?utf-8?q?C=C3=A9dric_Le_G?= =?utf-8?q?oater?= , Eric Blake , Markus Armbruster , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Avihai Horon , Joao Martins , qemu-devel@nongnu.org Subject: [PATCH v1 03/13] migration/multifd: Zero p->flags before starting filling a packet Date: Tue, 18 Jun 2024 18:12:21 +0200 Message-ID: X-Mailer: git-send-email 2.45.1 In-Reply-To: References: MIME-Version: 1.0 Received-SPF: pass client-ip=37.28.154.113; envelope-from=mail@maciej.szmigiero.name; helo=vps-vb.mhejs.net X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: "Maciej S. Szmigiero" This way there aren't stale flags there. p->flags can't contain SYNC to be sent at the next RAM packet since syncs are now handled separately in multifd_send_thread. Signed-off-by: Maciej S. Szmigiero --- migration/multifd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/migration/multifd.c b/migration/multifd.c index f317bff07746..c8a5b363f7d4 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -933,6 +933,7 @@ static void *multifd_send_thread(void *opaque) if (qatomic_load_acquire(&p->pending_job)) { MultiFDPages_t *pages = p->pages; + p->flags = 0; p->iovs_num = 0; assert(pages->num); @@ -986,7 +987,6 @@ static void *multifd_send_thread(void *opaque) } /* p->next_packet_size will always be zero for a SYNC packet */ stat64_add(&mig_stats.multifd_bytes, p->packet_len); - p->flags = 0; } qatomic_set(&p->pending_sync, false); From patchwork Tue Jun 18 16:12:22 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej S. Szmigiero" X-Patchwork-Id: 13702688 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 264A6C27C4F for ; Tue, 18 Jun 2024 16:13:59 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sJbSX-00030f-VH; Tue, 18 Jun 2024 12:13:25 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sJbSW-0002zr-An for qemu-devel@nongnu.org; Tue, 18 Jun 2024 12:13:24 -0400 Received: from vps-vb.mhejs.net ([37.28.154.113]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sJbSU-0000qR-OO for qemu-devel@nongnu.org; Tue, 18 Jun 2024 12:13:24 -0400 Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sJbSH-0001bF-Ao; Tue, 18 Jun 2024 18:13:09 +0200 From: "Maciej S. 
Szmigiero" To: Peter Xu , Fabiano Rosas Cc: Alex Williamson , =?utf-8?q?C=C3=A9dric_Le_G?= =?utf-8?q?oater?= , Eric Blake , Markus Armbruster , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Avihai Horon , Joao Martins , qemu-devel@nongnu.org Subject: [PATCH v1 04/13] migration: Add save_live_complete_precopy_{begin, end} handlers Date: Tue, 18 Jun 2024 18:12:22 +0200 Message-ID: X-Mailer: git-send-email 2.45.1 In-Reply-To: References: MIME-Version: 1.0 Received-SPF: pass client-ip=37.28.154.113; envelope-from=mail@maciej.szmigiero.name; helo=vps-vb.mhejs.net X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: "Maciej S. Szmigiero" These SaveVMHandlers allow device to provide its own asynchronous transmission of the remaining data at the end of a precopy phase. In this use case the save_live_complete_precopy_begin handler is supposed to start such transmission (for example, by launching appropriate threads) while the save_live_complete_precopy_end handler is supposed to wait until such transfer has finished (for example, until the sending threads have exited). Signed-off-by: Maciej S. Szmigiero --- include/migration/register.h | 34 ++++++++++++++++++++++++++++++++++ migration/savevm.c | 35 +++++++++++++++++++++++++++++++++++ 2 files changed, 69 insertions(+) diff --git a/include/migration/register.h b/include/migration/register.h index f60e797894e5..f7b3df799991 100644 --- a/include/migration/register.h +++ b/include/migration/register.h @@ -103,6 +103,40 @@ typedef struct SaveVMHandlers { */ int (*save_live_complete_precopy)(QEMUFile *f, void *opaque); + /** + * @save_live_complete_precopy_begin + * + * Called at the end of a precopy phase, before all @save_live_complete_precopy + * handlers. The handler might, for example, arrange for device-specific + * asynchronous transmission of the remaining data. When postcopy is enabled, + * devices that support postcopy will skip this step. + * + * @f: QEMUFile where the handler can synchronously send data before returning + * @idstr: this device section idstr + * @instance_id: this device section instance_id + * @opaque: data pointer passed to register_savevm_live() + * + * Returns zero to indicate success and negative for error + */ + int (*save_live_complete_precopy_begin)(QEMUFile *f, + char *idstr, uint32_t instance_id, + void *opaque); + /** + * @save_live_complete_precopy_end + * + * Called at the end of a precopy phase, after all @save_live_complete_precopy + * handlers. The handler might, for example, wait for the asynchronous + * transmission started by the @save_live_complete_precopy_begin handler + * to complete. When postcopy is enabled, devices that support postcopy will + * skip this step. + * + * @f: QEMUFile where the handler can synchronously send data before returning + * @opaque: data pointer passed to register_savevm_live() + * + * Returns zero to indicate success and negative for error + */ + int (*save_live_complete_precopy_end)(QEMUFile *f, void *opaque); + /* This runs both outside and inside the BQL. 
diff --git a/migration/savevm.c b/migration/savevm.c
index c621f2359ba3..56fb1c4c2563 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1494,6 +1494,27 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
     SaveStateEntry *se;
     int ret;

+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (!se->ops || (in_postcopy && se->ops->has_postcopy &&
+                         se->ops->has_postcopy(se->opaque)) ||
+            !se->ops->save_live_complete_precopy_begin) {
+            continue;
+        }
+
+        save_section_header(f, se, QEMU_VM_SECTION_END);
+
+        ret = se->ops->save_live_complete_precopy_begin(f,
+                                                        se->idstr, se->instance_id,
+                                                        se->opaque);
+
+        save_section_footer(f, se);
+
+        if (ret < 0) {
+            qemu_file_set_error(f, ret);
+            return -1;
+        }
+    }
+
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
         if (!se->ops || (in_postcopy && se->ops->has_postcopy &&
@@ -1525,6 +1546,20 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
                                     end_ts_each - start_ts_each);
     }

+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (!se->ops || (in_postcopy && se->ops->has_postcopy &&
+                         se->ops->has_postcopy(se->opaque)) ||
+            !se->ops->save_live_complete_precopy_end) {
+            continue;
+        }
+
+        ret = se->ops->save_live_complete_precopy_end(f, se->opaque);
+        if (ret < 0) {
+            qemu_file_set_error(f, ret);
+            return -1;
+        }
+    }
+
     trace_vmstate_downtime_checkpoint("src-iterable-saved");

     return 0;
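To illustrate the intended use of the two handlers added above, here is a minimal device-side sketch; DevState, dev_complete_send_worker() and the thread bookkeeping are hypothetical stand-ins for whatever asynchronous transport the device actually uses, not part of this series:

#include "qemu/osdep.h"
#include "qemu/thread.h"
#include "migration/register.h"

typedef struct DevState {
    QemuThread complete_send_thread;   /* hypothetical device state */
    bool complete_send_running;
} DevState;

static void *dev_complete_send_worker(void *opaque)
{
    /* Transmit the remaining device state asynchronously, e.g. via multifd */
    return NULL;
}

static int dev_save_complete_precopy_begin(QEMUFile *f, char *idstr,
                                           uint32_t instance_id, void *opaque)
{
    DevState *ds = opaque;

    /* Only kick off the transfer here; nothing needs to be written to @f */
    qemu_thread_create(&ds->complete_send_thread, "dev-complete-send",
                       dev_complete_send_worker, ds, QEMU_THREAD_JOINABLE);
    ds->complete_send_running = true;
    return 0;
}

static int dev_save_complete_precopy_end(QEMUFile *f, void *opaque)
{
    DevState *ds = opaque;

    /* Wait until the transfer started by the _begin handler has finished */
    if (ds->complete_send_running) {
        qemu_thread_join(&ds->complete_send_thread);
        ds->complete_send_running = false;
    }
    return 0;
}

static const SaveVMHandlers dev_savevm_handlers = {
    .save_live_complete_precopy_begin = dev_save_complete_precopy_begin,
    .save_live_complete_precopy_end   = dev_save_complete_precopy_end,
    /* ...the device's existing handlers... */
};

The begin handler only launches the worker, so it returns quickly; the end handler is where the device finally synchronizes with the asynchronous transfer.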
Szmigiero" To: Peter Xu , Fabiano Rosas Cc: Alex Williamson , =?utf-8?q?C=C3=A9dric_Le_G?= =?utf-8?q?oater?= , Eric Blake , Markus Armbruster , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Avihai Horon , Joao Martins , qemu-devel@nongnu.org Subject: [PATCH v1 05/13] migration: Add qemu_loadvm_load_state_buffer() and its handler Date: Tue, 18 Jun 2024 18:12:23 +0200 Message-ID: X-Mailer: git-send-email 2.45.1 In-Reply-To: References: MIME-Version: 1.0 Received-SPF: pass client-ip=37.28.154.113; envelope-from=mail@maciej.szmigiero.name; helo=vps-vb.mhejs.net X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: "Maciej S. Szmigiero" qemu_loadvm_load_state_buffer() and its load_state_buffer SaveVMHandler allow providing device state buffer to explicitly specified device via its idstr and instance id. Signed-off-by: Maciej S. Szmigiero --- include/migration/register.h | 15 +++++++++++++++ migration/savevm.c | 25 +++++++++++++++++++++++++ migration/savevm.h | 3 +++ 3 files changed, 43 insertions(+) diff --git a/include/migration/register.h b/include/migration/register.h index f7b3df799991..ce7641c90cea 100644 --- a/include/migration/register.h +++ b/include/migration/register.h @@ -261,6 +261,21 @@ typedef struct SaveVMHandlers { */ int (*load_state)(QEMUFile *f, void *opaque, int version_id); + /** + * @load_state_buffer + * + * Load device state buffer provided to qemu_loadvm_load_state_buffer(). + * + * @opaque: data pointer passed to register_savevm_live() + * @data: the data buffer to load + * @data_size: the data length in buffer + * @errp: pointer to Error*, to store an error if it happens. 
diff --git a/migration/savevm.c b/migration/savevm.c
index 56fb1c4c2563..2e538cb02936 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -3099,6 +3099,31 @@ int qemu_loadvm_approve_switchover(void)
     return migrate_send_rp_switchover_ack(mis);
 }

+int qemu_loadvm_load_state_buffer(const char *idstr, uint32_t instance_id,
+                                  char *buf, size_t len, Error **errp)
+{
+    SaveStateEntry *se;
+
+    se = find_se(idstr, instance_id);
+    if (!se) {
+        error_setg(errp, "Unknown idstr %s or instance id %u for load state buffer",
+                   idstr, instance_id);
+        return -1;
+    }
+
+    if (!se->ops || !se->ops->load_state_buffer) {
+        error_setg(errp, "idstr %s / instance %u has no load state buffer operation",
+                   idstr, instance_id);
+        return -1;
+    }
+
+    if (se->ops->load_state_buffer(se->opaque, buf, len, errp) != 0) {
+        return -1;
+    }
+
+    return 0;
+}
+
 bool save_snapshot(const char *name, bool overwrite, const char *vmstate,
                   bool has_devices, strList *devices, Error **errp)
 {
diff --git a/migration/savevm.h b/migration/savevm.h
index 9ec96a995c93..d388f1bfca98 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -70,4 +70,7 @@ int qemu_loadvm_approve_switchover(void);
 int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
         bool in_postcopy, bool inactivate_disks);

+int qemu_loadvm_load_state_buffer(const char *idstr, uint32_t instance_id,
+                                  char *buf, size_t len, Error **errp);
+
 #endif
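For illustration, a hedged sketch of what a device's load_state_buffer implementation could look like; DevState, DEV_MAX_CHUNK and dev_load_queue_push() are assumptions made up for this example and are not part of the series:

#include "qemu/osdep.h"
#include "qapi/error.h"

#define DEV_MAX_CHUNK (16 * 1024 * 1024)   /* hypothetical sanity limit */

static int dev_load_state_buffer(void *opaque, char *data, size_t data_size,
                                 Error **errp)
{
    DevState *ds = opaque;                 /* hypothetical device struct */
    char *copy;

    if (data_size > DEV_MAX_CHUNK) {
        error_setg(errp, "device state chunk too large: %zu bytes", data_size);
        return -1;
    }

    /*
     * The buffer belongs to the caller and is freed after this handler
     * returns, so take a private copy before handing it to the device's
     * own loading thread.
     */
    copy = g_memdup2(data, data_size);

    /* dev_load_queue_push() is a made-up helper feeding a loader thread */
    dev_load_queue_push(ds, copy, data_size);

    return 0;
}

Such a handler would then be wired up via .load_state_buffer = dev_load_state_buffer in the device's SaveVMHandlers.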
Szmigiero" To: Peter Xu , Fabiano Rosas Cc: Alex Williamson , =?utf-8?q?C=C3=A9dric_Le_G?= =?utf-8?q?oater?= , Eric Blake , Markus Armbruster , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Avihai Horon , Joao Martins , qemu-devel@nongnu.org Subject: [PATCH v1 06/13] migration: Add load_finish handler and associated functions Date: Tue, 18 Jun 2024 18:12:24 +0200 Message-ID: <659b661eef5e9dc47f06dc7a945c2195e936a441.1718717584.git.maciej.szmigiero@oracle.com> X-Mailer: git-send-email 2.45.1 In-Reply-To: References: MIME-Version: 1.0 Received-SPF: pass client-ip=37.28.154.113; envelope-from=mail@maciej.szmigiero.name; helo=vps-vb.mhejs.net X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: "Maciej S. Szmigiero" load_finish SaveVMHandler allows migration code to poll whether a device-specific asynchronous device state loading operation had finished. In order to avoid calling this handler needlessly the device is supposed to notify the migration code of its possible readiness via a call to qemu_loadvm_load_finish_ready_broadcast() while holding qemu_loadvm_load_finish_ready_lock. Signed-off-by: Maciej S. Szmigiero --- include/migration/register.h | 21 +++++++++++++++ migration/migration.c | 6 +++++ migration/migration.h | 3 +++ migration/savevm.c | 52 ++++++++++++++++++++++++++++++++++++ migration/savevm.h | 4 +++ 5 files changed, 86 insertions(+) diff --git a/include/migration/register.h b/include/migration/register.h index ce7641c90cea..7c20a9fb86ff 100644 --- a/include/migration/register.h +++ b/include/migration/register.h @@ -276,6 +276,27 @@ typedef struct SaveVMHandlers { int (*load_state_buffer)(void *opaque, char *data, size_t data_size, Error **errp); + /** + * @load_finish + * + * Poll whether all asynchronous device state loading had finished. + * Not called on the load failure path. + * + * Called while holding the qemu_loadvm_load_finish_ready_lock. + * + * If this method signals "not ready" then it might not be called + * again until qemu_loadvm_load_finish_ready_broadcast() is invoked + * while holding qemu_loadvm_load_finish_ready_lock. + * + * @opaque: data pointer passed to register_savevm_live() + * @is_finished: whether the loading had finished (output parameter) + * @errp: pointer to Error*, to store an error if it happens. + * + * Returns zero to indicate success and negative for error + * It's not an error that the loading still hasn't finished. 
diff --git a/migration/migration.c b/migration/migration.c
index e1b269624c01..ff149e00132f 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -236,6 +236,9 @@ void migration_object_init(void)
     current_incoming->exit_on_error = INMIGRATE_DEFAULT_EXIT_ON_ERROR;

+    qemu_mutex_init(&current_incoming->load_finish_ready_mutex);
+    qemu_cond_init(&current_incoming->load_finish_ready_cond);
+
     migration_object_check(current_migration, &error_fatal);

     ram_mig_init();
@@ -387,6 +390,9 @@ void migration_incoming_state_destroy(void)
         mis->postcopy_qemufile_dst = NULL;
     }

+    qemu_mutex_destroy(&mis->load_finish_ready_mutex);
+    qemu_cond_destroy(&mis->load_finish_ready_cond);
+
     yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }
diff --git a/migration/migration.h b/migration/migration.h
index 6af01362d424..0f2716ac42c6 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -230,6 +230,9 @@ struct MigrationIncomingState {
     /* Do exit on incoming migration failure */
     bool exit_on_error;
+
+    QemuCond load_finish_ready_cond;
+    QemuMutex load_finish_ready_mutex;
 };

 MigrationIncomingState *migration_incoming_get_current(void);
diff --git a/migration/savevm.c b/migration/savevm.c
index 2e538cb02936..46cfb73eae79 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -3020,6 +3020,37 @@ int qemu_loadvm_state(QEMUFile *f)
         return ret;
     }

+    qemu_loadvm_load_finish_ready_lock();
+    while (!ret) { /* Don't call load_finish() handlers on the load failure path */
+        bool all_ready = true;
+        SaveStateEntry *se = NULL;
+
+        QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+            bool this_ready;
+
+            if (!se->ops || !se->ops->load_finish) {
+                continue;
+            }
+
+            ret = se->ops->load_finish(se->opaque, &this_ready, &local_err);
+            if (ret) {
+                error_report_err(local_err);
+
+                qemu_loadvm_load_finish_ready_unlock();
+                return -EINVAL;
+            } else if (!this_ready) {
+                all_ready = false;
+            }
+        }
+
+        if (all_ready) {
+            break;
+        }
+
+        qemu_cond_wait(&mis->load_finish_ready_cond, &mis->load_finish_ready_mutex);
+    }
+    qemu_loadvm_load_finish_ready_unlock();
+
     if (ret == 0) {
         ret = qemu_file_get_error(f);
     }
@@ -3124,6 +3155,27 @@ int qemu_loadvm_load_state_buffer(const char *idstr, uint32_t instance_id,
     return 0;
 }

+void qemu_loadvm_load_finish_ready_lock(void)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+
+    qemu_mutex_lock(&mis->load_finish_ready_mutex);
+}
+
+void qemu_loadvm_load_finish_ready_unlock(void)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+
+    qemu_mutex_unlock(&mis->load_finish_ready_mutex);
+}
+
+void qemu_loadvm_load_finish_ready_broadcast(void)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+
+    qemu_cond_broadcast(&mis->load_finish_ready_cond);
+}
+
 bool save_snapshot(const char *name, bool overwrite, const char *vmstate,
                   bool has_devices, strList *devices, Error **errp)
 {
diff --git a/migration/savevm.h b/migration/savevm.h
index d388f1bfca98..69ae22cded7a 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -73,4 +73,8 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
 int qemu_loadvm_load_state_buffer(const char *idstr, uint32_t instance_id,
                                   char *buf, size_t len, Error **errp);

+void qemu_loadvm_load_finish_ready_lock(void);
+void qemu_loadvm_load_finish_ready_unlock(void);
+void qemu_loadvm_load_finish_ready_broadcast(void);
+
 #endif
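A sketch of how a device's loading thread and its load_finish handler could cooperate with the new lock/broadcast helpers; DevState and its load_complete/load_error fields are hypothetical, introduced only for this example:

/* Runs in the device's own loading thread once all buffers were applied */
static void dev_signal_load_complete(DevState *ds)
{
    qemu_loadvm_load_finish_ready_lock();
    ds->load_complete = true;
    qemu_loadvm_load_finish_ready_broadcast();
    qemu_loadvm_load_finish_ready_unlock();
}

/* Polled by qemu_loadvm_state() while it holds the ready lock */
static int dev_load_finish(void *opaque, bool *is_finished, Error **errp)
{
    DevState *ds = opaque;

    if (ds->load_error) {
        error_setg(errp, "loading the device state failed");
        return -1;
    }

    *is_finished = ds->load_complete;
    return 0;
}

The broadcast wakes up the polling loop in qemu_loadvm_state(), which then calls dev_load_finish() again and sees is_finished set.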
From patchwork Tue Jun 18 16:12:25 2024
X-Patchwork-Submitter: "Maciej S. Szmigiero"
X-Patchwork-Id: 13702695
From: "Maciej S. Szmigiero"
To: Peter Xu, Fabiano Rosas
Cc: Alex Williamson, Cédric Le Goater, Eric Blake, Markus Armbruster, Daniel P. Berrangé, Avihai Horon, Joao Martins, qemu-devel@nongnu.org
Subject: [PATCH v1 07/13] migration/multifd: Device state transfer support - receive side
Date: Tue, 18 Jun 2024 18:12:25 +0200
Message-ID: <41dedaf2c9abebb5e45f88c052daa26320715a92.1718717584.git.maciej.szmigiero@oracle.com>

From: "Maciej S. Szmigiero"

Add basic support for receiving device state via multifd channels - channels that are shared with RAM transfers.

To differentiate between a device state packet and a RAM packet, the packet header is read first. Depending on whether the MULTIFD_FLAG_DEVICE_STATE flag is present in the packet header, either device state (MultiFDPacketDeviceState_t) or RAM data (the existing MultiFDPacket_t) is then read.

The received device state data is passed to the qemu_loadvm_load_state_buffer() function for processing in the device's load_state_buffer handler.

Signed-off-by: Maciej S.
Szmigiero --- migration/multifd.c | 123 +++++++++++++++++++++++++++++++++++++------- migration/multifd.h | 31 ++++++++++- 2 files changed, 134 insertions(+), 20 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index c8a5b363f7d4..6e0af84bb9a1 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -21,6 +21,7 @@ #include "file.h" #include "migration.h" #include "migration-stats.h" +#include "savevm.h" #include "socket.h" #include "tls.h" #include "qemu-file.h" @@ -404,7 +405,7 @@ void multifd_send_fill_packet(MultiFDSendParams *p) uint32_t zero_num = pages->num - pages->normal_num; int i; - packet->flags = cpu_to_be32(p->flags); + packet->hdr.flags = cpu_to_be32(p->flags); packet->pages_alloc = cpu_to_be32(p->pages->allocated); packet->normal_pages = cpu_to_be32(pages->normal_num); packet->zero_pages = cpu_to_be32(zero_num); @@ -432,28 +433,44 @@ void multifd_send_fill_packet(MultiFDSendParams *p) p->flags, p->next_packet_size); } -static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp) +static int multifd_recv_unfill_packet_header(MultiFDRecvParams *p, MultiFDPacketHdr_t *hdr, + Error **errp) { - MultiFDPacket_t *packet = p->packet; - int i; - - packet->magic = be32_to_cpu(packet->magic); - if (packet->magic != MULTIFD_MAGIC) { + hdr->magic = be32_to_cpu(hdr->magic); + if (hdr->magic != MULTIFD_MAGIC) { error_setg(errp, "multifd: received packet " "magic %x and expected magic %x", - packet->magic, MULTIFD_MAGIC); + hdr->magic, MULTIFD_MAGIC); return -1; } - packet->version = be32_to_cpu(packet->version); - if (packet->version != MULTIFD_VERSION) { + hdr->version = be32_to_cpu(hdr->version); + if (hdr->version != MULTIFD_VERSION) { error_setg(errp, "multifd: received packet " "version %u and expected version %u", - packet->version, MULTIFD_VERSION); + hdr->version, MULTIFD_VERSION); return -1; } - p->flags = be32_to_cpu(packet->flags); + p->flags = be32_to_cpu(hdr->flags); + + return 0; +} + +static int multifd_recv_unfill_packet_device_state(MultiFDRecvParams *p, Error **errp) +{ + MultiFDPacketDeviceState_t *packet = p->packet_dev_state; + + packet->instance_id = be32_to_cpu(packet->instance_id); + p->next_packet_size = be32_to_cpu(packet->next_packet_size); + + return 0; +} + +static int multifd_recv_unfill_packet_ram(MultiFDRecvParams *p, Error **errp) +{ + MultiFDPacket_t *packet = p->packet; + int i; packet->pages_alloc = be32_to_cpu(packet->pages_alloc); /* @@ -485,7 +502,6 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp) p->next_packet_size = be32_to_cpu(packet->next_packet_size); p->packet_num = be64_to_cpu(packet->packet_num); - p->packets_recved++; p->total_normal_pages += p->normal_num; p->total_zero_pages += p->zero_num; @@ -533,6 +549,19 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp) return 0; } +static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp) +{ + p->packets_recved++; + + if (p->flags & MULTIFD_FLAG_DEVICE_STATE) { + return multifd_recv_unfill_packet_device_state(p, errp); + } else { + return multifd_recv_unfill_packet_ram(p, errp); + } + + g_assert_not_reached(); +} + static bool multifd_send_should_exit(void) { return qatomic_read(&multifd_send_state->exiting); @@ -1177,8 +1206,8 @@ bool multifd_send_setup(void) p->packet_len = sizeof(MultiFDPacket_t) + sizeof(uint64_t) * page_count; p->packet = g_malloc0(p->packet_len); - p->packet->magic = cpu_to_be32(MULTIFD_MAGIC); - p->packet->version = cpu_to_be32(MULTIFD_VERSION); + p->packet->hdr.magic = 
cpu_to_be32(MULTIFD_MAGIC); + p->packet->hdr.version = cpu_to_be32(MULTIFD_VERSION); /* We need one extra place for the packet header */ p->iov = g_new0(struct iovec, page_count + 1); @@ -1353,6 +1382,7 @@ static void multifd_recv_cleanup_channel(MultiFDRecvParams *p) p->packet_len = 0; g_free(p->packet); p->packet = NULL; + g_clear_pointer(&p->packet_dev_state, g_free); g_free(p->iov); p->iov = NULL; g_free(p->normal); @@ -1467,8 +1497,13 @@ static void *multifd_recv_thread(void *opaque) rcu_register_thread(); while (true) { + MultiFDPacketHdr_t hdr; uint32_t flags = 0; + bool is_device_state = false; bool has_data = false; + uint8_t *pkt_buf; + size_t pkt_len; + p->normal_num = 0; if (use_packets) { @@ -1476,8 +1511,27 @@ static void *multifd_recv_thread(void *opaque) break; } - ret = qio_channel_read_all_eof(p->c, (void *)p->packet, - p->packet_len, &local_err); + ret = qio_channel_read_all_eof(p->c, (void *)&hdr, + sizeof(hdr), &local_err); + if (ret == 0 || ret == -1) { /* 0: EOF -1: Error */ + break; + } + + ret = multifd_recv_unfill_packet_header(p, &hdr, &local_err); + if (ret) { + break; + } + + is_device_state = p->flags & MULTIFD_FLAG_DEVICE_STATE; + if (is_device_state) { + pkt_buf = (uint8_t *)p->packet_dev_state + sizeof(hdr); + pkt_len = sizeof(*p->packet_dev_state) - sizeof(hdr); + } else { + pkt_buf = (uint8_t *)p->packet + sizeof(hdr); + pkt_len = p->packet_len - sizeof(hdr); + } + + ret = qio_channel_read_all_eof(p->c, (char *)pkt_buf, pkt_len, &local_err); if (ret == 0 || ret == -1) { /* 0: EOF -1: Error */ break; } @@ -1520,8 +1574,33 @@ static void *multifd_recv_thread(void *opaque) has_data = !!p->data->size; } - if (has_data) { - ret = multifd_recv_state->ops->recv(p, &local_err); + if (!is_device_state) { + if (has_data) { + ret = multifd_recv_state->ops->recv(p, &local_err); + if (ret != 0) { + break; + } + } + } else { + g_autofree char *idstr = NULL; + g_autofree char *dev_state_buf = NULL; + + assert(use_packets); + + if (p->next_packet_size > 0) { + dev_state_buf = g_malloc(p->next_packet_size); + + ret = qio_channel_read_all(p->c, dev_state_buf, p->next_packet_size, &local_err); + if (ret != 0) { + break; + } + } + + idstr = g_strndup(p->packet_dev_state->idstr, sizeof(p->packet_dev_state->idstr)); + ret = qemu_loadvm_load_state_buffer(idstr, + p->packet_dev_state->instance_id, + dev_state_buf, p->next_packet_size, + &local_err); if (ret != 0) { break; } @@ -1529,6 +1608,11 @@ static void *multifd_recv_thread(void *opaque) if (use_packets) { if (flags & MULTIFD_FLAG_SYNC) { + if (is_device_state) { + error_setg(&local_err, "multifd: received SYNC device state packet"); + break; + } + qemu_sem_post(&multifd_recv_state->sem_sync); qemu_sem_wait(&p->sem_sync); } @@ -1600,6 +1684,7 @@ int multifd_recv_setup(Error **errp) p->packet_len = sizeof(MultiFDPacket_t) + sizeof(uint64_t) * page_count; p->packet = g_malloc0(p->packet_len); + p->packet_dev_state = g_malloc0(sizeof(*p->packet_dev_state)); } p->name = g_strdup_printf("multifdrecv_%d", i); p->iov = g_new0(struct iovec, page_count); diff --git a/migration/multifd.h b/migration/multifd.h index c9d9b0923953..40ee613dd88a 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -41,6 +41,12 @@ MultiFDRecvData *multifd_get_recv_data(void); #define MULTIFD_FLAG_ZLIB (1 << 1) #define MULTIFD_FLAG_ZSTD (2 << 1) +/* + * If set it means that this packet contains device state + * (MultiFDPacketDeviceState_t), not RAM data (MultiFDPacket_t). 
+ */ +#define MULTIFD_FLAG_DEVICE_STATE (1 << 4) + /* This value needs to be a multiple of qemu_target_page_size() */ #define MULTIFD_PACKET_SIZE (512 * 1024) @@ -48,6 +54,11 @@ typedef struct { uint32_t magic; uint32_t version; uint32_t flags; +} __attribute__((packed)) MultiFDPacketHdr_t; + +typedef struct { + MultiFDPacketHdr_t hdr; + /* maximum number of allocated pages */ uint32_t pages_alloc; /* non zero pages */ @@ -68,6 +79,16 @@ typedef struct { uint64_t offset[]; } __attribute__((packed)) MultiFDPacket_t; +typedef struct { + MultiFDPacketHdr_t hdr; + + char idstr[256] QEMU_NONSTRING; + uint32_t instance_id; + + /* size of the next packet that contains the actual data */ + uint32_t next_packet_size; +} __attribute__((packed)) MultiFDPacketDeviceState_t; + typedef struct { /* number of used pages */ uint32_t num; @@ -87,6 +108,13 @@ struct MultiFDRecvData { off_t file_offset; }; +typedef struct { + char *idstr; + uint32_t instance_id; + char *buf; + size_t buf_len; +} MultiFDDeviceState_t; + typedef struct { /* Fields are only written at creating/deletion time */ /* No lock required for them, they are read only */ @@ -194,8 +222,9 @@ typedef struct { /* thread local variables. No locking required */ - /* pointer to the packet */ + /* pointers to the possible packet types */ MultiFDPacket_t *packet; + MultiFDPacketDeviceState_t *packet_dev_state; /* size of the next packet that contains pages */ uint32_t next_packet_size; /* packets received through this channel */ From patchwork Tue Jun 18 16:12:26 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej S. Szmigiero" X-Patchwork-Id: 13702692 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 7C5B3C41513 for ; Tue, 18 Jun 2024 16:14:35 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sJbSy-00057m-DQ; Tue, 18 Jun 2024 12:13:52 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sJbSv-00055D-E7 for qemu-devel@nongnu.org; Tue, 18 Jun 2024 12:13:49 -0400 Received: from vps-vb.mhejs.net ([37.28.154.113]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sJbSt-0000uN-Dn for qemu-devel@nongnu.org; Tue, 18 Jun 2024 12:13:49 -0400 Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sJbSc-0001cr-CI; Tue, 18 Jun 2024 18:13:30 +0200 From: "Maciej S. 
Szmigiero" To: Peter Xu , Fabiano Rosas Cc: Alex Williamson , =?utf-8?q?C=C3=A9dric_Le_G?= =?utf-8?q?oater?= , Eric Blake , Markus Armbruster , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Avihai Horon , Joao Martins , qemu-devel@nongnu.org Subject: [PATCH v1 08/13] migration/multifd: Convert multifd_send_pages::next_channel to atomic Date: Tue, 18 Jun 2024 18:12:26 +0200 Message-ID: <9ca772abfb2cd8fbd180d981a97d06f57d9fa2d5.1718717584.git.maciej.szmigiero@oracle.com> X-Mailer: git-send-email 2.45.1 In-Reply-To: References: MIME-Version: 1.0 Received-SPF: pass client-ip=37.28.154.113; envelope-from=mail@maciej.szmigiero.name; helo=vps-vb.mhejs.net X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: "Maciej S. Szmigiero" This is necessary for multifd_send_pages() to be able to be called from multiple threads. Signed-off-by: Maciej S. Szmigiero --- migration/multifd.c | 24 ++++++++++++++++++------ 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index 6e0af84bb9a1..daa34172bf24 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -614,26 +614,38 @@ static bool multifd_send_pages(void) return false; } - /* We wait here, until at least one channel is ready */ - qemu_sem_wait(&multifd_send_state->channels_ready); - /* * next_channel can remain from a previous migration that was * using more channels, so ensure it doesn't overflow if the * limit is lower now. */ - next_channel %= migrate_multifd_channels(); - for (i = next_channel;; i = (i + 1) % migrate_multifd_channels()) { + i = qatomic_load_acquire(&next_channel); + if (unlikely(i >= migrate_multifd_channels())) { + qatomic_cmpxchg(&next_channel, i, 0); + } + + /* We wait here, until at least one channel is ready */ + qemu_sem_wait(&multifd_send_state->channels_ready); + + while (true) { + int i_next; + if (multifd_send_should_exit()) { return false; } + + i = qatomic_load_acquire(&next_channel); + i_next = (i + 1) % migrate_multifd_channels(); + if (qatomic_cmpxchg(&next_channel, i, i_next) != i) { + continue; + } + p = &multifd_send_state->params[i]; /* * Lockless read to p->pending_job is safe, because only multifd * sender thread can clear it. */ if (qatomic_read(&p->pending_job) == false) { - next_channel = (i + 1) % migrate_multifd_channels(); break; } } From patchwork Tue Jun 18 16:12:27 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej S. 
Szmigiero" X-Patchwork-Id: 13702694 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 336CDC27C4F for ; Tue, 18 Jun 2024 16:14:39 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sJbTA-0005WE-1V; Tue, 18 Jun 2024 12:14:04 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sJbT0-0005Ix-Gd for qemu-devel@nongnu.org; Tue, 18 Jun 2024 12:13:56 -0400 Received: from vps-vb.mhejs.net ([37.28.154.113]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sJbSx-0000uz-5M for qemu-devel@nongnu.org; Tue, 18 Jun 2024 12:13:53 -0400 Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sJbSh-0001dB-Io; Tue, 18 Jun 2024 18:13:35 +0200 From: "Maciej S. Szmigiero" To: Peter Xu , Fabiano Rosas Cc: Alex Williamson , =?utf-8?q?C=C3=A9dric_Le_G?= =?utf-8?q?oater?= , Eric Blake , Markus Armbruster , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Avihai Horon , Joao Martins , qemu-devel@nongnu.org Subject: [PATCH v1 09/13] migration/multifd: Device state transfer support - send side Date: Tue, 18 Jun 2024 18:12:27 +0200 Message-ID: X-Mailer: git-send-email 2.45.1 In-Reply-To: References: MIME-Version: 1.0 Received-SPF: pass client-ip=37.28.154.113; envelope-from=mail@maciej.szmigiero.name; helo=vps-vb.mhejs.net X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: "Maciej S. Szmigiero" A new function multifd_queue_device_state() is provided for device to queue its state for transmission via a multifd channel. Signed-off-by: Maciej S. 
Szmigiero --- include/migration/misc.h | 4 + migration/multifd-zlib.c | 2 +- migration/multifd-zstd.c | 2 +- migration/multifd.c | 181 +++++++++++++++++++++++++++++++++------ migration/multifd.h | 26 ++++-- 5 files changed, 182 insertions(+), 33 deletions(-) diff --git a/include/migration/misc.h b/include/migration/misc.h index bfadc5613bac..abf6f33eeae8 100644 --- a/include/migration/misc.h +++ b/include/migration/misc.h @@ -111,4 +111,8 @@ bool migration_in_bg_snapshot(void); /* migration/block-dirty-bitmap.c */ void dirty_bitmap_mig_init(void); +/* migration/multifd.c */ +int multifd_queue_device_state(char *idstr, uint32_t instance_id, + char *data, size_t len); + #endif diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c index 737a9645d2fe..424547aa5be0 100644 --- a/migration/multifd-zlib.c +++ b/migration/multifd-zlib.c @@ -177,7 +177,7 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp) out: p->flags |= MULTIFD_FLAG_ZLIB; - multifd_send_fill_packet(p); + multifd_send_fill_packet_ram(p); return 0; } diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c index 256858df0a0a..89ef21898485 100644 --- a/migration/multifd-zstd.c +++ b/migration/multifd-zstd.c @@ -166,7 +166,7 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp) out: p->flags |= MULTIFD_FLAG_ZSTD; - multifd_send_fill_packet(p); + multifd_send_fill_packet_ram(p); return 0; } diff --git a/migration/multifd.c b/migration/multifd.c index daa34172bf24..6a7e5d659925 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -12,6 +12,7 @@ #include "qemu/osdep.h" #include "qemu/cutils.h" +#include "qemu/iov.h" #include "qemu/rcu.h" #include "exec/target_page.h" #include "sysemu/sysemu.h" @@ -19,6 +20,7 @@ #include "qemu/error-report.h" #include "qapi/error.h" #include "file.h" +#include "migration/misc.h" #include "migration.h" #include "migration-stats.h" #include "savevm.h" @@ -49,9 +51,12 @@ typedef struct { } __attribute__((packed)) MultiFDInit_t; struct { + QemuMutex queue_job_mutex; + MultiFDSendParams *params; - /* array of pages to sent */ + /* array of pages or device state to be sent */ MultiFDPages_t *pages; + MultiFDDeviceState_t *device_state; /* * Global number of generated multifd packets. * @@ -168,7 +173,7 @@ static void multifd_send_prepare_iovs(MultiFDSendParams *p) } /** - * nocomp_send_prepare: prepare date to be able to send + * nocomp_send_prepare_ram: prepare RAM data for sending * * For no compression we just have to calculate the size of the * packet. @@ -178,7 +183,7 @@ static void multifd_send_prepare_iovs(MultiFDSendParams *p) * @p: Params for the channel that we are using * @errp: pointer to an error */ -static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp) +static int nocomp_send_prepare_ram(MultiFDSendParams *p, Error **errp) { bool use_zero_copy_send = migrate_zero_copy_send(); int ret; @@ -197,13 +202,13 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp) * Only !zerocopy needs the header in IOV; zerocopy will * send it separately. 
*/ - multifd_send_prepare_header(p); + multifd_send_prepare_header_ram(p); } multifd_send_prepare_iovs(p); p->flags |= MULTIFD_FLAG_NOCOMP; - multifd_send_fill_packet(p); + multifd_send_fill_packet_ram(p); if (use_zero_copy_send) { /* Send header first, without zerocopy */ @@ -217,6 +222,56 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp) return 0; } +static void multifd_send_fill_packet_device_state(MultiFDSendParams *p) +{ + MultiFDPacketDeviceState_t *packet = p->packet_device_state; + + packet->hdr.flags = cpu_to_be32(p->flags); + strncpy(packet->idstr, p->device_state->idstr, sizeof(packet->idstr)); + packet->instance_id = cpu_to_be32(p->device_state->instance_id); + packet->next_packet_size = cpu_to_be32(p->next_packet_size); +} + +/** + * nocomp_send_prepare_device_state: prepare device state data for sending + * + * Returns 0 for success or -1 for error + * + * @p: Params for the channel that we are using + * @errp: pointer to an error + */ +static int nocomp_send_prepare_device_state(MultiFDSendParams *p, + Error **errp) +{ + multifd_send_prepare_header_device_state(p); + + assert(!(p->flags & MULTIFD_FLAG_SYNC)); + + p->next_packet_size = p->device_state->buf_len; + if (p->next_packet_size > 0) { + p->iov[p->iovs_num].iov_base = p->device_state->buf; + p->iov[p->iovs_num].iov_len = p->next_packet_size; + p->iovs_num++; + } + + p->flags |= MULTIFD_FLAG_NOCOMP | MULTIFD_FLAG_DEVICE_STATE; + + multifd_send_fill_packet_device_state(p); + + return 0; +} + +static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp) +{ + if (p->is_device_state_job) { + return nocomp_send_prepare_device_state(p, errp); + } else { + return nocomp_send_prepare_ram(p, errp); + } + + g_assert_not_reached(); +} + /** * nocomp_recv_setup: setup receive side * @@ -397,7 +452,18 @@ static void multifd_pages_clear(MultiFDPages_t *pages) g_free(pages); } -void multifd_send_fill_packet(MultiFDSendParams *p) +static void multifd_device_state_free(MultiFDDeviceState_t *device_state) +{ + if (!device_state) { + return; + } + + g_clear_pointer(&device_state->idstr, g_free); + g_clear_pointer(&device_state->buf, g_free); + g_free(device_state); +} + +void multifd_send_fill_packet_ram(MultiFDSendParams *p) { MultiFDPacket_t *packet = p->packet; MultiFDPages_t *pages = p->pages; @@ -585,7 +651,8 @@ static void multifd_send_kick_main(MultiFDSendParams *p) } /* - * How we use multifd_send_state->pages and channel->pages? + * How we use multifd_send_state->pages + channel->pages + * and multifd_send_state->device_state + channel->device_state? * * We create a pages for each channel, and a main one. Each time that * we need to send a batch of pages we interchange the ones between @@ -601,14 +668,15 @@ static void multifd_send_kick_main(MultiFDSendParams *p) * have to had finish with its own, otherwise pending_job can't be * false. * + * 'device_state' struct has similar handling. + * * Returns true if succeed, false otherwise. */ -static bool multifd_send_pages(void) +static bool multifd_send_queue_job(bool is_device_state) { int i; static int next_channel; MultiFDSendParams *p = NULL; /* make happy gcc */ - MultiFDPages_t *pages = multifd_send_state->pages; if (multifd_send_should_exit()) { return false; @@ -645,7 +713,7 @@ static bool multifd_send_pages(void) * Lockless read to p->pending_job is safe, because only multifd * sender thread can clear it. 
*/ - if (qatomic_read(&p->pending_job) == false) { + if (qatomic_cmpxchg(&p->pending_job_preparing, false, true) == false) { break; } } @@ -655,12 +723,30 @@ static bool multifd_send_pages(void) * qatomic_store_release() in multifd_send_thread(). */ smp_mb_acquire(); - assert(!p->pages->num); - multifd_send_state->pages = p->pages; - p->pages = pages; + + if (!is_device_state) { + assert(!p->pages->num); + } else { + assert(!p->device_state->buf); + } + + p->is_device_state_job = is_device_state; + + if (!is_device_state) { + MultiFDPages_t *pages = multifd_send_state->pages; + + multifd_send_state->pages = p->pages; + p->pages = pages; + } else { + MultiFDDeviceState_t *device_state = multifd_send_state->device_state; + + multifd_send_state->device_state = p->device_state; + p->device_state = device_state; + } + /* - * Making sure p->pages is setup before marking pending_job=true. Pairs - * with the qatomic_load_acquire() in multifd_send_thread(). + * Making sure p->pages or p->device state is setup before marking + * pending_job=true. Pairs with the qatomic_load_acquire() in multifd_send_thread(). */ qatomic_store_release(&p->pending_job, true); qemu_sem_post(&p->sem); @@ -707,7 +793,7 @@ retry: * After flush, always retry. */ if (pages->block != block || multifd_queue_full(pages)) { - if (!multifd_send_pages()) { + if (!multifd_send_queue_job(false)) { return false; } goto retry; @@ -718,6 +804,28 @@ retry: return true; } +int multifd_queue_device_state(char *idstr, uint32_t instance_id, + char *data, size_t len) +{ + /* Device state submissions can come from multiple threads */ + QEMU_LOCK_GUARD(&multifd_send_state->queue_job_mutex); + MultiFDDeviceState_t *device_state = multifd_send_state->device_state; + + assert(!device_state->buf); + device_state->idstr = g_strdup(idstr); + device_state->instance_id = instance_id; + device_state->buf = g_memdup2(data, len); + device_state->buf_len = len; + + if (!multifd_send_queue_job(true)) { + g_clear_pointer(&device_state->idstr, g_free); + g_clear_pointer(&device_state->buf, g_free); + return -1; + } + + return 0; +} + /* Multifd send side hit an error; remember it and prepare to quit */ static void multifd_send_set_error(Error *err) { @@ -822,10 +930,12 @@ static bool multifd_send_cleanup_channel(MultiFDSendParams *p, Error **errp) multifd_pages_clear(p->pages); p->pages = NULL; p->packet_len = 0; + g_clear_pointer(&p->packet_device_state, g_free); g_free(p->packet); p->packet = NULL; g_free(p->iov); p->iov = NULL; + g_clear_pointer(&p->device_state, multifd_device_state_free); multifd_send_state->ops->send_cleanup(p, errp); return *errp == NULL; @@ -840,7 +950,9 @@ static void multifd_send_cleanup_state(void) g_free(multifd_send_state->params); multifd_send_state->params = NULL; multifd_pages_clear(multifd_send_state->pages); + g_clear_pointer(&multifd_send_state->device_state, multifd_device_state_free); multifd_send_state->pages = NULL; + qemu_mutex_destroy(&multifd_send_state->queue_job_mutex); g_free(multifd_send_state); multifd_send_state = NULL; } @@ -894,10 +1006,11 @@ int multifd_send_sync_main(void) return 0; } if (multifd_send_state->pages->num) { - if (!multifd_send_pages()) { + if (!multifd_send_queue_job(false)) { error_report("%s: multifd_send_pages fail", __func__); return -1; } + assert(!multifd_send_state->pages->num); } flush_zero_copy = migrate_zero_copy_send(); @@ -973,17 +1086,22 @@ static void *multifd_send_thread(void *opaque) */ if (qatomic_load_acquire(&p->pending_job)) { MultiFDPages_t *pages = p->pages; + bool 
is_device_state = p->is_device_state_job; + size_t total_size; p->flags = 0; p->iovs_num = 0; - assert(pages->num); + assert(is_device_state || pages->num); ret = multifd_send_state->ops->send_prepare(p, &local_err); if (ret != 0) { break; } + total_size = iov_size(p->iov, p->iovs_num); if (migrate_mapped_ram()) { + assert(!is_device_state); + ret = file_write_ramblock_iov(p->c, p->iov, p->iovs_num, p->pages->block, &local_err); } else { @@ -996,12 +1114,18 @@ static void *multifd_send_thread(void *opaque) break; } - stat64_add(&mig_stats.multifd_bytes, - p->next_packet_size + p->packet_len); - stat64_add(&mig_stats.normal_pages, pages->normal_num); - stat64_add(&mig_stats.zero_pages, pages->num - pages->normal_num); + stat64_add(&mig_stats.multifd_bytes, total_size); + if (!is_device_state) { + stat64_add(&mig_stats.normal_pages, pages->normal_num); + stat64_add(&mig_stats.zero_pages, pages->num - pages->normal_num); + } - multifd_pages_reset(p->pages); + if (is_device_state) { + g_clear_pointer(&p->device_state->idstr, g_free); + g_clear_pointer(&p->device_state->buf, g_free); + } else { + multifd_pages_reset(p->pages); + } p->next_packet_size = 0; /* @@ -1010,6 +1134,7 @@ static void *multifd_send_thread(void *opaque) * multifd_send_pages(). */ qatomic_store_release(&p->pending_job, false); + qatomic_store_release(&p->pending_job_preparing, false); } else { /* * If not a normal job, must be a sync request. Note that @@ -1020,7 +1145,7 @@ static void *multifd_send_thread(void *opaque) if (use_packets) { p->flags = MULTIFD_FLAG_SYNC; - multifd_send_fill_packet(p); + multifd_send_fill_packet_ram(p); ret = qio_channel_write_all(p->c, (void *)p->packet, p->packet_len, &local_err); if (ret != 0) { @@ -1199,9 +1324,11 @@ bool multifd_send_setup(void) thread_count = migrate_multifd_channels(); multifd_send_state = g_malloc0(sizeof(*multifd_send_state)); + qemu_mutex_init(&multifd_send_state->queue_job_mutex); multifd_send_state->params = g_new0(MultiFDSendParams, thread_count); multifd_send_state->pages = multifd_pages_init(page_count); qemu_sem_init(&multifd_send_state->channels_created, 0); + multifd_send_state->device_state = g_malloc0(sizeof(*multifd_send_state->device_state)); qemu_sem_init(&multifd_send_state->channels_ready, 0); qatomic_set(&multifd_send_state->exiting, 0); multifd_send_state->ops = multifd_ops[migrate_multifd_compression()]; @@ -1215,11 +1342,15 @@ bool multifd_send_setup(void) p->pages = multifd_pages_init(page_count); if (use_packets) { + p->device_state = g_malloc0(sizeof(*p->device_state)); + p->packet_len = sizeof(MultiFDPacket_t) + sizeof(uint64_t) * page_count; p->packet = g_malloc0(p->packet_len); p->packet->hdr.magic = cpu_to_be32(MULTIFD_MAGIC); p->packet->hdr.version = cpu_to_be32(MULTIFD_VERSION); + p->packet_device_state = g_malloc0(sizeof(*p->packet_device_state)); + p->packet_device_state->hdr = p->packet->hdr; /* We need one extra place for the packet header */ p->iov = g_new0(struct iovec, page_count + 1); @@ -1786,7 +1917,7 @@ bool multifd_send_prepare_common(MultiFDSendParams *p) return false; } - multifd_send_prepare_header(p); + multifd_send_prepare_header_ram(p); return true; } diff --git a/migration/multifd.h b/migration/multifd.h index 40ee613dd88a..655bec110f87 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -156,18 +156,25 @@ typedef struct { * cleared by the multifd sender threads. */ bool pending_job; + bool pending_job_preparing; bool pending_sync; - /* array of pages to sent. 
- * The owner of 'pages' depends of 'pending_job' value: + + /* Whether the pending job is pages (false) or device state (true) */ + bool is_device_state_job; + + /* Array of pages or device state to be sent (depending on the flag above). + * The owner of these depends of 'pending_job' value: * pending_job == 0 -> migration_thread can use it. * pending_job != 0 -> multifd_channel can use it. */ MultiFDPages_t *pages; + MultiFDDeviceState_t *device_state; /* thread local variables. No locking required */ - /* pointer to the packet */ + /* pointers to the possible packet types */ MultiFDPacket_t *packet; + MultiFDPacketDeviceState_t *packet_device_state; /* size of the next packet that contains pages */ uint32_t next_packet_size; /* packets sent through this channel */ @@ -267,18 +274,25 @@ typedef struct { } MultiFDMethods; void multifd_register_ops(int method, MultiFDMethods *ops); -void multifd_send_fill_packet(MultiFDSendParams *p); +void multifd_send_fill_packet_ram(MultiFDSendParams *p); bool multifd_send_prepare_common(MultiFDSendParams *p); void multifd_send_zero_page_detect(MultiFDSendParams *p); void multifd_recv_zero_page_process(MultiFDRecvParams *p); -static inline void multifd_send_prepare_header(MultiFDSendParams *p) +void multifd_channel_connect(MultiFDSendParams *p, QIOChannel *ioc); + +static inline void multifd_send_prepare_header_ram(MultiFDSendParams *p) { p->iov[0].iov_len = p->packet_len; p->iov[0].iov_base = p->packet; p->iovs_num++; } -void multifd_channel_connect(MultiFDSendParams *p, QIOChannel *ioc); +static inline void multifd_send_prepare_header_device_state(MultiFDSendParams *p) +{ + p->iov[0].iov_len = sizeof(*p->packet_device_state); + p->iov[0].iov_base = p->packet_device_state; + p->iovs_num++; +} #endif From patchwork Tue Jun 18 16:12:28 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej S. Szmigiero" X-Patchwork-Id: 13702691 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 08441C2BB85 for ; Tue, 18 Jun 2024 16:14:35 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sJbTC-0005dF-3W; Tue, 18 Jun 2024 12:14:06 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sJbT2-0005K8-Ey for qemu-devel@nongnu.org; Tue, 18 Jun 2024 12:13:58 -0400 Received: from vps-vb.mhejs.net ([37.28.154.113]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sJbT0-0000vK-Sh for qemu-devel@nongnu.org; Tue, 18 Jun 2024 12:13:55 -0400 Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sJbSm-0001dZ-Qi; Tue, 18 Jun 2024 18:13:40 +0200 From: "Maciej S. 
Szmigiero" To: Peter Xu , Fabiano Rosas Cc: Alex Williamson , =?utf-8?q?C=C3=A9dric_Le_G?= =?utf-8?q?oater?= , Eric Blake , Markus Armbruster , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Avihai Horon , Joao Martins , qemu-devel@nongnu.org Subject: [PATCH v1 10/13] migration/multifd: Add migration_has_device_state_support() Date: Tue, 18 Jun 2024 18:12:28 +0200 Message-ID: <2dd6ec214193ecc7b1aba9e5f9655aeb9b213f28.1718717584.git.maciej.szmigiero@oracle.com> X-Mailer: git-send-email 2.45.1 In-Reply-To: References: MIME-Version: 1.0 Received-SPF: pass client-ip=37.28.154.113; envelope-from=mail@maciej.szmigiero.name; helo=vps-vb.mhejs.net X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: "Maciej S. Szmigiero" Since device state transfer via multifd channels requires multifd channels with packets and is currently not compatible with multifd compression add an appropriate query function so device can learn whether it can actually make use of it. Signed-off-by: Maciej S. Szmigiero --- include/migration/misc.h | 1 + migration/multifd.c | 6 ++++++ 2 files changed, 7 insertions(+) diff --git a/include/migration/misc.h b/include/migration/misc.h index abf6f33eeae8..4f3de2f23819 100644 --- a/include/migration/misc.h +++ b/include/migration/misc.h @@ -112,6 +112,7 @@ bool migration_in_bg_snapshot(void); void dirty_bitmap_mig_init(void); /* migration/multifd.c */ +bool migration_has_device_state_support(void); int multifd_queue_device_state(char *idstr, uint32_t instance_id, char *data, size_t len); diff --git a/migration/multifd.c b/migration/multifd.c index 6a7e5d659925..e5f7021465ec 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -804,6 +804,12 @@ retry: return true; } +bool migration_has_device_state_support(void) +{ + return migrate_multifd() && !migrate_mapped_ram() && + migrate_multifd_compression() == MULTIFD_COMPRESSION_NONE; +} + int multifd_queue_device_state(char *idstr, uint32_t instance_id, char *data, size_t len) { From patchwork Tue Jun 18 16:12:29 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej S. 
Szmigiero" X-Patchwork-Id: 13702690 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id E061EC2BA18 for ; Tue, 18 Jun 2024 16:14:26 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sJbTJ-00067L-5d; Tue, 18 Jun 2024 12:14:13 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sJbT8-0005ac-1l for qemu-devel@nongnu.org; Tue, 18 Jun 2024 12:14:03 -0400 Received: from vps-vb.mhejs.net ([37.28.154.113]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sJbT5-0000vm-EA for qemu-devel@nongnu.org; Tue, 18 Jun 2024 12:14:01 -0400 Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sJbSs-0001dw-0e; Tue, 18 Jun 2024 18:13:46 +0200 From: "Maciej S. Szmigiero" To: Peter Xu , Fabiano Rosas Cc: Alex Williamson , =?utf-8?q?C=C3=A9dric_Le_G?= =?utf-8?q?oater?= , Eric Blake , Markus Armbruster , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Avihai Horon , Joao Martins , qemu-devel@nongnu.org Subject: [PATCH v1 11/13] vfio/migration: Multifd device state transfer support - receive side Date: Tue, 18 Jun 2024 18:12:29 +0200 Message-ID: X-Mailer: git-send-email 2.45.1 In-Reply-To: References: MIME-Version: 1.0 Received-SPF: pass client-ip=37.28.154.113; envelope-from=mail@maciej.szmigiero.name; helo=vps-vb.mhejs.net X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: "Maciej S. Szmigiero" The multifd received data needs to be reassembled since device state packets sent via different multifd channels can arrive out-of-order. Therefore, each VFIO device state packet carries a header indicating its position in the stream. The last such VFIO device state packet should have VFIO_DEVICE_STATE_CONFIG_STATE flag set and carry the device config state. Since it's important to finish loading device state transferred via the main migration channel (via save_live_iterate handler) before starting loading the data asynchronously transferred via multifd a new VFIO_MIG_FLAG_DEV_DATA_STATE_COMPLETE flag is introduced to mark the end of the main migration channel data. The device state loading process waits until that flag is seen before commencing loading of the multifd-transferred device state. Signed-off-by: Maciej S. 
Szmigiero --- hw/vfio/migration.c | 325 +++++++++++++++++++++++++++++++++- hw/vfio/trace-events | 9 +- include/hw/vfio/vfio-common.h | 14 ++ 3 files changed, 344 insertions(+), 4 deletions(-) diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index 93f767e3c2dd..719e36800ab5 100644 --- a/hw/vfio/migration.c +++ b/hw/vfio/migration.c @@ -15,6 +15,7 @@ #include #include +#include "io/channel-buffer.h" #include "sysemu/runstate.h" #include "hw/vfio/vfio-common.h" #include "migration/misc.h" @@ -47,6 +48,7 @@ #define VFIO_MIG_FLAG_DEV_SETUP_STATE (0xffffffffef100003ULL) #define VFIO_MIG_FLAG_DEV_DATA_STATE (0xffffffffef100004ULL) #define VFIO_MIG_FLAG_DEV_INIT_DATA_SENT (0xffffffffef100005ULL) +#define VFIO_MIG_FLAG_DEV_DATA_STATE_COMPLETE (0xffffffffef100006ULL) /* * This is an arbitrary size based on migration of mlx5 devices, where typically @@ -55,6 +57,15 @@ */ #define VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE (1 * MiB) +#define VFIO_DEVICE_STATE_CONFIG_STATE (1) + +typedef struct VFIODeviceStatePacket { + uint32_t version; + uint32_t idx; + uint32_t flags; + uint8_t data[0]; +} QEMU_PACKED VFIODeviceStatePacket; + static int64_t bytes_transferred; static const char *mig_state_to_str(enum vfio_device_mig_state state) @@ -254,6 +265,176 @@ static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev, return ret; } +typedef struct LoadedBuffer { + bool is_present; + char *data; + size_t len; +} LoadedBuffer; + +static void loaded_buffer_clear(gpointer data) +{ + LoadedBuffer *lb = data; + + if (!lb->is_present) { + return; + } + + g_clear_pointer(&lb->data, g_free); + lb->is_present = false; +} + +static int vfio_load_state_buffer(void *opaque, char *data, size_t data_size, + Error **errp) +{ + VFIODevice *vbasedev = opaque; + VFIOMigration *migration = vbasedev->migration; + VFIODeviceStatePacket *packet = (VFIODeviceStatePacket *)data; + QEMU_LOCK_GUARD(&migration->load_bufs_mutex); + LoadedBuffer *lb; + + if (data_size < sizeof(*packet)) { + error_setg(errp, "packet too short at %zu (min is %zu)", + data_size, sizeof(*packet)); + return -1; + } + + if (packet->version != 0) { + error_setg(errp, "packet has unknown version %" PRIu32, + packet->version); + return -1; + } + + if (packet->idx == UINT32_MAX) { + error_setg(errp, "packet has too high idx %" PRIu32, + packet->idx); + return -1; + } + + trace_vfio_load_state_device_buffer_incoming(vbasedev->name, packet->idx); + + /* config state packet should be the last one in the stream */ + if (packet->flags & VFIO_DEVICE_STATE_CONFIG_STATE) { + migration->load_buf_idx_last = packet->idx; + } + + assert(migration->load_bufs); + if (packet->idx >= migration->load_bufs->len) { + g_array_set_size(migration->load_bufs, packet->idx + 1); + } + + lb = &g_array_index(migration->load_bufs, typeof(*lb), packet->idx); + if (lb->is_present) { + error_setg(errp, "state buffer %" PRIu32 " already filled", packet->idx); + return -1; + } + + assert(packet->idx >= migration->load_buf_idx); + + lb->data = g_memdup2(&packet->data, data_size - sizeof(*packet)); + lb->len = data_size - sizeof(*packet); + lb->is_present = true; + + qemu_cond_broadcast(&migration->load_bufs_buffer_ready_cond); + + return 0; +} + +static void *vfio_load_bufs_thread(void *opaque) +{ + VFIODevice *vbasedev = opaque; + VFIOMigration *migration = vbasedev->migration; + Error **errp = &migration->load_bufs_thread_errp; + g_autoptr(QemuLockable) locker = qemu_lockable_auto_lock( + QEMU_MAKE_LOCKABLE(&migration->load_bufs_mutex)); + LoadedBuffer *lb; + + while 
(!migration->load_bufs_device_ready && + !migration->load_bufs_thread_want_exit) { + qemu_cond_wait(&migration->load_bufs_device_ready_cond, &migration->load_bufs_mutex); + } + + while (!migration->load_bufs_thread_want_exit) { + bool starved; + ssize_t ret; + + assert(migration->load_buf_idx <= migration->load_buf_idx_last); + + if (migration->load_buf_idx >= migration->load_bufs->len) { + assert(migration->load_buf_idx == migration->load_bufs->len); + starved = true; + } else { + lb = &g_array_index(migration->load_bufs, typeof(*lb), migration->load_buf_idx); + starved = !lb->is_present; + } + + if (starved) { + trace_vfio_load_state_device_buffer_starved(vbasedev->name, migration->load_buf_idx); + qemu_cond_wait(&migration->load_bufs_buffer_ready_cond, &migration->load_bufs_mutex); + continue; + } + + if (migration->load_buf_idx == migration->load_buf_idx_last) { + break; + } + + if (migration->load_buf_idx == 0) { + trace_vfio_load_state_device_buffer_start(vbasedev->name); + } + + if (lb->len) { + g_autofree char *buf = NULL; + size_t buf_len; + int errno_save; + + trace_vfio_load_state_device_buffer_load_start(vbasedev->name, + migration->load_buf_idx); + + /* lb might become re-allocated when we drop the lock */ + buf = g_steal_pointer(&lb->data); + buf_len = lb->len; + + /* Loading data to the device takes a while, drop the lock during this process */ + qemu_mutex_unlock(&migration->load_bufs_mutex); + ret = write(migration->data_fd, buf, buf_len); + errno_save = errno; + qemu_mutex_lock(&migration->load_bufs_mutex); + + if (ret < 0) { + error_setg(errp, "write to state buffer %" PRIu32 " failed with %d", + migration->load_buf_idx, errno_save); + break; + } else if (ret < buf_len) { + error_setg(errp, "write to state buffer %" PRIu32 " incomplete %zd / %zu", + migration->load_buf_idx, ret, buf_len); + break; + } + + trace_vfio_load_state_device_buffer_load_end(vbasedev->name, + migration->load_buf_idx); + } + + if (migration->load_buf_idx == migration->load_buf_idx_last - 1) { + trace_vfio_load_state_device_buffer_end(vbasedev->name); + } + + migration->load_buf_idx++; + } + + if (migration->load_bufs_thread_want_exit && + !*errp) { + error_setg(errp, "load bufs thread asked to quit"); + } + + g_clear_pointer(&locker, qemu_lockable_auto_unlock); + + qemu_loadvm_load_finish_ready_lock(); + migration->load_bufs_thread_finished = true; + qemu_loadvm_load_finish_ready_broadcast(); + qemu_loadvm_load_finish_ready_unlock(); + + return NULL; +} + static int vfio_save_device_config_state(QEMUFile *f, void *opaque, Error **errp) { @@ -285,6 +466,8 @@ static int vfio_load_device_config_state(QEMUFile *f, void *opaque) VFIODevice *vbasedev = opaque; uint64_t data; + trace_vfio_load_device_config_state_start(vbasedev->name); + if (vbasedev->ops && vbasedev->ops->vfio_load_config) { int ret; @@ -303,7 +486,7 @@ static int vfio_load_device_config_state(QEMUFile *f, void *opaque) return -EINVAL; } - trace_vfio_load_device_config_state(vbasedev->name); + trace_vfio_load_device_config_state_end(vbasedev->name); return qemu_file_get_error(f); } @@ -687,16 +870,69 @@ static void vfio_save_state(QEMUFile *f, void *opaque) static int vfio_load_setup(QEMUFile *f, void *opaque, Error **errp) { VFIODevice *vbasedev = opaque; + VFIOMigration *migration = vbasedev->migration; + int ret; + + ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING, + vbasedev->migration->device_state, errp); + if (ret) { + return ret; + } + + assert(!migration->load_bufs); + migration->load_bufs = 
g_array_new(FALSE, TRUE, sizeof(LoadedBuffer)); + g_array_set_clear_func(migration->load_bufs, loaded_buffer_clear); + + qemu_mutex_init(&migration->load_bufs_mutex); + + migration->load_bufs_device_ready = false; + qemu_cond_init(&migration->load_bufs_device_ready_cond); + + migration->load_buf_idx = 0; + migration->load_buf_idx_last = UINT32_MAX; + qemu_cond_init(&migration->load_bufs_buffer_ready_cond); + + migration->config_state_loaded_to_dev = false; + + assert(!migration->load_bufs_thread_started); + + migration->load_bufs_thread_finished = false; + migration->load_bufs_thread_want_exit = false; + qemu_thread_create(&migration->load_bufs_thread, "vfio-load-bufs", + vfio_load_bufs_thread, opaque, QEMU_THREAD_JOINABLE); - return vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING, - vbasedev->migration->device_state, errp); + migration->load_bufs_thread_started = true; + + return 0; } static int vfio_load_cleanup(void *opaque) { VFIODevice *vbasedev = opaque; + VFIOMigration *migration = vbasedev->migration; + + if (migration->load_bufs_thread_started) { + qemu_mutex_lock(&migration->load_bufs_mutex); + migration->load_bufs_thread_want_exit = true; + qemu_mutex_unlock(&migration->load_bufs_mutex); + + qemu_cond_broadcast(&migration->load_bufs_device_ready_cond); + qemu_cond_broadcast(&migration->load_bufs_buffer_ready_cond); + + qemu_thread_join(&migration->load_bufs_thread); + + assert(migration->load_bufs_thread_finished); + + migration->load_bufs_thread_started = false; + } vfio_migration_cleanup(vbasedev); + + g_clear_pointer(&migration->load_bufs, g_array_unref); + qemu_cond_destroy(&migration->load_bufs_buffer_ready_cond); + qemu_cond_destroy(&migration->load_bufs_device_ready_cond); + qemu_mutex_destroy(&migration->load_bufs_mutex); + trace_vfio_load_cleanup(vbasedev->name); return 0; @@ -705,6 +941,7 @@ static int vfio_load_cleanup(void *opaque) static int vfio_load_state(QEMUFile *f, void *opaque, int version_id) { VFIODevice *vbasedev = opaque; + VFIOMigration *migration = vbasedev->migration; int ret = 0; uint64_t data; @@ -716,6 +953,7 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id) switch (data) { case VFIO_MIG_FLAG_DEV_CONFIG_STATE: { + migration->config_state_loaded_to_dev = true; return vfio_load_device_config_state(f, opaque); } case VFIO_MIG_FLAG_DEV_SETUP_STATE: @@ -742,6 +980,15 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id) } break; } + case VFIO_MIG_FLAG_DEV_DATA_STATE_COMPLETE: + { + QEMU_LOCK_GUARD(&migration->load_bufs_mutex); + + migration->load_bufs_device_ready = true; + qemu_cond_broadcast(&migration->load_bufs_device_ready_cond); + + break; + } case VFIO_MIG_FLAG_DEV_INIT_DATA_SENT: { if (!vfio_precopy_supported(vbasedev) || @@ -774,6 +1021,76 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id) return ret; } +static int vfio_load_finish(void *opaque, bool *is_finished, Error **errp) +{ + VFIODevice *vbasedev = opaque; + VFIOMigration *migration = vbasedev->migration; + g_autoptr(QemuLockable) locker = NULL; + LoadedBuffer *lb; + g_autoptr(QIOChannelBuffer) bioc = NULL; + QEMUFile *f_out = NULL, *f_in = NULL; + uint64_t mig_header; + int ret; + + if (migration->config_state_loaded_to_dev) { + *is_finished = true; + return 0; + } + + if (!migration->load_bufs_thread_finished) { + assert(migration->load_bufs_thread_started); + *is_finished = false; + return 0; + } + + if (migration->load_bufs_thread_errp) { + error_propagate(errp, 
g_steal_pointer(&migration->load_bufs_thread_errp)); + return -1; + } + + locker = qemu_lockable_auto_lock(QEMU_MAKE_LOCKABLE(&migration->load_bufs_mutex)); + + assert(migration->load_buf_idx == migration->load_buf_idx_last); + lb = &g_array_index(migration->load_bufs, typeof(*lb), migration->load_buf_idx); + assert(lb->is_present); + + bioc = qio_channel_buffer_new(lb->len); + qio_channel_set_name(QIO_CHANNEL(bioc), "vfio-device-config-load"); + + f_out = qemu_file_new_output(QIO_CHANNEL(bioc)); + qemu_put_buffer(f_out, (uint8_t *)lb->data, lb->len); + + ret = qemu_fflush(f_out); + if (ret) { + error_setg(errp, "load device config state file flush failed with %d", ret); + g_clear_pointer(&f_out, qemu_fclose); + return -1; + } + + qio_channel_io_seek(QIO_CHANNEL(bioc), 0, 0, NULL); + f_in = qemu_file_new_input(QIO_CHANNEL(bioc)); + + mig_header = qemu_get_be64(f_in); + if (mig_header != VFIO_MIG_FLAG_DEV_CONFIG_STATE) { + error_setg(errp, "load device config state invalid header %"PRIu64, mig_header); + g_clear_pointer(&f_out, qemu_fclose); + g_clear_pointer(&f_in, qemu_fclose); + return -1; + } + + ret = vfio_load_device_config_state(f_in, opaque); + g_clear_pointer(&f_out, qemu_fclose); + g_clear_pointer(&f_in, qemu_fclose); + if (ret < 0) { + error_setg(errp, "load device config state failed with %d", ret); + return -1; + } + + migration->config_state_loaded_to_dev = true; + *is_finished = true; + return 0; +} + static bool vfio_switchover_ack_needed(void *opaque) { VFIODevice *vbasedev = opaque; @@ -794,6 +1111,8 @@ static const SaveVMHandlers savevm_vfio_handlers = { .load_setup = vfio_load_setup, .load_cleanup = vfio_load_cleanup, .load_state = vfio_load_state, + .load_state_buffer = vfio_load_state_buffer, + .load_finish = vfio_load_finish, .switchover_ack_needed = vfio_switchover_ack_needed, }; diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events index 814000796687..7f224e4d240f 100644 --- a/hw/vfio/trace-events +++ b/hw/vfio/trace-events @@ -148,9 +148,16 @@ vfio_display_edid_write_error(void) "" # migration.c vfio_load_cleanup(const char *name) " (%s)" -vfio_load_device_config_state(const char *name) " (%s)" +vfio_load_device_config_state_start(const char *name) " (%s)" +vfio_load_device_config_state_end(const char *name) " (%s)" vfio_load_state(const char *name, uint64_t data) " (%s) data 0x%"PRIx64 vfio_load_state_device_data(const char *name, uint64_t data_size, int ret) " (%s) size 0x%"PRIx64" ret %d" +vfio_load_state_device_buffer_incoming(const char *name, uint32_t idx) " (%s) idx %"PRIu32 +vfio_load_state_device_buffer_start(const char *name) " (%s)" +vfio_load_state_device_buffer_starved(const char *name, uint32_t idx) " (%s) idx %"PRIu32 +vfio_load_state_device_buffer_load_start(const char *name, uint32_t idx) " (%s) idx %"PRIu32 +vfio_load_state_device_buffer_load_end(const char *name, uint32_t idx) " (%s) idx %"PRIu32 +vfio_load_state_device_buffer_end(const char *name) " (%s)" vfio_migration_realize(const char *name) " (%s)" vfio_migration_set_device_state(const char *name, const char *state) " (%s) state %s" vfio_migration_set_state(const char *name, const char *new_state, const char *recover_state) " (%s) new state %s, recover state %s" diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index 510818f4dae3..aa8476a859a6 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -74,6 +74,20 @@ typedef struct VFIOMigration { bool save_iterate_run; bool save_iterate_empty_hit; + QemuThread load_bufs_thread; + Error 
*load_bufs_thread_errp; + bool load_bufs_thread_started; + bool load_bufs_thread_finished; + bool load_bufs_thread_want_exit; + + GArray *load_bufs; + bool load_bufs_device_ready; + QemuCond load_bufs_device_ready_cond; + QemuCond load_bufs_buffer_ready_cond; + QemuMutex load_bufs_mutex; + uint32_t load_buf_idx; + uint32_t load_buf_idx_last; + bool config_state_loaded_to_dev; } VFIOMigration; struct VFIOGroup; From patchwork Tue Jun 18 16:12:30 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej S. Szmigiero" X-Patchwork-Id: 13702697 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 1702AC27C4F for ; Tue, 18 Jun 2024 16:14:56 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sJbTI-0005uZ-7b; Tue, 18 Jun 2024 12:14:12 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sJbT5-0005Vg-SS for qemu-devel@nongnu.org; Tue, 18 Jun 2024 12:14:00 -0400 Received: from vps-vb.mhejs.net ([37.28.154.113]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sJbT4-0000vc-BM for qemu-devel@nongnu.org; Tue, 18 Jun 2024 12:13:59 -0400 Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sJbSx-0001eF-7S; Tue, 18 Jun 2024 18:13:51 +0200 From: "Maciej S. Szmigiero" To: Peter Xu , Fabiano Rosas Cc: Alex Williamson , =?utf-8?q?C=C3=A9dric_Le_G?= =?utf-8?q?oater?= , Eric Blake , Markus Armbruster , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Avihai Horon , Joao Martins , qemu-devel@nongnu.org Subject: [PATCH v1 12/13] vfio/migration: Add x-migration-multifd-transfer VFIO property Date: Tue, 18 Jun 2024 18:12:30 +0200 Message-ID: <9b8ae3b7cbfd2e4328ae8e0167fb51d04567cbac.1718717584.git.maciej.szmigiero@oracle.com> X-Mailer: git-send-email 2.45.1 In-Reply-To: References: MIME-Version: 1.0 Received-SPF: pass client-ip=37.28.154.113; envelope-from=mail@maciej.szmigiero.name; helo=vps-vb.mhejs.net X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: "Maciej S. Szmigiero" This property allows configuring at runtime whether to send the particular device state via multifd channels when live migrating that device. It is ignored on the receive side and defaults to "false" for bit stream compatibility with older QEMU versions. Signed-off-by: Maciej S. 
Szmigiero --- hw/vfio/pci.c | 7 +++++++ include/hw/vfio/vfio-common.h | 1 + 2 files changed, 8 insertions(+) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 74a79bdf61f9..e2ac1db96002 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -3346,6 +3346,8 @@ static void vfio_instance_init(Object *obj) pci_dev->cap_present |= QEMU_PCI_CAP_EXPRESS; } +static PropertyInfo qdev_prop_bool_mutable; + static Property vfio_pci_dev_properties[] = { DEFINE_PROP_PCI_HOST_DEVADDR("host", VFIOPCIDevice, host), DEFINE_PROP_UUID_NODEFAULT("vf-token", VFIOPCIDevice, vf_token), @@ -3367,6 +3369,8 @@ static Property vfio_pci_dev_properties[] = { VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT, false), DEFINE_PROP_ON_OFF_AUTO("enable-migration", VFIOPCIDevice, vbasedev.enable_migration, ON_OFF_AUTO_AUTO), + DEFINE_PROP("x-migration-multifd-transfer", VFIOPCIDevice, + vbasedev.migration_multifd_transfer, qdev_prop_bool_mutable, bool), DEFINE_PROP_BOOL("migration-events", VFIOPCIDevice, vbasedev.migration_events, false), DEFINE_PROP_BOOL("x-no-mmap", VFIOPCIDevice, vbasedev.no_mmap, false), @@ -3464,6 +3468,9 @@ static const TypeInfo vfio_pci_nohotplug_dev_info = { static void register_vfio_pci_dev_type(void) { + qdev_prop_bool_mutable = qdev_prop_bool; + qdev_prop_bool_mutable.realized_set_allowed = true; + type_register_static(&vfio_pci_dev_info); type_register_static(&vfio_pci_nohotplug_dev_info); } diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index aa8476a859a6..bc85891d8fff 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -132,6 +132,7 @@ typedef struct VFIODevice { bool no_mmap; bool ram_block_discard_allowed; OnOffAuto enable_migration; + bool migration_multifd_transfer; bool migration_events; VFIODeviceOps *ops; unsigned int num_irqs; From patchwork Tue Jun 18 16:12:31 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej S. Szmigiero" X-Patchwork-Id: 13702696 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D3DBDC2BB85 for ; Tue, 18 Jun 2024 16:14:53 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sJbTJ-0006FF-VA; Tue, 18 Jun 2024 12:14:13 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sJbTD-0005hi-1D for qemu-devel@nongnu.org; Tue, 18 Jun 2024 12:14:08 -0400 Received: from vps-vb.mhejs.net ([37.28.154.113]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sJbTB-0000wN-0O for qemu-devel@nongnu.org; Tue, 18 Jun 2024 12:14:06 -0400 Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sJbT2-0001ee-E2; Tue, 18 Jun 2024 18:13:56 +0200 From: "Maciej S. 
Szmigiero" To: Peter Xu , Fabiano Rosas Cc: Alex Williamson , =?utf-8?q?C=C3=A9dric_Le_G?= =?utf-8?q?oater?= , Eric Blake , Markus Armbruster , =?utf-8?q?Daniel_P_=2E_Berrang=C3=A9?= , Avihai Horon , Joao Martins , qemu-devel@nongnu.org Subject: [PATCH v1 13/13] vfio/migration: Multifd device state transfer support - send side Date: Tue, 18 Jun 2024 18:12:31 +0200 Message-ID: <4630a3df3b9ecc12a4df4df884bb9e5ca3278c08.1718717584.git.maciej.szmigiero@oracle.com> X-Mailer: git-send-email 2.45.1 In-Reply-To: References: MIME-Version: 1.0 Received-SPF: pass client-ip=37.28.154.113; envelope-from=mail@maciej.szmigiero.name; helo=vps-vb.mhejs.net X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org From: "Maciej S. Szmigiero" Implement the multifd device state transfer via additional per-device thread spawned from save_live_complete_precopy_begin handler. Switch between doing the data transfer in the new handler and doing it in the old save_state handler depending on the x-migration-multifd-transfer device property value. Signed-off-by: Maciej S. Szmigiero --- hw/vfio/migration.c | 207 ++++++++++++++++++++++++++++++++++ hw/vfio/trace-events | 3 + include/hw/vfio/vfio-common.h | 9 ++ 3 files changed, 219 insertions(+) diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index 719e36800ab5..28a835f8a945 100644 --- a/hw/vfio/migration.c +++ b/hw/vfio/migration.c @@ -643,6 +643,16 @@ static int vfio_save_setup(QEMUFile *f, void *opaque, Error **errp) uint64_t stop_copy_size = VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE; int ret; + /* Make a copy of this setting at the start in case it is changed mid-migration */ + migration->multifd_transfer = vbasedev->migration_multifd_transfer; + + if (migration->multifd_transfer && !migration_has_device_state_support()) { + error_setg(errp, + "%s: Multifd device transfer requested but unsupported in the current config", + vbasedev->name); + return -EINVAL; + } + qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE); vfio_query_stop_copy_size(vbasedev, &stop_copy_size); @@ -692,6 +702,8 @@ static int vfio_save_setup(QEMUFile *f, void *opaque, Error **errp) return ret; } +static void vfio_save_complete_precopy_async_thread_thread_terminate(VFIODevice *vbasedev); + static void vfio_save_cleanup(void *opaque) { VFIODevice *vbasedev = opaque; @@ -699,6 +711,8 @@ static void vfio_save_cleanup(void *opaque) Error *local_err = NULL; int ret; + vfio_save_complete_precopy_async_thread_thread_terminate(vbasedev); + /* * Changing device state from STOP_COPY to STOP can take time. Do it here, * after migration has completed, so it won't increase downtime. 
@@ -712,6 +726,7 @@ static void vfio_save_cleanup(void *opaque) } } + g_clear_pointer(&migration->idstr, g_free); g_free(migration->data_buffer); migration->data_buffer = NULL; migration->precopy_init_size = 0; @@ -823,10 +838,17 @@ static int vfio_save_iterate(QEMUFile *f, void *opaque) static int vfio_save_complete_precopy(QEMUFile *f, void *opaque) { VFIODevice *vbasedev = opaque; + VFIOMigration *migration = vbasedev->migration; ssize_t data_size; int ret; Error *local_err = NULL; + if (migration->multifd_transfer) { + /* Emit dummy NOP data */ + qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE); + return 0; + } + trace_vfio_save_complete_precopy_started(vbasedev->name); /* We reach here with device state STOP or STOP_COPY only */ @@ -852,12 +874,188 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque) return ret; } +static int vfio_save_complete_precopy_async_thread_config_state(VFIODevice *vbasedev, uint32_t idx) +{ + VFIOMigration *migration = vbasedev->migration; + g_autoptr(QIOChannelBuffer) bioc = NULL; + QEMUFile *f = NULL; + int ret; + g_autofree VFIODeviceStatePacket *packet = NULL; + size_t packet_len; + + bioc = qio_channel_buffer_new(0); + qio_channel_set_name(QIO_CHANNEL(bioc), "vfio-device-config-save"); + + f = qemu_file_new_output(QIO_CHANNEL(bioc)); + + ret = vfio_save_device_config_state(f, vbasedev, NULL); + if (ret) { + return ret; + } + + ret = qemu_fflush(f); + if (ret) { + goto ret_close_file; + } + + packet_len = sizeof(*packet) + bioc->usage; + packet = g_malloc0(packet_len); + packet->idx = idx; + packet->flags = VFIO_DEVICE_STATE_CONFIG_STATE; + memcpy(&packet->data, bioc->data, bioc->usage); + + ret = multifd_queue_device_state(migration->idstr, migration->instance_id, + (char *)packet, packet_len); + + bytes_transferred += packet_len; + +ret_close_file: + g_clear_pointer(&f, qemu_fclose); + return ret; +} + +static void *vfio_save_complete_precopy_async_thread(void *opaque) +{ + VFIODevice *vbasedev = opaque; + VFIOMigration *migration = vbasedev->migration; + int *ret = &migration->save_complete_precopy_thread_ret; + g_autofree VFIODeviceStatePacket *packet = NULL; + uint32_t idx; + + /* We reach here with device state STOP or STOP_COPY only */ + *ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP_COPY, + VFIO_DEVICE_STATE_STOP, NULL); + if (*ret) { + return NULL; + } + + packet = g_malloc0(sizeof(*packet) + migration->data_buffer_size); + + for (idx = 0; ; idx++) { + ssize_t data_size; + size_t packet_size; + + data_size = read(migration->data_fd, &packet->data, + migration->data_buffer_size); + if (data_size < 0) { + if (errno != ENOMSG) { + *ret = -errno; + return NULL; + } + + /* + * Pre-copy emptied all the device state for now. For more information, + * please refer to the Linux kernel VFIO uAPI. 
+ */ + data_size = 0; + } + + if (data_size == 0) + break; + + packet->idx = idx; + packet_size = sizeof(*packet) + data_size; + + *ret = multifd_queue_device_state(migration->idstr, migration->instance_id, + (char *)packet, packet_size); + if (*ret) { + return NULL; + } + + bytes_transferred += packet_size; + } + + *ret = vfio_save_complete_precopy_async_thread_config_state(vbasedev, idx); + if (*ret) { + return NULL; + } + + trace_vfio_save_complete_precopy_async_finished(vbasedev->name); + + return NULL; +} + +static int vfio_save_complete_precopy_begin(QEMUFile *f, + char *idstr, uint32_t instance_id, + void *opaque) +{ + VFIODevice *vbasedev = opaque; + VFIOMigration *migration = vbasedev->migration; + int ret; + + migration->save_complete_precopy_thread_ret = 0; + + if (!migration->multifd_transfer) { + /* Emit dummy NOP data */ + qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE); + return 0; + } + + qemu_put_be64(f, VFIO_MIG_FLAG_DEV_DATA_STATE_COMPLETE); + qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE); + + ret = qemu_fflush(f); + if (ret) { + return ret; + } + + assert(!migration->save_complete_precopy_thread_started); + + assert(!migration->idstr); + migration->idstr = g_strdup(idstr); + migration->instance_id = instance_id; + + qemu_thread_create(&migration->save_complete_precopy_thread, + "vfio-save_complete_precopy", + vfio_save_complete_precopy_async_thread, + opaque, QEMU_THREAD_JOINABLE); + + migration->save_complete_precopy_thread_started = true; + + trace_vfio_save_complete_precopy_async_started(vbasedev->name, idstr, instance_id); + + return 0; +} + +static void vfio_save_complete_precopy_async_thread_thread_terminate(VFIODevice *vbasedev) +{ + VFIOMigration *migration = vbasedev->migration; + + if (!migration->save_complete_precopy_thread_started) { + return; + } + + qemu_thread_join(&migration->save_complete_precopy_thread); + + migration->save_complete_precopy_thread_started = false; + + trace_vfio_save_complete_precopy_async_joined(vbasedev->name, + migration->save_complete_precopy_thread_ret); +} + +static int vfio_save_complete_precopy_end(QEMUFile *f, void *opaque) +{ + VFIODevice *vbasedev = opaque; + VFIOMigration *migration = vbasedev->migration; + + vfio_save_complete_precopy_async_thread_thread_terminate(vbasedev); + + return migration->save_complete_precopy_thread_ret; +} + static void vfio_save_state(QEMUFile *f, void *opaque) { VFIODevice *vbasedev = opaque; + VFIOMigration *migration = vbasedev->migration; Error *local_err = NULL; int ret; + if (migration->multifd_transfer) { + /* Emit dummy NOP data */ + qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE); + return; + } + ret = vfio_save_device_config_state(f, opaque, &local_err); if (ret) { error_prepend(&local_err, @@ -1106,6 +1304,8 @@ static const SaveVMHandlers savevm_vfio_handlers = { .state_pending_exact = vfio_state_pending_exact, .is_active_iterate = vfio_is_active_iterate, .save_live_iterate = vfio_save_iterate, + .save_live_complete_precopy_begin = vfio_save_complete_precopy_begin, + .save_live_complete_precopy_end = vfio_save_complete_precopy_end, .save_live_complete_precopy = vfio_save_complete_precopy, .save_state = vfio_save_state, .load_setup = vfio_load_setup, @@ -1127,6 +1327,10 @@ static void vfio_vmstate_change_prepare(void *opaque, bool running, Error *local_err = NULL; int ret; + if (running) { + vfio_save_complete_precopy_async_thread_thread_terminate(vbasedev); + } + new_state = migration->device_state == VFIO_DEVICE_STATE_PRE_COPY ? 
VFIO_DEVICE_STATE_PRE_COPY_P2P : VFIO_DEVICE_STATE_RUNNING_P2P; @@ -1153,6 +1357,9 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state) int ret; if (running) { + /* In case "prepare" callback wasn't registered */ + vfio_save_complete_precopy_async_thread_thread_terminate(vbasedev); + new_state = VFIO_DEVICE_STATE_RUNNING; } else { new_state = diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events index 7f224e4d240f..569bb02434f1 100644 --- a/hw/vfio/trace-events +++ b/hw/vfio/trace-events @@ -166,6 +166,9 @@ vfio_save_block(const char *name, int data_size) " (%s) data_size %d" vfio_save_cleanup(const char *name) " (%s)" vfio_save_complete_precopy(const char *name, int ret) " (%s) ret %d" vfio_save_complete_precopy_started(const char *name) " (%s)" +vfio_save_complete_precopy_async_started(const char *name, const char *idstr, uint32_t instance_id) " (%s) idstr %s instance %"PRIu32 +vfio_save_complete_precopy_async_finished(const char *name) " (%s)" +vfio_save_complete_precopy_async_joined(const char *name, int ret) " (%s) ret %d" vfio_save_device_config_state(const char *name) " (%s)" vfio_save_iterate(const char *name, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64 vfio_save_iterate_started(const char *name) " (%s)" diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index bc85891d8fff..2d76d3fc8bba 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -70,16 +70,25 @@ typedef struct VFIOMigration { uint64_t mig_flags; uint64_t precopy_init_size; uint64_t precopy_dirty_size; + bool multifd_transfer; bool initial_data_sent; bool save_iterate_run; bool save_iterate_empty_hit; + + QemuThread save_complete_precopy_thread; + int save_complete_precopy_thread_ret; + bool save_complete_precopy_thread_started; + QemuThread load_bufs_thread; Error *load_bufs_thread_errp; bool load_bufs_thread_started; bool load_bufs_thread_finished; bool load_bufs_thread_want_exit; + char *idstr; + uint32_t instance_id; + GArray *load_bufs; bool load_bufs_device_ready; QemuCond load_bufs_device_ready_cond;