From patchwork Wed Mar 17 16:32:14 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrey Gruzdev X-Patchwork-Id: 12146397 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 26B28C433E0 for ; Wed, 17 Mar 2021 16:35:41 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id AF98164F10 for ; Wed, 17 Mar 2021 16:35:40 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AF98164F10 Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=virtuozzo.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:35392 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1lMZ91-00023x-Gc for qemu-devel@archiver.kernel.org; Wed, 17 Mar 2021 12:35:39 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:36024) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lMZ6p-0000LM-AB for qemu-devel@nongnu.org; Wed, 17 Mar 2021 12:33:27 -0400 Received: from relay.sw.ru ([185.231.240.75]:49366) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lMZ6f-0004db-HX for qemu-devel@nongnu.org; Wed, 17 Mar 2021 12:33:20 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=virtuozzo.com; s=relay; h=MIME-Version:Message-Id:Date:Subject:From: Content-Type; bh=4JbJkudXb/wdFoO1R1oG/sXIBBkFqAWDWXTx/mwLFrk=; b=Jr9+PwSd4zaJ 2n0ajPIU4aRkurm0YObVgQT1HJ2nS9ma6q7IuDnqRXx+OpdIs90F0vU2KcD5yffdicDTX7xZ9Ebc0 5I0hay7ofX35Z8vmwawWvK5cxAOdFZRYrUnr6LlOoXyZiIVepvTOfut4OJe36QqTictJe1XPLHteg C+Gm4=; Received: from [192.168.15.248] (helo=andrey-MS-7B54.sw.ru) by relay.sw.ru with esmtp (Exim 4.94) (envelope-from ) id 1lMZ63-0034yI-R0; Wed, 17 Mar 2021 19:32:35 +0300 From: Andrey Gruzdev To: qemu-devel@nongnu.org Cc: Den Lunev , Eric Blake , Paolo Bonzini , Juan Quintela , "Dr . 
David Alan Gilbert" , Markus Armbruster , Peter Xu , Andrey Gruzdev Subject: [RFC PATCH 1/9] migration/snap-tool: Introduce qemu-snap tool Date: Wed, 17 Mar 2021 19:32:14 +0300 Message-Id: <20210317163222.182609-2-andrey.gruzdev@virtuozzo.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210317163222.182609-1-andrey.gruzdev@virtuozzo.com> References: <20210317163222.182609-1-andrey.gruzdev@virtuozzo.com> MIME-Version: 1.0 Received-SPF: pass client-ip=185.231.240.75; envelope-from=andrey.gruzdev@virtuozzo.com; helo=relay.sw.ru X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" Initial commit with code to set up execution environment, parse command-line arguments, show usage/version info and so on. Signed-off-by: Andrey Gruzdev --- include/qemu-snap.h | 35 ++++ meson.build | 2 + qemu-snap.c | 414 ++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 451 insertions(+) create mode 100644 include/qemu-snap.h create mode 100644 qemu-snap.c diff --git a/include/qemu-snap.h b/include/qemu-snap.h new file mode 100644 index 0000000000..b8e48bfcbb --- /dev/null +++ b/include/qemu-snap.h @@ -0,0 +1,35 @@ +/* + * QEMU External Snapshot Utility + * + * Copyright Virtuozzo GmbH, 2021 + * + * Authors: + * Andrey Gruzdev + * + * This work is licensed under the terms of the GNU GPL, version 2 or + * later. See the COPYING file in the top-level directory. + */ + +#ifndef QEMU_SNAP_H +#define QEMU_SNAP_H + +/* Target page size, if not specified explicitly in command-line */ +#define DEFAULT_PAGE_SIZE 4096 +/* + * Maximum supported target page size, cause we use QEMUFile and + * qemu_get_buffer_in_place(). IO_BUF_SIZE is currently 32KB. + */ +#define PAGE_SIZE_MAX 16384 + +typedef struct SnapSaveState { + BlockBackend *blk; /* Block backend */ +} SnapSaveState; + +typedef struct SnapLoadState { + BlockBackend *blk; /* Block backend */ +} SnapLoadState; + +SnapSaveState *snap_save_get_state(void); +SnapLoadState *snap_load_get_state(void); + +#endif /* QEMU_SNAP_H */ diff --git a/meson.build b/meson.build index a7d2dd429d..11564165ba 100644 --- a/meson.build +++ b/meson.build @@ -2324,6 +2324,8 @@ if have_tools dependencies: [block, qemuutil], install: true) qemu_nbd = executable('qemu-nbd', files('qemu-nbd.c'), dependencies: [blockdev, qemuutil, gnutls], install: true) + qemu_snap = executable('qemu-snap', files('qemu-snap.c'), + dependencies: [blockdev, qemuutil, migration], install: true) subdir('storage-daemon') subdir('contrib/rdmacm-mux') diff --git a/qemu-snap.c b/qemu-snap.c new file mode 100644 index 0000000000..c7118927f7 --- /dev/null +++ b/qemu-snap.c @@ -0,0 +1,414 @@ +/* + * QEMU External Snapshot Utility + * + * Copyright Virtuozzo GmbH, 2021 + * + * Authors: + * Andrey Gruzdev + * + * This work is licensed under the terms of the GNU GPL, version 2 or + * later. See the COPYING file in the top-level directory. 
+ */ + +#include "qemu/osdep.h" +#include + +#include "qemu-common.h" +#include "qemu-version.h" +#include "qapi/error.h" +#include "qapi/qmp/qdict.h" +#include "sysemu/block-backend.h" +#include "sysemu/runstate.h" /* for qemu_system_killed() prototype */ +#include "qemu/cutils.h" +#include "qemu/coroutine.h" +#include "qemu/error-report.h" +#include "qemu/config-file.h" +#include "qemu/log.h" +#include "trace/control.h" +#include "io/channel-util.h" +#include "io/channel-buffer.h" +#include "migration/qemu-file-channel.h" +#include "migration/qemu-file.h" +#include "qemu-snap.h" + +#define OPT_CACHE 256 +#define OPT_AIO 257 + +/* Parameters for snapshot saving */ +typedef struct SnapSaveParams { + const char *filename; /* QCOW2 image file name */ + int64_t image_size; /* QCOW2 virtual image size */ + int bdrv_flags; /* BDRV flags (cache/AIO mode)*/ + bool writethrough; /* BDRV writes in FUA mode */ + + int64_t page_size; /* Target page size to use */ + + int fd; /* Migration stream input FD */ +} SnapSaveParams; + +/* Parameters for snapshot saving */ +typedef struct SnapLoadParams { + const char *filename; /* QCOW2 image file name */ + int bdrv_flags; /* BDRV flags (cache/AIO mode)*/ + + int64_t page_size; /* Target page size to use */ + bool postcopy; /* Use postcopy */ + /* Switch to postcopy after postcopy_percent% of RAM loaded */ + int postcopy_percent; + + int fd; /* Migration stream output FD */ + int rp_fd; /* Return-path FD (for postcopy) */ +} SnapLoadParams; + +static SnapSaveState save_state; +static SnapLoadState load_state; + +#ifdef CONFIG_POSIX +void qemu_system_killed(int signum, pid_t pid) +{ +} +#endif /* CONFIG_POSIX */ + +static void snap_shutdown(void) +{ + bdrv_close_all(); +} + +SnapSaveState *snap_save_get_state(void) +{ + return &save_state; +} + +SnapLoadState *snap_load_get_state(void) +{ + return &load_state; +} + +static void snap_save_init_state(void) +{ + memset(&save_state, 0, sizeof(save_state)); +} + +static void snap_save_destroy_state(void) +{ + /* TODO: implement */ +} + +static void snap_load_init_state(void) +{ + memset(&load_state, 0, sizeof(load_state)); +} + +static void snap_load_destroy_state(void) +{ + /* TODO: implement */ +} + +static int snap_save(const SnapSaveParams *params) +{ + SnapSaveState *sn; + + snap_save_init_state(); + sn = snap_save_get_state(); + (void) sn; + + snap_save_destroy_state(); + + return 0; +} + +static int snap_load(SnapLoadParams *params) +{ + SnapLoadState *sn; + + snap_load_init_state(); + sn = snap_load_get_state(); + (void) sn; + + snap_load_destroy_state(); + + return 0; +} + +static int64_t cvtnum_full(const char *name, const char *value, + int64_t min, int64_t max) +{ + uint64_t res; + int err; + + err = qemu_strtosz(value, NULL, &res); + if (err < 0 && err != -ERANGE) { + error_report("Invalid %s specified. You may use " + "k, M, G, T, P or E suffixes for", name); + error_report("kilobytes, megabytes, gigabytes, terabytes, " + "petabytes and exabytes."); + return err; + } + if (err == -ERANGE || res > max || res < min) { + error_report("Invalid %s specified. 
Must be between %" PRId64 + " and %" PRId64 ".", name, min, max); + return -ERANGE; + } + + return res; +} + +static int64_t cvtnum(const char *name, const char *value) +{ + return cvtnum_full(name, value, 0, INT64_MAX); +} + +static bool is_2power(int64_t val) +{ + return val && ((val & (val - 1)) == 0); +} + +static void usage(const char *name) +{ + printf( + "Usage: %s [OPTIONS] save|load FILE\n" + "QEMU External Snapshot Utility\n" + "\n" + " -h, --help display this help and exit\n" + " -V, --version output version information and exit\n" + "\n" + "General purpose options:\n" + " -t, --trace [[enable=]][,events=][,file=]\n" + " specify tracing options\n" + "\n" + "Image options:\n" + " -s, --image-size=SIZE size of image to create for 'save'\n" + " -n, --nocache disable host cache\n" + " --cache=MODE set cache mode (none, writeback, ...)\n" + " --aio=MODE set AIO mode (native, io_uring or threads)\n" + "\n" + "Snapshot options:\n" + " -S, --page-size=SIZE target page size\n" + " -p, --postcopy=%%RAM switch to postcopy after '%%RAM' loaded\n" + "\n" + QEMU_HELP_BOTTOM "\n", name); +} + +static void version(const char *name) +{ + printf( + "%s " QEMU_FULL_VERSION "\n" + "Written by Andrey Gruzdev.\n" + "\n" + QEMU_COPYRIGHT "\n" + "This is free software; see the source for copying conditions. There is NO\n" + "warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.\n", + name); +} + +int main(int argc, char **argv) +{ + static const struct option l_opt[] = { + { "help", no_argument, NULL, 'h' }, + { "version", no_argument, NULL, 'V' }, + { "image-size", required_argument, NULL, 's' }, + { "page-size", required_argument, NULL, 'S' }, + { "postcopy", required_argument, NULL, 'p' }, + { "nocache", no_argument, NULL, 'n' }, + { "cache", required_argument, NULL, OPT_CACHE }, + { "aio", required_argument, NULL, OPT_AIO }, + { "trace", required_argument, NULL, 't' }, + { NULL, 0, NULL, 0 } + }; + static const char *s_opt = "hVs:S:p:nt:"; + + int ch; + int l_ind = 0; + + bool seen_image_size = false; + bool seen_page_size = false; + bool seen_postcopy = false; + bool seen_cache = false; + bool seen_aio = false; + int64_t image_size = 0; + int64_t page_size = DEFAULT_PAGE_SIZE; + int64_t postcopy_percent = 0; + int bdrv_flags = 0; + bool writethrough = false; + bool postcopy = false; + const char *cmd_name; + const char *file_name; + Error *local_err = NULL; + +#ifdef CONFIG_POSIX + signal(SIGPIPE, SIG_IGN); +#endif + error_init(argv[0]); + module_call_init(MODULE_INIT_TRACE); + module_call_init(MODULE_INIT_QOM); + + qemu_add_opts(&qemu_trace_opts); + qemu_init_exec_dir(argv[0]); + + while ((ch = getopt_long(argc, argv, s_opt, l_opt, &l_ind)) != -1) { + switch (ch) { + case '?': + error_report("Try `%s --help' for more information", argv[0]); + return EXIT_FAILURE; + + case 's': + if (seen_image_size) { + error_report("-s and --image-size can only be specified once"); + return EXIT_FAILURE; + } + seen_image_size = true; + + image_size = cvtnum(l_opt[l_ind].name, optarg); + if (image_size <= 0) { + error_report("Invalid image size parameter '%s'", optarg); + return EXIT_FAILURE; + } + break; + + case 'S': + if (seen_page_size) { + error_report("-S and --page-size can only be specified once"); + return EXIT_FAILURE; + } + seen_page_size = true; + + page_size = cvtnum(l_opt[l_ind].name, optarg); + if (page_size <= 0 || !is_2power(page_size) || + page_size > PAGE_SIZE_MAX) { + error_report("Invalid target page size parameter '%s'", optarg); + return EXIT_FAILURE; + } + break; + + 
case 'p': + if (seen_postcopy) { + error_report("-p and --postcopy can only be specified once"); + return EXIT_FAILURE; + } + seen_postcopy = true; + + postcopy_percent = cvtnum(l_opt[l_ind].name, optarg); + if (!(postcopy_percent > 0 && postcopy_percent < 100)) { + error_report("Invalid postcopy %%RAM parameter '%s'", optarg); + return EXIT_FAILURE; + } + postcopy = true; + break; + + case 'n': + optarg = (char *) "none"; + /* fallthrough */ + + case OPT_CACHE: + if (seen_cache) { + error_report("-n and --cache can only be specified once"); + return EXIT_FAILURE; + } + seen_cache = true; + + if (bdrv_parse_cache_mode(optarg, &bdrv_flags, &writethrough)) { + error_report("Invalid cache mode '%s'", optarg); + return EXIT_FAILURE; + } + break; + + case OPT_AIO: + if (seen_aio) { + error_report("--aio can only be specified once"); + return EXIT_FAILURE; + } + seen_aio = true; + + if (bdrv_parse_aio(optarg, &bdrv_flags)) { + error_report("Invalid AIO mode '%s'", optarg); + return EXIT_FAILURE; + } + break; + + case 'V': + version(argv[0]); + return EXIT_SUCCESS; + + case 'h': + usage(argv[0]); + return EXIT_SUCCESS; + + case 't': + trace_opt_parse(optarg); + break; + + } + } + + if ((argc - optind) != 2) { + error_report("Invalid number of arguments"); + return EXIT_FAILURE; + } + + if (!trace_init_backends()) { + return EXIT_FAILURE; + } + trace_init_file(); + qemu_set_log(LOG_TRACE); + + if (qemu_init_main_loop(&local_err)) { + error_report_err(local_err); + return EXIT_FAILURE; + } + + bdrv_init(); + atexit(snap_shutdown); + + cmd_name = argv[optind]; + file_name = argv[optind + 1]; + + if (!strcmp(cmd_name, "save")) { + SnapSaveParams params = { + .filename = file_name, + .image_size = image_size, + .page_size = page_size, + .bdrv_flags = (bdrv_flags | BDRV_O_RDWR), + .writethrough = writethrough, + .fd = STDIN_FILENO }; + int res; + + if (seen_postcopy) { + error_report("-p and --postcopy cannot be used for 'save'"); + return EXIT_FAILURE; + } + if (!seen_image_size) { + error_report("-s or --size are required for 'save'"); + return EXIT_FAILURE; + } + + res = snap_save(¶ms); + if (res < 0) { + return EXIT_FAILURE; + } + return EXIT_SUCCESS; + } else if (!strcmp(cmd_name, "load")) { + SnapLoadParams params = { + .filename = file_name, + .bdrv_flags = bdrv_flags, + .postcopy = postcopy, + .postcopy_percent = postcopy_percent, + .page_size = page_size, + .fd = STDOUT_FILENO, + .rp_fd = STDIN_FILENO }; + int res; + + if (seen_image_size) { + error_report("-s and --size cannot be used for 'load'"); + return EXIT_FAILURE; + } + + res = snap_load(¶ms); + if (res < 0) { + return EXIT_FAILURE; + } + return EXIT_SUCCESS; + } + + error_report("Invalid command"); + return EXIT_FAILURE; +} From patchwork Wed Mar 17 16:32:15 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrey Gruzdev X-Patchwork-Id: 12146499 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 52CCEC433DB for ; Wed, 17 Mar 2021 17:11:59 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) 
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D243364EB6 for ; Wed, 17 Mar 2021 17:11:58 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D243364EB6 Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=virtuozzo.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:47302 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1lMZi9-0007eE-MP for qemu-devel@archiver.kernel.org; Wed, 17 Mar 2021 13:11:57 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:36084) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lMZ7A-0000N4-F8 for qemu-devel@nongnu.org; Wed, 17 Mar 2021 12:33:46 -0400 Received: from relay.sw.ru ([185.231.240.75]:49604) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lMZ75-0004jS-3G for qemu-devel@nongnu.org; Wed, 17 Mar 2021 12:33:44 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=virtuozzo.com; s=relay; h=MIME-Version:Message-Id:Date:Subject:From: Content-Type; bh=cOs5lK6sYyvwHZDboMekbywKe4agSAHQpulY6WbxnJ0=; b=sD6sP6IU9fgi wKGWSe0ZuJz5zuaUb/Nr6Xzd9DhnOMNlaD8TdpDKdWGkcNsLZmeJDgBmQvV8tvpP4MELbxXvWdZ5X UwU4ro7ZH6CTAzGTTygei6M0oNzSbfij0LgEUjNckKk4R2vODgDvwsaEgB0hvRfmCBwU9x7cAlOr5 ISz7c=; Received: from [192.168.15.248] (helo=andrey-MS-7B54.sw.ru) by relay.sw.ru with esmtp (Exim 4.94) (envelope-from ) id 1lMZ6R-0034yI-S8; Wed, 17 Mar 2021 19:33:00 +0300 From: Andrey Gruzdev To: qemu-devel@nongnu.org Cc: Den Lunev , Eric Blake , Paolo Bonzini , Juan Quintela , "Dr . David Alan Gilbert" , Markus Armbruster , Peter Xu , Andrey Gruzdev Subject: [RFC PATCH 2/9] migration/snap-tool: Snapshot image create/open routines for qemu-snap tool Date: Wed, 17 Mar 2021 19:32:15 +0300 Message-Id: <20210317163222.182609-3-andrey.gruzdev@virtuozzo.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210317163222.182609-1-andrey.gruzdev@virtuozzo.com> References: <20210317163222.182609-1-andrey.gruzdev@virtuozzo.com> MIME-Version: 1.0 Received-SPF: pass client-ip=185.231.240.75; envelope-from=andrey.gruzdev@virtuozzo.com; helo=relay.sw.ru X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" Implementation of routines for QCOW2 image creation and opening. Some predefined parameters for image creation and opening are introduced that provide reasonable tradeoff between performance, file size and usability. Thus, it was chosen to disable preallocation and keep image file dense on host file system, the apparent file size equals allocated in this case which is anyways beneficial for the user experience. Larger 1MB cluster size is adopted to reduce allocation overhead and improve I/O performance while keeping internal fragmentation of snapshot image reasonably small. 
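For reference, an image with the same layout could also be created manually (a sketch only; the authoritative option set is BLK_CREATE_OPT_STRING introduced below, the file name and virtual size here are placeholders, and the tool itself takes the virtual size from -s/--image-size):

  qemu-img create -f qcow2 \
      -o preallocation=off,lazy_refcounts=on,extended_l2=off,compat=v3,cluster_size=1M,refcount_bits=8 \
      snapshot.qcow2 16G

On QEMU builds that do not accept the 'v3' alias, compat=1.1 is the equivalent spelling.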
Signed-off-by: Andrey Gruzdev --- qemu-snap.c | 94 ++++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 90 insertions(+), 4 deletions(-) diff --git a/qemu-snap.c b/qemu-snap.c index c7118927f7..c9f8d7166a 100644 --- a/qemu-snap.c +++ b/qemu-snap.c @@ -31,6 +31,16 @@ #include "migration/qemu-file.h" #include "qemu-snap.h" +/* QCOW2 image options */ +#define BLK_FORMAT_DRIVER "qcow2" +#define BLK_CREATE_OPT_STRING "preallocation=off,lazy_refcounts=on," \ + "extended_l2=off,compat=v3,cluster_size=1M," \ + "refcount_bits=8" +/* L2 cache size to cover 2TB of memory */ +#define BLK_L2_CACHE_SIZE "16M" +/* Single L2 cache entry for the whole L2 table */ +#define BLK_L2_CACHE_ENTRY_SIZE "1M" + #define OPT_CACHE 256 #define OPT_AIO 257 @@ -104,30 +114,106 @@ static void snap_load_destroy_state(void) /* TODO: implement */ } +static BlockBackend *snap_create(const char *filename, int64_t image_size, + int flags, bool writethrough) +{ + char *create_opt_string; + QDict *blk_opts; + BlockBackend *blk; + Error *local_err = NULL; + + /* Create QCOW2 image with given parameters */ + create_opt_string = g_strdup(BLK_CREATE_OPT_STRING); + bdrv_img_create(filename, BLK_FORMAT_DRIVER, NULL, NULL, + create_opt_string, image_size, flags, true, &local_err); + g_free(create_opt_string); + + if (local_err) { + error_reportf_err(local_err, "Could not create '%s': ", filename); + goto fail; + } + + /* Block backend open options */ + blk_opts = qdict_new(); + qdict_put_str(blk_opts, "driver", BLK_FORMAT_DRIVER); + qdict_put_str(blk_opts, "l2-cache-size", BLK_L2_CACHE_SIZE); + qdict_put_str(blk_opts, "l2-cache-entry-size", BLK_L2_CACHE_ENTRY_SIZE); + + /* Open block backend instance for the created image */ + blk = blk_new_open(filename, NULL, blk_opts, flags, &local_err); + if (!blk) { + error_reportf_err(local_err, "Could not open '%s': ", filename); + /* Delete image file */ + qemu_unlink(filename); + goto fail; + } + + blk_set_enable_write_cache(blk, !writethrough); + return blk; + +fail: + return NULL; +} + +static BlockBackend *snap_open(const char *filename, int flags) +{ + QDict *blk_opts; + BlockBackend *blk; + Error *local_err = NULL; + + /* Block backend open options */ + blk_opts = qdict_new(); + qdict_put_str(blk_opts, "driver", BLK_FORMAT_DRIVER); + qdict_put_str(blk_opts, "l2-cache-size", BLK_L2_CACHE_SIZE); + qdict_put_str(blk_opts, "l2-cache-entry-size", BLK_L2_CACHE_ENTRY_SIZE); + + /* Open block backend instance */ + blk = blk_new_open(filename, NULL, blk_opts, flags, &local_err); + if (!blk) { + error_reportf_err(local_err, "Could not open '%s': ", filename); + return NULL; + } + + return blk; +} + static int snap_save(const SnapSaveParams *params) { SnapSaveState *sn; + int res = -1; snap_save_init_state(); sn = snap_save_get_state(); - (void) sn; + sn->blk = snap_create(params->filename, params->image_size, + params->bdrv_flags, params->writethrough); + if (!sn->blk) { + goto fail; + } + +fail: snap_save_destroy_state(); - return 0; + return res; } static int snap_load(SnapLoadParams *params) { SnapLoadState *sn; + int res = -1; snap_load_init_state(); sn = snap_load_get_state(); - (void) sn; + sn->blk = snap_open(params->filename, params->bdrv_flags); + if (!sn->blk) { + goto fail; + } + +fail: snap_load_destroy_state(); - return 0; + return res; } static int64_t cvtnum_full(const char *name, const char *value, From patchwork Wed Mar 17 16:32:16 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrey Gruzdev 
X-Patchwork-Id: 12146491 From: Andrey Gruzdev To: qemu-devel@nongnu.org Cc: Den Lunev , Eric Blake , Paolo Bonzini , Juan Quintela , "Dr . 
David Alan Gilbert" , Markus Armbruster , Peter Xu , Andrey Gruzdev Subject: [RFC PATCH 3/9] migration/snap-tool: Preparations to run code in main loop context Date: Wed, 17 Mar 2021 19:32:16 +0300 Message-Id: <20210317163222.182609-4-andrey.gruzdev@virtuozzo.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210317163222.182609-1-andrey.gruzdev@virtuozzo.com> References: <20210317163222.182609-1-andrey.gruzdev@virtuozzo.com> MIME-Version: 1.0 Received-SPF: pass client-ip=185.231.240.75; envelope-from=andrey.gruzdev@virtuozzo.com; helo=relay.sw.ru X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" Major part of code is using QEMUFile and block layer routines, thus to take advantage from concurrent I/O operations we need to use coroutines and run in the the main loop context. Signed-off-by: Andrey Gruzdev --- include/qemu-snap.h | 3 +++ meson.build | 2 +- qemu-snap-handlers.c | 38 ++++++++++++++++++++++++++ qemu-snap.c | 63 ++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 105 insertions(+), 1 deletion(-) create mode 100644 qemu-snap-handlers.c diff --git a/include/qemu-snap.h b/include/qemu-snap.h index b8e48bfcbb..b6fd779b13 100644 --- a/include/qemu-snap.h +++ b/include/qemu-snap.h @@ -32,4 +32,7 @@ typedef struct SnapLoadState { SnapSaveState *snap_save_get_state(void); SnapLoadState *snap_load_get_state(void); +int coroutine_fn snap_save_state_main(SnapSaveState *sn); +int coroutine_fn snap_load_state_main(SnapLoadState *sn); + #endif /* QEMU_SNAP_H */ diff --git a/meson.build b/meson.build index 11564165ba..252c55d6a3 100644 --- a/meson.build +++ b/meson.build @@ -2324,7 +2324,7 @@ if have_tools dependencies: [block, qemuutil], install: true) qemu_nbd = executable('qemu-nbd', files('qemu-nbd.c'), dependencies: [blockdev, qemuutil, gnutls], install: true) - qemu_snap = executable('qemu-snap', files('qemu-snap.c'), + qemu_snap = executable('qemu-snap', files('qemu-snap.c', 'qemu-snap-handlers.c'), dependencies: [blockdev, qemuutil, migration], install: true) subdir('storage-daemon') diff --git a/qemu-snap-handlers.c b/qemu-snap-handlers.c new file mode 100644 index 0000000000..bdc1911909 --- /dev/null +++ b/qemu-snap-handlers.c @@ -0,0 +1,38 @@ +/* + * QEMU External Snapshot Utility + * + * Copyright Virtuozzo GmbH, 2021 + * + * Authors: + * Andrey Gruzdev + * + * This work is licensed under the terms of the GNU GPL, version 2 or + * later. See the COPYING file in the top-level directory. 
+ */ + +#include "qemu/osdep.h" +#include "sysemu/block-backend.h" +#include "qemu/coroutine.h" +#include "qemu/cutils.h" +#include "qemu/bitmap.h" +#include "qemu/error-report.h" +#include "io/channel-buffer.h" +#include "migration/qemu-file-channel.h" +#include "migration/qemu-file.h" +#include "migration/savevm.h" +#include "migration/ram.h" +#include "qemu-snap.h" + +/* Save snapshot data from incoming migration stream */ +int coroutine_fn snap_save_state_main(SnapSaveState *sn) +{ + /* TODO: implement */ + return 0; +} + +/* Load snapshot data and send it with outgoing migration stream */ +int coroutine_fn snap_load_state_main(SnapLoadState *sn) +{ + /* TODO: implement */ + return 0; +} diff --git a/qemu-snap.c b/qemu-snap.c index c9f8d7166a..ec56aa55d2 100644 --- a/qemu-snap.c +++ b/qemu-snap.c @@ -44,6 +44,14 @@ #define OPT_CACHE 256 #define OPT_AIO 257 +/* Snapshot task execution state */ +typedef struct SnapTaskState { + QEMUBH *bh; /* BH to enter task's coroutine */ + Coroutine *co; /* Coroutine to execute task */ + + int ret; /* Return code, -EINPROGRESS until complete */ +} SnapTaskState; + /* Parameters for snapshot saving */ typedef struct SnapSaveParams { const char *filename; /* QCOW2 image file name */ @@ -177,6 +185,51 @@ static BlockBackend *snap_open(const char *filename, int flags) return blk; } +static void coroutine_fn do_snap_save_co(void *opaque) +{ + SnapTaskState *task_state = opaque; + SnapSaveState *sn = snap_save_get_state(); + + /* Enter main routine */ + task_state->ret = snap_save_state_main(sn); +} + +static void coroutine_fn do_snap_load_co(void *opaque) +{ + SnapTaskState *task_state = opaque; + SnapLoadState *sn = snap_load_get_state(); + + /* Enter main routine */ + task_state->ret = snap_load_state_main(sn); +} + +/* We use BH to enter coroutine from the main loop context */ +static void enter_co_bh(void *opaque) +{ + SnapTaskState *task_state = opaque; + + qemu_coroutine_enter(task_state->co); + /* Delete BH once we entered coroutine from the main loop */ + qemu_bh_delete(task_state->bh); + task_state->bh = NULL; +} + +static int run_snap_task(CoroutineEntry *entry) +{ + SnapTaskState task_state; + + task_state.bh = qemu_bh_new(enter_co_bh, &task_state); + task_state.co = qemu_coroutine_create(entry, &task_state); + task_state.ret = -EINPROGRESS; + + qemu_bh_schedule(task_state.bh); + while (task_state.ret == -EINPROGRESS) { + main_loop_wait(false); + } + + return task_state.ret; +} + static int snap_save(const SnapSaveParams *params) { SnapSaveState *sn; @@ -191,6 +244,11 @@ static int snap_save(const SnapSaveParams *params) goto fail; } + res = run_snap_task(do_snap_save_co); + if (res) { + error_report("Failed to save snapshot: error=%d", res); + } + fail: snap_save_destroy_state(); @@ -210,6 +268,11 @@ static int snap_load(SnapLoadParams *params) goto fail; } + res = run_snap_task(do_snap_load_co); + if (res) { + error_report("Failed to load snapshot: error=%d", res); + } + fail: snap_load_destroy_state(); From patchwork Wed Mar 17 16:32:17 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrey Gruzdev X-Patchwork-Id: 12146477 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT 
autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E0B57C433E0 for ; Wed, 17 Mar 2021 16:51:22 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5CA0E64F4F for ; Wed, 17 Mar 2021 16:51:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5CA0E64F4F Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=virtuozzo.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:41652 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1lMZOD-0000e5-AF for qemu-devel@archiver.kernel.org; Wed, 17 Mar 2021 12:51:21 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:36250) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lMZ7q-0001Z1-IM for qemu-devel@nongnu.org; Wed, 17 Mar 2021 12:34:26 -0400 Received: from relay.sw.ru ([185.231.240.75]:50030) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lMZ7o-00053V-P8 for qemu-devel@nongnu.org; Wed, 17 Mar 2021 12:34:26 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=virtuozzo.com; s=relay; h=MIME-Version:Message-Id:Date:Subject:From: Content-Type; bh=jdRv6V4YC3uc+u+U8vjbnEi5Ik3OoBtjZJSkcF8l4y8=; b=dbtVMBK5Bul/ /JIyHgQK7gYQk7JdbSVUJHqIoZlwLRTCoq2LPoCGkvxJtQ06TXZ62+QdCKJZFBcjX/r/4Gqcnu4I7 Hc+4RdYcK8MBUQiuQFtFYKgKiseE6e1nILq/uOqw14IoDenOkOXj4A7je0h3vkoE9s161zPgYtcuG Sy3U4=; Received: from [192.168.15.248] (helo=andrey-MS-7B54.sw.ru) by relay.sw.ru with esmtp (Exim 4.94) (envelope-from ) id 1lMZ7D-0034yI-Qi; Wed, 17 Mar 2021 19:33:47 +0300 From: Andrey Gruzdev To: qemu-devel@nongnu.org Cc: Den Lunev , Eric Blake , Paolo Bonzini , Juan Quintela , "Dr . David Alan Gilbert" , Markus Armbruster , Peter Xu , Andrey Gruzdev Subject: [RFC PATCH 4/9] migration/snap-tool: Introduce qemu_ftell2() routine to qemu-file.c Date: Wed, 17 Mar 2021 19:32:17 +0300 Message-Id: <20210317163222.182609-5-andrey.gruzdev@virtuozzo.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210317163222.182609-1-andrey.gruzdev@virtuozzo.com> References: <20210317163222.182609-1-andrey.gruzdev@virtuozzo.com> MIME-Version: 1.0 Received-SPF: pass client-ip=185.231.240.75; envelope-from=andrey.gruzdev@virtuozzo.com; helo=relay.sw.ru X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" In several place we need to get QEMUFile input position in the meaning of the number of bytes read by qemu_get_byte()/qemu_get_buffer() routines. Existing qemu_ftell() returns offset in terms of the number of bytes read from underlying IOChannel object which is not suitable here. 
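A small worked example (assuming the 32 KB IO_BUF_SIZE currently used by qemu-file.c): after the input buffer has been filled with 32768 bytes from the channel and the caller has consumed only 100 of them via qemu_get_byte()/qemu_get_buffer(), qemu_ftell() reports 32768, while qemu_ftell2() reports pos + buf_index - buf_size = 32768 + 100 - 32768 = 100, i.e. the number of bytes actually read by the caller.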
Signed-off-by: Andrey Gruzdev --- migration/qemu-file.c | 6 ++++++ migration/qemu-file.h | 1 + 2 files changed, 7 insertions(+) diff --git a/migration/qemu-file.c b/migration/qemu-file.c index d6e03dbc0e..66be5e6460 100644 --- a/migration/qemu-file.c +++ b/migration/qemu-file.c @@ -657,6 +657,12 @@ int64_t qemu_ftell(QEMUFile *f) return f->pos; } +int64_t qemu_ftell2(QEMUFile *f) +{ + qemu_fflush(f); + return f->pos + f->buf_index - f->buf_size; +} + int qemu_file_rate_limit(QEMUFile *f) { if (f->shutdown) { diff --git a/migration/qemu-file.h b/migration/qemu-file.h index a9b6d6ccb7..bd1a6def02 100644 --- a/migration/qemu-file.h +++ b/migration/qemu-file.h @@ -124,6 +124,7 @@ void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks); int qemu_get_fd(QEMUFile *f); int qemu_fclose(QEMUFile *f); int64_t qemu_ftell(QEMUFile *f); +int64_t qemu_ftell2(QEMUFile *f); int64_t qemu_ftell_fast(QEMUFile *f); /* * put_buffer without copying the buffer. From patchwork Wed Mar 17 16:32:18 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrey Gruzdev X-Patchwork-Id: 12146493 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 10BDAC433DB for ; Wed, 17 Mar 2021 16:54:59 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7722864F50 for ; Wed, 17 Mar 2021 16:54:58 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7722864F50 Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=virtuozzo.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:49358 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1lMZRh-00046v-I0 for qemu-devel@archiver.kernel.org; Wed, 17 Mar 2021 12:54:57 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:36328) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lMZ8G-0002Kf-0l for qemu-devel@nongnu.org; Wed, 17 Mar 2021 12:34:52 -0400 Received: from relay.sw.ru ([185.231.240.75]:50154) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lMZ8C-0005CX-QO for qemu-devel@nongnu.org; Wed, 17 Mar 2021 12:34:51 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=virtuozzo.com; s=relay; h=MIME-Version:Message-Id:Date:Subject:From: Content-Type; bh=bIljFTtGhUariGTdpKMmSt7HxzPiX/72JJgo4/X87FQ=; b=tVuVCXaHwF58 uBLDSHdPeLo6ZC5zz5RRVubqDZTGMdDBiX3dDRayCCsQvRnLlMuP8SVHos+/TJXzM9Uk8o+mV0YIE zw2nHi3Wv0KC7cqxLEbYzY7KOrGKKpWvi5XQACUkx3INYJSOwBfNGZQzGebgD/JKHBPI/sCEO266a KmyXE=; Received: from [192.168.15.248] (helo=andrey-MS-7B54.sw.ru) by relay.sw.ru with esmtp (Exim 4.94) (envelope-from ) id 1lMZ7b-0034yI-PP; Wed, 17 Mar 2021 19:34:11 +0300 From: Andrey Gruzdev To: qemu-devel@nongnu.org Cc: 
Den Lunev , Eric Blake , Paolo Bonzini , Juan Quintela , "Dr . David Alan Gilbert" , Markus Armbruster , Peter Xu , Andrey Gruzdev Subject: [RFC PATCH 5/9] migration/snap-tool: Block layer AIO support and file utility routines Date: Wed, 17 Mar 2021 19:32:18 +0300 Message-Id: <20210317163222.182609-6-andrey.gruzdev@virtuozzo.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210317163222.182609-1-andrey.gruzdev@virtuozzo.com> References: <20210317163222.182609-1-andrey.gruzdev@virtuozzo.com> MIME-Version: 1.0 Received-SPF: pass client-ip=185.231.240.75; envelope-from=andrey.gruzdev@virtuozzo.com; helo=relay.sw.ru X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" Introducing support for asynchronous block layer requests with in-order completion guerantee using simple buffer descriptor ring and coroutines. Added support for opening QEMUFile with VMSTATE area of QCOW2 image as backing, also introduced several file utility routines. Signed-off-by: Andrey Gruzdev --- include/qemu-snap.h | 48 +++++++ meson.build | 2 +- qemu-snap-io.c | 325 ++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 374 insertions(+), 1 deletion(-) create mode 100644 qemu-snap-io.c diff --git a/include/qemu-snap.h b/include/qemu-snap.h index b6fd779b13..f4b38d6442 100644 --- a/include/qemu-snap.h +++ b/include/qemu-snap.h @@ -13,6 +13,11 @@ #ifndef QEMU_SNAP_H #define QEMU_SNAP_H +/* Synthetic value for invalid offset */ +#define INVALID_OFFSET ((int64_t) -1) +/* Max. 
byte count for QEMUFile inplace read */ +#define INPLACE_READ_MAX (32768 - 4096) + /* Target page size, if not specified explicitly in command-line */ #define DEFAULT_PAGE_SIZE 4096 /* @@ -21,6 +26,31 @@ */ #define PAGE_SIZE_MAX 16384 +typedef struct AioBufferPool AioBufferPool; + +typedef struct AioBufferStatus { + /* BDRV operation start offset */ + int64_t offset; + /* BDRV operation byte count or negative error code */ + int count; +} AioBufferStatus; + +typedef struct AioBuffer { + void *data; /* Data buffer */ + int size; /* Size of data buffer */ + + AioBufferStatus status; /* Status returned by task->func() */ +} AioBuffer; + +typedef struct AioBufferTask { + AioBuffer *buffer; /* AIO buffer */ + + int64_t offset; /* BDRV operation start offset */ + int size; /* BDRV requested transfer size */ +} AioBufferTask; + +typedef AioBufferStatus coroutine_fn (*AioBufferFunc)(AioBufferTask *task); + typedef struct SnapSaveState { BlockBackend *blk; /* Block backend */ } SnapSaveState; @@ -35,4 +65,22 @@ SnapLoadState *snap_load_get_state(void); int coroutine_fn snap_save_state_main(SnapSaveState *sn); int coroutine_fn snap_load_state_main(SnapLoadState *sn); +QEMUFile *qemu_fopen_bdrv_vmstate(BlockDriverState *bs, int is_writable); + +AioBufferPool *coroutine_fn aio_pool_new(int buf_align, int buf_size, int buf_count); +void aio_pool_free(AioBufferPool *pool); +void aio_pool_set_max_in_flight(AioBufferPool *pool, int max_in_flight); +int aio_pool_status(AioBufferPool *pool); + +bool coroutine_fn aio_pool_can_acquire_next(AioBufferPool *pool); +AioBuffer *coroutine_fn aio_pool_try_acquire_next(AioBufferPool *pool); +AioBuffer *coroutine_fn aio_pool_wait_compl_next(AioBufferPool *pool); +void coroutine_fn aio_buffer_release(AioBuffer *buffer); + +void coroutine_fn aio_buffer_start_task(AioBuffer *buffer, AioBufferFunc func, + int64_t offset, int size); + +void file_transfer_to_eof(QEMUFile *f_dst, QEMUFile *f_src); +void file_transfer_bytes(QEMUFile *f_dst, QEMUFile *f_src, size_t size); + #endif /* QEMU_SNAP_H */ diff --git a/meson.build b/meson.build index 252c55d6a3..48f2367a5a 100644 --- a/meson.build +++ b/meson.build @@ -2324,7 +2324,7 @@ if have_tools dependencies: [block, qemuutil], install: true) qemu_nbd = executable('qemu-nbd', files('qemu-nbd.c'), dependencies: [blockdev, qemuutil, gnutls], install: true) - qemu_snap = executable('qemu-snap', files('qemu-snap.c', 'qemu-snap-handlers.c'), + qemu_snap = executable('qemu-snap', files('qemu-snap.c', 'qemu-snap-handlers.c', 'qemu-snap-io.c'), dependencies: [blockdev, qemuutil, migration], install: true) subdir('storage-daemon') diff --git a/qemu-snap-io.c b/qemu-snap-io.c new file mode 100644 index 0000000000..972c353255 --- /dev/null +++ b/qemu-snap-io.c @@ -0,0 +1,325 @@ +/* + * QEMU External Snapshot Utility + * + * Copyright Virtuozzo GmbH, 2021 + * + * Authors: + * Andrey Gruzdev + * + * This work is licensed under the terms of the GNU GPL, version 2 or + * later. See the COPYING file in the top-level directory. + */ + +#include "qemu/osdep.h" +#include "qemu/coroutine.h" +#include "qemu/error-report.h" +#include "sysemu/block-backend.h" +#include "qapi/error.h" +#include "migration/qemu-file.h" +#include "qemu-snap.h" + +/* + * AIO buffer pool. + * + * Coroutine-based environment to support concurrent block layer operations + * providing pre-allocated data buffers and in-order completion guarantee. 
+ * + * All routines (with an exception of aio_pool_free()) are required to be + * called from the same coroutine in main loop context. + * + * Call sequence to keep several pending block layer requests: + * + * aio_pool_new() ! + * ! + * aio_pool_try_acquire_next() !<------!<------! + * aio_buffer_start_task() !------>! ! + * ! ! + * aio_pool_wait_compl_next() ! ! + * aio_buffer_release() !-------------->! + * ! + * aio_pool_free() ! + * + */ + +/* AIO buffer private struct */ +typedef struct AioBufferImpl { + AioBuffer user; /* Public part */ + AioBufferPool *pool; /* Buffer pool */ + + bool acquired; /* Buffer is acquired */ + bool busy; /* Task not complete */ +} AioBufferImpl; + +/* AIO task private struct */ +typedef struct AioBufferTaskImpl { + AioBufferTask user; /* Public part */ + AioBufferFunc func; /* Task func */ +} AioBufferTaskImpl; + +/* AIO buffer pool */ +typedef struct AioBufferPool { + int count; /* Number of AioBuffer's */ + + Coroutine *main_co; /* Parent coroutine */ + int status; /* Overall pool status */ + + /* Index of next buffer to await in-order */ + int wait_head; + /* Index of next buffer to acquire in-order */ + int acquire_tail; + + /* AioBuffer that is currently awaited for task completion, or NULL */ + AioBufferImpl *wait_on_buffer; + + int in_flight; /* AIO requests in-flight */ + int max_in_flight; /* Max. AIO in-flight requests */ + + AioBufferImpl buffers[]; /* Flex-array of AioBuffer's */ +} AioBufferPool; + +/* Wrapper for task->func() to maintain private state */ +static void coroutine_fn aio_buffer_co(void *opaque) +{ + AioBufferTaskImpl *task = (AioBufferTaskImpl *) opaque; + AioBufferImpl *buffer = (AioBufferImpl *) task->user.buffer; + AioBufferPool *pool = buffer->pool; + + buffer->busy = true; + buffer->user.status = task->func((AioBufferTask *) task); + /* Update pool status in case of an error */ + if (buffer->user.status.count < 0 && pool->status == 0) { + pool->status = buffer->user.status.count; + } + buffer->busy = false; + + g_free(task); + + if (buffer == pool->wait_on_buffer) { + pool->wait_on_buffer = NULL; + aio_co_wake(pool->main_co); + } +} + +/* Check that aio_pool_try_acquire_next() shall succeed */ +bool coroutine_fn aio_pool_can_acquire_next(AioBufferPool *pool) +{ + assert(qemu_coroutine_self() == pool->main_co); + + return (pool->in_flight < pool->max_in_flight) && + !pool->buffers[pool->acquire_tail].acquired; +} + +/* Try to acquire next buffer from the pool */ +AioBuffer *coroutine_fn aio_pool_try_acquire_next(AioBufferPool *pool) +{ + AioBufferImpl *buffer; + + assert(qemu_coroutine_self() == pool->main_co); + + if (pool->in_flight >= pool->max_in_flight) { + return NULL; + } + + buffer = &pool->buffers[pool->acquire_tail]; + if (!buffer->acquired) { + assert(!buffer->busy); + + buffer->acquired = true; + pool->acquire_tail = (pool->acquire_tail + 1) % pool->count; + + pool->in_flight++; + return (AioBuffer *) buffer; + } + + return NULL; +} + +/* Start BDRV task on acquired buffer */ +void coroutine_fn aio_buffer_start_task(AioBuffer *buffer, + AioBufferFunc func, int64_t offset, int size) +{ + AioBufferImpl *buffer_impl = (AioBufferImpl *) buffer; + AioBufferTaskImpl *task; + + assert(qemu_coroutine_self() == buffer_impl->pool->main_co); + assert(buffer_impl->acquired && !buffer_impl->busy); + assert(size <= buffer->size); + + task = g_new0(AioBufferTaskImpl, 1); + task->user.buffer = buffer; + task->user.offset = offset; + task->user.size = size; + task->func = func; + + 
qemu_coroutine_enter(qemu_coroutine_create(aio_buffer_co, task)); +} + +/* Wait for buffer task completion in-order */ +AioBuffer *coroutine_fn aio_pool_wait_compl_next(AioBufferPool *pool) +{ + AioBufferImpl *buffer; + + assert(qemu_coroutine_self() == pool->main_co); + + buffer = &pool->buffers[pool->wait_head]; + if (!buffer->acquired) { + return NULL; + } + + if (!buffer->busy) { +restart: + pool->wait_head = (pool->wait_head + 1) % pool->count; + return (AioBuffer *) buffer; + } + + pool->wait_on_buffer = buffer; + qemu_coroutine_yield(); + + assert(!pool->wait_on_buffer); + assert(!buffer->busy); + + goto restart; +} + +/* Release buffer */ +void coroutine_fn aio_buffer_release(AioBuffer *buffer) +{ + AioBufferImpl *buffer_impl = (AioBufferImpl *) buffer; + + assert(qemu_coroutine_self() == buffer_impl->pool->main_co); + assert(buffer_impl->acquired && !buffer_impl->busy); + + buffer_impl->acquired = false; + buffer_impl->pool->in_flight--; +} + +/* Create new AIO buffer pool */ +AioBufferPool *coroutine_fn aio_pool_new(int buf_align, + int buf_size, int buf_count) +{ + AioBufferPool *pool = g_malloc0(sizeof(AioBufferPool) + + buf_count * sizeof(pool->buffers[0])); + + pool->main_co = qemu_coroutine_self(); + + pool->count = buf_count; + pool->max_in_flight = pool->count; + + for (int i = 0; i < buf_count; i++) { + pool->buffers[i].pool = pool; + pool->buffers[i].user.data = qemu_memalign(buf_align, buf_size); + pool->buffers[i].user.size = buf_size; + } + + return pool; +} + +/* Free AIO buffer pool */ +void aio_pool_free(AioBufferPool *pool) +{ + for (int i = 0; i < pool->count; i++) { + qemu_vfree(pool->buffers[i].user.data); + } + g_free(pool); +} + +/* Limit the max. number of in-flight BDRV tasks/requests */ +void aio_pool_set_max_in_flight(AioBufferPool *pool, int max_in_flight) +{ + assert(max_in_flight > 0); + pool->max_in_flight = MIN(max_in_flight, pool->count); +} + +/* Get overall pool operation status */ +int aio_pool_status(AioBufferPool *pool) +{ + return pool->status; +} + +static ssize_t bdrv_vmstate_writev_buffer(void *opaque, struct iovec *iov, + int iovcnt, int64_t pos, Error **errp) +{ + int ret; + QEMUIOVector qiov; + + qemu_iovec_init_external(&qiov, iov, iovcnt); + ret = bdrv_writev_vmstate((BlockDriverState *) opaque, &qiov, pos); + if (ret < 0) { + return ret; + } + + return qiov.size; +} + +static ssize_t bdrv_vmstate_get_buffer(void *opaque, uint8_t *buf, + int64_t pos, size_t size, Error **errp) +{ + return bdrv_load_vmstate((BlockDriverState *) opaque, buf, pos, size); +} + +static int bdrv_vmstate_fclose(void *opaque, Error **errp) +{ + return bdrv_flush((BlockDriverState *) opaque); +} + +static const QEMUFileOps bdrv_vmstate_read_ops = { + .get_buffer = bdrv_vmstate_get_buffer, + .close = bdrv_vmstate_fclose, +}; + +static const QEMUFileOps bdrv_vmstate_write_ops = { + .writev_buffer = bdrv_vmstate_writev_buffer, + .close = bdrv_vmstate_fclose, +}; + +/* Create QEMUFile object to access vmstate area of the image */ +QEMUFile *qemu_fopen_bdrv_vmstate(BlockDriverState *bs, int is_writable) +{ + if (is_writable) { + return qemu_fopen_ops(bs, &bdrv_vmstate_write_ops); + } + return qemu_fopen_ops(bs, &bdrv_vmstate_read_ops); +} + +/* + * Transfer data from source QEMUFile to destination + * until we rich EOF on source. 
+ */ +void file_transfer_to_eof(QEMUFile *f_dst, QEMUFile *f_src) +{ + bool eof = false; + + while (!eof) { + const size_t size = INPLACE_READ_MAX; + uint8_t *buffer = NULL; + size_t count; + + count = qemu_peek_buffer(f_src, &buffer, size, 0); + qemu_file_skip(f_src, count); + /* Reached stream EOF? */ + if (count != size) { + eof = true; + } + + qemu_put_buffer(f_dst, buffer, count); + } +} + +/* Transfer given number of bytes from source QEMUFile to destination */ +void file_transfer_bytes(QEMUFile *f_dst, QEMUFile *f_src, size_t size) +{ + size_t rest = size; + + while (rest) { + uint8_t *ptr = NULL; + size_t req_size; + size_t count; + + req_size = MIN(rest, INPLACE_READ_MAX); + count = qemu_peek_buffer(f_src, &ptr, req_size, 0); + qemu_file_skip(f_src, count); + + qemu_put_buffer(f_dst, ptr, count); + rest -= count; + } +} From patchwork Wed Mar 17 16:32:19 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrey Gruzdev X-Patchwork-Id: 12146399 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 89761C433DB for ; Wed, 17 Mar 2021 16:39:57 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id CB56A64F6A for ; Wed, 17 Mar 2021 16:39:56 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CB56A64F6A Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=virtuozzo.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:44466 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1lMZD9-000681-LO for qemu-devel@archiver.kernel.org; Wed, 17 Mar 2021 12:39:55 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:36460) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lMZ8g-0002cF-Rf for qemu-devel@nongnu.org; Wed, 17 Mar 2021 12:35:24 -0400 Received: from relay.sw.ru ([185.231.240.75]:50300) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lMZ8b-0005P3-25 for qemu-devel@nongnu.org; Wed, 17 Mar 2021 12:35:16 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=virtuozzo.com; s=relay; h=MIME-Version:Message-Id:Date:Subject:From: Content-Type; bh=NHJOjm6wNZ/ix5dzVo/rxljiVEJ9yBpUOxowQqxxGVg=; b=YzdSjZhOtwA0 KkwXKPkyfTw9fX2NuKkEZEsTxwIt3i+K/035ClXjeWqm/yh2oF2BDyeRcq8YUyALyLj85Le+u9wj3 VjVr5m6ts9emBT9a6hkJQHlLAr4KoxcgU1wjVaTRHMl64M5hxwvsQarhBj/+BCKmEMoHcMjmzbsUU DY7Go=; Received: from [192.168.15.248] (helo=andrey-MS-7B54.sw.ru) by relay.sw.ru with esmtp (Exim 4.94) (envelope-from ) id 1lMZ7z-0034yI-Pn; Wed, 17 Mar 2021 19:34:35 +0300 From: Andrey Gruzdev To: qemu-devel@nongnu.org Cc: Den Lunev , Eric Blake , Paolo Bonzini , Juan Quintela , "Dr . 
David Alan Gilbert" , Markus Armbruster , Peter Xu , Andrey Gruzdev Subject: [RFC PATCH 6/9] migration/snap-tool: Move RAM_SAVE_FLAG_xxx defines to migration/ram.h Date: Wed, 17 Mar 2021 19:32:19 +0300 Message-Id: <20210317163222.182609-7-andrey.gruzdev@virtuozzo.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210317163222.182609-1-andrey.gruzdev@virtuozzo.com> References: <20210317163222.182609-1-andrey.gruzdev@virtuozzo.com> MIME-Version: 1.0 Received-SPF: pass client-ip=185.231.240.75; envelope-from=andrey.gruzdev@virtuozzo.com; helo=relay.sw.ru X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" Move RAM_SAVE_FLAG_xxx defines from migration/ram.c to migration/ram.h Signed-off-by: Andrey Gruzdev --- migration/ram.c | 16 ---------------- migration/ram.h | 16 ++++++++++++++++ 2 files changed, 16 insertions(+), 16 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 52537f14ac..d3da0c8208 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -65,22 +65,6 @@ /***********************************************************/ /* ram save/restore */ -/* RAM_SAVE_FLAG_ZERO used to be named RAM_SAVE_FLAG_COMPRESS, it - * worked for pages that where filled with the same char. We switched - * it to only search for the zero value. And to avoid confusion with - * RAM_SSAVE_FLAG_COMPRESS_PAGE just rename it. - */ - -#define RAM_SAVE_FLAG_FULL 0x01 /* Obsolete, not used anymore */ -#define RAM_SAVE_FLAG_ZERO 0x02 -#define RAM_SAVE_FLAG_MEM_SIZE 0x04 -#define RAM_SAVE_FLAG_PAGE 0x08 -#define RAM_SAVE_FLAG_EOS 0x10 -#define RAM_SAVE_FLAG_CONTINUE 0x20 -#define RAM_SAVE_FLAG_XBZRLE 0x40 -/* 0x80 is reserved in migration.h start with 0x100 next */ -#define RAM_SAVE_FLAG_COMPRESS_PAGE 0x100 - static inline bool is_zero_range(uint8_t *p, uint64_t size) { return buffer_is_zero(p, size); diff --git a/migration/ram.h b/migration/ram.h index 6378bb3ebc..c6bad8bbdf 100644 --- a/migration/ram.h +++ b/migration/ram.h @@ -33,6 +33,22 @@ #include "exec/cpu-common.h" #include "io/channel.h" +/* RAM_SAVE_FLAG_ZERO used to be named RAM_SAVE_FLAG_COMPRESS, it + * worked for pages that where filled with the same char. We switched + * it to only search for the zero value. And to avoid confusion with + * RAM_SSAVE_FLAG_COMPRESS_PAGE just rename it. 
+ */ + +#define RAM_SAVE_FLAG_FULL 0x01 /* Obsolete, not used anymore */ +#define RAM_SAVE_FLAG_ZERO 0x02 +#define RAM_SAVE_FLAG_MEM_SIZE 0x04 +#define RAM_SAVE_FLAG_PAGE 0x08 +#define RAM_SAVE_FLAG_EOS 0x10 +#define RAM_SAVE_FLAG_CONTINUE 0x20 +#define RAM_SAVE_FLAG_XBZRLE 0x40 +/* 0x80 is reserved in migration.h start with 0x100 next */ +#define RAM_SAVE_FLAG_COMPRESS_PAGE 0x100 + extern MigrationStats ram_counters; extern XBZRLECacheStats xbzrle_counters; extern CompressionStats compression_counters; From patchwork Wed Mar 17 16:32:20 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrey Gruzdev X-Patchwork-Id: 12146501 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 89AA3C433DB for ; Wed, 17 Mar 2021 17:17:46 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B223064EED for ; Wed, 17 Mar 2021 17:17:45 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B223064EED Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=virtuozzo.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:58022 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1lMZnk-00049C-KG for qemu-devel@archiver.kernel.org; Wed, 17 Mar 2021 13:17:44 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:36526) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lMZ92-0002rc-LJ for qemu-devel@nongnu.org; Wed, 17 Mar 2021 12:35:40 -0400 Received: from relay.sw.ru ([185.231.240.75]:50398) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lMZ8z-0005UD-21 for qemu-devel@nongnu.org; Wed, 17 Mar 2021 12:35:40 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=virtuozzo.com; s=relay; h=MIME-Version:Message-Id:Date:Subject:From: Content-Type; bh=/1FvT4/rC1wGR/G2lPPw0+A45U6B1UCcaO9+4u019UY=; b=YrBZjiU1U4My vqJtraF7ioIpaKGoL7clYCaK8j91eu17wUI8THfnd4YwB1tx9RDXl9fRdywO2ETNIwjo9dW904gFy fqCHgS3VpZ+0y34PYCOnGKSw0MusOVrTJAjIZqTkBoarRfS6P6kgjVU7I0wZlGkL3sRIVAsLy2P5n 8ihUM=; Received: from [192.168.15.248] (helo=andrey-MS-7B54.sw.ru) by relay.sw.ru with esmtp (Exim 4.94) (envelope-from ) id 1lMZ8N-0034yI-OL; Wed, 17 Mar 2021 19:34:59 +0300 From: Andrey Gruzdev To: qemu-devel@nongnu.org Cc: Den Lunev , Eric Blake , Paolo Bonzini , Juan Quintela , "Dr . 
David Alan Gilbert" , Markus Armbruster , Peter Xu , Andrey Gruzdev Subject: [RFC PATCH 7/9] migration/snap-tool: Complete implementation of snapshot saving Date: Wed, 17 Mar 2021 19:32:20 +0300 Message-Id: <20210317163222.182609-8-andrey.gruzdev@virtuozzo.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210317163222.182609-1-andrey.gruzdev@virtuozzo.com> References: <20210317163222.182609-1-andrey.gruzdev@virtuozzo.com> MIME-Version: 1.0 Received-SPF: pass client-ip=185.231.240.75; envelope-from=andrey.gruzdev@virtuozzo.com; helo=relay.sw.ru X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" Includes code to parse incoming migration stream, dispatch data to section handlers and deal with complications of open-coded migration format without introducing strong dependencies on QEMU migration code. Signed-off-by: Andrey Gruzdev --- include/qemu-snap.h | 42 +++ qemu-snap-handlers.c | 717 ++++++++++++++++++++++++++++++++++++++++++- qemu-snap.c | 54 +++- 3 files changed, 810 insertions(+), 3 deletions(-) diff --git a/include/qemu-snap.h b/include/qemu-snap.h index f4b38d6442..570f200c9d 100644 --- a/include/qemu-snap.h +++ b/include/qemu-snap.h @@ -51,8 +51,47 @@ typedef struct AioBufferTask { typedef AioBufferStatus coroutine_fn (*AioBufferFunc)(AioBufferTask *task); +typedef struct QIOChannelBuffer QIOChannelBuffer; + typedef struct SnapSaveState { + const char *filename; /* Image file name */ BlockBackend *blk; /* Block backend */ + + QEMUFile *f_fd; /* Incoming migration stream QEMUFile */ + QEMUFile *f_vmstate; /* Block backend vmstate area QEMUFile */ + /* + * Buffer to stash first few KBs of incoming stream, part of it later will + * go the the VMSTATE area of the image file. Specifically, these are VM + * state header, configuration section and the section which contains + * RAM block list. + */ + QIOChannelBuffer *ioc_lbuf; + /* Page coalescing buffer channel */ + QIOChannelBuffer *ioc_pbuf; + + /* BDRV offset matching start of ioc_pbuf */ + int64_t bdrv_offset; + /* Last BDRV offset saved to ioc_pbuf */ + int64_t last_bdrv_offset; + + /* Stream read position, updated at the beginning of each new section */ + int64_t stream_pos; + + /* Stream read position at the beginning of RAM block list section */ + int64_t ram_list_pos; + /* Stream read position at the beginning of the first RAM data section */ + int64_t ram_pos; + /* Stream read position at the beginning of the first device state section */ + int64_t device_pos; + + /* Final status */ + int status; + + /* + * Keep first few bytes from the beginning of each section for the case + * when we meet device state section and go into 'default_handler'. 
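+ *
+ * 512 bytes is enough to cover what QEMUFile has consumed by then: the
+ * section type token, the section id, the id string and the instance and
+ * version ids, which save_state_complete() later re-emits from this copy.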
+ */ + uint8_t section_header[512]; } SnapSaveState; typedef struct SnapLoadState { @@ -65,6 +104,9 @@ SnapLoadState *snap_load_get_state(void); int coroutine_fn snap_save_state_main(SnapSaveState *sn); int coroutine_fn snap_load_state_main(SnapLoadState *sn); +void snap_ram_init_state(int page_bits); +void snap_ram_destroy_state(void); + QEMUFile *qemu_fopen_bdrv_vmstate(BlockDriverState *bs, int is_writable); AioBufferPool *coroutine_fn aio_pool_new(int buf_align, int buf_size, int buf_count); diff --git a/qemu-snap-handlers.c b/qemu-snap-handlers.c index bdc1911909..4b63d42a29 100644 --- a/qemu-snap-handlers.c +++ b/qemu-snap-handlers.c @@ -23,11 +23,704 @@ #include "migration/ram.h" #include "qemu-snap.h" +/* BDRV vmstate area MAGIC for state header */ +#define VMSTATE_MAGIC 0x5354564d +/* BDRV vmstate area header size */ +#define VMSTATE_HEADER_SIZE 28 +/* BDRV vmstate area header eof_pos field offset */ +#define VMSTATE_HEADER_EOF_OFFSET 24 + +/* Alignment of QEMU RAM block on backing storage */ +#define BLK_RAM_BLOCK_ALIGN (1024 * 1024) +/* Max. byte count for page coalescing buffer */ +#define PAGE_COALESC_MAX (512 * 1024) + +/* RAM block descriptor */ +typedef struct RAMBlockDesc { + int64_t bdrv_offset; /* Offset on backing storage */ + int64_t length; /* RAM block used_length */ + int64_t nr_pages; /* RAM block page count (length >> page_bits) */ + + char idstr[256]; /* RAM block id string */ + + /* Link into ram_list */ + QSIMPLEQ_ENTRY(RAMBlockDesc) next; +} RAMBlockDesc; + +/* State reflecting RAM part of snapshot */ +typedef struct RAMState { + int64_t page_size; /* Page size */ + int64_t page_mask; /* Page mask */ + int page_bits; /* Page size bits */ + + int64_t normal_pages; /* Total number of normal (non-zero) pages */ + + /* List of RAM blocks */ + QSIMPLEQ_HEAD(, RAMBlockDesc) ram_block_list; +} RAMState; + +/* Section handler ops */ +typedef struct SectionHandlerOps { + int (*save_section)(QEMUFile *f, void *opaque, int version_id); + int (*load_section)(QEMUFile *f, void *opaque, int version_id); +} SectionHandlerOps; + +/* Section handler */ +typedef struct SectionHandlersEntry { + const char *idstr; /* Section id string */ + const int instance_id; /* Section instance id */ + const int version_id; /* Max. 
supported section version id */ + + int state_section_id; /* Section id from migration stream */ + int state_version_id; /* Version id from migration stream */ + + SectionHandlerOps *ops; /* Section handler callbacks */ +} SectionHandlersEntry; + +/* Available section handlers */ +typedef struct SectionHandlers { + /* Handler for sections not identified by 'handlers' array */ + SectionHandlersEntry default_entry; + /* Array of section save/load handlers */ + SectionHandlersEntry entries[]; +} SectionHandlers; + +#define SECTION_HANDLERS_ENTRY(_idstr, _instance_id, _version_id, _ops) { \ + .idstr = _idstr, \ + .instance_id = (_instance_id), \ + .version_id = (_version_id), \ + .ops = (_ops), \ +} + +#define SECTION_HANDLERS_END() { NULL, } + +/* Forward declarations */ +static int default_save(QEMUFile *f, void *opaque, int version_id); +static int ram_save(QEMUFile *f, void *opaque, int version_id); +static int save_state_complete(SnapSaveState *sn); + +static RAMState ram_state; + +static SectionHandlerOps default_handler_ops = { + .save_section = default_save, + .load_section = NULL, +}; + +static SectionHandlerOps ram_handler_ops = { + .save_section = ram_save, + .load_section = NULL, +}; + +static SectionHandlers section_handlers = { + .default_entry = + SECTION_HANDLERS_ENTRY("default", 0, 0, &default_handler_ops), + .entries = { + SECTION_HANDLERS_ENTRY("ram", 0, 4, &ram_handler_ops), + SECTION_HANDLERS_END(), + }, +}; + +static SectionHandlersEntry *find_se(const char *idstr, int instance_id) +{ + SectionHandlersEntry *se; + + for (se = section_handlers.entries; se->idstr; se++) { + if (!strcmp(se->idstr, idstr) && (instance_id == se->instance_id)) { + return se; + } + } + return NULL; +} + +static SectionHandlersEntry *find_se_by_section_id(int section_id) +{ + SectionHandlersEntry *se; + + for (se = section_handlers.entries; se->idstr; se++) { + if (section_id == se->state_section_id) { + return se; + } + } + return NULL; +} + +static bool check_section_footer(QEMUFile *f, SectionHandlersEntry *se) +{ + uint8_t token; + int section_id; + + token = qemu_get_byte(f); + if (token != QEMU_VM_SECTION_FOOTER) { + error_report("Missing footer for section '%s'", se->idstr); + return false; + } + + section_id = qemu_get_be32(f); + if (section_id != se->state_section_id) { + error_report("Mismatched section_id in footer for section '%s':" + " read_id=%d expected_id=%d", + se->idstr, section_id, se->state_section_id); + return false; + } + return true; +} + +static inline +bool ram_offset_in_block(RAMBlockDesc *block, int64_t offset) +{ + return (block && (offset < block->length)); +} + +static inline +bool ram_bdrv_offset_in_block(RAMBlockDesc *block, int64_t bdrv_offset) +{ + return (block && (bdrv_offset >= block->bdrv_offset) && + (bdrv_offset < block->bdrv_offset + block->length)); +} + +static inline +int64_t ram_bdrv_from_block_offset(RAMBlockDesc *block, int64_t offset) +{ + if (!ram_offset_in_block(block, offset)) { + return INVALID_OFFSET; + } + + return block->bdrv_offset + offset; +} + +static inline +int64_t ram_block_offset_from_bdrv(RAMBlockDesc *block, int64_t bdrv_offset) +{ + int64_t offset; + + if (!block) { + return INVALID_OFFSET; + } + offset = bdrv_offset - block->bdrv_offset; + return offset >= 0 ? 
offset : INVALID_OFFSET; +} + +static RAMBlockDesc *ram_block_by_idstr(const char *idstr) +{ + RAMBlockDesc *block; + + QSIMPLEQ_FOREACH(block, &ram_state.ram_block_list, next) { + if (!strcmp(idstr, block->idstr)) { + return block; + } + } + return NULL; +} + +static RAMBlockDesc *ram_block_from_stream(QEMUFile *f, int flags) +{ + static RAMBlockDesc *block; + char idstr[256]; + + if (flags & RAM_SAVE_FLAG_CONTINUE) { + if (!block) { + error_report("Corrupted 'ram' section: offset=0x%" PRIx64, + qemu_ftell2(f)); + return NULL; + } + return block; + } + + if (!qemu_get_counted_string(f, idstr)) { + return NULL; + } + block = ram_block_by_idstr(idstr); + if (!block) { + error_report("Can't find RAM block '%s'", idstr); + return NULL; + } + return block; +} + +static int64_t ram_block_next_bdrv_offset(void) +{ + RAMBlockDesc *last_block; + int64_t offset; + + last_block = QSIMPLEQ_LAST(&ram_state.ram_block_list, RAMBlockDesc, next); + if (!last_block) { + return 0; + } + offset = last_block->bdrv_offset + last_block->length; + return ROUND_UP(offset, BLK_RAM_BLOCK_ALIGN); +} + +static void ram_block_add(const char *idstr, int64_t size) +{ + RAMBlockDesc *block; + + block = g_new0(RAMBlockDesc, 1); + block->length = size; + block->bdrv_offset = ram_block_next_bdrv_offset(); + strcpy(block->idstr, idstr); + + QSIMPLEQ_INSERT_TAIL(&ram_state.ram_block_list, block, next); +} + +static void ram_block_list_from_stream(QEMUFile *f, int64_t mem_size) +{ + int64_t total_ram_bytes; + + total_ram_bytes = mem_size; + while (total_ram_bytes > 0) { + char idstr[256]; + int64_t size; + + if (!qemu_get_counted_string(f, idstr)) { + error_report("Can't get RAM block id string in 'ram' " + "MEM_SIZE: offset=0x%" PRIx64 " error=%d", + qemu_ftell2(f), qemu_file_get_error(f)); + return; + } + size = (int64_t) qemu_get_be64(f); + + ram_block_add(idstr, size); + total_ram_bytes -= size; + } + if (total_ram_bytes != 0) { + error_report("Mismatched MEM_SIZE vs sum of RAM block lengths:" + " mem_size=%" PRId64 " block_sum=%" PRId64, + mem_size, (mem_size - total_ram_bytes)); + } +} + +static void save_check_file_errors(SnapSaveState *sn, int *res) +{ + /* Check for -EIO that indicates plane EOF */ + if (*res == -EIO) { + *res = 0; + } + /* Check file errors for success and -EINVAL retcodes */ + if (*res >= 0 || *res == -EINVAL) { + int f_res; + + f_res = qemu_file_get_error(sn->f_fd); + f_res = (f_res == -EIO) ? 0 : f_res; + if (!f_res) { + f_res = qemu_file_get_error(sn->f_vmstate); + } + *res = f_res ? f_res : *res; + } +} + +static int ram_save_page(SnapSaveState *sn, uint8_t *page_ptr, int64_t bdrv_offset) +{ + size_t pbuf_usage = sn->ioc_pbuf->usage; + int page_size = ram_state.page_size; + int res = 0; + + if (bdrv_offset != sn->last_bdrv_offset || + (pbuf_usage + page_size) >= PAGE_COALESC_MAX) { + if (pbuf_usage) { + /* Flush coalesced pages to block device */ + res = blk_pwrite(sn->blk, sn->bdrv_offset, + sn->ioc_pbuf->data, pbuf_usage, 0); + res = res < 0 ? 
res : 0; + } + + /* Reset coalescing buffer state */ + sn->ioc_pbuf->usage = 0; + sn->ioc_pbuf->offset = 0; + /* Switch to new starting bdrv_offset */ + sn->bdrv_offset = bdrv_offset; + } + + qio_channel_write(QIO_CHANNEL(sn->ioc_pbuf), + (char *) page_ptr, page_size, NULL); + sn->last_bdrv_offset = bdrv_offset + page_size; + return res; +} + +static int ram_save_page_flush(SnapSaveState *sn) +{ + size_t pbuf_usage = sn->ioc_pbuf->usage; + int res = 0; + + if (pbuf_usage) { + /* Flush coalesced pages to block device */ + res = blk_pwrite(sn->blk, sn->bdrv_offset, + sn->ioc_pbuf->data, pbuf_usage, 0); + res = res < 0 ? res : 0; + } + + /* Reset coalescing buffer state */ + sn->ioc_pbuf->usage = 0; + sn->ioc_pbuf->offset = 0; + + sn->last_bdrv_offset = INVALID_OFFSET; + return res; +} + +static int ram_save(QEMUFile *f, void *opaque, int version_id) +{ + SnapSaveState *sn = (SnapSaveState *) opaque; + RAMState *rs = &ram_state; + int incompat_flags = (RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE); + int page_size = rs->page_size; + int flags = 0; + int res = 0; + + if (version_id != 4) { + error_report("Unsupported version %d for 'ram' handler v4", version_id); + return -EINVAL; + } + + while (!res && !(flags & RAM_SAVE_FLAG_EOS)) { + RAMBlockDesc *block; + int64_t bdrv_offset = INVALID_OFFSET; + uint8_t *page_ptr; + ssize_t count; + int64_t addr; + + addr = qemu_get_be64(f); + flags = addr & ~rs->page_mask; + addr &= rs->page_mask; + + if (flags & incompat_flags) { + error_report("RAM page with incompatible flags: offset=0x%" PRIx64 + " flags=0x%x", qemu_ftell2(f), flags); + res = -EINVAL; + break; + } + + if (flags & (RAM_SAVE_FLAG_ZERO | RAM_SAVE_FLAG_PAGE)) { + block = ram_block_from_stream(f, flags); + bdrv_offset = ram_bdrv_from_block_offset(block, addr); + if (bdrv_offset == INVALID_OFFSET) { + error_report("Corrupted RAM page: offset=0x%" PRIx64 + " page_addr=0x%" PRIx64, qemu_ftell2(f), addr); + res = -EINVAL; + break; + } + } + + switch (flags & ~RAM_SAVE_FLAG_CONTINUE) { + case RAM_SAVE_FLAG_MEM_SIZE: + /* Save position of the section containing list of RAM blocks */ + if (sn->ram_list_pos) { + error_report("Unexpected RAM page with FLAG_MEM_SIZE:" + " offset=0x%" PRIx64 " page_addr=0x%" PRIx64 + " flags=0x%x", qemu_ftell2(f), addr, flags); + res = -EINVAL; + break; + } + sn->ram_list_pos = sn->stream_pos; + + /* Fill RAM block list */ + ram_block_list_from_stream(f, addr); + break; + + case RAM_SAVE_FLAG_ZERO: + /* Nothing to do with zero page */ + qemu_get_byte(f); + break; + + case RAM_SAVE_FLAG_PAGE: + count = qemu_peek_buffer(f, &page_ptr, page_size, 0); + qemu_file_skip(f, count); + if (count != page_size) { + /* I/O error */ + break; + } + + res = ram_save_page(sn, page_ptr, bdrv_offset); + /* Update normal page count */ + ram_state.normal_pages++; + break; + + case RAM_SAVE_FLAG_EOS: + /* Normal exit */ + break; + + default: + error_report("RAM page with unknown combination of flags:" + " offset=0x%" PRIx64 " page_addr=0x%" PRIx64 + " flags=0x%x", qemu_ftell2(f), addr, flags); + res = -EINVAL; + } + + /* Make additional check for file errors */ + if (!res) { + res = qemu_file_get_error(f); + } + } + + /* Flush page coalescing buffer at RAM_SAVE_FLAG_EOS */ + if (!res) { + res = ram_save_page_flush(sn); + } + return res; +} + +static int default_save(QEMUFile *f, void *opaque, int version_id) +{ + SnapSaveState *sn = (SnapSaveState *) opaque; + + if (!sn->ram_pos) { + error_report("Section with unknown ID before first 'ram' section:" + " offset=0x%" PRIx64, 
sn->stream_pos); + return -EINVAL; + } + if (!sn->device_pos) { + sn->device_pos = sn->stream_pos; + /* + * Save the rest of migration data needed to restore VM state. + * It is the header, configuration section, first 'ram' section + * with the list of RAM blocks and device state data. + */ + return save_state_complete(sn); + } + + /* Should never get here */ + assert(false); + return -EINVAL; +} + +static int save_state_complete(SnapSaveState *sn) +{ + QEMUFile *f = sn->f_fd; + int64_t pos; + int64_t eof_pos; + + /* Current read position */ + pos = qemu_ftell2(f); + + /* Put specific MAGIC at the beginning of saved BDRV vmstate stream */ + qemu_put_be32(sn->f_vmstate, VMSTATE_MAGIC); + /* Target page size */ + qemu_put_be32(sn->f_vmstate, ram_state.page_size); + /* Number of normal (non-zero) pages in snapshot */ + qemu_put_be64(sn->f_vmstate, ram_state.normal_pages); + /* Offset of RAM block list section relative to QEMU_VM_FILE_MAGIC */ + qemu_put_be32(sn->f_vmstate, sn->ram_list_pos); + /* Offset of first device state section relative to QEMU_VM_FILE_MAGIC */ + qemu_put_be32(sn->f_vmstate, sn->ram_pos); + /* + * Put a slot here since we don't really know how + * long is the rest of migration stream. + */ + qemu_put_be32(sn->f_vmstate, 0); + + /* + * At the completion stage we save the leading part of migration stream + * which contains header, configuration section and the 'ram' section + * with QEMU_VM_SECTION_FULL type containing the list of RAM blocks. + * + * All of this comes before the first QEMU_VM_SECTION_PART token for 'ram'. + * That QEMU_VM_SECTION_PART token is pointed by sn->ram_pos. + */ + qemu_put_buffer(sn->f_vmstate, sn->ioc_lbuf->data, sn->ram_pos); + /* + * And then we save the trailing part with device state. + * + * First we take section header which has already been skipped + * by QEMUFile but we can get it from sn->section_header. 
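+ *
+ * (pos - sn->device_pos) is exactly the number of header bytes consumed
+ * since the section type token: sn->device_pos was recorded when that
+ * token was about to be read, and sn->section_header holds the bytes
+ * peeked at the same point in snap_save_state_main().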
+ */ + qemu_put_buffer(sn->f_vmstate, sn->section_header, (pos - sn->device_pos)); + + /* Forward the rest of stream data to the BDRV vmstate file */ + file_transfer_to_eof(sn->f_vmstate, f); + /* It does qemu_fflush() internally */ + eof_pos = qemu_ftell(sn->f_vmstate); + + /* Hack: simulate negative seek() */ + qemu_update_position(sn->f_vmstate, + (size_t)(ssize_t) (VMSTATE_HEADER_EOF_OFFSET - eof_pos)); + qemu_put_be32(sn->f_vmstate, eof_pos - VMSTATE_HEADER_SIZE); + /* Final flush to deliver eof_offset header field */ + qemu_fflush(sn->f_vmstate); + + return 1; +} + +static int save_section_config(SnapSaveState *sn) +{ + QEMUFile *f = sn->f_fd; + uint32_t id_len; + + id_len = qemu_get_be32(f); + if (id_len > 255) { + error_report("Corrupted QEMU_VM_CONFIGURATION section"); + return -EINVAL; + } + qemu_file_skip(f, id_len); + return 0; +} + +static int save_section_start_full(SnapSaveState *sn) +{ + QEMUFile *f = sn->f_fd; + SectionHandlersEntry *se; + int section_id; + int instance_id; + int version_id; + char id_str[256]; + int res; + + /* Read section start */ + section_id = qemu_get_be32(f); + if (!qemu_get_counted_string(f, id_str)) { + return qemu_file_get_error(f); + } + instance_id = qemu_get_be32(f); + version_id = qemu_get_be32(f); + + se = find_se(id_str, instance_id); + if (!se) { + se = §ion_handlers.default_entry; + } else if (version_id > se->version_id) { + /* Validate version */ + error_report("Unsupported version %d for '%s' v%d", + version_id, id_str, se->version_id); + return -EINVAL; + } + + se->state_section_id = section_id; + se->state_version_id = version_id; + + res = se->ops->save_section(f, sn, se->state_version_id); + /* + * Positive return value indicates save completion, + * no need to check section footer. + */ + if (res) { + return res; + } + + /* Finally check section footer */ + if (!check_section_footer(f, se)) { + return -EINVAL; + } + return 0; +} + +static int save_section_part_end(SnapSaveState *sn) +{ + QEMUFile *f = sn->f_fd; + SectionHandlersEntry *se; + int section_id; + int res; + + /* First section with QEMU_VM_SECTION_PART type must be the 'ram' section */ + if (!sn->ram_pos) { + sn->ram_pos = sn->stream_pos; + } + + section_id = qemu_get_be32(f); + se = find_se_by_section_id(section_id); + if (!se) { + error_report("Unknown section ID: %d", section_id); + return -EINVAL; + } + + res = se->ops->save_section(f, sn, se->state_version_id); + if (res) { + error_report("Error while saving section: id_str='%s' section_id=%d", + se->idstr, section_id); + return res; + } + + /* Finally check section footer */ + if (!check_section_footer(f, se)) { + return -EINVAL; + } + return 0; +} + +static int save_state_header(SnapSaveState *sn) +{ + QEMUFile *f = sn->f_fd; + uint32_t v; + + /* Validate QEMU MAGIC */ + v = qemu_get_be32(f); + if (v != QEMU_VM_FILE_MAGIC) { + error_report("Not a migration stream"); + return -EINVAL; + } + v = qemu_get_be32(f); + if (v == QEMU_VM_FILE_VERSION_COMPAT) { + error_report("SaveVM v2 format is obsolete"); + return -EINVAL; + } + if (v != QEMU_VM_FILE_VERSION) { + error_report("Unsupported migration stream version"); + return -EINVAL; + } + return 0; +} + /* Save snapshot data from incoming migration stream */ int coroutine_fn snap_save_state_main(SnapSaveState *sn) { - /* TODO: implement */ - return 0; + QEMUFile *f = sn->f_fd; + uint8_t *buf; + uint8_t section_type; + int res = 0; + + res = save_state_header(sn); + if (res) { + save_check_file_errors(sn, &res); + return res; + } + + while (!res) { + /* Update current 
stream position so it points to the section type token */ + sn->stream_pos = qemu_ftell2(f); + + /* + * Keep some data from the beginning of the section to use it if it appears + * that we have reached device state section and go into 'default_handler'. + */ + qemu_peek_buffer(f, &buf, sizeof(sn->section_header), 0); + memcpy(sn->section_header, buf, sizeof(sn->section_header)); + + /* Read section type token */ + section_type = qemu_get_byte(f); + + switch (section_type) { + case QEMU_VM_CONFIGURATION: + res = save_section_config(sn); + break; + + case QEMU_VM_SECTION_FULL: + case QEMU_VM_SECTION_START: + res = save_section_start_full(sn); + break; + + case QEMU_VM_SECTION_PART: + case QEMU_VM_SECTION_END: + res = save_section_part_end(sn); + break; + + case QEMU_VM_EOF: + /* + * End of migration stream, but normally we will never really get here + * since final part of migration stream is a series of QEMU_VM_SECTION_FULL + * sections holding non-iterable device state. In our case all this + * state is saved with single call to snap_save_section_start_full() + * when we first meet unknown section id string. + */ + res = -EINVAL; + break; + + default: + error_report("Unknown section type %d", section_type); + res = -EINVAL; + } + + /* Additional check for file errors on success and -EINVAL */ + save_check_file_errors(sn, &res); + } + + /* Replace positive exit code with 0 */ + sn->status = res < 0 ? res : 0; + return sn->status; } /* Load snapshot data and send it with outgoing migration stream */ @@ -36,3 +729,23 @@ int coroutine_fn snap_load_state_main(SnapLoadState *sn) /* TODO: implement */ return 0; } + +/* Initialize snapshot RAM state */ +void snap_ram_init_state(int page_bits) +{ + RAMState *rs = &ram_state; + + memset(rs, 0, sizeof(ram_state)); + + rs->page_bits = page_bits; + rs->page_size = (int64_t) 1 << page_bits; + rs->page_mask = ~(rs->page_size - 1); + + /* Initialize RAM block list head */ + QSIMPLEQ_INIT(&rs->ram_block_list); +} + +/* Destroy snapshot RAM state */ +void snap_ram_destroy_state(void) +{ +} diff --git a/qemu-snap.c b/qemu-snap.c index ec56aa55d2..a337a7667b 100644 --- a/qemu-snap.c +++ b/qemu-snap.c @@ -105,11 +105,31 @@ SnapLoadState *snap_load_get_state(void) static void snap_save_init_state(void) { memset(&save_state, 0, sizeof(save_state)); + save_state.status = -1; } static void snap_save_destroy_state(void) { - /* TODO: implement */ + SnapSaveState *sn = snap_save_get_state(); + + if (sn->ioc_lbuf) { + object_unref(OBJECT(sn->ioc_lbuf)); + } + if (sn->ioc_pbuf) { + object_unref(OBJECT(sn->ioc_pbuf)); + } + if (sn->f_vmstate) { + qemu_fclose(sn->f_vmstate); + } + if (sn->blk) { + blk_flush(sn->blk); + blk_unref(sn->blk); + + /* Delete image file in case of failure */ + if (sn->status) { + qemu_unlink(sn->filename); + } + } } static void snap_load_init_state(void) @@ -190,6 +210,8 @@ static void coroutine_fn do_snap_save_co(void *opaque) SnapTaskState *task_state = opaque; SnapSaveState *sn = snap_save_get_state(); + /* Switch to non-blocking mode in coroutine context */ + qemu_file_set_blocking(sn->f_fd, false); /* Enter main routine */ task_state->ret = snap_save_state_main(sn); } @@ -233,17 +255,46 @@ static int run_snap_task(CoroutineEntry *entry) static int snap_save(const SnapSaveParams *params) { SnapSaveState *sn; + QIOChannel *ioc_fd; + uint8_t *buf; + size_t count; int res = -1; + snap_ram_init_state(ctz64(params->page_size)); snap_save_init_state(); sn = snap_save_get_state(); + sn->filename = params->filename; + + ioc_fd = 
qio_channel_new_fd(params->fd, NULL); + qio_channel_set_name(QIO_CHANNEL(ioc_fd), "snap-channel-incoming"); + sn->f_fd = qemu_fopen_channel_input(ioc_fd); + object_unref(OBJECT(ioc_fd)); + + /* Create buffer channel to store leading part of incoming stream */ + sn->ioc_lbuf = qio_channel_buffer_new(INPLACE_READ_MAX); + qio_channel_set_name(QIO_CHANNEL(sn->ioc_lbuf), "snap-leader-buffer"); + + count = qemu_peek_buffer(sn->f_fd, &buf, INPLACE_READ_MAX, 0); + res = qemu_file_get_error(sn->f_fd); + if (res < 0) { + goto fail; + } + qio_channel_write(QIO_CHANNEL(sn->ioc_lbuf), (char *) buf, count, NULL); + + /* Used for incoming page coalescing */ + sn->ioc_pbuf = qio_channel_buffer_new(128 * 1024); + qio_channel_set_name(QIO_CHANNEL(sn->ioc_pbuf), "snap-page-buffer"); + sn->blk = snap_create(params->filename, params->image_size, params->bdrv_flags, params->writethrough); if (!sn->blk) { goto fail; } + /* Open QEMUFile so we can write to BDRV vmstate area */ + sn->f_vmstate = qemu_fopen_bdrv_vmstate(blk_bs(sn->blk), 1); + res = run_snap_task(do_snap_save_co); if (res) { error_report("Failed to save snapshot: error=%d", res); @@ -251,6 +302,7 @@ static int snap_save(const SnapSaveParams *params) fail: snap_save_destroy_state(); + snap_ram_destroy_state(); return res; } From patchwork Wed Mar 17 16:32:21 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrey Gruzdev X-Patchwork-Id: 12146505 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D8F3BC433E9 for ; Wed, 17 Mar 2021 17:20:47 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3125F64EE7 for ; Wed, 17 Mar 2021 17:20:47 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3125F64EE7 Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=virtuozzo.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:36648 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1lMZqg-00072n-7Y for qemu-devel@archiver.kernel.org; Wed, 17 Mar 2021 13:20:46 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:36706) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lMZ9Y-0003CD-00 for qemu-devel@nongnu.org; Wed, 17 Mar 2021 12:36:15 -0400 Received: from relay.sw.ru ([185.231.240.75]:50744) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lMZ9O-0005im-GT for qemu-devel@nongnu.org; Wed, 17 Mar 2021 12:36:11 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=virtuozzo.com; s=relay; h=MIME-Version:Message-Id:Date:Subject:From: Content-Type; bh=KQSnYrifT5MGz3UBYuDBKnvuFBfN3TpZOQ1Wwyz4PIM=; b=Z0YzBg2acvde 
YY1moErPWy9f6pXPswRxKfDqWZt07vSgIeXKv/HNG8PQ1t+veMBESQcJ2v5KEaTKppKb0iaMeLf3f AcCcSvb6n6OSZ8kgDeE5w7msJAa1+JbKx+WUAMamIPtXZeDIIJDU3Z3ZBoS4h+ajh2H4ezzMNMERE OiX3s=; Received: from [192.168.15.248] (helo=andrey-MS-7B54.sw.ru) by relay.sw.ru with esmtp (Exim 4.94) (envelope-from ) id 1lMZ8l-0034yI-PH; Wed, 17 Mar 2021 19:35:23 +0300 From: Andrey Gruzdev To: qemu-devel@nongnu.org Cc: Den Lunev , Eric Blake , Paolo Bonzini , Juan Quintela , "Dr . David Alan Gilbert" , Markus Armbruster , Peter Xu , Andrey Gruzdev Subject: [RFC PATCH 8/9] migration/snap-tool: Implementation of snapshot loading in precopy Date: Wed, 17 Mar 2021 19:32:21 +0300 Message-Id: <20210317163222.182609-9-andrey.gruzdev@virtuozzo.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210317163222.182609-1-andrey.gruzdev@virtuozzo.com> References: <20210317163222.182609-1-andrey.gruzdev@virtuozzo.com> MIME-Version: 1.0 Received-SPF: pass client-ip=185.231.240.75; envelope-from=andrey.gruzdev@virtuozzo.com; helo=relay.sw.ru X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" This part implements snapshot loading in precopy mode. Signed-off-by: Andrey Gruzdev --- include/qemu-snap.h | 24 ++ qemu-snap-handlers.c | 586 ++++++++++++++++++++++++++++++++++++++++++- qemu-snap.c | 44 +++- 3 files changed, 649 insertions(+), 5 deletions(-) diff --git a/include/qemu-snap.h b/include/qemu-snap.h index 570f200c9d..f9f6db529f 100644 --- a/include/qemu-snap.h +++ b/include/qemu-snap.h @@ -26,6 +26,11 @@ */ #define PAGE_SIZE_MAX 16384 +/* Buffer size for RAM chunk loads via AIO buffer_pool */ +#define AIO_BUFFER_SIZE (1024 * 1024) +/* Max. concurrent AIO tasks */ +#define AIO_TASKS_MAX 8 + typedef struct AioBufferPool AioBufferPool; typedef struct AioBufferStatus { @@ -96,6 +101,25 @@ typedef struct SnapSaveState { typedef struct SnapLoadState { BlockBackend *blk; /* Block backend */ + + QEMUFile *f_fd; /* Outgoing migration stream QEMUFile */ + QEMUFile *f_vmstate; /* Block backend vmstate area QEMUFile */ + /* + * Buffer to keep first few KBs of BDRV vmstate that we stashed at the + * start. Within this buffer are VM state header, configuration section + * and the first 'ram' section with RAM block list. 
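+ * load_send_leader() later replays this buffer, minus the tool's own
+ * VMSTATE header, into the outgoing migration stream before any RAM
+ * pages go out.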
+ */ + QIOChannelBuffer *ioc_lbuf; + + /* AIO buffer pool */ + AioBufferPool *aio_pool; + + /* BDRV vmstate offset of RAM block list section */ + int64_t state_ram_list_offset; + /* BDRV vmstate offset of the first device section */ + int64_t state_device_offset; + /* BDRV vmstate End-Of-File */ + int64_t state_eof; } SnapLoadState; SnapSaveState *snap_save_get_state(void); diff --git a/qemu-snap-handlers.c b/qemu-snap-handlers.c index 4b63d42a29..7dfe950829 100644 --- a/qemu-snap-handlers.c +++ b/qemu-snap-handlers.c @@ -41,12 +41,22 @@ typedef struct RAMBlockDesc { int64_t length; /* RAM block used_length */ int64_t nr_pages; /* RAM block page count (length >> page_bits) */ + int64_t last_offset; /* Last offset sent in precopy */ + char idstr[256]; /* RAM block id string */ + unsigned long *bitmap; /* Loaded pages bitmap */ + /* Link into ram_list */ QSIMPLEQ_ENTRY(RAMBlockDesc) next; } RAMBlockDesc; +/* Reference to the RAM page with block/page tuple */ +typedef struct RAMPageRef { + RAMBlockDesc *block; /* RAM block containing page */ + int64_t page; /* Page index in RAM block */ +} RAMPageRef; + /* State reflecting RAM part of snapshot */ typedef struct RAMState { int64_t page_size; /* Page size */ @@ -54,6 +64,15 @@ typedef struct RAMState { int page_bits; /* Page size bits */ int64_t normal_pages; /* Total number of normal (non-zero) pages */ + int64_t loaded_pages; /* Current number of normal pages loaded */ + + /* Last RAM block touched by load_send_ram_iterate() */ + RAMBlockDesc *last_block; + /* Last page touched by load_send_ram_iterate() */ + int64_t last_page; + + /* Last RAM block sent by load_send_ram_iterate() */ + RAMBlockDesc *last_sent_block; /* List of RAM blocks */ QSIMPLEQ_HEAD(, RAMBlockDesc) ram_block_list; @@ -96,19 +115,22 @@ typedef struct SectionHandlers { /* Forward declarations */ static int default_save(QEMUFile *f, void *opaque, int version_id); +static int default_load(QEMUFile *f, void *opaque, int version_id); static int ram_save(QEMUFile *f, void *opaque, int version_id); +static int ram_load(QEMUFile *f, void *opaque, int version_id); static int save_state_complete(SnapSaveState *sn); +static int coroutine_fn load_send_pages_flush(SnapLoadState *sn); static RAMState ram_state; static SectionHandlerOps default_handler_ops = { .save_section = default_save, - .load_section = NULL, + .load_section = default_load, }; static SectionHandlerOps ram_handler_ops = { .save_section = ram_save, - .load_section = NULL, + .load_section = ram_load, }; static SectionHandlers section_handlers = { @@ -212,6 +234,18 @@ static RAMBlockDesc *ram_block_by_idstr(const char *idstr) return NULL; } +static RAMBlockDesc *ram_block_by_bdrv_offset(int64_t bdrv_offset) +{ + RAMBlockDesc *block; + + QSIMPLEQ_FOREACH(block, &ram_state.ram_block_list, next) { + if (ram_bdrv_offset_in_block(block, bdrv_offset)) { + return block; + } + } + return NULL; +} + static RAMBlockDesc *ram_block_from_stream(QEMUFile *f, int flags) { static RAMBlockDesc *block; @@ -289,6 +323,36 @@ static void ram_block_list_from_stream(QEMUFile *f, int64_t mem_size) } } +static void ram_block_list_init_bitmaps(void) +{ + RAMBlockDesc *block; + + QSIMPLEQ_FOREACH(block, &ram_state.ram_block_list, next) { + block->nr_pages = block->length >> ram_state.page_bits; + + block->bitmap = bitmap_new(block->nr_pages); + bitmap_set(block->bitmap, 0, block->nr_pages); + } +} + +static inline +int64_t ram_block_bitmap_find_next(RAMBlockDesc *block, int64_t start) +{ + return find_next_bit(block->bitmap, block->nr_pages, 
start); +} + +static inline +int64_t ram_block_bitmap_find_next_clear(RAMBlockDesc *block, int64_t start) +{ + return find_next_zero_bit(block->bitmap, block->nr_pages, start); +} + +static inline +void ram_block_bitmap_clear(RAMBlockDesc *block, int64_t start, int64_t count) +{ + bitmap_clear(block->bitmap, start, count); +} + static void save_check_file_errors(SnapSaveState *sn, int *res) { /* Check for -EIO that indicates plane EOF */ @@ -723,11 +787,517 @@ int coroutine_fn snap_save_state_main(SnapSaveState *sn) return sn->status; } +static void load_check_file_errors(SnapLoadState *sn, int *res) +{ + /* Check file errors even on success */ + if (*res >= 0 || *res == -EINVAL) { + int f_res; + + f_res = qemu_file_get_error(sn->f_fd); + if (!f_res) { + f_res = qemu_file_get_error(sn->f_vmstate); + } + *res = f_res ? f_res : *res; + } +} + +static int ram_load(QEMUFile *f, void *opaque, int version_id) +{ + int compat_flags = (RAM_SAVE_FLAG_MEM_SIZE | RAM_SAVE_FLAG_EOS); + int64_t page_mask = ram_state.page_mask; + int flags = 0; + int res = 0; + + if (version_id != 4) { + error_report("Unsupported version %d for 'ram' handler v4", version_id); + return -EINVAL; + } + + while (!res && !(flags & RAM_SAVE_FLAG_EOS)) { + int64_t addr; + + addr = qemu_get_be64(f); + flags = addr & ~page_mask; + addr &= page_mask; + + if (flags & ~compat_flags) { + error_report("RAM page with incompatible flags: offset=0x%" PRIx64 + " flags=0x%x", qemu_ftell2(f), flags); + res = -EINVAL; + break; + } + + switch (flags) { + case RAM_SAVE_FLAG_MEM_SIZE: + /* Fill RAM block list */ + ram_block_list_from_stream(f, addr); + break; + + case RAM_SAVE_FLAG_EOS: + /* Normal exit */ + break; + + default: + error_report("RAM page with unknown combination of flags:" + " offset=0x%" PRIx64 " page_addr=0x%" PRIx64 + " flags=0x%x", qemu_ftell2(f), addr, flags); + res = -EINVAL; + } + + /* Check for file errors even if all looks good */ + if (!res) { + res = qemu_file_get_error(f); + } + } + return res; +} + +static int default_load(QEMUFile *f, void *opaque, int version_id) +{ + error_report("Section with unknown ID: offset=0x%" PRIx64, + qemu_ftell2(f)); + return -EINVAL; +} + +static void send_page_header(QEMUFile *f, RAMBlockDesc *block, int64_t offset) +{ + uint8_t hdr_buf[512]; + int hdr_len = 8; + + stq_be_p(hdr_buf, offset); + if (!(offset & RAM_SAVE_FLAG_CONTINUE)) { + int id_len; + + id_len = strlen(block->idstr); + assert(id_len < 256); + + hdr_buf[hdr_len] = id_len; + memcpy((hdr_buf + hdr_len + 1), block->idstr, id_len); + + hdr_len += 1 + id_len; + } + + qemu_put_buffer(f, hdr_buf, hdr_len); +} + +static void send_zeropage(QEMUFile *f, RAMBlockDesc *block, int64_t offset) +{ + send_page_header(f, block, offset | RAM_SAVE_FLAG_ZERO); + qemu_put_byte(f, 0); +} + +static int send_pages_from_buffer(QEMUFile *f, AioBuffer *buffer) +{ + RAMState *rs = &ram_state; + int page_size = rs->page_size; + RAMBlockDesc *block = rs->last_sent_block; + int64_t bdrv_offset = buffer->status.offset; + int64_t flags = RAM_SAVE_FLAG_CONTINUE; + int pages = 0; + + /* Need to switch to the another RAM block? */ + if (!ram_bdrv_offset_in_block(block, bdrv_offset)) { + /* + * Lookup RAM block by BDRV offset cause in postcopy we + * can issue AIO buffer loads from non-contiguous blocks. 
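+ * Switching blocks also drops RAM_SAVE_FLAG_CONTINUE for the first page
+ * sent from the new block, so its id string is put on the wire again.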
+ */ + block = ram_block_by_bdrv_offset(bdrv_offset); + rs->last_sent_block = block; + /* Reset RAM_SAVE_FLAG_CONTINUE */ + flags = 0; + } + + for (int offset = 0; offset < buffer->status.count; + offset += page_size, bdrv_offset += page_size) { + void *page_buf = buffer->data + offset; + int64_t addr; + + addr = ram_block_offset_from_bdrv(block, bdrv_offset); + + if (buffer_is_zero(page_buf, page_size)) { + send_zeropage(f, block, (addr | flags)); + } else { + send_page_header(f, block, + (addr | RAM_SAVE_FLAG_PAGE | flags)); + qemu_put_buffer_async(f, page_buf, page_size, false); + + /* Update non-zero page count */ + rs->loaded_pages++; + } + /* + * AioBuffer is always within a single RAM block so we need + * to set RAM_SAVE_FLAG_CONTINUE here unconditionally. + */ + flags = RAM_SAVE_FLAG_CONTINUE; + pages++; + } + + /* Need to flush cause we use qemu_put_buffer_async() */ + qemu_fflush(f); + return pages; +} + +static bool find_next_unsent_page(RAMPageRef *p_ref) +{ + RAMState *rs = &ram_state; + RAMBlockDesc *block = rs->last_block; + int64_t page = rs->last_page; + bool found = false; + bool full_round = false; + + if (!block) { +restart: + block = QSIMPLEQ_FIRST(&rs->ram_block_list); + page = 0; + full_round = true; + } + + while (!found && block) { + page = ram_block_bitmap_find_next(block, page); + if (page >= block->nr_pages) { + block = QSIMPLEQ_NEXT(block, next); + page = 0; + continue; + } + found = true; + } + + if (!found && !full_round) { + goto restart; + } + + if (found) { + p_ref->block = block; + p_ref->page = page; + } + return found; +} + +static inline +void get_unsent_page_range(RAMPageRef *p_ref, RAMBlockDesc **block, + int64_t *offset, int64_t *limit) +{ + int64_t page_limit; + + *block = p_ref->block; + *offset = p_ref->page << ram_state.page_bits; + page_limit = ram_block_bitmap_find_next_clear(p_ref->block, (p_ref->page + 1)); + *limit = page_limit << ram_state.page_bits; +} + +static AioBufferStatus coroutine_fn load_buffers_task_co(AioBufferTask *task) +{ + SnapLoadState *sn = snap_load_get_state(); + AioBufferStatus ret; + int count; + + count = blk_pread(sn->blk, task->offset, task->buffer->data, task->size); + + ret.offset = task->offset; + ret.count = count; + + return ret; +} + +static void coroutine_fn load_buffers_fill_queue(SnapLoadState *sn) +{ + RAMState *rs = &ram_state; + RAMPageRef p_ref; + RAMBlockDesc *block; + int64_t offset; + int64_t limit; + int64_t pages; + + if (!find_next_unsent_page(&p_ref)) { + return; + } + + get_unsent_page_range(&p_ref, &block, &offset, &limit); + + do { + AioBuffer *buffer; + int64_t bdrv_offset; + int size; + + /* Try to acquire next buffer from the pool */ + buffer = aio_pool_try_acquire_next(sn->aio_pool); + if (!buffer) { + break; + } + + bdrv_offset = ram_bdrv_from_block_offset(block, offset); + assert(bdrv_offset != INVALID_OFFSET); + + /* Get maximum transfer size for current RAM block and offset */ + size = MIN((limit - offset), buffer->size); + aio_buffer_start_task(buffer, load_buffers_task_co, bdrv_offset, size); + + offset += size; + } while (offset < limit); + + rs->last_block = block; + rs->last_page = offset >> rs->page_bits; + + block->last_offset = offset; + + pages = rs->last_page - p_ref.page; + ram_block_bitmap_clear(block, p_ref.page, pages); +} + +static int coroutine_fn load_send_pages(SnapLoadState *sn) +{ + AioBuffer *compl_buffer; + int pages = 0; + + load_buffers_fill_queue(sn); + + compl_buffer = aio_pool_wait_compl_next(sn->aio_pool); + if (compl_buffer) { + /* Check AIO completion 
status */ + pages = aio_pool_status(sn->aio_pool); + if (pages < 0) { + return pages; + } + + pages = send_pages_from_buffer(sn->f_fd, compl_buffer); + aio_buffer_release(compl_buffer); + } + + return pages; +} + +static int coroutine_fn load_send_pages_flush(SnapLoadState *sn) +{ + AioBuffer *compl_buffer; + + while ((compl_buffer = aio_pool_wait_compl_next(sn->aio_pool))) { + int res = aio_pool_status(sn->aio_pool); + /* Check AIO completion status */ + if (res < 0) { + return res; + } + + send_pages_from_buffer(sn->f_fd, compl_buffer); + aio_buffer_release(compl_buffer); + } + + return 0; +} + +static void send_section_header_part_end(QEMUFile *f, SectionHandlersEntry *se, + uint8_t section_type) +{ + assert(section_type == QEMU_VM_SECTION_PART || + section_type == QEMU_VM_SECTION_END); + + qemu_put_byte(f, section_type); + qemu_put_be32(f, se->state_section_id); +} + +static void send_section_footer(QEMUFile *f, SectionHandlersEntry *se) +{ + qemu_put_byte(f, QEMU_VM_SECTION_FOOTER); + qemu_put_be32(f, se->state_section_id); +} + +#define YIELD_AFTER_MS 500 /* ms */ + +static int coroutine_fn load_send_ram_iterate(SnapLoadState *sn) +{ + SectionHandlersEntry *se; + int64_t t_start; + int tmp_res; + int res = 1; + + /* Send 'ram' section header with QEMU_VM_SECTION_PART type */ + se = find_se("ram", 0); + send_section_header_part_end(sn->f_fd, se, QEMU_VM_SECTION_PART); + + t_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); + for (int iter = 0; res > 0; iter++) { + res = load_send_pages(sn); + + if (!(iter & 7)) { + int64_t t_cur = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); + if ((t_cur - t_start) > YIELD_AFTER_MS) { + break; + } + } + } + + /* Zero return code means that there're no more pages to send */ + if (res >= 0) { + res = res ? 0 : 1; + } + + /* Flush AIO buffers cause some may still remain unsent */ + tmp_res = load_send_pages_flush(sn); + res = tmp_res ? 
tmp_res : res; + + /* Send EOS flag before section footer */ + qemu_put_be64(sn->f_fd, RAM_SAVE_FLAG_EOS); + send_section_footer(sn->f_fd, se); + + qemu_fflush(sn->f_fd); + return res; +} + +static int load_send_leader(SnapLoadState *sn) +{ + qemu_put_buffer(sn->f_fd, (sn->ioc_lbuf->data + VMSTATE_HEADER_SIZE), + sn->state_device_offset); + return qemu_file_get_error(sn->f_fd); +} + +static int load_send_complete(SnapLoadState *sn) +{ + /* Transfer device state to the output pipe */ + file_transfer_bytes(sn->f_fd, sn->f_vmstate, + (sn->state_eof - sn->state_device_offset)); + qemu_fflush(sn->f_fd); + return 1; +} + +static int load_section_start_full(SnapLoadState *sn) +{ + QEMUFile *f = sn->f_vmstate; + int section_id; + int instance_id; + int version_id; + char idstr[256]; + SectionHandlersEntry *se; + int res; + + /* Read section start */ + section_id = qemu_get_be32(f); + if (!qemu_get_counted_string(f, idstr)) { + return qemu_file_get_error(f); + } + instance_id = qemu_get_be32(f); + version_id = qemu_get_be32(f); + + se = find_se(idstr, instance_id); + if (!se) { + se = §ion_handlers.default_entry; + } else if (version_id > se->version_id) { + /* Validate version */ + error_report("Unsupported version %d for '%s' v%d", + version_id, idstr, se->version_id); + return -EINVAL; + } + + se->state_section_id = section_id; + se->state_version_id = version_id; + + res = se->ops->load_section(f, sn, se->state_version_id); + if (res) { + return res; + } + + /* Finally check section footer */ + if (!check_section_footer(f, se)) { + return -EINVAL; + } + return 0; +} + +static int load_setup_ramlist(SnapLoadState *sn) +{ + QEMUFile *f = sn->f_vmstate; + uint8_t section_type; + int64_t section_pos; + int res; + + section_pos = qemu_ftell2(f); + + /* Read section type token */ + section_type = qemu_get_byte(f); + if (section_type == QEMU_VM_EOF) { + error_report("Unexpected EOF token: offset=0x%" PRIx64, section_pos); + return -EINVAL; + } else if (section_type != QEMU_VM_SECTION_FULL && + section_type != QEMU_VM_SECTION_START) { + error_report("Unexpected section type %d: offset=0x%" PRIx64, + section_type, section_pos); + return -EINVAL; + } + + res = load_section_start_full(sn); + if (!res) { + ram_block_list_init_bitmaps(); + } + return res; +} + +static int load_state_header(SnapLoadState *sn) +{ + QEMUFile *f = sn->f_vmstate; + uint32_t v; + + /* Validate specific MAGIC in vmstate area */ + v = qemu_get_be32(f); + if (v != VMSTATE_MAGIC) { + error_report("Not a valid VMSTATE"); + return -EINVAL; + } + v = qemu_get_be32(f); + if (v != ram_state.page_size) { + error_report("VMSTATE page size not matching target"); + return -EINVAL; + } + + /* Number of non-zero pages in all RAM blocks */ + ram_state.normal_pages = qemu_get_be64(f); + + /* VMSTATE area offsets, counted from QEMU_FILE_MAGIC */ + sn->state_ram_list_offset = qemu_get_be32(f); + sn->state_device_offset = qemu_get_be32(f); + sn->state_eof = qemu_get_be32(f); + + /* Check that offsets are within the limits */ + if ((VMSTATE_HEADER_SIZE + sn->state_device_offset) > INPLACE_READ_MAX || + sn->state_device_offset <= sn->state_ram_list_offset) { + error_report("Corrupted VMSTATE header"); + return -EINVAL; + } + + /* Skip up to RAM block list section */ + qemu_file_skip(f, sn->state_ram_list_offset); + return 0; +} + /* Load snapshot data and send it with outgoing migration stream */ int coroutine_fn snap_load_state_main(SnapLoadState *sn) { - /* TODO: implement */ - return 0; + int res; + + res = load_state_header(sn); + if (res) { + 
goto fail; + } + res = load_setup_ramlist(sn); + if (res) { + goto fail; + } + res = load_send_leader(sn); + if (res) { + goto fail; + } + + do { + res = load_send_ram_iterate(sn); + /* Make additional check for file errors */ + load_check_file_errors(sn, &res); + } while (!res); + + if (res == 1) { + res = load_send_complete(sn); + } + +fail: + load_check_file_errors(sn, &res); + /* Replace positive exit code with 0 */ + return res < 0 ? res : 0; } /* Initialize snapshot RAM state */ @@ -748,4 +1318,12 @@ void snap_ram_init_state(int page_bits) /* Destroy snapshot RAM state */ void snap_ram_destroy_state(void) { + RAMBlockDesc *block; + RAMBlockDesc *next_block; + + /* Free RAM blocks */ + QSIMPLEQ_FOREACH_SAFE(block, &ram_state.ram_block_list, next, next_block) { + g_free(block->bitmap); + g_free(block); + } } diff --git a/qemu-snap.c b/qemu-snap.c index a337a7667b..c5efbd6803 100644 --- a/qemu-snap.c +++ b/qemu-snap.c @@ -139,7 +139,20 @@ static void snap_load_init_state(void) static void snap_load_destroy_state(void) { - /* TODO: implement */ + SnapLoadState *sn = snap_load_get_state(); + + if (sn->aio_pool) { + aio_pool_free(sn->aio_pool); + } + if (sn->ioc_lbuf) { + object_unref(OBJECT(sn->ioc_lbuf)); + } + if (sn->f_vmstate) { + qemu_fclose(sn->f_vmstate); + } + if (sn->blk) { + blk_unref(sn->blk); + } } static BlockBackend *snap_create(const char *filename, int64_t image_size, @@ -221,6 +234,12 @@ static void coroutine_fn do_snap_load_co(void *opaque) SnapTaskState *task_state = opaque; SnapLoadState *sn = snap_load_get_state(); + /* Switch to non-blocking mode in coroutine context */ + qemu_file_set_blocking(sn->f_vmstate, false); + qemu_file_set_blocking(sn->f_fd, false); + /* Initialize AIO buffer pool in coroutine context */ + sn->aio_pool = aio_pool_new(DEFAULT_PAGE_SIZE, AIO_BUFFER_SIZE, + AIO_TASKS_MAX); /* Enter main routine */ task_state->ret = snap_load_state_main(sn); } @@ -310,15 +329,37 @@ fail: static int snap_load(SnapLoadParams *params) { SnapLoadState *sn; + QIOChannel *ioc_fd; + uint8_t *buf; + size_t count; int res = -1; + snap_ram_init_state(ctz64(params->page_size)); snap_load_init_state(); sn = snap_load_get_state(); + ioc_fd = qio_channel_new_fd(params->fd, NULL); + qio_channel_set_name(QIO_CHANNEL(ioc_fd), "snap-channel-outgoing"); + sn->f_fd = qemu_fopen_channel_output(ioc_fd); + object_unref(OBJECT(ioc_fd)); + sn->blk = snap_open(params->filename, params->bdrv_flags); if (!sn->blk) { goto fail; } + /* Open QEMUFile for BDRV vmstate area */ + sn->f_vmstate = qemu_fopen_bdrv_vmstate(blk_bs(sn->blk), 0); + + /* Create buffer channel to store leading part of VMSTATE stream */ + sn->ioc_lbuf = qio_channel_buffer_new(INPLACE_READ_MAX); + qio_channel_set_name(QIO_CHANNEL(sn->ioc_lbuf), "snap-leader-buffer"); + + count = qemu_peek_buffer(sn->f_vmstate, &buf, INPLACE_READ_MAX, 0); + res = qemu_file_get_error(sn->f_vmstate); + if (res < 0) { + goto fail; + } + qio_channel_write(QIO_CHANNEL(sn->ioc_lbuf), (char *) buf, count, NULL); res = run_snap_task(do_snap_load_co); if (res) { @@ -327,6 +368,7 @@ static int snap_load(SnapLoadParams *params) fail: snap_load_destroy_state(); + snap_ram_destroy_state(); return res; } From patchwork Wed Mar 17 16:32:22 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrey Gruzdev X-Patchwork-Id: 12146497 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, 
score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6D5E5C433DB for ; Wed, 17 Mar 2021 17:00:57 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id DDC7661580 for ; Wed, 17 Mar 2021 17:00:56 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DDC7661580 Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=virtuozzo.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:57306 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1lMZXT-0008CH-VJ for qemu-devel@archiver.kernel.org; Wed, 17 Mar 2021 13:00:55 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:36866) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lMZ9p-0003Lm-LH for qemu-devel@nongnu.org; Wed, 17 Mar 2021 12:36:33 -0400 Received: from relay.sw.ru ([185.231.240.75]:50928) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lMZ9k-0005xn-W4 for qemu-devel@nongnu.org; Wed, 17 Mar 2021 12:36:28 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=virtuozzo.com; s=relay; h=MIME-Version:Message-Id:Date:Subject:From: Content-Type; bh=GFETwEQ2pOKFwsfVAY3vkGj6h6qxlwMYoHVTykZaxKY=; b=FPNFXKcNdbYq xQe2UzRg/K9HuqKKkwSfoguCEkZI/CPyOL+xfG5PwFmlxoYWuhy2dt9JXQ5otiFFn+3V7p20JYEbx YhIeCtvO6UU8HYcrEijMrqlIKoRXfl+wiKO3OjERdLz9jnmF/45mvhjSQkbAJJAtvIyss4alEjNZN rheQU=; Received: from [192.168.15.248] (helo=andrey-MS-7B54.sw.ru) by relay.sw.ru with esmtp (Exim 4.94) (envelope-from ) id 1lMZ99-0034yI-Og; Wed, 17 Mar 2021 19:35:48 +0300 From: Andrey Gruzdev To: qemu-devel@nongnu.org Cc: Den Lunev , Eric Blake , Paolo Bonzini , Juan Quintela , "Dr . David Alan Gilbert" , Markus Armbruster , Peter Xu , Andrey Gruzdev Subject: [RFC PATCH 9/9] migration/snap-tool: Implementation of snapshot loading in postcopy Date: Wed, 17 Mar 2021 19:32:22 +0300 Message-Id: <20210317163222.182609-10-andrey.gruzdev@virtuozzo.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210317163222.182609-1-andrey.gruzdev@virtuozzo.com> References: <20210317163222.182609-1-andrey.gruzdev@virtuozzo.com> MIME-Version: 1.0 Received-SPF: pass client-ip=185.231.240.75; envelope-from=andrey.gruzdev@virtuozzo.com; helo=relay.sw.ru X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" Implementation of asynchronous snapshot loading using standard postcopy migration mechanism on destination VM. 
The point of switchover to postcopy is trivially selected based on percentage of non-zero pages loaded in precopy. Signed-off-by: Andrey Gruzdev --- include/qemu-snap.h | 11 + qemu-snap-handlers.c | 482 ++++++++++++++++++++++++++++++++++++++++++- qemu-snap.c | 16 ++ 3 files changed, 504 insertions(+), 5 deletions(-) diff --git a/include/qemu-snap.h b/include/qemu-snap.h index f9f6db529f..4bf79e0964 100644 --- a/include/qemu-snap.h +++ b/include/qemu-snap.h @@ -30,6 +30,8 @@ #define AIO_BUFFER_SIZE (1024 * 1024) /* Max. concurrent AIO tasks */ #define AIO_TASKS_MAX 8 +/* Max. concurrent AIO tasks in postcopy */ +#define AIO_TASKS_POSTCOPY_MAX 4 typedef struct AioBufferPool AioBufferPool; @@ -103,6 +105,7 @@ typedef struct SnapLoadState { BlockBackend *blk; /* Block backend */ QEMUFile *f_fd; /* Outgoing migration stream QEMUFile */ + QEMUFile *f_rp_fd; /* Return path stream QEMUFile */ QEMUFile *f_vmstate; /* Block backend vmstate area QEMUFile */ /* * Buffer to keep first few KBs of BDRV vmstate that we stashed at the @@ -114,6 +117,14 @@ typedef struct SnapLoadState { /* AIO buffer pool */ AioBufferPool *aio_pool; + bool postcopy; /* From command-line --postcopy */ + int postcopy_percent; /* From command-line --postcopy */ + bool in_postcopy; /* Switched to postcopy mode */ + + /* Return path listening thread */ + QemuThread rp_listen_thread; + bool has_rp_listen_thread; + /* BDRV vmstate offset of RAM block list section */ int64_t state_ram_list_offset; /* BDRV vmstate offset of the first device section */ diff --git a/qemu-snap-handlers.c b/qemu-snap-handlers.c index 7dfe950829..ae581b3178 100644 --- a/qemu-snap-handlers.c +++ b/qemu-snap-handlers.c @@ -57,6 +57,16 @@ typedef struct RAMPageRef { int64_t page; /* Page index in RAM block */ } RAMPageRef; +/* Page request from destination in postcopy */ +typedef struct RAMPageRequest { + RAMBlockDesc *block; /* RAM block*/ + int64_t offset; /* Offset within RAM block */ + unsigned size; /* Size of request */ + + /* Link into ram_state.page_req */ + QSIMPLEQ_ENTRY(RAMPageRequest) next; +} RAMPageRequest; + /* State reflecting RAM part of snapshot */ typedef struct RAMState { int64_t page_size; /* Page size */ @@ -64,6 +74,7 @@ typedef struct RAMState { int page_bits; /* Page size bits */ int64_t normal_pages; /* Total number of normal (non-zero) pages */ + int64_t precopy_pages; /* Normal pages to load in precopy */ int64_t loaded_pages; /* Current number of normal pages loaded */ /* Last RAM block touched by load_send_ram_iterate() */ @@ -73,9 +84,15 @@ typedef struct RAMState { /* Last RAM block sent by load_send_ram_iterate() */ RAMBlockDesc *last_sent_block; + /* RAM block from last enqueued load-postcopy page request */ + RAMBlockDesc *last_req_block; /* List of RAM blocks */ QSIMPLEQ_HEAD(, RAMBlockDesc) ram_block_list; + + /* Page request queue for load-postcopy */ + QemuMutex page_req_mutex; + QSIMPLEQ_HEAD(, RAMPageRequest) page_req; } RAMState; /* Section handler ops */ @@ -801,6 +818,422 @@ static void load_check_file_errors(SnapLoadState *sn, int *res) } } +static bool get_queued_page(RAMPageRef *p_ref) +{ + RAMState *rs = &ram_state; + RAMBlockDesc *block = NULL; + int64_t offset; + + if (QSIMPLEQ_EMPTY_ATOMIC(&rs->page_req)) { + return false; + } + + QEMU_LOCK_GUARD(&rs->page_req_mutex); + + while (!QSIMPLEQ_EMPTY(&rs->page_req)) { + RAMPageRequest *entry = QSIMPLEQ_FIRST(&rs->page_req); + + block = entry->block; + offset = entry->offset; + + if (entry->size > rs->page_size) { + entry->size -= rs->page_size; + entry->offset += 
rs->page_size;
+        } else {
+            QSIMPLEQ_REMOVE_HEAD(&rs->page_req, next);
+            g_free(entry);
+        }
+
+        if (test_bit((offset >> rs->page_bits), block->bitmap)) {
+            p_ref->block = block;
+            p_ref->page = offset >> rs->page_bits;
+            return true;
+        }
+    }
+    return false;
+}
+
+static int queue_page_request(const char *id_str, int64_t offset, unsigned size)
+{
+    RAMState *rs = &ram_state;
+    RAMBlockDesc *block;
+    RAMPageRequest *new_entry;
+
+    if (!id_str) {
+        block = rs->last_req_block;
+        if (!block) {
+            error_report("RP-REQ_PAGES: no previous block");
+            return -EINVAL;
+        }
+    } else {
+        block = ram_block_by_idstr(id_str);
+        if (!block) {
+            error_report("RP-REQ_PAGES: cannot find block '%s'", id_str);
+            return -EINVAL;
+        }
+        rs->last_req_block = block;
+    }
+
+    if (offset + size > block->length) {
+        error_report("RP-REQ_PAGES: offset/size outside RAM block, end_offset=0x%" PRIx64
+                " limit=0x%" PRIx64, (offset + size), block->length);
+        return -EINVAL;
+    }
+
+    new_entry = g_new0(RAMPageRequest, 1);
+    new_entry->block = block;
+    new_entry->offset = offset;
+    new_entry->size = size;
+
+    qemu_mutex_lock(&rs->page_req_mutex);
+    QSIMPLEQ_INSERT_TAIL(&rs->page_req, new_entry, next);
+    qemu_mutex_unlock(&rs->page_req_mutex);
+
+    return 0;
+}
+
+/* QEMU_VM_COMMAND sub-commands */
+typedef enum VmSubCmd {
+    MIG_CMD_OPEN_RETURN_PATH = 1,
+    MIG_CMD_POSTCOPY_ADVISE = 3,
+    MIG_CMD_POSTCOPY_LISTEN = 4,
+    MIG_CMD_POSTCOPY_RUN = 5,
+    MIG_CMD_POSTCOPY_RAM_DISCARD = 6,
+    MIG_CMD_PACKAGED = 7,
+} VmSubCmd;
+
+/* Return-path message types */
+typedef enum RpMsgType {
+    MIG_RP_MSG_INVALID = 0,
+    MIG_RP_MSG_SHUT = 1,
+    MIG_RP_MSG_REQ_PAGES_ID = 3,
+    MIG_RP_MSG_REQ_PAGES = 4,
+    MIG_RP_MSG_MAX = 7,
+} RpMsgType;
+
+typedef struct RpMsgArgs {
+    int len;
+    const char *name;
+} RpMsgArgs;
+
+/*
+ * Return-path message length/name indexed by message type.
+ * -1 value stands for variable message length.
+ */ +static RpMsgArgs rp_msg_args[] = { + [MIG_RP_MSG_INVALID] = { .len = -1, .name = "INVALID" }, + [MIG_RP_MSG_SHUT] = { .len = 4, .name = "SHUT" }, + [MIG_RP_MSG_REQ_PAGES_ID] = { .len = -1, .name = "REQ_PAGES_ID" }, + [MIG_RP_MSG_REQ_PAGES] = { .len = 12, .name = "REQ_PAGES" }, + [MIG_RP_MSG_MAX] = { .len = -1, .name = "MAX" }, +}; + +/* Return-path message processing thread */ +static void *rp_listen_thread(void *opaque) +{ + SnapLoadState *sn = (SnapLoadState *) opaque; + QEMUFile *f = sn->f_rp_fd; + int res = 0; + + while (!res) { + uint8_t h_buf[512]; + const int h_max_len = sizeof(h_buf); + int h_type; + int h_len; + size_t count; + + h_type = qemu_get_be16(f); + h_len = qemu_get_be16(f); + /* Make early check for input errors */ + res = qemu_file_get_error(f); + if (res) { + break; + } + + /* Check message type */ + if (h_type >= MIG_RP_MSG_MAX || h_type == MIG_RP_MSG_INVALID) { + error_report("RP: received invalid message type=%d length=%d", + h_type, h_len); + res = -EINVAL; + break; + } + + /* Check message length */ + if (rp_msg_args[h_type].len != -1 && h_len != rp_msg_args[h_type].len) { + error_report("RP: received '%s' message len=%d expected=%d", + rp_msg_args[h_type].name, h_len, rp_msg_args[h_type].len); + res = -EINVAL; + break; + } else if (h_len > h_max_len) { + error_report("RP: received '%s' message len=%d max_len=%d", + rp_msg_args[h_type].name, h_len, h_max_len); + res = -EINVAL; + break; + } + + count = qemu_get_buffer(f, h_buf, h_len); + if (count != h_len) { + break; + } + + switch (h_type) { + case MIG_RP_MSG_SHUT: + { + int shut_error; + + shut_error = be32_to_cpu(*(uint32_t *) h_buf); + if (shut_error) { + error_report("RP: sibling shutdown error=%d", shut_error); + } + /* Exit processing loop */ + res = 1; + break; + } + + case MIG_RP_MSG_REQ_PAGES: + case MIG_RP_MSG_REQ_PAGES_ID: + { + uint64_t offset; + uint32_t size; + char *id_str = NULL; + + offset = be64_to_cpu(*(uint64_t *) (h_buf + 0)); + size = be32_to_cpu(*(uint32_t *) (h_buf + 8)); + + if (h_type == MIG_RP_MSG_REQ_PAGES_ID) { + int h_parsed_len = rp_msg_args[MIG_RP_MSG_REQ_PAGES].len; + + if (h_len > h_parsed_len) { + int id_len; + + /* RAM block ID string */ + id_len = h_buf[h_parsed_len]; + id_str = (char *) &h_buf[h_parsed_len + 1]; + id_str[id_len] = 0; + + h_parsed_len += id_len + 1; + } + + if (h_parsed_len != h_len) { + error_report("RP: received '%s' message len=%d expected=%d", + rp_msg_args[MIG_RP_MSG_REQ_PAGES_ID].name, + h_len, h_parsed_len); + res = -EINVAL; + break; + } + } + + res = queue_page_request(id_str, offset, size); + break; + } + + default: + error_report("RP: received unexpected message type=%d len=%d", + h_type, h_len); + res = -EINVAL; + } + } + + if (res >= 0) { + res = qemu_file_get_error(f); + } + if (res) { + error_report("RP: listen thread exit error=%d", res); + } + return NULL; +} + +static void send_command(QEMUFile *f, int cmd, uint16_t len, uint8_t *data) +{ + qemu_put_byte(f, QEMU_VM_COMMAND); + qemu_put_be16(f, (uint16_t) cmd); + qemu_put_be16(f, len); + + qemu_put_buffer_async(f, data, len, false); + qemu_fflush(f); +} + +static void send_ram_block_discard(QEMUFile *f, RAMBlockDesc *block) +{ + int id_len; + int msg_len; + uint8_t msg_buf[512]; + + id_len = strlen(block->idstr); + assert(id_len < 256); + + /* Version, always 0 */ + msg_buf[0] = 0; + /* RAM block ID string length, not including terminating 0 */ + msg_buf[1] = id_len; + /* RAM block ID string with terminating zero */ + memcpy(msg_buf + 2, block->idstr, (id_len + 1)); + msg_len = 2 + id_len 
+ 1; + /* + * We take discard range start offset for the RAM block + * from precopy-mode block->last_offset. + */ + stq_be_p(msg_buf + msg_len, block->last_offset); + msg_len += 8; + /* Discard range length */ + stq_be_p(msg_buf + msg_len, (block->length - block->last_offset)); + msg_len += 8; + + send_command(f, MIG_CMD_POSTCOPY_RAM_DISCARD, msg_len, msg_buf); +} + +static int send_ram_each_block_discard(QEMUFile *f) +{ + RAMBlockDesc *block; + int res = 0; + + QSIMPLEQ_FOREACH(block, &ram_state.ram_block_list, next) { + send_ram_block_discard(f, block); + res = qemu_file_get_error(f); + if (res) { + break; + } + } + return res; +} + +static int load_prepare_postcopy(SnapLoadState *sn) +{ + QEMUFile *f = sn->f_fd; + uint64_t tmp[2]; + int res; + + /* Set number of pages to load in precopy before switching to postcopy */ + ram_state.precopy_pages = ram_state.normal_pages * + sn->postcopy_percent / 100; + + /* Send POSTCOPY_ADVISE */ + tmp[0] = cpu_to_be64(ram_state.page_size); + tmp[1] = cpu_to_be64(ram_state.page_size); + send_command(f, MIG_CMD_POSTCOPY_ADVISE, 16, (uint8_t *) tmp); + /* Open return path on destination */ + send_command(f, MIG_CMD_OPEN_RETURN_PATH, 0, NULL); + + /* + * Check file errors after POSTCOPY_ADVISE command since destination may already + * have closed input pipe in case postcopy had not been enabled in advance. + */ + res = qemu_file_get_error(f); + if (!res) { + qemu_thread_create(&sn->rp_listen_thread, "return-path-thread", + rp_listen_thread, sn, QEMU_THREAD_JOINABLE); + sn->has_rp_listen_thread = true; + } + return res; +} + +static int load_start_postcopy(SnapLoadState *sn) +{ + QIOChannelBuffer *bioc; + QEMUFile *fb; + int eof_pos; + uint32_t length; + int res = 0; + + /* + * Send RAM discards for each block's unsent part. Without discards, + * userfault_fd code on destination will not trigger page requests + * as expected. Also, the UFFDIO_COPY/ZEROPAGE ioctl's that are used + * to place incoming page in postcopy would give an error if that page + * has not faulted with userfault_fd MISSING reason. + */ + res = send_ram_each_block_discard(sn->f_fd); + if (res) { + return res; + } + + /* + * To perform a switch to postcopy on destination, we need to send + * commands and the device state data in the following order: + * * MIG_CMD_POSTCOPY_LISTEN + * * device state sections + * * MIG_CMD_POSTCOPY_RUN + * All of this has to be packaged into a single blob using + * MIG_CMD_PACKAGED command. While loading the device state we may + * trigger page transfer requests and the fd must be free to process + * those, thus the destination must read the whole device state off + * the fd before it starts processing it. To wrap it up in a package, + * we use QEMU buffer channel object. + */ + bioc = qio_channel_buffer_new(512 * 1024); + qio_channel_set_name(QIO_CHANNEL(bioc), "snap-postcopy-buffer"); + fb = qemu_fopen_channel_output(QIO_CHANNEL(bioc)); + object_unref(OBJECT(bioc)); + + /* First goes MIG_CMD_POSTCOPY_LISTEN command */ + send_command(fb, MIG_CMD_POSTCOPY_LISTEN, 0, NULL); + + /* Then the rest of device state with optional VMDESC section.. */ + file_transfer_bytes(fb, sn->f_vmstate, + (sn->state_eof - sn->state_device_offset)); + qemu_fflush(fb); + + /* + * VMDESC json section may be present at the end of the stream + * so we'll try to locate it and truncate trailer for postcopy. 
+ */ + eof_pos = bioc->usage - 1; + for (int offset = (bioc->usage - 11); offset >= 0; offset--) { + if (bioc->data[offset] == QEMU_VM_SECTION_FOOTER && + bioc->data[offset + 5] == QEMU_VM_EOF && + bioc->data[offset + 6] == QEMU_VM_VMDESCRIPTION) { + uint32_t json_length; + uint32_t expected_length = bioc->usage - (offset + 11); + + json_length = be32_to_cpu(*(uint32_t *) &bioc->data[offset + 7]); + if (json_length != expected_length) { + error_report("Corrupted VMDESC trailer: length=%" PRIu32 + " expected=%" PRIu32, json_length, expected_length); + res = -EINVAL; + goto fail; + } + + eof_pos = offset + 5; + break; + } + } + + /* + * In postcopy we need to remove QEMU_VM_EOF token which normally goes + * after the last non-iterable device state section before the (optional) + * VMDESC json section. This is required to allow snapshot loading to + * continue in postcopy after we sent the rest of device state. + * VMDESC section also has to be removed from the stream if present. + */ + if (eof_pos >= 0 && bioc->data[eof_pos] == QEMU_VM_EOF) { + bioc->usage = eof_pos; + bioc->offset = eof_pos; + } + + /* And the final MIG_CMD_POSTCOPY_RUN */ + send_command(fb, MIG_CMD_POSTCOPY_RUN, 0, NULL); + + /* Now send that blob */ + length = cpu_to_be32(bioc->usage); + send_command(sn->f_fd, MIG_CMD_PACKAGED, sizeof(length), + (uint8_t *) &length); + qemu_put_buffer_async(sn->f_fd, bioc->data, bioc->usage, false); + qemu_fflush(sn->f_fd); + + /* + * We set lower limit on the number of AIO in-flight requests + * to reduce return path PAGE_REQ processing latencies. + */ + aio_pool_set_max_in_flight(sn->aio_pool, AIO_TASKS_POSTCOPY_MAX); + /* Now in postcopy */ + sn->in_postcopy = true; + +fail: + qemu_fclose(fb); + load_check_file_errors(sn, &res); + return res; +} + static int ram_load(QEMUFile *f, void *opaque, int version_id) { int compat_flags = (RAM_SAVE_FLAG_MEM_SIZE | RAM_SAVE_FLAG_EOS); @@ -1007,11 +1440,23 @@ static void coroutine_fn load_buffers_fill_queue(SnapLoadState *sn) int64_t offset; int64_t limit; int64_t pages; + bool urgent; - if (!find_next_unsent_page(&p_ref)) { + /* + * First need to check if aio_pool_try_acquire_next() will + * succeed at least once since we can't revert get_queued_page(). 
+     */
+    if (!aio_pool_can_acquire_next(sn->aio_pool)) {
         return;
     }
 
+    urgent = get_queued_page(&p_ref);
+    if (!urgent) {
+        if (!find_next_unsent_page(&p_ref)) {
+            return;
+        }
+    }
+
     get_unsent_page_range(&p_ref, &block, &offset, &limit);
 
     do {
@@ -1033,7 +1478,7 @@ static void coroutine_fn load_buffers_fill_queue(SnapLoadState *sn)
         aio_buffer_start_task(buffer, load_buffers_task_co,
                               bdrv_offset, size);
         offset += size;
-    } while (offset < limit);
+    } while (!urgent && (offset < limit));
 
     rs->last_block = block;
     rs->last_page = offset >> rs->page_bits;
@@ -1151,9 +1596,14 @@ static int load_send_leader(SnapLoadState *sn)
 
 static int load_send_complete(SnapLoadState *sn)
 {
-    /* Transfer device state to the output pipe */
-    file_transfer_bytes(sn->f_fd, sn->f_vmstate,
-                        (sn->state_eof - sn->state_device_offset));
+    if (!sn->in_postcopy) {
+        /* Transfer device state to the output pipe */
+        file_transfer_bytes(sn->f_fd, sn->f_vmstate,
+                            (sn->state_eof - sn->state_device_offset));
+    } else {
+        /* In postcopy, send the final QEMU_VM_EOF token */
+        qemu_put_byte(sn->f_fd, QEMU_VM_EOF);
+    }
     qemu_fflush(sn->f_fd);
     return 1;
 }
@@ -1266,6 +1716,11 @@ static int load_state_header(SnapLoadState *sn)
     return 0;
 }
 
+static bool load_switch_to_postcopy(SnapLoadState *sn)
+{
+    return ram_state.loaded_pages > ram_state.precopy_pages;
+}
+
 /* Load snapshot data and send it with outgoing migration stream */
 int coroutine_fn snap_load_state_main(SnapLoadState *sn)
 {
@@ -1283,11 +1738,22 @@ int coroutine_fn snap_load_state_main(SnapLoadState *sn)
     if (res) {
         goto fail;
     }
+    if (sn->postcopy) {
+        res = load_prepare_postcopy(sn);
+        if (res) {
+            goto fail;
+        }
+    }
 
     do {
         res = load_send_ram_iterate(sn);
         /* Make additional check for file errors */
        load_check_file_errors(sn, &res);
+
+        if (!res && sn->postcopy && !sn->in_postcopy &&
+            load_switch_to_postcopy(sn)) {
+            res = load_start_postcopy(sn);
+        }
     } while (!res);
 
     if (res == 1) {
@@ -1313,6 +1779,10 @@ void snap_ram_init_state(int page_bits)
 
     /* Initialize RAM block list head */
     QSIMPLEQ_INIT(&rs->ram_block_list);
+
+    /* Initialize load-postcopy page request queue */
+    qemu_mutex_init(&rs->page_req_mutex);
+    QSIMPLEQ_INIT(&rs->page_req);
 }
 
 /* Destroy snapshot RAM state */
@@ -1326,4 +1796,6 @@ void snap_ram_destroy_state(void)
         g_free(block->bitmap);
         g_free(block);
     }
+    /* Destroy page request mutex */
+    qemu_mutex_destroy(&ram_state.page_req_mutex);
 }
diff --git a/qemu-snap.c b/qemu-snap.c
index c5efbd6803..89fb918cfc 100644
--- a/qemu-snap.c
+++ b/qemu-snap.c
@@ -141,6 +141,10 @@ static void snap_load_destroy_state(void)
 {
     SnapLoadState *sn = snap_load_get_state();
 
+    if (sn->has_rp_listen_thread) {
+        qemu_thread_join(&sn->rp_listen_thread);
+    }
+
     if (sn->aio_pool) {
         aio_pool_free(sn->aio_pool);
     }
@@ -330,6 +334,7 @@ static int snap_load(SnapLoadParams *params)
 {
     SnapLoadState *sn;
     QIOChannel *ioc_fd;
+    QIOChannel *ioc_rp_fd;
     uint8_t *buf;
     size_t count;
     int res = -1;
@@ -338,11 +343,22 @@ static int snap_load(SnapLoadParams *params)
     snap_load_init_state();
     sn = snap_load_get_state();
 
+    sn->postcopy = params->postcopy;
+    sn->postcopy_percent = params->postcopy_percent;
+
     ioc_fd = qio_channel_new_fd(params->fd, NULL);
     qio_channel_set_name(QIO_CHANNEL(ioc_fd), "snap-channel-outgoing");
     sn->f_fd = qemu_fopen_channel_output(ioc_fd);
     object_unref(OBJECT(ioc_fd));
 
+    /* Open return path QEMUFile in case postcopy is enabled */
+    if (params->postcopy) {
+        ioc_rp_fd = qio_channel_new_fd(params->rp_fd, NULL);
+        qio_channel_set_name(QIO_CHANNEL(ioc_rp_fd), "snap-channel-rp");
+        sn->f_rp_fd =
qemu_fopen_channel_input(ioc_rp_fd); + object_unref(OBJECT(ioc_rp_fd)); + } + sn->blk = snap_open(params->filename, params->bdrv_flags); if (!sn->blk) { goto fail;