From patchwork Thu Dec 22 11:02:10 2022
From: David Hildenbrand
To: qemu-devel@nongnu.org
Cc: David Hildenbrand, "Dr. David Alan Gilbert", Juan Quintela, Peter Xu, "Michael S. Tsirkin", Michal Privoznik
Tsirkin" , Michal Privoznik Subject: [PATCH v3 1/6] migration: Allow immutable device state to be migrated early (i.e., before RAM) Date: Thu, 22 Dec 2022 12:02:10 +0100 Message-Id: <20221222110215.130392-2-david@redhat.com> In-Reply-To: <20221222110215.130392-1-david@redhat.com> References: <20221222110215.130392-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.7 Received-SPF: pass client-ip=170.10.129.124; envelope-from=david@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org For virtio-mem, we want to have the plugged/unplugged state of memory blocks available before migrating any actual RAM content. This information is immutable on the migration source while migration is active, For example, we want to use this information for proper preallocation support with migration: currently, we don't preallocate memory on the migration target, and especially with hugetlb, we can easily run out of hugetlb pages during RAM migration and will crash (SIGBUS) instead of catching this gracefully via preallocation. Migrating device state before we start iterating is currently impossible. Introduce and use qemu_savevm_state_start_precopy(), and use a new special migration priority -- MIG_PRI_POST_SETUP -- to decide whether state will be saved in qemu_savevm_state_start_precopy() or in qemu_savevm_state_complete_precopy_*(). We have to take care of properly including the early device state in the vmdesc. Relying on migrate_get_current() to temporarily store the vmdesc is a bit sub-optimal, but we use that explicitly or implicitly all over the place already, so this barely matters in practice. Note that only very selected devices (i.e., ones seriously messing with RAM setup) are supposed to make use of that. Signed-off-by: David Hildenbrand Signed-off-by: David Hildenbrand Signed-off-by: David Hildenbrand --- include/migration/vmstate.h | 7 +++ migration/migration.c | 13 +++++ migration/migration.h | 4 ++ migration/savevm.c | 104 +++++++++++++++++++++++++++--------- migration/savevm.h | 1 + 5 files changed, 104 insertions(+), 25 deletions(-) diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h index ad24aa1934..79eb2409a2 100644 --- a/include/migration/vmstate.h +++ b/include/migration/vmstate.h @@ -156,6 +156,13 @@ typedef enum { MIG_PRI_VIRTIO_MEM, /* Must happen before IOMMU */ MIG_PRI_GICV3_ITS, /* Must happen before PCI devices */ MIG_PRI_GICV3, /* Must happen before the ITS */ + /* + * Must happen before all other devices (iterable and non-iterable), + * especiall, before migrating RAM content. Such device state must be + * guaranteed to be immutable on the migration source until migration + * ends and must not depend on the CPU state to be synchronized. 
+ */ + MIG_PRI_POST_SETUP, MIG_PRI_MAX, } MigrationPriority; diff --git a/migration/migration.c b/migration/migration.c index 52b5d39244..78b6bb8765 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -2170,6 +2170,9 @@ void migrate_init(MigrationState *s) s->vm_was_running = false; s->iteration_initial_bytes = 0; s->threshold_size = 0; + + json_writer_free(s->vmdesc); + s->vmdesc = NULL; } int migrate_add_blocker_internal(Error *reason, Error **errp) @@ -3997,6 +4000,9 @@ static void *migration_thread(void *opaque) trace_migration_thread_setup_complete(); + /* Process early data that has to get migrated before iterating. */ + qemu_savevm_state_start_precopy(s->to_dst_file); + while (migration_is_active(s)) { if (urgent || !qemu_file_rate_limit(s->to_dst_file)) { MigIterateState iter_state = migration_iteration_run(s); @@ -4125,6 +4131,12 @@ static void *bg_migration_thread(void *opaque) if (vm_stop_force_state(RUN_STATE_PAUSED)) { goto fail; } + + /* Migrate early data that would usually get migrated before iterating. */ + if (qemu_savevm_state_start_precopy(fb)) { + goto fail; + } + /* * Put vCPUs in sync with shadow context structures, then * save their state to channel-buffer along with devices. @@ -4445,6 +4457,7 @@ static void migration_instance_finalize(Object *obj) qemu_sem_destroy(&ms->rp_state.rp_sem); qemu_sem_destroy(&ms->postcopy_qemufile_src_sem); error_free(ms->error); + json_writer_free(ms->vmdesc); } static void migration_instance_init(Object *obj) diff --git a/migration/migration.h b/migration/migration.h index ae4ffd3454..66511ce532 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -17,6 +17,7 @@ #include "exec/cpu-common.h" #include "hw/qdev-core.h" #include "qapi/qapi-types-migration.h" +#include "qapi/qmp/json-writer.h" #include "qemu/thread.h" #include "qemu/coroutine_int.h" #include "io/channel.h" @@ -366,6 +367,9 @@ struct MigrationState { * This save hostname when out-going migration starts */ char *hostname; + + /* QEMU_VM_VMDESCRIPTION content filled for all non-iterable devices. 
*/ + JSONWriter *vmdesc; }; void migrate_set_state(int *state, int old_state, int new_state); diff --git a/migration/savevm.c b/migration/savevm.c index a0cdb714f7..b810409574 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -42,7 +42,6 @@ #include "postcopy-ram.h" #include "qapi/error.h" #include "qapi/qapi-commands-migration.h" -#include "qapi/qmp/json-writer.h" #include "qapi/clone-visitor.h" #include "qapi/qapi-builtin-visit.h" #include "qapi/qmp/qerror.h" @@ -1325,6 +1324,71 @@ void qemu_savevm_state_complete_postcopy(QEMUFile *f) qemu_fflush(f); } +static int qemu_savevm_state_precopy_one_non_iterable(SaveStateEntry *se, + QEMUFile *f, + JSONWriter *vmdesc) +{ + int ret; + + if ((!se->ops || !se->ops->save_state) && !se->vmsd) { + return 0; + } + if (se->vmsd && !vmstate_save_needed(se->vmsd, se->opaque)) { + trace_savevm_section_skip(se->idstr, se->section_id); + return 0; + } + + trace_savevm_section_start(se->idstr, se->section_id); + + json_writer_start_object(vmdesc, NULL); + json_writer_str(vmdesc, "name", se->idstr); + json_writer_int64(vmdesc, "instance_id", se->instance_id); + + save_section_header(f, se, QEMU_VM_SECTION_FULL); + ret = vmstate_save(f, se, vmdesc); + if (ret) { + qemu_file_set_error(f, ret); + return ret; + } + trace_savevm_section_end(se->idstr, se->section_id, 0); + save_section_footer(f, se); + + json_writer_end_object(vmdesc); + return 0; +} + +int qemu_savevm_state_start_precopy(QEMUFile *f) +{ + MigrationState *ms = migrate_get_current(); + JSONWriter *vmdesc; + SaveStateEntry *se; + int ret; + + assert(!ms->vmdesc); + ms->vmdesc = vmdesc = json_writer_new(false); + json_writer_start_object(vmdesc, NULL); + json_writer_int64(vmdesc, "page_size", qemu_target_page_size()); + json_writer_start_array(vmdesc, "devices"); + + /* + * Only immutable non-iterable device state is expected to be saved this + * early. All remaining (ordinary) non-iterable device state will be saved + * in qemu_savevm_state_complete_precopy_non_iterable(). + */ + QTAILQ_FOREACH(se, &savevm_state.handlers, entry) { + if (save_state_priority(se) < MIG_PRI_POST_SETUP) { + continue; + } + + ret = qemu_savevm_state_precopy_one_non_iterable(se, f, vmdesc); + if (ret) { + return ret; + } + } + + return 0; +} + static int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy) { @@ -1364,41 +1428,24 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f, bool in_postcopy, bool inactivate_disks) { - g_autoptr(JSONWriter) vmdesc = NULL; + MigrationState *ms = migrate_get_current(); + JSONWriter *vmdesc = ms->vmdesc; int vmdesc_len; SaveStateEntry *se; int ret; - vmdesc = json_writer_new(false); - json_writer_start_object(vmdesc, NULL); - json_writer_int64(vmdesc, "page_size", qemu_target_page_size()); - json_writer_start_array(vmdesc, "devices"); - QTAILQ_FOREACH(se, &savevm_state.handlers, entry) { + /* qemu_savevm_state_start_precopy() is expected to be called first. 
*/ + assert(vmdesc); - if ((!se->ops || !se->ops->save_state) && !se->vmsd) { - continue; - } - if (se->vmsd && !vmstate_save_needed(se->vmsd, se->opaque)) { - trace_savevm_section_skip(se->idstr, se->section_id); + QTAILQ_FOREACH(se, &savevm_state.handlers, entry) { + if (save_state_priority(se) >= MIG_PRI_POST_SETUP) { continue; } - trace_savevm_section_start(se->idstr, se->section_id); - - json_writer_start_object(vmdesc, NULL); - json_writer_str(vmdesc, "name", se->idstr); - json_writer_int64(vmdesc, "instance_id", se->instance_id); - - save_section_header(f, se, QEMU_VM_SECTION_FULL); - ret = vmstate_save(f, se, vmdesc); + ret = qemu_savevm_state_precopy_one_non_iterable(se, f, vmdesc); if (ret) { - qemu_file_set_error(f, ret); return ret; } - trace_savevm_section_end(se->idstr, se->section_id, 0); - save_section_footer(f, se); - - json_writer_end_object(vmdesc); } if (inactivate_disks) { @@ -1427,6 +1474,10 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f, qemu_put_buffer(f, (uint8_t *)json_writer_get(vmdesc), vmdesc_len); } + /* Free it now to detect any inconsistencies. */ + g_free(vmdesc); + ms->vmdesc = NULL; + return 0; } @@ -1541,6 +1592,9 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp) qemu_savevm_state_setup(f); qemu_mutex_lock_iothread(); + /* Process early data that has to get migrated before iterating. */ + qemu_savevm_state_start_precopy(f); + while (qemu_file_get_error(f) == 0) { if (qemu_savevm_state_iterate(f, false) > 0) { break; diff --git a/migration/savevm.h b/migration/savevm.h index 6461342cb4..323bd5ab3b 100644 --- a/migration/savevm.h +++ b/migration/savevm.h @@ -38,6 +38,7 @@ void qemu_savevm_state_header(QEMUFile *f); int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy); void qemu_savevm_state_cleanup(void); void qemu_savevm_state_complete_postcopy(QEMUFile *f); +int qemu_savevm_state_start_precopy(QEMUFile *f); int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only, bool inactivate_disks); void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size, From patchwork Thu Dec 22 11:02:11 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13079583 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id F07BEC4332F for ; Thu, 22 Dec 2022 11:07:06 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1p8JLL-0007n1-Mr; Thu, 22 Dec 2022 06:02:31 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1p8JLH-0007jR-AK for qemu-devel@nongnu.org; Thu, 22 Dec 2022 06:02:27 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1p8JLF-0000cl-Hs for qemu-devel@nongnu.org; Thu, 22 Dec 2022 06:02:27 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1671706944; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: 
content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=rz1pjqTRn/bl1PU7ngq+kDvOE5ivhQUVs2tUoGLPig0=; b=Hq34FqgcRIU80/pbwDcDRJMZS8O0Av7HOto6V5JExU49iK/+kdfEPkvk99LPvlvP4uPQCD e4yQv25sWAU8buuy2DSG7M705k96TayYyqUJ7X6pXbvhijZpRkn7NW7zIgF9NGiSSxAyih s0kRLJ+/ZV1Cg/FYr7EVem5bwfDQFtk= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-605-J3dvFjAsPU6sDxArBXK7kA-1; Thu, 22 Dec 2022 06:02:23 -0500 X-MC-Unique: J3dvFjAsPU6sDxArBXK7kA-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 3647B18483B4 for ; Thu, 22 Dec 2022 11:02:23 +0000 (UTC) Received: from t480s.fritz.box (unknown [10.39.193.53]) by smtp.corp.redhat.com (Postfix) with ESMTP id 876E0140E90F; Thu, 22 Dec 2022 11:02:21 +0000 (UTC) From: David Hildenbrand To: qemu-devel@nongnu.org Cc: David Hildenbrand , "Dr . David Alan Gilbert" , Juan Quintela , Peter Xu , "Michael S . Tsirkin" , Michal Privoznik Subject: [PATCH v3 2/6] migration/vmstate: Introduce VMSTATE_WITH_TMP_TEST() and VMSTATE_BITMAP_TEST() Date: Thu, 22 Dec 2022 12:02:11 +0100 Message-Id: <20221222110215.130392-3-david@redhat.com> In-Reply-To: <20221222110215.130392-1-david@redhat.com> References: <20221222110215.130392-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.7 Received-SPF: pass client-ip=170.10.129.124; envelope-from=david@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org We'll make use of both next in the context of virtio-mem. Signed-off-by: David Hildenbrand Reviewed-by: Dr. David Alan Gilbert --- include/migration/vmstate.h | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h index 79eb2409a2..73ad1ae0eb 100644 --- a/include/migration/vmstate.h +++ b/include/migration/vmstate.h @@ -712,8 +712,9 @@ extern const VMStateInfo vmstate_info_qlist; * '_state' type * That the pointer is right at the start of _tmp_type. 
*/ -#define VMSTATE_WITH_TMP(_state, _tmp_type, _vmsd) { \ +#define VMSTATE_WITH_TMP_TEST(_state, _test, _tmp_type, _vmsd) { \ .name = "tmp", \ + .field_exists = (_test), \ .size = sizeof(_tmp_type) + \ QEMU_BUILD_BUG_ON_ZERO(offsetof(_tmp_type, parent) != 0) + \ type_check_pointer(_state, \ @@ -722,6 +723,9 @@ extern const VMStateInfo vmstate_info_qlist; .info = &vmstate_info_tmp, \ } +#define VMSTATE_WITH_TMP(_state, _tmp_type, _vmsd) \ + VMSTATE_WITH_TMP_TEST(_state, NULL, _tmp_type, _vmsd) + #define VMSTATE_UNUSED_BUFFER(_test, _version, _size) { \ .name = "unused", \ .field_exists = (_test), \ @@ -745,8 +749,9 @@ extern const VMStateInfo vmstate_info_qlist; /* _field_size should be a int32_t field in the _state struct giving the * size of the bitmap _field in bits. */ -#define VMSTATE_BITMAP(_field, _state, _version, _field_size) { \ +#define VMSTATE_BITMAP_TEST(_field, _state, _test, _version, _field_size) { \ .name = (stringify(_field)), \ + .field_exists = (_test), \ .version_id = (_version), \ .size_offset = vmstate_offset_value(_state, _field_size, int32_t),\ .info = &vmstate_info_bitmap, \ @@ -754,6 +759,9 @@ extern const VMStateInfo vmstate_info_qlist; .offset = offsetof(_state, _field), \ } +#define VMSTATE_BITMAP(_field, _state, _version, _field_size) \ + VMSTATE_BITMAP_TEST(_field, _state, NULL, _version, _field_size) + /* For migrating a QTAILQ. * Target QTAILQ needs be properly initialized. * _type: type of QTAILQ element

From patchwork Thu Dec 22 11:02:12 2022
From: David Hildenbrand
To: qemu-devel@nongnu.org
Cc: David Hildenbrand, "Dr. David Alan Gilbert", Juan Quintela, Peter Xu, "Michael S. Tsirkin", Michal Privoznik
Subject: [PATCH v3 3/6] migration: Factor out checks for advised and listening incoming postcopy
Date: Thu, 22 Dec 2022 12:02:12 +0100
Message-Id: <20221222110215.130392-4-david@redhat.com>
In-Reply-To: <20221222110215.130392-1-david@redhat.com>
References: <20221222110215.130392-1-david@redhat.com>

Let's factor out both checks, to be used in virtio-mem context next.
While at it, fix a spelling error in a related comment.
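To illustrate how these helpers might be consumed, here is a minimal,
hypothetical device callback: only migration_incoming_postcopy_advised() and
migration_incoming_postcopy_listening() come from this patch; the surrounding
hook is made up for the example and is not part of the series.

    #include "qemu/osdep.h"
    #include "migration/misc.h"

    /* Hypothetical post_load hook -- for illustration only. */
    static int my_device_post_load(void *opaque, int version_id)
    {
        if (migration_incoming_postcopy_listening()) {
            /* Incoming postcopy already armed userfaultfd; avoid touching RAM. */
            return -EBUSY;
        }
        if (migration_incoming_postcopy_advised()) {
            /* Postcopy will populate RAM on demand; skip eager work here. */
            return 0;
        }
        /* ... ordinary precopy handling ... */
        return 0;
    }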
Signed-off-by: David Hildenbrand --- include/migration/misc.h | 6 +++++- migration/migration.c | 14 ++++++++++++++ migration/ram.c | 16 ++-------------- 3 files changed, 21 insertions(+), 15 deletions(-) diff --git a/include/migration/misc.h b/include/migration/misc.h index 465906710d..c7e67a6804 100644 --- a/include/migration/misc.h +++ b/include/migration/misc.h @@ -67,8 +67,12 @@ bool migration_has_failed(MigrationState *); /* ...and after the device transmission */ bool migration_in_postcopy_after_devices(MigrationState *); void migration_global_dump(Monitor *mon); -/* True if incomming migration entered POSTCOPY_INCOMING_DISCARD */ +/* True if incoming migration entered POSTCOPY_INCOMING_DISCARD */ bool migration_in_incoming_postcopy(void); +/* True if incoming migration entered POSTCOPY_INCOMING_ADVISE */ +bool migration_incoming_postcopy_advised(void); +/* True if incoming migration entered POSTCOPY_INCOMING_LISTENING */ +bool migration_incoming_postcopy_listening(void); /* True if background snapshot is active */ bool migration_in_bg_snapshot(void); diff --git a/migration/migration.c b/migration/migration.c index 78b6bb8765..7a69bb93b0 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -2094,6 +2094,20 @@ bool migration_in_incoming_postcopy(void) return ps >= POSTCOPY_INCOMING_DISCARD && ps < POSTCOPY_INCOMING_END; } +bool migration_incoming_postcopy_advised(void) +{ + PostcopyState ps = postcopy_state_get(); + + return ps >= POSTCOPY_INCOMING_ADVISE && ps < POSTCOPY_INCOMING_END; +} + +bool migration_incoming_postcopy_listening(void) +{ + PostcopyState ps = postcopy_state_get(); + + return ps >= POSTCOPY_INCOMING_LISTENING && ps < POSTCOPY_INCOMING_END; +} + bool migration_in_bg_snapshot(void) { MigrationState *s = migrate_get_current(); diff --git a/migration/ram.c b/migration/ram.c index 334309f1c6..44b063eccd 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -4091,18 +4091,6 @@ int ram_load_postcopy(QEMUFile *f, int channel) return ret; } -static bool postcopy_is_advised(void) -{ - PostcopyState ps = postcopy_state_get(); - return ps >= POSTCOPY_INCOMING_ADVISE && ps < POSTCOPY_INCOMING_END; -} - -static bool postcopy_is_running(void) -{ - PostcopyState ps = postcopy_state_get(); - return ps >= POSTCOPY_INCOMING_LISTENING && ps < POSTCOPY_INCOMING_END; -} - /* * Flush content of RAM cache into SVM's memory. * Only flush the pages that be dirtied by PVM or SVM or both. 
@@ -4167,7 +4155,7 @@ static int ram_load_precopy(QEMUFile *f) MigrationIncomingState *mis = migration_incoming_get_current(); int flags = 0, ret = 0, invalid_flags = 0, len = 0, i = 0; /* ADVISE is earlier, it shows the source has the postcopy capability on */ - bool postcopy_advised = postcopy_is_advised(); + bool postcopy_advised = migration_incoming_postcopy_advised(); if (!migrate_use_compression()) { invalid_flags |= RAM_SAVE_FLAG_COMPRESS_PAGE; } @@ -4365,7 +4353,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) * If system is running in postcopy mode, page inserts to host memory must * be atomic */ - bool postcopy_running = postcopy_is_running(); + bool postcopy_running = migration_incoming_postcopy_listening(); seq_iter++; From patchwork Thu Dec 22 11:02:13 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13079582 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 32072C001B2 for ; Thu, 22 Dec 2022 11:06:42 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1p8JLM-0007or-My; Thu, 22 Dec 2022 06:02:32 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1p8JLK-0007lu-Ii for qemu-devel@nongnu.org; Thu, 22 Dec 2022 06:02:30 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1p8JLI-0000dx-M8 for qemu-devel@nongnu.org; Thu, 22 Dec 2022 06:02:29 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1671706948; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=xvQizLOJBawnOU2rcD7C8fqld3+IuUMt32YPNXyAFrA=; b=FBmawrugMrsEHweO4zgYO+p5lZfopynYO3S13c5sY0HFxUc65Sph7OJDVU7hW7esKc+IiA I/GZzPMYQrSo6zjuB735gyoMpzXPoiGS+32TcOCW2vBUcU6U6ah210uC5MDpPAymovi8G2 rBJZEB7LIbOE0nnjUKJeLYIZOKTK+zw= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-552-n-yg1uBUNiSZis6ufD890w-1; Thu, 22 Dec 2022 06:02:26 -0500 X-MC-Unique: n-yg1uBUNiSZis6ufD890w-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 4F6F43806079 for ; Thu, 22 Dec 2022 11:02:26 +0000 (UTC) Received: from t480s.fritz.box (unknown [10.39.193.53]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0728214152F4; Thu, 22 Dec 2022 11:02:24 +0000 (UTC) From: David Hildenbrand To: qemu-devel@nongnu.org Cc: David Hildenbrand , "Dr . David Alan Gilbert" , Juan Quintela , Peter Xu , "Michael S . 
Tsirkin" , Michal Privoznik Subject: [PATCH v3 4/6] virtio-mem: Fail if a memory backend with "prealloc=on" is specified Date: Thu, 22 Dec 2022 12:02:13 +0100 Message-Id: <20221222110215.130392-5-david@redhat.com> In-Reply-To: <20221222110215.130392-1-david@redhat.com> References: <20221222110215.130392-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.7 Received-SPF: pass client-ip=170.10.133.124; envelope-from=david@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org "prealloc=on" for the memory backend does not work as expected, as virtio-mem will simply discard all preallocated memory immediately again. In the best case, it's an expensive NOP. In the worst case, it's an unexpected allocation error. Instead, "prealloc=on" should be specified for the virtio-mem device only, such that virtio-mem will try preallocating memory before plugging memory dynamically to the guest. Fail if such a memory backend is provided. Tested-by: Michal Privoznik Signed-off-by: David Hildenbrand --- hw/virtio/virtio-mem.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c index 5c22c4b876..e0e908d392 100644 --- a/hw/virtio/virtio-mem.c +++ b/hw/virtio/virtio-mem.c @@ -772,6 +772,12 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp) error_setg(errp, "'%s' property specifies an unsupported memdev", VIRTIO_MEM_MEMDEV_PROP); return; + } else if (vmem->memdev->prealloc) { + error_setg(errp, "'%s' property specifies a memdev with preallocation" + " enabled: %s. Instead, specify 'prealloc=on' for the" + " virtio-mem device. 
", VIRTIO_MEM_MEMDEV_PROP, + object_get_canonical_path_component(OBJECT(vmem->memdev))); + return; } if ((nb_numa_nodes && vmem->node >= nb_numa_nodes) || From patchwork Thu Dec 22 11:02:14 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 13079581 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 6A0DDC4332F for ; Thu, 22 Dec 2022 11:05:16 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1p8JLO-0007qe-SI; Thu, 22 Dec 2022 06:02:34 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1p8JLM-0007oN-Mg for qemu-devel@nongnu.org; Thu, 22 Dec 2022 06:02:32 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1p8JLK-0000eA-IX for qemu-devel@nongnu.org; Thu, 22 Dec 2022 06:02:32 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1671706950; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=V1Y9nichDIuJNQpnjB4iEl7ytlahVz9PoK+yJM+RgZ8=; b=AQ3UfPFhRjHk4l7Zbm8fpQVsd+4NGwvdj7i1AwZpAaVR5dycUdZBf2/hWUJOiumBrVeMi4 1M/piU6NJ2TN6vL0ea9D7NrZ8YCBbcFOkR/qcIp5MQ1n3r5vlrLqUC2o4qKWmZ+KsDCDNV mYaBPuDghd9xwlOCc0EiLXzmobbJMfc= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-539-jnlqCxEnN96msoDVpXUTBw-1; Thu, 22 Dec 2022 06:02:28 -0500 X-MC-Unique: jnlqCxEnN96msoDVpXUTBw-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 1CABF2A59555 for ; Thu, 22 Dec 2022 11:02:28 +0000 (UTC) Received: from t480s.fritz.box (unknown [10.39.193.53]) by smtp.corp.redhat.com (Postfix) with ESMTP id A4E8714152F4; Thu, 22 Dec 2022 11:02:26 +0000 (UTC) From: David Hildenbrand To: qemu-devel@nongnu.org Cc: David Hildenbrand , "Dr . David Alan Gilbert" , Juan Quintela , Peter Xu , "Michael S . 
Tsirkin" , Michal Privoznik Subject: [PATCH v3 5/6] virtio-mem: Migrate bitmap, size and sanity checks early Date: Thu, 22 Dec 2022 12:02:14 +0100 Message-Id: <20221222110215.130392-6-david@redhat.com> In-Reply-To: <20221222110215.130392-1-david@redhat.com> References: <20221222110215.130392-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.7 Received-SPF: pass client-ip=170.10.129.124; envelope-from=david@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org The bitmap and the size are immutable while migration is active: see virtio_mem_is_busy(). We can migrate this information early, before migrating any actual RAM content. Having this information in place early will, for example, allow for properly preallocating memory before touching these memory locations during RAM migration: this way, we can make sure that all memory was actually preallocated and that any user errors (e.g., insufficient hugetlb pages) can be handled gracefully. In contrast, usable_region_size and requested_size can theoretically still be modified on the source while the VM is running. Keep migrating these properties the usual, late, way. Use a new device property to keep behavior of compat machines unmodified. Signed-off-by: David Hildenbrand --- hw/core/machine.c | 4 ++- hw/virtio/virtio-mem.c | 51 ++++++++++++++++++++++++++++++++-- include/hw/virtio/virtio-mem.h | 8 ++++++ 3 files changed, 60 insertions(+), 3 deletions(-) diff --git a/hw/core/machine.c b/hw/core/machine.c index f589b92909..6532190412 100644 --- a/hw/core/machine.c +++ b/hw/core/machine.c @@ -40,7 +40,9 @@ #include "hw/virtio/virtio-pci.h" #include "qom/object_interfaces.h" -GlobalProperty hw_compat_7_2[] = {}; +GlobalProperty hw_compat_7_2[] = { + { "virtio-mem", "x-early-migration", "false" }, +}; const size_t hw_compat_7_2_len = G_N_ELEMENTS(hw_compat_7_2); GlobalProperty hw_compat_7_1[] = { diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c index e0e908d392..043b96f509 100644 --- a/hw/virtio/virtio-mem.c +++ b/hw/virtio/virtio-mem.c @@ -31,6 +31,8 @@ #include CONFIG_DEVICES #include "trace.h" +static const VMStateDescription vmstate_virtio_mem_device_early; + /* * We only had legacy x86 guests that did not support * VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE. Other targets don't have legacy guests. 
@@ -878,6 +880,10 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp) host_memory_backend_set_mapped(vmem->memdev, true); vmstate_register_ram(&vmem->memdev->mr, DEVICE(vmem)); + if (vmem->early_migration) { + vmstate_register(VMSTATE_IF(vmem), VMSTATE_INSTANCE_ID_ANY, + &vmstate_virtio_mem_device_early, vmem); + } qemu_register_reset(virtio_mem_system_reset, vmem); /* @@ -899,6 +905,10 @@ static void virtio_mem_device_unrealize(DeviceState *dev) */ memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL); qemu_unregister_reset(virtio_mem_system_reset, vmem); + if (vmem->early_migration) { + vmstate_unregister(VMSTATE_IF(vmem), &vmstate_virtio_mem_device_early, + vmem); + } vmstate_unregister_ram(&vmem->memdev->mr, DEVICE(vmem)); host_memory_backend_set_mapped(vmem->memdev, false); virtio_del_queue(vdev, 0); @@ -1015,18 +1025,53 @@ static const VMStateDescription vmstate_virtio_mem_sanity_checks = { }, }; +static bool virtio_mem_vmstate_field_exists(void *opaque, int version_id) +{ + const VirtIOMEM *vmem = VIRTIO_MEM(opaque); + + /* With early migration, these fields were already migrated. */ + return !vmem->early_migration; +} + static const VMStateDescription vmstate_virtio_mem_device = { .name = "virtio-mem-device", .minimum_version_id = 1, .version_id = 1, .priority = MIG_PRI_VIRTIO_MEM, .post_load = virtio_mem_post_load, + .fields = (VMStateField[]) { + VMSTATE_WITH_TMP_TEST(VirtIOMEM, virtio_mem_vmstate_field_exists, + VirtIOMEMMigSanityChecks, + vmstate_virtio_mem_sanity_checks), + VMSTATE_UINT64(usable_region_size, VirtIOMEM), + VMSTATE_UINT64_TEST(size, VirtIOMEM, virtio_mem_vmstate_field_exists), + VMSTATE_UINT64(requested_size, VirtIOMEM), + VMSTATE_BITMAP_TEST(bitmap, VirtIOMEM, virtio_mem_vmstate_field_exists, + 0, bitmap_size), + VMSTATE_END_OF_LIST() + }, +}; + +/* + * Transfer properties that are immutable while migration is active early, + * such that we have have this information around before migrating any RAM + * content. + * + * Note that virtio_mem_is_busy() makes sure these properties can no longer + * change on the migration source until migration completed. + * + * With QEMU compat machines, we transmit these properties later, via + * vmstate_virtio_mem_device instead -- see virtio_mem_vmstate_field_exists(). 
+ */ +static const VMStateDescription vmstate_virtio_mem_device_early = { + .name = "virtio-mem-device-early", + .minimum_version_id = 1, + .version_id = 1, + .priority = MIG_PRI_POST_SETUP, .fields = (VMStateField[]) { VMSTATE_WITH_TMP(VirtIOMEM, VirtIOMEMMigSanityChecks, vmstate_virtio_mem_sanity_checks), - VMSTATE_UINT64(usable_region_size, VirtIOMEM), VMSTATE_UINT64(size, VirtIOMEM), - VMSTATE_UINT64(requested_size, VirtIOMEM), VMSTATE_BITMAP(bitmap, VirtIOMEM, 0, bitmap_size), VMSTATE_END_OF_LIST() }, @@ -1211,6 +1256,8 @@ static Property virtio_mem_properties[] = { DEFINE_PROP_ON_OFF_AUTO(VIRTIO_MEM_UNPLUGGED_INACCESSIBLE_PROP, VirtIOMEM, unplugged_inaccessible, ON_OFF_AUTO_AUTO), #endif + DEFINE_PROP_BOOL(VIRTIO_MEM_EARLY_MIGRATION_PROP, VirtIOMEM, + early_migration, true), DEFINE_PROP_END_OF_LIST(), }; diff --git a/include/hw/virtio/virtio-mem.h b/include/hw/virtio/virtio-mem.h index 7745cfc1a3..f15e561785 100644 --- a/include/hw/virtio/virtio-mem.h +++ b/include/hw/virtio/virtio-mem.h @@ -31,6 +31,7 @@ OBJECT_DECLARE_TYPE(VirtIOMEM, VirtIOMEMClass, #define VIRTIO_MEM_BLOCK_SIZE_PROP "block-size" #define VIRTIO_MEM_ADDR_PROP "memaddr" #define VIRTIO_MEM_UNPLUGGED_INACCESSIBLE_PROP "unplugged-inaccessible" +#define VIRTIO_MEM_EARLY_MIGRATION_PROP "x-early-migration" #define VIRTIO_MEM_PREALLOC_PROP "prealloc" struct VirtIOMEM { @@ -74,6 +75,13 @@ struct VirtIOMEM { /* whether to prealloc memory when plugging new blocks */ bool prealloc; + /* + * Whether we migrate properties that are immutable while migration is + * active early, before state of other devices and especially, before + * migrating any RAM content. + */ + bool early_migration; + /* notifiers to notify when "size" changes */ NotifierList size_change_notifiers;

From patchwork Thu Dec 22 11:02:15 2022
From: David Hildenbrand
To: qemu-devel@nongnu.org
Cc: David Hildenbrand, "Dr. David Alan Gilbert", Juan Quintela, Peter Xu, "Michael S. Tsirkin", Michal Privoznik, Jing Qi
Subject: [PATCH v3 6/6] virtio-mem: Proper support for preallocation with migration
Date: Thu, 22 Dec 2022 12:02:15 +0100
Message-Id: <20221222110215.130392-7-david@redhat.com>
In-Reply-To: <20221222110215.130392-1-david@redhat.com>
References: <20221222110215.130392-1-david@redhat.com>

Ordinary memory preallocation runs when QEMU starts up and creates the
memory backends, before processing the incoming migration stream. With
virtio-mem, we don't know which memory blocks to preallocate before
migration has started. Now that we migrate the virtio-mem bitmap early,
we can safely preallocate memory for all plugged memory blocks before
migrating any RAM content.

This is especially relevant for the following cases:

(1) User errors

With hugetlb/files, if we don't have sufficient backend memory available
on the migration destination, we'll crash QEMU (SIGBUS) during RAM
migration when running out of backend memory. Preallocating memory
before actual RAM migration allows for failing gracefully and informing
the user about the setup problem.

(2) Excluded memory ranges during migration

For example, virtio-balloon free page hinting will exclude some pages
from getting migrated. In that case, we won't crash during RAM
migration, but later, when running the VM on the destination, which is
bad.

To fix this for new QEMU machines that migrate the bitmap early,
preallocate the memory early, before any RAM migration. Warn with old
QEMU machines.

Getting postcopy right is a bit tricky, but we essentially now implement
the same (problematic) preallocation logic as ordinary preallocation:
preallocate memory early and discard it again before precopy starts.
During ordinary preallocation, discarding of RAM happens when postcopy
is advised. As our state (the bitmap) is loaded after postcopy was
advised but before postcopy starts listening, we have to discard the
memory we preallocated immediately again ourselves.

Note that nothing (not even hugetlb reservations) guarantees for
postcopy that backend memory (especially, hugetlb pages) is still free
after it was freed once while discarding RAM. Still, allocating that
memory at least once helps catch some basic setup problems.
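As a rough, condensed sketch of the destination-side flow just described
(names are taken from the diff below; this is not a drop-in replacement for
the actual virtio_mem_post_load_early() in the patch):

    /* Preallocate plugged blocks early, then give postcopy a clean slate. */
    static int early_prealloc_then_discard(VirtIOMEM *vmem)
    {
        RAMBlock *rb = vmem->memdev->mr.ram_block;
        int ret;

        if (migration_incoming_postcopy_listening()) {
            /* Too late: userfaultfd is already armed. */
            return -EBUSY;
        }

        /* Preallocate all plugged blocks so allocation errors surface now. */
        ret = virtio_mem_for_each_plugged_range(vmem, NULL,
                                                virtio_mem_prealloc_range_cb);
        if (ret) {
            return ret;
        }

        /* Postcopy wants a clean slate: discard the just-touched memory again. */
        if (migration_incoming_postcopy_advised()) {
            ret = ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb));
        }
        return ret;
    }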
Before this change, trying to restore a VM when insufficient hugetlb
pages are around resulted in the process crashing due to a "Bus error"
(SIGBUS). With this change, QEMU fails gracefully:

  qemu-system-x86_64: qemu_prealloc_mem: preallocating memory failed: Bad address
  qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:03.0/virtio-mem-device-early'
  qemu-system-x86_64: load of migration failed: Cannot allocate memory

Reported-by: Jing Qi
Signed-off-by: David Hildenbrand
---
 hw/virtio/virtio-mem.c | 97 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 97 insertions(+) diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c index 043b96f509..c1cf448046 100644 --- a/hw/virtio/virtio-mem.c +++ b/hw/virtio/virtio-mem.c @@ -204,6 +204,30 @@ static int virtio_mem_for_each_unplugged_range(const VirtIOMEM *vmem, void *arg, return ret; } +static int virtio_mem_for_each_plugged_range(const VirtIOMEM *vmem, void *arg, + virtio_mem_range_cb cb) +{ + unsigned long first_bit, last_bit; + uint64_t offset, size; + int ret = 0; + + first_bit = find_first_bit(vmem->bitmap, vmem->bitmap_size); + while (first_bit < vmem->bitmap_size) { + offset = first_bit * vmem->block_size; + last_bit = find_next_zero_bit(vmem->bitmap, vmem->bitmap_size, + first_bit + 1) - 1; + size = (last_bit - first_bit + 1) * vmem->block_size; + + ret = cb(vmem, arg, offset, size); + if (ret) { + break; + } + first_bit = find_next_bit(vmem->bitmap, vmem->bitmap_size, + last_bit + 2); + } + return ret; +} + /* * Adjust the memory section to cover the intersection with the given range. * @@ -938,6 +962,10 @@ static int virtio_mem_post_load(void *opaque, int version_id) RamDiscardListener *rdl; int ret; + if (vmem->prealloc && !vmem->early_migration) { + warn_report("Proper preallocation with migration requires a newer QEMU machine"); + } + /* * We started out with all memory discarded and our memory region is mapped * into an address space. Replay, now that we updated the bitmap. */ @@ -957,6 +985,74 @@ static int virtio_mem_post_load(void *opaque, int version_id) return virtio_mem_restore_unplugged(vmem); } +static int virtio_mem_prealloc_range_cb(const VirtIOMEM *vmem, void *arg, + uint64_t offset, uint64_t size) +{ + void *area = memory_region_get_ram_ptr(&vmem->memdev->mr) + offset; + int fd = memory_region_get_fd(&vmem->memdev->mr); + Error *local_err = NULL; + + qemu_prealloc_mem(fd, area, size, 1, NULL, &local_err); + if (local_err) { + error_report_err(local_err); + return -ENOMEM; + } + return 0; +} + +static int virtio_mem_post_load_early(void *opaque, int version_id) +{ + VirtIOMEM *vmem = VIRTIO_MEM(opaque); + RAMBlock *rb = vmem->memdev->mr.ram_block; + int ret; + + if (!vmem->prealloc) { + return 0; + } + + if (migration_incoming_postcopy_listening()) { + /* + * This is unexpected, we're not supposed to be loaded after + * postcopy is listening because ram_block_enable_notify() already + * armed userfaultfd. Let's play safe and catch it.
+ */ + warn_report("Postcopy is already listening, preallocation is impossible."); + return -EBUSY; + } + + /* + * We restored the bitmap and verified that the basic properties + * match on source and destination, so we can go ahead and preallocate + * memory for all plugged memory blocks, before actual RAM migration starts + * touching this memory. + */ + ret = virtio_mem_for_each_plugged_range(vmem, NULL, + virtio_mem_prealloc_range_cb); + if (ret) { + return ret; + } + + /* + * This is tricky: postcopy wants to start with a clean slate. On + * POSTCOPY_INCOMING_ADVISE, postcopy code discards all (ordinarily + * preallocated) RAM such that postcopy will work as expected later. + * + * However, we run after POSTCOPY_INCOMING_ADVISE -- but before actual + * RAM migration. So let's discard all memory again. This looks like an + * expensive NOP, but actually serves a purpose: we made sure that we + * were able to allocate all required backend memory once. We cannot + * guarantee that the backend memory we will free will remain free + * until we need it during postcopy, but at least we can catch the + * obvious setup issues this way. + */ + if (migration_incoming_postcopy_advised()) { + if (ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb))) { + return -EBUSY; + } + } + return 0; +} + typedef struct VirtIOMEMMigSanityChecks { VirtIOMEM *parent; uint64_t addr; @@ -1068,6 +1164,7 @@ static const VMStateDescription vmstate_virtio_mem_device_early = { .minimum_version_id = 1, .version_id = 1, .priority = MIG_PRI_POST_SETUP, + .post_load = virtio_mem_post_load_early, .fields = (VMStateField[]) { VMSTATE_WITH_TMP(VirtIOMEM, VirtIOMEMMigSanityChecks, vmstate_virtio_mem_sanity_checks),