From patchwork Mon Feb 24 06:54:07 2020
X-Patchwork-Submitter: Zhanghailiang
X-Patchwork-Id: 11399425
From: zhanghailiang
Subject: [PATCH V2 1/8] migration: fix COLO broken caused by a previous commit
Date: Mon, 24 Feb 2020 14:54:07 +0800
Message-ID: <20200224065414.36524-2-zhang.zhanghailiang@huawei.com>
In-Reply-To: <20200224065414.36524-1-zhang.zhanghailiang@huawei.com>

The commit "migration: Create migration_is_running()" broke COLO, because it breaks this call chain:

colo_process_checkpoint
 ->colo_do_checkpoint_transaction
   ->migrate_set_block_enabled
     ->qmp_migrate_set_capabilities

It can be fixed by treating the COLO process as an exception; maybe we need a better way to fix it.
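For reference, the failure mode can be reproduced with the small standalone model below. This is illustrative C only, not QEMU code: the helper set_block_capability() and the count_colo flag are invented for the sketch, which only mirrors the shape of migration_is_running() and of the "refuse capability changes while migration is running" check that the COLO checkpoint path trips over.

/* Standalone model of the regression -- illustrative only, not QEMU code.
 * Build with: cc -o colo-cap-demo colo-cap-demo.c */
#include <stdbool.h>
#include <stdio.h>

typedef enum {
    MIGRATION_STATUS_ACTIVE,
    MIGRATION_STATUS_CANCELLING,
    MIGRATION_STATUS_COLO,
    MIGRATION_STATUS_COMPLETED,
} MigrationStatus;

/* Mirrors the shape of migration_is_running(); whether COLO is listed
 * is exactly what this patch changes. */
static bool migration_is_running(MigrationStatus state, bool count_colo)
{
    switch (state) {
    case MIGRATION_STATUS_ACTIVE:
    case MIGRATION_STATUS_CANCELLING:
        return true;
    case MIGRATION_STATUS_COLO:
        return count_colo;
    default:
        return false;
    }
}

/* Stand-in for the capability-setting QMP handler, which refuses changes
 * while a migration is considered to be running. */
static bool set_block_capability(MigrationStatus state, bool count_colo)
{
    if (migration_is_running(state, count_colo)) {
        fprintf(stderr, "capability change refused: migration active\n");
        return false;
    }
    return true;
}

int main(void)
{
    /* COLO's checkpoint path toggles the 'block' capability while the
     * state is MIGRATION_STATUS_COLO. */
    printf("COLO listed in migration_is_running():    %s\n",
           set_block_capability(MIGRATION_STATUS_COLO, true) ? "ok" : "broken");
    printf("COLO excluded from migration_is_running(): %s\n",
           set_block_capability(MIGRATION_STATUS_COLO, false) ? "ok" : "broken");
    return 0;
}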
Cc: Juan Quintela
Signed-off-by: zhanghailiang
Reviewed-by: Juan Quintela
---
 migration/migration.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 8fb68795dc..06d1ff9d56 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -865,7 +865,6 @@ bool migration_is_running(int state)
     case MIGRATION_STATUS_DEVICE:
     case MIGRATION_STATUS_WAIT_UNPLUG:
     case MIGRATION_STATUS_CANCELLING:
-    case MIGRATION_STATUS_COLO:
         return true;
     default:

From patchwork Mon Feb 24 06:54:08 2020
X-Patchwork-Submitter: Zhanghailiang
X-Patchwork-Id: 11399437
From: zhanghailiang
Subject: [PATCH V2 2/8] migration/colo: wrap incoming checkpoint process into new helper
Date: Mon, 24 Feb 2020 14:54:08 +0800
Message-ID: <20200224065414.36524-3-zhang.zhanghailiang@huawei.com>
In-Reply-To: <20200224065414.36524-1-zhang.zhanghailiang@huawei.com>
Sender:
"Qemu-devel" Split checkpoint incoming process into a helper. Signed-off-by: zhanghailiang Reviewed-by: Dr. David Alan Gilbert --- migration/colo.c | 260 ++++++++++++++++++++++++----------------------- 1 file changed, 133 insertions(+), 127 deletions(-) diff --git a/migration/colo.c b/migration/colo.c index 2c88aa57a2..93c5a452fb 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -664,13 +664,138 @@ void migrate_start_colo_process(MigrationState *s) qemu_mutex_lock_iothread(); } -static void colo_wait_handle_message(QEMUFile *f, int *checkpoint_request, - Error **errp) +static void colo_incoming_process_checkpoint(MigrationIncomingState *mis, + QEMUFile *fb, QIOChannelBuffer *bioc, Error **errp) +{ + uint64_t total_size; + uint64_t value; + Error *local_err = NULL; + int ret; + + qemu_mutex_lock_iothread(); + vm_stop_force_state(RUN_STATE_COLO); + trace_colo_vm_state_change("run", "stop"); + qemu_mutex_unlock_iothread(); + + /* FIXME: This is unnecessary for periodic checkpoint mode */ + colo_send_message(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_REPLY, + &local_err); + if (local_err) { + error_propagate(errp, local_err); + return; + } + + colo_receive_check_message(mis->from_src_file, + COLO_MESSAGE_VMSTATE_SEND, &local_err); + if (local_err) { + error_propagate(errp, local_err); + return; + } + + qemu_mutex_lock_iothread(); + cpu_synchronize_all_pre_loadvm(); + ret = qemu_loadvm_state_main(mis->from_src_file, mis); + qemu_mutex_unlock_iothread(); + + if (ret < 0) { + error_setg(errp, "Load VM's live state (ram) error"); + return; + } + + value = colo_receive_message_value(mis->from_src_file, + COLO_MESSAGE_VMSTATE_SIZE, &local_err); + if (local_err) { + error_propagate(errp, local_err); + return; + } + + /* + * Read VM device state data into channel buffer, + * It's better to re-use the memory allocated. + * Here we need to handle the channel buffer directly. 
+ */ + if (value > bioc->capacity) { + bioc->capacity = value; + bioc->data = g_realloc(bioc->data, bioc->capacity); + } + total_size = qemu_get_buffer(mis->from_src_file, bioc->data, value); + if (total_size != value) { + error_setg(errp, "Got %" PRIu64 " VMState data, less than expected" + " %" PRIu64, total_size, value); + return; + } + bioc->usage = total_size; + qio_channel_io_seek(QIO_CHANNEL(bioc), 0, 0, NULL); + + colo_send_message(mis->to_src_file, COLO_MESSAGE_VMSTATE_RECEIVED, + &local_err); + if (local_err) { + error_propagate(errp, local_err); + return; + } + + qemu_mutex_lock_iothread(); + vmstate_loading = true; + ret = qemu_load_device_state(fb); + if (ret < 0) { + error_setg(errp, "COLO: load device state failed"); + qemu_mutex_unlock_iothread(); + return; + } + +#ifdef CONFIG_REPLICATION + replication_get_error_all(&local_err); + if (local_err) { + error_propagate(errp, local_err); + qemu_mutex_unlock_iothread(); + return; + } + + /* discard colo disk buffer */ + replication_do_checkpoint_all(&local_err); + if (local_err) { + error_propagate(errp, local_err); + qemu_mutex_unlock_iothread(); + return; + } +#else + abort(); +#endif + /* Notify all filters of all NIC to do checkpoint */ + colo_notify_filters_event(COLO_EVENT_CHECKPOINT, &local_err); + + if (local_err) { + error_propagate(errp, local_err); + qemu_mutex_unlock_iothread(); + return; + } + + vmstate_loading = false; + vm_start(); + trace_colo_vm_state_change("stop", "run"); + qemu_mutex_unlock_iothread(); + + if (failover_get_state() == FAILOVER_STATUS_RELAUNCH) { + failover_set_state(FAILOVER_STATUS_RELAUNCH, + FAILOVER_STATUS_NONE); + failover_request_active(NULL); + return; + } + + colo_send_message(mis->to_src_file, COLO_MESSAGE_VMSTATE_LOADED, + &local_err); + if (local_err) { + error_propagate(errp, local_err); + } +} + +static void colo_wait_handle_message(MigrationIncomingState *mis, + QEMUFile *fb, QIOChannelBuffer *bioc, Error **errp) { COLOMessage msg; Error *local_err = NULL; - msg = colo_receive_message(f, &local_err); + msg = colo_receive_message(mis->from_src_file, &local_err); if (local_err) { error_propagate(errp, local_err); return; @@ -678,10 +803,9 @@ static void colo_wait_handle_message(QEMUFile *f, int *checkpoint_request, switch (msg) { case COLO_MESSAGE_CHECKPOINT_REQUEST: - *checkpoint_request = 1; + colo_incoming_process_checkpoint(mis, fb, bioc, errp); break; default: - *checkpoint_request = 0; error_setg(errp, "Got unknown COLO message: %d", msg); break; } @@ -692,10 +816,7 @@ void *colo_process_incoming_thread(void *opaque) MigrationIncomingState *mis = opaque; QEMUFile *fb = NULL; QIOChannelBuffer *bioc = NULL; /* Cache incoming device state */ - uint64_t total_size; - uint64_t value; Error *local_err = NULL; - int ret; rcu_register_thread(); qemu_sem_init(&mis->colo_incoming_sem, 0); @@ -749,134 +870,19 @@ void *colo_process_incoming_thread(void *opaque) } while (mis->state == MIGRATION_STATUS_COLO) { - int request = 0; - - colo_wait_handle_message(mis->from_src_file, &request, &local_err); + colo_wait_handle_message(mis, fb, bioc, &local_err); if (local_err) { - goto out; + error_report_err(local_err); + break; } - assert(request); if (failover_get_state() != FAILOVER_STATUS_NONE) { error_report("failover request"); - goto out; - } - - qemu_mutex_lock_iothread(); - vm_stop_force_state(RUN_STATE_COLO); - trace_colo_vm_state_change("run", "stop"); - qemu_mutex_unlock_iothread(); - - /* FIXME: This is unnecessary for periodic checkpoint mode */ - colo_send_message(mis->to_src_file, 
COLO_MESSAGE_CHECKPOINT_REPLY, - &local_err); - if (local_err) { - goto out; - } - - colo_receive_check_message(mis->from_src_file, - COLO_MESSAGE_VMSTATE_SEND, &local_err); - if (local_err) { - goto out; - } - - qemu_mutex_lock_iothread(); - cpu_synchronize_all_pre_loadvm(); - ret = qemu_loadvm_state_main(mis->from_src_file, mis); - qemu_mutex_unlock_iothread(); - - if (ret < 0) { - error_report("Load VM's live state (ram) error"); - goto out; - } - - value = colo_receive_message_value(mis->from_src_file, - COLO_MESSAGE_VMSTATE_SIZE, &local_err); - if (local_err) { - goto out; - } - - /* - * Read VM device state data into channel buffer, - * It's better to re-use the memory allocated. - * Here we need to handle the channel buffer directly. - */ - if (value > bioc->capacity) { - bioc->capacity = value; - bioc->data = g_realloc(bioc->data, bioc->capacity); - } - total_size = qemu_get_buffer(mis->from_src_file, bioc->data, value); - if (total_size != value) { - error_report("Got %" PRIu64 " VMState data, less than expected" - " %" PRIu64, total_size, value); - goto out; - } - bioc->usage = total_size; - qio_channel_io_seek(QIO_CHANNEL(bioc), 0, 0, NULL); - - colo_send_message(mis->to_src_file, COLO_MESSAGE_VMSTATE_RECEIVED, - &local_err); - if (local_err) { - goto out; - } - - qemu_mutex_lock_iothread(); - vmstate_loading = true; - ret = qemu_load_device_state(fb); - if (ret < 0) { - error_report("COLO: load device state failed"); - qemu_mutex_unlock_iothread(); - goto out; - } - -#ifdef CONFIG_REPLICATION - replication_get_error_all(&local_err); - if (local_err) { - qemu_mutex_unlock_iothread(); - goto out; - } - - /* discard colo disk buffer */ - replication_do_checkpoint_all(&local_err); - if (local_err) { - qemu_mutex_unlock_iothread(); - goto out; - } -#else - abort(); -#endif - /* Notify all filters of all NIC to do checkpoint */ - colo_notify_filters_event(COLO_EVENT_CHECKPOINT, &local_err); - - if (local_err) { - qemu_mutex_unlock_iothread(); - goto out; - } - - vmstate_loading = false; - vm_start(); - trace_colo_vm_state_change("stop", "run"); - qemu_mutex_unlock_iothread(); - - if (failover_get_state() == FAILOVER_STATUS_RELAUNCH) { - failover_set_state(FAILOVER_STATUS_RELAUNCH, - FAILOVER_STATUS_NONE); - failover_request_active(NULL); - goto out; - } - - colo_send_message(mis->to_src_file, COLO_MESSAGE_VMSTATE_LOADED, - &local_err); - if (local_err) { - goto out; + break; } } out: vmstate_loading = false; - /* Throw the unreported error message after exited from loop */ - if (local_err) { - error_report_err(local_err); - } /* * There are only two reasons we can get here, some error happened From patchwork Mon Feb 24 06:54:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zhanghailiang X-Patchwork-Id: 11399435 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B5C5792A for ; Mon, 24 Feb 2020 06:57:39 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 95C5520675 for ; Mon, 24 Feb 2020 06:57:39 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 95C5520675 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=huawei.com Authentication-Results: mail.kernel.org; spf=pass 
From: zhanghailiang
Subject: [PATCH V2 3/8] savevm: Don't call colo_init_ram_cache twice
Date: Mon, 24 Feb 2020 14:54:09 +0800
Message-ID: <20200224065414.36524-4-zhang.zhanghailiang@huawei.com>
In-Reply-To: <20200224065414.36524-1-zhang.zhanghailiang@huawei.com>

This helper was being called twice, which is wrong. Keep only the call that is made when the COLO-enable message is received from the source side.
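As a rough illustration of why the duplicate call matters, here is a standalone model (not QEMU code; the RAMBlockModel structure and init_ram_cache() name are simplified stand-ins): colo_init_ram_cache() sets up one cache buffer per RAM block, so a second call would redo the allocation and lose the first one.

/* Toy model -- illustrative only, not QEMU code. */
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    size_t used_length;
    unsigned char *host;        /* guest RAM (simplified) */
    unsigned char *colo_cache;  /* per-block cache used by COLO */
} RAMBlockModel;

/* Simplified stand-in for colo_init_ram_cache(): one allocation per block. */
static int init_ram_cache(RAMBlockModel *block)
{
    if (block->colo_cache) {
        /* A second call would leak the first buffer and repeat expensive
         * work; the real patch removes the duplicate caller instead of
         * guarding here. */
        fprintf(stderr, "colo_cache already initialised for this block\n");
        return -1;
    }
    block->colo_cache = calloc(1, block->used_length);
    return block->colo_cache ? 0 : -1;
}

int main(void)
{
    unsigned char ram[4096] = { 0 };
    RAMBlockModel block = { sizeof(ram), ram, NULL };

    printf("first call:  %d\n", init_ram_cache(&block));  /* 0 */
    printf("second call: %d\n", init_ram_cache(&block));  /* -1, duplicate */
    free(block.colo_cache);
    return 0;
}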
Signed-off-by: zhanghailiang
Reviewed-by: Juan Quintela
---
 migration/migration.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 06d1ff9d56..e8c62c6e2e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -484,11 +484,6 @@ static void process_incoming_migration_co(void *opaque)
         goto fail;
     }
-    if (colo_init_ram_cache() < 0) {
-        error_report("Init ram cache failed");
-        goto fail;
-    }
-
     qemu_thread_create(&mis->colo_incoming_thread, "COLO incoming",
         colo_process_incoming_thread, mis, QEMU_THREAD_JOINABLE);
     mis->have_colo_incoming_thread = true;

From patchwork Mon Feb 24 06:54:10 2020
X-Patchwork-Submitter: Zhanghailiang
X-Patchwork-Id: 11399439
From: zhanghailiang
Subject: [PATCH V2 4/8] COLO: Optimize memory back-up process
Date: Mon, 24 Feb 2020 14:54:10 +0800
Message-ID: <20200224065414.36524-5-zhang.zhanghailiang@huawei.com>
In-Reply-To: <20200224065414.36524-1-zhang.zhanghailiang@huawei.com>
This patch reduces the VM downtime of the initial COLO process. Previously, we copied all of this memory in the COLO preparation stage, while the VM had to be stopped, which is time-consuming. Here we optimize it with a trick: back up every page during the migration process while COLO is enabled. Although this affects the speed of the migration, it clearly reduces the downtime of backing up all of the SVM's memory in the COLO preparation stage.

Signed-off-by: zhanghailiang Reviewed-by: Dr. David Alan Gilbert --- migration/colo.c | 3 +++ migration/ram.c | 68 +++++++++++++++++++++++++++++++++++------------- migration/ram.h | 1 + 3 files changed, 54 insertions(+), 18 deletions(-) diff --git a/migration/colo.c b/migration/colo.c index 93c5a452fb..44942c4e23 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -26,6 +26,7 @@ #include "qemu/main-loop.h" #include "qemu/rcu.h" #include "migration/failover.h" +#include "migration/ram.h" #ifdef CONFIG_REPLICATION #include "replication.h" #endif @@ -845,6 +846,8 @@ void *colo_process_incoming_thread(void *opaque) */ qemu_file_set_blocking(mis->from_src_file, true); + colo_incoming_start_dirty_log(); + bioc = qio_channel_buffer_new(COLO_BUFFER_BASE_SIZE); fb = qemu_fopen_channel_input(QIO_CHANNEL(bioc)); object_unref(OBJECT(bioc)); diff --git a/migration/ram.c b/migration/ram.c index ed23ed1c7c..ebf9e6ba51 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -2277,6 +2277,7 @@ static void ram_list_init_bitmaps(void) * dirty_memory[DIRTY_MEMORY_MIGRATION] don't include the whole * guest memory. */ + block->bmap = bitmap_new(pages); bitmap_set(block->bmap, 0, pages); block->clear_bmap_shift = shift; @@ -2986,7 +2987,6 @@ int colo_init_ram_cache(void) } return -errno; } - memcpy(block->colo_cache, block->host, block->used_length); } } @@ -3000,19 +3000,36 @@ int colo_init_ram_cache(void) RAMBLOCK_FOREACH_NOT_IGNORED(block) { unsigned long pages = block->max_length >> TARGET_PAGE_BITS; - block->bmap = bitmap_new(pages); - bitmap_set(block->bmap, 0, pages); } } - ram_state = g_new0(RAMState, 1); - ram_state->migration_dirty_pages = 0; - qemu_mutex_init(&ram_state->bitmap_mutex); - memory_global_dirty_log_start(); + ram_state_init(&ram_state); return 0; } +/* TODO: duplicated with ram_init_bitmaps */ +void colo_incoming_start_dirty_log(void) +{ + RAMBlock *block = NULL; + /* For memory_global_dirty_log_start below.
*/ + qemu_mutex_lock_iothread(); + qemu_mutex_lock_ramlist(); + + memory_global_dirty_log_sync(); + WITH_RCU_READ_LOCK_GUARD() { + RAMBLOCK_FOREACH_NOT_IGNORED(block) { + ramblock_sync_dirty_bitmap(ram_state, block); + /* Discard this dirty bitmap record */ + bitmap_zero(block->bmap, block->max_length >> TARGET_PAGE_BITS); + } + memory_global_dirty_log_start(); + } + ram_state->migration_dirty_pages = 0; + qemu_mutex_unlock_ramlist(); + qemu_mutex_unlock_iothread(); +} + /* It is need to hold the global lock to call this helper */ void colo_release_ram_cache(void) { @@ -3032,9 +3049,7 @@ void colo_release_ram_cache(void) } } } - qemu_mutex_destroy(&ram_state->bitmap_mutex); - g_free(ram_state); - ram_state = NULL; + ram_state_cleanup(&ram_state); } /** @@ -3302,7 +3317,6 @@ static void colo_flush_ram_cache(void) ramblock_sync_dirty_bitmap(ram_state, block); } } - trace_colo_flush_ram_cache_begin(ram_state->migration_dirty_pages); WITH_RCU_READ_LOCK_GUARD() { block = QLIST_FIRST_RCU(&ram_list.blocks); @@ -3348,7 +3362,7 @@ static int ram_load_precopy(QEMUFile *f) while (!ret && !(flags & RAM_SAVE_FLAG_EOS)) { ram_addr_t addr, total_ram_bytes; - void *host = NULL; + void *host = NULL, *host_bak = NULL; uint8_t ch; /* @@ -3379,20 +3393,35 @@ static int ram_load_precopy(QEMUFile *f) RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) { RAMBlock *block = ram_block_from_stream(f, flags); + host = host_from_ram_block_offset(block, addr); /* - * After going into COLO, we should load the Page into colo_cache. + * After going into COLO stage, we should not load the page + * into SVM's memory diretly, we put them into colo_cache firstly. + * NOTE: We need to keep a copy of SVM's ram in colo_cache. + * Privously, we copied all these memory in preparing stage of COLO + * while we need to stop VM, which is a time-consuming process. + * Here we optimize it by a trick, back-up every page while in + * migration process while COLO is enabled, though it affects the + * speed of the migration, but it obviously reduce the downtime of + * back-up all SVM'S memory in COLO preparing stage. */ - if (migration_incoming_in_colo_state()) { - host = colo_cache_from_block_offset(block, addr); - } else { - host = host_from_ram_block_offset(block, addr); + if (migration_incoming_colo_enabled()) { + if (migration_incoming_in_colo_state()) { + /* In COLO stage, put all pages into cache temporarily */ + host = colo_cache_from_block_offset(block, addr); + } else { + /* + * In migration stage but before COLO stage, + * Put all pages into both cache and SVM's memory. 
+ */ + host_bak = colo_cache_from_block_offset(block, addr); + } } if (!host) { error_report("Illegal RAM offset " RAM_ADDR_FMT, addr); ret = -EINVAL; break; } - if (!migration_incoming_in_colo_state()) { ramblock_recv_bitmap_set(block, host); } @@ -3506,6 +3535,9 @@ static int ram_load_precopy(QEMUFile *f) if (!ret) { ret = qemu_file_get_error(f); } + if (!ret && host_bak) { + memcpy(host_bak, host, TARGET_PAGE_SIZE); + } } ret |= wait_for_decompress_done(); diff --git a/migration/ram.h b/migration/ram.h index a553d40751..5ceaff7cb4 100644 --- a/migration/ram.h +++ b/migration/ram.h @@ -66,5 +66,6 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb); /* ram cache */ int colo_init_ram_cache(void); void colo_release_ram_cache(void); +void colo_incoming_start_dirty_log(void); #endif From patchwork Mon Feb 24 06:54:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zhanghailiang X-Patchwork-Id: 11399431 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4247E92A for ; Mon, 24 Feb 2020 06:55:51 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 23A6220675 for ; Mon, 24 Feb 2020 06:55:51 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 23A6220675 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=huawei.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:60636 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1j67eg-00055l-7J for patchwork-qemu-devel@patchwork.kernel.org; Mon, 24 Feb 2020 01:55:50 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:48166) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1j67dm-0003GS-IG for qemu-devel@nongnu.org; Mon, 24 Feb 2020 01:54:55 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1j67dl-0006E6-LR for qemu-devel@nongnu.org; Mon, 24 Feb 2020 01:54:54 -0500 Received: from szxga07-in.huawei.com ([45.249.212.35]:40950 helo=huawei.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1j67dl-0006Cd-9F for qemu-devel@nongnu.org; Mon, 24 Feb 2020 01:54:53 -0500 Received: from DGGEMS414-HUB.china.huawei.com (unknown [172.30.72.58]) by Forcepoint Email with ESMTP id 207AEBF394EC29AE2C13; Mon, 24 Feb 2020 14:54:51 +0800 (CST) Received: from huawei.com (10.133.214.142) by DGGEMS414-HUB.china.huawei.com (10.3.19.214) with Microsoft SMTP Server id 14.3.439.0; Mon, 24 Feb 2020 14:54:40 +0800 From: zhanghailiang To: Subject: [PATCH V2 6/8] migration: recognize COLO as part of activating process Date: Mon, 24 Feb 2020 14:54:12 +0800 Message-ID: <20200224065414.36524-7-zhang.zhanghailiang@huawei.com> X-Mailer: git-send-email 2.21.0.windows.1 In-Reply-To: <20200224065414.36524-1-zhang.zhanghailiang@huawei.com> References: <20200224065414.36524-1-zhang.zhanghailiang@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.133.214.142] X-CFilter-Loop: Reflected X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 45.249.212.35 X-BeenThere: qemu-devel@nongnu.org 
We will live-migrate some of the dirty pages in the background during the gap between two checkpoints. Without this modification it will not work, because ram_save_iterate() checks this state before sending RAM_SAVE_FLAG_EOS at the end of it.

Signed-off-by: zhanghailiang
Reviewed-by: Dr. David Alan Gilbert
---
 migration/migration.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/migration/migration.c b/migration/migration.c
index e8c62c6e2e..f71c337600 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -840,6 +840,7 @@ bool migration_is_setup_or_active(int state)
     case MIGRATION_STATUS_PRE_SWITCHOVER:
     case MIGRATION_STATUS_DEVICE:
     case MIGRATION_STATUS_WAIT_UNPLUG:
+    case MIGRATION_STATUS_COLO:
         return true;
     default:

From patchwork Mon Feb 24 06:54:13 2020
X-Patchwork-Submitter: Zhanghailiang
X-Patchwork-Id: 11399443
From: zhanghailiang
Subject: [PATCH V2 7/8] COLO: Migrate dirty pages during the gap of checkpointing
Date: Mon, 24 Feb 2020 14:54:13 +0800
Message-ID: <20200224065414.36524-8-zhang.zhanghailiang@huawei.com>
In-Reply-To: <20200224065414.36524-1-zhang.zhanghailiang@huawei.com>
References:
<20200224065414.36524-1-zhang.zhanghailiang@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.133.214.142] X-CFilter-Loop: Reflected X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 45.249.212.35 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: danielcho@qnap.com, zhanghailiang , dgilbert@redhat.com, quintela@redhat.com Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" We can migrate some dirty pages during the gap of checkpointing, by this way, we can reduce the amount of ram migrated during checkpointing. Signed-off-by: zhanghailiang --- migration/colo.c | 73 ++++++++++++++++++++++++++++++++++++++++-- migration/migration.h | 1 + migration/trace-events | 1 + qapi/migration.json | 4 ++- 4 files changed, 75 insertions(+), 4 deletions(-) diff --git a/migration/colo.c b/migration/colo.c index 44942c4e23..c36d94072f 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -47,6 +47,13 @@ static COLOMode last_colo_mode; #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024) +#define DEFAULT_RAM_PENDING_CHECK 1000 + +/* should be calculated by bandwidth and max downtime ? */ +#define THRESHOLD_PENDING_SIZE (100 * 1024 * 1024UL) + +static int checkpoint_request; + bool migration_in_colo_state(void) { MigrationState *s = migrate_get_current(); @@ -517,6 +524,20 @@ static void colo_compare_notify_checkpoint(Notifier *notifier, void *data) colo_checkpoint_notify(data); } +static bool colo_need_migrate_ram_background(MigrationState *s) +{ + uint64_t pending_size, pend_pre, pend_compat, pend_post; + int64_t max_size = THRESHOLD_PENDING_SIZE; + + qemu_savevm_state_pending(s->to_dst_file, max_size, &pend_pre, + &pend_compat, &pend_post); + pending_size = pend_pre + pend_compat + pend_post; + + trace_colo_need_migrate_ram_background(pending_size); + return (pending_size >= max_size); +} + + static void colo_process_checkpoint(MigrationState *s) { QIOChannelBuffer *bioc; @@ -572,6 +593,8 @@ static void colo_process_checkpoint(MigrationState *s) timer_mod(s->colo_delay_timer, current_time + s->parameters.x_checkpoint_delay); + timer_mod(s->pending_ram_check_timer, + current_time + DEFAULT_RAM_PENDING_CHECK); while (s->state == MIGRATION_STATUS_COLO) { if (failover_get_state() != FAILOVER_STATUS_NONE) { @@ -584,9 +607,30 @@ static void colo_process_checkpoint(MigrationState *s) if (s->state != MIGRATION_STATUS_COLO) { goto out; } - ret = colo_do_checkpoint_transaction(s, bioc, fb); - if (ret < 0) { - goto out; + if (atomic_xchg(&checkpoint_request, 0)) { + /* start a colo checkpoint */ + ret = colo_do_checkpoint_transaction(s, bioc, fb); + if (ret < 0) { + goto out; + } + } else { + if (colo_need_migrate_ram_background(s)) { + colo_send_message(s->to_dst_file, + COLO_MESSAGE_MIGRATE_RAM_BACKGROUND, + &local_err); + if (local_err) { + goto out; + } + + qemu_savevm_state_iterate(s->to_dst_file, false); + qemu_put_byte(s->to_dst_file, QEMU_VM_EOF); + ret = qemu_file_get_error(s->to_dst_file); + if (ret < 0) { + error_setg_errno(&local_err, -ret, + "Failed to send dirty pages backgroud"); + goto out; + } + } } } @@ -627,6 +671,8 @@ out: colo_compare_unregister_notifier(&packets_compare_notifier); timer_del(s->colo_delay_timer); timer_free(s->colo_delay_timer); + timer_del(s->pending_ram_check_timer); + timer_free(s->pending_ram_check_timer); qemu_sem_destroy(&s->colo_checkpoint_sem); /* @@ -644,6 
+690,7 @@ void colo_checkpoint_notify(void *opaque) MigrationState *s = opaque; int64_t next_notify_time; + atomic_inc(&checkpoint_request); qemu_sem_post(&s->colo_checkpoint_sem); s->colo_checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST); next_notify_time = s->colo_checkpoint_time + @@ -651,6 +698,19 @@ void colo_checkpoint_notify(void *opaque) timer_mod(s->colo_delay_timer, next_notify_time); } +static void colo_pending_ram_check_notify(void *opaque) +{ + int64_t next_notify_time; + MigrationState *s = opaque; + + if (migration_in_colo_state()) { + next_notify_time = DEFAULT_RAM_PENDING_CHECK + + qemu_clock_get_ms(QEMU_CLOCK_HOST); + timer_mod(s->pending_ram_check_timer, next_notify_time); + qemu_sem_post(&s->colo_checkpoint_sem); + } +} + void migrate_start_colo_process(MigrationState *s) { qemu_mutex_unlock_iothread(); @@ -658,6 +718,8 @@ void migrate_start_colo_process(MigrationState *s) s->colo_delay_timer = timer_new_ms(QEMU_CLOCK_HOST, colo_checkpoint_notify, s); + s->pending_ram_check_timer = timer_new_ms(QEMU_CLOCK_HOST, + colo_pending_ram_check_notify, s); qemu_sem_init(&s->colo_exit_sem, 0); migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE, MIGRATION_STATUS_COLO); @@ -806,6 +868,11 @@ static void colo_wait_handle_message(MigrationIncomingState *mis, case COLO_MESSAGE_CHECKPOINT_REQUEST: colo_incoming_process_checkpoint(mis, fb, bioc, errp); break; + case COLO_MESSAGE_MIGRATE_RAM_BACKGROUND: + if (qemu_loadvm_state_main(mis->from_src_file, mis) < 0) { + error_setg(errp, "Load ram background failed"); + } + break; default: error_setg(errp, "Got unknown COLO message: %d", msg); break; diff --git a/migration/migration.h b/migration/migration.h index 8473ddfc88..5355259789 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -219,6 +219,7 @@ struct MigrationState QemuSemaphore colo_checkpoint_sem; int64_t colo_checkpoint_time; QEMUTimer *colo_delay_timer; + QEMUTimer *pending_ram_check_timer; /* The first error that has occurred. We used the mutex to be able to return the 1st error message */ diff --git a/migration/trace-events b/migration/trace-events index 4ab0a503d2..f2ed0c8645 100644 --- a/migration/trace-events +++ b/migration/trace-events @@ -295,6 +295,7 @@ migration_tls_incoming_handshake_complete(void) "" colo_vm_state_change(const char *old, const char *new) "Change '%s' => '%s'" colo_send_message(const char *msg) "Send '%s' message" colo_receive_message(const char *msg) "Receive '%s' message" +colo_need_migrate_ram_background(uint64_t pending_size) "Pending 0x%" PRIx64 " dirty ram" # colo-failover.c colo_failover_set_state(const char *new_state) "new state %s" diff --git a/qapi/migration.json b/qapi/migration.json index 52f3429969..73445f1978 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -977,12 +977,14 @@ # # @vmstate-loaded: VM's state has been loaded by SVM. 
# +# @migrate-ram-background: Send some dirty pages during the gap of COLO checkpoint +# # Since: 2.8 ## { 'enum': 'COLOMessage', 'data': [ 'checkpoint-ready', 'checkpoint-request', 'checkpoint-reply', 'vmstate-send', 'vmstate-size', 'vmstate-received', - 'vmstate-loaded' ] } + 'vmstate-loaded', 'migrate-ram-background' ] } ## # @COLOMode:

From patchwork Mon Feb 24 06:54:14 2020
X-Patchwork-Submitter: Zhanghailiang
X-Patchwork-Id: 11399441
From: zhanghailiang
Subject: [PATCH V2 8/8] migration/colo: Only flush ram cache while do checkpoint
Date: Mon, 24 Feb 2020 14:54:14 +0800
Message-ID: <20200224065414.36524-9-zhang.zhanghailiang@huawei.com>
In-Reply-To: <20200224065414.36524-1-zhang.zhanghailiang@huawei.com>

After adding background RAM migration, ram_load() is also called for that process, but we should not flush the RAM cache during it.
Move the flush action to the right place. Signed-off-by: zhanghailiang Reviewed-by: Dr. David Alan Gilbert --- migration/colo.c | 1 + migration/ram.c | 5 +---- migration/ram.h | 1 + 3 files changed, 3 insertions(+), 4 deletions(-) diff --git a/migration/colo.c b/migration/colo.c index c36d94072f..18df8289f8 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -799,6 +799,7 @@ static void colo_incoming_process_checkpoint(MigrationIncomingState *mis, qemu_mutex_lock_iothread(); vmstate_loading = true; + colo_flush_ram_cache(); ret = qemu_load_device_state(fb); if (ret < 0) { error_setg(errp, "COLO: load device state failed"); diff --git a/migration/ram.c b/migration/ram.c index 1b3f423351..7bc841d14f 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -3305,7 +3305,7 @@ static bool postcopy_is_running(void) * Flush content of RAM cache into SVM's memory. * Only flush the pages that be dirtied by PVM or SVM or both. */ -static void colo_flush_ram_cache(void) +void colo_flush_ram_cache(void) { RAMBlock *block = NULL; void *dst_host; @@ -3576,9 +3576,6 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id) } trace_ram_load_complete(ret, seq_iter); - if (!ret && migration_incoming_in_colo_state()) { - colo_flush_ram_cache(); - } return ret; } diff --git a/migration/ram.h b/migration/ram.h index 5ceaff7cb4..ae14341482 100644 --- a/migration/ram.h +++ b/migration/ram.h @@ -67,5 +67,6 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb); int colo_init_ram_cache(void); void colo_release_ram_cache(void); void colo_incoming_start_dirty_log(void); +void colo_flush_ram_cache(void); #endif
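To make the new call site easier to follow, below is a small standalone model of the flush (illustrative only, not QEMU code; stage_page(), flush_ram_cache() and the per-page bookkeeping are simplified stand-ins for the real cache and dirty bitmap): pages arriving from the primary, whether from background RAM migration or from a checkpoint, are staged in the cache, and only a checkpoint copies them into the secondary VM's memory, instead of flushing at the end of every ram_load() pass.

/* Toy model of the COLO ram-cache flush -- illustrative only, not QEMU code. */
#include <stdio.h>
#include <string.h>

#define PAGES      8
#define PAGE_SIZE  16

static unsigned char svm_ram[PAGES][PAGE_SIZE];     /* secondary VM memory */
static unsigned char colo_cache[PAGES][PAGE_SIZE];  /* staged incoming pages */
static unsigned char dirty[PAGES];                  /* per-page dirty flags */

/* Background RAM migration and checkpoints both land pages here. */
static void stage_page(int page, const unsigned char *data)
{
    memcpy(colo_cache[page], data, PAGE_SIZE);
    dirty[page] = 1;
}

/* Equivalent in spirit to colo_flush_ram_cache(): after this patch it runs
 * from the checkpoint path, right before the device state is loaded. */
static void flush_ram_cache(void)
{
    for (int page = 0; page < PAGES; page++) {
        if (dirty[page]) {
            memcpy(svm_ram[page], colo_cache[page], PAGE_SIZE);
            dirty[page] = 0;
        }
    }
}

int main(void)
{
    unsigned char payload[PAGE_SIZE] = "checkpoint data";

    stage_page(3, payload);   /* background migration during the gap */
    stage_page(5, payload);
    flush_ram_cache();        /* checkpoint: make SVM memory consistent */
    printf("page 3: %s\n", (const char *)svm_ram[3]);
    return 0;
}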