From patchwork Tue Jan 17 12:57:44 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zhanghailiang X-Patchwork-Id: 9520841 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 27E946020B for ; Tue, 17 Jan 2017 13:03:16 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 391282854A for ; Tue, 17 Jan 2017 13:03:16 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2DE7628550; Tue, 17 Jan 2017 13:03:16 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 772272854A for ; Tue, 17 Jan 2017 13:03:15 +0000 (UTC) Received: from localhost ([::1]:35060 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cTTPt-0005Jv-AE for patchwork-qemu-devel@patchwork.kernel.org; Tue, 17 Jan 2017 08:03:13 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:41846) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cTTMG-0002dj-QT for qemu-devel@nongnu.org; Tue, 17 Jan 2017 07:59:29 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1cTTMC-0003Kn-Nd for qemu-devel@nongnu.org; Tue, 17 Jan 2017 07:59:28 -0500 Received: from szxga01-in.huawei.com ([58.251.152.64]:55585) by eggs.gnu.org with esmtps (TLS1.0:RSA_ARCFOUR_SHA1:16) (Exim 4.71) (envelope-from ) id 1cTTMB-0003J8-RN for qemu-devel@nongnu.org; Tue, 17 Jan 2017 07:59:24 -0500 Received: from 172.24.1.136 (EHLO szxeml431-hub.china.huawei.com) ([172.24.1.136]) by szxrg01-dlp.huawei.com (MOS 4.3.7-GA FastPath queued) with ESMTP id DYA55872; Tue, 17 Jan 2017 20:58:33 +0800 (CST) Received: from localhost (10.177.24.212) by szxeml431-hub.china.huawei.com (10.82.67.208) with Microsoft SMTP Server id 14.3.235.1; Tue, 17 Jan 2017 20:58:23 +0800 From: zhanghailiang To: Date: Tue, 17 Jan 2017 20:57:44 +0800 Message-ID: <1484657864-21708-4-git-send-email-zhang.zhanghailiang@huawei.com> X-Mailer: git-send-email 2.7.2.windows.1 In-Reply-To: <1484657864-21708-1-git-send-email-zhang.zhanghailiang@huawei.com> References: <1484657864-21708-1-git-send-email-zhang.zhanghailiang@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.177.24.212] X-CFilter-Loop: Reflected X-Mirapoint-Virus-RAPID-Raw: score=unknown(0), refid=str=0001.0A020203.587E1507.0134, ss=1, re=0.000, recu=0.000, reip=0.000, cl=1, cld=1, fgs=0, ip=0.0.0.0, so=2013-06-18 04:22:30, dmn=2013-03-21 17:37:32 X-Mirapoint-Loop-Id: 2d871d249158b22ad22b4496e38e1521 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.4.x-2.6.x [generic] [fuzzy] X-Received-From: 58.251.152.64 Subject: [Qemu-devel] [PATCH 3/3] COLO: Don't process failover request while loading VM's state X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: xuquan8@huawei.com, lizhijian@cn.fujitsu.com, zhangchen.fnst@cn.fujitsu.com, quintela@redhat.com, eddie.dong@intel.com, dgilbert@redhat.com, amit.shah@redhat.com, zhanghailiang Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" X-Virus-Scanned: ClamAV using ClamSMTP We should not do failover work while the main thread is loading VM's state. Otherwise the consistent of VM's memory and device state will be broken. We will restart the loading process after jump over the stage, The new failover status 'RELAUNCH' will help to record if we need to restart the process. Cc: Eric Blake Signed-off-by: zhanghailiang Signed-off-by: Li Zhijian Reviewed-by: Dr. David Alan Gilbert --- migration/colo.c | 26 ++++++++++++++++++++++++++ qapi-schema.json | 4 +++- 2 files changed, 29 insertions(+), 1 deletion(-) diff --git a/migration/colo.c b/migration/colo.c index 3222812..712308e 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -20,6 +20,8 @@ #include "qapi/error.h" #include "migration/failover.h" +static bool vmstate_loading; + #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024) bool colo_supported(void) @@ -51,6 +53,19 @@ static void secondary_vm_do_failover(void) int old_state; MigrationIncomingState *mis = migration_incoming_get_current(); + /* Can not do failover during the process of VM's loading VMstate, Or + * it will break the secondary VM. + */ + if (vmstate_loading) { + old_state = failover_set_state(FAILOVER_STATUS_ACTIVE, + FAILOVER_STATUS_RELAUNCH); + if (old_state != FAILOVER_STATUS_ACTIVE) { + error_report("Unknown error while do failover for secondary VM," + "old_state: %s", FailoverStatus_lookup[old_state]); + } + return; + } + migrate_set_state(&mis->state, MIGRATION_STATUS_COLO, MIGRATION_STATUS_COMPLETED); @@ -548,13 +563,23 @@ void *colo_process_incoming_thread(void *opaque) qemu_mutex_lock_iothread(); qemu_system_reset(VMRESET_SILENT); + vmstate_loading = true; if (qemu_loadvm_state(fb) < 0) { error_report("COLO: loadvm failed"); qemu_mutex_unlock_iothread(); goto out; } + + vmstate_loading = false; qemu_mutex_unlock_iothread(); + if (failover_get_state() == FAILOVER_STATUS_RELAUNCH) { + failover_set_state(FAILOVER_STATUS_RELAUNCH, + FAILOVER_STATUS_NONE); + failover_request_active(NULL); + goto out; + } + colo_send_message(mis->to_src_file, COLO_MESSAGE_VMSTATE_LOADED, &local_err); if (local_err) { @@ -563,6 +588,7 @@ void *colo_process_incoming_thread(void *opaque) } out: + vmstate_loading = false; /* Throw the unreported error message after exited from loop */ if (local_err) { error_report_err(local_err); diff --git a/qapi-schema.json b/qapi-schema.json index ce20f16..97210c7 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -856,10 +856,12 @@ # # @completed: finish the process of failover # +# @relaunch: restart the failover process, from 'none' -> 'completed' +# # Since: 2.8 ## { 'enum': 'FailoverStatus', - 'data': [ 'none', 'require', 'active', 'completed'] } + 'data': [ 'none', 'require', 'active', 'completed', 'relaunch' ] } ## # @x-colo-lost-heartbeat: