From patchwork Tue Jan 17 12:57:43 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zhanghailiang X-Patchwork-Id: 9520829 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id A6E636020A for ; Tue, 17 Jan 2017 12:59:29 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B209228546 for ; Tue, 17 Jan 2017 12:59:29 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A3F3F2854A; Tue, 17 Jan 2017 12:59:29 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 0CF7528546 for ; Tue, 17 Jan 2017 12:59:29 +0000 (UTC) Received: from localhost ([::1]:34939 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cTTMG-0002Nf-3I for patchwork-qemu-devel@patchwork.kernel.org; Tue, 17 Jan 2017 07:59:28 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:41636) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cTTLq-0002Lj-9L for qemu-devel@nongnu.org; Tue, 17 Jan 2017 07:59:03 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1cTTLp-0003E0-41 for qemu-devel@nongnu.org; Tue, 17 Jan 2017 07:59:02 -0500 Received: from szxga01-in.huawei.com ([58.251.152.64]:55454) by eggs.gnu.org with esmtps (TLS1.0:RSA_ARCFOUR_SHA1:16) (Exim 4.71) (envelope-from ) id 1cTTLn-000397-SH for qemu-devel@nongnu.org; Tue, 17 Jan 2017 07:59:01 -0500 Received: from 172.24.1.137 (EHLO szxeml425-hub.china.huawei.com) ([172.24.1.137]) by szxrg01-dlp.huawei.com (MOS 4.3.7-GA FastPath queued) with ESMTP id DYA55874; Tue, 17 Jan 2017 20:58:36 +0800 (CST) Received: from localhost (10.177.24.212) by szxeml425-hub.china.huawei.com (10.82.67.180) with Microsoft SMTP Server id 14.3.235.1; Tue, 17 Jan 2017 20:58:22 +0800 From: zhanghailiang To: Date: Tue, 17 Jan 2017 20:57:43 +0800 Message-ID: <1484657864-21708-3-git-send-email-zhang.zhanghailiang@huawei.com> X-Mailer: git-send-email 2.7.2.windows.1 In-Reply-To: <1484657864-21708-1-git-send-email-zhang.zhanghailiang@huawei.com> References: <1484657864-21708-1-git-send-email-zhang.zhanghailiang@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.177.24.212] X-CFilter-Loop: Reflected X-Mirapoint-Virus-RAPID-Raw: score=unknown(0), refid=str=0001.0A020206.587E1507.03A8, ss=1, re=0.000, recu=0.000, reip=0.000, cl=1, cld=1, fgs=0, ip=0.0.0.0, so=2013-06-18 04:22:30, dmn=2013-03-21 17:37:32 X-Mirapoint-Loop-Id: e873da59c596e0cd2b03d3abb2243fd2 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.4.x-2.6.x [generic] [fuzzy] X-Received-From: 58.251.152.64 Subject: [Qemu-devel] [PATCH 2/3] COLO: Shutdown related socket fd while do failover X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: xuquan8@huawei.com, lizhijian@cn.fujitsu.com, zhangchen.fnst@cn.fujitsu.com, quintela@redhat.com, eddie.dong@intel.com, dgilbert@redhat.com, amit.shah@redhat.com, zhanghailiang Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" X-Virus-Scanned: ClamAV using ClamSMTP If the net connection between primary host and secondary host breaks while COLO/COLO incoming threads are doing read() or write(). It will block until connection is timeout, and the failover process will be blocked because of it. So it is necessary to shutdown all the socket fds used by COLO to avoid this situation. Besides, we should close the corresponding file descriptors after failvoer BH shutdown them, Or there will be an error. Signed-off-by: zhanghailiang Signed-off-by: Li Zhijian Reviewed-by: Dr. David Alan Gilbert Cc: Dr. David Alan Gilbert --- include/migration/migration.h | 3 +++ migration/colo.c | 43 +++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 46 insertions(+) diff --git a/include/migration/migration.h b/include/migration/migration.h index 487ac1e..7cac877 100644 --- a/include/migration/migration.h +++ b/include/migration/migration.h @@ -113,6 +113,7 @@ struct MigrationIncomingState { QemuThread colo_incoming_thread; /* The coroutine we should enter (back) after failover */ Coroutine *migration_incoming_co; + QemuSemaphore colo_incoming_sem; /* See savevm.c */ LoadStateEntry_Head loadvm_handlers; @@ -182,6 +183,8 @@ struct MigrationState QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests; /* The RAMBlock used in the last src_page_request */ RAMBlock *last_req_rb; + /* The semaphore is used to notify COLO thread that failover is finished */ + QemuSemaphore colo_exit_sem; /* The semaphore is used to notify COLO thread to do checkpoint */ QemuSemaphore colo_checkpoint_sem; diff --git a/migration/colo.c b/migration/colo.c index 08b2e46..3222812 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -59,6 +59,18 @@ static void secondary_vm_do_failover(void) /* recover runstate to normal migration finish state */ autostart = true; } + /* + * Make sure COLO incoming thread not block in recv or send, + * If mis->from_src_file and mis->to_src_file use the same fd, + * The second shutdown() will return -1, we ignore this value, + * It is harmless. + */ + if (mis->from_src_file) { + qemu_file_shutdown(mis->from_src_file); + } + if (mis->to_src_file) { + qemu_file_shutdown(mis->to_src_file); + } old_state = failover_set_state(FAILOVER_STATUS_ACTIVE, FAILOVER_STATUS_COMPLETED); @@ -67,6 +79,8 @@ static void secondary_vm_do_failover(void) "secondary VM", FailoverStatus_lookup[old_state]); return; } + /* Notify COLO incoming thread that failover work is finished */ + qemu_sem_post(&mis->colo_incoming_sem); /* For Secondary VM, jump to incoming co */ if (mis->migration_incoming_co) { qemu_coroutine_enter(mis->migration_incoming_co); @@ -81,6 +95,18 @@ static void primary_vm_do_failover(void) migrate_set_state(&s->state, MIGRATION_STATUS_COLO, MIGRATION_STATUS_COMPLETED); + /* + * Wake up COLO thread which may blocked in recv() or send(), + * The s->rp_state.from_dst_file and s->to_dst_file may use the + * same fd, but we still shutdown the fd for twice, it is harmless. + */ + if (s->to_dst_file) { + qemu_file_shutdown(s->to_dst_file); + } + if (s->rp_state.from_dst_file) { + qemu_file_shutdown(s->rp_state.from_dst_file); + } + old_state = failover_set_state(FAILOVER_STATUS_ACTIVE, FAILOVER_STATUS_COMPLETED); if (old_state != FAILOVER_STATUS_ACTIVE) { @@ -88,6 +114,8 @@ static void primary_vm_do_failover(void) FailoverStatus_lookup[old_state]); return; } + /* Notify COLO thread that failover work is finished */ + qemu_sem_post(&s->colo_exit_sem); } void colo_do_failover(MigrationState *s) @@ -361,6 +389,14 @@ out: timer_del(s->colo_delay_timer); + /* Hope this not to be too long to wait here */ + qemu_sem_wait(&s->colo_exit_sem); + qemu_sem_destroy(&s->colo_exit_sem); + /* + * Must be called after failover BH is completed, + * Or the failover BH may shutdown the wrong fd that + * re-used by other threads after we release here. + */ if (s->rp_state.from_dst_file) { qemu_fclose(s->rp_state.from_dst_file); } @@ -385,6 +421,7 @@ void migrate_start_colo_process(MigrationState *s) s->colo_delay_timer = timer_new_ms(QEMU_CLOCK_HOST, colo_checkpoint_notify, s); + qemu_sem_init(&s->colo_exit_sem, 0); migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE, MIGRATION_STATUS_COLO); colo_process_checkpoint(s); @@ -423,6 +460,8 @@ void *colo_process_incoming_thread(void *opaque) uint64_t value; Error *local_err = NULL; + qemu_sem_init(&mis->colo_incoming_sem, 0); + migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE, MIGRATION_STATUS_COLO); @@ -533,6 +572,10 @@ out: qemu_fclose(fb); } + /* Hope this not to be too long to loop here */ + qemu_sem_wait(&mis->colo_incoming_sem); + qemu_sem_destroy(&mis->colo_incoming_sem); + /* Must be called after failover BH is completed */ if (mis->to_src_file) { qemu_fclose(mis->to_src_file); }