From patchwork Tue Apr 12 13:01:56 2016
X-Patchwork-Submitter: Zhanghailiang
X-Patchwork-Id: 8810291
From: Hailiang Zhang
To: Li Zhijian
Cc: xiecl.fnst@cn.fujitsu.com, zhangchen.fnst@cn.fujitsu.com, quintela@redhat.com, armbru@redhat.com, yunhong.jiang@intel.com, eddie.dong@intel.com, peter.huangpeng@huawei.com, dgilbert@redhat.com, arei.gonglei@huawei.com, stefanha@redhat.com, amit.shah@redhat.com, hongyang.yang@easystack.cn
Subject: Re: [Qemu-devel] [PATCH COLO-Frame v16 16/35] COLO: synchronize PVM's state to SVM periodically
Date: Tue, 12 Apr 2016 21:01:56 +0800
Message-ID: <570CF1C4.5000408@huawei.com>
In-Reply-To: <570C6561.201@cn.fujitsu.com>
References: <1460096797-14916-1-git-send-email-zhang.zhanghailiang@huawei.com> <1460096797-14916-17-git-send-email-zhang.zhanghailiang@huawei.com> <570C6561.201@cn.fujitsu.com>

On 2016/4/12 11:02, Li Zhijian wrote:
>
>
> On 04/08/2016 02:26 PM, zhanghailiang wrote:
>> Do checkpoint periodically, the default interval is 200ms.
>>
>> Signed-off-by: zhanghailiang
>> Signed-off-by: Li Zhijian
>> Reviewed-by: Dr. David Alan Gilbert
>> ---
>> v12:
>> - Add Reviewed-by tag
>> v11:
>> - Fix wrong sleep time for checkpoint period. (Dave's comment)
>> ---
>>   migration/colo.c | 12 ++++++++++++
>>   1 file changed, 12 insertions(+)
>>
>> diff --git a/migration/colo.c b/migration/colo.c
>> index 4dae069..4e3b39f 100644
>> --- a/migration/colo.c
>> +++ b/migration/colo.c
>> @@ -11,6 +11,7 @@
>>    */
>>
>>   #include "qemu/osdep.h"
>> +#include "qemu/timer.h"
>>   #include "sysemu/sysemu.h"
>>   #include "migration/colo.h"
>>   #include "trace.h"
>> @@ -231,6 +232,7 @@ out:
>>   static void colo_process_checkpoint(MigrationState *s)
>>   {
>>       QEMUSizedBuffer *buffer = NULL;
>> +    int64_t current_time, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>>       Error *local_err = NULL;
>>       int ret;
>>
>> @@ -262,11 +264,21 @@ static void colo_process_checkpoint(MigrationState *s)
>>       trace_colo_vm_state_change("stop", "run");
>>
>>       while (s->state == MIGRATION_STATUS_COLO) {
>> +        current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
>> +        if (current_time - checkpoint_time <
>> +            s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY]) {
>> +            int64_t delay_ms;
>> +
>> +            delay_ms = s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY] -
>> +                       (current_time - checkpoint_time);
>> +            g_usleep(delay_ms * 1000);
>
> Once a large value (e.g. 1000000) is set in s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY],
> this will sleep for 1000 seconds and the user cannot revert the operation.
> Can we make this sleep more flexible?
>

Good catch, that really is a problem. We can solve it with the following patch,
which simplifies the delay handling in COLO and also benefits the later COLO
work that is based on colo-proxy: the colo-proxy thread can call
colo_checkpoint_notify() directly to notify the COLO thread to do a checkpoint.
I will not update the COLO frame series with this patch for now, since it is
experimental and the related patches have already been reviewed. I will keep it
as an optimization patch for later COLO work.

Thanks.

From bde668d63c182540a013074a70c7a37474aedf94 Mon Sep 17 00:00:00 2001
From: zhanghailiang
Date: Wed, 13 Apr 2016 01:11:09 +0800
Subject: [PATCH] COLO: use timer to notify COLO to do checkpoint

Signed-off-by: zhanghailiang
---
 include/migration/colo.h      |  1 +
 include/migration/migration.h |  2 ++
 migration/colo.c              | 33 ++++++++++++++++++++-------------
 migration/migration.c         |  1 +
 4 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/include/migration/colo.h b/include/migration/colo.h
index 87ea6d2..9f0098e 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -38,5 +38,6 @@ void colo_do_failover(MigrationState *s);

 bool colo_shutdown(void);
 void colo_add_buffer_filter(Notifier *notifier, void *data);
+void colo_checkpoint_notify(void *opaque);

 #endif
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 1009918..fed5a14 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -181,6 +181,8 @@ struct MigrationState
     RAMBlock *last_req_rb;

     QemuSemaphore colo_sem;
+    int64_t checkpoint_time;
+    QEMUTimer *delay_timer;
 };

 void migrate_set_state(int *state, int old_state, int new_state);
diff --git a/migration/colo.c b/migration/colo.c
index 56260d8..9fb1a30 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -474,7 +474,7 @@ void colo_add_buffer_filter(Notifier *notifier, void *data)
 static void colo_process_checkpoint(MigrationState *s)
 {
     QEMUSizedBuffer *buffer = NULL;
-    int64_t current_time, checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+    int64_t current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     Error *local_err = NULL;
     int ret;

@@ -528,29 +528,22 @@ static void colo_process_checkpoint(MigrationState *s)
     if (ret < 0) {
         goto out;
     }
-
+    timer_mod(s->delay_timer,
+            current_time + s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY]);
     while (s->state == MIGRATION_STATUS_COLO) {
         if (failover_request_is_active()) {
             error_report("failover request");
             goto out;
         }

-        current_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
-        if ((current_time - checkpoint_time <
-            s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY]) &&
-            !colo_shutdown_requested) {
-            int64_t delay_ms;
-
-            delay_ms = s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY] -
-                       (current_time - checkpoint_time);
-            g_usleep(delay_ms * 1000);
+        if (!colo_shutdown_requested) {
+            qemu_sem_wait(&s->colo_sem);
         }
         /* start a colo checkpoint */
         ret = colo_do_checkpoint_transaction(s, buffer);
         if (ret < 0) {
             goto out;
         }
-        checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     }

 out:
@@ -572,7 +565,7 @@ out:

     qsb_free(buffer);
     buffer = NULL;
-
+    timer_del(s->delay_timer);
     /* Hope this not to be too long to wait here */
     qemu_sem_wait(&s->colo_sem);
     qemu_sem_destroy(&s->colo_sem);
@@ -586,12 +579,26 @@ out:
     }
 }

+void colo_checkpoint_notify(void *opaque)
+{
+    MigrationState *s = opaque;
+    int64_t next_notify_time;
+
+    qemu_sem_post(&s->colo_sem);
+    s->checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+    next_notify_time = s->checkpoint_time +
+                    s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY];
+    timer_mod(s->delay_timer, next_notify_time);
+}
+
 void migrate_start_colo_process(MigrationState *s)
 {
     qemu_mutex_unlock_iothread();
     qemu_sem_init(&s->colo_sem, 0);
     migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_COLO);
+    s->delay_timer = timer_new_ms(QEMU_CLOCK_HOST, colo_checkpoint_notify,
+                                  s);
     colo_process_checkpoint(s);
     qemu_mutex_lock_iothread();
 }
diff --git a/migration/migration.c b/migration/migration.c
index 3bceecc..8907075 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -825,6 +825,7 @@ void qmp_migrate_set_parameters(bool has_compress_level,
     if (has_x_checkpoint_delay) {
         s->parameters[MIGRATION_PARAMETER_X_CHECKPOINT_DELAY] =
                                                     x_checkpoint_delay;
+        colo_checkpoint_notify(s);
     }
 }
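
For readers who want to try the idea outside QEMU, here is a minimal standalone sketch of why waiting on a semaphore is more flexible than a fixed g_usleep(): the waiter can be woken as soon as someone posts the semaphore. This is not the QEMU code. It substitutes POSIX sem_timedwait() for the QEMUTimer plus QemuSemaphore pair used in the patch, and the names CheckpointWaiter, waiter_init(), checkpoint_notify() and wait_for_checkpoint() are hypothetical, chosen only for illustration.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the state the COLO thread waits on. */
typedef struct {
    sem_t wake;        /* posted by checkpoint_notify() */
    int64_t delay_ms;  /* plays the role of x-checkpoint-delay */
} CheckpointWaiter;

/* Returns 0 on success, -1 on failure (same convention as sem_init). */
static int waiter_init(CheckpointWaiter *w, int64_t delay_ms)
{
    assert(delay_ms >= 0);
    w->delay_ms = delay_ms;
    return sem_init(&w->wake, 0, 0);
}

/* Wake the waiter right away, e.g. after the delay parameter changed. */
static void checkpoint_notify(CheckpointWaiter *w)
{
    sem_post(&w->wake);
}

/*
 * Block until notified or until delay_ms has elapsed.
 * Returns true if woken by checkpoint_notify(), false on timeout or error.
 */
static bool wait_for_checkpoint(CheckpointWaiter *w)
{
    struct timespec deadline;

    /* sem_timedwait() takes an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += w->delay_ms / 1000;
    deadline.tv_nsec += (w->delay_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    while (sem_timedwait(&w->wake, &deadline) != 0) {
        if (errno != EINTR) {
            return false;  /* ETIMEDOUT or an unexpected error */
        }
        /* Interrupted by a signal: retry with the same deadline. */
    }
    return true;
}
```

With a 1000000 ms delay configured, a single checkpoint_notify() call still makes wait_for_checkpoint() return immediately, which is the behaviour the review comment asked for. The patch itself reaches the same effect differently: it keeps a plain qemu_sem_wait() and lets a timer callback post the semaphore.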