From: Md Haris Iqbal
To: qemu-devel@nongnu.org
Date: Wed, 8 Jun 2016 23:43:04 +0530
Message-Id: <1465409584-16308-1-git-send-email-haris.phnx@gmail.com>
Subject: [Qemu-devel]
 [Qemu-devel [RFC] [WIP] v2] Keeping the Source side alive incase of network failure (Migration recovery from network failure)
Cc: Md Haris Iqbal, dgilbert@redhat.com

---
 include/migration/migration.h |  1 +
 migration/migration.c         | 76 ++++++++++++++++++++++++++++++++++++++++---
 qapi-schema.json              | 11 +++++--
 vl.c                          |  4 +++
 4 files changed, 85 insertions(+), 7 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 4a3201b..59e26e6 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -326,6 +326,7 @@ void global_state_store_running(void);
 void flush_page_queue(MigrationState *ms);
 int ram_save_queue_pages(MigrationState *ms, const char *rbname,
                          ram_addr_t start, ram_addr_t len);
+int qemu_migrate_postcopy_outgoing_recovery(MigrationState *ms);
 
 PostcopyState postcopy_state_get(void);
 /* Set the state and return the old state */
diff --git a/migration/migration.c b/migration/migration.c
index a77f62e..41c28e1 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -676,6 +676,33 @@ MigrationInfo *qmp_query_migrate(Error **errp)
     case MIGRATION_STATUS_CANCELLED:
         info->has_status = true;
         break;
+    case MIGRATION_STATUS_POSTCOPY_RECOVERY:
+        info->has_status = true;
+        info->has_total_time = true;
+        info->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
+        info->has_ram = true;
+        info->ram = g_malloc0(sizeof(*info->ram));
+        info->ram->transferred = ram_bytes_transferred();
+        info->ram->remaining = ram_bytes_remaining();
+        info->ram->total = ram_bytes_total();
+        info->ram->duplicate = dup_mig_pages_transferred();
+        info->ram->skipped = skipped_mig_pages_transferred();
+        info->ram->normal = norm_mig_pages_transferred();
+        info->ram->normal_bytes = norm_mig_bytes_transferred();
+        info->ram->dirty_pages_rate = s->dirty_pages_rate;
+        info->ram->mbps = s->mbps;
+        info->ram->dirty_sync_count = s->dirty_sync_count;
+
+        if (blk_mig_active()) {
+            info->has_disk = true;
+            info->disk = g_malloc0(sizeof(*info->disk));
+            info->disk->transferred = blk_mig_bytes_transferred();
+            info->disk->remaining = blk_mig_bytes_remaining();
+            info->disk->total = blk_mig_bytes_total();
+        }
+
+        get_xbzrle_cache_stats(info);
     }
 
     info->status = s->state;
@@ -1660,6 +1687,8 @@ static void *migration_thread(void *opaque)
     /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
     enum MigrationStatus current_active_state = MIGRATION_STATUS_ACTIVE;
 
+    int ret;
+
     rcu_register_thread();
 
     qemu_savevm_state_header(s->to_dst_file);
@@ -1726,11 +1755,32 @@ static void *migration_thread(void *opaque)
             }
         }
 
-        if (qemu_file_get_error(s->to_dst_file)) {
-            migrate_set_state(&s->state, current_active_state,
-                              MIGRATION_STATUS_FAILED);
-            trace_migration_thread_file_err();
-            break;
+        if ((ret = qemu_file_get_error(s->to_dst_file))) {
+            fprintf(stderr, "1 : Error %s %d\n", strerror(-ret), -ret);
+
+            /* This check is based on how the error is set during the network
+             * recv(). When recv() returns 0 (i.e. no data to read), the error
+             * is set to -EIO. For all other network errors, it is set
+             * according to the return value received.
+             */
+            if (ret != -EIO && s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+                /* Network Failure during postcopy */
+
+                current_active_state = MIGRATION_STATUS_POSTCOPY_RECOVERY;
+                runstate_set(RUN_STATE_POSTMIGRATE_RECOVERY);
+                fprintf(stderr, "1.1 : Error %s %d\n", strerror(-ret), -ret);
+                ret = qemu_migrate_postcopy_outgoing_recovery(s);
+                if(ret < 0) {
+                    break;
+                }
+
+            } else {
+                migrate_set_state(&s->state, current_active_state,
+                                  MIGRATION_STATUS_FAILED);
+                fprintf(stderr, "1.2 : Error %s %d\n", strerror(-ret), -ret);
+                trace_migration_thread_file_err();
+                break;
+            }
         }
         current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
         if (current_time >= initial_time + BUFFER_DELAY) {
@@ -1831,6 +1881,22 @@ void migrate_fd_connect(MigrationState *s)
     s->migration_thread_running = true;
 }
 
+int qemu_migrate_postcopy_outgoing_recovery(MigrationState* ms)
+{
+    migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+                      MIGRATION_STATUS_POSTCOPY_RECOVERY);
+
+    ms->in_recovery = true;
+    /* Code for network recovery to be added here */
+    while(atomic_mb_read(&ms->in_recovery) == true) {
+        fprintf(stderr, "Not letting it fail %p\n", ms->to_dst_file);
+        sleep(5);
+    }
+
+    return -1;
+
+}
+
 PostcopyState postcopy_state_get(void)
 {
     return atomic_mb_read(&incoming_postcopy_state);
diff --git a/qapi-schema.json b/qapi-schema.json
index 1b7b1e1..613f8d2 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -154,12 +154,14 @@
 # @watchdog: the watchdog action is configured to pause and has been triggered
 #
 # @guest-panicked: guest has been panicked as a result of guest OS panic
+#
+# @postmigrate-recovery: guest is paused for recovery after a network failure
 ##
 { 'enum': 'RunState',
   'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused',
             'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm',
             'running', 'save-vm', 'shutdown', 'suspended', 'watchdog',
-            'guest-panicked' ] }
+            'guest-panicked', 'postmigrate-recovery' ] }
 
 ##
 # @StatusInfo:
@@ -434,12 +436,15 @@
 #
 # @failed: some error occurred during migration process.
 #
+# @postcopy-recovery: in recovery mode, after a network failure.
+#
 # Since: 2.3
 #
 ##
 { 'enum': 'MigrationStatus',
   'data': [ 'none', 'setup', 'cancelling', 'cancelled',
-            'active', 'postcopy-active', 'completed', 'failed' ] }
+            'active', 'postcopy-active', 'completed', 'failed',
+            'postcopy-recovery' ] }
 
 ##
 # @MigrationInfo
@@ -2058,6 +2063,8 @@
 #
 # @uri: the Uniform Resource Identifier of the destination VM
 #
+# @recover: #optional recover from a broken migration
+#
 # @blk: #optional do block migration (full disk copy)
 #
 # @inc: #optional incremental disk copy migration
diff --git a/vl.c b/vl.c
index 5fd22cb..c237140 100644
--- a/vl.c
+++ b/vl.c
@@ -618,6 +618,10 @@ static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
     { RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE },
     { RUN_STATE_FINISH_MIGRATE, RUN_STATE_PRELAUNCH },
+    { RUN_STATE_FINISH_MIGRATE, RUN_STATE_POSTMIGRATE_RECOVERY },
+
+    { RUN_STATE_POSTMIGRATE_RECOVERY, RUN_STATE_FINISH_MIGRATE },
+    { RUN_STATE_POSTMIGRATE_RECOVERY, RUN_STATE_SHUTDOWN },
 
     { RUN_STATE_RESTORE_VM, RUN_STATE_RUNNING },
     { RUN_STATE_RESTORE_VM, RUN_STATE_PRELAUNCH },
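The error-handling branch that the patch adds to migration_thread() boils down to a small state decision. The sketch below models that decision outside QEMU. SketchStatus and sketch_next_status() are hypothetical stand-ins, not QEMU APIs, and the ret == 0 case is included only to make the function total. The model relies on the behaviour the patch comment describes: QEMU records a recv() return of 0 (the peer closed the connection) as -EIO, and records any other socket error as that negative errno. Only a non-EIO error seen while in postcopy-active moves the source to postcopy-recovery. At that point the guest is already running on the destination and the source still holds the pages not yet sent, so failing outright would lose the guest.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the QAPI-generated MigrationStatus values;
 * only the states this decision touches are modelled. */
typedef enum {
    SKETCH_STATUS_ACTIVE,
    SKETCH_STATUS_POSTCOPY_ACTIVE,
    SKETCH_STATUS_POSTCOPY_RECOVERY,
    SKETCH_STATUS_FAILED,
} SketchStatus;

/*
 * Models the branch the patch adds to migration_thread().
 * 'ret' is what qemu_file_get_error() reported for the outgoing file:
 * 0 for no error, otherwise a negative errno. A clean EOF from recv()
 * is recorded as -EIO, so the patch treats -EIO as final and every
 * other error seen during postcopy as a recoverable network failure.
 */
static SketchStatus sketch_next_status(SketchStatus current, int ret)
{
    assert(ret <= 0);               /* errors are stored as -errno */

    if (ret == 0) {
        return current;             /* no error: keep migrating */
    }
    if (ret != -EIO && current == SKETCH_STATUS_POSTCOPY_ACTIVE) {
        /* Guest already runs on the destination: park the source. */
        return SKETCH_STATUS_POSTCOPY_RECOVERY;
    }
    /* Precopy errors and EOF keep the existing "failed" behaviour. */
    return SKETCH_STATUS_FAILED;
}
```

For example, sketch_next_status(SKETCH_STATUS_POSTCOPY_ACTIVE, -ECONNRESET) yields SKETCH_STATUS_POSTCOPY_RECOVERY. The same call with -EIO yields SKETCH_STATUS_FAILED, and so does any error reported while still in precopy (SKETCH_STATUS_ACTIVE).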
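The vl.c hunk is needed because runstate_set() only accepts transitions listed in runstate_transitions_def[]. QEMU turns that list into a lookup matrix at startup and aborts on any unlisted transition. During postcopy the source VM has already been stopped in finish-migrate, which is presumably why the patch adds the FINISH_MIGRATE to POSTMIGRATE_RECOVERY entry. Below is a simplified, hypothetical model of that check. The RS_* names and transition_allowed() are illustrative only, and the model scans the list linearly instead of building a matrix. It covers just the FINISH_MIGRATE entries visible in the hunk plus the new ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of QEMU's RunState values; RS__MAX mirrors the
 * RUN_STATE__MAX sentinel emitted by the QAPI generator. */
typedef enum {
    RS_RUNNING,
    RS_PRELAUNCH,
    RS_FINISH_MIGRATE,
    RS_POSTMIGRATE,
    RS_POSTMIGRATE_RECOVERY,
    RS_SHUTDOWN,
    RS__MAX,
} RunStateSketch;

typedef struct {
    RunStateSketch from;
    RunStateSketch to;
} TransitionSketch;

/* FINISH_MIGRATE entries from the vl.c hunk, plus the ones the patch adds. */
static const TransitionSketch transitions[] = {
    { RS_FINISH_MIGRATE, RS_RUNNING },
    { RS_FINISH_MIGRATE, RS_POSTMIGRATE },
    { RS_FINISH_MIGRATE, RS_PRELAUNCH },
    { RS_FINISH_MIGRATE, RS_POSTMIGRATE_RECOVERY },

    { RS_POSTMIGRATE_RECOVERY, RS_FINISH_MIGRATE },
    { RS_POSTMIGRATE_RECOVERY, RS_SHUTDOWN },
};

/* Returns true if 'from' -> 'to' appears in the table above. */
static bool transition_allowed(RunStateSketch from, RunStateSketch to)
{
    assert(from < RS__MAX && to < RS__MAX);

    for (size_t i = 0; i < sizeof(transitions) / sizeof(transitions[0]); i++) {
        if (transitions[i].from == from && transitions[i].to == to) {
            return true;
        }
    }
    return false;
}
```

With only these entries, a source parked in postmigrate-recovery can go back to finish-migrate (presumably to resume the migration) or shut down. It cannot move straight to running.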