From patchwork Tue Dec 27 10:38:08 2016
X-Patchwork-Submitter: Zhanghailiang
X-Patchwork-Id: 9489161
David Alan Gilbert" References: <1479555831-30960-1-git-send-email-zhang.zhanghailiang@huawei.com> <20161206134211.GD4990@noname.str.redhat.com> <20161206152415.GF2125@work-vm> <5848F139.1020402@huawei.com> <20161208200230.GH2054@work-vm> <585B40C5.9020603@huawei.com> From: Hailiang Zhang Message-ID: <58624490.1090609@huawei.com> Date: Tue, 27 Dec 2016 18:38:08 +0800 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1 MIME-Version: 1.0 In-Reply-To: <585B40C5.9020603@huawei.com> X-Originating-IP: [10.177.24.212] X-CFilter-Loop: Reflected X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.4.x-2.6.x [generic] [fuzzy] X-Received-From: 58.251.152.64 Subject: Re: [Qemu-devel] [PATCH] migration: re-active images when migration fails to complete X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Kevin Wolf , amit.shah@redhat.com, qemu-devel@nongnu.org, qemu-block@nongnu.org, quintela@redhat.com Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" X-Virus-Scanned: ClamAV using ClamSMTP On 2016/12/22 10:56, Hailiang Zhang wrote: > On 2016/12/9 4:02, Dr. David Alan Gilbert wrote: >> * Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote: >>> Hi, >>> >>> On 2016/12/6 23:24, Dr. David Alan Gilbert wrote: >>>> * Kevin Wolf (kwolf@redhat.com) wrote: >>>>> Am 19.11.2016 um 12:43 hat zhanghailiang geschrieben: >>>>>> commit fe904ea8242cbae2d7e69c052c754b8f5f1ba1d6 fixed a case >>>>>> which migration aborted QEMU because it didn't regain the control >>>>>> of images while some errors happened. >>>>>> >>>>>> Actually, we have another case in that error path to abort QEMU >>>>>> because of the same reason: >>>>>> migration_thread() >>>>>> migration_completion() >>>>>> bdrv_inactivate_all() ----------------> inactivate images >>>>>> qemu_savevm_state_complete_precopy() >>>>>> socket_writev_buffer() --------> error because destination fails >>>>>> qemu_fflush() -------------------> set error on migration stream >>>>>> qemu_mutex_unlock_iothread() ------> unlock >>>>>> qmp_migrate_cancel() ---------------------> user cancelled migration >>>>>> migrate_set_state() ------------------> set migrate CANCELLING >>>>> >>>>> Important to note here: qmp_migrate_cancel() is executed by a concurrent >>>>> thread, it doesn't depend on any code paths in migration_completion(). >>>>> >>>>>> migration_completion() -----------------> go on to fail_invalidate >>>>>> if (s->state == MIGRATION_STATUS_ACTIVE) -> Jump this branch >>>>>> migration_thread() -----------------------> break migration loop >>>>>> vm_start() -----------------------------> restart guest with inactive >>>>>> images >>>>>> We failed to regain the control of images because we only regain it >>>>>> while the migration state is "active", but here users cancelled the migration >>>>>> when they found some errors happened (for example, libvirtd daemon is shutdown >>>>>> in destination unexpectedly). 
>>>>>>
>>>>>> Signed-off-by: zhanghailiang
>>>>>> ---
>>>>>>  migration/migration.c | 3 ++-
>>>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>>>> index f498ab8..0c1ee6d 100644
>>>>>> --- a/migration/migration.c
>>>>>> +++ b/migration/migration.c
>>>>>> @@ -1752,7 +1752,8 @@ fail_invalidate:
>>>>>>      /* If not doing postcopy, vm_start() will be called: let's regain
>>>>>>       * control on images.
>>>>>>       */
>>>>>> -    if (s->state == MIGRATION_STATUS_ACTIVE) {
>>>>>
>>>>> This if condition tries to check whether we ran the code path that
>>>>> called bdrv_inactivate_all(), so that we only try to reactivate images
>>>>> if we really inactivated them first.
>>>>>
>>>>> The problem with it is that it ignores a possible concurrent
>>>>> modification of s->state.
>>>>>
>>>>>> +    if (s->state == MIGRATION_STATUS_ACTIVE ||
>>>>>> +        s->state == MIGRATION_STATUS_CANCELLING) {
>>>>>
>>>>> This adds another state that we could end up with through a concurrent
>>>>> modification, so that even in this case we undo the inactivation.
>>>>>
>>>>> However, it is no longer limited to the cases where we inactivated the
>>>>> images. It also applies to other code paths (like the postcopy one) where
>>>>> we didn't inactivate images.
>>>>>
>>>>> What saves the patch is that bdrv_invalidate_cache() is a no-op for
>>>>> block devices that aren't inactivated, so calling it more often than
>>>>> necessary is okay.
>>>>>
>>>>> But then, if we're going to rely on this, it would be much better to
>>>>> just remove the if altogether. I can't say whether there are any other
>>>>> possible values of s->state that we should consider, and by removing the
>>>>> if we would be guaranteed to catch all of them.
>>>>>
>>>>> If we don't want to rely on it, just keep a local bool that remembers
>>>>> whether we inactivated images and check that here.
>>>>>
>>>>>>          Error *local_err = NULL;
>>>>>>
>>>>>>          bdrv_invalidate_cache_all(&local_err);
>>>>>
>>>>> So in summary, this is a horrible patch because it checks the wrong
>>>>> thing, and I can't really say if it covers everything it needs to
>>>>> cover, but arguably it happens to correctly fix the outcome of a
>>>>> previously failing case.
>>>>>
>>>>> Normally I would reject such a patch and require a clean solution, but
>>>>> then we're on the day of -rc3, so if you can't send v2 right away, we
>>>>> might not have the time for it.
>>>>>
>>>>> Tough call...
>>>>
>>>> Hmm, this case is messy; I created this function having split it out
>>>> of the main loop a couple of years back but it did get more messy
>>>> with more s->state checks; as far as I can tell it's always
>>>> done the transition to COMPLETED at the end well after the locked
>>>> section, so there's always been that chance that cancellation sneaks
>>>> in just before or just after the locked section.
>>>>
>>>> Some of the bad cases that can happen:
>>>> a) A cancel sneaks in after the ACTIVE check but before or after
>>>>    the locked section; should we reactivate the disks? Well, that
>>>>    depends on whether the destination actually got the full migration
>>>>    stream - we don't know!
>>>>    If the destination actually starts running we must not reactivate
>>>>    the disks on the source even if the CPU is stopped.
>>>>
>>>
>>> Yes, we don't have a mechanism to know exactly whether or not the
>>> destination received the VM intact.
>>>
>>>> b) If bdrv_inactivate_all() fails for one device, but the others
>>>>    are fine, we go down the fail: label and don't reactivate, so
>>>>    the source dies even though it might have been mostly OK.
>>>>
>>>
>>>> We can move the _lock to before the check of s->state at the top,
>>>> which would stop the cancel sneaking in early.
>>>> In the case where postcopy was never enabled (!migrate_postcopy_ram())
>>>> we can move the COMPLETED transition into the lock as well; so I think
>>>> then we kind of become safe.
>>>> In the case where postcopy was enabled I think we can do the COMPLETED
>>>> transition before waiting for the return path to close - I think, but
>>>> I need to think more about that.
>>>> And there seem to be some dodgy cases where we call the invalidate
>>>> there after a late postcopy failure; that's bad, we shouldn't reactivate
>>>> the source disks after going into postcopy.
>>>>
>>>> So, in summary; this function is a mess - it needs a much bigger
>>>> fix than this patch.
>>>>
>>>
>>> So what's the conclusion?
>>> Will you send a patch to fix it? Or shall we fix it step by step?
>>> I think Kevin's suggestion to just remove the *if* check is OK.
>>
>> I don't see that any of the simple solutions are easy; so I plan
>> to look at it properly, but am not sure when; if you have time
>> to do it then feel free.
>>
>
> Hmm, we still have a gap between bdrv_inactivate_all() and the migration
> thread completely exiting, into which a migration cancel can slip.
> We did hit that case: the cancel arrived after migration_completion() had
> finished but before the cleanup work began (which is protected by the
> global lock). The related code is:
>
>              migration_completion(s, current_active_state,
>                                   &old_vm_running, &start_time);
>              break;
>          }
>      }
> ------------------------------------------------------> gap begin
>          if (qemu_file_get_error(s->to_dst_file)) {
>              migrate_set_state(&s->state, current_active_state,
>                                MIGRATION_STATUS_FAILED);
>              trace_migration_thread_file_err();
>              break;
>          }
>          current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>          if (current_time >= initial_time + BUFFER_DELAY) {
>              uint64_t transferred_bytes = qemu_ftell(s->to_dst_file) -
>                                           initial_bytes;
>              uint64_t time_spent = current_time - initial_time;
>              double bandwidth = (double)transferred_bytes / time_spent;
>              max_size = bandwidth * s->parameters.downtime_limit;
>
>              s->mbps = (((double) transferred_bytes * 8.0) /
>                      ((double) time_spent / 1000.0)) / 1000.0 / 1000.0;
>
>              trace_migrate_transferred(transferred_bytes, time_spent,
>                                        bandwidth, max_size);
>              /* if we haven't sent anything, we don't want to recalculate
>                 10000 is a small enough number for our purposes */
>              if (s->dirty_bytes_rate && transferred_bytes > 10000) {
>                  s->expected_downtime = s->dirty_bytes_rate / bandwidth;
>              }
>
>              qemu_file_reset_rate_limit(s->to_dst_file);
>              initial_time = current_time;
>              initial_bytes = qemu_ftell(s->to_dst_file);
>          }
>          if (qemu_file_rate_limit(s->to_dst_file)) {
>              /* usleep expects microseconds */
>              g_usleep((initial_time + BUFFER_DELAY - current_time)*1000);
>          }
>      }
>
>      trace_migration_thread_after_loop();
>      /* If we enabled cpu throttling for auto-converge, turn it off. */
>      cpu_throttle_stop();
>      end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> --------------------------------------------------------------> gap end
>      qemu_mutex_lock_iothread();
>      /*
>
> So maybe we should call bdrv_invalidate_cache_all() in qmp_migrate_cancel()
> directly? I think that would cover all the cases.
> (Including another potential case, where QEMU has finished the migration
> process, but libvirtd decides to cancel the migration before shutting it
> down.)
>
> The thing I'm not sure about here is: is it OK to call
> bdrv_invalidate_cache_all() without bdrv_inactivate_all() having been
> called? If not, we need to record whether we have inactivated all block
> devices before calling bdrv_invalidate_cache_all().
> Example code:

Subject: [PATCH] migration: re-activate images when migration is cancelled
 after inactivating them

Signed-off-by: zhanghailiang
---
 include/migration/migration.h |  3 +++
 migration/migration.c         | 13 +++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index c309d23..2d5b724 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -177,6 +177,9 @@ struct MigrationState
     /* Flag set once the migration thread is running (and needs joining) */
     bool migration_thread_running;

+    /* Flag set once the migration thread called bdrv_inactivate_all */
+    bool block_inactive;
+
     /* Queue of outstanding page requests from the destination */
     QemuMutex src_page_req_mutex;
     QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests;

diff --git a/migration/migration.c b/migration/migration.c
index f498ab8..8525c00 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1006,6 +1006,16 @@ static void migrate_fd_cancel(MigrationState *s)
     if (s->state == MIGRATION_STATUS_CANCELLING && f) {
         qemu_file_shutdown(f);
     }
+    if (s->block_inactive) {
+        Error *local_err = NULL;
+
+        bdrv_invalidate_cache_all(&local_err);
+        if (local_err) {
+            error_report_err(local_err);
+        } else {
+            s->block_inactive = false;
+        }
+    }
 }

 void add_migration_state_change_notifier(Notifier *notify)
@@ -1705,6 +1715,7 @@ static void migration_completion(MigrationState *s, int current_active_state,
         if (ret >= 0) {
             qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
             qemu_savevm_state_complete_precopy(s->to_dst_file, false);
+            s->block_inactive = true;
         }
     }
     qemu_mutex_unlock_iothread();
@@ -1758,6 +1769,8 @@ fail_invalidate:
         bdrv_invalidate_cache_all(&local_err);
         if (local_err) {
             error_report_err(local_err);
+        } else {
+            s->block_inactive = false;
         }
     }
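
The example patch above is essentially Kevin's "remember whether we
inactivated" suggestion, just with the flag kept in MigrationState instead of
a local bool so that the cancel path can see it too. A standalone sketch of
the idea, in the same invented plain-C/pthreads model as before (again not
QEMU code; block_inactive here only plays the role of s->block_inactive),
would look roughly like this:

#include <pthread.h>
#include <stdio.h>

enum mig_state { MIG_ACTIVE, MIG_CANCELLING };

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static enum mig_state state = MIG_ACTIVE;
static int block_inactive;           /* plays the role of s->block_inactive */

static void *cancel_thread(void *arg)
{
    /* Stands in for migrate_fd_cancel() with the hunk above applied. */
    (void)arg;
    pthread_mutex_lock(&lock);
    if (state == MIG_ACTIVE) {
        state = MIG_CANCELLING;
    }
    if (block_inactive) {
        block_inactive = 0;          /* bdrv_invalidate_cache_all() succeeded */
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t t;

    /* Completion path: inactivate the images and record that fact. */
    pthread_mutex_lock(&lock);
    block_inactive = 1;              /* set right after bdrv_inactivate_all() */
    pthread_mutex_unlock(&lock);

    /* A cancel arriving at any later point now re-activates them itself. */
    pthread_create(&t, NULL, cancel_thread, NULL);
    pthread_join(t, NULL);

    /* Prints block_inactive=0: the source regains control of the images. */
    printf("state=%d block_inactive=%d\n", (int)state, block_inactive);
    return 0;
}

The flag also sidesteps the question above: the re-activation only runs when
we know bdrv_inactivate_all() was actually called, so we no longer have to
rely on bdrv_invalidate_cache_all() being a no-op for images that were never
inactivated.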