From patchwork Tue Dec 27 10:38:08 2016
X-Patchwork-Submitter: Zhanghailiang
X-Patchwork-Id: 9489161
David Alan Gilbert" References: <1479555831-30960-1-git-send-email-zhang.zhanghailiang@huawei.com> <20161206134211.GD4990@noname.str.redhat.com> <20161206152415.GF2125@work-vm> <5848F139.1020402@huawei.com> <20161208200230.GH2054@work-vm> <585B40C5.9020603@huawei.com> From: Hailiang Zhang Message-ID: <58624490.1090609@huawei.com> Date: Tue, 27 Dec 2016 18:38:08 +0800 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1 MIME-Version: 1.0 In-Reply-To: <585B40C5.9020603@huawei.com> X-Originating-IP: [10.177.24.212] X-CFilter-Loop: Reflected X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.4.x-2.6.x [generic] [fuzzy] X-Received-From: 58.251.152.64 Subject: Re: [Qemu-devel] [PATCH] migration: re-active images when migration fails to complete X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Kevin Wolf , amit.shah@redhat.com, qemu-devel@nongnu.org, qemu-block@nongnu.org, quintela@redhat.com Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" X-Virus-Scanned: ClamAV using ClamSMTP On 2016/12/22 10:56, Hailiang Zhang wrote: > On 2016/12/9 4:02, Dr. David Alan Gilbert wrote: >> * Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote: >>> Hi, >>> >>> On 2016/12/6 23:24, Dr. David Alan Gilbert wrote: >>>> * Kevin Wolf (kwolf@redhat.com) wrote: >>>>> Am 19.11.2016 um 12:43 hat zhanghailiang geschrieben: >>>>>> commit fe904ea8242cbae2d7e69c052c754b8f5f1ba1d6 fixed a case >>>>>> which migration aborted QEMU because it didn't regain the control >>>>>> of images while some errors happened. >>>>>> >>>>>> Actually, we have another case in that error path to abort QEMU >>>>>> because of the same reason: >>>>>> migration_thread() >>>>>> migration_completion() >>>>>> bdrv_inactivate_all() ----------------> inactivate images >>>>>> qemu_savevm_state_complete_precopy() >>>>>> socket_writev_buffer() --------> error because destination fails >>>>>> qemu_fflush() -------------------> set error on migration stream >>>>>> qemu_mutex_unlock_iothread() ------> unlock >>>>>> qmp_migrate_cancel() ---------------------> user cancelled migration >>>>>> migrate_set_state() ------------------> set migrate CANCELLING >>>>> >>>>> Important to note here: qmp_migrate_cancel() is executed by a concurrent >>>>> thread, it doesn't depend on any code paths in migration_completion(). >>>>> >>>>>> migration_completion() -----------------> go on to fail_invalidate >>>>>> if (s->state == MIGRATION_STATUS_ACTIVE) -> Jump this branch >>>>>> migration_thread() -----------------------> break migration loop >>>>>> vm_start() -----------------------------> restart guest with inactive >>>>>> images >>>>>> We failed to regain the control of images because we only regain it >>>>>> while the migration state is "active", but here users cancelled the migration >>>>>> when they found some errors happened (for example, libvirtd daemon is shutdown >>>>>> in destination unexpectedly). 
>>>>>>
>>>>>> Signed-off-by: zhanghailiang
>>>>>> ---
>>>>>>  migration/migration.c | 3 ++-
>>>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>>>> index f498ab8..0c1ee6d 100644
>>>>>> --- a/migration/migration.c
>>>>>> +++ b/migration/migration.c
>>>>>> @@ -1752,7 +1752,8 @@ fail_invalidate:
>>>>>>      /* If not doing postcopy, vm_start() will be called: let's regain
>>>>>>       * control on images.
>>>>>>       */
>>>>>> -    if (s->state == MIGRATION_STATUS_ACTIVE) {
>>>>>
>>>>> This if condition tries to check whether we ran the code path that
>>>>> called bdrv_inactivate_all(), so that we only try to reactivate images
>>>>> if we really inactivated them first.
>>>>>
>>>>> The problem with it is that it ignores a possible concurrent
>>>>> modification of s->state.
>>>>>
>>>>>> +    if (s->state == MIGRATION_STATUS_ACTIVE ||
>>>>>> +        s->state == MIGRATION_STATUS_CANCELLING) {
>>>>>
>>>>> This adds another state that we could end up with through a concurrent
>>>>> modification, so that even in this case we undo the inactivation.
>>>>>
>>>>> However, it is no longer limited to the cases where we inactivated the
>>>>> images. It also applies to other code paths (like the postcopy one) where
>>>>> we didn't inactivate images.
>>>>>
>>>>> What saves the patch is that bdrv_invalidate_cache() is a no-op for
>>>>> block devices that aren't inactivated, so calling it more often than
>>>>> necessary is okay.
>>>>>
>>>>> But then, if we're going to rely on this, it would be much better to
>>>>> just remove the if altogether. I can't say whether there are any other
>>>>> possible values of s->state that we should consider, and by removing the
>>>>> if we would be guaranteed to catch all of them.
>>>>>
>>>>> If we don't want to rely on it, just keep a local bool that remembers
>>>>> whether we inactivated images and check that here.
>>>>>
>>>>>>          Error *local_err = NULL;
>>>>>>
>>>>>>          bdrv_invalidate_cache_all(&local_err);
>>>>>
>>>>> So in summary, this is a horrible patch because it checks the wrong
>>>>> thing, and I can't really say if it covers everything it needs to
>>>>> cover, but arguably it happens to correctly fix the outcome of a
>>>>> previously failing case.
>>>>>
>>>>> Normally I would reject such a patch and require a clean solution, but
>>>>> then we're on the day of -rc3, so if you can't send v2 right away, we
>>>>> might not have the time for it.
>>>>>
>>>>> Tough call...
>>>>
>>>> Hmm, this case is messy; I created this function having split it out
>>>> of the main loop a couple of years back but it did get more messy
>>>> with more s->state checks; as far as I can tell it's always
>>>> done the transition to COMPLETED at the end well after the locked
>>>> section, so there's always been that chance that cancellation sneaks
>>>> in just before or just after the locked section.
>>>>
>>>> Some of the bad cases that can happen:
>>>> a) A cancel sneaks in after the ACTIVE check but before or after
>>>>    the locked section; should we reactivate the disks? Well, that
>>>>    depends on whether the destination actually got the full migration
>>>>    stream - we don't know!
>>>>    If the destination actually starts running we must not reactivate
>>>>    the disks on the source even if the CPU is stopped.
>>>>
>>>
>>> Yes, we don't have a mechanism to know exactly whether or not the
>>> destination received the VM intact.
>>>
>>>> b) If bdrv_inactivate_all() fails for one device, but the others
>>>>    are fine, we go down the fail: label and don't reactivate, so
>>>>    the source dies even though it might have been mostly OK.
>>>>
>>>
>>>> We can move the _lock to before the check of s->state at the top,
>>>> which would stop the cancel sneaking in early.
>>>> In the case where postcopy was never enabled (!migrate_postcopy_ram())
>>>> we can move the COMPLETED transition into the lock as well; so I think
>>>> then we kind of become safe.
>>>> In the case where postcopy was enabled I think we can do the COMPLETED
>>>> transition before waiting for the return path to close - I think, but
>>>> I need to think more about that.
>>>> And there seem to be some dodgy cases where we call the invalidate
>>>> there after a late postcopy failure; that's bad, we shouldn't reactivate
>>>> the source disks after going into postcopy.
>>>>
>>>> So, in summary; this function is a mess - it needs a much bigger
>>>> fix than this patch.
>>>>
>>>
>>> So what's the conclusion?
>>> Will you send a patch to fix it? Or shall we fix it step by step?
>>> I think Kevin's suggestion to just remove the *if* check is OK.
>>
>> I don't see that any of the simple solutions are easy; so I plan
>> to look at it properly, but am not sure when; if you have time
>> to do it then feel free.
>>
>
> Hmm, we still have a gap between bdrv_inactivate_all() and the migration
> thread completely exiting, into which a migration cancel can slip.
> We did hit that case: the cancel arrived after migration_completion() had
> finished but before the cleanup work began (which is protected by the
> global lock). The related code is:
>
>              migration_completion(s, current_active_state,
>                                   &old_vm_running, &start_time);
>              break;
>          }
>      }
> ------------------------------------------------------> gap begin
>          if (qemu_file_get_error(s->to_dst_file)) {
>              migrate_set_state(&s->state, current_active_state,
>                                MIGRATION_STATUS_FAILED);
>              trace_migration_thread_file_err();
>              break;
>          }
>          current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>          if (current_time >= initial_time + BUFFER_DELAY) {
>              uint64_t transferred_bytes = qemu_ftell(s->to_dst_file) -
>                                           initial_bytes;
>              uint64_t time_spent = current_time - initial_time;
>              double bandwidth = (double)transferred_bytes / time_spent;
>              max_size = bandwidth * s->parameters.downtime_limit;
>
>              s->mbps = (((double) transferred_bytes * 8.0) /
>                      ((double) time_spent / 1000.0)) / 1000.0 / 1000.0;
>
>              trace_migrate_transferred(transferred_bytes, time_spent,
>                                        bandwidth, max_size);
>              /* if we haven't sent anything, we don't want to recalculate
>                 10000 is a small enough number for our purposes */
>              if (s->dirty_bytes_rate && transferred_bytes > 10000) {
>                  s->expected_downtime = s->dirty_bytes_rate / bandwidth;
>              }
>
>              qemu_file_reset_rate_limit(s->to_dst_file);
>              initial_time = current_time;
>              initial_bytes = qemu_ftell(s->to_dst_file);
>          }
>          if (qemu_file_rate_limit(s->to_dst_file)) {
>              /* usleep expects microseconds */
>              g_usleep((initial_time + BUFFER_DELAY - current_time)*1000);
>          }
>      }
>
>      trace_migration_thread_after_loop();
>      /* If we enabled cpu throttling for auto-converge, turn it off. */
>      cpu_throttle_stop();
>      end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> --------------------------------------------------------------> gap end
>      qemu_mutex_lock_iothread();
>      /*
>
> So maybe we should call bdrv_invalidate_cache_all() in qmp_migrate_cancel()
> directly? I think that would cover all the cases.
> (Including another potential case, where QEMU has finished the migration
> process, but libvirtd decides to cancel the migration before shutting it
> down.)
>
> The thing I'm not sure about here is: is it OK to call
> bdrv_invalidate_cache_all() without bdrv_inactivate_all() having been
> called? If not, we need to record whether we have inactivated all block
> devices before calling bdrv_invalidate_cache_all().
> Example code:

Subject: [PATCH] migration: re-activate images when migration is cancelled
 after inactivating them

Signed-off-by: zhanghailiang
---
 include/migration/migration.h |  3 +++
 migration/migration.c         | 13 +++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index c309d23..2d5b724 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -177,6 +177,9 @@ struct MigrationState
     /* Flag set once the migration thread is running (and needs joining) */
     bool migration_thread_running;

+    /* Flag set once the migration thread called bdrv_inactivate_all */
+    bool block_inactive;
+
     /* Queue of outstanding page requests from the destination */
     QemuMutex src_page_req_mutex;
     QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests;

diff --git a/migration/migration.c b/migration/migration.c
index f498ab8..8525c00 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1006,6 +1006,16 @@ static void migrate_fd_cancel(MigrationState *s)
     if (s->state == MIGRATION_STATUS_CANCELLING && f) {
         qemu_file_shutdown(f);
     }
+    if (s->block_inactive) {
+        Error *local_err = NULL;
+
+        bdrv_invalidate_cache_all(&local_err);
+        if (local_err) {
+            error_report_err(local_err);
+        } else {
+            s->block_inactive = false;
+        }
+    }
 }

 void add_migration_state_change_notifier(Notifier *notify)
@@ -1705,6 +1715,7 @@ static void migration_completion(MigrationState *s, int current_active_state,
         if (ret >= 0) {
             qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
             qemu_savevm_state_complete_precopy(s->to_dst_file, false);
+            s->block_inactive = true;
         }
     }
     qemu_mutex_unlock_iothread();
@@ -1758,6 +1769,8 @@ fail_invalidate:
         bdrv_invalidate_cache_all(&local_err);
         if (local_err) {
             error_report_err(local_err);
+        } else {
+            s->block_inactive = false;
         }
     }
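
The example patch above is essentially Kevin's "remember whether we
inactivated" suggestion, just with the flag kept in MigrationState instead of
a local bool so that the cancel path can see it too. A standalone sketch of
the idea, in the same invented plain-C/pthreads model as before (again not
QEMU code; block_inactive here only plays the role of s->block_inactive),
would look roughly like this:

#include <pthread.h>
#include <stdio.h>

enum mig_state { MIG_ACTIVE, MIG_CANCELLING };

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static enum mig_state state = MIG_ACTIVE;
static int block_inactive;           /* plays the role of s->block_inactive */

static void *cancel_thread(void *arg)
{
    /* Stands in for migrate_fd_cancel() with the hunk above applied. */
    (void)arg;
    pthread_mutex_lock(&lock);
    if (state == MIG_ACTIVE) {
        state = MIG_CANCELLING;
    }
    if (block_inactive) {
        block_inactive = 0;          /* bdrv_invalidate_cache_all() succeeded */
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t t;

    /* Completion path: inactivate the images and record that fact. */
    pthread_mutex_lock(&lock);
    block_inactive = 1;              /* set right after bdrv_inactivate_all() */
    pthread_mutex_unlock(&lock);

    /* A cancel arriving at any later point now re-activates them itself. */
    pthread_create(&t, NULL, cancel_thread, NULL);
    pthread_join(t, NULL);

    /* Prints block_inactive=0: the source regains control of the images. */
    printf("state=%d block_inactive=%d\n", (int)state, block_inactive);
    return 0;
}

The flag also sidesteps the question above: the re-activation only runs when
we know bdrv_inactivate_all() was actually called, so we no longer have to
rely on bdrv_invalidate_cache_all() being a no-op for images that were never
inactivated.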