migration: re-activate images when migration fails to complete

Message ID 58624490.1090609@huawei.com (mailing list archive)
State New, archived

Commit Message

Zhanghailiang Dec. 27, 2016, 10:38 a.m. UTC
On 2016/12/22 10:56, Hailiang Zhang wrote:
> On 2016/12/9 4:02, Dr. David Alan Gilbert wrote:
>> * Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote:
>>> Hi,
>>>
>>> On 2016/12/6 23:24, Dr. David Alan Gilbert wrote:
>>>> * Kevin Wolf (kwolf@redhat.com) wrote:
>>>>> Am 19.11.2016 um 12:43 hat zhanghailiang geschrieben:
>>>>>> commit fe904ea8242cbae2d7e69c052c754b8f5f1ba1d6 fixed a case in
>>>>>> which migration aborted QEMU because it didn't regain control of
>>>>>> the images after some errors happened.
>>>>>>
>>>>>> Actually, we have another case in that error path that aborts QEMU
>>>>>> for the same reason:
>>>>>>        migration_thread()
>>>>>>            migration_completion()
>>>>>>               bdrv_inactivate_all() ----------------> inactivate images
>>>>>>               qemu_savevm_state_complete_precopy()
>>>>>>                   socket_writev_buffer() --------> error because destination fails
>>>>>>                 qemu_fflush() -------------------> set error on migration stream
>>>>>>               qemu_mutex_unlock_iothread() ------> unlock
>>>>>>        qmp_migrate_cancel() ---------------------> user cancelled migration
>>>>>>            migrate_set_state() ------------------> set migrate CANCELLING
>>>>>
>>>>> Important to note here: qmp_migrate_cancel() is executed by a concurrent
>>>>> thread; it doesn't depend on any code paths in migration_completion().
>>>>>
>>>>>>        migration_completion() -----------------> go on to fail_invalidate
>>>>>>            if (s->state == MIGRATION_STATUS_ACTIVE) -> Jump this branch
>>>>>>        migration_thread() -----------------------> break migration loop
>>>>>>          vm_start() -----------------------------> restart guest with inactive
>>>>>>                                                    images
>>>>>> We fail to regain control of the images because we only do so while
>>>>>> the migration state is "active", but here the user cancelled the migration
>>>>>> after noticing that errors had happened (for example, the libvirtd daemon
>>>>>> was shut down unexpectedly on the destination).
>>>>>>
>>>>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>>>>> ---
>>>>>>     migration/migration.c | 3 ++-
>>>>>>     1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>>>> index f498ab8..0c1ee6d 100644
>>>>>> --- a/migration/migration.c
>>>>>> +++ b/migration/migration.c
>>>>>> @@ -1752,7 +1752,8 @@ fail_invalidate:
>>>>>>         /* If not doing postcopy, vm_start() will be called: let's regain
>>>>>>          * control on images.
>>>>>>          */
>>>>>> -    if (s->state == MIGRATION_STATUS_ACTIVE) {
>>>>>
>>>>> This if condition tries to check whether we ran the code path that
>>>>> called bdrv_inactivate_all(), so that we only try to reactivate images
>>>>> if we really inactivated them first.
>>>>>
>>>>> The problem with it is that it ignores a possible concurrent
>>>>> modification of s->state.
>>>>>
>>>>>> +    if (s->state == MIGRATION_STATUS_ACTIVE ||
>>>>>> +        s->state == MIGRATION_STATUS_CANCELLING) {
>>>>>
>>>>> This adds another state that we could end up in due to a concurrent
>>>>> modification, so that even in this case we undo the inactivation.
>>>>>
>>>>> However, it is no longer limited to the cases where we inactivated the
>>>>> image. It also applies to other code paths (like the postcopy one) where
>>>>> we didn't inactivate images.
>>>>>
>>>>> What saves the patch is that bdrv_invalidate_cache() is a no-op for
>>>>> block devices that aren't inactivated, so calling it more often than
>>>>> necessary is okay.
>>>>>
>>>>> But then, if we're going to rely on this, it would be much better to
>>>>> just remove the if altogether. I can't say whether there are any other
>>>>> possible values of s->state that we should consider, and by removing the
>>>>> if we would be guaranteed to catch all of them.
>>>>>
>>>>> If we don't want to rely on it, just keep a local bool that remembers
>>>>> whether we inactivated images and check that here.
>>>>>
>>>>>>             Error *local_err = NULL;
>>>>>>
>>>>>>             bdrv_invalidate_cache_all(&local_err);
>>>>>
>>>>> So in summary, this is a horrible patch because it checks the wrong
>>>>> thing, and I can't really say if it covers everything it needs to
>>>>> cover, but arguably it happens to correctly fix the outcome of a
>>>>> previously failing case.
>>>>>
>>>>> Normally I would reject such a patch and require a clean solution, but
>>>>> then we're on the day of -rc3, so if you can't send v2 right away, we
>>>>> might not have the time for it.
>>>>>
>>>>> Tough call...
>>>>
>>>> Hmm, this case is messy; I created this function having split it out
>>>> of the main loop a couple of years back but it did get more messy
>>>> with more s->state checks; as far as I can tell it's always
>>>> done the transition to COMPLETED at the end well after the locked
>>>> section, so there's always been that chance that cancellation sneaks
>>>> in just before or just after the locked section.
>>>>
>>>> Some of the bad cases that can happen:
>>>>       a) A cancel sneaks in after the ACTIVE check but before or after
>>>>         the locked section;  should we reactivate the disks? Well that
>>>>         depends on whether the destination actually got the full migration
>>>>         stream - we don't know!
>>>>            If the destination actually starts running we must not reactivate
>>>>            the disk on the source even if the CPU is stopped.
>>>>
>>>
>>> Yes, we don't have a mechanism to know exactly whether or not the VM was
>>> received correctly on the destination.
>>>
>>>>       b) If bdrv_inactivate_all() fails for one device, but the others
>>>>          are fine, we go down the fail: label and don't reactivate, so
>>>>          the source dies even though it might have been mostly OK.
>>>>
>>>
>>>> We can move the _lock to before the check of s->state at the top,
>>>> which would stop the cancel sneaking in early.
>>>> In the case where postcopy was never enabled (!migrate_postcopy_ram())
>>>> we can move the COMPLETED transition into the lock as well; so I think
>>>> then we kind of become safe.
>>>> In the case where postcopy was enabled I think we can do the COMPLETED
>>>> transition before waiting for the return path to close - I think but
>>>> I need to think more about that.
>>>> And there seem to be some dodgy cases where we call the invalidate
>>>> there after a late postcopy failure; that's bad, we shouldn't reactivate
>>>> the source disks after going into postcopy.
>>>>
>>>> So, in summary; this function is a mess - it needs a much bigger
>>>> fix than this patch.
>>>>
>>>
>>> So what's the conclusion?
>>> Will you send a patch to fix it? Or shall we fix it step by step?
>>> I think Kevin's suggestion to just remove the *if* check is OK.
>>
>> I don't think any of the simple solutions are easy; so I plan
>> to look at it properly, but I'm not sure when; if you have time
>> to do it then feel free.
>>
>
> Hmm, we still have a gap between bdrv_inactivate_all() and the migration
> thread fully exiting, into which a migration cancel can slip.
> We did catch that case after migration_completion() finished but before
> the cleanup work began (which is protected by the global lock).
> The related code is:
>
>                   migration_completion(s, current_active_state,
>                                        &old_vm_running, &start_time);
>                   break;
>               }
>           }
> ------------------------------------------------------> gap begin
>           if (qemu_file_get_error(s->to_dst_file)) {
>               migrate_set_state(&s->state, current_active_state,
>                                 MIGRATION_STATUS_FAILED);
>               trace_migration_thread_file_err();
>               break;
>           }
>           current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>           if (current_time >= initial_time + BUFFER_DELAY) {
>               uint64_t transferred_bytes = qemu_ftell(s->to_dst_file) -
>                                            initial_bytes;
>               uint64_t time_spent = current_time - initial_time;
>               double bandwidth = (double)transferred_bytes / time_spent;
>               max_size = bandwidth * s->parameters.downtime_limit;
>
>               s->mbps = (((double) transferred_bytes * 8.0) /
>                       ((double) time_spent / 1000.0)) / 1000.0 / 1000.0;
>
>               trace_migrate_transferred(transferred_bytes, time_spent,
>                                         bandwidth, max_size);
>               /* if we haven't sent anything, we don't want to recalculate
>                  10000 is a small enough number for our purposes */
>               if (s->dirty_bytes_rate && transferred_bytes > 10000) {
>                   s->expected_downtime = s->dirty_bytes_rate / bandwidth;
>               }
>
>               qemu_file_reset_rate_limit(s->to_dst_file);
>               initial_time = current_time;
>               initial_bytes = qemu_ftell(s->to_dst_file);
>           }
>           if (qemu_file_rate_limit(s->to_dst_file)) {
>               /* usleep expects microseconds */
>               g_usleep((initial_time + BUFFER_DELAY - current_time)*1000);
>           }
>       }
>
>       trace_migration_thread_after_loop();
>       /* If we enabled cpu throttling for auto-converge, turn it off. */
>       cpu_throttle_stop();
>       end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> --------------------------------------------------------------> gap end
>       qemu_mutex_lock_iothread();
>       /*
>
> So maybe we should call bdrv_invalidate_cache_all() in qmp_migrate_cancel()
> directly, which I think will cover all the cases (including another potential
> case where QEMU has finished the migration process, but libvirtd decides to
> cancel the migration before shutting it down).
>
> The thing I'm not sure about is: is it OK to call bdrv_invalidate_cache_all()
> without bdrv_inactivate_all() having been called? If not, we need to record
> whether we have inactivated all blocks before calling bdrv_invalidate_cache_all().
>

Example code:

Subject: [PATCH] migration: re-activate images when migration is cancelled
  after inactivating them

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
---
  include/migration/migration.h |  3 +++
  migration/migration.c         | 13 +++++++++++++
  2 files changed, 16 insertions(+)

Patch

diff --git a/include/migration/migration.h b/include/migration/migration.h
index c309d23..2d5b724 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -177,6 +177,9 @@  struct MigrationState
      /* Flag set once the migration thread is running (and needs joining) */
      bool migration_thread_running;

+    /* Flag set once the migration thread called bdrv_inactivate_all */
+    bool block_inactive;
+
      /* Queue of outstanding page requests from the destination */
      QemuMutex src_page_req_mutex;
      QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests;
diff --git a/migration/migration.c b/migration/migration.c
index f498ab8..8525c00 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1006,6 +1006,16 @@  static void migrate_fd_cancel(MigrationState *s)
      if (s->state == MIGRATION_STATUS_CANCELLING && f) {
          qemu_file_shutdown(f);
      }
+    if (s->block_inactive) {
+        Error *local_err = NULL;
+
+        bdrv_invalidate_cache_all(&local_err);
+        if (local_err) {
+            error_report_err(local_err);
+        } else {
+            s->block_inactive = false;
+        }
+    }
  }

  void add_migration_state_change_notifier(Notifier *notify)
@@ -1705,6 +1715,7 @@  static void migration_completion(MigrationState *s, int current_active_state,
              if (ret >= 0) {
                  qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
                  qemu_savevm_state_complete_precopy(s->to_dst_file, false);
+                s->block_inactive = true;
              }
          }
          qemu_mutex_unlock_iothread();
@@ -1758,6 +1769,8 @@  fail_invalidate:
          bdrv_invalidate_cache_all(&local_err);
          if (local_err) {
              error_report_err(local_err);
+        } else {
+            s->block_inactive = false;
          }
      }
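
For comparison (not part of the patch above): a rough, untested sketch of the
purely local variant Kevin suggested, i.e. remembering the inactivation inside
migration_completion() itself rather than in MigrationState. The "..." lines
stand for the unchanged code and the names/placement are only illustrative:

    static void migration_completion(MigrationState *s, int current_active_state,
                                     bool *old_vm_running, int64_t *start_time)
    {
        bool block_inactive = false;      /* did we inactivate the images? */
        ...
                ret = bdrv_inactivate_all();
                if (ret >= 0) {
                    block_inactive = true;
                    qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
                    qemu_savevm_state_complete_precopy(s->to_dst_file, false);
                }
        ...
    fail_invalidate:
        /* Only reactivate if we really inactivated the images above. */
        if (block_inactive) {
            Error *local_err = NULL;

            bdrv_invalidate_cache_all(&local_err);
            if (local_err) {
                error_report_err(local_err);
            }
        }
        ...
    }

A local variable like this can only cover the fail_invalidate path inside
migration_completion(); the cancel path has to live in migrate_fd_cancel(),
which runs from a different thread and cannot see it, which is why the example
patch above keeps the flag in MigrationState instead.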