[V9,00/12] fix migration of suspended runstate

Message ID 1704312341-66640-1-git-send-email-steven.sistare@oracle.com (mailing list archive)

Message

Steven Sistare Jan. 3, 2024, 8:05 p.m. UTC
Migration of a guest in the suspended runstate is broken.  The incoming
migration code automatically tries to wake the guest, which is wrong;
the guest should end migration in the same runstate it started.  Further,
after saving a snapshot in the suspended state and loading it, the vm_start
fails.  The runstate is RUNNING, but the guest is not.

See the commit messages for the details.
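
As a rough illustration of the behaviour described above (a sketch only, not code from this series): the destination should finish migration in the runstate the source started in, rather than waking a suspended guest. Every name below is hypothetical, the enum is a small stand-in for QEMU's RunState, and treating "running or suspended" as live is an assumption about what runstate_is_live checks.

```c
#include <stdbool.h>

/* Hypothetical stand-in for QEMU's RunState; only the states needed here. */
typedef enum {
    DEMO_RUN_STATE_RUNNING,
    DEMO_RUN_STATE_SUSPENDED,
    DEMO_RUN_STATE_PAUSED,
} DemoRunState;

/*
 * Assumption: a guest is "live" if it is running or suspended, i.e. it
 * has been started and was not stopped by the user.
 */
static bool demo_runstate_is_live(DemoRunState state)
{
    return state == DEMO_RUN_STATE_RUNNING ||
           state == DEMO_RUN_STATE_SUSPENDED;
}

/*
 * Runstate the destination ends in once incoming migration completes.
 *
 * preserve == false models the broken behaviour: any live guest is
 * started, so a suspended guest is woken and reported as RUNNING.
 * preserve == true models the intended behaviour: the destination ends
 * in the same runstate the source started in.
 */
static DemoRunState demo_incoming_final_state(DemoRunState source_state,
                                              bool preserve)
{
    if (!preserve && demo_runstate_is_live(source_state)) {
        return DEMO_RUN_STATE_RUNNING;
    }
    return source_state;
}
```

With preserve set, a source in DEMO_RUN_STATE_SUSPENDED yields a destination in DEMO_RUN_STATE_SUSPENDED; without it, the same source yields DEMO_RUN_STATE_RUNNING, which is the mismatch this series addresses.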

Changes in V2:
  * simplify "start on wakeup request"
  * fix postcopy, snapshot, and background migration
  * refactor fixes for each type of migration
  * explicitly handle suspended events and runstate in tests
  * add test for postcopy and background migration

Changes in V3:
  * rebase to tip
  * fix hang in new function migrate_wait_for_dirty_mem

Changes in V4:
  * rebase to tip
  * add patch for vm_prepare_start (thanks Peter)
  * add patch to preserve cpu ticks

Changes in V5:
  * rebase to tip
  * added patches to completely stop vm in suspended state:
      cpus: refactor vm_stop
      cpus: stop vm in suspended state
  * added patch to partially resume vm in suspended state:
      cpus: start vm in suspended state
  * modified "preserve suspended ..." patches to use the above.
  * deleted patch "preserve cpu ticks if suspended".  stop ticks in
    vm_stop_force_state instead.
  * deleted patch "add runstate function".  defined new helper function
    migrate_new_runstate in "preserve suspended runstate"
  * Added some RB's, but removed other RB's because the patches changed.

Changes in V6:
  * all vm_stop calls completely stop the suspended state
  * refactored and updated the "cpus" patches
  * simplified the "preserve suspended" patches
  * added patch "bootfile per vm"

Changes in V7:
  * rebase to tip, add RB-s
  * fix backwards compatibility for global_state.vm_was_suspended
  * delete vm_prepare_start state argument, and rename patch
    "pass runstate to vm_prepare_start" to
    "check running not RUN_STATE_RUNNING"
  * drop patches:
      tests/qtest: bootfile per vm
      tests/qtest: background migration with suspend
  * rename runstate_is_started to runstate_is_live
  * move wait_for_suspend in tests

Changes in V8:
  * rebase to tip
  * add RB's
  * add comment for runstate_is_live
  * simplify global_state: the needed function and its use of vm_was_suspended

Changes in V9:
  * rebase to tip
  * update commit message and doc in "stop vm in suspended runstate"

Steve Sistare (12):
  cpus: vm_was_suspended
  cpus: stop vm in suspended runstate
  cpus: check running not RUN_STATE_RUNNING
  cpus: vm_resume
  migration: propagate suspended runstate
  migration: preserve suspended runstate
  migration: preserve suspended for snapshot
  migration: preserve suspended for bg_migration
  tests/qtest: migration events
  tests/qtest: option to suspend during migration
  tests/qtest: precopy migration with suspend
  tests/qtest: postcopy migration with suspend

 backends/tpm/tpm_emulator.c          |   2 +-
 hw/usb/hcd-ehci.c                    |   2 +-
 hw/usb/redirect.c                    |   2 +-
 hw/xen/xen-hvm-common.c              |   2 +-
 include/migration/snapshot.h         |   7 ++
 include/sysemu/runstate.h            |  20 ++++
 migration/global_state.c             |  47 +++++----
 migration/migration-hmp-cmds.c       |   8 +-
 migration/migration.c                |  15 +--
 migration/savevm.c                   |  23 +++--
 qapi/misc.json                       |  11 ++-
 qapi/run-state.json                  |   6 +-
 system/cpus.c                        |  47 +++++++--
 system/runstate.c                    |   9 ++
 system/vl.c                          |   2 +
 tests/migration/i386/Makefile        |   5 +-
 tests/migration/i386/a-b-bootblock.S |  50 +++++++++-
 tests/migration/i386/a-b-bootblock.h |  26 +++--
 tests/qtest/migration-helpers.c      |  27 ++----
 tests/qtest/migration-helpers.h      |  11 ++-
 tests/qtest/migration-test.c         | 181 +++++++++++++++++++++++++----------
 21 files changed, 356 insertions(+), 147 deletions(-)

Comments

Peter Xu Jan. 4, 2024, 4:37 a.m. UTC | #1
On Wed, Jan 03, 2024 at 12:05:29PM -0800, Steve Sistare wrote:
> Migration of a guest in the suspended runstate is broken.  The incoming
> migration code automatically tries to wake the guest, which is wrong;
> the guest should end migration in the same runstate it started.  Further,
> after saving a snapshot in the suspended state and loading it, the vm_start
> fails.  The runstate is RUNNING, but the guest is not.
> 
> See the commit messages for the details.

I was planning to wait for an ack from Markus, but I noticed Markus will
only be back next week.  So I queued it for now, and we can work on top
just in case.

Thanks,

Markus Armbruster Jan. 8, 2024, 12:47 p.m. UTC | #2
Peter Xu <peterx@redhat.com> writes:

> On Wed, Jan 03, 2024 at 12:05:29PM -0800, Steve Sistare wrote:
>> Migration of a guest in the suspended runstate is broken.  The incoming
>> migration code automatically tries to wake the guest, which is wrong;
>> the guest should end migration in the same runstate it started.  Further,
>> after saving a snapshot in the suspended state and loading it, the vm_start
>> fails.  The runstate is RUNNING, but the guest is not.
>> 
>> See the commit messages for the details.
>
> I was planning to wait for an ack from Markus, but I noticed Markus will
> only be back next week.  So I queued it for now, and we can work on top
> just in case.

Merged into master in commit c8193acc078e297fd46b6229e02b819b65c6702e.

I had a look at the QAPI schema changes [PATCH 02].  They look good to
me now.  Thanks!