[v1,4/4] migration/multifd: Move load_cleanup inside incoming_state_destroy

Message ID 20230210063630.532185-4-leobras@redhat.com (mailing list archive)
State New, archived
Series [v1,1/4] migration/multifd: Change multifd_load_cleanup() signature and usage

Commit Message

Leonardo Bras Feb. 10, 2023, 6:36 a.m. UTC
Currently running migration_incoming_state_destroy() without first running
multifd_load_cleanup() will cause a yank error:

qemu-system-x86_64: ../util/yank.c:107: yank_unregister_instance:
Assertion `QLIST_EMPTY(&entry->yankfns)' failed.
(core dumped)

The above error happens in the target host, when multifd is being used
for precopy, and then postcopy is triggered and the migration finishes.
This will crash the VM in the target host.
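
The failing assertion can be pictured with a toy model. The sketch below is not QEMU's yank code: every name in it (ToyInstance, toy_register_function() and so on) is hypothetical, and it assumes only what the assertion text shows, namely that an instance must have an empty function list before it is unregistered. Where the real code asserts, the toy returns false so the ordering problem can be observed without aborting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not QEMU's yank API. */
#define TOY_MAX_FNS 8

typedef void (*toy_fn)(void *opaque);

typedef struct {
    toy_fn fns[TOY_MAX_FNS];  /* callbacks still registered on the instance */
    size_t nfns;
    bool registered;
} ToyInstance;

static void toy_register_instance(ToyInstance *inst)
{
    inst->nfns = 0;
    inst->registered = true;
}

/* Add one callback; returns false if the table is full. */
static bool toy_register_function(ToyInstance *inst, toy_fn fn)
{
    if (inst->nfns == TOY_MAX_FNS) {
        return false;
    }
    inst->fns[inst->nfns++] = fn;
    return true;
}

/* Remove one callback; returns false if it was never registered. */
static bool toy_unregister_function(ToyInstance *inst, toy_fn fn)
{
    for (size_t i = 0; i < inst->nfns; i++) {
        if (inst->fns[i] == fn) {
            inst->fns[i] = inst->fns[--inst->nfns];
            return true;
        }
    }
    return false;
}

/*
 * Counterpart of the failing check: the instance may only go away once
 * its callback list is empty. The toy reports the violation instead of
 * asserting.
 */
static bool toy_unregister_instance(ToyInstance *inst)
{
    if (inst->nfns != 0) {
        return false;
    }
    inst->registered = false;
    return true;
}

/* Stand-in for a per-channel callback left behind by a receive channel. */
static void toy_channel_shutdown(void *opaque)
{
    (void)opaque;
}
```

In this model, calling toy_unregister_instance() while toy_channel_shutdown is still registered fails, and succeeds once toy_unregister_function() has run. That mirrors the ordering described above: the per-channel cleanup has to happen before the incoming state is destroyed.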

To avoid that, move multifd_load_cleanup() inside
migration_incoming_state_destroy(), so that the load cleanup becomes part
of the incoming state destroying process.

Running multifd_load_cleanup() twice can be a problem, but the only place
where it could run twice is process_incoming_migration_bh(), so that extra
call has to be removed.

On the other hand, that multifd_load_cleanup() call happens well before
migration_incoming_state_destroy(), and having it run before
dirty_bitmap_mig_before_vm_start() and vm_start() may be necessary.

So introduce a new function multifd_load_shutdown() that will mainly stop
all multifd threads and close their QIOChannels. Then use this function
instead of multifd_load_cleanup() to make sure nothing else is received
before dirty_bitmap_mig_before_vm_start().
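
The split between an early shutdown and a late cleanup can be sketched as a two-phase teardown. This is a toy model under stated assumptions, not the patch itself: ToyRecvState and all toy_* functions are hypothetical, the buffer stands in for whatever per-channel resources the real cleanup releases, and the NULL guard that makes a second cleanup harmless is a choice made for the sketch rather than a claim about multifd_load_cleanup().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy model only: hypothetical names, not the multifd implementation. */
typedef struct {
    bool running;   /* receive side still accepting data */
    int *buffers;   /* stands in for per-channel resources */
} ToyRecvState;

static ToyRecvState *toy_state;

/* Allocate the receive state; returns false on allocation failure. */
static bool toy_load_setup(void)
{
    toy_state = calloc(1, sizeof(*toy_state));
    if (!toy_state) {
        return false;
    }
    toy_state->buffers = calloc(4, sizeof(int));
    if (!toy_state->buffers) {
        free(toy_state);
        toy_state = NULL;
        return false;
    }
    toy_state->running = true;
    return true;
}

/*
 * Phase 1: stop receiving. Frees nothing, so it is safe to call early
 * and more than once.
 */
static void toy_load_shutdown(void)
{
    if (toy_state) {
        toy_state->running = false;
    }
}

/* Phase 2: release everything. Guarded so a second call is a no-op. */
static void toy_load_cleanup(void)
{
    if (!toy_state) {
        return;
    }
    toy_load_shutdown();
    free(toy_state->buffers);
    free(toy_state);
    toy_state = NULL;
}

/* Stand-in for the destroy path: cleanup is now part of destroy. */
static void toy_incoming_state_destroy(void)
{
    toy_load_cleanup();
}
```

A caller in this model would run toy_load_shutdown() at the point where no more data may arrive, and leave the freeing to toy_incoming_state_destroy() at the very end, which matches the ordering this patch is aiming for.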

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/multifd.h   | 1 +
 migration/migration.c | 4 +++-
 migration/multifd.c   | 7 +++++++
 3 files changed, 11 insertions(+), 1 deletion(-)

Comments

Leonardo Bras Feb. 10, 2023, 6:40 a.m. UTC | #1
On Fri, 2023-02-10 at 03:36 -0300, Leonardo Bras wrote:
> Currently running migration_incoming_state_destroy() without first running
> multifd_load_cleanup() will cause a yank error:
> 
> qemu-system-x86_64: ../util/yank.c:107: yank_unregister_instance:
> Assertion `QLIST_EMPTY(&entry->yankfns)' failed.
> (core dumped)
> 
> The above error happens in the target host, when multifd is being used
> for precopy, and then postcopy is triggered and the migration finishes.
> This will crash the VM in the target host.
> 
> To avoid that, move multifd_load_cleanup() inside
> migration_incoming_state_destroy(), so that the load cleanup becomes part
> of the incoming state destroying process.
> 
> Running multifd_load_cleanup() twice can be a problem, but the only place
> where it could run twice is process_incoming_migration_bh(), so that extra
> call has to be removed.
> 
> On the other hand, that multifd_load_cleanup() call happens well before
> migration_incoming_state_destroy(), and having it run before
> dirty_bitmap_mig_before_vm_start() and vm_start() may be necessary.
> 
> So introduce a new function multifd_load_shutdown() that will mainly stop
> all multifd threads and close their QIOChannels. Then use this function
> instead of multifd_load_cleanup() to make sure nothing else is received
> before dirty_bitmap_mig_before_vm_start().

Please add:

Fixes: b5eea99ec2 ("migration: Add yank feature")
Reported-by: Li Xiaohui <xiaohli@redhat.com>

> 
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>  migration/multifd.h   | 1 +
>  migration/migration.c | 4 +++-
>  migration/multifd.c   | 7 +++++++
>  3 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/multifd.h b/migration/multifd.h
> index 9a7e1a8826..7cfc265148 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -17,6 +17,7 @@ int multifd_save_setup(Error **errp);
>  void multifd_save_cleanup(void);
>  int multifd_load_setup(Error **errp);
>  void multifd_load_cleanup(void);
> +void multifd_load_shutdown(void);
>  bool multifd_recv_all_channels_created(void);
>  void multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
>  void multifd_recv_sync_main(void);
> diff --git a/migration/migration.c b/migration/migration.c
> index ce962ea577..9f69447320 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -302,6 +302,8 @@ void migration_incoming_state_destroy(void)
>  {
>      struct MigrationIncomingState *mis = migration_incoming_get_current();
>  
> +    multifd_load_cleanup();
> +
>      if (mis->to_src_file) {
>          /* Tell source that we are done */
>          migrate_send_rp_shut(mis, qemu_file_get_error(mis->from_src_file) != 0);
> @@ -543,7 +545,7 @@ static void process_incoming_migration_bh(void *opaque)
>       */
>      qemu_announce_self(&mis->announce_timer, migrate_announce_params());
>  
> -    multifd_load_cleanup();
> +    multifd_load_shutdown();
>  
>      dirty_bitmap_mig_before_vm_start();
>  
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 7e37a459ed..9302c9f6cf 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -1022,6 +1022,13 @@ static void multifd_recv_terminate_threads(Error *err)
>      }
>  }
>  
> +void multifd_load_shutdown(void)
> +{
> +    if (migrate_use_multifd() && migrate_multi_channels_is_allowed()) {
> +        multifd_recv_terminate_threads(NULL);
> +    }
> +}
> +
>  void multifd_load_cleanup(void)
>  {
>      int i;

Juan Quintela Feb. 10, 2023, 12:51 p.m. UTC | #2
Leonardo Bras <leobras@redhat.com> wrote:
> Currently running migration_incoming_state_destroy() without first running
> multifd_load_cleanup() will cause a yank error:
>
> qemu-system-x86_64: ../util/yank.c:107: yank_unregister_instance:
> Assertion `QLIST_EMPTY(&entry->yankfns)' failed.
> (core dumped)
>
> The above error happens in the target host, when multifd is being used
> for precopy, and then postcopy is triggered and the migration finishes.
> This will crash the VM in the target host.
>
> To avoid that, move multifd_load_cleanup() inside
> migration_incoming_state_destroy(), so that the load cleanup becomes part
> of the incoming state destroying process.
>
> Running multifd_load_cleanup() twice can be a problem, but the only place
> where it could run twice is process_incoming_migration_bh(), so that extra
> call has to be removed.
>
> On the other hand, that multifd_load_cleanup() call happens well before
> migration_incoming_state_destroy(), and having it run before
> dirty_bitmap_mig_before_vm_start() and vm_start() may be necessary.
>
> So introduce a new function multifd_load_shutdown() that will mainly stop
> all multifd threads and close their QIOChannels. Then use this function
> instead of multifd_load_cleanup() to make sure nothing else is received
> before dirty_bitmap_mig_before_vm_start().
>
> Signed-off-by: Leonardo Bras <leobras@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

Patch

diff --git a/migration/multifd.h b/migration/multifd.h
index 9a7e1a8826..7cfc265148 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -17,6 +17,7 @@  int multifd_save_setup(Error **errp);
 void multifd_save_cleanup(void);
 int multifd_load_setup(Error **errp);
 void multifd_load_cleanup(void);
+void multifd_load_shutdown(void);
 bool multifd_recv_all_channels_created(void);
 void multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
 void multifd_recv_sync_main(void);
diff --git a/migration/migration.c b/migration/migration.c
index ce962ea577..9f69447320 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -302,6 +302,8 @@  void migration_incoming_state_destroy(void)
 {
     struct MigrationIncomingState *mis = migration_incoming_get_current();
 
+    multifd_load_cleanup();
+
     if (mis->to_src_file) {
         /* Tell source that we are done */
         migrate_send_rp_shut(mis, qemu_file_get_error(mis->from_src_file) != 0);
@@ -543,7 +545,7 @@  static void process_incoming_migration_bh(void *opaque)
      */
     qemu_announce_self(&mis->announce_timer, migrate_announce_params());
 
-    multifd_load_cleanup();
+    multifd_load_shutdown();
 
     dirty_bitmap_mig_before_vm_start();
 
diff --git a/migration/multifd.c b/migration/multifd.c
index 7e37a459ed..9302c9f6cf 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -1022,6 +1022,13 @@  static void multifd_recv_terminate_threads(Error *err)
     }
 }
 
+void multifd_load_shutdown(void)
+{
+    if (migrate_use_multifd() && migrate_multi_channels_is_allowed()) {
+        multifd_recv_terminate_threads(NULL);
+    }
+}
+
 void multifd_load_cleanup(void)
 {
     int i;