[v3,6/9] migration: Don't set FAILED state when cancelling

Message ID 20250213175927.19642-7-farosas@suse.de (mailing list archive)
State New
Series migration: Fix issues during qmp_migrate_cancel

Commit Message

Fabiano Rosas Feb. 13, 2025, 5:59 p.m. UTC
The expected outcome from qmp_migrate_cancel() is that the source
migration goes to the terminal state
MIGRATION_STATUS_CANCELLED. Anything different from this is a bug when
cancelling.

Make sure there is never a state transition from an unspecified state
into FAILED. Code that sets FAILED should always either make sure
that the old state is not CANCELLING or specify the old state.

Note that the destination is allowed to go into FAILED, so there's no
issue there.

(I don't think this is relevant as a backport because cancelling does
work, it just doesn't show the right state at the end)

Fixes: 3dde8fdbad ("migration: Merge precopy/postcopy on switchover start")
Fixes: d0edb8a173 ("migration: Create the postcopy preempt channel asynchronously")
Fixes: 8518278a6a ("migration: implementation of background snapshot thread")
Fixes: bf78a046b9 ("migration: refactor migrate_fd_connect failures")
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/migration.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)
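
For illustration, here is a minimal standalone sketch of the problem the commit message describes. It assumes migrate_set_state() behaves as a compare-and-swap on the state, so passing the current state as the "old state" makes the transition to FAILED succeed from anywhere, including CANCELLING. The enum values and the set_state(), fail_unguarded() and fail_guarded() names are simplified stand-ins invented for this sketch, not QEMU code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the migration states (illustrative only). */
enum {
    STATUS_ACTIVE,
    STATUS_CANCELLING,
    STATUS_CANCELLED,
    STATUS_FAILED,
};

/* Model: move *state to new_state only if it currently equals old_state. */
static bool set_state(atomic_int *state, int old_state, int new_state)
{
    assert(state != NULL);
    return atomic_compare_exchange_strong(state, &old_state, new_state);
}

/*
 * Pre-patch pattern: the "old state" is whatever is there right now, so
 * the transition succeeds from any state, CANCELLING included.
 */
static void fail_unguarded(atomic_int *state)
{
    set_state(state, atomic_load(state), STATUS_FAILED);
}

/*
 * Patched pattern: leave CANCELLING alone so that the cancel path can
 * finish the migration as CANCELLED.
 */
static void fail_guarded(atomic_int *state)
{
    int cur = atomic_load(state);

    if (cur != STATUS_CANCELLING) {
        set_state(state, cur, STATUS_FAILED);
    }
}

/* Small drivers: start in 'start', hit the failure path, report the result. */
static int final_state_unguarded(int start)
{
    atomic_int st;

    atomic_init(&st, start);
    fail_unguarded(&st);
    return atomic_load(&st);
}

static int final_state_guarded(int start)
{
    atomic_int st;

    atomic_init(&st, start);
    fail_guarded(&st);
    return atomic_load(&st);
}
```

In this model, a failure that arrives while a cancel is in progress overwrites CANCELLING in the unguarded version and is ignored in the guarded one; a failure from any other state still ends in FAILED in both.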

Comments

Peter Xu Feb. 13, 2025, 9:04 p.m. UTC | #1
On Thu, Feb 13, 2025 at 02:59:24PM -0300, Fabiano Rosas wrote:
> The expected outcome from qmp_migrate_cancel() is that the source
> migration goes to the terminal state
> MIGRATION_STATUS_CANCELLED. Anything different from this is a bug when
> cancelling.
> 
> Make sure there is never a state transition from an unspecified state
> into FAILED. Code that sets FAILED, should always either make sure
> that the old state is not CANCELLING or specify the old state.
> 
> Note that the destination is allowed to go into FAILED, so there's no
> issue there.
> 
> (I don't think this is relevant as a backport because cancelling does
> work, it just doesn't show the right state at the end)
> 
> Fixes: 3dde8fdbad ("migration: Merge precopy/postcopy on switchover start")
> Fixes: d0edb8a173 ("migration: Create the postcopy preempt channel asynchronously")
> Fixes: 8518278a6a ("migration: implementation of background snapshot thread")
> Fixes: bf78a046b9 ("migration: refactor migrate_fd_connect failures")
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

Don't you like a helper such as migrate_set_state_failure(MigrationState *s)?
Not a huge deal, though.

Reviewed-by: Peter Xu <peterx@redhat.com>
Fabiano Rosas Feb. 14, 2025, 12:25 p.m. UTC | #2
Peter Xu <peterx@redhat.com> writes:

> On Thu, Feb 13, 2025 at 02:59:24PM -0300, Fabiano Rosas wrote:
>> The expected outcome from qmp_migrate_cancel() is that the source
>> migration goes to the terminal state
>> MIGRATION_STATUS_CANCELLED. Anything different from this is a bug when
>> cancelling.
>> 
>> Make sure there is never a state transition from an unspecified state
>> into FAILED. Code that sets FAILED, should always either make sure
>> that the old state is not CANCELLING or specify the old state.
>> 
>> Note that the destination is allowed to go into FAILED, so there's no
>> issue there.
>> 
>> (I don't think this is relevant as a backport because cancelling does
>> work, it just doesn't show the right state at the end)
>> 
>> Fixes: 3dde8fdbad ("migration: Merge precopy/postcopy on switchover start")
>> Fixes: d0edb8a173 ("migration: Create the postcopy preempt channel asynchronously")
>> Fixes: 8518278a6a ("migration: implementation of background snapshot thread")
>> Fixes: bf78a046b9 ("migration: refactor migrate_fd_connect failures")
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>
> Not like migrate_set_state_failure(MigrationState *s)?  Not a huge deal,
> though..

I thought we had agreed over IRC that it was best to hold that until the
other MigrationStatus work happens?

Anyway, looking closer at this, there are places that handle CANCELLING
beforehand (_detect_error) and places that only set FAILED after
specific states (multifd), so a single helper will require more
churn. Let's postpone that please.

>
> Reviewed-by: Peter Xu <peterx@redhat.com>
Peter Xu Feb. 14, 2025, 3:13 p.m. UTC | #3
On Fri, Feb 14, 2025 at 09:25:12AM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Thu, Feb 13, 2025 at 02:59:24PM -0300, Fabiano Rosas wrote:
> >> The expected outcome from qmp_migrate_cancel() is that the source
> >> migration goes to the terminal state
> >> MIGRATION_STATUS_CANCELLED. Anything different from this is a bug when
> >> cancelling.
> >> 
> >> Make sure there is never a state transition from an unspecified state
> >> into FAILED. Code that sets FAILED, should always either make sure
> >> that the old state is not CANCELLING or specify the old state.
> >> 
> >> Note that the destination is allowed to go into FAILED, so there's no
> >> issue there.
> >> 
> >> (I don't think this is relevant as a backport because cancelling does
> >> work, it just doesn't show the right state at the end)
> >> 
> >> Fixes: 3dde8fdbad ("migration: Merge precopy/postcopy on switchover start")
> >> Fixes: d0edb8a173 ("migration: Create the postcopy preempt channel asynchronously")
> >> Fixes: 8518278a6a ("migration: implementation of background snapshot thread")
> >> Fixes: bf78a046b9 ("migration: refactor migrate_fd_connect failures")
> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> >
> > Not like migrate_set_state_failure(MigrationState *s)?  Not a huge deal,
> > though..
> 
> I thought we had agreed over IRC that it was best to hold that until the
> other MigrationStatus work happens?

Since we are touching this anyway, IMHO it wouldn't hurt to add a helper too.

migrate_set_state_failure() could then be renamed to migrate_set_failure()
and take an Error * instead, so it might help that effort too.

> 
> Anyway, looking closer at this, there are places that handle CANCELLING
> beforehand (_detect_error) and places that only set FAILED after
> specific states (multifd), so a single helper will require more
> churn. Let's postpone that please.

Sure.  Let's go ahead with this.

> 
> >
> > Reviewed-by: Peter Xu <peterx@redhat.com>
>
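
The helper discussed in this thread was postponed and is not part of the patch. Purely as an illustration of the two call-site shapes mentioned above (sites that only need to skip CANCELLING, and sites that set FAILED only after a specific state), a hypothetical sketch could look like the following. ToyMigrationState, toy_set_failure() and toy_set_failure_from() are invented names, and a plain string stands in for Error *; none of this is QEMU API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the migration states (illustrative only). */
enum { ST_SETUP, ST_ACTIVE, ST_CANCELLING, ST_FAILED };

typedef struct {
    atomic_int state;
    const char *error;      /* stand-in for an Error * */
} ToyMigrationState;

/*
 * Hypothetical helper: record the first error and move to FAILED unless a
 * cancel is in progress. Returns true if the state ended up FAILED.
 */
static bool toy_set_failure(ToyMigrationState *s, const char *err)
{
    assert(s != NULL);

    int cur = atomic_load(&s->state);

    if (err && !s->error) {
        s->error = err;
    }
    while (cur != ST_CANCELLING) {
        /* On failure the CAS reloads 'cur', so the check above is redone. */
        if (atomic_compare_exchange_weak(&s->state, &cur, ST_FAILED)) {
            return true;
        }
    }
    return false;
}

/*
 * Hypothetical variant for call sites that may only fail from one specific
 * state: the caller names the old state explicitly.
 */
static bool toy_set_failure_from(ToyMigrationState *s, int old_state,
                                 const char *err)
{
    assert(s != NULL);

    if (!atomic_compare_exchange_strong(&s->state, &old_state, ST_FAILED)) {
        return false;
    }
    if (err && !s->error) {
        s->error = err;
    }
    return true;
}

/* Small drivers returning the final state, for experimenting. */
static int toy_demo(int start)
{
    ToyMigrationState s = { .error = NULL };

    atomic_init(&s.state, start);
    toy_set_failure(&s, "demo error");
    return atomic_load(&s.state);
}

static int toy_demo_from(int start, int expected_old)
{
    ToyMigrationState s = { .error = NULL };

    atomic_init(&s.state, start);
    toy_set_failure_from(&s, expected_old, "demo error");
    return atomic_load(&s.state);
}
```

The sketch needs two entry points to cover both shapes, which is in line with the concern that a single helper would require more churn. The retry loop in toy_set_failure() is a design choice of this sketch only; the patch itself uses a plain check before calling migrate_set_state().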
Patch

diff --git a/migration/migration.c b/migration/migration.c
index 48c9ad3c96..c597aa707e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2648,7 +2648,10 @@  static int postcopy_start(MigrationState *ms, Error **errp)
     if (migrate_postcopy_preempt()) {
         migration_wait_main_channel(ms);
         if (postcopy_preempt_establish_channel(ms)) {
-            migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILED);
+            if (ms->state != MIGRATION_STATUS_CANCELLING) {
+                migrate_set_state(&ms->state, ms->state,
+                                  MIGRATION_STATUS_FAILED);
+            }
             error_setg(errp, "%s: Failed to establish preempt channel",
                        __func__);
             return -1;
@@ -2986,7 +2989,9 @@  fail:
         error_free(local_err);
     }
 
-    migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
+    if (s->state != MIGRATION_STATUS_CANCELLING) {
+        migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
+    }
 }
 
 /**
@@ -3009,7 +3014,7 @@  static void bg_migration_completion(MigrationState *s)
         qemu_put_buffer(s->to_dst_file, s->bioc->data, s->bioc->usage);
         qemu_fflush(s->to_dst_file);
     } else if (s->state == MIGRATION_STATUS_CANCELLING) {
-        goto fail;
+        return;
     }
 
     if (qemu_file_get_error(s->to_dst_file)) {
@@ -3953,7 +3958,9 @@  void migration_connect(MigrationState *s, Error *error_in)
 
 fail:
     migrate_set_error(s, local_err);
-    migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
+    if (s->state != MIGRATION_STATUS_CANCELLING) {
+        migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
+    }
     error_report_err(local_err);
     migration_cleanup(s);
 }