Message ID | 20230502205212.134680-1-eblake@redhat.com (mailing list archive)
---|---
State | New, archived
Series | migration: Attempt disk reactivation in more failure scenarios
On Tue, May 02, 2023 at 03:52:12PM -0500, Eric Blake wrote:
> Commit fe904ea824 added a fail_inactivate label, which tries to
> reactivate disks on the source after a failure while s->state ==
> MIGRATION_STATUS_ACTIVE, but didn't actually use the label if
> qemu_savevm_state_complete_precopy() failed. This failure to
> reactivate is also present in commit 6039dd5b1c (also covering the new
> s->state == MIGRATION_STATUS_DEVICE state) and 403d18ae (ensuring
> s->block_inactive is set more reliably).
>
> Consolidate the two labels back into one - no matter HOW migration is
> failed, if there is any chance we can reach vm_start() after having
> attempted inactivation, it is essential that we have tried to restart
> disks before then. This also makes the cleanup more like
> migrate_fd_cancel().
>
> Suggested-by: Kevin Wolf <kwolf@redhat.com>
> Signed-off-by: Eric Blake <eblake@redhat.com>

Acked-by: Peter Xu <peterx@redhat.com>
On 02.05.2023 at 22:52, Eric Blake wrote:
> Commit fe904ea824 added a fail_inactivate label, which tries to
> reactivate disks on the source after a failure while s->state ==
> MIGRATION_STATUS_ACTIVE, but didn't actually use the label if
> qemu_savevm_state_complete_precopy() failed. This failure to
> reactivate is also present in commit 6039dd5b1c (also covering the new
> s->state == MIGRATION_STATUS_DEVICE state) and 403d18ae (ensuring
> s->block_inactive is set more reliably).
>
> Consolidate the two labels back into one - no matter HOW migration is
> failed, if there is any chance we can reach vm_start() after having
> attempted inactivation, it is essential that we have tried to restart
> disks before then. This also makes the cleanup more like
> migrate_fd_cancel().
>
> Suggested-by: Kevin Wolf <kwolf@redhat.com>
> Signed-off-by: Eric Blake <eblake@redhat.com>

Thanks, applied to the block branch.

Kevin
Eric Blake <eblake@redhat.com> wrote:
> Commit fe904ea824 added a fail_inactivate label, which tries to
> reactivate disks on the source after a failure while s->state ==
> MIGRATION_STATUS_ACTIVE, but didn't actually use the label if
> qemu_savevm_state_complete_precopy() failed. This failure to
> reactivate is also present in commit 6039dd5b1c (also covering the new
> s->state == MIGRATION_STATUS_DEVICE state) and 403d18ae (ensuring
> s->block_inactive is set more reliably).
>
> Consolidate the two labels back into one - no matter HOW migration is
> failed, if there is any chance we can reach vm_start() after having
> attempted inactivation, it is essential that we have tried to restart
> disks before then. This also makes the cleanup more like
> migrate_fd_cancel().
>
> Suggested-by: Kevin Wolf <kwolf@redhat.com>
> Signed-off-by: Eric Blake <eblake@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

I still can't believe that powering down disks and deciding whether or
not to restart the VM is such a complicated business. Sniff.
```diff
diff --git a/migration/migration.c b/migration/migration.c
index abcadbb619e..7f982bd2c80 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2299,6 +2299,11 @@ static void migration_completion(MigrationState *s)
                                             MIGRATION_STATUS_DEVICE);
                 }
                 if (ret >= 0) {
+                    /*
+                     * Inactivate disks except in COLO, and track that we
+                     * have done so in order to remember to reactivate
+                     * them if migration fails or is cancelled.
+                     */
                     s->block_inactive = !migrate_colo();
                     qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
                     ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false,
@@ -2343,13 +2348,13 @@ static void migration_completion(MigrationState *s)
         rp_error = await_return_path_close_on_source(s);
         trace_migration_return_path_end_after(rp_error);
         if (rp_error) {
-            goto fail_invalidate;
+            goto fail;
         }
     }

     if (qemu_file_get_error(s->to_dst_file)) {
         trace_migration_completion_file_err();
-        goto fail_invalidate;
+        goto fail;
     }

     if (migrate_colo() && s->state == MIGRATION_STATUS_ACTIVE) {
@@ -2363,26 +2368,25 @@ static void migration_completion(MigrationState *s)

     return;

-fail_invalidate:
-    /* If not doing postcopy, vm_start() will be called: let's regain
-     * control on images.
-     */
-    if (s->state == MIGRATION_STATUS_ACTIVE ||
-        s->state == MIGRATION_STATUS_DEVICE) {
+fail:
+    if (s->block_inactive && (s->state == MIGRATION_STATUS_ACTIVE ||
+                              s->state == MIGRATION_STATUS_DEVICE)) {
+        /*
+         * If not doing postcopy, vm_start() will be called: let's
+         * regain control on images.
+         */
         Error *local_err = NULL;

         qemu_mutex_lock_iothread();
         bdrv_activate_all(&local_err);
         if (local_err) {
             error_report_err(local_err);
-            s->block_inactive = true;
         } else {
             s->block_inactive = false;
         }
         qemu_mutex_unlock_iothread();
     }

-fail:
     migrate_set_state(&s->state, current_active_state,
                       MIGRATION_STATUS_FAILED);
 }
```
Commit fe904ea824 added a fail_inactivate label, which tries to
reactivate disks on the source after a failure while s->state ==
MIGRATION_STATUS_ACTIVE, but didn't actually use the label if
qemu_savevm_state_complete_precopy() failed. This failure to
reactivate is also present in commit 6039dd5b1c (also covering the new
s->state == MIGRATION_STATUS_DEVICE state) and 403d18ae (ensuring
s->block_inactive is set more reliably).

Consolidate the two labels back into one - no matter HOW migration is
failed, if there is any chance we can reach vm_start() after having
attempted inactivation, it is essential that we have tried to restart
disks before then. This also makes the cleanup more like
migrate_fd_cancel().

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
---
 migration/migration.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

base-commit: b5f47ba73b7c1457d2f18d71c00e1a91a76fe60b