Message ID | 20210629155007.629086-3-lvivier@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | migration: failover: continue to wait card unplug on error | expand |
Laurent Vivier <lvivier@redhat.com> wrote: > If the user cancels the migration in the unplug-wait state, > QEMU will try to plug back the card and this fails because the card > is partially unplugged. > To avoid the problem, continue to wait the card unplug, but to > allow the migration to be canceled if the card never finishes to unplug > use a timeout. > > Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1976852 > Signed-off-by: Laurent Vivier <lvivier@redhat.com> > --- > migration/migration.c | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > diff --git a/migration/migration.c b/migration/migration.c > index 3e92c405a2b6..3b06d43a7f42 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -3679,6 +3679,17 @@ static void qemu_savevm_wait_unplug(MigrationState *s, int old_state, > qemu_savevm_state_guest_unplug_pending()) { > qemu_sem_timedwait(&s->wait_unplug_sem, 250); > } > + if (s->state != MIGRATION_STATUS_WAIT_UNPLUG) { > + int timeout = 120; /* 30 seconds */ > + /* > + * migration has been canceled > + * but as we have started an unplug we must wait the end > + * to be able to plug back the card > + */ > + while (timeout-- && qemu_savevm_state_guest_unplug_pending()) { > + qemu_sem_timedwait(&s->wait_unplug_sem, 250); > + } > + } > > migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG, new_state); > } else { I agree with the idea. But if we are getting out due to timeout == 0, shouldn't we return some error, warning, whatever? Later, Juan.
On 29/06/2021 19:50, Juan Quintela wrote: > Laurent Vivier <lvivier@redhat.com> wrote: >> If the user cancels the migration in the unplug-wait state, >> QEMU will try to plug back the card and this fails because the card >> is partially unplugged. >> To avoid the problem, continue to wait the card unplug, but to >> allow the migration to be canceled if the card never finishes to unplug >> use a timeout. >> >> Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1976852 >> Signed-off-by: Laurent Vivier <lvivier@redhat.com> >> --- >> migration/migration.c | 11 +++++++++++ >> 1 file changed, 11 insertions(+) >> >> diff --git a/migration/migration.c b/migration/migration.c >> index 3e92c405a2b6..3b06d43a7f42 100644 >> --- a/migration/migration.c >> +++ b/migration/migration.c >> @@ -3679,6 +3679,17 @@ static void qemu_savevm_wait_unplug(MigrationState *s, int old_state, >> qemu_savevm_state_guest_unplug_pending()) { >> qemu_sem_timedwait(&s->wait_unplug_sem, 250); >> } >> + if (s->state != MIGRATION_STATUS_WAIT_UNPLUG) { >> + int timeout = 120; /* 30 seconds */ >> + /* >> + * migration has been canceled >> + * but as we have started an unplug we must wait the end >> + * to be able to plug back the card >> + */ >> + while (timeout-- && qemu_savevm_state_guest_unplug_pending()) { >> + qemu_sem_timedwait(&s->wait_unplug_sem, 250); >> + } >> + } >> >> migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG, new_state); >> } else { > I agree with the idea. But if we are getting out due to timeout == 0, > shouldn't we return some error, warning, whatever? In that case, we keep the current behaviour: guest kernel will report an error when it will try to plug back the card that has not been unplugged. This is a corner case: if it happens we have something really wrong with the machine. Perhaps we can remove the timeout, but I don't like to block the user, or increase it to be sure. Thanks, Laurent
Laurent Vivier <lvivier@redhat.com> wrote: > On 29/06/2021 19:50, Juan Quintela wrote: >> Laurent Vivier <lvivier@redhat.com> wrote: >>> If the user cancels the migration in the unplug-wait state, >>> QEMU will try to plug back the card and this fails because the card >>> is partially unplugged. >>> To avoid the problem, continue to wait the card unplug, but to >>> allow the migration to be canceled if the card never finishes to unplug >>> use a timeout. >>> >>> Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1976852 >>> Signed-off-by: Laurent Vivier <lvivier@redhat.com> >>> --- >>> migration/migration.c | 11 +++++++++++ >>> 1 file changed, 11 insertions(+) >>> >>> diff --git a/migration/migration.c b/migration/migration.c >>> index 3e92c405a2b6..3b06d43a7f42 100644 >>> --- a/migration/migration.c >>> +++ b/migration/migration.c >>> @@ -3679,6 +3679,17 @@ static void qemu_savevm_wait_unplug(MigrationState *s, int old_state, >>> qemu_savevm_state_guest_unplug_pending()) { >>> qemu_sem_timedwait(&s->wait_unplug_sem, 250); >>> } >>> + if (s->state != MIGRATION_STATUS_WAIT_UNPLUG) { >>> + int timeout = 120; /* 30 seconds */ >>> + /* >>> + * migration has been canceled >>> + * but as we have started an unplug we must wait the end >>> + * to be able to plug back the card >>> + */ >>> + while (timeout-- && qemu_savevm_state_guest_unplug_pending()) { >>> + qemu_sem_timedwait(&s->wait_unplug_sem, 250); >>> + } >>> + } >>> >>> migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG, new_state); >>> } else { >> I agree with the idea. But if we are getting out due to timeout == 0, >> shouldn't we return some error, warning, whatever? > > In that case, we keep the current behaviour: guest kernel will report > an error when it > will try to plug back the card that has not been unplugged. This is a > corner case: if it > happens we have something really wrong with the machine. Perhaps we can remove the > timeout, but I don't like to block the user, or increase it to be sure. Oh, I whole agree that it is a corner case, and that it shouldn't happen. But if it happens, we don't log it anywhere. That was my complaint. Later, Juan.
* Laurent Vivier (lvivier@redhat.com) wrote: > If the user cancels the migration in the unplug-wait state, > QEMU will try to plug back the card and this fails because the card > is partially unplugged. > To avoid the problem, continue to wait the card unplug, but to > allow the migration to be canceled if the card never finishes to unplug > use a timeout. > > Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1976852 > Signed-off-by: Laurent Vivier <lvivier@redhat.com> I'll take this for now, but as Juan says, we could really do with some diags when this happens, so when someone comes and tells us that the hotplug has failed we can see. Please send something to add it. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > --- > migration/migration.c | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > diff --git a/migration/migration.c b/migration/migration.c > index 3e92c405a2b6..3b06d43a7f42 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -3679,6 +3679,17 @@ static void qemu_savevm_wait_unplug(MigrationState *s, int old_state, > qemu_savevm_state_guest_unplug_pending()) { > qemu_sem_timedwait(&s->wait_unplug_sem, 250); > } > + if (s->state != MIGRATION_STATUS_WAIT_UNPLUG) { > + int timeout = 120; /* 30 seconds */ > + /* > + * migration has been canceled > + * but as we have started an unplug we must wait the end > + * to be able to plug back the card > + */ > + while (timeout-- && qemu_savevm_state_guest_unplug_pending()) { > + qemu_sem_timedwait(&s->wait_unplug_sem, 250); > + } > + } > > migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG, new_state); > } else { > -- > 2.31.1 > >
diff --git a/migration/migration.c b/migration/migration.c index 3e92c405a2b6..3b06d43a7f42 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -3679,6 +3679,17 @@ static void qemu_savevm_wait_unplug(MigrationState *s, int old_state, qemu_savevm_state_guest_unplug_pending()) { qemu_sem_timedwait(&s->wait_unplug_sem, 250); } + if (s->state != MIGRATION_STATUS_WAIT_UNPLUG) { + int timeout = 120; /* 30 seconds */ + /* + * migration has been canceled + * but as we have started an unplug we must wait the end + * to be able to plug back the card + */ + while (timeout-- && qemu_savevm_state_guest_unplug_pending()) { + qemu_sem_timedwait(&s->wait_unplug_sem, 250); + } + } migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG, new_state); } else {
If the user cancels the migration in the unplug-wait state, QEMU will try to plug back the card and this fails because the card is partially unplugged. To avoid the problem, continue to wait the card unplug, but to allow the migration to be canceled if the card never finishes to unplug use a timeout. Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1976852 Signed-off-by: Laurent Vivier <lvivier@redhat.com> --- migration/migration.c | 11 +++++++++++ 1 file changed, 11 insertions(+)