Message ID | 1479555831-30960-1-git-send-email-zhang.zhanghailiang@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Cc: qemu-stable@nongnu.org On 2016/11/19 19:43, zhanghailiang wrote: > commit fe904ea8242cbae2d7e69c052c754b8f5f1ba1d6 fixed a case > which migration aborted QEMU because it didn't regain the control > of images while some errors happened. > > Actually, we have another case in that error path to abort QEMU > because of the same reason: > migration_thread() > migration_completion() > bdrv_inactivate_all() ----------------> inactivate images > qemu_savevm_state_complete_precopy() > socket_writev_buffer() --------> error because destination fails > qemu_fflush() -------------------> set error on migration stream > qemu_mutex_unlock_iothread() ------> unlock > qmp_migrate_cancel() ---------------------> user cancelled migration > migrate_set_state() ------------------> set migrate CANCELLING > migration_completion() -----------------> go on to fail_invalidate > if (s->state == MIGRATION_STATUS_ACTIVE) -> Jump this branch > migration_thread() -----------------------> break migration loop > vm_start() -----------------------------> restart guest with inactive > images > We failed to regain the control of images because we only regain it > while the migration state is "active", but here users cancelled the migration > when they found some errors happened (for example, libvirtd daemon is shutdown > in destination unexpectedly). > > Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> > --- > migration/migration.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/migration/migration.c b/migration/migration.c > index f498ab8..0c1ee6d 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -1752,7 +1752,8 @@ fail_invalidate: > /* If not doing postcopy, vm_start() will be called: let's regain > * control on images. > */ > - if (s->state == MIGRATION_STATUS_ACTIVE) { > + if (s->state == MIGRATION_STATUS_ACTIVE || > + s->state == MIGRATION_STATUS_CANCELLING) { > Error *local_err = NULL; > > bdrv_invalidate_cache_all(&local_err); >
Forwarding to qemu-block so I won't forget to have a look. Am 19.11.2016 um 12:43 hat zhanghailiang geschrieben: > commit fe904ea8242cbae2d7e69c052c754b8f5f1ba1d6 fixed a case > which migration aborted QEMU because it didn't regain the control > of images while some errors happened. > > Actually, we have another case in that error path to abort QEMU > because of the same reason: > migration_thread() > migration_completion() > bdrv_inactivate_all() ----------------> inactivate images > qemu_savevm_state_complete_precopy() > socket_writev_buffer() --------> error because destination fails > qemu_fflush() -------------------> set error on migration stream > qemu_mutex_unlock_iothread() ------> unlock > qmp_migrate_cancel() ---------------------> user cancelled migration > migrate_set_state() ------------------> set migrate CANCELLING > migration_completion() -----------------> go on to fail_invalidate > if (s->state == MIGRATION_STATUS_ACTIVE) -> Jump this branch > migration_thread() -----------------------> break migration loop > vm_start() -----------------------------> restart guest with inactive > images > We failed to regain the control of images because we only regain it > while the migration state is "active", but here users cancelled the migration > when they found some errors happened (for example, libvirtd daemon is shutdown > in destination unexpectedly). > > Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> > --- > migration/migration.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/migration/migration.c b/migration/migration.c > index f498ab8..0c1ee6d 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -1752,7 +1752,8 @@ fail_invalidate: > /* If not doing postcopy, vm_start() will be called: let's regain > * control on images. > */ > - if (s->state == MIGRATION_STATUS_ACTIVE) { > + if (s->state == MIGRATION_STATUS_ACTIVE || > + s->state == MIGRATION_STATUS_CANCELLING) { > Error *local_err = NULL; > > bdrv_invalidate_cache_all(&local_err); > -- > 1.8.3.1 > > >
Am 19.11.2016 um 12:43 hat zhanghailiang geschrieben: > commit fe904ea8242cbae2d7e69c052c754b8f5f1ba1d6 fixed a case > which migration aborted QEMU because it didn't regain the control > of images while some errors happened. > > Actually, we have another case in that error path to abort QEMU > because of the same reason: > migration_thread() > migration_completion() > bdrv_inactivate_all() ----------------> inactivate images > qemu_savevm_state_complete_precopy() > socket_writev_buffer() --------> error because destination fails > qemu_fflush() -------------------> set error on migration stream > qemu_mutex_unlock_iothread() ------> unlock > qmp_migrate_cancel() ---------------------> user cancelled migration > migrate_set_state() ------------------> set migrate CANCELLING Important to note here: qmp_migrate_cancel() is executed by a concurrent thread, it doesn't depend on any code paths in migration_completion(). > migration_completion() -----------------> go on to fail_invalidate > if (s->state == MIGRATION_STATUS_ACTIVE) -> Jump this branch > migration_thread() -----------------------> break migration loop > vm_start() -----------------------------> restart guest with inactive > images > We failed to regain the control of images because we only regain it > while the migration state is "active", but here users cancelled the migration > when they found some errors happened (for example, libvirtd daemon is shutdown > in destination unexpectedly). > > Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> > --- > migration/migration.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/migration/migration.c b/migration/migration.c > index f498ab8..0c1ee6d 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -1752,7 +1752,8 @@ fail_invalidate: > /* If not doing postcopy, vm_start() will be called: let's regain > * control on images. > */ > - if (s->state == MIGRATION_STATUS_ACTIVE) { This if condition tries to check whether we ran the code path that called bdrv_inactivate_all(), so that we only try to reactivate images it if we really inactivated them first. The problem with it is that it ignores a possible concurrent modification of s->state. > + if (s->state == MIGRATION_STATUS_ACTIVE || > + s->state == MIGRATION_STATUS_CANCELLING) { This adds another state that we could end up with with a concurrent modification, so that even in this case we undo the inactivation. However, it is no longer limited to the cases where we inactivated the image. It also applies to other code paths (like the postcopy one) where we didn't inactivate images. What saves the patch is that bdrv_invalidate_cache() is a no-op for block devices that aren't inactivated, so calling it more often than necessary is okay. But then, if we're going to rely on this, it would be much better to just remove the if altogether. I can't say whether there are any other possible values of s->state that we should consider, and by removing the if we would be guaranteed to catch all of them. If we don't want to rely on it, just keep a local bool that remembers whether we inactivated images and check that here. > Error *local_err = NULL; > > bdrv_invalidate_cache_all(&local_err); So in summary, this is a horrible patch because it checks the wrong thing, and for I can't really say if it covers everything it needs to cover, but arguably it happens to correctly fix the outcome of a previously failing case. Normally I would reject such a patch and require a clean solution, but then we're on the day of -rc3, so if you can't send v2 right away, we might not have the time for it. Tough call... Kevin
On Tue, Dec 6, 2016 at 1:42 PM, Kevin Wolf <kwolf@redhat.com> wrote: > Am 19.11.2016 um 12:43 hat zhanghailiang geschrieben: >> Error *local_err = NULL; >> >> bdrv_invalidate_cache_all(&local_err); > > So in summary, this is a horrible patch because it checks the wrong > thing, and for I can't really say if it covers everything it needs to > cover, but arguably it happens to correctly fix the outcome of a > previously failing case. > > Normally I would reject such a patch and require a clean solution, but > then we're on the day of -rc3, so if you can't send v2 right away, we > might not have the time for it. > > Tough call... This bug is not a release blocker because it was already in QEMU 2.7 and not a regression. Let's take time to understand this fully. There could be related bugs that can be fixed if this area of QEMU is scrutinized. We can merge fixes in -stable when consensus is reached. Stefan
* Kevin Wolf (kwolf@redhat.com) wrote: > Am 19.11.2016 um 12:43 hat zhanghailiang geschrieben: > > commit fe904ea8242cbae2d7e69c052c754b8f5f1ba1d6 fixed a case > > which migration aborted QEMU because it didn't regain the control > > of images while some errors happened. > > > > Actually, we have another case in that error path to abort QEMU > > because of the same reason: > > migration_thread() > > migration_completion() > > bdrv_inactivate_all() ----------------> inactivate images > > qemu_savevm_state_complete_precopy() > > socket_writev_buffer() --------> error because destination fails > > qemu_fflush() -------------------> set error on migration stream > > qemu_mutex_unlock_iothread() ------> unlock > > qmp_migrate_cancel() ---------------------> user cancelled migration > > migrate_set_state() ------------------> set migrate CANCELLING > > Important to note here: qmp_migrate_cancel() is executed by a concurrent > thread, it doesn't depend on any code paths in migration_completion(). > > > migration_completion() -----------------> go on to fail_invalidate > > if (s->state == MIGRATION_STATUS_ACTIVE) -> Jump this branch > > migration_thread() -----------------------> break migration loop > > vm_start() -----------------------------> restart guest with inactive > > images > > We failed to regain the control of images because we only regain it > > while the migration state is "active", but here users cancelled the migration > > when they found some errors happened (for example, libvirtd daemon is shutdown > > in destination unexpectedly). > > > > Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> > > --- > > migration/migration.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/migration/migration.c b/migration/migration.c > > index f498ab8..0c1ee6d 100644 > > --- a/migration/migration.c > > +++ b/migration/migration.c > > @@ -1752,7 +1752,8 @@ fail_invalidate: > > /* If not doing postcopy, vm_start() will be called: let's regain > > * control on images. > > */ > > - if (s->state == MIGRATION_STATUS_ACTIVE) { > > This if condition tries to check whether we ran the code path that > called bdrv_inactivate_all(), so that we only try to reactivate images > it if we really inactivated them first. > > The problem with it is that it ignores a possible concurrent > modification of s->state. > > > + if (s->state == MIGRATION_STATUS_ACTIVE || > > + s->state == MIGRATION_STATUS_CANCELLING) { > > This adds another state that we could end up with with a concurrent > modification, so that even in this case we undo the inactivation. > > However, it is no longer limited to the cases where we inactivated the > image. It also applies to other code paths (like the postcopy one) where > we didn't inactivate images. > > What saves the patch is that bdrv_invalidate_cache() is a no-op for > block devices that aren't inactivated, so calling it more often than > necessary is okay. > > But then, if we're going to rely on this, it would be much better to > just remove the if altogether. I can't say whether there are any other > possible values of s->state that we should consider, and by removing the > if we would be guaranteed to catch all of them. > > If we don't want to rely on it, just keep a local bool that remembers > whether we inactivated images and check that here. > > > Error *local_err = NULL; > > > > bdrv_invalidate_cache_all(&local_err); > > So in summary, this is a horrible patch because it checks the wrong > thing, and for I can't really say if it covers everything it needs to > cover, but arguably it happens to correctly fix the outcome of a > previously failing case. > > Normally I would reject such a patch and require a clean solution, but > then we're on the day of -rc3, so if you can't send v2 right away, we > might not have the time for it. > > Tough call... Hmm, this case is messy; I created this function having split it out of the main loop a couple of years back but it did get more messy with more s->state checks; as far as I can tell it's always done the transition to COMPLETED at the end well after the locked section, so there's always been that chance that cancellation sneaks in just before or just after the locked section. Some of the bad cases that can happen: a) A cancel sneaks in after the ACTIVE check but before or after the locked section; should we reactivate the disks? Well that depends on whether the destination actually got the full migration stream - we don't know! If the destination actually starts running we must not reactivate the disk on the source even if the CPU is stopped. b) If the bdrv_inactive_all fails for one device, but the others are fine, we go down the fail: label and don't reactivate, so the source dies even though it might have been mostly OK. We can move the _lock to before the check of s->state at the top, which would stop the cancel sneaking in early. In the case where postcopy was never enabled (!migrate_postcopy_ram()) we can move the COMPLETED transition into the lock as well; so I think then we kind of become safe. In the case where postcopy was enabled I think we can do the COMPLETED transition before waiting for the return path to close - I think but I need to think more about that. And there seem to be some dodgy cases where we call the invalidate there after a late postcopy failure; that's bad, we shouldn't reactivate the source disks after going into postcopy. So, in summary; this function is a mess - it needs a much bigger fix than this patch. Dave > Kevin > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Hi, On 2016/12/6 23:24, Dr. David Alan Gilbert wrote: > * Kevin Wolf (kwolf@redhat.com) wrote: >> Am 19.11.2016 um 12:43 hat zhanghailiang geschrieben: >>> commit fe904ea8242cbae2d7e69c052c754b8f5f1ba1d6 fixed a case >>> which migration aborted QEMU because it didn't regain the control >>> of images while some errors happened. >>> >>> Actually, we have another case in that error path to abort QEMU >>> because of the same reason: >>> migration_thread() >>> migration_completion() >>> bdrv_inactivate_all() ----------------> inactivate images >>> qemu_savevm_state_complete_precopy() >>> socket_writev_buffer() --------> error because destination fails >>> qemu_fflush() -------------------> set error on migration stream >>> qemu_mutex_unlock_iothread() ------> unlock >>> qmp_migrate_cancel() ---------------------> user cancelled migration >>> migrate_set_state() ------------------> set migrate CANCELLING >> >> Important to note here: qmp_migrate_cancel() is executed by a concurrent >> thread, it doesn't depend on any code paths in migration_completion(). >> >>> migration_completion() -----------------> go on to fail_invalidate >>> if (s->state == MIGRATION_STATUS_ACTIVE) -> Jump this branch >>> migration_thread() -----------------------> break migration loop >>> vm_start() -----------------------------> restart guest with inactive >>> images >>> We failed to regain the control of images because we only regain it >>> while the migration state is "active", but here users cancelled the migration >>> when they found some errors happened (for example, libvirtd daemon is shutdown >>> in destination unexpectedly). >>> >>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> >>> --- >>> migration/migration.c | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/migration/migration.c b/migration/migration.c >>> index f498ab8..0c1ee6d 100644 >>> --- a/migration/migration.c >>> +++ b/migration/migration.c >>> @@ -1752,7 +1752,8 @@ fail_invalidate: >>> /* If not doing postcopy, vm_start() will be called: let's regain >>> * control on images. >>> */ >>> - if (s->state == MIGRATION_STATUS_ACTIVE) { >> >> This if condition tries to check whether we ran the code path that >> called bdrv_inactivate_all(), so that we only try to reactivate images >> it if we really inactivated them first. >> >> The problem with it is that it ignores a possible concurrent >> modification of s->state. >> >>> + if (s->state == MIGRATION_STATUS_ACTIVE || >>> + s->state == MIGRATION_STATUS_CANCELLING) { >> >> This adds another state that we could end up with with a concurrent >> modification, so that even in this case we undo the inactivation. >> >> However, it is no longer limited to the cases where we inactivated the >> image. It also applies to other code paths (like the postcopy one) where >> we didn't inactivate images. >> >> What saves the patch is that bdrv_invalidate_cache() is a no-op for >> block devices that aren't inactivated, so calling it more often than >> necessary is okay. >> >> But then, if we're going to rely on this, it would be much better to >> just remove the if altogether. I can't say whether there are any other >> possible values of s->state that we should consider, and by removing the >> if we would be guaranteed to catch all of them. >> >> If we don't want to rely on it, just keep a local bool that remembers >> whether we inactivated images and check that here. >> >>> Error *local_err = NULL; >>> >>> bdrv_invalidate_cache_all(&local_err); >> >> So in summary, this is a horrible patch because it checks the wrong >> thing, and for I can't really say if it covers everything it needs to >> cover, but arguably it happens to correctly fix the outcome of a >> previously failing case. >> >> Normally I would reject such a patch and require a clean solution, but >> then we're on the day of -rc3, so if you can't send v2 right away, we >> might not have the time for it. >> >> Tough call... > > Hmm, this case is messy; I created this function having split it out > of the main loop a couple of years back but it did get more messy > with more s->state checks; as far as I can tell it's always > done the transition to COMPLETED at the end well after the locked > section, so there's always been that chance that cancellation sneaks > in just before or just after the locked section. > > Some of the bad cases that can happen: > a) A cancel sneaks in after the ACTIVE check but before or after > the locked section; should we reactivate the disks? Well that > depends on whether the destination actually got the full migration > stream - we don't know! > If the destination actually starts running we must not reactivate > the disk on the source even if the CPU is stopped. > Yes, we didn't have a mechanism to know exactly whether or not the VM in destination is well received. > b) If the bdrv_inactive_all fails for one device, but the others > are fine, we go down the fail: label and don't reactivate, so > the source dies even though it might have been mostly OK. > > We can move the _lock to before the check of s->state at the top, > which would stop the cancel sneaking in early. > In the case where postcopy was never enabled (!migrate_postcopy_ram()) > we can move the COMPLETED transition into the lock as well; so I think > then we kind of become safe. > In the case where postcopy was enabled I think we can do the COMPLETED > transition before waiting for the return path to close - I think but > I need to think more about that. > And there seem to be some dodgy cases where we call the invalidate > there after a late postcopy failure; that's bad, we shouldn't reactivate > the source disks after going into postcopy. > > So, in summary; this function is a mess - it needs a much bigger > fix than this patch. > So what's the conclusion ? Will you send a patch to fix it ? Or let's fix it step by step ? I think Kevin's suggestion which just remove the *if* check is OK. Thanks, Hailiang > Dave > >> Kevin >> > -- > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK > > . >
* Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote: > Hi, > > On 2016/12/6 23:24, Dr. David Alan Gilbert wrote: > > * Kevin Wolf (kwolf@redhat.com) wrote: > > > Am 19.11.2016 um 12:43 hat zhanghailiang geschrieben: > > > > commit fe904ea8242cbae2d7e69c052c754b8f5f1ba1d6 fixed a case > > > > which migration aborted QEMU because it didn't regain the control > > > > of images while some errors happened. > > > > > > > > Actually, we have another case in that error path to abort QEMU > > > > because of the same reason: > > > > migration_thread() > > > > migration_completion() > > > > bdrv_inactivate_all() ----------------> inactivate images > > > > qemu_savevm_state_complete_precopy() > > > > socket_writev_buffer() --------> error because destination fails > > > > qemu_fflush() -------------------> set error on migration stream > > > > qemu_mutex_unlock_iothread() ------> unlock > > > > qmp_migrate_cancel() ---------------------> user cancelled migration > > > > migrate_set_state() ------------------> set migrate CANCELLING > > > > > > Important to note here: qmp_migrate_cancel() is executed by a concurrent > > > thread, it doesn't depend on any code paths in migration_completion(). > > > > > > > migration_completion() -----------------> go on to fail_invalidate > > > > if (s->state == MIGRATION_STATUS_ACTIVE) -> Jump this branch > > > > migration_thread() -----------------------> break migration loop > > > > vm_start() -----------------------------> restart guest with inactive > > > > images > > > > We failed to regain the control of images because we only regain it > > > > while the migration state is "active", but here users cancelled the migration > > > > when they found some errors happened (for example, libvirtd daemon is shutdown > > > > in destination unexpectedly). > > > > > > > > Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> > > > > --- > > > > migration/migration.c | 3 ++- > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/migration/migration.c b/migration/migration.c > > > > index f498ab8..0c1ee6d 100644 > > > > --- a/migration/migration.c > > > > +++ b/migration/migration.c > > > > @@ -1752,7 +1752,8 @@ fail_invalidate: > > > > /* If not doing postcopy, vm_start() will be called: let's regain > > > > * control on images. > > > > */ > > > > - if (s->state == MIGRATION_STATUS_ACTIVE) { > > > > > > This if condition tries to check whether we ran the code path that > > > called bdrv_inactivate_all(), so that we only try to reactivate images > > > it if we really inactivated them first. > > > > > > The problem with it is that it ignores a possible concurrent > > > modification of s->state. > > > > > > > + if (s->state == MIGRATION_STATUS_ACTIVE || > > > > + s->state == MIGRATION_STATUS_CANCELLING) { > > > > > > This adds another state that we could end up with with a concurrent > > > modification, so that even in this case we undo the inactivation. > > > > > > However, it is no longer limited to the cases where we inactivated the > > > image. It also applies to other code paths (like the postcopy one) where > > > we didn't inactivate images. > > > > > > What saves the patch is that bdrv_invalidate_cache() is a no-op for > > > block devices that aren't inactivated, so calling it more often than > > > necessary is okay. > > > > > > But then, if we're going to rely on this, it would be much better to > > > just remove the if altogether. I can't say whether there are any other > > > possible values of s->state that we should consider, and by removing the > > > if we would be guaranteed to catch all of them. > > > > > > If we don't want to rely on it, just keep a local bool that remembers > > > whether we inactivated images and check that here. > > > > > > > Error *local_err = NULL; > > > > > > > > bdrv_invalidate_cache_all(&local_err); > > > > > > So in summary, this is a horrible patch because it checks the wrong > > > thing, and for I can't really say if it covers everything it needs to > > > cover, but arguably it happens to correctly fix the outcome of a > > > previously failing case. > > > > > > Normally I would reject such a patch and require a clean solution, but > > > then we're on the day of -rc3, so if you can't send v2 right away, we > > > might not have the time for it. > > > > > > Tough call... > > > > Hmm, this case is messy; I created this function having split it out > > of the main loop a couple of years back but it did get more messy > > with more s->state checks; as far as I can tell it's always > > done the transition to COMPLETED at the end well after the locked > > section, so there's always been that chance that cancellation sneaks > > in just before or just after the locked section. > > > > Some of the bad cases that can happen: > > a) A cancel sneaks in after the ACTIVE check but before or after > > the locked section; should we reactivate the disks? Well that > > depends on whether the destination actually got the full migration > > stream - we don't know! > > If the destination actually starts running we must not reactivate > > the disk on the source even if the CPU is stopped. > > > > Yes, we didn't have a mechanism to know exactly whether or not the VM in > destination is well received. > > > b) If the bdrv_inactive_all fails for one device, but the others > > are fine, we go down the fail: label and don't reactivate, so > > the source dies even though it might have been mostly OK. > > > > > We can move the _lock to before the check of s->state at the top, > > which would stop the cancel sneaking in early. > > In the case where postcopy was never enabled (!migrate_postcopy_ram()) > > we can move the COMPLETED transition into the lock as well; so I think > > then we kind of become safe. > > In the case where postcopy was enabled I think we can do the COMPLETED > > transition before waiting for the return path to close - I think but > > I need to think more about that. > > And there seem to be some dodgy cases where we call the invalidate > > there after a late postcopy failure; that's bad, we shouldn't reactivate > > the source disks after going into postcopy. > > > > So, in summary; this function is a mess - it needs a much bigger > > fix than this patch. > > > > So what's the conclusion ? > Will you send a patch to fix it ? Or let's fix it step by step ? > I think Kevin's suggestion which just remove the *if* check is OK. I don't see any of the simple solutions are easy; so I plan to look at it properly, but am not sure when; if you have time to do it then feel free. Dave > > Thanks, > Hailiang > > > Dave > > > > > Kevin > > > > > -- > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK > > > > . > > > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 2016/12/9 4:02, Dr. David Alan Gilbert wrote: > * Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote: >> Hi, >> >> On 2016/12/6 23:24, Dr. David Alan Gilbert wrote: >>> * Kevin Wolf (kwolf@redhat.com) wrote: >>>> Am 19.11.2016 um 12:43 hat zhanghailiang geschrieben: >>>>> commit fe904ea8242cbae2d7e69c052c754b8f5f1ba1d6 fixed a case >>>>> which migration aborted QEMU because it didn't regain the control >>>>> of images while some errors happened. >>>>> >>>>> Actually, we have another case in that error path to abort QEMU >>>>> because of the same reason: >>>>> migration_thread() >>>>> migration_completion() >>>>> bdrv_inactivate_all() ----------------> inactivate images >>>>> qemu_savevm_state_complete_precopy() >>>>> socket_writev_buffer() --------> error because destination fails >>>>> qemu_fflush() -------------------> set error on migration stream >>>>> qemu_mutex_unlock_iothread() ------> unlock >>>>> qmp_migrate_cancel() ---------------------> user cancelled migration >>>>> migrate_set_state() ------------------> set migrate CANCELLING >>>> >>>> Important to note here: qmp_migrate_cancel() is executed by a concurrent >>>> thread, it doesn't depend on any code paths in migration_completion(). >>>> >>>>> migration_completion() -----------------> go on to fail_invalidate >>>>> if (s->state == MIGRATION_STATUS_ACTIVE) -> Jump this branch >>>>> migration_thread() -----------------------> break migration loop >>>>> vm_start() -----------------------------> restart guest with inactive >>>>> images >>>>> We failed to regain the control of images because we only regain it >>>>> while the migration state is "active", but here users cancelled the migration >>>>> when they found some errors happened (for example, libvirtd daemon is shutdown >>>>> in destination unexpectedly). >>>>> >>>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> >>>>> --- >>>>> migration/migration.c | 3 ++- >>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/migration/migration.c b/migration/migration.c >>>>> index f498ab8..0c1ee6d 100644 >>>>> --- a/migration/migration.c >>>>> +++ b/migration/migration.c >>>>> @@ -1752,7 +1752,8 @@ fail_invalidate: >>>>> /* If not doing postcopy, vm_start() will be called: let's regain >>>>> * control on images. >>>>> */ >>>>> - if (s->state == MIGRATION_STATUS_ACTIVE) { >>>> >>>> This if condition tries to check whether we ran the code path that >>>> called bdrv_inactivate_all(), so that we only try to reactivate images >>>> it if we really inactivated them first. >>>> >>>> The problem with it is that it ignores a possible concurrent >>>> modification of s->state. >>>> >>>>> + if (s->state == MIGRATION_STATUS_ACTIVE || >>>>> + s->state == MIGRATION_STATUS_CANCELLING) { >>>> >>>> This adds another state that we could end up with with a concurrent >>>> modification, so that even in this case we undo the inactivation. >>>> >>>> However, it is no longer limited to the cases where we inactivated the >>>> image. It also applies to other code paths (like the postcopy one) where >>>> we didn't inactivate images. >>>> >>>> What saves the patch is that bdrv_invalidate_cache() is a no-op for >>>> block devices that aren't inactivated, so calling it more often than >>>> necessary is okay. >>>> >>>> But then, if we're going to rely on this, it would be much better to >>>> just remove the if altogether. I can't say whether there are any other >>>> possible values of s->state that we should consider, and by removing the >>>> if we would be guaranteed to catch all of them. >>>> >>>> If we don't want to rely on it, just keep a local bool that remembers >>>> whether we inactivated images and check that here. >>>> >>>>> Error *local_err = NULL; >>>>> >>>>> bdrv_invalidate_cache_all(&local_err); >>>> >>>> So in summary, this is a horrible patch because it checks the wrong >>>> thing, and for I can't really say if it covers everything it needs to >>>> cover, but arguably it happens to correctly fix the outcome of a >>>> previously failing case. >>>> >>>> Normally I would reject such a patch and require a clean solution, but >>>> then we're on the day of -rc3, so if you can't send v2 right away, we >>>> might not have the time for it. >>>> >>>> Tough call... >>> >>> Hmm, this case is messy; I created this function having split it out >>> of the main loop a couple of years back but it did get more messy >>> with more s->state checks; as far as I can tell it's always >>> done the transition to COMPLETED at the end well after the locked >>> section, so there's always been that chance that cancellation sneaks >>> in just before or just after the locked section. >>> >>> Some of the bad cases that can happen: >>> a) A cancel sneaks in after the ACTIVE check but before or after >>> the locked section; should we reactivate the disks? Well that >>> depends on whether the destination actually got the full migration >>> stream - we don't know! >>> If the destination actually starts running we must not reactivate >>> the disk on the source even if the CPU is stopped. >>> >> >> Yes, we didn't have a mechanism to know exactly whether or not the VM in >> destination is well received. >> >>> b) If the bdrv_inactive_all fails for one device, but the others >>> are fine, we go down the fail: label and don't reactivate, so >>> the source dies even though it might have been mostly OK. >>> >> >>> We can move the _lock to before the check of s->state at the top, >>> which would stop the cancel sneaking in early. >>> In the case where postcopy was never enabled (!migrate_postcopy_ram()) >>> we can move the COMPLETED transition into the lock as well; so I think >>> then we kind of become safe. >>> In the case where postcopy was enabled I think we can do the COMPLETED >>> transition before waiting for the return path to close - I think but >>> I need to think more about that. >>> And there seem to be some dodgy cases where we call the invalidate >>> there after a late postcopy failure; that's bad, we shouldn't reactivate >>> the source disks after going into postcopy. >>> >>> So, in summary; this function is a mess - it needs a much bigger >>> fix than this patch. >>> >> >> So what's the conclusion ? >> Will you send a patch to fix it ? Or let's fix it step by step ? >> I think Kevin's suggestion which just remove the *if* check is OK. > > I don't see any of the simple solutions are easy; so I plan > to look at it properly, but am not sure when; if you have time > to do it then feel free. > Hmm, we still have gaps between bdrv_inactivate_all() and migration thread totally exit, which migration cancelling can slip in. We do caught that case while we finished migration_completion() but before the begin of cleanup work (It has global lock to be protected). The related codes is: migration_completion(s, current_active_state, &old_vm_running, &start_time); break; } } ------------------------------------------------------> gap begin if (qemu_file_get_error(s->to_dst_file)) { migrate_set_state(&s->state, current_active_state, MIGRATION_STATUS_FAILED); trace_migration_thread_file_err(); break; } current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); if (current_time >= initial_time + BUFFER_DELAY) { uint64_t transferred_bytes = qemu_ftell(s->to_dst_file) - initial_bytes; uint64_t time_spent = current_time - initial_time; double bandwidth = (double)transferred_bytes / time_spent; max_size = bandwidth * s->parameters.downtime_limit; s->mbps = (((double) transferred_bytes * 8.0) / ((double) time_spent / 1000.0)) / 1000.0 / 1000.0; trace_migrate_transferred(transferred_bytes, time_spent, bandwidth, max_size); /* if we haven't sent anything, we don't want to recalculate 10000 is a small enough number for our purposes */ if (s->dirty_bytes_rate && transferred_bytes > 10000) { s->expected_downtime = s->dirty_bytes_rate / bandwidth; } qemu_file_reset_rate_limit(s->to_dst_file); initial_time = current_time; initial_bytes = qemu_ftell(s->to_dst_file); } if (qemu_file_rate_limit(s->to_dst_file)) { /* usleep expects microseconds */ g_usleep((initial_time + BUFFER_DELAY - current_time)*1000); } } trace_migration_thread_after_loop(); /* If we enabled cpu throttling for auto-converge, turn it off. */ cpu_throttle_stop(); end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); --------------------------------------------------------------> gap end qemu_mutex_lock_iothread(); /* So maybe we should call bdrv_invalidate_cache_all() in qmp_migrate_cancel() directly ? which i think will cover all the cases. (Including another potential case, which QEMU has finished the migration process, but libvirtd decides to cancel the migration before shutdown it.) The things here I'm not sure is, is it OK to call bdrv_invalidate_cache_all() without bdrv_inactivate_all() has been called ? If not, we need to record if we have inactive all blocks before call bdrv_invalidate_cache_all()。 Any ideas ? Hailiang > Dave > >> >> Thanks, >> Hailiang >> >>> Dave >>> >>>> Kevin >>>> >>> -- >>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK >>> >>> . >>> >> > -- > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK > > . >
ping .. ? Any comments ? Or should I send a for formal patch ? On 2016/12/22 10:56, Hailiang Zhang wrote: > On 2016/12/9 4:02, Dr. David Alan Gilbert wrote: >> * Hailiang Zhang (zhang.zhanghailiang@huawei.com) wrote: >>> Hi, >>> >>> On 2016/12/6 23:24, Dr. David Alan Gilbert wrote: >>>> * Kevin Wolf (kwolf@redhat.com) wrote: >>>>> Am 19.11.2016 um 12:43 hat zhanghailiang geschrieben: >>>>>> commit fe904ea8242cbae2d7e69c052c754b8f5f1ba1d6 fixed a case >>>>>> which migration aborted QEMU because it didn't regain the control >>>>>> of images while some errors happened. >>>>>> >>>>>> Actually, we have another case in that error path to abort QEMU >>>>>> because of the same reason: >>>>>> migration_thread() >>>>>> migration_completion() >>>>>> bdrv_inactivate_all() ----------------> inactivate images >>>>>> qemu_savevm_state_complete_precopy() >>>>>> socket_writev_buffer() --------> error because destination fails >>>>>> qemu_fflush() -------------------> set error on migration stream >>>>>> qemu_mutex_unlock_iothread() ------> unlock >>>>>> qmp_migrate_cancel() ---------------------> user cancelled migration >>>>>> migrate_set_state() ------------------> set migrate CANCELLING >>>>> >>>>> Important to note here: qmp_migrate_cancel() is executed by a concurrent >>>>> thread, it doesn't depend on any code paths in migration_completion(). >>>>> >>>>>> migration_completion() -----------------> go on to fail_invalidate >>>>>> if (s->state == MIGRATION_STATUS_ACTIVE) -> Jump this branch >>>>>> migration_thread() -----------------------> break migration loop >>>>>> vm_start() -----------------------------> restart guest with inactive >>>>>> images >>>>>> We failed to regain the control of images because we only regain it >>>>>> while the migration state is "active", but here users cancelled the migration >>>>>> when they found some errors happened (for example, libvirtd daemon is shutdown >>>>>> in destination unexpectedly). >>>>>> >>>>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> >>>>>> --- >>>>>> migration/migration.c | 3 ++- >>>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/migration/migration.c b/migration/migration.c >>>>>> index f498ab8..0c1ee6d 100644 >>>>>> --- a/migration/migration.c >>>>>> +++ b/migration/migration.c >>>>>> @@ -1752,7 +1752,8 @@ fail_invalidate: >>>>>> /* If not doing postcopy, vm_start() will be called: let's regain >>>>>> * control on images. >>>>>> */ >>>>>> - if (s->state == MIGRATION_STATUS_ACTIVE) { >>>>> >>>>> This if condition tries to check whether we ran the code path that >>>>> called bdrv_inactivate_all(), so that we only try to reactivate images >>>>> it if we really inactivated them first. >>>>> >>>>> The problem with it is that it ignores a possible concurrent >>>>> modification of s->state. >>>>> >>>>>> + if (s->state == MIGRATION_STATUS_ACTIVE || >>>>>> + s->state == MIGRATION_STATUS_CANCELLING) { >>>>> >>>>> This adds another state that we could end up with with a concurrent >>>>> modification, so that even in this case we undo the inactivation. >>>>> >>>>> However, it is no longer limited to the cases where we inactivated the >>>>> image. It also applies to other code paths (like the postcopy one) where >>>>> we didn't inactivate images. >>>>> >>>>> What saves the patch is that bdrv_invalidate_cache() is a no-op for >>>>> block devices that aren't inactivated, so calling it more often than >>>>> necessary is okay. >>>>> >>>>> But then, if we're going to rely on this, it would be much better to >>>>> just remove the if altogether. I can't say whether there are any other >>>>> possible values of s->state that we should consider, and by removing the >>>>> if we would be guaranteed to catch all of them. >>>>> >>>>> If we don't want to rely on it, just keep a local bool that remembers >>>>> whether we inactivated images and check that here. >>>>> >>>>>> Error *local_err = NULL; >>>>>> >>>>>> bdrv_invalidate_cache_all(&local_err); >>>>> >>>>> So in summary, this is a horrible patch because it checks the wrong >>>>> thing, and for I can't really say if it covers everything it needs to >>>>> cover, but arguably it happens to correctly fix the outcome of a >>>>> previously failing case. >>>>> >>>>> Normally I would reject such a patch and require a clean solution, but >>>>> then we're on the day of -rc3, so if you can't send v2 right away, we >>>>> might not have the time for it. >>>>> >>>>> Tough call... >>>> >>>> Hmm, this case is messy; I created this function having split it out >>>> of the main loop a couple of years back but it did get more messy >>>> with more s->state checks; as far as I can tell it's always >>>> done the transition to COMPLETED at the end well after the locked >>>> section, so there's always been that chance that cancellation sneaks >>>> in just before or just after the locked section. >>>> >>>> Some of the bad cases that can happen: >>>> a) A cancel sneaks in after the ACTIVE check but before or after >>>> the locked section; should we reactivate the disks? Well that >>>> depends on whether the destination actually got the full migration >>>> stream - we don't know! >>>> If the destination actually starts running we must not reactivate >>>> the disk on the source even if the CPU is stopped. >>>> >>> >>> Yes, we didn't have a mechanism to know exactly whether or not the VM in >>> destination is well received. >>> >>>> b) If the bdrv_inactive_all fails for one device, but the others >>>> are fine, we go down the fail: label and don't reactivate, so >>>> the source dies even though it might have been mostly OK. >>>> >>> >>>> We can move the _lock to before the check of s->state at the top, >>>> which would stop the cancel sneaking in early. >>>> In the case where postcopy was never enabled (!migrate_postcopy_ram()) >>>> we can move the COMPLETED transition into the lock as well; so I think >>>> then we kind of become safe. >>>> In the case where postcopy was enabled I think we can do the COMPLETED >>>> transition before waiting for the return path to close - I think but >>>> I need to think more about that. >>>> And there seem to be some dodgy cases where we call the invalidate >>>> there after a late postcopy failure; that's bad, we shouldn't reactivate >>>> the source disks after going into postcopy. >>>> >>>> So, in summary; this function is a mess - it needs a much bigger >>>> fix than this patch. >>>> >>> >>> So what's the conclusion ? >>> Will you send a patch to fix it ? Or let's fix it step by step ? >>> I think Kevin's suggestion which just remove the *if* check is OK. >> >> I don't see any of the simple solutions are easy; so I plan >> to look at it properly, but am not sure when; if you have time >> to do it then feel free. >> > > Hmm, we still have gaps between bdrv_inactivate_all() and migration thread > totally exit, which migration cancelling can slip in. > We do caught that case while we finished migration_completion() but > before the begin of cleanup work (It has global lock to be protected). > The related codes is: > > migration_completion(s, current_active_state, > &old_vm_running, &start_time); > break; > } > } > ------------------------------------------------------> gap begin > if (qemu_file_get_error(s->to_dst_file)) { > migrate_set_state(&s->state, current_active_state, > MIGRATION_STATUS_FAILED); > trace_migration_thread_file_err(); > break; > } > current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); > if (current_time >= initial_time + BUFFER_DELAY) { > uint64_t transferred_bytes = qemu_ftell(s->to_dst_file) - > initial_bytes; > uint64_t time_spent = current_time - initial_time; > double bandwidth = (double)transferred_bytes / time_spent; > max_size = bandwidth * s->parameters.downtime_limit; > > s->mbps = (((double) transferred_bytes * 8.0) / > ((double) time_spent / 1000.0)) / 1000.0 / 1000.0; > > trace_migrate_transferred(transferred_bytes, time_spent, > bandwidth, max_size); > /* if we haven't sent anything, we don't want to recalculate > 10000 is a small enough number for our purposes */ > if (s->dirty_bytes_rate && transferred_bytes > 10000) { > s->expected_downtime = s->dirty_bytes_rate / bandwidth; > } > > qemu_file_reset_rate_limit(s->to_dst_file); > initial_time = current_time; > initial_bytes = qemu_ftell(s->to_dst_file); > } > if (qemu_file_rate_limit(s->to_dst_file)) { > /* usleep expects microseconds */ > g_usleep((initial_time + BUFFER_DELAY - current_time)*1000); > } > } > > trace_migration_thread_after_loop(); > /* If we enabled cpu throttling for auto-converge, turn it off. */ > cpu_throttle_stop(); > end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME); > --------------------------------------------------------------> gap end > qemu_mutex_lock_iothread(); > /* > > So maybe we should call bdrv_invalidate_cache_all() in qmp_migrate_cancel() > directly ? which i think will cover all the cases. (Including another potential > case, which QEMU has finished the migration process, but libvirtd decides to > cancel the migration before shutdown it.) > > The things here I'm not sure is, is it OK to call bdrv_invalidate_cache_all() > without bdrv_inactivate_all() has been called ? If not, we need to record > if we have inactive all blocks before call bdrv_invalidate_cache_all()。 > > Any ideas ? > > Hailiang > >> Dave >> >>> >>> Thanks, >>> Hailiang >>> >>>> Dave >>>> >>>>> Kevin >>>>> >>>> -- >>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK >>>> >>>> . >>>> >>> >> -- >> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK >> >> . >>
diff --git a/migration/migration.c b/migration/migration.c index f498ab8..0c1ee6d 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1752,7 +1752,8 @@ fail_invalidate: /* If not doing postcopy, vm_start() will be called: let's regain * control on images. */ - if (s->state == MIGRATION_STATUS_ACTIVE) { + if (s->state == MIGRATION_STATUS_ACTIVE || + s->state == MIGRATION_STATUS_CANCELLING) { Error *local_err = NULL; bdrv_invalidate_cache_all(&local_err);
commit fe904ea8242cbae2d7e69c052c754b8f5f1ba1d6 fixed a case which migration aborted QEMU because it didn't regain the control of images while some errors happened. Actually, we have another case in that error path to abort QEMU because of the same reason: migration_thread() migration_completion() bdrv_inactivate_all() ----------------> inactivate images qemu_savevm_state_complete_precopy() socket_writev_buffer() --------> error because destination fails qemu_fflush() -------------------> set error on migration stream qemu_mutex_unlock_iothread() ------> unlock qmp_migrate_cancel() ---------------------> user cancelled migration migrate_set_state() ------------------> set migrate CANCELLING migration_completion() -----------------> go on to fail_invalidate if (s->state == MIGRATION_STATUS_ACTIVE) -> Jump this branch migration_thread() -----------------------> break migration loop vm_start() -----------------------------> restart guest with inactive images We failed to regain the control of images because we only regain it while the migration state is "active", but here users cancelled the migration when they found some errors happened (for example, libvirtd daemon is shutdown in destination unexpectedly). Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> --- migration/migration.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)