Message ID | 20241020130108.27148-3-avihaih@nvidia.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | vfio/migration: Some bug fixes and cleanups | expand |
Hello Avihai, On 10/20/24 15:01, Avihai Horon wrote: > When the VM is shut down vfio_vmstate_change/_prepare() are called to > transition the VFIO device state to STOP. They are called after > migration_shutdown() and thus, by this time, the migration object is > already freed (more specifically, MigrationState->qemu_file_lock is > already destroyed). > > In this case, if there is an error in vfio_vmstate_change/_prepare(), it > calls migration_file_set_error() which tries to lock the already > destroyed MigrationState->qemu_file_lock, leading to the following > assert: > > qemu-system-x86_64: ../util/qemu-thread-posix.c:92: qemu_mutex_lock_impl: Assertion `mutex->initialized' failed. > > Fix this by not setting migration file error in the shut down flow. > > Fixes: 20c64c8a51a4 ("migration: migration_file_set_error") > Signed-off-by: Avihai Horon <avihaih@nvidia.com> > --- > hw/vfio/migration.c | 31 +++++++++++++++++++++---------- > 1 file changed, 21 insertions(+), 10 deletions(-) > > diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c > index 992dc3b102..1c44b036ea 100644 > --- a/hw/vfio/migration.c > +++ b/hw/vfio/migration.c > @@ -783,6 +783,25 @@ static const SaveVMHandlers savevm_vfio_handlers = { > > /* ---------------------------------------------------------------------- */ > > +static void vfio_vmstate_change_error_report(int ret, Error *err, > + RunState state) > +{ > + if (state == RUN_STATE_SHUTDOWN) { > + /* > + * If VM is being shut down, migration object might have already been > + * freed, so just report the error. > + */ > + error_report_err(err); > + return; > + } > + > + /* > + * Migration should be aborted in this case, but vm_state_notify() > + * currently does not support reporting failures. > + */ > + migration_file_set_error(ret, err); > +} This seems correct, but testing the machine's runtime state to work around the fact that 'current_migration' is not safe to use reveals a flaw. In vfio, migration_is_setup_or_active() is unsafe. So are the calls to vfio_set_migration_error(). This comment seems in contradiction with the migration code : /* When we add fault tolerance, we could have several migrations at once. For now we don't need to add dynamic creation of migration */ Why is 'current_migration' allocated and destroyed today ? Could we replace it with a singleton and switch the state to MIGRATION_STATUS_NONE instead of destroying it ? If not, should 'current_migration' be set to NULL in migration_shutdown() and its value tested before accessing to any of its fields, in a thread safe manner if necessary. Thanks, C.
On Mon, Oct 21, 2024 at 05:14:11PM +0200, Cédric Le Goater wrote: > Hello Avihai, > > On 10/20/24 15:01, Avihai Horon wrote: > > When the VM is shut down vfio_vmstate_change/_prepare() are called to > > transition the VFIO device state to STOP. They are called after > > migration_shutdown() and thus, by this time, the migration object is > > already freed (more specifically, MigrationState->qemu_file_lock is > > already destroyed). > > > > In this case, if there is an error in vfio_vmstate_change/_prepare(), it > > calls migration_file_set_error() which tries to lock the already > > destroyed MigrationState->qemu_file_lock, leading to the following > > assert: > > > > qemu-system-x86_64: ../util/qemu-thread-posix.c:92: qemu_mutex_lock_impl: Assertion `mutex->initialized' failed. > > > > Fix this by not setting migration file error in the shut down flow. > > > > Fixes: 20c64c8a51a4 ("migration: migration_file_set_error") > > Signed-off-by: Avihai Horon <avihaih@nvidia.com> > > --- > > hw/vfio/migration.c | 31 +++++++++++++++++++++---------- > > 1 file changed, 21 insertions(+), 10 deletions(-) > > > > diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c > > index 992dc3b102..1c44b036ea 100644 > > --- a/hw/vfio/migration.c > > +++ b/hw/vfio/migration.c > > @@ -783,6 +783,25 @@ static const SaveVMHandlers savevm_vfio_handlers = { > > /* ---------------------------------------------------------------------- */ > > +static void vfio_vmstate_change_error_report(int ret, Error *err, > > + RunState state) > > +{ > > + if (state == RUN_STATE_SHUTDOWN) { > > + /* > > + * If VM is being shut down, migration object might have already been > > + * freed, so just report the error. > > + */ > > + error_report_err(err); > > + return; > > + } > > + > > + /* > > + * Migration should be aborted in this case, but vm_state_notify() > > + * currently does not support reporting failures. > > + */ > > + migration_file_set_error(ret, err); > > +} > > This seems correct, but testing the machine's runtime state to > work around the fact that 'current_migration' is not safe to > use reveals a flaw. > > In vfio, migration_is_setup_or_active() is unsafe. So are the > calls to vfio_set_migration_error(). > > > This comment seems in contradiction with the migration code : > > /* When we add fault tolerance, we could have several > migrations at once. For now we don't need to add > dynamic creation of migration */ > > Why is 'current_migration' allocated and destroyed today ? > > Could we replace it with a singleton and switch the state to > MIGRATION_STATUS_NONE instead of destroying it ? > > If not, should 'current_migration' be set to NULL in > migration_shutdown() and its value tested before accessing > to any of its fields, in a thread safe manner if necessary. IIUC the migration thread should always see valid migration object, as it takes one refcount at the entrance of migration_thread(): object_ref(OBJECT(s)); So it's not yet clear to me on why the mutex was destroyed, if the main object should still be there; logically it was only destroyed in the finalize phase (migration_instance_finalize), but that should be after migration thread quits and releases the refcount..
Hello, > IIUC the migration thread should always see valid migration object, as it > takes one refcount at the entrance of migration_thread(): > > object_ref(OBJECT(s)); Could the migration have failed before ? in migrate_fd_connect() > So it's not yet clear to me on why the mutex was destroyed, if the main > object should still be there; logically it was only destroyed in the > finalize phase (migration_instance_finalize), but that should be after > migration thread quits and releases the refcount.. Avihai, would you have a reproducer ? or did you keep some logs ? a backtrace maybe ? Thanks, C.
On Mon, Oct 21, 2024 at 06:43:13PM +0200, Cédric Le Goater wrote: > Hello, > > > IIUC the migration thread should always see valid migration object, as it > > takes one refcount at the entrance of migration_thread(): > > > > object_ref(OBJECT(s)); > > Could the migration have failed before ? in migrate_fd_connect() I just noticed it's a vm state change notifier.. If so, maybe VFIO could refer to its internal states showing that it's during a migration first (so as to guarantee the migration object is valid; e.g., only after save_setup() but before save_cleanup() hooks are invoked). Then migration_file_set_error() is only needed when it's during a migration already. Otherwise it's only a vm state change event, so nothing relevant to migration.
On 21/10/2024 19:54, Peter Xu wrote: > External email: Use caution opening links or attachments > > > On Mon, Oct 21, 2024 at 06:43:13PM +0200, Cédric Le Goater wrote: >> Hello, >> >>> IIUC the migration thread should always see valid migration object, as it >>> takes one refcount at the entrance of migration_thread(): >>> >>> object_ref(OBJECT(s)); >> Could the migration have failed before ? in migrate_fd_connect() > I just noticed it's a vm state change notifier.. Yep. I stumbled upon this bug by accident when running on a buggy machine. Migration wasn't involved, I just started the VM, shut it down and got the assert (as my VFIO device was faulty and errored on RUNNING->STOP state change). You can repro it by forcefully triggering an error on *->STOP transition: diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index 17199b73ae..d41cb7c634 100644 --- a/hw/vfio/migration.c +++ b/hw/vfio/migration.c @@ -831,7 +831,9 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state) } ret = vfio_migration_set_state_or_reset(vbasedev, new_state, &local_err); - if (ret) { + if (ret || new_state == VFIO_DEVICE_STATE_STOP) { + ret = -1; + error_setg(&local_err, "%s: forced error", vbasedev->name); /* * Migration should be aborted in this case, but vm_state_notify() * currently does not support reporting failures. > > If so, maybe VFIO could refer to its internal states showing that it's > during a migration first (so as to guarantee the migration object is valid; > e.g., only after save_setup() but before save_cleanup() hooks are invoked). It's an option, but I think it's a bit awkward as we'd need to check that VFIOMigration->data_buffer is set or add a new flag. Besides that, as Cedric pointed out, VFIO code calls migration_is_setup_or_active() which can also be unsafe, as it might be invoked after migration object has been freed. Maybe a simpler solution would be to extend the the migration object lifetime? Looking at commit history, you can see that commit 1f8956041ad3 ("migration: finalize current_migration object") added migration object finalization at the very end of qemu cleanup. Then came commit 892ae715b6bc ("migration: Cleanup during exit") and moved the migration object finalization to the beginning of qemu cleanup (before stopping the VM etc.). If so, the fix could be something like the below? -------------8<------------- diff --git a/include/migration/misc.h b/include/migration/misc.h index bfadc5613b..5eb099349a 100644 --- a/include/migration/misc.h +++ b/include/migration/misc.h @@ -52,6 +52,7 @@ void dump_vmstate_json_to_file(FILE *out_fp); /* migration/migration.c */ void migration_object_init(void); +void migration_object_finalize(void); void migration_shutdown(void); bool migration_is_idle(void); bool migration_is_active(void); diff --git a/migration/migration.c b/migration/migration.c index 021faee2f3..9eaa7947bc 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -265,6 +265,11 @@ void migration_object_init(void) dirty_bitmap_mig_init(); } +void migration_object_finalize(void) +{ + object_unref(OBJECT(current_migration)); +} + typedef struct { QEMUBH *bh; QEMUBHFunc *cb; @@ -330,7 +335,6 @@ void migration_shutdown(void) * stop the migration using this structure */ migration_cancel(NULL); - object_unref(OBJECT(current_migration)); /* * Cancel outgoing migration of dirty bitmaps. It should diff --git a/system/runstate.c b/system/runstate.c index c2c9afa905..fa823f5e72 100644 --- a/system/runstate.c +++ b/system/runstate.c @@ -930,5 +930,6 @@ void qemu_cleanup(int status) monitor_cleanup(); qemu_chr_cleanup(); user_creatable_cleanup(); + migration_object_finalize(); /* TODO: unref root container, check all devices are ok */ } -------------8<------------- Thanks.
On 10/22/24 11:38, Avihai Horon wrote: > > On 21/10/2024 19:54, Peter Xu wrote: >> External email: Use caution opening links or attachments >> >> >> On Mon, Oct 21, 2024 at 06:43:13PM +0200, Cédric Le Goater wrote: >>> Hello, >>> >>>> IIUC the migration thread should always see valid migration object, as it >>>> takes one refcount at the entrance of migration_thread(): >>>> >>>> object_ref(OBJECT(s)); >>> Could the migration have failed before ? in migrate_fd_connect() >> I just noticed it's a vm state change notifier.. > > Yep. > I stumbled upon this bug by accident when running on a buggy machine. > Migration wasn't involved, I just started the VM, shut it down and got the assert (as my VFIO device was faulty and errored on RUNNING->STOP state change). > > You can repro it by forcefully triggering an error on *->STOP transition: > > diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c > index 17199b73ae..d41cb7c634 100644 > --- a/hw/vfio/migration.c > +++ b/hw/vfio/migration.c > @@ -831,7 +831,9 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state) > } > > ret = vfio_migration_set_state_or_reset(vbasedev, new_state, &local_err); > - if (ret) { > + if (ret || new_state == VFIO_DEVICE_STATE_STOP) { > + ret = -1; > + error_setg(&local_err, "%s: forced error", vbasedev->name); > /* > * Migration should be aborted in this case, but vm_state_notify() > * currently does not support reporting failures. > >> >> If so, maybe VFIO could refer to its internal states showing that it's >> during a migration first (so as to guarantee the migration object is valid; >> e.g., only after save_setup() but before save_cleanup() hooks are invoked). > > It's an option, but I think it's a bit awkward as we'd need to check that VFIOMigration->data_buffer is set That's what I was looking at too. It works. It feels a bit awkward indeed. We could hide the details in an helper routine though. > or add a new flag. yes that's another solution. Peter, I wonder if we could grab a ref on current_migration in save_setup(), store it under VFIOMigration and drop it save_cleanup() ? > Besides that, as Cedric pointed out, VFIO code calls migration_is_setup_or_active() which can also be unsafe, as it might be invoked after migration object has been freed. > > Maybe a simpler solution would be to extend the the migration object lifetime? > Looking at commit history, you can see that commit 1f8956041ad3 ("migration: finalize current_migration object") added migration object finalization at the very end of qemu cleanup. > Then came commit 892ae715b6bc ("migration: Cleanup during exit") and moved the migration object finalization to the beginning of qemu cleanup (before stopping the VM etc.). > > If so, the fix could be something like the below? > > -------------8<------------- > diff --git a/include/migration/misc.h b/include/migration/misc.h > index bfadc5613b..5eb099349a 100644 > --- a/include/migration/misc.h > +++ b/include/migration/misc.h > @@ -52,6 +52,7 @@ void dump_vmstate_json_to_file(FILE *out_fp); > > /* migration/migration.c */ > void migration_object_init(void); > +void migration_object_finalize(void); > void migration_shutdown(void); > bool migration_is_idle(void); > bool migration_is_active(void); > diff --git a/migration/migration.c b/migration/migration.c > index 021faee2f3..9eaa7947bc 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -265,6 +265,11 @@ void migration_object_init(void) > dirty_bitmap_mig_init(); > } > > +void migration_object_finalize(void) > +{ > + object_unref(OBJECT(current_migration)); > +} > + > typedef struct { > QEMUBH *bh; > QEMUBHFunc *cb; > @@ -330,7 +335,6 @@ void migration_shutdown(void) > * stop the migration using this structure > */ > migration_cancel(NULL); > - object_unref(OBJECT(current_migration)); > > /* > * Cancel outgoing migration of dirty bitmaps. It should > diff --git a/system/runstate.c b/system/runstate.c > index c2c9afa905..fa823f5e72 100644 > --- a/system/runstate.c > +++ b/system/runstate.c > @@ -930,5 +930,6 @@ void qemu_cleanup(int status) > monitor_cleanup(); > qemu_chr_cleanup(); > user_creatable_cleanup(); > + migration_object_finalize(); > /* TODO: unref root container, check all devices are ok */ > } > -------------8<------------- 892ae715b6bc was trying to fix potential use-after-free issues. I think it is safer to introduce an helper routine checking (in some ways) if a migration is in progress than partially revert 892ae715b6bc. Thanks, C.
On 22/10/2024 16:24, Cédric Le Goater wrote: > External email: Use caution opening links or attachments > > > On 10/22/24 11:38, Avihai Horon wrote: >> >> On 21/10/2024 19:54, Peter Xu wrote: >>> External email: Use caution opening links or attachments >>> >>> >>> On Mon, Oct 21, 2024 at 06:43:13PM +0200, Cédric Le Goater wrote: >>>> Hello, >>>> >>>>> IIUC the migration thread should always see valid migration >>>>> object, as it >>>>> takes one refcount at the entrance of migration_thread(): >>>>> >>>>> object_ref(OBJECT(s)); >>>> Could the migration have failed before ? in migrate_fd_connect() >>> I just noticed it's a vm state change notifier.. >> >> Yep. >> I stumbled upon this bug by accident when running on a buggy machine. >> Migration wasn't involved, I just started the VM, shut it down and >> got the assert (as my VFIO device was faulty and errored on >> RUNNING->STOP state change). >> >> You can repro it by forcefully triggering an error on *->STOP >> transition: >> >> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c >> index 17199b73ae..d41cb7c634 100644 >> --- a/hw/vfio/migration.c >> +++ b/hw/vfio/migration.c >> @@ -831,7 +831,9 @@ static void vfio_vmstate_change(void *opaque, >> bool running, RunState state) >> } >> >> ret = vfio_migration_set_state_or_reset(vbasedev, new_state, >> &local_err); >> - if (ret) { >> + if (ret || new_state == VFIO_DEVICE_STATE_STOP) { >> + ret = -1; >> + error_setg(&local_err, "%s: forced error", vbasedev->name); >> /* >> * Migration should be aborted in this case, but >> vm_state_notify() >> * currently does not support reporting failures. >> >>> >>> If so, maybe VFIO could refer to its internal states showing that it's >>> during a migration first (so as to guarantee the migration object is >>> valid; >>> e.g., only after save_setup() but before save_cleanup() hooks are >>> invoked). >> >> It's an option, but I think it's a bit awkward as we'd need to check >> that VFIOMigration->data_buffer is set > > That's what I was looking at too. It works. It feels a bit awkward > indeed. We could hide the details in an helper routine though. > >> or add a new flag. > > yes that's another solution. > > Peter, > > I wonder if we could grab a ref on current_migration in save_setup(), > store it under VFIOMigration and drop it save_cleanup() ? > > >> Besides that, as Cedric pointed out, VFIO code calls >> migration_is_setup_or_active() which can also be unsafe, as it might >> be invoked after migration object has been freed. >> >> Maybe a simpler solution would be to extend the the migration object >> lifetime? >> Looking at commit history, you can see that commit 1f8956041ad3 >> ("migration: finalize current_migration object") added migration >> object finalization at the very end of qemu cleanup. >> Then came commit 892ae715b6bc ("migration: Cleanup during exit") and >> moved the migration object finalization to the beginning of qemu >> cleanup (before stopping the VM etc.). >> >> If so, the fix could be something like the below? >> >> -------------8<------------- >> diff --git a/include/migration/misc.h b/include/migration/misc.h >> index bfadc5613b..5eb099349a 100644 >> --- a/include/migration/misc.h >> +++ b/include/migration/misc.h >> @@ -52,6 +52,7 @@ void dump_vmstate_json_to_file(FILE *out_fp); >> >> /* migration/migration.c */ >> void migration_object_init(void); >> +void migration_object_finalize(void); >> void migration_shutdown(void); >> bool migration_is_idle(void); >> bool migration_is_active(void); >> diff --git a/migration/migration.c b/migration/migration.c >> index 021faee2f3..9eaa7947bc 100644 >> --- a/migration/migration.c >> +++ b/migration/migration.c >> @@ -265,6 +265,11 @@ void migration_object_init(void) >> dirty_bitmap_mig_init(); >> } >> >> +void migration_object_finalize(void) >> +{ >> + object_unref(OBJECT(current_migration)); >> +} >> + >> typedef struct { >> QEMUBH *bh; >> QEMUBHFunc *cb; >> @@ -330,7 +335,6 @@ void migration_shutdown(void) >> * stop the migration using this structure >> */ >> migration_cancel(NULL); >> - object_unref(OBJECT(current_migration)); >> >> /* >> * Cancel outgoing migration of dirty bitmaps. It should >> diff --git a/system/runstate.c b/system/runstate.c >> index c2c9afa905..fa823f5e72 100644 >> --- a/system/runstate.c >> +++ b/system/runstate.c >> @@ -930,5 +930,6 @@ void qemu_cleanup(int status) >> monitor_cleanup(); >> qemu_chr_cleanup(); >> user_creatable_cleanup(); >> + migration_object_finalize(); >> /* TODO: unref root container, check all devices are ok */ >> } >> -------------8<------------- > > 892ae715b6bc was trying to fix potential use-after-free issues. Yes, and it did, by ref/unref current_migration in migration_thread(). In addition, it added a migrate_fd_cancel() in the beginning of qemu cleanup to encourage migration_thread to quit. IIUC, these are the core changes of the commit, and the only reason to move "object_unref(OBJECT(current_migration))" from the end of qemu cleanup to the beginning of it was to put migration cleanup code in one place. That's why I think moving "object_unref(OBJECT(current_migration))" back to the end of qemu cleanup is safe and shouldn't make much difference. It will delay freeing of MigrationState right to the end and will allow us to use the already existing helper (migration_is_setup_or_active()) to check migration status. IMHO this seems like the cleanest solution, unless I am missing something. Thanks. > > I think it is safer to introduce an helper routine checking > (in some ways) if a migration is in progress than partially > revert 892ae715b6bc. > > Thanks, > > C. >
On Tue, Oct 22, 2024 at 03:24:56PM +0200, Cédric Le Goater wrote: > On 10/22/24 11:38, Avihai Horon wrote: > > > > On 21/10/2024 19:54, Peter Xu wrote: > > > External email: Use caution opening links or attachments > > > > > > > > > On Mon, Oct 21, 2024 at 06:43:13PM +0200, Cédric Le Goater wrote: > > > > Hello, > > > > > > > > > IIUC the migration thread should always see valid migration object, as it > > > > > takes one refcount at the entrance of migration_thread(): > > > > > > > > > > object_ref(OBJECT(s)); > > > > Could the migration have failed before ? in migrate_fd_connect() > > > I just noticed it's a vm state change notifier.. > > > > Yep. > > I stumbled upon this bug by accident when running on a buggy machine. > > Migration wasn't involved, I just started the VM, shut it down and got the assert (as my VFIO device was faulty and errored on RUNNING->STOP state change). > > > > You can repro it by forcefully triggering an error on *->STOP transition: > > > > diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c > > index 17199b73ae..d41cb7c634 100644 > > --- a/hw/vfio/migration.c > > +++ b/hw/vfio/migration.c > > @@ -831,7 +831,9 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state) > > } > > > > ret = vfio_migration_set_state_or_reset(vbasedev, new_state, &local_err); > > - if (ret) { > > + if (ret || new_state == VFIO_DEVICE_STATE_STOP) { > > + ret = -1; > > + error_setg(&local_err, "%s: forced error", vbasedev->name); > > /* > > * Migration should be aborted in this case, but vm_state_notify() > > * currently does not support reporting failures. > > > > > > > > If so, maybe VFIO could refer to its internal states showing that it's > > > during a migration first (so as to guarantee the migration object is valid; > > > e.g., only after save_setup() but before save_cleanup() hooks are invoked). > > > > It's an option, but I think it's a bit awkward as we'd need to check > > that VFIOMigration->data_buffer is set > > That's what I was looking at too. It works. It feels a bit awkward > indeed. We could hide the details in an helper routine though. > > > or add a new flag. > > yes that's another solution. > > Peter, > > I wonder if we could grab a ref on current_migration in save_setup(), > store it under VFIOMigration and drop it save_cleanup() ? VFIO can definitely do that, but I am not sure how that would help.. as I think the migration object can never go away anyway during setup->cleanup, so taking that extra refcount shouldn't change anything. IOW, AFAICT this bug is triggered only when without migration at all. > > > > Besides that, as Cedric pointed out, VFIO code calls migration_is_setup_or_active() which can also be unsafe, as it might be invoked after migration object has been freed. > > > > Maybe a simpler solution would be to extend the the migration object lifetime? > > Looking at commit history, you can see that commit 1f8956041ad3 ("migration: finalize current_migration object") added migration object finalization at the very end of qemu cleanup. > > Then came commit 892ae715b6bc ("migration: Cleanup during exit") and moved the migration object finalization to the beginning of qemu cleanup (before stopping the VM etc.). > > > > If so, the fix could be something like the below? > > > > -------------8<------------- > > diff --git a/include/migration/misc.h b/include/migration/misc.h > > index bfadc5613b..5eb099349a 100644 > > --- a/include/migration/misc.h > > +++ b/include/migration/misc.h > > @@ -52,6 +52,7 @@ void dump_vmstate_json_to_file(FILE *out_fp); > > > > /* migration/migration.c */ > > void migration_object_init(void); > > +void migration_object_finalize(void); > > void migration_shutdown(void); > > bool migration_is_idle(void); > > bool migration_is_active(void); > > diff --git a/migration/migration.c b/migration/migration.c > > index 021faee2f3..9eaa7947bc 100644 > > --- a/migration/migration.c > > +++ b/migration/migration.c > > @@ -265,6 +265,11 @@ void migration_object_init(void) > > dirty_bitmap_mig_init(); > > } > > > > +void migration_object_finalize(void) > > +{ > > + object_unref(OBJECT(current_migration)); > > +} > > + > > typedef struct { > > QEMUBH *bh; > > QEMUBHFunc *cb; > > @@ -330,7 +335,6 @@ void migration_shutdown(void) > > * stop the migration using this structure > > */ > > migration_cancel(NULL); > > - object_unref(OBJECT(current_migration)); > > > > /* > > * Cancel outgoing migration of dirty bitmaps. It should > > diff --git a/system/runstate.c b/system/runstate.c > > index c2c9afa905..fa823f5e72 100644 > > --- a/system/runstate.c > > +++ b/system/runstate.c > > @@ -930,5 +930,6 @@ void qemu_cleanup(int status) > > monitor_cleanup(); > > qemu_chr_cleanup(); > > user_creatable_cleanup(); > > + migration_object_finalize(); > > /* TODO: unref root container, check all devices are ok */ > > } > > -------------8<------------- > > 892ae715b6bc was trying to fix potential use-after-free issues. > > I think it is safer to introduce an helper routine checking > (in some ways) if a migration is in progress than partially > revert 892ae715b6bc. Right, Dave also mentioned something in 892ae715b6bc about moving it earlier: We do this a bit earlier; so hopefully migration gets out of the way before all the devices etc are freed. But I don't know the relationship on device free() v.s. the migration object. We might at least want to figure that out if we want to move it again. I notice that vdpa also started using migration_is_setup_or_active(), so if it's used in more places, maybe indeed we can consider making them safe to be called at any phase of QEMU. Logically it is safe in vm state change hook always because it has the BQL and the object can only be freed when with BQL. So let me send a small patch later to hopefully make all these exported functions (including migration_file_set_error() in this case, logically anything in migration/misc.h) safe to be called without migration.
On 10/22/24 16:29, Peter Xu wrote: > On Tue, Oct 22, 2024 at 03:24:56PM +0200, Cédric Le Goater wrote: >> On 10/22/24 11:38, Avihai Horon wrote: >>> >>> On 21/10/2024 19:54, Peter Xu wrote: >>>> External email: Use caution opening links or attachments >>>> >>>> >>>> On Mon, Oct 21, 2024 at 06:43:13PM +0200, Cédric Le Goater wrote: >>>>> Hello, >>>>> >>>>>> IIUC the migration thread should always see valid migration object, as it >>>>>> takes one refcount at the entrance of migration_thread(): >>>>>> >>>>>> object_ref(OBJECT(s)); >>>>> Could the migration have failed before ? in migrate_fd_connect() >>>> I just noticed it's a vm state change notifier.. >>> >>> Yep. >>> I stumbled upon this bug by accident when running on a buggy machine. >>> Migration wasn't involved, I just started the VM, shut it down and got the assert (as my VFIO device was faulty and errored on RUNNING->STOP state change). >>> >>> You can repro it by forcefully triggering an error on *->STOP transition: >>> >>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c >>> index 17199b73ae..d41cb7c634 100644 >>> --- a/hw/vfio/migration.c >>> +++ b/hw/vfio/migration.c >>> @@ -831,7 +831,9 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state) >>> } >>> >>> ret = vfio_migration_set_state_or_reset(vbasedev, new_state, &local_err); >>> - if (ret) { >>> + if (ret || new_state == VFIO_DEVICE_STATE_STOP) { >>> + ret = -1; >>> + error_setg(&local_err, "%s: forced error", vbasedev->name); >>> /* >>> * Migration should be aborted in this case, but vm_state_notify() >>> * currently does not support reporting failures. >>> >>>> >>>> If so, maybe VFIO could refer to its internal states showing that it's >>>> during a migration first (so as to guarantee the migration object is valid; >>>> e.g., only after save_setup() but before save_cleanup() hooks are invoked). >>> >>> It's an option, but I think it's a bit awkward as we'd need to check >>> that VFIOMigration->data_buffer is set >> >> That's what I was looking at too. It works. It feels a bit awkward >> indeed. We could hide the details in an helper routine though. >> >>> or add a new flag. >> >> yes that's another solution. >> >> Peter, >> >> I wonder if we could grab a ref on current_migration in save_setup(), >> store it under VFIOMigration and drop it save_cleanup() ? > > VFIO can definitely do that, but I am not sure how that would help.. as I > think the migration object can never go away anyway during setup->cleanup, > so taking that extra refcount shouldn't change anything. It won't be but we would have a VFIOMigration::MigrationState pointer to test in the vmstate change handler, which is a bit cleaner than testing migration->data_buffer. Just an idea. I am not convinced myself either. > IOW, AFAICT this bug is triggered only when without migration at all. > >> >> >>> Besides that, as Cedric pointed out, VFIO code calls migration_is_setup_or_active() which can also be unsafe, as it might be invoked after migration object has been freed. >>> >>> Maybe a simpler solution would be to extend the the migration object lifetime? >>> Looking at commit history, you can see that commit 1f8956041ad3 ("migration: finalize current_migration object") added migration object finalization at the very end of qemu cleanup. >>> Then came commit 892ae715b6bc ("migration: Cleanup during exit") and moved the migration object finalization to the beginning of qemu cleanup (before stopping the VM etc.). >>> >>> If so, the fix could be something like the below? >>> >>> -------------8<------------- >>> diff --git a/include/migration/misc.h b/include/migration/misc.h >>> index bfadc5613b..5eb099349a 100644 >>> --- a/include/migration/misc.h >>> +++ b/include/migration/misc.h >>> @@ -52,6 +52,7 @@ void dump_vmstate_json_to_file(FILE *out_fp); >>> >>> /* migration/migration.c */ >>> void migration_object_init(void); >>> +void migration_object_finalize(void); >>> void migration_shutdown(void); >>> bool migration_is_idle(void); >>> bool migration_is_active(void); >>> diff --git a/migration/migration.c b/migration/migration.c >>> index 021faee2f3..9eaa7947bc 100644 >>> --- a/migration/migration.c >>> +++ b/migration/migration.c >>> @@ -265,6 +265,11 @@ void migration_object_init(void) >>> dirty_bitmap_mig_init(); >>> } >>> >>> +void migration_object_finalize(void) >>> +{ >>> + object_unref(OBJECT(current_migration)); >>> +} >>> + >>> typedef struct { >>> QEMUBH *bh; >>> QEMUBHFunc *cb; >>> @@ -330,7 +335,6 @@ void migration_shutdown(void) >>> * stop the migration using this structure >>> */ >>> migration_cancel(NULL); >>> - object_unref(OBJECT(current_migration)); >>> >>> /* >>> * Cancel outgoing migration of dirty bitmaps. It should >>> diff --git a/system/runstate.c b/system/runstate.c >>> index c2c9afa905..fa823f5e72 100644 >>> --- a/system/runstate.c >>> +++ b/system/runstate.c >>> @@ -930,5 +930,6 @@ void qemu_cleanup(int status) >>> monitor_cleanup(); >>> qemu_chr_cleanup(); >>> user_creatable_cleanup(); >>> + migration_object_finalize(); >>> /* TODO: unref root container, check all devices are ok */ >>> } >>> -------------8<------------- >> >> 892ae715b6bc was trying to fix potential use-after-free issues. >> >> I think it is safer to introduce an helper routine checking >> (in some ways) if a migration is in progress than partially >> revert 892ae715b6bc. > > Right, Dave also mentioned something in 892ae715b6bc about moving it > earlier: > > We do this a bit earlier; so hopefully migration gets out of the way > before all the devices etc are freed. > > But I don't know the relationship on device free() v.s. the migration > object. We might at least want to figure that out if we want to move it > again. > > I notice that vdpa also started using migration_is_setup_or_active(), so if > it's used in more places, maybe indeed we can consider making them safe to > be called at any phase of QEMU. > > Logically it is safe in vm state change hook always because it has the BQL > and the object can only be freed when with BQL. > > So let me send a small patch later to hopefully make all these exported > functions (including migration_file_set_error() in this case, logically > anything in migration/misc.h) safe to be called without migration. > OK. Thanks C.
On Tue, Oct 22, 2024 at 04:42:14PM +0200, Cédric Le Goater wrote: > > So let me send a small patch later to hopefully make all these exported > > functions (including migration_file_set_error() in this case, logically > > anything in migration/misc.h) safe to be called without migration. > > OK. Posted here: https://lore.kernel.org/qemu-devel/20241022160720.1013543-1-peterx@redhat.com/ Thanks,
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index 992dc3b102..1c44b036ea 100644 --- a/hw/vfio/migration.c +++ b/hw/vfio/migration.c @@ -783,6 +783,25 @@ static const SaveVMHandlers savevm_vfio_handlers = { /* ---------------------------------------------------------------------- */ +static void vfio_vmstate_change_error_report(int ret, Error *err, + RunState state) +{ + if (state == RUN_STATE_SHUTDOWN) { + /* + * If VM is being shut down, migration object might have already been + * freed, so just report the error. + */ + error_report_err(err); + return; + } + + /* + * Migration should be aborted in this case, but vm_state_notify() + * currently does not support reporting failures. + */ + migration_file_set_error(ret, err); +} + static void vfio_vmstate_change_prepare(void *opaque, bool running, RunState state) { @@ -798,11 +817,7 @@ static void vfio_vmstate_change_prepare(void *opaque, bool running, ret = vfio_migration_set_state_or_reset(vbasedev, new_state, &local_err); if (ret) { - /* - * Migration should be aborted in this case, but vm_state_notify() - * currently does not support reporting failures. - */ - migration_file_set_error(ret, local_err); + vfio_vmstate_change_error_report(ret, local_err, state); } trace_vfio_vmstate_change_prepare(vbasedev->name, running, @@ -829,11 +844,7 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state) ret = vfio_migration_set_state_or_reset(vbasedev, new_state, &local_err); if (ret) { - /* - * Migration should be aborted in this case, but vm_state_notify() - * currently does not support reporting failures. - */ - migration_file_set_error(ret, local_err); + vfio_vmstate_change_error_report(ret, local_err, state); } trace_vfio_vmstate_change(vbasedev->name, running, RunState_str(state),
When the VM is shut down vfio_vmstate_change/_prepare() are called to transition the VFIO device state to STOP. They are called after migration_shutdown() and thus, by this time, the migration object is already freed (more specifically, MigrationState->qemu_file_lock is already destroyed). In this case, if there is an error in vfio_vmstate_change/_prepare(), it calls migration_file_set_error() which tries to lock the already destroyed MigrationState->qemu_file_lock, leading to the following assert: qemu-system-x86_64: ../util/qemu-thread-posix.c:92: qemu_mutex_lock_impl: Assertion `mutex->initialized' failed. Fix this by not setting migration file error in the shut down flow. Fixes: 20c64c8a51a4 ("migration: migration_file_set_error") Signed-off-by: Avihai Horon <avihaih@nvidia.com> --- hw/vfio/migration.c | 31 +++++++++++++++++++++---------- 1 file changed, 21 insertions(+), 10 deletions(-)