diff mbox series

[2/3] vfio/migration: Refactor vfio_vmstate_change/_prepare() error reporting

Message ID 20241020130108.27148-3-avihaih@nvidia.com (mailing list archive)
State New
Headers show
Series vfio/migration: Some bug fixes and cleanups | expand

Commit Message

Avihai Horon Oct. 20, 2024, 1:01 p.m. UTC
When the VM is shut down vfio_vmstate_change/_prepare() are called to
transition the VFIO device state to STOP. They are called after
migration_shutdown() and thus, by this time, the migration object is
already freed (more specifically, MigrationState->qemu_file_lock is
already destroyed).

In this case, if there is an error in vfio_vmstate_change/_prepare(), it
calls migration_file_set_error() which tries to lock the already
destroyed MigrationState->qemu_file_lock, leading to the following
assert:

  qemu-system-x86_64: ../util/qemu-thread-posix.c:92: qemu_mutex_lock_impl: Assertion `mutex->initialized' failed.

Fix this by not setting migration file error in the shut down flow.

Fixes: 20c64c8a51a4 ("migration: migration_file_set_error")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 hw/vfio/migration.c | 31 +++++++++++++++++++++----------
 1 file changed, 21 insertions(+), 10 deletions(-)

Comments

Cédric Le Goater Oct. 21, 2024, 3:14 p.m. UTC | #1
Hello Avihai,

On 10/20/24 15:01, Avihai Horon wrote:
> When the VM is shut down vfio_vmstate_change/_prepare() are called to
> transition the VFIO device state to STOP. They are called after
> migration_shutdown() and thus, by this time, the migration object is
> already freed (more specifically, MigrationState->qemu_file_lock is
> already destroyed).
> 
> In this case, if there is an error in vfio_vmstate_change/_prepare(), it
> calls migration_file_set_error() which tries to lock the already
> destroyed MigrationState->qemu_file_lock, leading to the following
> assert:
> 
>    qemu-system-x86_64: ../util/qemu-thread-posix.c:92: qemu_mutex_lock_impl: Assertion `mutex->initialized' failed.
> 
> Fix this by not setting migration file error in the shut down flow.
> 
> Fixes: 20c64c8a51a4 ("migration: migration_file_set_error")
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> ---
>   hw/vfio/migration.c | 31 +++++++++++++++++++++----------
>   1 file changed, 21 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 992dc3b102..1c44b036ea 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -783,6 +783,25 @@ static const SaveVMHandlers savevm_vfio_handlers = {
>   
>   /* ---------------------------------------------------------------------- */
>   
> +static void vfio_vmstate_change_error_report(int ret, Error *err,
> +                                             RunState state)
> +{
> +    if (state == RUN_STATE_SHUTDOWN) {
> +        /*
> +         * If VM is being shut down, migration object might have already been
> +         * freed, so just report the error.
> +         */
> +        error_report_err(err);
> +        return;
> +    }
> +
> +    /*
> +     * Migration should be aborted in this case, but vm_state_notify()
> +     * currently does not support reporting failures.
> +     */
> +    migration_file_set_error(ret, err);
> +}

This seems correct, but testing the machine's runtime state to
work around the fact that 'current_migration' is not safe to
use reveals a flaw.

In vfio, migration_is_setup_or_active() is unsafe. So are the
calls to vfio_set_migration_error().


This comment seems in contradiction with the migration code :

/* When we add fault tolerance, we could have several
    migrations at once.  For now we don't need to add
    dynamic creation of migration */

Why is 'current_migration' allocated and destroyed today ?

Could we replace it with a singleton and switch the state to
MIGRATION_STATUS_NONE instead of destroying it ?

If not, should 'current_migration' be set to NULL in
migration_shutdown() and its value tested before accessing
to any of its fields, in a thread safe manner if necessary.

Thanks,

C.
Peter Xu Oct. 21, 2024, 3:50 p.m. UTC | #2
On Mon, Oct 21, 2024 at 05:14:11PM +0200, Cédric Le Goater wrote:
> Hello Avihai,
> 
> On 10/20/24 15:01, Avihai Horon wrote:
> > When the VM is shut down vfio_vmstate_change/_prepare() are called to
> > transition the VFIO device state to STOP. They are called after
> > migration_shutdown() and thus, by this time, the migration object is
> > already freed (more specifically, MigrationState->qemu_file_lock is
> > already destroyed).
> > 
> > In this case, if there is an error in vfio_vmstate_change/_prepare(), it
> > calls migration_file_set_error() which tries to lock the already
> > destroyed MigrationState->qemu_file_lock, leading to the following
> > assert:
> > 
> >    qemu-system-x86_64: ../util/qemu-thread-posix.c:92: qemu_mutex_lock_impl: Assertion `mutex->initialized' failed.
> > 
> > Fix this by not setting migration file error in the shut down flow.
> > 
> > Fixes: 20c64c8a51a4 ("migration: migration_file_set_error")
> > Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> > ---
> >   hw/vfio/migration.c | 31 +++++++++++++++++++++----------
> >   1 file changed, 21 insertions(+), 10 deletions(-)
> > 
> > diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> > index 992dc3b102..1c44b036ea 100644
> > --- a/hw/vfio/migration.c
> > +++ b/hw/vfio/migration.c
> > @@ -783,6 +783,25 @@ static const SaveVMHandlers savevm_vfio_handlers = {
> >   /* ---------------------------------------------------------------------- */
> > +static void vfio_vmstate_change_error_report(int ret, Error *err,
> > +                                             RunState state)
> > +{
> > +    if (state == RUN_STATE_SHUTDOWN) {
> > +        /*
> > +         * If VM is being shut down, migration object might have already been
> > +         * freed, so just report the error.
> > +         */
> > +        error_report_err(err);
> > +        return;
> > +    }
> > +
> > +    /*
> > +     * Migration should be aborted in this case, but vm_state_notify()
> > +     * currently does not support reporting failures.
> > +     */
> > +    migration_file_set_error(ret, err);
> > +}
> 
> This seems correct, but testing the machine's runtime state to
> work around the fact that 'current_migration' is not safe to
> use reveals a flaw.
> 
> In vfio, migration_is_setup_or_active() is unsafe. So are the
> calls to vfio_set_migration_error().
> 
> 
> This comment seems in contradiction with the migration code :
> 
> /* When we add fault tolerance, we could have several
>    migrations at once.  For now we don't need to add
>    dynamic creation of migration */
> 
> Why is 'current_migration' allocated and destroyed today ?
> 
> Could we replace it with a singleton and switch the state to
> MIGRATION_STATUS_NONE instead of destroying it ?
> 
> If not, should 'current_migration' be set to NULL in
> migration_shutdown() and its value tested before accessing
> to any of its fields, in a thread safe manner if necessary.

IIUC the migration thread should always see valid migration object, as it
takes one refcount at the entrance of migration_thread():

    object_ref(OBJECT(s));

So it's not yet clear to me on why the mutex was destroyed, if the main
object should still be there; logically it was only destroyed in the
finalize phase (migration_instance_finalize), but that should be after
migration thread quits and releases the refcount..
Cédric Le Goater Oct. 21, 2024, 4:43 p.m. UTC | #3
Hello,

> IIUC the migration thread should always see valid migration object, as it
> takes one refcount at the entrance of migration_thread():
> 
>      object_ref(OBJECT(s));

Could the migration have failed before ? in migrate_fd_connect()

> So it's not yet clear to me on why the mutex was destroyed, if the main
> object should still be there; logically it was only destroyed in the
> finalize phase (migration_instance_finalize), but that should be after
> migration thread quits and releases the refcount..


Avihai, would you have a reproducer ? or did you keep some logs ?
a backtrace maybe ?

Thanks,

C.
Peter Xu Oct. 21, 2024, 4:54 p.m. UTC | #4
On Mon, Oct 21, 2024 at 06:43:13PM +0200, Cédric Le Goater wrote:
> Hello,
> 
> > IIUC the migration thread should always see valid migration object, as it
> > takes one refcount at the entrance of migration_thread():
> > 
> >      object_ref(OBJECT(s));
> 
> Could the migration have failed before ? in migrate_fd_connect()

I just noticed it's a vm state change notifier..

If so, maybe VFIO could refer to its internal states showing that it's
during a migration first (so as to guarantee the migration object is valid;
e.g., only after save_setup() but before save_cleanup() hooks are invoked).

Then migration_file_set_error() is only needed when it's during a migration
already.  Otherwise it's only a vm state change event, so nothing relevant
to migration.
Avihai Horon Oct. 22, 2024, 9:38 a.m. UTC | #5
On 21/10/2024 19:54, Peter Xu wrote:
> External email: Use caution opening links or attachments
>
>
> On Mon, Oct 21, 2024 at 06:43:13PM +0200, Cédric Le Goater wrote:
>> Hello,
>>
>>> IIUC the migration thread should always see valid migration object, as it
>>> takes one refcount at the entrance of migration_thread():
>>>
>>>       object_ref(OBJECT(s));
>> Could the migration have failed before ? in migrate_fd_connect()
> I just noticed it's a vm state change notifier..

Yep.
I stumbled upon this bug by accident when running on a buggy machine.
Migration wasn't involved, I just started the VM, shut it down and got 
the assert (as my VFIO device was faulty and errored on RUNNING->STOP 
state change).

You can repro it by forcefully triggering an error on *->STOP transition:

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 17199b73ae..d41cb7c634 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -831,7 +831,9 @@ static void vfio_vmstate_change(void *opaque, bool 
running, RunState state)
      }

      ret = vfio_migration_set_state_or_reset(vbasedev, new_state, 
&local_err);
-    if (ret) {
+    if (ret || new_state == VFIO_DEVICE_STATE_STOP) {
+        ret = -1;
+        error_setg(&local_err, "%s: forced error", vbasedev->name);
          /*
           * Migration should be aborted in this case, but vm_state_notify()
           * currently does not support reporting failures.

>
> If so, maybe VFIO could refer to its internal states showing that it's
> during a migration first (so as to guarantee the migration object is valid;
> e.g., only after save_setup() but before save_cleanup() hooks are invoked).

It's an option, but I think it's a bit awkward as we'd need to check 
that VFIOMigration->data_buffer is set or add a new flag.
Besides that, as Cedric pointed out, VFIO code calls 
migration_is_setup_or_active() which can also be unsafe, as it might be 
invoked after migration object has been freed.

Maybe a simpler solution would be to extend the the migration object 
lifetime?
Looking at commit history, you can see that commit 1f8956041ad3 
("migration: finalize current_migration object") added migration object 
finalization at the very end of qemu cleanup.
Then came commit 892ae715b6bc ("migration: Cleanup during exit") and 
moved the migration object finalization to the beginning of qemu cleanup 
(before stopping the VM etc.).

If so, the fix could be something like the below?

-------------8<-------------
diff --git a/include/migration/misc.h b/include/migration/misc.h
index bfadc5613b..5eb099349a 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -52,6 +52,7 @@ void dump_vmstate_json_to_file(FILE *out_fp);

  /* migration/migration.c */
  void migration_object_init(void);
+void migration_object_finalize(void);
  void migration_shutdown(void);
  bool migration_is_idle(void);
  bool migration_is_active(void);
diff --git a/migration/migration.c b/migration/migration.c
index 021faee2f3..9eaa7947bc 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -265,6 +265,11 @@ void migration_object_init(void)
      dirty_bitmap_mig_init();
  }

+void migration_object_finalize(void)
+{
+    object_unref(OBJECT(current_migration));
+}
+
  typedef struct {
      QEMUBH *bh;
      QEMUBHFunc *cb;
@@ -330,7 +335,6 @@ void migration_shutdown(void)
       * stop the migration using this structure
       */
      migration_cancel(NULL);
-    object_unref(OBJECT(current_migration));

      /*
       * Cancel outgoing migration of dirty bitmaps. It should
diff --git a/system/runstate.c b/system/runstate.c
index c2c9afa905..fa823f5e72 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -930,5 +930,6 @@ void qemu_cleanup(int status)
      monitor_cleanup();
      qemu_chr_cleanup();
      user_creatable_cleanup();
+    migration_object_finalize();
      /* TODO: unref root container, check all devices are ok */
  }
-------------8<-------------

Thanks.
Cédric Le Goater Oct. 22, 2024, 1:24 p.m. UTC | #6
On 10/22/24 11:38, Avihai Horon wrote:
> 
> On 21/10/2024 19:54, Peter Xu wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> On Mon, Oct 21, 2024 at 06:43:13PM +0200, Cédric Le Goater wrote:
>>> Hello,
>>>
>>>> IIUC the migration thread should always see valid migration object, as it
>>>> takes one refcount at the entrance of migration_thread():
>>>>
>>>>       object_ref(OBJECT(s));
>>> Could the migration have failed before ? in migrate_fd_connect()
>> I just noticed it's a vm state change notifier..
> 
> Yep.
> I stumbled upon this bug by accident when running on a buggy machine.
> Migration wasn't involved, I just started the VM, shut it down and got the assert (as my VFIO device was faulty and errored on RUNNING->STOP state change).
> 
> You can repro it by forcefully triggering an error on *->STOP transition:
> 
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 17199b73ae..d41cb7c634 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -831,7 +831,9 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state)
>       }
> 
>       ret = vfio_migration_set_state_or_reset(vbasedev, new_state, &local_err);
> -    if (ret) {
> +    if (ret || new_state == VFIO_DEVICE_STATE_STOP) {
> +        ret = -1;
> +        error_setg(&local_err, "%s: forced error", vbasedev->name);
>           /*
>            * Migration should be aborted in this case, but vm_state_notify()
>            * currently does not support reporting failures.
> 
>>
>> If so, maybe VFIO could refer to its internal states showing that it's
>> during a migration first (so as to guarantee the migration object is valid;
>> e.g., only after save_setup() but before save_cleanup() hooks are invoked).
> 
> It's an option, but I think it's a bit awkward as we'd need to check that VFIOMigration->data_buffer is set 

That's what I was looking at too. It works. It feels a bit awkward
indeed. We could hide the details in an helper routine though.

> or add a new flag.

yes that's another solution.

Peter,

I wonder if we could grab a ref on current_migration in save_setup(),
store it under VFIOMigration and drop it save_cleanup() ?


> Besides that, as Cedric pointed out, VFIO code calls migration_is_setup_or_active() which can also be unsafe, as it might be invoked after migration object has been freed.
> 
> Maybe a simpler solution would be to extend the the migration object lifetime?
> Looking at commit history, you can see that commit 1f8956041ad3 ("migration: finalize current_migration object") added migration object finalization at the very end of qemu cleanup.
> Then came commit 892ae715b6bc ("migration: Cleanup during exit") and moved the migration object finalization to the beginning of qemu cleanup (before stopping the VM etc.).
> 
> If so, the fix could be something like the below?
> 
> -------------8<-------------
> diff --git a/include/migration/misc.h b/include/migration/misc.h
> index bfadc5613b..5eb099349a 100644
> --- a/include/migration/misc.h
> +++ b/include/migration/misc.h
> @@ -52,6 +52,7 @@ void dump_vmstate_json_to_file(FILE *out_fp);
> 
>   /* migration/migration.c */
>   void migration_object_init(void);
> +void migration_object_finalize(void);
>   void migration_shutdown(void);
>   bool migration_is_idle(void);
>   bool migration_is_active(void);
> diff --git a/migration/migration.c b/migration/migration.c
> index 021faee2f3..9eaa7947bc 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -265,6 +265,11 @@ void migration_object_init(void)
>       dirty_bitmap_mig_init();
>   }
> 
> +void migration_object_finalize(void)
> +{
> +    object_unref(OBJECT(current_migration));
> +}
> +
>   typedef struct {
>       QEMUBH *bh;
>       QEMUBHFunc *cb;
> @@ -330,7 +335,6 @@ void migration_shutdown(void)
>        * stop the migration using this structure
>        */
>       migration_cancel(NULL);
> -    object_unref(OBJECT(current_migration));
> 
>       /*
>        * Cancel outgoing migration of dirty bitmaps. It should
> diff --git a/system/runstate.c b/system/runstate.c
> index c2c9afa905..fa823f5e72 100644
> --- a/system/runstate.c
> +++ b/system/runstate.c
> @@ -930,5 +930,6 @@ void qemu_cleanup(int status)
>       monitor_cleanup();
>       qemu_chr_cleanup();
>       user_creatable_cleanup();
> +    migration_object_finalize();
>       /* TODO: unref root container, check all devices are ok */
>   }
> -------------8<-------------

892ae715b6bc was trying to fix potential use-after-free issues.

I think it is safer to introduce an helper routine checking
(in some ways) if a migration is in progress than partially
revert 892ae715b6bc.

Thanks,

C.
Avihai Horon Oct. 22, 2024, 2:21 p.m. UTC | #7
On 22/10/2024 16:24, Cédric Le Goater wrote:
> External email: Use caution opening links or attachments
>
>
> On 10/22/24 11:38, Avihai Horon wrote:
>>
>> On 21/10/2024 19:54, Peter Xu wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> On Mon, Oct 21, 2024 at 06:43:13PM +0200, Cédric Le Goater wrote:
>>>> Hello,
>>>>
>>>>> IIUC the migration thread should always see valid migration 
>>>>> object, as it
>>>>> takes one refcount at the entrance of migration_thread():
>>>>>
>>>>>       object_ref(OBJECT(s));
>>>> Could the migration have failed before ? in migrate_fd_connect()
>>> I just noticed it's a vm state change notifier..
>>
>> Yep.
>> I stumbled upon this bug by accident when running on a buggy machine.
>> Migration wasn't involved, I just started the VM, shut it down and 
>> got the assert (as my VFIO device was faulty and errored on 
>> RUNNING->STOP state change).
>>
>> You can repro it by forcefully triggering an error on *->STOP 
>> transition:
>>
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index 17199b73ae..d41cb7c634 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -831,7 +831,9 @@ static void vfio_vmstate_change(void *opaque, 
>> bool running, RunState state)
>>       }
>>
>>       ret = vfio_migration_set_state_or_reset(vbasedev, new_state, 
>> &local_err);
>> -    if (ret) {
>> +    if (ret || new_state == VFIO_DEVICE_STATE_STOP) {
>> +        ret = -1;
>> +        error_setg(&local_err, "%s: forced error", vbasedev->name);
>>           /*
>>            * Migration should be aborted in this case, but 
>> vm_state_notify()
>>            * currently does not support reporting failures.
>>
>>>
>>> If so, maybe VFIO could refer to its internal states showing that it's
>>> during a migration first (so as to guarantee the migration object is 
>>> valid;
>>> e.g., only after save_setup() but before save_cleanup() hooks are 
>>> invoked).
>>
>> It's an option, but I think it's a bit awkward as we'd need to check 
>> that VFIOMigration->data_buffer is set
>
> That's what I was looking at too. It works. It feels a bit awkward
> indeed. We could hide the details in an helper routine though.
>
>> or add a new flag.
>
> yes that's another solution.
>
> Peter,
>
> I wonder if we could grab a ref on current_migration in save_setup(),
> store it under VFIOMigration and drop it save_cleanup() ?
>
>
>> Besides that, as Cedric pointed out, VFIO code calls 
>> migration_is_setup_or_active() which can also be unsafe, as it might 
>> be invoked after migration object has been freed.
>>
>> Maybe a simpler solution would be to extend the the migration object 
>> lifetime?
>> Looking at commit history, you can see that commit 1f8956041ad3 
>> ("migration: finalize current_migration object") added migration 
>> object finalization at the very end of qemu cleanup.
>> Then came commit 892ae715b6bc ("migration: Cleanup during exit") and 
>> moved the migration object finalization to the beginning of qemu 
>> cleanup (before stopping the VM etc.).
>>
>> If so, the fix could be something like the below?
>>
>> -------------8<-------------
>> diff --git a/include/migration/misc.h b/include/migration/misc.h
>> index bfadc5613b..5eb099349a 100644
>> --- a/include/migration/misc.h
>> +++ b/include/migration/misc.h
>> @@ -52,6 +52,7 @@ void dump_vmstate_json_to_file(FILE *out_fp);
>>
>>   /* migration/migration.c */
>>   void migration_object_init(void);
>> +void migration_object_finalize(void);
>>   void migration_shutdown(void);
>>   bool migration_is_idle(void);
>>   bool migration_is_active(void);
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 021faee2f3..9eaa7947bc 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -265,6 +265,11 @@ void migration_object_init(void)
>>       dirty_bitmap_mig_init();
>>   }
>>
>> +void migration_object_finalize(void)
>> +{
>> +    object_unref(OBJECT(current_migration));
>> +}
>> +
>>   typedef struct {
>>       QEMUBH *bh;
>>       QEMUBHFunc *cb;
>> @@ -330,7 +335,6 @@ void migration_shutdown(void)
>>        * stop the migration using this structure
>>        */
>>       migration_cancel(NULL);
>> -    object_unref(OBJECT(current_migration));
>>
>>       /*
>>        * Cancel outgoing migration of dirty bitmaps. It should
>> diff --git a/system/runstate.c b/system/runstate.c
>> index c2c9afa905..fa823f5e72 100644
>> --- a/system/runstate.c
>> +++ b/system/runstate.c
>> @@ -930,5 +930,6 @@ void qemu_cleanup(int status)
>>       monitor_cleanup();
>>       qemu_chr_cleanup();
>>       user_creatable_cleanup();
>> +    migration_object_finalize();
>>       /* TODO: unref root container, check all devices are ok */
>>   }
>> -------------8<-------------
>
> 892ae715b6bc was trying to fix potential use-after-free issues.

Yes, and it did, by ref/unref current_migration in migration_thread().
In addition, it added a migrate_fd_cancel() in the beginning of qemu 
cleanup to encourage migration_thread to quit.
IIUC, these are the core changes of the commit, and the only reason to 
move "object_unref(OBJECT(current_migration))" from the end of qemu 
cleanup to the beginning of it was to put migration cleanup code in one 
place.

That's why I think moving "object_unref(OBJECT(current_migration))" back 
to the end of qemu cleanup is safe and shouldn't make much difference.
It will delay freeing of MigrationState right to the end and will allow 
us to use the already existing helper (migration_is_setup_or_active()) 
to check migration status.

IMHO this seems like the cleanest solution, unless I am missing something.

Thanks.

>
> I think it is safer to introduce an helper routine checking
> (in some ways) if a migration is in progress than partially
> revert 892ae715b6bc.
>
> Thanks,
>
> C.
>
Peter Xu Oct. 22, 2024, 2:29 p.m. UTC | #8
On Tue, Oct 22, 2024 at 03:24:56PM +0200, Cédric Le Goater wrote:
> On 10/22/24 11:38, Avihai Horon wrote:
> > 
> > On 21/10/2024 19:54, Peter Xu wrote:
> > > External email: Use caution opening links or attachments
> > > 
> > > 
> > > On Mon, Oct 21, 2024 at 06:43:13PM +0200, Cédric Le Goater wrote:
> > > > Hello,
> > > > 
> > > > > IIUC the migration thread should always see valid migration object, as it
> > > > > takes one refcount at the entrance of migration_thread():
> > > > > 
> > > > >       object_ref(OBJECT(s));
> > > > Could the migration have failed before ? in migrate_fd_connect()
> > > I just noticed it's a vm state change notifier..
> > 
> > Yep.
> > I stumbled upon this bug by accident when running on a buggy machine.
> > Migration wasn't involved, I just started the VM, shut it down and got the assert (as my VFIO device was faulty and errored on RUNNING->STOP state change).
> > 
> > You can repro it by forcefully triggering an error on *->STOP transition:
> > 
> > diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> > index 17199b73ae..d41cb7c634 100644
> > --- a/hw/vfio/migration.c
> > +++ b/hw/vfio/migration.c
> > @@ -831,7 +831,9 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state)
> >       }
> > 
> >       ret = vfio_migration_set_state_or_reset(vbasedev, new_state, &local_err);
> > -    if (ret) {
> > +    if (ret || new_state == VFIO_DEVICE_STATE_STOP) {
> > +        ret = -1;
> > +        error_setg(&local_err, "%s: forced error", vbasedev->name);
> >           /*
> >            * Migration should be aborted in this case, but vm_state_notify()
> >            * currently does not support reporting failures.
> > 
> > > 
> > > If so, maybe VFIO could refer to its internal states showing that it's
> > > during a migration first (so as to guarantee the migration object is valid;
> > > e.g., only after save_setup() but before save_cleanup() hooks are invoked).
> > 
> > It's an option, but I think it's a bit awkward as we'd need to check
> > that VFIOMigration->data_buffer is set
> 
> That's what I was looking at too. It works. It feels a bit awkward
> indeed. We could hide the details in an helper routine though.
> 
> > or add a new flag.
> 
> yes that's another solution.
> 
> Peter,
> 
> I wonder if we could grab a ref on current_migration in save_setup(),
> store it under VFIOMigration and drop it save_cleanup() ?

VFIO can definitely do that, but I am not sure how that would help.. as I
think the migration object can never go away anyway during setup->cleanup,
so taking that extra refcount shouldn't change anything.

IOW, AFAICT this bug is triggered only when without migration at all.

> 
> 
> > Besides that, as Cedric pointed out, VFIO code calls migration_is_setup_or_active() which can also be unsafe, as it might be invoked after migration object has been freed.
> > 
> > Maybe a simpler solution would be to extend the the migration object lifetime?
> > Looking at commit history, you can see that commit 1f8956041ad3 ("migration: finalize current_migration object") added migration object finalization at the very end of qemu cleanup.
> > Then came commit 892ae715b6bc ("migration: Cleanup during exit") and moved the migration object finalization to the beginning of qemu cleanup (before stopping the VM etc.).
> > 
> > If so, the fix could be something like the below?
> > 
> > -------------8<-------------
> > diff --git a/include/migration/misc.h b/include/migration/misc.h
> > index bfadc5613b..5eb099349a 100644
> > --- a/include/migration/misc.h
> > +++ b/include/migration/misc.h
> > @@ -52,6 +52,7 @@ void dump_vmstate_json_to_file(FILE *out_fp);
> > 
> >   /* migration/migration.c */
> >   void migration_object_init(void);
> > +void migration_object_finalize(void);
> >   void migration_shutdown(void);
> >   bool migration_is_idle(void);
> >   bool migration_is_active(void);
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 021faee2f3..9eaa7947bc 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -265,6 +265,11 @@ void migration_object_init(void)
> >       dirty_bitmap_mig_init();
> >   }
> > 
> > +void migration_object_finalize(void)
> > +{
> > +    object_unref(OBJECT(current_migration));
> > +}
> > +
> >   typedef struct {
> >       QEMUBH *bh;
> >       QEMUBHFunc *cb;
> > @@ -330,7 +335,6 @@ void migration_shutdown(void)
> >        * stop the migration using this structure
> >        */
> >       migration_cancel(NULL);
> > -    object_unref(OBJECT(current_migration));
> > 
> >       /*
> >        * Cancel outgoing migration of dirty bitmaps. It should
> > diff --git a/system/runstate.c b/system/runstate.c
> > index c2c9afa905..fa823f5e72 100644
> > --- a/system/runstate.c
> > +++ b/system/runstate.c
> > @@ -930,5 +930,6 @@ void qemu_cleanup(int status)
> >       monitor_cleanup();
> >       qemu_chr_cleanup();
> >       user_creatable_cleanup();
> > +    migration_object_finalize();
> >       /* TODO: unref root container, check all devices are ok */
> >   }
> > -------------8<-------------
> 
> 892ae715b6bc was trying to fix potential use-after-free issues.
> 
> I think it is safer to introduce an helper routine checking
> (in some ways) if a migration is in progress than partially
> revert 892ae715b6bc.

Right, Dave also mentioned something in 892ae715b6bc about moving it
earlier:

    We do this a bit earlier; so hopefully migration gets out of the way
    before all the devices etc are freed.

But I don't know the relationship on device free() v.s. the migration
object.  We might at least want to figure that out if we want to move it
again.

I notice that vdpa also started using migration_is_setup_or_active(), so if
it's used in more places, maybe indeed we can consider making them safe to
be called at any phase of QEMU.

Logically it is safe in vm state change hook always because it has the BQL
and the object can only be freed when with BQL.

So let me send a small patch later to hopefully make all these exported
functions (including migration_file_set_error() in this case, logically
anything in migration/misc.h) safe to be called without migration.
Cédric Le Goater Oct. 22, 2024, 2:42 p.m. UTC | #9
On 10/22/24 16:29, Peter Xu wrote:
> On Tue, Oct 22, 2024 at 03:24:56PM +0200, Cédric Le Goater wrote:
>> On 10/22/24 11:38, Avihai Horon wrote:
>>>
>>> On 21/10/2024 19:54, Peter Xu wrote:
>>>> External email: Use caution opening links or attachments
>>>>
>>>>
>>>> On Mon, Oct 21, 2024 at 06:43:13PM +0200, Cédric Le Goater wrote:
>>>>> Hello,
>>>>>
>>>>>> IIUC the migration thread should always see valid migration object, as it
>>>>>> takes one refcount at the entrance of migration_thread():
>>>>>>
>>>>>>        object_ref(OBJECT(s));
>>>>> Could the migration have failed before ? in migrate_fd_connect()
>>>> I just noticed it's a vm state change notifier..
>>>
>>> Yep.
>>> I stumbled upon this bug by accident when running on a buggy machine.
>>> Migration wasn't involved, I just started the VM, shut it down and got the assert (as my VFIO device was faulty and errored on RUNNING->STOP state change).
>>>
>>> You can repro it by forcefully triggering an error on *->STOP transition:
>>>
>>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>>> index 17199b73ae..d41cb7c634 100644
>>> --- a/hw/vfio/migration.c
>>> +++ b/hw/vfio/migration.c
>>> @@ -831,7 +831,9 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state)
>>>        }
>>>
>>>        ret = vfio_migration_set_state_or_reset(vbasedev, new_state, &local_err);
>>> -    if (ret) {
>>> +    if (ret || new_state == VFIO_DEVICE_STATE_STOP) {
>>> +        ret = -1;
>>> +        error_setg(&local_err, "%s: forced error", vbasedev->name);
>>>            /*
>>>             * Migration should be aborted in this case, but vm_state_notify()
>>>             * currently does not support reporting failures.
>>>
>>>>
>>>> If so, maybe VFIO could refer to its internal states showing that it's
>>>> during a migration first (so as to guarantee the migration object is valid;
>>>> e.g., only after save_setup() but before save_cleanup() hooks are invoked).
>>>
>>> It's an option, but I think it's a bit awkward as we'd need to check
>>> that VFIOMigration->data_buffer is set
>>
>> That's what I was looking at too. It works. It feels a bit awkward
>> indeed. We could hide the details in an helper routine though.
>>
>>> or add a new flag.
>>
>> yes that's another solution.
>>
>> Peter,
>>
>> I wonder if we could grab a ref on current_migration in save_setup(),
>> store it under VFIOMigration and drop it save_cleanup() ?
> 
> VFIO can definitely do that, but I am not sure how that would help.. as I
> think the migration object can never go away anyway during setup->cleanup,
> so taking that extra refcount shouldn't change anything.

It won't be but we would have a VFIOMigration::MigrationState pointer to
test in the vmstate change handler, which is a bit cleaner than testing
migration->data_buffer. Just an idea. I am not convinced myself either.

> IOW, AFAICT this bug is triggered only when without migration at all.
> 
>>
>>
>>> Besides that, as Cedric pointed out, VFIO code calls migration_is_setup_or_active() which can also be unsafe, as it might be invoked after migration object has been freed.
>>>
>>> Maybe a simpler solution would be to extend the the migration object lifetime?
>>> Looking at commit history, you can see that commit 1f8956041ad3 ("migration: finalize current_migration object") added migration object finalization at the very end of qemu cleanup.
>>> Then came commit 892ae715b6bc ("migration: Cleanup during exit") and moved the migration object finalization to the beginning of qemu cleanup (before stopping the VM etc.).
>>>
>>> If so, the fix could be something like the below?
>>>
>>> -------------8<-------------
>>> diff --git a/include/migration/misc.h b/include/migration/misc.h
>>> index bfadc5613b..5eb099349a 100644
>>> --- a/include/migration/misc.h
>>> +++ b/include/migration/misc.h
>>> @@ -52,6 +52,7 @@ void dump_vmstate_json_to_file(FILE *out_fp);
>>>
>>>    /* migration/migration.c */
>>>    void migration_object_init(void);
>>> +void migration_object_finalize(void);
>>>    void migration_shutdown(void);
>>>    bool migration_is_idle(void);
>>>    bool migration_is_active(void);
>>> diff --git a/migration/migration.c b/migration/migration.c
>>> index 021faee2f3..9eaa7947bc 100644
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -265,6 +265,11 @@ void migration_object_init(void)
>>>        dirty_bitmap_mig_init();
>>>    }
>>>
>>> +void migration_object_finalize(void)
>>> +{
>>> +    object_unref(OBJECT(current_migration));
>>> +}
>>> +
>>>    typedef struct {
>>>        QEMUBH *bh;
>>>        QEMUBHFunc *cb;
>>> @@ -330,7 +335,6 @@ void migration_shutdown(void)
>>>         * stop the migration using this structure
>>>         */
>>>        migration_cancel(NULL);
>>> -    object_unref(OBJECT(current_migration));
>>>
>>>        /*
>>>         * Cancel outgoing migration of dirty bitmaps. It should
>>> diff --git a/system/runstate.c b/system/runstate.c
>>> index c2c9afa905..fa823f5e72 100644
>>> --- a/system/runstate.c
>>> +++ b/system/runstate.c
>>> @@ -930,5 +930,6 @@ void qemu_cleanup(int status)
>>>        monitor_cleanup();
>>>        qemu_chr_cleanup();
>>>        user_creatable_cleanup();
>>> +    migration_object_finalize();
>>>        /* TODO: unref root container, check all devices are ok */
>>>    }
>>> -------------8<-------------
>>
>> 892ae715b6bc was trying to fix potential use-after-free issues.
>>
>> I think it is safer to introduce an helper routine checking
>> (in some ways) if a migration is in progress than partially
>> revert 892ae715b6bc.
> 
> Right, Dave also mentioned something in 892ae715b6bc about moving it
> earlier:
> 
>      We do this a bit earlier; so hopefully migration gets out of the way
>      before all the devices etc are freed.
> 
> But I don't know the relationship on device free() v.s. the migration
> object.  We might at least want to figure that out if we want to move it
> again.
> 
> I notice that vdpa also started using migration_is_setup_or_active(), so if
> it's used in more places, maybe indeed we can consider making them safe to
> be called at any phase of QEMU.
> 
> Logically it is safe in vm state change hook always because it has the BQL
> and the object can only be freed when with BQL.
> 
> So let me send a small patch later to hopefully make all these exported
> functions (including migration_file_set_error() in this case, logically
> anything in migration/misc.h) safe to be called without migration.
> 

OK.

Thanks

C.
Peter Xu Oct. 22, 2024, 4:10 p.m. UTC | #10
On Tue, Oct 22, 2024 at 04:42:14PM +0200, Cédric Le Goater wrote:
> > So let me send a small patch later to hopefully make all these exported
> > functions (including migration_file_set_error() in this case, logically
> > anything in migration/misc.h) safe to be called without migration.
> 
> OK.

Posted here:

https://lore.kernel.org/qemu-devel/20241022160720.1013543-1-peterx@redhat.com/

Thanks,
diff mbox series

Patch

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 992dc3b102..1c44b036ea 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -783,6 +783,25 @@  static const SaveVMHandlers savevm_vfio_handlers = {
 
 /* ---------------------------------------------------------------------- */
 
+static void vfio_vmstate_change_error_report(int ret, Error *err,
+                                             RunState state)
+{
+    if (state == RUN_STATE_SHUTDOWN) {
+        /*
+         * If VM is being shut down, migration object might have already been
+         * freed, so just report the error.
+         */
+        error_report_err(err);
+        return;
+    }
+
+    /*
+     * Migration should be aborted in this case, but vm_state_notify()
+     * currently does not support reporting failures.
+     */
+    migration_file_set_error(ret, err);
+}
+
 static void vfio_vmstate_change_prepare(void *opaque, bool running,
                                         RunState state)
 {
@@ -798,11 +817,7 @@  static void vfio_vmstate_change_prepare(void *opaque, bool running,
 
     ret = vfio_migration_set_state_or_reset(vbasedev, new_state, &local_err);
     if (ret) {
-        /*
-         * Migration should be aborted in this case, but vm_state_notify()
-         * currently does not support reporting failures.
-         */
-        migration_file_set_error(ret, local_err);
+        vfio_vmstate_change_error_report(ret, local_err, state);
     }
 
     trace_vfio_vmstate_change_prepare(vbasedev->name, running,
@@ -829,11 +844,7 @@  static void vfio_vmstate_change(void *opaque, bool running, RunState state)
 
     ret = vfio_migration_set_state_or_reset(vbasedev, new_state, &local_err);
     if (ret) {
-        /*
-         * Migration should be aborted in this case, but vm_state_notify()
-         * currently does not support reporting failures.
-         */
-        migration_file_set_error(ret, local_err);
+        vfio_vmstate_change_error_report(ret, local_err, state);
     }
 
     trace_vfio_vmstate_change(vbasedev->name, running, RunState_str(state),