Message ID | 20230626082353.18535-2-avihaih@nvidia.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | vfio/migration: Make VFIO migration non-experimental | expand |
On 6/26/23 10:23, Avihai Horon wrote: > Changing the device state from STOP_COPY to STOP can take time as the > device may need to free resources and do other operations as part of the > transition. Currently, this is done in vfio_save_complete_precopy() and > therefore it is counted in the migration downtime. > > To avoid this, change the device state from STOP_COPY to STOP in > vfio_save_cleanup(), which is called after migration has completed and > thus is not part of migration downtime. That's optimization and > Signed-off-by: Avihai Horon <avihaih@nvidia.com> > --- > hw/vfio/migration.c | 19 +++++++++++++------ > 1 file changed, 13 insertions(+), 6 deletions(-) > > diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c > index acbf0bb7ab..a8bfbe4b89 100644 > --- a/hw/vfio/migration.c > +++ b/hw/vfio/migration.c > @@ -383,6 +383,19 @@ static void vfio_save_cleanup(void *opaque) > VFIODevice *vbasedev = opaque; > VFIOMigration *migration = vbasedev->migration; > > + /* > + * Changing device state from STOP_COPY to STOP can take time. Do it here, > + * after migration has completed, so it won't increase downtime. > + */ > + if (migration->device_state == VFIO_DEVICE_STATE_STOP_COPY) { > + /* > + * If setting the device in STOP state fails, the device should be > + * reset. To do so, use ERROR state as a recover state. > + */ > + vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP, > + VFIO_DEVICE_STATE_ERROR); > + } > + > g_free(migration->data_buffer); > migration->data_buffer = NULL; > migration->precopy_init_size = 0; > @@ -508,12 +521,6 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque) > return ret; > } > > - /* > - * If setting the device in STOP state fails, the device should be reset. > - * To do so, use ERROR state as a recover state. > - */ > - ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP, > - VFIO_DEVICE_STATE_ERROR); we loose the possible returned error. Let's keep that change for later. Thanks, C. > trace_vfio_save_complete_precopy(vbasedev->name, ret); > > return ret;
On 26/06/2023 12:56, Cédric Le Goater wrote: > External email: Use caution opening links or attachments > > > On 6/26/23 10:23, Avihai Horon wrote: >> Changing the device state from STOP_COPY to STOP can take time as the >> device may need to free resources and do other operations as part of the >> transition. Currently, this is done in vfio_save_complete_precopy() and >> therefore it is counted in the migration downtime. >> >> To avoid this, change the device state from STOP_COPY to STOP in >> vfio_save_cleanup(), which is called after migration has completed and >> thus is not part of migration downtime. > > That's optimization and > >> Signed-off-by: Avihai Horon <avihaih@nvidia.com> >> --- >> hw/vfio/migration.c | 19 +++++++++++++------ >> 1 file changed, 13 insertions(+), 6 deletions(-) >> >> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c >> index acbf0bb7ab..a8bfbe4b89 100644 >> --- a/hw/vfio/migration.c >> +++ b/hw/vfio/migration.c >> @@ -383,6 +383,19 @@ static void vfio_save_cleanup(void *opaque) >> VFIODevice *vbasedev = opaque; >> VFIOMigration *migration = vbasedev->migration; >> >> + /* >> + * Changing device state from STOP_COPY to STOP can take time. >> Do it here, >> + * after migration has completed, so it won't increase downtime. >> + */ >> + if (migration->device_state == VFIO_DEVICE_STATE_STOP_COPY) { >> + /* >> + * If setting the device in STOP state fails, the device >> should be >> + * reset. To do so, use ERROR state as a recover state. >> + */ >> + vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP, >> + VFIO_DEVICE_STATE_ERROR); >> + } >> + >> g_free(migration->data_buffer); >> migration->data_buffer = NULL; >> migration->precopy_init_size = 0; >> @@ -508,12 +521,6 @@ static int vfio_save_complete_precopy(QEMUFile >> *f, void *opaque) >> return ret; >> } >> >> - /* >> - * If setting the device in STOP state fails, the device should >> be reset. >> - * To do so, use ERROR state as a recover state. >> - */ >> - ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP, >> - VFIO_DEVICE_STATE_ERROR); > > we loose the possible returned error. Right, but vfio_save_cleanup() is called as part of migration cleanup flow after migration has finished, so I don't think returning an error is useful here and the error is still reported in vfio_migration_set_state() so we don't really loose it. > Let's keep that change for later. Sure. Thanks! > > >> trace_vfio_save_complete_precopy(vbasedev->name, ret); >> >> return ret; >
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index acbf0bb7ab..a8bfbe4b89 100644 --- a/hw/vfio/migration.c +++ b/hw/vfio/migration.c @@ -383,6 +383,19 @@ static void vfio_save_cleanup(void *opaque) VFIODevice *vbasedev = opaque; VFIOMigration *migration = vbasedev->migration; + /* + * Changing device state from STOP_COPY to STOP can take time. Do it here, + * after migration has completed, so it won't increase downtime. + */ + if (migration->device_state == VFIO_DEVICE_STATE_STOP_COPY) { + /* + * If setting the device in STOP state fails, the device should be + * reset. To do so, use ERROR state as a recover state. + */ + vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP, + VFIO_DEVICE_STATE_ERROR); + } + g_free(migration->data_buffer); migration->data_buffer = NULL; migration->precopy_init_size = 0; @@ -508,12 +521,6 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque) return ret; } - /* - * If setting the device in STOP state fails, the device should be reset. - * To do so, use ERROR state as a recover state. - */ - ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP, - VFIO_DEVICE_STATE_ERROR); trace_vfio_save_complete_precopy(vbasedev->name, ret); return ret;
Changing the device state from STOP_COPY to STOP can take time as the device may need to free resources and do other operations as part of the transition. Currently, this is done in vfio_save_complete_precopy() and therefore it is counted in the migration downtime. To avoid this, change the device state from STOP_COPY to STOP in vfio_save_cleanup(), which is called after migration has completed and thus is not part of migration downtime. Signed-off-by: Avihai Horon <avihaih@nvidia.com> --- hw/vfio/migration.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-)