Message ID | 20241024213056.1395400-8-peterx@redhat.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Migration: Make misc.h helpers available for whole VM lifecycle | expand |
On 25/10/2024 0:30, Peter Xu wrote: > External email: Use caution opening links or attachments > > > We have two outside users of this API, so it's exported. > > Is it really necessary? Does it matter whether it must be > ACTIVE/POSTCOPY_ACTIVE/DEVICE? I guess no. Actually for VFIO it does matter, because we don't want VFIO to do DPT log_sync in SETUP stage when DPT might not have been started yet. See commit ff180c6bd7a8 ("vfio/migration: Skip log_sync during migration SETUP state"). Thanks. > > The external user is trying to detect whether migration is running or not, > as simple as that. > > To make the migration_is*() APIs even shorter, let's use > migration_is_running() for outside worlds. > > Internally there're actually three places that literally needs > migration_is_active() rather than running(). Keep that an internal helper. > > After this patch, we finally only export one helper that allows external > world to try detect migration status, which is migration_is_running(). > > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > include/migration/misc.h | 1 - > migration/migration.h | 1 + > hw/vfio/common.c | 4 ++-- > system/dirtylimit.c | 3 +-- > 4 files changed, 4 insertions(+), 5 deletions(-) > > diff --git a/include/migration/misc.h b/include/migration/misc.h > index ad1e25826a..c0e23fdac9 100644 > --- a/include/migration/misc.h > +++ b/include/migration/misc.h > @@ -53,7 +53,6 @@ void dump_vmstate_json_to_file(FILE *out_fp); > void migration_object_init(void); > void migration_shutdown(void); > > -bool migration_is_active(void); > bool migration_is_running(void); > bool migration_thread_is_self(void); > > diff --git a/migration/migration.h b/migration/migration.h > index 0956e9274b..9fa26ab06a 100644 > --- a/migration/migration.h > +++ b/migration/migration.h > @@ -492,6 +492,7 @@ int migration_call_notifiers(MigrationState *s, MigrationEventType type, > > int migrate_init(MigrationState *s, Error **errp); > bool migration_is_blocked(Error **errp); > +bool migration_is_active(void); > /* True if outgoing migration has entered postcopy phase */ > bool migration_in_postcopy(void); > bool migration_postcopy_is_alive(MigrationStatus state); > diff --git a/hw/vfio/common.c b/hw/vfio/common.c > index cc72282c71..7eb99ebd4d 100644 > --- a/hw/vfio/common.c > +++ b/hw/vfio/common.c > @@ -174,7 +174,7 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainerBase *bcontainer) > { > VFIODevice *vbasedev; > > - if (!migration_is_active()) { > + if (!migration_is_running()) { > return false; > } > > @@ -219,7 +219,7 @@ vfio_devices_all_running_and_mig_active(const VFIOContainerBase *bcontainer) > { > VFIODevice *vbasedev; > > - if (!migration_is_active()) { > + if (!migration_is_running()) { > return false; > } > > diff --git a/system/dirtylimit.c b/system/dirtylimit.c > index ab20da34bb..d7a855c603 100644 > --- a/system/dirtylimit.c > +++ b/system/dirtylimit.c > @@ -80,8 +80,7 @@ static void vcpu_dirty_rate_stat_collect(void) > int i = 0; > int64_t period = DIRTYLIMIT_CALC_TIME_MS; > > - if (migrate_dirty_limit() && > - migration_is_active()) { > + if (migrate_dirty_limit() && migration_is_running()) { > period = migrate_vcpu_dirty_limit_period(); > } > > -- > 2.45.0 >
On Mon, Oct 28, 2024 at 09:43:16AM +0200, Avihai Horon wrote: > > On 25/10/2024 0:30, Peter Xu wrote: > > External email: Use caution opening links or attachments > > > > > > We have two outside users of this API, so it's exported. > > > > Is it really necessary? Does it matter whether it must be > > ACTIVE/POSTCOPY_ACTIVE/DEVICE? I guess no. > > Actually for VFIO it does matter, because we don't want VFIO to do DPT > log_sync in SETUP stage when DPT might not have been started yet. > See commit ff180c6bd7a8 ("vfio/migration: Skip log_sync during migration > SETUP state"). This seems to be a known issue for migration in general, rather than VFIO specific. Hyman has a patch for it, not yet reviewed.. https://lore.kernel.org/r/cover.1729648695.git.yong.huang@smartx.com That corresponds to your comment here: Redundant -- all RAM is marked dirty in migration SETUP state and is transferred only after migration is set to ACTIVE state, so doing log_sync during migration SETUP is pointless. So I wonder whether it's only VFIO that should skip it, or log_sync() simply shouldn't be called at all during SETUP, because of its redundancy. The other thing you mentioned here: Can fail -- there is a time window, between setting migration state to SETUP and starting dirty tracking by RAM save_live_setup handler, during which dirty tracking is still not started. Any VFIO log_sync call that is issued during this time window will fail. For example, this error can be triggered by migrating a VM when a GUI is active, which constantly calls log_sync. This is VFIO specific. Why this can fail even if global tracking is started already? I didn't yet get why a GUI being active in guest could affect log_sync() from working.. after vfio_listener_log_global_start() properly setup everything. Thanks, > > Thanks. > > > > > The external user is trying to detect whether migration is running or not, > > as simple as that. > > > > To make the migration_is*() APIs even shorter, let's use > > migration_is_running() for outside worlds. > > > > Internally there're actually three places that literally needs > > migration_is_active() rather than running(). Keep that an internal helper. > > > > After this patch, we finally only export one helper that allows external > > world to try detect migration status, which is migration_is_running(). > > > > Signed-off-by: Peter Xu <peterx@redhat.com> > > --- > > include/migration/misc.h | 1 - > > migration/migration.h | 1 + > > hw/vfio/common.c | 4 ++-- > > system/dirtylimit.c | 3 +-- > > 4 files changed, 4 insertions(+), 5 deletions(-) > > > > diff --git a/include/migration/misc.h b/include/migration/misc.h > > index ad1e25826a..c0e23fdac9 100644 > > --- a/include/migration/misc.h > > +++ b/include/migration/misc.h > > @@ -53,7 +53,6 @@ void dump_vmstate_json_to_file(FILE *out_fp); > > void migration_object_init(void); > > void migration_shutdown(void); > > > > -bool migration_is_active(void); > > bool migration_is_running(void); > > bool migration_thread_is_self(void); > > > > diff --git a/migration/migration.h b/migration/migration.h > > index 0956e9274b..9fa26ab06a 100644 > > --- a/migration/migration.h > > +++ b/migration/migration.h > > @@ -492,6 +492,7 @@ int migration_call_notifiers(MigrationState *s, MigrationEventType type, > > > > int migrate_init(MigrationState *s, Error **errp); > > bool migration_is_blocked(Error **errp); > > +bool migration_is_active(void); > > /* True if outgoing migration has entered postcopy phase */ > > bool migration_in_postcopy(void); > > bool migration_postcopy_is_alive(MigrationStatus state); > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c > > index cc72282c71..7eb99ebd4d 100644 > > --- a/hw/vfio/common.c > > +++ b/hw/vfio/common.c > > @@ -174,7 +174,7 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainerBase *bcontainer) > > { > > VFIODevice *vbasedev; > > > > - if (!migration_is_active()) { > > + if (!migration_is_running()) { > > return false; > > } > > > > @@ -219,7 +219,7 @@ vfio_devices_all_running_and_mig_active(const VFIOContainerBase *bcontainer) > > { > > VFIODevice *vbasedev; > > > > - if (!migration_is_active()) { > > + if (!migration_is_running()) { > > return false; > > } > > > > diff --git a/system/dirtylimit.c b/system/dirtylimit.c > > index ab20da34bb..d7a855c603 100644 > > --- a/system/dirtylimit.c > > +++ b/system/dirtylimit.c > > @@ -80,8 +80,7 @@ static void vcpu_dirty_rate_stat_collect(void) > > int i = 0; > > int64_t period = DIRTYLIMIT_CALC_TIME_MS; > > > > - if (migrate_dirty_limit() && > > - migration_is_active()) { > > + if (migrate_dirty_limit() && migration_is_running()) { > > period = migrate_vcpu_dirty_limit_period(); > > } > > > > -- > > 2.45.0 > > >
On 28/10/2024 17:45, Peter Xu wrote: > External email: Use caution opening links or attachments > > > On Mon, Oct 28, 2024 at 09:43:16AM +0200, Avihai Horon wrote: >> On 25/10/2024 0:30, Peter Xu wrote: >>> External email: Use caution opening links or attachments >>> >>> >>> We have two outside users of this API, so it's exported. >>> >>> Is it really necessary? Does it matter whether it must be >>> ACTIVE/POSTCOPY_ACTIVE/DEVICE? I guess no. >> Actually for VFIO it does matter, because we don't want VFIO to do DPT >> log_sync in SETUP stage when DPT might not have been started yet. >> See commit ff180c6bd7a8 ("vfio/migration: Skip log_sync during migration >> SETUP state"). > This seems to be a known issue for migration in general, rather than VFIO > specific. Hyman has a patch for it, not yet reviewed.. > > https://lore.kernel.org/r/cover.1729648695.git.yong.huang@smartx.com > > That corresponds to your comment here: > > Redundant -- all RAM is marked dirty in migration SETUP state and is > transferred only after migration is set to ACTIVE state, so doing > log_sync during migration SETUP is pointless. > > So I wonder whether it's only VFIO that should skip it, or log_sync() > simply shouldn't be called at all during SETUP, because of its redundancy. Not sure why this sync was there in the first place, but if its only purpose was to sync dirty pages then yes, I guess it be dropped. > > The other thing you mentioned here: > > Can fail -- there is a time window, between setting migration state to > SETUP and starting dirty tracking by RAM save_live_setup handler, during > which dirty tracking is still not started. Any VFIO log_sync call that > is issued during this time window will fail. For example, this error can > be triggered by migrating a VM when a GUI is active, which constantly > calls log_sync. > > This is VFIO specific. Why this can fail even if global tracking is > started already? It can fail if global tracking is *not* started yet. As mentioned in the commit message, there is a time window where migration is in SETUP state but global tracking is not started yet. Thanks. > I didn't yet get why a GUI being active in guest could > affect log_sync() from working.. after vfio_listener_log_global_start() > properly setup everything. > > Thanks, > >> Thanks. >> >>> The external user is trying to detect whether migration is running or not, >>> as simple as that. >>> >>> To make the migration_is*() APIs even shorter, let's use >>> migration_is_running() for outside worlds. >>> >>> Internally there're actually three places that literally needs >>> migration_is_active() rather than running(). Keep that an internal helper. >>> >>> After this patch, we finally only export one helper that allows external >>> world to try detect migration status, which is migration_is_running(). >>> >>> Signed-off-by: Peter Xu <peterx@redhat.com> >>> --- >>> include/migration/misc.h | 1 - >>> migration/migration.h | 1 + >>> hw/vfio/common.c | 4 ++-- >>> system/dirtylimit.c | 3 +-- >>> 4 files changed, 4 insertions(+), 5 deletions(-) >>> >>> diff --git a/include/migration/misc.h b/include/migration/misc.h >>> index ad1e25826a..c0e23fdac9 100644 >>> --- a/include/migration/misc.h >>> +++ b/include/migration/misc.h >>> @@ -53,7 +53,6 @@ void dump_vmstate_json_to_file(FILE *out_fp); >>> void migration_object_init(void); >>> void migration_shutdown(void); >>> >>> -bool migration_is_active(void); >>> bool migration_is_running(void); >>> bool migration_thread_is_self(void); >>> >>> diff --git a/migration/migration.h b/migration/migration.h >>> index 0956e9274b..9fa26ab06a 100644 >>> --- a/migration/migration.h >>> +++ b/migration/migration.h >>> @@ -492,6 +492,7 @@ int migration_call_notifiers(MigrationState *s, MigrationEventType type, >>> >>> int migrate_init(MigrationState *s, Error **errp); >>> bool migration_is_blocked(Error **errp); >>> +bool migration_is_active(void); >>> /* True if outgoing migration has entered postcopy phase */ >>> bool migration_in_postcopy(void); >>> bool migration_postcopy_is_alive(MigrationStatus state); >>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c >>> index cc72282c71..7eb99ebd4d 100644 >>> --- a/hw/vfio/common.c >>> +++ b/hw/vfio/common.c >>> @@ -174,7 +174,7 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainerBase *bcontainer) >>> { >>> VFIODevice *vbasedev; >>> >>> - if (!migration_is_active()) { >>> + if (!migration_is_running()) { >>> return false; >>> } >>> >>> @@ -219,7 +219,7 @@ vfio_devices_all_running_and_mig_active(const VFIOContainerBase *bcontainer) >>> { >>> VFIODevice *vbasedev; >>> >>> - if (!migration_is_active()) { >>> + if (!migration_is_running()) { >>> return false; >>> } >>> >>> diff --git a/system/dirtylimit.c b/system/dirtylimit.c >>> index ab20da34bb..d7a855c603 100644 >>> --- a/system/dirtylimit.c >>> +++ b/system/dirtylimit.c >>> @@ -80,8 +80,7 @@ static void vcpu_dirty_rate_stat_collect(void) >>> int i = 0; >>> int64_t period = DIRTYLIMIT_CALC_TIME_MS; >>> >>> - if (migrate_dirty_limit() && >>> - migration_is_active()) { >>> + if (migrate_dirty_limit() && migration_is_running()) { >>> period = migrate_vcpu_dirty_limit_period(); >>> } >>> >>> -- >>> 2.45.0 >>> > -- > Peter Xu >
On Mon, Oct 28, 2024 at 06:41:42PM +0200, Avihai Horon wrote: > > On 28/10/2024 17:45, Peter Xu wrote: > > External email: Use caution opening links or attachments > > > > > > On Mon, Oct 28, 2024 at 09:43:16AM +0200, Avihai Horon wrote: > > > On 25/10/2024 0:30, Peter Xu wrote: > > > > External email: Use caution opening links or attachments > > > > > > > > > > > > We have two outside users of this API, so it's exported. > > > > > > > > Is it really necessary? Does it matter whether it must be > > > > ACTIVE/POSTCOPY_ACTIVE/DEVICE? I guess no. > > > Actually for VFIO it does matter, because we don't want VFIO to do DPT > > > log_sync in SETUP stage when DPT might not have been started yet. > > > See commit ff180c6bd7a8 ("vfio/migration: Skip log_sync during migration > > > SETUP state"). > > This seems to be a known issue for migration in general, rather than VFIO > > specific. Hyman has a patch for it, not yet reviewed.. > > > > https://lore.kernel.org/r/cover.1729648695.git.yong.huang@smartx.com > > > > That corresponds to your comment here: > > > > Redundant -- all RAM is marked dirty in migration SETUP state and is > > transferred only after migration is set to ACTIVE state, so doing > > log_sync during migration SETUP is pointless. > > > > So I wonder whether it's only VFIO that should skip it, or log_sync() > > simply shouldn't be called at all during SETUP, because of its redundancy. > > Not sure why this sync was there in the first place, but if its only purpose > was to sync dirty pages then yes, I guess it be dropped. > > > > > The other thing you mentioned here: > > > > Can fail -- there is a time window, between setting migration state to > > SETUP and starting dirty tracking by RAM save_live_setup handler, during > > which dirty tracking is still not started. Any VFIO log_sync call that > > is issued during this time window will fail. For example, this error can > > be triggered by migrating a VM when a GUI is active, which constantly > > calls log_sync. > > > > This is VFIO specific. Why this can fail even if global tracking is > > started already? > > It can fail if global tracking is *not* started yet. > As mentioned in the commit message, there is a time window where migration > is in SETUP state but global tracking is not started yet. Hmm, I'm totally confused now.. The only thing that can kickoff the sync during SETUP, AFAICT, is: ret = memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION, errp); if (!ret) { goto out_unlock; } migration_bitmap_sync_precopy(false); <------------- here I need to confess this may not be the right place to invoke it in ram.c (I think we probably should move it out at some point.. into generic migration code). However I don't yet see why log_start() is not called first in your case before log_sync(). Could you elaborate? > > Thanks. > > > I didn't yet get why a GUI being active in guest could > > affect log_sync() from working.. after vfio_listener_log_global_start() > > properly setup everything. > > > > Thanks, > > > > > Thanks. > > > > > > > The external user is trying to detect whether migration is running or not, > > > > as simple as that. > > > > > > > > To make the migration_is*() APIs even shorter, let's use > > > > migration_is_running() for outside worlds. > > > > > > > > Internally there're actually three places that literally needs > > > > migration_is_active() rather than running(). Keep that an internal helper. > > > > > > > > After this patch, we finally only export one helper that allows external > > > > world to try detect migration status, which is migration_is_running(). > > > > > > > > Signed-off-by: Peter Xu <peterx@redhat.com> > > > > --- > > > > include/migration/misc.h | 1 - > > > > migration/migration.h | 1 + > > > > hw/vfio/common.c | 4 ++-- > > > > system/dirtylimit.c | 3 +-- > > > > 4 files changed, 4 insertions(+), 5 deletions(-) > > > > > > > > diff --git a/include/migration/misc.h b/include/migration/misc.h > > > > index ad1e25826a..c0e23fdac9 100644 > > > > --- a/include/migration/misc.h > > > > +++ b/include/migration/misc.h > > > > @@ -53,7 +53,6 @@ void dump_vmstate_json_to_file(FILE *out_fp); > > > > void migration_object_init(void); > > > > void migration_shutdown(void); > > > > > > > > -bool migration_is_active(void); > > > > bool migration_is_running(void); > > > > bool migration_thread_is_self(void); > > > > > > > > diff --git a/migration/migration.h b/migration/migration.h > > > > index 0956e9274b..9fa26ab06a 100644 > > > > --- a/migration/migration.h > > > > +++ b/migration/migration.h > > > > @@ -492,6 +492,7 @@ int migration_call_notifiers(MigrationState *s, MigrationEventType type, > > > > > > > > int migrate_init(MigrationState *s, Error **errp); > > > > bool migration_is_blocked(Error **errp); > > > > +bool migration_is_active(void); > > > > /* True if outgoing migration has entered postcopy phase */ > > > > bool migration_in_postcopy(void); > > > > bool migration_postcopy_is_alive(MigrationStatus state); > > > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c > > > > index cc72282c71..7eb99ebd4d 100644 > > > > --- a/hw/vfio/common.c > > > > +++ b/hw/vfio/common.c > > > > @@ -174,7 +174,7 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainerBase *bcontainer) > > > > { > > > > VFIODevice *vbasedev; > > > > > > > > - if (!migration_is_active()) { > > > > + if (!migration_is_running()) { > > > > return false; > > > > } > > > > > > > > @@ -219,7 +219,7 @@ vfio_devices_all_running_and_mig_active(const VFIOContainerBase *bcontainer) > > > > { > > > > VFIODevice *vbasedev; > > > > > > > > - if (!migration_is_active()) { > > > > + if (!migration_is_running()) { > > > > return false; > > > > } > > > > > > > > diff --git a/system/dirtylimit.c b/system/dirtylimit.c > > > > index ab20da34bb..d7a855c603 100644 > > > > --- a/system/dirtylimit.c > > > > +++ b/system/dirtylimit.c > > > > @@ -80,8 +80,7 @@ static void vcpu_dirty_rate_stat_collect(void) > > > > int i = 0; > > > > int64_t period = DIRTYLIMIT_CALC_TIME_MS; > > > > > > > > - if (migrate_dirty_limit() && > > > > - migration_is_active()) { > > > > + if (migrate_dirty_limit() && migration_is_running()) { > > > > period = migrate_vcpu_dirty_limit_period(); > > > > } > > > > > > > > -- > > > > 2.45.0 > > > > > > -- > > Peter Xu > > >
On 28/10/2024 18:58, Peter Xu wrote: > External email: Use caution opening links or attachments > > > On Mon, Oct 28, 2024 at 06:41:42PM +0200, Avihai Horon wrote: >> On 28/10/2024 17:45, Peter Xu wrote: >>> External email: Use caution opening links or attachments >>> >>> >>> On Mon, Oct 28, 2024 at 09:43:16AM +0200, Avihai Horon wrote: >>>> On 25/10/2024 0:30, Peter Xu wrote: >>>>> External email: Use caution opening links or attachments >>>>> >>>>> >>>>> We have two outside users of this API, so it's exported. >>>>> >>>>> Is it really necessary? Does it matter whether it must be >>>>> ACTIVE/POSTCOPY_ACTIVE/DEVICE? I guess no. >>>> Actually for VFIO it does matter, because we don't want VFIO to do DPT >>>> log_sync in SETUP stage when DPT might not have been started yet. >>>> See commit ff180c6bd7a8 ("vfio/migration: Skip log_sync during migration >>>> SETUP state"). >>> This seems to be a known issue for migration in general, rather than VFIO >>> specific. Hyman has a patch for it, not yet reviewed.. >>> >>> https://lore.kernel.org/r/cover.1729648695.git.yong.huang@smartx.com >>> >>> That corresponds to your comment here: >>> >>> Redundant -- all RAM is marked dirty in migration SETUP state and is >>> transferred only after migration is set to ACTIVE state, so doing >>> log_sync during migration SETUP is pointless. >>> >>> So I wonder whether it's only VFIO that should skip it, or log_sync() >>> simply shouldn't be called at all during SETUP, because of its redundancy. >> Not sure why this sync was there in the first place, but if its only purpose >> was to sync dirty pages then yes, I guess it be dropped. >> >>> The other thing you mentioned here: >>> >>> Can fail -- there is a time window, between setting migration state to >>> SETUP and starting dirty tracking by RAM save_live_setup handler, during >>> which dirty tracking is still not started. Any VFIO log_sync call that >>> is issued during this time window will fail. For example, this error can >>> be triggered by migrating a VM when a GUI is active, which constantly >>> calls log_sync. >>> >>> This is VFIO specific. Why this can fail even if global tracking is >>> started already? >> It can fail if global tracking is *not* started yet. >> As mentioned in the commit message, there is a time window where migration >> is in SETUP state but global tracking is not started yet. > Hmm, I'm totally confused now.. > > The only thing that can kickoff the sync during SETUP, AFAICT, is: > > ret = memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION, errp); > if (!ret) { > goto out_unlock; > } > migration_bitmap_sync_precopy(false); <------------- here > > I need to confess this may not be the right place to invoke it in ram.c (I > think we probably should move it out at some point.. into generic migration > code). However I don't yet see why log_start() is not called first in your > case before log_sync(). > > Could you elaborate? Indeed, in the above code log_start is called before log_sync. I was referring to the case where some other code path triggers log_sync. E.g., if you open a VNC to the guest then it constantly calls log_sync to refresh the graphics. In that case, one of these log_syncs can happen between "migration status is set to SETUP" and "global tracking is started". Thanks. >> Thanks. >> >>> I didn't yet get why a GUI being active in guest could >>> affect log_sync() from working.. after vfio_listener_log_global_start() >>> properly setup everything. >>> >>> Thanks, >>> >>>> Thanks. >>>> >>>>> The external user is trying to detect whether migration is running or not, >>>>> as simple as that. >>>>> >>>>> To make the migration_is*() APIs even shorter, let's use >>>>> migration_is_running() for outside worlds. >>>>> >>>>> Internally there're actually three places that literally needs >>>>> migration_is_active() rather than running(). Keep that an internal helper. >>>>> >>>>> After this patch, we finally only export one helper that allows external >>>>> world to try detect migration status, which is migration_is_running(). >>>>> >>>>> Signed-off-by: Peter Xu <peterx@redhat.com> >>>>> --- >>>>> include/migration/misc.h | 1 - >>>>> migration/migration.h | 1 + >>>>> hw/vfio/common.c | 4 ++-- >>>>> system/dirtylimit.c | 3 +-- >>>>> 4 files changed, 4 insertions(+), 5 deletions(-) >>>>> >>>>> diff --git a/include/migration/misc.h b/include/migration/misc.h >>>>> index ad1e25826a..c0e23fdac9 100644 >>>>> --- a/include/migration/misc.h >>>>> +++ b/include/migration/misc.h >>>>> @@ -53,7 +53,6 @@ void dump_vmstate_json_to_file(FILE *out_fp); >>>>> void migration_object_init(void); >>>>> void migration_shutdown(void); >>>>> >>>>> -bool migration_is_active(void); >>>>> bool migration_is_running(void); >>>>> bool migration_thread_is_self(void); >>>>> >>>>> diff --git a/migration/migration.h b/migration/migration.h >>>>> index 0956e9274b..9fa26ab06a 100644 >>>>> --- a/migration/migration.h >>>>> +++ b/migration/migration.h >>>>> @@ -492,6 +492,7 @@ int migration_call_notifiers(MigrationState *s, MigrationEventType type, >>>>> >>>>> int migrate_init(MigrationState *s, Error **errp); >>>>> bool migration_is_blocked(Error **errp); >>>>> +bool migration_is_active(void); >>>>> /* True if outgoing migration has entered postcopy phase */ >>>>> bool migration_in_postcopy(void); >>>>> bool migration_postcopy_is_alive(MigrationStatus state); >>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c >>>>> index cc72282c71..7eb99ebd4d 100644 >>>>> --- a/hw/vfio/common.c >>>>> +++ b/hw/vfio/common.c >>>>> @@ -174,7 +174,7 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainerBase *bcontainer) >>>>> { >>>>> VFIODevice *vbasedev; >>>>> >>>>> - if (!migration_is_active()) { >>>>> + if (!migration_is_running()) { >>>>> return false; >>>>> } >>>>> >>>>> @@ -219,7 +219,7 @@ vfio_devices_all_running_and_mig_active(const VFIOContainerBase *bcontainer) >>>>> { >>>>> VFIODevice *vbasedev; >>>>> >>>>> - if (!migration_is_active()) { >>>>> + if (!migration_is_running()) { >>>>> return false; >>>>> } >>>>> >>>>> diff --git a/system/dirtylimit.c b/system/dirtylimit.c >>>>> index ab20da34bb..d7a855c603 100644 >>>>> --- a/system/dirtylimit.c >>>>> +++ b/system/dirtylimit.c >>>>> @@ -80,8 +80,7 @@ static void vcpu_dirty_rate_stat_collect(void) >>>>> int i = 0; >>>>> int64_t period = DIRTYLIMIT_CALC_TIME_MS; >>>>> >>>>> - if (migrate_dirty_limit() && >>>>> - migration_is_active()) { >>>>> + if (migrate_dirty_limit() && migration_is_running()) { >>>>> period = migrate_vcpu_dirty_limit_period(); >>>>> } >>>>> >>>>> -- >>>>> 2.45.0 >>>>> >>> -- >>> Peter Xu >>> > -- > Peter Xu >
On Mon, Oct 28, 2024 at 07:20:27PM +0200, Avihai Horon wrote: > > On 28/10/2024 18:58, Peter Xu wrote: > > External email: Use caution opening links or attachments > > > > > > On Mon, Oct 28, 2024 at 06:41:42PM +0200, Avihai Horon wrote: > > > On 28/10/2024 17:45, Peter Xu wrote: > > > > External email: Use caution opening links or attachments > > > > > > > > > > > > On Mon, Oct 28, 2024 at 09:43:16AM +0200, Avihai Horon wrote: > > > > > On 25/10/2024 0:30, Peter Xu wrote: > > > > > > External email: Use caution opening links or attachments > > > > > > > > > > > > > > > > > > We have two outside users of this API, so it's exported. > > > > > > > > > > > > Is it really necessary? Does it matter whether it must be > > > > > > ACTIVE/POSTCOPY_ACTIVE/DEVICE? I guess no. > > > > > Actually for VFIO it does matter, because we don't want VFIO to do DPT > > > > > log_sync in SETUP stage when DPT might not have been started yet. > > > > > See commit ff180c6bd7a8 ("vfio/migration: Skip log_sync during migration > > > > > SETUP state"). > > > > This seems to be a known issue for migration in general, rather than VFIO > > > > specific. Hyman has a patch for it, not yet reviewed.. > > > > > > > > https://lore.kernel.org/r/cover.1729648695.git.yong.huang@smartx.com > > > > > > > > That corresponds to your comment here: > > > > > > > > Redundant -- all RAM is marked dirty in migration SETUP state and is > > > > transferred only after migration is set to ACTIVE state, so doing > > > > log_sync during migration SETUP is pointless. > > > > > > > > So I wonder whether it's only VFIO that should skip it, or log_sync() > > > > simply shouldn't be called at all during SETUP, because of its redundancy. > > > Not sure why this sync was there in the first place, but if its only purpose > > > was to sync dirty pages then yes, I guess it be dropped. > > > > > > > The other thing you mentioned here: > > > > > > > > Can fail -- there is a time window, between setting migration state to > > > > SETUP and starting dirty tracking by RAM save_live_setup handler, during > > > > which dirty tracking is still not started. Any VFIO log_sync call that > > > > is issued during this time window will fail. For example, this error can > > > > be triggered by migrating a VM when a GUI is active, which constantly > > > > calls log_sync. > > > > > > > > This is VFIO specific. Why this can fail even if global tracking is > > > > started already? > > > It can fail if global tracking is *not* started yet. > > > As mentioned in the commit message, there is a time window where migration > > > is in SETUP state but global tracking is not started yet. > > Hmm, I'm totally confused now.. > > > > The only thing that can kickoff the sync during SETUP, AFAICT, is: > > > > ret = memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION, errp); > > if (!ret) { > > goto out_unlock; > > } > > migration_bitmap_sync_precopy(false); <------------- here > > > > I need to confess this may not be the right place to invoke it in ram.c (I > > think we probably should move it out at some point.. into generic migration > > code). However I don't yet see why log_start() is not called first in your > > case before log_sync(). > > > > Could you elaborate? > > Indeed, in the above code log_start is called before log_sync. > > I was referring to the case where some other code path triggers log_sync. > E.g., if you open a VNC to the guest then it constantly calls log_sync to > refresh the graphics. In that case, one of these log_syncs can happen > between "migration status is set to SETUP" and "global tracking is started". I see. That's unfortunate.. Though this is also the case where it shouldn't be VFIO's problem alone. See some other users of log_sync(): vhost_sync_dirty_bitmap(): if (!dev->log_enabled || !dev->started) { return 0; } kvm_slot_get_dirty_log(): if (ret == -ENOENT) { /* kernel does not have dirty bitmap in this slot */ ret = 0; } And I didn't further look. In short, IMHO looks like VFIO still shouldn't be special on differeciating and make migration export the SETUP phase just for this.. as VFIO has log_start() like all the rest, so VFIO can also know whether tracking is enabled at all, then it can silently no-op the log_sync() like all the rest of the users. If you agree, I'd prefer we keep this patch - it'll be nice we only ever expose migration_is_running() for migration status checks, without exposing SETUP only for this VFIO use case even if it could have followed what other modules are doing. If you would like to propose a patch for VFIO, I'd be happy to include your patch before this patch (just in case this patch could land some day) to make sure VFIO works as before. Since I don't have VFIO HW to test, it'll be challenging for me to propose and test such patch otherwise. Thanks,
On 28/10/2024 21:06, Peter Xu wrote: > External email: Use caution opening links or attachments > > > On Mon, Oct 28, 2024 at 07:20:27PM +0200, Avihai Horon wrote: >> On 28/10/2024 18:58, Peter Xu wrote: >>> External email: Use caution opening links or attachments >>> >>> >>> On Mon, Oct 28, 2024 at 06:41:42PM +0200, Avihai Horon wrote: >>>> On 28/10/2024 17:45, Peter Xu wrote: >>>>> External email: Use caution opening links or attachments >>>>> >>>>> >>>>> On Mon, Oct 28, 2024 at 09:43:16AM +0200, Avihai Horon wrote: >>>>>> On 25/10/2024 0:30, Peter Xu wrote: >>>>>>> External email: Use caution opening links or attachments >>>>>>> >>>>>>> >>>>>>> We have two outside users of this API, so it's exported. >>>>>>> >>>>>>> Is it really necessary? Does it matter whether it must be >>>>>>> ACTIVE/POSTCOPY_ACTIVE/DEVICE? I guess no. >>>>>> Actually for VFIO it does matter, because we don't want VFIO to do DPT >>>>>> log_sync in SETUP stage when DPT might not have been started yet. >>>>>> See commit ff180c6bd7a8 ("vfio/migration: Skip log_sync during migration >>>>>> SETUP state"). >>>>> This seems to be a known issue for migration in general, rather than VFIO >>>>> specific. Hyman has a patch for it, not yet reviewed.. >>>>> >>>>> https://lore.kernel.org/r/cover.1729648695.git.yong.huang@smartx.com >>>>> >>>>> That corresponds to your comment here: >>>>> >>>>> Redundant -- all RAM is marked dirty in migration SETUP state and is >>>>> transferred only after migration is set to ACTIVE state, so doing >>>>> log_sync during migration SETUP is pointless. >>>>> >>>>> So I wonder whether it's only VFIO that should skip it, or log_sync() >>>>> simply shouldn't be called at all during SETUP, because of its redundancy. >>>> Not sure why this sync was there in the first place, but if its only purpose >>>> was to sync dirty pages then yes, I guess it be dropped. >>>> >>>>> The other thing you mentioned here: >>>>> >>>>> Can fail -- there is a time window, between setting migration state to >>>>> SETUP and starting dirty tracking by RAM save_live_setup handler, during >>>>> which dirty tracking is still not started. Any VFIO log_sync call that >>>>> is issued during this time window will fail. For example, this error can >>>>> be triggered by migrating a VM when a GUI is active, which constantly >>>>> calls log_sync. >>>>> >>>>> This is VFIO specific. Why this can fail even if global tracking is >>>>> started already? >>>> It can fail if global tracking is *not* started yet. >>>> As mentioned in the commit message, there is a time window where migration >>>> is in SETUP state but global tracking is not started yet. >>> Hmm, I'm totally confused now.. >>> >>> The only thing that can kickoff the sync during SETUP, AFAICT, is: >>> >>> ret = memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION, errp); >>> if (!ret) { >>> goto out_unlock; >>> } >>> migration_bitmap_sync_precopy(false); <------------- here >>> >>> I need to confess this may not be the right place to invoke it in ram.c (I >>> think we probably should move it out at some point.. into generic migration >>> code). However I don't yet see why log_start() is not called first in your >>> case before log_sync(). >>> >>> Could you elaborate? >> Indeed, in the above code log_start is called before log_sync. >> >> I was referring to the case where some other code path triggers log_sync. >> E.g., if you open a VNC to the guest then it constantly calls log_sync to >> refresh the graphics. In that case, one of these log_syncs can happen >> between "migration status is set to SETUP" and "global tracking is started". > I see. That's unfortunate.. > > Though this is also the case where it shouldn't be VFIO's problem alone. > See some other users of log_sync(): > > vhost_sync_dirty_bitmap(): > if (!dev->log_enabled || !dev->started) { > return 0; > } > > kvm_slot_get_dirty_log(): > if (ret == -ENOENT) { > /* kernel does not have dirty bitmap in this slot */ > ret = 0; > } > > And I didn't further look. > > In short, IMHO looks like VFIO still shouldn't be special on differeciating > and make migration export the SETUP phase just for this.. as VFIO has > log_start() like all the rest, so VFIO can also know whether tracking is > enabled at all, then it can silently no-op the log_sync() like all the rest > of the users. > > If you agree, I'd prefer we keep this patch - it'll be nice we only ever > expose migration_is_running() for migration status checks, without exposing > SETUP only for this VFIO use case even if it could have followed what other > modules are doing. Yes, I agree this could be a nice cleanup. > > If you would like to propose a patch for VFIO, I'd be happy to include your > patch before this patch (just in case this patch could land some day) to > make sure VFIO works as before. Since I don't have VFIO HW to test, it'll > be challenging for me to propose and test such patch otherwise. I can do that, though it may be a bit involved because VFIO has multiple dirty tracking mechanisms (legacy, device DPT and IOMMU DPT). Plus, I don't have HW that supports IOMMU DPT at hand for testing. I assume this is not an urgent cleanup, right? Thanks.
On Wed, Oct 30, 2024, 10:39 a.m. Avihai Horon <avihaih@nvidia.com> wrote: > > On 28/10/2024 21:06, Peter Xu wrote: > > External email: Use caution opening links or attachments > > > > > > On Mon, Oct 28, 2024 at 07:20:27PM +0200, Avihai Horon wrote: > >> On 28/10/2024 18:58, Peter Xu wrote: > >>> External email: Use caution opening links or attachments > >>> > >>> > >>> On Mon, Oct 28, 2024 at 06:41:42PM +0200, Avihai Horon wrote: > >>>> On 28/10/2024 17:45, Peter Xu wrote: > >>>>> External email: Use caution opening links or attachments > >>>>> > >>>>> > >>>>> On Mon, Oct 28, 2024 at 09:43:16AM +0200, Avihai Horon wrote: > >>>>>> On 25/10/2024 0:30, Peter Xu wrote: > >>>>>>> External email: Use caution opening links or attachments > >>>>>>> > >>>>>>> > >>>>>>> We have two outside users of this API, so it's exported. > >>>>>>> > >>>>>>> Is it really necessary? Does it matter whether it must be > >>>>>>> ACTIVE/POSTCOPY_ACTIVE/DEVICE? I guess no. > >>>>>> Actually for VFIO it does matter, because we don't want VFIO to do > DPT > >>>>>> log_sync in SETUP stage when DPT might not have been started yet. > >>>>>> See commit ff180c6bd7a8 ("vfio/migration: Skip log_sync during > migration > >>>>>> SETUP state"). > >>>>> This seems to be a known issue for migration in general, rather than > VFIO > >>>>> specific. Hyman has a patch for it, not yet reviewed.. > >>>>> > >>>>> https://lore.kernel.org/r/cover.1729648695.git.yong.huang@smartx.com > >>>>> > >>>>> That corresponds to your comment here: > >>>>> > >>>>> Redundant -- all RAM is marked dirty in migration SETUP state > and is > >>>>> transferred only after migration is set to ACTIVE state, so > doing > >>>>> log_sync during migration SETUP is pointless. > >>>>> > >>>>> So I wonder whether it's only VFIO that should skip it, or log_sync() > >>>>> simply shouldn't be called at all during SETUP, because of its > redundancy. > >>>> Not sure why this sync was there in the first place, but if its only > purpose > >>>> was to sync dirty pages then yes, I guess it be dropped. > >>>> > >>>>> The other thing you mentioned here: > >>>>> > >>>>> Can fail -- there is a time window, between setting migration > state to > >>>>> SETUP and starting dirty tracking by RAM save_live_setup > handler, during > >>>>> which dirty tracking is still not started. Any VFIO log_sync > call that > >>>>> is issued during this time window will fail. For example, > this error can > >>>>> be triggered by migrating a VM when a GUI is active, which > constantly > >>>>> calls log_sync. > >>>>> > >>>>> This is VFIO specific. Why this can fail even if global tracking is > >>>>> started already? > >>>> It can fail if global tracking is *not* started yet. > >>>> As mentioned in the commit message, there is a time window where > migration > >>>> is in SETUP state but global tracking is not started yet. > >>> Hmm, I'm totally confused now.. > >>> > >>> The only thing that can kickoff the sync during SETUP, AFAICT, is: > >>> > >>> ret = > memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION, errp); > >>> if (!ret) { > >>> goto out_unlock; > >>> } > >>> migration_bitmap_sync_precopy(false); <------------- > here > >>> > >>> I need to confess this may not be the right place to invoke it in > ram.c (I > >>> think we probably should move it out at some point.. into generic > migration > >>> code). However I don't yet see why log_start() is not called first in > your > >>> case before log_sync(). > >>> > >>> Could you elaborate? > >> Indeed, in the above code log_start is called before log_sync. > >> > >> I was referring to the case where some other code path triggers > log_sync. > >> E.g., if you open a VNC to the guest then it constantly calls log_sync > to > >> refresh the graphics. In that case, one of these log_syncs can happen > >> between "migration status is set to SETUP" and "global tracking is > started". > > I see. That's unfortunate.. > > > > Though this is also the case where it shouldn't be VFIO's problem alone. > > See some other users of log_sync(): > > > > vhost_sync_dirty_bitmap(): > > if (!dev->log_enabled || !dev->started) { > > return 0; > > } > > > > kvm_slot_get_dirty_log(): > > if (ret == -ENOENT) { > > /* kernel does not have dirty bitmap in this slot */ > > ret = 0; > > } > > > > And I didn't further look. > > > > In short, IMHO looks like VFIO still shouldn't be special on > differeciating > > and make migration export the SETUP phase just for this.. as VFIO has > > log_start() like all the rest, so VFIO can also know whether tracking is > > enabled at all, then it can silently no-op the log_sync() like all the > rest > > of the users. > > > > If you agree, I'd prefer we keep this patch - it'll be nice we only ever > > expose migration_is_running() for migration status checks, without > exposing > > SETUP only for this VFIO use case even if it could have followed what > other > > modules are doing. > > Yes, I agree this could be a nice cleanup. > > > > > If you would like to propose a patch for VFIO, I'd be happy to include > your > > patch before this patch (just in case this patch could land some day) to > > make sure VFIO works as before. Since I don't have VFIO HW to test, > it'll > > be challenging for me to propose and test such patch otherwise. > > I can do that, though it may be a bit involved because VFIO has multiple > dirty tracking mechanisms (legacy, device DPT and IOMMU DPT). Plus, I > don't have HW that supports IOMMU DPT at hand for testing. > I assume this is not an urgent cleanup, right? > Yes there's no rush. If you agree with the change, feel free to clean this helper up when you post the vfio changes. We can still have one more helper exported before that. Thanks a lot. > Thanks. > >
diff --git a/include/migration/misc.h b/include/migration/misc.h index ad1e25826a..c0e23fdac9 100644 --- a/include/migration/misc.h +++ b/include/migration/misc.h @@ -53,7 +53,6 @@ void dump_vmstate_json_to_file(FILE *out_fp); void migration_object_init(void); void migration_shutdown(void); -bool migration_is_active(void); bool migration_is_running(void); bool migration_thread_is_self(void); diff --git a/migration/migration.h b/migration/migration.h index 0956e9274b..9fa26ab06a 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -492,6 +492,7 @@ int migration_call_notifiers(MigrationState *s, MigrationEventType type, int migrate_init(MigrationState *s, Error **errp); bool migration_is_blocked(Error **errp); +bool migration_is_active(void); /* True if outgoing migration has entered postcopy phase */ bool migration_in_postcopy(void); bool migration_postcopy_is_alive(MigrationStatus state); diff --git a/hw/vfio/common.c b/hw/vfio/common.c index cc72282c71..7eb99ebd4d 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -174,7 +174,7 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainerBase *bcontainer) { VFIODevice *vbasedev; - if (!migration_is_active()) { + if (!migration_is_running()) { return false; } @@ -219,7 +219,7 @@ vfio_devices_all_running_and_mig_active(const VFIOContainerBase *bcontainer) { VFIODevice *vbasedev; - if (!migration_is_active()) { + if (!migration_is_running()) { return false; } diff --git a/system/dirtylimit.c b/system/dirtylimit.c index ab20da34bb..d7a855c603 100644 --- a/system/dirtylimit.c +++ b/system/dirtylimit.c @@ -80,8 +80,7 @@ static void vcpu_dirty_rate_stat_collect(void) int i = 0; int64_t period = DIRTYLIMIT_CALC_TIME_MS; - if (migrate_dirty_limit() && - migration_is_active()) { + if (migrate_dirty_limit() && migration_is_running()) { period = migrate_vcpu_dirty_limit_period(); }
We have two outside users of this API, so it's exported. Is it really necessary? Does it matter whether it must be ACTIVE/POSTCOPY_ACTIVE/DEVICE? I guess no. The external user is trying to detect whether migration is running or not, as simple as that. To make the migration_is*() APIs even shorter, let's use migration_is_running() for outside worlds. Internally there're actually three places that literally needs migration_is_active() rather than running(). Keep that an internal helper. After this patch, we finally only export one helper that allows external world to try detect migration status, which is migration_is_running(). Signed-off-by: Peter Xu <peterx@redhat.com> --- include/migration/misc.h | 1 - migration/migration.h | 1 + hw/vfio/common.c | 4 ++-- system/dirtylimit.c | 3 +-- 4 files changed, 4 insertions(+), 5 deletions(-)