[RFC,v1,1/1] migration: Disable postcopy + multifd migration

Message ID 20230327161518.2385074-1-leobras@redhat.com (mailing list archive)
State New, archived
Series [RFC,v1,1/1] migration: Disable postcopy + multifd migration

Commit Message

Leonardo Bras March 27, 2023, 4:15 p.m. UTC
Since the introduction of multifd, it's possible to perform a multifd
migration and finish it using postcopy.

A bug introduced by yank (fixed in cfc3bcf373) was previously preventing
successful use of this migration scenario, and it should now be working
in most cases.

But since there is not enough testing or support for this scenario, nor
any reported users, disable this combination before it causes problems
for users.

Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 migration/migration.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Peter Xu March 27, 2023, 4:28 p.m. UTC | #1
On Mon, Mar 27, 2023 at 01:15:18PM -0300, Leonardo Bras wrote:
> Since the introduction of multifd, it's possible to perform a multifd
> migration and finish it using postcopy.
> 
> A bug introduced by yank (fixed on cfc3bcf373) was previously preventing
> a successful use of this migration scenario, and now it should be
> working on most cases.
> 
> But since there is not enough testing/support nor any reported users for
> this scenario, we should disable this combination before it may cause any
> problems for users.
> 
> Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Leonardo Bras <leobras@redhat.com>

Acked-by: Peter Xu <peterx@redhat.com>
Dr. David Alan Gilbert March 28, 2023, 10:59 a.m. UTC | #2
* Leonardo Bras (leobras@redhat.com) wrote:
> Since the introduction of multifd, it's possible to perform a multifd
> migration and finish it using postcopy.
> 
> A bug introduced by yank (fixed on cfc3bcf373) was previously preventing
> a successful use of this migration scenario, and now it should be
> working on most cases.
> 
> But since there is not enough testing/support nor any reported users for
> this scenario, we should disable this combination before it may cause any
> problems for users.
> 
> Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Leonardo Bras <leobras@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  migration/migration.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index ae2025d9d8..c601964b0e 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1356,6 +1356,11 @@ static bool migrate_caps_check(bool *cap_list,
>              error_setg(errp, "Postcopy is not compatible with ignore-shared");
>              return false;
>          }
> +
> +        if (cap_list[MIGRATION_CAPABILITY_MULTIFD]) {
> +            error_setg(errp, "Postcopy is not yet compatible with multifd");
> +            return false;
> +        }
>      }
>  
>      if (cap_list[MIGRATION_CAPABILITY_BACKGROUND_SNAPSHOT]) {
> -- 
> 2.40.0
>
Daniel P. Berrangé March 30, 2023, 2:20 p.m. UTC | #3
On Mon, Mar 27, 2023 at 01:15:18PM -0300, Leonardo Bras wrote:
> Since the introduction of multifd, it's possible to perform a multifd
> migration and finish it using postcopy.
> 
> A bug introduced by yank (fixed on cfc3bcf373) was previously preventing
> a successful use of this migration scenario, and now it should be
> working on most cases.
> 
> But since there is not enough testing/support nor any reported users for
> this scenario, we should disable this combination before it may cause any
> problems for users.

Clearly we don't have enough testing, but multifd+postcopy looks
like a genuinely useful scenario that we should be supporting.

Every post-copy starts with at least one pre-copy iteration, and
using multifd for that will be important for big VMs where single
threaded pre-copy is going to be CPU bound. The more we can transfer
in the pre-copy phase, the fewer page faults / latency spikes
postcopy is going to see.

In terms of migration usage, my personal recommendation to mgmt
apps would be that they should always enable the post-copy feature
when starting a migration. Even if they expect to try to get it to
complete using exclusively pre-copy in the common case, it's useful
to have the post-copy capability flag enabled, as a get-out-of-jail-free
card, i.e. if migration ends up getting stuck in non-convergence,
or they have a sudden need to urgently complete the migration, it is
good to be able to flip to post-copy mode.

I'd suggest that we instead add a multifd+postcopy test case to
migration-test.c and tackle any bugs it exposes. By blocking it
unconditionally we ensure no one will exercise it to expose any
further bugs.
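
A rough sketch of what such a test case could look like, loosely modelled
on the existing postcopy and multifd tests in tests/qtest/migration-test.c.
The helper names and signatures used here (test_migrate_start(),
migrate_set_capability(), migrate_postcopy_start(), ...) are assumptions
and would need checking against the real file:

/*
 * Hypothetical multifd + postcopy test -- a sketch only, not the actual
 * implementation; helpers are assumed to keep roughly their current shapes.
 */
static void test_multifd_postcopy(void)
{
    MigrateStart args = { };
    QTestState *from, *to;
    g_autofree char *uri = NULL;

    if (test_migrate_start(&from, &to, "tcp:127.0.0.1:0", &args)) {
        return;
    }

    /* Postcopy capability on both sides, as in the existing postcopy tests */
    migrate_set_capability(from, "postcopy-ram", true);
    migrate_set_capability(to, "postcopy-ram", true);

    /* ...plus multifd, which is the combination under discussion */
    migrate_set_capability(from, "multifd", true);
    migrate_set_capability(to, "multifd", true);
    migrate_set_parameter_int(from, "multifd-channels", 4);
    migrate_set_parameter_int(to, "multifd-channels", 4);

    uri = migrate_get_socket_address(to, "socket-address");
    migrate_qmp(from, uri, "{}");

    /* Let at least one precopy pass run, then flip to postcopy */
    wait_for_migration_pass(from);
    migrate_postcopy_start(from, to);

    wait_for_migration_complete(from);
    test_migrate_end(from, to, true);
}

It would also need registering in main() next to the existing postcopy
tests (e.g. via qtest_add_func("/migration/postcopy/multifd", ...)), and
any failures it shakes out would then be tracked as ordinary bugs.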

> Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---
>  migration/migration.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index ae2025d9d8..c601964b0e 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1356,6 +1356,11 @@ static bool migrate_caps_check(bool *cap_list,
>              error_setg(errp, "Postcopy is not compatible with ignore-shared");
>              return false;
>          }
> +
> +        if (cap_list[MIGRATION_CAPABILITY_MULTIFD]) {
> +            error_setg(errp, "Postcopy is not yet compatible with multifd");
> +            return false;
> +        }
>      }
>  
>      if (cap_list[MIGRATION_CAPABILITY_BACKGROUND_SNAPSHOT]) {
> -- 
> 2.40.0
> 
> 

With regards,
Daniel
Peter Xu March 30, 2023, 2:36 p.m. UTC | #4
On Thu, Mar 30, 2023 at 03:20:14PM +0100, Daniel P. Berrangé wrote:
> On Mon, Mar 27, 2023 at 01:15:18PM -0300, Leonardo Bras wrote:
> > Since the introduction of multifd, it's possible to perform a multifd
> > migration and finish it using postcopy.
> > 
> > A bug introduced by yank (fixed on cfc3bcf373) was previously preventing
> > a successful use of this migration scenario, and now it should be
> > working on most cases.
> > 
> > But since there is not enough testing/support nor any reported users for
> > this scenario, we should disable this combination before it may cause any
> > problems for users.
> 
> Clearly we don't have enough testing, but multifd+postcopy looks
> like a clearly useful scenario that we should be supporting.
> 
> Every post-copy starts with at least one pre-copy iteration, and
> using multifd for that will be important for big VMs where single
> threaded pre-copy is going to be CPU bound. The greater amount we
> can transfer in the pre-copy phase, the less page faults / latency
> spikes postcopy is going to see.

If we're using a 1-round precopy + postcopy approach, the amount of memory
to transfer will be the same either way: the guest mem size.

Multifd will make the round shorter, so there's a better chance of fewer
re-dirtied pages during the iteration, but that effect is limited.  E.g.:

  - For a very idle guest, finishing the 1st round in 1min or 3min may not
    make a large difference, because most of the pages will be constant
    anyway, or

  - For a very busy guest, probably a similar amount of pages will be
    dirtied whether it takes 1min or 3min.  Multifd will bring a benefit
    here, but the busier the guest, the smaller the effect.

> 
> In terms of migration usage, my personal recommendation to mgmt
> apps would be that they should always enable the post-copy feature
> when starting a migration. Even if they expect to try to get it to
> complete using exclusively pre-copy in the common case, its useful
> to have post-copy capability flag enabled, as a get out of jail
> free card. ie if migration ends up getting stuck in non-convergance,
> or they have a sudden need to urgently complete the migration it is
> good to be able to flip to post-copy mode.

I fully agree.

The only case where it should not be enabled is when the hosts are not
capable of it, e.g., the dest host may not have the privilege to initiate
userfaultfd (since QEMU postcopy requires kernel fault traps, that's quite
possible).

The recent introduction of /dev/userfaultfd should make that even less
likely to happen, but it'll still require (1) admin-adjusted permissions on
the devnode and qemu ownership so that qemu is in the white list, and (2) a
kernel new enough to have /dev/userfaultfd.

> 
> I'd suggest that we instead add a multifd+postcopy test case to
> migration-test.c and tackle any bugs it exposes. By blocking it
> unconditionally we ensure no one will exercise it to expose any
> further bugs.

That's doable.  But then we'd better also figure out how to identify the
two use cases below when both features are enabled:

  a. Enable multifd in precopy only, then switch to postcopy (currently
  mostly working but buggy; I think Juan can provide more information here;
  at least we need to rework the multifd flush when switching, and test
  over and over to make sure there's nothing else missing).

  b. Enable multifd in both the precopy and postcopy phases (currently
  definitely not supported)

So that the mgmt app will be aware of whether multifd will be enabled in
postcopy or not.  Currently we can't identify it.

I assume we can say that by default "multifd+postcopy" means a) above, but
we need to document it, and when b) is wanted and implemented someday,
we'll need some other flag/cap for it.
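
Purely as an illustration of that "other flag/cap" idea -- nothing below
exists in QEMU today -- the a)/b) split could eventually be expressed as an
extra capability checked in migrate_caps_check(), e.g.:

/*
 * Hypothetical: MIGRATION_CAPABILITY_POSTCOPY_MULTIFD is invented here
 * only to show how a) and b) could be told apart by the mgmt app.
 */
if (cap_list[MIGRATION_CAPABILITY_MULTIFD] &&
    cap_list[MIGRATION_CAPABILITY_POSTCOPY_RAM]) {
    if (cap_list[MIGRATION_CAPABILITY_POSTCOPY_MULTIFD]) {
        /* b) multifd channels stay active during the postcopy phase */
    } else {
        /* a) multifd is used for the precopy rounds only (the default) */
    }
}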
Daniel P. Berrangé March 30, 2023, 2:57 p.m. UTC | #5
On Thu, Mar 30, 2023 at 10:36:11AM -0400, Peter Xu wrote:
> On Thu, Mar 30, 2023 at 03:20:14PM +0100, Daniel P. Berrangé wrote:
> > On Mon, Mar 27, 2023 at 01:15:18PM -0300, Leonardo Bras wrote:
> > > Since the introduction of multifd, it's possible to perform a multifd
> > > migration and finish it using postcopy.
> > > 
> > > A bug introduced by yank (fixed on cfc3bcf373) was previously preventing
> > > a successful use of this migration scenario, and now it should be
> > > working on most cases.
> > > 
> > > But since there is not enough testing/support nor any reported users for
> > > this scenario, we should disable this combination before it may cause any
> > > problems for users.
> > 
> > Clearly we don't have enough testing, but multifd+postcopy looks
> > like a clearly useful scenario that we should be supporting.
> > 
> > Every post-copy starts with at least one pre-copy iteration, and
> > using multifd for that will be important for big VMs where single
> > threaded pre-copy is going to be CPU bound. The greater amount we
> > can transfer in the pre-copy phase, the less page faults / latency
> > spikes postcopy is going to see.
> 
> If we're using 1-round precopy + postcopy approach, the amount of memory
> will be the same which is the guest mem size.
> 
> Multifd will make the round shorter so more chance of getting less
> re-dirtied pages during the iteration, but that effect is limited.  E.g.:
> 
>   - For a very idle guest, finishing 1st round in 1min or 3min may not
>     bring a large difference because most of the pages will be constant
>     anyway, or
> 
>   - For a very busy guest, probably similar amount of pages will be dirtied
>     no matter in 1min / 3min.  Multifd will bring a benefit here, but
>     busier the guest smaller the effect.

I don't feel like that follows. If we're bottlenecking mostly on CPU
but have sufficient network bandwidth, then multifd can be the difference
between needing to switch to post-copy and successfully converging
in pre-copy.

IOW, without multifd we can expect 90% of guests to get stuck and need
a switch to post-copy, but with multifd 90% of the guests will complete
while in precopy mode and only 10% will need to switch to post-copy. That's
good because it means most guests will avoid the increased failure risk and
the period of increased page fault latency from post-copy.


> > In terms of migration usage, my personal recommendation to mgmt
> > apps would be that they should always enable the post-copy feature
> > when starting a migration. Even if they expect to try to get it to
> > complete using exclusively pre-copy in the common case, its useful
> > to have post-copy capability flag enabled, as a get out of jail
> > free card. ie if migration ends up getting stuck in non-convergance,
> > or they have a sudden need to urgently complete the migration it is
> > good to be able to flip to post-copy mode.
> 
> I fully agree.
> 
> It should not need to be enabled only if not capable, e.g., the dest host
> may not have privilege to initiate the userfaultfd (since QEMU postcopy
> requires kernel fault traps, so it's very likely).

Sure, the mgmt app (libvirt) should be checking support for userfaultfd
on both sides before permitting / trying to enable the feature.


> > I'd suggest that we instead add a multifd+postcopy test case to
> > migration-test.c and tackle any bugs it exposes. By blocking it
> > unconditionally we ensure no one will exercise it to expose any
> > further bugs.
> 
> That's doable.  But then we'd better also figure out how to identify the
> below two use cases of both features enabled:
> 
>   a. Enable multifd in precopy only, then switch to postcopy (currently
>   mostly working but buggy; I think Juan can provide more information here,
>   at least we need to rework multifd flush when switching, and test and
>   test over to make sure there's nothing else missing).
> 
>   b. Enable multifd in both precopy and postcopy phase (currently
>   definitely not supported)
> 
> So that mgmt app will be aware whether multifd will be enabled in postcopy
> or not.  Currently we can't identify it.
> 
> I assume we can say by default "mutlifd+postcopy" means a) above, but we
> need to document it, and when b) is wanted and implemented someday, we'll
> need some other flag/cap for it.

As I've mentioned a few times, I think we need to throw away the idea
of exposing capabilities that mgmt apps need to learn about, and make
the migration protocol fully bi-directional so src + dst QEMU can
directly negotiate features. Apps shouldn't have to care about the
day-to-day improvements in the migration impl to the extent that they
do today.

With regards,
Daniel
Dr. David Alan Gilbert March 30, 2023, 3:59 p.m. UTC | #6
* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, Mar 30, 2023 at 10:36:11AM -0400, Peter Xu wrote:
> > On Thu, Mar 30, 2023 at 03:20:14PM +0100, Daniel P. Berrangé wrote:
> > > On Mon, Mar 27, 2023 at 01:15:18PM -0300, Leonardo Bras wrote:
> > > > Since the introduction of multifd, it's possible to perform a multifd
> > > > migration and finish it using postcopy.
> > > > 
> > > > A bug introduced by yank (fixed on cfc3bcf373) was previously preventing
> > > > a successful use of this migration scenario, and now it should be
> > > > working on most cases.
> > > > 
> > > > But since there is not enough testing/support nor any reported users for
> > > > this scenario, we should disable this combination before it may cause any
> > > > problems for users.
> > > 
> > > Clearly we don't have enough testing, but multifd+postcopy looks
> > > like a clearly useful scenario that we should be supporting.
> > > 
> > > Every post-copy starts with at least one pre-copy iteration, and
> > > using multifd for that will be important for big VMs where single
> > > threaded pre-copy is going to be CPU bound. The greater amount we
> > > can transfer in the pre-copy phase, the less page faults / latency
> > > spikes postcopy is going to see.
> > 
> > If we're using 1-round precopy + postcopy approach, the amount of memory
> > will be the same which is the guest mem size.
> > 
> > Multifd will make the round shorter so more chance of getting less
> > re-dirtied pages during the iteration, but that effect is limited.  E.g.:
> > 
> >   - For a very idle guest, finishing 1st round in 1min or 3min may not
> >     bring a large difference because most of the pages will be constant
> >     anyway, or
> > 
> >   - For a very busy guest, probably similar amount of pages will be dirtied
> >     no matter in 1min / 3min.  Multifd will bring a benefit here, but
> >     busier the guest smaller the effect.
> 
> I don't feel like that follows. If we're bottlenecking mostly on CPU
> but have sufficient network bandwidth, then multifd can be the difference
> between needing to switch to post-copy or being successful in converging
> in pre-copy.
> 
> IOW, without multifd we can expect 90% of guests will get stuck and need
> a switch to post-copy, but with multifd 90% of the guest will complete
> while in precopy mode and only 10% need switch to post-copy. That's good
> because it means most guests will avoid the increased failure risk and
> the period of increased page fault latency from post-copy.

Agreed, although I think Peter's point was that in the cases where you
know the guests are crazy busy and you're always going to need postcopy,
it's a bit less of an issue.
(But still, getting multiple fds in the postcopy phase is good to
reduce latency).

Dave

> 
> > > In terms of migration usage, my personal recommendation to mgmt
> > > apps would be that they should always enable the post-copy feature
> > > when starting a migration. Even if they expect to try to get it to
> > > complete using exclusively pre-copy in the common case, its useful
> > > to have post-copy capability flag enabled, as a get out of jail
> > > free card. ie if migration ends up getting stuck in non-convergance,
> > > or they have a sudden need to urgently complete the migration it is
> > > good to be able to flip to post-copy mode.
> > 
> > I fully agree.
> > 
> > It should not need to be enabled only if not capable, e.g., the dest host
> > may not have privilege to initiate the userfaultfd (since QEMU postcopy
> > requires kernel fault traps, so it's very likely).
> 
> Sure, the mgmt app (libvirt) should be checking support for userfaultfd
> on both sides before permitting / trying to enable the feature.
> 
> 
> > > I'd suggest that we instead add a multifd+postcopy test case to
> > > migration-test.c and tackle any bugs it exposes. By blocking it
> > > unconditionally we ensure no one will exercise it to expose any
> > > further bugs.
> > 
> > That's doable.  But then we'd better also figure out how to identify the
> > below two use cases of both features enabled:
> > 
> >   a. Enable multifd in precopy only, then switch to postcopy (currently
> >   mostly working but buggy; I think Juan can provide more information here,
> >   at least we need to rework multifd flush when switching, and test and
> >   test over to make sure there's nothing else missing).
> > 
> >   b. Enable multifd in both precopy and postcopy phase (currently
> >   definitely not supported)
> > 
> > So that mgmt app will be aware whether multifd will be enabled in postcopy
> > or not.  Currently we can't identify it.
> > 
> > I assume we can say by default "mutlifd+postcopy" means a) above, but we
> > need to document it, and when b) is wanted and implemented someday, we'll
> > need some other flag/cap for it.
> 
> As I've mentioned a few times, I think we need to throw away the idea
> of exposing capabilities that mgmt apps need to learn about, and make
> the migration protocol fully bi-directional so src + dst QEMU can
> directly negotiate features. Apps shouldn't have to care about the
> day-to-day improvements in the migration impl to the extent that they
> are today.
> 
> With regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>
Peter Xu March 30, 2023, 10:18 p.m. UTC | #7
On Thu, Mar 30, 2023 at 04:59:09PM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > On Thu, Mar 30, 2023 at 10:36:11AM -0400, Peter Xu wrote:
> > > On Thu, Mar 30, 2023 at 03:20:14PM +0100, Daniel P. Berrangé wrote:
> > > > On Mon, Mar 27, 2023 at 01:15:18PM -0300, Leonardo Bras wrote:
> > > > > Since the introduction of multifd, it's possible to perform a multifd
> > > > > migration and finish it using postcopy.
> > > > > 
> > > > > A bug introduced by yank (fixed on cfc3bcf373) was previously preventing
> > > > > a successful use of this migration scenario, and now it should be
> > > > > working on most cases.
> > > > > 
> > > > > But since there is not enough testing/support nor any reported users for
> > > > > this scenario, we should disable this combination before it may cause any
> > > > > problems for users.
> > > > 
> > > > Clearly we don't have enough testing, but multifd+postcopy looks
> > > > like a clearly useful scenario that we should be supporting.
> > > > 
> > > > Every post-copy starts with at least one pre-copy iteration, and
> > > > using multifd for that will be important for big VMs where single
> > > > threaded pre-copy is going to be CPU bound. The greater amount we
> > > > can transfer in the pre-copy phase, the less page faults / latency
> > > > spikes postcopy is going to see.
> > > 
> > > If we're using 1-round precopy + postcopy approach, the amount of memory
> > > will be the same which is the guest mem size.
> > > 
> > > Multifd will make the round shorter so more chance of getting less
> > > re-dirtied pages during the iteration, but that effect is limited.  E.g.:
> > > 
> > >   - For a very idle guest, finishing 1st round in 1min or 3min may not
> > >     bring a large difference because most of the pages will be constant
> > >     anyway, or
> > > 
> > >   - For a very busy guest, probably similar amount of pages will be dirtied
> > >     no matter in 1min / 3min.  Multifd will bring a benefit here, but
> > >     busier the guest smaller the effect.
> > 
> > I don't feel like that follows. If we're bottlenecking mostly on CPU
> > but have sufficient network bandwidth, then multifd can be the difference
> > between needing to switch to post-copy or being successful in converging
> > in pre-copy.
> > 
> > IOW, without multifd we can expect 90% of guests will get stuck and need
> > a switch to post-copy, but with multifd 90% of the guest will complete
> > while in precopy mode and only 10% need switch to post-copy. That's good
> > because it means most guests will avoid the increased failure risk and
> > the period of increased page fault latency from post-copy.

Makes sense.  But we may need someone to look after that.  I am aware
that Juan used to plan to work in this area.  Juan, have you started
looking into fixing multifd + postcopy (for the current phase, not for
complete support)?  If we're confident and think resolving it is easy,
then I think it'll be worthwhile, and this patch may not be needed.

We should always keep in mind, though, that currently the user can suffer
from weird errors or crashes when using them together, and that's the major
reason Leonardo proposed this patch - we either fix things soon or we
disable them, which also makes sense to me.

I think time has somehow proved that it's non-trivial to fix them soon,
hence this patch.  I'll be 100% more than happy when patches come along
to prove me wrong and fix things up (along with a multifd+postcopy qtest).

> 
> Agreed, although I think Peter's point was that in the cases where you
> know the guests are crazy busy and you're always going to need postcopy,
> it's a bit less of an issue.
> (But still, getting multiple fd's in the postcopy phase is good to
> reduce latency).

Yes, that'll be another story though, IMHO.

When talking about this, I'd guess it'll be easier (and much less code) to
just spawn more preempt threads rather than multifd ones: some of them can
service page faults only, while others just keep dumping pages concurrently
with the migration thread.

It should be easy because all preempt threads on the dest buffer their
reads, and they'll be as simple as a wrapper around ram_load_postcopy().
I think it could naturally just work, but I'll need to check when we think
about it more seriously.
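
As a purely illustrative sketch of that "wrapper of ram_load_postcopy()"
idea: everything below except ram_load_postcopy() itself is invented for
illustration, and even its exact signature and visibility (it currently
lives as a static function in migration/ram.c and takes a channel id)
would need checking:

/*
 * Hypothetical extra dest-side load thread: each one just keeps applying
 * incoming RAM pages from its own channel, concurrently with the others.
 */
static void *extra_postcopy_load_thread(void *opaque)
{
    QEMUFile *f = opaque;   /* one incoming channel per thread (assumed) */
    int ret;

    rcu_register_thread();
    ret = ram_load_postcopy(f /*, channel id in current QEMU */);
    if (ret < 0) {
        error_report("extra postcopy load thread failed: %d", ret);
    }
    rcu_unregister_thread();
    return NULL;
}

/* Spawning is then one qemu_thread_create() per extra incoming channel
 * (thread handles would need keeping somewhere for cleanup/join): */
static void spawn_extra_postcopy_load_threads(QEMUFile **channels, int n)
{
    for (int i = 0; i < n; i++) {
        QemuThread t;
        qemu_thread_create(&t, "pc-extra-load", extra_postcopy_load_thread,
                           channels[i], QEMU_THREAD_JOINABLE);
    }
}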

> 
> Dave
> 
> > 
> > > > In terms of migration usage, my personal recommendation to mgmt
> > > > apps would be that they should always enable the post-copy feature
> > > > when starting a migration. Even if they expect to try to get it to
> > > > complete using exclusively pre-copy in the common case, its useful
> > > > to have post-copy capability flag enabled, as a get out of jail
> > > > free card. ie if migration ends up getting stuck in non-convergance,
> > > > or they have a sudden need to urgently complete the migration it is
> > > > good to be able to flip to post-copy mode.
> > > 
> > > I fully agree.
> > > 
> > > It should not need to be enabled only if not capable, e.g., the dest host
> > > may not have privilege to initiate the userfaultfd (since QEMU postcopy
> > > requires kernel fault traps, so it's very likely).
> > 
> > Sure, the mgmt app (libvirt) should be checking support for userfaultfd
> > on both sides before permitting / trying to enable the feature.
> > 
> > 
> > > > I'd suggest that we instead add a multifd+postcopy test case to
> > > > migration-test.c and tackle any bugs it exposes. By blocking it
> > > > unconditionally we ensure no one will exercise it to expose any
> > > > further bugs.
> > > 
> > > That's doable.  But then we'd better also figure out how to identify the
> > > below two use cases of both features enabled:
> > > 
> > >   a. Enable multifd in precopy only, then switch to postcopy (currently
> > >   mostly working but buggy; I think Juan can provide more information here,
> > >   at least we need to rework multifd flush when switching, and test and
> > >   test over to make sure there's nothing else missing).
> > > 
> > >   b. Enable multifd in both precopy and postcopy phase (currently
> > >   definitely not supported)
> > > 
> > > So that mgmt app will be aware whether multifd will be enabled in postcopy
> > > or not.  Currently we can't identify it.
> > > 
> > > I assume we can say by default "mutlifd+postcopy" means a) above, but we
> > > need to document it, and when b) is wanted and implemented someday, we'll
> > > need some other flag/cap for it.
> > 
> > As I've mentioned a few times, I think we need to throw away the idea
> > of exposing capabilities that mgmt apps need to learn about, and make
> > the migration protocol fully bi-directional so src + dst QEMU can
> > directly negotiate features. Apps shouldn't have to care about the
> > day-to-day improvements in the migration impl to the extent that they
> > are today.

I agree that setting the same caps on both sides is ugly, but shouldn't
this be a separate problem?  And we should still allow the user to choose
(whether to apply that to the src only, or to both sides).

To be explicit, I am thinking that even if full multifd+postcopy support
is implemented, for some reason the user may still want to use multifd
only during precopy but not postcopy.  I'm afraid automatically choosing
whatever is the latest supported behaviour may not always work for the
user, for whatever reasons, and could have other implications here.

In short, IMHO it's an ABI breakage if a user enables both features and
the combination then behaves differently after upgrading to a QEMU with
full multifd+postcopy support added.

Thanks,
Leonardo Bras March 31, 2023, 5:51 a.m. UTC | #8
On Thu, 2023-03-30 at 18:18 -0400, Peter Xu wrote:
> On Thu, Mar 30, 2023 at 04:59:09PM +0100, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > On Thu, Mar 30, 2023 at 10:36:11AM -0400, Peter Xu wrote:
> > > > On Thu, Mar 30, 2023 at 03:20:14PM +0100, Daniel P. Berrangé wrote:
> > > > > On Mon, Mar 27, 2023 at 01:15:18PM -0300, Leonardo Bras wrote:
> > > > > > Since the introduction of multifd, it's possible to perform a multifd
> > > > > > migration and finish it using postcopy.
> > > > > > 
> > > > > > A bug introduced by yank (fixed on cfc3bcf373) was previously preventing
> > > > > > a successful use of this migration scenario, and now it should be
> > > > > > working on most cases.
> > > > > > 
> > > > > > But since there is not enough testing/support nor any reported users for
> > > > > > this scenario, we should disable this combination before it may cause any
> > > > > > problems for users.
> > > > > 
> > > > > Clearly we don't have enough testing, but multifd+postcopy looks
> > > > > like a clearly useful scenario that we should be supporting.
> > > > > 
> > > > > Every post-copy starts with at least one pre-copy iteration, and
> > > > > using multifd for that will be important for big VMs where single
> > > > > threaded pre-copy is going to be CPU bound. The greater amount we
> > > > > can transfer in the pre-copy phase, the less page faults / latency
> > > > > spikes postcopy is going to see.

I totally agree. It's a valid and useful scenario, and I am far from denying
this.

The thing is: postcopy + multifd has been 'available' for a while, but 100%
broken at least since the introduction of Yank (this was fixed in cfc3bcf373).
And even this 100%-incidence bug was only noticed by an avid tester, meaning
no one was actually triggering postcopy after a multifd migration.

The proposal of this patchset is to avoid unaware users ending up losing
their VM due to a possible bug that may happen while using this configuration.

> > > > 
> > > > If we're using 1-round precopy + postcopy approach, the amount of memory
> > > > will be the same which is the guest mem size.
> > > > 
> > > > Multifd will make the round shorter so more chance of getting less
> > > > re-dirtied pages during the iteration, but that effect is limited.  E.g.:
> > > > 
> > > >   - For a very idle guest, finishing 1st round in 1min or 3min may not
> > > >     bring a large difference because most of the pages will be constant
> > > >     anyway, or
> > > > 
> > > >   - For a very busy guest, probably similar amount of pages will be dirtied
> > > >     no matter in 1min / 3min.  Multifd will bring a benefit here, but
> > > >     busier the guest smaller the effect.
> > > 
> > > I don't feel like that follows. If we're bottlenecking mostly on CPU
> > > but have sufficient network bandwidth, then multifd can be the difference
> > > between needing to switch to post-copy or being successful in converging
> > > in pre-copy.
> > > 
> > > IOW, without multifd we can expect 90% of guests will get stuck and need
> > > a switch to post-copy, but with multifd 90% of the guest will complete
> > > while in precopy mode and only 10% need switch to post-copy. That's good
> > > because it means most guests will avoid the increased failure risk and
> > > the period of increased page fault latency from post-copy.
> 
> Makes sense.  But we may need someone to look after that, though.  I am
> aware that Juan used to plan doing work in this area.  Juan, have you
> started looking into fixing multifd + postcopy (for current phase, not for
> a complete support)?  If we're confident and think resolving it is easy
> then I think it'll be worthwhile, and this patch may not be needed.
> 
> We should always keep in mind though that currently the user can suffer
> from weird errors or crashes when using them together, and that's the major
> reason Leonardo proposed this patch - we either fix things soon or we
> disable them, which also makes sense to me.
> 
> I think time somehow proved that it's non-trivial to fix them soon, hence
> this patch.  I'll be 100% more than happy when patches coming to prove me
> wrong to fix things up (along with a multifd+postcopy qtest).

Yes, exactly.

I am not sure how bugfixes are provided upstream to older versions, but this
issue has been present since QEMU 6.0.0. Are we providing a fix for those
versions?

If so, it's probably easier to backport this patch disabling the combination
than to do a full backport of the fix.


> 
> > 
> > Agreed, although I think Peter's point was that in the cases where you
> > know the guests are crazy busy and you're always going to need postcopy,
> > it's a bit less of an issue.
> > (But still, getting multiple fd's in the postcopy phase is good to
> > reduce latency).
> 
> Yes, that'll be another story though, IMHO.
> 
> When talking about this, I'd guess it'll be easier (and much less code) to
> just spawn more preempt threads than multifd ones: some of them can service
> page faults only, but some of them just keeps dumping concurrently with the
> migration thread.
> 
> It should be easy because all preempt threads on dest buffers the reads and
> they'll be as simple as a wrapper of ram_load_postcopy().  I think it could
> naturally just work, but I'll need to check when we think it more
> seriously.
> 
> > 
> > Dave
> > 
> > > 
> > > > > In terms of migration usage, my personal recommendation to mgmt
> > > > > apps would be that they should always enable the post-copy feature
> > > > > when starting a migration. Even if they expect to try to get it to
> > > > > complete using exclusively pre-copy in the common case, its useful
> > > > > to have post-copy capability flag enabled, as a get out of jail
> > > > > free card. ie if migration ends up getting stuck in non-convergance,
> > > > > or they have a sudden need to urgently complete the migration it is
> > > > > good to be able to flip to post-copy mode.
> > > > 
> > > > I fully agree.

Yes, I also agree with this. Nice analogy with the "get out of jail free
card" :)
It's great to have multifd with the postcopy option, once it is working fine.

Imagine this same scenario: multifd migration not converging, so trigger
postcopy: the VM crashes.
It feels much worse than telling the user this option is not supported.

> > > > 
> > > > It should not need to be enabled only if not capable, e.g., the dest host
> > > > may not have privilege to initiate the userfaultfd (since QEMU postcopy
> > > > requires kernel fault traps, so it's very likely).
> > > 
> > > Sure, the mgmt app (libvirt) should be checking support for userfaultfd
> > > on both sides before permitting / trying to enable the feature.

+1

> > > 
> > > 
> > > > > I'd suggest that we instead add a multifd+postcopy test case to
> > > > > migration-test.c and tackle any bugs it exposes. By blocking it
> > > > > unconditionally we ensure no one will exercise it to expose any
> > > > > further bugs.

We can revert the patch as soon as it is working nicely. This may take some
time, though. That's why we want to keep it safe for the time being.

> > > > 
> > > > That's doable.  But then we'd better also figure out how to identify the
> > > > below two use cases of both features enabled:
> > > > 
> > > >   a. Enable multifd in precopy only, then switch to postcopy (currently
> > > >   mostly working but buggy; I think Juan can provide more information here,
> > > >   at least we need to rework multifd flush when switching, and test and
> > > >   test over to make sure there's nothing else missing).
> > > > 
> > > >   b. Enable multifd in both precopy and postcopy phase (currently
> > > >   definitely not supported)
> > > > 
> > > > So that mgmt app will be aware whether multifd will be enabled in postcopy
> > > > or not.  Currently we can't identify it.
> > > > 
> > > > I assume we can say by default "mutlifd+postcopy" means a) above, but we
> > > > need to document it, and when b) is wanted and implemented someday, we'll
> > > > need some other flag/cap for it.
> > > 
> > > As I've mentioned a few times, I think we need to throw away the idea
> > > of exposing capabilities that mgmt apps need to learn about, and make
> > > the migration protocol fully bi-directional so src + dst QEMU can
> > > directly negotiate features. Apps shouldn't have to care about the
> > > day-to-day improvements in the migration impl to the extent that they
> > > are today.
> 
> I agree that setting the same caps on both sides are ugly, but shouldn't
> this a separate problem, and we should allow the user to choose (no matter
> to apply that to src only, or to both sides)?

Agreed that setting stuff only once is much nicer than setting it twice, and
it could have avoided some config issues I had in the past.

> 
> To be explicit, I am thinking maybe even if multifd+postcopy full support
> will be implemented, for some reason the user still wants to only use
> multifd during precopy but not postcopy.  I'm afraid automatically choose
> what's the latest supported may not always work for the user for whatever
> reasons and could have other implications here.

I agree the user should be the one choosing what to use, but having both
sides' QEMU negotiate this without involving the other side's management app
could be nice. A failure to set something would then show up on the
requesting side, as something like "Remote host doesn't support TLS".

> 
> In short, IMHO it's an ABI breakage if user enabled both features, then it
> behaves differently after upgrading QEMU with a full multifd+postcopy
> support added.
> 
> Thanks,
> 


Thank you all for the comments!

Leo

Patch

diff --git a/migration/migration.c b/migration/migration.c
index ae2025d9d8..c601964b0e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1356,6 +1356,11 @@  static bool migrate_caps_check(bool *cap_list,
             error_setg(errp, "Postcopy is not compatible with ignore-shared");
             return false;
         }
+
+        if (cap_list[MIGRATION_CAPABILITY_MULTIFD]) {
+            error_setg(errp, "Postcopy is not yet compatible with multifd");
+            return false;
+        }
     }
 
     if (cap_list[MIGRATION_CAPABILITY_BACKGROUND_SNAPSHOT]) {