[RFC,V1,00/14] precreate phase

Message ID	1729178055-207271-1-git-send-email-steven.sistare@oracle.com (mailing list archive)
Headers	show Return-Path: <qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org> From: Steve Sistare <steven.sistare@oracle.com> To: qemu-devel@nongnu.org Cc: Peter Xu <peterx@redhat.com>, Fabiano Rosas <farosas@suse.de>, David Hildenbrand <david@redhat.com>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Eduardo Habkost <eduardo@habkost.net>, Philippe Mathieu-Daude <philmd@linaro.org>, Paolo Bonzini <pbonzini@redhat.com>, "Daniel P. Berrange" <berrange@redhat.com>, Markus Armbruster <armbru@redhat.com>, Steve Sistare <steven.sistare@oracle.com> Subject: [RFC V1 00/14] precreate phase Date: Thu, 17 Oct 2024 08:14:01 -0700 Message-Id: <1729178055-207271-1-git-send-email-steven.sistare@oracle.com> Received-SPF: pass client-ip=205.220.165.32; envelope-from=steven.sistare@oracle.com; helo=mx0a-00069f02.pphosted.com X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action Precedence: list Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org
Series	precreate phase \| expand [RFC,V1,00/14] precreate phase [RFC,V1,01/14] accel: encapsulate search state [RFC,V1,02/14] accel: accel preinit function [RFC,V1,03/14] accel: split configure_accelerators [RFC,V1,04/14] accel: set accelerator and machine props earlier [RFC,V1,05/14] migration: init and listen during precreate [RFC,V1,06/14] vl: precreate phase [RFC,V1,07/14] monitor: chardev name [RFC,V1,08/14] qom: get properties [RFC,V1,09/14] qemu-option: filtered foreach [RFC,V1,10/14] qemu-options: pass object to filter [RFC,V1,11/14] monitor: connect in precreate [RFC,V1,12/14] qtest: connect in precreate [RFC,V1,13/14] net: cleanup for precreate phase [RFC,V1,14/14] migration: allow commands during precreate and preconfig

Steven Sistare Oct. 17, 2024, 3:14 p.m. UTC

Define a new qemu initialization phase called 'precreate' which occurs
before most backends or devices have been created.  The only exception
is monitor and qtest devices and their associated chardevs.

QEMU runs in the main loop during this phase.  Monitor connections are
active and can receive migration configuration commands.  QEMU starts
listening on the normal migration URI during this phase, which can come
from either the QEMU command line or from a migrate_incoming command.
Thus the user can issue query-migrate to get the socket-address for
dynamically allocated port numbers during precreate.

In this series QEMU passes through and does not linger in the precreate
phase, and the user sees no change in behavior.  The cpr-transfer series
will linger in the phase for an incoming CPR operation, and exit the phase
when the migrate command is send to source QEMU and causes destination QEMU
to read CPR state.

A future series may wish to add a command-line argument and monitor
command that enters and exits the precreate phase for general purpose use.
That would enable the user to issue monitor commands that specify backend
creation parameters.

Now that both the monitor and migration are initialized early, one could
implement a future scheme in which very few parameters are passed on the
command line, and all backend and device creation parameters are passed
in the migration stream.

Patches 1-5 untangle dependencies between accelerator and migration
initialization, allowing the latter to be called earlier.  Currently
configure_accelerators chooses the accelerator, defines compatability
properties needed to initialize migration, and creates the machine in
accel_init_machine.  The patches split configure_accelerators into a first
part which probes for accelerators and defines properties, and a second
part which creates the machine.  The second part is called after the
precreate phase.  The patches also set machine options earlier, so they
are known when migration is initialized.

Patch 6 defines the boundaries of the precreate phase.

Patches 7-12 define and use helper functions to extract monitor and qtest
options from the command line, find their associated chardevs, and initialize
the monitor and qtest devices early, prior to qemu_create_early_backends.

Patches 13-14 contain miscellany.

Steve Sistare (14):
  accel: encapsulate search state
  accel: accel preinit function
  accel: split configure_accelerators
  accel: set accelerator and machine props earlier
  migration: init and listen during precreate
  vl: precreate phase
  monitor: chardev name
  qom: get properties
  qemu-option: filtered foreach
  qemu-options: pass object to filter
  monitor: connect in precreate
  qtest: connect in precreate
  net: cleanup for precreate phase
  migration: allow commands during precreate and preconfig

 accel/accel-system.c            |   2 -
 accel/kvm/kvm-all.c             |  58 +++++----
 accel/xen/xen-all.c             |  11 +-
 hmp-commands.hx                 |   2 +
 include/monitor/monitor.h       |   1 +
 include/qapi/visitor.h          |   1 +
 include/qemu/accel.h            |   1 +
 include/qemu/option.h           |   5 +
 include/qom/object_interfaces.h |   8 ++
 monitor/monitor.c               |  21 ++++
 net/net.c                       |   4 +-
 qapi/migration.json             |  16 ++-
 qapi/misc.json                  |   3 +-
 qom/object_interfaces.c         |  27 ++--
 system/vl.c                     | 270 ++++++++++++++++++++++++++++++----------
 target/i386/nvmm/nvmm-all.c     |  10 +-
 target/i386/whpx/whpx-all.c     |  14 ++-
 util/qemu-option.c              |  25 ++++
 18 files changed, 358 insertions(+), 121 deletions(-)

Steven Sistare Oct. 17, 2024, 3:19 p.m. UTC | #1

On 10/17/2024 11:14 AM, Steve Sistare wrote:
> Define a new qemu initialization phase called 'precreate' which occurs
> before most backends or devices have been created.  The only exception
> is monitor and qtest devices and their associated chardevs.
> 
> QEMU runs in the main loop during this phase.  Monitor connections are
> active and can receive migration configuration commands.  QEMU starts
> listening on the normal migration URI during this phase, which can come
> from either the QEMU command line or from a migrate_incoming command.
> Thus the user can issue query-migrate to get the socket-address for
> dynamically allocated port numbers during precreate.
> 
> In this series QEMU passes through and does not linger in the precreate
> phase, and the user sees no change in behavior.  The cpr-transfer series
> will linger in the phase for an incoming CPR operation, and exit the phase
> when the migrate command is send to source QEMU and causes destination QEMU
> to read CPR state.

Hi Peter, I rebased the cpr-transfer series on precreate.  The
cpr-transfer migration-test now works.  Do you want to see cpr-transfer
V3 now, or wait until we get feedback on precreate?  The only significant
change is that I deleted the HUP synchronization, and I post an async
listen for the incoming cpr-uri connection.

- Steve

Peter Xu Oct. 17, 2024, 3:53 p.m. UTC | #2

On Thu, Oct 17, 2024 at 11:19:51AM -0400, Steven Sistare wrote:
> On 10/17/2024 11:14 AM, Steve Sistare wrote:
> > Define a new qemu initialization phase called 'precreate' which occurs
> > before most backends or devices have been created.  The only exception
> > is monitor and qtest devices and their associated chardevs.
> > 
> > QEMU runs in the main loop during this phase.  Monitor connections are
> > active and can receive migration configuration commands.  QEMU starts
> > listening on the normal migration URI during this phase, which can come
> > from either the QEMU command line or from a migrate_incoming command.
> > Thus the user can issue query-migrate to get the socket-address for
> > dynamically allocated port numbers during precreate.
> > 
> > In this series QEMU passes through and does not linger in the precreate
> > phase, and the user sees no change in behavior.  The cpr-transfer series
> > will linger in the phase for an incoming CPR operation, and exit the phase
> > when the migrate command is send to source QEMU and causes destination QEMU
> > to read CPR state.
> 
> Hi Peter, I rebased the cpr-transfer series on precreate.  The
> cpr-transfer migration-test now works.  Do you want to see cpr-transfer
> V3 now, or wait until we get feedback on precreate?  The only significant
> change is that I deleted the HUP synchronization, and I post an async
> listen for the incoming cpr-uri connection.

Maybe you can still send it, because I remember there're some other
discussion that may not settled yet (e.g. how anon-memfd is applied, iirc),
then it can be reviewed and discussed concurrently when proper with the
precreate series.

Thanks,

Steven Sistare Oct. 21, 2024, 3:56 p.m. UTC | #3

On 10/17/2024 11:53 AM, Peter Xu wrote:
> On Thu, Oct 17, 2024 at 11:19:51AM -0400, Steven Sistare wrote:
>> On 10/17/2024 11:14 AM, Steve Sistare wrote:
>>> Define a new qemu initialization phase called 'precreate' which occurs
>>> before most backends or devices have been created.  The only exception
>>> is monitor and qtest devices and their associated chardevs.
>>>
>>> QEMU runs in the main loop during this phase.  Monitor connections are
>>> active and can receive migration configuration commands.  QEMU starts
>>> listening on the normal migration URI during this phase, which can come
>>> from either the QEMU command line or from a migrate_incoming command.
>>> Thus the user can issue query-migrate to get the socket-address for
>>> dynamically allocated port numbers during precreate.
>>>
>>> In this series QEMU passes through and does not linger in the precreate
>>> phase, and the user sees no change in behavior.  The cpr-transfer series
>>> will linger in the phase for an incoming CPR operation, and exit the phase
>>> when the migrate command is send to source QEMU and causes destination QEMU
>>> to read CPR state.
>>
>> Hi Peter, I rebased the cpr-transfer series on precreate.  The
>> cpr-transfer migration-test now works.  Do you want to see cpr-transfer
>> V3 now, or wait until we get feedback on precreate?  The only significant
>> change is that I deleted the HUP synchronization, and I post an async
>> listen for the incoming cpr-uri connection.
> 
> Maybe you can still send it, because I remember there're some other
> discussion that may not settled yet (e.g. how anon-memfd is applied, iirc),
> then it can be reviewed and discussed concurrently when proper with the
> precreate series.

For now I'll reply to the open email threads for cpr-transfer V2.

- Steve

Paolo Bonzini Oct. 23, 2024, 4:31 p.m. UTC | #4

On 10/17/24 17:14, Steve Sistare wrote:
> Define a new qemu initialization phase called 'precreate' which occurs
> before most backends or devices have been created.  The only exception
> is monitor and qtest devices and their associated chardevs.
> 
> QEMU runs in the main loop during this phase.  Monitor connections are
> active and can receive migration configuration commands.  QEMU starts
> listening on the normal migration URI during this phase, which can come
> from either the QEMU command line or from a migrate_incoming command.
> Thus the user can issue query-migrate to get the socket-address for
> dynamically allocated port numbers during precreate.
> 
> In this series QEMU passes through and does not linger in the precreate
> phase, and the user sees no change in behavior.  The cpr-transfer series
> will linger in the phase for an incoming CPR operation, and exit the phase
> when the migrate command is send to source QEMU and causes destination QEMU
> to read CPR state.
> 
> A future series may wish to add a command-line argument and monitor
> command that enters and exits the precreate phase for general purpose use.
> That would enable the user to issue monitor commands that specify backend
> creation parameters.

I have a problem with the concept, which is that the split between 
command line and monitor is much harder to understand.

Ideally, one would come entirely before the other; preconfig is already 
ugly in how -device is processed later than everything else[1].  This 
series makes this much more complex.

If I understand correctly, what you want is effectively to execute 
monitor commands from the migration stream.  If you want to do that, we 
need to take more qemu_init code and turn it into QMP commands, so that 
precreate can exit qemu_init very early and the "after precreate" 
command line options can be replaced by interactions on the monitor.

This is the idea that drove the creation of -M smp.xxx, -M boot.xxx, -M 
memory.xxx, etc. (see for example commit 8c4da4b5218, "machine: add boot 
compound property", 2022-05-12).  For example -semihosting-config, 
-acpitable, -smbios, -fw_cfg, -option-rom, -rtc could all become -M 
properties and handled by a single monitor command machine-set.

Of all the other options, -accel, -cpu and -display are the main missing 
ones (-cpu is the very hard one); a full list is at 
https://wiki.qemu.org/User:Paolo_Bonzini/Machine_init_sequence#QMP_configuration_flow.

Anyhow, at this point all that's needed is a -chardev/-mon pair (and I 
guess -incoming defer) in order to bootstrap the monitor in precreate mode.

It's okay to prototype without full support for the options I've listed, 
but if we want to go with precreate we should make most command line 
options entirely incompatible with it, and also make it imply -nodefaults.

Paolo

[1] -loadvm and -incoming too; but for those two we could make their 
monitor commands exit preconfig mode automatically, and invoke them from 
the monitor instead of specifying them on the command line.

Steven Sistare Oct. 24, 2024, 9:16 p.m. UTC | #5

On 10/23/2024 12:31 PM, Paolo Bonzini wrote:
> On 10/17/24 17:14, Steve Sistare wrote:
>> Define a new qemu initialization phase called 'precreate' which occurs
>> before most backends or devices have been created.  The only exception
>> is monitor and qtest devices and their associated chardevs.
>>
>> QEMU runs in the main loop during this phase.  Monitor connections are
>> active and can receive migration configuration commands.  QEMU starts
>> listening on the normal migration URI during this phase, which can come
>> from either the QEMU command line or from a migrate_incoming command.
>> Thus the user can issue query-migrate to get the socket-address for
>> dynamically allocated port numbers during precreate.
>>
>> In this series QEMU passes through and does not linger in the precreate
>> phase, and the user sees no change in behavior.  The cpr-transfer series
>> will linger in the phase for an incoming CPR operation, and exit the phase
>> when the migrate command is send to source QEMU and causes destination QEMU
>> to read CPR state.
>>
>> A future series may wish to add a command-line argument and monitor
>> command that enters and exits the precreate phase for general purpose use.
>> That would enable the user to issue monitor commands that specify backend
>> creation parameters.
> 
> I have a problem with the concept, which is that the split between command line and monitor is much harder to understand.
> 
> Ideally, one would come entirely before the other; preconfig is already ugly in how -device is processed later than everything else[1].  This series makes this much more complex.
> 
> If I understand correctly, what you want is effectively to execute monitor commands from the migration stream.  If you want to do that, we need to take more qemu_init code and turn it into QMP commands, so that precreate can exit qemu_init very early and the "after precreate" command line options can be replaced by interactions on the monitor.
> 
> This is the idea that drove the creation of -M smp.xxx, -M boot.xxx, -M memory.xxx, etc. (see for example commit 8c4da4b5218, "machine: add boot compound property", 2022-05-12).  For example -semihosting-config, -acpitable, -smbios, -fw_cfg, -option-rom, -rtc could all become -M properties and handled by a single monitor command machine-set.
> 
> Of all the other options, -accel, -cpu and -display are the main missing ones (-cpu is the very hard one); a full list is at https://wiki.qemu.org/User:Paolo_Bonzini/Machine_init_sequence#QMP_configuration_flow.
> 
> Anyhow, at this point all that's needed is a -chardev/-mon pair (and I guess -incoming defer) in order to bootstrap the monitor in precreate mode.
> 
> It's okay to prototype without full support for the options I've listed, but if we want to go with precreate we should make most command line options entirely incompatible with it, and also make it imply -nodefaults.
> 
> Paolo
> 
> [1] -loadvm and -incoming too; but for those two we could make their monitor commands exit preconfig mode automatically, and invoke them from the monitor instead of specifying them on the command line.

Regarding: "what you want is effectively to execute monitor commands
from the migration stream"

That is not the goal of this series.  It could be someone else's goal, when
fully developing a precreate phase, and in that context I understand and
agree with your comments.  I have a narrower immediate problem to solve,
however.

For CPR, src qemu sends file descriptors to dst qemu using SCM_RIGHTS over
a dedicated channel, then src qemu sends migration state over the normal
migration channel.

Dst qemu reads the fds early, then calls the backend and device creation
functions which use them.  Dst qemu then accepts and reads the migration
channel.

We need a way to send monitor commands that set dst migration capabilities,
before src qemu starts the migration.  Hence the dst cannot proceed to
backend and device creation because the src has not sent fd's yet.  Hence
we need a dst monitor before device creation.  The precreate phase does that.

Regarding: "This series makes this much more complex."

I could simplify it if I abandon CPR for chardevs.  Then qemu_create_early_backends
and other early dependencies can remain as is.  I would drop the notion of
a precreate phase, and instead leverage the preconfig phase.  I would move
qemu_create_late_backends, and a small part at the end of qemu_init, to
qmp_x_exit_preconfig.

These patches from the series (slightly renamed) would suffice.
The "move preconfig boundary" patch is new.

   * migration: init and listen during preconfig
   * vl: move preconfig boundary
   * monitor: connect in preconfig
   * net: cleanup for preconfig phase
   * migration: allow commands during preconfig

Would you consider supporting that?

- Steve

Daniel P. Berrangé Oct. 25, 2024, 8:46 a.m. UTC | #6

On Thu, Oct 24, 2024 at 05:16:14PM -0400, Steven Sistare wrote:
> 
> Regarding: "what you want is effectively to execute monitor commands
> from the migration stream"
> 
> That is not the goal of this series.  It could be someone else's goal, when
> fully developing a precreate phase, and in that context I understand and
> agree with your comments.  I have a narrower immediate problem to solve,
> however.
> 
> For CPR, src qemu sends file descriptors to dst qemu using SCM_RIGHTS over
> a dedicated channel, then src qemu sends migration state over the normal
> migration channel.
> 
> Dst qemu reads the fds early, then calls the backend and device creation
> functions which use them.  Dst qemu then accepts and reads the migration
> channel.
> 
> We need a way to send monitor commands that set dst migration capabilities,
> before src qemu starts the migration.  Hence the dst cannot proceed to
> backend and device creation because the src has not sent fd's yet.  Hence
> we need a dst monitor before device creation.  The precreate phase does that.

Sigh, what we obviously need here, is what we've always talked about as our
long term design goal:

A way to launch QEMU with the CLI only specifying the QMP socket, and every
other config aspect done by issuing QMP commands, which are processed in the
order the mgmt app sends them, so QEMU hasn't have to hardcode processing
of different pieces in different phases.

Anything that isn't that, is piling more hacks on top of our existing
mountain of hacks. That's OK if it does something useful as a side effect
that moves us incrementally closer towards that desired end goal.

> Regarding: "This series makes this much more complex."
> 
> I could simplify it if I abandon CPR for chardevs.  Then qemu_create_early_backends
> and other early dependencies can remain as is.  I would drop the notion of
> a precreate phase, and instead leverage the preconfig phase.  I would move
> qemu_create_late_backends, and a small part at the end of qemu_init, to
> qmp_x_exit_preconfig.

Is CPR still going to useful enough in the real world if you drop chardev
support ? Every VM has at least one chardev for a serial device doesn't
it, and often more since we wire chardevs into all kinds of places.

With regards,
Daniel

Steven Sistare Oct. 25, 2024, 1:33 p.m. UTC | #7

On 10/25/2024 4:46 AM, Daniel P. Berrangé wrote:
> On Thu, Oct 24, 2024 at 05:16:14PM -0400, Steven Sistare wrote:
>>
>> Regarding: "what you want is effectively to execute monitor commands
>> from the migration stream"
>>
>> That is not the goal of this series.  It could be someone else's goal, when
>> fully developing a precreate phase, and in that context I understand and
>> agree with your comments.  I have a narrower immediate problem to solve,
>> however.
>>
>> For CPR, src qemu sends file descriptors to dst qemu using SCM_RIGHTS over
>> a dedicated channel, then src qemu sends migration state over the normal
>> migration channel.
>>
>> Dst qemu reads the fds early, then calls the backend and device creation
>> functions which use them.  Dst qemu then accepts and reads the migration
>> channel.
>>
>> We need a way to send monitor commands that set dst migration capabilities,
>> before src qemu starts the migration.  Hence the dst cannot proceed to
>> backend and device creation because the src has not sent fd's yet.  Hence
>> we need a dst monitor before device creation.  The precreate phase does that.
> 
> Sigh, what we obviously need here, is what we've always talked about as our
> long term design goal:
> 
> A way to launch QEMU with the CLI only specifying the QMP socket, and every
> other config aspect done by issuing QMP commands, which are processed in the
> order the mgmt app sends them, so QEMU hasn't have to hardcode processing
> of different pieces in different phases.
> 
> Anything that isn't that, is piling more hacks on top of our existing
> mountain of hacks. That's OK if it does something useful as a side effect
> that moves us incrementally closer towards that desired end goal.
> 
>> Regarding: "This series makes this much more complex."
>>
>> I could simplify it if I abandon CPR for chardevs.  Then qemu_create_early_backends
>> and other early dependencies can remain as is.  I would drop the notion of
>> a precreate phase, and instead leverage the preconfig phase.  I would move
>> qemu_create_late_backends, and a small part at the end of qemu_init, to
>> qmp_x_exit_preconfig.
> 
> Is CPR still going to useful enough in the real world if you drop chardev
> support ? Every VM has at least one chardev for a serial device doesn't
> it, and often more since we wire chardevs into all kinds of places.

CPR for chardev is not as useful for cpr-transfer mode because the mgmt layer already
knows how to create and manage new connections to dest qemu, as it would for normal
migration.

CPR for chardev is very useful for cpr-exec mode.  And cpr-exec mode does not need any
of these monitor patches, because old qemu exec's new qemu, and they are never active
at the same time.  One must completely specify the migration using src qemu before
initiating the exec.  I mourn cpr-exec mode.

Which begs the question, do we really need to allow migration parameters to be set
in the dest monitor when using cpr?  CPR is a very restricted mode of migration.
Let me discuss this with Peter.

- Steve

Daniel P. Berrangé Oct. 25, 2024, 1:43 p.m. UTC | #8

On Fri, Oct 25, 2024 at 09:33:51AM -0400, Steven Sistare wrote:
> On 10/25/2024 4:46 AM, Daniel P. Berrangé wrote:
> > On Thu, Oct 24, 2024 at 05:16:14PM -0400, Steven Sistare wrote:
> > > 
> > > Regarding: "what you want is effectively to execute monitor commands
> > > from the migration stream"
> > > 
> > > That is not the goal of this series.  It could be someone else's goal, when
> > > fully developing a precreate phase, and in that context I understand and
> > > agree with your comments.  I have a narrower immediate problem to solve,
> > > however.
> > > 
> > > For CPR, src qemu sends file descriptors to dst qemu using SCM_RIGHTS over
> > > a dedicated channel, then src qemu sends migration state over the normal
> > > migration channel.
> > > 
> > > Dst qemu reads the fds early, then calls the backend and device creation
> > > functions which use them.  Dst qemu then accepts and reads the migration
> > > channel.
> > > 
> > > We need a way to send monitor commands that set dst migration capabilities,
> > > before src qemu starts the migration.  Hence the dst cannot proceed to
> > > backend and device creation because the src has not sent fd's yet.  Hence
> > > we need a dst monitor before device creation.  The precreate phase does that.
> > 
> > Sigh, what we obviously need here, is what we've always talked about as our
> > long term design goal:
> > 
> > A way to launch QEMU with the CLI only specifying the QMP socket, and every
> > other config aspect done by issuing QMP commands, which are processed in the
> > order the mgmt app sends them, so QEMU hasn't have to hardcode processing
> > of different pieces in different phases.
> > 
> > Anything that isn't that, is piling more hacks on top of our existing
> > mountain of hacks. That's OK if it does something useful as a side effect
> > that moves us incrementally closer towards that desired end goal.
> > 
> > > Regarding: "This series makes this much more complex."
> > > 
> > > I could simplify it if I abandon CPR for chardevs.  Then qemu_create_early_backends
> > > and other early dependencies can remain as is.  I would drop the notion of
> > > a precreate phase, and instead leverage the preconfig phase.  I would move
> > > qemu_create_late_backends, and a small part at the end of qemu_init, to
> > > qmp_x_exit_preconfig.
> > 
> > Is CPR still going to useful enough in the real world if you drop chardev
> > support ? Every VM has at least one chardev for a serial device doesn't
> > it, and often more since we wire chardevs into all kinds of places.
> 
> CPR for chardev is not as useful for cpr-transfer mode because the mgmt layer already
> knows how to create and manage new connections to dest qemu, as it would for normal
> migration.
> 
> CPR for chardev is very useful for cpr-exec mode.  And cpr-exec mode does not need any
> of these monitor patches, because old qemu exec's new qemu, and they are never active
> at the same time.  One must completely specify the migration using src qemu before
> initiating the exec.  I mourn cpr-exec mode.
> 
> Which begs the question, do we really need to allow migration parameters to be set
> in the dest monitor when using cpr?  CPR is a very restricted mode of migration.
> Let me discuss this with Peter.

The migration QAPI design has always felt rather odd to me, in that we
have perfectly good commands "migrate" & "migrate-incoming" that are able
to accept an arbitrary list of parameters when invoked. Instead of passing
parameters to them though, we instead require apps use the separate
migreate-set-parameters/capabiltiies commands many times over to set
global variables which the later 'migrate' command then uses.

The reason for this is essentially a historical mistake - we copied the
way we did it from HMP, which was this way because HMP was bad at supporting
arbitrary customizable paramters to commands. I wish we hadn't copied this
design over to QMP.

To bring it back on topic, we need QMP on the dest to set parameters,
because -incoming  was limited to only take the URI.

If the "migrate-incoming" command accepted all parameters directly,
then we could use QAPI visitor to usupport a "-incoming ..." command
that took an arbitrary JSON document and turned it into a call to
"migrate-incoming".

With that we would never need QMP on the target for cpr-exec, avoiding
this ordering poblem you're facing....assuming we put processing of
-incoming at the right point in the code flow

Can we fix this design and expose the full configurability on the
CLI using QAPI schema & inline JSON, like we do for other QAPI-ified
CLI args.

It seems entirely practical to me to add parameters to 'migrate-incoming'
in a backwards compatible manner and deprecate set-parameters/capabilities

With regards,
Daniel

Steven Sistare Oct. 25, 2024, 2:32 p.m. UTC | #9

On 10/25/2024 9:43 AM, Daniel P. Berrangé wrote:
> On Fri, Oct 25, 2024 at 09:33:51AM -0400, Steven Sistare wrote:
>> On 10/25/2024 4:46 AM, Daniel P. Berrangé wrote:
>>> On Thu, Oct 24, 2024 at 05:16:14PM -0400, Steven Sistare wrote:
>>>>
>>>> Regarding: "what you want is effectively to execute monitor commands
>>>> from the migration stream"
>>>>
>>>> That is not the goal of this series.  It could be someone else's goal, when
>>>> fully developing a precreate phase, and in that context I understand and
>>>> agree with your comments.  I have a narrower immediate problem to solve,
>>>> however.
>>>>
>>>> For CPR, src qemu sends file descriptors to dst qemu using SCM_RIGHTS over
>>>> a dedicated channel, then src qemu sends migration state over the normal
>>>> migration channel.
>>>>
>>>> Dst qemu reads the fds early, then calls the backend and device creation
>>>> functions which use them.  Dst qemu then accepts and reads the migration
>>>> channel.
>>>>
>>>> We need a way to send monitor commands that set dst migration capabilities,
>>>> before src qemu starts the migration.  Hence the dst cannot proceed to
>>>> backend and device creation because the src has not sent fd's yet.  Hence
>>>> we need a dst monitor before device creation.  The precreate phase does that.
>>>
>>> Sigh, what we obviously need here, is what we've always talked about as our
>>> long term design goal:
>>>
>>> A way to launch QEMU with the CLI only specifying the QMP socket, and every
>>> other config aspect done by issuing QMP commands, which are processed in the
>>> order the mgmt app sends them, so QEMU hasn't have to hardcode processing
>>> of different pieces in different phases.
>>>
>>> Anything that isn't that, is piling more hacks on top of our existing
>>> mountain of hacks. That's OK if it does something useful as a side effect
>>> that moves us incrementally closer towards that desired end goal.
>>>
>>>> Regarding: "This series makes this much more complex."
>>>>
>>>> I could simplify it if I abandon CPR for chardevs.  Then qemu_create_early_backends
>>>> and other early dependencies can remain as is.  I would drop the notion of
>>>> a precreate phase, and instead leverage the preconfig phase.  I would move
>>>> qemu_create_late_backends, and a small part at the end of qemu_init, to
>>>> qmp_x_exit_preconfig.
>>>
>>> Is CPR still going to useful enough in the real world if you drop chardev
>>> support ? Every VM has at least one chardev for a serial device doesn't
>>> it, and often more since we wire chardevs into all kinds of places.
>>
>> CPR for chardev is not as useful for cpr-transfer mode because the mgmt layer already
>> knows how to create and manage new connections to dest qemu, as it would for normal
>> migration.
>>
>> CPR for chardev is very useful for cpr-exec mode.  And cpr-exec mode does not need any
>> of these monitor patches, because old qemu exec's new qemu, and they are never active
>> at the same time.  One must completely specify the migration using src qemu before
>> initiating the exec.  I mourn cpr-exec mode.
>>
>> Which begs the question, do we really need to allow migration parameters to be set
>> in the dest monitor when using cpr?  CPR is a very restricted mode of migration.
>> Let me discuss this with Peter.
> 
> The migration QAPI design has always felt rather odd to me, in that we
> have perfectly good commands "migrate" & "migrate-incoming" that are able
> to accept an arbitrary list of parameters when invoked. Instead of passing
> parameters to them though, we instead require apps use the separate
> migreate-set-parameters/capabiltiies commands many times over to set
> global variables which the later 'migrate' command then uses.
> 
> The reason for this is essentially a historical mistake - we copied the
> way we did it from HMP, which was this way because HMP was bad at supporting
> arbitrary customizable paramters to commands. I wish we hadn't copied this
> design over to QMP.
> 
> To bring it back on topic, we need QMP on the dest to set parameters,
> because -incoming  was limited to only take the URI.
> 
> If the "migrate-incoming" command accepted all parameters directly,
> then we could use QAPI visitor to usupport a "-incoming ..." command
> that took an arbitrary JSON document and turned it into a call to
> "migrate-incoming".
> 
> With that we would never need QMP on the target for cpr-exec, avoiding
> this ordering poblem you're facing....assuming we put processing of
> -incoming at the right point in the code flow
> 
> Can we fix this design and expose the full configurability on the
> CLI using QAPI schema & inline JSON, like we do for other QAPI-ified
> CLI args.
> 
> It seems entirely practical to me to add parameters to 'migrate-incoming'
> in a backwards compatible manner and deprecate set-parameters/capabilities

Hi Daniel, should we ever need to set caps or parameters for CPR, that sounds
like a good way forward.  And a good idea independently of CPR.  However, I
am hoping to proceed with CPR with the initial restriction that one cannot
set them. The case that motivated my exploration of precreate is artificial --
qtest wanting to enable migration events -- and I can fix that.  I know of no
real cases where caps must be set for CPR.

The other screw case which motivated this thread is a dynamically chosen TCP
port number for the migration listen socket.  One must query dest qemu to get it.
Your suggestions here for new incoming syntax would not help.  However, for CPR,
we always migrate to the same host, so a unix domain socket can be used.

- Steve

Daniel P. Berrangé Oct. 25, 2024, 2:49 p.m. UTC | #10

On Fri, Oct 25, 2024 at 10:32:15AM -0400, Steven Sistare wrote:
> On 10/25/2024 9:43 AM, Daniel P. Berrangé wrote:
> > On Fri, Oct 25, 2024 at 09:33:51AM -0400, Steven Sistare wrote:
> > > On 10/25/2024 4:46 AM, Daniel P. Berrangé wrote:
> > > > On Thu, Oct 24, 2024 at 05:16:14PM -0400, Steven Sistare wrote:
> > > > > 
> > > > > Regarding: "what you want is effectively to execute monitor commands
> > > > > from the migration stream"
> > > > > 
> > > > > That is not the goal of this series.  It could be someone else's goal, when
> > > > > fully developing a precreate phase, and in that context I understand and
> > > > > agree with your comments.  I have a narrower immediate problem to solve,
> > > > > however.
> > > > > 
> > > > > For CPR, src qemu sends file descriptors to dst qemu using SCM_RIGHTS over
> > > > > a dedicated channel, then src qemu sends migration state over the normal
> > > > > migration channel.
> > > > > 
> > > > > Dst qemu reads the fds early, then calls the backend and device creation
> > > > > functions which use them.  Dst qemu then accepts and reads the migration
> > > > > channel.
> > > > > 
> > > > > We need a way to send monitor commands that set dst migration capabilities,
> > > > > before src qemu starts the migration.  Hence the dst cannot proceed to
> > > > > backend and device creation because the src has not sent fd's yet.  Hence
> > > > > we need a dst monitor before device creation.  The precreate phase does that.
> > > > 
> > > > Sigh, what we obviously need here, is what we've always talked about as our
> > > > long term design goal:
> > > > 
> > > > A way to launch QEMU with the CLI only specifying the QMP socket, and every
> > > > other config aspect done by issuing QMP commands, which are processed in the
> > > > order the mgmt app sends them, so QEMU hasn't have to hardcode processing
> > > > of different pieces in different phases.
> > > > 
> > > > Anything that isn't that, is piling more hacks on top of our existing
> > > > mountain of hacks. That's OK if it does something useful as a side effect
> > > > that moves us incrementally closer towards that desired end goal.
> > > > 
> > > > > Regarding: "This series makes this much more complex."
> > > > > 
> > > > > I could simplify it if I abandon CPR for chardevs.  Then qemu_create_early_backends
> > > > > and other early dependencies can remain as is.  I would drop the notion of
> > > > > a precreate phase, and instead leverage the preconfig phase.  I would move
> > > > > qemu_create_late_backends, and a small part at the end of qemu_init, to
> > > > > qmp_x_exit_preconfig.
> > > > 
> > > > Is CPR still going to useful enough in the real world if you drop chardev
> > > > support ? Every VM has at least one chardev for a serial device doesn't
> > > > it, and often more since we wire chardevs into all kinds of places.
> > > 
> > > CPR for chardev is not as useful for cpr-transfer mode because the mgmt layer already
> > > knows how to create and manage new connections to dest qemu, as it would for normal
> > > migration.
> > > 
> > > CPR for chardev is very useful for cpr-exec mode.  And cpr-exec mode does not need any
> > > of these monitor patches, because old qemu exec's new qemu, and they are never active
> > > at the same time.  One must completely specify the migration using src qemu before
> > > initiating the exec.  I mourn cpr-exec mode.
> > > 
> > > Which begs the question, do we really need to allow migration parameters to be set
> > > in the dest monitor when using cpr?  CPR is a very restricted mode of migration.
> > > Let me discuss this with Peter.
> > 
> > The migration QAPI design has always felt rather odd to me, in that we
> > have perfectly good commands "migrate" & "migrate-incoming" that are able
> > to accept an arbitrary list of parameters when invoked. Instead of passing
> > parameters to them though, we instead require apps use the separate
> > migreate-set-parameters/capabiltiies commands many times over to set
> > global variables which the later 'migrate' command then uses.
> > 
> > The reason for this is essentially a historical mistake - we copied the
> > way we did it from HMP, which was this way because HMP was bad at supporting
> > arbitrary customizable paramters to commands. I wish we hadn't copied this
> > design over to QMP.
> > 
> > To bring it back on topic, we need QMP on the dest to set parameters,
> > because -incoming  was limited to only take the URI.
> > 
> > If the "migrate-incoming" command accepted all parameters directly,
> > then we could use QAPI visitor to usupport a "-incoming ..." command
> > that took an arbitrary JSON document and turned it into a call to
> > "migrate-incoming".
> > 
> > With that we would never need QMP on the target for cpr-exec, avoiding
> > this ordering poblem you're facing....assuming we put processing of
> > -incoming at the right point in the code flow
> > 
> > Can we fix this design and expose the full configurability on the
> > CLI using QAPI schema & inline JSON, like we do for other QAPI-ified
> > CLI args.
> > 
> > It seems entirely practical to me to add parameters to 'migrate-incoming'
> > in a backwards compatible manner and deprecate set-parameters/capabilities
> 
> Hi Daniel, should we ever need to set caps or parameters for CPR, that sounds
> like a good way forward.  And a good idea independently of CPR.  However, I
> am hoping to proceed with CPR with the initial restriction that one cannot
> set them. The case that motivated my exploration of precreate is artificial --
> qtest wanting to enable migration events -- and I can fix that.  I know of no
> real cases where caps must be set for CPR.
> 
> The other screw case which motivated this thread is a dynamically chosen TCP
> port number for the migration listen socket.  One must query dest qemu to get it.
> Your suggestions here for new incoming syntax would not help.  However, for CPR,
> we always migrate to the same host, so a unix domain socket can be used.

FWIW, at least in libvirt usage, IIRC, libvirt will choose the listener
port number upfront itself

With regards,
Daniel

Peter Xu Oct. 28, 2024, 9:56 p.m. UTC | #11

On Fri, Oct 25, 2024 at 02:43:06PM +0100, Daniel P. Berrangé wrote:
> On Fri, Oct 25, 2024 at 09:33:51AM -0400, Steven Sistare wrote:
> > On 10/25/2024 4:46 AM, Daniel P. Berrangé wrote:
> > > On Thu, Oct 24, 2024 at 05:16:14PM -0400, Steven Sistare wrote:
> > > > 
> > > > Regarding: "what you want is effectively to execute monitor commands
> > > > from the migration stream"
> > > > 
> > > > That is not the goal of this series.  It could be someone else's goal, when
> > > > fully developing a precreate phase, and in that context I understand and
> > > > agree with your comments.  I have a narrower immediate problem to solve,
> > > > however.
> > > > 
> > > > For CPR, src qemu sends file descriptors to dst qemu using SCM_RIGHTS over
> > > > a dedicated channel, then src qemu sends migration state over the normal
> > > > migration channel.
> > > > 
> > > > Dst qemu reads the fds early, then calls the backend and device creation
> > > > functions which use them.  Dst qemu then accepts and reads the migration
> > > > channel.
> > > > 
> > > > We need a way to send monitor commands that set dst migration capabilities,
> > > > before src qemu starts the migration.  Hence the dst cannot proceed to
> > > > backend and device creation because the src has not sent fd's yet.  Hence
> > > > we need a dst monitor before device creation.  The precreate phase does that.
> > > 
> > > Sigh, what we obviously need here, is what we've always talked about as our
> > > long term design goal:
> > > 
> > > A way to launch QEMU with the CLI only specifying the QMP socket, and every
> > > other config aspect done by issuing QMP commands, which are processed in the
> > > order the mgmt app sends them, so QEMU hasn't have to hardcode processing
> > > of different pieces in different phases.
> > > 
> > > Anything that isn't that, is piling more hacks on top of our existing
> > > mountain of hacks. That's OK if it does something useful as a side effect
> > > that moves us incrementally closer towards that desired end goal.
> > > 
> > > > Regarding: "This series makes this much more complex."
> > > > 
> > > > I could simplify it if I abandon CPR for chardevs.  Then qemu_create_early_backends
> > > > and other early dependencies can remain as is.  I would drop the notion of
> > > > a precreate phase, and instead leverage the preconfig phase.  I would move
> > > > qemu_create_late_backends, and a small part at the end of qemu_init, to
> > > > qmp_x_exit_preconfig.
> > > 
> > > Is CPR still going to useful enough in the real world if you drop chardev
> > > support ? Every VM has at least one chardev for a serial device doesn't
> > > it, and often more since we wire chardevs into all kinds of places.
> > 
> > CPR for chardev is not as useful for cpr-transfer mode because the mgmt layer already
> > knows how to create and manage new connections to dest qemu, as it would for normal
> > migration.
> > 
> > CPR for chardev is very useful for cpr-exec mode.  And cpr-exec mode does not need any
> > of these monitor patches, because old qemu exec's new qemu, and they are never active
> > at the same time.  One must completely specify the migration using src qemu before
> > initiating the exec.  I mourn cpr-exec mode.
> > 
> > Which begs the question, do we really need to allow migration parameters to be set
> > in the dest monitor when using cpr?  CPR is a very restricted mode of migration.
> > Let me discuss this with Peter.
> 
> The migration QAPI design has always felt rather odd to me, in that we
> have perfectly good commands "migrate" & "migrate-incoming" that are able
> to accept an arbitrary list of parameters when invoked. Instead of passing
> parameters to them though, we instead require apps use the separate
> migreate-set-parameters/capabiltiies commands many times over to set
> global variables which the later 'migrate' command then uses.

Just to mention, we will still need some special parameters that can change
during migration, like max-bandwidth, max-downtime etc.  So not all of them
can be made into "migrate"/"migrate-incoming" arguments.

> 
> The reason for this is essentially a historical mistake - we copied the
> way we did it from HMP, which was this way because HMP was bad at supporting
> arbitrary customizable paramters to commands. I wish we hadn't copied this
> design over to QMP.
> 
> To bring it back on topic, we need QMP on the dest to set parameters,
> because -incoming  was limited to only take the URI.
> 
> If the "migrate-incoming" command accepted all parameters directly,
> then we could use QAPI visitor to usupport a "-incoming ..." command
> that took an arbitrary JSON document and turned it into a call to
> "migrate-incoming".
> 
> With that we would never need QMP on the target for cpr-exec, avoiding
> this ordering poblem you're facing....assuming we put processing of
> -incoming at the right point in the code flow
> 
> Can we fix this design and expose the full configurability on the
> CLI using QAPI schema & inline JSON, like we do for other QAPI-ified
> CLI args.
> 
> It seems entirely practical to me to add parameters to 'migrate-incoming'
> in a backwards compatible manner and deprecate set-parameters/capabilities

Fabiano started working on handshake recently.  After that is ready, we
should logically also achieve similar goal at least for dest qemu so no
cap/parameter is ever needed there.

Thanks,

Daniel P. Berrangé Oct. 29, 2024, 9:09 a.m. UTC | #12

On Mon, Oct 28, 2024 at 05:56:10PM -0400, Peter Xu wrote:
> On Fri, Oct 25, 2024 at 02:43:06PM +0100, Daniel P. Berrangé wrote:
> > On Fri, Oct 25, 2024 at 09:33:51AM -0400, Steven Sistare wrote:
> > > On 10/25/2024 4:46 AM, Daniel P. Berrangé wrote:
> > > > On Thu, Oct 24, 2024 at 05:16:14PM -0400, Steven Sistare wrote:
> > > > > 
> > > > > Regarding: "what you want is effectively to execute monitor commands
> > > > > from the migration stream"
> > > > > 
> > > > > That is not the goal of this series.  It could be someone else's goal, when
> > > > > fully developing a precreate phase, and in that context I understand and
> > > > > agree with your comments.  I have a narrower immediate problem to solve,
> > > > > however.
> > > > > 
> > > > > For CPR, src qemu sends file descriptors to dst qemu using SCM_RIGHTS over
> > > > > a dedicated channel, then src qemu sends migration state over the normal
> > > > > migration channel.
> > > > > 
> > > > > Dst qemu reads the fds early, then calls the backend and device creation
> > > > > functions which use them.  Dst qemu then accepts and reads the migration
> > > > > channel.
> > > > > 
> > > > > We need a way to send monitor commands that set dst migration capabilities,
> > > > > before src qemu starts the migration.  Hence the dst cannot proceed to
> > > > > backend and device creation because the src has not sent fd's yet.  Hence
> > > > > we need a dst monitor before device creation.  The precreate phase does that.
> > > > 
> > > > Sigh, what we obviously need here, is what we've always talked about as our
> > > > long term design goal:
> > > > 
> > > > A way to launch QEMU with the CLI only specifying the QMP socket, and every
> > > > other config aspect done by issuing QMP commands, which are processed in the
> > > > order the mgmt app sends them, so QEMU hasn't have to hardcode processing
> > > > of different pieces in different phases.
> > > > 
> > > > Anything that isn't that, is piling more hacks on top of our existing
> > > > mountain of hacks. That's OK if it does something useful as a side effect
> > > > that moves us incrementally closer towards that desired end goal.
> > > > 
> > > > > Regarding: "This series makes this much more complex."
> > > > > 
> > > > > I could simplify it if I abandon CPR for chardevs.  Then qemu_create_early_backends
> > > > > and other early dependencies can remain as is.  I would drop the notion of
> > > > > a precreate phase, and instead leverage the preconfig phase.  I would move
> > > > > qemu_create_late_backends, and a small part at the end of qemu_init, to
> > > > > qmp_x_exit_preconfig.
> > > > 
> > > > Is CPR still going to useful enough in the real world if you drop chardev
> > > > support ? Every VM has at least one chardev for a serial device doesn't
> > > > it, and often more since we wire chardevs into all kinds of places.
> > > 
> > > CPR for chardev is not as useful for cpr-transfer mode because the mgmt layer already
> > > knows how to create and manage new connections to dest qemu, as it would for normal
> > > migration.
> > > 
> > > CPR for chardev is very useful for cpr-exec mode.  And cpr-exec mode does not need any
> > > of these monitor patches, because old qemu exec's new qemu, and they are never active
> > > at the same time.  One must completely specify the migration using src qemu before
> > > initiating the exec.  I mourn cpr-exec mode.
> > > 
> > > Which begs the question, do we really need to allow migration parameters to be set
> > > in the dest monitor when using cpr?  CPR is a very restricted mode of migration.
> > > Let me discuss this with Peter.
> > 
> > The migration QAPI design has always felt rather odd to me, in that we
> > have perfectly good commands "migrate" & "migrate-incoming" that are able
> > to accept an arbitrary list of parameters when invoked. Instead of passing
> > parameters to them though, we instead require apps use the separate
> > migreate-set-parameters/capabiltiies commands many times over to set
> > global variables which the later 'migrate' command then uses.
> 
> Just to mention, we will still need some special parameters that can change
> during migration, like max-bandwidth, max-downtime etc.  So not all of them
> can be made into "migrate"/"migrate-incoming" arguments.

I guess we can leave migrate-set-parameters with the sub-set of
parameters needed at runtime, or have a 'migrate-update' command
for those, to make it clear  which are valid to set at runtime,
and which are not valid at initial start.


With regards,
Daniel

Daniel P. Berrangé Oct. 29, 2024, 11:13 a.m. UTC | #13

On Fri, Oct 25, 2024 at 02:43:06PM +0100, Daniel P. Berrangé wrote:
> 
> The migration QAPI design has always felt rather odd to me, in that we
> have perfectly good commands "migrate" & "migrate-incoming" that are able
> to accept an arbitrary list of parameters when invoked. Instead of passing
> parameters to them though, we instead require apps use the separate
> migreate-set-parameters/capabiltiies commands many times over to set
> global variables which the later 'migrate' command then uses.
> 
> The reason for this is essentially a historical mistake - we copied the
> way we did it from HMP, which was this way because HMP was bad at supporting
> arbitrary customizable paramters to commands. I wish we hadn't copied this
> design over to QMP.
> 
> To bring it back on topic, we need QMP on the dest to set parameters,
> because -incoming  was limited to only take the URI.
> 
> If the "migrate-incoming" command accepted all parameters directly,
> then we could use QAPI visitor to usupport a "-incoming ..." command
> that took an arbitrary JSON document and turned it into a call to
> "migrate-incoming".
> 
> With that we would never need QMP on the target for cpr-exec, avoiding
> this ordering poblem you're facing....assuming we put processing of
> -incoming at the right point in the code flow
> 
> Can we fix this design and expose the full configurability on the
> CLI using QAPI schema & inline JSON, like we do for other QAPI-ified
> CLI args.
> 
> It seems entirely practical to me to add parameters to 'migrate-incoming'
> in a backwards compatible manner and deprecate set-parameters/capabilities

Incidentally, if we were going to evolve the migration API at all, then
it probably ought to start making use of the async job infrastructure
we have available. This is use by block jobs, and by the internal snapshot
commands, and was intended to be used for any case where we had a long
running operation triggered by a command. Migration was a poster-child
example of what its intended for, but was left alone when we first
introduced the job APIs.

The 'job-cancel' API would obsolete 'migrate-cancel'.

The other interestnig thing is that the job framework creates a well
defined lifecycle for a job, that allows querying information about
the job after completeion, but without QEMU having to keep that info
around forever. ie once a job has finished, an app can query info
about completion, and when it no longer needs that info, it can
call 'job-dismiss' to tell QEMU to discard it.

If "MigrationState" were associated a job, then it would thus have a
clear 'creation' and 'deletion' time.

With regards,
Daniel

Fabiano Rosas Oct. 29, 2024, 1:20 p.m. UTC | #14

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Fri, Oct 25, 2024 at 02:43:06PM +0100, Daniel P. Berrangé wrote:
>> 
>> The migration QAPI design has always felt rather odd to me, in that we
>> have perfectly good commands "migrate" & "migrate-incoming" that are able
>> to accept an arbitrary list of parameters when invoked. Instead of passing
>> parameters to them though, we instead require apps use the separate
>> migreate-set-parameters/capabiltiies commands many times over to set
>> global variables which the later 'migrate' command then uses.
>> 
>> The reason for this is essentially a historical mistake - we copied the
>> way we did it from HMP, which was this way because HMP was bad at supporting
>> arbitrary customizable paramters to commands. I wish we hadn't copied this
>> design over to QMP.
>> 
>> To bring it back on topic, we need QMP on the dest to set parameters,
>> because -incoming  was limited to only take the URI.
>> 
>> If the "migrate-incoming" command accepted all parameters directly,
>> then we could use QAPI visitor to usupport a "-incoming ..." command
>> that took an arbitrary JSON document and turned it into a call to
>> "migrate-incoming".
>> 
>> With that we would never need QMP on the target for cpr-exec, avoiding
>> this ordering poblem you're facing....assuming we put processing of
>> -incoming at the right point in the code flow
>> 
>> Can we fix this design and expose the full configurability on the
>> CLI using QAPI schema & inline JSON, like we do for other QAPI-ified
>> CLI args.
>> 
>> It seems entirely practical to me to add parameters to 'migrate-incoming'
>> in a backwards compatible manner and deprecate set-parameters/capabilities
>
> Incidentally, if we were going to evolve the migration API at all, then
> it probably ought to start making use of the async job infrastructure
> we have available. This is use by block jobs, and by the internal snapshot

I'm all for standardization on core infrastructure, but unfortunately
putting migration in a coroutine would open a can of worms. In fact,
we've been discussing about moving the incoming side out of coroutines
for a while.

> commands, and was intended to be used for any case where we had a long
> running operation triggered by a command. Migration was a poster-child
> example of what its intended for, but was left alone when we first
> introduced the job APIs.
>
> The 'job-cancel' API would obsolete 'migrate-cancel'.
>
> The other interestnig thing is that the job framework creates a well
> defined lifecycle for a job, that allows querying information about
> the job after completeion, but without QEMU having to keep that info
> around forever. ie once a job has finished, an app can query info
> about completion, and when it no longer needs that info, it can
> call 'job-dismiss' to tell QEMU to discard it.
>
> If "MigrationState" were associated a job, then it would thus have a
> clear 'creation' and 'deletion' time.
>
> With regards,
> Daniel

Peter Xu Oct. 29, 2024, 3:18 p.m. UTC | #15

On Tue, Oct 29, 2024 at 10:20:24AM -0300, Fabiano Rosas wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > On Fri, Oct 25, 2024 at 02:43:06PM +0100, Daniel P. Berrangé wrote:
> >> 
> >> The migration QAPI design has always felt rather odd to me, in that we
> >> have perfectly good commands "migrate" & "migrate-incoming" that are able
> >> to accept an arbitrary list of parameters when invoked. Instead of passing
> >> parameters to them though, we instead require apps use the separate
> >> migreate-set-parameters/capabiltiies commands many times over to set
> >> global variables which the later 'migrate' command then uses.
> >> 
> >> The reason for this is essentially a historical mistake - we copied the
> >> way we did it from HMP, which was this way because HMP was bad at supporting
> >> arbitrary customizable paramters to commands. I wish we hadn't copied this
> >> design over to QMP.
> >> 
> >> To bring it back on topic, we need QMP on the dest to set parameters,
> >> because -incoming  was limited to only take the URI.
> >> 
> >> If the "migrate-incoming" command accepted all parameters directly,
> >> then we could use QAPI visitor to usupport a "-incoming ..." command
> >> that took an arbitrary JSON document and turned it into a call to
> >> "migrate-incoming".
> >> 
> >> With that we would never need QMP on the target for cpr-exec, avoiding
> >> this ordering poblem you're facing....assuming we put processing of
> >> -incoming at the right point in the code flow
> >> 
> >> Can we fix this design and expose the full configurability on the
> >> CLI using QAPI schema & inline JSON, like we do for other QAPI-ified
> >> CLI args.
> >> 
> >> It seems entirely practical to me to add parameters to 'migrate-incoming'
> >> in a backwards compatible manner and deprecate set-parameters/capabilities
> >
> > Incidentally, if we were going to evolve the migration API at all, then
> > it probably ought to start making use of the async job infrastructure
> > we have available. This is use by block jobs, and by the internal snapshot
> 
> I'm all for standardization on core infrastructure, but unfortunately
> putting migration in a coroutine would open a can of worms. In fact,
> we've been discussing about moving the incoming side out of coroutines
> for a while.

Yes, I share the same concern.  I think migration decided to go already
with as much thread model as possible that it can.

And I paused that attempt to move load() into a thread, as of now, finding
it's still non-trivial to work out the major issue: after dest load became
a thread, it means it can't take BQL for too long otherwise it blocks the
monitor.

It means we can't take bql for _any_ IO operation because it can stuck at
any IO waiting for the iochannel / NIC, aka, any qemu_get*() API invoked.

Meanwhile we still trivially need the bql from time to time, either in
pre_load() / post_load(), or some of VMStateInfo->get().  But still I can't
blindly take them, as in any VMStateInfo->get(), it can invoke qemu_get*()
itself.

So I am not sure whether what we can get from that is worthwhile yet on the
effort to make it work..

> 
> > commands, and was intended to be used for any case where we had a long
> > running operation triggered by a command. Migration was a poster-child
> > example of what its intended for, but was left alone when we first
> > introduced the job APIs.
> >
> > The 'job-cancel' API would obsolete 'migrate-cancel'.
> >
> > The other interestnig thing is that the job framework creates a well
> > defined lifecycle for a job, that allows querying information about
> > the job after completeion, but without QEMU having to keep that info
> > around forever. ie once a job has finished, an app can query info
> > about completion, and when it no longer needs that info, it can
> > call 'job-dismiss' to tell QEMU to discard it.
> >
> > If "MigrationState" were associated a job, then it would thus have a
> > clear 'creation' and 'deletion' time.

It'll face the same challenge here on whether we can join() in the main
thread.  IOW, job-cancel can take time which can also potentially block
qemu from quitting fast.

Daniel P. Berrangé Oct. 29, 2024, 3:58 p.m. UTC | #16

On Tue, Oct 29, 2024 at 10:20:24AM -0300, Fabiano Rosas wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > On Fri, Oct 25, 2024 at 02:43:06PM +0100, Daniel P. Berrangé wrote:
> >> 
> >> The migration QAPI design has always felt rather odd to me, in that we
> >> have perfectly good commands "migrate" & "migrate-incoming" that are able
> >> to accept an arbitrary list of parameters when invoked. Instead of passing
> >> parameters to them though, we instead require apps use the separate
> >> migreate-set-parameters/capabiltiies commands many times over to set
> >> global variables which the later 'migrate' command then uses.
> >> 
> >> The reason for this is essentially a historical mistake - we copied the
> >> way we did it from HMP, which was this way because HMP was bad at supporting
> >> arbitrary customizable paramters to commands. I wish we hadn't copied this
> >> design over to QMP.
> >> 
> >> To bring it back on topic, we need QMP on the dest to set parameters,
> >> because -incoming  was limited to only take the URI.
> >> 
> >> If the "migrate-incoming" command accepted all parameters directly,
> >> then we could use QAPI visitor to usupport a "-incoming ..." command
> >> that took an arbitrary JSON document and turned it into a call to
> >> "migrate-incoming".
> >> 
> >> With that we would never need QMP on the target for cpr-exec, avoiding
> >> this ordering poblem you're facing....assuming we put processing of
> >> -incoming at the right point in the code flow
> >> 
> >> Can we fix this design and expose the full configurability on the
> >> CLI using QAPI schema & inline JSON, like we do for other QAPI-ified
> >> CLI args.
> >> 
> >> It seems entirely practical to me to add parameters to 'migrate-incoming'
> >> in a backwards compatible manner and deprecate set-parameters/capabilities
> >
> > Incidentally, if we were going to evolve the migration API at all, then
> > it probably ought to start making use of the async job infrastructure
> > we have available. This is use by block jobs, and by the internal snapshot
> 
> I'm all for standardization on core infrastructure, but unfortunately
> putting migration in a coroutine would open a can of worms. In fact,
> we've been discussing about moving the incoming side out of coroutines
> for a while.

Yeah, I can understand that.

The job API at the QMP level has no association with coroutines. It
simply expresses a way to handle long running background jobs in a
standard manner.

The dependency on coroutines is purely the internal job APIs. I wonder
if it is at all practical to alter the job APIs to allow migration to
use them as it exists today, as the migration code already is quite
capable to running in the background, without adding more coroutine
usage.

It would be quite annoying if our general purpose QMP API cannot be
used because internal only impl limitations :-(

With regards,
Daniel

[RFC,V1,00/14] precreate phase

Message

Comments