Message ID | 1729178055-207271-1-git-send-email-steven.sistare@oracle.com (mailing list archive) |
---|---|
Headers | show |
Series | precreate phase | expand |
On 10/17/2024 11:14 AM, Steve Sistare wrote: > Define a new qemu initialization phase called 'precreate' which occurs > before most backends or devices have been created. The only exception > is monitor and qtest devices and their associated chardevs. > > QEMU runs in the main loop during this phase. Monitor connections are > active and can receive migration configuration commands. QEMU starts > listening on the normal migration URI during this phase, which can come > from either the QEMU command line or from a migrate_incoming command. > Thus the user can issue query-migrate to get the socket-address for > dynamically allocated port numbers during precreate. > > In this series QEMU passes through and does not linger in the precreate > phase, and the user sees no change in behavior. The cpr-transfer series > will linger in the phase for an incoming CPR operation, and exit the phase > when the migrate command is send to source QEMU and causes destination QEMU > to read CPR state. Hi Peter, I rebased the cpr-transfer series on precreate. The cpr-transfer migration-test now works. Do you want to see cpr-transfer V3 now, or wait until we get feedback on precreate? The only significant change is that I deleted the HUP synchronization, and I post an async listen for the incoming cpr-uri connection. - Steve
On Thu, Oct 17, 2024 at 11:19:51AM -0400, Steven Sistare wrote: > On 10/17/2024 11:14 AM, Steve Sistare wrote: > > Define a new qemu initialization phase called 'precreate' which occurs > > before most backends or devices have been created. The only exception > > is monitor and qtest devices and their associated chardevs. > > > > QEMU runs in the main loop during this phase. Monitor connections are > > active and can receive migration configuration commands. QEMU starts > > listening on the normal migration URI during this phase, which can come > > from either the QEMU command line or from a migrate_incoming command. > > Thus the user can issue query-migrate to get the socket-address for > > dynamically allocated port numbers during precreate. > > > > In this series QEMU passes through and does not linger in the precreate > > phase, and the user sees no change in behavior. The cpr-transfer series > > will linger in the phase for an incoming CPR operation, and exit the phase > > when the migrate command is send to source QEMU and causes destination QEMU > > to read CPR state. > > Hi Peter, I rebased the cpr-transfer series on precreate. The > cpr-transfer migration-test now works. Do you want to see cpr-transfer > V3 now, or wait until we get feedback on precreate? The only significant > change is that I deleted the HUP synchronization, and I post an async > listen for the incoming cpr-uri connection. Maybe you can still send it, because I remember there're some other discussion that may not settled yet (e.g. how anon-memfd is applied, iirc), then it can be reviewed and discussed concurrently when proper with the precreate series. Thanks,
On 10/17/2024 11:53 AM, Peter Xu wrote: > On Thu, Oct 17, 2024 at 11:19:51AM -0400, Steven Sistare wrote: >> On 10/17/2024 11:14 AM, Steve Sistare wrote: >>> Define a new qemu initialization phase called 'precreate' which occurs >>> before most backends or devices have been created. The only exception >>> is monitor and qtest devices and their associated chardevs. >>> >>> QEMU runs in the main loop during this phase. Monitor connections are >>> active and can receive migration configuration commands. QEMU starts >>> listening on the normal migration URI during this phase, which can come >>> from either the QEMU command line or from a migrate_incoming command. >>> Thus the user can issue query-migrate to get the socket-address for >>> dynamically allocated port numbers during precreate. >>> >>> In this series QEMU passes through and does not linger in the precreate >>> phase, and the user sees no change in behavior. The cpr-transfer series >>> will linger in the phase for an incoming CPR operation, and exit the phase >>> when the migrate command is send to source QEMU and causes destination QEMU >>> to read CPR state. >> >> Hi Peter, I rebased the cpr-transfer series on precreate. The >> cpr-transfer migration-test now works. Do you want to see cpr-transfer >> V3 now, or wait until we get feedback on precreate? The only significant >> change is that I deleted the HUP synchronization, and I post an async >> listen for the incoming cpr-uri connection. > > Maybe you can still send it, because I remember there're some other > discussion that may not settled yet (e.g. how anon-memfd is applied, iirc), > then it can be reviewed and discussed concurrently when proper with the > precreate series. For now I'll reply to the open email threads for cpr-transfer V2. - Steve
On 10/17/24 17:14, Steve Sistare wrote: > Define a new qemu initialization phase called 'precreate' which occurs > before most backends or devices have been created. The only exception > is monitor and qtest devices and their associated chardevs. > > QEMU runs in the main loop during this phase. Monitor connections are > active and can receive migration configuration commands. QEMU starts > listening on the normal migration URI during this phase, which can come > from either the QEMU command line or from a migrate_incoming command. > Thus the user can issue query-migrate to get the socket-address for > dynamically allocated port numbers during precreate. > > In this series QEMU passes through and does not linger in the precreate > phase, and the user sees no change in behavior. The cpr-transfer series > will linger in the phase for an incoming CPR operation, and exit the phase > when the migrate command is send to source QEMU and causes destination QEMU > to read CPR state. > > A future series may wish to add a command-line argument and monitor > command that enters and exits the precreate phase for general purpose use. > That would enable the user to issue monitor commands that specify backend > creation parameters. I have a problem with the concept, which is that the split between command line and monitor is much harder to understand. Ideally, one would come entirely before the other; preconfig is already ugly in how -device is processed later than everything else[1]. This series makes this much more complex. If I understand correctly, what you want is effectively to execute monitor commands from the migration stream. If you want to do that, we need to take more qemu_init code and turn it into QMP commands, so that precreate can exit qemu_init very early and the "after precreate" command line options can be replaced by interactions on the monitor. This is the idea that drove the creation of -M smp.xxx, -M boot.xxx, -M memory.xxx, etc. (see for example commit 8c4da4b5218, "machine: add boot compound property", 2022-05-12). For example -semihosting-config, -acpitable, -smbios, -fw_cfg, -option-rom, -rtc could all become -M properties and handled by a single monitor command machine-set. Of all the other options, -accel, -cpu and -display are the main missing ones (-cpu is the very hard one); a full list is at https://wiki.qemu.org/User:Paolo_Bonzini/Machine_init_sequence#QMP_configuration_flow. Anyhow, at this point all that's needed is a -chardev/-mon pair (and I guess -incoming defer) in order to bootstrap the monitor in precreate mode. It's okay to prototype without full support for the options I've listed, but if we want to go with precreate we should make most command line options entirely incompatible with it, and also make it imply -nodefaults. Paolo [1] -loadvm and -incoming too; but for those two we could make their monitor commands exit preconfig mode automatically, and invoke them from the monitor instead of specifying them on the command line.
On 10/23/2024 12:31 PM, Paolo Bonzini wrote: > On 10/17/24 17:14, Steve Sistare wrote: >> Define a new qemu initialization phase called 'precreate' which occurs >> before most backends or devices have been created. The only exception >> is monitor and qtest devices and their associated chardevs. >> >> QEMU runs in the main loop during this phase. Monitor connections are >> active and can receive migration configuration commands. QEMU starts >> listening on the normal migration URI during this phase, which can come >> from either the QEMU command line or from a migrate_incoming command. >> Thus the user can issue query-migrate to get the socket-address for >> dynamically allocated port numbers during precreate. >> >> In this series QEMU passes through and does not linger in the precreate >> phase, and the user sees no change in behavior. The cpr-transfer series >> will linger in the phase for an incoming CPR operation, and exit the phase >> when the migrate command is send to source QEMU and causes destination QEMU >> to read CPR state. >> >> A future series may wish to add a command-line argument and monitor >> command that enters and exits the precreate phase for general purpose use. >> That would enable the user to issue monitor commands that specify backend >> creation parameters. > > I have a problem with the concept, which is that the split between command line and monitor is much harder to understand. > > Ideally, one would come entirely before the other; preconfig is already ugly in how -device is processed later than everything else[1]. This series makes this much more complex. > > If I understand correctly, what you want is effectively to execute monitor commands from the migration stream. If you want to do that, we need to take more qemu_init code and turn it into QMP commands, so that precreate can exit qemu_init very early and the "after precreate" command line options can be replaced by interactions on the monitor. > > This is the idea that drove the creation of -M smp.xxx, -M boot.xxx, -M memory.xxx, etc. (see for example commit 8c4da4b5218, "machine: add boot compound property", 2022-05-12). For example -semihosting-config, -acpitable, -smbios, -fw_cfg, -option-rom, -rtc could all become -M properties and handled by a single monitor command machine-set. > > Of all the other options, -accel, -cpu and -display are the main missing ones (-cpu is the very hard one); a full list is at https://wiki.qemu.org/User:Paolo_Bonzini/Machine_init_sequence#QMP_configuration_flow. > > Anyhow, at this point all that's needed is a -chardev/-mon pair (and I guess -incoming defer) in order to bootstrap the monitor in precreate mode. > > It's okay to prototype without full support for the options I've listed, but if we want to go with precreate we should make most command line options entirely incompatible with it, and also make it imply -nodefaults. > > Paolo > > [1] -loadvm and -incoming too; but for those two we could make their monitor commands exit preconfig mode automatically, and invoke them from the monitor instead of specifying them on the command line. Regarding: "what you want is effectively to execute monitor commands from the migration stream" That is not the goal of this series. It could be someone else's goal, when fully developing a precreate phase, and in that context I understand and agree with your comments. I have a narrower immediate problem to solve, however. For CPR, src qemu sends file descriptors to dst qemu using SCM_RIGHTS over a dedicated channel, then src qemu sends migration state over the normal migration channel. Dst qemu reads the fds early, then calls the backend and device creation functions which use them. Dst qemu then accepts and reads the migration channel. We need a way to send monitor commands that set dst migration capabilities, before src qemu starts the migration. Hence the dst cannot proceed to backend and device creation because the src has not sent fd's yet. Hence we need a dst monitor before device creation. The precreate phase does that. Regarding: "This series makes this much more complex." I could simplify it if I abandon CPR for chardevs. Then qemu_create_early_backends and other early dependencies can remain as is. I would drop the notion of a precreate phase, and instead leverage the preconfig phase. I would move qemu_create_late_backends, and a small part at the end of qemu_init, to qmp_x_exit_preconfig. These patches from the series (slightly renamed) would suffice. The "move preconfig boundary" patch is new. * migration: init and listen during preconfig * vl: move preconfig boundary * monitor: connect in preconfig * net: cleanup for preconfig phase * migration: allow commands during preconfig Would you consider supporting that? - Steve
On Thu, Oct 24, 2024 at 05:16:14PM -0400, Steven Sistare wrote: > > Regarding: "what you want is effectively to execute monitor commands > from the migration stream" > > That is not the goal of this series. It could be someone else's goal, when > fully developing a precreate phase, and in that context I understand and > agree with your comments. I have a narrower immediate problem to solve, > however. > > For CPR, src qemu sends file descriptors to dst qemu using SCM_RIGHTS over > a dedicated channel, then src qemu sends migration state over the normal > migration channel. > > Dst qemu reads the fds early, then calls the backend and device creation > functions which use them. Dst qemu then accepts and reads the migration > channel. > > We need a way to send monitor commands that set dst migration capabilities, > before src qemu starts the migration. Hence the dst cannot proceed to > backend and device creation because the src has not sent fd's yet. Hence > we need a dst monitor before device creation. The precreate phase does that. Sigh, what we obviously need here, is what we've always talked about as our long term design goal: A way to launch QEMU with the CLI only specifying the QMP socket, and every other config aspect done by issuing QMP commands, which are processed in the order the mgmt app sends them, so QEMU hasn't have to hardcode processing of different pieces in different phases. Anything that isn't that, is piling more hacks on top of our existing mountain of hacks. That's OK if it does something useful as a side effect that moves us incrementally closer towards that desired end goal. > Regarding: "This series makes this much more complex." > > I could simplify it if I abandon CPR for chardevs. Then qemu_create_early_backends > and other early dependencies can remain as is. I would drop the notion of > a precreate phase, and instead leverage the preconfig phase. I would move > qemu_create_late_backends, and a small part at the end of qemu_init, to > qmp_x_exit_preconfig. Is CPR still going to useful enough in the real world if you drop chardev support ? Every VM has at least one chardev for a serial device doesn't it, and often more since we wire chardevs into all kinds of places. With regards, Daniel
On 10/25/2024 4:46 AM, Daniel P. Berrangé wrote: > On Thu, Oct 24, 2024 at 05:16:14PM -0400, Steven Sistare wrote: >> >> Regarding: "what you want is effectively to execute monitor commands >> from the migration stream" >> >> That is not the goal of this series. It could be someone else's goal, when >> fully developing a precreate phase, and in that context I understand and >> agree with your comments. I have a narrower immediate problem to solve, >> however. >> >> For CPR, src qemu sends file descriptors to dst qemu using SCM_RIGHTS over >> a dedicated channel, then src qemu sends migration state over the normal >> migration channel. >> >> Dst qemu reads the fds early, then calls the backend and device creation >> functions which use them. Dst qemu then accepts and reads the migration >> channel. >> >> We need a way to send monitor commands that set dst migration capabilities, >> before src qemu starts the migration. Hence the dst cannot proceed to >> backend and device creation because the src has not sent fd's yet. Hence >> we need a dst monitor before device creation. The precreate phase does that. > > Sigh, what we obviously need here, is what we've always talked about as our > long term design goal: > > A way to launch QEMU with the CLI only specifying the QMP socket, and every > other config aspect done by issuing QMP commands, which are processed in the > order the mgmt app sends them, so QEMU hasn't have to hardcode processing > of different pieces in different phases. > > Anything that isn't that, is piling more hacks on top of our existing > mountain of hacks. That's OK if it does something useful as a side effect > that moves us incrementally closer towards that desired end goal. > >> Regarding: "This series makes this much more complex." >> >> I could simplify it if I abandon CPR for chardevs. Then qemu_create_early_backends >> and other early dependencies can remain as is. I would drop the notion of >> a precreate phase, and instead leverage the preconfig phase. I would move >> qemu_create_late_backends, and a small part at the end of qemu_init, to >> qmp_x_exit_preconfig. > > Is CPR still going to useful enough in the real world if you drop chardev > support ? Every VM has at least one chardev for a serial device doesn't > it, and often more since we wire chardevs into all kinds of places. CPR for chardev is not as useful for cpr-transfer mode because the mgmt layer already knows how to create and manage new connections to dest qemu, as it would for normal migration. CPR for chardev is very useful for cpr-exec mode. And cpr-exec mode does not need any of these monitor patches, because old qemu exec's new qemu, and they are never active at the same time. One must completely specify the migration using src qemu before initiating the exec. I mourn cpr-exec mode. Which begs the question, do we really need to allow migration parameters to be set in the dest monitor when using cpr? CPR is a very restricted mode of migration. Let me discuss this with Peter. - Steve
On Fri, Oct 25, 2024 at 09:33:51AM -0400, Steven Sistare wrote: > On 10/25/2024 4:46 AM, Daniel P. Berrangé wrote: > > On Thu, Oct 24, 2024 at 05:16:14PM -0400, Steven Sistare wrote: > > > > > > Regarding: "what you want is effectively to execute monitor commands > > > from the migration stream" > > > > > > That is not the goal of this series. It could be someone else's goal, when > > > fully developing a precreate phase, and in that context I understand and > > > agree with your comments. I have a narrower immediate problem to solve, > > > however. > > > > > > For CPR, src qemu sends file descriptors to dst qemu using SCM_RIGHTS over > > > a dedicated channel, then src qemu sends migration state over the normal > > > migration channel. > > > > > > Dst qemu reads the fds early, then calls the backend and device creation > > > functions which use them. Dst qemu then accepts and reads the migration > > > channel. > > > > > > We need a way to send monitor commands that set dst migration capabilities, > > > before src qemu starts the migration. Hence the dst cannot proceed to > > > backend and device creation because the src has not sent fd's yet. Hence > > > we need a dst monitor before device creation. The precreate phase does that. > > > > Sigh, what we obviously need here, is what we've always talked about as our > > long term design goal: > > > > A way to launch QEMU with the CLI only specifying the QMP socket, and every > > other config aspect done by issuing QMP commands, which are processed in the > > order the mgmt app sends them, so QEMU hasn't have to hardcode processing > > of different pieces in different phases. > > > > Anything that isn't that, is piling more hacks on top of our existing > > mountain of hacks. That's OK if it does something useful as a side effect > > that moves us incrementally closer towards that desired end goal. > > > > > Regarding: "This series makes this much more complex." > > > > > > I could simplify it if I abandon CPR for chardevs. Then qemu_create_early_backends > > > and other early dependencies can remain as is. I would drop the notion of > > > a precreate phase, and instead leverage the preconfig phase. I would move > > > qemu_create_late_backends, and a small part at the end of qemu_init, to > > > qmp_x_exit_preconfig. > > > > Is CPR still going to useful enough in the real world if you drop chardev > > support ? Every VM has at least one chardev for a serial device doesn't > > it, and often more since we wire chardevs into all kinds of places. > > CPR for chardev is not as useful for cpr-transfer mode because the mgmt layer already > knows how to create and manage new connections to dest qemu, as it would for normal > migration. > > CPR for chardev is very useful for cpr-exec mode. And cpr-exec mode does not need any > of these monitor patches, because old qemu exec's new qemu, and they are never active > at the same time. One must completely specify the migration using src qemu before > initiating the exec. I mourn cpr-exec mode. > > Which begs the question, do we really need to allow migration parameters to be set > in the dest monitor when using cpr? CPR is a very restricted mode of migration. > Let me discuss this with Peter. The migration QAPI design has always felt rather odd to me, in that we have perfectly good commands "migrate" & "migrate-incoming" that are able to accept an arbitrary list of parameters when invoked. Instead of passing parameters to them though, we instead require apps use the separate migreate-set-parameters/capabiltiies commands many times over to set global variables which the later 'migrate' command then uses. The reason for this is essentially a historical mistake - we copied the way we did it from HMP, which was this way because HMP was bad at supporting arbitrary customizable paramters to commands. I wish we hadn't copied this design over to QMP. To bring it back on topic, we need QMP on the dest to set parameters, because -incoming was limited to only take the URI. If the "migrate-incoming" command accepted all parameters directly, then we could use QAPI visitor to usupport a "-incoming ..." command that took an arbitrary JSON document and turned it into a call to "migrate-incoming". With that we would never need QMP on the target for cpr-exec, avoiding this ordering poblem you're facing....assuming we put processing of -incoming at the right point in the code flow Can we fix this design and expose the full configurability on the CLI using QAPI schema & inline JSON, like we do for other QAPI-ified CLI args. It seems entirely practical to me to add parameters to 'migrate-incoming' in a backwards compatible manner and deprecate set-parameters/capabilities With regards, Daniel
On 10/25/2024 9:43 AM, Daniel P. Berrangé wrote: > On Fri, Oct 25, 2024 at 09:33:51AM -0400, Steven Sistare wrote: >> On 10/25/2024 4:46 AM, Daniel P. Berrangé wrote: >>> On Thu, Oct 24, 2024 at 05:16:14PM -0400, Steven Sistare wrote: >>>> >>>> Regarding: "what you want is effectively to execute monitor commands >>>> from the migration stream" >>>> >>>> That is not the goal of this series. It could be someone else's goal, when >>>> fully developing a precreate phase, and in that context I understand and >>>> agree with your comments. I have a narrower immediate problem to solve, >>>> however. >>>> >>>> For CPR, src qemu sends file descriptors to dst qemu using SCM_RIGHTS over >>>> a dedicated channel, then src qemu sends migration state over the normal >>>> migration channel. >>>> >>>> Dst qemu reads the fds early, then calls the backend and device creation >>>> functions which use them. Dst qemu then accepts and reads the migration >>>> channel. >>>> >>>> We need a way to send monitor commands that set dst migration capabilities, >>>> before src qemu starts the migration. Hence the dst cannot proceed to >>>> backend and device creation because the src has not sent fd's yet. Hence >>>> we need a dst monitor before device creation. The precreate phase does that. >>> >>> Sigh, what we obviously need here, is what we've always talked about as our >>> long term design goal: >>> >>> A way to launch QEMU with the CLI only specifying the QMP socket, and every >>> other config aspect done by issuing QMP commands, which are processed in the >>> order the mgmt app sends them, so QEMU hasn't have to hardcode processing >>> of different pieces in different phases. >>> >>> Anything that isn't that, is piling more hacks on top of our existing >>> mountain of hacks. That's OK if it does something useful as a side effect >>> that moves us incrementally closer towards that desired end goal. >>> >>>> Regarding: "This series makes this much more complex." >>>> >>>> I could simplify it if I abandon CPR for chardevs. Then qemu_create_early_backends >>>> and other early dependencies can remain as is. I would drop the notion of >>>> a precreate phase, and instead leverage the preconfig phase. I would move >>>> qemu_create_late_backends, and a small part at the end of qemu_init, to >>>> qmp_x_exit_preconfig. >>> >>> Is CPR still going to useful enough in the real world if you drop chardev >>> support ? Every VM has at least one chardev for a serial device doesn't >>> it, and often more since we wire chardevs into all kinds of places. >> >> CPR for chardev is not as useful for cpr-transfer mode because the mgmt layer already >> knows how to create and manage new connections to dest qemu, as it would for normal >> migration. >> >> CPR for chardev is very useful for cpr-exec mode. And cpr-exec mode does not need any >> of these monitor patches, because old qemu exec's new qemu, and they are never active >> at the same time. One must completely specify the migration using src qemu before >> initiating the exec. I mourn cpr-exec mode. >> >> Which begs the question, do we really need to allow migration parameters to be set >> in the dest monitor when using cpr? CPR is a very restricted mode of migration. >> Let me discuss this with Peter. > > The migration QAPI design has always felt rather odd to me, in that we > have perfectly good commands "migrate" & "migrate-incoming" that are able > to accept an arbitrary list of parameters when invoked. Instead of passing > parameters to them though, we instead require apps use the separate > migreate-set-parameters/capabiltiies commands many times over to set > global variables which the later 'migrate' command then uses. > > The reason for this is essentially a historical mistake - we copied the > way we did it from HMP, which was this way because HMP was bad at supporting > arbitrary customizable paramters to commands. I wish we hadn't copied this > design over to QMP. > > To bring it back on topic, we need QMP on the dest to set parameters, > because -incoming was limited to only take the URI. > > If the "migrate-incoming" command accepted all parameters directly, > then we could use QAPI visitor to usupport a "-incoming ..." command > that took an arbitrary JSON document and turned it into a call to > "migrate-incoming". > > With that we would never need QMP on the target for cpr-exec, avoiding > this ordering poblem you're facing....assuming we put processing of > -incoming at the right point in the code flow > > Can we fix this design and expose the full configurability on the > CLI using QAPI schema & inline JSON, like we do for other QAPI-ified > CLI args. > > It seems entirely practical to me to add parameters to 'migrate-incoming' > in a backwards compatible manner and deprecate set-parameters/capabilities Hi Daniel, should we ever need to set caps or parameters for CPR, that sounds like a good way forward. And a good idea independently of CPR. However, I am hoping to proceed with CPR with the initial restriction that one cannot set them. The case that motivated my exploration of precreate is artificial -- qtest wanting to enable migration events -- and I can fix that. I know of no real cases where caps must be set for CPR. The other screw case which motivated this thread is a dynamically chosen TCP port number for the migration listen socket. One must query dest qemu to get it. Your suggestions here for new incoming syntax would not help. However, for CPR, we always migrate to the same host, so a unix domain socket can be used. - Steve
On Fri, Oct 25, 2024 at 10:32:15AM -0400, Steven Sistare wrote: > On 10/25/2024 9:43 AM, Daniel P. Berrangé wrote: > > On Fri, Oct 25, 2024 at 09:33:51AM -0400, Steven Sistare wrote: > > > On 10/25/2024 4:46 AM, Daniel P. Berrangé wrote: > > > > On Thu, Oct 24, 2024 at 05:16:14PM -0400, Steven Sistare wrote: > > > > > > > > > > Regarding: "what you want is effectively to execute monitor commands > > > > > from the migration stream" > > > > > > > > > > That is not the goal of this series. It could be someone else's goal, when > > > > > fully developing a precreate phase, and in that context I understand and > > > > > agree with your comments. I have a narrower immediate problem to solve, > > > > > however. > > > > > > > > > > For CPR, src qemu sends file descriptors to dst qemu using SCM_RIGHTS over > > > > > a dedicated channel, then src qemu sends migration state over the normal > > > > > migration channel. > > > > > > > > > > Dst qemu reads the fds early, then calls the backend and device creation > > > > > functions which use them. Dst qemu then accepts and reads the migration > > > > > channel. > > > > > > > > > > We need a way to send monitor commands that set dst migration capabilities, > > > > > before src qemu starts the migration. Hence the dst cannot proceed to > > > > > backend and device creation because the src has not sent fd's yet. Hence > > > > > we need a dst monitor before device creation. The precreate phase does that. > > > > > > > > Sigh, what we obviously need here, is what we've always talked about as our > > > > long term design goal: > > > > > > > > A way to launch QEMU with the CLI only specifying the QMP socket, and every > > > > other config aspect done by issuing QMP commands, which are processed in the > > > > order the mgmt app sends them, so QEMU hasn't have to hardcode processing > > > > of different pieces in different phases. > > > > > > > > Anything that isn't that, is piling more hacks on top of our existing > > > > mountain of hacks. That's OK if it does something useful as a side effect > > > > that moves us incrementally closer towards that desired end goal. > > > > > > > > > Regarding: "This series makes this much more complex." > > > > > > > > > > I could simplify it if I abandon CPR for chardevs. Then qemu_create_early_backends > > > > > and other early dependencies can remain as is. I would drop the notion of > > > > > a precreate phase, and instead leverage the preconfig phase. I would move > > > > > qemu_create_late_backends, and a small part at the end of qemu_init, to > > > > > qmp_x_exit_preconfig. > > > > > > > > Is CPR still going to useful enough in the real world if you drop chardev > > > > support ? Every VM has at least one chardev for a serial device doesn't > > > > it, and often more since we wire chardevs into all kinds of places. > > > > > > CPR for chardev is not as useful for cpr-transfer mode because the mgmt layer already > > > knows how to create and manage new connections to dest qemu, as it would for normal > > > migration. > > > > > > CPR for chardev is very useful for cpr-exec mode. And cpr-exec mode does not need any > > > of these monitor patches, because old qemu exec's new qemu, and they are never active > > > at the same time. One must completely specify the migration using src qemu before > > > initiating the exec. I mourn cpr-exec mode. > > > > > > Which begs the question, do we really need to allow migration parameters to be set > > > in the dest monitor when using cpr? CPR is a very restricted mode of migration. > > > Let me discuss this with Peter. > > > > The migration QAPI design has always felt rather odd to me, in that we > > have perfectly good commands "migrate" & "migrate-incoming" that are able > > to accept an arbitrary list of parameters when invoked. Instead of passing > > parameters to them though, we instead require apps use the separate > > migreate-set-parameters/capabiltiies commands many times over to set > > global variables which the later 'migrate' command then uses. > > > > The reason for this is essentially a historical mistake - we copied the > > way we did it from HMP, which was this way because HMP was bad at supporting > > arbitrary customizable paramters to commands. I wish we hadn't copied this > > design over to QMP. > > > > To bring it back on topic, we need QMP on the dest to set parameters, > > because -incoming was limited to only take the URI. > > > > If the "migrate-incoming" command accepted all parameters directly, > > then we could use QAPI visitor to usupport a "-incoming ..." command > > that took an arbitrary JSON document and turned it into a call to > > "migrate-incoming". > > > > With that we would never need QMP on the target for cpr-exec, avoiding > > this ordering poblem you're facing....assuming we put processing of > > -incoming at the right point in the code flow > > > > Can we fix this design and expose the full configurability on the > > CLI using QAPI schema & inline JSON, like we do for other QAPI-ified > > CLI args. > > > > It seems entirely practical to me to add parameters to 'migrate-incoming' > > in a backwards compatible manner and deprecate set-parameters/capabilities > > Hi Daniel, should we ever need to set caps or parameters for CPR, that sounds > like a good way forward. And a good idea independently of CPR. However, I > am hoping to proceed with CPR with the initial restriction that one cannot > set them. The case that motivated my exploration of precreate is artificial -- > qtest wanting to enable migration events -- and I can fix that. I know of no > real cases where caps must be set for CPR. > > The other screw case which motivated this thread is a dynamically chosen TCP > port number for the migration listen socket. One must query dest qemu to get it. > Your suggestions here for new incoming syntax would not help. However, for CPR, > we always migrate to the same host, so a unix domain socket can be used. FWIW, at least in libvirt usage, IIRC, libvirt will choose the listener port number upfront itself With regards, Daniel
On Fri, Oct 25, 2024 at 02:43:06PM +0100, Daniel P. Berrangé wrote: > On Fri, Oct 25, 2024 at 09:33:51AM -0400, Steven Sistare wrote: > > On 10/25/2024 4:46 AM, Daniel P. Berrangé wrote: > > > On Thu, Oct 24, 2024 at 05:16:14PM -0400, Steven Sistare wrote: > > > > > > > > Regarding: "what you want is effectively to execute monitor commands > > > > from the migration stream" > > > > > > > > That is not the goal of this series. It could be someone else's goal, when > > > > fully developing a precreate phase, and in that context I understand and > > > > agree with your comments. I have a narrower immediate problem to solve, > > > > however. > > > > > > > > For CPR, src qemu sends file descriptors to dst qemu using SCM_RIGHTS over > > > > a dedicated channel, then src qemu sends migration state over the normal > > > > migration channel. > > > > > > > > Dst qemu reads the fds early, then calls the backend and device creation > > > > functions which use them. Dst qemu then accepts and reads the migration > > > > channel. > > > > > > > > We need a way to send monitor commands that set dst migration capabilities, > > > > before src qemu starts the migration. Hence the dst cannot proceed to > > > > backend and device creation because the src has not sent fd's yet. Hence > > > > we need a dst monitor before device creation. The precreate phase does that. > > > > > > Sigh, what we obviously need here, is what we've always talked about as our > > > long term design goal: > > > > > > A way to launch QEMU with the CLI only specifying the QMP socket, and every > > > other config aspect done by issuing QMP commands, which are processed in the > > > order the mgmt app sends them, so QEMU hasn't have to hardcode processing > > > of different pieces in different phases. > > > > > > Anything that isn't that, is piling more hacks on top of our existing > > > mountain of hacks. That's OK if it does something useful as a side effect > > > that moves us incrementally closer towards that desired end goal. > > > > > > > Regarding: "This series makes this much more complex." > > > > > > > > I could simplify it if I abandon CPR for chardevs. Then qemu_create_early_backends > > > > and other early dependencies can remain as is. I would drop the notion of > > > > a precreate phase, and instead leverage the preconfig phase. I would move > > > > qemu_create_late_backends, and a small part at the end of qemu_init, to > > > > qmp_x_exit_preconfig. > > > > > > Is CPR still going to useful enough in the real world if you drop chardev > > > support ? Every VM has at least one chardev for a serial device doesn't > > > it, and often more since we wire chardevs into all kinds of places. > > > > CPR for chardev is not as useful for cpr-transfer mode because the mgmt layer already > > knows how to create and manage new connections to dest qemu, as it would for normal > > migration. > > > > CPR for chardev is very useful for cpr-exec mode. And cpr-exec mode does not need any > > of these monitor patches, because old qemu exec's new qemu, and they are never active > > at the same time. One must completely specify the migration using src qemu before > > initiating the exec. I mourn cpr-exec mode. > > > > Which begs the question, do we really need to allow migration parameters to be set > > in the dest monitor when using cpr? CPR is a very restricted mode of migration. > > Let me discuss this with Peter. > > The migration QAPI design has always felt rather odd to me, in that we > have perfectly good commands "migrate" & "migrate-incoming" that are able > to accept an arbitrary list of parameters when invoked. Instead of passing > parameters to them though, we instead require apps use the separate > migreate-set-parameters/capabiltiies commands many times over to set > global variables which the later 'migrate' command then uses. Just to mention, we will still need some special parameters that can change during migration, like max-bandwidth, max-downtime etc. So not all of them can be made into "migrate"/"migrate-incoming" arguments. > > The reason for this is essentially a historical mistake - we copied the > way we did it from HMP, which was this way because HMP was bad at supporting > arbitrary customizable paramters to commands. I wish we hadn't copied this > design over to QMP. > > To bring it back on topic, we need QMP on the dest to set parameters, > because -incoming was limited to only take the URI. > > If the "migrate-incoming" command accepted all parameters directly, > then we could use QAPI visitor to usupport a "-incoming ..." command > that took an arbitrary JSON document and turned it into a call to > "migrate-incoming". > > With that we would never need QMP on the target for cpr-exec, avoiding > this ordering poblem you're facing....assuming we put processing of > -incoming at the right point in the code flow > > Can we fix this design and expose the full configurability on the > CLI using QAPI schema & inline JSON, like we do for other QAPI-ified > CLI args. > > It seems entirely practical to me to add parameters to 'migrate-incoming' > in a backwards compatible manner and deprecate set-parameters/capabilities Fabiano started working on handshake recently. After that is ready, we should logically also achieve similar goal at least for dest qemu so no cap/parameter is ever needed there. Thanks,
On Mon, Oct 28, 2024 at 05:56:10PM -0400, Peter Xu wrote: > On Fri, Oct 25, 2024 at 02:43:06PM +0100, Daniel P. Berrangé wrote: > > On Fri, Oct 25, 2024 at 09:33:51AM -0400, Steven Sistare wrote: > > > On 10/25/2024 4:46 AM, Daniel P. Berrangé wrote: > > > > On Thu, Oct 24, 2024 at 05:16:14PM -0400, Steven Sistare wrote: > > > > > > > > > > Regarding: "what you want is effectively to execute monitor commands > > > > > from the migration stream" > > > > > > > > > > That is not the goal of this series. It could be someone else's goal, when > > > > > fully developing a precreate phase, and in that context I understand and > > > > > agree with your comments. I have a narrower immediate problem to solve, > > > > > however. > > > > > > > > > > For CPR, src qemu sends file descriptors to dst qemu using SCM_RIGHTS over > > > > > a dedicated channel, then src qemu sends migration state over the normal > > > > > migration channel. > > > > > > > > > > Dst qemu reads the fds early, then calls the backend and device creation > > > > > functions which use them. Dst qemu then accepts and reads the migration > > > > > channel. > > > > > > > > > > We need a way to send monitor commands that set dst migration capabilities, > > > > > before src qemu starts the migration. Hence the dst cannot proceed to > > > > > backend and device creation because the src has not sent fd's yet. Hence > > > > > we need a dst monitor before device creation. The precreate phase does that. > > > > > > > > Sigh, what we obviously need here, is what we've always talked about as our > > > > long term design goal: > > > > > > > > A way to launch QEMU with the CLI only specifying the QMP socket, and every > > > > other config aspect done by issuing QMP commands, which are processed in the > > > > order the mgmt app sends them, so QEMU hasn't have to hardcode processing > > > > of different pieces in different phases. > > > > > > > > Anything that isn't that, is piling more hacks on top of our existing > > > > mountain of hacks. That's OK if it does something useful as a side effect > > > > that moves us incrementally closer towards that desired end goal. > > > > > > > > > Regarding: "This series makes this much more complex." > > > > > > > > > > I could simplify it if I abandon CPR for chardevs. Then qemu_create_early_backends > > > > > and other early dependencies can remain as is. I would drop the notion of > > > > > a precreate phase, and instead leverage the preconfig phase. I would move > > > > > qemu_create_late_backends, and a small part at the end of qemu_init, to > > > > > qmp_x_exit_preconfig. > > > > > > > > Is CPR still going to useful enough in the real world if you drop chardev > > > > support ? Every VM has at least one chardev for a serial device doesn't > > > > it, and often more since we wire chardevs into all kinds of places. > > > > > > CPR for chardev is not as useful for cpr-transfer mode because the mgmt layer already > > > knows how to create and manage new connections to dest qemu, as it would for normal > > > migration. > > > > > > CPR for chardev is very useful for cpr-exec mode. And cpr-exec mode does not need any > > > of these monitor patches, because old qemu exec's new qemu, and they are never active > > > at the same time. One must completely specify the migration using src qemu before > > > initiating the exec. I mourn cpr-exec mode. > > > > > > Which begs the question, do we really need to allow migration parameters to be set > > > in the dest monitor when using cpr? CPR is a very restricted mode of migration. > > > Let me discuss this with Peter. > > > > The migration QAPI design has always felt rather odd to me, in that we > > have perfectly good commands "migrate" & "migrate-incoming" that are able > > to accept an arbitrary list of parameters when invoked. Instead of passing > > parameters to them though, we instead require apps use the separate > > migreate-set-parameters/capabiltiies commands many times over to set > > global variables which the later 'migrate' command then uses. > > Just to mention, we will still need some special parameters that can change > during migration, like max-bandwidth, max-downtime etc. So not all of them > can be made into "migrate"/"migrate-incoming" arguments. I guess we can leave migrate-set-parameters with the sub-set of parameters needed at runtime, or have a 'migrate-update' command for those, to make it clear which are valid to set at runtime, and which are not valid at initial start. With regards, Daniel
On Fri, Oct 25, 2024 at 02:43:06PM +0100, Daniel P. Berrangé wrote: > > The migration QAPI design has always felt rather odd to me, in that we > have perfectly good commands "migrate" & "migrate-incoming" that are able > to accept an arbitrary list of parameters when invoked. Instead of passing > parameters to them though, we instead require apps use the separate > migreate-set-parameters/capabiltiies commands many times over to set > global variables which the later 'migrate' command then uses. > > The reason for this is essentially a historical mistake - we copied the > way we did it from HMP, which was this way because HMP was bad at supporting > arbitrary customizable paramters to commands. I wish we hadn't copied this > design over to QMP. > > To bring it back on topic, we need QMP on the dest to set parameters, > because -incoming was limited to only take the URI. > > If the "migrate-incoming" command accepted all parameters directly, > then we could use QAPI visitor to usupport a "-incoming ..." command > that took an arbitrary JSON document and turned it into a call to > "migrate-incoming". > > With that we would never need QMP on the target for cpr-exec, avoiding > this ordering poblem you're facing....assuming we put processing of > -incoming at the right point in the code flow > > Can we fix this design and expose the full configurability on the > CLI using QAPI schema & inline JSON, like we do for other QAPI-ified > CLI args. > > It seems entirely practical to me to add parameters to 'migrate-incoming' > in a backwards compatible manner and deprecate set-parameters/capabilities Incidentally, if we were going to evolve the migration API at all, then it probably ought to start making use of the async job infrastructure we have available. This is use by block jobs, and by the internal snapshot commands, and was intended to be used for any case where we had a long running operation triggered by a command. Migration was a poster-child example of what its intended for, but was left alone when we first introduced the job APIs. The 'job-cancel' API would obsolete 'migrate-cancel'. The other interestnig thing is that the job framework creates a well defined lifecycle for a job, that allows querying information about the job after completeion, but without QEMU having to keep that info around forever. ie once a job has finished, an app can query info about completion, and when it no longer needs that info, it can call 'job-dismiss' to tell QEMU to discard it. If "MigrationState" were associated a job, then it would thus have a clear 'creation' and 'deletion' time. With regards, Daniel
Daniel P. Berrangé <berrange@redhat.com> writes: > On Fri, Oct 25, 2024 at 02:43:06PM +0100, Daniel P. Berrangé wrote: >> >> The migration QAPI design has always felt rather odd to me, in that we >> have perfectly good commands "migrate" & "migrate-incoming" that are able >> to accept an arbitrary list of parameters when invoked. Instead of passing >> parameters to them though, we instead require apps use the separate >> migreate-set-parameters/capabiltiies commands many times over to set >> global variables which the later 'migrate' command then uses. >> >> The reason for this is essentially a historical mistake - we copied the >> way we did it from HMP, which was this way because HMP was bad at supporting >> arbitrary customizable paramters to commands. I wish we hadn't copied this >> design over to QMP. >> >> To bring it back on topic, we need QMP on the dest to set parameters, >> because -incoming was limited to only take the URI. >> >> If the "migrate-incoming" command accepted all parameters directly, >> then we could use QAPI visitor to usupport a "-incoming ..." command >> that took an arbitrary JSON document and turned it into a call to >> "migrate-incoming". >> >> With that we would never need QMP on the target for cpr-exec, avoiding >> this ordering poblem you're facing....assuming we put processing of >> -incoming at the right point in the code flow >> >> Can we fix this design and expose the full configurability on the >> CLI using QAPI schema & inline JSON, like we do for other QAPI-ified >> CLI args. >> >> It seems entirely practical to me to add parameters to 'migrate-incoming' >> in a backwards compatible manner and deprecate set-parameters/capabilities > > Incidentally, if we were going to evolve the migration API at all, then > it probably ought to start making use of the async job infrastructure > we have available. This is use by block jobs, and by the internal snapshot I'm all for standardization on core infrastructure, but unfortunately putting migration in a coroutine would open a can of worms. In fact, we've been discussing about moving the incoming side out of coroutines for a while. > commands, and was intended to be used for any case where we had a long > running operation triggered by a command. Migration was a poster-child > example of what its intended for, but was left alone when we first > introduced the job APIs. > > The 'job-cancel' API would obsolete 'migrate-cancel'. > > The other interestnig thing is that the job framework creates a well > defined lifecycle for a job, that allows querying information about > the job after completeion, but without QEMU having to keep that info > around forever. ie once a job has finished, an app can query info > about completion, and when it no longer needs that info, it can > call 'job-dismiss' to tell QEMU to discard it. > > If "MigrationState" were associated a job, then it would thus have a > clear 'creation' and 'deletion' time. > > With regards, > Daniel
On Tue, Oct 29, 2024 at 10:20:24AM -0300, Fabiano Rosas wrote: > Daniel P. Berrangé <berrange@redhat.com> writes: > > > On Fri, Oct 25, 2024 at 02:43:06PM +0100, Daniel P. Berrangé wrote: > >> > >> The migration QAPI design has always felt rather odd to me, in that we > >> have perfectly good commands "migrate" & "migrate-incoming" that are able > >> to accept an arbitrary list of parameters when invoked. Instead of passing > >> parameters to them though, we instead require apps use the separate > >> migreate-set-parameters/capabiltiies commands many times over to set > >> global variables which the later 'migrate' command then uses. > >> > >> The reason for this is essentially a historical mistake - we copied the > >> way we did it from HMP, which was this way because HMP was bad at supporting > >> arbitrary customizable paramters to commands. I wish we hadn't copied this > >> design over to QMP. > >> > >> To bring it back on topic, we need QMP on the dest to set parameters, > >> because -incoming was limited to only take the URI. > >> > >> If the "migrate-incoming" command accepted all parameters directly, > >> then we could use QAPI visitor to usupport a "-incoming ..." command > >> that took an arbitrary JSON document and turned it into a call to > >> "migrate-incoming". > >> > >> With that we would never need QMP on the target for cpr-exec, avoiding > >> this ordering poblem you're facing....assuming we put processing of > >> -incoming at the right point in the code flow > >> > >> Can we fix this design and expose the full configurability on the > >> CLI using QAPI schema & inline JSON, like we do for other QAPI-ified > >> CLI args. > >> > >> It seems entirely practical to me to add parameters to 'migrate-incoming' > >> in a backwards compatible manner and deprecate set-parameters/capabilities > > > > Incidentally, if we were going to evolve the migration API at all, then > > it probably ought to start making use of the async job infrastructure > > we have available. This is use by block jobs, and by the internal snapshot > > I'm all for standardization on core infrastructure, but unfortunately > putting migration in a coroutine would open a can of worms. In fact, > we've been discussing about moving the incoming side out of coroutines > for a while. Yes, I share the same concern. I think migration decided to go already with as much thread model as possible that it can. And I paused that attempt to move load() into a thread, as of now, finding it's still non-trivial to work out the major issue: after dest load became a thread, it means it can't take BQL for too long otherwise it blocks the monitor. It means we can't take bql for _any_ IO operation because it can stuck at any IO waiting for the iochannel / NIC, aka, any qemu_get*() API invoked. Meanwhile we still trivially need the bql from time to time, either in pre_load() / post_load(), or some of VMStateInfo->get(). But still I can't blindly take them, as in any VMStateInfo->get(), it can invoke qemu_get*() itself. So I am not sure whether what we can get from that is worthwhile yet on the effort to make it work.. > > > commands, and was intended to be used for any case where we had a long > > running operation triggered by a command. Migration was a poster-child > > example of what its intended for, but was left alone when we first > > introduced the job APIs. > > > > The 'job-cancel' API would obsolete 'migrate-cancel'. > > > > The other interestnig thing is that the job framework creates a well > > defined lifecycle for a job, that allows querying information about > > the job after completeion, but without QEMU having to keep that info > > around forever. ie once a job has finished, an app can query info > > about completion, and when it no longer needs that info, it can > > call 'job-dismiss' to tell QEMU to discard it. > > > > If "MigrationState" were associated a job, then it would thus have a > > clear 'creation' and 'deletion' time. It'll face the same challenge here on whether we can join() in the main thread. IOW, job-cancel can take time which can also potentially block qemu from quitting fast.
On Tue, Oct 29, 2024 at 10:20:24AM -0300, Fabiano Rosas wrote: > Daniel P. Berrangé <berrange@redhat.com> writes: > > > On Fri, Oct 25, 2024 at 02:43:06PM +0100, Daniel P. Berrangé wrote: > >> > >> The migration QAPI design has always felt rather odd to me, in that we > >> have perfectly good commands "migrate" & "migrate-incoming" that are able > >> to accept an arbitrary list of parameters when invoked. Instead of passing > >> parameters to them though, we instead require apps use the separate > >> migreate-set-parameters/capabiltiies commands many times over to set > >> global variables which the later 'migrate' command then uses. > >> > >> The reason for this is essentially a historical mistake - we copied the > >> way we did it from HMP, which was this way because HMP was bad at supporting > >> arbitrary customizable paramters to commands. I wish we hadn't copied this > >> design over to QMP. > >> > >> To bring it back on topic, we need QMP on the dest to set parameters, > >> because -incoming was limited to only take the URI. > >> > >> If the "migrate-incoming" command accepted all parameters directly, > >> then we could use QAPI visitor to usupport a "-incoming ..." command > >> that took an arbitrary JSON document and turned it into a call to > >> "migrate-incoming". > >> > >> With that we would never need QMP on the target for cpr-exec, avoiding > >> this ordering poblem you're facing....assuming we put processing of > >> -incoming at the right point in the code flow > >> > >> Can we fix this design and expose the full configurability on the > >> CLI using QAPI schema & inline JSON, like we do for other QAPI-ified > >> CLI args. > >> > >> It seems entirely practical to me to add parameters to 'migrate-incoming' > >> in a backwards compatible manner and deprecate set-parameters/capabilities > > > > Incidentally, if we were going to evolve the migration API at all, then > > it probably ought to start making use of the async job infrastructure > > we have available. This is use by block jobs, and by the internal snapshot > > I'm all for standardization on core infrastructure, but unfortunately > putting migration in a coroutine would open a can of worms. In fact, > we've been discussing about moving the incoming side out of coroutines > for a while. Yeah, I can understand that. The job API at the QMP level has no association with coroutines. It simply expresses a way to handle long running background jobs in a standard manner. The dependency on coroutines is purely the internal job APIs. I wonder if it is at all practical to alter the job APIs to allow migration to use them as it exists today, as the migration code already is quite capable to running in the background, without adding more coroutine usage. It would be quite annoying if our general purpose QMP API cannot be used because internal only impl limitations :-( With regards, Daniel