diff mbox series

qmp: Stabilize preconfig

Message ID b31f442d28920447690a6b8cee865bdbacde1283.1635160056.git.mprivozn@redhat.com (mailing list archive)
State New, archived
Headers show
Series qmp: Stabilize preconfig | expand

Commit Message

Michal Privoznik Oct. 25, 2021, 11:08 a.m. UTC
The -preconfig option and exit-preconfig command are around for
quite some time now. However, they are still marked as unstable.
This is suboptimal because it may block some upper layer in
consuming it. In this specific case - Libvirt avoids using
experimental features.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 docs/system/managed-startup.rst | 2 +-
 monitor/hmp-cmds.c              | 2 +-
 qapi/misc.json                  | 6 +++---
 qemu-options.hx                 | 5 ++---
 softmmu/vl.c                    | 4 ++--
 tests/qtest/numa-test.c         | 8 ++++----
 tests/qtest/qmp-test.c          | 6 +++---
 7 files changed, 16 insertions(+), 17 deletions(-)

Comments

Markus Armbruster Oct. 25, 2021, 12:19 p.m. UTC | #1
Michal Privoznik <mprivozn@redhat.com> writes:

> The -preconfig option and exit-preconfig command are around for
> quite some time now. However, they are still marked as unstable.
> This is suboptimal because it may block some upper layer in
> consuming it. In this specific case - Libvirt avoids using
> experimental features.
>
> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>

If I remember correctly, the motivation for -preconfig was NUMA
configuration via QMP.  More uses may have appeared since.

Back then, I questioned the need for yet another option and yet another
state: why not -S?

The answer boiled down to

0. Yes, having just one would be a simpler and cleaner interface, but

1. the godawful mess QEMU startup has become makes -S unsuitable for
   some things we want to do, so we need -preconfig,

2. which is in turn unsuitable for other things we want to do, so we
   still need -S".

3. Cleaning up the mess to the point where "simpler and cleaner" becomes
   viable again is not in the cards right now.

Details in the sub-thread
    Message-ID: <87zi21apkh.fsf@dusky.pond.sub.org>
    https://lists.gnu.org/archive/html/qemu-devel/2018-04/msg02447.html

My conclusion was

    My gut's reaction to preconfig is as nauseous as ever.  It adds
    complexity we wouldn't want or need with sanely structured program
    startup.  It might still be the best we can do, and something we need to
    do.  You have to take on technical debt sometimes.  To know for sure
    this is the case here, we'd have to explore alternatives more seriously.
    Like repaying some of the existing technical debt around there.  Sadly,
    I lack the time to take on such a project.

    Since I lack the money to put where my mouth is, I'm going to shut up
    and get out of your way.

    Message-ID: <87tvqx12no.fsf@dusky.pond.sub.org>
    https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg05584.html

However, we made it experimental:

commit 361ac948a5c960ce7a093cec1744bff0d5af3dec
Author: Markus Armbruster <armbru@redhat.com>
Date:   Thu Jul 5 11:14:02 2018 +0200

    cli qmp: Mark --preconfig, exit-preconfig experimental
    
    Committing to the current --preconfig / exit-preconfig interface
    before it has seen any use is premature.  Mark both as experimental,
    the former in documentation, the latter by renaming it to
    x-exit-preconfig.
    
    See the previous commit for more detailed rationale.
    
    Signed-off-by: Markus Armbruster <armbru@redhat.com>
    Message-Id: <20180705091402.26244-3-armbru@redhat.com>
    Reviewed-by: Eric Blake <eblake@redhat.com>
    Acked-by: Eduardo Habkost <ehabkost@redhat.com>
    Acked-by: Igor Mammedov <imammedo@redhat.com>
    [Straightforward conflict with commit 514337c142f resolved]

commit 1f214ee1b83afd10fd5e1b63f4ecc03f9a8115c4
Author: Markus Armbruster <armbru@redhat.com>
Date:   Thu Jul 5 11:14:01 2018 +0200

    qapi: Do not expose "allow-preconfig" in query-qmp-schema
    
    According to commit 047f7038f58, option --preconfig
    
        [...] allows pausing QEMU in the new RUN_STATE_PRECONFIG state,
        allowing the configuration of QEMU from QMP before the machine
        jumps into board initialization code of machine_run_board_init()
    
        The intent is to allow management to query machine state and
        additionally configure it using previous query results within one
        QEMU instance (i.e. eliminate the need to start QEMU twice, 1st to
        query board specific parameters and 2nd for actual VM start using
        query results for additional parameters).
    
    The implementation is a bit of a hack: it splices in an additional
    main loop before machine creation, in special runstate preconfig.  New
    command exit-preconfig exits that main loop.  QEMU continues
    initializing, creates the machine, and runs the good old main loop.
    The replacement of the main loop is transparent to monitors.
    
    Sadly, some commands expect initialization to be complete.  Running
    them in --preconfig's main loop violates their preconditions.  Since
    we don't really know which commands are safe, we use a whitelist.
    This drags the concept of run state into the QMP core.
    
    The whitelist is done as a command flag in the QAPI schema (commit
    d6fe3d02e9a).  Drags the concept of run state further into the QAPI
    language.
    
    The command flag is exposed in query-qmp-schema (also commit
    d6fe3d02e9a).  This makes it ABI.
    
    I consider the whole thing an offensively ugly hack, but sometimes an
    ugly hack is the best we can do to solve a problem people have.
    
    The need described by the commit message quote above is genuine.  The
    proper solution would be a main loop that permits complete
    configuration via QMP.  This is out of reach, thus the hack.
    
    However, even though the need is genuine, it isn't urgent: libvirt is
    not going to use this anytime soon.  Baking a hack into ABI before it
    has any users is a bad idea.
    
    This commit reverts the parts of commit d6fe3d02e9a that affect ABI
    via query-qmp-schema.  The commit did the following:
    
    (1) Add command flag 'allow-preconfig' to the QAPI schema language
    
    (2) Pass it to code generators
    
    (3) Have the commands.py code generator pass it to the command
        registry (so commit 047f7038f58 can use it as whitelist)
    
    (4) Add 'allow-preconfig' to SchemaInfoCommand (neglecting to update
        qapi-code-gen.txt section "Client JSON Protocol introspection")
    
    (5) Set 'allow-preconfig': true for commands qmp_capabilities,
        query-commands, query-command-line-options, query-status
    
    Revert exactly (4), plus a bit of documentation added to
    qemu-tech.info in commit 047f7038f58.
    
    Shrinks query-qmp-schema's output from 126.5KiB to 121.8KiB for me.
    
    Signed-off-by: Markus Armbruster <armbru@redhat.com>
    Message-Id: <20180705091402.26244-2-armbru@redhat.com>
    Reviewed-by: Eric Blake <eblake@redhat.com>
    Acked-by: Eduardo Habkost <ehabkost@redhat.com>
    Acked-by: Igor Mammedov <imammedo@redhat.com>
    [Straightforward conflict with commit d626b6c1ae7 resolved]

Before we make it a stable interface, we should ponder:

* Is cleaning up the mess to the point where "simpler and cleaner"
  becomes viable again still impractical?

* If yes, what are the chances of it becoming practical?
Paolo Bonzini Oct. 25, 2021, 5:01 p.m. UTC | #2
On 25/10/21 14:19, Markus Armbruster wrote:
> Michal Privoznik <mprivozn@redhat.com> writes:
> 
>> The -preconfig option and exit-preconfig command are around for
>> quite some time now. However, they are still marked as unstable.
>> This is suboptimal because it may block some upper layer in
>> consuming it. In this specific case - Libvirt avoids using
>> experimental features.
>>
>> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> 
> If I remember correctly, the motivation for -preconfig was NUMA
> configuration via QMP.  More uses may have appeared since.
> 
> Back then, I questioned the need for yet another option and yet another
> state: why not -S?
> 
> The answer boiled down to
> 
> 0. Yes, having just one would be a simpler and cleaner interface, but
> 
> 1. the godawful mess QEMU startup has become makes -S unsuitable for
>     some things we want to do, so we need -preconfig,
> 
> 2. which is in turn unsuitable for other things we want to do, so we
>     still need -S".
> 
> 3. Cleaning up the mess to the point where "simpler and cleaner" becomes
>     viable again is not in the cards right now.

Some things have changed since then:

* The preconfig implementation is much, much better than it used to be. 
  There's no preconfig runstate anymore and QEMU effectively always 
starts in preconfig mode; it simply invokes x-exit-preconfig if 
preconfig is not specified.  This also made it sensible to add a lot 
make more commands to allow_preconfig.

* On the other hand, preconfig still does not support all machine 
initialization, and especially it does not support device-add (so 
libvirt could not, for example, remove all of its 
-blockdev/-netdev/-device code in favor of QMP).  And -machine supports 
compound options (albeit not JSON yet; see commit d8fb7d0969, "vl: 
switch -M parsing to keyval", 2021-07-06) which those could be used for 
NUMA.  This makes preconfig mode much less interesting compared to a 
QMP-only QEMU executable.

If we decide that preconfig is the way to go, I would still not 
stabilize it in its current form, and would do a couple aesthetic 
adjustments before: 1) make loadvm, cont and migrate-incoming exit 
preconfig, 2) add a new command finish-machine-init be the equivalent of 
"exit preconfig and stay paused", 3) make -S implicit if -preconfig is 
specified.

As an aside, I agree with the original decision not to expose 
allow-preconfig in query-qmp-schema.  Originally only a couple commands 
were exposed in preconfig mode; these days, a command should simply be 
allowed in preconfig mode if it makes sense, and the only ones that are 
missing are block device commands[1].  That patch should be a 
precondition for stabilizing preconfig.

>     The implementation is a bit of a hack: it splices in an additional
>     main loop before machine creation, in special runstate preconfig.  New
>     command exit-preconfig exits that main loop.  QEMU continues
>     initializing, creates the machine, and runs the good old main loop.
>     The replacement of the main loop is transparent to monitors.

Quite an understatement. :)

> Before we make it a stable interface, we should ponder:
> 
> * Is cleaning up the mess to the point where "simpler and cleaner"
>    becomes viable again still impractical?
> 
> * If yes, what are the chances of it becoming practical?

To sum up: it's been cleaned up, and preconfig has benefited from it. 
The question is whether to keep cleaning up (and then the obvious 
direction is the QMP-only QEMU executable), or decide that we've gotten 
to the point of diminishing returns.

Paolo

[1] https://patchew.org/QEMU/20210511153949.506200-1-pbonzini@redhat.com/
Michal Privoznik Nov. 1, 2021, 2:37 p.m. UTC | #3
On 10/25/21 2:19 PM, Markus Armbruster wrote:
> Michal Privoznik <mprivozn@redhat.com> writes:
> 
>> The -preconfig option and exit-preconfig command are around for
>> quite some time now. However, they are still marked as unstable.
>> This is suboptimal because it may block some upper layer in
>> consuming it. In this specific case - Libvirt avoids using
>> experimental features.
>>
>> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> 
> If I remember correctly, the motivation for -preconfig was NUMA
> configuration via QMP.  More uses may have appeared since.
> 
> Back then, I questioned the need for yet another option and yet another
> state: why not -S?
> 
> The answer boiled down to
> 
> 0. Yes, having just one would be a simpler and cleaner interface, but
> 
> 1. the godawful mess QEMU startup has become makes -S unsuitable for
>    some things we want to do, so we need -preconfig,
> 
> 2. which is in turn unsuitable for other things we want to do, so we
>    still need -S".
> 
> 3. Cleaning up the mess to the point where "simpler and cleaner" becomes
>    viable again is not in the cards right now.

I see a difference between the two. -preconfig starts QEMU in such a way
that its configuration can still be changed (in my particular use case
vCPUs can be assigned to NUMA nodes), while -S does not allow that. If
we had one state for both, then some commands must be forbidden from
executing as soon as 'cont' is issued. Moreover, those commands would
need to do much more than they are doing now (e.g. regenerate ACPI table
after each run). Subsequently, validating configuration would need to be
postponed until the first 'cont' because with just one state QEMU can't
know when the last config command was issued.

Having said all of that, I'm not sure if -preconfig is the way to go or
we want to go the other way. I don't have a strong opinion.

Michal
Daniel P. Berrangé Nov. 1, 2021, 2:57 p.m. UTC | #4
On Mon, Nov 01, 2021 at 03:37:58PM +0100, Michal Prívozník wrote:
> On 10/25/21 2:19 PM, Markus Armbruster wrote:
> > Michal Privoznik <mprivozn@redhat.com> writes:
> > 
> >> The -preconfig option and exit-preconfig command are around for
> >> quite some time now. However, they are still marked as unstable.
> >> This is suboptimal because it may block some upper layer in
> >> consuming it. In this specific case - Libvirt avoids using
> >> experimental features.
> >>
> >> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> > 
> > If I remember correctly, the motivation for -preconfig was NUMA
> > configuration via QMP.  More uses may have appeared since.
> > 
> > Back then, I questioned the need for yet another option and yet another
> > state: why not -S?
> > 
> > The answer boiled down to
> > 
> > 0. Yes, having just one would be a simpler and cleaner interface, but
> > 
> > 1. the godawful mess QEMU startup has become makes -S unsuitable for
> >    some things we want to do, so we need -preconfig,
> > 
> > 2. which is in turn unsuitable for other things we want to do, so we
> >    still need -S".
> > 
> > 3. Cleaning up the mess to the point where "simpler and cleaner" becomes
> >    viable again is not in the cards right now.
> 
> I see a difference between the two. -preconfig starts QEMU in such a way
> that its configuration can still be changed (in my particular use case
> vCPUs can be assigned to NUMA nodes), while -S does not allow that. If
> we had one state for both, then some commands must be forbidden from
> executing as soon as 'cont' is issued. Moreover, those commands would
> need to do much more than they are doing now (e.g. regenerate ACPI table
> after each run). Subsequently, validating configuration would need to be
> postponed until the first 'cont' because with just one state QEMU can't
> know when the last config command was issued.
> 
> Having said all of that, I'm not sure if -preconfig is the way to go or
> we want to go the other way. I don't have a strong opinion.

It feels like the scenario here is really just a specialization of the
more general problem we want to be able to solve. Namely, we want to be
able to start a bare QEMU and configure it entirely on the fly. IOW, we
are really targetting for -preconfig to be able to do /all/ configuration,
and with a new ELF binary, at which point -preconfig wouldn't exist, it
would be the implicit default.

Libvirt primarily uses -S because it needs to query various aspects of
QEMU's config before CPUs start executing, while QEMU can still be
considered trustworthy (as it hasn't executed untrusted guest code 
yet). eg we query vCPU PIDs so that we can apply CPU pinning to them. 
We query the CPU model features so we can reflect what exact CPU 
features we got from KVM. There are various other examples.

I don't see this as inherantly tied to the -S option from a conceptual
POV. We do have some ordering constraints, but they are not as crude
as -preconfig / -S.  For querying vCPU PIDs, we have a constraint that
the accelerator and machine objects must have been created. For querying
CPU models, we have a similar constraint.  IOW, libvirt should be fine
with doing /everything/ in a hypothetical -preconfig state, provided
that we know about the ordering constraints wrt object creation.


The secondary reason we use -S is that sometimes the mgmt app does
not actually want the guest CPUs to start running - they actively
want it in a paused state initially and will manually start CPUs
later. One reason is to enable them to open the serial console
backend before CPUs start, to guarantee that no console output is
lost in that small startup window.  This is really the original
purpose of -S.  This doesn't imply a need for -S. I'd say that
-preconfig should essentially imply -S by default. If you're
already doing lots of things via QMP, being required to issue
a 'cont' command is no hardship.


Regards,
Daniel
Markus Armbruster Nov. 3, 2021, 8:02 a.m. UTC | #5
Daniel P. Berrangé <berrange@redhat.com> writes:

> On Mon, Nov 01, 2021 at 03:37:58PM +0100, Michal Prívozník wrote:
>> On 10/25/21 2:19 PM, Markus Armbruster wrote:
>> > Michal Privoznik <mprivozn@redhat.com> writes:
>> > 
>> >> The -preconfig option and exit-preconfig command are around for
>> >> quite some time now. However, they are still marked as unstable.
>> >> This is suboptimal because it may block some upper layer in
>> >> consuming it. In this specific case - Libvirt avoids using
>> >> experimental features.
>> >>
>> >> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
>> > 
>> > If I remember correctly, the motivation for -preconfig was NUMA
>> > configuration via QMP.  More uses may have appeared since.
>> > 
>> > Back then, I questioned the need for yet another option and yet another
>> > state: why not -S?
>> > 
>> > The answer boiled down to
>> > 
>> > 0. Yes, having just one would be a simpler and cleaner interface, but
>> > 
>> > 1. the godawful mess QEMU startup has become makes -S unsuitable for
>> >    some things we want to do, so we need -preconfig,
>> > 
>> > 2. which is in turn unsuitable for other things we want to do, so we
>> >    still need -S".
>> > 
>> > 3. Cleaning up the mess to the point where "simpler and cleaner" becomes
>> >    viable again is not in the cards right now.
>> 
>> I see a difference between the two. -preconfig starts QEMU in such a way
>> that its configuration can still be changed (in my particular use case
>> vCPUs can be assigned to NUMA nodes), while -S does not allow that. If
>> we had one state for both, then some commands must be forbidden from
>> executing as soon as 'cont' is issued. Moreover, those commands would
>> need to do much more than they are doing now (e.g. regenerate ACPI table
>> after each run). Subsequently, validating configuration would need to be
>> postponed until the first 'cont' because with just one state QEMU can't
>> know when the last config command was issued.

Doesn't all this apply to x-exit-preconfig already?

* Some commands are only allowed before x-exit-preconfig,
  e.g. set-numa-node.

* The complete (pre-)configuration is only available at
  x-exit-preconfig.  In particular, ACPI tables can be fixed only then.

>> Having said all of that, I'm not sure if -preconfig is the way to go or
>> we want to go the other way. I don't have a strong opinion.
>
> It feels like the scenario here is really just a specialization of the
> more general problem we want to be able to solve. Namely, we want to be
> able to start a bare QEMU and configure it entirely on the fly. IOW, we
> are really targetting for -preconfig to be able to do /all/ configuration,
> and with a new ELF binary, at which point -preconfig wouldn't exist, it
> would be the implicit default.

Whether -preconfig is the default or an option doesn't matter for
discussing the state machine.

> Libvirt primarily uses -S because it needs to query various aspects of
> QEMU's config before CPUs start executing, while QEMU can still be
> considered trustworthy (as it hasn't executed untrusted guest code 
> yet). eg we query vCPU PIDs so that we can apply CPU pinning to them. 
> We query the CPU model features so we can reflect what exact CPU 
> features we got from KVM. There are various other examples.

Which of the queries you need work only between x-exit-preconfig and -S?

Which of them could be made to work before x-exit-preconfig?

> I don't see this as inherantly tied to the -S option from a conceptual
> POV. We do have some ordering constraints, but they are not as crude
> as -preconfig / -S.  For querying vCPU PIDs, we have a constraint that
> the accelerator and machine objects must have been created. For querying
> CPU models, we have a similar constraint.  IOW, libvirt should be fine
> with doing /everything/ in a hypothetical -preconfig state, provided
> that we know about the ordering constraints wrt object creation.

Plausible.

> The secondary reason we use -S is that sometimes the mgmt app does
> not actually want the guest CPUs to start running - they actively
> want it in a paused state initially and will manually start CPUs
> later. One reason is to enable them to open the serial console
> backend before CPUs start, to guarantee that no console output is
> lost in that small startup window.  This is really the original
> purpose of -S.  This doesn't imply a need for -S. I'd say that
> -preconfig should essentially imply -S by default. If you're
> already doing lots of things via QMP, being required to issue
> a 'cont' command is no hardship.

I wonder whether we really have to step through three states

         x-exit-preconfig  cont
    preconfig ---> pre run ---> run

and not two

            cont
    pre run ---> run
Daniel P. Berrangé Nov. 3, 2021, 9:27 a.m. UTC | #6
On Wed, Nov 03, 2021 at 09:02:49AM +0100, Markus Armbruster wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > On Mon, Nov 01, 2021 at 03:37:58PM +0100, Michal Prívozník wrote:
> >> On 10/25/21 2:19 PM, Markus Armbruster wrote:
> >> > Michal Privoznik <mprivozn@redhat.com> writes:
> >> > 
> >> >> The -preconfig option and exit-preconfig command are around for
> >> >> quite some time now. However, they are still marked as unstable.
> >> >> This is suboptimal because it may block some upper layer in
> >> >> consuming it. In this specific case - Libvirt avoids using
> >> >> experimental features.
> >> >>
> >> >> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> >> > 
> >> > If I remember correctly, the motivation for -preconfig was NUMA
> >> > configuration via QMP.  More uses may have appeared since.
> >> > 
> >> > Back then, I questioned the need for yet another option and yet another
> >> > state: why not -S?
> >> > 
> >> > The answer boiled down to
> >> > 
> >> > 0. Yes, having just one would be a simpler and cleaner interface, but
> >> > 
> >> > 1. the godawful mess QEMU startup has become makes -S unsuitable for
> >> >    some things we want to do, so we need -preconfig,
> >> > 
> >> > 2. which is in turn unsuitable for other things we want to do, so we
> >> >    still need -S".
> >> > 
> >> > 3. Cleaning up the mess to the point where "simpler and cleaner" becomes
> >> >    viable again is not in the cards right now.
> >> 
> >> I see a difference between the two. -preconfig starts QEMU in such a way
> >> that its configuration can still be changed (in my particular use case
> >> vCPUs can be assigned to NUMA nodes), while -S does not allow that. If
> >> we had one state for both, then some commands must be forbidden from
> >> executing as soon as 'cont' is issued. Moreover, those commands would
> >> need to do much more than they are doing now (e.g. regenerate ACPI table
> >> after each run). Subsequently, validating configuration would need to be
> >> postponed until the first 'cont' because with just one state QEMU can't
> >> know when the last config command was issued.
> 
> Doesn't all this apply to x-exit-preconfig already?
> 
> * Some commands are only allowed before x-exit-preconfig,
>   e.g. set-numa-node.
> 
> * The complete (pre-)configuration is only available at
>   x-exit-preconfig.  In particular, ACPI tables can be fixed only then.
> 
> >> Having said all of that, I'm not sure if -preconfig is the way to go or
> >> we want to go the other way. I don't have a strong opinion.
> >
> > It feels like the scenario here is really just a specialization of the
> > more general problem we want to be able to solve. Namely, we want to be
> > able to start a bare QEMU and configure it entirely on the fly. IOW, we
> > are really targetting for -preconfig to be able to do /all/ configuration,
> > and with a new ELF binary, at which point -preconfig wouldn't exist, it
> > would be the implicit default.
> 
> Whether -preconfig is the default or an option doesn't matter for
> discussing the state machine.
> 
> > Libvirt primarily uses -S because it needs to query various aspects of
> > QEMU's config before CPUs start executing, while QEMU can still be
> > considered trustworthy (as it hasn't executed untrusted guest code 
> > yet). eg we query vCPU PIDs so that we can apply CPU pinning to them. 
> > We query the CPU model features so we can reflect what exact CPU 
> > features we got from KVM. There are various other examples.
> 
> Which of the queries you need work only between x-exit-preconfig and -S?

Well before x-exit-preconfig, QMP only permits a very small number
of commands - QEMU has loosened that up a bit, but I don't think anyone
has checked whether there's enough to cover libvirt's current usage yet.

> Which of them could be made to work before x-exit-preconfig?

Quite a few i expect.


> > The secondary reason we use -S is that sometimes the mgmt app does
> > not actually want the guest CPUs to start running - they actively
> > want it in a paused state initially and will manually start CPUs
> > later. One reason is to enable them to open the serial console
> > backend before CPUs start, to guarantee that no console output is
> > lost in that small startup window.  This is really the original
> > purpose of -S.  This doesn't imply a need for -S. I'd say that
> > -preconfig should essentially imply -S by default. If you're
> > already doing lots of things via QMP, being required to issue
> > a 'cont' command is no hardship.
> 
> I wonder whether we really have to step through three states
> 
>          x-exit-preconfig  cont
>     preconfig ---> pre run ---> run
> 
> and not two
> 
>             cont
>     pre run ---> run

Looking at it from POV of configuration we have two states, with
a unidirectional transition permitted

  unconfigured   --->  configured

Then from the POV of guest CPUs we have two states, with a
bi-directional transition permitted.

   stopped   <----->  running


During QEMU start process we have two end goals we need to satisfy

 *  configured + running (the 95+% common case)
 *  configured + stopped (the rarer case)


So in terms of QEMU internal state transitions it feels like we do
likely need to distinguish pre-config separately from stopped, but
from a CLI arg POV I think it is redundant to distinguish them as
"stopped" can be reasonably implied as a default

Regards,
Daniel
Michal Privoznik Nov. 10, 2021, 12:54 p.m. UTC | #7
On 11/3/21 9:02 AM, Markus Armbruster wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
>> On Mon, Nov 01, 2021 at 03:37:58PM +0100, Michal Prívozník wrote:
>>> On 10/25/21 2:19 PM, Markus Armbruster wrote:
>>>> Michal Privoznik <mprivozn@redhat.com> writes:
>>>>
>>>>> The -preconfig option and exit-preconfig command are around for
>>>>> quite some time now. However, they are still marked as unstable.
>>>>> This is suboptimal because it may block some upper layer in
>>>>> consuming it. In this specific case - Libvirt avoids using
>>>>> experimental features.
>>>>>
>>>>> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
>>>>
>>>> If I remember correctly, the motivation for -preconfig was NUMA
>>>> configuration via QMP.  More uses may have appeared since.
>>>>
>>>> Back then, I questioned the need for yet another option and yet another
>>>> state: why not -S?
>>>>
>>>> The answer boiled down to
>>>>
>>>> 0. Yes, having just one would be a simpler and cleaner interface, but
>>>>
>>>> 1. the godawful mess QEMU startup has become makes -S unsuitable for
>>>>    some things we want to do, so we need -preconfig,
>>>>
>>>> 2. which is in turn unsuitable for other things we want to do, so we
>>>>    still need -S".
>>>>
>>>> 3. Cleaning up the mess to the point where "simpler and cleaner" becomes
>>>>    viable again is not in the cards right now.
>>>
>>> I see a difference between the two. -preconfig starts QEMU in such a way
>>> that its configuration can still be changed (in my particular use case
>>> vCPUs can be assigned to NUMA nodes), while -S does not allow that. If
>>> we had one state for both, then some commands must be forbidden from
>>> executing as soon as 'cont' is issued. Moreover, those commands would
>>> need to do much more than they are doing now (e.g. regenerate ACPI table
>>> after each run). Subsequently, validating configuration would need to be
>>> postponed until the first 'cont' because with just one state QEMU can't
>>> know when the last config command was issued.
> 
> Doesn't all this apply to x-exit-preconfig already?
> 
> * Some commands are only allowed before x-exit-preconfig,
>   e.g. set-numa-node.
> 
> * The complete (pre-)configuration is only available at
>   x-exit-preconfig.  In particular, ACPI tables can be fixed only then.

So why was preconfig introduced in the first place? I mean, from
libvirt's POV it doesn't really matter whether it needs to use both
-preconfig and -S or just -S (or some new -option). But ideally, we
would start QEMU with nothing but monitor config and then pass whole
configuration via the monitor. I thought it would be simpler for QEMU if
it had three states.

Michal
Damien Hedde Nov. 10, 2021, 1:23 p.m. UTC | #8
On 11/10/21 13:54, Michal Prívozník wrote:
> On 11/3/21 9:02 AM, Markus Armbruster wrote:
>> Daniel P. Berrangé <berrange@redhat.com> writes:
>>
>>> On Mon, Nov 01, 2021 at 03:37:58PM +0100, Michal Prívozník wrote:
>>>> On 10/25/21 2:19 PM, Markus Armbruster wrote:
>>>>> Michal Privoznik <mprivozn@redhat.com> writes:
>>>>>
>>>>>> The -preconfig option and exit-preconfig command are around for
>>>>>> quite some time now. However, they are still marked as unstable.
>>>>>> This is suboptimal because it may block some upper layer in
>>>>>> consuming it. In this specific case - Libvirt avoids using
>>>>>> experimental features.
>>>>>>
>>>>>> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
>>>>>
>>>>> If I remember correctly, the motivation for -preconfig was NUMA
>>>>> configuration via QMP.  More uses may have appeared since.
>>>>>
>>>>> Back then, I questioned the need for yet another option and yet another
>>>>> state: why not -S?
>>>>>
>>>>> The answer boiled down to
>>>>>
>>>>> 0. Yes, having just one would be a simpler and cleaner interface, but
>>>>>
>>>>> 1. the godawful mess QEMU startup has become makes -S unsuitable for
>>>>>     some things we want to do, so we need -preconfig,
>>>>>
>>>>> 2. which is in turn unsuitable for other things we want to do, so we
>>>>>     still need -S".
>>>>>
>>>>> 3. Cleaning up the mess to the point where "simpler and cleaner" becomes
>>>>>     viable again is not in the cards right now.
>>>>
>>>> I see a difference between the two. -preconfig starts QEMU in such a way
>>>> that its configuration can still be changed (in my particular use case
>>>> vCPUs can be assigned to NUMA nodes), while -S does not allow that. If
>>>> we had one state for both, then some commands must be forbidden from
>>>> executing as soon as 'cont' is issued. Moreover, those commands would
>>>> need to do much more than they are doing now (e.g. regenerate ACPI table
>>>> after each run). Subsequently, validating configuration would need to be
>>>> postponed until the first 'cont' because with just one state QEMU can't
>>>> know when the last config command was issued.
>>
>> Doesn't all this apply to x-exit-preconfig already?
>>
>> * Some commands are only allowed before x-exit-preconfig,
>>    e.g. set-numa-node.
>>
>> * The complete (pre-)configuration is only available at
>>    x-exit-preconfig.  In particular, ACPI tables can be fixed only then.
> 
> So why was preconfig introduced in the first place? I mean, from
> libvirt's POV it doesn't really matter whether it needs to use both
> -preconfig and -S or just -S (or some new -option). But ideally, we
> would start QEMU with nothing but monitor config and then pass whole
> configuration via the monitor. I thought it would be simpler for QEMU if
> it had three states.
> 
> Michal
> 
> 

IMHO only introducing preconfig allowed to pause qemu at an early-enough 
stage to do qemu configuration. '-S' was just way too late.

We can hope being able, some day, to start with only preconfig and 
monitor setup and do all the rest using the monitor. AFAIK, right now 
accelerator, machine and device options cannot be configured on the 
monitor even with preconfig. For the devices it should be doable soon.

Damien
Paolo Bonzini Nov. 10, 2021, 9:56 p.m. UTC | #9
> On 11/3/21 09:02, Markus Armbruster wrote:
>> I wonder whether we really have to step through three states
>> 
>>           x-exit-preconfig  cont
>>      preconfig ---> pre run ---> run
>> 
>> and not two
>> 
>>              cont
>>      pre run ---> run

Devices would be hotplugged between x-exit-preconfig and cont, and part 
of the machine until x-exit-preconfig; so there is a need for something 
like x-exit-preconfig.

In my prototype of a QMP-only binary, the idea would be that there 
wouldn't be a single x-exit-preconfig command, but "cont", 
"migrate-incoming", "finish-machine-init" (the stable replacement for 
x-exit-preconfig) and "loadvm" would all complete the configuration of 
the machine.  "finish-machine-init" would do nothing else, the others 
would continue with whatever they were supposed to do.

>> Which of the queries you need work only between x-exit-preconfig and -S?
> 
> Well before x-exit-preconfig, QMP only permits a very small number
> of commands - QEMU has loosened that up a bit, but I don't think anyone
> has checked whether there's enough to cover libvirt's current usage yet.

Indeed I looked at the commands that operate on the backends, but not 
that much at query commands.

Paolo
Markus Armbruster Nov. 11, 2021, 6:11 a.m. UTC | #10
Paolo Bonzini <pbonzini@redhat.com> writes:

>> On 11/3/21 09:02, Markus Armbruster wrote:
>>> I wonder whether we really have to step through three states
>>>
>>>           x-exit-preconfig  cont
>>>      preconfig ---> pre run ---> run
>>>
>>> and not two
>>>
>>>              cont
>>>      pre run ---> run
>
> Devices would be hotplugged between x-exit-preconfig and cont, and

Cold plugged!

> part of the machine until x-exit-preconfig; so there is a need for
> something like x-exit-preconfig.

Can you briefly explain why device_add doesn't work before
x-exit-preconfig and does after?

> In my prototype of a QMP-only binary, the idea would be that there
> wouldn't be a single x-exit-preconfig command, but "cont", 
> "migrate-incoming", "finish-machine-init" (the stable replacement for
> x-exit-preconfig) and "loadvm" would all complete the configuration of 
> the machine.  "finish-machine-init" would do nothing else, the others
> would continue with whatever they were supposed to do.
>
>>> Which of the queries you need work only between x-exit-preconfig and -S?
>> Well before x-exit-preconfig, QMP only permits a very small number
>> of commands - QEMU has loosened that up a bit, but I don't think anyone
>> has checked whether there's enough to cover libvirt's current usage yet.
>
> Indeed I looked at the commands that operate on the backends, but not
> that much at query commands.
>
> Paolo
Paolo Bonzini Nov. 11, 2021, 8:15 a.m. UTC | #11
On 11/11/21 07:11, Markus Armbruster wrote:
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
>>> On 11/3/21 09:02, Markus Armbruster wrote:
>>>> I wonder whether we really have to step through three states
>>>>
>>>>            x-exit-preconfig  cont
>>>>       preconfig ---> pre run ---> run
>>>>
>>>> and not two
>>>>
>>>>               cont
>>>>       pre run ---> run
>>
>> Devices would be hotplugged between x-exit-preconfig and cont, and
> 
> Cold plugged!

Nope, hotplugged!  After x-exit-preconfig, the machine is ready to 
start, and that means that the machine will have completed 
initialization via their machine_init_done notifiers.

For example, fw_cfg will have set the bootorder.  Any device created 
after x-exit-preconfig will not be in the bootorder.

>> part of the machine until x-exit-preconfig; so there is a need for
>> something like x-exit-preconfig.
> 
> Can you briefly explain why device_add doesn't work before
> x-exit-preconfig and does after?

The answer to this question is basically the verbose version of the 
coldplug/hotplug thing from above.  There are five stages in the startup 
of QEMU (marked by different values of the MachineInitPhase enum):

1) PHASE_NO_MACHINE - backends can already be created here, but no 
machine exists yet

2) PHASE_MACHINE_CREATED - the machine object has been created.  It's 
not initialized, but it's there.

3) PHASE_ACCEL_CREATED - the accelerator object has been created.  The 
accelerator needs the machine object, because for example KVM might not 
support all machine types.  So the accelerator queries the machine 
object and fails creation in case of incompatibility.  This enables e.g. 
fallback to TCG.  -preconfig starts the monitor here.

4) PHASE_MACHINE_INIT - machine initialization consists mostly in 
creating the onboard devices.  For this to happen, the machine needs to 
learn about the accelerator, because onboard devices include CPUs and 
other accelerator-dependent devices.  Devices plugged in this phase are 
cold-plugged.

5) PHASE_MACHINE_READY - machine init done notifiers have been called 
and the VM is ready.  Devices plugged in this phase already count as 
hot-plugged.  -S starts the monitor here.

x-exit-preconfig goes straight from PHASE_ACCEL_CREATED to 
PHASE_MACHINE_READY.  Devices can only be created after 
PHASE_MACHINE_INIT, so device_add cannot be enabled at preconfig stage. 
  Why does preconfig start at PHASE_ACCEL_CREATED?  Well, the phases 
were not as easy to identify in qemu_init() when it was introduced, so I 
suppose it just seemed like a good place. :)  These days, qemu_init() is 
just a hundred lines of code apart from the huge command line parsing 
switch statement, so we have a clearer idea of the steps and you can 
look deeper at what happens in each phase if you want.  phase_advance() 
is your friend.


With a pure-QMP configuration flow, PHASE_MACHINE_CREATED would be 
reached with a machine-set command (corresponding to the non-deprecated 
parts of -machine) and PHASE_ACCEL_CREATED would be reached with an 
accel-set command (corresponding to -accel).

I haven't yet thought hard enough whether accel-set could go directly 
from PHASE_MACHINE_CREATED to PHASE_MACHINE_INIT.  It probably depends 
on how CPUs would be configured in the QMP flow; if accel-set must 
return at PHASE_ACCEL_CREATED, a separate command is needed to reach 
PHASE_MACHINE_INIT.  But either way, there the monitor would be 
accessible at PHASE_MACHINE_INIT, where device_add works and cold-plugs 
the devices.

Paolo
Markus Armbruster Nov. 11, 2021, 2:37 p.m. UTC | #12
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 11/11/21 07:11, Markus Armbruster wrote:
>> Paolo Bonzini <pbonzini@redhat.com> writes:
>> 
>>>> On 11/3/21 09:02, Markus Armbruster wrote:
>>>>> I wonder whether we really have to step through three states
>>>>>
>>>>>            x-exit-preconfig  cont
>>>>>       preconfig ---> pre run ---> run
>>>>>
>>>>> and not two
>>>>>
>>>>>               cont
>>>>>       pre run ---> run
>>>
>>> Devices would be hotplugged between x-exit-preconfig and cont, and
>> 
>> Cold plugged!
>
> Nope, hotplugged!  After x-exit-preconfig, the machine is ready to
> start, and that means that the machine will have completed 
> initialization via their machine_init_done notifiers.
>
> For example, fw_cfg will have set the bootorder.  Any device created
> after x-exit-preconfig will not be in the bootorder.

After re-reading this a couple of times, and checking the code, I now
see what I missed.

There has to be a point where we transition from cold to hot plug.  It
obviously can be no later than VM starting to run, i.e. first cont.

We actually moved it from there back to x-exit-preconfig.  I missed /
forgot that.

Why does it have to be moved back?  Let's see below.

>>> part of the machine until x-exit-preconfig; so there is a need for
>>> something like x-exit-preconfig.
>> 
>> Can you briefly explain why device_add doesn't work before
>> x-exit-preconfig and does after?
>
> The answer to this question is basically the verbose version of the
> coldplug/hotplug thing from above.  There are five stages in the
> startup of QEMU (marked by different values of the MachineInitPhase
> enum):
>
> 1) PHASE_NO_MACHINE - backends can already be created here, but no
> machine exists yet
>
> 2) PHASE_MACHINE_CREATED - the machine object has been created.  It's
> not initialized, but it's there.
>
> 3) PHASE_ACCEL_CREATED - the accelerator object has been created.  The
> accelerator needs the machine object, because for example KVM might
> not support all machine types.  So the accelerator queries the machine 
> object and fails creation in case of incompatibility.  This enables
> e.g. fallback to TCG.  -preconfig starts the monitor here.

We should be able to start monitors first, if we put in the work.

> 4) PHASE_MACHINE_INIT - machine initialization consists mostly in
> creating the onboard devices.  For this to happen, the machine needs
> to learn about the accelerator, because onboard devices include CPUs
> and other accelerator-dependent devices.  Devices plugged in this
> phase are cold-plugged.
>
> 5) PHASE_MACHINE_READY - machine init done notifiers have been called
> and the VM is ready.  Devices plugged in this phase already count as 
> hot-plugged.  -S starts the monitor here.

Remind us: what work is done in the machine init done notifiers?

What exactly necessitates "count as hot-plugged"?  Is it something done
in these notifiers?

> x-exit-preconfig goes straight from PHASE_ACCEL_CREATED to
> PHASE_MACHINE_READY.  Devices can only be created after 
> PHASE_MACHINE_INIT, so device_add cannot be enabled at preconfig
> stage.

Now I am confused again.  Can you cold plug devices with device_add in
presence of -preconfig, and if yes, how?

Related question: when exactly in these phases do we create devices
specified with -device?

> stage.   Why does preconfig start at PHASE_ACCEL_CREATED?  Well, the
> phases were not as easy to identify in qemu_init() when it was
> introduced, so I suppose it just seemed like a good place. :)  These
> days, qemu_init() is just a hundred lines of code apart from the huge
> command line parsing switch statement, so we have a clearer idea of
> the steps and you can look deeper at what happens in each phase if you
> want.  phase_advance() is your friend.
>
>
> With a pure-QMP configuration flow, PHASE_MACHINE_CREATED would be
> reached with a machine-set command (corresponding to the
> non-deprecated parts of -machine) and PHASE_ACCEL_CREATED would be
> reached with an accel-set command (corresponding to -accel).

I don't think this depends on "pure-QMP configuration flow".  -machine
and -accel could advance the phase just like their buddies machine-set
and accel-set.

State transition diagram:

    PHASE_NO_MACHINE (initial state)
            |
            |  -machine or machine-set
            v
    PHASE_MACHINE_CREATED
            |
            |  -accel or accel-set
            v
    PHASE_ACCEL_CREATED
            |
            |  ???
            v
    PHASE_MACHINE_INIT
            |
            |  ???
            v
    PHASE_MACHINE_READY
            |
            |  cont
            v
           ???

Can you fill in the ??? blanks?

> I haven't yet thought hard enough whether accel-set could go directly
> from PHASE_MACHINE_CREATED to PHASE_MACHINE_INIT.  It probably depends 
> on how CPUs would be configured in the QMP flow; if accel-set must
> return at PHASE_ACCEL_CREATED, a separate command is needed to reach 
> PHASE_MACHINE_INIT.  But either way, there the monitor would be
> accessible at PHASE_MACHINE_INIT, where device_add works and
> cold-plugs the devices.

The earlier the monitor becomes available, the better.

Ideally, we'd process the command line strictly left to right, and fail
options that are "out of phase".  Make the monitor available right when
we process its -mon.  The -chardev for its character device must precede
it.

Likewise, we'd fail QMP commands that are "out of phase".
@allow-preconfig is a crutch that only exists because we're afraid (with
reason) of hidden assumptions in QMP commands.
Paolo Bonzini Nov. 11, 2021, 7:22 p.m. UTC | #13
On 11/11/21 15:37, Markus Armbruster wrote:
>> 1) PHASE_NO_MACHINE - backends can already be created here, but no
>> machine exists yet
>>
>> 2) PHASE_MACHINE_CREATED - the machine object has been created.  It's
>> not initialized, but it's there.
>>
>> 3) PHASE_ACCEL_CREATED - the accelerator object has been created.  The
>> accelerator needs the machine object, because for example KVM might
>> not support all machine types.  So the accelerator queries the machine
>> object and fails creation in case of incompatibility.  This enables
>> e.g. fallback to TCG.  -preconfig starts the monitor here.
> 
> We should be able to start monitors first, if we put in the work.

The monitor starts, the question is the availability of the event loop. 
  This requires a command (or a something) to advance to the next phase. 
   x-exit-preconfig is such a command.

In addition, one thing I don't like of preconfig is that command line 
arguments linger until they are triggered by x-exit-preconfig.  Adding 
more such commands makes things worse.

>> 4) PHASE_MACHINE_INIT - machine initialization consists mostly in
>> creating the onboard devices.  For this to happen, the machine needs
>> to learn about the accelerator, because onboard devices include CPUs
>> and other accelerator-dependent devices.  Devices plugged in this
>> phase are cold-plugged.
>>
>> 5) PHASE_MACHINE_READY - machine init done notifiers have been called
>> and the VM is ready.  Devices plugged in this phase already count as
>> hot-plugged.  -S starts the monitor here.
> 
> Remind us: what work is done in the machine init done notifiers?

It depends, but---generally speaking---what they do applies only to 
cold-plugged devices.  For example, fw_cfg gathers the boot order in the 
machine-init-done notifier (via get_boot_devices_list).

> What exactly necessitates "count as hot-plugged"?  Is it something done
> in these notifiers?

It depends on the bus.  It boils down to this code in device_initfn:

     if (phase_check(PHASE_MACHINE_READY)) {
         dev->hotplugged = 1;
         qdev_hot_added = true;
     }

For example, hotplugged PCI devices must define function 0 last; 
coldplugged PCI devices can define functions in any order 
(do_pci_register_device, called by pci_qdev_realize).

Another example, a device_add after machine-done causes an ACPI hotplug 
event, because acpi_pcihp_device_plug_cb checks dev->hotplugged.

>> x-exit-preconfig goes straight from PHASE_ACCEL_CREATED to
>> PHASE_MACHINE_READY.  Devices can only be created after
>> PHASE_MACHINE_INIT, so device_add cannot be enabled at preconfig
>> stage.
> 
> Now I am confused again.  Can you cold plug devices with device_add in
> presence of -preconfig, and if yes, how?

No, because the monitor goes directly from a point where device_add 
fails (PHASE_ACCEL_CREATED) to a point where devices are hotplugged 
(PHASE_MACHINE_READY).

> Related question: when exactly in these phases do we create devices
> specified with -device?

In PHASE_MACHINE_INIT---that is, after the machine has been initialized 
and before machine-done-notifiers have been called.

>> With a pure-QMP configuration flow, PHASE_MACHINE_CREATED would be
>> reached with a machine-set command (corresponding to the
>> non-deprecated parts of -machine) and PHASE_ACCEL_CREATED would be
>> reached with an accel-set command (corresponding to -accel).
> 
> I don't think this depends on "pure-QMP configuration flow".  -machine
> and -accel could advance the phase just like their buddies machine-set
> and accel-set.

They already do (see qemu_init's calls to phase_advance).

> State transition diagram:
> 
>      PHASE_NO_MACHINE (initial state)
>              |
>              |  -machine or machine-set
>              v
>      PHASE_MACHINE_CREATED
>              |
>              |  -accel or accel-set
>              v
>      PHASE_ACCEL_CREATED
>              |
>              |  ???

qmp_x_exit_preconfig() -> qemu_init_board() -> machine_run_board_init()

>              v
>      PHASE_MACHINE_INIT
>              |
>              |  ???

qmp_x_exit_preconfig() -> qemu_machine_creation_done() -> 
qdev_machine_creation_done()

The steps in qmp_x_exit_preconfig() are pretty self-explanatory:

     qemu_init_board();
     qemu_create_cli_devices();
     qemu_machine_creation_done();

qemu_init() calls qmp_x_exit_preconfig() if -preconfig is not passed on 
the command line.

>              v
>      PHASE_MACHINE_READY
>              |
>              |  cont
>              v
>             ???

No phases anymore, it's always PHASE_MACHINE_READY.  cont simply changes 
the runstate to RUNNING.

> The earlier the monitor becomes available, the better.
> Ideally, we'd process the command line strictly left to right, and fail
> options that are "out of phase".  Make the monitor available right when
> we process its -mon.  The -chardev for its character device must precede
> it.

The boat for this has sailed.  The only sane way to do this is a new binary.

> Likewise, we'd fail QMP commands that are "out of phase".
> @allow-preconfig is a crutch that only exists because we're afraid (with
> reason) of hidden assumptions in QMP commands.

At this point, it's not even like that anymore (except for block devices 
because my patches haven't been applied).  allow-preconfig is just a 
quick way to avoid writing

     if (!phase_check(PHASE_MACHINE_READY)) {
         error_setg(errp, "Please configure the machine first");
         return;
     }

over and over in many commands.

Paolo
Markus Armbruster Nov. 12, 2021, 11:48 a.m. UTC | #14
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 11/11/21 15:37, Markus Armbruster wrote:
>>> 1) PHASE_NO_MACHINE - backends can already be created here, but no
>>> machine exists yet
>>>
>>> 2) PHASE_MACHINE_CREATED - the machine object has been created.  It's
>>> not initialized, but it's there.
>>>
>>> 3) PHASE_ACCEL_CREATED - the accelerator object has been created.  The
>>> accelerator needs the machine object, because for example KVM might
>>> not support all machine types.  So the accelerator queries the machine
>>> object and fails creation in case of incompatibility.  This enables
>>> e.g. fallback to TCG.  -preconfig starts the monitor here.
>> 
>> We should be able to start monitors first, if we put in the work.
>
> The monitor starts, the question is the availability of the event loop. 

What does the event loop depend on?

>   This requires a command (or a something) to advance to the next phase. 
>    x-exit-preconfig is such a command.
>
> In addition, one thing I don't like of preconfig is that command line 
> arguments linger until they are triggered by x-exit-preconfig.  Adding 
> more such commands makes things worse.

Yes, that's ugly.  I'd prefer command line left to right, and then QMP
commands in order.  If your command line advances the phase too far for
your QMP commands, then that's your own fault.

>>> 4) PHASE_MACHINE_INIT - machine initialization consists mostly in
>>> creating the onboard devices.  For this to happen, the machine needs
>>> to learn about the accelerator, because onboard devices include CPUs
>>> and other accelerator-dependent devices.  Devices plugged in this
>>> phase are cold-plugged.
>>>
>>> 5) PHASE_MACHINE_READY - machine init done notifiers have been called
>>> and the VM is ready.  Devices plugged in this phase already count as
>>> hot-plugged.  -S starts the monitor here.
>> 
>> Remind us: what work is done in the machine init done notifiers?
>
> It depends, but---generally speaking---what they do applies only to 
> cold-plugged devices.  For example, fw_cfg gathers the boot order in the 
> machine-init-done notifier (via get_boot_devices_list).
>
>> What exactly necessitates "count as hot-plugged"?  Is it something done
>> in these notifiers?
>
> It depends on the bus.  It boils down to this code in device_initfn:
>
>      if (phase_check(PHASE_MACHINE_READY)) {
>          dev->hotplugged = 1;
>          qdev_hot_added = true;
>      }
>
> For example, hotplugged PCI devices must define function 0 last; 
> coldplugged PCI devices can define functions in any order 
> (do_pci_register_device, called by pci_qdev_realize).
>
> Another example, a device_add after machine-done causes an ACPI hotplug 
> event, because acpi_pcihp_device_plug_cb checks dev->hotplugged.

Worse, if the guest doesn't play ball, the device remains in hot plug
limbo.

Why would anyone *want* to plug a device in PHASE_MACHINE_READY (when
the plug is hot) instead of earlier (when it's cold)?

>>> x-exit-preconfig goes straight from PHASE_ACCEL_CREATED to
>>> PHASE_MACHINE_READY.  Devices can only be created after
>>> PHASE_MACHINE_INIT, so device_add cannot be enabled at preconfig
>>> stage.
>> 
>> Now I am confused again.  Can you cold plug devices with device_add in
>> presence of -preconfig, and if yes, how?
>
> No, because the monitor goes directly from a point where device_add 
> fails (PHASE_ACCEL_CREATED) to a point where devices are hotplugged 
> (PHASE_MACHINE_READY).

Bummer.

>> Related question: when exactly in these phases do we create devices
>> specified with -device?
>
> In PHASE_MACHINE_INIT---that is, after the machine has been initialized 
> and before machine-done-notifiers have been called.

In other words, you should never use device_add where -device would do,
because the latter gives you cold plug (which is simple and reliable),
and the former hot plug (which is the opposite).

>>> With a pure-QMP configuration flow, PHASE_MACHINE_CREATED would be
>>> reached with a machine-set command (corresponding to the
>>> non-deprecated parts of -machine) and PHASE_ACCEL_CREATED would be
>>> reached with an accel-set command (corresponding to -accel).
>> 
>> I don't think this depends on "pure-QMP configuration flow".  -machine
>> and -accel could advance the phase just like their buddies machine-set
>> and accel-set.
>
> They already do (see qemu_init's calls to phase_advance).
>
>> State transition diagram:
>> 
>>      PHASE_NO_MACHINE (initial state)
>>              |
>>              |  -machine or machine-set
>>              v
>>      PHASE_MACHINE_CREATED
>>              |
>>              |  -accel or accel-set
>>              v
>>      PHASE_ACCEL_CREATED
>>              |
>>              |  ???
>
> qmp_x_exit_preconfig() -> qemu_init_board() -> machine_run_board_init()

I read this as "the state transition happens in
machine_run_board_init(), called from qmp_x_exit_preconfig() via
qemu_init_board()".

>>              v
>>      PHASE_MACHINE_INIT
>>              |
>>              |  ???
>
> qmp_x_exit_preconfig() -> qemu_machine_creation_done() -> 
> qdev_machine_creation_done()

I read this as "the state transition happens in
qdev_machine_creation_done(), called from qmp_x_exit_preconfig() via
qemu_machine_creation_done()".

> The steps in qmp_x_exit_preconfig() are pretty self-explanatory:
>
>      qemu_init_board();
>      qemu_create_cli_devices();
>      qemu_machine_creation_done();
>
> qemu_init() calls qmp_x_exit_preconfig() if -preconfig is not passed on 
> the command line.
>
>>              v
>>      PHASE_MACHINE_READY
>>              |
>>              |  cont
>>              v
>>             ???
>
> No phases anymore, it's always PHASE_MACHINE_READY.  cont simply changes 
> the runstate to RUNNING.
>
>> The earlier the monitor becomes available, the better.
>> Ideally, we'd process the command line strictly left to right, and fail
>> options that are "out of phase".  Make the monitor available right when
>> we process its -mon.  The -chardev for its character device must precede
>> it.
>
> The boat for this has sailed.  The only sane way to do this is a new binary.

"Ideally" still applies to any new binary.

>> Likewise, we'd fail QMP commands that are "out of phase".
>> @allow-preconfig is a crutch that only exists because we're afraid (with
>> reason) of hidden assumptions in QMP commands.
>
> At this point, it's not even like that anymore (except for block devices 
> because my patches haven't been applied).  allow-preconfig is just a 
> quick way to avoid writing
>
>      if (!phase_check(PHASE_MACHINE_READY)) {
>          error_setg(errp, "Please configure the machine first");
>          return;
>      }
>
> over and over in many commands.

My point is that we still have quite a few commands without
'allow-preconfig' mostly because we are afraid of allowing them in
preconfig state, not because of true phase dependencies.
Paolo Bonzini Nov. 12, 2021, 10:18 p.m. UTC | #15
On 11/12/21 12:48, Markus Armbruster wrote:
>> The monitor starts, the question is the availability of the event loop.
> 
> What does the event loop depend on?

It depends on moving the relevant code out of qemu_init (at least 
conditionally, as is the case for what is in qmp_x_exit_preconfig). 
This in turn has the problem that it's ugly to have lingering unapplied 
settings from the command line.

>>>> 5) PHASE_MACHINE_READY - machine init done notifiers have been called
>>>> and the VM is ready.  Devices plugged in this phase already count as
>>>> hot-plugged.  -S starts the monitor here.
> 
> Why would anyone *want* to plug a device in PHASE_MACHINE_READY (when
> the plug is hot) instead of earlier (when it's cold)?

Well, PHASE_MACHINE_READY includes the whole time the guest is running. 
  So the simplest thing to do is to tell the user "if it hurts, don't do 
it".  If you want a cold-plugged device, plug it during 
PHASE_MACHINE_INIT, which right now means on the command line.

>>> Related question: when exactly in these phases do we create devices
>>> specified with -device?
>>
>> In PHASE_MACHINE_INIT---that is, after the machine has been initialized
>> and before machine-done-notifiers have been called.
> 
> In other words, you should never use device_add where -device would do,
> because the latter gives you cold plug (which is simple and reliable),
> and the former hot plug (which is the opposite).

Exactly.

>> No, because the monitor goes directly from a point where device_add 
>> fails (PHASE_ACCEL_CREATED) to a point where devices are hotplugged 
>> (PHASE_MACHINE_READY).
> 
> Bummer.

True, but consider that these "phases" were reconstructed ex post.  It's 
not like x-exit-preconfig was designed to skip PHASE_MACHINE_INIT; it's 
just that preconfig used to call qemu_main_loop() at the point which is 
now known as PHASE_ACCEL_CREATED.

>>>> With a pure-QMP configuration flow, PHASE_MACHINE_CREATED would be
>>>> reached with a machine-set command (corresponding to the
>>>> non-deprecated parts of -machine) and PHASE_ACCEL_CREATED would be
>>>> reached with an accel-set command (corresponding to -accel).
>>>
>>> I don't think this depends on "pure-QMP configuration flow".  -machine
>>> and -accel could advance the phase just like their buddies machine-set
>>> and accel-set.
>>
>> They already do (see qemu_init's calls to phase_advance).
>>
>>> State transition diagram:
>>>
>>>       PHASE_NO_MACHINE (initial state)
>>>               |  -machine or machine-set
>>>       PHASE_MACHINE_CREATED
>>>               |  -accel or accel-set
>>>       PHASE_ACCEL_CREATED
>>>               |
>>
>> qmp_x_exit_preconfig() -> qemu_init_board() -> machine_run_board_init()
> 
> I read this as "the state transition happens in
> machine_run_board_init(), called from qmp_x_exit_preconfig() via
> qemu_init_board()".

Exactly.  And in turn qmp_x_exit_preconfig() is reached from either the 
monitor (with -preconfig) or qemu_init (without -preconfig).

>>>       PHASE_MACHINE_INIT
>>>               |
>>
>> qmp_x_exit_preconfig() -> qemu_machine_creation_done() ->
>> qdev_machine_creation_done()
> 
> I read this as "the state transition happens in
> qdev_machine_creation_done(), called from qmp_x_exit_preconfig() via
> qemu_machine_creation_done()".

Right again.  In both cases, just grep for calls of "phase_advance".

>>> The earlier the monitor becomes available, the better.
>>> Ideally, we'd process the command line strictly left to right, and fail
>>> options that are "out of phase".  Make the monitor available right when
>>> we process its -mon.  The -chardev for its character device must precede
>>> it.
>>
>> The boat for this has sailed.  The only sane way to do this is a new binary.
> 
> "Ideally" still applies to any new binary.

Well, "ideally" any new binary would only have a few command line 
options, and ordering would be mostly irrelevant.  For example I'd 
expect a QMP binary to only have a few options, mostly for 
debugging/development (-L, -trace) and for process-wide settings (such 
as -name).

>>> Likewise, we'd fail QMP commands that are "out of phase".
>>> @allow-preconfig is a crutch that only exists because we're afraid (with
>>> reason) of hidden assumptions in QMP commands.
>>
>> At this point, it's not even like that anymore (except for block devices
>> because my patches haven't been applied).
>
> My point is that we still have quite a few commands without
> 'allow-preconfig' mostly because we are afraid of allowing them in
> preconfig state, not because of true phase dependencies.

I think there's very few of them, if any (outside the block layer for 
which patches exist), and those are due to distraction more than fear.

Paolo
Markus Armbruster Nov. 13, 2021, 7:52 a.m. UTC | #16
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 11/12/21 12:48, Markus Armbruster wrote:
>>> The monitor starts, the question is the availability of the event loop.
>> 
>> What does the event loop depend on?
>
> It depends on moving the relevant code out of qemu_init (at least 
> conditionally, as is the case for what is in qmp_x_exit_preconfig). 
> This in turn has the problem that it's ugly to have lingering unapplied 
> settings from the command line.
>
>>>>> 5) PHASE_MACHINE_READY - machine init done notifiers have been called
>>>>> and the VM is ready.  Devices plugged in this phase already count as
>>>>> hot-plugged.  -S starts the monitor here.
>> 
>> Why would anyone *want* to plug a device in PHASE_MACHINE_READY (when
>> the plug is hot) instead of earlier (when it's cold)?
>
> Well, PHASE_MACHINE_READY includes the whole time the guest is running. 
>   So the simplest thing to do is to tell the user "if it hurts, don't do 
> it".  If you want a cold-plugged device, plug it during 
> PHASE_MACHINE_INIT, which right now means on the command line.

One, we don't tell users anything of the sort as far as I can tell, and
two, I'm afraid you missed my question :)

I'm not asking what to do "if it hurts", or "if you want a cold-plugged
device".  I'm asking whether there's a reason for ever wanting hot plug
instead of cold plug.  Or in other words, what can hot plug possibly
gain us over cold plug?

As far as I know, the answer is "nothing but trouble".

If that's true, then what we should tell users is to stick to -device
for initial configuration, and stay away from device_add.

Such advice would rain on the "configure everything with QMP" parade.
No big deal, we already know that parade needs plenty of work before it
can hit main street, and having to provide a way to cold plug with QMP
is merely yet another sub-task.

>>>> Related question: when exactly in these phases do we create devices
>>>> specified with -device?
>>>
>>> In PHASE_MACHINE_INIT---that is, after the machine has been initialized
>>> and before machine-done-notifiers have been called.
>> 
>> In other words, you should never use device_add where -device would do,
>> because the latter gives you cold plug (which is simple and reliable),
>> and the former hot plug (which is the opposite).
>
> Exactly.
>
>>> No, because the monitor goes directly from a point where device_add 
>>> fails (PHASE_ACCEL_CREATED) to a point where devices are hotplugged 
>>> (PHASE_MACHINE_READY).
>> 
>> Bummer.
>
> True, but consider that these "phases" were reconstructed ex post.  It's 
> not like x-exit-preconfig was designed to skip PHASE_MACHINE_INIT; it's 
> just that preconfig used to call qemu_main_loop() at the point which is 
> now known as PHASE_ACCEL_CREATED.

Understand.  I'm just trying to map the terrain so we can hopefully get
from here to a better place.


[...]

>>>> The earlier the monitor becomes available, the better.
>>>> Ideally, we'd process the command line strictly left to right, and fail
>>>> options that are "out of phase".  Make the monitor available right when
>>>> we process its -mon.  The -chardev for its character device must precede
>>>> it.
>>>
>>> The boat for this has sailed.  The only sane way to do this is a new binary.
>> 
>> "Ideally" still applies to any new binary.
>
> Well, "ideally" any new binary would only have a few command line 
> options, and ordering would be mostly irrelevant.  For example I'd 
> expect a QMP binary to only have a few options, mostly for 
> debugging/development (-L, -trace) and for process-wide settings (such 
> as -name).

This is where we disagree.

For me, a new, alternative qemu-system-FOO binary should be able to
replace the warty one we have.

One important kind of user is management applications.  Libvirt
developers tell us that they'd like to configure as much as possible via
QMP.

Another kind of user dear to me is me^H^Hdevelopers.  For ad hoc
testing, having to configure via QMP is a pain we'd rathe do without.  A
combination of configuration file(s), CLI and HMP is much quicker.  I
don't want to remain stuck on the traditional binary, I want to do this
with the new one.

Catering to this kind of users should not be hard.  All it takes is a
sensiblly designed startup.  Rough sketch without much thought:

1. Start event loop

2. Feed it CLI left to right.  Each option runs a handler just like each
   QMP command does.

   Options that read a configuration file inject the file into the feed.

   Options that create a monitor create it suspended.

   Options may advance the phase / run state, and they may require
   certain phase(s).

3. When we're done with CLI, resume any monitors we created.

4. Monitors now feed commands to the event loop.  Commands may advance
   the phase / run state, and they may require certain phase(s).

>>>> Likewise, we'd fail QMP commands that are "out of phase".
>>>> @allow-preconfig is a crutch that only exists because we're afraid (with
>>>> reason) of hidden assumptions in QMP commands.
>>>
>>> At this point, it's not even like that anymore (except for block devices
>>> because my patches haven't been applied).
>>
>> My point is that we still have quite a few commands without
>> 'allow-preconfig' mostly because we are afraid of allowing them in
>> preconfig state, not because of true phase dependencies.
>
> I think there's very few of them, if any (outside the block layer for 
> which patches exist), and those are due to distraction more than fear.

qapi/*.json has 216 commands, of which 26 carry 'allow-preconfig'.
Paolo Bonzini Nov. 15, 2021, 12:37 p.m. UTC | #17
On 11/13/21 08:52, Markus Armbruster wrote:
> I'm not asking what to do "if it hurts", or "if you want a cold-plugged
> device".  I'm asking whether there's a reason for ever wanting hot plug
> instead of cold plug.  Or in other words, what can hot plug possibly
> gain us over cold plug?
> 
> As far as I know, the answer is "nothing but trouble".

Yes, I agree.

> If that's true, then what we should tell users is to stick to -device
> for initial configuration, and stay away from device_add.

Yes, which is one issue with stabilizing -preconfig.  It's not clear 
what exactly it is solving.

>>>> The boat for this has sailed.  The only sane way to do this is a new binary.
>>>
>>> "Ideally" still applies to any new binary.
>>
>> Well, "ideally" any new binary would only have a few command line
>> options, and ordering would be mostly irrelevant.  For example I'd
>> expect a QMP binary to only have a few options, mostly for
>> debugging/development (-L, -trace) and for process-wide settings (such
>> as -name).
> 
> This is where we disagree.  For me, a new, alternative qemu-system-FOO binary
> should be able to replace the warty one we have.
> 
> One important kind of user is management applications.  Libvirt
> developers tell us that they'd like to configure as much as possible via
> QMP.  Another kind of user dear to me is me^H^Hdevelopers.  For ad hoc
> testing, having to configure via QMP is a pain we'd rathe do without.  I
> don't want to remain stuck on the traditional binary, I want to do this
> with the new one.

Why do you care?  For another example, you can use "reboot" or 
"systemctl isolate reboot.target" and they end up doing the same thing.

As long as qemu_init invokes qmp_machine_set, qmp_accel_set, 
qmp_device_add, qmp_plugin_add, qmp_cont, etc. to do its job, the 
difference between qemu-system-* and qemu-qmp-* is a couple thousands 
lines of boring code that all but disappears once the VM is up and 
running.  IOW, with the right design (e.g. shortcut options for QOM 
properties good; dozens of global variables bad), there's absolutely no 
issue with some people using qemu-system-* and some using qemu-qmp-*.

>>>>> Likewise, we'd fail QMP commands that are "out of phase".
>>>>> @allow-preconfig is a crutch that only exists because we're afraid (with
>>>>> reason) of hidden assumptions in QMP commands.
>>>>
>>>> At this point, it's not even like that anymore (except for block devices
>>>> because my patches haven't been applied).
>>>
>>> My point is that we still have quite a few commands without
>>> 'allow-preconfig' mostly because we are afraid of allowing them in
>>> preconfig state, not because of true phase dependencies.
>>
>> I think there's very few of them, if any (outside the block layer for
>> which patches exist), and those are due to distraction more than fear.
> 
> qapi/*.json has 216 commands, of which 26 carry 'allow-preconfig'.

Well, 
https://lists.gnu.org/archive/html/qemu-devel/2021-06/msg01597.html 
alone would more than double that. :)

Places like machine.json, machine-target.json, migration.json, 
replay.json have a lot of commands that are, obviously, almost entirely 
not suitable for preconfig.  I don't think there are many commands left, 
I'd guess maybe 30 (meaning that ~60% are done).

Paolo
Markus Armbruster Nov. 15, 2021, 3:40 p.m. UTC | #18
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 11/13/21 08:52, Markus Armbruster wrote:
>> I'm not asking what to do "if it hurts", or "if you want a cold-plugged
>> device".  I'm asking whether there's a reason for ever wanting hot plug
>> instead of cold plug.  Or in other words, what can hot plug possibly
>> gain us over cold plug?
>> 
>> As far as I know, the answer is "nothing but trouble".
>
> Yes, I agree.
>
>> If that's true, then what we should tell users is to stick to -device
>> for initial configuration, and stay away from device_add.
>
> Yes, which is one issue with stabilizing -preconfig.  It's not clear 
> what exactly it is solving.
>
>>>>> The boat for this has sailed.  The only sane way to do this is a new binary.
>>>>
>>>> "Ideally" still applies to any new binary.
>>>
>>> Well, "ideally" any new binary would only have a few command line
>>> options, and ordering would be mostly irrelevant.  For example I'd
>>> expect a QMP binary to only have a few options, mostly for
>>> debugging/development (-L, -trace) and for process-wide settings (such
>>> as -name).
>> 
>> This is where we disagree.  For me, a new, alternative qemu-system-FOO binary
>> should be able to replace the warty one we have.
>> 
>> One important kind of user is management applications.  Libvirt
>> developers tell us that they'd like to configure as much as possible via
>> QMP.  Another kind of user dear to me is me^H^Hdevelopers.  For ad hoc
>> testing, having to configure via QMP is a pain we'd rathe do without.  I
>> don't want to remain stuck on the traditional binary, I want to do this
>> with the new one.
>
> Why do you care?  For another example, you can use "reboot" or 
> "systemctl isolate reboot.target" and they end up doing the same thing.
>
> As long as qemu_init invokes qmp_machine_set, qmp_accel_set, 
> qmp_device_add, qmp_plugin_add, qmp_cont, etc. to do its job, the 
> difference between qemu-system-* and qemu-qmp-* is a couple thousands 
> lines of boring code that all but disappears once the VM is up and 
> running.  IOW, with the right design (e.g. shortcut options for QOM 
> properties good; dozens of global variables bad), there's absolutely no 
> issue with some people using qemu-system-* and some using qemu-qmp-*.

I think maintaining two binaries forever is madness.  I want the old one
to wither away.

Making the new binary capable of serving all use cases should not be
hard, just work (see my design sketch).  I expect the result to serve
*better* than the mess we have now.

>>>>>> Likewise, we'd fail QMP commands that are "out of phase".
>>>>>> @allow-preconfig is a crutch that only exists because we're afraid (with
>>>>>> reason) of hidden assumptions in QMP commands.
>>>>>
>>>>> At this point, it's not even like that anymore (except for block devices
>>>>> because my patches haven't been applied).
>>>>
>>>> My point is that we still have quite a few commands without
>>>> 'allow-preconfig' mostly because we are afraid of allowing them in
>>>> preconfig state, not because of true phase dependencies.
>>>
>>> I think there's very few of them, if any (outside the block layer for
>>> which patches exist), and those are due to distraction more than fear.
>> 
>> qapi/*.json has 216 commands, of which 26 carry 'allow-preconfig'.
>
> Well, 
> https://lists.gnu.org/archive/html/qemu-devel/2021-06/msg01597.html 
> alone would more than double that. :)
>
> Places like machine.json, machine-target.json, migration.json, 
> replay.json have a lot of commands that are, obviously, almost entirely 
> not suitable for preconfig.  I don't think there are many commands left, 
> I'd guess maybe 30 (meaning that ~60% are done).

My point is that "very few" is not literally true, and I think you just
confirmed it ;)
Paolo Bonzini Nov. 16, 2021, 6:50 a.m. UTC | #19
El lun., 15 nov. 2021 16:40, Markus Armbruster <armbru@redhat.com> escribió:

> > Why do you care?  For another example, you can use "reboot" or
> > "systemctl isolate reboot.target" and they end up doing the same thing.
> >
> > As long as qemu_init invokes qmp_machine_set, qmp_accel_set,
> > qmp_device_add, qmp_plugin_add, qmp_cont, etc. to do its job, the
> > difference between qemu-system-* and qemu-qmp-* is a couple thousands
> > lines of boring code that all but disappears once the VM is up and
> > running.  IOW, with the right design (e.g. shortcut options for QOM
> > properties good; dozens of global variables bad), there's absolutely no
> > issue with some people using qemu-system-* and some using qemu-qmp-*.
>
> I think maintaining two binaries forever is madness.  I want the old one
> to wither away.


This seems a bit dogmatic to me. The difference between the two binaries
would be literally a single file, which basically disappears before
anything interesting happens.

Making the new binary capable of serving all use cases should not be
> hard, just work (see my design sketch).  I expect the result to serve
> *better* than the mess we have now.
>

Most of the mess is in the implementation. Not all, granted. But overall
softmmu/vl.c's ugliness is mostly due to the layers of backwards
compatibility, and that wouldn't go away very soon.

>>>> My point is that we still have quite a few commands without
> >>>> 'allow-preconfig' mostly because we are afraid of allowing them in
> >>>> preconfig state, not because of true phase dependencies.
> >>>
> >>> I think there's very few of them, if any (outside the block layer for
> >>> which patches exist), and those are due to distraction more than fear.
> >>
> >> qapi/*.json has 216 commands, of which 26 carry 'allow-preconfig'.
> >
> > Well,
> > https://lists.gnu.org/archive/html/qemu-devel/2021-06/msg01597.html
> > alone would more than double that. :)
> >
> > Places like machine.json, machine-target.json, migration.json,
> > replay.json have a lot of commands that are, obviously, almost entirely
> > not suitable for preconfig.  I don't think there are many commands left,
> > I'd guess maybe 30 (meaning that ~60% are done).
>
> My point is that "very few" is not literally true, and I think you just
> confirmed it ;)


Ok, let me rephrase that as "most of the missing ones are block-layer
relates". ;)

Paolo
diff mbox series

Patch

diff --git a/docs/system/managed-startup.rst b/docs/system/managed-startup.rst
index 9bcf98ea79..f3291b867b 100644
--- a/docs/system/managed-startup.rst
+++ b/docs/system/managed-startup.rst
@@ -32,4 +32,4 @@  machine, including but not limited to:
 - ``query-qmp-schema``
 - ``query-commands``
 - ``query-status``
-- ``x-exit-preconfig``
+- ``exit-preconfig``
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index bcaa41350e..5ccb823b97 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1001,7 +1001,7 @@  void hmp_exit_preconfig(Monitor *mon, const QDict *qdict)
 {
     Error *err = NULL;
 
-    qmp_x_exit_preconfig(&err);
+    qmp_exit_preconfig(&err);
     hmp_handle_error(mon, err);
 }
 
diff --git a/qapi/misc.json b/qapi/misc.json
index 5c2ca3b556..0f75d60996 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -175,7 +175,7 @@ 
 { 'command': 'cont' }
 
 ##
-# @x-exit-preconfig:
+# @exit-preconfig:
 #
 # Exit from "preconfig" state
 #
@@ -191,11 +191,11 @@ 
 #
 # Example:
 #
-# -> { "execute": "x-exit-preconfig" }
+# -> { "execute": "exit-preconfig" }
 # <- { "return": {} }
 #
 ##
-{ 'command': 'x-exit-preconfig', 'allow-preconfig': true }
+{ 'command': 'exit-preconfig', 'allow-preconfig': true }
 
 ##
 # @human-monitor-command:
diff --git a/qemu-options.hx b/qemu-options.hx
index 5f375bbfa6..d27778869b 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3936,10 +3936,9 @@  SRST
 ``--preconfig``
     Pause QEMU for interactive configuration before the machine is
     created, which allows querying and configuring properties that will
-    affect machine initialization. Use QMP command 'x-exit-preconfig' to
+    affect machine initialization. Use QMP command 'exit-preconfig' to
     exit the preconfig state and move to the next state (i.e. run guest
-    if -S isn't used or pause the second time if -S is used). This
-    option is experimental.
+    if -S isn't used or pause the second time if -S is used).
 ERST
 
 DEF("S", 0, QEMU_OPTION_S, \
diff --git a/softmmu/vl.c b/softmmu/vl.c
index af0c4cbd99..09a9ec06f9 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -2730,7 +2730,7 @@  static void qemu_machine_creation_done(void)
     }
 }
 
-void qmp_x_exit_preconfig(Error **errp)
+void qmp_exit_preconfig(Error **errp)
 {
     if (phase_check(PHASE_MACHINE_INITIALIZED)) {
         error_setg(errp, "The command is permitted only before machine initialization");
@@ -3770,7 +3770,7 @@  void qemu_init(int argc, char **argv, char **envp)
     }
 
     if (!preconfig_requested) {
-        qmp_x_exit_preconfig(&error_fatal);
+        qmp_exit_preconfig(&error_fatal);
     }
     qemu_init_displays();
     accel_setup_post(current_machine);
diff --git a/tests/qtest/numa-test.c b/tests/qtest/numa-test.c
index 90bf68a5b3..bea95b1832 100644
--- a/tests/qtest/numa-test.c
+++ b/tests/qtest/numa-test.c
@@ -285,7 +285,7 @@  static void pc_dynamic_cpu_cfg(const void *data)
         " 'arguments': { 'type': 'cpu', 'node-id': 1, 'socket-id': 0 } }")));
 
     /* let machine initialization to complete and run */
-    g_assert(!qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'x-exit-preconfig' }")));
+    g_assert(!qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'exit-preconfig' }")));
     qtest_qmp_eventwait(qs, "RESUME");
 
     /* check that CPUs are mapped as expected */
@@ -443,7 +443,7 @@  static void pc_hmat_build_cfg(const void *data)
 
     /* let machine initialization to complete and run */
     g_assert_false(qmp_rsp_is_err(qtest_qmp(qs,
-        "{ 'execute': 'x-exit-preconfig' }")));
+        "{ 'execute': 'exit-preconfig' }")));
     qtest_qmp_eventwait(qs, "RESUME");
 
     qtest_quit(qs);
@@ -482,7 +482,7 @@  static void pc_hmat_off_cfg(const void *data)
 
     /* let machine initialization to complete and run */
     g_assert_false(qmp_rsp_is_err(qtest_qmp(qs,
-        "{ 'execute': 'x-exit-preconfig' }")));
+        "{ 'execute': 'exit-preconfig' }")));
     qtest_qmp_eventwait(qs, "RESUME");
 
     qtest_quit(qs);
@@ -533,7 +533,7 @@  static void pc_hmat_erange_cfg(const void *data)
 
     /* let machine initialization to complete and run */
     g_assert_false(qmp_rsp_is_err(qtest_qmp(qs,
-        "{ 'execute': 'x-exit-preconfig' }")));
+        "{ 'execute': 'exit-preconfig' }")));
     qtest_qmp_eventwait(qs, "RESUME");
 
     qtest_quit(qs);
diff --git a/tests/qtest/qmp-test.c b/tests/qtest/qmp-test.c
index cd27fae3de..f0aaa94d8a 100644
--- a/tests/qtest/qmp-test.c
+++ b/tests/qtest/qmp-test.c
@@ -299,7 +299,7 @@  static void test_qmp_preconfig(void)
     qobject_unref(rsp);
 
     /* exit preconfig state */
-    g_assert(!qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'x-exit-preconfig' }")));
+    g_assert(!qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'exit-preconfig' }")));
     qtest_qmp_eventwait(qs, "RESUME");
 
     /* check that query-status returns running state */
@@ -309,8 +309,8 @@  static void test_qmp_preconfig(void)
     g_assert_cmpstr(qdict_get_try_str(ret, "status"), ==, "running");
     qobject_unref(rsp);
 
-    /* check that x-exit-preconfig returns error after exiting preconfig */
-    g_assert(qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'x-exit-preconfig' }")));
+    /* check that exit-preconfig returns error after exiting preconfig */
+    g_assert(qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'exit-preconfig' }")));
 
     /* enabled commands, no error expected  */
     g_assert(!qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'query-cpus-fast' }")));