diff mbox series

[V5,02/12] cpus: stop vm in suspended state

Message ID 1699900440-207345-3-git-send-email-steven.sistare@oracle.com (mailing list archive)
State New, archived
Headers show
Series fix migration of suspended runstate | expand

Commit Message

Steven Sistare Nov. 13, 2023, 6:33 p.m. UTC
A vm in the suspended state is not completely stopped.  The VCPUs have been
paused, but the cpu clock still runs, and runstate notifiers for the
transition to stopped have not been called.  Modify vm_stop_force_state to
completely stop the vm if the current state is suspended, to be called for
live migration and snapshots.

Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 system/cpus.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Comments

Fabiano Rosas Nov. 20, 2023, 2:15 p.m. UTC | #1
Steve Sistare <steven.sistare@oracle.com> writes:

> A vm in the suspended state is not completely stopped.  

Is this a statement of a fact about VMs in the suspended state in
general or is this describing what this patch is trying to fix?

> The VCPUs have been paused, but the cpu clock still runs, and runstate
> notifiers for the transition to stopped have not been called.

...it reads like the latter, but then why aren't we fixing this at the
moment we put the VM in the suspend state?

> Modify vm_stop_force_state to completely stop the vm if the current
> state is suspended, to be called for live migration and snapshots.

Hm, this changes the meaning of the "force" from:

"force a state even if already stopped"

into:

"force a complete stop if already suspended, otherwise just set the
state"

I don't know what to make of this, shouldn't all vm_stops cause a
complete stop?

We need to at least resolve the overloading of the 'force' term.

>
> Suggested-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  system/cpus.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/system/cpus.c b/system/cpus.c
> index f72c4be..c772708 100644
> --- a/system/cpus.c
> +++ b/system/cpus.c
> @@ -255,6 +255,8 @@ void cpu_interrupt(CPUState *cpu, int mask)
>  static int do_vm_stop(RunState state, bool send_stop, bool force)
>  {
>      int ret = 0;
> +    bool running = runstate_is_running();
> +    bool suspended = runstate_check(RUN_STATE_SUSPENDED);
>  
>      if (qemu_in_vcpu_thread()) {
>          qemu_system_vmstop_request_prepare();
> @@ -267,10 +269,12 @@ static int do_vm_stop(RunState state, bool send_stop, bool force)
>          return 0;
>      }
>  
> -    if (runstate_is_running()) {
> +    if (running || (suspended && force)) {
>          runstate_set(state);
>          cpu_disable_ticks();
> -        pause_all_vcpus();
> +        if (running) {
> +            pause_all_vcpus();
> +        }
>          vm_state_notify(0, state);
>          if (send_stop) {
>              qapi_event_send_stop();
Steven Sistare Nov. 20, 2023, 7:10 p.m. UTC | #2
On 11/20/2023 9:15 AM, Fabiano Rosas wrote:
> Steve Sistare <steven.sistare@oracle.com> writes:
> 
>> A vm in the suspended state is not completely stopped.  
> 
> Is this a statement of a fact about VMs in the suspended state in
> general or is this describing what this patch is trying to fix?

The former.

>> The VCPUs have been paused, but the cpu clock still runs, and runstate
>> notifiers for the transition to stopped have not been called.
> 
> ...it reads like the latter, but then why aren't we fixing this at the
> moment we put the VM in the suspend state?

cpu_get_ticks() must continue to tick while the guest is suspended, so that
QEMU_CLOCK_VIRTUAL continues to tick, so that timeouts based on that clock
will fire.  One example is timed wake from suspend,  acpi_pm_tmr_timer.

>> Modify vm_stop_force_state to completely stop the vm if the current
>> state is suspended, to be called for live migration and snapshots.
> 
> Hm, this changes the meaning of the "force" from:
> 
> "force a state even if already stopped"
> 
> into:
> 
> "force a complete stop if already suspended, otherwise just set the
> state"

vm_stop_force_state has the same behavior as before for all states
except suspended.  If suspended, it also:
  - stops cpu ticks
  - calls runstate stopped handlers
  - sets a new runstate

> I don't know what to make of this, shouldn't all vm_stops cause a
> complete stop?

We cannot stop cpu_get_ticks.  We could maybe call the runstate stop handlers,
but that requires a careful examination of every handler, and there is no obvious 
correctness or cleanliness reason to stop them immediately on vm_stop(), since cpu 
ticks still needs special handling later.

> We need to at least resolve the overloading of the 'force' term.

How about a more complete function header comment:

/*
 * If the machine is running or suspended, completely stop it.
 * Force the new runstate to @state.
 * The current state is forgotten forever.
 */

- Steve

>> Suggested-by: Peter Xu <peterx@redhat.com>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>>  system/cpus.c | 8 ++++++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/system/cpus.c b/system/cpus.c
>> index f72c4be..c772708 100644
>> --- a/system/cpus.c
>> +++ b/system/cpus.c
>> @@ -255,6 +255,8 @@ void cpu_interrupt(CPUState *cpu, int mask)
>>  static int do_vm_stop(RunState state, bool send_stop, bool force)
>>  {
>>      int ret = 0;
>> +    bool running = runstate_is_running();
>> +    bool suspended = runstate_check(RUN_STATE_SUSPENDED);
>>  
>>      if (qemu_in_vcpu_thread()) {
>>          qemu_system_vmstop_request_prepare();
>> @@ -267,10 +269,12 @@ static int do_vm_stop(RunState state, bool send_stop, bool force)
>>          return 0;
>>      }
>>  
>> -    if (runstate_is_running()) {
>> +    if (running || (suspended && force)) {
>>          runstate_set(state);
>>          cpu_disable_ticks();
>> -        pause_all_vcpus();
>> +        if (running) {
>> +            pause_all_vcpus();
>> +        }
>>          vm_state_notify(0, state);
>>          if (send_stop) {
>>              qapi_event_send_stop();
Peter Xu Nov. 20, 2023, 7:59 p.m. UTC | #3
On Mon, Nov 13, 2023 at 10:33:50AM -0800, Steve Sistare wrote:
> A vm in the suspended state is not completely stopped.  The VCPUs have been
> paused, but the cpu clock still runs, and runstate notifiers for the
> transition to stopped have not been called.  Modify vm_stop_force_state to
> completely stop the vm if the current state is suspended, to be called for
> live migration and snapshots.
> 
> Suggested-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  system/cpus.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/system/cpus.c b/system/cpus.c
> index f72c4be..c772708 100644
> --- a/system/cpus.c
> +++ b/system/cpus.c
> @@ -255,6 +255,8 @@ void cpu_interrupt(CPUState *cpu, int mask)
>  static int do_vm_stop(RunState state, bool send_stop, bool force)
>  {
>      int ret = 0;
> +    bool running = runstate_is_running();
> +    bool suspended = runstate_check(RUN_STATE_SUSPENDED);
>  
>      if (qemu_in_vcpu_thread()) {
>          qemu_system_vmstop_request_prepare();
> @@ -267,10 +269,12 @@ static int do_vm_stop(RunState state, bool send_stop, bool force)
>          return 0;
>      }
>  
> -    if (runstate_is_running()) {
> +    if (running || (suspended && force)) {
>          runstate_set(state);
>          cpu_disable_ticks();

Not directly relevant, but this is weird that I just notice.

If we disable ticks before stopping vCPUs, IIUC it means vcpus can see
stall ticks.  I checked the vm_start() and indeed that one did it in the
other way round: we'll stop vCPUs before stopping the ticks.

> -        pause_all_vcpus();
> +        if (running) {
> +            pause_all_vcpus();
> +        }
>          vm_state_notify(0, state);
>          if (send_stop) {
>              qapi_event_send_stop();

IIUC the "force" is not actually needed.  It's only used when SUSPENDED,
right?

In general, considering all above, I'm wondering something like this would
be much cleaner (and dropping force)?

===8<===
static int do_vm_stop(RunState state, bool send_stop)
 {
+    bool suspended = runstate_check(RUN_STATE_SUSPENDED);
+    bool running = runstate_is_running();
     int ret = 0;
 
-    if (runstate_is_running()) {
+    /*
+     * RUNNING:   VM and vCPUs are all running
+     * SUSPENDED: VM is running, VCPUs are stopped
+     * Others:    VM and vCPUs are all stopped
+     */
+
+    /* Whether do we need to stop vCPUs? */
+    if (running) {
+        pause_all_vcpus();
+    }
+
+    /* Whether do we need to stop the VM in general? */
+    if (running || suspended) {
         runstate_set(state);
         cpu_disable_ticks();
-        pause_all_vcpus();
         vm_state_notify(0, state);
         if (send_stop) {
             qapi_event_send_stop();
Fabiano Rosas Nov. 20, 2023, 8:47 p.m. UTC | #4
Peter Xu <peterx@redhat.com> writes:

> On Mon, Nov 13, 2023 at 10:33:50AM -0800, Steve Sistare wrote:
>> A vm in the suspended state is not completely stopped.  The VCPUs have been
>> paused, but the cpu clock still runs, and runstate notifiers for the
>> transition to stopped have not been called.  Modify vm_stop_force_state to
>> completely stop the vm if the current state is suspended, to be called for
>> live migration and snapshots.
>> 
>> Suggested-by: Peter Xu <peterx@redhat.com>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>>  system/cpus.c | 8 ++++++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>> 
>> diff --git a/system/cpus.c b/system/cpus.c
>> index f72c4be..c772708 100644
>> --- a/system/cpus.c
>> +++ b/system/cpus.c
>> @@ -255,6 +255,8 @@ void cpu_interrupt(CPUState *cpu, int mask)
>>  static int do_vm_stop(RunState state, bool send_stop, bool force)
>>  {
>>      int ret = 0;
>> +    bool running = runstate_is_running();
>> +    bool suspended = runstate_check(RUN_STATE_SUSPENDED);
>>  
>>      if (qemu_in_vcpu_thread()) {
>>          qemu_system_vmstop_request_prepare();
>> @@ -267,10 +269,12 @@ static int do_vm_stop(RunState state, bool send_stop, bool force)
>>          return 0;
>>      }
>>  
>> -    if (runstate_is_running()) {
>> +    if (running || (suspended && force)) {
>>          runstate_set(state);
>>          cpu_disable_ticks();
>
> Not directly relevant, but this is weird that I just notice.
>
> If we disable ticks before stopping vCPUs, IIUC it means vcpus can see
> stall ticks.  I checked the vm_start() and indeed that one did it in the
> other way round: we'll stop vCPUs before stopping the ticks.
>
>> -        pause_all_vcpus();
>> +        if (running) {
>> +            pause_all_vcpus();
>> +        }
>>          vm_state_notify(0, state);
>>          if (send_stop) {
>>              qapi_event_send_stop();
>
> IIUC the "force" is not actually needed.  It's only used when SUSPENDED,
> right?

That's the overloading I'm complaining about. We're using "force" to say
both: "include suspended" and: "set the state". This is basically taking
knowledge from the callsite being the migration code and encoding it in
that flag.

I'd prefer something like:

static int do_vm_stop(RunState state, bool send_stop, bool set_state,
                      bool include_suspended);
Steven Sistare Nov. 20, 2023, 8:55 p.m. UTC | #5
On 11/20/2023 2:59 PM, Peter Xu wrote:
> On Mon, Nov 13, 2023 at 10:33:50AM -0800, Steve Sistare wrote:
>> A vm in the suspended state is not completely stopped.  The VCPUs have been
>> paused, but the cpu clock still runs, and runstate notifiers for the
>> transition to stopped have not been called.  Modify vm_stop_force_state to
>> completely stop the vm if the current state is suspended, to be called for
>> live migration and snapshots.
>>
>> Suggested-by: Peter Xu <peterx@redhat.com>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>>  system/cpus.c | 8 ++++++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/system/cpus.c b/system/cpus.c
>> index f72c4be..c772708 100644
>> --- a/system/cpus.c
>> +++ b/system/cpus.c
>> @@ -255,6 +255,8 @@ void cpu_interrupt(CPUState *cpu, int mask)
>>  static int do_vm_stop(RunState state, bool send_stop, bool force)
>>  {
>>      int ret = 0;
>> +    bool running = runstate_is_running();
>> +    bool suspended = runstate_check(RUN_STATE_SUSPENDED);
>>  
>>      if (qemu_in_vcpu_thread()) {
>>          qemu_system_vmstop_request_prepare();
>> @@ -267,10 +269,12 @@ static int do_vm_stop(RunState state, bool send_stop, bool force)
>>          return 0;
>>      }
>>  
>> -    if (runstate_is_running()) {
>> +    if (running || (suspended && force)) {
>>          runstate_set(state);
>>          cpu_disable_ticks();
> 
> Not directly relevant, but this is weird that I just notice.
> 
> If we disable ticks before stopping vCPUs, IIUC it means vcpus can see
> stall ticks.  I checked the vm_start() and indeed that one did it in the
> other way round: we'll stop vCPUs before stopping the ticks.
> 
>> -        pause_all_vcpus();
>> +        if (running) {
>> +            pause_all_vcpus();
>> +        }
>>          vm_state_notify(0, state);
>>          if (send_stop) {
>>              qapi_event_send_stop();
> 
> IIUC the "force" is not actually needed.  It's only used when SUSPENDED,
> right?
> 
> In general, considering all above, I'm wondering something like this would
> be much cleaner (and dropping force)?

If we drop force, then all calls to vm_stop will completely stop the suspended
state, eg an hmp "stop" command. This causes two problems.  First, that is a change
in user-visible behavior for something that currently works, vs the migration code
where we are fixing brokenness.  Second, it does not quite work, because the state 
becomes RUN_STATE_PAUSED, so the suspended state is forgotten, and the hmp "cont" 
will try to set the running state.  I could fix that by introducing a new state
RUN_STATE_SUSPENDED_STOPPED, but again it is a user-visible change in existing
behavior.  (I even implemented that while developing, then I realized it was not 
needed to fix the migration bugs.)

- Steve

> ===8<===
> static int do_vm_stop(RunState state, bool send_stop)
>  {
> +    bool suspended = runstate_check(RUN_STATE_SUSPENDED);
> +    bool running = runstate_is_running();
>      int ret = 0;
>  
> -    if (runstate_is_running()) {
> +    /*
> +     * RUNNING:   VM and vCPUs are all running
> +     * SUSPENDED: VM is running, VCPUs are stopped
> +     * Others:    VM and vCPUs are all stopped
> +     */
> +
> +    /* Whether do we need to stop vCPUs? */
> +    if (running) {
> +        pause_all_vcpus();
> +    }
> +
> +    /* Whether do we need to stop the VM in general? */
> +    if (running || suspended) {
>          runstate_set(state);
>          cpu_disable_ticks();
> -        pause_all_vcpus();
>          vm_state_notify(0, state);
>          if (send_stop) {
>              qapi_event_send_stop();
>
Steven Sistare Nov. 20, 2023, 9:26 p.m. UTC | #6
On 11/20/2023 3:47 PM, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
>> On Mon, Nov 13, 2023 at 10:33:50AM -0800, Steve Sistare wrote:
>>> A vm in the suspended state is not completely stopped.  The VCPUs have been
>>> paused, but the cpu clock still runs, and runstate notifiers for the
>>> transition to stopped have not been called.  Modify vm_stop_force_state to
>>> completely stop the vm if the current state is suspended, to be called for
>>> live migration and snapshots.
>>>
>>> Suggested-by: Peter Xu <peterx@redhat.com>
>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>> ---
>>>  system/cpus.c | 8 ++++++--
>>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/system/cpus.c b/system/cpus.c
>>> index f72c4be..c772708 100644
>>> --- a/system/cpus.c
>>> +++ b/system/cpus.c
>>> @@ -255,6 +255,8 @@ void cpu_interrupt(CPUState *cpu, int mask)
>>>  static int do_vm_stop(RunState state, bool send_stop, bool force)
>>>  {
>>>      int ret = 0;
>>> +    bool running = runstate_is_running();
>>> +    bool suspended = runstate_check(RUN_STATE_SUSPENDED);
>>>  
>>>      if (qemu_in_vcpu_thread()) {
>>>          qemu_system_vmstop_request_prepare();
>>> @@ -267,10 +269,12 @@ static int do_vm_stop(RunState state, bool send_stop, bool force)
>>>          return 0;
>>>      }
>>>  
>>> -    if (runstate_is_running()) {
>>> +    if (running || (suspended && force)) {
>>>          runstate_set(state);
>>>          cpu_disable_ticks();
>>
>> Not directly relevant, but this is weird that I just notice.
>>
>> If we disable ticks before stopping vCPUs, IIUC it means vcpus can see
>> stall ticks.  I checked the vm_start() and indeed that one did it in the
>> other way round: we'll stop vCPUs before stopping the ticks.
>>
>>> -        pause_all_vcpus();
>>> +        if (running) {
>>> +            pause_all_vcpus();
>>> +        }
>>>          vm_state_notify(0, state);
>>>          if (send_stop) {
>>>              qapi_event_send_stop();
>>
>> IIUC the "force" is not actually needed.  It's only used when SUSPENDED,
>> right?

When not suspended, the force flag causes a stopped state to be forced even
if current is a different stopped state.

> That's the overloading I'm complaining about. We're using "force" to say
> both: "include suspended" and: "set the state". This is basically taking
> knowledge from the callsite being the migration code and encoding it in
> that flag.
> 
> I'd prefer something like:
> 
> static int do_vm_stop(RunState state, bool send_stop, bool set_state,
>                       bool include_suspended);

This function has always been tailored for use by migration code and no other
callers.  Migration would always pass set_state=true and include_suspended=true.
We have no use case for other combinations and no test for them.

To my mind, "force" naturally implies both behaviors.  We force the machine into the
specified stop state, completely stopping suspended execution.

Perhaps renaming vm_stop_force_state would erase the old association of "force" with 
only forcing runstate, such as vm_stop_all().

- Steve
Peter Xu Nov. 20, 2023, 9:44 p.m. UTC | #7
On Mon, Nov 20, 2023 at 03:55:54PM -0500, Steven Sistare wrote:
> If we drop force, then all calls to vm_stop will completely stop the
> suspended state, eg an hmp "stop" command. This causes two problems.
> First, that is a change in user-visible behavior for something that
> currently works,

IMHO it depends on what should be the correct behavior.  IOW, when VM is in
SUSPENDED state and then the user sends "stop" QMP command, what should we
expect?

My understanding is we should expect to fully stop the VM, including the
ticks, for example.  Keeping the ticks running even after QMP "stop"
doesn't sound right, isn't it?

> vs the migration code where we are fixing brokenness.

This is not a migration-only bug if above holds, IMO.

> Second, it does not quite work, because the state becomes
> RUN_STATE_PAUSED, so the suspended state is forgotten, and the hmp "cont"
> will try to set the running state.  I could fix that by introducing a new
> state RUN_STATE_SUSPENDED_STOPPED, but again it is a user-visible change
> in existing behavior.  (I even implemented that while developing, then I
> realized it was not needed to fix the migration bugs.)

Good point.

Now with above comments, what's your thoughts on whether we should change
the user behavior?  My answer is still a yes.

Maybe SUSPENDED should not be a RunState at all? SUSPENDED is guest visible
behavior, while something like QMP "stop" is not guest visible.  Maybe we
should remember it separately?

It means qemu_system_suspend() could remember that in a separate field (as
part of guest state), then when wakeup we should conditionally go back
with/without vcpus running depending on the new "suspended" state.
Steven Sistare Nov. 21, 2023, 9:21 p.m. UTC | #8
On 11/20/2023 4:44 PM, Peter Xu wrote:
> On Mon, Nov 20, 2023 at 03:55:54PM -0500, Steven Sistare wrote:
>> If we drop force, then all calls to vm_stop will completely stop the
>> suspended state, eg an hmp "stop" command. This causes two problems.
>> First, that is a change in user-visible behavior for something that
>> currently works,
> 
> IMHO it depends on what should be the correct behavior.  IOW, when VM is in
> SUSPENDED state and then the user sends "stop" QMP command, what should we
> expect?
> 
> My understanding is we should expect to fully stop the VM, including the
> ticks, for example.  Keeping the ticks running even after QMP "stop"
> doesn't sound right, isn't it?

I agree, but others may not, and this decision would require approval from 
maintainers in other areas, including upper layers such as libvirt that are
aware of the suspended state.  It is a user-visible change, and may require 
a new libvirt release.

>> vs the migration code where we are fixing brokenness.
> 
> This is not a migration-only bug if above holds, IMO.
> 
>> Second, it does not quite work, because the state becomes
>> RUN_STATE_PAUSED, so the suspended state is forgotten, and the hmp "cont"
>> will try to set the running state.  I could fix that by introducing a new
>> state RUN_STATE_SUSPENDED_STOPPED, but again it is a user-visible change
>> in existing behavior.  (I even implemented that while developing, then I
>> realized it was not needed to fix the migration bugs.)
> 
> Good point.
> 
> Now with above comments, what's your thoughts on whether we should change
> the user behavior?  My answer is still a yes.
> 
> Maybe SUSPENDED should not be a RunState at all? SUSPENDED is guest visible
> behavior, while something like QMP "stop" is not guest visible.  Maybe we
> should remember it separately?
>
> It means qemu_system_suspend() could remember that in a separate field (as
> part of guest state), then when wakeup we should conditionally go back
> with/without vcpus running depending on the new "suspended" state.

Regardless of how we remember it, the status command must still expose the 
suspended state to the user.  The user must be able to see that a guest is 
suspended, and decide when to issue a wakeup command.

If we change the stop command to completely stop a suspended vm, then we must
allow the user to query whether a vm is suspended-running or suspended-stopped,
because the command they must issue to resume is different: wakeup for the
former, and cont for the latter.

This change is visible to libvirt.  Adding it will delay this entire patch
series, and is not necessary for fixing the migration bug.  There is no
downside to drawing the line here for this series, and possibly changing stop
semantics in the future.

- Steve
Peter Xu Nov. 21, 2023, 10:47 p.m. UTC | #9
On Tue, Nov 21, 2023 at 04:21:18PM -0500, Steven Sistare wrote:
> On 11/20/2023 4:44 PM, Peter Xu wrote:
> > On Mon, Nov 20, 2023 at 03:55:54PM -0500, Steven Sistare wrote:
> >> If we drop force, then all calls to vm_stop will completely stop the
> >> suspended state, eg an hmp "stop" command. This causes two problems.
> >> First, that is a change in user-visible behavior for something that
> >> currently works,
> > 
> > IMHO it depends on what should be the correct behavior.  IOW, when VM is in
> > SUSPENDED state and then the user sends "stop" QMP command, what should we
> > expect?
> > 
> > My understanding is we should expect to fully stop the VM, including the
> > ticks, for example.  Keeping the ticks running even after QMP "stop"
> > doesn't sound right, isn't it?
> 
> I agree, but others may not, and this decision would require approval from 
> maintainers in other areas, including upper layers such as libvirt that are
> aware of the suspended state.  It is a user-visible change, and may require 
> a new libvirt release.

$ ./scripts/get_maintainer.pl -f system/cpus.c
Richard Henderson <richard.henderson@linaro.org> (maintainer:Overall TCG CPUs)
Paolo Bonzini <pbonzini@redhat.com> (reviewer:Overall TCG CPUs)
qemu-devel@nongnu.org (open list:All patches CC here)

I'm also copying Richard, while Dan/Paolo is already in the loop, so we
should have the "quorum" already.  Let's see whether we can already get
some comments from the maintainers..

> 
> >> vs the migration code where we are fixing brokenness.
> > 
> > This is not a migration-only bug if above holds, IMO.
> > 
> >> Second, it does not quite work, because the state becomes
> >> RUN_STATE_PAUSED, so the suspended state is forgotten, and the hmp "cont"
> >> will try to set the running state.  I could fix that by introducing a new
> >> state RUN_STATE_SUSPENDED_STOPPED, but again it is a user-visible change
> >> in existing behavior.  (I even implemented that while developing, then I
> >> realized it was not needed to fix the migration bugs.)
> > 
> > Good point.
> > 
> > Now with above comments, what's your thoughts on whether we should change
> > the user behavior?  My answer is still a yes.
> > 
> > Maybe SUSPENDED should not be a RunState at all? SUSPENDED is guest visible
> > behavior, while something like QMP "stop" is not guest visible.  Maybe we
> > should remember it separately?
> >
> > It means qemu_system_suspend() could remember that in a separate field (as
> > part of guest state), then when wakeup we should conditionally go back
> > with/without vcpus running depending on the new "suspended" state.
>
> Regardless of how we remember it, the status command must still expose
> the suspended state to the user.  The user must be able to see that a
> guest is suspended, and decide when to issue a wakeup command.

Hmm, right, we may want to keep having the SUSPENDED state in RunState,
even another separate "vm_suspended" boolean might still be required.

> 
> If we change the stop command to completely stop a suspended vm, then we must
> allow the user to query whether a vm is suspended-running or suspended-stopped,
> because the command they must issue to resume is different: wakeup for the
> former, and cont for the latter.

If it's stopped, the user must need a "cont" anyway.  And then if after
"cont" the user still sees it's suspended, then would "system_wakeup" work
here if necessary, after that "cont"?

Let's consider the current QEMU with below sequence of operations:

  1) vm running
  2) guest triggers ACPI suspend -> vm suspended
  3) admin triggers "stop" cmd -> vm suspended (ignored..)
  4) admin triggers "cont" cmd -> vm suspended (ignored.. too)

AFAICT both 2) and 3) are unwanted behavior, and after noticing 3) I feel
stronger that this is not a migration issue alone.

It also means after step 1)-3) if we got a wakeup elsewhere, the VM can
actually be running!  That's definitely unexpected after admin sends "stop"
already.  Isn't that another real bug?

I'm slightly confused on why you said above that libvirt will need a new
release. Could you elaborate?  Especially on what scenario we need to
maintain compatibility that still makes sense.

> 
> This change is visible to libvirt.  Adding it will delay this entire patch
> series, and is not necessary for fixing the migration bug.  There is no
> downside to drawing the line here for this series, and possibly changing stop
> semantics in the future.

This series will need to wait for rc releases anyway until 8.2 all out:

  https://wiki.qemu.org/Planning/8.2

I think we still have time to even catch the earliest train right after 8.2
released, if we can reach a consensus soon in whatever form.

Having a partial solution merged for migration is probably doable, but that
will make the code even more complicated and harder to maintain.  So before
doing so, I'd at least like to understand better on what use case you were
describing that will start to fall apart.

Thanks,
Daniel P. Berrangé Nov. 22, 2023, 9:38 a.m. UTC | #10
On Mon, Nov 20, 2023 at 04:44:50PM -0500, Peter Xu wrote:
> On Mon, Nov 20, 2023 at 03:55:54PM -0500, Steven Sistare wrote:
> > If we drop force, then all calls to vm_stop will completely stop the
> > suspended state, eg an hmp "stop" command. This causes two problems.
> > First, that is a change in user-visible behavior for something that
> > currently works,
> 
> IMHO it depends on what should be the correct behavior.  IOW, when VM is in
> SUSPENDED state and then the user sends "stop" QMP command, what should we
> expect?

I would say that from a mgmtm app POV "stop" is initiating a state
transition, from RUN_STATE_RUNNING to RUN_STATE_PAUSED and "cont"
is doing the reverse from PAUSED to RUNNING.

It is a little more complicated than that as there are some other
states like INMIGRATE that are conceptually equiv to RUNNING,
and states where the transition simply doesn't make sense.


So my question is if we're in "SUSPENDED" and someone issues "stop",
what state do we go into, and perhaps more importantly what state
do we go to in a subsequent "cont".

If you say SUSPENDED ---(stop)---> PAUSED ---(cont)---> SUSPENDED
then we create a problem, because the decision for the transition
out of PAUSED needs memory of the previous state.

> My understanding is we should expect to fully stop the VM, including the
> ticks, for example.  Keeping the ticks running even after QMP "stop"
> doesn't sound right, isn't it?

The "stop" QMP command is documented as

    "Stop all guest VCPU execution"

the devil is in the detail though, and we've not documented any detail.

Whether or not timers keep running across stop/cont I think can be
argued to be an impl detail, as long as the headline goal "vcpus
don't execute" is satisfied.

> > vs the migration code where we are fixing brokenness.
> 
> This is not a migration-only bug if above holds, IMO.
> 
> > Second, it does not quite work, because the state becomes
> > RUN_STATE_PAUSED, so the suspended state is forgotten, and the hmp "cont"
> > will try to set the running state.  I could fix that by introducing a new
> > state RUN_STATE_SUSPENDED_STOPPED, but again it is a user-visible change
> > in existing behavior.  (I even implemented that while developing, then I
> > realized it was not needed to fix the migration bugs.)
> 
> Good point.

We have added new guest states periodically. It is a user visible
change, but you could argue that it is implementing a new feature
ie the ability to "stop" a "suspended" guest, and so is justified.

S3 is so little used in virt, so I'm not surprised we're finding
long standing edge cases that have never been thought about before.

> Now with above comments, what's your thoughts on whether we should change
> the user behavior?  My answer is still a yes.
> 
> Maybe SUSPENDED should not be a RunState at all? SUSPENDED is guest visible
> behavior, while something like QMP "stop" is not guest visible.  Maybe we
> should remember it separately?

Yes, every time I look at this area I come away thinking that
the RunState enum is a mess, overloading too many different
concepts onto the same single field.

Specifically "SUSPENDED" vs "RUNNING" is a reflection of guest
state (ie whether or not the VM is in S3), but pretty much all
the others are a reflection of QEMU host state. I kind of feel
that SUSPENDED (S3) probably shouldn't have been a RunState at
all. I'd probably put guest-panicked into a separate thing too.

But we're stuck with what we have.

> It means qemu_system_suspend() could remember that in a separate field (as
> part of guest state), then when wakeup we should conditionally go back
> with/without vcpus running depending on the new "suspended" state.



With regards,
Daniel
Peter Xu Nov. 22, 2023, 4:21 p.m. UTC | #11
On Wed, Nov 22, 2023 at 09:38:06AM +0000, Daniel P. Berrangé wrote:
> On Mon, Nov 20, 2023 at 04:44:50PM -0500, Peter Xu wrote:
> > On Mon, Nov 20, 2023 at 03:55:54PM -0500, Steven Sistare wrote:
> > > If we drop force, then all calls to vm_stop will completely stop the
> > > suspended state, eg an hmp "stop" command. This causes two problems.
> > > First, that is a change in user-visible behavior for something that
> > > currently works,
> > 
> > IMHO it depends on what should be the correct behavior.  IOW, when VM is in
> > SUSPENDED state and then the user sends "stop" QMP command, what should we
> > expect?
> 
> I would say that from a mgmtm app POV "stop" is initiating a state
> transition, from RUN_STATE_RUNNING to RUN_STATE_PAUSED and "cont"
> is doing the reverse from PAUSED to RUNNING.
> 
> It is a little more complicated than that as there are some other
> states like INMIGRATE that are conceptually equiv to RUNNING,
> and states where the transition simply doesn't make sense.

In the qemu impl, INMIGRATE is, imho, more equiv of PAUSED - do_vm_stop()
mostly ignores every state except RUNNING (putting bdrv operations aside).
IOW, anything besides "running" is treated as "not running".

But then Paolo fixed that in 1e9981465f ("qmp: handle stop/cont in
INMIGRATE state"), wiring that to autostart.

Now we seem to find that "suspended" should actually fall within (where
"vm" is running, but "vcpu" is not), and it seems we should treat "vm" and
"vcpu" differently.

> 
> So my question is if we're in "SUSPENDED" and someone issues "stop",
> what state do we go into, and perhaps more importantly what state
> do we go to in a subsequent "cont".

I think we must stop the "vm", not only the "vcpu".  I discussed this bit
in the other thread more or less: it's because qemu_system_wakeup_request()
can be called in many places, e.g. acpi_pm_tmr_timer().

It means even after the VM is "stop"ped by the admin QMP (where qmp_stop()
ignores SUSPENDED, keep the "vm" running), it can silently got waken up
without admin even noticing it.  I'm not sure what Libvirt will behave if
it suddenly receives a QAPI_EVENT_WAKEUP randomly after a "stop".

> 
> If you say SUSPENDED ---(stop)---> PAUSED ---(cont)---> SUSPENDED
> then we create a problem, because the decision for the transition
> out of PAUSED needs memory of the previous state.

Right, that's why I think we at least need one more boolean to remember the
suspended state, then when we switch from any STOP states into any RUN
states, we know where to go.  Here STOP states I meant anything except
RUNNING and SUSPENDED, while RUN -> RUNNING or SUSPENDED.

> 
> > My understanding is we should expect to fully stop the VM, including the
> > ticks, for example.  Keeping the ticks running even after QMP "stop"
> > doesn't sound right, isn't it?
> 
> The "stop" QMP command is documented as
> 
>     "Stop all guest VCPU execution"
> 
> the devil is in the detail though, and we've not documented any detail.
> 
> Whether or not timers keep running across stop/cont I think can be
> argued to be an impl detail, as long as the headline goal "vcpus
> don't execute" is satisfied.

"stop" was since qemu v0.14, so I guess we can't blame the missing of
details or any form of inaccuracy..  Obviously we do more than "stop vCPU
executions" in the current implementation.

But after we reach a consensus on how we should fix the current suspended
problem, we may want to update the documentation to start containing more
information.

> 
> > > vs the migration code where we are fixing brokenness.
> > 
> > This is not a migration-only bug if above holds, IMO.
> > 
> > > Second, it does not quite work, because the state becomes
> > > RUN_STATE_PAUSED, so the suspended state is forgotten, and the hmp "cont"
> > > will try to set the running state.  I could fix that by introducing a new
> > > state RUN_STATE_SUSPENDED_STOPPED, but again it is a user-visible change
> > > in existing behavior.  (I even implemented that while developing, then I
> > > realized it was not needed to fix the migration bugs.)
> > 
> > Good point.
> 
> We have added new guest states periodically. It is a user visible
> change, but you could argue that it is implementing a new feature
> ie the ability to "stop" a "suspended" guest, and so is justified.
> 
> S3 is so little used in virt, so I'm not surprised we're finding
> long standing edge cases that have never been thought about before.
> 
> > Now with above comments, what's your thoughts on whether we should change
> > the user behavior?  My answer is still a yes.
> > 
> > Maybe SUSPENDED should not be a RunState at all? SUSPENDED is guest visible
> > behavior, while something like QMP "stop" is not guest visible.  Maybe we
> > should remember it separately?
> 
> Yes, every time I look at this area I come away thinking that
> the RunState enum is a mess, overloading too many different
> concepts onto the same single field.
> 
> Specifically "SUSPENDED" vs "RUNNING" is a reflection of guest
> state (ie whether or not the VM is in S3), but pretty much all
> the others are a reflection of QEMU host state. I kind of feel
> that SUSPENDED (S3) probably shouldn't have been a RunState at
> all. I'd probably put guest-panicked into a separate thing too.
> 
> But we're stuck with what we have.

IMO compatibility is only necessary if at least the existing code is
running well.  But now I see it a major flaw in suspended state and I can't
see how it can go right if with current code base..  My concern is instead
that after suspended will be used more (e.g., assuming CPR will rely on it)
we can have more chance to confuse/oops a mgmt app like Libvirt, like I
described above.

In summary, I think a current solution to me is only to fix at least
suspended state for good, by:

  - adding vm_suspended boolean to remember machine RUNNING / SUSPENDED
    state.  When "cont" we need to switch to either RUNNING / SUSPENDED
    depending on the boolean.

  - keeping SUSPENDED state as RunState (for compatibility, otherwise we'll
    need another interface to fetch that boolean anyway), even though not
    query-able during any !RUNNING && !SUSPENDED state.. hopefully not a
    big deal

  - enrich ducumentation of qmp_stop/qmp_cont to describe what they really do

  - (with suspended working all right...) fix migration of SUSPENDED state

I don't expect a lot of code changes is needed, maybe even less than the
current series (because we don't need special knob to differenciate
migration or non-migration callers of do_vm_stop()). IMHO this series is
already doing some of that but just decided to ignore outside-migration
states for suspeneded.

We may want to add some test cases though to verify the suspended state
transitions (maybe easier to put into migration-test with the new ACPI
guest code), but optional.

Thanks,
Steven Sistare Nov. 28, 2023, 1:26 p.m. UTC | #12
On 11/22/2023 11:21 AM, Peter Xu wrote:
> On Wed, Nov 22, 2023 at 09:38:06AM +0000, Daniel P. Berrangé wrote:
>> On Mon, Nov 20, 2023 at 04:44:50PM -0500, Peter Xu wrote:
>>> On Mon, Nov 20, 2023 at 03:55:54PM -0500, Steven Sistare wrote:
>>>> If we drop force, then all calls to vm_stop will completely stop the
>>>> suspended state, eg an hmp "stop" command. This causes two problems.
>>>> First, that is a change in user-visible behavior for something that
>>>> currently works,
>>>
>>> IMHO it depends on what should be the correct behavior.  IOW, when VM is in
>>> SUSPENDED state and then the user sends "stop" QMP command, what should we
>>> expect?
>>
>> I would say that from a mgmtm app POV "stop" is initiating a state
>> transition, from RUN_STATE_RUNNING to RUN_STATE_PAUSED and "cont"
>> is doing the reverse from PAUSED to RUNNING.
>>
>> It is a little more complicated than that as there are some other
>> states like INMIGRATE that are conceptually equiv to RUNNING,
>> and states where the transition simply doesn't make sense.
> 
> In the qemu impl, INMIGRATE is, imho, more equiv of PAUSED - do_vm_stop()
> mostly ignores every state except RUNNING (putting bdrv operations aside).
> IOW, anything besides "running" is treated as "not running".
> 
> But then Paolo fixed that in 1e9981465f ("qmp: handle stop/cont in
> INMIGRATE state"), wiring that to autostart.
> 
> Now we seem to find that "suspended" should actually fall within (where
> "vm" is running, but "vcpu" is not), and it seems we should treat "vm" and
> "vcpu" differently.
> 
>>
>> So my question is if we're in "SUSPENDED" and someone issues "stop",
>> what state do we go into, and perhaps more importantly what state
>> do we go to in a subsequent "cont".
> 
> I think we must stop the "vm", not only the "vcpu".  I discussed this bit
> in the other thread more or less: it's because qemu_system_wakeup_request()
> can be called in many places, e.g. acpi_pm_tmr_timer().
> 
> It means even after the VM is "stop"ped by the admin QMP (where qmp_stop()
> ignores SUSPENDED, keep the "vm" running), it can silently got waken up
> without admin even noticing it.  I'm not sure what Libvirt will behave if
> it suddenly receives a QAPI_EVENT_WAKEUP randomly after a "stop".
> 
>>
>> If you say SUSPENDED ---(stop)---> PAUSED ---(cont)---> SUSPENDED
>> then we create a problem, because the decision for the transition
>> out of PAUSED needs memory of the previous state.
> 
> Right, that's why I think we at least need one more boolean to remember the
> suspended state, then when we switch from any STOP states into any RUN
> states, we know where to go.  Here STOP states I meant anything except
> RUNNING and SUSPENDED, while RUN -> RUNNING or SUSPENDED.
> 
>>
>>> My understanding is we should expect to fully stop the VM, including the
>>> ticks, for example.  Keeping the ticks running even after QMP "stop"
>>> doesn't sound right, isn't it?
>>
>> The "stop" QMP command is documented as
>>
>>     "Stop all guest VCPU execution"
>>
>> the devil is in the detail though, and we've not documented any detail.
>>
>> Whether or not timers keep running across stop/cont I think can be
>> argued to be an impl detail, as long as the headline goal "vcpus
>> don't execute" is satisfied.
> 
> "stop" was since qemu v0.14, so I guess we can't blame the missing of
> details or any form of inaccuracy..  Obviously we do more than "stop vCPU
> executions" in the current implementation.
> 
> But after we reach a consensus on how we should fix the current suspended
> problem, we may want to update the documentation to start containing more
> information.
> 
>>
>>>> vs the migration code where we are fixing brokenness.
>>>
>>> This is not a migration-only bug if above holds, IMO.
>>>
>>>> Second, it does not quite work, because the state becomes
>>>> RUN_STATE_PAUSED, so the suspended state is forgotten, and the hmp "cont"
>>>> will try to set the running state.  I could fix that by introducing a new
>>>> state RUN_STATE_SUSPENDED_STOPPED, but again it is a user-visible change
>>>> in existing behavior.  (I even implemented that while developing, then I
>>>> realized it was not needed to fix the migration bugs.)
>>>
>>> Good point.
>>
>> We have added new guest states periodically. It is a user visible
>> change, but you could argue that it is implementing a new feature
>> ie the ability to "stop" a "suspended" guest, and so is justified.
>>
>> S3 is so little used in virt, so I'm not surprised we're finding
>> long standing edge cases that have never been thought about before.
>>
>>> Now with above comments, what's your thoughts on whether we should change
>>> the user behavior?  My answer is still a yes.
>>>
>>> Maybe SUSPENDED should not be a RunState at all? SUSPENDED is guest visible
>>> behavior, while something like QMP "stop" is not guest visible.  Maybe we
>>> should remember it separately?
>>
>> Yes, every time I look at this area I come away thinking that
>> the RunState enum is a mess, overloading too many different
>> concepts onto the same single field.
>>
>> Specifically "SUSPENDED" vs "RUNNING" is a reflection of guest
>> state (ie whether or not the VM is in S3), but pretty much all
>> the others are a reflection of QEMU host state. I kind of feel
>> that SUSPENDED (S3) probably shouldn't have been a RunState at
>> all. I'd probably put guest-panicked into a separate thing too.
>>
>> But we're stuck with what we have.
> 
> IMO compatibility is only necessary if at least the existing code is
> running well.  But now I see it a major flaw in suspended state and I can't
> see how it can go right if with current code base..  My concern is instead
> that after suspended will be used more (e.g., assuming CPR will rely on it)
> we can have more chance to confuse/oops a mgmt app like Libvirt, like I
> described above.
> 
> In summary, I think a current solution to me is only to fix at least
> suspended state for good, by:
> 
>   - adding vm_suspended boolean to remember machine RUNNING / SUSPENDED
>     state.  When "cont" we need to switch to either RUNNING / SUSPENDED
>     depending on the boolean.
> 
>   - keeping SUSPENDED state as RunState (for compatibility, otherwise we'll
>     need another interface to fetch that boolean anyway), even though not
>     query-able during any !RUNNING && !SUSPENDED state.. hopefully not a
>     big deal
> 
>   - enrich ducumentation of qmp_stop/qmp_cont to describe what they really do
> 
>   - (with suspended working all right...) fix migration of SUSPENDED state
> 
> I don't expect a lot of code changes is needed, maybe even less than the
> current series (because we don't need special knob to differenciate
> migration or non-migration callers of do_vm_stop()). IMHO this series is
> already doing some of that but just decided to ignore outside-migration
> states for suspeneded.
> 
> We may want to add some test cases though to verify the suspended state
> transitions (maybe easier to put into migration-test with the new ACPI
> guest code), but optional.

FYI, here is a brief update before today's meeting.  I have implemented this and
I am testing libvirt and its various save + restore commands, when the guest is
suspended running (RUN_STATE_SUSPENDED), and suspended stopped (RUN_STATE_PAUSED
with vm_was_suspended = true). There are a few failures, and I am still investigating 
to see whether they can be fixed in qemu, or need a fix in libvirt.

I will send more details later.

- Steve
diff mbox series

Patch

diff --git a/system/cpus.c b/system/cpus.c
index f72c4be..c772708 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -255,6 +255,8 @@  void cpu_interrupt(CPUState *cpu, int mask)
 static int do_vm_stop(RunState state, bool send_stop, bool force)
 {
     int ret = 0;
+    bool running = runstate_is_running();
+    bool suspended = runstate_check(RUN_STATE_SUSPENDED);
 
     if (qemu_in_vcpu_thread()) {
         qemu_system_vmstop_request_prepare();
@@ -267,10 +269,12 @@  static int do_vm_stop(RunState state, bool send_stop, bool force)
         return 0;
     }
 
-    if (runstate_is_running()) {
+    if (running || (suspended && force)) {
         runstate_set(state);
         cpu_disable_ticks();
-        pause_all_vcpus();
+        if (running) {
+            pause_all_vcpus();
+        }
         vm_state_notify(0, state);
         if (send_stop) {
             qapi_event_send_stop();