mbox series

[0/1] Do not stop guest when panic event is received

Message ID 1601606494-1154-1-git-send-email-alejandro.j.jimenez@oracle.com (mailing list archive)
Headers show
Series Do not stop guest when panic event is received | expand

Message

Alejandro Jimenez Oct. 2, 2020, 2:41 a.m. UTC
The following patch adds an option to request that QEMU does not stop the VM when a panic event is received.
This allows guests in cloud environments to report the panic condition to the control plane, but be able to
proceed to collect a crash dump and automatically reboot, without waiting to receive one or several 'cont'
monitor commands.

I am aware of a previous discussion regarding the decision to stop the guest on panic event:
https://lore.kernel.org/qemu-devel/52148F88.5000509@redhat.com/
that is why I propose explicitly using a parameter to change the default behavior when necessary.

The PVPANIC_CRASHLOADED event was introduced in the v5.6 kernel, and it is intended to tell QEMU that the guest
will handle the panic condition by itself, but unfortunately older kernels will only support sending the
PVPANIC_PANICKED event, for which the default behavior is to pause the VM.

Having a '-no-panicstop' option allows for older guest kernels that do not support the PVPANIC_CRASHLOADED event
to behave in the same way as newer kernels, simplifying control plane code. It also provides the same advantage
when launching Windows guests with the hv-crash enlightenment, since the hv-crash MSR writes are ultimately
handled by QEMU as if the guest had sent a PVPANIC_PANICKED event.

The fact that the behavior of hv-crash is also affected is why I chose to implement this change as an independent
option, as opposed to making it a property of the pvpanic device (e.g. -device pvpanic,no-panicstop).

Please let me know if you have any comments or suggestions.

Regards,
Alejandro

Alejandro Jimenez (1):
  vl: Add -no-panicstop option

 qemu-options.hx | 11 +++++++++++
 softmmu/vl.c    | 17 ++++++++++++++---
 2 files changed, 25 insertions(+), 3 deletions(-)

Comments

Paolo Bonzini Oct. 20, 2020, 5:14 p.m. UTC | #1
On 02/10/20 04:41, Alejandro Jimenez wrote:
> The fact that the behavior of hv-crash is also affected is why I chose to implement this change as an independent
> option, as opposed to making it a property of the pvpanic device (e.g. -device pvpanic,no-panicstop).
> 
> Please let me know if you have any comments or suggestions.

Hi Alejandro, sorry for the delayed response.

The concept is fine, and I agree this should not be a device property.

On the other hand, we already have many similar options: -no-reboot,
-no-shutdown, -watchdog-action and now --no-panicstop.

I think it's time to group them into a single option:

* -action reboot=pause|shutdown|none
* -action shutdown=pause|poweroff|none
* -action panic=pause|shutdown|none
* -action watchdog=reset|shutdown|poweroff|pause|debug|none|inject-nmi

where the existing options would translate to the new option, like:

* -no-reboot "-action reboot=shutdown"
* -no-shutdown "-action shutdown=pause"

The implementation should be relatively easy too; there's already an
enum WatchdogAction (that can be renamed to e.g. RunstateAction) and a
parsing function select_watchdog_action that can be changed to just
return the RunstateAction.

Would you like to take a look at this?

Thanks,

Paolo
Alejandro Jimenez Oct. 21, 2020, 1:26 p.m. UTC | #2
On 10/20/2020 1:14 PM, Paolo Bonzini wrote:
> On 02/10/20 04:41, Alejandro Jimenez wrote:
>> The fact that the behavior of hv-crash is also affected is why I chose to implement this change as an independent
>> option, as opposed to making it a property of the pvpanic device (e.g. -device pvpanic,no-panicstop).
>>
>> Please let me know if you have any comments or suggestions.
> Hi Alejandro, sorry for the delayed response.
>
> The concept is fine, and I agree this should not be a device property.
>
> On the other hand, we already have many similar options: -no-reboot,
> -no-shutdown, -watchdog-action and now --no-panicstop.
>
> I think it's time to group them into a single option:
>
> * -action reboot=pause|shutdown|none
> * -action shutdown=pause|poweroff|none
> * -action panic=pause|shutdown|none
> * -action watchdog=reset|shutdown|poweroff|pause|debug|none|inject-nmi
>
> where the existing options would translate to the new option, like:
>
> * -no-reboot "-action reboot=shutdown"
> * -no-shutdown "-action shutdown=pause"
>
> The implementation should be relatively easy too; there's already an
> enum WatchdogAction (that can be renamed to e.g. RunstateAction) and a
> parsing function select_watchdog_action that can be changed to just
> return the RunstateAction.
>
> Would you like to take a look at this?
Hi Paolo,

Thank you for your reply and the advice/hints above. I'll take a look 
and try to implement what you propose.

Regards,
Alejandro
>
> Thanks,
>
> Paolo
>
Paolo Bonzini Oct. 21, 2020, 1:33 p.m. UTC | #3
On 21/10/20 15:26, Alejandro Jimenez wrote:
> 
> 
> On 10/20/2020 1:14 PM, Paolo Bonzini wrote:
>> On 02/10/20 04:41, Alejandro Jimenez wrote:
>>> The fact that the behavior of hv-crash is also affected is why I
>>> chose to implement this change as an independent
>>> option, as opposed to making it a property of the pvpanic device
>>> (e.g. -device pvpanic,no-panicstop).
>>>
>>> Please let me know if you have any comments or suggestions.
>> Hi Alejandro, sorry for the delayed response.
>>
>> The concept is fine, and I agree this should not be a device property.
>>
>> On the other hand, we already have many similar options: -no-reboot,
>> -no-shutdown, -watchdog-action and now --no-panicstop.
>>
>> I think it's time to group them into a single option:
>>
>> * -action reboot=pause|shutdown|none
>> * -action shutdown=pause|poweroff|none
>> * -action panic=pause|shutdown|none
>> * -action watchdog=reset|shutdown|poweroff|pause|debug|none|inject-nmi
>>
>> where the existing options would translate to the new option, like:
>>
>> * -no-reboot "-action reboot=shutdown"
>> * -no-shutdown "-action shutdown=pause"
>>
>> The implementation should be relatively easy too; there's already an
>> enum WatchdogAction (that can be renamed to e.g. RunstateAction) and a
>> parsing function select_watchdog_action that can be changed to just
>> return the RunstateAction.
>>
>> Would you like to take a look at this?
> Hi Paolo,
> 
> Thank you for your reply and the advice/hints above. I'll take a look
> and try to implement what you propose.

Just one thing, for the parsing you can place it close to the existing

    qemu_opts_foreach(qemu_find_opts("name"),
                      parse_name, NULL, &error_fatal);

Paolo