diff mbox series

x86/shutdown: change default reboot method preference

Message ID 20230914152120.92696-1-roger.pau@citrix.com (mailing list archive)
State Superseded
Headers show
Series x86/shutdown: change default reboot method preference | expand

Commit Message

Roger Pau Monné Sept. 14, 2023, 3:21 p.m. UTC
The current logic to chose the preferred reboot method is based on the mode Xen
has been booted into, so if the box is booted from UEFI, the preferred reboot
method will be to use the ResetSystem() run time service call.

However, that call seems to be widely untested once the firmware has exited
boot services, and quite often leads to a result similar to:

Hardware Dom0 shutdown: rebooting machine
----[ Xen-4.18-unstable  x86_64  debug=y  Tainted:   C    ]----
CPU:    0
RIP:    e008:[<0000000000000017>] 0000000000000017
RFLAGS: 0000000000010202   CONTEXT: hypervisor
[...]
Xen call trace:
   [<0000000000000017>] R 0000000000000017
   [<ffff83207eff7b50>] S ffff83207eff7b50
   [<ffff82d0403525aa>] F machine_restart+0x1da/0x261
   [<ffff82d04035263c>] F apic_wait_icr_idle+0/0x37
   [<ffff82d040233689>] F smp_call_function_interrupt+0xc7/0xcb
   [<ffff82d040352f05>] F call_function_interrupt+0x20/0x34
   [<ffff82d04033b0d5>] F do_IRQ+0x150/0x6f3
   [<ffff82d0402018c2>] F common_interrupt+0x132/0x140
   [<ffff82d040283d33>] F arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0x113/0x129
   [<ffff82d04028436c>] F arch/x86/acpi/cpu_idle.c#acpi_processor_idle+0x3eb/0x5f7
   [<ffff82d04032a549>] F arch/x86/domain.c#idle_loop+0xec/0xee

****************************************
Panic on CPU 0:
FATAL TRAP: vector = 6 (invalid opcode)
****************************************

Which in most cases does lead to a reboot, however that's unreliable.

Change the default reboot preference to prefer ACPI over UEFI if available and
not in reduced hardware mode.

This is in line to what Linux does, so it's unlikely to cause issues on current
and future hardware, since there's a much higher chance of vendors testing
hardware with Linux rather than Xen.

I'm not aware of using ACPI reboot causing issues on boxes that do have
properly implemented ResetSystem() methods.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
I haven't found anything in the UEFI spec that mandates to use ResetSystem()
when booted from UEFI, I've only found:

"One method of resetting the platform is through the EFI Runtime Service
ResetSystem()."

But that's vaguely a recommendation.

I've found a lot of boxes that don't reboot properly using ResetSystem(), and I
think our default should attempt to make sure Xen reliably reboots on as much
hardware as possible.
---
 xen/arch/x86/shutdown.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Andrew Cooper Sept. 14, 2023, 5:42 p.m. UTC | #1
On 14/09/2023 4:21 pm, Roger Pau Monne wrote:
> The current logic to chose the preferred reboot method is based on the mode Xen
> has been booted into, so if the box is booted from UEFI, the preferred reboot
> method will be to use the ResetSystem() run time service call.
>
> However, that call seems to be widely untested once the firmware has exited
> boot services, and quite often leads to a result similar to:

I don't know how true this is.  What Xen does differently to most of the
rest of the world is not calling SetVirtualAddressMap()

>
> Hardware Dom0 shutdown: rebooting machine
> ----[ Xen-4.18-unstable  x86_64  debug=y  Tainted:   C    ]----
> CPU:    0
> RIP:    e008:[<0000000000000017>] 0000000000000017
> RFLAGS: 0000000000010202   CONTEXT: hypervisor
> [...]
> Xen call trace:
>    [<0000000000000017>] R 0000000000000017
>    [<ffff83207eff7b50>] S ffff83207eff7b50
>    [<ffff82d0403525aa>] F machine_restart+0x1da/0x261
>    [<ffff82d04035263c>] F apic_wait_icr_idle+0/0x37
>    [<ffff82d040233689>] F smp_call_function_interrupt+0xc7/0xcb
>    [<ffff82d040352f05>] F call_function_interrupt+0x20/0x34
>    [<ffff82d04033b0d5>] F do_IRQ+0x150/0x6f3
>    [<ffff82d0402018c2>] F common_interrupt+0x132/0x140
>    [<ffff82d040283d33>] F arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0x113/0x129
>    [<ffff82d04028436c>] F arch/x86/acpi/cpu_idle.c#acpi_processor_idle+0x3eb/0x5f7
>    [<ffff82d04032a549>] F arch/x86/domain.c#idle_loop+0xec/0xee
>
> ****************************************
> Panic on CPU 0:
> FATAL TRAP: vector = 6 (invalid opcode)
> ****************************************
>
> Which in most cases does lead to a reboot, however that's unreliable.
>
> Change the default reboot preference to prefer ACPI over UEFI if available and
> not in reduced hardware mode.
>
> This is in line to what Linux does, so it's unlikely to cause issues on current
> and future hardware, since there's a much higher chance of vendors testing
> hardware with Linux rather than Xen.
>
> I'm not aware of using ACPI reboot causing issues on boxes that do have
> properly implemented ResetSystem() methods.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Wording aside, Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

This is a giant embarrassment to Xen, and needs fixing.

> ---
> I haven't found anything in the UEFI spec that mandates to use ResetSystem()
> when booted from UEFI, I've only found:
>
> "One method of resetting the platform is through the EFI Runtime Service
> ResetSystem()."
>
> But that's vaguely a recommendation.
>
> I've found a lot of boxes that don't reboot properly using ResetSystem(), and I
> think our default should attempt to make sure Xen reliably reboots on as much
> hardware as possible.

You're supposed to use ResetSystem() so all the value-add can be added
behind your back, but it's a chocolate teapot when it's so broken that
none of the OSes use it...

~Andrew
Roger Pau Monné Sept. 15, 2023, 7:11 a.m. UTC | #2
On Thu, Sep 14, 2023 at 06:42:03PM +0100, Andrew Cooper wrote:
> On 14/09/2023 4:21 pm, Roger Pau Monne wrote:
> > The current logic to chose the preferred reboot method is based on the mode Xen
> > has been booted into, so if the box is booted from UEFI, the preferred reboot
> > method will be to use the ResetSystem() run time service call.
> >
> > However, that call seems to be widely untested once the firmware has exited
> > boot services, and quite often leads to a result similar to:
> 
> I don't know how true this is.  What Xen does differently to most of the
> rest of the world is not calling SetVirtualAddressMap()

Hm, but that doesn't seem to affect the rest of the run-time services
(at least on this box), it's (just?) ResetSystem() that misbehaves.

I can remove that sentence however, it is more of a guess rather than
a fact.

> 
> >
> > Hardware Dom0 shutdown: rebooting machine
> > ----[ Xen-4.18-unstable  x86_64  debug=y  Tainted:   C    ]----
> > CPU:    0
> > RIP:    e008:[<0000000000000017>] 0000000000000017
> > RFLAGS: 0000000000010202   CONTEXT: hypervisor
> > [...]
> > Xen call trace:
> >    [<0000000000000017>] R 0000000000000017
> >    [<ffff83207eff7b50>] S ffff83207eff7b50
> >    [<ffff82d0403525aa>] F machine_restart+0x1da/0x261
> >    [<ffff82d04035263c>] F apic_wait_icr_idle+0/0x37
> >    [<ffff82d040233689>] F smp_call_function_interrupt+0xc7/0xcb
> >    [<ffff82d040352f05>] F call_function_interrupt+0x20/0x34
> >    [<ffff82d04033b0d5>] F do_IRQ+0x150/0x6f3
> >    [<ffff82d0402018c2>] F common_interrupt+0x132/0x140
> >    [<ffff82d040283d33>] F arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0x113/0x129
> >    [<ffff82d04028436c>] F arch/x86/acpi/cpu_idle.c#acpi_processor_idle+0x3eb/0x5f7
> >    [<ffff82d04032a549>] F arch/x86/domain.c#idle_loop+0xec/0xee
> >
> > ****************************************
> > Panic on CPU 0:
> > FATAL TRAP: vector = 6 (invalid opcode)
> > ****************************************
> >
> > Which in most cases does lead to a reboot, however that's unreliable.
> >
> > Change the default reboot preference to prefer ACPI over UEFI if available and
> > not in reduced hardware mode.
> >
> > This is in line to what Linux does, so it's unlikely to cause issues on current
> > and future hardware, since there's a much higher chance of vendors testing
> > hardware with Linux rather than Xen.
> >
> > I'm not aware of using ACPI reboot causing issues on boxes that do have
> > properly implemented ResetSystem() methods.
> >
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> Wording aside, Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> This is a giant embarrassment to Xen, and needs fixing.

Looking at Linux I think we need to add an override for Acer
TravelMate X514-51T to force it to use UEFI, will send v2 with it.

I do wonder if we need to keep the reboot_dmi_table[] entries that
force systems to use ACPI, as it would now be the default?  My
thinking is that it's likely better to keep it just in case we change
the default again in the future.

Thanks, Roger.
Marek Marczykowski-Górecki Sept. 15, 2023, 5:03 p.m. UTC | #3
On Fri, Sep 15, 2023 at 09:11:50AM +0200, Roger Pau Monné wrote:
> On Thu, Sep 14, 2023 at 06:42:03PM +0100, Andrew Cooper wrote:
> > On 14/09/2023 4:21 pm, Roger Pau Monne wrote:
> > > The current logic to chose the preferred reboot method is based on the mode Xen
> > > has been booted into, so if the box is booted from UEFI, the preferred reboot
> > > method will be to use the ResetSystem() run time service call.
> > >
> > > However, that call seems to be widely untested once the firmware has exited
> > > boot services, and quite often leads to a result similar to:
> > 
> > I don't know how true this is.  What Xen does differently to most of the
> > rest of the world is not calling SetVirtualAddressMap()
> 
> Hm, but that doesn't seem to affect the rest of the run-time services
> (at least on this box), it's (just?) ResetSystem() that misbehaves.

I've seen (too) many firmwares where also GetVariable crashes without
SetVirtualAddressMap(). That's why we have a build-time option to
actually call SetVirtualAddressMap().

Anyway, I agree that preferring ACPI reboot even when booted through
UEFI is a good idea.

> I can remove that sentence however, it is more of a guess rather than
> a fact.
> 
> > 
> > >
> > > Hardware Dom0 shutdown: rebooting machine
> > > ----[ Xen-4.18-unstable  x86_64  debug=y  Tainted:   C    ]----
> > > CPU:    0
> > > RIP:    e008:[<0000000000000017>] 0000000000000017
> > > RFLAGS: 0000000000010202   CONTEXT: hypervisor
> > > [...]
> > > Xen call trace:
> > >    [<0000000000000017>] R 0000000000000017
> > >    [<ffff83207eff7b50>] S ffff83207eff7b50
> > >    [<ffff82d0403525aa>] F machine_restart+0x1da/0x261
> > >    [<ffff82d04035263c>] F apic_wait_icr_idle+0/0x37
> > >    [<ffff82d040233689>] F smp_call_function_interrupt+0xc7/0xcb
> > >    [<ffff82d040352f05>] F call_function_interrupt+0x20/0x34
> > >    [<ffff82d04033b0d5>] F do_IRQ+0x150/0x6f3
> > >    [<ffff82d0402018c2>] F common_interrupt+0x132/0x140
> > >    [<ffff82d040283d33>] F arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0x113/0x129
> > >    [<ffff82d04028436c>] F arch/x86/acpi/cpu_idle.c#acpi_processor_idle+0x3eb/0x5f7
> > >    [<ffff82d04032a549>] F arch/x86/domain.c#idle_loop+0xec/0xee
> > >
> > > ****************************************
> > > Panic on CPU 0:
> > > FATAL TRAP: vector = 6 (invalid opcode)
> > > ****************************************
> > >
> > > Which in most cases does lead to a reboot, however that's unreliable.
> > >
> > > Change the default reboot preference to prefer ACPI over UEFI if available and
> > > not in reduced hardware mode.
> > >
> > > This is in line to what Linux does, so it's unlikely to cause issues on current
> > > and future hardware, since there's a much higher chance of vendors testing
> > > hardware with Linux rather than Xen.
> > >
> > > I'm not aware of using ACPI reboot causing issues on boxes that do have
> > > properly implemented ResetSystem() methods.
> > >
> > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > 
> > Wording aside, Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> > 
> > This is a giant embarrassment to Xen, and needs fixing.
> 
> Looking at Linux I think we need to add an override for Acer
> TravelMate X514-51T to force it to use UEFI, will send v2 with it.
> 
> I do wonder if we need to keep the reboot_dmi_table[] entries that
> force systems to use ACPI, as it would now be the default?  My
> thinking is that it's likely better to keep it just in case we change
> the default again in the future.
> 
> Thanks, Roger.
>
Jan Beulich Sept. 18, 2023, 11:58 a.m. UTC | #4
On 14.09.2023 19:42, Andrew Cooper wrote:
> On 14/09/2023 4:21 pm, Roger Pau Monne wrote:
>> I've found a lot of boxes that don't reboot properly using ResetSystem(), and I
>> think our default should attempt to make sure Xen reliably reboots on as much
>> hardware as possible.
> 
> You're supposed to use ResetSystem() so all the value-add can be added
> behind your back, but it's a chocolate teapot when it's so broken that
> none of the OSes use it...

That's only one aspect. Recall that EFI originated from ia64 bringup, where
it wasn't even specified how to reboot a system without using the runtime
services function. Fundamentally under EFI shutdown/reboot shouldn't be an
arch-specific operation in the first place.

Jan
diff mbox series

Patch

diff --git a/xen/arch/x86/shutdown.c b/xen/arch/x86/shutdown.c
index 7619544d14da..81d8eda64a41 100644
--- a/xen/arch/x86/shutdown.c
+++ b/xen/arch/x86/shutdown.c
@@ -150,12 +150,12 @@  static void default_reboot_type(void)
 
     if ( xen_guest )
         reboot_type = BOOT_XEN;
+    else if ( !acpi_disabled && !acpi_gbl_reduced_hardware )
+        reboot_type = BOOT_ACPI;
     else if ( efi_enabled(EFI_RS) )
         reboot_type = BOOT_EFI;
-    else if ( acpi_disabled )
-        reboot_type = BOOT_KBD;
     else
-        reboot_type = BOOT_ACPI;
+        reboot_type = BOOT_KBD;
 }
 
 static int __init cf_check override_reboot(const struct dmi_system_id *d)