Message ID | 20230914152120.92696-1-roger.pau@citrix.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | x86/shutdown: change default reboot method preference | expand |
On 14/09/2023 4:21 pm, Roger Pau Monne wrote: > The current logic to chose the preferred reboot method is based on the mode Xen > has been booted into, so if the box is booted from UEFI, the preferred reboot > method will be to use the ResetSystem() run time service call. > > However, that call seems to be widely untested once the firmware has exited > boot services, and quite often leads to a result similar to: I don't know how true this is. What Xen does differently to most of the rest of the world is not calling SetVirtualAddressMap() > > Hardware Dom0 shutdown: rebooting machine > ----[ Xen-4.18-unstable x86_64 debug=y Tainted: C ]---- > CPU: 0 > RIP: e008:[<0000000000000017>] 0000000000000017 > RFLAGS: 0000000000010202 CONTEXT: hypervisor > [...] > Xen call trace: > [<0000000000000017>] R 0000000000000017 > [<ffff83207eff7b50>] S ffff83207eff7b50 > [<ffff82d0403525aa>] F machine_restart+0x1da/0x261 > [<ffff82d04035263c>] F apic_wait_icr_idle+0/0x37 > [<ffff82d040233689>] F smp_call_function_interrupt+0xc7/0xcb > [<ffff82d040352f05>] F call_function_interrupt+0x20/0x34 > [<ffff82d04033b0d5>] F do_IRQ+0x150/0x6f3 > [<ffff82d0402018c2>] F common_interrupt+0x132/0x140 > [<ffff82d040283d33>] F arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0x113/0x129 > [<ffff82d04028436c>] F arch/x86/acpi/cpu_idle.c#acpi_processor_idle+0x3eb/0x5f7 > [<ffff82d04032a549>] F arch/x86/domain.c#idle_loop+0xec/0xee > > **************************************** > Panic on CPU 0: > FATAL TRAP: vector = 6 (invalid opcode) > **************************************** > > Which in most cases does lead to a reboot, however that's unreliable. > > Change the default reboot preference to prefer ACPI over UEFI if available and > not in reduced hardware mode. > > This is in line to what Linux does, so it's unlikely to cause issues on current > and future hardware, since there's a much higher chance of vendors testing > hardware with Linux rather than Xen. > > I'm not aware of using ACPI reboot causing issues on boxes that do have > properly implemented ResetSystem() methods. > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Wording aside, Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> This is a giant embarrassment to Xen, and needs fixing. > --- > I haven't found anything in the UEFI spec that mandates to use ResetSystem() > when booted from UEFI, I've only found: > > "One method of resetting the platform is through the EFI Runtime Service > ResetSystem()." > > But that's vaguely a recommendation. > > I've found a lot of boxes that don't reboot properly using ResetSystem(), and I > think our default should attempt to make sure Xen reliably reboots on as much > hardware as possible. You're supposed to use ResetSystem() so all the value-add can be added behind your back, but it's a chocolate teapot when it's so broken that none of the OSes use it... ~Andrew
On Thu, Sep 14, 2023 at 06:42:03PM +0100, Andrew Cooper wrote: > On 14/09/2023 4:21 pm, Roger Pau Monne wrote: > > The current logic to chose the preferred reboot method is based on the mode Xen > > has been booted into, so if the box is booted from UEFI, the preferred reboot > > method will be to use the ResetSystem() run time service call. > > > > However, that call seems to be widely untested once the firmware has exited > > boot services, and quite often leads to a result similar to: > > I don't know how true this is. What Xen does differently to most of the > rest of the world is not calling SetVirtualAddressMap() Hm, but that doesn't seem to affect the rest of the run-time services (at least on this box), it's (just?) ResetSystem() that misbehaves. I can remove that sentence however, it is more of a guess rather than a fact. > > > > > Hardware Dom0 shutdown: rebooting machine > > ----[ Xen-4.18-unstable x86_64 debug=y Tainted: C ]---- > > CPU: 0 > > RIP: e008:[<0000000000000017>] 0000000000000017 > > RFLAGS: 0000000000010202 CONTEXT: hypervisor > > [...] > > Xen call trace: > > [<0000000000000017>] R 0000000000000017 > > [<ffff83207eff7b50>] S ffff83207eff7b50 > > [<ffff82d0403525aa>] F machine_restart+0x1da/0x261 > > [<ffff82d04035263c>] F apic_wait_icr_idle+0/0x37 > > [<ffff82d040233689>] F smp_call_function_interrupt+0xc7/0xcb > > [<ffff82d040352f05>] F call_function_interrupt+0x20/0x34 > > [<ffff82d04033b0d5>] F do_IRQ+0x150/0x6f3 > > [<ffff82d0402018c2>] F common_interrupt+0x132/0x140 > > [<ffff82d040283d33>] F arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0x113/0x129 > > [<ffff82d04028436c>] F arch/x86/acpi/cpu_idle.c#acpi_processor_idle+0x3eb/0x5f7 > > [<ffff82d04032a549>] F arch/x86/domain.c#idle_loop+0xec/0xee > > > > **************************************** > > Panic on CPU 0: > > FATAL TRAP: vector = 6 (invalid opcode) > > **************************************** > > > > Which in most cases does lead to a reboot, however that's unreliable. > > > > Change the default reboot preference to prefer ACPI over UEFI if available and > > not in reduced hardware mode. > > > > This is in line to what Linux does, so it's unlikely to cause issues on current > > and future hardware, since there's a much higher chance of vendors testing > > hardware with Linux rather than Xen. > > > > I'm not aware of using ACPI reboot causing issues on boxes that do have > > properly implemented ResetSystem() methods. > > > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > > Wording aside, Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> > > This is a giant embarrassment to Xen, and needs fixing. Looking at Linux I think we need to add an override for Acer TravelMate X514-51T to force it to use UEFI, will send v2 with it. I do wonder if we need to keep the reboot_dmi_table[] entries that force systems to use ACPI, as it would now be the default? My thinking is that it's likely better to keep it just in case we change the default again in the future. Thanks, Roger.
On Fri, Sep 15, 2023 at 09:11:50AM +0200, Roger Pau Monné wrote: > On Thu, Sep 14, 2023 at 06:42:03PM +0100, Andrew Cooper wrote: > > On 14/09/2023 4:21 pm, Roger Pau Monne wrote: > > > The current logic to chose the preferred reboot method is based on the mode Xen > > > has been booted into, so if the box is booted from UEFI, the preferred reboot > > > method will be to use the ResetSystem() run time service call. > > > > > > However, that call seems to be widely untested once the firmware has exited > > > boot services, and quite often leads to a result similar to: > > > > I don't know how true this is. What Xen does differently to most of the > > rest of the world is not calling SetVirtualAddressMap() > > Hm, but that doesn't seem to affect the rest of the run-time services > (at least on this box), it's (just?) ResetSystem() that misbehaves. I've seen (too) many firmwares where also GetVariable crashes without SetVirtualAddressMap(). That's why we have a build-time option to actually call SetVirtualAddressMap(). Anyway, I agree that preferring ACPI reboot even when booted through UEFI is a good idea. > I can remove that sentence however, it is more of a guess rather than > a fact. > > > > > > > > > Hardware Dom0 shutdown: rebooting machine > > > ----[ Xen-4.18-unstable x86_64 debug=y Tainted: C ]---- > > > CPU: 0 > > > RIP: e008:[<0000000000000017>] 0000000000000017 > > > RFLAGS: 0000000000010202 CONTEXT: hypervisor > > > [...] > > > Xen call trace: > > > [<0000000000000017>] R 0000000000000017 > > > [<ffff83207eff7b50>] S ffff83207eff7b50 > > > [<ffff82d0403525aa>] F machine_restart+0x1da/0x261 > > > [<ffff82d04035263c>] F apic_wait_icr_idle+0/0x37 > > > [<ffff82d040233689>] F smp_call_function_interrupt+0xc7/0xcb > > > [<ffff82d040352f05>] F call_function_interrupt+0x20/0x34 > > > [<ffff82d04033b0d5>] F do_IRQ+0x150/0x6f3 > > > [<ffff82d0402018c2>] F common_interrupt+0x132/0x140 > > > [<ffff82d040283d33>] F arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0x113/0x129 > > > [<ffff82d04028436c>] F arch/x86/acpi/cpu_idle.c#acpi_processor_idle+0x3eb/0x5f7 > > > [<ffff82d04032a549>] F arch/x86/domain.c#idle_loop+0xec/0xee > > > > > > **************************************** > > > Panic on CPU 0: > > > FATAL TRAP: vector = 6 (invalid opcode) > > > **************************************** > > > > > > Which in most cases does lead to a reboot, however that's unreliable. > > > > > > Change the default reboot preference to prefer ACPI over UEFI if available and > > > not in reduced hardware mode. > > > > > > This is in line to what Linux does, so it's unlikely to cause issues on current > > > and future hardware, since there's a much higher chance of vendors testing > > > hardware with Linux rather than Xen. > > > > > > I'm not aware of using ACPI reboot causing issues on boxes that do have > > > properly implemented ResetSystem() methods. > > > > > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > > > > Wording aside, Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> > > > > This is a giant embarrassment to Xen, and needs fixing. > > Looking at Linux I think we need to add an override for Acer > TravelMate X514-51T to force it to use UEFI, will send v2 with it. > > I do wonder if we need to keep the reboot_dmi_table[] entries that > force systems to use ACPI, as it would now be the default? My > thinking is that it's likely better to keep it just in case we change > the default again in the future. > > Thanks, Roger. >
On 14.09.2023 19:42, Andrew Cooper wrote: > On 14/09/2023 4:21 pm, Roger Pau Monne wrote: >> I've found a lot of boxes that don't reboot properly using ResetSystem(), and I >> think our default should attempt to make sure Xen reliably reboots on as much >> hardware as possible. > > You're supposed to use ResetSystem() so all the value-add can be added > behind your back, but it's a chocolate teapot when it's so broken that > none of the OSes use it... That's only one aspect. Recall that EFI originated from ia64 bringup, where it wasn't even specified how to reboot a system without using the runtime services function. Fundamentally under EFI shutdown/reboot shouldn't be an arch-specific operation in the first place. Jan
diff --git a/xen/arch/x86/shutdown.c b/xen/arch/x86/shutdown.c index 7619544d14da..81d8eda64a41 100644 --- a/xen/arch/x86/shutdown.c +++ b/xen/arch/x86/shutdown.c @@ -150,12 +150,12 @@ static void default_reboot_type(void) if ( xen_guest ) reboot_type = BOOT_XEN; + else if ( !acpi_disabled && !acpi_gbl_reduced_hardware ) + reboot_type = BOOT_ACPI; else if ( efi_enabled(EFI_RS) ) reboot_type = BOOT_EFI; - else if ( acpi_disabled ) - reboot_type = BOOT_KBD; else - reboot_type = BOOT_ACPI; + reboot_type = BOOT_KBD; } static int __init cf_check override_reboot(const struct dmi_system_id *d)
The current logic to chose the preferred reboot method is based on the mode Xen has been booted into, so if the box is booted from UEFI, the preferred reboot method will be to use the ResetSystem() run time service call. However, that call seems to be widely untested once the firmware has exited boot services, and quite often leads to a result similar to: Hardware Dom0 shutdown: rebooting machine ----[ Xen-4.18-unstable x86_64 debug=y Tainted: C ]---- CPU: 0 RIP: e008:[<0000000000000017>] 0000000000000017 RFLAGS: 0000000000010202 CONTEXT: hypervisor [...] Xen call trace: [<0000000000000017>] R 0000000000000017 [<ffff83207eff7b50>] S ffff83207eff7b50 [<ffff82d0403525aa>] F machine_restart+0x1da/0x261 [<ffff82d04035263c>] F apic_wait_icr_idle+0/0x37 [<ffff82d040233689>] F smp_call_function_interrupt+0xc7/0xcb [<ffff82d040352f05>] F call_function_interrupt+0x20/0x34 [<ffff82d04033b0d5>] F do_IRQ+0x150/0x6f3 [<ffff82d0402018c2>] F common_interrupt+0x132/0x140 [<ffff82d040283d33>] F arch/x86/acpi/cpu_idle.c#acpi_idle_do_entry+0x113/0x129 [<ffff82d04028436c>] F arch/x86/acpi/cpu_idle.c#acpi_processor_idle+0x3eb/0x5f7 [<ffff82d04032a549>] F arch/x86/domain.c#idle_loop+0xec/0xee **************************************** Panic on CPU 0: FATAL TRAP: vector = 6 (invalid opcode) **************************************** Which in most cases does lead to a reboot, however that's unreliable. Change the default reboot preference to prefer ACPI over UEFI if available and not in reduced hardware mode. This is in line to what Linux does, so it's unlikely to cause issues on current and future hardware, since there's a much higher chance of vendors testing hardware with Linux rather than Xen. I'm not aware of using ACPI reboot causing issues on boxes that do have properly implemented ResetSystem() methods. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> --- I haven't found anything in the UEFI spec that mandates to use ResetSystem() when booted from UEFI, I've only found: "One method of resetting the platform is through the EFI Runtime Service ResetSystem()." But that's vaguely a recommendation. I've found a lot of boxes that don't reboot properly using ResetSystem(), and I think our default should attempt to make sure Xen reliably reboots on as much hardware as possible. --- xen/arch/x86/shutdown.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)