Message ID | Z0gn1N3IsP8r3gTA@hovoldconsulting.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | UEFI EBS() failures on Lenovo T14s | expand |
Hi Johan, Putting Leif on cc, although he is OoO and so it may take him a while to respond. On Thu, 28 Nov 2024 at 09:20, Johan Hovold <johan@kernel.org> wrote: > > Hi Ard, > > We've run into a buggy UEFI implementation on the Qualcomm Snapdragon > based Lenovo ThinkPad T14s where ExitBootServices() often fails. > > One bootloader entry may fail to start almost consistently (once in a > while it may start), while a second entry may always work even when the > kernel, dtb and initramfs images are copies of the failing entry on the > same ESP. > > This can be worked around by starting and exiting a UEFI shell from the > bootloader or by starting the bootloader manually via the Boot Menu > (F12) before starting the kernel. > > Notably starting the kernel automatically from the shell startup.nsh > does not work, while calling the same script manually works. > > Based on your comments to a similar report for an older Snapdragon based > Lenovo UEFI implementation [1], I discovered that allocating an event > before calling ExitBootServices() can make the call succeed. There is > often no need to actually signal the event group, but the event must > remain allocated (i.e. CloseEvent() must not be called). > > (Raising TPL or disabling interrupts does not seem to help.) > > Also with the event signalling, ExitBootServices() sometimes fails when > starting the kernel automatically from a shell startup.nsh, while > systemd-boot seems to always work. This was only observed after removing > some efi_printk() used during the experiments from the stub... > > Something is obviously really broken here, but do you have any > suggestions about what could be the cause of this as further input to > Qualcomm (and Lenovo) as they try to fix this? > > For completeness, the first call to ExitBootServices() often fails also > on the x1e80100 reference design (CRD), and Qualcomm appears to have > been the ones providing the current retry implementation: > > fc07716ba803 ("efi/libstub: Introduce ExitBootServices helper") > > as this was needed to prevent similar boot failures with older Qualcomm > UEFI fw. > > Marc is also hitting something like this on the Qualcomm X1E devkit > (i.e. with firmware that should not have any modifications from Lenovo). > So the error code is EFI_INVALID_PARAMETER in all cases? In the upstream implementation, the only thing that can make ExitBootServices() return an error is a mismatch of the map key, and so there is something changing the memory map. This might be due to a handler of the gEfiEventBeforeExitBootServicesGuid event group that fails to close the event, and so it gets signaled every time. This is a fairly recent addition, though, so I'm not sure it even exists in QCOM's tree. In upstream EDK2, the map key is just a monotonic counter that gets incremented on every memory map update, so one experiment worth conducting is to repeat the second call to ExitBootServices() a couple of times, increasing the map key each time. Or use GetMemoryMap() to just grab the map key without the actual memory map, and printing it to the console (although the timer is disabled on the first call so anything that relies on that will be shut down at this point)
On Thu, Nov 28, 2024 at 09:52:33AM +0100, Ard Biesheuvel wrote: > On Thu, 28 Nov 2024 at 09:20, Johan Hovold <johan@kernel.org> wrote: > > We've run into a buggy UEFI implementation on the Qualcomm Snapdragon > > based Lenovo ThinkPad T14s where ExitBootServices() often fails. > > Based on your comments to a similar report for an older Snapdragon based > > Lenovo UEFI implementation [1], I discovered that allocating an event > > before calling ExitBootServices() can make the call succeed. There is > > often no need to actually signal the event group, but the event must > > remain allocated (i.e. CloseEvent() must not be called). > > > > (Raising TPL or disabling interrupts does not seem to help.) > > > > Also with the event signalling, ExitBootServices() sometimes fails when > > starting the kernel automatically from a shell startup.nsh, while > > systemd-boot seems to always work. This was only observed after removing > > some efi_printk() used during the experiments from the stub.. . > So the error code is EFI_INVALID_PARAMETER in all cases? In the > upstream implementation, the only thing that can make > ExitBootServices() return an error is a mismatch of the map key, and > so there is something changing the memory map. Yes, it's always EFI_INVALID_PARAMETER. > This might be due to a handler of the > gEfiEventBeforeExitBootServicesGuid event group that fails to close > the event, and so it gets signaled every time. This is a fairly recent > addition, though, so I'm not sure it even exists in QCOM's tree. > > In upstream EDK2, the map key is just a monotonic counter that gets > incremented on every memory map update, so one experiment worth > conducting is to repeat the second call to ExitBootServices() a couple > of times, increasing the map key each time. I had already tried repeating the second call (GMM + EBS) by running it in a loop, and I do see the map_key increasing for each iteration (e.g. by 0x1a). > Or use GetMemoryMap() to > just grab the map key without the actual memory map, and printing it > to the console (although the timer is disabled on the first call so > anything that relies on that will be shut down at this point) I just tried adding another inner loop just calling GetMemoryMap() a few times and I see the map_key increasing there too for each iteration (e.g. by 0x6). (The map size remains constant.) I do get the feeling that efi_printk() contributes to the memory map updates, and I can indeed get the reference design fw to similarly fail if I try to print the map_key after each call to GetMemoryMap() in a retry loop. Johan
On Thu, 28 Nov 2024 at 11:00, Johan Hovold <johan@kernel.org> wrote: > > On Thu, Nov 28, 2024 at 09:52:33AM +0100, Ard Biesheuvel wrote: > ... > > In upstream EDK2, the map key is just a monotonic counter that gets > > incremented on every memory map update, so one experiment worth > > conducting is to repeat the second call to ExitBootServices() a couple > > of times, increasing the map key each time. > > I had already tried repeating the second call (GMM + EBS) by running it > in a loop, and I do see the map_key increasing for each iteration (e.g. > by 0x1a). > > > Or use GetMemoryMap() to > > just grab the map key without the actual memory map, and printing it > > to the console (although the timer is disabled on the first call so > > anything that relies on that will be shut down at this point) > > I just tried adding another inner loop just calling GetMemoryMap() a few > times and I see the map_key increasing there too for each iteration > (e.g. by 0x6). > > (The map size remains constant.) > > I do get the feeling that efi_printk() contributes to the memory map > updates, and I can indeed get the reference design fw to similarly fail > if I try to print the map_key after each call to GetMemoryMap() in a > retry loop. Per the spec, the only thing you are permitted to call if ExitBootServices() fails is GetMemoryMap(), and so this is not a spec violation. If GetMemoryMap() itself causes the map key to assume a different value than the one it returns, or if ExitBootServices() invokes event callbacks on the second call that may cause the map key to get updated before it manages to check it, there is obviously something wrong in the firmware implementation.
On Thu, 28 Nov 2024 at 11:21, Ard Biesheuvel <ardb@kernel.org> wrote: > > On Thu, 28 Nov 2024 at 11:00, Johan Hovold <johan@kernel.org> wrote: > > > > On Thu, Nov 28, 2024 at 09:52:33AM +0100, Ard Biesheuvel wrote: > > > ... > > > In upstream EDK2, the map key is just a monotonic counter that gets > > > incremented on every memory map update, so one experiment worth > > > conducting is to repeat the second call to ExitBootServices() a couple > > > of times, increasing the map key each time. > > > > I had already tried repeating the second call (GMM + EBS) by running it > > in a loop, and I do see the map_key increasing for each iteration (e.g. > > by 0x1a). > > > > > Or use GetMemoryMap() to > > > just grab the map key without the actual memory map, and printing it > > > to the console (although the timer is disabled on the first call so > > > anything that relies on that will be shut down at this point) > > > > I just tried adding another inner loop just calling GetMemoryMap() a few > > times and I see the map_key increasing there too for each iteration > > (e.g. by 0x6). > > > > (The map size remains constant.) > > > > I do get the feeling that efi_printk() contributes to the memory map > > updates, and I can indeed get the reference design fw to similarly fail > > if I try to print the map_key after each call to GetMemoryMap() in a > > retry loop. > > Per the spec, the only thing you are permitted to call if > ExitBootServices() fails is GetMemoryMap(), and so this is not a spec > violation. > > If GetMemoryMap() itself causes the map key to assume a different > value than the one it returns, or if ExitBootServices() invokes event > callbacks on the second call that may cause the map key to get updated > before it manages to check it, there is obviously something wrong in > the firmware implementation. If you're happy to experiment more, you could try and register a notification for EFI_EVENT_GROUP_BEFORE_EXIT_BOOT_SERVICES using CreateEventEx(), and see if it gets called when ExitBootServices() is called. That would at least help narrow it down.
On Thu, Nov 28, 2024 at 12:05:09PM +0100, Ard Biesheuvel wrote: > If you're happy to experiment more, you could try and register a > notification for EFI_EVENT_GROUP_BEFORE_EXIT_BOOT_SERVICES using > CreateEventEx(), and see if it gets called when ExitBootServices() is > called. That would at least help narrow it down. Thanks for the suggestion. I see the notify function being called when I signal it as well as on each ExitBootServices(). With an efi_printk() in the callback ExitBootServices() fails as expected, but with an empty function the kernel seems to start every time. Interestingly, ExitBootServices() now succeeds also if I add back the CloseEvent() call. In fact, it works also if I never signal the event (i.e. if I just create and close the event). The patch below should suffice as a workaround I can carry until the firmware has been fixed. Johan From 1464360c7c16d1a6ce454bf88ee5815663f27283 Mon Sep 17 00:00:00 2001 From: Johan Hovold <johan+linaro@kernel.org> Date: Wed, 27 Nov 2024 16:05:37 +0100 Subject: [PATCH] hack: efi/libstub: fix t14s exit_boot_services() failure The UEFI firmware on the Lenovo ThinkPad T14s is broken and ExitBootServices() often fails and prevents the kernel from starting: EFI stub: Exiting boot services... EFI stub: Exit boot services failed. One bootloader entry may fail to start almost consistently (once in a while it may start), while a second entry may always work even when the kernel, dtb and initramfs images are copies of the failing entry on the same ESP. This can be worked around by starting and exiting a UEFI shell from the bootloader or by starting the bootloader manually via the Boot Menu (F12) before starting the kernel. Notably starting the kernel automatically from the shell startup.nsh does not work, while calling the same script manually works. Experiments have revealed that allocating an event before calling ExitBootServices() can make the call succeed. When providing a notification function there apparently is no need to actually signal the event group and CloseEvent() could also be called directly. Signed-off-by: Johan Hovold <johan+linaro@kernel.org> --- .../firmware/efi/libstub/efi-stub-helper.c | 24 +++++++++++++++++++ drivers/firmware/efi/libstub/efistub.h | 4 ++-- 2 files changed, 26 insertions(+), 2 deletions(-) diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c index de659f6a815f..9c9c7a1f1718 100644 --- a/drivers/firmware/efi/libstub/efi-stub-helper.c +++ b/drivers/firmware/efi/libstub/efi-stub-helper.c @@ -409,6 +409,13 @@ char *efi_convert_cmdline(efi_loaded_image_t *image, int *cmd_line_len) return (char *)cmdline_addr; } +#define EFI_EVENT_GROUP_BEFORE_EXIT_BOOT_SERVICES \ + EFI_GUID(0x8be0e274, 0x3970, 0x4b44, 0x80, 0xc5, 0x1a, 0xb9, 0x50, 0x2f, 0x3b, 0xfc) + +static void efi_before_ebs_notify(efi_event_t event, void *context) +{ +} + /** * efi_exit_boot_services() - Exit boot services * @handle: handle of the exiting image @@ -429,10 +436,27 @@ efi_status_t efi_exit_boot_services(void *handle, void *priv, { struct efi_boot_memmap *map; efi_status_t status; + efi_guid_t guid = EFI_EVENT_GROUP_BEFORE_EXIT_BOOT_SERVICES; + efi_event_t event; if (efi_disable_pci_dma) efi_pci_disable_bridge_busmaster(); + status = efi_bs_call(create_event_ex, EFI_EVT_NOTIFY_SIGNAL, + EFI_TPL_CALLBACK, efi_before_ebs_notify, NULL, + &guid, &event); + if (status == EFI_SUCCESS) { + status = efi_bs_call(signal_event, event); + if (status != EFI_SUCCESS) + efi_err("%s - signal event failed: %02lx\n", __func__, status); + + status = efi_bs_call(close_event, event); + if (status != EFI_SUCCESS) + efi_err("%s - close event failed: %02lx\n", __func__, status); + } else { + efi_err("%s - create event ex failed: %02lx\n", __func__, status); + } + status = efi_get_memory_map(&map, true); if (status != EFI_SUCCESS) return status; diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h index 685098f9626f..e3f710823a29 100644 --- a/drivers/firmware/efi/libstub/efistub.h +++ b/drivers/firmware/efi/libstub/efistub.h @@ -272,7 +272,7 @@ union efi_boot_services { efi_status_t (__efiapi *wait_for_event)(unsigned long, efi_event_t *, unsigned long *); - void *signal_event; + efi_status_t (__efiapi *signal_event)(efi_event_t); efi_status_t (__efiapi *close_event)(efi_event_t); void *check_event; void *install_protocol_interface; @@ -322,7 +322,7 @@ union efi_boot_services { void *calculate_crc32; void (__efiapi *copy_mem)(void *, const void *, unsigned long); void (__efiapi *set_mem)(void *, unsigned long, unsigned char); - void *create_event_ex; + efi_status_t (__efiapi *create_event_ex)(u32, int, void *, void *, void *, efi_event_t *); }; struct { efi_table_hdr_t hdr;
On Thu, 28 Nov 2024 at 15:46, Johan Hovold <johan@kernel.org> wrote: > > On Thu, Nov 28, 2024 at 12:05:09PM +0100, Ard Biesheuvel wrote: > > > If you're happy to experiment more, you could try and register a > > notification for EFI_EVENT_GROUP_BEFORE_EXIT_BOOT_SERVICES using > > CreateEventEx(), and see if it gets called when ExitBootServices() is > > called. That would at least help narrow it down. > > Thanks for the suggestion. > > I see the notify function being called when I signal it as well as on > each ExitBootServices(). > Interesting. That means the EDK2 fork is fairly recent. FYI https://github.com/tianocore/edk2/pull/6481 > With an efi_printk() in the callback ExitBootServices() fails as > expected, but with an empty function the kernel seems to start every > time. > > Interestingly, ExitBootServices() now succeeds also if I add back the > CloseEvent() call. In fact, it works also if I never signal the event > (i.e. if I just create and close the event). > Is it still invoked by the firmware if you closed the event before EBS()? > The patch below should suffice as a workaround I can carry until the > firmware has been fixed. > Ok. I'd prefer to get this fixed on the firmware side as well.
On Thu, Nov 28, 2024 at 04:21:09PM +0100, Ard Biesheuvel wrote: > On Thu, 28 Nov 2024 at 15:46, Johan Hovold <johan@kernel.org> wrote: > > > > On Thu, Nov 28, 2024 at 12:05:09PM +0100, Ard Biesheuvel wrote: > > > > > If you're happy to experiment more, you could try and register a > > > notification for EFI_EVENT_GROUP_BEFORE_EXIT_BOOT_SERVICES using > > > CreateEventEx(), and see if it gets called when ExitBootServices() is > > > called. That would at least help narrow it down. > > > > Thanks for the suggestion. > > > > I see the notify function being called when I signal it as well as on > > each ExitBootServices(). > > Interesting. That means the EDK2 fork is fairly recent. > > FYI https://github.com/tianocore/edk2/pull/6481 Nice find. > > With an efi_printk() in the callback ExitBootServices() fails as > > expected, but with an empty function the kernel seems to start every > > time. > > > > Interestingly, ExitBootServices() now succeeds also if I add back the > > CloseEvent() call. In fact, it works also if I never signal the event > > (i.e. if I just create and close the event). > > Is it still invoked by the firmware if you closed the event before EBS()? No, I just reconfirmed that then it is only called when I signal it before closing (or never if don't signal the event). Johan
diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c index de659f6a815f..f228895bf090 100644 --- a/drivers/firmware/efi/libstub/efi-stub-helper.c +++ b/drivers/firmware/efi/libstub/efi-stub-helper.c @@ -409,6 +409,9 @@ char *efi_convert_cmdline(efi_loaded_image_t *image, int *cmd_line_len) return (char *)cmdline_addr; } +#define EFI_EVENT_GROUP_BEFORE_EXIT_BOOT_SERVICES \ + EFI_GUID(0x8be0e274, 0x3970, 0x4b44, 0x80, 0xc5, 0x1a, 0xb9, 0x50, 0x2f, 0x3b, 0xfc) + /** * efi_exit_boot_services() - Exit boot services * @handle: handle of the exiting image @@ -429,10 +432,26 @@ efi_status_t efi_exit_boot_services(void *handle, void *priv, { struct efi_boot_memmap *map; efi_status_t status; + efi_guid_t guid = EFI_EVENT_GROUP_BEFORE_EXIT_BOOT_SERVICES; + efi_event_t event; if (efi_disable_pci_dma) efi_pci_disable_bridge_busmaster(); + status = efi_bs_call(create_event_ex, 0, 0, NULL, NULL, &guid, &event); + if (status == EFI_SUCCESS) { + status = efi_bs_call(signal_event, event); + if (status != EFI_SUCCESS) + efi_err("%s - signal event failed: %02lx\n", __func__, status); +#if 0 + status = efi_bs_call(close_event, event); + if (status != EFI_SUCCESS) + efi_err("%s - close event failed: %02lx\n", __func__, status); +#endif + } else { + efi_err("%s - create event ex failed: %02lx\n", __func__, status); + } + status = efi_get_memory_map(&map, true); if (status != EFI_SUCCESS) return status; @@ -446,6 +465,7 @@ efi_status_t efi_exit_boot_services(void *handle, void *priv, status = efi_bs_call(exit_boot_services, handle, map->map_key); if (status == EFI_INVALID_PARAMETER) { + //efi_err("Exit boot services failed: %lx\n", status); /* * The memory map changed between efi_get_memory_map() and * exit_boot_services(). Per the UEFI Spec v2.6, Section 6.4: diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h index 685098f9626f..e3f710823a29 100644 --- a/drivers/firmware/efi/libstub/efistub.h +++ b/drivers/firmware/efi/libstub/efistub.h @@ -272,7 +272,7 @@ union efi_boot_services { efi_status_t (__efiapi *wait_for_event)(unsigned long, efi_event_t *, unsigned long *); - void *signal_event; + efi_status_t (__efiapi *signal_event)(efi_event_t); efi_status_t (__efiapi *close_event)(efi_event_t); void *check_event; void *install_protocol_interface; @@ -322,7 +322,7 @@ union efi_boot_services { void *calculate_crc32; void (__efiapi *copy_mem)(void *, const void *, unsigned long); void (__efiapi *set_mem)(void *, unsigned long, unsigned char); - void *create_event_ex; + efi_status_t (__efiapi *create_event_ex)(u32, int, void *, void *, void *, efi_event_t *); }; struct { efi_table_hdr_t hdr;