From patchwork Fri Jan 27 12:14:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mark Rutland X-Patchwork-Id: 13118660 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id B15CFC54EAA for ; Fri, 27 Jan 2023 13:04:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:References: Message-ID:Subject:Cc:To:From:Date:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=sUoB4XNKNfm7cc5qPORKPn7Z+y/LgQoBEuRxkLp7F2g=; b=eMJFKVTypFM3Av bOy8s3BdYGbWatfiZIntRisOqM58o/sR3p6sgCExO+U+ZEA9O4gFekWx2mWjA335gw31CMMavg/Hq wT2KbwgOHemHDG+ujDtoy2qeUV68vGek2wMx/Ir3veYuJXwNKo00xvQMkAJPMcit9wwT/nA8IAkD6 7xPjzV1+BTj1sKU0Co/pnUbYeZpprUgiIHKgzkSHrpJ8f7uHTT2ZuizRyH1dfQPSNN0mT4r6ixVL4 Msvhpx7t9+ghOO8cVwlSRhMVPzFqeICSa9Cj3ska06fLhBEAQObTMoldKes1tYwA/4qW1+bvsF6RI ElP329oWk+BQReXZ4DwA==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1pLONq-00EqIE-RH; Fri, 27 Jan 2023 13:03:11 +0000 Received: from foss.arm.com ([217.140.110.172]) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1pLNca-00EShS-MI for linux-arm-kernel@lists.infradead.org; Fri, 27 Jan 2023 12:14:23 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 67F101570; Fri, 27 Jan 2023 04:15:00 -0800 (PST) Received: from FVFF77S0Q05N (unknown [10.57.9.233]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 2A8B43F5A1; Fri, 27 Jan 2023 04:14:17 -0800 (PST) Date: Fri, 27 Jan 2023 12:14:10 +0000 From: Mark Rutland To: Ard Biesheuvel Cc: Will Deacon , cki-project@redhat.com, xiawu@redhat.com, catalin.marinas@arm.com, bhe@redhat.com, linux-arm-kernel@lists.infradead.org, james.morse@arm.com Subject: [PATCH] arm64: avoid executing padding bytes during kexec / hibernation (WAS: Re: ? FAIL (91/181 SKIPPED): Test report for for-kernelci (6.2.0-rc5, arm-next, 2e84eedb)) Message-ID: References: <29992.123012504212600261@us-mta-139.us.mimecast.lan> <20230126125202.GA29102@willie-the-truck> <20230126151715.GA29322@willie-the-truck> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20230127_041420_872000_E4F4A6FE X-CRM114-Status: GOOD ( 54.47 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Adding James, since this affects hibernate too. On Thu, Jan 26, 2023 at 07:33:22PM +0100, Ard Biesheuvel wrote: > On Thu, 26 Jan 2023 at 18:25, Mark Rutland wrote: > > > > On Thu, Jan 26, 2023 at 03:17:16PM +0000, Will Deacon wrote: > > > On Thu, Jan 26, 2023 at 03:09:58PM +0000, Mark Rutland wrote: > > > > On Thu, Jan 26, 2023 at 12:52:03PM +0000, Will Deacon wrote: > > > > > [+Mark and Ard in case they have ideas] > > > > > > > > > > On Wed, Jan 25, 2023 at 09:21:19AM -0000, cki-project@redhat.com wrote: > > > > > > Hi, we tested your kernel and here are the results: > > > > > > > > > > > > Overall result: FAILED > > > > > > Merge: OK > > > > > > Compile: OK > > > > > > Test: FAILED > > > > > > > > > > > > > > > > > > Kernel information: > > > > > > Commit message: Merge branch 'for-next/core' into for-kernelci > > > > > > > > > > > > You can find all the details about the test run at > > > > > > https://datawarehouse.cki-project.org/kcidb/checkouts/66828 > > > > > > > > > > > > One or more kernel tests failed: > > > > > > Unrecognized or new issues: > > > > > > aarch64 - kdump - kexec_boot > > > > > > Logs: https://datawarehouse.cki-project.org/kcidb/tests/6799495 > > > > > > > > > > This looks like we run into an undefined instruction when we jump to the > > > > > kexec relocation code. Do you know if the failure is reproducible, and is > > > > > the log identical each time? > > > > > > > > I had a go in a QEMU KVM VM on ThunderX2, and a QEMU KVM TCG VM. With defconfig > > > > I don't see the issue, but with the config from the test run link above I > > > > consistently see the issue both under KVM and TCG (logs below). > > > > > > > > It should be simple enough to figure out which config option is tickling this; > > > > I'll go dig in to that.. > > > > > > Cheers, Mark. If you get a chance, it's probably also worth testing vanilla > > > -rc5 to confirm that it's a regression in our queue (which we could > > > assumedly bisect if necessary). > > > > I have met the enemy, and he is me: > > > > | git bisect start > > | # good: [2241ab53cbb5cdb08a6b2d4688feb13971058f65] Linux 6.2-rc5 > > | git bisect good 2241ab53cbb5cdb08a6b2d4688feb13971058f65 > > | # bad: [2e84eedb182e43a9113c2c83cc3373c2ae99ce19] Merge branch 'for-next/core' into for-kernelci > > | git bisect bad 2e84eedb182e43a9113c2c83cc3373c2ae99ce19 > > | # good: [3eb1b41fba97a1586e3ecca8c10547071f541567] kselftest/arm64: Add coverage of SME 2 and 2.1 hwcaps > > | git bisect good 3eb1b41fba97a1586e3ecca8c10547071f541567 > > | # good: [daac835347a52d9d141be281e4657cc08a360e97] kselftest/arm64: Correct buffer size for SME ZA storage > > | git bisect good daac835347a52d9d141be281e4657cc08a360e97 > > | # bad: [baaf553d3bc330697c68a00f96cf11f4edfeac7e] arm64: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS > > | git bisect bad baaf553d3bc330697c68a00f96cf11f4edfeac7e > > | # good: [47a15aa544279d34e14e17ca3b5855e39b946cec] arm64: Extend support for CONFIG_FUNCTION_ALIGNMENT > > | git bisect good 47a15aa544279d34e14e17ca3b5855e39b946cec > > | # good: [e4ecbe83fd1a5428d5458de04a3404f1b5444429] arm64: patching: Add aarch64_insn_write_literal_u64() > > | git bisect good e4ecbe83fd1a5428d5458de04a3404f1b5444429 > > | # good: [90955d778ad7873964a271852b1f24d31e00248b] arm64: ftrace: Update stale comment > > | git bisect good 90955d778ad7873964a271852b1f24d31e00248b > > | # first bad commit: [baaf553d3bc330697c68a00f96cf11f4edfeac7e] arm64: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS > > > > It looks like this is down to the function alignment; reverting that commit makes it go away, but if ia add: > > > > | diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > > | index 6914f6bf41e22..8dafeea05864e 100644 > > | --- a/arch/arm64/Kconfig > > | +++ b/arch/arm64/Kconfig > > | @@ -123,7 +123,7 @@ config ARM64 > > | select DMA_DIRECT_REMAP > > | select EDAC_SUPPORT > > | select FRAME_POINTER > > | - select FUNCTION_ALIGNMENT_4B > > | + select FUNCTION_ALIGNMENT_8B # HACK HACK HACK > > | select GENERIC_ALLOCATOR > > | select GENERIC_ARCH_TOPOLOGY > > | select GENERIC_CLOCKEVENTS_BROADCAST > > > > ... then it blows up again. > > > > So we're probably doing a clever address calculation in the kexec idmap code > > that ends up being wrong when the code gets shuffled a bit; possibly a > > mismatched caller/callee alignment. > > > > $ grep -1 arm64_reloca System.map > ffff80000b0320fc T __relocate_new_kernel_start > ffff80000b032100 T arm64_relocate_new_kernel > ffff80000b032220 T __relocate_new_kernel_end > > so the alignment results in arm64_relocate_new_kernel() appearing past > the start of the section, and the kexec code takes the address of the > section not the function symbol. Yup, and we do the same for hibernation with swsusp_arch_suspend_exit vs __hibernate_exit_text_start. AFAICT those are the only places we end up relying upon alignment that we don't already have (e.g. the KPTI trampoline section is already page-aligned). > Something like the below should fix it, although it would arguably be > better to clean up the kexec code (but I'm reluctant to make a clean > spot) > > index 407415a5163ab62f..89720ec2d830d51c 100644 > --- a/arch/arm64/kernel/vmlinux.lds.S > +++ b/arch/arm64/kernel/vmlinux.lds.S > @@ -102,6 +102,7 @@ jiffies = jiffies_64; > > #ifdef CONFIG_KEXEC_CORE > #define KEXEC_TEXT \ > + . = ALIGN(64); \ > __relocate_new_kernel_start = .; \ > *(.kexec_relocate.text) \ > __relocate_new_kernel_end = .; Yep, and there's ALIGN_FUNCTION() for exactly this purpose. I had a quick look at making the kexec and hibernation code figure out the offset between the start of the section and the function, and it wasn't very pretty, so I think for now we should just place an assertion in the linker script. With all that in mind, I came up with the patch below, which works for me with ftrace and/or CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B selected. Thanks, Mark ---->8---- From 8c82c71d810be311f10703b91563a301e56be910 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Fri, 27 Jan 2023 11:08:20 +0000 Subject: [PATCH] arm64: avoid executing padding bytes during kexec / hibernation Currently we rely on the HIBERNATE_TEXT section starting with the entry point to swsusp_arch_suspend_exit, and the KEXEC_TEXT section starting with the entry point to arm64_relocate_new_kernel. In both cases we copy the entire section into a dynamically-allocated page, and then later branch to the start of this page. SYM_FUNC_START() will align the function entry points to CONFIG_FUNCTION_ALIGNMENT, and when the linker later processes the assembled code it will place padding bytes before the function entry point if the location counter was not already sufficiently aligned. The linker happens to use the value zero for these padding bytes. This padding may end up being applied whenever CONFIG_FUNCTION_ALIGNMENT is greater than 4, which can be the case with CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B=y or CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS=y. When such padding is applied, attempting to kexec or resume from hibernate will result ina crash: the kernel will branch to the padding bytes as the start of the dynamically-allocated page, and as those bytes are zero they will decode as UDF #0, which reliably triggers an UNDEFINED exception. For example: | # ./kexec --reuse-cmdline -f Image | [ 46.965800] kexec_core: Starting new kernel | [ 47.143641] psci: CPU1 killed (polled 0 ms) | [ 47.233653] psci: CPU2 killed (polled 0 ms) | [ 47.323465] psci: CPU3 killed (polled 0 ms) | [ 47.324776] Bye! | [ 47.327072] Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP | [ 47.328510] Modules linked in: | [ 47.329086] CPU: 0 PID: 259 Comm: kexec Not tainted 6.2.0-rc5+ #3 | [ 47.330223] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 | [ 47.331497] pstate: 604003c5 (nZCv DAIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) | [ 47.332782] pc : 0x43a95000 | [ 47.333338] lr : machine_kexec+0x190/0x1e0 | [ 47.334169] sp : ffff80000d293b70 | [ 47.334845] x29: ffff80000d293b70 x28: ffff000002cc0000 x27: 0000000000000000 | [ 47.336292] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000 | [ 47.337744] x23: ffff80000a837858 x22: 0000000048ec9000 x21: 0000000000000010 | [ 47.339192] x20: 00000000adc83000 x19: ffff000000827000 x18: 0000000000000006 | [ 47.340638] x17: ffff800075a61000 x16: ffff800008000000 x15: ffff80000d293658 | [ 47.342085] x14: 0000000000000000 x13: ffff80000d2937f7 x12: ffff80000a7ff6e0 | [ 47.343530] x11: 00000000ffffdfff x10: ffff80000a8ef8e0 x9 : ffff80000813ef00 | [ 47.344976] x8 : 000000000002ffe8 x7 : c0000000ffffdfff x6 : 00000000000affa8 | [ 47.346431] x5 : 0000000000001fff x4 : 0000000000000001 x3 : ffff80000a0a3008 | [ 47.347877] x2 : ffff80000a8220f8 x1 : 0000000043a95000 x0 : ffff000000827000 | [ 47.349334] Call trace: | [ 47.349834] 0x43a95000 | [ 47.350338] kernel_kexec+0x88/0x100 | [ 47.351070] __do_sys_reboot+0x108/0x268 | [ 47.351873] __arm64_sys_reboot+0x2c/0x40 | [ 47.352689] invoke_syscall+0x78/0x108 | [ 47.353458] el0_svc_common.constprop.0+0x4c/0x100 | [ 47.354426] do_el0_svc+0x34/0x50 | [ 47.355102] el0_svc+0x34/0x108 | [ 47.355747] el0t_64_sync_handler+0xf4/0x120 | [ 47.356617] el0t_64_sync+0x194/0x198 | [ 47.357374] Code: bad PC value | [ 47.357999] ---[ end trace 0000000000000000 ]--- | [ 47.358937] Kernel panic - not syncing: Oops - Undefined instruction: Fatal exception | [ 47.360515] Kernel Offset: disabled | [ 47.361230] CPU features: 0x002000,00050108,c8004203 | [ 47.362232] Memory Limit: none Note: Unfortunately the code dump reports "bad PC value" as it attempts to dump some instructions prior to the UDF (i.e. before the start of the page), and terminates early upon a fault, obscuring the problem. This patch fixes this issue by aligning the section starter markes to CONFIG_FUNCTION_ALIGNMENT using the ALIGN_FUNCTION() helper, which ensures that the linker never needs to place padding bytes within the section. Assertions are added to verify each section begins with the function we expect, making our implicit requirement explicit. In future it might be nice to rework the kexec and hibernation code to decouple the section start from the entry point, but that involves much more significant changes that come with a higher risk of error, so I've tried to keep this fix as simple as possible for now. Fixes: 47a15aa544279d34 ("arm64: Extend support for CONFIG_FUNCTION_ALIGNMENT") Reported-by: CKI Project Link: https://lore.kernel.org/linux-arm-kernel/29992.123012504212600261@us-mta-139.us.mimecast.lan/ Signed-off-by: Mark Rutland Cc: Ard Biesheuvel Cc: Catalin Marinas Cc: James Morse Cc: Will Deacon Reviewed-by: Ard Biesheuvel --- arch/arm64/kernel/vmlinux.lds.S | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S index 407415a5163ab..1a43df27a2046 100644 --- a/arch/arm64/kernel/vmlinux.lds.S +++ b/arch/arm64/kernel/vmlinux.lds.S @@ -93,6 +93,7 @@ jiffies = jiffies_64; #ifdef CONFIG_HIBERNATION #define HIBERNATE_TEXT \ + ALIGN_FUNCTION(); \ __hibernate_exit_text_start = .; \ *(.hibernate_exit.text) \ __hibernate_exit_text_end = .; @@ -102,6 +103,7 @@ jiffies = jiffies_64; #ifdef CONFIG_KEXEC_CORE #define KEXEC_TEXT \ + ALIGN_FUNCTION(); \ __relocate_new_kernel_start = .; \ *(.kexec_relocate.text) \ __relocate_new_kernel_end = .; @@ -355,6 +357,8 @@ ASSERT(__idmap_text_end - (__idmap_text_start & ~(SZ_4K - 1)) <= SZ_4K, #ifdef CONFIG_HIBERNATION ASSERT(__hibernate_exit_text_end - __hibernate_exit_text_start <= SZ_4K, "Hibernate exit text is bigger than 4 KiB") +ASSERT(__hibernate_exit_text_start == swsusp_arch_suspend_exit, + "Hibernate exit text does not start with swsusp_arch_suspend_exit") #endif #ifdef CONFIG_UNMAP_KERNEL_AT_EL0 ASSERT((__entry_tramp_text_end - __entry_tramp_text_start) <= 3*PAGE_SIZE, @@ -381,4 +385,6 @@ ASSERT(swapper_pg_dir - tramp_pg_dir == TRAMP_SWAPPER_OFFSET, ASSERT(__relocate_new_kernel_end - __relocate_new_kernel_start <= SZ_4K, "kexec relocation code is bigger than 4 KiB") ASSERT(KEXEC_CONTROL_PAGE_SIZE >= SZ_4K, "KEXEC_CONTROL_PAGE_SIZE is broken") +ASSERT(__relocate_new_kernel_start == arm64_relocate_new_kernel, + "kexec control page does not start with arm64_relocate_new_kernel") #endif