From patchwork Thu Jul 30 20:51:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nick Desaulniers X-Patchwork-Id: 11693667 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2BF0913B6 for ; Thu, 30 Jul 2020 20:51:53 +0000 (UTC) Received: from merlin.infradead.org (merlin.infradead.org [205.233.59.134]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0472420829 for ; Thu, 30 Jul 2020 20:51:53 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="hC04nTdO"; dkim=fail reason="signature verification failed" (2048-bit key) header.d=google.com header.i=@google.com header.b="WS/N+oLb" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0472420829 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=merlin.20170209; h=Sender:Content-Transfer-Encoding: Content-Type:Cc:List-Subscribe:List-Help:List-Post:List-Archive: List-Unsubscribe:List-Id:To:From:Subject:Mime-Version:Message-Id:Date: Reply-To:Content-ID:Content-Description:Resent-Date:Resent-From:Resent-Sender :Resent-To:Resent-Cc:Resent-Message-ID:In-Reply-To:References:List-Owner; bh=k3WOjKGiu7SpVqMVEOmGUZWOykeVht4qIq/l6/Nxh7I=; b=hC04nTdOVdXFVUxJPWubQe8KGw B8tyfoiUyWi/LDc7sYbA7J2kE5g9YdJgQ8AlgQnNxcssh8ep/yAvvpGm9NMu/aD71rpKCi0yPkp4A 3pBX1aBknV0yqGqYA8O6QlWceH/o/WwLbZrYU1P1hfgMPdqezFT3+T9nRGGtBR8WcduT/xvhJT59+ zmsawArlGlnJ7jVpSUDWfmJnBcWsyV66pRR927I2j+1T2tIz51Pothwks3pgHkmReFIlhx/PSH42s L3cCjuYDYhR9zbO6D0qCMuzIcH83rS/4LkpzC3v6wmnSrNVndKojsssyLBRGA5aU56h4zXRgSvfAf wCN4qPkg==; Received: from localhost ([::1] helo=merlin.infradead.org) by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1k1FWW-0004fg-Lk; Thu, 30 Jul 2020 20:51:32 +0000 Received: from mail-qv1-xf4a.google.com ([2607:f8b0:4864:20::f4a]) by merlin.infradead.org with esmtps (Exim 4.92.3 #3 (Red Hat Linux)) id 1k1FWS-0004dQ-Me for linux-arm-kernel@lists.infradead.org; Thu, 30 Jul 2020 20:51:29 +0000 Received: by mail-qv1-xf4a.google.com with SMTP id l18so587150qvq.16 for ; Thu, 30 Jul 2020 13:51:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:message-id:mime-version:subject:from:to:cc; bh=+8IcRcjNMmg55nzAYWemaonCO3YVJFMOAlMg6/HQqZk=; b=WS/N+oLbFzSBBDpGrj5zhi5qc3eMKgQfUSbLNA4X6vw0fslhwV92R948HcCKCQBs4B 3cAPHFnBjNf1GJzmYzjIdxVcT88aOw/+5Z7wY4hDcfZ5yh6wC2nAOs69NEGLdm4IKtNI RHkMh6NAvOxXxk7mDuS8rcqW39ze8lB0GIa6RhJc9qU7vK9N2Btacj0v68dmU6Dug9LP jP/tT7rYjKVlIhvetloVr2f4TprDQRLJSmx4UsvYZgDVA5qVyPFnBDK3Fkb88waPgT74 NFiy+CaVbmI50DH86lrxVdWJKfOc9vsv3Lpzq90+nz4g64FuSpFl766z+JkRzk2031lz HZxQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:message-id:mime-version:subject:from:to:cc; bh=+8IcRcjNMmg55nzAYWemaonCO3YVJFMOAlMg6/HQqZk=; b=JWTh17CBi80STx6xe7P3jhTVKSUGpAaRxniFoCW5grK3TKfEoUS9B7XkWZ1/FSSEde /wGKbPOm6nqlQd7zaqRodtAKSNJ92yyQEPrPT+GOL3E5EIwO1bBsqAsGMxOrOO2M0IcD qpv12WgR370LZqmmPQNecLtdQKX9FhiVzoliY4x8BqyWFJAL+dxMOmNrVN+1MzP8m61Q VREDbk/LscOrqsOQNep0RimOGbs62MDIuTuopb6leTTy9EXkGdNjc3Hehvpw8ttCEJus ifk/l0eaWDhzdopxgE2mkoWaaolLfoGap97yMhxp464lzhDsTwgK24Pph9Adavnx9W+c I/3A== X-Gm-Message-State: AOAM533YrzwBWCx14zx4k6noOvM3oJwGNjVhzcwBPwstCRPLX1i+4izq wLgR7BXBNgB8jZAeH/px8qmHepJVgEpjR6ZU1zo= X-Google-Smtp-Source: ABdhPJwmD50FNu7pg+bxDxzIfP7b6FbrE4ZJaK1Bw8TAS+CDtLGbaaU90Wd09/zT21c7WigKXG7F+Uwo0u5cNy1rAbY= X-Received: by 2002:a0c:b712:: with SMTP id t18mr958012qvd.205.1596142281418; Thu, 30 Jul 2020 13:51:21 -0700 (PDT) Date: Thu, 30 Jul 2020 13:51:08 -0700 Message-Id: <20200730205112.2099429-1-ndesaulniers@google.com> Mime-Version: 1.0 X-Mailer: git-send-email 2.28.0.163.g6104cc2f0b6-goog Subject: [PATCH 0/4] CONFIG_UNWINDER_FRAME_POINTER fixes+cleanups From: Nick Desaulniers To: Nathan Huckleberry , Russell King X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20200730_165128_778174_21AB508D X-CRM114-Status: GOOD ( 21.72 ) X-Spam-Score: -7.7 (-------) X-Spam-Report: SpamAssassin version 3.4.4 on merlin.infradead.org summary: Content analysis details: (-7.7 points) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at https://www.dnswl.org/, no trust [2607:f8b0:4864:20:0:0:0:f4a listed in] [list.dnswl.org] 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record -0.0 SPF_PASS SPF: sender matches SPF record -7.5 USER_IN_DEF_DKIM_WL From: address is in the default DKIM white-list 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid -0.1 DKIM_VALID_EF Message has a valid DKIM or DK signature from envelope-from domain -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's domain -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature -0.0 DKIMWL_WL_MED DKIMwl.org - Medium sender X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Nick Desaulniers , Chunyan Zhang , Dmitry Safonov <0x7f454c46@gmail.com>, linux-kernel@vger.kernel.org, clang-built-linux@googlegroups.com, Miles Chen , linux-mediatek@lists.infradead.org, Matthias Brugger , Andrew Morton , Lvqiang Huang , linux-arm-kernel@lists.infradead.org Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org We received a report of boot failure on stable/4.14.y using Clang with CONFIG_UNWINDER_FRAME_POINTER. Turns out, this was a cascaded failure with at least 4 different things going wrong. Working backwards from the failure: 4) There was no fixup handler in backtrace-clang.S for a specific address calculation. If an indirect access to that address triggers a page fault for which no corresponding fixup exists, then a panic is triggered. Panicking triggers another unwind, and all this repeats in an infinite loop. If the unwind was started from within start_kernel(), this results in a kernel that does not boot. We can install a fixup handler to fix the infinite loop, but why was the unwinder accessing an address that would trigger a fault? 3) The unwinder has multiple conditions to know when to stop unwinding, but checking for a valid address in the link register was not one of them. If there was a value for lr that we could check for before using it, then we could add that as another stopping condition to terminate unwinding. But the garbage value in lr in the case of save_stack() wasn't particularly noteworthy from any other address; it was ambiguous whether we had more frames to continue unwinding through or not, but what value would we check for? 2) When following a frame chain, we can generally follow the addresses pushed onto the stack from the link register, lr. The callee generally pushes this value. For our particular failure, the value in the link register upon entry to save_stack() was garbage. The caller, __mmap_switched, does a tail call into save_stack() since we don't plan to return control flow back to __mmap_switched. It uses a `b` (branch) instruction rather than a `bl` (branch+link) which is correct, since there are no instructions after the `b save_stack` in __mmap_switched. If we interpret the value of lr that was pushed on the stack in save_stack(), then it appears that we have further frames to unwind. When observing an unwind on mainline though, lr upon entry to save_stack() was 0x00! It turns out that this exact ambiguity was found+fixed already by upstream commit 59b6359dd92d ("ARM: 8702/1: head-common.S: Clear lr before jumping to start_kernel()") which landed in 4.15-rc1 but was not yet backported to stable/4.14.y. Sent to stable in: https://lore.kernel.org/stable/20200730180340.1724137-1-ndesaulniers@google.com/T/#u That gives us a value in lr upon entry to save_stack() that's noteworthy as a terminal condition during unwinding. But why did we start unwinding in start_kernel() in the first place? 1) A simple WARN_ON_ONCE was being triggered during start_kernel() due to another patch that landed in v4.15-rc9 but wasn't backported to stable/4.14.y. Sent to stable in: https://lore.kernel.org/stable/20200727191746.3644844-1-ndesaulniers@google.com/T/#u Read (or unwound; pun intended) in the order 1), 2), 3), 4) explains the cascading chain of failures that lead to a kernel not booting. Patch 0001 fixes 3) by adding a test for NULL to the conditions to stop unwinding. This prevents the cascade from going further. Patch 0002 fixes 4) by adding a fixup handler. It's not strictly necessary right now, but I get the feeling that we might not be able to trust the value of the link register pushed on the stack. I'm guessing a stack buffer overflow could overwrite this value. Either way, triggering an exception during unwind should be prevented at all costs to avoid infinite loops. Patches 0003/0004 are cleanup/bikeshed, feel free to NACK them and I don't mind dropping them. They're just minor touchups I felt helped readability from when I was debugging these. 0001 (and slightly so 0002) are the only patches I really care about. Nick Desaulniers (4): ARM: backtrace-clang: check for NULL lr ARM: backtrace-clang: add fixup for lr dereference ARM: backtrace-clang: give labels more descriptive names ARM: backtrace: use more descriptive labels arch/arm/lib/backtrace-clang.S | 34 +++++++++++++++++++++------------- arch/arm/lib/backtrace.S | 30 +++++++++++++++--------------- 2 files changed, 36 insertions(+), 28 deletions(-)