From patchwork Fri Jun 22 11:16:15 2018
X-Patchwork-Submitter: Will Deacon
X-Patchwork-Id: 10481885
Date: Fri, 22 Jun 2018 12:16:15 +0100
From: Will Deacon
To: Wei Xu
Subject: Re: KVM guest sometimes failed to boot because of kernel stack
 overflow if KPTI is enabled on a hisilicon ARM64 platform.
Message-ID: <20180622111614.GA1150@arm.com>
References: <5B2A7832.4010502@hisilicon.com>
 <5B2A7FE1.5040607@hisilicon.com>
 <20180621091850.GA22505@arm.com>
 <5B2B7A84.8090309@hisilicon.com>
 <20180621105404.GB22505@arm.com>
 <5B2CB440.8040705@hisilicon.com>
 <20180622092330.GD7601@arm.com>
 <5B2CD33B.9020702@hisilicon.com>
In-Reply-To: <5B2CD33B.9020702@hisilicon.com>
Cc: mark.rutland@arm.com, "Chenxin \(Charles\)", Linuxarm, Hanjun Guo,
 xiexiuqi@huawei.com, "kongxinwei \(A\)", huangdaode,
 catalin.marinas@arm.com, "Liyuan \(Larry, Turing Solution\)",
 "Zhuangyuzeng \(Yisen\)", Zhangyi ac, suzuki.poulose@arm.com,
 marc.zyngier@arm.com, John Garry, "Xiongfanggou \(James\)",
 jonathan.cameron@huawei.com, linux-arm-kernel@lists.infradead.org,
 Salil Mehta, linux-kernel@vger.kernel.org, Shameerali Kolothum Thodi,
 dave.martin@arm.com, zhangbin011@hisilicon.com, "Wangzhou \(B\)",
 James Morse, libeijian@hisilicon.com, "Liguozhu \(Kenneth\)", Shiju Jose

Hi Wei,

Thanks for giving that a spin.

On Fri, Jun 22, 2018 at 06:45:15PM +0800, Wei Xu wrote:
> On 2018/6/22 17:23, Will Deacon wrote:
> >On Fri, Jun 22, 2018 at 09:33:04AM +0100, Wei Xu wrote:
> >>On 2018/6/21 11:54, Will Deacon wrote:
> >>>On Thu, Jun 21, 2018 at 11:14:28AM +0100, Wei Xu wrote:
> >>>>On 2018/6/21 10:18, Will Deacon wrote:
> >>>>>Wei -- does the diff below help at all? Make sure you disable CONFIG_KASAN,
> >>>>>otherwise your kernel will take an age to boot.
> >>>>Yes, amazing! This patch resolved the issue.
> >>>Great...
> >>>
> >>>>I have tested 50 times and can not reproduce the issue any more.
> >>>>Could you please tell more why this patch works?
> >>>You might need to ask your CPU design team ;)
> >>>
> >>>Without this patch, the code in idmap_kpti_install_ng_mappings() sets
> >>>bit 11 in table descriptors so that we can keep track of which parts of
> >>>the page table we've visited. With this patch, we don't bother tracking
> >>>and potentially rewalk parts of the page table (which takes a very long
> >>>time if KASAN is enabled).
> >>Got it. Thanks!
> >>
> >>>The architecture documents I've looked at are clear that bit 11 is IGNORED
> >>>by the CPU, which:
> >>>
> >>>  "Indicates that the architecture guarantees that the bit or field is not
> >>>   interpreted or modified by hardware."
> >>>
> >>>Please can you double-check that your CPU is indeed ignoring bit 11 in
> >>>non-leaf (table) descriptors?
> >>Do the non-leaf(table) descriptors mean the table descriptors
> >>of the section D4.3.1 "VMSAv8-64 translation table level 0, level 1, and level 2 descriptor formats"
> >>in the ARM Architecture Reference Manual ARMv8 for ARMv8-A(DDI0487C_a_armv8_arm.pdf)?
> >>
> >>If yes, our hardware does ignore it(not interpret or modify).
> >Ok, thanks for checking.
> >
> >>Is there any other possible reason cause this?
> >Perhaps just writing back the table entries is enough to cause the issue,
> >although I really can't understand why that would be the case. Can you try
> >the diff below (without my previous change), please?
>
> Thanks!
> But it does not resolve the issue(only apply this patch based on 4.17.0).

Thanks, that's a useful data point. It means that it still crashes even if
we write back the same table entries, so it's the fact that we're writing
them at all which causes the problem, not the value that we write.

Whilst looking at the code, we noticed a missing DMB. On the off-chance that
it helps, can you try this instead please?

Will

--->8

diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 5f9a73a4452c..03646e6a2ef4 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -217,8 +217,9 @@ ENDPROC(idmap_cpu_replace_ttbr1)
 
 	.macro	__idmap_kpti_put_pgtable_ent_ng, type
 	orr	\type, \type, #PTE_NG		// Same bit for blocks and pages
-	str	\type, [cur_\()\type\()p]	// Update the entry and ensure it
-	dc	civac, cur_\()\type\()p		// is visible to all CPUs.
+	str	\type, [cur_\()\type\()p]	// Update the entry and ensure
+	dmb	sy				// that it is visible to all
+	dc	civac, cur_\()\type\()p		// CPUs.
 	.endm
 
 /*