X-Patchwork-Id: 12175331
Date: Wed, 31 Mar 2021 19:25:40 +0800
From: Aili Yao
To: "HORIGUCHI NAOYA(堀口　直也)", "Luck, Tony", "Oscar Salvador", "david@redhat.com"
CC: "akpm@linux-foundation.org", "linux-kernel@vger.kernel.org", "linux-mm@kvack.org", "yangfeng1@kingsoft.com"
Subject: [PATCH v3] mm,hwpoison: return -EHWPOISON when page already poisoned
Message-ID: <20210331192540.2141052f@alex-virtual-machine>
In-Reply-To: <20210309143534.6c1a8ec5@alex-virtual-machine>
Organization: kingsoft

When the page is already poisoned, another memory_failure() call on the
same page currently returns 0, meaning OK.
For nested MCE handling, this behavior can make a single MCE loop.
Example:

1. LMCE is enabled, and two processes A and B run on different cores X
   and Y and access the same page. The page becomes corrupted when
   process A accesses it, an MCE is raised on core X, and error handling
   is under way.

2. B then accesses the page and triggers another MCE on core Y. Its
   error handling sees TestSetPageHWPoison() return true, and
   memory_failure() returns 0.

3. kill_me_maybe() checks the return value:

	static void kill_me_maybe(struct callback_head *cb)
	{
		...
		if (!memory_failure(p->mce_addr >> PAGE_SHIFT, flags) &&
		    !(p->mce_kflags & MCE_IN_KERNEL_COPYIN)) {
			set_mce_nospec(p->mce_addr >> PAGE_SHIFT, p->mce_whole_page);
			sync_core();
			return;
		}
		...
	}

4. Error handling for B ends, and nothing else may happen if kill-early
   is not set. Process B re-executes the instruction, takes the MCE
   again, and the loop repeats.

Calling set_mce_nospec() here is also not appropriate; see commit
fd0e786d9d09 ("x86/mm, mm/hwpoison: Don't unconditionally unmap kernel
1:1 pages").

Other callers that care about the return value of memory_failure()
should check why they are trying to handle a memory error that has
already been handled. Returning -EHWPOISON in this case therefore seems
reasonable.
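The scenario above can be illustrated with a small userspace model. This
is only a sketch and not kernel code: model_page, test_set_hwpoison(),
model_memory_failure(), model_takes_recovered_path() and model_demo()
are hypothetical names invented for illustration, the first reporter is
simply assumed to succeed, and the fallback value for EHWPOISON is the
generic Linux one.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#ifndef EHWPOISON
#define EHWPOISON 133	/* assumption: generic Linux value, for non-Linux hosts */
#endif

/* Hypothetical stand-in for struct page: only the HWPoison flag is modelled. */
struct model_page {
	bool hwpoison;
};

/* Models TestSetPageHWPoison(): set the flag, return its previous value. */
static bool test_set_hwpoison(struct model_page *p)
{
	bool old = p->hwpoison;

	p->hwpoison = true;
	return old;
}

/*
 * Models only the "already poisoned" early return of memory_failure().
 * patched == false: old behavior (return 0);
 * patched == true:  behavior after this patch (return -EHWPOISON).
 */
static int model_memory_failure(struct model_page *p, bool patched)
{
	if (test_set_hwpoison(p))
		return patched ? -EHWPOISON : 0;

	return 0;	/* first reporter: assume handling succeeded */
}

/* Mirrors the condition in kill_me_maybe(): true means "return to user". */
static bool model_takes_recovered_path(int ret, bool in_kernel_copyin)
{
	return !ret && !in_kernel_copyin;
}

/* Walks through steps 1-3 for cores X and Y with the patch applied. */
static void model_demo(void)
{
	struct model_page page = { .hwpoison = false };
	int ret;

	/* Core X reports the error first. */
	ret = model_memory_failure(&page, true);
	assert(ret == 0);

	/* Core Y reports the same page while it is already poisoned. */
	ret = model_memory_failure(&page, true);
	assert(ret == -EHWPOISON);

	/* Y no longer returns to user space as if the error were recovered. */
	assert(!model_takes_recovered_path(ret, false));
}
```

With patched set to false, the second call returns 0 and
model_takes_recovered_path() yields true, which corresponds to the
re-execute loop in step 4.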

Signed-off-by: Aili Yao
---
 mm/memory-failure.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 24210c9bd843..5cd42144b67c 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1228,7 +1228,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 	if (TestSetPageHWPoison(head)) {
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
 		       pfn);
-		return 0;
+		return -EHWPOISON;
 	}
 
 	num_poisoned_pages_inc();
@@ -1430,7 +1430,7 @@ int memory_failure(unsigned long pfn, int flags)
 	if (TestSetPageHWPoison(p)) {
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
 			pfn);
-		return 0;
+		return -EHWPOISON;
 	}
 
 	orig_head = hpage = compound_head(p);