From patchwork Fri May 21 03:01:55 2021
X-Patchwork-Submitter: Naoya Horiguchi
X-Patchwork-Id: 12271793
From: Naoya Horiguchi
To: linux-mm@kvack.org, Tony Luck, Aili Yao
Cc: Andrew Morton, Oscar Salvador, David Hildenbrand, Borislav Petkov,
 Andy Lutomirski, Naoya Horiguchi, Jue Wang, linux-kernel@vger.kernel.org
Subject: [PATCH v5 2/3] mm,hwpoison: Return -EHWPOISON to denote that the
 page has already been poisoned
Date: Fri, 21 May 2021 12:01:55 +0900
Message-Id: <20210521030156.2612074-3-nao.horiguchi@gmail.com>
In-Reply-To: <20210521030156.2612074-1-nao.horiguchi@gmail.com>
References: <20210521030156.2612074-1-nao.horiguchi@gmail.com>

From: Aili Yao

When memory_failure() is called with MF_ACTION_REQUIRED on a page that
has already been hwpoisoned, memory_failure() could fail to send SIGBUS
to the affected process, which results in an infinite loop of MCEs.

Currently memory_failure() returns 0 if it is called for an already
hwpoisoned page, so the caller, kill_me_maybe(), could return without
sending SIGBUS to the current process.
An action-required MCE is raised when the current process accesses the
broken memory, so without SIGBUS the current process continues to run
and soon accesses the error page again, running into an MCE loop.

This issue can arise, for example, in the following scenarios:

- Two or more threads access the poisoned page concurrently. If local
  MCE is enabled, the MCE handler handles each MCE event independently,
  so the events race with each other and the second or later threads
  fall into the situation in question.

- If a preceding memory error event occurred and memory_failure() for
  that event failed to unmap the error page for some reason, a
  subsequent memory access to the error page triggers the MCE loop.

To fix the issue, make memory_failure() return an error code when the
error page has already been hwpoisoned. This allows the memory error
handler to control how it sends signals to userspace. Also make sure
that any process touching a hwpoisoned page gets a SIGBUS even in the
"already hwpoisoned" path of memory_failure(), as is done in the page
fault path.

Signed-off-by: Aili Yao
Signed-off-by: Naoya Horiguchi
Reviewed-by: Oscar Salvador
---
ChangeLog v5:
- update patch description.
---
 mm/memory-failure.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git v5.13-rc2/mm/memory-failure.c v5.13-rc2_patched/mm/memory-failure.c
index 0f0b932ccbca..8add7cafad5e 100644
--- v5.13-rc2/mm/memory-failure.c
+++ v5.13-rc2_patched/mm/memory-failure.c
@@ -1247,7 +1247,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 	if (TestSetPageHWPoison(head)) {
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
 		       pfn);
-		return 0;
+		return -EHWPOISON;
 	}
 
 	num_poisoned_pages_inc();
@@ -1456,6 +1456,7 @@ int memory_failure(unsigned long pfn, int flags)
 	if (TestSetPageHWPoison(p)) {
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
 			pfn);
+		res = -EHWPOISON;
 		goto unlock_mutex;
 	}
 
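
Illustration only, not part of the patch: the return-code convention
described above can be modelled in user space as a rough sketch. Every
model_* name, MODEL_NR_PAGES, and the three-way outcome enum are
hypothetical and do not exist in the kernel. The real signalling policy
lives in the callers of memory_failure() and is not shown in this
patch. The sketch only shows that the first report of a page returns 0
and that a repeated report becomes distinguishable through -EHWPOISON.

```c
/* User-space model of the return-code convention; NOT kernel code. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#ifndef EHWPOISON
#define EHWPOISON 133	/* assumed fallback; Linux <errno.h> normally defines it */
#endif

#define MODEL_NR_PAGES 8	/* hypothetical size of the modelled page array */

/* One "poisoned" flag per modelled page frame. */
static bool model_poisoned[MODEL_NR_PAGES];

/* Stand-in for TestSetPageHWPoison(): set the flag, return its old value. */
static bool model_test_and_set_poison(unsigned long pfn)
{
	assert(pfn < MODEL_NR_PAGES);

	bool was_poisoned = model_poisoned[pfn];

	model_poisoned[pfn] = true;
	return was_poisoned;
}

/*
 * Stand-in for the patched memory_failure(): 0 on the first report of a
 * page, -EHWPOISON when the page was already marked poisoned.
 */
static int model_memory_failure(unsigned long pfn)
{
	if (model_test_and_set_poison(pfn))
		return -EHWPOISON;
	return 0;
}

/* Hypothetical outcomes a caller could now tell apart. */
enum model_outcome {
	MODEL_HANDLED,		/* first report, handled normally */
	MODEL_ALREADY_POISONED,	/* repeated report of the same page */
	MODEL_FAILED,		/* any other error code */
};

/* Hypothetical caller-side classification of the return value. */
static enum model_outcome model_classify(int ret)
{
	if (ret == 0)
		return MODEL_HANDLED;
	if (ret == -EHWPOISON)
		return MODEL_ALREADY_POISONED;
	return MODEL_FAILED;
}
```

Before the patch, the second call for the same page would also have
returned 0, so a caller could not separate the first two outcomes.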