From patchwork Tue Apr 27 06:29:52 2021
X-Patchwork-Submitter: Naoya Horiguchi
X-Patchwork-Id: 12225411
From: Naoya Horiguchi
To: linux-mm@kvack.org, Tony Luck, Aili Yao
Cc: Andrew Morton, Oscar Salvador, David Hildenbrand, Borislav Petkov,
 Andy Lutomirski, Naoya Horiguchi, Jue Wang, linux-kernel@vger.kernel.org
Subject: [PATCH v4 1/2] mm/memory-failure: Use a mutex to avoid memory_failure() races
Date: Tue, 27 Apr 2021 15:29:52 +0900
Message-Id: <20210427062953.2080293-2-nao.horiguchi@gmail.com>
In-Reply-To: <20210427062953.2080293-1-nao.horiguchi@gmail.com>
References: <20210427062953.2080293-1-nao.horiguchi@gmail.com>

From: Tony Luck

There can be races when multiple CPUs consume poison from the same
page. The first into memory_failure() atomically sets the HWPoison
page flag and begins hunting for tasks that map this page. Eventually
it invalidates those mappings and may send a SIGBUS to the affected
tasks.

But while all that work is going on, other CPUs see a "success"
return code from memory_failure() and so they believe the error
has been handled and continue executing.

Fix by wrapping most of the internal parts of memory_failure() in
a mutex.
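
The serialization described above can be illustrated outside the kernel.
What follows is only a minimal user-space sketch, not the kernel code: it
assumes POSIX threads in place of the kernel mutex API, and demo_page,
demo_mutex and demo_memory_failure() are hypothetical names made up for
the illustration. It shows the shape of the fix: one lock taken near the
top, and every exit path funnelled through a single unlock label, so a
second caller cannot observe "success" before the first caller is done.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page; this is not a kernel structure. */
struct demo_page {
	bool poisoned;	/* models the HWPoison page flag */
	bool unmapped;	/* models "mappings invalidated, signals sent" */
};

static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Models the control flow of the patched function. Both the "handled
 * now" and the "already poisoned" cases return 0, as in this patch; the
 * difference is that the second caller only gets here after the first
 * one has dropped the mutex, i.e. after the recovery work has finished.
 */
int demo_memory_failure(struct demo_page *page)
{
	int res = 0;

	pthread_mutex_lock(&demo_mutex);

	if (page->poisoned)
		goto unlock_mutex;	/* an earlier caller already handled it */

	page->poisoned = true;
	/* The long-running recovery work would happen here, under the lock. */
	page->unmapped = true;

unlock_mutex:
	pthread_mutex_unlock(&demo_mutex);
	return res;
}
```

A caller that races with the first one blocks in pthread_mutex_lock()
instead of returning early, which is the property the patch relies on.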

Signed-off-by: Tony Luck
Signed-off-by: Naoya Horiguchi
Reviewed-by: Borislav Petkov
---
 mm/memory-failure.c | 37 ++++++++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 13 deletions(-)

diff --git v5.12/mm/memory-failure.c v5.12_patched/mm/memory-failure.c
index 24210c9bd843..4087308e4b32 100644
--- v5.12/mm/memory-failure.c
+++ v5.12_patched/mm/memory-failure.c
@@ -1381,6 +1381,8 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
 	return rc;
 }
 
+static DEFINE_MUTEX(mf_mutex);
+
 /**
  * memory_failure - Handle memory failure of a page.
  * @pfn: Page Number of the corrupted page
@@ -1404,7 +1406,7 @@ int memory_failure(unsigned long pfn, int flags)
 	struct page *hpage;
 	struct page *orig_head;
 	struct dev_pagemap *pgmap;
-	int res;
+	int res = 0;
 	unsigned long page_flags;
 	bool retry = true;
 
@@ -1424,13 +1426,18 @@ int memory_failure(unsigned long pfn, int flags)
 		return -ENXIO;
 	}
 
+	mutex_lock(&mf_mutex);
+
 try_again:
-	if (PageHuge(p))
-		return memory_failure_hugetlb(pfn, flags);
+	if (PageHuge(p)) {
+		res = memory_failure_hugetlb(pfn, flags);
+		goto unlock_mutex;
+	}
+
 	if (TestSetPageHWPoison(p)) {
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
 			pfn);
-		return 0;
+		goto unlock_mutex;
 	}
 
 	orig_head = hpage = compound_head(p);
@@ -1463,17 +1470,19 @@ int memory_failure(unsigned long pfn, int flags)
 				res = MF_FAILED;
 			}
 			action_result(pfn, MF_MSG_BUDDY, res);
-			return res == MF_RECOVERED ? 0 : -EBUSY;
+			res = res == MF_RECOVERED ? 0 : -EBUSY;
 		} else {
 			action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
-			return -EBUSY;
+			res = -EBUSY;
 		}
+		goto unlock_mutex;
 	}
 
 	if (PageTransHuge(hpage)) {
 		if (try_to_split_thp_page(p, "Memory Failure") < 0) {
 			action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
-			return -EBUSY;
+			res = -EBUSY;
+			goto unlock_mutex;
 		}
 		VM_BUG_ON_PAGE(!page_count(p), p);
 	}
@@ -1497,7 +1506,7 @@ int memory_failure(unsigned long pfn, int flags)
 	if (PageCompound(p) && compound_head(p) != orig_head) {
 		action_result(pfn, MF_MSG_DIFFERENT_COMPOUND, MF_IGNORED);
 		res = -EBUSY;
-		goto out;
+		goto unlock_page;
 	}
 
 	/*
@@ -1517,14 +1526,14 @@ int memory_failure(unsigned long pfn, int flags)
 		num_poisoned_pages_dec();
 		unlock_page(p);
 		put_page(p);
-		return 0;
+		goto unlock_mutex;
 	}
 	if (hwpoison_filter(p)) {
 		if (TestClearPageHWPoison(p))
 			num_poisoned_pages_dec();
 		unlock_page(p);
 		put_page(p);
-		return 0;
+		goto unlock_mutex;
 	}
 
 	if (!PageTransTail(p) && !PageLRU(p))
@@ -1543,7 +1552,7 @@ int memory_failure(unsigned long pfn, int flags)
 	if (!hwpoison_user_mappings(p, pfn, flags, &p)) {
 		action_result(pfn, MF_MSG_UNMAP_FAILED, MF_IGNORED);
 		res = -EBUSY;
-		goto out;
+		goto unlock_page;
 	}
 
 	/*
@@ -1552,13 +1561,15 @@ int memory_failure(unsigned long pfn, int flags)
 	if (PageLRU(p) && !PageSwapCache(p) && p->mapping == NULL) {
 		action_result(pfn, MF_MSG_TRUNCATED_LRU, MF_IGNORED);
 		res = -EBUSY;
-		goto out;
+		goto unlock_page;
 	}
 
 identify_page_state:
 	res = identify_page_state(pfn, p, page_flags);
-out:
+unlock_page:
 	unlock_page(p);
+unlock_mutex:
+	mutex_unlock(&mf_mutex);
 	return res;
 }
 EXPORT_SYMBOL_GPL(memory_failure);

From patchwork Tue Apr 27 06:29:53 2021
X-Patchwork-Submitter: Naoya Horiguchi
X-Patchwork-Id: 12225413
From: Naoya Horiguchi
To: linux-mm@kvack.org, Tony Luck, Aili Yao
Cc: Andrew Morton, Oscar Salvador, David Hildenbrand, Borislav Petkov,
 Andy Lutomirski, Naoya Horiguchi, Jue Wang, linux-kernel@vger.kernel.org
Subject: [PATCH v4 2/2] mm,hwpoison: send SIGBUS when the page has already been poisoned
Date: Tue, 27 Apr 2021 15:29:53 +0900
Message-Id: <20210427062953.2080293-3-nao.horiguchi@gmail.com>
In-Reply-To: <20210427062953.2080293-1-nao.horiguchi@gmail.com>
References: <20210427062953.2080293-1-nao.horiguchi@gmail.com>

From: Naoya Horiguchi

When memory_failure() is called with MF_ACTION_REQUIRED on a page that
has already been hwpoisoned, memory_failure() could fail to send SIGBUS
to the affected process, which results in an infinite loop of MCEs.

Currently memory_failure() returns 0 if it's called for an already
hwpoisoned page, so the caller, kill_me_maybe(), could return without
sending SIGBUS to the current process. An action required MCE is raised
when the current process accesses the broken memory, so no SIGBUS means
that the current process continues to run and soon accesses the error
page again, running into an MCE loop.

This issue can arise, for example, in the following scenarios:

- Two or more threads access the poisoned page concurrently. If local
  MCE is enabled, the MCE handler handles each MCE event independently.
  So there's a race among MCE events, and the second and later threads
  fall into the situation in question.

- If there was a preceding memory error event and memory_failure() for
  the event failed to unmap the error page for some reason, the
  subsequent memory access to the error page triggers the MCE loop
  situation.

To fix the issue, make memory_failure() return an error code (typically
-EHWPOISON) rather than 0 when the error page has already been
hwpoisoned. This allows the memory error handler to control how it sends
signals to userspace. Also make sure that any process touching a
hwpoisoned page gets a SIGBUS (if possible) with the error virtual
address, even in the "already hwpoisoned" path of memory_failure(), as is
done in the page fault path.

kill_accessing_process() does a pagetable walk to find the error virtual
address. If multiple virtual addresses are found in the pagetable walk,
there is no way to know which address is the correct one, so we fall back
to sending SIGBUS in kill_me_maybe() without error address info as we do
now. This corner case is left to be solved in the future.

Signed-off-by: Naoya Horiguchi
Signed-off-by: Aili Yao
---
change log v3 -> v4:
- refactored hwpoison_pte_range to save indentation,
- updated patch description

change log v1 -> v2:
- initialize local variables in check_hwpoisoned_entry() and
  hwpoison_pte_range()
- fix and improve logic to calculate error address offset.
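
For readers who want to check the error address offset arithmetic
mentioned in the v1 -> v2 note, here is a small user-space sketch of the
calculation used for a PMD-mapped huge page. It is an illustration under
stated assumptions, not the kernel code: DEMO_PAGE_SHIFT and
DEMO_HPAGE_PMD_NR are assumed x86-64 values (4 KiB base pages, 2 MiB huge
pages), and demo_error_vaddr() is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86-64 values; the kernel takes these from arch headers. */
#define DEMO_PAGE_SHIFT		12
#define DEMO_HPAGE_PMD_NR	512	/* 2 MiB / 4 KiB */

/*
 * Given the virtual address at which a huge page is mapped and the pfn
 * of its first subpage, compute the virtual address of the poisoned
 * subpage. Returns false when the poisoned pfn is outside this huge
 * page, in which case *out is left untouched.
 */
bool demo_error_vaddr(uint64_t map_vaddr, uint64_t head_pfn,
		      uint64_t poisoned_pfn, uint64_t *out)
{
	if (poisoned_pfn < head_pfn ||
	    poisoned_pfn >= head_pfn + DEMO_HPAGE_PMD_NR)
		return false;

	/* Offset in pages from the head, scaled to bytes. */
	*out = map_vaddr + ((poisoned_pfn - head_pfn) << DEMO_PAGE_SHIFT);
	return true;
}
```

For example, with the huge page mapped at 0x7f0000000000, a head pfn of
0x100000 and a poisoned pfn of 0x100003, the reported address is three
base pages past the start of the mapping.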
---
 arch/x86/kernel/cpu/mce/core.c |  13 ++-
 include/linux/swapops.h        |   5 ++
 mm/memory-failure.c            | 143 ++++++++++++++++++++++++++++++++-
 3 files changed, 158 insertions(+), 3 deletions(-)

diff --git v5.12/arch/x86/kernel/cpu/mce/core.c v5.12_patched/arch/x86/kernel/cpu/mce/core.c
index 7962355436da..3ce23445a48c 100644
--- v5.12/arch/x86/kernel/cpu/mce/core.c
+++ v5.12_patched/arch/x86/kernel/cpu/mce/core.c
@@ -1257,19 +1257,28 @@ static void kill_me_maybe(struct callback_head *cb)
 {
 	struct task_struct *p = container_of(cb, struct task_struct, mce_kill_me);
 	int flags = MF_ACTION_REQUIRED;
+	int ret;
 
 	pr_err("Uncorrected hardware memory error in user-access at %llx", p->mce_addr);
 
 	if (!p->mce_ripv)
 		flags |= MF_MUST_KILL;
 
-	if (!memory_failure(p->mce_addr >> PAGE_SHIFT, flags) &&
-	    !(p->mce_kflags & MCE_IN_KERNEL_COPYIN)) {
+	ret = memory_failure(p->mce_addr >> PAGE_SHIFT, flags);
+	if (!ret && !(p->mce_kflags & MCE_IN_KERNEL_COPYIN)) {
 		set_mce_nospec(p->mce_addr >> PAGE_SHIFT, p->mce_whole_page);
 		sync_core();
 		return;
 	}
 
+	/*
+	 * -EHWPOISON from memory_failure() means that it already sent SIGBUS
+	 * to the current process with the proper error info, so no need to
+	 * send it here again.
+	 */
+	if (ret == -EHWPOISON)
+		return;
+
 	if (p->mce_vaddr != (void __user *)-1l) {
 		force_sig_mceerr(BUS_MCEERR_AR, p->mce_vaddr, PAGE_SHIFT);
 	} else {
diff --git v5.12/include/linux/swapops.h v5.12_patched/include/linux/swapops.h
index d9b7c9132c2f..98ea67fcf360 100644
--- v5.12/include/linux/swapops.h
+++ v5.12_patched/include/linux/swapops.h
@@ -323,6 +323,11 @@ static inline int is_hwpoison_entry(swp_entry_t entry)
 	return swp_type(entry) == SWP_HWPOISON;
 }
 
+static inline unsigned long hwpoison_entry_to_pfn(swp_entry_t entry)
+{
+	return swp_offset(entry);
+}
+
 static inline void num_poisoned_pages_inc(void)
 {
 	atomic_long_inc(&num_poisoned_pages);
diff --git v5.12/mm/memory-failure.c v5.12_patched/mm/memory-failure.c
index 4087308e4b32..a3659619d293 100644
--- v5.12/mm/memory-failure.c
+++ v5.12_patched/mm/memory-failure.c
@@ -56,6 +56,7 @@
 #include
 #include
 #include
+#include
 #include "internal.h"
 #include "ras/ras_event.h"
 
@@ -554,6 +555,140 @@ static void collect_procs(struct page *page, struct list_head *tokill,
 		collect_procs_file(page, tokill, force_early);
 }
 
+struct hwp_walk {
+	struct to_kill tk;
+	unsigned long pfn;
+	int flags;
+};
+
+static int set_to_kill(struct to_kill *tk, unsigned long addr, short shift)
+{
+	/* Abort pagewalk when finding multiple mappings to the error page. */
+	if (tk->addr)
+		return 1;
+	tk->addr = addr;
+	tk->size_shift = shift;
+	return 0;
+}
+
+static int check_hwpoisoned_entry(pte_t pte, unsigned long addr, short shift,
+				unsigned long poisoned_pfn, struct to_kill *tk)
+{
+	unsigned long pfn = 0;
+
+	if (pte_present(pte)) {
+		pfn = pte_pfn(pte);
+	} else {
+		swp_entry_t swp = pte_to_swp_entry(pte);
+
+		if (is_hwpoison_entry(swp))
+			pfn = hwpoison_entry_to_pfn(swp);
+	}
+
+	if (!pfn || pfn != poisoned_pfn)
+		return 0;
+
+	return set_to_kill(tk, addr, shift);
+}
+
+static int hwpoison_pte_range(pmd_t *pmdp, unsigned long addr,
+			      unsigned long end, struct mm_walk *walk)
+{
+	struct hwp_walk *hwp = (struct hwp_walk *)walk->private;
+	int ret = 0;
+	pte_t *ptep;
+	pmd_t pmd;
+	spinlock_t *ptl;
+	unsigned long pfn;
+	unsigned long hwpoison_vaddr;
+
+	ptl = pmd_trans_huge_lock(pmdp, walk->vma);
+	if (!ptl)
+		goto pte_loop;
+	pmd = *pmdp;
+	if (!pmd_present(pmd))
+		goto unlock;
+	pfn = pmd_pfn(pmd);
+	if (pfn <= hwp->pfn && hwp->pfn < pfn + HPAGE_PMD_NR) {
+		hwpoison_vaddr = addr + ((hwp->pfn - pfn) << PAGE_SHIFT);
+		ret = set_to_kill(&hwp->tk, hwpoison_vaddr, PAGE_SHIFT);
+	}
+unlock:
+	spin_unlock(ptl);
+	goto out;
+pte_loop:
+	if (pmd_trans_unstable(pmdp))
+		goto out;
+
+	ptep = pte_offset_map_lock(walk->vma->vm_mm, pmdp, addr, &ptl);
+	for (; addr != end; ptep++, addr += PAGE_SIZE) {
+		ret = check_hwpoisoned_entry(*ptep, addr, PAGE_SHIFT,
+					     hwp->pfn, &hwp->tk);
+		if (ret == 1)
+			break;
+	}
+	pte_unmap_unlock(ptep - 1, ptl);
+out:
+	cond_resched();
+	return ret;
+}
+
+#ifdef CONFIG_HUGETLB_PAGE
+static int hwpoison_hugetlb_range(pte_t *ptep, unsigned long hmask,
+			    unsigned long addr, unsigned long end,
+			    struct mm_walk *walk)
+{
+	struct hwp_walk *hwp = (struct hwp_walk *)walk->private;
+	pte_t pte = huge_ptep_get(ptep);
+	struct hstate *h = hstate_vma(walk->vma);
+
+	return check_hwpoisoned_entry(pte, addr, huge_page_shift(h),
+				      hwp->pfn, &hwp->tk);
+}
+#else
+#define hwpoison_hugetlb_range	NULL
+#endif
+
+static struct mm_walk_ops hwp_walk_ops = {
+	.pmd_entry = hwpoison_pte_range,
+	.hugetlb_entry = hwpoison_hugetlb_range,
+};
+
+/*
+ * Sends SIGBUS to the current process with the error info.
+ *
+ * This function is intended to handle "Action Required" MCEs on already
+ * hardware poisoned pages. They could happen, for example, when
+ * memory_failure() failed to unmap the error page at the first call, or
+ * when multiple local machine checks happened on different CPUs.
+ *
+ * MCE handler currently has no easy access to the error virtual address,
+ * so this function walks page table to find it. One challenge on this is
+ * to reliably get the proper virtual address of the error to report to
+ * applications via SIGBUS. A process could map a page multiple times to
+ * different virtual addresses, then we now have no way to tell which virtual
+ * address was accessed when the Action Required MCE was generated.
+ * So in such a corner case, we now give up and fall back to sending SIGBUS
+ * with no error info.
+ */
+static int kill_accessing_process(struct task_struct *p, unsigned long pfn,
+				  int flags)
+{
+	int ret;
+	struct hwp_walk priv = {
+		.pfn = pfn,
+	};
+	priv.tk.tsk = p;
+
+	mmap_read_lock(p->mm);
+	ret = walk_page_range(p->mm, 0, TASK_SIZE_MAX, &hwp_walk_ops,
+			      (void *)&priv);
+	if (!ret && priv.tk.addr)
+		kill_proc(&priv.tk, pfn, flags);
+	mmap_read_unlock(p->mm);
+	return ret ? -EFAULT : -EHWPOISON;
+}
+
 static const char *action_name[] = {
 	[MF_IGNORED] = "Ignored",
 	[MF_FAILED] = "Failed",
@@ -1228,7 +1363,10 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 	if (TestSetPageHWPoison(head)) {
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
 		       pfn);
-		return 0;
+		res = -EHWPOISON;
+		if (flags & MF_ACTION_REQUIRED)
+			res = kill_accessing_process(current, page_to_pfn(head), flags);
+		return res;
 	}
 
 	num_poisoned_pages_inc();
@@ -1437,6 +1575,9 @@ int memory_failure(unsigned long pfn, int flags)
 	if (TestSetPageHWPoison(p)) {
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
 			pfn);
+		res = -EHWPOISON;
+		if (flags & MF_ACTION_REQUIRED)
+			res = kill_accessing_process(current, pfn, flags);
 		goto unlock_mutex;
 	}
 
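
The "abort on a second mapping" rule used by the pagetable walk in this
patch can be tried out in user space. The sketch below is an illustration
only and makes several assumptions: a flat array of (vaddr, pfn) pairs
stands in for the page tables, and demo_mapping, demo_result and
demo_find_error_vaddr() are hypothetical names that do not exist in the
kernel. As in set_to_kill(), a recorded address of 0 means "nothing found
yet", so a mapping at virtual address 0 would not be detected.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened view of one process's mappings. */
struct demo_mapping {
	uint64_t vaddr;
	uint64_t pfn;
};

enum demo_result {
	DEMO_DONE,	/* models the -EHWPOISON return: walk completed */
	DEMO_AMBIGUOUS	/* models the -EFAULT return: walk was aborted */
};

/*
 * Scan the mappings for poisoned_pfn. The first match is recorded in
 * *vaddr; a second match aborts the scan, because the caller could not
 * tell which of the two addresses was actually accessed. When nothing
 * matches, *vaddr stays 0 and the scan still reports DEMO_DONE.
 */
enum demo_result demo_find_error_vaddr(const struct demo_mapping *maps,
				       size_t n, uint64_t poisoned_pfn,
				       uint64_t *vaddr)
{
	*vaddr = 0;
	for (size_t i = 0; i < n; i++) {
		if (maps[i].pfn != poisoned_pfn)
			continue;
		if (*vaddr)
			return DEMO_AMBIGUOUS;
		*vaddr = maps[i].vaddr;
	}
	return DEMO_DONE;
}
```

A caller modelled on kill_accessing_process() would send the signal only
when the result is DEMO_DONE and the recorded address is non-zero, and
would otherwise leave the fallback to its own caller.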