From patchwork Mon Nov 15 08:40:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naoya Horiguchi X-Patchwork-Id: 12618917 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 842C6C433EF for ; Mon, 15 Nov 2021 08:40:33 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 207AC60F14 for ; Mon, 15 Nov 2021 08:40:33 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 207AC60F14 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.dev Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id C03786B0088; Mon, 15 Nov 2021 03:40:32 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id B8AB96B008A; Mon, 15 Nov 2021 03:40:32 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A53686B0092; Mon, 15 Nov 2021 03:40:32 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0192.hostedemail.com [216.40.44.192]) by kanga.kvack.org (Postfix) with ESMTP id 9788E6B0088 for ; Mon, 15 Nov 2021 03:40:32 -0500 (EST) Received: from smtpin19.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 49DD58274B for ; Mon, 15 Nov 2021 08:40:32 +0000 (UTC) X-FDA: 78810518094.19.3317EDC Received: from out0.migadu.com (out0.migadu.com [94.23.1.103]) by imf17.hostedemail.com (Postfix) with ESMTP id B5CADF000392 for ; Mon, 15 Nov 2021 08:40:31 +0000 (UTC) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1636965630; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=TUIDw/Vuw7+v40VKkRfutNtDggbjfhBkv4VGOk3L19Q=; b=Ci0ZJPGtq0HmK4mWWtSIDRe28Izku4VMV0ywmx5ejPQ0/+6J8WaPTvPV9Iq8Q6rRbIvkpS CW5rh68CNDoxS+T6KRr9B5rSju4KxqS3JXXbHX8lWvjgYiLP1kBqRPYNJ8902q3slrRmpp 7bCABl4gfEqpOtuF9hzzKDc4GWl8gQk= From: Naoya Horiguchi To: linux-mm@kvack.org Cc: Andrew Morton , David Hildenbrand , Oscar Salvador , Michal Hocko , Ding Hui , Tony Luck , "Aneesh Kumar K.V" , Miaohe Lin , Yang Shi , Peter Xu , Naoya Horiguchi , linux-kernel@vger.kernel.org Subject: [PATCH v4 1/3] mm/hwpoison: mf_mutex for soft offline and unpoison Date: Mon, 15 Nov 2021 17:40:04 +0900 Message-Id: <20211115084006.3728254-2-naoya.horiguchi@linux.dev> In-Reply-To: <20211115084006.3728254-1-naoya.horiguchi@linux.dev> References: <20211115084006.3728254-1-naoya.horiguchi@linux.dev> MIME-Version: 1.0 X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: B5CADF000392 X-Stat-Signature: b8sgt5qhjmwfk1fxozscpe6onkdj79bu Authentication-Results: imf17.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=Ci0ZJPGt; dmarc=pass (policy=none) header.from=linux.dev; spf=pass (imf17.hostedemail.com: domain of naoya.horiguchi@linux.dev designates 94.23.1.103 as permitted sender) smtp.mailfrom=naoya.horiguchi@linux.dev X-HE-Tag: 1636965631-100864 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Naoya Horiguchi Originally mf_mutex is introduced to serialize multiple MCE events, but it is not that useful to allow unpoison to run in parallel with memory_failure() and soft offline. So apply mf_mutex to soft offline and unpoison. The memory failure handler and soft offline handler get simpler with this. Signed-off-by: Naoya Horiguchi Reviewed-by: Yang Shi Reviewed-by: Mike Kravetz --- ChangeLog v4: - fix type in commit description. ChangeLog v3: - merge with "mm/hwpoison: remove race consideration" - update description ChangeLog v2: - add mutex_unlock() in "page already poisoned" path in soft_offline_page(). (Thanks to Ding Hui) --- mm/memory-failure.c | 62 +++++++++++++-------------------------------- 1 file changed, 18 insertions(+), 44 deletions(-) diff --git a/mm/memory-failure.c b/mm/memory-failure.c index e8c38e27b753..d29c79de6034 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1507,14 +1507,6 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags) lock_page(head); page_flags = head->flags; - if (!PageHWPoison(head)) { - pr_err("Memory failure: %#lx: just unpoisoned\n", pfn); - num_poisoned_pages_dec(); - unlock_page(head); - put_page(head); - return 0; - } - /* * TODO: hwpoison for pud-sized hugetlb doesn't work right now, so * simply disable it. In order to make it work properly, we need @@ -1628,6 +1620,8 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags, return rc; } +static DEFINE_MUTEX(mf_mutex); + /** * memory_failure - Handle memory failure of a page. * @pfn: Page Number of the corrupted page @@ -1654,7 +1648,6 @@ int memory_failure(unsigned long pfn, int flags) int res = 0; unsigned long page_flags; bool retry = true; - static DEFINE_MUTEX(mf_mutex); if (!sysctl_memory_failure_recovery) panic("Memory failure on page %lx", pfn); @@ -1788,16 +1781,6 @@ int memory_failure(unsigned long pfn, int flags) */ page_flags = p->flags; - /* - * unpoison always clear PG_hwpoison inside page lock - */ - if (!PageHWPoison(p)) { - pr_err("Memory failure: %#lx: just unpoisoned\n", pfn); - num_poisoned_pages_dec(); - unlock_page(p); - put_page(p); - goto unlock_mutex; - } if (hwpoison_filter(p)) { if (TestClearPageHWPoison(p)) num_poisoned_pages_dec(); @@ -1978,6 +1961,7 @@ int unpoison_memory(unsigned long pfn) struct page *page; struct page *p; int freeit = 0; + int ret = 0; unsigned long flags = 0; static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST); @@ -1988,39 +1972,30 @@ int unpoison_memory(unsigned long pfn) p = pfn_to_page(pfn); page = compound_head(p); + mutex_lock(&mf_mutex); + if (!PageHWPoison(p)) { unpoison_pr_info("Unpoison: Page was already unpoisoned %#lx\n", pfn, &unpoison_rs); - return 0; + goto unlock_mutex; } if (page_count(page) > 1) { unpoison_pr_info("Unpoison: Someone grabs the hwpoison page %#lx\n", pfn, &unpoison_rs); - return 0; + goto unlock_mutex; } if (page_mapped(page)) { unpoison_pr_info("Unpoison: Someone maps the hwpoison page %#lx\n", pfn, &unpoison_rs); - return 0; + goto unlock_mutex; } if (page_mapping(page)) { unpoison_pr_info("Unpoison: the hwpoison page has non-NULL mapping %#lx\n", pfn, &unpoison_rs); - return 0; - } - - /* - * unpoison_memory() can encounter thp only when the thp is being - * worked by memory_failure() and the page lock is not held yet. - * In such case, we yield to memory_failure() and make unpoison fail. - */ - if (!PageHuge(page) && PageTransHuge(page)) { - unpoison_pr_info("Unpoison: Memory failure is now running on %#lx\n", - pfn, &unpoison_rs); - return 0; + goto unlock_mutex; } if (!get_hwpoison_page(p, flags)) { @@ -2028,29 +2003,23 @@ int unpoison_memory(unsigned long pfn) num_poisoned_pages_dec(); unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n", pfn, &unpoison_rs); - return 0; + goto unlock_mutex; } - lock_page(page); - /* - * This test is racy because PG_hwpoison is set outside of page lock. - * That's acceptable because that won't trigger kernel panic. Instead, - * the PG_hwpoison page will be caught and isolated on the entrance to - * the free buddy page pool. - */ if (TestClearPageHWPoison(page)) { unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n", pfn, &unpoison_rs); num_poisoned_pages_dec(); freeit = 1; } - unlock_page(page); put_page(page); if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1)) put_page(page); - return 0; +unlock_mutex: + mutex_unlock(&mf_mutex); + return ret; } EXPORT_SYMBOL(unpoison_memory); @@ -2231,9 +2200,12 @@ int soft_offline_page(unsigned long pfn, int flags) return -EIO; } + mutex_lock(&mf_mutex); + if (PageHWPoison(page)) { pr_info("%s: %#lx page already poisoned\n", __func__, pfn); put_ref_page(ref_page); + mutex_unlock(&mf_mutex); return 0; } @@ -2251,5 +2223,7 @@ int soft_offline_page(unsigned long pfn, int flags) } } + mutex_unlock(&mf_mutex); + return ret; } From patchwork Mon Nov 15 08:40:05 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naoya Horiguchi X-Patchwork-Id: 12618919 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0E605C433F5 for ; Mon, 15 Nov 2021 08:40:38 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id A015261059 for ; Mon, 15 Nov 2021 08:40:37 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org A015261059 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.dev Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 3C3B36B008A; Mon, 15 Nov 2021 03:40:37 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 3747F6B0092; Mon, 15 Nov 2021 03:40:37 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 23BA06B0093; Mon, 15 Nov 2021 03:40:37 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0080.hostedemail.com [216.40.44.80]) by kanga.kvack.org (Postfix) with ESMTP id 16CC66B008A for ; Mon, 15 Nov 2021 03:40:37 -0500 (EST) Received: from smtpin12.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id CBEE682744 for ; Mon, 15 Nov 2021 08:40:36 +0000 (UTC) X-FDA: 78810518472.12.61A0192 Received: from out0.migadu.com (out0.migadu.com [94.23.1.103]) by imf21.hostedemail.com (Postfix) with ESMTP id DA6C7D036A53 for ; Mon, 15 Nov 2021 08:40:29 +0000 (UTC) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1636965635; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=QmlSkdjeXrMBerY937XOVquwZ2X/0l2XaAAw15nVfp0=; b=rMSptcmav0ZCpE0QsepJKnYxQEdQyVcxM8SlTnWkyo4xB4mkCRDOemAdr81NQU+v2DAMbq DXuGvGVSV/ANnQUqSXuUukCPp2Brx/e40cau6ll4b0eGqsAfgYmL3jJSXiTn6++4eZyvzM DWpTeNhUasXKl86lTs+gcUtU0h5Bf5U= From: Naoya Horiguchi To: linux-mm@kvack.org Cc: Andrew Morton , David Hildenbrand , Oscar Salvador , Michal Hocko , Ding Hui , Tony Luck , "Aneesh Kumar K.V" , Miaohe Lin , Yang Shi , Peter Xu , Naoya Horiguchi , linux-kernel@vger.kernel.org Subject: [PATCH v4 2/3] mm/hwpoison: remove MF_MSG_BUDDY_2ND and MF_MSG_POISONED_HUGE Date: Mon, 15 Nov 2021 17:40:05 +0900 Message-Id: <20211115084006.3728254-3-naoya.horiguchi@linux.dev> In-Reply-To: <20211115084006.3728254-1-naoya.horiguchi@linux.dev> References: <20211115084006.3728254-1-naoya.horiguchi@linux.dev> MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT X-Migadu-Auth-User: naoya.horiguchi@linux.dev Authentication-Results: imf21.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=rMSptcma; spf=pass (imf21.hostedemail.com: domain of naoya.horiguchi@linux.dev designates 94.23.1.103 as permitted sender) smtp.mailfrom=naoya.horiguchi@linux.dev; dmarc=pass (policy=none) header.from=linux.dev X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: DA6C7D036A53 X-Stat-Signature: td5rano1ssumphh9h9ej3sesowfybefz X-HE-Tag: 1636965629-319390 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Naoya Horiguchi These action_page_types are no longer used, so remove them. Signed-off-by: Naoya Horiguchi Acked-by: Yang Shi Reviewed-by: Mike Kravetz --- include/linux/mm.h | 2 -- include/ras/ras_event.h | 2 -- mm/memory-failure.c | 2 -- 3 files changed, 6 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index a7e4a9e7d807..7941bca871dc 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3247,7 +3247,6 @@ enum mf_action_page_type { MF_MSG_KERNEL_HIGH_ORDER, MF_MSG_SLAB, MF_MSG_DIFFERENT_COMPOUND, - MF_MSG_POISONED_HUGE, MF_MSG_HUGE, MF_MSG_FREE_HUGE, MF_MSG_NON_PMD_HUGE, @@ -3262,7 +3261,6 @@ enum mf_action_page_type { MF_MSG_CLEAN_LRU, MF_MSG_TRUNCATED_LRU, MF_MSG_BUDDY, - MF_MSG_BUDDY_2ND, MF_MSG_DAX, MF_MSG_UNSPLIT_THP, MF_MSG_UNKNOWN, diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h index 0bdbc0d17d2f..d0337a41141c 100644 --- a/include/ras/ras_event.h +++ b/include/ras/ras_event.h @@ -358,7 +358,6 @@ TRACE_EVENT(aer_event, EM ( MF_MSG_KERNEL_HIGH_ORDER, "high-order kernel page" ) \ EM ( MF_MSG_SLAB, "kernel slab page" ) \ EM ( MF_MSG_DIFFERENT_COMPOUND, "different compound page after locking" ) \ - EM ( MF_MSG_POISONED_HUGE, "huge page already hardware poisoned" ) \ EM ( MF_MSG_HUGE, "huge page" ) \ EM ( MF_MSG_FREE_HUGE, "free huge page" ) \ EM ( MF_MSG_NON_PMD_HUGE, "non-pmd-sized huge page" ) \ @@ -373,7 +372,6 @@ TRACE_EVENT(aer_event, EM ( MF_MSG_CLEAN_LRU, "clean LRU page" ) \ EM ( MF_MSG_TRUNCATED_LRU, "already truncated LRU page" ) \ EM ( MF_MSG_BUDDY, "free buddy page" ) \ - EM ( MF_MSG_BUDDY_2ND, "free buddy page (2nd try)" ) \ EM ( MF_MSG_DAX, "dax page" ) \ EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" ) \ EMe ( MF_MSG_UNKNOWN, "unknown page" ) diff --git a/mm/memory-failure.c b/mm/memory-failure.c index d29c79de6034..722036539b44 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -723,7 +723,6 @@ static const char * const action_page_types[] = { [MF_MSG_KERNEL_HIGH_ORDER] = "high-order kernel page", [MF_MSG_SLAB] = "kernel slab page", [MF_MSG_DIFFERENT_COMPOUND] = "different compound page after locking", - [MF_MSG_POISONED_HUGE] = "huge page already hardware poisoned", [MF_MSG_HUGE] = "huge page", [MF_MSG_FREE_HUGE] = "free huge page", [MF_MSG_NON_PMD_HUGE] = "non-pmd-sized huge page", @@ -738,7 +737,6 @@ static const char * const action_page_types[] = { [MF_MSG_CLEAN_LRU] = "clean LRU page", [MF_MSG_TRUNCATED_LRU] = "already truncated LRU page", [MF_MSG_BUDDY] = "free buddy page", - [MF_MSG_BUDDY_2ND] = "free buddy page (2nd try)", [MF_MSG_DAX] = "dax page", [MF_MSG_UNSPLIT_THP] = "unsplit thp", [MF_MSG_UNKNOWN] = "unknown page", From patchwork Mon Nov 15 08:40:06 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Naoya Horiguchi X-Patchwork-Id: 12618921 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 10A74C433EF for ; Mon, 15 Nov 2021 08:40:43 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 9975963217 for ; Mon, 15 Nov 2021 08:40:42 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 9975963217 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.dev Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 3EAC06B0092; Mon, 15 Nov 2021 03:40:42 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 39A526B0093; Mon, 15 Nov 2021 03:40:42 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 262356B0095; Mon, 15 Nov 2021 03:40:42 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0051.hostedemail.com [216.40.44.51]) by kanga.kvack.org (Postfix) with ESMTP id 195ED6B0092 for ; Mon, 15 Nov 2021 03:40:42 -0500 (EST) Received: from smtpin05.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id D89FA82744 for ; Mon, 15 Nov 2021 08:40:41 +0000 (UTC) X-FDA: 78810518892.05.F3F17B3 Received: from out0.migadu.com (out0.migadu.com [94.23.1.103]) by imf29.hostedemail.com (Postfix) with ESMTP id 1E8B2900013E for ; Mon, 15 Nov 2021 08:40:40 +0000 (UTC) X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1636965640; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=6/577quaf3YVXuMd+tpfF6/gmYsdw3JieCgnM0veDPs=; b=fuOSDGR8j+RmS8NAcmM6CouxBZ73LHBoPFhOQC5Z1ar+LxrsUWNycq+c5C5AoZ1fqmLyH4 FvvnKFx/GNC/7WzfvnSAarlyzruuhCpQ8niaxgUadCVBubP6Qs1fDDeYKxjZOTZWj+Vwu/ +CGeKdYvLxm0cms6eQajjLZLtcWE5tw= From: Naoya Horiguchi To: linux-mm@kvack.org Cc: Andrew Morton , David Hildenbrand , Oscar Salvador , Michal Hocko , Ding Hui , Tony Luck , "Aneesh Kumar K.V" , Miaohe Lin , Yang Shi , Peter Xu , Naoya Horiguchi , linux-kernel@vger.kernel.org Subject: [PATCH v4 3/3] mm/hwpoison: fix unpoison_memory() Date: Mon, 15 Nov 2021 17:40:06 +0900 Message-Id: <20211115084006.3728254-4-naoya.horiguchi@linux.dev> In-Reply-To: <20211115084006.3728254-1-naoya.horiguchi@linux.dev> References: <20211115084006.3728254-1-naoya.horiguchi@linux.dev> MIME-Version: 1.0 X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 1E8B2900013E X-Stat-Signature: 9qpiufj3gry9e3e8ysgpr5gcwhc8waqy Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=fuOSDGR8; dmarc=pass (policy=none) header.from=linux.dev; spf=pass (imf29.hostedemail.com: domain of naoya.horiguchi@linux.dev designates 94.23.1.103 as permitted sender) smtp.mailfrom=naoya.horiguchi@linux.dev X-HE-Tag: 1636965640-832160 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Naoya Horiguchi After recent soft-offline rework, error pages can be taken off from buddy allocator, but the existing unpoison_memory() does not properly undo the operation. Moreover, due to the recent change on __get_hwpoison_page(), get_page_unless_zero() is hardly called for hwpoisoned pages. So __get_hwpoison_page() highly likely returns -EBUSY (meaning to fail to grab page refcount) and unpoison just clears PG_hwpoison without releasing a refcount. That does not lead to a critical issue like kernel panic, but unpoisoned pages never get back to buddy (leaked permanently), which is not good. To (partially) fix this, we need to identify "taken off" pages from other types of hwpoisoned pages. We can't use refcount or page flags for this purpose, so a pseudo flag is defined by hacking ->private field. Someone might think that put_page() is enough to cancel taken-off pages, but the normal free path contains some operations not suitable for the current purpose, and can fire VM_BUG_ON(). Note that unpoison_memory() is now supposed to be cancel hwpoison events injected only by madvise() or /sys/devices/system/memory/{hard,soft}_offline_page, not by MCE injection, so please don't try to use unpoison when testing with MCE injection. [lkp@intel.com: report build failure for ARCH=i386] Signed-off-by: Naoya Horiguchi Reviewed-by: Yang Shi --- ChangeLog v4: - use integer value for MAGIC_HWPOISON to avoid compile error for ARCH=i386 - close race in unpoison_taken_off_page() ChangeLog v3: - fix description - add PageTable check in unpoison_memory() - fix return value of clear_page_hwpoison() - pass page instead head to PageHWPoisonTakenOff check. - rename take_page_back_buddy() with put_page_back_buddy() ChangeLog v2: - unpoison_memory() returns as commented - explicitly avoids unpoisoning slab pages - separates internal pinning function into __get_unpoison_page() --- include/linux/mm.h | 1 + include/linux/page-flags.h | 4 ++ mm/memory-failure.c | 109 ++++++++++++++++++++++++++++++------- mm/page_alloc.c | 27 +++++++++ 4 files changed, 122 insertions(+), 19 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 7941bca871dc..8bbb7205ef9f 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3220,6 +3220,7 @@ enum mf_flags { MF_ACTION_REQUIRED = 1 << 1, MF_MUST_KILL = 1 << 2, MF_SOFT_OFFLINE = 1 << 3, + MF_UNPOISON = 1 << 4, }; extern int memory_failure(unsigned long pfn, int flags); extern void memory_failure_queue(unsigned long pfn, int flags); diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 52ec4b5e5615..a4fe056910bb 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -522,7 +522,11 @@ PAGEFLAG_FALSE(Uncached, uncached) PAGEFLAG(HWPoison, hwpoison, PF_ANY) TESTSCFLAG(HWPoison, hwpoison, PF_ANY) #define __PG_HWPOISON (1UL << PG_hwpoison) +#define MAGIC_HWPOISON 0x48575053U /* HWPS */ +extern void SetPageHWPoisonTakenOff(struct page *page); +extern void ClearPageHWPoisonTakenOff(struct page *page); extern bool take_page_off_buddy(struct page *page); +extern bool put_page_back_buddy(struct page *page); #else PAGEFLAG_FALSE(HWPoison, hwpoison) #define __PG_HWPOISON 0 diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 722036539b44..0f8b798cba69 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1160,6 +1160,22 @@ static int page_action(struct page_state *ps, struct page *p, return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY; } +static inline bool PageHWPoisonTakenOff(struct page *page) +{ + return PageHWPoison(page) && page_private(page) == MAGIC_HWPOISON; +} + +void SetPageHWPoisonTakenOff(struct page *page) +{ + set_page_private(page, MAGIC_HWPOISON); +} + +void ClearPageHWPoisonTakenOff(struct page *page) +{ + if (PageHWPoison(page)) + set_page_private(page, 0); +} + /* * Return true if a page type of a given page is supported by hwpoison * mechanism (while handling could fail), otherwise false. This function @@ -1262,6 +1278,27 @@ static int get_any_page(struct page *p, unsigned long flags) return ret; } +static int __get_unpoison_page(struct page *page) +{ + struct page *head = compound_head(page); + int ret = 0; + bool hugetlb = false; + + ret = get_hwpoison_huge_page(head, &hugetlb); + if (hugetlb) + return ret; + + /* + * PageHWPoisonTakenOff pages are not only marked as PG_hwpoison, + * but also isolated from buddy freelist, so need to identify the + * state and have to cancel both operations to unpoison. + */ + if (PageHWPoisonTakenOff(page)) + return -EHWPOISON; + + return get_page_unless_zero(page) ? 1 : 0; +} + /** * get_hwpoison_page() - Get refcount for memory error handling * @p: Raw error page (hit by memory error) @@ -1278,18 +1315,26 @@ static int get_any_page(struct page *p, unsigned long flags) * extra care for the error page's state (as done in __get_hwpoison_page()), * and has some retry logic in get_any_page(). * + * When called from unpoison_memory(), the caller should already ensure that + * the given page has PG_hwpoison. So it's never reused for other page + * allocations, and __get_unpoison_page() never races with them. + * * Return: 0 on failure, * 1 on success for in-use pages in a well-defined state, * -EIO for pages on which we can not handle memory errors, * -EBUSY when get_hwpoison_page() has raced with page lifecycle - * operations like allocation and free. + * operations like allocation and free, + * -EHWPOISON when the page is hwpoisoned and taken off from buddy. */ static int get_hwpoison_page(struct page *p, unsigned long flags) { int ret; zone_pcp_disable(page_zone(p)); - ret = get_any_page(p, flags); + if (flags & MF_UNPOISON) + ret = __get_unpoison_page(p); + else + ret = get_any_page(p, flags); zone_pcp_enable(page_zone(p)); return ret; @@ -1942,6 +1987,28 @@ core_initcall(memory_failure_init); pr_info(fmt, pfn); \ }) +static inline int clear_page_hwpoison(struct ratelimit_state *rs, struct page *p) +{ + if (TestClearPageHWPoison(p)) { + unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n", + page_to_pfn(p), rs); + num_poisoned_pages_dec(); + return 1; + } + return 0; +} + +static inline int unpoison_taken_off_page(struct ratelimit_state *rs, + struct page *p) +{ + if (put_page_back_buddy(p)) { + unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n", + page_to_pfn(p), rs); + return 0; + } + return -EBUSY; +} + /** * unpoison_memory - Unpoison a previously poisoned page * @pfn: Page number of the to be unpoisoned page @@ -1958,9 +2025,7 @@ int unpoison_memory(unsigned long pfn) { struct page *page; struct page *p; - int freeit = 0; - int ret = 0; - unsigned long flags = 0; + int ret = -EBUSY; static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST); @@ -1996,24 +2061,30 @@ int unpoison_memory(unsigned long pfn) goto unlock_mutex; } - if (!get_hwpoison_page(p, flags)) { - if (TestClearPageHWPoison(p)) - num_poisoned_pages_dec(); - unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n", - pfn, &unpoison_rs); + if (PageSlab(page) || PageTable(page)) goto unlock_mutex; - } - if (TestClearPageHWPoison(page)) { - unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n", - pfn, &unpoison_rs); - num_poisoned_pages_dec(); - freeit = 1; - } + ret = get_hwpoison_page(p, MF_UNPOISON); + if (!ret) { + if (clear_page_hwpoison(&unpoison_rs, page)) + ret = 0; + else + ret = -EBUSY; + } else if (ret < 0) { + if (ret == -EHWPOISON) { + ret = unpoison_taken_off_page(&unpoison_rs, p); + } else + unpoison_pr_info("Unpoison: failed to grab page %#lx\n", + pfn, &unpoison_rs); + } else { + int freeit = clear_page_hwpoison(&unpoison_rs, p); - put_page(page); - if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1)) put_page(page); + if (freeit && !(pfn == my_zero_pfn(0) && page_count(p) == 1)) { + put_page(page); + ret = 0; + } + } unlock_mutex: mutex_unlock(&mf_mutex); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index c5952749ad40..d9ba6c1b9f43 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -19,6 +19,7 @@ #include #include #include +#include #include #include #include @@ -9448,6 +9449,7 @@ bool take_page_off_buddy(struct page *page) del_page_from_free_list(page_head, zone, page_order); break_down_buddy_pages(zone, page_head, page, 0, page_order, migratetype); + SetPageHWPoisonTakenOff(page); if (!is_migrate_isolate(migratetype)) __mod_zone_freepage_state(zone, -1, migratetype); ret = true; @@ -9459,4 +9461,29 @@ bool take_page_off_buddy(struct page *page) spin_unlock_irqrestore(&zone->lock, flags); return ret; } + +/* + * Cancel takeoff done by take_page_off_buddy(). + */ +bool put_page_back_buddy(struct page *page) +{ + struct zone *zone = page_zone(page); + unsigned long pfn = page_to_pfn(page); + unsigned long flags; + int migratetype = get_pfnblock_migratetype(page, pfn); + bool ret = false; + + spin_lock_irqsave(&zone->lock, flags); + if (put_page_testzero(page)) { + ClearPageHWPoisonTakenOff(page); + __free_one_page(page, pfn, zone, 0, migratetype, FPI_NONE); + if (TestClearPageHWPoison(page)) { + num_poisoned_pages_dec(); + ret = true; + } + } + spin_unlock_irqrestore(&zone->lock, flags); + + return ret; +} #endif