From patchwork Thu May 16 12:26:08 2024
X-Patchwork-Submitter: Miaohe Lin
X-Patchwork-Id: 13666140
From: Miaohe Lin
Subject: [PATCH v3] mm/huge_memory: don't unpoison huge_zero_folio
Date: Thu, 16 May 2024 20:26:08 +0800
Message-ID: <20240516122608.22610-1-linmiaohe@huawei.com>

While running memory failure tests recently, I hit the panic below:

 kernel BUG at include/linux/mm.h:1135!
 invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
 CPU: 9 PID: 137 Comm: kswapd1 Not tainted 6.9.0-rc4-00491-gd5ce28f156fe-dirty #14
 RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0
 RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246
 RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8
 RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0
 RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492
 R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00
 FS:  0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0
 Call Trace:
  do_shrink_slab+0x14f/0x6a0
  shrink_slab+0xca/0x8c0
  shrink_node+0x2d0/0x7d0
  balance_pgdat+0x33a/0x720
  kswapd+0x1f3/0x410
  kthread+0xd5/0x100
  ret_from_fork+0x2f/0x50
  ret_from_fork_asm+0x1a/0x30
 Modules linked in: mce_inject hwpoison_inject
 ---[ end trace 0000000000000000 ]---
 RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0
 RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246
 RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8
 RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0
 RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492
 R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00
 FS:  0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0

The root cause is that the HWPoison flag is set for huge_zero_folio
without increasing the folio refcnt. unpoison_memory() then decreases
the folio refcnt unexpectedly, because the folio looks like one that was
successfully hwpoisoned. This triggers
VM_BUG_ON_PAGE(page_ref_count(page) == 0) when huge_zero_folio is
released.

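The refcount imbalance described above can be illustrated with a small
userspace model. This is only a sketch under stated assumptions: the
toy_* names and struct toy_folio are hypothetical and are not kernel
APIs, and the model keeps just the flag and the counter involved in the
bug.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a folio: only the state the bug involves. */
struct toy_folio {
	int refcount;
	bool hwpoison;
	bool is_huge_zero;
};

/*
 * Models poisoning of the huge zero folio as described above:
 * the flag is set, but no extra reference is taken.
 */
static void toy_poison_huge_zero(struct toy_folio *f)
{
	f->hwpoison = true;
}

/*
 * Models unpoisoning: clear the flag and drop the reference that a
 * successful poison would have taken. With guard set, refuse to touch
 * the huge zero folio, which is the behaviour this patch adds.
 */
static int toy_unpoison(struct toy_folio *f, bool guard)
{
	if (guard && f->is_huge_zero)
		return -EOPNOTSUPP;
	if (!f->hwpoison)
		return -EBUSY;	/* toy choice: nothing to unpoison */
	f->hwpoison = false;
	f->refcount--;
	return 0;
}

/*
 * Models the owner releasing its own reference. Returns false where
 * the real code would trip a "refcount is already zero" assertion.
 */
static bool toy_release(struct toy_folio *f)
{
	if (f->refcount == 0)
		return false;
	f->refcount--;
	return true;
}

/*
 * Runs poison -> unpoison -> release on a huge zero folio that starts
 * with one reference. Returns true if the release found a live
 * reference.
 */
static bool toy_scenario(bool guard)
{
	struct toy_folio f = {
		.refcount = 1,
		.hwpoison = false,
		.is_huge_zero = true,
	};

	toy_poison_huge_zero(&f);
	toy_unpoison(&f, guard);
	return toy_release(&f);
}
```

In this model, toy_scenario(false) ends with the release finding a zero
refcount, which corresponds to the assertion in the trace, while
toy_scenario(true) leaves the owner's reference intact.
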
Skip unpoisoning huge_zero_folio in unpoison_memory() to fix this issue.
We're not prepared to unpoison huge_zero_folio yet.

Fixes: 478d134e9506 ("mm/huge_memory: do not overkill when splitting huge_zero_page")
Signed-off-by: Miaohe Lin
Acked-by: David Hildenbrand
Reviewed-by: Yang Shi
Reviewed-by: Oscar Salvador
Cc:
Reviewed-by: Anshuman Khandual
---
v3:
  Move up the is_huge_zero_folio() check and change the return value to
  -EOPNOTSUPP per Oscar. Collect Reviewed-by and Acked-by tags. Thanks.
v2:
  Change to simply check for the huge zero page per David. Thanks.
---
 mm/memory-failure.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 16ada4fb02b7..a9fe9eda593f 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2546,6 +2546,13 @@ int unpoison_memory(unsigned long pfn)
 		goto unlock_mutex;
 	}
 
+	if (is_huge_zero_folio(folio)) {
+		unpoison_pr_info("Unpoison: huge zero page is not supported %#lx\n",
+				 pfn, &unpoison_rs);
+		ret = -EOPNOTSUPP;
+		goto unlock_mutex;
+	}
+
 	if (!PageHWPoison(p)) {
 		unpoison_pr_info("Unpoison: Page was already unpoisoned %#lx\n",
 				 pfn, &unpoison_rs);