From patchwork Fri Apr 19 08:58:19 2024
X-Patchwork-Submitter: Miaohe Lin
X-Patchwork-Id: 13635974
From: Miaohe Lin
Subject: [PATCH v2] mm/hugetlb: fix DEBUG_LOCKS_WARN_ON(1) when dissolve_free_hugetlb_folio()
Date: Fri, 19 Apr 2024 16:58:19 +0800
Message-ID: <20240419085819.1901645-1-linmiaohe@huawei.com>

When I did memory failure tests recently, the following warning occurred:

DEBUG_LOCKS_WARN_ON(1)
WARNING: CPU: 8 PID: 1011 at kernel/locking/lockdep.c:232 __lock_acquire+0xccb/0x1ca0
Modules linked in: mce_inject hwpoison_inject
CPU: 8 PID: 1011 Comm: bash Kdump: loaded Not tainted 6.9.0-rc3-next-20240410-00012-gdb69f219f4be #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
RIP: 0010:__lock_acquire+0xccb/0x1ca0
RSP: 0018:ffffa7a1c7fe3bd0 EFLAGS: 00000082
RAX: 0000000000000000 RBX: eb851eb853975fcf RCX: ffffa1ce5fc1c9c8
RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffffa1ce5fc1c9c0
RBP: ffffa1c6865d3280 R08: ffffffffb0f570a8 R09: 0000000000009ffb
R10: 0000000000000286 R11: ffffffffb0f2ad50 R12: ffffa1c6865d3d10
R13: ffffa1c6865d3c70 R14: 0000000000000000 R15: 0000000000000004
FS: 00007ff9f32aa740(0000) GS:ffffa1ce5fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff9f3134ba0 CR3: 00000008484e4000 CR4: 00000000000006f0
Call Trace:
 lock_acquire+0xbe/0x2d0
 _raw_spin_lock_irqsave+0x3a/0x60
 hugepage_subpool_put_pages.part.0+0xe/0xc0
 free_huge_folio+0x253/0x3f0
 dissolve_free_huge_page+0x147/0x210
 __page_handle_poison+0x9/0x70
 memory_failure+0x4e6/0x8c0
 hard_offline_page_store+0x55/0xa0
 kernfs_fop_write_iter+0x12c/0x1d0
 vfs_write+0x380/0x540
 ksys_write+0x64/0xe0
 do_syscall_64+0xbc/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ff9f3114887
RSP: 002b:00007ffecbacb458 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007ff9f3114887
RDX: 000000000000000c RSI: 0000564494164e10 RDI: 0000000000000001
RBP: 0000564494164e10 R08: 00007ff9f31d1460 R09: 000000007fffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
R13: 00007ff9f321b780 R14: 00007ff9f3217600 R15: 00007ff9f3216a00
Kernel panic - not syncing: kernel: panic_on_warn set ...
CPU: 8 PID: 1011 Comm: bash Kdump: loaded Not tainted 6.9.0-rc3-next-20240410-00012-gdb69f219f4be #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Call Trace:
 panic+0x326/0x350
 check_panic_on_warn+0x4f/0x50
 __warn+0x98/0x190
 report_bug+0x18e/0x1a0
 handle_bug+0x3d/0x70
 exc_invalid_op+0x18/0x70
 asm_exc_invalid_op+0x1a/0x20
RIP: 0010:__lock_acquire+0xccb/0x1ca0
RSP: 0018:ffffa7a1c7fe3bd0 EFLAGS: 00000082
RAX: 0000000000000000 RBX: eb851eb853975fcf RCX: ffffa1ce5fc1c9c8
RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffffa1ce5fc1c9c0
RBP: ffffa1c6865d3280 R08: ffffffffb0f570a8 R09: 0000000000009ffb
R10: 0000000000000286 R11: ffffffffb0f2ad50 R12: ffffa1c6865d3d10
R13: ffffa1c6865d3c70 R14: 0000000000000000 R15: 0000000000000004
 lock_acquire+0xbe/0x2d0
 _raw_spin_lock_irqsave+0x3a/0x60
 hugepage_subpool_put_pages.part.0+0xe/0xc0
 free_huge_folio+0x253/0x3f0
 dissolve_free_huge_page+0x147/0x210
 __page_handle_poison+0x9/0x70
 memory_failure+0x4e6/0x8c0
 hard_offline_page_store+0x55/0xa0
 kernfs_fop_write_iter+0x12c/0x1d0
 vfs_write+0x380/0x540
 ksys_write+0x64/0xe0
 do_syscall_64+0xbc/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ff9f3114887
RSP: 002b:00007ffecbacb458 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007ff9f3114887
RDX: 000000000000000c RSI: 0000564494164e10 RDI: 0000000000000001
RBP: 0000564494164e10 R08: 00007ff9f31d1460 R09: 000000007fffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
R13: 00007ff9f321b780 R14: 00007ff9f3217600 R15: 00007ff9f3216a00

After git bisecting and digging into the code, I believe the root cause
is that the _deferred_list field of the folio is unioned with the
_hugetlb_subpool field. In __update_and_free_hugetlb_folio(),
folio->_deferred_list is initialized, which corrupts
folio->_hugetlb_subpool when the folio is still hugetlb. Later,
free_huge_folio() uses _hugetlb_subpool and the above warning is
triggered.
However, the code assumes that the hugetlb flag has already been cleared
by the time folio_put() is called in update_and_free_hugetlb_folio().
This assumption is broken by the race below:

CPU1                                    CPU2
dissolve_free_huge_page                 update_and_free_pages_bulk
 update_and_free_hugetlb_folio           hugetlb_vmemmap_restore_folios
                                          folio_clear_hugetlb_vmemmap_optimized
  clear_flag = folio_test_hugetlb_vmemmap_optimized
  if (clear_flag) <-- False, it's already cleared.
   __folio_clear_hugetlb(folio) <-- Hugetlb is not cleared.
  folio_put
   free_huge_folio <-- free_the_page is expected.
                                         list_for_each_entry()
                                          __folio_clear_hugetlb <-- Too late.

Fix this issue by checking whether the folio is hugetlb directly instead
of checking clear_flag, which closes the race window.

Fixes: 32c877191e02 ("hugetlb: do not clear hugetlb dtor until allocating vmemmap")
CC: stable@vger.kernel.org
Signed-off-by: Miaohe Lin
Reviewed-by: Oscar Salvador
---
v2:
  The root cause should be the above race, so rework the fix.
---
 mm/hugetlb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d748664bb2c9..3b7d5ddc32ad 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1773,7 +1773,7 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
 	 * If vmemmap pages were allocated above, then we need to clear the
 	 * hugetlb flag under the hugetlb lock.
 	 */
-	if (clear_flag) {
+	if (folio_test_hugetlb(folio)) {
 		spin_lock_irq(&hugetlb_lock);
 		__folio_clear_hugetlb(folio);
 		spin_unlock_irq(&hugetlb_lock);