From patchwork Thu Nov 3 21:36:40 2022
X-Patchwork-Submitter: Yang Shi
X-Patchwork-Id: 13031041
From: Yang Shi
To: zokeefe@google.com, mhocko@suse.com, akpm@linux-foundation.org
Cc: shy828301@gmail.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [v2 PATCH 1/2] mm: khugepaged: allow page allocation fallback to eligible nodes
Date: Thu, 3 Nov 2022 14:36:40 -0700
Message-Id: <20221103213641.7296-1-shy828301@gmail.com>

Syzbot reported the below splat:

WARNING: CPU: 1 PID: 3646 at include/linux/gfp.h:221 __alloc_pages_node include/linux/gfp.h:221 [inline]
WARNING: CPU: 1 PID: 3646 at include/linux/gfp.h:221 hpage_collapse_alloc_page mm/khugepaged.c:807 [inline]
WARNING: CPU: 1 PID: 3646 at include/linux/gfp.h:221 alloc_charge_hpage+0x802/0xaa0 mm/khugepaged.c:963
Modules linked in:
CPU: 1 PID: 3646 Comm: syz-executor210 Not tainted 6.1.0-rc1-syzkaller-00454-ga70385240892 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/11/2022
RIP: 0010:__alloc_pages_node include/linux/gfp.h:221 [inline]
RIP: 0010:hpage_collapse_alloc_page mm/khugepaged.c:807 [inline]
RIP: 0010:alloc_charge_hpage+0x802/0xaa0 mm/khugepaged.c:963
Code: e5 01 4c 89 ee e8 6e f9 ae ff 4d 85 ed 0f 84 28 fc ff ff e8 70 fc ae ff 48 8d 6b ff 4c 8d 63 07 e9 16 fc ff ff e8 5e fc ae ff <0f> 0b e9 96 fa ff ff 41 bc 1a 00 00 00 e9 86 fd ff ff e8 47 fc ae
RSP: 0018:ffffc90003fdf7d8 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff888077f457c0 RSI: ffffffff81cd8f42 RDI: 0000000000000001
RBP: ffff888079388c0c R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: dffffc0000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f6b48ccf700(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6b48a819f0 CR3: 00000000171e7000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 collapse_file+0x1ca/0x5780 mm/khugepaged.c:1715
 hpage_collapse_scan_file+0xd6c/0x17a0 mm/khugepaged.c:2156
 madvise_collapse+0x53a/0xb40 mm/khugepaged.c:2611
 madvise_vma_behavior+0xd0a/0x1cc0 mm/madvise.c:1066
 madvise_walk_vmas+0x1c7/0x2b0 mm/madvise.c:1240
 do_madvise.part.0+0x24a/0x340 mm/madvise.c:1419
 do_madvise mm/madvise.c:1432 [inline]
 __do_sys_madvise mm/madvise.c:1432 [inline]
 __se_sys_madvise mm/madvise.c:1430 [inline]
 __x64_sys_madvise+0x113/0x150 mm/madvise.c:1430
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f6b48a4eef9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 b1 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f6b48ccf318 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 00007f6b48af0048 RCX: 00007f6b48a4eef9
RDX: 0000000000000019 RSI: 0000000000600003 RDI: 0000000020000000
RBP: 00007f6b48af0040 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f6b48aa53a4
R13: 00007f6b48bffcbf R14: 00007f6b48ccf400 R15: 0000000000022000

The khugepaged code picks the node with the most hits as the preferred
node, and also tries to balance across nodes if several of them have the
same hit record. Conceptually it does the following:
    * If the target_node <= last_target_node, then iterate from
      last_target_node + 1 to MAX_NUMNODES (1024 on default config)
    * If the max_value == node_load[nid], then target_node = nid

But there is a corner case, particularly for MADV_COLLAPSE, in which a
non-existent node may be returned as the preferred node.
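
For illustration only, the selection logic described above can be modeled in a small userspace C sketch. This is not the kernel code: the function name old_find_target_node is hypothetical, the per-node counters are passed in as a plain array, and MAX_NUMNODES is fixed at 1024 as in the default config mentioned above.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NUMNODES 1024	/* default config value, per the text above */
#define NUMA_NO_NODE (-1)

/*
 * Hypothetical userspace model of the pre-patch target selection.
 * node_load[] holds the per-node hit counts, *last_target_node is the
 * node returned by the previous call (or NUMA_NO_NODE).
 */
int old_find_target_node(const unsigned int *node_load, int *last_target_node)
{
	int nid, target_node = 0;
	unsigned int max_value = 0;

	assert(node_load != NULL && last_target_node != NULL);

	/* Find the first node with the highest hit count. */
	for (nid = 0; nid < MAX_NUMNODES; nid++)
		if (node_load[nid] > max_value) {
			max_value = node_load[nid];
			target_node = nid;
		}

	/*
	 * Balance step: look past the previous target for a node with the
	 * same hit count. Nothing here checks whether nid is online.
	 */
	if (target_node <= *last_target_node)
		for (nid = *last_target_node + 1; nid < MAX_NUMNODES; nid++)
			if (max_value == node_load[nid]) {
				target_node = nid;
				break;
			}

	*last_target_node = target_node;
	return target_node;
}
```

In this model, calling old_find_target_node() with an all-zero node_load[] and a last target of 1 returns 2, because every slot of the array "ties" with a max_value of 0, including slots for nodes that do not exist.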

Assuming the system has 2 nodes, the target_node is 0 and the
last_target_node is 1. If the MADV_COLLAPSE path is hit, max_value may
be 0, and the code may then return 2 for target_node even though that
node does not exist (it is offline), so the warning is triggered.

The node balance was introduced by commit 9f1b868a13ac ("mm: thp:
khugepaged: add policy for finding target node") to satisfy
"numactl --interleave=all". But interleaving is a mere hint rather than
something that has hard requirements.

So use a nodemask to record the nodes which have the same hit record, so
that the hugepage allocation can fall back to those nodes. Also remove
__GFP_THISNODE since it disallows fallback. If the nodemask has just one
node set, it means a single node has the most hit record, and the
nodemask approach effectively behaves like __GFP_THISNODE.

Reported-by: syzbot+0044b22d177870ee974f@syzkaller.appspotmail.com
Suggested-by: Zach O'Keefe
Suggested-by: Michal Hocko
Signed-off-by: Yang Shi
Reviewed-by: Zach O'Keefe
Acked-by: Michal Hocko
---
 mm/khugepaged.c | 32 ++++++++++++++------------------
 1 file changed, 14 insertions(+), 18 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index ea0d186bc9d4..572ce7dbf4b0 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -97,8 +97,8 @@ struct collapse_control {
 	/* Num pages scanned per node */
 	u32 node_load[MAX_NUMNODES];
 
-	/* Last target selected in hpage_collapse_find_target_node() */
-	int last_target_node;
+	/* nodemask for allocation fallback */
+	nodemask_t alloc_nmask;
 };
 
 /**
@@ -734,7 +734,6 @@ static void khugepaged_alloc_sleep(void)
 
 struct collapse_control khugepaged_collapse_control = {
 	.is_khugepaged = true,
-	.last_target_node = NUMA_NO_NODE,
 };
 
 static bool hpage_collapse_scan_abort(int nid, struct collapse_control *cc)
@@ -783,16 +782,11 @@ static int hpage_collapse_find_target_node(struct collapse_control *cc)
 			target_node = nid;
 		}
 
-	/* do some balance if several nodes have the same hit record */
-	if (target_node <= cc->last_target_node)
-		for (nid = cc->last_target_node + 1; nid < MAX_NUMNODES;
-		     nid++)
-			if (max_value == cc->node_load[nid]) {
-				target_node = nid;
-				break;
-			}
+	for_each_online_node(nid) {
+		if (max_value == cc->node_load[nid])
+			node_set(nid, cc->alloc_nmask);
+	}
 
-	cc->last_target_node = target_node;
 	return target_node;
 }
 #else
@@ -802,9 +796,10 @@ static int hpage_collapse_find_target_node(struct collapse_control *cc)
 }
 #endif
 
-static bool hpage_collapse_alloc_page(struct page **hpage, gfp_t gfp, int node)
+static bool hpage_collapse_alloc_page(struct page **hpage, gfp_t gfp, int node,
+				      nodemask_t *nmask)
 {
-	*hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER);
+	*hpage = __alloc_pages(gfp, HPAGE_PMD_ORDER, node, nmask);
 	if (unlikely(!*hpage)) {
 		count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
 		return false;
@@ -955,12 +950,11 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
 static int alloc_charge_hpage(struct page **hpage, struct mm_struct *mm,
 			      struct collapse_control *cc)
 {
-	/* Only allocate from the target node */
 	gfp_t gfp = (cc->is_khugepaged ? alloc_hugepage_khugepaged_gfpmask() :
-		     GFP_TRANSHUGE) | __GFP_THISNODE;
+		     GFP_TRANSHUGE);
 	int node = hpage_collapse_find_target_node(cc);
 
-	if (!hpage_collapse_alloc_page(hpage, gfp, node))
+	if (!hpage_collapse_alloc_page(hpage, gfp, node, &cc->alloc_nmask))
 		return SCAN_ALLOC_HUGE_PAGE_FAIL;
 	if (unlikely(mem_cgroup_charge(page_folio(*hpage), mm, gfp)))
 		return SCAN_CGROUP_CHARGE_FAIL;
@@ -1144,6 +1138,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 		goto out;
 
 	memset(cc->node_load, 0, sizeof(cc->node_load));
+	nodes_clear(cc->alloc_nmask);
 	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
 	for (_address = address, _pte = pte; _pte < pte + HPAGE_PMD_NR;
 	     _pte++, _address += PAGE_SIZE) {
@@ -2078,6 +2073,7 @@ static int hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
 	present = 0;
 	swap = 0;
 	memset(cc->node_load, 0, sizeof(cc->node_load));
+	nodes_clear(cc->alloc_nmask);
 	rcu_read_lock();
 	xas_for_each(&xas, page, start + HPAGE_PMD_NR - 1) {
 		if (xas_retry(&xas, page))
@@ -2581,7 +2577,6 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
 	if (!cc)
 		return -ENOMEM;
 	cc->is_khugepaged = false;
-	cc->last_target_node = NUMA_NO_NODE;
 
 	mmgrab(mm);
 	lru_add_drain_all();
@@ -2607,6 +2602,7 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
 		}
 		mmap_assert_locked(mm);
 		memset(cc->node_load, 0, sizeof(cc->node_load));
+		nodes_clear(cc->alloc_nmask);
 		if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) {
 			struct file *file = get_file(vma->vm_file);
 			pgoff_t pgoff = linear_page_index(vma, addr);

From patchwork Thu Nov 3 21:36:41 2022
X-Patchwork-Submitter: Yang Shi
X-Patchwork-Id: 13031042
From: Yang Shi
To: zokeefe@google.com, mhocko@suse.com, akpm@linux-foundation.org
Cc: shy828301@gmail.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [v2 PATCH 2/2] mm: don't warn if the node is offlined
Date: Thu, 3 Nov 2022 14:36:41 -0700
Message-Id: <20221103213641.7296-2-shy828301@gmail.com>
In-Reply-To: <20221103213641.7296-1-shy828301@gmail.com>
References: <20221103213641.7296-1-shy828301@gmail.com>

Syzbot reported the below splat:

WARNING: CPU: 1 PID: 3646 at include/linux/gfp.h:221 __alloc_pages_node include/linux/gfp.h:221 [inline]
WARNING: CPU: 1 PID: 3646 at include/linux/gfp.h:221 hpage_collapse_alloc_page mm/khugepaged.c:807 [inline]
WARNING: CPU: 1 PID: 3646 at include/linux/gfp.h:221 alloc_charge_hpage+0x802/0xaa0 mm/khugepaged.c:963
Modules linked in:
CPU: 1 PID: 3646 Comm: syz-executor210 Not tainted 6.1.0-rc1-syzkaller-00454-ga70385240892 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/11/2022
RIP: 0010:__alloc_pages_node include/linux/gfp.h:221 [inline]
RIP: 0010:hpage_collapse_alloc_page mm/khugepaged.c:807 [inline]
RIP: 0010:alloc_charge_hpage+0x802/0xaa0 mm/khugepaged.c:963
Code: e5 01 4c 89 ee e8 6e f9 ae ff 4d 85 ed 0f 84 28 fc ff ff e8 70 fc ae ff 48 8d 6b ff 4c 8d 63 07 e9 16 fc ff ff e8 5e fc ae ff <0f> 0b e9 96 fa ff ff 41 bc 1a 00 00 00 e9 86 fd ff ff e8 47 fc ae
RSP: 0018:ffffc90003fdf7d8 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff888077f457c0 RSI: ffffffff81cd8f42 RDI: 0000000000000001
RBP: ffff888079388c0c R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: dffffc0000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f6b48ccf700(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6b48a819f0 CR3: 00000000171e7000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 collapse_file+0x1ca/0x5780 mm/khugepaged.c:1715
 hpage_collapse_scan_file+0xd6c/0x17a0 mm/khugepaged.c:2156
 madvise_collapse+0x53a/0xb40 mm/khugepaged.c:2611
 madvise_vma_behavior+0xd0a/0x1cc0 mm/madvise.c:1066
 madvise_walk_vmas+0x1c7/0x2b0 mm/madvise.c:1240
 do_madvise.part.0+0x24a/0x340 mm/madvise.c:1419
 do_madvise mm/madvise.c:1432 [inline]
 __do_sys_madvise mm/madvise.c:1432 [inline]
 __se_sys_madvise mm/madvise.c:1430 [inline]
 __x64_sys_madvise+0x113/0x150 mm/madvise.c:1430
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f6b48a4eef9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 b1 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f6b48ccf318 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 00007f6b48af0048 RCX: 00007f6b48a4eef9
RDX: 0000000000000019 RSI: 0000000000600003 RDI: 0000000020000000
RBP: 00007f6b48af0040 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f6b48aa53a4
R13: 00007f6b48bffcbf R14: 00007f6b48ccf400 R15: 0000000000022000

This happens because khugepaged allocates pages with __GFP_THISNODE
while the preferred node is offline. The previous patch fixed the
khugepaged code to avoid allocating pages from a non-existent node, but
the allocation is still racy against memory hotremove. There is no
synchronization with memory hotplug, so memory can go offline during a
long-running scan.

So the warning is not very helpful because:
  * There is no guarantee that the node is online in a __GFP_THISNODE
    context at every callsite.
  * The kernel simply fails the allocation regardless of the warning,
    and all callsites appear to handle the allocation failure
    gracefully.

The warning is even harmful for systems running in panic-on-warn mode,
so removing it seems like a good move.

Reported-by: syzbot+0044b22d177870ee974f@syzkaller.appspotmail.com
Signed-off-by: Yang Shi
Reviewed-by: Zach O'Keefe
Acked-by: Michal Hocko
---
v2: * Added patch 1/2.
    * Reworded the commit log per Michal.

 include/linux/gfp.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index ef4aea3b356e..594d6dee5646 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -218,7 +218,6 @@ static inline struct page *
 __alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order)
 {
 	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
-	VM_WARN_ON((gfp_mask & __GFP_THISNODE) && !node_online(nid));
 
 	return __alloc_pages(gfp_mask, order, nid, NULL);
 }
@@ -227,7 +226,6 @@ static inline
 struct folio *__folio_alloc_node(gfp_t gfp, unsigned int order, int nid)
 {
 	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
-	VM_WARN_ON((gfp & __GFP_THISNODE) && !node_online(nid));
 
 	return __folio_alloc(gfp, order, nid, NULL);
 }