From patchwork Thu Oct 19 07:36:26 2023
X-Patchwork-Submitter: Qi Zheng
X-Patchwork-Id: 13428411
From: Qi Zheng
To: akpm@linux-foundation.org, rppt@kernel.org, david@redhat.com,
    vbabka@suse.cz, mhocko@suse.com
Cc: willy@infradead.org, mgorman@techsingularity.net, mingo@kernel.org,
    aneesh.kumar@linux.ibm.com, ying.huang@intel.com, hannes@cmpxchg.org,
    osalvador@suse.de, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    Qi Zheng
Subject: [PATCH v2 1/2] mm: page_alloc: skip memoryless nodes entirely
Date: Thu, 19 Oct 2023 15:36:26 +0800
Message-Id: <7928768f2658cd563978f5e5bf8109be1d559320.1697687357.git.zhengqi.arch@bytedance.com>

In find_next_best_node(), we skip memoryless nodes when building the
zonelists of other normal nodes (N_NORMAL), but we do not skip the
memoryless node itself when building its own zonelist. As a result, the
memoryless node is still traversed at runtime.

For example, suppose we have node0 and node1, and node0 is a memoryless
node. The fallback order of node0 and node1 is then as follows:

[    0.153005] Fallback order for Node 0: 0 1
[    0.153564] Fallback order for Node 1: 1

After this patch, we skip memoryless node0 entirely, and the fallback
order of node0 and node1 becomes:

[    0.155236] Fallback order for Node 0: 1
[    0.155806] Fallback order for Node 1: 1

The memoryless node thus becomes completely invisible to the allocator,
which reduces runtime overhead.

In addition, we no longer try to allocate pages from memoryless node0,
so the panic mentioned in [1] is also fixed. Even though this problem
has already been solved by dropping the NODE_MIN_SIZE constraint in
x86 [2], it is better to fix it in core MM as well.

[1]. https://lore.kernel.org/all/20230212110305.93670-1-zhengqi.arch@bytedance.com/
[2]. https://lore.kernel.org/all/20231017062215.171670-1-rppt@kernel.org/

Signed-off-by: Qi Zheng
Acked-by: David Hildenbrand
---
 mm/page_alloc.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ee392a324802..e978272699d3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5052,8 +5052,11 @@ int find_next_best_node(int node, nodemask_t *used_node_mask)
 	int min_val = INT_MAX;
 	int best_node = NUMA_NO_NODE;
 
-	/* Use the local node if we haven't already */
-	if (!node_isset(node, *used_node_mask)) {
+	/*
+	 * Use the local node if we haven't already. But for memoryless local
+	 * node, we should skip it and fallback to other nodes.
+	 */
+	if (!node_isset(node, *used_node_mask) && node_state(node, N_MEMORY)) {
 		node_set(node, *used_node_mask);
 		return node;
 	}
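The change in fallback order described in the commit message can be reproduced with a small userspace model. The sketch below is not kernel code: struct toy_topology, toy_find_next_best_node(), toy_build_fallback_order(), and the skip_memoryless_local flag are hypothetical names invented for illustration. It assumes that candidate nodes are ranked by distance alone, whereas the real find_next_best_node() applies further weighting that is omitted here.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MAX_NODES 8
#define NO_NODE   (-1)	/* toy stand-in for NUMA_NO_NODE */

/* Hypothetical, simplified view of a NUMA topology. */
struct toy_topology {
	int nr_nodes;
	bool has_memory[MAX_NODES];		/* models node_state(n, N_MEMORY) */
	int distance[MAX_NODES][MAX_NODES];	/* models node_distance(a, b) */
};

/*
 * Pick the next node for the fallback list of "node".
 * skip_memoryless_local == false models the behaviour before the patch,
 * true models the behaviour after it.
 */
static int toy_find_next_best_node(const struct toy_topology *t, int node,
				   bool *used, bool skip_memoryless_local)
{
	int min_val = INT_MAX;
	int best_node = NO_NODE;

	/* Use the local node first, unless it is memoryless and we skip it. */
	if (!used[node] && (!skip_memoryless_local || t->has_memory[node])) {
		used[node] = true;
		return node;
	}

	/* Otherwise choose the closest unused node that has memory. */
	for (int n = 0; n < t->nr_nodes; n++) {
		if (!t->has_memory[n] || used[n])
			continue;
		if (t->distance[node][n] < min_val) {
			min_val = t->distance[node][n];
			best_node = n;
		}
	}

	if (best_node != NO_NODE)
		used[best_node] = true;
	return best_node;
}

/* Fill "order" with the fallback list of "node"; return its length. */
static int toy_build_fallback_order(const struct toy_topology *t, int node,
				    bool skip_memoryless_local, int *order)
{
	bool used[MAX_NODES] = { false };
	int count = 0;
	int next;

	assert(t->nr_nodes > 0 && t->nr_nodes <= MAX_NODES);
	assert(node >= 0 && node < t->nr_nodes);

	while ((next = toy_find_next_best_node(t, node, used,
					       skip_memoryless_local)) != NO_NODE)
		order[count++] = next;

	return count;
}
```

For a two-node topology in which node 0 has no memory, calling toy_build_fallback_order() for node 0 with the flag off gives the order 0 1, and with the flag on gives 1. This mirrors the two "Fallback order for Node 0" log lines quoted in the commit message. The order for node 1 is 1 in both cases.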