From: Qi Zheng
To: akpm@linux-foundation.org, rppt@kernel.org, david@redhat.com, vbabka@suse.cz, mhocko@suse.com
Cc: willy@infradead.org, mgorman@techsingularity.net, mingo@kernel.org, aneesh.kumar@linux.ibm.com, ying.huang@intel.com, hannes@cmpxchg.org, osalvador@suse.de, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Qi Zheng
Subject: [PATCH v4 1/2] mm: page_alloc: skip memoryless nodes entirely
Date: Fri, 20 Oct 2023 19:04:45 +0800

In find_next_best_node(), we skip memoryless nodes when building the zonelists of other normal nodes (N_NORMAL), but we do not skip a memoryless node itself when building its own zonelist. As a result, the memoryless node is still traversed at runtime.
For example, say we have node0 and node1, and node0 is a memoryless node. The fallback orders of node0 and node1 are then as follows:

[ 0.153005] Fallback order for Node 0: 0 1
[ 0.153564] Fallback order for Node 1: 1

After this patch, we skip memoryless node0 entirely, and the fallback orders of node0 and node1 become:

[ 0.155236] Fallback order for Node 0: 1
[ 0.155806] Fallback order for Node 1: 1

So node0 becomes completely invisible to the page allocator, which reduces runtime overhead. In this way we also never try to allocate pages from memoryless node0, so the panic mentioned in [1] is fixed as well.

Even though this problem has been solved by dropping the NODE_MIN_SIZE constraint on x86 [2], it would be better to fix it in the core MM as well.

[1]. https://lore.kernel.org/all/20230212110305.93670-1-zhengqi.arch@bytedance.com/
[2]. https://lore.kernel.org/all/20231017062215.171670-1-rppt@kernel.org/

Signed-off-by: Qi Zheng
Acked-by: David Hildenbrand
Acked-by: Ingo Molnar
---
 mm/page_alloc.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ee392a324802..1f852929709f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5052,8 +5052,11 @@ int find_next_best_node(int node, nodemask_t *used_node_mask)
 	int min_val = INT_MAX;
 	int best_node = NUMA_NO_NODE;
 
-	/* Use the local node if we haven't already */
-	if (!node_isset(node, *used_node_mask)) {
+	/*
+	 * Use the local node if we haven't already, but for memoryless local
+	 * node, we should skip it and fall back to other nodes.
+	 */
+	if (!node_isset(node, *used_node_mask) && node_state(node, N_MEMORY)) {
 		node_set(node, *used_node_mask);
 		return node;
 	}