From patchwork Thu Jul 28 19:04:29 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Aneesh Kumar K.V" X-Patchwork-Id: 12931691 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5934AC19F2B for ; Thu, 28 Jul 2022 19:06:19 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B542194000A; Thu, 28 Jul 2022 15:06:18 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id AB5148E0001; Thu, 28 Jul 2022 15:06:18 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9580D94000A; Thu, 28 Jul 2022 15:06:18 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 839C18E0001 for ; Thu, 28 Jul 2022 15:06:18 -0400 (EDT) Received: from smtpin25.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 557721C701E for ; Thu, 28 Jul 2022 19:06:18 +0000 (UTC) X-FDA: 79737439236.25.E6FD64F Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by imf01.hostedemail.com (Postfix) with ESMTP id A147D400C9 for ; Thu, 28 Jul 2022 19:06:16 +0000 (UTC) Received: from pps.filterd (m0098409.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26SIloKW018672; Thu, 28 Jul 2022 19:05:54 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : content-transfer-encoding : mime-version; s=pp1; bh=uTJDX5S3TUopqT6N3AIdKKSI+mqbGEFWWY/1SYf1XMU=; b=BfCBa6Ri4r4wepH1CxPYCwmLspilQmTwtgzY3Vu62y0YkC5/rbCLrbJ6PqaPVc95BH9O wSvwBf5NB7D5JbBypq7YADV57R0LxWENbmNh5PfurbOvgZdOsrSRea3cnWkbXvjQvGK0 URjEsOf8ljYS2cWK9ynG/1e+pjtlWUTGXq8irNGOP/pj1GBsx4+o/WsM1BRBNBN82TsT hb4xOfZA40jidKS+eFrW5+XH+ddB4qNBlADmQV/Xdcz2k+0BN3lai7JBix9JU1tRoH0w ggBJRlGP6Wq4zLrE9+gOvdpIjOfmmffIxotOl+7psJxCuq3g7RDZ8xAcVeckPOu+HNOw nA== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3hm0238sem-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 28 Jul 2022 19:05:53 +0000 Received: from m0098409.ppops.net (m0098409.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 26SIlsKw018778; Thu, 28 Jul 2022 19:05:28 GMT Received: from ppma02dal.us.ibm.com (a.bd.3ea9.ip4.static.sl-reverse.com [169.62.189.10]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3hm0238qqn-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 28 Jul 2022 19:05:27 +0000 Received: from pps.filterd (ppma02dal.us.ibm.com [127.0.0.1]) by ppma02dal.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 26SJ2B7l018194; Thu, 28 Jul 2022 19:05:02 GMT Received: from b03cxnp08026.gho.boulder.ibm.com (b03cxnp08026.gho.boulder.ibm.com [9.17.130.18]) by ppma02dal.us.ibm.com with ESMTP id 3hhfpj63f4-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 28 Jul 2022 19:05:02 +0000 Received: from b03ledav003.gho.boulder.ibm.com (b03ledav003.gho.boulder.ibm.com [9.17.130.234]) by b03cxnp08026.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 26SJ51vR39452940 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 28 Jul 2022 19:05:01 GMT Received: from b03ledav003.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 0D4746A054; Thu, 28 Jul 2022 19:05:01 +0000 (GMT) Received: from b03ledav003.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id D45016A04D; Thu, 28 Jul 2022 19:04:54 +0000 (GMT) Received: from skywalker.ibmuc.com (unknown [9.43.25.218]) by b03ledav003.gho.boulder.ibm.com (Postfix) with ESMTP; Thu, 28 Jul 2022 19:04:54 +0000 (GMT) From: "Aneesh Kumar K.V" To: linux-mm@kvack.org, akpm@linux-foundation.org Cc: Wei Xu , Huang Ying , Yang Shi , Davidlohr Bueso , Tim C Chen , Michal Hocko , Linux Kernel Mailing List , Hesham Almatary , Dave Hansen , Jonathan Cameron , Alistair Popple , Dan Williams , Johannes Weiner , jvgediya.oss@gmail.com, "Aneesh Kumar K.V" , Jagdish Gediya Subject: [PATCH v11 1/8] mm/demotion: Add support for explicit memory tiers Date: Fri, 29 Jul 2022 00:34:29 +0530 Message-Id: <20220728190436.858458-2-aneesh.kumar@linux.ibm.com> X-Mailer: git-send-email 2.37.1 In-Reply-To: <20220728190436.858458-1-aneesh.kumar@linux.ibm.com> References: <20220728190436.858458-1-aneesh.kumar@linux.ibm.com> X-TM-AS-GCONF: 00 X-Proofpoint-GUID: Z1w32zoRkLjpVRSMY3eUyjA4weG8iv1n X-Proofpoint-ORIG-GUID: pE09Hj9K4GLlc_Q8_Ci0p3aPWfuGMvd- X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-28_06,2022-07-28_02,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 bulkscore=0 impostorscore=0 suspectscore=0 malwarescore=0 adultscore=0 phishscore=0 spamscore=0 clxscore=1015 mlxlogscore=999 mlxscore=0 priorityscore=1501 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2206140000 definitions=main-2207280086 ARC-Authentication-Results: i=1; imf01.hostedemail.com; dkim=pass header.d=ibm.com header.s=pp1 header.b=BfCBa6Ri; spf=pass (imf01.hostedemail.com: domain of aneesh.kumar@linux.ibm.com designates 148.163.156.1 as permitted sender) smtp.mailfrom=aneesh.kumar@linux.ibm.com; dmarc=pass (policy=none) header.from=ibm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1659035176; a=rsa-sha256; cv=none; b=hbmCfMZ4/zVqCubN9my2guTO650FSFS42x6ABmb1IwE9mz/fHCmAdmWsIA9vKzwq/JQ9xm duh463cpkRhhBaSaEfmcumPcjzhl2MpwvqSe7ut0B6D5IovMEfe0CYxGMBkMKDcF4ylmT/ zYgGEi45Ji26C/U9/Ksc+pdgaNlAO4k= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1659035176; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=uTJDX5S3TUopqT6N3AIdKKSI+mqbGEFWWY/1SYf1XMU=; b=PD1xlczS3/Td/JmqG59JYzUAYJ0oMQNlVHGckTNNda1FSv5sGeKgCFt/d2P4c9cZqDgNJR dvdQLpSQ0A/scBcBU5k2lyzbgWeuwu9jwvlkChwOaZwlfeyQnigh2DL4pJAwEEOccS2FKL BCbDMXgp8tya6tKivOUX4bR2O7Twb60= X-Rspamd-Server: rspam11 X-Rspamd-Queue-Id: A147D400C9 Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=ibm.com header.s=pp1 header.b=BfCBa6Ri; spf=pass (imf01.hostedemail.com: domain of aneesh.kumar@linux.ibm.com designates 148.163.156.1 as permitted sender) smtp.mailfrom=aneesh.kumar@linux.ibm.com; dmarc=pass (policy=none) header.from=ibm.com X-Stat-Signature: g46sap7dcoxco6qyiss5gf347r1wfwzx X-Rspam-User: X-HE-Tag: 1659035176-214876 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: In the current kernel, memory tiers are defined implicitly via a demotion path relationship between NUMA nodes, which is created during the kernel initialization and updated when a NUMA node is hot-added or hot-removed. The current implementation puts all nodes with CPU into the highest tier, and builds the tier hierarchy tier-by-tier by establishing the per-node demotion targets based on the distances between nodes. This current memory tier kernel implementation needs to be improved for several important use cases, The current tier initialization code always initializes each memory-only NUMA node into a lower tier. But a memory-only NUMA node may have a high performance memory device (e.g. a DRAM-backed memory-only node on a virtual machine) that should be put into a higher tier. The current tier hierarchy always puts CPU nodes into the top tier. But on a system with HBM or GPU devices, the memory-only NUMA nodes mapping these devices should be in the top tier, and DRAM nodes with CPUs are better to be placed into the next lower tier. With current kernel higher tier node can only be demoted to nodes with shortest distance on the next lower tier as defined by the demotion path, not any other node from any lower tier. This strict, demotion order does not work in all use cases (e.g. some use cases may want to allow cross-socket demotion to another node in the same demotion tier as a fallback when the preferred demotion node is out of space), This demotion order is also inconsistent with the page allocation fallback order when all the nodes in a higher tier are out of space: The page allocation can fall back to any node from any lower tier, whereas the demotion order doesn't allow that. This patch series address the above by defining memory tiers explicitly. Linux kernel presents memory devices as NUMA nodes and each memory device is of a specific type. The memory type of a device is represented by its abstract distance. A memory tier corresponds to a range of abstract distance. This allows for classifying memory devices with a specific performance range into a memory tier. This patch configures the range/chunk size to be 128. The default DRAM abstract distance is 512. We can have 4 memory tiers below the default DRAM abstract distance which cover the range 0 - 127, 127 - 255, 256- 383, 384 - 511. Slower memory devices like persistent memory will have abstract distance below the default DRAM level and hence will be placed in these 4 lower tiers. A kernel parameter is provided to override the default memory tier. Link: https://lore.kernel.org/linux-mm/CAAPL-u9Wv+nH1VOZTj=9p9S70Y3Qz3+63EkqncRDdHfubsrjfw@mail.gmail.com Link: https://lore.kernel.org/linux-mm/7b72ccf4-f4ae-cb4e-f411-74d055482026@linux.ibm.com Signed-off-by: Jagdish Gediya Signed-off-by: Aneesh Kumar K.V --- include/linux/memory-tiers.h | 17 ++++++ mm/Makefile | 1 + mm/memory-tiers.c | 102 +++++++++++++++++++++++++++++++++++ 3 files changed, 120 insertions(+) create mode 100644 include/linux/memory-tiers.h create mode 100644 mm/memory-tiers.c diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h new file mode 100644 index 000000000000..8d7884b7a3f0 --- /dev/null +++ b/include/linux/memory-tiers.h @@ -0,0 +1,17 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_MEMORY_TIERS_H +#define _LINUX_MEMORY_TIERS_H + +/* + * Each tier cover a abstrace distance chunk size of 128 + */ +#define MEMTIER_CHUNK_BITS 7 +#define MEMTIER_CHUNK_SIZE (1 << MEMTIER_CHUNK_BITS) +/* + * For now let's have 4 memory tier below default DRAM tier. + */ +#define MEMTIER_ADISTANCE_DRAM (1 << (MEMTIER_CHUNK_BITS + 2)) +/* leave one tier below this slow pmem */ +#define MEMTIER_ADISTANCE_PMEM (1 << MEMTIER_CHUNK_BITS) + +#endif /* _LINUX_MEMORY_TIERS_H */ diff --git a/mm/Makefile b/mm/Makefile index 6f9ffa968a1a..d30acebc2164 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -92,6 +92,7 @@ obj-$(CONFIG_KFENCE) += kfence/ obj-$(CONFIG_FAILSLAB) += failslab.o obj-$(CONFIG_MEMTEST) += memtest.o obj-$(CONFIG_MIGRATION) += migrate.o +obj-$(CONFIG_NUMA) += memory-tiers.o obj-$(CONFIG_DEVICE_MIGRATION) += migrate_device.o obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o khugepaged.o obj-$(CONFIG_PAGE_COUNTER) += page_counter.o diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c new file mode 100644 index 000000000000..01cfd514c192 --- /dev/null +++ b/mm/memory-tiers.c @@ -0,0 +1,102 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include + +struct memory_tier { + /* hierarchy of memory tiers */ + struct list_head list; + /* list of all memory types part of this tier */ + struct list_head memory_types; + /* + * start value of abstract distance. memory tier maps + * an abstract distance range, + * adistance_start .. adistance_start + MEMTIER_CHUNK_SIZE + */ + int adistance_start; +}; + +struct memory_dev_type { + /* list of memory types that are are part of same tier as this type */ + struct list_head tier_sibiling; + /* abstract distance for this specific memory type */ + int adistance; + /* Nodes of same abstract distance */ + nodemask_t nodes; + struct memory_tier *memtier; +}; + +static DEFINE_MUTEX(memory_tier_lock); +static LIST_HEAD(memory_tiers); +struct memory_dev_type *node_memory_types[MAX_NUMNODES]; +/* + * For now let's have 4 memory tier below default DRAM tier. + */ +static struct memory_dev_type default_dram_type = { + .adistance = MEMTIER_ADISTANCE_DRAM, + .tier_sibiling = LIST_HEAD_INIT(default_dram_type.tier_sibiling), +}; + +static struct memory_tier *find_create_memory_tier(struct memory_dev_type *memtype) +{ + bool found_slot = false; + struct memory_tier *memtier, *new_memtier; + int adistance = memtype->adistance; + unsigned int memtier_adistance_chunk_size = MEMTIER_CHUNK_SIZE; + + lockdep_assert_held_once(&memory_tier_lock); + + /* + * If the memtype is already part of a memory tier, + * just return that. + */ + if (memtype->memtier) + return memtype->memtier; + + adistance = round_down(adistance, memtier_adistance_chunk_size); + list_for_each_entry(memtier, &memory_tiers, list) { + if (adistance == memtier->adistance_start) { + memtype->memtier = memtier; + list_add(&memtype->tier_sibiling, &memtier->memory_types); + return memtier; + } else if (adistance < memtier->adistance_start) { + found_slot = true; + break; + } + } + + new_memtier = kzalloc(sizeof(struct memory_tier), GFP_KERNEL); + if (!new_memtier) + return ERR_PTR(-ENOMEM); + + new_memtier->adistance_start = adistance; + INIT_LIST_HEAD(&new_memtier->list); + INIT_LIST_HEAD(&new_memtier->memory_types); + if (found_slot) + list_add_tail(&new_memtier->list, &memtier->list); + else + list_add_tail(&new_memtier->list, &memory_tiers); + memtype->memtier = new_memtier; + list_add(&memtype->tier_sibiling, &new_memtier->memory_types); + return new_memtier; +} + +static int __init memory_tier_init(void) +{ + struct memory_tier *memtier; + + mutex_lock(&memory_tier_lock); + /* CPU only nodes are not part of memory tiers. */ + default_dram_type.nodes = node_states[N_MEMORY]; + + memtier = find_create_memory_tier(&default_dram_type); + if (IS_ERR(memtier)) + panic("%s() failed to register memory tier: %ld\n", + __func__, PTR_ERR(memtier)); + mutex_unlock(&memory_tier_lock); + + return 0; +} +subsys_initcall(memory_tier_init); From patchwork Thu Jul 28 19:04:30 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Aneesh Kumar K.V" X-Patchwork-Id: 12931686 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5C936C04A68 for ; Thu, 28 Jul 2022 19:05:28 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id EAC5F6B0072; Thu, 28 Jul 2022 15:05:25 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C3A94940007; Thu, 28 Jul 2022 15:05:25 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8955E6B0074; Thu, 28 Jul 2022 15:05:25 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 716496B0073 for ; Thu, 28 Jul 2022 15:05:25 -0400 (EDT) Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 455F21C701C for ; Thu, 28 Jul 2022 19:05:25 +0000 (UTC) X-FDA: 79737437010.07.81C4F0C Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by imf31.hostedemail.com (Postfix) with ESMTP id 830FD200AC for ; Thu, 28 Jul 2022 19:05:24 +0000 (UTC) Received: from pps.filterd (m0098417.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26SJ2vpU007517; Thu, 28 Jul 2022 19:05:10 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pp1; bh=MYjCoNay1r368gChTs21KO0Bpf66/k6EEnGmtXI3wP4=; b=CX5jxO0cFcfndCqvWE/YPKN7+hA8stNVxt6KCN7fd/XtjOZop5jOoBgnuG0sFesmR0l9 HRads/i8w6mQuz+KQodMP6vntx4M6Bz8h43fMmUi4+VFR3VdSSkuCePl0WepGESq9/mp D7kYmOkEhUmE2KTpsfepd9KHpOdCgRqF0TybTk5oHLB4UHGHko3ILuP6Hl6ofvTl57G5 I5txvhoNrGf46Fa7ggELH9/WVrmQPwN/vLzUNIDSHAjnI/0FoHVMK6dwI5iYaks3fyQi ujWH/M20/iBSSaNF2W1avp/+EfBOlMeVOTMr968Ruu4F0umqi8wyYYWOP2tEURd5d73i QA== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3hm097r29m-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 28 Jul 2022 19:05:10 +0000 Received: from m0098417.ppops.net (m0098417.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 26SJ59g7017815; Thu, 28 Jul 2022 19:05:09 GMT Received: from ppma02dal.us.ibm.com (a.bd.3ea9.ip4.static.sl-reverse.com [169.62.189.10]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3hm097r28t-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 28 Jul 2022 19:05:09 +0000 Received: from pps.filterd (ppma02dal.us.ibm.com [127.0.0.1]) by ppma02dal.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 26SJ2CqJ018209; Thu, 28 Jul 2022 19:05:08 GMT Received: from b03cxnp08025.gho.boulder.ibm.com (b03cxnp08025.gho.boulder.ibm.com [9.17.130.17]) by ppma02dal.us.ibm.com with ESMTP id 3hhfpj63fs-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 28 Jul 2022 19:05:08 +0000 Received: from b03ledav003.gho.boulder.ibm.com (b03ledav003.gho.boulder.ibm.com [9.17.130.234]) by b03cxnp08025.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 26SJ57JL33096054 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 28 Jul 2022 19:05:07 GMT Received: from b03ledav003.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 4025F6A04D; Thu, 28 Jul 2022 19:05:07 +0000 (GMT) Received: from b03ledav003.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 8E6E26A047; Thu, 28 Jul 2022 19:05:01 +0000 (GMT) Received: from skywalker.ibmuc.com (unknown [9.43.25.218]) by b03ledav003.gho.boulder.ibm.com (Postfix) with ESMTP; Thu, 28 Jul 2022 19:05:01 +0000 (GMT) From: "Aneesh Kumar K.V" To: linux-mm@kvack.org, akpm@linux-foundation.org Cc: Wei Xu , Huang Ying , Yang Shi , Davidlohr Bueso , Tim C Chen , Michal Hocko , Linux Kernel Mailing List , Hesham Almatary , Dave Hansen , Jonathan Cameron , Alistair Popple , Dan Williams , Johannes Weiner , jvgediya.oss@gmail.com, "Aneesh Kumar K.V" Subject: [PATCH v11 2/8] mm/demotion: Move memory demotion related code Date: Fri, 29 Jul 2022 00:34:30 +0530 Message-Id: <20220728190436.858458-3-aneesh.kumar@linux.ibm.com> X-Mailer: git-send-email 2.37.1 In-Reply-To: <20220728190436.858458-1-aneesh.kumar@linux.ibm.com> References: <20220728190436.858458-1-aneesh.kumar@linux.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: NPjpWOj_lKAFhxSJdIOxsJojmKM3kTX1 X-Proofpoint-ORIG-GUID: QaVtC0VuAJfO0msP6BNjLRcmLMMFRFjv X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-28_06,2022-07-28_02,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxscore=0 phishscore=0 adultscore=0 impostorscore=0 suspectscore=0 mlxlogscore=999 lowpriorityscore=0 clxscore=1015 bulkscore=0 priorityscore=1501 malwarescore=0 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2206140000 definitions=main-2207280086 ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1659035124; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=MYjCoNay1r368gChTs21KO0Bpf66/k6EEnGmtXI3wP4=; b=OLQLq6EVoko9ugixaai2rLZyVl7soKn1yZdgvtBD4xXIq1rViLDOwBbvYeatP+TQeK/XV1 HtHqTkIOqu5HAbYk5RxT/ktdQbg64QnrsLAmHDDAlw3Rj6cQSsxYHL1b+IleLyI/yP3g9A ss0eaGPkHPeoEJQHU8rha7snWl2QnAc= ARC-Authentication-Results: i=1; imf31.hostedemail.com; dkim=pass header.d=ibm.com header.s=pp1 header.b=CX5jxO0c; spf=pass (imf31.hostedemail.com: domain of aneesh.kumar@linux.ibm.com designates 148.163.158.5 as permitted sender) smtp.mailfrom=aneesh.kumar@linux.ibm.com; dmarc=pass (policy=none) header.from=ibm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1659035124; a=rsa-sha256; cv=none; b=5tOoSMHGL/pz2mQ4qMDIbP2n1KTD/u4eLDYnTx86krzh9Gyap2DUO98Do9upTNiITDN9Ja D5naCR1MuRSrDKHNWr7Vd9I209qwEeWWb1jaFGx9D6Xe5dQs2bZL7L5PG1FYokbXTDvjAM d7miJLxu6an9xMetslDv2vIgP2B4EW8= X-Stat-Signature: 6qkfaaqag368g3p6phc9nzri7qtyions X-Rspamd-Queue-Id: 830FD200AC X-Rspam-User: Authentication-Results: imf31.hostedemail.com; dkim=pass header.d=ibm.com header.s=pp1 header.b=CX5jxO0c; spf=pass (imf31.hostedemail.com: domain of aneesh.kumar@linux.ibm.com designates 148.163.158.5 as permitted sender) smtp.mailfrom=aneesh.kumar@linux.ibm.com; dmarc=pass (policy=none) header.from=ibm.com X-Rspamd-Server: rspam06 X-HE-Tag: 1659035124-700184 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: This move memory demotion related code to mm/memory-tiers.c. No functional change in this patch. Signed-off-by: Aneesh Kumar K.V --- include/linux/memory-tiers.h | 8 +++++ include/linux/migrate.h | 2 -- mm/memory-tiers.c | 62 ++++++++++++++++++++++++++++++++++++ mm/migrate.c | 60 +--------------------------------- mm/vmscan.c | 1 + 5 files changed, 72 insertions(+), 61 deletions(-) diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h index 8d7884b7a3f0..b85901c0caba 100644 --- a/include/linux/memory-tiers.h +++ b/include/linux/memory-tiers.h @@ -14,4 +14,12 @@ /* leave one tier below this slow pmem */ #define MEMTIER_ADISTANCE_PMEM (1 << MEMTIER_CHUNK_BITS) +#ifdef CONFIG_NUMA +#include +extern bool numa_demotion_enabled; + +#else + +#define numa_demotion_enabled false +#endif /* CONFIG_NUMA */ #endif /* _LINUX_MEMORY_TIERS_H */ diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 069a89e847f3..43e737215f33 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -78,7 +78,6 @@ static inline int migrate_huge_page_move_mapping(struct address_space *mapping, #if defined(CONFIG_MIGRATION) && defined(CONFIG_NUMA) extern void set_migration_target_nodes(void); extern void migrate_on_reclaim_init(void); -extern bool numa_demotion_enabled; extern int next_demotion_node(int node); #else static inline void set_migration_target_nodes(void) {} @@ -87,7 +86,6 @@ static inline int next_demotion_node(int node) { return NUMA_NO_NODE; } -#define numa_demotion_enabled false #endif #ifdef CONFIG_COMPACTION diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c index 01cfd514c192..03e43f3dc942 100644 --- a/mm/memory-tiers.c +++ b/mm/memory-tiers.c @@ -100,3 +100,65 @@ static int __init memory_tier_init(void) return 0; } subsys_initcall(memory_tier_init); + +bool numa_demotion_enabled = false; + +#ifdef CONFIG_MIGRATION +#ifdef CONFIG_SYSFS +static ssize_t numa_demotion_enabled_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + return sysfs_emit(buf, "%s\n", + numa_demotion_enabled ? "true" : "false"); +} + +static ssize_t numa_demotion_enabled_store(struct kobject *kobj, + struct kobj_attribute *attr, + const char *buf, size_t count) +{ + ssize_t ret; + + ret = kstrtobool(buf, &numa_demotion_enabled); + if (ret) + return ret; + + return count; +} + +static struct kobj_attribute numa_demotion_enabled_attr = + __ATTR(demotion_enabled, 0644, numa_demotion_enabled_show, + numa_demotion_enabled_store); + +static struct attribute *numa_attrs[] = { + &numa_demotion_enabled_attr.attr, + NULL, +}; + +static const struct attribute_group numa_attr_group = { + .attrs = numa_attrs, +}; + +static int __init numa_init_sysfs(void) +{ + int err; + struct kobject *numa_kobj; + + numa_kobj = kobject_create_and_add("numa", mm_kobj); + if (!numa_kobj) { + pr_err("failed to create numa kobject\n"); + return -ENOMEM; + } + err = sysfs_create_group(numa_kobj, &numa_attr_group); + if (err) { + pr_err("failed to register numa group\n"); + goto delete_obj; + } + return 0; + +delete_obj: + kobject_put(numa_kobj); + return err; +} +subsys_initcall(numa_init_sysfs); +#endif /* CONFIG_SYSFS */ +#endif diff --git a/mm/migrate.c b/mm/migrate.c index 6c1ea61f39d8..fce7d4a9e940 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -2509,64 +2509,6 @@ void __init migrate_on_reclaim_init(void) set_migration_target_nodes(); cpus_read_unlock(); } +#endif /* CONFIG_NUMA */ -bool numa_demotion_enabled = false; - -#ifdef CONFIG_SYSFS -static ssize_t numa_demotion_enabled_show(struct kobject *kobj, - struct kobj_attribute *attr, char *buf) -{ - return sysfs_emit(buf, "%s\n", - numa_demotion_enabled ? "true" : "false"); -} - -static ssize_t numa_demotion_enabled_store(struct kobject *kobj, - struct kobj_attribute *attr, - const char *buf, size_t count) -{ - ssize_t ret; - - ret = kstrtobool(buf, &numa_demotion_enabled); - if (ret) - return ret; - - return count; -} - -static struct kobj_attribute numa_demotion_enabled_attr = - __ATTR(demotion_enabled, 0644, numa_demotion_enabled_show, - numa_demotion_enabled_store); - -static struct attribute *numa_attrs[] = { - &numa_demotion_enabled_attr.attr, - NULL, -}; - -static const struct attribute_group numa_attr_group = { - .attrs = numa_attrs, -}; - -static int __init numa_init_sysfs(void) -{ - int err; - struct kobject *numa_kobj; - numa_kobj = kobject_create_and_add("numa", mm_kobj); - if (!numa_kobj) { - pr_err("failed to create numa kobject\n"); - return -ENOMEM; - } - err = sysfs_create_group(numa_kobj, &numa_attr_group); - if (err) { - pr_err("failed to register numa group\n"); - goto delete_obj; - } - return 0; - -delete_obj: - kobject_put(numa_kobj); - return err; -} -subsys_initcall(numa_init_sysfs); -#endif /* CONFIG_SYSFS */ -#endif /* CONFIG_NUMA */ diff --git a/mm/vmscan.c b/mm/vmscan.c index f7d9a683e3a7..3a8f78277f99 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -50,6 +50,7 @@ #include #include #include +#include #include #include From patchwork Thu Jul 28 19:04:31 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Aneesh Kumar K.V" X-Patchwork-Id: 12931685 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7C111C19F2B for ; Thu, 28 Jul 2022 19:05:26 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B00D88E0001; Thu, 28 Jul 2022 15:05:25 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A661D6B0073; Thu, 28 Jul 2022 15:05:25 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7F5688E0001; Thu, 28 Jul 2022 15:05:25 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 6FF0A6B0072 for ; Thu, 28 Jul 2022 15:05:25 -0400 (EDT) Received: from smtpin05.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 8422F14158C for ; Thu, 28 Jul 2022 19:05:24 +0000 (UTC) X-FDA: 79737436968.05.3080528 Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by imf07.hostedemail.com (Postfix) with ESMTP id 216F340097 for ; Thu, 28 Jul 2022 19:05:23 +0000 (UTC) Received: from pps.filterd (m0127361.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26SIqXCu032622; Thu, 28 Jul 2022 19:05:16 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pp1; bh=lFenJdsVELNWeajF2epdtoWy5bSL9jInArYw6QYM9Rc=; b=siMfNBHbU0L8W3O0l3aNeqfenbmGUr9Q1LDaGIFLYYfP2qKvEZ3x6EuHY0RSSQBSD6sp jbxN+I8ap/g/P9NHUgtiJ7KICwbIm4sAlTGJjDIq2FCfWu272HO/+cg5vhM6/7XKygEb 7V3Kv0P9E1Hx9zrxgmGCuCRfhv8q+PP3Yqagph01Z7pUy840q35NI7zt25gcT2olV5KM /F1fDgHjlAngtVORZbRxN9gJyVuBcthDAC43OPXbvs9PCQM2144ARRZMQLYitwb2tHf4 OZgSOZpsCDNLYyHDy7+OrB16e1Hn45T/cwUWjkU+fhhChr902KdOwxZMsIgblPF3R7pD wg== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3hm04c0brc-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 28 Jul 2022 19:05:15 +0000 Received: from m0127361.ppops.net (m0127361.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 26SIr2eC035826; Thu, 28 Jul 2022 19:05:15 GMT Received: from ppma04wdc.us.ibm.com (1a.90.2fa9.ip4.static.sl-reverse.com [169.47.144.26]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3hm04c0bqn-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 28 Jul 2022 19:05:15 +0000 Received: from pps.filterd (ppma04wdc.us.ibm.com [127.0.0.1]) by ppma04wdc.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 26SJ2VAk017301; Thu, 28 Jul 2022 19:05:14 GMT Received: from b03cxnp07029.gho.boulder.ibm.com (b03cxnp07029.gho.boulder.ibm.com [9.17.130.16]) by ppma04wdc.us.ibm.com with ESMTP id 3hg94b196e-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 28 Jul 2022 19:05:14 +0000 Received: from b03ledav003.gho.boulder.ibm.com (b03ledav003.gho.boulder.ibm.com [9.17.130.234]) by b03cxnp07029.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 26SJ5DXf36962772 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 28 Jul 2022 19:05:13 GMT Received: from b03ledav003.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 7C59C6A057; Thu, 28 Jul 2022 19:05:13 +0000 (GMT) Received: from b03ledav003.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id CD3856A047; Thu, 28 Jul 2022 19:05:07 +0000 (GMT) Received: from skywalker.ibmuc.com (unknown [9.43.25.218]) by b03ledav003.gho.boulder.ibm.com (Postfix) with ESMTP; Thu, 28 Jul 2022 19:05:07 +0000 (GMT) From: "Aneesh Kumar K.V" To: linux-mm@kvack.org, akpm@linux-foundation.org Cc: Wei Xu , Huang Ying , Yang Shi , Davidlohr Bueso , Tim C Chen , Michal Hocko , Linux Kernel Mailing List , Hesham Almatary , Dave Hansen , Jonathan Cameron , Alistair Popple , Dan Williams , Johannes Weiner , jvgediya.oss@gmail.com, "Aneesh Kumar K.V" Subject: [PATCH v11 3/8] mm/demotion: Add hotplug callbacks to handle new numa node onlined Date: Fri, 29 Jul 2022 00:34:31 +0530 Message-Id: <20220728190436.858458-4-aneesh.kumar@linux.ibm.com> X-Mailer: git-send-email 2.37.1 In-Reply-To: <20220728190436.858458-1-aneesh.kumar@linux.ibm.com> References: <20220728190436.858458-1-aneesh.kumar@linux.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: r1htYXv7501eBYj7l7SnioIOfDv-vXHi X-Proofpoint-GUID: Nat__jd6zEOrK0m-YntgAhvHpM2kA7gg X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-28_06,2022-07-28_02,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 lowpriorityscore=0 clxscore=1011 impostorscore=0 mlxlogscore=999 adultscore=0 phishscore=0 priorityscore=1501 suspectscore=0 bulkscore=0 spamscore=0 malwarescore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2206140000 definitions=main-2207280086 ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1659035124; a=rsa-sha256; cv=none; b=uzdrRCoqBdWpOcb8zWiQW6AP1AXVcC4HFNmP+xC6cvYgvFD3M2d8buEr/jfcrlLCL4O3FV 28Q4Wrr/r1eU0QG3JU+oqXgLcnmpyGj1gNbPidN3eZIDQruQnZpSpDiVKEy/ujU3ufdKkE z7ps4/9eblUR+JhDVCaWT2bhZwfg39E= ARC-Authentication-Results: i=1; imf07.hostedemail.com; dkim=pass header.d=ibm.com header.s=pp1 header.b=siMfNBHb; spf=pass (imf07.hostedemail.com: domain of aneesh.kumar@linux.ibm.com designates 148.163.158.5 as permitted sender) smtp.mailfrom=aneesh.kumar@linux.ibm.com; dmarc=pass (policy=none) header.from=ibm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1659035124; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=lFenJdsVELNWeajF2epdtoWy5bSL9jInArYw6QYM9Rc=; b=u3djMB4I+chwGNm5mhHAGYo2m2edeVkXmQfuGZeuhctmzKISpmDWHOfEd4fyB8B3XQwK7l ETDVKUSeL7S2LvtvrWuM6wJrAVJhlaYrnkg7VHJhDeneoNdvL7DPmgf/7d0MpT+E7coH2F XDsb8L5X0UOSV9qhTO2Fyy3cl1nboK4= Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=ibm.com header.s=pp1 header.b=siMfNBHb; spf=pass (imf07.hostedemail.com: domain of aneesh.kumar@linux.ibm.com designates 148.163.158.5 as permitted sender) smtp.mailfrom=aneesh.kumar@linux.ibm.com; dmarc=pass (policy=none) header.from=ibm.com X-Rspam-User: X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 216F340097 X-Stat-Signature: yaasx5rkmeyt89fxxcnywd46sfzt5ray X-HE-Tag: 1659035123-88042 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: If the new NUMA node onlined doesn't have a abstract distance assigned, the kernel adds the NUMA node to default memory tier. Signed-off-by: Aneesh Kumar K.V --- include/linux/memory-tiers.h | 1 + mm/memory-tiers.c | 83 ++++++++++++++++++++++++++++++++++++ 2 files changed, 84 insertions(+) diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h index b85901c0caba..976f43a5e3be 100644 --- a/include/linux/memory-tiers.h +++ b/include/linux/memory-tiers.h @@ -13,6 +13,7 @@ #define MEMTIER_ADISTANCE_DRAM (1 << (MEMTIER_CHUNK_BITS + 2)) /* leave one tier below this slow pmem */ #define MEMTIER_ADISTANCE_PMEM (1 << MEMTIER_CHUNK_BITS) +#define MEMTIER_HOTPLUG_PRIO 100 #ifdef CONFIG_NUMA #include diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c index 03e43f3dc942..c9854a394d9b 100644 --- a/mm/memory-tiers.c +++ b/mm/memory-tiers.c @@ -3,6 +3,7 @@ #include #include #include +#include #include struct memory_tier { @@ -83,6 +84,87 @@ static struct memory_tier *find_create_memory_tier(struct memory_dev_type *memty return new_memtier; } +static struct memory_tier *__node_get_memory_tier(int node) +{ + struct memory_dev_type *memtype; + + memtype = node_memory_types[node]; + if (memtype) + return memtype->memtier; + return NULL; +} + +static void init_node_memory_tier(int node) +{ + struct memory_tier *memtier; + + mutex_lock(&memory_tier_lock); + + memtier = __node_get_memory_tier(node); + if (!memtier) { + struct memory_dev_type *memtype; + + if (!node_memory_types[node]) { + node_memory_types[node] = &default_dram_type; + node_set(node, default_dram_type.nodes); + } + memtype = node_memory_types[node]; + memtier = find_create_memory_tier(memtype); + } + mutex_unlock(&memory_tier_lock); +} + +static void destroy_memory_tier(struct memory_tier *memtier) +{ + list_del(&memtier->list); + kfree(memtier); +} + +static void clear_node_memory_tier(int node) +{ + struct memory_tier *current_memtier; + + mutex_lock(&memory_tier_lock); + current_memtier = __node_get_memory_tier(node); + if (current_memtier) { + struct memory_dev_type *memtype; + + memtype = node_memory_types[node]; + node_clear(node, memtype->nodes); + if (nodes_empty(memtype->nodes)) { + list_del(&memtype->tier_sibiling); + memtype->memtier = NULL; + if (list_empty(¤t_memtier->memory_types)) + destroy_memory_tier(current_memtier); + } + } + mutex_unlock(&memory_tier_lock); +} + +static int __meminit memtier_hotplug_callback(struct notifier_block *self, + unsigned long action, void *_arg) +{ + struct memory_notify *arg = _arg; + + /* + * Only update the node migration order when a node is + * changing status, like online->offline. + */ + if (arg->status_change_nid < 0) + return notifier_from_errno(0); + + switch (action) { + case MEM_OFFLINE: + clear_node_memory_tier(arg->status_change_nid); + break; + case MEM_ONLINE: + init_node_memory_tier(arg->status_change_nid); + break; + } + + return notifier_from_errno(0); +} + static int __init memory_tier_init(void) { struct memory_tier *memtier; @@ -97,6 +179,7 @@ static int __init memory_tier_init(void) __func__, PTR_ERR(memtier)); mutex_unlock(&memory_tier_lock); + hotplug_memory_notifier(memtier_hotplug_callback, MEMTIER_HOTPLUG_PRIO); return 0; } subsys_initcall(memory_tier_init); From patchwork Thu Jul 28 19:04:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Aneesh Kumar K.V" X-Patchwork-Id: 12931690 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9BFEFC04A68 for ; Thu, 28 Jul 2022 19:06:18 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 38A096B0071; Thu, 28 Jul 2022 15:06:18 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 339726B0072; Thu, 28 Jul 2022 15:06:18 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1DA7E8E0001; Thu, 28 Jul 2022 15:06:18 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 0BFF46B0071 for ; Thu, 28 Jul 2022 15:06:18 -0400 (EDT) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id E5BF9A0CE2 for ; Thu, 28 Jul 2022 19:06:17 +0000 (UTC) X-FDA: 79737439194.17.3B1DBC4 Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by imf08.hostedemail.com (Postfix) with ESMTP id 3DC081600A4 for ; Thu, 28 Jul 2022 19:06:17 +0000 (UTC) Received: from pps.filterd (m0098409.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26SIlmNv018629; Thu, 28 Jul 2022 19:06:07 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pp1; bh=donn9f9OyDlvKkxd3kbeRn/TBljkOOfnH5TovPnfwlU=; b=Biz6XjWRMlzVT+5hy1MPkRbbmmK0LpR2iqgRHB6PE1aVse1xoAjMgQlBAQwRGnUCbm8u lgIzQf5oMLBduONPfDm2Ek0QeDYwd2qi7MufTmV6L90DMAAgi3VYFfqt7mNHwedTgfem Hd1kqa7ByUFMF40Kyr8RVU1ZR/5zCddhxHH5Pcv3zHhOeHq6BX9jkJ4QZ7BTcoUmytKl aaVbC4g0E77esld1DAsfVcysqa8hRe6i7aIEX20fnnrLEl8bLGef3WVmGowK8rHx3Dsk zqOB2VYk75C8jQd7CWMJ88yDTe3Qc1pFGIRDwK8QIrDH7UJYJz3Ce1Q/UPKAJAkeOOFX VA== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3hm0238suv-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 28 Jul 2022 19:06:06 +0000 Received: from m0098409.ppops.net (m0098409.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 26SIpDKe029798; Thu, 28 Jul 2022 19:05:34 GMT Received: from ppma01dal.us.ibm.com (83.d6.3fa9.ip4.static.sl-reverse.com [169.63.214.131]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3hm0238s15-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 28 Jul 2022 19:05:34 +0000 Received: from pps.filterd (ppma01dal.us.ibm.com [127.0.0.1]) by ppma01dal.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 26SJ5HZd003793; Thu, 28 Jul 2022 19:05:21 GMT Received: from b03cxnp07027.gho.boulder.ibm.com (b03cxnp07027.gho.boulder.ibm.com [9.17.130.14]) by ppma01dal.us.ibm.com with ESMTP id 3hg98shehf-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 28 Jul 2022 19:05:20 +0000 Received: from b03ledav003.gho.boulder.ibm.com (b03ledav003.gho.boulder.ibm.com [9.17.130.234]) by b03cxnp07027.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 26SJ5JQV34931180 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 28 Jul 2022 19:05:19 GMT Received: from b03ledav003.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id C3C8F6A047; Thu, 28 Jul 2022 19:05:19 +0000 (GMT) Received: from b03ledav003.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 1D53B6A04D; Thu, 28 Jul 2022 19:05:14 +0000 (GMT) Received: from skywalker.ibmuc.com (unknown [9.43.25.218]) by b03ledav003.gho.boulder.ibm.com (Postfix) with ESMTP; Thu, 28 Jul 2022 19:05:13 +0000 (GMT) From: "Aneesh Kumar K.V" To: linux-mm@kvack.org, akpm@linux-foundation.org Cc: Wei Xu , Huang Ying , Yang Shi , Davidlohr Bueso , Tim C Chen , Michal Hocko , Linux Kernel Mailing List , Hesham Almatary , Dave Hansen , Jonathan Cameron , Alistair Popple , Dan Williams , Johannes Weiner , jvgediya.oss@gmail.com, "Aneesh Kumar K.V" Subject: [PATCH v11 4/8] mm/demotion/dax/kmem: Set node's abstract distance to MEMTIER_ADISTANCE_PMEM Date: Fri, 29 Jul 2022 00:34:32 +0530 Message-Id: <20220728190436.858458-5-aneesh.kumar@linux.ibm.com> X-Mailer: git-send-email 2.37.1 In-Reply-To: <20220728190436.858458-1-aneesh.kumar@linux.ibm.com> References: <20220728190436.858458-1-aneesh.kumar@linux.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: bfrT-tiO2ZblNXq2-UMQiu6skOp8YqYG X-Proofpoint-ORIG-GUID: Ypt4IbU8PiVovesu1XT-nRNVpzAaJ7hM X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-28_06,2022-07-28_02,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 bulkscore=0 impostorscore=0 suspectscore=0 malwarescore=0 adultscore=0 phishscore=0 spamscore=0 clxscore=1015 mlxlogscore=999 mlxscore=0 priorityscore=1501 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2206140000 definitions=main-2207280086 ARC-Authentication-Results: i=1; imf08.hostedemail.com; dkim=pass header.d=ibm.com header.s=pp1 header.b=Biz6XjWR; spf=pass (imf08.hostedemail.com: domain of aneesh.kumar@linux.ibm.com designates 148.163.156.1 as permitted sender) smtp.mailfrom=aneesh.kumar@linux.ibm.com; dmarc=pass (policy=none) header.from=ibm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1659035177; a=rsa-sha256; cv=none; b=gku5VFENkw/1nX1+z1q92KurhHJlct/hWh1/dL/YrU3gWT7c9mwQOMCqujjahPPQoowvau CVELRrfgAkfzqx6tgfG6wxKpds7Lijsa/QYhxUi7vA97owv6sUyMBtkIPamJuipb6mg8NJ qx5SekqLWtTTpf8yvKbQJNXtuGH/t0s= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1659035177; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=donn9f9OyDlvKkxd3kbeRn/TBljkOOfnH5TovPnfwlU=; b=dbp5XVqxwjjfPsE+NYugZDSjh7l1GXGKxLuOhR98yUbp6rnb3HJXlFQEz/5Fo5SFM0dY5R 3rJyQzKaOVR3sVJ6ekOnhPPLv6S3C8nzbtQtbhUNxMYfDUmQSs1su4/b87jiZrxAOCtdvL kCqvmqsIA1KCdjA3LAMaiSdjp5pAWmc= X-Stat-Signature: uju86bzbyi8sjocw6ems9yrbbuob7fw7 X-Rspamd-Queue-Id: 3DC081600A4 X-Rspam-User: Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=ibm.com header.s=pp1 header.b=Biz6XjWR; spf=pass (imf08.hostedemail.com: domain of aneesh.kumar@linux.ibm.com designates 148.163.156.1 as permitted sender) smtp.mailfrom=aneesh.kumar@linux.ibm.com; dmarc=pass (policy=none) header.from=ibm.com X-Rspamd-Server: rspam02 X-HE-Tag: 1659035177-807167 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: By default, all nodes are assigned to the default memory tier which is the memory tier designated for nodes with DRAM Set dax kmem device node's tier to slower memory tier by assigning abstract distance to MEMTIER_ADISTANCE_PMEM. PMEM tier appears below the default memory tier in demotion order. Signed-off-by: Aneesh Kumar K.V --- drivers/dax/kmem.c | 9 +++++++++ include/linux/memory-tiers.h | 19 ++++++++++++++++++- mm/memory-tiers.c | 28 ++++++++++++++++------------ 3 files changed, 43 insertions(+), 13 deletions(-) diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c index a37622060fff..6b0d5de9a3e9 100644 --- a/drivers/dax/kmem.c +++ b/drivers/dax/kmem.c @@ -11,6 +11,7 @@ #include #include #include +#include #include "dax-private.h" #include "bus.h" @@ -41,6 +42,12 @@ struct dax_kmem_data { struct resource *res[]; }; +static struct memory_dev_type default_pmem_type = { + .adistance = MEMTIER_ADISTANCE_PMEM, + .tier_sibiling = LIST_HEAD_INIT(default_pmem_type.tier_sibiling), + .nodes = NODE_MASK_NONE, +}; + static int dev_dax_kmem_probe(struct dev_dax *dev_dax) { struct device *dev = &dev_dax->dev; @@ -62,6 +69,8 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax) return -EINVAL; } + init_node_memory_type(numa_node, &default_pmem_type); + for (i = 0; i < dev_dax->nr_range; i++) { struct range range; diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h index 976f43a5e3be..4f4baf0bf430 100644 --- a/include/linux/memory-tiers.h +++ b/include/linux/memory-tiers.h @@ -2,6 +2,8 @@ #ifndef _LINUX_MEMORY_TIERS_H #define _LINUX_MEMORY_TIERS_H +#include +#include /* * Each tier cover a abstrace distance chunk size of 128 */ @@ -15,12 +17,27 @@ #define MEMTIER_ADISTANCE_PMEM (1 << MEMTIER_CHUNK_BITS) #define MEMTIER_HOTPLUG_PRIO 100 +struct memory_tier; +struct memory_dev_type { + /* list of memory types that are are part of same tier as this type */ + struct list_head tier_sibiling; + /* abstract distance for this specific memory type */ + int adistance; + /* Nodes of same abstract distance */ + nodemask_t nodes; + struct memory_tier *memtier; +}; + #ifdef CONFIG_NUMA -#include extern bool numa_demotion_enabled; +struct memory_dev_type *init_node_memory_type(int node, struct memory_dev_type *default_type); #else #define numa_demotion_enabled false +static inline struct memory_dev_type *init_node_memory_type(int node, struct memory_dev_type *default_type) +{ + return ERR_PTR(-EINVAL); +} #endif /* CONFIG_NUMA */ #endif /* _LINUX_MEMORY_TIERS_H */ diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c index c9854a394d9b..109be75fa554 100644 --- a/mm/memory-tiers.c +++ b/mm/memory-tiers.c @@ -1,6 +1,4 @@ // SPDX-License-Identifier: GPL-2.0 -#include -#include #include #include #include @@ -19,16 +17,6 @@ struct memory_tier { int adistance_start; }; -struct memory_dev_type { - /* list of memory types that are are part of same tier as this type */ - struct list_head tier_sibiling; - /* abstract distance for this specific memory type */ - int adistance; - /* Nodes of same abstract distance */ - nodemask_t nodes; - struct memory_tier *memtier; -}; - static DEFINE_MUTEX(memory_tier_lock); static LIST_HEAD(memory_tiers); struct memory_dev_type *node_memory_types[MAX_NUMNODES]; @@ -141,6 +129,22 @@ static void clear_node_memory_tier(int node) mutex_unlock(&memory_tier_lock); } +struct memory_dev_type *init_node_memory_type(int node, struct memory_dev_type *default_type) +{ + struct memory_dev_type *mem_type; + + mutex_lock(&memory_tier_lock); + if (node_memory_types[node]) { + mem_type = node_memory_types[node]; + } else { + node_memory_types[node] = default_type; + node_set(node, default_type->nodes); + mem_type = default_type; + } + mutex_unlock(&memory_tier_lock); + return mem_type; +} + static int __meminit memtier_hotplug_callback(struct notifier_block *self, unsigned long action, void *_arg) { From patchwork Thu Jul 28 19:04:33 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Aneesh Kumar K.V" X-Patchwork-Id: 12931687 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8846EC04A68 for ; Thu, 28 Jul 2022 19:05:47 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 13FA7940007; Thu, 28 Jul 2022 15:05:47 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 0F0336B0074; Thu, 28 Jul 2022 15:05:47 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E8543940007; Thu, 28 Jul 2022 15:05:46 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id D445A6B0073 for ; Thu, 28 Jul 2022 15:05:46 -0400 (EDT) Received: from smtpin29.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id A8BC114149C for ; Thu, 28 Jul 2022 19:05:46 +0000 (UTC) X-FDA: 79737437892.29.D9FF832 Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by imf22.hostedemail.com (Postfix) with ESMTP id F063EC00AB for ; Thu, 28 Jul 2022 19:05:45 +0000 (UTC) Received: from pps.filterd (m0127361.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26SIqX4b032610; Thu, 28 Jul 2022 19:05:29 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pp1; bh=Yd0ONWV2rdQalzkAFizGF53MEVDee7WBHmYDVpCJLzw=; b=A3q+Si7ikFmfpuZpp3r9TLdf60VoupCngVJURKdTMyF2qmXnkvF1GaNsXeUjhT08dXx8 s+GUANxD2G3IWqjgM3BCBP828AP1x0UZkMr4Q3q2MQYsh2WGqvbdqW+q3RR2ttyBDJHv YEZOhZ2fz+HUUVrqwgyMX5aqM9Cbl0C8ee6N3AcGFeYM0/WZpMlIWmdNEl5CgL4o2QBu m/IB1HrDgR4XDVJd0DViWifCJHPRSu8Wl4U6ynZrw34Whfxbuph7aShczplHzjEisnkL C+26jEAYyYS0McWTK6LBTOS9Fnhy4Wb8QeGcTH0oaGjPcP4g2v4R/77P85AsqABvQRVa jw== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3hm04c0byv-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 28 Jul 2022 19:05:29 +0000 Received: from m0127361.ppops.net (m0127361.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 26SItKOi006868; Thu, 28 Jul 2022 19:05:28 GMT Received: from ppma01dal.us.ibm.com (83.d6.3fa9.ip4.static.sl-reverse.com [169.63.214.131]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3hm04c0bxy-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 28 Jul 2022 19:05:28 +0000 Received: from pps.filterd (ppma01dal.us.ibm.com [127.0.0.1]) by ppma01dal.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 26SJ5FB0003779; Thu, 28 Jul 2022 19:05:27 GMT Received: from b03cxnp07027.gho.boulder.ibm.com (b03cxnp07027.gho.boulder.ibm.com [9.17.130.14]) by ppma01dal.us.ibm.com with ESMTP id 3hg98shej3-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 28 Jul 2022 19:05:27 +0000 Received: from b03ledav003.gho.boulder.ibm.com (b03ledav003.gho.boulder.ibm.com [9.17.130.234]) by b03cxnp07027.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 26SJ5QEC42271188 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 28 Jul 2022 19:05:26 GMT Received: from b03ledav003.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 1901D6A054; Thu, 28 Jul 2022 19:05:26 +0000 (GMT) Received: from b03ledav003.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 4E7056A047; Thu, 28 Jul 2022 19:05:20 +0000 (GMT) Received: from skywalker.ibmuc.com (unknown [9.43.25.218]) by b03ledav003.gho.boulder.ibm.com (Postfix) with ESMTP; Thu, 28 Jul 2022 19:05:20 +0000 (GMT) From: "Aneesh Kumar K.V" To: linux-mm@kvack.org, akpm@linux-foundation.org Cc: Wei Xu , Huang Ying , Yang Shi , Davidlohr Bueso , Tim C Chen , Michal Hocko , Linux Kernel Mailing List , Hesham Almatary , Dave Hansen , Jonathan Cameron , Alistair Popple , Dan Williams , Johannes Weiner , jvgediya.oss@gmail.com, "Aneesh Kumar K.V" Subject: [PATCH v11 5/8] mm/demotion: Build demotion targets based on explicit memory tiers Date: Fri, 29 Jul 2022 00:34:33 +0530 Message-Id: <20220728190436.858458-6-aneesh.kumar@linux.ibm.com> X-Mailer: git-send-email 2.37.1 In-Reply-To: <20220728190436.858458-1-aneesh.kumar@linux.ibm.com> References: <20220728190436.858458-1-aneesh.kumar@linux.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: CUfXlIzMMiPTJM6mnfiiqUwBgysuMfe_ X-Proofpoint-GUID: Orl8hREgKewHs6juwzyt_Fqcwrvjauzp X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-28_06,2022-07-28_02,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 lowpriorityscore=0 clxscore=1015 impostorscore=0 mlxlogscore=999 adultscore=0 phishscore=0 priorityscore=1501 suspectscore=0 bulkscore=0 spamscore=0 malwarescore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2206140000 definitions=main-2207280086 ARC-Authentication-Results: i=1; imf22.hostedemail.com; dkim=pass header.d=ibm.com header.s=pp1 header.b=A3q+Si7i; spf=pass (imf22.hostedemail.com: domain of aneesh.kumar@linux.ibm.com designates 148.163.158.5 as permitted sender) smtp.mailfrom=aneesh.kumar@linux.ibm.com; dmarc=pass (policy=none) header.from=ibm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1659035146; a=rsa-sha256; cv=none; b=7+n+u2HGI/hklcsICAzksfAjCLN/ttjOri0gllsyLdQEVUTcsuPW3UgxcohoN4wk5dOUmW JvDT7V6wafxIM7CgpUO2FrBV8DhpAluDyabxvTwiGHyqaD55MNQ8ua4PtYub9AWfdLky4F 9o6M71R0nXJprE9V/LyVzy+H3WXR80o= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1659035146; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Yd0ONWV2rdQalzkAFizGF53MEVDee7WBHmYDVpCJLzw=; b=NxrbJT41EtRdS0kSCStMWdfmzdD2h31twJB2eR2jhe5FkdSjF2qQozEjjLxTBbMZxUt6Xx 0FaxYetcOcODh8poWa0LrlSpbO4f8RlgFrsPkDVOUNa9Ro/zKNScRbG92jZz/YFe4RDXri 6NPXsi/BpqV+G7ECV7jMPhHMZtNVyS4= X-Rspamd-Server: rspam10 X-Rspam-User: Authentication-Results: imf22.hostedemail.com; dkim=pass header.d=ibm.com header.s=pp1 header.b=A3q+Si7i; spf=pass (imf22.hostedemail.com: domain of aneesh.kumar@linux.ibm.com designates 148.163.158.5 as permitted sender) smtp.mailfrom=aneesh.kumar@linux.ibm.com; dmarc=pass (policy=none) header.from=ibm.com X-Stat-Signature: pmsta93patd5wxdoyshpwq1cunufnreb X-Rspamd-Queue-Id: F063EC00AB X-HE-Tag: 1659035145-275163 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: This patch switch the demotion target building logic to use memory tiers instead of NUMA distance. All N_MEMORY NUMA nodes will be placed in the default memory tier and additional memory tiers will be added by drivers like dax kmem. This patch builds the demotion target for a NUMA node by looking at all memory tiers below the tier to which the NUMA node belongs. The closest node in the immediately following memory tier is used as a demotion target. Since we are now only building demotion target for N_MEMORY NUMA nodes the CPU hotplug calls are removed in this patch. Signed-off-by: Aneesh Kumar K.V --- include/linux/memory-tiers.h | 13 ++ include/linux/migrate.h | 13 -- mm/memory-tiers.c | 221 +++++++++++++++++++- mm/migrate.c | 394 ----------------------------------- mm/vmstat.c | 4 - 5 files changed, 233 insertions(+), 412 deletions(-) diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h index 4f4baf0bf430..e56a57c6ef78 100644 --- a/include/linux/memory-tiers.h +++ b/include/linux/memory-tiers.h @@ -31,6 +31,14 @@ struct memory_dev_type { #ifdef CONFIG_NUMA extern bool numa_demotion_enabled; struct memory_dev_type *init_node_memory_type(int node, struct memory_dev_type *default_type); +#ifdef CONFIG_MIGRATION +int next_demotion_node(int node); +#else +static inline int next_demotion_node(int node) +{ + return NUMA_NO_NODE; +} +#endif #else @@ -39,5 +47,10 @@ static inline struct memory_dev_type *init_node_memory_type(int node, struct mem { return ERR_PTR(-EINVAL); } + +static inline int next_demotion_node(int node) +{ + return NUMA_NO_NODE; +} #endif /* CONFIG_NUMA */ #endif /* _LINUX_MEMORY_TIERS_H */ diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 43e737215f33..93fab62e6548 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -75,19 +75,6 @@ static inline int migrate_huge_page_move_mapping(struct address_space *mapping, #endif /* CONFIG_MIGRATION */ -#if defined(CONFIG_MIGRATION) && defined(CONFIG_NUMA) -extern void set_migration_target_nodes(void); -extern void migrate_on_reclaim_init(void); -extern int next_demotion_node(int node); -#else -static inline void set_migration_target_nodes(void) {} -static inline void migrate_on_reclaim_init(void) {} -static inline int next_demotion_node(int node) -{ - return NUMA_NO_NODE; -} -#endif - #ifdef CONFIG_COMPACTION extern int PageMovable(struct page *page); extern void __SetPageMovable(struct page *page, struct address_space *mapping); diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c index 109be75fa554..60845aa74afc 100644 --- a/mm/memory-tiers.c +++ b/mm/memory-tiers.c @@ -2,8 +2,11 @@ #include #include #include +#include #include +#include "internal.h" + struct memory_tier { /* hierarchy of memory tiers */ struct list_head list; @@ -17,9 +20,74 @@ struct memory_tier { int adistance_start; }; +struct demotion_nodes { + nodemask_t preferred; +}; + static DEFINE_MUTEX(memory_tier_lock); static LIST_HEAD(memory_tiers); struct memory_dev_type *node_memory_types[MAX_NUMNODES]; +#ifdef CONFIG_MIGRATION +/* + * node_demotion[] examples: + * + * Example 1: + * + * Node 0 & 1 are CPU + DRAM nodes, node 2 & 3 are PMEM nodes. + * + * node distances: + * node 0 1 2 3 + * 0 10 20 30 40 + * 1 20 10 40 30 + * 2 30 40 10 40 + * 3 40 30 40 10 + * + * memory_tiers0 = 0-1 + * memory_tiers1 = 2-3 + * + * node_demotion[0].preferred = 2 + * node_demotion[1].preferred = 3 + * node_demotion[2].preferred = + * node_demotion[3].preferred = + * + * Example 2: + * + * Node 0 & 1 are CPU + DRAM nodes, node 2 is memory-only DRAM node. + * + * node distances: + * node 0 1 2 + * 0 10 20 30 + * 1 20 10 30 + * 2 30 30 10 + * + * memory_tiers0 = 0-2 + * + * node_demotion[0].preferred = + * node_demotion[1].preferred = + * node_demotion[2].preferred = + * + * Example 3: + * + * Node 0 is CPU + DRAM nodes, Node 1 is HBM node, node 2 is PMEM node. + * + * node distances: + * node 0 1 2 + * 0 10 20 30 + * 1 20 10 40 + * 2 30 40 10 + * + * memory_tiers0 = 1 + * memory_tiers1 = 0 + * memory_tiers2 = 2 + * + * node_demotion[0].preferred = 2 + * node_demotion[1].preferred = 0 + * node_demotion[2].preferred = + * + */ +static struct demotion_nodes *node_demotion __read_mostly; +#endif /* CONFIG_MIGRATION */ + /* * For now let's have 4 memory tier below default DRAM tier. */ @@ -82,6 +150,144 @@ static struct memory_tier *__node_get_memory_tier(int node) return NULL; } +#ifdef CONFIG_MIGRATION +/** + * next_demotion_node() - Get the next node in the demotion path + * @node: The starting node to lookup the next node + * + * Return: node id for next memory node in the demotion path hierarchy + * from @node; NUMA_NO_NODE if @node is terminal. This does not keep + * @node online or guarantee that it *continues* to be the next demotion + * target. + */ +int next_demotion_node(int node) +{ + struct demotion_nodes *nd; + int target; + + if (!node_demotion) + return NUMA_NO_NODE; + + nd = &node_demotion[node]; + + /* + * node_demotion[] is updated without excluding this + * function from running. + * + * Make sure to use RCU over entire code blocks if + * node_demotion[] reads need to be consistent. + */ + rcu_read_lock(); + /* + * If there are multiple target nodes, just select one + * target node randomly. + * + * In addition, we can also use round-robin to select + * target node, but we should introduce another variable + * for node_demotion[] to record last selected target node, + * that may cause cache ping-pong due to the changing of + * last target node. Or introducing per-cpu data to avoid + * caching issue, which seems more complicated. So selecting + * target node randomly seems better until now. + */ + target = node_random(&nd->preferred); + rcu_read_unlock(); + + return target; +} + +static void disable_all_demotion_targets(void) +{ + int node; + + for_each_node_state(node, N_MEMORY) + node_demotion[node].preferred = NODE_MASK_NONE; + /* + * Ensure that the "disable" is visible across the system. + * Readers will see either a combination of before+disable + * state or disable+after. They will never see before and + * after state together. + */ + synchronize_rcu(); +} + +static __always_inline nodemask_t get_memtier_nodemask(struct memory_tier *memtier) +{ + nodemask_t nodes = NODE_MASK_NONE; + struct memory_dev_type *memtype; + + list_for_each_entry(memtype, &memtier->memory_types, tier_sibiling) + nodes_or(nodes, nodes, memtype->nodes); + + return nodes; +} + +/* + * Find an automatic demotion target for all memory + * nodes. Failing here is OK. It might just indicate + * being at the end of a chain. + */ +static void establish_demotion_targets(void) +{ + struct memory_tier *memtier; + struct demotion_nodes *nd; + int target = NUMA_NO_NODE, node; + int distance, best_distance; + nodemask_t tier_nodes; + + lockdep_assert_held_once(&memory_tier_lock); + + if (!node_demotion || !IS_ENABLED(CONFIG_MIGRATION)) + return; + + disable_all_demotion_targets(); + + for_each_node_state(node, N_MEMORY) { + best_distance = -1; + nd = &node_demotion[node]; + + memtier = __node_get_memory_tier(node); + if (!memtier || list_is_first(&memtier->list, &memory_tiers)) + continue; + /* + * Get the lower memtier to find the demotion node list. + */ + memtier = list_prev_entry(memtier, list); + tier_nodes = get_memtier_nodemask(memtier); + /* + * find_next_best_node, use 'used' nodemask as a skip list. + * Add all memory nodes except the selected memory tier + * nodelist to skip list so that we find the best node from the + * memtier nodelist. + */ + nodes_andnot(tier_nodes, node_states[N_MEMORY], tier_nodes); + + /* + * Find all the nodes in the memory tier node list of same best distance. + * add them to the preferred mask. We randomly select between nodes + * in the preferred mask when allocating pages during demotion. + */ + do { + target = find_next_best_node(node, &tier_nodes); + if (target == NUMA_NO_NODE) + break; + + distance = node_distance(node, target); + if (distance == best_distance || best_distance == -1) { + best_distance = distance; + node_set(target, nd->preferred); + } else { + break; + } + } while (1); + } +} + +#else +static inline void disable_all_demotion_targets(void) {} +static inline void establish_demotion_targets(void) {} +#endif /* CONFIG_MIGRATION */ + static void init_node_memory_tier(int node) { struct memory_tier *memtier; @@ -89,6 +295,13 @@ static void init_node_memory_tier(int node) mutex_lock(&memory_tier_lock); memtier = __node_get_memory_tier(node); + /* + * if node is already part of the tier proceed with the + * current tier value, because we might want to establish + * new migration paths now. The node might be added to a tier + * before it was made part of N_MEMORY, hence estabilish_demotion_targets + * will have skipped this node. + */ if (!memtier) { struct memory_dev_type *memtype; @@ -99,6 +312,7 @@ static void init_node_memory_tier(int node) memtype = node_memory_types[node]; memtier = find_create_memory_tier(memtype); } + establish_demotion_targets(); mutex_unlock(&memory_tier_lock); } @@ -125,6 +339,7 @@ static void clear_node_memory_tier(int node) if (list_empty(¤t_memtier->memory_types)) destroy_memory_tier(current_memtier); } + establish_demotion_targets(); } mutex_unlock(&memory_tier_lock); } @@ -182,7 +397,11 @@ static int __init memory_tier_init(void) panic("%s() failed to register memory tier: %ld\n", __func__, PTR_ERR(memtier)); mutex_unlock(&memory_tier_lock); - +#ifdef CONFIG_MIGRATION + node_demotion = kcalloc(MAX_NUMNODES, sizeof(struct demotion_nodes), + GFP_KERNEL); + WARN_ON(!node_demotion); +#endif hotplug_memory_notifier(memtier_hotplug_callback, MEMTIER_HOTPLUG_PRIO); return 0; } diff --git a/mm/migrate.c b/mm/migrate.c index fce7d4a9e940..c758c9c21d7d 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -2117,398 +2117,4 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma, return 0; } #endif /* CONFIG_NUMA_BALANCING */ - -/* - * node_demotion[] example: - * - * Consider a system with two sockets. Each socket has - * three classes of memory attached: fast, medium and slow. - * Each memory class is placed in its own NUMA node. The - * CPUs are placed in the node with the "fast" memory. The - * 6 NUMA nodes (0-5) might be split among the sockets like - * this: - * - * Socket A: 0, 1, 2 - * Socket B: 3, 4, 5 - * - * When Node 0 fills up, its memory should be migrated to - * Node 1. When Node 1 fills up, it should be migrated to - * Node 2. The migration path start on the nodes with the - * processors (since allocations default to this node) and - * fast memory, progress through medium and end with the - * slow memory: - * - * 0 -> 1 -> 2 -> stop - * 3 -> 4 -> 5 -> stop - * - * This is represented in the node_demotion[] like this: - * - * { nr=1, nodes[0]=1 }, // Node 0 migrates to 1 - * { nr=1, nodes[0]=2 }, // Node 1 migrates to 2 - * { nr=0, nodes[0]=-1 }, // Node 2 does not migrate - * { nr=1, nodes[0]=4 }, // Node 3 migrates to 4 - * { nr=1, nodes[0]=5 }, // Node 4 migrates to 5 - * { nr=0, nodes[0]=-1 }, // Node 5 does not migrate - * - * Moreover some systems may have multiple slow memory nodes. - * Suppose a system has one socket with 3 memory nodes, node 0 - * is fast memory type, and node 1/2 both are slow memory - * type, and the distance between fast memory node and slow - * memory node is same. So the migration path should be: - * - * 0 -> 1/2 -> stop - * - * This is represented in the node_demotion[] like this: - * { nr=2, {nodes[0]=1, nodes[1]=2} }, // Node 0 migrates to node 1 and node 2 - * { nr=0, nodes[0]=-1, }, // Node 1 dose not migrate - * { nr=0, nodes[0]=-1, }, // Node 2 does not migrate - */ - -/* - * Writes to this array occur without locking. Cycles are - * not allowed: Node X demotes to Y which demotes to X... - * - * If multiple reads are performed, a single rcu_read_lock() - * must be held over all reads to ensure that no cycles are - * observed. - */ -#define DEFAULT_DEMOTION_TARGET_NODES 15 - -#if MAX_NUMNODES < DEFAULT_DEMOTION_TARGET_NODES -#define DEMOTION_TARGET_NODES (MAX_NUMNODES - 1) -#else -#define DEMOTION_TARGET_NODES DEFAULT_DEMOTION_TARGET_NODES -#endif - -struct demotion_nodes { - unsigned short nr; - short nodes[DEMOTION_TARGET_NODES]; -}; - -static struct demotion_nodes *node_demotion __read_mostly; - -/** - * next_demotion_node() - Get the next node in the demotion path - * @node: The starting node to lookup the next node - * - * Return: node id for next memory node in the demotion path hierarchy - * from @node; NUMA_NO_NODE if @node is terminal. This does not keep - * @node online or guarantee that it *continues* to be the next demotion - * target. - */ -int next_demotion_node(int node) -{ - struct demotion_nodes *nd; - unsigned short target_nr, index; - int target; - - if (!node_demotion) - return NUMA_NO_NODE; - - nd = &node_demotion[node]; - - /* - * node_demotion[] is updated without excluding this - * function from running. RCU doesn't provide any - * compiler barriers, so the READ_ONCE() is required - * to avoid compiler reordering or read merging. - * - * Make sure to use RCU over entire code blocks if - * node_demotion[] reads need to be consistent. - */ - rcu_read_lock(); - target_nr = READ_ONCE(nd->nr); - - switch (target_nr) { - case 0: - target = NUMA_NO_NODE; - goto out; - case 1: - index = 0; - break; - default: - /* - * If there are multiple target nodes, just select one - * target node randomly. - * - * In addition, we can also use round-robin to select - * target node, but we should introduce another variable - * for node_demotion[] to record last selected target node, - * that may cause cache ping-pong due to the changing of - * last target node. Or introducing per-cpu data to avoid - * caching issue, which seems more complicated. So selecting - * target node randomly seems better until now. - */ - index = get_random_int() % target_nr; - break; - } - - target = READ_ONCE(nd->nodes[index]); - -out: - rcu_read_unlock(); - return target; -} - -/* Disable reclaim-based migration. */ -static void __disable_all_migrate_targets(void) -{ - int node, i; - - if (!node_demotion) - return; - - for_each_online_node(node) { - node_demotion[node].nr = 0; - for (i = 0; i < DEMOTION_TARGET_NODES; i++) - node_demotion[node].nodes[i] = NUMA_NO_NODE; - } -} - -static void disable_all_migrate_targets(void) -{ - __disable_all_migrate_targets(); - - /* - * Ensure that the "disable" is visible across the system. - * Readers will see either a combination of before+disable - * state or disable+after. They will never see before and - * after state together. - * - * The before+after state together might have cycles and - * could cause readers to do things like loop until this - * function finishes. This ensures they can only see a - * single "bad" read and would, for instance, only loop - * once. - */ - synchronize_rcu(); -} - -/* - * Find an automatic demotion target for 'node'. - * Failing here is OK. It might just indicate - * being at the end of a chain. - */ -static int establish_migrate_target(int node, nodemask_t *used, - int best_distance) -{ - int migration_target, index, val; - struct demotion_nodes *nd; - - if (!node_demotion) - return NUMA_NO_NODE; - - nd = &node_demotion[node]; - - migration_target = find_next_best_node(node, used); - if (migration_target == NUMA_NO_NODE) - return NUMA_NO_NODE; - - /* - * If the node has been set a migration target node before, - * which means it's the best distance between them. Still - * check if this node can be demoted to other target nodes - * if they have a same best distance. - */ - if (best_distance != -1) { - val = node_distance(node, migration_target); - if (val > best_distance) - goto out_clear; - } - - index = nd->nr; - if (WARN_ONCE(index >= DEMOTION_TARGET_NODES, - "Exceeds maximum demotion target nodes\n")) - goto out_clear; - - nd->nodes[index] = migration_target; - nd->nr++; - - return migration_target; -out_clear: - node_clear(migration_target, *used); - return NUMA_NO_NODE; -} - -/* - * When memory fills up on a node, memory contents can be - * automatically migrated to another node instead of - * discarded at reclaim. - * - * Establish a "migration path" which will start at nodes - * with CPUs and will follow the priorities used to build the - * page allocator zonelists. - * - * The difference here is that cycles must be avoided. If - * node0 migrates to node1, then neither node1, nor anything - * node1 migrates to can migrate to node0. Also one node can - * be migrated to multiple nodes if the target nodes all have - * a same best-distance against the source node. - * - * This function can run simultaneously with readers of - * node_demotion[]. However, it can not run simultaneously - * with itself. Exclusion is provided by memory hotplug events - * being single-threaded. - */ -static void __set_migration_target_nodes(void) -{ - nodemask_t next_pass; - nodemask_t this_pass; - nodemask_t used_targets = NODE_MASK_NONE; - int node, best_distance; - - /* - * Avoid any oddities like cycles that could occur - * from changes in the topology. This will leave - * a momentary gap when migration is disabled. - */ - disable_all_migrate_targets(); - - /* - * Allocations go close to CPUs, first. Assume that - * the migration path starts at the nodes with CPUs. - */ - next_pass = node_states[N_CPU]; -again: - this_pass = next_pass; - next_pass = NODE_MASK_NONE; - /* - * To avoid cycles in the migration "graph", ensure - * that migration sources are not future targets by - * setting them in 'used_targets'. Do this only - * once per pass so that multiple source nodes can - * share a target node. - * - * 'used_targets' will become unavailable in future - * passes. This limits some opportunities for - * multiple source nodes to share a destination. - */ - nodes_or(used_targets, used_targets, this_pass); - - for_each_node_mask(node, this_pass) { - best_distance = -1; - - /* - * Try to set up the migration path for the node, and the target - * migration nodes can be multiple, so doing a loop to find all - * the target nodes if they all have a best node distance. - */ - do { - int target_node = - establish_migrate_target(node, &used_targets, - best_distance); - - if (target_node == NUMA_NO_NODE) - break; - - if (best_distance == -1) - best_distance = node_distance(node, target_node); - - /* - * Visit targets from this pass in the next pass. - * Eventually, every node will have been part of - * a pass, and will become set in 'used_targets'. - */ - node_set(target_node, next_pass); - } while (1); - } - /* - * 'next_pass' contains nodes which became migration - * targets in this pass. Make additional passes until - * no more migrations targets are available. - */ - if (!nodes_empty(next_pass)) - goto again; -} - -/* - * For callers that do not hold get_online_mems() already. - */ -void set_migration_target_nodes(void) -{ - get_online_mems(); - __set_migration_target_nodes(); - put_online_mems(); -} - -/* - * This leaves migrate-on-reclaim transiently disabled between - * the MEM_GOING_OFFLINE and MEM_OFFLINE events. This runs - * whether reclaim-based migration is enabled or not, which - * ensures that the user can turn reclaim-based migration at - * any time without needing to recalculate migration targets. - * - * These callbacks already hold get_online_mems(). That is why - * __set_migration_target_nodes() can be used as opposed to - * set_migration_target_nodes(). - */ -#ifdef CONFIG_MEMORY_HOTPLUG -static int __meminit migrate_on_reclaim_callback(struct notifier_block *self, - unsigned long action, void *_arg) -{ - struct memory_notify *arg = _arg; - - /* - * Only update the node migration order when a node is - * changing status, like online->offline. This avoids - * the overhead of synchronize_rcu() in most cases. - */ - if (arg->status_change_nid < 0) - return notifier_from_errno(0); - - switch (action) { - case MEM_GOING_OFFLINE: - /* - * Make sure there are not transient states where - * an offline node is a migration target. This - * will leave migration disabled until the offline - * completes and the MEM_OFFLINE case below runs. - */ - disable_all_migrate_targets(); - break; - case MEM_OFFLINE: - case MEM_ONLINE: - /* - * Recalculate the target nodes once the node - * reaches its final state (online or offline). - */ - __set_migration_target_nodes(); - break; - case MEM_CANCEL_OFFLINE: - /* - * MEM_GOING_OFFLINE disabled all the migration - * targets. Reenable them. - */ - __set_migration_target_nodes(); - break; - case MEM_GOING_ONLINE: - case MEM_CANCEL_ONLINE: - break; - } - - return notifier_from_errno(0); -} -#endif - -void __init migrate_on_reclaim_init(void) -{ - node_demotion = kcalloc(nr_node_ids, - sizeof(struct demotion_nodes), - GFP_KERNEL); - WARN_ON(!node_demotion); -#ifdef CONFIG_MEMORY_HOTPLUG - hotplug_memory_notifier(migrate_on_reclaim_callback, 100); -#endif - /* - * At this point, all numa nodes with memory/CPus have their state - * properly set, so we can build the demotion order now. - * Let us hold the cpu_hotplug lock just, as we could possibily have - * CPU hotplug events during boot. - */ - cpus_read_lock(); - set_migration_target_nodes(); - cpus_read_unlock(); -} #endif /* CONFIG_NUMA */ - - diff --git a/mm/vmstat.c b/mm/vmstat.c index 373d2730fcf2..35c6ff97cf29 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -28,7 +28,6 @@ #include #include #include -#include #include "internal.h" @@ -2060,7 +2059,6 @@ static int vmstat_cpu_online(unsigned int cpu) if (!node_state(cpu_to_node(cpu), N_CPU)) { node_set_state(cpu_to_node(cpu), N_CPU); - set_migration_target_nodes(); } return 0; @@ -2085,7 +2083,6 @@ static int vmstat_cpu_dead(unsigned int cpu) return 0; node_clear_state(node, N_CPU); - set_migration_target_nodes(); return 0; } @@ -2118,7 +2115,6 @@ void __init init_mm_internals(void) start_shepherd_timer(); #endif - migrate_on_reclaim_init(); #ifdef CONFIG_PROC_FS proc_create_seq("buddyinfo", 0444, NULL, &fragmentation_op); proc_create_seq("pagetypeinfo", 0400, NULL, &pagetypeinfo_op); From patchwork Thu Jul 28 19:04:34 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Aneesh Kumar K.V" X-Patchwork-Id: 12931692 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6A496C04A68 for ; Thu, 28 Jul 2022 19:06:37 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 02EDC8E0002; Thu, 28 Jul 2022 15:06:37 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id F212E8E0001; Thu, 28 Jul 2022 15:06:36 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D9A4E8E0002; Thu, 28 Jul 2022 15:06:36 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id C9D308E0001 for ; Thu, 28 Jul 2022 15:06:36 -0400 (EDT) Received: from smtpin20.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 9E2B0ABED8 for ; Thu, 28 Jul 2022 19:06:36 +0000 (UTC) X-FDA: 79737439992.20.9B35209 Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by imf21.hostedemail.com (Postfix) with ESMTP id 1DBEA1C000E for ; Thu, 28 Jul 2022 19:06:35 +0000 (UTC) Received: from pps.filterd (m0098409.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26SIliAL018459; Thu, 28 Jul 2022 19:06:30 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pp1; bh=zzhE+RvouT8p2JfoZVaQmykIDMKRiQC78ntSkjMqxsc=; b=sI6fGeGkLdmjQmizPk7HnBqGvCD7T7ljMlv+QTdMDHCQpdjAh0EE/IWkArg8Gt/9aJ+R k+EYgV6R5qPCDANy0ntH3as+YGI6OITBw4KzNSDzIYr6MyvAxYDaTOW1LhkLJf5lJwVS wFdIBSs2AB068NUbwUKiwKyHxUPt7UE5+DkQU5mH/q/ukOba8ulLtw2UtRT8abkn/99i kFMI0nV2/zRJIqthJg2CvCpx6Ve81Qq8MvCY0bgjux6lXYft9l60wlNBGtBIvcooOdP5 7u2viAMXLu7FtCnT/sHLkGLPuhgpaPfc9d+VrSQXfS5dTFDwgGJkl8zlQgeFLhD2EhlU Eg== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3hm0238tyf-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 28 Jul 2022 19:06:28 +0000 Received: from m0098409.ppops.net (m0098409.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 26SIlm0W018645; Thu, 28 Jul 2022 19:05:51 GMT Received: from ppma03dal.us.ibm.com (b.bd.3ea9.ip4.static.sl-reverse.com [169.62.189.11]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3hm0238ssh-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 28 Jul 2022 19:05:51 +0000 Received: from pps.filterd (ppma03dal.us.ibm.com [127.0.0.1]) by ppma03dal.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 26SJ1rB9003472; Thu, 28 Jul 2022 19:05:33 GMT Received: from b03cxnp08028.gho.boulder.ibm.com (b03cxnp08028.gho.boulder.ibm.com [9.17.130.20]) by ppma03dal.us.ibm.com with ESMTP id 3hg9791g6x-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 28 Jul 2022 19:05:33 +0000 Received: from b03ledav003.gho.boulder.ibm.com (b03ledav003.gho.boulder.ibm.com [9.17.130.234]) by b03cxnp08028.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 26SJ5WKt32768494 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 28 Jul 2022 19:05:32 GMT Received: from b03ledav003.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 0A8186A04D; Thu, 28 Jul 2022 19:05:32 +0000 (GMT) Received: from b03ledav003.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id AE7FB6A047; Thu, 28 Jul 2022 19:05:26 +0000 (GMT) Received: from skywalker.ibmuc.com (unknown [9.43.25.218]) by b03ledav003.gho.boulder.ibm.com (Postfix) with ESMTP; Thu, 28 Jul 2022 19:05:26 +0000 (GMT) From: "Aneesh Kumar K.V" To: linux-mm@kvack.org, akpm@linux-foundation.org Cc: Wei Xu , Huang Ying , Yang Shi , Davidlohr Bueso , Tim C Chen , Michal Hocko , Linux Kernel Mailing List , Hesham Almatary , Dave Hansen , Jonathan Cameron , Alistair Popple , Dan Williams , Johannes Weiner , jvgediya.oss@gmail.com, "Aneesh Kumar K.V" Subject: [PATCH v11 6/8] mm/demotion: Add pg_data_t member to track node memory tier details Date: Fri, 29 Jul 2022 00:34:34 +0530 Message-Id: <20220728190436.858458-7-aneesh.kumar@linux.ibm.com> X-Mailer: git-send-email 2.37.1 In-Reply-To: <20220728190436.858458-1-aneesh.kumar@linux.ibm.com> References: <20220728190436.858458-1-aneesh.kumar@linux.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: 6-ABJ5C95z-5uZN1ltf0ItmuyL_8y0-r X-Proofpoint-ORIG-GUID: BIVOSYjHFr3bfuYsWQRk-dAMeORsPqiM X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-28_06,2022-07-28_02,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 bulkscore=0 impostorscore=0 suspectscore=0 malwarescore=0 adultscore=0 phishscore=0 spamscore=0 clxscore=1015 mlxlogscore=999 mlxscore=0 priorityscore=1501 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2206140000 definitions=main-2207280086 ARC-Authentication-Results: i=1; imf21.hostedemail.com; dkim=pass header.d=ibm.com header.s=pp1 header.b=sI6fGeGk; spf=pass (imf21.hostedemail.com: domain of aneesh.kumar@linux.ibm.com designates 148.163.156.1 as permitted sender) smtp.mailfrom=aneesh.kumar@linux.ibm.com; dmarc=pass (policy=none) header.from=ibm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1659035196; a=rsa-sha256; cv=none; b=0/bOFy+rrvFJx+tl1ToRsZjrBPzIhepENWfvzCh8KvEsevV8s2vgyYAxrGJgLKVOFUNF10 iLr/dKB/aTtsOhGCQysLQOufxH/b4Js39VtS2aQJ0k53+T6aE4E1iBT7bx/EJV5yUVQkU1 Z1FxDu2oJORnBXlkSRHJBCf7itJgCfE= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1659035196; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=zzhE+RvouT8p2JfoZVaQmykIDMKRiQC78ntSkjMqxsc=; b=xABTEugR8oUF9V6GWHOAshZ7HBg80jGd/lBdxtXlRRXUgZXyk6QquhXoGq3lEU7p6tDOwX GPv+aA10C8b20COrvGFbLwyMLNZzgdzo9tHTaNb55VHCJZ2Xk19kYAek92TCH4eKAFRVT2 xViINNO69QkTewtLUdDFKNgf1FRyDCY= X-Rspamd-Server: rspam11 X-Rspamd-Queue-Id: 1DBEA1C000E Authentication-Results: imf21.hostedemail.com; dkim=pass header.d=ibm.com header.s=pp1 header.b=sI6fGeGk; spf=pass (imf21.hostedemail.com: domain of aneesh.kumar@linux.ibm.com designates 148.163.156.1 as permitted sender) smtp.mailfrom=aneesh.kumar@linux.ibm.com; dmarc=pass (policy=none) header.from=ibm.com X-Stat-Signature: hq4g8w3zzb46tzsni3zkk6abfp3ztdpm X-Rspam-User: X-HE-Tag: 1659035195-402411 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Also update different helpes to use NODE_DATA()->memtier. Since node specific memtier can change based on the reassignment of NUMA node to a different memory tiers, accessing NODE_DATA()->memtier needs to happen under an rcu read lock or memory_tier_lock. Signed-off-by: Aneesh Kumar K.V --- include/linux/mmzone.h | 3 +++ mm/memory-tiers.c | 40 +++++++++++++++++++++++++++++++++++----- 2 files changed, 38 insertions(+), 5 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index aab70355d64f..353812495a70 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -928,6 +928,9 @@ typedef struct pglist_data { /* Per-node vmstats */ struct per_cpu_nodestat __percpu *per_cpu_nodestats; atomic_long_t vm_stat[NR_VM_NODE_STAT_ITEMS]; +#ifdef CONFIG_NUMA + struct memory_tier __rcu *memtier; +#endif } pg_data_t; #define node_present_pages(nid) (NODE_DATA(nid)->node_present_pages) diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c index 60845aa74afc..f982ca6b3559 100644 --- a/mm/memory-tiers.c +++ b/mm/memory-tiers.c @@ -3,6 +3,7 @@ #include #include #include +#include #include #include "internal.h" @@ -142,12 +143,18 @@ static struct memory_tier *find_create_memory_tier(struct memory_dev_type *memty static struct memory_tier *__node_get_memory_tier(int node) { - struct memory_dev_type *memtype; + pg_data_t *pgdat; - memtype = node_memory_types[node]; - if (memtype) - return memtype->memtier; - return NULL; + pgdat = NODE_DATA(node); + if (!pgdat) + return NULL; + /* + * Since we hold memory_tier_lock, we can avoid + * RCU read locks when accessing the details. No + * parallel updates are possible here. + */ + return rcu_dereference_check(pgdat->memtier, + lockdep_is_held(&memory_tier_lock)); } #ifdef CONFIG_MIGRATION @@ -290,6 +297,7 @@ static inline void establish_demotion_targets(void) {} static void init_node_memory_tier(int node) { + pg_data_t *pgdat = NODE_DATA(node); struct memory_tier *memtier; mutex_lock(&memory_tier_lock); @@ -311,8 +319,12 @@ static void init_node_memory_tier(int node) } memtype = node_memory_types[node]; memtier = find_create_memory_tier(memtype); + if (IS_ERR(memtier)) + goto err_out; + rcu_assign_pointer(pgdat->memtier, memtier); } establish_demotion_targets(); +err_out: mutex_unlock(&memory_tier_lock); } @@ -324,13 +336,26 @@ static void destroy_memory_tier(struct memory_tier *memtier) static void clear_node_memory_tier(int node) { + pg_data_t *pgdat; struct memory_tier *current_memtier; + pgdat = NODE_DATA(node); + if (!pgdat) + return; + mutex_lock(&memory_tier_lock); + /* + * Make sure that anybody looking at NODE_DATA who finds + * a valid memtier finds memory_dev_types with nodes still + * linked to the memtier. We achieve this by waiting for + * rcu read section to finish using synchronize_rcu. + */ current_memtier = __node_get_memory_tier(node); if (current_memtier) { struct memory_dev_type *memtype; + rcu_assign_pointer(pgdat->memtier, NULL); + synchronize_rcu(); memtype = node_memory_types[node]; node_clear(node, memtype->nodes); if (nodes_empty(memtype->nodes)) { @@ -386,6 +411,7 @@ static int __meminit memtier_hotplug_callback(struct notifier_block *self, static int __init memory_tier_init(void) { + int node; struct memory_tier *memtier; mutex_lock(&memory_tier_lock); @@ -396,6 +422,10 @@ static int __init memory_tier_init(void) if (IS_ERR(memtier)) panic("%s() failed to register memory tier: %ld\n", __func__, PTR_ERR(memtier)); + + for_each_node_state(node, N_MEMORY) + rcu_assign_pointer(NODE_DATA(node)->memtier, memtier); + mutex_unlock(&memory_tier_lock); #ifdef CONFIG_MIGRATION node_demotion = kcalloc(MAX_NUMNODES, sizeof(struct demotion_nodes), From patchwork Thu Jul 28 19:04:35 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Aneesh Kumar K.V" X-Patchwork-Id: 12931688 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6C941C19F2B for ; Thu, 28 Jul 2022 19:05:49 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0CB3D940008; Thu, 28 Jul 2022 15:05:49 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 07B3B6B0074; Thu, 28 Jul 2022 15:05:49 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E5D3D6B0075; Thu, 28 Jul 2022 15:05:48 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id D555F6B0073 for ; Thu, 28 Jul 2022 15:05:48 -0400 (EDT) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id AFAA7140FDE for ; Thu, 28 Jul 2022 19:05:48 +0000 (UTC) X-FDA: 79737437976.04.31E7EEC Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by imf06.hostedemail.com (Postfix) with ESMTP id 43B131800A9 for ; Thu, 28 Jul 2022 19:05:48 +0000 (UTC) Received: from pps.filterd (m0098421.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26SIh9A7032347; Thu, 28 Jul 2022 19:05:41 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pp1; bh=NIlh5XHVT5HUIlQ773XYL5Jq/Mf3x/reo3QmVezqr9w=; b=jv0VJpVDNnNbDvos6ryh1bdvL0O1DJFSfWFarM+dUbOCG9Jq1AodT0ePiFp6gb8zD2Ze yY9Dh8jRTyBk2IkGSdtmB4fsu4EhQcRSsfb3d8SpbZuWrGTAcd2em1+uxa5+ISp6+IDC BCgnh4m7tpr9W/vcVwcJ8kV5Go9tse0pFm+0JdjSZ56GWCvl5eRw525I8FhqfReukaF3 fFRhsnMfBcIeQGhQoYwYdwL+2zyvYhYIjUaBoz1K7GGrY4E7tpa28w3FjixtuxX4o1OA H5Io/so+jA0POsYsf01oZR+1BiX0RMFIhEoA4qqQBbMnLiaOJm3McsQHeNl8XC9QWQap +A== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3hkyytgmp8-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 28 Jul 2022 19:05:40 +0000 Received: from m0098421.ppops.net (m0098421.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 26SIhoU6001176; Thu, 28 Jul 2022 19:05:40 GMT Received: from ppma01wdc.us.ibm.com (fd.55.37a9.ip4.static.sl-reverse.com [169.55.85.253]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3hkyytgmnv-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 28 Jul 2022 19:05:40 +0000 Received: from pps.filterd (ppma01wdc.us.ibm.com [127.0.0.1]) by ppma01wdc.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 26SJ2frP022451; Thu, 28 Jul 2022 19:05:39 GMT Received: from b03cxnp07028.gho.boulder.ibm.com (b03cxnp07028.gho.boulder.ibm.com [9.17.130.15]) by ppma01wdc.us.ibm.com with ESMTP id 3hg94w1a5y-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 28 Jul 2022 19:05:39 +0000 Received: from b03ledav003.gho.boulder.ibm.com (b03ledav003.gho.boulder.ibm.com [9.17.130.234]) by b03cxnp07028.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 26SJ5cw137290274 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 28 Jul 2022 19:05:38 GMT Received: from b03ledav003.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 785A56A054; Thu, 28 Jul 2022 19:05:38 +0000 (GMT) Received: from b03ledav003.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id A03C96A047; Thu, 28 Jul 2022 19:05:32 +0000 (GMT) Received: from skywalker.ibmuc.com (unknown [9.43.25.218]) by b03ledav003.gho.boulder.ibm.com (Postfix) with ESMTP; Thu, 28 Jul 2022 19:05:32 +0000 (GMT) From: "Aneesh Kumar K.V" To: linux-mm@kvack.org, akpm@linux-foundation.org Cc: Wei Xu , Huang Ying , Yang Shi , Davidlohr Bueso , Tim C Chen , Michal Hocko , Linux Kernel Mailing List , Hesham Almatary , Dave Hansen , Jonathan Cameron , Alistair Popple , Dan Williams , Johannes Weiner , jvgediya.oss@gmail.com, Jagdish Gediya , "Aneesh Kumar K . V" Subject: [PATCH v11 7/8] mm/demotion: Demote pages according to allocation fallback order Date: Fri, 29 Jul 2022 00:34:35 +0530 Message-Id: <20220728190436.858458-8-aneesh.kumar@linux.ibm.com> X-Mailer: git-send-email 2.37.1 In-Reply-To: <20220728190436.858458-1-aneesh.kumar@linux.ibm.com> References: <20220728190436.858458-1-aneesh.kumar@linux.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: wonRcQJrtRtGo0VcwtvxfFN1Pj2HkEas X-Proofpoint-ORIG-GUID: q39cq17enKTwjkppDB68HUPK65bQxowA X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-28_06,2022-07-28_02,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 lowpriorityscore=0 phishscore=0 bulkscore=0 clxscore=1015 mlxscore=0 malwarescore=0 priorityscore=1501 suspectscore=0 mlxlogscore=999 spamscore=0 adultscore=0 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2206140000 definitions=main-2207280086 ARC-Authentication-Results: i=1; imf06.hostedemail.com; dkim=pass header.d=ibm.com header.s=pp1 header.b=jv0VJpVD; spf=pass (imf06.hostedemail.com: domain of aneesh.kumar@linux.ibm.com designates 148.163.158.5 as permitted sender) smtp.mailfrom=aneesh.kumar@linux.ibm.com; dmarc=pass (policy=none) header.from=ibm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1659035148; a=rsa-sha256; cv=none; b=JeEDo7HzdWXSoXq/s62NMCrqIReAckUXKJUYeRiYSjkrSlD3L4DUBKZqnEXqc+ZBNSsyNm 9YLOuCsJFut7D7kkqoXUx7naPWhSkUZ0w/qGzBaadT6ENQ7uW8tPfWLe32UsQPsBq9I06n A+UBtjU6WLgY8jZZakVjHSBUd4xJ+Dw= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1659035148; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=NIlh5XHVT5HUIlQ773XYL5Jq/Mf3x/reo3QmVezqr9w=; b=qYC2R2/Lx/+M9FvnODblccj+5SxPz5AtM9ALu0Fso7u1SmJ/V9ftdfapePmjoB/D0b6+W6 oJjQRaB48EyLmpvePl8nFZCWiZCfdlScFH1IKEx3OnaxIv43EDIhb3+AmkUmjD4Md98gHk X5oXa6hjrI+F70rNo6JrrdHqHEoaB0Y= Authentication-Results: imf06.hostedemail.com; dkim=pass header.d=ibm.com header.s=pp1 header.b=jv0VJpVD; spf=pass (imf06.hostedemail.com: domain of aneesh.kumar@linux.ibm.com designates 148.163.158.5 as permitted sender) smtp.mailfrom=aneesh.kumar@linux.ibm.com; dmarc=pass (policy=none) header.from=ibm.com X-Rspam-User: X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 43B131800A9 X-Stat-Signature: zwj3yxrhr5yb756ana3xq35s8hty76bb X-HE-Tag: 1659035148-836332 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Jagdish Gediya Currently, a higher tier node can only be demoted to selected nodes on the next lower tier as defined by the demotion path. This strict, hard-coded demotion order does not work in all use cases (e.g. some use cases may want to allow cross-socket demotion to another node in the same demotion tier as a fallback when the preferred demotion node is out of space). This demotion order is also inconsistent with the page allocation fallback order when all the nodes in a higher tier are out of space: The page allocation can fall back to any node from any lower tier, whereas the demotion order doesn't allow that currently. This patch adds support to get all the allowed demotion targets for a memory tier. demote_page_list() function is now modified to utilize this allowed node mask as the fallback allocation mask. Signed-off-by: Jagdish Gediya Signed-off-by: Aneesh Kumar K.V --- include/linux/memory-tiers.h | 12 ++++++++ mm/memory-tiers.c | 50 +++++++++++++++++++++++++++++-- mm/vmscan.c | 58 ++++++++++++++++++++++++++---------- 3 files changed, 102 insertions(+), 18 deletions(-) diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h index e56a57c6ef78..f8dbeda617a7 100644 --- a/include/linux/memory-tiers.h +++ b/include/linux/memory-tiers.h @@ -4,6 +4,7 @@ #include #include +#include /* * Each tier cover a abstrace distance chunk size of 128 */ @@ -33,11 +34,17 @@ extern bool numa_demotion_enabled; struct memory_dev_type *init_node_memory_type(int node, struct memory_dev_type *default_type); #ifdef CONFIG_MIGRATION int next_demotion_node(int node); +void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets); #else static inline int next_demotion_node(int node) { return NUMA_NO_NODE; } + +static inline void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets) +{ + *targets = NODE_MASK_NONE; +} #endif #else @@ -52,5 +59,10 @@ static inline int next_demotion_node(int node) { return NUMA_NO_NODE; } + +static inline void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets) +{ + *targets = NODE_MASK_NONE; +} #endif /* CONFIG_NUMA */ #endif /* _LINUX_MEMORY_TIERS_H */ diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c index f982ca6b3559..84e2be31a853 100644 --- a/mm/memory-tiers.c +++ b/mm/memory-tiers.c @@ -3,7 +3,6 @@ #include #include #include -#include #include #include "internal.h" @@ -19,6 +18,8 @@ struct memory_tier { * adistance_start .. adistance_start + MEMTIER_CHUNK_SIZE */ int adistance_start; + /* All the nodes that are part of all the lower memory tiers. */ + nodemask_t lower_tier_mask; }; struct demotion_nodes { @@ -158,6 +159,24 @@ static struct memory_tier *__node_get_memory_tier(int node) } #ifdef CONFIG_MIGRATION +void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets) +{ + struct memory_tier *memtier; + + /* + * pg_data_t.memtier updates includes a synchronize_rcu() + * which ensures that we either find NULL or a valid memtier + * in NODE_DATA. protect the access via rcu_read_lock(); + */ + rcu_read_lock(); + memtier = rcu_dereference(pgdat->memtier); + if (memtier) + *targets = memtier->lower_tier_mask; + else + *targets = NODE_MASK_NONE; + rcu_read_unlock(); +} + /** * next_demotion_node() - Get the next node in the demotion path * @node: The starting node to lookup the next node @@ -205,10 +224,18 @@ int next_demotion_node(int node) static void disable_all_demotion_targets(void) { + struct memory_tier *memtier; int node; - for_each_node_state(node, N_MEMORY) + for_each_node_state(node, N_MEMORY) { node_demotion[node].preferred = NODE_MASK_NONE; + /* + * We are holding memory_tier_lock, it is safe + * to access pgda->memtier. + */ + memtier = __node_get_memory_tier(node); + memtier->lower_tier_mask = NODE_MASK_NONE; + } /* * Ensure that the "disable" is visible across the system. * Readers will see either a combination of before+disable @@ -240,7 +267,7 @@ static void establish_demotion_targets(void) struct demotion_nodes *nd; int target = NUMA_NO_NODE, node; int distance, best_distance; - nodemask_t tier_nodes; + nodemask_t tier_nodes, lower_tier; lockdep_assert_held_once(&memory_tier_lock); @@ -288,6 +315,23 @@ static void establish_demotion_targets(void) } } while (1); } + /* + * Now build the lower_tier mask for each node collecting node mask from + * all memory tier below it. This allows us to fallback demotion page + * allocation to a set of nodes that is closer the above selected + * perferred node. + */ + lower_tier = node_states[N_MEMORY]; + list_for_each_entry_reverse(memtier, &memory_tiers, list) { + /* + * Keep removing current tier from lower_tier nodes, + * This will remove all nodes in current and above + * memory tier from the lower_tier mask. + */ + tier_nodes = get_memtier_nodemask(memtier); + nodes_andnot(lower_tier, lower_tier, tier_nodes); + memtier->lower_tier_mask = lower_tier; + } } #else diff --git a/mm/vmscan.c b/mm/vmscan.c index 3a8f78277f99..303c93958371 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1460,21 +1460,34 @@ static void folio_check_dirty_writeback(struct folio *folio, mapping->a_ops->is_dirty_writeback(folio, dirty, writeback); } -static struct page *alloc_demote_page(struct page *page, unsigned long node) +static struct page *alloc_demote_page(struct page *page, unsigned long private) { - struct migration_target_control mtc = { - /* - * Allocate from 'node', or fail quickly and quietly. - * When this happens, 'page' will likely just be discarded - * instead of migrated. - */ - .gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) | - __GFP_THISNODE | __GFP_NOWARN | - __GFP_NOMEMALLOC | GFP_NOWAIT, - .nid = node - }; + struct page *target_page; + nodemask_t *allowed_mask; + struct migration_target_control *mtc; + + mtc = (struct migration_target_control *)private; + + allowed_mask = mtc->nmask; + /* + * make sure we allocate from the target node first also trying to + * demote or reclaim pages from the target node via kswapd if we are + * low on free memory on target node. If we don't do this and if + * we have free memory on the slower(lower) memtier, we would start + * allocating pages from slower(lower) memory tiers without even forcing + * a demotion of cold pages from the target memtier. This can result + * in the kernel placing hot pages in slower(lower) memory tiers. + */ + mtc->nmask = NULL; + mtc->gfp_mask |= __GFP_THISNODE; + target_page = alloc_migration_target(page, (unsigned long)mtc); + if (target_page) + return target_page; - return alloc_migration_target(page, (unsigned long)&mtc); + mtc->gfp_mask &= ~__GFP_THISNODE; + mtc->nmask = allowed_mask; + + return alloc_migration_target(page, (unsigned long)mtc); } /* @@ -1487,6 +1500,19 @@ static unsigned int demote_page_list(struct list_head *demote_pages, { int target_nid = next_demotion_node(pgdat->node_id); unsigned int nr_succeeded; + nodemask_t allowed_mask; + + struct migration_target_control mtc = { + /* + * Allocate from 'node', or fail quickly and quietly. + * When this happens, 'page' will likely just be discarded + * instead of migrated. + */ + .gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) | __GFP_NOWARN | + __GFP_NOMEMALLOC | GFP_NOWAIT, + .nid = target_nid, + .nmask = &allowed_mask + }; if (list_empty(demote_pages)) return 0; @@ -1494,10 +1520,12 @@ static unsigned int demote_page_list(struct list_head *demote_pages, if (target_nid == NUMA_NO_NODE) return 0; + node_get_allowed_targets(pgdat, &allowed_mask); + /* Demotion ignores all cpuset and mempolicy settings */ migrate_pages(demote_pages, alloc_demote_page, NULL, - target_nid, MIGRATE_ASYNC, MR_DEMOTION, - &nr_succeeded); + (unsigned long)&mtc, MIGRATE_ASYNC, MR_DEMOTION, + &nr_succeeded); if (current_is_kswapd()) __count_vm_events(PGDEMOTE_KSWAPD, nr_succeeded); From patchwork Thu Jul 28 19:04:36 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Aneesh Kumar K.V" X-Patchwork-Id: 12931689 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id CA58BC3F6B0 for ; Thu, 28 Jul 2022 19:06:00 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 621D4940009; Thu, 28 Jul 2022 15:06:00 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 5D1C46B0074; Thu, 28 Jul 2022 15:06:00 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 47286940009; Thu, 28 Jul 2022 15:06:00 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 37F2A6B0073 for ; Thu, 28 Jul 2022 15:06:00 -0400 (EDT) Received: from smtpin30.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 11876160FE0 for ; Thu, 28 Jul 2022 19:06:00 +0000 (UTC) X-FDA: 79737438480.30.DFCDDB9 Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by imf07.hostedemail.com (Postfix) with ESMTP id A57884008A for ; Thu, 28 Jul 2022 19:05:59 +0000 (UTC) Received: from pps.filterd (m0127361.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 26SIqXdI032623; Thu, 28 Jul 2022 19:05:47 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=pp1; bh=mPrEfIzYWCgV0qzubnulyxPvGetkG71CddTv8QD7Zvc=; b=gP/pAGGs4Vk6uS1VqJZqfPgHLbFaAZT5chwlC3lteL2Xkgas1uD5Gmx/Khkm2CuYiEeK lj4nOQOKk4fMU2CNxlNBfIZs3wD4+tk8vfEYHTokfh60ChpacbtvqvO2S6DqmCLkzNs3 Rj5Ty8ilGOBZlN5fe3qY3ZvrEkZnl+lCdRMYSPblFCVjpgUFhAqz9RGj5V8UZrCclt/j 6Rag5c97Ay8db5M4xgiWFtpE273/6TRewEOWs2KWOteU5kbdecQTCB1cRmiHwP0XFNw5 +N6JpZKOwtNJFPp03GzyY6+LYagTDK7Fm8crbRvD6sAXvrv/EYGg9OyskkMH6/Tta0ks Vg== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3hm04c0ca3-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 28 Jul 2022 19:05:47 +0000 Received: from m0127361.ppops.net (m0127361.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 26SIqjPM033072; Thu, 28 Jul 2022 19:05:46 GMT Received: from ppma04wdc.us.ibm.com (1a.90.2fa9.ip4.static.sl-reverse.com [169.47.144.26]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3hm04c0c97-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 28 Jul 2022 19:05:46 +0000 Received: from pps.filterd (ppma04wdc.us.ibm.com [127.0.0.1]) by ppma04wdc.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 26SJ2bIK017352; Thu, 28 Jul 2022 19:05:45 GMT Received: from b03cxnp08027.gho.boulder.ibm.com (b03cxnp08027.gho.boulder.ibm.com [9.17.130.19]) by ppma04wdc.us.ibm.com with ESMTP id 3hg94b199b-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 28 Jul 2022 19:05:45 +0000 Received: from b03ledav003.gho.boulder.ibm.com (b03ledav003.gho.boulder.ibm.com [9.17.130.234]) by b03cxnp08027.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 26SJ5itV20185760 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 28 Jul 2022 19:05:44 GMT Received: from b03ledav003.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id C49846A04D; Thu, 28 Jul 2022 19:05:44 +0000 (GMT) Received: from b03ledav003.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 1A0E56A047; Thu, 28 Jul 2022 19:05:39 +0000 (GMT) Received: from skywalker.ibmuc.com (unknown [9.43.25.218]) by b03ledav003.gho.boulder.ibm.com (Postfix) with ESMTP; Thu, 28 Jul 2022 19:05:38 +0000 (GMT) From: "Aneesh Kumar K.V" To: linux-mm@kvack.org, akpm@linux-foundation.org Cc: Wei Xu , Huang Ying , Yang Shi , Davidlohr Bueso , Tim C Chen , Michal Hocko , Linux Kernel Mailing List , Hesham Almatary , Dave Hansen , Jonathan Cameron , Alistair Popple , Dan Williams , Johannes Weiner , jvgediya.oss@gmail.com, "Aneesh Kumar K.V" Subject: [PATCH v11 8/8] mm/demotion: Update node_is_toptier to work with memory tiers Date: Fri, 29 Jul 2022 00:34:36 +0530 Message-Id: <20220728190436.858458-9-aneesh.kumar@linux.ibm.com> X-Mailer: git-send-email 2.37.1 In-Reply-To: <20220728190436.858458-1-aneesh.kumar@linux.ibm.com> References: <20220728190436.858458-1-aneesh.kumar@linux.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: 8r_HMX3UjmrC3pvsKphI6B8_UTWSYyxP X-Proofpoint-GUID: HstupbENr-Ytfr1O3Y22LWMDV2cIksJz X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.883,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-07-28_06,2022-07-28_02,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 lowpriorityscore=0 clxscore=1015 impostorscore=0 mlxlogscore=999 adultscore=0 phishscore=0 priorityscore=1501 suspectscore=0 bulkscore=0 spamscore=0 malwarescore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2206140000 definitions=main-2207280086 ARC-Authentication-Results: i=1; imf07.hostedemail.com; dkim=pass header.d=ibm.com header.s=pp1 header.b="gP/pAGGs"; spf=pass (imf07.hostedemail.com: domain of aneesh.kumar@linux.ibm.com designates 148.163.158.5 as permitted sender) smtp.mailfrom=aneesh.kumar@linux.ibm.com; dmarc=pass (policy=none) header.from=ibm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1659035159; a=rsa-sha256; cv=none; b=NP0gbteBKwtIIEY/IUF4peovSMmcVj2wtP0OmqSU3QtIlSKOoDW8pJ4OPKBwKq+TpOeYDg 2RKXTMHiFufaWHhJF41W6cMKBl5eNjMiRKB4XE1dtUBTgx8bdDQw+vdYwcjJhbfQAf8OL5 CgRqk6pMn81RXTW+XOMzigY6W3zryaQ= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1659035159; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=mPrEfIzYWCgV0qzubnulyxPvGetkG71CddTv8QD7Zvc=; b=WyNubohtgaa0Lg2/0nicNgRlfyr+V7pt3h8RX11Oia0lu3BgHAkKTedaLRYSEHSppjEuHL Tik8wso96hNPjfELhXkLpGtpnMNuHffQSYyWRd/uLE8hjH77FUOyvgxqNtUwkWiupzh6X7 g0VtDdrTPj6H7tvQesujPmmUQmGSpu4= X-Rspamd-Queue-Id: A57884008A Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=ibm.com header.s=pp1 header.b="gP/pAGGs"; spf=pass (imf07.hostedemail.com: domain of aneesh.kumar@linux.ibm.com designates 148.163.158.5 as permitted sender) smtp.mailfrom=aneesh.kumar@linux.ibm.com; dmarc=pass (policy=none) header.from=ibm.com X-Rspamd-Server: rspam01 X-Rspam-User: X-Stat-Signature: ysy89yppfpbnrzs9xqfdrf94cpm9qrq5 X-HE-Tag: 1659035159-685993 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: With memory tiers support we can have memory only NUMA nodes in the top tier from which we want to avoid promotion tracking NUMA faults. Update node_is_toptier to work with memory tiers. All NUMA nodes are by default top tier nodes. With lower memory tiers added we consider all memory tiers above a memory tier having CPU NUMA nodes as a top memory tier Signed-off-by: Aneesh Kumar K.V --- include/linux/memory-tiers.h | 11 ++++++++++ include/linux/node.h | 5 ----- mm/huge_memory.c | 1 + mm/memory-tiers.c | 42 ++++++++++++++++++++++++++++++++++++ mm/migrate.c | 1 + mm/mprotect.c | 1 + 6 files changed, 56 insertions(+), 5 deletions(-) diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h index f8dbeda617a7..bc9fb9d39b2c 100644 --- a/include/linux/memory-tiers.h +++ b/include/linux/memory-tiers.h @@ -35,6 +35,7 @@ struct memory_dev_type *init_node_memory_type(int node, struct memory_dev_type * #ifdef CONFIG_MIGRATION int next_demotion_node(int node); void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets); +bool node_is_toptier(int node); #else static inline int next_demotion_node(int node) { @@ -45,6 +46,11 @@ static inline void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *target { *targets = NODE_MASK_NONE; } + +static inline bool node_is_toptier(int node) +{ + return true; +} #endif #else @@ -64,5 +70,10 @@ static inline void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *target { *targets = NODE_MASK_NONE; } + +static inline bool node_is_toptier(int node) +{ + return true; +} #endif /* CONFIG_NUMA */ #endif /* _LINUX_MEMORY_TIERS_H */ diff --git a/include/linux/node.h b/include/linux/node.h index 40d641a8bfb0..9ec680dd607f 100644 --- a/include/linux/node.h +++ b/include/linux/node.h @@ -185,9 +185,4 @@ static inline void register_hugetlbfs_with_node(node_registration_func_t reg, #define to_node(device) container_of(device, struct node, dev) -static inline bool node_is_toptier(int node) -{ - return node_state(node, N_CPU); -} - #endif /* _LINUX_NODE_H_ */ diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 834f288b3769..8405662646e9 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -35,6 +35,7 @@ #include #include #include +#include #include #include diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c index 84e2be31a853..36d87dc422ab 100644 --- a/mm/memory-tiers.c +++ b/mm/memory-tiers.c @@ -30,6 +30,7 @@ static DEFINE_MUTEX(memory_tier_lock); static LIST_HEAD(memory_tiers); struct memory_dev_type *node_memory_types[MAX_NUMNODES]; #ifdef CONFIG_MIGRATION +static int top_tier_adistance; /* * node_demotion[] examples: * @@ -159,6 +160,31 @@ static struct memory_tier *__node_get_memory_tier(int node) } #ifdef CONFIG_MIGRATION +bool node_is_toptier(int node) +{ + bool toptier; + pg_data_t *pgdat; + struct memory_tier *memtier; + + pgdat = NODE_DATA(node); + if (!pgdat) + return false; + + rcu_read_lock(); + memtier = rcu_dereference(pgdat->memtier); + if (!memtier) { + toptier = true; + goto out; + } + if (memtier->adistance_start >= top_tier_adistance) + toptier = true; + else + toptier = false; +out: + rcu_read_unlock(); + return toptier; +} + void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets) { struct memory_tier *memtier; @@ -315,6 +341,22 @@ static void establish_demotion_targets(void) } } while (1); } + /* + * Promotion is allowed from a memory tier to higher + * memory tier only if the memory tier doesn't include + * compute. We want to skip promotion from a memory tier, + * if any node that is part of the memory tier have CPUs. + * Once we detect such a memory tier, we consider that tier + * as top tiper from which promotion on is not allowed. + */ + list_for_each_entry(memtier, &memory_tiers, list) { + tier_nodes = get_memtier_nodemask(memtier); + nodes_and(tier_nodes, node_states[N_CPU], tier_nodes); + if (!nodes_empty(tier_nodes)) { + top_tier_adistance = memtier->adistance_start; + break; + } + } /* * Now build the lower_tier mask for each node collecting node mask from * all memory tier below it. This allows us to fallback demotion page diff --git a/mm/migrate.c b/mm/migrate.c index c758c9c21d7d..1da81136eaaa 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -50,6 +50,7 @@ #include #include #include +#include #include diff --git a/mm/mprotect.c b/mm/mprotect.c index ba5592655ee3..92a2fc0fa88b 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -31,6 +31,7 @@ #include #include #include +#include #include #include #include