From patchwork Tue Apr  1 09:27:15 2025
X-Patchwork-Submitter: Oscar Salvador
X-Patchwork-Id: 14034575
From: Oscar Salvador
To: Andrew Morton
Cc: David Hildenbrand, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Vlastimil Babka, Hyeonggon Yoo <42.hyeyoo@gmail.com>, mkoutny@suse.com,
 Dan Williams, Jonathan Cameron, Oscar Salvador
Subject: [PATCH 1/2] mm,memory_hotplug: Implement numa node notifier
Date: Tue, 1 Apr 2025 11:27:15 +0200
Message-ID: <20250401092716.537512-2-osalvador@suse.de>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250401092716.537512-1-osalvador@suse.de>
References: <20250401092716.537512-1-osalvador@suse.de>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

At least four consumers of hotplug_memory_notifier are really only
interested in whether a numa node changed its state, e.g. going from
being memory aware to becoming memoryless.

Implement a dedicated notifier for numa node state changes, and convert
the consumers that only care about such changes to use it.

Signed-off-by: Oscar Salvador
Reviewed-by: Harry Yoo
---
 drivers/acpi/numa/hmat.c  |  6 +--
 drivers/base/node.c       | 19 +++++++++
 drivers/cxl/core/region.c | 14 +++----
 drivers/cxl/cxl.h         |  4 +-
 include/linux/memory.h    | 38 ++++++++++++++++++
 kernel/cgroup/cpuset.c    |  2 +-
 mm/memory-tiers.c         |  8 ++--
 mm/memory_hotplug.c       | 84 +++++++++++++++++++++++++++++----------
 mm/slub.c                 | 22 +++++-----
 9 files changed, 148 insertions(+), 49 deletions(-)

diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
index bfbb08b1e6af..d18f3efa2149 100644
--- a/drivers/acpi/numa/hmat.c
+++ b/drivers/acpi/numa/hmat.c
@@ -918,10 +918,10 @@ static int hmat_callback(struct notifier_block *self,
 			 unsigned long action, void *arg)
 {
 	struct memory_target *target;
-	struct memory_notify *mnb = arg;
+	struct node_notify *mnb = arg;
 	int pxm, nid = mnb->status_change_nid;
 
-	if (nid == NUMA_NO_NODE || action != MEM_ONLINE)
+	if (nid == NUMA_NO_NODE || action != NODE_BECAME_MEM_AWARE)
 		return NOTIFY_OK;
 
 	pxm = node_to_pxm(nid);
@@ -1074,7 +1074,7 @@ static __init int hmat_init(void)
 	hmat_register_targets();
 
 	/* Keep the table and structures if the notifier may use them */
-	if (hotplug_memory_notifier(hmat_callback, HMAT_CALLBACK_PRI))
+	if (hotplug_node_notifier(hmat_callback, HMAT_CALLBACK_PRI))
 		goto out_put;
 
 	if (!hmat_set_default_dram_perf())
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 0ea653fa3433..182c71dfb5b8 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -110,6 +110,25 @@ static const struct attribute_group *node_access_node_groups[] = {
 	NULL,
 };
 
+static BLOCKING_NOTIFIER_HEAD(node_chain);
+
+int register_node_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&node_chain, nb);
+}
+EXPORT_SYMBOL(register_node_notifier);
+
+void unregister_node_notifier(struct notifier_block *nb)
+{
+	blocking_notifier_chain_unregister(&node_chain, nb);
+}
+EXPORT_SYMBOL(unregister_node_notifier);
+
+int node_notify(unsigned long val, void *v)
+{
+	return blocking_notifier_call_chain(&node_chain, val, v);
+}
+
 static void node_remove_accesses(struct node *node)
 {
 	struct node_access_nodes *c, *cnext;
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index e8d11a988fd9..7d187088f557 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2409,12 +2409,12 @@ static int cxl_region_perf_attrs_callback(struct notifier_block *nb,
 					  unsigned long action, void *arg)
 {
 	struct cxl_region *cxlr = container_of(nb, struct cxl_region,
-					       memory_notifier);
-	struct memory_notify *mnb = arg;
+					       node_notifier);
+	struct node_notify *mnb = arg;
 	int nid = mnb->status_change_nid;
 	int region_nid;
 
-	if (nid == NUMA_NO_NODE || action != MEM_ONLINE)
+	if (nid == NUMA_NO_NODE || action != NODE_BECAME_MEM_AWARE)
 		return NOTIFY_DONE;
 
 	/*
@@ -3388,7 +3388,7 @@ static void shutdown_notifiers(void *_cxlr)
 {
 	struct cxl_region *cxlr = _cxlr;
 
-	unregister_memory_notifier(&cxlr->memory_notifier);
+	unregister_node_notifier(&cxlr->node_notifier);
 	unregister_mt_adistance_algorithm(&cxlr->adist_notifier);
 }
 
@@ -3427,9 +3427,9 @@ static int cxl_region_probe(struct device *dev)
 	if (rc)
 		return rc;
 
-	cxlr->memory_notifier.notifier_call = cxl_region_perf_attrs_callback;
-	cxlr->memory_notifier.priority = CXL_CALLBACK_PRI;
-	register_memory_notifier(&cxlr->memory_notifier);
+	cxlr->node_notifier.notifier_call = cxl_region_perf_attrs_callback;
+	cxlr->node_notifier.priority = CXL_CALLBACK_PRI;
+	register_node_notifier(&cxlr->node_notifier);
 
 	cxlr->adist_notifier.notifier_call = cxl_region_calculate_adistance;
 	cxlr->adist_notifier.priority = 100;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index bbbaa0d0a670..d4c9a499de7a 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -532,7 +532,7 @@ struct cxl_region_params {
  * @flags: Region state flags
  * @params: active + config params for the region
  * @coord: QoS access coordinates for the region
- * @memory_notifier: notifier for setting the access coordinates to node
+ * @node_notifier: notifier for setting the access coordinates to node
  * @adist_notifier: notifier for calculating the abstract distance of node
  */
 struct cxl_region {
@@ -545,7 +545,7 @@ struct cxl_region {
 	unsigned long flags;
 	struct cxl_region_params params;
 	struct access_coordinate coord[ACCESS_COORDINATE_MAX];
-	struct notifier_block memory_notifier;
+	struct notifier_block node_notifier;
 	struct notifier_block adist_notifier;
 };
 
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 12daa6ec7d09..1d814dfbb8a8 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -99,6 +99,14 @@ int set_memory_block_size_order(unsigned int order);
 #define MEM_PREPARE_ONLINE	(1<<6)
 #define MEM_FINISH_OFFLINE	(1<<7)
 
+/* These states are used for numa node notifiers */
+#define NODE_BECOMING_MEM_AWARE		(1<<0)
+#define NODE_BECAME_MEM_AWARE		(1<<1)
+#define NODE_BECOMING_MEMORYLESS	(1<<2)
+#define NODE_BECAME_MEMORYLESS		(1<<3)
+#define NODE_CANCEL_MEM_AWARE		(1<<4)
+#define NODE_CANCEL_MEMORYLESS		(1<<5)
+
 struct memory_notify {
 	/*
 	 * The altmap_start_pfn and altmap_nr_pages fields are designated for
@@ -113,6 +121,11 @@ struct memory_notify {
 	int status_change_nid;
 };
 
+struct node_notify {
+	int status_change_nid_normal;
+	int status_change_nid;
+};
+
 struct notifier_block;
 struct mem_section;
 
@@ -149,15 +162,34 @@ static inline int hotplug_memory_notifier(notifier_fn_t fn, int pri)
 {
 	return 0;
 }
+
+static inline int register_node_notifier(struct notifier_block *nb)
+{
+	return 0;
+}
+static inline void unregister_node_notifier(struct notifier_block *nb)
+{
+}
+static inline int node_notify(unsigned long val, void *v)
+{
+	return 0;
+}
+static inline int hotplug_node_notifier(notifier_fn_t fn, int pri)
+{
+	return 0;
+}
 #else /* CONFIG_MEMORY_HOTPLUG */
 extern int register_memory_notifier(struct notifier_block *nb);
+extern int register_node_notifier(struct notifier_block *nb);
 extern void unregister_memory_notifier(struct notifier_block *nb);
+extern void unregister_node_notifier(struct notifier_block *nb);
 int create_memory_block_devices(unsigned long start, unsigned long size,
 				struct vmem_altmap *altmap,
 				struct memory_group *group);
 void remove_memory_block_devices(unsigned long start, unsigned long size);
 extern void memory_dev_init(void);
 extern int memory_notify(unsigned long val, void *v);
+extern int node_notify(unsigned long val, void *v);
 extern struct memory_block *find_memory_block(unsigned long section_nr);
 typedef int (*walk_memory_blocks_func_t)(struct memory_block *, void *);
 extern int walk_memory_blocks(unsigned long start, unsigned long size,
@@ -177,6 +209,12 @@ int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func,
 	register_memory_notifier(&fn##_mem_nb);			\
 })
 
+#define hotplug_node_notifier(fn, pri) ({		\
+	static __meminitdata struct notifier_block fn##_node_nb =\
+		{ .notifier_call = fn, .priority = pri };\
+	register_node_notifier(&fn##_node_nb);			\
+})
+
 #ifdef CONFIG_NUMA
 void memory_block_add_nid(struct memory_block *mem, int nid,
 			  enum meminit_context context);
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 0f910c828973..62a5d34c4331 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3939,7 +3939,7 @@ void __init cpuset_init_smp(void)
 	cpumask_copy(top_cpuset.effective_cpus, cpu_active_mask);
 	top_cpuset.effective_mems = node_states[N_MEMORY];
 
-	hotplug_memory_notifier(cpuset_track_online_nodes, CPUSET_CALLBACK_PRI);
+	hotplug_node_notifier(cpuset_track_online_nodes, CPUSET_CALLBACK_PRI);
 
 	cpuset_migrate_mm_wq = alloc_ordered_workqueue("cpuset_migrate_mm", 0);
 	BUG_ON(!cpuset_migrate_mm_wq);
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index fc14fe53e9b7..dfe6c28c8352 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -872,7 +872,7 @@ static int __meminit memtier_hotplug_callback(struct notifier_block *self,
 					      unsigned long action, void *_arg)
 {
 	struct memory_tier *memtier;
-	struct memory_notify *arg = _arg;
+	struct node_notify *arg = _arg;
 
 	/*
 	 * Only update the node migration order when a node is
@@ -882,13 +882,13 @@ static int __meminit memtier_hotplug_callback(struct notifier_block *self,
 		return notifier_from_errno(0);
 
 	switch (action) {
-	case MEM_OFFLINE:
+	case NODE_BECAME_MEMORYLESS:
 		mutex_lock(&memory_tier_lock);
 		if (clear_node_memory_tier(arg->status_change_nid))
 			establish_demotion_targets();
 		mutex_unlock(&memory_tier_lock);
 		break;
-	case MEM_ONLINE:
+	case NODE_BECAME_MEM_AWARE:
 		mutex_lock(&memory_tier_lock);
 		memtier = set_node_memory_tier(arg->status_change_nid);
 		if (!IS_ERR(memtier))
@@ -929,7 +929,7 @@ static int __init memory_tier_init(void)
 	nodes_and(default_dram_nodes, node_states[N_MEMORY],
 		  node_states[N_CPU]);
 
-	hotplug_memory_notifier(memtier_hotplug_callback, MEMTIER_HOTPLUG_PRI);
+	hotplug_node_notifier(memtier_hotplug_callback, MEMTIER_HOTPLUG_PRI);
 	return 0;
 }
 subsys_initcall(memory_tier_init);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 75401866fb76..4bb9ff282ec9 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -701,7 +701,7 @@ static void online_pages_range(unsigned long start_pfn, unsigned long nr_pages)
 
 /* check which state of node_states will be changed when online memory */
 static void node_states_check_changes_online(unsigned long nr_pages,
-	struct zone *zone, struct memory_notify *arg)
+	struct zone *zone, struct node_notify *arg)
 {
 	int nid = zone_to_nid(zone);
 
@@ -714,7 +714,7 @@ static void node_states_check_changes_online(unsigned long nr_pages,
 		arg->status_change_nid_normal = nid;
 }
 
-static void node_states_set_node(int node, struct memory_notify *arg)
+static void node_states_set_node(int node, struct node_notify *arg)
 {
 	if (arg->status_change_nid_normal >= 0)
 		node_set_state(node, N_NORMAL_MEMORY);
@@ -1177,7 +1177,9 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
 	int need_zonelists_rebuild = 0;
 	const int nid = zone_to_nid(zone);
 	int ret;
-	struct memory_notify arg;
+	struct memory_notify mem_arg;
+	struct node_notify node_arg;
+	bool cancel_mem_notifier_on_err = false, cancel_node_notifier_on_err = false;
 
 	/*
 	 * {on,off}lining is constrained to full memory sections (or more
@@ -1194,11 +1196,23 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
 	/* associate pfn range with the zone */
 	move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_ISOLATE);
 
-	arg.start_pfn = pfn;
-	arg.nr_pages = nr_pages;
-	node_states_check_changes_online(nr_pages, zone, &arg);
+	mem_arg.start_pfn = pfn;
+	mem_arg.nr_pages = nr_pages;
+	node_states_check_changes_online(nr_pages, zone, &node_arg);
 
-	ret = memory_notify(MEM_GOING_ONLINE, &arg);
+	if (node_arg.status_change_nid >= 0) {
+		/* Node is becoming memory aware. Notify consumers */
+		cancel_node_notifier_on_err = true;
+		ret = node_notify(NODE_BECOMING_MEM_AWARE, &node_arg);
+		ret = notifier_to_errno(ret);
+		if (ret)
+			goto failed_addition;
+	}
+
+	cancel_mem_notifier_on_err = true;
+	mem_arg.status_change_nid = node_arg.status_change_nid;
+	mem_arg.status_change_nid_normal = node_arg.status_change_nid_normal;
+	ret = memory_notify(MEM_GOING_ONLINE, &mem_arg);
 	ret = notifier_to_errno(ret);
 	if (ret)
 		goto failed_addition;
@@ -1224,7 +1238,7 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
 	online_pages_range(pfn, nr_pages);
 	adjust_present_page_count(pfn_to_page(pfn), group, nr_pages);
 
-	node_states_set_node(nid, &arg);
+	node_states_set_node(nid, &node_arg);
 	if (need_zonelists_rebuild)
 		build_all_zonelists(NULL);
 
@@ -1245,16 +1259,26 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
 	kswapd_run(nid);
 	kcompactd_run(nid);
 
+	if (node_arg.status_change_nid >= 0)
+		/*
+		 * Node went from memoryless to having memory. Notify
+		 * interested consumers
+		 */
+		node_notify(NODE_BECAME_MEM_AWARE, &node_arg);
+
 	writeback_set_ratelimit();
 
-	memory_notify(MEM_ONLINE, &arg);
+	memory_notify(MEM_ONLINE, &mem_arg);
 	return 0;
 
 failed_addition:
 	pr_debug("online_pages [mem %#010llx-%#010llx] failed\n",
 		 (unsigned long long) pfn << PAGE_SHIFT,
 		 (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1);
-	memory_notify(MEM_CANCEL_ONLINE, &arg);
+	if (cancel_node_notifier_on_err)
+		node_notify(NODE_CANCEL_MEM_AWARE, &node_arg);
+	if (cancel_mem_notifier_on_err)
+		memory_notify(MEM_CANCEL_ONLINE, &mem_arg);
 	remove_pfn_range_from_zone(zone, pfn, nr_pages);
 	return ret;
 }
@@ -1898,7 +1922,7 @@ early_param("movable_node", cmdline_parse_movable_node);
 
 /* check which state of node_states will be changed when offline memory */
 static void node_states_check_changes_offline(unsigned long nr_pages,
-		struct zone *zone, struct memory_notify *arg)
+		struct zone *zone, struct node_notify *arg)
 {
 	struct pglist_data *pgdat = zone->zone_pgdat;
 	unsigned long present_pages = 0;
@@ -1935,7 +1959,7 @@ static void node_states_check_changes_offline(unsigned long nr_pages,
 		arg->status_change_nid = zone_to_nid(zone);
 }
 
-static void node_states_clear_node(int node, struct memory_notify *arg)
+static void node_states_clear_node(int node, struct node_notify *arg)
 {
 	if (arg->status_change_nid_normal >= 0)
 		node_clear_state(node, N_NORMAL_MEMORY);
@@ -1963,7 +1987,9 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	unsigned long pfn, managed_pages, system_ram_pages = 0;
 	const int node = zone_to_nid(zone);
 	unsigned long flags;
-	struct memory_notify arg;
+	struct memory_notify mem_arg;
+	struct node_notify node_arg;
+	bool cancel_mem_notifier_on_err = false, cancel_node_notifier_on_err = false;
 	char *reason;
 	int ret;
 
@@ -2022,11 +2048,22 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 		goto failed_removal_pcplists_disabled;
 	}
 
-	arg.start_pfn = start_pfn;
-	arg.nr_pages = nr_pages;
-	node_states_check_changes_offline(nr_pages, zone, &arg);
+	mem_arg.start_pfn = start_pfn;
+	mem_arg.nr_pages = nr_pages;
+	node_states_check_changes_offline(nr_pages, zone, &node_arg);
+
+	if (node_arg.status_change_nid >= 0) {
+		cancel_node_notifier_on_err = true;
+		ret = node_notify(NODE_BECOMING_MEMORYLESS, &node_arg);
+		ret = notifier_to_errno(ret);
+		if (ret)
+			goto failed_removal_isolated;
+	}
 
-	ret = memory_notify(MEM_GOING_OFFLINE, &arg);
+	cancel_mem_notifier_on_err = true;
+	mem_arg.status_change_nid = node_arg.status_change_nid;
+	mem_arg.status_change_nid_normal = node_arg.status_change_nid_normal;
+	ret = memory_notify(MEM_GOING_OFFLINE, &mem_arg);
 	ret = notifier_to_errno(ret);
 	if (ret) {
 		reason = "notifier failure";
@@ -2106,27 +2143,32 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	 * Make sure to mark the node as memory-less before rebuilding the zone
 	 * list. Otherwise this node would still appear in the fallback lists.
 	 */
-	node_states_clear_node(node, &arg);
+	node_states_clear_node(node, &node_arg);
 	if (!populated_zone(zone)) {
 		zone_pcp_reset(zone);
 		build_all_zonelists(NULL);
 	}
 
-	if (arg.status_change_nid >= 0) {
+	if (node_arg.status_change_nid >= 0) {
 		kcompactd_stop(node);
 		kswapd_stop(node);
+		/* Node went memoryless. Notify interested consumers */
+		node_notify(NODE_BECAME_MEMORYLESS, &node_arg);
 	}
 
 	writeback_set_ratelimit();
 
-	memory_notify(MEM_OFFLINE, &arg);
+	memory_notify(MEM_OFFLINE, &mem_arg);
 	remove_pfn_range_from_zone(zone, start_pfn, nr_pages);
 	return 0;
 
 failed_removal_isolated:
 	/* pushback to free area */
 	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
-	memory_notify(MEM_CANCEL_OFFLINE, &arg);
+	if (cancel_node_notifier_on_err)
+		node_notify(NODE_CANCEL_MEMORYLESS, &node_arg);
+	if (cancel_mem_notifier_on_err)
+		memory_notify(MEM_CANCEL_OFFLINE, &mem_arg);
 failed_removal_pcplists_disabled:
 	lru_cache_enable();
 	zone_pcp_enable(zone);
diff --git a/mm/slub.c b/mm/slub.c
index 184fd2b14758..74350f6c8ddd 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5928,10 +5928,10 @@ static int slab_mem_going_offline_callback(void *arg)
 
 static void slab_mem_offline_callback(void *arg)
 {
-	struct memory_notify *marg = arg;
+	struct node_notify *narg = arg;
 	int offline_node;
 
-	offline_node = marg->status_change_nid_normal;
+	offline_node = narg->status_change_nid_normal;
 
 	/*
 	 * If the node still has available memory. we need kmem_cache_node
@@ -5954,8 +5954,8 @@ static int slab_mem_going_online_callback(void *arg)
 {
 	struct kmem_cache_node *n;
 	struct kmem_cache *s;
-	struct memory_notify *marg = arg;
-	int nid = marg->status_change_nid_normal;
+	struct node_notify *narg = arg;
+	int nid = narg->status_change_nid_normal;
 	int ret = 0;
 
 	/*
@@ -6007,18 +6007,18 @@ static int slab_memory_callback(struct notifier_block *self,
 	int ret = 0;
 
 	switch (action) {
-	case MEM_GOING_ONLINE:
+	case NODE_BECOMING_MEM_AWARE:
 		ret = slab_mem_going_online_callback(arg);
 		break;
-	case MEM_GOING_OFFLINE:
+	case NODE_BECOMING_MEMORYLESS:
 		ret = slab_mem_going_offline_callback(arg);
 		break;
-	case MEM_OFFLINE:
-	case MEM_CANCEL_ONLINE:
+	case NODE_BECAME_MEMORYLESS:
+	case NODE_CANCEL_MEM_AWARE:
 		slab_mem_offline_callback(arg);
 		break;
-	case MEM_ONLINE:
-	case MEM_CANCEL_OFFLINE:
+	case NODE_BECAME_MEM_AWARE:
+	case NODE_CANCEL_MEMORYLESS:
 		break;
 	}
 	if (ret)
@@ -6094,7 +6094,7 @@ void __init kmem_cache_init(void)
 			sizeof(struct kmem_cache_node),
 			SLAB_HWCACHE_ALIGN | SLAB_NO_OBJ_EXT, 0, 0);
 
-	hotplug_memory_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);
+	hotplug_node_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);
 
 	/* Able to allocate the per node structures */
 	slab_state = PARTIAL;
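
For readers who want to see the event ordering outside the kernel, the
following is a minimal userspace sketch of the online path, not kernel code.
It assumes a simplified error convention (callbacks return 0 or a negative
errno instead of NOTIFY_* codes passed through notifier_to_errno()). The
NODE_* values and struct node_notify are copied from the patch; sketch_nb,
sketch_register(), sketch_node_notify(), sketch_consumer(),
sketch_online_node() and sketch_fail_prepare are hypothetical names made up
for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Event values copied from the include/linux/memory.h hunk of the patch. */
#define NODE_BECOMING_MEM_AWARE	(1 << 0)
#define NODE_BECAME_MEM_AWARE	(1 << 1)
#define NODE_CANCEL_MEM_AWARE	(1 << 4)

/* Mirrors struct node_notify from the patch. */
struct node_notify {
	int status_change_nid_normal;
	int status_change_nid;
};

/* Hypothetical userspace stand-in for struct notifier_block. */
struct sketch_nb {
	int (*call)(struct sketch_nb *nb, unsigned long action, void *arg);
	int priority;
	struct sketch_nb *next;
};

static struct sketch_nb *sketch_chain;	/* highest priority first */
static int sketch_events_seen;		/* bitmask of actions observed */
static int sketch_fail_prepare;		/* non-zero: reject the prepare step */

/* Insert by descending priority; equal priorities keep registration order. */
static void sketch_register(struct sketch_nb *nb)
{
	struct sketch_nb **p = &sketch_chain;

	assert(nb->call != NULL);
	while (*p && (*p)->priority >= nb->priority)
		p = &(*p)->next;
	nb->next = *p;
	*p = nb;
}

/* Run the chain in order and stop at the first callback reporting an error. */
static int sketch_node_notify(unsigned long action, struct node_notify *arg)
{
	for (struct sketch_nb *nb = sketch_chain; nb; nb = nb->next) {
		int ret = nb->call(nb, action, arg);

		if (ret)
			return ret;
	}
	return 0;
}

/* Example consumer: records each action, may veto NODE_BECOMING_MEM_AWARE. */
static int sketch_consumer(struct sketch_nb *nb, unsigned long action, void *arg)
{
	struct node_notify *nn = arg;

	(void)nb;
	if (nn->status_change_nid < 0)
		return 0;
	sketch_events_seen |= (int)action;
	if (action == NODE_BECOMING_MEM_AWARE && sketch_fail_prepare)
		return -ENOMEM;
	return 0;
}

/* Two-phase flow modelled on the node part of online_pages() in the patch. */
static int sketch_online_node(int nid)
{
	struct node_notify arg = {
		.status_change_nid_normal = nid,
		.status_change_nid = nid,
	};
	int ret = sketch_node_notify(NODE_BECOMING_MEM_AWARE, &arg);

	if (ret) {
		/* A consumer vetoed the change: tell everyone to undo. */
		sketch_node_notify(NODE_CANCEL_MEM_AWARE, &arg);
		return ret;
	}
	/* The kernel would online the pages and set the node state here. */
	sketch_node_notify(NODE_BECAME_MEM_AWARE, &arg);
	return 0;
}
```

To try it, register one struct sketch_nb whose call member points at
sketch_consumer and call sketch_online_node() once with sketch_fail_prepare
cleared and once with it set. The offline path in the patch follows the same
shape with NODE_BECOMING_MEMORYLESS, NODE_BECAME_MEMORYLESS and
NODE_CANCEL_MEMORYLESS.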