diff mbox series

[-V2] mm/migrate: fix CPUHP state to update node demotion order

Message ID 20210927081100.475075-1-ying.huang@intel.com (mailing list archive)
State New
Headers show
Series [-V2] mm/migrate: fix CPUHP state to update node demotion order | expand

Commit Message

Huang, Ying Sept. 27, 2021, 8:11 a.m. UTC
The node demotion order needs to be updated during CPU hotplug.
Because whether a NUMA node has CPU may influence the demotion order.
The update function should be called during CPU online/offline after
the node_states[N_CPU] has been updated.  That is done in
CPUHP_AP_ONLINE_DYN during CPU online and in CPUHP_MM_VMSTAT_DEAD
during CPU offline.  But in commit 884a6e5d1f93 ("mm/migrate: update
node demotion order on hotplug events"), the function to update node
demotion order is called in CPUHP_AP_ONLINE_DYN during CPU
online/offline.  This doesn't satisfy the order requirement.  So in
this patch, we added CPUHP_AP_MM_DEMOTION_ONLINE and
CPUHP_MM_DEMOTION_DEAD to be called after CPUHP_AP_ONLINE_DYN and
CPUHP_MM_VMSTAT_DEAD during CPU online and offline, and register the
update function on them.

Fixes: 884a6e5d1f93 ("mm/migrate: update node demotion order on hotplug events")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Keith Busch <kbusch@kernel.org>

Changes:

v2:

- Revise state name to follow the naming convention per Thomas' comments.

- Use cpuhp_setup_state() to initialize the initial order per Mika's comments.
---
 include/linux/cpuhotplug.h | 4 ++++
 mm/migrate.c               | 8 +++++---
 2 files changed, 9 insertions(+), 3 deletions(-)

Comments

Andrew Morton Sept. 28, 2021, 12:47 a.m. UTC | #1
On Mon, 27 Sep 2021 16:11:00 +0800 Huang Ying <ying.huang@intel.com> wrote:

> The node demotion order needs to be updated during CPU hotplug.
> Because whether a NUMA node has CPU may influence the demotion order.
> The update function should be called during CPU online/offline after
> the node_states[N_CPU] has been updated.  That is done in
> CPUHP_AP_ONLINE_DYN during CPU online and in CPUHP_MM_VMSTAT_DEAD
> during CPU offline.  But in commit 884a6e5d1f93 ("mm/migrate: update
> node demotion order on hotplug events"), the function to update node
> demotion order is called in CPUHP_AP_ONLINE_DYN during CPU
> online/offline.  This doesn't satisfy the order requirement.

What are the user-visible runtime effects of this error?

>  So in
> this patch, we added CPUHP_AP_MM_DEMOTION_ONLINE and
> CPUHP_MM_DEMOTION_DEAD to be called after CPUHP_AP_ONLINE_DYN and
> CPUHP_MM_VMSTAT_DEAD during CPU online and offline, and register the
> update function on them.
> 
> Fixes: 884a6e5d1f93 ("mm/migrate: update node demotion order on hotplug events")
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Yang Shi <shy828301@gmail.com>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Wei Xu <weixugc@google.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Greg Thelen <gthelen@google.com>
> Cc: Keith Busch <kbusch@kernel.org>
>
Huang, Ying Sept. 28, 2021, 1:21 a.m. UTC | #2
Andrew Morton <akpm@linux-foundation.org> writes:

> On Mon, 27 Sep 2021 16:11:00 +0800 Huang Ying <ying.huang@intel.com> wrote:
>
>> The node demotion order needs to be updated during CPU hotplug.
>> Because whether a NUMA node has CPU may influence the demotion order.
>> The update function should be called during CPU online/offline after
>> the node_states[N_CPU] has been updated.  That is done in
>> CPUHP_AP_ONLINE_DYN during CPU online and in CPUHP_MM_VMSTAT_DEAD
>> during CPU offline.  But in commit 884a6e5d1f93 ("mm/migrate: update
>> node demotion order on hotplug events"), the function to update node
>> demotion order is called in CPUHP_AP_ONLINE_DYN during CPU
>> online/offline.  This doesn't satisfy the order requirement.
>
> What are the user-visible runtime effects of this error?

For example, there are 4 CPUs (P0, P1, P2, P3) in 2 sockets (P0, P1 in
S0 and P2, P3 in S1), the demotion order is

- S0 -> NUMA_NO_NODE
- S1 -> NUMA_NO_NODE

After P2 and P3 is offlined, because S1 has no CPU now, the demotion
order should have been changed to

- S0 -> S1
- S1 -> NO_NODE

but it isn't changed, because the order updating callback for CPU
hotplug doesn't see the new nodemask.  Now, if P1 is offlined, the
demotion order is changed to the expected order as above.

I will update the patch description after adding the above description.

Best Regards,
Huang, Ying

>>  So in
>> this patch, we added CPUHP_AP_MM_DEMOTION_ONLINE and
>> CPUHP_MM_DEMOTION_DEAD to be called after CPUHP_AP_ONLINE_DYN and
>> CPUHP_MM_VMSTAT_DEAD during CPU online and offline, and register the
>> update function on them.
>> 
>> Fixes: 884a6e5d1f93 ("mm/migrate: update node demotion order on hotplug events")
>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>> Cc: Yang Shi <shy828301@gmail.com>
>> Cc: Zi Yan <ziy@nvidia.com>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Wei Xu <weixugc@google.com>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: David Rientjes <rientjes@google.com>
>> Cc: Dan Williams <dan.j.williams@intel.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Greg Thelen <gthelen@google.com>
>> Cc: Keith Busch <kbusch@kernel.org>
>>
diff mbox series

Patch

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 832d8a74fa59..991911048857 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -72,6 +72,8 @@  enum cpuhp_state {
 	CPUHP_SLUB_DEAD,
 	CPUHP_DEBUG_OBJ_DEAD,
 	CPUHP_MM_WRITEBACK_DEAD,
+	/* Must be after CPUHP_MM_VMSTAT_DEAD */
+	CPUHP_MM_DEMOTION_DEAD,
 	CPUHP_MM_VMSTAT_DEAD,
 	CPUHP_SOFTIRQ_DEAD,
 	CPUHP_NET_MVNETA_DEAD,
@@ -240,6 +242,8 @@  enum cpuhp_state {
 	CPUHP_AP_BASE_CACHEINFO_ONLINE,
 	CPUHP_AP_ONLINE_DYN,
 	CPUHP_AP_ONLINE_DYN_END		= CPUHP_AP_ONLINE_DYN + 30,
+	/* Must be after CPUHP_AP_ONLINE_DYN for node_states[N_CPU] update */
+	CPUHP_AP_MM_DEMOTION_ONLINE,
 	CPUHP_AP_X86_HPET_ONLINE,
 	CPUHP_AP_X86_KVM_CLK_ONLINE,
 	CPUHP_AP_DTPM_CPU_ONLINE,
diff --git a/mm/migrate.c b/mm/migrate.c
index c14a55004fee..7769abac8aad 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -3284,9 +3284,8 @@  static int __init migrate_on_reclaim_init(void)
 {
 	int ret;
 
-	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "migrate on reclaim",
-				migration_online_cpu,
-				migration_offline_cpu);
+	ret = cpuhp_setup_state_nocalls(CPUHP_MM_DEMOTION_DEAD, "mm/demotion:offline",
+					NULL, migration_offline_cpu);
 	/*
 	 * In the unlikely case that this fails, the automatic
 	 * migration targets may become suboptimal for nodes
@@ -3294,6 +3293,9 @@  static int __init migrate_on_reclaim_init(void)
 	 * rare case, do not bother trying to do anything special.
 	 */
 	WARN_ON(ret < 0);
+	ret = cpuhp_setup_state(CPUHP_AP_MM_DEMOTION_ONLINE, "mm/demotion:online",
+				migration_online_cpu, NULL);
+	WARN_ON(ret < 0);
 
 	hotplug_memory_notifier(migrate_on_reclaim_callback, 100);
 	return 0;