Message ID | 20220309144644.4278-1-osalvador@suse.de (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm: Only re-generate demotion targets when a numa node changes its N_CPU state | expand |
On Wed, 9 Mar 2022 15:46:44 +0100 Oscar Salvador <osalvador@suse.de> wrote: > Abhishek reported that after patch [1], hotplug operations are > taking ~double the expected time. [2] > > The reason behind is that the CPU callbacks that migrate_on_reclaim_init() > sets always call set_migration_target_nodes() whenever a CPU is brought > up/down. > But we only care about numa nodes going from having cpus to become > cpuless, and vice versa, as that influences the demotion_target order. > > We do already have two CPU callbacks (vmstat_cpu_online() and vmstat_cpu_dead()) > that check exactly that, so get rid of the CPU callbacks in > migrate_on_reclaim_init() and only call set_migration_target_nodes() from > vmstat_cpu_{dead,online}() whenever a numa node change its N_CPU state. > > [1] https://lore.kernel.org/linux-mm/20210721063926.3024591-2-ying.huang@intel.com/ > [2] https://lore.kernel.org/linux-mm/eb438ddd-2919-73d4-bd9f-b7eecdd9577a@linux.vnet.ibm.com/ > > ... > > extern bool numa_demotion_enabled; > +#ifdef CONFIG_HOTPLUG_CPU > +extern void set_migration_target_nodes(void); > +#else > +static inline void set_migration_target_nodes() {} Compiler won't like that. Please test (compile and runtime) with CONFIG_HOTPLUG_CPU=n. > > ... >
On Wed, Mar 09, 2022 at 01:02:49PM -0800, Andrew Morton wrote: > On Wed, 9 Mar 2022 15:46:44 +0100 Oscar Salvador <osalvador@suse.de> wrote: > > +static inline void set_migration_target_nodes() {} > > Compiler won't like that. Please test (compile and runtime) with > CONFIG_HOTPLUG_CPU=n. Sorry, fat fingers. This on top: diff --git a/include/linux/migrate.h b/include/linux/migrate.h index c64fe2923fb0..b82b4a9a0136 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -51,7 +51,7 @@ extern bool numa_demotion_enabled; #ifdef CONFIG_HOTPLUG_CPU extern void set_migration_target_nodes(void); #else -static inline void set_migration_target_nodes() {} +static inline void set_migration_target_nodes(void) {} #endif #else I will wait for some feedback before sending a v2. thanks Andrew!
Oscar Salvador <osalvador@suse.de> writes: > Abhishek reported that after patch [1], hotplug operations are > taking ~double the expected time. [2] > > The reason behind is that the CPU callbacks that migrate_on_reclaim_init() > sets always call set_migration_target_nodes() whenever a CPU is brought > up/down. > But we only care about numa nodes going from having cpus to become > cpuless, and vice versa, as that influences the demotion_target order. > > We do already have two CPU callbacks (vmstat_cpu_online() and vmstat_cpu_dead()) > that check exactly that, so get rid of the CPU callbacks in > migrate_on_reclaim_init() and only call set_migration_target_nodes() from > vmstat_cpu_{dead,online}() whenever a numa node change its N_CPU state. > > [1] https://lore.kernel.org/linux-mm/20210721063926.3024591-2-ying.huang@intel.com/ > [2] https://lore.kernel.org/linux-mm/eb438ddd-2919-73d4-bd9f-b7eecdd9577a@linux.vnet.ibm.com/ > > Reported-by: Abhishek Goel <huntbag@linux.vnet.ibm.com> > Signed-off-by: Oscar Salvador <osalvador@suse.de> > --- > I think there is further room for improvement like should we call in to > set_migration_target_nodes() when demotion is disabled via sysctl? > I will have a look into that, but let us go with this quick fix for now. > Also, I am not really strong about the Fixes tag, but it can be added > if you think it makes sense. > --- > include/linux/migrate.h | 5 +++++ > mm/migrate.c | 36 +----------------------------------- > mm/vmstat.c | 10 +++++++++- > 3 files changed, 15 insertions(+), 36 deletions(-) > > diff --git a/include/linux/migrate.h b/include/linux/migrate.h > index db96e10eb8da..c64fe2923fb0 100644 > --- a/include/linux/migrate.h > +++ b/include/linux/migrate.h > @@ -48,6 +48,11 @@ int folio_migrate_mapping(struct address_space *mapping, > struct folio *newfolio, struct folio *folio, int extra_count); > > extern bool numa_demotion_enabled; > +#ifdef CONFIG_HOTPLUG_CPU > +extern void set_migration_target_nodes(void); > +#else > +static inline void set_migration_target_nodes() {} > +#endif > #else > > static inline void putback_movable_pages(struct list_head *l) {} > diff --git a/mm/migrate.c b/mm/migrate.c > index c7da064b4781..7847e4de01d7 100644 > --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -3190,7 +3190,7 @@ static void __set_migration_target_nodes(void) > /* > * For callers that do not hold get_online_mems() already. > */ > -static void set_migration_target_nodes(void) > +void set_migration_target_nodes(void) > { > get_online_mems(); > __set_migration_target_nodes(); > @@ -3254,47 +3254,13 @@ static int __meminit migrate_on_reclaim_callback(struct notifier_block *self, > return notifier_from_errno(0); > } > > -/* > - * React to hotplug events that might affect the migration targets > - * like events that online or offline NUMA nodes. > - * > - * The ordering is also currently dependent on which nodes have > - * CPUs. That means we need CPU on/offline notification too. > - */ > -static int migration_online_cpu(unsigned int cpu) > -{ > - set_migration_target_nodes(); > - return 0; > -} > - > -static int migration_offline_cpu(unsigned int cpu) > -{ > - set_migration_target_nodes(); > - return 0; > -} > - > static int __init migrate_on_reclaim_init(void) > { > - int ret; > - > node_demotion = kmalloc_array(nr_node_ids, > sizeof(struct demotion_nodes), > GFP_KERNEL); > WARN_ON(!node_demotion); > > - ret = cpuhp_setup_state_nocalls(CPUHP_MM_DEMOTION_DEAD, "mm/demotion:offline", > - NULL, migration_offline_cpu); > - /* > - * In the unlikely case that this fails, the automatic > - * migration targets may become suboptimal for nodes > - * where N_CPU changes. With such a small impact in a > - * rare case, do not bother trying to do anything special. > - */ > - WARN_ON(ret < 0); > - ret = cpuhp_setup_state(CPUHP_AP_MM_DEMOTION_ONLINE, "mm/demotion:online", > - migration_online_cpu, NULL); > - WARN_ON(ret < 0); > - We need to call set_migration_target_nodes() during system boot somewhere, either here or in init_mm_internals(). Best Regards, Huang, Ying > hotplug_memory_notifier(migrate_on_reclaim_callback, 100); > return 0; > } > diff --git a/mm/vmstat.c b/mm/vmstat.c > index 4057372745d0..0529a83c8f89 100644 > --- a/mm/vmstat.c > +++ b/mm/vmstat.c > @@ -28,6 +28,7 @@ > #include <linux/mm_inline.h> > #include <linux/page_ext.h> > #include <linux/page_owner.h> > +#include <linux/migrate.h> > > #include "internal.h" > > @@ -2043,7 +2044,12 @@ static void __init init_cpu_node_state(void) > static int vmstat_cpu_online(unsigned int cpu) > { > refresh_zone_stat_thresholds(); > - node_set_state(cpu_to_node(cpu), N_CPU); > + > + if (!node_state(cpu_to_node(cpu), N_CPU)) { > + node_set_state(cpu_to_node(cpu), N_CPU); > + set_migration_target_nodes(); > + } > + > return 0; > } > > @@ -2066,6 +2072,8 @@ static int vmstat_cpu_dead(unsigned int cpu) > return 0; > > node_clear_state(node, N_CPU); > + set_migration_target_nodes(); > + > return 0; > }
Hi Oscar, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on hnaz-mm/master] url: https://github.com/0day-ci/linux/commits/Oscar-Salvador/mm-Only-re-generate-demotion-targets-when-a-numa-node-changes-its-N_CPU-state/20220309-224707 base: https://github.com/hnaz/linux-mm master config: sparc-randconfig-r001-20220309 (https://download.01.org/0day-ci/archive/20220310/202203100830.iDtoMHKt-lkp@intel.com/config) compiler: sparc64-linux-gcc (GCC) 11.2.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/0day-ci/linux/commit/8af0c9ff9475c64e31963a5810b127875081c5ff git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Oscar-Salvador/mm-Only-re-generate-demotion-targets-when-a-numa-node-changes-its-N_CPU-state/20220309-224707 git checkout 8af0c9ff9475c64e31963a5810b127875081c5ff # save the config file to linux build tree mkdir build_dir COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=sparc SHELL=/bin/bash kernel/sched/ If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> All warnings (new ones prefixed by >>): In file included from kernel/sched/sched.h:53, from kernel/sched/core.c:13: include/linux/migrate.h:54:20: error: function declaration isn't a prototype [-Werror=strict-prototypes] 54 | static inline void set_migration_target_nodes() {} | ^~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/migrate.h: In function 'set_migration_target_nodes': >> include/linux/migrate.h:54:20: warning: old-style function definition [-Wold-style-definition] kernel/sched/core.c: At top level: kernel/sched/core.c:3442:6: warning: no previous prototype for 'sched_set_stop_task' [-Wmissing-prototypes] 3442 | void sched_set_stop_task(int cpu, struct task_struct *stop) | ^~~~~~~~~~~~~~~~~~~ cc1: some warnings being treated as errors -- In file included from kernel/sched/sched.h:53, from kernel/sched/loadavg.c:9: include/linux/migrate.h:54:20: error: function declaration isn't a prototype [-Werror=strict-prototypes] 54 | static inline void set_migration_target_nodes() {} | ^~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/migrate.h: In function 'set_migration_target_nodes': >> include/linux/migrate.h:54:20: warning: old-style function definition [-Wold-style-definition] cc1: some warnings being treated as errors -- In file included from kernel/sched/sched.h:53, from kernel/sched/fair.c:23: include/linux/migrate.h:54:20: error: function declaration isn't a prototype [-Werror=strict-prototypes] 54 | static inline void set_migration_target_nodes() {} | ^~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/migrate.h: In function 'set_migration_target_nodes': >> include/linux/migrate.h:54:20: warning: old-style function definition [-Wold-style-definition] kernel/sched/fair.c: At top level: kernel/sched/fair.c:11135:6: warning: no previous prototype for 'task_vruntime_update' [-Wmissing-prototypes] 11135 | void task_vruntime_update(struct rq *rq, struct task_struct *p, bool in_fi) | ^~~~~~~~~~~~~~~~~~~~ cc1: some warnings being treated as errors -- In file included from kernel/sched/sched.h:53, from kernel/sched/rt.c:6: include/linux/migrate.h:54:20: error: function declaration isn't a prototype [-Werror=strict-prototypes] 54 | static inline void set_migration_target_nodes() {} | ^~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/migrate.h: In function 'set_migration_target_nodes': >> include/linux/migrate.h:54:20: warning: old-style function definition [-Wold-style-definition] kernel/sched/rt.c: At top level: kernel/sched/rt.c:730:6: warning: no previous prototype for 'sched_rt_bandwidth_account' [-Wmissing-prototypes] 730 | bool sched_rt_bandwidth_account(struct rt_rq *rt_rq) | ^~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: some warnings being treated as errors vim +54 include/linux/migrate.h 36 37 extern void migrate_page_states(struct page *newpage, struct page *page); 38 extern void migrate_page_copy(struct page *newpage, struct page *page); 39 extern int migrate_huge_page_move_mapping(struct address_space *mapping, 40 struct page *newpage, struct page *page); 41 extern int migrate_page_move_mapping(struct address_space *mapping, 42 struct page *newpage, struct page *page, int extra_count); 43 void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep, 44 spinlock_t *ptl); 45 void folio_migrate_flags(struct folio *newfolio, struct folio *folio); 46 void folio_migrate_copy(struct folio *newfolio, struct folio *folio); 47 int folio_migrate_mapping(struct address_space *mapping, 48 struct folio *newfolio, struct folio *folio, int extra_count); 49 50 extern bool numa_demotion_enabled; 51 #ifdef CONFIG_HOTPLUG_CPU 52 extern void set_migration_target_nodes(void); 53 #else > 54 static inline void set_migration_target_nodes() {} 55 #endif 56 #else 57 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Hi Oscar, Thank you for the patch! Yet something to improve: [auto build test ERROR on hnaz-mm/master] url: https://github.com/0day-ci/linux/commits/Oscar-Salvador/mm-Only-re-generate-demotion-targets-when-a-numa-node-changes-its-N_CPU-state/20220309-224707 base: https://github.com/hnaz/linux-mm master config: riscv-randconfig-s032-20220309 (https://download.01.org/0day-ci/archive/20220310/202203100900.4UKi7486-lkp@intel.com/config) compiler: riscv32-linux-gcc (GCC) 11.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # apt-get install sparse # sparse version: v0.6.4-dirty # https://github.com/0day-ci/linux/commit/8af0c9ff9475c64e31963a5810b127875081c5ff git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Oscar-Salvador/mm-Only-re-generate-demotion-targets-when-a-numa-node-changes-its-N_CPU-state/20220309-224707 git checkout 8af0c9ff9475c64e31963a5810b127875081c5ff # save the config file to linux build tree mkdir build_dir COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=riscv SHELL=/bin/bash If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> All errors (new ones prefixed by >>): mm/vmstat.c: In function 'vmstat_cpu_online': >> mm/vmstat.c:2056:17: error: implicit declaration of function 'set_migration_target_nodes' [-Werror=implicit-function-declaration] 2056 | set_migration_target_nodes(); | ^~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: some warnings being treated as errors vim +/set_migration_target_nodes +2056 mm/vmstat.c 2049 2050 static int vmstat_cpu_online(unsigned int cpu) 2051 { 2052 refresh_zone_stat_thresholds(); 2053 2054 if (!node_state(cpu_to_node(cpu), N_CPU)) { 2055 node_set_state(cpu_to_node(cpu), N_CPU); > 2056 set_migration_target_nodes(); 2057 } 2058 2059 return 0; 2060 } 2061 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
On Thu, Mar 10, 2022 at 08:39:53AM +0800, Huang, Ying wrote: > We need to call set_migration_target_nodes() during system boot > somewhere, either here or in init_mm_internals(). Hi Huang Ying, vmstat_cpu_online() already gets called during boot: static struct cpuhp_step cpuhp_hp_states[] = { ... #ifdef CONFIG_SMP [CPUHP_CREATE_THREADS]= { .name = "threads:prepare", .startup.single = smpboot_create_threads, .teardown.single = NULL, .cant_stop = true, }, ... smpboot_create_threads __smpboot_create_thread smpboot_thread_fn ht->thread_fn() cpuhp_thread_fun cpuhp_invoke_callback vmstat_cpu_online That for every CPU that is brought up during boot. So unless I am missing something, I would say we are already covered there, right?
Hi Oscar, On 3/10/2022 1:48 PM, Oscar Salvador wrote: > On Thu, Mar 10, 2022 at 08:39:53AM +0800, Huang, Ying wrote: >> We need to call set_migration_target_nodes() during system boot >> somewhere, either here or in init_mm_internals(). > > Hi Huang Ying, > > vmstat_cpu_online() already gets called during boot: > > static struct cpuhp_step cpuhp_hp_states[] = { > ... > #ifdef CONFIG_SMP > [CPUHP_CREATE_THREADS]= { > .name = "threads:prepare", > .startup.single = smpboot_create_threads, > .teardown.single = NULL, > .cant_stop = true, > }, > ... > > smpboot_create_threads > __smpboot_create_thread > smpboot_thread_fn > ht->thread_fn() > cpuhp_thread_fun > cpuhp_invoke_callback > vmstat_cpu_online > > That for every CPU that is brought up during boot. > > So unless I am missing something, I would say we are already covered > there, right? I've tested your patch, unfortunately it can not work. It already set the node with N_CPU state in init_cpu_node_state() earlier, so in vmstat_cpu_online() ,the sentence 'if(!node_state(cpu_to_node(cpu), N_CPU))' is always false, which means it will not call set_migration_target_nodes(). Like Huang Ying said, we should call set_migration_target_nodes() in migrate_on_reclaim_init() or init_mm_internals().
diff --git a/include/linux/migrate.h b/include/linux/migrate.h index db96e10eb8da..c64fe2923fb0 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -48,6 +48,11 @@ int folio_migrate_mapping(struct address_space *mapping, struct folio *newfolio, struct folio *folio, int extra_count); extern bool numa_demotion_enabled; +#ifdef CONFIG_HOTPLUG_CPU +extern void set_migration_target_nodes(void); +#else +static inline void set_migration_target_nodes() {} +#endif #else static inline void putback_movable_pages(struct list_head *l) {} diff --git a/mm/migrate.c b/mm/migrate.c index c7da064b4781..7847e4de01d7 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -3190,7 +3190,7 @@ static void __set_migration_target_nodes(void) /* * For callers that do not hold get_online_mems() already. */ -static void set_migration_target_nodes(void) +void set_migration_target_nodes(void) { get_online_mems(); __set_migration_target_nodes(); @@ -3254,47 +3254,13 @@ static int __meminit migrate_on_reclaim_callback(struct notifier_block *self, return notifier_from_errno(0); } -/* - * React to hotplug events that might affect the migration targets - * like events that online or offline NUMA nodes. - * - * The ordering is also currently dependent on which nodes have - * CPUs. That means we need CPU on/offline notification too. - */ -static int migration_online_cpu(unsigned int cpu) -{ - set_migration_target_nodes(); - return 0; -} - -static int migration_offline_cpu(unsigned int cpu) -{ - set_migration_target_nodes(); - return 0; -} - static int __init migrate_on_reclaim_init(void) { - int ret; - node_demotion = kmalloc_array(nr_node_ids, sizeof(struct demotion_nodes), GFP_KERNEL); WARN_ON(!node_demotion); - ret = cpuhp_setup_state_nocalls(CPUHP_MM_DEMOTION_DEAD, "mm/demotion:offline", - NULL, migration_offline_cpu); - /* - * In the unlikely case that this fails, the automatic - * migration targets may become suboptimal for nodes - * where N_CPU changes. With such a small impact in a - * rare case, do not bother trying to do anything special. - */ - WARN_ON(ret < 0); - ret = cpuhp_setup_state(CPUHP_AP_MM_DEMOTION_ONLINE, "mm/demotion:online", - migration_online_cpu, NULL); - WARN_ON(ret < 0); - hotplug_memory_notifier(migrate_on_reclaim_callback, 100); return 0; } diff --git a/mm/vmstat.c b/mm/vmstat.c index 4057372745d0..0529a83c8f89 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -28,6 +28,7 @@ #include <linux/mm_inline.h> #include <linux/page_ext.h> #include <linux/page_owner.h> +#include <linux/migrate.h> #include "internal.h" @@ -2043,7 +2044,12 @@ static void __init init_cpu_node_state(void) static int vmstat_cpu_online(unsigned int cpu) { refresh_zone_stat_thresholds(); - node_set_state(cpu_to_node(cpu), N_CPU); + + if (!node_state(cpu_to_node(cpu), N_CPU)) { + node_set_state(cpu_to_node(cpu), N_CPU); + set_migration_target_nodes(); + } + return 0; } @@ -2066,6 +2072,8 @@ static int vmstat_cpu_dead(unsigned int cpu) return 0; node_clear_state(node, N_CPU); + set_migration_target_nodes(); + return 0; }
Abhishek reported that after patch [1], hotplug operations are taking ~double the expected time. [2] The reason behind is that the CPU callbacks that migrate_on_reclaim_init() sets always call set_migration_target_nodes() whenever a CPU is brought up/down. But we only care about numa nodes going from having cpus to become cpuless, and vice versa, as that influences the demotion_target order. We do already have two CPU callbacks (vmstat_cpu_online() and vmstat_cpu_dead()) that check exactly that, so get rid of the CPU callbacks in migrate_on_reclaim_init() and only call set_migration_target_nodes() from vmstat_cpu_{dead,online}() whenever a numa node change its N_CPU state. [1] https://lore.kernel.org/linux-mm/20210721063926.3024591-2-ying.huang@intel.com/ [2] https://lore.kernel.org/linux-mm/eb438ddd-2919-73d4-bd9f-b7eecdd9577a@linux.vnet.ibm.com/ Reported-by: Abhishek Goel <huntbag@linux.vnet.ibm.com> Signed-off-by: Oscar Salvador <osalvador@suse.de> --- I think there is further room for improvement like should we call in to set_migration_target_nodes() when demotion is disabled via sysctl? I will have a look into that, but let us go with this quick fix for now. Also, I am not really strong about the Fixes tag, but it can be added if you think it makes sense. --- include/linux/migrate.h | 5 +++++ mm/migrate.c | 36 +----------------------------------- mm/vmstat.c | 10 +++++++++- 3 files changed, 15 insertions(+), 36 deletions(-)