Message ID | 20200512132937.19295-4-srikar@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Series | Offline memoryless cpuless node 0 |
On Tue, 12 May 2020, Srikar Dronamraju wrote:

> +#ifdef CONFIG_NUMA
> +	[N_ONLINE] = NODE_MASK_NONE,

Again. Same issue as before. If you do this then you do a global change
for all architectures. You need to put something in the early boot
sequence (in a non architecture specific way) that sets the first node
online by default.

You have fixed the issue in your earlier patches for the powerpc
architecture. What about the other architectures?

Or did I miss something?
* Christopher Lameter <cl@linux.com> [2020-05-12 16:31:26]:

> On Tue, 12 May 2020, Srikar Dronamraju wrote:
>
> > +#ifdef CONFIG_NUMA
> > +	[N_ONLINE] = NODE_MASK_NONE,
>
> Again. Same issue as before. If you do this then you do a global change
> for all architectures. You need to put something in the early boot
> sequence (in a non architecture specific way) that sets the first node
> online by default.
>

I did respond to that earlier.

> You have fixed the issue in your earlier patches for the powerpc
> architecture. What about the other architectures?
>
> Or did I miss something?
>

Here are my assumptions, please do correct me if any of them are wrong.

1. My other patches for Powerpc don't change when the nodes are being
   onlined. They only change the cpu_to_node numbering of the offline
   CPUs. In this respect Powerpc, due to its PAPR compliance, may differ
   slightly from other archs, where the CPU binding of the node is not
   known till CPUs are onlined.

2. Currently the nodes are onlined (in all arch specific code) as soon as
   they are detected. This onlining is unconditional, in that there are no
   checks to see whether the node number is 0. I.e. I don't see any special
   checks that restrict or allow node 0 from being onlined / offlined. It
   is considered no more special than any other online node.

3. If we were to expect node 0 to be always online, then why do we have
   first_online_node? We could always hard code it to 0.

4. I tried enabling CONFIG_MEMORYLESS_NODE on x86, but that seems not to
   be possible. It looks to me that something like that is only possible
   on powerpc and IA64.

5. Without my patch, on a regular NUMA system, node 0 would be onlined by
   default during structure initialization. When the nodes get detected,
   node 0 and other nodes would again be onlined. The only drawback is
   that if node 0 wasn't supposed to be online, it will still end up being
   marked online. With the proposed patch, when the nodes get detected,
   any nodes detected would be onlined.
I think the node onlining is already pretty early in boot. I don't know of any other mechanism to move the onlining further up and in a non architecture specific way. However if you have ideas, please do let me know.
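The onlining behaviour described in points 2, 3 and 5 above can be modelled in userspace. The following is a minimal sketch, not kernel code: `model_nodemask_t`, `model_initial_online`, `model_online_detected` and `model_first_online` are hypothetical names, a single `unsigned long` stands in for `nodemask_t`, and the list of detected nodes is supplied by the caller rather than by firmware.

```c
#include <assert.h>

/* Hypothetical userspace model; one unsigned long stands in for nodemask_t. */
typedef unsigned long model_nodemask_t;

enum { MODEL_MAX_NODES = 32 };

/* Initial N_ONLINE value: node 0 preset (unpatched) or empty (patched). */
static model_nodemask_t model_initial_online(int patched)
{
	return patched ? 0UL : 1UL;
}

/* Models arch code onlining every detected node, with no node 0 special case. */
static model_nodemask_t model_online_detected(model_nodemask_t online,
					      const int *detected, int count)
{
	for (int i = 0; i < count; i++) {
		assert(detected[i] >= 0 && detected[i] < MODEL_MAX_NODES);
		online |= 1UL << detected[i];
	}
	return online;
}

/* Models first_online_node: lowest set bit, or MODEL_MAX_NODES if none is set. */
static int model_first_online(model_nodemask_t online)
{
	for (int node = 0; node < MODEL_MAX_NODES; node++)
		if (online & (1UL << node))
			return node;
	return MODEL_MAX_NODES;
}
```

With only node 2 detected, as in the report accompanying this patch, the unpatched initial value leaves nodes 0 and 2 online and `model_first_online` returns 0; with the patched initial value only node 2 is online and it returns 2.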
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 69827d4..03b8959 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -116,8 +116,10 @@ struct pcpu_drain {
  */
 nodemask_t node_states[NR_NODE_STATES] __read_mostly = {
 	[N_POSSIBLE] = NODE_MASK_ALL,
+#ifdef CONFIG_NUMA
+	[N_ONLINE] = NODE_MASK_NONE,
+#else
 	[N_ONLINE] = { { [0] = 1UL } },
-#ifndef CONFIG_NUMA
 	[N_NORMAL_MEMORY] = { { [0] = 1UL } },
 #ifdef CONFIG_HIGHMEM
 	[N_HIGH_MEMORY] = { { [0] = 1UL } },
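For readers unfamiliar with the initializer syntax in the hunk above: `nodemask_t` wraps an array of `unsigned long` bitmap words, so `{ { [0] = 1UL } }` sets bit 0 of the first word, that is, node 0. A minimal userspace sketch, assuming a 32-node limit; `sketch_nodemask_t`, the `S_*` enum and `sketch_node_isset` are hypothetical stand-ins for the kernel's definitions, and an all-zero initializer stands in for `NODE_MASK_NONE`:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SKETCH_MAX_NUMNODES 32
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_LONGS \
	((SKETCH_MAX_NUMNODES + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Stand-in for nodemask_t: a struct wrapping the bitmap words. */
typedef struct { unsigned long bits[SKETCH_LONGS]; } sketch_nodemask_t;

enum sketch_state { S_POSSIBLE, S_ONLINE, S_NR_STATES };

/* Unpatched: node 0 is preset online by the static initializer. */
static const sketch_nodemask_t sketch_before[S_NR_STATES] = {
	[S_ONLINE] = { { [0] = 1UL } },
};

/* Patched (CONFIG_NUMA): the online mask starts empty. */
static const sketch_nodemask_t sketch_after[S_NR_STATES] = {
	[S_ONLINE] = { { 0 } },
};

/* Returns 1 if the given node's bit is set in the mask, else 0. */
static int sketch_node_isset(int node, const sketch_nodemask_t *mask)
{
	assert(node >= 0 && node < SKETCH_MAX_NUMNODES);
	size_t n = (size_t)node;
	return (int)((mask->bits[n / SKETCH_BITS_PER_LONG] >>
		      (n % SKETCH_BITS_PER_LONG)) & 1UL);
}
```

Here `sketch_node_isset(0, &sketch_before[S_ONLINE])` is 1 and `sketch_node_isset(0, &sketch_after[S_ONLINE])` is 0, which is the whole effect of the hunk before any architecture code runs.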
Currently, a Linux kernel built with CONFIG_NUMA marks node 0 as online at
boot on a system with multiple possible nodes. In practice, however, there
are systems where node 0 is memoryless and cpuless.

This can cause numa_balancing to be enabled on systems with only one node
with memory and CPUs. The existence of this dummy node, which is cpuless
and memoryless, can confuse users/scripts looking at the output of
lscpu / numactl.

By marking N_ONLINE as NODE_MASK_NONE, let's stop assuming that node 0 is
always online.

v5.7-rc3
 available: 2 nodes (0,2)
 node 0 cpus:
 node 0 size: 0 MB
 node 0 free: 0 MB
 node 2 cpus: 0 1 2 3 4 5 6 7
 node 2 size: 32625 MB
 node 2 free: 31490 MB
 node distances:
 node   0   2
   0:  10  20
   2:  20  10

proc and sys files
------------------
 /sys/devices/system/node/online:            0,2
 /proc/sys/kernel/numa_balancing:            1
 /sys/devices/system/node/has_cpu:           2
 /sys/devices/system/node/has_memory:        2
 /sys/devices/system/node/has_normal_memory: 2
 /sys/devices/system/node/possible:          0-31

v5.7-rc3 + patch
------------------
 available: 1 nodes (2)
 node 2 cpus: 0 1 2 3 4 5 6 7
 node 2 size: 32625 MB
 node 2 free: 31487 MB
 node distances:
 node   2
   2:  10

proc and sys files
------------------
 /sys/devices/system/node/online:            2
 /proc/sys/kernel/numa_balancing:            0
 /sys/devices/system/node/has_cpu:           2
 /sys/devices/system/node/has_memory:        2
 /sys/devices/system/node/has_normal_memory: 2
 /sys/devices/system/node/possible:          0-31

Note: On Powerpc, cpu_to_node of possible but not present CPUs would
previously return 0. Hence this commit depends on commit ("powerpc/numa: Set
numa_node for all possible cpus") and commit ("powerpc/numa: Prefer node id
queried from vphn"). Without these 2 commits, a Powerpc system might crash.

Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Christopher Lameter <cl@linux.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
Changelog v1:->v2:
- Rebased to v5.7-rc3
Link v2: https://lore.kernel.org/linuxppc-dev/20200428093836.27190-1-srikar@linux.vnet.ibm.com/t/#u

 mm/page_alloc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
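A script consuming the sysfs files shown above can detect such a dummy node by comparing `online` against `has_cpu` and `has_memory`. Below is a minimal sketch, assuming at most 64 nodes and the comma-and-range list format seen in the output above (`0,2`, `0-31`); `parse_nodelist`, `read_nodelist_file` and `dummy_nodes` are hypothetical helper names, not an existing API.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a list such as "0,2" or "0-31" into a bitmask (bit n = node n).
 * Returns 0 on success, -1 on malformed input or a node number >= 64. */
static int parse_nodelist(const char *s, unsigned long long *mask)
{
	assert(s != NULL && mask != NULL);
	*mask = 0;
	while (*s != '\0' && *s != '\n') {
		char *end;
		unsigned long lo, hi;

		if (!isdigit((unsigned char)*s))
			return -1;
		lo = hi = strtoul(s, &end, 10);
		s = end;
		if (*s == '-') {		/* range "lo-hi" */
			s++;
			if (!isdigit((unsigned char)*s))
				return -1;
			hi = strtoul(s, &end, 10);
			s = end;
		}
		if (lo > hi || hi >= 64)
			return -1;
		for (unsigned long n = lo; n <= hi; n++)
			*mask |= 1ULL << n;
		if (*s == ',')
			s++;
		else if (*s != '\0' && *s != '\n')
			return -1;
	}
	return 0;
}

/* Read one sysfs node-list file; an empty file yields an empty mask. */
static int read_nodelist_file(const char *path, unsigned long long *mask)
{
	char buf[256];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f))
		buf[0] = '\0';
	fclose(f);
	return parse_nodelist(buf, mask);
}

/* Nodes that are online but have neither CPUs nor memory. */
static unsigned long long dummy_nodes(unsigned long long online,
				      unsigned long long has_cpu,
				      unsigned long long has_memory)
{
	return online & ~(has_cpu | has_memory);
}
```

Fed the v5.7-rc3 values above (`online` = `0,2`, `has_cpu` = `2`, `has_memory` = `2`), `dummy_nodes` returns a mask with only bit 0 set; with the patched values (`online` = `2`) it returns 0. On a live system the three masks would come from `read_nodelist_file` on the `/sys/devices/system/node/` paths listed in the output.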