Message ID | 20190731130037.GN9330@dhcp22.suse.cz (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | microblaze HAVE_MEMBLOCK_NODE_MAP dependency (was Re: [PATCH v2 0/5] mm: Enable CONFIG_NODES_SPAN_OTHER_NODES by default for NUMA) | expand |
On Wed, Jul 31, 2019 at 03:00:37PM +0200, Michal Hocko wrote: > On Wed 31-07-19 15:26:32, Mike Rapoport wrote: > > On Wed, Jul 31, 2019 at 01:40:16PM +0200, Michal Hocko wrote: > > > On Wed 31-07-19 14:14:22, Mike Rapoport wrote: > > > > On Wed, Jul 31, 2019 at 10:03:09AM +0200, Michal Hocko wrote: > > > > > On Wed 31-07-19 09:24:21, Mike Rapoport wrote: > > > > > > [ sorry for a late reply too, somehow I missed this thread before ] > > > > > > > > > > > > On Tue, Jul 30, 2019 at 10:14:15AM +0200, Michal Hocko wrote: > > > > > > > [Sorry for a late reply] > > > > > > > > > > > > > > On Mon 15-07-19 17:55:07, Hoan Tran OS wrote: > > > > > > > > Hi, > > > > > > > > > > > > > > > > On 7/12/19 10:00 PM, Michal Hocko wrote: > > > > > > > [...] > > > > > > > > > Hmm, I thought this was selectable. But I am obviously wrong here. > > > > > > > > > Looking more closely, it seems that this is indeed only about > > > > > > > > > __early_pfn_to_nid and as such not something that should add a config > > > > > > > > > symbol. This should have been called out in the changelog though. > > > > > > > > > > > > > > > > Yes, do you have any other comments about my patch? > > > > > > > > > > > > > > Not really. Just make sure to explicitly state that > > > > > > > CONFIG_NODES_SPAN_OTHER_NODES is only about __early_pfn_to_nid and that > > > > > > > doesn't really deserve it's own config and can be pulled under NUMA. > > > > > > > > > > > > > > > > Also while at it, does HAVE_MEMBLOCK_NODE_MAP fall into a similar > > > > > > > > > bucket? Do we have any NUMA architecture that doesn't enable it? > > > > > > > > > > > > > > > > > > > > > HAVE_MEMBLOCK_NODE_MAP makes huge difference in node/zone initialization > > > > > > sequence so it's not only about a singe function. > > > > > > > > > > The question is whether we want to have this a config option or enable > > > > > it unconditionally for each NUMA system. > > > > > > > > We can make it 'default NUMA', but we can't drop it completely because > > > > microblaze uses sparse_memory_present_with_active_regions() which is > > > > unavailable when HAVE_MEMBLOCK_NODE_MAP=n. > > > > > > I suppose you mean that microblaze is using > > > sparse_memory_present_with_active_regions even without CONFIG_NUMA, > > > right? > > > > Yes. > > > > > I have to confess I do not understand that code. What is the deal > > > with setting node id there? > > > > The sparse_memory_present_with_active_regions() iterates over > > memblock.memory regions and uses the node id of each region as the > > parameter to memory_present(). The assumption here is that sometime before > > each region was assigned a proper non-negative node id. > > > > microblaze uses device tree for memory enumeration and the current FDT code > > does memblock_add() that implicitly sets nid in memblock.memory regions to -1. > > > > So in order to have proper node id passed to memory_present() microblaze > > has to call memblock_set_node() before it can use > > sparse_memory_present_with_active_regions(). > > I am sorry, but I still do not follow. Who is consuming that node id > information when NUMA=n. In other words why cannot we simply do We can, I think nobody cared to change it. > diff --git a/arch/microblaze/mm/init.c b/arch/microblaze/mm/init.c > index a015a951c8b7..3a47e8db8d1c 100644 > --- a/arch/microblaze/mm/init.c > +++ b/arch/microblaze/mm/init.c > @@ -175,14 +175,9 @@ void __init setup_memory(void) > > start_pfn = memblock_region_memory_base_pfn(reg); > end_pfn = memblock_region_memory_end_pfn(reg); > - memblock_set_node(start_pfn << PAGE_SHIFT, > - (end_pfn - start_pfn) << PAGE_SHIFT, > - &memblock.memory, 0); > + memory_present(0, start_pfn << PAGE_SHIFT, end_pfn << PAGE_SHIFT); memory_present() expects pfns, the shift is not needed. > } > > - /* XXX need to clip this if using highmem? */ > - sparse_memory_present_with_active_regions(0); > - > paging_init(); > } > > -- > Michal Hocko > SUSE Labs
On Wed 31-07-19 17:21:29, Mike Rapoport wrote: > On Wed, Jul 31, 2019 at 03:00:37PM +0200, Michal Hocko wrote: > > On Wed 31-07-19 15:26:32, Mike Rapoport wrote: > > > On Wed, Jul 31, 2019 at 01:40:16PM +0200, Michal Hocko wrote: > > > > On Wed 31-07-19 14:14:22, Mike Rapoport wrote: > > > > > On Wed, Jul 31, 2019 at 10:03:09AM +0200, Michal Hocko wrote: > > > > > > On Wed 31-07-19 09:24:21, Mike Rapoport wrote: > > > > > > > [ sorry for a late reply too, somehow I missed this thread before ] > > > > > > > > > > > > > > On Tue, Jul 30, 2019 at 10:14:15AM +0200, Michal Hocko wrote: > > > > > > > > [Sorry for a late reply] > > > > > > > > > > > > > > > > On Mon 15-07-19 17:55:07, Hoan Tran OS wrote: > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > On 7/12/19 10:00 PM, Michal Hocko wrote: > > > > > > > > [...] > > > > > > > > > > Hmm, I thought this was selectable. But I am obviously wrong here. > > > > > > > > > > Looking more closely, it seems that this is indeed only about > > > > > > > > > > __early_pfn_to_nid and as such not something that should add a config > > > > > > > > > > symbol. This should have been called out in the changelog though. > > > > > > > > > > > > > > > > > > Yes, do you have any other comments about my patch? > > > > > > > > > > > > > > > > Not really. Just make sure to explicitly state that > > > > > > > > CONFIG_NODES_SPAN_OTHER_NODES is only about __early_pfn_to_nid and that > > > > > > > > doesn't really deserve it's own config and can be pulled under NUMA. > > > > > > > > > > > > > > > > > > Also while at it, does HAVE_MEMBLOCK_NODE_MAP fall into a similar > > > > > > > > > > bucket? Do we have any NUMA architecture that doesn't enable it? > > > > > > > > > > > > > > > > > > > > > > > > HAVE_MEMBLOCK_NODE_MAP makes huge difference in node/zone initialization > > > > > > > sequence so it's not only about a singe function. > > > > > > > > > > > > The question is whether we want to have this a config option or enable > > > > > > it unconditionally for each NUMA system. > > > > > > > > > > We can make it 'default NUMA', but we can't drop it completely because > > > > > microblaze uses sparse_memory_present_with_active_regions() which is > > > > > unavailable when HAVE_MEMBLOCK_NODE_MAP=n. > > > > > > > > I suppose you mean that microblaze is using > > > > sparse_memory_present_with_active_regions even without CONFIG_NUMA, > > > > right? > > > > > > Yes. > > > > > > > I have to confess I do not understand that code. What is the deal > > > > with setting node id there? > > > > > > The sparse_memory_present_with_active_regions() iterates over > > > memblock.memory regions and uses the node id of each region as the > > > parameter to memory_present(). The assumption here is that sometime before > > > each region was assigned a proper non-negative node id. > > > > > > microblaze uses device tree for memory enumeration and the current FDT code > > > does memblock_add() that implicitly sets nid in memblock.memory regions to -1. > > > > > > So in order to have proper node id passed to memory_present() microblaze > > > has to call memblock_set_node() before it can use > > > sparse_memory_present_with_active_regions(). > > > > I am sorry, but I still do not follow. Who is consuming that node id > > information when NUMA=n. In other words why cannot we simply do > > We can, I think nobody cared to change it. It would be great if somebody with the actual HW could try it out. I can throw a patch but I do not even have a cross compiler in my toolbox. > > > diff --git a/arch/microblaze/mm/init.c b/arch/microblaze/mm/init.c > > index a015a951c8b7..3a47e8db8d1c 100644 > > --- a/arch/microblaze/mm/init.c > > +++ b/arch/microblaze/mm/init.c > > @@ -175,14 +175,9 @@ void __init setup_memory(void) > > > > start_pfn = memblock_region_memory_base_pfn(reg); > > end_pfn = memblock_region_memory_end_pfn(reg); > > - memblock_set_node(start_pfn << PAGE_SHIFT, > > - (end_pfn - start_pfn) << PAGE_SHIFT, > > - &memblock.memory, 0); > > + memory_present(0, start_pfn << PAGE_SHIFT, end_pfn << PAGE_SHIFT); > > memory_present() expects pfns, the shift is not needed. Right.
On Wed, Jul 31, 2019 at 04:41:14PM +0200, Michal Hocko wrote: > On Wed 31-07-19 17:21:29, Mike Rapoport wrote: > > On Wed, Jul 31, 2019 at 03:00:37PM +0200, Michal Hocko wrote: > > > > > > I am sorry, but I still do not follow. Who is consuming that node id > > > information when NUMA=n. In other words why cannot we simply do > > > > We can, I think nobody cared to change it. > > It would be great if somebody with the actual HW could try it out. > I can throw a patch but I do not even have a cross compiler in my > toolbox. Well, it compiles :) > > > diff --git a/arch/microblaze/mm/init.c b/arch/microblaze/mm/init.c > > > index a015a951c8b7..3a47e8db8d1c 100644 > > > --- a/arch/microblaze/mm/init.c > > > +++ b/arch/microblaze/mm/init.c > > > @@ -175,14 +175,9 @@ void __init setup_memory(void) > > > > > > start_pfn = memblock_region_memory_base_pfn(reg); > > > end_pfn = memblock_region_memory_end_pfn(reg); > > > - memblock_set_node(start_pfn << PAGE_SHIFT, > > > - (end_pfn - start_pfn) << PAGE_SHIFT, > > > - &memblock.memory, 0); > > > + memory_present(0, start_pfn << PAGE_SHIFT, end_pfn << PAGE_SHIFT); > > > > memory_present() expects pfns, the shift is not needed. > > Right. > > -- > Michal Hocko > SUSE Labs >
On 7/31/19 10:15 AM, Mike Rapoport wrote: > On Wed, Jul 31, 2019 at 04:41:14PM +0200, Michal Hocko wrote: >> On Wed 31-07-19 17:21:29, Mike Rapoport wrote: >>> On Wed, Jul 31, 2019 at 03:00:37PM +0200, Michal Hocko wrote: >>>> >>>> I am sorry, but I still do not follow. Who is consuming that node id >>>> information when NUMA=n. In other words why cannot we simply do >>> >>> We can, I think nobody cared to change it. >> >> It would be great if somebody with the actual HW could try it out. >> I can throw a patch but I do not even have a cross compiler in my >> toolbox. > > Well, it compiles :) Adding Michal Simek <monstr@monstr.eu>. It's not clear that the MICROBLAZE maintainer is still supporting MICROBLAZE. >>>> diff --git a/arch/microblaze/mm/init.c b/arch/microblaze/mm/init.c >>>> index a015a951c8b7..3a47e8db8d1c 100644 >>>> --- a/arch/microblaze/mm/init.c >>>> +++ b/arch/microblaze/mm/init.c >>>> @@ -175,14 +175,9 @@ void __init setup_memory(void) >>>> >>>> start_pfn = memblock_region_memory_base_pfn(reg); >>>> end_pfn = memblock_region_memory_end_pfn(reg); >>>> - memblock_set_node(start_pfn << PAGE_SHIFT, >>>> - (end_pfn - start_pfn) << PAGE_SHIFT, >>>> - &memblock.memory, 0); >>>> + memory_present(0, start_pfn << PAGE_SHIFT, end_pfn << PAGE_SHIFT); >>> >>> memory_present() expects pfns, the shift is not needed. >> >> Right. >> >> -- >> Michal Hocko >> SUSE Labs
On 31. 07. 19 19:15, Mike Rapoport wrote: > On Wed, Jul 31, 2019 at 04:41:14PM +0200, Michal Hocko wrote: >> On Wed 31-07-19 17:21:29, Mike Rapoport wrote: >>> On Wed, Jul 31, 2019 at 03:00:37PM +0200, Michal Hocko wrote: >>>> >>>> I am sorry, but I still do not follow. Who is consuming that node id >>>> information when NUMA=n. In other words why cannot we simply do >>> >>> We can, I think nobody cared to change it. >> >> It would be great if somebody with the actual HW could try it out. >> I can throw a patch but I do not even have a cross compiler in my >> toolbox. > > Well, it compiles :) > >>>> diff --git a/arch/microblaze/mm/init.c b/arch/microblaze/mm/init.c >>>> index a015a951c8b7..3a47e8db8d1c 100644 >>>> --- a/arch/microblaze/mm/init.c >>>> +++ b/arch/microblaze/mm/init.c >>>> @@ -175,14 +175,9 @@ void __init setup_memory(void) >>>> >>>> start_pfn = memblock_region_memory_base_pfn(reg); >>>> end_pfn = memblock_region_memory_end_pfn(reg); >>>> - memblock_set_node(start_pfn << PAGE_SHIFT, >>>> - (end_pfn - start_pfn) << PAGE_SHIFT, >>>> - &memblock.memory, 0); >>>> + memory_present(0, start_pfn << PAGE_SHIFT, end_pfn << PAGE_SHIFT); >>> >>> memory_present() expects pfns, the shift is not needed. >> >> Right. Sorry for slow response on this. In general regarding this topic. Microblaze is soft core CPU (now there are hardcore versions too but not running Linux). I believe there could be Numa system with microblaze/microblazes (SMP is not supported in mainline). This code was added in 2011 which is pretty hard to remember why it was done in this way. It compiles but not working on HW. Please take a look at log below. Thanks, Michal [ 0.000000] Linux version 5.3.0-rc6-00007-g54b01939182f-dirty (monstr@monstr-desktop3) (gcc version 8.2.0 (crosstool-NG 1.20.0)) #101 Mon Sep 2 15:44:05 CEST 2019 [ 0.000000] setup_memory: max_mapnr: 0x40000 [ 0.000000] setup_memory: min_low_pfn: 0x80000 [ 0.000000] setup_memory: max_low_pfn: 0xb0000 [ 0.000000] setup_memory: max_pfn: 0xc0000 [ 0.000000] start pfn 0x80000 [ 0.000000] end pfn 0xc0000 [ 0.000000] Zone ranges: [ 0.000000] DMA [mem 0x0000000080000000-0x00000000afffffff] [ 0.000000] Normal empty [ 0.000000] HighMem [mem 0x00000000b0000000-0x00000000bfffffff] [ 0.000000] Movable zone start for each node [ 0.000000] Early memory node ranges [ 0.000000] node 1: [mem 0x0000000080000000-0x00000000bfffffff] [ 0.000000] Could not find start_pfn for node 0 [ 0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x0000000000000000] [ 0.000000] earlycon: ns16550a0 at MMIO 0x44a01000 (options '115200n8') [ 0.000000] printk: bootconsole [ns16550a0] enabled [ 0.000000] setup_cpuinfo: initialising [ 0.000000] setup_cpuinfo: Using full CPU PVR support [ 0.000000] wt_msr_noirq [ 0.000000] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768 [ 0.000000] pcpu-alloc: [0] 0 [ 0.000000] Built 1 zonelists, mobility grouping off. Total pages: 0 [ 0.000000] Kernel command line: earlycon [ 0.000000] Dentry cache hash table entries: -2147483648 (order: -13, 0 bytes, linear) [ 0.000000] Inode-cache hash table entries: -2147483648 (order: -13, 0 bytes, linear) [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off [ 0.000000] Oops: kernel access of bad area, sig: 11 [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.3.0-rc6-00007-g54b01939182f-dirty #101 [ 0.000000] Registers dump: mode=805B9EA8 [ 0.000000] r1=000065A0, r2=C05B7AE6, r3=00000000, r4=00000000 [ 0.000000] r5=00080000, r6=00080B50, r7=00000000, r8=00000004 [ 0.000000] r9=00000000, r10=0000001F, r11=00000000, r12=00006666 [ 0.000000] r13=4119DCC0, r14=00000000, r15=C05EFF8C, r16=00000000 [ 0.000000] r17=C0604408, r18=FFFC0000, r19=C05B9F6C, r20=BFFEC168 [ 0.000000] r21=BFFEC168, r22=EFFF9AC0, r23=00000001, r24=C0606874 [ 0.000000] r25=BFE6B74C, r26=80000000, r27=00000000, r28=90000040 [ 0.000000] r29=01000000, r30=00000380, r31=C05C02F0, rPC=C0604408 [ 0.000000] msr=000046A0, ear=00000004, esr=00000D12, fsr=FFFFFFFF [ 0.000000] Oops: kernel access of bad area, sig: 11
On Mon, Sep 02, 2019 at 03:51:25PM +0200, Michal Simek wrote: > On 31. 07. 19 19:15, Mike Rapoport wrote: > > On Wed, Jul 31, 2019 at 04:41:14PM +0200, Michal Hocko wrote: > >> On Wed 31-07-19 17:21:29, Mike Rapoport wrote: > >>> On Wed, Jul 31, 2019 at 03:00:37PM +0200, Michal Hocko wrote: > >>>> > >>>> I am sorry, but I still do not follow. Who is consuming that node id > >>>> information when NUMA=n. In other words why cannot we simply do > >>> > >>> We can, I think nobody cared to change it. > >> > >> It would be great if somebody with the actual HW could try it out. > >> I can throw a patch but I do not even have a cross compiler in my > >> toolbox. > > > > Well, it compiles :) > > > >>>> diff --git a/arch/microblaze/mm/init.c b/arch/microblaze/mm/init.c > >>>> index a015a951c8b7..3a47e8db8d1c 100644 > >>>> --- a/arch/microblaze/mm/init.c > >>>> +++ b/arch/microblaze/mm/init.c > >>>> @@ -175,14 +175,9 @@ void __init setup_memory(void) > >>>> > >>>> start_pfn = memblock_region_memory_base_pfn(reg); > >>>> end_pfn = memblock_region_memory_end_pfn(reg); > >>>> - memblock_set_node(start_pfn << PAGE_SHIFT, > >>>> - (end_pfn - start_pfn) << PAGE_SHIFT, > >>>> - &memblock.memory, 0); > >>>> + memory_present(0, start_pfn << PAGE_SHIFT, end_pfn << PAGE_SHIFT); > >>> > >>> memory_present() expects pfns, the shift is not needed. > >> > >> Right. > > Sorry for slow response on this. In general regarding this topic. > Microblaze is soft core CPU (now there are hardcore versions too but not > running Linux). I believe there could be Numa system with > microblaze/microblazes (SMP is not supported in mainline). > > This code was added in 2011 which is pretty hard to remember why it was > done in this way. > > It compiles but not working on HW. Please take a look at log below. > > Thanks, > Michal > > > [ 0.000000] Linux version 5.3.0-rc6-00007-g54b01939182f-dirty > (monstr@monstr-desktop3) (gcc version 8.2.0 (crosstool-NG 1.20.0)) #101 > Mon Sep 2 15:44:05 CEST 2019 > [ 0.000000] setup_memory: max_mapnr: 0x40000 > [ 0.000000] setup_memory: min_low_pfn: 0x80000 > [ 0.000000] setup_memory: max_low_pfn: 0xb0000 > [ 0.000000] setup_memory: max_pfn: 0xc0000 > [ 0.000000] start pfn 0x80000 > [ 0.000000] end pfn 0xc0000 > [ 0.000000] Zone ranges: > [ 0.000000] DMA [mem 0x0000000080000000-0x00000000afffffff] > [ 0.000000] Normal empty > [ 0.000000] HighMem [mem 0x00000000b0000000-0x00000000bfffffff] > [ 0.000000] Movable zone start for each node > [ 0.000000] Early memory node ranges > [ 0.000000] node 1: [mem 0x0000000080000000-0x00000000bfffffff] > [ 0.000000] Could not find start_pfn for node 0 > [ 0.000000] Initmem setup node 0 [mem > 0x0000000000000000-0x0000000000000000] This does not look good :) I think the problem is that without an explicit call to memblock_set_node() the ->nid in memblock is MAX_NUMNODES but free_area_init_nodes() presumes actual node ids are properly set. > [ 0.000000] earlycon: ns16550a0 at MMIO 0x44a01000 (options '115200n8') > [ 0.000000] printk: bootconsole [ns16550a0] enabled > [ 0.000000] setup_cpuinfo: initialising > [ 0.000000] setup_cpuinfo: Using full CPU PVR support > [ 0.000000] wt_msr_noirq > [ 0.000000] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768 > [ 0.000000] pcpu-alloc: [0] 0 > [ 0.000000] Built 1 zonelists, mobility grouping off. Total pages: 0 > [ 0.000000] Kernel command line: earlycon > [ 0.000000] Dentry cache hash table entries: -2147483648 (order: -13, > 0 bytes, linear) > [ 0.000000] Inode-cache hash table entries: -2147483648 (order: -13, > 0 bytes, linear) > [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off > [ 0.000000] Oops: kernel access of bad area, sig: 11 > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted > 5.3.0-rc6-00007-g54b01939182f-dirty #101 > [ 0.000000] Registers dump: mode=805B9EA8 > [ 0.000000] r1=000065A0, r2=C05B7AE6, r3=00000000, r4=00000000 > [ 0.000000] r5=00080000, r6=00080B50, r7=00000000, r8=00000004 > [ 0.000000] r9=00000000, r10=0000001F, r11=00000000, r12=00006666 > [ 0.000000] r13=4119DCC0, r14=00000000, r15=C05EFF8C, r16=00000000 > [ 0.000000] r17=C0604408, r18=FFFC0000, r19=C05B9F6C, r20=BFFEC168 > [ 0.000000] r21=BFFEC168, r22=EFFF9AC0, r23=00000001, r24=C0606874 > [ 0.000000] r25=BFE6B74C, r26=80000000, r27=00000000, r28=90000040 > [ 0.000000] r29=01000000, r30=00000380, r31=C05C02F0, rPC=C0604408 > [ 0.000000] msr=000046A0, ear=00000004, esr=00000D12, fsr=FFFFFFFF > [ 0.000000] Oops: kernel access of bad area, sig: 11 > > > -- > Michal Simek, Ing. (M.Eng), OpenPGP -> KeyID: FE3D1F91 > w: www.monstr.eu p: +42-0-721842854 > Maintainer of Linux kernel - Xilinx Microblaze > Maintainer of Linux kernel - Xilinx Zynq ARM and ZynqMP ARM64 SoCs > U-Boot custodian - Xilinx Microblaze/Zynq/ZynqMP/Versal SoCs > >
diff --git a/arch/microblaze/mm/init.c b/arch/microblaze/mm/init.c index a015a951c8b7..3a47e8db8d1c 100644 --- a/arch/microblaze/mm/init.c +++ b/arch/microblaze/mm/init.c @@ -175,14 +175,9 @@ void __init setup_memory(void) start_pfn = memblock_region_memory_base_pfn(reg); end_pfn = memblock_region_memory_end_pfn(reg); - memblock_set_node(start_pfn << PAGE_SHIFT, - (end_pfn - start_pfn) << PAGE_SHIFT, - &memblock.memory, 0); + memory_present(0, start_pfn << PAGE_SHIFT, end_pfn << PAGE_SHIFT); } - /* XXX need to clip this if using highmem? */ - sparse_memory_present_with_active_regions(0); - paging_init(); }