Message ID | alpine.DEB.2.00.1104191657030.26867@router.home (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
> On Tue, 19 Apr 2011, James Bottomley wrote: > > > > Which part of me telling you that you will break lots of other things in > > > the core kernel dont you get? > > > > I get that you tell me this ... however, the systems that, according to > > you, should be failing to get to boot prompt do, in fact, manage it. > > If you dont use certain subsystems then it may work. Also do you run with > debuggin on. > > The following patch is I think what would be needed to fix it. I'm worry about this patch. A lot of mm code assume !NUMA systems only have node 0. Not only SLUB. I'm not sure why this unfortunate mismatch occur. but I think DISCONTIG hacks makes less sense. Can we consider parisc turn NUMA on instead? -- To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2011-04-19 at 16:58 -0500, Christoph Lameter wrote: > On Tue, 19 Apr 2011, James Bottomley wrote: > > > > Which part of me telling you that you will break lots of other things in > > > the core kernel dont you get? > > > > I get that you tell me this ... however, the systems that, according to > > you, should be failing to get to boot prompt do, in fact, manage it. > > If you dont use certain subsystems then it may work. Also do you run with > debuggin on. > > The following patch is I think what would be needed to fix it. Not really: crashes immediately on boot [ 0.000000] FP[0] enabled: Rev 1 Model 20 [ 0.000000] The 64-bit Kernel has started... [ 0.000000] bootconsole [ttyB0] enabled [ 0.000000] Initialized PDC Console for debugging. [ 0.000000] Determining PDC firmware type: 64 bit PAT. [ 0.000000] model 00008870 00000491 00000000 00000002 3e0505e7352af710 100000f0 00000008 000000b2 000000b2 [ 0.000000] vers 00000301 [ 0.000000] CPUID vers 20 rev 4 (0x00000284) [ 0.000000] capabilities 0x35 [ 0.000000] model 9000/800/rp3440 [ 0.000000] parisc_cache_init: Only equivalent aliasing supported! [ 0.000000] Memory Ranges: [ 0.000000] 0) Start 0x0000000000000000 End 0x000000003fffffff Size 1024 MB [ 0.000000] 1) Start 0x0000004040000000 End 0x000000407fdfffff Size 1022 MB [ 0.000000] Total Memory: 2046 MB [ 0.000000] initrd: 7f390000-7ffedf6d [ 0.000000] initrd: reserving 3f390000-3ffedf6d (mem_max 7fe00000) [ 0.000000] ------------[ cut here ]------------ [ 0.000000] kernel BUG at mm/mm_init.c:127! [ 0.000000] [ 0.000000] YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI [ 0.000000] PSW: 00001000000001001111111100001110 Not tainted [ 0.000000] r00-03 000000ff0804ff0e 000000004076a640 0000000040798c50 0000004080000000 [ 0.000000] r04-07 0000000040746e40 0000000004040000 0000000000000001 0000000040654150 [ 0.000000] r08-11 00000000405bd540 000000000407fe00 0000000000000001 0000000000000000 [ 0.000000] r12-15 00000000405bc740 000f000000000000 00000000000001ff 000000004076a640 [ 0.000000] r16-19 00000000000000ff 0000000000000000 2000000000000000 0000000000000000 [ 0.000000] r20-23 0000004080000000 00000000405bd908 0000000000000000 0000000004040000 [ 0.000000] r24-27 0000000000000001 0000000000000000 0000004080000000 0000000040746e40 [ 0.000000] r28-31 2000000000000000 00000000405b0610 00000000405b0640 0000000000000000 [ 0.000000] sr00-03 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 0.000000] sr04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 0.000000] [ 0.000000] IASQ: 0000000000000000 0000000000000000 IAOQ: 0000000040798fac 0000000040798fb0 [ 0.000000] IIR: 03ffe01f ISR: 0000000010350000 IOR: 0000010000000000 [ 0.000000] CPU: 0 CR30: 00000000405b0000 CR31: fffffff0f0e098e0 [ 0.000000] ORIG_R28: 000000000003fe00 [ 0.000000] IAOQ[0]: mminit_verify_page_links+0x84/0xa0 [ 0.000000] IAOQ[1]: mminit_verify_page_links+0x88/0xa0 [ 0.000000] RP(r2): memmap_init_zone+0x148/0x2a0 [ 0.000000] Backtrace: [ 0.000000] [<0000000040798c50>] memmap_init_zone+0x148/0x2a0 [ 0.000000] [<0000000040777ca8>] free_area_init_node+0x3c8/0x518 [ 0.000000] [<000000004076fde0>] paging_init+0x928/0xb20 [ 0.000000] [<0000000040770a48>] setup_arch+0xe8/0x120 [ 0.000000] [<000000004076c9a0>] start_kernel+0xf0/0x830 [ 0.000000] [<000000004011f4fc>] start_parisc+0xa4/0xb8 [ 0.000000] [<00000000404b0f0c>] packet_ioctl+0x1e4/0x208 [ 0.000000] [<00000000404a79d0>] unix_ioctl+0x70/0x168 [ 0.000000] [<0000000040482d24>] ip_mc_gsfget+0x14c/0x200 [ 0.000000] [<000000004046cc20>] raw_ioctl+0xe8/0x118 [ 0.000000] [<000000004044f524>] do_tcp_getsockopt+0x5c4/0x5d0 [ 0.000000] [<0000000040432d64>] netlink_getsockopt+0x15c/0x178 [ 0.000000] [ 0.000000] Backtrace: [ 0.000000] [<000000004011f984>] show_stack+0x14/0x20 [ 0.000000] [<000000004011f9a8>] dump_stack+0x18/0x28 [ 0.000000] [<000000004012022c>] die_if_kernel+0x194/0x258 [ 0.000000] [<0000000040120b30>] handle_interruption+0x840/0x8f8 [ 0.000000] [<0000000040798fac>] mminit_verify_page_links+0x84/0xa0 [ 0.000000] [ 0.000000] ---[ end trace 139ce121c98e96c9 ]--- [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task! [ 0.000000] Backtrace: [ 0.000000] [<000000004011f984>] show_stack+0x14/0x20 [ 0.000000] [<000000004011f9a8>] dump_stack+0x18/0x28 [ 0.000000] [<000000004015945c>] panic+0xd4/0x368 [ 0.000000] [<000000004015f054>] do_exit+0x89c/0x9d8 [ 0.000000] [<00000000401202d4>] die_if_kernel+0x23c/0x258 [ 0.000000] [<0000000040120b30>] handle_interruption+0x840/0x8f8 [ 0.000000] [<0000000040798fac>] mminit_verify_page_links+0x84/0xa0 [ 0.000000] There's a lot more to discontigmem than just page_to_nid ... there's the whole pfn_to_nid() thing as well James -- To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Apr 20, 2011 at 4:23 AM, KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote: > I'm worry about this patch. A lot of mm code assume !NUMA systems > only have node 0. Not only SLUB. So is that a valid assumption or not? Christoph seems to think it is and James seems to think it's not. Which way should we aim to fix it? Would be nice if other people chimed in as we already know what James and Christoph think. Pekka -- To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> On Wed, Apr 20, 2011 at 4:23 AM, KOSAKI Motohiro > <kosaki.motohiro@jp.fujitsu.com> wrote: > > I'm worry about this patch. A lot of mm code assume !NUMA systems > > only have node 0. Not only SLUB. > > So is that a valid assumption or not? Christoph seems to think it is > and James seems to think it's not. Which way should we aim to fix it? > Would be nice if other people chimed in as we already know what James > and Christoph think. I'm sorry. I don't know it really. The fact was gone into historical myst. ;-) Now, CONFIG_NUMA has mainly five meanings. 1) system may has !0 node id. 2) compile mm/mempolicy.c (ie enable mempolicy APIs) 3) Allocator (kmalloc, vmalloc, alloc_page, et al) awake NUMA topology. 4) enable zone-reclaim feature 5) scheduler makes per-node load balancing scheduler domain Anyway, we have to fix this issue. I'm digging which fixing way has least risk. btw, x86 don't have an issue. Probably it's a reason why this issue was neglected long time. arch/x86/Kconfig ------------------------------------- config ARCH_DISCONTIGMEM_ENABLE def_bool y depends on NUMA && X86_32 -- To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi! On Wed, Apr 20, 2011 at 4:23 AM, KOSAKI Motohiro >> <kosaki.motohiro@jp.fujitsu.com> wrote: >> > I'm worry about this patch. A lot of mm code assume !NUMA systems >> > only have node 0. Not only SLUB. >> >> So is that a valid assumption or not? Christoph seems to think it is >> and James seems to think it's not. Which way should we aim to fix it? >> Would be nice if other people chimed in as we already know what James >> and Christoph think. On Wed, Apr 20, 2011 at 10:15 AM, KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote: > I'm sorry. I don't know it really. The fact was gone into historical myst. ;-) > > Now, CONFIG_NUMA has mainly five meanings. > > 1) system may has !0 node id. > 2) compile mm/mempolicy.c (ie enable mempolicy APIs) > 3) Allocator (kmalloc, vmalloc, alloc_page, et al) awake NUMA topology. > 4) enable zone-reclaim feature > 5) scheduler makes per-node load balancing scheduler domain > > Anyway, we have to fix this issue. I'm digging which fixing way has least risk. > > > btw, x86 don't have an issue. Probably it's a reason why this issue was neglected > long time. > > arch/x86/Kconfig > ------------------------------------- > config ARCH_DISCONTIGMEM_ENABLE > def_bool y > depends on NUMA && X86_32 That part makes me think the best option is to make parisc do CONFIG_NUMA as well regardless of the historical intent was. Pekka -- To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Apr 20, 2011 at 10:34:23AM +0300, Pekka Enberg wrote: > That part makes me think the best option is to make parisc do > CONFIG_NUMA as well regardless of the historical intent was. But it's not just parisc. It's six other architectures as well, some of which aren't even SMP. Does !SMP && NUMA make any kind of sense? I think really, this is just a giant horrible misunderstanding on the part of the MM people. There's no reason why an ARM chip with 16MB of memory at 0 and 16MB of memory at 1GB should be saddled with all the NUMA gunk.
Hi Matthew, On Wed, Apr 20, 2011 at 10:34:23AM +0300, Pekka Enberg wrote: >> That part makes me think the best option is to make parisc do >> CONFIG_NUMA as well regardless of the historical intent was. On Wed, Apr 20, 2011 at 2:20 PM, Matthew Wilcox <matthew@wil.cx> wrote: > But it's not just parisc. It's six other architectures as well, some > of which aren't even SMP. Does !SMP && NUMA make any kind of sense? IIRC, we actually fixed SLAB or SLUB to work on such configs in the past. On Wed, Apr 20, 2011 at 2:20 PM, Matthew Wilcox <matthew@wil.cx> wrote: > I think really, this is just a giant horrible misunderstanding on the part > of the MM people. There's no reason why an ARM chip with 16MB of memory > at 0 and 16MB of memory at 1GB should be saddled with all the NUMA gunk. Right. My point was simply that since x86 doesn't support DISCONTIGMEM without NUMA, the misunderstanding is likely very wide-spread. Pekka -- To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 20 Apr 2011, Matthew Wilcox wrote: > On Wed, Apr 20, 2011 at 10:34:23AM +0300, Pekka Enberg wrote: > > That part makes me think the best option is to make parisc do > > CONFIG_NUMA as well regardless of the historical intent was. > > But it's not just parisc. It's six other architectures as well, some > of which aren't even SMP. Does !SMP && NUMA make any kind of sense? Of course not. > I think really, this is just a giant horrible misunderstanding on the part > of the MM people. There's no reason why an ARM chip with 16MB of memory > at 0 and 16MB of memory at 1GB should be saddled with all the NUMA gunk. DISCONTIG has fallen out of favor in the last years. SPARSEMEM has largely replaced it. ARM uses that and does not suffer from these issue. No one considered the issues of having a !NUMA configuration with nodes (which DISCONTIG seems to create) when developing core code in the last years. The implicit assumption has always been that page_to_nid(x) etc is always zero on a !NUMA configuration. -- To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 20 Apr 2011, Pekka Enberg wrote: > That part makes me think the best option is to make parisc do > CONFIG_NUMA as well regardless of the historical intent was. Another possilibity is to use SPARSEMEM instead? We can do the same for the other arches that we have done to x86. -- To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[added linux-arch to cc since we're going to be affecting them] On Wed, 2011-04-20 at 14:28 +0300, Pekka Enberg wrote: > Right. My point was simply that since x86 doesn't support DISCONTIGMEM > without NUMA, the misunderstanding is likely very wide-spread. Why don't we approach the problem in a few separate ways then. 1. We can look at what imposing NUMA on the DISCONTIGMEM archs would do ... the embedded ones are going to be hardest hit, but if it's not too much extra code, it might be palatable. 2. The other is that we can audit mm to look at all the node assumptions in the non-numa case. My suspicion is that accidentally or otherwise, it mostly works for the normal case, so there might not be much needed to pull it back to working properly for DISCONTIGMEM. 3. Finally we could look at deprecating DISCONTIGMEM in favour of SPARSEMEM, but we'd still need to fix -stable for that case. Especially as it will take time to convert all the architectures I'm certainly with Matthew: DISCONTIGMEM is supposed to be a lightweight framework which allows machines with split physical memory ranges to work. That's a very common case nowadays. Numa is supposed to be a heavyweight framework to preserve node locality for non-uniform memory access boxes (which none of the DISCONTIGMEM && !NUMA systems are). James -- To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 20 Apr 2011, James Bottomley wrote: > 1. We can look at what imposing NUMA on the DISCONTIGMEM archs > would do ... the embedded ones are going to be hardest hit, but > if it's not too much extra code, it might be palatable. > 2. The other is that we can audit mm to look at all the node > assumptions in the non-numa case. My suspicion is that > accidentally or otherwise, it mostly works for the normal case, > so there might not be much needed to pull it back to working > properly for DISCONTIGMEM. The older code may work. SLAB f.e. does not call page_to_nid() in the !NUMA case but keeps special metadata structures around in each slab page that records the node used for allocation. The problem is with new code added/revised in the last 5 years or so that uses page_to_nid() and allocates only a single structure for !NUMA. There are also VM_BUG_ONs in the page allocator that should trigger if page_to_nid() returns strange values. I wonder why that never occurred. > 3. Finally we could look at deprecating DISCONTIGMEM in favour of > SPARSEMEM, but we'd still need to fix -stable for that case. > Especially as it will take time to convert all the architectures The fix needed is to mark DISCONTIGMEM without NUMA as broken for now. We need an audit of the core VM before removing that or making it contingent on the configurations of various VM subsystems. > I'm certainly with Matthew: DISCONTIGMEM is supposed to be a lightweight > framework which allows machines with split physical memory ranges to > work. That's a very common case nowadays. Numa is supposed to be a > heavyweight framework to preserve node locality for non-uniform memory > access boxes (which none of the DISCONTIGMEM && !NUMA systems are). Well yes but we have SPARSE for that today. DISCONTIG with multiple per pgdat structures in a !NUMA case is just weird and unexpected for many who have done VM coding in the last years. -- To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2011-04-20 at 09:50 -0500, Christoph Lameter wrote: > On Wed, 20 Apr 2011, James Bottomley wrote: > > > 1. We can look at what imposing NUMA on the DISCONTIGMEM archs > > would do ... the embedded ones are going to be hardest hit, but > > if it's not too much extra code, it might be palatable. > > 2. The other is that we can audit mm to look at all the node > > assumptions in the non-numa case. My suspicion is that > > accidentally or otherwise, it mostly works for the normal case, > > so there might not be much needed to pull it back to working > > properly for DISCONTIGMEM. > > The older code may work. SLAB f.e. does not call page_to_nid() in the > !NUMA case but keeps special metadata structures around in each slab page > that records the node used for allocation. The problem is with new code > added/revised in the last 5 years or so that uses page_to_nid() and > allocates only a single structure for !NUMA. There are also VM_BUG_ONs in > the page allocator that should trigger if page_to_nid() returns strange > values. I wonder why that never occurred. Actually, I think slab got changed when discontigmem was added ... that's why it all works OK. > > 3. Finally we could look at deprecating DISCONTIGMEM in favour > of > SPARSEMEM, but we'd still need to fix -stable for that case. > > Especially as it will take time to convert all the architectures > > The fix needed is to mark DISCONTIGMEM without NUMA as broken for now. We > need an audit of the core VM before removing that or making it contingent > on the configurations of various VM subsystems. Don't be stupid ... that would cause six architectures to get marked broken. > > I'm certainly with Matthew: DISCONTIGMEM is supposed to be a lightweight > > framework which allows machines with split physical memory ranges to > > work. That's a very common case nowadays. Numa is supposed to be a > > heavyweight framework to preserve node locality for non-uniform memory > > access boxes (which none of the DISCONTIGMEM && !NUMA systems are). > > Well yes but we have SPARSE for that today. DISCONTIG with multiple per > pgdat structures in a !NUMA case is just weird and unexpected for many who > have done VM coding in the last years. Look, I'm not really interested in who understands what. The fact is we have six architectures with the possibility for DISCONTIGMEM && !NUMA, so that's the case we need to fix in -stable. They oops with SLUB, as far as I can tell, there are still no oops reports with SLAB. The simplest -stable fix seems to be to mark SLUB broken on DISCONTIGMEM && !NUMA. James -- To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 20 Apr 2011, James Bottomley wrote: > > The older code may work. SLAB f.e. does not call page_to_nid() in the > > !NUMA case but keeps special metadata structures around in each slab page > > that records the node used for allocation. The problem is with new code > > added/revised in the last 5 years or so that uses page_to_nid() and > > allocates only a single structure for !NUMA. There are also VM_BUG_ONs in > > the page allocator that should trigger if page_to_nid() returns strange > > values. I wonder why that never occurred. > > Actually, I think slab got changed when discontigmem was added ... > that's why it all works OK. Could be. I was not around at the time. > > > 3. Finally we could look at deprecating DISCONTIGMEM in favour > > of > SPARSEMEM, but we'd still need to fix -stable for that case. > > > Especially as it will take time to convert all the architectures > > > > The fix needed is to mark DISCONTIGMEM without NUMA as broken for now. We > > need an audit of the core VM before removing that or making it contingent > > on the configurations of various VM subsystems. > > Don't be stupid ... that would cause six architectures to get marked > broken. Yes they are broken right now. Marking just means showing the user that we are aware of the situation. > Look, I'm not really interested in who understands what. The fact is we > have six architectures with the possibility for DISCONTIGMEM && !NUMA, > so that's the case we need to fix in -stable. > > They oops with SLUB, as far as I can tell, there are still no oops > reports with SLAB. The simplest -stable fix seems to be to mark SLUB > broken on DISCONTIGMEM && !NUMA. There is barely any testing going on at all of this since we have had this issue for more than 5 years and have not noticed it. The absence of bug reports therefore proves nothing. Code inspection of the VM shows that this is an issue that arises in multiple subsystems and that we have VM_BUG_ONs in the page allocator that should trigger for these situations. Usage of DISCONTIGMEM and !NUMA is not safe and should be flagged as such. -- To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Apr 20, 2011 at 10:22:04AM -0500, Christoph Lameter wrote: > There is barely any testing going on at all of this since we have had this > issue for more than 5 years and have not noticed it. The absence of bug > reports therefore proves nothing. Code inspection of the VM shows > that this is an issue that arises in multiple subsystems and that we have > VM_BUG_ONs in the page allocator that should trigger for these situations. So ... we've proven that people using these architectures use SLAB instead of SLUB, don't enable CONFIG_DEBUG_VM and don't use hugepages (not really a surprise ... nobody's running Oracle on these arches :-) I don't think that qualifies as "barely any testing". I think that's "nobody developing the Linux MM uses one of these architectures".
On Wed, 20 Apr 2011, Matthew Wilcox wrote: > > That part makes me think the best option is to make parisc do > > CONFIG_NUMA as well regardless of the historical intent was. > > But it's not just parisc. It's six other architectures as well, some > of which aren't even SMP. Does !SMP && NUMA make any kind of sense? > It does as long as DISCONTIGMEM is hijacking NUMA abstractions throughout the code; for example, look at the .config that James is probably using for testing here: CONFIG_PA8X00=y CONFIG_64BIT=y CONFIG_DISCONTIGMEM=y CONFIG_NEED_MULTIPLE_NODES=y CONFIG_NODES_SHIFT=3 and CONFIG_NUMA is not enabled. So we want CONFIG_NODES_SHIFT of 3 (because MAX_PHYSMEM_RANGES is 8) and CONFIG_NEED_MULTIPLE_NODES is enabled because of DISCONTIGMEM: # # Both the NUMA code and DISCONTIGMEM use arrays of pg_data_t's # to represent different areas of memory. This variable allows # those dependencies to exist individually. # config NEED_MULTIPLE_NODES def_bool y depends on DISCONTIGMEM || NUMA when in reality we should do away with CONFIG_NEED_MULTIPLE_NODES and just force DISCONTIGMEM to enable CONFIG_NUMA at least for -stable and as a quick fix for James. In the long run, we'll probably want to define a lighterweight CONFIG_NUMA as a layer that CONFIG_DISCONTIGMEM can use for memory range abstractions and then CONFIG_NUMA is built on top of it to define proximity between those ranges. -- To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 20 Apr 2011, Christoph Lameter wrote: > There is barely any testing going on at all of this since we have had this > issue for more than 5 years and have not noticed it. The absence of bug > reports therefore proves nothing. Code inspection of the VM shows > that this is an issue that arises in multiple subsystems and that we have > VM_BUG_ONs in the page allocator that should trigger for these situations. > > Usage of DISCONTIGMEM and !NUMA is not safe and should be flagged as such. > We don't actually have any bug reports in front of us that show anything else in the VM other than slub has issues with this configuration, so marking them as broken is probably premature. The parisc config that triggered this debugging enables CONFIG_SLAB by default, so it probably has gone unnoticed just because nobody other than James has actually tried it on hppa64. Let's see if KOSAKI-san's fixes to Kconfig (even though I'd prefer the simpler and implicit "config NUMA def_bool ARCH_DISCONTIGMEM_ENABLE" over his config NUMA) and my fix to parisc to set the bit in N_NORMAL_MEMORY so that CONFIG_SLUB initializes kmem_cache_node correctly works and then address issues in the core VM as they arise. Presumably someone has been running DISCONTIGMEM on hppa64 in the past five years without issues with defconfig, so the issue here may just be slub. -- To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Index: linux-2.6/include/linux/mm.h =================================================================== --- linux-2.6.orig/include/linux/mm.h 2011-04-19 16:43:53.822507013 -0500 +++ linux-2.6/include/linux/mm.h 2011-04-19 16:44:52.082506944 -0500 @@ -666,14 +666,20 @@ static inline int zone_to_nid(struct zon } #ifdef NODE_NOT_IN_PAGE_FLAGS -extern int page_to_nid(struct page *page); +extern int __page_to_nid(struct page *page); #else -static inline int page_to_nid(struct page *page) +static inline int __page_to_nid(struct page *page) { return (page->flags >> NODES_PGSHIFT) & NODES_MASK; } #endif +#ifdef CONFIG_NUMA +#define page_to_nid __page_to_nid +#else +#define page_to_nid(x) 0 +#endif + static inline struct zone *page_zone(struct page *page) { return &NODE_DATA(page_to_nid(page))->node_zones[page_zonenum(page)]; Index: linux-2.6/include/asm-generic/memory_model.h =================================================================== --- linux-2.6.orig/include/asm-generic/memory_model.h 2011-04-19 16:45:26.772506904 -0500 +++ linux-2.6/include/asm-generic/memory_model.h 2011-04-19 16:46:02.602506861 -0500 @@ -40,7 +40,7 @@ #define __page_to_pfn(pg) \ ({ struct page *__pg = (pg); \ - struct pglist_data *__pgdat = NODE_DATA(page_to_nid(__pg)); \ + struct pglist_data *__pgdat = NODE_DATA(__page_to_nid(__pg)); \ (unsigned long)(__pg - __pgdat->node_mem_map) + \ __pgdat->node_start_pfn; \ }) Index: linux-2.6/mm/sparse.c =================================================================== --- linux-2.6.orig/mm/sparse.c 2011-04-19 16:44:58.432506937 -0500 +++ linux-2.6/mm/sparse.c 2011-04-19 16:45:07.332506926 -0500 @@ -40,7 +40,7 @@ static u8 section_to_node_table[NR_MEM_S static u16 section_to_node_table[NR_MEM_SECTIONS] __cacheline_aligned; #endif -int page_to_nid(struct page *page) +int __page_to_nid(struct page *page) { return section_to_node_table[page_to_section(page)]; }