Message ID | alpine.DEB.2.00.1104191325470.19358@router.home (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
On Tue, 2011-04-19 at 13:35 -0500, Christoph Lameter wrote: > On Tue, 19 Apr 2011, James Bottomley wrote: > > > > } > > > > > > How in the world did you get a zone setup in node 1 with a !NUMA config? > > > > I told you ... I forced an allocation into the first discontiguous > > region. That will return 1 for page_to_nid(). > > How? The kernel has no concept of a node 1 without CONFIG_NUMA and so you > cannot tell the page allocator to allocate from node 1. Yes, it does, as I explained in the email. > zone_to_nid is used as a fallback mechanism for page_to_nid() and as shown > will always return 0 for !NUMA configs. > > page_to_nid(x) == zone_to_nid(page_zone(x)) must hold true. It is not > here. > > > > The problem seems to be that the kernel seems to allow a > > > definition of a page_to_nid() function that returns non zero in the !NUMA > > > case. > > > > This is called reality, yes. > > There you have the bug. Fix that and things will work fine. Why don't yout file the bug against reality? I'm not sure I have enough credibility ... > > right, that's what I told you: slub is broken because it's making a > > wrong assumption. Look in asm-generic/memory_model.h it shows how the > > page_to_nid() is used in finding the pfn array. DISCONTIGMEM uses some > > of the numa properties (including assigning zones to the discontiguous > > regions). > > Bitrotted code? Don't be silly: alpha, ia64, m32r, m68k, mips, parisc, tile and even x86 all use the discontigmem memory model in some configurations. > If it uses numa properties then it must use a zone field > in struct zone. So DISCONTIGMEM seems to require CONFIG_NUMA. No ... you're giving me back your assumptions. They're not based on what the kernel does. CONFIG_NUMA may or may not be defined with CONFIG_DISCONTIGMEM. Of all the above, only x86 always had NUMA with DISCONTIGMEM. > > > If you think that is broken then we have brokenness all over the kernel > > > whenever we determine the node from a page and use that to do a lookup. > > > > Not really. The rest of the kernel uses the proper macros. in > > DISCONTIGMEM but !NUMA configs, the numa macros expand correctly. > > You've cut across that with all the CONFIG_NUMA checks in slub. > > What are "the proper macros"? AFAICT page_to_nid() is the proper way to > access the node of a page. If page_to_nid() returns 1 then you have a zone > that the kernel knows of as being in node 0 having a page on a different > node. Well it depends what you want. If you only want the actual NUMA node, then pfn_to_nid() probably isn't what you want, because in a DISCONTIGMEM model, there may be multiple nids per actual numa node. > We can likely force page_to_nid to ignore the node information that have > been erroneously placed there but this looks like something deeper is > wrong here. The node field in struct page is not only used for the Linux > support of a NUMA node but also for blocks of memory. Those should be > separate things. Look, it's not wrong, it's by design. The assumption that non-numa systems don't use nodes is the wrong one. > --- > include/linux/mm.h | 4 ++++ > 1 file changed, 4 insertions(+) > > Index: linux-2.6/include/linux/mm.h > =================================================================== > --- linux-2.6.orig/include/linux/mm.h 2011-04-19 13:20:20.092521248 -0500 > +++ linux-2.6/include/linux/mm.h 2011-04-19 13:21:05.962521196 -0500 > @@ -665,6 +665,7 @@ static inline int zone_to_nid(struct zon > #endif > } > > +#ifdef CONFIG_NUMA > #ifdef NODE_NOT_IN_PAGE_FLAGS > extern int page_to_nid(struct page *page); > #else > @@ -673,6 +674,9 @@ static inline int page_to_nid(struct pag > return (page->flags >> NODES_PGSHIFT) & NODES_MASK; > } > #endif > +#else > +#define page_to_nid(x) 0 > +#endif Don't be silly ... that breaks asm-generic/memory_model.h James -- To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 19 Apr 2011, James Bottomley wrote: > > > I told you ... I forced an allocation into the first discontiguous > > > region. That will return 1 for page_to_nid(). > > > > How? The kernel has no concept of a node 1 without CONFIG_NUMA and so you > > cannot tell the page allocator to allocate from node 1. > > Yes, it does, as I explained in the email. Looked through it and canot find it. How would that be possible to do with core kernel calls since the page allocator calls do not allow you to specify a node under !NUMA. > Don't be silly: alpha, ia64, m32r, m68k, mips, parisc, tile and even x86 > all use the discontigmem memory model in some configurations. I guess DISCONTIGMEM is typically used together with NUMA. Otherwise we would have run into this before. > > > Not really. The rest of the kernel uses the proper macros. in > > > DISCONTIGMEM but !NUMA configs, the numa macros expand correctly. > > > You've cut across that with all the CONFIG_NUMA checks in slub. > > > > What are "the proper macros"? AFAICT page_to_nid() is the proper way to > > access the node of a page. If page_to_nid() returns 1 then you have a zone > > that the kernel knows of as being in node 0 having a page on a different > > node. > > Well it depends what you want. If you only want the actual NUMA node, > then pfn_to_nid() probably isn't what you want, because in a > DISCONTIGMEM model, there may be multiple nids per actual numa node. Right yes you got it. The notion of a node is different(!!!!!). What matters to the core kernel is the notion of a NUMA node. If DISCONTIGMEM runs with !NUMA then the way that "node" is used in DISCONTIGMEM is different from the core code and refers only to the memory blocks managed by DISCONTIGMEM. The node should be irrelevant to the core then. > > We can likely force page_to_nid to ignore the node information that have > > been erroneously placed there but this looks like something deeper is > > wrong here. The node field in struct page is not only used for the Linux > > support of a NUMA node but also for blocks of memory. Those should be > > separate things. > > Look, it's not wrong, it's by design. The assumption that non-numa > systems don't use nodes is the wrong one. Depends on how you define the notion of a node. The way the core kernel uses the term "node" means that there will be only one node and that is node 0 if CONFIG_NUMA is off. Thus page_to_nid() must return 0 for !NUMA. All sort of things in the core code will break in weird ways if you do allow page_to_nid to return 1 under !NUMA. Just look at the usage of page_to_nid(). Tried to use huge pages yet? And how will your version of reality deal with the following checks in the page allocator? F.e. VM_BUG_ON(page_to_nid(page) != zone_to_nid(zone)); Enabled CONFIG_DEBUG_VM yet? > > Index: linux-2.6/include/linux/mm.h > > =================================================================== > > --- linux-2.6.orig/include/linux/mm.h 2011-04-19 13:20:20.092521248 -0500 > > +++ linux-2.6/include/linux/mm.h 2011-04-19 13:21:05.962521196 -0500 > > @@ -665,6 +665,7 @@ static inline int zone_to_nid(struct zon > > #endif > > } > > > > +#ifdef CONFIG_NUMA > > #ifdef NODE_NOT_IN_PAGE_FLAGS > > extern int page_to_nid(struct page *page); > > #else > > @@ -673,6 +674,9 @@ static inline int page_to_nid(struct pag > > return (page->flags >> NODES_PGSHIFT) & NODES_MASK; > > } > > #endif > > +#else > > +#define page_to_nid(x) 0 > > +#endif > > Don't be silly ... that breaks asm-generic/memory_model.h Well yeah looks like in order to be clean in the !NUMA case we would then need a page_to_discontig_node_id() there that is different from the page_to_nid() used for the core. -- To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Index: linux-2.6/include/linux/mm.h =================================================================== --- linux-2.6.orig/include/linux/mm.h 2011-04-19 13:20:20.092521248 -0500 +++ linux-2.6/include/linux/mm.h 2011-04-19 13:21:05.962521196 -0500 @@ -665,6 +665,7 @@ static inline int zone_to_nid(struct zon #endif } +#ifdef CONFIG_NUMA #ifdef NODE_NOT_IN_PAGE_FLAGS extern int page_to_nid(struct page *page); #else @@ -673,6 +674,9 @@ static inline int page_to_nid(struct pag return (page->flags >> NODES_PGSHIFT) & NODES_MASK; } #endif +#else +#define page_to_nid(x) 0 +#endif static inline struct zone *page_zone(struct page *page) {