Message ID | 1303248576.11237.23.camel@mulgrave.site (mailing list archive) |
---|---|
State | Rejected |
Headers | show |
On Tue, 19 Apr 2011, James Bottomley wrote: > Slub makes assumptions about page_to_nid() which are violated by > DISCONTIGMEM and !NUMA. This violation results in a panic because Fix this by stating correctly by saying "The kernel makes assumptions in various subsystems ..." > page_to_nid() can be non-zero for pages in the discontiguous ranges and > this leads to a null return by get_node(). The assertion by the > maintainer is that DISCONTIGMEM should only be allowed when NUMA is also > defined. However, at least six architectures: alpha, ia64, m32r, m68k, That is not what I said. DISCONTIG support needs to be fixed so that the core subsystems using page_to_nid() will operate correctly with a !NUMA discontig configuration. Core will expect page_to_nid() to only return 0 on !NUMA. -- To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2011-04-19 at 16:38 -0500, Christoph Lameter wrote: > On Tue, 19 Apr 2011, James Bottomley wrote: > > > Slub makes assumptions about page_to_nid() which are violated by > > DISCONTIGMEM and !NUMA. This violation results in a panic because > > Fix this by stating correctly by saying "The kernel makes assumptions in > various subsystems ..." Slub is a subset of the kernel, so the original wording is a bit more precise. > > page_to_nid() can be non-zero for pages in the discontiguous ranges and > > this leads to a null return by get_node(). The assertion by the > > maintainer is that DISCONTIGMEM should only be allowed when NUMA is also > > defined. However, at least six architectures: alpha, ia64, m32r, m68k, > > That is not what I said. DISCONTIG support needs to be fixed so that the > core subsystems using page_to_nid() will operate correctly with a !NUMA > discontig configuration. Core will expect page_to_nid() to only return 0 > on !NUMA. Well, we can discuss how to proceed going forwards. The current fact is that any prior kernel that enables SLUB with DISCONTIGMEM and !NUMA will eventually go boom when the page allocator returns a page not in the first pfn array. That has to be fixed in -stable. I don't really think a DISCONTIGMEM re-engineering effort would be the best thing for the -stable series. James -- To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 19 Apr 2011, James Bottomley wrote: > On Tue, 2011-04-19 at 16:38 -0500, Christoph Lameter wrote: > > On Tue, 19 Apr 2011, James Bottomley wrote: > > > > > Slub makes assumptions about page_to_nid() which are violated by > > > DISCONTIGMEM and !NUMA. This violation results in a panic because > > > > Fix this by stating correctly by saying "The kernel makes assumptions in > > various subsystems ..." > > Slub is a subset of the kernel, so the original wording is a bit more > precise. F.e. hugepage support does the same thing. So it not slub specific. > Well, we can discuss how to proceed going forwards. The current fact is > that any prior kernel that enables SLUB with DISCONTIGMEM and !NUMA will > eventually go boom when the page allocator returns a page not in the > first pfn array. That has to be fixed in -stable. I don't really think > a DISCONTIGMEM re-engineering effort would be the best thing for the > -stable series. As far as I can tell: It will go boom even with other subsystems. I am surprised that we have never seen this before. -- To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Apr 19, 2011 at 23:29, James Bottomley <James.Bottomley@hansenpartnership.com> wrote: > Slub makes assumptions about page_to_nid() which are violated by > DISCONTIGMEM and !NUMA. This violation results in a panic because > page_to_nid() can be non-zero for pages in the discontiguous ranges and > this leads to a null return by get_node(). The assertion by the > maintainer is that DISCONTIGMEM should only be allowed when NUMA is also > defined. However, at least six architectures: alpha, ia64, m32r, m68k, > mips, parisc violate this. The panic is a regression against slab, so > just mark slub broken in the problem configuration to prevent users > reporting these panics. How does the problem manifest itself? We're having a problem on m68k, which seems to go away when switching from SLUB to SLAB, or when reverting a commit in [2] (probably this was never reported upstream). References: [1] http://www.mail-archive.com/linux-m68k@vger.kernel.org/msg02812.html [2] http://www.spinics.net/lists/linux-m68k/msg03401.html > Cc: stable@kernel.org > Signed-off-by: James Bottomley <James.Bottomley@suse.de> > > --- > > diff --git a/init/Kconfig b/init/Kconfig > index 56240e7..a7ad8fb 100644 > --- a/init/Kconfig > +++ b/init/Kconfig > @@ -1226,6 +1226,7 @@ config SLAB > per cpu and per node queues. > > config SLUB > + depends on BROKEN || NUMA || !DISCONTIGMEM > bool "SLUB (Unqueued Allocator)" > help > SLUB is a slab allocator that minimizes cache line usage Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds -- To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 20 Apr 2011, Geert Uytterhoeven wrote: > On Tue, Apr 19, 2011 at 23:29, James Bottomley > <James.Bottomley@hansenpartnership.com> wrote: > > Slub makes assumptions about page_to_nid() which are violated by > > DISCONTIGMEM and !NUMA. This violation results in a panic because > > page_to_nid() can be non-zero for pages in the discontiguous ranges and > > this leads to a null return by get_node(). The assertion by the > > maintainer is that DISCONTIGMEM should only be allowed when NUMA is also > > defined. However, at least six architectures: alpha, ia64, m32r, m68k, > > mips, parisc violate this. The panic is a regression against slab, so > > just mark slub broken in the problem configuration to prevent users > > reporting these panics. > > How does the problem manifest itself? We're having a problem on m68k, which > seems to go away when switching from SLUB to SLAB, or when reverting a commit > in [2] (probably this was never reported upstream). The NULL pointer dereference is typical. If SLUB breaks in this way then other things are also not functioning in the core. Guess you are also using DISCONTIG in !NUMA?
The problem was indeed never reported upstream but it does in fact manifest as a null pointer dereference, and happens in a discontiguous memory context on m68k. I'm uncertain as to whether I had tracked the exact source of the null pointer, I need to check my notes on this. Reverting the commit in question makes SLUB behave as though NUMA was defined, so this all checks out. Apologies for not escalating this beyond m68k, and thanks to Geert for spotting the connection. Cheers, Michael On Wed, Apr 20, 2011 at 7:48 PM, Geert Uytterhoeven <geert@linux-m68k.org> wrote: > On Tue, Apr 19, 2011 at 23:29, James Bottomley > <James.Bottomley@hansenpartnership.com> wrote: >> Slub makes assumptions about page_to_nid() which are violated by >> DISCONTIGMEM and !NUMA. This violation results in a panic because >> page_to_nid() can be non-zero for pages in the discontiguous ranges and >> this leads to a null return by get_node(). The assertion by the >> maintainer is that DISCONTIGMEM should only be allowed when NUMA is also >> defined. However, at least six architectures: alpha, ia64, m32r, m68k, >> mips, parisc violate this. The panic is a regression against slab, so >> just mark slub broken in the problem configuration to prevent users >> reporting these panics. > > How does the problem manifest itself? We're having a problem on m68k, which > seems to go away when switching from SLUB to SLAB, or when reverting a commit > in [2] (probably this was never reported upstream). > > References: > [1] http://www.mail-archive.com/linux-m68k@vger.kernel.org/msg02812.html > [2] http://www.spinics.net/lists/linux-m68k/msg03401.html > >> Cc: stable@kernel.org >> Signed-off-by: James Bottomley <James.Bottomley@suse.de> >> >> --- >> >> diff --git a/init/Kconfig b/init/Kconfig >> index 56240e7..a7ad8fb 100644 >> --- a/init/Kconfig >> +++ b/init/Kconfig >> @@ -1226,6 +1226,7 @@ config SLAB >> per cpu and per node queues. >> >> config SLUB >> + depends on BROKEN || NUMA || !DISCONTIGMEM >> bool "SLUB (Unqueued Allocator)" >> help >> SLUB is a slab allocator that minimizes cache line usage > > Gr{oetje,eeting}s, > > Geert > > -- > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org > > In personal conversations with technical people, I call myself a hacker. But > when I'm talking to journalists I just say "programmer" or something like that. > -- Linus Torvalds > -- To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/init/Kconfig b/init/Kconfig index 56240e7..a7ad8fb 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1226,6 +1226,7 @@ config SLAB per cpu and per node queues. config SLUB + depends on BROKEN || NUMA || !DISCONTIGMEM bool "SLUB (Unqueued Allocator)" help SLUB is a slab allocator that minimizes cache line usage
Slub makes assumptions about page_to_nid() which are violated by DISCONTIGMEM and !NUMA. This violation results in a panic because page_to_nid() can be non-zero for pages in the discontiguous ranges and this leads to a null return by get_node(). The assertion by the maintainer is that DISCONTIGMEM should only be allowed when NUMA is also defined. However, at least six architectures: alpha, ia64, m32r, m68k, mips, parisc violate this. The panic is a regression against slab, so just mark slub broken in the problem configuration to prevent users reporting these panics. Cc: stable@kernel.org Signed-off-by: James Bottomley <James.Bottomley@suse.de> --- -- To unsubscribe from this list: send the line "unsubscribe linux-parisc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html