Message ID | 20150821202707.GA920@agluck-desk.sc.intel.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
On Fri, Aug 21, 2015 at 1:27 PM, Luck, Tony <tony.luck@intel.com> wrote: > On Fri, Aug 21, 2015 at 11:38:13AM -0700, Yinghai Lu wrote: >> That commit could be reverted. >> According to >> https://lkml.org/lkml/2014/11/10/123 > > Do we really need to force the MIN_MEMORY_BLOCK_SIZE on small > systems? That is introduced in commit 982792c7 ("x86, mm: probe memory block size for generic x86 64bit "). that patch is used to make boot faster why create less entries in /sys/device/system/memory/. On system with less 64G ram, that will not have too many entries even with MIN_MEMORY_BLOCK_SIZE. > > What about this patch - which just uses max_pfn to choose > the block size. > > It seems that many systems with large amounts of memory > will have a nicely aligned max_pfn ... so they will get > the 2GB block size. If they don't have a well aligned > max_pfn, then they need to use a smaller size to avoid > the crash I saw. Good to me. > diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c > index 3fba623e3ba5..e14e90fd1cf8 100644 > --- a/arch/x86/mm/init_64.c > +++ b/arch/x86/mm/init_64.c > @@ -1195,15 +1195,6 @@ static unsigned long probe_memory_block_size(void) > /* start from 2g */ > unsigned long bz = 1UL<<31; > > - if (totalram_pages >= (64ULL << (30 - PAGE_SHIFT))) { > - pr_info("Using 2GB memory block size for large-memory system\n"); > - return 2UL * 1024 * 1024 * 1024; > - } > - > - /* less than 64g installed */ > - if ((max_pfn << PAGE_SHIFT) < (16UL << 32)) > - return MIN_MEMORY_BLOCK_SIZE; > - > /* get the tail size */ > while (bz > MIN_MEMORY_BLOCK_SIZE) { > if (!((max_pfn << PAGE_SHIFT) & (bz - 1))) -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Aug 21, 2015 at 1:50 PM, Yinghai Lu <yinghai@kernel.org> wrote: >> It seems that many systems with large amounts of memory >> will have a nicely aligned max_pfn ... so they will get >> the 2GB block size. If they don't have a well aligned >> max_pfn, then they need to use a smaller size to avoid >> the crash I saw. > > Good to me. Still stuff going on that I don't understand here. I increased the amount of mirrored memory in this machine which moved max_pfn to 0x7560000 and probe_memory_block_size() picked 512MB as the memory_block_size, which seemed plausible. But my kernel still crashed during boot with this value. :-( Forcing the block size to 128M made the system boot. Maybe all the holes in the e820 map matter too (specifically the alignment of the holes)? -Tony -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Aug 21, 2015 at 4:54 PM, Tony Luck <tony.luck@gmail.com> wrote: > On Fri, Aug 21, 2015 at 1:50 PM, Yinghai Lu <yinghai@kernel.org> wrote: > > Still stuff going on that I don't understand here. I increased the amount of > mirrored memory in this machine which moved max_pfn to 0x7560000 > and probe_memory_block_size() picked 512MB as the memory_block_size, > which seemed plausible. > > But my kernel still crashed during boot with this value. :-( > Forcing the block size to 128M made the system boot. > > Maybe all the holes in the e820 map matter too (specifically the > alignment of the holes)? Then, what does the E820 look like? Yinghai -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Aug 24, 2015 at 10:46 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> Then, what does the E820 look like?
See attached serial console log of the latest crash
-Tony
On Mon, Aug 24, 2015 at 1:41 PM, Tony Luck <tony.luck@gmail.com> wrote: > On Mon, Aug 24, 2015 at 10:46 AM, Yinghai Lu <yinghai@kernel.org> wrote: >> Then, what does the E820 look like? > > See attached serial console log of the latest crash Can you boot with "debug ignore_loglevel" so we can see following print out for vmemmap? [ 0.352486] [ffffea0000000000-ffffea0001ffffff] PMD -> [ffff88007de00000-ffff88007fdfffff] on node 0 [ 0.358758] [ffffea0004000000-ffffea0005ffffff] PMD -> [ffff88017d600000-ffff88017f5fffff] on node 1 -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Aug 24, 2015 at 2:25 PM, Yinghai Lu <yinghai@kernel.org> wrote: > Can you boot with "debug ignore_loglevel" so we can see following print out > for vmemmap? See attached. There are a few extra messages from my own debug printk() calls. It seems that we successfully deal with node 0 from topology_init() but die walking node 1. I see that the NODE_DATA limits for memory on node 1 were from 1d70000 to 3a00000. But when we get into register_mem_sect_under_node() we have rounded the start pfn down to 1d00000 ... and we panic processing that range (which is in a hole in e820). We seem to die here: for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) { int page_nid; page_nid = get_nid_for_pfn(pfn); -Tony
On Mon, Aug 24, 2015 at 3:39 PM, Tony Luck <tony.luck@gmail.com> wrote: > On Mon, Aug 24, 2015 at 2:25 PM, Yinghai Lu <yinghai@kernel.org> wrote: > >> Can you boot with "debug ignore_loglevel" so we can see following print out >> for vmemmap? > > See attached. There are a few extra messages from my own debug printk() > calls. It seems that we successfully deal with node 0 from topology_init() > but die walking node 1. I see that the NODE_DATA limits for memory > on node 1 were from 1d70000 to 3a00000. But when we get into > register_mem_sect_under_node() we have rounded the start pfn down to > 1d00000 ... and we panic processing that range (which is in a hole in e820). > > We seem to die here: > > for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) { > int page_nid; > > page_nid = get_nid_for_pfn(pfn); oh, no. register_mem_sect_under_node() is assuming: first section in the block is present and first page in that section is present. -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index 3fba623e3ba5..e14e90fd1cf8 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -1195,15 +1195,6 @@ static unsigned long probe_memory_block_size(void) /* start from 2g */ unsigned long bz = 1UL<<31; - if (totalram_pages >= (64ULL << (30 - PAGE_SHIFT))) { - pr_info("Using 2GB memory block size for large-memory system\n"); - return 2UL * 1024 * 1024 * 1024; - } - - /* less than 64g installed */ - if ((max_pfn << PAGE_SHIFT) < (16UL << 32)) - return MIN_MEMORY_BLOCK_SIZE; - /* get the tail size */ while (bz > MIN_MEMORY_BLOCK_SIZE) { if (!((max_pfn << PAGE_SHIFT) & (bz - 1)))