Message ID | 1547183577-20309-1-git-send-email-kernelfans@gmail.com (mailing list archive) |
---|---|
Series | x86_64/mm: remove bottom-up allocation style by pushing forward the parsing of mem hotplug info |
On 1/10/19 9:12 PM, Pingfan Liu wrote:
> Background
> When kaslr kernel can be guaranteed to sit inside unmovable node
> after [1].

What does this "[1]" refer to?

Also, can you clarify your terminology here a bit. By "kaslr kernel", do you mean the base address?

> But if kaslr kernel is located near the end of the movable node,
> then bottom-up allocator may create pagetable which crosses the boundary
> between unmovable node and movable node.

Again, I'm confused. Do you literally mean a single page table page? I think you mean the page tables, but it would be nice to clarify this, and also explicitly state which page tables these are.

> It is a probability issue,
> two factors include -1. how big the gap between kernel end and
> unmovable node's end. -2. how many memory does the system own.
> Alternative way to fix this issue is by increasing the gap by
> boot/compressed/kaslr*.

Oh, you mean the KASLR code in arch/x86/boot/compressed/kaslr*.[ch]?

It took me a minute to figure out you were talking about filenames.

> But taking the scenario of PB level memory, the pagetable will take
> server MB even if using 1GB page, different page attr and fragment
> will make things worse. So it is hard to decide how much should the
> gap increase.

I'm not following this. If we move the image around, we leave holes. Why do we need page table pages allocated to cover these holes?

> The following figure show the defection of current bottom-up style:
> [startA, endA][startB, "kaslr kernel verly close to" endB][startC, endC]

"defection"?

> If nodeA,B is unmovable, while nodeC is movable, then init_mem_mapping()
> can generate pgtable on nodeC, which stain movable node.

Let me see if I can summarize this:
1. The kernel ASLR decompression code picks a spot to place the kernel image in physical memory.
2. Some page tables are dynamically allocated near (after) this spot.
3. Sometimes, based on the random ASLR location, these page tables fall over into the "movable node" area. Being unmovable allocations, this is not cool.
4. To fix this (on 64-bit at least), we stop allocating page tables based on the location of the kernel image. Instead, we allocate using the memblock allocator itself, which knows how to avoid the movable node.

> This patch makes it certainty instead of a probablity problem. It achieves
> this by pushing forward the parsing of mem hotplug info ahead of init_mem_mapping().

What does memory hotplug have to do with this? I thought this was all about early boot.
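
The failure mode summarized above can be illustrated with a toy model. This is only a sketch: `struct toy_region`, `toy_alloc_bottom_up()` and `TOY_ALLOC_FAIL` are hypothetical names invented for this example, not kernel interfaces, and the real memblock allocator uses different data structures and search order. The model only shows why a search that starts just above the kernel image can land in a hotpluggable (movable) region, while a search that is not tied to the image and skips hotpluggable regions cannot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one memory region. */
struct toy_region {
    uint64_t start;
    uint64_t end;       /* exclusive */
    bool hotpluggable;  /* true for memory on a movable node */
};

#define TOY_ALLOC_FAIL UINT64_MAX

/*
 * Bottom-up search: return the lowest address >= floor where `size` bytes
 * fit inside a single region, or TOY_ALLOC_FAIL if nothing fits.
 * When skip_hotplug is true, hotpluggable regions are never used.
 * Regions are assumed to be sorted by start address.
 */
uint64_t toy_alloc_bottom_up(const struct toy_region *r, size_t n,
                             uint64_t floor, uint64_t size, bool skip_hotplug)
{
    assert(r != NULL && size > 0);
    for (size_t i = 0; i < n; i++) {
        if (skip_hotplug && r[i].hotpluggable)
            continue;
        /* Lowest usable address in this region at or above the floor. */
        uint64_t base = r[i].start > floor ? r[i].start : floor;
        if (base < r[i].end && r[i].end - base >= size)
            return base;
    }
    return TOY_ALLOC_FAIL;
}
```

With three regions A, B (unmovable) and C (hotpluggable), and a floor just below the end of B, the unrestricted search returns an address in C. Skipping hotpluggable regions while still starting at that floor finds no room at all, so the search also has to stop depending on where the kernel image was placed.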
On Tue, Jan 15, 2019 at 7:02 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 1/10/19 9:12 PM, Pingfan Liu wrote:
> > Background
> > When kaslr kernel can be guaranteed to sit inside unmovable node
> > after [1].
>
> What does this "[1]" refer to?
>
https://lore.kernel.org/patchwork/patch/1029376/

> Also, can you clarify your terminology here a bit. By "kaslr kernel",
> do you mean the base address?
>
I meant the randomized load address. I googled it and found that the usual term is "base address".

> > But if kaslr kernel is located near the end of the movable node,
> > then bottom-up allocator may create pagetable which crosses the boundary
> > between unmovable node and movable node.
>
> Again, I'm confused. Do you literally mean a single page table page? I
> think you mean the page tables, but it would be nice to clarify this,
> and also explicitly state which page tables these are.
>
I mean page table pages, namely the page tables built by init_mem_mapping().

> > It is a probability issue,
> > two factors include -1. how big the gap between kernel end and
> > unmovable node's end. -2. how many memory does the system own.
> > Alternative way to fix this issue is by increasing the gap by
> > boot/compressed/kaslr*.
>
> Oh, you mean the KASLR code in arch/x86/boot/compressed/kaslr*.[ch]?
>
Sorry, and yes, I mean the code in arch/x86/boot/compressed/kaslr_64.c and kaslr.c.

> It took me a minute to figure out you were talking about filenames.
>
> > But taking the scenario of PB level memory, the pagetable will take
> > server MB even if using 1GB page, different page attr and fragment
> > will make things worse. So it is hard to decide how much should the
> > gap increase.
> I'm not following this. If we move the image around, we leave holes.
> Why do we need page table pages allocated to cover these holes?
>
I mean the calculation in store_slot_info() in arch/x86/boot/compressed/kaslr.c:
  slot_area.num = (region->size - image_size) / CONFIG_PHYSICAL_ALIGN + 1
If we denote the size of the page tables as "X", the formula becomes:
  slot_area.num = (region->size - image_size - X) / CONFIG_PHYSICAL_ALIGN + 1
And it is hard to decide X because of the factors above.

> > The following figure show the defection of current bottom-up style:
> > [startA, endA][startB, "kaslr kernel verly close to" endB][startC, endC]
>
> "defection"?
>
Oh, I meant "defect".

> > If nodeA,B is unmovable, while nodeC is movable, then init_mem_mapping()
> > can generate pgtable on nodeC, which stain movable node.
>
> Let me see if I can summarize this:
> 1. The kernel ASLR decompression code picks a spot to place the kernel
> image in physical memory.
> 2. Some page tables are dynamically allocated near (after) this spot.
> 3. Sometimes, based on the random ASLR location, these page tables fall
> over into the "movable node" area. Being unmovable allocations, this
> is not cool.
> 4. To fix this (on 64-bit at least), we stop allocating page tables
> based on the location of the kernel image. Instead, we allocate
> using the memblock allocator itself, which knows how to avoid the
> movable node.
>
Yes, you got my idea exactly. Thanks for helping to summarize it. It is hard for me to express it clearly in English.

> > This patch makes it certainty instead of a probablity problem. It achieves
> > this by pushing forward the parsing of mem hotplug info ahead of init_mem_mapping().
>
> What does memory hotplug have to do with this? I thought this was all
> about early boot.

The information about hotpluggable memory is passed to the memblock allocator through initmem_init()->...->acpi_numa_memory_affinity_init(), where memblock_mark_hotplug() records it. Later, when the memory allocator runs, __next_mem_range() checks this information with memblock_is_hotpluggable().

Thanks and regards,
Pingfan
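
To make the slot arithmetic in this reply concrete, here is a small standalone sketch of the two formulas. It is not the kernel's store_slot_info(): `slot_count()` and its parameters are hypothetical names for this example, and the reserve argument plays the role of the "X" that would have to be guessed in advance.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative model only: count the candidate KASLR slots in one memory
 * region, optionally holding back `pgtable_reserve` bytes (the "X" from the
 * discussion) for page table pages. Returns 0 when the region cannot hold
 * the image plus the reserve. `align` stands in for CONFIG_PHYSICAL_ALIGN.
 */
uint64_t slot_count(uint64_t region_size, uint64_t image_size,
                    uint64_t pgtable_reserve, uint64_t align)
{
    assert(align > 0);
    if (image_size > region_size ||
        pgtable_reserve > region_size - image_size)
        return 0;
    return (region_size - image_size - pgtable_reserve) / align + 1;
}
```

For example, assuming a 1 GiB region, a 32 MiB image and a 2 MiB alignment, the function yields 497 slots with no reserve and 493 slots with an 8 MiB reserve. The arithmetic is easy; the hard part raised in the thread is choosing a reserve that is large enough for every memory size and page attribute layout.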