Message ID | 1547183577-20309-2-git-send-email-kernelfans@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | x86_64/mm: remove bottom-up allocation style by pushing forward the parsing of mem hotplug info |
On Fri, Jan 11, 2019 at 01:12:51PM +0800, Pingfan Liu wrote:
>This patch identifies the point where memblock alloc start. It has no
>functional.
[...]
>+#ifdef CONFIG_MEMORY_HOTPLUG
>+	/*
>+	 * Memory used by the kernel cannot be hot-removed because Linux
>+	 * cannot migrate the kernel pages. When memory hotplug is
>+	 * enabled, we should prevent memblock from allocating memory
>+	 * for the kernel.
>+	 *
>+	 * ACPI SRAT records all hotpluggable memory ranges. But before
>+	 * SRAT is parsed, we don't know about it.
>+	 *
>+	 * The kernel image is loaded into memory at very early time. We
>+	 * cannot prevent this anyway. So on NUMA system, we set any
>+	 * node the kernel resides in as un-hotpluggable.
>+	 *
>+	 * Since on modern servers, one node could have double-digit
>+	 * gigabytes memory, we can assume the memory around the kernel
>+	 * image is also un-hotpluggable. So before SRAT is parsed, just
>+	 * allocate memory near the kernel image to try the best to keep
>+	 * the kernel away from hotpluggable memory.
>+	 */
>+	if (movable_node_is_enabled())
>+		memblock_set_bottom_up(true);

Hi Pingfan,

In my understanding, 'movable_node' is based on the assumption that memory
near the kernel is, with high probability, in the same node as the kernel.

If SRAT has been parsed early, do we still need the kernel parameter
'movable_node'? Since you already have the memory information about
hot-remove, I wonder if it's OK to drop 'movable_node', and if
memory-hotremove is enabled, change memblock allocation according to SRAT.

If there is something wrong in my understanding, please let me know.

Thanks,
Chao Fan

>+#endif
> 	init_mem_mapping();
>+	memblock_set_current_limit(get_max_mapped());
> 
> 	idt_setup_early_pf();
> 
>@@ -1145,8 +1145,6 @@ void __init setup_arch(char **cmdline_p)
> 	 */
> 	mmu_cr4_features = __read_cr4() & ~X86_CR4_PCIDE;
> 
>-	memblock_set_current_limit(get_max_mapped());
>-
> 	/*
> 	 * NOTE: On x86-32, only from this point on, fixmaps are ready for use.
> 	 */
>-- 
>2.7.4
>
On Fri, Jan 11, 2019 at 2:13 PM Chao Fan <fanc.fnst@cn.fujitsu.com> wrote:
>
> On Fri, Jan 11, 2019 at 01:12:51PM +0800, Pingfan Liu wrote:
> >This patch identifies the point where memblock alloc start. It has no
> >functional.
> [...]
> >+#ifdef CONFIG_MEMORY_HOTPLUG
> >+	/*
> >+	 * Memory used by the kernel cannot be hot-removed because Linux
> >+	 * cannot migrate the kernel pages. When memory hotplug is
> >+	 * enabled, we should prevent memblock from allocating memory
> >+	 * for the kernel.
> >+	 *
> >+	 * ACPI SRAT records all hotpluggable memory ranges. But before
> >+	 * SRAT is parsed, we don't know about it.
> >+	 *
> >+	 * The kernel image is loaded into memory at very early time. We
> >+	 * cannot prevent this anyway. So on NUMA system, we set any
> >+	 * node the kernel resides in as un-hotpluggable.
> >+	 *
> >+	 * Since on modern servers, one node could have double-digit
> >+	 * gigabytes memory, we can assume the memory around the kernel
> >+	 * image is also un-hotpluggable. So before SRAT is parsed, just
> >+	 * allocate memory near the kernel image to try the best to keep
> >+	 * the kernel away from hotpluggable memory.
> >+	 */
> >+	if (movable_node_is_enabled())
> >+		memblock_set_bottom_up(true);
>
> Hi Pingfan,
>
> In my understanding, 'movable_node' is based on the assumption that memory
> near the kernel is, with high probability, in the same node as the kernel.
>
> If SRAT has been parsed early, do we still need the kernel parameter
> 'movable_node'? Since you already have the memory information about
> hot-remove, I wonder if it's OK to drop 'movable_node', and if
> memory-hotremove is enabled, change memblock allocation according to SRAT.
>
x86_32 still needs this logic. Maybe it can be done later.

Thanks,
Pingfan

> If there is something wrong in my understanding, please let me know.
>
> Thanks,
> Chao Fan
>
> >+#endif
> > 	init_mem_mapping();
> >+	memblock_set_current_limit(get_max_mapped());
> >
> > 	idt_setup_early_pf();
> >
> >@@ -1145,8 +1145,6 @@ void __init setup_arch(char **cmdline_p)
> > 	 */
> > 	mmu_cr4_features = __read_cr4() & ~X86_CR4_PCIDE;
> >
> >-	memblock_set_current_limit(get_max_mapped());
> >-
> > 	/*
> > 	 * NOTE: On x86-32, only from this point on, fixmaps are ready for use.
> > 	 */
> >--
> >2.7.4
> >
On 1/10/19 9:12 PM, Pingfan Liu wrote:
> This patch identifies the point where memblock alloc start. It has no
> functional.

It has no functional ... what? Effects?

> -	memblock_set_current_limit(ISA_END_ADDRESS);
> -	e820__memblock_setup();
> -
>  	reserve_bios_regions();
>
>  	if (efi_enabled(EFI_MEMMAP)) {
> @@ -1113,6 +1087,8 @@ void __init setup_arch(char **cmdline_p)
>  		efi_reserve_boot_services();
>  	}
>
> +	memblock_set_current_limit(0, ISA_END_ADDRESS, false);
> +	e820__memblock_setup();

It looks like you changed the arguments passed to
memblock_set_current_limit(). How can this even compile? Did you mean
that this patch is not functional?
On Tue, Jan 15, 2019 at 7:07 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 1/10/19 9:12 PM, Pingfan Liu wrote:
> > This patch identifies the point where memblock alloc start. It has no
> > functional.
>
> It has no functional ... what? Effects?
>
While reorganizing the code, it took me a long time to figure out why
memblock_set_bottom_up(true) was added here and how far it could be
deferred. I finally realized that it only takes effect after
e820__memblock_setup(), the point where the memblock allocator can work.
So I gathered the related code in one place, and hope this patch can
clarify this fact.

> > -	memblock_set_current_limit(ISA_END_ADDRESS);
> > -	e820__memblock_setup();
> > -
> >  	reserve_bios_regions();
> >
> >  	if (efi_enabled(EFI_MEMMAP)) {
> > @@ -1113,6 +1087,8 @@ void __init setup_arch(char **cmdline_p)
> >  		efi_reserve_boot_services();
> >  	}
> >
> > +	memblock_set_current_limit(0, ISA_END_ADDRESS, false);
> > +	e820__memblock_setup();
>
> It looks like you changed the arguments passed to
> memblock_set_current_limit(). How can this even compile? Did you mean
> that this patch is not functional?
>
Sorry, during rebasing I merged a trivial fix into this patch by mistake.
I will build each patch individually.

Best regards,
Pingfan
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d494b9b..ac432ae 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -962,29 +962,6 @@ void __init setup_arch(char **cmdline_p)
 
 	if (efi_enabled(EFI_BOOT))
 		efi_memblock_x86_reserve_range();
-#ifdef CONFIG_MEMORY_HOTPLUG
-	/*
-	 * Memory used by the kernel cannot be hot-removed because Linux
-	 * cannot migrate the kernel pages. When memory hotplug is
-	 * enabled, we should prevent memblock from allocating memory
-	 * for the kernel.
-	 *
-	 * ACPI SRAT records all hotpluggable memory ranges. But before
-	 * SRAT is parsed, we don't know about it.
-	 *
-	 * The kernel image is loaded into memory at very early time. We
-	 * cannot prevent this anyway. So on NUMA system, we set any
-	 * node the kernel resides in as un-hotpluggable.
-	 *
-	 * Since on modern servers, one node could have double-digit
-	 * gigabytes memory, we can assume the memory around the kernel
-	 * image is also un-hotpluggable. So before SRAT is parsed, just
-	 * allocate memory near the kernel image to try the best to keep
-	 * the kernel away from hotpluggable memory.
-	 */
-	if (movable_node_is_enabled())
-		memblock_set_bottom_up(true);
-#endif
 
 	x86_report_nx();
 
@@ -1096,9 +1073,6 @@ void __init setup_arch(char **cmdline_p)
 
 	cleanup_highmap();
 
-	memblock_set_current_limit(ISA_END_ADDRESS);
-	e820__memblock_setup();
-
 	reserve_bios_regions();
 
 	if (efi_enabled(EFI_MEMMAP)) {
@@ -1113,6 +1087,8 @@ void __init setup_arch(char **cmdline_p)
 		efi_reserve_boot_services();
 	}
 
+	memblock_set_current_limit(0, ISA_END_ADDRESS, false);
+	e820__memblock_setup();
 	/* preallocate 4k for mptable mpc */
 	e820__memblock_alloc_reserved_mpc_new();
 
@@ -1130,7 +1106,31 @@ void __init setup_arch(char **cmdline_p)
 	trim_platform_memory_ranges();
 	trim_low_memory_range();
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+	/*
+	 * Memory used by the kernel cannot be hot-removed because Linux
+	 * cannot migrate the kernel pages. When memory hotplug is
+	 * enabled, we should prevent memblock from allocating memory
+	 * for the kernel.
+	 *
+	 * ACPI SRAT records all hotpluggable memory ranges. But before
+	 * SRAT is parsed, we don't know about it.
+	 *
+	 * The kernel image is loaded into memory at very early time. We
+	 * cannot prevent this anyway. So on NUMA system, we set any
+	 * node the kernel resides in as un-hotpluggable.
+	 *
+	 * Since on modern servers, one node could have double-digit
+	 * gigabytes memory, we can assume the memory around the kernel
+	 * image is also un-hotpluggable. So before SRAT is parsed, just
+	 * allocate memory near the kernel image to try the best to keep
+	 * the kernel away from hotpluggable memory.
+	 */
+	if (movable_node_is_enabled())
+		memblock_set_bottom_up(true);
+#endif
 	init_mem_mapping();
+	memblock_set_current_limit(get_max_mapped());
 
 	idt_setup_early_pf();
 
@@ -1145,8 +1145,6 @@ void __init setup_arch(char **cmdline_p)
 	 */
 	mmu_cr4_features = __read_cr4() & ~X86_CR4_PCIDE;
 
-	memblock_set_current_limit(get_max_mapped());
-
 	/*
 	 * NOTE: On x86-32, only from this point on, fixmaps are ready for use.
 	 */
This patch identifies the point where memblock alloc start. It has no
functional.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: x86@kernel.org
Cc: linux-acpi@vger.kernel.org
Cc: linux-mm@kvack.org
---
 arch/x86/kernel/setup.c | 54 ++++++++++++++++++++++++-------------------------
 1 file changed, 26 insertions(+), 28 deletions(-)
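
For readers unfamiliar with the allocation direction discussed in this thread, the following is a rough userspace sketch, not kernel code. It assumes a toy allocator with a single free range; `struct toy_memblock`, `toy_add_range()` and `toy_alloc()` are hypothetical names invented for illustration. It shows two points from the discussion: the direction flag has no observable effect until usable memory has been registered (the role e820__memblock_setup() plays in the patch), and bottom-up allocation hands out addresses nearest the low end of the range (near the kernel image in the scenario the code comment describes) while top-down hands out the highest ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model (hypothetical, not the kernel's memblock): a single free range. */
struct toy_memblock {
	uint64_t start;   /* lowest free address */
	uint64_t end;     /* one past the highest free address; 0 while empty */
	bool bottom_up;   /* allocation direction flag */
};

/* Register the usable range; nothing can be allocated before this. */
static void toy_add_range(struct toy_memblock *mb, uint64_t start, uint64_t end)
{
	assert(start < end);
	mb->start = start;
	mb->end = end;
}

/* Allocate size bytes; return the block's base address, or 0 on failure. */
static uint64_t toy_alloc(struct toy_memblock *mb, uint64_t size)
{
	uint64_t base;

	if (size == 0 || mb->end - mb->start < size)
		return 0;

	if (mb->bottom_up) {
		/* Take the block from the low end of the free range. */
		base = mb->start;
		mb->start += size;
	} else {
		/* Take the block from the high end of the free range. */
		base = mb->end - size;
		mb->end = base;
	}
	return base;
}
```

With a range starting at 16 MiB and ending at 4 GiB, setting `bottom_up` before `toy_add_range()` changes nothing until the range exists; afterwards, bottom-up requests return addresses starting at 16 MiB, and switching the flag off makes the next request come from just below 4 GiB.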