
memblock: uniformly initialize all reserved pages to MIGRATE_MOVABLE

Message ID 20241021051151.4664-1-suhua.tanke@gmail.com (mailing list archive)
State New

Commit Message

Hua Su Oct. 21, 2024, 5:11 a.m. UTC
Currently when CONFIG_DEFERRED_STRUCT_PAGE_INIT is not set, the reserved
pages are initialized to MIGRATE_MOVABLE by default in memmap_init.

Reserved memory mainly stores struct page metadata. When
HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON=Y and hugepages are allocated,
HVO remaps the vmemmap virtual address range onto the page that
vmemmap_reuse is mapped to, and the pages that previously mapped the
range are freed to the buddy system.
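
The pageblock migratetype matters here because a page returned to the
buddy allocator is queued on the free list selected by its pageblock's
stored migratetype. A simplified sketch of that free path (paraphrased
from mm/page_alloc.c; not the verbatim upstream code):

	/*
	 * A freed page lands on the free list chosen by its pageblock's
	 * migratetype, so pageblocks left at the default MIGRATE_UNMOVABLE
	 * feed the Unmovable free lists.
	 */
	static void free_one_page(struct zone *zone, struct page *page,
				  unsigned long pfn, unsigned int order)
	{
		int migratetype = get_pfnblock_migratetype(page, pfn);

		__free_one_page(page, pfn, zone, order, migratetype, FPI_NONE);
	}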

Before this patch:
when CONFIG_DEFERRED_STRUCT_PAGE_INIT is not set, the freed memory was
placed on the Movable list;
when CONFIG_DEFERRED_STRUCT_PAGE_INIT=Y, the freed memory was placed on
the Unmovable list.

After this patch, the freed memory is placed on the Movable list
regardless of whether CONFIG_DEFERRED_STRUCT_PAGE_INIT is set.
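
For reference, the non-deferred path already behaves this way; a
paraphrased sketch of the relevant loop in memmap_init_range()
(simplified from mm/mm_init.c, not verbatim upstream code):

	/*
	 * Each page is initialized first, and every pageblock-aligned pfn
	 * then has its pageblock marked MIGRATE_MOVABLE so unmovable
	 * allocations are not scattered across boot-time memory.
	 */
	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		struct page *page = pfn_to_page(pfn);

		__init_single_page(page, pfn, zone, nid);

		if (pageblock_aligned(pfn))
			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
	}

Note the ordering there: __init_single_page() runs before the
migratetype is set.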

E.g., tested on a 1000 GB virtual machine with an
Intel(R) Xeon(R) Platinum 8358P CPU.

After the VM starts:
echo 500000 > /proc/sys/vm/nr_hugepages
cat /proc/meminfo | grep -i huge
HugePages_Total:   500000
HugePages_Free:    500000
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:        1024000000 kB

cat /proc/pagetypeinfo
before:
Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
…
Node    0, zone   Normal, type    Unmovable     51      2      1     28     53     35     35     43     40     69   3852
Node    0, zone   Normal, type      Movable   6485   4610    666    202    200    185    208     87     54      2    240
Node    0, zone   Normal, type  Reclaimable      2      2      1     23     13      1      2      1      0      1      0
Node    0, zone   Normal, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Unmovable ≈ 15 GB (dominated by the 3852 order-10 blocks: 3852 × 4 MiB ≈ 15 GiB)

after:
Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
…
Node    0, zone   Normal, type    Unmovable      0      1      1      0      0      0      0      1      1      1      0
Node    0, zone   Normal, type      Movable   1563   4107   1119    189    256    368    286    132    109      4   3841
Node    0, zone   Normal, type  Reclaimable      2      2      1     23     13      1      2      1      0      1      0
Node    0, zone   Normal, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0

Signed-off-by: Hua Su <suhua.tanke@gmail.com>
---
 mm/mm_init.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Mike Rapoport Oct. 21, 2024, 6:52 a.m. UTC | #1
From: Mike Rapoport (Microsoft) <rppt@kernel.org>

On Mon, 21 Oct 2024 13:11:51 +0800, Hua Su wrote:
> Currently when CONFIG_DEFERRED_STRUCT_PAGE_INIT is not set, the reserved
> pages are initialized to MIGRATE_MOVABLE by default in memmap_init.
> 
> Reserved memory mainly stores struct page metadata. When
> HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON=Y and hugepages are allocated,
> HVO remaps the vmemmap virtual address range onto the page that
> vmemmap_reuse is mapped to, and the pages that previously mapped the
> range are freed to the buddy system.
> 
> [...]

Applied to for-next branch of memblock.git tree, thanks!

[1/1] memblock: uniformly initialize all reserved pages to MIGRATE_MOVABLE
      commit: ad48825232a91a382f665bb7c3bf0044027791d4

tree: https://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
branch: for-next

--
Sincerely yours,
Mike.
kernel test robot Oct. 25, 2024, 2:13 a.m. UTC | #2
Hello,

kernel test robot noticed "kernel_BUG_at_include/linux/mm.h" on:

commit: 0a19e28247d042d639e5a46c3698adeda268a7a2 ("[PATCH] memblock: uniformly initialize all reserved pages to MIGRATE_MOVABLE")
url: https://github.com/intel-lab-lkp/linux/commits/Hua-Su/memblock-uniformly-initialize-all-reserved-pages-to-MIGRATE_MOVABLE/20241021-131358
base: https://git.kernel.org/cgit/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/all/20241021051151.4664-1-suhua.tanke@gmail.com/
patch subject: [PATCH] memblock: uniformly initialize all reserved pages to MIGRATE_MOVABLE

in testcase: boot

config: x86_64-randconfig-012-20241023
compiler: gcc-12
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)


+------------------------------------------+------------+------------+
|                                          | a8883372ec | 0a19e28247 |
+------------------------------------------+------------+------------+
| boot_successes                           | 18         | 0          |
| boot_failures                            | 0          | 18         |
| kernel_BUG_at_include/linux/mm.h         | 0          | 18         |
| Oops:invalid_opcode:#[##]SMP_PTI         | 0          | 18         |
| RIP:page_zone                            | 0          | 18         |
| Kernel_panic-not_syncing:Fatal_exception | 0          | 18         |
+------------------------------------------+------------+------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202410251024.eb4a89f1-oliver.sang@intel.com


[    0.262363][    T0] ------------[ cut here ]------------
[    0.262921][    T0] kernel BUG at include/linux/mm.h:1637!
[    0.263532][    T0] Oops: invalid opcode: 0000 [#1] SMP PTI
[    0.264140][    T0] CPU: 0 UID: 0 PID: 0 Comm: swapper Tainted: G                T  6.12.0-rc3-00235-g0a19e28247d0 #1
[    0.265300][    T0] Tainted: [T]=RANDSTRUCT
[ 0.265762][ T0] RIP: 0010:page_zone (include/linux/mm.h:1858) 
[ 0.266284][ T0] Code: 43 08 89 ee 48 89 df 31 d2 5b 5d 41 5c 41 5d 41 5e e9 f1 08 02 00 48 8b 07 48 ff c0 75 0e 48 c7 c6 27 2e 99 ac e8 42 73 fd ff <0f> 0b 48 8b 07 48 c1 e8 3e 48 69 c0 40 06 00 00 48 05 c0 63 6c ad
All code
========
   0:	43 08 89 ee 48 89 df 	rex.XB or %cl,-0x2076b712(%r9)
   7:	31 d2                	xor    %edx,%edx
   9:	5b                   	pop    %rbx
   a:	5d                   	pop    %rbp
   b:	41 5c                	pop    %r12
   d:	41 5d                	pop    %r13
   f:	41 5e                	pop    %r14
  11:	e9 f1 08 02 00       	jmpq   0x20907
  16:	48 8b 07             	mov    (%rdi),%rax
  19:	48 ff c0             	inc    %rax
  1c:	75 0e                	jne    0x2c
  1e:	48 c7 c6 27 2e 99 ac 	mov    $0xffffffffac992e27,%rsi
  25:	e8 42 73 fd ff       	callq  0xfffffffffffd736c
  2a:*	0f 0b                	ud2    		<-- trapping instruction
  2c:	48 8b 07             	mov    (%rdi),%rax
  2f:	48 c1 e8 3e          	shr    $0x3e,%rax
  33:	48 69 c0 40 06 00 00 	imul   $0x640,%rax,%rax
  3a:	48 05 c0 63 6c ad    	add    $0xffffffffad6c63c0,%rax

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2    
   2:	48 8b 07             	mov    (%rdi),%rax
   5:	48 c1 e8 3e          	shr    $0x3e,%rax
   9:	48 69 c0 40 06 00 00 	imul   $0x640,%rax,%rax
  10:	48 05 c0 63 6c ad    	add    $0xffffffffad6c63c0,%rax
[    0.268346][    T0] RSP: 0000:fffffffface03dc0 EFLAGS: 00010046
[    0.268988][    T0] RAX: 0000000000000000 RBX: 0000000000000007 RCX: 0000000000000000
[    0.269844][    T0] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[    0.270685][    T0] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[    0.271486][    T0] R10: 0000000000000000 R11: 6d75642065676170 R12: 0000000000000001
[    0.272336][    T0] R13: 0000000000159400 R14: fffff7bb05650000 R15: ffff9bfa1ffff178
[    0.273172][    T0] FS:  0000000000000000(0000) GS:ffff9bfcefa00000(0000) knlGS:0000000000000000
[    0.274145][    T0] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.274817][    T0] CR2: ffff9bfcfffff000 CR3: 000000015c2b2000 CR4: 00000000000000b0
[    0.275658][    T0] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.276502][    T0] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    0.277343][    T0] Call Trace:
[    0.277691][    T0]  <TASK>
[ 0.277992][ T0] ? __die_body (arch/x86/kernel/dumpstack.c:421) 
[ 0.278448][ T0] ? die (arch/x86/kernel/dumpstack.c:449) 
[ 0.278838][ T0] ? do_trap (arch/x86/kernel/traps.c:156 arch/x86/kernel/traps.c:197) 
[ 0.279276][ T0] ? page_zone (include/linux/mm.h:1858) 
[ 0.279720][ T0] ? page_zone (include/linux/mm.h:1858) 
[ 0.280170][ T0] ? do_error_trap (arch/x86/kernel/traps.c:218) 
[ 0.280648][ T0] ? page_zone (include/linux/mm.h:1858) 
[ 0.281095][ T0] ? exc_invalid_op (arch/x86/kernel/traps.c:316) 
[ 0.281597][ T0] ? page_zone (include/linux/mm.h:1858) 
[ 0.282041][ T0] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:621) 
[ 0.282582][ T0] ? page_zone (include/linux/mm.h:1858) 
[ 0.283027][ T0] set_pfnblock_flags_mask (mm/page_alloc.c:408) 
[ 0.283583][ T0] reserve_bootmem_region (mm/mm_init.c:729 mm/mm_init.c:765) 
[ 0.284142][ T0] free_low_memory_core_early (mm/memblock.c:2192 mm/memblock.c:2205) 
[ 0.284736][ T0] ? swiotlb_init_io_tlb_pool+0x86/0x133 
[ 0.285419][ T0] memblock_free_all (mm/memblock.c:2252) 
[ 0.285925][ T0] mem_init (arch/x86/mm/init_64.c:1360) 
[ 0.286332][ T0] mm_core_init (mm/mm_init.c:2658) 
[ 0.286790][ T0] start_kernel (init/main.c:965) 
[ 0.287272][ T0] x86_64_start_reservations (arch/x86/kernel/head64.c:381) 
[ 0.287850][ T0] x86_64_start_kernel (arch/x86/kernel/ebda.c:57) 
[ 0.288377][ T0] common_startup_64 (arch/x86/kernel/head_64.S:414) 
[    0.288899][    T0]  </TASK>
[    0.289213][    T0] Modules linked in:
[    0.289626][    T0] ---[ end trace 0000000000000000 ]---
[ 0.290175][ T0] RIP: 0010:page_zone (include/linux/mm.h:1858) 
[ 0.290680][ T0] Code: 43 08 89 ee 48 89 df 31 d2 5b 5d 41 5c 41 5d 41 5e e9 f1 08 02 00 48 8b 07 48 ff c0 75 0e 48 c7 c6 27 2e 99 ac e8 42 73 fd ff <0f> 0b 48 8b 07 48 c1 e8 3e 48 69 c0 40 06 00 00 48 05 c0 63 6c ad
(disassembly decode identical to the one above omitted)


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241025/202410251024.eb4a89f1-oliver.sang@intel.com
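
The backtrace is consistent with an ordering problem in the hunk below:
set_pageblock_migratetype() reaches page_zone() via
set_pfnblock_flags_mask(), but at that point __init_single_page() has
not yet run for the pfn, so page->flags still holds poison rather than
valid zone/node bits and the BUG check in page_zone() fires. A minimal
sketch of one way to avoid this, assuming the reorder is safe here
(a hypothetical revision, not the confirmed fixup):

	static void __meminit init_reserved_page(unsigned long pfn, int nid)
	{
		/* ... zone lookup loop from the existing function elided ... */

		/*
		 * Initialize the struct page first so page->flags carries
		 * valid zone/node links before set_pageblock_migratetype()
		 * dereferences them through page_zone().
		 */
		__init_single_page(pfn_to_page(pfn), pfn, zid, nid);

		if (pageblock_aligned(pfn))
			set_pageblock_migratetype(pfn_to_page(pfn),
						  MIGRATE_MOVABLE);
	}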

Patch

diff --git a/mm/mm_init.c b/mm/mm_init.c
index 4ba5607aaf19..6dbf2df23eee 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -722,6 +722,10 @@  static void __meminit init_reserved_page(unsigned long pfn, int nid)
 		if (zone_spans_pfn(zone, pfn))
 			break;
 	}
+
+	if (pageblock_aligned(pfn))
+		set_pageblock_migratetype(pfn_to_page(pfn), MIGRATE_MOVABLE);
+
 	__init_single_page(pfn_to_page(pfn), pfn, zid, nid);
 }
 #else