x86/e820: discard high memory that can't be addressed by 32-bit systems

Message ID 20250413080858.743221-1-rppt@kernel.org (mailing list archive)
State New
Series x86/e820: discard high memory that can't be addressed by 32-bit systems

Commit Message

Mike Rapoport April 13, 2025, 8:08 a.m. UTC
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>

Dave Hansen reports the following crash on a 32-bit system with
CONFIG_HIGHMEM=y and CONFIG_X86_PAE=y:

  > 0xf75fe000 is the mem_map[] entry for the first page >4GB. It
  > obviously wasn't allocated, thus the oops.

  BUG: unable to handle page fault for address: f75fe000
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0002) - not-present page
  *pdpt = 0000000002da2001 *pde = 000000000300c067 *pte = 0000000000000000
  Oops: Oops: 0002 [#1] SMP NOPTI
  CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.15.0-rc1-00288-ge618ee89561b-dirty #311 PREEMPT(undef)
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
  EIP: __free_pages_core+0x3c/0x74
  Code: c3 d3 e6 83 ec 10 89 44 24 08 89 74 24 04 c7 04 24 c6 32 3a c2 89 55 f4 e8 a9 11 45 fe 85 f6 8b 55 f4 74 19 89 d8 31 c9 66 90 <0f> ba 30 0d c7 40 1c 00 00 00 00 41 83 c0 28 39 ce 75 ed 8b

  EAX: f75fe000 EBX: f75fe000 ECX: 00000000 EDX: 0000000a
  ESI: 00000400 EDI: 00500000 EBP: c247becc ESP: c247beb4
  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00210046
  CR0: 80050033 CR2: f75fe000 CR3: 02da6000 CR4: 000000b0
  Call Trace:
   memblock_free_pages+0x11/0x2c
   memblock_free_all+0x2ce/0x3a0
   mm_core_init+0xf5/0x320
   start_kernel+0x296/0x79c
   ? set_init_arg+0x70/0x70
   ? load_ucode_bsp+0x13c/0x1a8
   i386_start_kernel+0xad/0xb0
   startup_32_smp+0x151/0x154
  Modules linked in:
  CR2: 00000000f75fe000

The mem_map[] is allocated up to the end of ZONE_HIGHMEM which is defined
by max_pfn.

Before 6faea3422e3b ("arch, mm: streamline HIGHMEM freeing") freeing of
high memory was also clamped to the end of ZONE_HIGHMEM, but after
6faea3422e3b memblock_free_all() tries to free memory above the end of
ZONE_HIGHMEM as well, and that causes access to mem_map[] entries beyond
the end of the memory map.

Discard the memory after max_pfn from memblock on 32-bit systems so that
core MM would be aware only of actually usable memory.
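
As a rough user-space illustration of what discarding memory above a limit
means, the sketch below trims a list of ranges at a cutoff. The struct
region type and the remove_above() helper are hypothetical names made up
for this example; they only model the effect of removing [limit, end) and
are not the memblock implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a memblock memory region. */
struct region {
	uint64_t base;
	uint64_t size;
};

/*
 * Drop everything at or above 'limit' from the first 'n' regions,
 * compacting the array in place. Returns the number of regions left.
 */
static size_t remove_above(struct region *r, size_t n, uint64_t limit)
{
	size_t out = 0;

	assert(r != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		uint64_t size = r[i].size;

		if (r[i].base >= limit)
			continue;			/* entirely above the limit */
		if (size > limit - r[i].base)
			size = limit - r[i].base;	/* straddles the limit */

		r[out].base = r[i].base;
		r[out].size = size;
		out++;
	}
	return out;
}
```

A range that ends below the limit is kept as is, a range that straddles it
is shortened, and a range that starts at or above it disappears. Note that
a limit of 0 removes every range.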

Reported-by: Dave Hansen <dave.hansen@intel.com>
Tested-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
 arch/x86/kernel/e820.c | 8 ++++++++
 1 file changed, 8 insertions(+)


base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8

Comments

Nathan Chancellor April 17, 2025, 4:22 p.m. UTC | #1
Hi Mike,

On Sun, Apr 13, 2025 at 11:08:58AM +0300, Mike Rapoport wrote:
...
>  arch/x86/kernel/e820.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> index 57120f0749cc..5f673bd6c7d7 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -1300,6 +1300,14 @@ void __init e820__memblock_setup(void)
>  		memblock_add(entry->addr, entry->size);
>  	}
>  
> +	/*
> +	 * 32-bit systems are limited to 4GB of memory even with HIGHMEM and
> +	 * to even less without it.
> +	 * Discard memory after max_pfn - the actual limit detected at runtime.
> +	 */
> +	if (IS_ENABLED(CONFIG_X86_32))
> +		memblock_remove(PFN_PHYS(max_pfn), -1);
> +
>  	/* Throw away partial pages: */
>  	memblock_trim_memory(PAGE_SIZE);

Our CI noticed a boot failure after this change as commit 1e07b9fad022
("x86/e820: Discard high memory that can't be addressed by 32-bit
systems") in -tip when booting i386_defconfig with a simple buildroot
initrd.

  $ make -skj"$(nproc)" ARCH=i386 CROSS_COMPILE=i386-linux- mrproper defconfig bzImage

  $ curl -LSs https://github.com/ClangBuiltLinux/boot-utils/releases/download/20241120-044434/x86-rootfs.cpio.zst | zstd -d >rootfs.cpio

  $ qemu-system-i386 \
      -display none \
      -nodefaults \
      -M q35 \
      -d unimp,guest_errors \
      -append 'console=ttyS0 earlycon=uart8250,io,0x3f8' \
      -kernel arch/x86/boot/bzImage \
      -initrd rootfs.cpio \
      -cpu host \
      -enable-kvm \
      -m 512m \
      -smp 8 \
      -serial mon:stdio
  [    0.000000] Linux version 6.15.0-rc1-00177-g1e07b9fad022 (nathan@ax162) (i386-linux-gcc (GCC) 14.2.0, GNU ld (GNU Binutils) 2.42) #1 SMP PREEMPT_DYNAMIC Thu Apr 17 09:02:19 MST 2025
  [    0.000000] BIOS-provided physical RAM map:
  [    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
  [    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
  [    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
  [    0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000001ffdffff] usable
  [    0.000000] BIOS-e820: [mem 0x000000001ffe0000-0x000000001fffffff] reserved
  [    0.000000] BIOS-e820: [mem 0x00000000b0000000-0x00000000bfffffff] reserved
  [    0.000000] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
  [    0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
  [    0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
  [    0.000000] earlycon: uart8250 at I/O port 0x3f8 (options '')
  [    0.000000] printk: legacy bootconsole [uart8250] enabled
  [    0.000000] Notice: NX (Execute Disable) protection cannot be enabled: non-PAE kernel!
  [    0.000000] APIC: Static calls initialized
  [    0.000000] SMBIOS 2.8 present.
  [    0.000000] DMI: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
  [    0.000000] DMI: Memory slots populated: 1/1
  [    0.000000] Hypervisor detected: KVM
  [    0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
  [    0.000000] kvm-clock: using sched offset of 196444860 cycles
  [    0.000589] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
  [    0.002401] tsc: Detected 2750.000 MHz processor
  [    0.003126] last_pfn = 0x1ffe0 max_arch_pfn = 0x100000
  [    0.003728] MTRR map: 4 entries (3 fixed + 1 variable; max 19), built from 8 variable MTRRs
  [    0.004664] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT
  [    0.007149] found SMP MP-table at [mem 0x000f5480-0x000f548f]
  [    0.007802] No sub-1M memory is available for the trampoline
  [    0.008435] Failed to release memory for alloc_low_pages()
  [    0.008438] RAMDISK: [mem 0x1fa5f000-0x1ffdffff]
  [    0.009571] Kernel panic - not syncing: Cannot find place for new RAMDISK of size 5771264
  [    0.010486] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.15.0-rc1-00177-g1e07b9fad022 #1 PREEMPT(undef)
  [    0.011601] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
  [    0.012857] Call Trace:
  [    0.013135]  dump_stack_lvl+0x43/0x58
  [    0.013555]  dump_stack+0xd/0x10
  [    0.013919]  panic+0xa5/0x221
  [    0.014252]  setup_arch+0x86f/0x9f0
  [    0.014650]  ? vprintk_default+0x29/0x30
  [    0.015089]  start_kernel+0x4b/0x570
  [    0.015487]  i386_start_kernel+0x65/0x68
  [    0.015919]  startup_32_smp+0x151/0x154
  [    0.016344] ---[ end Kernel panic - not syncing: Cannot find place for new RAMDISK of size 5771264 ]---

At the parent change with the same command, the boot completes fine.

  [    0.000000] Linux version 6.15.0-rc1-00176-gd466304c4322 (nathan@ax162) (i386-linux-gcc (GCC) 14.2.0, GNU ld (GNU Binutils) 2.42) #1 SMP PREEMPT_DYNAMIC Thu Apr 17 09:00:12 MST 2025
  [    0.000000] BIOS-provided physical RAM map:
  ...
  [    0.000000] earlycon: uart8250 at I/O port 0x3f8 (options '')
  [    0.000000] printk: legacy bootconsole [uart8250] enabled
  [    0.000000] Notice: NX (Execute Disable) protection cannot be enabled: non-PAE kernel!
  [    0.000000] APIC: Static calls initialized
  [    0.000000] SMBIOS 2.8 present.
  [    0.000000] DMI: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
  [    0.000000] DMI: Memory slots populated: 1/1
  [    0.000000] Hypervisor detected: KVM
  [    0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
  [    0.000001] kvm-clock: using sched offset of 429786443 cycles
  [    0.000806] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
  [    0.003278] tsc: Detected 2750.000 MHz processor
  [    0.004730] last_pfn = 0x1ffe0 max_arch_pfn = 0x100000
  [    0.006220] MTRR map: 4 entries (3 fixed + 1 variable; max 19), built from 8 variable MTRRs
  [    0.009169] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT
  [    0.012840] found SMP MP-table at [mem 0x000f5480-0x000f548f]
  [    0.014310] RAMDISK: [mem 0x1fa5f000-0x1ffdffff]
  [    0.015141] ACPI: Early table checksum verification disabled
  ...
  [    0.046564] 511MB LOWMEM available.
  [    0.047421]   mapped low ram: 0 - 1ffe0000
  [    0.048431]   low ram: 0 - 1ffe0000
  [    0.049289] Zone ranges:
  [    0.049934]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
  [    0.051184]   Normal   [mem 0x0000000001000000-0x000000001ffdffff]
  [    0.053087] Movable zone start for each node
  [    0.054409] Early memory node ranges
  [    0.055513]   node   0: [mem 0x0000000000001000-0x000000000009efff]
  [    0.057411]   node   0: [mem 0x0000000000100000-0x000000001ffdffff]
  [    0.059176] Initmem setup node 0 [mem 0x0000000000001000-0x000000001ffdffff]
  ...

Is this an invalid configuration or virtual setup that is being tested
here or is there something else problematic with this change?

Cheers,
Nathan

Ingo Molnar April 18, 2025, 6:33 a.m. UTC | #2
* Nathan Chancellor <nathan@kernel.org> wrote:

> Hi Mike,
> 
> On Sun, Apr 13, 2025 at 11:08:58AM +0300, Mike Rapoport wrote:
> ...
> >  arch/x86/kernel/e820.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> > index 57120f0749cc..5f673bd6c7d7 100644
> > --- a/arch/x86/kernel/e820.c
> > +++ b/arch/x86/kernel/e820.c
> > @@ -1300,6 +1300,14 @@ void __init e820__memblock_setup(void)
> >  		memblock_add(entry->addr, entry->size);
> >  	}
> >  
> > +	/*
> > +	 * 32-bit systems are limited to 4GB of memory even with HIGHMEM and
> > +	 * to even less without it.
> > +	 * Discard memory after max_pfn - the actual limit detected at runtime.
> > +	 */
> > +	if (IS_ENABLED(CONFIG_X86_32))
> > +		memblock_remove(PFN_PHYS(max_pfn), -1);
> > +
> >  	/* Throw away partial pages: */
> >  	memblock_trim_memory(PAGE_SIZE);
> 
> Our CI noticed a boot failure after this change as commit 1e07b9fad022
> ("x86/e820: Discard high memory that can't be addressed by 32-bit
> systems") in -tip when booting i386_defconfig with a simple buildroot
> initrd.

I've zapped this commit from tip:x86/urgent for the time being:

  1e07b9fad022 ("x86/e820: Discard high memory that can't be addressed by 32-bit systems")

until these bugs are better understood.

Thanks,

	Ingo

Mike Rapoport April 18, 2025, 9:01 a.m. UTC | #3
On Fri, Apr 18, 2025 at 08:33:02AM +0200, Ingo Molnar wrote:
> 
> * Nathan Chancellor <nathan@kernel.org> wrote:
> 
> > Hi Mike,
> > 
> > On Sun, Apr 13, 2025 at 11:08:58AM +0300, Mike Rapoport wrote:
> > ...
> > >  arch/x86/kernel/e820.c | 8 ++++++++
> > >  1 file changed, 8 insertions(+)
> > > 
> > > diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> > > index 57120f0749cc..5f673bd6c7d7 100644
> > > --- a/arch/x86/kernel/e820.c
> > > +++ b/arch/x86/kernel/e820.c
> > > @@ -1300,6 +1300,14 @@ void __init e820__memblock_setup(void)
> > >  		memblock_add(entry->addr, entry->size);
> > >  	}
> > >  
> > > +	/*
> > > +	 * 32-bit systems are limited to 4GB of memory even with HIGHMEM and
> > > +	 * to even less without it.
> > > +	 * Discard memory after max_pfn - the actual limit detected at runtime.
> > > +	 */
> > > +	if (IS_ENABLED(CONFIG_X86_32))
> > > +		memblock_remove(PFN_PHYS(max_pfn), -1);
> > > +
> > >  	/* Throw away partial pages: */
> > >  	memblock_trim_memory(PAGE_SIZE);
> > 
> > Our CI noticed a boot failure after this change as commit 1e07b9fad022
> > ("x86/e820: Discard high memory that can't be addressed by 32-bit
> > systems") in -tip when booting i386_defconfig with a simple buildroot
> > initrd.
> 
> I've zapped this commit from tip:x86/urgent for the time being:
> 
>   1e07b9fad022 ("x86/e820: Discard high memory that can't be addressed by 32-bit systems")
> 
> until these bugs are better understood.

With X86_PAE disabled phys_addr_t is 32 bit, PFN_PHYS(MAX_NONPAE_PFN)
overflows and we get memblock_remove(0, -1) :(

Using max_pfn instead of MAX_NONPAE_PFN would work because there's a hole
under 4G and max_pfn should never overflow.
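
The wrap-around can be reproduced in user space. The sketch below assumes
4 KiB pages (PAGE_SHIFT of 12) and that MAX_NONPAE_PFN is the PFN of the
4GB boundary, 1 << 20; pfn_phys_32(), pfn_phys_64() and pfn_phys_demo()
are hypothetical helpers that only model PFN_PHYS() for a 32-bit and a
64-bit phys_addr_t. The last_pfn value is the one from Nathan's boot log.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT	12		/* assumed: 4 KiB pages */
#define MAX_NONPAE_PFN	(1U << 20)	/* assumed: PFN of the 4GB boundary */

/* Model of PFN_PHYS() when phys_addr_t is 32 bits wide (X86_PAE=n). */
static uint32_t pfn_phys_32(unsigned long pfn)
{
	return (uint32_t)pfn << PAGE_SHIFT;	/* bits above 31 are lost */
}

/* Model of PFN_PHYS() when phys_addr_t is 64 bits wide (X86_PAE=y). */
static uint64_t pfn_phys_64(unsigned long pfn)
{
	return (uint64_t)pfn << PAGE_SHIFT;
}

static void pfn_phys_demo(void)
{
	/* 2^20 << 12 == 2^32, which wraps to 0 in 32 bits. */
	assert(pfn_phys_32(MAX_NONPAE_PFN) == 0);
	assert(pfn_phys_64(MAX_NONPAE_PFN) == 0x100000000ULL);

	/* last_pfn = 0x1ffe0 from the boot log stays well below 4GB. */
	assert(pfn_phys_32(0x1ffe0) == 0x1ffe0000U);
}
```

With a start address of 0, removing everything from that address upwards
leaves no memory registered at all, which is consistent with the early
allocation failures reported in this thread.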

Another option is to skip e820 entries above 4G and not add them to
memblock in the first place, e.g.

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 57120f0749cc..2b617f36f11a 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1297,6 +1297,17 @@ void __init e820__memblock_setup(void)
 		if (entry->type != E820_TYPE_RAM)
 			continue;
 
+#ifdef CONFIG_X86_32
+		/*
+		 * Discard memory above 4GB because 32-bit systems are limited
+		 * to 4GB of memory even with HIGHMEM.
+		 */
+		if (entry->addr >= SZ_4G)
+			continue;
+		if (entry->addr + entry->size > SZ_4G)
+			entry->size = SZ_4G - entry->addr;
+#endif
+
 		memblock_add(entry->addr, entry->size);
 	}
 
 
> Thanks,
> 
> 	Ingo

Ingo Molnar April 18, 2025, 12:59 p.m. UTC | #4
* Mike Rapoport <rppt@kernel.org> wrote:

> On Fri, Apr 18, 2025 at 08:33:02AM +0200, Ingo Molnar wrote:
> > 
> > * Nathan Chancellor <nathan@kernel.org> wrote:
> > 
> > > Hi Mike,
> > > 
> > > On Sun, Apr 13, 2025 at 11:08:58AM +0300, Mike Rapoport wrote:
> > > ...
> > > >  arch/x86/kernel/e820.c | 8 ++++++++
> > > >  1 file changed, 8 insertions(+)
> > > > 
> > > > diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> > > > index 57120f0749cc..5f673bd6c7d7 100644
> > > > --- a/arch/x86/kernel/e820.c
> > > > +++ b/arch/x86/kernel/e820.c
> > > > @@ -1300,6 +1300,14 @@ void __init e820__memblock_setup(void)
> > > >  		memblock_add(entry->addr, entry->size);
> > > >  	}
> > > >  
> > > > +	/*
> > > > +	 * 32-bit systems are limited to 4GB of memory even with HIGHMEM and
> > > > +	 * to even less without it.
> > > > +	 * Discard memory after max_pfn - the actual limit detected at runtime.
> > > > +	 */
> > > > +	if (IS_ENABLED(CONFIG_X86_32))
> > > > +		memblock_remove(PFN_PHYS(max_pfn), -1);
> > > > +
> > > >  	/* Throw away partial pages: */
> > > >  	memblock_trim_memory(PAGE_SIZE);
> > > 
> > > Our CI noticed a boot failure after this change as commit 1e07b9fad022
> > > ("x86/e820: Discard high memory that can't be addressed by 32-bit
> > > systems") in -tip when booting i386_defconfig with a simple buildroot
> > > initrd.
> > 
> > I've zapped this commit from tip:x86/urgent for the time being:
> > 
> >   1e07b9fad022 ("x86/e820: Discard high memory that can't be addressed by 32-bit systems")
> > 
> > until these bugs are better understood.
> 
> With X86_PAE disabled phys_addr_t is 32 bit, PFN_PHYS(MAX_NONPAE_PFN)
> overflows and we get memblock_remove(0, -1) :(
> 
> Using max_pfn instead of MAX_NONPAE_PFN would work because there's a hole
> under 4G and max_pfn should never overflow.

So why don't we use max_pfn like your -v1 fix did IIRC?

	Ingo

Mike Rapoport April 18, 2025, 7:25 p.m. UTC | #5
On Fri, Apr 18, 2025 at 02:59:05PM +0200, Ingo Molnar wrote:
> 
> * Mike Rapoport <rppt@kernel.org> wrote:
> 
> > On Fri, Apr 18, 2025 at 08:33:02AM +0200, Ingo Molnar wrote:
> > > 
> > > * Nathan Chancellor <nathan@kernel.org> wrote:
> > > 
> > > > Hi Mike,
> > > > 
> > > > On Sun, Apr 13, 2025 at 11:08:58AM +0300, Mike Rapoport wrote:
> > > > ...
> > > > >  arch/x86/kernel/e820.c | 8 ++++++++
> > > > >  1 file changed, 8 insertions(+)
> > > > > 
> > > > > diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> > > > > index 57120f0749cc..5f673bd6c7d7 100644
> > > > > --- a/arch/x86/kernel/e820.c
> > > > > +++ b/arch/x86/kernel/e820.c
> > > > > @@ -1300,6 +1300,14 @@ void __init e820__memblock_setup(void)
> > > > >  		memblock_add(entry->addr, entry->size);
> > > > >  	}
> > > > >  
> > > > > +	/*
> > > > > +	 * 32-bit systems are limited to 4GB of memory even with HIGHMEM and
> > > > > +	 * to even less without it.
> > > > > +	 * Discard memory after max_pfn - the actual limit detected at runtime.
> > > > > +	 */
> > > > > +	if (IS_ENABLED(CONFIG_X86_32))
> > > > > +		memblock_remove(PFN_PHYS(max_pfn), -1);
> > > > > +
> > > > >  	/* Throw away partial pages: */
> > > > >  	memblock_trim_memory(PAGE_SIZE);
> > > > 
> > > > Our CI noticed a boot failure after this change as commit 1e07b9fad022
> > > > ("x86/e820: Discard high memory that can't be addressed by 32-bit
> > > > systems") in -tip when booting i386_defconfig with a simple buildroot
> > > > initrd.
> > > 
> > > I've zapped this commit from tip:x86/urgent for the time being:
> > > 
> > >   1e07b9fad022 ("x86/e820: Discard high memory that can't be addressed by 32-bit systems")
> > > 
> > > until these bugs are better understood.
> > 
> > With X86_PAE disabled phys_addr_t is 32 bit, PFN_PHYS(MAX_NONPAE_PFN)
> > overflows and we get memblock_remove(0, -1) :(
> > 
> > Using max_pfn instead of MAX_NONPAE_PFN would work because there's a hole
> > under 4G and max_pfn should never overflow.
> 
> So why don't we use max_pfn like your -v1 fix did IIRC?

Dave didn't like max_pfn. I don't feel strongly about using max_pfn or
skipping e820 ranges above 4G and not adding them to memblock.
 
> 	Ingo

Dave Hansen April 18, 2025, 7:29 p.m. UTC | #6
On 4/18/25 12:25, Mike Rapoport wrote:
>> So why don't we use max_pfn like your -v1 fix did IIRC?
> Dave didn't like max_pfn. I don't feel strongly about using max_pfn or
> skipping e820 ranges above 4G and not adding them to memblock.

I feel more strongly about fixing the bug than avoiding max_pfn. ;)

Going back to v1 is fine with me.

Guenter Roeck April 18, 2025, 7:49 p.m. UTC | #7
Hi,

On Sun, Apr 13, 2025 at 11:08:58AM +0300, Mike Rapoport wrote:
> From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> 
> Dave Hansen reports the following crash on a 32-bit system with
> CONFIG_HIGHMEM=y and CONFIG_X86_PAE=y:
> 
>   > 0xf75fe000 is the mem_map[] entry for the first page >4GB. It
>   > obviously wasn't allocated, thus the oops.
> 
>   BUG: unable to handle page fault for address: f75fe000
>   #PF: supervisor write access in kernel mode
>   #PF: error_code(0x0002) - not-present page
>   *pdpt = 0000000002da2001 *pde = 000000000300c067 *pte = 0000000000000000
>   Oops: Oops: 0002 [#1] SMP NOPTI
>   CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.15.0-rc1-00288-ge618ee89561b-dirty #311 PREEMPT(undef)
>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
>   EIP: __free_pages_core+0x3c/0x74
>   Code: c3 d3 e6 83 ec 10 89 44 24 08 89 74 24 04 c7 04 24 c6 32 3a c2 89 55 f4 e8 a9 11 45 fe 85 f6 8b 55 f4 74 19 89 d8 31 c9 66 90 <0f> ba 30 0d c7 40 1c 00 00 00 00 41 83 c0 28 39 ce 75 ed 8b
> 
>   EAX: f75fe000 EBX: f75fe000 ECX: 00000000 EDX: 0000000a
>   ESI: 00000400 EDI: 00500000 EBP: c247becc ESP: c247beb4
>   DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00210046
>   CR0: 80050033 CR2: f75fe000 CR3: 02da6000 CR4: 000000b0
>   Call Trace:
>    memblock_free_pages+0x11/0x2c
>    memblock_free_all+0x2ce/0x3a0
>    mm_core_init+0xf5/0x320
>    start_kernel+0x296/0x79c
>    ? set_init_arg+0x70/0x70
>    ? load_ucode_bsp+0x13c/0x1a8
>    i386_start_kernel+0xad/0xb0
>    startup_32_smp+0x151/0x154
>   Modules linked in:
>   CR2: 00000000f75fe000
> 
> The mem_map[] is allocated up to the end of ZONE_HIGHMEM which is defined
> by max_pfn.
> 
> Before 6faea3422e3b ("arch, mm: streamline HIGHMEM freeing") freeing of
> high memory was also clamped to the end of ZONE_HIGHMEM but after
> 6faea3422e3b memblock_free_all() tries to free memory above the end of
> ZONE_HIGHMEM as well and that causes access to mem_map[] entries beyond
> the end of the memory map.
> 
> Discard the memory after max_pfn from memblock on 32-bit systems so that
> core MM would be aware only of actually usable memory.
> 
> Reported-by: Dave Hansen <dave.hansen@intel.com>
> Tested-by: Arnd Bergmann <arnd@kernel.org>
> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

With this patch in pending-fixes (v6.15-rc2-434-g93ced5296772),
all my i386 test runs crash.

[    0.020893] Kernel panic - not syncing: ioapic_setup_resources: Failed to allocate 0x0000002b bytes
[    0.021248] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.15.0-rc2-00434-g93ced5296772 #1 PREEMPT(undef)
[    0.021373] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[    0.021549] Call Trace:
[    0.021711]  dump_stack_lvl+0x20/0x104
[    0.022023]  dump_stack+0x12/0x18
[    0.022064]  panic+0x2c1/0x2d8
[    0.022116]  ? vprintk_default+0x29/0x30
[    0.022163]  __memblock_alloc_or_panic+0x57/0x58
[    0.022221]  io_apic_init_mappings+0x2e/0x1a8
[    0.022284]  setup_arch+0x909/0xdac
[    0.022338]  ? vprintk_default+0x29/0x30
[    0.022410]  start_kernel+0x63/0x760
[    0.022457]  ? load_ucode_bsp+0x12c/0x198
[    0.022507]  i386_start_kernel+0x74/0x74
[    0.022548]  startup_32_smp+0x151/0x154
[    0.023089] ---[ end Kernel panic - not syncing: ioapic_setup_resources: Failed to allocate 0x0000002b bytes ]---

Reverting this patch fixes the problem. Bisect log is attached for reference.

Guenter

---
# bad: [93ced5296772b7b704f48e4bad9fcfdf0633c780] Merge branch 'for-linux-next-fixes' of https://gitlab.freedesktop.org/drm/misc/kernel.git
# good: [8ffd015db85fea3e15a77027fda6c02ced4d2444] Linux 6.15-rc2
git bisect start 'HEAD' 'v6.15-rc2'
# good: [5d6f363fc974e32dd9930fecaae63958b68a1df4] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap.git
git bisect good 5d6f363fc974e32dd9930fecaae63958b68a1df4
# good: [1790b4a242fe119fead08fccc5bf923423c7449a] Merge branch 'dma-mapping-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux.git
git bisect good 1790b4a242fe119fead08fccc5bf923423c7449a
# good: [5d37ee8a1d6455968ea3134d78223090d487c7f4] Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git
git bisect good 5d37ee8a1d6455968ea3134d78223090d487c7f4
# good: [9d4de5ae5208548eb9c6a490ac454601f4fbf00b] Merge branch 'i2c/i2c-host-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux.git
git bisect good 9d4de5ae5208548eb9c6a490ac454601f4fbf00b
# bad: [f737ab93945fb8f0213e1cccc39d028eb5d880e0] Merge branch into tip/master: 'x86/urgent'
git bisect bad f737ab93945fb8f0213e1cccc39d028eb5d880e0
# good: [2e7a2843d0de7677b7bb908ca006dc435e52c416] Merge branch into tip/master: 'irq/urgent'
git bisect good 2e7a2843d0de7677b7bb908ca006dc435e52c416
# good: [d466304c4322ad391797437cd84cca7ce1660de0] x86/cpu: Add CPU model number for Bartlett Lake CPUs with Raptor Cove cores
git bisect good d466304c4322ad391797437cd84cca7ce1660de0
# good: [39893b1e4ad7c4380abe4cfddaa58b34c4363bf4] Merge branch into tip/master: 'timers/urgent'
git bisect good 39893b1e4ad7c4380abe4cfddaa58b34c4363bf4
# bad: [1e07b9fad022e0e02215150ca1e20912e78e8ec1] x86/e820: Discard high memory that can't be addressed by 32-bit systems
git bisect bad 1e07b9fad022e0e02215150ca1e20912e78e8ec1
# first bad commit: [1e07b9fad022e0e02215150ca1e20912e78e8ec1] x86/e820: Discard high memory that can't be addressed by 32-bit systems

Patch

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 57120f0749cc..5f673bd6c7d7 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1300,6 +1300,14 @@  void __init e820__memblock_setup(void)
 		memblock_add(entry->addr, entry->size);
 	}
 
+	/*
+	 * 32-bit systems are limited to 4GB of memory even with HIGHMEM and
+	 * to even less without it.
+	 * Discard memory after max_pfn - the actual limit detected at runtime.
+	 */
+	if (IS_ENABLED(CONFIG_X86_32))
+		memblock_remove(PFN_PHYS(max_pfn), -1);
+
 	/* Throw away partial pages: */
 	memblock_trim_memory(PAGE_SIZE);