memblock: exclude NOMAP regions from kmemleak

Message ID 20211013054756.12177-1-rppt@kernel.org (mailing list archive)
State New

Commit Message

Mike Rapoport Oct. 13, 2021, 5:47 a.m. UTC
From: Mike Rapoport <rppt@linux.ibm.com>

Vladimir Zapolskiy reports:

commit a7259df76702 ("memblock: make memblock_find_in_range method private")
causes a kernel panic when kmemleak runs on OF platforms with NOMAP
regions:

  Unable to handle kernel paging request at virtual address fff000021e00000
  [...]
    scan_block+0x64/0x170
    scan_gray_list+0xe8/0x17c
    kmemleak_scan+0x270/0x514
    kmemleak_write+0x34c/0x4ac

Indeed, NOMAP regions don't have linear map entries so an attempt to scan
these areas would fault.

Prevent such faults by excluding NOMAP regions from kmemleak.

Link: https://lore.kernel.org/all/8ade5174-b143-d621-8c8e-dc6a1898c6fb@linaro.org
Fixes: a7259df76702 ("memblock: make memblock_find_in_range method private")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Tested-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
---
 mm/memblock.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)


base-commit: 64570fbc14f8d7cb3fe3995f20e26bc25ce4b2cc

Comments

Catalin Marinas Oct. 13, 2021, 7:45 a.m. UTC | #1
On Wed, Oct 13, 2021 at 08:47:56AM +0300, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> Vladimir Zapolskiy reports:
> 
> commit a7259df76702 ("memblock: make memblock_find_in_range method private")
> invokes a kernel panic while running kmemleak on OF platforms with nomaped
> regions:
> 
>   Unable to handle kernel paging request at virtual address fff000021e00000
>   [...]
>     scan_block+0x64/0x170
>     scan_gray_list+0xe8/0x17c
>     kmemleak_scan+0x270/0x514
>     kmemleak_write+0x34c/0x4ac
> 
> Indeed, NOMAP regions don't have linear map entries so an attempt to scan
> these areas would fault.
> 
> Prevent such faults by excluding NOMAP regions from kmemleak.
> 
> Link: https://lore.kernel.org/all/8ade5174-b143-d621-8c8e-dc6a1898c6fb@linaro.org
> Fixes: a7259df76702 ("memblock: make memblock_find_in_range method private")
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> Tested-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Mike Rapoport Oct. 13, 2021, 11:35 a.m. UTC | #2
On Wed, Oct 13, 2021 at 08:45:40AM +0100, Catalin Marinas wrote:
> On Wed, Oct 13, 2021 at 08:47:56AM +0300, Mike Rapoport wrote:
> > From: Mike Rapoport <rppt@linux.ibm.com>
> > 
> > Vladimir Zapolskiy reports:
> > 
> > commit a7259df76702 ("memblock: make memblock_find_in_range method private")
> > invokes a kernel panic while running kmemleak on OF platforms with nomaped
> > regions:
> > 
> >   Unable to handle kernel paging request at virtual address fff000021e00000
> >   [...]
> >     scan_block+0x64/0x170
> >     scan_gray_list+0xe8/0x17c
> >     kmemleak_scan+0x270/0x514
> >     kmemleak_write+0x34c/0x4ac
> > 
> > Indeed, NOMAP regions don't have linear map entries so an attempt to scan
> > these areas would fault.
> > 
> > Prevent such faults by excluding NOMAP regions from kmemleak.
> > 
> > Link: https://lore.kernel.org/all/8ade5174-b143-d621-8c8e-dc6a1898c6fb@linaro.org
> > Fixes: a7259df76702 ("memblock: make memblock_find_in_range method private")
> > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > Tested-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
> 
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>

Thanks!

I'm going to take it via memblock tree if that's fine with everybody.
Qian Cai Oct. 19, 2021, 3:55 a.m. UTC | #3
On 10/13/2021 1:47 AM, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> Vladimir Zapolskiy reports:
> 
> commit a7259df76702 ("memblock: make memblock_find_in_range method private")
> invokes a kernel panic while running kmemleak on OF platforms with nomaped
> regions:
> 
>   Unable to handle kernel paging request at virtual address fff000021e00000
>   [...]
>     scan_block+0x64/0x170
>     scan_gray_list+0xe8/0x17c
>     kmemleak_scan+0x270/0x514
>     kmemleak_write+0x34c/0x4ac
> 
> Indeed, NOMAP regions don't have linear map entries so an attempt to scan
> these areas would fault.
> 
> Prevent such faults by excluding NOMAP regions from kmemleak.
> 
> Link: https://lore.kernel.org/all/8ade5174-b143-d621-8c8e-dc6a1898c6fb@linaro.org
> Fixes: a7259df76702 ("memblock: make memblock_find_in_range method private")
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> Tested-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>

Mike, reverting this commit on top of today's linux-next fixed an early boot hang
on an arm64 server with kmemleak enabled. Even with "earlycon", it only printed these
lines:

EFI stub: Booting Linux Kernel...
EFI stub: EFI_RNG_PROTOCOL unavailable
EFI stub: ERROR: FIRMWARE BUG: kernel image not aligned on 128k boundary
EFI stub: ERROR: FIRMWARE BUG: Image BSS overlaps adjacent EFI memory region
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services...

I can help confirm whether it hangs somewhere along this early boot path, if needed:

start_kernel()
  setup_arch()
    paging_init()
      map_mem()
        memblock_mark_nomap(


> ---
>  mm/memblock.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 184dcd2e5d99..5c3503c98b2f 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -936,7 +936,12 @@ int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size)
>   */
>  int __init_memblock memblock_mark_nomap(phys_addr_t base, phys_addr_t size)
>  {
> -	return memblock_setclr_flag(base, size, 1, MEMBLOCK_NOMAP);
> +	int ret = memblock_setclr_flag(base, size, 1, MEMBLOCK_NOMAP);
> +
> +	if (!ret)
> +		kmemleak_free_part_phys(base, size);
> +
> +	return ret;
>  }
>  
>  /**
> 
> base-commit: 64570fbc14f8d7cb3fe3995f20e26bc25ce4b2cc
>
Anshuman Khandual Oct. 19, 2021, 4:21 a.m. UTC | #4
On 10/13/21 11:17 AM, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> Vladimir Zapolskiy reports:
> 
> commit a7259df76702 ("memblock: make memblock_find_in_range method private")
> invokes a kernel panic while running kmemleak on OF platforms with nomaped
> regions:
> 
>   Unable to handle kernel paging request at virtual address fff000021e00000
>   [...]
>     scan_block+0x64/0x170
>     scan_gray_list+0xe8/0x17c
>     kmemleak_scan+0x270/0x514
>     kmemleak_write+0x34c/0x4ac
> 
> Indeed, NOMAP regions don't have linear map entries so an attempt to scan
> these areas would fault.
> 
> Prevent such faults by excluding NOMAP regions from kmemleak.
> 
> Link: https://lore.kernel.org/all/8ade5174-b143-d621-8c8e-dc6a1898c6fb@linaro.org
> Fixes: a7259df76702 ("memblock: make memblock_find_in_range method private")
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> Tested-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
> ---
>  mm/memblock.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 184dcd2e5d99..5c3503c98b2f 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -936,7 +936,12 @@ int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size)
>   */
>  int __init_memblock memblock_mark_nomap(phys_addr_t base, phys_addr_t size)
>  {
> -	return memblock_setclr_flag(base, size, 1, MEMBLOCK_NOMAP);
> +	int ret = memblock_setclr_flag(base, size, 1, MEMBLOCK_NOMAP);
> +
> +	if (!ret)
> +		kmemleak_free_part_phys(base, size);
> +
> +	return ret;
>  }
>  
>  /**
> 
> base-commit: 64570fbc14f8d7cb3fe3995f20e26bc25ce4b2cc
> 

Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>

A small nit though.

Just wondering: shouldn't the comment for memblock_mark_nomap() be
updated (or a comment added inside the function) to explain why
kmemleak_free_part_phys() is called, emphasizing that a scan of such
memory ranges would fail because they lack a linear mapping?
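
To make the suggestion above concrete, here is a minimal userspace sketch of where such a comment could sit. It is not the kernel code: every `model_*` name is hypothetical, the flag setter is a stub that always succeeds, and the untrack helper drops a whole overlapping range, whereas the real kmemleak_free_part_phys() trims only the overlapping part.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_RANGES 8

/* Hypothetical stand-in for a range that the leak scanner would read. */
struct model_range {
	uint64_t base;
	uint64_t size;
	bool tracked;
};

static struct model_range model_ranges[MODEL_MAX_RANGES];
static size_t model_nr_ranges;

/* Register a range with the toy scanner; returns 0, or -1 if the table is full. */
static int model_track(uint64_t base, uint64_t size)
{
	if (model_nr_ranges == MODEL_MAX_RANGES)
		return -1;
	model_ranges[model_nr_ranges++] =
		(struct model_range){ .base = base, .size = size, .tracked = true };
	return 0;
}

/*
 * Toy stand-in for kmemleak_free_part_phys(). Simplification: any tracked
 * range overlapping [base, base + size) is dropped whole.
 */
static void model_untrack(uint64_t base, uint64_t size)
{
	for (size_t i = 0; i < model_nr_ranges; i++) {
		struct model_range *r = &model_ranges[i];

		if (r->base < base + size && base < r->base + r->size)
			r->tracked = false;
	}
}

/* Toy stand-in for memblock_setclr_flag(); it always succeeds here. */
static int model_set_nomap_flag(uint64_t base, uint64_t size)
{
	(void)base;
	(void)size;
	return 0;
}

static int model_mark_nomap(uint64_t base, uint64_t size)
{
	int ret = model_set_nomap_flag(base, size);

	/*
	 * A NOMAP range has no linear map entry, so a later scan of it
	 * would fault. Stop tracking it once the flag has been set.
	 */
	if (!ret)
		model_untrack(base, size);

	return ret;
}

/* Number of ranges a scan would still touch. */
static size_t model_scan_count(void)
{
	size_t n = 0;

	for (size_t i = 0; i < model_nr_ranges; i++)
		n += model_ranges[i].tracked;
	return n;
}
```

Tracking two ranges and then calling model_mark_nomap() on one of them leaves a single range for the toy scanner, which is the behaviour the comment is meant to document.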
Mike Rapoport Oct. 19, 2021, 5:45 a.m. UTC | #5
Hi Qian,

On Mon, Oct 18, 2021 at 11:55:40PM -0400, Qian Cai wrote:
> 
> 
> On 10/13/2021 1:47 AM, Mike Rapoport wrote:
> > From: Mike Rapoport <rppt@linux.ibm.com>
> > 
> > Vladimir Zapolskiy reports:
> > 
> > commit a7259df76702 ("memblock: make memblock_find_in_range method private")
> > invokes a kernel panic while running kmemleak on OF platforms with nomaped
> > regions:
> > 
> >   Unable to handle kernel paging request at virtual address fff000021e00000
> >   [...]
> >     scan_block+0x64/0x170
> >     scan_gray_list+0xe8/0x17c
> >     kmemleak_scan+0x270/0x514
> >     kmemleak_write+0x34c/0x4ac
> > 
> > Indeed, NOMAP regions don't have linear map entries so an attempt to scan
> > these areas would fault.
> > 
> > Prevent such faults by excluding NOMAP regions from kmemleak.
> > 
> > Link: https://lore.kernel.org/all/8ade5174-b143-d621-8c8e-dc6a1898c6fb@linaro.org
> > Fixes: a7259df76702 ("memblock: make memblock_find_in_range method private")
> > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > Tested-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
> 
> Mike, reverting this commit on the top of today's linux-next fixed the early booting hang
> on an arm64 server with kmemleak. Even with "earlycon", it could only print out those
> lines.
> 
> EFI stub: Booting Linux Kernel...
> EFI stub: EFI_RNG_PROTOCOL unavailable
> EFI stub: ERROR: FIRMWARE BUG: kernel image not aligned on 128k boundary
> EFI stub: ERROR: FIRMWARE BUG: Image BSS overlaps adjacent EFI memory region
> EFI stub: Using DTB from configuration table
> EFI stub: Exiting boot services…
> 
> I could help to confirm if it hangs right in the early boot somewhere if needed.

The kernel config and a log of working kernel would help to start with.
 
> start_kernel()
>   setup_arch()
>     paging_init()
>       map_mem()
>         memblock_mark_nomap(

So we have kmemleak_free_part_phys() here.

Catalin, any ideas?

> 
> > ---
> >  mm/memblock.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/memblock.c b/mm/memblock.c
> > index 184dcd2e5d99..5c3503c98b2f 100644
> > --- a/mm/memblock.c
> > +++ b/mm/memblock.c
> > @@ -936,7 +936,12 @@ int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size)
> >   */
> >  int __init_memblock memblock_mark_nomap(phys_addr_t base, phys_addr_t size)
> >  {
> > -	return memblock_setclr_flag(base, size, 1, MEMBLOCK_NOMAP);
> > +	int ret = memblock_setclr_flag(base, size, 1, MEMBLOCK_NOMAP);
> > +
> > +	if (!ret)
> > +		kmemleak_free_part_phys(base, size);
> > +
> > +	return ret;
> >  }
> >  
> >  /**
> > 
> > base-commit: 64570fbc14f8d7cb3fe3995f20e26bc25ce4b2cc
> >
Catalin Marinas Oct. 19, 2021, 11:37 a.m. UTC | #6
On Tue, Oct 19, 2021 at 08:45:49AM +0300, Mike Rapoport wrote:
> On Mon, Oct 18, 2021 at 11:55:40PM -0400, Qian Cai wrote:
> > On 10/13/2021 1:47 AM, Mike Rapoport wrote:
> > > From: Mike Rapoport <rppt@linux.ibm.com>
> > > 
> > > Vladimir Zapolskiy reports:
> > > 
> > > commit a7259df76702 ("memblock: make memblock_find_in_range method private")
> > > invokes a kernel panic while running kmemleak on OF platforms with nomaped
> > > regions:
> > > 
> > >   Unable to handle kernel paging request at virtual address fff000021e00000
> > >   [...]
> > >     scan_block+0x64/0x170
> > >     scan_gray_list+0xe8/0x17c
> > >     kmemleak_scan+0x270/0x514
> > >     kmemleak_write+0x34c/0x4ac
> > > 
> > > Indeed, NOMAP regions don't have linear map entries so an attempt to scan
> > > these areas would fault.
> > > 
> > > Prevent such faults by excluding NOMAP regions from kmemleak.
> > > 
> > > Link: https://lore.kernel.org/all/8ade5174-b143-d621-8c8e-dc6a1898c6fb@linaro.org
> > > Fixes: a7259df76702 ("memblock: make memblock_find_in_range method private")
> > > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > > Tested-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
> > 
> > Mike, reverting this commit on the top of today's linux-next fixed the early booting hang
> > on an arm64 server with kmemleak. Even with "earlycon", it could only print out those
> > lines.
> > 
> > EFI stub: Booting Linux Kernel...
> > EFI stub: EFI_RNG_PROTOCOL unavailable
> > EFI stub: ERROR: FIRMWARE BUG: kernel image not aligned on 128k boundary
> > EFI stub: ERROR: FIRMWARE BUG: Image BSS overlaps adjacent EFI memory region
> > EFI stub: Using DTB from configuration table
> > EFI stub: Exiting boot services…
> > 
> > I could help to confirm if it hangs right in the early boot somewhere if needed.
> 
> The kernel config and a log of working kernel would help to start with.

I don't think there's much in the log other than the EFI stub above.

> > start_kernel()
> >   setup_arch()
> >     paging_init()
> >       map_mem()
> >         memblock_mark_nomap(

Is this an actual trace? It would be good to know where exactly it got
stuck.

> So we have kmemleak_free_part_phys() here.

I wonder whether the memblock_mark_nomap() here is too early for
kmemleak. We don't have the linear map created, though it shouldn't be
an issue as the kernel sections are mapped. Also I think
delete_object_part() in kmemleak.c would bail out early as there
shouldn't be any prior memblock_alloc for this range.
Qian Cai Oct. 19, 2021, 3:06 p.m. UTC | #7
On 10/19/2021 7:37 AM, Catalin Marinas wrote:
>>> I could help to confirm if it hangs right in the early boot somewhere if needed.
>>
>> The kernel config and a log of working kernel would help to start with.

http://lsbug.org/tmp/

> 
> I don't think there's much in the log other than the EFI stub above.
> 
>>> start_kernel()
>>>   setup_arch()
>>>     paging_init()
>>>       map_mem()
>>>         memblock_mark_nomap(
> 
> Is this actual trace? It would be good to know where exactly it got
> stuck.

No, I have not confirmed anything yet. It is going to take a while to
figure out the exact location of the hang, since even the early console
was not initialized yet. Any suggestions on how to debug this case?

> 
>> So we have kmemleak_free_part_phys() here.
> 
> I wonder whether the memblock_mark_nomap() here is too early for
> kmemleak. We don't have the linear map created, though it shouldn't be
> an issue as the kernel sections are mapped. Also I think
> delete_object_part() in kmemleak.c would bail out early as there
> shouldn't be any prior memblock_alloc for this range.
>
Catalin Marinas Oct. 19, 2021, 3:53 p.m. UTC | #8
On Tue, Oct 19, 2021 at 11:06:11AM -0400, Qian Cai wrote:
> On 10/19/2021 7:37 AM, Catalin Marinas wrote:
> >>> I could help to confirm if it hangs right in the early boot somewhere if needed.
> >>
> >> The kernel config and a log of working kernel would help to start with.
> 
> http://lsbug.org/tmp/

Thanks. I guess the log here is with Mike's patch reverted.

> > I don't think there's much in the log other than the EFI stub above.
> > 
> >>> start_kernel()
> >>>   setup_arch()
> >>>     paging_init()
> >>>       map_mem()
> >>>         memblock_mark_nomap(
> > 
> > Is this actual trace? It would be good to know where exactly it got
> > stuck.
> 
> No, I did not confirm anything yet. There is going to take a while to
> figure out the exactly location that hang since even the early console
> was not initialized yet. Any suggestion on how to debug in this case?

Try "earlycon=pl011,mmio32,0x12600000" on the kernel command line
and hopefully we get some early log.
Qian Cai Oct. 19, 2021, 5:59 p.m. UTC | #9
On 10/19/2021 11:53 AM, Catalin Marinas wrote:
> Thanks. I guess the log here is with the Mike's patch reverted.

Yes.

> Try "earlycon=pl011,mmio32,0x12600000" on the kernel command line
> and hopefully we get some early log.

Thanks for the suggestion, Catalin. I did not realize that a
manually specified "earlycon" starts earlier than a bare "earlycon",
since it does not defer to ACPI to populate the parameters. Anyway,

[	0.000000][	T0] Booting Linux on physical CPU 0x0000000000 [0x503f0002]
[	0.000000][	T0] Linux version 5.15.0-rc6-next-20211019+ (root@admin5) (gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #104 SMP Tue Oct 19 17:36:17 UTC 2021
[	0.000000][	T0] earlycon: pl11 at MMIO32 0x0000000012600000 (options '')
[	0.000000][	T0] printk: bootconsole [pl11] enabled
[	0.000000][	T0] efi: Getting UEFI parameters from /chosen in DT:
[	0.000000][	T0] efi:   System Table     	: 0x0000009ff7de0018
[	0.000000][	T0] efi:   MemMap Address   	: 0x0000009fe6dae018
[	0.000000][	T0] efi:   MemMap Size      	: 0x0000000000000600
[	0.000000][	T0] efi:   MemMap Desc. Size	: 0x0000000000000030
[	0.000000][	T0] efi:   MemMap Desc. Version : 0x0000000000000001
[	0.000000][	T0] efi: EFI v2.70 by American Megatrends
[	0.000000][	T0] efi: ACPI 2.0=0x9ff5b40000 SMBIOS 3.0=0x9ff686fd98 ESRT=0x9ff1d18298 MEMRESERVE=0x9fe6dacd98  
[	0.000000][	T0] efi: Processing EFI memory map:
[	0.000000][	T0] efi:   0x000090000000-0x000091ffffff [Conventional|   |  |  |  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000][	T0] efi:   0x000092000000-0x0000928fffff [Runtime Data|RUN|  |  |  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000][	T0] ------------[ cut here ]------------
[	0.000000][	T0] kernel BUG at mm/kmemleak.c:1140!
[	0.000000][	T0] Internal error: Oops - BUG: 0 [#1] SMP
[	0.000000][	T0] Modules linked in:
[	0.000000][	T0] CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.0-rc6-next-20211019+ #104
[	0.000000][	T0] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[	0.000000][	T0] pc : kmemleak_free_part_phys+0x64/0x8c
[	0.000000][	T0] lr : kmemleak_free_part_phys+0x38/0x8c
[	0.000000][	T0] sp : ffff800011eafbc0
[	0.000000][	T0] x29: ffff800011eafbc0 x28: 1fffff7fffb41c0d x27: fffffbfffda0e068
[	0.000000][	T0] x26: 0000000092000000 x25: 1ffff000023d5f94 x24: ffff800011ed84d0
[	0.000000][	T0] x23: ffff800011ed84c0 x22: ffff800011ed83d8 x21: 0000000000900000
[	0.000000][	T0] x20: ffff800011782000 x19: 0000000092000000 x18: ffff800011ee0730
[	0.000000][	T0] x17: 0000000000000000 x16: 0000000000000000 x15: 1ffff0000233252c
[	0.000000][	T0] x14: ffff800019a905a0 x13: 0000000000000001 x12: ffff7000023d5ed7
[	0.000000][	T0] x11: 1ffff000023d5ed6 x10: ffff7000023d5ed6 x9 : dfff800000000000
[	0.000000][	T0] x8 : ffff800011eaf6b7 x7 : 0000000000000001 x6 : ffff800011eaf6b0
[	0.000000][	T0] x5 : 00008ffffdc2a12a x4 : ffff7000023d5ed7 x3 : 1ffff000023dbf99
[	0.000000][	T0] x2 : 1ffff000022f0463 x1 : 0000000000000000 x0 : ffffffffffffffff
[	0.000000][	T0] Call trace:
[	0.000000][	T0]  kmemleak_free_part_phys+0x64/0x8c
[	0.000000][	T0]  memblock_mark_nomap+0x5c/0x78
[	0.000000][	T0]  reserve_regions+0x294/0x33c
[	0.000000][	T0]  efi_init+0x2d0/0x490
[	0.000000][	T0]  setup_arch+0x80/0x138
[	0.000000][	T0]  start_kernel+0xa0/0x3ec
[	0.000000][	T0]  __primary_switched+0xc0/0xc8
[	0.000000][	T0] Code: 34000041 97d526e7 f9418e80 36000040 (d4210000)  
[	0.000000][	T0] random: get_random_bytes called from print_oops_end_marker+0x34/0x80 with crng_init=0
[	0.000000][	T0] ---[ end trace 0000000000000000 ]---
[	0.000000][	T0] Kernel panic - not syncing: Oops - BUG: Fatal exception
[	0.000000][	T0] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception ]---

I could not work out where this BUG() was triggered, and nothing obvious
turned up when I checked the DEBUG_VIRTUAL code, but it did point to the
kmemleak_free_part() line. I verified that phys == 0x92000000,

void __ref kmemleak_free_part_phys(phys_addr_t phys, size_t size)
{
	if (!IS_ENABLED(CONFIG_HIGHMEM) || PHYS_PFN(phys) < max_low_pfn)
		kmemleak_free_part(__va(phys), size);
}

As you can see, the efi=debug output above is truncated. On a working
boot the full map is:

[	0.000000] efi: Processing EFI memory map:
[	0.000000] efi:   0x000010540000-0x00001054ffff [Memory Mapped I/O  |RUN|  |  |  |  |  |  |   |  |  |  |  ]
[	0.000000] efi:   0x000090000000-0x00009007ffff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x000090080000-0x000091ebffff [Loader Data    	|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x000091ec0000-0x000091ffffff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x000092000000-0x0000928fffff [Runtime Data   	|RUN|  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x000092900000-0x0000fffb7fff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x0000fffb8000-0x0000fffbffff [Boot Data      	|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x0000fffc0000-0x0000ffffffff [Runtime Data   	|RUN|  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x000880000000-0x00088ae4afff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x00088ae4b000-0x00088fffffff [Loader Data    	|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x000890000000-0x000fffffffff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x008800000000-0x009f81089fff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x009f8108a000-0x009f82dabfff [Loader Data    	|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x009f82dac000-0x009fe6dabfff [Loader Code    	|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x009fe6dac000-0x009fe6dacfff [Loader Data    	|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x009fe6dad000-0x009fe6dadfff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x009fe6dae000-0x009fe6db2fff [Loader Data    	|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x009fe6db3000-0x009fe6f7bfff [Loader Code    	|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x009fe6f7c000-0x009ff287cfff [Boot Data      	|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x009ff287d000-0x009ff3293fff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x009ff3294000-0x009ff5af0fff [Boot Code      	|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x009ff5af1000-0x009ff5b2ffff [Reserved       	|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x009ff5b30000-0x009ff5b4ffff [ACPI Reclaim Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x009ff5b50000-0x009ff5baffff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x009ff5bb0000-0x009ff5bbffff [ACPI Memory NVS	|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x009ff5bc0000-0x009ff7deffff [Runtime Data   	|RUN|  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x009ff7df0000-0x009ff7e5ffff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x009ff7e60000-0x009ff7ffffff [Runtime Code   	|RUN|  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x009ff8000000-0x009ff801efff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x009ff801f000-0x009ff801ffff [Boot Data      	|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x009ff8020000-0x009fff9fffff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x009fffa00000-0x009fffbfffff [Loader Data    	|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x009fffc00000-0x009fffdbffff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x009fffdc0000-0x009fffdcffff [Loader Data    	|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x009fffdd0000-0x009fffdd4fff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
[	0.000000] efi:   0x009fffdd5000-0x009fffffffff [Boot Data      	|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
Mike Rapoport Oct. 19, 2021, 6:33 p.m. UTC | #10
On Tue, Oct 19, 2021 at 01:59:22PM -0400, Qian Cai wrote:
> 
> 
> On 10/19/2021 11:53 AM, Catalin Marinas wrote:
> > Thanks. I guess the log here is with the Mike's patch reverted.
> 
> Yes.
> 
> > Try "earlycon=pl011,mmio32,0x12600000" on the kernel command line
> > and hopefully we get some early log.
> 
> Thanks for the suggestion, Catalin. I did not realize that a
> manually-provided "earlycon" started earlier than just "earlycon"
> and not defer to ACPI to populate parameters. Anyway,
> 
> [	0.000000][	T0] Booting Linux on physical CPU 0x0000000000 [0x503f0002]
> [	0.000000][	T0] Linux version 5.15.0-rc6-next-20211019+ (root@admin5) (gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #104 SMP Tue Oct 19 17:36:17 UTC 2021
> [	0.000000][	T0] earlycon: pl11 at MMIO32 0x0000000012600000 (options '')
> [	0.000000][	T0] printk: bootconsole [pl11] enabled
> [	0.000000][	T0] efi: Getting UEFI parameters from /chosen in DT:
> [	0.000000][	T0] efi:   System Table     	: 0x0000009ff7de0018
> [	0.000000][	T0] efi:   MemMap Address   	: 0x0000009fe6dae018
> [	0.000000][	T0] efi:   MemMap Size      	: 0x0000000000000600
> [	0.000000][	T0] efi:   MemMap Desc. Size	: 0x0000000000000030
> [	0.000000][	T0] efi:   MemMap Desc. Version : 0x0000000000000001
> [	0.000000][	T0] efi: EFI v2.70 by American Megatrends
> [	0.000000][	T0] efi: ACPI 2.0=0x9ff5b40000 SMBIOS 3.0=0x9ff686fd98 ESRT=0x9ff1d18298 MEMRESERVE=0x9fe6dacd98  
> [	0.000000][	T0] efi: Processing EFI memory map:
> [	0.000000][	T0] efi:   0x000090000000-0x000091ffffff [Conventional|   |  |  |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000][	T0] efi:   0x000092000000-0x0000928fffff [Runtime Data|RUN|  |  |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000][	T0] ------------[ cut here ]------------
> [	0.000000][	T0] kernel BUG at mm/kmemleak.c:1140!
> [	0.000000][	T0] Internal error: Oops - BUG: 0 [#1] SMP
> [	0.000000][	T0] Modules linked in:
> [	0.000000][	T0] CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.0-rc6-next-20211019+ #104
> [	0.000000][	T0] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [	0.000000][	T0] pc : kmemleak_free_part_phys+0x64/0x8c
> [	0.000000][	T0] lr : kmemleak_free_part_phys+0x38/0x8c
> [	0.000000][	T0] sp : ffff800011eafbc0
> [	0.000000][	T0] x29: ffff800011eafbc0 x28: 1fffff7fffb41c0d x27: fffffbfffda0e068
> [	0.000000][	T0] x26: 0000000092000000 x25: 1ffff000023d5f94 x24: ffff800011ed84d0
> [	0.000000][	T0] x23: ffff800011ed84c0 x22: ffff800011ed83d8 x21: 0000000000900000
> [	0.000000][	T0] x20: ffff800011782000 x19: 0000000092000000 x18: ffff800011ee0730
> [	0.000000][	T0] x17: 0000000000000000 x16: 0000000000000000 x15: 1ffff0000233252c
> [	0.000000][	T0] x14: ffff800019a905a0 x13: 0000000000000001 x12: ffff7000023d5ed7
> [	0.000000][	T0] x11: 1ffff000023d5ed6 x10: ffff7000023d5ed6 x9 : dfff800000000000
> [	0.000000][	T0] x8 : ffff800011eaf6b7 x7 : 0000000000000001 x6 : ffff800011eaf6b0
> [	0.000000][	T0] x5 : 00008ffffdc2a12a x4 : ffff7000023d5ed7 x3 : 1ffff000023dbf99
> [	0.000000][	T0] x2 : 1ffff000022f0463 x1 : 0000000000000000 x0 : ffffffffffffffff
> [	0.000000][	T0] Call trace:
> [	0.000000][	T0]  kmemleak_free_part_phys+0x64/0x8c
> [	0.000000][	T0]  memblock_mark_nomap+0x5c/0x78
> [	0.000000][	T0]  reserve_regions+0x294/0x33c
> [	0.000000][	T0]  efi_init+0x2d0/0x490
> [	0.000000][	T0]  setup_arch+0x80/0x138
> [	0.000000][	T0]  start_kernel+0xa0/0x3ec
> [	0.000000][	T0]  __primary_switched+0xc0/0xc8
> [	0.000000][	T0] Code: 34000041 97d526e7 f9418e80 36000040 (d4210000)  
> [	0.000000][	T0] random: get_random_bytes called from print_oops_end_marker+0x34/0x80 with crng_init=0
> [	0.000000][	T0] ---[ end trace 0000000000000000 ]---
> [	0.000000][	T0] Kernel panic - not syncing: Oops - BUG: Fatal exception
> [	0.000000][	T0] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception ]---
> 
> I did not quite figure out where this BUG() was triggered and I did not

This is from here:
arch/arm64/include/asm/memory.h:

#define PHYS_OFFSET         ({ VM_BUG_ON(memstart_addr & 1); memstart_addr; })

kmemleak_free_part_phys() does __va() which uses PHYS_OFFSET and all this
happens before memstart_addr is set.

I'll try to see how this can be untangled...
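
The ordering problem described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the `model_*` names and the MODEL_PAGE_OFFSET value are made up for illustration, and the initial value of -1 for the memstart_addr stand-in is an assumption (it is consistent with x0 in the register dump, but the thread does not state it). Where the real PHYS_OFFSET macro would hit VM_BUG_ON(), the model returns false instead of crashing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative value only: stands in for the start of the linear map. */
#define MODEL_PAGE_OFFSET 0xffff000000000000ULL

/*
 * Stand-in for arm64's memstart_addr. Assumption: it holds -1 until the
 * memory layout has been worked out, so the low bit doubles as a
 * "not initialised yet" marker.
 */
static int64_t model_memstart_addr = -1;

/* Mirrors the VM_BUG_ON(memstart_addr & 1) condition without crashing. */
static bool model_phys_offset_valid(void)
{
	return (model_memstart_addr & 1) == 0;
}

/*
 * Stand-in for __va(): returns false where the real PHYS_OFFSET macro
 * would trigger VM_BUG_ON(), otherwise stores the linear-map address.
 */
static bool model_phys_to_virt(uint64_t phys, uint64_t *virt)
{
	if (!model_phys_offset_valid())
		return false;

	*virt = (phys - (uint64_t)model_memstart_addr) | MODEL_PAGE_OFFSET;
	return true;
}

/* Stand-in for the point in boot where memstart_addr is finally set. */
static void model_set_memstart(uint64_t base)
{
	model_memstart_addr = (int64_t)base;
}
```

In this model, translating 0x92000000 fails until model_set_memstart() has run, which mirrors why calling kmemleak_free_part_phys() from efi_init() is too early on this machine.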

> see anything obviously after checking DEBUG_VIRTUAL code, but it did
> finger to the kmemleak_free_part() line. I verified that phys == 0x92000000d,
> 
> void __ref kmemleak_free_part_phys(phys_addr_t phys, size_t size)
> {
> 	if (!IS_ENABLED(CONFIG_HIGHMEM) || PHYS_PFN(phys) < max_low_pfn)
> 		kmemleak_free_part(__va(phys), size);
> }
> 
> As you can see the above efi=debug information was truncated. Usually
> on a working boot the whole thing is:
> 
> [	0.000000] efi: Processing EFI memory map:
> [	0.000000] efi:   0x000010540000-0x00001054ffff [Memory Mapped I/O  |RUN|  |  |  |  |  |  |   |  |  |  |  ]
> [	0.000000] efi:   0x000090000000-0x00009007ffff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x000090080000-0x000091ebffff [Loader Data    	|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x000091ec0000-0x000091ffffff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x000092000000-0x0000928fffff [Runtime Data   	|RUN|  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x000092900000-0x0000fffb7fff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x0000fffb8000-0x0000fffbffff [Boot Data      	|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x0000fffc0000-0x0000ffffffff [Runtime Data   	|RUN|  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x000880000000-0x00088ae4afff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x00088ae4b000-0x00088fffffff [Loader Data    	|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x000890000000-0x000fffffffff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x008800000000-0x009f81089fff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x009f8108a000-0x009f82dabfff [Loader Data    	|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x009f82dac000-0x009fe6dabfff [Loader Code    	|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x009fe6dac000-0x009fe6dacfff [Loader Data    	|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x009fe6dad000-0x009fe6dadfff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x009fe6dae000-0x009fe6db2fff [Loader Data    	|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x009fe6db3000-0x009fe6f7bfff [Loader Code    	|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x009fe6f7c000-0x009ff287cfff [Boot Data      	|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x009ff287d000-0x009ff3293fff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x009ff3294000-0x009ff5af0fff [Boot Code      	|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x009ff5af1000-0x009ff5b2ffff [Reserved       	|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x009ff5b30000-0x009ff5b4ffff [ACPI Reclaim Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x009ff5b50000-0x009ff5baffff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x009ff5bb0000-0x009ff5bbffff [ACPI Memory NVS	|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x009ff5bc0000-0x009ff7deffff [Runtime Data   	|RUN|  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x009ff7df0000-0x009ff7e5ffff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x009ff7e60000-0x009ff7ffffff [Runtime Code   	|RUN|  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x009ff8000000-0x009ff801efff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x009ff801f000-0x009ff801ffff [Boot Data      	|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x009ff8020000-0x009fff9fffff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x009fffa00000-0x009fffbfffff [Loader Data    	|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x009fffc00000-0x009fffdbffff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x009fffdc0000-0x009fffdcffff [Loader Data    	|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x009fffdd0000-0x009fffdd4fff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> [	0.000000] efi:   0x009fffdd5000-0x009fffffffff [Boot Data      	|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
Mike Rapoport Oct. 20, 2021, 7:38 a.m. UTC | #11
On Tue, Oct 19, 2021 at 09:33:11PM +0300, Mike Rapoport wrote:
> On Tue, Oct 19, 2021 at 01:59:22PM -0400, Qian Cai wrote:
> > 
> > On 10/19/2021 11:53 AM, Catalin Marinas wrote:
> > > Thanks. I guess the log here is with the Mike's patch reverted.
> > 
> > Yes.
> > 
> > > Try "earlycon=pl011,mmio32,0x12600000" on the kernel command line
> > > and hopefully we get some early log.
> > 
> > Thanks for the suggestion, Catalin. I did not realize that a
> > manually-provided "earlycon" started earlier than just "earlycon"
> > and not defer to ACPI to populate parameters. Anyway,
> > 
> > [	0.000000][	T0] Booting Linux on physical CPU 0x0000000000 [0x503f0002]
> > [	0.000000][	T0] Linux version 5.15.0-rc6-next-20211019+ (root@admin5) (gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #104 SMP Tue Oct 19 17:36:17 UTC 2021
> > [	0.000000][	T0] earlycon: pl11 at MMIO32 0x0000000012600000 (options '')
> > [	0.000000][	T0] printk: bootconsole [pl11] enabled
> > [	0.000000][	T0] efi: Getting UEFI parameters from /chosen in DT:
> > [	0.000000][	T0] efi:   System Table     	: 0x0000009ff7de0018
> > [	0.000000][	T0] efi:   MemMap Address   	: 0x0000009fe6dae018
> > [	0.000000][	T0] efi:   MemMap Size      	: 0x0000000000000600
> > [	0.000000][	T0] efi:   MemMap Desc. Size	: 0x0000000000000030
> > [	0.000000][	T0] efi:   MemMap Desc. Version : 0x0000000000000001
> > [	0.000000][	T0] efi: EFI v2.70 by American Megatrends
> > [	0.000000][	T0] efi: ACPI 2.0=0x9ff5b40000 SMBIOS 3.0=0x9ff686fd98 ESRT=0x9ff1d18298 MEMRESERVE=0x9fe6dacd98  
> > [	0.000000][	T0] efi: Processing EFI memory map:
> > [	0.000000][	T0] efi:   0x000090000000-0x000091ffffff [Conventional|   |  |  |  |  |  |  |  |  |   |WB|WT|WC|UC]
> > [	0.000000][	T0] efi:   0x000092000000-0x0000928fffff [Runtime Data|RUN|  |  |  |  |  |  |  |  |   |WB|WT|WC|UC]
> > [	0.000000][	T0] ------------[ cut here ]------------
> > [	0.000000][	T0] kernel BUG at mm/kmemleak.c:1140!
> > [	0.000000][	T0] Internal error: Oops - BUG: 0 [#1] SMP
> > 
> > I did not quite figure out where this BUG() was triggered and I did not
> 
> This is from here:
> arch/arm64/include/asm/memory.h:
> 
> #define PHYS_OFFSET         ({ VM_BUG_ON(memstart_addr & 1); memstart_addr; })
> 
> kmemleak_free_part_phys() does __va() which uses PHYS_OFFSET and all this
> happens before memstart_addr is set.
> 
> I'll try to see how this can be untangled...
 
This late in the cycle, the only option I can think of is reverting the
kmemleak waiver in memblock_mark_nomap() and putting it in
early_init_dt_alloc_reserved_memory_arch() instead. That function is the
only user that sets MEMBLOCK_NOMAP on an allocated chunk rather than
marking as NOMAP the "unusable" memory reported by firmware.

Thoughts?
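
To make the failure described above concrete, here is a minimal userspace
sketch in C of how PHYS_OFFSET guards __va(). It is a model, not kernel
code: the -1 initial value of memstart_addr, the MODEL_PAGE_OFFSET constant
and the helpers phys_offset_would_bug(), model_va() and model_set_memstart()
are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only; names mirror arm64 but nothing here is kernel API. */

/* Assumption: memstart_addr holds an odd sentinel (-1) until it is set. */
static int64_t memstart_addr = -1;

/* Hypothetical base of the linear map, chosen only for the example. */
#define MODEL_PAGE_OFFSET 0xffff000000000000ULL

/* True when PHYS_OFFSET's VM_BUG_ON(memstart_addr & 1) would fire. */
static bool phys_offset_would_bug(void)
{
	return (memstart_addr & 1) != 0;
}

/*
 * Model of __va(): translate a physical address to a linear-map address.
 * Returns false instead of hitting BUG() when memstart_addr is still unset.
 */
static bool model_va(uint64_t phys, uint64_t *virt)
{
	assert(virt != NULL);

	if (phys_offset_would_bug())
		return false;

	*virt = (phys - (uint64_t)memstart_addr) | MODEL_PAGE_OFFSET;
	return true;
}

/* Stand-in for the point in arm64_memblock_init() where the base is chosen. */
static void model_set_memstart(uint64_t base)
{
	memstart_addr = (int64_t)base;
}
```

In this model, calling model_va() for the Runtime Data region at 0x92000000
fails until model_set_memstart() has run, which mirrors the ordering problem:
the NOMAP marking for firmware-reported regions happens while the EFI memory
map is being processed, before memstart_addr is known.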
Catalin Marinas Oct. 20, 2021, 8:18 a.m. UTC | #12
On Wed, Oct 20, 2021 at 10:38:23AM +0300, Mike Rapoport wrote:
> On Tue, Oct 19, 2021 at 09:33:11PM +0300, Mike Rapoport wrote:
> > On Tue, Oct 19, 2021 at 01:59:22PM -0400, Qian Cai wrote:
> > > [	0.000000][	T0] Booting Linux on physical CPU 0x0000000000 [0x503f0002]
> > > [	0.000000][	T0] Linux version 5.15.0-rc6-next-20211019+ (root@admin5) (gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #104 SMP Tue Oct 19 17:36:17 UTC 2021
> > > [	0.000000][	T0] earlycon: pl11 at MMIO32 0x0000000012600000 (options '')
> > > [	0.000000][	T0] printk: bootconsole [pl11] enabled
> > > [	0.000000][	T0] efi: Getting UEFI parameters from /chosen in DT:
> > > [	0.000000][	T0] efi:   System Table     	: 0x0000009ff7de0018
> > > [	0.000000][	T0] efi:   MemMap Address   	: 0x0000009fe6dae018
> > > [	0.000000][	T0] efi:   MemMap Size      	: 0x0000000000000600
> > > [	0.000000][	T0] efi:   MemMap Desc. Size	: 0x0000000000000030
> > > [	0.000000][	T0] efi:   MemMap Desc. Version : 0x0000000000000001
> > > [	0.000000][	T0] efi: EFI v2.70 by American Megatrends
> > > [	0.000000][	T0] efi: ACPI 2.0=0x9ff5b40000 SMBIOS 3.0=0x9ff686fd98 ESRT=0x9ff1d18298 MEMRESERVE=0x9fe6dacd98  
> > > [	0.000000][	T0] efi: Processing EFI memory map:
> > > [	0.000000][	T0] efi:   0x000090000000-0x000091ffffff [Conventional|   |  |  |  |  |  |  |  |  |   |WB|WT|WC|UC]
> > > [	0.000000][	T0] efi:   0x000092000000-0x0000928fffff [Runtime Data|RUN|  |  |  |  |  |  |  |  |   |WB|WT|WC|UC]
> > > [	0.000000][	T0] ------------[ cut here ]------------
> > > [	0.000000][	T0] kernel BUG at mm/kmemleak.c:1140!
> > > [	0.000000][	T0] Internal error: Oops - BUG: 0 [#1] SMP
> > > 
> > > I did not quite figure out where this BUG() was triggered and I did not
> > 
> > This is from here:
> > arch/arm64/include/asm/memory.h:
> > 
> > #define PHYS_OFFSET         ({ VM_BUG_ON(memstart_addr & 1); memstart_addr; })
> > 
> > kmemleak_free_part_phys() does __va() which uses PHYS_OFFSET and all this
> > happens before memstart_addr is set.
> > 
> > I'll try to see how this can be untangled...
>  
> This late in the cycle I can only think of reverting kmemleak wavier from
> memblock_mark_nomap() and putting it in
> early_init_dt_alloc_reserved_memory_arch() being the only user setting
> MEMBLOCK_NOMAP to an allocated chunk rather than marking NOMAP "unusable"
> memory reported by firmware.

It makes sense, there aren't many places where nomap is called.

I think arch_reserve_mem_area() called from acpi_table_upgrade() also
follows a memblock allocation. But I'd call kmemleak in
acpi_table_upgrade() directly rather than in the arch back-end.

Regarding which callback, I think kmemleak_ignore_phys() is better
suited here since kmemleak still keeps track of the object but won't
scan it.
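
The difference between the two callbacks can be sketched with a small
userspace model in C. This is not kmemleak's implementation: struct
model_object and the helpers model_ignore(), model_free() and
model_scan_touches() are hypothetical names, and the model treats a whole
object at once, whereas kmemleak_free_part_phys() can drop part of a block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model of a kmemleak-style object; not the kernel's structure. */
struct model_object {
	uint64_t base;
	uint64_t size;
	bool tracked;	/* the tool still knows about the object */
	bool scanned;	/* the scanner reads the object's contents */
};

/* "Ignore" semantics: keep the object on record but never scan it. */
static void model_ignore(struct model_object *obj)
{
	assert(obj != NULL);
	obj->scanned = false;
}

/* "Free" semantics: forget the object entirely. */
static void model_free(struct model_object *obj)
{
	assert(obj != NULL);
	obj->tracked = false;
	obj->scanned = false;
}

/* Would a scan pass dereference memory belonging to this object? */
static bool model_scan_touches(const struct model_object *obj)
{
	assert(obj != NULL);
	return obj->tracked && obj->scanned;
}
```

Either call keeps the scanner away from a region without a linear map entry;
the ignore variant just leaves the object on record.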
Mike Rapoport Oct. 20, 2021, 8:42 a.m. UTC | #13
On Wed, Oct 20, 2021 at 09:18:46AM +0100, Catalin Marinas wrote:
> On Wed, Oct 20, 2021 at 10:38:23AM +0300, Mike Rapoport wrote:
> > On Tue, Oct 19, 2021 at 09:33:11PM +0300, Mike Rapoport wrote:
> > > On Tue, Oct 19, 2021 at 01:59:22PM -0400, Qian Cai wrote:
> > > > [	0.000000][	T0] Booting Linux on physical CPU 0x0000000000 [0x503f0002]
> > > > [	0.000000][	T0] Linux version 5.15.0-rc6-next-20211019+ (root@admin5) (gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #104 SMP Tue Oct 19 17:36:17 UTC 2021
> > > > [	0.000000][	T0] earlycon: pl11 at MMIO32 0x0000000012600000 (options '')
> > > > [	0.000000][	T0] printk: bootconsole [pl11] enabled
> > > > [	0.000000][	T0] efi: Getting UEFI parameters from /chosen in DT:
> > > > [	0.000000][	T0] efi:   System Table     	: 0x0000009ff7de0018
> > > > [	0.000000][	T0] efi:   MemMap Address   	: 0x0000009fe6dae018
> > > > [	0.000000][	T0] efi:   MemMap Size      	: 0x0000000000000600
> > > > [	0.000000][	T0] efi:   MemMap Desc. Size	: 0x0000000000000030
> > > > [	0.000000][	T0] efi:   MemMap Desc. Version : 0x0000000000000001
> > > > [	0.000000][	T0] efi: EFI v2.70 by American Megatrends
> > > > [	0.000000][	T0] efi: ACPI 2.0=0x9ff5b40000 SMBIOS 3.0=0x9ff686fd98 ESRT=0x9ff1d18298 MEMRESERVE=0x9fe6dacd98  
> > > > [	0.000000][	T0] efi: Processing EFI memory map:
> > > > [	0.000000][	T0] efi:   0x000090000000-0x000091ffffff [Conventional|   |  |  |  |  |  |  |  |  |   |WB|WT|WC|UC]
> > > > [	0.000000][	T0] efi:   0x000092000000-0x0000928fffff [Runtime Data|RUN|  |  |  |  |  |  |  |  |   |WB|WT|WC|UC]
> > > > [	0.000000][	T0] ------------[ cut here ]------------
> > > > [	0.000000][	T0] kernel BUG at mm/kmemleak.c:1140!
> > > > [	0.000000][	T0] Internal error: Oops - BUG: 0 [#1] SMP
> > > > 
> > > > I did not quite figure out where this BUG() was triggered and I did not
> > > 
> > > This is from here:
> > > arch/arm64/include/asm/memory.h:
> > > 
> > > #define PHYS_OFFSET         ({ VM_BUG_ON(memstart_addr & 1); memstart_addr; })
> > > 
> > > kmemleak_free_part_phys() does __va() which uses PHYS_OFFSET and all this
> > > happens before memstart_addr is set.
> > > 
> > > I'll try to see how this can be untangled...
> >  
> > This late in the cycle I can only think of reverting kmemleak wavier from
> > memblock_mark_nomap() and putting it in
> > early_init_dt_alloc_reserved_memory_arch() being the only user setting
> > MEMBLOCK_NOMAP to an allocated chunk rather than marking NOMAP "unusable"
> > memory reported by firmware.
> 
> It makes sense, there aren't many places or nomap is called.
> 
> I think arch_reserve_mem_area() called from acpi_table_upgrade() also
> follows a memblock allocation. But I'd call kmemleak in
> acpi_table_upgrade() directly rather than in the arch back-end.

Hmm, I'm not sure this is correct for x86. I don't see why kmemleak can't
track the memory allocated in acpi_table_upgrade().
 
> Regarding which callback, I think kmemleak_ignore_phys() is better
> suited here since kmemleak still keeps track of the object but won't
> scan it.

Ok.
Catalin Marinas Oct. 20, 2021, 9:33 a.m. UTC | #14
On Wed, Oct 20, 2021 at 11:42:28AM +0300, Mike Rapoport wrote:
> On Wed, Oct 20, 2021 at 09:18:46AM +0100, Catalin Marinas wrote:
> > On Wed, Oct 20, 2021 at 10:38:23AM +0300, Mike Rapoport wrote:
> > > On Tue, Oct 19, 2021 at 09:33:11PM +0300, Mike Rapoport wrote:
> > > > On Tue, Oct 19, 2021 at 01:59:22PM -0400, Qian Cai wrote:
> > > > > [	0.000000][	T0] Booting Linux on physical CPU 0x0000000000 [0x503f0002]
> > > > > [	0.000000][	T0] Linux version 5.15.0-rc6-next-20211019+ (root@admin5) (gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #104 SMP Tue Oct 19 17:36:17 UTC 2021
> > > > > [	0.000000][	T0] earlycon: pl11 at MMIO32 0x0000000012600000 (options '')
> > > > > [	0.000000][	T0] printk: bootconsole [pl11] enabled
> > > > > [	0.000000][	T0] efi: Getting UEFI parameters from /chosen in DT:
> > > > > [	0.000000][	T0] efi:   System Table     	: 0x0000009ff7de0018
> > > > > [	0.000000][	T0] efi:   MemMap Address   	: 0x0000009fe6dae018
> > > > > [	0.000000][	T0] efi:   MemMap Size      	: 0x0000000000000600
> > > > > [	0.000000][	T0] efi:   MemMap Desc. Size	: 0x0000000000000030
> > > > > [	0.000000][	T0] efi:   MemMap Desc. Version : 0x0000000000000001
> > > > > [	0.000000][	T0] efi: EFI v2.70 by American Megatrends
> > > > > [	0.000000][	T0] efi: ACPI 2.0=0x9ff5b40000 SMBIOS 3.0=0x9ff686fd98 ESRT=0x9ff1d18298 MEMRESERVE=0x9fe6dacd98  
> > > > > [	0.000000][	T0] efi: Processing EFI memory map:
> > > > > [	0.000000][	T0] efi:   0x000090000000-0x000091ffffff [Conventional|   |  |  |  |  |  |  |  |  |   |WB|WT|WC|UC]
> > > > > [	0.000000][	T0] efi:   0x000092000000-0x0000928fffff [Runtime Data|RUN|  |  |  |  |  |  |  |  |   |WB|WT|WC|UC]
> > > > > [	0.000000][	T0] ------------[ cut here ]------------
> > > > > [	0.000000][	T0] kernel BUG at mm/kmemleak.c:1140!
> > > > > [	0.000000][	T0] Internal error: Oops - BUG: 0 [#1] SMP
> > > > > 
> > > > > I did not quite figure out where this BUG() was triggered and I did not
> > > > 
> > > > This is from here:
> > > > arch/arm64/include/asm/memory.h:
> > > > 
> > > > #define PHYS_OFFSET         ({ VM_BUG_ON(memstart_addr & 1); memstart_addr; })
> > > > 
> > > > kmemleak_free_part_phys() does __va() which uses PHYS_OFFSET and all this
> > > > happens before memstart_addr is set.
> > > > 
> > > > I'll try to see how this can be untangled...
> > >  
> > > This late in the cycle I can only think of reverting kmemleak wavier from
> > > memblock_mark_nomap() and putting it in
> > > early_init_dt_alloc_reserved_memory_arch() being the only user setting
> > > MEMBLOCK_NOMAP to an allocated chunk rather than marking NOMAP "unusable"
> > > memory reported by firmware.
> > 
> > It makes sense, there aren't many places or nomap is called.
> > 
> > I think arch_reserve_mem_area() called from acpi_table_upgrade() also
> > follows a memblock allocation. But I'd call kmemleak in
> > acpi_table_upgrade() directly rather than in the arch back-end.
> 
> Hmm, not sure this is correct for x86. I don't see why can't it track the
> memory allocated in acpi_table_upgrade().

Kmemleak still tracks it after an ignore but it won't be scanned. I
don't think this memory contains pointers to virtual addresses.
Catalin Marinas Oct. 20, 2021, 10:13 a.m. UTC | #15
On Wed, Oct 20, 2021 at 10:38:23AM +0300, Mike Rapoport wrote:
> On Tue, Oct 19, 2021 at 09:33:11PM +0300, Mike Rapoport wrote:
> > On Tue, Oct 19, 2021 at 01:59:22PM -0400, Qian Cai wrote:
> > > [	0.000000][	T0] Booting Linux on physical CPU 0x0000000000 [0x503f0002]
> > > [	0.000000][	T0] Linux version 5.15.0-rc6-next-20211019+ (root@admin5) (gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #104 SMP Tue Oct 19 17:36:17 UTC 2021
> > > [	0.000000][	T0] earlycon: pl11 at MMIO32 0x0000000012600000 (options '')
> > > [	0.000000][	T0] printk: bootconsole [pl11] enabled
> > > [	0.000000][	T0] efi: Getting UEFI parameters from /chosen in DT:
> > > [	0.000000][	T0] efi:   System Table     	: 0x0000009ff7de0018
> > > [	0.000000][	T0] efi:   MemMap Address   	: 0x0000009fe6dae018
> > > [	0.000000][	T0] efi:   MemMap Size      	: 0x0000000000000600
> > > [	0.000000][	T0] efi:   MemMap Desc. Size	: 0x0000000000000030
> > > [	0.000000][	T0] efi:   MemMap Desc. Version : 0x0000000000000001
> > > [	0.000000][	T0] efi: EFI v2.70 by American Megatrends
> > > [	0.000000][	T0] efi: ACPI 2.0=0x9ff5b40000 SMBIOS 3.0=0x9ff686fd98 ESRT=0x9ff1d18298 MEMRESERVE=0x9fe6dacd98  
> > > [	0.000000][	T0] efi: Processing EFI memory map:
> > > [	0.000000][	T0] efi:   0x000090000000-0x000091ffffff [Conventional|   |  |  |  |  |  |  |  |  |   |WB|WT|WC|UC]
> > > [	0.000000][	T0] efi:   0x000092000000-0x0000928fffff [Runtime Data|RUN|  |  |  |  |  |  |  |  |   |WB|WT|WC|UC]
> > > [	0.000000][	T0] ------------[ cut here ]------------
> > > [	0.000000][	T0] kernel BUG at mm/kmemleak.c:1140!
> > > [	0.000000][	T0] Internal error: Oops - BUG: 0 [#1] SMP
> > > 
> > > I did not quite figure out where this BUG() was triggered and I did not
> > 
> > This is from here:
> > arch/arm64/include/asm/memory.h:
> > 
> > #define PHYS_OFFSET         ({ VM_BUG_ON(memstart_addr & 1); memstart_addr; })
> > 
> > kmemleak_free_part_phys() does __va() which uses PHYS_OFFSET and all this
> > happens before memstart_addr is set.
> > 
> > I'll try to see how this can be untangled...
>  
> This late in the cycle I can only think of reverting kmemleak wavier from
> memblock_mark_nomap() and putting it in
> early_init_dt_alloc_reserved_memory_arch() being the only user setting
> MEMBLOCK_NOMAP to an allocated chunk rather than marking NOMAP "unusable"
> memory reported by firmware.

BTW, would something like this work:

diff --git a/mm/memblock.c b/mm/memblock.c
index aa87ff5ae2a4..7e67378a8ddf 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -939,7 +939,7 @@ int __init_memblock memblock_mark_nomap(phys_addr_t base, phys_addr_t size)
 {
 	int ret = memblock_setclr_flag(base, size, 1, MEMBLOCK_NOMAP);
 
-	if (!ret)
+	if (!ret && memblock_is_region_reserved(base, size))
 		kmemleak_free_part_phys(base, size);
 
 	return ret;
Mike Rapoport Oct. 20, 2021, 10:39 a.m. UTC | #16
On Wed, Oct 20, 2021 at 11:13:06AM +0100, Catalin Marinas wrote:
> On Wed, Oct 20, 2021 at 10:38:23AM +0300, Mike Rapoport wrote:
> > On Tue, Oct 19, 2021 at 09:33:11PM +0300, Mike Rapoport wrote:
> > > On Tue, Oct 19, 2021 at 01:59:22PM -0400, Qian Cai wrote:
> > > > [	0.000000][	T0] Booting Linux on physical CPU 0x0000000000 [0x503f0002]
> > > > [	0.000000][	T0] Linux version 5.15.0-rc6-next-20211019+ (root@admin5) (gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #104 SMP Tue Oct 19 17:36:17 UTC 2021
> > > > [	0.000000][	T0] earlycon: pl11 at MMIO32 0x0000000012600000 (options '')
> > > > [	0.000000][	T0] printk: bootconsole [pl11] enabled
> > > > [	0.000000][	T0] efi: Getting UEFI parameters from /chosen in DT:
> > > > [	0.000000][	T0] efi:   System Table     	: 0x0000009ff7de0018
> > > > [	0.000000][	T0] efi:   MemMap Address   	: 0x0000009fe6dae018
> > > > [	0.000000][	T0] efi:   MemMap Size      	: 0x0000000000000600
> > > > [	0.000000][	T0] efi:   MemMap Desc. Size	: 0x0000000000000030
> > > > [	0.000000][	T0] efi:   MemMap Desc. Version : 0x0000000000000001
> > > > [	0.000000][	T0] efi: EFI v2.70 by American Megatrends
> > > > [	0.000000][	T0] efi: ACPI 2.0=0x9ff5b40000 SMBIOS 3.0=0x9ff686fd98 ESRT=0x9ff1d18298 MEMRESERVE=0x9fe6dacd98  
> > > > [	0.000000][	T0] efi: Processing EFI memory map:
> > > > [	0.000000][	T0] efi:   0x000090000000-0x000091ffffff [Conventional|   |  |  |  |  |  |  |  |  |   |WB|WT|WC|UC]
> > > > [	0.000000][	T0] efi:   0x000092000000-0x0000928fffff [Runtime Data|RUN|  |  |  |  |  |  |  |  |   |WB|WT|WC|UC]
> > > > [	0.000000][	T0] ------------[ cut here ]------------
> > > > [	0.000000][	T0] kernel BUG at mm/kmemleak.c:1140!
> > > > [	0.000000][	T0] Internal error: Oops - BUG: 0 [#1] SMP
> > > > 
> > > > I did not quite figure out where this BUG() was triggered and I did not
> > > 
> > > This is from here:
> > > arch/arm64/include/asm/memory.h:
> > > 
> > > #define PHYS_OFFSET         ({ VM_BUG_ON(memstart_addr & 1); memstart_addr; })
> > > 
> > > kmemleak_free_part_phys() does __va() which uses PHYS_OFFSET and all this
> > > happens before memstart_addr is set.
> > > 
> > > I'll try to see how this can be untangled...
> >  
> > This late in the cycle I can only think of reverting kmemleak wavier from
> > memblock_mark_nomap() and putting it in
> > early_init_dt_alloc_reserved_memory_arch() being the only user setting
> > MEMBLOCK_NOMAP to an allocated chunk rather than marking NOMAP "unusable"
> > memory reported by firmware.
> 
> BTW, would something like this work:
> 
> diff --git a/mm/memblock.c b/mm/memblock.c
> index aa87ff5ae2a4..7e67378a8ddf 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -939,7 +939,7 @@ int __init_memblock memblock_mark_nomap(phys_addr_t base, phys_addr_t size)
>  {
>  	int ret = memblock_setclr_flag(base, size, 1, MEMBLOCK_NOMAP);
>  
> -	if (!ret)
> +	if (!ret && memblock_is_region_reserved(base, size))
>  		kmemleak_free_part_phys(base, size);

Apparently it would for the cases we have now.
But it will fail the same way as now if somebody calls memblock_reserve()
and then memblock_mark_nomap() for the same chunk before
arm64_memblock_init().

For instance, a slight order change in efi-init::reserve_regions() would
trigger the same fault... :(
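
The ordering hazard can be illustrated with a minimal userspace model in C
of the guarded memblock_mark_nomap() proposed above. It is only a sketch
under stated assumptions: struct model_state, enum model_outcome and
model_mark_nomap() are hypothetical names, and the real memblock and
kmemleak state is reduced to two flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model; the two flags stand in for real memblock/arm64 state. */
struct model_state {
	bool memstart_set;	/* has arm64_memblock_init() set memstart_addr? */
	bool reserved;		/* was the chunk memblock_reserve()d beforehand? */
};

enum model_outcome {
	MODEL_SKIPPED,		/* guard false: kmemleak is not called */
	MODEL_FREED,		/* kmemleak_free_part_phys() would run safely */
	MODEL_WOULD_BUG,	/* __va() would run before memstart_addr is set */
};

/* Model of the guarded path: if (!ret && memblock_is_region_reserved(...)). */
static enum model_outcome model_mark_nomap(const struct model_state *st)
{
	assert(st != NULL);

	if (!st->reserved)
		return MODEL_SKIPPED;
	if (!st->memstart_set)
		return MODEL_WOULD_BUG;
	return MODEL_FREED;
}
```

In this model the guard only helps while early callers never reserve the
chunk first. A chunk that is reserved and then marked NOMAP before
memstart_addr is set still ends up in the MODEL_WOULD_BUG case.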

Patch

diff --git a/mm/memblock.c b/mm/memblock.c
index 184dcd2e5d99..5c3503c98b2f 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -936,7 +936,12 @@  int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size)
  */
 int __init_memblock memblock_mark_nomap(phys_addr_t base, phys_addr_t size)
 {
-	return memblock_setclr_flag(base, size, 1, MEMBLOCK_NOMAP);
+	int ret = memblock_setclr_flag(base, size, 1, MEMBLOCK_NOMAP);
+
+	if (!ret)
+		kmemleak_free_part_phys(base, size);
+
+	return ret;
 }
 
 /**