[3/3] arm: extend pfn_valid to take into account freed memory map alignment

Message ID 20210518090613.21519-4-rppt@kernel.org (mailing list archive)
State New, archived
Series: memblock, arm: fixes for freeing of the memory map

Commit Message

Mike Rapoport May 18, 2021, 9:06 a.m. UTC
From: Mike Rapoport <rppt@linux.ibm.com>

When unused memory map is freed the preserved part of the memory map is
extended to match pageblock boundaries because lots of core mm
functionality relies on homogeneity of the memory map within pageblock
boundaries.

Since pfn_valid() is used to check whether there is a valid memory map
entry for a PFN, make it return true also for PFNs that have memory map
entries even if there is no actual memory populated there.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
---
 arch/arm/mm/init.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)
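
For orientation, a small userspace sketch of the pageblock rounding described in
the commit message. The 4K page size, the 1024-page (4M) pageblock and the
example region are assumptions for illustration only; none of these values come
from the patch itself.

/*
 * Sketch of the pageblock rounding of the preserved memory map.
 * Assumptions: 4K pages, default 1024-page (4M) pageblock, made-up region.
 */
#include <stdio.h>

#define PAGE_SIZE		4096UL
#define PAGEBLOCK_SIZE		(1024 * PAGE_SIZE)	/* 4M pageblock */
#define ALIGN_DOWN(x, a)	((x) & ~((a) - 1))
#define ALIGN(x, a)		ALIGN_DOWN((x) + (a) - 1, (a))

int main(void)
{
	unsigned long start = 0x80a00000UL;	/* region start, mid-pageblock */
	unsigned long end   = 0x80b00000UL;	/* region end */

	/* The memory map is preserved for every pageblock the region touches */
	printf("memmap kept for [%#lx, %#lx)\n",
	       ALIGN_DOWN(start, PAGEBLOCK_SIZE), ALIGN(end, PAGEBLOCK_SIZE));

	/*
	 * Prints [0x80800000, 0x80c00000): PFNs in the rounding slack have
	 * memory map entries but no memory behind them, which is what the
	 * pfn_valid() change below accounts for.
	 */
	return 0;
}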

Comments

Russell King (Oracle) May 18, 2021, 9:44 a.m. UTC | #1
On Tue, May 18, 2021 at 12:06:13PM +0300, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> When unused memory map is freed the preserved part of the memory map is
> extended to match pageblock boundaries because lots of core mm
> functionality relies on homogeneity of the memory map within pageblock
> boundaries.
> 
> Since pfn_valid() is used to check whether there is a valid memory map
> entry for a PFN, make it return true also for PFNs that have memory map
> entries even if there is no actual memory populated there.

I thought pfn_valid() was a particularly hot path... do we really want
to be doing multiple lookups here? Is there no better solution?
Mike Rapoport May 18, 2021, 10:53 a.m. UTC | #2
On Tue, May 18, 2021 at 10:44:27AM +0100, Russell King (Oracle) wrote:
> On Tue, May 18, 2021 at 12:06:13PM +0300, Mike Rapoport wrote:
> > From: Mike Rapoport <rppt@linux.ibm.com>
> > 
> > When unused memory map is freed the preserved part of the memory map is
> > extended to match pageblock boundaries because lots of core mm
> > functionality relies on homogeneity of the memory map within pageblock
> > boundaries.
> > 
> > Since pfn_valid() is used to check whether there is a valid memory map
> > entry for a PFN, make it return true also for PFNs that have memory map
> > entries even if there is no actual memory populated there.
> 
> I thought pfn_valid() was a particularly hot path... do we really want
> to be doing multiple lookups here? Is there no better solution?

It is hot, but for more, hmm, straightforward memory layouts it'll take the

	if (memblock_is_map_memory(addr))
		return 1;

branch, I think.

Most mm operations are on pages that are fed into the buddy allocator, and
if there are no holes with weird alignment, pfn_valid() will return 1 right
away.

Now thinking about it, with the patch that marks NOMAP areas reserved in
the memory map [1], we could also use

	memblock_overlaps_region(&memblock.memory,
				 ALIGN_DOWN(addr, pageblock_size),
				 pageblock_size)

to have only one lookup.

A completely different approach would be to simply stop freeing the memory map
with SPARSEMEM. For systems like the one Kefeng is using, it would waste less
than 2M out of 1.5G.
It is worse, of course, for old systems with small memories, the worst case
being mach-ep93xx with a section size of 256M, where I presume 16M per section
would be normal for such machines.

[1] https://lore.kernel.org/lkml/20210511100550.28178-3-rppt@kernel.org
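
To put rough numbers on the SPARSEMEM estimate above, a back-of-the-envelope
calculation. The 4K page size, the 32-byte struct page (typical for 32-bit ARM)
and the 256M-section/16M-of-RAM layout are assumptions modelled on the
mach-ep93xx worst case, not measured values.

/*
 * Estimate of the memory map left in place if freeing is simply skipped
 * with SPARSEMEM, under the assumptions stated above.
 */
#include <stdio.h>

int main(void)
{
	unsigned long page_size   = 4096;		/* bytes */
	unsigned long struct_page = 32;			/* bytes per memmap entry */
	unsigned long section     = 256UL << 20;	/* SPARSEMEM section size */
	unsigned long populated   = 16UL << 20;		/* RAM present in the section */

	unsigned long memmap_all  = section / page_size * struct_page;
	unsigned long memmap_used = populated / page_size * struct_page;

	/* 2048K of memmap per section, ~1920K of it covering holes */
	printf("memmap per section: %luK, covering holes: %luK\n",
	       memmap_all >> 10, (memmap_all - memmap_used) >> 10);
	return 0;
}

So in that worst case not freeing the map costs about 2M of memmap per 256M
section, roughly 1.9M of which covers holes.
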
Kefeng Wang May 18, 2021, 12:49 p.m. UTC | #3
On 2021/5/18 17:06, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> When unused memory map is freed the preserved part of the memory map is
> extended to match pageblock boundaries because lots of core mm
> functionality relies on homogeneity of the memory map within pageblock
> boundaries.
> 
> Since pfn_valid() is used to check whether there is a valid memory map
> entry for a PFN, make it return true also for PFNs that have memory map
> entries even if there is no actual memory populated there.
> 
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
>   arch/arm/mm/init.c | 15 ++++++++++++++-
>   1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> index 9d4744a632c6..bb678c0ba143 100644
> --- a/arch/arm/mm/init.c
> +++ b/arch/arm/mm/init.c
> @@ -125,11 +125,24 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max_low,
>   int pfn_valid(unsigned long pfn)
>   {
>   	phys_addr_t addr = __pfn_to_phys(pfn);
> +	unsigned long pageblock_size = PAGE_SIZE * pageblock_nr_pages;
>   
>   	if (__phys_to_pfn(addr) != pfn)
>   		return 0;
>   
> -	return memblock_is_map_memory(addr);
> +	if (memblock_is_map_memory(addr))
> +		return 1;
> +
> +	/*
> +	 * If address less than pageblock_size bytes away from a present
> +	 * memory chunk there still will be a memory map entry for it
> +	 * because we round freed memory map to the pageblock boundaries
> +	 */
> +	if (memblock_is_map_memory(ALIGN(addr + 1, pageblock_size)) ||
> +	    memblock_is_map_memory(ALIGN_DOWN(addr, pageblock_size)))
> +		return 1;

Hi Mike, with patch3, the system won't boot.

...
Initmem setup node 0 [mem 0x0000000080a00000-0x00000000ffffefff]
   Normal zone: 18176 pages in unavailable ranges
8<--- cut here ---
Unable to handle kernel paging request at virtual address 0197c000
pgd = (ptrval)
[0197c000] *pgd=00000000
Internal error: Oops: 805 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #132
Hardware name: Hisilicon A9
PC is at mmioset+0x4c/0xa8
LR is at 0x0
pc : [<c033930c>]    lr : [<00000000>]    psr: 00000293
sp : c0801d88  ip : 0197c000  fp : 000001ff
r10: 00000000  r9 : 00000001  r8 : 00000000
r7 : 00000032  r6 : 00000032  r5 : 01000000  r4 : 0197c000
r3 : 00000000  r2 : ffffffe0  r1 : 00000000  r0 : 0197c000
Flags: nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 1ac5387d  Table: 80a0404a  DAC: 55555555
Process swapper (pid: 0, stack limit = 0x(ptrval))
Stack: (0xc0801d88 to 0xc0802000)
1d80:                   000cc000 c04b5764 000cbe00 0197c000 c0885040 c04b580c
1da0: 00000000 00000000 00000001 000fffff 00000000 00000000 00000000 000b0200
1dc0: 00000001 00000000 c0880f38 c04b5d4c 000fffff 00000000 00000000 00000001
1de0: 000cc000 000dca00 00000005 c0813d28 000259ff 000259ff 00000001 c072e4c4
1e00: 0004fdff 000b0200 c050903c 00000004 c0863280 c0712ea4 00000000 ffffefff
1e20: 00000000 c0862fa0 0007f5ff 00080a00 00000000 00080a00 000fffff c0813d28
1e40: 00000000 00000001 c05b046b 00000000 fffff000 c0842780 c0801e90 00000000
1e60: fffff000 c071336c c0801e90 ffffefff 00000000 00000001 000fda00 000fffff
1e80: ffffffff 000fda00 000fffff ffffffff 00000000 c0813d28 00000000 c0881bb8
1ea0: c0881bac c0817900 c05df034 c0842780 c0818614 c086e3fc ffe00000 c0705d50
1ec0: 000b0200 000fffff 00000000 c0813d28 c0817900 00000000 ef7fa000 c07076ac
1ee0: 00000000 c0801f00 c0801f04 00000000 b0200000 00080b00 00081200 c072b000
1f00: cc000000 b0200000 00000000 00000006 fda00000 ffff1000 000afe01 00001000
1f20: 00000007 c0813d28 c0598da7 c0727be8 c08178e4 c086e1c0 c08e8058 c0008000
1f40: c0801fcc 1ac5387d c05df0f8 c07044e8 00000000 00117680 c0813d20 12c0387d
1f60: 40000193 80a01000 413fc090 1ac5387d 00000000 c0166978 c0801fac c0801fac
1f80: 00000000 c04b07e0 c0801fac c0813d28 00000080 00117680 c0813d20 12c0387d
1fa0: 40000193 80a01000 413fc090 1ac5387d 00000000 c0700940 00000000 00000000
1fc0: 00000000 00000000 00000000 c072aa64 00000000 c0813d28 00000000 00117680
1fe0: 55555555 12c0387d 40000193 80a01000 413fc090 00000000 00000000 00000000
[<c033930c>] (mmioset) from [<c04b5764>] (__init_single_page+0x50/0x7c)
[<c04b5764>] (__init_single_page) from [<c04b580c>] (init_unavailable_range+0x7c/0xec)
[<c04b580c>] (init_unavailable_range) from [<c04b5d4c>] (memmap_init+0x144/0x188)
[<c04b5d4c>] (memmap_init) from [<c0712ea4>] (free_area_init_node+0x3b8/0x54c)
[<c0712ea4>] (free_area_init_node) from [<c071336c>] (free_area_init+0x214/0x5a4)
[<c071336c>] (free_area_init) from [<c0705d50>] (bootmem_init+0x78/0xac)
[<c0705d50>] (bootmem_init) from [<c07076ac>] (paging_init+0x410/0x6bc)
[<c07076ac>] (paging_init) from [<c07044e8>] (setup_arch+0x10c/0x5c4)
[<c07044e8>] (setup_arch) from [<c0700940>] (start_kernel+0x60/0x5c0)
[<c0700940>] (start_kernel) from [<00000000>] (0x0)
Code: 0a41aca8 f9ffffca 0081bd08 200012e3 (0a41ac18)
Mike Rapoport May 18, 2021, 3:52 p.m. UTC | #4
On Tue, May 18, 2021 at 08:49:43PM +0800, Kefeng Wang wrote:
> 
> 
> On 2021/5/18 17:06, Mike Rapoport wrote:
> > From: Mike Rapoport <rppt@linux.ibm.com>
> > 
> > When unused memory map is freed the preserved part of the memory map is
> > extended to match pageblock boundaries because lots of core mm
> > functionality relies on homogeneity of the memory map within pageblock
> > boundaries.
> > 
> > Since pfn_valid() is used to check whether there is a valid memory map
> > entry for a PFN, make it return true also for PFNs that have memory map
> > entries even if there is no actual memory populated there.
> > 
> > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > ---
> >   arch/arm/mm/init.c | 15 ++++++++++++++-
> >   1 file changed, 14 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> > index 9d4744a632c6..bb678c0ba143 100644
> > --- a/arch/arm/mm/init.c
> > +++ b/arch/arm/mm/init.c
> > @@ -125,11 +125,24 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max_low,
> >   int pfn_valid(unsigned long pfn)
> >   {
> >   	phys_addr_t addr = __pfn_to_phys(pfn);
> > +	unsigned long pageblock_size = PAGE_SIZE * pageblock_nr_pages;
> >   	if (__phys_to_pfn(addr) != pfn)
> >   		return 0;
> > -	return memblock_is_map_memory(addr);
> > +	if (memblock_is_map_memory(addr))
> > +		return 1;
> > +
> > +	/*
> > +	 * If address less than pageblock_size bytes away from a present
> > +	 * memory chunk there still will be a memory map entry for it
> > +	 * because we round freed memory map to the pageblock boundaries
> > +	 */
> > +	if (memblock_is_map_memory(ALIGN(addr + 1, pageblock_size)) ||
> > +	    memblock_is_map_memory(ALIGN_DOWN(addr, pageblock_size)))
> > +		return 1;
> 
> Hi Mike, with patch3, the system won't boot.

Hmm, apparently I've miscalculated the ranges...

Can you please check with the below patch on top of this series:

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index bb678c0ba143..2fafbbc8e73b 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -138,8 +138,9 @@ int pfn_valid(unsigned long pfn)
 	 * memory chunk there still will be a memory map entry for it
 	 * because we round freed memory map to the pageblock boundaries
 	 */
-	if (memblock_is_map_memory(ALIGN(addr + 1, pageblock_size)) ||
-	    memblock_is_map_memory(ALIGN_DOWN(addr, pageblock_size)))
+	if (memblock_overlaps_region(&memblock.memory,
+				     ALIGN_DOWN(addr, pageblock_size),
+				     pageblock_size))
 		return 1;
 
 	return 0;
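
To see why the per-address probes misfire while the overlap check does not,
here is a hedged userspace model. The single memory region, the 4M pageblock
and the probe address are made up to show the failure mode; they are not taken
from Kefeng's report.

/*
 * Userspace model of the two checks; not kernel code.
 */
#include <stdio.h>

#define PAGEBLOCK_SIZE		0x400000UL		/* 4M */
#define ALIGN_DOWN(x, a)	((x) & ~((a) - 1))
#define ALIGN(x, a)		ALIGN_DOWN((x) + (a) - 1, (a))

/* One fake memory region, starting exactly on a pageblock boundary */
static const unsigned long mem_start = 0x80800000UL;
static const unsigned long mem_end   = 0x88000000UL;

static int is_map_memory(unsigned long addr)
{
	return addr >= mem_start && addr < mem_end;
}

static int overlaps_memory(unsigned long base, unsigned long size)
{
	return base < mem_end && base + size > mem_start;
}

int main(void)
{
	/* A PFN in the pageblock just below the region: its memmap was freed */
	unsigned long addr = 0x807fc000UL;

	/* v1: the two boundary probes from the original patch */
	int v1 = is_map_memory(ALIGN(addr + 1, PAGEBLOCK_SIZE)) ||
		 is_map_memory(ALIGN_DOWN(addr, PAGEBLOCK_SIZE));

	/* v2: does addr's own pageblock overlap memory at all? */
	int v2 = overlaps_memory(ALIGN_DOWN(addr, PAGEBLOCK_SIZE), PAGEBLOCK_SIZE);

	/* prints "probes: 1, overlap: 0" - v1 is a false positive here */
	printf("probes: %d, overlap: %d\n", v1, v2);
	return 0;
}

The ALIGN(addr + 1, ...) probe lands on the first byte of the *next*
pageblock, so it can report a PFN as valid even though its own pageblock never
overlaps memory and its memory map was freed; probing only the boundaries can
also miss a small region sitting entirely inside the pageblock. Checking
whether the PFN's whole pageblock overlaps memblock.memory covers both cases
with a single lookup.
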
Kefeng Wang May 19, 2021, 1:50 a.m. UTC | #5
On 2021/5/18 23:52, Mike Rapoport wrote:
> On Tue, May 18, 2021 at 08:49:43PM +0800, Kefeng Wang wrote:
>>
>>
>> On 2021/5/18 17:06, Mike Rapoport wrote:
>>> From: Mike Rapoport <rppt@linux.ibm.com>
>>>
>>> When unused memory map is freed the preserved part of the memory map is
>>> extended to match pageblock boundaries because lots of core mm
>>> functionality relies on homogeneity of the memory map within pageblock
>>> boundaries.
>>>
>>> Since pfn_valid() is used to check whether there is a valid memory map
>>> entry for a PFN, make it return true also for PFNs that have memory map
>>> entries even if there is no actual memory populated there.
>>>
>>> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
>>> ---
>>>    arch/arm/mm/init.c | 15 ++++++++++++++-
>>>    1 file changed, 14 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
>>> index 9d4744a632c6..bb678c0ba143 100644
>>> --- a/arch/arm/mm/init.c
>>> +++ b/arch/arm/mm/init.c
>>> @@ -125,11 +125,24 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max_low,
>>>    int pfn_valid(unsigned long pfn)
>>>    {
>>>    	phys_addr_t addr = __pfn_to_phys(pfn);
>>> +	unsigned long pageblock_size = PAGE_SIZE * pageblock_nr_pages;
>>>    	if (__phys_to_pfn(addr) != pfn)
>>>    		return 0;
>>> -	return memblock_is_map_memory(addr);
>>> +	if (memblock_is_map_memory(addr))
>>> +		return 1;
>>> +
>>> +	/*
>>> +	 * If address less than pageblock_size bytes away from a present
>>> +	 * memory chunk there still will be a memory map entry for it
>>> +	 * because we round freed memory map to the pageblock boundaries
>>> +	 */
>>> +	if (memblock_is_map_memory(ALIGN(addr + 1, pageblock_size)) ||
>>> +	    memblock_is_map_memory(ALIGN_DOWN(addr, pageblock_size)))
>>> +		return 1;
>>
>> Hi Mike, with patch3, the system won't boot.
> 
> Hmm, apparently I've miscalculated the ranges...
> 
> Can you please check with the below patch on top of this series:

Yes, it works,

On node 0 totalpages: 311551
   Normal zone: 1230 pages used for memmap
   Normal zone: 0 pages reserved
   Normal zone: 157440 pages, LIFO batch:31
   Normal zone: 17152 pages in unavailable ranges
   HighMem zone: 154111 pages, LIFO batch:31
   HighMem zone: 513 pages in unavailable ranges

and the oom testcase could pass.

Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>


There is memblock_is_region_reserved() (it checks whether a region intersects
reserved memory and also takes the size into account); should we add a similar
function?

> 
> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> index bb678c0ba143..2fafbbc8e73b 100644
> --- a/arch/arm/mm/init.c
> +++ b/arch/arm/mm/init.c
> @@ -138,8 +138,9 @@ int pfn_valid(unsigned long pfn)
>   	 * memory chunk there still will be a memory map entry for it
>   	 * because we round freed memory map to the pageblock boundaries
>   	 */
> -	if (memblock_is_map_memory(ALIGN(addr + 1, pageblock_size)) ||
> -	    memblock_is_map_memory(ALIGN_DOWN(addr, pageblock_size)))
> +	if (memblock_overlaps_region(&memblock.memory,
> +				     ALIGN_DOWN(addr, pageblock_size),
> +				     pageblock_size))
>   		return 1;
>   
>   	return 0;
>
Mike Rapoport May 19, 2021, 1:25 p.m. UTC | #6
On Wed, May 19, 2021 at 09:50:46AM +0800, Kefeng Wang wrote:
> 
> On 2021/5/18 23:52, Mike Rapoport wrote:
> > On Tue, May 18, 2021 at 08:49:43PM +0800, Kefeng Wang wrote:
> > > 
> > > 
> > > On 2021/5/18 17:06, Mike Rapoport wrote:
> > > > From: Mike Rapoport <rppt@linux.ibm.com>
> > > > 
> > > > When unused memory map is freed the preserved part of the memory map is
> > > > extended to match pageblock boundaries because lots of core mm
> > > > functionality relies on homogeneity of the memory map within pageblock
> > > > boundaries.
> > > > 
> > > > Since pfn_valid() is used to check whether there is a valid memory map
> > > > entry for a PFN, make it return true also for PFNs that have memory map
> > > > entries even if there is no actual memory populated there.
> > > > 
> > > > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > > > ---
> > > >    arch/arm/mm/init.c | 15 ++++++++++++++-
> > > >    1 file changed, 14 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> > > > index 9d4744a632c6..bb678c0ba143 100644
> > > > --- a/arch/arm/mm/init.c
> > > > +++ b/arch/arm/mm/init.c
> > > > @@ -125,11 +125,24 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max_low,
> > > >    int pfn_valid(unsigned long pfn)
> > > >    {
> > > >    	phys_addr_t addr = __pfn_to_phys(pfn);
> > > > +	unsigned long pageblock_size = PAGE_SIZE * pageblock_nr_pages;
> > > >    	if (__phys_to_pfn(addr) != pfn)
> > > >    		return 0;
> > > > -	return memblock_is_map_memory(addr);
> > > > +	if (memblock_is_map_memory(addr))
> > > > +		return 1;
> > > > +
> > > > +	/*
> > > > +	 * If address less than pageblock_size bytes away from a present
> > > > +	 * memory chunk there still will be a memory map entry for it
> > > > +	 * because we round freed memory map to the pageblock boundaries
> > > > +	 */
> > > > +	if (memblock_is_map_memory(ALIGN(addr + 1, pageblock_size)) ||
> > > > +	    memblock_is_map_memory(ALIGN_DOWN(addr, pageblock_size)))
> > > > +		return 1;
> > > 
> > > Hi Mike, with patch3, the system won't boot.
> > 
> > Hmm, apparently I've miscalculated the ranges...
> > 
> > Can you please check with the below patch on top of this series:
> 
> Yes, it works,
> 
> On node 0 totalpages: 311551
>   Normal zone: 1230 pages used for memmap
>   Normal zone: 0 pages reserved
>   Normal zone: 157440 pages, LIFO batch:31
>   Normal zone: 17152 pages in unavailable ranges
>   HighMem zone: 154111 pages, LIFO batch:31
>   HighMem zone: 513 pages in unavailable ranges
> 
> and the oom testcase could pass.
> 
> Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> 
> 
> There is memblock_is_region_reserved() (it checks whether a region intersects
> reserved memory and also takes the size into account); should we add a similar
> function?

We already have memblock_is_region_memory() that checks if the entire
region is a subset of memory. :(
 
Let's deal with this mess afterwards.
 
> > 
> > diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> > index bb678c0ba143..2fafbbc8e73b 100644
> > --- a/arch/arm/mm/init.c
> > +++ b/arch/arm/mm/init.c
> > @@ -138,8 +138,9 @@ int pfn_valid(unsigned long pfn)
> >   	 * memory chunk there still will be a memory map entry for it
> >   	 * because we round freed memory map to the pageblock boundaries
> >   	 */
> > -	if (memblock_is_map_memory(ALIGN(addr + 1, pageblock_size)) ||
> > -	    memblock_is_map_memory(ALIGN_DOWN(addr, pageblock_size)))
> > +	if (memblock_overlaps_region(&memblock.memory,
> > +				     ALIGN_DOWN(addr, pageblock_size),
> > +				     pageblock_size))
> >   		return 1;
> >   	return 0;
> >
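
On Kefeng's question above about a helper along the lines of
memblock_is_region_reserved(): a possible shape for such a wrapper is sketched
below. The name memblock_intersects_memory() is hypothetical, not an existing
kernel API; it only wraps memblock_overlaps_region(), which the corrected
snippet above already uses directly.

#include <linux/memblock.h>

/*
 * Hypothetical helper (name and placement are assumptions, not existing API):
 * true if [base, base + size) intersects any registered memory region,
 * mirroring what memblock_is_region_reserved() does for memblock.reserved.
 */
static inline bool memblock_intersects_memory(phys_addr_t base, phys_addr_t size)
{
	return memblock_overlaps_region(&memblock.memory, base, size);
}

pfn_valid() could then call
memblock_intersects_memory(ALIGN_DOWN(addr, pageblock_size), pageblock_size)
instead of open-coding the memblock.memory lookup.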

Patch

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 9d4744a632c6..bb678c0ba143 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -125,11 +125,24 @@  static void __init zone_sizes_init(unsigned long min, unsigned long max_low,
 int pfn_valid(unsigned long pfn)
 {
 	phys_addr_t addr = __pfn_to_phys(pfn);
+	unsigned long pageblock_size = PAGE_SIZE * pageblock_nr_pages;
 
 	if (__phys_to_pfn(addr) != pfn)
 		return 0;
 
-	return memblock_is_map_memory(addr);
+	if (memblock_is_map_memory(addr))
+		return 1;
+
+	/*
+	 * If the address is less than pageblock_size bytes away from a present
+	 * memory chunk, there will still be a memory map entry for it because
+	 * we round the freed memory map to pageblock boundaries
+	 */
+	if (memblock_is_map_memory(ALIGN(addr + 1, pageblock_size)) ||
+	    memblock_is_map_memory(ALIGN_DOWN(addr, pageblock_size)))
+		return 1;
+
+	return 0;
 }
 EXPORT_SYMBOL(pfn_valid);
 #endif