Message ID | 20210518090613.21519-4-rppt@kernel.org (mailing list archive)
---|---
State | New, archived
Series | memblock, arm: fixes for freeing of the memory map
On Tue, May 18, 2021 at 12:06:13PM +0300, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
>
> When unused memory map is freed the preserved part of the memory map is
> extended to match pageblock boundaries because lots of core mm
> functionality relies on homogeneity of the memory map within pageblock
> boundaries.
>
> Since pfn_valid() is used to check whether there is a valid memory map
> entry for a PFN, make it return true also for PFNs that have memory map
> entries even if there is no actual memory populated there.

I thought pfn_valid() was a particularly hot path... do we really want
to be doing multiple lookups here? Is there no better solution?

On Tue, May 18, 2021 at 10:44:27AM +0100, Russell King (Oracle) wrote:
> On Tue, May 18, 2021 at 12:06:13PM +0300, Mike Rapoport wrote:
> > From: Mike Rapoport <rppt@linux.ibm.com>
> >
> > When unused memory map is freed the preserved part of the memory map is
> > extended to match pageblock boundaries because lots of core mm
> > functionality relies on homogeneity of the memory map within pageblock
> > boundaries.
> >
> > Since pfn_valid() is used to check whether there is a valid memory map
> > entry for a PFN, make it return true also for PFNs that have memory map
> > entries even if there is no actual memory populated there.
>
> I thought pfn_valid() was a particularly hot path... do we really want
> to be doing multiple lookups here? Is there no better solution?

It is hot, but for more, hmm, straightforward memory layouts it'll take the

        if (memblock_is_map_memory(addr))
                return 1;

branch, I think. Most mm operations are on pages that are fed into the
buddy allocator, and if there are no holes with weird alignment,
pfn_valid() will return 1 right away.

Now thinking about it, with the patch that marks NOMAP areas reserved in
the memory map [1], we could also use

        memblock_overlaps_region(&memblock.memory,
                                 ALIGN_DOWN(addr, pageblock_size),
                                 ALIGN(addr + 1, pageblock_size))

to have only one lookup.

A completely different approach would be to simply stop freeing the memory
map with SPARSEMEM. For systems like the one Kefeng is using, it would
waste less than 2M out of 1.5G. It is worse of course for old systems with
small memories, the worst case being mach-ep93xx with a section size of
256M, where I presume 16M per section would be normal for such machines.

[1] https://lore.kernel.org/lkml/20210511100550.28178-3-rppt@kernel.org

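For reference, a minimal sketch of the single-lookup variant described above. It assumes memblock_overlaps_region() is usable from arch code and that NOMAP areas are handled as in [1]; since memblock_overlaps_region() takes a (base, size) pair rather than (start, end), the pageblock is expressed as ALIGN_DOWN(addr, pageblock_size) plus pageblock_size. This is only an illustration, not a patch that was posted in the thread:

int pfn_valid(unsigned long pfn)
{
        phys_addr_t addr = __pfn_to_phys(pfn);
        unsigned long pageblock_size = PAGE_SIZE * pageblock_nr_pages;

        if (__phys_to_pfn(addr) != pfn)
                return 0;

        /*
         * One lookup: the PFN has a memory map entry if the pageblock
         * containing it overlaps any populated memory, because the freed
         * memory map is rounded to pageblock boundaries.
         */
        return memblock_overlaps_region(&memblock.memory,
                                        ALIGN_DOWN(addr, pageblock_size),
                                        pageblock_size);
}
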
On 2021/5/18 17:06, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
>
> When unused memory map is freed the preserved part of the memory map is
> extended to match pageblock boundaries because lots of core mm
> functionality relies on homogeneity of the memory map within pageblock
> boundaries.
>
> Since pfn_valid() is used to check whether there is a valid memory map
> entry for a PFN, make it return true also for PFNs that have memory map
> entries even if there is no actual memory populated there.
>
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
>  arch/arm/mm/init.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> index 9d4744a632c6..bb678c0ba143 100644
> --- a/arch/arm/mm/init.c
> +++ b/arch/arm/mm/init.c
> @@ -125,11 +125,24 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max_low,
>  int pfn_valid(unsigned long pfn)
>  {
>  	phys_addr_t addr = __pfn_to_phys(pfn);
> +	unsigned long pageblock_size = PAGE_SIZE * pageblock_nr_pages;
>
>  	if (__phys_to_pfn(addr) != pfn)
>  		return 0;
>
> -	return memblock_is_map_memory(addr);
> +	if (memblock_is_map_memory(addr))
> +		return 1;
> +
> +	/*
> +	 * If address less than pageblock_size bytes away from a present
> +	 * memory chunk there still will be a memory map entry for it
> +	 * because we round freed memory map to the pageblock boundaries
> +	 */
> +	if (memblock_is_map_memory(ALIGN(addr + 1, pageblock_size)) ||
> +	    memblock_is_map_memory(ALIGN_DOWN(addr, pageblock_size)))
> +		return 1;

Hi Mike, with patch3, the system won't boot.

...
Initmem setup node 0 [mem 0x0000000080a00000-0x00000000ffffefff]
  Normal zone: 18176 pages in unavailable ranges
8<--- cut here ---
Unable to handle kernel paging request at virtual address 0197c000
pgd = (ptrval)
[0197c000] *pgd=00000000
Internal error: Oops: 805 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #132
Hardware name: Hisilicon A9
PC is at mmioset+0x4c/0xa8
LR is at 0x0
pc : [<c033930c>]    lr : [<00000000>]    psr: 00000293
sp : c0801d88  ip : 0197c000  fp : 000001ff
r10: 00000000  r9 : 00000001  r8 : 00000000
r7 : 00000032  r6 : 00000032  r5 : 01000000  r4 : 0197c000
r3 : 00000000  r2 : ffffffe0  r1 : 00000000  r0 : 0197c000
Flags: nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 1ac5387d  Table: 80a0404a  DAC: 55555555
Process swapper (pid: 0, stack limit = 0x(ptrval))
Stack: (0xc0801d88 to 0xc0802000)
1d80:                   000cc000 c04b5764 000cbe00 0197c000 c0885040 c04b580c
1da0: 00000000 00000000 00000001 000fffff 00000000 00000000 00000000 000b0200
1dc0: 00000001 00000000 c0880f38 c04b5d4c 000fffff 00000000 00000000 00000001
1de0: 000cc000 000dca00 00000005 c0813d28 000259ff 000259ff 00000001 c072e4c4
1e00: 0004fdff 000b0200 c050903c 00000004 c0863280 c0712ea4 00000000 ffffefff
1e20: 00000000 c0862fa0 0007f5ff 00080a00 00000000 00080a00 000fffff c0813d28
1e40: 00000000 00000001 c05b046b 00000000 fffff000 c0842780 c0801e90 00000000
1e60: fffff000 c071336c c0801e90 ffffefff 00000000 00000001 000fda00 000fffff
1e80: ffffffff 000fda00 000fffff ffffffff 00000000 c0813d28 00000000 c0881bb8
1ea0: c0881bac c0817900 c05df034 c0842780 c0818614 c086e3fc ffe00000 c0705d50
1ec0: 000b0200 000fffff 00000000 c0813d28 c0817900 00000000 ef7fa000 c07076ac
1ee0: 00000000 c0801f00 c0801f04 00000000 b0200000 00080b00 00081200 c072b000
1f00: cc000000 b0200000 00000000 00000006 fda00000 ffff1000 000afe01 00001000
1f20: 00000007 c0813d28 c0598da7 c0727be8 c08178e4 c086e1c0 c08e8058 c0008000
1f40: c0801fcc 1ac5387d c05df0f8 c07044e8 00000000 00117680 c0813d20 12c0387d
1f60: 40000193 80a01000 413fc090 1ac5387d 00000000 c0166978 c0801fac c0801fac
1f80: 00000000 c04b07e0 c0801fac c0813d28 00000080 00117680 c0813d20 12c0387d
1fa0: 40000193 80a01000 413fc090 1ac5387d 00000000 c0700940 00000000 00000000
1fc0: 00000000 00000000 00000000 c072aa64 00000000 c0813d28 00000000 00117680
1fe0: 55555555 12c0387d 40000193 80a01000 413fc090 00000000 00000000 00000000
[<c033930c>] (mmioset) from [<c04b5764>] (__init_single_page+0x50/0x7c)
[<c04b5764>] (__init_single_page) from [<c04b580c>] (init_unavailable_range+0x7c/0xec)
[<c04b580c>] (init_unavailable_range) from [<c04b5d4c>] (memmap_init+0x144/0x188)
[<c04b5d4c>] (memmap_init) from [<c0712ea4>] (free_area_init_node+0x3b8/0x54c)
[<c0712ea4>] (free_area_init_node) from [<c071336c>] (free_area_init+0x214/0x5a4)
[<c071336c>] (free_area_init) from [<c0705d50>] (bootmem_init+0x78/0xac)
[<c0705d50>] (bootmem_init) from [<c07076ac>] (paging_init+0x410/0x6bc)
[<c07076ac>] (paging_init) from [<c07044e8>] (setup_arch+0x10c/0x5c4)
[<c07044e8>] (setup_arch) from [<c0700940>] (start_kernel+0x60/0x5c0)
[<c0700940>] (start_kernel) from [<00000000>] (0x0)
Code: 0a41aca8 f9ffffca 0081bd08 200012e3 (0a41ac18)

On Tue, May 18, 2021 at 08:49:43PM +0800, Kefeng Wang wrote:
> 
> 
> On 2021/5/18 17:06, Mike Rapoport wrote:
> > From: Mike Rapoport <rppt@linux.ibm.com>
> >
> > When unused memory map is freed the preserved part of the memory map is
> > extended to match pageblock boundaries because lots of core mm
> > functionality relies on homogeneity of the memory map within pageblock
> > boundaries.
> >
> > Since pfn_valid() is used to check whether there is a valid memory map
> > entry for a PFN, make it return true also for PFNs that have memory map
> > entries even if there is no actual memory populated there.
> >
> > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > ---
> >  arch/arm/mm/init.c | 15 ++++++++++++++-
> >  1 file changed, 14 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> > index 9d4744a632c6..bb678c0ba143 100644
> > --- a/arch/arm/mm/init.c
> > +++ b/arch/arm/mm/init.c
> > @@ -125,11 +125,24 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max_low,
> >  int pfn_valid(unsigned long pfn)
> >  {
> >  	phys_addr_t addr = __pfn_to_phys(pfn);
> > +	unsigned long pageblock_size = PAGE_SIZE * pageblock_nr_pages;
> >
> >  	if (__phys_to_pfn(addr) != pfn)
> >  		return 0;
> >
> > -	return memblock_is_map_memory(addr);
> > +	if (memblock_is_map_memory(addr))
> > +		return 1;
> > +
> > +	/*
> > +	 * If address less than pageblock_size bytes away from a present
> > +	 * memory chunk there still will be a memory map entry for it
> > +	 * because we round freed memory map to the pageblock boundaries
> > +	 */
> > +	if (memblock_is_map_memory(ALIGN(addr + 1, pageblock_size)) ||
> > +	    memblock_is_map_memory(ALIGN_DOWN(addr, pageblock_size)))
> > +		return 1;
> 
> Hi Mike, with patch3, the system won't boot.

Hmm, apparently I've miscalculated the ranges...

Can you please check with the below patch on top of this series:

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index bb678c0ba143..2fafbbc8e73b 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -138,8 +138,9 @@ int pfn_valid(unsigned long pfn)
 	 * memory chunk there still will be a memory map entry for it
 	 * because we round freed memory map to the pageblock boundaries
 	 */
-	if (memblock_is_map_memory(ALIGN(addr + 1, pageblock_size)) ||
-	    memblock_is_map_memory(ALIGN_DOWN(addr, pageblock_size)))
+	if (memblock_overlaps_region(&memblock.memory,
+				     ALIGN_DOWN(addr, pageblock_size),
+				     pageblock_size))
 		return 1;
 
 	return 0;

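One way the original boundary checks can misfire, shown with illustrative numbers (a made-up layout, not necessarily the one on the board above): when a memory bank starts exactly on a pageblock boundary, ALIGN(addr + 1, pageblock_size) for an address in the empty pageblock just below the bank lands on the first byte of the bank, so the old check reports the PFN as valid even though its own pageblock has no memory map at all. A small userspace sketch of the arithmetic:

/* Illustrative arithmetic only; PB and the addresses are made up. */
#include <stdio.h>
#include <stdint.h>

#define PB			0x400000ULL		/* 4 MiB pageblock */
#define ALIGN(x, a)		(((x) + (a) - 1) & ~((a) - 1))
#define ALIGN_DOWN(x, a)	((x) & ~((a) - 1))

int main(void)
{
	uint64_t mem_start = 0xc0000000ULL; /* bank starts on a pageblock boundary */
	uint64_t addr = 0xbfe00000ULL;      /* PFN in the empty pageblock below it */

	/* Old probe: lands on mem_start, so memblock_is_map_memory() would say yes. */
	printf("ALIGN(addr + 1, PB)   = %#llx\n",
	       (unsigned long long)ALIGN(addr + 1, PB));
	printf("... equals mem_start? %s\n",
	       ALIGN(addr + 1, PB) == mem_start ? "yes" : "no");

	/* But addr's own pageblock holds no memory, hence no memory map entry. */
	printf("addr's pageblock      = [%#llx, %#llx)\n",
	       (unsigned long long)ALIGN_DOWN(addr, PB),
	       (unsigned long long)(ALIGN_DOWN(addr, PB) + PB));
	return 0;
}
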
On 2021/5/18 23:52, Mike Rapoport wrote:
> On Tue, May 18, 2021 at 08:49:43PM +0800, Kefeng Wang wrote:
>>
>>
>> On 2021/5/18 17:06, Mike Rapoport wrote:
>>> From: Mike Rapoport <rppt@linux.ibm.com>
>>>
>>> When unused memory map is freed the preserved part of the memory map is
>>> extended to match pageblock boundaries because lots of core mm
>>> functionality relies on homogeneity of the memory map within pageblock
>>> boundaries.
>>>
>>> Since pfn_valid() is used to check whether there is a valid memory map
>>> entry for a PFN, make it return true also for PFNs that have memory map
>>> entries even if there is no actual memory populated there.
>>>
>>> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
>>> ---
>>>  arch/arm/mm/init.c | 15 ++++++++++++++-
>>>  1 file changed, 14 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
>>> index 9d4744a632c6..bb678c0ba143 100644
>>> --- a/arch/arm/mm/init.c
>>> +++ b/arch/arm/mm/init.c
>>> @@ -125,11 +125,24 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max_low,
>>>  int pfn_valid(unsigned long pfn)
>>>  {
>>>  	phys_addr_t addr = __pfn_to_phys(pfn);
>>> +	unsigned long pageblock_size = PAGE_SIZE * pageblock_nr_pages;
>>>
>>>  	if (__phys_to_pfn(addr) != pfn)
>>>  		return 0;
>>>
>>> -	return memblock_is_map_memory(addr);
>>> +	if (memblock_is_map_memory(addr))
>>> +		return 1;
>>> +
>>> +	/*
>>> +	 * If address less than pageblock_size bytes away from a present
>>> +	 * memory chunk there still will be a memory map entry for it
>>> +	 * because we round freed memory map to the pageblock boundaries
>>> +	 */
>>> +	if (memblock_is_map_memory(ALIGN(addr + 1, pageblock_size)) ||
>>> +	    memblock_is_map_memory(ALIGN_DOWN(addr, pageblock_size)))
>>> +		return 1;
>>
>> Hi Mike, with patch3, the system won't boot.
>
> Hmm, apparently I've miscalculated the ranges...
>
> Can you please check with the below patch on top of this series:

Yes, it works,

On node 0 totalpages: 311551
  Normal zone: 1230 pages used for memmap
  Normal zone: 0 pages reserved
  Normal zone: 157440 pages, LIFO batch:31
  Normal zone: 17152 pages in unavailable ranges
  HighMem zone: 154111 pages, LIFO batch:31
  HighMem zone: 513 pages in unavailable ranges

and the oom testcase could pass.

Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>

There is memblock_is_region_reserved (check if a region intersects reserved
memory), it also checks the size, should we add a similar func?

>
> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> index bb678c0ba143..2fafbbc8e73b 100644
> --- a/arch/arm/mm/init.c
> +++ b/arch/arm/mm/init.c
> @@ -138,8 +138,9 @@ int pfn_valid(unsigned long pfn)
>  	 * memory chunk there still will be a memory map entry for it
>  	 * because we round freed memory map to the pageblock boundaries
>  	 */
> -	if (memblock_is_map_memory(ALIGN(addr + 1, pageblock_size)) ||
> -	    memblock_is_map_memory(ALIGN_DOWN(addr, pageblock_size)))
> +	if (memblock_overlaps_region(&memblock.memory,
> +				     ALIGN_DOWN(addr, pageblock_size),
> +				     pageblock_size))
>  		return 1;
> 
>  	return 0;
>

On Wed, May 19, 2021 at 09:50:46AM +0800, Kefeng Wang wrote:
> 
> On 2021/5/18 23:52, Mike Rapoport wrote:
> > On Tue, May 18, 2021 at 08:49:43PM +0800, Kefeng Wang wrote:
> > > 
> > > On 2021/5/18 17:06, Mike Rapoport wrote:
> > > > From: Mike Rapoport <rppt@linux.ibm.com>
> > > >
> > > > When unused memory map is freed the preserved part of the memory map is
> > > > extended to match pageblock boundaries because lots of core mm
> > > > functionality relies on homogeneity of the memory map within pageblock
> > > > boundaries.
> > > >
> > > > Since pfn_valid() is used to check whether there is a valid memory map
> > > > entry for a PFN, make it return true also for PFNs that have memory map
> > > > entries even if there is no actual memory populated there.
> > > >
> > > > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > > > ---
> > > >  arch/arm/mm/init.c | 15 ++++++++++++++-
> > > >  1 file changed, 14 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> > > > index 9d4744a632c6..bb678c0ba143 100644
> > > > --- a/arch/arm/mm/init.c
> > > > +++ b/arch/arm/mm/init.c
> > > > @@ -125,11 +125,24 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max_low,
> > > >  int pfn_valid(unsigned long pfn)
> > > >  {
> > > >  	phys_addr_t addr = __pfn_to_phys(pfn);
> > > > +	unsigned long pageblock_size = PAGE_SIZE * pageblock_nr_pages;
> > > >
> > > >  	if (__phys_to_pfn(addr) != pfn)
> > > >  		return 0;
> > > >
> > > > -	return memblock_is_map_memory(addr);
> > > > +	if (memblock_is_map_memory(addr))
> > > > +		return 1;
> > > > +
> > > > +	/*
> > > > +	 * If address less than pageblock_size bytes away from a present
> > > > +	 * memory chunk there still will be a memory map entry for it
> > > > +	 * because we round freed memory map to the pageblock boundaries
> > > > +	 */
> > > > +	if (memblock_is_map_memory(ALIGN(addr + 1, pageblock_size)) ||
> > > > +	    memblock_is_map_memory(ALIGN_DOWN(addr, pageblock_size)))
> > > > +		return 1;
> > > 
> > > Hi Mike, with patch3, the system won't boot.
> > 
> > Hmm, apparently I've miscalculated the ranges...
> > 
> > Can you please check with the below patch on top of this series:
> 
> Yes, it works,
> 
> On node 0 totalpages: 311551
>   Normal zone: 1230 pages used for memmap
>   Normal zone: 0 pages reserved
>   Normal zone: 157440 pages, LIFO batch:31
>   Normal zone: 17152 pages in unavailable ranges
>   HighMem zone: 154111 pages, LIFO batch:31
>   HighMem zone: 513 pages in unavailable ranges
> 
> and the oom testcase could pass.
> 
> Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> 
> There is memblock_is_region_reserved (check if a region intersects reserved
> memory), it also checks the size, should we add a similar func?

We already have memblock_is_region_memory() that checks if the entire
region is a subset of memory. :(

Let's deal with this mess afterwards.

> > 
> > diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> > index bb678c0ba143..2fafbbc8e73b 100644
> > --- a/arch/arm/mm/init.c
> > +++ b/arch/arm/mm/init.c
> > @@ -138,8 +138,9 @@ int pfn_valid(unsigned long pfn)
> >  	 * memory chunk there still will be a memory map entry for it
> >  	 * because we round freed memory map to the pageblock boundaries
> >  	 */
> > -	if (memblock_is_map_memory(ALIGN(addr + 1, pageblock_size)) ||
> > -	    memblock_is_map_memory(ALIGN_DOWN(addr, pageblock_size)))
> > +	if (memblock_overlaps_region(&memblock.memory,
> > +				     ALIGN_DOWN(addr, pageblock_size),
> > +				     pageblock_size))
> >  		return 1;
> > 
> >  	return 0;
> > 

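On the naming question above: memblock_is_region_memory() treats a range as valid only when it lies entirely inside a single memory region, while memblock_is_region_reserved() only asks whether the range intersects reserved memory at all. A hypothetical intersection helper for memblock.memory, of the kind suggested above, could simply mirror the reserved-memory one; the name below is illustrative only and no such helper is assumed to exist:

/* Hypothetical helper, mirroring memblock_is_region_reserved(): true if
 * [base, base + size) intersects any region in memblock.memory. */
static inline bool memblock_intersects_memory(phys_addr_t base,
					      phys_addr_t size)
{
	return memblock_overlaps_region(&memblock.memory, base, size);
}
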
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 9d4744a632c6..bb678c0ba143 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -125,11 +125,24 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max_low,
 int pfn_valid(unsigned long pfn)
 {
 	phys_addr_t addr = __pfn_to_phys(pfn);
+	unsigned long pageblock_size = PAGE_SIZE * pageblock_nr_pages;
 
 	if (__phys_to_pfn(addr) != pfn)
 		return 0;
 
-	return memblock_is_map_memory(addr);
+	if (memblock_is_map_memory(addr))
+		return 1;
+
+	/*
+	 * If address less than pageblock_size bytes away from a present
+	 * memory chunk there still will be a memory map entry for it
+	 * because we round freed memory map to the pageblock boundaries
+	 */
+	if (memblock_is_map_memory(ALIGN(addr + 1, pageblock_size)) ||
+	    memblock_is_map_memory(ALIGN_DOWN(addr, pageblock_size)))
+		return 1;
+
+	return 0;
 }
 EXPORT_SYMBOL(pfn_valid);
 #endif
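Putting the thread together, pfn_valid() with the follow-up fix from the discussion folded in would read roughly as below; this is a reconstruction for reading convenience, not necessarily the exact version that was eventually committed:

int pfn_valid(unsigned long pfn)
{
	phys_addr_t addr = __pfn_to_phys(pfn);
	unsigned long pageblock_size = PAGE_SIZE * pageblock_nr_pages;

	if (__phys_to_pfn(addr) != pfn)
		return 0;

	/* Fast path: the PFN is backed by populated, mapped memory. */
	if (memblock_is_map_memory(addr))
		return 1;

	/*
	 * The freed memory map is rounded to pageblock boundaries, so a
	 * memory map entry still exists for any PFN whose pageblock
	 * overlaps populated memory.
	 */
	if (memblock_overlaps_region(&memblock.memory,
				     ALIGN_DOWN(addr, pageblock_size),
				     pageblock_size))
		return 1;

	return 0;
}
EXPORT_SYMBOL(pfn_valid);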