Message ID | 20210202112450.11932-3-osalvador@suse.de (mailing list archive)
---|---
State | New, archived
Series | Cleanup and fixups for vmemmap handling
> @@ -1088,10 +1150,10 @@ remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
>  			pages++;
>  		} else {
>  			/* If here, we are freeing vmemmap pages. */
> -			memset((void *)addr, PAGE_INUSE, next - addr);
> +			memset((void *)addr, PAGE_UNUSED, next - addr);
>
>  			page_addr = page_address(pud_page(*pud));
> -			if (!memchr_inv(page_addr, PAGE_INUSE,
> +			if (!memchr_inv(page_addr, PAGE_UNUSED,
>  					PUD_SIZE)) {
>  				free_pagetable(pud_page(*pud),
>  					       get_order(PUD_SIZE));

I'm sorry to bother you again, but isn't that dead code as well? How do we
ever end up using 1GB pages for the vmemmap? At least not via
vmemmap_populate() - so I guess never? There are not many occurrences of
"PUD_SIZE" in the file after all ...

I think we can simplify that code.
On Tue, Feb 02, 2021 at 02:29:11PM +0100, David Hildenbrand wrote:
> > @@ -1088,10 +1150,10 @@ remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
> >  			pages++;
> >  		} else {
> >  			/* If here, we are freeing vmemmap pages. */
> > -			memset((void *)addr, PAGE_INUSE, next - addr);
> > +			memset((void *)addr, PAGE_UNUSED, next - addr);
> >  			page_addr = page_address(pud_page(*pud));
> > -			if (!memchr_inv(page_addr, PAGE_INUSE,
> > +			if (!memchr_inv(page_addr, PAGE_UNUSED,
> >  					PUD_SIZE)) {
> >  				free_pagetable(pud_page(*pud),
> >  					       get_order(PUD_SIZE));
>
> I'm sorry to bother you again, but isn't that dead code as well?

Heh, I spotted that earlier, but I did not think much of it honestly.
All this was introduced by:

commit ae9aae9eda2db71bf4b592f15618b0160eb07731
Author: Wen Congyang <wency@cn.fujitsu.com>
Date:   Fri Feb 22 16:33:04 2013 -0800

    memory-hotplug: common APIs to support page tables hot-remove

> How do we ever end up using 1GB pages for the vmemmap? At least not via
> vmemmap_populate() - so I guess never? There are not many occurrences of
> "PUD_SIZE" in the file after all ...

AFAICT, we don't. The largest we populate for vmemmap is 2MB.
I see init_memory_mapping can use 1G, but that should not affect us.

I guess that the vmemmap handling for 1GB can go as well.
I will update the patchset.
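[Editor's note] For illustration only: assuming the conclusion above holds and the vmemmap is never backed by 1GB pages, the pud_large() branch in remove_pud_table() only ever sees direct-mapping pages, so its vmemmap-freeing else-part could simply be dropped. A rough sketch of what would remain, not the actual follow-up patch:

	if (pud_large(*pud)) {
		if (IS_ALIGNED(addr, PUD_SIZE) &&
		    IS_ALIGNED(next, PUD_SIZE)) {
			if (!direct)
				free_pagetable(pud_page(*pud),
					       get_order(PUD_SIZE));

			spin_lock(&init_mm.page_table_lock);
			pud_clear(pud);
			spin_unlock(&init_mm.page_table_lock);
			pages++;
		}
		/* No else branch: the vmemmap never uses 1GB mappings here. */
		continue;
	}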
Hi Oscar,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/x86/mm]
[also build test ERROR on hnaz-linux-mm/master v5.11-rc6 next-20210125]
[cannot apply to luto/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Oscar-Salvador/Cleanup-and-fixups-for-vmemmap-handling/20210202-192636
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 167dcfc08b0b1f964ea95d410aa496fd78adf475
config: x86_64-randconfig-r004-20210202 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 275c6af7d7f1ed63a03d05b4484413e447133269)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://github.com/0day-ci/linux/commit/2995155f4651bbb8c0d5f2e58e6e77321c5a889a
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Oscar-Salvador/Cleanup-and-fixups-for-vmemmap-handling/20210202-192636
        git checkout 2995155f4651bbb8c0d5f2e58e6e77321c5a889a
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> arch/x86/mm/init_64.c:1588:6: error: implicit declaration of function 'vmemmap_use_new_sub_pmd' [-Werror,-Wimplicit-function-declaration]
                           vmemmap_use_new_sub_pmd(addr, next);
                           ^
>> arch/x86/mm/init_64.c:1594:4: error: implicit declaration of function 'vmemmap_use_sub_pmd' [-Werror,-Wimplicit-function-declaration]
                   vmemmap_use_sub_pmd(addr, next);
                   ^
   2 errors generated.
vim +/vmemmap_use_new_sub_pmd +1588 arch/x86/mm/init_64.c

  1535	
  1536	static int __meminit vmemmap_populate_hugepages(unsigned long start,
  1537			unsigned long end, int node, struct vmem_altmap *altmap)
  1538	{
  1539		unsigned long addr;
  1540		unsigned long next;
  1541		pgd_t *pgd;
  1542		p4d_t *p4d;
  1543		pud_t *pud;
  1544		pmd_t *pmd;
  1545	
  1546		for (addr = start; addr < end; addr = next) {
  1547			next = pmd_addr_end(addr, end);
  1548	
  1549			pgd = vmemmap_pgd_populate(addr, node);
  1550			if (!pgd)
  1551				return -ENOMEM;
  1552	
  1553			p4d = vmemmap_p4d_populate(pgd, addr, node);
  1554			if (!p4d)
  1555				return -ENOMEM;
  1556	
  1557			pud = vmemmap_pud_populate(p4d, addr, node);
  1558			if (!pud)
  1559				return -ENOMEM;
  1560	
  1561			pmd = pmd_offset(pud, addr);
  1562			if (pmd_none(*pmd)) {
  1563				void *p;
  1564	
  1565				p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
  1566				if (p) {
  1567					pte_t entry;
  1568	
  1569					entry = pfn_pte(__pa(p) >> PAGE_SHIFT,
  1570							PAGE_KERNEL_LARGE);
  1571					set_pmd(pmd, __pmd(pte_val(entry)));
  1572	
  1573					/* check to see if we have contiguous blocks */
  1574					if (p_end != p || node_start != node) {
  1575						if (p_start)
  1576							pr_debug(" [%lx-%lx] PMD -> [%p-%p] on node %d\n",
  1577								addr_start, addr_end-1, p_start, p_end-1, node_start);
  1578						addr_start = addr;
  1579						node_start = node;
  1580						p_start = p;
  1581					}
  1582	
  1583					addr_end = addr + PMD_SIZE;
  1584					p_end = p + PMD_SIZE;
  1585	
  1586					if (!IS_ALIGNED(addr, PMD_SIZE) ||
  1587					    !IS_ALIGNED(next, PMD_SIZE))
> 1588						vmemmap_use_new_sub_pmd(addr, next);
  1589					continue;
  1590				} else if (altmap)
  1591					return -ENOMEM; /* no fallback */
  1592			} else if (pmd_large(*pmd)) {
  1593				vmemmap_verify((pte_t *)pmd, node, addr, next);
> 1594				vmemmap_use_sub_pmd(addr, next);
  1595				continue;
  1596			}
  1597			if (vmemmap_populate_basepages(addr, next, node, NULL))
  1598				return -ENOMEM;
  1599		}
  1600		return 0;
  1601	}
  1602	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
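[Editor's note] A plausible reading of the robot report, not something it states: the new helpers are added in the CONFIG_MEMORY_HOTPLUG region of init_64.c, while their caller, vmemmap_populate_hugepages(), is built whenever CONFIG_SPARSEMEM_VMEMMAP is enabled, so a randconfig with hotplug disabled would hit exactly these implicit-declaration errors. A hypothetical sketch of one way to address that is to guard the helpers by the same option as their caller:

#ifdef CONFIG_SPARSEMEM_VMEMMAP
#define PAGE_UNUSED 0xFD

/* Tail of the last partially populated vmemmap PMD, if any. */
static unsigned long unused_pmd_start __meminitdata;

/*
 * vmemmap_flush_unused_pmd(), vmemmap_unuse_sub_pmd(),
 * vmemmap_use_sub_pmd() and vmemmap_use_new_sub_pmd() from the patch
 * would live here, so they are compiled whenever
 * vmemmap_populate_hugepages() is.
 */
#endif /* CONFIG_SPARSEMEM_VMEMMAP */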
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 4cfa902ec861..b239708e504e 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -871,7 +871,72 @@ int arch_add_memory(int nid, u64 start, u64 size,
 	return add_pages(nid, start_pfn, nr_pages, params);
 }
 
-#define PAGE_INUSE 0xFD
+#define PAGE_UNUSED 0xFD
+
+/*
+ * The unused vmemmap range, which was not yet memset(PAGE_UNUSED) ranges
+ * from unused_pmd_start to next PMD_SIZE boundary.
+ */
+static unsigned long unused_pmd_start __meminitdata;
+
+static void __meminit vmemmap_flush_unused_pmd(void)
+{
+	if (!unused_pmd_start)
+		return;
+	/*
+	 * Clears (unused_pmd_start, PMD_END]
+	 */
+	memset((void *)unused_pmd_start, PAGE_UNUSED,
+	       ALIGN(unused_pmd_start, PMD_SIZE) - unused_pmd_start);
+	unused_pmd_start = 0;
+}
+
+/* Returns true if the PMD is completely unused and thus it can be freed */
+static bool __meminit vmemmap_unuse_sub_pmd(unsigned long addr, unsigned long end)
+{
+	unsigned long start = ALIGN_DOWN(addr, PMD_SIZE);
+
+	vmemmap_flush_unused_pmd();
+	memset((void *)addr, PAGE_UNUSED, end - addr);
+
+	return !memchr_inv((void *)start, PAGE_UNUSED, PMD_SIZE);
+}
+
+static void __meminit vmemmap_use_sub_pmd(unsigned long start, unsigned long end)
+{
+	/*
+	 * We only optimize if the new used range directly follows the
+	 * previously unused range (esp., when populating consecutive sections).
+	 */
+	if (unused_pmd_start == start) {
+		if (likely(IS_ALIGNED(end, PMD_SIZE)))
+			unused_pmd_start = 0;
+		else
+			unused_pmd_start = end;
+		return;
+	}
+
+	vmemmap_flush_unused_pmd();
+}
+
+static void __meminit vmemmap_use_new_sub_pmd(unsigned long start, unsigned long end)
+{
+	vmemmap_flush_unused_pmd();
+
+	/*
+	 * Mark the unused parts of the new memmap range
+	 */
+	if (!IS_ALIGNED(start, PMD_SIZE))
+		memset((void *)start, PAGE_UNUSED,
+		       start - ALIGN_DOWN(start, PMD_SIZE));
+	/*
+	 * We want to avoid memset(PAGE_UNUSED) when populating the vmemmap of
+	 * consecutive sections. Remember for the last added PMD the last
+	 * unused range in the populated PMD.
+	 */
+	if (!IS_ALIGNED(end, PMD_SIZE))
+		unused_pmd_start = end;
+}
 
 static void __meminit free_pagetable(struct page *page, int order)
 {
@@ -1010,7 +1075,6 @@ remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end,
 	unsigned long next, pages = 0;
 	pte_t *pte_base;
 	pmd_t *pmd;
-	void *page_addr;
 
 	pmd = pmd_start + pmd_index(addr);
 	for (; addr < end; addr = next, pmd++) {
@@ -1031,12 +1095,10 @@ remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end,
 				spin_unlock(&init_mm.page_table_lock);
 				pages++;
 			} else {
-				/* If here, we are freeing vmemmap pages. */
-				memset((void *)addr, PAGE_INUSE, next - addr);
-
-				page_addr = page_address(pmd_page(*pmd));
-				if (!memchr_inv(page_addr, PAGE_INUSE,
-						PMD_SIZE)) {
+				/*
+				 * Free the PMD if the whole range is unused.
+				 */
+				if (vmemmap_unuse_sub_pmd(addr, next)) {
 					free_hugepage_table(pmd_page(*pmd),
 							    altmap);
 
@@ -1088,10 +1150,10 @@ remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
 				pages++;
 			} else {
 				/* If here, we are freeing vmemmap pages. */
-				memset((void *)addr, PAGE_INUSE, next - addr);
+				memset((void *)addr, PAGE_UNUSED, next - addr);
 
 				page_addr = page_address(pud_page(*pud));
-				if (!memchr_inv(page_addr, PAGE_INUSE,
+				if (!memchr_inv(page_addr, PAGE_UNUSED,
 						PUD_SIZE)) {
 					free_pagetable(pud_page(*pud),
 						       get_order(PUD_SIZE));
@@ -1520,11 +1582,16 @@ static int __meminit vmemmap_populate_hugepages(unsigned long start,
 
 				addr_end = addr + PMD_SIZE;
 				p_end = p + PMD_SIZE;
+
+				if (!IS_ALIGNED(addr, PMD_SIZE) ||
+				    !IS_ALIGNED(next, PMD_SIZE))
+					vmemmap_use_new_sub_pmd(addr, next);
 				continue;
 			} else if (altmap)
 				return -ENOMEM; /* no fallback */
 		} else if (pmd_large(*pmd)) {
 			vmemmap_verify((pte_t *)pmd, node, addr, next);
+			vmemmap_use_sub_pmd(addr, next);
 			continue;
 		}
 		if (vmemmap_populate_basepages(addr, next, node, NULL))
When the size of a struct page is not a multiple of 2MB, sections no longer
span a full PMD, so some parts of the PMD remain unused after populating
them. Because of this, PMDs are left behind when depopulating sections,
since remove_pmd_table() thinks those unused parts are still in use.

Fix this by marking the unused parts with PAGE_UNUSED, so memchr_inv() does
the right thing and lets us free the PMD once its last user is gone.

This patch is based on a similar patch by David Hildenbrand:

https://lore.kernel.org/linux-mm/20200722094558.9828-9-david@redhat.com/
https://lore.kernel.org/linux-mm/20200722094558.9828-10-david@redhat.com/

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 arch/x86/mm/init_64.c | 87 ++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 77 insertions(+), 10 deletions(-)
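[Editor's note] To make the mechanism in the changelog concrete, here is a minimal standalone userspace toy, not kernel code (PMD_SIZE is scaled down and pmd_fully_unused() stands in for memchr_inv()), showing how filling unused sub-ranges with a magic byte lets a simple scan decide when the whole range can be freed:

/*
 * Userspace illustration of the PAGE_UNUSED idea: unused sub-ranges of a
 * "PMD" are filled with a magic byte, and the range may be freed once no
 * byte differs from that magic.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>

#define PMD_SIZE    4096        /* stand-in for the real 2MB PMD */
#define PAGE_UNUSED 0xFD

/* Rough equivalent of !memchr_inv(pmd, PAGE_UNUSED, PMD_SIZE). */
static bool pmd_fully_unused(const unsigned char *pmd)
{
	for (size_t i = 0; i < PMD_SIZE; i++)
		if (pmd[i] != PAGE_UNUSED)
			return false;
	return true;
}

int main(void)
{
	unsigned char *pmd = malloc(PMD_SIZE);

	/* "Populate" the first half of the PMD with live memmap data ... */
	memset(pmd, 0x00, PMD_SIZE / 2);
	/* ... and mark the rest unused, as vmemmap_use_new_sub_pmd() would. */
	memset(pmd + PMD_SIZE / 2, PAGE_UNUSED, PMD_SIZE / 2);
	printf("after populate: fully unused? %d\n", pmd_fully_unused(pmd));

	/* Depopulate the live half, as vmemmap_unuse_sub_pmd() would. */
	memset(pmd, PAGE_UNUSED, PMD_SIZE / 2);
	printf("after depopulate: fully unused? %d\n", pmd_fully_unused(pmd));

	free(pmd);
	return 0;
}

Built with any C compiler, it prints 0 and then 1, mirroring how the patch only reports the PMD as reclaimable once every byte in it carries PAGE_UNUSED.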