[3/3] mm: Add kernel PTE level pagetable pages account

Message ID 398ead25695e530f766849be5edafaf62c1c864d.1657096412.git.baolin.wang@linux.alibaba.com (mailing list archive)
State New, archived
Series Add PUD and kernel PTE level pagetable account

Commit Message

Baolin Wang July 6, 2022, 8:59 a.m. UTC
The kernel PTE level page tables are always protected by mm->page_table_lock
instead of the split pagetable lock, so kernel PTE level pagetable pages are
not accounted. vmalloc()/vmap() in particular can consume lots of kernel
pagetable pages, so to get accurate pagetable accounting, call the new
helpers page_{set,clear}_pgtable() when allocating or freeing a kernel PTE
level pagetable page.

Meanwhile, convert architectures to use the corresponding generic PTE
pagetable allocation and freeing functions.

Note this patch only adds accounting to page tables allocated after boot.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
---
 arch/csky/include/asm/pgalloc.h |  2 +-
 arch/microblaze/mm/pgtable.c    |  2 +-
 arch/openrisc/mm/ioremap.c      |  2 +-
 arch/x86/mm/pgtable.c           |  2 +-
 include/asm-generic/pgalloc.h   | 14 ++++++++++++--
 5 files changed, 16 insertions(+), 6 deletions(-)
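
The page_{set,clear}_pgtable() helpers are introduced earlier in this series
and are not shown in this diff. Judging from the existing NR_PAGETABLE
accounting done for user page tables, they presumably amount to something
like the following sketch (the helper names come from this patch; the bodies
below are an assumption, the real definitions live in an earlier patch of
the series):

static inline void page_set_pgtable(struct page *page)
{
	/* Sketch only: mark the page as a pagetable page and account it. */
	__SetPageTable(page);
	inc_lruvec_page_state(page, NR_PAGETABLE);
}

static inline void page_clear_pgtable(struct page *page)
{
	/* Sketch only: undo the marking and accounting on free. */
	__ClearPageTable(page);
	dec_lruvec_page_state(page, NR_PAGETABLE);
}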

Comments

Matthew Wilcox (Oracle) July 6, 2022, 3:45 p.m. UTC | #1
On Wed, Jul 06, 2022 at 04:59:17PM +0800, Baolin Wang wrote:
> The kernel PTE level page tables are always protected by mm->page_table_lock
> instead of the split pagetable lock, so kernel PTE level pagetable pages are
> not accounted. vmalloc()/vmap() in particular can consume lots of kernel
> pagetable pages, so to get accurate pagetable accounting, call the new
> helpers page_{set,clear}_pgtable() when allocating or freeing a kernel PTE
> level pagetable page.
> 
> Meanwhile, convert architectures to use the corresponding generic PTE
> pagetable allocation and freeing functions.
> 
> Note this patch only adds accounting to page tables allocated after boot.
> 
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Reported-by: kernel test robot <oliver.sang@intel.com>

What does this Reported-by: even mean?  The kernel test robot told you
that the page tables weren't being accounted?

I don't understand why we want to start accounting kernel page tables.
Can we have a *discussion* about that with a sensible thread name instead
of just trying to sneak it in as patch 3/3?
Baolin Wang July 7, 2022, 11:45 a.m. UTC | #2
On 7/6/2022 11:45 PM, Matthew Wilcox wrote:
> On Wed, Jul 06, 2022 at 04:59:17PM +0800, Baolin Wang wrote:
>> The kernel PTE level page tables are always protected by mm->page_table_lock
>> instead of the split pagetable lock, so kernel PTE level pagetable pages are
>> not accounted. vmalloc()/vmap() in particular can consume lots of kernel
>> pagetable pages, so to get accurate pagetable accounting, call the new
>> helpers page_{set,clear}_pgtable() when allocating or freeing a kernel PTE
>> level pagetable page.
>>
>> Meanwhile, convert architectures to use the corresponding generic PTE
>> pagetable allocation and freeing functions.
>>
>> Note this patch only adds accounting to page tables allocated after boot.
>>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> Reported-by: kernel test robot <oliver.sang@intel.com>
> 
> What does this Reported-by: even mean?  The kernel test robot told you
> that the page tables weren't being accounted?

I fixed an issue reported by this robot. OK, I can remove the tag.

> I don't understand why we want to start accounting kernel page tables.
> Can we have a *discussion* about that with a sensible thread name instead
> of just trying to sneak it in as patch 3/3?

I think I already replied to this in the link below [1]. The reason is that 
we should keep the accounting consistent with the PMD and PUD level 
pagetable allocations.

[1] 
https://lore.kernel.org/all/68a5286b-7ff3-2c4e-1ab2-305e7860a2f3@linux.alibaba.com/
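
For reference, user PTE and PMD level pagetable pages already get this
treatment through their ctor paths. At the time of this series,
pgtable_pmd_page_ctor() in include/linux/mm.h looked roughly like this:

static inline bool pgtable_pmd_page_ctor(struct page *page)
{
	if (!pmd_ptlock_init(page))
		return false;
	/* Mark and account the page, exactly what this patch adds for
	 * kernel PTE tables. */
	__SetPageTable(page);
	inc_lruvec_page_state(page, NR_PAGETABLE);
	return true;
}

This series extends the same NR_PAGETABLE accounting to PUD level tables
(earlier patches) and to kernel PTE level tables (this patch).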
Patch

diff --git a/arch/csky/include/asm/pgalloc.h b/arch/csky/include/asm/pgalloc.h
index 7d57e5d..56f8d25 100644
--- a/arch/csky/include/asm/pgalloc.h
+++ b/arch/csky/include/asm/pgalloc.h
@@ -29,7 +29,7 @@  static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
 	pte_t *pte;
 	unsigned long i;
 
-	pte = (pte_t *) __get_free_page(GFP_KERNEL);
+	pte = __pte_alloc_one_kernel(mm);
 	if (!pte)
 		return NULL;
 
diff --git a/arch/microblaze/mm/pgtable.c b/arch/microblaze/mm/pgtable.c
index 9f73265..e96dd1b 100644
--- a/arch/microblaze/mm/pgtable.c
+++ b/arch/microblaze/mm/pgtable.c
@@ -245,7 +245,7 @@  unsigned long iopa(unsigned long addr)
 __ref pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
 {
 	if (mem_init_done)
-		return (pte_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+		return __pte_alloc_one_kernel(mm);
 	else
 		return memblock_alloc_try_nid(PAGE_SIZE, PAGE_SIZE,
 					      MEMBLOCK_LOW_LIMIT,
diff --git a/arch/openrisc/mm/ioremap.c b/arch/openrisc/mm/ioremap.c
index daae13a..3453acc 100644
--- a/arch/openrisc/mm/ioremap.c
+++ b/arch/openrisc/mm/ioremap.c
@@ -118,7 +118,7 @@  pte_t __ref *pte_alloc_one_kernel(struct mm_struct *mm)
 	pte_t *pte;
 
 	if (likely(mem_init_done)) {
-		pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
+		pte = __pte_alloc_one_kernel(mm);
 	} else {
 		pte = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
 		if (!pte)
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index ea39670..20f3076 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -858,7 +858,7 @@  int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 	/* INVLPG to clear all paging-structure caches */
 	flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
 
-	free_page((unsigned long)pte);
+	pte_free_kernel(NULL, pte);
 
 	return 1;
 }
diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
index 8ce8d7c..cd8420f 100644
--- a/include/asm-generic/pgalloc.h
+++ b/include/asm-generic/pgalloc.h
@@ -18,7 +18,14 @@ 
  */
 static inline pte_t *__pte_alloc_one_kernel(struct mm_struct *mm)
 {
-	return (pte_t *)__get_free_page(GFP_PGTABLE_KERNEL);
+	struct page *page;
+	gfp_t gfp = GFP_PGTABLE_KERNEL;
+
+	page = alloc_pages(gfp, 0);
+	if (!page)
+		return NULL;
+	page_set_pgtable(page);
+	return (pte_t *)page_address(page);
 }
 
 #ifndef __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL
@@ -41,7 +48,10 @@  static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
  */
 static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
 {
-	free_page((unsigned long)pte);
+	struct page *page = virt_to_page(pte);
+
+	page_clear_pgtable(page);
+	__free_page(page);
 }
 
 /**
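
With these generic helpers in place, every kernel PTE table allocated
through pte_alloc_one_kernel() is marked and accounted on allocation and
unaccounted on free. A hypothetical caller pairing the two (illustration
only, not part of the patch):

	/* Allocates one page, sets PageTable, bumps NR_PAGETABLE. */
	pte_t *pte = pte_alloc_one_kernel(&init_mm);

	if (pte) {
		/* ... populate entries under mm->page_table_lock ... */

		/* Clears PageTable, drops NR_PAGETABLE, frees the page. */
		pte_free_kernel(&init_mm, pte);
	}

Note that the generic pte_free_kernel() ignores its mm argument, which is
why the x86 pmd_free_pte_page() hunk above can safely pass NULL.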