diff mbox series

[v2,6/6] mm: Introduce ctor/dtor at PGD level

Message ID 20250103184415.2744423-7-kevin.brodsky@arm.com (mailing list archive)
State Handled Elsewhere
Headers show
Series Account page tables at all levels | expand

Commit Message

Kevin Brodsky Jan. 3, 2025, 6:44 p.m. UTC
Following on from the introduction of P4D-level ctor/dtor, let's
finish the job and introduce ctor/dtor at PGD level. The incurred
improvement in page accounting is minimal - the main motivation is
to create a single, generic place where construction/destruction
hooks can be added for all page table pages.

This patch should cover all architectures and all configurations
where PGDs are one or more regular pages. This excludes any
configuration where PGDs are allocated from a kmem_cache object.

Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
---
 arch/m68k/include/asm/mcf_pgalloc.h | 3 ++-
 arch/m68k/mm/motorola.c             | 6 ++++--
 arch/s390/include/asm/pgalloc.h     | 9 ++++++++-
 include/asm-generic/pgalloc.h       | 3 ++-
 include/linux/mm.h                  | 5 +++++
 5 files changed, 21 insertions(+), 5 deletions(-)

Comments

Alexander Gordeev Jan. 21, 2025, 4:37 p.m. UTC | #1
On Fri, Jan 03, 2025 at 06:44:15PM +0000, Kevin Brodsky wrote:

Hi Kevin,
...
> diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h
> index 5fced6d3c36b..b19b6ed2ab53 100644
> --- a/arch/s390/include/asm/pgalloc.h
> +++ b/arch/s390/include/asm/pgalloc.h
> @@ -130,11 +130,18 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
>  
>  static inline pgd_t *pgd_alloc(struct mm_struct *mm)
>  {
> -	return (pgd_t *) crst_table_alloc(mm);
> +	unsigned long *table = crst_table_alloc(mm);
> +
> +	if (!table)
> +		return NULL;

I do not know status of this series, but FWIW, this call is missed:

	crst_table_init(table, _REGION1_ENTRY_EMPTY); 

> +	pagetable_pgd_ctor(virt_to_ptdesc(table));
> +
> +	return (pgd_t *) table;
>  }
>  
>  static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
>  {
> +	pagetable_dtor(virt_to_ptdesc(pgd));
>  	crst_table_free(mm, (unsigned long *) pgd);
>  }

...

Thanks!
Heiko Carstens Jan. 22, 2025, 7:49 a.m. UTC | #2
On Tue, Jan 21, 2025 at 05:37:33PM +0100, Alexander Gordeev wrote:
> On Fri, Jan 03, 2025 at 06:44:15PM +0000, Kevin Brodsky wrote:
> 
> Hi Kevin,
> ...
> > diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h
> > index 5fced6d3c36b..b19b6ed2ab53 100644
> > --- a/arch/s390/include/asm/pgalloc.h
> > +++ b/arch/s390/include/asm/pgalloc.h
> > @@ -130,11 +130,18 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
> >  
> >  static inline pgd_t *pgd_alloc(struct mm_struct *mm)
> >  {
> > -	return (pgd_t *) crst_table_alloc(mm);
> > +	unsigned long *table = crst_table_alloc(mm);
> > +
> > +	if (!table)
> > +		return NULL;
> 
> I do not know status of this series, but FWIW, this call is missed:
> 
> 	crst_table_init(table, _REGION1_ENTRY_EMPTY); 

Why is that missing?

A pgd table can be a Region1, Region2, or Region3 table. The only caller of
this function is mm_init() via mm_alloc_pgd(); and right after mm_alloc_pgd()
there is a call to init_new_context() which will initialize the pgd correctly.

I guess what really gets odd, and might be broken (haven't checked yet) is
what happens on dynamic upgrade of page table levels (->crst_table_upgrade()).

With that a pgd may become a pud, and with that we get an imbalance with
the ctor/dtor calls for the various page table levels when they get freed
again. Plus, at first glance, it looks also broken that we have open-coded
crst_alloc() calls instead of using the "proper" page table allocation API
within crst_table_upgrade(), which again would cause an imbalance.
Alexander Gordeev Jan. 22, 2025, 2:06 p.m. UTC | #3
On Wed, Jan 22, 2025 at 08:49:54AM +0100, Heiko Carstens wrote:
> > >  static inline pgd_t *pgd_alloc(struct mm_struct *mm)
> > >  {
> > > -	return (pgd_t *) crst_table_alloc(mm);
> > > +	unsigned long *table = crst_table_alloc(mm);
> > > +
> > > +	if (!table)
> > > +		return NULL;
> > 
> > I do not know status of this series, but FWIW, this call is missed:
> > 
> > 	crst_table_init(table, _REGION1_ENTRY_EMPTY); 
> 
> Why is that missing?

Because the follow-up pagetable_pgd_ctor() is called against uninitialized
page table, while other pagetable_pXd_ctor() variants are called against
initialized one. I could imagine complications as result of that.

Whether Region1 table is the right choice is a big question though, as you
noticed below.

> A pgd table can be a Region1, Region2, or Region3 table. The only caller of
> this function is mm_init() via mm_alloc_pgd(); and right after mm_alloc_pgd()
> there is a call to init_new_context() which will initialize the pgd correctly.

init_new_context() is in a way a constructor as well, so whole thing looks odd
to me. But I do not immedeately see a better way :(

> I guess what really gets odd, and might be broken (haven't checked yet) is
> what happens on dynamic upgrade of page table levels (->crst_table_upgrade()).

Hmm, that is a good point.

> With that a pgd may become a pud, and with that we get an imbalance with
> the ctor/dtor calls for the various page table levels when they get freed

The ctor/dtor mismatch should not be a problem, as pagetable_pgd|p4d|pud_ctor()
are the same and there is one pagetable_dtor() for all top levels as of now.
But if it ever comes to separate implementations, then we are in the world
of pain.

> again. Plus, at first glance, it looks also broken that we have open-coded
> crst_alloc() calls instead of using the "proper" page table allocation API
> within crst_table_upgrade(), which again would cause an imbalance.

This is a good point too.

Many thanks!
Alexander Gordeev Jan. 22, 2025, 9:16 p.m. UTC | #4
On Wed, Jan 22, 2025 at 03:06:05PM +0100, Alexander Gordeev wrote:

Hi Kevin,

> On Wed, Jan 22, 2025 at 08:49:54AM +0100, Heiko Carstens wrote:
> > > >  static inline pgd_t *pgd_alloc(struct mm_struct *mm)
> > > >  {
> > > > -	return (pgd_t *) crst_table_alloc(mm);
> > > > +	unsigned long *table = crst_table_alloc(mm);
> > > > +
> > > > +	if (!table)
> > > > +		return NULL;
> > > 
> > > I do not know status of this series, but FWIW, this call is missed:
> > > 
> > > 	crst_table_init(table, _REGION1_ENTRY_EMPTY); 
> > 
> > Why is that missing?
> 
> Because the follow-up pagetable_pgd_ctor() is called against uninitialized
> page table, while other pagetable_pXd_ctor() variants are called against
> initialized one. I could imagine complications as result of that.
> 
> Whether Region1 table is the right choice is a big question though, as you
> noticed below.

As discussed with Heiko, we do not want to add the extra crst_table_init() call
at least due to performance impact. So please ignore my comment.

> > A pgd table can be a Region1, Region2, or Region3 table. The only caller of
> > this function is mm_init() via mm_alloc_pgd(); and right after mm_alloc_pgd()
> > there is a call to init_new_context() which will initialize the pgd correctly.
> 
> init_new_context() is in a way a constructor as well, so whole thing looks odd
> to me. But I do not immedeately see a better way :(
> 
> > I guess what really gets odd, and might be broken (haven't checked yet) is
> > what happens on dynamic upgrade of page table levels (->crst_table_upgrade()).
> 
> Hmm, that is a good point.
> 
> > With that a pgd may become a pud, and with that we get an imbalance with
> > the ctor/dtor calls for the various page table levels when they get freed
> 
> The ctor/dtor mismatch should not be a problem, as pagetable_pgd|p4d|pud_ctor()
> are the same and there is one pagetable_dtor() for all top levels as of now.
> But if it ever comes to separate implementations, then we are in the world
> of pain.
> 
> > again. Plus, at first glance, it looks also broken that we have open-coded
> > crst_alloc() calls instead of using the "proper" page table allocation API
> > within crst_table_upgrade(), which again would cause an imbalance.
> 
> This is a good point too.

The below bits are seems to be missed. We will test it and send a patch.

diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index a4e761902093..d33f55b7ee98 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -88,12 +88,14 @@ int crst_table_upgrade(struct mm_struct *mm, unsigned long end)
 		if (unlikely(!p4d))
 			goto err_p4d;
 		crst_table_init(p4d, _REGION2_ENTRY_EMPTY);
+		pagetable_p4d_ctor(virt_to_ptdesc(p4d));
 	}
 	if (end > _REGION1_SIZE) {
 		pgd = crst_table_alloc(mm);
 		if (unlikely(!pgd))
 			goto err_pgd;
 		crst_table_init(pgd, _REGION1_ENTRY_EMPTY);
+		pagetable_pgd_ctor(virt_to_ptdesc(pgd));
 	}
 
 	spin_lock_bh(&mm->page_table_lock);
@@ -130,6 +132,7 @@ int crst_table_upgrade(struct mm_struct *mm, unsigned long end)
 	return 0;
 
 err_pgd:
+	pagetable_dtor(virt_to_ptdesc(p4d));
 	crst_table_free(mm, p4d);
 err_p4d:
 	return -ENOMEM;

Thanks!
diff mbox series

Patch

diff --git a/arch/m68k/include/asm/mcf_pgalloc.h b/arch/m68k/include/asm/mcf_pgalloc.h
index 22d6c1fcabfb..4c648b51e7fd 100644
--- a/arch/m68k/include/asm/mcf_pgalloc.h
+++ b/arch/m68k/include/asm/mcf_pgalloc.h
@@ -73,7 +73,7 @@  static inline void pte_free(struct mm_struct *mm, pgtable_t pgtable)
 
 static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 {
-	pagetable_free(virt_to_ptdesc(pgd));
+	pagetable_dtor_free(virt_to_ptdesc(pgd));
 }
 
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
@@ -84,6 +84,7 @@  static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 
 	if (!ptdesc)
 		return NULL;
+	pagetable_pgd_ctor(ptdesc);
 	new_pgd = ptdesc_address(ptdesc);
 
 	memcpy(new_pgd, swapper_pg_dir, PTRS_PER_PGD * sizeof(pgd_t));
diff --git a/arch/m68k/mm/motorola.c b/arch/m68k/mm/motorola.c
index 6c09ccb72e8b..73651e093c4d 100644
--- a/arch/m68k/mm/motorola.c
+++ b/arch/m68k/mm/motorola.c
@@ -169,6 +169,9 @@  void *get_pointer_table(int type)
 		case TABLE_PMD:
 			pagetable_pmd_ctor(virt_to_ptdesc(page));
 			break;
+		case TABLE_PGD:
+			pagetable_pgd_ctor(virt_to_ptdesc(page));
+			break;
 		}
 
 		mmu_page_ctor(page);
@@ -207,8 +210,7 @@  int free_pointer_table(void *table, int type)
 		/* all tables in page are free, free page */
 		list_del(dp);
 		mmu_page_dtor((void *)page);
-		if (type == TABLE_PTE || type == TABLE_PMD)
-			pagetable_dtor(virt_to_ptdesc((void *)page));
+		pagetable_dtor(virt_to_ptdesc((void *)page));
 		free_page (page);
 		return 1;
 	} else if (ptable_list[type].next != dp) {
diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h
index 5fced6d3c36b..b19b6ed2ab53 100644
--- a/arch/s390/include/asm/pgalloc.h
+++ b/arch/s390/include/asm/pgalloc.h
@@ -130,11 +130,18 @@  static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
 
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
-	return (pgd_t *) crst_table_alloc(mm);
+	unsigned long *table = crst_table_alloc(mm);
+
+	if (!table)
+		return NULL;
+	pagetable_pgd_ctor(virt_to_ptdesc(table));
+
+	return (pgd_t *) table;
 }
 
 static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 {
+	pagetable_dtor(virt_to_ptdesc(pgd));
 	crst_table_free(mm, (unsigned long *) pgd);
 }
 
diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
index de4df14158e6..892ece4558a2 100644
--- a/include/asm-generic/pgalloc.h
+++ b/include/asm-generic/pgalloc.h
@@ -271,6 +271,7 @@  static inline pgd_t *__pgd_alloc_noprof(struct mm_struct *mm, unsigned int order
 	if (!ptdesc)
 		return NULL;
 
+	pagetable_pgd_ctor(ptdesc);
 	return ptdesc_address(ptdesc);
 }
 #define __pgd_alloc(...)	alloc_hooks(__pgd_alloc_noprof(__VA_ARGS__))
@@ -280,7 +281,7 @@  static inline void __pgd_free(struct mm_struct *mm, pgd_t *pgd)
 	struct ptdesc *ptdesc = virt_to_ptdesc(pgd);
 
 	BUG_ON((unsigned long)pgd & (PAGE_SIZE-1));
-	pagetable_free(ptdesc);
+	pagetable_dtor_free(ptdesc);
 }
 
 #ifndef __HAVE_ARCH_PGD_FREE
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 065fa9449d03..486638d22fc6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3243,6 +3243,11 @@  static inline void pagetable_p4d_ctor(struct ptdesc *ptdesc)
 	__pagetable_ctor(ptdesc);
 }
 
+static inline void pagetable_pgd_ctor(struct ptdesc *ptdesc)
+{
+	__pagetable_ctor(ptdesc);
+}
+
 extern void __init pagecache_init(void);
 extern void free_initmem(void);