Message ID | 20250103184415.2744423-7-kevin.brodsky@arm.com (mailing list archive) |
---|---|
State | Handled Elsewhere |
Headers | show |
Series | Account page tables at all levels | expand |
On Fri, Jan 03, 2025 at 06:44:15PM +0000, Kevin Brodsky wrote: Hi Kevin, ... > diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h > index 5fced6d3c36b..b19b6ed2ab53 100644 > --- a/arch/s390/include/asm/pgalloc.h > +++ b/arch/s390/include/asm/pgalloc.h > @@ -130,11 +130,18 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd) > > static inline pgd_t *pgd_alloc(struct mm_struct *mm) > { > - return (pgd_t *) crst_table_alloc(mm); > + unsigned long *table = crst_table_alloc(mm); > + > + if (!table) > + return NULL; I do not know status of this series, but FWIW, this call is missed: crst_table_init(table, _REGION1_ENTRY_EMPTY); > + pagetable_pgd_ctor(virt_to_ptdesc(table)); > + > + return (pgd_t *) table; > } > > static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd) > { > + pagetable_dtor(virt_to_ptdesc(pgd)); > crst_table_free(mm, (unsigned long *) pgd); > } ... Thanks!
On Tue, Jan 21, 2025 at 05:37:33PM +0100, Alexander Gordeev wrote: > On Fri, Jan 03, 2025 at 06:44:15PM +0000, Kevin Brodsky wrote: > > Hi Kevin, > ... > > diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h > > index 5fced6d3c36b..b19b6ed2ab53 100644 > > --- a/arch/s390/include/asm/pgalloc.h > > +++ b/arch/s390/include/asm/pgalloc.h > > @@ -130,11 +130,18 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd) > > > > static inline pgd_t *pgd_alloc(struct mm_struct *mm) > > { > > - return (pgd_t *) crst_table_alloc(mm); > > + unsigned long *table = crst_table_alloc(mm); > > + > > + if (!table) > > + return NULL; > > I do not know status of this series, but FWIW, this call is missed: > > crst_table_init(table, _REGION1_ENTRY_EMPTY); Why is that missing? A pgd table can be a Region1, Region2, or Region3 table. The only caller of this function is mm_init() via mm_alloc_pgd(); and right after mm_alloc_pgd() there is a call to init_new_context() which will initialize the pgd correctly. I guess what really gets odd, and might be broken (haven't checked yet) is what happens on dynamic upgrade of page table levels (->crst_table_upgrade()). With that a pgd may become a pud, and with that we get an imbalance with the ctor/dtor calls for the various page table levels when they get freed again. Plus, at first glance, it looks also broken that we have open-coded crst_alloc() calls instead of using the "proper" page table allocation API within crst_table_upgrade(), which again would cause an imbalance.
On Wed, Jan 22, 2025 at 08:49:54AM +0100, Heiko Carstens wrote: > > > static inline pgd_t *pgd_alloc(struct mm_struct *mm) > > > { > > > - return (pgd_t *) crst_table_alloc(mm); > > > + unsigned long *table = crst_table_alloc(mm); > > > + > > > + if (!table) > > > + return NULL; > > > > I do not know status of this series, but FWIW, this call is missed: > > > > crst_table_init(table, _REGION1_ENTRY_EMPTY); > > Why is that missing? Because the follow-up pagetable_pgd_ctor() is called against uninitialized page table, while other pagetable_pXd_ctor() variants are called against initialized one. I could imagine complications as result of that. Whether Region1 table is the right choice is a big question though, as you noticed below. > A pgd table can be a Region1, Region2, or Region3 table. The only caller of > this function is mm_init() via mm_alloc_pgd(); and right after mm_alloc_pgd() > there is a call to init_new_context() which will initialize the pgd correctly. init_new_context() is in a way a constructor as well, so whole thing looks odd to me. But I do not immedeately see a better way :( > I guess what really gets odd, and might be broken (haven't checked yet) is > what happens on dynamic upgrade of page table levels (->crst_table_upgrade()). Hmm, that is a good point. > With that a pgd may become a pud, and with that we get an imbalance with > the ctor/dtor calls for the various page table levels when they get freed The ctor/dtor mismatch should not be a problem, as pagetable_pgd|p4d|pud_ctor() are the same and there is one pagetable_dtor() for all top levels as of now. But if it ever comes to separate implementations, then we are in the world of pain. > again. Plus, at first glance, it looks also broken that we have open-coded > crst_alloc() calls instead of using the "proper" page table allocation API > within crst_table_upgrade(), which again would cause an imbalance. This is a good point too. Many thanks!
On Wed, Jan 22, 2025 at 03:06:05PM +0100, Alexander Gordeev wrote: Hi Kevin, > On Wed, Jan 22, 2025 at 08:49:54AM +0100, Heiko Carstens wrote: > > > > static inline pgd_t *pgd_alloc(struct mm_struct *mm) > > > > { > > > > - return (pgd_t *) crst_table_alloc(mm); > > > > + unsigned long *table = crst_table_alloc(mm); > > > > + > > > > + if (!table) > > > > + return NULL; > > > > > > I do not know status of this series, but FWIW, this call is missed: > > > > > > crst_table_init(table, _REGION1_ENTRY_EMPTY); > > > > Why is that missing? > > Because the follow-up pagetable_pgd_ctor() is called against uninitialized > page table, while other pagetable_pXd_ctor() variants are called against > initialized one. I could imagine complications as result of that. > > Whether Region1 table is the right choice is a big question though, as you > noticed below. As discussed with Heiko, we do not want to add the extra crst_table_init() call at least due to performance impact. So please ignore my comment. > > A pgd table can be a Region1, Region2, or Region3 table. The only caller of > > this function is mm_init() via mm_alloc_pgd(); and right after mm_alloc_pgd() > > there is a call to init_new_context() which will initialize the pgd correctly. > > init_new_context() is in a way a constructor as well, so whole thing looks odd > to me. But I do not immedeately see a better way :( > > > I guess what really gets odd, and might be broken (haven't checked yet) is > > what happens on dynamic upgrade of page table levels (->crst_table_upgrade()). > > Hmm, that is a good point. > > > With that a pgd may become a pud, and with that we get an imbalance with > > the ctor/dtor calls for the various page table levels when they get freed > > The ctor/dtor mismatch should not be a problem, as pagetable_pgd|p4d|pud_ctor() > are the same and there is one pagetable_dtor() for all top levels as of now. > But if it ever comes to separate implementations, then we are in the world > of pain. > > > again. Plus, at first glance, it looks also broken that we have open-coded > > crst_alloc() calls instead of using the "proper" page table allocation API > > within crst_table_upgrade(), which again would cause an imbalance. > > This is a good point too. The below bits are seems to be missed. We will test it and send a patch. diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c index a4e761902093..d33f55b7ee98 100644 --- a/arch/s390/mm/pgalloc.c +++ b/arch/s390/mm/pgalloc.c @@ -88,12 +88,14 @@ int crst_table_upgrade(struct mm_struct *mm, unsigned long end) if (unlikely(!p4d)) goto err_p4d; crst_table_init(p4d, _REGION2_ENTRY_EMPTY); + pagetable_p4d_ctor(virt_to_ptdesc(p4d)); } if (end > _REGION1_SIZE) { pgd = crst_table_alloc(mm); if (unlikely(!pgd)) goto err_pgd; crst_table_init(pgd, _REGION1_ENTRY_EMPTY); + pagetable_pgd_ctor(virt_to_ptdesc(pgd)); } spin_lock_bh(&mm->page_table_lock); @@ -130,6 +132,7 @@ int crst_table_upgrade(struct mm_struct *mm, unsigned long end) return 0; err_pgd: + pagetable_dtor(virt_to_ptdesc(p4d)); crst_table_free(mm, p4d); err_p4d: return -ENOMEM; Thanks!
diff --git a/arch/m68k/include/asm/mcf_pgalloc.h b/arch/m68k/include/asm/mcf_pgalloc.h index 22d6c1fcabfb..4c648b51e7fd 100644 --- a/arch/m68k/include/asm/mcf_pgalloc.h +++ b/arch/m68k/include/asm/mcf_pgalloc.h @@ -73,7 +73,7 @@ static inline void pte_free(struct mm_struct *mm, pgtable_t pgtable) static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd) { - pagetable_free(virt_to_ptdesc(pgd)); + pagetable_dtor_free(virt_to_ptdesc(pgd)); } static inline pgd_t *pgd_alloc(struct mm_struct *mm) @@ -84,6 +84,7 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm) if (!ptdesc) return NULL; + pagetable_pgd_ctor(ptdesc); new_pgd = ptdesc_address(ptdesc); memcpy(new_pgd, swapper_pg_dir, PTRS_PER_PGD * sizeof(pgd_t)); diff --git a/arch/m68k/mm/motorola.c b/arch/m68k/mm/motorola.c index 6c09ccb72e8b..73651e093c4d 100644 --- a/arch/m68k/mm/motorola.c +++ b/arch/m68k/mm/motorola.c @@ -169,6 +169,9 @@ void *get_pointer_table(int type) case TABLE_PMD: pagetable_pmd_ctor(virt_to_ptdesc(page)); break; + case TABLE_PGD: + pagetable_pgd_ctor(virt_to_ptdesc(page)); + break; } mmu_page_ctor(page); @@ -207,8 +210,7 @@ int free_pointer_table(void *table, int type) /* all tables in page are free, free page */ list_del(dp); mmu_page_dtor((void *)page); - if (type == TABLE_PTE || type == TABLE_PMD) - pagetable_dtor(virt_to_ptdesc((void *)page)); + pagetable_dtor(virt_to_ptdesc((void *)page)); free_page (page); return 1; } else if (ptable_list[type].next != dp) { diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h index 5fced6d3c36b..b19b6ed2ab53 100644 --- a/arch/s390/include/asm/pgalloc.h +++ b/arch/s390/include/asm/pgalloc.h @@ -130,11 +130,18 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd) static inline pgd_t *pgd_alloc(struct mm_struct *mm) { - return (pgd_t *) crst_table_alloc(mm); + unsigned long *table = crst_table_alloc(mm); + + if (!table) + return NULL; + pagetable_pgd_ctor(virt_to_ptdesc(table)); + + return (pgd_t *) table; } static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd) { + pagetable_dtor(virt_to_ptdesc(pgd)); crst_table_free(mm, (unsigned long *) pgd); } diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h index de4df14158e6..892ece4558a2 100644 --- a/include/asm-generic/pgalloc.h +++ b/include/asm-generic/pgalloc.h @@ -271,6 +271,7 @@ static inline pgd_t *__pgd_alloc_noprof(struct mm_struct *mm, unsigned int order if (!ptdesc) return NULL; + pagetable_pgd_ctor(ptdesc); return ptdesc_address(ptdesc); } #define __pgd_alloc(...) alloc_hooks(__pgd_alloc_noprof(__VA_ARGS__)) @@ -280,7 +281,7 @@ static inline void __pgd_free(struct mm_struct *mm, pgd_t *pgd) struct ptdesc *ptdesc = virt_to_ptdesc(pgd); BUG_ON((unsigned long)pgd & (PAGE_SIZE-1)); - pagetable_free(ptdesc); + pagetable_dtor_free(ptdesc); } #ifndef __HAVE_ARCH_PGD_FREE diff --git a/include/linux/mm.h b/include/linux/mm.h index 065fa9449d03..486638d22fc6 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3243,6 +3243,11 @@ static inline void pagetable_p4d_ctor(struct ptdesc *ptdesc) __pagetable_ctor(ptdesc); } +static inline void pagetable_pgd_ctor(struct ptdesc *ptdesc) +{ + __pagetable_ctor(ptdesc); +} + extern void __init pagecache_init(void); extern void free_initmem(void);