diff mbox series

arm64: mm: enable per pmd page table lock

Message ID 20190214211642.2200-1-yuzhao@google.com (mailing list archive)
State New, archived

Commit Message

Yu Zhao Feb. 14, 2019, 9:16 p.m. UTC
Switch from per mm_struct to per pmd page table lock by enabling
ARCH_ENABLE_SPLIT_PMD_PTLOCK. This provides better granularity for
large systems.

I'm not sure if there is contention on mm->page_table_lock. Given
the option comes at no cost (apart from initializing more spin
locks), why not enable it now?

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 arch/arm64/Kconfig               |  3 +++
 arch/arm64/include/asm/pgalloc.h | 12 +++++++++++-
 arch/arm64/include/asm/tlb.h     |  5 ++++-
 3 files changed, 18 insertions(+), 2 deletions(-)
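
As an illustration of the granularity change described in the commit
message, here is a minimal user-space sketch of how the lock for a PMD
table could be selected. It is a model only, not kernel code: the types
model_lock, model_page and model_mm and the function model_pmd_lockptr()
are hypothetical names invented for this example, and the "split" flag
stands in for the build-time choice that ARCH_ENABLE_SPLIT_PMD_PTLOCK
enables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a spinlock; only its address matters here. */
struct model_lock {
	int locked;
};

/* Hypothetical stand-in for the struct page backing one PMD table. */
struct model_page {
	struct model_lock ptl;	/* per-page lock, used only in split mode */
};

/* Hypothetical stand-in for mm_struct. */
struct model_mm {
	struct model_lock page_table_lock;	/* one lock for the whole mm */
};

/*
 * Return the lock that protects the given PMD table page.
 * Without the split option every PMD table of an mm shares
 * mm->page_table_lock; with it, each PMD table page has its own lock.
 */
static struct model_lock *model_pmd_lockptr(struct model_mm *mm,
					    struct model_page *pmd_page,
					    bool split)
{
	assert(mm != NULL && pmd_page != NULL);
	return split ? &pmd_page->ptl : &mm->page_table_lock;
}
```

With split set to false, two different PMD table pages resolve to the
same lock; with split set to true they resolve to different locks, so
updates under different PMD tables no longer serialize on one lock.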

Comments

Will Deacon Feb. 18, 2019, 3:12 p.m. UTC | #1
[+Mark]

On Thu, Feb 14, 2019 at 02:16:42PM -0700, Yu Zhao wrote:
> Switch from per mm_struct to per pmd page table lock by enabling
> ARCH_ENABLE_SPLIT_PMD_PTLOCK. This provides better granularity for
> large system.
> 
> I'm not sure if there is contention on mm->page_table_lock. Given
> the option comes at no cost (apart from initializing more spin
> locks), why not enable it now.
> 
> Signed-off-by: Yu Zhao <yuzhao@google.com>
> ---
>  arch/arm64/Kconfig               |  3 +++
>  arch/arm64/include/asm/pgalloc.h | 12 +++++++++++-
>  arch/arm64/include/asm/tlb.h     |  5 ++++-
>  3 files changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index a4168d366127..104325a1ffc3 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -872,6 +872,9 @@ config ARCH_WANT_HUGE_PMD_SHARE
>  config ARCH_HAS_CACHE_LINE_SIZE
>  	def_bool y
>  
> +config ARCH_ENABLE_SPLIT_PMD_PTLOCK
> +	def_bool y
> +
>  config SECCOMP
>  	bool "Enable seccomp to safely compute untrusted bytecode"
>  	---help---
> diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
> index 52fa47c73bf0..dabba4b2c61f 100644
> --- a/arch/arm64/include/asm/pgalloc.h
> +++ b/arch/arm64/include/asm/pgalloc.h
> @@ -33,12 +33,22 @@
>  
>  static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
>  {
> -	return (pmd_t *)__get_free_page(PGALLOC_GFP);
> +	struct page *page;
> +
> +	page = alloc_page(PGALLOC_GFP);
> +	if (!page)
> +		return NULL;
> +	if (!pgtable_pmd_page_ctor(page)) {
> +		__free_page(page);
> +		return NULL;
> +	}
> +	return page_address(page);

I'm a bit worried as to how this interacts with the page-table code in
arch/arm64/mm/mmu.c when pgd_pgtable_alloc is used as the allocator. It
looks like that currently always calls pgtable_page_ctor(), regardless of
level. Do we now need a separate allocator function for the PMD level?

Will
Yu Zhao Feb. 18, 2019, 7:49 p.m. UTC | #2
On Mon, Feb 18, 2019 at 03:12:23PM +0000, Will Deacon wrote:
> [+Mark]
> 
> On Thu, Feb 14, 2019 at 02:16:42PM -0700, Yu Zhao wrote:
> > Switch from per mm_struct to per pmd page table lock by enabling
> > ARCH_ENABLE_SPLIT_PMD_PTLOCK. This provides better granularity for
> > large system.
> > 
> > I'm not sure if there is contention on mm->page_table_lock. Given
> > the option comes at no cost (apart from initializing more spin
> > locks), why not enable it now.
> > 
> > Signed-off-by: Yu Zhao <yuzhao@google.com>
> > ---
> >  arch/arm64/Kconfig               |  3 +++
> >  arch/arm64/include/asm/pgalloc.h | 12 +++++++++++-
> >  arch/arm64/include/asm/tlb.h     |  5 ++++-
> >  3 files changed, 18 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index a4168d366127..104325a1ffc3 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -872,6 +872,9 @@ config ARCH_WANT_HUGE_PMD_SHARE
> >  config ARCH_HAS_CACHE_LINE_SIZE
> >  	def_bool y
> >  
> > +config ARCH_ENABLE_SPLIT_PMD_PTLOCK
> > +	def_bool y
> > +
> >  config SECCOMP
> >  	bool "Enable seccomp to safely compute untrusted bytecode"
> >  	---help---
> > diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
> > index 52fa47c73bf0..dabba4b2c61f 100644
> > --- a/arch/arm64/include/asm/pgalloc.h
> > +++ b/arch/arm64/include/asm/pgalloc.h
> > @@ -33,12 +33,22 @@
> >  
> >  static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
> >  {
> > -	return (pmd_t *)__get_free_page(PGALLOC_GFP);
> > +	struct page *page;
> > +
> > +	page = alloc_page(PGALLOC_GFP);
> > +	if (!page)
> > +		return NULL;
> > +	if (!pgtable_pmd_page_ctor(page)) {
> > +		__free_page(page);
> > +		return NULL;
> > +	}
> > +	return page_address(page);
> 
> I'm a bit worried as to how this interacts with the page-table code in
> arch/arm64/mm/mmu.c when pgd_pgtable_alloc is used as the allocator. It
> looks like that currently always calls pgtable_page_ctor(), regardless of
> level. Do we now need a separate allocator function for the PMD level?

Thanks for reminding me, I never noticed this. The short answer is
no.

I guess pgtable_page_ctor() is used on all pud/pmd/pte entries
there because it's also compatible with pud, and pmd too without
this patch. So your concern is valid. Thanks again.

Why is my answer no? Because I don't think the ctor matters for
pgd_pgtable_alloc(). The ctor is only required for userspace page
tables, and that's why we don't have it in pte_alloc_one_kernel().
AFAICT, none of the pgds (efi_mm.pgd, tramp_pg_dir and init_mm.pgd)
pre-populated by pgd_pgtable_alloc() is a userspace page table. (I
doubt we pre-populate userspace page tables in any other arch.)

So to avoid future confusion, we might just remove the ctor from
pgd_pgtable_alloc().
Yu Zhao Feb. 18, 2019, 8:48 p.m. UTC | #3
On Mon, Feb 18, 2019 at 12:49:38PM -0700, Yu Zhao wrote:
> On Mon, Feb 18, 2019 at 03:12:23PM +0000, Will Deacon wrote:
> > [+Mark]
> > 
> > On Thu, Feb 14, 2019 at 02:16:42PM -0700, Yu Zhao wrote:
> > > Switch from per mm_struct to per pmd page table lock by enabling
> > > ARCH_ENABLE_SPLIT_PMD_PTLOCK. This provides better granularity for
> > > large system.
> > > 
> > > I'm not sure if there is contention on mm->page_table_lock. Given
> > > the option comes at no cost (apart from initializing more spin
> > > locks), why not enable it now.
> > > 
> > > Signed-off-by: Yu Zhao <yuzhao@google.com>
> > > ---
> > >  arch/arm64/Kconfig               |  3 +++
> > >  arch/arm64/include/asm/pgalloc.h | 12 +++++++++++-
> > >  arch/arm64/include/asm/tlb.h     |  5 ++++-
> > >  3 files changed, 18 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > > index a4168d366127..104325a1ffc3 100644
> > > --- a/arch/arm64/Kconfig
> > > +++ b/arch/arm64/Kconfig
> > > @@ -872,6 +872,9 @@ config ARCH_WANT_HUGE_PMD_SHARE
> > >  config ARCH_HAS_CACHE_LINE_SIZE
> > >  	def_bool y
> > >  
> > > +config ARCH_ENABLE_SPLIT_PMD_PTLOCK
> > > +	def_bool y
> > > +
> > >  config SECCOMP
> > >  	bool "Enable seccomp to safely compute untrusted bytecode"
> > >  	---help---
> > > diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
> > > index 52fa47c73bf0..dabba4b2c61f 100644
> > > --- a/arch/arm64/include/asm/pgalloc.h
> > > +++ b/arch/arm64/include/asm/pgalloc.h
> > > @@ -33,12 +33,22 @@
> > >  
> > >  static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
> > >  {
> > > -	return (pmd_t *)__get_free_page(PGALLOC_GFP);
> > > +	struct page *page;
> > > +
> > > +	page = alloc_page(PGALLOC_GFP);
> > > +	if (!page)
> > > +		return NULL;
> > > +	if (!pgtable_pmd_page_ctor(page)) {
> > > +		__free_page(page);
> > > +		return NULL;
> > > +	}
> > > +	return page_address(page);
> > 
> > I'm a bit worried as to how this interacts with the page-table code in
> > arch/arm64/mm/mmu.c when pgd_pgtable_alloc is used as the allocator. It
> > looks like that currently always calls pgtable_page_ctor(), regardless of
> > level. Do we now need a separate allocator function for the PMD level?
> 
> Thanks for reminding me, I never noticed this. The short answer is
> no.
> 
> I guess pgtable_page_ctor() is used on all pud/pmd/pte entries
> there because it's also compatible with pud, and pmd too without
> this patch. So your concern is valid. Thanks again.
> 
> Why my answer is no? Because I don't think the ctor matters for
> pgd_pgtable_alloc(). The ctor is only required for userspace page
> tables, and that's why we don't have it in pte_alloc_one_kernel().
> AFAICT, none of the pgds (efi_mm.pgd, tramp_pg_dir and init_mm.pgd)
> pre-populated by pgd_pgtable_alloc() is. (I doubt we pre-populate
> userspace page tables in any other arch).
> 
> So to avoid future confusion, we might just remove the ctor from
> pgd_pgtable_alloc().

I'm sorry. I missed that we call apply_to_page_range() on efi_mm.
That function does require the ctor, so we actually can't remove it.
Though pgtable_page_ctor() also does the work adequately for pmd in
terms of giving apply_to_page_range() what it requires, it would be
more appropriate to use pgtable_pmd_page_ctor() instead (and not to
call the ctor at all on pud).

I could add this change prior to this patch, if it makes sense to
you. Thanks.
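
The proposal above (pmd ctor for pmd tables, no ctor for pud tables)
can be sketched in user space as follows. This is only a model under
stated assumptions, not the arm64 implementation: model_level,
model_ctor, model_page, model_pgtable_alloc() and model_pgtable_free()
are hypothetical names, and the ctor is reduced to recording which
constructor ran.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical page-table levels handled by the allocator. */
enum model_level { MODEL_PTE, MODEL_PMD, MODEL_PUD };

/* Which constructor (if any) was run on the page. */
enum model_ctor { MODEL_CTOR_NONE, MODEL_CTOR_PTE, MODEL_CTOR_PMD };

/* Hypothetical stand-in for struct page. */
struct model_page {
	enum model_ctor ctor_used;
};

/*
 * Allocate one zeroed table page and run the constructor that
 * matches its level: the pte ctor for pte tables, the pmd ctor
 * for pmd tables, and none for pud tables.
 */
static struct model_page *model_pgtable_alloc(enum model_level level)
{
	struct model_page *page = calloc(1, sizeof(*page));

	if (!page)
		return NULL;

	switch (level) {
	case MODEL_PTE:
		page->ctor_used = MODEL_CTOR_PTE;
		break;
	case MODEL_PMD:
		page->ctor_used = MODEL_CTOR_PMD;
		break;
	case MODEL_PUD:
		/* No ctor at this level; calloc() left MODEL_CTOR_NONE. */
		break;
	}
	return page;
}

/* Release a page obtained from model_pgtable_alloc(). */
static void model_pgtable_free(struct model_page *page)
{
	assert(page != NULL);
	free(page);
}
```

The design point is that the allocator needs to know the level it is
allocating for, which a single level-agnostic callback cannot express.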
Anshuman Khandual Feb. 19, 2019, 3:08 a.m. UTC | #4
On 02/15/2019 02:46 AM, Yu Zhao wrote:
> Switch from per mm_struct to per pmd page table lock by enabling
> ARCH_ENABLE_SPLIT_PMD_PTLOCK. This provides better granularity for
> large system.
> 
> I'm not sure if there is contention on mm->page_table_lock. Given
> the option comes at no cost (apart from initializing more spin
> locks), why not enable it now.
> 

This has changes similar to what I posted last month as part of the
general page table page accounting cleanup series on arm64.

https://www.spinics.net/lists/arm-kernel/msg701954.html
Anshuman Khandual Feb. 19, 2019, 4:09 a.m. UTC | #5
On 02/19/2019 01:19 AM, Yu Zhao wrote:
> On Mon, Feb 18, 2019 at 03:12:23PM +0000, Will Deacon wrote:
>> [+Mark]
>>
>> On Thu, Feb 14, 2019 at 02:16:42PM -0700, Yu Zhao wrote:
>>> Switch from per mm_struct to per pmd page table lock by enabling
>>> ARCH_ENABLE_SPLIT_PMD_PTLOCK. This provides better granularity for
>>> large system.
>>>
>>> I'm not sure if there is contention on mm->page_table_lock. Given
>>> the option comes at no cost (apart from initializing more spin
>>> locks), why not enable it now.
>>>
>>> Signed-off-by: Yu Zhao <yuzhao@google.com>
>>> ---
>>>  arch/arm64/Kconfig               |  3 +++
>>>  arch/arm64/include/asm/pgalloc.h | 12 +++++++++++-
>>>  arch/arm64/include/asm/tlb.h     |  5 ++++-
>>>  3 files changed, 18 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>> index a4168d366127..104325a1ffc3 100644
>>> --- a/arch/arm64/Kconfig
>>> +++ b/arch/arm64/Kconfig
>>> @@ -872,6 +872,9 @@ config ARCH_WANT_HUGE_PMD_SHARE
>>>  config ARCH_HAS_CACHE_LINE_SIZE
>>>  	def_bool y
>>>  
>>> +config ARCH_ENABLE_SPLIT_PMD_PTLOCK
>>> +	def_bool y
>>> +
>>>  config SECCOMP
>>>  	bool "Enable seccomp to safely compute untrusted bytecode"
>>>  	---help---
>>> diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
>>> index 52fa47c73bf0..dabba4b2c61f 100644
>>> --- a/arch/arm64/include/asm/pgalloc.h
>>> +++ b/arch/arm64/include/asm/pgalloc.h
>>> @@ -33,12 +33,22 @@
>>>  
>>>  static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
>>>  {
>>> -	return (pmd_t *)__get_free_page(PGALLOC_GFP);
>>> +	struct page *page;
>>> +
>>> +	page = alloc_page(PGALLOC_GFP);
>>> +	if (!page)
>>> +		return NULL;
>>> +	if (!pgtable_pmd_page_ctor(page)) {
>>> +		__free_page(page);
>>> +		return NULL;
>>> +	}
>>> +	return page_address(page);
>>
>> I'm a bit worried as to how this interacts with the page-table code in
>> arch/arm64/mm/mmu.c when pgd_pgtable_alloc is used as the allocator. It
>> looks like that currently always calls pgtable_page_ctor(), regardless of
>> level. Do we now need a separate allocator function for the PMD level?
>
> Thanks for reminding me, I never noticed this. The short answer is
> no.
> 
> I guess pgtable_page_ctor() is used on all pud/pmd/pte entries
> there because it's also compatible with pud, and pmd too without
> this patch. So your concern is valid. Thanks again.

pgtable_page_ctor() acts on a given page used as a page table at any
level. It sets the appropriate page type (page flag PG_table) and
increments the zone stat for NR_PAGETABLE. pgtable_page_dtor() does
exactly the inverse.

These two complementary operations are required for page table pages at
every level, for their proper initialization, identification in buddy and
zone statistics. Hence they need to be called for page table pages at all
levels.

pgtable_pmd_page_ctor()/pgtable_pmd_page_dtor(), on the other hand, just
init/free the page table lock on the page for !THP cases and additionally
init page->pmd_huge_pte (deposited page table page) for THP cases.
Some archs seem to be calling pgtable_pmd_page_ctor() in place of
pgtable_page_ctor(). I wonder whether that approach skips the page flag
and accounting requirements.
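
To make the difference between the two constructor pairs concrete, here
is a small user-space model of the behaviour described above. It is a
sketch only: model_page, model_nr_pagetable and the model_* functions
are hypothetical stand-ins, with a bool for the PG_table flag, a bool
for the initialised lock, and a plain counter for NR_PAGETABLE. THP
handling (pmd_huge_pte) is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page. */
struct model_page {
	bool is_table;		/* models the PG_table page type */
	bool lock_ready;	/* models an initialised page table lock */
};

/* Models the NR_PAGETABLE zone statistic. */
static long model_nr_pagetable;

/* Generic ctor: init the lock, mark the page, account it. */
static bool model_pgtable_page_ctor(struct model_page *page)
{
	page->lock_ready = true;
	page->is_table = true;
	model_nr_pagetable++;
	return true;
}

/* Generic dtor: exact inverse of the generic ctor. */
static void model_pgtable_page_dtor(struct model_page *page)
{
	assert(page->is_table);
	page->lock_ready = false;
	page->is_table = false;
	model_nr_pagetable--;
}

/* PMD ctor: only inits the lock; no flag, no accounting. */
static bool model_pgtable_pmd_page_ctor(struct model_page *page)
{
	page->lock_ready = true;
	return true;
}

/* PMD dtor: only tears down the lock. */
static void model_pgtable_pmd_page_dtor(struct model_page *page)
{
	page->lock_ready = false;
}
```

In this model a page that only goes through the pmd ctor ends up with a
usable lock but is neither flagged as a page table nor counted, which
is the gap the paragraph above asks about.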

> 
> Why my answer is no? Because I don't think the ctor matters for
> pgd_pgtable_alloc(). The ctor is only required for userspace page
> tables, and that's why we don't have it in pte_alloc_one_kernel().

At present on arm64, certain kernel page table page allocations call
pgtable_pmd_page_ctor() and some don't. The series which I had posted
makes sure that all kernel and user page table page allocations go through
pgtable_page_ctor()/dtor(). These constructs are required for kernel
page table pages as well, for accurate init and accounting, not just for
user space. The series only skips the vmemmap struct page mapping, as
that would require changing the generic sparse vmemmap allocation/free
functions, which I believe should also be changed going forward.

> AFAICT, none of the pgds (efi_mm.pgd, tramp_pg_dir and init_mm.pgd)
> pre-populated by pgd_pgtable_alloc() is. (I doubt we pre-populate
> userspace page tables in any other arch).
> 
> So to avoid future confusion, we might just remove the ctor from
> pgd_pgtable_alloc().

No. Instead we should just make sure that those pages go through
the dtor() destructor path when getting freed, and the cleanup series
does that.
Patch

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a4168d366127..104325a1ffc3 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -872,6 +872,9 @@  config ARCH_WANT_HUGE_PMD_SHARE
 config ARCH_HAS_CACHE_LINE_SIZE
 	def_bool y
 
+config ARCH_ENABLE_SPLIT_PMD_PTLOCK
+	def_bool y
+
 config SECCOMP
 	bool "Enable seccomp to safely compute untrusted bytecode"
 	---help---
diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index 52fa47c73bf0..dabba4b2c61f 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -33,12 +33,22 @@ 
 
 static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
-	return (pmd_t *)__get_free_page(PGALLOC_GFP);
+	struct page *page;
+
+	page = alloc_page(PGALLOC_GFP);
+	if (!page)
+		return NULL;
+	if (!pgtable_pmd_page_ctor(page)) {
+		__free_page(page);
+		return NULL;
+	}
+	return page_address(page);
 }
 
 static inline void pmd_free(struct mm_struct *mm, pmd_t *pmdp)
 {
 	BUG_ON((unsigned long)pmdp & (PAGE_SIZE-1));
+	pgtable_pmd_page_dtor(virt_to_page(pmdp));
 	free_page((unsigned long)pmdp);
 }
 
diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index 106fdc951b6e..4e3becfed387 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -62,7 +62,10 @@  static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
 static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
 				  unsigned long addr)
 {
-	tlb_remove_table(tlb, virt_to_page(pmdp));
+	struct page *page = virt_to_page(pmdp);
+
+	pgtable_pmd_page_dtor(page);
+	tlb_remove_table(tlb, page);
 }
 #endif