
[0/3] Add PUD and kernel PTE level pagetable account

Message ID cover.1657096412.git.baolin.wang@linux.alibaba.com (mailing list archive)

Message

Baolin Wang July 6, 2022, 8:59 a.m. UTC
Hi,

Currently we do not account PUD level pagetable pages or kernel PTE level
pagetable pages, and we also do not set the PG_table flag on them. This
makes the pagetable accounting inaccurate and means PageTable() validation
is missed in some cases. So this patch set introduces new helpers to
account PUD and kernel PTE pagetable pages.

Note that some architecture-specific pagetable allocation paths still
need to account their pagetable pages; these need more investigation
and cleanup in the future.
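
To make the idea of a constructor/destructor accounting pair concrete, here is a minimal userspace sketch. It is not the code from this series: `struct model_page`, `nr_pagetable`, `model_pgtable_ctor()` and `model_pgtable_dtor()` are hypothetical stand-ins for a page's PG_table flag and the pagetable counter. It only shows the invariant such helpers are expected to keep: every page marked as a pagetable is counted exactly once, and the destructor undoes exactly what the constructor did.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins for kernel state; all names here are illustrative. */
struct model_page {
	bool pg_table;		/* models the PG_table page flag */
};

static long nr_pagetable;	/* models the pagetable page counter */

/* Constructor: mark the page as a pagetable page and account it. */
static void model_pgtable_ctor(struct model_page *page)
{
	assert(!page->pg_table);	/* a page must not be accounted twice */
	page->pg_table = true;
	nr_pagetable++;
}

/* Destructor: undo exactly what the constructor did. */
static void model_pgtable_dtor(struct model_page *page)
{
	assert(page->pg_table);		/* only accounted pages may be released */
	page->pg_table = false;
	nr_pagetable--;
}
```

Calling the constructor on allocation and the destructor on free keeps the counter equal to the number of live pagetable pages in this model.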

Changes from RFC v3:
 - Rebased on 20220706 linux-next.
 - Introduce new pgtable_pud_page_ctor/dtor() and rename the helpers.
 - Change back to use inc_lruvec_page_state()/dec_lruvec_page_state().
 - Update some commit messages.
link: https://lore.kernel.org/all/cover.1656586863.git.baolin.wang@linux.alibaba.com/

Changes from RFC v2:
 - Convert to use mod_lruvec_page_state() for non-order-0 case.
 - Rename the helpers.
 - Update some commit messages.
 - Remove unnecessary __GFP_HIGHMEM clear.
link: https://lore.kernel.org/all/cover.1655887440.git.baolin.wang@linux.alibaba.com/

Changes from RFC v1:
 - Update some commit messages.
 - Add missing pgtable_clear_and_dec() on X86 arch.
 - Use __free_page() to free pagetable which can avoid duplicated virt_to_page().
link: https://lore.kernel.org/all/cover.1654271618.git.baolin.wang@linux.alibaba.com/

Baolin Wang (3):
  mm: Factor out the pagetable pages account into new helper function
  mm: Add PUD level pagetable account
  mm: Add kernel PTE level pagetable pages account

 arch/arm64/include/asm/tlb.h         |  5 ++++-
 arch/csky/include/asm/pgalloc.h      |  2 +-
 arch/loongarch/include/asm/pgalloc.h | 12 +++++++++---
 arch/microblaze/mm/pgtable.c         |  2 +-
 arch/mips/include/asm/pgalloc.h      | 12 +++++++++---
 arch/openrisc/mm/ioremap.c           |  2 +-
 arch/x86/mm/pgtable.c                |  7 +++++--
 include/asm-generic/pgalloc.h        | 26 ++++++++++++++++++++++----
 include/linux/mm.h                   | 34 ++++++++++++++++++++++++++--------
 9 files changed, 78 insertions(+), 24 deletions(-)

Comments

Dave Hansen July 6, 2022, 3:48 p.m. UTC | #1
On 7/6/22 01:59, Baolin Wang wrote:
> Now we will miss to account the PUD level pagetable and kernel PTE level
> pagetable, as well as missing to set the PG_table flags for these pagetable
> pages, which will get an inaccurate pagetable accounting, and miss
> PageTable() validation in some cases. So this patch set introduces new
> helpers to help to account PUD and kernel PTE pagetable pages.

Could you explain the motivation for this series a bit more?  Is there a
real-world problem that this fixes?

Baolin Wang July 7, 2022, 11:32 a.m. UTC | #2
On 7/6/2022 11:48 PM, Dave Hansen wrote:
> On 7/6/22 01:59, Baolin Wang wrote:
>> Now we will miss to account the PUD level pagetable and kernel PTE level
>> pagetable, as well as missing to set the PG_table flags for these pagetable
>> pages, which will get an inaccurate pagetable accounting, and miss
>> PageTable() validation in some cases. So this patch set introduces new
>> helpers to help to account PUD and kernel PTE pagetable pages.
> 
> Could you explain the motivation for this series a bit more?  Is there a
> real-world problem that this fixes?

This does not fix a real-world problem. The motivation is to make the
pagetable accounting more accurate, which helps us analyse the
consumption of pagetable pages in some cases, and may help with
reclaiming empty pagetables in the future.

Dave Hansen July 7, 2022, 2:44 p.m. UTC | #3
On 7/7/22 04:32, Baolin Wang wrote:
> On 7/6/2022 11:48 PM, Dave Hansen wrote:
>> On 7/6/22 01:59, Baolin Wang wrote:
>>> Now we will miss to account the PUD level pagetable and kernel PTE level
>>> pagetable, as well as missing to set the PG_table flags for these
>>> pagetable
>>> pages, which will get an inaccurate pagetable accounting, and miss
>>> PageTable() validation in some cases. So this patch set introduces new
>>> helpers to help to account PUD and kernel PTE pagetable pages.
>>
>> Could you explain the motivation for this series a bit more?  Is there a
>> real-world problem that this fixes?
> 
> Not fix real problem. The motivation is that making the pagetable
> accounting more accurate, which helps us to analyse the consumption of
> the pagetable pages in some cases, and maybe help to do some empty
> pagetable reclaiming in future.

This accounting isn't free.  It costs storage (and also parts of
cachelines) in each mm and CPU time to maintain it, plus maintainer
eyeballs to maintain.  PUD pages are also fundamentally (on x86 at
least) 0.0004% of the overhead of PTE and 0.2% of the overhead of PMD
pages unless someone is using gigantic hugetlbfs mappings.

Even with 1G gigantic pages, you would need a quarter of a million
(well, 262144 or 512*512) mappings of one 1G page to consume 1G of
memory on PUD pages.

That just doesn't seem like something anyone is likely to actually do in
practice.  That makes the benefits of the PUD portion of this series
rather unclear in the real world.
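
The percentages and the 262144 figure above can be reproduced with a few lines of arithmetic. This sketch assumes x86-64 with 4 KiB pagetable pages holding 512 entries each, fully populated tables, and one PUD page per 1G mapping; the function names are made up for illustration.

```c
#include <assert.h>

#define ENTRIES_PER_TABLE 512ULL	/* assumption: 4 KiB table, 8-byte entries */
#define TABLE_PAGE_BYTES  4096ULL	/* assumption: one table occupies one 4 KiB page */

/*
 * Size of one pagetable level as a percentage of the level that sits
 * `levels_below` steps beneath it (1: PUD vs PMD, 2: PUD vs PTE).
 */
static double table_ratio_percent(unsigned int levels_below)
{
	double ratio = 100.0;

	while (levels_below--)
		ratio /= (double)ENTRIES_PER_TABLE;
	return ratio;
}

/* Number of 4 KiB PUD pages it takes to consume `bytes` of memory. */
static unsigned long long pud_pages_for_bytes(unsigned long long bytes)
{
	return bytes / TABLE_PAGE_BYTES;
}
```

Here `table_ratio_percent(1)` is 100/512, roughly 0.2%, `table_ratio_percent(2)` is 100/(512*512), roughly 0.0004%, and `pud_pages_for_bytes(1ULL << 30)` is 262144, which equals 512*512.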

As for the kernel page tables, I'm not really aware of them causing any
problems.  We have a pretty good idea how much space they consume from
the DirectMap* entries in meminfo:

	DirectMap4k:     2262720 kB
	DirectMap2M:    40507392 kB
	DirectMap1G:    24117248 kB

as well as our page table debugging infrastructure.  I haven't found
myself dying for more specific info on them.
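
As a rough illustration of how the DirectMap* figures above translate into pagetable space, the sketch below parses one such line and estimates how many PTE pages the 4k-mapped part needs. It assumes x86-64 with 4 KiB pages, so one PTE page maps 512 * 4 kB = 2048 kB, and it treats the result as a lower bound. `parse_directmap_kb()` and `pte_pages_for_4k()` are hypothetical helper names, not existing kernel or libc functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one "DirectMapXX:   N kB" line as printed in /proc/meminfo.
 * Returns 1 and stores N in *kb for DirectMap lines, 0 otherwise
 * (*kb is only meaningful when 1 is returned).
 */
static int parse_directmap_kb(const char *line, unsigned long long *kb)
{
	char name[32];

	if (sscanf(line, " %31[^:]: %llu kB", name, kb) != 2)
		return 0;
	return strncmp(name, "DirectMap", 9) == 0;
}

/* Lower-bound count of PTE pages needed to map `kb_4k` kB with 4 KiB pages. */
static unsigned long long pte_pages_for_4k(unsigned long long kb_4k)
{
	const unsigned long long kb_per_pte_page = 512ULL * 4ULL;	/* 2048 kB */

	return (kb_4k + kb_per_pte_page - 1) / kb_per_pte_page;	/* round up */
}
```

For the DirectMap4k value quoted above, 2262720 kB, this gives 1105 PTE pages, or a little over 4 MB of pagetable memory.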

So, nothing in this series seems like a *BAD* idea, but I'm not sure in
the end it solves more problems than it creates.

Baolin Wang July 10, 2022, 11:19 a.m. UTC | #4
On 7/7/2022 10:44 PM, Dave Hansen wrote:
> On 7/7/22 04:32, Baolin Wang wrote:
>> On 7/6/2022 11:48 PM, Dave Hansen wrote:
>>> On 7/6/22 01:59, Baolin Wang wrote:
>>>> Now we will miss to account the PUD level pagetable and kernel PTE level
>>>> pagetable, as well as missing to set the PG_table flags for these
>>>> pagetable
>>>> pages, which will get an inaccurate pagetable accounting, and miss
>>>> PageTable() validation in some cases. So this patch set introduces new
>>>> helpers to help to account PUD and kernel PTE pagetable pages.
>>>
>>> Could you explain the motivation for this series a bit more?  Is there a
>>> real-world problem that this fixes?
>>
>> Not fix real problem. The motivation is that making the pagetable
>> accounting more accurate, which helps us to analyse the consumption of
>> the pagetable pages in some cases, and maybe help to do some empty
>> pagetable reclaiming in future.
> 
> This accounting isn't free.  It costs storage (and also parts of
> cachelines) in each mm and CPU time to maintain it, plus maintainer
> eyeballs to maintain.  PUD pages are also fundamentally (on x86 at
> least) 0.0004% of the overhead of PTE and 0.2% of the overhead of PMD
> pages unless someone is using gigantic hugetlbfs mappings.

Yes, agreed. However, I think the performance impact of this patch set
is small, based on some testing I did (e.g. with mysql I saw no obvious
performance impact). Moreover, the pagetable accounting gap is about 1%
according to the testing data below.

Without this patchset, the pagetable consumption is about 110M with 
mysql testing.
             flags      page-count       MB  symbolic-flags                                 long-symbolic-flags
0x0000000004000000           28232      110  __________________________g__________________  pgtable

With this patchset, the consumption is about 111M.
             flags      page-count       MB  symbolic-flags                                 long-symbolic-flags
0x0000000004000000           28459      111  __________________________g__________________  pgtable
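
The MB column and the "about 1%" gap follow directly from the two page counts. A small sketch of that arithmetic, assuming 4 KiB base pages (the helper names are illustrative only):

```c
#include <assert.h>

#define PAGE_KB 4ULL	/* assumption: 4 KiB base pages */

/* Convert a page count to whole megabytes; integer division truncates. */
static unsigned long long pages_to_mb(unsigned long long pages)
{
	return pages * PAGE_KB / 1024ULL;
}

/* Share of pagetable pages (in percent) that the old accounting missed. */
static double gap_percent(unsigned long long before, unsigned long long after)
{
	return 100.0 * (double)(after - before) / (double)after;
}
```

With these inputs, `pages_to_mb(28232)` is 110 and `pages_to_mb(28459)` is 111. The difference is 227 pages, so `gap_percent(28232, 28459)` comes to roughly 0.8%.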


> Even with 1G gigantic pages, you would need a quarter of a million
> (well, 262144 or 512*512) mappings of one 1G page to consume 1G of
> memory on PUD pages.
> 
> That just doesn't seem like something anyone is likely to actually do in
> practice.  That makes the benefits of the PUD portion of this series
> rather unclear in the real world.
> 
> As for the kernel page tables, I'm not really aware of them causing any
> problems.  We have a pretty good idea how much space they consume from
> the DirectMap* entries in meminfo:
> 
> 	DirectMap4k:     2262720 kB
> 	DirectMap2M:    40507392 kB
> 	DirectMap1G:    24117248 kB

However, these statistics are arch-specific and are only available on
x86, s390 and powerpc.

> as well as our page table debugging infrastructure.  I haven't found
> myself dying for more specific info on them.
> 
> So, nothing in this series seems like a *BAD* idea, but I'm not sure in
> the end it solves more problems than it creates.

Thanks for your input.