
[v5,5/5] mm/page_alloc: reuse tail struct pages for compound devmaps

Message ID 20220210193345.23628-6-joao.m.martins@oracle.com (mailing list archive)
State New
Series sparse-vmemmap: memory savings for compound devmaps (device-dax)

Commit Message

Joao Martins Feb. 10, 2022, 7:33 p.m. UTC
Currently memmap_init_zone_device() ends up initializing 32768 pages
when it only needs to initialize 128 given tail page reuse. That
number is worse with 1GB compound pages, 262144 instead of 128. Update
memmap_init_zone_device() to skip redundant initialization, detailed
below.

When a pgmap @vmemmap_shift is set, all pages are mapped at a given
huge page alignment and compound pages are used to describe them, as
opposed to one struct page per 4K page.

With @vmemmap_shift > 0, and when struct pages are stored in RAM
(!altmap), most tail pages are reused. Consequently, the number of
unique struct pages is a lot smaller than the total number of struct
pages being mapped.

The altmap path is left alone since it does not support memory savings
for compound devmaps.
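
For reference, assuming x86-64 defaults (4K base pages, 64-byte struct
page), the 1G numbers above work out as:

  struct pages spanned by one 1G compound page:  1G / 4K = 262144
  unique struct pages needed per compound page:  2 * (PAGE_SIZE / sizeof(struct page)) = 2 * 64 = 128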

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 mm/page_alloc.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

Comments

Muchun Song Feb. 11, 2022, 5:07 a.m. UTC | #1
On Fri, Feb 11, 2022 at 3:34 AM Joao Martins <joao.m.martins@oracle.com> wrote:
>
> Currently memmap_init_zone_device() ends up initializing 32768 pages
> when it only needs to initialize 128 given tail page reuse. That
> number is worse with 1GB compound pages, 262144 instead of 128. Update
> memmap_init_zone_device() to skip redundant initialization, detailed
> below.
>
> When a pgmap @vmemmap_shift is set, all pages are mapped at a given
> huge page alignment and compound pages are used to describe them, as
> opposed to one struct page per 4K page.
>
> With @vmemmap_shift > 0, and when struct pages are stored in RAM
> (!altmap), most tail pages are reused. Consequently, the number of
> unique struct pages is a lot smaller than the total number of struct
> pages being mapped.
>
> The altmap path is left alone since it does not support memory savings
> for compound devmaps.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>  mm/page_alloc.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index cface1d38093..c10df2fd0ec2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6666,6 +6666,20 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
>         }
>  }
>
> +/*
> + * With compound page geometry and when struct pages are stored in RAM most
> + * tail pages are reused. Consequently, the number of unique struct pages to
> + * initialize is a lot smaller than the total number of struct pages being
> + * mapped. This is a paired / mild layering violation with explicit knowledge
> + * of how the sparse_vmemmap internals handle compound pages in the absence
> + * of an altmap. See vmemmap_populate_compound_pages().
> + */
> +static inline unsigned long compound_nr_pages(struct vmem_altmap *altmap,
> +                                             unsigned long nr_pages)
> +{
> +       return !altmap ? 2 * (PAGE_SIZE/sizeof(struct page)) : nr_pages;
> +}
> +

This means only the first 2 pages will be modified; the remaining 6 or
4094 pages will not.  In the HugeTLB case, those tail pages are mapped
read-only to catch invalid usage of tail pages (e.g. write operations).
Quick question: should we do something similar for DAX?
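
For context, the HugeTLB vmemmap optimization does roughly the following
when it deduplicates a tail vmemmap page (illustrative sketch only, not
the actual code), so that a stray write through a tail struct page faults
instead of silently corrupting the shared page:

static void remap_tail_vmemmap_readonly(unsigned long addr, pte_t *ptep,
					struct page *reuse_page)
{
	/* Point the tail PTE at the reused head page, write-protected. */
	pte_t entry = mk_pte(reuse_page, PAGE_KERNEL_RO);

	set_pte_at(&init_mm, addr, ptep, entry);
}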

Thanks.
Joao Martins Feb. 11, 2022, 12:48 p.m. UTC | #2
On 2/11/22 05:07, Muchun Song wrote:
> On Fri, Feb 11, 2022 at 3:34 AM Joao Martins <joao.m.martins@oracle.com> wrote:
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index cface1d38093..c10df2fd0ec2 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -6666,6 +6666,20 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
>>         }
>>  }
>>
>> +/*
>> + * With compound page geometry and when struct pages are stored in RAM most
>> + * tail pages are reused. Consequently, the number of unique struct pages to
>> + * initialize is a lot smaller than the total number of struct pages being
>> + * mapped. This is a paired / mild layering violation with explicit knowledge
>> + * of how the sparse_vmemmap internals handle compound pages in the absence
>> + * of an altmap. See vmemmap_populate_compound_pages().
>> + */
>> +static inline unsigned long compound_nr_pages(struct vmem_altmap *altmap,
>> +                                             unsigned long nr_pages)
>> +{
>> +       return !altmap ? 2 * (PAGE_SIZE/sizeof(struct page)) : nr_pages;
>> +}
>> +
> 
> This means only the first 2 pages will be modified; the remaining 6 or
> 4094 pages will not.  In the HugeTLB case, those tail pages are mapped
> read-only to catch invalid usage of tail pages (e.g. write operations).
> Quick question: should we do something similar for DAX?
> 
What stands in the way of marking deduplicated pages as read-only is one
particular CONFIG_DEBUG_VM feature, namely page_init_poison(). HugeTLB
gets its memory from the page allocator or from already pre-populated
(at boot) system RAM sections, and needs those to be 'given back' before
they can be hotunplugged. So I guess it never goes through
page_init_poison(). With device-dax, on the other hand, the sections are
populated and dedicated to device-dax when hotplugged, and depopulated
on hotunplug when the last devdax user drops the page reference.

So page_init_poison() is called on those two occasions. It actually
writes to whole sections of memmap, not just one page. So either I gate
the read-only page protection on CONFIG_DEBUG_VM=n (which feels very
wrong), or I detect inside page_init_poison() that the caller is trying
to init compound devmap backed struct pages that were already
deduplicated (i.e. essentially when the pfn offset between the passed
page and the head page is bigger than 128).
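
A rough sketch of that second option, just to make it concrete
(devmap_head_pfn() is a made-up placeholder for whatever pgmap /
geometry lookup would actually be needed, so treat this as a sketch
rather than the real thing):

static void maybe_page_init_poison(struct page *page, size_t size)
{
	unsigned long pfn = page_to_pfn(page);
	unsigned long head_pfn = devmap_head_pfn(pfn);	/* hypothetical */

	/*
	 * Only the first 2 * (PAGE_SIZE / sizeof(struct page)) == 128
	 * struct pages of a compound devmap page are uniquely backed;
	 * the rest alias those vmemmap pages and must not be poisoned
	 * again.
	 */
	if (pfn - head_pfn >= 2 * (PAGE_SIZE / sizeof(struct page)))
		return;

	page_init_poison(page, size);
}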
Muchun Song Feb. 12, 2022, 11:11 a.m. UTC | #3
On Fri, Feb 11, 2022 at 8:48 PM Joao Martins <joao.m.martins@oracle.com> wrote:
>
> On 2/11/22 05:07, Muchun Song wrote:
> > On Fri, Feb 11, 2022 at 3:34 AM Joao Martins <joao.m.martins@oracle.com> wrote:
> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> index cface1d38093..c10df2fd0ec2 100644
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -6666,6 +6666,20 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
> >>         }
> >>  }
> >>
> >> +/*
> >> + * With compound page geometry and when struct pages are stored in RAM most
> >> + * tail pages are reused. Consequently, the number of unique struct pages to
> >> + * initialize is a lot smaller than the total number of struct pages being
> >> + * mapped. This is a paired / mild layering violation with explicit knowledge
> >> + * of how the sparse_vmemmap internals handle compound pages in the absence
> >> + * of an altmap. See vmemmap_populate_compound_pages().
> >> + */
> >> +static inline unsigned long compound_nr_pages(struct vmem_altmap *altmap,
> >> +                                             unsigned long nr_pages)
> >> +{
> >> +       return !altmap ? 2 * (PAGE_SIZE/sizeof(struct page)) : nr_pages;
> >> +}
> >> +
> >
> > This means only the first 2 pages will be modified; the remaining 6 or
> > 4094 pages will not.  In the HugeTLB case, those tail pages are mapped
> > read-only to catch invalid usage of tail pages (e.g. write operations).
> > Quick question: should we do something similar for DAX?
> >
> What stands in the way of marking deduplicated pages as read-only is one
> particular CONFIG_DEBUG_VM feature, namely page_init_poison(). HugeTLB
> gets its memory from the page allocator or from already pre-populated
> (at boot) system RAM sections, and needs those to be 'given back' before
> they can be hotunplugged. So I guess it never goes through
> page_init_poison(). With device-dax, on the other hand, the sections are
> populated and dedicated to device-dax when hotplugged, and depopulated
> on hotunplug when the last devdax user drops the page reference.
>
> So page_init_poison() is called on those two occasions. It actually
> writes to whole sections of memmap, not just one page. So either I gate
> the read-only page protection on CONFIG_DEBUG_VM=n (which feels very
> wrong), or I detect inside page_init_poison() that the caller is trying
> to init compound devmap backed struct pages that were already
> deduplicated (i.e. essentially when the pfn offset between the passed
> page and the head page is bigger than 128).

Got it. I hadn't realized that page_init_poison() would poison the
struct pages. I agree with you that mapping them read-only would be
wrong here.

Thanks.

Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cface1d38093..c10df2fd0ec2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6666,6 +6666,20 @@  static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
 	}
 }
 
+/*
+ * With compound page geometry and when struct pages are stored in RAM most
+ * tail pages are reused. Consequently, the number of unique struct pages to
+ * initialize is a lot smaller than the total number of struct pages being
+ * mapped. This is a paired / mild layering violation with explicit knowledge
+ * of how the sparse_vmemmap internals handle compound pages in the absence
+ * of an altmap. See vmemmap_populate_compound_pages().
+ */
+static inline unsigned long compound_nr_pages(struct vmem_altmap *altmap,
+					      unsigned long nr_pages)
+{
+	return !altmap ? 2 * (PAGE_SIZE/sizeof(struct page)) : nr_pages;
+}
+
 static void __ref memmap_init_compound(struct page *head,
 				       unsigned long head_pfn,
 				       unsigned long zone_idx, int nid,
@@ -6730,7 +6744,7 @@  void __ref memmap_init_zone_device(struct zone *zone,
 			continue;
 
 		memmap_init_compound(page, pfn, zone_idx, nid, pgmap,
-				     pfns_per_compound);
+				     compound_nr_pages(altmap, pfns_per_compound));
 	}
 
 	pr_info("%s initialised %lu pages in %ums\n", __func__,