[v1,09/11] mm/page_alloc: reuse tail struct pages for compound pagemaps

Message ID 20210325230938.30752-10-joao.m.martins@oracle.com (mailing list archive)
State New, archived
Series mm, sparse-vmemmap: Introduce compound pagemaps

Commit Message

Joao Martins March 25, 2021, 11:09 p.m. UTC
When a pgmap @align is set, all pages are mapped at a given huge page
alignment and thus uses compound pages to describe them as opposed to a
struct per 4K.

With @align > PAGE_SIZE and when struct pages are stored in RAM
(!altmap), most tail pages are reused. Consequently, the number of
unique struct pages is a lot smaller than the total number of struct
pages being mapped.

When struct pages are initialized in memmap_init_zone_device(), make
sure that only unique struct pages are initialized, i.e. the first
two 4K pages per @align, which means 128 struct pages, instead of
32768 for a 2M @align or 262144 for a 1G @align.
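
(With 4K base pages and a 64-byte struct page, the first two vmemmap
pages hold 2 * 4096 / 64 = 128 struct pages, while a 1G @align spans
1G / 4K = 262144 of them.)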

Keep the old behaviour with altmap given that it doesn't support reuse
of tail vmemmap areas.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 mm/page_alloc.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Joao Martins June 7, 2021, 1:48 p.m. UTC | #1
On 6/2/21 12:35 AM, Dan Williams wrote:
> Just like patch8 lets rename this "optimize memmap_init_zone_device()
> for compound page maps"
> 
> 
> On Thu, Mar 25, 2021 at 4:10 PM Joao Martins <joao.m.martins@oracle.com> wrote:
>>
>> When a pgmap @align is set,
> 
> Rewrite this to talk about geometry rather than align per previous
> patch comments.
> 
OK.

>> all pages are mapped at a given huge page
>> alignment and thus uses compound pages to describe them as opposed to a
> 
> s/thus uses/use/
>
/me nods

>> struct per 4K.
>>
>> With @align > PAGE_SIZE and when struct pages are stored in ram
>> (!altmap) most tail pages are reused. Consequently, the amount of unique
>> struct pages is a lot smaller that the total amount of struct pages
>> being mapped.
>>
>> When struct pages are initialize in memmap_init_zone_device, make
>> sure that only unique struct pages are initialized i.e. the first 2
>> 4K pages per @align which means 128 struct pages, instead of 32768 for
>> 2M @align or 262144 for a 1G @align.
> 
> This is the important detail to put at the beginning and then follow
> with the detailed description, i.e. something like:
> 
> "Currently memmap_init_zone_device() ends up initializing 32768 pages
> when it only needs to initialize 128 given tail page reuse. That
> number is worse with 1GB compound page geometries, 262144 instead of
> 128. Update memmap_init_zone_device() to skip redundant
> initialization, details below:"
> 
/me nods

>> Keep old behaviour with altmap given that it doesn't support reusal
>> of tail vmemmap areas.
> 
> I think I like how you described this earlier as the (!altmap) path. So:
> 
> "The !altmap path is left alone since it does not support compound
> page map geometries."
> 
It's more accurate to say that "does not support memory savings with
compound page map geometries."
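
To make the saving concrete (assuming 4K pages and a 64-byte struct page):
a 2M compound page spans 512 struct pages, i.e. 32K of memmap or 8 vmemmap
pages, of which only the first 2 are unique; the other 6 can be reused, a
75% saving. For 1G, 4096 vmemmap pages shrink down to the same 2.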

>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>>  mm/page_alloc.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 3a77f9e43f3a..948dfad6754b 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -6277,6 +6277,8 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
>>         }
>>  }
>>
>> +#define MEMMAP_NR_PAGES        (2 * (PAGE_SIZE/sizeof(struct page)))
> 
> This seems too generic a name for something that is specific to the
> compound page init case... more below.
> 
>> +
>>  void __ref memmap_init_zone_device(struct zone *zone,
>>                                    unsigned long start_pfn,
>>                                    unsigned long nr_pages,
>> @@ -6287,6 +6289,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
>>         struct vmem_altmap *altmap = pgmap_altmap(pgmap);
>>         unsigned int pfn_align = pgmap_pfn_align(pgmap);
>>         unsigned int order_align = order_base_2(pfn_align);
>> +       unsigned long ntails = min_t(unsigned long, pfn_align, MEMMAP_NR_PAGES);
> 
> I'd rather delay this to a specific "if (compound)" branch below
> because compound init is so different from the base page case.
> 
Perhaps this is not obvious, but both the altmap *and* non-altmap cases would use compound pages.

The difference in the default ntails value comes down to the number of *unique* struct pages (...)

>>         unsigned long zone_idx = zone_idx(zone);
>>         unsigned long start = jiffies;
>>         int nid = pgdat->node_id;
>> @@ -6302,6 +6305,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
>>         if (altmap) {
>>                 start_pfn = altmap->base_pfn + vmem_altmap_offset(altmap);
>>                 nr_pages = end_pfn - start_pfn;
>> +               ntails = pfn_align;
> 
> Why would the altmap case need to init ntails, won't the inner loop of
> compound page setup be skipped?
> 
(...) Which means the answer here is no. The altmap case does initialize tail pages, but
unlike the non-altmap case, all tail struct pages are unique (no savings) and thus all
need to be initialized.
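
In code, this v1 effectively picks ntails as below (a sketch condensed from
the hunks in this patch, using its MEMMAP_NR_PAGES and pgmap_pfn_align()):

        /* !altmap: only the first two vmemmap pages per @align are unique */
        unsigned long ntails = min_t(unsigned long, pfn_align, MEMMAP_NR_PAGES);

        /* altmap: no vmemmap reuse, so every tail struct page is unique */
        if (altmap)
                ntails = pfn_align;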

>>         }
> 
> Given all of the above I'm wondering if there should be a new
> "compound specific" flavor of this routine rather than trying to
> cleverly inter mingle the old path with the new. This is easier
> because you have already done the work in previous patches to break
> this into helpers. So just have memmap_init_zone_device() do it the
> "old" way and memmap_init_compound() handle all the tail page init +
> optimizations.
> 
I can separate it out, should be easier to follow.

Albeit just a note, I think memmap_init_compound() should be the new normal as metadata
more accurately represents what goes on the page tables. That's regardless of
vmemmap-based gains, and hence why my train of thought was to not separate it.

After this series, all devmap pages where @geometry matches @align will have compound
pages be used instead. And we enable that in device-dax as first user (in the next patch).
altmap or not so far just differentiates on the uniqueness of struct pages as the former
doesn't reuse base pages that only contain tail pages and consequently makes us initialize
all tail struct pages.
Dan Williams June 7, 2021, 7:32 p.m. UTC | #2
On Mon, Jun 7, 2021 at 6:49 AM Joao Martins <joao.m.martins@oracle.com> wrote:
[..]
> > Given all of the above I'm wondering if there should be a new
> > "compound specific" flavor of this routine rather than trying to
> > cleverly inter mingle the old path with the new. This is easier
> > because you have already done the work in previous patches to break
> > this into helpers. So just have memmap_init_zone_device() do it the
> > "old" way and memmap_init_compound() handle all the tail page init +
> > optimizations.
> >
> I can separate it out, should be easier to follow.
>
> Albeit just a note, I think memmap_init_compound() should be the new normal as metadata
> more accurately represents what goes on the page tables. That's regardless of
> vmemmap-based gains, and hence why my train of thought was to not separate it.
>
> After this series, all devmap pages where @geometry matches @align will have compound
> pages be used instead. And we enable that in device-dax as first user (in the next patch).
> altmap or not so far just differentiates on the uniqueness of struct pages as the former
> doesn't reuse base pages that only contain tail pages and consequently makes us initialize
> all tail struct pages.

I think combining the cases into a common central routine makes the
code that much harder to maintain. A small duplication cost is worth
it in my opinion to help readability / maintainability.
Joao Martins June 14, 2021, 6:41 p.m. UTC | #3
On 6/7/21 8:32 PM, Dan Williams wrote:
> On Mon, Jun 7, 2021 at 6:49 AM Joao Martins <joao.m.martins@oracle.com> wrote:
> [..]
>>> Given all of the above I'm wondering if there should be a new
>>> "compound specific" flavor of this routine rather than trying to
>>> cleverly inter mingle the old path with the new. This is easier
>>> because you have already done the work in previous patches to break
>>> this into helpers. So just have memmap_init_zone_device() do it the
>>> "old" way and memmap_init_compound() handle all the tail page init +
>>> optimizations.
>>>
>> I can separate it out, should be easier to follow.
>>
>> Albeit just a note, I think memmap_init_compound() should be the new normal as metadata
>> more accurately represents what goes on the page tables. That's regardless of
>> vmemmap-based gains, and hence why my train of thought was to not separate it.
>>
>> After this series, all devmap pages where @geometry matches @align will have compound
>> pages be used instead. And we enable that in device-dax as first user (in the next patch).
>> altmap or not so far just differentiates on the uniqueness of struct pages as the former
>> doesn't reuse base pages that only contain tail pages and consequently makes us initialize
>> all tail struct pages.
> 
> I think combining the cases into a common central routine makes the
> code that much harder to maintain. A small duplication cost is worth
> it in my opinion to help readability / maintainability.
> 
I am addressing this comment and taking a step back. By just moving the tail page init to
memmap_init_compound(), this gets a lot more readable. Albeit now I think that having
separate top-level loops over pfns doesn't bring much improvement there.

Here's what I have by moving just the tail init to a separate routine. See your original
suggestion after the scissors mark. I have a slight inclination towards the first one, but
no strong preference either way. Thoughts?

[...]

static void __ref memmap_init_compound(struct page *page, unsigned long pfn,
                                        unsigned long zone_idx, int nid,
                                        struct dev_pagemap *pgmap,
                                        unsigned long nr_pages)
{
        unsigned int order_align = order_base_2(nr_pages);
        unsigned long i;

        __SetPageHead(page);

        for (i = 1; i < nr_pages; i++) {
                __init_zone_device_page(page + i, pfn + i, zone_idx,
                                        nid, pgmap);
                prep_compound_tail(page, i);

                /*
                 * The first and second tail pages need to be
                 * initialized first: prep_compound_head() stores
                 * the compound metadata in them, hence the head
                 * page is prepared last.
                 */
                if (i == 2)
                        prep_compound_head(page, order_align);
        }
}

void __ref memmap_init_zone_device(struct zone *zone,
                                   unsigned long start_pfn,
                                   unsigned long nr_pages,
                                   struct dev_pagemap *pgmap)
{
        unsigned long pfn, end_pfn = start_pfn + nr_pages;
        struct pglist_data *pgdat = zone->zone_pgdat;
        struct vmem_altmap *altmap = pgmap_altmap(pgmap);
        unsigned long pfns_per_compound = pgmap_pfn_geometry(pgmap);
        unsigned long zone_idx = zone_idx(zone);
        unsigned long start = jiffies;
        int nid = pgdat->node_id;

        if (WARN_ON_ONCE(!pgmap || zone_idx(zone) != ZONE_DEVICE))
                return;

        /*
         * The call to memmap_init_zone should have already taken care
         * of the pages reserved for the memmap, so we can just jump to
         * the end of that region and start processing the device pages.
         */
        if (altmap) {
                start_pfn = altmap->base_pfn + vmem_altmap_offset(altmap);
                nr_pages = end_pfn - start_pfn;
        }

        for (pfn = start_pfn; pfn < end_pfn; pfn += pfns_per_compound) {
                struct page *page = pfn_to_page(pfn);

                __init_zone_device_page(page, pfn, zone_idx, nid, pgmap);

                if (pfns_per_compound == 1)
                        continue;

                memmap_init_compound(page, pfn, zone_idx, nid, pgmap,
                                     pfns_per_compound);
        }

        pr_info("%s initialised %lu pages in %ums\n", __func__,
                nr_pages, jiffies_to_msecs(jiffies - start));
}
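
Note: pgmap_pfn_geometry() is introduced earlier in the series; a minimal
sketch of the idea, assuming a pgmap->geometry field that holds the mapping
granularity in bytes (an assumption for illustration):

        /* size, in bytes, that one compound page covers for this pgmap */
        static unsigned long pgmap_geometry(struct dev_pagemap *pgmap)
        {
                return pgmap->geometry ? pgmap->geometry : PAGE_SIZE;
        }

        /* number of base pfns per compound page */
        static unsigned long pgmap_pfn_geometry(struct dev_pagemap *pgmap)
        {
                return PHYS_PFN(pgmap_geometry(pgmap));
        }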


[...]

--->8-----
Whereas your original suggestion would look more like this:

[...]

static void __ref memmap_init_compound(unsigned long zone_idx, int nid,
                                        struct dev_pagemap *pgmap,
                                        unsigned long start_pfn,
                                        unsigned long end_pfn)
{
        unsigned long pfns_per_compound = pgmap_pfn_geometry(pgmap);
        unsigned int order_align = order_base_2(pfns_per_compound);
        unsigned long pfn, i;

        for (pfn = start_pfn; pfn < end_pfn; pfn += pfns_per_compound) {
                struct page *page = pfn_to_page(pfn);

                __init_zone_device_page(page, pfn, zone_idx, nid, pgmap);

                __SetPageHead(page);

                for (i = 1; i < pfns_per_compound; i++) {
                        __init_zone_device_page(page + i, pfn + i, zone_idx,
                                                nid, pgmap);
                        prep_compound_tail(page, i);

                        /*
                         * The first and second tail pages need to be
                         * initialized first: prep_compound_head() stores
                         * the compound metadata in them, hence the head
                         * page is prepared last.
                         */
                        if (i == 2)
                                prep_compound_head(page, order_align);
                }
        }
}

static void __ref memmap_init_base(unsigned long zone_idx, int nid,
                                         struct dev_pagemap *pgmap,
                                         unsigned long start_pfn,
                                         unsigned long end_pfn)
{
        unsigned long pfn;

        for (pfn = start_pfn; pfn < end_pfn; pfn++) {
                struct page *page = pfn_to_page(pfn);

                __init_zone_device_page(page, pfn, zone_idx, nid, pgmap);
        }
}

void __ref memmap_init_zone_device(struct zone *zone,
                                   unsigned long start_pfn,
                                   unsigned long nr_pages,
                                   struct dev_pagemap *pgmap)
{
        unsigned long pfn, end_pfn = start_pfn + nr_pages;
        struct pglist_data *pgdat = zone->zone_pgdat;
        struct vmem_altmap *altmap = pgmap_altmap(pgmap);
        bool compound = pgmap_geometry(pgmap) > PAGE_SIZE;
        unsigned long zone_idx = zone_idx(zone);
        unsigned long start = jiffies;
        int nid = pgdat->node_id;

        if (WARN_ON_ONCE(!pgmap || zone_idx(zone) != ZONE_DEVICE))
                return;

        /*
         * The call to memmap_init_zone should have already taken care
         * of the pages reserved for the memmap, so we can just jump to
         * the end of that region and start processing the device pages.
         */
        if (altmap) {
                start_pfn = altmap->base_pfn + vmem_altmap_offset(altmap);
                nr_pages = end_pfn - start_pfn;
        }

        if (compound)
                memmap_init_compound(zone_idx, nid, pgmap, start_pfn, end_pfn);
        else
                memmap_init_base(zone_idx, nid, pgmap, start_pfn, end_pfn);

        pr_info("%s initialised %lu pages in %ums\n", __func__,
                nr_pages, jiffies_to_msecs(jiffies - start));
}
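
The main difference between the two: the first variant keeps a single
top-level loop over pfns and only splits out the tail page init, while
the second duplicates the pfn loop per case.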
Dan Williams June 14, 2021, 11:07 p.m. UTC | #4
On Mon, Jun 14, 2021 at 11:42 AM Joao Martins <joao.m.martins@oracle.com> wrote:
>
>
>
> On 6/7/21 8:32 PM, Dan Williams wrote:
> > On Mon, Jun 7, 2021 at 6:49 AM Joao Martins <joao.m.martins@oracle.com> wrote:
> > [..]
> >>> Given all of the above I'm wondering if there should be a new
> >>> "compound specific" flavor of this routine rather than trying to
> >>> cleverly inter mingle the old path with the new. This is easier
> >>> because you have already done the work in previous patches to break
> >>> this into helpers. So just have memmap_init_zone_device() do it the
> >>> "old" way and memmap_init_compound() handle all the tail page init +
> >>> optimizations.
> >>>
> >> I can separate it out, should be easier to follow.
> >>
> >> Albeit just a note, I think memmap_init_compound() should be the new normal as metadata
> >> more accurately represents what goes on the page tables. That's regardless of
> >> vmemmap-based gains, and hence why my train of thought was to not separate it.
> >>
> >> After this series, all devmap pages where @geometry matches @align will have compound
> >> pages be used instead. And we enable that in device-dax as first user (in the next patch).
> >> altmap or not so far just differentiates on the uniqueness of struct pages as the former
> >> doesn't reuse base pages that only contain tail pages and consequently makes us initialize
> >> all tail struct pages.
> >
> > I think combining the cases into a common central routine makes the
> > code that much harder to maintain. A small duplication cost is worth
> > it in my opinion to help readability / maintainability.
> >
> I am addressing this comment and taking a step back. By just moving the tail page init to
> memmap_init_compound() this gets a lot more readable. Albeit now I think having separate
> top-level loops over pfns, doesn't bring much improvement there.
>
> Here's what I have by moving just tails init to a separate routine. See your original
> suggestion after the scissors mark. I have a slight inclination towards the first one, but
> no really strong preference. Thoughts?

I like your "first one" too.

Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3a77f9e43f3a..948dfad6754b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6277,6 +6277,8 @@  static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
 	}
 }
 
+#define MEMMAP_NR_PAGES	(2 * (PAGE_SIZE/sizeof(struct page)))
+
 void __ref memmap_init_zone_device(struct zone *zone,
 				   unsigned long start_pfn,
 				   unsigned long nr_pages,
@@ -6287,6 +6289,7 @@  void __ref memmap_init_zone_device(struct zone *zone,
 	struct vmem_altmap *altmap = pgmap_altmap(pgmap);
 	unsigned int pfn_align = pgmap_pfn_align(pgmap);
 	unsigned int order_align = order_base_2(pfn_align);
+	unsigned long ntails = min_t(unsigned long, pfn_align, MEMMAP_NR_PAGES);
 	unsigned long zone_idx = zone_idx(zone);
 	unsigned long start = jiffies;
 	int nid = pgdat->node_id;
@@ -6302,6 +6305,7 @@  void __ref memmap_init_zone_device(struct zone *zone,
 	if (altmap) {
 		start_pfn = altmap->base_pfn + vmem_altmap_offset(altmap);
 		nr_pages = end_pfn - start_pfn;
+		ntails = pfn_align;
 	}
 
 	for (pfn = start_pfn; pfn < end_pfn; pfn += pfn_align) {
@@ -6315,7 +6319,7 @@  void __ref memmap_init_zone_device(struct zone *zone,
 
 		__SetPageHead(page);
 
-		for (i = 1; i < pfn_align; i++) {
+		for (i = 1; i < ntails; i++) {
 			__init_zone_device_page(page + i, pfn + i, zone_idx,
 						nid, pgmap);
 			prep_compound_tail(page, i);
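
For reference, with a 4K PAGE_SIZE and a 64-byte struct page, the
MEMMAP_NR_PAGES above works out to 2 * 4096 / 64 = 128. A standalone
userspace sketch of the arithmetic (hypothetical, not part of the patch):

#include <stdio.h>

#define PAGE_SIZE       4096UL
#define STRUCT_PAGE_SZ  64UL    /* a typical sizeof(struct page) */
#define MEMMAP_NR_PAGES (2 * (PAGE_SIZE / STRUCT_PAGE_SZ))

int main(void)
{
        unsigned long aligns[] = { 2UL << 20, 1UL << 30 }; /* 2M, 1G */

        for (int i = 0; i < 2; i++) {
                unsigned long pfn_align = aligns[i] / PAGE_SIZE;
                unsigned long ntails = pfn_align < MEMMAP_NR_PAGES ?
                                        pfn_align : MEMMAP_NR_PAGES;

                /* total vs unique struct pages per compound page */
                printf("align %10lu: total %7lu, unique %lu\n",
                        aligns[i], pfn_align, ntails);
        }
        return 0;
}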