| Message ID | 20220210193345.23628-6-joao.m.martins@oracle.com (mailing list archive) |
|---|---|
| State | Superseded |
| Series | sparse-vmemmap: memory savings for compound devmaps (device-dax) |
On Fri, Feb 11, 2022 at 3:34 AM Joao Martins <joao.m.martins@oracle.com> wrote:
>
> Currently memmap_init_zone_device() ends up initializing 32768 pages
> when it only needs to initialize 128 given tail page reuse. That
> number is worse with 1GB compound pages, 262144 instead of 128. Update
> memmap_init_zone_device() to skip redundant initialization, detailed
> below.
>
> When a pgmap @vmemmap_shift is set, all pages are mapped at a given
> huge page alignment and use compound pages to describe them as opposed
> to a struct page per 4K page.
>
> With @vmemmap_shift > 0 and when struct pages are stored in ram
> (!altmap) most tail pages are reused. Consequently, the amount of
> unique struct pages is a lot smaller than the total amount of struct
> pages being mapped.
>
> The altmap path is left alone since it does not support memory savings
> based on compound devmaps.
>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>  mm/page_alloc.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index cface1d38093..c10df2fd0ec2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6666,6 +6666,20 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
>  	}
>  }
>
> +/*
> + * With compound page geometry and when struct pages are stored in ram most
> + * tail pages are reused. Consequently, the amount of unique struct pages to
> + * initialize is a lot smaller that the total amount of struct pages being
> + * mapped. This is a paired / mild layering violation with explicit knowledge
> + * of how the sparse_vmemmap internals handle compound pages in the lack
> + * of an altmap. See vmemmap_populate_compound_pages().
> + */
> +static inline unsigned long compound_nr_pages(struct vmem_altmap *altmap,
> +					      unsigned long nr_pages)
> +{
> +	return !altmap ? 2 * (PAGE_SIZE/sizeof(struct page)) : nr_pages;
> +}
> +

This means only the first 2 pages will be modified, while the rest (6 or 4094
pages) will not. In the HugeTLB case, those tail pages are mapped read-only to
catch invalid usage of tail pages (e.g. write operations). Quick question:
should we also do something similar on DAX?

Thanks.
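For context on the "first 2 pages" figure above, here is a small standalone sketch of the arithmetic, assuming typical x86-64 values (4 KiB base pages, 64-byte struct page); the program and its constants are illustrative and not taken from the patch:

#include <stdio.h>

/* Assumed values, for illustration only (common x86-64 defaults). */
#define PAGE_SIZE        4096UL
#define STRUCT_PAGE_SIZE   64UL   /* assumed sizeof(struct page) */

int main(void)
{
	/* struct pages described by a single vmemmap page */
	unsigned long per_vmemmap = PAGE_SIZE / STRUCT_PAGE_SIZE;          /* 64   */

	/* 2M compound page: 512 struct pages -> 8 vmemmap pages,
	 * of which only the first 2 are unique; the other 6 are reused.  */
	unsigned long vmemmap_2m = ((2UL << 20) / PAGE_SIZE) / per_vmemmap;

	/* 1G compound page: 262144 struct pages -> 4096 vmemmap pages,
	 * of which only the first 2 are unique; the other 4094 are reused. */
	unsigned long vmemmap_1g = ((1UL << 30) / PAGE_SIZE) / per_vmemmap;

	printf("unique struct pages to init: %lu\n", 2 * per_vmemmap);      /* 128  */
	printf("reused vmemmap pages, 2M: %lu\n", vmemmap_2m - 2);          /* 6    */
	printf("reused vmemmap pages, 1G: %lu\n", vmemmap_1g - 2);          /* 4094 */
	return 0;
}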
On 2/11/22 05:07, Muchun Song wrote:
> On Fri, Feb 11, 2022 at 3:34 AM Joao Martins <joao.m.martins@oracle.com> wrote:
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index cface1d38093..c10df2fd0ec2 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -6666,6 +6666,20 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
>>  	}
>>  }
>>
>> +/*
>> + * With compound page geometry and when struct pages are stored in ram most
>> + * tail pages are reused. Consequently, the amount of unique struct pages to
>> + * initialize is a lot smaller that the total amount of struct pages being
>> + * mapped. This is a paired / mild layering violation with explicit knowledge
>> + * of how the sparse_vmemmap internals handle compound pages in the lack
>> + * of an altmap. See vmemmap_populate_compound_pages().
>> + */
>> +static inline unsigned long compound_nr_pages(struct vmem_altmap *altmap,
>> +					      unsigned long nr_pages)
>> +{
>> +	return !altmap ? 2 * (PAGE_SIZE/sizeof(struct page)) : nr_pages;
>> +}
>> +
>
> This means only the first 2 pages will be modified, while the rest (6 or 4094
> pages) will not. In the HugeTLB case, those tail pages are mapped read-only to
> catch invalid usage of tail pages (e.g. write operations). Quick question:
> should we also do something similar on DAX?
>
What's sort of in the way of marking deduplicated pages as read-only is one
particular CONFIG_DEBUG_VM feature, namely page_init_poison(). HugeTLB gets its
memory from the page allocator or already has pre-populated (at boot) system
RAM sections, and needs those to be 'given back' before they can be
hotunplugged. So I guess it never goes through page_init_poison(). With
device-dax, though, the sections are populated and dedicated to device-dax when
hotplugged, and then again on hotunplug, when the last devdax user drops the
page reference.

So page_init_poison() is called on those two occasions. It actually writes to
whole sections of memmap, not just one page. So either I gate read-only page
protection on CONFIG_DEBUG_VM=n (which feels very wrong), or I detect inside
page_init_poison() that the caller is trying to init compound devmap backed
struct pages that were already watermarked (i.e. essentially when the pfn
offset between the passed page and the head page is bigger than 128).
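The detection Joao sketches in the last sentence could look roughly like the following. This is purely illustrative: the helper name, its signature, and where it would be hooked in are assumptions, not anything present in the patch or in page_init_poison():

/*
 * Hypothetical sketch only: identify struct pages whose vmemmap backing
 * is shared with the start of the compound devmap, i.e. everything past
 * the first two vmemmap pages (128 struct pages with a 64-byte struct
 * page). Neither this helper nor its caller exist in the patch.
 */
static inline bool devmap_page_is_deduped(const struct page *page,
					  const struct page *head)
{
	const unsigned long nr_unique = 2 * (PAGE_SIZE / sizeof(struct page));

	return (unsigned long)(page - head) >= nr_unique;
}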
On Fri, Feb 11, 2022 at 8:48 PM Joao Martins <joao.m.martins@oracle.com> wrote:
>
> On 2/11/22 05:07, Muchun Song wrote:
> > On Fri, Feb 11, 2022 at 3:34 AM Joao Martins <joao.m.martins@oracle.com> wrote:
> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> index cface1d38093..c10df2fd0ec2 100644
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -6666,6 +6666,20 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
> >>  	}
> >>  }
> >>
> >> +/*
> >> + * With compound page geometry and when struct pages are stored in ram most
> >> + * tail pages are reused. Consequently, the amount of unique struct pages to
> >> + * initialize is a lot smaller that the total amount of struct pages being
> >> + * mapped. This is a paired / mild layering violation with explicit knowledge
> >> + * of how the sparse_vmemmap internals handle compound pages in the lack
> >> + * of an altmap. See vmemmap_populate_compound_pages().
> >> + */
> >> +static inline unsigned long compound_nr_pages(struct vmem_altmap *altmap,
> >> +					      unsigned long nr_pages)
> >> +{
> >> +	return !altmap ? 2 * (PAGE_SIZE/sizeof(struct page)) : nr_pages;
> >> +}
> >> +
> >
> > This means only the first 2 pages will be modified, while the rest (6 or 4094
> > pages) will not. In the HugeTLB case, those tail pages are mapped read-only to
> > catch invalid usage of tail pages (e.g. write operations). Quick question:
> > should we also do something similar on DAX?
> >
> What's sort of in the way of marking deduplicated pages as read-only is one
> particular CONFIG_DEBUG_VM feature, namely page_init_poison(). HugeTLB gets its
> memory from the page allocator or already has pre-populated (at boot) system
> RAM sections, and needs those to be 'given back' before they can be
> hotunplugged. So I guess it never goes through page_init_poison(). With
> device-dax, though, the sections are populated and dedicated to device-dax when
> hotplugged, and then again on hotunplug, when the last devdax user drops the
> page reference.
>
> So page_init_poison() is called on those two occasions. It actually writes to
> whole sections of memmap, not just one page. So either I gate read-only page
> protection on CONFIG_DEBUG_VM=n (which feels very wrong), or I detect inside
> page_init_poison() that the caller is trying to init compound devmap backed
> struct pages that were already watermarked (i.e. essentially when the pfn
> offset between the passed page and the head page is bigger than 128).

Got it. I hadn't realized page_init_poison() would poison the struct pages.
I agree with you that mapping them read-only is wrong.

Thanks.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cface1d38093..c10df2fd0ec2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6666,6 +6666,20 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
 	}
 }
 
+/*
+ * With compound page geometry and when struct pages are stored in ram most
+ * tail pages are reused. Consequently, the amount of unique struct pages to
+ * initialize is a lot smaller that the total amount of struct pages being
+ * mapped. This is a paired / mild layering violation with explicit knowledge
+ * of how the sparse_vmemmap internals handle compound pages in the lack
+ * of an altmap. See vmemmap_populate_compound_pages().
+ */
+static inline unsigned long compound_nr_pages(struct vmem_altmap *altmap,
+					      unsigned long nr_pages)
+{
+	return !altmap ? 2 * (PAGE_SIZE/sizeof(struct page)) : nr_pages;
+}
+
 static void __ref memmap_init_compound(struct page *head,
 				       unsigned long head_pfn,
 				       unsigned long zone_idx, int nid,
@@ -6730,7 +6744,7 @@ void __ref memmap_init_zone_device(struct zone *zone,
 			continue;
 
 		memmap_init_compound(page, pfn, zone_idx, nid, pgmap,
-				     pfns_per_compound);
+				     compound_nr_pages(altmap, pfns_per_compound));
 	}
 
 	pr_info("%s initialised %lu pages in %ums\n", __func__,
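To see what the one-line call-site change does in practice, here is a tiny standalone program that reuses the helper's expression with stubbed-out types; PAGE_SIZE and the 64-byte struct page are assumed x86-64 values, and only the ternary mirrors the patch:

#include <stdio.h>

#define PAGE_SIZE 4096UL
struct page { char pad[64]; };       /* assumed 64-byte struct page      */
struct vmem_altmap { int unused; };  /* stand-in; only NULL-ness matters */

/* Same expression as the helper added by the patch. */
static unsigned long compound_nr_pages(struct vmem_altmap *altmap,
				       unsigned long nr_pages)
{
	return !altmap ? 2 * (PAGE_SIZE / sizeof(struct page)) : nr_pages;
}

int main(void)
{
	struct vmem_altmap altmap = { 0 };

	/* pfns_per_compound is 512 for 2M and 262144 for 1G compound pages */
	printf("2M, no altmap: %lu\n", compound_nr_pages(NULL, 512));       /* 128    */
	printf("1G, no altmap: %lu\n", compound_nr_pages(NULL, 262144));    /* 128    */
	printf("2M, altmap:    %lu\n", compound_nr_pages(&altmap, 512));    /* 512    */
	printf("1G, altmap:    %lu\n", compound_nr_pages(&altmap, 262144)); /* 262144 */
	return 0;
}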
Currently memmap_init_zone_device() ends up initializing 32768 pages when it
only needs to initialize 128 given tail page reuse. That number is worse with
1GB compound pages, 262144 instead of 128. Update memmap_init_zone_device() to
skip redundant initialization, detailed below.

When a pgmap @vmemmap_shift is set, all pages are mapped at a given huge page
alignment and use compound pages to describe them as opposed to a struct page
per 4K page.

With @vmemmap_shift > 0 and when struct pages are stored in ram (!altmap),
most tail pages are reused. Consequently, the amount of unique struct pages is
a lot smaller than the total amount of struct pages being mapped.

The altmap path is left alone since it does not support memory savings based
on compound devmaps.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 mm/page_alloc.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)
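As background for the @vmemmap_shift reference above, a hedged sketch of how a devmap user might opt into 2M compound pages; only the vmemmap_shift field comes from this series' description, while the type, the elided range setup and the commented-out memremap_pages() call are illustrative assumptions:

/*
 * Illustrative only: a dev_pagemap opting into 2M compound pages.
 * Required fields such as the physical range are elided.
 */
struct dev_pagemap pgmap = {
	.type          = MEMORY_DEVICE_GENERIC,
	.vmemmap_shift = PMD_SHIFT - PAGE_SHIFT, /* 512 base pages per compound */
};

/* void *vmem = memremap_pages(&pgmap, numa_node_id()); */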