Message ID | 20210315092015.35396-8-songmuchun@bytedance.com (mailing list archive)
State      | New, archived
Series     | Free some vmemmap pages of HugeTLB page
On Mon, Mar 15, 2021 at 05:20:14PM +0800, Muchun Song wrote:
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -34,6 +34,7 @@
>  #include <linux/gfp.h>
>  #include <linux/kcore.h>
>  #include <linux/bootmem_info.h>
> +#include <linux/hugetlb.h>
>
>  #include <asm/processor.h>
>  #include <asm/bios_ebda.h>
> @@ -1557,7 +1558,8 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>  {
>  	int err;
>
> -	if (end - start < PAGES_PER_SECTION * sizeof(struct page))
> +	if ((is_hugetlb_free_vmemmap_enabled() && !altmap) ||
> +	    end - start < PAGES_PER_SECTION * sizeof(struct page))
>  		err = vmemmap_populate_basepages(start, end, node, NULL);
>  	else if (boot_cpu_has(X86_FEATURE_PSE))
>  		err = vmemmap_populate_hugepages(start, end, node, altmap);

I've been thinking about this some more.

Assume you opt in to the hugetlb-vmemmap feature, and assume you pass a valid altmap
to vmemmap_populate().
This will lead to us populating the vmemmap array with hugepages.

What if, then, a HugeTLB page gets allocated and falls within that memory range (backed
by hugepages)?
AFAIK, this will get us in trouble, as currently the code can only operate on memory
backed by PAGE_SIZE pages, right?

I cannot remember, but I do not think anything prevents that from happening.
Am I missing anything?
On Fri, Mar 19, 2021 at 4:59 PM Oscar Salvador <osalvador@suse.de> wrote:
>
> On Mon, Mar 15, 2021 at 05:20:14PM +0800, Muchun Song wrote:
> > --- a/arch/x86/mm/init_64.c
> > +++ b/arch/x86/mm/init_64.c
> > @@ -34,6 +34,7 @@
> >  #include <linux/gfp.h>
> >  #include <linux/kcore.h>
> >  #include <linux/bootmem_info.h>
> > +#include <linux/hugetlb.h>
> >
> >  #include <asm/processor.h>
> >  #include <asm/bios_ebda.h>
> > @@ -1557,7 +1558,8 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
> >  {
> >  	int err;
> >
> > -	if (end - start < PAGES_PER_SECTION * sizeof(struct page))
> > +	if ((is_hugetlb_free_vmemmap_enabled() && !altmap) ||
> > +	    end - start < PAGES_PER_SECTION * sizeof(struct page))
> >  		err = vmemmap_populate_basepages(start, end, node, NULL);
> >  	else if (boot_cpu_has(X86_FEATURE_PSE))
> >  		err = vmemmap_populate_hugepages(start, end, node, altmap);
>
> I've been thinking about this some more.
>
> Assume you opt in to the hugetlb-vmemmap feature, and assume you pass a valid altmap
> to vmemmap_populate().
> This will lead to us populating the vmemmap array with hugepages.

Right.

>
> What if, then, a HugeTLB page gets allocated and falls within that memory range (backed
> by hugepages)?

I am not sure whether we can allocate the HugeTLB pages from there.
Will only device memory pass a valid altmap parameter to
vmemmap_populate()? If yes, can we allocate HugeTLB pages from
device memory? Sorry, I am not an expert on this.

> AFAIK, this will get us in trouble, as currently the code can only operate on memory
> backed by PAGE_SIZE pages, right?
>
> I cannot remember, but I do not think anything prevents that from happening.
> Am I missing anything?

Maybe David H is more familiar with this.

Hi David,

Do you have some suggestions on this? Thanks.

>
> --
> Oscar Salvador
> SUSE L3
On 19.03.21 13:15, Muchun Song wrote:
> On Fri, Mar 19, 2021 at 4:59 PM Oscar Salvador <osalvador@suse.de> wrote:
>>
>> On Mon, Mar 15, 2021 at 05:20:14PM +0800, Muchun Song wrote:
>>> --- a/arch/x86/mm/init_64.c
>>> +++ b/arch/x86/mm/init_64.c
>>> @@ -34,6 +34,7 @@
>>>  #include <linux/gfp.h>
>>>  #include <linux/kcore.h>
>>>  #include <linux/bootmem_info.h>
>>> +#include <linux/hugetlb.h>
>>>
>>>  #include <asm/processor.h>
>>>  #include <asm/bios_ebda.h>
>>> @@ -1557,7 +1558,8 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>>  {
>>>  	int err;
>>>
>>> -	if (end - start < PAGES_PER_SECTION * sizeof(struct page))
>>> +	if ((is_hugetlb_free_vmemmap_enabled() && !altmap) ||
>>> +	    end - start < PAGES_PER_SECTION * sizeof(struct page))
>>>  		err = vmemmap_populate_basepages(start, end, node, NULL);
>>>  	else if (boot_cpu_has(X86_FEATURE_PSE))
>>>  		err = vmemmap_populate_hugepages(start, end, node, altmap);
>>
>> I've been thinking about this some more.
>>
>> Assume you opt in to the hugetlb-vmemmap feature, and assume you pass a valid altmap
>> to vmemmap_populate().
>> This will lead to us populating the vmemmap array with hugepages.
>
> Right.
>
>>
>> What if, then, a HugeTLB page gets allocated and falls within that memory range (backed
>> by hugepages)?
>
> I am not sure whether we can allocate the HugeTLB pages from there.
> Will only device memory pass a valid altmap parameter to
> vmemmap_populate()? If yes, can we allocate HugeTLB pages from
> device memory? Sorry, I am not an expert on this.

I think, right now, yes. System RAM that's applicable for HugePages
never uses an altmap. But Oscar's patch will change that, maybe before
your series gets included, from what I've been reading. [1]

[1] https://lkml.kernel.org/r/20210319092635.6214-1-osalvador@suse.de

>
>
>> AFAIK, this will get us in trouble, as currently the code can only operate on memory
>> backed by PAGE_SIZE pages, right?
>>
>> I cannot remember, but I do not think anything prevents that from happening.
>> Am I missing anything?
>
> Maybe David H is more familiar with this.
>
> Hi David,
>
> Do you have some suggestions on this?

There has to be some way to identify whether we can optimize specific
vmemmap pages or should just leave them alone: altmap vs. !altmap.

Unfortunately, there is no easy way to detect that - e.g.,
PageReserved() applies also to boot memory.

We could go back to setting a special PageType for these vmemmap pages,
indicating "this is a page allocated from an altmap, don't touch it".
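As a rough sketch of that last idea (hypothetical code, not part of the posted series): marking altmap-backed vmemmap pages with a dedicated page type could reuse the PAGE_TYPE_OPS() pattern from include/linux/page-flags.h. PG_vmemmap_altmap and the helpers below are invented names, and the bit value is only assumed to be free.

#define PG_vmemmap_altmap	0x00001000	/* assumed-free page_type bit */

PAGE_TYPE_OPS(VmemmapAltmap, vmemmap_altmap)

/* Called while populating the vmemmap when the backing pages come from an altmap. */
static void mark_vmemmap_page_from_altmap(struct page *page)
{
	__SetPageVmemmapAltmap(page);
}

/* Checked by the HugeTLB vmemmap-freeing path before touching a vmemmap page. */
static bool vmemmap_page_from_altmap(struct page *page)
{
	return PageVmemmapAltmap(page);
}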
On 19.03.21 13:36, David Hildenbrand wrote:
> On 19.03.21 13:15, Muchun Song wrote:
>> On Fri, Mar 19, 2021 at 4:59 PM Oscar Salvador <osalvador@suse.de> wrote:
>>>
>>> On Mon, Mar 15, 2021 at 05:20:14PM +0800, Muchun Song wrote:
>>>> --- a/arch/x86/mm/init_64.c
>>>> +++ b/arch/x86/mm/init_64.c
>>>> @@ -34,6 +34,7 @@
>>>>  #include <linux/gfp.h>
>>>>  #include <linux/kcore.h>
>>>>  #include <linux/bootmem_info.h>
>>>> +#include <linux/hugetlb.h>
>>>>
>>>>  #include <asm/processor.h>
>>>>  #include <asm/bios_ebda.h>
>>>> @@ -1557,7 +1558,8 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>>>  {
>>>>  	int err;
>>>>
>>>> -	if (end - start < PAGES_PER_SECTION * sizeof(struct page))
>>>> +	if ((is_hugetlb_free_vmemmap_enabled() && !altmap) ||
>>>> +	    end - start < PAGES_PER_SECTION * sizeof(struct page))
>>>>  		err = vmemmap_populate_basepages(start, end, node, NULL);
>>>>  	else if (boot_cpu_has(X86_FEATURE_PSE))
>>>>  		err = vmemmap_populate_hugepages(start, end, node, altmap);
>>>
>>> I've been thinking about this some more.
>>>
>>> Assume you opt in to the hugetlb-vmemmap feature, and assume you pass a valid altmap
>>> to vmemmap_populate().
>>> This will lead to us populating the vmemmap array with hugepages.
>>
>> Right.
>>
>>>
>>> What if, then, a HugeTLB page gets allocated and falls within that memory range (backed
>>> by hugepages)?
>>
>> I am not sure whether we can allocate the HugeTLB pages from there.
>> Will only device memory pass a valid altmap parameter to
>> vmemmap_populate()? If yes, can we allocate HugeTLB pages from
>> device memory? Sorry, I am not an expert on this.
>
> I think, right now, yes. System RAM that's applicable for HugePages
> never uses an altmap. But Oscar's patch will change that, maybe before
> your series gets included, from what I've been reading. [1]
>
> [1] https://lkml.kernel.org/r/20210319092635.6214-1-osalvador@suse.de
>
>>
>>
>>> AFAIK, this will get us in trouble, as currently the code can only operate on memory
>>> backed by PAGE_SIZE pages, right?
>>>
>>> I cannot remember, but I do not think anything prevents that from happening.
>>> Am I missing anything?
>>
>> Maybe David H is more familiar with this.
>>
>> Hi David,
>>
>> Do you have some suggestions on this?
>
> There has to be some way to identify whether we can optimize specific
> vmemmap pages or should just leave them alone: altmap vs. !altmap.
>
> Unfortunately, there is no easy way to detect that - e.g.,
> PageReserved() applies also to boot memory.
>
> We could go back to setting a special PageType for these vmemmap pages,
> indicating "this is a page allocated from an altmap, don't touch it".
>

With SPARSEMEM we can use:

PageReserved(page) && early_section():  vmemmap from bootmem
PageReserved(page) && !early_section(): vmemmap from altmap
!PageReserved(page):                    vmemmap from buddy

But it's a bit shaky :)
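For illustration, the SPARSEMEM heuristic sketched above could be wrapped in a helper along these lines (a sketch only; the enum and function names are invented, and it assumes PageReserved() and early_section() behave as described in the mail):

#include <linux/mm.h>
#include <linux/mmzone.h>

enum vmemmap_backing {
	VMEMMAP_FROM_BOOTMEM,	/* allocated from boot memory */
	VMEMMAP_FROM_ALTMAP,	/* allocated from a struct vmem_altmap */
	VMEMMAP_FROM_BUDDY,	/* allocated from the buddy allocator */
};

static enum vmemmap_backing vmemmap_backing_of(struct page *page)
{
	/* Pages handed out by the buddy allocator are not PageReserved. */
	if (!PageReserved(page))
		return VMEMMAP_FROM_BUDDY;

	/* Reserved and part of an early (boot-time) section: bootmem vmemmap. */
	if (early_section(__pfn_to_section(page_to_pfn(page))))
		return VMEMMAP_FROM_BOOTMEM;

	/* Reserved but not in an early section: assume it came from an altmap. */
	return VMEMMAP_FROM_ALTMAP;
}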
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 04545725f187..2e6b57207a3d 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1557,6 +1557,23 @@
 			Documentation/admin-guide/mm/hugetlbpage.rst.
 			Format: size[KMG]
 
+	hugetlb_free_vmemmap=
+			[KNL] Requires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
+			enabled.
+			Allows heavy hugetlb users to free up some more
+			memory (6 * PAGE_SIZE for each 2MB hugetlb page).
+			This feature is not free though. Large page
+			tables are not used to back vmemmap pages which
+			can lead to a performance degradation for some
+			workloads. Also there will be memory allocation
+			required when hugetlb pages are freed from the
+			pool which can lead to corner cases under heavy
+			memory pressure.
+			Format: { on | off (default) }
+
+			on: enable the feature
+			off: disable the feature
+
 	hung_task_panic=
 			[KNL] Should the hung task detector generate panics.
 			Format: 0 | 1
diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst
index 6988895d09a8..8abaeb144e44 100644
--- a/Documentation/admin-guide/mm/hugetlbpage.rst
+++ b/Documentation/admin-guide/mm/hugetlbpage.rst
@@ -153,6 +153,9 @@ default_hugepagesz
 	will all result in 256 2M huge pages being allocated.  Valid default
 	huge page size is architecture dependent.
 
+hugetlb_free_vmemmap
+	When CONFIG_HUGETLB_PAGE_FREE_VMEMMAP is set, this enables freeing
+	unused vmemmap pages associated with each HugeTLB page.
 When multiple huge page sizes are supported, ``/proc/sys/vm/nr_hugepages``
 indicates the current number of pre-allocated huge pages of the default size.
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 0435bee2e172..39f88c5faadc 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -34,6 +34,7 @@
 #include <linux/gfp.h>
 #include <linux/kcore.h>
 #include <linux/bootmem_info.h>
+#include <linux/hugetlb.h>
 
 #include <asm/processor.h>
 #include <asm/bios_ebda.h>
@@ -1557,7 +1558,8 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 {
 	int err;
 
-	if (end - start < PAGES_PER_SECTION * sizeof(struct page))
+	if ((is_hugetlb_free_vmemmap_enabled() && !altmap) ||
+	    end - start < PAGES_PER_SECTION * sizeof(struct page))
 		err = vmemmap_populate_basepages(start, end, node, NULL);
 	else if (boot_cpu_has(X86_FEATURE_PSE))
 		err = vmemmap_populate_hugepages(start, end, node, altmap);
@@ -1585,6 +1587,8 @@ void register_page_bootmem_memmap(unsigned long section_nr,
 	pmd_t *pmd;
 	unsigned int nr_pmd_pages;
 	struct page *page;
+	bool base_mapping = !boot_cpu_has(X86_FEATURE_PSE) ||
+			    is_hugetlb_free_vmemmap_enabled();
 
 	for (; addr < end; addr = next) {
 		pte_t *pte = NULL;
@@ -1610,7 +1614,7 @@ void register_page_bootmem_memmap(unsigned long section_nr,
 		}
 		get_page_bootmem(section_nr, pud_page(*pud), MIX_SECTION_INFO);
 
-		if (!boot_cpu_has(X86_FEATURE_PSE)) {
+		if (base_mapping) {
 			next = (addr + PAGE_SIZE) & PAGE_MASK;
 			pmd = pmd_offset(pud, addr);
 			if (pmd_none(*pmd))
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 7f7a0e3405ae..3efc6b9b23f2 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -872,6 +872,20 @@ static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
 }
 #endif
 
+#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
+extern bool hugetlb_free_vmemmap_enabled;
+
+static inline bool is_hugetlb_free_vmemmap_enabled(void)
+{
+	return hugetlb_free_vmemmap_enabled;
+}
+#else
+static inline bool is_hugetlb_free_vmemmap_enabled(void)
+{
+	return false;
+}
+#endif
+
 #else	/* CONFIG_HUGETLB_PAGE */
 struct hstate {};
@@ -1025,6 +1039,11 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr
 					pte_t *ptep, pte_t pte, unsigned long sz)
 {
 }
+
+static inline bool is_hugetlb_free_vmemmap_enabled(void)
+{
+	return false;
+}
 #endif	/* CONFIG_HUGETLB_PAGE */
 
 static inline spinlock_t *huge_pte_lock(struct hstate *h,
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 0e6835264da3..721258beeb94 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -168,6 +168,8 @@
  * (last) level. So this type of HugeTLB page can be optimized only when its
  * size of the struct page structs is greater than 2 pages.
  */
+#define pr_fmt(fmt)	"HugeTLB: " fmt
+
 #include "hugetlb_vmemmap.h"
 
 /*
@@ -180,6 +182,28 @@
 #define RESERVE_VMEMMAP_NR		2U
 #define RESERVE_VMEMMAP_SIZE		(RESERVE_VMEMMAP_NR << PAGE_SHIFT)
 
+bool hugetlb_free_vmemmap_enabled;
+
+static int __init early_hugetlb_free_vmemmap_param(char *buf)
+{
+	/* We cannot optimize if a "struct page" crosses page boundaries. */
+	if ((!is_power_of_2(sizeof(struct page)))) {
+		pr_warn("cannot free vmemmap pages because \"struct page\" crosses page boundaries\n");
+		return 0;
+	}
+
+	if (!buf)
+		return -EINVAL;
+
+	if (!strcmp(buf, "on"))
+		hugetlb_free_vmemmap_enabled = true;
+	else if (strcmp(buf, "off"))
+		return -EINVAL;
+
+	return 0;
+}
+early_param("hugetlb_free_vmemmap", early_hugetlb_free_vmemmap_param);
+
 static inline unsigned long free_vmemmap_pages_size_per_hpage(struct hstate *h)
 {
 	return (unsigned long)free_vmemmap_pages_per_hpage(h) << PAGE_SHIFT;