Message ID | 20210308102807.59745-7-songmuchun@bytedance.com (mailing list archive) |
---|---|
State | New, archived |
Series | Free some vmemmap pages of HugeTLB page |
On Mon 08-03-21 18:28:04, Muchun Song wrote:
> Add a kernel parameter hugetlb_free_vmemmap to enable the feature of
> freeing unused vmemmap pages associated with each hugetlb page on boot.
>
> We disable PMD mapping of vmemmap pages for x86-64 arch when this
> feature is enabled. Because vmemmap_remap_free() depends on vmemmap
> being base page mapped.
>
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Reviewed-by: Oscar Salvador <osalvador@suse.de>
> Reviewed-by: Barry Song <song.bao.hua@hisilicon.com>
> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
> Tested-by: Chen Huang <chenhuang5@huawei.com>
> Tested-by: Bodeddula Balasubramaniam <bodeddub@amazon.com>
> ---
>  Documentation/admin-guide/kernel-parameters.txt | 14 ++++++++++++++
>  Documentation/admin-guide/mm/hugetlbpage.rst    |  3 +++
>  arch/x86/mm/init_64.c                           |  8 ++++++--
>  include/linux/hugetlb.h                         | 19 +++++++++++++++++++
>  mm/hugetlb_vmemmap.c                            | 24 ++++++++++++++++++++++++
>  5 files changed, 66 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 04545725f187..de91d54573c4 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1557,6 +1557,20 @@
>  			Documentation/admin-guide/mm/hugetlbpage.rst.
>  			Format: size[KMG]
>
> +	hugetlb_free_vmemmap=
> +			[KNL] When CONFIG_HUGETLB_PAGE_FREE_VMEMMAP is set,
> +			this controls freeing unused vmemmap pages associated
> +			with each HugeTLB page. When this option is enabled,
> +			we disable PMD/huge page mapping of vmemmap pages which
> +			increase page table pages. So if a user/sysadmin only
> +			uses a small number of HugeTLB pages (as a percentage
> +			of system memory), they could end up using more memory
> +			with hugetlb_free_vmemmap on as opposed to off.
> +			Format: { on | off (default) }

Please note this is an admin guide and for those this seems overly low
level. I would use something like the following
			[KNL] Requires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
			enabled.
			Allows heavy hugetlb users to free up some more
			memory (6 * PAGE_SIZE for each 2MB hugetlb
			page).
			This feature is not free though. Large page
			tables are not use to back vmemmap pages which
			can lead to a performance degradation for some
			workloads. Also there will be memory allocation
			required when hugetlb pages are freed from the
			pool which can lead to corner cases under heavy
			memory pressure.

> +
> +			on: enable the feature
> +			off: disable the feature
> +
>  	hung_task_panic=
>  			[KNL] Should the hung task detector generate panics.
>  			Format: 0 | 1
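The "6 * PAGE_SIZE for each 2MB hugetlb page" figure in the suggested text can be checked with a little arithmetic. The sketch below is only an illustration, not code from the series: it assumes 4 KiB base pages and a 64-byte struct page (typical for x86-64 but not stated in the thread), and the helper names are hypothetical. The only value taken from the patch is RESERVE_VMEMMAP_NR, the number of vmemmap pages kept per HugeTLB page.

```c
/*
 * Assumptions (not taken from the patch): 4 KiB base pages and a
 * 64-byte struct page.
 */
#define BASE_PAGE_SIZE     4096UL
#define STRUCT_PAGE_SIZE   64UL

/* From mm/hugetlb_vmemmap.c: vmemmap pages kept for each HugeTLB page. */
#define RESERVE_VMEMMAP_NR 2UL

/* Hypothetical helper: vmemmap pages describing one HugeTLB page. */
static unsigned long vmemmap_pages_per_hpage(unsigned long hpage_size)
{
	/* One struct page exists for every base page inside the huge page. */
	unsigned long nr_struct_pages = hpage_size / BASE_PAGE_SIZE;

	return nr_struct_pages * STRUCT_PAGE_SIZE / BASE_PAGE_SIZE;
}

/* Hypothetical helper: vmemmap pages that could be freed per HugeTLB page. */
static unsigned long freeable_vmemmap_pages(unsigned long hpage_size)
{
	unsigned long total = vmemmap_pages_per_hpage(hpage_size);

	/* Nothing can be freed unless more than the reserved pages exist. */
	return total > RESERVE_VMEMMAP_NR ? total - RESERVE_VMEMMAP_NR : 0;
}
```

Under these assumptions a 2 MiB HugeTLB page is described by 512 struct pages, which occupy 8 vmemmap pages; keeping 2 of them leaves 6 to free, which matches the number quoted above.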
On 3/10/21 7:37 AM, Michal Hocko wrote:
> On Mon 08-03-21 18:28:04, Muchun Song wrote:
>> Add a kernel parameter hugetlb_free_vmemmap to enable the feature of
>> freeing unused vmemmap pages associated with each hugetlb page on boot.
>>
>> We disable PMD mapping of vmemmap pages for x86-64 arch when this
>> feature is enabled. Because vmemmap_remap_free() depends on vmemmap
>> being base page mapped.
>>
>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>> Reviewed-by: Oscar Salvador <osalvador@suse.de>
>> Reviewed-by: Barry Song <song.bao.hua@hisilicon.com>
>> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
>> Tested-by: Chen Huang <chenhuang5@huawei.com>
>> Tested-by: Bodeddula Balasubramaniam <bodeddub@amazon.com>
>> ---
>>  Documentation/admin-guide/kernel-parameters.txt | 14 ++++++++++++++
>>  Documentation/admin-guide/mm/hugetlbpage.rst    |  3 +++
>>  arch/x86/mm/init_64.c                           |  8 ++++++--
>>  include/linux/hugetlb.h                         | 19 +++++++++++++++++++
>>  mm/hugetlb_vmemmap.c                            | 24 ++++++++++++++++++++++++
>>  5 files changed, 66 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 04545725f187..de91d54573c4 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -1557,6 +1557,20 @@
>>  			Documentation/admin-guide/mm/hugetlbpage.rst.
>>  			Format: size[KMG]
>>
>> +	hugetlb_free_vmemmap=
>> +			[KNL] When CONFIG_HUGETLB_PAGE_FREE_VMEMMAP is set,
>> +			this controls freeing unused vmemmap pages associated
>> +			with each HugeTLB page. When this option is enabled,
>> +			we disable PMD/huge page mapping of vmemmap pages which
>> +			increase page table pages. So if a user/sysadmin only
>> +			uses a small number of HugeTLB pages (as a percentage
>> +			of system memory), they could end up using more memory
>> +			with hugetlb_free_vmemmap on as opposed to off.
>> +			Format: { on | off (default) }
>
> Please note this is an admin guide and for those this seems overly low
> level. I would use something like the following
> 			[KNL] Requires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> 			enabled.
> 			Allows heavy hugetlb users to free up some more
> 			memory (6 * PAGE_SIZE for each 2MB hugetlb
> 			page).
> 			This feature is not free though. Large page
> 			tables are not use to back vmemmap pages which

				   are not used

> 			can lead to a performance degradation for some
> 			workloads. Also there will be memory allocation
> 			required when hugetlb pages are freed from the
> 			pool which can lead to corner cases under heavy
> 			memory pressure.
>> +
>> +			on: enable the feature
>> +			off: disable the feature
>> +
>>  	hung_task_panic=
>>  			[KNL] Should the hung task detector generate panics.
>>  			Format: 0 | 1
On Wed, Mar 10, 2021 at 11:37 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 08-03-21 18:28:04, Muchun Song wrote:
> > Add a kernel parameter hugetlb_free_vmemmap to enable the feature of
> > freeing unused vmemmap pages associated with each hugetlb page on boot.
> >
> > We disable PMD mapping of vmemmap pages for x86-64 arch when this
> > feature is enabled. Because vmemmap_remap_free() depends on vmemmap
> > being base page mapped.
> >
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > Reviewed-by: Oscar Salvador <osalvador@suse.de>
> > Reviewed-by: Barry Song <song.bao.hua@hisilicon.com>
> > Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
> > Tested-by: Chen Huang <chenhuang5@huawei.com>
> > Tested-by: Bodeddula Balasubramaniam <bodeddub@amazon.com>
> > ---
> >  Documentation/admin-guide/kernel-parameters.txt | 14 ++++++++++++++
> >  Documentation/admin-guide/mm/hugetlbpage.rst    |  3 +++
> >  arch/x86/mm/init_64.c                           |  8 ++++++--
> >  include/linux/hugetlb.h                         | 19 +++++++++++++++++++
> >  mm/hugetlb_vmemmap.c                            | 24 ++++++++++++++++++++++++
> >  5 files changed, 66 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index 04545725f187..de91d54573c4 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -1557,6 +1557,20 @@
> >  			Documentation/admin-guide/mm/hugetlbpage.rst.
> >  			Format: size[KMG]
> >
> > +	hugetlb_free_vmemmap=
> > +			[KNL] When CONFIG_HUGETLB_PAGE_FREE_VMEMMAP is set,
> > +			this controls freeing unused vmemmap pages associated
> > +			with each HugeTLB page. When this option is enabled,
> > +			we disable PMD/huge page mapping of vmemmap pages which
> > +			increase page table pages. So if a user/sysadmin only
> > +			uses a small number of HugeTLB pages (as a percentage
> > +			of system memory), they could end up using more memory
> > +			with hugetlb_free_vmemmap on as opposed to off.
> > +			Format: { on | off (default) }
>
> Please note this is an admin guide and for those this seems overly low

OK.

> level. I would use something like the following
> 			[KNL] Requires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> 			enabled.
> 			Allows heavy hugetlb users to free up some more
> 			memory (6 * PAGE_SIZE for each 2MB hugetlb
> 			page).
> 			This feature is not free though. Large page
> 			tables are not use to back vmemmap pages which
> 			can lead to a performance degradation for some
> 			workloads. Also there will be memory allocation
> 			required when hugetlb pages are freed from the
> 			pool which can lead to corner cases under heavy
> 			memory pressure.

Many thanks. I will update this.

> > +
> > +			on: enable the feature
> > +			off: disable the feature
> > +
> >  	hung_task_panic=
> >  			[KNL] Should the hung task detector generate panics.
> >  			Format: 0 | 1
> --
> Michal Hocko
> SUSE Labs
On Thu, Mar 11, 2021 at 1:16 AM Randy Dunlap <rdunlap@infradead.org> wrote:
>
> On 3/10/21 7:37 AM, Michal Hocko wrote:
> > On Mon 08-03-21 18:28:04, Muchun Song wrote:
> >> Add a kernel parameter hugetlb_free_vmemmap to enable the feature of
> >> freeing unused vmemmap pages associated with each hugetlb page on boot.
> >>
> >> We disable PMD mapping of vmemmap pages for x86-64 arch when this
> >> feature is enabled. Because vmemmap_remap_free() depends on vmemmap
> >> being base page mapped.
> >>
> >> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> >> Reviewed-by: Oscar Salvador <osalvador@suse.de>
> >> Reviewed-by: Barry Song <song.bao.hua@hisilicon.com>
> >> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
> >> Tested-by: Chen Huang <chenhuang5@huawei.com>
> >> Tested-by: Bodeddula Balasubramaniam <bodeddub@amazon.com>
> >> ---
> >>  Documentation/admin-guide/kernel-parameters.txt | 14 ++++++++++++++
> >>  Documentation/admin-guide/mm/hugetlbpage.rst    |  3 +++
> >>  arch/x86/mm/init_64.c                           |  8 ++++++--
> >>  include/linux/hugetlb.h                         | 19 +++++++++++++++++++
> >>  mm/hugetlb_vmemmap.c                            | 24 ++++++++++++++++++++++++
> >>  5 files changed, 66 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> >> index 04545725f187..de91d54573c4 100644
> >> --- a/Documentation/admin-guide/kernel-parameters.txt
> >> +++ b/Documentation/admin-guide/kernel-parameters.txt
> >> @@ -1557,6 +1557,20 @@
> >>  			Documentation/admin-guide/mm/hugetlbpage.rst.
> >>  			Format: size[KMG]
> >>
> >> +	hugetlb_free_vmemmap=
> >> +			[KNL] When CONFIG_HUGETLB_PAGE_FREE_VMEMMAP is set,
> >> +			this controls freeing unused vmemmap pages associated
> >> +			with each HugeTLB page. When this option is enabled,
> >> +			we disable PMD/huge page mapping of vmemmap pages which
> >> +			increase page table pages. So if a user/sysadmin only
> >> +			uses a small number of HugeTLB pages (as a percentage
> >> +			of system memory), they could end up using more memory
> >> +			with hugetlb_free_vmemmap on as opposed to off.
> >> +			Format: { on | off (default) }
> >
> > Please note this is an admin guide and for those this seems overly low
> > level. I would use something like the following
> > 			[KNL] Requires CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> > 			enabled.
> > 			Allows heavy hugetlb users to free up some more
> > 			memory (6 * PAGE_SIZE for each 2MB hugetlb
> > 			page).
> > 			This feature is not free though. Large page
> > 			tables are not use to back vmemmap pages which
>
> 				   are not used

Thanks.

>
> > 			can lead to a performance degradation for some
> > 			workloads. Also there will be memory allocation
> > 			required when hugetlb pages are freed from the
> > 			pool which can lead to corner cases under heavy
> > 			memory pressure.
> >> +
> >> +			on: enable the feature
> >> +			off: disable the feature
> >> +
> >>  	hung_task_panic=
> >>  			[KNL] Should the hung task detector generate panics.
> >>  			Format: 0 | 1
> >
> --
> ~Randy
>
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 04545725f187..de91d54573c4 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1557,6 +1557,20 @@
 			Documentation/admin-guide/mm/hugetlbpage.rst.
 			Format: size[KMG]
 
+	hugetlb_free_vmemmap=
+			[KNL] When CONFIG_HUGETLB_PAGE_FREE_VMEMMAP is set,
+			this controls freeing unused vmemmap pages associated
+			with each HugeTLB page. When this option is enabled,
+			we disable PMD/huge page mapping of vmemmap pages which
+			increase page table pages. So if a user/sysadmin only
+			uses a small number of HugeTLB pages (as a percentage
+			of system memory), they could end up using more memory
+			with hugetlb_free_vmemmap on as opposed to off.
+			Format: { on | off (default) }
+
+			on: enable the feature
+			off: disable the feature
+
 	hung_task_panic=
 			[KNL] Should the hung task detector generate panics.
 			Format: 0 | 1
diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst
index 6988895d09a8..8abaeb144e44 100644
--- a/Documentation/admin-guide/mm/hugetlbpage.rst
+++ b/Documentation/admin-guide/mm/hugetlbpage.rst
@@ -153,6 +153,9 @@ default_hugepagesz
 
 	will all result in 256 2M huge pages being allocated. Valid default
 	huge page size is architecture dependent.
+hugetlb_free_vmemmap
+	When CONFIG_HUGETLB_PAGE_FREE_VMEMMAP is set, this enables freeing
+	unused vmemmap pages associated with each HugeTLB page.
 
 When multiple huge page sizes are supported, ``/proc/sys/vm/nr_hugepages``
 indicates the current number of pre-allocated huge pages of the default size.
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 0435bee2e172..39f88c5faadc 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -34,6 +34,7 @@
 #include <linux/gfp.h>
 #include <linux/kcore.h>
 #include <linux/bootmem_info.h>
+#include <linux/hugetlb.h>
 
 #include <asm/processor.h>
 #include <asm/bios_ebda.h>
@@ -1557,7 +1558,8 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 {
 	int err;
 
-	if (end - start < PAGES_PER_SECTION * sizeof(struct page))
+	if ((is_hugetlb_free_vmemmap_enabled() && !altmap) ||
+	    end - start < PAGES_PER_SECTION * sizeof(struct page))
 		err = vmemmap_populate_basepages(start, end, node, NULL);
 	else if (boot_cpu_has(X86_FEATURE_PSE))
 		err = vmemmap_populate_hugepages(start, end, node, altmap);
@@ -1585,6 +1587,8 @@ void register_page_bootmem_memmap(unsigned long section_nr,
 	pmd_t *pmd;
 	unsigned int nr_pmd_pages;
 	struct page *page;
+	bool base_mapping = !boot_cpu_has(X86_FEATURE_PSE) ||
+			    is_hugetlb_free_vmemmap_enabled();
 
 	for (; addr < end; addr = next) {
 		pte_t *pte = NULL;
@@ -1610,7 +1614,7 @@ void register_page_bootmem_memmap(unsigned long section_nr,
 		}
 		get_page_bootmem(section_nr, pud_page(*pud), MIX_SECTION_INFO);
 
-		if (!boot_cpu_has(X86_FEATURE_PSE)) {
+		if (base_mapping) {
 			next = (addr + PAGE_SIZE) & PAGE_MASK;
 			pmd = pmd_offset(pud, addr);
 			if (pmd_none(*pmd))
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ce6533584eb7..78934e9aeab6 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -852,6 +852,20 @@ static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
 }
 #endif
 
+#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
+extern bool hugetlb_free_vmemmap_enabled;
+
+static inline bool is_hugetlb_free_vmemmap_enabled(void)
+{
+	return hugetlb_free_vmemmap_enabled;
+}
+#else
+static inline bool is_hugetlb_free_vmemmap_enabled(void)
+{
+	return false;
+}
+#endif
+
 #else	/* CONFIG_HUGETLB_PAGE */
 struct hstate {};
 
@@ -1005,6 +1019,11 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr
 					pte_t *ptep, pte_t pte, unsigned long sz)
 {
 }
+
+static inline bool is_hugetlb_free_vmemmap_enabled(void)
+{
+	return false;
+}
 #endif	/* CONFIG_HUGETLB_PAGE */
 
 static inline spinlock_t *huge_pte_lock(struct hstate *h,
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index f7ab3d99250a..7807ed6678e0 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -169,6 +169,8 @@
  * (last) level. So this type of HugeTLB page can be optimized only when its
  * size of the struct page structs is greater than 2 pages.
  */
+#define pr_fmt(fmt)	"HugeTLB: " fmt
+
 #include "hugetlb_vmemmap.h"
 
 /*
@@ -181,6 +183,28 @@
 #define RESERVE_VMEMMAP_NR		2U
 #define RESERVE_VMEMMAP_SIZE		(RESERVE_VMEMMAP_NR << PAGE_SHIFT)
 
+bool hugetlb_free_vmemmap_enabled;
+
+static int __init early_hugetlb_free_vmemmap_param(char *buf)
+{
+	/* We cannot optimize if a "struct page" crosses page boundaries. */
+	if ((!is_power_of_2(sizeof(struct page)))) {
+		pr_warn("cannot free vmemmap pages because \"struct page\" crosses page boundaries\n");
+		return 0;
+	}
+
+	if (!buf)
+		return -EINVAL;
+
+	if (!strcmp(buf, "on"))
+		hugetlb_free_vmemmap_enabled = true;
+	else if (strcmp(buf, "off"))
+		return -EINVAL;
+
+	return 0;
+}
+early_param("hugetlb_free_vmemmap", early_hugetlb_free_vmemmap_param);
+
 static inline unsigned long free_vmemmap_pages_size_per_hpage(struct hstate *h)
 {
 	return (unsigned long)free_vmemmap_pages_per_hpage(h) << PAGE_SHIFT;
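
The on/off handling in early_hugetlb_free_vmemmap_param() can be tried outside the kernel with a small userspace sketch. This is an illustration of the same string checks, not the kernel code itself: parse_free_vmemmap() and free_vmemmap_enabled are hypothetical stand-ins, and the is_power_of_2(sizeof(struct page)) check and the early_param() registration are left out because they only exist in the kernel.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Stand-in for the kernel's hugetlb_free_vmemmap_enabled flag. */
static bool free_vmemmap_enabled;

/*
 * Hypothetical userspace mirror of the parameter parsing in the patch:
 * "on" sets the flag, "off" is accepted and leaves the flag as it is,
 * and a missing or unknown value is rejected with -EINVAL.
 */
static int parse_free_vmemmap(const char *buf)
{
	if (!buf)
		return -EINVAL;

	if (!strcmp(buf, "on"))
		free_vmemmap_enabled = true;
	else if (strcmp(buf, "off"))
		return -EINVAL;

	return 0;
}
```

One detail this makes visible is that "off" does not clear the flag; it only relies on the default value being false, which is sufficient as long as the parameter is given once on the command line.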