
[v3,03/21] mm/hugetlb: Introduce a new config HUGETLB_PAGE_FREE_VMEMMAP

Message ID 20201108141113.65450-4-songmuchun@bytedance.com (mailing list archive)
State New, archived
Series Free some vmemmap pages of hugetlb page

Commit Message

Muchun Song Nov. 8, 2020, 2:10 p.m. UTC
The purpose of introducing HUGETLB_PAGE_FREE_VMEMMAP is to configure
whether to enable the feature of freeing unused vmemmap pages associated
with HugeTLB pages. For now, only x86 is supported.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 arch/x86/mm/init_64.c |  2 +-
 fs/Kconfig            | 16 ++++++++++++++++
 mm/bootmem_info.c     |  3 +--
 3 files changed, 18 insertions(+), 3 deletions(-)
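
For scale, a back-of-the-envelope sketch -- not part of the patch -- of how much
vmemmap sits behind a single HugeTLB page, assuming 4 KB base pages and a
64-byte struct page as on x86_64; the review below discusses how many of these
pages the series actually frees:

#include <stdio.h>

/*
 * Sketch only: how many vmemmap pages hold the struct pages of one HugeTLB
 * page, assuming 4 KB base pages and sizeof(struct page) == 64.
 */
int main(void)
{
	const unsigned long base_page = 4096, struct_page = 64;
	const unsigned long huge_sizes[] = { 2UL << 20, 1UL << 30 };

	for (int i = 0; i < 2; i++) {
		unsigned long nr_base = huge_sizes[i] / base_page;
		unsigned long nr_vmemmap = nr_base * struct_page / base_page;

		/* prints 512/8 for the 2 MB case and 262144/4096 for the 1 GB case */
		printf("%4lu MB HugeTLB page: %6lu struct pages, %4lu vmemmap pages\n",
		       huge_sizes[i] >> 20, nr_base, nr_vmemmap);
	}
	return 0;
}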

Comments

Oscar Salvador Nov. 9, 2020, 1:52 p.m. UTC | #1
On Sun, Nov 08, 2020 at 10:10:55PM +0800, Muchun Song wrote:
> The purpose of introducing HUGETLB_PAGE_FREE_VMEMMAP is to configure
> whether to enable the feature of freeing unused vmemmap associated
> with HugeTLB pages. Now only support x86.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  arch/x86/mm/init_64.c |  2 +-
>  fs/Kconfig            | 16 ++++++++++++++++
>  mm/bootmem_info.c     |  3 +--
>  3 files changed, 18 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 0a45f062826e..0435bee2e172 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1225,7 +1225,7 @@ static struct kcore_list kcore_vsyscall;
>  
>  static void __init register_page_bootmem_info(void)
>  {
> -#ifdef CONFIG_NUMA
> +#if defined(CONFIG_NUMA) || defined(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP)
>  	int i;
>  
>  	for_each_online_node(i)
> diff --git a/fs/Kconfig b/fs/Kconfig
> index 976e8b9033c4..21b8d39a9715 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -245,6 +245,22 @@ config HUGETLBFS
>  config HUGETLB_PAGE
>  	def_bool HUGETLBFS
>  
> +config HUGETLB_PAGE_FREE_VMEMMAP
> +	bool "Free unused vmemmap associated with HugeTLB pages"
> +	default y
> +	depends on X86
> +	depends on HUGETLB_PAGE
> +	depends on SPARSEMEM_VMEMMAP
> +	depends on HAVE_BOOTMEM_INFO_NODE
> +	help
> +	  There are many struct page structures associated with each HugeTLB
> +	  page. But we only use a few struct page structures. In this case,
> +	  it wastes some memory. It is better to free the unused struct page
> +	  structures to buddy system which can save some memory. For
> +	  architectures that support it, say Y here.
> +
> +	  If unsure, say N.

I am not sure the above is useful for someone who needs to decide
whether he needs/wants to enable this or not.
I think the above fits better in a Documentation part.

I suck at this, but what about the following, or something along those
lines? 

"
When using SPARSEMEM_VMEMMAP, the system can save up some memory
from pre-allocated HugeTLB pages when they are not used.
6 pages per 2MB HugeTLB page and 4095 per 1GB HugeTLB page.
When the pages are going to be used or freed up, the vmemmap
array representing that range needs to be remapped again and
the pages we discarded earlier need to be reallocated again.
Therefore, this is a trade-off between saving memory and
increasing time in allocation/free path.
"

It would also be great to point out that this might be a
trade-off between saving memory and increasing the cost
of certain operations on the allocation/free path.
That is why I mentioned it there.
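
Taking the per-page figures in the suggested text above at face value, and
assuming 4 KB base pages, the pool-level savings work out roughly as follows:

  per 2 MB HugeTLB page:  6 vmemmap pages freed * 4 KB  ~= 24 KB saved
  per 1 GB HugeTLB page:  4095 vmemmap pages freed * 4 KB ~= 16 MB saved

  e.g. a 20 GB pool of 2 MB pages (10240 pages): 10240 * 24 KB ~= 240 MB returned to the buddy allocator
       a 1 TB pool of 1 GB pages (1024 pages):   1024 * 16 MB  ~= 16 GB returned to the buddy allocator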
Muchun Song Nov. 9, 2020, 2:20 p.m. UTC | #2
On Mon, Nov 9, 2020 at 9:52 PM Oscar Salvador <osalvador@suse.de> wrote:
>
> On Sun, Nov 08, 2020 at 10:10:55PM +0800, Muchun Song wrote:
> > The purpose of introducing HUGETLB_PAGE_FREE_VMEMMAP is to configure
> > whether to enable the feature of freeing unused vmemmap associated
> > with HugeTLB pages. Now only support x86.
> >
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  arch/x86/mm/init_64.c |  2 +-
> >  fs/Kconfig            | 16 ++++++++++++++++
> >  mm/bootmem_info.c     |  3 +--
> >  3 files changed, 18 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> > index 0a45f062826e..0435bee2e172 100644
> > --- a/arch/x86/mm/init_64.c
> > +++ b/arch/x86/mm/init_64.c
> > @@ -1225,7 +1225,7 @@ static struct kcore_list kcore_vsyscall;
> >
> >  static void __init register_page_bootmem_info(void)
> >  {
> > -#ifdef CONFIG_NUMA
> > +#if defined(CONFIG_NUMA) || defined(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP)
> >       int i;
> >
> >       for_each_online_node(i)
> > diff --git a/fs/Kconfig b/fs/Kconfig
> > index 976e8b9033c4..21b8d39a9715 100644
> > --- a/fs/Kconfig
> > +++ b/fs/Kconfig
> > @@ -245,6 +245,22 @@ config HUGETLBFS
> >  config HUGETLB_PAGE
> >       def_bool HUGETLBFS
> >
> > +config HUGETLB_PAGE_FREE_VMEMMAP
> > +     bool "Free unused vmemmap associated with HugeTLB pages"
> > +     default y
> > +     depends on X86
> > +     depends on HUGETLB_PAGE
> > +     depends on SPARSEMEM_VMEMMAP
> > +     depends on HAVE_BOOTMEM_INFO_NODE
> > +     help
> > +       There are many struct page structures associated with each HugeTLB
> > +       page. But we only use a few struct page structures. In this case,
> > +       it wastes some memory. It is better to free the unused struct page
> > +       structures to buddy system which can save some memory. For
> > +       architectures that support it, say Y here.
> > +
> > +       If unsure, say N.
>
> I am not sure the above is useful for someone who needs to decide
> whether he needs/wants to enable this or not.
> I think the above fits better in a Documentation part.
>
> I suck at this, but what about the following, or something along those
> lines?
>
> "
> When using SPARSEMEM_VMEMMAP, the system can save up some memory
> from pre-allocated HugeTLB pages when they are not used.
> 6 pages per 2MB HugeTLB page and 4095 per 1GB HugeTLB page.
> When the pages are going to be used or freed up, the vmemmap
> array representing that range needs to be remapped again and
> the pages we discarded earlier need to be reallocated again.
> Therefore, this is a trade-off between saving memory and
> increasing time in allocation/free path.
> "

Will do. Thanks for your suggestions.

>
> It would be also great to point out that this might be a
> trade-off between saving up memory and increasing the cost
> of certain operations on allocation/free path.
> That is why I mentioned it there.

OK, I will add this to the Documentation part, thanks.

>
> --
> Oscar Salvador
> SUSE L3
Mike Kravetz Nov. 10, 2020, 7:31 p.m. UTC | #3
On 11/9/20 5:52 AM, Oscar Salvador wrote:
> On Sun, Nov 08, 2020 at 10:10:55PM +0800, Muchun Song wrote:
>> The purpose of introducing HUGETLB_PAGE_FREE_VMEMMAP is to configure
>> whether to enable the feature of freeing unused vmemmap associated
>> with HugeTLB pages. Now only support x86.
>>
>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>> ---
>>  arch/x86/mm/init_64.c |  2 +-
>>  fs/Kconfig            | 16 ++++++++++++++++
>>  mm/bootmem_info.c     |  3 +--
>>  3 files changed, 18 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>> index 0a45f062826e..0435bee2e172 100644
>> --- a/arch/x86/mm/init_64.c
>> +++ b/arch/x86/mm/init_64.c
>> @@ -1225,7 +1225,7 @@ static struct kcore_list kcore_vsyscall;
>>  
>>  static void __init register_page_bootmem_info(void)
>>  {
>> -#ifdef CONFIG_NUMA
>> +#if defined(CONFIG_NUMA) || defined(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP)
>>  	int i;
>>  
>>  	for_each_online_node(i)
>> diff --git a/fs/Kconfig b/fs/Kconfig
>> index 976e8b9033c4..21b8d39a9715 100644
>> --- a/fs/Kconfig
>> +++ b/fs/Kconfig
>> @@ -245,6 +245,22 @@ config HUGETLBFS
>>  config HUGETLB_PAGE
>>  	def_bool HUGETLBFS
>>  
>> +config HUGETLB_PAGE_FREE_VMEMMAP
>> +	bool "Free unused vmemmap associated with HugeTLB pages"
>> +	default y
>> +	depends on X86
>> +	depends on HUGETLB_PAGE
>> +	depends on SPARSEMEM_VMEMMAP
>> +	depends on HAVE_BOOTMEM_INFO_NODE
>> +	help
>> +	  There are many struct page structures associated with each HugeTLB
>> +	  page. But we only use a few struct page structures. In this case,
>> +	  it wastes some memory. It is better to free the unused struct page
>> +	  structures to buddy system which can save some memory. For
>> +	  architectures that support it, say Y here.
>> +
>> +	  If unsure, say N.
> 
> I am not sure the above is useful for someone who needs to decide
> whether he needs/wants to enable this or not.
> I think the above fits better in a Documentation part.
> 
> I suck at this, but what about the following, or something along those
> lines? 
> 
> "
> When using SPARSEMEM_VMEMMAP, the system can save up some memory
> from pre-allocated HugeTLB pages when they are not used.
> 6 pages per 2MB HugeTLB page and 4095 per 1GB HugeTLB page.
> When the pages are going to be used or freed up, the vmemmap
> array representing that range needs to be remapped again and
> the pages we discarded earlier need to be reallocated again.
> Therefore, this is a trade-off between saving memory and
> increasing time in allocation/free path.
> "
> 
> It would be also great to point out that this might be a
> trade-off between saving up memory and increasing the cost
> of certain operations on allocation/free path.
> That is why I mentioned it there.

Yes, this is somewhat a trade-off.

As a config option, this is something that would likely be decided by
distros.  I almost hate to suggest this, but is it something that an
end user would want to decide?  Is this something that perhaps should
be a boot/kernel command line option?
Matthew Wilcox (Oracle) Nov. 10, 2020, 7:50 p.m. UTC | #4
On Tue, Nov 10, 2020 at 11:31:31AM -0800, Mike Kravetz wrote:
> On 11/9/20 5:52 AM, Oscar Salvador wrote:
> > On Sun, Nov 08, 2020 at 10:10:55PM +0800, Muchun Song wrote:
> >> The purpose of introducing HUGETLB_PAGE_FREE_VMEMMAP is to configure
> >> whether to enable the feature of freeing unused vmemmap associated
> >> with HugeTLB pages. Now only support x86.
> >>
> >> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> >> ---
> >>  arch/x86/mm/init_64.c |  2 +-
> >>  fs/Kconfig            | 16 ++++++++++++++++
> >>  mm/bootmem_info.c     |  3 +--
> >>  3 files changed, 18 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> >> index 0a45f062826e..0435bee2e172 100644
> >> --- a/arch/x86/mm/init_64.c
> >> +++ b/arch/x86/mm/init_64.c
> >> @@ -1225,7 +1225,7 @@ static struct kcore_list kcore_vsyscall;
> >>  
> >>  static void __init register_page_bootmem_info(void)
> >>  {
> >> -#ifdef CONFIG_NUMA
> >> +#if defined(CONFIG_NUMA) || defined(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP)
> >>  	int i;
> >>  
> >>  	for_each_online_node(i)
> >> diff --git a/fs/Kconfig b/fs/Kconfig
> >> index 976e8b9033c4..21b8d39a9715 100644
> >> --- a/fs/Kconfig
> >> +++ b/fs/Kconfig
> >> @@ -245,6 +245,22 @@ config HUGETLBFS
> >>  config HUGETLB_PAGE
> >>  	def_bool HUGETLBFS
> >>  
> >> +config HUGETLB_PAGE_FREE_VMEMMAP
> >> +	bool "Free unused vmemmap associated with HugeTLB pages"
> >> +	default y
> >> +	depends on X86
> >> +	depends on HUGETLB_PAGE
> >> +	depends on SPARSEMEM_VMEMMAP
> >> +	depends on HAVE_BOOTMEM_INFO_NODE
> >> +	help
> >> +	  There are many struct page structures associated with each HugeTLB
> >> +	  page. But we only use a few struct page structures. In this case,
> >> +	  it wastes some memory. It is better to free the unused struct page
> >> +	  structures to buddy system which can save some memory. For
> >> +	  architectures that support it, say Y here.
> >> +
> >> +	  If unsure, say N.
> > 
> > I am not sure the above is useful for someone who needs to decide
> > whether he needs/wants to enable this or not.
> > I think the above fits better in a Documentation part.
> > 
> > I suck at this, but what about the following, or something along those
> > lines? 
> > 
> > "
> > When using SPARSEMEM_VMEMMAP, the system can save up some memory
> > from pre-allocated HugeTLB pages when they are not used.
> > 6 pages per 2MB HugeTLB page and 4095 per 1GB HugeTLB page.
> > When the pages are going to be used or freed up, the vmemmap
> > array representing that range needs to be remapped again and
> > the pages we discarded earlier need to be reallocated again.
> > Therefore, this is a trade-off between saving memory and
> > increasing time in allocation/free path.
> > "
> > 
> > It would be also great to point out that this might be a
> > trade-off between saving up memory and increasing the cost
> > of certain operations on allocation/free path.
> > That is why I mentioned it there.
> 
> Yes, this is somewhat a trade-off.
> 
> As a config option, this is something that would likely be decided by
> distros.  I almost hate to suggest this, but is it something that an
> end user would want to decide?  Is this something that perhaps should
> be a boot/kernel command line option?

I don't like config options.  I like boot options even less.  I don't
know how to describe to an end-user whether they should select this
or not.  Is there a way to make this not a tradeoff?  Or make the
tradeoff so minimal as to be not worth describing?  (do we have numbers
for the worst possible situation when enabling this option?)

I haven't read through these patches in detail, so maybe we do this
already, but when we free the pages to the buddy allocator, do we retain
the third page to use for the PTEs (and free pages 3-7), or do we allocate
a separate page for the PTEs and free pages 2-7?
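
One way to picture the question -- an illustrative sketch using the numbering
in the mail above and the per-page figures quoted earlier in the thread, not
taken from the code. A 2 MB HugeTLB page is backed by 8 vmemmap pages, of
which this revision keeps 2 and frees 6:

  vmemmap page:   0      1      2      3      4      5      6      7
                [kept] [kept] [--------- candidates to free ---------]

Splitting the PMD-mapped vmemmap into PTE mappings needs one page for the new
PTE table. The question is whether that page is reused from the candidate
range (keep page 2 as the PTE table and free pages 3-7) or allocated
separately (free pages 2-7); the reply in the next message notes that earlier
revisions allocated a separate page.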
Mike Kravetz Nov. 10, 2020, 8:30 p.m. UTC | #5
On 11/10/20 11:50 AM, Matthew Wilcox wrote:
> On Tue, Nov 10, 2020 at 11:31:31AM -0800, Mike Kravetz wrote:
>> On 11/9/20 5:52 AM, Oscar Salvador wrote:
>>> On Sun, Nov 08, 2020 at 10:10:55PM +0800, Muchun Song wrote:
> 
> I don't like config options.  I like boot options even less.  I don't
> know how to describe to an end-user whether they should select this
> or not.  Is there a way to make this not a tradeoff?  Or make the
> tradeoff so minimal as to be not worth describing?  (do we have numbers
> for the worst possible situation when enabling this option?)

It is not exactly worst case, but Muchun provided some simple benchmarking
results in the cover letter.  Quick summary is that hugetlb page creation
and free time is "~2x slower".  At first glance, one would say that is
terrible.  However, remember that the majority of use cases create hugetlb
pages at or shortly after boot time and add them to the pool.  So, additional
overhead is at pool creation time.  There is no change to 'normal run time'
operations of getting a page from or returning a page to the pool (think
page fault/unmap).
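
To make that distinction concrete, a minimal sketch (not from the series) of
the unchanged run-time path: ordinary use of an already-populated pool, e.g.
faulting in a huge page through mmap(). The "~2x" cost applies when pages are
added to or removed from the pool, e.g. by writing /proc/sys/vm/nr_hugepages.

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 2UL << 20;	/* one 2 MB huge page taken from the pool */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

	if (p == MAP_FAILED) {
		perror("mmap(MAP_HUGETLB)");	/* e.g. the pool is empty */
		return 1;
	}
	memset(p, 0, len);	/* page fault: takes a page from the pool; unaffected */
	munmap(p, len);		/* unmap: returns the page to the pool; unaffected */
	return 0;
}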

> I haven't read through these patches in detail, so maybe we do this
> already, but when we free the pages to the buddy allocator, do we retain
> the third page to use for the PTEs (and free pages 3-7), or do we allocate
> a separate page for the PTEs and free pages 2-7?

I haven't got there in this latest series.  But, in previous revisions the
code did allocate a separate page.
Muchun Song Nov. 11, 2020, 3:28 a.m. UTC | #6
On Wed, Nov 11, 2020 at 3:31 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 11/9/20 5:52 AM, Oscar Salvador wrote:
> > On Sun, Nov 08, 2020 at 10:10:55PM +0800, Muchun Song wrote:
> >> The purpose of introducing HUGETLB_PAGE_FREE_VMEMMAP is to configure
> >> whether to enable the feature of freeing unused vmemmap associated
> >> with HugeTLB pages. Now only support x86.
> >>
> >> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> >> ---
> >>  arch/x86/mm/init_64.c |  2 +-
> >>  fs/Kconfig            | 16 ++++++++++++++++
> >>  mm/bootmem_info.c     |  3 +--
> >>  3 files changed, 18 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> >> index 0a45f062826e..0435bee2e172 100644
> >> --- a/arch/x86/mm/init_64.c
> >> +++ b/arch/x86/mm/init_64.c
> >> @@ -1225,7 +1225,7 @@ static struct kcore_list kcore_vsyscall;
> >>
> >>  static void __init register_page_bootmem_info(void)
> >>  {
> >> -#ifdef CONFIG_NUMA
> >> +#if defined(CONFIG_NUMA) || defined(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP)
> >>      int i;
> >>
> >>      for_each_online_node(i)
> >> diff --git a/fs/Kconfig b/fs/Kconfig
> >> index 976e8b9033c4..21b8d39a9715 100644
> >> --- a/fs/Kconfig
> >> +++ b/fs/Kconfig
> >> @@ -245,6 +245,22 @@ config HUGETLBFS
> >>  config HUGETLB_PAGE
> >>      def_bool HUGETLBFS
> >>
> >> +config HUGETLB_PAGE_FREE_VMEMMAP
> >> +    bool "Free unused vmemmap associated with HugeTLB pages"
> >> +    default y
> >> +    depends on X86
> >> +    depends on HUGETLB_PAGE
> >> +    depends on SPARSEMEM_VMEMMAP
> >> +    depends on HAVE_BOOTMEM_INFO_NODE
> >> +    help
> >> +      There are many struct page structures associated with each HugeTLB
> >> +      page. But we only use a few struct page structures. In this case,
> >> +      it wastes some memory. It is better to free the unused struct page
> >> +      structures to buddy system which can save some memory. For
> >> +      architectures that support it, say Y here.
> >> +
> >> +      If unsure, say N.
> >
> > I am not sure the above is useful for someone who needs to decide
> > whether he needs/wants to enable this or not.
> > I think the above fits better in a Documentation part.
> >
> > I suck at this, but what about the following, or something along those
> > lines?
> >
> > "
> > When using SPARSEMEM_VMEMMAP, the system can save up some memory
> > from pre-allocated HugeTLB pages when they are not used.
> > 6 pages per 2MB HugeTLB page and 4095 per 1GB HugeTLB page.
> > When the pages are going to be used or freed up, the vmemmap
> > array representing that range needs to be remapped again and
> > the pages we discarded earlier need to be reallocated again.
> > Therefore, this is a trade-off between saving memory and
> > increasing time in allocation/free path.
> > "
> >
> > It would be also great to point out that this might be a
> > trade-off between saving up memory and increasing the cost
> > of certain operations on allocation/free path.
> > That is why I mentioned it there.
>
> Yes, this is somewhat a trade-off.
>
> As a config option, this is something that would likely be decided by
> distros.  I almost hate to suggest this, but is it something that an
> end user would want to decide?  Is this something that perhaps should
> be a boot/kernel command line option?

Yeah, there is already a boot/kernel command line option named
"hugetlb_free_vmemmap". We can refer to

  [PATCH v3 18/21] mm/hugetlb: Add a kernel parameter hugetlb_free_vmemmap

Thanks :)


>
> --
> Mike Kravetz



--
Yours,
Muchun
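
For readers without patch 18 at hand, a minimal sketch of how such an early
boot parameter is typically wired up -- illustrative only, with placeholder
names; the series' actual parsing is added in
"[PATCH v3 18/21] mm/hugetlb: Add a kernel parameter hugetlb_free_vmemmap".

/*
 * Sketch only -- not the series' code.  The variable and function names
 * below are placeholders.
 */
#include <linux/errno.h>
#include <linux/init.h>
#include <linux/string.h>
#include <linux/types.h>

static bool hugetlb_free_vmemmap_enabled;

static int __init early_hugetlb_free_vmemmap_param(char *buf)
{
	if (!buf)
		return -EINVAL;

	/* "hugetlb_free_vmemmap=on" on the kernel command line enables it */
	if (!strcmp(buf, "on"))
		hugetlb_free_vmemmap_enabled = true;
	else if (!strcmp(buf, "off"))
		hugetlb_free_vmemmap_enabled = false;

	return 0;
}
early_param("hugetlb_free_vmemmap", early_hugetlb_free_vmemmap_param);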
Muchun Song Nov. 17, 2020, 3:35 p.m. UTC | #7
On Wed, Nov 11, 2020 at 3:50 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Nov 10, 2020 at 11:31:31AM -0800, Mike Kravetz wrote:
> > On 11/9/20 5:52 AM, Oscar Salvador wrote:
> > > On Sun, Nov 08, 2020 at 10:10:55PM +0800, Muchun Song wrote:
> > >> The purpose of introducing HUGETLB_PAGE_FREE_VMEMMAP is to configure
> > >> whether to enable the feature of freeing unused vmemmap associated
> > >> with HugeTLB pages. Now only support x86.
> > >>
> > >> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > >> ---
> > >>  arch/x86/mm/init_64.c |  2 +-
> > >>  fs/Kconfig            | 16 ++++++++++++++++
> > >>  mm/bootmem_info.c     |  3 +--
> > >>  3 files changed, 18 insertions(+), 3 deletions(-)
> > >>
> > >> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> > >> index 0a45f062826e..0435bee2e172 100644
> > >> --- a/arch/x86/mm/init_64.c
> > >> +++ b/arch/x86/mm/init_64.c
> > >> @@ -1225,7 +1225,7 @@ static struct kcore_list kcore_vsyscall;
> > >>
> > >>  static void __init register_page_bootmem_info(void)
> > >>  {
> > >> -#ifdef CONFIG_NUMA
> > >> +#if defined(CONFIG_NUMA) || defined(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP)
> > >>    int i;
> > >>
> > >>    for_each_online_node(i)
> > >> diff --git a/fs/Kconfig b/fs/Kconfig
> > >> index 976e8b9033c4..21b8d39a9715 100644
> > >> --- a/fs/Kconfig
> > >> +++ b/fs/Kconfig
> > >> @@ -245,6 +245,22 @@ config HUGETLBFS
> > >>  config HUGETLB_PAGE
> > >>    def_bool HUGETLBFS
> > >>
> > >> +config HUGETLB_PAGE_FREE_VMEMMAP
> > >> +  bool "Free unused vmemmap associated with HugeTLB pages"
> > >> +  default y
> > >> +  depends on X86
> > >> +  depends on HUGETLB_PAGE
> > >> +  depends on SPARSEMEM_VMEMMAP
> > >> +  depends on HAVE_BOOTMEM_INFO_NODE
> > >> +  help
> > >> +    There are many struct page structures associated with each HugeTLB
> > >> +    page. But we only use a few struct page structures. In this case,
> > >> +    it wastes some memory. It is better to free the unused struct page
> > >> +    structures to buddy system which can save some memory. For
> > >> +    architectures that support it, say Y here.
> > >> +
> > >> +    If unsure, say N.
> > >
> > > I am not sure the above is useful for someone who needs to decide
> > > whether he needs/wants to enable this or not.
> > > I think the above fits better in a Documentation part.
> > >
> > > I suck at this, but what about the following, or something along those
> > > lines?
> > >
> > > "
> > > When using SPARSEMEM_VMEMMAP, the system can save up some memory
> > > from pre-allocated HugeTLB pages when they are not used.
> > > 6 pages per 2MB HugeTLB page and 4095 per 1GB HugeTLB page.
> > > When the pages are going to be used or freed up, the vmemmap
> > > array representing that range needs to be remapped again and
> > > the pages we discarded earlier need to be reallocated again.
> > > Therefore, this is a trade-off between saving memory and
> > > increasing time in allocation/free path.
> > > "
> > >
> > > It would be also great to point out that this might be a
> > > trade-off between saving up memory and increasing the cost
> > > of certain operations on allocation/free path.
> > > That is why I mentioned it there.
> >
> > Yes, this is somewhat a trade-off.
> >
> > As a config option, this is something that would likely be decided by
> > distros.  I almost hate to suggest this, but is it something that an
> > end user would want to decide?  Is this something that perhaps should
> > be a boot/kernel command line option?
>
> I don't like config options.  I like boot options even less.  I don't
> know how to describe to an end-user whether they should select this
> or not.  Is there a way to make this not a tradeoff?  Or make the
> tradeoff so minimal as to be not worth describing?  (do we have numbers
> for the worst possible situation when enabling this option?)
>
> I haven't read through these patches in detail, so maybe we do this
> already, but when we free the pages to the buddy allocator, do we retain
> the third page to use for the PTEs (and free pages 3-7), or do we allocate
> a separate page for the PTEs and free pages 2-7?

Sorry for missing this reply. It is a good idea. I will investigate and
implement this. Thanks, Matthew.

Patch

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 0a45f062826e..0435bee2e172 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1225,7 +1225,7 @@  static struct kcore_list kcore_vsyscall;
 
 static void __init register_page_bootmem_info(void)
 {
-#ifdef CONFIG_NUMA
+#if defined(CONFIG_NUMA) || defined(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP)
 	int i;
 
 	for_each_online_node(i)
diff --git a/fs/Kconfig b/fs/Kconfig
index 976e8b9033c4..21b8d39a9715 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -245,6 +245,22 @@  config HUGETLBFS
 config HUGETLB_PAGE
 	def_bool HUGETLBFS
 
+config HUGETLB_PAGE_FREE_VMEMMAP
+	bool "Free unused vmemmap associated with HugeTLB pages"
+	default y
+	depends on X86
+	depends on HUGETLB_PAGE
+	depends on SPARSEMEM_VMEMMAP
+	depends on HAVE_BOOTMEM_INFO_NODE
+	help
+	  There are many struct page structures associated with each HugeTLB
+	  page. But we only use a few struct page structures. In this case,
+	  it wastes some memory. It is better to free the unused struct page
+	  structures to buddy system which can save some memory. For
+	  architectures that support it, say Y here.
+
+	  If unsure, say N.
+
 config MEMFD_CREATE
 	def_bool TMPFS || HUGETLBFS
 
diff --git a/mm/bootmem_info.c b/mm/bootmem_info.c
index d276e96e487f..fcab5a3f8cc0 100644
--- a/mm/bootmem_info.c
+++ b/mm/bootmem_info.c
@@ -10,8 +10,7 @@ 
 #include <linux/bootmem_info.h>
 #include <linux/memory_hotplug.h>
 
-void get_page_bootmem(unsigned long info,  struct page *page,
-		      unsigned long type)
+void get_page_bootmem(unsigned long info, struct page *page, unsigned long type)
 {
 	page->freelist = (void *)type;
 	SetPagePrivate(page);