[v7,0/5] Free the 2nd vmemmap page associated with each HugeTLB page

Message ID	20211101031651.75851-1-songmuchun@bytedance.com (mailing list archive)
Headers	show Return-Path: <SRS0=vc3H=PU=kvack.org=owner-linux-mm@kernel.org> DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org B0FA1600CD From: Muchun Song <songmuchun@bytedance.com> To: mike.kravetz@oracle.com, akpm@linux-foundation.org, osalvador@suse.de, mhocko@suse.com, song.bao.hua@hisilicon.com, david@redhat.com, chenhuang5@huawei.com, bodeddub@amazon.com, corbet@lwn.net, willy@infradead.org, 21cnbao@gmail.com Cc: duanxiongchun@bytedance.com, fam.zheng@bytedance.com, smuchun@gmail.com, zhengqi.arch@bytedance.com, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Muchun Song <songmuchun@bytedance.com> Subject: [PATCH v7 0/5] Free the 2nd vmemmap page associated with each HugeTLB page Date: Mon, 1 Nov 2021 11:16:46 +0800 Message-Id: <20211101031651.75851-1-songmuchun@bytedance.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Sender: owner-linux-mm@kvack.org Precedence: bulk
Series	Free the 2nd vmemmap page associated with each HugeTLB page \| expand [v7,0/5] Free the 2nd vmemmap page associated with each HugeTLB page [v7,1/5] mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page [v7,2/5] mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key [v7,3/5] mm: sparsemem: use page table lock to protect kernel pmd operations [v7,4/5] selftests: vm: add a hugetlb test case [v7,5/5] mm: sparsemem: move vmemmap related to HugeTLB to CONFIG_HUGETLB_PAGE_FREE_VMEMMAP

Muchun Song Nov. 1, 2021, 3:16 a.m. UTC

This series can minimize the overhead of struct page for 2MB HugeTLB pages
significantly. It further reduces the overhead of struct page by 12.5% for
a 2MB HugeTLB compared to the previous approach, which means 2GB per 1TB
HugeTLB. It is a nice gain. Comments and reviews are welcome. Thanks.

The main implementation and details can refer to the commit log of patch 1.
In this series, I have changed the following four helpers, the following
table shows the impact of the overhead of those helpers.

	+------------------+-----------------------+
	|       APIs       | head page | tail page |
	+------------------+-----------+-----------+
	|    PageHead()    |     Y     |     N     |
	+------------------+-----------+-----------+
	|    PageTail()    |     Y     |     N     |
	+------------------+-----------+-----------+
	|  PageCompound()  |     N     |     N     |
	+------------------+-----------+-----------+
	|  compound_head() |     Y     |     N     |
	+------------------+-----------+-----------+

	Y: Overhead is increased.
	N: Overhead is _NOT_ increased.

It shows that the overhead of those helpers on a tail page don't change
between "hugetlb_free_vmemmap=on" and "hugetlb_free_vmemmap=off". But the
overhead on a head page will be increased when "hugetlb_free_vmemmap=on"
(except PageCompound()). So I believe that Matthew Wilcox's folio series
will help with this.

The users of PageHead() and PageTail() are much less than compound_head()
and most users of PageTail() are VM_BUG_ON(), so I have done some tests
about the overhead of compound_head() on head pages.

I have tested the overhead of calling compound_head() on a head page, which
is 2.11ns (Measure the call time of 10 million times compound_head(), and
then average).

For a head page whose address is not aligned with PAGE_SIZE or a non-compound
page, the overhead of compound_head() is 2.54ns which is increased by 20%.
For a head page whose address is aligned with PAGE_SIZE, the overhead of
compound_head() is 2.97ns which is increased by 40%. Most pages are the former.
I do not think the overhead is significant since the overhead of compound_head()
itself is low.

Changlogs in v7:
  1. Fix a typo (change page_head_if_fake to page_fixed_fake_head) in the
     commit log of patch 2.
  2. Move details of implementation from cover letter to the commit log of
     patch 1.
  3. Add some overhead numbers to the cover letter.

Changlogs in v6:
  1. Add test case to tools/testing/selftests/vm/run_vmtests.sh.

Changlogs in v5:
  1. Move NR_RESET_STRUCT_PAGE to the front of reset_struct_pages().
  2. Collect Reviewed-by tags.

  Thanks Barry for his suggestions and reviews.

Changlogs in v4:
  1. Move hugetlb_free_vmemmap_enabled from hugetlb.h to page-flags.h.
  2. Collect Reviewed-by.
  3. Add a new patch to move vmemmap functions related to HugeTLB to
     the scope of the CONFIG_HUGETLB_PAGE_FREE_VMEMMAP.

  Thanks Barry for his suggestions and reviews.

Changlogs in v3:
  1. Rename page_head_if_fake() to page_fixed_fake_head().
  2. Introducing a new helper page_is_fake_head() to make code more readable.
  3. Update commit log of patch 3 to add more judgements.
  4. Add some comments in check_page_flags() in the patch 4.

  Thanks Barry for his suggestions and reviews.

Changlogs in v2:
  1. Drop two patches of introducing PAGEFLAGS_MASK from this series.
  2. Let page_head_if_fake() return page instead of NULL.
  3. Add a selftest to check if PageHead or PageTail work well.

Muchun Song (5):
  mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB
    page
  mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key
  mm: sparsemem: use page table lock to protect kernel pmd operations
  selftests: vm: add a hugetlb test case
  mm: sparsemem: move vmemmap related to HugeTLB to
    CONFIG_HUGETLB_PAGE_FREE_VMEMMAP

 Documentation/admin-guide/kernel-parameters.txt |   2 +-
 include/linux/hugetlb.h                         |   6 -
 include/linux/mm.h                              |   2 +
 include/linux/page-flags.h                      |  90 ++++++++++++++-
 mm/hugetlb_vmemmap.c                            |  68 ++++++-----
 mm/memory_hotplug.c                             |   2 +-
 mm/ptdump.c                                     |  16 ++-
 mm/sparse-vmemmap.c                             |  70 +++++++++---
 tools/testing/selftests/vm/.gitignore           |   1 +
 tools/testing/selftests/vm/Makefile             |   1 +
 tools/testing/selftests/vm/hugepage-vmemmap.c   | 144 ++++++++++++++++++++++++
 tools/testing/selftests/vm/run_vmtests.sh       |  11 ++
 12 files changed, 350 insertions(+), 63 deletions(-)
 create mode 100644 tools/testing/selftests/vm/hugepage-vmemmap.c

Muchun Song Nov. 8, 2021, 8:16 a.m. UTC | #1

On Mon, Nov 1, 2021 at 11:22 AM Muchun Song <songmuchun@bytedance.com> wrote:
>
> This series can minimize the overhead of struct page for 2MB HugeTLB pages
> significantly. It further reduces the overhead of struct page by 12.5% for
> a 2MB HugeTLB compared to the previous approach, which means 2GB per 1TB
> HugeTLB. It is a nice gain. Comments and reviews are welcome. Thanks.
>

Hi,

Ping guys. Does anyone have any comments or suggestions
on this series?

Thanks.

Mike Kravetz Nov. 8, 2021, 7:33 p.m. UTC | #2

On 11/8/21 12:16 AM, Muchun Song wrote:
> On Mon, Nov 1, 2021 at 11:22 AM Muchun Song <songmuchun@bytedance.com> wrote:
>>
>> This series can minimize the overhead of struct page for 2MB HugeTLB pages
>> significantly. It further reduces the overhead of struct page by 12.5% for
>> a 2MB HugeTLB compared to the previous approach, which means 2GB per 1TB
>> HugeTLB. It is a nice gain. Comments and reviews are welcome. Thanks.
>>
> 
> Hi,
> 
> Ping guys. Does anyone have any comments or suggestions
> on this series?
> 
> Thanks.
> 

I did look over the series earlier.  I have no issue with the hugetlb and
vmemmap modifications as they are enhancements to the existing
optimizations.  My primary concern is the (small) increased overhead
for the helpers as outlined in your cover letter.  Since these helpers
are not limited to hugetlb and used throughout the kernel, I would
really like to get comments from others with a better understanding of
the potential impact.

Muchun Song Nov. 10, 2021, 6:18 a.m. UTC | #3

On Tue, Nov 9, 2021 at 3:33 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 11/8/21 12:16 AM, Muchun Song wrote:
> > On Mon, Nov 1, 2021 at 11:22 AM Muchun Song <songmuchun@bytedance.com> wrote:
> >>
> >> This series can minimize the overhead of struct page for 2MB HugeTLB pages
> >> significantly. It further reduces the overhead of struct page by 12.5% for
> >> a 2MB HugeTLB compared to the previous approach, which means 2GB per 1TB
> >> HugeTLB. It is a nice gain. Comments and reviews are welcome. Thanks.
> >>
> >
> > Hi,
> >
> > Ping guys. Does anyone have any comments or suggestions
> > on this series?
> >
> > Thanks.
> >
>
> I did look over the series earlier.  I have no issue with the hugetlb and
> vmemmap modifications as they are enhancements to the existing
> optimizations.  My primary concern is the (small) increased overhead
> for the helpers as outlined in your cover letter.  Since these helpers
> are not limited to hugetlb and used throughout the kernel, I would
> really like to get comments from others with a better understanding of
> the potential impact.

Thanks Mike. I'd like to hear others' comments about this as well.
From my point of view, maybe the (small) overhead is acceptable
since it only affects the head page, however Matthew Wilcox's folio
series could reduce this situation as well.

Looking forward to others' comments.

Thanks.

>
> --
> Mike Kravetz

Muchun Song Nov. 22, 2021, 4:21 a.m. UTC | #4

On Wed, Nov 10, 2021 at 2:18 PM Muchun Song <songmuchun@bytedance.com> wrote:
>
> On Tue, Nov 9, 2021 at 3:33 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> >
> > On 11/8/21 12:16 AM, Muchun Song wrote:
> > > On Mon, Nov 1, 2021 at 11:22 AM Muchun Song <songmuchun@bytedance.com> wrote:
> > >>
> > >> This series can minimize the overhead of struct page for 2MB HugeTLB pages
> > >> significantly. It further reduces the overhead of struct page by 12.5% for
> > >> a 2MB HugeTLB compared to the previous approach, which means 2GB per 1TB
> > >> HugeTLB. It is a nice gain. Comments and reviews are welcome. Thanks.
> > >>
> > >
> > > Hi,
> > >
> > > Ping guys. Does anyone have any comments or suggestions
> > > on this series?
> > >
> > > Thanks.
> > >
> >
> > I did look over the series earlier.  I have no issue with the hugetlb and
> > vmemmap modifications as they are enhancements to the existing
> > optimizations.  My primary concern is the (small) increased overhead
> > for the helpers as outlined in your cover letter.  Since these helpers
> > are not limited to hugetlb and used throughout the kernel, I would
> > really like to get comments from others with a better understanding of
> > the potential impact.
>
> Thanks Mike. I'd like to hear others' comments about this as well.
> From my point of view, maybe the (small) overhead is acceptable
> since it only affects the head page, however Matthew Wilcox's folio
> series could reduce this situation as well.
>
> Looking forward to others' comments.
>

Ping guys.

Hi Andrew,

Do you have any suggestions on this series to move it on?

Thanks.

Andrew Morton Nov. 24, 2021, 3:09 a.m. UTC | #5

On Mon, 22 Nov 2021 12:21:32 +0800 Muchun Song <songmuchun@bytedance.com> wrote:

> On Wed, Nov 10, 2021 at 2:18 PM Muchun Song <songmuchun@bytedance.com> wrote:
> >
> > On Tue, Nov 9, 2021 at 3:33 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> > >
> > > On 11/8/21 12:16 AM, Muchun Song wrote:
> > > > On Mon, Nov 1, 2021 at 11:22 AM Muchun Song <songmuchun@bytedance.com> wrote:
> > > >>
> > > >> This series can minimize the overhead of struct page for 2MB HugeTLB pages
> > > >> significantly. It further reduces the overhead of struct page by 12.5% for
> > > >> a 2MB HugeTLB compared to the previous approach, which means 2GB per 1TB
> > > >> HugeTLB. It is a nice gain. Comments and reviews are welcome. Thanks.
> > > >>
> > > >
> > > > Hi,
> > > >
> > > > Ping guys. Does anyone have any comments or suggestions
> > > > on this series?
> > > >
> > > > Thanks.
> > > >
> > >
> > > I did look over the series earlier.  I have no issue with the hugetlb and
> > > vmemmap modifications as they are enhancements to the existing
> > > optimizations.  My primary concern is the (small) increased overhead
> > > for the helpers as outlined in your cover letter.  Since these helpers
> > > are not limited to hugetlb and used throughout the kernel, I would
> > > really like to get comments from others with a better understanding of
> > > the potential impact.
> >
> > Thanks Mike. I'd like to hear others' comments about this as well.
> > From my point of view, maybe the (small) overhead is acceptable
> > since it only affects the head page, however Matthew Wilcox's folio
> > series could reduce this situation as well.

I think Mike was inviting you to run some tests to quantify the
overhead ;)

> Ping guys.
> 
> Hi Andrew,
> 
> Do you have any suggestions on this series to move it on?
> 

I tossed it in there for some testing but yes please, additional
reviewing?

Muchun Song Jan. 26, 2022, 8:04 a.m. UTC | #6

On Wed, Nov 24, 2021 at 11:09 AM Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> On Mon, 22 Nov 2021 12:21:32 +0800 Muchun Song <songmuchun@bytedance.com> wrote:
>
> > On Wed, Nov 10, 2021 at 2:18 PM Muchun Song <songmuchun@bytedance.com> wrote:
> > >
> > > On Tue, Nov 9, 2021 at 3:33 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> > > >
> > > > On 11/8/21 12:16 AM, Muchun Song wrote:
> > > > > On Mon, Nov 1, 2021 at 11:22 AM Muchun Song <songmuchun@bytedance.com> wrote:
> > > > >>
> > > > >> This series can minimize the overhead of struct page for 2MB HugeTLB pages
> > > > >> significantly. It further reduces the overhead of struct page by 12.5% for
> > > > >> a 2MB HugeTLB compared to the previous approach, which means 2GB per 1TB
> > > > >> HugeTLB. It is a nice gain. Comments and reviews are welcome. Thanks.
> > > > >>
> > > > >
> > > > > Hi,
> > > > >
> > > > > Ping guys. Does anyone have any comments or suggestions
> > > > > on this series?
> > > > >
> > > > > Thanks.
> > > > >
> > > >
> > > > I did look over the series earlier.  I have no issue with the hugetlb and
> > > > vmemmap modifications as they are enhancements to the existing
> > > > optimizations.  My primary concern is the (small) increased overhead
> > > > for the helpers as outlined in your cover letter.  Since these helpers
> > > > are not limited to hugetlb and used throughout the kernel, I would
> > > > really like to get comments from others with a better understanding of
> > > > the potential impact.
> > >
> > > Thanks Mike. I'd like to hear others' comments about this as well.
> > > From my point of view, maybe the (small) overhead is acceptable
> > > since it only affects the head page, however Matthew Wilcox's folio
> > > series could reduce this situation as well.
>
> I think Mike was inviting you to run some tests to quantify the
> overhead ;)

Hi Andrew,

Sorry for the late reply.

Specific overhead figures are already in the cover letter. Also,
I did some other tests, e.g. kernel compilation, sysbench. I didn't
see any regressions.

>
> > Ping guys.
> >
> > Hi Andrew,
> >
> > Do you have any suggestions on this series to move it on?
> >
>
> I tossed it in there for some testing but yes please, additional
> reviewing?

It's already been in the next-tree (also in our ByteDance servers)
for several months, and I didn't receive any negative feedback.

Do you think it is ready for 5.17?

Thanks.

Muchun Song Feb. 9, 2022, 7:44 a.m. UTC | #7

On Wed, Jan 26, 2022 at 4:04 PM Muchun Song <songmuchun@bytedance.com> wrote:
>
> On Wed, Nov 24, 2021 at 11:09 AM Andrew Morton
> <akpm@linux-foundation.org> wrote:
> >
> > On Mon, 22 Nov 2021 12:21:32 +0800 Muchun Song <songmuchun@bytedance.com> wrote:
> >
> > > On Wed, Nov 10, 2021 at 2:18 PM Muchun Song <songmuchun@bytedance.com> wrote:
> > > >
> > > > On Tue, Nov 9, 2021 at 3:33 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> > > > >
> > > > > On 11/8/21 12:16 AM, Muchun Song wrote:
> > > > > > On Mon, Nov 1, 2021 at 11:22 AM Muchun Song <songmuchun@bytedance.com> wrote:
> > > > > >>
> > > > > >> This series can minimize the overhead of struct page for 2MB HugeTLB pages
> > > > > >> significantly. It further reduces the overhead of struct page by 12.5% for
> > > > > >> a 2MB HugeTLB compared to the previous approach, which means 2GB per 1TB
> > > > > >> HugeTLB. It is a nice gain. Comments and reviews are welcome. Thanks.
> > > > > >>
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Ping guys. Does anyone have any comments or suggestions
> > > > > > on this series?
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > >
> > > > > I did look over the series earlier.  I have no issue with the hugetlb and
> > > > > vmemmap modifications as they are enhancements to the existing
> > > > > optimizations.  My primary concern is the (small) increased overhead
> > > > > for the helpers as outlined in your cover letter.  Since these helpers
> > > > > are not limited to hugetlb and used throughout the kernel, I would
> > > > > really like to get comments from others with a better understanding of
> > > > > the potential impact.
> > > >
> > > > Thanks Mike. I'd like to hear others' comments about this as well.
> > > > From my point of view, maybe the (small) overhead is acceptable
> > > > since it only affects the head page, however Matthew Wilcox's folio
> > > > series could reduce this situation as well.
> >
> > I think Mike was inviting you to run some tests to quantify the
> > overhead ;)
>
> Hi Andrew,
>
> Sorry for the late reply.
>
> Specific overhead figures are already in the cover letter. Also,
> I did some other tests, e.g. kernel compilation, sysbench. I didn't
> see any regressions.

The overhead is introduced by page_fixed_fake_head() which
has an "if" statement and an access to a possible cold cache line.
I think the main overhead is from the latter. However, probabilistically,
only 1/64 of the pages need to do the latter.  And
page_fixed_fake_head() is already simple (I mean the overhead
is small enough) and many performance bottlenecks in mm are
not in compound_head().  This also matches the tests I did.
I didn't see any regressions after enabling this feature.

I knew Mike's concern is the increased overhead to use cases
beyond HugeTLB. If we really want to avoid the access to
a possible cold cache line, we can introduce a new page
flag like PG_hugetlb and test if it is set in the page->flags,
if so, then return the read head page struct. Then
page_fixed_fake_head() looks like below.

static __always_inline const struct page *page_fixed_fake_head(const
struct page *page)
{
        if (!hugetlb_free_vmemmap_enabled())
                return page;

        if (test_bit(PG_hugetlb, &page->flags)) {
                unsigned long head = READ_ONCE(page[1].compound_head);

                if (likely(head & 1))
                        return (const struct page *)(head - 1);
        }
        return page;
}

But I don't think it's worth doing this.

Hi Mike and Andrew,

Since these helpers are not limited to hugetlb and used throughout the
kernel, I would really like to get comments from others with a better
understanding of the potential impact. Do you have any appropriate
reviewers to invite?

Thanks.
>
> >
> > > Ping guys.
> > >
> > > Hi Andrew,
> > >
> > > Do you have any suggestions on this series to move it on?
> > >
> >
> > I tossed it in there for some testing but yes please, additional
> > reviewing?
>
> It's already been in the next-tree (also in our ByteDance servers)
> for several months, and I didn't receive any negative feedback.
>
> Do you think it is ready for 5.17?
>
> Thanks.

Mike Kravetz Feb. 9, 2022, 10:48 p.m. UTC | #8

On 2/8/22 23:44, Muchun Song wrote:
> On Wed, Jan 26, 2022 at 4:04 PM Muchun Song <songmuchun@bytedance.com> wrote:
>>
>> On Wed, Nov 24, 2021 at 11:09 AM Andrew Morton
>> <akpm@linux-foundation.org> wrote:
>>>
>>> On Mon, 22 Nov 2021 12:21:32 +0800 Muchun Song <songmuchun@bytedance.com> wrote:
>>>
>>>> On Wed, Nov 10, 2021 at 2:18 PM Muchun Song <songmuchun@bytedance.com> wrote:
>>>>>
>>>>> On Tue, Nov 9, 2021 at 3:33 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>>>>>>
>>>>>> On 11/8/21 12:16 AM, Muchun Song wrote:
>>>>>>> On Mon, Nov 1, 2021 at 11:22 AM Muchun Song <songmuchun@bytedance.com> wrote:
>>>>>>>>
>>>>>>>> This series can minimize the overhead of struct page for 2MB HugeTLB pages
>>>>>>>> significantly. It further reduces the overhead of struct page by 12.5% for
>>>>>>>> a 2MB HugeTLB compared to the previous approach, which means 2GB per 1TB
>>>>>>>> HugeTLB. It is a nice gain. Comments and reviews are welcome. Thanks.
>>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Ping guys. Does anyone have any comments or suggestions
>>>>>>> on this series?
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>
>>>>>> I did look over the series earlier.  I have no issue with the hugetlb and
>>>>>> vmemmap modifications as they are enhancements to the existing
>>>>>> optimizations.  My primary concern is the (small) increased overhead
>>>>>> for the helpers as outlined in your cover letter.  Since these helpers
>>>>>> are not limited to hugetlb and used throughout the kernel, I would
>>>>>> really like to get comments from others with a better understanding of
>>>>>> the potential impact.
>>>>>
>>>>> Thanks Mike. I'd like to hear others' comments about this as well.
>>>>> From my point of view, maybe the (small) overhead is acceptable
>>>>> since it only affects the head page, however Matthew Wilcox's folio
>>>>> series could reduce this situation as well.
>>>
>>> I think Mike was inviting you to run some tests to quantify the
>>> overhead ;)
>>
>> Hi Andrew,
>>
>> Sorry for the late reply.
>>
>> Specific overhead figures are already in the cover letter. Also,
>> I did some other tests, e.g. kernel compilation, sysbench. I didn't
>> see any regressions.
> 
> The overhead is introduced by page_fixed_fake_head() which
> has an "if" statement and an access to a possible cold cache line.
> I think the main overhead is from the latter. However, probabilistically,
> only 1/64 of the pages need to do the latter.  And
> page_fixed_fake_head() is already simple (I mean the overhead
> is small enough) and many performance bottlenecks in mm are
> not in compound_head().  This also matches the tests I did.
> I didn't see any regressions after enabling this feature.
> 
> I knew Mike's concern is the increased overhead to use cases
> beyond HugeTLB. If we really want to avoid the access to
> a possible cold cache line, we can introduce a new page
> flag like PG_hugetlb and test if it is set in the page->flags,
> if so, then return the read head page struct. Then
> page_fixed_fake_head() looks like below.
> 
> static __always_inline const struct page *page_fixed_fake_head(const
> struct page *page)
> {
>         if (!hugetlb_free_vmemmap_enabled())
>                 return page;
> 
>         if (test_bit(PG_hugetlb, &page->flags)) {
>                 unsigned long head = READ_ONCE(page[1].compound_head);
> 
>                 if (likely(head & 1))
>                         return (const struct page *)(head - 1);
>         }
>         return page;
> }
> 
> But I don't think it's worth doing this.
> 
> Hi Mike and Andrew,
> 
> Since these helpers are not limited to hugetlb and used throughout the
> kernel, I would really like to get comments from others with a better
> understanding of the potential impact. Do you have any appropriate
> reviewers to invite?
> 

I think the appropriate people are already on Cc as they provided input on
the original vmemmap optimization series.

The question that needs to be answered is simple enough:  Is the savings of
one vmemmap page per hugetlb page worth the extra minimal overhead in
compound_head()?  Like most things, this depends on workload.

One thing to note is that compound_page() overhead is only introduced if
hugetlb vmemmap freeing is enabled.  Correct?  During the original vmemmap
optimization discussions, people thought it important that this be 'opt in'.  I do not know if distos will enable this by default.  But, perhaps the 
potential overhead can be thought of as just part of 'opting in' for
vmemmap optimizations.

Muchun Song Feb. 10, 2022, 7:45 a.m. UTC | #9

On Thu, Feb 10, 2022 at 6:49 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 2/8/22 23:44, Muchun Song wrote:
> > On Wed, Jan 26, 2022 at 4:04 PM Muchun Song <songmuchun@bytedance.com> wrote:
> >>
> >> On Wed, Nov 24, 2021 at 11:09 AM Andrew Morton
> >> <akpm@linux-foundation.org> wrote:
> >>>
> >>> On Mon, 22 Nov 2021 12:21:32 +0800 Muchun Song <songmuchun@bytedance.com> wrote:
> >>>
> >>>> On Wed, Nov 10, 2021 at 2:18 PM Muchun Song <songmuchun@bytedance.com> wrote:
> >>>>>
> >>>>> On Tue, Nov 9, 2021 at 3:33 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> >>>>>>
> >>>>>> On 11/8/21 12:16 AM, Muchun Song wrote:
> >>>>>>> On Mon, Nov 1, 2021 at 11:22 AM Muchun Song <songmuchun@bytedance.com> wrote:
> >>>>>>>>
> >>>>>>>> This series can minimize the overhead of struct page for 2MB HugeTLB pages
> >>>>>>>> significantly. It further reduces the overhead of struct page by 12.5% for
> >>>>>>>> a 2MB HugeTLB compared to the previous approach, which means 2GB per 1TB
> >>>>>>>> HugeTLB. It is a nice gain. Comments and reviews are welcome. Thanks.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> Ping guys. Does anyone have any comments or suggestions
> >>>>>>> on this series?
> >>>>>>>
> >>>>>>> Thanks.
> >>>>>>>
> >>>>>>
> >>>>>> I did look over the series earlier.  I have no issue with the hugetlb and
> >>>>>> vmemmap modifications as they are enhancements to the existing
> >>>>>> optimizations.  My primary concern is the (small) increased overhead
> >>>>>> for the helpers as outlined in your cover letter.  Since these helpers
> >>>>>> are not limited to hugetlb and used throughout the kernel, I would
> >>>>>> really like to get comments from others with a better understanding of
> >>>>>> the potential impact.
> >>>>>
> >>>>> Thanks Mike. I'd like to hear others' comments about this as well.
> >>>>> From my point of view, maybe the (small) overhead is acceptable
> >>>>> since it only affects the head page, however Matthew Wilcox's folio
> >>>>> series could reduce this situation as well.
> >>>
> >>> I think Mike was inviting you to run some tests to quantify the
> >>> overhead ;)
> >>
> >> Hi Andrew,
> >>
> >> Sorry for the late reply.
> >>
> >> Specific overhead figures are already in the cover letter. Also,
> >> I did some other tests, e.g. kernel compilation, sysbench. I didn't
> >> see any regressions.
> >
> > The overhead is introduced by page_fixed_fake_head() which
> > has an "if" statement and an access to a possible cold cache line.
> > I think the main overhead is from the latter. However, probabilistically,
> > only 1/64 of the pages need to do the latter.  And
> > page_fixed_fake_head() is already simple (I mean the overhead
> > is small enough) and many performance bottlenecks in mm are
> > not in compound_head().  This also matches the tests I did.
> > I didn't see any regressions after enabling this feature.
> >
> > I knew Mike's concern is the increased overhead to use cases
> > beyond HugeTLB. If we really want to avoid the access to
> > a possible cold cache line, we can introduce a new page
> > flag like PG_hugetlb and test if it is set in the page->flags,
> > if so, then return the read head page struct. Then
> > page_fixed_fake_head() looks like below.
> >
> > static __always_inline const struct page *page_fixed_fake_head(const
> > struct page *page)
> > {
> >         if (!hugetlb_free_vmemmap_enabled())
> >                 return page;
> >
> >         if (test_bit(PG_hugetlb, &page->flags)) {
> >                 unsigned long head = READ_ONCE(page[1].compound_head);
> >
> >                 if (likely(head & 1))
> >                         return (const struct page *)(head - 1);
> >         }
> >         return page;
> > }
> >
> > But I don't think it's worth doing this.
> >
> > Hi Mike and Andrew,
> >
> > Since these helpers are not limited to hugetlb and used throughout the
> > kernel, I would really like to get comments from others with a better
> > understanding of the potential impact. Do you have any appropriate
> > reviewers to invite?
> >
>
> I think the appropriate people are already on Cc as they provided input on
> the original vmemmap optimization series.
>
> The question that needs to be answered is simple enough:  Is the savings of
> one vmemmap page per hugetlb page worth the extra minimal overhead in
> compound_head()?  Like most things, this depends on workload.
>
> One thing to note is that compound_page() overhead is only introduced if
> hugetlb vmemmap freeing is enabled.  Correct?

Definitely correct.

> During the original vmemmap
> optimization discussions, people thought it important that this be 'opt in'.  I do not know if distos will enable this by default.  But, perhaps the
> potential overhead can be thought of as just part of 'opting in' for
> vmemmap optimizations.

I agree. Does anyone else have a different opinion?

Thanks.

[v7,0/5] Free the 2nd vmemmap page associated with each HugeTLB page

Message

Comments