Message ID | 20210413104747.12177-4-osalvador@suse.de (mailing list archive)
---|---
State | New, archived
Series | Make alloc_contig_range handle Hugetlb pages
On Tue 13-04-21 12:47:43, Oscar Salvador wrote:
> Currently, the clearing of the flag is done under the lock, but this
> is unnecessary as we just allocated the page and we did not give it
> away yet, so no one should be messing with it.
>
> Also, this helps make clear that here the lock is only protecting the
> counter.

While moving the flag clearing is ok, I am wondering why we need it in
the first place. I think it is just a leftover from 6c0371490140
("hugetlb: convert PageHugeFreed to HPageFreed flag"). Prior to that, a
tail page was used to keep track of the state, but now everything
happens in the head page, and the flag uses page->private, which is
always initialized when allocated by the allocator (post_alloc_hook).

Or do we need it for giga pages, which are not allocated by the page
allocator? If yes then moving it to prep_compound_gigantic_page would
be better.

So should we just drop it here?

> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>  mm/hugetlb.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 54d81d5947ed..e40d5fe5c63c 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1490,10 +1490,10 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
>  	hugetlb_set_page_subpool(page, NULL);
>  	set_hugetlb_cgroup(page, NULL);
>  	set_hugetlb_cgroup_rsvd(page, NULL);
> +	ClearHPageFreed(page);
>  	spin_lock_irq(&hugetlb_lock);
>  	h->nr_huge_pages++;
>  	h->nr_huge_pages_node[nid]++;
> -	ClearHPageFreed(page);
>  	spin_unlock_irq(&hugetlb_lock);
>  }
>
> --
> 2.16.3
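For context, the page->private initialization Michal refers to lives in
post_alloc_hook() in mm/page_alloc.c. An abridged sketch of the
v5.12-era code, with some steps elided:

	inline void post_alloc_hook(struct page *page, unsigned int order,
					gfp_t gfp_flags)
	{
		/* Every page leaving the buddy allocator gets its
		 * page->private cleared here, which also clears the
		 * hugetlb page flags stored in that field. */
		set_page_private(page, 0);
		set_page_refcounted(page);

		arch_alloc_page(page, order);
		kasan_alloc_pages(page, order);
		kernel_unpoison_pages(page, 1 << order);
		set_page_owner(page, order, gfp_flags);

		if (!want_init_on_free() && want_init_on_alloc(gfp_flags))
			kernel_init_free_pages(page, 1 << order);
	}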
On 4/13/21 6:23 AM, Michal Hocko wrote:
> On Tue 13-04-21 12:47:43, Oscar Salvador wrote:
>> Currently, the clearing of the flag is done under the lock, but this
>> is unnecessary as we just allocated the page and we did not give it
>> away yet, so no one should be messing with it.
>>
>> Also, this helps make clear that here the lock is only protecting the
>> counter.
>
> While moving the flag clearing is ok, I am wondering why we need it in
> the first place. I think it is just a leftover from 6c0371490140
> ("hugetlb: convert PageHugeFreed to HPageFreed flag"). Prior to that, a
> tail page was used to keep track of the state, but now everything
> happens in the head page, and the flag uses page->private, which is
> always initialized when allocated by the allocator (post_alloc_hook).

Yes, this was just left over from 6c0371490140. And, as you mention,
post_alloc_hook will clear page->private for all non-gigantic pages
allocated via buddy.

> Or do we need it for giga pages, which are not allocated by the page
> allocator? If yes then moving it to prep_compound_gigantic_page would
> be better.

I am pretty sure dynamically allocated giga pages have page->private
cleared as well. It is not obvious, but the alloc_contig_range code
used to put together giga pages will end up calling
isolate_freepages_range, which will call split_map_pages and then
post_alloc_hook for each MAX_ORDER block.

As mentioned, this is not obvious and I would hate to rely on this
behavior not changing.

> So should we just drop it here?

The only place where page->private may not be initialized is when we do
allocations at boot time from memblock. In this case, we will add the
pages to the free list via put_page/free_huge_page, so the appropriate
flags will be cleared before anyone notices.

I'm wondering if we should just do a set_page_private(page, 0) here in
prep_new_huge_page since we now use that field for flags. Or, is that
overkill?
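The chain Mike describes bottoms out in split_map_pages() in
mm/compaction.c; an abridged sketch of the v5.12-era function, showing
the post_alloc_hook() call that clears page->private for each isolated
free block:

	static void split_map_pages(struct list_head *list)
	{
		unsigned int i, order, nr_pages;
		struct page *page, *next;
		LIST_HEAD(tmp_list);

		list_for_each_entry_safe(page, next, list, lru) {
			list_del(&page->lru);

			order = page_private(page);
			nr_pages = 1 << order;

			/* This is where each isolated block, including the
			 * ones making up a future giga page, gets its
			 * page->private cleared. */
			post_alloc_hook(page, order, __GFP_MOVABLE);
			if (order)
				split_page(page, order);

			for (i = 0; i < nr_pages; i++) {
				list_add(&page->lru, &tmp_list);
				page++;
			}
		}

		list_splice(&tmp_list, list);
	}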
On Tue 13-04-21 14:19:03, Mike Kravetz wrote:
> On 4/13/21 6:23 AM, Michal Hocko wrote:
> > On Tue 13-04-21 12:47:43, Oscar Salvador wrote:
[...]
> > Or do we need it for giga pages, which are not allocated by the page
> > allocator? If yes then moving it to prep_compound_gigantic_page would
> > be better.
>
> I am pretty sure dynamically allocated giga pages have page->private
> cleared as well. It is not obvious, but the alloc_contig_range code
> used to put together giga pages will end up calling
> isolate_freepages_range, which will call split_map_pages and then
> post_alloc_hook for each MAX_ORDER block.

Thanks for saving me from crawling that code.

> As mentioned, this is not obvious and I would hate to rely on this
> behavior not changing.

Thinking about it some more, having some (page granularity) allocator
not clearing page private would be a serious problem for anybody
relying on its state. So I am not sure this can change.

> > So should we just drop it here?
>
> The only place where page->private may not be initialized is when we do
> allocations at boot time from memblock. In this case, we will add the
> pages to the free list via put_page/free_huge_page, so the appropriate
> flags will be cleared before anyone notices.

Pages allocated by the bootmem should be pre-initialized from the boot,
no?

> I'm wondering if we should just do a set_page_private(page, 0) here in
> prep_new_huge_page since we now use that field for flags. Or, is that
> overkill?

I would rather not duplicate the work done by underlying allocators. I
do not think other users of the allocator want to do the same, so why
should hugetlb be any different?
On Wed, Apr 14, 2021 at 08:04:21AM +0200, Michal Hocko wrote:
> On Tue 13-04-21 14:19:03, Mike Kravetz wrote:
> > On 4/13/21 6:23 AM, Michal Hocko wrote:
> > The only place where page->private may not be initialized is when we do
> > allocations at boot time from memblock. In this case, we will add the
> > pages to the free list via put_page/free_huge_page, so the appropriate
> > flags will be cleared before anyone notices.
>
> Pages allocated by the bootmem should be pre-initialized from the boot,
> no?

I guess Mike means:

hugetlb_hstate_alloc_pages
 alloc_bootmem_huge_page
  __alloc_bootmem_huge_page
   memblock_alloc_try_nid_raw

and AFAICS, memblock_alloc_try_nid_raw() does not zero the memory.

Then these pages are initialized in:

gather_bootmem_prealloc
 prep_compound_huge_page
  prep_new_huge_page

But as can be noticed, no one touches page->private when coming from
that path.
On Wed 14-04-21 09:41:32, Oscar Salvador wrote:
> On Wed, Apr 14, 2021 at 08:04:21AM +0200, Michal Hocko wrote:
> > On Tue 13-04-21 14:19:03, Mike Kravetz wrote:
> > > On 4/13/21 6:23 AM, Michal Hocko wrote:
> > > The only place where page->private may not be initialized is when we do
> > > allocations at boot time from memblock. In this case, we will add the
> > > pages to the free list via put_page/free_huge_page, so the appropriate
> > > flags will be cleared before anyone notices.
> >
> > Pages allocated by the bootmem should be pre-initialized from the boot,
> > no?
>
> I guess Mike means:
>
> hugetlb_hstate_alloc_pages
>  alloc_bootmem_huge_page
>   __alloc_bootmem_huge_page
>    memblock_alloc_try_nid_raw
>
> and AFAICS, memblock_alloc_try_nid_raw() does not zero the memory.

You are right, it doesn't do it there. But all struct pages, even those
backing memory handed out by the bootmem allocator, should be
initialized; they would be poisoned otherwise, right? I would have to
look at the exact code path, but IIRC this should happen around the
time the bootmem allocator state transitions to the page allocator.
On Wed, Apr 14, 2021 at 10:28:33AM +0200, Michal Hocko wrote:
> You are right, it doesn't do it there. But all struct pages, even those
> backing memory handed out by the bootmem allocator, should be
> initialized; they would be poisoned otherwise, right? I would have to
> look at the exact code path, but IIRC this should happen around the
> time the bootmem allocator state transitions to the page allocator.

Ok, you are right.
struct pages are initialized a bit earlier, through:

start_kernel
 setup_arch
  paging_init
   zone_sizes_init
    free_area_init
     free_area_init_node
      free_area_init_core
       memmap_init_zone
        memmap_init_range
         __init_single_page

While the allocation of bootmem hugetlb pages happens in:

start_kernel
 parse_args
  ...
   hugepages_setup
    ...
     hugetlb_hstate_alloc_pages
      __alloc_bootmem_huge_page

which is after the setup_arch() call.

So by the time we get the page from __alloc_bootmem_huge_page(), the
fields are zeroed.
I thought we might get in trouble because memblock_alloc_try_nid_raw()
calls page_init_poison(), which poisons the chunk with 0xff, e.g.:

[    1.955471] boot: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
[    1.955476] boot: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff

but it seems that that does not affect the memmap struct pages.

I checked, and when we get there in __alloc_bootmem_huge_page,
page->private is still zeroed, so I guess it should be safe to assume
that we do not really need to clear the flag in __prep_new_huge_page()
routine?
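The last step of the first chain, __init_single_page() in
mm/page_alloc.c, is what leaves page->private zeroed for every memmap
entry; an abridged sketch of the v5.12-era code (the WANT_PAGE_VIRTUAL
branch is elided):

	static void __meminit __init_single_page(struct page *page,
				unsigned long pfn, unsigned long zone, int nid)
	{
		/* Zeroes the whole struct page, including page->private. */
		mm_zero_struct_page(page);
		set_page_links(page, zone, nid, pfn);
		init_page_count(page);
		page_mapcount_reset(page);
		page_cpupid_reset_last(page);
		page_kasan_tag_reset(page);

		INIT_LIST_HEAD(&page->lru);
	}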
On Wed 14-04-21 12:01:47, Oscar Salvador wrote:
> On Wed, Apr 14, 2021 at 10:28:33AM +0200, Michal Hocko wrote:
> > You are right, it doesn't do it there. But all struct pages, even those
> > backing memory handed out by the bootmem allocator, should be
> > initialized; they would be poisoned otherwise, right? I would have to
> > look at the exact code path, but IIRC this should happen around the
> > time the bootmem allocator state transitions to the page allocator.
>
> Ok, you are right.
> struct pages are initialized a bit earlier, through:
>
> start_kernel
>  setup_arch
>   paging_init
>    zone_sizes_init
>     free_area_init
>      free_area_init_node
>       free_area_init_core
>        memmap_init_zone
>         memmap_init_range
>          __init_single_page
>
> While the allocation of bootmem hugetlb pages happens in:
>
> start_kernel
>  parse_args
>   ...
>    hugepages_setup
>     ...
>      hugetlb_hstate_alloc_pages
>       __alloc_bootmem_huge_page
>
> which is after the setup_arch() call.

Thanks for pulling those paths out. It is always painful to crawl that
code.

> So by the time we get the page from __alloc_bootmem_huge_page(), the
> fields are zeroed.
> I thought we might get in trouble because memblock_alloc_try_nid_raw()
> calls page_init_poison(), which poisons the chunk with 0xff, e.g.:
>
> [    1.955471] boot: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
> [    1.955476] boot: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
>
> but it seems that that does not affect the memmap struct pages.

Well, to be precise, it does the very same thing with memmap struct
pages, but that is before the initialization code you have pointed out
above. In this context it just poisons the allocated content, which is
the GB page storage.

> I checked, and when we get there in __alloc_bootmem_huge_page,
> page->private is still zeroed, so I guess it should be safe to assume
> that we do not really need to clear the flag in __prep_new_huge_page()
> routine?

It would be quite nasty if the struct pages content would be undefined.
Maybe that is possible, but then I would rather stick the
initialization into __alloc_bootmem_huge_page.

Thanks!
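For reference, the poisoning under discussion; an abridged sketch of
the v5.12-era memblock_alloc_try_nid_raw() in mm/memblock.c. The poison
(PAGE_POISON_PATTERN, all 0xff bytes, compiled in only with
CONFIG_DEBUG_VM) hits the returned buffer, i.e. the GB page storage,
not the memmap describing it:

	void * __init memblock_alloc_try_nid_raw(phys_addr_t size,
				phys_addr_t align, phys_addr_t min_addr,
				phys_addr_t max_addr, int nid)
	{
		void *ptr;

		ptr = memblock_alloc_internal(size, align, min_addr,
					      max_addr, nid, false);
		/* Poison the allocated storage itself; the struct pages
		 * for this range are (re)initialized elsewhere. */
		if (ptr && size > 0)
			page_init_poison(ptr, size);

		return ptr;
	}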
On Wed, Apr 14, 2021 at 12:32:58PM +0200, Michal Hocko wrote:
> Well, to be precise, it does the very same thing with memmap struct
> pages, but that is before the initialization code you have pointed out
> above. In this context it just poisons the allocated content, which is
> the GB page storage.

Right.

> > I checked, and when we get there in __alloc_bootmem_huge_page,
> > page->private is still zeroed, so I guess it should be safe to assume
> > that we do not really need to clear the flag in __prep_new_huge_page()
> > routine?
>
> It would be quite nasty if the struct pages content would be undefined.
> Maybe that is possible, but then I would rather stick the
> initialization into __alloc_bootmem_huge_page.

Yes, but I do not think that is really possible unless I missed
something.
Let us see what Mike thinks of it; if there are no objections, we can
get rid of the flag clearing right there.
On Wed 14-04-21 12:49:53, Oscar Salvador wrote:
> On Wed, Apr 14, 2021 at 12:32:58PM +0200, Michal Hocko wrote:
[...]
> > > I checked, and when we get there in __alloc_bootmem_huge_page,
> > > page->private is still zeroed, so I guess it should be safe to assume
> > > that we do not really need to clear the flag in __prep_new_huge_page()
> > > routine?
> >
> > It would be quite nasty if the struct pages content would be undefined.
> > Maybe that is possible, but then I would rather stick the
> > initialization into __alloc_bootmem_huge_page.
>
> Yes, but I do not think that is really possible unless I missed
> something.

Yeah, it should be fine. I was thinking of an alloc, modify struct
pages, free back to the bootmem allocator sequence. But I do not
remember ever seeing a sequence like that. Bootmem allocator users tend
to be simple: they allocate storage and retain it for their lifetime.
Other than the PageReserved bit, they do not touch the metadata.

If we want to be paranoid, then we can add a VM_WARN_ON for an
unexpected state when allocating from the bootmem. But I am not sure
this is really worth it.
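No such check was part of any posted patch; a purely hypothetical
sketch of what the paranoia check could look like at the allocation
site in __alloc_bootmem_huge_page() (the placement and the use of
page_private() here are illustrative assumptions):

	addr = memblock_alloc_try_nid_raw(huge_page_size(h),
			huge_page_size(h), 0,
			MEMBLOCK_ALLOC_ACCESSIBLE, node);
	if (addr) {
		/* Hypothetical paranoia check: the memmap for
		 * bootmem-allocated ranges is expected to be
		 * initialized, so page->private (and with it
		 * HPageFreed) should already be zero. */
		VM_WARN_ON(page_private(virt_to_page(addr)) != 0);
		...
	}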
On 14.04.21 13:09, Michal Hocko wrote:
> On Wed 14-04-21 12:49:53, Oscar Salvador wrote:
>> On Wed, Apr 14, 2021 at 12:32:58PM +0200, Michal Hocko wrote:
> [...]
>>>> I checked, and when we get there in __alloc_bootmem_huge_page,
>>>> page->private is still zeroed, so I guess it should be safe to assume
>>>> that we do not really need to clear the flag in __prep_new_huge_page()
>>>> routine?
>>>
>>> It would be quite nasty if the struct pages content would be undefined.
>>> Maybe that is possible, but then I would rather stick the
>>> initialization into __alloc_bootmem_huge_page.
>>
>> Yes, but I do not think that is really possible unless I missed
>> something.
>
> Yeah, it should be fine. I was thinking of an alloc, modify struct
> pages, free back to the bootmem allocator sequence. But I do not
> remember ever seeing a sequence like that.

We do have code like that, though. Take a look at
arch/x86/mm/init_64.c:free_pagetable() as one example.

But in general, we assume the freeing code (buddy, but also memblock)
restores the memmap to the state it was in when the allocator handed it
out. So if it was properly initialized when coming from the allocator
the first time, it should remain properly initialized when re-entering
and re-leaving the allocator.
On 4/14/21 3:49 AM, Oscar Salvador wrote:
> On Wed, Apr 14, 2021 at 12:32:58PM +0200, Michal Hocko wrote:
>> Well, to be precise, it does the very same thing with memmap struct
>> pages, but that is before the initialization code you have pointed out
>> above. In this context it just poisons the allocated content, which is
>> the GB page storage.
>
> Right.
>
>>> I checked, and when we get there in __alloc_bootmem_huge_page,
>>> page->private is still zeroed, so I guess it should be safe to assume
>>> that we do not really need to clear the flag in __prep_new_huge_page()
>>> routine?
>>
>> It would be quite nasty if the struct pages content would be undefined.
>> Maybe that is possible, but then I would rather stick the
>> initialization into __alloc_bootmem_huge_page.
>
> Yes, but I do not think that is really possible unless I missed
> something.
> Let us see what Mike thinks of it; if there are no objections, we can
> get rid of the flag clearing right there.

Thanks for crawling through that code Oscar!

I do not think you missed anything. Let's just get rid of the flag
clearing.
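Given the conclusion above, prep_new_huge_page() would presumably end
up without the ClearHPageFreed() call altogether; a sketch built only
from the patch context below, with the function's earlier
initialization elided:

	static void prep_new_huge_page(struct hstate *h, struct page *page,
				       int nid)
	{
		/* earlier initialization (list head, destructor, ...) elided */
		hugetlb_set_page_subpool(page, NULL);
		set_hugetlb_cgroup(page, NULL);
		set_hugetlb_cgroup_rsvd(page, NULL);
		/* No ClearHPageFreed(): page->private is already zero here,
		 * courtesy of the underlying allocator (post_alloc_hook) or
		 * the boot-time memmap initialization. */
		spin_lock_irq(&hugetlb_lock);
		h->nr_huge_pages++;
		h->nr_huge_pages_node[nid]++;
		spin_unlock_irq(&hugetlb_lock);
	}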
Currently, the clearing of the flag is done under the lock, but this
is unnecessary as we just allocated the page and we did not give it
away yet, so no one should be messing with it.

Also, this helps make clear that here the lock is only protecting the
counter.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/hugetlb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 54d81d5947ed..e40d5fe5c63c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1490,10 +1490,10 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
 	hugetlb_set_page_subpool(page, NULL);
 	set_hugetlb_cgroup(page, NULL);
 	set_hugetlb_cgroup_rsvd(page, NULL);
+	ClearHPageFreed(page);
 	spin_lock_irq(&hugetlb_lock);
 	h->nr_huge_pages++;
 	h->nr_huge_pages_node[nid]++;
-	ClearHPageFreed(page);
 	spin_unlock_irq(&hugetlb_lock);
 }