mm,memory_hotplug: Explicitly pass the head to isolate_huge_page

Message ID 20190208090604.975-1-osalvador@suse.de (mailing list archive)
State New, archived
Series mm,memory_hotplug: Explicitly pass the head to isolate_huge_page

Commit Message

Oscar Salvador Feb. 8, 2019, 9:06 a.m. UTC
isolate_huge_page() expects to be passed the head of a hugetlb page:

bool isolate_huge_page(...)
{
	...
	VM_BUG_ON_PAGE(!PageHead(page), page);
	...
}

While I really cannot think of any situation where we would end up with a
non-head page in do_migrate_range(), let us make sure the code is as sane
as possible by explicitly passing the head.
Since we already have the pointer, it does not take any extra effort.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 mm/memory_hotplug.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

David Hildenbrand Feb. 11, 2019, 8:58 a.m. UTC | #1
On 08.02.19 10:06, Oscar Salvador wrote:
> isolate_huge_page() expects to be passed the head of a hugetlb page:
> 
> bool isolate_huge_page(...)
> {
> 	...
> 	VM_BUG_ON_PAGE(!PageHead(page), page);
> 	...
> }
> 
> While I really cannot think of any situation where we would end up with a
> non-head page in do_migrate_range(), let us make sure the code is as sane
> as possible by explicitly passing the head.
> Since we already have the pointer, it does not take any extra effort.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>  mm/memory_hotplug.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 656ff386ac15..d5f7afda67db 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1378,12 +1378,12 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  
>  		if (PageHuge(page)) {
>  			struct page *head = compound_head(page);
> -			pfn = page_to_pfn(head) + (1<<compound_order(head)) - 1;
>  			if (compound_order(head) > PFN_SECTION_SHIFT) {
>  				ret = -EBUSY;
>  				break;
>  			}
> -			isolate_huge_page(page, &source);
> +			pfn = page_to_pfn(head) + (1<<compound_order(head)) - 1;
> +			isolate_huge_page(head, &source);
>  			continue;
>  		} else if (PageTransHuge(page))
>  			pfn = page_to_pfn(compound_head(page))
> 

Reviewed-by: David Hildenbrand <david@redhat.com>
Michal Hocko Feb. 12, 2019, 8:33 a.m. UTC | #2
On Fri 08-02-19 10:06:04, Oscar Salvador wrote:
> isolate_huge_page() expects to be passed the head of a hugetlb page:
> 
> bool isolate_huge_page(...)
> {
> 	...
> 	VM_BUG_ON_PAGE(!PageHead(page), page);
> 	...
> }
> 
> While I really cannot think of any situation where we would end up with a
> non-head page in do_migrate_range(), let us make sure the code is as sane
> as possible by explicitly passing the head.
> Since we already have the pointer, it does not take any extra effort.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>

Acked-by: Michal Hocko <mhocko@suse.com>

Btw.

> ---
>  mm/memory_hotplug.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 656ff386ac15..d5f7afda67db 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1378,12 +1378,12 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  
>  		if (PageHuge(page)) {
>  			struct page *head = compound_head(page);
> -			pfn = page_to_pfn(head) + (1<<compound_order(head)) - 1;
>  			if (compound_order(head) > PFN_SECTION_SHIFT) {
>  				ret = -EBUSY;
>  				break;
>  			}

Why are we doing this, btw? 

> -			isolate_huge_page(page, &source);
> +			pfn = page_to_pfn(head) + (1<<compound_order(head)) - 1;
> +			isolate_huge_page(head, &source);
>  			continue;
>  		} else if (PageTransHuge(page))
>  			pfn = page_to_pfn(compound_head(page))
> -- 
> 2.13.7
Oscar Salvador Feb. 12, 2019, 1:45 p.m. UTC | #3
On Tue, Feb 12, 2019 at 09:33:29AM +0100, Michal Hocko wrote:
> >  
> >  		if (PageHuge(page)) {
> >  			struct page *head = compound_head(page);
> > -			pfn = page_to_pfn(head) + (1<<compound_order(head)) - 1;
> >  			if (compound_order(head) > PFN_SECTION_SHIFT) {
> >  				ret = -EBUSY;
> >  				break;
> >  			}
> 
> Why are we doing this, btw? 

I assume you are referring to:

> >                     if (compound_order(head) > PFN_SECTION_SHIFT) {
> >                             ret = -EBUSY;
> >                             break;
> >                     }

I thought it was in case we stumble upon a gigantic page, and commit
(c8721bbbdd36 mm: memory-hotplug: enable memory hotplug to handle hugepage)
confirms it.

But I am not really sure whether the above condition still holds on powerpc.
I wanted to check it, but it is a bit trickier than on x86_64 because of the
different hugetlb sizes.
Could it be that the above condition is false, but the order of that hugetlb
page still goes beyond MAX_ORDER? That is something I have to check.

Anyway, I think that a safer way to check this would be to use hstate_is_gigantic(),
which checks whether the order of the hstate reaches or exceeds MAX_ORDER.
In the end, all we care about is whether we can get pages to migrate to from the
buddy allocator, since gigantic pages need to be allocated by other means.

Actually, alloc_migrate_huge_page() checks for it:

<---
static struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
		int nid, nodemask_t *nmask)
{

	if (hstate_is_gigantic(h))
		return NULL;
--->
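
Just to illustrate the idea, an (untested, hypothetical) sketch of what the
check in do_migrate_range() could look like if it used hstate_is_gigantic()
instead of the section-size comparison:

		if (PageHuge(page)) {
			struct page *head = compound_head(page);

			/* Gigantic pages cannot be allocated from the buddy */
			if (hstate_is_gigantic(page_hstate(head))) {
				ret = -EBUSY;
				break;
			}
			pfn = page_to_pfn(head) + (1 << compound_order(head)) - 1;
			isolate_huge_page(head, &source);
			continue;
		}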

Another thing is that, AFAICS, as long as the memblock we try to offline contains
a gigantic page, it cannot be offlined.
Moreover, the -EBUSY we return in that case is not checked anywhere, although that
is not really an issue because scan_movable_pages() will skip it in the next loop.

Now, this is more rambling than anything:
Maybe I am missing half of the picture, but I have been thinking for a while about
whether we could do better when it comes to gigantic pages vs hotplug.
I think that we could try to migrate those in case any of the other nodes
has a spare pre-allocated gigantic page.
Michal Hocko Feb. 12, 2019, 2:40 p.m. UTC | #4
On Tue 12-02-19 14:45:49, Oscar Salvador wrote:
> On Tue, Feb 12, 2019 at 09:33:29AM +0100, Michal Hocko wrote:
> > >  
> > >  		if (PageHuge(page)) {
> > >  			struct page *head = compound_head(page);
> > > -			pfn = page_to_pfn(head) + (1<<compound_order(head)) - 1;
> > >  			if (compound_order(head) > PFN_SECTION_SHIFT) {
> > >  				ret = -EBUSY;
> > >  				break;
> > >  			}
> > 
> > Why are we doing this, btw? 
> 
> I assume you are referring to:
> 
> > >                     if (compound_order(head) > PFN_SECTION_SHIFT) {
> > >                             ret = -EBUSY;
> > >                             break;
> > >                     }

yes.

> I thought it was in case we stumble upon a gigantic page, and commit
> (c8721bbbdd36 mm: memory-hotplug: enable memory hotplug to handle hugepage)
> confirms it.
> 
> But I am not really sure whether the above condition still holds on powerpc.
> I wanted to check it, but it is a bit trickier than on x86_64 because of the
> different hugetlb sizes.
> Could it be that the above condition is false, but the order of that hugetlb
> page still goes beyond MAX_ORDER? That is something I have to check.

This check doesn't make much sense in principle. Why should we bail out
based on a section size? We are offlining a pfn range. All that we care
about is whether the hugetlb is migrateable.
Mike Kravetz Feb. 13, 2019, 12:13 a.m. UTC | #5
On 2/12/19 6:40 AM, Michal Hocko wrote:
> On Tue 12-02-19 14:45:49, Oscar Salvador wrote:
>> On Tue, Feb 12, 2019 at 09:33:29AM +0100, Michal Hocko wrote:
>>>>  
>>>>  		if (PageHuge(page)) {
>>>>  			struct page *head = compound_head(page);
>>>> -			pfn = page_to_pfn(head) + (1<<compound_order(head)) - 1;
>>>>  			if (compound_order(head) > PFN_SECTION_SHIFT) {
>>>>  				ret = -EBUSY;
>>>>  				break;
>>>>  			}
>>>
>>> Why are we doing this, btw? 
>>
>> I assume you are referring to:
>>
>>>>                     if (compound_order(head) > PFN_SECTION_SHIFT) {
>>>>                             ret = -EBUSY;
>>>>                             break;
>>>>                     }
> 
> yes.
> 
>> I thought it was in case we stumble upon a gigantic page, and commit
>> (c8721bbbdd36 mm: memory-hotplug: enable memory hotplug to handle hugepage)
>> confirms it.
>>
>> But I am not really sure whether the above condition still holds on powerpc.
>> I wanted to check it, but it is a bit trickier than on x86_64 because of the
>> different hugetlb sizes.
>> Could it be that the above condition is false, but the order of that hugetlb
>> page still goes beyond MAX_ORDER? That is something I have to check.

Well, commit 94310cbcaa3c ("mm/madvise: enable (soft|hard) offline of
HugeTLB pages at PGD level") should have allowed migration of gigantic
pages.  I believe it was added for 16GB pages on powerpc.  However, due
to subsequent changes I suspect this no longer works.

> This check doesn't make much sense in principle. Why should we bail out
> based on a section size? We are offlining a pfn range. All that we care
> about is whether the hugetlb is migrateable.

Yes.  Do note that the do_migrate_range is only called from __offline_pages
with a start_pfn that was returned by scan_movable_pages.  scan_movable_pages
has the hugepage_migration_supported check for PageHuge pages.  So, it would
seem to be redundant to do another check in do_migrate_range.
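
For reference, the PageHuge handling in scan_movable_pages() looks roughly
like this (paraphrased, not an exact quote):

		if (PageHuge(page)) {
			if (hugepage_migration_supported(page_hstate(page)))
				return pfn;
			else
				pfn = round_up(pfn + 1,
					1 << compound_order(page)) - 1;
		}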
Oscar Salvador Feb. 13, 2019, 8:13 a.m. UTC | #6
On Tue, Feb 12, 2019 at 04:13:05PM -0800, Mike Kravetz wrote:
> Well, commit 94310cbcaa3c ("mm/madvise: enable (soft|hard) offline of
> HugeTLB pages at PGD level") should have allowed migration of gigantic
> pages.  I believe it was added for 16GB pages on powerpc.  However, due
> to subsequent changes I suspect this no longer works.

I will take a look, I am definitely interested in that.
Thanks for pointing it out Mike.

> 
> > This check doesn't make much sense in principle. Why should we bail out
> > based on a section size? We are offlining a pfn range. All that we care
> > about is whether the hugetlb is migrateable.
> 
> Yes.  Do note that the do_migrate_range is only called from __offline_pages
> with a start_pfn that was returned by scan_movable_pages.  scan_movable_pages
> has the hugepage_migration_supported check for PageHuge pages.  So, it would
> seem to be redundant to do another check in do_migrate_range.

Well, the thing is that if the gigantic page does not start at the very beginning
of the memblock, and we do find migrateable pages before it in scan_movable_pages(),
the range that we will pass to do_migrate_range() will contain the gigantic page.
So we need the check there to cover that case too, although I agree that the current
check is misleading.

I will think about it.
Michal Hocko Feb. 13, 2019, 12:33 p.m. UTC | #7
On Wed 13-02-19 09:13:14, Oscar Salvador wrote:
> On Tue, Feb 12, 2019 at 04:13:05PM -0800, Mike Kravetz wrote:
> > Well, commit 94310cbcaa3c ("mm/madvise: enable (soft|hard) offline of
> > HugeTLB pages at PGD level") should have allowed migration of gigantic
> > pages.  I believe it was added for 16GB pages on powerpc.  However, due
> > to subsequent changes I suspsect this no longer works.
> 
> I will take a look, I am definitely interested in that.
> Thanks for pointing it out Mike.
> 
> > 
> > > This check doesn't make much sense in principle. Why should we bail out
> > > based on a section size? We are offlining a pfn range. All that we care
> > > about is whether the hugetlb is migrateable.
> > 
> > Yes.  Do note that the do_migrate_range is only called from __offline_pages
> > with a start_pfn that was returned by scan_movable_pages.  scan_movable_pages
> > has the hugepage_migration_supported check for PageHuge pages.  So, it would
> > seem to be redundant to do another check in do_migrate_range.
> 
> Well, the thing is that if the gigantic page does not start at the very beginning
> of the memblock, and we do find migrateable pages before it in scan_movable_pages(),
> the range that we will pass to do_migrate_range() will contain the gigantic page.
> So we need the check there to cover that case too, although I agree that the current
> check is misleading.

Why isn't our check in has_unmovable_pages sufficient?
Oscar Salvador Feb. 13, 2019, 3:15 p.m. UTC | #8
On Wed, Feb 13, 2019 at 01:33:39PM +0100, Michal Hocko wrote:
> Why isn't our check in has_unmovable_pages sufficient?

Taking a closer look, it should be enough.
I was mainly confused by the fact that if the zone is ZONE_MOVABLE,
we do not keep checking in has_unmovable_pages():

if (zone_idx(zone) == ZONE_MOVABLE)
	continue;

But I overlooked that htlb_alloc_mask() checks whether the allocation
can end up in a movable zone: it calls hugepage_movable_supported(), and
if the hstate does not support migration at all, we skip __GFP_MOVABLE.
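
I.e., something like this (from memory, so roughly):

static inline gfp_t htlb_alloc_mask(struct hstate *h)
{
	if (hugepage_movable_supported(h))
		return GFP_HIGHUSER_MOVABLE;
	else
		return GFP_HIGHUSER;
}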

So I think that the check in has_unmovable_pages() should be more than enough,
and we could strip the checks regarding hugepage migratability from
do_migrate_range() and scan_movable_pages().

I will run some tests just to make sure this holds and then
I will send a patch.

Thanks

Patch

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 656ff386ac15..d5f7afda67db 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1378,12 +1378,12 @@  do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 
 		if (PageHuge(page)) {
 			struct page *head = compound_head(page);
-			pfn = page_to_pfn(head) + (1<<compound_order(head)) - 1;
 			if (compound_order(head) > PFN_SECTION_SHIFT) {
 				ret = -EBUSY;
 				break;
 			}
-			isolate_huge_page(page, &source);
+			pfn = page_to_pfn(head) + (1<<compound_order(head)) - 1;
+			isolate_huge_page(head, &source);
 			continue;
 		} else if (PageTransHuge(page))
 			pfn = page_to_pfn(compound_head(page))