[0/3] mm/hugetlb: hugepage migration enhancements

Message ID 20220921223639.1152392-1-opendmb@gmail.com (mailing list archive)

Message

Doug Berger Sept. 21, 2022, 10:36 p.m. UTC
This patch set was included as patches [04/21-06/21] in my
previous patch set to introduce Designated Movable Blocks [1].
They were included there for context, but they have value on
their own and are being resubmitted here for consideration on
their own merits.

The alloc_contig_range() function attempts to isolate the free
pages within the range and migrate the data from non-free pages
within the range to allow those pages to become isolated free
pages. If all of the pages in the range are on isolated free
page lists they can be allocated to the caller.
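
For reference, a minimal caller of this interface looks roughly like the
following sketch (illustrative only; start_pfn/end_pfn are page frame
numbers chosen by the caller):

  int ret;

  /* try to take ownership of the physical range [start_pfn, end_pfn) */
  ret = alloc_contig_range(start_pfn, end_pfn, MIGRATE_MOVABLE, GFP_KERNEL);
  if (!ret) {
          /* every page in the range now belongs to the caller */
          ...
          free_contig_range(start_pfn, end_pfn - start_pfn);
  }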

When free hugepages are encountered in the range an attempt is
made to allocate a new compound page to become a replacement
hugepage and to dissolve the free hugepage so that its pages
within isolated pageblocks can be added to the isolated free
page lists. Hugepages that are not free and encountered within
the range must be migrated out of the range and dissolved to
allow the underlying pages to be added to the isolated free
page lists.
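
In current mainline this split is handled by isolate_or_dissolve_huge_page();
condensed and paraphrased (helper names may differ slightly between kernel
versions), the dispatch is roughly:

  if (page_count(head) && !isolate_hugetlb(head, list))
          ret = 0;        /* in use: queued on 'list' for migration */
  else if (!page_count(head))
          ret = alloc_and_dissolve_huge_page(h, head, list);  /* free */

Gigantic pages are rejected earlier in that path with -ENOMEM because of
their own circular dependency on alloc_contig_range().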

Moving the data from hugepages within the range and freeing the
hugepages is not sufficient, since the allocation of migration
target hugepages is allowed to come from free hugepages that may
contain isolated pageblocks, and the freed hugepages will not be on
isolated free page lists, so alloc_contig_range() will fail.

To address these shortcomings the HPG_dissolve hugepage flag is
introduced to tag hugepages that must be dissolved when they are
freed so that their underlying pages can be moved to the page
allocator's free lists. This prevents hugepages that have had
their data migrated to new hugepages from being made available
to subsequent hugepage allocations and allows the isolated free
page test of alloc_contig_range() to succeed.
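
The flag itself would presumably follow the existing hugepage flag pattern
in include/linux/hugetlb.h; a minimal sketch (exact placement and helper
names are up to the actual patches):

  enum hugetlb_page_flags {
          HPG_restore_reserve = 0,
          /* ... existing flags ... */
          HPG_dissolve,           /* dissolve this hugepage when it is freed */
          __NR_HPAGEFLAGS,
  };

  /* provides HPageDissolve(), SetHPageDissolve() and ClearHPageDissolve() */
  HPAGEFLAG(Dissolve, dissolve)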

Dissolved hugepages must be replaced with new hugepages to keep
the hugetlbfs balanced. To support blocking allocation, a new
workqueue is introduced that is analogous to the workqueue
introduced to support the hugetlb vmemmap optimization. This new
workqueue allows the allocation and dissolution of the hugepage
to be offloaded to a separate context from the freeing of the
hugepage. The sync_hugetlb_dissolve() function is introduced to
allow outstanding dissolution of hugepages to complete before
the isolated free page check is made by alloc_contig_range().
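
A rough shape for that deferred work, assuming the same style as the
existing free_hpage_work used for deferred vmemmap restoration (all names
below other than sync_hugetlb_dissolve() are illustrative, not taken from
the patches):

  static void hugetlb_dissolve_workfn(struct work_struct *work)
  {
          /*
           * For each hugepage queued by the free path: allocate a fresh
           * replacement hugepage from buddy so the pool size is preserved,
           * then dissolve the queued page so its base pages land on the
           * page allocator's free lists.
           */
  }
  static DECLARE_WORK(hugetlb_dissolve_work, hugetlb_dissolve_workfn);

  /* flush outstanding dissolves before the isolated free page check */
  void sync_hugetlb_dissolve(void)
  {
          flush_work(&hugetlb_dissolve_work);
  }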

In addition, a check is added to hugepage allocation to prevent
free hugepages within an isolated pageblock range from being
used to satisfy migration target allocations, preventing circular
migrations.
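
One plausible form of that check, assuming it sits in the hugepage
free-list dequeue path and reuses the existing is_migrate_isolate_page()
helper (patch 3 may hook a different spot):

  /* sketch: skip free hugepages that live in isolated pageblocks */
  list_for_each_entry(page, &h->hugepage_freelists[nid], lru) {
          if (PageHWPoison(page))
                  continue;
          if (is_migrate_isolate_page(page))
                  continue;
          /* ... otherwise dequeue this page as the allocation target ... */
  }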

[1] https://lore.kernel.org/linux-mm/20220913195508.3511038-1-opendmb@gmail.com/

Doug Berger (3):
  mm/hugetlb: refactor alloc_and_dissolve_huge_page
  mm/hugetlb: allow migrated hugepage to dissolve when freed
  mm/hugetlb: add hugepage isolation support

 include/linux/hugetlb.h |   5 ++
 mm/hugetlb.c            | 161 +++++++++++++++++++++++++++++++---------
 mm/migrate.c            |   1 +
 mm/page_alloc.c         |   1 +
 4 files changed, 131 insertions(+), 37 deletions(-)


base-commit: b90cb1053190353cc30f0fef0ef1f378ccc063c5

Comments

Mike Kravetz Sept. 22, 2022, 8:25 p.m. UTC | #1
On 09/21/22 15:36, Doug Berger wrote:
> This patch set was included as patches [04/21-06/21] in my
> previous patch set to introduce Designated Movable Blocks [1].
> They were included there for context, but they have value on
> their own and are being resubmitted here for consideration on
> their own merits.
> 
> The alloc_contig_range() function attempts to isolate the free
> pages within the range and migrate the data from non-free pages
> within the range to allow those pages to become isolated free
> pages. If all of the pages in the range are on isolated free
> page lists they can be allocated to the caller.
> 
> When free hugepages are encountered in the range an attempt is
> made to allocate a new compound page to become a replacement
> hugepage and to dissolve the free hugepage so that its pages
> within isolated pageblocks can be added to the isolated free
> page lists. Hugepages that are not free and encountered within
> the range must be migrated out of the range and dissolved to
> allow the underlying pages to be added to the isolated free
> page lists.
> 
> Moving the data from hugepages within the range and freeing the
> hugepages is not sufficient since the allocation of migration
> target hugepages is allowed to come from free hugepages that may
> contain isolated pageblocks and freed hugepages will not be on
> isolated free page lists so the alloc_contig_range() will fail.

Thanks for looking into this!  I am adding Oscar, Michal and David on
Cc: as they have looked at similar issues in the past.

Before jumping into the details of your changes, I just want to make one
observation.  When migrating (or dissolving) hugetlb pages that are in a
range operated on by alloc_contig_range(), we should not change the number
of hugetlb pages available as noted here.  This includes the number of
total huge pages and the number of huge pages on the node.  Therefore,
we should allocate another huge page from buddy either as the migration
target or to take the place of the dissolved page.

For free huge pages, we do this via alloc_and_dissolve_huge_page.  IIUC,
there are no issues with this code path?

As noted above, for pages to be migrated we first try to use an existing
free huge page as the target.  Quite some time ago, Michal added code to
allocate a new page from buddy as the target if no free huge pages were
available.  This change also included a special flag to dissolve the
source huge page when it is freed.  It seems like this is the exact
behavior we want here?  I wonder if it might be easier just to use this
existing code?
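
For reference, the dissolve-on-free behavior being referred to is driven
from free_huge_page(); roughly (paraphrased, not an exact quote):

  if (HPageTemporary(page)) {
          /* a temporary page is never returned to the pool; dissolve it */
          remove_hugetlb_page(h, page, false);
          spin_unlock_irqrestore(&hugetlb_lock, flags);
          update_and_free_page(h, page, true);
  }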
Mike Kravetz Sept. 22, 2022, 10:41 p.m. UTC | #2
On 09/22/22 13:25, Mike Kravetz wrote:
> On 09/21/22 15:36, Doug Berger wrote:
> 
> As noted above, for pages to be migrated we first try to use an existing
> free huge page as the target.  Quite some time ago, Michal added code to
> allocate a new page from buddy as the target if no free huge pages were
> available.  This change also included a special flag to dissolve the
> source huge page when it is freed.  It seems like this is the exact
> behavior we want here?  I wonder if it might be easier just to use this
> existing code?

Totally untested, but I believe the patch below would accomplish this.

From aa8fc11bb67bc9e67e3b6b280fab339afce37759 Mon Sep 17 00:00:00 2001
From: Mike Kravetz <mike.kravetz@oracle.com>
Date: Thu, 22 Sep 2022 15:32:10 -0700
Subject: [PATCH] hugetlb: force alloc_contig_range hugetlb migrations to
 allocate new pages

When migrating hugetlb pages as the result of an alloc_contig_range
operation, allocate a new page from buddy for the migration target.
This guarantees that the number of hugetlb pages is not decreased by
the operation.  In addition, this will result in the special HPageTemporary
flag being set in the source page so that it will be dissolved when
freed.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 include/linux/hugetlb.h |  5 +++--
 mm/hugetlb.c            | 12 ++++++++++--
 mm/internal.h           |  1 +
 mm/migrate.c            |  3 ++-
 mm/page_alloc.c         |  1 +
 5 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index cfe15b32e2d4..558831bf1087 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -702,7 +702,8 @@ int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
 struct page *alloc_huge_page(struct vm_area_struct *vma,
 				unsigned long addr, int avoid_reserve);
 struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
-				nodemask_t *nmask, gfp_t gfp_mask);
+				nodemask_t *nmask, gfp_t gfp_mask,
+				bool temporary);
 struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
 				unsigned long address);
 int hugetlb_add_to_page_cache(struct page *page, struct address_space *mapping,
@@ -1003,7 +1004,7 @@ static inline struct page *alloc_huge_page(struct vm_area_struct *vma,
 
 static inline struct page *
 alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
-			nodemask_t *nmask, gfp_t gfp_mask)
+			nodemask_t *nmask, gfp_t gfp_mask, bool temporary)
 {
 	return NULL;
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8bcaf66defc5..19de8ae79ec8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2308,8 +2308,11 @@ struct page *alloc_buddy_huge_page_with_mpol(struct hstate *h,
 
 /* page migration callback function */
 struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
-		nodemask_t *nmask, gfp_t gfp_mask)
+		nodemask_t *nmask, gfp_t gfp_mask, bool temporary)
 {
+	if (temporary)
+		goto temporary_alloc;
+
 	spin_lock_irq(&hugetlb_lock);
 	if (h->free_huge_pages - h->resv_huge_pages > 0) {
 		struct page *page;
@@ -2322,6 +2325,11 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
 	}
 	spin_unlock_irq(&hugetlb_lock);
 
+temporary_alloc:
+	/*
+	 * Try to allocate a fresh page with special HPageTemporary
+	 * characteristics
+	 */
 	return alloc_migrate_huge_page(h, gfp_mask, preferred_nid, nmask);
 }
 
@@ -2337,7 +2345,7 @@ struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
 
 	gfp_mask = htlb_alloc_mask(h);
 	node = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
-	page = alloc_huge_page_nodemask(h, node, nodemask, gfp_mask);
+	page = alloc_huge_page_nodemask(h, node, nodemask, gfp_mask, false);
 	mpol_cond_put(mpol);
 
 	return page;
diff --git a/mm/internal.h b/mm/internal.h
index b3002e03c28f..3ebf8885e24f 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -800,6 +800,7 @@ struct migration_target_control {
 	int nid;		/* preferred node id */
 	nodemask_t *nmask;
 	gfp_t gfp_mask;
+	bool alloc_contig;	/* alloc_contig_range allocation */
 };
 
 /*
diff --git a/mm/migrate.c b/mm/migrate.c
index c228afba0963..6505ba2070d7 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1610,7 +1610,8 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
 		struct hstate *h = page_hstate(&folio->page);
 
 		gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
-		return alloc_huge_page_nodemask(h, nid, mtc->nmask, gfp_mask);
+		return alloc_huge_page_nodemask(h, nid, mtc->nmask, gfp_mask,
+						mtc->alloc_contig);
 	}
 
 	if (folio_test_large(folio)) {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d7b20bf09c1c..2b8a5a2b51cd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -9166,6 +9166,7 @@ int __alloc_contig_migrate_range(struct compact_control *cc,
 	struct migration_target_control mtc = {
 		.nid = zone_to_nid(cc->zone),
 		.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
+		.alloc_contig = true,
 	};
 
 	lru_cache_disable();
Doug Berger Sept. 22, 2022, 11:14 p.m. UTC | #3
On 9/22/2022 1:25 PM, Mike Kravetz wrote:
> On 09/21/22 15:36, Doug Berger wrote:
>> This patch set was included as patches [04/21-06/21] in my
>> previous patch set to introduce Designated Movable Blocks [1].
>> They were included there for context, but they have value on
>> their own and are being resubmitted here for consideration on
>> their own merits.
>>
>> The alloc_contig_range() function attempts to isolate the free
>> pages within the range and migrate the data from non-free pages
>> within the range to allow those pages to become isolated free
>> pages. If all of the pages in the range are on isolated free
>> page lists they can be allocated to the caller.
>>
>> When free hugepages are encountered in the range an attempt is
>> made to allocate a new compound page to become a replacement
>> hugepage and to dissolve the free hugepage so that its pages
>> within isolated pageblocks can be added to the isolated free
>> page lists. Hugepages that are not free and encountered within
>> the range must be migrated out of the range and dissolved to
>> allow the underlying pages to be added to the isolated free
>> page lists.
>>
>> Moving the data from hugepages within the range and freeing the
>> hugepages is not sufficient since the allocation of migration
>> target hugepages is allowed to come from free hugepages that may
>> contain isolated pageblocks and freed hugepages will not be on
>> isolated free page lists so the alloc_contig_range() will fail.
> 
> Thanks for looking into this!  I am adding Oscar, Michal and David on
> Cc: as they have looked at similar issues in the past.
Thanks for helping to get the right eyeballs looking at this.

> 
> Before jumping into the details of your changes, I just want to make one
> observation.  When migrating (or dissolving) hugetlb pages that are in a
> range operated on by alloc_contig_range(), we should not change the number
> of hugetlb pages available as noted here.  This includes the number of
> total huge pages and the number of huge pages on the node.  Therefore,
> we should allocate another huge page from buddy either as the migration
> target or to take the place of the dissolved page.
> 
> For free huge pages, we do this via alloc_and_dissolve_huge_page.  IIUC,
> there are no issues with this code path?
Yes, I did not observe any issue with the free hugepage handling 
except that your recent improvements with freezing allocated pages 
(thanks for those) will make merging patch 1 more difficult ;).

> 
> As noted above, for pages to be migrated we first try to use an existing
> free huge page as the target.  Quite some time ago, Michal added code to
> allocate a new page from buddy as the target if no free huge pages were
> available.  This change also included a special flag to dissolve the
> source huge page when it is freed.  It seems like this is the exact
> behavior we want here?  I wonder if it might be easier just to use this
> existing code?
Yes, the temporary page flag can be made to work here and I did 
experiment with it, but it is dependent on the current design decisions.
I decided to move to the approach suggested here because I believe it 
could conceivably scale if support for migrating gigantic pages is 
desired in the future. The circular dependency on alloc_contig_range 
will likely keep that from happening, but that was my thinking.

Thanks for taking the time,
-Doug
Doug Berger Sept. 22, 2022, 11:27 p.m. UTC | #4
On 9/22/2022 3:41 PM, Mike Kravetz wrote:
> On 09/22/22 13:25, Mike Kravetz wrote:
>> On 09/21/22 15:36, Doug Berger wrote:
>>
>> As noted above, for pages to be migrated we first try to use an existing
>> free huge page as the target.  Quite some time ago, Michal added code to
>> allocate a new page from buddy as the target if no free huge pages were
>> available.  This change also included a special flag to dissolve the
>> source huge page when it is freed.  It seems like this is the exact
>> behavior we want here?  I wonder if it might be easier just to use this
>> existing code?
> 
> Totally untested, but I believe the patch below would accomplish this.
> 
>  From aa8fc11bb67bc9e67e3b6b280fab339afce37759 Mon Sep 17 00:00:00 2001
> From: Mike Kravetz <mike.kravetz@oracle.com>
> Date: Thu, 22 Sep 2022 15:32:10 -0700
> Subject: [PATCH] hugetlb: force alloc_contig_range hugetlb migrations to
>   allocate new pages
> 
> When migrating hugetlb pages as the result of an alloc_contig_range
> operation, allocate a new page from buddy for the migration target.
> This guarantees that the number of hugetlb pages is not decreased by
> the operation.  In addition, this will result in the special HPageTemporary
> flag being set in the source page so that it will be dissolved when
> freed.
> 
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>   include/linux/hugetlb.h |  5 +++--
>   mm/hugetlb.c            | 12 ++++++++++--
>   mm/internal.h           |  1 +
>   mm/migrate.c            |  3 ++-
>   mm/page_alloc.c         |  1 +
>   5 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index cfe15b32e2d4..558831bf1087 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -702,7 +702,8 @@ int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
>   struct page *alloc_huge_page(struct vm_area_struct *vma,
>   				unsigned long addr, int avoid_reserve);
>   struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
> -				nodemask_t *nmask, gfp_t gfp_mask);
> +				nodemask_t *nmask, gfp_t gfp_mask,
> +				bool temporary);
>   struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
>   				unsigned long address);
>   int hugetlb_add_to_page_cache(struct page *page, struct address_space *mapping,
> @@ -1003,7 +1004,7 @@ static inline struct page *alloc_huge_page(struct vm_area_struct *vma,
>   
>   static inline struct page *
>   alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
> -			nodemask_t *nmask, gfp_t gfp_mask)
> +			nodemask_t *nmask, gfp_t gfp_mask, bool temporary)
>   {
>   	return NULL;
>   }
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 8bcaf66defc5..19de8ae79ec8 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2308,8 +2308,11 @@ struct page *alloc_buddy_huge_page_with_mpol(struct hstate *h,
>   
>   /* page migration callback function */
>   struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
> -		nodemask_t *nmask, gfp_t gfp_mask)
> +		nodemask_t *nmask, gfp_t gfp_mask, bool temporary)
>   {
> +	if (temporary)
> +		goto temporary_alloc;
> +
>   	spin_lock_irq(&hugetlb_lock);
>   	if (h->free_huge_pages - h->resv_huge_pages > 0) {
>   		struct page *page;
> @@ -2322,6 +2325,11 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
>   	}
>   	spin_unlock_irq(&hugetlb_lock);
>   
> +temporary_alloc:
> +	/*
> +	 * Try to allocate a fresh page with special HPageTemporary
> +	 * characteristics
> +	 */
>   	return alloc_migrate_huge_page(h, gfp_mask, preferred_nid, nmask);
>   }
>   
> @@ -2337,7 +2345,7 @@ struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
>   
>   	gfp_mask = htlb_alloc_mask(h);
>   	node = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
> -	page = alloc_huge_page_nodemask(h, node, nodemask, gfp_mask);
> +	page = alloc_huge_page_nodemask(h, node, nodemask, gfp_mask, false);
>   	mpol_cond_put(mpol);
>   
>   	return page;
> diff --git a/mm/internal.h b/mm/internal.h
> index b3002e03c28f..3ebf8885e24f 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -800,6 +800,7 @@ struct migration_target_control {
>   	int nid;		/* preferred node id */
>   	nodemask_t *nmask;
>   	gfp_t gfp_mask;
> +	bool alloc_contig;	/* alloc_contig_range allocation */
>   };
>   
>   /*
> diff --git a/mm/migrate.c b/mm/migrate.c
> index c228afba0963..6505ba2070d7 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1610,7 +1610,8 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
>   		struct hstate *h = page_hstate(&folio->page);
>   
>   		gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
> -		return alloc_huge_page_nodemask(h, nid, mtc->nmask, gfp_mask);
> +		return alloc_huge_page_nodemask(h, nid, mtc->nmask, gfp_mask,
> +						mtc->alloc_contig);
>   	}
>   
>   	if (folio_test_large(folio)) {
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d7b20bf09c1c..2b8a5a2b51cd 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -9166,6 +9166,7 @@ int __alloc_contig_migrate_range(struct compact_control *cc,
>   	struct migration_target_control mtc = {
>   		.nid = zone_to_nid(cc->zone),
>   		.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
> +		.alloc_contig = true,
>   	};
>   
>   	lru_cache_disable();
I believe I exposed alloc_migrate_huge_page() and conditionally invoked 
it from alloc_migration_target() when in alloc_contig, which is roughly 
equivalent. I didn't consider modifying the mtc to pass the information 
so my logic in alloc_migration_target() was a little kludgy.

Like I said, this can be made to work and I'm happy to accept an 
alternative if others agree. I think the isolation test of patch 3 is 
also still desirable.

Thanks again!
-Doug
Mike Kravetz Sept. 23, 2022, 10:51 p.m. UTC | #5
On 09/22/22 16:27, Doug Berger wrote:
> On 9/22/2022 3:41 PM, Mike Kravetz wrote:
> > On 09/22/22 13:25, Mike Kravetz wrote:
> > > On 09/21/22 15:36, Doug Berger wrote:
> > > 
> > > As noted above, for pages to be migrated we first try to use an existing
> > > free huge page as the target.  Quite some time ago, Michal added code to
> > > allocate a new page from buddy as the target if no free huge pages were
> > > available.  This change also included a special flag to dissolve the
> > > source huge page when it is freed.  It seems like this is the exact
> > > behavior we want here?  I wonder if it might be easier just to use this
> > > existing code?
> > 
> > Totally untested, but I believe the patch below would accomplish this.
> > 
> >  From aa8fc11bb67bc9e67e3b6b280fab339afce37759 Mon Sep 17 00:00:00 2001
> > From: Mike Kravetz <mike.kravetz@oracle.com>
> > Date: Thu, 22 Sep 2022 15:32:10 -0700
> > Subject: [PATCH] hugetlb: force alloc_contig_range hugetlb migrations to
> >   allocate new pages
> > 
> > When migrating hugetlb pages as the result of an alloc_contig_range
> > operation, allocate a new page from buddy for the migration target.
> > This guarantees that the number of hugetlb pages is not decreased by
> > the operation.  In addition, this will result in the special HPageTemporary
> > flag being set in the source page so that it will be dissolved when
> > freed.
> > 
<snip>
> I believe I exposed alloc_migrate_huge_page() and conditionally invoked it
> from alloc_migration_target() when in alloc_contig, which is roughly
> equivalent. I didn't consider modifying the mtc to pass the information so
> my logic in alloc_migration_target() was a little kludgy.
> 
> Like I said, this can be made to work and I'm happy to accept an alternative
> if others agree. I think the isolation test of patch 3 is also still
> desirable.

Yes, hoping to get some other opinions as well.

I do agree that patch 3 is still a good idea.