Message ID | 20220921223639.1152392-1-opendmb@gmail.com (mailing list archive) |
---|---|
Series | mm/hugetlb: hugepage migration enhancements |
On 09/21/22 15:36, Doug Berger wrote:
> This patch set was included as patches [04/21-06/21] in my
> previous patch set to introduce Designated Movable Blocks [1].
> They were included there for context, but they have value on
> their own and are being resubmitted here for consideration on
> their own merits.
>
> The alloc_contig_range() function attempts to isolate the free
> pages within the range and migrate the data from non-free pages
> within the range to allow those pages to become isolated free
> pages. If all of the pages in the range are on isolated free
> page lists they can be allocated to the caller.
>
> When free hugepages are encountered in the range an attempt is
> made to allocate a new compound page to become a replacement
> hugepage and to dissolve the free hugepage so that its pages
> within isolated pageblocks can be added to the isolated free
> page lists. Hugepages that are not free and encountered within
> the range must be migrated out of the range and dissolved to
> allow the underlying pages to be added to the isolated free
> page lists.
>
> Moving the data from hugepages within the range and freeing the
> hugepages is not sufficient since the allocation of migration
> target hugepages is allowed to come from free hugepages that may
> contain isolated pageblocks and freed hugepages will not be on
> isolated free page lists so the alloc_contig_range() will fail.

Thanks for looking into this! I am adding Oscar, Michal and David on
Cc: as they have looked at similar issues in the past.

Before jumping into the details of your changes, I just want to make one
observation. When migrating (or dissolving) hugetlb pages that are in a
range operated on by alloc_contig_range(), we should not change the number
of hugetlb pages available as noted here. This includes the number of
total huge pages and the number of huge pages on the node. Therefore,
we should allocate another huge page from buddy either as the migration
target or to take the place of the dissolved page.

For free huge pages, we do this via alloc_and_dissolve_huge_page. IIUC,
there are no issues with this code path?

As noted above, for pages to be migrated we first try to use an existing
free huge page as the target. Quite some time ago, Michal added code to
allocate a new page from buddy as the target if no free huge pages were
available. This change also included a special flag to dissolve the
source huge page when it is freed. It seems like this is the exact
behavior we want here? I wonder if it might be easier just to use this
existing code?

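(For context, a much simplified sketch of the existing handling of a hugetlb
page found inside an alloc_contig_range() request, based on the
isolate_or_dissolve_huge_page()/alloc_and_dissolve_huge_page() path described
above. This is not verbatim mainline code; the function name and the
isolation helper below are illustrative placeholders only:)

/* Illustrative sketch only; not the exact upstream implementation. */
static int contig_range_handle_hugetlb_sketch(struct page *head,
					      struct list_head *list)
{
	struct hstate *h = page_hstate(head);

	if (hstate_is_gigantic(h))
		return -ENOMEM;		/* gigantic pages are not handled */

	if (!page_count(head))
		/*
		 * Free huge page: allocate a fresh huge page from buddy to
		 * take its place in the pool, then dissolve the old one so
		 * its pageblocks can join the isolated free lists.  Total
		 * and per-node huge page counts stay the same.
		 */
		return alloc_and_dissolve_huge_page(h, head, list);

	/*
	 * In-use huge page: isolate it so migrate_pages() will move its
	 * data; how the migration target gets allocated is the question
	 * discussed in this thread.
	 */
	return isolate_hugetlb_for_migration(head, list); /* placeholder helper */
}
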
On 09/22/22 13:25, Mike Kravetz wrote:
> On 09/21/22 15:36, Doug Berger wrote:
>
> As noted above, for pages to be migrated we first try to use an existing
> free huge page as the target. Quite some time ago, Michal added code to
> allocate a new page from buddy as the target if no free huge pages were
> available. This change also included a special flag to dissolve the
> source huge page when it is freed. It seems like this is the exact
> behavior we want here? I wonder if it might be easier just to use this
> existing code?

Totally untested, but I believe the patch below would accomplish this.

From aa8fc11bb67bc9e67e3b6b280fab339afce37759 Mon Sep 17 00:00:00 2001
From: Mike Kravetz <mike.kravetz@oracle.com>
Date: Thu, 22 Sep 2022 15:32:10 -0700
Subject: [PATCH] hugetlb: force alloc_contig_range hugetlb migrations to
 allocate new pages

When migrating hugetlb pages as the result of an alloc_contig_range
operation, allocate a new page from buddy for the migration target.
This guarantees that the number of hugetlb pages is not decreased by
the operation. In addition, this will result in the special HPageTemporary
flag being set in the source page so that it will be dissolved when
freed.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 include/linux/hugetlb.h |  5 +++--
 mm/hugetlb.c            | 12 ++++++++++--
 mm/internal.h           |  1 +
 mm/migrate.c            |  3 ++-
 mm/page_alloc.c         |  1 +
 5 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index cfe15b32e2d4..558831bf1087 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -702,7 +702,8 @@ int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
 struct page *alloc_huge_page(struct vm_area_struct *vma,
 				unsigned long addr, int avoid_reserve);
 struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
-				nodemask_t *nmask, gfp_t gfp_mask);
+				nodemask_t *nmask, gfp_t gfp_mask,
+				bool temporary);
 struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
 				unsigned long address);
 int hugetlb_add_to_page_cache(struct page *page, struct address_space *mapping,
@@ -1003,7 +1004,7 @@ static inline struct page *alloc_huge_page(struct vm_area_struct *vma,
 
 static inline struct page *
 alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
-			nodemask_t *nmask, gfp_t gfp_mask)
+			nodemask_t *nmask, gfp_t gfp_mask, bool temporary)
 {
 	return NULL;
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8bcaf66defc5..19de8ae79ec8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2308,8 +2308,11 @@ struct page *alloc_buddy_huge_page_with_mpol(struct hstate *h,
 
 /* page migration callback function */
 struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
-				nodemask_t *nmask, gfp_t gfp_mask)
+				nodemask_t *nmask, gfp_t gfp_mask, bool temporary)
 {
+	if (temporary)
+		goto temporary_alloc;
+
 	spin_lock_irq(&hugetlb_lock);
 	if (h->free_huge_pages - h->resv_huge_pages > 0) {
 		struct page *page;
@@ -2322,6 +2325,11 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
 	}
 	spin_unlock_irq(&hugetlb_lock);
 
+temporary_alloc:
+	/*
+	 * Try to allocate a fresh page that with special HPageTemporary
+	 * characteristics
+	 */
 	return alloc_migrate_huge_page(h, gfp_mask, preferred_nid, nmask);
 }
 
@@ -2337,7 +2345,7 @@ struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
 
 	gfp_mask = htlb_alloc_mask(h);
 	node = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
-	page = alloc_huge_page_nodemask(h, node, nodemask, gfp_mask);
+	page = alloc_huge_page_nodemask(h, node, nodemask, gfp_mask, false);
 	mpol_cond_put(mpol);
 
 	return page;
diff --git a/mm/internal.h b/mm/internal.h
index b3002e03c28f..3ebf8885e24f 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -800,6 +800,7 @@ struct migration_target_control {
 	int nid;		/* preferred node id */
 	nodemask_t *nmask;
 	gfp_t gfp_mask;
+	bool alloc_contig;	/* alloc_contig_range allocation */
 };
 
 /*
diff --git a/mm/migrate.c b/mm/migrate.c
index c228afba0963..6505ba2070d7 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1610,7 +1610,8 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
 		struct hstate *h = page_hstate(&folio->page);
 
 		gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
-		return alloc_huge_page_nodemask(h, nid, mtc->nmask, gfp_mask);
+		return alloc_huge_page_nodemask(h, nid, mtc->nmask, gfp_mask,
+						mtc->alloc_contig);
 	}
 
 	if (folio_test_large(folio)) {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d7b20bf09c1c..2b8a5a2b51cd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -9166,6 +9166,7 @@ int __alloc_contig_migrate_range(struct compact_control *cc,
 	struct migration_target_control mtc = {
 		.nid = zone_to_nid(cc->zone),
 		.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
+		.alloc_contig = true,
 	};
 
 	lru_cache_disable();

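(For reference, the mtc above reaches the hugetlb allocator through the
'private' argument of migrate_pages(); the consumer in
__alloc_contig_migrate_range() looks roughly like the sketch below,
paraphrased from mainline around this time, so the exact arguments may
differ. Callers that zero-initialize their mtc keep alloc_contig == false
and therefore the old behavior of preferring an existing free huge page:)

	/*
	 * Sketch of how the control structure is consumed: mtc.alloc_contig
	 * rides along in 'private' to alloc_migration_target(), which would
	 * then request a fresh HPageTemporary page instead of dequeueing an
	 * existing free huge page.
	 */
	ret = migrate_pages(&cc->migratepages, alloc_migration_target,
			NULL, (unsigned long)&mtc, cc->mode, MR_CONTIG_RANGE,
			NULL);
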
On 9/22/2022 1:25 PM, Mike Kravetz wrote:
> On 09/21/22 15:36, Doug Berger wrote:
>> This patch set was included as patches [04/21-06/21] in my
>> previous patch set to introduce Designated Movable Blocks [1].
>> They were included there for context, but they have value on
>> their own and are being resubmitted here for consideration on
>> their own merits.
>>
>> The alloc_contig_range() function attempts to isolate the free
>> pages within the range and migrate the data from non-free pages
>> within the range to allow those pages to become isolated free
>> pages. If all of the pages in the range are on isolated free
>> page lists they can be allocated to the caller.
>>
>> When free hugepages are encountered in the range an attempt is
>> made to allocate a new compound page to become a replacement
>> hugepage and to dissolve the free hugepage so that its pages
>> within isolated pageblocks can be added to the isolated free
>> page lists. Hugepages that are not free and encountered within
>> the range must be migrated out of the range and dissolved to
>> allow the underlying pages to be added to the isolated free
>> page lists.
>>
>> Moving the data from hugepages within the range and freeing the
>> hugepages is not sufficient since the allocation of migration
>> target hugepages is allowed to come from free hugepages that may
>> contain isolated pageblocks and freed hugepages will not be on
>> isolated free page lists so the alloc_contig_range() will fail.
>
> Thanks for looking into this! I am adding Oscar, Michal and David on
> Cc: as they have looked at similar issues in the past.
Thanks for helping to get the right eyeballs looking at this.
>
> Before jumping into the details of your changes, I just want to make one
> observation. When migrating (or dissolving) hugetlb pages that are in a
> range operated on by alloc_contig_range(), we should not change the number
> of hugetlb pages available as noted here. This includes the number of
> total huge pages and the number of huge pages on the node. Therefore,
> we should allocate another huge page from buddy either as the migration
> target or to take the place of the dissolved page.
>
> For free huge pages, we do this via alloc_and_dissolve_huge_page. IIUC,
> there are no issues with this code path?
Yes, I did not observe any issue with the free hugepage handling except
that your recent improvements with freezing allocated pages (thanks for
those) will make merging patch 1 more difficult ;).
>
> As noted above, for pages to be migrated we first try to use an existing
> free huge page as the target. Quite some time ago, Michal added code to
> allocate a new page from buddy as the target if no free huge pages were
> available. This change also included a special flag to dissolve the
> source huge page when it is freed. It seems like this is the exact
> behavior we want here? I wonder if it might be easier just to use this
> existing code?
Yes, the temporary page flag can be made to work here and I did
experiment with it, but it is dependent on the current design decisions.
I decided to move to the approach suggested here because I believe it
could conceivably scale if support for migrating gigantic pages is
desired in the future. The circular dependency on alloc_contig_range
will likely keep that from happening, but that was my thinking.

Thanks for taking the time,
-Doug

On 9/22/2022 3:41 PM, Mike Kravetz wrote:
> On 09/22/22 13:25, Mike Kravetz wrote:
>> On 09/21/22 15:36, Doug Berger wrote:
>>
>> As noted above, for pages to be migrated we first try to use an existing
>> free huge page as the target. Quite some time ago, Michal added code to
>> allocate a new page from buddy as the target if no free huge pages were
>> available. This change also included a special flag to dissolve the
>> source huge page when it is freed. It seems like this is the exact
>> behavior we want here? I wonder if it might be easier just to use this
>> existing code?
>
> Totally untested, but I believe the patch below would accomplish this.
>
> From aa8fc11bb67bc9e67e3b6b280fab339afce37759 Mon Sep 17 00:00:00 2001
> From: Mike Kravetz <mike.kravetz@oracle.com>
> Date: Thu, 22 Sep 2022 15:32:10 -0700
> Subject: [PATCH] hugetlb: force alloc_contig_range hugetlb migrations to
>  allocate new pages
>
> When migrating hugetlb pages as the result of an alloc_contig_range
> operation, allocate a new page from buddy for the migration target.
> This guarantees that the number of hugetlb pages is not decreased by
> the operation. In addition, this will result in the special HPageTemporary
> flag being set in the source page so that it will be dissolved when
> freed.
>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  include/linux/hugetlb.h |  5 +++--
>  mm/hugetlb.c            | 12 ++++++++++--
>  mm/internal.h           |  1 +
>  mm/migrate.c            |  3 ++-
>  mm/page_alloc.c         |  1 +
>  5 files changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index cfe15b32e2d4..558831bf1087 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -702,7 +702,8 @@ int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
>  struct page *alloc_huge_page(struct vm_area_struct *vma,
>  				unsigned long addr, int avoid_reserve);
>  struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
> -				nodemask_t *nmask, gfp_t gfp_mask);
> +				nodemask_t *nmask, gfp_t gfp_mask,
> +				bool temporary);
>  struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
>  				unsigned long address);
>  int hugetlb_add_to_page_cache(struct page *page, struct address_space *mapping,
> @@ -1003,7 +1004,7 @@ static inline struct page *alloc_huge_page(struct vm_area_struct *vma,
>
>  static inline struct page *
>  alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
> -			nodemask_t *nmask, gfp_t gfp_mask)
> +			nodemask_t *nmask, gfp_t gfp_mask, bool temporary)
>  {
>  	return NULL;
>  }
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 8bcaf66defc5..19de8ae79ec8 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2308,8 +2308,11 @@ struct page *alloc_buddy_huge_page_with_mpol(struct hstate *h,
>
>  /* page migration callback function */
>  struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
> -				nodemask_t *nmask, gfp_t gfp_mask)
> +				nodemask_t *nmask, gfp_t gfp_mask, bool temporary)
>  {
> +	if (temporary)
> +		goto temporary_alloc;
> +
>  	spin_lock_irq(&hugetlb_lock);
>  	if (h->free_huge_pages - h->resv_huge_pages > 0) {
>  		struct page *page;
> @@ -2322,6 +2325,11 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
>  	}
>  	spin_unlock_irq(&hugetlb_lock);
>
> +temporary_alloc:
> +	/*
> +	 * Try to allocate a fresh page that with special HPageTemporary
> +	 * characteristics
> +	 */
>  	return alloc_migrate_huge_page(h, gfp_mask, preferred_nid, nmask);
>  }
>
> @@ -2337,7 +2345,7 @@ struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
>
>  	gfp_mask = htlb_alloc_mask(h);
>  	node = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
> -	page = alloc_huge_page_nodemask(h, node, nodemask, gfp_mask);
> +	page = alloc_huge_page_nodemask(h, node, nodemask, gfp_mask, false);
>  	mpol_cond_put(mpol);
>
>  	return page;
> diff --git a/mm/internal.h b/mm/internal.h
> index b3002e03c28f..3ebf8885e24f 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -800,6 +800,7 @@ struct migration_target_control {
>  	int nid;		/* preferred node id */
>  	nodemask_t *nmask;
>  	gfp_t gfp_mask;
> +	bool alloc_contig;	/* alloc_contig_range allocation */
>  };
>
>  /*
> diff --git a/mm/migrate.c b/mm/migrate.c
> index c228afba0963..6505ba2070d7 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1610,7 +1610,8 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
>  		struct hstate *h = page_hstate(&folio->page);
>
>  		gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
> -		return alloc_huge_page_nodemask(h, nid, mtc->nmask, gfp_mask);
> +		return alloc_huge_page_nodemask(h, nid, mtc->nmask, gfp_mask,
> +						mtc->alloc_contig);
>  	}
>
>  	if (folio_test_large(folio)) {
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d7b20bf09c1c..2b8a5a2b51cd 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -9166,6 +9166,7 @@ int __alloc_contig_migrate_range(struct compact_control *cc,
>  	struct migration_target_control mtc = {
>  		.nid = zone_to_nid(cc->zone),
>  		.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
> +		.alloc_contig = true,
>  	};
>
>  	lru_cache_disable();

I believe I exposed alloc_migrate_huge_page() and conditionally invoked it
from alloc_migration_target() when in alloc_contig, which is roughly
equivalent. I didn't consider modifying the mtc to pass the information so
my logic in alloc_migration_target() was a little kludgy.

Like I said, this can be made to work and I'm happy to accept an alternative
if others agree. I think the isolation test of patch 3 is also still
desirable.

Thanks again!
-Doug

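(The conditional call described above might look something like this inside
the hugetlb branch of alloc_migration_target() -- a hypothetical sketch of
that alternative, assuming alloc_migrate_huge_page() is made non-static and
that the mtc carries an alloc_contig flag as in Mike's patch; this is not the
code actually posted in the series:)

	if (folio_test_hugetlb(folio)) {
		struct hstate *h = page_hstate(&folio->page);

		gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
		if (mtc->alloc_contig)
			/*
			 * Bypass the free huge page pool entirely: a fresh
			 * HPageTemporary page from buddy becomes the target,
			 * so the source huge page is dissolved when freed
			 * and the pool counts are preserved.
			 */
			return alloc_migrate_huge_page(h, gfp_mask, nid,
						       mtc->nmask);

		return alloc_huge_page_nodemask(h, nid, mtc->nmask, gfp_mask);
	}
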
On 09/22/22 16:27, Doug Berger wrote:
> On 9/22/2022 3:41 PM, Mike Kravetz wrote:
> > On 09/22/22 13:25, Mike Kravetz wrote:
> > > On 09/21/22 15:36, Doug Berger wrote:
> > >
> > > As noted above, for pages to be migrated we first try to use an existing
> > > free huge page as the target. Quite some time ago, Michal added code to
> > > allocate a new page from buddy as the target if no free huge pages were
> > > available. This change also included a special flag to dissolve the
> > > source huge page when it is freed. It seems like this is the exact
> > > behavior we want here? I wonder if it might be easier just to use this
> > > existing code?
> >
> > Totally untested, but I believe the patch below would accomplish this.
> >
> > From aa8fc11bb67bc9e67e3b6b280fab339afce37759 Mon Sep 17 00:00:00 2001
> > From: Mike Kravetz <mike.kravetz@oracle.com>
> > Date: Thu, 22 Sep 2022 15:32:10 -0700
> > Subject: [PATCH] hugetlb: force alloc_contig_range hugetlb migrations to
> >  allocate new pages
> >
> > When migrating hugetlb pages as the result of an alloc_contig_range
> > operation, allocate a new page from buddy for the migration target.
> > This guarantees that the number of hugetlb pages is not decreased by
> > the operation. In addition, this will result in the special HPageTemporary
> > flag being set in the source page so that it will be dissolved when
> > freed.
> >
<snip>
> I believe I exposed alloc_migrate_huge_page() and conditionally invoked it
> from alloc_migration_target() when in alloc_contig, which is roughly
> equivalent. I didn't consider modifying the mtc to pass the information so
> my logic in alloc_migration_target() was a little kludgy.
>
> Like I said, this can be made to work and I'm happy to accept an alternative
> if others agree. I think the isolation test of patch 3 is also still
> desirable.

Yes, hoping to get some other opinions as well. I do agree that patch 3
is still a good idea.