Message ID | 1739936804-18199-1-git-send-email-yangge1116@126.com (mailing list archive) |
---|---|
State | New |
Series | [V4] mm/hugetlb: wait for hugetlb folios to be freed |
> On Feb 19, 2025, at 11:46, yangge1116@126.com wrote:
>
> From: Ge Yang <yangge1116@126.com>
>
> Since the introduction of commit c77c0a8ac4c52 ("mm/hugetlb: defer freeing
> of huge pages if in non-task context"), which supports deferring the
> freeing of hugetlb pages, the allocation of contiguous memory through
> cma_alloc() may fail probabilistically.
>
> In the CMA allocation process, if it is found that the CMA area is occupied
> by in-use hugetlb folios, these in-use hugetlb folios need to be migrated
> to another location. When there are no available hugetlb folios in the
> free hugetlb pool during the migration of in-use hugetlb folios, new folios
> are allocated from the buddy system. A temporary state is set on the newly
> allocated folio. Upon completion of the hugetlb folio migration, the
> temporary state is transferred from the new folios to the old folios.
> Normally, when the old folios with the temporary state are freed, they are
> directly released back to the buddy system. However, due to the deferred
> freeing of hugetlb pages, the PageBuddy() check fails, ultimately leading
> to the failure of cma_alloc().
>
> Here is a simplified call trace illustrating the process:
> cma_alloc()
>     ->__alloc_contig_migrate_range() // Migrate in-use hugetlb folios
>         ->unmap_and_move_huge_page()
>             ->folio_putback_hugetlb() // Free old folios
>     ->test_pages_isolated()
>         ->__test_page_isolated_in_pageblock()
>             ->PageBuddy(page) // Check if the page is in buddy
>
> To resolve this issue, we have implemented a function named
> wait_for_freed_hugetlb_folios(). This function ensures that the hugetlb
> folios are properly released back to the buddy system after their migration
> is completed. By invoking wait_for_freed_hugetlb_folios() before calling
> PageBuddy(), we ensure that PageBuddy() will succeed.
>
> Fixes: c77c0a8ac4c52 ("mm/hugetlb: defer freeing of huge pages if in non-task context")
> Signed-off-by: Ge Yang <yangge1116@126.com>
> Cc: <stable@vger.kernel.org>

Reviewed-by: Muchun Song <muchun.song@linux.dev>

Thanks.
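The deferred freeing the commit message describes is an llist/workqueue idiom: when the hugetlb freeing path cannot release a folio directly, it pushes it onto hpage_freelist and schedules free_hpage_work, and the real return to the buddy allocator happens later in workqueue context. The skeleton below is only an illustration of that idiom, using hypothetical example_* names rather than the actual mm/hugetlb.c code; it shows why a caller that must observe the memory as freed has to flush the work item, which is what the new wait_for_freed_hugetlb_folios() helper does.

/*
 * Illustrative skeleton of the deferred-freeing idiom (hypothetical
 * example_* names, not the code in mm/hugetlb.c).
 */
#include <linux/llist.h>
#include <linux/workqueue.h>

static void example_free_workfn(struct work_struct *work);

static LLIST_HEAD(example_freelist);                          /* role of hpage_freelist  */
static DECLARE_WORK(example_free_work, example_free_workfn);  /* role of free_hpage_work */

/* Runs later in workqueue (process) context and performs the real release. */
static void example_free_workfn(struct work_struct *work)
{
	struct llist_node *node = llist_del_all(&example_freelist);

	while (node) {
		struct llist_node *next = node->next;

		/* ... hand the object back to its allocator here ... */
		node = next;
	}
}

/* Called from a context that must not free directly (e.g. softirq). */
static void example_deferred_free(struct llist_node *entry)
{
	/* llist_add() returns true if the list was empty, i.e. no work queued yet. */
	if (llist_add(entry, &example_freelist))
		schedule_work(&example_free_work);
}

/* For callers that need the deferred frees to have completed. */
static void example_wait_for_deferred_frees(void)
{
	if (llist_empty(&example_freelist))
		return;				/* common case: nothing pending */

	flush_work(&example_free_work);		/* wait for the queued release  */
}

The emptiness check is only a fast path, and flush_work() waits for work that has already been queued; that is enough for this bug because, in the call trace quoted above, folio_putback_hugetlb() queues the deferred free before test_pages_isolated() runs its check.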
On 19.02.25 04:46, yangge1116@126.com wrote:
> From: Ge Yang <yangge1116@126.com>
>
> Since the introduction of commit c77c0a8ac4c52 ("mm/hugetlb: defer freeing
> of huge pages if in non-task context"), which supports deferring the
> freeing of hugetlb pages, the allocation of contiguous memory through
> cma_alloc() may fail probabilistically.
>
> In the CMA allocation process, if it is found that the CMA area is occupied
> by in-use hugetlb folios, these in-use hugetlb folios need to be migrated
> to another location. When there are no available hugetlb folios in the
> free hugetlb pool during the migration of in-use hugetlb folios, new folios
> are allocated from the buddy system. A temporary state is set on the newly
> allocated folio. Upon completion of the hugetlb folio migration, the
> temporary state is transferred from the new folios to the old folios.
> Normally, when the old folios with the temporary state are freed, they are
> directly released back to the buddy system. However, due to the deferred
> freeing of hugetlb pages, the PageBuddy() check fails, ultimately leading
> to the failure of cma_alloc().
>
> Here is a simplified call trace illustrating the process:
> cma_alloc()
>     ->__alloc_contig_migrate_range() // Migrate in-use hugetlb folios
>         ->unmap_and_move_huge_page()
>             ->folio_putback_hugetlb() // Free old folios
>     ->test_pages_isolated()
>         ->__test_page_isolated_in_pageblock()
>             ->PageBuddy(page) // Check if the page is in buddy
>
> To resolve this issue, we have implemented a function named
> wait_for_freed_hugetlb_folios(). This function ensures that the hugetlb
> folios are properly released back to the buddy system after their migration
> is completed. By invoking wait_for_freed_hugetlb_folios() before calling
> PageBuddy(), we ensure that PageBuddy() will succeed.
>
> Fixes: c77c0a8ac4c52 ("mm/hugetlb: defer freeing of huge pages if in non-task context")
> Signed-off-by: Ge Yang <yangge1116@126.com>
> Cc: <stable@vger.kernel.org>
> ---
>
> V4:
> - add a check to determine if hpage_freelist is empty suggested by David
>
> V3:
> - adjust code and message suggested by Muchun and David
>
> V2:
> - flush all folios at once suggested by David
>
>  include/linux/hugetlb.h |  5 +++++
>  mm/hugetlb.c            |  8 ++++++++
>  mm/page_isolation.c     | 10 ++++++++++
>  3 files changed, 23 insertions(+)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 6c6546b..0c54b3a 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -697,6 +697,7 @@ bool hugetlb_bootmem_page_zones_valid(int nid, struct huge_bootmem_page *m);
>
>  int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
>  int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
> +void wait_for_freed_hugetlb_folios(void);
>  struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>  				unsigned long addr, bool cow_from_owner);
>  struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
> @@ -1092,6 +1093,10 @@ static inline int replace_free_hugepage_folios(unsigned long start_pfn,
>  	return 0;
>  }
>
> +static inline void wait_for_freed_hugetlb_folios(void)
> +{
> +}
> +
>  static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>  					unsigned long addr,
>  					bool cow_from_owner)
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 30bc34d..8801dbc 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2955,6 +2955,14 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
>  	return ret;
>  }
>
> +void wait_for_freed_hugetlb_folios(void)
> +{
> +	if (llist_empty(&hpage_freelist))
> +		return;
> +
> +	flush_work(&free_hpage_work);
> +}
> +
>  typedef enum {
>  	/*
>  	 * For either 0/1: we checked the per-vma resv map, and one resv
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 8ed53ee0..b2fc526 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -615,6 +615,16 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
>  	int ret;
>
>  	/*
> +	 * Due to the deferred freeing of hugetlb folios, the hugepage folios may
> +	 * not immediately release to the buddy system. This can cause PageBuddy()
> +	 * to fail in __test_page_isolated_in_pageblock(). To ensure that the
> +	 * hugetlb folios are properly released back to the buddy system, we
> +	 * invoke the wait_for_freed_hugetlb_folios() function to wait for the
> +	 * release to complete.
> +	 */
> +	wait_for_freed_hugetlb_folios();
> +
> +	/*
>  	 * Note: pageblock_nr_pages != MAX_PAGE_ORDER. Then, chunks of free
>  	 * pages are not aligned to pageblock_nr_pages.
>  	 * Then we just check migratetype first.

Acked-by: David Hildenbrand <david@redhat.com>
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 6c6546b..0c54b3a 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -697,6 +697,7 @@ bool hugetlb_bootmem_page_zones_valid(int nid, struct huge_bootmem_page *m);
 
 int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
 int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
+void wait_for_freed_hugetlb_folios(void);
 struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 				unsigned long addr, bool cow_from_owner);
 struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
@@ -1092,6 +1093,10 @@ static inline int replace_free_hugepage_folios(unsigned long start_pfn,
 	return 0;
 }
 
+static inline void wait_for_freed_hugetlb_folios(void)
+{
+}
+
 static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 					unsigned long addr,
 					bool cow_from_owner)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 30bc34d..8801dbc 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2955,6 +2955,14 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
 	return ret;
 }
 
+void wait_for_freed_hugetlb_folios(void)
+{
+	if (llist_empty(&hpage_freelist))
+		return;
+
+	flush_work(&free_hpage_work);
+}
+
 typedef enum {
 	/*
 	 * For either 0/1: we checked the per-vma resv map, and one resv
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 8ed53ee0..b2fc526 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -615,6 +615,16 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
 	int ret;
 
 	/*
+	 * Due to the deferred freeing of hugetlb folios, the hugepage folios may
+	 * not immediately release to the buddy system. This can cause PageBuddy()
+	 * to fail in __test_page_isolated_in_pageblock(). To ensure that the
+	 * hugetlb folios are properly released back to the buddy system, we
+	 * invoke the wait_for_freed_hugetlb_folios() function to wait for the
+	 * release to complete.
+	 */
+	wait_for_freed_hugetlb_folios();
+
+	/*
 	 * Note: pageblock_nr_pages != MAX_PAGE_ORDER. Then, chunks of free
 	 * pages are not aligned to pageblock_nr_pages.
 	 * Then we just check migratetype first.
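To make the failure mode concrete, here is a self-contained userspace analogy (plain C with pthreads; deferred_free(), flush_deferred() and the other names are made up for illustration, nothing here is a kernel API): "freeing" only queues the object for a worker thread, so a check performed immediately afterwards, like the PageBuddy() test in __test_page_isolated_in_pageblock(), can still see the old state, while waiting for the deferred work first, as the patch does via wait_for_freed_hugetlb_folios(), makes the check reliable.

/* deferred_free_demo.c - build with: cc -pthread deferred_free_demo.c
 * Userspace analogy only; none of these names exist in the kernel. */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t kick = PTHREAD_COND_INITIALIZER;  /* wakes the worker */
static pthread_cond_t done = PTHREAD_COND_INITIALIZER;  /* wakes flushers   */
static int pending;       /* objects queued for deferred freeing            */
static int in_allocator;  /* objects actually returned to the "allocator"   */

/* "Free" an object: only queue it and wake the worker (the deferral). */
static void deferred_free(void)
{
	pthread_mutex_lock(&lock);
	pending++;
	pthread_cond_signal(&kick);
	pthread_mutex_unlock(&lock);
}

/* Wait until everything queued so far has really been released,
 * playing the role of wait_for_freed_hugetlb_folios(). */
static void flush_deferred(void)
{
	pthread_mutex_lock(&lock);
	while (pending)
		pthread_cond_wait(&done, &lock);
	pthread_mutex_unlock(&lock);
}

/* Playing the role of the PageBuddy() check. */
static bool check_released(int expected)
{
	pthread_mutex_lock(&lock);
	bool ok = (in_allocator == expected);
	pthread_mutex_unlock(&lock);
	return ok;
}

/* Worker thread doing the real release, like the deferred work item. */
static void *worker(void *arg)
{
	(void)arg;
	for (;;) {
		pthread_mutex_lock(&lock);
		while (!pending)
			pthread_cond_wait(&kick, &lock);
		pthread_mutex_unlock(&lock);

		usleep(10000);            /* the deferral window that bites cma_alloc() */

		pthread_mutex_lock(&lock);
		in_allocator += pending;  /* the real release happens here */
		pending = 0;
		pthread_cond_broadcast(&done);
		pthread_mutex_unlock(&lock);
	}
	return NULL;
}

int main(void)
{
	pthread_t tid;

	pthread_create(&tid, NULL, worker, NULL);

	deferred_free();
	printf("check right after free: %s\n",
	       check_released(1) ? "released" : "NOT released (the bug)");

	flush_deferred();  /* the fix: wait for the deferred release first */
	printf("check after flush:      %s\n",
	       check_released(1) ? "released" : "NOT released");
	return 0;
}

Run as written, the first check will typically report the object as not yet released (the deferred work has not run), while the second check, taken after the flush, sees it back in the "allocator"; the outcome of the first check is inherently timing-dependent, which mirrors why cma_alloc() only failed probabilistically.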