
mm/hugetlb: wait for hugepage folios to be freed

Message ID 1739514729-21265-1-git-send-email-yangge1116@126.com (mailing list archive)
State New
Series: mm/hugetlb: wait for hugepage folios to be freed

Commit Message

Ge Yang Feb. 14, 2025, 6:32 a.m. UTC
From: Ge Yang <yangge1116@126.com>

Since the introduction of commit b65d4adbc0f0 ("mm: hugetlb: defer freeing
of HugeTLB pages"), which defers the freeing of HugeTLB pages to a
workqueue, the allocation of contiguous memory through cma_alloc() may fail
intermittently.

During CMA allocation, if the CMA area is found to be occupied by in-use
hugepage folios, those folios must be migrated elsewhere. If the free
HugeTLB pool has no folios available while the in-use HugeTLB pages are
being migrated, new folios are allocated from the buddy system and a
temporary state is set on them. Once the migration completes, the temporary
state is transferred from the new folios to the old ones. Normally, when
old folios carrying the temporary state are freed, they are released
directly back to the buddy system. With the deferred freeing of HugeTLB
pages, however, the old folios may not yet have reached the buddy system
when test_pages_isolated() runs, so its PageBuddy() check fails and
cma_alloc() ultimately fails.

Here is a simplified call trace illustrating the process:
cma_alloc()
    ->__alloc_contig_migrate_range() // Migrate in-use hugepage
        ->unmap_and_move_huge_page()
            ->folio_putback_hugetlb() // Free old folios
    ->test_pages_isolated()
        ->__test_page_isolated_in_pageblock()
             ->PageBuddy(page) // Check if the page is in buddy
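
For reference, here is a minimal sketch of the deferral pattern that commit
b65d4adbc0f0 introduced (heavily simplified, with hypothetical names; not the
actual hugetlb code). Freeing a hugetlb folio may require re-allocating its
vmemmap pages and can therefore sleep, so from contexts that cannot sleep the
folio is queued on a lock-free list and freed later from a workqueue. Until
that work has run, the underlying pages are neither hugetlb nor PageBuddy(),
which is exactly the window in which the check above fails:

static LLIST_HEAD(deferred_hpage_list);
static void deferred_hpage_workfn(struct work_struct *work);
static DECLARE_WORK(deferred_hpage_work, deferred_hpage_workfn);

static void deferred_hpage_workfn(struct work_struct *work)
{
	struct llist_node *node = llist_del_all(&deferred_hpage_list);

	while (node) {
		struct llist_node *next = node->next;

		/* hypothetical helper: restore vmemmap, then free to buddy */
		free_one_deferred_hpage(node);
		node = next;
		cond_resched();
	}
}

/* called from a context that cannot sleep */
static void defer_hpage_free(struct llist_node *entry)
{
	if (llist_add(entry, &deferred_hpage_list))
		schedule_work(&deferred_hpage_work);
}

/* wait until every queued folio is back in the buddy allocator */
static void wait_for_deferred_hpage_free(void)
{
	flush_work(&deferred_hpage_work);
}

The flush_free_hpage_work() call used by the patch below plays the role of
wait_for_deferred_hpage_free() in this sketch.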

To resolve this issue, we introduce a function named
wait_for_hugepage_folios_freed(), which waits until migrated hugepage folios
have actually been released back to the buddy system. By invoking
wait_for_hugepage_folios_freed() after the migration completes, we guarantee
that the subsequent test_pages_isolated() check passes.

Fixes: b65d4adbc0f0 ("mm: hugetlb: defer freeing of HugeTLB pages")
Signed-off-by: Ge Yang <yangge1116@126.com>
---
 include/linux/hugetlb.h |  5 +++++
 mm/hugetlb.c            |  7 +++++++
 mm/migrate.c            | 16 ++++++++++++++--
 3 files changed, 26 insertions(+), 2 deletions(-)

Comments

David Hildenbrand Feb. 14, 2025, 8:08 a.m. UTC | #1
On 14.02.25 07:32, yangge1116@126.com wrote:
> [...]
> To resolve this issue, we have implemented a function named
> wait_for_hugepage_folios_freed(). This function ensures that the hugepage
> folios are properly released back to the buddy system after their migration
> is completed. By invoking wait_for_hugepage_folios_freed() following the
> migration process, we guarantee that when test_pages_isolated() is
> executed, it will successfully pass.

Okay, so after every successful migration -> put of src, we wait for the 
src to actually get freed.

When migrating multiple hugetlb folios, we'd wait once per folio.

It reminds me a bit about pcp caches, where folios are !buddy until the 
pcp was drained.

I wonder if that waiting should instead be done exactly once after 
migrating multiple folios? For example, at the beginning of 
test_pages_isolated(), to "flush" that state from any previous migration?

Thanks for all your effort around making CMA allocations / migration 
more reliable.
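
For illustration, a hypothetical sketch of that "flush once" alternative
(helper names assumed; this is not the posted patch):

int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
			int isol_flags)
{
	/*
	 * Hypothetical single flush: any hugetlb folio whose freeing was
	 * deferred by a preceding migration is pushed out to the buddy
	 * allocator before the PageBuddy() scan runs.
	 */
	wait_for_freed_hugetlb_folios();

	/* __test_pages_isolated_body() stands in for the existing
	 * per-pageblock PageBuddy() walk, which stays unchanged. */
	return __test_pages_isolated_body(start_pfn, end_pfn, isol_flags);
}

This would bound the cost to one flush per cma_alloc() attempt, no matter how
many hugetlb folios were migrated.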
Ge Yang Feb. 15, 2025, 5:50 a.m. UTC | #2
On 2025/2/14 16:08, David Hildenbrand wrote:
> On 14.02.25 07:32, yangge1116@126.com wrote:
>> [...]
> 
> Okay, so after every successful migration -> put of src, we wait for the 
> src to actually get freed.
> 
> When migrating multiple hugetlb folios, we'd wait once per folio.
> 
> It reminds me a bit about pcp caches, where folios are !buddy until the 
> pcp was drained.
> 
It seems that we only track unmovable, reclaimable, and movable pages on 
the pcp lists. For specific details, please refer to the 
free_frozen_pages() function.

> I wonder if that waiting should instead be done exactly once after 
> migrating multiple folios? For example, at the beginning of 
> test_pages_isolated(), to "flush" that state from any previous migration?
> 
Yes, this can improve performance. I will make the modification in the 
next version. Thank you.
> Thanks for all your effort around making CMA allocations / migration 
> more reliable.
>
David Hildenbrand Feb. 18, 2025, 8:55 a.m. UTC | #3
On 15.02.25 06:50, Ge Yang wrote:
> 
> 
> On 2025/2/14 16:08, David Hildenbrand wrote:
>> On 14.02.25 07:32, yangge1116@126.com wrote:
>>> [...]
>>
>> Okay, so after every successful migration -> put of src, we wait for the
>> src to actually get freed.
>>
>> When migrating multiple hugetlb folios, we'd wait once per folio.
>>
>> It reminds me a bit about pcp caches, where folios are !buddy until the
>> pcp was drained.
>>
> It seems that we only track unmovable, reclaimable, and movable pages on
> the pcp lists. For specific details, please refer to the
> free_frozen_pages() function.

It reminded me about PCP caches, because we effectively also have to 
wait for some stuck folios to properly get freed to the buddy.
Ge Yang Feb. 18, 2025, 9:22 a.m. UTC | #4
On 2025/2/18 16:55, David Hildenbrand wrote:
> On 15.02.25 06:50, Ge Yang wrote:
>>
>>
>> On 2025/2/14 16:08, David Hildenbrand wrote:
>>> On 14.02.25 07:32, yangge1116@126.com wrote:
>>>> [...]
>>>
>>> Okay, so after every successful migration -> put of src, we wait for the
>>> src to actually get freed.
>>>
>>> When migrating multiple hugetlb folios, we'd wait once per folio.
>>>
>>> It reminds me a bit about pcp caches, where folios are !buddy until the
>>> pcp was drained.
>>>
>> It seems that we only track unmovable, reclaimable, and movable pages on
>> the pcp lists. For specific details, please refer to the
>> free_frozen_pages() function.
> 
> It reminded me about PCP caches, because we effectively also have to 
> wait for some stuck folios to properly get freed to the buddy.
> 
It seems that when an isolated page is freed, it won't be placed back 
into the PCP caches.
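
For reference, a simplified sketch of that routing (hypothetical helper names;
see free_frozen_pages()/free_unref_page() for the real logic):

static void sketch_free_frozen_page(struct page *page, unsigned int order)
{
	int mt = get_pageblock_migratetype(page);

	if (unlikely(mt >= MIGRATE_PCPTYPES)) {
		if (is_migrate_isolate(mt)) {
			/* isolated pageblock: bypass the pcp entirely */
			free_page_to_buddy(page, order);	/* hypothetical */
			return;
		}
		/* e.g. CMA / HIGHATOMIC pages are batched as movable */
		mt = MIGRATE_MOVABLE;
	}

	free_page_to_pcp(page, order, mt);	/* hypothetical */
}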
David Hildenbrand Feb. 18, 2025, 9:41 a.m. UTC | #5
On 18.02.25 10:22, Ge Yang wrote:
> 
> 
> On 2025/2/18 16:55, David Hildenbrand wrote:
>> On 15.02.25 06:50, Ge Yang wrote:
>>>
>>>
>>> On 2025/2/14 16:08, David Hildenbrand wrote:
>>>> On 14.02.25 07:32, yangge1116@126.com wrote:
>>>>> [...]
>>>>
>>>> Okay, so after every successful migration -> put of src, we wait for the
>>>> src to actually get freed.
>>>>
>>>> When migrating multiple hugetlb folios, we'd wait once per folio.
>>>>
>>>> It reminds me a bit about pcp caches, where folios are !buddy until the
>>>> pcp was drained.
>>>>
>>> It seems that we only track unmovable, reclaimable, and movable pages on
>>> the pcp lists. For specific details, please refer to the
>>> free_frozen_pages() function.
>>
>> It reminded me about PCP caches, because we effectively also have to
>> wait for some stuck folios to properly get freed to the buddy.
>>
> It seems that when an isolated page is freed, it won't be placed back
> into the PCP caches.

I recall there are cases when the page was in the pcp before the 
isolation started, which is why we drain the pcp at some point (IIRC).
Ge Yang Feb. 18, 2025, 9:54 a.m. UTC | #6
On 2025/2/18 17:41, David Hildenbrand wrote:
> On 18.02.25 10:22, Ge Yang wrote:
>>
>>
>> On 2025/2/18 16:55, David Hildenbrand wrote:
>>> On 15.02.25 06:50, Ge Yang wrote:
>>>>
>>>>
>>>> On 2025/2/14 16:08, David Hildenbrand wrote:
>>>>> On 14.02.25 07:32, yangge1116@126.com wrote:
>>>>>> [...]
>>>>>
>>>>> Okay, so after every successful migration -> put of src, we wait 
>>>>> for the
>>>>> src to actually get freed.
>>>>>
>>>>> When migrating multiple hugetlb folios, we'd wait once per folio.
>>>>>
>>>>> It reminds me a bit about pcp caches, where folios are !buddy until 
>>>>> the
>>>>> pcp was drained.
>>>>>
>>>> It seems that we only track unmovable, reclaimable, and movable 
>>>> pages on
>>>> the pcp lists. For specific details, please refer to the
>>>> free_frozen_pages() function.
>>>
>>> It reminded me about PCP caches, because we effectively also have to
>>> wait for some stuck folios to properly get freed to the buddy.
>>>
>> It seems that when an isolated page is freed, it won't be placed back
>> into the PCP caches.
> 
> I recall there are cases when the page was in the pcp before the 
> isolation started, which is why we drain the pcp at some point (IIRC).
> 
Yes, indeed, drain_all_pages(cc.zone) is currently executed before 
__alloc_contig_migrate_range().
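
For orientation, a rough sketch of the ordering inside alloc_contig_range()
being discussed here (simplified; details vary by kernel version):

/*
 *   start_isolate_page_range()        isolate the target pageblocks
 *   drain_all_pages(cc.zone)          flush the zone's pcp lists
 *   __alloc_contig_migrate_range()    migrate in-use pages away
 *   test_pages_isolated()             expect the range to be PageBuddy()
 */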

Patch

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 6c6546b..c39e0d5 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -697,6 +697,7 @@  bool hugetlb_bootmem_page_zones_valid(int nid, struct huge_bootmem_page *m);
 
 int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
 int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
+void wait_for_hugepage_folios_freed(struct hstate *h);
 struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 				unsigned long addr, bool cow_from_owner);
 struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
@@ -1092,6 +1093,10 @@  static inline int replace_free_hugepage_folios(unsigned long start_pfn,
 	return 0;
 }
 
+static inline void wait_for_hugepage_folios_freed(struct hstate *h)
+{
+}
+
 static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 					   unsigned long addr,
 					   bool cow_from_owner)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 30bc34d..64cae39 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2955,6 +2955,13 @@  int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
 	return ret;
 }
 
+void wait_for_hugepage_folios_freed(struct hstate *h)
+{
+	WARN_ON(!h);
+
+	flush_free_hpage_work(h);
+}
+
 typedef enum {
 	/*
 	 * For either 0/1: we checked the per-vma resv map, and one resv
diff --git a/mm/migrate.c b/mm/migrate.c
index fb19a18..5dd1851 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1448,6 +1448,7 @@  static int unmap_and_move_huge_page(new_folio_t get_new_folio,
 	int page_was_mapped = 0;
 	struct anon_vma *anon_vma = NULL;
 	struct address_space *mapping = NULL;
+	unsigned long size;
 
 	if (folio_ref_count(src) == 1) {
 		/* page was freed from under us. So we are done. */
@@ -1533,9 +1534,20 @@  static int unmap_and_move_huge_page(new_folio_t get_new_folio,
 out_unlock:
 	folio_unlock(src);
 out:
-	if (rc == MIGRATEPAGE_SUCCESS)
+	if (rc == MIGRATEPAGE_SUCCESS) {
+		size = folio_size(src);
 		folio_putback_hugetlb(src);
-	else if (rc != -EAGAIN)
+
+		/*
+		 * Due to the deferred freeing of HugeTLB folios, the hugepage 'src'
+		 * may not be released back to the buddy system immediately, which can
+		 * cause cma_alloc() to fail. To ensure that the hugepage folios are
+		 * properly released back to the buddy system, invoke
+		 * wait_for_hugepage_folios_freed() here to wait for the release to
+		 * complete.
+		 */
+		wait_for_hugepage_folios_freed(size_to_hstate(size));
+	} else if (rc != -EAGAIN)
 		list_move_tail(&src->lru, ret);
 
 	/*