[RFC] mm/page_isolation: Fix an infinite loop in isolate_single_pageblock()

Message ID 20220530115027.123341-1-anshuman.khandual@arm.com (mailing list archive)
State New

Commit Message

Anshuman Khandual May 30, 2022, 11:50 a.m. UTC
HugeTLB allocation (32MB pages with a 4K base page size) via sysfs on an
arm64 platform gets stuck in isolate_single_pageblock() because of an
infinite loop. Because head_pfn always evaluates to the same value, so does
pfn, and the outer loop never exits. Dropping the relevant code block, which
seems redundant, makes the problem go away.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Zi Yan <ziy@nvidia.com>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity")
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
I am not sure about this fix, and did not find much time today to debug
any further. There have been many code changes around this function in
recent days. This problem is present on the latest mainline kernel.

- Anshuman

 mm/page_isolation.c | 4 ----
 1 file changed, 4 deletions(-)

Comments

Zi Yan May 30, 2022, 1:53 p.m. UTC | #1
On 30 May 2022, at 7:50, Anshuman Khandual wrote:

> HugeTLB allocation (32MB pages with a 4K base page size) via sysfs on an
> arm64 platform gets stuck in isolate_single_pageblock() because of an
> infinite loop. Because head_pfn always evaluates to the same value, so does
> pfn, and the outer loop never exits. Dropping the relevant code block, which
> seems redundant, makes the problem go away.

Thanks for the report.

>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity")
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
> I am not sure about this fix, and did not find much time today to debug
> any further. There have been many code changes around this function in
> recent days. This problem is present on the latest mainline kernel.
>
> - Anshuman
>
>  mm/page_isolation.c | 4 ----
>  1 file changed, 4 deletions(-)
>
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 6021f8444b5a..b0922fee75c1 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -389,10 +389,6 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
>  			struct page *head = compound_head(page);
>  			unsigned long head_pfn = page_to_pfn(head);
>
> -			if (head_pfn + nr_pages <= boundary_pfn) {
> -				pfn = head_pfn + nr_pages;
> -				continue;
> -			}
>  #if defined CONFIG_COMPACTION || defined CONFIG_CMA
>  			/*
>  			 * hugetlb, lru compound (THP), and movable compound pages
> -- 
> 2.20.1

Can you try the patch below to see if it fixes the issue? Thanks.

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 6021f8444b5a..d200d41ad0d3 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -385,9 +385,9 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
                 * above do the rest. If migration is not possible, just fail.
                 */
                if (PageCompound(page)) {
-                       unsigned long nr_pages = compound_nr(page);
                        struct page *head = compound_head(page);
                        unsigned long head_pfn = page_to_pfn(head);
+                       unsigned long nr_pages = compound_nr(head);

                        if (head_pfn + nr_pages <= boundary_pfn) {
                                pfn = head_pfn + nr_pages;


--
Best Regards,
Yan, Zi
Anshuman Khandual May 31, 2022, 2:22 a.m. UTC | #2
On 5/30/22 19:23, Zi Yan wrote:
> On 30 May 2022, at 7:50, Anshuman Khandual wrote:
> 
>> HugeTLB allocation (32MB pages with a 4K base page size) via sysfs on an
>> arm64 platform gets stuck in isolate_single_pageblock() because of an
>> infinite loop. Because head_pfn always evaluates to the same value, so does
>> pfn, and the outer loop never exits. Dropping the relevant code block, which
>> seems redundant, makes the problem go away.
> 
> Thanks for the report.
> 
>>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Zi Yan <ziy@nvidia.com>
>> Cc: linux-mm@kvack.org
>> Cc: linux-kernel@vger.kernel.org
>> Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity")
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
>> I am not sure about this fix, and did not find much time today to debug
>> any further. There have been many code changes around this function in
>> recent days. This problem is present on the latest mainline kernel.
>>
>> - Anshuman
>>
>>  mm/page_isolation.c | 4 ----
>>  1 file changed, 4 deletions(-)
>>
>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>> index 6021f8444b5a..b0922fee75c1 100644
>> --- a/mm/page_isolation.c
>> +++ b/mm/page_isolation.c
>> @@ -389,10 +389,6 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
>>  			struct page *head = compound_head(page);
>>  			unsigned long head_pfn = page_to_pfn(head);
>>
>> -			if (head_pfn + nr_pages <= boundary_pfn) {
>> -				pfn = head_pfn + nr_pages;
>> -				continue;
>> -			}
>>  #if defined CONFIG_COMPACTION || defined CONFIG_CMA
>>  			/*
>>  			 * hugetlb, lru compound (THP), and movable compound pages
>> -- 
>> 2.20.1
> 
> Can you try the patch below to see if it fixes the issue? Thanks.
> 
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 6021f8444b5a..d200d41ad0d3 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -385,9 +385,9 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
>                  * above do the rest. If migration is not possible, just fail.
>                  */
>                 if (PageCompound(page)) {
> -                       unsigned long nr_pages = compound_nr(page);
>                         struct page *head = compound_head(page);
>                         unsigned long head_pfn = page_to_pfn(head);
> +                       unsigned long nr_pages = compound_nr(head);
> 
>                         if (head_pfn + nr_pages <= boundary_pfn) {
>                                 pfn = head_pfn + nr_pages;
> 
> 

Yes, this does solve the problem. I guess nr_pages should have been derived
from the compound head itself for it to be meaningful (i.e. > 1). I assume
you will send a fix patch with an appropriate write-up that describes this
problem.

- Anshuman
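
For readers following the thread, here is a minimal userspace sketch of why
taking nr_pages from a tail page can make the loop spin, while taking it
from the head lets it advance. Everything in it (struct toy_page, toy_mem,
toy_make_compound(), toy_compound_nr(), toy_walk()) is hypothetical and only
models the skip logic discussed above. It is not kernel code. It assumes the
walk starts on a tail page and that a page count read from a tail page comes
back as 1, which is what the "> 1" remark above suggests.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page; NOT the kernel's layout. */
struct toy_page {
	unsigned long head_pfn;	/* pfn of the compound head this page belongs to */
	unsigned int order;	/* compound order, recorded on the head page only */
};

#define TOY_NR_PAGES 16UL
static struct toy_page toy_mem[TOY_NR_PAGES];

/* Make every page an order-0 page that is its own head. */
static void toy_reset(void)
{
	for (unsigned long i = 0; i < TOY_NR_PAGES; i++) {
		toy_mem[i].head_pfn = i;
		toy_mem[i].order = 0;
	}
}

/* Build one compound page of 2^order base pages starting at head_pfn. */
static void toy_make_compound(unsigned long head_pfn, unsigned int order)
{
	unsigned long nr = 1UL << order;

	assert(head_pfn + nr <= TOY_NR_PAGES);
	for (unsigned long i = 0; i < nr; i++) {
		toy_mem[head_pfn + i].head_pfn = head_pfn;
		toy_mem[head_pfn + i].order = 0;	/* tails carry no order */
	}
	toy_mem[head_pfn].order = order;
}

/* Page count as seen from the page at pfn: tails report 1 in this model. */
static unsigned long toy_compound_nr(unsigned long pfn)
{
	return 1UL << toy_mem[pfn].order;
}

/*
 * Walk [start_pfn, boundary_pfn), skipping compound pages that end at or
 * before the boundary. nr_from_head selects the proposed fix (true) or the
 * original behaviour (false). Returns the number of iterations used, or -1
 * if max_iters was exceeded, i.e. the walk made no progress.
 */
static long toy_walk(unsigned long start_pfn, unsigned long boundary_pfn,
		     bool nr_from_head, long max_iters)
{
	unsigned long pfn = start_pfn;
	long iters = 0;

	while (pfn < boundary_pfn) {
		unsigned long head_pfn = toy_mem[pfn].head_pfn;
		unsigned long nr_pages =
			toy_compound_nr(nr_from_head ? head_pfn : pfn);

		if (++iters > max_iters)
			return -1;

		if (head_pfn + nr_pages <= boundary_pfn) {
			/* With nr_pages == 1 on a tail, this can move pfn backwards. */
			pfn = head_pfn + nr_pages;
			continue;
		}
		/* Simplification: the real function attempts migration here. */
		pfn++;
	}
	return iters;
}
```

With an order-2 compound page at pfns 4..7, a boundary of 8, and a start at
pfn 6, the tail-derived variant keeps resetting pfn to 5 and hits the
iteration cap, while the head-derived variant jumps straight to pfn 8.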

Patch

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 6021f8444b5a..b0922fee75c1 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -389,10 +389,6 @@  static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
 			struct page *head = compound_head(page);
 			unsigned long head_pfn = page_to_pfn(head);
 
-			if (head_pfn + nr_pages <= boundary_pfn) {
-				pfn = head_pfn + nr_pages;
-				continue;
-			}
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
 			/*
 			 * hugetlb, lru compound (THP), and movable compound pages