Message ID | 20220530115027.123341-1-anshuman.khandual@arm.com |
---|---|
State | New |
Series | [RFC] mm/page_isolation: Fix an infinite loop in isolate_single_pageblock() |
On 30 May 2022, at 7:50, Anshuman Khandual wrote:

> HugeTLB allocation (32MB pages on 4K base pages) via sysfs on an arm64 platform
> is getting stuck in isolate_single_pageblock() because of an infinite loop.
> Because head_pfn always evaluates to the same value, so does pfn, and the outer
> loop never exits. Dropping the relevant code block, which seems redundant, makes
> the problem go away.

Thanks for the report.

Can you try the patch below to see if it fixes the issue? Thanks.

```diff
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 6021f8444b5a..d200d41ad0d3 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -385,9 +385,9 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
 		 * above do the rest. If migration is not possible, just fail.
 		 */
 		if (PageCompound(page)) {
-			unsigned long nr_pages = compound_nr(page);
 			struct page *head = compound_head(page);
 			unsigned long head_pfn = page_to_pfn(head);
+			unsigned long nr_pages = compound_nr(head);
 
 			if (head_pfn + nr_pages <= boundary_pfn) {
 				pfn = head_pfn + nr_pages;
```

--
Best Regards,
Yan, Zi
On 5/30/22 19:23, Zi Yan wrote:
> Can you try the patch below to see if it fixes the issue? Thanks.
>
> ```diff
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 6021f8444b5a..d200d41ad0d3 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -385,9 +385,9 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
>  		 * above do the rest. If migration is not possible, just fail.
>  		 */
>  		if (PageCompound(page)) {
> -			unsigned long nr_pages = compound_nr(page);
>  			struct page *head = compound_head(page);
>  			unsigned long head_pfn = page_to_pfn(head);
> +			unsigned long nr_pages = compound_nr(head);
>  
>  			if (head_pfn + nr_pages <= boundary_pfn) {
>  				pfn = head_pfn + nr_pages;
> ```

Yes, this does solve the problem. I guess nr_pages should have been derived from the compound head itself for it to be meaningful (i.e. > 1). I assume you will send a fix patch with an appropriate write-up that describes this problem.

- Anshuman
```diff
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 6021f8444b5a..b0922fee75c1 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -389,10 +389,6 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
 			struct page *head = compound_head(page);
 			unsigned long head_pfn = page_to_pfn(head);
 
-			if (head_pfn + nr_pages <= boundary_pfn) {
-				pfn = head_pfn + nr_pages;
-				continue;
-			}
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
 			/*
 			 * hugetlb, lru compound (THP), and movable compound pages
```
HugeTLB allocation (32MB pages on 4K base pages) via sysfs on an arm64 platform is getting stuck in isolate_single_pageblock() because of an infinite loop. Because head_pfn always evaluates to the same value, so does pfn, and the outer loop never exits. Dropping the relevant code block, which seems redundant, makes the problem go away.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Zi Yan <ziy@nvidia.com>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity")
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
I am not sure about this fix, and I did not find much time today to debug any further. There have been many code changes around this function recently. This problem is present on the latest mainline kernel.

- Anshuman

 mm/page_isolation.c | 4 ----
 1 file changed, 4 deletions(-)