mm: fix a potential infinite loop in start_isolate_page_range().

Message ID 20220524194756.1698351-1-zi.yan@sent.com (mailing list archive)
State New
Series mm: fix a potential infinite loop in start_isolate_page_range().

Commit Message

Zi Yan May 24, 2022, 7:47 p.m. UTC
From: Zi Yan <ziy@nvidia.com>

In isolate_single_pageblock() called by start_isolate_page_range(),
there are some pageblock isolation issues causing a potential
infinite loop when isolating a page range. This was reported by Qian Cai.

1. The pageblock was isolated by just changing the pageblock migratetype
   without checking for unmovable pages. Call set_migratetype_isolate()
   to isolate the pageblock properly.
2. An off-by-one error caused pages to be migrated unnecessarily when
   the page does not actually cross the pageblock boundary.
3. Migrating a compound page across a pageblock boundary and then
   splitting the free page afterwards leaves a small race window in
   which the free page can be allocated again, so the code retries,
   causing a potential infinite loop. Temporarily set the to-be-migrated
   page's pageblock to MIGRATE_ISOLATE to prevent that, and bail out
   early if no free page is found after page migration.

An additional fix to split_free_page() avoids a crash in
__free_one_page(). When the free page is split at the specified
split_pfn_offset, free_page_order should be derived from both the
lowest set bit of free_page_pfn and the highest set bit of
split_pfn_offset, taking the smaller of the two. For example, if
free_page_pfn=0x10000 and split_pfn_offset=0xc000, the split should
free a chunk of 0x8000 pages first and then 0x4000, instead of 0x4000
then 0x8000, which the original algorithm produced.
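
For reference, here is a minimal standalone sketch of the corrected
loop (not part of the patch), with __ffs()/__fls() modeled by GCC
builtins and a 64-bit unsigned long assumed:

#include <stdio.h>

/* Stand-ins for the kernel helpers; like __ffs()/__fls(), both are
 * undefined when no bit is set, hence the guard in the loop below. */
static unsigned long my_ffs(unsigned long x) { return __builtin_ctzl(x); }
static unsigned long my_fls(unsigned long x) { return 63 - __builtin_clzl(x); }

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

int main(void)
{
	/* the example above: an order-16 free page at pfn 0x10000,
	 * split at offset 0xc000 */
	unsigned long free_page_pfn = 0x10000, order = 16;
	unsigned long split_pfn_offset = 0xc000;
	unsigned long pfn;

	for (pfn = free_page_pfn; pfn < free_page_pfn + (1UL << order);) {
		unsigned long free_page_order = min_ul(
				pfn ? my_ffs(pfn) : order,
				my_fls(split_pfn_offset));

		/* each chunk is buddy-aligned at its own pfn */
		printf("free chunk: pfn 0x%lx, 0x%lx pages\n",
		       pfn, 1UL << free_page_order);

		pfn += 1UL << free_page_order;
		split_pfn_offset -= 1UL << free_page_order;
		/* past the split point, free the rest of the page */
		if (split_pfn_offset == 0)
			split_pfn_offset = (1UL << order) - (pfn - free_page_pfn);
	}
	return 0;
}

This prints chunks of 0x8000, 0x4000, and 0x4000 pages. The old
ffs()-based computation would instead pick 0x4000 first and then try
to free an order-15 buddy at the unaligned pfn 0x14000, which is what
crashed __free_one_page().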

Fixes: b2c9e2fbba ("mm: make alloc_contig_range work at pageblock granularity")
Reported-by: Qian Cai <quic_qiancai@quicinc.com>
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/page_alloc.c     |  5 ++++-
 mm/page_isolation.c | 52 ++++++++++++++++++++++++++++++++++-----------
 2 files changed, 44 insertions(+), 13 deletions(-)

Comments

Andrew Morton May 24, 2022, 8:23 p.m. UTC | #1
On Tue, 24 May 2022 15:47:56 -0400 Zi Yan <zi.yan@sent.com> wrote:

> [...]
>
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1114,13 +1114,16 @@ void split_free_page(struct page *free_page,
>  	unsigned long flags;
>  	int free_page_order;
>  
> +	if (split_pfn_offset == 0)
> +		return;
> +
>  	spin_lock_irqsave(&zone->lock, flags);
>  	del_page_from_free_list(free_page, zone, order);
>  	for (pfn = free_page_pfn;
>  	     pfn < free_page_pfn + (1UL << order);) {
>  		int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
>  
> -		free_page_order = ffs(split_pfn_offset) - 1;
> +		free_page_order = min(pfn ? __ffs(pfn) : order, __fls(split_pfn_offset));

Why is it testing the zeroness of `pfn' here?  Can pfn==0 even happen? 
If so, it's a legitimate value so why does it get special-cased?
Zi Yan May 24, 2022, 8:27 p.m. UTC | #2
On 24 May 2022, at 16:23, Andrew Morton wrote:

> On Tue, 24 May 2022 15:47:56 -0400 Zi Yan <zi.yan@sent.com> wrote:
>
>> [...]
>>
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1114,13 +1114,16 @@ void split_free_page(struct page *free_page,
>>  	unsigned long flags;
>>  	int free_page_order;
>>
>> +	if (split_pfn_offset == 0)
>> +		return;
>> +
>>  	spin_lock_irqsave(&zone->lock, flags);
>>  	del_page_from_free_list(free_page, zone, order);
>>  	for (pfn = free_page_pfn;
>>  	     pfn < free_page_pfn + (1UL << order);) {
>>  		int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
>>
>> -		free_page_order = ffs(split_pfn_offset) - 1;
>> +		free_page_order = min(pfn ? __ffs(pfn) : order, __fls(split_pfn_offset));
>
> Why is it testing the zeroness of `pfn' here?  Can pfn==0 even happen?
> If so, it's a legitimate value so why does it get special-cased?

__ffs() and __fls() are undefined when no bit is set, according to
their comments. I check both pfn and split_pfn_offset against 0 just
in case, even though pfn is most likely never going to be 0.
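
As a userspace illustration of the hazard (a sketch, assuming __ffs()
maps to __builtin_ctzl(), which is likewise undefined for 0):

#include <stdio.h>

int main(void)
{
	unsigned long order = 10;	/* fallback: the page's own order */
	unsigned long pfns[] = { 0x0, 0x400, 0x10000 };

	for (int i = 0; i < 3; i++) {
		unsigned long pfn = pfns[i];
		/* without the guard, pfn == 0 would hit undefined
		 * behavior in __builtin_ctzl(0) */
		unsigned long align = pfn ? __builtin_ctzl(pfn) : order;
		printf("pfn 0x%lx -> alignment order %lu\n", pfn, align);
	}
	return 0;
}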

--
Best Regards,
Yan, Zi
Marek Szyprowski May 25, 2022, 9:48 p.m. UTC | #3
On 24.05.2022 21:47, Zi Yan wrote:
> [...]

This patch landed in linux-next (next-20220525) as commit 29a8af92b874
("mm: fix a potential infinite loop in start_isolate_page_range()").
Unfortunately, it breaks all CMA allocations done by the DMA-mapping
framework. I've observed this on both 32-bit and 64-bit ARM. In the
logs I only see messages like:

cma: cma_alloc: linux,cma: alloc failed, req-size: 128 pages, ret: -16

I will try to analyze it a bit more tomorrow, but it looks like
isolation always fails.

> [...]

Best regards
Zi Yan May 26, 2022, 5:32 p.m. UTC | #4
On 25 May 2022, at 17:48, Marek Szyprowski wrote:

> On 24.05.2022 21:47, Zi Yan wrote:
>> [...]
>
> This patch landed in linux-next (next-20220525) as commit 29a8af92b874
> ("mm: fix a potential infinite loop in start_isolate_page_range()").
> Unfortunately, it breaks all CMA allocations done by the DMA-mapping
> framework. I've observed this on both 32-bit and 64-bit ARM. In the
> logs I only see messages like:
>
> cma: cma_alloc: linux,cma: alloc failed, req-size: 128 pages, ret: -16
>
> I will try to analyze it a bit more tomorrow, but it looks like
> isolation always fails.
>

Hi Marek,

Thanks for reporting the issue.

Can you try the patch below to see if it fixes the issue?

Basically, the bug introduced by this commit is that it does not
consider the situation where the range to be isolated is smaller than
a pageblock. In that case the start and end pageblocks are the same,
so the set_migratetype_isolate() in the second
isolate_single_pageblock() called by start_isolate_page_range() fails,
since the pageblock is already isolated. Skipping the isolation of the
pageblock that has already been isolated by the first
isolate_single_pageblock() solves the issue.
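
To make the failing condition concrete, here is a small sketch (the
pageblock_nr_pages value of 512 and the pfn values are illustrative
assumptions, not taken from the report):

#include <stdio.h>

#define ALIGN_DOWN(x, a)	((x) & ~((unsigned long)(a) - 1))
#define ALIGN(x, a)		ALIGN_DOWN((x) + (a) - 1, (a))

int main(void)
{
	unsigned long pageblock_nr_pages = 512;
	/* a sub-pageblock request, e.g. cma_alloc() asking for 128 pages */
	unsigned long start_pfn = 0x12000, end_pfn = start_pfn + 128;

	unsigned long isolate_start = ALIGN_DOWN(start_pfn, pageblock_nr_pages);
	unsigned long isolate_end = ALIGN(end_pfn, pageblock_nr_pages);

	/* both isolate_single_pageblock() calls land on the same
	 * pageblock, so the second one must skip the isolation */
	int skip_isolation = (isolate_start == isolate_end - pageblock_nr_pages);

	printf("isolate_start=0x%lx isolate_end=0x%lx skip_isolation=%d\n",
	       isolate_start, isolate_end, skip_isolation);
	return 0;
}

With these values the two boundary pageblocks coincide, skip_isolation
is true, and the second isolate_single_pageblock() no longer fails
with -EBUSY.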

The patch below also includes the fix for the free memory accounting issue.

diff --git a/mm/internal.h b/mm/internal.h
index 64e61b032dac..c0f8fbe0445b 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -374,8 +374,8 @@ extern void *memmap_alloc(phys_addr_t size, phys_addr_t align,
 			  phys_addr_t min_addr,
 			  int nid, bool exact_nid);

-void split_free_page(struct page *free_page,
-				int order, unsigned long split_pfn_offset);
+int split_free_page(struct page *free_page,
+			unsigned int order, unsigned long split_pfn_offset);

 #if defined CONFIG_COMPACTION || defined CONFIG_CMA

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bc93a82e51e6..6f6e4649ac21 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1100,30 +1100,44 @@ static inline void __free_one_page(struct page *page,
  * @order:		the order of the page
  * @split_pfn_offset:	split offset within the page
  *
+ * Return -ENOENT if the free page is changed, otherwise 0
+ *
  * It is used when the free page crosses two pageblocks with different migratetypes
  * at split_pfn_offset within the page. The split free page will be put into
  * separate migratetype lists afterwards. Otherwise, the function achieves
  * nothing.
  */
-void split_free_page(struct page *free_page,
-				int order, unsigned long split_pfn_offset)
+int split_free_page(struct page *free_page,
+			unsigned int order, unsigned long split_pfn_offset)
 {
 	struct zone *zone = page_zone(free_page);
 	unsigned long free_page_pfn = page_to_pfn(free_page);
 	unsigned long pfn;
 	unsigned long flags;
 	int free_page_order;
+	int mt;
+	int ret = 0;

 	if (split_pfn_offset == 0)
-		return;
+		return ret;

 	spin_lock_irqsave(&zone->lock, flags);
+
+	if (!PageBuddy(free_page) || buddy_order(free_page) != order) {
+		ret = -ENOENT;
+		goto out;
+	}
+
+	mt = get_pageblock_migratetype(free_page);
+	if (likely(!is_migrate_isolate(mt)))
+		__mod_zone_freepage_state(zone, -(1UL << order), mt);
+
 	del_page_from_free_list(free_page, zone, order);
 	for (pfn = free_page_pfn;
 	     pfn < free_page_pfn + (1UL << order);) {
 		int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);

-		free_page_order = min_t(int,
+		free_page_order = min_t(unsigned int,
 					pfn ? __ffs(pfn) : order,
 					__fls(split_pfn_offset));
 		__free_one_page(pfn_to_page(pfn), pfn, zone, free_page_order,
@@ -1134,7 +1148,9 @@ void split_free_page(struct page *free_page,
 		if (split_pfn_offset == 0)
 			split_pfn_offset = (1UL << order) - (pfn - free_page_pfn);
 	}
+out:
 	spin_unlock_irqrestore(&zone->lock, flags);
+	return ret;
 }
 /*
  * A bad page could be due to a number of fields. Instead of multiple branches,
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index c643c8420809..f539ccf7fb44 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -300,7 +300,7 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * the in-use page then splitting the free page.
  */
 static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
-			gfp_t gfp_flags, bool isolate_before)
+			gfp_t gfp_flags, bool isolate_before, bool skip_isolation)
 {
 	unsigned char saved_mt;
 	unsigned long start_pfn;
@@ -327,11 +327,16 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
 				      zone->zone_start_pfn);

 	saved_mt = get_pageblock_migratetype(pfn_to_page(isolate_pageblock));
-	ret = set_migratetype_isolate(pfn_to_page(isolate_pageblock), saved_mt, flags,
-			isolate_pageblock, isolate_pageblock + pageblock_nr_pages);

-	if (ret)
-		return ret;
+	if (skip_isolation)
+		VM_BUG_ON(!is_migrate_isolate(saved_mt));
+	else {
+		ret = set_migratetype_isolate(pfn_to_page(isolate_pageblock), saved_mt, flags,
+				isolate_pageblock, isolate_pageblock + pageblock_nr_pages);
+
+		if (ret)
+			return ret;
+	}

 	/*
 	 * Bail out early when the to-be-isolated pageblock does not form
@@ -367,8 +372,11 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
 			int order = buddy_order(page);

 			if (pfn + (1UL << order) > boundary_pfn)
-				split_free_page(page, order, boundary_pfn - pfn);
-			pfn += (1UL << order);
+				/* free page changed before split, check it again */
+				if (split_free_page(page, order, boundary_pfn - pfn))
+				    continue;
+
+			pfn += 1UL << order;
 			continue;
 		}
 		/*
@@ -463,7 +471,8 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
 	return 0;
 failed:
 	/* restore the original migratetype */
-	unset_migratetype_isolate(pfn_to_page(isolate_pageblock), saved_mt);
+	if (!skip_isolation)
+		unset_migratetype_isolate(pfn_to_page(isolate_pageblock), saved_mt);
 	return -EBUSY;
 }

@@ -522,14 +531,18 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 	unsigned long isolate_start = ALIGN_DOWN(start_pfn, pageblock_nr_pages);
 	unsigned long isolate_end = ALIGN(end_pfn, pageblock_nr_pages);
 	int ret;
+	bool skip_isolation = false;

 	/* isolate [isolate_start, isolate_start + pageblock_nr_pages) pageblock */
-	ret = isolate_single_pageblock(isolate_start, flags, gfp_flags, false);
+	ret = isolate_single_pageblock(isolate_start, flags, gfp_flags, false, skip_isolation);
 	if (ret)
 		return ret;

+	if (isolate_start == isolate_end - pageblock_nr_pages)
+		skip_isolation = true;
+
 	/* isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock */
-	ret = isolate_single_pageblock(isolate_end, flags, gfp_flags, true);
+	ret = isolate_single_pageblock(isolate_end, flags, gfp_flags, true, skip_isolation);
 	if (ret) {
 		unset_migratetype_isolate(pfn_to_page(isolate_start), migratetype);
 		return ret;



--
Best Regards,
Yan, Zi
Marek Szyprowski May 26, 2022, 8:11 p.m. UTC | #5
Hi,

On 26.05.2022 19:32, Zi Yan wrote:
> [...]
>
> Can you try the patch below to see if it fixes the issue?
>
> [...]

This patch fixed the issue, thanks!

Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

> [...]

Best regards

Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 267599dd9706..6eec0211e0be 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1114,13 +1114,16 @@  void split_free_page(struct page *free_page,
 	unsigned long flags;
 	int free_page_order;
 
+	if (split_pfn_offset == 0)
+		return;
+
 	spin_lock_irqsave(&zone->lock, flags);
 	del_page_from_free_list(free_page, zone, order);
 	for (pfn = free_page_pfn;
 	     pfn < free_page_pfn + (1UL << order);) {
 		int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
 
-		free_page_order = ffs(split_pfn_offset) - 1;
+		free_page_order = min(pfn ? __ffs(pfn) : order, __fls(split_pfn_offset));
 		__free_one_page(pfn_to_page(pfn), pfn, zone, free_page_order,
 				mt, FPI_NONE);
 		pfn += 1UL << free_page_order;
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index b3f074d1682e..c643c8420809 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -283,6 +283,7 @@  __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * isolate_single_pageblock() -- tries to isolate a pageblock that might be
  * within a free or in-use page.
  * @boundary_pfn:		pageblock-aligned pfn that a page might cross
+ * @flags:			isolation flags
  * @gfp_flags:			GFP flags used for migrating pages
  * @isolate_before:	isolate the pageblock before the boundary_pfn
  *
@@ -298,14 +299,15 @@  __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * either. The function handles this by splitting the free page or migrating
  * the in-use page then splitting the free page.
  */
-static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
-			bool isolate_before)
+static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
+			gfp_t gfp_flags, bool isolate_before)
 {
 	unsigned char saved_mt;
 	unsigned long start_pfn;
 	unsigned long isolate_pageblock;
 	unsigned long pfn;
 	struct zone *zone;
+	int ret;
 
 	VM_BUG_ON(!IS_ALIGNED(boundary_pfn, pageblock_nr_pages));
 
@@ -325,7 +327,11 @@  static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
 				      zone->zone_start_pfn);
 
 	saved_mt = get_pageblock_migratetype(pfn_to_page(isolate_pageblock));
-	set_pageblock_migratetype(pfn_to_page(isolate_pageblock), MIGRATE_ISOLATE);
+	ret = set_migratetype_isolate(pfn_to_page(isolate_pageblock), saved_mt, flags,
+			isolate_pageblock, isolate_pageblock + pageblock_nr_pages);
+
+	if (ret)
+		return ret;
 
 	/*
 	 * Bail out early when the to-be-isolated pageblock does not form
@@ -374,7 +380,7 @@  static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
 			struct page *head = compound_head(page);
 			unsigned long head_pfn = page_to_pfn(head);
 
-			if (head_pfn + nr_pages < boundary_pfn) {
+			if (head_pfn + nr_pages <= boundary_pfn) {
 				pfn = head_pfn + nr_pages;
 				continue;
 			}
@@ -386,7 +392,8 @@  static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
 			if (PageHuge(page) || PageLRU(page) || __PageMovable(page)) {
 				int order;
 				unsigned long outer_pfn;
-				int ret;
+				int page_mt = get_pageblock_migratetype(page);
+				bool isolate_page = !is_migrate_isolate_page(page);
 				struct compact_control cc = {
 					.nr_migratepages = 0,
 					.order = -1,
@@ -399,9 +406,31 @@  static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
 				};
 				INIT_LIST_HEAD(&cc.migratepages);
 
+				/*
+				 * XXX: mark the page as MIGRATE_ISOLATE so that
+				 * no one else can grab the freed page after migration.
+				 * Ideally, the page should be freed as two separate
+				 * pages to be added into separate migratetype free
+				 * lists.
+				 */
+				if (isolate_page) {
+					ret = set_migratetype_isolate(page, page_mt,
+						flags, head_pfn, head_pfn + nr_pages);
+					if (ret)
+						goto failed;
+				}
+
 				ret = __alloc_contig_migrate_range(&cc, head_pfn,
 							head_pfn + nr_pages);
 
+				/*
+				 * restore the page's migratetype so that it can
+				 * be split into separate migratetype free lists
+				 * later.
+				 */
+				if (isolate_page)
+					unset_migratetype_isolate(page, page_mt);
+
 				if (ret)
 					goto failed;
 				/*
@@ -417,10 +446,9 @@  static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
 				order = 0;
 				outer_pfn = pfn;
 				while (!PageBuddy(pfn_to_page(outer_pfn))) {
-					if (++order >= MAX_ORDER) {
-						outer_pfn = pfn;
-						break;
-					}
+					/* stop if we cannot find the free page */
+					if (++order >= MAX_ORDER)
+						goto failed;
 					outer_pfn &= ~0UL << order;
 				}
 				pfn = outer_pfn;
@@ -435,7 +463,7 @@  static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
 	return 0;
 failed:
 	/* restore the original migratetype */
-	set_pageblock_migratetype(pfn_to_page(isolate_pageblock), saved_mt);
+	unset_migratetype_isolate(pfn_to_page(isolate_pageblock), saved_mt);
 	return -EBUSY;
 }
 
@@ -496,12 +524,12 @@  int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 	int ret;
 
 	/* isolate [isolate_start, isolate_start + pageblock_nr_pages) pageblock */
-	ret = isolate_single_pageblock(isolate_start, gfp_flags, false);
+	ret = isolate_single_pageblock(isolate_start, flags, gfp_flags, false);
 	if (ret)
 		return ret;
 
 	/* isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock */
-	ret = isolate_single_pageblock(isolate_end, gfp_flags, true);
+	ret = isolate_single_pageblock(isolate_end, flags, gfp_flags, true);
 	if (ret) {
 		unset_migratetype_isolate(pfn_to_page(isolate_start), migratetype);
 		return ret;