[0/9] Fixes and cleanups to compaction

Message ID 20230805110711.2975149-1-shikemeng@huaweicloud.com (mailing list archive)
Series: Fixes and cleanups to compaction

Message

Kemeng Shi Aug. 5, 2023, 11:07 a.m. UTC
Hi all, this is another series of fixes and cleanups to compaction.
Patches 1-2 fix and clean up the freepage list operations.
Patches 3-4 fix and clean up the isolation of freepages.
Patches 7-9 factor out the code that checks whether compaction is needed
for the allocation order.
More details can be found in the respective patches. Thanks!

Kemeng Shi (9):
  mm/compaction: use correct list in move_freelist_{head}/{tail}
  mm/compaction: call list_is_{first}/{last} more intuitively in
    move_freelist_{head}/{tail}
  mm/compaction: correctly return failure with bogus compound_order in
    strict mode
  mm/compaction: simplify pfn iteration in isolate_freepages_range
  mm/compaction: remove repeat compact_blockskip_flush check in
    reset_isolation_suitable
  mm/compaction: rename is_via_compact_memory to
    compaction_with_allocation_order
  mm/compaction: factor out code to test if we should run compaction for
    target order
  mm/compaction: call compaction_suit_allocation_order in
    kcompactd_node_suitable
  mm/compaction: call compaction_suit_allocation_order in
    kcompactd_do_work

 mm/compaction.c | 119 +++++++++++++++++++++++-------------------------
 1 file changed, 57 insertions(+), 62 deletions(-)

Comments

Matthew Wilcox Aug. 5, 2023, 3:14 a.m. UTC | #1
On Sat, Aug 05, 2023 at 07:07:02PM +0800, Kemeng Shi wrote:
> Hi all, this is another series of fixes and cleanups to compaction.
> Patches 1-2 fix and clean up the freepage list operations.
> Patches 3-4 fix and clean up the isolation of freepages.
> Patches 7-9 factor out the code that checks whether compaction is needed
> for the allocation order.
> More details can be found in the respective patches. Thanks!

As with your last patch series, half of the patches are missing.
Looks like they didn't make it to lore.kernel.org either:

https://lore.kernel.org/linux-mm/20230804110454.2935878-1-shikemeng@huaweicloud.com/
https://lore.kernel.org/linux-mm/20230805110711.2975149-1-shikemeng@huaweicloud.com/
Kemeng Shi Aug. 5, 2023, 4:07 a.m. UTC | #2
Hi Matthew, thanks for informing me of this. You can find the full series on
lore.kernel.org at [1] and [2].
I contacted owner-linux-mm@kvack.org and was told all patches were received
and forwarded successfully. Then I contacted meta@public-inbox.org, which is
the only address I found on the help page of lore.kernel.org/linux-mm/, but
have received no response yet.
Please let me know if there are any other ways I can get help. Thanks!

[1] https://lore.kernel.org/all/20230804110454.2935878-1-shikemeng@huaweicloud.com/
[2] https://lore.kernel.org/all/20230805110711.2975149-1-shikemeng@huaweicloud.com/

On 8/5/2023 11:14 AM, Matthew Wilcox wrote:
> On Sat, Aug 05, 2023 at 07:07:02PM +0800, Kemeng Shi wrote:
>> Hi all, this is another series of fixes and cleanups to compaction.
>> Patches 1-2 fix and clean up the freepage list operations.
>> Patches 3-4 fix and clean up the isolation of freepages.
>> Patches 7-9 factor out the code that checks whether compaction is needed
>> for the allocation order.
>> More details can be found in the respective patches. Thanks!
> 
> As with your last patch series, half of the patches are missing.
> Looks like they didn't make it to lore.kernel.org either:
> 
> https://lore.kernel.org/linux-mm/20230804110454.2935878-1-shikemeng@huaweicloud.com/
> https://lore.kernel.org/linux-mm/20230805110711.2975149-1-shikemeng@huaweicloud.com/
> 
>
Andrew Morton Aug. 5, 2023, 5:11 p.m. UTC | #3
On Sat,  5 Aug 2023 19:07:03 +0800 Kemeng Shi <shikemeng@huaweicloud.com> wrote:

> The freepage is chained via buddy_list on the freelist. Use buddy_list
> instead of lru to correct the list operations.
> 
> ...
>
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1395,8 +1395,8 @@ move_freelist_head(struct list_head *freelist, struct page *freepage)
>  {
>  	LIST_HEAD(sublist);
>  
> -	if (!list_is_last(freelist, &freepage->lru)) {
> -		list_cut_before(&sublist, freelist, &freepage->lru);
> +	if (!list_is_last(freelist, &freepage->buddy_list)) {
> +		list_cut_before(&sublist, freelist, &freepage->buddy_list);
>  		list_splice_tail(&sublist, freelist);
>  	}
>  }
> @@ -1412,8 +1412,8 @@ move_freelist_tail(struct list_head *freelist, struct page *freepage)
>  {
>  	LIST_HEAD(sublist);
>  
> -	if (!list_is_first(freelist, &freepage->lru)) {
> -		list_cut_position(&sublist, freelist, &freepage->lru);
> +	if (!list_is_first(freelist, &freepage->buddy_list)) {
> +		list_cut_position(&sublist, freelist, &freepage->buddy_list);
>  		list_splice_tail(&sublist, freelist);
>  	}
>  }

This looks like a significant error.  Can we speculate about the
possible runtime effects?
Kemeng Shi Aug. 7, 2023, 12:37 a.m. UTC | #4
On 8/6/2023 1:11 AM, Andrew Morton wrote:
> On Sat,  5 Aug 2023 19:07:03 +0800 Kemeng Shi <shikemeng@huaweicloud.com> wrote:
> 
>> The freepage is chained via buddy_list on the freelist. Use buddy_list
>> instead of lru to correct the list operations.
>>
>> ...
>>
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -1395,8 +1395,8 @@ move_freelist_head(struct list_head *freelist, struct page *freepage)
>>  {
>>  	LIST_HEAD(sublist);
>>  
>> -	if (!list_is_last(freelist, &freepage->lru)) {
>> -		list_cut_before(&sublist, freelist, &freepage->lru);
>> +	if (!list_is_last(freelist, &freepage->buddy_list)) {
>> +		list_cut_before(&sublist, freelist, &freepage->buddy_list);
>>  		list_splice_tail(&sublist, freelist);
>>  	}
>>  }
>> @@ -1412,8 +1412,8 @@ move_freelist_tail(struct list_head *freelist, struct page *freepage)
>>  {
>>  	LIST_HEAD(sublist);
>>  
>> -	if (!list_is_first(freelist, &freepage->lru)) {
>> -		list_cut_position(&sublist, freelist, &freepage->lru);
>> +	if (!list_is_first(freelist, &freepage->buddy_list)) {
>> +		list_cut_position(&sublist, freelist, &freepage->buddy_list);
>>  		list_splice_tail(&sublist, freelist);
>>  	}
>>  }
> 
> This looks like a significant error.  Can we speculate about the
> possible runtime effects?
> 

It seems there are no runtime effects for now, as lru and buddy_list
share the same memory address in a union.
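The aliasing can be illustrated with a small standalone C sketch (fake_page
and its fields are illustrative assumptions for the demo, not the kernel's
actual definition in include/linux/mm_types.h):

#include <stdio.h>

/* Minimal stand-ins for the kernel's list_head and the union inside
 * struct page; purely illustrative. */
struct list_head {
	struct list_head *next, *prev;
};

struct fake_page {
	union {
		struct list_head lru;		/* LRU list linkage */
		struct list_head buddy_list;	/* buddy free-list linkage */
	};
};

int main(void)
{
	struct fake_page page;

	/* Both member names resolve to the same storage, so list
	 * operations through &page.lru and &page.buddy_list touch
	 * identical memory; using lru here was wrong in intent, not
	 * (yet) in behavior. */
	printf("&lru        = %p\n", (void *)&page.lru);
	printf("&buddy_list = %p\n", (void *)&page.buddy_list);
	return 0;
}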
Baolin Wang Aug. 15, 2023, 7:16 a.m. UTC | #5
On 8/5/2023 7:07 PM, Kemeng Shi wrote:
> The freepage is chained via buddy_list on the freelist. Use buddy_list
> instead of lru to correct the list operations.
> 
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>

Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>

> ---
>   mm/compaction.c | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index ea61922a1619..513b1caeb4fa 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1395,8 +1395,8 @@ move_freelist_head(struct list_head *freelist, struct page *freepage)
>   {
>   	LIST_HEAD(sublist);
>   
> -	if (!list_is_last(freelist, &freepage->lru)) {
> -		list_cut_before(&sublist, freelist, &freepage->lru);
> +	if (!list_is_last(freelist, &freepage->buddy_list)) {
> +		list_cut_before(&sublist, freelist, &freepage->buddy_list);
>   		list_splice_tail(&sublist, freelist);
>   	}
>   }
> @@ -1412,8 +1412,8 @@ move_freelist_tail(struct list_head *freelist, struct page *freepage)
>   {
>   	LIST_HEAD(sublist);
>   
> -	if (!list_is_first(freelist, &freepage->lru)) {
> -		list_cut_position(&sublist, freelist, &freepage->lru);
> +	if (!list_is_first(freelist, &freepage->buddy_list)) {
> +		list_cut_position(&sublist, freelist, &freepage->buddy_list);
>   		list_splice_tail(&sublist, freelist);
>   	}
>   }
Baolin Wang Aug. 15, 2023, 8:28 a.m. UTC | #6
On 8/5/2023 7:07 PM, Kemeng Shi wrote:
> In strict mode, we should return 0 if there is any hole in the pageblock.
> If we successfully isolate pages at the beginning of the pageblock and
> then read a bogus compound_order in the next page that takes us outside
> the pageblock, we abort the search loop with blockpfn > end_pfn. Although
> blockpfn is then limited to end_pfn, strict mode still treats this as a
> successful isolation because blockpfn is not < end_pfn, and the partially
> isolated pages are returned. Then isolate_freepages_range may succeed
> unexpectedly with a hole in the isolated range.

Yes, that can happen.

> This patch also removes the unnecessary limit that keeps blockpfn from
> going outside the pageblock, whether via a buddy page (introduced in the
> fixed commit) or via the stride (introduced after the fixed commit). The
> caller can use the returned blockpfn to check whether the full pageblock
> was scanned (by testing blockpfn >= end_pfn) and to get the next pfn to
> scan inside isolate_freepages_block on demand.

IMO, removing the pageblock restriction is not worth it, since it does
not fix anything and will make things more confusing, at least to me.

That is to say, it would be surprising for blockpfn to go outside of the
pageblock after calling isolate_freepages_block() to scan just one
pageblock, and I have not checked in detail whether this can cause other
potential problems.

> Fixes: 9fcd6d2e052ee ("mm, compaction: skip compound pages by order in free scanner")
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
>   mm/compaction.c | 21 +++++++++++----------
>   1 file changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index fa1b100b0d10..684f6e6cd8bc 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -631,6 +631,14 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>   				page += (1UL << order) - 1;
>   				nr_scanned += (1UL << order) - 1;
>   			}
> +			/*
> +			 * There is a tiny chance that we have read bogus
> +			 * compound_order(), so be careful to not go outside
> +			 * of the pageblock.
> +			 */
> +			if (unlikely(blockpfn >= end_pfn))
> +				blockpfn = end_pfn - 1;

So we can just add this validation to ensure that
isolate_freepages_block() returns 0 if a failure happens, which fixes
your problem.

> +
>   			goto isolate_fail;
>   		}
>   
> @@ -677,17 +685,10 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>   	if (locked)
>   		spin_unlock_irqrestore(&cc->zone->lock, flags);
>   
> -	/*
> -	 * There is a tiny chance that we have read bogus compound_order(),
> -	 * so be careful to not go outside of the pageblock.
> -	 */
> -	if (unlikely(blockpfn > end_pfn))
> -		blockpfn = end_pfn;
> -
>   	trace_mm_compaction_isolate_freepages(*start_pfn, blockpfn,
>   					nr_scanned, total_isolated);
>   
> -	/* Record how far we have got within the block */
> +	/* Record how far we have got */
>   	*start_pfn = blockpfn;
>   
>   	/*
> @@ -1443,7 +1444,7 @@ fast_isolate_around(struct compact_control *cc, unsigned long pfn)
>   	isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, 1, false);
>   
>   	/* Skip this pageblock in the future as it's full or nearly full */
> -	if (start_pfn == end_pfn && !cc->no_set_skip_hint)
> +	if (start_pfn >= end_pfn && !cc->no_set_skip_hint)
>   		set_pageblock_skip(page);
>   }
>   
> @@ -1712,7 +1713,7 @@ static void isolate_freepages(struct compact_control *cc)
>   					block_end_pfn, freelist, stride, false);
>   
>   		/* Update the skip hint if the full pageblock was scanned */
> -		if (isolate_start_pfn == block_end_pfn)
> +		if (isolate_start_pfn >= block_end_pfn)
>   			update_pageblock_skip(cc, page, block_start_pfn -
>   					      pageblock_nr_pages);
>
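To make the strict-mode failure discussed above concrete, here is a toy
userspace model (the function, the numbers, and the clamp are invented for
illustration; the real logic lives in isolate_freepages_block() and its
callers):

#include <stdbool.h>
#include <stdio.h>

/* Toy model: "scan" a block of 8 pfns.  Two pages are isolated at the
 * start, then a bogus compound_order skip jumps far past the block end. */
static unsigned long toy_isolate_block(unsigned long start_pfn,
				       unsigned long end_pfn,
				       unsigned long *nr_isolated)
{
	unsigned long blockpfn = start_pfn + 2;	/* two pages isolated */

	*nr_isolated = 2;
	blockpfn += 16;				/* bogus compound_order skip */

	/* Old clamp: pulls blockpfn back to exactly end_pfn ... */
	if (blockpfn > end_pfn)
		blockpfn = end_pfn;
	return blockpfn;
}

int main(void)
{
	unsigned long nr_isolated;
	unsigned long stop = toy_isolate_block(0, 8, &nr_isolated);

	/* ... so a strict-mode caller that equates "scanned up to
	 * end_pfn" with "every page isolated" wrongly reports success,
	 * although only 2 of 8 pages were isolated (a hole). */
	bool looks_complete = !(stop < 8);

	printf("isolated=%lu/8 looks_complete=%d\n", nr_isolated,
	       looks_complete);
	return 0;
}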
Baolin Wang Aug. 15, 2023, 8:38 a.m. UTC | #7
On 8/5/2023 7:07 PM, Kemeng Shi wrote:
> We call isolate_freepages_block in strict mode, so the continuous pages
> in a pageblock are isolated if isolate_freepages_block succeeds. Then
> pfn + isolated will point to the start of the next pageblock to scan,
> no matter how many pageblocks were isolated in isolate_freepages_block.
> Use pfn + isolated as the start of the next pageblock to scan to
> simplify the iteration.

IIUC, isolate_freepages_block() can isolate high-order free pages, which
means pfn + isolated can be larger than block_end_pfn. So in your patch,
'block_start_pfn' and 'block_end_pfn' can be in different pageblocks,
which will break pageblock_pfn_to_page().

> 
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
>   mm/compaction.c | 14 ++------------
>   1 file changed, 2 insertions(+), 12 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 684f6e6cd8bc..8d7d38073d30 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -733,21 +733,11 @@ isolate_freepages_range(struct compact_control *cc,
>   	block_end_pfn = pageblock_end_pfn(pfn);
>   
>   	for (; pfn < end_pfn; pfn += isolated,
> -				block_start_pfn = block_end_pfn,
> -				block_end_pfn += pageblock_nr_pages) {
> +				block_start_pfn = pfn,
> +				block_end_pfn = pfn + pageblock_nr_pages) {
>   		/* Protect pfn from changing by isolate_freepages_block */
>   		unsigned long isolate_start_pfn = pfn;
>   
> -		/*
> -		 * pfn could pass the block_end_pfn if isolated freepage
> -		 * is more than pageblock order. In this case, we adjust
> -		 * scanning range to right one.
> -		 */
> -		if (pfn >= block_end_pfn) {
> -			block_start_pfn = pageblock_start_pfn(pfn);
> -			block_end_pfn = pageblock_end_pfn(pfn);
> -		}
> -
>   		block_end_pfn = min(block_end_pfn, end_pfn);
>   
>   		if (!pageblock_pfn_to_page(block_start_pfn,
Kemeng Shi Aug. 15, 2023, 9:22 a.m. UTC | #8
On 8/15/2023 4:28 PM, Baolin Wang wrote:
> 
> 
> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>> In strict mode, we should return 0 if there is any hole in the pageblock.
>> If we successfully isolate pages at the beginning of the pageblock and
>> then read a bogus compound_order in the next page that takes us outside
>> the pageblock, we abort the search loop with blockpfn > end_pfn. Although
>> blockpfn is then limited to end_pfn, strict mode still treats this as a
>> successful isolation because blockpfn is not < end_pfn, and the partially
>> isolated pages are returned. Then isolate_freepages_range may succeed
>> unexpectedly with a hole in the isolated range.
> 
> Yes, that can happen.
> 
>> This patch also removes the unnecessary limit that keeps blockpfn from
>> going outside the pageblock, whether via a buddy page (introduced in the
>> fixed commit) or via the stride (introduced after the fixed commit). The
>> caller can use the returned blockpfn to check whether the full pageblock
>> was scanned (by testing blockpfn >= end_pfn) and to get the next pfn to
>> scan inside isolate_freepages_block on demand.
> 
> IMO, removing the pageblock restriction is not worth it, since it does not fix anything and will make things more confusing, at least to me.
> 
> That is to say, it would be surprising for blockpfn to go outside of the pageblock after calling isolate_freepages_block() to scan just one pageblock, and I have not checked in detail whether this can cause other potential problems.
> 
>> Fixes: 9fcd6d2e052ee ("mm, compaction: skip compound pages by order in free scanner")
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>> ---
>>   mm/compaction.c | 21 +++++++++++----------
>>   1 file changed, 11 insertions(+), 10 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index fa1b100b0d10..684f6e6cd8bc 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -631,6 +631,14 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>>                   page += (1UL << order) - 1;
>>                   nr_scanned += (1UL << order) - 1;
>>               }
>> +            /*
>> +             * There is a tiny chance that we have read bogus
>> +             * compound_order(), so be careful to not go outside
>> +             * of the pageblock.
>> +             */
>> +            if (unlikely(blockpfn >= end_pfn))
>> +                blockpfn = end_pfn - 1;
> 
> So we can just add this validation to ensure that isolate_freepages_block() returns 0 if a failure happens, which fixes your problem.
> 
Thanks for the feedback! Sure, I will do this in the next version.
>> +
>>               goto isolate_fail;
>>           }
>>   @@ -677,17 +685,10 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>>       if (locked)
>>           spin_unlock_irqrestore(&cc->zone->lock, flags);
>>   -    /*
>> -     * There is a tiny chance that we have read bogus compound_order(),
>> -     * so be careful to not go outside of the pageblock.
>> -     */
>> -    if (unlikely(blockpfn > end_pfn))
>> -        blockpfn = end_pfn;
>> -
>>       trace_mm_compaction_isolate_freepages(*start_pfn, blockpfn,
>>                       nr_scanned, total_isolated);
>>   -    /* Record how far we have got within the block */
>> +    /* Record how far we have got */
>>       *start_pfn = blockpfn;
>>         /*
>> @@ -1443,7 +1444,7 @@ fast_isolate_around(struct compact_control *cc, unsigned long pfn)
>>       isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, 1, false);
>>         /* Skip this pageblock in the future as it's full or nearly full */
>> -    if (start_pfn == end_pfn && !cc->no_set_skip_hint)
>> +    if (start_pfn >= end_pfn && !cc->no_set_skip_hint)
>>           set_pageblock_skip(page);
>>   }
>>   @@ -1712,7 +1713,7 @@ static void isolate_freepages(struct compact_control *cc)
>>                       block_end_pfn, freelist, stride, false);
>>             /* Update the skip hint if the full pageblock was scanned */
>> -        if (isolate_start_pfn == block_end_pfn)
>> +        if (isolate_start_pfn >= block_end_pfn)
>>               update_pageblock_skip(cc, page, block_start_pfn -
>>                             pageblock_nr_pages);
>>   
>
Kemeng Shi Aug. 15, 2023, 9:32 a.m. UTC | #9
On 8/15/2023 4:38 PM, Baolin Wang wrote:
> 
> 
> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>> We call isolate_freepages_block in strict mode, so the continuous pages
>> in a pageblock are isolated if isolate_freepages_block succeeds. Then
>> pfn + isolated will point to the start of the next pageblock to scan,
>> no matter how many pageblocks were isolated in isolate_freepages_block.
>> Use pfn + isolated as the start of the next pageblock to scan to
>> simplify the iteration.
> 
> IIUC, isolate_freepages_block() can isolate high-order free pages, which means pfn + isolated can be larger than block_end_pfn. So in your patch, 'block_start_pfn' and 'block_end_pfn' can be in different pageblocks, which will break pageblock_pfn_to_page().
> 
In the for-loop update statement, we always update block_start_pfn to pfn and
update block_end_pfn to pfn + pageblock_nr_pages. So they should point
to the same pageblock. I guess you missed the change to the update of
block_end_pfn. :)
>>
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>> ---
>>   mm/compaction.c | 14 ++------------
>>   1 file changed, 2 insertions(+), 12 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 684f6e6cd8bc..8d7d38073d30 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -733,21 +733,11 @@ isolate_freepages_range(struct compact_control *cc,
>>       block_end_pfn = pageblock_end_pfn(pfn);
>>         for (; pfn < end_pfn; pfn += isolated,
>> -                block_start_pfn = block_end_pfn,
>> -                block_end_pfn += pageblock_nr_pages) {
>> +                block_start_pfn = pfn,
>> +                block_end_pfn = pfn + pageblock_nr_pages) {
>>           /* Protect pfn from changing by isolate_freepages_block */
>>           unsigned long isolate_start_pfn = pfn;
>>   -        /*
>> -         * pfn could pass the block_end_pfn if isolated freepage
>> -         * is more than pageblock order. In this case, we adjust
>> -         * scanning range to right one.
>> -         */
>> -        if (pfn >= block_end_pfn) {
>> -            block_start_pfn = pageblock_start_pfn(pfn);
>> -            block_end_pfn = pageblock_end_pfn(pfn);
>> -        }
>> -
>>           block_end_pfn = min(block_end_pfn, end_pfn);
>>             if (!pageblock_pfn_to_page(block_start_pfn,
>
Baolin Wang Aug. 15, 2023, 10:07 a.m. UTC | #10
On 8/15/2023 5:32 PM, Kemeng Shi wrote:
> 
> 
> on 8/15/2023 4:38 PM, Baolin Wang wrote:
>>
>>
>> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>>> We call isolate_freepages_block in strict mode, so the continuous pages
>>> in a pageblock are isolated if isolate_freepages_block succeeds. Then
>>> pfn + isolated will point to the start of the next pageblock to scan,
>>> no matter how many pageblocks were isolated in isolate_freepages_block.
>>> Use pfn + isolated as the start of the next pageblock to scan to
>>> simplify the iteration.
>>
>> IIUC, isolate_freepages_block() can isolate high-order free pages, which means pfn + isolated can be larger than block_end_pfn. So in your patch, 'block_start_pfn' and 'block_end_pfn' can be in different pageblocks, which will break pageblock_pfn_to_page().
>>
> In the for-loop update statement, we always update block_start_pfn to pfn and

I mean, you changed it to:
1) pfn += isolated;
2) block_start_pfn = pfn;
3) block_end_pfn = pfn + pageblock_nr_pages;

But in 1), pfn + isolated can go outside of the current pageblock if a
high-order page is isolated, ending up, for example, in the middle of
the next pageblock. So block_start_pfn can point to the middle of the
next pageblock, not its start position. Meanwhile, after 3),
block_end_pfn can point into yet another pageblock. Or did I miss
something?

> update block_end_pfn to pfn + pageblock_nr_pages. So they should point
> to the same pageblock. I guess you missed the change to the update of
> block_end_pfn. :)
>>>
>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>> ---
>>>    mm/compaction.c | 14 ++------------
>>>    1 file changed, 2 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>> index 684f6e6cd8bc..8d7d38073d30 100644
>>> --- a/mm/compaction.c
>>> +++ b/mm/compaction.c
>>> @@ -733,21 +733,11 @@ isolate_freepages_range(struct compact_control *cc,
>>>        block_end_pfn = pageblock_end_pfn(pfn);
>>>          for (; pfn < end_pfn; pfn += isolated,
>>> -                block_start_pfn = block_end_pfn,
>>> -                block_end_pfn += pageblock_nr_pages) {
>>> +                block_start_pfn = pfn,
>>> +                block_end_pfn = pfn + pageblock_nr_pages) {
>>>            /* Protect pfn from changing by isolate_freepages_block */
>>>            unsigned long isolate_start_pfn = pfn;
>>>    -        /*
>>> -         * pfn could pass the block_end_pfn if isolated freepage
>>> -         * is more than pageblock order. In this case, we adjust
>>> -         * scanning range to right one.
>>> -         */
>>> -        if (pfn >= block_end_pfn) {
>>> -            block_start_pfn = pageblock_start_pfn(pfn);
>>> -            block_end_pfn = pageblock_end_pfn(pfn);
>>> -        }
>>> -
>>>            block_end_pfn = min(block_end_pfn, end_pfn);
>>>              if (!pageblock_pfn_to_page(block_start_pfn,
>>
Kemeng Shi Aug. 15, 2023, 10:37 a.m. UTC | #11
On 8/15/2023 6:07 PM, Baolin Wang wrote:
> 
> 
> On 8/15/2023 5:32 PM, Kemeng Shi wrote:
>>
>>
>> on 8/15/2023 4:38 PM, Baolin Wang wrote:
>>>
>>>
>>> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>>>> We call isolate_freepages_block in strict mode, so the continuous pages
>>>> in a pageblock are isolated if isolate_freepages_block succeeds. Then
>>>> pfn + isolated will point to the start of the next pageblock to scan,
>>>> no matter how many pageblocks were isolated in isolate_freepages_block.
>>>> Use pfn + isolated as the start of the next pageblock to scan to
>>>> simplify the iteration.
>>>
>>> IIUC, isolate_freepages_block() can isolate high-order free pages, which means pfn + isolated can be larger than block_end_pfn. So in your patch, 'block_start_pfn' and 'block_end_pfn' can be in different pageblocks, which will break pageblock_pfn_to_page().
>>>
>> In the for-loop update statement, we always update block_start_pfn to pfn and
> 
> I mean, you changed it to:
> 1) pfn += isolated;
> 2) block_start_pfn = pfn;
> 3) block_end_pfn = pfn + pageblock_nr_pages;
> 
> But in 1), pfn + isolated can go outside of the current pageblock if a high-order page is isolated, ending up, for example, in the middle of the next pageblock. So block_start_pfn can point to the middle of the next pageblock, not its start position. Meanwhile, after 3), block_end_pfn can point into yet another pageblock. Or did I miss something?
> 
Ah, I missed explaining this in the changelog.
In case we have a buddy page with order higher than the pageblock order:
1. a buddy page is aligned to its order
2. the order of the page is higher than the pageblock order
Then the page is aligned to the pageblock order. So the pfn of the page
and the isolated page count are both pageblock-order aligned, and so
pfn + isolated is pageblock-order aligned.
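
The alignment argument can be sanity-checked with a tiny standalone program
(pageblock_order = 9 is an assumption matching common x86_64 configs, and
the pfn values are made up for the demo):

#include <assert.h>
#include <stdio.h>

int main(void)
{
	const unsigned long pageblock_order = 9;	/* assumed */
	const unsigned long pageblock_nr_pages = 1UL << pageblock_order;
	unsigned long order;

	/* A free page of order >= pageblock_order starts at a pfn aligned
	 * to its own order, hence to the pageblock, and isolating it
	 * advances pfn by a multiple of pageblock_nr_pages. */
	for (order = pageblock_order; order <= pageblock_order + 2; order++) {
		unsigned long pfn = 5UL << order;	/* order-aligned pfn */
		unsigned long isolated = 1UL << order;

		assert(pfn % pageblock_nr_pages == 0);
		assert((pfn + isolated) % pageblock_nr_pages == 0);
	}
	printf("pfn + isolated stays pageblock aligned\n");
	return 0;
}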
>> update block_end_pfn to pfn + pageblock_nr_pages. So they should point
>> to the same pageblock. I guess you missed the change to the update of
>> block_end_pfn. :)
>>>>
>>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>>> ---
>>>>    mm/compaction.c | 14 ++------------
>>>>    1 file changed, 2 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>> index 684f6e6cd8bc..8d7d38073d30 100644
>>>> --- a/mm/compaction.c
>>>> +++ b/mm/compaction.c
>>>> @@ -733,21 +733,11 @@ isolate_freepages_range(struct compact_control *cc,
>>>>        block_end_pfn = pageblock_end_pfn(pfn);
>>>>          for (; pfn < end_pfn; pfn += isolated,
>>>> -                block_start_pfn = block_end_pfn,
>>>> -                block_end_pfn += pageblock_nr_pages) {
>>>> +                block_start_pfn = pfn,
>>>> +                block_end_pfn = pfn + pageblock_nr_pages) {
>>>>            /* Protect pfn from changing by isolate_freepages_block */
>>>>            unsigned long isolate_start_pfn = pfn;
>>>>    -        /*
>>>> -         * pfn could pass the block_end_pfn if isolated freepage
>>>> -         * is more than pageblock order. In this case, we adjust
>>>> -         * scanning range to right one.
>>>> -         */
>>>> -        if (pfn >= block_end_pfn) {
>>>> -            block_start_pfn = pageblock_start_pfn(pfn);
>>>> -            block_end_pfn = pageblock_end_pfn(pfn);
>>>> -        }
>>>> -
>>>>            block_end_pfn = min(block_end_pfn, end_pfn);
>>>>              if (!pageblock_pfn_to_page(block_start_pfn,
>>>
>
Baolin Wang Aug. 19, 2023, 11:58 a.m. UTC | #12
On 8/15/2023 6:37 PM, Kemeng Shi wrote:
> 
> 
> on 8/15/2023 6:07 PM, Baolin Wang wrote:
>>
>>
>> On 8/15/2023 5:32 PM, Kemeng Shi wrote:
>>>
>>>
>>> on 8/15/2023 4:38 PM, Baolin Wang wrote:
>>>>
>>>>
>>>> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>>>>> We call isolate_freepages_block in strict mode, so the continuous pages
>>>>> in a pageblock are isolated if isolate_freepages_block succeeds. Then
>>>>> pfn + isolated will point to the start of the next pageblock to scan,
>>>>> no matter how many pageblocks were isolated in isolate_freepages_block.
>>>>> Use pfn + isolated as the start of the next pageblock to scan to
>>>>> simplify the iteration.
>>>>
>>>> IIUC, isolate_freepages_block() can isolate high-order free pages, which means pfn + isolated can be larger than block_end_pfn. So in your patch, 'block_start_pfn' and 'block_end_pfn' can be in different pageblocks, which will break pageblock_pfn_to_page().
>>>>
>>> In the for-loop update statement, we always update block_start_pfn to pfn and
>>
>> I mean, you changed it to:
>> 1) pfn += isolated;
>> 2) block_start_pfn = pfn;
>> 3) block_end_pfn = pfn + pageblock_nr_pages;
>>
>> But in 1), pfn + isolated can go outside of the current pageblock if a high-order page is isolated, ending up, for example, in the middle of the next pageblock. So block_start_pfn can point to the middle of the next pageblock, not its start position. Meanwhile, after 3), block_end_pfn can point into yet another pageblock. Or did I miss something?
>>
> Ah, I missed explaining this in the changelog.
> In case we have a buddy page with order higher than the pageblock order:
> 1. a buddy page is aligned to its order
> 2. the order of the page is higher than the pageblock order
> Then the page is aligned to the pageblock order. So the pfn of the page
> and the isolated page count are both pageblock-order aligned, and so
> pfn + isolated is pageblock-order aligned.

That's not what I mean. pfn + isolated is not always pageblock-aligned,
since isolate_freepages_block() can isolate high-order free pages
(for example: order-1, order-2 ...).

Suppose the pageblock size is 2M and we isolate one pageblock (say the
pfn range 0 - 511, to make the arithmetic easy) with
isolate_freepages_block(). Suppose pfn 0 to pfn 510 are all order-0
pages, but pfn 511 is an order-1 page; then you will isolate 513 pages
from this pageblock, which makes 'pfn + isolated' not pageblock-aligned.

>>> update block_end_pfn to pfn + pageblock_nr_pages. So they should point
>>> to the same pageblock. I guess you missed the change to the update of
>>> block_end_pfn. :)
>>>>>
>>>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>>>> ---
>>>>>     mm/compaction.c | 14 ++------------
>>>>>     1 file changed, 2 insertions(+), 12 deletions(-)
>>>>>
>>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>>> index 684f6e6cd8bc..8d7d38073d30 100644
>>>>> --- a/mm/compaction.c
>>>>> +++ b/mm/compaction.c
>>>>> @@ -733,21 +733,11 @@ isolate_freepages_range(struct compact_control *cc,
>>>>>         block_end_pfn = pageblock_end_pfn(pfn);
>>>>>           for (; pfn < end_pfn; pfn += isolated,
>>>>> -                block_start_pfn = block_end_pfn,
>>>>> -                block_end_pfn += pageblock_nr_pages) {
>>>>> +                block_start_pfn = pfn,
>>>>> +                block_end_pfn = pfn + pageblock_nr_pages) {
>>>>>             /* Protect pfn from changing by isolate_freepages_block */
>>>>>             unsigned long isolate_start_pfn = pfn;
>>>>>     -        /*
>>>>> -         * pfn could pass the block_end_pfn if isolated freepage
>>>>> -         * is more than pageblock order. In this case, we adjust
>>>>> -         * scanning range to right one.
>>>>> -         */
>>>>> -        if (pfn >= block_end_pfn) {
>>>>> -            block_start_pfn = pageblock_start_pfn(pfn);
>>>>> -            block_end_pfn = pageblock_end_pfn(pfn);
>>>>> -        }
>>>>> -
>>>>>             block_end_pfn = min(block_end_pfn, end_pfn);
>>>>>               if (!pageblock_pfn_to_page(block_start_pfn,
>>>>
>>
Kemeng Shi Aug. 22, 2023, 1:37 a.m. UTC | #13
On 8/19/2023 7:58 PM, Baolin Wang wrote:
> 
> 
> On 8/15/2023 6:37 PM, Kemeng Shi wrote:
>>
>>
>> on 8/15/2023 6:07 PM, Baolin Wang wrote:
>>>
>>>
>>> On 8/15/2023 5:32 PM, Kemeng Shi wrote:
>>>>
>>>>
>>>> on 8/15/2023 4:38 PM, Baolin Wang wrote:
>>>>>
>>>>>
>>>>> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>>>>>> We call isolate_freepages_block in strict mode, so the continuous pages
>>>>>> in a pageblock are isolated if isolate_freepages_block succeeds. Then
>>>>>> pfn + isolated will point to the start of the next pageblock to scan,
>>>>>> no matter how many pageblocks were isolated in isolate_freepages_block.
>>>>>> Use pfn + isolated as the start of the next pageblock to scan to
>>>>>> simplify the iteration.
>>>>>
>>>>> IIUC, isolate_freepages_block() can isolate high-order free pages, which means pfn + isolated can be larger than block_end_pfn. So in your patch, 'block_start_pfn' and 'block_end_pfn' can be in different pageblocks, which will break pageblock_pfn_to_page().
>>>>>
>>>> In the for-loop update statement, we always update block_start_pfn to pfn and
>>>
>>> I mean, you changed it to:
>>> 1) pfn += isolated;
>>> 2) block_start_pfn = pfn;
>>> 3) block_end_pfn = pfn + pageblock_nr_pages;
>>>
>>> But in 1), pfn + isolated can go outside of the current pageblock if a high-order page is isolated, ending up, for example, in the middle of the next pageblock. So block_start_pfn can point to the middle of the next pageblock, not its start position. Meanwhile, after 3), block_end_pfn can point into yet another pageblock. Or did I miss something?
>>>
>> Ah, I missed explaining this in the changelog.
>> In case we have a buddy page with order higher than the pageblock order:
>> 1. a buddy page is aligned to its order
>> 2. the order of the page is higher than the pageblock order
>> Then the page is aligned to the pageblock order. So the pfn of the page
>> and the isolated page count are both pageblock-order aligned, and so
>> pfn + isolated is pageblock-order aligned.
> 
> That's not what I mean. pfn + isolated is not always pageblock-aligned, since isolate_freepages_block() can isolate high-order free pages (for example: order-1, order-2 ...).
> 
> Suppose the pageblock size is 2M and we isolate one pageblock (say the pfn range 0 - 511, to make the arithmetic easy) with isolate_freepages_block(). Suppose pfn 0 to pfn 510 are all order-0 pages, but pfn 511 is an order-1 page; then you will isolate 513 pages from this pageblock, which makes 'pfn + isolated' not pageblock-aligned.
This is also not supposed to happen, as low-order buddy pages never span
the boundary of higher-order pages:
In the buddy system, we always split an order-N page into two order N - 1
pages as follows:
|        order N        |
|order N - 1|order N - 1|
So buddy pages of order N - 1 never cross an order-N boundary. Similarly,
buddy pages of order N - 2 never cross an order N - 1 boundary, and so
on. Thus any page with order less than N never crosses an order-N
boundary.
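
That containment claim can be checked exhaustively over a toy order range
(order 6 as the "pageblock" order is an arbitrary choice for the demo):

#include <assert.h>
#include <stdio.h>

int main(void)
{
	const unsigned int N = 6;		/* toy "pageblock" order */
	const unsigned long span = 1UL << N;
	unsigned int k;

	/* An order-k buddy page starts at a pfn aligned to 1 << k, so for
	 * k <= N it can never straddle an aligned order-N boundary. */
	for (k = 0; k <= N; k++) {
		unsigned long pfn;

		for (pfn = 0; pfn < 4 * span; pfn += 1UL << k) {
			unsigned long first = pfn / span;
			unsigned long last = (pfn + (1UL << k) - 1) / span;

			assert(first == last);
		}
	}
	printf("no order-k (k <= %u) buddy page crosses an order-%u boundary\n",
	       N, N);
	return 0;
}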

> 
>>>> update block_end_pfn to pfn + pageblock_nr_pages. So they should point
>>>> to the same pageblock. I guess you missed the change to the update of
>>>> block_end_pfn. :)
>>>>>>
>>>>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>>>>> ---
>>>>>>     mm/compaction.c | 14 ++------------
>>>>>>     1 file changed, 2 insertions(+), 12 deletions(-)
>>>>>>
>>>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>>>> index 684f6e6cd8bc..8d7d38073d30 100644
>>>>>> --- a/mm/compaction.c
>>>>>> +++ b/mm/compaction.c
>>>>>> @@ -733,21 +733,11 @@ isolate_freepages_range(struct compact_control *cc,
>>>>>>         block_end_pfn = pageblock_end_pfn(pfn);
>>>>>>           for (; pfn < end_pfn; pfn += isolated,
>>>>>> -                block_start_pfn = block_end_pfn,
>>>>>> -                block_end_pfn += pageblock_nr_pages) {
>>>>>> +                block_start_pfn = pfn,
>>>>>> +                block_end_pfn = pfn + pageblock_nr_pages) {
>>>>>>             /* Protect pfn from changing by isolate_freepages_block */
>>>>>>             unsigned long isolate_start_pfn = pfn;
>>>>>>     -        /*
>>>>>> -         * pfn could pass the block_end_pfn if isolated freepage
>>>>>> -         * is more than pageblock order. In this case, we adjust
>>>>>> -         * scanning range to right one.
>>>>>> -         */
>>>>>> -        if (pfn >= block_end_pfn) {
>>>>>> -            block_start_pfn = pageblock_start_pfn(pfn);
>>>>>> -            block_end_pfn = pageblock_end_pfn(pfn);
>>>>>> -        }
>>>>>> -
>>>>>>             block_end_pfn = min(block_end_pfn, end_pfn);
>>>>>>               if (!pageblock_pfn_to_page(block_start_pfn,
>>>>>
>>>
> 
>
Baolin Wang Aug. 24, 2023, 2:19 a.m. UTC | #14
On 8/22/2023 9:37 AM, Kemeng Shi wrote:
> 
> 
> on 8/19/2023 7:58 PM, Baolin Wang wrote:
>>
>>
>> On 8/15/2023 6:37 PM, Kemeng Shi wrote:
>>>
>>>
>>> on 8/15/2023 6:07 PM, Baolin Wang wrote:
>>>>
>>>>
>>>> On 8/15/2023 5:32 PM, Kemeng Shi wrote:
>>>>>
>>>>>
>>>>> on 8/15/2023 4:38 PM, Baolin Wang wrote:
>>>>>>
>>>>>>
>>>>>> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>>>>>>> We call isolate_freepages_block in strict mode, so the continuous pages
>>>>>>> in a pageblock are isolated if isolate_freepages_block succeeds. Then
>>>>>>> pfn + isolated will point to the start of the next pageblock to scan,
>>>>>>> no matter how many pageblocks were isolated in isolate_freepages_block.
>>>>>>> Use pfn + isolated as the start of the next pageblock to scan to
>>>>>>> simplify the iteration.
>>>>>>
>>>>>> IIUC, isolate_freepages_block() can isolate high-order free pages, which means pfn + isolated can be larger than block_end_pfn. So in your patch, 'block_start_pfn' and 'block_end_pfn' can be in different pageblocks, which will break pageblock_pfn_to_page().
>>>>>>
>>>>> In the for-loop update statement, we always update block_start_pfn to pfn and
>>>>
>>>> I mean, you changed it to:
>>>> 1) pfn += isolated;
>>>> 2) block_start_pfn = pfn;
>>>> 3) block_end_pfn = pfn + pageblock_nr_pages;
>>>>
>>>> But in 1), pfn + isolated can go outside of the current pageblock if a high-order page is isolated, ending up, for example, in the middle of the next pageblock. So block_start_pfn can point to the middle of the next pageblock, not its start position. Meanwhile, after 3), block_end_pfn can point into yet another pageblock. Or did I miss something?
>>>>
>>> Ah, I missed explaining this in the changelog.
>>> In case we have a buddy page with order higher than the pageblock order:
>>> 1. a buddy page is aligned to its order
>>> 2. the order of the page is higher than the pageblock order
>>> Then the page is aligned to the pageblock order. So the pfn of the page
>>> and the isolated page count are both pageblock-order aligned, and so
>>> pfn + isolated is pageblock-order aligned.
>>
>> That's not what I mean. pfn + isolated is not always pageblock-aligned, since isolate_freepages_block() can isolate high-order free pages (for example: order-1, order-2 ...).
>>
>> Suppose the pageblock size is 2M and we isolate one pageblock (say the pfn range 0 - 511, to make the arithmetic easy) with isolate_freepages_block(). Suppose pfn 0 to pfn 510 are all order-0 pages, but pfn 511 is an order-1 page; then you will isolate 513 pages from this pageblock, which makes 'pfn + isolated' not pageblock-aligned.

I realized I made a bad example, sorry for the noise.

After more thought, I agree that 'pfn + isolated' is always
pageblock-aligned in strict mode. So feel free to add:
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>

> This is also not supposed to happen, as low-order buddy pages never span
> the boundary of higher-order pages:
> In the buddy system, we always split an order-N page into two order N - 1
> pages as follows:
> |        order N        |
> |order N - 1|order N - 1|
> So buddy pages of order N - 1 never cross an order-N boundary. Similarly,
> buddy pages of order N - 2 never cross an order N - 1 boundary, and so
> on. Thus any page with order less than N never crosses an order-N
> boundary.
> 
>>
>>>>> update block_end_pfn to pfn + pageblock_nr_pages. So they should point
>>>>> to the same pageblock. I guess you missed the change to the update of
>>>>> block_end_pfn. :)
>>>>>>>
>>>>>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>>>>>> ---
>>>>>>>      mm/compaction.c | 14 ++------------
>>>>>>>      1 file changed, 2 insertions(+), 12 deletions(-)
>>>>>>>
>>>>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>>>>> index 684f6e6cd8bc..8d7d38073d30 100644
>>>>>>> --- a/mm/compaction.c
>>>>>>> +++ b/mm/compaction.c
>>>>>>> @@ -733,21 +733,11 @@ isolate_freepages_range(struct compact_control *cc,
>>>>>>>          block_end_pfn = pageblock_end_pfn(pfn);
>>>>>>>            for (; pfn < end_pfn; pfn += isolated,
>>>>>>> -                block_start_pfn = block_end_pfn,
>>>>>>> -                block_end_pfn += pageblock_nr_pages) {
>>>>>>> +                block_start_pfn = pfn,
>>>>>>> +                block_end_pfn = pfn + pageblock_nr_pages) {
>>>>>>>              /* Protect pfn from changing by isolate_freepages_block */
>>>>>>>              unsigned long isolate_start_pfn = pfn;
>>>>>>>      -        /*
>>>>>>> -         * pfn could pass the block_end_pfn if isolated freepage
>>>>>>> -         * is more than pageblock order. In this case, we adjust
>>>>>>> -         * scanning range to right one.
>>>>>>> -         */
>>>>>>> -        if (pfn >= block_end_pfn) {
>>>>>>> -            block_start_pfn = pageblock_start_pfn(pfn);
>>>>>>> -            block_end_pfn = pageblock_end_pfn(pfn);
>>>>>>> -        }
>>>>>>> -
>>>>>>>              block_end_pfn = min(block_end_pfn, end_pfn);
>>>>>>>                if (!pageblock_pfn_to_page(block_start_pfn,
>>>>>>
>>>>
>>
>>