
mm: compaction: fix endless looping over same migrate block

Message ID 20230731172450.1632195-1-hannes@cmpxchg.org (mailing list archive)
State New

Commit Message

Johannes Weiner July 31, 2023, 5:24 p.m. UTC
During stress testing, the following situation was observed:

     70 root      39  19       0      0      0 R 100.0   0.0 959:29.92 khugepaged
 310936 root      20   0   84416  25620    512 R  99.7   1.5 642:37.22 hugealloc

Tracing shows isolate_migratepages_block() endlessly looping over the
first block in the DMA zone:

       hugealloc-310936  [001] ..... 237297.415718: mm_compaction_finished: node=0 zone=DMA      order=9 ret=no_suitable_page
       hugealloc-310936  [001] ..... 237297.415718: mm_compaction_isolate_migratepages: range=(0x1 ~ 0x400) nr_scanned=513 nr_taken=0
       hugealloc-310936  [001] ..... 237297.415718: mm_compaction_finished: node=0 zone=DMA      order=9 ret=no_suitable_page
       hugealloc-310936  [001] ..... 237297.415718: mm_compaction_isolate_migratepages: range=(0x1 ~ 0x400) nr_scanned=513 nr_taken=0
       hugealloc-310936  [001] ..... 237297.415718: mm_compaction_finished: node=0 zone=DMA      order=9 ret=no_suitable_page
       hugealloc-310936  [001] ..... 237297.415718: mm_compaction_isolate_migratepages: range=(0x1 ~ 0x400) nr_scanned=513 nr_taken=0
       hugealloc-310936  [001] ..... 237297.415718: mm_compaction_finished: node=0 zone=DMA      order=9 ret=no_suitable_page
       hugealloc-310936  [001] ..... 237297.415718: mm_compaction_isolate_migratepages: range=(0x1 ~ 0x400) nr_scanned=513 nr_taken=0

The problem is that the function tries to test and set the skip bit
once on the block, to avoid skipping on its own skip-set, using
pageblock_aligned() on the pfn as a test. But because this is the DMA
zone which starts at pfn 1, this is never true for the first block,
and the skip bit isn't set or tested at all. As a result,
fast_find_migrateblock() returns the same pageblock over and over.

If the pfn isn't pageblock-aligned, also check if it's the start of
the zone to ensure test-and-set-exactly-once on unaligned ranges.
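
As an illustration of the two tests described above, here is a minimal
userspace sketch. It is not kernel code. It assumes 512-page pageblocks,
and the names sketch_pageblock_aligned(), first_pfn_old(),
first_pfn_new() and check_dma_zone_example() are hypothetical stand-ins
invented for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption for this sketch: 512-page pageblocks (pageblock order 9). */
#define SKETCH_PAGEBLOCK_NR_PAGES 512UL

/* Userspace stand-in for the kernel's pageblock_aligned(). */
static bool sketch_pageblock_aligned(unsigned long pfn)
{
	return (pfn & (SKETCH_PAGEBLOCK_NR_PAGES - 1)) == 0;
}

/* Old test: only an aligned pfn is treated as the first pfn of a block. */
bool first_pfn_old(unsigned long pfn)
{
	return sketch_pageblock_aligned(pfn);
}

/* New test: the zone start also counts, even if it is unaligned. */
bool first_pfn_new(unsigned long pfn, unsigned long zone_start_pfn)
{
	return sketch_pageblock_aligned(pfn) || pfn == zone_start_pfn;
}

/* Walk through the DMA zone case from the report (zone starts at pfn 1). */
void check_dma_zone_example(void)
{
	const unsigned long zone_start_pfn = 1;

	/* The first block of the zone: missed by the old test, caught by the new. */
	assert(!first_pfn_old(zone_start_pfn));
	assert(first_pfn_new(zone_start_pfn, zone_start_pfn));

	/* A later pfn inside the first block matches neither test. */
	assert(!first_pfn_old(33));
	assert(!first_pfn_new(33, zone_start_pfn));

	/* The aligned start of the second block matches both tests. */
	assert(first_pfn_old(512));
	assert(first_pfn_new(512, zone_start_pfn));
}
```

With the old test, no pfn in the first block of a zone starting at pfn 1
ever qualifies, which is why the skip bit is never tested or set there.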

Thanks to Vlastimil Babka for the help in debugging this.

Fixes: 90ed667c03fe ("Revert "Revert "mm/compaction: fix set skip in fast_find_migrateblock""")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/compaction.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Comments

Vlastimil Babka July 31, 2023, 8:25 p.m. UTC | #1
On 7/31/23 19:24, Johannes Weiner wrote:
> During stress testing, the following situation was observed:
> 
>      70 root      39  19       0      0      0 R 100.0   0.0 959:29.92 khugepaged
>  310936 root      20   0   84416  25620    512 R  99.7   1.5 642:37.22 hugealloc
> 
> Tracing shows isolate_migratepages_block() endlessly looping over the
> first block in the DMA zone:
> 
>        hugealloc-310936  [001] ..... 237297.415718: mm_compaction_finished: node=0 zone=DMA      order=9 ret=no_suitable_page
>        hugealloc-310936  [001] ..... 237297.415718: mm_compaction_isolate_migratepages: range=(0x1 ~ 0x400) nr_scanned=513 nr_taken=0
>        hugealloc-310936  [001] ..... 237297.415718: mm_compaction_finished: node=0 zone=DMA      order=9 ret=no_suitable_page
>        hugealloc-310936  [001] ..... 237297.415718: mm_compaction_isolate_migratepages: range=(0x1 ~ 0x400) nr_scanned=513 nr_taken=0
>        hugealloc-310936  [001] ..... 237297.415718: mm_compaction_finished: node=0 zone=DMA      order=9 ret=no_suitable_page
>        hugealloc-310936  [001] ..... 237297.415718: mm_compaction_isolate_migratepages: range=(0x1 ~ 0x400) nr_scanned=513 nr_taken=0
>        hugealloc-310936  [001] ..... 237297.415718: mm_compaction_finished: node=0 zone=DMA      order=9 ret=no_suitable_page
>        hugealloc-310936  [001] ..... 237297.415718: mm_compaction_isolate_migratepages: range=(0x1 ~ 0x400) nr_scanned=513 nr_taken=0
> 
> The problem is that the function tries to test and set the skip bit
> once on the block, to avoid skipping on its own skip-set, using
> pageblock_aligned() on the pfn as a test. But because this is the DMA
> zone which starts at pfn 1, this is never true for the first block,
> and the skip bit isn't set or tested at all. As a result,
> fast_find_migrateblock() returns the same pageblock over and over.
> 
> If the pfn isn't pageblock-aligned, also check if it's the start of
> the zone to ensure test-and-set-exactly-once on unaligned ranges.
> 
> Thanks to Vlastimil Babka for the help in debugging this.
> 
> Fixes: 90ed667c03fe ("Revert "Revert "mm/compaction: fix set skip in fast_find_migrateblock""")

Yeah, I suggested this commit for Fixes: because before it (or the
previous, reverted attempt) the skip bit would be set in
fast_find_migrateblock(). So even though the issue of not handling unaligned
zones properly is older, it wouldn't have caused an endless loop before.
Since 90ed667c03fe is in rc1, we don't need stable.
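
To illustrate the point above, here is a heavily simplified toy model in
userspace C. It is not how mm/compaction.c is written: the real finder
and scanner do far more. Every toy_* name, the single-block zone and the
512-page pageblock size are assumptions made for this sketch only.

```c
#include <stdbool.h>

/* Assumption for this sketch: 512-page pageblocks. */
#define TOY_PAGEBLOCK_NR_PAGES 512UL

/* A zone reduced to its start pfn and the skip bit of its first block. */
struct toy_zone {
	unsigned long zone_start_pfn;
	bool first_block_skip;
};

static bool toy_pageblock_aligned(unsigned long pfn)
{
	return (pfn & (TOY_PAGEBLOCK_NR_PAGES - 1)) == 0;
}

/*
 * Toy finder: offers the first block until its skip bit is set.
 * finder_sets_skip models the older behaviour described in the reply,
 * where the finder itself marked the block as skipped.
 */
static bool toy_find_block(struct toy_zone *z, unsigned long *pfn,
			   bool finder_sets_skip)
{
	if (z->first_block_skip)
		return false;
	*pfn = z->zone_start_pfn;
	if (finder_sets_skip)
		z->first_block_skip = true;
	return true;
}

/*
 * Toy scanner: isolates nothing and sets the skip bit only when the pfn
 * passes the "first pfn of the block" test. check_zone_start models the
 * extra zone_start_pfn comparison added by the patch.
 */
static void toy_scan_block(struct toy_zone *z, unsigned long pfn,
			   bool check_zone_start)
{
	if (toy_pageblock_aligned(pfn) ||
	    (check_zone_start && pfn == z->zone_start_pfn))
		z->first_block_skip = true;
}

/* Count find/scan rounds until the finder gives up, capped at max_rounds. */
unsigned int toy_compact_rounds(unsigned long zone_start_pfn,
				bool finder_sets_skip, bool check_zone_start,
				unsigned int max_rounds)
{
	struct toy_zone z = { .zone_start_pfn = zone_start_pfn };
	unsigned long pfn;
	unsigned int rounds = 0;

	while (rounds < max_rounds &&
	       toy_find_block(&z, &pfn, finder_sets_skip)) {
		toy_scan_block(&z, pfn, check_zone_start);
		rounds++;
	}
	return rounds;
}
```

In this model, a zone starting at pfn 1 only hits the round cap when
neither the finder nor the scanner sets the skip bit. That is the
combination corresponding to 90ed667c03fe without this patch.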

> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/compaction.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index dbc9f86b1934..eacca2794e47 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -912,11 +912,12 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  
>  		/*
>  		 * Check if the pageblock has already been marked skipped.
> -		 * Only the aligned PFN is checked as the caller isolates
> +		 * Only the first PFN is checked as the caller isolates
>  		 * COMPACT_CLUSTER_MAX at a time so the second call must
>  		 * not falsely conclude that the block should be skipped.
>  		 */
> -		if (!valid_page && pageblock_aligned(low_pfn)) {
> +		if (!valid_page && (pageblock_aligned(low_pfn) ||
> +				    low_pfn == cc->zone->zone_start_pfn)) {
>  			if (!isolation_suitable(cc, page)) {
>  				low_pfn = end_pfn;
>  				folio = NULL;
> @@ -2002,7 +2003,8 @@ static isolate_migrate_t isolate_migratepages(struct compact_control *cc)
>  		 * before making it "skip" so other compaction instances do
>  		 * not scan the same block.
>  		 */
> -		if (pageblock_aligned(low_pfn) &&
> +		if ((pageblock_aligned(low_pfn) ||
> +		     low_pfn == cc->zone->zone_start_pfn) &&
>  		    !fast_find_block && !isolation_suitable(cc, page))
>  			continue;
>

Mel Gorman Aug. 1, 2023, 9:08 a.m. UTC | #2
On Mon, Jul 31, 2023 at 01:24:50PM -0400, Johannes Weiner wrote:
> During stress testing, the following situation was observed:
> 
>      70 root      39  19       0      0      0 R 100.0   0.0 959:29.92 khugepaged
>  310936 root      20   0   84416  25620    512 R  99.7   1.5 642:37.22 hugealloc
> 
> Tracing shows isolate_migratepages_block() endlessly looping over the
> first block in the DMA zone:
> 
>        hugealloc-310936  [001] ..... 237297.415718: mm_compaction_finished: node=0 zone=DMA      order=9 ret=no_suitable_page
>        hugealloc-310936  [001] ..... 237297.415718: mm_compaction_isolate_migratepages: range=(0x1 ~ 0x400) nr_scanned=513 nr_taken=0
>        hugealloc-310936  [001] ..... 237297.415718: mm_compaction_finished: node=0 zone=DMA      order=9 ret=no_suitable_page
>        hugealloc-310936  [001] ..... 237297.415718: mm_compaction_isolate_migratepages: range=(0x1 ~ 0x400) nr_scanned=513 nr_taken=0
>        hugealloc-310936  [001] ..... 237297.415718: mm_compaction_finished: node=0 zone=DMA      order=9 ret=no_suitable_page
>        hugealloc-310936  [001] ..... 237297.415718: mm_compaction_isolate_migratepages: range=(0x1 ~ 0x400) nr_scanned=513 nr_taken=0
>        hugealloc-310936  [001] ..... 237297.415718: mm_compaction_finished: node=0 zone=DMA      order=9 ret=no_suitable_page
>        hugealloc-310936  [001] ..... 237297.415718: mm_compaction_isolate_migratepages: range=(0x1 ~ 0x400) nr_scanned=513 nr_taken=0
> 
> The problem is that the function tries to test and set the skip bit
> once on the block, to avoid skipping on its own skip-set, using
> pageblock_aligned() on the pfn as a test. But because this is the DMA
> zone which starts at pfn 1, this is never true for the first block,
> and the skip bit isn't set or tested at all. As a result,
> fast_find_migrateblock() returns the same pageblock over and over.
> 
> If the pfn isn't pageblock-aligned, also check if it's the start of
> the zone to ensure test-and-set-exactly-once on unaligned ranges.
> 
> Thanks to Vlastimil Babka for the help in debugging this.
> 
> Fixes: 90ed667c03fe ("Revert "Revert "mm/compaction: fix set skip in fast_find_migrateblock""")
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Mel Gorman <mgorman@techsingularity.net>

Baolin Wang Aug. 1, 2023, 11:03 a.m. UTC | #3
On 8/1/2023 1:24 AM, Johannes Weiner wrote:
> During stress testing, the following situation was observed:
> 
>       70 root      39  19       0      0      0 R 100.0   0.0 959:29.92 khugepaged
>   310936 root      20   0   84416  25620    512 R  99.7   1.5 642:37.22 hugealloc
> 
> Tracing shows isolate_migratepages_block() endlessly looping over the
> first block in the DMA zone:
> 
>         hugealloc-310936  [001] ..... 237297.415718: mm_compaction_finished: node=0 zone=DMA      order=9 ret=no_suitable_page
>         hugealloc-310936  [001] ..... 237297.415718: mm_compaction_isolate_migratepages: range=(0x1 ~ 0x400) nr_scanned=513 nr_taken=0
>         hugealloc-310936  [001] ..... 237297.415718: mm_compaction_finished: node=0 zone=DMA      order=9 ret=no_suitable_page
>         hugealloc-310936  [001] ..... 237297.415718: mm_compaction_isolate_migratepages: range=(0x1 ~ 0x400) nr_scanned=513 nr_taken=0
>         hugealloc-310936  [001] ..... 237297.415718: mm_compaction_finished: node=0 zone=DMA      order=9 ret=no_suitable_page
>         hugealloc-310936  [001] ..... 237297.415718: mm_compaction_isolate_migratepages: range=(0x1 ~ 0x400) nr_scanned=513 nr_taken=0
>         hugealloc-310936  [001] ..... 237297.415718: mm_compaction_finished: node=0 zone=DMA      order=9 ret=no_suitable_page
>         hugealloc-310936  [001] ..... 237297.415718: mm_compaction_isolate_migratepages: range=(0x1 ~ 0x400) nr_scanned=513 nr_taken=0
> 
> The problem is that the function tries to test and set the skip bit
> once on the block, to avoid skipping on its own skip-set, using
> pageblock_aligned() on the pfn as a test. But because this is the DMA
> zone which starts at pfn 1, this is never true for the first block,
> and the skip bit isn't set or tested at all. As a result,
> fast_find_migrateblock() returns the same pageblock over and over.
> 
> If the pfn isn't pageblock-aligned, also check if it's the start of
> the zone to ensure test-and-set-exactly-once on unaligned ranges.
> 
> Thanks to Vlastimil Babka for the help in debugging this.
> 
> Fixes: 90ed667c03fe ("Revert "Revert "mm/compaction: fix set skip in fast_find_migrateblock""")
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>

> ---
>   mm/compaction.c | 8 +++++---
>   1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index dbc9f86b1934..eacca2794e47 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -912,11 +912,12 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>   
>   		/*
>   		 * Check if the pageblock has already been marked skipped.
> -		 * Only the aligned PFN is checked as the caller isolates
> +		 * Only the first PFN is checked as the caller isolates
>   		 * COMPACT_CLUSTER_MAX at a time so the second call must
>   		 * not falsely conclude that the block should be skipped.
>   		 */
> -		if (!valid_page && pageblock_aligned(low_pfn)) {
> +		if (!valid_page && (pageblock_aligned(low_pfn) ||
> +				    low_pfn == cc->zone->zone_start_pfn)) {
>   			if (!isolation_suitable(cc, page)) {
>   				low_pfn = end_pfn;
>   				folio = NULL;
> @@ -2002,7 +2003,8 @@ static isolate_migrate_t isolate_migratepages(struct compact_control *cc)
>   		 * before making it "skip" so other compaction instances do
>   		 * not scan the same block.
>   		 */
> -		if (pageblock_aligned(low_pfn) &&
> +		if ((pageblock_aligned(low_pfn) ||
> +		     low_pfn == cc->zone->zone_start_pfn) &&
>   		    !fast_find_block && !isolation_suitable(cc, page))
>   			continue;
>

Patch

diff --git a/mm/compaction.c b/mm/compaction.c
index dbc9f86b1934..eacca2794e47 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -912,11 +912,12 @@  isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 
 		/*
 		 * Check if the pageblock has already been marked skipped.
-		 * Only the aligned PFN is checked as the caller isolates
+		 * Only the first PFN is checked as the caller isolates
 		 * COMPACT_CLUSTER_MAX at a time so the second call must
 		 * not falsely conclude that the block should be skipped.
 		 */
-		if (!valid_page && pageblock_aligned(low_pfn)) {
+		if (!valid_page && (pageblock_aligned(low_pfn) ||
+				    low_pfn == cc->zone->zone_start_pfn)) {
 			if (!isolation_suitable(cc, page)) {
 				low_pfn = end_pfn;
 				folio = NULL;
@@ -2002,7 +2003,8 @@  static isolate_migrate_t isolate_migratepages(struct compact_control *cc)
 		 * before making it "skip" so other compaction instances do
 		 * not scan the same block.
 		 */
-		if (pageblock_aligned(low_pfn) &&
+		if ((pageblock_aligned(low_pfn) ||
+		     low_pfn == cc->zone->zone_start_pfn) &&
 		    !fast_find_block && !isolation_suitable(cc, page))
 			continue;