mm/vmscan: Fix NR_ISOLATED_FILE corruption on 64-bit

Message ID 20201029032320.1448441-1-npiggin@gmail.com (mailing list archive)
State New, archived
Series mm/vmscan: Fix NR_ISOLATED_FILE corruption on 64-bit

Commit Message

Nicholas Piggin Oct. 29, 2020, 3:23 a.m. UTC
Previously, shrink_page_list() returned an unsigned long; negating it
produced a value that converted back to the correct negative signed long.
After commit 730ec8c01a2b ("mm/vmscan.c: change prototype for
shrink_page_list") changed the return type to unsigned int, the negated
value is a large positive unsigned int that converts to a large positive
signed long, corrupting the NR_ISOLATED_FILE counter.

Symptoms include CMA allocations hanging forever holding the cma_mutex
due to alloc_contig_range->...->isolate_migratepages_block waiting
forever in "while (unlikely(too_many_isolated(pgdat)))".

Cc: linux-mm@kvack.org
Cc: Maninder Singh <maninder1.s@samsung.com>
Cc: Vaneet Narang <v.narang@samsung.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Amit Sahrawat <a.sahrawat@samsung.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Fixes: 730ec8c01a2b ("mm/vmscan.c: change prototype for shrink_page_list")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 mm/vmscan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Michal Hocko Oct. 29, 2020, 1:11 p.m. UTC | #1
On Thu 29-10-20 13:23:20, Nicholas Piggin wrote:
> Previously the negated unsigned long would be cast back to signed long
> which would have the correct negative value. After commit 730ec8c01a2b
> ("mm/vmscan.c: change prototype for shrink_page_list"), the large
> unsigned int converts to a large positive signed long.
> 
> Symptoms include CMA allocations hanging forever holding the cma_mutex
> due to alloc_contig_range->...->isolate_migratepages_block waiting
> forever in "while (unlikely(too_many_isolated(pgdat)))".

Nasty.

> Cc: linux-mm@kvack.org
> Cc: Maninder Singh <maninder1.s@samsung.com>
> Cc: Vaneet Narang <v.narang@samsung.com>
> Cc: Maninder Singh <maninder1.s@samsung.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Amit Sahrawat <a.sahrawat@samsung.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Fixes: 730ec8c01a2b ("mm/vmscan.c: change prototype for shrink_page_list")
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  mm/vmscan.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 1b8f0e059767..92c507bacf09 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1516,7 +1516,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
>  	nr_reclaimed = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
>  			TTU_IGNORE_ACCESS, &stat, true);
>  	list_splice(&clean_pages, page_list);
> -	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -nr_reclaimed);
> +	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -(long)nr_reclaimed);
>  	/*
>  	 * Since lazyfree pages are isolated from file LRU from the beginning,
>  	 * they will rotate back to anonymous LRU in the end if it failed to

You want the same also for -stat.nr_lazyfree_fail right?
Andrew Morton Oct. 30, 2020, 5:25 a.m. UTC | #2
On Thu, 29 Oct 2020 14:11:03 +0100 Michal Hocko <mhocko@suse.com> wrote:

> On Thu 29-10-20 13:23:20, Nicholas Piggin wrote:
> > Previously the negated unsigned long would be cast back to signed long
> > which would have the correct negative value. After commit 730ec8c01a2b
> > ("mm/vmscan.c: change prototype for shrink_page_list"), the large
> > unsigned int converts to a large positive signed long.
> > 
> > Symptoms include CMA allocations hanging forever holding the cma_mutex
> > due to alloc_contig_range->...->isolate_migratepages_block waiting
> > forever in "while (unlikely(too_many_isolated(pgdat)))".
> 
> Nasty.
> 
> > Cc: linux-mm@kvack.org
> > Cc: Maninder Singh <maninder1.s@samsung.com>
> > Cc: Vaneet Narang <v.narang@samsung.com>
> > Cc: Maninder Singh <maninder1.s@samsung.com>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: Amit Sahrawat <a.sahrawat@samsung.com>
> > Cc: Mel Gorman <mgorman@suse.de>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Fixes: 730ec8c01a2b ("mm/vmscan.c: change prototype for shrink_page_list")
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> >  mm/vmscan.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 1b8f0e059767..92c507bacf09 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1516,7 +1516,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
> >  	nr_reclaimed = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
> >  			TTU_IGNORE_ACCESS, &stat, true);
> >  	list_splice(&clean_pages, page_list);
> > -	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -nr_reclaimed);
> > +	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -(long)nr_reclaimed);
> >  	/*
> >  	 * Since lazyfree pages are isolated from file LRU from the beginning,
> >  	 * they will rotate back to anonymous LRU in the end if it failed to
> 
> You want the same also for -stat.nr_lazyfree_fail right?

I did the below, and added a cc:stable.

--- a/mm/vmscan.c~mm-vmscan-fix-nr_isolated_file-corruption-on-64-bit-fix
+++ a/mm/vmscan.c
@@ -1516,7 +1516,8 @@ unsigned int reclaim_clean_pages_from_li
 	nr_reclaimed = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
 			TTU_IGNORE_ACCESS, &stat, true);
 	list_splice(&clean_pages, page_list);
-	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -(long)nr_reclaimed);
+	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE,
+			    -(long)nr_reclaimed);
 	/*
 	 * Since lazyfree pages are isolated from file LRU from the beginning,
 	 * they will rotate back to anonymous LRU in the end if it failed to
@@ -1526,7 +1527,7 @@ unsigned int reclaim_clean_pages_from_li
 	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_ANON,
 			    stat.nr_lazyfree_fail);
 	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE,
-			    -stat.nr_lazyfree_fail);
+			    -(long)stat.nr_lazyfree_fail);
 	return nr_reclaimed;
 }
Michal Hocko Oct. 30, 2020, 7 a.m. UTC | #3
On Thu 29-10-20 22:25:26, Andrew Morton wrote:
> On Thu, 29 Oct 2020 14:11:03 +0100 Michal Hocko <mhocko@suse.com> wrote:
> 
> > On Thu 29-10-20 13:23:20, Nicholas Piggin wrote:
> > > Previously the negated unsigned long would be cast back to signed long
> > > which would have the correct negative value. After commit 730ec8c01a2b
> > > ("mm/vmscan.c: change prototype for shrink_page_list"), the large
> > > unsigned int converts to a large positive signed long.
> > > 
> > > Symptoms include CMA allocations hanging forever holding the cma_mutex
> > > due to alloc_contig_range->...->isolate_migratepages_block waiting
> > > forever in "while (unlikely(too_many_isolated(pgdat)))".
> > 
> > Nasty.
> > 
> > > Cc: linux-mm@kvack.org
> > > Cc: Maninder Singh <maninder1.s@samsung.com>
> > > Cc: Vaneet Narang <v.narang@samsung.com>
> > > Cc: Maninder Singh <maninder1.s@samsung.com>
> > > Cc: Michal Hocko <mhocko@suse.com>
> > > Cc: Amit Sahrawat <a.sahrawat@samsung.com>
> > > Cc: Mel Gorman <mgorman@suse.de>
> > > Cc: Vlastimil Babka <vbabka@suse.cz>
> > > Fixes: 730ec8c01a2b ("mm/vmscan.c: change prototype for shrink_page_list")
> > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > > ---
> > >  mm/vmscan.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > index 1b8f0e059767..92c507bacf09 100644
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -1516,7 +1516,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
> > >  	nr_reclaimed = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
> > >  			TTU_IGNORE_ACCESS, &stat, true);
> > >  	list_splice(&clean_pages, page_list);
> > > -	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -nr_reclaimed);
> > > +	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -(long)nr_reclaimed);
> > >  	/*
> > >  	 * Since lazyfree pages are isolated from file LRU from the beginning,
> > >  	 * they will rotate back to anonymous LRU in the end if it failed to
> > 
> > You want the same also for -stat.nr_lazyfree_fail right?
> 
> I did the below, and added a cc:stable.
> 
> --- a/mm/vmscan.c~mm-vmscan-fix-nr_isolated_file-corruption-on-64-bit-fix
> +++ a/mm/vmscan.c
> @@ -1516,7 +1516,8 @@ unsigned int reclaim_clean_pages_from_li
>  	nr_reclaimed = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
>  			TTU_IGNORE_ACCESS, &stat, true);
>  	list_splice(&clean_pages, page_list);
> -	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -(long)nr_reclaimed);
> +	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE,
> +			    -(long)nr_reclaimed);
>  	/*
>  	 * Since lazyfree pages are isolated from file LRU from the beginning,
>  	 * they will rotate back to anonymous LRU in the end if it failed to
> @@ -1526,7 +1527,7 @@ unsigned int reclaim_clean_pages_from_li
>  	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_ANON,
>  			    stat.nr_lazyfree_fail);
>  	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE,
> -			    -stat.nr_lazyfree_fail);
> +			    -(long)stat.nr_lazyfree_fail);
>  	return nr_reclaimed;
>  }

Acked-by: Michal Hocko <mhocko@suse.com>

Btw, the regular reclaim paths are OK because nr_taken is an unsigned long.

Btw #2: I was quite surprised by the inconsistency in the !SMP case, where
__mod_node_page_state uses int for the delta.

Thanks!

Patch

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1b8f0e059767..92c507bacf09 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1516,7 +1516,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 	nr_reclaimed = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
 			TTU_IGNORE_ACCESS, &stat, true);
 	list_splice(&clean_pages, page_list);
-	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -nr_reclaimed);
+	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -(long)nr_reclaimed);
 	/*
 	 * Since lazyfree pages are isolated from file LRU from the beginning,
 	 * they will rotate back to anonymous LRU in the end if it failed to