Message ID | 20201029032320.1448441-1-npiggin@gmail.com (mailing list archive)
---|---
State | New, archived
Series | mm/vmscan: Fix NR_ISOLATED_FILE corruption on 64-bit
On Thu 29-10-20 13:23:20, Nicholas Piggin wrote:
> Previously the negated unsigned long would be cast back to signed long
> which would have the correct negative value. After commit 730ec8c01a2b
> ("mm/vmscan.c: change prototype for shrink_page_list"), the large
> unsigned int converts to a large positive signed long.
>
> Symptoms include CMA allocations hanging forever holding the cma_mutex
> due to alloc_contig_range->...->isolate_migratepages_block waiting
> forever in "while (unlikely(too_many_isolated(pgdat)))".

Nasty.

> Cc: linux-mm@kvack.org
> Cc: Maninder Singh <maninder1.s@samsung.com>
> Cc: Vaneet Narang <v.narang@samsung.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Amit Sahrawat <a.sahrawat@samsung.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Fixes: 730ec8c01a2b ("mm/vmscan.c: change prototype for shrink_page_list")
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  mm/vmscan.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 1b8f0e059767..92c507bacf09 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1516,7 +1516,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
>  	nr_reclaimed = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
>  					TTU_IGNORE_ACCESS, &stat, true);
>  	list_splice(&clean_pages, page_list);
> -	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -nr_reclaimed);
> +	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -(long)nr_reclaimed);
>  	/*
>  	 * Since lazyfree pages are isolated from file LRU from the beginning,
>  	 * they will rotate back to anonymous LRU in the end if it failed to

You want the same also for -stat.nr_lazyfree_fail right?
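A note on the type hazard under discussion: negating an unsigned value happens
in that value's own width, so whether the delta comes out negative depends on
the width it is later converted to. Below is a minimal userspace sketch
(illustrative only, not kernel code; the out-of-range conversions shown give
the usual two's-complement results on LP64 Linux targets) of why the old
unsigned long return of shrink_page_list() happened to work while the new
unsigned int return does not:

#include <stdio.h>

int main(void)
{
	unsigned long nr_ulong = 5;	/* return type before 730ec8c01a2b */
	unsigned int  nr_uint  = 5;	/* return type after 730ec8c01a2b */

	/*
	 * -nr_ulong wraps to 0xfffffffffffffffb; converting that back to a
	 * 64-bit signed long wraps again to -5, so the old code produced
	 * the intended negative delta.
	 */
	long old_delta = -nr_ulong;

	/*
	 * -nr_uint wraps to 0xfffffffb as unsigned int; widening that to a
	 * 64-bit long gives +4294967291, corrupting the counter.
	 */
	long new_delta = -nr_uint;

	/* The fix: widen to long first, then negate in signed arithmetic. */
	long fixed_delta = -(long)nr_uint;

	printf("%ld %ld %ld\n", old_delta, new_delta, fixed_delta);
	/* Prints on LP64: -5 4294967291 -5 */
	return 0;
}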
On Thu, 29 Oct 2020 14:11:03 +0100 Michal Hocko <mhocko@suse.com> wrote:

> On Thu 29-10-20 13:23:20, Nicholas Piggin wrote:
> > Previously the negated unsigned long would be cast back to signed long
> > which would have the correct negative value. After commit 730ec8c01a2b
> > ("mm/vmscan.c: change prototype for shrink_page_list"), the large
> > unsigned int converts to a large positive signed long.
> >
> > Symptoms include CMA allocations hanging forever holding the cma_mutex
> > due to alloc_contig_range->...->isolate_migratepages_block waiting
> > forever in "while (unlikely(too_many_isolated(pgdat)))".
>
> Nasty.
>
> > Cc: linux-mm@kvack.org
> > Cc: Maninder Singh <maninder1.s@samsung.com>
> > Cc: Vaneet Narang <v.narang@samsung.com>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: Amit Sahrawat <a.sahrawat@samsung.com>
> > Cc: Mel Gorman <mgorman@suse.de>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Fixes: 730ec8c01a2b ("mm/vmscan.c: change prototype for shrink_page_list")
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> >  mm/vmscan.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 1b8f0e059767..92c507bacf09 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1516,7 +1516,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
> >  	nr_reclaimed = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
> >  					TTU_IGNORE_ACCESS, &stat, true);
> >  	list_splice(&clean_pages, page_list);
> > -	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -nr_reclaimed);
> > +	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -(long)nr_reclaimed);
> >  	/*
> >  	 * Since lazyfree pages are isolated from file LRU from the beginning,
> >  	 * they will rotate back to anonymous LRU in the end if it failed to
>
> You want the same also for -stat.nr_lazyfree_fail right?

I did the below, and added a cc:stable.

--- a/mm/vmscan.c~mm-vmscan-fix-nr_isolated_file-corruption-on-64-bit-fix
+++ a/mm/vmscan.c
@@ -1516,7 +1516,8 @@ unsigned int reclaim_clean_pages_from_li
 	nr_reclaimed = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
 					TTU_IGNORE_ACCESS, &stat, true);
 	list_splice(&clean_pages, page_list);
-	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -(long)nr_reclaimed);
+	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE,
+			    -(long)nr_reclaimed);
 	/*
 	 * Since lazyfree pages are isolated from file LRU from the beginning,
 	 * they will rotate back to anonymous LRU in the end if it failed to
@@ -1526,7 +1527,7 @@ unsigned int reclaim_clean_pages_from_li
 	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_ANON,
 			    stat.nr_lazyfree_fail);
 	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE,
-			    -stat.nr_lazyfree_fail);
+			    -(long)stat.nr_lazyfree_fail);
 	return nr_reclaimed;
 }
On Thu 29-10-20 22:25:26, Andrew Morton wrote:
> On Thu, 29 Oct 2020 14:11:03 +0100 Michal Hocko <mhocko@suse.com> wrote:
>
> > On Thu 29-10-20 13:23:20, Nicholas Piggin wrote:
> > > Previously the negated unsigned long would be cast back to signed long
> > > which would have the correct negative value. After commit 730ec8c01a2b
> > > ("mm/vmscan.c: change prototype for shrink_page_list"), the large
> > > unsigned int converts to a large positive signed long.
> > >
> > > Symptoms include CMA allocations hanging forever holding the cma_mutex
> > > due to alloc_contig_range->...->isolate_migratepages_block waiting
> > > forever in "while (unlikely(too_many_isolated(pgdat)))".
> >
> > Nasty.
> >
> > > Cc: linux-mm@kvack.org
> > > Cc: Maninder Singh <maninder1.s@samsung.com>
> > > Cc: Vaneet Narang <v.narang@samsung.com>
> > > Cc: Michal Hocko <mhocko@suse.com>
> > > Cc: Amit Sahrawat <a.sahrawat@samsung.com>
> > > Cc: Mel Gorman <mgorman@suse.de>
> > > Cc: Vlastimil Babka <vbabka@suse.cz>
> > > Fixes: 730ec8c01a2b ("mm/vmscan.c: change prototype for shrink_page_list")
> > > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > > ---
> > >  mm/vmscan.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > index 1b8f0e059767..92c507bacf09 100644
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -1516,7 +1516,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
> > >  	nr_reclaimed = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
> > >  					TTU_IGNORE_ACCESS, &stat, true);
> > >  	list_splice(&clean_pages, page_list);
> > > -	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -nr_reclaimed);
> > > +	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -(long)nr_reclaimed);
> > >  	/*
> > >  	 * Since lazyfree pages are isolated from file LRU from the beginning,
> > >  	 * they will rotate back to anonymous LRU in the end if it failed to
> >
> > You want the same also for -stat.nr_lazyfree_fail right?
>
> I did the below, and added a cc:stable.
>
> --- a/mm/vmscan.c~mm-vmscan-fix-nr_isolated_file-corruption-on-64-bit-fix
> +++ a/mm/vmscan.c
> @@ -1516,7 +1516,8 @@ unsigned int reclaim_clean_pages_from_li
>  	nr_reclaimed = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
>  					TTU_IGNORE_ACCESS, &stat, true);
>  	list_splice(&clean_pages, page_list);
> -	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -(long)nr_reclaimed);
> +	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE,
> +			    -(long)nr_reclaimed);
>  	/*
>  	 * Since lazyfree pages are isolated from file LRU from the beginning,
>  	 * they will rotate back to anonymous LRU in the end if it failed to
> @@ -1526,7 +1527,7 @@ unsigned int reclaim_clean_pages_from_li
>  	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_ANON,
>  			    stat.nr_lazyfree_fail);
>  	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE,
> -			    -stat.nr_lazyfree_fail);
> +			    -(long)stat.nr_lazyfree_fail);
>  	return nr_reclaimed;
>  }

Acked-by: Michal Hocko <mhocko@suse.com>

Btw. the regular reclaim paths are OK because nr_taken is unsigned long.

Btw #2: I was quite surprised by the inconsistency in the !SMP case, where
__mod_node_page_state uses int for the delta.

Thanks!
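Regarding that second aside: the surprise is that the delta parameter does not
have the same width on both configurations — long on SMP, int on !SMP, per
Michal's reading. A hypothetical, self-contained sketch (the names and bodies
below are stand-ins, not the kernel's actual definitions) of why a narrower
delta parameter is a trap in principle:

#include <stdio.h>

static long counter;

/* Stand-in for a stat updater taking a long delta (the SMP-style interface). */
static void mod_state_long(long delta) { counter += delta; }

/* Stand-in for one taking an int delta (the reported !SMP case). */
static void mod_state_int(int delta) { counter += delta; }

int main(void)
{
	long big = 1L << 33;	/* a delta that does not fit in 32 bits */

	counter = 0;
	mod_state_long(big);
	printf("long delta: %ld\n", counter);	/* 8589934592 */

	counter = 0;
	mod_state_int(big);	/* implicitly truncated: 2^33 mod 2^32 == 0 */
	printf("int delta:  %ld\n", counter);	/* 0 on LP64 */

	return 0;
}

In practice per-node page-count deltas stay far below 2^31, which is
presumably why the asymmetry has gone unnoticed rather than caused bugs.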
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1b8f0e059767..92c507bacf09 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1516,7 +1516,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
 	nr_reclaimed = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
 					TTU_IGNORE_ACCESS, &stat, true);
 	list_splice(&clean_pages, page_list);
-	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -nr_reclaimed);
+	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -(long)nr_reclaimed);
 	/*
 	 * Since lazyfree pages are isolated from file LRU from the beginning,
 	 * they will rotate back to anonymous LRU in the end if it failed to
Previously the negated unsigned long would be cast back to signed long
which would have the correct negative value. After commit 730ec8c01a2b
("mm/vmscan.c: change prototype for shrink_page_list"), the large
unsigned int converts to a large positive signed long.

Symptoms include CMA allocations hanging forever holding the cma_mutex
due to alloc_contig_range->...->isolate_migratepages_block waiting
forever in "while (unlikely(too_many_isolated(pgdat)))".

Cc: linux-mm@kvack.org
Cc: Maninder Singh <maninder1.s@samsung.com>
Cc: Vaneet Narang <v.narang@samsung.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Amit Sahrawat <a.sahrawat@samsung.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Fixes: 730ec8c01a2b ("mm/vmscan.c: change prototype for shrink_page_list")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 mm/vmscan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
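For context on the reported hang: the compaction path backs off while
too_many_isolated() is true, comparing the NR_ISOLATED_* counters against the
LRU sizes. Once NR_ISOLATED_FILE has been corrupted to a huge positive value,
that check can never become false, and the CMA allocation spins while holding
cma_mutex. A paraphrased sketch of the check (reconstructed from memory of
mm/compaction.c around this era; the exact kernel code may differ):

/* Paraphrase of the compaction-side check; not verbatim kernel code. */
static bool too_many_isolated(pg_data_t *pgdat)
{
	unsigned long active, inactive, isolated;

	inactive = node_page_state(pgdat, NR_INACTIVE_FILE) +
		   node_page_state(pgdat, NR_INACTIVE_ANON);
	active   = node_page_state(pgdat, NR_ACTIVE_FILE) +
		   node_page_state(pgdat, NR_ACTIVE_ANON);
	isolated = node_page_state(pgdat, NR_ISOLATED_FILE) +
		   node_page_state(pgdat, NR_ISOLATED_ANON);

	/* A corrupted, huge NR_ISOLATED_FILE keeps this true forever. */
	return isolated > (inactive + active) / 2;
}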