diff mbox series

mm/vmscan: skip layzfree page on reclaim_clean_pages_from_list

Message ID 20200416033514.6366-1-jaewon31.kim@samsung.com (mailing list archive)
State New, archived
Headers show
Series mm/vmscan: skip layzfree page on reclaim_clean_pages_from_list | expand

Commit Message

Jaewon Kim April 16, 2020, 3:35 a.m. UTC
This patch fix nr_isolate_* mismatch problem between cma and dirty
lazyfree page.

If try_to_unmap_one is used for reclaim and it detects a dirty lazyfree
page, then the lazyfree page is changed to a normal anon page having
SwapBacked by commit 18863d3a3f59 ("mm: remove SWAP_DIRTY in ttu"). Even
with the change, reclaim context correctly counts isolated files because
it uses is_file_lru to distinguish file. And the change to anon is not
happened if try_to_unmap_one is used for migration. So migration context
like compaction also correctly counts isolated files even though it uses
page_is_file_lru insted of is_file_lru. Recently page_is_file_cache was
renamed to page_is_file_lru by commit 9de4f22a60f7 ("mm: code cleanup for
MADV_FREE").

But the nr_isolate_* mismatch problem happens on cma alloc. There is
reclaim_clean_pages_from_list which is being used only by cma. It was
introduced by commit 02c6de8d757c ("mm: cma: discard clean pages during
contiguous allocation instead of migration") to reclaim clean file pages
without migration. The cma alloc uses both reclaim_clean_pages_from_list
and migrate_pages, and it uses page_is_file_lru to count isolated
files. If there are dirty lazyfree pages allocated from cma memory
region, the pages are counted as isolated file at the beginging but are
counted as isolated anon after finished.

Mem-Info:
Node 0 active_anon:3045904kB inactive_anon:611448kB active_file:14892kB inactive_file:205636kB unevictable:10416kB isolated(anon):0kB isolated(file):37664kB mapped:630216kB dirty:384kB writeback:0kB shmem:42576kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no

Like log above, there was too much isolated file, 37664kB, which
triggers too_many_isolated in reclaim when there is no isolated file in
system wide. It could be reproducible by running two programs, doing
MADV_FREE, writing and doing cma alloc, respectively. Although isolated
anon is 0, I found that the internal value of isolated anon was the
negative value of isolated file.

Fix this by skipping anon pages on reclaim_clean_pages_from_list. The
lazyfree page can be checked by both PageAnon(page) and
page_is_file_lru. But in this case, PageAnon is enough to skip all
anon pages.

Reported-by: Yong-Taek Lee <ytk.lee@samsung.com>
Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
---
 mm/vmscan.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Jaewon Kim April 16, 2020, 5:17 a.m. UTC | #1
On 2020년 04월 16일 12:35, Jaewon Kim wrote:
> This patch fix nr_isolate_* mismatch problem between cma and dirty
> lazyfree page.
>
> If try_to_unmap_one is used for reclaim and it detects a dirty lazyfree
> page, then the lazyfree page is changed to a normal anon page having
> SwapBacked by commit 18863d3a3f59 ("mm: remove SWAP_DIRTY in ttu"). Even
> with the change, reclaim context correctly counts isolated files because
Sorry, I think I pointed a wrong commit, the SwapBacked was recovered
by commit 802a3a92ad7a ("mm: reclaim MADV_FREE pages").
> it uses is_file_lru to distinguish file. And the change to anon is not
> happened if try_to_unmap_one is used for migration. So migration context
> like compaction also correctly counts isolated files even though it uses
> page_is_file_lru insted of is_file_lru. Recently page_is_file_cache was
> renamed to page_is_file_lru by commit 9de4f22a60f7 ("mm: code cleanup for
> MADV_FREE").
>
> But the nr_isolate_* mismatch problem happens on cma alloc. There is
> reclaim_clean_pages_from_list which is being used only by cma. It was
> introduced by commit 02c6de8d757c ("mm: cma: discard clean pages during
> contiguous allocation instead of migration") to reclaim clean file pages
> without migration. The cma alloc uses both reclaim_clean_pages_from_list
> and migrate_pages, and it uses page_is_file_lru to count isolated
> files. If there are dirty lazyfree pages allocated from cma memory
> region, the pages are counted as isolated file at the beginging but are
> counted as isolated anon after finished.
>
> Mem-Info:
> Node 0 active_anon:3045904kB inactive_anon:611448kB active_file:14892kB inactive_file:205636kB unevictable:10416kB isolated(anon):0kB isolated(file):37664kB mapped:630216kB dirty:384kB writeback:0kB shmem:42576kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
>
> Like log above, there was too much isolated file, 37664kB, which
> triggers too_many_isolated in reclaim when there is no isolated file in
> system wide. It could be reproducible by running two programs, doing
> MADV_FREE, writing and doing cma alloc, respectively. Although isolated
> anon is 0, I found that the internal value of isolated anon was the
> negative value of isolated file.
>
> Fix this by skipping anon pages on reclaim_clean_pages_from_list. The
> lazyfree page can be checked by both PageAnon(page) and
> page_is_file_lru. But in this case, PageAnon is enough to skip all
> anon pages.
>
> Reported-by: Yong-Taek Lee <ytk.lee@samsung.com>
> Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
> ---
>  mm/vmscan.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index b06868fc4926..9380a18eef5e 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1497,6 +1497,9 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
>  	LIST_HEAD(clean_pages);
>  
>  	list_for_each_entry_safe(page, next, page_list, lru) {
> +		/* to avoid race with MADV_FREE anon page */
> +		if (PageAnon(page))
> +			continue;
>  		if (page_is_file_lru(page) && !PageDirty(page) &&
>  		    !__PageMovable(page) && !PageUnevictable(page)) {
>  			ClearPageActive(page);
Minchan Kim April 17, 2020, 12:38 a.m. UTC | #2
Hi Jaewon,

On Thu, Apr 16, 2020 at 12:35:14PM +0900, Jaewon Kim wrote:
> This patch fix nr_isolate_* mismatch problem between cma and dirty
> lazyfree page.
> 
> If try_to_unmap_one is used for reclaim and it detects a dirty lazyfree
> page, then the lazyfree page is changed to a normal anon page having
> SwapBacked by commit 18863d3a3f59 ("mm: remove SWAP_DIRTY in ttu"). Even
> with the change, reclaim context correctly counts isolated files because
> it uses is_file_lru to distinguish file. And the change to anon is not
> happened if try_to_unmap_one is used for migration. So migration context
> like compaction also correctly counts isolated files even though it uses
> page_is_file_lru insted of is_file_lru. Recently page_is_file_cache was
> renamed to page_is_file_lru by commit 9de4f22a60f7 ("mm: code cleanup for
> MADV_FREE").
> 
> But the nr_isolate_* mismatch problem happens on cma alloc. There is
> reclaim_clean_pages_from_list which is being used only by cma. It was
> introduced by commit 02c6de8d757c ("mm: cma: discard clean pages during
> contiguous allocation instead of migration") to reclaim clean file pages
> without migration. The cma alloc uses both reclaim_clean_pages_from_list
> and migrate_pages, and it uses page_is_file_lru to count isolated
> files. If there are dirty lazyfree pages allocated from cma memory
> region, the pages are counted as isolated file at the beginging but are
> counted as isolated anon after finished.
> 
> Mem-Info:
> Node 0 active_anon:3045904kB inactive_anon:611448kB active_file:14892kB inactive_file:205636kB unevictable:10416kB isolated(anon):0kB isolated(file):37664kB mapped:630216kB dirty:384kB writeback:0kB shmem:42576kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
> 
> Like log above, there was too much isolated file, 37664kB, which
> triggers too_many_isolated in reclaim when there is no isolated file in
> system wide. It could be reproducible by running two programs, doing
> MADV_FREE, writing and doing cma alloc, respectively. Although isolated
> anon is 0, I found that the internal value of isolated anon was the
> negative value of isolated file.
> 
> Fix this by skipping anon pages on reclaim_clean_pages_from_list. The
> lazyfree page can be checked by both PageAnon(page) and
> page_is_file_lru. But in this case, PageAnon is enough to skip all
> anon pages.
> 
> Reported-by: Yong-Taek Lee <ytk.lee@samsung.com>
> Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>

Thanks for the investigation!
The thing is MADV_FREEed page since supporting swapless could change
his LRU status between reclaim.

I am worry about voiding the optimization we have kept in CMA but
also don't have good idea, either so I tend to agree with this.

Let me Cc Johannes who might have better idea.

> ---
>  mm/vmscan.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index b06868fc4926..9380a18eef5e 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1497,6 +1497,9 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
>  	LIST_HEAD(clean_pages);
>  
>  	list_for_each_entry_safe(page, next, page_list, lru) {
> +		/* to avoid race with MADV_FREE anon page */
> +		if (PageAnon(page))
> +			continue;
>  		if (page_is_file_lru(page) && !PageDirty(page) &&
>  		    !__PageMovable(page) && !PageUnevictable(page)) {
>  			ClearPageActive(page);
> -- 
> 2.13.7
>
Minchan Kim April 17, 2020, 3:13 p.m. UTC | #3
On Thu, Apr 16, 2020 at 05:38:37PM -0700, Minchan Kim wrote:
> Hi Jaewon,
> 
> On Thu, Apr 16, 2020 at 12:35:14PM +0900, Jaewon Kim wrote:
> > This patch fix nr_isolate_* mismatch problem between cma and dirty
> > lazyfree page.
> > 
> > If try_to_unmap_one is used for reclaim and it detects a dirty lazyfree
> > page, then the lazyfree page is changed to a normal anon page having
> > SwapBacked by commit 18863d3a3f59 ("mm: remove SWAP_DIRTY in ttu"). Even
> > with the change, reclaim context correctly counts isolated files because
> > it uses is_file_lru to distinguish file. And the change to anon is not
> > happened if try_to_unmap_one is used for migration. So migration context
> > like compaction also correctly counts isolated files even though it uses
> > page_is_file_lru insted of is_file_lru. Recently page_is_file_cache was
> > renamed to page_is_file_lru by commit 9de4f22a60f7 ("mm: code cleanup for
> > MADV_FREE").
> > 
> > But the nr_isolate_* mismatch problem happens on cma alloc. There is
> > reclaim_clean_pages_from_list which is being used only by cma. It was
> > introduced by commit 02c6de8d757c ("mm: cma: discard clean pages during
> > contiguous allocation instead of migration") to reclaim clean file pages
> > without migration. The cma alloc uses both reclaim_clean_pages_from_list
> > and migrate_pages, and it uses page_is_file_lru to count isolated
> > files. If there are dirty lazyfree pages allocated from cma memory
> > region, the pages are counted as isolated file at the beginging but are
> > counted as isolated anon after finished.
> > 
> > Mem-Info:
> > Node 0 active_anon:3045904kB inactive_anon:611448kB active_file:14892kB inactive_file:205636kB unevictable:10416kB isolated(anon):0kB isolated(file):37664kB mapped:630216kB dirty:384kB writeback:0kB shmem:42576kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
> > 
> > Like log above, there was too much isolated file, 37664kB, which
> > triggers too_many_isolated in reclaim when there is no isolated file in
> > system wide. It could be reproducible by running two programs, doing
> > MADV_FREE, writing and doing cma alloc, respectively. Although isolated
> > anon is 0, I found that the internal value of isolated anon was the
> > negative value of isolated file.
> > 
> > Fix this by skipping anon pages on reclaim_clean_pages_from_list. The
> > lazyfree page can be checked by both PageAnon(page) and
> > page_is_file_lru. But in this case, PageAnon is enough to skip all
> > anon pages.
> > 
> > Reported-by: Yong-Taek Lee <ytk.lee@samsung.com>
> > Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
> 
> Thanks for the investigation!
> The thing is MADV_FREEed page since supporting swapless could change
> his LRU status between reclaim.
> 
> I am worry about voiding the optimization we have kept in CMA but
> also don't have good idea, either so I tend to agree with this.
> 
> Let me Cc Johannes who might have better idea.
> 
> > ---
> >  mm/vmscan.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index b06868fc4926..9380a18eef5e 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1497,6 +1497,9 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
> >  	LIST_HEAD(clean_pages);
> >  
> >  	list_for_each_entry_safe(page, next, page_list, lru) {
> > +		/* to avoid race with MADV_FREE anon page */
> > +		if (PageAnon(page))
> > +			continue;
> >  		if (page_is_file_lru(page) && !PageDirty(page) &&
> >  		    !__PageMovable(page) && !PageUnevictable(page)) {
> >  			ClearPageActive(page);
> > -- 
> > 2.13.7
> > 

Hi Jaewon,

How about this idea? I think it could solve the issue with keeping
CMA alloc latency optimization.

diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 292485f3d24d..10cc932e209a 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -29,6 +29,7 @@ struct reclaim_stat {
 	unsigned nr_activate[2];
 	unsigned nr_ref_keep;
 	unsigned nr_unmap_fail;
+	unsigned nr_lazyfree_fail;
 };
 
 enum writeback_stat_item {
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4c8a1cdccbba..b390f6094f2f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1296,11 +1296,15 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		 */
 		if (page_mapped(page)) {
 			enum ttu_flags flags = ttu_flags | TTU_BATCH_FLUSH;
+			bool lazyfree = PageAnon(page) && !PageSwapBacked(page);
 
 			if (unlikely(PageTransHuge(page)))
 				flags |= TTU_SPLIT_HUGE_PMD;
+
 			if (!try_to_unmap(page, flags)) {
 				stat->nr_unmap_fail += nr_pages;
+				if (lazyfree && PageSwapBacked(page))
+					stat->nr_lazyfree_fail += nr_pages;
 				goto activate_locked;
 			}
 		}
@@ -1492,8 +1496,8 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
 		.priority = DEF_PRIORITY,
 		.may_unmap = 1,
 	};
-	struct reclaim_stat dummy_stat;
-	unsigned long ret;
+	struct reclaim_stat stat;
+	unsigned long reclaimed;
 	struct page *page, *next;
 	LIST_HEAD(clean_pages);
 
@@ -1505,11 +1509,21 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
 		}
 	}
 
-	ret = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
-			TTU_IGNORE_ACCESS, &dummy_stat, true);
+	reclaimed = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
+			TTU_IGNORE_ACCESS, &stat, true);
 	list_splice(&clean_pages, page_list);
-	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -ret);
-	return ret;
+	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -reclaimed);
+	/*
+	 * Since lazyfree pages are isolated from file LRU from the beginning,
+	 * they will rotate back to anonymous LRU in the end if it failed to
+	 * discard so isolated count will be mismatched.
+	 * Compensate the isolated count for both LRU lists.
+	 */
+	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_ANON,
+					stat->nr_lazyfree_fail);
+	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE,
+					-stat->nr_lazyfree_fail);
+	return reclaimed;
 }
 
 /*
Jaewon Kim April 17, 2020, 11:45 p.m. UTC | #4
On 2020년 04월 18일 00:13, Minchan Kim wrote:
> On Thu, Apr 16, 2020 at 05:38:37PM -0700, Minchan Kim wrote:
>> Hi Jaewon,
>>
>> On Thu, Apr 16, 2020 at 12:35:14PM +0900, Jaewon Kim wrote:
>>> This patch fix nr_isolate_* mismatch problem between cma and dirty
>>> lazyfree page.
>>>
>>> If try_to_unmap_one is used for reclaim and it detects a dirty lazyfree
>>> page, then the lazyfree page is changed to a normal anon page having
>>> SwapBacked by commit 18863d3a3f59 ("mm: remove SWAP_DIRTY in ttu"). Even
>>> with the change, reclaim context correctly counts isolated files because
>>> it uses is_file_lru to distinguish file. And the change to anon is not
>>> happened if try_to_unmap_one is used for migration. So migration context
>>> like compaction also correctly counts isolated files even though it uses
>>> page_is_file_lru insted of is_file_lru. Recently page_is_file_cache was
>>> renamed to page_is_file_lru by commit 9de4f22a60f7 ("mm: code cleanup for
>>> MADV_FREE").
>>>
>>> But the nr_isolate_* mismatch problem happens on cma alloc. There is
>>> reclaim_clean_pages_from_list which is being used only by cma. It was
>>> introduced by commit 02c6de8d757c ("mm: cma: discard clean pages during
>>> contiguous allocation instead of migration") to reclaim clean file pages
>>> without migration. The cma alloc uses both reclaim_clean_pages_from_list
>>> and migrate_pages, and it uses page_is_file_lru to count isolated
>>> files. If there are dirty lazyfree pages allocated from cma memory
>>> region, the pages are counted as isolated file at the beginging but are
>>> counted as isolated anon after finished.
>>>
>>> Mem-Info:
>>> Node 0 active_anon:3045904kB inactive_anon:611448kB active_file:14892kB inactive_file:205636kB unevictable:10416kB isolated(anon):0kB isolated(file):37664kB mapped:630216kB dirty:384kB writeback:0kB shmem:42576kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
>>>
>>> Like log above, there was too much isolated file, 37664kB, which
>>> triggers too_many_isolated in reclaim when there is no isolated file in
>>> system wide. It could be reproducible by running two programs, doing
>>> MADV_FREE, writing and doing cma alloc, respectively. Although isolated
>>> anon is 0, I found that the internal value of isolated anon was the
>>> negative value of isolated file.
>>>
>>> Fix this by skipping anon pages on reclaim_clean_pages_from_list. The
>>> lazyfree page can be checked by both PageAnon(page) and
>>> page_is_file_lru. But in this case, PageAnon is enough to skip all
>>> anon pages.
>>>
>>> Reported-by: Yong-Taek Lee <ytk.lee@samsung.com>
>>> Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
>> Thanks for the investigation!
>> The thing is MADV_FREEed page since supporting swapless could change
>> his LRU status between reclaim.
>>
>> I am worry about voiding the optimization we have kept in CMA but
>> also don't have good idea, either so I tend to agree with this.
>>
>> Let me Cc Johannes who might have better idea.
>>
>>> ---
>>>  mm/vmscan.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index b06868fc4926..9380a18eef5e 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -1497,6 +1497,9 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
>>>  	LIST_HEAD(clean_pages);
>>>  
>>>  	list_for_each_entry_safe(page, next, page_list, lru) {
>>> +		/* to avoid race with MADV_FREE anon page */
>>> +		if (PageAnon(page))
>>> +			continue;
>>>  		if (page_is_file_lru(page) && !PageDirty(page) &&
>>>  		    !__PageMovable(page) && !PageUnevictable(page)) {
>>>  			ClearPageActive(page);
>>> -- 
>>> 2.13.7
>>>
> Hi Jaewon,
>
> How about this idea? I think it could solve the issue with keeping
> CMA alloc latency optimization.
Hello Minchan

It looks good to me except compilation error.

And to apply this patch on other stable branches, we may need some other
dependent patches though.

Thank you
>
> diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
> index 292485f3d24d..10cc932e209a 100644
> --- a/include/linux/vmstat.h
> +++ b/include/linux/vmstat.h
> @@ -29,6 +29,7 @@ struct reclaim_stat {
>  	unsigned nr_activate[2];
>  	unsigned nr_ref_keep;
>  	unsigned nr_unmap_fail;
> +	unsigned nr_lazyfree_fail;
>  };
>  
>  enum writeback_stat_item {
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 4c8a1cdccbba..b390f6094f2f 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1296,11 +1296,15 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  		 */
>  		if (page_mapped(page)) {
>  			enum ttu_flags flags = ttu_flags | TTU_BATCH_FLUSH;
> +			bool lazyfree = PageAnon(page) && !PageSwapBacked(page);
>  
>  			if (unlikely(PageTransHuge(page)))
>  				flags |= TTU_SPLIT_HUGE_PMD;
> +
>  			if (!try_to_unmap(page, flags)) {
>  				stat->nr_unmap_fail += nr_pages;
> +				if (lazyfree && PageSwapBacked(page))
> +					stat->nr_lazyfree_fail += nr_pages;
>  				goto activate_locked;
>  			}
>  		}
> @@ -1492,8 +1496,8 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
>  		.priority = DEF_PRIORITY,
>  		.may_unmap = 1,
>  	};
> -	struct reclaim_stat dummy_stat;
> -	unsigned long ret;
> +	struct reclaim_stat stat;
> +	unsigned long reclaimed;
>  	struct page *page, *next;
>  	LIST_HEAD(clean_pages);
>  
> @@ -1505,11 +1509,21 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
>  		}
>  	}
>  
> -	ret = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
> -			TTU_IGNORE_ACCESS, &dummy_stat, true);
> +	reclaimed = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
> +			TTU_IGNORE_ACCESS, &stat, true);
>  	list_splice(&clean_pages, page_list);
> -	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -ret);
> -	return ret;
> +	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -reclaimed);
> +	/*
> +	 * Since lazyfree pages are isolated from file LRU from the beginning,
> +	 * they will rotate back to anonymous LRU in the end if it failed to
> +	 * discard so isolated count will be mismatched.
> +	 * Compensate the isolated count for both LRU lists.
> +	 */
> +	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_ANON,
> +					stat->nr_lazyfree_fail);
> +	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE,
> +					-stat->nr_lazyfree_fail);
should be stat.nr_lazyfree_fail and -stat.nr_lazyfree_fail instead of ->
> +	return reclaimed;
>  }
>  
>  /*
Jaewon Kim April 20, 2020, 6:19 a.m. UTC | #5
On 2020년 04월 18일 08:45, Jaewon Kim wrote:
>
> On 2020년 04월 18일 00:13, Minchan Kim wrote:
>> On Thu, Apr 16, 2020 at 05:38:37PM -0700, Minchan Kim wrote:
>>> Hi Jaewon,
>>>
>>> On Thu, Apr 16, 2020 at 12:35:14PM +0900, Jaewon Kim wrote:
>>>> This patch fix nr_isolate_* mismatch problem between cma and dirty
>>>> lazyfree page.
>>>>
>>>> If try_to_unmap_one is used for reclaim and it detects a dirty lazyfree
>>>> page, then the lazyfree page is changed to a normal anon page having
>>>> SwapBacked by commit 18863d3a3f59 ("mm: remove SWAP_DIRTY in ttu"). Even
>>>> with the change, reclaim context correctly counts isolated files because
>>>> it uses is_file_lru to distinguish file. And the change to anon is not
>>>> happened if try_to_unmap_one is used for migration. So migration context
>>>> like compaction also correctly counts isolated files even though it uses
>>>> page_is_file_lru insted of is_file_lru. Recently page_is_file_cache was
>>>> renamed to page_is_file_lru by commit 9de4f22a60f7 ("mm: code cleanup for
>>>> MADV_FREE").
>>>>
>>>> But the nr_isolate_* mismatch problem happens on cma alloc. There is
>>>> reclaim_clean_pages_from_list which is being used only by cma. It was
>>>> introduced by commit 02c6de8d757c ("mm: cma: discard clean pages during
>>>> contiguous allocation instead of migration") to reclaim clean file pages
>>>> without migration. The cma alloc uses both reclaim_clean_pages_from_list
>>>> and migrate_pages, and it uses page_is_file_lru to count isolated
>>>> files. If there are dirty lazyfree pages allocated from cma memory
>>>> region, the pages are counted as isolated file at the beginging but are
>>>> counted as isolated anon after finished.
>>>>
>>>> Mem-Info:
>>>> Node 0 active_anon:3045904kB inactive_anon:611448kB active_file:14892kB inactive_file:205636kB unevictable:10416kB isolated(anon):0kB isolated(file):37664kB mapped:630216kB dirty:384kB writeback:0kB shmem:42576kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
>>>>
>>>> Like log above, there was too much isolated file, 37664kB, which
>>>> triggers too_many_isolated in reclaim when there is no isolated file in
>>>> system wide. It could be reproducible by running two programs, doing
>>>> MADV_FREE, writing and doing cma alloc, respectively. Although isolated
>>>> anon is 0, I found that the internal value of isolated anon was the
>>>> negative value of isolated file.
>>>>
>>>> Fix this by skipping anon pages on reclaim_clean_pages_from_list. The
>>>> lazyfree page can be checked by both PageAnon(page) and
>>>> page_is_file_lru. But in this case, PageAnon is enough to skip all
>>>> anon pages.
>>>>
>>>> Reported-by: Yong-Taek Lee <ytk.lee@samsung.com>
>>>> Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
>>> Thanks for the investigation!
>>> The thing is MADV_FREEed page since supporting swapless could change
>>> his LRU status between reclaim.
>>>
>>> I am worry about voiding the optimization we have kept in CMA but
>>> also don't have good idea, either so I tend to agree with this.
>>>
>>> Let me Cc Johannes who might have better idea.
>>>
>>>> ---
>>>>  mm/vmscan.c | 3 +++
>>>>  1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>> index b06868fc4926..9380a18eef5e 100644
>>>> --- a/mm/vmscan.c
>>>> +++ b/mm/vmscan.c
>>>> @@ -1497,6 +1497,9 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
>>>>  	LIST_HEAD(clean_pages);
>>>>  
>>>>  	list_for_each_entry_safe(page, next, page_list, lru) {
>>>> +		/* to avoid race with MADV_FREE anon page */
>>>> +		if (PageAnon(page))
>>>> +			continue;
>>>>  		if (page_is_file_lru(page) && !PageDirty(page) &&
>>>>  		    !__PageMovable(page) && !PageUnevictable(page)) {
>>>>  			ClearPageActive(page);
>>>> -- 
>>>> 2.13.7
>>>>
>> Hi Jaewon,
>>
>> How about this idea? I think it could solve the issue with keeping
>> CMA alloc latency optimization.
> Hello Minchan
>
> It looks good to me except compilation error.
>
> And to apply this patch on other stable branches, we may need some other
> dependent patches though.
>
> Thank you
>> diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
>> index 292485f3d24d..10cc932e209a 100644
>> --- a/include/linux/vmstat.h
>> +++ b/include/linux/vmstat.h
>> @@ -29,6 +29,7 @@ struct reclaim_stat {
>>  	unsigned nr_activate[2];
>>  	unsigned nr_ref_keep;
>>  	unsigned nr_unmap_fail;
>> +	unsigned nr_lazyfree_fail;
>>  };
>>  
>>  enum writeback_stat_item {
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 4c8a1cdccbba..b390f6094f2f 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1296,11 +1296,15 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>>  		 */
>>  		if (page_mapped(page)) {
>>  			enum ttu_flags flags = ttu_flags | TTU_BATCH_FLUSH;
>> +			bool lazyfree = PageAnon(page) && !PageSwapBacked(page);
>>  
>>  			if (unlikely(PageTransHuge(page)))
>>  				flags |= TTU_SPLIT_HUGE_PMD;
>> +
>>  			if (!try_to_unmap(page, flags)) {
>>  				stat->nr_unmap_fail += nr_pages;
>> +				if (lazyfree && PageSwapBacked(page))
>> +					stat->nr_lazyfree_fail += nr_pages;
>>  				goto activate_locked;
>>  			}
>>  		}
>> @@ -1492,8 +1496,8 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
>>  		.priority = DEF_PRIORITY,
>>  		.may_unmap = 1,
>>  	};
>> -	struct reclaim_stat dummy_stat;
>> -	unsigned long ret;
>> +	struct reclaim_stat stat;
>> +	unsigned long reclaimed;
>>  	struct page *page, *next;
>>  	LIST_HEAD(clean_pages);
>>  
>> @@ -1505,11 +1509,21 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
>>  		}
>>  	}
>>  
>> -	ret = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
>> -			TTU_IGNORE_ACCESS, &dummy_stat, true);
>> +	reclaimed = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
>> +			TTU_IGNORE_ACCESS, &stat, true);
>>  	list_splice(&clean_pages, page_list);
>> -	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -ret);
>> -	return ret;
>> +	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -reclaimed);
>> +	/*
>> +	 * Since lazyfree pages are isolated from file LRU from the beginning,
>> +	 * they will rotate back to anonymous LRU in the end if it failed to
>> +	 * discard so isolated count will be mismatched.
>> +	 * Compensate the isolated count for both LRU lists.
>> +	 */
>> +	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_ANON,
>> +					stat->nr_lazyfree_fail);
>> +	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE,
>> +					-stat->nr_lazyfree_fail);
> should be stat.nr_lazyfree_fail and -stat.nr_lazyfree_fail instead of ->
>> +	return reclaimed;
>>  }
>>  
>>  /*

Let me just CC Shaohua Li for commit 802a3a92ad7a ("mm: reclaim MADV_FREE pages")
, because I missed him/her on other mail thread 
 : Sorry, I think I pointed a wrong commit, the SwapBacked was recovered
 : by commit 802a3a92ad7a ("mm: reclaim MADV_FREE pages").


Hello Minchan

I tested on my v4.19 based source tree and it seems to work.

Prior to your patch I applied commit 060f005f0747 (
"mm/vmscan.c: do not allocate duplicate stack variables in shrink_page_list()"
for struct reclaim_stat.

I considered other dependent changes below to follow code changes, not really needed for this issue though.
v5.3 98879b3b9edc mm: vmscan: correct some vmscan counters for THP swapout
v5.2 886cf1901db9 mm: move recent_rotated pages calculation to shrink_inactive_list()

Thank you
Jaewon Kim
Jaewon Kim April 21, 2020, 12:06 p.m. UTC | #6
On 2020년 04월 20일 15:19, Jaewon Kim wrote:
>
> On 2020년 04월 18일 08:45, Jaewon Kim wrote:
>> On 2020년 04월 18일 00:13, Minchan Kim wrote:
>>> On Thu, Apr 16, 2020 at 05:38:37PM -0700, Minchan Kim wrote:
>>>> Hi Jaewon,
>>>>
>>>> On Thu, Apr 16, 2020 at 12:35:14PM +0900, Jaewon Kim wrote:
>>>>> This patch fix nr_isolate_* mismatch problem between cma and dirty
>>>>> lazyfree page.
>>>>>
>>>>> If try_to_unmap_one is used for reclaim and it detects a dirty lazyfree
>>>>> page, then the lazyfree page is changed to a normal anon page having
>>>>> SwapBacked by commit 18863d3a3f59 ("mm: remove SWAP_DIRTY in ttu"). Even
>>>>> with the change, reclaim context correctly counts isolated files because
>>>>> it uses is_file_lru to distinguish file. And the change to anon is not
>>>>> happened if try_to_unmap_one is used for migration. So migration context
>>>>> like compaction also correctly counts isolated files even though it uses
>>>>> page_is_file_lru insted of is_file_lru. Recently page_is_file_cache was
>>>>> renamed to page_is_file_lru by commit 9de4f22a60f7 ("mm: code cleanup for
>>>>> MADV_FREE").
>>>>>
>>>>> But the nr_isolate_* mismatch problem happens on cma alloc. There is
>>>>> reclaim_clean_pages_from_list which is being used only by cma. It was
>>>>> introduced by commit 02c6de8d757c ("mm: cma: discard clean pages during
>>>>> contiguous allocation instead of migration") to reclaim clean file pages
>>>>> without migration. The cma alloc uses both reclaim_clean_pages_from_list
>>>>> and migrate_pages, and it uses page_is_file_lru to count isolated
>>>>> files. If there are dirty lazyfree pages allocated from cma memory
>>>>> region, the pages are counted as isolated file at the beginging but are
>>>>> counted as isolated anon after finished.
>>>>>
>>>>> Mem-Info:
>>>>> Node 0 active_anon:3045904kB inactive_anon:611448kB active_file:14892kB inactive_file:205636kB unevictable:10416kB isolated(anon):0kB isolated(file):37664kB mapped:630216kB dirty:384kB writeback:0kB shmem:42576kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
>>>>>
>>>>> Like log above, there was too much isolated file, 37664kB, which
>>>>> triggers too_many_isolated in reclaim when there is no isolated file in
>>>>> system wide. It could be reproducible by running two programs, doing
>>>>> MADV_FREE, writing and doing cma alloc, respectively. Although isolated
>>>>> anon is 0, I found that the internal value of isolated anon was the
>>>>> negative value of isolated file.
>>>>>
>>>>> Fix this by skipping anon pages on reclaim_clean_pages_from_list. The
>>>>> lazyfree page can be checked by both PageAnon(page) and
>>>>> page_is_file_lru. But in this case, PageAnon is enough to skip all
>>>>> anon pages.
>>>>>
>>>>> Reported-by: Yong-Taek Lee <ytk.lee@samsung.com>
>>>>> Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
>>>> Thanks for the investigation!
>>>> The thing is MADV_FREEed page since supporting swapless could change
>>>> his LRU status between reclaim.
>>>>
>>>> I am worry about voiding the optimization we have kept in CMA but
>>>> also don't have good idea, either so I tend to agree with this.
>>>>
>>>> Let me Cc Johannes who might have better idea.
>>>>
>>>>> ---
>>>>>  mm/vmscan.c | 3 +++
>>>>>  1 file changed, 3 insertions(+)
>>>>>
>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>>> index b06868fc4926..9380a18eef5e 100644
>>>>> --- a/mm/vmscan.c
>>>>> +++ b/mm/vmscan.c
>>>>> @@ -1497,6 +1497,9 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
>>>>>  	LIST_HEAD(clean_pages);
>>>>>  
>>>>>  	list_for_each_entry_safe(page, next, page_list, lru) {
>>>>> +		/* to avoid race with MADV_FREE anon page */
>>>>> +		if (PageAnon(page))
>>>>> +			continue;
>>>>>  		if (page_is_file_lru(page) && !PageDirty(page) &&
>>>>>  		    !__PageMovable(page) && !PageUnevictable(page)) {
>>>>>  			ClearPageActive(page);
>>>>> -- 
>>>>> 2.13.7
>>>>>
>>> Hi Jaewon,
>>>
>>> How about this idea? I think it could solve the issue with keeping
>>> CMA alloc latency optimization.
>> Hello Minchan
>>
>> It looks good to me except compilation error.
>>
>> And to apply this patch on other stable branches, we may need some other
>> dependent patches though.
>>
>> Thank you
>>> diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
>>> index 292485f3d24d..10cc932e209a 100644
>>> --- a/include/linux/vmstat.h
>>> +++ b/include/linux/vmstat.h
>>> @@ -29,6 +29,7 @@ struct reclaim_stat {
>>>  	unsigned nr_activate[2];
>>>  	unsigned nr_ref_keep;
>>>  	unsigned nr_unmap_fail;
>>> +	unsigned nr_lazyfree_fail;
>>>  };
>>>  
>>>  enum writeback_stat_item {
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index 4c8a1cdccbba..b390f6094f2f 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -1296,11 +1296,15 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>>>  		 */
>>>  		if (page_mapped(page)) {
>>>  			enum ttu_flags flags = ttu_flags | TTU_BATCH_FLUSH;
>>> +			bool lazyfree = PageAnon(page) && !PageSwapBacked(page);
>>>  
>>>  			if (unlikely(PageTransHuge(page)))
>>>  				flags |= TTU_SPLIT_HUGE_PMD;
>>> +
>>>  			if (!try_to_unmap(page, flags)) {
>>>  				stat->nr_unmap_fail += nr_pages;
>>> +				if (lazyfree && PageSwapBacked(page))
>>> +					stat->nr_lazyfree_fail += nr_pages;
>>>  				goto activate_locked;
>>>  			}
>>>  		}
>>> @@ -1492,8 +1496,8 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
>>>  		.priority = DEF_PRIORITY,
>>>  		.may_unmap = 1,
>>>  	};
>>> -	struct reclaim_stat dummy_stat;
>>> -	unsigned long ret;
>>> +	struct reclaim_stat stat;
>>> +	unsigned long reclaimed;
>>>  	struct page *page, *next;
>>>  	LIST_HEAD(clean_pages);
>>>  
>>> @@ -1505,11 +1509,21 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
>>>  		}
>>>  	}
>>>  
>>> -	ret = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
>>> -			TTU_IGNORE_ACCESS, &dummy_stat, true);
>>> +	reclaimed = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
>>> +			TTU_IGNORE_ACCESS, &stat, true);
>>>  	list_splice(&clean_pages, page_list);
>>> -	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -ret);
>>> -	return ret;
>>> +	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -reclaimed);
>>> +	/*
>>> +	 * Since lazyfree pages are isolated from file LRU from the beginning,
>>> +	 * they will rotate back to anonymous LRU in the end if it failed to
>>> +	 * discard so isolated count will be mismatched.
>>> +	 * Compensate the isolated count for both LRU lists.
>>> +	 */
>>> +	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_ANON,
>>> +					stat->nr_lazyfree_fail);
>>> +	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE,
>>> +					-stat->nr_lazyfree_fail);
>> should be stat.nr_lazyfree_fail and -stat.nr_lazyfree_fail instead of ->
>>> +	return reclaimed;
>>>  }
>>>  
>>>  /*
> Let me just CC Shaohua Li for commit 802a3a92ad7a ("mm: reclaim MADV_FREE pages")
> , because I missed him/her on other mail thread 
>  : Sorry, I think I pointed a wrong commit, the SwapBacked was recovered
>  : by commit 802a3a92ad7a ("mm: reclaim MADV_FREE pages").
>
>
> Hello Minchan
>
> I tested on my v4.19 based source tree and it seems to work.
>
> Prior to your patch I applied commit 060f005f0747 (
> "mm/vmscan.c: do not allocate duplicate stack variables in shrink_page_list()"
> for struct reclaim_stat.
>
> I considered other dependent changes below to follow code changes, not really needed for this issue though.
> v5.3 98879b3b9edc mm: vmscan: correct some vmscan counters for THP swapout
> v5.2 886cf1901db9 mm: move recent_rotated pages calculation to shrink_inactive_list()
Hello Minchan

Are you preparing a complete patch for this issue?
Sorry if I am bugging you.
>
> Thank you
> Jaewon Kim
>
>
Minchan Kim April 22, 2020, 5:40 a.m. UTC | #7
On Tue, Apr 21, 2020 at 09:06:37PM +0900, Jaewon Kim wrote:
> 
> 
> On 2020년 04월 20일 15:19, Jaewon Kim wrote:
> >
> > On 2020년 04월 18일 08:45, Jaewon Kim wrote:
> >> On 2020년 04월 18일 00:13, Minchan Kim wrote:
> >>> On Thu, Apr 16, 2020 at 05:38:37PM -0700, Minchan Kim wrote:
> >>>> Hi Jaewon,
> >>>>
> >>>> On Thu, Apr 16, 2020 at 12:35:14PM +0900, Jaewon Kim wrote:
> >>>>> This patch fix nr_isolate_* mismatch problem between cma and dirty
> >>>>> lazyfree page.
> >>>>>
> >>>>> If try_to_unmap_one is used for reclaim and it detects a dirty lazyfree
> >>>>> page, then the lazyfree page is changed to a normal anon page having
> >>>>> SwapBacked by commit 18863d3a3f59 ("mm: remove SWAP_DIRTY in ttu"). Even
> >>>>> with the change, reclaim context correctly counts isolated files because
> >>>>> it uses is_file_lru to distinguish file. And the change to anon is not
> >>>>> happened if try_to_unmap_one is used for migration. So migration context
> >>>>> like compaction also correctly counts isolated files even though it uses
> >>>>> page_is_file_lru insted of is_file_lru. Recently page_is_file_cache was
> >>>>> renamed to page_is_file_lru by commit 9de4f22a60f7 ("mm: code cleanup for
> >>>>> MADV_FREE").
> >>>>>
> >>>>> But the nr_isolate_* mismatch problem happens on cma alloc. There is
> >>>>> reclaim_clean_pages_from_list which is being used only by cma. It was
> >>>>> introduced by commit 02c6de8d757c ("mm: cma: discard clean pages during
> >>>>> contiguous allocation instead of migration") to reclaim clean file pages
> >>>>> without migration. The cma alloc uses both reclaim_clean_pages_from_list
> >>>>> and migrate_pages, and it uses page_is_file_lru to count isolated
> >>>>> files. If there are dirty lazyfree pages allocated from cma memory
> >>>>> region, the pages are counted as isolated file at the beginging but are
> >>>>> counted as isolated anon after finished.
> >>>>>
> >>>>> Mem-Info:
> >>>>> Node 0 active_anon:3045904kB inactive_anon:611448kB active_file:14892kB inactive_file:205636kB unevictable:10416kB isolated(anon):0kB isolated(file):37664kB mapped:630216kB dirty:384kB writeback:0kB shmem:42576kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
> >>>>>
> >>>>> Like log above, there was too much isolated file, 37664kB, which
> >>>>> triggers too_many_isolated in reclaim when there is no isolated file in
> >>>>> system wide. It could be reproducible by running two programs, doing
> >>>>> MADV_FREE, writing and doing cma alloc, respectively. Although isolated
> >>>>> anon is 0, I found that the internal value of isolated anon was the
> >>>>> negative value of isolated file.
> >>>>>
> >>>>> Fix this by skipping anon pages on reclaim_clean_pages_from_list. The
> >>>>> lazyfree page can be checked by both PageAnon(page) and
> >>>>> page_is_file_lru. But in this case, PageAnon is enough to skip all
> >>>>> anon pages.
> >>>>>
> >>>>> Reported-by: Yong-Taek Lee <ytk.lee@samsung.com>
> >>>>> Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
> >>>> Thanks for the investigation!
> >>>> The thing is MADV_FREEed page since supporting swapless could change
> >>>> his LRU status between reclaim.
> >>>>
> >>>> I am worry about voiding the optimization we have kept in CMA but
> >>>> also don't have good idea, either so I tend to agree with this.
> >>>>
> >>>> Let me Cc Johannes who might have better idea.
> >>>>
> >>>>> ---
> >>>>>  mm/vmscan.c | 3 +++
> >>>>>  1 file changed, 3 insertions(+)
> >>>>>
> >>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
> >>>>> index b06868fc4926..9380a18eef5e 100644
> >>>>> --- a/mm/vmscan.c
> >>>>> +++ b/mm/vmscan.c
> >>>>> @@ -1497,6 +1497,9 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
> >>>>>  	LIST_HEAD(clean_pages);
> >>>>>  
> >>>>>  	list_for_each_entry_safe(page, next, page_list, lru) {
> >>>>> +		/* to avoid race with MADV_FREE anon page */
> >>>>> +		if (PageAnon(page))
> >>>>> +			continue;
> >>>>>  		if (page_is_file_lru(page) && !PageDirty(page) &&
> >>>>>  		    !__PageMovable(page) && !PageUnevictable(page)) {
> >>>>>  			ClearPageActive(page);
> >>>>> -- 
> >>>>> 2.13.7
> >>>>>
> >>> Hi Jaewon,
> >>>
> >>> How about this idea? I think it could solve the issue with keeping
> >>> CMA alloc latency optimization.
> >> Hello Minchan
> >>
> >> It looks good to me except compilation error.
> >>
> >> And to apply this patch on other stable branches, we may need some other
> >> dependent patches though.
> >>
> >> Thank you
> >>> diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
> >>> index 292485f3d24d..10cc932e209a 100644
> >>> --- a/include/linux/vmstat.h
> >>> +++ b/include/linux/vmstat.h
> >>> @@ -29,6 +29,7 @@ struct reclaim_stat {
> >>>  	unsigned nr_activate[2];
> >>>  	unsigned nr_ref_keep;
> >>>  	unsigned nr_unmap_fail;
> >>> +	unsigned nr_lazyfree_fail;
> >>>  };
> >>>  
> >>>  enum writeback_stat_item {
> >>> diff --git a/mm/vmscan.c b/mm/vmscan.c
> >>> index 4c8a1cdccbba..b390f6094f2f 100644
> >>> --- a/mm/vmscan.c
> >>> +++ b/mm/vmscan.c
> >>> @@ -1296,11 +1296,15 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> >>>  		 */
> >>>  		if (page_mapped(page)) {
> >>>  			enum ttu_flags flags = ttu_flags | TTU_BATCH_FLUSH;
> >>> +			bool lazyfree = PageAnon(page) && !PageSwapBacked(page);
> >>>  
> >>>  			if (unlikely(PageTransHuge(page)))
> >>>  				flags |= TTU_SPLIT_HUGE_PMD;
> >>> +
> >>>  			if (!try_to_unmap(page, flags)) {
> >>>  				stat->nr_unmap_fail += nr_pages;
> >>> +				if (lazyfree && PageSwapBacked(page))
> >>> +					stat->nr_lazyfree_fail += nr_pages;
> >>>  				goto activate_locked;
> >>>  			}
> >>>  		}
> >>> @@ -1492,8 +1496,8 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
> >>>  		.priority = DEF_PRIORITY,
> >>>  		.may_unmap = 1,
> >>>  	};
> >>> -	struct reclaim_stat dummy_stat;
> >>> -	unsigned long ret;
> >>> +	struct reclaim_stat stat;
> >>> +	unsigned long reclaimed;
> >>>  	struct page *page, *next;
> >>>  	LIST_HEAD(clean_pages);
> >>>  
> >>> @@ -1505,11 +1509,21 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
> >>>  		}
> >>>  	}
> >>>  
> >>> -	ret = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
> >>> -			TTU_IGNORE_ACCESS, &dummy_stat, true);
> >>> +	reclaimed = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
> >>> +			TTU_IGNORE_ACCESS, &stat, true);
> >>>  	list_splice(&clean_pages, page_list);
> >>> -	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -ret);
> >>> -	return ret;
> >>> +	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -reclaimed);
> >>> +	/*
> >>> +	 * Since lazyfree pages are isolated from file LRU from the beginning,
> >>> +	 * they will rotate back to anonymous LRU in the end if it failed to
> >>> +	 * discard so isolated count will be mismatched.
> >>> +	 * Compensate the isolated count for both LRU lists.
> >>> +	 */
> >>> +	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_ANON,
> >>> +					stat->nr_lazyfree_fail);
> >>> +	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE,
> >>> +					-stat->nr_lazyfree_fail);
> >> should be stat.nr_lazyfree_fail and -stat.nr_lazyfree_fail instead of ->
> >>> +	return reclaimed;
> >>>  }
> >>>  
> >>>  /*
> > Let me just CC Shaohua Li for commit 802a3a92ad7a ("mm: reclaim MADV_FREE pages")
> > , because I missed him/her on other mail thread 
> >  : Sorry, I think I pointed a wrong commit, the SwapBacked was recovered
> >  : by commit 802a3a92ad7a ("mm: reclaim MADV_FREE pages").
> >
> >
> > Hello Minchan
> >
> > I tested on my v4.19 based source tree and it seems to work.
> >
> > Prior to your patch I applied commit 060f005f0747 (
> > "mm/vmscan.c: do not allocate duplicate stack variables in shrink_page_list()"
> > for struct reclaim_stat.
> >
> > I considered other dependent changes below to follow code changes, not really needed for this issue though.
> > v5.3 98879b3b9edc mm: vmscan: correct some vmscan counters for THP swapout
> > v5.2 886cf1901db9 mm: move recent_rotated pages calculation to shrink_inactive_list()
> Hello Minchan
> 
> Are you preparing a complete patch for this issue?
> Sorry if I am bugging you.

Hi Jaewon

Sorry for the late. You catched the bug and gave good description with
the solution. What I did was just suggestion for alternative so feel
free to send the patch with your SoB. You could use my Suggested-by
and Acked-by. It's totally your credit!

Thanks!

Thanks!
Jaewon Kim April 22, 2020, 8:39 a.m. UTC | #8
On 2020년 04월 22일 14:40, Minchan Kim wrote:
> On Tue, Apr 21, 2020 at 09:06:37PM +0900, Jaewon Kim wrote:
>>
>> On 2020년 04월 20일 15:19, Jaewon Kim wrote:
>>> On 2020년 04월 18일 08:45, Jaewon Kim wrote:
>>>> On 2020년 04월 18일 00:13, Minchan Kim wrote:
>>>>> On Thu, Apr 16, 2020 at 05:38:37PM -0700, Minchan Kim wrote:
>>>>>> Hi Jaewon,
>>>>>>
>>>>>> On Thu, Apr 16, 2020 at 12:35:14PM +0900, Jaewon Kim wrote:
>>>>>>> This patch fix nr_isolate_* mismatch problem between cma and dirty
>>>>>>> lazyfree page.
>>>>>>>
>>>>>>> If try_to_unmap_one is used for reclaim and it detects a dirty lazyfree
>>>>>>> page, then the lazyfree page is changed to a normal anon page having
>>>>>>> SwapBacked by commit 18863d3a3f59 ("mm: remove SWAP_DIRTY in ttu"). Even
>>>>>>> with the change, reclaim context correctly counts isolated files because
>>>>>>> it uses is_file_lru to distinguish file. And the change to anon is not
>>>>>>> happened if try_to_unmap_one is used for migration. So migration context
>>>>>>> like compaction also correctly counts isolated files even though it uses
>>>>>>> page_is_file_lru insted of is_file_lru. Recently page_is_file_cache was
>>>>>>> renamed to page_is_file_lru by commit 9de4f22a60f7 ("mm: code cleanup for
>>>>>>> MADV_FREE").
>>>>>>>
>>>>>>> But the nr_isolate_* mismatch problem happens on cma alloc. There is
>>>>>>> reclaim_clean_pages_from_list which is being used only by cma. It was
>>>>>>> introduced by commit 02c6de8d757c ("mm: cma: discard clean pages during
>>>>>>> contiguous allocation instead of migration") to reclaim clean file pages
>>>>>>> without migration. The cma alloc uses both reclaim_clean_pages_from_list
>>>>>>> and migrate_pages, and it uses page_is_file_lru to count isolated
>>>>>>> files. If there are dirty lazyfree pages allocated from cma memory
>>>>>>> region, the pages are counted as isolated file at the beginging but are
>>>>>>> counted as isolated anon after finished.
>>>>>>>
>>>>>>> Mem-Info:
>>>>>>> Node 0 active_anon:3045904kB inactive_anon:611448kB active_file:14892kB inactive_file:205636kB unevictable:10416kB isolated(anon):0kB isolated(file):37664kB mapped:630216kB dirty:384kB writeback:0kB shmem:42576kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
>>>>>>>
>>>>>>> Like log above, there was too much isolated file, 37664kB, which
>>>>>>> triggers too_many_isolated in reclaim when there is no isolated file in
>>>>>>> system wide. It could be reproducible by running two programs, doing
>>>>>>> MADV_FREE, writing and doing cma alloc, respectively. Although isolated
>>>>>>> anon is 0, I found that the internal value of isolated anon was the
>>>>>>> negative value of isolated file.
>>>>>>>
>>>>>>> Fix this by skipping anon pages on reclaim_clean_pages_from_list. The
>>>>>>> lazyfree page can be checked by both PageAnon(page) and
>>>>>>> page_is_file_lru. But in this case, PageAnon is enough to skip all
>>>>>>> anon pages.
>>>>>>>
>>>>>>> Reported-by: Yong-Taek Lee <ytk.lee@samsung.com>
>>>>>>> Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
>>>>>> Thanks for the investigation!
>>>>>> The thing is MADV_FREEed page since supporting swapless could change
>>>>>> his LRU status between reclaim.
>>>>>>
>>>>>> I am worry about voiding the optimization we have kept in CMA but
>>>>>> also don't have good idea, either so I tend to agree with this.
>>>>>>
>>>>>> Let me Cc Johannes who might have better idea.
>>>>>>
>>>>>>> ---
>>>>>>>  mm/vmscan.c | 3 +++
>>>>>>>  1 file changed, 3 insertions(+)
>>>>>>>
>>>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>>>>> index b06868fc4926..9380a18eef5e 100644
>>>>>>> --- a/mm/vmscan.c
>>>>>>> +++ b/mm/vmscan.c
>>>>>>> @@ -1497,6 +1497,9 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
>>>>>>>  	LIST_HEAD(clean_pages);
>>>>>>>  
>>>>>>>  	list_for_each_entry_safe(page, next, page_list, lru) {
>>>>>>> +		/* to avoid race with MADV_FREE anon page */
>>>>>>> +		if (PageAnon(page))
>>>>>>> +			continue;
>>>>>>>  		if (page_is_file_lru(page) && !PageDirty(page) &&
>>>>>>>  		    !__PageMovable(page) && !PageUnevictable(page)) {
>>>>>>>  			ClearPageActive(page);
>>>>>>> -- 
>>>>>>> 2.13.7
>>>>>>>
>>>>> Hi Jaewon,
>>>>>
>>>>> How about this idea? I think it could solve the issue with keeping
>>>>> CMA alloc latency optimization.
>>>> Hello Minchan
>>>>
>>>> It looks good to me except compilation error.
>>>>
>>>> And to apply this patch on other stable branches, we may need some other
>>>> dependent patches though.
>>>>
>>>> Thank you
>>>>> diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
>>>>> index 292485f3d24d..10cc932e209a 100644
>>>>> --- a/include/linux/vmstat.h
>>>>> +++ b/include/linux/vmstat.h
>>>>> @@ -29,6 +29,7 @@ struct reclaim_stat {
>>>>>  	unsigned nr_activate[2];
>>>>>  	unsigned nr_ref_keep;
>>>>>  	unsigned nr_unmap_fail;
>>>>> +	unsigned nr_lazyfree_fail;
>>>>>  };
>>>>>  
>>>>>  enum writeback_stat_item {
>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>>> index 4c8a1cdccbba..b390f6094f2f 100644
>>>>> --- a/mm/vmscan.c
>>>>> +++ b/mm/vmscan.c
>>>>> @@ -1296,11 +1296,15 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>>>>>  		 */
>>>>>  		if (page_mapped(page)) {
>>>>>  			enum ttu_flags flags = ttu_flags | TTU_BATCH_FLUSH;
>>>>> +			bool lazyfree = PageAnon(page) && !PageSwapBacked(page);
>>>>>  
>>>>>  			if (unlikely(PageTransHuge(page)))
>>>>>  				flags |= TTU_SPLIT_HUGE_PMD;
>>>>> +
>>>>>  			if (!try_to_unmap(page, flags)) {
>>>>>  				stat->nr_unmap_fail += nr_pages;
>>>>> +				if (lazyfree && PageSwapBacked(page))
>>>>> +					stat->nr_lazyfree_fail += nr_pages;
>>>>>  				goto activate_locked;
>>>>>  			}
>>>>>  		}
>>>>> @@ -1492,8 +1496,8 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
>>>>>  		.priority = DEF_PRIORITY,
>>>>>  		.may_unmap = 1,
>>>>>  	};
>>>>> -	struct reclaim_stat dummy_stat;
>>>>> -	unsigned long ret;
>>>>> +	struct reclaim_stat stat;
>>>>> +	unsigned long reclaimed;
>>>>>  	struct page *page, *next;
>>>>>  	LIST_HEAD(clean_pages);
>>>>>  
>>>>> @@ -1505,11 +1509,21 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
>>>>>  		}
>>>>>  	}
>>>>>  
>>>>> -	ret = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
>>>>> -			TTU_IGNORE_ACCESS, &dummy_stat, true);
>>>>> +	reclaimed = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
>>>>> +			TTU_IGNORE_ACCESS, &stat, true);
>>>>>  	list_splice(&clean_pages, page_list);
>>>>> -	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -ret);
>>>>> -	return ret;
>>>>> +	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -reclaimed);
>>>>> +	/*
>>>>> +	 * Since lazyfree pages are isolated from file LRU from the beginning,
>>>>> +	 * they will rotate back to anonymous LRU in the end if it failed to
>>>>> +	 * discard so isolated count will be mismatched.
>>>>> +	 * Compensate the isolated count for both LRU lists.
>>>>> +	 */
>>>>> +	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_ANON,
>>>>> +					stat->nr_lazyfree_fail);
>>>>> +	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE,
>>>>> +					-stat->nr_lazyfree_fail);
>>>> should be stat.nr_lazyfree_fail and -stat.nr_lazyfree_fail instead of ->
>>>>> +	return reclaimed;
>>>>>  }
>>>>>  
>>>>>  /*
>>> Let me just CC Shaohua Li for commit 802a3a92ad7a ("mm: reclaim MADV_FREE pages")
>>> , because I missed him/her on other mail thread 
>>>  : Sorry, I think I pointed a wrong commit, the SwapBacked was recovered
>>>  : by commit 802a3a92ad7a ("mm: reclaim MADV_FREE pages").
>>>
>>>
>>> Hello Minchan
>>>
>>> I tested on my v4.19 based source tree and it seems to work.
>>>
>>> Prior to your patch I applied commit 060f005f0747 (
>>> "mm/vmscan.c: do not allocate duplicate stack variables in shrink_page_list()"
>>> for struct reclaim_stat.
>>>
>>> I considered other dependent changes below to follow code changes, not really needed for this issue though.
>>> v5.3 98879b3b9edc mm: vmscan: correct some vmscan counters for THP swapout
>>> v5.2 886cf1901db9 mm: move recent_rotated pages calculation to shrink_inactive_list()
>> Hello Minchan
>>
>> Are you preparing a complete patch for this issue?
>> Sorry if I am bugging you.
> Hi Jaewon
>
> Sorry for the late. You catched the bug and gave good description with
> the solution. What I did was just suggestion for alternative so feel
> free to send the patch with your SoB. You could use my Suggested-by
> and Acked-by. It's totally your credit!
Thank you for your support and generosity.
>
> Thanks!
>
> Thanks!
>
>
diff mbox series

Patch

diff --git a/mm/vmscan.c b/mm/vmscan.c
index b06868fc4926..9380a18eef5e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1497,6 +1497,9 @@  unsigned long reclaim_clean_pages_from_list(struct zone *zone,
 	LIST_HEAD(clean_pages);
 
 	list_for_each_entry_safe(page, next, page_list, lru) {
+		/* to avoid race with MADV_FREE anon page */
+		if (PageAnon(page))
+			continue;
 		if (page_is_file_lru(page) && !PageDirty(page) &&
 		    !__PageMovable(page) && !PageUnevictable(page)) {
 			ClearPageActive(page);