diff mbox series

[v9,07/20] mm/lru: introduce TestClearPageLRU

Message ID 1583146830-169516-8-git-send-email-alex.shi@linux.alibaba.com (mailing list archive)
State New, archived
Headers show
Series [v9,01/20] mm/vmscan: remove unnecessary lruvec adding | expand

Commit Message

Alex Shi March 2, 2020, 11 a.m. UTC
Combine the PageLRU check and ClearPageLRU into one function with the
newly introduced TestClearPageLRU. This function will be used as a page
isolation precondition.

No functional change yet.
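For reference, the semantics being combined can be modeled in userspace C. This is an illustrative sketch using C11 atomics, not the kernel's page-flags implementation; the bit position and type names are made up for the example:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define PG_lru 4  /* illustrative bit position, not the kernel's */

/* Model of page->flags as a single atomic word. */
typedef struct { atomic_ulong flags; } page_t;

/* Old pattern: a plain test, with the clear done as a separate step.
 * Another CPU can clear the bit between the test and the clear. */
static bool page_lru(page_t *p)
{
	return atomic_load(&p->flags) & (1UL << PG_lru);
}

/* New pattern: TestClearPageLRU -- one atomic read-modify-write that
 * clears the bit and reports whether it was set, so exactly one
 * caller can "win" and proceed with isolation. */
static bool test_clear_page_lru(page_t *p)
{
	unsigned long old = atomic_fetch_and(&p->flags, ~(1UL << PG_lru));
	return old & (1UL << PG_lru);
}
```

Only the first caller to hit a set bit gets `true`; every later call sees the bit already clear.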

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Cc: cgroups@vger.kernel.org
Cc: linux-mm@kvack.org
---
 include/linux/page-flags.h |  1 +
 mm/memcontrol.c            |  4 ++--
 mm/mlock.c                 |  3 +--
 mm/swap.c                  |  8 ++------
 mm/vmscan.c                | 19 +++++++++----------
 5 files changed, 15 insertions(+), 20 deletions(-)

Comments

Andrew Morton March 2, 2020, 10:11 p.m. UTC | #1
On Mon,  2 Mar 2020 19:00:17 +0800 Alex Shi <alex.shi@linux.alibaba.com> wrote:

> Combine the PageLRU check and ClearPageLRU into one function with the
> newly introduced TestClearPageLRU. This function will be used as a page
> isolation precondition.
> 
> No functional change yet.
> 
> ...
>
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2588,9 +2588,8 @@ static void commit_charge(struct page *page, struct mem_cgroup *memcg,
>  		pgdat = page_pgdat(page);
>  		spin_lock_irq(&pgdat->lru_lock);
>  
> -		if (PageLRU(page)) {
> +		if (TestClearPageLRU(page)) {
>  			lruvec = mem_cgroup_page_lruvec(page, pgdat);
> -			ClearPageLRU(page);
>  			del_page_from_lru_list(page, lruvec, page_lru(page));
>  		} else

The code will now get exclusive access of the page->flags cacheline and
will dirty that cacheline, even for !PageLRU() pages.  What is the
performance impact of this?
Alex Shi March 3, 2020, 4:11 a.m. UTC | #2
On 2020/3/3 6:11 AM, Andrew Morton wrote:
>> -		if (PageLRU(page)) {
>> +		if (TestClearPageLRU(page)) {
>>  			lruvec = mem_cgroup_page_lruvec(page, pgdat);
>> -			ClearPageLRU(page);
>>  			del_page_from_lru_list(page, lruvec, page_lru(page));
>>  		} else
> 
> The code will now get exclusive access of the page->flags cacheline and
> will dirty that cacheline, even for !PageLRU() pages.  What is the
> performance impact of this?
> 

Hi Andrew,

Thanks a lot for the comments!

I tested the whole patchset with Fengguang's case-lru-file-readtwice
https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git/
which is the most PageLRU-sensitive case I found. There is no clear
performance drop.

For this single patch, I ran the same case again and there is still no perf
drop. Some data from my 96-thread machine:

			no lockdep	with lockdep and a few other debug options
with this patch		50.96MB/s	32.93MB/s
without this patch	50.50MB/s	33.53MB/s


And no warnings from Intel 0day yet.

Thanks a lot!
Alex
Andrew Morton March 4, 2020, 12:46 a.m. UTC | #3
On Tue, 3 Mar 2020 12:11:34 +0800 Alex Shi <alex.shi@linux.alibaba.com> wrote:

> 
> 
> On 2020/3/3 6:11 AM, Andrew Morton wrote:
> >> -		if (PageLRU(page)) {
> >> +		if (TestClearPageLRU(page)) {
> >>  			lruvec = mem_cgroup_page_lruvec(page, pgdat);
> >> -			ClearPageLRU(page);
> >>  			del_page_from_lru_list(page, lruvec, page_lru(page));
> >>  		} else
> > 
> > The code will now get exclusive access of the page->flags cacheline and
> > will dirty that cacheline, even for !PageLRU() pages.  What is the
> > performance impact of this?
> > 
> 
> Hi Andrew,
> 
> Thanks a lot for the comments!
> 
> I tested the whole patchset with Fengguang's case-lru-file-readtwice
> https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git/
> which is the most PageLRU-sensitive case I found. There is no clear
> performance drop.
> 
> For this single patch, I ran the same case again and there is still no perf
> drop. Some data from my 96-thread machine:
> 
> 			no lockdep	with lockdep and a few other debug options
> with this patch		50.96MB/s	32.93MB/s
> without this patch	50.50MB/s	33.53MB/s
> 
> 

Well, any difference would be small and the numbers did get a bit
lower, albeit probably within the margin of error.

But you know, if someone were to send a patch which micro-optimized
some code by replacing 'TestClearXXX()' with `if PageXXX()
ClearPageXXX()', I would happily merge it!

Is this change essential to the overall patchset?  If not, I'd be
inclined to skip it?
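The micro-optimization Andrew describes (test before the atomic RMW) can be sketched in userspace C. This is a hedged model with C11 atomics, not kernel code; the bit position is illustrative:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define PG_lru 4  /* illustrative bit position */

/* Atomic test-and-clear: always a locked read-modify-write, which
 * takes the flags cacheline exclusive and dirties it even when the
 * bit is already clear and nothing logically changes. */
static bool test_clear_page_lru(atomic_ulong *flags)
{
	return atomic_fetch_and(flags, ~(1UL << PG_lru)) & (1UL << PG_lru);
}

/* The suggested fast path: a read-only test first, so pages without
 * the bit never trigger the RMW and their cacheline stays clean/shared.
 * The RMW result still decides any race; the pre-test is only a hint. */
static bool test_clear_page_lru_opt(atomic_ulong *flags)
{
	if (!(atomic_load(flags) & (1UL << PG_lru)))
		return false;	/* no write, cacheline not dirtied */
	return test_clear_page_lru(flags);
}
```

The trade-off in the thread is exactly this: the combined RMW is simpler and race-free on its own, while the test-first form is cheaper for pages that are not on the LRU.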
Alex Shi March 4, 2020, 7:06 a.m. UTC | #4
On 2020/3/4 8:46 AM, Andrew Morton wrote:
> 
> Well, any difference would be small and the numbers did get a bit
> lower, albeit probably within the margin of error.
> 
> But you know, if someone were to send a patch which micro-optimized
> some code by replacing 'TestClearXXX()' with `if PageXXX()
> ClearPageXXX()', I would happily merge it!
> 
> Is this change essential to the overall patchset?  If not, I'd be
> inclined to skip it?
> 

Hi Andrew,

Thanks a lot for the comments!
Yes, this patch is essential for all of the following patches.

Consider that normal memory testing focuses on user pages, which probably have PageLRU set.

Fengguang's vm-scalability case small-allocs uses a lot of vm_area_struct slab
objects, so it may see some impact. 0day/Rong Chen is working on the test. In my
rough testing, it may drop 5% on my 96-thread/394GB machine.

Thanks
Alex
Chen, Rong A March 4, 2020, 9:03 a.m. UTC | #5
On Wed, Mar 04, 2020 at 03:06:48PM +0800, Alex Shi wrote:
> 
> 
> On 2020/3/4 8:46 AM, Andrew Morton wrote:
> > 
> > Well, any difference would be small and the numbers did get a bit
> > lower, albeit probably within the margin of error.
> > 
> > But you know, if someone were to send a patch which micro-optimized
> > some code by replacing 'TestClearXXX()' with `if PageXXX()
> > ClearPageXXX()', I would happily merge it!
> > 
> > Is this change essential to the overall patchset?  If not, I'd be
> > inclined to skip it?
> > 
> 
> Hi Andrew,
> 
> Thanks a lot for the comments!
> Yes, this patch is essential for all of the following patches.
> 
> Consider that normal memory testing focuses on user pages, which probably have PageLRU set.
> 
> Fengguang's vm-scalability case small-allocs uses a lot of vm_area_struct slab
> objects, so it may see some impact. 0day/Rong Chen is working on the test. In my
> rough testing, it may drop 5% on my 96-thread/394GB machine.
> 
> Thanks
> Alex

Hi,

We only tested one case and found a slight regression of vm-scalability.median from this patch set:

Test case: small allocs
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase/ucode:
  gcc-7/performance/x86_64-rhel-7.6/debian-x86_64-20191114.cgz/300s/lkp-ivb-d02/small-allocs/vm-scalability/0x21

commit: 
  v5.6-rc4
  f71ed0f653e9dbd57347f6321e36556004a17b52

        v5.6-rc4 f71ed0f653e9dbd57347f6321e3 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    998930            -1.4%     984729        vm-scalability.median

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase/ucode:
  gcc-7/performance/x86_64-rhel-7.6/debian-x86_64-20191114.cgz/300s/lkp-csl-2ap4/small-allocs/vm-scalability/0x500002c

commit: 
  v5.6-rc4
  f71ed0f653e9dbd57347f6321e36556004a17b52

        v5.6-rc4 f71ed0f653e9dbd57347f6321e3 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     64040            -2.2%      62641        vm-scalability.median
  12294118            -2.2%   12027483        vm-scalability.throughput
 3.695e+09            -2.2%  3.614e+09        vm-scalability.workload


$ git log --oneline v5.6-rc4..f71ed0f653e9dbd57347f6321e36556004a17b52
f71ed0f653e9d mm/memcg: add debug checking in lock_page_memcg
c56d782a737a5 mm/lru: add debug checking for page memcg moving
40f9438e4f7a9 mm/lru: revise the comments of lru_lock
cf4d433ab1f59 mm/pgdat: remove pgdat lru_lock
8b45216bf602c mm/swap: only change the lru_lock iff page's lruvec is different
9aeff27f856c4 mm/mlock: optimize munlock_pagevec by relocking
0fd16f50f4260 mm/lru: introduce the relock_page_lruvec function
e8bcc2440b133 mm/lru: replace pgdat lru_lock with lruvec lock
88013de2d9cfa mm/mlock: clean up __munlock_isolate_lru_page
037f0e01cc3a3 mm/memcg: move SetPageLRU out of lru_lock in commit_charge
5f889edacd91d mm/lru: take PageLRU first in moving page between lru lists
06998f054a82b mm/mlock: ClearPageLRU before get lru lock in munlock page isolation
5212e3636eed6 mm/lru: add page isolation precondition in __isolate_lru_page
a7b8b29f36b13 mm/lru: introduce TestClearPageLRU
f27e8da1e2fa1 mm/thp: narrow lru locking
2deca0177d843 mm/thp: clean up lru_add_page_tail
afbe030c47e06 mm/thp: move lru_add_page_tail func to huge_memory.c
9bee24913b538 mm/page_idle: no unlikely double check for idle page counting
40def76b96d7b mm/memcg: fold lock_page_lru into commit_charge
c1199696673c2 mm/vmscan: remove unnecessary lruvec adding

Best Regards,
Rong Chen
Alex Shi March 4, 2020, 9:37 a.m. UTC | #6
On 2020/3/4 5:03 PM, Rong Chen wrote:
>> Hi Andrew,
>>
>> Thanks a lot for the comments!
>> Yes, this patch is essential for all of the following patches.
>>
>> Consider that normal memory testing focuses on user pages, which probably have PageLRU set.
>>
>> Fengguang's vm-scalability case small-allocs uses a lot of vm_area_struct slab
>> objects, so it may see some impact. 0day/Rong Chen is working on the test. In my
>> rough testing, it may drop 5% on my 96-thread/394GB machine.
>>
>> Thanks
>> Alex
> Hi,
> 
> We only tested one case and found a slight regression of vm-scalability.median from this patch set:
> 
> Test case: small allocs
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase/ucode:
>   gcc-7/performance/x86_64-rhel-7.6/debian-x86_64-20191114.cgz/300s/lkp-ivb-d02/small-allocs/vm-scalability/0x21

It's a very acceptable result!

Thanks a lot for such quick testing! I believe your results are far more stable than mine. :)

(My testing shows quite different results across 2 reboots (a 1.3% or 6.6% drop). Maybe something went wrong on my side in this case.)

Thanks for your report!
Alex

Patch

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 1bf83c8fcaa7..5cb155f3191e 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -318,6 +318,7 @@  static inline void page_init_poison(struct page *page, size_t size)
 PAGEFLAG(Dirty, dirty, PF_HEAD) TESTSCFLAG(Dirty, dirty, PF_HEAD)
 	__CLEARPAGEFLAG(Dirty, dirty, PF_HEAD)
 PAGEFLAG(LRU, lru, PF_HEAD) __CLEARPAGEFLAG(LRU, lru, PF_HEAD)
+	TESTCLEARFLAG(LRU, lru, PF_HEAD)
 PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD)
 	TESTCLEARFLAG(Active, active, PF_HEAD)
 PAGEFLAG(Workingset, workingset, PF_HEAD)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 875e2aebcde7..f8419f3436a8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2588,9 +2588,8 @@  static void commit_charge(struct page *page, struct mem_cgroup *memcg,
 		pgdat = page_pgdat(page);
 		spin_lock_irq(&pgdat->lru_lock);
 
-		if (PageLRU(page)) {
+		if (TestClearPageLRU(page)) {
 			lruvec = mem_cgroup_page_lruvec(page, pgdat);
-			ClearPageLRU(page);
 			del_page_from_lru_list(page, lruvec, page_lru(page));
 		} else
 			spin_unlock_irq(&pgdat->lru_lock);
@@ -2613,6 +2612,7 @@  static void commit_charge(struct page *page, struct mem_cgroup *memcg,
 
 	if (lrucare && lruvec) {
 		lruvec = mem_cgroup_page_lruvec(page, pgdat);
+
 		VM_BUG_ON_PAGE(PageLRU(page), page);
 		SetPageLRU(page);
 		add_page_to_lru_list(page, lruvec, page_lru(page));
diff --git a/mm/mlock.c b/mm/mlock.c
index a72c1eeded77..03b3a5d99ad7 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -108,13 +108,12 @@  void mlock_vma_page(struct page *page)
  */
 static bool __munlock_isolate_lru_page(struct page *page, bool getpage)
 {
-	if (PageLRU(page)) {
+	if (TestClearPageLRU(page)) {
 		struct lruvec *lruvec;
 
 		lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
 		if (getpage)
 			get_page(page);
-		ClearPageLRU(page);
 		del_page_from_lru_list(page, lruvec, page_lru(page));
 		return true;
 	}
diff --git a/mm/swap.c b/mm/swap.c
index 1ac24fc35d6b..8e71bdd04a1a 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -59,15 +59,13 @@ 
  */
 static void __page_cache_release(struct page *page)
 {
-	if (PageLRU(page)) {
+	if (TestClearPageLRU(page)) {
 		pg_data_t *pgdat = page_pgdat(page);
 		struct lruvec *lruvec;
 		unsigned long flags;
 
 		spin_lock_irqsave(&pgdat->lru_lock, flags);
 		lruvec = mem_cgroup_page_lruvec(page, pgdat);
-		VM_BUG_ON_PAGE(!PageLRU(page), page);
-		__ClearPageLRU(page);
 		del_page_from_lru_list(page, lruvec, page_off_lru(page));
 		spin_unlock_irqrestore(&pgdat->lru_lock, flags);
 	}
@@ -831,7 +829,7 @@  void release_pages(struct page **pages, int nr)
 			continue;
 		}
 
-		if (PageLRU(page)) {
+		if (TestClearPageLRU(page)) {
 			struct pglist_data *pgdat = page_pgdat(page);
 
 			if (pgdat != locked_pgdat) {
@@ -844,8 +842,6 @@  void release_pages(struct page **pages, int nr)
 			}
 
 			lruvec = mem_cgroup_page_lruvec(page, locked_pgdat);
-			VM_BUG_ON_PAGE(!PageLRU(page), page);
-			__ClearPageLRU(page);
 			del_page_from_lru_list(page, lruvec, page_off_lru(page));
 		}
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index dcdd33f65f43..8958454d50fe 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1751,21 +1751,20 @@  int isolate_lru_page(struct page *page)
 	VM_BUG_ON_PAGE(!page_count(page), page);
 	WARN_RATELIMIT(PageTail(page), "trying to isolate tail page");
 
-	if (PageLRU(page)) {
+	get_page(page);
+	if (TestClearPageLRU(page)) {
 		pg_data_t *pgdat = page_pgdat(page);
 		struct lruvec *lruvec;
+		int lru = page_lru(page);
 
-		spin_lock_irq(&pgdat->lru_lock);
 		lruvec = mem_cgroup_page_lruvec(page, pgdat);
-		if (PageLRU(page)) {
-			int lru = page_lru(page);
-			get_page(page);
-			ClearPageLRU(page);
-			del_page_from_lru_list(page, lruvec, lru);
-			ret = 0;
-		}
+		spin_lock_irq(&pgdat->lru_lock);
+		del_page_from_lru_list(page, lruvec, lru);
 		spin_unlock_irq(&pgdat->lru_lock);
-	}
+		ret = 0;
+	} else
+		put_page(page);
+
 	return ret;
 }
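The reworked isolate_lru_page() above pins the page before racing for the flag, and on losing the race undoes the speculative reference. A hedged userspace model of that flow follows; get_page/put_page, the bit position, and the error value are simplified stand-ins for the kernel's, and the lru_lock/list manipulation is reduced to a comment:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define PG_lru 4  /* illustrative bit position */

typedef struct {
	atomic_ulong flags;
	atomic_int refcount;
} page_t;

static void get_page(page_t *p) { atomic_fetch_add(&p->refcount, 1); }
static void put_page(page_t *p) { atomic_fetch_sub(&p->refcount, 1); }

static bool test_clear_page_lru(page_t *p)
{
	return atomic_fetch_and(&p->flags, ~(1UL << PG_lru)) & (1UL << PG_lru);
}

/* New flow: pin the page first, then race for PG_lru. Whoever wins the
 * atomic test-and-clear owns the isolation; a loser drops its reference.
 * The lru_lock is then needed only for the list manipulation, not for
 * the PageLRU test itself. */
static int isolate_lru_page(page_t *p)
{
	get_page(p);
	if (test_clear_page_lru(p)) {
		/* spin_lock_irq(lru_lock); del_page_from_lru_list(); unlock */
		return 0;
	}
	put_page(p);
	return -1;	/* the kernel returns -EBUSY here */
}
```

Exactly one of any set of concurrent callers can succeed, and a successful caller leaves the page pinned for the isolation's lifetime.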