
mm: page_io: fix psi memory pressure error on cold swapins

Message ID: 20220214214921.419687-1-hannes@cmpxchg.org
State: New
Series: mm: page_io: fix psi memory pressure error on cold swapins

Commit Message

Johannes Weiner Feb. 14, 2022, 9:49 p.m. UTC
Once upon a time, all swapins counted toward memory pressure[1]. Then
Joonsoo introduced workingset detection for anonymous pages and we
gained the ability to distinguish hot from cold swapins[2][3]. But we
failed to update swap_readpage() accordingly, and now we account
partial memory pressure in the swapin path of cold memory.

Not for all situations - which adds more inconsistency: paths using
the conventional submit_bio() and lock_page() route will not see much
pressure - unless storage itself is heavily congested and the bio
submissions stall. ZRAM and ZSWAP do most of the work directly from
swap_readpage() and will see all swapins reflected as pressure.

Restore consistency by making all swapin stall accounting conditional
on the page actually being part of the workingset.

[1] commit 937790699be9 ("mm/page_io.c: annotate refault stalls from swap_readpage")
[2] commit aae466b0052e ("mm/swap: implement workingset detection for anonymous LRU")
[3] commit cad8320b4b39 ("mm/swap: don't SetPageWorkingset unconditionally during swapin")

Reported-by: CGEL <cgel.zte@gmail.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Yu Zhao <yuzhao@google.com>
---
 mm/page_io.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
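
For orientation before the discussion below, here is a simplified sketch of what swap_readpage() looks like with the patch applied. Bio submission, frontswap, and the synchronous zram/zswap completion path are elided; only the stall accounting is shown, so treat this as an illustration of the change rather than the actual function body (the real diff follows further down).

/*
 * Simplified sketch of swap_readpage() with the patch applied.
 * Bio submission, frontswap, and the synchronous (zram/zswap)
 * branches are omitted; only the stall accounting is shown.
 */
int swap_readpage(struct page *page, bool synchronous)
{
	bool workingset = PageWorkingset(page);	/* hot vs. cold swapin */
	unsigned long pflags;
	int ret = 0;

	/*
	 * A refault of a page that was recently part of the workingset
	 * is a memory stall; a cold swapin is ordinary I/O and should
	 * not be reported as memory pressure.
	 */
	if (workingset)
		psi_memstall_enter(&pflags);
	delayacct_swapin_start();

	/* ... submit the read; zram/zswap complete it right here ... */

	if (workingset)
		psi_memstall_leave(&pflags);
	delayacct_swapin_end();
	return ret;
}

PageWorkingset() is sampled once into a local variable up front, which keeps the psi_memstall_enter()/psi_memstall_leave() pair balanced no matter what happens to the page flag while the read is in flight.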

Comments

Andrew Morton Feb. 14, 2022, 10:48 p.m. UTC | #1
On Mon, 14 Feb 2022 16:49:21 -0500 Johannes Weiner <hannes@cmpxchg.org> wrote:

> Once upon a time, all swapins counted toward memory pressure[1]. Then
> Joonsoo introduced workingset detection for anonymous pages and we
> gained the ability to distinguish hot from cold swapins[2][3]. But we
> failed to update swap_readpage() accordingly, and now we account
> partial memory pressure in the swapin path of cold memory.
> 
> Not for all situations - which adds more inconsistency: paths using
> the conventional submit_bio() and lock_page() route will not see much
> pressure - unless storage itself is heavily congested and the bio
> submissions stall. ZRAM and ZSWAP do most of the work directly from
> swap_readpage() and will see all swapins reflected as pressure.
> 
> Restore consistency by making all swapin stall accounting conditional
> on the page actually being part of the workingset.

Does this have any known runtime effects?  If not, can we
hazard a guess?
Johannes Weiner Feb. 15, 2022, 4:13 p.m. UTC | #2
On Mon, Feb 14, 2022 at 02:48:05PM -0800, Andrew Morton wrote:
> On Mon, 14 Feb 2022 16:49:21 -0500 Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > Once upon a time, all swapins counted toward memory pressure[1]. Then
> > Joonsoo introduced workingset detection for anonymous pages and we
> > gained the ability to distinguish hot from cold swapins[2][3]. But we
> > failed to update swap_readpage() accordingly, and now we account
> > partial memory pressure in the swapin path of cold memory.
> > 
> > Not for all situations - which adds more inconsistency: paths using
> > the conventional submit_bio() and lock_page() route will not see much
> > pressure - unless storage itself is heavily congested and the bio
> > submissions stall. ZRAM and ZSWAP do most of the work directly from
> > swap_readpage() and will see all swapins reflected as pressure.
> > 
> > Restore consistency by making all swapin stall accounting conditional
> > on the page actually being part of the workingset.
> 
> Does this have any known runtime effects?  If not, can we
> hazard a guess?

hm, how about this paragraph between "not for all situations" and
"restore consistency":

IOW, a workload doing cold swapins could see little to no pressure
reported with on-disk swap, but potentially high pressure with a zram
or zswap backend. That confuses any psi-based health monitoring, load
shedding, proactive reclaim, or userspace OOM killing schemes that
might be in place for the workload.
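
For context on the schemes mentioned above: they typically consume the psi interface itself, either globally through /proc/pressure/memory or per cgroup through memory.pressure. A minimal, read-only illustration follows; real monitors such as oomd additionally register poll()-able triggers on these files, which is omitted here.

/*
 * Illustration only, not part of the patch: print the "some" and
 * "full" memory pressure lines that psi exposes to userspace.
 */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/pressure/memory", "r");
	char line[256];

	if (!f) {
		perror("fopen /proc/pressure/memory");
		return 1;
	}
	/* Each line: some|full avg10=... avg60=... avg300=... total=<usecs> */
	while (fgets(line, sizeof(line), f))
		fputs(line, stdout);

	fclose(f);
	return 0;
}

With the patch, cold swapins should no longer inflate these numbers on zram/zswap setups, matching the on-disk swap behavior described above.
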
Minchan Kim Feb. 15, 2022, 5:44 p.m. UTC | #3
On Mon, Feb 14, 2022 at 04:49:21PM -0500, Johannes Weiner wrote:
> Once upon a time, all swapins counted toward memory pressure[1]. Then
> Joonsoo introduced workingset detection for anonymous pages and we
> gained the ability to distinguish hot from cold swapins[2][3]. But we
> failed to update swap_readpage() accordingly, and now we account
> partial memory pressure in the swapin path of cold memory.
> 
> Not for all situations - which adds more inconsistency: paths using
> the conventional submit_bio() and lock_page() route will not see much
> pressure - unless storage itself is heavily congested and the bio
> submissions stall. ZRAM and ZSWAP do most of the work directly from
> swap_readpage() and will see all swapins reflected as pressure.
> 
> Restore consistency by making all swapin stall accounting conditional
> on the page actually being part of the workingset.
> 
> [1] commit 937790699be9 ("mm/page_io.c: annotate refault stalls from swap_readpage")
> [2] commit aae466b0052e ("mm/swap: implement workingset detection for anonymous LRU")
> [3] commit cad8320b4b39 ("mm/swap: don't SetPageWorkingset unconditionally during swapin")
> 
> Reported-by: CGEL <cgel.zte@gmail.com>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Minchan Kim <minchan@kernel.org>

Patch

diff --git a/mm/page_io.c b/mm/page_io.c
index 61c792f916fa..f6296ee25014 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -359,6 +359,7 @@ int swap_readpage(struct page *page, bool synchronous)
 	struct bio *bio;
 	int ret = 0;
 	struct swap_info_struct *sis = page_swap_info(page);
+	bool workingset = PageWorkingset(page);
 	unsigned long pflags;
 
 	VM_BUG_ON_PAGE(!PageSwapCache(page) && !synchronous, page);
@@ -370,7 +371,8 @@ int swap_readpage(struct page *page, bool synchronous)
 	 * or the submitting cgroup IO-throttled, submission can be a
 	 * significant part of overall IO time.
 	 */
-	psi_memstall_enter(&pflags);
+	if (workingset)
+		psi_memstall_enter(&pflags);
 	delayacct_swapin_start();
 
 	if (frontswap_load(page) == 0) {
@@ -431,7 +433,8 @@ int swap_readpage(struct page *page, bool synchronous)
 	bio_put(bio);
 
 out:
-	psi_memstall_leave(&pflags);
+	if (workingset)
+		psi_memstall_leave(&pflags);
 	delayacct_swapin_end();
 	return ret;
 }