mm: don't SetPageWorkingset unconditionally during swapin

Message ID 20201209012400.1771150-1-yuzhao@google.com (mailing list archive)
State New, archived

Commit Message

Yu Zhao Dec. 9, 2020, 1:24 a.m. UTC
We have been able to SetPageWorkingset based on refault distances since
commit aae466b0052e ("mm/swap: implement workingset detection for anonymous LRU").
This is done by workingset_refault(), which is called right above the
unconditional SetPageWorkingset deleted by this patch.

The unconditional SetPageWorkingset miscategorizes pages that are
read ahead or never belonged to the working set (e.g., tmpfs pages
accessed by fd). When those pages are swapped in (after they were
swapped out) for the first time, they skew PSI (when using
async swap). When this happens again, depending on their refault
distances, they might skew workingset_restore_anon counter in
addition to PSI because their shadows say they were part of the
working set.

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 mm/swap_state.c | 1 -
 1 file changed, 1 deletion(-)
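
The two-step skew described in the commit message can be modeled in user space. The sketch below is not kernel code: every model_* name is hypothetical, the clock values and workingset size are arbitrary, and refault handling is reduced to a distance check plus the workingset bit stored in the shadow entry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these types are the kernel's. */
struct model_page {
	bool workingset;		/* stands in for PG_workingset */
};

struct model_shadow {
	unsigned long eviction;		/* model clock value at swap-out */
	bool workingset;		/* was the page flagged when evicted? */
};

struct model_result {
	bool flagged_after_first;	/* flag state after the first swapin */
	unsigned long restore_anon;	/* stands in for workingset_restore_anon */
};

/* Record a shadow entry when the page is swapped out. */
static struct model_shadow model_evict(const struct model_page *page,
				       unsigned long clock)
{
	struct model_shadow shadow = {
		.eviction = clock,
		.workingset = page->workingset,
	};
	return shadow;
}

/*
 * Simplified stand-in for workingset_refault(): flag the page only if the
 * refault distance is small enough AND the shadow says it was flagged before.
 */
static void model_refault(struct model_page *page,
			  const struct model_shadow *shadow,
			  unsigned long clock, unsigned long workingset_size,
			  unsigned long *restore_anon)
{
	assert(clock >= shadow->eviction);
	if (clock - shadow->eviction > workingset_size)
		return;
	if (shadow->workingset) {
		page->workingset = true;
		(*restore_anon)++;
	}
}

/* Swap a page back in; 'unconditional' models the line the patch deletes. */
static struct model_page model_swapin(const struct model_shadow *shadow,
				      unsigned long clock,
				      unsigned long workingset_size,
				      unsigned long *restore_anon,
				      bool unconditional)
{
	struct model_page page = { .workingset = false };

	model_refault(&page, shadow, clock, workingset_size, restore_anon);
	if (unconditional)
		page.workingset = true;
	return page;
}

/* Swap a page that never belonged to the working set out and in, twice. */
static struct model_result model_two_cycles(bool unconditional)
{
	struct model_result res = { .flagged_after_first = false, .restore_anon = 0 };
	struct model_page page = { .workingset = false };
	struct model_shadow shadow;

	shadow = model_evict(&page, 100);
	page = model_swapin(&shadow, 110, 50, &res.restore_anon, unconditional);
	res.flagged_after_first = page.workingset;

	shadow = model_evict(&page, 200);
	page = model_swapin(&shadow, 210, 50, &res.restore_anon, unconditional);
	return res;
}
```

In this model, model_two_cycles(true) leaves the cold page flagged after the first swapin and counts one restore on the second, while model_two_cycles(false) does neither.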

Comments

Vlastimil Babka Dec. 9, 2020, 9:18 a.m. UTC | #1
On 12/9/20 2:24 AM, Yu Zhao wrote:
> We are capable of SetPageWorkingset based on refault distances after
> commit aae466b0052e ("mm/swap: implement workingset detection for anonymous LRU")
> This is done by workingset_refault(), which is right above the
> unconditional SetPageWorkingset deleted by this patch.
> 
> The unconditional SetPageWorkingset miscategorizes pages that are
> read ahead or never belonged to the working set (e.g., tmpfs pages
> accessed by fd). When those pages are swapped in (after they were
> swapped out) for the first time, they skew PSI (when using
> async swap). When this happens again, depending on their refault
> distances, they might skew workingset_restore_anon counter in
> addition to PSI because their shadows say they were part of the
> working set.
> 
> Signed-off-by: Yu Zhao <yuzhao@google.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

Makes sense, especially now that we have anonymous LRU support. The flag setting
in this context seems to go back all the way to 1899ad18c607 ("mm: workingset:
tell cache transitions from workingset thrashing") where I'm not sure why it was
even used on the anonymous page, when workingset was only implemented for the
page cache. Maybe Johannes remembers?

> ---
>  mm/swap_state.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index 1a01235156d1..6ecc84448d75 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -536,7 +536,6 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
>  		workingset_refault(page, shadow);
>  
>  	/* Caller will initiate read into locked page */
> -	SetPageWorkingset(page);
>  	lru_cache_add(page);
>  	*new_page_allocated = true;
>  	return page;
>
Johannes Weiner Dec. 10, 2020, 10:47 a.m. UTC | #2
On Tue, Dec 08, 2020 at 06:24:00PM -0700, Yu Zhao wrote:
> We are capable of SetPageWorkingset based on refault distances after
> commit aae466b0052e ("mm/swap: implement workingset detection for anonymous LRU")
> This is done by workingset_refault(), which is right above the
> unconditional SetPageWorkingset deleted by this patch.
> 
> The unconditional SetPageWorkingset miscategorizes pages that are
> read ahead or never belonged to the working set (e.g., tmpfs pages
> accessed by fd). When those pages are swapped in (after they were
> swapped out) for the first time, they skew PSI (when using
> async swap). When this happens again, depending on their refault
> distances, they might skew workingset_restore_anon counter in
> addition to PSI because their shadows say they were part of the
> working set.
> 
> Signed-off-by: Yu Zhao <yuzhao@google.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Johannes Weiner Dec. 10, 2020, 11:21 a.m. UTC | #3
On Wed, Dec 09, 2020 at 10:18:22AM +0100, Vlastimil Babka wrote:
> On 12/9/20 2:24 AM, Yu Zhao wrote:
> > We are capable of SetPageWorkingset based on refault distances after
> > commit aae466b0052e ("mm/swap: implement workingset detection for anonymous LRU")
> > This is done by workingset_refault(), which is right above the
> > unconditional SetPageWorkingset deleted by this patch.
> > 
> > The unconditional SetPageWorkingset miscategorizes pages that are
> > read ahead or never belonged to the working set (e.g., tmpfs pages
> > accessed by fd). When those pages are swapped in (after they were
> > swapped out) for the first time, they skew PSI (when using
> > async swap). When this happens again, depending on their refault
> > distances, they might skew workingset_restore_anon counter in
> > addition to PSI because their shadows say they were part of the
> > working set.
> > 
> > Signed-off-by: Yu Zhao <yuzhao@google.com>
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> 
> Makes sense, especially now that we have anonymous LRU support. The flag setting
> in this context seems to go back all the way to 1899ad18c607 ("mm: workingset:
> tell cache transitions from workingset thrashing") where I'm not sure why it was
> even used on the anonymous page, when workingset was only implemented for the
> page cache. Maybe Johannes remembers?

I just double checked that commit and the changelog is indeed
incomplete and doesn't mention the swap aspect. :(

That patch was part of the psi series. It was meant to mark incoming
pages under IO with SetPageWorkingset when waiting for them
constituted a memory stall.

On the page cache side, because we HAVE workingset detection, this was
specific to recently evicted pages that had been active in their
previous life. On the anon side, the aging algorithm had no
distinction between workingset and sporadically used pages. Given the
choice between a) no swapin stalls are pressure and b) all swapin
stalls are pressure, I went with the latter in order to detect swap
storms. The false positive case - high rate of swapin without severe
memory pressure - was relatively unlikely, because we tried to avoid
swapping until everything was completely on fire in the first place.

With the lru balancing rework, more prevalent use of proactive reclaim
etc. the distinction between hot and cold swapins became more
important. Thankfully, Joonsoo's patches made that possible.
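
The choice between a) and b), and what refault-based detection adds, can be sketched as a toy model. This assumes that a wait on a page under IO counts as a memory stall only when the page carries the workingset flag; the model_* names, the policy enum and the sample of hot/cold pages are all hypothetical and not taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policies for flagging anonymous pages at swapin. */
enum model_policy {
	MODEL_NO_STALLS,	/* option a): no swapin stall is pressure */
	MODEL_ALL_STALLS,	/* option b): every swapin stall is pressure */
	MODEL_REFAULT_BASED,	/* flag only pages judged hot on refault */
};

struct model_io_page {
	bool workingset;	/* stands in for PG_workingset */
	bool uptodate;		/* false while the read is still in flight */
};

/* Prepare a page for swapin IO and flag it according to the policy. */
static void model_mark_swapin(struct model_io_page *page,
			      enum model_policy policy, bool hot)
{
	assert(page != NULL);
	page->uptodate = false;
	switch (policy) {
	case MODEL_NO_STALLS:
		page->workingset = false;
		break;
	case MODEL_ALL_STALLS:
		page->workingset = true;
		break;
	case MODEL_REFAULT_BASED:
		page->workingset = hot;
		break;
	}
}

/* Assumed gate: waiting on a flagged page under IO is a memory stall. */
static bool model_wait_is_memstall(const struct model_io_page *page)
{
	return !page->uptodate && page->workingset;
}

/* Count stalls for a fixed sample of four swapins: two hot, two cold. */
static unsigned int model_count_stalls(enum model_policy policy)
{
	static const bool hot[] = { true, false, false, true };
	unsigned int stalls = 0;
	size_t i;

	for (i = 0; i < sizeof(hot) / sizeof(hot[0]); i++) {
		struct model_io_page page = { false, false };

		model_mark_swapin(&page, policy, hot[i]);
		if (model_wait_is_memstall(&page))
			stalls++;
	}
	return stalls;
}
```

For the sample above, policy a) reports no stalls, policy b) reports all four, and the refault-based policy reports only the two hot pages.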
Joonsoo Kim Dec. 10, 2020, 12:06 p.m. UTC | #4
On Tue, Dec 08, 2020 at 06:24:00PM -0700, Yu Zhao wrote:
> We are capable of SetPageWorkingset based on refault distances after
> commit aae466b0052e ("mm/swap: implement workingset detection for anonymous LRU")
> This is done by workingset_refault(), which is right above the
> unconditional SetPageWorkingset deleted by this patch.
> 
> The unconditional SetPageWorkingset miscategorizes pages that are
> read ahead or never belonged to the working set (e.g., tmpfs pages
> accessed by fd). When those pages are swapped in (after they were
> swapped out) for the first time, they skew PSI (when using
> async swap). When this happens again, depending on their refault
> distances, they might skew workingset_restore_anon counter in
> addition to PSI because their shadows say they were part of the
> working set.
> 
> Signed-off-by: Yu Zhao <yuzhao@google.com>

Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Thanks
Yu Zhao Dec. 10, 2020, 10:44 p.m. UTC | #5
On Thu, Dec 10, 2020 at 06:21:57AM -0500, Johannes Weiner wrote:
> I just double checked that commit and the changelog is indeed
> incomplete and doesn't mention the swap aspect. :(
> 
> That patch was part of the psi series. It was meant to mark incoming
> pages under IO with SetPageWorkingset when waiting for them
> constituted a memory stall.
> 
> On the page cache side, because we HAVE workingset detection, this was
> specific to recently evicted pages that had been active in their
> previous life. On the anon side, the aging algorithm had no
> distinction between workingset and sporadically used pages. Given the
> choice between a) no swapin stalls are pressure and b) all swapin
> stalls are pressure, I went with the latter in order to detect swap
> storms. The false positive case - high rate of swapin without severe
> memory pressure - was relatively unlikely, because we tried to avoid
> swapping until everything was completely on fire in the first place.

This was my guess too -- and it made sense to go with b) at that time.

Thanks for confirming.

> With the lru balancing rework, more prevalent use of proactive reclaim
> etc. the distinction between hot and cold swapins became more
> important. Thankfully, Joonsoo's patches made that possible.
Michal Hocko Dec. 14, 2020, 4:09 p.m. UTC | #6
On Thu 10-12-20 06:21:57, Johannes Weiner wrote:
> I just double checked that commit and the changelog is indeed
> incomplete and doesn't mention the swap aspect. :(
> 
> That patch was part of the psi series. It was meant to mark incoming
> pages under IO with SetPageWorkingset when waiting for them
> constituted a memory stall.
> 
> On the page cache side, because we HAVE workingset detection, this was
> specific to recently evicted pages that had been active in their
> previous life. On the anon side, the aging algorithm had no
> distinction between workingset and sporadically used pages. Given the
> choice between a) no swapin stalls are pressure and b) all swapin
> stalls are pressure, I went with the latter in order to detect swap
> storms. The false positive case - high rate of swapin without severe
> memory pressure - was relatively unlikely, because we tried to avoid
> swapping until everything was completely on fire in the first place.
> 
> With the lru balancing rework, more prevalent use of proactive reclaim
> etc. the distinction between hot and cold swapins became more
> important. Thankfully, Joonsoo's patches made that possible.

This is useful information, thanks! Yu Zhao, can you add it to the
changelog so that we have it for future reference, please?

Feel free to add
Acked-by: Michal Hocko <mhocko@suse.com>

Patch

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 1a01235156d1..6ecc84448d75 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -536,7 +536,6 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 		workingset_refault(page, shadow);
 
 	/* Caller will initiate read into locked page */
-	SetPageWorkingset(page);
 	lru_cache_add(page);
 	*new_page_allocated = true;
 	return page;
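
One way to check the effect on a running system is to sample the workingset_restore_anon counter before and after a swap-heavy workload. The sketch below assumes the counter is exported as a "name value" line in /proc/vmstat; the vmstat_lookup() and vmstat_read_counter() helpers are hypothetical names, and the fixed 16 KiB buffer is an arbitrary limit that would truncate a larger file.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Find a "name value" line in vmstat-style text.
 * Returns 0 and stores the value on success, -1 if the name is absent.
 */
static int vmstat_lookup(const char *text, const char *name,
			 unsigned long long *value)
{
	size_t len;
	const char *line = text;

	assert(text && name && value);
	len = strlen(name);

	while (line && *line) {
		/* Require an exact name match followed by a space. */
		if (strncmp(line, name, len) == 0 && line[len] == ' ') {
			*value = strtoull(line + len + 1, NULL, 10);
			return 0;
		}
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/* Read the file at 'path' (e.g. "/proc/vmstat") and look up one counter. */
static int vmstat_read_counter(const char *path, const char *name,
			       unsigned long long *value)
{
	char buf[16384];
	size_t n;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return vmstat_lookup(buf, name, value);
}
```

A caller would invoke vmstat_read_counter("/proc/vmstat", "workingset_restore_anon", &v) twice and compare the two samples.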