[1/2] mm/mglru: only clear kswapd_failures if reclaimable

Message ID 20241014221211.832591-1-weixugc@google.com (mailing list archive)
State New

Commit Message

Wei Xu Oct. 14, 2024, 10:12 p.m. UTC
lru_gen_shrink_node() unconditionally clears kswapd_failures, which
can prevent kswapd from sleeping and cause 100% kswapd cpu usage even
when kswapd repeatedly fails to make progress in reclaim.

Only clear kswapd_failures in lru_gen_shrink_node() if reclaim makes
some progress, similar to shrink_node().

Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: Wei Xu <weixugc@google.com>
---
 mm/vmscan.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Andrew Morton Oct. 14, 2024, 11:25 p.m. UTC | #1
On Mon, 14 Oct 2024 22:12:11 +0000 Wei Xu <weixugc@google.com> wrote:

> lru_gen_shrink_node() unconditionally clears kswapd_failures, which
> can prevent kswapd from sleeping and cause 100% kswapd cpu usage even
> when kswapd repeatedly fails to make progress in reclaim.
> 
> Only clear kswapd_failures in lru_gen_shrink_node() if reclaim makes
> some progress, similar to shrink_node().

That sounds bad.  What triggers this?  Can you suggest why it has just
been discovered, after 1.5 years?  And should the fix be backported into
-stable kernels?

Wei Xu Oct. 14, 2024, 11:41 p.m. UTC | #2
On Mon, Oct 14, 2024 at 4:25 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Mon, 14 Oct 2024 22:12:11 +0000 Wei Xu <weixugc@google.com> wrote:
>
> > lru_gen_shrink_node() unconditionally clears kswapd_failures, which
> > can prevent kswapd from sleeping and cause 100% kswapd cpu usage even
> > when kswapd repeatedly fails to make progress in reclaim.
> >
> > Only clear kswapd_failures in lru_gen_shrink_node() if reclaim makes
> > some progress, similar to shrink_node().
>
> That sounds bad.  What triggers this?  Can you suggest why it has just
> been discovered, after 1.5 years?  And should the fix be backported into
> -stable kernels?
>

I happened to run into this problem in one of my tests recently. It
requires a combination of several conditions: the allocator must
allocate just the right number of pages, so that it wakes up kswapd
without itself being OOM-killed; there is no memory for kswapd to
reclaim (my test disables swap and clears the page cache first); and
no other process frees enough memory at the same time.

I think the fix is a good candidate for stable kernels.

Yu Zhao Oct. 16, 2024, 4:56 a.m. UTC | #3
On Mon, Oct 14, 2024 at 4:12 PM Wei Xu <weixugc@google.com> wrote:
>
> lru_gen_shrink_node() unconditionally clears kswapd_failures, which
> can prevent kswapd from sleeping and cause 100% kswapd cpu usage even
> when kswapd repeatedly fails to make progress in reclaim.
>
> Only clear kswapd_failures in lru_gen_shrink_node() if reclaim makes
> some progress, similar to shrink_node().
>
> Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
> Signed-off-by: Wei Xu <weixugc@google.com>
> ---
>  mm/vmscan.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 50dc06d55b1d..9d1e1c4e383d 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4970,8 +4970,8 @@ static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *
>
>         blk_finish_plug(&plug);
>  done:

Nit: the "done:" isn't used anymore, so better just remove it.

> -       /* kswapd should never fail */
> -       pgdat->kswapd_failures = 0;
> +       if (sc->nr_reclaimed > reclaimed)
> +               pgdat->kswapd_failures = 0;
>  }
>
>  /******************************************************************************
> --
> 2.47.0.rc1.288.g06298d1525-goog
>
>

Wei Xu Oct. 16, 2024, 5:29 a.m. UTC | #4
On Tue, Oct 15, 2024 at 9:57 PM Yu Zhao <yuzhao@google.com> wrote:
>
> On Mon, Oct 14, 2024 at 4:12 PM Wei Xu <weixugc@google.com> wrote:
> >
> > lru_gen_shrink_node() unconditionally clears kswapd_failures, which
> > can prevent kswapd from sleeping and cause 100% kswapd cpu usage even
> > when kswapd repeatedly fails to make progress in reclaim.
> >
> > Only clear kswapd_failures in lru_gen_shrink_node() if reclaim makes
> > some progress, similar to shrink_node().
> >
> > Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
> > Signed-off-by: Wei Xu <weixugc@google.com>
> > ---
> >  mm/vmscan.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 50dc06d55b1d..9d1e1c4e383d 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -4970,8 +4970,8 @@ static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *
> >
> >         blk_finish_plug(&plug);
> >  done:
>
> Nit: the "done:" isn't used anymore, so better just remove it.
>

"goto done" is still used at the beginning of lru_gen_shrink_node().
We could refactor the code to remove it, but that is better handled
in a separate change.

> > -       /* kswapd should never fail */
> > -       pgdat->kswapd_failures = 0;
> > +       if (sc->nr_reclaimed > reclaimed)
> > +               pgdat->kswapd_failures = 0;
> >  }
> >
> >  /******************************************************************************
> > --
> > 2.47.0.rc1.288.g06298d1525-goog
> >
> >

Patch

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 50dc06d55b1d..9d1e1c4e383d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4970,8 +4970,8 @@  static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *
 
 	blk_finish_plug(&plug);
 done:
-	/* kswapd should never fail */
-	pgdat->kswapd_failures = 0;
+	if (sc->nr_reclaimed > reclaimed)
+		pgdat->kswapd_failures = 0;
 }
 
 /******************************************************************************
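
To illustrate the failure mode described in the commit message, here is
a minimal userspace sketch in C. It is a toy model, not kernel code:
toy_node, toy_scan, toy_shrink_node() and toy_passes_until_sleep() are
hypothetical names, and the retry loop is a heavily simplified stand-in
for kswapd's balancing loop. The only detail taken from the kernel is
the retry limit MAX_RECLAIM_RETRIES (16). The model assumes the caller
counts a failure whenever a pass reclaims nothing and gives up once the
counter reaches that limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Retry limit as defined in the kernel; everything else is a toy model. */
#define MAX_RECLAIM_RETRIES 16

/* Hypothetical stand-in for the per-node state (struct pglist_data). */
struct toy_node {
	int kswapd_failures;
};

/* Hypothetical stand-in for struct scan_control. */
struct toy_scan {
	unsigned long nr_reclaimed;
};

/*
 * Hypothetical stand-in for lru_gen_shrink_node(): "reclaims" `freeable`
 * pages, then clears the failure counter either unconditionally (old
 * behaviour) or only if this call made progress (patched behaviour).
 */
static void toy_shrink_node(struct toy_node *node, struct toy_scan *sc,
			    unsigned long freeable, bool clear_unconditionally)
{
	unsigned long reclaimed = sc->nr_reclaimed;

	sc->nr_reclaimed += freeable;

	if (clear_unconditionally || sc->nr_reclaimed > reclaimed)
		node->kswapd_failures = 0;
}

/*
 * Simplified retry loop: each pass shrinks the node once and counts a
 * failure if nothing was reclaimed. Returns the pass on which the counter
 * reaches MAX_RECLAIM_RETRIES (the point where the model gives up and
 * would let kswapd sleep), or -1 if that never happens in max_passes.
 */
static int toy_passes_until_sleep(bool clear_unconditionally,
				  unsigned long freeable_per_pass,
				  int max_passes)
{
	struct toy_node node = { .kswapd_failures = 0 };

	assert(max_passes > 0);

	for (int pass = 1; pass <= max_passes; pass++) {
		struct toy_scan sc = { .nr_reclaimed = 0 };

		toy_shrink_node(&node, &sc, freeable_per_pass,
				clear_unconditionally);

		if (sc.nr_reclaimed == 0)
			node.kswapd_failures++;

		if (node.kswapd_failures >= MAX_RECLAIM_RETRIES)
			return pass;
	}

	return -1;
}
```

In this model, with nothing reclaimable, toy_passes_until_sleep(true, 0,
1000) returns -1 because the counter is reset on every pass, while
toy_passes_until_sleep(false, 0, 1000) returns 16, which is the point
where the retry limit is hit and kswapd would stop spinning.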