Message ID | 20241014221211.832591-1-weixugc@google.com (mailing list archive) |
---|---|
State | New |
Series | [1/2] mm/mglru: only clear kswapd_failures if reclaimable |
On Mon, 14 Oct 2024 22:12:11 +0000 Wei Xu <weixugc@google.com> wrote:
> lru_gen_shrink_node() unconditionally clears kswapd_failures, which
> can prevent kswapd from sleeping and cause 100% kswapd CPU usage even
> when kswapd repeatedly fails to make progress in reclaim.
>
> Only clear kswapd_failures in lru_gen_shrink_node() if reclaim makes
> some progress, similar to shrink_node().

That sounds bad. What triggers this? Can you suggest why it has just been discovered, after 1.5 years? And should the fix be backported into -stable kernels?
On Mon, Oct 14, 2024 at 4:25 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Mon, 14 Oct 2024 22:12:11 +0000 Wei Xu <weixugc@google.com> wrote:
>
> > [...]
>
> That sounds bad. What triggers this? Can you suggest why it has just
> been discovered, after 1.5 years? And should the fix be backported into
> -stable kernels?
>

I happened to run into this problem in one of my tests recently. It requires a combination of several conditions:

- the allocator needs to allocate the right amount of pages, so that it wakes up kswapd without itself being OOM killed;
- there is no memory for kswapd to reclaim (my test disables swap and cleans the page cache first);
- no other process frees enough memory at the same time.

I think the fix is a good candidate for stable kernels.
On Mon, Oct 14, 2024 at 4:12 PM Wei Xu <weixugc@google.com> wrote:
>
> [...]
>
> @@ -4970,8 +4970,8 @@ static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *
>  
>  	blk_finish_plug(&plug);
>  done:

Nit: the "done:" isn't used anymore, so better just remove it.

> -	/* kswapd should never fail */
> -	pgdat->kswapd_failures = 0;
> +	if (sc->nr_reclaimed > reclaimed)
> +		pgdat->kswapd_failures = 0;
>  }
On Tue, Oct 15, 2024 at 9:57 PM Yu Zhao <yuzhao@google.com> wrote:
>
> On Mon, Oct 14, 2024 at 4:12 PM Wei Xu <weixugc@google.com> wrote:
> >
> > [...]
> >  	blk_finish_plug(&plug);
> >  done:
>
> Nit: the "done:" isn't used anymore, so better just remove it.
>

"goto done" is still used at the beginning of lru_gen_shrink_node(). We could refactor the code to remove it, but that is better handled in a separate change.
lru_gen_shrink_node() unconditionally clears kswapd_failures, which can prevent kswapd from sleeping and cause 100% kswapd CPU usage even when kswapd repeatedly fails to make progress in reclaim.

Only clear kswapd_failures in lru_gen_shrink_node() if reclaim makes some progress, similar to shrink_node().

Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: Wei Xu <weixugc@google.com>

```diff
---
 mm/vmscan.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 50dc06d55b1d..9d1e1c4e383d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4970,8 +4970,8 @@ static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *
 
 	blk_finish_plug(&plug);
 done:
-	/* kswapd should never fail */
-	pgdat->kswapd_failures = 0;
+	if (sc->nr_reclaimed > reclaimed)
+		pgdat->kswapd_failures = 0;
 }
 
 /******************************************************************************
```