Message ID | 20211202150614.22440-1-mgorman@techsingularity.net (mailing list archive)
---|---
State | New, archived
Series | [v4,1/1] mm: vmscan: Reduce throttling due to a failure to make progress
Hi Mel, On Thu, Dec 2, 2021 at 7:07 AM Mel Gorman <mgorman@techsingularity.net> wrote: > > Mike Galbraith, Alexey Avramov and Darrick Wong all reported similar > problems due to reclaim throttling for excessive lengths of time. > In Alexey's case, a memory hog that should go OOM quickly stalls for > several minutes before stalling. In Mike and Darrick's cases, a small > memcg environment stalled excessively even though the system had enough > memory overall. > > Commit 69392a403f49 ("mm/vmscan: throttle reclaim when no progress is being > made") introduced the problem although commit a19594ca4a8b ("mm/vmscan: > increase the timeout if page reclaim is not making progress") made it > worse. Systems at or near an OOM state that cannot be recovered must > reach OOM quickly and memcg should kill tasks if a memcg is near OOM. > Is there a reason we can't simply revert 69392a403f49 instead of adding more code/heuristics? Looking more into 69392a403f49, I don't think the code and commit message are in sync. For the memcg reclaim, instead of just removing congestion_wait or replacing it with schedule_timeout in mem_cgroup_force_empty(), why change the behavior of all memcg reclaim. Also this patch effectively reverts that behavior of 69392a403f49. For direct reclaimers under global pressure, why is page allocator a bad place for stalling on no progress reclaim? IMHO the callers of the reclaim should decide what to do if reclaim is not making progress. thanks, Shakeel
On Thu, Dec 02, 2021 at 08:30:51AM -0800, Shakeel Butt wrote: > Hi Mel, > > On Thu, Dec 2, 2021 at 7:07 AM Mel Gorman <mgorman@techsingularity.net> wrote: > > > > Mike Galbraith, Alexey Avramov and Darrick Wong all reported similar > > problems due to reclaim throttling for excessive lengths of time. > > In Alexey's case, a memory hog that should go OOM quickly stalls for > > several minutes before stalling. In Mike and Darrick's cases, a small > > memcg environment stalled excessively even though the system had enough > > memory overall. > > > > Commit 69392a403f49 ("mm/vmscan: throttle reclaim when no progress is being > > made") introduced the problem although commit a19594ca4a8b ("mm/vmscan: > > increase the timeout if page reclaim is not making progress") made it > > worse. Systems at or near an OOM state that cannot be recovered must > > reach OOM quickly and memcg should kill tasks if a memcg is near OOM. > > > > Is there a reason we can't simply revert 69392a403f49 instead of adding > more code/heuristics? Looking more into 69392a403f49, I don't think the > code and commit message are in sync. > > For the memcg reclaim, instead of just removing congestion_wait or > replacing it with schedule_timeout in mem_cgroup_force_empty(), why > change the behavior of all memcg reclaim. Also this patch effectively > reverts that behavior of 69392a403f49. > It doesn't fully revert it but I did consider reverting it. The reason why I preserved it is because the intent originally was to throttle somewhat when progress is not being made to avoid a premature OOM and I wanted to preserve that characteristic. Right now, this is the least harmful way of doing it. As for memcg, I removed the NOTHROTTLE because the primary reason why a memcg might fail to make progress is excessive writeback and that should still throttle. Completely failing to make progress in a memcg is most likely due to a memcg-OOM. > For direct reclaimers under global pressure, why is page allocator a bad > place for stalling on no progress reclaim? IMHO the callers of the > reclaim should decide what to do if reclaim is not making progress. Because it's a layering violation and the caller has little direct control over the reclaim retry logic. The page allocator has no visibility on why reclaim failed, only that it did fail.
On Thu, Dec 2, 2021 at 8:52 AM Mel Gorman <mgorman@techsingularity.net> wrote: > > On Thu, Dec 02, 2021 at 08:30:51AM -0800, Shakeel Butt wrote: > > Hi Mel, > > > > On Thu, Dec 2, 2021 at 7:07 AM Mel Gorman <mgorman@techsingularity.net> wrote: > > > > > > Mike Galbraith, Alexey Avramov and Darrick Wong all reported similar > > > problems due to reclaim throttling for excessive lengths of time. > > > In Alexey's case, a memory hog that should go OOM quickly stalls for > > > several minutes before stalling. In Mike and Darrick's cases, a small > > > memcg environment stalled excessively even though the system had enough > > > memory overall. > > > > > > Commit 69392a403f49 ("mm/vmscan: throttle reclaim when no progress is being > > > made") introduced the problem although commit a19594ca4a8b ("mm/vmscan: > > > increase the timeout if page reclaim is not making progress") made it > > > worse. Systems at or near an OOM state that cannot be recovered must > > > reach OOM quickly and memcg should kill tasks if a memcg is near OOM. > > > > > > > Is there a reason we can't simply revert 69392a403f49 instead of adding > > more code/heuristics? Looking more into 69392a403f49, I don't think the > > code and commit message are in sync. > > > > For the memcg reclaim, instead of just removing congestion_wait or > > replacing it with schedule_timeout in mem_cgroup_force_empty(), why > > change the behavior of all memcg reclaim. Also this patch effectively > > reverts that behavior of 69392a403f49. > > > > It doesn't fully revert it but I did consider reverting it. The reason > why I preserved it because the intent originally was to throttle somewhat > when progress is not being made to avoid a premature OOM and I wanted to > preserve that charactersistic. Right now, this is the least harmful way > of doing it. If I understand correctly, the original intent of 69392a403f49 which you want to preserve is "avoid premature OOMs when reclaim is not making progress". Were there any complaints or bug reports on these premature OOMs? > > As more memcg, I removed the NOTHROTTLE because the primary reason why a > memcg might fail to make progress is excessive writeback and that should > still throttle. Completely failing to make progress in a memcg is most > likely due to a memcg-OOM. > > > For direct reclaimers under global pressure, why is page allocator a bad > > place for stalling on no progress reclaim? IMHO the callers of the > > reclaim should decide what to do if reclaim is not making progress. > > Because it's a layering violation and the caller has little direct control > over the reclaim retry logic. The page allocator has no visibility on > why reclaim failed only that it did fail. > Isn't it better that the reclaim returns why it is failing instead of littering the reclaim code with 'is this global reclaim', 'is this memcg reclaim', 'am I kswapd' which is also a layering violation. IMO this is the direction we should be going towards though not asking to do this now. Regarding this patch and 69392a403f49, I am still confused on the main motivation behind 69392a403f49 to change the behavior of 'direct reclaimers from page allocator'. thanks, Shakeel
On Thu, Dec 02, 2021 at 09:41:04AM -0800, Shakeel Butt wrote: > On Thu, Dec 2, 2021 at 8:52 AM Mel Gorman <mgorman@techsingularity.net> wrote: > > > > On Thu, Dec 02, 2021 at 08:30:51AM -0800, Shakeel Butt wrote: > > > Hi Mel, > > > > > > On Thu, Dec 2, 2021 at 7:07 AM Mel Gorman <mgorman@techsingularity.net> wrote: > > > > > > > > Mike Galbraith, Alexey Avramov and Darrick Wong all reported similar > > > > problems due to reclaim throttling for excessive lengths of time. > > > > In Alexey's case, a memory hog that should go OOM quickly stalls for > > > > several minutes before stalling. In Mike and Darrick's cases, a small > > > > memcg environment stalled excessively even though the system had enough > > > > memory overall. > > > > > > > > Commit 69392a403f49 ("mm/vmscan: throttle reclaim when no progress is being > > > > made") introduced the problem although commit a19594ca4a8b ("mm/vmscan: > > > > increase the timeout if page reclaim is not making progress") made it > > > > worse. Systems at or near an OOM state that cannot be recovered must > > > > reach OOM quickly and memcg should kill tasks if a memcg is near OOM. > > > > > > > > > > Is there a reason we can't simply revert 69392a403f49 instead of adding > > > more code/heuristics? Looking more into 69392a403f49, I don't think the > > > code and commit message are in sync. > > > > > > For the memcg reclaim, instead of just removing congestion_wait or > > > replacing it with schedule_timeout in mem_cgroup_force_empty(), why > > > change the behavior of all memcg reclaim. Also this patch effectively > > > reverts that behavior of 69392a403f49. > > > > > > > It doesn't fully revert it but I did consider reverting it. The reason > > why I preserved it because the intent originally was to throttle somewhat > > when progress is not being made to avoid a premature OOM and I wanted to > > preserve that charactersistic. Right now, this is the least harmful way > > of doing it. > > If I understand correctly, the original intent of 69392a403f49 which > you want to preserve is "avoid premature OOMs when reclaim is not > making progress". Were there any complaints or bug reports on these > premature OOMs? > Not recently that I'm aware of but historically reclaim has been plagued by at least two classes of problems -- premature OOM and excessive CPU usage churning through the LRU. Going back, the solution was basically to sleep something like "disable kswapd if it fails to make progress for too long". Commit 69392a403f49 addressed a case where calling congestion_wait might as well have been schedule_timeout_uninterruptible(HZ/10) because congestion is no longer tracked by the block layer. Hence 69392a403f49 allows reclaim to throttle on NOPROGRESS but if another task makes progress, the throttled tasks can be woken before the timeout. The flaw was throttling too easily or for too long delaying OOM being properly detected. > > > > As more memcg, I removed the NOTHROTTLE because the primary reason why a > > memcg might fail to make progress is excessive writeback and that should > > still throttle. Completely failing to make progress in a memcg is most > > likely due to a memcg-OOM. > > > > > For direct reclaimers under global pressure, why is page allocator a bad > > > place for stalling on no progress reclaim? IMHO the callers of the > > > reclaim should decide what to do if reclaim is not making progress. > > > > Because it's a layering violation and the caller has little direct control > > over the reclaim retry logic. 
The page allocator has no visibility on > > why reclaim failed only that it did fail. > > > > Isn't it better that the reclaim returns why it is failing instead of > littering the reclaim code with 'is this global reclaim', 'is this > memcg reclaim', 'am I kswapd' which is also a layering violation. IMO > this is the direction we should be going towards though not asking to > do this now. > It's not clear why you think the page allocator can make better decisions about reclaim than reclaim can. It might make sense if callers were returned enough information to make a decision but even if they could, it would not be popular as the API would be difficult to use properly. Is your primary objection the cgroup_reclaim(sc) check? If so, I can remove it. While there is a mild risk that OOM would be delayed, it's very unlikely because a memcg failing to make progress in the local case will probably call cond_resched() if there are not lots of pages pending writes globally. > Regarding this patch and 69392a403f49, I am still confused on the main > motivation behind 69392a403f49 to change the behavior of 'direct > reclaimers from page allocator'. > The main motivation of the series overall was to remove the reliance on congestion_wait and wait_iff_congested because both are fundamentally broken when congestion is not tracked by the block layer. Replacing with schedule_timeout_uninterruptible() would be silly because where possible decisions on whether to pause or throttle should be based on events, not time. For example, if there are too many pages waiting on writeback then throttle but if writeback completes, wake the throttled tasks instead of "sleep some time and hope for the best".
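Roughly, the event-based scheme being described maps to the per-node reclaim_wait[] waitqueues used by reclaim_throttle() in the patch below. A simplified sketch of the pattern follows (not the exact upstream code; the standalone waitqueue and the helper names are made up for illustration):

#include <linux/wait.h>
#include <linux/sched.h>
#include <linux/atomic.h>

/* One waitqueue per node and throttle reason, as with pgdat->reclaim_wait[]. */
static DECLARE_WAIT_QUEUE_HEAD(reclaim_wait_writeback);
static atomic_t nr_writeback_throttled = ATOMIC_INIT(0);

/* Throttle: sleep for at most 'timeout' jiffies, but wake early on the event. */
static void throttle_on_writeback(long timeout)
{
        DEFINE_WAIT(wait);

        atomic_inc(&nr_writeback_throttled);
        /* Instead of schedule_timeout_uninterruptible(timeout)... */
        prepare_to_wait(&reclaim_wait_writeback, &wait, TASK_UNINTERRUPTIBLE);
        schedule_timeout(timeout);
        finish_wait(&reclaim_wait_writeback, &wait);
        atomic_dec(&nr_writeback_throttled);
}

/* Called from the writeback completion path: wake throttled tasks early. */
static void writeback_made_progress(void)
{
        if (atomic_read(&nr_writeback_throttled))
                wake_up(&reclaim_wait_writeback);
}

The point of the design is that the timeout only bounds the worst case; the common case is woken by the completion event rather than sleeping a fixed period.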
On Fri, Dec 3, 2021 at 1:01 AM Mel Gorman <mgorman@techsingularity.net> wrote: > [...] > > Not recently that I'm aware of but historically reclaim has been plagued by > at least two classes of problems -- premature OOM and excessive CPU usage > churning through the LRU. Going back, the solution was basically to sleep > something like "disable kswapd if it fails to make progress for too long". > Commit 69392a403f49 addressed a case where calling congestion_wait might as > well have been schedule_timeout_uninterruptible(HZ/10) because congestion > is no longer tracked by the block layer. > > Hence 69392a403f49 allows reclaim to throttle on NOPROGRESS but if > another task makes progress, the throttled tasks can be woken before the > timeout. The flaw was throttling too easily or for too long delaying OOM > being properly detected. > To remove congestion_wait of mem_cgroup_force_empty_write(), the commit 69392a403f49 has changed the behavior of all memcg reclaim codepaths as well as direct global reclaimers. Were there other congestion_wait() instances which commit 69392a403f49 was targeting but those congestion_wait() were replaced/removed by different commits? [...] > > > > Isn't it better that the reclaim returns why it is failing instead of > > littering the reclaim code with 'is this global reclaim', 'is this > > memcg reclaim', 'am I kswapd' which is also a layering violation. IMO > > this is the direction we should be going towards though not asking to > > do this now. > > > > It's not clear why you think the page allocator can make better decisions > about reclaim than reclaim can. It might make sense if callers were > returned enough information to make a decision but even if they could, > it would not be popular as the API would be difficult to use properly. > The above is a separate discussion for later. > Is your primary objection the cgroup_reclaim(sc) check? No, I am of the opinion that we should revert 69392a403f49 and we should have just replaced congestion_wait in mem_cgroup_force_empty_write with a simple schedule_timeout_interruptible. The memory.force_empty is a cgroup v1 interface (to be deprecated) and it is very normal to expect that the user will trigger that interface multiple times. We should not change the behavior of all the memcg reclaimers and direct global reclaimers so that we can remove congestion_wait from mem_cgroup_force_empty_write. > If so, I can > remove it. While there is a mild risk that OOM would be delayed, it's very > unlikely because a memcg failing to make progress in the local case will > probably call cond_resched() if there are not lots of of pages pending > writes globally. > > > Regarding this patch and 69392a403f49, I am still confused on the main > > motivation behind 69392a403f49 to change the behavior of 'direct > > reclaimers from page allocator'. > > > > The main motivation of the series overall was to remove the reliance on > congestion_wait and wait_iff_congested because both are fundamentally > broken when congestion is not tracked by the block layer. Replacing with > schedule_timeout_uninterruptible() would be silly because where possible > decisions on whether to pause or throttle should be based on events, > not time. For example, if there are too many pages waiting on writeback > then throttle but if writeback completes, wake the throttled tasks > instead of "sleep some time and hope for the best". > I am in agreement with the motivation of the whole series. 
I am just making sure that the motivation of VMSCAN_THROTTLE_NOPROGRESS based throttle is more than just the congestion_wait of mem_cgroup_force_empty_write. thanks, Shakeel
On Fri, Dec 03, 2021 at 09:50:51AM -0800, Shakeel Butt wrote: > On Fri, Dec 3, 2021 at 1:01 AM Mel Gorman <mgorman@techsingularity.net> wrote: > > > [...] > > > > Not recently that I'm aware of but historically reclaim has been plagued by > > at least two classes of problems -- premature OOM and excessive CPU usage > > churning through the LRU. Going back, the solution was basically to sleep > > something like "disable kswapd if it fails to make progress for too long". > > Commit 69392a403f49 addressed a case where calling congestion_wait might as > > well have been schedule_timeout_uninterruptible(HZ/10) because congestion > > is no longer tracked by the block layer. > > > > Hence 69392a403f49 allows reclaim to throttle on NOPROGRESS but if > > another task makes progress, the throttled tasks can be woken before the > > timeout. The flaw was throttling too easily or for too long delaying OOM > > being properly detected. > > > > To remove congestion_wait of mem_cgroup_force_empty_write(), the > commit 69392a403f49 has changed the behavior of all memcg reclaim > codepaths as well as direct global reclaimers. Well, yes, it moved it to stalling on writeback if it's in progress and waking up if writeback makes forward progress instead of a schedule_timeout_interruptible(). > Were there other > congestion_wait() instances which commit 69392a403f49 was targeting > but those congestion_wait() were replaced/removed by different > commits? > Yes, the series removed congestion_wait from other places because the interface is broken and has been for a long time. > [...] > > > > > > > Isn't it better that the reclaim returns why it is failing instead of > > > littering the reclaim code with 'is this global reclaim', 'is this > > > memcg reclaim', 'am I kswapd' which is also a layering violation. IMO > > > this is the direction we should be going towards though not asking to > > > do this now. > > > > > > > It's not clear why you think the page allocator can make better decisions > > about reclaim than reclaim can. It might make sense if callers were > > returned enough information to make a decision but even if they could, > > it would not be popular as the API would be difficult to use properly. > > > > The above is a separate discussion for later. > > > Is your primary objection the cgroup_reclaim(sc) check? > > No, I am of the opinion that we should revert 69392a403f49 and we > should have just replaced congestion_wait in > mem_cgroup_force_empty_write with a simple > schedule_timeout_interruptible. That is a bit weak. Depending on the type of storage, writeback may completes in microseconds or seconds. The event used to be "sleep until congestion clears" which is no longer an event that can be waited upon in the vast majority of cases (NFS being an obvious exception). Now, it may throttle writeback on a waitqueue and if enough writeback completes, the task will be woken before the timeout to minimise the stall. schedule_timeout_interruptible() always waits for a fixed duration regardless of what else happens in the meantime. > The memory.force_empty is a cgroup v1 > interface (to be deprecated) and it is very normal to expect that the > user will trigger that interface multiple times. We should not change > the behavior of all the memcg reclaimers and direct global reclaimers > so that we can remove congestion_wait from > mem_cgroup_force_empty_write. 
> The mem_cgroup_force_empty_write() path will throttle on writeback in the same way that global reclaim does at this point /* * If kswapd scans pages marked for immediate * reclaim and under writeback (nr_immediate), it * implies that pages are cycling through the LRU * faster than they are written so forcibly stall * until some pages complete writeback. */ if (sc->nr.immediate) reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK); With this patch, memcg does not stall on NOPROGRESS. > > If so, I can > > remove it. While there is a mild risk that OOM would be delayed, it's very > > unlikely because a memcg failing to make progress in the local case will > > probably call cond_resched() if there are not lots of of pages pending > > writes globally. > > > > > Regarding this patch and 69392a403f49, I am still confused on the main > > > motivation behind 69392a403f49 to change the behavior of 'direct > > > reclaimers from page allocator'. > > > > > > > The main motivation of the series overall was to remove the reliance on > > congestion_wait and wait_iff_congested because both are fundamentally > > broken when congestion is not tracked by the block layer. Replacing with > > schedule_timeout_uninterruptible() would be silly because where possible > > decisions on whether to pause or throttle should be based on events, > > not time. For example, if there are too many pages waiting on writeback > > then throttle but if writeback completes, wake the throttled tasks > > instead of "sleep some time and hope for the best". > > > > I am in agreement with the motivation of the whole series. I am just > making sure that the motivation of VMSCAN_THROTTLE_NOPROGRESS based > throttle is more than just the congestion_wait of > mem_cgroup_force_empty_write. > The commit that primarily targets congestion_wait is 8cd7c588decf ("mm/vmscan: throttle reclaim until some writeback completes if congested"). The series recognises that there are other reasons why reclaim can fail to make progress that is not directly writeback related.
On Fri, Dec 3, 2021 at 11:08 AM Mel Gorman <mgorman@techsingularity.net> wrote: > [...] > > I am in agreement with the motivation of the whole series. I am just > > making sure that the motivation of VMSCAN_THROTTLE_NOPROGRESS based > > throttle is more than just the congestion_wait of > > mem_cgroup_force_empty_write. > > > > The commit that primarily targets congestion_wait is 8cd7c588decf > ("mm/vmscan: throttle reclaim until some writeback completes if > congested"). The series recognises that there are other reasons why > reclaim can fail to make progress that is not directly writeback related. > I agree with throttling for VMSCAN_THROTTLE_[WRITEBACK|ISOLATED] reasons. Please explain why we should throttle for VMSCAN_THROTTLE_NOPROGRESS? Also 69392a403f49 claims "Direct reclaim primarily is throttled in the page allocator if it is failing to make progress.", can you please explain how?
On Sun, Dec 05, 2021 at 10:06:27PM -0800, Shakeel Butt wrote: > On Fri, Dec 3, 2021 at 11:08 AM Mel Gorman <mgorman@techsingularity.net> wrote: > > > [...] > > > I am in agreement with the motivation of the whole series. I am just > > > making sure that the motivation of VMSCAN_THROTTLE_NOPROGRESS based > > > throttle is more than just the congestion_wait of > > > mem_cgroup_force_empty_write. > > > > > > > The commit that primarily targets congestion_wait is 8cd7c588decf > > ("mm/vmscan: throttle reclaim until some writeback completes if > > congested"). The series recognises that there are other reasons why > > reclaim can fail to make progress that is not directly writeback related. > > > > I agree with throttling for VMSCAN_THROTTLE_[WRITEBACK|ISOLATED] > reasons. Please explain why we should throttle for > VMSCAN_THROTTLE_NOPROGRESS? Also 69392a403f49 claims "Direct reclaim > primarily is throttled in the page allocator if it is failing to make > progress.", can you please explain how? It could happen if the pages on the LRU are being reactivated continually or holding an elevated reference count for some reason (e.g. gup, page migration etc). The event is probably transient, hence the short throttling.
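For context on the elevated-reference-count case: reclaim can only free a page whose reference count can be frozen to the value it expects. A loose sketch of that check, modelled on __remove_mapping() (not the exact upstream code; can_reclaim_free_page() is a made-up name):

#include <linux/mm.h>
#include <linux/page_ref.h>

/*
 * A page isolated for reclaim is expected to hold one reference for the
 * isolation plus the page cache references. If a GUP pin, page migration
 * or any other transient user still holds an extra reference, the freeze
 * fails, the page is kept, and that reclaim pass makes no progress on it.
 */
static bool can_reclaim_free_page(struct page *page)
{
        int expected = 1 + compound_nr(page);

        /* Succeeds only if the refcount is exactly 'expected'. */
        return page_ref_freeze(page, expected);
}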
On Mon, Dec 6, 2021 at 3:25 AM Mel Gorman <mgorman@techsingularity.net> wrote: > > On Sun, Dec 05, 2021 at 10:06:27PM -0800, Shakeel Butt wrote: > > On Fri, Dec 3, 2021 at 11:08 AM Mel Gorman <mgorman@techsingularity.net> wrote: > > > > > [...] > > > > I am in agreement with the motivation of the whole series. I am just > > > > making sure that the motivation of VMSCAN_THROTTLE_NOPROGRESS based > > > > throttle is more than just the congestion_wait of > > > > mem_cgroup_force_empty_write. > > > > > > > > > > The commit that primarily targets congestion_wait is 8cd7c588decf > > > ("mm/vmscan: throttle reclaim until some writeback completes if > > > congested"). The series recognises that there are other reasons why > > > reclaim can fail to make progress that is not directly writeback related. > > > > > > > I agree with throttling for VMSCAN_THROTTLE_[WRITEBACK|ISOLATED] > > reasons. Please explain why we should throttle for > > VMSCAN_THROTTLE_NOPROGRESS? Also 69392a403f49 claims "Direct reclaim > > primarily is throttled in the page allocator if it is failing to make > > progress.", can you please explain how? > > It could happen if the pages on the LRU are being reactivated continually > or holding an elevated reference count for some reason (e.g. gup, > page migration etc). The event is probably transient, hence the short > throttling. > What's the worst that can happen if the kernel doesn't throttle at all for these transient scenarios? Premature oom-kills? The kernel already has some protection against such situations with retries, i.e. 16 consecutive reclaim tries have to fail before reclaim is given up. Anyways, I have shared my view which is 'no need to throttle at all for no-progress reclaims for now and course correct if there are complaints in future' but will not block the patch. thanks, Shakeel
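The retry protection referred to here lives in the page allocator's slow path, where MAX_RECLAIM_RETRIES is 16. A heavily condensed sketch of the accounting (not the actual __alloc_pages_slowpath()/should_reclaim_retry() code; keep_retrying_reclaim() is a made-up name):

#define MAX_RECLAIM_RETRIES 16  /* matches the definition in mm/internal.h */

/*
 * Condensed view of the allocator's no-progress accounting: an attempt
 * that reclaimed something resets the counter, an attempt that reclaimed
 * nothing bumps it, and only after MAX_RECLAIM_RETRIES consecutive
 * failures does the allocator stop retrying and move towards OOM.
 */
static bool keep_retrying_reclaim(unsigned long did_some_progress,
                                  int *no_progress_loops)
{
        if (did_some_progress)
                *no_progress_loops = 0;
        else
                (*no_progress_loops)++;

        return *no_progress_loops <= MAX_RECLAIM_RETRIES;
}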
On Mon, Dec 06, 2021 at 11:14:58PM -0800, Shakeel Butt wrote: > On Mon, Dec 6, 2021 at 3:25 AM Mel Gorman <mgorman@techsingularity.net> wrote: > > > > On Sun, Dec 05, 2021 at 10:06:27PM -0800, Shakeel Butt wrote: > > > On Fri, Dec 3, 2021 at 11:08 AM Mel Gorman <mgorman@techsingularity.net> wrote: > > > > > > > [...] > > > > > I am in agreement with the motivation of the whole series. I am just > > > > > making sure that the motivation of VMSCAN_THROTTLE_NOPROGRESS based > > > > > throttle is more than just the congestion_wait of > > > > > mem_cgroup_force_empty_write. > > > > > > > > > > > > > The commit that primarily targets congestion_wait is 8cd7c588decf > > > > ("mm/vmscan: throttle reclaim until some writeback completes if > > > > congested"). The series recognises that there are other reasons why > > > > reclaim can fail to make progress that is not directly writeback related. > > > > > > > > > > I agree with throttling for VMSCAN_THROTTLE_[WRITEBACK|ISOLATED] > > > reasons. Please explain why we should throttle for > > > VMSCAN_THROTTLE_NOPROGRESS? Also 69392a403f49 claims "Direct reclaim > > > primarily is throttled in the page allocator if it is failing to make > > > progress.", can you please explain how? > > > > It could happen if the pages on the LRU are being reactivated continually > > or holding an elevated reference count for some reason (e.g. gup, > > page migration etc). The event is probably transient, hence the short > > throttling. > > > > What's the worst that can happen if the kernel doesn't throttle at all > for these transient scenarios? Premature oom-kills? Excessive CPU usage in reclaim, potential premature OOM kills. > The kernel already > has some protection against such situations with retries i.e. > consecutive 16 unsuccessful reclaim tries have to fail to give up the > reclaim. > The retries mitigate the premature OOM kills but not the excessive CPU usage. > Anyways, I have shared my view which is 'no need to throttle at all > for no-progress reclaims for now and course correct if there are > complaints in future' but will not block the patch. > We've gone through periods of bugs that had either direct reclaim or kswapd pegged at 100% CPU usage. While kswapd now just stops, the patch still minimises the risk of excessive CPU usage bugs due to direct reclaim.
On Thu, 2 Dec 2021, Mel Gorman wrote: ... > --- a/mm/vmscan.c > +++ b/mm/vmscan.c ... > @@ -3478,14 +3520,18 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc) > /* need some check for avoid more shrink_zone() */ > } > > + if (!first_pgdat) > + first_pgdat = zone->zone_pgdat; > + > /* See comment about same check for global reclaim above */ > if (zone->zone_pgdat == last_pgdat) > continue; > last_pgdat = zone->zone_pgdat; > shrink_node(zone->zone_pgdat, sc); > - consider_reclaim_throttle(zone->zone_pgdat, sc); > } > > + consider_reclaim_throttle(first_pgdat, sc); My tmpfs swapping load (tweaked to use huge pages more heavily than in real life) is far from being a realistic load: but it was notably slowed down by your throttling mods in 5.16-rc, and this patch makes it well again - thanks. But: it very quickly hit NULL pointer until I changed that last line to if (first_pgdat) consider_reclaim_throttle(first_pgdat, sc); I've given no thought as to whether that is the correct fix, or if first_pgdat should be set earlier in the loop above. Hugh
On Wed, Dec 08, 2021 at 10:20:47PM -0800, Hugh Dickins wrote: > On Thu, 2 Dec 2021, Mel Gorman wrote: > ... > > --- a/mm/vmscan.c > > +++ b/mm/vmscan.c > ... > > @@ -3478,14 +3520,18 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc) > > /* need some check for avoid more shrink_zone() */ > > } > > > > + if (!first_pgdat) > > + first_pgdat = zone->zone_pgdat; > > + > > /* See comment about same check for global reclaim above */ > > if (zone->zone_pgdat == last_pgdat) > > continue; > > last_pgdat = zone->zone_pgdat; > > shrink_node(zone->zone_pgdat, sc); > > - consider_reclaim_throttle(zone->zone_pgdat, sc); > > } > > > > + consider_reclaim_throttle(first_pgdat, sc); > > My tmpfs swapping load (tweaked to use huge pages more heavily than > in real life) is far from being a realistic load: but it was notably > slowed down by your throttling mods in 5.16-rc, and this patch makes > it well again - thanks. > > But: it very quickly hit NULL pointer until I changed that last line to > > if (first_pgdat) > consider_reclaim_throttle(first_pgdat, sc); > > I've given no thought as to whether that is the correct fix, > or if first_pgdat should be set earlier in the loop above. > It's the right fix, first_pgdat may be NULL if compaction can run for each zone in the zonelist which could be the case for a tmpfs swapping load that is huge page intensive. Thanks Hugh.
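A condensed control-flow sketch of the patched shrink_zones() loop (simplified, not the exact code) shows how first_pgdat can stay NULL when every zone of a costly-order request is already compaction_ready(), which is why the call after the loop needs Hugh's check:

static void shrink_zones_sketch(struct zonelist *zonelist, struct scan_control *sc)
{
        pg_data_t *first_pgdat = NULL;
        struct zoneref *z;
        struct zone *zone;

        for_each_zone_zonelist_nodemask(zone, z, zonelist,
                                        sc->reclaim_idx, sc->nodemask) {
                if (IS_ENABLED(CONFIG_COMPACTION) &&
                    sc->order > PAGE_ALLOC_COSTLY_ORDER &&
                    compaction_ready(zone, sc)) {
                        sc->compaction_ready = true;
                        continue;       /* skips the assignment below */
                }

                if (!first_pgdat)
                        first_pgdat = zone->zone_pgdat;

                shrink_node(zone->zone_pgdat, sc);
        }

        if (first_pgdat)        /* Hugh's fix: NULL if every zone was skipped */
                consider_reclaim_throttle(first_pgdat, sc);
}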
Hi, this is your Linux kernel regression tracker speaking. On 02.12.21 16:06, Mel Gorman wrote: > Mike Galbraith, Alexey Avramov and Darrick Wong all reported similar > problems due to reclaim throttling for excessive lengths of time. > In Alexey's case, a memory hog that should go OOM quickly stalls for > several minutes before stalling. In Mike and Darrick's cases, a small > memcg environment stalled excessively even though the system had enough > memory overall. Just wondering: this patch afaics is now in -mm and Linux next for nearly two weeks. Is that intentional? I had expected it to be mainlined with the batch of patches Andrew mailed to Linus last week, but it wasn't among them. Or am I missing something? Ciao, Thorsten (wearing his 'Linux kernel regression tracker' hat) P.S.: As a Linux kernel regression tracker I'm getting a lot of reports on my table. I can only look briefly into most of them. Unfortunately therefore I sometimes will get things wrong or miss something important. I hope that's not the case here; if you think it is, don't hesitate to tell me about it in a public reply, that's in everyone's interest. BTW, I have no personal interest in this issue, which is tracked using regzbot, my Linux kernel regression tracking bot (https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting this mail to get things rolling again and hence don't need to be CC on all further activities wrt to this regression. > Commit 69392a403f49 ("mm/vmscan: throttle reclaim when no progress is being > made") introduced the problem although commit a19594ca4a8b ("mm/vmscan: > increase the timeout if page reclaim is not making progress") made it > worse. Systems at or near an OOM state that cannot be recovered must > reach OOM quickly and memcg should kill tasks if a memcg is near OOM. > > To address this, only stall for the first zone in the zonelist, reduce > the timeout to 1 tick for VMSCAN_THROTTLE_NOPROGRESS and only stall if > the scan control nr_reclaimed is 0, kswapd is still active and there were > excessive pages pending for writeback. If kswapd has stopped reclaiming due > to excessive failures, do not stall at all so that OOM triggers relatively > quickly. Similarly, if an LRU is simply congested, only lightly throttle > similar to NOPROGRESS. > > Alexey's original case was the most straight forward > > for i in {1..3}; do tail /dev/zero; done > > On vanilla 5.16-rc1, this test stalled heavily, after the patch the test > completes in a few seconds similar to 5.15. > > Alexey's second test case added watching a youtube video while tail runs > 10 times. On 5.15, playback only jitters slightly, 5.16-rc1 stalls a lot > with lots of frames missing and numerous audio glitches. With this patch > applies, the video plays similarly to 5.15. > > Link: https://lore.kernel.org/r/99e779783d6c7fce96448a3402061b9dc1b3b602.camel@gmx.de > Link: https://lore.kernel.org/r/20211124011954.7cab9bb4@mail.inbox.lv > Link: https://lore.kernel.org/r/20211022144651.19914-1-mgorman@techsingularity.net > > [lkp@intel.com: Fix W=1 build warning] > Reported-and-tested-by: Alexey Avramov <hakavlad@inbox.lv> > Reported-and-tested-by: Mike Galbraith <efault@gmx.de> > Reported-and-tested-by: Darrick J. 
Wong <djwong@kernel.org> > Reported-by: kernel test robot <lkp@intel.com> > Fixes: 69392a403f49 ("mm/vmscan: throttle reclaim when no progress is being made") > Signed-off-by: Mel Gorman <mgorman@techsingularity.net> > --- > include/linux/mmzone.h | 1 + > include/trace/events/vmscan.h | 4 ++- > mm/vmscan.c | 64 ++++++++++++++++++++++++++++++----- > 3 files changed, 59 insertions(+), 10 deletions(-) > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > index 58e744b78c2c..936dc0b6c226 100644 > --- a/include/linux/mmzone.h > +++ b/include/linux/mmzone.h > @@ -277,6 +277,7 @@ enum vmscan_throttle_state { > VMSCAN_THROTTLE_WRITEBACK, > VMSCAN_THROTTLE_ISOLATED, > VMSCAN_THROTTLE_NOPROGRESS, > + VMSCAN_THROTTLE_CONGESTED, > NR_VMSCAN_THROTTLE, > }; > > diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h > index f25a6149d3ba..ca2e9009a651 100644 > --- a/include/trace/events/vmscan.h > +++ b/include/trace/events/vmscan.h > @@ -30,12 +30,14 @@ > #define _VMSCAN_THROTTLE_WRITEBACK (1 << VMSCAN_THROTTLE_WRITEBACK) > #define _VMSCAN_THROTTLE_ISOLATED (1 << VMSCAN_THROTTLE_ISOLATED) > #define _VMSCAN_THROTTLE_NOPROGRESS (1 << VMSCAN_THROTTLE_NOPROGRESS) > +#define _VMSCAN_THROTTLE_CONGESTED (1 << VMSCAN_THROTTLE_CONGESTED) > > #define show_throttle_flags(flags) \ > (flags) ? __print_flags(flags, "|", \ > {_VMSCAN_THROTTLE_WRITEBACK, "VMSCAN_THROTTLE_WRITEBACK"}, \ > {_VMSCAN_THROTTLE_ISOLATED, "VMSCAN_THROTTLE_ISOLATED"}, \ > - {_VMSCAN_THROTTLE_NOPROGRESS, "VMSCAN_THROTTLE_NOPROGRESS"} \ > + {_VMSCAN_THROTTLE_NOPROGRESS, "VMSCAN_THROTTLE_NOPROGRESS"}, \ > + {_VMSCAN_THROTTLE_CONGESTED, "VMSCAN_THROTTLE_CONGESTED"} \ > ) : "VMSCAN_THROTTLE_NONE" > > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index fb9584641ac7..4c4d5f6cd8a3 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1021,6 +1021,39 @@ static void handle_write_error(struct address_space *mapping, > unlock_page(page); > } > > +static bool skip_throttle_noprogress(pg_data_t *pgdat) > +{ > + int reclaimable = 0, write_pending = 0; > + int i; > + > + /* > + * If kswapd is disabled, reschedule if necessary but do not > + * throttle as the system is likely near OOM. > + */ > + if (pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES) > + return true; > + > + /* > + * If there are a lot of dirty/writeback pages then do not > + * throttle as throttling will occur when the pages cycle > + * towards the end of the LRU if still under writeback. 
> + */ > + for (i = 0; i < MAX_NR_ZONES; i++) { > + struct zone *zone = pgdat->node_zones + i; > + > + if (!populated_zone(zone)) > + continue; > + > + reclaimable += zone_reclaimable_pages(zone); > + write_pending += zone_page_state_snapshot(zone, > + NR_ZONE_WRITE_PENDING); > + } > + if (2 * write_pending <= reclaimable) > + return true; > + > + return false; > +} > + > void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason) > { > wait_queue_head_t *wqh = &pgdat->reclaim_wait[reason]; > @@ -1056,8 +1089,16 @@ void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason) > } > > break; > + case VMSCAN_THROTTLE_CONGESTED: > + fallthrough; > case VMSCAN_THROTTLE_NOPROGRESS: > - timeout = HZ/2; > + if (skip_throttle_noprogress(pgdat)) { > + cond_resched(); > + return; > + } > + > + timeout = 1; > + > break; > case VMSCAN_THROTTLE_ISOLATED: > timeout = HZ/50; > @@ -3321,7 +3362,7 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc) > if (!current_is_kswapd() && current_may_throttle() && > !sc->hibernation_mode && > test_bit(LRUVEC_CONGESTED, &target_lruvec->flags)) > - reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK); > + reclaim_throttle(pgdat, VMSCAN_THROTTLE_CONGESTED); > > if (should_continue_reclaim(pgdat, sc->nr_reclaimed - nr_reclaimed, > sc)) > @@ -3386,16 +3427,16 @@ static void consider_reclaim_throttle(pg_data_t *pgdat, struct scan_control *sc) > } > > /* > - * Do not throttle kswapd on NOPROGRESS as it will throttle on > - * VMSCAN_THROTTLE_WRITEBACK if there are too many pages under > - * writeback and marked for immediate reclaim at the tail of > - * the LRU. > + * Do not throttle kswapd or cgroup reclaim on NOPROGRESS as it will > + * throttle on VMSCAN_THROTTLE_WRITEBACK if there are too many pages > + * under writeback and marked for immediate reclaim at the tail of the > + * LRU. > */ > - if (current_is_kswapd()) > + if (current_is_kswapd() || cgroup_reclaim(sc)) > return; > > /* Throttle if making no progress at high prioities. */ > - if (sc->priority < DEF_PRIORITY - 2) > + if (sc->priority == 1 && !sc->nr_reclaimed) > reclaim_throttle(pgdat, VMSCAN_THROTTLE_NOPROGRESS); > } > > @@ -3415,6 +3456,7 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc) > unsigned long nr_soft_scanned; > gfp_t orig_mask; > pg_data_t *last_pgdat = NULL; > + pg_data_t *first_pgdat = NULL; > > /* > * If the number of buffer_heads in the machine exceeds the maximum > @@ -3478,14 +3520,18 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc) > /* need some check for avoid more shrink_zone() */ > } > > + if (!first_pgdat) > + first_pgdat = zone->zone_pgdat; > + > /* See comment about same check for global reclaim above */ > if (zone->zone_pgdat == last_pgdat) > continue; > last_pgdat = zone->zone_pgdat; > shrink_node(zone->zone_pgdat, sc); > - consider_reclaim_throttle(zone->zone_pgdat, sc); > } > > + consider_reclaim_throttle(first_pgdat, sc); > + > /* > * Restore to original mask to avoid the impact on the caller if we > * promoted it to __GFP_HIGHMEM.
On Tue, 28 Dec 2021 11:04:18 +0100 Thorsten Leemhuis <regressions@leemhuis.info> wrote: > Hi, this is your Linux kernel regression tracker speaking. > > On 02.12.21 16:06, Mel Gorman wrote: > > Mike Galbraith, Alexey Avramov and Darrick Wong all reported similar > > problems due to reclaim throttling for excessive lengths of time. > > In Alexey's case, a memory hog that should go OOM quickly stalls for > > several minutes before stalling. In Mike and Darrick's cases, a small > > memcg environment stalled excessively even though the system had enough > > memory overall. > > Just wondering: this patch afaics is now in -mm and Linux next for > nearly two weeks. Is that intentional? I had expected it to be mainlined > with the batch of patches Andrew mailed to Linus last week, but it > wasn't among them. I have it queued for 5.17-rc1. There is still time to squeeze it into 5.16, just, with a cc:stable. Alternatively we could merge it into 5.17-rc1 with a cc:stable, so it will trickle back with less risk to the 5.17 release. What do people think?
On 30.12.21 00:45, Andrew Morton wrote: > On Tue, 28 Dec 2021 11:04:18 +0100 Thorsten Leemhuis <regressions@leemhuis.info> wrote: > >> Hi, this is your Linux kernel regression tracker speaking. >> >> On 02.12.21 16:06, Mel Gorman wrote: >>> Mike Galbraith, Alexey Avramov and Darrick Wong all reported similar >>> problems due to reclaim throttling for excessive lengths of time. >>> In Alexey's case, a memory hog that should go OOM quickly stalls for >>> several minutes before stalling. In Mike and Darrick's cases, a small >>> memcg environment stalled excessively even though the system had enough >>> memory overall. >> >> Just wondering: this patch afaics is now in -mm and Linux next for >> nearly two weeks. Is that intentional? I had expected it to be mainlined >> with the batch of patches Andrew mailed to Linus last week, but it >> wasn't among them. > > I have it queued for 5.17-rc1. > > There is still time to squeeze it into 5.16, just, with a cc:stable. > > Alternatively we could merge it into 5.17-rc1 with a cc:stable, so it > will trickle back with less risk to the 5.17 release. > > What do people think? CCing Linus, to make sure he's aware of this. Maybe I'm totally missing something, but I'm a bit confused by what you wrote, as the regression afaik was introduced between v5.15..v5.16-rc1. So I assume this is what you meant: ``` I have it queued for 5.17-rc1. There is still time to squeeze it into 5.16. Alternatively we could merge it into 5.17-rc1 with a cc:stable, so it will trickle back with less risk to the 5.16 release. What do people think? ``` I'll leave the individual risk evaluation of the patch to others. If the fix is risky, waiting for 5.17 is fine for me. But hmmm, regarding the "could merge it into 5.17-rc1 with a cc:stable" idea a remark: is that really "less risk", as you stated? If we get it into rc8 (which is still possible, even if a bit hard due to the new year festivities), it will get at least one week of testing. If the fix waits for the next merge window, it all depends on how the timing works out. But it's easy to picture a worst case: the fix is only merged on the Friday evening before Linus releases 5.17-rc1 and right after it's out makes it into a stable-rc (say a day or two after 5.17-rc1 is out) and from there into a 5.16.y release on Thursday. That IMHO would mean fewer days of testing in the end (and there is a weekend in this period as well). Waiting obviously will also mean that users of 5.16 and 5.16.y will likely have to face this regression for at least two and a half weeks, unless you send the fix early and Greg backports it before rc1 (which he afaics does if there are good reasons). Yes, it's `just` a performance regression, so it might not stop anyone from running Linux 5.16 -- but it's one that three people separately reported in the 5.16 devel cycle, so others will likely encounter it as well if we leave it unfixed in 5.16. This will likely annoy some people, especially if they invest time in bisecting it, only to find out that the fourth iteration of the fix for the regression is already available since December the 2nd. Ciao, Thorsten
On Fri, 31 Dec 2021, Thorsten Leemhuis wrote: > On 30.12.21 00:45, Andrew Morton wrote: > > On Tue, 28 Dec 2021 11:04:18 +0100 Thorsten Leemhuis <regressions@leemhuis.info> wrote: > > > >> Hi, this is your Linux kernel regression tracker speaking. > >> > >> On 02.12.21 16:06, Mel Gorman wrote: > >>> Mike Galbraith, Alexey Avramov and Darrick Wong all reported similar > >>> problems due to reclaim throttling for excessive lengths of time. > >>> In Alexey's case, a memory hog that should go OOM quickly stalls for > >>> several minutes before stalling. In Mike and Darrick's cases, a small > >>> memcg environment stalled excessively even though the system had enough > >>> memory overall. > >> > >> Just wondering: this patch afaics is now in -mm and Linux next for > >> nearly two weeks. Is that intentional? I had expected it to be mainlined > >> with the batch of patches Andrew mailed to Linus last week, but it > >> wasn't among them. > > > > I have it queued for 5.17-rc1. > > > > There is still time to squeeze it into 5.16, just, with a cc:stable. > > > > Alternatively we could merge it into 5.17-rc1 with a cc:stable, so it > > will trickle back with less risk to the 5.17 release. > > > > What do people think? > > CCing Linus, to make sure he's aware of this. > > Maybe I'm totally missing something, but I'm a bit confused by what you > wrote, as the regression afaik was introduced between v5.15..v5.16-rc1. > So I assume this is what you meant: > > ``` > I have it queued for 5.17-rc1. > > There is still time to squeeze it into 5.16. > > Alternatively we could merge it into 5.17-rc1 with a cc:stable, so it > will trickle back with less risk to the 5.16 release. > > What do people think? > ``` > > I'll leave the individual risk evaluation of the patch to others. If the > fix is risky, waiting for 5.17 is fine for me. > > But hmmm, regarding the "could merge it into 5.17-rc1 with a cc:stable" > idea a remark: is that really "less risk", as your stated? > > If we get it into rc8 (which is still possible, even if a bit hard due > to the new year festivities), it will get at least one week of testing. My vote is for it to go into rc8: for me, 5.16-rc reclaim behaves too oddly without it, so I've simply added it into whatever testing I do ever since Mel posted - no regressions noticed with it in (aside from needing the -fix.patch you already added a few weeks ago). Hugh > > If the fix waits for the next merge window, it all depends on the how > the timing works out. But it's easy to picture a worst case: the fix is > only merged on the Friday evening before Linus releases 5.17-rc1 and > right after it's out makes it into a stable-rc (say a day or two after > 5.17-rc1 is out) and from there into a 5.16.y release on Thursday. That > IMHO would mean less days of testing in the end (and there is a weekend > in this period as well). > > Waiting obviously will also mean that users of 5.16 and 5.16.y will > likely have to face this regression for at least two and a half weeks, > unless you send the fix early and Greg backports it before rc1 (which he > afaics does if there are good reasons). Yes, it's `just` a performance > regression, so it might not stop anyone from running Linux 5.16 -- but > it's one that three people separately reported in the 5.16 devel cycle, > so others will likely encounter it as well if we leave it unfixed in > 5.16. 
This will likely annoy some people, especially if they invest time > in bisecting it, only to find out that the forth iteration of the fix > for the regression is already available since December the 2nd. > > Ciao, Thorsten
On Fri, Dec 31, 2021 at 6:24 AM Thorsten Leemhuis <regressions@leemhuis.info> wrote: > > If we get it into rc8 (which is still possible, even if a bit hard due > to the new year festivities), it will get at least one week of testing. I took it with Hugh's ack from his reply to this, so it should be in rc8. I'm not sure how much testing a week in rc8 will actually get, but next week is likely to be at least a _bit_ more active than the current week has been. And it's been tested by the people reporting it (and apparently Hugh too). Linus
On Fri, Dec 31, 2021 at 11:14 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Fri, Dec 31, 2021 at 6:24 AM Thorsten Leemhuis > <regressions@leemhuis.info> wrote: > > > > If we get it into rc8 (which is still possible, even if a bit hard due > > to the new year festivities), it will get at least one week of testing. > > I took it with Hugh's ack from his reply to this, so it should be in rc8. Pushed out as 1b4e3f26f9f7 ("mm: vmscan: Reduce throttling due to a failure to make progress") Linus
On Fri, Dec 31, 2021 at 11:21 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > Pushed out as 1b4e3f26f9f7 ("mm: vmscan: Reduce throttling due to a > failure to make progress") .. and I _think_ this empties the regzbot queue for this release, Thorsten. No? Linus
On Fri, 31 Dec 2021 11:21:14 -0800 Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Fri, Dec 31, 2021 at 11:14 AM Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > On Fri, Dec 31, 2021 at 6:24 AM Thorsten Leemhuis > > <regressions@leemhuis.info> wrote: > > > > > > If we get it into rc8 (which is still possible, even if a bit hard due > > > to the new year festivities), it will get at least one week of testing. > > > > I took it with Hugh's ack from his reply to this, so it should be in rc8. > > Pushed out as 1b4e3f26f9f7 ("mm: vmscan: Reduce throttling due to a > failure to make progress") Needs this fixup, which I shall tweak a bit then send formally in a few minutes. From: Mel Gorman <mgorman@techsingularity.net> Subject: mm: vmscan: reduce throttling due to a failure to make progress -fix Hugh Dickins reported the following My tmpfs swapping load (tweaked to use huge pages more heavily than in real life) is far from being a realistic load: but it was notably slowed down by your throttling mods in 5.16-rc, and this patch makes it well again - thanks. But: it very quickly hit NULL pointer until I changed that last line to if (first_pgdat) consider_reclaim_throttle(first_pgdat, sc); The likely issue is that huge pages are a major component of the test workload. When this is the case, first_pgdat may never get set if compaction is ready to continue due to this check if (IS_ENABLED(CONFIG_COMPACTION) && sc->order > PAGE_ALLOC_COSTLY_ORDER && compaction_ready(zone, sc)) { sc->compaction_ready = true; continue; } If this was true for every zone in the zonelist, first_pgdat would never get set resulting in a NULL pointer exception. This is a fix to the mmotm patch mm-vmscan-reduce-throttling-due-to-a-failure-to-make-progress.patch Link: https://lkml.kernel.org/r/20211209095453.GM3366@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reported-by: Hugh Dickins <hughd@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Rik van Riel <riel@surriel.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/vmscan.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) --- a/mm/vmscan.c~mm-vmscan-reduce-throttling-due-to-a-failure-to-make-progress-fix +++ a/mm/vmscan.c @@ -3530,7 +3530,8 @@ static void shrink_zones(struct zonelist shrink_node(zone->zone_pgdat, sc); } - consider_reclaim_throttle(first_pgdat, sc); + if (first_pgdat) + consider_reclaim_throttle(first_pgdat, sc); /* * Restore to original mask to avoid the impact on the caller if we
On Fri, Dec 31, 2021 at 1:04 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > Needs this fixup, which I shall tweak a bit then send formally > in a few minutes. Thanks, applied. Linus
On 31.12.21 20:22, Linus Torvalds wrote: > On Fri, Dec 31, 2021 at 11:21 AM Linus Torvalds > <torvalds@linux-foundation.org> wrote: >> >> Pushed out as 1b4e3f26f9f7 ("mm: vmscan: Reduce throttling due to a >> failure to make progress") Thx. > .. and I _think_ this empties the regzbot queue for this release, Thorsten. No? Well, it was the regression that bothered me most. But there are still a few out there that got introduced this cycle.

There is a regression in RDMA/mlx5 introduced in v5.16-rc5: https://lore.kernel.org/lkml/f298db4ec5fdf7a2d1d166ca2f66020fd9397e5c.1640079962.git.leonro@nvidia.com/ https://lore.kernel.org/all/EEBA2D1C-F29C-4237-901C-587B60CEE113@oracle.com/ A fix is available, but got stuck afaics: https://lore.kernel.org/lkml/f298db4ec5fdf7a2d1d166ca2f66020fd9397e5c.1640079962.git.leonro@nvidia.com/ And I only noticed just now: a revert was also discussed, but not performed: https://lore.kernel.org/all/20211222101312.1358616-1-maorg@nvidia.com/ Will let Greg know, seems the commit got backported to 5.15.

s0ix suspend broke for some AMD machines, Alex and Mario are busy looking into it: https://gitlab.freedesktop.org/drm/amd/-/issues/1821 https://bugzilla.kernel.org/show_bug.cgi?id=215436

Alex is also dealing with another issue where the screen contents now get restored after some input events: https://bugzilla.kernel.org/show_bug.cgi?id=215203

There still seems to be a performance regression that Josef and Valentin have been trying hard to pin down without much success for weeks now: https://lore.kernel.org/lkml/87tuf07hdk.mognet@arm.com/

And there is one more report, but it might be a follow-up error due to another regression: https://lore.kernel.org/linux-pm/52933493.dBzk7ret6Y@geek500/
Hi Mel, Mel Gorman wrote: > > Mike Galbraith, Alexey Avramov and Darrick Wong all reported similar > problems due to reclaim throttling for excessive lengths of time. > In Alexey's case, a memory hog that should go OOM quickly stalls for > several minutes before stalling. In Mike and Darrick's cases, a small > memcg environment stalled excessively even though the system had enough > memory overall. >

I recently found a regression when I tested MGLRU with fio on Linux 5.16-rc6 [1]. After this patch was applied, I re-ran the test with Linux 5.16, but the regression has not been fixed yet. The workload is to let fio perform random access on files with buffered IO. The total file size is 2x the memory size. Files are stored on pmem. For each configuration, I ran fio 10 times and reported the average and the standard deviation.

Fio command
===========

$ numactl --cpubind=0 --membind=0 fio --name=randread \
    --directory=/mnt/pmem/ --size={10G, 5G} --io_size=1000TB \
    --time_based --numjobs={40, 80} --ioengine=io_uring \
    --ramp_time=20m --runtime=10m --iodepth=128 \
    --iodepth_batch_submit=32 --iodepth_batch_complete=32 \
    --rw=randread --random_distribution=random \
    --direct=0 --norandommap --group_reporting

Results in throughput (MB/s):
=============================

+------------+------+-------+------+-------+----------+-------+
| Jobs / CPU | 5.15 | stdev | 5.16 | stdev | 5.17-rc3 | stdev |
+------------+------+-------+------+-------+----------+-------+
|     1      | 8411 |  75   | 7459 |  38   |   7331   |  36   |
+------------+------+-------+------+-------+----------+-------+
|     2      | 8417 |  54   | 7491 |  41   |   7383   |  15   |
+------------+------+-------+------+-------+----------+-------+

[1] https://lore.kernel.org/linux-mm/20220105024423.26409-1-szhai2@cs.rochester.edu/

Thanks! Shuang
On Mon, Feb 14, 2022 at 04:10:50PM -0500, Shuang Zhai wrote: > Hi Mel, > > Mel Gorman wrote: > > > > Mike Galbraith, Alexey Avramov and Darrick Wong all reported similar > > problems due to reclaim throttling for excessive lengths of time. > > In Alexey's case, a memory hog that should go OOM quickly stalls for > > several minutes before stalling. In Mike and Darrick's cases, a small > > memcg environment stalled excessively even though the system had enough > > memory overall. > > > > I recently found a regression when I tested MGLRU with fio on Linux > 5.16-rc6 [1]. After this patch was applied, I re-ran the test with Linux > 5.16, but the regression has not been fixed yet. > Am I correct in thinking that this only happens with MGLRU?
Mel Gorman wrote: > On Mon, Feb 14, 2022 at 04:10:50PM -0500, Shuang Zhai wrote: > > Hi Mel, > > > > Mel Gorman wrote: > > > > > > Mike Galbraith, Alexey Avramov and Darrick Wong all reported similar > > > problems due to reclaim throttling for excessive lengths of time. > > > In Alexey's case, a memory hog that should go OOM quickly stalls for > > > several minutes before stalling. In Mike and Darrick's cases, a small > > > memcg environment stalled excessively even though the system had enough > > > memory overall. > > > > > > > I recently found a regression when I tested MGLRU with fio on Linux > > 5.16-rc6 [1]. After this patch was applied, I re-ran the test with Linux > > 5.16, but the regression has not been fixed yet. > > > > Am I correct in thinging that this only happens with MGLRU? Sorry about the confusion and let me clarify on this. The regression happens on upstream Linux with the default page replacement mechanism.
On Tue, Feb 22, 2022 at 12:27:31PM -0500, Shuang Zhai wrote: > Mel Gorman wrote: > > On Mon, Feb 14, 2022 at 04:10:50PM -0500, Shuang Zhai wrote: > > > Hi Mel, > > > > > > Mel Gorman wrote: > > > > > > > > Mike Galbraith, Alexey Avramov and Darrick Wong all reported similar > > > > problems due to reclaim throttling for excessive lengths of time. > > > > In Alexey's case, a memory hog that should go OOM quickly stalls for > > > > several minutes before stalling. In Mike and Darrick's cases, a small > > > > memcg environment stalled excessively even though the system had enough > > > > memory overall. > > > > > > > > > > I recently found a regression when I tested MGLRU with fio on Linux > > > 5.16-rc6 [1]. After this patch was applied, I re-ran the test with Linux > > > 5.16, but the regression has not been fixed yet. > > > > > > > Am I correct in thinging that this only happens with MGLRU? > > Sorry about the confusion and let me clarify on this. The regression happens > on upstream Linux with the default page replacement mechanism. Ok, the fio command for me simply exits with an error and even if it didn't, the test machine I have with persistent memory does not have enough pmem to trigger memory reclaim issues with fio. Can you do the following please?

# echo 1 > vmscan/mm_vmscan_throttled/enable
# cat /sys/kernel/debug/tracing/trace_pipe > trace.out

and run the test? Compress trace.out with xz and send it to me by mail. If the trace is too large, send as much as you can.
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 58e744b78c2c..936dc0b6c226 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -277,6 +277,7 @@ enum vmscan_throttle_state {
 	VMSCAN_THROTTLE_WRITEBACK,
 	VMSCAN_THROTTLE_ISOLATED,
 	VMSCAN_THROTTLE_NOPROGRESS,
+	VMSCAN_THROTTLE_CONGESTED,
 	NR_VMSCAN_THROTTLE,
 };
 
diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index f25a6149d3ba..ca2e9009a651 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -30,12 +30,14 @@
 #define _VMSCAN_THROTTLE_WRITEBACK	(1 << VMSCAN_THROTTLE_WRITEBACK)
 #define _VMSCAN_THROTTLE_ISOLATED	(1 << VMSCAN_THROTTLE_ISOLATED)
 #define _VMSCAN_THROTTLE_NOPROGRESS	(1 << VMSCAN_THROTTLE_NOPROGRESS)
+#define _VMSCAN_THROTTLE_CONGESTED	(1 << VMSCAN_THROTTLE_CONGESTED)
 
 #define show_throttle_flags(flags) \
 	(flags) ? __print_flags(flags, "|", \
 		{_VMSCAN_THROTTLE_WRITEBACK, "VMSCAN_THROTTLE_WRITEBACK"}, \
 		{_VMSCAN_THROTTLE_ISOLATED, "VMSCAN_THROTTLE_ISOLATED"}, \
-		{_VMSCAN_THROTTLE_NOPROGRESS, "VMSCAN_THROTTLE_NOPROGRESS"} \
+		{_VMSCAN_THROTTLE_NOPROGRESS, "VMSCAN_THROTTLE_NOPROGRESS"}, \
+		{_VMSCAN_THROTTLE_CONGESTED, "VMSCAN_THROTTLE_CONGESTED"} \
 		) : "VMSCAN_THROTTLE_NONE"
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index fb9584641ac7..4c4d5f6cd8a3 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1021,6 +1021,39 @@ static void handle_write_error(struct address_space *mapping,
 	unlock_page(page);
 }
 
+static bool skip_throttle_noprogress(pg_data_t *pgdat)
+{
+	int reclaimable = 0, write_pending = 0;
+	int i;
+
+	/*
+	 * If kswapd is disabled, reschedule if necessary but do not
+	 * throttle as the system is likely near OOM.
+	 */
+	if (pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES)
+		return true;
+
+	/*
+	 * If there are a lot of dirty/writeback pages then do not
+	 * throttle as throttling will occur when the pages cycle
+	 * towards the end of the LRU if still under writeback.
+	 */
+	for (i = 0; i < MAX_NR_ZONES; i++) {
+		struct zone *zone = pgdat->node_zones + i;
+
+		if (!populated_zone(zone))
+			continue;
+
+		reclaimable += zone_reclaimable_pages(zone);
+		write_pending += zone_page_state_snapshot(zone,
+						  NR_ZONE_WRITE_PENDING);
+	}
+	if (2 * write_pending <= reclaimable)
+		return true;
+
+	return false;
+}
+
 void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason)
 {
 	wait_queue_head_t *wqh = &pgdat->reclaim_wait[reason];
@@ -1056,8 +1089,16 @@ void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason)
 		}
 
 		break;
+	case VMSCAN_THROTTLE_CONGESTED:
+		fallthrough;
 	case VMSCAN_THROTTLE_NOPROGRESS:
-		timeout = HZ/2;
+		if (skip_throttle_noprogress(pgdat)) {
+			cond_resched();
+			return;
+		}
+
+		timeout = 1;
+
 		break;
 	case VMSCAN_THROTTLE_ISOLATED:
 		timeout = HZ/50;
@@ -3321,7 +3362,7 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 	if (!current_is_kswapd() && current_may_throttle() &&
 	    !sc->hibernation_mode &&
 	    test_bit(LRUVEC_CONGESTED, &target_lruvec->flags))
-		reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
+		reclaim_throttle(pgdat, VMSCAN_THROTTLE_CONGESTED);
 
 	if (should_continue_reclaim(pgdat, sc->nr_reclaimed - nr_reclaimed,
 				    sc))
@@ -3386,16 +3427,16 @@ static void consider_reclaim_throttle(pg_data_t *pgdat, struct scan_control *sc)
 	}
 
 	/*
-	 * Do not throttle kswapd on NOPROGRESS as it will throttle on
-	 * VMSCAN_THROTTLE_WRITEBACK if there are too many pages under
-	 * writeback and marked for immediate reclaim at the tail of
-	 * the LRU.
+	 * Do not throttle kswapd or cgroup reclaim on NOPROGRESS as it will
+	 * throttle on VMSCAN_THROTTLE_WRITEBACK if there are too many pages
+	 * under writeback and marked for immediate reclaim at the tail of the
+	 * LRU.
 	 */
-	if (current_is_kswapd())
+	if (current_is_kswapd() || cgroup_reclaim(sc))
 		return;
 
 	/* Throttle if making no progress at high prioities. */
-	if (sc->priority < DEF_PRIORITY - 2)
+	if (sc->priority == 1 && !sc->nr_reclaimed)
 		reclaim_throttle(pgdat, VMSCAN_THROTTLE_NOPROGRESS);
 }
 
@@ -3415,6 +3456,7 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 	unsigned long nr_soft_scanned;
 	gfp_t orig_mask;
 	pg_data_t *last_pgdat = NULL;
+	pg_data_t *first_pgdat = NULL;
 
 	/*
 	 * If the number of buffer_heads in the machine exceeds the maximum
@@ -3478,14 +3520,18 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 			/* need some check for avoid more shrink_zone() */
 		}
 
+		if (!first_pgdat)
+			first_pgdat = zone->zone_pgdat;
+
 		/* See comment about same check for global reclaim above */
 		if (zone->zone_pgdat == last_pgdat)
 			continue;
 		last_pgdat = zone->zone_pgdat;
 		shrink_node(zone->zone_pgdat, sc);
-		consider_reclaim_throttle(zone->zone_pgdat, sc);
 	}
 
+	consider_reclaim_throttle(first_pgdat, sc);
+
 	/*
 	 * Restore to original mask to avoid the impact on the caller if we
 	 * promoted it to __GFP_HIGHMEM.
Mike Galbraith, Alexey Avramov and Darrick Wong all reported similar
problems due to reclaim throttling for excessive lengths of time. In
Alexey's case, a memory hog that should go OOM quickly stalls for
several minutes before going OOM. In Mike and Darrick's cases, a small
memcg environment stalled excessively even though the system had enough
memory overall.

Commit 69392a403f49 ("mm/vmscan: throttle reclaim when no progress is
being made") introduced the problem although commit a19594ca4a8b
("mm/vmscan: increase the timeout if page reclaim is not making
progress") made it worse. Systems at or near an OOM state that cannot be
recovered must reach OOM quickly and memcg should kill tasks if a memcg
is near OOM.

To address this, only stall for the first zone in the zonelist, reduce
the timeout to 1 tick for VMSCAN_THROTTLE_NOPROGRESS and only stall if
the scan control nr_reclaimed is 0, kswapd is still active and there
were excessive pages pending for writeback. If kswapd has stopped
reclaiming due to excessive failures, do not stall at all so that OOM
triggers relatively quickly. Similarly, if an LRU is simply congested,
only lightly throttle similar to NOPROGRESS.

Alexey's original case was the most straightforward:

	for i in {1..3}; do tail /dev/zero; done

On vanilla 5.16-rc1, this test stalled heavily; after the patch the test
completes in a few seconds, similar to 5.15.

Alexey's second test case added watching a youtube video while tail runs
10 times. On 5.15, playback only jitters slightly; 5.16-rc1 stalls a lot
with lots of frames missing and numerous audio glitches. With this patch
applied, the video plays similarly to 5.15.

Link: https://lore.kernel.org/r/99e779783d6c7fce96448a3402061b9dc1b3b602.camel@gmx.de
Link: https://lore.kernel.org/r/20211124011954.7cab9bb4@mail.inbox.lv
Link: https://lore.kernel.org/r/20211022144651.19914-1-mgorman@techsingularity.net
[lkp@intel.com: Fix W=1 build warning]
Reported-and-tested-by: Alexey Avramov <hakavlad@inbox.lv>
Reported-and-tested-by: Mike Galbraith <efault@gmx.de>
Reported-and-tested-by: Darrick J. Wong <djwong@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 69392a403f49 ("mm/vmscan: throttle reclaim when no progress is being made")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 include/linux/mmzone.h        |  1 +
 include/trace/events/vmscan.h |  4 ++-
 mm/vmscan.c                   | 64 ++++++++++++++++++++++++++++++-----
 3 files changed, 59 insertions(+), 10 deletions(-)
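As a rough way to connect the reproducer above with the throttling
instrumentation, a sketch along the following lines could be used to count
how often, and for which reason, direct reclaimers were throttled while the
memory hog runs. The tracefs path is an assumption based on a standard
layout, and the reason strings are the ones printed by show_throttle_flags()
in the patch:

# Capture mm_vmscan_throttled events while Alexey's reproducer runs, then
# summarise throttle events per reason. The expectation with this patch is
# far fewer VMSCAN_THROTTLE_NOPROGRESS entries before the hogs are
# OOM-killed.
cd /sys/kernel/debug/tracing
echo 1 > events/vmscan/mm_vmscan_throttled/enable
cat trace_pipe > /tmp/throttle.trace &

for i in {1..3}; do tail /dev/zero; done    # expected to be OOM-killed quickly

kill %1
echo 0 > events/vmscan/mm_vmscan_throttled/enable
grep -o 'VMSCAN_THROTTLE_[A-Z]*' /tmp/throttle.trace | sort | uniq -c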