From patchwork Mon Aug 15 07:13:28 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Yu Zhao
X-Patchwork-Id: 12943196
Date: Mon, 15 Aug 2022 01:13:28 -0600
In-Reply-To: <20220815071332.627393-1-yuzhao@google.com>
Message-Id: <20220815071332.627393-10-yuzhao@google.com>
References: <20220815071332.627393-1-yuzhao@google.com>
X-Mailer: git-send-email 2.37.1.595.g718a3a8f04-goog
Subject: [PATCH v14 09/14] mm: multi-gen LRU: optimize multiple memcgs
From: Yu Zhao
To: Andrew Morton
Cc: Andi Kleen, Aneesh Kumar, Catalin Marinas, Dave Hansen,
 Hillf Danton, Jens Axboe, Johannes Weiner, Jonathan Corbet,
 Linus Torvalds, Matthew Wilcox, Mel Gorman, Michael Larabel,
 Michal Hocko, Mike Rapoport, Peter Zijlstra, Tejun Heo,
 Vlastimil Babka, Will Deacon,
 linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
 page-reclaim@google.com, Yu Zhao, Brian Geffon,
 Jan Alexander Steffens, Oleksandr Natalenko, Steven Barrett,
 Suleiman Souhlal, Daniel Byrne, Donald Carr, Holger Hoffstätte,
 Konstantin Kharlamov, Shuang Zhai, Sofia Trinh, Vaibhav Jain
Sender: owner-linux-mm@kvack.org
Precedence: bulk

When multiple memcgs are available, it is possible to make better
choices based on generations and tiers and therefore improve the
overall performance under global memory pressure. This patch adds a
rudimentary optimization to select memcgs that can drop single-use
unmapped clean pages first. Doing so reduces the chance of going into
the aging path or swapping. These two decisions can be costly.

A typical example that benefits from this optimization is a server
running mixed types of workloads, e.g., heavy anon workload in one
memcg and heavy buffered I/O workload in the other.

Though this optimization can be applied to both kswapd and direct
reclaim, it is only added to kswapd to keep the patchset manageable.
Later improvements will cover the direct reclaim path.

Server benchmark results:
  Mixed workloads:
    fio (buffered I/O): +[19, 21]%
                IOPS         BW
      patch1-8: 1880k        7343MiB/s
      patch1-9: 2252k        8796MiB/s

    memcached (anon): +[119, 123]%
                Ops/sec      KB/sec
      patch1-8: 862768.65    33514.68
      patch1-9: 1911022.12   74234.54

  Mixed workloads:
    fio (buffered I/O): +[75, 77]%
                IOPS         BW
      5.19-rc1: 1279k        4996MiB/s
      patch1-9: 2252k        8796MiB/s

    memcached (anon): +[13, 15]%
                Ops/sec      KB/sec
      5.19-rc1: 1673524.04   65008.87
      patch1-9: 1911022.12   74234.54

  Configurations:
    (changes since patch 6)

    cat mixed.sh
    modprobe brd rd_nr=2 rd_size=56623104

    swapoff -a
    mkswap /dev/ram0
    swapon /dev/ram0

    mkfs.ext4 /dev/ram1
    mount -t ext4 /dev/ram1 /mnt

    memtier_benchmark -S /var/run/memcached/memcached.sock \
      -P memcache_binary -n allkeys --key-minimum=1 \
      --key-maximum=50000000 --key-pattern=P:P -c 1 -t 36 \
      --ratio 1:0 --pipeline 8 -d 2000

    fio -name=mglru --numjobs=36 --directory=/mnt --size=1408m \
      --buffered=1 --ioengine=io_uring --iodepth=128 \
      --iodepth_batch_submit=32 --iodepth_batch_complete=32 \
      --rw=randread --random_distribution=random --norandommap \
      --time_based --ramp_time=10m --runtime=90m --group_reporting &
    pid=$!

    sleep 200

    memtier_benchmark -S /var/run/memcached/memcached.sock \
      -P memcache_binary -n allkeys --key-minimum=1 \
      --key-maximum=50000000 --key-pattern=R:R -c 1 -t 36 \
      --ratio 0:1 --pipeline 8 --randomize --distinct-client-seed

    kill -INT $pid
    wait

Client benchmark results:
  no change (CONFIG_MEMCG=n)

Signed-off-by: Yu Zhao
Acked-by: Brian Geffon
Acked-by: Jan Alexander Steffens (heftig)
Acked-by: Oleksandr Natalenko
Acked-by: Steven Barrett
Acked-by: Suleiman Souhlal
Tested-by: Daniel Byrne
Tested-by: Donald Carr
Tested-by: Holger Hoffstätte
Tested-by: Konstantin Kharlamov
Tested-by: Shuang Zhai
Tested-by: Sofia Trinh
Tested-by: Vaibhav Jain
---
 mm/vmscan.c | 55 ++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 46 insertions(+), 9 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index d1dfc0a77b6f..ee51c752a3af 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -131,6 +131,13 @@ struct scan_control {
 	/* Always discard instead of demoting to lower tier memory */
 	unsigned int no_demotion:1;
 
+#ifdef CONFIG_LRU_GEN
+	/* help make better choices when multiple memcgs are available */
+	unsigned int memcgs_need_aging:1;
+	unsigned int memcgs_need_swapping:1;
+	unsigned int memcgs_avoid_swapping:1;
+#endif
+
 	/* Allocation order */
 	s8 order;
 
@@ -4437,6 +4444,22 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
 
 	VM_WARN_ON_ONCE(!current_is_kswapd());
 
+	/*
+	 * To reduce the chance of going into the aging path or swapping, which
+	 * can be costly, optimistically skip them unless their corresponding
+	 * flags were cleared in the eviction path. This improves the overall
+	 * performance when multiple memcgs are available.
+	 */
+	if (!sc->memcgs_need_aging) {
+		sc->memcgs_need_aging = true;
+		sc->memcgs_avoid_swapping = !sc->memcgs_need_swapping;
+		sc->memcgs_need_swapping = true;
+		return;
+	}
+
+	sc->memcgs_need_swapping = true;
+	sc->memcgs_avoid_swapping = true;
+
 	set_mm_walk(pgdat);
 
 	memcg = mem_cgroup_iter(NULL, NULL, NULL);
@@ -4846,7 +4869,8 @@ static int isolate_folios(struct lruvec *lruvec, struct scan_control *sc, int sw
 	return scanned;
 }
 
-static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swappiness)
+static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swappiness,
+			bool *need_swapping)
 {
 	int type;
 	int scanned;
@@ -4909,6 +4933,9 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 
 	sc->nr_reclaimed += reclaimed;
 
+	if (type == LRU_GEN_ANON && need_swapping)
+		*need_swapping = true;
+
 	return scanned;
 }
 
@@ -4918,10 +4945,9 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
  * reclaim.
  */
 static unsigned long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc,
-				    bool can_swap, unsigned long reclaimed)
+				    bool can_swap, unsigned long reclaimed, bool *need_aging)
 {
 	int priority;
-	bool need_aging;
 	unsigned long nr_to_scan;
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 	DEFINE_MAX_SEQ(lruvec);
@@ -4936,7 +4962,7 @@ static unsigned long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *
 	    (mem_cgroup_below_low(memcg) && !sc->memcg_low_reclaim))
 		return 0;
 
-	nr_to_scan = get_nr_evictable(lruvec, max_seq, min_seq, can_swap, &need_aging);
+	nr_to_scan = get_nr_evictable(lruvec, max_seq, min_seq, can_swap, need_aging);
 	if (!nr_to_scan)
 		return 0;
 
@@ -4952,7 +4978,7 @@ static unsigned long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *
 	if (!nr_to_scan)
 		return 0;
 
-	if (!need_aging)
+	if (!*need_aging)
 		return nr_to_scan;
 
 	/* skip the aging path at the default priority */
@@ -4972,6 +4998,8 @@ static unsigned long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *
 static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 {
 	struct blk_plug plug;
+	bool need_aging = false;
+	bool need_swapping = false;
 	unsigned long scanned = 0;
 	unsigned long reclaimed = sc->nr_reclaimed;
 
@@ -4993,21 +5021,30 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
 		else
 			swappiness = 0;
 
-		nr_to_scan = get_nr_to_scan(lruvec, sc, swappiness, reclaimed);
+		nr_to_scan = get_nr_to_scan(lruvec, sc, swappiness, reclaimed, &need_aging);
 		if (!nr_to_scan)
-			break;
+			goto done;
 
-		delta = evict_folios(lruvec, sc, swappiness);
+		delta = evict_folios(lruvec, sc, swappiness, &need_swapping);
 		if (!delta)
-			break;
+			goto done;
 
 		scanned += delta;
 		if (scanned >= nr_to_scan)
 			break;
 
+		if (sc->memcgs_avoid_swapping && swappiness < 200 && need_swapping)
+			break;
+
 		cond_resched();
 	}
 
+	/* see the comment in lru_gen_age_node() */
+	if (!need_aging)
+		sc->memcgs_need_aging = false;
+	if (!need_swapping)
+		sc->memcgs_need_swapping = false;
+done:
 	clear_mm_walk();
 
 	blk_finish_plug(&plug);
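
Note for readers (not part of the patch): the handshake between
lru_gen_age_node() and lru_gen_shrink_lruvec() through the three new
scan_control bits can be hard to follow from the hunks alone. The
following is a minimal userspace sketch of that flag logic only. The
names struct sc_flags, age_node(), shrink_done(), stop_evicting() and
mglru_flags_demo() are hypothetical stand-ins invented for this
illustration; they are not kernel symbols. The sketch also assumes the
eviction loop did not leave through the "goto done" path, which skips
the flag updates in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the three bits added to struct scan_control. */
struct sc_flags {
	bool memcgs_need_aging;
	bool memcgs_need_swapping;
	bool memcgs_avoid_swapping;
};

/*
 * Models the early-return check added to lru_gen_age_node().
 * Returns true when the (costly) aging walk would actually run.
 */
static bool age_node(struct sc_flags *sc)
{
	if (!sc->memcgs_need_aging) {
		/* Some memcg made progress without aging: skip it this pass. */
		sc->memcgs_need_aging = true;
		sc->memcgs_avoid_swapping = !sc->memcgs_need_swapping;
		sc->memcgs_need_swapping = true;
		return false;
	}

	sc->memcgs_need_swapping = true;
	sc->memcgs_avoid_swapping = true;
	return true;
}

/*
 * Models the tail of lru_gen_shrink_lruvec() for one memcg: a memcg
 * that needed neither aging nor swapping clears the matching flag.
 */
static void shrink_done(struct sc_flags *sc, bool need_aging, bool need_swapping)
{
	if (!need_aging)
		sc->memcgs_need_aging = false;
	if (!need_swapping)
		sc->memcgs_need_swapping = false;
}

/* Models the extra break condition added to the eviction loop. */
static bool stop_evicting(const struct sc_flags *sc, int swappiness, bool need_swapping)
{
	return sc->memcgs_avoid_swapping && swappiness < 200 && need_swapping;
}

/* Walks the flags through a few kswapd passes. */
void mglru_flags_demo(void)
{
	struct sc_flags sc = { false, false, false };

	/* Pass 1: flags start cleared, so aging is skipped optimistically. */
	assert(!age_node(&sc));
	assert(sc.memcgs_avoid_swapping);

	/* An anon-evicting memcg stops early while swapping is being avoided. */
	assert(stop_evicting(&sc, 60, true));

	/* A memcg evicted without needing aging or swapping. */
	shrink_done(&sc, false, false);
	assert(!age_node(&sc));

	/* Every memcg needed both, so nothing clears the flags. */
	shrink_done(&sc, true, true);
	assert(age_node(&sc));
}
```

In this model, aging only runs once a full pass over the memcgs ends
with nothing having cleared memcgs_need_aging, which matches the
"optimistically skip" wording of the comment in lru_gen_age_node().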