From patchwork Wed Dec 21 00:12:04 2022
X-Patchwork-Submitter: Yu Zhao
X-Patchwork-Id: 13078277
Date: Tue, 20 Dec 2022 17:12:04 -0700
In-Reply-To: <20221221001207.1376119-1-yuzhao@google.com>
Message-Id: <20221221001207.1376119-5-yuzhao@google.com>
References: <20221221001207.1376119-1-yuzhao@google.com>
X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog
Subject: [PATCH mm-unstable v2 4/8] mm: multi-gen LRU: remove aging fairness
 safeguard
From: Yu Zhao <yuzhao@google.com>
To: Andrew Morton
Cc: Johannes Weiner, Jonathan Corbet, Michael Larabel, Michal Hocko,
 Mike Rapoport, Roman Gushchin, Suren Baghdasaryan, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, linux-mm@google.com, Yu Zhao

Recall that the aging produces the youngest generation: first it scans
for accessed folios and updates their gen counters; then it increments
lrugen->max_seq.

The current aging fairness safeguard for kswapd uses two passes to
ensure fairness among multiple eligible memcgs. On the first pass,
which is shared with the eviction, it checks whether all eligible
memcgs are low on cold folios. If so, it requires a second pass, during
which it ages all those memcgs at the same time.

With memcg LRU, the aging, while still ensuring eventual fairness, will
run only when necessary. Therefore the current aging fairness safeguard
for kswapd is no longer needed.

Note that memcg LRU only applies to global reclaim. For memcg reclaim,
the aging can be unfair to different memcgs, i.e., their
lrugen->max_seq can be incremented at different paces.

Signed-off-by: Yu Zhao <yuzhao@google.com>
Change-Id: I66c70bd31d5276c710ad9209f0a74b1c24a0eda9
---
 mm/vmscan.c | 150 +++++++++++++++++++++++++---------------------------
 1 file changed, 71 insertions(+), 79 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9655b3b3a95e..a2f71400b8be 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -137,7 +137,6 @@ struct scan_control {
 
 #ifdef CONFIG_LRU_GEN
 	/* help kswapd make better choices among multiple memcgs */
-	unsigned int memcgs_need_aging:1;
 	unsigned long last_reclaimed;
 #endif
 
@@ -4471,7 +4470,7 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
 	return true;
 }
 
-static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq, unsigned long *min_seq,
+static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq,
 			     struct scan_control *sc, bool can_swap, unsigned long *nr_to_scan)
 {
 	int gen, type, zone;
@@ -4480,6 +4479,13 @@ static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq, unsig
 	unsigned long total = 0;
 	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
+	DEFINE_MIN_SEQ(lruvec);
+
+	/* whether this lruvec is completely out of cold folios */
+	if (min_seq[!can_swap] + MIN_NR_GENS > max_seq) {
+		*nr_to_scan = 0;
+		return true;
+	}
 
 	for (type = !can_swap; type < ANON_AND_FILE; type++) {
 		unsigned long seq;
@@ -4508,8 +4514,6 @@ static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq, unsig
 	 * stalls when the number of generations reaches MIN_NR_GENS. Hence, the
 	 * ideal number of generations is MIN_NR_GENS+1.
 	 */
-	if (min_seq[!can_swap] + MIN_NR_GENS > max_seq)
-		return true;
 	if (min_seq[!can_swap] + MIN_NR_GENS < max_seq)
 		return false;
 
@@ -4528,40 +4532,54 @@ static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq, unsig
 	return false;
 }
 
-static bool age_lruvec(struct lruvec *lruvec, struct scan_control *sc, unsigned long min_ttl)
+static bool lruvec_is_sizable(struct lruvec *lruvec, struct scan_control *sc)
 {
-	bool need_aging;
-	unsigned long nr_to_scan;
-	int swappiness = get_swappiness(lruvec, sc);
+	int gen, type, zone;
+	unsigned long total = 0;
+	bool can_swap = get_swappiness(lruvec, sc);
+	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 	DEFINE_MAX_SEQ(lruvec);
 	DEFINE_MIN_SEQ(lruvec);
 
+	for (type = !can_swap; type < ANON_AND_FILE; type++) {
+		unsigned long seq;
+
+		for (seq = min_seq[type]; seq <= max_seq; seq++) {
+			gen = lru_gen_from_seq(seq);
+
+			for (zone = 0; zone < MAX_NR_ZONES; zone++)
+				total += max(READ_ONCE(lrugen->nr_pages[gen][type][zone]), 0L);
+		}
+	}
+
+	/* whether the size is big enough to be helpful */
+	return mem_cgroup_online(memcg) ? (total >> sc->priority) : total;
+}
+
+static bool lruvec_is_reclaimable(struct lruvec *lruvec, struct scan_control *sc,
+				  unsigned long min_ttl)
+{
+	int gen;
+	unsigned long birth;
+	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
+	DEFINE_MIN_SEQ(lruvec);
+
 	VM_WARN_ON_ONCE(sc->memcg_low_reclaim);
 
+	/* see the comment on lru_gen_folio */
+	gen = lru_gen_from_seq(min_seq[LRU_GEN_FILE]);
+	birth = READ_ONCE(lruvec->lrugen.timestamps[gen]);
+
+	if (time_is_after_jiffies(birth + min_ttl))
+		return false;
+
+	if (!lruvec_is_sizable(lruvec, sc))
+		return false;
+
 	mem_cgroup_calculate_protection(NULL, memcg);
 
-	if (mem_cgroup_below_min(NULL, memcg))
-		return false;
-
-	need_aging = should_run_aging(lruvec, max_seq, min_seq, sc, swappiness, &nr_to_scan);
-
-	if (min_ttl) {
-		int gen = lru_gen_from_seq(min_seq[LRU_GEN_FILE]);
-		unsigned long birth = READ_ONCE(lruvec->lrugen.timestamps[gen]);
-
-		if (time_is_after_jiffies(birth + min_ttl))
-			return false;
-
-		/* the size is likely too small to be helpful */
-		if (!nr_to_scan && sc->priority != DEF_PRIORITY)
-			return false;
-	}
-
-	if (need_aging)
-		try_to_inc_max_seq(lruvec, max_seq, sc, swappiness, false);
-
-	return true;
+	return !mem_cgroup_below_min(NULL, memcg);
 }
 
 /* to protect the working set of the last N jiffies */
@@ -4570,46 +4588,32 @@ static unsigned long lru_gen_min_ttl __read_mostly;
 static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
 {
 	struct mem_cgroup *memcg;
-	bool success = false;
 	unsigned long min_ttl = READ_ONCE(lru_gen_min_ttl);
 
 	VM_WARN_ON_ONCE(!current_is_kswapd());
 
 	sc->last_reclaimed = sc->nr_reclaimed;
 
-	/*
-	 * To reduce the chance of going into the aging path, which can be
-	 * costly, optimistically skip it if the flag below was cleared in the
-	 * eviction path. This improves the overall performance when multiple
-	 * memcgs are available.
-	 */
-	if (!sc->memcgs_need_aging) {
-		sc->memcgs_need_aging = true;
-		return;
-	}
-
-	set_mm_walk(pgdat);
-
-	memcg = mem_cgroup_iter(NULL, NULL, NULL);
-	do {
-		struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
-
-		if (age_lruvec(lruvec, sc, min_ttl))
-			success = true;
-
-		cond_resched();
-	} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)));
-
-	clear_mm_walk();
-
 	/* check the order to exclude compaction-induced reclaim */
-	if (success || !min_ttl || sc->order)
+	if (!min_ttl || sc->order || sc->priority == DEF_PRIORITY)
 		return;
 
+	memcg = mem_cgroup_iter(NULL, NULL, NULL);
+	do {
+		struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
+
+		if (lruvec_is_reclaimable(lruvec, sc, min_ttl)) {
+			mem_cgroup_iter_break(NULL, memcg);
+			return;
+		}
+
+		cond_resched();
+	} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)));
+
 	/*
 	 * The main goal is to OOM kill if every generation from all memcgs is
 	 * younger than min_ttl. However, another possibility is all memcgs are
-	 * either below min or empty.
+	 * either too small or below min.
 	 */
 	if (mutex_trylock(&oom_lock)) {
 		struct oom_control oc = {
@@ -5117,34 +5121,28 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
  * reclaim.
  */
 static unsigned long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc,
-				    bool can_swap, bool *need_aging)
+				    bool can_swap)
 {
 	unsigned long nr_to_scan;
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 	DEFINE_MAX_SEQ(lruvec);
-	DEFINE_MIN_SEQ(lruvec);
 
 	if (mem_cgroup_below_min(sc->target_mem_cgroup, memcg) ||
 	    (mem_cgroup_below_low(sc->target_mem_cgroup, memcg) &&
 	     !sc->memcg_low_reclaim))
 		return 0;
 
-	*need_aging = should_run_aging(lruvec, max_seq, min_seq, sc, can_swap, &nr_to_scan);
-	if (!*need_aging)
+	if (!should_run_aging(lruvec, max_seq, sc, can_swap, &nr_to_scan))
 		return nr_to_scan;
 
 	/* skip the aging path at the default priority */
 	if (sc->priority == DEF_PRIORITY)
-		goto done;
-
-	/* leave the work to lru_gen_age_node() */
-	if (current_is_kswapd())
-		return 0;
-
-	if (try_to_inc_max_seq(lruvec, max_seq, sc, can_swap, false))
 		return nr_to_scan;
-done:
-	return min_seq[!can_swap] + MIN_NR_GENS <= max_seq ? nr_to_scan : 0;
+
+	try_to_inc_max_seq(lruvec, max_seq, sc, can_swap, false);
+
+	/* skip this lruvec as it's low on cold folios */
+	return 0;
 }
 
 static unsigned long get_nr_to_reclaim(struct scan_control *sc)
@@ -5163,9 +5161,7 @@ static unsigned long get_nr_to_reclaim(struct scan_control *sc)
 static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 {
 	struct blk_plug plug;
-	bool need_aging = false;
 	unsigned long scanned = 0;
-	unsigned long reclaimed = sc->nr_reclaimed;
 	unsigned long nr_to_reclaim = get_nr_to_reclaim(sc);
 
 	lru_add_drain();
@@ -5186,13 +5182,13 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
 		else
 			swappiness = 0;
 
-		nr_to_scan = get_nr_to_scan(lruvec, sc, swappiness, &need_aging);
+		nr_to_scan = get_nr_to_scan(lruvec, sc, swappiness);
 		if (!nr_to_scan)
-			goto done;
+			break;
 
 		delta = evict_folios(lruvec, sc, swappiness);
 		if (!delta)
-			goto done;
+			break;
 
 		scanned += delta;
 		if (scanned >= nr_to_scan)
@@ -5204,10 +5200,6 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
 		cond_resched();
 	}
 
-	/* see the comment in lru_gen_age_node() */
-	if (sc->nr_reclaimed - reclaimed >= MIN_LRU_BATCH && !need_aging)
-		sc->memcgs_need_aging = false;
-done:
 	clear_mm_walk();
 
 	blk_finish_plug(&plug);
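
To make the resulting control flow easier to follow, below is a
standalone toy model in C of the decision this patch centralizes in
should_run_aging(). It is not kernel code: struct toy_lruvec, its
fields, and the sequence numbers are made up for illustration; only
MIN_NR_GENS (2 in mm/vmscan.c) and the roles of the functions mirror
the patch. It demonstrates the point of the change: a lruvec that is
completely out of cold folios (min_seq + MIN_NR_GENS > max_seq)
triggers the aging on the spot and reports nothing to scan, instead of
relying on the removed memcgs_need_aging handshake between kswapd
passes.

/*
 * Toy model (NOT kernel code) of the aging decision after this patch.
 * Only the names mirror mm/vmscan.c; types and numbers are made up.
 */
#include <stdbool.h>
#include <stdio.h>

#define MIN_NR_GENS 2	/* same value as the kernel constant */

struct toy_lruvec {
	unsigned long max_seq;	/* youngest generation */
	unsigned long min_seq;	/* oldest generation, evicted first */
};

/* whether this lruvec is completely out of cold folios */
static bool out_of_cold_folios(const struct toy_lruvec *v)
{
	return v->min_seq + MIN_NR_GENS > v->max_seq;
}

/* stand-in for try_to_inc_max_seq(): produce a new youngest generation */
static void age(struct toy_lruvec *v)
{
	v->max_seq++;	/* the real aging also rescans accessed folios first */
}

/*
 * Stand-in for get_nr_to_scan(): age on demand, then skip this lruvec
 * for now; the next call finds cold folios again. No flag carried
 * across reclaim passes is needed.
 */
static unsigned long nr_to_scan(struct toy_lruvec *v)
{
	if (!out_of_cold_folios(v))
		return v->max_seq - v->min_seq;	/* fake size estimate */

	age(v);
	/* skip this lruvec as it's low on cold folios */
	return 0;
}

int main(void)
{
	struct toy_lruvec v = { .max_seq = 4, .min_seq = 3 };

	printf("first call:  nr_to_scan=%lu (aging ran on demand)\n",
	       nr_to_scan(&v));
	printf("second call: nr_to_scan=%lu\n", nr_to_scan(&v));
	return 0;
}

In this model each lruvec ages itself exactly when it runs dry, so the
per-lruvec check replaces the cross-pass bookkeeping; eventual fairness
among memcgs under global reclaim is then left to the memcg LRU, as the
commit message notes.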