From patchwork Tue Apr 13 06:56:30 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Zhao X-Patchwork-Id: 12199443 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-26.3 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C2924C433ED for ; Tue, 13 Apr 2021 06:57:10 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 51C706128E for ; Tue, 13 Apr 2021 06:57:10 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 51C706128E Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 91AA16B0081; Tue, 13 Apr 2021 02:57:01 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 8F2F56B0082; Tue, 13 Apr 2021 02:57:01 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 71C8E6B0083; Tue, 13 Apr 2021 02:57:01 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0195.hostedemail.com [216.40.44.195]) by kanga.kvack.org (Postfix) with ESMTP id 45CD56B0081 for ; Tue, 13 Apr 2021 02:57:01 -0400 (EDT) Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 04466180938CC for ; Tue, 13 Apr 2021 06:57:01 +0000 (UTC) X-FDA: 78026436642.26.6823509 Received: from mail-qv1-f74.google.com (mail-qv1-f74.google.com [209.85.219.74]) by imf02.hostedemail.com (Postfix) with ESMTP id F15ED40002DB for ; Tue, 13 Apr 2021 06:56:45 +0000 (UTC) Received: by mail-qv1-f74.google.com with SMTP id gu11so7133561qvb.0 for ; Mon, 12 Apr 2021 23:57:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=ZkZkBuwvqnJ3RNHJhCNbR3K9qvaxv7Y+ShqFogGYPM4=; b=YuhzAl4jnf9B8DPsAHH+IEn6TeEK8tkXzqeIIUWrV6MKmrDwRVWEaxlfpyho7LEl9c Yb/oFtKUHNb53oILQT33tlmVOzpPgzylMipFZ2l5j9KHbcsDyRmB0oqQUa1QZ2PJMYNK fWpCu7LXduAtYRU+OGHNrJHXp576QKDulX5A0p9heBIoiC+vWWS/x+GcCoUk17noPsZC Su6UQCzg6NAfh+hiQZUMluxkVxIZLc0tUeagDPWX8AYcx4WshWUrgTPuDgI3s1vI7M8C K9lLKPVh9VeBFpsycJM4koujbXoOVbPXyfWOhPPIE23ETJR5Yb0o5n5VqtBZYTB2FIhK TPQw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=ZkZkBuwvqnJ3RNHJhCNbR3K9qvaxv7Y+ShqFogGYPM4=; b=dfMITXvLuRsL4IV4QJAglZYKaj0wsodJPwAT7BtwCdh+2INC7YqShfwXB5t7crpJli Kpy3YpR0yeuV7dF4qwYmA6ruscIwbhCHMZumePW/Jx35H3w/nBRGhQ6DD0RakyaCG+ry qBxzeHEHzbKWJOuwTYU1iws16DwwQIrUcct+wCdy/tEBmoOn6CroowLI7g6YWkIw1n0/ YAP1DYHlk+UJxKM36jhJ29fn57C4/rkbjUxhE5uDm1oN26GXo00DmK9RyLy2YF1KBG8r hsN6FJlTKO/S8CNUIxtee2hBFBtD3ZMtYlFrd3YcoV1Bc3ViqYg2+nL5kijGBaCU/w38 DtDg== X-Gm-Message-State: AOAM532+eB4z0m/lT2520Uy2u560IF/wOAwdjjRnEkP2qQTuHamYenzG OvU9ut1tj8Z7KxxT3mSX5pcgzahbUausXqCTqOBbXsvCiam/zzUYkyiM85ayUX5ojPXc0tAZVoi NzlRpObOIG18OHTugfDqSOyTqYiDucyMDjRZnmtv1Qj9icFxwlhiIuwJa X-Google-Smtp-Source: ABdhPJz8UXRBXxFnHjwU9KHKJ57aCdWAlTupj/VfQPjKJc1AKD7gBysJ6np5sy0VpO9JJLZsJRX7gVcs/zM= X-Received: from yuzhao.bld.corp.google.com ([2620:15c:183:200:d02d:cccc:9ebe:9fe9]) (user=yuzhao job=sendgmr) by 2002:a05:6214:1c0c:: with SMTP id u12mr31837398qvc.24.1618297019786; Mon, 12 Apr 2021 23:56:59 -0700 (PDT) Date: Tue, 13 Apr 2021 00:56:30 -0600 In-Reply-To: <20210413065633.2782273-1-yuzhao@google.com> Message-Id: <20210413065633.2782273-14-yuzhao@google.com> Mime-Version: 1.0 References: <20210413065633.2782273-1-yuzhao@google.com> X-Mailer: git-send-email 2.31.1.295.g9ea45b61b8-goog Subject: [PATCH v2 13/16] mm: multigenerational lru: page reclaim From: Yu Zhao To: linux-mm@kvack.org Cc: Alex Shi , Andi Kleen , Andrew Morton , Benjamin Manes , Dave Chinner , Dave Hansen , Hillf Danton , Jens Axboe , Johannes Weiner , Jonathan Corbet , Joonsoo Kim , Matthew Wilcox , Mel Gorman , Miaohe Lin , Michael Larabel , Michal Hocko , Michel Lespinasse , Rik van Riel , Roman Gushchin , Rong Chen , SeongJae Park , Tim Chen , Vlastimil Babka , Yang Shi , Ying Huang , Zi Yan , linux-kernel@vger.kernel.org, lkp@lists.01.org, page-reclaim@google.com, Yu Zhao X-Stat-Signature: khn3hm8x715pa9bgqzc9ycbgew9nhgmd X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: F15ED40002DB Received-SPF: none (flex--yuzhao.bounces.google.com>: No applicable sender policy available) receiver=imf02; identity=mailfrom; envelope-from="<3u0B1YAYKCBwQMR92G8GG8D6.4GEDAFMP-EECN24C.GJ8@flex--yuzhao.bounces.google.com>"; helo=mail-qv1-f74.google.com; client-ip=209.85.219.74 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1618297005-986501 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: With the aging and the eviction in place, we can build the page reclaim in a straightforward manner: 1) In order to reduce the latency, direct reclaim only invokes the aging when both min_seq[2] reaches max_seq-1; otherwise it invokes the eviction. 2) In order to avoid the aging in the direct reclaim path, kswapd does the background aging more proactively. It invokes the aging when either of min_seq[2] reaches max_seq-1; otherwise it invokes the eviction. And we add another optimization: pages mapped around a referenced PTE may also have been referenced due to the spatial locality. In the reclaim path, if the rmap finds the PTE mapping a page under reclaim referenced, it calls a new function lru_gen_scan_around() to scan the vicinity of the PTE. And if this new function finds others referenced PTEs, it updates the generation number of the pages mapped by those PTEs. Signed-off-by: Yu Zhao --- include/linux/mmzone.h | 6 ++ mm/rmap.c | 6 ++ mm/vmscan.c | 236 +++++++++++++++++++++++++++++++++++++++++ 3 files changed, 248 insertions(+) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index dcfadf6a8c07..a22e9e40083f 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -292,6 +292,7 @@ enum lruvec_flags { }; struct lruvec; +struct page_vma_mapped_walk; #define LRU_GEN_MASK ((BIT(LRU_GEN_WIDTH) - 1) << LRU_GEN_PGOFF) #define LRU_USAGE_MASK ((BIT(LRU_USAGE_WIDTH) - 1) << LRU_USAGE_PGOFF) @@ -384,6 +385,7 @@ struct lrugen { void lru_gen_init_lruvec(struct lruvec *lruvec); void lru_gen_set_state(bool enable, bool main, bool swap); +void lru_gen_scan_around(struct page_vma_mapped_walk *pvmw); #else /* CONFIG_LRU_GEN */ @@ -395,6 +397,10 @@ static inline void lru_gen_set_state(bool enable, bool main, bool swap) { } +static inline void lru_gen_scan_around(struct page_vma_mapped_walk *pvmw) +{ +} + #endif /* CONFIG_LRU_GEN */ struct lruvec { diff --git a/mm/rmap.c b/mm/rmap.c index b0fc27e77d6d..d600b282ced5 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -72,6 +72,7 @@ #include #include #include +#include #include @@ -792,6 +793,11 @@ static bool page_referenced_one(struct page *page, struct vm_area_struct *vma, } if (pvmw.pte) { + /* the multigenerational lru exploits the spatial locality */ + if (lru_gen_enabled() && pte_young(*pvmw.pte)) { + lru_gen_scan_around(&pvmw); + referenced++; + } if (ptep_clear_flush_young_notify(vma, address, pvmw.pte)) { /* diff --git a/mm/vmscan.c b/mm/vmscan.c index 6239b1acd84f..01c475386379 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1114,6 +1114,10 @@ static unsigned int shrink_page_list(struct list_head *page_list, if (!sc->may_unmap && page_mapped(page)) goto keep_locked; + /* in case the page was found accessed by lru_gen_scan_around() */ + if (lru_gen_enabled() && !ignore_references && PageReferenced(page)) + goto keep_locked; + may_enter_fs = (sc->gfp_mask & __GFP_FS) || (PageSwapCache(page) && (sc->gfp_mask & __GFP_IO)); @@ -2233,6 +2237,10 @@ static void prepare_scan_count(pg_data_t *pgdat, struct scan_control *sc) unsigned long file; struct lruvec *target_lruvec; + /* the multigenerational lru doesn't use these counters */ + if (lru_gen_enabled()) + return; + target_lruvec = mem_cgroup_lruvec(sc->target_mem_cgroup, pgdat); /* @@ -2522,6 +2530,19 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc, } } +#ifdef CONFIG_LRU_GEN +static void age_lru_gens(struct pglist_data *pgdat, struct scan_control *sc); +static void shrink_lru_gens(struct lruvec *lruvec, struct scan_control *sc); +#else +static void age_lru_gens(struct pglist_data *pgdat, struct scan_control *sc) +{ +} + +static void shrink_lru_gens(struct lruvec *lruvec, struct scan_control *sc) +{ +} +#endif + static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc) { unsigned long nr[NR_LRU_LISTS]; @@ -2533,6 +2554,11 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc) struct blk_plug plug; bool scan_adjusted; + if (lru_gen_enabled()) { + shrink_lru_gens(lruvec, sc); + return; + } + get_scan_count(lruvec, sc, nr); /* Record the original scan target for proportional adjustments later */ @@ -2999,6 +3025,10 @@ static void snapshot_refaults(struct mem_cgroup *target_memcg, pg_data_t *pgdat) struct lruvec *target_lruvec; unsigned long refaults; + /* the multigenerational lru doesn't use these counters */ + if (lru_gen_enabled()) + return; + target_lruvec = mem_cgroup_lruvec(target_memcg, pgdat); refaults = lruvec_page_state(target_lruvec, WORKINGSET_ACTIVATE_ANON); target_lruvec->refaults[0] = refaults; @@ -3373,6 +3403,11 @@ static void age_active_anon(struct pglist_data *pgdat, struct mem_cgroup *memcg; struct lruvec *lruvec; + if (lru_gen_enabled()) { + age_lru_gens(pgdat, sc); + return; + } + if (!total_swap_pages) return; @@ -5468,6 +5503,57 @@ static bool walk_mm_list(struct lruvec *lruvec, unsigned long max_seq, return true; } +void lru_gen_scan_around(struct page_vma_mapped_walk *pvmw) +{ + pte_t *pte; + unsigned long start, end; + int old_gen, new_gen; + unsigned long flags; + struct lruvec *lruvec; + struct mem_cgroup *memcg; + struct pglist_data *pgdat = page_pgdat(pvmw->page); + + lockdep_assert_held(pvmw->ptl); + + start = max(pvmw->address & PMD_MASK, pvmw->vma->vm_start); + end = pmd_addr_end(pvmw->address, pvmw->vma->vm_end); + pte = pvmw->pte - ((pvmw->address - start) >> PAGE_SHIFT); + + memcg = lock_page_memcg(pvmw->page); + lruvec = lock_page_lruvec_irqsave(pvmw->page, &flags); + + new_gen = lru_gen_from_seq(lruvec->evictable.max_seq); + + for (; start != end; pte++, start += PAGE_SIZE) { + struct page *page; + unsigned long pfn = pte_pfn(*pte); + + if (!pte_present(*pte) || !pte_young(*pte) || is_zero_pfn(pfn)) + continue; + + if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) + continue; + + page = compound_head(pfn_to_page(pfn)); + if (page_to_nid(page) != pgdat->node_id) + continue; + + if (page_memcg_rcu(page) != memcg) + continue; + /* + * We may be holding many locks. So try to finish as fast as + * possible and leave the accessed and the dirty bits to page + * table walks. + */ + old_gen = page_update_gen(page, new_gen); + if (old_gen >= 0 && old_gen != new_gen) + lru_gen_update_size(page, lruvec, old_gen, new_gen); + } + + unlock_page_lruvec_irqrestore(lruvec, flags); + unlock_page_memcg(pvmw->page); +} + /****************************************************************************** * the eviction ******************************************************************************/ @@ -5809,6 +5895,156 @@ static bool evict_lru_gen_pages(struct lruvec *lruvec, struct scan_control *sc, return *nr_to_scan > 0 && sc->nr_reclaimed < sc->nr_to_reclaim; } +/****************************************************************************** + * page reclaim + ******************************************************************************/ + +static int get_swappiness(struct lruvec *lruvec) +{ + struct mem_cgroup *memcg = lruvec_memcg(lruvec); + int swappiness = mem_cgroup_get_nr_swap_pages(memcg) >= (long)SWAP_CLUSTER_MAX ? + mem_cgroup_swappiness(memcg) : 0; + + VM_BUG_ON(swappiness > 200U); + + return swappiness; +} + +static unsigned long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc, + int swappiness) +{ + int gen, file, zone; + long nr_to_scan = 0; + struct lrugen *lrugen = &lruvec->evictable; + DEFINE_MAX_SEQ(); + DEFINE_MIN_SEQ(); + + lru_add_drain(); + + for (file = !swappiness; file < ANON_AND_FILE; file++) { + unsigned long seq; + + for (seq = min_seq[file]; seq <= max_seq; seq++) { + gen = lru_gen_from_seq(seq); + + for (zone = 0; zone <= sc->reclaim_idx; zone++) + nr_to_scan += READ_ONCE(lrugen->sizes[gen][file][zone]); + } + } + + nr_to_scan = max(nr_to_scan, 0L); + nr_to_scan = round_up(nr_to_scan >> sc->priority, SWAP_CLUSTER_MAX); + + if (max_nr_gens(max_seq, min_seq, swappiness) > MIN_NR_GENS) + return nr_to_scan; + + /* kswapd uses age_lru_gens() */ + if (current_is_kswapd()) + return 0; + + return walk_mm_list(lruvec, max_seq, sc, swappiness, NULL) ? nr_to_scan : 0; +} + +static void shrink_lru_gens(struct lruvec *lruvec, struct scan_control *sc) +{ + struct blk_plug plug; + unsigned long scanned = 0; + struct mem_cgroup *memcg = lruvec_memcg(lruvec); + + blk_start_plug(&plug); + + while (true) { + long nr_to_scan; + int swappiness = sc->may_swap ? get_swappiness(lruvec) : 0; + + nr_to_scan = get_nr_to_scan(lruvec, sc, swappiness) - scanned; + if (nr_to_scan < (long)SWAP_CLUSTER_MAX) + break; + + scanned += nr_to_scan; + + if (!evict_lru_gen_pages(lruvec, sc, swappiness, &nr_to_scan)) + break; + + scanned -= nr_to_scan; + + if (mem_cgroup_below_min(memcg) || + (mem_cgroup_below_low(memcg) && !sc->memcg_low_reclaim)) + break; + + cond_resched(); + } + + blk_finish_plug(&plug); +} + +/****************************************************************************** + * the background aging + ******************************************************************************/ + +static int lru_gen_spread = MIN_NR_GENS; + +static void try_walk_mm_list(struct lruvec *lruvec, struct scan_control *sc) +{ + int gen, file, zone; + long old_and_young[2] = {}; + struct mm_walk_args args = {}; + int spread = READ_ONCE(lru_gen_spread); + int swappiness = get_swappiness(lruvec); + struct lrugen *lrugen = &lruvec->evictable; + DEFINE_MAX_SEQ(); + DEFINE_MIN_SEQ(); + + lru_add_drain(); + + for (file = !swappiness; file < ANON_AND_FILE; file++) { + unsigned long seq; + + for (seq = min_seq[file]; seq <= max_seq; seq++) { + gen = lru_gen_from_seq(seq); + + for (zone = 0; zone < MAX_NR_ZONES; zone++) + old_and_young[seq == max_seq] += + READ_ONCE(lrugen->sizes[gen][file][zone]); + } + } + + old_and_young[0] = max(old_and_young[0], 0L); + old_and_young[1] = max(old_and_young[1], 0L); + + if (old_and_young[0] + old_and_young[1] < SWAP_CLUSTER_MAX) + return; + + /* try to spread pages out across spread+1 generations */ + if (old_and_young[0] >= old_and_young[1] * spread && + min_nr_gens(max_seq, min_seq, swappiness) > max(spread, MIN_NR_GENS)) + return; + + walk_mm_list(lruvec, max_seq, sc, swappiness, &args); +} + +static void age_lru_gens(struct pglist_data *pgdat, struct scan_control *sc) +{ + struct mem_cgroup *memcg; + + VM_BUG_ON(!current_is_kswapd()); + + memcg = mem_cgroup_iter(NULL, NULL, NULL); + do { + struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat); + struct lrugen *lrugen = &lruvec->evictable; + + if (!mem_cgroup_below_min(memcg) && + (!mem_cgroup_below_low(memcg) || sc->memcg_low_reclaim)) + try_walk_mm_list(lruvec, sc); + + if (!mem_cgroup_disabled()) + atomic_add_unless(&lrugen->priority, 1, DEF_PRIORITY); + + cond_resched(); + } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL))); +} + /****************************************************************************** * state change ******************************************************************************/