From patchwork Mon Dec 23 08:20:04 2024
X-Patchwork-Submitter: Chen Ridong
X-Patchwork-Id: 13918595
From: Chen Ridong
To: akpm@linux-foundation.org, mhocko@suse.com, hannes@cmpxchg.org,
	yosryahmed@google.com, yuzhao@google.com, david@redhat.com,
	willy@infradead.org, ryan.roberts@arm.com, baohua@kernel.org,
	21cnbao@gmail.com, wangkefeng.wang@huawei.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, chenridong@huawei.com,
	wangweiyang2@huawei.com, xieym_ict@hotmail.com
Subject: [PATCH -next v6] mm: vmscan: retry folios written back while isolated for traditional LRU
Date: Mon, 23 Dec 2024 08:20:04 +0000
Message-Id: <20241223082004.3759152-1-chenridong@huaweicloud.com>
X-Mailer: git-send-email 2.34.1

From: Chen Ridong

The page reclaim isolates a batch of folios from the tail of one of the
LRU lists and works on those folios one by one. For a suitable
swap-backed folio, if the swap device is async, it queues that folio for
writeback. After the page reclaim finishes an entire batch, it puts back
the folios it queued for writeback to the head of the original LRU list.

In the meantime, the page writeback flushes the queued folios also by
batches. Its batching logic is independent from that of the page
reclaim. For each of the folios it writes back, the page writeback calls
folio_rotate_reclaimable() which tries to rotate a folio to the tail.

folio_rotate_reclaimable() only works for a folio after the page reclaim
has put it back. If an async swap device is fast enough, the page
writeback can finish with that folio while the page reclaim is still
working on the rest of the batch containing it. In this case, that folio
will remain at the head and the page reclaim will not retry it before
reaching there.
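The window is visible in the rotation path itself. Roughly (a condensed
sketch of mm/swap.c, not the verbatim code, whose details vary across
kernel versions):

void folio_rotate_reclaimable(struct folio *folio)
{
	if (folio_test_locked(folio) || folio_test_dirty(folio) ||
	    folio_test_unevictable(folio))
		return;

	/*
	 * A folio that the page reclaim still holds isolated is not on
	 * any LRU list: the test below fails, the rotation is silently
	 * dropped, and the folio later goes back to the list head.
	 */
	if (!folio_test_clear_lru(folio))
		return;

	/* queue the folio for a move to the tail of its LRU list */
	folio_batch_add_and_move(folio, lru_move_tail_fn, true);
}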
Commit 359a5e1416ca ("mm: multi-gen LRU: retry folios written back while
isolated") fixed this issue for MGLRU only; the traditional
active/inactive LRU suffers from it as well. The issue is worse when a
THP is split: the split makes the list longer, so finishing a batch of
folios takes longer. Fix the traditional LRU the same way: extract the
common logic into a new helper, find_folios_written_back(), reuse it in
shrink_inactive_list(), and retry reclaiming the folios that may have
missed the rotation.

Link: https://lore.kernel.org/linux-kernel/20241010081802.290893-1-chenridong@huaweicloud.com/
Link: https://lore.kernel.org/linux-kernel/CAGsJ_4zqL8ZHNRZ44o_CC69kE7DBVXvbZfvmQxMGiFqRxqHQdA@mail.gmail.com/
Signed-off-by: Chen Ridong
Reviewed-by: Barry Song
---
v5 -> v6:
 - fix a compile error (implicit declaration of function
   'lru_gen_distance') when CONFIG_LRU_GEN is disabled.
 - rename 'is_retried' to 'is_retrying', as suggested by Barry Song.
v5: https://lore.kernel.org/linux-kernel/CAGsJ_4x3Aj7wieK1FQKQC4Vbz5N+1dExs=Q70KQt-whS1dMxpw@mail.gmail.com/

 include/linux/mm_inline.h |   5 ++
 mm/vmscan.c               | 108 ++++++++++++++++++++++++--------------
 2 files changed, 75 insertions(+), 38 deletions(-)
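A note on the mm_inline.h hunk below: the new !CONFIG_LRU_GEN stub of
lru_gen_distance() exists only so that the guarded caller in
find_folios_written_back() keeps compiling. Assuming the usual
!CONFIG_LRU_GEN definition of lru_gen_enabled() (a constant false), the
stub is never actually executed:

	/*
	 * With CONFIG_LRU_GEN=n, lru_gen_enabled() is false, so the &&
	 * short-circuits and the lru_gen_distance() stub is never run.
	 */
	if (lru_gen_enabled() && !lru_gen_distance(folio, false))
		set_mask_bits(&folio->flags, LRU_REFS_FLAGS, BIT(PG_active));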
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index 3fcf5fa797fe..07b2fda6fafa 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -342,6 +342,11 @@ static inline void folio_migrate_refs(struct folio *new, struct folio *old)
 {
 }
+
+static inline int lru_gen_distance(struct folio *folio, bool reclaiming)
+{
+	return -1;
+}
 
 #endif /* CONFIG_LRU_GEN */
 
 static __always_inline
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 39886f435ec5..701716306f8b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -283,6 +283,39 @@ static void set_task_reclaim_state(struct task_struct *task,
 	task->reclaim_state = rs;
 }
 
+/**
+ * find_folios_written_back - Find and move the written-back folios to a new list.
+ * @list: the folios list
+ * @clean: the written-back folios list
+ * @is_retrying: whether this pass is a retry
+ */
+static inline void find_folios_written_back(struct list_head *list,
+		struct list_head *clean, bool is_retrying)
+{
+	struct folio *folio;
+	struct folio *next;
+
+	list_for_each_entry_safe_reverse(folio, next, list, lru) {
+		if (!folio_evictable(folio)) {
+			list_del(&folio->lru);
+			folio_putback_lru(folio);
+			continue;
+		}
+
+		/* retry folios that may have missed folio_rotate_reclaimable() */
+		if (!is_retrying && !folio_test_active(folio) && !folio_mapped(folio) &&
+		    !folio_test_dirty(folio) && !folio_test_writeback(folio)) {
+			list_move(&folio->lru, clean);
+			continue;
+		}
+
+		/* don't add rejected folios to the oldest generation */
+		if (lru_gen_enabled() && !lru_gen_distance(folio, false))
+			set_mask_bits(&folio->flags, LRU_REFS_FLAGS, BIT(PG_active));
+	}
+
+}
+
 /*
  * flush_reclaim_state(): add pages reclaimed outside of LRU-based reclaim to
  * scan_control->nr_reclaimed.
@@ -1959,14 +1992,18 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 		enum lru_list lru)
 {
 	LIST_HEAD(folio_list);
+	LIST_HEAD(clean_list);
 	unsigned long nr_scanned;
-	unsigned int nr_reclaimed = 0;
+	unsigned int nr_reclaimed, total_reclaimed = 0;
+	unsigned int nr_pageout = 0;
+	unsigned int nr_unqueued_dirty = 0;
 	unsigned long nr_taken;
 	struct reclaim_stat stat;
 	bool file = is_file_lru(lru);
 	enum vm_event_item item;
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 	bool stalled = false;
+	bool is_retrying = false;
 
 	while (unlikely(too_many_isolated(pgdat, file, sc))) {
 		if (stalled)
@@ -2000,22 +2037,47 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 	if (nr_taken == 0)
 		return 0;
 
+retry:
 	nr_reclaimed = shrink_folio_list(&folio_list, pgdat, sc, &stat, false);
 
+	sc->nr.dirty += stat.nr_dirty;
+	sc->nr.congested += stat.nr_congested;
+	sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
+	sc->nr.writeback += stat.nr_writeback;
+	sc->nr.immediate += stat.nr_immediate;
+	total_reclaimed += nr_reclaimed;
+	nr_pageout += stat.nr_pageout;
+	nr_unqueued_dirty += stat.nr_unqueued_dirty;
+
+	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
+			nr_scanned, nr_reclaimed, &stat, sc->priority, file);
+
+	find_folios_written_back(&folio_list, &clean_list, is_retrying);
+
 	spin_lock_irq(&lruvec->lru_lock);
 	move_folios_to_lru(lruvec, &folio_list);
 
 	__mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(),
 					stat.nr_demoted);
-	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
 	item = PGSTEAL_KSWAPD + reclaimer_offset();
 	if (!cgroup_reclaim(sc))
 		__count_vm_events(item, nr_reclaimed);
 	__count_memcg_events(lruvec_memcg(lruvec), item, nr_reclaimed);
 	__count_vm_events(PGSTEAL_ANON + file, nr_reclaimed);
+
+	if (!list_empty(&clean_list)) {
+		list_splice_init(&clean_list, &folio_list);
+		is_retrying = true;
+		spin_unlock_irq(&lruvec->lru_lock);
+		goto retry;
+	}
+	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
 	spin_unlock_irq(&lruvec->lru_lock);
+	sc->nr.taken += nr_taken;
+	if (file)
+		sc->nr.file_taken += nr_taken;
 
-	lru_note_cost(lruvec, file, stat.nr_pageout, nr_scanned - nr_reclaimed);
+	lru_note_cost(lruvec, file, nr_pageout, nr_scanned - total_reclaimed);
 
 	/*
 	 * If dirty folios are scanned that are not queued for IO, it
@@ -2028,7 +2090,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 	 * the flushers simply cannot keep up with the allocation
 	 * rate. Nudge the flusher threads in case they are asleep.
 	 */
-	if (stat.nr_unqueued_dirty == nr_taken) {
+	if (nr_unqueued_dirty == nr_taken) {
 		wakeup_flusher_threads(WB_REASON_VMSCAN);
 		/*
 		 * For cgroupv1 dirty throttling is achieved by waking up
@@ -2043,18 +2105,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 			reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
 	}
 
-	sc->nr.dirty += stat.nr_dirty;
-	sc->nr.congested += stat.nr_congested;
-	sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
-	sc->nr.writeback += stat.nr_writeback;
-	sc->nr.immediate += stat.nr_immediate;
-	sc->nr.taken += nr_taken;
-	if (file)
-		sc->nr.file_taken += nr_taken;
-
-	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
-		nr_scanned, nr_reclaimed, &stat, sc->priority, file);
-	return nr_reclaimed;
+	return total_reclaimed;
 }
 
 /*
@@ -4585,12 +4636,10 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 	int reclaimed;
 	LIST_HEAD(list);
 	LIST_HEAD(clean);
-	struct folio *folio;
-	struct folio *next;
 	enum vm_event_item item;
 	struct reclaim_stat stat;
 	struct lru_gen_mm_walk *walk;
-	bool skip_retry = false;
+	bool is_retrying = false;
 	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
@@ -4616,24 +4665,7 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 			scanned, reclaimed, &stat, sc->priority,
 			type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
 
-	list_for_each_entry_safe_reverse(folio, next, &list, lru) {
-		if (!folio_evictable(folio)) {
-			list_del(&folio->lru);
-			folio_putback_lru(folio);
-			continue;
-		}
-
-		/* retry folios that may have missed folio_rotate_reclaimable() */
-		if (!skip_retry && !folio_test_active(folio) && !folio_mapped(folio) &&
-		    !folio_test_dirty(folio) && !folio_test_writeback(folio)) {
-			list_move(&folio->lru, &clean);
-			continue;
-		}
-
-		/* don't add rejected folios to the oldest generation */
-		if (!lru_gen_distance(folio, false))
-			set_mask_bits(&folio->flags, LRU_REFS_FLAGS, BIT(PG_active));
-	}
+	find_folios_written_back(&list, &clean, is_retrying);
 
 	spin_lock_irq(&lruvec->lru_lock);
 
@@ -4656,7 +4688,7 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 	list_splice_init(&clean, &list);
 
 	if (!list_empty(&list)) {
-		skip_retry = true;
+		is_retrying = true;
 		goto retry;
 	}
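
One property of the rework worth noting: the retry is bounded to a
single extra pass. find_folios_written_back() moves folios onto the
clean list only while is_retrying is false, so once the clean list has
been spliced back and is_retrying is set, the second pass cannot refill
it and the loop exits. Condensed shape of the loop, hand-reduced from
the hunks above (illustrative, not the verbatim kernel code):

	bool is_retrying = false;
retry:
	nr_reclaimed = shrink_folio_list(&folio_list, pgdat, sc, &stat, false);
	total_reclaimed += nr_reclaimed;

	/* collects retry candidates only when is_retrying == false */
	find_folios_written_back(&folio_list, &clean_list, is_retrying);

	spin_lock_irq(&lruvec->lru_lock);
	move_folios_to_lru(lruvec, &folio_list);
	if (!list_empty(&clean_list)) {
		/* reclaim the folios that missed their rotation once more */
		list_splice_init(&clean_list, &folio_list);
		is_retrying = true;
		spin_unlock_irq(&lruvec->lru_lock);
		goto retry;
	}
	spin_unlock_irq(&lruvec->lru_lock);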