From patchwork Sat Jan 11 09:15:04 2025
X-Patchwork-Submitter: Chen Ridong
X-Patchwork-Id: 13935966
From: Chen Ridong
To: akpm@linux-foundation.org, mhocko@suse.com, hannes@cmpxchg.org,
	yosryahmed@google.com, yuzhao@google.com, david@redhat.com,
	willy@infradead.org, ryan.roberts@arm.com, baohua@kernel.org,
	21cnbao@gmail.com, wangkefeng.wang@huawei.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, chenridong@huawei.com,
	wangweiyang2@huawei.com, xieym_ict@hotmail.com
Subject: [PATCH v7 mm-unstable] mm: vmscan: retry folios written back while isolated for traditional LRU
Date: Sat, 11 Jan 2025 09:15:04 +0000
Message-Id: <20250111091504.1363075-1-chenridong@huaweicloud.com>
X-Mailer: git-send-email 2.34.1

From: Chen Ridong

As commit 359a5e1416ca ("mm: multi-gen LRU: retry folios written back
while isolated") mentioned:

"The page reclaim isolates a batch of folios from the tail of one of
the LRU lists and works on those folios one by one.  For a suitable
swap-backed folio, if the swap device is async, it queues that folio
for writeback.  After the page reclaim finishes an entire batch, it
puts back the folios it queued for writeback to the head of the
original LRU list.
In the meantime, the page writeback flushes the queued folios also by
batches.  Its batching logic is independent from that of the page
reclaim.  For each of the folios it writes back, the page writeback
calls folio_rotate_reclaimable() which tries to rotate a folio to the
tail.

folio_rotate_reclaimable() only works for a folio after the page
reclaim has put it back.  If an async swap device is fast enough, the
page writeback can finish with that folio while the page reclaim is
still working on the rest of the batch containing it.  In this case,
that folio will remain at the head and the page reclaim will not retry
it before reaching there."

Commit 359a5e1416ca ("mm: multi-gen LRU: retry folios written back
while isolated") fixed this issue only for MGLRU.  However, the same
issue also exists on the traditional active/inactive LRU and was
found at [1].  It can be reproduced with the steps below:

1. Compile with CONFIG_TRANSPARENT_HUGEPAGE=y.
2. Mount memcg v1 and create a memcg named test_memcg, with
   limit_in_bytes=1G and memsw.limit_in_bytes=2G.
3. Create a 1G swap file and allocate 1.05G of anonymous memory in
   test_memcg.

The following was observed:

cat memory.limit_in_bytes
1073741824
cat memory.memsw.limit_in_bytes
2147483648
cat memory.usage_in_bytes
1073664000
cat memory.memsw.usage_in_bytes
1129840640

free -h
       total   used    free
Mem:    31Gi   1.2Gi   28Gi
Swap:  1.0Gi   1.0Gi   2.0Mi

As shown above, test_memcg itself used only about 50M of swap
(memsw.usage - usage), yet almost the entire 1G swap device was
occupied.  That means 900M+ of swap space may be wasted, because
other memcgs cannot use it.

Fix this issue in the same way as MGLRU: first extract the common
logic into a new helper, find_folios_written_back(), then reuse it in
shrink_inactive_list(), and finally retry reclaiming the folios that
may have missed folio_rotate_reclaimable() on the traditional LRU as
well.

After the change, with the same test case, only 54M of swap was used:

cat memory.usage_in_bytes
1073463296
cat memory.memsw.usage_in_bytes
1129828352

free -h
       total   used    free
Mem:    31Gi   1.2Gi   28Gi
Swap:  1.0Gi    54Mi   969Mi

[1] https://lore.kernel.org/linux-kernel/20241010081802.290893-1-chenridong@huaweicloud.com/
[2] https://lore.kernel.org/linux-kernel/CAGsJ_4zqL8ZHNRZ44o_CC69kE7DBVXvbZfvmQxMGiFqRxqHQdA@mail.gmail.com/

Signed-off-by: Chen Ridong
---
v6 -> v7:
- Fix the conflict by rebasing onto mm-unstable.
- Update the commit message (quote Yu's commit message and add the
  measured improvement after the change).
- Restore 'is_retrying' to 'skip_retry' to keep the original semantics.
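For illustration only, here is a toy user-space C model of the race and
of the one-shot retry (the skip_retry semantics).  It is not kernel code
and is not part of the diff below; every identifier in it is made up.
A "dirty" folio queued for writeback to a fast async swap device has its
I/O complete while the batch is still isolated, so it cannot be rotated
to the LRU tail; the retry pass reclaims it instead of putting it back
at the LRU head.

/* toy_retry.c - build with: gcc -std=c99 -Wall toy_retry.c */
#include <stdbool.h>
#include <stdio.h>

#define BATCH 4

struct toy_folio {
	bool dirty;		/* needs writeback before it can be reclaimed */
	bool writeback_done;	/* I/O finished while the folio was isolated */
	bool reclaimed;
};

/* One pass over the isolated batch: reclaim clean folios, "queue" dirty
 * ones for async writeback.  With a fast swap device the I/O completes
 * before the pass is over, but the folio is still isolated. */
static int reclaim_pass(struct toy_folio *batch, int n)
{
	int reclaimed = 0;

	for (int i = 0; i < n; i++) {
		struct toy_folio *f = &batch[i];

		if (f->reclaimed)
			continue;
		if (f->dirty && !f->writeback_done) {
			f->writeback_done = true;	/* fast async swap */
			continue;
		}
		f->reclaimed = true;	/* clean or already written back */
		reclaimed++;
	}
	return reclaimed;
}

int main(void)
{
	struct toy_folio batch[BATCH] = {
		{ .dirty = false }, { .dirty = true },
		{ .dirty = true },  { .dirty = false },
	};
	bool skip_retry = false;
	int missed;

retry:
	printf("pass reclaimed %d folio(s)\n", reclaim_pass(batch, BATCH));

	/* Folios whose writeback completed while they were isolated. */
	missed = 0;
	for (int i = 0; i < BATCH; i++)
		if (!batch[i].reclaimed && batch[i].writeback_done)
			missed++;

	if (missed && !skip_retry) {
		skip_retry = true;	/* retry such folios only once */
		goto retry;
	}
	return 0;
}

Without the retry, the two folios whose writeback completed while they
were isolated would stay unreclaimed after the pass, which corresponds
to such folios being put back at the head of the LRU and not being
revisited until reclaim reaches them again.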
v6: https://lore.kernel.org/linux-kernel/20241223082004.3759152-1-chenridong@huaweicloud.com/

 mm/vmscan.c | 114 ++++++++++++++++++++++++++++++++++------------------
 1 file changed, 76 insertions(+), 38 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 01dce6f26..6861b6937 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -183,6 +183,9 @@ struct scan_control {
 	struct reclaim_state reclaim_state;
 };
 
+static inline void find_folios_written_back(struct list_head *list,
+		struct list_head *clean, struct lruvec *lruvec, int type, bool is_retrying);
+
 #ifdef ARCH_HAS_PREFETCHW
 #define prefetchw_prev_lru_folio(_folio, _base, _field)			\
 	do {								\
@@ -1960,14 +1963,18 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 		enum lru_list lru)
 {
 	LIST_HEAD(folio_list);
+	LIST_HEAD(clean_list);
 	unsigned long nr_scanned;
-	unsigned int nr_reclaimed = 0;
+	unsigned int nr_reclaimed, total_reclaimed = 0;
+	unsigned int nr_pageout = 0;
+	unsigned int nr_unqueued_dirty = 0;
 	unsigned long nr_taken;
 	struct reclaim_stat stat;
 	bool file = is_file_lru(lru);
 	enum vm_event_item item;
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 	bool stalled = false;
+	bool skip_retry = false;
 
 	while (unlikely(too_many_isolated(pgdat, file, sc))) {
 		if (stalled)
@@ -2001,22 +2008,47 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 	if (nr_taken == 0)
 		return 0;
 
+retry:
 	nr_reclaimed = shrink_folio_list(&folio_list, pgdat, sc, &stat, false);
 
+	sc->nr.dirty += stat.nr_dirty;
+	sc->nr.congested += stat.nr_congested;
+	sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
+	sc->nr.writeback += stat.nr_writeback;
+	sc->nr.immediate += stat.nr_immediate;
+	total_reclaimed += nr_reclaimed;
+	nr_pageout += stat.nr_pageout;
+	nr_unqueued_dirty += stat.nr_unqueued_dirty;
+
+	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
+			nr_scanned, nr_reclaimed, &stat, sc->priority, file);
+
+	find_folios_written_back(&folio_list, &clean_list, lruvec, 0, skip_retry);
+
 	spin_lock_irq(&lruvec->lru_lock);
 	move_folios_to_lru(lruvec, &folio_list);
 
 	__mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(),
 					stat.nr_demoted);
-	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
 	item = PGSTEAL_KSWAPD + reclaimer_offset();
 	if (!cgroup_reclaim(sc))
 		__count_vm_events(item, nr_reclaimed);
 	__count_memcg_events(lruvec_memcg(lruvec), item, nr_reclaimed);
 	__count_vm_events(PGSTEAL_ANON + file, nr_reclaimed);
+
+	if (!list_empty(&clean_list)) {
+		list_splice_init(&clean_list, &folio_list);
+		skip_retry = true;
+		spin_unlock_irq(&lruvec->lru_lock);
+		goto retry;
+	}
+	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
 	spin_unlock_irq(&lruvec->lru_lock);
+	sc->nr.taken += nr_taken;
+	if (file)
+		sc->nr.file_taken += nr_taken;
 
-	lru_note_cost(lruvec, file, stat.nr_pageout, nr_scanned - nr_reclaimed);
+	lru_note_cost(lruvec, file, nr_pageout, nr_scanned - total_reclaimed);
 
 	/*
 	 * If dirty folios are scanned that are not queued for IO, it
@@ -2029,7 +2061,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 	 * the flushers simply cannot keep up with the allocation
 	 * rate. Nudge the flusher threads in case they are asleep.
 	 */
-	if (stat.nr_unqueued_dirty == nr_taken) {
+	if (nr_unqueued_dirty == nr_taken) {
 		wakeup_flusher_threads(WB_REASON_VMSCAN);
 		/*
 		 * For cgroupv1 dirty throttling is achieved by waking up
@@ -2044,18 +2076,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 			reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
 	}
 
-	sc->nr.dirty += stat.nr_dirty;
-	sc->nr.congested += stat.nr_congested;
-	sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
-	sc->nr.writeback += stat.nr_writeback;
-	sc->nr.immediate += stat.nr_immediate;
-	sc->nr.taken += nr_taken;
-	if (file)
-		sc->nr.file_taken += nr_taken;
-
-	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
-			nr_scanned, nr_reclaimed, &stat, sc->priority, file);
-	return nr_reclaimed;
+	return total_reclaimed;
 }
 
 /*
@@ -4637,8 +4658,6 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 	int reclaimed;
 	LIST_HEAD(list);
 	LIST_HEAD(clean);
-	struct folio *folio;
-	struct folio *next;
 	enum vm_event_item item;
 	struct reclaim_stat stat;
 	struct lru_gen_mm_walk *walk;
@@ -4668,26 +4687,7 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 			scanned, reclaimed, &stat, sc->priority,
 			type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
 
-	list_for_each_entry_safe_reverse(folio, next, &list, lru) {
-		DEFINE_MIN_SEQ(lruvec);
-
-		if (!folio_evictable(folio)) {
-			list_del(&folio->lru);
-			folio_putback_lru(folio);
-			continue;
-		}
-
-		/* retry folios that may have missed folio_rotate_reclaimable() */
-		if (!skip_retry && !folio_test_active(folio) && !folio_mapped(folio) &&
-		    !folio_test_dirty(folio) && !folio_test_writeback(folio)) {
-			list_move(&folio->lru, &clean);
-			continue;
-		}
-
-		/* don't add rejected folios to the oldest generation */
-		if (lru_gen_folio_seq(lruvec, folio, false) == min_seq[type])
-			set_mask_bits(&folio->flags, LRU_REFS_FLAGS, BIT(PG_active));
-	}
+	find_folios_written_back(&list, &clean, lruvec, type, skip_retry);
 
 	spin_lock_irq(&lruvec->lru_lock);
 
@@ -5706,6 +5706,44 @@ static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *
 
 #endif /* CONFIG_LRU_GEN */
 
+/**
+ * find_folios_written_back - Find and move the written back folios to a new list.
+ * @list: the folios list
+ * @clean: the written back folios list
+ * @lruvec: the lruvec
+ * @type: LRU_GEN_ANON/LRU_GEN_FILE, only for multi-gen LRU
+ * @skip_retry: whether to skip the retry.
+ */
+static inline void find_folios_written_back(struct list_head *list,
+		struct list_head *clean, struct lruvec *lruvec, int type, bool skip_retry)
+{
+	struct folio *folio;
+	struct folio *next;
+
+	list_for_each_entry_safe_reverse(folio, next, list, lru) {
+#ifdef CONFIG_LRU_GEN
+		DEFINE_MIN_SEQ(lruvec);
+#endif
+		if (!folio_evictable(folio)) {
+			list_del(&folio->lru);
+			folio_putback_lru(folio);
+			continue;
+		}
+
+		/* retry folios that may have missed folio_rotate_reclaimable() */
+		if (!skip_retry && !folio_test_active(folio) && !folio_mapped(folio) &&
+		    !folio_test_dirty(folio) && !folio_test_writeback(folio)) {
+			list_move(&folio->lru, clean);
+			continue;
+		}
+#ifdef CONFIG_LRU_GEN
+		/* don't add rejected folios to the oldest generation */
+		if (lru_gen_enabled() && lru_gen_folio_seq(lruvec, folio, false) == min_seq[type])
+			set_mask_bits(&folio->flags, LRU_REFS_FLAGS, BIT(PG_active));
+#endif
+	}
+}
+
 static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 {
 	unsigned long nr[NR_LRU_LISTS];