From patchwork Wed Oct 16 03:30:30 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com>
X-Patchwork-Id: 13837675
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: david@redhat.com, fengbaopeng@honor.com, gaoxu2@honor.com,
 hailong.liu@oppo.com, kaleshsingh@google.com,
 linux-kernel@vger.kernel.org, lokeshgidra@google.com, mhocko@suse.com,
 minchan@kernel.org, ngeoffray@google.com, shli@fb.com, surenb@google.com,
 v-songbaohua@oppo.com, yipengxiang@honor.com, yuzhao@google.com, Gao Xu
Subject: [PATCH v2] mm: mglru: provide a separate list for lazyfree anon folios
Date: Wed, 16 Oct 2024 16:30:30 +1300
Message-Id: <20241016033030.36990-1-21cnbao@gmail.com>
X-Mailer: git-send-email 2.39.3 (Apple Git-146)

From: Barry Song

This builds on the discussion about Gaoxu's work[1].

Significant file refaults may occur when userspace makes extensive use of
MADV_FREE on anonymous folios, as these folios are not positioned in an
easily reclaimable area within the LRU.

According to Lokesh, MADV_FREE'd anon folios are expected to be released
earlier than file folios. One option, as implemented by Gao Xu, is to place
lazyfree anon folios at the tail of the file's `min_seq` generation[1].
However, this approach results in lazyfree folios being released in a LIFO
manner, which conflicts with LRU behavior, as noted by Michal.

To address this, this patch proposes maintaining a separate list for
lazyfree anon folios while keeping them classified under the "file" LRU
type to minimize code changes. These lazyfree anon folios will still be
counted as file folios and share the same generation with regular files.
In the eviction path, the lazyfree list will be prioritized for scanning
before the actual file LRU list.

Thanks to Gao Xu for the test results, which look quite promising:

Base version: Android V (with Android ART enabled to use MADV_FREE)
Test cases: 60 apps repeatedly restarted, tested for 8 hours
The test results are as follows:

          workingset_refault_anon  workingset_refault_file
base      42016805                 92010542
patch     19834873                 49383572
% diff    -52.79%                  -46.33%

A comparative test was also performed on approach [1], with the following
results:

               workingset_refault_anon  workingset_refault_file
lazyfree-tail  20313395                 52203061
patch          19834873                 49383572
% diff         -2.36%                   -5.40%

[1] https://lore.kernel.org/linux-mm/f29f64e29c08427b95e3df30a5770056@honor.com/

Tested-by: Gao Xu
Signed-off-by: Barry Song
---
 -v2: collect Gao Xu's test results and tested-by tag, thanks!

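 For readers who want to see the list selection in isolation, it can be
 modelled in userspace roughly as below. This is only a sketch under stated
 assumptions, not kernel code: struct model_folio, list_index() and
 scan_order() are hypothetical names made up for illustration, and the two
 bool fields merely stand in for folio_is_file_lru() and folio_test_anon().
 It shows that a lazyfree anon folio lands on list index type + 1 and that
 the eviction loop visits that list before the regular file list.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the per-generation list layout. */
enum { MODEL_LIST_ANON = 0, MODEL_LIST_FILE = 1, MODEL_LIST_LAZYFREE = 2 };

struct model_folio {
	bool file_lru;	/* stands in for folio_is_file_lru() */
	bool anon;	/* stands in for folio_test_anon() */
};

/* Same arithmetic as the patch: index = type + lazyfree. */
static inline int list_index(const struct model_folio *f)
{
	int type = f->file_lru;
	int lazyfree = type ? f->anon : 0;

	return type + lazyfree;
}

/*
 * Fill order[] with the list indices in the order the patched scan loop
 * walks them for a given type, and return how many lists are visited.
 */
static inline int scan_order(int type, int order[2])
{
	int list_num = type ? 2 : 1;
	int n = 0;

	assert(type == 0 || type == 1);

	for (int j = list_num - 1; j >= 0; j--)
		order[n++] = type + j;

	return n;
}
```

 With this model, an anon folio that is still swap-backed maps to index 0,
 a page cache folio to index 1 and a lazyfree anon folio to index 2, while
 scanning type 1 visits index 2 and then index 1.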
 include/linux/mm_inline.h |  5 +-
 include/linux/mmzone.h    |  2 +-
 mm/vmscan.c               | 97 +++++++++++++++++++++++----------------
 3 files changed, 61 insertions(+), 43 deletions(-)

diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index 87580e8363ef..615fe80d73d0 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -226,6 +226,7 @@ static inline bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio,
 	int gen = folio_lru_gen(folio);
 	int type = folio_is_file_lru(folio);
 	int zone = folio_zonenum(folio);
+	int lazyfree = type ? folio_test_anon(folio) : 0;
 	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 
 	VM_WARN_ON_ONCE_FOLIO(gen != -1, folio);
@@ -265,9 +266,9 @@ static inline bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio,
 	lru_gen_update_size(lruvec, folio, -1, gen);
 	/* for folio_rotate_reclaimable() */
 	if (reclaiming)
-		list_add_tail(&folio->lru, &lrugen->folios[gen][type][zone]);
+		list_add_tail(&folio->lru, &lrugen->folios[gen][type + lazyfree][zone]);
 	else
-		list_add(&folio->lru, &lrugen->folios[gen][type][zone]);
+		list_add(&folio->lru, &lrugen->folios[gen][type + lazyfree][zone]);
 
 	return true;
 }
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 96dea31fb211..5cb86ea324be 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -436,7 +436,7 @@ struct lru_gen_folio {
 	/* the birth time of each generation in jiffies */
 	unsigned long timestamps[MAX_NR_GENS];
 	/* the multi-gen LRU lists, lazily sorted on eviction */
-	struct list_head folios[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES];
+	struct list_head folios[MAX_NR_GENS][ANON_AND_FILE + 1][MAX_NR_ZONES];
 	/* the multi-gen LRU sizes, eventually consistent */
 	long nr_pages[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES];
 	/* the exponential moving average of refaulted */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index fd3908d43b07..e2f13a9b50da 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3736,21 +3736,25 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
 
 	/* prevent cold/hot inversion if force_scan is true */
 	for (zone = 0; zone < MAX_NR_ZONES; zone++) {
-		struct list_head *head = &lrugen->folios[old_gen][type][zone];
+		int list_num = type ? 2 : 1;
+		struct list_head *head;
 
-		while (!list_empty(head)) {
-			struct folio *folio = lru_to_folio(head);
+		for (int i = list_num - 1; i >= 0; i--) {
+			head = &lrugen->folios[old_gen][type + i][zone];
+			while (!list_empty(head)) {
+				struct folio *folio = lru_to_folio(head);
 
-			VM_WARN_ON_ONCE_FOLIO(folio_test_unevictable(folio), folio);
-			VM_WARN_ON_ONCE_FOLIO(folio_test_active(folio), folio);
-			VM_WARN_ON_ONCE_FOLIO(folio_is_file_lru(folio) != type, folio);
-			VM_WARN_ON_ONCE_FOLIO(folio_zonenum(folio) != zone, folio);
+				VM_WARN_ON_ONCE_FOLIO(folio_test_unevictable(folio), folio);
+				VM_WARN_ON_ONCE_FOLIO(folio_test_active(folio), folio);
+				VM_WARN_ON_ONCE_FOLIO(folio_is_file_lru(folio) != type, folio);
+				VM_WARN_ON_ONCE_FOLIO(folio_zonenum(folio) != zone, folio);
 
-			new_gen = folio_inc_gen(lruvec, folio, false);
-			list_move_tail(&folio->lru, &lrugen->folios[new_gen][type][zone]);
+				new_gen = folio_inc_gen(lruvec, folio, false);
+				list_move_tail(&folio->lru, &lrugen->folios[new_gen][type + i][zone]);
 
-			if (!--remaining)
-				return false;
+				if (!--remaining)
+					return false;
+			}
 		}
 	}
 done:
@@ -4302,6 +4306,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
 	int refs = folio_lru_refs(folio);
 	int tier = lru_tier_from_refs(refs);
 	struct lru_gen_folio *lrugen = &lruvec->lrugen;
+	int lazyfree = type ? folio_test_anon(folio) : 0;
 
 	VM_WARN_ON_ONCE_FOLIO(gen >= MAX_NR_GENS, folio);
 
@@ -4317,7 +4322,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
 
 	/* promoted */
 	if (gen != lru_gen_from_seq(lrugen->min_seq[type])) {
-		list_move(&folio->lru, &lrugen->folios[gen][type][zone]);
+		list_move(&folio->lru, &lrugen->folios[gen][type + lazyfree][zone]);
 		return true;
 	}
 
@@ -4326,7 +4331,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
 		int hist = lru_hist_from_seq(lrugen->min_seq[type]);
 
 		gen = folio_inc_gen(lruvec, folio, false);
-		list_move_tail(&folio->lru, &lrugen->folios[gen][type][zone]);
+		list_move_tail(&folio->lru, &lrugen->folios[gen][type + lazyfree][zone]);
 
 		WRITE_ONCE(lrugen->protected[hist][type][tier - 1],
 			   lrugen->protected[hist][type][tier - 1] + delta);
@@ -4336,7 +4341,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
 	/* ineligible */
 	if (!folio_test_lru(folio) || zone > sc->reclaim_idx) {
 		gen = folio_inc_gen(lruvec, folio, false);
-		list_move_tail(&folio->lru, &lrugen->folios[gen][type][zone]);
+		list_move_tail(&folio->lru, &lrugen->folios[gen][type + lazyfree][zone]);
 		return true;
 	}
 
@@ -4344,7 +4349,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
 	if (folio_test_locked(folio) || folio_test_writeback(folio) ||
 	    (type == LRU_GEN_FILE && folio_test_dirty(folio))) {
 		gen = folio_inc_gen(lruvec, folio, true);
-		list_move(&folio->lru, &lrugen->folios[gen][type][zone]);
+		list_move(&folio->lru, &lrugen->folios[gen][type + lazyfree][zone]);
 		return true;
 	}
 
@@ -4388,7 +4393,7 @@ static bool isolate_folio(struct lruvec *lruvec, struct folio *folio, struct sca
 static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 		       int type, int tier, struct list_head *list)
 {
-	int i;
+	int i, j;
 	int gen;
 	enum vm_event_item item;
 	int sorted = 0;
@@ -4410,33 +4415,38 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 		LIST_HEAD(moved);
 		int skipped_zone = 0;
 		int zone = (sc->reclaim_idx + i) % MAX_NR_ZONES;
-		struct list_head *head = &lrugen->folios[gen][type][zone];
-
-		while (!list_empty(head)) {
-			struct folio *folio = lru_to_folio(head);
-			int delta = folio_nr_pages(folio);
-
-			VM_WARN_ON_ONCE_FOLIO(folio_test_unevictable(folio), folio);
-			VM_WARN_ON_ONCE_FOLIO(folio_test_active(folio), folio);
-			VM_WARN_ON_ONCE_FOLIO(folio_is_file_lru(folio) != type, folio);
-			VM_WARN_ON_ONCE_FOLIO(folio_zonenum(folio) != zone, folio);
-
-			scanned += delta;
+		int list_num = type ? 2 : 1;
+		struct list_head *head;
+
+		for (j = list_num - 1; j >= 0; j--) {
+			head = &lrugen->folios[gen][type + j][zone];
+			while (!list_empty(head)) {
+				struct folio *folio = lru_to_folio(head);
+				int delta = folio_nr_pages(folio);
+
+				VM_WARN_ON_ONCE_FOLIO(folio_test_unevictable(folio), folio);
+				VM_WARN_ON_ONCE_FOLIO(folio_test_active(folio), folio);
+				VM_WARN_ON_ONCE_FOLIO(folio_is_file_lru(folio) != type, folio);
+				VM_WARN_ON_ONCE_FOLIO(folio_zonenum(folio) != zone, folio);
+
+				scanned += delta;
+
+				if (sort_folio(lruvec, folio, sc, tier))
+					sorted += delta;
+				else if (isolate_folio(lruvec, folio, sc)) {
+					list_add(&folio->lru, list);
+					isolated += delta;
+				} else {
+					list_move(&folio->lru, &moved);
+					skipped_zone += delta;
+				}
 
-			if (sort_folio(lruvec, folio, sc, tier))
-				sorted += delta;
-			else if (isolate_folio(lruvec, folio, sc)) {
-				list_add(&folio->lru, list);
-				isolated += delta;
-			} else {
-				list_move(&folio->lru, &moved);
-				skipped_zone += delta;
+				if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH)
+					goto isolate_done;
 			}
-
-			if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH)
-				break;
 		}
 
+isolate_done:
 		if (skipped_zone) {
 			list_splice(&moved, head);
 			__count_zid_vm_events(PGSCAN_SKIP, zone, skipped_zone);
@@ -5588,8 +5598,15 @@ void lru_gen_init_lruvec(struct lruvec *lruvec)
 	for (i = 0; i <= MIN_NR_GENS + 1; i++)
 		lrugen->timestamps[i] = jiffies;
 
-	for_each_gen_type_zone(gen, type, zone)
+	for_each_gen_type_zone(gen, type, zone) {
 		INIT_LIST_HEAD(&lrugen->folios[gen][type][zone]);
+		/*
+		 * lazyfree anon folios have a separate list while using
+		 * file as type
+		 */
+		if (type)
+			INIT_LIST_HEAD(&lrugen->folios[gen][type + 1][zone]);
+	}
 
 	if (mm_state)
 		mm_state->seq = MIN_NR_GENS;