From patchwork Fri Dec 8 06:14:04 2023
X-Patchwork-Submitter: Yu Zhao
X-Patchwork-Id: 13484883
Date: Thu, 7 Dec 2023 23:14:04 -0700
Message-ID: <20231208061407.2125867-1-yuzhao@google.com>
Subject: [PATCH mm-unstable v1 1/4] mm/mglru: fix underprotected page cache
From: Yu Zhao
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Yu Zhao,
 Charan Teja Kalla, Kalesh Singh, stable@vger.kernel.org

Unmapped folios accessed through file descriptors can be underprotected.
Those folios are added to the oldest generation based on:
1. The fact that they are less costly to reclaim (no need to walk the
   rmap and flush the TLB) and have less impact on performance (don't
   cause major PFs and can be non-blocking if needed again).
2. The observation that they are likely to be single-use. E.g., for
   client use cases like Android, its apps parse configuration files
   and store the data in heap (anon); for server use cases like MySQL,
   it reads from InnoDB files and holds the cached data for tables in
   buffer pools (anon).

However, the oldest generation can be very short-lived, and if so, it
doesn't provide the PID controller with enough time to respond to a
surge of refaults. (Note that the PID controller uses weighted
refaults, and those from evicted generations carry only half of the
full weight.) In other words, for a short-lived generation, the
moving average smooths out the spike quickly.

To fix the problem:
1. For folios that are already on LRU, if they can be beyond the
   tracking range of tiers, i.e., five accesses through file
   descriptors, move them to the second oldest generation to give them
   more time to age. (Note that tiers are used by the PID controller
   to statistically determine whether folios accessed multiple times
   through file descriptors are worth protecting.)
2. When adding unmapped folios to LRU, adjust their placement so that
   they are not too close to the tail. The effect of this is similar
   to the above.
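
To illustrate point 1 of the fix, here is a minimal userspace sketch of
how "five accesses through file descriptors" maps to a saturated
reference counter. It is not kernel code. The model_* helpers are
hypothetical names, and the value LRU_REFS_WIDTH = 2 as well as the
"accesses beyond the first" counting and the order_base_2() tier
mapping are assumptions about the kernel at the time of this patch.
Only the two protection conditions are taken from the sort_folio() hunk.

```c
#include <assert.h>
#include <stdbool.h>

#define BIT(n)		(1UL << (n))
#define LRU_REFS_WIDTH	2	/* assumption: width of the refs counter */

/* Hypothetical model: accesses beyond the first, saturating at BIT(LRU_REFS_WIDTH). */
static inline unsigned long model_refs(unsigned int fd_accesses)
{
	unsigned long refs = fd_accesses > 1 ? fd_accesses - 1 : 0;

	return refs > BIT(LRU_REFS_WIDTH) ? BIT(LRU_REFS_WIDTH) : refs;
}

/* Assumed tier mapping: smallest tier such that 2^tier >= refs + 1. */
static inline int model_tier(unsigned long refs)
{
	int tier = 0;

	while (BIT(tier) < refs + 1)
		tier++;
	return tier;
}

/* Condition before the patch: protect only if the tier is above tier_idx. */
static inline bool model_protected_old(unsigned int fd_accesses, int tier_idx)
{
	return model_tier(model_refs(fd_accesses)) > tier_idx;
}

/* Condition after the patch: also protect once the counter has saturated. */
static inline bool model_protected_new(unsigned int fd_accesses, int tier_idx)
{
	unsigned long refs = model_refs(fd_accesses);

	return model_tier(refs) > tier_idx || refs == BIT(LRU_REFS_WIDTH);
}

/* Usage example: a folio accessed five times while the highest tier is being evicted. */
static inline void model_refs_self_check(void)
{
	assert(model_refs(5) == BIT(LRU_REFS_WIDTH));
	assert(!model_protected_old(5, 3));
	assert(model_protected_new(5, 3));
}
```

Under these assumptions, a folio whose counter has saturated can no
longer be distinguished from hotter ones, which is why the new
condition moves it to the next generation regardless of tier_idx.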

On Android, launching 55 apps sequentially:
                           Before     After      Change
  workingset_refault_anon  25641024   25598972   0%
  workingset_refault_file  115016834  106178438  -8%

Fixes: ac35a4902374 ("mm: multi-gen LRU: minimal implementation")
Signed-off-by: Yu Zhao
Reported-by: Charan Teja Kalla
Tested-by: Kalesh Singh
Cc: stable@vger.kernel.org
---
 include/linux/mm_inline.h | 23 ++++++++++++++---------
 mm/vmscan.c               |  2 +-
 mm/workingset.c           |  6 +++---
 3 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index 9ae7def16cb2..f4fe593c1400 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -232,22 +232,27 @@ static inline bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio,
 	if (folio_test_unevictable(folio) || !lrugen->enabled)
 		return false;
 	/*
-	 * There are three common cases for this page:
-	 * 1. If it's hot, e.g., freshly faulted in or previously hot and
-	 *    migrated, add it to the youngest generation.
-	 * 2. If it's cold but can't be evicted immediately, i.e., an anon page
-	 *    not in swapcache or a dirty page pending writeback, add it to the
-	 *    second oldest generation.
-	 * 3. Everything else (clean, cold) is added to the oldest generation.
+	 * There are four common cases for this page:
+	 * 1. If it's hot, i.e., freshly faulted in, add it to the youngest
+	 *    generation, and it's protected over the rest below.
+	 * 2. If it can't be evicted immediately, i.e., a dirty page pending
+	 *    writeback, add it to the second youngest generation.
+	 * 3. If it should be evicted first, e.g., cold and clean from
+	 *    folio_rotate_reclaimable(), add it to the oldest generation.
+	 * 4. Everything else falls between 2 & 3 above and is added to the
+	 *    second oldest generation if it's considered inactive, or the
+	 *    oldest generation otherwise. See lru_gen_is_active().
 	 */
 	if (folio_test_active(folio))
 		seq = lrugen->max_seq;
 	else if ((type == LRU_GEN_ANON && !folio_test_swapcache(folio)) ||
 		 (folio_test_reclaim(folio) &&
 		  (folio_test_dirty(folio) || folio_test_writeback(folio))))
-		seq = lrugen->min_seq[type] + 1;
-	else
+		seq = lrugen->max_seq - 1;
+	else if (reclaiming || lrugen->min_seq[type] + MIN_NR_GENS >= lrugen->max_seq)
 		seq = lrugen->min_seq[type];
+	else
+		seq = lrugen->min_seq[type] + 1;
 
 	gen = lru_gen_from_seq(seq);
 	flags = (gen + 1UL) << LRU_GEN_PGOFF;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4e3b835c6b4a..e67631c60ac0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4260,7 +4260,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
 	}
 
 	/* protected */
-	if (tier > tier_idx) {
+	if (tier > tier_idx || refs == BIT(LRU_REFS_WIDTH)) {
 		int hist = lru_hist_from_seq(lrugen->min_seq[type]);
 
 		gen = folio_inc_gen(lruvec, folio, false);
diff --git a/mm/workingset.c b/mm/workingset.c
index 7d3dacab8451..2a2a34234df9 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -313,10 +313,10 @@ static void lru_gen_refault(struct folio *folio, void *shadow)
 	 * 1. For pages accessed through page tables, hotter pages pushed out
 	 *    hot pages which refaulted immediately.
 	 * 2. For pages accessed multiple times through file descriptors,
-	 *    numbers of accesses might have been out of the range.
+	 *    they would have been protected by sort_folio().
 	 */
-	if (lru_gen_in_fault() || refs == BIT(LRU_REFS_WIDTH)) {
-		folio_set_workingset(folio);
+	if (lru_gen_in_fault() || refs >= BIT(LRU_REFS_WIDTH) - 1) {
+		set_mask_bits(&folio->flags, 0, LRU_REFS_MASK | BIT(PG_workingset));
 		mod_lruvec_state(lruvec, WORKINGSET_RESTORE_BASE + type, delta);
 	}
 unlock:
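
The placement logic introduced in lru_gen_add_folio() can be modeled
outside the kernel to see which generation a folio lands in. The
following is a sketch only: struct folio_model and model_place() are
hypothetical stand-ins for the folio flag tests, and MIN_NR_GENS = 2 and
the invariant on min_seq and max_seq are assumptions about the kernel at
the time of this patch. The branch order follows the mm_inline.h hunk.

```c
#include <assert.h>
#include <stdbool.h>

#define MIN_NR_GENS	2UL	/* assumption: kernel value at the time of the patch */

/* Hypothetical stand-in for the folio state tested by lru_gen_add_folio(). */
struct folio_model {
	bool active;	/* folio_test_active() */
	bool anon;	/* type == LRU_GEN_ANON */
	bool swapcache;	/* folio_test_swapcache() */
	bool reclaim;	/* folio_test_reclaim() */
	bool dirty;	/* folio_test_dirty() */
	bool writeback;	/* folio_test_writeback() */
};

/* Return the sequence number of the generation the folio would be added to. */
static inline unsigned long model_place(const struct folio_model *f, bool reclaiming,
					 unsigned long min_seq, unsigned long max_seq)
{
	/* Assumed invariant: at least MIN_NR_GENS generations exist. */
	assert(max_seq >= min_seq + MIN_NR_GENS - 1);

	if (f->active)
		return max_seq;			/* youngest */
	if ((f->anon && !f->swapcache) ||
	    (f->reclaim && (f->dirty || f->writeback)))
		return max_seq - 1;		/* second youngest */
	if (reclaiming || min_seq + MIN_NR_GENS >= max_seq)
		return min_seq;			/* oldest */
	return min_seq + 1;			/* second oldest */
}

/* Usage example: a clean, unmapped file folio with four generations (seq 0..3). */
static inline void model_place_self_check(void)
{
	struct folio_model clean_file = { 0 };

	assert(model_place(&clean_file, false, 0, 3) == 1);	/* second oldest */
	assert(model_place(&clean_file, true, 0, 3) == 0);	/* oldest while reclaiming */
}
```

In this model, a clean unmapped file folio no longer goes straight to
the oldest generation unless reclaim is in progress or too few
generations exist, which is the extra aging time the commit message
describes.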