From patchwork Fri Dec 8 06:14:04 2023
Date: Thu, 7 Dec 2023 23:14:04 -0700
Message-ID: <20231208061407.2125867-1-yuzhao@google.com>
Subject: [PATCH mm-unstable v1 1/4] mm/mglru: fix underprotected page cache
From: Yu Zhao
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Yu Zhao, Charan Teja Kalla, Kalesh Singh, stable@vger.kernel.org
Unmapped folios accessed through file descriptors can be underprotected.
Those folios are added to the oldest generation based on:

1. The fact that they are less costly to reclaim (no need to walk the
   rmap and flush the TLB) and have less impact on performance (don't
   cause major PFs and can be non-blocking if needed again).
2. The observation that they are likely to be single-use. E.g., for
   client use cases like Android, apps parse configuration files and
   store the data in heap (anon); for server use cases like MySQL, it
   reads from InnoDB files and holds the cached data for tables in
   buffer pools (anon).

However, the oldest generation can be very short-lived, and if so, it
doesn't give the PID controller enough time to respond to a surge of
refaults. (Note that the PID controller uses weighted refaults, and
those from evicted generations only take half of the whole weight.) In
other words, for a short-lived generation, the moving average smooths
out the spike quickly.

To fix the problem:

1. For folios that are already on the LRU, if they are beyond the
   tracking range of tiers, i.e., five accesses through file
   descriptors, move them to the second oldest generation to give them
   more time to age. (Note that tiers are used by the PID controller
   to statistically determine whether folios accessed multiple times
   through file descriptors are worth protecting.)
2. When adding unmapped folios to the LRU, adjust their placement so
   that they are not too close to the tail. The effect of this is
   similar to the above (modeled in the sketch below).

On Android, launching 55 apps sequentially:

                           Before      After       Change
  workingset_refault_anon  25641024    25598972    0%
  workingset_refault_file  115016834   106178438   -8%

Fixes: ac35a4902374 ("mm: multi-gen LRU: minimal implementation")
Signed-off-by: Yu Zhao
Reported-by: Charan Teja Kalla
Tested-by: Kalesh Singh
Cc: stable@vger.kernel.org
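To make the new placement concrete, here is a stand-alone, user-space
model of the decision implemented in the mm_inline.h hunk further down.
pick_seq() and its boolean flags are hypothetical names summarizing the
folio tests in lru_gen_add_folio(); this is an illustrative sketch, not
kernel code:

#include <assert.h>

#define MIN_NR_GENS 2	/* stand-in for the kernel's minimum generation count */

/* model of the generation choice in lru_gen_add_folio() below */
static unsigned long pick_seq(unsigned long max_seq, unsigned long min_seq,
			      int active, int blocked, int reclaiming)
{
	if (active)		/* case 1: hot, youngest generation */
		return max_seq;
	if (blocked)		/* case 2: can't evict now, second youngest */
		return max_seq - 1;
	/* cases 3 & 4: evict first, or keep one generation of headroom */
	if (reclaiming || min_seq + MIN_NR_GENS >= max_seq)
		return min_seq;
	return min_seq + 1;
}

int main(void)
{
	/* a cold unmapped folio with a deep window now avoids the tail */
	assert(pick_seq(8, 4, 0, 0, 0) == 5);
	/* under reclaim, or with a shallow window, it still goes to the tail */
	assert(pick_seq(8, 4, 0, 0, 1) == 4);
	assert(pick_seq(5, 4, 0, 0, 0) == 4);
	return 0;
}

With reclaiming false and a deep enough window, an otherwise-cold
unmapped folio lands one generation away from the tail, which is the
extra aging time this fix is after.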
---
 include/linux/mm_inline.h | 23 ++++++++++++++---------
 mm/vmscan.c               |  2 +-
 mm/workingset.c           |  6 +++---
 3 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index 9ae7def16cb2..f4fe593c1400 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -232,22 +232,27 @@ static inline bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio,
 	if (folio_test_unevictable(folio) || !lrugen->enabled)
 		return false;
 	/*
-	 * There are three common cases for this page:
-	 * 1. If it's hot, e.g., freshly faulted in or previously hot and
-	 *    migrated, add it to the youngest generation.
-	 * 2. If it's cold but can't be evicted immediately, i.e., an anon page
-	 *    not in swapcache or a dirty page pending writeback, add it to the
-	 *    second oldest generation.
-	 * 3. Everything else (clean, cold) is added to the oldest generation.
+	 * There are four common cases for this page:
+	 * 1. If it's hot, i.e., freshly faulted in, add it to the youngest
+	 *    generation, and it's protected over the rest below.
+	 * 2. If it can't be evicted immediately, i.e., a dirty page pending
+	 *    writeback, add it to the second youngest generation.
+	 * 3. If it should be evicted first, e.g., cold and clean from
+	 *    folio_rotate_reclaimable(), add it to the oldest generation.
+	 * 4. Everything else falls between 2 & 3 above and is added to the
+	 *    second oldest generation if it's considered inactive, or the
+	 *    oldest generation otherwise. See lru_gen_is_active().
 	 */
 	if (folio_test_active(folio))
 		seq = lrugen->max_seq;
 	else if ((type == LRU_GEN_ANON && !folio_test_swapcache(folio)) ||
 		 (folio_test_reclaim(folio) &&
 		  (folio_test_dirty(folio) || folio_test_writeback(folio))))
-		seq = lrugen->min_seq[type] + 1;
-	else
+		seq = lrugen->max_seq - 1;
+	else if (reclaiming || lrugen->min_seq[type] + MIN_NR_GENS >= lrugen->max_seq)
 		seq = lrugen->min_seq[type];
+	else
+		seq = lrugen->min_seq[type] + 1;
 
 	gen = lru_gen_from_seq(seq);
 	flags = (gen + 1UL) << LRU_GEN_PGOFF;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4e3b835c6b4a..e67631c60ac0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4260,7 +4260,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
 	}
 
 	/* protected */
-	if (tier > tier_idx) {
+	if (tier > tier_idx || refs == BIT(LRU_REFS_WIDTH)) {
 		int hist = lru_hist_from_seq(lrugen->min_seq[type]);
 
 		gen = folio_inc_gen(lruvec, folio, false);
diff --git a/mm/workingset.c b/mm/workingset.c
index 7d3dacab8451..2a2a34234df9 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -313,10 +313,10 @@ static void lru_gen_refault(struct folio *folio, void *shadow)
 	 * 1. For pages accessed through page tables, hotter pages pushed out
 	 *    hot pages which refaulted immediately.
 	 * 2. For pages accessed multiple times through file descriptors,
-	 *    numbers of accesses might have been out of the range.
+	 *    they would have been protected by sort_folio().
 	 */
-	if (lru_gen_in_fault() || refs == BIT(LRU_REFS_WIDTH)) {
-		folio_set_workingset(folio);
+	if (lru_gen_in_fault() || refs >= BIT(LRU_REFS_WIDTH) - 1) {
+		set_mask_bits(&folio->flags, 0, LRU_REFS_MASK | BIT(PG_workingset));
 		mod_lruvec_state(lruvec, WORKINGSET_RESTORE_BASE + type, delta);
 	}
 unlock:

From patchwork Fri Dec 8 06:14:05 2023
Date: Thu, 7 Dec 2023 23:14:05 -0700
In-Reply-To: <20231208061407.2125867-1-yuzhao@google.com>
References: <20231208061407.2125867-1-yuzhao@google.com>
Message-ID: <20231208061407.2125867-2-yuzhao@google.com>
Subject: [PATCH mm-unstable v1 2/4] mm/mglru: try to stop at high watermarks
From: Yu Zhao
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Yu Zhao, Charan Teja Kalla, Jaroslav Pulchart, Kalesh Singh, stable@vger.kernel.org
The initial MGLRU patchset didn't include the memcg LRU support. It
relied on should_abort_scan(), added by commit f76c83378851 ("mm:
multi-gen LRU: optimize multiple memcgs"), to "backoff to avoid
overshooting their aggregate reclaim target by too much". Later on,
when the memcg LRU was added, should_abort_scan() was deemed
unnecessary, and the test results [1] showed no side effects after it
was removed by commit a579086c99ed ("mm: multi-gen LRU: remove
eviction fairness safeguard").

However, that test used memory.reclaim, which sets nr_to_reclaim to
SWAP_CLUSTER_MAX. So it can overshoot only by SWAP_CLUSTER_MAX-1
pages, i.e., from nr_reclaimed=nr_to_reclaim-1 to
nr_reclaimed=nr_to_reclaim+SWAP_CLUSTER_MAX-1. Compared with the batch
size kswapd sets to nr_to_reclaim, SWAP_CLUSTER_MAX is tiny. Therefore
that test wasn't able to reproduce the worst-case scenario, i.e.,
kswapd overshooting by GBs on large systems and "consuming 100% CPU"
(see the Closes tag).

Bring back a simplified version of should_abort_scan() on top of the
memcg LRU, so that kswapd stops when all eligible zones are above
their respective high watermarks plus a small delta, to lower the
chance of KSWAPD_HIGH_WMARK_HIT_QUICKLY. Note that this only applies
to order-0 reclaim, meaning compaction-induced reclaim can still run
wild (which is a different problem).

On Android, launching 55 apps sequentially:

           Before      After       Change
  pgpgin   838377172   802955040   -4%
  pgpgout  38037080    34336300    -10%

[1] https://lore.kernel.org/20221222041905.2431096-1-yuzhao@google.com/

Fixes: a579086c99ed ("mm: multi-gen LRU: remove eviction fairness safeguard")
Signed-off-by: Yu Zhao
Reported-by: Charan Teja Kalla
Reported-by: Jaroslav Pulchart
Closes: https://lore.kernel.org/CAK8fFZ4DY+GtBA40Pm7Nn5xCHy+51w3sfxPqkqpqakSXYyX+Wg@mail.gmail.com/
Tested-by: Jaroslav Pulchart
Tested-by: Kalesh Singh
Cc: stable@vger.kernel.org
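For a sense of scale: SWAP_CLUSTER_MAX is 32 pages, so the
memory.reclaim test could overshoot by at most 31 pages (about 124 KiB
with 4 KiB pages), while kswapd's batch on a large node can be GBs.
Below is a minimal, user-space sketch of the stop condition described
above; struct zone_info and all_zones_safe() are made-up stand-ins for
struct zone and the kernel's watermark helpers, and the numbers in
main() are illustrative assumptions:

#include <stdio.h>

/* stand-ins for struct zone and the kernel's watermark helpers */
struct zone_info {
	int managed;			/* zone has managed pages */
	unsigned long free_pages;
	unsigned long high_wmark;	/* high (or promo) watermark */
};

/*
 * Abort order-0 kswapd reclaim only when every eligible zone is above
 * its high watermark plus a small batch-sized delta (MIN_LRU_BATCH in
 * the kernel), mirroring the should_abort_scan() loop in the diff below.
 */
static int all_zones_safe(const struct zone_info *zones, int reclaim_idx,
			  unsigned long delta)
{
	for (int i = 0; i <= reclaim_idx; i++)
		if (zones[i].managed &&
		    zones[i].free_pages < zones[i].high_wmark + delta)
			return 0;	/* some zone is still low: keep going */
	return 1;			/* kswapd can stop */
}

int main(void)
{
	struct zone_info zones[] = {
		{ 1, 70000, 65536 },	/* comfortably above high + delta */
		{ 1, 66000, 65536 },	/* within the delta: not safe yet */
	};

	printf("safe: %d\n", all_zones_safe(zones, 1, 512));	/* prints 0 */
	return 0;
}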
---
 mm/vmscan.c | 36 ++++++++++++++++++++++++++++--------
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index e67631c60ac0..10e964cd0efe 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4676,20 +4676,41 @@ static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc, bool
 	return try_to_inc_max_seq(lruvec, max_seq, sc, can_swap, false) ? -1 : 0;
 }
 
-static unsigned long get_nr_to_reclaim(struct scan_control *sc)
+static bool should_abort_scan(struct lruvec *lruvec, struct scan_control *sc)
 {
+	int i;
+	enum zone_watermarks mark;
+
 	/* don't abort memcg reclaim to ensure fairness */
 	if (!root_reclaim(sc))
-		return -1;
+		return false;
 
-	return max(sc->nr_to_reclaim, compact_gap(sc->order));
+	if (sc->nr_reclaimed >= max(sc->nr_to_reclaim, compact_gap(sc->order)))
+		return true;
+
+	/* check the order to exclude compaction-induced reclaim */
+	if (!current_is_kswapd() || sc->order)
+		return false;
+
+	mark = sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING ?
+	       WMARK_PROMO : WMARK_HIGH;
+
+	for (i = 0; i <= sc->reclaim_idx; i++) {
+		struct zone *zone = lruvec_pgdat(lruvec)->node_zones + i;
+		unsigned long size = wmark_pages(zone, mark) + MIN_LRU_BATCH;
+
+		if (managed_zone(zone) && !zone_watermark_ok(zone, 0, size, sc->reclaim_idx, 0))
+			return false;
+	}
+
+	/* kswapd should abort if all eligible zones are safe */
+	return true;
 }
 
 static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 {
 	long nr_to_scan;
 	unsigned long scanned = 0;
-	unsigned long nr_to_reclaim = get_nr_to_reclaim(sc);
 	int swappiness = get_swappiness(lruvec, sc);
 
 	/* clean file folios are more likely to exist */
@@ -4711,7 +4732,7 @@ static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 		if (scanned >= nr_to_scan)
 			break;
 
-		if (sc->nr_reclaimed >= nr_to_reclaim)
+		if (should_abort_scan(lruvec, sc))
 			break;
 
 		cond_resched();
@@ -4772,7 +4793,6 @@ static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc)
 	struct lru_gen_folio *lrugen;
 	struct mem_cgroup *memcg;
 	const struct hlist_nulls_node *pos;
-	unsigned long nr_to_reclaim = get_nr_to_reclaim(sc);
 
 	bin = first_bin = get_random_u32_below(MEMCG_NR_BINS);
 restart:
@@ -4805,7 +4825,7 @@ static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc)
 
 		rcu_read_lock();
 
-		if (sc->nr_reclaimed >= nr_to_reclaim)
+		if (should_abort_scan(lruvec, sc))
 			break;
 	}
 
@@ -4816,7 +4836,7 @@ static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc)
 
 	mem_cgroup_put(memcg);
 
-	if (sc->nr_reclaimed >= nr_to_reclaim)
+	if (!is_a_nulls(pos))
 		return;
 
 	/* restart if raced with lru_gen_rotate_memcg() */

From patchwork Fri Dec 8 06:14:06 2023
Date: Thu, 7 Dec 2023 23:14:06 -0700
In-Reply-To: <20231208061407.2125867-1-yuzhao@google.com>
References: <20231208061407.2125867-1-yuzhao@google.com>
Message-ID: <20231208061407.2125867-3-yuzhao@google.com>
Subject: [PATCH mm-unstable v1 3/4] mm/mglru: respect min_ttl_ms with memcgs
From: Yu Zhao
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Yu Zhao, T.J. Mercier, stable@vger.kernel.org

While investigating kswapd "consuming 100% CPU" [1] (also see
"mm/mglru: try to stop at high watermarks"), it was discovered that
the memcg LRU can breach the thrashing protection imposed by
min_ttl_ms.
Before the memcg LRU:

  kswapd()
    shrink_node_memcgs()
      mem_cgroup_iter()
        inc_max_seq()  // always hit a different memcg
    lru_gen_age_node()
      mem_cgroup_iter()
        check the timestamp of the oldest generation

After the memcg LRU:

  kswapd()
    shrink_many()
      restart:
        iterate the memcg LRU:
          inc_max_seq()  // occasionally hit the same memcg
          if raced with lru_gen_rotate_memcg():
            goto restart
    lru_gen_age_node()
      mem_cgroup_iter()
        check the timestamp of the oldest generation

Specifically, when the restart happens in shrink_many(), it needs to
stick with the (memcg LRU) generation it began with. In other words,
it should neither re-read memcg_lru->seq nor age an lruvec of a
different generation. Otherwise it can hit the same memcg multiple
times without giving lru_gen_age_node() a chance to check the
timestamp of that memcg's oldest generation (against min_ttl_ms).

[1] https://lore.kernel.org/CAK8fFZ4DY+GtBA40Pm7Nn5xCHy+51w3sfxPqkqpqakSXYyX+Wg@mail.gmail.com/

Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: Yu Zhao
Tested-by: T.J. Mercier
Cc: stable@vger.kernel.org
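Why MEMCG_NR_GENS grows from two to three (see the mmzone.h hunk
below) can be checked with modular arithmetic. The sketch assumes the
seq-to-generation mapping is a plain modulo, as suggested by the
get_memcg_gen() calls in the diff (its body is not shown in this
series), so treat it as illustrative only:

#include <assert.h>

/* assumed model of get_memcg_gen(): seq % MEMCG_NR_GENS */
static int gen_mod(unsigned long seq, int nr_gens)
{
	return seq % nr_gens;
}

int main(void)
{
	unsigned long seq = 10;	/* current old generation */

	/* with 2 gens, a stale read of seq-1 aliases the young gen (seq+1) */
	assert(gen_mod(seq - 1, 2) == gen_mod(seq + 1, 2));

	/* with 3 gens, old (seq), young (seq+1) and stale (seq-1) stay distinct */
	assert(gen_mod(seq - 1, 3) != gen_mod(seq + 1, 3));
	assert(gen_mod(seq - 1, 3) != gen_mod(seq, 3));
	return 0;
}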
---
 include/linux/mmzone.h | 30 +++++++++++++++++-------------
 mm/vmscan.c            | 30 ++++++++++++++++--------------
 2 files changed, 33 insertions(+), 27 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index b23bc5390240..e3093ef9530f 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -510,33 +510,37 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw);
  * the old generation, is incremented when all its bins become empty.
  *
  * There are four operations:
- * 1. MEMCG_LRU_HEAD, which moves an memcg to the head of a random bin in its
+ * 1. MEMCG_LRU_HEAD, which moves a memcg to the head of a random bin in its
  *    current generation (old or young) and updates its "seg" to "head";
- * 2. MEMCG_LRU_TAIL, which moves an memcg to the tail of a random bin in its
+ * 2. MEMCG_LRU_TAIL, which moves a memcg to the tail of a random bin in its
  *    current generation (old or young) and updates its "seg" to "tail";
- * 3. MEMCG_LRU_OLD, which moves an memcg to the head of a random bin in the old
+ * 3. MEMCG_LRU_OLD, which moves a memcg to the head of a random bin in the old
  *    generation, updates its "gen" to "old" and resets its "seg" to "default";
- * 4. MEMCG_LRU_YOUNG, which moves an memcg to the tail of a random bin in the
+ * 4. MEMCG_LRU_YOUNG, which moves a memcg to the tail of a random bin in the
  *    young generation, updates its "gen" to "young" and resets its "seg" to
  *    "default".
  *
  * The events that trigger the above operations are:
  * 1. Exceeding the soft limit, which triggers MEMCG_LRU_HEAD;
- * 2. The first attempt to reclaim an memcg below low, which triggers
+ * 2. The first attempt to reclaim a memcg below low, which triggers
  *    MEMCG_LRU_TAIL;
- * 3. The first attempt to reclaim an memcg below reclaimable size threshold,
+ * 3. The first attempt to reclaim a memcg below reclaimable size threshold,
  *    which triggers MEMCG_LRU_TAIL;
- * 4. The second attempt to reclaim an memcg below reclaimable size threshold,
+ * 4. The second attempt to reclaim a memcg below reclaimable size threshold,
  *    which triggers MEMCG_LRU_YOUNG;
- * 5. Attempting to reclaim an memcg below min, which triggers MEMCG_LRU_YOUNG;
+ * 5. Attempting to reclaim a memcg below min, which triggers MEMCG_LRU_YOUNG;
  * 6. Finishing the aging on the eviction path, which triggers MEMCG_LRU_YOUNG;
- * 7. Offlining an memcg, which triggers MEMCG_LRU_OLD.
+ * 7. Offlining a memcg, which triggers MEMCG_LRU_OLD.
  *
- * Note that memcg LRU only applies to global reclaim, and the round-robin
- * incrementing of their max_seq counters ensures the eventual fairness to all
- * eligible memcgs. For memcg reclaim, it still relies on mem_cgroup_iter().
+ * Notes:
+ * 1. Memcg LRU only applies to global reclaim, and the round-robin incrementing
+ *    of their max_seq counters ensures the eventual fairness to all eligible
+ *    memcgs. For memcg reclaim, it still relies on mem_cgroup_iter().
+ * 2. There are only two valid generations: old (seq) and young (seq+1).
+ *    MEMCG_NR_GENS is set to three so that when reading the generation counter
+ *    locklessly, a stale value (seq-1) does not wraparound to young.
  */
-#define MEMCG_NR_GENS	2
+#define MEMCG_NR_GENS	3
 #define MEMCG_NR_BINS	8
 
 struct lru_gen_memcg {
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 10e964cd0efe..cac38e9cac86 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4117,6 +4117,9 @@ static void lru_gen_rotate_memcg(struct lruvec *lruvec, int op)
 	else
 		VM_WARN_ON_ONCE(true);
 
+	WRITE_ONCE(lruvec->lrugen.seg, seg);
+	WRITE_ONCE(lruvec->lrugen.gen, new);
+
 	hlist_nulls_del_rcu(&lruvec->lrugen.list);
 
 	if (op == MEMCG_LRU_HEAD || op == MEMCG_LRU_OLD)
@@ -4127,9 +4130,6 @@ static void lru_gen_rotate_memcg(struct lruvec *lruvec, int op)
 	pgdat->memcg_lru.nr_memcgs[old]--;
 	pgdat->memcg_lru.nr_memcgs[new]++;
 
-	lruvec->lrugen.gen = new;
-	WRITE_ONCE(lruvec->lrugen.seg, seg);
-
 	if (!pgdat->memcg_lru.nr_memcgs[old] && old == get_memcg_gen(pgdat->memcg_lru.seq))
 		WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
 
@@ -4152,11 +4152,11 @@ void lru_gen_online_memcg(struct mem_cgroup *memcg)
 
 			gen = get_memcg_gen(pgdat->memcg_lru.seq);
 
+			lruvec->lrugen.gen = gen;
+
 			hlist_nulls_add_tail_rcu(&lruvec->lrugen.list,
 						 &pgdat->memcg_lru.fifo[gen][bin]);
 			pgdat->memcg_lru.nr_memcgs[gen]++;
 
-			lruvec->lrugen.gen = gen;
-
 			spin_unlock_irq(&pgdat->memcg_lru.lock);
 		}
 	}
@@ -4663,7 +4663,7 @@ static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc, bool
 	DEFINE_MAX_SEQ(lruvec);
 
 	if (mem_cgroup_below_min(sc->target_mem_cgroup, memcg))
-		return 0;
+		return -1;
 
 	if (!should_run_aging(lruvec, max_seq, sc, can_swap, &nr_to_scan))
 		return nr_to_scan;
@@ -4738,7 +4738,7 @@ static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 		cond_resched();
 	}
 
-	/* whether try_to_inc_max_seq() was successful */
+	/* whether this lruvec should be rotated */
 	return nr_to_scan < 0;
 }
 
@@ -4792,13 +4792,13 @@ static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc)
 	struct lruvec *lruvec;
 	struct lru_gen_folio *lrugen;
 	struct mem_cgroup *memcg;
-	const struct hlist_nulls_node *pos;
+	struct hlist_nulls_node *pos;
 
+	gen = get_memcg_gen(READ_ONCE(pgdat->memcg_lru.seq));
 	bin = first_bin = get_random_u32_below(MEMCG_NR_BINS);
 restart:
 	op = 0;
 	memcg = NULL;
-	gen = get_memcg_gen(READ_ONCE(pgdat->memcg_lru.seq));
 
 	rcu_read_lock();
 
@@ -4809,6 +4809,10 @@ static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc)
 		}
 
 		mem_cgroup_put(memcg);
+		memcg = NULL;
+
+		if (gen != READ_ONCE(lrugen->gen))
+			continue;
 
 		lruvec = container_of(lrugen, struct lruvec, lrugen);
 		memcg = lruvec_memcg(lruvec);
@@ -4893,16 +4897,14 @@ static void set_initial_priority(struct pglist_data *pgdat, struct scan_control
 	if (sc->priority != DEF_PRIORITY || sc->nr_to_reclaim < MIN_LRU_BATCH)
 		return;
 	/*
-	 * Determine the initial priority based on ((total / MEMCG_NR_GENS) >>
-	 * priority) * reclaimed_to_scanned_ratio = nr_to_reclaim, where the
-	 * estimated reclaimed_to_scanned_ratio = inactive / total.
+	 * Determine the initial priority based on
+	 * (total >> priority) * reclaimed_to_scanned_ratio = nr_to_reclaim,
+	 * where reclaimed_to_scanned_ratio = inactive / total.
 	 */
 	reclaimable = node_page_state(pgdat, NR_INACTIVE_FILE);
 	if (get_swappiness(lruvec, sc))
 		reclaimable += node_page_state(pgdat, NR_INACTIVE_ANON);
 
-	reclaimable /= MEMCG_NR_GENS;
-
 	/* round down reclaimable and round up sc->nr_to_reclaim */
 	priority = fls_long(reclaimable) - 1 - fls_long(sc->nr_to_reclaim - 1);

From patchwork Fri Dec 8 06:14:07 2023
Date: Thu, 7 Dec 2023 23:14:07 -0700
In-Reply-To: <20231208061407.2125867-1-yuzhao@google.com>
References: <20231208061407.2125867-1-yuzhao@google.com>
Message-ID: <20231208061407.2125867-4-yuzhao@google.com>
Subject: [PATCH mm-unstable v1 4/4] mm/mglru: reclaim offlined memcgs harder
From: Yu Zhao
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Yu Zhao, T.J. Mercier, stable@vger.kernel.org
Mercier" , stable@vger.kernel.org X-Rspamd-Queue-Id: C83F014000A X-Rspam-User: X-Rspamd-Server: rspam02 X-Stat-Signature: cri5mqztpndne8uneknb1ksw3mr8uczz X-HE-Tag: 1702016060-369306 X-HE-Meta: U2FsdGVkX1+jZonCBvqC8ScQ2mgAwUmB/YQsJLw55i1CLEvXiFcRDw6PJzdn+MvGycLqlci7kQYyp0dPVH/NA3WkuhHoBwi534sJ9y0rvFQLA/uVjDn5q25pYweOJVjW1UgTWLr2/HX/PxfkcPVjGxlRtFxGDJFdYjA48LeirmCZ8JHiDVZCp6XN/pXNxPAc99pbBKzRjpZWxluyunmy37GlRtjbREIvDlBrMKmCk8vWoAPZpyk3XViUQrlwZXZypF5UO2XVB0NfJzJX/NR5aNShBQR5Wwnkj1OFGkSQk0gL7F7hZRkqswcz5FEYtYMTKA/EhTgacWoS/3WbIwqC1SIkPA2w19IOcrdwafHeLaHZpbYZePFe5Sy5tLJ1EE9iUIwiCaGtO4AVcLb6KTNCPv/sy6CdO1H1m9BudPgFUAK9L51sJ6abV/6cl2QHmdqN94SipoTBCZEFK0/x3YRe+NydLhWZPsty5NpyN/oAhobu0+PKa5bqNyikPXNJoEh/EgeTFuZ9/o0IYvXbNnp14katkvZg0JYsaHkpftD4mDtoB1WezcfY02kWwFq37o6ZapqBQcsPt/fdwP7godz9/dmIwVo+ddE54lxrKRMj838vBXz2qJiM5OkaO7VkUY9oZxLUpE4urx91DrHgWhqSruXOAJSGRU8JB0GQF0igm5ymun65saBz3w1KvhWreQ7BeOey24EmUmXF4UYrGu6dejHEyqxiqTCWzgZxJQdQiHThiKFQ0dsnip1XTDTlczb4WVREIXKiK6SfiYd0fNRmPQJhVSi19U03b9V5Rr1tC9GNkuoFCdDZoevZbU4FxsjOy2ri/e2Um7u7LS2bMolLcUmPvVvQwVW8Nh9HGtsxRDu1HtQA/wyvfFNUF/7Qa6PyoXVBw3VGwxI8z4V4qO56E5WAV2AfzMPeKYiGglpWnsYSzXYcXl9kbn1gRAL1al9R5pfiS2ckQ6zLyEYYXAp tn+MP6dU MW7jOpXmBnhE7JSHIcvE0C2pJw3FDx5pUPpnjjWoUY90u34vF18GKzpDqTnLT7u5RjbrjbcAQXV2CxFj322K7IphdKj7ECyuaMvpQnvfthss1qmZnL/NV06Mpyzyv4JUuUORv2nE/Slpm1CpV2GEfJjdx8jpl3OnOvzgC/+JAU85BIim9wgCXPMd3ohD8Jt3LfDZTyecNAv0QOA+VVe4MfNB/JWR04jzTZ8ejg+UYaq3dz6WdpB+kx93FMml6DNtZUkamvwTlTCWyZkciVygzkgZQkkNQZCq0Sci+rpxraypag1lU6tXzMPa95DJfLzgFgdTtL0K4tFnZxwYPlgoFdkjW0aeH45bClOPcobliDSAAF63CJMZa5PceekYp7OLJ6l9SpB1p+YloM05oHMmCeLp7GgvwHz8ibuRJkHX1JxEid62Krdsq+Iz15sNwnBWa/OEB1IuSHMVbdWYUIFVBb2RxxQcWStCz/9KnBoAvzQ8I6slCGtqQyVKfZkdMYF4MjJeAwe2i5Y9ZLGYPKqZHJDiS5/yqBr+UwgDG/OT12Rah3CCVF25Fpb0lRnSMaVWBqbe1gVKbDMkkBUzm0yczy+Fdc+fkC1njuJg/aSNjWU5Lr6PNJb3iZFY5mteCCd5QJoTzZKGRvLTLbqQZe8PWVg3xzci3so41Wmg+HQcTTo1C7leNUUvm8ELeHw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.046434, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: In the effort to reduce zombie memcgs [1], it was discovered that the memcg LRU doesn't apply enough pressure on offlined memcgs. Specifically, instead of rotating them to the tail of the current generation (MEMCG_LRU_TAIL) for a second attempt, it moves them to the next generation (MEMCG_LRU_YOUNG) after the first attempt. Not applying enough pressure on offlined memcgs can cause them to build up, and this can be particularly harmful to memory-constrained systems. On Pixel 8 Pro, launching apps for 50 cycles: Before After Change Zombie memcgs 45 35 -22% [1] https://lore.kernel.org/CABdmKX2M6koq4Q0Cmp_-=wbP0Qa190HdEGGaHfxNS05gAkUtPA@mail.gmail.com/ Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists") Signed-off-by: Yu Zhao Reported-by: T.J. Mercier Tested-by: T.J. Mercier Cc: stable@vger.kernel.org --- include/linux/mmzone.h | 8 ++++---- mm/vmscan.c | 24 ++++++++++++++++-------- 2 files changed, 20 insertions(+), 12 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index e3093ef9530f..2efd3be484fd 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -524,10 +524,10 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw); * 1. Exceeding the soft limit, which triggers MEMCG_LRU_HEAD; * 2. The first attempt to reclaim a memcg below low, which triggers * MEMCG_LRU_TAIL; - * 3. The first attempt to reclaim a memcg below reclaimable size threshold, - * which triggers MEMCG_LRU_TAIL; - * 4. 
---
 include/linux/mmzone.h |  8 ++++----
 mm/vmscan.c            | 24 ++++++++++++++++--------
 2 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index e3093ef9530f..2efd3be484fd 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -524,10 +524,10 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw);
  * 1. Exceeding the soft limit, which triggers MEMCG_LRU_HEAD;
  * 2. The first attempt to reclaim a memcg below low, which triggers
  *    MEMCG_LRU_TAIL;
- * 3. The first attempt to reclaim a memcg below reclaimable size threshold,
- *    which triggers MEMCG_LRU_TAIL;
- * 4. The second attempt to reclaim a memcg below reclaimable size threshold,
- *    which triggers MEMCG_LRU_YOUNG;
+ * 3. The first attempt to reclaim a memcg offlined or below reclaimable size
+ *    threshold, which triggers MEMCG_LRU_TAIL;
+ * 4. The second attempt to reclaim a memcg offlined or below reclaimable size
+ *    threshold, which triggers MEMCG_LRU_YOUNG;
  * 5. Attempting to reclaim a memcg below min, which triggers MEMCG_LRU_YOUNG;
  * 6. Finishing the aging on the eviction path, which triggers MEMCG_LRU_YOUNG;
  * 7. Offlining a memcg, which triggers MEMCG_LRU_OLD.
diff --git a/mm/vmscan.c b/mm/vmscan.c
index cac38e9cac86..dad4b80b04cd 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4626,7 +4626,12 @@ static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq,
 	}
 
 	/* try to scrape all its memory if this memcg was deleted */
-	*nr_to_scan = mem_cgroup_online(memcg) ? (total >> sc->priority) : total;
+	if (!mem_cgroup_online(memcg)) {
+		*nr_to_scan = total;
+		return false;
+	}
+
+	*nr_to_scan = total >> sc->priority;
 
 	/*
 	 * The aging tries to be lazy to reduce the overhead, while the eviction
@@ -4747,14 +4752,9 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
 	bool success;
 	unsigned long scanned = sc->nr_scanned;
 	unsigned long reclaimed = sc->nr_reclaimed;
-	int seg = lru_gen_memcg_seg(lruvec);
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 
-	/* see the comment on MEMCG_NR_GENS */
-	if (!lruvec_is_sizable(lruvec, sc))
-		return seg != MEMCG_LRU_TAIL ? MEMCG_LRU_TAIL : MEMCG_LRU_YOUNG;
-
 	mem_cgroup_calculate_protection(NULL, memcg);
 
 	if (mem_cgroup_below_min(NULL, memcg))
@@ -4762,7 +4762,7 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
 
 	if (mem_cgroup_below_low(NULL, memcg)) {
 		/* see the comment on MEMCG_NR_GENS */
-		if (seg != MEMCG_LRU_TAIL)
+		if (lru_gen_memcg_seg(lruvec) != MEMCG_LRU_TAIL)
 			return MEMCG_LRU_TAIL;
 
 		memcg_memory_event(memcg, MEMCG_LOW);
@@ -4778,7 +4778,15 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
 
 	flush_reclaim_state(sc);
 
-	return success ? MEMCG_LRU_YOUNG : 0;
+	if (success && mem_cgroup_online(memcg))
+		return MEMCG_LRU_YOUNG;
+
+	if (!success && lruvec_is_sizable(lruvec, sc))
+		return 0;
+
+	/* one retry if offlined or too small */
+	return lru_gen_memcg_seg(lruvec) != MEMCG_LRU_TAIL ?
+	       MEMCG_LRU_TAIL : MEMCG_LRU_YOUNG;
 }
 
 #ifdef CONFIG_MEMCG