From patchwork Wed May 29 18:05:05 2024
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13679365
Date: Wed, 29 May 2024 18:05:05 +0000
Message-ID: <20240529180510.2295118-3-jthoughton@google.com>
In-Reply-To: <20240529180510.2295118-1-jthoughton@google.com>
References: <20240529180510.2295118-1-jthoughton@google.com>
Subject: [PATCH v4 2/7] mm: multi-gen LRU: Have secondary MMUs participate in aging
From: James Houghton
To: Andrew Morton, Paolo Bonzini
Cc: Albert Ou, Ankit Agrawal, Anup Patel, Atish Patra, Axel Rasmussen,
    Bibo Mao, Catalin Marinas, David Matlack, David Rientjes, Huacai Chen,
    James Houghton, James Morse, Jonathan Corbet, Marc Zyngier,
    Michael Ellerman, Nicholas Piggin, Oliver Upton, Palmer Dabbelt,
    Paul Walmsley, Raghavendra Rao Ananta, Ryan Roberts,
    Sean Christopherson, Shaoqin Huang, Shuah Khan, Suzuki K Poulose,
    Tianrui Zhao, Will Deacon, Yu Zhao, Zenghui Yu,
    kvm-riscv@lists.infradead.org, kvm@vger.kernel.org,
    kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
    linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-kselftest@vger.kernel.org, linux-mips@vger.kernel.org,
    linux-mm@kvack.org, linux-riscv@lists.infradead.org,
    linuxppc-dev@lists.ozlabs.org, loongarch@lists.linux.dev
Secondary MMUs are currently consulted for access/age information only at
eviction time; before then, we don't get accurate age information. That
is, pages that are mostly accessed through a secondary MMU (like guest
memory, used by KVM) will always just proceed down to the oldest
generation, and then at eviction time, if KVM reports the page to be
young, the page will be activated/promoted back to the youngest
generation.

Do not do look-around if there is a secondary MMU we have to interact
with.

The added feature bit (0x8), if disabled, will make MGLRU behave as if
there are no secondary MMUs subscribed to MMU notifiers except at
eviction time.
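As a usage illustration, the new capability can be inspected and toggled
through the existing lru_gen sysfs interface (values here assume a kernel
with all four MGLRU capabilities available; "echo 7" simply writes a caps
mask with bit 0x8 cleared):

	echo y >/sys/kernel/mm/lru_gen/enabled
	cat /sys/kernel/mm/lru_gen/enabled
	0x000f
	# Clear bit 0x8 to go back to consulting secondary MMUs only at
	# eviction time:
	echo 7 >/sys/kernel/mm/lru_gen/enabled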
Suggested-by: Yu Zhao
Signed-off-by: James Houghton
---
 Documentation/admin-guide/mm/multigen_lru.rst |   6 +-
 include/linux/mmzone.h                        |   6 +-
 mm/rmap.c                                     |   9 +-
 mm/vmscan.c                                   | 144 ++++++++++++++----
 4 files changed, 123 insertions(+), 42 deletions(-)

diff --git a/Documentation/admin-guide/mm/multigen_lru.rst b/Documentation/admin-guide/mm/multigen_lru.rst
index 33e068830497..1e578e0c4c0c 100644
--- a/Documentation/admin-guide/mm/multigen_lru.rst
+++ b/Documentation/admin-guide/mm/multigen_lru.rst
@@ -48,6 +48,10 @@ Values Components
        verified on x86 varieties other than Intel and AMD. If it is
        disabled, the multi-gen LRU will suffer a negligible performance
        degradation.
+0x0008 Continuously clear the accessed bit in secondary MMU page
+       tables instead of waiting until eviction time. This results in
+       accurate page age information for pages that are mainly used by
+       a secondary MMU.
 [yYnN] Apply to all the components above.
 ====== ===============================================================
@@ -56,7 +60,7 @@ E.g.,

     echo y >/sys/kernel/mm/lru_gen/enabled
     cat /sys/kernel/mm/lru_gen/enabled
-    0x0007
+    0x000f
     echo 5 >/sys/kernel/mm/lru_gen/enabled
     cat /sys/kernel/mm/lru_gen/enabled
     0x0005
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 8f9c9590a42c..869824ef5f3b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -400,6 +400,7 @@ enum {
 	LRU_GEN_CORE,
 	LRU_GEN_MM_WALK,
 	LRU_GEN_NONLEAF_YOUNG,
+	LRU_GEN_SECONDARY_MMU_WALK,
 	NR_LRU_GEN_CAPS
 };
@@ -557,7 +558,7 @@ struct lru_gen_memcg {

 void lru_gen_init_pgdat(struct pglist_data *pgdat);
 void lru_gen_init_lruvec(struct lruvec *lruvec);
-void lru_gen_look_around(struct page_vma_mapped_walk *pvmw);
+bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw);

 void lru_gen_init_memcg(struct mem_cgroup *memcg);
 void lru_gen_exit_memcg(struct mem_cgroup *memcg);
@@ -576,8 +577,9 @@ static inline void lru_gen_init_lruvec(struct lruvec *lruvec)
 {
 }

-static inline void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
+static inline bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 {
+	return false;
 }

 static inline void lru_gen_init_memcg(struct mem_cgroup *memcg)
diff --git a/mm/rmap.c b/mm/rmap.c
index e8fc5ecb59b2..24a3ff639919 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -870,13 +870,10 @@ static bool folio_referenced_one(struct folio *folio,
 			continue;
 		}

-		if (pvmw.pte) {
-			if (lru_gen_enabled() &&
-			    pte_young(ptep_get(pvmw.pte))) {
-				lru_gen_look_around(&pvmw);
+		if (lru_gen_enabled() && pvmw.pte) {
+			if (lru_gen_look_around(&pvmw))
 				referenced++;
-			}
-
+		} else if (pvmw.pte) {
 			if (ptep_clear_flush_young_notify(vma, address,
 						pvmw.pte))
 				referenced++;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index d55e8d07ffc4..0d89f712f45c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -56,6 +56,7 @@
 #include <linux/khugepaged.h>
 #include <linux/rculist_nulls.h>
 #include <linux/random.h>
+#include <linux/mmu_notifier.h>

 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -2579,6 +2580,12 @@ static bool should_clear_pmd_young(void)
 	return arch_has_hw_nonleaf_pmd_young() && get_cap(LRU_GEN_NONLEAF_YOUNG);
 }

+static bool should_walk_secondary_mmu(void)
+{
+	return IS_ENABLED(CONFIG_LRU_GEN_WALKS_SECONDARY_MMU) &&
+	       get_cap(LRU_GEN_SECONDARY_MMU_WALK);
+}
+
 /******************************************************************************
  *                          shorthand helpers
  ******************************************************************************/
@@ -3276,7 +3283,8 @@ static bool get_next_vma(unsigned long mask, unsigned long size, struct mm_walk
 	return false;
 }

-static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr)
+static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr,
+				 struct pglist_data *pgdat)
 {
 	unsigned long pfn = pte_pfn(pte);
@@ -3291,10 +3299,15 @@ static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned
 	if (WARN_ON_ONCE(!pfn_valid(pfn)))
 		return -1;

+	/* try to avoid unnecessary memory loads */
+	if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat))
+		return -1;
+
 	return pfn;
 }

-static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr)
+static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr,
+				 struct pglist_data *pgdat)
 {
 	unsigned long pfn = pmd_pfn(pmd);
@@ -3309,6 +3322,10 @@ static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned
 	if (WARN_ON_ONCE(!pfn_valid(pfn)))
 		return -1;

+	/* try to avoid unnecessary memory loads */
+	if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat))
+		return -1;
+
 	return pfn;
 }
@@ -3317,10 +3334,6 @@ static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg,
 {
 	struct folio *folio;

-	/* try to avoid unnecessary memory loads */
-	if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat))
-		return NULL;
-
 	folio = pfn_folio(pfn);
 	if (folio_nid(folio) != pgdat->node_id)
 		return NULL;
@@ -3343,6 +3356,32 @@ static bool suitable_to_scan(int total, int young)
 	return young * n >= total;
 }

+static bool lru_gen_notifier_test_young(struct mm_struct *mm,
+					unsigned long addr)
+{
+	return should_walk_secondary_mmu() && mmu_notifier_test_young(mm, addr);
+}
+
+static bool lru_gen_notifier_clear_young(struct mm_struct *mm,
+					 unsigned long start,
+					 unsigned long end)
+{
+	return should_walk_secondary_mmu() &&
+	       mmu_notifier_clear_young(mm, start, end);
+}
+
+static bool lru_gen_pmdp_test_and_clear_young(struct vm_area_struct *vma,
+					      unsigned long addr,
+					      pmd_t *pmd)
+{
+	bool young = pmdp_test_and_clear_young(vma, addr, pmd);
+
+	if (lru_gen_notifier_clear_young(vma->vm_mm, addr, addr + PMD_SIZE))
+		young = true;
+
+	return young;
+}
+
 static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 			   struct mm_walk *args)
 {
@@ -3357,8 +3396,9 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 	struct pglist_data *pgdat = lruvec_pgdat(walk->lruvec);
 	DEFINE_MAX_SEQ(walk->lruvec);
 	int old_gen, new_gen = lru_gen_from_seq(max_seq);
+	struct mm_struct *mm = args->mm;

-	pte = pte_offset_map_nolock(args->mm, pmd, start & PMD_MASK, &ptl);
+	pte = pte_offset_map_nolock(mm, pmd, start & PMD_MASK, &ptl);
 	if (!pte)
 		return false;
 	if (!spin_trylock(ptl)) {
@@ -3376,11 +3416,12 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 		total++;
 		walk->mm_stats[MM_LEAF_TOTAL]++;

-		pfn = get_pte_pfn(ptent, args->vma, addr);
+		pfn = get_pte_pfn(ptent, args->vma, addr, pgdat);
 		if (pfn == -1)
 			continue;

-		if (!pte_young(ptent)) {
+		if (!pte_young(ptent) &&
+		    !lru_gen_notifier_test_young(mm, addr)) {
 			walk->mm_stats[MM_LEAF_OLD]++;
 			continue;
 		}
@@ -3389,8 +3430,9 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 		if (!folio)
 			continue;

-		if (!ptep_test_and_clear_young(args->vma, addr, pte + i))
-			VM_WARN_ON_ONCE(true);
+		lru_gen_notifier_clear_young(mm, addr, addr + PAGE_SIZE);
+		if (pte_young(ptent))
+			ptep_test_and_clear_young(args->vma, addr, pte + i);

 		young++;
 		walk->mm_stats[MM_LEAF_YOUNG]++;
@@ -3456,22 +3498,25 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area
 		/* don't round down the first address */
 		addr = i ? (*first & PMD_MASK) + i * PMD_SIZE : *first;

-		pfn = get_pmd_pfn(pmd[i], vma, addr);
-		if (pfn == -1)
-			goto next;
-
-		if (!pmd_trans_huge(pmd[i])) {
-			if (should_clear_pmd_young())
+		if (pmd_present(pmd[i]) && !pmd_trans_huge(pmd[i])) {
+			if (should_clear_pmd_young() &&
+			    !should_walk_secondary_mmu())
 				pmdp_test_and_clear_young(vma, addr, pmd + i);
 			goto next;
 		}

+		pfn = get_pmd_pfn(pmd[i], vma, addr, pgdat);
+		if (pfn == -1)
+			goto next;
+
 		folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap);
 		if (!folio)
 			goto next;

-		if (!pmdp_test_and_clear_young(vma, addr, pmd + i))
+		if (!lru_gen_pmdp_test_and_clear_young(vma, addr, pmd + i)) {
+			walk->mm_stats[MM_LEAF_OLD]++;
 			goto next;
+		}

 		walk->mm_stats[MM_LEAF_YOUNG]++;
@@ -3528,19 +3573,18 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
 		}

 		if (pmd_trans_huge(val)) {
-			unsigned long pfn = pmd_pfn(val);
 			struct pglist_data *pgdat = lruvec_pgdat(walk->lruvec);
+			unsigned long pfn = get_pmd_pfn(val, vma, addr, pgdat);

 			walk->mm_stats[MM_LEAF_TOTAL]++;

-			if (!pmd_young(val)) {
-				walk->mm_stats[MM_LEAF_OLD]++;
+			if (pfn == -1)
 				continue;
-			}

-			/* try to avoid unnecessary memory loads */
-			if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat))
+			if (!pmd_young(val) && !mm_has_notifiers(args->mm)) {
+				walk->mm_stats[MM_LEAF_OLD]++;
 				continue;
+			}

 			walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first);
 			continue;
@@ -3548,7 +3592,7 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,

 		walk->mm_stats[MM_NONLEAF_TOTAL]++;

-		if (should_clear_pmd_young()) {
+		if (should_clear_pmd_young() && !should_walk_secondary_mmu()) {
 			if (!pmd_young(val))
 				continue;
@@ -3994,6 +4038,26 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
  *                          rmap/PT walk feedback
  ******************************************************************************/

+static bool should_look_around(struct vm_area_struct *vma, unsigned long addr,
+			       pte_t *pte, int *young)
+{
+	bool secondary_was_young =
+		mmu_notifier_clear_young(vma->vm_mm, addr, addr + PAGE_SIZE);
+
+	/*
+	 * Look around if (1) the PTE is young and (2) we do not need to
+	 * consult any secondary MMUs.
+	 */
+	if (pte_young(ptep_get(pte))) {
+		ptep_test_and_clear_young(vma, addr, pte);
+		*young = true;
+		return !mm_has_notifiers(vma->vm_mm);
+	} else if (secondary_was_young)
+		*young = true;
+
+	return false;
+}
+
 /*
  * This function exploits spatial locality when shrink_folio_list() walks the
  * rmap. It scans the adjacent PTEs of a young PTE and promotes hot pages. If
@@ -4001,7 +4065,7 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
  * the PTE table to the Bloom filter. This forms a feedback loop between the
  * eviction and the aging.
  */
-void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
+bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 {
 	int i;
 	unsigned long start;
@@ -4019,16 +4083,20 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 	struct lru_gen_mm_state *mm_state = get_mm_state(lruvec);
 	DEFINE_MAX_SEQ(lruvec);
 	int old_gen, new_gen = lru_gen_from_seq(max_seq);
+	struct mm_struct *mm = pvmw->vma->vm_mm;

 	lockdep_assert_held(pvmw->ptl);
 	VM_WARN_ON_ONCE_FOLIO(folio_test_lru(folio), folio);

+	if (!should_look_around(vma, addr, pte, &young))
+		return young;
+
 	if (spin_is_contended(pvmw->ptl))
-		return;
+		return young;

 	/* exclude special VMAs containing anon pages from COW */
 	if (vma->vm_flags & VM_SPECIAL)
-		return;
+		return young;

 	/* avoid taking the LRU lock under the PTL when possible */
 	walk = current->reclaim_state ? current->reclaim_state->mm_walk : NULL;
@@ -4036,6 +4104,9 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 	start = max(addr & PMD_MASK, vma->vm_start);
 	end = min(addr | ~PMD_MASK, vma->vm_end - 1) + 1;

+	if (end - start == PAGE_SIZE)
+		return young;
+
 	if (end - start > MIN_LRU_BATCH * PAGE_SIZE) {
 		if (addr - start < MIN_LRU_BATCH * PAGE_SIZE / 2)
 			end = start + MIN_LRU_BATCH * PAGE_SIZE;
@@ -4049,7 +4120,7 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)

 	/* folio_update_gen() requires stable folio_memcg() */
 	if (!mem_cgroup_trylock_pages(memcg))
-		return;
+		return young;

 	arch_enter_lazy_mmu_mode();
@@ -4059,19 +4130,21 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 		unsigned long pfn;
 		pte_t ptent = ptep_get(pte + i);

-		pfn = get_pte_pfn(ptent, vma, addr);
+		pfn = get_pte_pfn(ptent, vma, addr, pgdat);
 		if (pfn == -1)
 			continue;

-		if (!pte_young(ptent))
+		if (!pte_young(ptent) &&
+		    !lru_gen_notifier_test_young(mm, addr))
 			continue;

 		folio = get_pfn_folio(pfn, memcg, pgdat, can_swap);
 		if (!folio)
 			continue;

-		if (!ptep_test_and_clear_young(vma, addr, pte + i))
-			VM_WARN_ON_ONCE(true);
+		lru_gen_notifier_clear_young(mm, addr, addr + PAGE_SIZE);
+		if (pte_young(ptent))
+			ptep_test_and_clear_young(vma, addr, pte + i);

 		young++;
@@ -4101,6 +4174,8 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 	/* feedback from rmap walkers to page table walkers */
 	if (mm_state && suitable_to_scan(i, young))
 		update_bloom_filter(mm_state, max_seq, pvmw->pmd);
+
+	return young;
 }

 /******************************************************************************
@@ -5137,6 +5212,9 @@ static ssize_t enabled_show(struct kobject *kobj, struct kobj_attribute *attr, c
 	if (should_clear_pmd_young())
 		caps |= BIT(LRU_GEN_NONLEAF_YOUNG);

+	if (should_walk_secondary_mmu())
+		caps |= BIT(LRU_GEN_SECONDARY_MMU_WALK);
+
 	return sysfs_emit(buf, "0x%04x\n", caps);
 }