From patchwork Mon Mar 24 22:03:00 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kinsey Ho
X-Patchwork-Id: 14027924
Date: Mon, 24 Mar 2025 15:03:00 -0700
In-Reply-To: <20250324220301.1273038-1-kinseyho@google.com>
References: <20250324220301.1273038-1-kinseyho@google.com>
X-Mailer: git-send-email 2.49.0.395.g12beb8f557-goog
Message-ID: <20250324220301.1273038-2-kinseyho@google.com>
Subject: [RFC PATCH v1 1/2] mm: mglru: generalize page table walk
From: Kinsey Ho
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: yuanchu@google.com, AneeshKumar.KizhakeVeetil@arm.com,
 Hasan.Maruf@amd.com, Jonathan.Cameron@huawei.com, Michael.Day@amd.com,
 akpm@linux-foundation.org, dave.hansen@intel.com, david@redhat.com,
 feng.tang@intel.com, gourry@gourry.net, hannes@cmpxchg.org,
 honggyu.kim@sk.com, hughd@google.com, jhubbard@nvidia.com,
 k.shutemov@gmail.com, kbusch@meta.com, kmanaouil.dev@gmail.com,
 leesuyeon0506@gmail.com, leillc@google.com, liam.howlett@oracle.com,
 mgorman@techsingularity.net, mingo@redhat.com, nadav.amit@gmail.com,
 nphamcs@gmail.com, peterz@infradead.org, raghavendra.kt@amd.com,
 riel@surriel.com, rientjes@google.com, rppt@kernel.org, shivankg@amd.com,
 shy828301@gmail.com, sj@kernel.org, vbabka@suse.cz, weixugc@google.com,
 willy@infradead.org, ying.huang@linux.alibaba.com, ziy@nvidia.com,
 dave@stgolabs.net, hyeonggon.yoo@sk.com, bharata@amd.com, Kinsey Ho

Refactor the existing MGLRU page table walking logic to make it
resumable.

Additionally, introduce two hooks into the MGLRU page table walk: an
accessed callback and a flush callback. The accessed callback is called
for each accessed page detected via the scanned accessed bit. The flush
callback is called when the accessed callback reports an out-of-space
error. This allows pages to be processed in batches for efficiency.

With a generalized page table walk, introduce a new scan function which
repeatedly scans the same young generation and does not add a new
young generation.

Signed-off-by: Kinsey Ho
Signed-off-by: Yuanchu Xie
---
 include/linux/mmzone.h |   5 ++
 mm/internal.h          |   4 +
 mm/vmscan.c            | 177 ++++++++++++++++++++++++++++++-----------
 3 files changed, 140 insertions(+), 46 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index a5c4e789aa55..bab586961a82 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -511,6 +511,8 @@ struct lru_gen_mm_walk {
 	unsigned long seq;
 	/* the next address within an mm to scan */
 	unsigned long next_addr;
+	/* called for each accessed pte/pmd */
+	int (*accessed_cb)(pfn_t pfn);
 	/* to batch promoted pages */
 	int nr_pages[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES];
 	/* to batch the mm stats */
@@ -518,6 +520,9 @@ struct lru_gen_mm_walk {
 	/* total batched items */
 	int batched;
 	int swappiness;
+	/* for the pmd under scanning */
+	int nr_young_pte;
+	int nr_total_pte;
 	bool force_scan;
 };

diff --git a/mm/internal.h b/mm/internal.h
index 20b3535935a3..3bf528af2deb 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -476,6 +476,10 @@ extern unsigned long highest_memmap_pfn;
 bool folio_isolate_lru(struct folio *folio);
 void folio_putback_lru(struct folio *folio);
 extern void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason);
+void set_task_reclaim_state(struct task_struct *task,
+			    struct reclaim_state *rs);
+void lru_gen_scan_lruvec(struct lruvec *lruvec, unsigned long seq,
+			 int (*accessed_cb)(pfn_t), void (*flush_cb)(void));

 /*
  * in mm/rmap.c:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c767d71c43d7..fb828a429645 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -57,6 +57,7 @@
 #include
 #include
 #include
+#include
 #include
 #include

@@ -271,7 +272,7 @@ static int sc_swappiness(struct scan_control *sc, struct mem_cgroup *memcg)
 }
 #endif

-static void set_task_reclaim_state(struct task_struct *task,
+void set_task_reclaim_state(struct task_struct *task,
 				   struct reclaim_state *rs)
 {
 	/* Check for an overwrite */
@@ -3023,7 +3024,7 @@ static bool iterate_mm_list(struct lru_gen_mm_walk *walk, struct mm_struct **ite

 	VM_WARN_ON_ONCE(mm_state->seq + 1 < walk->seq);

-	if (walk->seq <= mm_state->seq)
+	if (!walk->accessed_cb && walk->seq <= mm_state->seq)
 		goto done;

 	if (!mm_state->head)
@@ -3452,16 +3453,14 @@ static void walk_update_folio(struct lru_gen_mm_walk *walk, struct folio *folio,
 	}
 }

-static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
-			   struct mm_walk *args)
+static int walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
+			   struct mm_walk *args, bool *suitable)
 {
-	int i;
+	int i, err = 0;
 	bool dirty;
 	pte_t *pte;
 	spinlock_t *ptl;
 	unsigned long addr;
-	int total = 0;
-	int young = 0;
 	struct folio *last = NULL;
 	struct lru_gen_mm_walk *walk = args->private;
 	struct mem_cgroup *memcg = lruvec_memcg(walk->lruvec);
@@ -3471,17 +3470,21 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 	pmd_t pmdval;

 	pte = pte_offset_map_rw_nolock(args->mm, pmd, start & PMD_MASK, &pmdval, &ptl);
-	if (!pte)
-		return false;
+	if (!pte) {
+		*suitable = false;
+		return 0;
+	}

 	if (!spin_trylock(ptl)) {
 		pte_unmap(pte);
-		return true;
+		*suitable = true;
+		return 0;
 	}

 	if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
 		pte_unmap_unlock(pte, ptl);
-		return false;
+		*suitable = false;
+		return 0;
 	}

 	arch_enter_lazy_mmu_mode();
@@ -3491,7 +3494,7 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 		struct folio *folio;
 		pte_t ptent = ptep_get(pte + i);

-		total++;
+		walk->nr_total_pte++;
 		walk->mm_stats[MM_LEAF_TOTAL]++;

 		pfn = get_pte_pfn(ptent, args->vma, addr, pgdat);
@@ -3515,23 +3518,34 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 		if (pte_dirty(ptent))
 			dirty = true;

-		young++;
+		walk->nr_young_pte++;
 		walk->mm_stats[MM_LEAF_YOUNG]++;
+
+		if (!walk->accessed_cb)
+			continue;
+
+		err = walk->accessed_cb(pfn_to_pfn_t(pfn));
+		if (err) {
+			walk->next_addr = addr + PAGE_SIZE;
+			break;
+		}
 	}

 	walk_update_folio(walk, last, gen, dirty);
 	last = NULL;

-	if (i < PTRS_PER_PTE && get_next_vma(PMD_MASK, PAGE_SIZE, args, &start, &end))
+	if (!err && i < PTRS_PER_PTE &&
+	    get_next_vma(PMD_MASK, PAGE_SIZE, args, &start, &end))
 		goto restart;

 	arch_leave_lazy_mmu_mode();
 	pte_unmap_unlock(pte, ptl);

-	return suitable_to_scan(total, young);
+	*suitable = suitable_to_scan(walk->nr_total_pte, walk->nr_young_pte);
+	return err;
 }

-static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area_struct *vma,
+static int walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area_struct *vma,
 				  struct mm_walk *args, unsigned long *bitmap, unsigned long *first)
 {
 	int i;
@@ -3544,6 +3558,7 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area
 	struct pglist_data *pgdat = lruvec_pgdat(walk->lruvec);
 	DEFINE_MAX_SEQ(walk->lruvec);
 	int gen = lru_gen_from_seq(max_seq);
+	int err = 0;

 	VM_WARN_ON_ONCE(pud_leaf(*pud));

@@ -3551,13 +3566,13 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area
 	if (*first == -1) {
 		*first = addr;
 		bitmap_zero(bitmap, MIN_LRU_BATCH);
-		return;
+		return 0;
 	}

 	i = addr == -1 ? 0 : pmd_index(addr) - pmd_index(*first);
 	if (i && i <= MIN_LRU_BATCH) {
 		__set_bit(i - 1, bitmap);
-		return;
+		return 0;
 	}

 	pmd = pmd_offset(pud, *first);
@@ -3607,6 +3622,16 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area
 			dirty = true;

 		walk->mm_stats[MM_LEAF_YOUNG]++;
+		if (!walk->accessed_cb)
+			goto next;
+
+		err = walk->accessed_cb(pfn_to_pfn_t(pfn));
+		if (err) {
+			i = find_next_bit(bitmap, MIN_LRU_BATCH, i) + 1;
+
+			walk->next_addr = (*first & PMD_MASK) + i * PMD_SIZE;
+			break;
+		}
 next:
 		i = i > MIN_LRU_BATCH ? 0 : find_next_bit(bitmap, MIN_LRU_BATCH, i) + 1;
 	} while (i <= MIN_LRU_BATCH);
@@ -3617,9 +3642,10 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area
 	spin_unlock(ptl);
 done:
 	*first = -1;
+	return err;
 }

-static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
+static int walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
 			   struct mm_walk *args)
 {
 	int i;
@@ -3631,6 +3657,7 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
 	unsigned long first = -1;
 	struct lru_gen_mm_walk *walk = args->private;
 	struct lru_gen_mm_state *mm_state = get_mm_state(walk->lruvec);
+	int err = 0;

 	VM_WARN_ON_ONCE(pud_leaf(*pud));

@@ -3644,6 +3671,7 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
 	/* walk_pte_range() may call get_next_vma() */
 	vma = args->vma;
 	for (i = pmd_index(start), addr = start; addr != end; i++, addr = next) {
+		bool suitable;
 		pmd_t val = pmdp_get_lockless(pmd + i);

 		next = pmd_addr_end(addr, end);
@@ -3660,7 +3688,10 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
 			walk->mm_stats[MM_LEAF_TOTAL]++;

 			if (pfn != -1)
-				walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first);
+				err = walk_pmd_range_locked(pud, addr, vma, args,
+						bitmap, &first);
+			if (err)
+				return err;
 			continue;
 		}

@@ -3669,33 +3700,50 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
 			if (!pmd_young(val))
 				continue;

-			walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first);
+			err = walk_pmd_range_locked(pud, addr, vma, args,
+						bitmap, &first);
+			if (err)
+				return err;
 		}

 		if (!walk->force_scan && !test_bloom_filter(mm_state, walk->seq, pmd + i))
 			continue;

+		err = walk_pte_range(&val, addr, next, args, &suitable);
+		if (err && walk->next_addr < next && first == -1)
+			return err;
+
+		walk->nr_total_pte = 0;
+		walk->nr_young_pte = 0;
+
 		walk->mm_stats[MM_NONLEAF_FOUND]++;

-		if (!walk_pte_range(&val, addr, next, args))
-			continue;
+		if (!suitable)
+			goto next;

 		walk->mm_stats[MM_NONLEAF_ADDED]++;

 		/* carry over to the next generation */
 		update_bloom_filter(mm_state, walk->seq + 1, pmd + i);
+next:
+		if (err) {
+			walk->next_addr = first;
+			return err;
+		}
 	}

-	walk_pmd_range_locked(pud, -1, vma, args, bitmap, &first);
+	err = walk_pmd_range_locked(pud, -1, vma, args, bitmap, &first);

-	if (i < PTRS_PER_PMD && get_next_vma(PUD_MASK, PMD_SIZE, args, &start, &end))
+	if (!err && i < PTRS_PER_PMD && get_next_vma(PUD_MASK, PMD_SIZE, args, &start, &end))
 		goto restart;
+
+	return err;
 }

 static int walk_pud_range(p4d_t *p4d, unsigned long start, unsigned long end,
 			  struct mm_walk *args)
 {
-	int i;
+	int i, err;
 	pud_t *pud;
 	unsigned long addr;
 	unsigned long next;
@@ -3713,7 +3761,9 @@ static int walk_pud_range(p4d_t *p4d, unsigned long start, unsigned long end,
 		if (!pud_present(val) || WARN_ON_ONCE(pud_leaf(val)))
 			continue;

-		walk_pmd_range(&val, addr, next, args);
+		err = walk_pmd_range(&val, addr, next, args);
+		if (err)
+			return err;

 		if (need_resched() || walk->batched >= MAX_LRU_BATCH) {
 			end = (addr | ~PUD_MASK) + 1;
@@ -3734,40 +3784,48 @@ static int walk_pud_range(p4d_t *p4d, unsigned long start, unsigned long end,
 	return -EAGAIN;
 }

-static void walk_mm(struct mm_struct *mm, struct lru_gen_mm_walk *walk)
+static int try_walk_mm(struct mm_struct *mm, struct lru_gen_mm_walk *walk)
 {
+	int err;
 	static const struct mm_walk_ops mm_walk_ops = {
 		.test_walk = should_skip_vma,
 		.p4d_entry = walk_pud_range,
 		.walk_lock = PGWALK_RDLOCK,
 	};
-	int err;
 	struct lruvec *lruvec = walk->lruvec;

-	walk->next_addr = FIRST_USER_ADDRESS;
+	DEFINE_MAX_SEQ(lruvec);

-	do {
-		DEFINE_MAX_SEQ(lruvec);
+	err = -EBUSY;

-		err = -EBUSY;
+	/* another thread might have called inc_max_seq() */
+	if (walk->seq != max_seq)
+		return err;

-		/* another thread might have called inc_max_seq() */
-		if (walk->seq != max_seq)
-			break;
+	/* the caller might be holding the lock for write */
+	if (mmap_read_trylock(mm)) {
+		err = walk_page_range(mm, walk->next_addr, ULONG_MAX,
+				      &mm_walk_ops, walk);

-		/* the caller might be holding the lock for write */
-		if (mmap_read_trylock(mm)) {
-			err = walk_page_range(mm, walk->next_addr, ULONG_MAX, &mm_walk_ops, walk);
+		mmap_read_unlock(mm);
+	}

-			mmap_read_unlock(mm);
-		}
+	if (walk->batched) {
+		spin_lock_irq(&lruvec->lru_lock);
+		reset_batch_size(walk);
+		spin_unlock_irq(&lruvec->lru_lock);
+	}

-		if (walk->batched) {
-			spin_lock_irq(&lruvec->lru_lock);
-			reset_batch_size(walk);
-			spin_unlock_irq(&lruvec->lru_lock);
-		}
+	return err;
+}
+
+static void walk_mm(struct mm_struct *mm, struct lru_gen_mm_walk *walk)
+{
+	int err;

+	walk->next_addr = FIRST_USER_ADDRESS;
+	do {
+		err = try_walk_mm(mm, walk);
 		cond_resched();
 	} while (err == -EAGAIN);
 }
@@ -3964,6 +4022,33 @@ static bool inc_max_seq(struct lruvec *lruvec, unsigned long seq, int swappiness
 	return success;
 }

+void lru_gen_scan_lruvec(struct lruvec *lruvec, unsigned long seq,
+			 int (*accessed_cb)(pfn_t), void (*flush_cb)(void))
+{
+	struct lru_gen_mm_walk *walk = current->reclaim_state->mm_walk;
+	struct mm_struct *mm = NULL;
+
+	walk->lruvec = lruvec;
+	walk->seq = seq;
+	walk->accessed_cb = accessed_cb;
+	walk->swappiness = MAX_SWAPPINESS;
+
+	do {
+		int err = -EBUSY;
+
+		iterate_mm_list(walk, &mm);
+		if (!mm)
+			break;
+
+		walk->next_addr = FIRST_USER_ADDRESS;
+		do {
+			err = try_walk_mm(mm, walk);
+			cond_resched();
+			flush_cb();
+		} while (err == -EAGAIN);
+	} while (mm);
+}
+
 static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long seq,
 			       int swappiness, bool force_scan)
 {
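
Illustration only, not part of the patch: the commit message describes an
accessed callback that can report that it is out of space and a flush
callback that drains the batch so the walk can resume. The user-space
sketch below models that protocol under stated assumptions. All demo_*
names, BATCH_CAP, and the use of plain unsigned long instead of pfn_t are
hypothetical. It also assumes that the callback records the entry before
reporting the error (the patch resumes at the address after the failing
one) and that it returns -EAGAIN, since that is the only value the retry
loop in lru_gen_scan_lruvec() keeps going on.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define BATCH_CAP 4		/* hypothetical batch capacity */

static unsigned long batch[BATCH_CAP];
static size_t batch_len;
static size_t nr_flushed;	/* total entries consumed by the flush callback */

/* Hypothetical accessed callback: record the pfn, report when the batch is full. */
static int demo_accessed_cb(unsigned long pfn)
{
	assert(batch_len < BATCH_CAP);
	batch[batch_len++] = pfn;
	/* Assumption: -EAGAIN asks the walker to stop so the caller can flush. */
	return batch_len == BATCH_CAP ? -EAGAIN : 0;
}

/* Hypothetical flush callback: consume the batch and empty it. */
static void demo_flush_cb(void)
{
	nr_flushed += batch_len;
	batch_len = 0;
}

/*
 * Stand-in for try_walk_mm(): visit every "accessed" entry in [*next, end).
 * On a callback error, remember where to resume and return the error.
 */
static int demo_walk(unsigned long *next, unsigned long end,
		     int (*accessed_cb)(unsigned long))
{
	unsigned long addr;

	for (addr = *next; addr < end; addr++) {
		int err = accessed_cb(addr);

		if (err) {
			/* the entry was already recorded, so resume after it */
			*next = addr + 1;
			return err;
		}
	}

	*next = end;
	return 0;
}

/* Models the inner loop of lru_gen_scan_lruvec(): walk, flush, retry. */
static size_t demo_scan(unsigned long nr_entries)
{
	unsigned long next = 0;
	int err;

	batch_len = 0;
	nr_flushed = 0;

	do {
		err = demo_walk(&next, nr_entries, demo_accessed_cb);
		demo_flush_cb();
	} while (err == -EAGAIN);

	return nr_flushed;
}
```

In this model the flush runs after every walk attempt, as in the new
lru_gen_scan_lruvec(), so a partial final batch is drained as well and no
recorded entry is lost or visited twice across resumptions.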