From patchwork Sat Oct 19 01:29:39 2024
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13842528
Date: Sat, 19 Oct 2024 01:29:39 +0000
In-Reply-To: <20241019012940.3656292-1-jthoughton@google.com>
References: <20241019012940.3656292-1-jthoughton@google.com>
Message-ID: <20241019012940.3656292-3-jthoughton@google.com>
Subject: [PATCH 2/2] mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify()
From: James Houghton
To: Andrew Morton
Cc: Sean Christopherson, Paolo Bonzini, David Matlack, David Rientjes,
 James Houghton, Oliver Upton, David Stevens, Yu Zhao, Wei Xu,
 Axel Rasmussen, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org
X-Mailer: git-send-email 2.47.0.105.g07ac214952-goog

From: Yu Zhao

When the MM_WALK capability is enabled, memory that
is mostly accessed by a VM appears younger than it really is, so this
memory is less likely to be evicted. As a result, the presence of a
running VM can significantly increase swap-outs for non-VM memory,
regressing performance for the rest of the system.

Fix this regression by always calling {ptep,pmdp}_clear_young_notify()
whenever we clear the young bits on PMDs/PTEs.

Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks")
Reported-by: David Stevens
Signed-off-by: Yu Zhao
Signed-off-by: James Houghton
---
 include/linux/mmzone.h |  5 ++-
 mm/rmap.c              |  9 ++---
 mm/vmscan.c            | 91 +++++++++++++++++++++++-------------------
 3 files changed, 55 insertions(+), 50 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 691c635d8d1f..2e8c4307c728 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -557,7 +557,7 @@ struct lru_gen_memcg {
 
 void lru_gen_init_pgdat(struct pglist_data *pgdat);
 void lru_gen_init_lruvec(struct lruvec *lruvec);
-void lru_gen_look_around(struct page_vma_mapped_walk *pvmw);
+bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw);
 
 void lru_gen_init_memcg(struct mem_cgroup *memcg);
 void lru_gen_exit_memcg(struct mem_cgroup *memcg);
@@ -576,8 +576,9 @@ static inline void lru_gen_init_lruvec(struct lruvec *lruvec)
 {
 }
 
-static inline void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
+static inline bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 {
+	return false;
 }
 
 static inline void lru_gen_init_memcg(struct mem_cgroup *memcg)
diff --git a/mm/rmap.c b/mm/rmap.c
index 2c561b1e52cc..4785a693857a 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -896,13 +896,10 @@ static bool folio_referenced_one(struct folio *folio,
 			return false;
 		}
 
-		if (pvmw.pte) {
-			if (lru_gen_enabled() &&
-			    pte_young(ptep_get(pvmw.pte))) {
-				lru_gen_look_around(&pvmw);
+		if (lru_gen_enabled() && pvmw.pte) {
+			if (lru_gen_look_around(&pvmw))
 				referenced++;
-			}
-
+		} else if (pvmw.pte) {
 			if (ptep_clear_flush_young_notify(vma, address,
 						pvmw.pte))
 				referenced++;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 60669f8bba46..29c098790b01 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -56,6 +56,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -3293,7 +3294,8 @@ static bool get_next_vma(unsigned long mask, unsigned long size, struct mm_walk
 	return false;
 }
 
-static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr)
+static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr,
+				 struct pglist_data *pgdat)
 {
 	unsigned long pfn = pte_pfn(pte);
 
@@ -3305,13 +3307,20 @@ static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned
 	if (WARN_ON_ONCE(pte_devmap(pte) || pte_special(pte)))
 		return -1;
 
+	if (!pte_young(pte) && !mm_has_notifiers(vma->vm_mm))
+		return -1;
+
 	if (WARN_ON_ONCE(!pfn_valid(pfn)))
 		return -1;
 
+	if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat))
+		return -1;
+
 	return pfn;
 }
 
-static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr)
+static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr,
+				 struct pglist_data *pgdat)
 {
 	unsigned long pfn = pmd_pfn(pmd);
 
@@ -3323,9 +3332,15 @@ static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned
 	if (WARN_ON_ONCE(pmd_devmap(pmd)))
 		return -1;
 
+	if (!pmd_young(pmd) && !mm_has_notifiers(vma->vm_mm))
+		return -1;
+
 	if (WARN_ON_ONCE(!pfn_valid(pfn)))
 		return -1;
 
+	if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat))
+		return -1;
+
 	return pfn;
 }
 
@@ -3334,10 +3349,6 @@ static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg,
 {
 	struct folio *folio;
 
-	/* try to avoid unnecessary memory loads */
-	if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat))
-		return NULL;
-
 	folio = pfn_folio(pfn);
 	if (folio_nid(folio) != pgdat->node_id)
 		return NULL;
@@ -3400,20 +3411,16 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 		total++;
 		walk->mm_stats[MM_LEAF_TOTAL]++;
 
-		pfn = get_pte_pfn(ptent, args->vma, addr);
+		pfn = get_pte_pfn(ptent, args->vma, addr, pgdat);
 		if (pfn == -1)
 			continue;
 
-		if (!pte_young(ptent)) {
-			continue;
-		}
-
 		folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap);
 		if (!folio)
 			continue;
 
-		if (!ptep_test_and_clear_young(args->vma, addr, pte + i))
-			VM_WARN_ON_ONCE(true);
+		if (!ptep_clear_young_notify(args->vma, addr, pte + i))
+			continue;
 
 		young++;
 		walk->mm_stats[MM_LEAF_YOUNG]++;
@@ -3479,21 +3486,22 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area
 		/* don't round down the first address */
 		addr = i ? (*first & PMD_MASK) + i * PMD_SIZE : *first;
 
-		pfn = get_pmd_pfn(pmd[i], vma, addr);
-		if (pfn == -1)
-			goto next;
-
-		if (!pmd_trans_huge(pmd[i])) {
-			if (!walk->force_scan && should_clear_pmd_young())
+		if (pmd_present(pmd[i]) && !pmd_trans_huge(pmd[i])) {
+			if (!walk->force_scan && should_clear_pmd_young() &&
+			    !mm_has_notifiers(args->mm))
 				pmdp_test_and_clear_young(vma, addr, pmd + i);
 			goto next;
 		}
 
+		pfn = get_pmd_pfn(pmd[i], vma, addr, pgdat);
+		if (pfn == -1)
+			goto next;
+
 		folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap);
 		if (!folio)
 			goto next;
 
-		if (!pmdp_test_and_clear_young(vma, addr, pmd + i))
+		if (!pmdp_clear_young_notify(vma, addr, pmd + i))
 			goto next;
 
 		walk->mm_stats[MM_LEAF_YOUNG]++;
@@ -3551,24 +3559,18 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
 		}
 
 		if (pmd_trans_huge(val)) {
-			unsigned long pfn = pmd_pfn(val);
 			struct pglist_data *pgdat = lruvec_pgdat(walk->lruvec);
+			unsigned long pfn = get_pmd_pfn(val, vma, addr, pgdat);
 
 			walk->mm_stats[MM_LEAF_TOTAL]++;
 
-			if (!pmd_young(val)) {
-				continue;
-			}
-
-			/* try to avoid unnecessary memory loads */
-			if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat))
-				continue;
-
-			walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first);
+			if (pfn != -1)
+				walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first);
 			continue;
 		}
 
-		if (!walk->force_scan && should_clear_pmd_young()) {
+		if (!walk->force_scan && should_clear_pmd_young() &&
+		    !mm_has_notifiers(args->mm)) {
 			if (!pmd_young(val))
 				continue;
 
@@ -4042,13 +4044,13 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
  * the PTE table to the Bloom filter. This forms a feedback loop between the
  * eviction and the aging.
  */
-void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
+bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 {
 	int i;
 	unsigned long start;
 	unsigned long end;
 	struct lru_gen_mm_walk *walk;
-	int young = 0;
+	int young = 1;
 	pte_t *pte = pvmw->pte;
 	unsigned long addr = pvmw->address;
 	struct vm_area_struct *vma = pvmw->vma;
@@ -4064,12 +4066,15 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 	lockdep_assert_held(pvmw->ptl);
 	VM_WARN_ON_ONCE_FOLIO(folio_test_lru(folio), folio);
 
+	if (!ptep_clear_young_notify(vma, addr, pte))
+		return false;
+
 	if (spin_is_contended(pvmw->ptl))
-		return;
+		return true;
 
 	/* exclude special VMAs containing anon pages from COW */
 	if (vma->vm_flags & VM_SPECIAL)
-		return;
+		return true;
 
 	/* avoid taking the LRU lock under the PTL when possible */
 	walk = current->reclaim_state ? current->reclaim_state->mm_walk : NULL;
@@ -4077,6 +4082,9 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 	start = max(addr & PMD_MASK, vma->vm_start);
 	end = min(addr | ~PMD_MASK, vma->vm_end - 1) + 1;
 
+	if (end - start == PAGE_SIZE)
+		return true;
+
 	if (end - start > MIN_LRU_BATCH * PAGE_SIZE) {
 		if (addr - start < MIN_LRU_BATCH * PAGE_SIZE / 2)
 			end = start + MIN_LRU_BATCH * PAGE_SIZE;
@@ -4090,7 +4098,7 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 
 	/* folio_update_gen() requires stable folio_memcg() */
 	if (!mem_cgroup_trylock_pages(memcg))
-		return;
+		return true;
 
 	arch_enter_lazy_mmu_mode();
 
@@ -4100,19 +4108,16 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 		unsigned long pfn;
 		pte_t ptent = ptep_get(pte + i);
 
-		pfn = get_pte_pfn(ptent, vma, addr);
+		pfn = get_pte_pfn(ptent, vma, addr, pgdat);
 		if (pfn == -1)
 			continue;
 
-		if (!pte_young(ptent))
-			continue;
-
 		folio = get_pfn_folio(pfn, memcg, pgdat, can_swap);
 		if (!folio)
 			continue;
 
-		if (!ptep_test_and_clear_young(vma, addr, pte + i))
-			VM_WARN_ON_ONCE(true);
+		if (!ptep_clear_young_notify(vma, addr, pte + i))
+			continue;
 
 		young++;
 
@@ -4144,6 +4149,8 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 	/* feedback from rmap walkers to page table walkers */
 	if (mm_state && suitable_to_scan(i, young))
 		update_bloom_filter(mm_state, max_seq, pvmw->pmd);
+
+	return true;
 }
 
 /******************************************************************************