From patchwork Wed Apr 17 14:11:11 2024
X-Patchwork-Submitter: Lance Yang
X-Patchwork-Id: 13633425
From: Lance Yang <ioworker0@gmail.com>
To: akpm@linux-foundation.org
Cc: willy@infradead.org, maskray@google.com, ziy@nvidia.com,
    ryan.roberts@arm.com, david@redhat.com, 21cnbao@gmail.com,
    mhocko@suse.com, fengwei.yin@intel.com, zokeefe@google.com,
    shy828301@gmail.com, xiehuan09@gmail.com, wangkefeng.wang@huawei.com,
    songmuchun@bytedance.com, peterx@redhat.com, minchan@kernel.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Lance Yang <ioworker0@gmail.com>
Subject: [PATCH 1/1] mm/vmscan: avoid split PMD-mapped THP during shrink_folio_list()
Date: Wed, 17 Apr 2024 22:11:11 +0800
Message-Id: <20240417141111.77855-1-ioworker0@gmail.com>
X-Mailer: git-send-email 2.33.1

When the user no longer requires the pages, they typically use
madvise(MADV_FREE) to mark them as lazyfree. IMO, they would not
usually write to the given range again.

At present, a lazyfree PMD-mapped THP encountered during
shrink_folio_list() is unconditionally split, which may be unnecessary.
If the THP is exclusively mapped and clean, and the PMD that maps it is
also clean, we can instead attempt to remove the PMD mapping directly,
without splitting. This improves the efficiency of memory reclamation
in this case.

On an Intel i5 CPU, reclaiming 1GiB of PMD-mapped THPs using
mem_cgroup_force_empty() results in the following runtimes in seconds
(shorter is better):

---------------------------------------------
|     Old      |     New      |   Change     |
---------------------------------------------
|   0.683426   |   0.049197   |   -92.80%    |
---------------------------------------------

Signed-off-by: Lance Yang <ioworker0@gmail.com>
---
 include/linux/huge_mm.h |  1 +
 include/linux/rmap.h    |  1 +
 mm/huge_memory.c        |  2 +-
 mm/rmap.c               | 81 +++++++++++++++++++++++++++++++++++++++++
 mm/vmscan.c             |  7 ++++
 5 files changed, 91 insertions(+), 1 deletion(-)
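For context (not part of the change itself): the lazyfree scenario the
commit message describes can be exercised from userspace roughly as in
the sketch below. This is a minimal, hypothetical example; it assumes
the PMD/THP size is 2MiB on the machine and that THP is enabled in
"madvise" or "always" mode so the region can be PMD-mapped.

/* lazyfree_thp_demo.c -- illustrative sketch only, not part of this patch. */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define THP_SIZE (2UL << 20)	/* assumed PMD-mapped THP size (2MiB) */

int main(void)
{
	/* Over-allocate so a THP_SIZE-aligned window can be carved out. */
	size_t len = 2 * THP_SIZE;
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	char *thp = (char *)(((unsigned long)buf + THP_SIZE - 1) &
			     ~(THP_SIZE - 1));

	/* Ask for a THP mapping, then fault it in by writing to it. */
	madvise(thp, THP_SIZE, MADV_HUGEPAGE);
	memset(thp, 0x5a, THP_SIZE);

	/*
	 * The pages are no longer needed: mark them lazyfree. As long as
	 * the range is not written again, reclaim may discard the THP,
	 * which is the case this patch optimizes.
	 */
	madvise(thp, THP_SIZE, MADV_FREE);

	pause();	/* keep the mapping alive for observation */
	return 0;
}
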
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 7cd07b83a3d0..02a71c05f68a 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -36,6 +36,7 @@ bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		    pmd_t *pmd, unsigned long addr, pgprot_t newprot,
 		    unsigned long cp_flags);
+inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd);
 
 vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write);
 vm_fault_t vmf_insert_pfn_pud(struct vm_fault *vmf, pfn_t pfn, bool write);
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 0f906dc6d280..8c2f45713351 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -100,6 +100,7 @@ enum ttu_flags {
 					 * do a final flush if necessary */
 	TTU_RMAP_LOCKED		= 0x80,	/* do not grab rmap lock:
 					 * caller holds it */
+	TTU_LAZYFREE_THP	= 0x100, /* avoid split PMD-mapped THP */
 };
 
 #ifdef CONFIG_MMU
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 58f2c4745d80..309fba9624c2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1801,7 +1801,7 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	return ret;
 }
 
-static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
+inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
 {
 	pgtable_t pgtable;
 
diff --git a/mm/rmap.c b/mm/rmap.c
index 2608c40dffad..4994f9e402d4 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -77,6 +77,7 @@
 #include <linux/mm_inline.h>
 
 #include <asm/tlbflush.h>
+#include <asm/tlb.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/tlb.h>
@@ -1606,6 +1607,80 @@ void folio_remove_rmap_pmd(struct folio *folio, struct page *page,
 #endif
 }
 
+static bool __try_to_unmap_lazyfree_thp(struct vm_area_struct *vma,
+					unsigned long address,
+					struct folio *folio)
+{
+	spinlock_t *ptl;
+	pmd_t *pmdp, orig_pmd;
+	struct mmu_notifier_range range;
+	struct mmu_gather tlb;
+	struct mm_struct *mm = vma->vm_mm;
+	struct page *page;
+	bool ret = false;
+
+	VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
+	VM_WARN_ON_FOLIO(folio_test_swapbacked(folio), folio);
+	VM_WARN_ON_FOLIO(!folio_test_pmd_mappable(folio), folio);
+	VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
+
+	/*
+	 * If we encounter a PMD-mapped THP that is marked as lazyfree,
+	 * we will try to unmap it without splitting.
+	 *
+	 * An exclusively mapped folio should have only two refs:
+	 * one from the isolation and one from the rmap.
+	 */
+	if (folio_entire_mapcount(folio) != 1 || folio_test_dirty(folio) ||
+	    folio_ref_count(folio) != 2)
+		return false;
+
+	pmdp = mm_find_pmd(mm, address);
+	if (unlikely(!pmdp))
+		return false;
+	if (pmd_dirty(*pmdp))
+		return false;
+
+	tlb_gather_mmu(&tlb, mm);
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm,
+				address & HPAGE_PMD_MASK,
+				(address & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE);
+	mmu_notifier_invalidate_range_start(&range);
+
+	ptl = pmd_lock(mm, pmdp);
+	orig_pmd = *pmdp;
+	if (unlikely(!pmd_present(orig_pmd) || !pmd_trans_huge(orig_pmd)))
+		goto out;
+
+	page = pmd_page(orig_pmd);
+	if (unlikely(page_folio(page) != folio))
+		goto out;
+
+	orig_pmd = pmdp_huge_get_and_clear(mm, address, pmdp);
+	tlb_remove_pmd_tlb_entry(&tlb, pmdp, address);
+	/*
+	 * There is a race between the first check of the dirty bit
+	 * for the PMD and the TLB entry flush. If the PMD is re-dirtied
+	 * at this point, we will return to try_to_unmap_one() to call
+	 * split_huge_pmd_address() to split it.
+	 */
+	if (pmd_dirty(orig_pmd))
+		set_pmd_at(mm, address, pmdp, orig_pmd);
+	else {
+		folio_remove_rmap_pmd(folio, page, vma);
+		zap_deposited_table(mm, pmdp);
+		add_mm_counter(mm, MM_ANONPAGES, -HPAGE_PMD_NR);
+		folio_put(folio);
+		ret = true;
+	}
+
+out:
+	spin_unlock(ptl);
+	mmu_notifier_invalidate_range_end(&range);
+
+	return ret;
+}
+
 /*
  * @arg: enum ttu_flags will be passed to this argument
  */
@@ -1631,6 +1706,12 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 	if (flags & TTU_SYNC)
 		pvmw.flags = PVMW_SYNC;
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	if (flags & TTU_LAZYFREE_THP)
+		if (__try_to_unmap_lazyfree_thp(vma, address, folio))
+			return true;
+#endif
+
 	if (flags & TTU_SPLIT_HUGE_PMD)
 		split_huge_pmd_address(vma, address, false, folio);
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 49bd94423961..2358b1cff8bf 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1277,6 +1277,13 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 
 			if (folio_test_pmd_mappable(folio))
 				flags |= TTU_SPLIT_HUGE_PMD;
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+			if (folio_test_anon(folio) && !was_swapbacked &&
+			    flags & TTU_SPLIT_HUGE_PMD)
+				flags |= TTU_LAZYFREE_THP;
+#endif
+
 			/*
 			 * Without TTU_SYNC, try_to_unmap will only begin to
 			 * hold PTL from the first present PTE within a large