From patchwork Wed Feb 26 12:03:12 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13992200
From: Byungchul Park <byungchul@sk.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on v6.14-rc4 01/25] x86/tlb: add APIs manipulating tlb batch's arch data
Date: Wed, 26 Feb 2025 21:03:12 +0900
Message-Id: <20250226120336.29565-1-byungchul@sk.com>
In-Reply-To: <20250226113024.GA1935@system.software.com>
References: <20250226113024.GA1935@system.software.com>

A new mechanism, LUF (Lazy Unmap Flush), defers tlb flush until folios
that have been unmapped and freed eventually get allocated again.  This
is safe for folios that had been mapped read-only and were then
unmapped, since the contents of those folios do not change while they
stay in pcp or buddy, so the data can still be read correctly through
the stale tlb entries.

This is a preparation for the mechanism, which needs to recognize
read-only tlb entries by splitting the tlb batch's arch data into two
parts, one for read-only entries and the other for writable ones, and
by merging the two when needed.  It also optimizes tlb shootdown by
skipping CPUs that have already performed the required tlb flush.  To
support this, add APIs for manipulating the tlb batch's arch data on
x86.

Signed-off-by: Byungchul Park <byungchul@sk.com>
---
 arch/x86/include/asm/tlbflush.h | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

base-commit: d082ecbc71e9e0bf49883ee4afd435a77a5101b6

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 02fc2aa06e9e0..c27e61bd274a5 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -5,6 +5,7 @@
 #include
 #include
 #include
+#include

 #include
 #include
@@ -294,6 +295,29 @@ static inline void arch_flush_tlb_batched_pending(struct mm_struct *mm)

 extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);

+static inline void arch_tlbbatch_clear(struct arch_tlbflush_unmap_batch *batch)
+{
+        cpumask_clear(&batch->cpumask);
+}
+
+static inline void arch_tlbbatch_fold(struct arch_tlbflush_unmap_batch *bdst,
+                                      struct arch_tlbflush_unmap_batch *bsrc)
+{
+        cpumask_or(&bdst->cpumask, &bdst->cpumask, &bsrc->cpumask);
+}
+
+static inline bool arch_tlbbatch_need_fold(struct arch_tlbflush_unmap_batch *batch,
+                                           struct mm_struct *mm)
+{
+        return !cpumask_subset(mm_cpumask(mm), &batch->cpumask);
+}
+
+static inline bool arch_tlbbatch_done(struct arch_tlbflush_unmap_batch *bdst,
+                                      struct arch_tlbflush_unmap_batch *bsrc)
+{
+        return !cpumask_andnot(&bdst->cpumask, &bdst->cpumask, &bsrc->cpumask);
+}
+
 static inline bool pte_flags_need_flush(unsigned long oldflags,
                                         unsigned long newflags,
                                         bool ignore_access)
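
The four helpers above are thin wrappers around cpumask operations:
clear empties the batch, fold ORs one batch into another, need_fold
asks whether an mm runs on CPUs the batch does not cover yet, and done
removes CPUs that another batch has already flushed.  As a rough
standalone sketch of those semantics (this is not kernel code: the
struct and function names below are invented for the example, and a
cpumask is modeled as a single unsigned long bitmap):

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model of struct arch_tlbflush_unmap_batch: one bit per CPU. */
struct batch { unsigned long cpumask; };

static void batch_clear(struct batch *b)                      { b->cpumask = 0; }
static void batch_fold(struct batch *dst, struct batch *src)  { dst->cpumask |= src->cpumask; }

/* true if @mm_mask contains CPUs the batch does not cover yet */
static bool batch_need_fold(struct batch *b, unsigned long mm_mask)
{
        return (mm_mask & ~b->cpumask) != 0;
}

/* Drop CPUs already flushed by @done; true if nothing is left pending. */
static bool batch_done(struct batch *pending, struct batch *done)
{
        pending->cpumask &= ~done->cpumask;
        return pending->cpumask == 0;
}

int main(void)
{
        struct batch ro = { 0 }, rw = { 0 }, flushed = { 0 };

        ro.cpumask = 0x3;       /* read-only unmaps seen on CPU0 and CPU1 */
        rw.cpumask = 0x4;       /* writable unmap seen on CPU2 */

        /* An mm running on CPU2 is not covered by the read-only batch yet. */
        assert(batch_need_fold(&ro, 0x4));

        batch_fold(&ro, &rw);                   /* merge the two batches */
        assert(!batch_need_fold(&ro, 0x4));

        flushed.cpumask = 0x7;                  /* CPU0-2 flushed meanwhile */
        assert(batch_done(&ro, &flushed));      /* nothing left to shoot down */

        batch_clear(&ro);
        printf("ok\n");
        return 0;
}
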
From patchwork Wed Feb 26 12:03:13 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13992195
From: Byungchul Park <byungchul@sk.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on v6.14-rc4 02/25] arm64/tlbflush: add APIs manipulating tlb batch's arch data
Date: Wed, 26 Feb 2025 21:03:13 +0900
Message-Id: <20250226120336.29565-2-byungchul@sk.com>
In-Reply-To: <20250226120336.29565-1-byungchul@sk.com>
References: <20250226113024.GA1935@system.software.com> <20250226120336.29565-1-byungchul@sk.com>

A new mechanism, LUF (Lazy Unmap Flush), defers tlb flush until folios
that have been unmapped and freed eventually get allocated again.  This
is safe for folios that had been mapped read-only and were then
unmapped, since the contents of those folios do not change while they
stay in pcp or buddy, so the data can still be read correctly through
the stale tlb entries.

This is a preparation for the mechanism, which requires manipulating
the tlb batch's arch data.  Even though arm64 does not need to do
anything for these tlb operations, every arch with
CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH should provide the APIs.

Signed-off-by: Byungchul Park <byungchul@sk.com>
---
 arch/arm64/include/asm/tlbflush.h | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index bc94e036a26b9..acac53a21e5d1 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -354,6 +354,33 @@ static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
         dsb(ish);
 }

+static inline void arch_tlbbatch_clear(struct arch_tlbflush_unmap_batch *batch)
+{
+        /* nothing to do */
+}
+
+static inline void arch_tlbbatch_fold(struct arch_tlbflush_unmap_batch *bdst,
+                                      struct arch_tlbflush_unmap_batch *bsrc)
+{
+        /* nothing to do */
+}
+
+static inline bool arch_tlbbatch_need_fold(struct arch_tlbflush_unmap_batch *batch,
+                                           struct mm_struct *mm)
+{
+        /*
+         * Nothing is needed in this architecture.
+         */
+        return false;
+}
+
+static inline bool arch_tlbbatch_done(struct arch_tlbflush_unmap_batch *bdst,
+                                      struct arch_tlbflush_unmap_batch *bsrc)
+{
+        /* Kernel can consider tlb batch always has been done. */
+        return true;
+}
+
 /*
  * This is meant to avoid soft lock-ups on large TLB flushing ranges and not
  * necessarily a performance improvement.

From patchwork Wed Feb 26 12:03:14 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13992196
From: Byungchul Park <byungchul@sk.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on v6.14-rc4 03/25] riscv/tlb: add APIs manipulating tlb batch's arch data
Date: Wed, 26 Feb 2025 21:03:14 +0900
Message-Id: <20250226120336.29565-3-byungchul@sk.com>
In-Reply-To: <20250226120336.29565-1-byungchul@sk.com>
References: <20250226113024.GA1935@system.software.com> <20250226120336.29565-1-byungchul@sk.com>

xdf1v5gtnn7qY7G4vGsOm8W9Nf9ZLc7vWstqsWPpPiaLSwcWMFkc7z3AZDH/3mc2i82bpjJb HJ8yldHi9485bA58Ht9b+1g8ds66y+6xYFOpx+YVWh6bVnWyeWz6NInd4925c+weJ2b8ZvF4 v+8qm8fWX3YejVOvsXl83iQXwBPFZZOSmpNZllqkb5fAlbFj7V2WgmtCFU8e/2NqYHzF38XI ySEhYCIxYdMHNhh78YnTYDabgLrEjRs/mUFsEQEziYOtf9i7GLk4mAWWMUnsPdEAViQskCHR f2I/O4jNIqAqMbF1L2sXIwcHr4CpxPcN5hAz5SVWbzgANocTaM6nacfAWoUEkiV2/v7DBDJT QuA+m8TzvvesEA2SEgdX3GCZwMi7gJFhFaNQZl5ZbmJmjoleRmVeZoVecn7uJkZgUC+r/RO9 g/HTheBDjAIcjEo8vA/O7E0XYk0sK67MPcQowcGsJMLLmbknXYg3JbGyKrUoP76oNCe1+BCj NAeLkjiv0bfyFCGB9MSS1OzU1ILUIpgsEwenVANj/Hzjbvdwn806nk28n7711UZUe6T+yX6x 1U3zvFFDU/UMPt9os5JAy4g/JeULc3/L/FJdcfjPBRPh97u+bF150p9D7dhxhTO2eoyZ6w57 zLeUyHV+LrRj8kGrg8u282yt8PSdId+u5Cs0fV+u4UfG1zlir16omG/9+d/WmDW4XO5krN+D z1JKLMUZiYZazEXFiQCHrFxAZgIAAA== X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFtrFLMWRmVeSWpSXmKPExsXC5WfdrKvEvD/d4M4KAYs569ewWXze8I/N 4uv6X8wWTz/1sVgcnnuS1eLyrjlsFvfW/Ge1OL9rLavFjqX7mCwuHVjAZHG89wCTxfx7n9ks Nm+aymxxfMpURovfP+awOfB7fG/tY/HYOesuu8eCTaUem1doeWxa1cnmsenTJHaPd+fOsXuc mPGbxeP9vqtsHotffGDy2PrLzqNx6jU2j8+b5AJ4o7hsUlJzMstSi/TtErgydqy9y1JwTaji yeN/TA2Mr/i7GDk5JARMJBafOM0GYrMJqEvcuPGTGcQWETCTONj6h72LkYuDWWAZk8TeEw1g RcICGRL9J/azg9gsAqoSE1v3snYxcnDwCphKfN9gDjFTXmL1hgNgcziB5nyadgysVUggWWLn 7z9MExi5FjAyrGIUycwry03MzDHVK87OqMzLrNBLzs/dxAgM0WW1fybuYPxy2f0QowAHoxIP 74Mze9OFWBPLiitzDzFKcDArifByZu5JF+JNSaysSi3Kjy8qzUktPsQozcGiJM7rFZ6aICSQ nliSmp2aWpBaBJNl4uCUamC8b7pDp87Q/FDLVWEh/bvTGf4f+7Buus8uhoOLSqabT5zR8Ell TuDG8pSY6LMFwbziKy99neTONN9DR3t69BOxd3Zsz+yk1t2R5eS167+5xPCdXavwoX2fPDme WK3Y95iheek0j/7c2L9GW7qfzii4V8gsF/KX805rTmbHIs+qIrYIyR/HVxgpsRRnJBpqMRcV JwIAI6pn4k0CAAA= X-CFilter-Loop: Reflected X-Rspam-User: X-Rspamd-Server: rspam11 X-Rspamd-Queue-Id: EB4F940008 X-Stat-Signature: jj98fjrxtty9rfhj1n5em1drc7azyd4j X-HE-Tag: 1740571427-458276 X-HE-Meta: U2FsdGVkX1+DnxQ1OHju2HMAKrpesPdJ4mr3BSOXre5yxlfZcgu6UuGvdviVKHYbjOrG4Txn19zaZm5VrKgGT3Z9BilFpHcaKN8I8LwKEDnbnxHFO9INjmsMMRMX3SW9Fwx6heVmp3JCGfS2a0mc0wDpZ1RHIzOknUECH0z4mKa2ty8JB+Y6OHMYgZPYs7VLV7YfQ+LWKfbFo49hyxPJE7J0yD4P2gdcF5ufSqA3jv1NFbt0PyZJshdOw4EwYhaQMYU7NxE6/5dyRHq3owIt/2a4lWkmvzGp5IH49eM1UTSA00nZH7GedrhplybbSb9pDts9ymaoIFilHInJOOB7s6en0DCXW+RSu1buATd4VpSfBAENqWDb6A4+KH3SmS4O2M2aCyGIWglOSa+YtJBSknF8AbY6uqNZnI8YnbHkegIq94A0bsqXMZaLIAY4Dh3NrAVRwzwOqIfBYhQG1lCvidZg+oM6KGJJmWW/RrMjFSxq6NolJX9dZ+ZSM45fnii9ojxpe6P4NE7veWQPKCW5mVXOVoJ4CpCYrMYhCUu86099uWxhr00hcYU55wHMiBsJu5JEDj5Lyy8NfRB7ALMWahomM9Jfbo549KJdQIZfVhsbEF6lrAmt6UcQRdcMEhhc2k1R+SrcLNLtAs0i5K/svej2rcT37vQdXe/CU/kMEp0YDE//h0AATffymCWxJ8nm5Cag+BfEd0c5nc8Cnu4jjp2oJBhAt83k01FJnO/8SKkgTOTdrm8Mk66cfTErvB0c9jILaZ7S/Bbhwj+pCPBP2EeIxkz2r0o3r9fMiIbkh6GXj+cB/tdwgIQHy2LgQVHPYptnA21NFJv/Z9L47hJ4gwnmJ4qe07x22B0hh9G4v2nuIpUWFXOMlYXN3MvIw7oJhr0J07u56KCA5UzZy+xb2Oa6J7Hdh7aSFrU+4kLP1XrNRMbfI1nB2544yXqv80ppZ8TqHVQRCbmZJ7Tu1rv HUp7ucqU 5TrY5cdaCYKTsB79zdaHaQw/KZcdMBcSDaQLw X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: A new mechanism, LUF(Lazy Unmap Flush), defers tlb flush until folios that have been unmapped and freed, eventually get allocated again. It's safe for folios that had been mapped read only and were unmapped, since the contents of the folios don't change while staying in pcp or buddy so we can still read the data through the stale tlb entries. 
This is a preparation for the mechanism, which needs to recognize
read-only tlb entries by splitting the tlb batch's arch data into two
parts, one for read-only entries and the other for writable ones, and
by merging the two when needed.  It also optimizes tlb shootdown by
skipping CPUs that have already performed the required tlb flush.  To
support this, add APIs for manipulating the tlb batch's arch data on
riscv.

Signed-off-by: Byungchul Park <byungchul@sk.com>
---
 arch/riscv/include/asm/tlbflush.h | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
index 72e5599349529..1dc7d30273d59 100644
--- a/arch/riscv/include/asm/tlbflush.h
+++ b/arch/riscv/include/asm/tlbflush.h
@@ -8,6 +8,7 @@
 #define _ASM_RISCV_TLBFLUSH_H

 #include
+#include
 #include
 #include
@@ -65,6 +66,33 @@ void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
 void arch_flush_tlb_batched_pending(struct mm_struct *mm);
 void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);

+static inline void arch_tlbbatch_clear(struct arch_tlbflush_unmap_batch *batch)
+{
+        cpumask_clear(&batch->cpumask);
+
+}
+
+static inline void arch_tlbbatch_fold(struct arch_tlbflush_unmap_batch *bdst,
+                                      struct arch_tlbflush_unmap_batch *bsrc)
+{
+        cpumask_or(&bdst->cpumask, &bdst->cpumask, &bsrc->cpumask);
+
+}
+
+static inline bool arch_tlbbatch_need_fold(struct arch_tlbflush_unmap_batch *batch,
+                                           struct mm_struct *mm)
+{
+        return !cpumask_subset(mm_cpumask(mm), &batch->cpumask);
+
+}
+
+static inline bool arch_tlbbatch_done(struct arch_tlbflush_unmap_batch *bdst,
+                                      struct arch_tlbflush_unmap_batch *bsrc)
+{
+        return !cpumask_andnot(&bdst->cpumask, &bdst->cpumask, &bsrc->cpumask);
+
+}
+
 extern unsigned long tlb_flush_all_threshold;
 #else /* CONFIG_MMU */
 #define local_flush_tlb_all()                   do { } while (0)

From patchwork Wed Feb 26 12:03:15 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13992219
From: Byungchul Park <byungchul@sk.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on v6.14-rc4 04/25] x86/tlb, riscv/tlb, mm/rmap: separate arch_tlbbatch_clear() out of arch_tlbbatch_flush()
Date: Wed, 26 Feb 2025 21:03:15 +0900
Message-Id: <20250226120336.29565-4-byungchul@sk.com>
In-Reply-To: <20250226120336.29565-1-byungchul@sk.com>
References: <20250226113024.GA1935@system.software.com> <20250226120336.29565-1-byungchul@sk.com>

A new mechanism, LUF (Lazy Unmap Flush), defers tlb flush until folios
that have been unmapped and freed eventually get allocated again.  This
is safe for folios that had been mapped read-only and were then
unmapped, since the contents of those folios do not change while they
stay in pcp or buddy, so the data can still be read correctly through
the stale tlb entries.

This is a preparation for the mechanism, which needs to avoid redundant
tlb flushes by manipulating the tlb batch's arch data.  To achieve
that, separate the part that clears the tlb batch's arch data out of
arch_tlbbatch_flush().

Signed-off-by: Byungchul Park <byungchul@sk.com>
---
 arch/riscv/mm/tlbflush.c | 1 -
 arch/x86/mm/tlb.c        | 2 --
 mm/rmap.c                | 1 +
 3 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index 9b6e86ce38674..36f996af6256c 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -201,5 +201,4 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 {
         __flush_tlb_range(&batch->cpumask, FLUSH_TLB_NO_ASID, 0,
                           FLUSH_TLB_MAX_SIZE, PAGE_SIZE);
-        cpumask_clear(&batch->cpumask);
 }
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 6cf881a942bbe..523e8bb6fba1f 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1292,8 +1292,6 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
                 local_irq_enable();
         }

-        cpumask_clear(&batch->cpumask);
-
         put_flush_tlb_info();
         put_cpu();
 }
diff --git a/mm/rmap.c b/mm/rmap.c
index c6c4d4ea29a7e..2de01de164ef0 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -648,6 +648,7 @@ void try_to_unmap_flush(void)
                 return;

         arch_tlbbatch_flush(&tlb_ubc->arch);
+        arch_tlbbatch_clear(&tlb_ubc->arch);
         tlb_ubc->flush_required = false;
         tlb_ubc->writable = false;
 }
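
The point of the separation is that arch_tlbbatch_flush() no longer
destroys the batch's cpumask as a side effect; the caller now decides
when that state is consumed.  The following toy model (plain userspace
C with invented names such as arch_flush(), arch_clear() and
flush_and_report(); a sketch of the idea only, not the kernel code)
shows how a caller can still inspect the mask it just flushed before
clearing it:

#include <stdbool.h>
#include <stdio.h>

/* Toy model: one bit per CPU, standing in for the batch cpumask. */
struct batch { unsigned long cpumask; };

static void arch_flush(struct batch *b)
{
        /* real code would IPI the CPUs in the mask here */
        printf("flush CPUs 0x%lx\n", b->cpumask);
        /* note: no clearing here any more */
}

static void arch_clear(struct batch *b)
{
        b->cpumask = 0;
}

/* Mirrors try_to_unmap_flush() after this patch: flush, then clear. */
static void toy_try_to_unmap_flush(struct batch *b, bool *flush_required)
{
        if (!*flush_required)
                return;
        arch_flush(b);
        arch_clear(b);
        *flush_required = false;
}

/*
 * A hypothetical LUF-style caller can instead read the mask it just
 * flushed before clearing, e.g. to record which CPUs are now clean.
 */
static unsigned long flush_and_report(struct batch *b)
{
        unsigned long flushed;

        arch_flush(b);
        flushed = b->cpumask;   /* still valid thanks to the split */
        arch_clear(b);
        return flushed;
}

int main(void)
{
        struct batch b = { .cpumask = 0x5 };
        bool need = true;

        toy_try_to_unmap_flush(&b, &need);
        b.cpumask = 0xa;
        printf("flushed mask 0x%lx\n", flush_and_report(&b));
        return 0;
}
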
From patchwork Wed Feb 26 12:03:16 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13992197
From: Byungchul Park <byungchul@sk.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on v6.14-rc4 05/25] mm/buddy: make room for a new variable, luf_key, in struct page
Date: Wed, 26 Feb 2025 21:03:16 +0900
Message-Id: <20250226120336.29565-5-byungchul@sk.com>
In-Reply-To: <20250226120336.29565-1-byungchul@sk.com>
References: <20250226113024.GA1935@system.software.com> <20250226120336.29565-1-byungchul@sk.com>

Functionally, no change.  This is a preparation for the luf mechanism,
which tracks the need of tlb flush for each page residing in buddy.

While a page is in buddy, the private field in struct page is used only
to store the page order, which ranges from 0 to MAX_PAGE_ORDER and
therefore fits in an unsigned short.  So split the field into two
smaller ones, order and luf_key, so that both can be used in buddy at
the same time.

Signed-off-by: Byungchul Park <byungchul@sk.com>
---
 include/linux/mm_types.h | 42 +++++++++++++++++++++++++++++++++-------
 mm/internal.h            |  4 ++--
 mm/page_alloc.c          |  2 +-
 3 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 0234f14f2aa6b..7d78a285e52ca 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -106,13 +106,27 @@ struct page {
                         pgoff_t index;          /* Our offset within mapping. */
                         unsigned long share;    /* share count for fsdax */
                 };
-                /**
-                 * @private: Mapping-private opaque data.
-                 * Usually used for buffer_heads if PagePrivate.
-                 * Used for swp_entry_t if swapcache flag set.
-                 * Indicates order in the buddy system if PageBuddy.
-                 */
-                unsigned long private;
+                union {
+                        /**
+                         * @private: Mapping-private opaque data.
+                         * Usually used for buffer_heads if PagePrivate.
+                         * Used for swp_entry_t if swapcache flag set.
+                         * Indicates order in the buddy system if PageBuddy.
+                         */
+                        unsigned long private;
+                        struct {
+                                /*
+                                 * Indicates order in the buddy system if PageBuddy.
+                                 */
+                                unsigned short order;
+
+                                /*
+                                 * For tracking need of tlb flush,
+                                 * by luf(lazy unmap flush).
+                                 */
+                                unsigned short luf_key;
+                        };
+                };
         };
         struct {        /* page_pool used by netstack */
                 /**
@@ -566,6 +580,20 @@ static inline void set_page_private(struct page *page, unsigned long private)
         page->private = private;
 }

+#define page_buddy_order(page)          ((page)->order)
+
+static inline void set_page_buddy_order(struct page *page, unsigned int order)
+{
+        page->order = (unsigned short)order;
+}
+
+#define page_luf_key(page)              ((page)->luf_key)
+
+static inline void set_page_luf_key(struct page *page, unsigned short luf_key)
+{
+        page->luf_key = luf_key;
+}
+
 static inline void *folio_get_private(struct folio *folio)
 {
         return folio->private;
diff --git a/mm/internal.h b/mm/internal.h
index 109ef30fee11f..d7161a6e0b352 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -543,7 +543,7 @@ struct alloc_context {
 static inline unsigned int buddy_order(struct page *page)
 {
         /* PageBuddy() must be checked by the caller */
-        return page_private(page);
+        return page_buddy_order(page);
 }

 /*
@@ -557,7 +557,7 @@ static inline unsigned int buddy_order(struct page *page)
  * times, potentially observing different values in the tests and the actual
  * use of the result.
  */
-#define buddy_order_unsafe(page)        READ_ONCE(page_private(page))
+#define buddy_order_unsafe(page)        READ_ONCE(page_buddy_order(page))

 /*
  * This function checks whether a page is free && is the buddy
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 579789600a3c7..c08b1389d5671 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -576,7 +576,7 @@ void prep_compound_page(struct page *page, unsigned int order)

 static inline void set_buddy_order(struct page *page, unsigned int order)
 {
-        set_page_private(page, order);
+        set_page_buddy_order(page, order);
         __SetPageBuddy(page);
 }
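
To see why the split fits, note that unsigned long private is at least
32 bits wide, while on Linux targets an unsigned short is 16 bits, so
two of them can alias its storage.  A minimal standalone demonstration
(struct fake_page below is an invented, simplified stand-in for the
relevant part of struct page, not the real definition):

#include <assert.h>
#include <stdio.h>

/*
 * Simplified stand-in for the relevant part of struct page: while a
 * page sits in buddy, "private" only ever holds the page order
 * (0..MAX_PAGE_ORDER), so two unsigned shorts can share its storage.
 */
struct fake_page {
        union {
                unsigned long private;
                struct {
                        unsigned short order;
                        unsigned short luf_key;
                };
        };
};

int main(void)
{
        struct fake_page page = { 0 };

        /* Both fields can be live at the same time while in buddy. */
        page.order = 9;         /* e.g. a 2MB block with 4K pages */
        page.luf_key = 1234;    /* pending-TLB-flush tracking key */

        assert(page.order == 9 && page.luf_key == 1234);
        assert(sizeof(page.order) + sizeof(page.luf_key) <= sizeof(page.private));

        printf("private=%#lx order=%u luf_key=%u\n",
               page.private, page.order, page.luf_key);
        return 0;
}
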
From patchwork Wed Feb 26 12:03:17 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13992198
From: Byungchul Park <byungchul@sk.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on v6.14-rc4 06/25] mm: move should_skip_kasan_poison() to mm/internal.h
Date: Wed, 26 Feb 2025 21:03:17 +0900
Message-Id: <20250226120336.29565-6-byungchul@sk.com>
In-Reply-To: <20250226120336.29565-1-byungchul@sk.com>
References: <20250226113024.GA1935@system.software.com> <20250226120336.29565-1-byungchul@sk.com>

Functionally, no change.  This is a preparation for the luf mechanism,
which needs to use the should_skip_kasan_poison() function from
mm/internal.h.

Signed-off-by: Byungchul Park <byungchul@sk.com>
---
 mm/internal.h   | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 mm/page_alloc.c | 47 -----------------------------------------------
 2 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index d7161a6e0b352..4c8ed93a792ec 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1051,8 +1051,55 @@ static inline void vunmap_range_noflush(unsigned long start, unsigned long end)
 DECLARE_STATIC_KEY_TRUE(deferred_pages);

 bool __init deferred_grow_zone(struct zone *zone, unsigned int order);
+
+static inline bool deferred_pages_enabled(void)
+{
+        return static_branch_unlikely(&deferred_pages);
+}
+#else
+static inline bool deferred_pages_enabled(void)
+{
+        return false;
+}
 #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */

+/*
+ * Skip KASAN memory poisoning when either:
+ *
+ * 1. For generic KASAN: deferred memory initialization has not yet completed.
+ *    Tag-based KASAN modes skip pages freed via deferred memory initialization
+ *    using page tags instead (see below).
+ * 2. For tag-based KASAN modes: the page has a match-all KASAN tag, indicating
+ *    that error detection is disabled for accesses via the page address.
+ *
+ * Pages will have match-all tags in the following circumstances:
+ *
+ * 1. Pages are being initialized for the first time, including during deferred
+ *    memory init; see the call to page_kasan_tag_reset in __init_single_page.
+ * 2. The allocation was not unpoisoned due to __GFP_SKIP_KASAN, with the
+ *    exception of pages unpoisoned by kasan_unpoison_vmalloc.
+ * 3. The allocation was excluded from being checked due to sampling,
+ *    see the call to kasan_unpoison_pages.
+ *
+ * Poisoning pages during deferred memory init will greatly lengthen the
+ * process and cause problem in large memory systems as the deferred pages
+ * initialization is done with interrupt disabled.
+ *
+ * Assuming that there will be no reference to those newly initialized
+ * pages before they are ever allocated, this should have no effect on
+ * KASAN memory tracking as the poison will be properly inserted at page
+ * allocation time. The only corner case is when pages are allocated by
+ * on-demand allocation and then freed again before the deferred pages
+ * initialization is done, but this is not likely to happen.
+ */
+static inline bool should_skip_kasan_poison(struct page *page)
+{
+        if (IS_ENABLED(CONFIG_KASAN_GENERIC))
+                return deferred_pages_enabled();
+
+        return page_kasan_tag(page) == KASAN_TAG_KERNEL;
+}
+
 enum mminit_level {
         MMINIT_WARNING,
         MMINIT_VERIFY,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c08b1389d5671..27aeee0cfcf8f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -299,11 +299,6 @@ int page_group_by_mobility_disabled __read_mostly;
  */
 DEFINE_STATIC_KEY_TRUE(deferred_pages);

-static inline bool deferred_pages_enabled(void)
-{
-        return static_branch_unlikely(&deferred_pages);
-}
-
 /*
  * deferred_grow_zone() is __init, but it is called from
  * get_page_from_freelist() during early boot until deferred_pages permanently
@@ -316,11 +311,6 @@ _deferred_grow_zone(struct zone *zone, unsigned int order)
         return deferred_grow_zone(zone, order);
 }
 #else
-static inline bool deferred_pages_enabled(void)
-{
-        return false;
-}
-
 static inline bool _deferred_grow_zone(struct zone *zone, unsigned int order)
 {
         return false;
@@ -993,43 +983,6 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page)
         return ret;
 }

-/*
- * Skip KASAN memory poisoning when either:
- *
- * 1. For generic KASAN: deferred memory initialization has not yet completed.
- *    Tag-based KASAN modes skip pages freed via deferred memory initialization
- *    using page tags instead (see below).
- * 2. For tag-based KASAN modes: the page has a match-all KASAN tag, indicating
- *    that error detection is disabled for accesses via the page address.
- *
- * Pages will have match-all tags in the following circumstances:
- *
- * 1. Pages are being initialized for the first time, including during deferred
- *    memory init; see the call to page_kasan_tag_reset in __init_single_page.
- * 2. The allocation was not unpoisoned due to __GFP_SKIP_KASAN, with the
- *    exception of pages unpoisoned by kasan_unpoison_vmalloc.
- * 3. The allocation was excluded from being checked due to sampling,
- *    see the call to kasan_unpoison_pages.
- *
- * Poisoning pages during deferred memory init will greatly lengthen the
- * process and cause problem in large memory systems as the deferred pages
- * initialization is done with interrupt disabled.
- *
- * Assuming that there will be no reference to those newly initialized
- * pages before they are ever allocated, this should have no effect on
- * KASAN memory tracking as the poison will be properly inserted at page
- * allocation time. The only corner case is when pages are allocated by
- * on-demand allocation and then freed again before the deferred pages
- * initialization is done, but this is not likely to happen.
- */
-static inline bool should_skip_kasan_poison(struct page *page)
-{
-        if (IS_ENABLED(CONFIG_KASAN_GENERIC))
-                return deferred_pages_enabled();
-
-        return page_kasan_tag(page) == KASAN_TAG_KERNEL;
-}
-
 static void kernel_init_pages(struct page *page, int numpages)
 {
         int i;

From patchwork Wed Feb 26 12:03:18 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13992199
From: Byungchul Park <byungchul@sk.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on v6.14-rc4 07/25] mm: introduce luf_ugen to be used as a global timestamp
Date: Wed, 26 Feb 2025 21:03:18 +0900
Message-Id: <20250226120336.29565-7-byungchul@sk.com>
git-send-email 2.17.1 In-Reply-To: <20250226120336.29565-1-byungchul@sk.com> References: <20250226113024.GA1935@system.software.com> <20250226120336.29565-1-byungchul@sk.com> X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFrrELMWRmVeSWpSXmKPExsXC9ZZnoa4S8/50g55+DYs569ewWXze8I/N 4uv6X8wWTz/1sVhc3jWHzeLemv+sFud3rWW12LF0H5PFpQMLmCyO9x5gsph/7zObxeZNU5kt jk+Zymjx+8ccNgc+j++tfSweO2fdZfdYsKnUY/MKLY9NqzrZPDZ9msTu8e7cOXaPEzN+s3i8 33eVzWPrLzuPxqnX2Dw+b5IL4InisklJzcksSy3St0vgyphw4yVTwTvRinszt7I3MN4Q7GLk 5JAQMJE4NecpSxcjB5i9/3Y1SJhNQF3ixo2fzCC2iICZxMHWP+xdjFwczALLmCT2nmhgA0kI C2RKvOvewQ5iswioSpy8f5YdZA6vgKnExK2SEOPlJVZvOAA2hxNozqdpx8BahQSSJXb+/sME MlNC4DGbxNTWXWwQDZISB1fcYJnAyLuAkWEVo1BmXlluYmaOiV5GZV5mhV5yfu4mRmBIL6v9 E72D8dOF4EOMAhyMSjy8D87sTRdiTSwrrsw9xCjBwawkwsuZuSddiDclsbIqtSg/vqg0J7X4 EKM0B4uSOK/Rt/IUIYH0xJLU7NTUgtQimCwTB6dUA+Oahx8ZVkjfP87m9y1v38kXD7bw3Gy/ rmDtPnvejXMfeMtWqfoIvlnZemk73/eJ3uvntWwT6d/t9TWm6syeisqvKt/XGrQHLj1bdMen 7kTHfG23LcuK9G1EFFwP+ei6lnZ+e1ayOf7xqoOdr8Quz4lMaZyjsf9HQoTZch+zV/78Xy71 7u9kviSpxFKckWioxVxUnAgA1zaIxGUCAAA= X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFtrJLMWRmVeSWpSXmKPExsXC5WfdrKvEvD/dYPYyeYs569ewWXze8I/N 4uv6X8wWTz/1sVgcnnuS1eLyrjlsFvfW/Ge1OL9rLavFjqX7mCwuHVjAZHG89wCTxfx7n9ks Nm+aymxxfMpURovfP+awOfB7fG/tY/HYOesuu8eCTaUem1doeWxa1cnmsenTJHaPd+fOsXuc mPGbxeP9vqtsHotffGDy2PrLzqNx6jU2j8+b5AJ4o7hsUlJzMstSi/TtErgyJtx4yVTwTrTi 3syt7A2MNwS7GDk4JARMJPbfru5i5ORgE1CXuHHjJzOILSJgJnGw9Q97FyMXB7PAMiaJvSca 2EASwgKZEu+6d7CD2CwCqhIn759lB5nDK2AqMXGrJEhYQkBeYvWGA2BzOIHmfJp2DKxVSCBZ YufvP0wTGLkWMDKsYhTJzCvLTczMMdUrzs6ozMus0EvOz93ECAzQZbV/Ju5g/HLZ/RCjAAej Eg/vgzN704VYE8uKK3MPMUpwMCuJ8HJm7kkX4k1JrKxKLcqPLyrNSS0+xCjNwaIkzusVnpog JJCeWJKanZpakFoEk2Xi4JRqYKw5f6xrl+mxn51MLNXMB6+b73u+Te+MwY/F2pKOi5bnz798 MyvNfoLhtPia193Tzs0SrP6YebGw0t3wZm764Wu/n2wNVTopOeMay4vTSn2fKzcvFnvh+C7l wI52WUbtBaFPArZkuinznXnhLy751laTSX7XoVgX3/xn2k19IUnsXdsX+tjzVCmxFGckGmox FxUnAgD9F61dTAIAAA== X-CFilter-Loop: Reflected X-Rspamd-Queue-Id: EB4254000F X-Stat-Signature: e4y75otqkmfc54oyq9hdser5h54q7je5 X-Rspam-User: X-Rspamd-Server: rspam01 X-HE-Tag: 1740571429-18902 X-HE-Meta: U2FsdGVkX18NXlmnn1SvdIfslQJUWEF7WZ3EREXJhnVT0KMzpYLVeTlQ+0aXQBDNXuQe2UAXq3jpC/VhZ1sx4dnssuvtpeXDTmSBnZPPmQuwWFVTZ4UxrxmItwpaYAJlKhOlTvIxxdfO8NLruq29N2JxnMC6fUfxXHsPcxqozgZy+STszlMgsFP1JY9VYYl/p9+n+XeaKfqpAF1mDJxFUGePMPa1wMkDI2oMaQBc9+cVR2orf467VAtUBApu2PI3P8+B919Y6HPnZNOJesCDMlaWexIN+PPRzIzeeCra9+UGsubIRZmLRvZp7WR2YnM+i1ZaAJycV4ZRQNA0k7b8xSafV7m17iVjvqI+BCACIe9LLD+jxGpEwY9t4AR86EcVEGucbKF9e0tL0pecXyjIFc8WcmehK7eq9B69yNNwhdm5D4/h0qWQi/5jGp/Wxie9kFMz4PsvPfnqRs5bAHNnu+46rww6g4t+8Er9Js599iwOW5DOdQqOAZ0FrIm/mDjdJevrSvlTnK2+viYfHqMcgEY/K1ew0rvvAETI6JB+En/Srt9aHUv4xrHOEMYTxAZHIwzVXlf5zGt0EAoDVskDIO5c0p0hgd0QNqLh8fTbMzrKMuYXi70WHZ+1YMWOwRHEdJjID4QWCH5jjXDcJQtGFuQjOFh120Aq5C9W8Yh4Hc0Off9mfTHWGpI9oWDIyn84idTjNgqGfPVcwxRy54+fj1Urpnzobfj/CLUt6PQKOX3u9VkevUulebOIXTZvoLgCf05jxtWF4YO3VtrFUNL9Bpg+TeJe6y2q4Ue8NlRCjtkshDen3zMXorkS+arXwlwPxCR6LikgtNiT8lzD9iP624RKcjRoR5NATNu3QRww3ZyJyzOWf70k8OzrhgM1jmw3o6VIMITqEKZQdyZ6utraivorUoeBi7LByfec6LqDeQIPyhYGunn2XSoAbeZ1f2FM4EpX2MbgcgblGla+EBe LmLjya67 DAJ/83ELK0WjOMbwjaPcj4Kqz3OPxN+dE1YD2C6p65TkAb87ryefiLk9d6Ts34ZBk++9rR8Pg4bDWrC8= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Functionally, no change. This is a preparation for luf mechanism that needs to evaluate the temporal sequence of events to determine whether tlb flush required has been done on each CPU. 
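Evaluating that temporal sequence comes down to a wraparound-safe comparison of unsigned generation numbers. A minimal, self-contained illustration of the idiom follows; it mirrors the comparison ugen_before() uses in the diff below, but the name gen_before and the values are made up for the example:

#include <assert.h>
#include <limits.h>

/* Wrap-safe "a happened before b" for unsigned generation counters. */
static inline int gen_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

int main(void)
{
	/* Ordinary case: generation 5 was issued before generation 7. */
	assert(gen_before(5, 7));

	/*
	 * Wraparound case: ULONG_MAX was issued just before the counter
	 * wrapped around past zero, so it still compares as "before".
	 */
	assert(gen_before(ULONG_MAX, 2));

	return 0;
}

The subtraction wraps modulo the counter width, so casting the difference to a signed type keeps the ordering correct across the wrap as long as the two values are less than half the counter range apart.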
To achieve that, this patch introduced a generation number, luf_ugen, and a few APIs manipulating the number. It's worth noting the number is designed to wraparound so care must be taken when using it. Signed-off-by: Byungchul Park --- include/linux/mm.h | 34 ++++++++++++++++++++++++++++++++++ mm/rmap.c | 22 ++++++++++++++++++++++ 2 files changed, 56 insertions(+) diff --git a/include/linux/mm.h b/include/linux/mm.h index 7b1068ddcbb70..8c3481402d8cb 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -4155,4 +4155,38 @@ int arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *st int arch_set_shadow_stack_status(struct task_struct *t, unsigned long status); int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status); +#if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) +/* + * luf_ugen will start with 2 so that 1 can be regarded as a passed one. + */ +#define LUF_UGEN_INIT 2 + +static inline bool ugen_before(unsigned long a, unsigned long b) +{ + /* + * Consider wraparound. + */ + return (long)(a - b) < 0; +} + +static inline unsigned long next_ugen(unsigned long ugen) +{ + if (ugen + 1) + return ugen + 1; + /* + * Avoid invalid ugen, zero. + */ + return ugen + 2; +} + +static inline unsigned long prev_ugen(unsigned long ugen) +{ + if (ugen - 1) + return ugen - 1; + /* + * Avoid invalid ugen, zero. + */ + return ugen - 2; +} +#endif #endif /* _LINUX_MM_H */ diff --git a/mm/rmap.c b/mm/rmap.c index 2de01de164ef0..ed345503e4f88 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -634,6 +634,28 @@ struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio, } #ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH + +/* + * This generation number is primarily used as a global timestamp to + * determine whether tlb flush required has been done on each CPU. The + * function, ugen_before(), should be used to evaluate the temporal + * sequence of events because the number is designed to wraparound. + */ +static atomic_long_t __maybe_unused luf_ugen = ATOMIC_LONG_INIT(LUF_UGEN_INIT); + +/* + * Don't return invalid luf_ugen, zero. + */ +static unsigned long __maybe_unused new_luf_ugen(void) +{ + unsigned long ugen = atomic_long_inc_return(&luf_ugen); + + if (!ugen) + ugen = atomic_long_inc_return(&luf_ugen); + + return ugen; +} + /* * Flush TLB entries for recently unmapped pages from remote CPUs. 
It is * important if a PTE was dirty when it was unmapped that it's flushed From patchwork Wed Feb 26 12:03:19 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Byungchul Park X-Patchwork-Id: 13992201 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4425DC021BF for ; Wed, 26 Feb 2025 12:04:06 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id DE264280038; Wed, 26 Feb 2025 07:03:52 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id D92CD280024; Wed, 26 Feb 2025 07:03:52 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 96FCF280036; Wed, 26 Feb 2025 07:03:52 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 73B0428002D for ; Wed, 26 Feb 2025 07:03:52 -0500 (EST) Received: from smtpin13.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 2DBD7121207 for ; Wed, 26 Feb 2025 12:03:52 +0000 (UTC) X-FDA: 83161961904.13.18DE458 Received: from invmail4.hynix.com (exvmail4.skhynix.com [166.125.252.92]) by imf27.hostedemail.com (Postfix) with ESMTP id 4864B40003 for ; Wed, 26 Feb 2025 12:03:49 +0000 (UTC) Authentication-Results: imf27.hostedemail.com; dkim=none; spf=pass (imf27.hostedemail.com: domain of byungchul@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=byungchul@sk.com; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1740571430; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references; bh=t7ZESj6S5klDd37iho9K+N1w/Vn9wQb8Zgaocb9z0sI=; b=UR2od9mfmWfWWfZ1+k8I/ZZnp+Jh+pIC5Pc9DsqwC6ZWdMl/i1V/9oULa0EIbxo8p2PKmA eDDPlaB7C1blhdCrENfSqCQIEg3Wdpaidm5HH5IMMATneoCz1qwDE8oc10ydQrP6dUZ28o e+2MQjeo3mqQhzjd4nFiCb2NUkJe7sg= ARC-Authentication-Results: i=1; imf27.hostedemail.com; dkim=none; spf=pass (imf27.hostedemail.com: domain of byungchul@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=byungchul@sk.com; dmarc=none ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1740571430; a=rsa-sha256; cv=none; b=XwKXx9fZnv1EB5KewxOk9TRJNUgzlQNZByLVx0XXNf3EHTMW3KFtctSIrh5BWXyd/b7Cxx T2dywBvPgeLwFHxYVBb/bDN5PSOh+oCbAk8LGEUQ1AYbDeP0MdrNZXXakuK9OTCsxLhf2f f9XnSjt135x3N661CQEuCg8yNPru/R4= X-AuditID: a67dfc5b-3e1ff7000001d7ae-19-67bf03225d34 From: Byungchul Park To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com Subject: [RFC PATCH v12 based on v6.14-rc4 08/25] mm: introduce luf_batch to be used as hash table to store luf meta data Date: Wed, 26 Feb 2025 21:03:19 +0900 Message-Id: <20250226120336.29565-8-byungchul@sk.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20250226120336.29565-1-byungchul@sk.com> References: <20250226113024.GA1935@system.software.com> <20250226120336.29565-1-byungchul@sk.com> 
X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFrrCLMWRmVeSWpSXmKPExsXC9ZZnka4S8/50g4lrtSzmrF/DZvF5wz82 i6/rfzFbPP3Ux2JxedccNot7a/6zWpzftZbVYsfSfUwWlw4sYLI43nuAyWL+vc9sFps3TWW2 OD5lKqPF7x9z2Bz4PL639rF47Jx1l91jwaZSj80rtDw2repk89j0aRK7x7tz59g9Tsz4zeLx ft9VNo+tv+w8GqdeY/P4vEkugCeKyyYlNSezLLVI3y6BK6Nt2hHWgi0GFa92d7I1MB5S72Lk 5JAQMJG4svgmI4z95cZsNhCbTUBd4saNn8wgtoiAmcTB1j/sXYxcHMwCy5gk9p5oACsSFqiU 2DC5AayIRUBV4n7rPbBBvAKmEgeO/WWBGCovsXrDAbAaTqBBn6YdA+sVEkiW2Pn7DxPIUAmB 22wSU3+uZINokJQ4uOIGywRG3gWMDKsYhTLzynITM3NM9DIq8zIr9JLzczcxAsN6We2f6B2M ny4EH2IU4GBU4uF9cGZvuhBrYllxZe4hRgkOZiURXs7MPelCvCmJlVWpRfnxRaU5qcWHGKU5 WJTEeY2+lacICaQnlqRmp6YWpBbBZJk4OKUaGFn5veMu7Jo9v9hj8YPDmY+aBYXmr42M5Hl/ mmNa6VmLZGW5my1rFm2N1uwJehh3vm6a9a4sY5acl/u/esxin/rTn/e3+47n3y7V96/Jmf76 5IrP9e5BH5pC5d1mVbx3+c6w8w2Dr6dsY6XUdsZohR0TP7IWx2U3VJrdfPLlqHixYEj4rAkf 9iixFGckGmoxFxUnAgCGMxFcZwIAAA== X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFtrNLMWRmVeSWpSXmKPExsXC5WfdrKvEvD/d4P5pRYs569ewWXze8I/N 4uv6X8wWTz/1sVgcnnuS1eLyrjlsFvfW/Ge1OL9rLavFjqX7mCwuHVjAZHG89wCTxfx7n9ks Nm+aymxxfMpURovfP+awOfB7fG/tY/HYOesuu8eCTaUem1doeWxa1cnmsenTJHaPd+fOsXuc mPGbxeP9vqtsHotffGDy2PrLzqNx6jU2j8+b5AJ4o7hsUlJzMstSi/TtErgy2qYdYS3YYlDx ancnWwPjIfUuRk4OCQETiS83ZrOB2GwC6hI3bvxkBrFFBMwkDrb+Ye9i5OJgFljGJLH3RANY kbBApcSGyQ1gRSwCqhL3W+8xgti8AqYSB479ZYEYKi+xesMBsBpOoEGfph0D6xUSSJbY+fsP 0wRGrgWMDKsYRTLzynITM3NM9YqzMyrzMiv0kvNzNzECg3RZ7Z+JOxi/XHY/xCjAwajEw/vg zN50IdbEsuLK3EOMEhzMSiK8nJl70oV4UxIrq1KL8uOLSnNSiw8xSnOwKInzeoWnJggJpCeW pGanphakFsFkmTg4pRoYnZVb1h3Trokzf3zIhynw8//k806zfs20vsVzJEfno8nbb5sWdsgp W/fq8yXbF1nNOfO0Ze++bPlHFo/OiruVnq47YxJy+9qRD8VTczz/inf3zjqtdlXtesezdH+R p7MORsmdFvt/V13uD5v1AXEzY7X6x36VSy4/vlQ0keULZw5nbsmeqoNdSizFGYmGWsxFxYkA dXrYUU4CAAA= X-CFilter-Loop: Reflected X-Rspam-User: X-Rspamd-Server: rspam11 X-Rspamd-Queue-Id: 4864B40003 X-Stat-Signature: jj5n6eb7t5brgdkw347dx6b4h4u5z8fm X-HE-Tag: 1740571429-204223 X-HE-Meta: U2FsdGVkX181jA5pnDpjRnmttV3LMpez68oC/wCOBUTzE1DHHjEitNjIhAMUasnPLksxj0pCghRrp7IPOJVY7f3wEdfqVvyLTCsLyO4Wo9DM3wG8k86fPLiro9CsHbSakwGFcIEKIkiZhQEEQi0jaJ3DNh4CuKXMqFAFcPCT8+jD+G6o1t8fVrIz+BJV8f33Ce0xuY7LbhsHeSVOIkDS3Qwk9adKsOTJFDumChojvOIcvgskYwXeiFT1qoDiW6SbmDmHBilGVvohuIsmxEHofjwHqPAY7sdHWU733HeQwArtV4LaxdcU06GtwLKoB2G7JQhCD6p56VeEzTPBZn+cVGieLIhpAazPeb4ITY8Z2R6RCwiK1sC5zKTLkxGVFgXtQk2d3dHaa5rPHG2AMciCCM1RacWAWrow73wzleFb6WY7kLI61AIs/S6R6lsIDUtm5D75ZFk2CQK/J0iN9M6EN2D+EKRXHtG8RElfoniEfqU5H9WyufdOR6y2fLGnB1J7DEciMRdvNKDXtJNQGoxOaKMLvWLKh5qNkZFqkRr5HHLhCLIoZrGZ4VWqx4VPbNbDay/agBcJvkOMbzZ9kskZqtxFhrx/d7/24oBPIrMA5JNNS7ROyPhksJE9TkNbN7Gzo6wdbenY0ij/nnxM2r0gIH27u7vugQm17C69WaoYcA6mGPH06J+pEHaEiSowHZUW5q5K5FdU08WzCwJoIWzRm7z80drjjpxDdCQTbGPIxEr0lDmziw2pKycIcGS62JPiZYoDd/q1Rv10bgjVsPj6YhSxsxQQR4h8Kjcp6+OsfsjEJ9nuAPhE9FrwuYhaRmBf0IlKoxgAaiFa+XYuSTUjFh6lo1HjAf0EJ1vtSe9XsDHf1oLafWKQIxkIFSDgdGSEMQi6J6ixT0YlM2BnpXsrvLruvrqSSIEe1FX0JSEj9DPFLJrwWiFq/WU+9IXy/1sd2r83ae0TZMs2g1hlWBD SD0DUave QESvNYz4zExd1xqCmjwB83f74oj5kUdfL2yXoqeNhA6N0YLzsG+ydvLotGCHYA0XdKm3ihknDRNWsnQg= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Functionally, no change. This is a preparation for luf mechanism that needs to keep luf meta data per page while staying in pcp or buddy allocator. The meta data includes cpumask for tlb shootdown and luf's request generation number. Since struct page doesn't have enough room to store luf meta data, this patch introduces a hash table to store them and makes each page keep its hash key instead. 
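To make the shape of that lookup concrete, below is a minimal user-space sketch of the idea: one shared table of batch entries indexed by a 16-bit key, with each page remembering only its key. All names (demo_*) and sizes are illustrative only, not the kernel structures this patch adds:

#include <stdio.h>

/* Illustrative stand-in for the per-key luf meta data. */
struct demo_batch {
	unsigned long cpumask;	/* CPUs that still need a tlb flush */
	unsigned long ugen;	/* newest generation requiring the flush */
};

/* One shared table; an unsigned short key addresses all of it. */
#define NR_DEMO_BATCH	(1 << (sizeof(short) * 8))
static struct demo_batch demo_batch[NR_DEMO_BATCH];

/* A "page" only needs to remember its 16-bit key. */
struct demo_page {
	unsigned short key;	/* 0 means no deferred flush pending */
};

static void demo_record(struct demo_page *page, unsigned short key,
			unsigned long cpus, unsigned long ugen)
{
	struct demo_batch *b = &demo_batch[key];

	/* Keys are shared, so merge into the entry rather than overwrite. */
	b->cpumask |= cpus;
	if ((long)(b->ugen - ugen) < 0)	/* wrap-safe "older than" */
		b->ugen = ugen;
	page->key = key;
}

int main(void)
{
	struct demo_page page = { 0 };

	demo_record(&page, 42, 0x5, 100);
	demo_record(&page, 42, 0x2, 101);	/* another user of key 42 */

	printf("key=%u cpumask=%#lx ugen=%lu\n",
	       page.key, demo_batch[page.key].cpumask,
	       demo_batch[page.key].ugen);
	return 0;
}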
Since all the pages in pcp or buddy share the hash table, confliction is inevitable so care must be taken when reading or updating its entry. Signed-off-by: Byungchul Park --- include/linux/mm_types.h | 10 ++++ mm/internal.h | 8 +++ mm/rmap.c | 122 +++++++++++++++++++++++++++++++++++++-- 3 files changed, 136 insertions(+), 4 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 7d78a285e52ca..4bfe8d072b0ea 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -32,6 +32,16 @@ struct address_space; struct mem_cgroup; +#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH +struct luf_batch { + struct tlbflush_unmap_batch batch; + unsigned long ugen; + rwlock_t lock; +}; +#else +struct luf_batch {}; +#endif + /* * Each physical page in the system has a struct page associated with * it to keep track of whatever it is we are using the page for at the diff --git a/mm/internal.h b/mm/internal.h index 4c8ed93a792ec..3333d8d461c2c 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1253,6 +1253,8 @@ extern struct workqueue_struct *mm_percpu_wq; void try_to_unmap_flush(void); void try_to_unmap_flush_dirty(void); void flush_tlb_batched_pending(struct mm_struct *mm); +void fold_batch(struct tlbflush_unmap_batch *dst, struct tlbflush_unmap_batch *src, bool reset); +void fold_luf_batch(struct luf_batch *dst, struct luf_batch *src); #else static inline void try_to_unmap_flush(void) { @@ -1263,6 +1265,12 @@ static inline void try_to_unmap_flush_dirty(void) static inline void flush_tlb_batched_pending(struct mm_struct *mm) { } +static inline void fold_batch(struct tlbflush_unmap_batch *dst, struct tlbflush_unmap_batch *src, bool reset) +{ +} +static inline void fold_luf_batch(struct luf_batch *dst, struct luf_batch *src) +{ +} #endif /* CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH */ extern const struct trace_print_flags pageflag_names[]; diff --git a/mm/rmap.c b/mm/rmap.c index ed345503e4f88..74fbf6c2fb3a7 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -641,7 +641,7 @@ struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio, * function, ugen_before(), should be used to evaluate the temporal * sequence of events because the number is designed to wraparound. */ -static atomic_long_t __maybe_unused luf_ugen = ATOMIC_LONG_INIT(LUF_UGEN_INIT); +static atomic_long_t luf_ugen = ATOMIC_LONG_INIT(LUF_UGEN_INIT); /* * Don't return invalid luf_ugen, zero. @@ -656,6 +656,122 @@ static unsigned long __maybe_unused new_luf_ugen(void) return ugen; } +static void reset_batch(struct tlbflush_unmap_batch *batch) +{ + arch_tlbbatch_clear(&batch->arch); + batch->flush_required = false; + batch->writable = false; +} + +void fold_batch(struct tlbflush_unmap_batch *dst, + struct tlbflush_unmap_batch *src, bool reset) +{ + if (!src->flush_required) + return; + + /* + * Fold src to dst. + */ + arch_tlbbatch_fold(&dst->arch, &src->arch); + dst->writable = dst->writable || src->writable; + dst->flush_required = true; + + if (!reset) + return; + + /* + * Reset src. + */ + reset_batch(src); +} + +/* + * The range that luf_key covers, which is 'unsigned short' type. + */ +#define NR_LUF_BATCH (1 << (sizeof(short) * 8)) + +/* + * Use 0th entry as accumulated batch. 
+ */ +static struct luf_batch luf_batch[NR_LUF_BATCH]; + +static void luf_batch_init(struct luf_batch *lb) +{ + rwlock_init(&lb->lock); + reset_batch(&lb->batch); + lb->ugen = atomic_long_read(&luf_ugen) - 1; +} + +static int __init luf_init(void) +{ + int i; + + for (i = 0; i < NR_LUF_BATCH; i++) + luf_batch_init(&luf_batch[i]); + + return 0; +} +early_initcall(luf_init); + +/* + * key to point an entry of the luf_batch array + * + * note: zero means invalid key + */ +static atomic_t luf_kgen = ATOMIC_INIT(1); + +/* + * Don't return invalid luf_key, zero. + */ +static unsigned short __maybe_unused new_luf_key(void) +{ + unsigned short luf_key = atomic_inc_return(&luf_kgen); + + if (!luf_key) + luf_key = atomic_inc_return(&luf_kgen); + + return luf_key; +} + +static void __fold_luf_batch(struct luf_batch *dst_lb, + struct tlbflush_unmap_batch *src_batch, + unsigned long src_ugen) +{ + /* + * dst_lb->ugen represents one that requires tlb shootdown for + * it, that is, sort of request number. The newer it is, the + * more tlb shootdown might be needed to fulfill the newer + * request. Conservertively keep the newer one. + */ + if (!dst_lb->ugen || ugen_before(dst_lb->ugen, src_ugen)) + dst_lb->ugen = src_ugen; + fold_batch(&dst_lb->batch, src_batch, false); +} + +void fold_luf_batch(struct luf_batch *dst, struct luf_batch *src) +{ + unsigned long flags; + + /* + * Exactly same. Nothing to fold. + */ + if (dst == src) + return; + + if (&src->lock < &dst->lock) { + read_lock_irqsave(&src->lock, flags); + write_lock(&dst->lock); + } else { + write_lock_irqsave(&dst->lock, flags); + read_lock(&src->lock); + } + + __fold_luf_batch(dst, &src->batch, src->ugen); + + write_unlock(&dst->lock); + read_unlock_irqrestore(&src->lock, flags); +} + /* * Flush TLB entries for recently unmapped pages from remote CPUs. 
It is * important if a PTE was dirty when it was unmapped that it's flushed @@ -670,9 +786,7 @@ void try_to_unmap_flush(void) return; arch_tlbbatch_flush(&tlb_ubc->arch); - arch_tlbbatch_clear(&tlb_ubc->arch); - tlb_ubc->flush_required = false; - tlb_ubc->writable = false; + reset_batch(tlb_ubc); } /* Flush iff there are potentially writable TLB entries that can race with IO */ From patchwork Wed Feb 26 12:03:20 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Byungchul Park X-Patchwork-Id: 13992202 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4D477C021BF for ; Wed, 26 Feb 2025 12:04:09 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 24A72280024; Wed, 26 Feb 2025 07:03:53 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 15AB0280037; Wed, 26 Feb 2025 07:03:53 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DDBC3280036; Wed, 26 Feb 2025 07:03:52 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 8E90B280034 for ; Wed, 26 Feb 2025 07:03:52 -0500 (EST) Received: from smtpin26.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 517A4121214 for ; Wed, 26 Feb 2025 12:03:52 +0000 (UTC) X-FDA: 83161961904.26.CC92942 Received: from invmail4.hynix.com (exvmail4.hynix.com [166.125.252.92]) by imf21.hostedemail.com (Postfix) with ESMTP id 71F5A1C000A for ; Wed, 26 Feb 2025 12:03:50 +0000 (UTC) Authentication-Results: imf21.hostedemail.com; dkim=none; spf=pass (imf21.hostedemail.com: domain of byungchul@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=byungchul@sk.com; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1740571430; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references; bh=EMVQAAUeOtuRH6J4sQkG9jiokUi1LOM+qZAHHr0qxqY=; b=4ciLtEuZZitOT09HHjUgbtHAkTYoUwTZEsAhNWQgYVsy8xIsYOORwjR4fcLZEaUo9uf/a4 gjzeGMwvs4PUyB80yxFNmpKw8FNrKKtnr1ctj+ytfOihA1hnLRf/HwOD/okp9CQxb4L4L4 /01qL+48fL3XQuUG+CSlI4OYJVolkn8= ARC-Authentication-Results: i=1; imf21.hostedemail.com; dkim=none; spf=pass (imf21.hostedemail.com: domain of byungchul@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=byungchul@sk.com; dmarc=none ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1740571430; a=rsa-sha256; cv=none; b=cRrXhTdCzUQLAN9YVB0NfjF83ouJ8XTu/f3Nvm9Qo1T6Ms2KSt/LKTyvai6hEvFoiFqcup BoQZY4tSaBBLH/7CEy09FqWA1+NSUgLqwmFHGjxq43yWN+3g9oNedPA5wE7B7YdY1VFoKG 1K0CRVW/GCMxeI5io1ozbaV6t+8j+0E= X-AuditID: a67dfc5b-3e1ff7000001d7ae-1e-67bf032265ed From: Byungchul Park To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com Subject: [RFC PATCH v12 based on v6.14-rc4 09/25] mm: introduce API to perform tlb shootdown on exit from page 
allocator Date: Wed, 26 Feb 2025 21:03:20 +0900 Message-Id: <20250226120336.29565-9-byungchul@sk.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20250226120336.29565-1-byungchul@sk.com> References: <20250226113024.GA1935@system.software.com> <20250226120336.29565-1-byungchul@sk.com> X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFrrMLMWRmVeSWpSXmKPExsXC9ZZnoa4S8/50g6//9SzmrF/DZvF5wz82 i6/rfzFbPP3Ux2JxedccNot7a/6zWpzftZbVYsfSfUwWlw4sYLI43nuAyWL+vc9sFps3TWW2 OD5lKqPF7x9z2Bz4PL639rF47Jx1l91jwaZSj80rtDw2repk89j0aRK7x7tz59g9Tsz4zeLx ft9VNo+tv+w8GqdeY/P4vEkugCeKyyYlNSezLLVI3y6BK2PFldqCR6IVzRd7GRsYZwt1MXJy SAiYSHyb+ZAJxt59bwkLiM0moC5x48ZPZhBbRMBM4mDrH/YuRi4OZoFlTBJ7TzSwgSSEBSok psx4CtbAIqAqsX3PVkYQm1fAVOJUx1VGiKHyEqs3HAAbxAk06NO0Y2C9QgLJEjt//2ECGSoh cJtNoqnnJ1SDpMTBFTdYJjDyLmBkWMUolJlXlpuYmWOil1GZl1mhl5yfu4kRGNTLav9E72D8 dCH4EKMAB6MSD++DM3vThVgTy4orcw8xSnAwK4nwcmbuSRfiTUmsrEotyo8vKs1JLT7EKM3B oiTOa/StPEVIID2xJDU7NbUgtQgmy8TBKdXAyFo79fkXfhsfrtQZre0nLkU9SGi74MpcHvh1 RmPewXdTRS8zzs8XPHrQ1OlcSfW1VMWFm27IsfAw72q3/PRu/R7289tUw6+weRZmuV5yK7Ns bLi7ir/F/JZ+1vF0yWNerTYxBhvSHtXyvq47d8NpD1v2W/fXzn5L+B0msDbJzszY9PXhUx53 JZbijERDLeai4kQAYsuXG2YCAAA= X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFtrFLMWRmVeSWpSXmKPExsXC5WfdrKvEvD/dYHu/msWc9WvYLD5v+Mdm 8XX9L2aLp5/6WCwOzz3JanF51xw2i3tr/rNanN+1ltVix9J9TBaXDixgsjjee4DJYv69z2wW mzdNZbY4PmUqo8XvH3PYHPg9vrf2sXjsnHWX3WPBplKPzSu0PDat6mTz2PRpErvHu3Pn2D1O zPjN4vF+31U2j8UvPjB5bP1l59E49Rqbx+dNcgG8UVw2Kak5mWWpRfp2CVwZK67UFjwSrWi+ 2MvYwDhbqIuRk0NCwERi970lLCA2m4C6xI0bP5lBbBEBM4mDrX/Yuxi5OJgFljFJ7D3RwAaS EBaokJgy4ylYA4uAqsT2PVsZQWxeAVOJUx1XGSGGykus3nAAbBAn0KBP046B9QoJJEvs/P2H aQIj1wJGhlWMIpl5ZbmJmTmmesXZGZV5mRV6yfm5mxiBIbqs9s/EHYxfLrsfYhTgYFTi4X1w Zm+6EGtiWXFl7iFGCQ5mJRFezsw96UK8KYmVValF+fFFpTmpxYcYpTlYlMR5vcJTE4QE0hNL UrNTUwtSi2CyTBycUg2Ms6Jm1iuslbD7+TRqxjqltWF2tS7fb7ros1kcyAje+/sCS1/Vec9H 68OX3H9psLnN439Cv/Phr7584f+WHD3Zd90y9dXTifZ/VNu/rzomuyBFbYNf2w+5lZ9fZDPI e0TfYH67zGb2b09bc/tw1jvuKx5O/M/df6365LWXyW3Jfnas5+cc9WjXVGIpzkg01GIuKk4E AHMyoZlNAgAA X-CFilter-Loop: Reflected X-Rspamd-Queue-Id: 71F5A1C000A X-Stat-Signature: g8qjmra1jndewj8j4ecg8ejeg1mia7oz X-Rspam-User: X-Rspamd-Server: rspam01 X-HE-Tag: 1740571430-660317 X-HE-Meta: U2FsdGVkX189hvu+apBZxMSY/NBll9cxbheWs5+FZoZzP1j8R1J0DK8hUg1htpDfFRdFX3/XflR0kIbRZzzGZAEzuK+yn6pAEO5owRGSUGDPw9E/91ZzcyU8kpXnJm0rf0lyU2qLeptc4Aq+rM20cx6h61JIpD5lEEEznXEqgygdhaYCtISgn3sR64QtX/jA+fQOxHjickVI0lxpmqfmYbzQ4nm+GRLJ+3kQYUDYv199UOiMRjmLHv94qiuR6SbFkzFP6TfbhSgdZYVOxaDtLxXBcEqysQ4GDMbh8q0IJhXf04g4zp5GiPSO/QV38x6HnDacBjVSgi/Xwx/PQMewAFgO5ActrFqw+KGE1HJCkK44sM/e3HWoG024F1WXKvP20cNM6mMZY4/3s1WiGiUYqrCpvj59XxV42J1lMaZ1J6PJmRXPxgOtwN7IsZNNTm+YV6VlaaJWSVPglZFPWcyDZaCMG1H15TaMZIgELcCE4ydOZxspymzICgvvDMw2eSjruBx8ahlRXoYInXbbyhBfZ8SVsqwD5H8FJQPcL4zalXGGKeA3cePF9au8O5V5iH/nrhDdJzo15KfhtnJ2/DDUFVoZs7Ed6ZykjDmAkoO7wjJtotkRp4l/LRkXmVfeh9pM+aqjhYwbk5NWX85+UTzy7XevYdo3+7zhvOfZ5/c6FNg7hiJtlN//+/tVRUVy5Uom2kKJZey+8Sk9Etk3J0b/chGkrzAWmnKIXn69Tbuc919InSxeaHyzm1sPGLi3UOSXrYk5FFqGhP3gIlXttLYpdn7XQzpQerI3UIs+HiqOOhO3wkswXRHhpzHijBo2ZhrNtWk3utK0uQDdeKCX47JKfgkVmSdKtcLTTgggEfMMbF7UZSqCpQt4zC/DvCDuPtaZhvp7gjcUQ5pmDj2kG7picTSsm6V4QHsyI8EVdNtdQzZTquBRQpGEpUU5fOPUF+vV/KpJD46vyH17XARh1jU JFA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Functionally, no change. This is a preparation for luf mechanism that performs tlb shootdown required on exit from page allocator. 
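The intent can be modeled in a few lines of user-space C: fold the flush requirement of every page taken off during one allocation session into a single batch, then issue one combined shootdown on the way out. This is only a sketch of the control flow with made-up demo_* names, not the kernel code added below:

#include <stdio.h>

/* Illustrative model: the set of CPUs whose tlbs still need flushing. */
static struct {
	unsigned long cpumask;
} takeoff;

static void demo_take_page(unsigned long pending_cpus)
{
	/*
	 * Fold the page's pending flush requirement instead of flushing
	 * right away, so frequent allocations don't each pay for a
	 * shootdown of their own.
	 */
	takeoff.cpumask |= pending_cpus;
}

static void demo_flush_takeoff(void)
{
	if (!takeoff.cpumask)
		return;
	printf("one shootdown covering cpus %#lx\n", takeoff.cpumask);
	takeoff.cpumask = 0;
}

int main(void)
{
	/* Three pages taken off within one allocation session ... */
	demo_take_page(0x1);
	demo_take_page(0x6);
	demo_take_page(0x4);
	/* ... but only a single combined flush on the way out. */
	demo_flush_takeoff();
	return 0;
}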
This patch introduced a new API rather than making use of existing try_to_unmap_flush() to avoid repeated and redundant tlb shootdown due to frequent page allocations during a session of batched unmap flush. Signed-off-by: Byungchul Park --- include/linux/sched.h | 1 + mm/internal.h | 4 ++++ mm/rmap.c | 20 ++++++++++++++++++++ 3 files changed, 25 insertions(+) diff --git a/include/linux/sched.h b/include/linux/sched.h index 9632e3318e0d6..86ef426644639 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1401,6 +1401,7 @@ struct task_struct { #endif struct tlbflush_unmap_batch tlb_ubc; + struct tlbflush_unmap_batch tlb_ubc_takeoff; /* Cache last used pipe for splice(): */ struct pipe_inode_info *splice_pipe; diff --git a/mm/internal.h b/mm/internal.h index 3333d8d461c2c..b52e14f86c436 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1252,6 +1252,7 @@ extern struct workqueue_struct *mm_percpu_wq; #ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH void try_to_unmap_flush(void); void try_to_unmap_flush_dirty(void); +void try_to_unmap_flush_takeoff(void); void flush_tlb_batched_pending(struct mm_struct *mm); void fold_batch(struct tlbflush_unmap_batch *dst, struct tlbflush_unmap_batch *src, bool reset); void fold_luf_batch(struct luf_batch *dst, struct luf_batch *src); @@ -1262,6 +1263,9 @@ static inline void try_to_unmap_flush(void) static inline void try_to_unmap_flush_dirty(void) { } +static inline void try_to_unmap_flush_takeoff(void) +{ +} static inline void flush_tlb_batched_pending(struct mm_struct *mm) { } diff --git a/mm/rmap.c b/mm/rmap.c index 74fbf6c2fb3a7..72c5e665e59a4 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -772,6 +772,26 @@ void fold_luf_batch(struct luf_batch *dst, struct luf_batch *src) read_unlock_irqrestore(&src->lock, flags); } +void try_to_unmap_flush_takeoff(void) +{ + struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct tlbflush_unmap_batch *tlb_ubc_takeoff = ¤t->tlb_ubc_takeoff; + + if (!tlb_ubc_takeoff->flush_required) + return; + + arch_tlbbatch_flush(&tlb_ubc_takeoff->arch); + + /* + * Now that tlb shootdown of tlb_ubc_takeoff has been performed, + * it's good chance to shrink tlb_ubc if possible. + */ + if (arch_tlbbatch_done(&tlb_ubc->arch, &tlb_ubc_takeoff->arch)) + reset_batch(tlb_ubc); + + reset_batch(tlb_ubc_takeoff); +} + /* * Flush TLB entries for recently unmapped pages from remote CPUs. 
It is * important if a PTE was dirty when it was unmapped that it's flushed From patchwork Wed Feb 26 12:03:21 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Byungchul Park X-Patchwork-Id: 13992203 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id AEA4AC021B8 for ; Wed, 26 Feb 2025 12:04:11 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 500CE280034; Wed, 26 Feb 2025 07:03:53 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 3F464280036; Wed, 26 Feb 2025 07:03:53 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EEE28280034; Wed, 26 Feb 2025 07:03:52 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id C46CF280037 for ; Wed, 26 Feb 2025 07:03:52 -0500 (EST) Received: from smtpin19.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 7E3FEC1390 for ; Wed, 26 Feb 2025 12:03:52 +0000 (UTC) X-FDA: 83161961904.19.F63F2BF Received: from invmail4.hynix.com (exvmail4.skhynix.com [166.125.252.92]) by imf19.hostedemail.com (Postfix) with ESMTP id A8A7B1A0019 for ; Wed, 26 Feb 2025 12:03:50 +0000 (UTC) Authentication-Results: imf19.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf19.hostedemail.com: domain of byungchul@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=byungchul@sk.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1740571431; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references; bh=A4NnjN00HTXUYnYxnCdGoGHTFSCUMp1YVyMTOwC1+N4=; b=iVeEt/hbgSuriZ7U3Llqth6IAFLoOoD81TlwqoCvLS/FcgrhClK2wd6g/zCo4Tyk6x6OJk F/cfWtdlR0laplwOcl3yHeN2zA6a0MIYyQPgRyxzCY8svTpzkfhEfPSf1w9XW9zA2Etb7J 667Q1HRhFxxHTds76UHpd7nAhJrrr6o= ARC-Authentication-Results: i=1; imf19.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf19.hostedemail.com: domain of byungchul@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=byungchul@sk.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1740571431; a=rsa-sha256; cv=none; b=pFiw0pqPMlS61fXxbr9b7146HHeCnYc8+yCD2RDJ4U08oN3o5jnup8/0VRQlkcBa80GUMI AxYhSMUD8aaDOcT/YhjaaB/myI6TognS1A4SmGcsEgGzdjLnf57/2xWO8jJZu8vkybIqkd k+WnjR1qm8bh/DGSGcl/GfSiYE7pOIc= X-AuditID: a67dfc5b-3e1ff7000001d7ae-23-67bf0322c49c From: Byungchul Park To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com Subject: [RFC PATCH v12 based on v6.14-rc4 10/25] mm: introduce APIs to check if the page allocation is tlb shootdownable Date: Wed, 26 Feb 2025 21:03:21 +0900 Message-Id: <20250226120336.29565-10-byungchul@sk.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20250226120336.29565-1-byungchul@sk.com> References: <20250226113024.GA1935@system.software.com> <20250226120336.29565-1-byungchul@sk.com> 
X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFrrMLMWRmVeSWpSXmKPExsXC9ZZnka4S8/50g7UfDSzmrF/DZvF5wz82 i6/rfzFbPP3Ux2JxedccNot7a/6zWpzftZbVYsfSfUwWlw4sYLI43nuAyWL+vc9sFps3TWW2 OD5lKqPF7x9z2Bz4PL639rF47Jx1l91jwaZSj80rtDw2repk89j0aRK7x7tz59g9Tsz4zeLx ft9VNo+tv+w8GqdeY/P4vEkugCeKyyYlNSezLLVI3y6BK+PG1mNsBT+NK9ZP38bcwHhcq4uR k0NCwERi67p/TDD29G8PWEBsNgF1iRs3fjKD2CICZhIHW/+wdzFycTALLGOS2Huiga2LkYND WKBSYsfpGJAaFgFViS93TjKC2LxA9Ssfz2ODmCkvsXrDAbA5nEDxT9OOgcWFBJIldv7+wwQy U0LgPpvEh85DUEdIShxccYNlAiPvAkaGVYxCmXlluYmZOSZ6GZV5mRV6yfm5mxiBQb2s9k/0 DsZPF4IPMQpwMCrx8D44szddiDWxrLgy9xCjBAezkggvZ+aedCHelMTKqtSi/Pii0pzU4kOM 0hwsSuK8Rt/KU4QE0hNLUrNTUwtSi2CyTBycUg2MvL/PXT7G8VdX5OZb3reF2/sdtqj2Rl5/ MHklb7zpdt0NTBf/fLA8sUHTVOSa+BI1N5W9CvFv1i2+n+idoLn7wdF7uee8Qj7ZG/8LPFun 0H7wgbPrMRe1mGnsJT89Am8zxPeHC81SLRLzfHUv1OvV9McXnvOfrZJznlT5QaqTcUZ29Jtf qiFWSizFGYmGWsxFxYkANVSxGmYCAAA= X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFtrFLMWRmVeSWpSXmKPExsXC5WfdrKvEvD/d4MZ0DYs569ewWXze8I/N 4uv6X8wWTz/1sVgcnnuS1eLyrjlsFvfW/Ge1OL9rLavFjqX7mCwuHVjAZHG89wCTxfx7n9ks Nm+aymxxfMpURovfP+awOfB7fG/tY/HYOesuu8eCTaUem1doeWxa1cnmsenTJHaPd+fOsXuc mPGbxeP9vqtsHotffGDy2PrLzqNx6jU2j8+b5AJ4o7hsUlJzMstSi/TtErgybmw9xlbw07hi /fRtzA2Mx7W6GDk5JARMJKZ/e8ACYrMJqEvcuPGTGcQWETCTONj6h72LkYuDWWAZk8TeEw1s XYwcHMIClRI7TseA1LAIqEp8uXOSEcTmBapf+XgeG8RMeYnVGw6AzeEEin+adgwsLiSQLLHz 9x+mCYxcCxgZVjGKZOaV5SZm5pjqFWdnVOZlVugl5+duYgSG6LLaPxN3MH657H6IUYCDUYmH 98GZvelCrIllxZW5hxglOJiVRHg5M/ekC/GmJFZWpRblxxeV5qQWH2KU5mBREuf1Ck9NEBJI TyxJzU5NLUgtgskycXBKNTA6mb2Z9zROaOuB2cx19w2TS85smnSY58WOj6XfmPgmTbcMED+X NvvLsoXrFn9d9OxmQ1HXhL51S9Yq1r69KTy9a+PUBnH3VddqVjTnKPfVs043U/57Ly3jnd1H M8O5jz4d5BEXTkpc/LPWwLDkucjWw+YPnP/Nt/kg+Xv/s4mMIbWyrlF3oz55KrEUZyQaajEX FScCAO0dzwxNAgAA X-CFilter-Loop: Reflected X-Rspam-User: X-Stat-Signature: edrktxwrfs36pxdhujp15fh1n19tdnbo X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: A8A7B1A0019 X-HE-Tag: 1740571430-493008 X-HE-Meta: U2FsdGVkX18e3JSY2CaAmMawLSbxG3nnvpcq1Y/nTYRSmP/RhdZTrNV4WmrjVQbK3tzsIwnWSQfsRl3BKSr0J9vHGyDiyCJlV3lV1HYfPyq2+LP6PJJAenymo1wCBjW5GVCdwPl16zqcw8qJKuWU0zQLefmNIihtIwh0hy/XGhkhln8b+K4eN1fhPGV9FVk0RelUlU4xyt84BWuz5IFvBgjeib+vIwNYO81LK4I4DSt/hwHfFcVtsqRTNXGwRctzXOLYZp1I5Ppbn2DXiPdJ2IYnnuS82xz1wI0Ozhng6vZTbAqI+rQknY8hg+Tw3GxauVmCVDWX1b9t091HgKf6ujyvwSWJOKZvvcteQdNlvV3a6ogRGILvwqNNRb5lalN5n+QbxUTlERxi55RmIaUl9ZOKd3QHAO1yfK6eZXkYvgmJrmy61GT5Zr0/aiwjrZhbzqgMRX3nfFb9ma/8u93Dtu775bl1JkhcofSPFkY75oioIrfGvVX9rZkjQH8W3/LJIeiQMS4Q4r+fTUsWRgeuTM0TdQk5aQDH8GMpd3xnP0es0VpBoRkyDInTWBl1YG6JU5I1SeKW5Vul9JxmvbzhQ7qHDYXqrd7AuJnETh8GJP2BEwp6WyaUb280v6aOr6Sxwvi9IW+DcWUZkT0IR8c5SjOPKcs5UUHvI2cEw9de3Ps4g/rpLNRufRY7F+9RrijGWuyqD5HUPDwRknE+R9CggaO3hVQa62X7HLjku6z3BdgPFMc/hCcZ0Hlnlno4bncQ7nWcSwpWfKq5myo1utZpQMIdITmOdq/aB2nDqd3XKCXpstUx3/EfK0Tqa00EgA+Zkxhc7SGZm94SK7kvDwmM+Oym1sRwYhAkwURQi1gaMp6cS6VEG3b16syjvzi7zV1iA/EdzNXzHTHrqF95fW/FVnm5Nl9gf+WGb3Tw6MCOlS6VIu5mfI0tExNc6Iqik3HyjicldoQsyyGkmMLkFWG RXGWx1LF 0hR0Nub//Wed6VwwTXkITS/R5sw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Functionally, no change. This is a preparation for luf mechanism that should indentify if tlb shootdown can be performed on page allocation. In a context with irq disabled or non-task, tlb shootdown cannot be performed because of deadlock issue. Thus, page allocator should work being aware of whether tlb shootdown can be performed on returning page. 
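Expressed as a standalone predicate, the check looks as follows. The irqs_disabled() and in_task() stubs below are placeholders for the kernel helpers of the same names; the sketch only shows the logic this patch bases its decision on:

#include <stdbool.h>
#include <stdio.h>

/* Stubs standing in for the kernel's context queries. */
static bool irqs_disabled(void) { return false; }
static bool in_task(void)       { return true; }

/*
 * Tlb shootdown waits for other CPUs to acknowledge an IPI; doing that
 * with irqs disabled, or from a non-task context, can deadlock, so the
 * allocator must skip the shootdown there.
 */
static bool shootdown_allowed(void)
{
	return !irqs_disabled() && in_task();
}

int main(void)
{
	printf("shootdown %s\n",
	       shootdown_allowed() ? "allowed" : "not allowed");
	return 0;
}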
This patch introduced APIs that pcp or buddy page allocator can use to delimit the critical sections taking off pages and indentify whether tlb shootdown can be performed. Signed-off-by: Byungchul Park --- include/linux/sched.h | 5 ++ mm/internal.h | 14 ++++ mm/page_alloc.c | 159 ++++++++++++++++++++++++++++++++++++++++++ mm/rmap.c | 2 +- 4 files changed, 179 insertions(+), 1 deletion(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 86ef426644639..a3049ea5b3ad3 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1400,6 +1400,11 @@ struct task_struct { struct callback_head cid_work; #endif +#if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) + int luf_no_shootdown; + int luf_takeoff_started; +#endif + struct tlbflush_unmap_batch tlb_ubc; struct tlbflush_unmap_batch tlb_ubc_takeoff; diff --git a/mm/internal.h b/mm/internal.h index b52e14f86c436..5e67f009d23c6 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1580,6 +1580,20 @@ static inline void accept_page(struct page *page) { } #endif /* CONFIG_UNACCEPTED_MEMORY */ +#if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) +extern struct luf_batch luf_batch[]; +bool luf_takeoff_start(void); +void luf_takeoff_end(void); +bool luf_takeoff_no_shootdown(void); +bool luf_takeoff_check(struct page *page); +bool luf_takeoff_check_and_fold(struct page *page); +#else +static inline bool luf_takeoff_start(void) { return false; } +static inline void luf_takeoff_end(void) {} +static inline bool luf_takeoff_no_shootdown(void) { return true; } +static inline bool luf_takeoff_check(struct page *page) { return true; } +static inline bool luf_takeoff_check_and_fold(struct page *page) { return true; } +#endif /* pagewalk.c */ int walk_page_range_mm(struct mm_struct *mm, unsigned long start, diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 27aeee0cfcf8f..a964a98fbad51 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -622,6 +622,165 @@ compaction_capture(struct capture_control *capc, struct page *page, } #endif /* CONFIG_COMPACTION */ +#if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) +static bool no_shootdown_context(void) +{ + /* + * If it performs with irq disabled, that might cause a deadlock. + * Avoid tlb shootdown in this case. + */ + return !(!irqs_disabled() && in_task()); +} + +/* + * Can be called with zone lock released and irq enabled. + */ +bool luf_takeoff_start(void) +{ + unsigned long flags; + bool no_shootdown = no_shootdown_context(); + + local_irq_save(flags); + + /* + * It's the outmost luf_takeoff_start(). + */ + if (!current->luf_takeoff_started) + VM_WARN_ON(current->luf_no_shootdown); + + /* + * current->luf_no_shootdown > 0 doesn't mean tlb shootdown is + * not allowed at all. However, it guarantees tlb shootdown is + * possible once current->luf_no_shootdown == 0. It might look + * too conservative but for now do this way for simplity. + */ + if (no_shootdown || current->luf_no_shootdown) + current->luf_no_shootdown++; + + current->luf_takeoff_started++; + local_irq_restore(flags); + + return !no_shootdown; +} + +/* + * Should be called within the same context of luf_takeoff_start(). + */ +void luf_takeoff_end(void) +{ + unsigned long flags; + bool no_shootdown; + bool outmost = false; + + local_irq_save(flags); + VM_WARN_ON(!current->luf_takeoff_started); + + /* + * Assume the context and irq flags are same as those at + * luf_takeoff_start(). 
+ */ + if (current->luf_no_shootdown) + current->luf_no_shootdown--; + + no_shootdown = !!current->luf_no_shootdown; + + current->luf_takeoff_started--; + + /* + * It's the outmost luf_takeoff_end(). + */ + if (!current->luf_takeoff_started) + outmost = true; + + local_irq_restore(flags); + + if (no_shootdown) + goto out; + + try_to_unmap_flush_takeoff(); +out: + if (outmost) + VM_WARN_ON(current->luf_no_shootdown); +} + +/* + * Can be called with zone lock released and irq enabled. + */ +bool luf_takeoff_no_shootdown(void) +{ + bool no_shootdown = true; + unsigned long flags; + + local_irq_save(flags); + + /* + * No way. Delimit using luf_takeoff_{start,end}(). + */ + if (unlikely(!current->luf_takeoff_started)) { + VM_WARN_ON(1); + goto out; + } + no_shootdown = current->luf_no_shootdown; +out: + local_irq_restore(flags); + return no_shootdown; +} + +/* + * Should be called with either zone lock held and irq disabled or pcp + * lock held. + */ +bool luf_takeoff_check(struct page *page) +{ + unsigned short luf_key = page_luf_key(page); + + /* + * No way. Delimit using luf_takeoff_{start,end}(). + */ + if (unlikely(!current->luf_takeoff_started)) { + VM_WARN_ON(1); + return false; + } + + if (!luf_key) + return true; + + return !current->luf_no_shootdown; +} + +/* + * Should be called with either zone lock held and irq disabled or pcp + * lock held. + */ +bool luf_takeoff_check_and_fold(struct page *page) +{ + struct tlbflush_unmap_batch *tlb_ubc_takeoff = ¤t->tlb_ubc_takeoff; + unsigned short luf_key = page_luf_key(page); + struct luf_batch *lb; + unsigned long flags; + + /* + * No way. Delimit using luf_takeoff_{start,end}(). + */ + if (unlikely(!current->luf_takeoff_started)) { + VM_WARN_ON(1); + return false; + } + + if (!luf_key) + return true; + + if (current->luf_no_shootdown) + return false; + + lb = &luf_batch[luf_key]; + read_lock_irqsave(&lb->lock, flags); + fold_batch(tlb_ubc_takeoff, &lb->batch, false); + read_unlock_irqrestore(&lb->lock, flags); + return true; +} +#endif + static inline void account_freepages(struct zone *zone, int nr_pages, int migratetype) { diff --git a/mm/rmap.c b/mm/rmap.c index 72c5e665e59a4..1581b1a00f974 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -693,7 +693,7 @@ void fold_batch(struct tlbflush_unmap_batch *dst, /* * Use 0th entry as accumulated batch. 
*/ -static struct luf_batch luf_batch[NR_LUF_BATCH]; +struct luf_batch luf_batch[NR_LUF_BATCH]; static void luf_batch_init(struct luf_batch *lb) { From patchwork Wed Feb 26 12:03:22 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Byungchul Park X-Patchwork-Id: 13992217 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 931D4C18E7C for ; Wed, 26 Feb 2025 12:05:02 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 18491280047; Wed, 26 Feb 2025 07:05:02 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 1336A280022; Wed, 26 Feb 2025 07:05:02 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EA124280047; Wed, 26 Feb 2025 07:05:01 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id BF32D280022 for ; Wed, 26 Feb 2025 07:05:01 -0500 (EST) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 73FEF813DF for ; Wed, 26 Feb 2025 12:05:01 +0000 (UTC) X-FDA: 83161964802.04.9F4CF9F Received: from invmail4.hynix.com (exvmail4.skhynix.com [166.125.252.92]) by imf12.hostedemail.com (Postfix) with ESMTP id EB4FB40008 for ; Wed, 26 Feb 2025 12:03:50 +0000 (UTC) Authentication-Results: imf12.hostedemail.com; dkim=none; spf=pass (imf12.hostedemail.com: domain of byungchul@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=byungchul@sk.com; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1740571431; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references; bh=hMG6P6zz9TNf6wZGMNFsOWeFUJaCioV7/YyIlR85qrY=; b=Tb30q0dgyt2xCgMm0/Ln+2s766iPNZxeQoiAlLVY7azLTA7DhdY4XPYjkPW3fZWI+NgzRn xJgArA+wbBLXtzELO6qosL4oHt92c3VneaRB3JauQS5zGyP35FUlUHCwxBhWCVjJCUBoeO pYCKViLUmRMmkbb7aKL59fJ3Amy2iBM= ARC-Authentication-Results: i=1; imf12.hostedemail.com; dkim=none; spf=pass (imf12.hostedemail.com: domain of byungchul@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=byungchul@sk.com; dmarc=none ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1740571431; a=rsa-sha256; cv=none; b=8YY+ixuehr+u3LE74r5QZ+x3yhGRAoYFNnuf1RVfSSebMKFnZ9V+dRjov30Gvqme8QGE8H 8Qp6r0cmoyhL5H2/tzbCvV2S2QoCTnwiXu75jN8SPqNTsrdMRjw3QFCWC/LRUZ64s8FN28 WtMobPJE4woqvWD/UjBU3DgNqkW09TI= X-AuditID: a67dfc5b-3e1ff7000001d7ae-28-67bf0322c4b0 From: Byungchul Park To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com Subject: [RFC PATCH v12 based on v6.14-rc4 11/25] mm: deliver luf_key to pcp or buddy on free after unmapping Date: Wed, 26 Feb 2025 21:03:22 +0900 Message-Id: <20250226120336.29565-11-byungchul@sk.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20250226120336.29565-1-byungchul@sk.com> References: 
<20250226113024.GA1935@system.software.com> <20250226120336.29565-1-byungchul@sk.com> X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFrrCLMWRmVeSWpSXmKPExsXC9ZZnka4S8/50g5YTJhZz1q9hs/i84R+b xdf1v5gtnn7qY7G4vGsOm8W9Nf9ZLc7vWstqsWPpPiaLSwcWMFkc7z3AZDH/3mc2i82bpjJb HJ8yldHi9485bA58Ht9b+1g8ds66y+6xYFOpx+YVWh6bVnWyeWz6NInd4925c+weJ2b8ZvF4 v+8qm8fWX3YejVOvsXl83iQXwBPFZZOSmpNZllqkb5fAlbF13nKWgq39jBVnz3xibWCcXtLF yMkhIWAicWvGQVYY+96WBcwgNpuAusSNGz/BbBEBM4mDrX/Yuxi5OJgFljFJ7D3RwAaSEBbI lZh1bgaYzSKgKvF4z3V2EJsXqOF+y3cmiKHyEqs3HAAbxAkU/zTtGFi9kECyxM7ff6BqbrNJ rHpWB2FLShxccYNlAiPvAkaGVYxCmXlluYmZOSZ6GZV5mRV6yfm5mxiBYb2s9k/0DsZPF4IP MQpwMCrx8D44szddiDWxrLgy9xCjBAezkggvZ+aedCHelMTKqtSi/Pii0pzU4kOM0hwsSuK8 Rt/KU4QE0hNLUrNTUwtSi2CyTBycUg2MvMbTNy+Udvp1zvxPxbQwDdPkpSZn3QJDznf1su2S ZGDZrha79HEx04Wflzd+Oqyde/zJlvDsDU6MwuYtErEeuhOFl9sHFxx4/G6VULpQQcxb1oeN 7T++Hpet7WS4HdAmUfbWKmlmliij9L7a2Rcto/bOs574O/T9l0q7g8ePNIfWiP7pZDikxFKc kWioxVxUnAgA1FsgaGcCAAA= X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFtrDLMWRmVeSWpSXmKPExsXC5WfdrKvEvD/d4NFXLYs569ewWXze8I/N 4uv6X8wWTz/1sVgcnnuS1eLyrjlsFvfW/Ge1OL9rLavFjqX7mCwuHVjAZHG89wCTxfx7n9ks Nm+aymxxfMpURovfP+awOfB7fG/tY/HYOesuu8eCTaUem1doeWxa1cnmsenTJHaPd+fOsXuc mPGbxeP9vqtsHotffGDy2PrLzqNx6jU2j8+b5AJ4o7hsUlJzMstSi/TtErgyts5bzlKwtZ+x 4uyZT6wNjNNLuhg5OSQETCTubVnADGKzCahL3LjxE8wWETCTONj6h72LkYuDWWAZk8TeEw1s IAlhgVyJWedmgNksAqoSj/dcZwexeYEa7rd8Z4IYKi+xesMBsEGcQPFP046B1QsJJEvs/P2H aQIj1wJGhlWMIpl5ZbmJmTmmesXZGZV5mRV6yfm5mxiBYbqs9s/EHYxfLrsfYhTgYFTi4X1w Zm+6EGtiWXFl7iFGCQ5mJRFezsw96UK8KYmVValF+fFFpTmpxYcYpTlYlMR5vcJTE4QE0hNL UrNTUwtSi2CyTBycUg2MM/YvqE/+/ZPVTEvALM/7kpmFk8fsC0V7Nxq+uclpet38q4Q595/P PbtOMX7S0r7S4n2T//R3V5MPB4yf7onxd7GaKl6Q+07u802WmxU3ruTJ+khqr3TT3/JmdTfv DpnVM35f4/PyKl9psFvwoNp3P55Sj7dCviKP31RXrP3z/APnK9EUuxu+SizFGYmGWsxFxYkA a0pgxE8CAAA= X-CFilter-Loop: Reflected X-Rspam-User: X-Rspamd-Queue-Id: EB4FB40008 X-Stat-Signature: agufsxpau7xktqgqj9kzxk7pr7rwppd3 X-Rspamd-Server: rspam03 X-HE-Tag: 1740571430-707340 X-HE-Meta: U2FsdGVkX1+kBZIWhsAzM73N0CrIVpin6UxP0j83USvBjZdPPJm/XKOI8SqYMQJffakNKGYJE3TKJrMuNyixuMJMqIrztU/wfsd2iq4y/Li4hzCN2Rf8eNZ4b9e+SsKgc38uxqAqp1wPXFb17j3WSpA3lOWVvrTkwa8yCrU8oNC2pbnY54f2nfDdvvVUxb4dnQccNkvlNs0SsbkKqj+UMgFz6YbOHBC16v12yz7DL6w+wSHWTz7l6suu+btq0yIz1YMEGkkqJtq8B+zSOk+IyGXyX7NqcYgaihlmtcSmE7T1n1VqbTOUKpeV9TdHBJABDy2ZWUHh4+v5ddCrjlCa0lnJQ7r9C276UCanA8m7TJId+qcy1GwIxB8R/LGjI9ZbZxRJifI1gZdMeH0VVsjxZ+E4CWc9IFNXKeJ8a+Ari7m/Xijz28BFSR4z8BvW0iT1L34KfT9cDS5QNMhNFrWnoXEN9B80r9d0iBaeWTxJYwB4ssWMZd9puHnBU6zZSd5D7FSvlcnNd//Y9LnOaQbROperaH/DdQCOTmYeLhEa/3stzCptzBlr+v5u8pGGa6S89hpVMD3zZRepxFnJxKHgmjQS+XI0C7YOvNvyWuT5+JDmkjTsQ++8XuwTAXZBn+h4Bty1JxOmxkePwua97vz7D3SWks+rXnj2JlIraLPgfejFgF1y5tJ3DsMxBjB9b/7Ne6JKE4Fp6Q9sowQQNImtkoYNlerOx5M4xvYW3lHhl45oEwn72N1+9CKnc3GT/9f2+uwsly8Vk5Q7TLUym+bOpP2cVF8CQ9LSU3CDn0ApyMRIFNEzOLk1+ZY8atrfmEumHXcQNiLlZY5/p1lIlJJzpH/HNmaNtUgc5gVCgDES9Znxp0JNnvnzFsVuGKLsV3cBIjdddgfv4Qg6mCgdqjFcr91BPsHt5BxrsgYFQalHVvV+Kff8V5CKXdllJy88kdCeUOy8zfdEomJ4W2Y8YMO Jn0Fd1iP 7k8oEZRf/TBm02qe9lJWOMO6vdb593uuQFCcl X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Functionally, no change. This is a preparation for luf mechanism that needs to pass luf_key to pcp or buddy allocator on free after unmapping e.g. during page reclaim or page migration. 
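For reviewers skimming the signature change, the calling convention ends up like the stub below: a luf_key of zero means no deferred flush is attached to the free, while a non-zero key identifies the pending shootdown recorded at unmap time. demo_free_frozen_pages() is a made-up stand-in with the same shape as the changed free_frozen_pages(), used here only to show the convention:

#include <stdio.h>

static void demo_free_frozen_pages(void *page, unsigned int order,
				   unsigned short luf_key)
{
	(void)page;
	if (luf_key)
		printf("free order-%u page, tracking luf_key %u\n",
		       order, (unsigned int)luf_key);
	else
		printf("free order-%u page, nothing to track\n", order);
}

int main(void)
{
	int dummy;

	/* Ordinary free path: no unmapping happened, pass 0. */
	demo_free_frozen_pages(&dummy, 0, 0);

	/*
	 * Free after unmapping (e.g. reclaim or migration): hand over
	 * the key that identifies the pending shootdown.
	 */
	demo_free_frozen_pages(&dummy, 0, 42);
	return 0;
}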
The luf_key will be used to track need of tlb shootdown and which cpus need to perform tlb flush, per page residing in pcp or buddy, and should be handed over properly when pages travel between pcp and buddy. Signed-off-by: Byungchul Park --- mm/internal.h | 4 +- mm/page_alloc.c | 116 ++++++++++++++++++++++++++++++++----------- mm/page_frag_cache.c | 6 +-- mm/page_isolation.c | 6 +++ mm/page_reporting.c | 6 +++ mm/slub.c | 2 +- mm/swap.c | 4 +- mm/vmscan.c | 8 +-- 8 files changed, 111 insertions(+), 41 deletions(-) diff --git a/mm/internal.h b/mm/internal.h index 5e67f009d23c6..47d3291278e81 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -746,8 +746,8 @@ struct page *__alloc_frozen_pages_noprof(gfp_t, unsigned int order, int nid, nodemask_t *); #define __alloc_frozen_pages(...) \ alloc_hooks(__alloc_frozen_pages_noprof(__VA_ARGS__)) -void free_frozen_pages(struct page *page, unsigned int order); -void free_unref_folios(struct folio_batch *fbatch); +void free_frozen_pages(struct page *page, unsigned int order, unsigned short luf_key); +void free_unref_folios(struct folio_batch *fbatch, unsigned short luf_key); #ifdef CONFIG_NUMA struct page *alloc_frozen_pages_noprof(gfp_t, unsigned int order); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index a964a98fbad51..d2d23bbd60467 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -212,7 +212,7 @@ unsigned int pageblock_order __read_mostly; #endif static void __free_pages_ok(struct page *page, unsigned int order, - fpi_t fpi_flags); + fpi_t fpi_flags, unsigned short luf_key); /* * results with 256, 32 in the lowmem_reserve sysctl: @@ -850,8 +850,13 @@ static inline void __del_page_from_free_list(struct page *page, struct zone *zon list_del(&page->buddy_list); __ClearPageBuddy(page); - set_page_private(page, 0); zone->free_area[order].nr_free--; + + /* + * Keep head page's private until post_alloc_hook(). + * + * XXX: Tail pages' private doesn't get cleared. + */ } static inline void del_page_from_free_list(struct page *page, struct zone *zone, @@ -920,7 +925,7 @@ buddy_merge_likely(unsigned long pfn, unsigned long buddy_pfn, static inline void __free_one_page(struct page *page, unsigned long pfn, struct zone *zone, unsigned int order, - int migratetype, fpi_t fpi_flags) + int migratetype, fpi_t fpi_flags, unsigned short luf_key) { struct capture_control *capc = task_capc(zone); unsigned long buddy_pfn = 0; @@ -937,10 +942,21 @@ static inline void __free_one_page(struct page *page, account_freepages(zone, 1 << order, migratetype); + /* + * Use the page's luf_key unchanged if luf_key == 0. Worth + * noting that page_luf_key() will be 0 in most cases since it's + * initialized at free_pages_prepare(). 
+ */ + if (luf_key) + set_page_luf_key(page, luf_key); + else + luf_key = page_luf_key(page); + while (order < MAX_PAGE_ORDER) { int buddy_mt = migratetype; + unsigned short buddy_luf_key; - if (compaction_capture(capc, page, order, migratetype)) { + if (!luf_key && compaction_capture(capc, page, order, migratetype)) { account_freepages(zone, -(1 << order), migratetype); return; } @@ -973,6 +989,18 @@ static inline void __free_one_page(struct page *page, else __del_page_from_free_list(buddy, zone, order, buddy_mt); + /* + * !buddy_luf_key && !luf_key : do nothing + * buddy_luf_key && !luf_key : luf_key = buddy_luf_key + * !buddy_luf_key && luf_key : do nothing + * buddy_luf_key && luf_key : merge two into luf_key + */ + buddy_luf_key = page_luf_key(buddy); + if (buddy_luf_key && !luf_key) + luf_key = buddy_luf_key; + else if (buddy_luf_key && luf_key) + fold_luf_batch(&luf_batch[luf_key], &luf_batch[buddy_luf_key]); + if (unlikely(buddy_mt != migratetype)) { /* * Match buddy type. This ensures that an @@ -984,6 +1012,7 @@ static inline void __free_one_page(struct page *page, combined_pfn = buddy_pfn & pfn; page = page + (combined_pfn - pfn); + set_page_luf_key(page, luf_key); pfn = combined_pfn; order++; } @@ -1164,6 +1193,11 @@ __always_inline bool free_pages_prepare(struct page *page, VM_BUG_ON_PAGE(PageTail(page), page); + /* + * Ensure private is zero before using it inside allocator. + */ + set_page_private(page, 0); + trace_mm_page_free(page, order); kmsan_free_page(page, order); @@ -1329,7 +1363,8 @@ static void free_pcppages_bulk(struct zone *zone, int count, count -= nr_pages; pcp->count -= nr_pages; - __free_one_page(page, pfn, zone, order, mt, FPI_NONE); + __free_one_page(page, pfn, zone, order, mt, FPI_NONE, 0); + trace_mm_page_pcpu_drain(page, order, mt); } while (count > 0 && !list_empty(list)); } @@ -1353,7 +1388,7 @@ static void split_large_buddy(struct zone *zone, struct page *page, do { int mt = get_pfnblock_migratetype(page, pfn); - __free_one_page(page, pfn, zone, order, mt, fpi); + __free_one_page(page, pfn, zone, order, mt, fpi, 0); pfn += 1 << order; if (pfn == end) break; @@ -1363,11 +1398,18 @@ static void split_large_buddy(struct zone *zone, struct page *page, static void free_one_page(struct zone *zone, struct page *page, unsigned long pfn, unsigned int order, - fpi_t fpi_flags) + fpi_t fpi_flags, unsigned short luf_key) { unsigned long flags; spin_lock_irqsave(&zone->lock, flags); + + /* + * valid luf_key can be passed only if order == 0. + */ + VM_WARN_ON(luf_key && order); + set_page_luf_key(page, luf_key); + split_large_buddy(zone, page, pfn, order, fpi_flags); spin_unlock_irqrestore(&zone->lock, flags); @@ -1375,13 +1417,13 @@ static void free_one_page(struct zone *zone, struct page *page, } static void __free_pages_ok(struct page *page, unsigned int order, - fpi_t fpi_flags) + fpi_t fpi_flags, unsigned short luf_key) { unsigned long pfn = page_to_pfn(page); struct zone *zone = page_zone(page); if (free_pages_prepare(page, order)) - free_one_page(zone, page, pfn, order, fpi_flags); + free_one_page(zone, page, pfn, order, fpi_flags, luf_key); } void __meminit __free_pages_core(struct page *page, unsigned int order, @@ -1429,7 +1471,7 @@ void __meminit __free_pages_core(struct page *page, unsigned int order, * Bypass PCP and place fresh pages right to the tail, primarily * relevant for memory onlining. 
*/ - __free_pages_ok(page, order, FPI_TO_TAIL); + __free_pages_ok(page, order, FPI_TO_TAIL, 0); } /* @@ -2426,6 +2468,10 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, if (unlikely(page == NULL)) break; + /* + * Keep the page's luf_key. + */ + /* * Split buddy pages returned by expand() are received here in * physical page order. The page is added to the tail of @@ -2707,12 +2753,14 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone, static void free_frozen_page_commit(struct zone *zone, struct per_cpu_pages *pcp, struct page *page, int migratetype, - unsigned int order) + unsigned int order, unsigned short luf_key) { int high, batch; int pindex; bool free_high = false; + set_page_luf_key(page, luf_key); + /* * On freeing, reduce the number of pages that are batch allocated. * See nr_pcp_alloc() where alloc_factor is increased for subsequent @@ -2721,7 +2769,16 @@ static void free_frozen_page_commit(struct zone *zone, pcp->alloc_factor >>= 1; __count_vm_events(PGFREE, 1 << order); pindex = order_to_pindex(migratetype, order); - list_add(&page->pcp_list, &pcp->lists[pindex]); + + /* + * Defer tlb shootdown as much as possible by putting luf'd + * pages to the tail. + */ + if (luf_key) + list_add_tail(&page->pcp_list, &pcp->lists[pindex]); + else + list_add(&page->pcp_list, &pcp->lists[pindex]); + pcp->count += 1 << order; batch = READ_ONCE(pcp->batch); @@ -2756,7 +2813,8 @@ static void free_frozen_page_commit(struct zone *zone, /* * Free a pcp page */ -void free_frozen_pages(struct page *page, unsigned int order) +void free_frozen_pages(struct page *page, unsigned int order, + unsigned short luf_key) { unsigned long __maybe_unused UP_flags; struct per_cpu_pages *pcp; @@ -2765,7 +2823,7 @@ void free_frozen_pages(struct page *page, unsigned int order) int migratetype; if (!pcp_allowed_order(order)) { - __free_pages_ok(page, order, FPI_NONE); + __free_pages_ok(page, order, FPI_NONE, luf_key); return; } @@ -2783,7 +2841,7 @@ void free_frozen_pages(struct page *page, unsigned int order) migratetype = get_pfnblock_migratetype(page, pfn); if (unlikely(migratetype >= MIGRATE_PCPTYPES)) { if (unlikely(is_migrate_isolate(migratetype))) { - free_one_page(zone, page, pfn, order, FPI_NONE); + free_one_page(zone, page, pfn, order, FPI_NONE, luf_key); return; } migratetype = MIGRATE_MOVABLE; @@ -2792,10 +2850,10 @@ void free_frozen_pages(struct page *page, unsigned int order) pcp_trylock_prepare(UP_flags); pcp = pcp_spin_trylock(zone->per_cpu_pageset); if (pcp) { - free_frozen_page_commit(zone, pcp, page, migratetype, order); + free_frozen_page_commit(zone, pcp, page, migratetype, order, luf_key); pcp_spin_unlock(pcp); } else { - free_one_page(zone, page, pfn, order, FPI_NONE); + free_one_page(zone, page, pfn, order, FPI_NONE, luf_key); } pcp_trylock_finish(UP_flags); } @@ -2803,7 +2861,7 @@ void free_frozen_pages(struct page *page, unsigned int order) /* * Free a batch of folios */ -void free_unref_folios(struct folio_batch *folios) +void free_unref_folios(struct folio_batch *folios, unsigned short luf_key) { unsigned long __maybe_unused UP_flags; struct per_cpu_pages *pcp = NULL; @@ -2824,7 +2882,7 @@ void free_unref_folios(struct folio_batch *folios) */ if (!pcp_allowed_order(order)) { free_one_page(folio_zone(folio), &folio->page, - pfn, order, FPI_NONE); + pfn, order, FPI_NONE, luf_key); continue; } folio->private = (void *)(unsigned long)order; @@ -2860,7 +2918,7 @@ void free_unref_folios(struct folio_batch *folios) */ if (is_migrate_isolate(migratetype)) { 
free_one_page(zone, &folio->page, pfn, - order, FPI_NONE); + order, FPI_NONE, luf_key); continue; } @@ -2873,7 +2931,7 @@ void free_unref_folios(struct folio_batch *folios) if (unlikely(!pcp)) { pcp_trylock_finish(UP_flags); free_one_page(zone, &folio->page, pfn, - order, FPI_NONE); + order, FPI_NONE, luf_key); continue; } locked_zone = zone; @@ -2888,7 +2946,7 @@ void free_unref_folios(struct folio_batch *folios) trace_mm_page_free_batched(&folio->page); free_frozen_page_commit(zone, pcp, &folio->page, migratetype, - order); + order, luf_key); } if (pcp) { @@ -2980,7 +3038,7 @@ void __putback_isolated_page(struct page *page, unsigned int order, int mt) /* Return isolated page to tail of freelist. */ __free_one_page(page, page_to_pfn(page), zone, order, mt, - FPI_SKIP_REPORT_NOTIFY | FPI_TO_TAIL); + FPI_SKIP_REPORT_NOTIFY | FPI_TO_TAIL, 0); } /* @@ -4866,7 +4924,7 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order, out: if (memcg_kmem_online() && (gfp & __GFP_ACCOUNT) && page && unlikely(__memcg_kmem_charge_page(page, gfp, order) != 0)) { - free_frozen_pages(page, order); + free_frozen_pages(page, order, 0); page = NULL; } @@ -4947,11 +5005,11 @@ void __free_pages(struct page *page, unsigned int order) struct alloc_tag *tag = pgalloc_tag_get(page); if (put_page_testzero(page)) - free_frozen_pages(page, order); + free_frozen_pages(page, order, 0); else if (!head) { pgalloc_tag_sub_pages(tag, (1 << order) - 1); while (order-- > 0) - free_frozen_pages(page + (1 << order), order); + free_frozen_pages(page + (1 << order), order, 0); } } EXPORT_SYMBOL(__free_pages); @@ -4982,7 +5040,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order, last = page + (1UL << order); for (page += nr; page < last; page++) - __free_pages_ok(page, 0, FPI_TO_TAIL); + __free_pages_ok(page, 0, FPI_TO_TAIL, 0); } return (void *)addr; } @@ -7000,7 +7058,7 @@ bool put_page_back_buddy(struct page *page) int migratetype = get_pfnblock_migratetype(page, pfn); ClearPageHWPoisonTakenOff(page); - __free_one_page(page, pfn, zone, 0, migratetype, FPI_NONE); + __free_one_page(page, pfn, zone, 0, migratetype, FPI_NONE, 0); if (TestClearPageHWPoison(page)) { ret = true; } @@ -7069,7 +7127,7 @@ static void __accept_page(struct zone *zone, unsigned long *flags, accept_memory(page_to_phys(page), PAGE_SIZE << MAX_PAGE_ORDER); - __free_pages_ok(page, MAX_PAGE_ORDER, FPI_TO_TAIL); + __free_pages_ok(page, MAX_PAGE_ORDER, FPI_TO_TAIL, 0); if (last) static_branch_dec(&zones_with_unaccepted_pages); diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c index d2423f30577e4..558622f15a81e 100644 --- a/mm/page_frag_cache.c +++ b/mm/page_frag_cache.c @@ -86,7 +86,7 @@ void __page_frag_cache_drain(struct page *page, unsigned int count) VM_BUG_ON_PAGE(page_ref_count(page) == 0, page); if (page_ref_sub_and_test(page, count)) - free_frozen_pages(page, compound_order(page)); + free_frozen_pages(page, compound_order(page), 0); } EXPORT_SYMBOL(__page_frag_cache_drain); @@ -139,7 +139,7 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc, if (unlikely(encoded_page_decode_pfmemalloc(encoded_page))) { free_frozen_pages(page, - encoded_page_decode_order(encoded_page)); + encoded_page_decode_order(encoded_page), 0); goto refill; } @@ -166,6 +166,6 @@ void page_frag_free(void *addr) struct page *page = virt_to_head_page(addr); if (unlikely(put_page_testzero(page))) - free_frozen_pages(page, compound_order(page)); + free_frozen_pages(page, compound_order(page), 0); } EXPORT_SYMBOL(page_frag_free); diff --git 
a/mm/page_isolation.c b/mm/page_isolation.c index c608e9d728655..04dcea88a0dda 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -258,6 +258,12 @@ static void unset_migratetype_isolate(struct page *page, int migratetype) WARN_ON_ONCE(!move_freepages_block_isolate(zone, page, migratetype)); } else { set_pageblock_migratetype(page, migratetype); + + /* + * Do not clear the page's private to keep its luf_key + * unchanged. + */ + __putback_isolated_page(page, order, migratetype); } zone->nr_isolate_pageblock--; diff --git a/mm/page_reporting.c b/mm/page_reporting.c index e4c428e61d8c1..c05afb7a395f1 100644 --- a/mm/page_reporting.c +++ b/mm/page_reporting.c @@ -116,6 +116,12 @@ page_reporting_drain(struct page_reporting_dev_info *prdev, int mt = get_pageblock_migratetype(page); unsigned int order = get_order(sg->length); + /* + * Ensure private is zero before putting into the + * allocator. + */ + set_page_private(page, 0); + __putback_isolated_page(page, order, mt); /* If the pages were not reported due to error skip flagging */ diff --git a/mm/slub.c b/mm/slub.c index 1f50129dcfb3c..2cc3bf0f58bce 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -2652,7 +2652,7 @@ static void __free_slab(struct kmem_cache *s, struct slab *slab) __folio_clear_slab(folio); mm_account_reclaimed_pages(pages); unaccount_slab(slab, order, s); - free_frozen_pages(&folio->page, order); + free_frozen_pages(&folio->page, order, 0); } static void rcu_free_slab(struct rcu_head *h) diff --git a/mm/swap.c b/mm/swap.c index fc8281ef42415..0c6198e4a8ee4 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -109,7 +109,7 @@ void __folio_put(struct folio *folio) page_cache_release(folio); folio_unqueue_deferred_split(folio); mem_cgroup_uncharge(folio); - free_frozen_pages(&folio->page, folio_order(folio)); + free_frozen_pages(&folio->page, folio_order(folio), 0); } EXPORT_SYMBOL(__folio_put); @@ -991,7 +991,7 @@ void folios_put_refs(struct folio_batch *folios, unsigned int *refs) folios->nr = j; mem_cgroup_uncharge_folios(folios); - free_unref_folios(folios); + free_unref_folios(folios, 0); } EXPORT_SYMBOL(folios_put_refs); diff --git a/mm/vmscan.c b/mm/vmscan.c index c767d71c43d7d..ff1c53e769398 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1515,7 +1515,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, if (folio_batch_add(&free_folios, folio) == 0) { mem_cgroup_uncharge_folios(&free_folios); try_to_unmap_flush(); - free_unref_folios(&free_folios); + free_unref_folios(&free_folios, 0); } continue; @@ -1584,7 +1584,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, mem_cgroup_uncharge_folios(&free_folios); try_to_unmap_flush(); - free_unref_folios(&free_folios); + free_unref_folios(&free_folios, 0); list_splice(&ret_folios, folio_list); count_vm_events(PGACTIVATE, pgactivate); @@ -1908,7 +1908,7 @@ static unsigned int move_folios_to_lru(struct lruvec *lruvec, if (folio_batch_add(&free_folios, folio) == 0) { spin_unlock_irq(&lruvec->lru_lock); mem_cgroup_uncharge_folios(&free_folios); - free_unref_folios(&free_folios); + free_unref_folios(&free_folios, 0); spin_lock_irq(&lruvec->lru_lock); } @@ -1930,7 +1930,7 @@ static unsigned int move_folios_to_lru(struct lruvec *lruvec, if (free_folios.nr) { spin_unlock_irq(&lruvec->lru_lock); mem_cgroup_uncharge_folios(&free_folios); - free_unref_folios(&free_folios); + free_unref_folios(&free_folios, 0); spin_lock_irq(&lruvec->lru_lock); } From patchwork Wed Feb 26 12:03:23 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13992206
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on v6.14-rc4 12/25] mm: delimit critical sections to take off pages from pcp or buddy allocator
Date: Wed, 26 Feb 2025 21:03:23 +0900
Message-Id: <20250226120336.29565-12-byungchul@sk.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20250226120336.29565-1-byungchul@sk.com>
References: <20250226113024.GA1935@system.software.com> <20250226120336.29565-1-byungchul@sk.com>
Now that the luf mechanism has been introduced, a tlb shootdown might be necessary when luf'd pages exit the pcp or buddy allocator. Check whether pages can be taken off and, for luf'd pages, perform the deferred tlb shootdown before they are used.
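For review purposes, the usage pattern this patch applies throughout the allocator can be sketched as below. This is an illustration only: take_page_with_luf() is a made-up name for this example, while luf_takeoff_start()/luf_takeoff_end() and __rmqueue_smallest() are the helpers actually used in the diff.

static struct page *take_page_with_luf(struct zone *zone, unsigned int order,
				       int migratetype)
{
	struct page *page;
	unsigned long flags;

	/* Open the window in which pages may be taken off pcp/buddy. */
	luf_takeoff_start();
	spin_lock_irqsave(&zone->lock, flags);

	page = __rmqueue_smallest(zone, order, migratetype);

	spin_unlock_irqrestore(&zone->lock, flags);
	/*
	 * Check and flush before using the pages taken off: if any of
	 * them were luf'd, the deferred tlb shootdown happens here.
	 */
	luf_takeoff_end();

	return page;
}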
Signed-off-by: Byungchul Park --- mm/compaction.c | 32 ++++++++++++++++-- mm/internal.h | 2 +- mm/page_alloc.c | 79 +++++++++++++++++++++++++++++++++++++++++++-- mm/page_isolation.c | 4 ++- mm/page_reporting.c | 20 +++++++++++- 5 files changed, 129 insertions(+), 8 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 12ed8425fa175..e26736d5b7b9c 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -606,6 +606,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, page = pfn_to_page(blockpfn); + luf_takeoff_start(); /* Isolate free pages. */ for (; blockpfn < end_pfn; blockpfn += stride, page += stride) { int isolated; @@ -654,9 +655,12 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, goto isolate_fail; } + if (!luf_takeoff_check(page)) + goto isolate_fail; + /* Found a free page, will break it into order-0 pages */ order = buddy_order(page); - isolated = __isolate_free_page(page, order); + isolated = __isolate_free_page(page, order, false); if (!isolated) break; set_page_private(page, order); @@ -684,6 +688,11 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, if (locked) spin_unlock_irqrestore(&cc->zone->lock, flags); + /* + * Check and flush before using the pages taken off. + */ + luf_takeoff_end(); + /* * Be careful to not go outside of the pageblock. */ @@ -1591,6 +1600,7 @@ static void fast_isolate_freepages(struct compact_control *cc) if (!area->nr_free) continue; + luf_takeoff_start(); spin_lock_irqsave(&cc->zone->lock, flags); freelist = &area->free_list[MIGRATE_MOVABLE]; list_for_each_entry_reverse(freepage, freelist, buddy_list) { @@ -1598,6 +1608,10 @@ static void fast_isolate_freepages(struct compact_control *cc) order_scanned++; nr_scanned++; + + if (!luf_takeoff_check(freepage)) + goto scan_next; + pfn = page_to_pfn(freepage); if (pfn >= highest) @@ -1617,7 +1631,7 @@ static void fast_isolate_freepages(struct compact_control *cc) /* Shorten the scan if a candidate is found */ limit >>= 1; } - +scan_next: if (order_scanned >= limit) break; } @@ -1635,7 +1649,7 @@ static void fast_isolate_freepages(struct compact_control *cc) /* Isolate the page if available */ if (page) { - if (__isolate_free_page(page, order)) { + if (__isolate_free_page(page, order, false)) { set_page_private(page, order); nr_isolated = 1 << order; nr_scanned += nr_isolated - 1; @@ -1652,6 +1666,11 @@ static void fast_isolate_freepages(struct compact_control *cc) spin_unlock_irqrestore(&cc->zone->lock, flags); + /* + * Check and flush before using the pages taken off. + */ + luf_takeoff_end(); + /* Skip fast search if enough freepages isolated */ if (cc->nr_freepages >= cc->nr_migratepages) break; @@ -2372,7 +2391,14 @@ static enum compact_result compact_finished(struct compact_control *cc) { int ret; + /* + * luf_takeoff_{start,end}() is required to identify whether + * this compaction context is tlb shootdownable for luf'd pages. 
+ */ + luf_takeoff_start(); ret = __compact_finished(cc); + luf_takeoff_end(); + trace_mm_compaction_finished(cc->zone, cc->order, ret); if (ret == COMPACT_NO_SUITABLE_PAGE) ret = COMPACT_CONTINUE; diff --git a/mm/internal.h b/mm/internal.h index 47d3291278e81..9426ff6346d44 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -664,7 +664,7 @@ static inline void clear_zone_contiguous(struct zone *zone) zone->contiguous = false; } -extern int __isolate_free_page(struct page *page, unsigned int order); +extern int __isolate_free_page(struct page *page, unsigned int order, bool willputback); extern void __putback_isolated_page(struct page *page, unsigned int order, int mt); extern void memblock_free_pages(struct page *page, unsigned long pfn, diff --git a/mm/page_alloc.c b/mm/page_alloc.c index d2d23bbd60467..325f07c34cfdc 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -869,8 +869,13 @@ static inline void del_page_from_free_list(struct page *page, struct zone *zone, static inline struct page *get_page_from_free_area(struct free_area *area, int migratetype) { - return list_first_entry_or_null(&area->free_list[migratetype], + struct page *page = list_first_entry_or_null(&area->free_list[migratetype], struct page, buddy_list); + + if (page && luf_takeoff_check(page)) + return page; + + return NULL; } /* @@ -1575,6 +1580,8 @@ static __always_inline void page_del_and_expand(struct zone *zone, int nr_pages = 1 << high; __del_page_from_free_list(page, zone, high, migratetype); + if (unlikely(!luf_takeoff_check_and_fold(page))) + VM_WARN_ON(1); nr_pages -= expand(zone, page, low, high, migratetype); account_freepages(zone, -nr_pages, migratetype); } @@ -1945,6 +1952,13 @@ bool move_freepages_block_isolate(struct zone *zone, struct page *page, del_page_from_free_list(buddy, zone, order, get_pfnblock_migratetype(buddy, pfn)); + + /* + * No need to luf_takeoff_check_and_fold() since it's + * going back to buddy. luf_key will be handed over in + * split_large_buddy(). + */ + set_pageblock_migratetype(page, migratetype); split_large_buddy(zone, buddy, pfn, order, FPI_NONE); return true; @@ -1956,6 +1970,13 @@ bool move_freepages_block_isolate(struct zone *zone, struct page *page, del_page_from_free_list(page, zone, order, get_pfnblock_migratetype(page, pfn)); + + /* + * No need to luf_takeoff_check_and_fold() since it's + * going back to buddy. luf_key will be handed over in + * split_large_buddy(). + */ + set_pageblock_migratetype(page, migratetype); split_large_buddy(zone, page, pfn, order, FPI_NONE); return true; @@ -2088,6 +2109,8 @@ steal_suitable_fallback(struct zone *zone, struct page *page, unsigned int nr_added; del_page_from_free_list(page, zone, current_order, block_type); + if (unlikely(!luf_takeoff_check_and_fold(page))) + VM_WARN_ON(1); change_pageblock_range(page, current_order, start_type); nr_added = expand(zone, page, order, current_order, start_type); account_freepages(zone, nr_added, start_type); @@ -2168,6 +2191,9 @@ int find_suitable_fallback(struct free_area *area, unsigned int order, if (free_area_empty(area, fallback_mt)) continue; + if (luf_takeoff_no_shootdown()) + continue; + if (can_steal_fallback(order, migratetype)) *can_steal = true; @@ -2259,6 +2285,11 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, pageblock_nr_pages) continue; + /* + * luf_takeoff_{start,end}() is required for + * get_page_from_free_area() to use luf_takeoff_check(). 
+ */ + luf_takeoff_start(); spin_lock_irqsave(&zone->lock, flags); for (order = 0; order < NR_PAGE_ORDERS; order++) { struct free_area *area = &(zone->free_area[order]); @@ -2316,10 +2347,12 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, WARN_ON_ONCE(ret == -1); if (ret > 0) { spin_unlock_irqrestore(&zone->lock, flags); + luf_takeoff_end(); return ret; } } spin_unlock_irqrestore(&zone->lock, flags); + luf_takeoff_end(); } return false; @@ -2461,6 +2494,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, unsigned long flags; int i; + luf_takeoff_start(); spin_lock_irqsave(&zone->lock, flags); for (i = 0; i < count; ++i) { struct page *page = __rmqueue(zone, order, migratetype, @@ -2485,6 +2519,10 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, list_add_tail(&page->pcp_list, list); } spin_unlock_irqrestore(&zone->lock, flags); + /* + * Check and flush before using the pages taken off. + */ + luf_takeoff_end(); return i; } @@ -2979,7 +3017,7 @@ void split_page(struct page *page, unsigned int order) } EXPORT_SYMBOL_GPL(split_page); -int __isolate_free_page(struct page *page, unsigned int order) +int __isolate_free_page(struct page *page, unsigned int order, bool willputback) { struct zone *zone = page_zone(page); int mt = get_pageblock_migratetype(page); @@ -2998,6 +3036,8 @@ int __isolate_free_page(struct page *page, unsigned int order) } del_page_from_free_list(page, zone, order, mt); + if (unlikely(!willputback && !luf_takeoff_check_and_fold(page))) + VM_WARN_ON(1); /* * Set the pageblock if the isolated page is at least half of a @@ -3077,6 +3117,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone, do { page = NULL; + luf_takeoff_start(); spin_lock_irqsave(&zone->lock, flags); if (alloc_flags & ALLOC_HIGHATOMIC) page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC); @@ -3094,10 +3135,15 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone, if (!page) { spin_unlock_irqrestore(&zone->lock, flags); + luf_takeoff_end(); return NULL; } } spin_unlock_irqrestore(&zone->lock, flags); + /* + * Check and flush before using the pages taken off. + */ + luf_takeoff_end(); } while (check_new_pages(page, order)); __count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order); @@ -3181,6 +3227,8 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order, } page = list_first_entry(list, struct page, pcp_list); + if (!luf_takeoff_check_and_fold(page)) + return NULL; list_del(&page->pcp_list); pcp->count -= 1 << order; } while (check_new_pages(page, order)); @@ -3198,11 +3246,13 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone, struct page *page; unsigned long __maybe_unused UP_flags; + luf_takeoff_start(); /* spin_trylock may fail due to a parallel drain or IRQ reentrancy. */ pcp_trylock_prepare(UP_flags); pcp = pcp_spin_trylock(zone->per_cpu_pageset); if (!pcp) { pcp_trylock_finish(UP_flags); + luf_takeoff_end(); return NULL; } @@ -3216,6 +3266,10 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone, page = __rmqueue_pcplist(zone, order, migratetype, alloc_flags, pcp, list); pcp_spin_unlock(pcp); pcp_trylock_finish(UP_flags); + /* + * Check and flush before using the pages taken off. 
+ */ + luf_takeoff_end(); if (page) { __count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order); zone_statistics(preferred_zone, zone, 1); @@ -4814,6 +4868,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid, if (unlikely(!zone)) goto failed; + luf_takeoff_start(); /* spin_trylock may fail due to a parallel drain or IRQ reentrancy. */ pcp_trylock_prepare(UP_flags); pcp = pcp_spin_trylock(zone->per_cpu_pageset); @@ -4849,6 +4904,10 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid, pcp_spin_unlock(pcp); pcp_trylock_finish(UP_flags); + /* + * Check and flush before using the pages taken off. + */ + luf_takeoff_end(); __count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account); zone_statistics(zonelist_zone(ac.preferred_zoneref), zone, nr_account); @@ -4858,6 +4917,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid, failed_irq: pcp_trylock_finish(UP_flags); + luf_takeoff_end(); failed: page = __alloc_pages_noprof(gfp, 0, preferred_nid, nodemask); @@ -6912,6 +6972,7 @@ unsigned long __offline_isolated_pages(unsigned long start_pfn, offline_mem_sections(pfn, end_pfn); zone = page_zone(pfn_to_page(pfn)); + luf_takeoff_start(); spin_lock_irqsave(&zone->lock, flags); while (pfn < end_pfn) { page = pfn_to_page(pfn); @@ -6940,9 +7001,15 @@ unsigned long __offline_isolated_pages(unsigned long start_pfn, VM_WARN_ON(get_pageblock_migratetype(page) != MIGRATE_ISOLATE); order = buddy_order(page); del_page_from_free_list(page, zone, order, MIGRATE_ISOLATE); + if (unlikely(!luf_takeoff_check_and_fold(page))) + VM_WARN_ON(1); pfn += (1 << order); } spin_unlock_irqrestore(&zone->lock, flags); + /* + * Check and flush before using the pages taken off. + */ + luf_takeoff_end(); return end_pfn - start_pfn - already_offline; } @@ -7018,6 +7085,7 @@ bool take_page_off_buddy(struct page *page) unsigned int order; bool ret = false; + luf_takeoff_start(); spin_lock_irqsave(&zone->lock, flags); for (order = 0; order < NR_PAGE_ORDERS; order++) { struct page *page_head = page - (pfn & ((1 << order) - 1)); @@ -7030,6 +7098,8 @@ bool take_page_off_buddy(struct page *page) del_page_from_free_list(page_head, zone, page_order, migratetype); + if (unlikely(!luf_takeoff_check_and_fold(page_head))) + VM_WARN_ON(1); break_down_buddy_pages(zone, page_head, page, 0, page_order, migratetype); SetPageHWPoisonTakenOff(page); @@ -7040,6 +7110,11 @@ bool take_page_off_buddy(struct page *page) break; } spin_unlock_irqrestore(&zone->lock, flags); + + /* + * Check and flush before using the pages taken off. 
+ */ + luf_takeoff_end(); return ret; } diff --git a/mm/page_isolation.c b/mm/page_isolation.c index 04dcea88a0dda..c34659b58ca6c 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -211,6 +211,7 @@ static void unset_migratetype_isolate(struct page *page, int migratetype) struct page *buddy; zone = page_zone(page); + luf_takeoff_start(); spin_lock_irqsave(&zone->lock, flags); if (!is_migrate_isolate_page(page)) goto out; @@ -229,7 +230,7 @@ static void unset_migratetype_isolate(struct page *page, int migratetype) buddy = find_buddy_page_pfn(page, page_to_pfn(page), order, NULL); if (buddy && !is_migrate_isolate_page(buddy)) { - isolated_page = !!__isolate_free_page(page, order); + isolated_page = !!__isolate_free_page(page, order, true); /* * Isolating a free page in an isolated pageblock * is expected to always work as watermarks don't @@ -269,6 +270,7 @@ static void unset_migratetype_isolate(struct page *page, int migratetype) zone->nr_isolate_pageblock--; out: spin_unlock_irqrestore(&zone->lock, flags); + luf_takeoff_end(zone); } static inline struct page * diff --git a/mm/page_reporting.c b/mm/page_reporting.c index c05afb7a395f1..03a7f5f6dc073 100644 --- a/mm/page_reporting.c +++ b/mm/page_reporting.c @@ -167,6 +167,7 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, if (list_empty(list)) return err; + luf_takeoff_start(); spin_lock_irq(&zone->lock); /* @@ -191,6 +192,11 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, if (PageReported(page)) continue; + if (!luf_takeoff_check(page)) { + VM_WARN_ON(1); + continue; + } + /* * If we fully consumed our budget then update our * state to indicate that we are requesting additional @@ -204,7 +210,7 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, /* Attempt to pull page from list and place in scatterlist */ if (*offset) { - if (!__isolate_free_page(page, order)) { + if (!__isolate_free_page(page, order, false)) { next = page; break; } @@ -227,6 +233,11 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, /* release lock before waiting on report processing */ spin_unlock_irq(&zone->lock); + /* + * Check and flush before using the pages taken off. + */ + luf_takeoff_end(); + /* begin processing pages in local list */ err = prdev->report(prdev, sgl, PAGE_REPORTING_CAPACITY); @@ -236,6 +247,8 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, /* update budget to reflect call to report function */ budget--; + luf_takeoff_start(); + /* reacquire zone lock and resume processing */ spin_lock_irq(&zone->lock); @@ -259,6 +272,11 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, spin_unlock_irq(&zone->lock); + /* + * Check and flush before using the pages taken off. 
+ */ + luf_takeoff_end(); + return err; }
From patchwork Wed Feb 26 12:03:24 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13992204
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on v6.14-rc4 13/25] mm: introduce pend_list in struct free_area to track luf'd pages
Date: Wed, 26 Feb 2025 21:03:24 +0900
Message-Id: <20250226120336.29565-13-byungchul@sk.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20250226120336.29565-1-byungchul@sk.com>
References: <20250226113024.GA1935@system.software.com> <20250226120336.29565-1-byungchul@sk.com>
luf'd pages require a tlb shootdown when they leave the page allocator. For some page allocation requests it is acceptable to return a luf'd page at the cost of a tlb shootdown, but that is not acceptable in, e.g., irq context. This patch splits the list in free_area into two, 'free_list' for non-luf'd pages and 'pend_list' for luf'd pages, so that the buddy allocator can work better under the various constraints of the calling context.
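As a rough illustration of the policy this split enables (the helper below is hypothetical; the real decision lives in non_luf_pages_ok() and get_page_from_free_area() in the diff), the allocator keeps serving non-luf pages from free_list while enough of them remain above the min watermark, and only then starts consuming pend_list, which implies a tlb shootdown before a page can be handed out:

/* Hypothetical helper mirroring non_luf_pages_ok() in the diff below. */
static bool prefer_pend_list(struct zone *zone)
{
	unsigned long nr_free = zone_page_state(zone, NR_FREE_PAGES);
	unsigned long nr_luf = atomic_long_read(&zone->nr_luf_pages);

	/*
	 * Too few non-luf pages are left above the min watermark, so
	 * start taking luf'd pages from pend_list first.
	 */
	return nr_free - nr_luf <= min_wmark_pages(zone);
}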
Signed-off-by: Byungchul Park --- include/linux/mmzone.h | 3 ++ kernel/power/snapshot.c | 14 ++++++ kernel/vmcore_info.c | 2 + mm/compaction.c | 33 ++++++++++--- mm/internal.h | 17 ++++++- mm/mm_init.c | 2 + mm/page_alloc.c | 105 ++++++++++++++++++++++++++++++++++------ mm/page_reporting.c | 22 ++++++--- mm/vmstat.c | 15 ++++++ 9 files changed, 184 insertions(+), 29 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 9540b41894da6..e2c8d7147e361 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -116,6 +116,7 @@ extern int page_group_by_mobility_disabled; MIGRATETYPE_MASK) struct free_area { struct list_head free_list[MIGRATE_TYPES]; + struct list_head pend_list[MIGRATE_TYPES]; unsigned long nr_free; }; @@ -1014,6 +1015,8 @@ struct zone { /* Zone statistics */ atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS]; atomic_long_t vm_numa_event[NR_VM_NUMA_EVENT_ITEMS]; + /* Count pages that need tlb shootdown on allocation */ + atomic_long_t nr_luf_pages; } ____cacheline_internodealigned_in_smp; enum pgdat_flags { diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c index c9fb559a63993..ca10796855aba 100644 --- a/kernel/power/snapshot.c +++ b/kernel/power/snapshot.c @@ -1285,6 +1285,20 @@ static void mark_free_pages(struct zone *zone) swsusp_set_page_free(pfn_to_page(pfn + i)); } } + + list_for_each_entry(page, + &zone->free_area[order].pend_list[t], buddy_list) { + unsigned long i; + + pfn = page_to_pfn(page); + for (i = 0; i < (1UL << order); i++) { + if (!--page_count) { + touch_nmi_watchdog(); + page_count = WD_PAGE_COUNT; + } + swsusp_set_page_free(pfn_to_page(pfn + i)); + } + } } spin_unlock_irqrestore(&zone->lock, flags); } diff --git a/kernel/vmcore_info.c b/kernel/vmcore_info.c index 1fec61603ef32..638deb57f9ddd 100644 --- a/kernel/vmcore_info.c +++ b/kernel/vmcore_info.c @@ -188,11 +188,13 @@ static int __init crash_save_vmcoreinfo_init(void) VMCOREINFO_OFFSET(zone, vm_stat); VMCOREINFO_OFFSET(zone, spanned_pages); VMCOREINFO_OFFSET(free_area, free_list); + VMCOREINFO_OFFSET(free_area, pend_list); VMCOREINFO_OFFSET(list_head, next); VMCOREINFO_OFFSET(list_head, prev); VMCOREINFO_LENGTH(zone.free_area, NR_PAGE_ORDERS); log_buf_vmcoreinfo_setup(); VMCOREINFO_LENGTH(free_area.free_list, MIGRATE_TYPES); + VMCOREINFO_LENGTH(free_area.pend_list, MIGRATE_TYPES); VMCOREINFO_NUMBER(NR_FREE_PAGES); VMCOREINFO_NUMBER(PG_lru); VMCOREINFO_NUMBER(PG_private); diff --git a/mm/compaction.c b/mm/compaction.c index e26736d5b7b9c..aa594a85d8aee 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1592,24 +1592,28 @@ static void fast_isolate_freepages(struct compact_control *cc) order = next_search_order(cc, order)) { struct free_area *area = &cc->zone->free_area[order]; struct list_head *freelist; + struct list_head *high_pfn_list; struct page *freepage; unsigned long flags; unsigned int order_scanned = 0; unsigned long high_pfn = 0; + bool consider_pend = false; + bool can_shootdown; if (!area->nr_free) continue; - luf_takeoff_start(); + can_shootdown = luf_takeoff_start(); spin_lock_irqsave(&cc->zone->lock, flags); freelist = &area->free_list[MIGRATE_MOVABLE]; +retry: list_for_each_entry_reverse(freepage, freelist, buddy_list) { unsigned long pfn; order_scanned++; nr_scanned++; - if (!luf_takeoff_check(freepage)) + if (unlikely(consider_pend && !luf_takeoff_check(freepage))) goto scan_next; pfn = page_to_pfn(freepage); @@ -1622,26 +1626,34 @@ static void fast_isolate_freepages(struct compact_control *cc) cc->fast_search_fail = 0; cc->search_order = 
order; page = freepage; - break; + goto done; } if (pfn >= min_pfn && pfn > high_pfn) { high_pfn = pfn; + high_pfn_list = freelist; /* Shorten the scan if a candidate is found */ limit >>= 1; } scan_next: if (order_scanned >= limit) - break; + goto done; } + if (!consider_pend && can_shootdown) { + consider_pend = true; + freelist = &area->pend_list[MIGRATE_MOVABLE]; + goto retry; + } +done: /* Use a maximum candidate pfn if a preferred one was not found */ if (!page && high_pfn) { page = pfn_to_page(high_pfn); /* Update freepage for the list reorder below */ freepage = page; + freelist = high_pfn_list; } /* Reorder to so a future search skips recent pages */ @@ -2039,18 +2051,20 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc) struct list_head *freelist; unsigned long flags; struct page *freepage; + bool consider_pend = false; if (!area->nr_free) continue; spin_lock_irqsave(&cc->zone->lock, flags); freelist = &area->free_list[MIGRATE_MOVABLE]; +retry: list_for_each_entry(freepage, freelist, buddy_list) { unsigned long free_pfn; if (nr_scanned++ >= limit) { move_freelist_tail(freelist, freepage); - break; + goto done; } free_pfn = page_to_pfn(freepage); @@ -2073,9 +2087,16 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc) pfn = cc->zone->zone_start_pfn; cc->fast_search_fail = 0; found_block = true; - break; + goto done; } } + + if (!consider_pend) { + consider_pend = true; + freelist = &area->pend_list[MIGRATE_MOVABLE]; + goto retry; + } +done: spin_unlock_irqrestore(&cc->zone->lock, flags); } diff --git a/mm/internal.h b/mm/internal.h index 9426ff6346d44..9dbb65f919337 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -849,11 +849,16 @@ void init_cma_reserved_pageblock(struct page *page); int find_suitable_fallback(struct free_area *area, unsigned int order, int migratetype, bool only_stealable, bool *can_steal); -static inline bool free_area_empty(struct free_area *area, int migratetype) +static inline bool free_list_empty(struct free_area *area, int migratetype) { return list_empty(&area->free_list[migratetype]); } +static inline bool free_area_empty(struct free_area *area, int migratetype) +{ + return list_empty(&area->free_list[migratetype]) && + list_empty(&area->pend_list[migratetype]); +} /* mm/util.c */ struct anon_vma *folio_anon_vma(const struct folio *folio); @@ -1587,12 +1592,22 @@ void luf_takeoff_end(void); bool luf_takeoff_no_shootdown(void); bool luf_takeoff_check(struct page *page); bool luf_takeoff_check_and_fold(struct page *page); + +static inline bool non_luf_pages_ok(struct zone *zone) +{ + unsigned long nr_free = zone_page_state(zone, NR_FREE_PAGES); + unsigned long min_wm = min_wmark_pages(zone); + unsigned long nr_luf_pages = atomic_long_read(&zone->nr_luf_pages); + + return nr_free - nr_luf_pages > min_wm; +} #else static inline bool luf_takeoff_start(void) { return false; } static inline void luf_takeoff_end(void) {} static inline bool luf_takeoff_no_shootdown(void) { return true; } static inline bool luf_takeoff_check(struct page *page) { return true; } static inline bool luf_takeoff_check_and_fold(struct page *page) { return true; } +static inline bool non_luf_pages_ok(struct zone *zone) { return true; } #endif /* pagewalk.c */ diff --git a/mm/mm_init.c b/mm/mm_init.c index 2630cc30147e0..41c38fbb58a30 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -1399,12 +1399,14 @@ static void __meminit zone_init_free_lists(struct zone *zone) unsigned int order, t; for_each_migratetype_order(order, t) { 
INIT_LIST_HEAD(&zone->free_area[order].free_list[t]); + INIT_LIST_HEAD(&zone->free_area[order].pend_list[t]); zone->free_area[order].nr_free = 0; } #ifdef CONFIG_UNACCEPTED_MEMORY INIT_LIST_HEAD(&zone->unaccepted_pages); #endif + atomic_long_set(&zone->nr_luf_pages, 0); } void __meminit init_currently_empty_zone(struct zone *zone, diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 325f07c34cfdc..db1460c07b964 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -804,15 +804,28 @@ static inline void __add_to_free_list(struct page *page, struct zone *zone, bool tail) { struct free_area *area = &zone->free_area[order]; + struct list_head *list; VM_WARN_ONCE(get_pageblock_migratetype(page) != migratetype, "page type is %lu, passed migratetype is %d (nr=%d)\n", get_pageblock_migratetype(page), migratetype, 1 << order); + /* + * When identifying whether a page requires tlb shootdown, false + * positive is okay because it will cause just additional tlb + * shootdown. + */ + if (page_luf_key(page)) { + list = &area->pend_list[migratetype]; + atomic_long_add(1 << order, &zone->nr_luf_pages); + } else + list = &area->free_list[migratetype]; + if (tail) - list_add_tail(&page->buddy_list, &area->free_list[migratetype]); + list_add_tail(&page->buddy_list, list); else - list_add(&page->buddy_list, &area->free_list[migratetype]); + list_add(&page->buddy_list, list); + area->nr_free++; } @@ -831,7 +844,20 @@ static inline void move_to_free_list(struct page *page, struct zone *zone, "page type is %lu, passed migratetype is %d (nr=%d)\n", get_pageblock_migratetype(page), old_mt, 1 << order); - list_move_tail(&page->buddy_list, &area->free_list[new_mt]); + /* + * The page might have been taken from a pfn where it's not + * clear which list was used. Therefore, conservatively + * consider it as pend_list, not to miss any true ones that + * require tlb shootdown. + * + * When identifying whether a page requires tlb shootdown, false + * positive is okay because it will cause just additional tlb + * shootdown. + */ + if (page_luf_key(page)) + list_move_tail(&page->buddy_list, &area->pend_list[new_mt]); + else + list_move_tail(&page->buddy_list, &area->free_list[new_mt]); account_freepages(zone, -(1 << order), old_mt); account_freepages(zone, 1 << order, new_mt); @@ -848,6 +874,9 @@ static inline void __del_page_from_free_list(struct page *page, struct zone *zon if (page_reported(page)) __ClearPageReported(page); + if (page_luf_key(page)) + atomic_long_sub(1 << order, &zone->nr_luf_pages); + list_del(&page->buddy_list); __ClearPageBuddy(page); zone->free_area[order].nr_free--; @@ -866,15 +895,48 @@ static inline void del_page_from_free_list(struct page *page, struct zone *zone, account_freepages(zone, -(1 << order), migratetype); } -static inline struct page *get_page_from_free_area(struct free_area *area, - int migratetype) +static inline struct page *get_page_from_free_area(struct zone *zone, + struct free_area *area, int migratetype) { - struct page *page = list_first_entry_or_null(&area->free_list[migratetype], - struct page, buddy_list); + struct page *page; + bool pend_first; - if (page && luf_takeoff_check(page)) - return page; + /* + * XXX: Make the decision preciser if needed e.g. using + * zone_watermark_ok() or its family, but for now, don't want to + * make it heavier. + * + * Try free_list, holding non-luf pages, first if there are + * enough non-luf pages to aggressively defer tlb flush, but + * should try pend_list first instead if not. 
+ */ + pend_first = !non_luf_pages_ok(zone); + + if (pend_first) { + page = list_first_entry_or_null(&area->pend_list[migratetype], + struct page, buddy_list); + + if (page && luf_takeoff_check(page)) + return page; + + page = list_first_entry_or_null(&area->free_list[migratetype], + struct page, buddy_list); + + if (page) + return page; + } else { + page = list_first_entry_or_null(&area->free_list[migratetype], + struct page, buddy_list); + + if (page) + return page; + page = list_first_entry_or_null(&area->pend_list[migratetype], + struct page, buddy_list); + + if (page && luf_takeoff_check(page)) + return page; + } return NULL; } @@ -1027,6 +1089,8 @@ static inline void __free_one_page(struct page *page, if (fpi_flags & FPI_TO_TAIL) to_tail = true; + else if (page_luf_key(page)) + to_tail = true; else if (is_shuffle_order(order)) to_tail = shuffle_pick_tail(); else @@ -1552,6 +1616,8 @@ static inline unsigned int expand(struct zone *zone, struct page *page, int low, unsigned int nr_added = 0; while (high > low) { + bool tail = false; + high--; size >>= 1; VM_BUG_ON_PAGE(bad_range(zone, &page[size]), &page[size]); @@ -1565,7 +1631,10 @@ static inline unsigned int expand(struct zone *zone, struct page *page, int low, if (set_page_guard(zone, &page[size], high)) continue; - __add_to_free_list(&page[size], zone, high, migratetype, false); + if (page_luf_key(&page[size])) + tail = true; + + __add_to_free_list(&page[size], zone, high, migratetype, tail); set_buddy_order(&page[size], high); nr_added += size; } @@ -1749,7 +1818,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order, /* Find a page of the appropriate size in the preferred list */ for (current_order = order; current_order < NR_PAGE_ORDERS; ++current_order) { area = &(zone->free_area[current_order]); - page = get_page_from_free_area(area, migratetype); + page = get_page_from_free_area(zone, area, migratetype); if (!page) continue; @@ -2191,7 +2260,8 @@ int find_suitable_fallback(struct free_area *area, unsigned int order, if (free_area_empty(area, fallback_mt)) continue; - if (luf_takeoff_no_shootdown()) + if (free_list_empty(area, fallback_mt) && + luf_takeoff_no_shootdown()) continue; if (can_steal_fallback(order, migratetype)) @@ -2295,7 +2365,7 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, struct free_area *area = &(zone->free_area[order]); int mt; - page = get_page_from_free_area(area, MIGRATE_HIGHATOMIC); + page = get_page_from_free_area(zone, area, MIGRATE_HIGHATOMIC); if (!page) continue; @@ -2433,7 +2503,7 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype, VM_BUG_ON(current_order > MAX_PAGE_ORDER); do_steal: - page = get_page_from_free_area(area, fallback_mt); + page = get_page_from_free_area(zone, area, fallback_mt); /* take off list, maybe claim block, expand remainder */ page = steal_suitable_fallback(zone, page, current_order, order, @@ -7056,6 +7126,8 @@ static void break_down_buddy_pages(struct zone *zone, struct page *page, struct page *current_buddy; while (high > low) { + bool tail = false; + high--; size >>= 1; @@ -7069,7 +7141,10 @@ static void break_down_buddy_pages(struct zone *zone, struct page *page, if (set_page_guard(zone, current_buddy, high)) continue; - add_to_free_list(current_buddy, zone, high, migratetype, false); + if (page_luf_key(current_buddy)) + tail = true; + + add_to_free_list(current_buddy, zone, high, migratetype, tail); set_buddy_order(current_buddy, high); } } diff --git a/mm/page_reporting.c b/mm/page_reporting.c 
index 03a7f5f6dc073..e152b22fbba8a 100644 --- a/mm/page_reporting.c +++ b/mm/page_reporting.c @@ -159,15 +159,17 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, struct page *page, *next; long budget; int err = 0; + bool consider_pend = false; + bool can_shootdown; /* * Perform early check, if free area is empty there is * nothing to process so we can skip this free_list. */ - if (list_empty(list)) + if (free_area_empty(area, mt)) return err; - luf_takeoff_start(); + can_shootdown = luf_takeoff_start(); spin_lock_irq(&zone->lock); /* @@ -185,14 +187,14 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, * should always be a power of 2. */ budget = DIV_ROUND_UP(area->nr_free, PAGE_REPORTING_CAPACITY * 16); - +retry: /* loop through free list adding unreported pages to sg list */ list_for_each_entry_safe(page, next, list, lru) { /* We are going to skip over the reported pages. */ if (PageReported(page)) continue; - if (!luf_takeoff_check(page)) { + if (unlikely(consider_pend && !luf_takeoff_check(page))) { VM_WARN_ON(1); continue; } @@ -205,14 +207,14 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, if (budget < 0) { atomic_set(&prdev->state, PAGE_REPORTING_REQUESTED); next = page; - break; + goto done; } /* Attempt to pull page from list and place in scatterlist */ if (*offset) { if (!__isolate_free_page(page, order, false)) { next = page; - break; + goto done; } /* Add page to scatter list */ @@ -263,9 +265,15 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, /* exit on error */ if (err) - break; + goto done; } + if (!consider_pend && can_shootdown) { + consider_pend = true; + list = &area->pend_list[mt]; + goto retry; + } +done: /* Rotate any leftover pages to the head of the freelist */ if (!list_entry_is_head(next, list, lru) && !list_is_first(&next->lru, list)) list_rotate_to_front(&next->lru, list); diff --git a/mm/vmstat.c b/mm/vmstat.c index 16bfe1c694dd4..5ae5ac9f0a4a9 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1581,6 +1581,21 @@ static void pagetypeinfo_showfree_print(struct seq_file *m, break; } } + list_for_each(curr, &area->pend_list[mtype]) { + /* + * Cap the pend_list iteration because it might + * be really large and we are under a spinlock + * so a long time spent here could trigger a + * hard lockup detector. Anyway this is a + * debugging tool so knowing there is a handful + * of pages of this order should be more than + * sufficient. + */ + if (++freecount >= 100000) { + overflow = true; + break; + } + } seq_printf(m, "%s%6lu ", overflow ? 
">" : "", freecount); spin_unlock_irq(&zone->lock); cond_resched(); From patchwork Wed Feb 26 12:03:25 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Byungchul Park X-Patchwork-Id: 13992205 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id F194CC021B8 for ; Wed, 26 Feb 2025 12:04:16 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1391928003B; Wed, 26 Feb 2025 07:03:55 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 0BE69280037; Wed, 26 Feb 2025 07:03:55 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D93B328003B; Wed, 26 Feb 2025 07:03:54 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id A6C9A280039 for ; Wed, 26 Feb 2025 07:03:54 -0500 (EST) Received: from smtpin29.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 61536C135A for ; Wed, 26 Feb 2025 12:03:54 +0000 (UTC) X-FDA: 83161961988.29.7A28DF3 Received: from invmail4.hynix.com (exvmail4.skhynix.com [166.125.252.92]) by imf27.hostedemail.com (Postfix) with ESMTP id 722284000C for ; Wed, 26 Feb 2025 12:03:52 +0000 (UTC) Authentication-Results: imf27.hostedemail.com; dkim=none; spf=pass (imf27.hostedemail.com: domain of byungchul@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=byungchul@sk.com; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1740571432; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references; bh=PvOUH4TkWjKdwJbsbFwUzHcppCyoZWq5ZTfmC93XvKY=; b=3YyRAptkBbQ7zMXW1ak8NlaAielumVC6BGeUPyUMfk16Ubh1WaFdxphiZUNb51NXv8AC4p WjNJsrR6Dqxafwr0Idt6gPSeG65imNNTkM5+l1gMNPx3Dioa0IZRWw5nzhY9Z7CvN6Jbjd uJyFHsaYnB351ft0xWQXCL2nZXIFavU= ARC-Authentication-Results: i=1; imf27.hostedemail.com; dkim=none; spf=pass (imf27.hostedemail.com: domain of byungchul@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=byungchul@sk.com; dmarc=none ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1740571432; a=rsa-sha256; cv=none; b=W+lPEYPydhhTOo2Dkxo3SxRSIhgfWA5ixo9zzO2U3zIBJZtX2oSug+tHSAK7nGuEsRR8a2 NuHi4nZAkZj0wqc0ahmHHxzvZzxXSPdp0LjN5GulrbjjZviKot3y5bPVQOCD90j7WGHoLJ hAbkJLu2ksXXjrs3DDQpAYsGZWlcCgE= X-AuditID: a67dfc5b-3e1ff7000001d7ae-38-67bf0322651c From: Byungchul Park To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com Subject: [RFC PATCH v12 based on v6.14-rc4 14/25] mm/rmap: recognize read-only tlb entries during batched tlb flush Date: Wed, 26 Feb 2025 21:03:25 +0900 Message-Id: <20250226120336.29565-14-byungchul@sk.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20250226120336.29565-1-byungchul@sk.com> References: <20250226113024.GA1935@system.software.com> <20250226120336.29565-1-byungchul@sk.com> X-Brightmail-Tracker: 
Functionally, no change. This is a preparation for the luf mechanism, which requires recognizing read-only tlb entries and handling them differently. The API newly introduced in this patch, fold_ubc(), will be used by the luf mechanism.
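A minimal sketch of the split (pick_batch() is an invented name for illustration; the real change is the hunk in set_tlb_ubc_flush_pending() below): pending flushes for read-only ptes are accumulated in a separate batch, tlb_ubc_ro, so later patches can defer them, while writable ptes keep using the existing tlb_ubc batch.

/* Illustration only: route a pending flush to the right batch. */
static struct tlbflush_unmap_batch *pick_batch(pte_t pteval)
{
	if (pte_write(pteval))
		return &current->tlb_ubc;	/* writable mapping */

	return &current->tlb_ubc_ro;		/* read-only mapping */
}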
Signed-off-by: Byungchul Park --- include/linux/sched.h | 1 + mm/rmap.c | 16 ++++++++++++++-- 2 files changed, 15 insertions(+), 2 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index a3049ea5b3ad3..d1a3c97491ff2 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1407,6 +1407,7 @@ struct task_struct { struct tlbflush_unmap_batch tlb_ubc; struct tlbflush_unmap_batch tlb_ubc_takeoff; + struct tlbflush_unmap_batch tlb_ubc_ro; /* Cache last used pipe for splice(): */ struct pipe_inode_info *splice_pipe; diff --git a/mm/rmap.c b/mm/rmap.c index 1581b1a00f974..3ed6234dd777e 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -775,6 +775,7 @@ void fold_luf_batch(struct luf_batch *dst, struct luf_batch *src) void try_to_unmap_flush_takeoff(void) { struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct tlbflush_unmap_batch *tlb_ubc_ro = ¤t->tlb_ubc_ro; struct tlbflush_unmap_batch *tlb_ubc_takeoff = ¤t->tlb_ubc_takeoff; if (!tlb_ubc_takeoff->flush_required) @@ -789,6 +790,9 @@ void try_to_unmap_flush_takeoff(void) if (arch_tlbbatch_done(&tlb_ubc->arch, &tlb_ubc_takeoff->arch)) reset_batch(tlb_ubc); + if (arch_tlbbatch_done(&tlb_ubc_ro->arch, &tlb_ubc_takeoff->arch)) + reset_batch(tlb_ubc_ro); + reset_batch(tlb_ubc_takeoff); } @@ -801,7 +805,9 @@ void try_to_unmap_flush_takeoff(void) void try_to_unmap_flush(void) { struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct tlbflush_unmap_batch *tlb_ubc_ro = ¤t->tlb_ubc_ro; + fold_batch(tlb_ubc, tlb_ubc_ro, true); if (!tlb_ubc->flush_required) return; @@ -813,8 +819,9 @@ void try_to_unmap_flush(void) void try_to_unmap_flush_dirty(void) { struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct tlbflush_unmap_batch *tlb_ubc_ro = ¤t->tlb_ubc_ro; - if (tlb_ubc->writable) + if (tlb_ubc->writable || tlb_ubc_ro->writable) try_to_unmap_flush(); } @@ -831,13 +838,18 @@ void try_to_unmap_flush_dirty(void) static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval, unsigned long uaddr) { - struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct tlbflush_unmap_batch *tlb_ubc; int batch; bool writable = pte_dirty(pteval); if (!pte_accessible(mm, pteval)) return; + if (pte_write(pteval)) + tlb_ubc = ¤t->tlb_ubc; + else + tlb_ubc = ¤t->tlb_ubc_ro; + arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr); tlb_ubc->flush_required = true; From patchwork Wed Feb 26 12:03:26 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Byungchul Park X-Patchwork-Id: 13992207 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 648B1C021B8 for ; Wed, 26 Feb 2025 12:04:22 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 845BA280037; Wed, 26 Feb 2025 07:03:55 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 774BB280039; Wed, 26 Feb 2025 07:03:55 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3844228003C; Wed, 26 Feb 2025 07:03:55 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 065BD280036 for ; Wed, 26 Feb 2025 07:03:55 -0500 (EST) Received: from smtpin13.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 
From: Byungchul Park <byungchul@sk.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on v6.14-rc4 15/25] fs, filemap: refactor to gather the scattered ->write_{begin,end}() calls
Date: Wed, 26 Feb 2025 21:03:26 +0900
Message-Id: <20250226120336.29565-15-byungchul@sk.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20250226120336.29565-1-byungchul@sk.com>
References: <20250226113024.GA1935@system.software.com> <20250226120336.29565-1-byungchul@sk.com>

Functionally, no change. This is a preparation for the luf mechanism, which needs a hook at the point where the page cache is updated, since the pages involved might still be mapped in some tasks and any tlb flush needed must be performed first.
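For orientation, a minimal sketch of the wrapper shape this refactoring introduces. In this patch, mapping_write_begin() simply forwards to ->write_begin(); the luf_flush() mentioned in the comment is only added by the later luf patch and is shown here to illustrate why a single choke point is wanted:

    static inline int mapping_write_begin(struct file *file,
                                          struct address_space *mapping,
                                          loff_t pos, unsigned len,
                                          struct folio **foliop, void **fsdata)
    {
            int ret = mapping->a_ops->write_begin(file, mapping, pos, len,
                                                  foliop, fsdata);

            /* A later patch hooks here: luf_flush(0) on success. */
            return ret;
    }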
Signed-off-by: Byungchul Park --- drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 11 ++++------- fs/affs/file.c | 4 ++-- fs/buffer.c | 14 ++++++-------- fs/exfat/file.c | 5 ++--- fs/ext4/verity.c | 5 ++--- fs/f2fs/super.c | 5 ++--- fs/f2fs/verity.c | 5 ++--- fs/namei.c | 5 ++--- include/linux/fs.h | 18 ++++++++++++++++++ mm/filemap.c | 5 ++--- 10 files changed, 42 insertions(+), 35 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c index ae3343c81a645..22ce009d13689 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c @@ -418,7 +418,6 @@ shmem_pwrite(struct drm_i915_gem_object *obj, const struct drm_i915_gem_pwrite *arg) { struct address_space *mapping = obj->base.filp->f_mapping; - const struct address_space_operations *aops = mapping->a_ops; char __user *user_data = u64_to_user_ptr(arg->data_ptr); u64 remain; loff_t pos; @@ -477,7 +476,7 @@ shmem_pwrite(struct drm_i915_gem_object *obj, if (err) return err; - err = aops->write_begin(obj->base.filp, mapping, pos, len, + err = mapping_write_begin(obj->base.filp, mapping, pos, len, &folio, &data); if (err < 0) return err; @@ -488,7 +487,7 @@ shmem_pwrite(struct drm_i915_gem_object *obj, pagefault_enable(); kunmap_local(vaddr); - err = aops->write_end(obj->base.filp, mapping, pos, len, + err = mapping_write_end(obj->base.filp, mapping, pos, len, len - unwritten, folio, data); if (err < 0) return err; @@ -654,7 +653,6 @@ i915_gem_object_create_shmem_from_data(struct drm_i915_private *i915, { struct drm_i915_gem_object *obj; struct file *file; - const struct address_space_operations *aops; loff_t pos; int err; @@ -666,21 +664,20 @@ i915_gem_object_create_shmem_from_data(struct drm_i915_private *i915, GEM_BUG_ON(obj->write_domain != I915_GEM_DOMAIN_CPU); file = obj->base.filp; - aops = file->f_mapping->a_ops; pos = 0; do { unsigned int len = min_t(typeof(size), size, PAGE_SIZE); struct folio *folio; void *fsdata; - err = aops->write_begin(file, file->f_mapping, pos, len, + err = mapping_write_begin(file, file->f_mapping, pos, len, &folio, &fsdata); if (err < 0) goto fail; memcpy_to_folio(folio, offset_in_folio(folio, pos), data, len); - err = aops->write_end(file, file->f_mapping, pos, len, len, + err = mapping_write_end(file, file->f_mapping, pos, len, len, folio, fsdata); if (err < 0) goto fail; diff --git a/fs/affs/file.c b/fs/affs/file.c index a5a861dd52230..10e7f53828e93 100644 --- a/fs/affs/file.c +++ b/fs/affs/file.c @@ -885,9 +885,9 @@ affs_truncate(struct inode *inode) loff_t isize = inode->i_size; int res; - res = mapping->a_ops->write_begin(NULL, mapping, isize, 0, &folio, &fsdata); + res = mapping_write_begin(NULL, mapping, isize, 0, &folio, &fsdata); if (!res) - res = mapping->a_ops->write_end(NULL, mapping, isize, 0, 0, folio, fsdata); + res = mapping_write_end(NULL, mapping, isize, 0, 0, folio, fsdata); else inode->i_size = AFFS_I(inode)->mmu_private; mark_inode_dirty(inode); diff --git a/fs/buffer.c b/fs/buffer.c index cc8452f602516..f54fce7729bf1 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -2456,7 +2456,6 @@ EXPORT_SYMBOL(block_read_full_folio); int generic_cont_expand_simple(struct inode *inode, loff_t size) { struct address_space *mapping = inode->i_mapping; - const struct address_space_operations *aops = mapping->a_ops; struct folio *folio; void *fsdata = NULL; int err; @@ -2465,11 +2464,11 @@ int generic_cont_expand_simple(struct inode *inode, loff_t size) if (err) goto out; - err = aops->write_begin(NULL, mapping, 
size, 0, &folio, &fsdata); + err = mapping_write_begin(NULL, mapping, size, 0, &folio, &fsdata); if (err) goto out; - err = aops->write_end(NULL, mapping, size, 0, 0, folio, fsdata); + err = mapping_write_end(NULL, mapping, size, 0, 0, folio, fsdata); BUG_ON(err > 0); out: @@ -2481,7 +2480,6 @@ static int cont_expand_zero(struct file *file, struct address_space *mapping, loff_t pos, loff_t *bytes) { struct inode *inode = mapping->host; - const struct address_space_operations *aops = mapping->a_ops; unsigned int blocksize = i_blocksize(inode); struct folio *folio; void *fsdata = NULL; @@ -2501,12 +2499,12 @@ static int cont_expand_zero(struct file *file, struct address_space *mapping, } len = PAGE_SIZE - zerofrom; - err = aops->write_begin(file, mapping, curpos, len, + err = mapping_write_begin(file, mapping, curpos, len, &folio, &fsdata); if (err) goto out; folio_zero_range(folio, offset_in_folio(folio, curpos), len); - err = aops->write_end(file, mapping, curpos, len, len, + err = mapping_write_end(file, mapping, curpos, len, len, folio, fsdata); if (err < 0) goto out; @@ -2534,12 +2532,12 @@ static int cont_expand_zero(struct file *file, struct address_space *mapping, } len = offset - zerofrom; - err = aops->write_begin(file, mapping, curpos, len, + err = mapping_write_begin(file, mapping, curpos, len, &folio, &fsdata); if (err) goto out; folio_zero_range(folio, offset_in_folio(folio, curpos), len); - err = aops->write_end(file, mapping, curpos, len, len, + err = mapping_write_end(file, mapping, curpos, len, len, folio, fsdata); if (err < 0) goto out; diff --git a/fs/exfat/file.c b/fs/exfat/file.c index 05b51e7217838..9a1002761f79f 100644 --- a/fs/exfat/file.c +++ b/fs/exfat/file.c @@ -539,7 +539,6 @@ static int exfat_extend_valid_size(struct file *file, loff_t new_valid_size) struct inode *inode = file_inode(file); struct exfat_inode_info *ei = EXFAT_I(inode); struct address_space *mapping = inode->i_mapping; - const struct address_space_operations *ops = mapping->a_ops; pos = ei->valid_size; while (pos < new_valid_size) { @@ -551,14 +550,14 @@ static int exfat_extend_valid_size(struct file *file, loff_t new_valid_size) if (pos + len > new_valid_size) len = new_valid_size - pos; - err = ops->write_begin(file, mapping, pos, len, &folio, NULL); + err = mapping_write_begin(file, mapping, pos, len, &folio, NULL); if (err) goto out; off = offset_in_folio(folio, pos); folio_zero_new_buffers(folio, off, off + len); - err = ops->write_end(file, mapping, pos, len, len, folio, NULL); + err = mapping_write_end(file, mapping, pos, len, len, folio, NULL); if (err < 0) goto out; pos += len; diff --git a/fs/ext4/verity.c b/fs/ext4/verity.c index d9203228ce979..64fa43f80c73e 100644 --- a/fs/ext4/verity.c +++ b/fs/ext4/verity.c @@ -68,7 +68,6 @@ static int pagecache_write(struct inode *inode, const void *buf, size_t count, loff_t pos) { struct address_space *mapping = inode->i_mapping; - const struct address_space_operations *aops = mapping->a_ops; if (pos + count > inode->i_sb->s_maxbytes) return -EFBIG; @@ -80,13 +79,13 @@ static int pagecache_write(struct inode *inode, const void *buf, size_t count, void *fsdata = NULL; int res; - res = aops->write_begin(NULL, mapping, pos, n, &folio, &fsdata); + res = mapping_write_begin(NULL, mapping, pos, n, &folio, &fsdata); if (res) return res; memcpy_to_folio(folio, offset_in_folio(folio, pos), buf, n); - res = aops->write_end(NULL, mapping, pos, n, n, folio, fsdata); + res = mapping_write_end(NULL, mapping, pos, n, n, folio, fsdata); if (res < 0) return res; 
if (res != n) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 19b67828ae325..87c26f0571dab 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -2710,7 +2710,6 @@ static ssize_t f2fs_quota_write(struct super_block *sb, int type, { struct inode *inode = sb_dqopt(sb)->files[type]; struct address_space *mapping = inode->i_mapping; - const struct address_space_operations *a_ops = mapping->a_ops; int offset = off & (sb->s_blocksize - 1); size_t towrite = len; struct folio *folio; @@ -2722,7 +2721,7 @@ static ssize_t f2fs_quota_write(struct super_block *sb, int type, tocopy = min_t(unsigned long, sb->s_blocksize - offset, towrite); retry: - err = a_ops->write_begin(NULL, mapping, off, tocopy, + err = mapping_write_begin(NULL, mapping, off, tocopy, &folio, &fsdata); if (unlikely(err)) { if (err == -ENOMEM) { @@ -2735,7 +2734,7 @@ static ssize_t f2fs_quota_write(struct super_block *sb, int type, memcpy_to_folio(folio, offset_in_folio(folio, off), data, tocopy); - a_ops->write_end(NULL, mapping, off, tocopy, tocopy, + mapping_write_end(NULL, mapping, off, tocopy, tocopy, folio, fsdata); offset = 0; towrite -= tocopy; diff --git a/fs/f2fs/verity.c b/fs/f2fs/verity.c index 2287f238ae09e..b232589546d39 100644 --- a/fs/f2fs/verity.c +++ b/fs/f2fs/verity.c @@ -72,7 +72,6 @@ static int pagecache_write(struct inode *inode, const void *buf, size_t count, loff_t pos) { struct address_space *mapping = inode->i_mapping; - const struct address_space_operations *aops = mapping->a_ops; if (pos + count > F2FS_BLK_TO_BYTES(max_file_blocks(inode))) return -EFBIG; @@ -84,13 +83,13 @@ static int pagecache_write(struct inode *inode, const void *buf, size_t count, void *fsdata = NULL; int res; - res = aops->write_begin(NULL, mapping, pos, n, &folio, &fsdata); + res = mapping_write_begin(NULL, mapping, pos, n, &folio, &fsdata); if (res) return res; memcpy_to_folio(folio, offset_in_folio(folio, pos), buf, n); - res = aops->write_end(NULL, mapping, pos, n, n, folio, fsdata); + res = mapping_write_end(NULL, mapping, pos, n, n, folio, fsdata); if (res < 0) return res; if (res != n) diff --git a/fs/namei.c b/fs/namei.c index 3ab9440c5b931..e1c6d28c560da 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -5409,7 +5409,6 @@ EXPORT_SYMBOL(page_readlink); int page_symlink(struct inode *inode, const char *symname, int len) { struct address_space *mapping = inode->i_mapping; - const struct address_space_operations *aops = mapping->a_ops; bool nofs = !mapping_gfp_constraint(mapping, __GFP_FS); struct folio *folio; void *fsdata = NULL; @@ -5419,7 +5418,7 @@ int page_symlink(struct inode *inode, const char *symname, int len) retry: if (nofs) flags = memalloc_nofs_save(); - err = aops->write_begin(NULL, mapping, 0, len-1, &folio, &fsdata); + err = mapping_write_begin(NULL, mapping, 0, len-1, &folio, &fsdata); if (nofs) memalloc_nofs_restore(flags); if (err) @@ -5427,7 +5426,7 @@ int page_symlink(struct inode *inode, const char *symname, int len) memcpy(folio_address(folio), symname, len - 1); - err = aops->write_end(NULL, mapping, 0, len - 1, len - 1, + err = mapping_write_end(NULL, mapping, 0, len - 1, len - 1, folio, fsdata); if (err < 0) goto fail; diff --git a/include/linux/fs.h b/include/linux/fs.h index 2c3b2f8a621f7..820ff4752249e 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -531,6 +531,24 @@ struct address_space { #define PAGECACHE_TAG_WRITEBACK XA_MARK_1 #define PAGECACHE_TAG_TOWRITE XA_MARK_2 +static inline int mapping_write_begin(struct file *file, + struct address_space *mapping, + loff_t pos, unsigned len, + 
struct folio **foliop, void **fsdata) +{ + return mapping->a_ops->write_begin(file, mapping, pos, len, foliop, + fsdata); +} + +static inline int mapping_write_end(struct file *file, + struct address_space *mapping, + loff_t pos, unsigned len, unsigned copied, + struct folio *folio, void *fsdata) +{ + return mapping->a_ops->write_end(file, mapping, pos, len, copied, + folio, fsdata); +} + /* * Returns true if any of the pages in the mapping are marked with the tag. */ diff --git a/mm/filemap.c b/mm/filemap.c index 804d7365680c1..6a1f90ddaaf08 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -4152,7 +4152,6 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i) struct file *file = iocb->ki_filp; loff_t pos = iocb->ki_pos; struct address_space *mapping = file->f_mapping; - const struct address_space_operations *a_ops = mapping->a_ops; size_t chunk = mapping_max_folio_size(mapping); long status = 0; ssize_t written = 0; @@ -4186,7 +4185,7 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i) break; } - status = a_ops->write_begin(file, mapping, pos, bytes, + status = mapping_write_begin(file, mapping, pos, bytes, &folio, &fsdata); if (unlikely(status < 0)) break; @@ -4201,7 +4200,7 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i) copied = copy_folio_from_iter_atomic(folio, offset, bytes, i); flush_dcache_folio(folio); - status = a_ops->write_end(file, mapping, pos, bytes, copied, + status = mapping_write_end(file, mapping, pos, bytes, copied, folio, fsdata); if (unlikely(status != copied)) { iov_iter_revert(i, copied - max(status, 0L)); From patchwork Wed Feb 26 12:03:27 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Byungchul Park X-Patchwork-Id: 13992208 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 203FCC18E7C for ; Wed, 26 Feb 2025 12:04:25 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B97E828003C; Wed, 26 Feb 2025 07:03:55 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id B2064280039; Wed, 26 Feb 2025 07:03:55 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8368628003C; Wed, 26 Feb 2025 07:03:55 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 56085280037 for ; Wed, 26 Feb 2025 07:03:55 -0500 (EST) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id EDB92121222 for ; Wed, 26 Feb 2025 12:03:54 +0000 (UTC) X-FDA: 83161961988.04.4301C86 Received: from invmail4.hynix.com (exvmail4.skhynix.com [166.125.252.92]) by imf19.hostedemail.com (Postfix) with ESMTP id D12321A0014 for ; Wed, 26 Feb 2025 12:03:52 +0000 (UTC) Authentication-Results: imf19.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf19.hostedemail.com: domain of byungchul@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=byungchul@sk.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1740571433; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: 
From: Byungchul Park <byungchul@sk.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on v6.14-rc4 16/25] mm: implement LUF(Lazy Unmap Flush) defering tlb flush when folios get unmapped
Date: Wed, 26 Feb 2025 21:03:27 +0900
Message-Id: <20250226120336.29565-16-byungchul@sk.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20250226120336.29565-1-byungchul@sk.com>
References: <20250226113024.GA1935@system.software.com> <20250226120336.29565-1-byungchul@sk.com>
A new mechanism, LUF (Lazy Unmap Flush), defers the tlb flush for folios that have been unmapped and freed until they eventually get allocated again. This is safe for folios that had been mapped read-only and were then unmapped, as long as their contents do not change while they stay in pcp or buddy, so the data can still be read through the stale tlb entries.

The tlb flush can be deferred when folios get unmapped, provided the required flush is guaranteed to happen before the folios are actually reused, and only if none of the corresponding ptes has write permission. Otherwise, the system would get corrupted. To achieve that, for folios that are mapped only by non-writable tlb entries, skip the tlb flush during unmapping and perform it just before the folios become used again, on their way out of buddy or pcp.

However, the flush pending by LUF must be cancelled and the deferred TLB flush performed right away when:

1. a writable pte is newly set through the fault handler
2. a file is updated
3. kasan needs poisoning on free
4. the kernel wants to init pages on free

No matter what type of workload is used for performance evaluation, the result should be positive thanks to the unconditional reduction of tlb flushes, tlb misses and interrupts. For the test, I picked one of the most popular and heavy workloads, llama.cpp, an LLM (Large Language Model) inference engine. The result depends on memory latency and on how often reclaim runs, which determine the tlb miss overhead and how many times unmapping happens.

In my system, the results show:

1. tlb shootdown interrupts are reduced by about 97%.
2. The test program runtime is reduced by about 4.5%.

The test environment and the test set are:

Machine: bare metal, x86_64, Intel(R) Xeon(R) Gold 6430
CPU: 1 socket, 64 cores, hyper-threading on
Numa: 2 nodes (64 CPUs with 42GB DRAM, CPU-less CXL expander with 98GB)
Config: swap off, numa balancing tiering on, demotion enabled

   llama.cpp/main -m $(70G_model1) -p "who are you?" -s 1 -t 15 -n 20 &
   llama.cpp/main -m $(70G_model2) -p "who are you?" -s 1 -t 15 -n 20 &
   llama.cpp/main -m $(70G_model3) -p "who are you?" -s 1 -t 15 -n 20 &
   wait

where, -t: nr of threads, -s: seed used to make the runtime stable, -n: nr of tokens that determines the runtime, -p: prompt to ask, -m: LLM model to use.
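Before the numbers, a purely illustrative condensation of the control flow described above, using the names the diff below introduces (the surrounding caller is simplified and not literal code from the patch):

    /* unmap path: keep read-only mappings in a separate, deferred batch */
    can_luf_init(folio);
    rmap_walk(folio, &rwc);             /* can_luf_fail() on any writable pte */
    if (can_luf_test())
            fold_batch(tlb_ubc_luf, tlb_ubc_ro, true);      /* defer the flush */
    else
            fold_batch(tlb_ubc, tlb_ubc_ro, true);          /* flush as before */

    /* later, before the freed pages can be reused */
    luf_key = fold_unmap_luf();         /* tag what is still pending */

    luf_flush(luf_key);                 /* perform the deferred shootdown */
    luf_flush(0);                       /* or: flush everything pending so far,
                                           e.g. on a writable fault or file update */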
Run the test set 5 times successively with caches dropped every run via 'echo 3 > /proc/sys/vm/drop_caches'. Each inference prints its runtime at the end of each. The results are like: 1. Runtime from the output of llama.cpp BEFORE ------ llama_print_timings: total time = 883450.54 ms / 24 tokens llama_print_timings: total time = 861665.91 ms / 24 tokens llama_print_timings: total time = 898079.02 ms / 24 tokens llama_print_timings: total time = 879897.69 ms / 24 tokens llama_print_timings: total time = 892360.75 ms / 24 tokens llama_print_timings: total time = 884587.85 ms / 24 tokens llama_print_timings: total time = 861023.19 ms / 24 tokens llama_print_timings: total time = 900022.18 ms / 24 tokens llama_print_timings: total time = 878771.88 ms / 24 tokens llama_print_timings: total time = 889027.98 ms / 24 tokens llama_print_timings: total time = 880783.90 ms / 24 tokens llama_print_timings: total time = 856475.29 ms / 24 tokens llama_print_timings: total time = 896842.21 ms / 24 tokens llama_print_timings: total time = 878883.53 ms / 24 tokens llama_print_timings: total time = 890122.10 ms / 24 tokens AFTER ----- llama_print_timings: total time = 871060.86 ms / 24 tokens llama_print_timings: total time = 825609.53 ms / 24 tokens llama_print_timings: total time = 836854.81 ms / 24 tokens llama_print_timings: total time = 843147.99 ms / 24 tokens llama_print_timings: total time = 831426.65 ms / 24 tokens llama_print_timings: total time = 873939.23 ms / 24 tokens llama_print_timings: total time = 826127.69 ms / 24 tokens llama_print_timings: total time = 835489.26 ms / 24 tokens llama_print_timings: total time = 842589.62 ms / 24 tokens llama_print_timings: total time = 833700.66 ms / 24 tokens llama_print_timings: total time = 875996.19 ms / 24 tokens llama_print_timings: total time = 826401.73 ms / 24 tokens llama_print_timings: total time = 839341.28 ms / 24 tokens llama_print_timings: total time = 841075.10 ms / 24 tokens llama_print_timings: total time = 835136.41 ms / 24 tokens 2. 
tlb shootdowns from 'cat /proc/interrupts' BEFORE ------ TLB: 80911532 93691786 100296251 111062810 109769109 109862429 108968588 119175230 115779676 118377498 119325266 120300143 124514185 116697222 121068466 118031913 122660681 117494403 121819907 116960596 120936335 117217061 118630217 122322724 119595577 111693298 119232201 120030377 115334687 113179982 118808254 116353592 140987367 137095516 131724276 139742240 136501150 130428761 127585535 132483981 133430250 133756207 131786710 126365824 129812539 133850040 131742690 125142213 128572830 132234350 131945922 128417707 133355434 129972846 126331823 134050849 133991626 121129038 124637283 132830916 126875507 122322440 125776487 124340278 TLB shootdowns AFTER ----- TLB: 2121206 2615108 2983494 2911950 3055086 3092672 3204894 3346082 3286744 3307310 3357296 3315940 3428034 3112596 3143325 3185551 3186493 3322314 3330523 3339663 3156064 3272070 3296309 3198962 3332662 3315870 3234467 3353240 3281234 3300666 3345452 3173097 4009196 3932215 3898735 3726531 3717982 3671726 3728788 3724613 3799147 3691764 3620630 3684655 3666688 3393974 3448651 3487593 3446357 3618418 3671920 3712949 3575264 3715385 3641513 3630897 3691047 3630690 3504933 3662647 3629926 3443044 3832970 3548813 TLB shootdowns Signed-off-by: Byungchul Park --- include/asm-generic/tlb.h | 5 ++ include/linux/fs.h | 12 +++- include/linux/mm_types.h | 6 ++ include/linux/sched.h | 9 +++ kernel/sched/core.c | 1 + mm/internal.h | 94 ++++++++++++++++++++++++- mm/memory.c | 15 ++++ mm/pgtable-generic.c | 2 + mm/rmap.c | 141 +++++++++++++++++++++++++++++++++++--- mm/truncate.c | 55 +++++++++++++-- mm/vmscan.c | 12 +++- 11 files changed, 333 insertions(+), 19 deletions(-) diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h index e402aef79c93e..5bb6b05bd3549 100644 --- a/include/asm-generic/tlb.h +++ b/include/asm-generic/tlb.h @@ -565,6 +565,11 @@ static inline void tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct * static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma) { + /* + * Don't leave stale tlb entries for this vma. + */ + luf_flush(0); + if (tlb->fullmm) return; diff --git a/include/linux/fs.h b/include/linux/fs.h index 820ff4752249e..78aaf769d32d1 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -536,8 +536,18 @@ static inline int mapping_write_begin(struct file *file, loff_t pos, unsigned len, struct folio **foliop, void **fsdata) { - return mapping->a_ops->write_begin(file, mapping, pos, len, foliop, + int ret; + + ret = mapping->a_ops->write_begin(file, mapping, pos, len, foliop, fsdata); + + /* + * Ensure to clean stale tlb entries for this mapping. 
+ */ + if (!ret) + luf_flush(0); + + return ret; } static inline int mapping_write_end(struct file *file, diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 4bfe8d072b0ea..cb9e6282b7ad1 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -1339,6 +1339,12 @@ extern void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm); extern void tlb_gather_mmu_fullmm(struct mmu_gather *tlb, struct mm_struct *mm); extern void tlb_finish_mmu(struct mmu_gather *tlb); +#if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) +void luf_flush(unsigned short luf_key); +#else +static inline void luf_flush(unsigned short luf_key) {} +#endif + struct vm_fault; /** diff --git a/include/linux/sched.h b/include/linux/sched.h index d1a3c97491ff2..47a0a3ccb7b1a 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1408,6 +1408,15 @@ struct task_struct { struct tlbflush_unmap_batch tlb_ubc; struct tlbflush_unmap_batch tlb_ubc_takeoff; struct tlbflush_unmap_batch tlb_ubc_ro; + struct tlbflush_unmap_batch tlb_ubc_luf; + +#if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) + /* + * whether all the mappings of a folio during unmap are read-only + * so that luf can work on the folio + */ + bool can_luf; +#endif /* Cache last used pipe for splice(): */ struct pipe_inode_info *splice_pipe; diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 9aecd914ac691..1f4c5da800365 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5275,6 +5275,7 @@ static struct rq *finish_task_switch(struct task_struct *prev) if (mm) { membarrier_mm_sync_core_before_usermode(mm); mmdrop_lazy_tlb_sched(mm); + luf_flush(0); } if (unlikely(prev_state == TASK_DEAD)) { diff --git a/mm/internal.h b/mm/internal.h index 9dbb65f919337..43e91f04d6d1c 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1601,13 +1601,105 @@ static inline bool non_luf_pages_ok(struct zone *zone) return nr_free - nr_luf_pages > min_wm; } -#else + +unsigned short fold_unmap_luf(void); + +/* + * Reset the indicator indicating there are no writable mappings at the + * beginning of every rmap traverse for unmap. luf can work only when + * all the mappings are read-only. + */ +static inline void can_luf_init(struct folio *f) +{ + if (IS_ENABLED(CONFIG_DEBUG_PAGEALLOC)) + current->can_luf = false; + /* + * Pages might get updated inside buddy. + */ + else if (want_init_on_free()) + current->can_luf = false; + /* + * Pages might get updated inside buddy. + */ + else if (!should_skip_kasan_poison(folio_page(f, 0))) + current->can_luf = false; + /* + * XXX: Remove the constraint once luf handles zone device folio. + */ + else if (unlikely(folio_is_zone_device(f))) + current->can_luf = false; + /* + * XXX: Remove the constraint once luf handles hugetlb folio. + */ + else if (unlikely(folio_test_hugetlb(f))) + current->can_luf = false; + /* + * XXX: Remove the constraint once luf handles large folio. + */ + else if (unlikely(folio_test_large(f))) + current->can_luf = false; + /* + * Can track write of anon folios through fault handler. + */ + else if (folio_test_anon(f)) + current->can_luf = true; + /* + * Can track write of file folios through page cache or truncation. + */ + else if (folio_mapping(f)) + current->can_luf = true; + /* + * For niehter anon nor file folios, do not apply luf. + */ + else + current->can_luf = false; +} + +/* + * Mark the folio is not applicable to luf once it found a writble or + * dirty pte during rmap traverse for unmap. 
+ */ +static inline void can_luf_fail(void) +{ + current->can_luf = false; +} + +/* + * Check if all the mappings are read-only. + */ +static inline bool can_luf_test(void) +{ + return current->can_luf; +} + +static inline bool can_luf_vma(struct vm_area_struct *vma) +{ + /* + * Shared region requires a medium like file to keep all the + * associated mm_struct. luf makes use of strcut address_space + * for that purpose. + */ + if (vma->vm_flags & VM_SHARED) + return !!vma->vm_file; + + /* + * Private region can be handled through its mm_struct. + */ + return true; +} +#else /* CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH */ static inline bool luf_takeoff_start(void) { return false; } static inline void luf_takeoff_end(void) {} static inline bool luf_takeoff_no_shootdown(void) { return true; } static inline bool luf_takeoff_check(struct page *page) { return true; } static inline bool luf_takeoff_check_and_fold(struct page *page) { return true; } static inline bool non_luf_pages_ok(struct zone *zone) { return true; } +static inline unsigned short fold_unmap_luf(void) { return 0; } + +static inline void can_luf_init(struct folio *f) {} +static inline void can_luf_fail(void) {} +static inline bool can_luf_test(void) { return false; } +static inline bool can_luf_vma(struct vm_area_struct *vma) { return false; } #endif /* pagewalk.c */ diff --git a/mm/memory.c b/mm/memory.c index b4d3d4893267c..c1d2d2b0112cd 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -6181,6 +6181,7 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, struct mm_struct *mm = vma->vm_mm; vm_fault_t ret; bool is_droppable; + bool flush = false; __set_current_state(TASK_RUNNING); @@ -6206,6 +6207,14 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, lru_gen_enter_fault(vma); + /* + * Any potential cases that make pte writable even forcely + * should be considered. + */ + if (vma->vm_flags & (VM_WRITE | VM_MAYWRITE) || + flags & FAULT_FLAG_WRITE) + flush = true; + if (unlikely(is_vm_hugetlb_page(vma))) ret = hugetlb_fault(vma->vm_mm, vma, address, flags); else @@ -6237,6 +6246,12 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, out: mm_account_fault(mm, regs, address, flags, ret); + /* + * Ensure to clean stale tlb entries for this vma. + */ + if (flush) + luf_flush(0); + return ret; } EXPORT_SYMBOL_GPL(handle_mm_fault); diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c index 5a882f2b10f90..d6678d6bac746 100644 --- a/mm/pgtable-generic.c +++ b/mm/pgtable-generic.c @@ -99,6 +99,8 @@ pte_t ptep_clear_flush(struct vm_area_struct *vma, unsigned long address, pte = ptep_get_and_clear(mm, address, ptep); if (pte_accessible(mm, pte)) flush_tlb_page(vma, address); + else + luf_flush(0); return pte; } #endif diff --git a/mm/rmap.c b/mm/rmap.c index 3ed6234dd777e..c3df36cf7ac16 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -646,7 +646,7 @@ static atomic_long_t luf_ugen = ATOMIC_LONG_INIT(LUF_UGEN_INIT); /* * Don't return invalid luf_ugen, zero. */ -static unsigned long __maybe_unused new_luf_ugen(void) +static unsigned long new_luf_ugen(void) { unsigned long ugen = atomic_long_inc_return(&luf_ugen); @@ -723,7 +723,7 @@ static atomic_t luf_kgen = ATOMIC_INIT(1); /* * Don't return invalid luf_key, zero. 
*/ -static unsigned short __maybe_unused new_luf_key(void) +static unsigned short new_luf_key(void) { unsigned short luf_key = atomic_inc_return(&luf_kgen); @@ -776,6 +776,7 @@ void try_to_unmap_flush_takeoff(void) { struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; struct tlbflush_unmap_batch *tlb_ubc_ro = ¤t->tlb_ubc_ro; + struct tlbflush_unmap_batch *tlb_ubc_luf = ¤t->tlb_ubc_luf; struct tlbflush_unmap_batch *tlb_ubc_takeoff = ¤t->tlb_ubc_takeoff; if (!tlb_ubc_takeoff->flush_required) @@ -793,9 +794,72 @@ void try_to_unmap_flush_takeoff(void) if (arch_tlbbatch_done(&tlb_ubc_ro->arch, &tlb_ubc_takeoff->arch)) reset_batch(tlb_ubc_ro); + if (arch_tlbbatch_done(&tlb_ubc_luf->arch, &tlb_ubc_takeoff->arch)) + reset_batch(tlb_ubc_luf); + reset_batch(tlb_ubc_takeoff); } +/* + * Should be called just before try_to_unmap_flush() to optimize the tlb + * shootdown using arch_tlbbatch_done(). + */ +unsigned short fold_unmap_luf(void) +{ + struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct tlbflush_unmap_batch *tlb_ubc_luf = ¤t->tlb_ubc_luf; + struct luf_batch *lb; + unsigned long new_ugen; + unsigned short new_key; + unsigned long flags; + + if (!tlb_ubc_luf->flush_required) + return 0; + + /* + * fold_unmap_luf() is always followed by try_to_unmap_flush(). + */ + if (arch_tlbbatch_done(&tlb_ubc_luf->arch, &tlb_ubc->arch)) { + tlb_ubc_luf->flush_required = false; + tlb_ubc_luf->writable = false; + } + + /* + * Check again after shrinking. + */ + if (!tlb_ubc_luf->flush_required) + return 0; + + new_ugen = new_luf_ugen(); + new_key = new_luf_key(); + + /* + * Update the next entry of luf_batch table, that is the oldest + * entry among the candidate, hopefully tlb flushes have been + * done for all of the CPUs. + */ + lb = &luf_batch[new_key]; + write_lock_irqsave(&lb->lock, flags); + __fold_luf_batch(lb, tlb_ubc_luf, new_ugen); + write_unlock_irqrestore(&lb->lock, flags); + + reset_batch(tlb_ubc_luf); + return new_key; +} + +void luf_flush(unsigned short luf_key) +{ + struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct luf_batch *lb = &luf_batch[luf_key]; + unsigned long flags; + + read_lock_irqsave(&lb->lock, flags); + fold_batch(tlb_ubc, &lb->batch, false); + read_unlock_irqrestore(&lb->lock, flags); + try_to_unmap_flush(); +} +EXPORT_SYMBOL(luf_flush); + /* * Flush TLB entries for recently unmapped pages from remote CPUs. 
It is * important if a PTE was dirty when it was unmapped that it's flushed @@ -806,8 +870,10 @@ void try_to_unmap_flush(void) { struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; struct tlbflush_unmap_batch *tlb_ubc_ro = ¤t->tlb_ubc_ro; + struct tlbflush_unmap_batch *tlb_ubc_luf = ¤t->tlb_ubc_luf; fold_batch(tlb_ubc, tlb_ubc_ro, true); + fold_batch(tlb_ubc, tlb_ubc_luf, true); if (!tlb_ubc->flush_required) return; @@ -820,8 +886,9 @@ void try_to_unmap_flush_dirty(void) { struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; struct tlbflush_unmap_batch *tlb_ubc_ro = ¤t->tlb_ubc_ro; + struct tlbflush_unmap_batch *tlb_ubc_luf = ¤t->tlb_ubc_luf; - if (tlb_ubc->writable || tlb_ubc_ro->writable) + if (tlb_ubc->writable || tlb_ubc_ro->writable || tlb_ubc_luf->writable) try_to_unmap_flush(); } @@ -836,7 +903,8 @@ void try_to_unmap_flush_dirty(void) (TLB_FLUSH_BATCH_PENDING_MASK / 2) static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval, - unsigned long uaddr) + unsigned long uaddr, + struct vm_area_struct *vma) { struct tlbflush_unmap_batch *tlb_ubc; int batch; @@ -845,7 +913,16 @@ static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval, if (!pte_accessible(mm, pteval)) return; - if (pte_write(pteval)) + if (can_luf_test()) { + /* + * luf cannot work with the folio once it found a + * writable or dirty mapping on it. + */ + if (pte_write(pteval) || !can_luf_vma(vma)) + can_luf_fail(); + } + + if (!can_luf_test()) tlb_ubc = ¤t->tlb_ubc; else tlb_ubc = ¤t->tlb_ubc_ro; @@ -853,6 +930,21 @@ static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval, arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr); tlb_ubc->flush_required = true; + if (can_luf_test()) { + struct luf_batch *lb; + unsigned long flags; + + /* + * Accumulate to the 0th entry right away so that + * luf_flush(0) can be uesed to properly perform pending + * TLB flush once this unmapping is observed. + */ + lb = &luf_batch[0]; + write_lock_irqsave(&lb->lock, flags); + __fold_luf_batch(lb, tlb_ubc, new_luf_ugen()); + write_unlock_irqrestore(&lb->lock, flags); + } + /* * Ensure compiler does not re-order the setting of tlb_flush_batched * before the PTE is cleared. @@ -907,6 +999,8 @@ static bool should_defer_flush(struct mm_struct *mm, enum ttu_flags flags) * This must be called under the PTL so that an access to tlb_flush_batched * that is potentially a "reclaim vs mprotect/munmap/etc" race will synchronise * via the PTL. + * + * LUF(Lazy Unmap Flush) also relies on this for mprotect/munmap/etc. */ void flush_tlb_batched_pending(struct mm_struct *mm) { @@ -916,6 +1010,7 @@ void flush_tlb_batched_pending(struct mm_struct *mm) if (pending != flushed) { arch_flush_tlb_batched_pending(mm); + /* * If the new TLB flushing is pending during flushing, leave * mm->tlb_flush_batched as is, to avoid losing flushing. @@ -926,7 +1021,8 @@ void flush_tlb_batched_pending(struct mm_struct *mm) } #else static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval, - unsigned long uaddr) + unsigned long uaddr, + struct vm_area_struct *vma) { } @@ -1292,6 +1388,11 @@ int folio_mkclean(struct folio *folio) rmap_walk(folio, &rwc); + /* + * Ensure to clean stale tlb entries for this mapping. 
+ */ + luf_flush(0); + return cleaned; } EXPORT_SYMBOL_GPL(folio_mkclean); @@ -1961,7 +2062,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, */ pteval = ptep_get_and_clear(mm, address, pvmw.pte); - set_tlb_ubc_flush_pending(mm, pteval, address); + set_tlb_ubc_flush_pending(mm, pteval, address, vma); } else { pteval = ptep_clear_flush(vma, address, pvmw.pte); } @@ -2132,6 +2233,8 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, mmu_notifier_invalidate_range_end(&range); + if (!ret) + can_luf_fail(); return ret; } @@ -2164,11 +2267,21 @@ void try_to_unmap(struct folio *folio, enum ttu_flags flags) .done = folio_not_mapped, .anon_lock = folio_lock_anon_vma_read, }; + struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct tlbflush_unmap_batch *tlb_ubc_ro = ¤t->tlb_ubc_ro; + struct tlbflush_unmap_batch *tlb_ubc_luf = ¤t->tlb_ubc_luf; + + can_luf_init(folio); if (flags & TTU_RMAP_LOCKED) rmap_walk_locked(folio, &rwc); else rmap_walk(folio, &rwc); + + if (can_luf_test()) + fold_batch(tlb_ubc_luf, tlb_ubc_ro, true); + else + fold_batch(tlb_ubc, tlb_ubc_ro, true); } /* @@ -2338,7 +2451,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, */ pteval = ptep_get_and_clear(mm, address, pvmw.pte); - set_tlb_ubc_flush_pending(mm, pteval, address); + set_tlb_ubc_flush_pending(mm, pteval, address, vma); } else { pteval = ptep_clear_flush(vma, address, pvmw.pte); } @@ -2494,6 +2607,8 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, mmu_notifier_invalidate_range_end(&range); + if (!ret) + can_luf_fail(); return ret; } @@ -2513,6 +2628,9 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags) .done = folio_not_mapped, .anon_lock = folio_lock_anon_vma_read, }; + struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct tlbflush_unmap_batch *tlb_ubc_ro = ¤t->tlb_ubc_ro; + struct tlbflush_unmap_batch *tlb_ubc_luf = ¤t->tlb_ubc_luf; /* * Migration always ignores mlock and only supports TTU_RMAP_LOCKED and @@ -2537,10 +2655,17 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags) if (!folio_test_ksm(folio) && folio_test_anon(folio)) rwc.invalid_vma = invalid_migration_vma; + can_luf_init(folio); + if (flags & TTU_RMAP_LOCKED) rmap_walk_locked(folio, &rwc); else rmap_walk(folio, &rwc); + + if (can_luf_test()) + fold_batch(tlb_ubc_luf, tlb_ubc_ro, true); + else + fold_batch(tlb_ubc, tlb_ubc_ro, true); } #ifdef CONFIG_DEVICE_PRIVATE diff --git a/mm/truncate.c b/mm/truncate.c index e2e115adfbc58..2bf3806391c21 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -124,6 +124,11 @@ void folio_invalidate(struct folio *folio, size_t offset, size_t length) if (aops->invalidate_folio) aops->invalidate_folio(folio, offset, length); + + /* + * Ensure to clean stale tlb entries for this mapping. + */ + luf_flush(0); } EXPORT_SYMBOL_GPL(folio_invalidate); @@ -160,6 +165,11 @@ int truncate_inode_folio(struct address_space *mapping, struct folio *folio) truncate_cleanup_folio(folio); filemap_remove_folio(folio); + + /* + * Ensure to clean stale tlb entries for this mapping. + */ + luf_flush(0); return 0; } @@ -205,6 +215,12 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end) if (folio_needs_release(folio)) folio_invalidate(folio, offset, length); + + /* + * Ensure to clean stale tlb entries for this mapping. 
+ */ + luf_flush(0); + if (!folio_test_large(folio)) return true; if (split_folio(folio) == 0) @@ -246,19 +262,28 @@ EXPORT_SYMBOL(generic_error_remove_folio); */ long mapping_evict_folio(struct address_space *mapping, struct folio *folio) { + long ret = 0; + /* The page may have been truncated before it was locked */ if (!mapping) - return 0; + goto out; if (folio_test_dirty(folio) || folio_test_writeback(folio)) - return 0; + goto out; /* The refcount will be elevated if any page in the folio is mapped */ if (folio_ref_count(folio) > folio_nr_pages(folio) + folio_has_private(folio) + 1) - return 0; + goto out; if (!filemap_release_folio(folio, 0)) - return 0; + goto out; - return remove_mapping(mapping, folio); + ret = remove_mapping(mapping, folio); +out: + /* + * Ensure to clean stale tlb entries for this mapping. + */ + luf_flush(0); + + return ret; } /** @@ -298,7 +323,7 @@ void truncate_inode_pages_range(struct address_space *mapping, bool same_folio; if (mapping_empty(mapping)) - return; + goto out; /* * 'start' and 'end' always covers the range of pages to be fully @@ -386,6 +411,12 @@ void truncate_inode_pages_range(struct address_space *mapping, truncate_folio_batch_exceptionals(mapping, &fbatch, indices); folio_batch_release(&fbatch); } + +out: + /* + * Ensure to clean stale tlb entries for this mapping. + */ + luf_flush(0); } EXPORT_SYMBOL(truncate_inode_pages_range); @@ -501,6 +532,11 @@ unsigned long mapping_try_invalidate(struct address_space *mapping, folio_batch_release(&fbatch); cond_resched(); } + + /* + * Ensure to clean stale tlb entries for this mapping. + */ + luf_flush(0); return count; } @@ -605,7 +641,7 @@ int invalidate_inode_pages2_range(struct address_space *mapping, int did_range_unmap = 0; if (mapping_empty(mapping)) - return 0; + goto out; folio_batch_init(&fbatch); index = start; @@ -666,6 +702,11 @@ int invalidate_inode_pages2_range(struct address_space *mapping, if (dax_mapping(mapping)) { unmap_mapping_pages(mapping, start, end - start + 1, false); } +out: + /* + * Ensure to clean stale tlb entries for this mapping. + */ + luf_flush(0); return ret; } EXPORT_SYMBOL_GPL(invalidate_inode_pages2_range); diff --git a/mm/vmscan.c b/mm/vmscan.c index ff1c53e769398..461e7643898e7 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -828,6 +828,8 @@ static int __remove_mapping(struct address_space *mapping, struct folio *folio, */ long remove_mapping(struct address_space *mapping, struct folio *folio) { + long ret = 0; + if (__remove_mapping(mapping, folio, false, NULL)) { /* * Unfreezing the refcount with 1 effectively @@ -835,9 +837,15 @@ long remove_mapping(struct address_space *mapping, struct folio *folio) * atomic operation. */ folio_ref_unfreeze(folio, 1); - return folio_nr_pages(folio); + ret = folio_nr_pages(folio); } - return 0; + + /* + * Ensure to clean stale tlb entries for this mapping. 
+ */ + luf_flush(0); + + return ret; } /**
From patchwork Wed Feb 26 12:03:28 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13992209
From: Byungchul Park <byungchul@sk.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on v6.14-rc4 17/25] x86/tlb, riscv/tlb, arm64/tlbflush, mm: remove cpus from tlb shootdown that already have been done
Date: Wed, 26 Feb 2025 21:03:28 +0900
Message-Id: <20250226120336.29565-17-byungchul@sk.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20250226120336.29565-1-byungchul@sk.com>
References: <20250226113024.GA1935@system.software.com> <20250226120336.29565-1-byungchul@sk.com>
The luf mechanism performs the tlb shootdown for mappings that have been unmapped in a lazy manner. However, it does not have to include cpus whose flush has already been performed by others since the shootdown became necessary. Since luf already introduced its own generation number used as a global timestamp, luf_ugen, it is possible to selectively pick the cpus that have already done the required tlb flush, as sketched below.
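A condensed view of that selection, taken from the per-architecture implementation in the diff below; ugen_done is a per-cpu record of the generation each cpu has already flushed up to:

    /* Drop cpus whose recorded generation already covers this request. */
    for_each_cpu(cpu, &batch->cpumask) {
            unsigned long done = atomic_long_read(per_cpu_ptr(&ugen_done, cpu));

            if (!ugen_before(done, ugen))
                    cpumask_clear_cpu(cpu, &batch->cpumask);
    }
    /* The shootdown then targets only the cpus left in the mask. */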
This patch introduced APIs that use the generation number to select and remove those cpus so that it can perform tlb shootdown with a smaller cpumask, for all the CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH archs, x86, riscv, and arm64. Signed-off-by: Byungchul Park --- arch/arm64/include/asm/tlbflush.h | 26 +++++++ arch/riscv/include/asm/tlbflush.h | 4 ++ arch/riscv/mm/tlbflush.c | 108 ++++++++++++++++++++++++++++++ arch/x86/include/asm/tlbflush.h | 4 ++ arch/x86/mm/tlb.c | 108 ++++++++++++++++++++++++++++++ include/linux/sched.h | 1 + mm/internal.h | 4 ++ mm/page_alloc.c | 32 +++++++-- mm/rmap.c | 46 ++++++++++++- 9 files changed, 327 insertions(+), 6 deletions(-) diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h index acac53a21e5d1..5547ab1ffb3c3 100644 --- a/arch/arm64/include/asm/tlbflush.h +++ b/arch/arm64/include/asm/tlbflush.h @@ -354,6 +354,32 @@ static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch) dsb(ish); } +static inline bool arch_tlbbatch_check_done(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen) +{ + /* + * Nothing is needed in this architecture. + */ + return true; +} + +static inline bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen) +{ + /* + * Nothing is needed in this architecture. + */ + return true; +} + +static inline void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen) +{ + /* nothing to do */ +} + +static inline void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen) +{ + /* nothing to do */ +} + static inline void arch_tlbbatch_clear(struct arch_tlbflush_unmap_batch *batch) { /* nothing to do */ diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h index 1dc7d30273d59..ec5caeb3cf8ef 100644 --- a/arch/riscv/include/asm/tlbflush.h +++ b/arch/riscv/include/asm/tlbflush.h @@ -65,6 +65,10 @@ void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch, unsigned long uaddr); void arch_flush_tlb_batched_pending(struct mm_struct *mm); void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch); +bool arch_tlbbatch_check_done(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen); +bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen); +void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen); +void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen); static inline void arch_tlbbatch_clear(struct arch_tlbflush_unmap_batch *batch) { diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c index 36f996af6256c..93afb7a299003 100644 --- a/arch/riscv/mm/tlbflush.c +++ b/arch/riscv/mm/tlbflush.c @@ -202,3 +202,111 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch) __flush_tlb_range(&batch->cpumask, FLUSH_TLB_NO_ASID, 0, FLUSH_TLB_MAX_SIZE, PAGE_SIZE); } + +static DEFINE_PER_CPU(atomic_long_t, ugen_done); + +static int __init luf_init_arch(void) +{ + int cpu; + + for_each_cpu(cpu, cpu_possible_mask) + atomic_long_set(per_cpu_ptr(&ugen_done, cpu), LUF_UGEN_INIT - 1); + + return 0; +} +early_initcall(luf_init_arch); + +/* + * batch will not be updated. 
+ */ +bool arch_tlbbatch_check_done(struct arch_tlbflush_unmap_batch *batch, + unsigned long ugen) +{ + int cpu; + + if (!ugen) + goto out; + + for_each_cpu(cpu, &batch->cpumask) { + unsigned long done; + + done = atomic_long_read(per_cpu_ptr(&ugen_done, cpu)); + if (ugen_before(done, ugen)) + return false; + } + return true; +out: + return cpumask_empty(&batch->cpumask); +} + +bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, + unsigned long ugen) +{ + int cpu; + + if (!ugen) + goto out; + + for_each_cpu(cpu, &batch->cpumask) { + unsigned long done; + + done = atomic_long_read(per_cpu_ptr(&ugen_done, cpu)); + if (!ugen_before(done, ugen)) + cpumask_clear_cpu(cpu, &batch->cpumask); + } +out: + return cpumask_empty(&batch->cpumask); +} + +void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, + unsigned long ugen) +{ + int cpu; + + if (!ugen) + return; + + for_each_cpu(cpu, &batch->cpumask) { + atomic_long_t *done = per_cpu_ptr(&ugen_done, cpu); + unsigned long old = atomic_long_read(done); + + /* + * It's racy. The race results in unnecessary tlb flush + * because of the smaller ugen_done than it should be. + * However, it's okay in terms of correctness. + */ + if (!ugen_before(old, ugen)) + continue; + + /* + * It's for optimization. Just skip on fail than retry. + */ + atomic_long_cmpxchg(done, old, ugen); + } +} + +void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen) +{ + int cpu; + + if (!ugen) + return; + + for_each_cpu(cpu, mm_cpumask(mm)) { + atomic_long_t *done = per_cpu_ptr(&ugen_done, cpu); + unsigned long old = atomic_long_read(done); + + /* + * It's racy. The race results in unnecessary tlb flush + * because of the smaller ugen_done than it should be. + * However, it's okay in terms of correctness. + */ + if (!ugen_before(old, ugen)) + continue; + + /* + * It's for optimization. Just skip on fail than retry. + */ + atomic_long_cmpxchg(done, old, ugen); + } +} diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index c27e61bd274a5..dbcbf0477ed2a 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -294,6 +294,10 @@ static inline void arch_flush_tlb_batched_pending(struct mm_struct *mm) } extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch); +extern bool arch_tlbbatch_check_done(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen); +extern bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen); +extern void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen); +extern void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen); static inline void arch_tlbbatch_clear(struct arch_tlbflush_unmap_batch *batch) { diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 523e8bb6fba1f..be6068b60c32d 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -1270,6 +1270,114 @@ void __flush_tlb_all(void) } EXPORT_SYMBOL_GPL(__flush_tlb_all); +static DEFINE_PER_CPU(atomic_long_t, ugen_done); + +static int __init luf_init_arch(void) +{ + int cpu; + + for_each_cpu(cpu, cpu_possible_mask) + atomic_long_set(per_cpu_ptr(&ugen_done, cpu), LUF_UGEN_INIT - 1); + + return 0; +} +early_initcall(luf_init_arch); + +/* + * batch will not be updated. 
+ */ +bool arch_tlbbatch_check_done(struct arch_tlbflush_unmap_batch *batch, + unsigned long ugen) +{ + int cpu; + + if (!ugen) + goto out; + + for_each_cpu(cpu, &batch->cpumask) { + unsigned long done; + + done = atomic_long_read(per_cpu_ptr(&ugen_done, cpu)); + if (ugen_before(done, ugen)) + return false; + } + return true; +out: + return cpumask_empty(&batch->cpumask); +} + +bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, + unsigned long ugen) +{ + int cpu; + + if (!ugen) + goto out; + + for_each_cpu(cpu, &batch->cpumask) { + unsigned long done; + + done = atomic_long_read(per_cpu_ptr(&ugen_done, cpu)); + if (!ugen_before(done, ugen)) + cpumask_clear_cpu(cpu, &batch->cpumask); + } +out: + return cpumask_empty(&batch->cpumask); +} + +void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, + unsigned long ugen) +{ + int cpu; + + if (!ugen) + return; + + for_each_cpu(cpu, &batch->cpumask) { + atomic_long_t *done = per_cpu_ptr(&ugen_done, cpu); + unsigned long old = atomic_long_read(done); + + /* + * It's racy. The race results in unnecessary tlb flush + * because of the smaller ugen_done than it should be. + * However, it's okay in terms of correctness. + */ + if (!ugen_before(old, ugen)) + continue; + + /* + * It's for optimization. Just skip on fail than retry. + */ + atomic_long_cmpxchg(done, old, ugen); + } +} + +void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen) +{ + int cpu; + + if (!ugen) + return; + + for_each_cpu(cpu, mm_cpumask(mm)) { + atomic_long_t *done = per_cpu_ptr(&ugen_done, cpu); + unsigned long old = atomic_long_read(done); + + /* + * It's racy. The race results in unnecessary tlb flush + * because of the smaller ugen_done than it should be. + * However, it's okay in terms of correctness. + */ + if (!ugen_before(old, ugen)) + continue; + + /* + * It's for optimization. Just skip on fail than retry. 
+ */ + atomic_long_cmpxchg(done, old, ugen); + } +} + void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch) { struct flush_tlb_info *info; diff --git a/include/linux/sched.h b/include/linux/sched.h index 47a0a3ccb7b1a..31efc88ce911a 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1403,6 +1403,7 @@ struct task_struct { #if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) int luf_no_shootdown; int luf_takeoff_started; + unsigned long luf_ugen; #endif struct tlbflush_unmap_batch tlb_ubc; diff --git a/mm/internal.h b/mm/internal.h index 43e91f04d6d1c..a95c46355e93d 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1259,6 +1259,7 @@ void try_to_unmap_flush(void); void try_to_unmap_flush_dirty(void); void try_to_unmap_flush_takeoff(void); void flush_tlb_batched_pending(struct mm_struct *mm); +void reset_batch(struct tlbflush_unmap_batch *batch); void fold_batch(struct tlbflush_unmap_batch *dst, struct tlbflush_unmap_batch *src, bool reset); void fold_luf_batch(struct luf_batch *dst, struct luf_batch *src); #else @@ -1274,6 +1275,9 @@ static inline void try_to_unmap_flush_takeoff(void) static inline void flush_tlb_batched_pending(struct mm_struct *mm) { } +static inline void reset_batch(struct tlbflush_unmap_batch *batch) +{ +} static inline void fold_batch(struct tlbflush_unmap_batch *dst, struct tlbflush_unmap_batch *src, bool reset) { } diff --git a/mm/page_alloc.c b/mm/page_alloc.c index db1460c07b964..8e1ed80f304cd 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -668,9 +668,11 @@ bool luf_takeoff_start(void) */ void luf_takeoff_end(void) { + struct tlbflush_unmap_batch *tlb_ubc_takeoff = ¤t->tlb_ubc_takeoff; unsigned long flags; bool no_shootdown; bool outmost = false; + unsigned long cur_luf_ugen; local_irq_save(flags); VM_WARN_ON(!current->luf_takeoff_started); @@ -697,10 +699,19 @@ void luf_takeoff_end(void) if (no_shootdown) goto out; + cur_luf_ugen = current->luf_ugen; + + current->luf_ugen = 0; + + if (cur_luf_ugen && arch_tlbbatch_diet(&tlb_ubc_takeoff->arch, cur_luf_ugen)) + reset_batch(tlb_ubc_takeoff); + try_to_unmap_flush_takeoff(); out: - if (outmost) + if (outmost) { VM_WARN_ON(current->luf_no_shootdown); + VM_WARN_ON(current->luf_ugen); + } } /* @@ -757,6 +768,7 @@ bool luf_takeoff_check_and_fold(struct page *page) struct tlbflush_unmap_batch *tlb_ubc_takeoff = ¤t->tlb_ubc_takeoff; unsigned short luf_key = page_luf_key(page); struct luf_batch *lb; + unsigned long lb_ugen; unsigned long flags; /* @@ -770,13 +782,25 @@ bool luf_takeoff_check_and_fold(struct page *page) if (!luf_key) return true; - if (current->luf_no_shootdown) - return false; - lb = &luf_batch[luf_key]; read_lock_irqsave(&lb->lock, flags); + lb_ugen = lb->ugen; + + if (arch_tlbbatch_check_done(&lb->batch.arch, lb_ugen)) { + read_unlock_irqrestore(&lb->lock, flags); + return true; + } + + if (current->luf_no_shootdown) { + read_unlock_irqrestore(&lb->lock, flags); + return false; + } + fold_batch(tlb_ubc_takeoff, &lb->batch, false); read_unlock_irqrestore(&lb->lock, flags); + + if (!current->luf_ugen || ugen_before(current->luf_ugen, lb_ugen)) + current->luf_ugen = lb_ugen; return true; } #endif diff --git a/mm/rmap.c b/mm/rmap.c index c3df36cf7ac16..fcd27200efa04 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -656,7 +656,7 @@ static unsigned long new_luf_ugen(void) return ugen; } -static void reset_batch(struct tlbflush_unmap_batch *batch) +void reset_batch(struct tlbflush_unmap_batch *batch) { arch_tlbbatch_clear(&batch->arch); batch->flush_required = false; @@ -743,8 +743,14 
@@ static void __fold_luf_batch(struct luf_batch *dst_lb, * more tlb shootdown might be needed to fulfill the newer * request. Conservertively keep the newer one. */ - if (!dst_lb->ugen || ugen_before(dst_lb->ugen, src_ugen)) + if (!dst_lb->ugen || ugen_before(dst_lb->ugen, src_ugen)) { + /* + * Good chance to shrink the batch using the old ugen. + */ + if (dst_lb->ugen && arch_tlbbatch_diet(&dst_lb->batch.arch, dst_lb->ugen)) + reset_batch(&dst_lb->batch); dst_lb->ugen = src_ugen; + } fold_batch(&dst_lb->batch, src_batch, false); } @@ -772,17 +778,45 @@ void fold_luf_batch(struct luf_batch *dst, struct luf_batch *src) read_unlock_irqrestore(&src->lock, flags); } +static unsigned long tlb_flush_start(void) +{ + /* + * Memory barrier implied in the atomic operation prevents + * reading luf_ugen from happening after the following + * tlb flush. + */ + return new_luf_ugen(); +} + +static void tlb_flush_end(struct arch_tlbflush_unmap_batch *arch, + struct mm_struct *mm, unsigned long ugen) +{ + /* + * Prevent the following marking from placing prior to the + * actual tlb flush. + */ + smp_mb(); + + if (arch) + arch_tlbbatch_mark_ugen(arch, ugen); + if (mm) + arch_mm_mark_ugen(mm, ugen); +} + void try_to_unmap_flush_takeoff(void) { struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; struct tlbflush_unmap_batch *tlb_ubc_ro = ¤t->tlb_ubc_ro; struct tlbflush_unmap_batch *tlb_ubc_luf = ¤t->tlb_ubc_luf; struct tlbflush_unmap_batch *tlb_ubc_takeoff = ¤t->tlb_ubc_takeoff; + unsigned long ugen; if (!tlb_ubc_takeoff->flush_required) return; + ugen = tlb_flush_start(); arch_tlbbatch_flush(&tlb_ubc_takeoff->arch); + tlb_flush_end(&tlb_ubc_takeoff->arch, NULL, ugen); /* * Now that tlb shootdown of tlb_ubc_takeoff has been performed, @@ -871,13 +905,17 @@ void try_to_unmap_flush(void) struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; struct tlbflush_unmap_batch *tlb_ubc_ro = ¤t->tlb_ubc_ro; struct tlbflush_unmap_batch *tlb_ubc_luf = ¤t->tlb_ubc_luf; + unsigned long ugen; fold_batch(tlb_ubc, tlb_ubc_ro, true); fold_batch(tlb_ubc, tlb_ubc_luf, true); if (!tlb_ubc->flush_required) return; + ugen = tlb_flush_start(); arch_tlbbatch_flush(&tlb_ubc->arch); + tlb_flush_end(&tlb_ubc->arch, NULL, ugen); + reset_batch(tlb_ubc); } @@ -1009,7 +1047,11 @@ void flush_tlb_batched_pending(struct mm_struct *mm) int flushed = batch >> TLB_FLUSH_BATCH_FLUSHED_SHIFT; if (pending != flushed) { + unsigned long ugen; + + ugen = tlb_flush_start(); arch_flush_tlb_batched_pending(mm); + tlb_flush_end(NULL, mm, ugen); /* * If the new TLB flushing is pending during flushing, leave From patchwork Wed Feb 26 12:03:29 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Byungchul Park X-Patchwork-Id: 13992213 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 41B92C021BF for ; Wed, 26 Feb 2025 12:04:40 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id EF15728003F; Wed, 26 Feb 2025 07:03:57 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id E9F2C280042; Wed, 26 Feb 2025 07:03:57 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D010028003F; Wed, 26 Feb 2025 07:03:57 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by 
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on v6.14-rc4 18/25] mm/page_alloc: retry 3 times to take pcp pages on luf check failure
Date: Wed, 26 Feb 2025 21:03:29 +0900
Message-Id: <20250226120336.29565-18-byungchul@sk.com>
In-Reply-To: <20250226120336.29565-1-byungchul@sk.com>
References: <20250226113024.GA1935@system.software.com> <20250226120336.29565-1-byungchul@sk.com>
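This patch carries no commit body beyond the subject.  As a reading aid,
the following standalone sketch shows the shape of the change in the
diff below: walk the per-cpu list, take the first entry that passes the
luf check, and give up after three failed checks.  The types and helper
names here are stand-ins, not the kernel list API.

/*
 * Illustrative sketch of the bounded pcp scan; not the kernel code.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct demo_page {
	int id;
	bool luf_ok;			/* stand-in for luf_takeoff_check_and_fold() */
	struct demo_page *next;
};

/* take the first page that passes the check, tolerating at most 3 failures */
static struct demo_page *demo_rmqueue(struct demo_page **head)
{
	struct demo_page **link = head;
	int try_luf_pages = 3;

	for (; *link; link = &(*link)->next) {
		struct demo_page *page = *link;

		if (page->luf_ok) {
			*link = page->next;	/* unlink the chosen page */
			return page;
		}
		if (!--try_luf_pages)
			return NULL;		/* bail out after 3 failures */
	}
	return NULL;
}

int main(void)
{
	struct demo_page c = { .id = 3, .luf_ok = true,  .next = NULL };
	struct demo_page b = { .id = 2, .luf_ok = false, .next = &c };
	struct demo_page a = { .id = 1, .luf_ok = false, .next = &b };
	struct demo_page *head = &a;
	struct demo_page *page = demo_rmqueue(&head);

	printf("allocated page id: %d\n", page ? page->id : -1);	/* prints 3 */
	return 0;
}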
mzdNZbY4PmUqo8XvH3PYHPg9vrf2sXjsnHWX3WPBplKPzSu0PDat6mTz2PRpErvHu3Pn2D1O zPjN4vF+31U2j8UvPjB5bP1l59E49Rqbx+dNcgG8UVw2Kak5mWWpRfp2CVwZk36/Zyp4wlWx 8PNf5gbGLRxdjJwcEgImErtn7WIBsdkE1CVu3PjJDGKLCJhJHGz9w97FyMXBLLCMSWLviQY2 kISwQKnEnpYvQEUcHCwCqhJftleAhHmB6r8u7WWCmCkvsXrDAbA5nEDxT9OOgbUKCSRL7Pz9 h2kCI9cCRoZVjCKZeWW5iZk5pnrF2RmVeZkVesn5uZsYgSG6rPbPxB2MXy67H2IU4GBU4uF9 cGZvuhBrYllxZe4hRgkOZiURXs7MPelCvCmJlVWpRfnxRaU5qcWHGKU5WJTEeb3CUxOEBNIT S1KzU1MLUotgskwcnFINjCnnH+xg14kzjJJp9ufZ5yXLH9WheOLKG4XzPL7zblyRCkxI9bGP uM/p/yE1xJWfR9WvYaFG+cPDKzkCdy2959DF9It5Anv1FzXXjw3G21QeRqabVjNUljis+RW2 svH+lVvz2VcV/AsSLGz4eiJ8D9MpG10tTS5WYaHur/ODK8yV2PYJqDQrsRRnJBpqMRcVJwIA vtKw7k0CAAA= X-CFilter-Loop: Reflected X-Rspam-User: X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: 9E79AA0005 X-Stat-Signature: nrbwm36mws4coexbew48rzordi4twef1 X-HE-Tag: 1740571434-51757 X-HE-Meta: U2FsdGVkX1+IeOf6QK1Mg4umD/SqF2wmkqqm5P8i5fv3L4vi3giiFkVRBimlK2RS7VgeU9KAGoAbrNWtfORe20SVnwrMRhKGsi7d9w/yjJT2HDKSL4aUNAWoRAuiVsq5Cp95AB6/ixAibiTt+UZ2igk9BuHJ6gjSEFvQhhe6TNgZMEVhvBYYYROYuDXPHLakMge2nF3VO7nhNaGfcuDHbsYlo4DKvSIWgLN18Kz4IhKf6absj77Ych9hIM1RzjqdyIhDIfXgcWnayGKVLLFRp10UxB+awlW2gmTRLyj9My7RYDMleXykrBNJLRf6e5f5JwNGDWNY0PfNZ23aoV+HTT6xDJ3m2LQIKKZ+/XikmW15Vfhn2iAhPZgqp19Sbz+qCf4ZJRkKcMUxakY7IsiLt1evBzBUi4QWKDgM/Xq8qcd1GduHSbLf3gfaur9G2yu2m+WYSLXqSEPbuzW/Xuy6xj+C+sLTrETUfcjb4AFACsZQC0OFxR9+Co8n33EWhTvRzVT1ALNdGcUyNbrjzWml+1g/0lWRVf5QINBQRFUJF4jNPS2iFXEKrsMt616CfjpoMz74jN33ElhE6Dv5ydZHJxtk79G9yMknRtv7X1zcM9Q6YqZUvk7b3urYfedFUNafCVYZ1QYNss2RJ279GydJoGOJHKT9q0LDd/M7OdO22esejnpnMJ595eLl4Ida+cI/7gI93M3XAkKcjrjQ5dfUy6WBzoFkmf8Qt1ElGnlwyfrwyFIwRCTiKJReG3ELbzXV8gJH8PA061M9qCKZ6y7+H3mtMyxxSu1ODRBqSbAEQ5hxQHp8449bSFm3Qva0olM3IyYfCbvcRSMTqZKmx7BVrvpK82nbSyyFU7S3zHOSQsx4qh/FYD0gmAQB0XyeDDQDaHu1YcBfKWQYXtgr7myOw8rGNNk0L4qihr0+U8UFsrw3vjPQfXTG7Cn4YvaLAYuQasWFNFb0Bcaw6+h3Oav +Pi40n7S v8yzGjJZXgyqYExFTmzgwppcRUYYev/p1OOe9 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Signed-off-by: Byungchul Park --- mm/page_alloc.c | 24 ++++++++++++++++++++---- 1 file changed, 20 insertions(+), 4 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 8e1ed80f304cd..811e7c4bd2d19 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3306,6 +3306,12 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order, { struct page *page; + /* + * give up taking page from pcp if it fails to take pcp page + * 3 times due to the tlb shootdownable issue. + */ + int try_luf_pages = 3; + do { if (list_empty(list)) { int batch = nr_pcp_alloc(pcp, zone, order); @@ -3320,11 +3326,21 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order, return NULL; } - page = list_first_entry(list, struct page, pcp_list); - if (!luf_takeoff_check_and_fold(page)) + list_for_each_entry(page, list, pcp_list) { + if (luf_takeoff_check_and_fold(page)) { + list_del(&page->pcp_list); + pcp->count -= 1 << order; + break; + } + if (!--try_luf_pages) + return NULL; + } + + /* + * If all the pages in the list fails... 
+ */ + if (list_entry_is_head(page, list, pcp_list)) return NULL; - list_del(&page->pcp_list); - pcp->count -= 1 << order; } while (check_new_pages(page, order)); return page;

From patchwork Wed Feb 26 12:03:30 2025
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on v6.14-rc4 19/25] mm: skip luf tlb flush for luf'd mm that already has been done
Date: Wed, 26 Feb 2025 21:03:30 +0900
Message-Id: <20250226120336.29565-19-byungchul@sk.com>
In-Reply-To: <20250226120336.29565-1-byungchul@sk.com>
References: <20250226113024.GA1935@system.software.com> <20250226120336.29565-1-byungchul@sk.com>
The fault handler performs the tlb flush pended by luf whenever a new
pte gains write permission, regardless of whether the required tlb
flush has already been performed.  By storing the luf generation
number, luf_ugen, in struct mm_struct, unnecessary tlb flushes can be
skipped.
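The skip condition amounts to something like the following standalone
sketch.  The types are stand-ins and a plain "flushed up to" counter
replaces the real arch batch; this is not the kernel code, only an
illustration of why a per-mm luf_ugen lets the flush be skipped.

/*
 * Illustrative only: skip the flush when the generation recorded for
 * this mm has already been covered by a previous flush.
 */
#include <stdbool.h>
#include <stdio.h>

struct demo_luf_batch {
	unsigned long ugen;		/* newest pending luf request for this mm */
};

static unsigned long flushed_ugen = 10;	/* pretend: flushed up to generation 10 */

static bool ugen_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;	/* wrap-safe "a is older than b" */
}

/* stand-in for luf_flush_mm(): flush only when something newer is pending */
static void demo_flush_mm(struct demo_luf_batch *lb)
{
	if (!lb->ugen || !ugen_before(flushed_ugen, lb->ugen)) {
		printf("ugen %lu already covered, skip flush\n", lb->ugen);
		return;
	}
	printf("flush for ugen %lu\n", lb->ugen);
	flushed_ugen = lb->ugen;	/* remember what has been covered */
}

int main(void)
{
	struct demo_luf_batch old_req = { .ugen = 8 };	/* covered by 10: skipped */
	struct demo_luf_batch new_req = { .ugen = 12 };	/* newer: needs a flush */

	demo_flush_mm(&old_req);
	demo_flush_mm(&new_req);
	return 0;
}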
Signed-off-by: Byungchul Park --- include/asm-generic/tlb.h | 2 +- include/linux/mm_types.h | 9 +++++ kernel/fork.c | 1 + kernel/sched/core.c | 2 +- mm/memory.c | 22 ++++++++++-- mm/pgtable-generic.c | 2 +- mm/rmap.c | 74 +++++++++++++++++++++++++++++++++++++-- 7 files changed, 104 insertions(+), 8 deletions(-) diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h index 5bb6b05bd3549..f156e8cb3bd4a 100644 --- a/include/asm-generic/tlb.h +++ b/include/asm-generic/tlb.h @@ -568,7 +568,7 @@ static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vm /* * Don't leave stale tlb entries for this vma. */ - luf_flush(0); + luf_flush_vma(vma); if (tlb->fullmm) return; diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index cb9e6282b7ad1..2ac93d4f67c15 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -38,8 +38,10 @@ struct luf_batch { unsigned long ugen; rwlock_t lock; }; +void luf_batch_init(struct luf_batch *lb); #else struct luf_batch {}; +static inline void luf_batch_init(struct luf_batch *lb) {} #endif /* @@ -1059,6 +1061,9 @@ struct mm_struct { * moving a PROT_NONE mapped page. */ atomic_t tlb_flush_pending; + + /* luf batch for this mm */ + struct luf_batch luf_batch; #ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH /* See flush_tlb_batched_pending() */ atomic_t tlb_flush_batched; @@ -1341,8 +1346,12 @@ extern void tlb_finish_mmu(struct mmu_gather *tlb); #if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) void luf_flush(unsigned short luf_key); +void luf_flush_mm(struct mm_struct *mm); +void luf_flush_vma(struct vm_area_struct *vma); #else static inline void luf_flush(unsigned short luf_key) {} +static inline void luf_flush_mm(struct mm_struct *mm) {} +static inline void luf_flush_vma(struct vm_area_struct *vma) {} #endif struct vm_fault; diff --git a/kernel/fork.c b/kernel/fork.c index 735405a9c5f32..ece87fece2113 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1280,6 +1280,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p, memset(&mm->rss_stat, 0, sizeof(mm->rss_stat)); spin_lock_init(&mm->page_table_lock); spin_lock_init(&mm->arg_lock); + luf_batch_init(&mm->luf_batch); mm_init_cpumask(mm); mm_init_aio(mm); mm_init_owner(mm, p); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 1f4c5da800365..ec132abbbce6e 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5275,7 +5275,7 @@ static struct rq *finish_task_switch(struct task_struct *prev) if (mm) { membarrier_mm_sync_core_before_usermode(mm); mmdrop_lazy_tlb_sched(mm); - luf_flush(0); + luf_flush_mm(mm); } if (unlikely(prev_state == TASK_DEAD)) { diff --git a/mm/memory.c b/mm/memory.c index c1d2d2b0112cd..52bd45fe00f55 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -6181,6 +6181,7 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, struct mm_struct *mm = vma->vm_mm; vm_fault_t ret; bool is_droppable; + struct address_space *mapping = NULL; bool flush = false; __set_current_state(TASK_RUNNING); @@ -6212,9 +6213,17 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, * should be considered. */ if (vma->vm_flags & (VM_WRITE | VM_MAYWRITE) || - flags & FAULT_FLAG_WRITE) + flags & FAULT_FLAG_WRITE) { flush = true; + /* + * Doesn't care the !VM_SHARED cases because it won't + * update the pages that might be shared with others. 
+ */ + if (vma->vm_flags & VM_SHARED && vma->vm_file) + mapping = vma->vm_file->f_mapping; + } + if (unlikely(is_vm_hugetlb_page(vma))) ret = hugetlb_fault(vma->vm_mm, vma, address, flags); else @@ -6249,8 +6258,15 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, /* * Ensure to clean stale tlb entries for this vma. */ - if (flush) - luf_flush(0); + if (flush) { + /* + * If it has a VM_SHARED mapping, all the mms involved + * should be luf_flush'ed. + */ + if (mapping) + luf_flush(0); + luf_flush_mm(mm); + } return ret; } diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c index d6678d6bac746..545d401db82c1 100644 --- a/mm/pgtable-generic.c +++ b/mm/pgtable-generic.c @@ -100,7 +100,7 @@ pte_t ptep_clear_flush(struct vm_area_struct *vma, unsigned long address, if (pte_accessible(mm, pte)) flush_tlb_page(vma, address); else - luf_flush(0); + luf_flush_vma(vma); return pte; } #endif diff --git a/mm/rmap.c b/mm/rmap.c index fcd27200efa04..d68cfd28e0939 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -695,7 +695,7 @@ void fold_batch(struct tlbflush_unmap_batch *dst, */ struct luf_batch luf_batch[NR_LUF_BATCH]; -static void luf_batch_init(struct luf_batch *lb) +void luf_batch_init(struct luf_batch *lb) { rwlock_init(&lb->lock); reset_batch(&lb->batch); @@ -778,6 +778,31 @@ void fold_luf_batch(struct luf_batch *dst, struct luf_batch *src) read_unlock_irqrestore(&src->lock, flags); } +static void fold_luf_batch_mm(struct luf_batch *dst, + struct mm_struct *mm) +{ + unsigned long flags; + bool need_fold = false; + + read_lock_irqsave(&dst->lock, flags); + if (arch_tlbbatch_need_fold(&dst->batch.arch, mm)) + need_fold = true; + read_unlock(&dst->lock); + + write_lock(&dst->lock); + if (unlikely(need_fold)) + arch_tlbbatch_add_pending(&dst->batch.arch, mm, 0); + + /* + * dst->ugen represents sort of request for tlb shootdown. The + * newer it is, the more tlb shootdown might be needed to + * fulfill the newer request. Keep the newest one not to miss + * necessary tlb shootdown. + */ + dst->ugen = new_luf_ugen(); + write_unlock_irqrestore(&dst->lock, flags); +} + static unsigned long tlb_flush_start(void) { /* @@ -894,6 +919,49 @@ void luf_flush(unsigned short luf_key) } EXPORT_SYMBOL(luf_flush); +void luf_flush_vma(struct vm_area_struct *vma) +{ + struct mm_struct *mm; + struct address_space *mapping = NULL; + + if (!vma) + return; + + mm = vma->vm_mm; + /* + * Doesn't care the !VM_SHARED cases because it won't + * update the pages that might be shared with others. + */ + if (vma->vm_flags & VM_SHARED && vma->vm_file) + mapping = vma->vm_file->f_mapping; + + if (mapping) + luf_flush(0); + luf_flush_mm(mm); +} + +void luf_flush_mm(struct mm_struct *mm) +{ + struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct luf_batch *lb; + unsigned long flags; + unsigned long lb_ugen; + + if (!mm) + return; + + lb = &mm->luf_batch; + read_lock_irqsave(&lb->lock, flags); + fold_batch(tlb_ubc, &lb->batch, false); + lb_ugen = lb->ugen; + read_unlock_irqrestore(&lb->lock, flags); + + if (arch_tlbbatch_diet(&tlb_ubc->arch, lb_ugen)) + return; + + try_to_unmap_flush(); +} + /* * Flush TLB entries for recently unmapped pages from remote CPUs. 
It is * important if a PTE was dirty when it was unmapped that it's flushed @@ -962,8 +1030,10 @@ static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval, if (!can_luf_test()) tlb_ubc = &current->tlb_ubc; - else + else { tlb_ubc = &current->tlb_ubc_ro; + fold_luf_batch_mm(&mm->luf_batch, mm); + } arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr); tlb_ubc->flush_required = true;

From patchwork Wed Feb 26 12:03:31 2025
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on v6.14-rc4 20/25] mm, fs: skip tlb flushes for luf'd filemap that already has been done
Date: Wed, 26 Feb 2025 21:03:31 +0900
Message-Id: <20250226120336.29565-20-byungchul@sk.com>
In-Reply-To: <20250226120336.29565-1-byungchul@sk.com>
References: <20250226113024.GA1935@system.software.com> <20250226120336.29565-1-byungchul@sk.com>

For a luf'd filemap, tlb shootdown is performed whenever the page cache
is updated, regardless of whether the required tlb flushes have already
been done.
By storing luf meta data in struct address_space and updating the luf meta data properly, we can skip unnecessary tlb flush. Signed-off-by: Byungchul Park --- fs/inode.c | 1 + include/linux/fs.h | 4 ++- include/linux/mm_types.h | 2 ++ mm/memory.c | 4 +-- mm/rmap.c | 59 +++++++++++++++++++++++++--------------- mm/truncate.c | 14 +++++----- mm/vmscan.c | 2 +- 7 files changed, 53 insertions(+), 33 deletions(-) diff --git a/fs/inode.c b/fs/inode.c index 5587aabdaa5ee..752fb2df6f3b3 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -475,6 +475,7 @@ static void __address_space_init_once(struct address_space *mapping) init_rwsem(&mapping->i_mmap_rwsem); INIT_LIST_HEAD(&mapping->i_private_list); spin_lock_init(&mapping->i_private_lock); + luf_batch_init(&mapping->luf_batch); mapping->i_mmap = RB_ROOT_CACHED; } diff --git a/include/linux/fs.h b/include/linux/fs.h index 78aaf769d32d1..a2f014b31028f 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -498,6 +498,7 @@ extern const struct address_space_operations empty_aops; * @i_private_lock: For use by the owner of the address_space. * @i_private_list: For use by the owner of the address_space. * @i_private_data: For use by the owner of the address_space. + * @luf_batch: Data to track need of tlb flush by luf. */ struct address_space { struct inode *host; @@ -519,6 +520,7 @@ struct address_space { struct list_head i_private_list; struct rw_semaphore i_mmap_rwsem; void * i_private_data; + struct luf_batch luf_batch; } __attribute__((aligned(sizeof(long)))) __randomize_layout; /* * On most architectures that alignment is already the case; but @@ -545,7 +547,7 @@ static inline int mapping_write_begin(struct file *file, * Ensure to clean stale tlb entries for this mapping. */ if (!ret) - luf_flush(0); + luf_flush_mapping(mapping); return ret; } diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 2ac93d4f67c15..96015fc68e4f5 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -1348,10 +1348,12 @@ extern void tlb_finish_mmu(struct mmu_gather *tlb); void luf_flush(unsigned short luf_key); void luf_flush_mm(struct mm_struct *mm); void luf_flush_vma(struct vm_area_struct *vma); +void luf_flush_mapping(struct address_space *mapping); #else static inline void luf_flush(unsigned short luf_key) {} static inline void luf_flush_mm(struct mm_struct *mm) {} static inline void luf_flush_vma(struct vm_area_struct *vma) {} +static inline void luf_flush_mapping(struct address_space *mapping) {} #endif struct vm_fault; diff --git a/mm/memory.c b/mm/memory.c index 52bd45fe00f55..6cdc1df0424f3 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -6261,10 +6261,10 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, if (flush) { /* * If it has a VM_SHARED mapping, all the mms involved - * should be luf_flush'ed. + * in the struct address_space should be luf_flush'ed. */ if (mapping) - luf_flush(0); + luf_flush_mapping(mapping); luf_flush_mm(mm); } diff --git a/mm/rmap.c b/mm/rmap.c index d68cfd28e0939..58dfc9889b1ee 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -691,7 +691,7 @@ void fold_batch(struct tlbflush_unmap_batch *dst, #define NR_LUF_BATCH (1 << (sizeof(short) * 8)) /* - * Use 0th entry as accumulated batch. + * XXX: Reserve the 0th entry for later use. 
*/ struct luf_batch luf_batch[NR_LUF_BATCH]; @@ -936,7 +936,7 @@ void luf_flush_vma(struct vm_area_struct *vma) mapping = vma->vm_file->f_mapping; if (mapping) - luf_flush(0); + luf_flush_mapping(mapping); luf_flush_mm(mm); } @@ -962,6 +962,29 @@ void luf_flush_mm(struct mm_struct *mm) try_to_unmap_flush(); } +void luf_flush_mapping(struct address_space *mapping) +{ + struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct luf_batch *lb; + unsigned long flags; + unsigned long lb_ugen; + + if (!mapping) + return; + + lb = &mapping->luf_batch; + read_lock_irqsave(&lb->lock, flags); + fold_batch(tlb_ubc, &lb->batch, false); + lb_ugen = lb->ugen; + read_unlock_irqrestore(&lb->lock, flags); + + if (arch_tlbbatch_diet(&tlb_ubc->arch, lb_ugen)) + return; + + try_to_unmap_flush(); +} +EXPORT_SYMBOL(luf_flush_mapping); + /* * Flush TLB entries for recently unmapped pages from remote CPUs. It is * important if a PTE was dirty when it was unmapped that it's flushed @@ -1010,7 +1033,8 @@ void try_to_unmap_flush_dirty(void) static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval, unsigned long uaddr, - struct vm_area_struct *vma) + struct vm_area_struct *vma, + struct address_space *mapping) { struct tlbflush_unmap_batch *tlb_ubc; int batch; @@ -1032,27 +1056,15 @@ static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval, tlb_ubc = ¤t->tlb_ubc; else { tlb_ubc = ¤t->tlb_ubc_ro; + fold_luf_batch_mm(&mm->luf_batch, mm); + if (mapping) + fold_luf_batch_mm(&mapping->luf_batch, mm); } arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr); tlb_ubc->flush_required = true; - if (can_luf_test()) { - struct luf_batch *lb; - unsigned long flags; - - /* - * Accumulate to the 0th entry right away so that - * luf_flush(0) can be uesed to properly perform pending - * TLB flush once this unmapping is observed. - */ - lb = &luf_batch[0]; - write_lock_irqsave(&lb->lock, flags); - __fold_luf_batch(lb, tlb_ubc, new_luf_ugen()); - write_unlock_irqrestore(&lb->lock, flags); - } - /* * Ensure compiler does not re-order the setting of tlb_flush_batched * before the PTE is cleared. @@ -1134,7 +1146,8 @@ void flush_tlb_batched_pending(struct mm_struct *mm) #else static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval, unsigned long uaddr, - struct vm_area_struct *vma) + struct vm_area_struct *vma, + struct address_space *mapping) { } @@ -1503,7 +1516,7 @@ int folio_mkclean(struct folio *folio) /* * Ensure to clean stale tlb entries for this mapping. */ - luf_flush(0); + luf_flush_mapping(mapping); return cleaned; } @@ -2037,6 +2050,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, enum ttu_flags flags = (enum ttu_flags)(long)arg; unsigned long pfn; unsigned long hsz = 0; + struct address_space *mapping = folio_mapping(folio); /* * When racing against e.g. zap_pte_range() on another cpu, @@ -2174,7 +2188,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, */ pteval = ptep_get_and_clear(mm, address, pvmw.pte); - set_tlb_ubc_flush_pending(mm, pteval, address, vma); + set_tlb_ubc_flush_pending(mm, pteval, address, vma, mapping); } else { pteval = ptep_clear_flush(vma, address, pvmw.pte); } @@ -2414,6 +2428,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, enum ttu_flags flags = (enum ttu_flags)(long)arg; unsigned long pfn; unsigned long hsz = 0; + struct address_space *mapping = folio_mapping(folio); /* * When racing against e.g. 
zap_pte_range() on another cpu, @@ -2563,7 +2578,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, */ pteval = ptep_get_and_clear(mm, address, pvmw.pte); - set_tlb_ubc_flush_pending(mm, pteval, address, vma); + set_tlb_ubc_flush_pending(mm, pteval, address, vma, mapping); } else { pteval = ptep_clear_flush(vma, address, pvmw.pte); } diff --git a/mm/truncate.c b/mm/truncate.c index 2bf3806391c21..b2934c4edebf1 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -128,7 +128,7 @@ void folio_invalidate(struct folio *folio, size_t offset, size_t length) /* * Ensure to clean stale tlb entries for this mapping. */ - luf_flush(0); + luf_flush_mapping(folio->mapping); } EXPORT_SYMBOL_GPL(folio_invalidate); @@ -169,7 +169,7 @@ int truncate_inode_folio(struct address_space *mapping, struct folio *folio) /* * Ensure to clean stale tlb entries for this mapping. */ - luf_flush(0); + luf_flush_mapping(mapping); return 0; } @@ -219,7 +219,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end) /* * Ensure to clean stale tlb entries for this mapping. */ - luf_flush(0); + luf_flush_mapping(folio->mapping); if (!folio_test_large(folio)) return true; @@ -281,7 +281,7 @@ long mapping_evict_folio(struct address_space *mapping, struct folio *folio) /* * Ensure to clean stale tlb entries for this mapping. */ - luf_flush(0); + luf_flush_mapping(mapping); return ret; } @@ -416,7 +416,7 @@ void truncate_inode_pages_range(struct address_space *mapping, /* * Ensure to clean stale tlb entries for this mapping. */ - luf_flush(0); + luf_flush_mapping(mapping); } EXPORT_SYMBOL(truncate_inode_pages_range); @@ -536,7 +536,7 @@ unsigned long mapping_try_invalidate(struct address_space *mapping, /* * Ensure to clean stale tlb entries for this mapping. */ - luf_flush(0); + luf_flush_mapping(mapping); return count; } @@ -706,7 +706,7 @@ int invalidate_inode_pages2_range(struct address_space *mapping, /* * Ensure to clean stale tlb entries for this mapping. */ - luf_flush(0); + luf_flush_mapping(mapping); return ret; } EXPORT_SYMBOL_GPL(invalidate_inode_pages2_range); diff --git a/mm/vmscan.c b/mm/vmscan.c index 461e7643898e7..a31a7cf87315f 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -843,7 +843,7 @@ long remove_mapping(struct address_space *mapping, struct folio *folio) /* * Ensure to clean stale tlb entries for this mapping. 
*/ - luf_flush(0); + luf_flush_mapping(mapping); return ret; }

From patchwork Wed Feb 26 12:03:32 2025
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on v6.14-rc4 21/25] mm: perform luf tlb shootdown per zone in batched manner
Date: Wed, 26 Feb 2025 21:03:32 +0900
Message-Id: <20250226120336.29565-21-byungchul@sk.com>
In-Reply-To: <20250226120336.29565-1-byungchul@sk.com>
References: <20250226113024.GA1935@system.software.com> <20250226120336.29565-1-byungchul@sk.com>
Each luf page in buddy carries its pending tlb shootdown information
and performs the corresponding tlb shootdown when it exits buddy.
However, every exit from buddy causes small but frequent IPIs.  Even
though the total number of IPIs is reduced, unnecessary waits on
contended CPUs in the IPI handler have been observed via perf
profiling.  Thus, perform the luf tlb shootdown per zone in a batched
manner when pages exit buddy, so as to avoid frequent IPIs.
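The intended effect can be pictured with a toy model: pending work is
folded into a per-zone batch as pages leave buddy, and a single flush
per zone generation covers all of it, instead of one IPI round per
page.  The field names loosely follow what this patch adds (zone_ugen,
zone_ugen_done), but the code below is only an illustrative standalone
sketch, not the kernel implementation.

/*
 * Toy model of per-zone batching: fold per-page pending work into one
 * zone-wide batch and flush it once per zone generation.
 */
#include <stdio.h>

struct demo_zone {
	unsigned long zone_ugen;	/* generation being accumulated */
	unsigned long zone_ugen_done;	/* generation already flushed */
	unsigned long pending_pages;	/* work folded into the zone batch */
};

/* page leaves buddy: fold its pending shootdown into the zone batch */
static void demo_fold_page(struct demo_zone *z)
{
	z->pending_pages++;		/* no IPI here */
}

/* one flush covers every page folded so far */
static void demo_flush_zone(struct demo_zone *z)
{
	if (z->zone_ugen_done == z->zone_ugen)
		return;			/* already done for this generation */

	printf("one IPI round flushes %lu pages\n", z->pending_pages);
	z->pending_pages = 0;
	z->zone_ugen_done = z->zone_ugen;
}

int main(void)
{
	struct demo_zone z = { .zone_ugen = 2, .zone_ugen_done = 1 };

	for (int i = 0; i < 1000; i++)
		demo_fold_page(&z);	/* 1000 pages leave buddy, zero IPIs so far */

	demo_flush_zone(&z);		/* a single batched flush */
	return 0;
}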
Signed-off-by: Byungchul Park --- include/linux/mm.h | 44 ++++- include/linux/mm_types.h | 19 +- include/linux/mmzone.h | 9 + include/linux/sched.h | 2 + mm/compaction.c | 10 +- mm/internal.h | 13 +- mm/mm_init.c | 5 + mm/page_alloc.c | 363 +++++++++++++++++++++++++++++++-------- mm/page_reporting.c | 9 +- mm/rmap.c | 6 +- 10 files changed, 383 insertions(+), 97 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 8c3481402d8cb..e8e6562abc77d 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -4155,12 +4155,16 @@ int arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *st int arch_set_shadow_stack_status(struct task_struct *t, unsigned long status); int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status); -#if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) /* * luf_ugen will start with 2 so that 1 can be regarded as a passed one. */ #define LUF_UGEN_INIT 2 +/* + * zone_ugen will start with 2 so that 1 can be regarded as done. + */ +#define ZONE_UGEN_INIT 2 +#if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) static inline bool ugen_before(unsigned long a, unsigned long b) { /* @@ -4171,7 +4175,11 @@ static inline bool ugen_before(unsigned long a, unsigned long b) static inline unsigned long next_ugen(unsigned long ugen) { - if (ugen + 1) + /* + * Avoid zero even in unsigned short range so as to treat + * '(unsigned short)ugen == 0' as invalid. + */ + if ((unsigned short)(ugen + 1)) return ugen + 1; /* * Avoid invalid ugen, zero. @@ -4181,7 +4189,11 @@ static inline unsigned long next_ugen(unsigned long ugen) static inline unsigned long prev_ugen(unsigned long ugen) { - if (ugen - 1) + /* + * Avoid zero even in unsigned short range so as to treat + * '(unsigned short)ugen == 0' as invalid. + */ + if ((unsigned short)(ugen - 1)) return ugen - 1; /* * Avoid invalid ugen, zero. @@ -4189,4 +4201,30 @@ static inline unsigned long prev_ugen(unsigned long ugen) return ugen - 2; } #endif + +/* + * return the biggest ugen but it should be before the real zone_ugen. + */ +static inline unsigned long page_zone_ugen(struct zone *zone, struct page *page) +{ + unsigned long zone_ugen = zone->zone_ugen; + unsigned short short_zone_ugen = page->zone_ugen; + unsigned long cand1, cand2; + + if (!short_zone_ugen) + return 0; + + cand1 = (zone_ugen & ~(unsigned long)USHRT_MAX) | short_zone_ugen; + cand2 = cand1 - USHRT_MAX - 1; + + if (!ugen_before(zone_ugen, cand1)) + return cand1; + + return cand2; +} + +static inline void set_page_zone_ugen(struct page *page, unsigned short zone_ugen) +{ + page->zone_ugen = zone_ugen; +} #endif /* _LINUX_MM_H */ diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 96015fc68e4f5..c5f44b5c9758f 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -132,11 +132,20 @@ struct page { */ unsigned short order; - /* - * For tracking need of tlb flush, - * by luf(lazy unmap flush). - */ - unsigned short luf_key; + union { + /* + * For tracking need of + * tlb flush, by + * luf(lazy unmap flush). + */ + unsigned short luf_key; + + /* + * Casted zone_ugen with + * unsigned short. 
+ */ + unsigned short zone_ugen; + }; }; }; }; diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index e2c8d7147e361..df5bacd48a2a2 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -117,6 +117,7 @@ extern int page_group_by_mobility_disabled; struct free_area { struct list_head free_list[MIGRATE_TYPES]; struct list_head pend_list[MIGRATE_TYPES]; + unsigned long pend_zone_ugen[MIGRATE_TYPES]; unsigned long nr_free; }; @@ -1017,6 +1018,14 @@ struct zone { atomic_long_t vm_numa_event[NR_VM_NUMA_EVENT_ITEMS]; /* Count pages that need tlb shootdown on allocation */ atomic_long_t nr_luf_pages; + /* Generation number for that tlb shootdown has been done */ + unsigned long zone_ugen_done; + /* Generation number to control zone batched tlb shootdown */ + unsigned long zone_ugen; + /* Approximate latest luf_ugen that have ever entered */ + unsigned long luf_ugen; + /* Accumulated tlb batch for this zone */ + struct tlbflush_unmap_batch zone_batch; } ____cacheline_internodealigned_in_smp; enum pgdat_flags { diff --git a/include/linux/sched.h b/include/linux/sched.h index 31efc88ce911a..96375274d0335 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1404,6 +1404,8 @@ struct task_struct { int luf_no_shootdown; int luf_takeoff_started; unsigned long luf_ugen; + unsigned long zone_ugen; + unsigned long wait_zone_ugen; #endif struct tlbflush_unmap_batch tlb_ubc; diff --git a/mm/compaction.c b/mm/compaction.c index aa594a85d8aee..b7a7a6feb9eac 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -655,7 +655,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, goto isolate_fail; } - if (!luf_takeoff_check(page)) + if (!luf_takeoff_check(cc->zone, page)) goto isolate_fail; /* Found a free page, will break it into order-0 pages */ @@ -691,7 +691,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, /* * Check and flush before using the pages taken off. */ - luf_takeoff_end(); + luf_takeoff_end(cc->zone); /* * Be careful to not go outside of the pageblock. @@ -1613,7 +1613,7 @@ static void fast_isolate_freepages(struct compact_control *cc) order_scanned++; nr_scanned++; - if (unlikely(consider_pend && !luf_takeoff_check(freepage))) + if (unlikely(consider_pend && !luf_takeoff_check(cc->zone, freepage))) goto scan_next; pfn = page_to_pfn(freepage); @@ -1681,7 +1681,7 @@ static void fast_isolate_freepages(struct compact_control *cc) /* * Check and flush before using the pages taken off. 
*/ - luf_takeoff_end(); + luf_takeoff_end(cc->zone); /* Skip fast search if enough freepages isolated */ if (cc->nr_freepages >= cc->nr_migratepages) @@ -2418,7 +2418,7 @@ static enum compact_result compact_finished(struct compact_control *cc) */ luf_takeoff_start(); ret = __compact_finished(cc); - luf_takeoff_end(); + luf_takeoff_end(cc->zone); trace_mm_compaction_finished(cc->zone, cc->order, ret); if (ret == COMPACT_NO_SUITABLE_PAGE) diff --git a/mm/internal.h b/mm/internal.h index a95c46355e93d..6d7b3b389810e 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1592,10 +1592,10 @@ static inline void accept_page(struct page *page) #if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) extern struct luf_batch luf_batch[]; bool luf_takeoff_start(void); -void luf_takeoff_end(void); +void luf_takeoff_end(struct zone *zone); bool luf_takeoff_no_shootdown(void); -bool luf_takeoff_check(struct page *page); -bool luf_takeoff_check_and_fold(struct page *page); +bool luf_takeoff_check(struct zone *zone, struct page *page); +bool luf_takeoff_check_and_fold(struct zone *zone, struct page *page); static inline bool non_luf_pages_ok(struct zone *zone) { @@ -1605,7 +1605,6 @@ static inline bool non_luf_pages_ok(struct zone *zone) return nr_free - nr_luf_pages > min_wm; } - unsigned short fold_unmap_luf(void); /* @@ -1693,10 +1692,10 @@ static inline bool can_luf_vma(struct vm_area_struct *vma) } #else /* CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH */ static inline bool luf_takeoff_start(void) { return false; } -static inline void luf_takeoff_end(void) {} +static inline void luf_takeoff_end(struct zone *zone) {} static inline bool luf_takeoff_no_shootdown(void) { return true; } -static inline bool luf_takeoff_check(struct page *page) { return true; } -static inline bool luf_takeoff_check_and_fold(struct page *page) { return true; } +static inline bool luf_takeoff_check(struct zone *zone, struct page *page) { return true; } +static inline bool luf_takeoff_check_and_fold(struct zone *zone, struct page *page) { return true; } static inline bool non_luf_pages_ok(struct zone *zone) { return true; } static inline unsigned short fold_unmap_luf(void) { return 0; } diff --git a/mm/mm_init.c b/mm/mm_init.c index 41c38fbb58a30..69643c3564a47 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -1400,6 +1400,7 @@ static void __meminit zone_init_free_lists(struct zone *zone) for_each_migratetype_order(order, t) { INIT_LIST_HEAD(&zone->free_area[order].free_list[t]); INIT_LIST_HEAD(&zone->free_area[order].pend_list[t]); + zone->free_area[order].pend_zone_ugen[t] = ZONE_UGEN_INIT; zone->free_area[order].nr_free = 0; } @@ -1407,6 +1408,10 @@ static void __meminit zone_init_free_lists(struct zone *zone) INIT_LIST_HEAD(&zone->unaccepted_pages); #endif atomic_long_set(&zone->nr_luf_pages, 0); + zone->zone_ugen_done = ZONE_UGEN_INIT - 1; + zone->zone_ugen = ZONE_UGEN_INIT; + zone->luf_ugen = LUF_UGEN_INIT - 1; + reset_batch(&zone->zone_batch); } void __meminit init_currently_empty_zone(struct zone *zone, diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 811e7c4bd2d19..917a257ea5706 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -663,16 +663,29 @@ bool luf_takeoff_start(void) return !no_shootdown; } +static void wait_zone_ugen_done(struct zone *zone, unsigned long zone_ugen) +{ + while (ugen_before(READ_ONCE(zone->zone_ugen_done), zone_ugen)) + cond_resched(); +} + +static void set_zone_ugen_done(struct zone *zone, unsigned long zone_ugen) +{ + WRITE_ONCE(zone->zone_ugen_done, zone_ugen); +} + /* * Should be called within 
the same context of luf_takeoff_start(). */ -void luf_takeoff_end(void) +void luf_takeoff_end(struct zone *zone) { struct tlbflush_unmap_batch *tlb_ubc_takeoff = ¤t->tlb_ubc_takeoff; unsigned long flags; bool no_shootdown; bool outmost = false; unsigned long cur_luf_ugen; + unsigned long cur_zone_ugen; + unsigned long cur_wait_zone_ugen; local_irq_save(flags); VM_WARN_ON(!current->luf_takeoff_started); @@ -700,6 +713,8 @@ void luf_takeoff_end(void) goto out; cur_luf_ugen = current->luf_ugen; + cur_zone_ugen = current->zone_ugen; + cur_wait_zone_ugen = current->wait_zone_ugen; current->luf_ugen = 0; @@ -707,10 +722,38 @@ void luf_takeoff_end(void) reset_batch(tlb_ubc_takeoff); try_to_unmap_flush_takeoff(); + + if (cur_wait_zone_ugen || cur_zone_ugen) { + /* + * pcp(zone == NULL) doesn't work with zone batch. + */ + if (zone) { + current->zone_ugen = 0; + current->wait_zone_ugen = 0; + + /* + * Guarantee that tlb shootdown required for the + * zone_ugen has been completed once observing + * 'zone_ugen_done'. + */ + smp_mb(); + + /* + * zone->zone_ugen_done should be updated + * sequentially. + */ + if (cur_wait_zone_ugen) + wait_zone_ugen_done(zone, cur_wait_zone_ugen); + if (cur_zone_ugen) + set_zone_ugen_done(zone, cur_zone_ugen); + } + } out: if (outmost) { VM_WARN_ON(current->luf_no_shootdown); VM_WARN_ON(current->luf_ugen); + VM_WARN_ON(current->zone_ugen); + VM_WARN_ON(current->wait_zone_ugen); } } @@ -741,9 +784,9 @@ bool luf_takeoff_no_shootdown(void) * Should be called with either zone lock held and irq disabled or pcp * lock held. */ -bool luf_takeoff_check(struct page *page) +bool luf_takeoff_check(struct zone *zone, struct page *page) { - unsigned short luf_key = page_luf_key(page); + unsigned long zone_ugen; /* * No way. Delimit using luf_takeoff_{start,end}(). @@ -753,7 +796,29 @@ bool luf_takeoff_check(struct page *page) return false; } - if (!luf_key) + if (!zone) { + unsigned short luf_key = page_luf_key(page); + + if (!luf_key) + return true; + + if (current->luf_no_shootdown) + return false; + + return true; + } + + zone_ugen = page_zone_ugen(zone, page); + if (!zone_ugen) + return true; + + /* + * Should not be zero since zone-zone_ugen has been updated in + * __free_one_page() -> update_zone_batch(). + */ + VM_WARN_ON(!zone->zone_ugen); + + if (!ugen_before(READ_ONCE(zone->zone_ugen_done), zone_ugen)) return true; return !current->luf_no_shootdown; @@ -763,13 +828,11 @@ bool luf_takeoff_check(struct page *page) * Should be called with either zone lock held and irq disabled or pcp * lock held. */ -bool luf_takeoff_check_and_fold(struct page *page) +bool luf_takeoff_check_and_fold(struct zone *zone, struct page *page) { struct tlbflush_unmap_batch *tlb_ubc_takeoff = ¤t->tlb_ubc_takeoff; - unsigned short luf_key = page_luf_key(page); - struct luf_batch *lb; - unsigned long lb_ugen; unsigned long flags; + unsigned long zone_ugen; /* * No way. Delimit using luf_takeoff_{start,end}(). 
@@ -779,28 +842,94 @@ bool luf_takeoff_check_and_fold(struct page *page) return false; } - if (!luf_key) - return true; + /* + * pcp case + */ + if (!zone) { + unsigned short luf_key = page_luf_key(page); + struct luf_batch *lb; + unsigned long lb_ugen; - lb = &luf_batch[luf_key]; - read_lock_irqsave(&lb->lock, flags); - lb_ugen = lb->ugen; + if (!luf_key) + return true; + + lb = &luf_batch[luf_key]; + read_lock_irqsave(&lb->lock, flags); + lb_ugen = lb->ugen; + + if (arch_tlbbatch_check_done(&lb->batch.arch, lb_ugen)) { + read_unlock_irqrestore(&lb->lock, flags); + return true; + } + + if (current->luf_no_shootdown) { + read_unlock_irqrestore(&lb->lock, flags); + return false; + } - if (arch_tlbbatch_check_done(&lb->batch.arch, lb_ugen)) { + fold_batch(tlb_ubc_takeoff, &lb->batch, false); read_unlock_irqrestore(&lb->lock, flags); + + if (!current->luf_ugen || ugen_before(current->luf_ugen, lb_ugen)) + current->luf_ugen = lb_ugen; return true; } - if (current->luf_no_shootdown) { - read_unlock_irqrestore(&lb->lock, flags); + zone_ugen = page_zone_ugen(zone, page); + if (!zone_ugen) + return true; + + /* + * Should not be zero since zone-zone_ugen has been updated in + * __free_one_page() -> update_zone_batch(). + */ + VM_WARN_ON(!zone->zone_ugen); + + if (!ugen_before(READ_ONCE(zone->zone_ugen_done), zone_ugen)) + return true; + + if (current->luf_no_shootdown) return false; - } - fold_batch(tlb_ubc_takeoff, &lb->batch, false); - read_unlock_irqrestore(&lb->lock, flags); + /* + * zone batched flush has been already set. + */ + if (current->zone_ugen) + return true; + + /* + * Others are already performing tlb shootdown for us. All we + * need is to wait for those to complete. + */ + if (zone_ugen != zone->zone_ugen) { + if (!current->wait_zone_ugen || + ugen_before(current->wait_zone_ugen, zone_ugen)) + current->wait_zone_ugen = zone_ugen; + /* + * It's the first time that zone->zone_ugen has been set to + * current->zone_ugen. current->luf_ugen also get set. + */ + } else { + current->wait_zone_ugen = prev_ugen(zone->zone_ugen); + current->zone_ugen = zone->zone_ugen; + current->luf_ugen = zone->luf_ugen; + + /* + * Now that tlb shootdown for the zone_ugen will be + * performed at luf_takeoff_end(), advance it so that + * the next zone->lock holder can efficiently avoid + * unnecessary tlb shootdown. + */ + zone->zone_ugen = next_ugen(zone->zone_ugen); - if (!current->luf_ugen || ugen_before(current->luf_ugen, lb_ugen)) - current->luf_ugen = lb_ugen; + /* + * All the luf pages will eventually become non-luf + * pages by tlb flushing at luf_takeoff_end() and, + * flush_pend_list_if_done() will empty pend_list. + */ + atomic_long_set(&zone->nr_luf_pages, 0); + fold_batch(tlb_ubc_takeoff, &zone->zone_batch, true); + } return true; } #endif @@ -822,6 +951,42 @@ static inline void account_freepages(struct zone *zone, int nr_pages, zone->nr_free_highatomic + nr_pages); } +static void flush_pend_list_if_done(struct zone *zone, + struct free_area *area, int migratetype) +{ + unsigned long zone_ugen_done = READ_ONCE(zone->zone_ugen_done); + + /* + * tlb shootdown required for the zone_ugen already has been + * done. Thus, let's move pages in pend_list to free_list to + * secure more non-luf pages. + */ + if (!ugen_before(zone_ugen_done, area->pend_zone_ugen[migratetype])) + list_splice_init(&area->pend_list[migratetype], + &area->free_list[migratetype]); +} + +#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH +/* + * Should be called with zone->lock held and irq disabled. 
+ */ +static void update_zone_batch(struct zone *zone, unsigned short luf_key) +{ + unsigned long lb_ugen; + struct luf_batch *lb = &luf_batch[luf_key]; + + read_lock(&lb->lock); + fold_batch(&zone->zone_batch, &lb->batch, false); + lb_ugen = lb->ugen; + read_unlock(&lb->lock); + + if (ugen_before(zone->luf_ugen, lb_ugen)) + zone->luf_ugen = lb_ugen; +} +#else +static void update_zone_batch(struct zone *zone, unsigned short luf_key) {} +#endif + /* Used for pages not on another list */ static inline void __add_to_free_list(struct page *page, struct zone *zone, unsigned int order, int migratetype, @@ -830,6 +995,12 @@ static inline void __add_to_free_list(struct page *page, struct zone *zone, struct free_area *area = &zone->free_area[order]; struct list_head *list; + /* + * Good chance to flush pend_list just before updating the + * {free,pend}_list. + */ + flush_pend_list_if_done(zone, area, migratetype); + VM_WARN_ONCE(get_pageblock_migratetype(page) != migratetype, "page type is %lu, passed migratetype is %d (nr=%d)\n", get_pageblock_migratetype(page), migratetype, 1 << order); @@ -839,8 +1010,9 @@ static inline void __add_to_free_list(struct page *page, struct zone *zone, * positive is okay because it will cause just additional tlb * shootdown. */ - if (page_luf_key(page)) { + if (page_zone_ugen(zone, page)) { list = &area->pend_list[migratetype]; + area->pend_zone_ugen[migratetype] = zone->zone_ugen; atomic_long_add(1 << order, &zone->nr_luf_pages); } else list = &area->free_list[migratetype]; @@ -862,6 +1034,7 @@ static inline void move_to_free_list(struct page *page, struct zone *zone, unsigned int order, int old_mt, int new_mt) { struct free_area *area = &zone->free_area[order]; + unsigned long zone_ugen = page_zone_ugen(zone, page); /* Free page moving can fail, so it happens before the type update */ VM_WARN_ONCE(get_pageblock_migratetype(page) != old_mt, @@ -878,9 +1051,12 @@ static inline void move_to_free_list(struct page *page, struct zone *zone, * positive is okay because it will cause just additional tlb * shootdown. */ - if (page_luf_key(page)) + if (zone_ugen) { list_move_tail(&page->buddy_list, &area->pend_list[new_mt]); - else + if (!area->pend_zone_ugen[new_mt] || + ugen_before(area->pend_zone_ugen[new_mt], zone_ugen)) + area->pend_zone_ugen[new_mt] = zone_ugen; + } else list_move_tail(&page->buddy_list, &area->free_list[new_mt]); account_freepages(zone, -(1 << order), old_mt); @@ -898,7 +1074,7 @@ static inline void __del_page_from_free_list(struct page *page, struct zone *zon if (page_reported(page)) __ClearPageReported(page); - if (page_luf_key(page)) + if (page_zone_ugen(zone, page)) atomic_long_sub(1 << order, &zone->nr_luf_pages); list_del(&page->buddy_list); @@ -936,29 +1112,39 @@ static inline struct page *get_page_from_free_area(struct zone *zone, */ pend_first = !non_luf_pages_ok(zone); + /* + * Good chance to flush pend_list just before updating the + * {free,pend}_list. 
+ */ + flush_pend_list_if_done(zone, area, migratetype); + if (pend_first) { page = list_first_entry_or_null(&area->pend_list[migratetype], struct page, buddy_list); - if (page && luf_takeoff_check(page)) + if (page && luf_takeoff_check(zone, page)) return page; page = list_first_entry_or_null(&area->free_list[migratetype], struct page, buddy_list); - if (page) + if (page) { + set_page_zone_ugen(page, 0); return page; + } } else { page = list_first_entry_or_null(&area->free_list[migratetype], struct page, buddy_list); - if (page) + if (page) { + set_page_zone_ugen(page, 0); return page; + } page = list_first_entry_or_null(&area->pend_list[migratetype], struct page, buddy_list); - if (page && luf_takeoff_check(page)) + if (page && luf_takeoff_check(zone, page)) return page; } return NULL; @@ -1023,6 +1209,7 @@ static inline void __free_one_page(struct page *page, unsigned long combined_pfn; struct page *buddy; bool to_tail; + unsigned long zone_ugen; VM_BUG_ON(!zone_is_initialized(zone)); VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP, page); @@ -1034,20 +1221,25 @@ static inline void __free_one_page(struct page *page, account_freepages(zone, 1 << order, migratetype); /* - * Use the page's luf_key unchanged if luf_key == 0. Worth - * noting that page_luf_key() will be 0 in most cases since it's - * initialized at free_pages_prepare(). + * Use the page's zone_ugen unchanged if luf_key == 0. Worth + * noting that page_zone_ugen() will be 0 in most cases since + * it's initialized at free_pages_prepare(). + * + * Update page's zone_ugen and zone's batch only if a valid + * luf_key was passed. */ - if (luf_key) - set_page_luf_key(page, luf_key); - else - luf_key = page_luf_key(page); + if (luf_key) { + zone_ugen = zone->zone_ugen; + set_page_zone_ugen(page, (unsigned short)zone_ugen); + update_zone_batch(zone, luf_key); + } else + zone_ugen = page_zone_ugen(zone, page); while (order < MAX_PAGE_ORDER) { int buddy_mt = migratetype; - unsigned short buddy_luf_key; + unsigned long buddy_zone_ugen; - if (!luf_key && compaction_capture(capc, page, order, migratetype)) { + if (!zone_ugen && compaction_capture(capc, page, order, migratetype)) { account_freepages(zone, -(1 << order), migratetype); return; } @@ -1080,17 +1272,15 @@ static inline void __free_one_page(struct page *page, else __del_page_from_free_list(buddy, zone, order, buddy_mt); + buddy_zone_ugen = page_zone_ugen(zone, buddy); + /* - * !buddy_luf_key && !luf_key : do nothing - * buddy_luf_key && !luf_key : luf_key = buddy_luf_key - * !buddy_luf_key && luf_key : do nothing - * buddy_luf_key && luf_key : merge two into luf_key + * if (!zone_ugen && !buddy_zone_ugen) : nothing to do + * if ( zone_ugen && !buddy_zone_ugen) : nothing to do */ - buddy_luf_key = page_luf_key(buddy); - if (buddy_luf_key && !luf_key) - luf_key = buddy_luf_key; - else if (buddy_luf_key && luf_key) - fold_luf_batch(&luf_batch[luf_key], &luf_batch[buddy_luf_key]); + if ((!zone_ugen && buddy_zone_ugen) || + ( zone_ugen && buddy_zone_ugen && ugen_before(zone_ugen, buddy_zone_ugen))) + zone_ugen = buddy_zone_ugen; if (unlikely(buddy_mt != migratetype)) { /* @@ -1103,7 +1293,7 @@ static inline void __free_one_page(struct page *page, combined_pfn = buddy_pfn & pfn; page = page + (combined_pfn - pfn); - set_page_luf_key(page, luf_key); + set_page_zone_ugen(page, zone_ugen); pfn = combined_pfn; order++; } @@ -1446,6 +1636,7 @@ static void free_pcppages_bulk(struct zone *zone, int count, do { unsigned long pfn; int mt; + unsigned short luf_key; page = 
list_last_entry(list, struct page, pcp_list); pfn = page_to_pfn(page); @@ -1456,7 +1647,16 @@ static void free_pcppages_bulk(struct zone *zone, int count, count -= nr_pages; pcp->count -= nr_pages; - __free_one_page(page, pfn, zone, order, mt, FPI_NONE, 0); + /* + * page private in pcp stores luf_key while it + * stores zone_ugen in buddy. Thus, the private + * needs to be cleared and the luf_key needs to + * be passed to buddy. + */ + luf_key = page_luf_key(page); + set_page_private(page, 0); + + __free_one_page(page, pfn, zone, order, mt, FPI_NONE, luf_key); trace_mm_page_pcpu_drain(page, order, mt); } while (count > 0 && !list_empty(list)); @@ -1501,7 +1701,15 @@ static void free_one_page(struct zone *zone, struct page *page, * valid luf_key can be passed only if order == 0. */ VM_WARN_ON(luf_key && order); - set_page_luf_key(page, luf_key); + + /* + * Update page's zone_ugen and zone's batch only if a valid + * luf_key was passed. + */ + if (luf_key) { + set_page_zone_ugen(page, (unsigned short)zone->zone_ugen); + update_zone_batch(zone, luf_key); + } split_large_buddy(zone, page, pfn, order, fpi_flags); spin_unlock_irqrestore(&zone->lock, flags); @@ -1655,7 +1863,7 @@ static inline unsigned int expand(struct zone *zone, struct page *page, int low, if (set_page_guard(zone, &page[size], high)) continue; - if (page_luf_key(&page[size])) + if (page_zone_ugen(zone, &page[size])) tail = true; __add_to_free_list(&page[size], zone, high, migratetype, tail); @@ -1673,7 +1881,7 @@ static __always_inline void page_del_and_expand(struct zone *zone, int nr_pages = 1 << high; __del_page_from_free_list(page, zone, high, migratetype); - if (unlikely(!luf_takeoff_check_and_fold(page))) + if (unlikely(!luf_takeoff_check_and_fold(zone, page))) VM_WARN_ON(1); nr_pages -= expand(zone, page, low, high, migratetype); account_freepages(zone, -nr_pages, migratetype); @@ -2202,7 +2410,7 @@ steal_suitable_fallback(struct zone *zone, struct page *page, unsigned int nr_added; del_page_from_free_list(page, zone, current_order, block_type); - if (unlikely(!luf_takeoff_check_and_fold(page))) + if (unlikely(!luf_takeoff_check_and_fold(zone, page))) VM_WARN_ON(1); change_pageblock_range(page, current_order, start_type); nr_added = expand(zone, page, order, current_order, start_type); @@ -2441,12 +2649,12 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, WARN_ON_ONCE(ret == -1); if (ret > 0) { spin_unlock_irqrestore(&zone->lock, flags); - luf_takeoff_end(); + luf_takeoff_end(zone); return ret; } } spin_unlock_irqrestore(&zone->lock, flags); - luf_takeoff_end(); + luf_takeoff_end(zone); } return false; @@ -2611,12 +2819,15 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, * pages are ordered properly. */ list_add_tail(&page->pcp_list, list); + + /* + * Reset all the luf fields. tlb shootdown will be + * performed at luf_takeoff_end() below if needed. + */ + set_page_private(page, 0); } spin_unlock_irqrestore(&zone->lock, flags); - /* - * Check and flush before using the pages taken off. 
- */ - luf_takeoff_end(); + luf_takeoff_end(zone); return i; } @@ -3130,7 +3341,7 @@ int __isolate_free_page(struct page *page, unsigned int order, bool willputback) } del_page_from_free_list(page, zone, order, mt); - if (unlikely(!willputback && !luf_takeoff_check_and_fold(page))) + if (unlikely(!willputback && !luf_takeoff_check_and_fold(zone, page))) VM_WARN_ON(1); /* @@ -3229,7 +3440,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone, if (!page) { spin_unlock_irqrestore(&zone->lock, flags); - luf_takeoff_end(); + luf_takeoff_end(zone); return NULL; } } @@ -3237,7 +3448,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone, /* * Check and flush before using the pages taken off. */ - luf_takeoff_end(); + luf_takeoff_end(zone); } while (check_new_pages(page, order)); __count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order); @@ -3327,7 +3538,7 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order, } list_for_each_entry(page, list, pcp_list) { - if (luf_takeoff_check_and_fold(page)) { + if (luf_takeoff_check_and_fold(NULL, page)) { list_del(&page->pcp_list); pcp->count -= 1 << order; break; @@ -3362,7 +3573,7 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone, pcp = pcp_spin_trylock(zone->per_cpu_pageset); if (!pcp) { pcp_trylock_finish(UP_flags); - luf_takeoff_end(); + luf_takeoff_end(NULL); return NULL; } @@ -3379,7 +3590,7 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone, /* * Check and flush before using the pages taken off. */ - luf_takeoff_end(); + luf_takeoff_end(NULL); if (page) { __count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order); zone_statistics(preferred_zone, zone, 1); @@ -3418,6 +3629,7 @@ struct page *rmqueue(struct zone *preferred_zone, migratetype); out: + /* Separate test+clear to avoid unnecessary atomics */ if ((alloc_flags & ALLOC_KSWAPD) && unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) { @@ -5017,7 +5229,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid, /* * Check and flush before using the pages taken off. */ - luf_takeoff_end(); + luf_takeoff_end(NULL); __count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account); zone_statistics(zonelist_zone(ac.preferred_zoneref), zone, nr_account); @@ -5027,7 +5239,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid, failed_irq: pcp_trylock_finish(UP_flags); - luf_takeoff_end(); + luf_takeoff_end(NULL); failed: page = __alloc_pages_noprof(gfp, 0, preferred_nid, nodemask); @@ -7111,7 +7323,7 @@ unsigned long __offline_isolated_pages(unsigned long start_pfn, VM_WARN_ON(get_pageblock_migratetype(page) != MIGRATE_ISOLATE); order = buddy_order(page); del_page_from_free_list(page, zone, order, MIGRATE_ISOLATE); - if (unlikely(!luf_takeoff_check_and_fold(page))) + if (unlikely(!luf_takeoff_check_and_fold(zone, page))) VM_WARN_ON(1); pfn += (1 << order); } @@ -7119,7 +7331,7 @@ unsigned long __offline_isolated_pages(unsigned long start_pfn, /* * Check and flush before using the pages taken off. 
*/ - luf_takeoff_end(); + luf_takeoff_end(zone); return end_pfn - start_pfn - already_offline; } @@ -7181,7 +7393,7 @@ static void break_down_buddy_pages(struct zone *zone, struct page *page, if (set_page_guard(zone, current_buddy, high)) continue; - if (page_luf_key(current_buddy)) + if (page_zone_ugen(zone, current_buddy)) tail = true; add_to_free_list(current_buddy, zone, high, migratetype, tail); @@ -7213,7 +7425,7 @@ bool take_page_off_buddy(struct page *page) del_page_from_free_list(page_head, zone, page_order, migratetype); - if (unlikely(!luf_takeoff_check_and_fold(page_head))) + if (unlikely(!luf_takeoff_check_and_fold(zone, page_head))) VM_WARN_ON(1); break_down_buddy_pages(zone, page_head, page, 0, page_order, migratetype); @@ -7229,7 +7441,7 @@ bool take_page_off_buddy(struct page *page) /* * Check and flush before using the pages taken off. */ - luf_takeoff_end(); + luf_takeoff_end(zone); return ret; } @@ -7248,6 +7460,13 @@ bool put_page_back_buddy(struct page *page) int migratetype = get_pfnblock_migratetype(page, pfn); ClearPageHWPoisonTakenOff(page); + + /* + * Reset all the luf fields. tlb shootdown has already + * been performed by take_page_off_buddy(). + */ + set_page_private(page, 0); + __free_one_page(page, pfn, zone, 0, migratetype, FPI_NONE, 0); if (TestClearPageHWPoison(page)) { ret = true; diff --git a/mm/page_reporting.c b/mm/page_reporting.c index e152b22fbba8a..b23d3ed34ec07 100644 --- a/mm/page_reporting.c +++ b/mm/page_reporting.c @@ -118,7 +118,8 @@ page_reporting_drain(struct page_reporting_dev_info *prdev, /* * Ensure private is zero before putting into the - * allocator. + * allocator. tlb shootdown has already been performed + * at isolation. */ set_page_private(page, 0); @@ -194,7 +195,7 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, if (PageReported(page)) continue; - if (unlikely(consider_pend && !luf_takeoff_check(page))) { + if (unlikely(consider_pend && !luf_takeoff_check(zone, page))) { VM_WARN_ON(1); continue; } @@ -238,7 +239,7 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, /* * Check and flush before using the pages taken off. */ - luf_takeoff_end(); + luf_takeoff_end(zone); /* begin processing pages in local list */ err = prdev->report(prdev, sgl, PAGE_REPORTING_CAPACITY); @@ -283,7 +284,7 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, /* * Check and flush before using the pages taken off. */ - luf_takeoff_end(); + luf_takeoff_end(zone); return err; } diff --git a/mm/rmap.c b/mm/rmap.c index 58dfc9889b1ee..b6613b48669ac 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -650,7 +650,11 @@ static unsigned long new_luf_ugen(void) { unsigned long ugen = atomic_long_inc_return(&luf_ugen); - if (!ugen) + /* + * Avoid zero even in unsigned short range so as to treat + * '(unsigned short)ugen == 0' as invalid. 
+ */ + if (!(unsigned short)ugen) ugen = atomic_long_inc_return(&luf_ugen); return ugen;

From patchwork Wed Feb 26 12:03:33 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13992212
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on v6.14-rc4 22/25] mm/page_alloc: not allow to tlb shootdown if !preemptable() && non_luf_pages_ok()
Date: Wed, 26 Feb 2025 21:03:33 +0900
Message-Id: <20250226120336.29565-22-byungchul@sk.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20250226120336.29565-1-byungchul@sk.com>
References: <20250226113024.GA1935@system.software.com>
 <20250226120336.29565-1-byungchul@sk.com>

Do not perform tlb shootdown if the context has preemption disabled and
there are already enough non-luf pages, so as not to hurt preemptibility.
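
Restated as a standalone predicate, the policy reads as below. This is
an illustrative sketch, not the kernel function: the parameter names
stand in for preemptible(), in_task(), irqs_disabled() and
non_luf_pages_ok(), and the pcp case with no zone involved is folded
into the branch without plenty of non-luf pages.

	/*
	 * Illustrative restatement of the policy above; not the kernel
	 * function. The parameter names are placeholders for the real
	 * predicates.
	 */
	#include <stdbool.h>
	#include <stdio.h>

	static bool may_shootdown(bool irqs_enabled, bool preemptible_ctx,
				  bool in_task_ctx, bool plenty_of_non_luf_pages)
	{
		if (plenty_of_non_luf_pages)
			/*
			 * No memory pressure: only allow the flush where
			 * waiting for IPI completion cannot hurt
			 * preemptibility.
			 */
			return preemptible_ctx && in_task_ctx;

		/*
		 * Under pressure, or with no zone involved: fall back to the
		 * weaker pre-existing rule so allocation can still make
		 * progress.
		 */
		return irqs_enabled && in_task_ctx;
	}

	int main(void)
	{
		/* Preemption disabled but plenty of non-luf pages: skip the flush. */
		printf("%d\n", may_shootdown(true, false, true, true));	/* prints 0 */

		/* Same context under memory pressure: the flush is still allowed. */
		printf("%d\n", may_shootdown(true, false, true, false));	/* prints 1 */
		return 0;
	}

The case newly refused here is preemption disabled with interrupts still
enabled while non_luf_pages_ok() holds; with interrupts disabled the
shootdown was already refused before this change.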
Signed-off-by: Byungchul Park --- mm/compaction.c | 6 +++--- mm/internal.h | 5 +++-- mm/page_alloc.c | 27 +++++++++++++++------------ mm/page_isolation.c | 2 +- mm/page_reporting.c | 4 ++-- 5 files changed, 24 insertions(+), 20 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index b7a7a6feb9eac..aab400ec6a734 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -606,7 +606,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, page = pfn_to_page(blockpfn); - luf_takeoff_start(); + luf_takeoff_start(cc->zone); /* Isolate free pages. */ for (; blockpfn < end_pfn; blockpfn += stride, page += stride) { int isolated; @@ -1603,7 +1603,7 @@ static void fast_isolate_freepages(struct compact_control *cc) if (!area->nr_free) continue; - can_shootdown = luf_takeoff_start(); + can_shootdown = luf_takeoff_start(cc->zone); spin_lock_irqsave(&cc->zone->lock, flags); freelist = &area->free_list[MIGRATE_MOVABLE]; retry: @@ -2416,7 +2416,7 @@ static enum compact_result compact_finished(struct compact_control *cc) * luf_takeoff_{start,end}() is required to identify whether * this compaction context is tlb shootdownable for luf'd pages. */ - luf_takeoff_start(); + luf_takeoff_start(cc->zone); ret = __compact_finished(cc); luf_takeoff_end(cc->zone); diff --git a/mm/internal.h b/mm/internal.h index 6d7b3b389810e..b5f1928732498 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1591,7 +1591,7 @@ static inline void accept_page(struct page *page) #endif /* CONFIG_UNACCEPTED_MEMORY */ #if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) extern struct luf_batch luf_batch[]; -bool luf_takeoff_start(void); +bool luf_takeoff_start(struct zone *zone); void luf_takeoff_end(struct zone *zone); bool luf_takeoff_no_shootdown(void); bool luf_takeoff_check(struct zone *zone, struct page *page); @@ -1605,6 +1605,7 @@ static inline bool non_luf_pages_ok(struct zone *zone) return nr_free - nr_luf_pages > min_wm; } + unsigned short fold_unmap_luf(void); /* @@ -1691,7 +1692,7 @@ static inline bool can_luf_vma(struct vm_area_struct *vma) return true; } #else /* CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH */ -static inline bool luf_takeoff_start(void) { return false; } +static inline bool luf_takeoff_start(struct zone *zone) { return false; } static inline void luf_takeoff_end(struct zone *zone) {} static inline bool luf_takeoff_no_shootdown(void) { return true; } static inline bool luf_takeoff_check(struct zone *zone, struct page *page) { return true; } diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 917a257ea5706..2a2103df2d88e 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -623,22 +623,25 @@ compaction_capture(struct capture_control *capc, struct page *page, #endif /* CONFIG_COMPACTION */ #if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) -static bool no_shootdown_context(void) +static bool no_shootdown_context(struct zone *zone) { /* - * If it performs with irq disabled, that might cause a deadlock. - * Avoid tlb shootdown in this case. + * Tries to avoid tlb shootdown if !preemptible(). However, it + * should be allowed under heavy memory pressure. */ + if (zone && non_luf_pages_ok(zone)) + return !(preemptible() && in_task()); + return !(!irqs_disabled() && in_task()); } /* * Can be called with zone lock released and irq enabled. 
*/ -bool luf_takeoff_start(void) +bool luf_takeoff_start(struct zone *zone) { unsigned long flags; - bool no_shootdown = no_shootdown_context(); + bool no_shootdown = no_shootdown_context(zone); local_irq_save(flags); @@ -2591,7 +2594,7 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, * luf_takeoff_{start,end}() is required for * get_page_from_free_area() to use luf_takeoff_check(). */ - luf_takeoff_start(); + luf_takeoff_start(zone); spin_lock_irqsave(&zone->lock, flags); for (order = 0; order < NR_PAGE_ORDERS; order++) { struct free_area *area = &(zone->free_area[order]); @@ -2796,7 +2799,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, unsigned long flags; int i; - luf_takeoff_start(); + luf_takeoff_start(zone); spin_lock_irqsave(&zone->lock, flags); for (i = 0; i < count; ++i) { struct page *page = __rmqueue(zone, order, migratetype, @@ -3422,7 +3425,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone, do { page = NULL; - luf_takeoff_start(); + luf_takeoff_start(zone); spin_lock_irqsave(&zone->lock, flags); if (alloc_flags & ALLOC_HIGHATOMIC) page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC); @@ -3567,7 +3570,7 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone, struct page *page; unsigned long __maybe_unused UP_flags; - luf_takeoff_start(); + luf_takeoff_start(NULL); /* spin_trylock may fail due to a parallel drain or IRQ reentrancy. */ pcp_trylock_prepare(UP_flags); pcp = pcp_spin_trylock(zone->per_cpu_pageset); @@ -5190,7 +5193,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid, if (unlikely(!zone)) goto failed; - luf_takeoff_start(); + luf_takeoff_start(NULL); /* spin_trylock may fail due to a parallel drain or IRQ reentrancy. 
*/ pcp_trylock_prepare(UP_flags); pcp = pcp_spin_trylock(zone->per_cpu_pageset); @@ -7294,7 +7297,7 @@ unsigned long __offline_isolated_pages(unsigned long start_pfn, offline_mem_sections(pfn, end_pfn); zone = page_zone(pfn_to_page(pfn)); - luf_takeoff_start(); + luf_takeoff_start(zone); spin_lock_irqsave(&zone->lock, flags); while (pfn < end_pfn) { page = pfn_to_page(pfn); @@ -7412,7 +7415,7 @@ bool take_page_off_buddy(struct page *page) unsigned int order; bool ret = false; - luf_takeoff_start(); + luf_takeoff_start(zone); spin_lock_irqsave(&zone->lock, flags); for (order = 0; order < NR_PAGE_ORDERS; order++) { struct page *page_head = page - (pfn & ((1 << order) - 1)); diff --git a/mm/page_isolation.c b/mm/page_isolation.c index c34659b58ca6c..f4055c0a2ea89 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -211,7 +211,7 @@ static void unset_migratetype_isolate(struct page *page, int migratetype) struct page *buddy; zone = page_zone(page); - luf_takeoff_start(); + luf_takeoff_start(zone); spin_lock_irqsave(&zone->lock, flags); if (!is_migrate_isolate_page(page)) goto out; diff --git a/mm/page_reporting.c b/mm/page_reporting.c index b23d3ed34ec07..83b66e7f0d257 100644 --- a/mm/page_reporting.c +++ b/mm/page_reporting.c @@ -170,7 +170,7 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, if (free_area_empty(area, mt)) return err; - can_shootdown = luf_takeoff_start(); + can_shootdown = luf_takeoff_start(zone); spin_lock_irq(&zone->lock); /* @@ -250,7 +250,7 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, /* update budget to reflect call to report function */ budget--; - luf_takeoff_start(); + luf_takeoff_start(zone); /* reacquire zone lock and resume processing */ spin_lock_irq(&zone->lock); From patchwork Wed Feb 26 12:03:34 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Byungchul Park X-Patchwork-Id: 13992215 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id EDAABC021B8 for ; Wed, 26 Feb 2025 12:04:46 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A822F280043; Wed, 26 Feb 2025 07:03:58 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id A3572280042; Wed, 26 Feb 2025 07:03:58 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5F6A5280043; Wed, 26 Feb 2025 07:03:58 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 34673280042 for ; Wed, 26 Feb 2025 07:03:58 -0500 (EST) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id F044D1C9C9B for ; Wed, 26 Feb 2025 12:03:57 +0000 (UTC) X-FDA: 83161962114.17.4C6D055 Received: from invmail4.hynix.com (exvmail4.skhynix.com [166.125.252.92]) by imf12.hostedemail.com (Postfix) with ESMTP id E418C40015 for ; Wed, 26 Feb 2025 12:03:55 +0000 (UTC) Authentication-Results: imf12.hostedemail.com; dkim=none; spf=pass (imf12.hostedemail.com: domain of byungchul@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=byungchul@sk.com; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; 
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on v6.14-rc4 23/25] mm/migrate: apply luf mechanism to unmapping during migration
Date: Wed, 26 Feb 2025 21:03:34 +0900
Message-Id: <20250226120336.29565-23-byungchul@sk.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20250226120336.29565-1-byungchul@sk.com>
References: <20250226113024.GA1935@system.software.com> <20250226120336.29565-1-byungchul@sk.com>
1740571435-647598 X-HE-Meta: U2FsdGVkX18tTxkLeVC5JrJXPhDmtPRDr6PFRnvatBm5dhBYFnHJd3CFsMQJx92s/s06i7Am9E5/QvrYQUYVbgFdj65NgVrZUWKnyJ3nX/SvvO3Wrg1H+w9WrzOGUMClMOGtKkqVU12YsXhniHEi800orp32he/8YwsT7lrLpC33bwrO9spqUHHXu9zwSqdNIj8CNC7/9AaycoratrlmTYMi9jwr94JOScjVewHYpJq+xFW2KgRz7lS+sEqb3huzT+tWKKydQdA7Cnem8ZjTPD4vV+qAlq+18fMs1w+rCP60QG8uSWLMZvuooy2PwGamFlnMpDIBIAwLuW8/8EhR34oJ5YHJQhNNA2rm13+XbdjpqDDlMPZHVBmFFhj2g2cIpKDIKAFxew4HhlthYJml7EC7fQ11cIQ07rWM72GUKARMoztmaPC4DyebTbDsl/7JYwF6S+LxigJbbRkd1XuefYMLm3u35cEy3MLT8mcjdNnPSCYWyfN49gkpxKyhy7NAvh/wuOi6Txff17/hTRwbDWOOv4Ziu+8f+iqv+VcBld9udKpHyyEP67yF46b29s6Vm3tp8UGe5vs72ei0gUU3T7NorCJGcqmOMoZGUoDZayGy6U6Cx1CUuRLFX/TMN+5TToH+YB1LaJGdUcL8R6tkXPGnLp7ZC0xtbbwJ5BmYLesMO5Ml8W6s6+hneX7Rw1cUONRKA6LlUUu8KBuowxPTFK9sm0yIJPZSSvPLql8FC5wKUU29EoD3gumz+OaWwFCkl+y0rIMXxfKc/Ue3T5+dm2H3cY+8V1vq52z9lHjuGovzGFURKxk1yi/qgEMj1zDLLQE3ew8WBb3E+iqm8oBUgxmiwa51Gq7Yq4uAaCLh0k4DVY6t0ksfAtvnHh9cTUMxwCZzSm5VX3Yp/wX9/8BGxB76Wd7BE/d5SX7J8I2pL8bzqcYfFK794gTThN17LaN57d6viXDT9G1ytts0r6v PYw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: A new mechanism, LUF(Lazy Unmap Flush), defers tlb flush until folios that have been unmapped and freed, eventually get allocated again. It's safe for folios that had been mapped read only and were unmapped, since the contents of the folios don't change while staying in pcp or buddy so we can still read the data through the stale tlb entries. Applied the mechanism to unmapping during migration. Signed-off-by: Byungchul Park --- include/linux/mm.h | 2 ++ include/linux/rmap.h | 2 +- mm/migrate.c | 66 ++++++++++++++++++++++++++++++++++---------- mm/rmap.c | 15 ++++++---- mm/swap.c | 2 +- 5 files changed, 64 insertions(+), 23 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index e8e6562abc77d..1577bc8b743fe 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1489,6 +1489,8 @@ static inline void folio_put(struct folio *folio) __folio_put(folio); } +void page_cache_release(struct folio *folio); + /** * folio_put_refs - Reduce the reference count on a folio. * @folio: The folio. diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 683a04088f3f2..cedba4812ccc7 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -660,7 +660,7 @@ static inline int folio_try_share_anon_rmap_pmd(struct folio *folio, int folio_referenced(struct folio *, int is_locked, struct mem_cgroup *memcg, unsigned long *vm_flags); -void try_to_migrate(struct folio *folio, enum ttu_flags flags); +bool try_to_migrate(struct folio *folio, enum ttu_flags flags); void try_to_unmap(struct folio *, enum ttu_flags flags); int make_device_exclusive_range(struct mm_struct *mm, unsigned long start, diff --git a/mm/migrate.c b/mm/migrate.c index fb19a18892c89..7ce4d3dbcb1af 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1164,7 +1164,8 @@ static void migrate_folio_undo_dst(struct folio *dst, bool locked, /* Cleanup src folio upon migration success */ static void migrate_folio_done(struct folio *src, - enum migrate_reason reason) + enum migrate_reason reason, + unsigned short luf_key) { /* * Compaction can migrate also non-LRU pages which are @@ -1175,16 +1176,31 @@ static void migrate_folio_done(struct folio *src, mod_node_page_state(folio_pgdat(src), NR_ISOLATED_ANON + folio_is_file_lru(src), -folio_nr_pages(src)); - if (reason != MR_MEMORY_FAILURE) - /* We release the page in page_handle_poison. 
*/ + /* We release the page in page_handle_poison. */ + if (reason == MR_MEMORY_FAILURE) + luf_flush(luf_key); + else if (!luf_key) folio_put(src); + else { + /* + * Should be the last reference. + */ + if (unlikely(!folio_put_testzero(src))) + VM_WARN_ON(1); + + page_cache_release(src); + folio_unqueue_deferred_split(src); + mem_cgroup_uncharge(src); + free_frozen_pages(&src->page, folio_order(src), luf_key); + } } /* Obtain the lock on page, remove all ptes. */ static int migrate_folio_unmap(new_folio_t get_new_folio, free_folio_t put_new_folio, unsigned long private, struct folio *src, struct folio **dstp, enum migrate_mode mode, - enum migrate_reason reason, struct list_head *ret) + enum migrate_reason reason, struct list_head *ret, + bool *can_luf) { struct folio *dst; int rc = -EAGAIN; @@ -1200,7 +1216,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio, folio_clear_unevictable(src); /* free_pages_prepare() will clear PG_isolated. */ list_del(&src->lru); - migrate_folio_done(src, reason); + migrate_folio_done(src, reason, 0); return MIGRATEPAGE_SUCCESS; } @@ -1317,7 +1333,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio, /* Establish migration ptes */ VM_BUG_ON_FOLIO(folio_test_anon(src) && !folio_test_ksm(src) && !anon_vma, src); - try_to_migrate(src, mode == MIGRATE_ASYNC ? TTU_BATCH_FLUSH : 0); + *can_luf = try_to_migrate(src, mode == MIGRATE_ASYNC ? TTU_BATCH_FLUSH : 0); old_page_state |= PAGE_WAS_MAPPED; } @@ -1345,7 +1361,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio, static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private, struct folio *src, struct folio *dst, enum migrate_mode mode, enum migrate_reason reason, - struct list_head *ret) + struct list_head *ret, unsigned short luf_key) { int rc; int old_page_state = 0; @@ -1399,7 +1415,7 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private, if (anon_vma) put_anon_vma(anon_vma); folio_unlock(src); - migrate_folio_done(src, reason); + migrate_folio_done(src, reason, luf_key); return rc; out: @@ -1694,7 +1710,7 @@ static void migrate_folios_move(struct list_head *src_folios, struct list_head *ret_folios, struct migrate_pages_stats *stats, int *retry, int *thp_retry, int *nr_failed, - int *nr_retry_pages) + int *nr_retry_pages, unsigned short luf_key) { struct folio *folio, *folio2, *dst, *dst2; bool is_thp; @@ -1711,7 +1727,7 @@ static void migrate_folios_move(struct list_head *src_folios, rc = migrate_folio_move(put_new_folio, private, folio, dst, mode, - reason, ret_folios); + reason, ret_folios, luf_key); /* * The rules are: * Success: folio will be freed @@ -1788,7 +1804,11 @@ static int migrate_pages_batch(struct list_head *from, int rc, rc_saved = 0, nr_pages; LIST_HEAD(unmap_folios); LIST_HEAD(dst_folios); + LIST_HEAD(unmap_folios_luf); + LIST_HEAD(dst_folios_luf); bool nosplit = (reason == MR_NUMA_MISPLACED); + unsigned short luf_key; + bool can_luf; VM_WARN_ON_ONCE(mode != MIGRATE_ASYNC && !list_empty(from) && !list_is_singular(from)); @@ -1863,9 +1883,11 @@ static int migrate_pages_batch(struct list_head *from, continue; } + can_luf = false; rc = migrate_folio_unmap(get_new_folio, put_new_folio, private, folio, &dst, mode, reason, - ret_folios); + ret_folios, &can_luf); + /* * The rules are: * Success: folio will be freed @@ -1911,7 +1933,8 @@ static int migrate_pages_batch(struct list_head *from, /* nr_failed isn't updated for not used */ stats->nr_thp_failed += thp_retry; rc_saved = rc; - if (list_empty(&unmap_folios)) + if 
(list_empty(&unmap_folios) && + list_empty(&unmap_folios_luf)) goto out; else goto move; @@ -1925,8 +1948,13 @@ static int migrate_pages_batch(struct list_head *from, stats->nr_thp_succeeded += is_thp; break; case MIGRATEPAGE_UNMAP: - list_move_tail(&folio->lru, &unmap_folios); - list_add_tail(&dst->lru, &dst_folios); + if (can_luf) { + list_move_tail(&folio->lru, &unmap_folios_luf); + list_add_tail(&dst->lru, &dst_folios_luf); + } else { + list_move_tail(&folio->lru, &unmap_folios); + list_add_tail(&dst->lru, &dst_folios); + } break; default: /* @@ -1946,6 +1974,8 @@ static int migrate_pages_batch(struct list_head *from, stats->nr_thp_failed += thp_retry; stats->nr_failed_pages += nr_retry_pages; move: + /* Should be before try_to_unmap_flush() */ + luf_key = fold_unmap_luf(); /* Flush TLBs for all unmapped folios */ try_to_unmap_flush(); @@ -1959,7 +1989,11 @@ static int migrate_pages_batch(struct list_head *from, migrate_folios_move(&unmap_folios, &dst_folios, put_new_folio, private, mode, reason, ret_folios, stats, &retry, &thp_retry, - &nr_failed, &nr_retry_pages); + &nr_failed, &nr_retry_pages, 0); + migrate_folios_move(&unmap_folios_luf, &dst_folios_luf, + put_new_folio, private, mode, reason, + ret_folios, stats, &retry, &thp_retry, + &nr_failed, &nr_retry_pages, luf_key); } nr_failed += retry; stats->nr_thp_failed += thp_retry; @@ -1970,6 +2004,8 @@ static int migrate_pages_batch(struct list_head *from, /* Cleanup remaining folios */ migrate_folios_undo(&unmap_folios, &dst_folios, put_new_folio, private, ret_folios); + migrate_folios_undo(&unmap_folios_luf, &dst_folios_luf, + put_new_folio, private, ret_folios); return rc; } diff --git a/mm/rmap.c b/mm/rmap.c index b6613b48669ac..284fc48aef2de 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -2750,8 +2750,9 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, * * Tries to remove all the page table entries which are mapping this folio and * replace them with special swap entries. Caller must hold the folio lock. + * Return true if all the mappings are read-only, otherwise false. */ -void try_to_migrate(struct folio *folio, enum ttu_flags flags) +bool try_to_migrate(struct folio *folio, enum ttu_flags flags) { struct rmap_walk_control rwc = { .rmap_one = try_to_migrate_one, @@ -2769,11 +2770,11 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags) */ if (WARN_ON_ONCE(flags & ~(TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD | TTU_SYNC | TTU_BATCH_FLUSH))) - return; + return false; if (folio_is_zone_device(folio) && (!folio_is_device_private(folio) && !folio_is_device_coherent(folio))) - return; + return false; /* * During exec, a temporary VMA is setup and later moved. @@ -2793,10 +2794,12 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags) else rmap_walk(folio, &rwc); - if (can_luf_test()) + if (can_luf_test()) { fold_batch(tlb_ubc_luf, tlb_ubc_ro, true); - else - fold_batch(tlb_ubc, tlb_ubc_ro, true); + return true; + } + fold_batch(tlb_ubc, tlb_ubc_ro, true); + return false; } #ifdef CONFIG_DEVICE_PRIVATE diff --git a/mm/swap.c b/mm/swap.c index 0c6198e4a8ee4..e322670c30041 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -84,7 +84,7 @@ static void __page_cache_release(struct folio *folio, struct lruvec **lruvecp, * This path almost never happens for VM activity - pages are normally freed * in batches. But it gets used by networking - and for compound pages. 
*/ -static void page_cache_release(struct folio *folio) +void page_cache_release(struct folio *folio) { struct lruvec *lruvec = NULL; unsigned long flags;
From patchwork Wed Feb 26 12:03:35 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13992216
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on v6.14-rc4 24/25] mm/vmscan: apply luf mechanism to unmapping during folio reclaim
Date: Wed, 26 Feb 2025 21:03:35 +0900
Message-Id: <20250226120336.29565-24-byungchul@sk.com>
In-Reply-To: <20250226120336.29565-1-byungchul@sk.com>
References: <20250226113024.GA1935@system.software.com> <20250226120336.29565-1-byungchul@sk.com>
A new mechanism, LUF (Lazy Unmap Flush), defers the tlb flush until folios that have been unmapped and freed eventually get allocated again. It's safe for folios that had been mapped read-only and were unmapped, since the contents of the folios don't change while staying in pcp or buddy, so we can still read the data through the stale tlb entries.

Apply the mechanism to unmapping during folio reclaim.
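For illustration only, and not part of this patch: the rule the reclaim path relies on is the same one the migration path uses, namely that a folio freed after a read-only unmap carries a pending-flush generation, and that generation has to be flushed before the memory is handed out again. A minimal userspace sketch of that rule, with every name in it (toy_page, luf_gen, toy_lazy_flush, and so on) made up for the sketch rather than taken from the kernel, might look like this:

/*
 * Toy userspace model of the luf deferral rule (illustration only,
 * not kernel code): a freed "page" remembers the unmap generation
 * whose tlb flush is still pending, and the flush is performed
 * lazily right before such a page is handed out again.
 */
#include <stdio.h>

#define NR_TOY_PAGES 4

struct toy_page {
	int id;
	unsigned long pending_gen;	/* 0 means no flush is pending */
};

static unsigned long luf_gen;		/* latest deferred-unmap generation */
static unsigned long flushed_gen;	/* generations up to this are flushed */

/* Unmap a read-only mapping without flushing; just tag the freed page. */
static void toy_free_after_ro_unmap(struct toy_page *p)
{
	p->pending_gen = ++luf_gen;
	printf("freed page %d, flush deferred (gen %lu)\n", p->id, p->pending_gen);
}

/* Stand-in for the batched flush: complete every deferred flush at once. */
static void toy_lazy_flush(void)
{
	if (flushed_gen < luf_gen) {
		flushed_gen = luf_gen;
		printf("tlb flush performed up to gen %lu\n", flushed_gen);
	}
}

/* Reusing a page must not expose stale translations: flush if needed. */
static struct toy_page *toy_alloc_page(struct toy_page *p)
{
	if (p->pending_gen > flushed_gen)
		toy_lazy_flush();
	p->pending_gen = 0;
	return p;
}

int main(void)
{
	struct toy_page pages[NR_TOY_PAGES] = { {0}, {1}, {2}, {3} };

	toy_free_after_ro_unmap(&pages[0]);
	toy_free_after_ro_unmap(&pages[1]);
	toy_alloc_page(&pages[1]);	/* triggers the single deferred flush */
	toy_alloc_page(&pages[0]);	/* already covered, no extra flush */
	return 0;
}

Running the sketch shows one flush covering both freed pages, which is where the batching benefit of deferring the flush comes from.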
Signed-off-by: Byungchul Park --- include/linux/rmap.h | 5 +++-- mm/rmap.c | 11 +++++++---- mm/vmscan.c | 37 ++++++++++++++++++++++++++++++++----- 3 files changed, 42 insertions(+), 11 deletions(-) diff --git a/include/linux/rmap.h b/include/linux/rmap.h index cedba4812ccc7..854b41441d466 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -661,7 +661,7 @@ int folio_referenced(struct folio *, int is_locked, struct mem_cgroup *memcg, unsigned long *vm_flags); bool try_to_migrate(struct folio *folio, enum ttu_flags flags); -void try_to_unmap(struct folio *, enum ttu_flags flags); +bool try_to_unmap(struct folio *, enum ttu_flags flags); int make_device_exclusive_range(struct mm_struct *mm, unsigned long start, unsigned long end, struct page **pages, @@ -794,8 +794,9 @@ static inline int folio_referenced(struct folio *folio, int is_locked, return 0; } -static inline void try_to_unmap(struct folio *folio, enum ttu_flags flags) +static inline bool try_to_unmap(struct folio *folio, enum ttu_flags flags) { + return false; } static inline int folio_mkclean(struct folio *folio) diff --git a/mm/rmap.c b/mm/rmap.c index 284fc48aef2de..df350b4dfddd0 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -2386,10 +2386,11 @@ static int folio_not_mapped(struct folio *folio) * Tries to remove all the page table entries which are mapping this * folio. It is the caller's responsibility to check if the folio is * still mapped if needed (use TTU_SYNC to prevent accounting races). + * Return true if all the mappings are read-only, otherwise false. * * Context: Caller must hold the folio lock. */ -void try_to_unmap(struct folio *folio, enum ttu_flags flags) +bool try_to_unmap(struct folio *folio, enum ttu_flags flags) { struct rmap_walk_control rwc = { .rmap_one = try_to_unmap_one, @@ -2408,10 +2409,12 @@ void try_to_unmap(struct folio *folio, enum ttu_flags flags) else rmap_walk(folio, &rwc); - if (can_luf_test()) + if (can_luf_test()) { fold_batch(tlb_ubc_luf, tlb_ubc_ro, true); - else - fold_batch(tlb_ubc, tlb_ubc_ro, true); + return true; + } + fold_batch(tlb_ubc, tlb_ubc_ro, true); + return false; } /* diff --git a/mm/vmscan.c b/mm/vmscan.c index a31a7cf87315f..065b40f36bbdd 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1092,14 +1092,17 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, struct reclaim_stat *stat, bool ignore_references) { struct folio_batch free_folios; + struct folio_batch free_folios_luf; LIST_HEAD(ret_folios); LIST_HEAD(demote_folios); unsigned int nr_reclaimed = 0, nr_demoted = 0; unsigned int pgactivate = 0; bool do_demote_pass; struct swap_iocb *plug = NULL; + unsigned short luf_key; folio_batch_init(&free_folios); + folio_batch_init(&free_folios_luf); memset(stat, 0, sizeof(*stat)); cond_resched(); do_demote_pass = can_demote(pgdat->node_id, sc); @@ -1111,6 +1114,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, enum folio_references references = FOLIOREF_RECLAIM; bool dirty, writeback; unsigned int nr_pages; + bool can_luf = false; cond_resched(); @@ -1344,7 +1348,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, if (folio_test_large(folio)) flags |= TTU_SYNC; - try_to_unmap(folio, flags); + can_luf = try_to_unmap(folio, flags); if (folio_mapped(folio)) { stat->nr_unmap_fail += nr_pages; if (!was_swapbacked && @@ -1488,6 +1492,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, * leave it off the LRU). 
*/ nr_reclaimed += nr_pages; + if (can_luf) + luf_flush(fold_unmap_luf()); continue; } } @@ -1520,6 +1526,19 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, nr_reclaimed += nr_pages; folio_unqueue_deferred_split(folio); + + if (can_luf) { + if (folio_batch_add(&free_folios_luf, folio) == 0) { + mem_cgroup_uncharge_folios(&free_folios); + mem_cgroup_uncharge_folios(&free_folios_luf); + luf_key = fold_unmap_luf(); + try_to_unmap_flush(); + free_unref_folios(&free_folios, 0); + free_unref_folios(&free_folios_luf, luf_key); + } + continue; + } + if (folio_batch_add(&free_folios, folio) == 0) { mem_cgroup_uncharge_folios(&free_folios); try_to_unmap_flush(); @@ -1554,9 +1573,21 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, list_add(&folio->lru, &ret_folios); VM_BUG_ON_FOLIO(folio_test_lru(folio) || folio_test_unevictable(folio), folio); + if (can_luf) + luf_flush(fold_unmap_luf()); } /* 'folio_list' is always empty here */ + /* + * Finalize this turn before demote_folio_list(). + */ + mem_cgroup_uncharge_folios(&free_folios); + mem_cgroup_uncharge_folios(&free_folios_luf); + luf_key = fold_unmap_luf(); + try_to_unmap_flush(); + free_unref_folios(&free_folios, 0); + free_unref_folios(&free_folios_luf, luf_key); + /* Migrate folios selected for demotion */ nr_demoted = demote_folio_list(&demote_folios, pgdat); nr_reclaimed += nr_demoted; @@ -1590,10 +1621,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, pgactivate = stat->nr_activate[0] + stat->nr_activate[1]; - mem_cgroup_uncharge_folios(&free_folios); - try_to_unmap_flush(); - free_unref_folios(&free_folios, 0); - list_splice(&ret_folios, folio_list); count_vm_events(PGACTIVATE, pgactivate); From patchwork Wed Feb 26 12:03:36 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Byungchul Park X-Patchwork-Id: 13992218 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 35A4DC021BF for ; Wed, 26 Feb 2025 12:05:02 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B6EAA280042; Wed, 26 Feb 2025 07:03:59 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id AA6FC280044; Wed, 26 Feb 2025 07:03:59 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 80FBF280042; Wed, 26 Feb 2025 07:03:59 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 49D93280044 for ; Wed, 26 Feb 2025 07:03:59 -0500 (EST) Received: from smtpin20.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 03AB1C136F for ; Wed, 26 Feb 2025 12:03:58 +0000 (UTC) X-FDA: 83161962198.20.BBD1DF9 Received: from invmail4.hynix.com (exvmail4.skhynix.com [166.125.252.92]) by imf11.hostedemail.com (Postfix) with ESMTP id D4CF34001C for ; Wed, 26 Feb 2025 12:03:56 +0000 (UTC) Authentication-Results: imf11.hostedemail.com; dkim=none; spf=pass (imf11.hostedemail.com: domain of byungchul@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=byungchul@sk.com; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1740571437; 
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on v6.14-rc4 25/25] mm/luf: implement luf debug feature
Date: Wed, 26 Feb 2025 21:03:36 +0900
Message-Id: <20250226120336.29565-25-byungchul@sk.com>
In-Reply-To: <20250226120336.29565-1-byungchul@sk.com>
References: <20250226113024.GA1935@system.software.com> <20250226120336.29565-1-byungchul@sk.com>
We need a luf debug feature to detect when luf goes wrong by any chance. As an RFC, suggest a simple implementation that reports the problematic situations detected by luf.

Signed-off-by: Byungchul Park --- arch/riscv/include/asm/tlbflush.h | 3 + arch/riscv/mm/tlbflush.c | 35 ++++- arch/x86/include/asm/pgtable.h | 10 ++ arch/x86/include/asm/tlbflush.h | 3 + arch/x86/mm/pgtable.c | 10 ++ arch/x86/mm/tlb.c | 35 ++++- include/linux/highmem-internal.h | 5 + include/linux/mm.h | 20 ++- include/linux/mm_types.h | 16 +-- include/linux/mm_types_task.h | 16 +++ include/linux/sched.h | 5 + mm/highmem.c | 1 + mm/memory.c | 12 ++ mm/page_alloc.c | 34 ++++- mm/page_ext.c | 3 + mm/rmap.c | 229 ++++++++++++++++++++++++++++++ 16 files changed, 418 insertions(+), 19 deletions(-) diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h index ec5caeb3cf8ef..9451f3d22f229 100644 --- a/arch/riscv/include/asm/tlbflush.h +++ b/arch/riscv/include/asm/tlbflush.h @@ -69,6 +69,9 @@ bool arch_tlbbatch_check_done(struct arch_tlbflush_unmap_batch *batch, unsigned bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen); void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen); void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen); +#ifdef CONFIG_LUF_DEBUG +extern void print_lufd_arch(void); +#endif static inline void arch_tlbbatch_clear(struct arch_tlbflush_unmap_batch *batch) { diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c index 93afb7a299003..de91bfe0426c2 100644 --- a/arch/riscv/mm/tlbflush.c +++ b/arch/riscv/mm/tlbflush.c @@ -216,6 +216,25 @@ static int __init luf_init_arch(void) } early_initcall(luf_init_arch); +#ifdef CONFIG_LUF_DEBUG +static DEFINE_SPINLOCK(luf_debug_lock); +#define lufd_lock(f) spin_lock_irqsave(&luf_debug_lock, (f)) +#define lufd_unlock(f) spin_unlock_irqrestore(&luf_debug_lock, (f)) + +void print_lufd_arch(void) +{ + int cpu; + + pr_cont("LUFD ARCH:"); + for_each_cpu(cpu, cpu_possible_mask) + pr_cont(" %lu", atomic_long_read(per_cpu_ptr(&ugen_done, cpu))); + pr_cont("\n"); +} +#else +#define lufd_lock(f) do { (void)(f); } while(0) +#define lufd_unlock(f) do { (void)(f); } while(0) +#endif + /* * batch will not be updated.
*/ @@ -223,17 +242,22 @@ bool arch_tlbbatch_check_done(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen) { int cpu; + unsigned long flags; if (!ugen) goto out; + lufd_lock(flags); for_each_cpu(cpu, &batch->cpumask) { unsigned long done; done = atomic_long_read(per_cpu_ptr(&ugen_done, cpu)); - if (ugen_before(done, ugen)) + if (ugen_before(done, ugen)) { + lufd_unlock(flags); return false; + } } + lufd_unlock(flags); return true; out: return cpumask_empty(&batch->cpumask); @@ -243,10 +267,12 @@ bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen) { int cpu; + unsigned long flags; if (!ugen) goto out; + lufd_lock(flags); for_each_cpu(cpu, &batch->cpumask) { unsigned long done; @@ -254,6 +280,7 @@ bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, if (!ugen_before(done, ugen)) cpumask_clear_cpu(cpu, &batch->cpumask); } + lufd_unlock(flags); out: return cpumask_empty(&batch->cpumask); } @@ -262,10 +289,12 @@ void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen) { int cpu; + unsigned long flags; if (!ugen) return; + lufd_lock(flags); for_each_cpu(cpu, &batch->cpumask) { atomic_long_t *done = per_cpu_ptr(&ugen_done, cpu); unsigned long old = atomic_long_read(done); @@ -283,15 +312,18 @@ void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, */ atomic_long_cmpxchg(done, old, ugen); } + lufd_unlock(flags); } void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen) { int cpu; + unsigned long flags; if (!ugen) return; + lufd_lock(flags); for_each_cpu(cpu, mm_cpumask(mm)) { atomic_long_t *done = per_cpu_ptr(&ugen_done, cpu); unsigned long old = atomic_long_read(done); @@ -309,4 +341,5 @@ void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen) */ atomic_long_cmpxchg(done, old, ugen); } + lufd_unlock(flags); } diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 593f10aabd45a..414bcabb23b51 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -695,12 +695,22 @@ static inline pud_t pud_mkyoung(pud_t pud) return pud_set_flags(pud, _PAGE_ACCESSED); } +#ifdef CONFIG_LUF_DEBUG +pud_t pud_mkwrite(pud_t pud); +static inline pud_t __pud_mkwrite(pud_t pud) +{ + pud = pud_set_flags(pud, _PAGE_RW); + + return pud_clear_saveddirty(pud); +} +#else static inline pud_t pud_mkwrite(pud_t pud) { pud = pud_set_flags(pud, _PAGE_RW); return pud_clear_saveddirty(pud); } +#endif #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY static inline int pte_soft_dirty(pte_t pte) diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index dbcbf0477ed2a..03b3e90186ab1 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -298,6 +298,9 @@ extern bool arch_tlbbatch_check_done(struct arch_tlbflush_unmap_batch *batch, un extern bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen); extern void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen); extern void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen); +#ifdef CONFIG_LUF_DEBUG +extern void print_lufd_arch(void); +#endif static inline void arch_tlbbatch_clear(struct arch_tlbflush_unmap_batch *batch) { diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index 1fef5ad32d5a8..d0b7a1437214c 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -904,6 +904,7 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) { + 
lufd_check_pages(pte_page(pte), 0); if (vma->vm_flags & VM_SHADOW_STACK) return pte_mkwrite_shstk(pte); @@ -914,6 +915,7 @@ pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) { + lufd_check_pages(pmd_page(pmd), PMD_ORDER); if (vma->vm_flags & VM_SHADOW_STACK) return pmd_mkwrite_shstk(pmd); @@ -922,6 +924,14 @@ pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) return pmd_clear_saveddirty(pmd); } +#ifdef CONFIG_LUF_DEBUG +pud_t pud_mkwrite(pud_t pud) +{ + lufd_check_pages(pud_page(pud), PUD_ORDER); + return __pud_mkwrite(pud); +} +#endif + void arch_check_zapped_pte(struct vm_area_struct *vma, pte_t pte) { /* diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index be6068b60c32d..99b3d54aa74d2 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -1283,6 +1283,25 @@ static int __init luf_init_arch(void) } early_initcall(luf_init_arch); +#ifdef CONFIG_LUF_DEBUG +static DEFINE_SPINLOCK(luf_debug_lock); +#define lufd_lock(f) spin_lock_irqsave(&luf_debug_lock, (f)) +#define lufd_unlock(f) spin_unlock_irqrestore(&luf_debug_lock, (f)) + +void print_lufd_arch(void) +{ + int cpu; + + pr_cont("LUFD ARCH:"); + for_each_cpu(cpu, cpu_possible_mask) + pr_cont(" %lu", atomic_long_read(per_cpu_ptr(&ugen_done, cpu))); + pr_cont("\n"); +} +#else +#define lufd_lock(f) do { (void)(f); } while(0) +#define lufd_unlock(f) do { (void)(f); } while(0) +#endif + /* * batch will not be updated. */ @@ -1290,17 +1309,22 @@ bool arch_tlbbatch_check_done(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen) { int cpu; + unsigned long flags; if (!ugen) goto out; + lufd_lock(flags); for_each_cpu(cpu, &batch->cpumask) { unsigned long done; done = atomic_long_read(per_cpu_ptr(&ugen_done, cpu)); - if (ugen_before(done, ugen)) + if (ugen_before(done, ugen)) { + lufd_unlock(flags); return false; + } } + lufd_unlock(flags); return true; out: return cpumask_empty(&batch->cpumask); @@ -1310,10 +1334,12 @@ bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen) { int cpu; + unsigned long flags; if (!ugen) goto out; + lufd_lock(flags); for_each_cpu(cpu, &batch->cpumask) { unsigned long done; @@ -1321,6 +1347,7 @@ bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, if (!ugen_before(done, ugen)) cpumask_clear_cpu(cpu, &batch->cpumask); } + lufd_unlock(flags); out: return cpumask_empty(&batch->cpumask); } @@ -1329,10 +1356,12 @@ void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen) { int cpu; + unsigned long flags; if (!ugen) return; + lufd_lock(flags); for_each_cpu(cpu, &batch->cpumask) { atomic_long_t *done = per_cpu_ptr(&ugen_done, cpu); unsigned long old = atomic_long_read(done); @@ -1350,15 +1379,18 @@ void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, */ atomic_long_cmpxchg(done, old, ugen); } + lufd_unlock(flags); } void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen) { int cpu; + unsigned long flags; if (!ugen) return; + lufd_lock(flags); for_each_cpu(cpu, mm_cpumask(mm)) { atomic_long_t *done = per_cpu_ptr(&ugen_done, cpu); unsigned long old = atomic_long_read(done); @@ -1376,6 +1408,7 @@ void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen) */ atomic_long_cmpxchg(done, old, ugen); } + lufd_unlock(flags); } void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch) diff --git a/include/linux/highmem-internal.h b/include/linux/highmem-internal.h index dd100e849f5e0..0792530d1be7b 100644 --- a/include/linux/highmem-internal.h 
+++ b/include/linux/highmem-internal.h @@ -41,6 +41,7 @@ static inline void *kmap(struct page *page) { void *addr; + lufd_check_pages(page, 0); might_sleep(); if (!PageHighMem(page)) addr = page_address(page); @@ -161,6 +162,7 @@ static inline struct page *kmap_to_page(void *addr) static inline void *kmap(struct page *page) { + lufd_check_pages(page, 0); might_sleep(); return page_address(page); } @@ -177,11 +179,13 @@ static inline void kunmap(struct page *page) static inline void *kmap_local_page(struct page *page) { + lufd_check_pages(page, 0); return page_address(page); } static inline void *kmap_local_folio(struct folio *folio, size_t offset) { + lufd_check_folio(folio); return page_address(&folio->page) + offset; } @@ -204,6 +208,7 @@ static inline void __kunmap_local(const void *addr) static inline void *kmap_atomic(struct page *page) { + lufd_check_pages(page, 0); if (IS_ENABLED(CONFIG_PREEMPT_RT)) migrate_disable(); else diff --git a/include/linux/mm.h b/include/linux/mm.h index 1577bc8b743fe..5e577d5fba130 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -45,6 +45,24 @@ extern int sysctl_page_lock_unfairness; void mm_core_init(void); void init_mm_internals(void); +#ifdef CONFIG_LUF_DEBUG +void lufd_check_folio(struct folio *f); +void lufd_check_pages(const struct page *p, unsigned int order); +void lufd_check_zone_pages(struct zone *zone, struct page *page, unsigned int order); +void lufd_check_queued_pages(void); +void lufd_queue_page_for_check(struct page *page, int order); +void lufd_mark_folio(struct folio *f, unsigned short luf_key); +void lufd_mark_pages(struct page *p, unsigned int order, unsigned short luf_key); +#else +static inline void lufd_check_folio(struct folio *f) {} +static inline void lufd_check_pages(const struct page *p, unsigned int order) {} +static inline void lufd_check_zone_pages(struct zone *zone, struct page *page, unsigned int order) {} +static inline void lufd_check_queued_pages(void) {} +static inline void lufd_queue_page_for_check(struct page *page, int order) {} +static inline void lufd_mark_folio(struct folio *f, unsigned short luf_key) {} +static inline void lufd_mark_pages(struct page *p, unsigned int order, unsigned short luf_key) {} +#endif + #ifndef CONFIG_NUMA /* Don't use mapnrs, do it properly */ extern unsigned long max_mapnr; @@ -114,7 +132,7 @@ extern int mmap_rnd_compat_bits __read_mostly; #endif #ifndef page_to_virt -#define page_to_virt(x) __va(PFN_PHYS(page_to_pfn(x))) +#define page_to_virt(x) ({ lufd_check_pages(x, 0); __va(PFN_PHYS(page_to_pfn(x)));}) #endif #ifndef lm_alias diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index c5f44b5c9758f..0cd83c1c231b9 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -22,6 +22,10 @@ #include +#ifdef CONFIG_LUF_DEBUG +extern struct page_ext_operations luf_debug_ops; +#endif + #ifndef AT_VECTOR_SIZE_ARCH #define AT_VECTOR_SIZE_ARCH 0 #endif @@ -32,18 +36,6 @@ struct address_space; struct mem_cgroup; -#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH -struct luf_batch { - struct tlbflush_unmap_batch batch; - unsigned long ugen; - rwlock_t lock; -}; -void luf_batch_init(struct luf_batch *lb); -#else -struct luf_batch {}; -static inline void luf_batch_init(struct luf_batch *lb) {} -#endif - /* * Each physical page in the system has a struct page associated with * it to keep track of whatever it is we are using the page for at the diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h index a82aa80c0ba46..3b87f8674e528 100644 --- 
a/include/linux/mm_types_task.h +++ b/include/linux/mm_types_task.h @@ -10,6 +10,7 @@ #include #include +#include #include @@ -88,4 +89,19 @@ struct tlbflush_unmap_batch { #endif }; +#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH +struct luf_batch { + struct tlbflush_unmap_batch batch; + unsigned long ugen; + rwlock_t lock; +}; +void luf_batch_init(struct luf_batch *lb); +#else +struct luf_batch {}; +static inline void luf_batch_init(struct luf_batch *lb) {} +#endif + +#if defined(CONFIG_LUF_DEBUG) +#define NR_LUFD_PAGES 512 +#endif #endif /* _LINUX_MM_TYPES_TASK_H */ diff --git a/include/linux/sched.h b/include/linux/sched.h index 96375274d0335..9cb8e6fa1b1b4 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1406,6 +1406,11 @@ struct task_struct { unsigned long luf_ugen; unsigned long zone_ugen; unsigned long wait_zone_ugen; +#if defined(CONFIG_LUF_DEBUG) + struct page *lufd_pages[NR_LUFD_PAGES]; + int lufd_pages_order[NR_LUFD_PAGES]; + int lufd_pages_nr; +#endif #endif struct tlbflush_unmap_batch tlb_ubc; diff --git a/mm/highmem.c b/mm/highmem.c index ef3189b36cadb..a323d5a655bf9 100644 --- a/mm/highmem.c +++ b/mm/highmem.c @@ -576,6 +576,7 @@ void *__kmap_local_page_prot(struct page *page, pgprot_t prot) { void *kmap; + lufd_check_pages(page, 0); /* * To broaden the usage of the actual kmap_local() machinery always map * pages when debugging is enabled and the architecture has no problems diff --git a/mm/memory.c b/mm/memory.c index 6cdc1df0424f3..e7a0a89d7027e 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -6224,6 +6224,18 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, mapping = vma->vm_file->f_mapping; } +#ifdef CONFIG_LUF_DEBUG + if (luf_flush) { + /* + * If it has a VM_SHARED mapping, all the mms involved + * in the struct address_space should be luf_flush'ed. + */ + if (mapping) + luf_flush_mapping(mapping); + luf_flush_mm(mm); + } +#endif + if (unlikely(is_vm_hugetlb_page(vma))) ret = hugetlb_fault(vma->vm_mm, vma, address, flags); else diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 2a2103df2d88e..9258d7c4eaf42 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -758,6 +758,8 @@ void luf_takeoff_end(struct zone *zone) VM_WARN_ON(current->zone_ugen); VM_WARN_ON(current->wait_zone_ugen); } + + lufd_check_queued_pages(); } /* @@ -853,8 +855,10 @@ bool luf_takeoff_check_and_fold(struct zone *zone, struct page *page) struct luf_batch *lb; unsigned long lb_ugen; - if (!luf_key) + if (!luf_key) { + lufd_check_pages(page, buddy_order(page)); return true; + } lb = &luf_batch[luf_key]; read_lock_irqsave(&lb->lock, flags); @@ -875,12 +879,15 @@ bool luf_takeoff_check_and_fold(struct zone *zone, struct page *page) if (!current->luf_ugen || ugen_before(current->luf_ugen, lb_ugen)) current->luf_ugen = lb_ugen; + lufd_queue_page_for_check(page, buddy_order(page)); return true; } zone_ugen = page_zone_ugen(zone, page); - if (!zone_ugen) + if (!zone_ugen) { + lufd_check_pages(page, buddy_order(page)); return true; + } /* * Should not be zero since zone-zone_ugen has been updated in @@ -888,17 +895,23 @@ bool luf_takeoff_check_and_fold(struct zone *zone, struct page *page) */ VM_WARN_ON(!zone->zone_ugen); - if (!ugen_before(READ_ONCE(zone->zone_ugen_done), zone_ugen)) + if (!ugen_before(READ_ONCE(zone->zone_ugen_done), zone_ugen)) { + lufd_check_pages(page, buddy_order(page)); return true; + } if (current->luf_no_shootdown) return false; + lufd_check_zone_pages(zone, page, buddy_order(page)); + /* * zone batched flush has been already set. 
*/ - if (current->zone_ugen) + if (current->zone_ugen) { + lufd_queue_page_for_check(page, buddy_order(page)); return true; + } /* * Others are already performing tlb shootdown for us. All we @@ -933,6 +946,7 @@ bool luf_takeoff_check_and_fold(struct zone *zone, struct page *page) atomic_long_set(&zone->nr_luf_pages, 0); fold_batch(tlb_ubc_takeoff, &zone->zone_batch, true); } + lufd_queue_page_for_check(page, buddy_order(page)); return true; } #endif @@ -1238,6 +1252,11 @@ static inline void __free_one_page(struct page *page, } else zone_ugen = page_zone_ugen(zone, page); + if (!zone_ugen) + lufd_check_pages(page, order); + else + lufd_check_zone_pages(zone, page, order); + while (order < MAX_PAGE_ORDER) { int buddy_mt = migratetype; unsigned long buddy_zone_ugen; @@ -1299,6 +1318,10 @@ static inline void __free_one_page(struct page *page, set_page_zone_ugen(page, zone_ugen); pfn = combined_pfn; order++; + if (!zone_ugen) + lufd_check_pages(page, order); + else + lufd_check_zone_pages(zone, page, order); } done_merging: @@ -3168,6 +3191,8 @@ void free_frozen_pages(struct page *page, unsigned int order, unsigned long pfn = page_to_pfn(page); int migratetype; + lufd_mark_pages(page, order, luf_key); + if (!pcp_allowed_order(order)) { __free_pages_ok(page, order, FPI_NONE, luf_key); return; @@ -3220,6 +3245,7 @@ void free_unref_folios(struct folio_batch *folios, unsigned short luf_key) unsigned long pfn = folio_pfn(folio); unsigned int order = folio_order(folio); + lufd_mark_folio(folio, luf_key); if (!free_pages_prepare(&folio->page, order)) continue; /* diff --git a/mm/page_ext.c b/mm/page_ext.c index 641d93f6af4c1..be40bc2a93378 100644 --- a/mm/page_ext.c +++ b/mm/page_ext.c @@ -89,6 +89,9 @@ static struct page_ext_operations *page_ext_ops[] __initdata = { #ifdef CONFIG_PAGE_TABLE_CHECK &page_table_check_ops, #endif +#ifdef CONFIG_LUF_DEBUG + &luf_debug_ops, +#endif }; unsigned long page_ext_size; diff --git a/mm/rmap.c b/mm/rmap.c index df350b4dfddd0..6a6188d47031b 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1161,6 +1161,235 @@ static bool should_defer_flush(struct mm_struct *mm, enum ttu_flags flags) } #endif /* CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH */ +#ifdef CONFIG_LUF_DEBUG + +static bool need_luf_debug(void) +{ + return true; +} + +static void init_luf_debug(void) +{ + /* Do nothing */ +} + +struct page_ext_operations luf_debug_ops = { + .size = sizeof(struct luf_batch), + .need = need_luf_debug, + .init = init_luf_debug, + .need_shared_flags = false, +}; + +static bool __lufd_check_zone_pages(struct page *page, int nr, + struct tlbflush_unmap_batch *batch, unsigned long ugen) +{ + int i; + + for (i = 0; i < nr; i++) { + struct page_ext *page_ext; + struct luf_batch *lb; + unsigned long lb_ugen; + unsigned long flags; + bool ret; + + page_ext = page_ext_get(page + i); + if (!page_ext) + continue; + + lb = (struct luf_batch *)page_ext_data(page_ext, &luf_debug_ops); + write_lock_irqsave(&lb->lock, flags); + lb_ugen = lb->ugen; + ret = arch_tlbbatch_done(&lb->batch.arch, &batch->arch); + write_unlock_irqrestore(&lb->lock, flags); + page_ext_put(page_ext); + + if (!ret || ugen_before(ugen, lb_ugen)) + return false; + } + return true; +} + +void lufd_check_zone_pages(struct zone *zone, struct page *page, unsigned int order) +{ + bool warn; + static bool once = false; + + if (!page || !zone) + return; + + warn = !__lufd_check_zone_pages(page, 1 << order, + &zone->zone_batch, zone->luf_ugen); + + if (warn && !READ_ONCE(once)) { + WRITE_ONCE(once, true); + VM_WARN(1, "LUFD: ugen(%lu) 
page(%p) order(%u)\n", + atomic_long_read(&luf_ugen), page, order); + print_lufd_arch(); + } +} + +static bool __lufd_check_pages(const struct page *page, int nr) +{ + int i; + + for (i = 0; i < nr; i++) { + struct page_ext *page_ext; + struct luf_batch *lb; + unsigned long lb_ugen; + unsigned long flags; + bool ret; + + page_ext = page_ext_get(page + i); + if (!page_ext) + continue; + + lb = (struct luf_batch *)page_ext_data(page_ext, &luf_debug_ops); + write_lock_irqsave(&lb->lock, flags); + lb_ugen = lb->ugen; + ret = arch_tlbbatch_diet(&lb->batch.arch, lb_ugen); + write_unlock_irqrestore(&lb->lock, flags); + page_ext_put(page_ext); + + if (!ret) + return false; + } + return true; +} + +void lufd_queue_page_for_check(struct page *page, int order) +{ + struct page **parray = current->lufd_pages; + int *oarray = current->lufd_pages_order; + + if (!page) + return; + + if (current->lufd_pages_nr >= NR_LUFD_PAGES) { + VM_WARN_ONCE(1, "LUFD: NR_LUFD_PAGES is too small.\n"); + return; + } + + *(parray + current->lufd_pages_nr) = page; + *(oarray + current->lufd_pages_nr) = order; + current->lufd_pages_nr++; +} + +void lufd_check_queued_pages(void) +{ + struct page **parray = current->lufd_pages; + int *oarray = current->lufd_pages_order; + int i; + + for (i = 0; i < current->lufd_pages_nr; i++) + lufd_check_pages(*(parray + i), *(oarray + i)); + current->lufd_pages_nr = 0; +} + +void lufd_check_folio(struct folio *folio) +{ + struct page *page; + int nr; + bool warn; + static bool once = false; + + if (!folio) + return; + + page = folio_page(folio, 0); + nr = folio_nr_pages(folio); + + warn = !__lufd_check_pages(page, nr); + + if (warn && !READ_ONCE(once)) { + WRITE_ONCE(once, true); + VM_WARN(1, "LUFD: ugen(%lu) page(%p) nr(%d)\n", + atomic_long_read(&luf_ugen), page, nr); + print_lufd_arch(); + } +} +EXPORT_SYMBOL(lufd_check_folio); + +void lufd_check_pages(const struct page *page, unsigned int order) +{ + bool warn; + static bool once = false; + + if (!page) + return; + + warn = !__lufd_check_pages(page, 1 << order); + + if (warn && !READ_ONCE(once)) { + WRITE_ONCE(once, true); + VM_WARN(1, "LUFD: ugen(%lu) page(%p) order(%u)\n", + atomic_long_read(&luf_ugen), page, order); + print_lufd_arch(); + } +} +EXPORT_SYMBOL(lufd_check_pages); + +static void __lufd_mark_pages(struct page *page, int nr, unsigned short luf_key) +{ + int i; + + for (i = 0; i < nr; i++) { + struct page_ext *page_ext; + struct luf_batch *lb; + + page_ext = page_ext_get(page + i); + if (!page_ext) + continue; + + lb = (struct luf_batch *)page_ext_data(page_ext, &luf_debug_ops); + fold_luf_batch(lb, &luf_batch[luf_key]); + page_ext_put(page_ext); + } +} + +void lufd_mark_folio(struct folio *folio, unsigned short luf_key) +{ + struct page *page; + int nr; + bool warn; + static bool once = false; + + if (!luf_key) + return; + + page = folio_page(folio, 0); + nr = folio_nr_pages(folio); + + warn = !__lufd_check_pages(page, nr); + __lufd_mark_pages(page, nr, luf_key); + + if (warn && !READ_ONCE(once)) { + WRITE_ONCE(once, true); + VM_WARN(1, "LUFD: ugen(%lu) page(%p) nr(%d)\n", + atomic_long_read(&luf_ugen), page, nr); + print_lufd_arch(); + } +} + +void lufd_mark_pages(struct page *page, unsigned int order, unsigned short luf_key) +{ + bool warn; + static bool once = false; + + if (!luf_key) + return; + + warn = !__lufd_check_pages(page, 1 << order); + __lufd_mark_pages(page, 1 << order, luf_key); + + if (warn && !READ_ONCE(once)) { + WRITE_ONCE(once, true); + VM_WARN(1, "LUFD: ugen(%lu) page(%p) order(%u)\n", + 
atomic_long_read(&luf_ugen), page, order); + print_lufd_arch(); + } +} +#endif + /** * page_address_in_vma - The virtual address of a page in this VMA. * @folio: The folio containing the page.