From patchwork Wed Feb 26 12:01:08 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13992170
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com,
    mgorman@techsingularity.net, hughd@google.com, willy@infradead.org,
    david@redhat.com, peterz@infradead.org, luto@kernel.org,
    tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 01/25] x86/tlb: add APIs manipulating tlb batch's arch data
Date: Wed, 26 Feb 2025 21:01:08 +0900
Message-Id: <20250226120132.28469-1-byungchul@sk.com>
In-Reply-To: <20250226113342.GB1935@system.software.com>
References: <20250226113342.GB1935@system.software.com>

A new mechanism, LUF (Lazy Unmap Flush), defers tlb flush until folios
that have been unmapped and freed eventually get allocated again.  It's
safe for folios that had been mapped read-only and were unmapped, since
the contents of the folios wouldn't change while staying in pcp or
buddy, so we can still read the data through the stale tlb entries.
This is a preparation for the mechanism, which needs to recognize
read-only tlb entries by separating the tlb batch's arch data into two
parts, one for read-only entries and the other for writable ones, and
merging the two when needed.  It also optimizes tlb shootdown by
skipping CPUs that have already performed the tlb flush needed.  To
support this, add APIs manipulating the arch data for x86.

Signed-off-by: Byungchul Park
---
 arch/x86/include/asm/tlbflush.h | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

base-commit: f7ed46277aaa8f848f18959ff68469f5186ba87c

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 29373da7b00a6..52c54ca68ca9e 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -293,6 +294,29 @@ static inline void arch_flush_tlb_batched_pending(struct mm_struct *mm)

 extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);

+static inline void arch_tlbbatch_clear(struct arch_tlbflush_unmap_batch *batch)
+{
+	cpumask_clear(&batch->cpumask);
+}
+
+static inline void arch_tlbbatch_fold(struct arch_tlbflush_unmap_batch *bdst,
+				      struct arch_tlbflush_unmap_batch *bsrc)
+{
+	cpumask_or(&bdst->cpumask, &bdst->cpumask, &bsrc->cpumask);
+}
+
+static inline bool arch_tlbbatch_need_fold(struct arch_tlbflush_unmap_batch *batch,
+					   struct mm_struct *mm)
+{
+	return !cpumask_subset(mm_cpumask(mm), &batch->cpumask);
+}
+
+static inline bool arch_tlbbatch_done(struct arch_tlbflush_unmap_batch *bdst,
+				      struct arch_tlbflush_unmap_batch *bsrc)
+{
+	return !cpumask_andnot(&bdst->cpumask, &bdst->cpumask, &bsrc->cpumask);
+}
+
 static inline bool pte_flags_need_flush(unsigned long oldflags,
 					unsigned long newflags,
 					bool ignore_access)
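The intended composition of these helpers can be sketched as follows.
This is an illustration only, not code from the series: the function
fold_ro_into_rw() and the two batch names are made up, while the
arch_tlbbatch_*() calls are the ones added above.

static void fold_ro_into_rw(struct arch_tlbflush_unmap_batch *rw,
			    struct arch_tlbflush_unmap_batch *ro)
{
	/* Merge the read-only batch's CPUs into the writable batch ... */
	arch_tlbbatch_fold(rw, ro);

	/* ... and reset the read-only batch for reuse. */
	arch_tlbbatch_clear(ro);
}

With arch_tlbbatch_need_fold(batch, mm) reporting whether mm may run on
a CPU not yet covered by the batch, and arch_tlbbatch_done(pending, done)
removing already-flushed CPUs from 'pending' and returning true when
nothing is left, all four operations stay simple cpumask manipulations
on x86.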
From patchwork Wed Feb 26 12:01:09 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13992176
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com,
    mgorman@techsingularity.net, hughd@google.com, willy@infradead.org,
    david@redhat.com, peterz@infradead.org, luto@kernel.org,
    tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 02/25] arm64/tlbflush: add APIs manipulating tlb batch's arch data
Date: Wed, 26 Feb 2025 21:01:09 +0900
Message-Id: <20250226120132.28469-2-byungchul@sk.com>
In-Reply-To: <20250226120132.28469-1-byungchul@sk.com>
References: <20250226113342.GB1935@system.software.com>
 <20250226120132.28469-1-byungchul@sk.com>
A new mechanism, LUF (Lazy Unmap Flush), defers tlb flush until folios
that have been unmapped and freed eventually get allocated again.  It's
safe for folios that had been mapped read-only and were unmapped, since
the contents of the folios don't change while staying in pcp or buddy,
so we can still read the data through the stale tlb entries.

This is a preparation for the mechanism, which requires manipulating
the tlb batch's arch data.  Even though arm64 does nothing for these
tlb operations, any arch that sets
CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH should provide the APIs.

Signed-off-by: Byungchul Park
---
 arch/arm64/include/asm/tlbflush.h | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index b7e1920570bdd..f7036cd33e35c 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -347,6 +347,33 @@ static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 	dsb(ish);
 }

+static inline void arch_tlbbatch_clear(struct arch_tlbflush_unmap_batch *batch)
+{
+	/* nothing to do */
+}
+
+static inline void arch_tlbbatch_fold(struct arch_tlbflush_unmap_batch *bdst,
+				      struct arch_tlbflush_unmap_batch *bsrc)
+{
+	/* nothing to do */
+}
+
+static inline bool arch_tlbbatch_need_fold(struct arch_tlbflush_unmap_batch *batch,
+					   struct mm_struct *mm)
+{
+	/*
+	 * Nothing is needed in this architecture.
+	 */
+	return false;
+}
+
+static inline bool arch_tlbbatch_done(struct arch_tlbflush_unmap_batch *bdst,
+				      struct arch_tlbflush_unmap_batch *bsrc)
+{
+	/* Kernel can consider tlb batch always has been done. */
+	return true;
+}
+
 /*
  * This is meant to avoid soft lock-ups on large TLB flushing ranges and not
  * necessarily a performance improvement.
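To make the contract of the stubs concrete, here is a sketch of the
kind of generic-side check they are meant to serve; this is an assumed
usage pattern, not code from this patch, and the function name
hypothetical_flush_still_needed() is made up.

static bool hypothetical_flush_still_needed(struct arch_tlbflush_unmap_batch *pending,
					    struct arch_tlbflush_unmap_batch *done)
{
	/*
	 * On arm64 arch_tlbbatch_done() always reports "done", so this
	 * test compiles down to a constant and the caller never takes
	 * the flush path on account of the batch.
	 */
	return !arch_tlbbatch_done(pending, done);
}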
From patchwork Wed Feb 26 12:01:10 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13992172
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com,
    mgorman@techsingularity.net, hughd@google.com, willy@infradead.org,
    david@redhat.com, peterz@infradead.org, luto@kernel.org,
    tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 03/25] riscv/tlb: add APIs manipulating tlb batch's arch data
Date: Wed, 26 Feb 2025 21:01:10 +0900
Message-Id: <20250226120132.28469-3-byungchul@sk.com>
In-Reply-To: <20250226120132.28469-1-byungchul@sk.com>
References: <20250226113342.GB1935@system.software.com>
 <20250226120132.28469-1-byungchul@sk.com>

A new mechanism, LUF (Lazy Unmap Flush), defers tlb flush until folios
that have been unmapped and freed eventually get allocated again.  It's
safe for folios that had been mapped read-only and were unmapped, since
the contents of the folios don't change while staying in pcp or buddy,
so we can still read the data through the stale tlb entries.
This is a preparation for the mechanism, which needs to recognize
read-only tlb entries by separating the tlb batch's arch data into two
parts, one for read-only entries and the other for writable ones, and
merging the two when needed.  It also optimizes tlb shootdown by
skipping CPUs that have already performed the tlb flush needed.  To
support this, add APIs manipulating the arch data for riscv.

Signed-off-by: Byungchul Park
---
 arch/riscv/include/asm/tlbflush.h | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
index ce0dd0fed7646..cecd8e7e2a3bd 100644
--- a/arch/riscv/include/asm/tlbflush.h
+++ b/arch/riscv/include/asm/tlbflush.h
@@ -8,6 +8,7 @@
 #define _ASM_RISCV_TLBFLUSH_H

 #include 
+#include 
 #include 
 #include 
@@ -64,6 +65,33 @@ void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
 void arch_flush_tlb_batched_pending(struct mm_struct *mm);
 void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);

+static inline void arch_tlbbatch_clear(struct arch_tlbflush_unmap_batch *batch)
+{
+	cpumask_clear(&batch->cpumask);
+
+}
+
+static inline void arch_tlbbatch_fold(struct arch_tlbflush_unmap_batch *bdst,
+				      struct arch_tlbflush_unmap_batch *bsrc)
+{
+	cpumask_or(&bdst->cpumask, &bdst->cpumask, &bsrc->cpumask);
+
+}
+
+static inline bool arch_tlbbatch_need_fold(struct arch_tlbflush_unmap_batch *batch,
+					   struct mm_struct *mm)
+{
+	return !cpumask_subset(mm_cpumask(mm), &batch->cpumask);
+
+}
+
+static inline bool arch_tlbbatch_done(struct arch_tlbflush_unmap_batch *bdst,
+				      struct arch_tlbflush_unmap_batch *bsrc)
+{
+	return !cpumask_andnot(&bdst->cpumask, &bdst->cpumask, &bsrc->cpumask);
+
+}
+
 extern unsigned long tlb_flush_all_threshold;
 #else /* CONFIG_MMU */
 #define local_flush_tlb_all()			do { } while (0)
From patchwork Wed Feb 26 12:01:11 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13992173
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com,
    mgorman@techsingularity.net, hughd@google.com, willy@infradead.org,
    david@redhat.com, peterz@infradead.org, luto@kernel.org,
    tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 04/25] x86/tlb, riscv/tlb, mm/rmap: separate arch_tlbbatch_clear() out of arch_tlbbatch_flush()
Date: Wed, 26 Feb 2025 21:01:11 +0900
Message-Id: <20250226120132.28469-4-byungchul@sk.com>
In-Reply-To: <20250226120132.28469-1-byungchul@sk.com>
References: <20250226113342.GB1935@system.software.com>
 <20250226120132.28469-1-byungchul@sk.com>

A new mechanism, LUF (Lazy Unmap Flush), defers tlb flush until folios
that have been unmapped and freed eventually get allocated again.  It's
safe for folios that had been mapped read-only and were unmapped, since
the contents of the folios don't change while staying in pcp or buddy,
so we can still read the data through the stale tlb entries.

This is a preparation for the mechanism, which requires avoiding
redundant tlb flushes by manipulating the tlb batch's arch data.  To
achieve that, we need to separate the part that clears the tlb batch's
arch data out of arch_tlbbatch_flush().
Signed-off-by: Byungchul Park
---
 arch/riscv/mm/tlbflush.c | 1 -
 arch/x86/mm/tlb.c        | 2 --
 mm/rmap.c                | 1 +
 3 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index 74dd9307fbf1b..38f4bea8a964a 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -200,5 +200,4 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 {
 	__flush_tlb_range(&batch->cpumask, FLUSH_TLB_NO_ASID, 0,
 			  FLUSH_TLB_MAX_SIZE, PAGE_SIZE);
-	cpumask_clear(&batch->cpumask);
 }
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 6cf881a942bbe..523e8bb6fba1f 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1292,8 +1292,6 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 		local_irq_enable();
 	}

-	cpumask_clear(&batch->cpumask);
-
 	put_flush_tlb_info();
 	put_cpu();
 }
diff --git a/mm/rmap.c b/mm/rmap.c
index bcec8677f68df..546b7a6a30a44 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -648,6 +648,7 @@ void try_to_unmap_flush(void)
 		return;

 	arch_tlbbatch_flush(&tlb_ubc->arch);
+	arch_tlbbatch_clear(&tlb_ubc->arch);
 	tlb_ubc->flush_required = false;
 	tlb_ubc->writable = false;
 }
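The split matters because a later caller can then decide whether the
flush is needed at all before paying for it.  The following is a
minimal sketch of such a caller, assuming a hypothetical 'done' batch
that records CPUs which have already flushed; luf_try_flush() does not
exist in this patch.

static void luf_try_flush(struct arch_tlbflush_unmap_batch *pending,
			  struct arch_tlbflush_unmap_batch *done)
{
	/*
	 * Drop CPUs that have already flushed; if none remain, the
	 * whole shootdown can be skipped.
	 */
	if (arch_tlbbatch_done(pending, done))
		return;

	arch_tlbbatch_flush(pending);	/* flush the remaining CPUs */
	arch_tlbbatch_clear(pending);	/* now an explicit, separate step */
}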
From patchwork Wed Feb 26 12:01:12 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13992179
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com,
    mgorman@techsingularity.net, hughd@google.com, willy@infradead.org,
    david@redhat.com, peterz@infradead.org, luto@kernel.org,
    tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 05/25] mm/buddy: make room for a new variable, luf_key, in struct page
Date: Wed, 26 Feb 2025 21:01:12 +0900
Message-Id: <20250226120132.28469-5-byungchul@sk.com>
In-Reply-To: <20250226120132.28469-1-byungchul@sk.com>
References: <20250226113342.GB1935@system.software.com>
 <20250226120132.28469-1-byungchul@sk.com>

Functionally, no change.  This is a preparation for the luf mechanism,
which tracks, for each page residing in buddy, whether a tlb flush is
still needed.

Since the private field in struct page is used in buddy only to store
the page order, which ranges from 0 to MAX_PAGE_ORDER and therefore
fits in an unsigned short, split it into two smaller fields, order and
luf_key, so that both can be used in buddy at the same time.

Signed-off-by: Byungchul Park
---
 include/linux/mm_types.h | 42 +++++++++++++++++++++++++++++++++-------
 mm/internal.h            |  4 ++--
 mm/page_alloc.c          |  2 +-
 3 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 689b2a7461892..7b15efbe9f529 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -107,13 +107,27 @@ struct page {
 			pgoff_t index;		/* Our offset within mapping. */
 			unsigned long share;	/* share count for fsdax */
 		};
-		/**
-		 * @private: Mapping-private opaque data.
-		 * Usually used for buffer_heads if PagePrivate.
-		 * Used for swp_entry_t if swapcache flag set.
-		 * Indicates order in the buddy system if PageBuddy.
-		 */
-		unsigned long private;
+		union {
+			/**
+			 * @private: Mapping-private opaque data.
+			 * Usually used for buffer_heads if PagePrivate.
+			 * Used for swp_entry_t if swapcache flag set.
+			 * Indicates order in the buddy system if PageBuddy.
+			 */
+			unsigned long private;
+			struct {
+				/*
+				 * Indicates order in the buddy system if PageBuddy.
+				 */
+				unsigned short order;
+
+				/*
+				 * For tracking need of tlb flush,
+				 * by luf(lazy unmap flush).
+				 */
+				unsigned short luf_key;
+			};
+		};
 	};
 	struct {	/* page_pool used by netstack */
 		/**
@@ -577,6 +591,20 @@ static inline void set_page_private(struct page *page, unsigned long private)
 	page->private = private;
 }

+#define page_buddy_order(page)		((page)->order)
+
+static inline void set_page_buddy_order(struct page *page, unsigned int order)
+{
+	page->order = (unsigned short)order;
+}
+
+#define page_luf_key(page)		((page)->luf_key)
+
+static inline void set_page_luf_key(struct page *page, unsigned short luf_key)
+{
+	page->luf_key = luf_key;
+}
+
 static inline void *folio_get_private(struct folio *folio)
 {
 	return folio->private;
diff --git a/mm/internal.h b/mm/internal.h
index b07550db2bfd1..c4d2018a7cf8e 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -543,7 +543,7 @@ struct alloc_context {
 static inline unsigned int buddy_order(struct page *page)
 {
 	/* PageBuddy() must be checked by the caller */
-	return page_private(page);
+	return page_buddy_order(page);
 }

 /*
@@ -557,7 +557,7 @@ static inline unsigned int buddy_order(struct page *page)
  * times, potentially observing different values in the tests and the actual
  * use of the result.
  */
-#define buddy_order_unsafe(page)	READ_ONCE(page_private(page))
+#define buddy_order_unsafe(page)	READ_ONCE(page_buddy_order(page))

 /*
  * This function checks whether a page is free && is the buddy
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 16dfcf7ade74a..86c9fa45d36fe 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -576,7 +576,7 @@ void prep_compound_page(struct page *page, unsigned int order)

 static inline void set_buddy_order(struct page *page, unsigned int order)
 {
-	set_page_private(page, order);
+	set_page_buddy_order(page, order);
 	__SetPageBuddy(page);
 }
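A minimal sketch of how the new fields are meant to be used together;
hypothetical_mark_free_page() is made up for illustration and is not
part of the patch.

static void hypothetical_mark_free_page(struct page *page,
					unsigned int order,
					unsigned short luf_key)
{
	/* order (<= MAX_PAGE_ORDER) fits in the 16-bit field ... */
	set_page_buddy_order(page, order);

	/* ... leaving room to tag the page with a pending-flush key. */
	set_page_luf_key(page, luf_key);

	WARN_ON(page_buddy_order(page) != order);
	WARN_ON(page_luf_key(page) != luf_key);
}

Because order and luf_key live in a union with page->private, this only
works while the page is in the buddy system, where private is used only
for the order.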
From patchwork Wed Feb 26 12:01:13 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13992171
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com,
    mgorman@techsingularity.net, hughd@google.com, willy@infradead.org,
    david@redhat.com, peterz@infradead.org, luto@kernel.org,
    tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 06/25] mm: move should_skip_kasan_poison() to mm/internal.h
Date: Wed, 26 Feb 2025 21:01:13 +0900
Message-Id: <20250226120132.28469-6-byungchul@sk.com>
In-Reply-To: <20250226120132.28469-1-byungchul@sk.com>
References: <20250226113342.GB1935@system.software.com>
 <20250226120132.28469-1-byungchul@sk.com>

Functionally, no change.  This is a preparation for the luf mechanism,
which needs to use the should_skip_kasan_poison() function from
mm/internal.h.

Signed-off-by: Byungchul Park
---
 mm/internal.h   | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 mm/page_alloc.c | 47 -----------------------------------------------
 2 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index c4d2018a7cf8e..ee8af97c39f59 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1067,8 +1067,55 @@ static inline void vunmap_range_noflush(unsigned long start, unsigned long end)
 DECLARE_STATIC_KEY_TRUE(deferred_pages);

 bool __init deferred_grow_zone(struct zone *zone, unsigned int order);
+
+static inline bool deferred_pages_enabled(void)
+{
+	return static_branch_unlikely(&deferred_pages);
+}
+#else
+static inline bool deferred_pages_enabled(void)
+{
+	return false;
+}
 #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */

+/*
+ * Skip KASAN memory poisoning when either:
+ *
+ * 1. For generic KASAN: deferred memory initialization has not yet completed.
+ *    Tag-based KASAN modes skip pages freed via deferred memory initialization
+ *    using page tags instead (see below).
+ * 2. For tag-based KASAN modes: the page has a match-all KASAN tag, indicating
+ *    that error detection is disabled for accesses via the page address.
+ *
+ * Pages will have match-all tags in the following circumstances:
+ *
+ * 1. Pages are being initialized for the first time, including during deferred
+ *    memory init; see the call to page_kasan_tag_reset in __init_single_page.
+ * 2. The allocation was not unpoisoned due to __GFP_SKIP_KASAN, with the
+ *    exception of pages unpoisoned by kasan_unpoison_vmalloc.
+ * 3. The allocation was excluded from being checked due to sampling,
+ *    see the call to kasan_unpoison_pages.
+ *
+ * Poisoning pages during deferred memory init will greatly lengthen the
+ * process and cause problem in large memory systems as the deferred pages
+ * initialization is done with interrupt disabled.
+ *
+ * Assuming that there will be no reference to those newly initialized
+ * pages before they are ever allocated, this should have no effect on
+ * KASAN memory tracking as the poison will be properly inserted at page
+ * allocation time. The only corner case is when pages are allocated by
+ * on-demand allocation and then freed again before the deferred pages
+ * initialization is done, but this is not likely to happen.
+ */
+static inline bool should_skip_kasan_poison(struct page *page)
+{
+	if (IS_ENABLED(CONFIG_KASAN_GENERIC))
+		return deferred_pages_enabled();
+
+	return page_kasan_tag(page) == KASAN_TAG_KERNEL;
+}
+
 enum mminit_level {
 	MMINIT_WARNING,
 	MMINIT_VERIFY,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 86c9fa45d36fe..f3930a2a05cd3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -299,11 +299,6 @@ int page_group_by_mobility_disabled __read_mostly;
  */
 DEFINE_STATIC_KEY_TRUE(deferred_pages);

-static inline bool deferred_pages_enabled(void)
-{
-	return static_branch_unlikely(&deferred_pages);
-}
-
 /*
  * deferred_grow_zone() is __init, but it is called from
  * get_page_from_freelist() during early boot until deferred_pages permanently
@@ -316,11 +311,6 @@ _deferred_grow_zone(struct zone *zone, unsigned int order)
 	return deferred_grow_zone(zone, order);
 }
 #else
-static inline bool deferred_pages_enabled(void)
-{
-	return false;
-}
-
 static inline bool _deferred_grow_zone(struct zone *zone, unsigned int order)
 {
 	return false;
@@ -993,43 +983,6 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page)
 	return ret;
 }

-/*
- * Skip KASAN memory poisoning when either:
- *
- * 1. For generic KASAN: deferred memory initialization has not yet completed.
- *    Tag-based KASAN modes skip pages freed via deferred memory initialization
- *    using page tags instead (see below).
- * 2. For tag-based KASAN modes: the page has a match-all KASAN tag, indicating
- *    that error detection is disabled for accesses via the page address.
- *
- * Pages will have match-all tags in the following circumstances:
- *
- * 1. Pages are being initialized for the first time, including during deferred
- *    memory init; see the call to page_kasan_tag_reset in __init_single_page.
- * 2. The allocation was not unpoisoned due to __GFP_SKIP_KASAN, with the
- *    exception of pages unpoisoned by kasan_unpoison_vmalloc.
- * 3. The allocation was excluded from being checked due to sampling,
- *    see the call to kasan_unpoison_pages.
- *
- * Poisoning pages during deferred memory init will greatly lengthen the
- * process and cause problem in large memory systems as the deferred pages
- * initialization is done with interrupt disabled.
- *
- * Assuming that there will be no reference to those newly initialized
- * pages before they are ever allocated, this should have no effect on
- * KASAN memory tracking as the poison will be properly inserted at page
- * allocation time. The only corner case is when pages are allocated by
- * on-demand allocation and then freed again before the deferred pages
- * initialization is done, but this is not likely to happen.
- */
-static inline bool should_skip_kasan_poison(struct page *page)
-{
-	if (IS_ENABLED(CONFIG_KASAN_GENERIC))
-		return deferred_pages_enabled();
-
-	return page_kasan_tag(page) == KASAN_TAG_KERNEL;
-}
-
 static void kernel_init_pages(struct page *page, int numpages)
 {
 	int i;
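The point of the move is reuse: once the helper lives in mm/internal.h,
other mm/ code can apply the same policy.  A sketch of an assumed
future call site follows (not part of this patch; kasan_poison_pages()
is the existing KASAN helper, and the surrounding context is made up):

	if (!should_skip_kasan_poison(page))
		kasan_poison_pages(page, order, false);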
<20250226120132.28469-7-byungchul@sk.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20250226120132.28469-1-byungchul@sk.com> References: <20250226113342.GB1935@system.software.com> <20250226120132.28469-1-byungchul@sk.com> X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFrrCLMWRmVeSWpSXmKPExsXC9ZZnke4ypv3pBqeuqVjMWb+GzeLzhn9s Fl/X/2K2ePqpj8Xi8q45bBb31vxntTi/ay2rxY6l+5gsLh1YwGRxvPcAk8X8e5/ZLDZvmsps cXzKVEaL3z/msDnweXxv7WPx2DnrLrvHgk2lHptXaHlsWtXJ5rHp0yR2j3fnzrF7nJjxm8Xj /b6rbB5bf9l5NE69xubxeZNcAE8Ul01Kak5mWWqRvl0CV0bzqwusBe9EK570nmNsYLwh2MXI ySEhYCJxdf4zdhj71YmfLCA2m4C6xI0bP5lBbBEBM4mDrX+Aarg4mAWWMUnsPdHABpIQFqiT 2LP4FFgzi4CqxNymD2ANvAKmEn9ftrNADJWXWL3hAFicE2jQv92/weqFBJIlWtb/ZgEZKiFw m01i3cKrrBANkhIHV9xgmcDIu4CRYRWjUGZeWW5iZo6JXkZlXmaFXnJ+7iZGYFgvq/0TvYPx 04XgQ4wCHIxKPLwPzuxNF2JNLCuuzD3EKMHBrCTCy5m5J12INyWxsiq1KD++qDQntfgQozQH i5I4r9G38hQhgfTEktTs1NSC1CKYLBMHp1QDI8OV5vXM9kcfzHxh5eYz7bzQx/c1B14UPdHy O1zU5Szb3147U+poxKsk52+nkn4LH5vgYvWZ1WTC2mreHW9NStT2y16WFbBq7b67TWeOo+KT 46lf510o0qtg22V6+8DXzF01pRc+Ti20ObTtQ7v8Ls29LTPyM3bsPpzzp1vTXnoBq9yDiKr5 h5RYijMSDbWYi4oTAWFdUNZnAgAA X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFtrNLMWRmVeSWpSXmKPExsXC5WfdrLuMaX+6wemP0hZz1q9hs/i84R+b xdf1v5gtnn7qY7E4PPckq8XlXXPYLO6t+c9qcX7XWlaLHUv3MVlcOrCAyeJ47wEmi/n3PrNZ bN40ldni+JSpjBa/f8xhc+D3+N7ax+Kxc9Zddo8Fm0o9Nq/Q8ti0qpPNY9OnSewe786dY/c4 MeM3i8f7fVfZPBa/+MDksfWXnUfj1GtsHp83yQXwRnHZpKTmZJalFunbJXBlNL+6wFrwTrTi Se85xgbGG4JdjJwcEgImEq9O/GQBsdkE1CVu3PjJDGKLCJhJHGz9w97FyMXBLLCMSWLviQY2 kISwQJ3EnsWn2EFsFgFViblNH8AaeAVMJf6+bGeBGCovsXrDAbA4J9Cgf7t/g9ULCSRLtKz/ zTKBkWsBI8MqRpHMvLLcxMwcU73i7IzKvMwKveT83E2MwCBdVvtn4g7GL5fdDzEKcDAq8fA+ OLM3XYg1say4MvcQowQHs5IIL2fmnnQh3pTEyqrUovz4otKc1OJDjNIcLErivF7hqQlCAumJ JanZqakFqUUwWSYOTqkGxjRprpfy61ZX31/ldHqa5/pFHmt0ktwKO+V/Lbr18tK1umbnHrkk 7V9bNA27uQ+a626vie62fj2rvWnOgTV3L356YvDJtKRlcSP/vlyFDYf3az/l8FFk/d+doKbY IBM298bFD6GlLZv6uCbaS/9KNTzy59e/vcEVf0+pHA9s2V64IcKz4IHZdiWW4oxEQy3mouJE ABhXBHZOAgAA X-CFilter-Loop: Reflected X-Rspamd-Queue-Id: E3DDC4000C X-Stat-Signature: dnyjixfwsj98jzsh6t4hsz1ujmba8y1f X-Rspam-User: X-Rspamd-Server: rspam01 X-HE-Tag: 1740571306-615163 X-HE-Meta: U2FsdGVkX1/Zk/MpZIoW2v6N17B4x39POklkX+2Fe5CQfythZWRBuGgdrQAGLMz9PNNw7WyP16sVobBdIGbo4lPEU3ig/6g7J4o3SIegqzUFWXiK1LjnHTSEZ7M2QjgUvDKXkHhA9xlk6AkoEglj8KJE9IK9dahVWzCN6d0pm+4TXLLTMlle6zWdrUqAVwkxmqsgTIOqh7YI3Xqb0Tmq7m9oG00eQ8PjMsjtjJVmdk15+Xo19gLWudUeuIvoVtnrDtuRNZb+gtCtpVawxaMEMqPnRQy8L9XMp3tGfjL5x+ZzOON7mlsadOaG6PjGOwUnOmukq1nr9zq8Woxj8FsczckdS6D6agvehlRreWvJ1jltSo0MLHYYyCx8W22f3hS3wq/KUerH/+aYtAvPg+0Ur9A3R6KjMh8vOsj8U5jlv7cpwX8YkLWLCu0voD5yAcqUuYPIuCZptcqrulQpaC95NVFEpdxIrNCUOMW1+lkKiHKf1lPLfFmV2gXBrxaemOB6uO0Rc1FdtRhF7nhKwBdCYB5vZ8GpXYAEYErR+EVB8qXC40xv9AjfytJ6GxeizA/KebYgB8rSXwbcWXy3VDUb+uOcFMbME6lvfghXGqH3QVYLUfXMkIi7aKHYPxu5u5QrVun5VwxN4Db8PumDDYguC8kMdD0EeDFIx1sgcNjcXmnGLwD98BiMWiwBRWzvf9BLrAa+NcqWyhXfVZDxenp330cwYvQkuIRbfdJNM5Sjcb0Fejz9/YLVfqAHvIVd1Bvm4RzUX9KTjXRolxYloFdcP5riDUEFEUvKIadd5bYKxjOHeQmP+UVQSy6ADYZCDtlxXl0fFPwygEbCownF2ZFZ0tuEO0QX6r9rl0YZHHghpu54ow8nj9S4Be563V83qBJOohCHK4apsknqN+z66xlcAJSfQxrsfOrka6r+ft1hgcmcsjXs0OLrdveH3b2NRBLqQed2nBHVZU92pa+e5LK s3BhY41v ciWDLYVLWGHuAhkabLjbkPdvdp8u8qZEpFjJqrw61NhahZDgr/gawsbCew4sMEYbAwPBI X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Functionally, no change. This is a preparation for luf mechanism that needs to evaluate the temporal sequence of events to determine whether tlb flush required has been done on each CPU. 
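The comparison trick that makes a wrapping generation number usable as a timestamp (luf_ugen, introduced just below) can be sanity-checked in isolation. The following is only an illustrative user-space sketch of the signed-difference test that ugen_before() performs; it is not part of the patch itself.

#include <stdio.h>

/* true when 'a' is older than 'b', even across wraparound */
static int before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

int main(void)
{
	unsigned long a = -2UL;	/* generation handed out just before wraparound */
	unsigned long b = 3UL;	/* generation handed out just after wraparound */

	printf("%d\n", before(a, b));	/* prints 1: 'a' is considered older */
	return 0;
}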
To achieve that, this patch introduced a generation number, luf_ugen, and a few APIs manipulating the number. It's worth noting the number is designed to wraparound so care must be taken when using it. Signed-off-by: Byungchul Park --- include/linux/mm.h | 34 ++++++++++++++++++++++++++++++++++ mm/rmap.c | 22 ++++++++++++++++++++++ 2 files changed, 56 insertions(+) diff --git a/include/linux/mm.h b/include/linux/mm.h index d82feabbe44f8..74a37cb132caa 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -4240,4 +4240,38 @@ int arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *st int arch_set_shadow_stack_status(struct task_struct *t, unsigned long status); int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status); +#if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) +/* + * luf_ugen will start with 2 so that 1 can be regarded as a passed one. + */ +#define LUF_UGEN_INIT 2 + +static inline bool ugen_before(unsigned long a, unsigned long b) +{ + /* + * Consider wraparound. + */ + return (long)(a - b) < 0; +} + +static inline unsigned long next_ugen(unsigned long ugen) +{ + if (ugen + 1) + return ugen + 1; + /* + * Avoid invalid ugen, zero. + */ + return ugen + 2; +} + +static inline unsigned long prev_ugen(unsigned long ugen) +{ + if (ugen - 1) + return ugen - 1; + /* + * Avoid invalid ugen, zero. + */ + return ugen - 2; +} +#endif #endif /* _LINUX_MM_H */ diff --git a/mm/rmap.c b/mm/rmap.c index 546b7a6a30a44..8439dbb194c8c 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -634,6 +634,28 @@ struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio, } #ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH + +/* + * This generation number is primarily used as a global timestamp to + * determine whether tlb flush required has been done on each CPU. The + * function, ugen_before(), should be used to evaluate the temporal + * sequence of events because the number is designed to wraparound. + */ +static atomic_long_t __maybe_unused luf_ugen = ATOMIC_LONG_INIT(LUF_UGEN_INIT); + +/* + * Don't return invalid luf_ugen, zero. + */ +static unsigned long __maybe_unused new_luf_ugen(void) +{ + unsigned long ugen = atomic_long_inc_return(&luf_ugen); + + if (!ugen) + ugen = atomic_long_inc_return(&luf_ugen); + + return ugen; +} + /* * Flush TLB entries for recently unmapped pages from remote CPUs. 
It is * important if a PTE was dirty when it was unmapped that it's flushed

From patchwork Wed Feb 26 12:01:15 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13992174
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 08/25] mm: introduce luf_batch to be used as hash table to store luf meta data
Date: Wed, 26 Feb 2025 21:01:15 +0900
Message-Id: <20250226120132.28469-8-byungchul@sk.com>
In-Reply-To: <20250226120132.28469-1-byungchul@sk.com>
References: <20250226113342.GB1935@system.software.com> <20250226120132.28469-1-byungchul@sk.com>

Functionally, no change. This is a preparation for luf mechanism that needs to keep luf meta data per page while staying in pcp or buddy allocator. The meta data includes cpumask for tlb shootdown and luf's request generation number.
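To picture the data structure before reading the diff: because the key is an unsigned short, every possible key can directly index a fixed 2^16-entry array, so no hash function is needed at all. The sketch below is illustrative only, with the slot contents reduced to a single generation field; the real struct luf_batch and its table follow in the patch.

#include <stdio.h>

#define NR_SLOTS (1 << 16)	/* one slot per possible 16-bit key */

struct slot {
	/* the real entry also carries a tlb batch and a lock */
	unsigned long ugen;
};

static struct slot table[NR_SLOTS];

static struct slot *lookup(unsigned short key)
{
	return &table[key];	/* key 0 is reserved as "invalid" */
}

int main(void)
{
	lookup(42)->ugen = 7;
	printf("%lu\n", lookup(42)->ugen);	/* prints 7 */
	return 0;
}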
Since struct page doesn't have enough room to store luf meta data, this patch introduces a hash table to store them and makes each page keep its hash key instead. Since all the pages in pcp or buddy share the hash table, confliction is inevitable so care must be taken when reading or updating its entry. Signed-off-by: Byungchul Park --- include/linux/mm_types.h | 10 ++++ mm/internal.h | 8 +++ mm/rmap.c | 122 +++++++++++++++++++++++++++++++++++++-- 3 files changed, 136 insertions(+), 4 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 7b15efbe9f529..f52d4e49e8736 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -33,6 +33,16 @@ struct address_space; struct mem_cgroup; +#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH +struct luf_batch { + struct tlbflush_unmap_batch batch; + unsigned long ugen; + rwlock_t lock; +}; +#else +struct luf_batch {}; +#endif + /* * Each physical page in the system has a struct page associated with * it to keep track of whatever it is we are using the page for at the diff --git a/mm/internal.h b/mm/internal.h index ee8af97c39f59..8ade04255dba3 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1270,6 +1270,8 @@ extern struct workqueue_struct *mm_percpu_wq; void try_to_unmap_flush(void); void try_to_unmap_flush_dirty(void); void flush_tlb_batched_pending(struct mm_struct *mm); +void fold_batch(struct tlbflush_unmap_batch *dst, struct tlbflush_unmap_batch *src, bool reset); +void fold_luf_batch(struct luf_batch *dst, struct luf_batch *src); #else static inline void try_to_unmap_flush(void) { @@ -1280,6 +1282,12 @@ static inline void try_to_unmap_flush_dirty(void) static inline void flush_tlb_batched_pending(struct mm_struct *mm) { } +static inline void fold_batch(struct tlbflush_unmap_batch *dst, struct tlbflush_unmap_batch *src, bool reset) +{ +} +static inline void fold_luf_batch(struct luf_batch *dst, struct luf_batch *src) +{ +} #endif /* CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH */ extern const struct trace_print_flags pageflag_names[]; diff --git a/mm/rmap.c b/mm/rmap.c index 8439dbb194c8c..ac450a45257f6 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -641,7 +641,7 @@ struct anon_vma *folio_lock_anon_vma_read(const struct folio *folio, * function, ugen_before(), should be used to evaluate the temporal * sequence of events because the number is designed to wraparound. */ -static atomic_long_t __maybe_unused luf_ugen = ATOMIC_LONG_INIT(LUF_UGEN_INIT); +static atomic_long_t luf_ugen = ATOMIC_LONG_INIT(LUF_UGEN_INIT); /* * Don't return invalid luf_ugen, zero. @@ -656,6 +656,122 @@ static unsigned long __maybe_unused new_luf_ugen(void) return ugen; } +static void reset_batch(struct tlbflush_unmap_batch *batch) +{ + arch_tlbbatch_clear(&batch->arch); + batch->flush_required = false; + batch->writable = false; +} + +void fold_batch(struct tlbflush_unmap_batch *dst, + struct tlbflush_unmap_batch *src, bool reset) +{ + if (!src->flush_required) + return; + + /* + * Fold src to dst. + */ + arch_tlbbatch_fold(&dst->arch, &src->arch); + dst->writable = dst->writable || src->writable; + dst->flush_required = true; + + if (!reset) + return; + + /* + * Reset src. + */ + reset_batch(src); +} + +/* + * The range that luf_key covers, which is 'unsigned short' type. + */ +#define NR_LUF_BATCH (1 << (sizeof(short) * 8)) + +/* + * Use 0th entry as accumulated batch. 
+ */ +static struct luf_batch luf_batch[NR_LUF_BATCH]; + +static void luf_batch_init(struct luf_batch *lb) +{ + rwlock_init(&lb->lock); + reset_batch(&lb->batch); + lb->ugen = atomic_long_read(&luf_ugen) - 1; +} + +static int __init luf_init(void) +{ + int i; + + for (i = 0; i < NR_LUF_BATCH; i++) + luf_batch_init(&luf_batch[i]); + + return 0; +} +early_initcall(luf_init); + +/* + * key to point an entry of the luf_batch array + * + * note: zero means invalid key + */ +static atomic_t luf_kgen = ATOMIC_INIT(1); + +/* + * Don't return invalid luf_key, zero. + */ +static unsigned short __maybe_unused new_luf_key(void) +{ + unsigned short luf_key = atomic_inc_return(&luf_kgen); + + if (!luf_key) + luf_key = atomic_inc_return(&luf_kgen); + + return luf_key; +} + +static void __fold_luf_batch(struct luf_batch *dst_lb, + struct tlbflush_unmap_batch *src_batch, + unsigned long src_ugen) +{ + /* + * dst_lb->ugen represents one that requires tlb shootdown for + * it, that is, sort of request number. The newer it is, the + * more tlb shootdown might be needed to fulfill the newer + * request. Conservertively keep the newer one. + */ + if (!dst_lb->ugen || ugen_before(dst_lb->ugen, src_ugen)) + dst_lb->ugen = src_ugen; + fold_batch(&dst_lb->batch, src_batch, false); +} + +void fold_luf_batch(struct luf_batch *dst, struct luf_batch *src) +{ + unsigned long flags; + + /* + * Exactly same. Nothing to fold. + */ + if (dst == src) + return; + + if (&src->lock < &dst->lock) { + read_lock_irqsave(&src->lock, flags); + write_lock(&dst->lock); + } else { + write_lock_irqsave(&dst->lock, flags); + read_lock(&src->lock); + } + + __fold_luf_batch(dst, &src->batch, src->ugen); + + write_unlock(&dst->lock); + read_unlock_irqrestore(&src->lock, flags); +} + /* * Flush TLB entries for recently unmapped pages from remote CPUs. 
It is * important if a PTE was dirty when it was unmapped that it's flushed @@ -670,9 +786,7 @@ void try_to_unmap_flush(void) return; arch_tlbbatch_flush(&tlb_ubc->arch); - arch_tlbbatch_clear(&tlb_ubc->arch); - tlb_ubc->flush_required = false; - tlb_ubc->writable = false; + reset_batch(tlb_ubc); } /* Flush iff there are potentially writable TLB entries that can race with IO */

From patchwork Wed Feb 26 12:01:16 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13992175
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 09/25] mm: introduce API to perform tlb shootdown on exit from page allocator
Date: Wed, 26 Feb 2025 21:01:16 +0900
Message-Id: <20250226120132.28469-9-byungchul@sk.com>
In-Reply-To: <20250226120132.28469-1-byungchul@sk.com>
References: <20250226113342.GB1935@system.software.com> <20250226120132.28469-1-byungchul@sk.com>

Functionally, no change. This is a preparation for the luf mechanism, which performs the required tlb shootdown on exit from the page allocator.
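The idea can be modeled with a toy example in which a plain bitmask stands in for the cpumask carried by a tlb batch: once the CPUs needed at takeoff have been flushed, a pending batch that is fully covered by that flush can simply be dropped instead of being flushed again later. This is only a sketch of the intent; the real code relies on arch_tlbbatch_done() in the hunk below.

#include <stdio.h>

int main(void)
{
	unsigned long pending = 0x0f;	/* cpus still holding stale entries */
	unsigned long takeoff = 0x1f;	/* cpus just flushed at takeoff */

	/* the takeoff flush covered everything pending, so drop it */
	if ((pending & ~takeoff) == 0)
		pending = 0;

	printf("pending after takeoff: %#lx\n", pending);	/* prints 0 */
	return 0;
}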
This patch introduced a new API rather than making use of existing try_to_unmap_flush() to avoid repeated and redundant tlb shootdown due to frequent page allocations during a session of batched unmap flush. Signed-off-by: Byungchul Park --- include/linux/sched.h | 1 + mm/internal.h | 4 ++++ mm/rmap.c | 20 ++++++++++++++++++++ 3 files changed, 25 insertions(+) diff --git a/include/linux/sched.h b/include/linux/sched.h index 9632e3318e0d6..86ef426644639 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1401,6 +1401,7 @@ struct task_struct { #endif struct tlbflush_unmap_batch tlb_ubc; + struct tlbflush_unmap_batch tlb_ubc_takeoff; /* Cache last used pipe for splice(): */ struct pipe_inode_info *splice_pipe; diff --git a/mm/internal.h b/mm/internal.h index 8ade04255dba3..8ad7e86c1c0e2 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1269,6 +1269,7 @@ extern struct workqueue_struct *mm_percpu_wq; #ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH void try_to_unmap_flush(void); void try_to_unmap_flush_dirty(void); +void try_to_unmap_flush_takeoff(void); void flush_tlb_batched_pending(struct mm_struct *mm); void fold_batch(struct tlbflush_unmap_batch *dst, struct tlbflush_unmap_batch *src, bool reset); void fold_luf_batch(struct luf_batch *dst, struct luf_batch *src); @@ -1279,6 +1280,9 @@ static inline void try_to_unmap_flush(void) static inline void try_to_unmap_flush_dirty(void) { } +static inline void try_to_unmap_flush_takeoff(void) +{ +} static inline void flush_tlb_batched_pending(struct mm_struct *mm) { } diff --git a/mm/rmap.c b/mm/rmap.c index ac450a45257f6..61366b4570c9a 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -772,6 +772,26 @@ void fold_luf_batch(struct luf_batch *dst, struct luf_batch *src) read_unlock_irqrestore(&src->lock, flags); } +void try_to_unmap_flush_takeoff(void) +{ + struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct tlbflush_unmap_batch *tlb_ubc_takeoff = ¤t->tlb_ubc_takeoff; + + if (!tlb_ubc_takeoff->flush_required) + return; + + arch_tlbbatch_flush(&tlb_ubc_takeoff->arch); + + /* + * Now that tlb shootdown of tlb_ubc_takeoff has been performed, + * it's good chance to shrink tlb_ubc if possible. + */ + if (arch_tlbbatch_done(&tlb_ubc->arch, &tlb_ubc_takeoff->arch)) + reset_batch(tlb_ubc); + + reset_batch(tlb_ubc_takeoff); +} + /* * Flush TLB entries for recently unmapped pages from remote CPUs. 
It is * important if a PTE was dirty when it was unmapped that it's flushed

From patchwork Wed Feb 26 12:01:17 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13992178
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 10/25] mm: introduce APIs to check if the page allocation is tlb shootdownable
Date: Wed, 26 Feb 2025 21:01:17 +0900
Message-Id: <20250226120132.28469-10-byungchul@sk.com>
In-Reply-To: <20250226120132.28469-1-byungchul@sk.com>
References: <20250226113342.GB1935@system.software.com> <20250226120132.28469-1-byungchul@sk.com>

Functionally, no change. This is a preparation for the luf mechanism, which should identify whether tlb shootdown can be performed at page allocation time. In a context with irqs disabled, or outside of task context, tlb shootdown cannot be performed because it could deadlock. Thus, the page allocator should be aware of whether tlb shootdown can be performed when returning a page.
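The "is this context safe for a shootdown?" test boils down to two conditions, as the stand-alone sketch below shows. irqs_disabled() and in_task() are stubbed here purely for illustration; the patch itself uses the real kernel helpers.

#include <stdbool.h>
#include <stdio.h>

static bool irqs_disabled(void) { return false; }	/* stub */
static bool in_task(void)       { return true;  }	/* stub */

/* mirrors no_shootdown_context() introduced below */
static bool no_shootdown_context(void)
{
	/* sending shootdown IPIs with irqs off, or outside a task, may deadlock */
	return !(!irqs_disabled() && in_task());
}

int main(void)
{
	printf("shootdown allowed: %d\n", !no_shootdown_context());
	return 0;
}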
This patch introduced APIs that pcp or buddy page allocator can use to delimit the critical sections taking off pages and indentify whether tlb shootdown can be performed. Signed-off-by: Byungchul Park --- include/linux/sched.h | 5 ++ mm/internal.h | 14 ++++ mm/page_alloc.c | 159 ++++++++++++++++++++++++++++++++++++++++++ mm/rmap.c | 2 +- 4 files changed, 179 insertions(+), 1 deletion(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 86ef426644639..a3049ea5b3ad3 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1400,6 +1400,11 @@ struct task_struct { struct callback_head cid_work; #endif +#if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) + int luf_no_shootdown; + int luf_takeoff_started; +#endif + struct tlbflush_unmap_batch tlb_ubc; struct tlbflush_unmap_batch tlb_ubc_takeoff; diff --git a/mm/internal.h b/mm/internal.h index 8ad7e86c1c0e2..bf16482bce2f5 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1598,6 +1598,20 @@ static inline void accept_page(struct page *page) { } #endif /* CONFIG_UNACCEPTED_MEMORY */ +#if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) +extern struct luf_batch luf_batch[]; +bool luf_takeoff_start(void); +void luf_takeoff_end(void); +bool luf_takeoff_no_shootdown(void); +bool luf_takeoff_check(struct page *page); +bool luf_takeoff_check_and_fold(struct page *page); +#else +static inline bool luf_takeoff_start(void) { return false; } +static inline void luf_takeoff_end(void) {} +static inline bool luf_takeoff_no_shootdown(void) { return true; } +static inline bool luf_takeoff_check(struct page *page) { return true; } +static inline bool luf_takeoff_check_and_fold(struct page *page) { return true; } +#endif /* pagewalk.c */ int walk_page_range_mm(struct mm_struct *mm, unsigned long start, diff --git a/mm/page_alloc.c b/mm/page_alloc.c index f3930a2a05cd3..f3cb02e36e770 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -622,6 +622,165 @@ compaction_capture(struct capture_control *capc, struct page *page, } #endif /* CONFIG_COMPACTION */ +#if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) +static bool no_shootdown_context(void) +{ + /* + * If it performs with irq disabled, that might cause a deadlock. + * Avoid tlb shootdown in this case. + */ + return !(!irqs_disabled() && in_task()); +} + +/* + * Can be called with zone lock released and irq enabled. + */ +bool luf_takeoff_start(void) +{ + unsigned long flags; + bool no_shootdown = no_shootdown_context(); + + local_irq_save(flags); + + /* + * It's the outmost luf_takeoff_start(). + */ + if (!current->luf_takeoff_started) + VM_WARN_ON(current->luf_no_shootdown); + + /* + * current->luf_no_shootdown > 0 doesn't mean tlb shootdown is + * not allowed at all. However, it guarantees tlb shootdown is + * possible once current->luf_no_shootdown == 0. It might look + * too conservative but for now do this way for simplity. + */ + if (no_shootdown || current->luf_no_shootdown) + current->luf_no_shootdown++; + + current->luf_takeoff_started++; + local_irq_restore(flags); + + return !no_shootdown; +} + +/* + * Should be called within the same context of luf_takeoff_start(). + */ +void luf_takeoff_end(void) +{ + unsigned long flags; + bool no_shootdown; + bool outmost = false; + + local_irq_save(flags); + VM_WARN_ON(!current->luf_takeoff_started); + + /* + * Assume the context and irq flags are same as those at + * luf_takeoff_start(). 
+ */ + if (current->luf_no_shootdown) + current->luf_no_shootdown--; + + no_shootdown = !!current->luf_no_shootdown; + + current->luf_takeoff_started--; + + /* + * It's the outmost luf_takeoff_end(). + */ + if (!current->luf_takeoff_started) + outmost = true; + + local_irq_restore(flags); + + if (no_shootdown) + goto out; + + try_to_unmap_flush_takeoff(); +out: + if (outmost) + VM_WARN_ON(current->luf_no_shootdown); +} + +/* + * Can be called with zone lock released and irq enabled. + */ +bool luf_takeoff_no_shootdown(void) +{ + bool no_shootdown = true; + unsigned long flags; + + local_irq_save(flags); + + /* + * No way. Delimit using luf_takeoff_{start,end}(). + */ + if (unlikely(!current->luf_takeoff_started)) { + VM_WARN_ON(1); + goto out; + } + no_shootdown = current->luf_no_shootdown; +out: + local_irq_restore(flags); + return no_shootdown; +} + +/* + * Should be called with either zone lock held and irq disabled or pcp + * lock held. + */ +bool luf_takeoff_check(struct page *page) +{ + unsigned short luf_key = page_luf_key(page); + + /* + * No way. Delimit using luf_takeoff_{start,end}(). + */ + if (unlikely(!current->luf_takeoff_started)) { + VM_WARN_ON(1); + return false; + } + + if (!luf_key) + return true; + + return !current->luf_no_shootdown; +} + +/* + * Should be called with either zone lock held and irq disabled or pcp + * lock held. + */ +bool luf_takeoff_check_and_fold(struct page *page) +{ + struct tlbflush_unmap_batch *tlb_ubc_takeoff = ¤t->tlb_ubc_takeoff; + unsigned short luf_key = page_luf_key(page); + struct luf_batch *lb; + unsigned long flags; + + /* + * No way. Delimit using luf_takeoff_{start,end}(). + */ + if (unlikely(!current->luf_takeoff_started)) { + VM_WARN_ON(1); + return false; + } + + if (!luf_key) + return true; + + if (current->luf_no_shootdown) + return false; + + lb = &luf_batch[luf_key]; + read_lock_irqsave(&lb->lock, flags); + fold_batch(tlb_ubc_takeoff, &lb->batch, false); + read_unlock_irqrestore(&lb->lock, flags); + return true; +} +#endif + static inline void account_freepages(struct zone *zone, int nr_pages, int migratetype) { diff --git a/mm/rmap.c b/mm/rmap.c index 61366b4570c9a..40de03c8f73be 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -693,7 +693,7 @@ void fold_batch(struct tlbflush_unmap_batch *dst, /* * Use 0th entry as accumulated batch. 
*/ -static struct luf_batch luf_batch[NR_LUF_BATCH]; +struct luf_batch luf_batch[NR_LUF_BATCH]; static void luf_batch_init(struct luf_batch *lb) {

From patchwork Wed Feb 26 12:01:18 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13992180
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 11/25] mm: deliver luf_key to pcp or buddy on free after unmapping
Date: Wed, 26 Feb 2025 21:01:18 +0900
Message-Id: <20250226120132.28469-11-byungchul@sk.com>
In-Reply-To: <20250226120132.28469-1-byungchul@sk.com>
References: <20250226113342.GB1935@system.software.com> <20250226120132.28469-1-byungchul@sk.com>

Functionally, no change. This is a preparation for luf mechanism that needs to pass luf_key to pcp or buddy allocator on free after unmapping e.g. during page reclaim or page migration.
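For orientation, the new calling convention can be pictured with the toy below, where a stub stands in for free_frozen_pages(): a luf_key of 0 keeps today's behavior, while a non-zero key ties the freed page to a deferred tlb flush. This is purely illustrative; the actual signature change is in the diff that follows.

#include <stdio.h>

/* stub with the same shape as the new kernel signature */
static void free_frozen_pages(void *page, unsigned int order,
			      unsigned short luf_key)
{
	if (luf_key)
		printf("order-%u page freed, shootdown key %u still pending\n",
		       order, luf_key);
	else
		printf("order-%u page freed, nothing to flush\n", order);
}

int main(void)
{
	static char page[4096];

	free_frozen_pages(page, 0, 0);	/* ordinary free path */
	free_frozen_pages(page, 0, 42);	/* free after unmapping */
	return 0;
}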
The luf_key will be used to track need of tlb shootdown and which cpus need to perform tlb flush, per page residing in pcp or buddy, and should be handed over properly when pages travel between pcp and buddy. Signed-off-by: Byungchul Park --- mm/internal.h | 4 +- mm/page_alloc.c | 116 ++++++++++++++++++++++++++++++++----------- mm/page_frag_cache.c | 6 +-- mm/page_isolation.c | 6 +++ mm/page_reporting.c | 6 +++ mm/slub.c | 2 +- mm/swap.c | 4 +- mm/vmscan.c | 8 +-- 8 files changed, 111 insertions(+), 41 deletions(-) diff --git a/mm/internal.h b/mm/internal.h index bf16482bce2f5..fe1c879b41487 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -746,8 +746,8 @@ struct page *__alloc_frozen_pages_noprof(gfp_t, unsigned int order, int nid, nodemask_t *); #define __alloc_frozen_pages(...) \ alloc_hooks(__alloc_frozen_pages_noprof(__VA_ARGS__)) -void free_frozen_pages(struct page *page, unsigned int order); -void free_unref_folios(struct folio_batch *fbatch); +void free_frozen_pages(struct page *page, unsigned int order, unsigned short luf_key); +void free_unref_folios(struct folio_batch *fbatch, unsigned short luf_key); #ifdef CONFIG_NUMA struct page *alloc_frozen_pages_noprof(gfp_t, unsigned int order); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index f3cb02e36e770..986fdd57e8e3a 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -212,7 +212,7 @@ unsigned int pageblock_order __read_mostly; #endif static void __free_pages_ok(struct page *page, unsigned int order, - fpi_t fpi_flags); + fpi_t fpi_flags, unsigned short luf_key); /* * results with 256, 32 in the lowmem_reserve sysctl: @@ -850,8 +850,13 @@ static inline void __del_page_from_free_list(struct page *page, struct zone *zon list_del(&page->buddy_list); __ClearPageBuddy(page); - set_page_private(page, 0); zone->free_area[order].nr_free--; + + /* + * Keep head page's private until post_alloc_hook(). + * + * XXX: Tail pages' private doesn't get cleared. + */ } static inline void del_page_from_free_list(struct page *page, struct zone *zone, @@ -920,7 +925,7 @@ buddy_merge_likely(unsigned long pfn, unsigned long buddy_pfn, static inline void __free_one_page(struct page *page, unsigned long pfn, struct zone *zone, unsigned int order, - int migratetype, fpi_t fpi_flags) + int migratetype, fpi_t fpi_flags, unsigned short luf_key) { struct capture_control *capc = task_capc(zone); unsigned long buddy_pfn = 0; @@ -937,10 +942,21 @@ static inline void __free_one_page(struct page *page, account_freepages(zone, 1 << order, migratetype); + /* + * Use the page's luf_key unchanged if luf_key == 0. Worth + * noting that page_luf_key() will be 0 in most cases since it's + * initialized at free_pages_prepare(). 
+ */ + if (luf_key) + set_page_luf_key(page, luf_key); + else + luf_key = page_luf_key(page); + while (order < MAX_PAGE_ORDER) { int buddy_mt = migratetype; + unsigned short buddy_luf_key; - if (compaction_capture(capc, page, order, migratetype)) { + if (!luf_key && compaction_capture(capc, page, order, migratetype)) { account_freepages(zone, -(1 << order), migratetype); return; } @@ -973,6 +989,18 @@ static inline void __free_one_page(struct page *page, else __del_page_from_free_list(buddy, zone, order, buddy_mt); + /* + * !buddy_luf_key && !luf_key : do nothing + * buddy_luf_key && !luf_key : luf_key = buddy_luf_key + * !buddy_luf_key && luf_key : do nothing + * buddy_luf_key && luf_key : merge two into luf_key + */ + buddy_luf_key = page_luf_key(buddy); + if (buddy_luf_key && !luf_key) + luf_key = buddy_luf_key; + else if (buddy_luf_key && luf_key) + fold_luf_batch(&luf_batch[luf_key], &luf_batch[buddy_luf_key]); + if (unlikely(buddy_mt != migratetype)) { /* * Match buddy type. This ensures that an @@ -984,6 +1012,7 @@ static inline void __free_one_page(struct page *page, combined_pfn = buddy_pfn & pfn; page = page + (combined_pfn - pfn); + set_page_luf_key(page, luf_key); pfn = combined_pfn; order++; } @@ -1242,6 +1271,11 @@ __always_inline bool free_pages_prepare(struct page *page, VM_BUG_ON_PAGE(PageTail(page), page); + /* + * Ensure private is zero before using it inside allocator. + */ + set_page_private(page, 0); + trace_mm_page_free(page, order); kmsan_free_page(page, order); @@ -1407,7 +1441,8 @@ static void free_pcppages_bulk(struct zone *zone, int count, count -= nr_pages; pcp->count -= nr_pages; - __free_one_page(page, pfn, zone, order, mt, FPI_NONE); + __free_one_page(page, pfn, zone, order, mt, FPI_NONE, 0); + trace_mm_page_pcpu_drain(page, order, mt); } while (count > 0 && !list_empty(list)); } @@ -1431,7 +1466,7 @@ static void split_large_buddy(struct zone *zone, struct page *page, do { int mt = get_pfnblock_migratetype(page, pfn); - __free_one_page(page, pfn, zone, order, mt, fpi); + __free_one_page(page, pfn, zone, order, mt, fpi, 0); pfn += 1 << order; if (pfn == end) break; @@ -1441,11 +1476,18 @@ static void split_large_buddy(struct zone *zone, struct page *page, static void free_one_page(struct zone *zone, struct page *page, unsigned long pfn, unsigned int order, - fpi_t fpi_flags) + fpi_t fpi_flags, unsigned short luf_key) { unsigned long flags; spin_lock_irqsave(&zone->lock, flags); + + /* + * valid luf_key can be passed only if order == 0. + */ + VM_WARN_ON(luf_key && order); + set_page_luf_key(page, luf_key); + split_large_buddy(zone, page, pfn, order, fpi_flags); spin_unlock_irqrestore(&zone->lock, flags); @@ -1453,13 +1495,13 @@ static void free_one_page(struct zone *zone, struct page *page, } static void __free_pages_ok(struct page *page, unsigned int order, - fpi_t fpi_flags) + fpi_t fpi_flags, unsigned short luf_key) { unsigned long pfn = page_to_pfn(page); struct zone *zone = page_zone(page); if (free_pages_prepare(page, order)) - free_one_page(zone, page, pfn, order, fpi_flags); + free_one_page(zone, page, pfn, order, fpi_flags, luf_key); } void __meminit __free_pages_core(struct page *page, unsigned int order, @@ -1507,7 +1549,7 @@ void __meminit __free_pages_core(struct page *page, unsigned int order, * Bypass PCP and place fresh pages right to the tail, primarily * relevant for memory onlining. 
*/ - __free_pages_ok(page, order, FPI_TO_TAIL); + __free_pages_ok(page, order, FPI_TO_TAIL, 0); } /* @@ -2504,6 +2546,10 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, if (unlikely(page == NULL)) break; + /* + * Keep the page's luf_key. + */ + /* * Split buddy pages returned by expand() are received here in * physical page order. The page is added to the tail of @@ -2785,12 +2831,14 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone, static void free_frozen_page_commit(struct zone *zone, struct per_cpu_pages *pcp, struct page *page, int migratetype, - unsigned int order) + unsigned int order, unsigned short luf_key) { int high, batch; int pindex; bool free_high = false; + set_page_luf_key(page, luf_key); + /* * On freeing, reduce the number of pages that are batch allocated. * See nr_pcp_alloc() where alloc_factor is increased for subsequent @@ -2799,7 +2847,16 @@ static void free_frozen_page_commit(struct zone *zone, pcp->alloc_factor >>= 1; __count_vm_events(PGFREE, 1 << order); pindex = order_to_pindex(migratetype, order); - list_add(&page->pcp_list, &pcp->lists[pindex]); + + /* + * Defer tlb shootdown as much as possible by putting luf'd + * pages to the tail. + */ + if (luf_key) + list_add_tail(&page->pcp_list, &pcp->lists[pindex]); + else + list_add(&page->pcp_list, &pcp->lists[pindex]); + pcp->count += 1 << order; batch = READ_ONCE(pcp->batch); @@ -2834,7 +2891,8 @@ static void free_frozen_page_commit(struct zone *zone, /* * Free a pcp page */ -void free_frozen_pages(struct page *page, unsigned int order) +void free_frozen_pages(struct page *page, unsigned int order, + unsigned short luf_key) { unsigned long __maybe_unused UP_flags; struct per_cpu_pages *pcp; @@ -2843,7 +2901,7 @@ void free_frozen_pages(struct page *page, unsigned int order) int migratetype; if (!pcp_allowed_order(order)) { - __free_pages_ok(page, order, FPI_NONE); + __free_pages_ok(page, order, FPI_NONE, luf_key); return; } @@ -2861,7 +2919,7 @@ void free_frozen_pages(struct page *page, unsigned int order) migratetype = get_pfnblock_migratetype(page, pfn); if (unlikely(migratetype >= MIGRATE_PCPTYPES)) { if (unlikely(is_migrate_isolate(migratetype))) { - free_one_page(zone, page, pfn, order, FPI_NONE); + free_one_page(zone, page, pfn, order, FPI_NONE, luf_key); return; } migratetype = MIGRATE_MOVABLE; @@ -2870,10 +2928,10 @@ void free_frozen_pages(struct page *page, unsigned int order) pcp_trylock_prepare(UP_flags); pcp = pcp_spin_trylock(zone->per_cpu_pageset); if (pcp) { - free_frozen_page_commit(zone, pcp, page, migratetype, order); + free_frozen_page_commit(zone, pcp, page, migratetype, order, luf_key); pcp_spin_unlock(pcp); } else { - free_one_page(zone, page, pfn, order, FPI_NONE); + free_one_page(zone, page, pfn, order, FPI_NONE, luf_key); } pcp_trylock_finish(UP_flags); } @@ -2881,7 +2939,7 @@ void free_frozen_pages(struct page *page, unsigned int order) /* * Free a batch of folios */ -void free_unref_folios(struct folio_batch *folios) +void free_unref_folios(struct folio_batch *folios, unsigned short luf_key) { unsigned long __maybe_unused UP_flags; struct per_cpu_pages *pcp = NULL; @@ -2902,7 +2960,7 @@ void free_unref_folios(struct folio_batch *folios) */ if (!pcp_allowed_order(order)) { free_one_page(folio_zone(folio), &folio->page, - pfn, order, FPI_NONE); + pfn, order, FPI_NONE, luf_key); continue; } folio->private = (void *)(unsigned long)order; @@ -2938,7 +2996,7 @@ void free_unref_folios(struct folio_batch *folios) */ if (is_migrate_isolate(migratetype)) { 
free_one_page(zone, &folio->page, pfn, - order, FPI_NONE); + order, FPI_NONE, luf_key); continue; } @@ -2951,7 +3009,7 @@ void free_unref_folios(struct folio_batch *folios) if (unlikely(!pcp)) { pcp_trylock_finish(UP_flags); free_one_page(zone, &folio->page, pfn, - order, FPI_NONE); + order, FPI_NONE, luf_key); continue; } locked_zone = zone; @@ -2966,7 +3024,7 @@ void free_unref_folios(struct folio_batch *folios) trace_mm_page_free_batched(&folio->page); free_frozen_page_commit(zone, pcp, &folio->page, migratetype, - order); + order, luf_key); } if (pcp) { @@ -3058,7 +3116,7 @@ void __putback_isolated_page(struct page *page, unsigned int order, int mt) /* Return isolated page to tail of freelist. */ __free_one_page(page, page_to_pfn(page), zone, order, mt, - FPI_SKIP_REPORT_NOTIFY | FPI_TO_TAIL); + FPI_SKIP_REPORT_NOTIFY | FPI_TO_TAIL, 0); } /* @@ -4944,7 +5002,7 @@ struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order, out: if (memcg_kmem_online() && (gfp & __GFP_ACCOUNT) && page && unlikely(__memcg_kmem_charge_page(page, gfp, order) != 0)) { - free_frozen_pages(page, order); + free_frozen_pages(page, order, 0); page = NULL; } @@ -5024,11 +5082,11 @@ void __free_pages(struct page *page, unsigned int order) int head = PageHead(page); if (put_page_testzero(page)) - free_frozen_pages(page, order); + free_frozen_pages(page, order, 0); else if (!head) { pgalloc_tag_sub_pages(page, (1 << order) - 1); while (order-- > 0) - free_frozen_pages(page + (1 << order), order); + free_frozen_pages(page + (1 << order), order, 0); } } EXPORT_SYMBOL(__free_pages); @@ -5059,7 +5117,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order, last = page + (1UL << order); for (page += nr; page < last; page++) - __free_pages_ok(page, 0, FPI_TO_TAIL); + __free_pages_ok(page, 0, FPI_TO_TAIL, 0); } return (void *)addr; } @@ -7077,7 +7135,7 @@ bool put_page_back_buddy(struct page *page) int migratetype = get_pfnblock_migratetype(page, pfn); ClearPageHWPoisonTakenOff(page); - __free_one_page(page, pfn, zone, 0, migratetype, FPI_NONE); + __free_one_page(page, pfn, zone, 0, migratetype, FPI_NONE, 0); if (TestClearPageHWPoison(page)) { ret = true; } @@ -7146,7 +7204,7 @@ static void __accept_page(struct zone *zone, unsigned long *flags, accept_memory(page_to_phys(page), PAGE_SIZE << MAX_PAGE_ORDER); - __free_pages_ok(page, MAX_PAGE_ORDER, FPI_TO_TAIL); + __free_pages_ok(page, MAX_PAGE_ORDER, FPI_TO_TAIL, 0); if (last) static_branch_dec(&zones_with_unaccepted_pages); diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c index d2423f30577e4..558622f15a81e 100644 --- a/mm/page_frag_cache.c +++ b/mm/page_frag_cache.c @@ -86,7 +86,7 @@ void __page_frag_cache_drain(struct page *page, unsigned int count) VM_BUG_ON_PAGE(page_ref_count(page) == 0, page); if (page_ref_sub_and_test(page, count)) - free_frozen_pages(page, compound_order(page)); + free_frozen_pages(page, compound_order(page), 0); } EXPORT_SYMBOL(__page_frag_cache_drain); @@ -139,7 +139,7 @@ void *__page_frag_alloc_align(struct page_frag_cache *nc, if (unlikely(encoded_page_decode_pfmemalloc(encoded_page))) { free_frozen_pages(page, - encoded_page_decode_order(encoded_page)); + encoded_page_decode_order(encoded_page), 0); goto refill; } @@ -166,6 +166,6 @@ void page_frag_free(void *addr) struct page *page = virt_to_head_page(addr); if (unlikely(put_page_testzero(page))) - free_frozen_pages(page, compound_order(page)); + free_frozen_pages(page, compound_order(page), 0); } EXPORT_SYMBOL(page_frag_free); diff --git 
a/mm/page_isolation.c b/mm/page_isolation.c index b2fc5266e3d26..ac45a5f4e7b9f 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -265,6 +265,12 @@ static void unset_migratetype_isolate(struct page *page, int migratetype) WARN_ON_ONCE(!move_freepages_block_isolate(zone, page, migratetype)); } else { set_pageblock_migratetype(page, migratetype); + + /* + * Do not clear the page's private to keep its luf_key + * unchanged. + */ + __putback_isolated_page(page, order, migratetype); } zone->nr_isolate_pageblock--; diff --git a/mm/page_reporting.c b/mm/page_reporting.c index e4c428e61d8c1..c05afb7a395f1 100644 --- a/mm/page_reporting.c +++ b/mm/page_reporting.c @@ -116,6 +116,12 @@ page_reporting_drain(struct page_reporting_dev_info *prdev, int mt = get_pageblock_migratetype(page); unsigned int order = get_order(sg->length); + /* + * Ensure private is zero before putting into the + * allocator. + */ + set_page_private(page, 0); + __putback_isolated_page(page, order, mt); /* If the pages were not reported due to error skip flagging */ diff --git a/mm/slub.c b/mm/slub.c index 184fd2b147584..812b24ed16ea1 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -2665,7 +2665,7 @@ static void __free_slab(struct kmem_cache *s, struct slab *slab) __folio_clear_slab(folio); mm_account_reclaimed_pages(pages); unaccount_slab(slab, order, s); - free_frozen_pages(&folio->page, order); + free_frozen_pages(&folio->page, order, 0); } static void rcu_free_slab(struct rcu_head *h) diff --git a/mm/swap.c b/mm/swap.c index 7523b65d8caa6..bdfede631aea9 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -109,7 +109,7 @@ void __folio_put(struct folio *folio) page_cache_release(folio); folio_unqueue_deferred_split(folio); mem_cgroup_uncharge(folio); - free_frozen_pages(&folio->page, folio_order(folio)); + free_frozen_pages(&folio->page, folio_order(folio), 0); } EXPORT_SYMBOL(__folio_put); @@ -989,7 +989,7 @@ void folios_put_refs(struct folio_batch *folios, unsigned int *refs) folios->nr = j; mem_cgroup_uncharge_folios(folios); - free_unref_folios(folios); + free_unref_folios(folios, 0); } EXPORT_SYMBOL(folios_put_refs); diff --git a/mm/vmscan.c b/mm/vmscan.c index fcca38bc640f5..c8a995a3380ac 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1525,7 +1525,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, if (folio_batch_add(&free_folios, folio) == 0) { mem_cgroup_uncharge_folios(&free_folios); try_to_unmap_flush(); - free_unref_folios(&free_folios); + free_unref_folios(&free_folios, 0); } continue; @@ -1594,7 +1594,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, mem_cgroup_uncharge_folios(&free_folios); try_to_unmap_flush(); - free_unref_folios(&free_folios); + free_unref_folios(&free_folios, 0); list_splice(&ret_folios, folio_list); count_vm_events(PGACTIVATE, pgactivate); @@ -1918,7 +1918,7 @@ static unsigned int move_folios_to_lru(struct lruvec *lruvec, if (folio_batch_add(&free_folios, folio) == 0) { spin_unlock_irq(&lruvec->lru_lock); mem_cgroup_uncharge_folios(&free_folios); - free_unref_folios(&free_folios); + free_unref_folios(&free_folios, 0); spin_lock_irq(&lruvec->lru_lock); } @@ -1940,7 +1940,7 @@ static unsigned int move_folios_to_lru(struct lruvec *lruvec, if (free_folios.nr) { spin_unlock_irq(&lruvec->lru_lock); mem_cgroup_uncharge_folios(&free_folios); - free_unref_folios(&free_folios); + free_unref_folios(&free_folios, 0); spin_lock_irq(&lruvec->lru_lock); } From patchwork Wed Feb 26 12:01:19 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
From: Byungchul Park
Subject: [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 12/25] mm: delimit critical sections to take off pages from pcp or buddy allocator
Date: Wed, 26 Feb 2025 21:01:19 +0900
Message-Id: <20250226120132.28469-12-byungchul@sk.com>
In-Reply-To: <20250226120132.28469-1-byungchul@sk.com>
References: <20250226113342.GB1935@system.software.com> <20250226120132.28469-1-byungchul@sk.com>

Now that the luf mechanism has been introduced, a tlb shootdown might be
necessary when luf'd pages leave the pcp lists or the buddy allocator.
Check whether the current context is allowed to take such pages off and,
for luf'd pages, perform the pending shootdown before they are used.
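To make the bracket concrete, here is a small userspace sketch of the
pattern this patch wires into the allocator paths. The luf_takeoff_*()
names mirror the diff below, but the internals here (a global flag and a
toy cpumask) are invented purely for illustration and are not the kernel
implementation.

/*
 * Userspace sketch of the takeoff bracket added around the places that
 * pull pages off pcp or buddy.  Internals are toy stand-ins.
 */
#include <stdbool.h>
#include <stdio.h>

struct page { unsigned short luf_key; };        /* 0 means "not luf'd" */

static bool can_shootdown;      /* may this context flush the tlb?     */
static unsigned long pending;   /* toy stand-in for a batched cpumask  */

static bool luf_takeoff_start(bool context_allows_flush)
{
        can_shootdown = context_allows_flush;
        pending = 0;
        return can_shootdown;
}

/* May this page leave pcp/buddy in the current context? */
static bool luf_takeoff_check(const struct page *p)
{
        return !p->luf_key || can_shootdown;
}

/* Take the page off and fold its deferred flush into the batch. */
static bool luf_takeoff_check_and_fold(struct page *p)
{
        if (!luf_takeoff_check(p))
                return false;
        if (p->luf_key) {
                pending |= 1UL << (p->luf_key % 64);    /* toy "fold" */
                p->luf_key = 0;
        }
        return true;
}

/* Flush whatever was folded, before anyone touches the pages. */
static void luf_takeoff_end(void)
{
        if (pending)
                printf("tlb shootdown, toy mask %#lx\n", pending);
        pending = 0;
}

int main(void)
{
        struct page luf_page = { .luf_key = 3 };
        struct page plain_page = { 0 };

        luf_takeoff_start(true);                  /* e.g. around zone->lock */
        luf_takeoff_check_and_fold(&plain_page);  /* nothing to flush       */
        luf_takeoff_check_and_fold(&luf_page);    /* folds deferred flush   */
        luf_takeoff_end();                        /* flush before first use */
        return 0;
}

Every call site touched in the diff below (compaction's free page
isolation, the rmqueue paths, bulk allocation, page reporting, memory
offlining) is wrapped in exactly this start/check/end bracket around the
zone lock.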
Signed-off-by: Byungchul Park --- mm/compaction.c | 32 ++++++++++++++++-- mm/internal.h | 2 +- mm/page_alloc.c | 79 +++++++++++++++++++++++++++++++++++++++++++-- mm/page_isolation.c | 4 ++- mm/page_reporting.c | 20 +++++++++++- 5 files changed, 129 insertions(+), 8 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index e5744f354edea..bf5ded83b9dd1 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -606,6 +606,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, page = pfn_to_page(blockpfn); + luf_takeoff_start(); /* Isolate free pages. */ for (; blockpfn < end_pfn; blockpfn += stride, page += stride) { int isolated; @@ -654,9 +655,12 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, goto isolate_fail; } + if (!luf_takeoff_check(page)) + goto isolate_fail; + /* Found a free page, will break it into order-0 pages */ order = buddy_order(page); - isolated = __isolate_free_page(page, order); + isolated = __isolate_free_page(page, order, false); if (!isolated) break; set_page_private(page, order); @@ -684,6 +688,11 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, if (locked) spin_unlock_irqrestore(&cc->zone->lock, flags); + /* + * Check and flush before using the pages taken off. + */ + luf_takeoff_end(); + /* * Be careful to not go outside of the pageblock. */ @@ -1591,6 +1600,7 @@ static void fast_isolate_freepages(struct compact_control *cc) if (!area->nr_free) continue; + luf_takeoff_start(); spin_lock_irqsave(&cc->zone->lock, flags); freelist = &area->free_list[MIGRATE_MOVABLE]; list_for_each_entry_reverse(freepage, freelist, buddy_list) { @@ -1598,6 +1608,10 @@ static void fast_isolate_freepages(struct compact_control *cc) order_scanned++; nr_scanned++; + + if (!luf_takeoff_check(freepage)) + goto scan_next; + pfn = page_to_pfn(freepage); if (pfn >= highest) @@ -1617,7 +1631,7 @@ static void fast_isolate_freepages(struct compact_control *cc) /* Shorten the scan if a candidate is found */ limit >>= 1; } - +scan_next: if (order_scanned >= limit) break; } @@ -1635,7 +1649,7 @@ static void fast_isolate_freepages(struct compact_control *cc) /* Isolate the page if available */ if (page) { - if (__isolate_free_page(page, order)) { + if (__isolate_free_page(page, order, false)) { set_page_private(page, order); nr_isolated = 1 << order; nr_scanned += nr_isolated - 1; @@ -1652,6 +1666,11 @@ static void fast_isolate_freepages(struct compact_control *cc) spin_unlock_irqrestore(&cc->zone->lock, flags); + /* + * Check and flush before using the pages taken off. + */ + luf_takeoff_end(); + /* Skip fast search if enough freepages isolated */ if (cc->nr_freepages >= cc->nr_migratepages) break; @@ -2373,7 +2392,14 @@ static enum compact_result compact_finished(struct compact_control *cc) { int ret; + /* + * luf_takeoff_{start,end}() is required to identify whether + * this compaction context is tlb shootdownable for luf'd pages. 
+ */ + luf_takeoff_start(); ret = __compact_finished(cc); + luf_takeoff_end(); + trace_mm_compaction_finished(cc->zone, cc->order, ret); if (ret == COMPACT_NO_SUITABLE_PAGE) ret = COMPACT_CONTINUE; diff --git a/mm/internal.h b/mm/internal.h index fe1c879b41487..77b7e6d0bcc29 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -666,7 +666,7 @@ static inline void clear_zone_contiguous(struct zone *zone) zone->contiguous = false; } -extern int __isolate_free_page(struct page *page, unsigned int order); +extern int __isolate_free_page(struct page *page, unsigned int order, bool willputback); extern void __putback_isolated_page(struct page *page, unsigned int order, int mt); extern void memblock_free_pages(struct page *page, unsigned long pfn, diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 986fdd57e8e3a..a0182421da13e 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -869,8 +869,13 @@ static inline void del_page_from_free_list(struct page *page, struct zone *zone, static inline struct page *get_page_from_free_area(struct free_area *area, int migratetype) { - return list_first_entry_or_null(&area->free_list[migratetype], + struct page *page = list_first_entry_or_null(&area->free_list[migratetype], struct page, buddy_list); + + if (page && luf_takeoff_check(page)) + return page; + + return NULL; } /* @@ -1653,6 +1658,8 @@ static __always_inline void page_del_and_expand(struct zone *zone, int nr_pages = 1 << high; __del_page_from_free_list(page, zone, high, migratetype); + if (unlikely(!luf_takeoff_check_and_fold(page))) + VM_WARN_ON(1); nr_pages -= expand(zone, page, low, high, migratetype); account_freepages(zone, -nr_pages, migratetype); } @@ -2023,6 +2030,13 @@ bool move_freepages_block_isolate(struct zone *zone, struct page *page, del_page_from_free_list(buddy, zone, order, get_pfnblock_migratetype(buddy, pfn)); + + /* + * No need to luf_takeoff_check_and_fold() since it's + * going back to buddy. luf_key will be handed over in + * split_large_buddy(). + */ + set_pageblock_migratetype(page, migratetype); split_large_buddy(zone, buddy, pfn, order, FPI_NONE); return true; @@ -2034,6 +2048,13 @@ bool move_freepages_block_isolate(struct zone *zone, struct page *page, del_page_from_free_list(page, zone, order, get_pfnblock_migratetype(page, pfn)); + + /* + * No need to luf_takeoff_check_and_fold() since it's + * going back to buddy. luf_key will be handed over in + * split_large_buddy(). + */ + set_pageblock_migratetype(page, migratetype); split_large_buddy(zone, page, pfn, order, FPI_NONE); return true; @@ -2166,6 +2187,8 @@ steal_suitable_fallback(struct zone *zone, struct page *page, unsigned int nr_added; del_page_from_free_list(page, zone, current_order, block_type); + if (unlikely(!luf_takeoff_check_and_fold(page))) + VM_WARN_ON(1); change_pageblock_range(page, current_order, start_type); nr_added = expand(zone, page, order, current_order, start_type); account_freepages(zone, nr_added, start_type); @@ -2246,6 +2269,9 @@ int find_suitable_fallback(struct free_area *area, unsigned int order, if (free_area_empty(area, fallback_mt)) continue; + if (luf_takeoff_no_shootdown()) + continue; + if (can_steal_fallback(order, migratetype)) *can_steal = true; @@ -2337,6 +2363,11 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, pageblock_nr_pages) continue; + /* + * luf_takeoff_{start,end}() is required for + * get_page_from_free_area() to use luf_takeoff_check(). 
+ */ + luf_takeoff_start(); spin_lock_irqsave(&zone->lock, flags); for (order = 0; order < NR_PAGE_ORDERS; order++) { struct free_area *area = &(zone->free_area[order]); @@ -2394,10 +2425,12 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, WARN_ON_ONCE(ret == -1); if (ret > 0) { spin_unlock_irqrestore(&zone->lock, flags); + luf_takeoff_end(); return ret; } } spin_unlock_irqrestore(&zone->lock, flags); + luf_takeoff_end(); } return false; @@ -2539,6 +2572,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, unsigned long flags; int i; + luf_takeoff_start(); spin_lock_irqsave(&zone->lock, flags); for (i = 0; i < count; ++i) { struct page *page = __rmqueue(zone, order, migratetype, @@ -2563,6 +2597,10 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, list_add_tail(&page->pcp_list, list); } spin_unlock_irqrestore(&zone->lock, flags); + /* + * Check and flush before using the pages taken off. + */ + luf_takeoff_end(); return i; } @@ -3057,7 +3095,7 @@ void split_page(struct page *page, unsigned int order) } EXPORT_SYMBOL_GPL(split_page); -int __isolate_free_page(struct page *page, unsigned int order) +int __isolate_free_page(struct page *page, unsigned int order, bool willputback) { struct zone *zone = page_zone(page); int mt = get_pageblock_migratetype(page); @@ -3076,6 +3114,8 @@ int __isolate_free_page(struct page *page, unsigned int order) } del_page_from_free_list(page, zone, order, mt); + if (unlikely(!willputback && !luf_takeoff_check_and_fold(page))) + VM_WARN_ON(1); /* * Set the pageblock if the isolated page is at least half of a @@ -3155,6 +3195,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone, do { page = NULL; + luf_takeoff_start(); spin_lock_irqsave(&zone->lock, flags); if (alloc_flags & ALLOC_HIGHATOMIC) page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC); @@ -3172,10 +3213,15 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone, if (!page) { spin_unlock_irqrestore(&zone->lock, flags); + luf_takeoff_end(); return NULL; } } spin_unlock_irqrestore(&zone->lock, flags); + /* + * Check and flush before using the pages taken off. + */ + luf_takeoff_end(); } while (check_new_pages(page, order)); __count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order); @@ -3259,6 +3305,8 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order, } page = list_first_entry(list, struct page, pcp_list); + if (!luf_takeoff_check_and_fold(page)) + return NULL; list_del(&page->pcp_list); pcp->count -= 1 << order; } while (check_new_pages(page, order)); @@ -3276,11 +3324,13 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone, struct page *page; unsigned long __maybe_unused UP_flags; + luf_takeoff_start(); /* spin_trylock may fail due to a parallel drain or IRQ reentrancy. */ pcp_trylock_prepare(UP_flags); pcp = pcp_spin_trylock(zone->per_cpu_pageset); if (!pcp) { pcp_trylock_finish(UP_flags); + luf_takeoff_end(); return NULL; } @@ -3294,6 +3344,10 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone, page = __rmqueue_pcplist(zone, order, migratetype, alloc_flags, pcp, list); pcp_spin_unlock(pcp); pcp_trylock_finish(UP_flags); + /* + * Check and flush before using the pages taken off. 
+ */ + luf_takeoff_end(); if (page) { __count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order); zone_statistics(preferred_zone, zone, 1); @@ -4892,6 +4946,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid, if (unlikely(!zone)) goto failed; + luf_takeoff_start(); /* spin_trylock may fail due to a parallel drain or IRQ reentrancy. */ pcp_trylock_prepare(UP_flags); pcp = pcp_spin_trylock(zone->per_cpu_pageset); @@ -4927,6 +4982,10 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid, pcp_spin_unlock(pcp); pcp_trylock_finish(UP_flags); + /* + * Check and flush before using the pages taken off. + */ + luf_takeoff_end(); __count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account); zone_statistics(zonelist_zone(ac.preferred_zoneref), zone, nr_account); @@ -4936,6 +4995,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid, failed_irq: pcp_trylock_finish(UP_flags); + luf_takeoff_end(); failed: page = __alloc_pages_noprof(gfp, 0, preferred_nid, nodemask); @@ -6989,6 +7049,7 @@ unsigned long __offline_isolated_pages(unsigned long start_pfn, offline_mem_sections(pfn, end_pfn); zone = page_zone(pfn_to_page(pfn)); + luf_takeoff_start(); spin_lock_irqsave(&zone->lock, flags); while (pfn < end_pfn) { page = pfn_to_page(pfn); @@ -7017,9 +7078,15 @@ unsigned long __offline_isolated_pages(unsigned long start_pfn, VM_WARN_ON(get_pageblock_migratetype(page) != MIGRATE_ISOLATE); order = buddy_order(page); del_page_from_free_list(page, zone, order, MIGRATE_ISOLATE); + if (unlikely(!luf_takeoff_check_and_fold(page))) + VM_WARN_ON(1); pfn += (1 << order); } spin_unlock_irqrestore(&zone->lock, flags); + /* + * Check and flush before using the pages taken off. + */ + luf_takeoff_end(); return end_pfn - start_pfn - already_offline; } @@ -7095,6 +7162,7 @@ bool take_page_off_buddy(struct page *page) unsigned int order; bool ret = false; + luf_takeoff_start(); spin_lock_irqsave(&zone->lock, flags); for (order = 0; order < NR_PAGE_ORDERS; order++) { struct page *page_head = page - (pfn & ((1 << order) - 1)); @@ -7107,6 +7175,8 @@ bool take_page_off_buddy(struct page *page) del_page_from_free_list(page_head, zone, page_order, migratetype); + if (unlikely(!luf_takeoff_check_and_fold(page_head))) + VM_WARN_ON(1); break_down_buddy_pages(zone, page_head, page, 0, page_order, migratetype); SetPageHWPoisonTakenOff(page); @@ -7117,6 +7187,11 @@ bool take_page_off_buddy(struct page *page) break; } spin_unlock_irqrestore(&zone->lock, flags); + + /* + * Check and flush before using the pages taken off. 
+ */ + luf_takeoff_end(); return ret; } diff --git a/mm/page_isolation.c b/mm/page_isolation.c index ac45a5f4e7b9f..521ed32bdbf67 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -218,6 +218,7 @@ static void unset_migratetype_isolate(struct page *page, int migratetype) struct page *buddy; zone = page_zone(page); + luf_takeoff_start(); spin_lock_irqsave(&zone->lock, flags); if (!is_migrate_isolate_page(page)) goto out; @@ -236,7 +237,7 @@ static void unset_migratetype_isolate(struct page *page, int migratetype) buddy = find_buddy_page_pfn(page, page_to_pfn(page), order, NULL); if (buddy && !is_migrate_isolate_page(buddy)) { - isolated_page = !!__isolate_free_page(page, order); + isolated_page = !!__isolate_free_page(page, order, true); /* * Isolating a free page in an isolated pageblock * is expected to always work as watermarks don't @@ -276,6 +277,7 @@ static void unset_migratetype_isolate(struct page *page, int migratetype) zone->nr_isolate_pageblock--; out: spin_unlock_irqrestore(&zone->lock, flags); + luf_takeoff_end(zone); } static inline struct page * diff --git a/mm/page_reporting.c b/mm/page_reporting.c index c05afb7a395f1..03a7f5f6dc073 100644 --- a/mm/page_reporting.c +++ b/mm/page_reporting.c @@ -167,6 +167,7 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, if (list_empty(list)) return err; + luf_takeoff_start(); spin_lock_irq(&zone->lock); /* @@ -191,6 +192,11 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, if (PageReported(page)) continue; + if (!luf_takeoff_check(page)) { + VM_WARN_ON(1); + continue; + } + /* * If we fully consumed our budget then update our * state to indicate that we are requesting additional @@ -204,7 +210,7 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, /* Attempt to pull page from list and place in scatterlist */ if (*offset) { - if (!__isolate_free_page(page, order)) { + if (!__isolate_free_page(page, order, false)) { next = page; break; } @@ -227,6 +233,11 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, /* release lock before waiting on report processing */ spin_unlock_irq(&zone->lock); + /* + * Check and flush before using the pages taken off. + */ + luf_takeoff_end(); + /* begin processing pages in local list */ err = prdev->report(prdev, sgl, PAGE_REPORTING_CAPACITY); @@ -236,6 +247,8 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, /* update budget to reflect call to report function */ budget--; + luf_takeoff_start(); + /* reacquire zone lock and resume processing */ spin_lock_irq(&zone->lock); @@ -259,6 +272,11 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, spin_unlock_irq(&zone->lock); + /* + * Check and flush before using the pages taken off. 
+ */ + luf_takeoff_end(); + return err; } From patchwork Wed Feb 26 12:01:20 2025
From: Byungchul Park
Subject: [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 13/25] mm: introduce pend_list in struct free_area to track luf'd pages
Date: Wed, 26 Feb 2025 21:01:20 +0900
Message-Id: <20250226120132.28469-13-byungchul@sk.com>
In-Reply-To: <20250226120132.28469-1-byungchul@sk.com>
References: <20250226113342.GB1935@system.software.com> <20250226120132.28469-1-byungchul@sk.com>

luf'd pages require a tlb shootdown when they leave the page allocator.
For some page allocation requests it is okay to return a luf'd page
followed by a tlb shootdown, but not for others, e.g. in irq context.
This patch splits the list in free_area into two, 'free_list' for
non-luf'd pages and 'pend_list' for luf'd pages, so that the buddy
allocator can work correctly under the various constraints of the
calling context.
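A rough model of that split, only for illustration: two lists per
free_area plus a watermark test deciding which one to drain first.
non_luf_pages_ok() follows the helper added to mm/internal.h below,
while the list handling and pick() are simplified stand-ins rather than
the kernel code.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct page {
        unsigned short luf_key;   /* non-zero: tlb flush needed first */
        struct page *next;
};

struct free_area {
        struct page *free_list;   /* non-luf'd pages                  */
        struct page *pend_list;   /* luf'd pages, flush on takeoff    */
        unsigned long nr_free;    /* all free pages in this area      */
        unsigned long nr_luf;     /* pages sitting on pend_list       */
};

/* Enough non-luf pages left above the watermark? */
static bool non_luf_pages_ok(const struct free_area *a, unsigned long min_wm)
{
        return a->nr_free - a->nr_luf > min_wm;
}

static struct page *pop(struct page **list)
{
        struct page *p = *list;

        if (p)
                *list = p->next;
        return p;
}

/*
 * Prefer free_list while plenty of non-luf pages remain, so the flush
 * for pend_list pages stays deferred; otherwise drain pend_list first,
 * but only if the calling context is allowed to do a tlb shootdown.
 */
static struct page *pick(struct free_area *a, unsigned long min_wm,
                         bool can_shootdown)
{
        struct page *p;

        if (non_luf_pages_ok(a, min_wm) || !can_shootdown) {
                p = pop(&a->free_list);
                if (p)
                        return p;
        }
        if (can_shootdown && (p = pop(&a->pend_list)) != NULL)
                return p;
        return pop(&a->free_list);
}

int main(void)
{
        struct page luf = { .luf_key = 7 };
        struct page plain = { 0 };
        struct free_area area = { &plain, &luf, 2, 1 };
        struct page *p = pick(&area, 0, true);

        printf("picked page, luf_key=%d\n", p ? (int)p->luf_key : -1);
        return 0;
}

The actual patch additionally biases luf'd pages toward the tail of
their list and keeps a per-zone nr_luf_pages counter so this decision
stays cheap.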
Signed-off-by: Byungchul Park --- include/linux/mmzone.h | 3 ++ kernel/power/snapshot.c | 14 ++++++ kernel/vmcore_info.c | 2 + mm/compaction.c | 33 ++++++++++--- mm/internal.h | 17 ++++++- mm/mm_init.c | 2 + mm/page_alloc.c | 105 ++++++++++++++++++++++++++++++++++------ mm/page_reporting.c | 22 ++++++--- mm/vmstat.c | 15 ++++++ 9 files changed, 184 insertions(+), 29 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 550dbba92521a..9294cbbe698fc 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -116,6 +116,7 @@ extern int page_group_by_mobility_disabled; MIGRATETYPE_MASK) struct free_area { struct list_head free_list[MIGRATE_TYPES]; + struct list_head pend_list[MIGRATE_TYPES]; unsigned long nr_free; }; @@ -1014,6 +1015,8 @@ struct zone { /* Zone statistics */ atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS]; atomic_long_t vm_numa_event[NR_VM_NUMA_EVENT_ITEMS]; + /* Count pages that need tlb shootdown on allocation */ + atomic_long_t nr_luf_pages; } ____cacheline_internodealigned_in_smp; enum pgdat_flags { diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c index c9fb559a63993..ca10796855aba 100644 --- a/kernel/power/snapshot.c +++ b/kernel/power/snapshot.c @@ -1285,6 +1285,20 @@ static void mark_free_pages(struct zone *zone) swsusp_set_page_free(pfn_to_page(pfn + i)); } } + + list_for_each_entry(page, + &zone->free_area[order].pend_list[t], buddy_list) { + unsigned long i; + + pfn = page_to_pfn(page); + for (i = 0; i < (1UL << order); i++) { + if (!--page_count) { + touch_nmi_watchdog(); + page_count = WD_PAGE_COUNT; + } + swsusp_set_page_free(pfn_to_page(pfn + i)); + } + } } spin_unlock_irqrestore(&zone->lock, flags); } diff --git a/kernel/vmcore_info.c b/kernel/vmcore_info.c index 1fec61603ef32..638deb57f9ddd 100644 --- a/kernel/vmcore_info.c +++ b/kernel/vmcore_info.c @@ -188,11 +188,13 @@ static int __init crash_save_vmcoreinfo_init(void) VMCOREINFO_OFFSET(zone, vm_stat); VMCOREINFO_OFFSET(zone, spanned_pages); VMCOREINFO_OFFSET(free_area, free_list); + VMCOREINFO_OFFSET(free_area, pend_list); VMCOREINFO_OFFSET(list_head, next); VMCOREINFO_OFFSET(list_head, prev); VMCOREINFO_LENGTH(zone.free_area, NR_PAGE_ORDERS); log_buf_vmcoreinfo_setup(); VMCOREINFO_LENGTH(free_area.free_list, MIGRATE_TYPES); + VMCOREINFO_LENGTH(free_area.pend_list, MIGRATE_TYPES); VMCOREINFO_NUMBER(NR_FREE_PAGES); VMCOREINFO_NUMBER(PG_lru); VMCOREINFO_NUMBER(PG_private); diff --git a/mm/compaction.c b/mm/compaction.c index bf5ded83b9dd1..5dfa53252d75b 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1592,24 +1592,28 @@ static void fast_isolate_freepages(struct compact_control *cc) order = next_search_order(cc, order)) { struct free_area *area = &cc->zone->free_area[order]; struct list_head *freelist; + struct list_head *high_pfn_list; struct page *freepage; unsigned long flags; unsigned int order_scanned = 0; unsigned long high_pfn = 0; + bool consider_pend = false; + bool can_shootdown; if (!area->nr_free) continue; - luf_takeoff_start(); + can_shootdown = luf_takeoff_start(); spin_lock_irqsave(&cc->zone->lock, flags); freelist = &area->free_list[MIGRATE_MOVABLE]; +retry: list_for_each_entry_reverse(freepage, freelist, buddy_list) { unsigned long pfn; order_scanned++; nr_scanned++; - if (!luf_takeoff_check(freepage)) + if (unlikely(consider_pend && !luf_takeoff_check(freepage))) goto scan_next; pfn = page_to_pfn(freepage); @@ -1622,26 +1626,34 @@ static void fast_isolate_freepages(struct compact_control *cc) cc->fast_search_fail = 0; cc->search_order = 
order; page = freepage; - break; + goto done; } if (pfn >= min_pfn && pfn > high_pfn) { high_pfn = pfn; + high_pfn_list = freelist; /* Shorten the scan if a candidate is found */ limit >>= 1; } scan_next: if (order_scanned >= limit) - break; + goto done; } + if (!consider_pend && can_shootdown) { + consider_pend = true; + freelist = &area->pend_list[MIGRATE_MOVABLE]; + goto retry; + } +done: /* Use a maximum candidate pfn if a preferred one was not found */ if (!page && high_pfn) { page = pfn_to_page(high_pfn); /* Update freepage for the list reorder below */ freepage = page; + freelist = high_pfn_list; } /* Reorder to so a future search skips recent pages */ @@ -2040,18 +2052,20 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc) struct list_head *freelist; unsigned long flags; struct page *freepage; + bool consider_pend = false; if (!area->nr_free) continue; spin_lock_irqsave(&cc->zone->lock, flags); freelist = &area->free_list[MIGRATE_MOVABLE]; +retry: list_for_each_entry(freepage, freelist, buddy_list) { unsigned long free_pfn; if (nr_scanned++ >= limit) { move_freelist_tail(freelist, freepage); - break; + goto done; } free_pfn = page_to_pfn(freepage); @@ -2074,9 +2088,16 @@ static unsigned long fast_find_migrateblock(struct compact_control *cc) pfn = cc->zone->zone_start_pfn; cc->fast_search_fail = 0; found_block = true; - break; + goto done; } } + + if (!consider_pend) { + consider_pend = true; + freelist = &area->pend_list[MIGRATE_MOVABLE]; + goto retry; + } +done: spin_unlock_irqrestore(&cc->zone->lock, flags); } diff --git a/mm/internal.h b/mm/internal.h index 77b7e6d0bcc29..d34fd43086d89 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -865,11 +865,16 @@ static inline void init_cma_pageblock(struct page *page) int find_suitable_fallback(struct free_area *area, unsigned int order, int migratetype, bool only_stealable, bool *can_steal); -static inline bool free_area_empty(struct free_area *area, int migratetype) +static inline bool free_list_empty(struct free_area *area, int migratetype) { return list_empty(&area->free_list[migratetype]); } +static inline bool free_area_empty(struct free_area *area, int migratetype) +{ + return list_empty(&area->free_list[migratetype]) && + list_empty(&area->pend_list[migratetype]); +} /* mm/util.c */ struct anon_vma *folio_anon_vma(const struct folio *folio); @@ -1605,12 +1610,22 @@ void luf_takeoff_end(void); bool luf_takeoff_no_shootdown(void); bool luf_takeoff_check(struct page *page); bool luf_takeoff_check_and_fold(struct page *page); + +static inline bool non_luf_pages_ok(struct zone *zone) +{ + unsigned long nr_free = zone_page_state(zone, NR_FREE_PAGES); + unsigned long min_wm = min_wmark_pages(zone); + unsigned long nr_luf_pages = atomic_long_read(&zone->nr_luf_pages); + + return nr_free - nr_luf_pages > min_wm; +} #else static inline bool luf_takeoff_start(void) { return false; } static inline void luf_takeoff_end(void) {} static inline bool luf_takeoff_no_shootdown(void) { return true; } static inline bool luf_takeoff_check(struct page *page) { return true; } static inline bool luf_takeoff_check_and_fold(struct page *page) { return true; } +static inline bool non_luf_pages_ok(struct zone *zone) { return true; } #endif /* pagewalk.c */ diff --git a/mm/mm_init.c b/mm/mm_init.c index 133640a93d1da..81c5060496112 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -1421,12 +1421,14 @@ static void __meminit zone_init_free_lists(struct zone *zone) unsigned int order, t; for_each_migratetype_order(order, t) { 
INIT_LIST_HEAD(&zone->free_area[order].free_list[t]); + INIT_LIST_HEAD(&zone->free_area[order].pend_list[t]); zone->free_area[order].nr_free = 0; } #ifdef CONFIG_UNACCEPTED_MEMORY INIT_LIST_HEAD(&zone->unaccepted_pages); #endif + atomic_long_set(&zone->nr_luf_pages, 0); } void __meminit init_currently_empty_zone(struct zone *zone, diff --git a/mm/page_alloc.c b/mm/page_alloc.c index a0182421da13e..530c5c16ab323 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -804,15 +804,28 @@ static inline void __add_to_free_list(struct page *page, struct zone *zone, bool tail) { struct free_area *area = &zone->free_area[order]; + struct list_head *list; VM_WARN_ONCE(get_pageblock_migratetype(page) != migratetype, "page type is %lu, passed migratetype is %d (nr=%d)\n", get_pageblock_migratetype(page), migratetype, 1 << order); + /* + * When identifying whether a page requires tlb shootdown, false + * positive is okay because it will cause just additional tlb + * shootdown. + */ + if (page_luf_key(page)) { + list = &area->pend_list[migratetype]; + atomic_long_add(1 << order, &zone->nr_luf_pages); + } else + list = &area->free_list[migratetype]; + if (tail) - list_add_tail(&page->buddy_list, &area->free_list[migratetype]); + list_add_tail(&page->buddy_list, list); else - list_add(&page->buddy_list, &area->free_list[migratetype]); + list_add(&page->buddy_list, list); + area->nr_free++; } @@ -831,7 +844,20 @@ static inline void move_to_free_list(struct page *page, struct zone *zone, "page type is %lu, passed migratetype is %d (nr=%d)\n", get_pageblock_migratetype(page), old_mt, 1 << order); - list_move_tail(&page->buddy_list, &area->free_list[new_mt]); + /* + * The page might have been taken from a pfn where it's not + * clear which list was used. Therefore, conservatively + * consider it as pend_list, not to miss any true ones that + * require tlb shootdown. + * + * When identifying whether a page requires tlb shootdown, false + * positive is okay because it will cause just additional tlb + * shootdown. + */ + if (page_luf_key(page)) + list_move_tail(&page->buddy_list, &area->pend_list[new_mt]); + else + list_move_tail(&page->buddy_list, &area->free_list[new_mt]); account_freepages(zone, -(1 << order), old_mt); account_freepages(zone, 1 << order, new_mt); @@ -848,6 +874,9 @@ static inline void __del_page_from_free_list(struct page *page, struct zone *zon if (page_reported(page)) __ClearPageReported(page); + if (page_luf_key(page)) + atomic_long_sub(1 << order, &zone->nr_luf_pages); + list_del(&page->buddy_list); __ClearPageBuddy(page); zone->free_area[order].nr_free--; @@ -866,15 +895,48 @@ static inline void del_page_from_free_list(struct page *page, struct zone *zone, account_freepages(zone, -(1 << order), migratetype); } -static inline struct page *get_page_from_free_area(struct free_area *area, - int migratetype) +static inline struct page *get_page_from_free_area(struct zone *zone, + struct free_area *area, int migratetype) { - struct page *page = list_first_entry_or_null(&area->free_list[migratetype], - struct page, buddy_list); + struct page *page; + bool pend_first; - if (page && luf_takeoff_check(page)) - return page; + /* + * XXX: Make the decision preciser if needed e.g. using + * zone_watermark_ok() or its family, but for now, don't want to + * make it heavier. + * + * Try free_list, holding non-luf pages, first if there are + * enough non-luf pages to aggressively defer tlb flush, but + * should try pend_list first instead if not. 
+ */ + pend_first = !non_luf_pages_ok(zone); + + if (pend_first) { + page = list_first_entry_or_null(&area->pend_list[migratetype], + struct page, buddy_list); + + if (page && luf_takeoff_check(page)) + return page; + + page = list_first_entry_or_null(&area->free_list[migratetype], + struct page, buddy_list); + + if (page) + return page; + } else { + page = list_first_entry_or_null(&area->free_list[migratetype], + struct page, buddy_list); + + if (page) + return page; + page = list_first_entry_or_null(&area->pend_list[migratetype], + struct page, buddy_list); + + if (page && luf_takeoff_check(page)) + return page; + } return NULL; } @@ -1027,6 +1089,8 @@ static inline void __free_one_page(struct page *page, if (fpi_flags & FPI_TO_TAIL) to_tail = true; + else if (page_luf_key(page)) + to_tail = true; else if (is_shuffle_order(order)) to_tail = shuffle_pick_tail(); else @@ -1630,6 +1694,8 @@ static inline unsigned int expand(struct zone *zone, struct page *page, int low, unsigned int nr_added = 0; while (high > low) { + bool tail = false; + high--; size >>= 1; VM_BUG_ON_PAGE(bad_range(zone, &page[size]), &page[size]); @@ -1643,7 +1709,10 @@ static inline unsigned int expand(struct zone *zone, struct page *page, int low, if (set_page_guard(zone, &page[size], high)) continue; - __add_to_free_list(&page[size], zone, high, migratetype, false); + if (page_luf_key(&page[size])) + tail = true; + + __add_to_free_list(&page[size], zone, high, migratetype, tail); set_buddy_order(&page[size], high); nr_added += size; } @@ -1827,7 +1896,7 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order, /* Find a page of the appropriate size in the preferred list */ for (current_order = order; current_order < NR_PAGE_ORDERS; ++current_order) { area = &(zone->free_area[current_order]); - page = get_page_from_free_area(area, migratetype); + page = get_page_from_free_area(zone, area, migratetype); if (!page) continue; @@ -2269,7 +2338,8 @@ int find_suitable_fallback(struct free_area *area, unsigned int order, if (free_area_empty(area, fallback_mt)) continue; - if (luf_takeoff_no_shootdown()) + if (free_list_empty(area, fallback_mt) && + luf_takeoff_no_shootdown()) continue; if (can_steal_fallback(order, migratetype)) @@ -2373,7 +2443,7 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, struct free_area *area = &(zone->free_area[order]); int mt; - page = get_page_from_free_area(area, MIGRATE_HIGHATOMIC); + page = get_page_from_free_area(zone, area, MIGRATE_HIGHATOMIC); if (!page) continue; @@ -2511,7 +2581,7 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype, VM_BUG_ON(current_order > MAX_PAGE_ORDER); do_steal: - page = get_page_from_free_area(area, fallback_mt); + page = get_page_from_free_area(zone, area, fallback_mt); /* take off list, maybe claim block, expand remainder */ page = steal_suitable_fallback(zone, page, current_order, order, @@ -7133,6 +7203,8 @@ static void break_down_buddy_pages(struct zone *zone, struct page *page, struct page *current_buddy; while (high > low) { + bool tail = false; + high--; size >>= 1; @@ -7146,7 +7218,10 @@ static void break_down_buddy_pages(struct zone *zone, struct page *page, if (set_page_guard(zone, current_buddy, high)) continue; - add_to_free_list(current_buddy, zone, high, migratetype, false); + if (page_luf_key(current_buddy)) + tail = true; + + add_to_free_list(current_buddy, zone, high, migratetype, tail); set_buddy_order(current_buddy, high); } } diff --git a/mm/page_reporting.c b/mm/page_reporting.c 
index 03a7f5f6dc073..e152b22fbba8a 100644 --- a/mm/page_reporting.c +++ b/mm/page_reporting.c @@ -159,15 +159,17 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, struct page *page, *next; long budget; int err = 0; + bool consider_pend = false; + bool can_shootdown; /* * Perform early check, if free area is empty there is * nothing to process so we can skip this free_list. */ - if (list_empty(list)) + if (free_area_empty(area, mt)) return err; - luf_takeoff_start(); + can_shootdown = luf_takeoff_start(); spin_lock_irq(&zone->lock); /* @@ -185,14 +187,14 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, * should always be a power of 2. */ budget = DIV_ROUND_UP(area->nr_free, PAGE_REPORTING_CAPACITY * 16); - +retry: /* loop through free list adding unreported pages to sg list */ list_for_each_entry_safe(page, next, list, lru) { /* We are going to skip over the reported pages. */ if (PageReported(page)) continue; - if (!luf_takeoff_check(page)) { + if (unlikely(consider_pend && !luf_takeoff_check(page))) { VM_WARN_ON(1); continue; } @@ -205,14 +207,14 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, if (budget < 0) { atomic_set(&prdev->state, PAGE_REPORTING_REQUESTED); next = page; - break; + goto done; } /* Attempt to pull page from list and place in scatterlist */ if (*offset) { if (!__isolate_free_page(page, order, false)) { next = page; - break; + goto done; } /* Add page to scatter list */ @@ -263,9 +265,15 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, /* exit on error */ if (err) - break; + goto done; } + if (!consider_pend && can_shootdown) { + consider_pend = true; + list = &area->pend_list[mt]; + goto retry; + } +done: /* Rotate any leftover pages to the head of the freelist */ if (!list_entry_is_head(next, list, lru) && !list_is_first(&next->lru, list)) list_rotate_to_front(&next->lru, list); diff --git a/mm/vmstat.c b/mm/vmstat.c index 16bfe1c694dd4..5ae5ac9f0a4a9 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1581,6 +1581,21 @@ static void pagetypeinfo_showfree_print(struct seq_file *m, break; } } + list_for_each(curr, &area->pend_list[mtype]) { + /* + * Cap the pend_list iteration because it might + * be really large and we are under a spinlock + * so a long time spent here could trigger a + * hard lockup detector. Anyway this is a + * debugging tool so knowing there is a handful + * of pages of this order should be more than + * sufficient. + */ + if (++freecount >= 100000) { + overflow = true; + break; + } + } seq_printf(m, "%s%6lu ", overflow ? 
">" : "", freecount); spin_unlock_irq(&zone->lock); cond_resched(); From patchwork Wed Feb 26 12:01:21 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Byungchul Park X-Patchwork-Id: 13992182 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id CE374C18E7C for ; Wed, 26 Feb 2025 12:02:16 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id DDD3A280027; Wed, 26 Feb 2025 07:01:53 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id D64A4280025; Wed, 26 Feb 2025 07:01:53 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BE049280027; Wed, 26 Feb 2025 07:01:53 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 9E4B4280025 for ; Wed, 26 Feb 2025 07:01:53 -0500 (EST) Received: from smtpin21.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 5DF061C9DB7 for ; Wed, 26 Feb 2025 12:01:52 +0000 (UTC) X-FDA: 83161956864.21.8EF028A Received: from invmail4.hynix.com (exvmail4.skhynix.com [166.125.252.92]) by imf13.hostedemail.com (Postfix) with ESMTP id 0649820004 for ; Wed, 26 Feb 2025 12:01:49 +0000 (UTC) Authentication-Results: imf13.hostedemail.com; dkim=none; spf=pass (imf13.hostedemail.com: domain of byungchul@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=byungchul@sk.com; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1740571310; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references; bh=et3JOkQOwZufTDAD0ecOikUnxLE/p3ZhMFaCgfThDu0=; b=XzMGA18HpaWU+4Cm6aQPPAfqSu2mkNr77XrgqxihDLSsCbTVRXZbTdayqLOxv4wJjoaFr+ +lhIP7NqHKJg3L/0JouBSH4+5X9DHY8qqOMirp7leAc1Z5rSI3u19UETak/TZ4gd5KKYWQ Et2X3AuCrgNAS/m0EDp/bYYzYz2njc8= ARC-Authentication-Results: i=1; imf13.hostedemail.com; dkim=none; spf=pass (imf13.hostedemail.com: domain of byungchul@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=byungchul@sk.com; dmarc=none ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1740571310; a=rsa-sha256; cv=none; b=37A1/Awc8FpmzgMfJbiggnmN22tm29/k8MDOsr3rO2q7+xStsK51ll3Aw2+3KnKyMuMnch sayw/uRBapSqU9HS3BNnhDX+vCtJj9W5CVjiFWdc3BLVwG+ac3lygkKLgQb6MiBF1Oo0bv +jAurZAMwYfSHNOw26ygYYJ0ln9iDTM= X-AuditID: a67dfc5b-3e1ff7000001d7ae-08-67bf02a769c9 From: Byungchul Park To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com Subject: [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 14/25] mm/rmap: recognize read-only tlb entries during batched tlb flush Date: Wed, 26 Feb 2025 21:01:21 +0900 Message-Id: <20250226120132.28469-14-byungchul@sk.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20250226120132.28469-1-byungchul@sk.com> References: <20250226113342.GB1935@system.software.com> <20250226120132.28469-1-byungchul@sk.com> 
Functionally, no change. This is a preparation for the luf mechanism,
which needs to recognize read-only tlb entries and handle them in a
different way. The newly introduced API in this patch, fold_ubc(), will
be used by the luf mechanism.
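The essence of the change is routing deferred flushes for read-only
ptes into a second batch, tlb_ubc_ro, which can later be treated
differently from the writable one. Below is a minimal userspace sketch
of that routing, with a toy cpumask standing in for the arch batch
state and the add/fold/flush helpers invented for illustration.

#include <stdbool.h>
#include <stdio.h>

struct flush_batch {
        unsigned long cpumask;    /* toy stand-in for arch batch state */
        bool flush_required;
        bool writable;            /* a dirty pte was folded in         */
};

static struct flush_batch tlb_ubc;      /* writable mappings  */
static struct flush_batch tlb_ubc_ro;   /* read-only mappings */

static void add_pending(bool pte_writable, bool pte_dirty, unsigned long cpus)
{
        /*
         * Read-only entries get their own batch so a later change can
         * defer or handle them differently from writable ones.
         */
        struct flush_batch *b = pte_writable ? &tlb_ubc : &tlb_ubc_ro;

        b->cpumask |= cpus;
        b->flush_required = true;
        if (pte_dirty)
                b->writable = true;
}

static void flush(struct flush_batch *b, const char *name)
{
        if (!b->flush_required)
                return;
        printf("flushing %s, toy cpumask %#lx\n", name, b->cpumask);
        *b = (struct flush_batch){ 0 };
}

int main(void)
{
        add_pending(true, true, 0x1);   /* writable, dirty pte  */
        add_pending(false, false, 0x2); /* read-only, clean pte */

        /* Like try_to_unmap_flush(): fold the ro batch in, then flush. */
        tlb_ubc.cpumask |= tlb_ubc_ro.cpumask;
        if (tlb_ubc_ro.flush_required)
                tlb_ubc.flush_required = true;
        tlb_ubc_ro = (struct flush_batch){ 0 };
        flush(&tlb_ubc, "tlb_ubc");
        return 0;
}

In the patch itself, try_to_unmap_flush() performs the equivalent fold
of tlb_ubc_ro into tlb_ubc before flushing, and
try_to_unmap_flush_dirty() checks the writable flag of both batches, so
existing callers keep their semantics.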
Signed-off-by: Byungchul Park --- include/linux/sched.h | 1 + mm/rmap.c | 16 ++++++++++++++-- 2 files changed, 15 insertions(+), 2 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index a3049ea5b3ad3..d1a3c97491ff2 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1407,6 +1407,7 @@ struct task_struct { struct tlbflush_unmap_batch tlb_ubc; struct tlbflush_unmap_batch tlb_ubc_takeoff; + struct tlbflush_unmap_batch tlb_ubc_ro; /* Cache last used pipe for splice(): */ struct pipe_inode_info *splice_pipe; diff --git a/mm/rmap.c b/mm/rmap.c index 40de03c8f73be..c9c594d73058c 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -775,6 +775,7 @@ void fold_luf_batch(struct luf_batch *dst, struct luf_batch *src) void try_to_unmap_flush_takeoff(void) { struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct tlbflush_unmap_batch *tlb_ubc_ro = ¤t->tlb_ubc_ro; struct tlbflush_unmap_batch *tlb_ubc_takeoff = ¤t->tlb_ubc_takeoff; if (!tlb_ubc_takeoff->flush_required) @@ -789,6 +790,9 @@ void try_to_unmap_flush_takeoff(void) if (arch_tlbbatch_done(&tlb_ubc->arch, &tlb_ubc_takeoff->arch)) reset_batch(tlb_ubc); + if (arch_tlbbatch_done(&tlb_ubc_ro->arch, &tlb_ubc_takeoff->arch)) + reset_batch(tlb_ubc_ro); + reset_batch(tlb_ubc_takeoff); } @@ -801,7 +805,9 @@ void try_to_unmap_flush_takeoff(void) void try_to_unmap_flush(void) { struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct tlbflush_unmap_batch *tlb_ubc_ro = ¤t->tlb_ubc_ro; + fold_batch(tlb_ubc, tlb_ubc_ro, true); if (!tlb_ubc->flush_required) return; @@ -813,8 +819,9 @@ void try_to_unmap_flush(void) void try_to_unmap_flush_dirty(void) { struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct tlbflush_unmap_batch *tlb_ubc_ro = ¤t->tlb_ubc_ro; - if (tlb_ubc->writable) + if (tlb_ubc->writable || tlb_ubc_ro->writable) try_to_unmap_flush(); } @@ -831,13 +838,18 @@ void try_to_unmap_flush_dirty(void) static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval, unsigned long start, unsigned long end) { - struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct tlbflush_unmap_batch *tlb_ubc; int batch; bool writable = pte_dirty(pteval); if (!pte_accessible(mm, pteval)) return; + if (pte_write(pteval)) + tlb_ubc = ¤t->tlb_ubc; + else + tlb_ubc = ¤t->tlb_ubc_ro; + arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, start, end); tlb_ubc->flush_required = true; From patchwork Wed Feb 26 12:01:22 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Byungchul Park X-Patchwork-Id: 13992193 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id F0965C021BF for ; Wed, 26 Feb 2025 12:02:45 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1805C280032; Wed, 26 Feb 2025 07:02:02 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 10A5E28002D; Wed, 26 Feb 2025 07:02:02 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E768B280032; Wed, 26 Feb 2025 07:02:01 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id B878428002D for ; Wed, 26 Feb 2025 07:02:01 -0500 (EST) Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com 
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com,
    mgorman@techsingularity.net, hughd@google.com, willy@infradead.org,
    david@redhat.com, peterz@infradead.org, luto@kernel.org,
    tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 15/25] fs, filemap: refactor to gather the scattered ->write_{begin,end}() calls
Date: Wed, 26 Feb 2025 21:01:22 +0900
Message-Id: <20250226120132.28469-15-byungchul@sk.com>
In-Reply-To: <20250226120132.28469-1-byungchul@sk.com>
References: <20250226113342.GB1935@system.software.com>
 <20250226120132.28469-1-byungchul@sk.com>

Functionally, no change. This is a preparation for the luf mechanism,
which requires a hook at page cache updates: the cache might hold pages
that have been mapped by some tasks, so any tlb flush needed for them
can be performed there.
Signed-off-by: Byungchul Park --- drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 11 ++++------- fs/affs/file.c | 4 ++-- fs/buffer.c | 14 ++++++-------- fs/exfat/file.c | 5 ++--- fs/ext4/verity.c | 5 ++--- fs/f2fs/super.c | 5 ++--- fs/f2fs/verity.c | 5 ++--- fs/namei.c | 5 ++--- include/linux/fs.h | 18 ++++++++++++++++++ mm/filemap.c | 5 ++--- 10 files changed, 42 insertions(+), 35 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c index ae3343c81a645..22ce009d13689 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c @@ -418,7 +418,6 @@ shmem_pwrite(struct drm_i915_gem_object *obj, const struct drm_i915_gem_pwrite *arg) { struct address_space *mapping = obj->base.filp->f_mapping; - const struct address_space_operations *aops = mapping->a_ops; char __user *user_data = u64_to_user_ptr(arg->data_ptr); u64 remain; loff_t pos; @@ -477,7 +476,7 @@ shmem_pwrite(struct drm_i915_gem_object *obj, if (err) return err; - err = aops->write_begin(obj->base.filp, mapping, pos, len, + err = mapping_write_begin(obj->base.filp, mapping, pos, len, &folio, &data); if (err < 0) return err; @@ -488,7 +487,7 @@ shmem_pwrite(struct drm_i915_gem_object *obj, pagefault_enable(); kunmap_local(vaddr); - err = aops->write_end(obj->base.filp, mapping, pos, len, + err = mapping_write_end(obj->base.filp, mapping, pos, len, len - unwritten, folio, data); if (err < 0) return err; @@ -654,7 +653,6 @@ i915_gem_object_create_shmem_from_data(struct drm_i915_private *i915, { struct drm_i915_gem_object *obj; struct file *file; - const struct address_space_operations *aops; loff_t pos; int err; @@ -666,21 +664,20 @@ i915_gem_object_create_shmem_from_data(struct drm_i915_private *i915, GEM_BUG_ON(obj->write_domain != I915_GEM_DOMAIN_CPU); file = obj->base.filp; - aops = file->f_mapping->a_ops; pos = 0; do { unsigned int len = min_t(typeof(size), size, PAGE_SIZE); struct folio *folio; void *fsdata; - err = aops->write_begin(file, file->f_mapping, pos, len, + err = mapping_write_begin(file, file->f_mapping, pos, len, &folio, &fsdata); if (err < 0) goto fail; memcpy_to_folio(folio, offset_in_folio(folio, pos), data, len); - err = aops->write_end(file, file->f_mapping, pos, len, len, + err = mapping_write_end(file, file->f_mapping, pos, len, len, folio, fsdata); if (err < 0) goto fail; diff --git a/fs/affs/file.c b/fs/affs/file.c index a5a861dd52230..10e7f53828e93 100644 --- a/fs/affs/file.c +++ b/fs/affs/file.c @@ -885,9 +885,9 @@ affs_truncate(struct inode *inode) loff_t isize = inode->i_size; int res; - res = mapping->a_ops->write_begin(NULL, mapping, isize, 0, &folio, &fsdata); + res = mapping_write_begin(NULL, mapping, isize, 0, &folio, &fsdata); if (!res) - res = mapping->a_ops->write_end(NULL, mapping, isize, 0, 0, folio, fsdata); + res = mapping_write_end(NULL, mapping, isize, 0, 0, folio, fsdata); else inode->i_size = AFFS_I(inode)->mmu_private; mark_inode_dirty(inode); diff --git a/fs/buffer.c b/fs/buffer.c index c66a59bb068b9..6655912f12c46 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -2457,7 +2457,6 @@ EXPORT_SYMBOL(block_read_full_folio); int generic_cont_expand_simple(struct inode *inode, loff_t size) { struct address_space *mapping = inode->i_mapping; - const struct address_space_operations *aops = mapping->a_ops; struct folio *folio; void *fsdata = NULL; int err; @@ -2466,11 +2465,11 @@ int generic_cont_expand_simple(struct inode *inode, loff_t size) if (err) goto out; - err = aops->write_begin(NULL, mapping, 
size, 0, &folio, &fsdata); + err = mapping_write_begin(NULL, mapping, size, 0, &folio, &fsdata); if (err) goto out; - err = aops->write_end(NULL, mapping, size, 0, 0, folio, fsdata); + err = mapping_write_end(NULL, mapping, size, 0, 0, folio, fsdata); BUG_ON(err > 0); out: @@ -2482,7 +2481,6 @@ static int cont_expand_zero(struct file *file, struct address_space *mapping, loff_t pos, loff_t *bytes) { struct inode *inode = mapping->host; - const struct address_space_operations *aops = mapping->a_ops; unsigned int blocksize = i_blocksize(inode); struct folio *folio; void *fsdata = NULL; @@ -2502,12 +2500,12 @@ static int cont_expand_zero(struct file *file, struct address_space *mapping, } len = PAGE_SIZE - zerofrom; - err = aops->write_begin(file, mapping, curpos, len, + err = mapping_write_begin(file, mapping, curpos, len, &folio, &fsdata); if (err) goto out; folio_zero_range(folio, offset_in_folio(folio, curpos), len); - err = aops->write_end(file, mapping, curpos, len, len, + err = mapping_write_end(file, mapping, curpos, len, len, folio, fsdata); if (err < 0) goto out; @@ -2535,12 +2533,12 @@ static int cont_expand_zero(struct file *file, struct address_space *mapping, } len = offset - zerofrom; - err = aops->write_begin(file, mapping, curpos, len, + err = mapping_write_begin(file, mapping, curpos, len, &folio, &fsdata); if (err) goto out; folio_zero_range(folio, offset_in_folio(folio, curpos), len); - err = aops->write_end(file, mapping, curpos, len, len, + err = mapping_write_end(file, mapping, curpos, len, len, folio, fsdata); if (err < 0) goto out; diff --git a/fs/exfat/file.c b/fs/exfat/file.c index 05b51e7217838..9a1002761f79f 100644 --- a/fs/exfat/file.c +++ b/fs/exfat/file.c @@ -539,7 +539,6 @@ static int exfat_extend_valid_size(struct file *file, loff_t new_valid_size) struct inode *inode = file_inode(file); struct exfat_inode_info *ei = EXFAT_I(inode); struct address_space *mapping = inode->i_mapping; - const struct address_space_operations *ops = mapping->a_ops; pos = ei->valid_size; while (pos < new_valid_size) { @@ -551,14 +550,14 @@ static int exfat_extend_valid_size(struct file *file, loff_t new_valid_size) if (pos + len > new_valid_size) len = new_valid_size - pos; - err = ops->write_begin(file, mapping, pos, len, &folio, NULL); + err = mapping_write_begin(file, mapping, pos, len, &folio, NULL); if (err) goto out; off = offset_in_folio(folio, pos); folio_zero_new_buffers(folio, off, off + len); - err = ops->write_end(file, mapping, pos, len, len, folio, NULL); + err = mapping_write_end(file, mapping, pos, len, len, folio, NULL); if (err < 0) goto out; pos += len; diff --git a/fs/ext4/verity.c b/fs/ext4/verity.c index d9203228ce979..64fa43f80c73e 100644 --- a/fs/ext4/verity.c +++ b/fs/ext4/verity.c @@ -68,7 +68,6 @@ static int pagecache_write(struct inode *inode, const void *buf, size_t count, loff_t pos) { struct address_space *mapping = inode->i_mapping; - const struct address_space_operations *aops = mapping->a_ops; if (pos + count > inode->i_sb->s_maxbytes) return -EFBIG; @@ -80,13 +79,13 @@ static int pagecache_write(struct inode *inode, const void *buf, size_t count, void *fsdata = NULL; int res; - res = aops->write_begin(NULL, mapping, pos, n, &folio, &fsdata); + res = mapping_write_begin(NULL, mapping, pos, n, &folio, &fsdata); if (res) return res; memcpy_to_folio(folio, offset_in_folio(folio, pos), buf, n); - res = aops->write_end(NULL, mapping, pos, n, n, folio, fsdata); + res = mapping_write_end(NULL, mapping, pos, n, n, folio, fsdata); if (res < 0) return res; 
if (res != n) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 19b67828ae325..87c26f0571dab 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -2710,7 +2710,6 @@ static ssize_t f2fs_quota_write(struct super_block *sb, int type, { struct inode *inode = sb_dqopt(sb)->files[type]; struct address_space *mapping = inode->i_mapping; - const struct address_space_operations *a_ops = mapping->a_ops; int offset = off & (sb->s_blocksize - 1); size_t towrite = len; struct folio *folio; @@ -2722,7 +2721,7 @@ static ssize_t f2fs_quota_write(struct super_block *sb, int type, tocopy = min_t(unsigned long, sb->s_blocksize - offset, towrite); retry: - err = a_ops->write_begin(NULL, mapping, off, tocopy, + err = mapping_write_begin(NULL, mapping, off, tocopy, &folio, &fsdata); if (unlikely(err)) { if (err == -ENOMEM) { @@ -2735,7 +2734,7 @@ static ssize_t f2fs_quota_write(struct super_block *sb, int type, memcpy_to_folio(folio, offset_in_folio(folio, off), data, tocopy); - a_ops->write_end(NULL, mapping, off, tocopy, tocopy, + mapping_write_end(NULL, mapping, off, tocopy, tocopy, folio, fsdata); offset = 0; towrite -= tocopy; diff --git a/fs/f2fs/verity.c b/fs/f2fs/verity.c index 2287f238ae09e..b232589546d39 100644 --- a/fs/f2fs/verity.c +++ b/fs/f2fs/verity.c @@ -72,7 +72,6 @@ static int pagecache_write(struct inode *inode, const void *buf, size_t count, loff_t pos) { struct address_space *mapping = inode->i_mapping; - const struct address_space_operations *aops = mapping->a_ops; if (pos + count > F2FS_BLK_TO_BYTES(max_file_blocks(inode))) return -EFBIG; @@ -84,13 +83,13 @@ static int pagecache_write(struct inode *inode, const void *buf, size_t count, void *fsdata = NULL; int res; - res = aops->write_begin(NULL, mapping, pos, n, &folio, &fsdata); + res = mapping_write_begin(NULL, mapping, pos, n, &folio, &fsdata); if (res) return res; memcpy_to_folio(folio, offset_in_folio(folio, pos), buf, n); - res = aops->write_end(NULL, mapping, pos, n, n, folio, fsdata); + res = mapping_write_end(NULL, mapping, pos, n, n, folio, fsdata); if (res < 0) return res; if (res != n) diff --git a/fs/namei.c b/fs/namei.c index 3ab9440c5b931..e1c6d28c560da 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -5409,7 +5409,6 @@ EXPORT_SYMBOL(page_readlink); int page_symlink(struct inode *inode, const char *symname, int len) { struct address_space *mapping = inode->i_mapping; - const struct address_space_operations *aops = mapping->a_ops; bool nofs = !mapping_gfp_constraint(mapping, __GFP_FS); struct folio *folio; void *fsdata = NULL; @@ -5419,7 +5418,7 @@ int page_symlink(struct inode *inode, const char *symname, int len) retry: if (nofs) flags = memalloc_nofs_save(); - err = aops->write_begin(NULL, mapping, 0, len-1, &folio, &fsdata); + err = mapping_write_begin(NULL, mapping, 0, len-1, &folio, &fsdata); if (nofs) memalloc_nofs_restore(flags); if (err) @@ -5427,7 +5426,7 @@ int page_symlink(struct inode *inode, const char *symname, int len) memcpy(folio_address(folio), symname, len - 1); - err = aops->write_end(NULL, mapping, 0, len - 1, len - 1, + err = mapping_write_end(NULL, mapping, 0, len - 1, len - 1, folio, fsdata); if (err < 0) goto fail; diff --git a/include/linux/fs.h b/include/linux/fs.h index 2c3b2f8a621f7..820ff4752249e 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -531,6 +531,24 @@ struct address_space { #define PAGECACHE_TAG_WRITEBACK XA_MARK_1 #define PAGECACHE_TAG_TOWRITE XA_MARK_2 +static inline int mapping_write_begin(struct file *file, + struct address_space *mapping, + loff_t pos, unsigned len, + 
struct folio **foliop, void **fsdata) +{ + return mapping->a_ops->write_begin(file, mapping, pos, len, foliop, + fsdata); +} + +static inline int mapping_write_end(struct file *file, + struct address_space *mapping, + loff_t pos, unsigned len, unsigned copied, + struct folio *folio, void *fsdata) +{ + return mapping->a_ops->write_end(file, mapping, pos, len, copied, + folio, fsdata); +} + /* * Returns true if any of the pages in the mapping are marked with the tag. */ diff --git a/mm/filemap.c b/mm/filemap.c index c6650de837d06..1c6fda5a43020 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -4141,7 +4141,6 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i) struct file *file = iocb->ki_filp; loff_t pos = iocb->ki_pos; struct address_space *mapping = file->f_mapping; - const struct address_space_operations *a_ops = mapping->a_ops; size_t chunk = mapping_max_folio_size(mapping); long status = 0; ssize_t written = 0; @@ -4175,7 +4174,7 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i) break; } - status = a_ops->write_begin(file, mapping, pos, bytes, + status = mapping_write_begin(file, mapping, pos, bytes, &folio, &fsdata); if (unlikely(status < 0)) break; @@ -4190,7 +4189,7 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i) copied = copy_folio_from_iter_atomic(folio, offset, bytes, i); flush_dcache_folio(folio); - status = a_ops->write_end(file, mapping, pos, bytes, copied, + status = mapping_write_end(file, mapping, pos, bytes, copied, folio, fsdata); if (unlikely(status != copied)) { iov_iter_revert(i, copied - max(status, 0L)); From patchwork Wed Feb 26 12:01:23 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Byungchul Park X-Patchwork-Id: 13992187 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C93C0C021BF for ; Wed, 26 Feb 2025 12:02:29 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E52F328002C; Wed, 26 Feb 2025 07:01:56 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id DDD04280028; Wed, 26 Feb 2025 07:01:56 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A81D428002C; Wed, 26 Feb 2025 07:01:56 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 82B11280028 for ; Wed, 26 Feb 2025 07:01:56 -0500 (EST) Received: from smtpin23.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id CA8844D113 for ; Wed, 26 Feb 2025 12:01:55 +0000 (UTC) X-FDA: 83161956990.23.23B04E3 Received: from invmail4.hynix.com (exvmail4.skhynix.com [166.125.252.92]) by imf23.hostedemail.com (Postfix) with ESMTP id 7F4E514003F for ; Wed, 26 Feb 2025 12:01:50 +0000 (UTC) Authentication-Results: imf23.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf23.hostedemail.com: domain of byungchul@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=byungchul@sk.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1740571311; a=rsa-sha256; cv=none; b=wXLSEQANQqYgm1TTS+epRu2s/sFVobknJkAqb02DYs4+LcekV4q+42PSOH8/I7x15MSd5n KFtdf/mh784y/OhGqVqv6ZjVWKgvKa3gSCR2QRBK19gQhiQ8fh819m1Qwdw3h8Ujqj1HIO +g4h2xWIO5ngL8iwW/oAZTYn/bKGY20= 
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com,
    mgorman@techsingularity.net, hughd@google.com, willy@infradead.org,
    david@redhat.com, peterz@infradead.org, luto@kernel.org,
    tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 16/25] mm: implement LUF(Lazy Unmap Flush) deferring tlb flush when folios get unmapped
Date: Wed, 26 Feb 2025 21:01:23 +0900
Message-Id: <20250226120132.28469-16-byungchul@sk.com>
In-Reply-To: <20250226120132.28469-1-byungchul@sk.com>
References: <20250226113342.GB1935@system.software.com>
 <20250226120132.28469-1-byungchul@sk.com>
A new mechanism, LUF(Lazy Unmap Flush), defers tlb flush until folios
that have been unmapped and freed eventually get allocated again. It's
safe for folios that had been mapped read-only and were unmapped, as
long as the contents of the folios don't change while staying in pcp or
buddy, so we can still read the data through the stale tlb entries.

tlb flush can be deferred when folios get unmapped as long as the
needed tlb flush is guaranteed to be performed before the folios
actually get used, and of course only if none of the corresponding ptes
have write permission. Otherwise, the system will get messed up.

To achieve that, for the folios that map only to non-writable tlb
entries, prevent tlb flush during unmapping but perform it just before
the folios actually become used, out of buddy or pcp.

However, we should cancel what LUF has pending and perform the deferred
TLB flush right away when:

   1. a writable pte is newly set through fault handler
   2. a file is updated
   3. kasan needs poisoning on free
   4. the kernel wants to init pages on free

(A conceptual sketch of this defer/cancel policy is included after the
test description below.)

No matter what type of workload is used for performance evaluation, the
result should be positive thanks to the unconditional reduction of tlb
flushes, tlb misses and interrupts. For the test, I picked one of the
most popular and heavy workloads, llama.cpp, an LLM(Large Language
Model) inference engine. The result depends on memory latency and how
often reclaim runs, which implies tlb miss overhead and how many times
unmapping happens.

In my system, the result shows:

   1. tlb shootdown interrupts are reduced about 97%.
   2. The test program runtime is reduced about 4.5%.

The test environment and the test set are as follows:

   Machine: bare metal, x86_64, Intel(R) Xeon(R) Gold 6430
   CPU: 1 socket 64 core with hyper thread on
   Numa: 2 nodes (64 CPUs DRAM 42GB, no CPUs CXL expander 98GB)
   Config: swap off, numa balancing tiering on, demotion enabled

   llama.cpp/main -m $(70G_model1) -p "who are you?" -s 1 -t 15 -n 20 &
   llama.cpp/main -m $(70G_model2) -p "who are you?" -s 1 -t 15 -n 20 &
   llama.cpp/main -m $(70G_model3) -p "who are you?" -s 1 -t 15 -n 20 &
   wait

where, -t: nr of threads, -s: seed used to make the runtime stable,
-n: nr of tokens that determines the runtime, -p: prompt to ask, -m:
LLM model to use.
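As referenced above, here is a conceptual sketch of the defer/cancel
policy (standalone C, not kernel code; folio_state, luf_defer() and
luf_cancel() are illustrative names only):

    /*
     * Toy model of the policy: a flush is deferred only for folios whose
     * mappings were all read-only, and any event that could expose stale
     * data forces the deferred flush immediately.
     */
    #include <stdbool.h>
    #include <stdio.h>

    struct folio_state {
    	bool mapped_read_only;	/* every mapping was read-only at unmap time */
    	bool flush_pending;	/* tlb flush deferred by luf */
    };

    /* Events that cancel the deferral, mirroring the list above. */
    enum luf_cancel_event {
    	WRITABLE_PTE_SET,	/* 1. fault handler installs a writable pte */
    	FILE_UPDATED,		/* 2. page cache content changes */
    	KASAN_POISON_ON_FREE,	/* 3. kasan poisons the page on free */
    	INIT_ON_FREE,		/* 4. kernel inits pages on free */
    };

    static void luf_defer(struct folio_state *f)
    {
    	if (f->mapped_read_only)
    		f->flush_pending = true;	/* stale RO entries are tolerable for now */
    }

    static void luf_cancel(struct folio_state *f, enum luf_cancel_event ev)
    {
    	if (f->flush_pending) {
    		printf("event %d: perform the deferred tlb flush now\n", (int)ev);
    		f->flush_pending = false;
    	}
    }

    int main(void)
    {
    	struct folio_state f = { .mapped_read_only = true, .flush_pending = false };

    	luf_defer(&f);			/* unmap: flush gets deferred */
    	luf_cancel(&f, FILE_UPDATED);	/* file update: flush right away */
    	return 0;
    }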
Run the test set 5 times successively with caches dropped every run via
'echo 3 > /proc/sys/vm/drop_caches'. Each inference prints its runtime
at the end of each. The results are like:

1. Runtime from the output of llama.cpp

BEFORE
------
llama_print_timings: total time = 883450.54 ms / 24 tokens
llama_print_timings: total time = 861665.91 ms / 24 tokens
llama_print_timings: total time = 898079.02 ms / 24 tokens
llama_print_timings: total time = 879897.69 ms / 24 tokens
llama_print_timings: total time = 892360.75 ms / 24 tokens
llama_print_timings: total time = 884587.85 ms / 24 tokens
llama_print_timings: total time = 861023.19 ms / 24 tokens
llama_print_timings: total time = 900022.18 ms / 24 tokens
llama_print_timings: total time = 878771.88 ms / 24 tokens
llama_print_timings: total time = 889027.98 ms / 24 tokens
llama_print_timings: total time = 880783.90 ms / 24 tokens
llama_print_timings: total time = 856475.29 ms / 24 tokens
llama_print_timings: total time = 896842.21 ms / 24 tokens
llama_print_timings: total time = 878883.53 ms / 24 tokens
llama_print_timings: total time = 890122.10 ms / 24 tokens

AFTER
-----
llama_print_timings: total time = 871060.86 ms / 24 tokens
llama_print_timings: total time = 825609.53 ms / 24 tokens
llama_print_timings: total time = 836854.81 ms / 24 tokens
llama_print_timings: total time = 843147.99 ms / 24 tokens
llama_print_timings: total time = 831426.65 ms / 24 tokens
llama_print_timings: total time = 873939.23 ms / 24 tokens
llama_print_timings: total time = 826127.69 ms / 24 tokens
llama_print_timings: total time = 835489.26 ms / 24 tokens
llama_print_timings: total time = 842589.62 ms / 24 tokens
llama_print_timings: total time = 833700.66 ms / 24 tokens
llama_print_timings: total time = 875996.19 ms / 24 tokens
llama_print_timings: total time = 826401.73 ms / 24 tokens
llama_print_timings: total time = 839341.28 ms / 24 tokens
llama_print_timings: total time = 841075.10 ms / 24 tokens
llama_print_timings: total time = 835136.41 ms / 24 tokens

2.
tlb shootdowns from 'cat /proc/interrupts' BEFORE ------ TLB: 80911532 93691786 100296251 111062810 109769109 109862429 108968588 119175230 115779676 118377498 119325266 120300143 124514185 116697222 121068466 118031913 122660681 117494403 121819907 116960596 120936335 117217061 118630217 122322724 119595577 111693298 119232201 120030377 115334687 113179982 118808254 116353592 140987367 137095516 131724276 139742240 136501150 130428761 127585535 132483981 133430250 133756207 131786710 126365824 129812539 133850040 131742690 125142213 128572830 132234350 131945922 128417707 133355434 129972846 126331823 134050849 133991626 121129038 124637283 132830916 126875507 122322440 125776487 124340278 TLB shootdowns AFTER ----- TLB: 2121206 2615108 2983494 2911950 3055086 3092672 3204894 3346082 3286744 3307310 3357296 3315940 3428034 3112596 3143325 3185551 3186493 3322314 3330523 3339663 3156064 3272070 3296309 3198962 3332662 3315870 3234467 3353240 3281234 3300666 3345452 3173097 4009196 3932215 3898735 3726531 3717982 3671726 3728788 3724613 3799147 3691764 3620630 3684655 3666688 3393974 3448651 3487593 3446357 3618418 3671920 3712949 3575264 3715385 3641513 3630897 3691047 3630690 3504933 3662647 3629926 3443044 3832970 3548813 TLB shootdowns Signed-off-by: Byungchul Park --- include/asm-generic/tlb.h | 5 ++ include/linux/fs.h | 12 +++- include/linux/mm_types.h | 6 ++ include/linux/sched.h | 9 +++ kernel/sched/core.c | 1 + mm/internal.h | 94 ++++++++++++++++++++++++- mm/memory.c | 15 ++++ mm/pgtable-generic.c | 2 + mm/rmap.c | 141 +++++++++++++++++++++++++++++++++++--- mm/truncate.c | 55 +++++++++++++-- mm/vmscan.c | 12 +++- 11 files changed, 333 insertions(+), 19 deletions(-) diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h index b35b36fa7aabf..4b7d29d8ea794 100644 --- a/include/asm-generic/tlb.h +++ b/include/asm-generic/tlb.h @@ -567,6 +567,11 @@ static inline void tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct * static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma) { + /* + * Don't leave stale tlb entries for this vma. + */ + luf_flush(0); + if (tlb->fullmm || IS_ENABLED(CONFIG_MMU_GATHER_MERGE_VMAS)) return; diff --git a/include/linux/fs.h b/include/linux/fs.h index 820ff4752249e..78aaf769d32d1 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -536,8 +536,18 @@ static inline int mapping_write_begin(struct file *file, loff_t pos, unsigned len, struct folio **foliop, void **fsdata) { - return mapping->a_ops->write_begin(file, mapping, pos, len, foliop, + int ret; + + ret = mapping->a_ops->write_begin(file, mapping, pos, len, foliop, fsdata); + + /* + * Ensure to clean stale tlb entries for this mapping. 
+ */ + if (!ret) + luf_flush(0); + + return ret; } static inline int mapping_write_end(struct file *file, diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index f52d4e49e8736..117f8e822e969 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -1353,6 +1353,12 @@ extern void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm); extern void tlb_gather_mmu_fullmm(struct mmu_gather *tlb, struct mm_struct *mm); extern void tlb_finish_mmu(struct mmu_gather *tlb); +#if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) +void luf_flush(unsigned short luf_key); +#else +static inline void luf_flush(unsigned short luf_key) {} +#endif + struct vm_fault; /** diff --git a/include/linux/sched.h b/include/linux/sched.h index d1a3c97491ff2..47a0a3ccb7b1a 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1408,6 +1408,15 @@ struct task_struct { struct tlbflush_unmap_batch tlb_ubc; struct tlbflush_unmap_batch tlb_ubc_takeoff; struct tlbflush_unmap_batch tlb_ubc_ro; + struct tlbflush_unmap_batch tlb_ubc_luf; + +#if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) + /* + * whether all the mappings of a folio during unmap are read-only + * so that luf can work on the folio + */ + bool can_luf; +#endif /* Cache last used pipe for splice(): */ struct pipe_inode_info *splice_pipe; diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 9aecd914ac691..1f4c5da800365 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5275,6 +5275,7 @@ static struct rq *finish_task_switch(struct task_struct *prev) if (mm) { membarrier_mm_sync_core_before_usermode(mm); mmdrop_lazy_tlb_sched(mm); + luf_flush(0); } if (unlikely(prev_state == TASK_DEAD)) { diff --git a/mm/internal.h b/mm/internal.h index d34fd43086d89..2429db598e265 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1619,13 +1619,105 @@ static inline bool non_luf_pages_ok(struct zone *zone) return nr_free - nr_luf_pages > min_wm; } -#else + +unsigned short fold_unmap_luf(void); + +/* + * Reset the indicator indicating there are no writable mappings at the + * beginning of every rmap traverse for unmap. luf can work only when + * all the mappings are read-only. + */ +static inline void can_luf_init(struct folio *f) +{ + if (IS_ENABLED(CONFIG_DEBUG_PAGEALLOC)) + current->can_luf = false; + /* + * Pages might get updated inside buddy. + */ + else if (want_init_on_free()) + current->can_luf = false; + /* + * Pages might get updated inside buddy. + */ + else if (!should_skip_kasan_poison(folio_page(f, 0))) + current->can_luf = false; + /* + * XXX: Remove the constraint once luf handles zone device folio. + */ + else if (unlikely(folio_is_zone_device(f))) + current->can_luf = false; + /* + * XXX: Remove the constraint once luf handles hugetlb folio. + */ + else if (unlikely(folio_test_hugetlb(f))) + current->can_luf = false; + /* + * XXX: Remove the constraint once luf handles large folio. + */ + else if (unlikely(folio_test_large(f))) + current->can_luf = false; + /* + * Can track write of anon folios through fault handler. + */ + else if (folio_test_anon(f)) + current->can_luf = true; + /* + * Can track write of file folios through page cache or truncation. + */ + else if (folio_mapping(f)) + current->can_luf = true; + /* + * For niehter anon nor file folios, do not apply luf. + */ + else + current->can_luf = false; +} + +/* + * Mark the folio is not applicable to luf once it found a writble or + * dirty pte during rmap traverse for unmap. 
+ */ +static inline void can_luf_fail(void) +{ + current->can_luf = false; +} + +/* + * Check if all the mappings are read-only. + */ +static inline bool can_luf_test(void) +{ + return current->can_luf; +} + +static inline bool can_luf_vma(struct vm_area_struct *vma) +{ + /* + * Shared region requires a medium like file to keep all the + * associated mm_struct. luf makes use of strcut address_space + * for that purpose. + */ + if (vma->vm_flags & VM_SHARED) + return !!vma->vm_file; + + /* + * Private region can be handled through its mm_struct. + */ + return true; +} +#else /* CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH */ static inline bool luf_takeoff_start(void) { return false; } static inline void luf_takeoff_end(void) {} static inline bool luf_takeoff_no_shootdown(void) { return true; } static inline bool luf_takeoff_check(struct page *page) { return true; } static inline bool luf_takeoff_check_and_fold(struct page *page) { return true; } static inline bool non_luf_pages_ok(struct zone *zone) { return true; } +static inline unsigned short fold_unmap_luf(void) { return 0; } + +static inline void can_luf_init(struct folio *f) {} +static inline void can_luf_fail(void) {} +static inline bool can_luf_test(void) { return false; } +static inline bool can_luf_vma(struct vm_area_struct *vma) { return false; } #endif /* pagewalk.c */ diff --git a/mm/memory.c b/mm/memory.c index cacf6d53bdf32..e496d8deb887f 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -6216,6 +6216,7 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, struct mm_struct *mm = vma->vm_mm; vm_fault_t ret; bool is_droppable; + bool flush = false; __set_current_state(TASK_RUNNING); @@ -6241,6 +6242,14 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, lru_gen_enter_fault(vma); + /* + * Any potential cases that make pte writable even forcely + * should be considered. + */ + if (vma->vm_flags & (VM_WRITE | VM_MAYWRITE) || + flags & FAULT_FLAG_WRITE) + flush = true; + if (unlikely(is_vm_hugetlb_page(vma))) ret = hugetlb_fault(vma->vm_mm, vma, address, flags); else @@ -6272,6 +6281,12 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, out: mm_account_fault(mm, regs, address, flags, ret); + /* + * Ensure to clean stale tlb entries for this vma. + */ + if (flush) + luf_flush(0); + return ret; } EXPORT_SYMBOL_GPL(handle_mm_fault); diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c index 5a882f2b10f90..d6678d6bac746 100644 --- a/mm/pgtable-generic.c +++ b/mm/pgtable-generic.c @@ -99,6 +99,8 @@ pte_t ptep_clear_flush(struct vm_area_struct *vma, unsigned long address, pte = ptep_get_and_clear(mm, address, ptep); if (pte_accessible(mm, pte)) flush_tlb_page(vma, address); + else + luf_flush(0); return pte; } #endif diff --git a/mm/rmap.c b/mm/rmap.c index c9c594d73058c..2191cf1d38270 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -646,7 +646,7 @@ static atomic_long_t luf_ugen = ATOMIC_LONG_INIT(LUF_UGEN_INIT); /* * Don't return invalid luf_ugen, zero. */ -static unsigned long __maybe_unused new_luf_ugen(void) +static unsigned long new_luf_ugen(void) { unsigned long ugen = atomic_long_inc_return(&luf_ugen); @@ -723,7 +723,7 @@ static atomic_t luf_kgen = ATOMIC_INIT(1); /* * Don't return invalid luf_key, zero. 
*/ -static unsigned short __maybe_unused new_luf_key(void) +static unsigned short new_luf_key(void) { unsigned short luf_key = atomic_inc_return(&luf_kgen); @@ -776,6 +776,7 @@ void try_to_unmap_flush_takeoff(void) { struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; struct tlbflush_unmap_batch *tlb_ubc_ro = ¤t->tlb_ubc_ro; + struct tlbflush_unmap_batch *tlb_ubc_luf = ¤t->tlb_ubc_luf; struct tlbflush_unmap_batch *tlb_ubc_takeoff = ¤t->tlb_ubc_takeoff; if (!tlb_ubc_takeoff->flush_required) @@ -793,9 +794,72 @@ void try_to_unmap_flush_takeoff(void) if (arch_tlbbatch_done(&tlb_ubc_ro->arch, &tlb_ubc_takeoff->arch)) reset_batch(tlb_ubc_ro); + if (arch_tlbbatch_done(&tlb_ubc_luf->arch, &tlb_ubc_takeoff->arch)) + reset_batch(tlb_ubc_luf); + reset_batch(tlb_ubc_takeoff); } +/* + * Should be called just before try_to_unmap_flush() to optimize the tlb + * shootdown using arch_tlbbatch_done(). + */ +unsigned short fold_unmap_luf(void) +{ + struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct tlbflush_unmap_batch *tlb_ubc_luf = ¤t->tlb_ubc_luf; + struct luf_batch *lb; + unsigned long new_ugen; + unsigned short new_key; + unsigned long flags; + + if (!tlb_ubc_luf->flush_required) + return 0; + + /* + * fold_unmap_luf() is always followed by try_to_unmap_flush(). + */ + if (arch_tlbbatch_done(&tlb_ubc_luf->arch, &tlb_ubc->arch)) { + tlb_ubc_luf->flush_required = false; + tlb_ubc_luf->writable = false; + } + + /* + * Check again after shrinking. + */ + if (!tlb_ubc_luf->flush_required) + return 0; + + new_ugen = new_luf_ugen(); + new_key = new_luf_key(); + + /* + * Update the next entry of luf_batch table, that is the oldest + * entry among the candidate, hopefully tlb flushes have been + * done for all of the CPUs. + */ + lb = &luf_batch[new_key]; + write_lock_irqsave(&lb->lock, flags); + __fold_luf_batch(lb, tlb_ubc_luf, new_ugen); + write_unlock_irqrestore(&lb->lock, flags); + + reset_batch(tlb_ubc_luf); + return new_key; +} + +void luf_flush(unsigned short luf_key) +{ + struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct luf_batch *lb = &luf_batch[luf_key]; + unsigned long flags; + + read_lock_irqsave(&lb->lock, flags); + fold_batch(tlb_ubc, &lb->batch, false); + read_unlock_irqrestore(&lb->lock, flags); + try_to_unmap_flush(); +} +EXPORT_SYMBOL(luf_flush); + /* * Flush TLB entries for recently unmapped pages from remote CPUs. 
It is * important if a PTE was dirty when it was unmapped that it's flushed @@ -806,8 +870,10 @@ void try_to_unmap_flush(void) { struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; struct tlbflush_unmap_batch *tlb_ubc_ro = ¤t->tlb_ubc_ro; + struct tlbflush_unmap_batch *tlb_ubc_luf = ¤t->tlb_ubc_luf; fold_batch(tlb_ubc, tlb_ubc_ro, true); + fold_batch(tlb_ubc, tlb_ubc_luf, true); if (!tlb_ubc->flush_required) return; @@ -820,8 +886,9 @@ void try_to_unmap_flush_dirty(void) { struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; struct tlbflush_unmap_batch *tlb_ubc_ro = ¤t->tlb_ubc_ro; + struct tlbflush_unmap_batch *tlb_ubc_luf = ¤t->tlb_ubc_luf; - if (tlb_ubc->writable || tlb_ubc_ro->writable) + if (tlb_ubc->writable || tlb_ubc_ro->writable || tlb_ubc_luf->writable) try_to_unmap_flush(); } @@ -836,7 +903,8 @@ void try_to_unmap_flush_dirty(void) (TLB_FLUSH_BATCH_PENDING_MASK / 2) static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval, - unsigned long start, unsigned long end) + unsigned long start, unsigned long end, + struct vm_area_struct *vma) { struct tlbflush_unmap_batch *tlb_ubc; int batch; @@ -845,7 +913,16 @@ static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval, if (!pte_accessible(mm, pteval)) return; - if (pte_write(pteval)) + if (can_luf_test()) { + /* + * luf cannot work with the folio once it found a + * writable or dirty mapping on it. + */ + if (pte_write(pteval) || !can_luf_vma(vma)) + can_luf_fail(); + } + + if (!can_luf_test()) tlb_ubc = ¤t->tlb_ubc; else tlb_ubc = ¤t->tlb_ubc_ro; @@ -853,6 +930,21 @@ static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval, arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, start, end); tlb_ubc->flush_required = true; + if (can_luf_test()) { + struct luf_batch *lb; + unsigned long flags; + + /* + * Accumulate to the 0th entry right away so that + * luf_flush(0) can be uesed to properly perform pending + * TLB flush once this unmapping is observed. + */ + lb = &luf_batch[0]; + write_lock_irqsave(&lb->lock, flags); + __fold_luf_batch(lb, tlb_ubc, new_luf_ugen()); + write_unlock_irqrestore(&lb->lock, flags); + } + /* * Ensure compiler does not re-order the setting of tlb_flush_batched * before the PTE is cleared. @@ -907,6 +999,8 @@ static bool should_defer_flush(struct mm_struct *mm, enum ttu_flags flags) * This must be called under the PTL so that an access to tlb_flush_batched * that is potentially a "reclaim vs mprotect/munmap/etc" race will synchronise * via the PTL. + * + * LUF(Lazy Unmap Flush) also relies on this for mprotect/munmap/etc. */ void flush_tlb_batched_pending(struct mm_struct *mm) { @@ -916,6 +1010,7 @@ void flush_tlb_batched_pending(struct mm_struct *mm) if (pending != flushed) { arch_flush_tlb_batched_pending(mm); + /* * If the new TLB flushing is pending during flushing, leave * mm->tlb_flush_batched as is, to avoid losing flushing. @@ -926,7 +1021,8 @@ void flush_tlb_batched_pending(struct mm_struct *mm) } #else static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval, - unsigned long start, unsigned long end) + unsigned long start, unsigned long end, + struct vm_area_struct *vma) { } @@ -1300,6 +1396,11 @@ int folio_mkclean(struct folio *folio) rmap_walk(folio, &rwc); + /* + * Ensure to clean stale tlb entries for this mapping. + */ + luf_flush(0); + return cleaned; } EXPORT_SYMBOL_GPL(folio_mkclean); @@ -2146,7 +2247,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, * and traps if the PTE is unmapped. 
*/ if (should_defer_flush(mm, flags)) - set_tlb_ubc_flush_pending(mm, pteval, address, end_addr); + set_tlb_ubc_flush_pending(mm, pteval, address, end_addr, vma); else flush_tlb_range(vma, address, end_addr); if (pte_dirty(pteval)) @@ -2329,6 +2430,8 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, mmu_notifier_invalidate_range_end(&range); + if (!ret) + can_luf_fail(); return ret; } @@ -2361,11 +2464,21 @@ void try_to_unmap(struct folio *folio, enum ttu_flags flags) .done = folio_not_mapped, .anon_lock = folio_lock_anon_vma_read, }; + struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct tlbflush_unmap_batch *tlb_ubc_ro = ¤t->tlb_ubc_ro; + struct tlbflush_unmap_batch *tlb_ubc_luf = ¤t->tlb_ubc_luf; + + can_luf_init(folio); if (flags & TTU_RMAP_LOCKED) rmap_walk_locked(folio, &rwc); else rmap_walk(folio, &rwc); + + if (can_luf_test()) + fold_batch(tlb_ubc_luf, tlb_ubc_ro, true); + else + fold_batch(tlb_ubc, tlb_ubc_ro, true); } /* @@ -2533,7 +2646,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, */ pteval = ptep_get_and_clear(mm, address, pvmw.pte); - set_tlb_ubc_flush_pending(mm, pteval, address, address + PAGE_SIZE); + set_tlb_ubc_flush_pending(mm, pteval, address, address + PAGE_SIZE, vma); } else { pteval = ptep_clear_flush(vma, address, pvmw.pte); } @@ -2669,6 +2782,8 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, mmu_notifier_invalidate_range_end(&range); + if (!ret) + can_luf_fail(); return ret; } @@ -2688,6 +2803,9 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags) .done = folio_not_mapped, .anon_lock = folio_lock_anon_vma_read, }; + struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct tlbflush_unmap_batch *tlb_ubc_ro = ¤t->tlb_ubc_ro; + struct tlbflush_unmap_batch *tlb_ubc_luf = ¤t->tlb_ubc_luf; /* * Migration always ignores mlock and only supports TTU_RMAP_LOCKED and @@ -2712,10 +2830,17 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags) if (!folio_test_ksm(folio) && folio_test_anon(folio)) rwc.invalid_vma = invalid_migration_vma; + can_luf_init(folio); + if (flags & TTU_RMAP_LOCKED) rmap_walk_locked(folio, &rwc); else rmap_walk(folio, &rwc); + + if (can_luf_test()) + fold_batch(tlb_ubc_luf, tlb_ubc_ro, true); + else + fold_batch(tlb_ubc, tlb_ubc_ro, true); } #ifdef CONFIG_DEVICE_PRIVATE diff --git a/mm/truncate.c b/mm/truncate.c index 031d0be19f42c..68c9ded2f789b 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -138,6 +138,11 @@ void folio_invalidate(struct folio *folio, size_t offset, size_t length) if (aops->invalidate_folio) aops->invalidate_folio(folio, offset, length); + + /* + * Ensure to clean stale tlb entries for this mapping. + */ + luf_flush(0); } EXPORT_SYMBOL_GPL(folio_invalidate); @@ -174,6 +179,11 @@ int truncate_inode_folio(struct address_space *mapping, struct folio *folio) truncate_cleanup_folio(folio); filemap_remove_folio(folio); + + /* + * Ensure to clean stale tlb entries for this mapping. + */ + luf_flush(0); return 0; } @@ -220,6 +230,12 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end) if (folio_needs_release(folio)) folio_invalidate(folio, offset, length); + + /* + * Ensure to clean stale tlb entries for this mapping. 
+ */ + luf_flush(0); + if (!folio_test_large(folio)) return true; @@ -289,19 +305,28 @@ EXPORT_SYMBOL(generic_error_remove_folio); */ long mapping_evict_folio(struct address_space *mapping, struct folio *folio) { + long ret = 0; + /* The page may have been truncated before it was locked */ if (!mapping) - return 0; + goto out; if (folio_test_dirty(folio) || folio_test_writeback(folio)) - return 0; + goto out; /* The refcount will be elevated if any page in the folio is mapped */ if (folio_ref_count(folio) > folio_nr_pages(folio) + folio_has_private(folio) + 1) - return 0; + goto out; if (!filemap_release_folio(folio, 0)) - return 0; + goto out; - return remove_mapping(mapping, folio); + ret = remove_mapping(mapping, folio); +out: + /* + * Ensure to clean stale tlb entries for this mapping. + */ + luf_flush(0); + + return ret; } /** @@ -341,7 +366,7 @@ void truncate_inode_pages_range(struct address_space *mapping, bool same_folio; if (mapping_empty(mapping)) - return; + goto out; /* * 'start' and 'end' always covers the range of pages to be fully @@ -429,6 +454,12 @@ void truncate_inode_pages_range(struct address_space *mapping, truncate_folio_batch_exceptionals(mapping, &fbatch, indices); folio_batch_release(&fbatch); } + +out: + /* + * Ensure to clean stale tlb entries for this mapping. + */ + luf_flush(0); } EXPORT_SYMBOL(truncate_inode_pages_range); @@ -544,6 +575,11 @@ unsigned long mapping_try_invalidate(struct address_space *mapping, folio_batch_release(&fbatch); cond_resched(); } + + /* + * Ensure to clean stale tlb entries for this mapping. + */ + luf_flush(0); return count; } @@ -648,7 +684,7 @@ int invalidate_inode_pages2_range(struct address_space *mapping, int did_range_unmap = 0; if (mapping_empty(mapping)) - return 0; + goto out; folio_batch_init(&fbatch); index = start; @@ -709,6 +745,11 @@ int invalidate_inode_pages2_range(struct address_space *mapping, if (dax_mapping(mapping)) { unmap_mapping_pages(mapping, start, end - start + 1, false); } +out: + /* + * Ensure to clean stale tlb entries for this mapping. + */ + luf_flush(0); return ret; } EXPORT_SYMBOL_GPL(invalidate_inode_pages2_range); diff --git a/mm/vmscan.c b/mm/vmscan.c index c8a995a3380ac..422b9a03a6753 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -838,6 +838,8 @@ static int __remove_mapping(struct address_space *mapping, struct folio *folio, */ long remove_mapping(struct address_space *mapping, struct folio *folio) { + long ret = 0; + if (__remove_mapping(mapping, folio, false, NULL)) { /* * Unfreezing the refcount with 1 effectively @@ -845,9 +847,15 @@ long remove_mapping(struct address_space *mapping, struct folio *folio) * atomic operation. */ folio_ref_unfreeze(folio, 1); - return folio_nr_pages(folio); + ret = folio_nr_pages(folio); } - return 0; + + /* + * Ensure to clean stale tlb entries for this mapping. 
+	 */
+	luf_flush(0);
+
+	return ret;
 }
 
 /**

From patchwork Wed Feb 26 12:01:24 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13992183
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com,
    mgorman@techsingularity.net, hughd@google.com, willy@infradead.org,
    david@redhat.com, peterz@infradead.org, luto@kernel.org,
    tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 17/25] x86/tlb, riscv/tlb, arm64/tlbflush, mm: remove cpus from tlb shootdown that already have been done
Date: Wed, 26 Feb 2025 21:01:24 +0900
Message-Id: <20250226120132.28469-17-byungchul@sk.com>
In-Reply-To: <20250226120132.28469-1-byungchul@sk.com>
References: <20250226113342.GB1935@system.software.com>
 <20250226120132.28469-1-byungchul@sk.com>
The luf mechanism performs tlb shootdown for mappings that have been unmapped in a lazy manner. However, it does not have to perform the shootdown on cpus where the required tlb flush has already been done by others since the shootdown was requested. Since luf already introduced its own generation number used as a global timestamp, luf_ugen, it is possible to selectively pick the cpus whose required tlb flush has already been performed; a rough sketch of the idea follows.
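The sketch below condenses what the per-arch helpers in the diff (arch_tlbbatch_check_done() and arch_tlbbatch_diet()) do. The helper name here is illustrative, and ugen_before() is assumed to be a wraparound-safe comparison in the style of time_before():

/* per-cpu record of the latest generation this cpu has flushed up to */
static DEFINE_PER_CPU(atomic_long_t, ugen_done);

/*
 * Drop from @mask every cpu whose recorded generation has already
 * reached @ugen; only the remaining cpus still need a shootdown.
 * Returns true when nothing is left to flush.
 */
static bool prune_flushed_cpus(struct cpumask *mask, unsigned long ugen)
{
	int cpu;

	for_each_cpu(cpu, mask) {
		unsigned long done;

		done = atomic_long_read(per_cpu_ptr(&ugen_done, cpu));
		if (!ugen_before(done, ugen))
			cpumask_clear_cpu(cpu, mask);
	}
	return cpumask_empty(mask);
}

A caller would run this on the batch's cpumask right before issuing the IPIs and skip the flush entirely when it returns true.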
This patch introduced APIs that use the generation number to select and remove those cpus so that it can perform tlb shootdown with a smaller cpumask, for all the CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH archs, x86, riscv, and arm64. Signed-off-by: Byungchul Park --- arch/arm64/include/asm/tlbflush.h | 26 +++++++ arch/riscv/include/asm/tlbflush.h | 4 ++ arch/riscv/mm/tlbflush.c | 108 ++++++++++++++++++++++++++++++ arch/x86/include/asm/tlbflush.h | 4 ++ arch/x86/mm/tlb.c | 108 ++++++++++++++++++++++++++++++ include/linux/sched.h | 1 + mm/internal.h | 4 ++ mm/page_alloc.c | 32 +++++++-- mm/rmap.c | 46 ++++++++++++- 9 files changed, 327 insertions(+), 6 deletions(-) diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h index f7036cd33e35c..ae3c981fcc218 100644 --- a/arch/arm64/include/asm/tlbflush.h +++ b/arch/arm64/include/asm/tlbflush.h @@ -347,6 +347,32 @@ static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch) dsb(ish); } +static inline bool arch_tlbbatch_check_done(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen) +{ + /* + * Nothing is needed in this architecture. + */ + return true; +} + +static inline bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen) +{ + /* + * Nothing is needed in this architecture. + */ + return true; +} + +static inline void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen) +{ + /* nothing to do */ +} + +static inline void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen) +{ + /* nothing to do */ +} + static inline void arch_tlbbatch_clear(struct arch_tlbflush_unmap_batch *batch) { /* nothing to do */ diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h index cecd8e7e2a3bd..936bf9ce0abd9 100644 --- a/arch/riscv/include/asm/tlbflush.h +++ b/arch/riscv/include/asm/tlbflush.h @@ -64,6 +64,10 @@ void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch, struct mm_struct *mm, unsigned long start, unsigned long end); void arch_flush_tlb_batched_pending(struct mm_struct *mm); void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch); +bool arch_tlbbatch_check_done(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen); +bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen); +void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen); +void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen); static inline void arch_tlbbatch_clear(struct arch_tlbflush_unmap_batch *batch) { diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c index 38f4bea8a964a..6ce44370a8e11 100644 --- a/arch/riscv/mm/tlbflush.c +++ b/arch/riscv/mm/tlbflush.c @@ -201,3 +201,111 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch) __flush_tlb_range(&batch->cpumask, FLUSH_TLB_NO_ASID, 0, FLUSH_TLB_MAX_SIZE, PAGE_SIZE); } + +static DEFINE_PER_CPU(atomic_long_t, ugen_done); + +static int __init luf_init_arch(void) +{ + int cpu; + + for_each_cpu(cpu, cpu_possible_mask) + atomic_long_set(per_cpu_ptr(&ugen_done, cpu), LUF_UGEN_INIT - 1); + + return 0; +} +early_initcall(luf_init_arch); + +/* + * batch will not be updated. 
+ */ +bool arch_tlbbatch_check_done(struct arch_tlbflush_unmap_batch *batch, + unsigned long ugen) +{ + int cpu; + + if (!ugen) + goto out; + + for_each_cpu(cpu, &batch->cpumask) { + unsigned long done; + + done = atomic_long_read(per_cpu_ptr(&ugen_done, cpu)); + if (ugen_before(done, ugen)) + return false; + } + return true; +out: + return cpumask_empty(&batch->cpumask); +} + +bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, + unsigned long ugen) +{ + int cpu; + + if (!ugen) + goto out; + + for_each_cpu(cpu, &batch->cpumask) { + unsigned long done; + + done = atomic_long_read(per_cpu_ptr(&ugen_done, cpu)); + if (!ugen_before(done, ugen)) + cpumask_clear_cpu(cpu, &batch->cpumask); + } +out: + return cpumask_empty(&batch->cpumask); +} + +void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, + unsigned long ugen) +{ + int cpu; + + if (!ugen) + return; + + for_each_cpu(cpu, &batch->cpumask) { + atomic_long_t *done = per_cpu_ptr(&ugen_done, cpu); + unsigned long old = atomic_long_read(done); + + /* + * It's racy. The race results in unnecessary tlb flush + * because of the smaller ugen_done than it should be. + * However, it's okay in terms of correctness. + */ + if (!ugen_before(old, ugen)) + continue; + + /* + * It's for optimization. Just skip on fail than retry. + */ + atomic_long_cmpxchg(done, old, ugen); + } +} + +void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen) +{ + int cpu; + + if (!ugen) + return; + + for_each_cpu(cpu, mm_cpumask(mm)) { + atomic_long_t *done = per_cpu_ptr(&ugen_done, cpu); + unsigned long old = atomic_long_read(done); + + /* + * It's racy. The race results in unnecessary tlb flush + * because of the smaller ugen_done than it should be. + * However, it's okay in terms of correctness. + */ + if (!ugen_before(old, ugen)) + continue; + + /* + * It's for optimization. Just skip on fail than retry. + */ + atomic_long_cmpxchg(done, old, ugen); + } +} diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 52c54ca68ca9e..58ad7e6989bb1 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -293,6 +293,10 @@ static inline void arch_flush_tlb_batched_pending(struct mm_struct *mm) } extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch); +extern bool arch_tlbbatch_check_done(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen); +extern bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen); +extern void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen); +extern void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen); static inline void arch_tlbbatch_clear(struct arch_tlbflush_unmap_batch *batch) { diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 523e8bb6fba1f..be6068b60c32d 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -1270,6 +1270,114 @@ void __flush_tlb_all(void) } EXPORT_SYMBOL_GPL(__flush_tlb_all); +static DEFINE_PER_CPU(atomic_long_t, ugen_done); + +static int __init luf_init_arch(void) +{ + int cpu; + + for_each_cpu(cpu, cpu_possible_mask) + atomic_long_set(per_cpu_ptr(&ugen_done, cpu), LUF_UGEN_INIT - 1); + + return 0; +} +early_initcall(luf_init_arch); + +/* + * batch will not be updated. 
+ */ +bool arch_tlbbatch_check_done(struct arch_tlbflush_unmap_batch *batch, + unsigned long ugen) +{ + int cpu; + + if (!ugen) + goto out; + + for_each_cpu(cpu, &batch->cpumask) { + unsigned long done; + + done = atomic_long_read(per_cpu_ptr(&ugen_done, cpu)); + if (ugen_before(done, ugen)) + return false; + } + return true; +out: + return cpumask_empty(&batch->cpumask); +} + +bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, + unsigned long ugen) +{ + int cpu; + + if (!ugen) + goto out; + + for_each_cpu(cpu, &batch->cpumask) { + unsigned long done; + + done = atomic_long_read(per_cpu_ptr(&ugen_done, cpu)); + if (!ugen_before(done, ugen)) + cpumask_clear_cpu(cpu, &batch->cpumask); + } +out: + return cpumask_empty(&batch->cpumask); +} + +void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, + unsigned long ugen) +{ + int cpu; + + if (!ugen) + return; + + for_each_cpu(cpu, &batch->cpumask) { + atomic_long_t *done = per_cpu_ptr(&ugen_done, cpu); + unsigned long old = atomic_long_read(done); + + /* + * It's racy. The race results in unnecessary tlb flush + * because of the smaller ugen_done than it should be. + * However, it's okay in terms of correctness. + */ + if (!ugen_before(old, ugen)) + continue; + + /* + * It's for optimization. Just skip on fail than retry. + */ + atomic_long_cmpxchg(done, old, ugen); + } +} + +void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen) +{ + int cpu; + + if (!ugen) + return; + + for_each_cpu(cpu, mm_cpumask(mm)) { + atomic_long_t *done = per_cpu_ptr(&ugen_done, cpu); + unsigned long old = atomic_long_read(done); + + /* + * It's racy. The race results in unnecessary tlb flush + * because of the smaller ugen_done than it should be. + * However, it's okay in terms of correctness. + */ + if (!ugen_before(old, ugen)) + continue; + + /* + * It's for optimization. Just skip on fail than retry. 
+ */ + atomic_long_cmpxchg(done, old, ugen); + } +} + void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch) { struct flush_tlb_info *info; diff --git a/include/linux/sched.h b/include/linux/sched.h index 47a0a3ccb7b1a..31efc88ce911a 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1403,6 +1403,7 @@ struct task_struct { #if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) int luf_no_shootdown; int luf_takeoff_started; + unsigned long luf_ugen; #endif struct tlbflush_unmap_batch tlb_ubc; diff --git a/mm/internal.h b/mm/internal.h index 2429db598e265..9fccfd38e03f0 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1276,6 +1276,7 @@ void try_to_unmap_flush(void); void try_to_unmap_flush_dirty(void); void try_to_unmap_flush_takeoff(void); void flush_tlb_batched_pending(struct mm_struct *mm); +void reset_batch(struct tlbflush_unmap_batch *batch); void fold_batch(struct tlbflush_unmap_batch *dst, struct tlbflush_unmap_batch *src, bool reset); void fold_luf_batch(struct luf_batch *dst, struct luf_batch *src); #else @@ -1291,6 +1292,9 @@ static inline void try_to_unmap_flush_takeoff(void) static inline void flush_tlb_batched_pending(struct mm_struct *mm) { } +static inline void reset_batch(struct tlbflush_unmap_batch *batch) +{ +} static inline void fold_batch(struct tlbflush_unmap_batch *dst, struct tlbflush_unmap_batch *src, bool reset) { } diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 530c5c16ab323..7b023b34d53da 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -668,9 +668,11 @@ bool luf_takeoff_start(void) */ void luf_takeoff_end(void) { + struct tlbflush_unmap_batch *tlb_ubc_takeoff = ¤t->tlb_ubc_takeoff; unsigned long flags; bool no_shootdown; bool outmost = false; + unsigned long cur_luf_ugen; local_irq_save(flags); VM_WARN_ON(!current->luf_takeoff_started); @@ -697,10 +699,19 @@ void luf_takeoff_end(void) if (no_shootdown) goto out; + cur_luf_ugen = current->luf_ugen; + + current->luf_ugen = 0; + + if (cur_luf_ugen && arch_tlbbatch_diet(&tlb_ubc_takeoff->arch, cur_luf_ugen)) + reset_batch(tlb_ubc_takeoff); + try_to_unmap_flush_takeoff(); out: - if (outmost) + if (outmost) { VM_WARN_ON(current->luf_no_shootdown); + VM_WARN_ON(current->luf_ugen); + } } /* @@ -757,6 +768,7 @@ bool luf_takeoff_check_and_fold(struct page *page) struct tlbflush_unmap_batch *tlb_ubc_takeoff = ¤t->tlb_ubc_takeoff; unsigned short luf_key = page_luf_key(page); struct luf_batch *lb; + unsigned long lb_ugen; unsigned long flags; /* @@ -770,13 +782,25 @@ bool luf_takeoff_check_and_fold(struct page *page) if (!luf_key) return true; - if (current->luf_no_shootdown) - return false; - lb = &luf_batch[luf_key]; read_lock_irqsave(&lb->lock, flags); + lb_ugen = lb->ugen; + + if (arch_tlbbatch_check_done(&lb->batch.arch, lb_ugen)) { + read_unlock_irqrestore(&lb->lock, flags); + return true; + } + + if (current->luf_no_shootdown) { + read_unlock_irqrestore(&lb->lock, flags); + return false; + } + fold_batch(tlb_ubc_takeoff, &lb->batch, false); read_unlock_irqrestore(&lb->lock, flags); + + if (!current->luf_ugen || ugen_before(current->luf_ugen, lb_ugen)) + current->luf_ugen = lb_ugen; return true; } #endif diff --git a/mm/rmap.c b/mm/rmap.c index 2191cf1d38270..579c75f46c170 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -656,7 +656,7 @@ static unsigned long new_luf_ugen(void) return ugen; } -static void reset_batch(struct tlbflush_unmap_batch *batch) +void reset_batch(struct tlbflush_unmap_batch *batch) { arch_tlbbatch_clear(&batch->arch); batch->flush_required = false; @@ -743,8 +743,14 
@@ static void __fold_luf_batch(struct luf_batch *dst_lb, * more tlb shootdown might be needed to fulfill the newer * request. Conservertively keep the newer one. */ - if (!dst_lb->ugen || ugen_before(dst_lb->ugen, src_ugen)) + if (!dst_lb->ugen || ugen_before(dst_lb->ugen, src_ugen)) { + /* + * Good chance to shrink the batch using the old ugen. + */ + if (dst_lb->ugen && arch_tlbbatch_diet(&dst_lb->batch.arch, dst_lb->ugen)) + reset_batch(&dst_lb->batch); dst_lb->ugen = src_ugen; + } fold_batch(&dst_lb->batch, src_batch, false); } @@ -772,17 +778,45 @@ void fold_luf_batch(struct luf_batch *dst, struct luf_batch *src) read_unlock_irqrestore(&src->lock, flags); } +static unsigned long tlb_flush_start(void) +{ + /* + * Memory barrier implied in the atomic operation prevents + * reading luf_ugen from happening after the following + * tlb flush. + */ + return new_luf_ugen(); +} + +static void tlb_flush_end(struct arch_tlbflush_unmap_batch *arch, + struct mm_struct *mm, unsigned long ugen) +{ + /* + * Prevent the following marking from placing prior to the + * actual tlb flush. + */ + smp_mb(); + + if (arch) + arch_tlbbatch_mark_ugen(arch, ugen); + if (mm) + arch_mm_mark_ugen(mm, ugen); +} + void try_to_unmap_flush_takeoff(void) { struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; struct tlbflush_unmap_batch *tlb_ubc_ro = ¤t->tlb_ubc_ro; struct tlbflush_unmap_batch *tlb_ubc_luf = ¤t->tlb_ubc_luf; struct tlbflush_unmap_batch *tlb_ubc_takeoff = ¤t->tlb_ubc_takeoff; + unsigned long ugen; if (!tlb_ubc_takeoff->flush_required) return; + ugen = tlb_flush_start(); arch_tlbbatch_flush(&tlb_ubc_takeoff->arch); + tlb_flush_end(&tlb_ubc_takeoff->arch, NULL, ugen); /* * Now that tlb shootdown of tlb_ubc_takeoff has been performed, @@ -871,13 +905,17 @@ void try_to_unmap_flush(void) struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; struct tlbflush_unmap_batch *tlb_ubc_ro = ¤t->tlb_ubc_ro; struct tlbflush_unmap_batch *tlb_ubc_luf = ¤t->tlb_ubc_luf; + unsigned long ugen; fold_batch(tlb_ubc, tlb_ubc_ro, true); fold_batch(tlb_ubc, tlb_ubc_luf, true); if (!tlb_ubc->flush_required) return; + ugen = tlb_flush_start(); arch_tlbbatch_flush(&tlb_ubc->arch); + tlb_flush_end(&tlb_ubc->arch, NULL, ugen); + reset_batch(tlb_ubc); } @@ -1009,7 +1047,11 @@ void flush_tlb_batched_pending(struct mm_struct *mm) int flushed = batch >> TLB_FLUSH_BATCH_FLUSHED_SHIFT; if (pending != flushed) { + unsigned long ugen; + + ugen = tlb_flush_start(); arch_flush_tlb_batched_pending(mm); + tlb_flush_end(NULL, mm, ugen); /* * If the new TLB flushing is pending during flushing, leave From patchwork Wed Feb 26 12:01:25 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Byungchul Park X-Patchwork-Id: 13992186 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 68F8DC021BF for ; Wed, 26 Feb 2025 12:02:27 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 689D3280029; Wed, 26 Feb 2025 07:01:55 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 63979280028; Wed, 26 Feb 2025 07:01:55 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3353F28002C; Wed, 26 Feb 2025 07:01:55 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by 
kanga.kvack.org (Postfix) with ESMTP id 0146D280028 for ; Wed, 26 Feb 2025 07:01:54 -0500 (EST) From: Byungchul Park To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com Subject: [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 18/25] mm/page_alloc: retry 3 times to take pcp pages on luf check failure Date: Wed, 26 Feb 2025 21:01:25 +0900 Message-Id: <20250226120132.28469-18-byungchul@sk.com> In-Reply-To: <20250226120132.28469-1-byungchul@sk.com> References: <20250226113342.GB1935@system.software.com> <20250226120132.28469-1-byungchul@sk.com>
Signed-off-by: Byungchul Park --- mm/page_alloc.c | 24 ++++++++++++++++++++---- 1 file changed, 20 insertions(+), 4 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 7b023b34d53da..f35ae2550019f 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3384,6 +3384,12 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order, { struct page *page; + /* + * Give up taking a page from pcp if the luf check fails + * three times due to pending tlb shootdown. + */ + int try_luf_pages = 3; + do { if (list_empty(list)) { int batch = nr_pcp_alloc(pcp, zone, order); @@ -3398,11 +3404,21 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order, return NULL; } - page = list_first_entry(list, struct page, pcp_list); - if (!luf_takeoff_check_and_fold(page)) + list_for_each_entry(page, list, pcp_list) { + if (luf_takeoff_check_and_fold(page)) { + list_del(&page->pcp_list); + pcp->count -= 1 << order; + break; + } + if (!--try_luf_pages) + return NULL; + } + + /* + * If all the pages in the list fail the check...
+ */ + if (list_entry_is_head(page, list, pcp_list)) return NULL; - list_del(&page->pcp_list); - pcp->count -= 1 << order; } while (check_new_pages(page, order)); return page;
From patchwork Wed Feb 26 12:01:26 2025 X-Patchwork-Submitter: Byungchul Park X-Patchwork-Id: 13992184 From: Byungchul Park To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com Subject: [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 19/25] mm: skip luf tlb flush for luf'd mm that already has been done Date: Wed, 26 Feb 2025 21:01:26 +0900 Message-Id: <20250226120132.28469-19-byungchul@sk.com> In-Reply-To: <20250226120132.28469-1-byungchul@sk.com>
References: <20250226113342.GB1935@system.software.com> <20250226120132.28469-1-byungchul@sk.com>
The fault handler performs the tlb flush pended by luf whenever a new pte gains write permission, no matter whether the required tlb flush has already been performed or not. By storing the luf generation number, luf_ugen, in struct mm_struct, the unnecessary tlb flush can be skipped; a simplified sketch of the skip path follows.
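The essence of the skip is sketched below. This mirrors luf_flush_mm() introduced in the diff that follows; the function name here is illustrative and error handling is elided:

/*
 * Fold the mm's pending batch into the caller's batch, then flush
 * only if some cpu in it still lags behind the generation recorded
 * for this mm.
 */
static void example_flush_luf_mm(struct mm_struct *mm)
{
	struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;
	struct luf_batch *lb = &mm->luf_batch;
	unsigned long lb_ugen;
	unsigned long flags;

	read_lock_irqsave(&lb->lock, flags);
	fold_batch(tlb_ubc, &lb->batch, false);
	lb_ugen = lb->ugen;
	read_unlock_irqrestore(&lb->lock, flags);

	/* every cpu has already flushed up to lb_ugen: skip the IPIs */
	if (arch_tlbbatch_diet(&tlb_ubc->arch, lb_ugen))
		return;

	try_to_unmap_flush();
}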
Signed-off-by: Byungchul Park --- include/asm-generic/tlb.h | 2 +- include/linux/mm_types.h | 9 +++++ kernel/fork.c | 1 + kernel/sched/core.c | 2 +- mm/memory.c | 22 ++++++++++-- mm/pgtable-generic.c | 2 +- mm/rmap.c | 74 +++++++++++++++++++++++++++++++++++++-- 7 files changed, 104 insertions(+), 8 deletions(-) diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h index 4b7d29d8ea794..5be3487bd9192 100644 --- a/include/asm-generic/tlb.h +++ b/include/asm-generic/tlb.h @@ -570,7 +570,7 @@ static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vm /* * Don't leave stale tlb entries for this vma. */ - luf_flush(0); + luf_flush_vma(vma); if (tlb->fullmm || IS_ENABLED(CONFIG_MMU_GATHER_MERGE_VMAS)) return; diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 117f8e822e969..c32ef19a25056 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -39,8 +39,10 @@ struct luf_batch { unsigned long ugen; rwlock_t lock; }; +void luf_batch_init(struct luf_batch *lb); #else struct luf_batch {}; +static inline void luf_batch_init(struct luf_batch *lb) {} #endif /* @@ -1073,6 +1075,9 @@ struct mm_struct { * moving a PROT_NONE mapped page. */ atomic_t tlb_flush_pending; + + /* luf batch for this mm */ + struct luf_batch luf_batch; #ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH /* See flush_tlb_batched_pending() */ atomic_t tlb_flush_batched; @@ -1355,8 +1360,12 @@ extern void tlb_finish_mmu(struct mmu_gather *tlb); #if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) void luf_flush(unsigned short luf_key); +void luf_flush_mm(struct mm_struct *mm); +void luf_flush_vma(struct vm_area_struct *vma); #else static inline void luf_flush(unsigned short luf_key) {} +static inline void luf_flush_mm(struct mm_struct *mm) {} +static inline void luf_flush_vma(struct vm_area_struct *vma) {} #endif struct vm_fault; diff --git a/kernel/fork.c b/kernel/fork.c index 364b2d4fd3efa..15274eabc727d 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1265,6 +1265,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p, memset(&mm->rss_stat, 0, sizeof(mm->rss_stat)); spin_lock_init(&mm->page_table_lock); spin_lock_init(&mm->arg_lock); + luf_batch_init(&mm->luf_batch); mm_init_cpumask(mm); mm_init_aio(mm); mm_init_owner(mm, p); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 1f4c5da800365..ec132abbbce6e 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5275,7 +5275,7 @@ static struct rq *finish_task_switch(struct task_struct *prev) if (mm) { membarrier_mm_sync_core_before_usermode(mm); mmdrop_lazy_tlb_sched(mm); - luf_flush(0); + luf_flush_mm(mm); } if (unlikely(prev_state == TASK_DEAD)) { diff --git a/mm/memory.c b/mm/memory.c index e496d8deb887f..93e5879583b07 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -6216,6 +6216,7 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, struct mm_struct *mm = vma->vm_mm; vm_fault_t ret; bool is_droppable; + struct address_space *mapping = NULL; bool flush = false; __set_current_state(TASK_RUNNING); @@ -6247,9 +6248,17 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, * should be considered. */ if (vma->vm_flags & (VM_WRITE | VM_MAYWRITE) || - flags & FAULT_FLAG_WRITE) + flags & FAULT_FLAG_WRITE) { flush = true; + /* + * Doesn't care the !VM_SHARED cases because it won't + * update the pages that might be shared with others. 
+ */ + if (vma->vm_flags & VM_SHARED && vma->vm_file) + mapping = vma->vm_file->f_mapping; + } + if (unlikely(is_vm_hugetlb_page(vma))) ret = hugetlb_fault(vma->vm_mm, vma, address, flags); else @@ -6284,8 +6293,15 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, /* * Ensure to clean stale tlb entries for this vma. */ - if (flush) - luf_flush(0); + if (flush) { + /* + * If it has a VM_SHARED mapping, all the mms involved + * should be luf_flush'ed. + */ + if (mapping) + luf_flush(0); + luf_flush_mm(mm); + } return ret; } diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c index d6678d6bac746..545d401db82c1 100644 --- a/mm/pgtable-generic.c +++ b/mm/pgtable-generic.c @@ -100,7 +100,7 @@ pte_t ptep_clear_flush(struct vm_area_struct *vma, unsigned long address, if (pte_accessible(mm, pte)) flush_tlb_page(vma, address); else - luf_flush(0); + luf_flush_vma(vma); return pte; } #endif diff --git a/mm/rmap.c b/mm/rmap.c index 579c75f46c170..fe9c4606ae542 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -695,7 +695,7 @@ void fold_batch(struct tlbflush_unmap_batch *dst, */ struct luf_batch luf_batch[NR_LUF_BATCH]; -static void luf_batch_init(struct luf_batch *lb) +void luf_batch_init(struct luf_batch *lb) { rwlock_init(&lb->lock); reset_batch(&lb->batch); @@ -778,6 +778,31 @@ void fold_luf_batch(struct luf_batch *dst, struct luf_batch *src) read_unlock_irqrestore(&src->lock, flags); } +static void fold_luf_batch_mm(struct luf_batch *dst, + struct mm_struct *mm) +{ + unsigned long flags; + bool need_fold = false; + + read_lock_irqsave(&dst->lock, flags); + if (arch_tlbbatch_need_fold(&dst->batch.arch, mm)) + need_fold = true; + read_unlock(&dst->lock); + + write_lock(&dst->lock); + if (unlikely(need_fold)) + arch_tlbbatch_add_pending(&dst->batch.arch, mm, 0, -1UL); + + /* + * dst->ugen represents sort of request for tlb shootdown. The + * newer it is, the more tlb shootdown might be needed to + * fulfill the newer request. Keep the newest one not to miss + * necessary tlb shootdown. + */ + dst->ugen = new_luf_ugen(); + write_unlock_irqrestore(&dst->lock, flags); +} + static unsigned long tlb_flush_start(void) { /* @@ -894,6 +919,49 @@ void luf_flush(unsigned short luf_key) } EXPORT_SYMBOL(luf_flush); +void luf_flush_vma(struct vm_area_struct *vma) +{ + struct mm_struct *mm; + struct address_space *mapping = NULL; + + if (!vma) + return; + + mm = vma->vm_mm; + /* + * Doesn't care the !VM_SHARED cases because it won't + * update the pages that might be shared with others. + */ + if (vma->vm_flags & VM_SHARED && vma->vm_file) + mapping = vma->vm_file->f_mapping; + + if (mapping) + luf_flush(0); + luf_flush_mm(mm); +} + +void luf_flush_mm(struct mm_struct *mm) +{ + struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct luf_batch *lb; + unsigned long flags; + unsigned long lb_ugen; + + if (!mm) + return; + + lb = &mm->luf_batch; + read_lock_irqsave(&lb->lock, flags); + fold_batch(tlb_ubc, &lb->batch, false); + lb_ugen = lb->ugen; + read_unlock_irqrestore(&lb->lock, flags); + + if (arch_tlbbatch_diet(&tlb_ubc->arch, lb_ugen)) + return; + + try_to_unmap_flush(); +} + /* * Flush TLB entries for recently unmapped pages from remote CPUs. 
It is * important if a PTE was dirty when it was unmapped that it's flushed @@ -962,8 +1030,10 @@ static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval, if (!can_luf_test()) tlb_ubc = &current->tlb_ubc; - else + else { tlb_ubc = &current->tlb_ubc_ro; + fold_luf_batch_mm(&mm->luf_batch, mm); + } arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, start, end); tlb_ubc->flush_required = true;
From patchwork Wed Feb 26 12:01:27 2025 X-Patchwork-Submitter: Byungchul Park X-Patchwork-Id: 13992191 From: Byungchul Park To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com Subject: [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 20/25] mm, fs: skip tlb flushes
for luf'd filemap that already has been done Date: Wed, 26 Feb 2025 21:01:27 +0900 Message-Id: <20250226120132.28469-20-byungchul@sk.com> In-Reply-To: <20250226120132.28469-1-byungchul@sk.com> References: <20250226113342.GB1935@system.software.com> <20250226120132.28469-1-byungchul@sk.com>
For a luf'd filemap, tlb shootdown is performed whenever the page cache is updated, no matter whether the required tlb flushes have already been done or
not. By storing luf meta data in struct address_space and updating the luf meta data properly, we can skip unnecessary tlb flush. Signed-off-by: Byungchul Park --- fs/inode.c | 1 + include/linux/fs.h | 4 ++- include/linux/mm_types.h | 2 ++ mm/memory.c | 4 +-- mm/rmap.c | 59 +++++++++++++++++++++++++--------------- mm/truncate.c | 14 +++++----- mm/vmscan.c | 2 +- 7 files changed, 53 insertions(+), 33 deletions(-) diff --git a/fs/inode.c b/fs/inode.c index 5587aabdaa5ee..752fb2df6f3b3 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -475,6 +475,7 @@ static void __address_space_init_once(struct address_space *mapping) init_rwsem(&mapping->i_mmap_rwsem); INIT_LIST_HEAD(&mapping->i_private_list); spin_lock_init(&mapping->i_private_lock); + luf_batch_init(&mapping->luf_batch); mapping->i_mmap = RB_ROOT_CACHED; } diff --git a/include/linux/fs.h b/include/linux/fs.h index 78aaf769d32d1..a2f014b31028f 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -498,6 +498,7 @@ extern const struct address_space_operations empty_aops; * @i_private_lock: For use by the owner of the address_space. * @i_private_list: For use by the owner of the address_space. * @i_private_data: For use by the owner of the address_space. + * @luf_batch: Data to track need of tlb flush by luf. */ struct address_space { struct inode *host; @@ -519,6 +520,7 @@ struct address_space { struct list_head i_private_list; struct rw_semaphore i_mmap_rwsem; void * i_private_data; + struct luf_batch luf_batch; } __attribute__((aligned(sizeof(long)))) __randomize_layout; /* * On most architectures that alignment is already the case; but @@ -545,7 +547,7 @@ static inline int mapping_write_begin(struct file *file, * Ensure to clean stale tlb entries for this mapping. */ if (!ret) - luf_flush(0); + luf_flush_mapping(mapping); return ret; } diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index c32ef19a25056..d73a3eb0f7b21 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -1362,10 +1362,12 @@ extern void tlb_finish_mmu(struct mmu_gather *tlb); void luf_flush(unsigned short luf_key); void luf_flush_mm(struct mm_struct *mm); void luf_flush_vma(struct vm_area_struct *vma); +void luf_flush_mapping(struct address_space *mapping); #else static inline void luf_flush(unsigned short luf_key) {} static inline void luf_flush_mm(struct mm_struct *mm) {} static inline void luf_flush_vma(struct vm_area_struct *vma) {} +static inline void luf_flush_mapping(struct address_space *mapping) {} #endif struct vm_fault; diff --git a/mm/memory.c b/mm/memory.c index 93e5879583b07..62137ab258d2c 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -6296,10 +6296,10 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, if (flush) { /* * If it has a VM_SHARED mapping, all the mms involved - * should be luf_flush'ed. + * in the struct address_space should be luf_flush'ed. */ if (mapping) - luf_flush(0); + luf_flush_mapping(mapping); luf_flush_mm(mm); } diff --git a/mm/rmap.c b/mm/rmap.c index fe9c4606ae542..f5c5190be24e0 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -691,7 +691,7 @@ void fold_batch(struct tlbflush_unmap_batch *dst, #define NR_LUF_BATCH (1 << (sizeof(short) * 8)) /* - * Use 0th entry as accumulated batch. + * XXX: Reserve the 0th entry for later use. 
*/ struct luf_batch luf_batch[NR_LUF_BATCH]; @@ -936,7 +936,7 @@ void luf_flush_vma(struct vm_area_struct *vma) mapping = vma->vm_file->f_mapping; if (mapping) - luf_flush(0); + luf_flush_mapping(mapping); luf_flush_mm(mm); } @@ -962,6 +962,29 @@ void luf_flush_mm(struct mm_struct *mm) try_to_unmap_flush(); } +void luf_flush_mapping(struct address_space *mapping) +{ + struct tlbflush_unmap_batch *tlb_ubc = ¤t->tlb_ubc; + struct luf_batch *lb; + unsigned long flags; + unsigned long lb_ugen; + + if (!mapping) + return; + + lb = &mapping->luf_batch; + read_lock_irqsave(&lb->lock, flags); + fold_batch(tlb_ubc, &lb->batch, false); + lb_ugen = lb->ugen; + read_unlock_irqrestore(&lb->lock, flags); + + if (arch_tlbbatch_diet(&tlb_ubc->arch, lb_ugen)) + return; + + try_to_unmap_flush(); +} +EXPORT_SYMBOL(luf_flush_mapping); + /* * Flush TLB entries for recently unmapped pages from remote CPUs. It is * important if a PTE was dirty when it was unmapped that it's flushed @@ -1010,7 +1033,8 @@ void try_to_unmap_flush_dirty(void) static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval, unsigned long start, unsigned long end, - struct vm_area_struct *vma) + struct vm_area_struct *vma, + struct address_space *mapping) { struct tlbflush_unmap_batch *tlb_ubc; int batch; @@ -1032,27 +1056,15 @@ static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval, tlb_ubc = ¤t->tlb_ubc; else { tlb_ubc = ¤t->tlb_ubc_ro; + fold_luf_batch_mm(&mm->luf_batch, mm); + if (mapping) + fold_luf_batch_mm(&mapping->luf_batch, mm); } arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, start, end); tlb_ubc->flush_required = true; - if (can_luf_test()) { - struct luf_batch *lb; - unsigned long flags; - - /* - * Accumulate to the 0th entry right away so that - * luf_flush(0) can be uesed to properly perform pending - * TLB flush once this unmapping is observed. - */ - lb = &luf_batch[0]; - write_lock_irqsave(&lb->lock, flags); - __fold_luf_batch(lb, tlb_ubc, new_luf_ugen()); - write_unlock_irqrestore(&lb->lock, flags); - } - /* * Ensure compiler does not re-order the setting of tlb_flush_batched * before the PTE is cleared. @@ -1134,7 +1146,8 @@ void flush_tlb_batched_pending(struct mm_struct *mm) #else static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval, unsigned long start, unsigned long end, - struct vm_area_struct *vma) + struct vm_area_struct *vma, + struct address_space *mapping) { } @@ -1511,7 +1524,7 @@ int folio_mkclean(struct folio *folio) /* * Ensure to clean stale tlb entries for this mapping. */ - luf_flush(0); + luf_flush_mapping(mapping); return cleaned; } @@ -2198,6 +2211,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, unsigned long nr_pages = 1, end_addr; unsigned long pfn; unsigned long hsz = 0; + struct address_space *mapping = folio_mapping(folio); /* * When racing against e.g. zap_pte_range() on another cpu, @@ -2359,7 +2373,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, * and traps if the PTE is unmapped. 
*/ if (should_defer_flush(mm, flags)) - set_tlb_ubc_flush_pending(mm, pteval, address, end_addr, vma); + set_tlb_ubc_flush_pending(mm, pteval, address, end_addr, vma, mapping); else flush_tlb_range(vma, address, end_addr); if (pte_dirty(pteval)) @@ -2611,6 +2625,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, enum ttu_flags flags = (enum ttu_flags)(long)arg; unsigned long pfn; unsigned long hsz = 0; + struct address_space *mapping = folio_mapping(folio); /* * When racing against e.g. zap_pte_range() on another cpu, @@ -2758,7 +2773,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, */ pteval = ptep_get_and_clear(mm, address, pvmw.pte); - set_tlb_ubc_flush_pending(mm, pteval, address, address + PAGE_SIZE, vma); + set_tlb_ubc_flush_pending(mm, pteval, address, address + PAGE_SIZE, vma, mapping); } else { pteval = ptep_clear_flush(vma, address, pvmw.pte); } diff --git a/mm/truncate.c b/mm/truncate.c index 68c9ded2f789b..8c133b93cefe8 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -142,7 +142,7 @@ void folio_invalidate(struct folio *folio, size_t offset, size_t length) /* * Ensure to clean stale tlb entries for this mapping. */ - luf_flush(0); + luf_flush_mapping(folio->mapping); } EXPORT_SYMBOL_GPL(folio_invalidate); @@ -183,7 +183,7 @@ int truncate_inode_folio(struct address_space *mapping, struct folio *folio) /* * Ensure to clean stale tlb entries for this mapping. */ - luf_flush(0); + luf_flush_mapping(mapping); return 0; } @@ -234,7 +234,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end) /* * Ensure to clean stale tlb entries for this mapping. */ - luf_flush(0); + luf_flush_mapping(folio->mapping); if (!folio_test_large(folio)) return true; @@ -324,7 +324,7 @@ long mapping_evict_folio(struct address_space *mapping, struct folio *folio) /* * Ensure to clean stale tlb entries for this mapping. */ - luf_flush(0); + luf_flush_mapping(mapping); return ret; } @@ -459,7 +459,7 @@ void truncate_inode_pages_range(struct address_space *mapping, /* * Ensure to clean stale tlb entries for this mapping. */ - luf_flush(0); + luf_flush_mapping(mapping); } EXPORT_SYMBOL(truncate_inode_pages_range); @@ -579,7 +579,7 @@ unsigned long mapping_try_invalidate(struct address_space *mapping, /* * Ensure to clean stale tlb entries for this mapping. */ - luf_flush(0); + luf_flush_mapping(mapping); return count; } @@ -749,7 +749,7 @@ int invalidate_inode_pages2_range(struct address_space *mapping, /* * Ensure to clean stale tlb entries for this mapping. */ - luf_flush(0); + luf_flush_mapping(mapping); return ret; } EXPORT_SYMBOL_GPL(invalidate_inode_pages2_range); diff --git a/mm/vmscan.c b/mm/vmscan.c index 422b9a03a6753..f145c09629b97 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -853,7 +853,7 @@ long remove_mapping(struct address_space *mapping, struct folio *folio) /* * Ensure to clean stale tlb entries for this mapping. 
*/ - luf_flush(0); + luf_flush_mapping(mapping); return ret; }
From patchwork Wed Feb 26 12:01:28 2025 X-Patchwork-Submitter: Byungchul Park X-Patchwork-Id: 13992188 From: Byungchul Park To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com Subject: [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 21/25] mm: perform luf tlb shootdown per zone in batched manner Date: Wed, 26 Feb 2025 21:01:28 +0900 Message-Id: <20250226120132.28469-21-byungchul@sk.com> In-Reply-To: <20250226120132.28469-1-byungchul@sk.com> References: <20250226113342.GB1935@system.software.com> <20250226120132.28469-1-byungchul@sk.com>
X-CFilter-Loop: Reflected
X-Rspam-User:
X-Rspamd-Server: rspam11
X-Rspamd-Queue-Id: 187D380030
X-Stat-Signature: 3f1ksha6cu7iwqzpgo3x4bj8xdzrdk8n
X-HE-Tag: 1740571312-968631
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: List-Subscribe: List-Unsubscribe: 

Each luf page in buddy has its pending tlb shootdown information and
performs the corresponding tlb shootdown on exit from buddy.  However,
every exit from buddy causes small but frequent IPIs.  Even though the
total number of IPIs gets reduced, unnecessary waits on conflicting CPUs
in the IPI handler have been observed via perf profiling.

Thus, perform the luf tlb shootdown per zone in a batched manner when
pages exit from buddy, so as to avoid frequent IPIs.
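The reconstruction of a page's zone generation from the 16 bits kept in
struct page is the subtle part of this change, so here is a minimal
userspace sketch of the arithmetic used by the page_zone_ugen() helper
added below.  The two generation values are passed in directly instead
of being read from struct zone / struct page, and ugen_before() is an
assumption here (the usual wraparound-safe "a comes before b" test);
its actual body is not shown in this patch.

#include <stdbool.h>
#include <stdio.h>
#include <limits.h>

/* assumed wraparound rule: a is before b iff (a - b) is negative */
static bool ugen_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* rebuild a full generation from the 16 bits stored in the page */
static unsigned long rebuild_zone_ugen(unsigned long zone_ugen,
				       unsigned short short_zone_ugen)
{
	unsigned long cand1, cand2;

	/* zero is reserved to mean "no zone_ugen recorded" */
	if (!short_zone_ugen)
		return 0;

	/* candidate in the zone's current 64K window ... */
	cand1 = (zone_ugen & ~(unsigned long)USHRT_MAX) | short_zone_ugen;
	/* ... and the candidate one window earlier */
	cand2 = cand1 - USHRT_MAX - 1;

	/* return the biggest candidate that is not after zone_ugen */
	if (!ugen_before(zone_ugen, cand1))
		return cand1;
	return cand2;
}

int main(void)
{
	/* low 16 bits do not exceed the zone's: same window, 0x10003 */
	printf("%#lx\n", rebuild_zone_ugen(0x10005, 0x0003));
	/* low 16 bits exceed the zone's: previous window, 0xfffe */
	printf("%#lx\n", rebuild_zone_ugen(0x10005, 0xfffe));
	return 0;
}

Either way the result is never after the zone's current zone_ugen, which
is what lets luf_takeoff_check() compare it against zone_ugen_done.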
Signed-off-by: Byungchul Park --- include/linux/mm.h | 44 ++++- include/linux/mm_types.h | 19 +- include/linux/mmzone.h | 9 + include/linux/sched.h | 2 + mm/compaction.c | 10 +- mm/internal.h | 13 +- mm/mm_init.c | 5 + mm/page_alloc.c | 363 +++++++++++++++++++++++++++++++-------- mm/page_reporting.c | 9 +- mm/rmap.c | 6 +- 10 files changed, 383 insertions(+), 97 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 74a37cb132caa..2fa5185880105 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -4240,12 +4240,16 @@ int arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *st int arch_set_shadow_stack_status(struct task_struct *t, unsigned long status); int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status); -#if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) /* * luf_ugen will start with 2 so that 1 can be regarded as a passed one. */ #define LUF_UGEN_INIT 2 +/* + * zone_ugen will start with 2 so that 1 can be regarded as done. + */ +#define ZONE_UGEN_INIT 2 +#if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) static inline bool ugen_before(unsigned long a, unsigned long b) { /* @@ -4256,7 +4260,11 @@ static inline bool ugen_before(unsigned long a, unsigned long b) static inline unsigned long next_ugen(unsigned long ugen) { - if (ugen + 1) + /* + * Avoid zero even in unsigned short range so as to treat + * '(unsigned short)ugen == 0' as invalid. + */ + if ((unsigned short)(ugen + 1)) return ugen + 1; /* * Avoid invalid ugen, zero. @@ -4266,7 +4274,11 @@ static inline unsigned long next_ugen(unsigned long ugen) static inline unsigned long prev_ugen(unsigned long ugen) { - if (ugen - 1) + /* + * Avoid zero even in unsigned short range so as to treat + * '(unsigned short)ugen == 0' as invalid. + */ + if ((unsigned short)(ugen - 1)) return ugen - 1; /* * Avoid invalid ugen, zero. @@ -4274,4 +4286,30 @@ static inline unsigned long prev_ugen(unsigned long ugen) return ugen - 2; } #endif + +/* + * return the biggest ugen but it should be before the real zone_ugen. + */ +static inline unsigned long page_zone_ugen(struct zone *zone, struct page *page) +{ + unsigned long zone_ugen = zone->zone_ugen; + unsigned short short_zone_ugen = page->zone_ugen; + unsigned long cand1, cand2; + + if (!short_zone_ugen) + return 0; + + cand1 = (zone_ugen & ~(unsigned long)USHRT_MAX) | short_zone_ugen; + cand2 = cand1 - USHRT_MAX - 1; + + if (!ugen_before(zone_ugen, cand1)) + return cand1; + + return cand2; +} + +static inline void set_page_zone_ugen(struct page *page, unsigned short zone_ugen) +{ + page->zone_ugen = zone_ugen; +} #endif /* _LINUX_MM_H */ diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index d73a3eb0f7b21..a1d80ffafe338 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -133,11 +133,20 @@ struct page { */ unsigned short order; - /* - * For tracking need of tlb flush, - * by luf(lazy unmap flush). - */ - unsigned short luf_key; + union { + /* + * For tracking need of + * tlb flush, by + * luf(lazy unmap flush). + */ + unsigned short luf_key; + + /* + * Casted zone_ugen with + * unsigned short. 
+ */ + unsigned short zone_ugen; + }; }; }; }; diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 9294cbbe698fc..3f2a79631fedf 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -117,6 +117,7 @@ extern int page_group_by_mobility_disabled; struct free_area { struct list_head free_list[MIGRATE_TYPES]; struct list_head pend_list[MIGRATE_TYPES]; + unsigned long pend_zone_ugen[MIGRATE_TYPES]; unsigned long nr_free; }; @@ -1017,6 +1018,14 @@ struct zone { atomic_long_t vm_numa_event[NR_VM_NUMA_EVENT_ITEMS]; /* Count pages that need tlb shootdown on allocation */ atomic_long_t nr_luf_pages; + /* Generation number for that tlb shootdown has been done */ + unsigned long zone_ugen_done; + /* Generation number to control zone batched tlb shootdown */ + unsigned long zone_ugen; + /* Approximate latest luf_ugen that have ever entered */ + unsigned long luf_ugen; + /* Accumulated tlb batch for this zone */ + struct tlbflush_unmap_batch zone_batch; } ____cacheline_internodealigned_in_smp; enum pgdat_flags { diff --git a/include/linux/sched.h b/include/linux/sched.h index 31efc88ce911a..96375274d0335 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1404,6 +1404,8 @@ struct task_struct { int luf_no_shootdown; int luf_takeoff_started; unsigned long luf_ugen; + unsigned long zone_ugen; + unsigned long wait_zone_ugen; #endif struct tlbflush_unmap_batch tlb_ubc; diff --git a/mm/compaction.c b/mm/compaction.c index 5dfa53252d75b..c87a1803b10e2 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -655,7 +655,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, goto isolate_fail; } - if (!luf_takeoff_check(page)) + if (!luf_takeoff_check(cc->zone, page)) goto isolate_fail; /* Found a free page, will break it into order-0 pages */ @@ -691,7 +691,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, /* * Check and flush before using the pages taken off. */ - luf_takeoff_end(); + luf_takeoff_end(cc->zone); /* * Be careful to not go outside of the pageblock. @@ -1613,7 +1613,7 @@ static void fast_isolate_freepages(struct compact_control *cc) order_scanned++; nr_scanned++; - if (unlikely(consider_pend && !luf_takeoff_check(freepage))) + if (unlikely(consider_pend && !luf_takeoff_check(cc->zone, freepage))) goto scan_next; pfn = page_to_pfn(freepage); @@ -1681,7 +1681,7 @@ static void fast_isolate_freepages(struct compact_control *cc) /* * Check and flush before using the pages taken off. 
*/ - luf_takeoff_end(); + luf_takeoff_end(cc->zone); /* Skip fast search if enough freepages isolated */ if (cc->nr_freepages >= cc->nr_migratepages) @@ -2419,7 +2419,7 @@ static enum compact_result compact_finished(struct compact_control *cc) */ luf_takeoff_start(); ret = __compact_finished(cc); - luf_takeoff_end(); + luf_takeoff_end(cc->zone); trace_mm_compaction_finished(cc->zone, cc->order, ret); if (ret == COMPACT_NO_SUITABLE_PAGE) diff --git a/mm/internal.h b/mm/internal.h index 9fccfd38e03f0..53056ad7dade9 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1610,10 +1610,10 @@ static inline void accept_page(struct page *page) #if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) extern struct luf_batch luf_batch[]; bool luf_takeoff_start(void); -void luf_takeoff_end(void); +void luf_takeoff_end(struct zone *zone); bool luf_takeoff_no_shootdown(void); -bool luf_takeoff_check(struct page *page); -bool luf_takeoff_check_and_fold(struct page *page); +bool luf_takeoff_check(struct zone *zone, struct page *page); +bool luf_takeoff_check_and_fold(struct zone *zone, struct page *page); static inline bool non_luf_pages_ok(struct zone *zone) { @@ -1623,7 +1623,6 @@ static inline bool non_luf_pages_ok(struct zone *zone) return nr_free - nr_luf_pages > min_wm; } - unsigned short fold_unmap_luf(void); /* @@ -1711,10 +1710,10 @@ static inline bool can_luf_vma(struct vm_area_struct *vma) } #else /* CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH */ static inline bool luf_takeoff_start(void) { return false; } -static inline void luf_takeoff_end(void) {} +static inline void luf_takeoff_end(struct zone *zone) {} static inline bool luf_takeoff_no_shootdown(void) { return true; } -static inline bool luf_takeoff_check(struct page *page) { return true; } -static inline bool luf_takeoff_check_and_fold(struct page *page) { return true; } +static inline bool luf_takeoff_check(struct zone *zone, struct page *page) { return true; } +static inline bool luf_takeoff_check_and_fold(struct zone *zone, struct page *page) { return true; } static inline bool non_luf_pages_ok(struct zone *zone) { return true; } static inline unsigned short fold_unmap_luf(void) { return 0; } diff --git a/mm/mm_init.c b/mm/mm_init.c index 81c5060496112..f067d82f797be 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -1422,6 +1422,7 @@ static void __meminit zone_init_free_lists(struct zone *zone) for_each_migratetype_order(order, t) { INIT_LIST_HEAD(&zone->free_area[order].free_list[t]); INIT_LIST_HEAD(&zone->free_area[order].pend_list[t]); + zone->free_area[order].pend_zone_ugen[t] = ZONE_UGEN_INIT; zone->free_area[order].nr_free = 0; } @@ -1429,6 +1430,10 @@ static void __meminit zone_init_free_lists(struct zone *zone) INIT_LIST_HEAD(&zone->unaccepted_pages); #endif atomic_long_set(&zone->nr_luf_pages, 0); + zone->zone_ugen_done = ZONE_UGEN_INIT - 1; + zone->zone_ugen = ZONE_UGEN_INIT; + zone->luf_ugen = LUF_UGEN_INIT - 1; + reset_batch(&zone->zone_batch); } void __meminit init_currently_empty_zone(struct zone *zone, diff --git a/mm/page_alloc.c b/mm/page_alloc.c index f35ae2550019f..0f986cfa4fe39 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -663,16 +663,29 @@ bool luf_takeoff_start(void) return !no_shootdown; } +static void wait_zone_ugen_done(struct zone *zone, unsigned long zone_ugen) +{ + while (ugen_before(READ_ONCE(zone->zone_ugen_done), zone_ugen)) + cond_resched(); +} + +static void set_zone_ugen_done(struct zone *zone, unsigned long zone_ugen) +{ + WRITE_ONCE(zone->zone_ugen_done, zone_ugen); +} + /* * Should be called within 
the same context of luf_takeoff_start(). */ -void luf_takeoff_end(void) +void luf_takeoff_end(struct zone *zone) { struct tlbflush_unmap_batch *tlb_ubc_takeoff = ¤t->tlb_ubc_takeoff; unsigned long flags; bool no_shootdown; bool outmost = false; unsigned long cur_luf_ugen; + unsigned long cur_zone_ugen; + unsigned long cur_wait_zone_ugen; local_irq_save(flags); VM_WARN_ON(!current->luf_takeoff_started); @@ -700,6 +713,8 @@ void luf_takeoff_end(void) goto out; cur_luf_ugen = current->luf_ugen; + cur_zone_ugen = current->zone_ugen; + cur_wait_zone_ugen = current->wait_zone_ugen; current->luf_ugen = 0; @@ -707,10 +722,38 @@ void luf_takeoff_end(void) reset_batch(tlb_ubc_takeoff); try_to_unmap_flush_takeoff(); + + if (cur_wait_zone_ugen || cur_zone_ugen) { + /* + * pcp(zone == NULL) doesn't work with zone batch. + */ + if (zone) { + current->zone_ugen = 0; + current->wait_zone_ugen = 0; + + /* + * Guarantee that tlb shootdown required for the + * zone_ugen has been completed once observing + * 'zone_ugen_done'. + */ + smp_mb(); + + /* + * zone->zone_ugen_done should be updated + * sequentially. + */ + if (cur_wait_zone_ugen) + wait_zone_ugen_done(zone, cur_wait_zone_ugen); + if (cur_zone_ugen) + set_zone_ugen_done(zone, cur_zone_ugen); + } + } out: if (outmost) { VM_WARN_ON(current->luf_no_shootdown); VM_WARN_ON(current->luf_ugen); + VM_WARN_ON(current->zone_ugen); + VM_WARN_ON(current->wait_zone_ugen); } } @@ -741,9 +784,9 @@ bool luf_takeoff_no_shootdown(void) * Should be called with either zone lock held and irq disabled or pcp * lock held. */ -bool luf_takeoff_check(struct page *page) +bool luf_takeoff_check(struct zone *zone, struct page *page) { - unsigned short luf_key = page_luf_key(page); + unsigned long zone_ugen; /* * No way. Delimit using luf_takeoff_{start,end}(). @@ -753,7 +796,29 @@ bool luf_takeoff_check(struct page *page) return false; } - if (!luf_key) + if (!zone) { + unsigned short luf_key = page_luf_key(page); + + if (!luf_key) + return true; + + if (current->luf_no_shootdown) + return false; + + return true; + } + + zone_ugen = page_zone_ugen(zone, page); + if (!zone_ugen) + return true; + + /* + * Should not be zero since zone-zone_ugen has been updated in + * __free_one_page() -> update_zone_batch(). + */ + VM_WARN_ON(!zone->zone_ugen); + + if (!ugen_before(READ_ONCE(zone->zone_ugen_done), zone_ugen)) return true; return !current->luf_no_shootdown; @@ -763,13 +828,11 @@ bool luf_takeoff_check(struct page *page) * Should be called with either zone lock held and irq disabled or pcp * lock held. */ -bool luf_takeoff_check_and_fold(struct page *page) +bool luf_takeoff_check_and_fold(struct zone *zone, struct page *page) { struct tlbflush_unmap_batch *tlb_ubc_takeoff = ¤t->tlb_ubc_takeoff; - unsigned short luf_key = page_luf_key(page); - struct luf_batch *lb; - unsigned long lb_ugen; unsigned long flags; + unsigned long zone_ugen; /* * No way. Delimit using luf_takeoff_{start,end}(). 
@@ -779,28 +842,94 @@ bool luf_takeoff_check_and_fold(struct page *page) return false; } - if (!luf_key) - return true; + /* + * pcp case + */ + if (!zone) { + unsigned short luf_key = page_luf_key(page); + struct luf_batch *lb; + unsigned long lb_ugen; - lb = &luf_batch[luf_key]; - read_lock_irqsave(&lb->lock, flags); - lb_ugen = lb->ugen; + if (!luf_key) + return true; + + lb = &luf_batch[luf_key]; + read_lock_irqsave(&lb->lock, flags); + lb_ugen = lb->ugen; + + if (arch_tlbbatch_check_done(&lb->batch.arch, lb_ugen)) { + read_unlock_irqrestore(&lb->lock, flags); + return true; + } + + if (current->luf_no_shootdown) { + read_unlock_irqrestore(&lb->lock, flags); + return false; + } - if (arch_tlbbatch_check_done(&lb->batch.arch, lb_ugen)) { + fold_batch(tlb_ubc_takeoff, &lb->batch, false); read_unlock_irqrestore(&lb->lock, flags); + + if (!current->luf_ugen || ugen_before(current->luf_ugen, lb_ugen)) + current->luf_ugen = lb_ugen; return true; } - if (current->luf_no_shootdown) { - read_unlock_irqrestore(&lb->lock, flags); + zone_ugen = page_zone_ugen(zone, page); + if (!zone_ugen) + return true; + + /* + * Should not be zero since zone-zone_ugen has been updated in + * __free_one_page() -> update_zone_batch(). + */ + VM_WARN_ON(!zone->zone_ugen); + + if (!ugen_before(READ_ONCE(zone->zone_ugen_done), zone_ugen)) + return true; + + if (current->luf_no_shootdown) return false; - } - fold_batch(tlb_ubc_takeoff, &lb->batch, false); - read_unlock_irqrestore(&lb->lock, flags); + /* + * zone batched flush has been already set. + */ + if (current->zone_ugen) + return true; + + /* + * Others are already performing tlb shootdown for us. All we + * need is to wait for those to complete. + */ + if (zone_ugen != zone->zone_ugen) { + if (!current->wait_zone_ugen || + ugen_before(current->wait_zone_ugen, zone_ugen)) + current->wait_zone_ugen = zone_ugen; + /* + * It's the first time that zone->zone_ugen has been set to + * current->zone_ugen. current->luf_ugen also get set. + */ + } else { + current->wait_zone_ugen = prev_ugen(zone->zone_ugen); + current->zone_ugen = zone->zone_ugen; + current->luf_ugen = zone->luf_ugen; + + /* + * Now that tlb shootdown for the zone_ugen will be + * performed at luf_takeoff_end(), advance it so that + * the next zone->lock holder can efficiently avoid + * unnecessary tlb shootdown. + */ + zone->zone_ugen = next_ugen(zone->zone_ugen); - if (!current->luf_ugen || ugen_before(current->luf_ugen, lb_ugen)) - current->luf_ugen = lb_ugen; + /* + * All the luf pages will eventually become non-luf + * pages by tlb flushing at luf_takeoff_end() and, + * flush_pend_list_if_done() will empty pend_list. + */ + atomic_long_set(&zone->nr_luf_pages, 0); + fold_batch(tlb_ubc_takeoff, &zone->zone_batch, true); + } return true; } #endif @@ -822,6 +951,42 @@ static inline void account_freepages(struct zone *zone, int nr_pages, zone->nr_free_highatomic + nr_pages); } +static void flush_pend_list_if_done(struct zone *zone, + struct free_area *area, int migratetype) +{ + unsigned long zone_ugen_done = READ_ONCE(zone->zone_ugen_done); + + /* + * tlb shootdown required for the zone_ugen already has been + * done. Thus, let's move pages in pend_list to free_list to + * secure more non-luf pages. + */ + if (!ugen_before(zone_ugen_done, area->pend_zone_ugen[migratetype])) + list_splice_init(&area->pend_list[migratetype], + &area->free_list[migratetype]); +} + +#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH +/* + * Should be called with zone->lock held and irq disabled. 
+ */ +static void update_zone_batch(struct zone *zone, unsigned short luf_key) +{ + unsigned long lb_ugen; + struct luf_batch *lb = &luf_batch[luf_key]; + + read_lock(&lb->lock); + fold_batch(&zone->zone_batch, &lb->batch, false); + lb_ugen = lb->ugen; + read_unlock(&lb->lock); + + if (ugen_before(zone->luf_ugen, lb_ugen)) + zone->luf_ugen = lb_ugen; +} +#else +static void update_zone_batch(struct zone *zone, unsigned short luf_key) {} +#endif + /* Used for pages not on another list */ static inline void __add_to_free_list(struct page *page, struct zone *zone, unsigned int order, int migratetype, @@ -830,6 +995,12 @@ static inline void __add_to_free_list(struct page *page, struct zone *zone, struct free_area *area = &zone->free_area[order]; struct list_head *list; + /* + * Good chance to flush pend_list just before updating the + * {free,pend}_list. + */ + flush_pend_list_if_done(zone, area, migratetype); + VM_WARN_ONCE(get_pageblock_migratetype(page) != migratetype, "page type is %lu, passed migratetype is %d (nr=%d)\n", get_pageblock_migratetype(page), migratetype, 1 << order); @@ -839,8 +1010,9 @@ static inline void __add_to_free_list(struct page *page, struct zone *zone, * positive is okay because it will cause just additional tlb * shootdown. */ - if (page_luf_key(page)) { + if (page_zone_ugen(zone, page)) { list = &area->pend_list[migratetype]; + area->pend_zone_ugen[migratetype] = zone->zone_ugen; atomic_long_add(1 << order, &zone->nr_luf_pages); } else list = &area->free_list[migratetype]; @@ -862,6 +1034,7 @@ static inline void move_to_free_list(struct page *page, struct zone *zone, unsigned int order, int old_mt, int new_mt) { struct free_area *area = &zone->free_area[order]; + unsigned long zone_ugen = page_zone_ugen(zone, page); /* Free page moving can fail, so it happens before the type update */ VM_WARN_ONCE(get_pageblock_migratetype(page) != old_mt, @@ -878,9 +1051,12 @@ static inline void move_to_free_list(struct page *page, struct zone *zone, * positive is okay because it will cause just additional tlb * shootdown. */ - if (page_luf_key(page)) + if (zone_ugen) { list_move_tail(&page->buddy_list, &area->pend_list[new_mt]); - else + if (!area->pend_zone_ugen[new_mt] || + ugen_before(area->pend_zone_ugen[new_mt], zone_ugen)) + area->pend_zone_ugen[new_mt] = zone_ugen; + } else list_move_tail(&page->buddy_list, &area->free_list[new_mt]); account_freepages(zone, -(1 << order), old_mt); @@ -898,7 +1074,7 @@ static inline void __del_page_from_free_list(struct page *page, struct zone *zon if (page_reported(page)) __ClearPageReported(page); - if (page_luf_key(page)) + if (page_zone_ugen(zone, page)) atomic_long_sub(1 << order, &zone->nr_luf_pages); list_del(&page->buddy_list); @@ -936,29 +1112,39 @@ static inline struct page *get_page_from_free_area(struct zone *zone, */ pend_first = !non_luf_pages_ok(zone); + /* + * Good chance to flush pend_list just before updating the + * {free,pend}_list. 
+ */ + flush_pend_list_if_done(zone, area, migratetype); + if (pend_first) { page = list_first_entry_or_null(&area->pend_list[migratetype], struct page, buddy_list); - if (page && luf_takeoff_check(page)) + if (page && luf_takeoff_check(zone, page)) return page; page = list_first_entry_or_null(&area->free_list[migratetype], struct page, buddy_list); - if (page) + if (page) { + set_page_zone_ugen(page, 0); return page; + } } else { page = list_first_entry_or_null(&area->free_list[migratetype], struct page, buddy_list); - if (page) + if (page) { + set_page_zone_ugen(page, 0); return page; + } page = list_first_entry_or_null(&area->pend_list[migratetype], struct page, buddy_list); - if (page && luf_takeoff_check(page)) + if (page && luf_takeoff_check(zone, page)) return page; } return NULL; @@ -1023,6 +1209,7 @@ static inline void __free_one_page(struct page *page, unsigned long combined_pfn; struct page *buddy; bool to_tail; + unsigned long zone_ugen; VM_BUG_ON(!zone_is_initialized(zone)); VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP, page); @@ -1034,20 +1221,25 @@ static inline void __free_one_page(struct page *page, account_freepages(zone, 1 << order, migratetype); /* - * Use the page's luf_key unchanged if luf_key == 0. Worth - * noting that page_luf_key() will be 0 in most cases since it's - * initialized at free_pages_prepare(). + * Use the page's zone_ugen unchanged if luf_key == 0. Worth + * noting that page_zone_ugen() will be 0 in most cases since + * it's initialized at free_pages_prepare(). + * + * Update page's zone_ugen and zone's batch only if a valid + * luf_key was passed. */ - if (luf_key) - set_page_luf_key(page, luf_key); - else - luf_key = page_luf_key(page); + if (luf_key) { + zone_ugen = zone->zone_ugen; + set_page_zone_ugen(page, (unsigned short)zone_ugen); + update_zone_batch(zone, luf_key); + } else + zone_ugen = page_zone_ugen(zone, page); while (order < MAX_PAGE_ORDER) { int buddy_mt = migratetype; - unsigned short buddy_luf_key; + unsigned long buddy_zone_ugen; - if (!luf_key && compaction_capture(capc, page, order, migratetype)) { + if (!zone_ugen && compaction_capture(capc, page, order, migratetype)) { account_freepages(zone, -(1 << order), migratetype); return; } @@ -1080,17 +1272,15 @@ static inline void __free_one_page(struct page *page, else __del_page_from_free_list(buddy, zone, order, buddy_mt); + buddy_zone_ugen = page_zone_ugen(zone, buddy); + /* - * !buddy_luf_key && !luf_key : do nothing - * buddy_luf_key && !luf_key : luf_key = buddy_luf_key - * !buddy_luf_key && luf_key : do nothing - * buddy_luf_key && luf_key : merge two into luf_key + * if (!zone_ugen && !buddy_zone_ugen) : nothing to do + * if ( zone_ugen && !buddy_zone_ugen) : nothing to do */ - buddy_luf_key = page_luf_key(buddy); - if (buddy_luf_key && !luf_key) - luf_key = buddy_luf_key; - else if (buddy_luf_key && luf_key) - fold_luf_batch(&luf_batch[luf_key], &luf_batch[buddy_luf_key]); + if ((!zone_ugen && buddy_zone_ugen) || + ( zone_ugen && buddy_zone_ugen && ugen_before(zone_ugen, buddy_zone_ugen))) + zone_ugen = buddy_zone_ugen; if (unlikely(buddy_mt != migratetype)) { /* @@ -1103,7 +1293,7 @@ static inline void __free_one_page(struct page *page, combined_pfn = buddy_pfn & pfn; page = page + (combined_pfn - pfn); - set_page_luf_key(page, luf_key); + set_page_zone_ugen(page, zone_ugen); pfn = combined_pfn; order++; } @@ -1524,6 +1714,7 @@ static void free_pcppages_bulk(struct zone *zone, int count, do { unsigned long pfn; int mt; + unsigned short luf_key; page = 
list_last_entry(list, struct page, pcp_list); pfn = page_to_pfn(page); @@ -1534,7 +1725,16 @@ static void free_pcppages_bulk(struct zone *zone, int count, count -= nr_pages; pcp->count -= nr_pages; - __free_one_page(page, pfn, zone, order, mt, FPI_NONE, 0); + /* + * page private in pcp stores luf_key while it + * stores zone_ugen in buddy. Thus, the private + * needs to be cleared and the luf_key needs to + * be passed to buddy. + */ + luf_key = page_luf_key(page); + set_page_private(page, 0); + + __free_one_page(page, pfn, zone, order, mt, FPI_NONE, luf_key); trace_mm_page_pcpu_drain(page, order, mt); } while (count > 0 && !list_empty(list)); @@ -1579,7 +1779,15 @@ static void free_one_page(struct zone *zone, struct page *page, * valid luf_key can be passed only if order == 0. */ VM_WARN_ON(luf_key && order); - set_page_luf_key(page, luf_key); + + /* + * Update page's zone_ugen and zone's batch only if a valid + * luf_key was passed. + */ + if (luf_key) { + set_page_zone_ugen(page, (unsigned short)zone->zone_ugen); + update_zone_batch(zone, luf_key); + } split_large_buddy(zone, page, pfn, order, fpi_flags); spin_unlock_irqrestore(&zone->lock, flags); @@ -1733,7 +1941,7 @@ static inline unsigned int expand(struct zone *zone, struct page *page, int low, if (set_page_guard(zone, &page[size], high)) continue; - if (page_luf_key(&page[size])) + if (page_zone_ugen(zone, &page[size])) tail = true; __add_to_free_list(&page[size], zone, high, migratetype, tail); @@ -1751,7 +1959,7 @@ static __always_inline void page_del_and_expand(struct zone *zone, int nr_pages = 1 << high; __del_page_from_free_list(page, zone, high, migratetype); - if (unlikely(!luf_takeoff_check_and_fold(page))) + if (unlikely(!luf_takeoff_check_and_fold(zone, page))) VM_WARN_ON(1); nr_pages -= expand(zone, page, low, high, migratetype); account_freepages(zone, -nr_pages, migratetype); @@ -2280,7 +2488,7 @@ steal_suitable_fallback(struct zone *zone, struct page *page, unsigned int nr_added; del_page_from_free_list(page, zone, current_order, block_type); - if (unlikely(!luf_takeoff_check_and_fold(page))) + if (unlikely(!luf_takeoff_check_and_fold(zone, page))) VM_WARN_ON(1); change_pageblock_range(page, current_order, start_type); nr_added = expand(zone, page, order, current_order, start_type); @@ -2519,12 +2727,12 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, WARN_ON_ONCE(ret == -1); if (ret > 0) { spin_unlock_irqrestore(&zone->lock, flags); - luf_takeoff_end(); + luf_takeoff_end(zone); return ret; } } spin_unlock_irqrestore(&zone->lock, flags); - luf_takeoff_end(); + luf_takeoff_end(zone); } return false; @@ -2689,12 +2897,15 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, * pages are ordered properly. */ list_add_tail(&page->pcp_list, list); + + /* + * Reset all the luf fields. tlb shootdown will be + * performed at luf_takeoff_end() below if needed. + */ + set_page_private(page, 0); } spin_unlock_irqrestore(&zone->lock, flags); - /* - * Check and flush before using the pages taken off. 
- */ - luf_takeoff_end(); + luf_takeoff_end(zone); return i; } @@ -3208,7 +3419,7 @@ int __isolate_free_page(struct page *page, unsigned int order, bool willputback) } del_page_from_free_list(page, zone, order, mt); - if (unlikely(!willputback && !luf_takeoff_check_and_fold(page))) + if (unlikely(!willputback && !luf_takeoff_check_and_fold(zone, page))) VM_WARN_ON(1); /* @@ -3307,7 +3518,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone, if (!page) { spin_unlock_irqrestore(&zone->lock, flags); - luf_takeoff_end(); + luf_takeoff_end(zone); return NULL; } } @@ -3315,7 +3526,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone, /* * Check and flush before using the pages taken off. */ - luf_takeoff_end(); + luf_takeoff_end(zone); } while (check_new_pages(page, order)); __count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order); @@ -3405,7 +3616,7 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order, } list_for_each_entry(page, list, pcp_list) { - if (luf_takeoff_check_and_fold(page)) { + if (luf_takeoff_check_and_fold(NULL, page)) { list_del(&page->pcp_list); pcp->count -= 1 << order; break; @@ -3440,7 +3651,7 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone, pcp = pcp_spin_trylock(zone->per_cpu_pageset); if (!pcp) { pcp_trylock_finish(UP_flags); - luf_takeoff_end(); + luf_takeoff_end(NULL); return NULL; } @@ -3457,7 +3668,7 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone, /* * Check and flush before using the pages taken off. */ - luf_takeoff_end(); + luf_takeoff_end(NULL); if (page) { __count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order); zone_statistics(preferred_zone, zone, 1); @@ -3496,6 +3707,7 @@ struct page *rmqueue(struct zone *preferred_zone, migratetype); out: + /* Separate test+clear to avoid unnecessary atomics */ if ((alloc_flags & ALLOC_KSWAPD) && unlikely(test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags))) { @@ -5095,7 +5307,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid, /* * Check and flush before using the pages taken off. */ - luf_takeoff_end(); + luf_takeoff_end(NULL); __count_zid_vm_events(PGALLOC, zone_idx(zone), nr_account); zone_statistics(zonelist_zone(ac.preferred_zoneref), zone, nr_account); @@ -5105,7 +5317,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid, failed_irq: pcp_trylock_finish(UP_flags); - luf_takeoff_end(); + luf_takeoff_end(NULL); failed: page = __alloc_pages_noprof(gfp, 0, preferred_nid, nodemask); @@ -7188,7 +7400,7 @@ unsigned long __offline_isolated_pages(unsigned long start_pfn, VM_WARN_ON(get_pageblock_migratetype(page) != MIGRATE_ISOLATE); order = buddy_order(page); del_page_from_free_list(page, zone, order, MIGRATE_ISOLATE); - if (unlikely(!luf_takeoff_check_and_fold(page))) + if (unlikely(!luf_takeoff_check_and_fold(zone, page))) VM_WARN_ON(1); pfn += (1 << order); } @@ -7196,7 +7408,7 @@ unsigned long __offline_isolated_pages(unsigned long start_pfn, /* * Check and flush before using the pages taken off. 
*/ - luf_takeoff_end(); + luf_takeoff_end(zone); return end_pfn - start_pfn - already_offline; } @@ -7258,7 +7470,7 @@ static void break_down_buddy_pages(struct zone *zone, struct page *page, if (set_page_guard(zone, current_buddy, high)) continue; - if (page_luf_key(current_buddy)) + if (page_zone_ugen(zone, current_buddy)) tail = true; add_to_free_list(current_buddy, zone, high, migratetype, tail); @@ -7290,7 +7502,7 @@ bool take_page_off_buddy(struct page *page) del_page_from_free_list(page_head, zone, page_order, migratetype); - if (unlikely(!luf_takeoff_check_and_fold(page_head))) + if (unlikely(!luf_takeoff_check_and_fold(zone, page_head))) VM_WARN_ON(1); break_down_buddy_pages(zone, page_head, page, 0, page_order, migratetype); @@ -7306,7 +7518,7 @@ bool take_page_off_buddy(struct page *page) /* * Check and flush before using the pages taken off. */ - luf_takeoff_end(); + luf_takeoff_end(zone); return ret; } @@ -7325,6 +7537,13 @@ bool put_page_back_buddy(struct page *page) int migratetype = get_pfnblock_migratetype(page, pfn); ClearPageHWPoisonTakenOff(page); + + /* + * Reset all the luf fields. tlb shootdown has already + * been performed by take_page_off_buddy(). + */ + set_page_private(page, 0); + __free_one_page(page, pfn, zone, 0, migratetype, FPI_NONE, 0); if (TestClearPageHWPoison(page)) { ret = true; diff --git a/mm/page_reporting.c b/mm/page_reporting.c index e152b22fbba8a..b23d3ed34ec07 100644 --- a/mm/page_reporting.c +++ b/mm/page_reporting.c @@ -118,7 +118,8 @@ page_reporting_drain(struct page_reporting_dev_info *prdev, /* * Ensure private is zero before putting into the - * allocator. + * allocator. tlb shootdown has already been performed + * at isolation. */ set_page_private(page, 0); @@ -194,7 +195,7 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, if (PageReported(page)) continue; - if (unlikely(consider_pend && !luf_takeoff_check(page))) { + if (unlikely(consider_pend && !luf_takeoff_check(zone, page))) { VM_WARN_ON(1); continue; } @@ -238,7 +239,7 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, /* * Check and flush before using the pages taken off. */ - luf_takeoff_end(); + luf_takeoff_end(zone); /* begin processing pages in local list */ err = prdev->report(prdev, sgl, PAGE_REPORTING_CAPACITY); @@ -283,7 +284,7 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, /* * Check and flush before using the pages taken off. */ - luf_takeoff_end(); + luf_takeoff_end(zone); return err; } diff --git a/mm/rmap.c b/mm/rmap.c index f5c5190be24e0..a2dc002a9c33d 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -650,7 +650,11 @@ static unsigned long new_luf_ugen(void) { unsigned long ugen = atomic_long_inc_return(&luf_ugen); - if (!ugen) + /* + * Avoid zero even in unsigned short range so as to treat + * '(unsigned short)ugen == 0' as invalid. 
+ */ + if (!(unsigned short)ugen) ugen = atomic_long_inc_return(&luf_ugen); return ugen; From patchwork Wed Feb 26 12:01:29 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Byungchul Park X-Patchwork-Id: 13992194 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 973A3C021B8 for ; Wed, 26 Feb 2025 12:02:48 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B932B280033; Wed, 26 Feb 2025 07:02:08 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id B448228002D; Wed, 26 Feb 2025 07:02:08 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9980F280033; Wed, 26 Feb 2025 07:02:08 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 414B628002D for ; Wed, 26 Feb 2025 07:02:08 -0500 (EST) Received: from smtpin27.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 4CE9EC13FD for ; Wed, 26 Feb 2025 12:01:56 +0000 (UTC) X-FDA: 83161957032.27.96FA760 Received: from invmail4.hynix.com (exvmail4.skhynix.com [166.125.252.92]) by imf16.hostedemail.com (Postfix) with ESMTP id 00059180034 for ; Wed, 26 Feb 2025 12:01:52 +0000 (UTC) Authentication-Results: imf16.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf16.hostedemail.com: domain of byungchul@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=byungchul@sk.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1740571313; a=rsa-sha256; cv=none; b=OHUSIiqZHPPe/soCvB1ZYUlVy0R9qBtIXzIw0qdNxzby9T+r6QzrRxW7CS7t79Qp2RjLcR bXXi2LnkhJc9d1mXEKYl/VPRqJ9X8OfDhY6ymMPSRdKGyniXbUHmVZ19QGH/ALDBs0UTk9 EDEl0XdkS0XgLg7lcRIwVUHuT0/tsDE= ARC-Authentication-Results: i=1; imf16.hostedemail.com; dkim=none; dmarc=none; spf=pass (imf16.hostedemail.com: domain of byungchul@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=byungchul@sk.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1740571313; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references; bh=yt/j7uLHHbHD82MNbKO0fiJJBF3HG+noFKUKLwwlwIo=; b=7iXun15eLQWZWAGEiNGQbczklAjyZnAgg6MQU+F/M3qK37HvShVwbcVTzR9w2tyEPKxBYk UOgljlp/7j/zcPOlf9C4dtPAQe9ObFociM5+GUcOAGgImM/jKq8HnOloz+v+lbAlmnAeTk kfrfdsoIap59fR/J5mWmOcXTUqk+6nw= X-AuditID: a67dfc5b-3e1ff7000001d7ae-31-67bf02a743c9 From: Byungchul Park To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com Subject: [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 22/25] mm/page_alloc: not allow to tlb shootdown if !preemptable() && non_luf_pages_ok() Date: Wed, 26 Feb 2025 21:01:29 +0900 Message-Id: <20250226120132.28469-22-byungchul@sk.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20250226120132.28469-1-byungchul@sk.com> References: <20250226113342.GB1935@system.software.com> 
<20250226120132.28469-1-byungchul@sk.com>
X-CFilter-Loop: Reflected
X-Rspamd-Queue-Id: 00059180034
X-Stat-Signature: y1bbjhbw74yz74a3a1zj9wtowx16fwqs
X-Rspam-User:
X-Rspamd-Server: rspam10
X-HE-Tag: 1740571312-153931
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: List-Subscribe: List-Unsubscribe: 

Do not perform tlb shootdown if the context is not preemptible and there
are already enough non-luf pages, so as not to hurt preemptibility.
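Concretely, the check this patch adds to no_shootdown_context() boils
down to the predicate below.  This is a standalone sketch for
illustration only: the kernel predicates preemptible(), in_task(),
irqs_disabled() and non_luf_pages_ok(zone) are replaced by plain
booleans supplied by the caller, and the pcp path is modeled by passing
have_zone == false.

#include <stdbool.h>
#include <stdio.h>

/* true means: skip the tlb shootdown in this context */
static bool no_shootdown(bool have_zone, bool non_luf_pages_ok,
			 bool preemptible, bool in_task, bool irqs_disabled)
{
	/*
	 * With enough non-luf pages available, be strict: only allow
	 * the shootdown from a preemptible task context.
	 */
	if (have_zone && non_luf_pages_ok)
		return !(preemptible && in_task);

	/*
	 * Under memory pressure (or on the pcp path where no zone is
	 * passed), only require a task context with irqs enabled.
	 */
	return !(!irqs_disabled && in_task);
}

int main(void)
{
	/* zone lock held with preemption disabled, plenty of non-luf
	 * pages: the shootdown is skipped, preemption latency is kept. */
	printf("%d\n", no_shootdown(true, true, false, true, false));

	/* same context, but the zone is short on non-luf pages: the
	 * shootdown is allowed since irqs are still enabled. */
	printf("%d\n", no_shootdown(true, false, false, true, false));
	return 0;
}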
Signed-off-by: Byungchul Park --- mm/compaction.c | 6 +++--- mm/internal.h | 5 +++-- mm/page_alloc.c | 27 +++++++++++++++------------ mm/page_isolation.c | 2 +- mm/page_reporting.c | 4 ++-- 5 files changed, 24 insertions(+), 20 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index c87a1803b10e2..9098ddb04bbf5 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -606,7 +606,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc, page = pfn_to_page(blockpfn); - luf_takeoff_start(); + luf_takeoff_start(cc->zone); /* Isolate free pages. */ for (; blockpfn < end_pfn; blockpfn += stride, page += stride) { int isolated; @@ -1603,7 +1603,7 @@ static void fast_isolate_freepages(struct compact_control *cc) if (!area->nr_free) continue; - can_shootdown = luf_takeoff_start(); + can_shootdown = luf_takeoff_start(cc->zone); spin_lock_irqsave(&cc->zone->lock, flags); freelist = &area->free_list[MIGRATE_MOVABLE]; retry: @@ -2417,7 +2417,7 @@ static enum compact_result compact_finished(struct compact_control *cc) * luf_takeoff_{start,end}() is required to identify whether * this compaction context is tlb shootdownable for luf'd pages. */ - luf_takeoff_start(); + luf_takeoff_start(cc->zone); ret = __compact_finished(cc); luf_takeoff_end(cc->zone); diff --git a/mm/internal.h b/mm/internal.h index 53056ad7dade9..7c4198f5e22c3 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1609,7 +1609,7 @@ static inline void accept_page(struct page *page) #endif /* CONFIG_UNACCEPTED_MEMORY */ #if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) extern struct luf_batch luf_batch[]; -bool luf_takeoff_start(void); +bool luf_takeoff_start(struct zone *zone); void luf_takeoff_end(struct zone *zone); bool luf_takeoff_no_shootdown(void); bool luf_takeoff_check(struct zone *zone, struct page *page); @@ -1623,6 +1623,7 @@ static inline bool non_luf_pages_ok(struct zone *zone) return nr_free - nr_luf_pages > min_wm; } + unsigned short fold_unmap_luf(void); /* @@ -1709,7 +1710,7 @@ static inline bool can_luf_vma(struct vm_area_struct *vma) return true; } #else /* CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH */ -static inline bool luf_takeoff_start(void) { return false; } +static inline bool luf_takeoff_start(struct zone *zone) { return false; } static inline void luf_takeoff_end(struct zone *zone) {} static inline bool luf_takeoff_no_shootdown(void) { return true; } static inline bool luf_takeoff_check(struct zone *zone, struct page *page) { return true; } diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 0f986cfa4fe39..9a58d6f7a9609 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -623,22 +623,25 @@ compaction_capture(struct capture_control *capc, struct page *page, #endif /* CONFIG_COMPACTION */ #if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH) -static bool no_shootdown_context(void) +static bool no_shootdown_context(struct zone *zone) { /* - * If it performs with irq disabled, that might cause a deadlock. - * Avoid tlb shootdown in this case. + * Tries to avoid tlb shootdown if !preemptible(). However, it + * should be allowed under heavy memory pressure. */ + if (zone && non_luf_pages_ok(zone)) + return !(preemptible() && in_task()); + return !(!irqs_disabled() && in_task()); } /* * Can be called with zone lock released and irq enabled. 
*/ -bool luf_takeoff_start(void) +bool luf_takeoff_start(struct zone *zone) { unsigned long flags; - bool no_shootdown = no_shootdown_context(); + bool no_shootdown = no_shootdown_context(zone); local_irq_save(flags); @@ -2669,7 +2672,7 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, * luf_takeoff_{start,end}() is required for * get_page_from_free_area() to use luf_takeoff_check(). */ - luf_takeoff_start(); + luf_takeoff_start(zone); spin_lock_irqsave(&zone->lock, flags); for (order = 0; order < NR_PAGE_ORDERS; order++) { struct free_area *area = &(zone->free_area[order]); @@ -2874,7 +2877,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, unsigned long flags; int i; - luf_takeoff_start(); + luf_takeoff_start(zone); spin_lock_irqsave(&zone->lock, flags); for (i = 0; i < count; ++i) { struct page *page = __rmqueue(zone, order, migratetype, @@ -3500,7 +3503,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone, do { page = NULL; - luf_takeoff_start(); + luf_takeoff_start(zone); spin_lock_irqsave(&zone->lock, flags); if (alloc_flags & ALLOC_HIGHATOMIC) page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC); @@ -3645,7 +3648,7 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone, struct page *page; unsigned long __maybe_unused UP_flags; - luf_takeoff_start(); + luf_takeoff_start(NULL); /* spin_trylock may fail due to a parallel drain or IRQ reentrancy. */ pcp_trylock_prepare(UP_flags); pcp = pcp_spin_trylock(zone->per_cpu_pageset); @@ -5268,7 +5271,7 @@ unsigned long alloc_pages_bulk_noprof(gfp_t gfp, int preferred_nid, if (unlikely(!zone)) goto failed; - luf_takeoff_start(); + luf_takeoff_start(NULL); /* spin_trylock may fail due to a parallel drain or IRQ reentrancy. 
*/ pcp_trylock_prepare(UP_flags); pcp = pcp_spin_trylock(zone->per_cpu_pageset); @@ -7371,7 +7374,7 @@ unsigned long __offline_isolated_pages(unsigned long start_pfn, offline_mem_sections(pfn, end_pfn); zone = page_zone(pfn_to_page(pfn)); - luf_takeoff_start(); + luf_takeoff_start(zone); spin_lock_irqsave(&zone->lock, flags); while (pfn < end_pfn) { page = pfn_to_page(pfn); @@ -7489,7 +7492,7 @@ bool take_page_off_buddy(struct page *page) unsigned int order; bool ret = false; - luf_takeoff_start(); + luf_takeoff_start(zone); spin_lock_irqsave(&zone->lock, flags); for (order = 0; order < NR_PAGE_ORDERS; order++) { struct page *page_head = page - (pfn & ((1 << order) - 1)); diff --git a/mm/page_isolation.c b/mm/page_isolation.c index 521ed32bdbf67..70f938c0921ae 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -218,7 +218,7 @@ static void unset_migratetype_isolate(struct page *page, int migratetype) struct page *buddy; zone = page_zone(page); - luf_takeoff_start(); + luf_takeoff_start(zone); spin_lock_irqsave(&zone->lock, flags); if (!is_migrate_isolate_page(page)) goto out; diff --git a/mm/page_reporting.c b/mm/page_reporting.c index b23d3ed34ec07..83b66e7f0d257 100644 --- a/mm/page_reporting.c +++ b/mm/page_reporting.c @@ -170,7 +170,7 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, if (free_area_empty(area, mt)) return err; - can_shootdown = luf_takeoff_start(); + can_shootdown = luf_takeoff_start(zone); spin_lock_irq(&zone->lock); /* @@ -250,7 +250,7 @@ page_reporting_cycle(struct page_reporting_dev_info *prdev, struct zone *zone, /* update budget to reflect call to report function */ budget--; - luf_takeoff_start(); + luf_takeoff_start(zone); /* reacquire zone lock and resume processing */ spin_lock_irq(&zone->lock); From patchwork Wed Feb 26 12:01:30 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Byungchul Park X-Patchwork-Id: 13992189 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6EED8C021BF for ; Wed, 26 Feb 2025 12:02:35 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id BDFE228002E; Wed, 26 Feb 2025 07:01:57 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id BB70128002D; Wed, 26 Feb 2025 07:01:57 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A7D5528002E; Wed, 26 Feb 2025 07:01:57 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 8BAE828002D for ; Wed, 26 Feb 2025 07:01:57 -0500 (EST) Received: from smtpin12.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 2F63DA331F for ; Wed, 26 Feb 2025 12:01:57 +0000 (UTC) X-FDA: 83161957074.12.8415B52 Received: from invmail4.hynix.com (exvmail4.skhynix.com [166.125.252.92]) by imf12.hostedemail.com (Postfix) with ESMTP id F15EA4000E for ; Wed, 26 Feb 2025 12:01:53 +0000 (UTC) Authentication-Results: imf12.hostedemail.com; dkim=none; spf=pass (imf12.hostedemail.com: domain of byungchul@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=byungchul@sk.com; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; 
t=1740571314; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references; bh=gUybK/612goMZS2YsajqHRiogafJTSI9chy7FXASWAs=; b=rVd5Bh7pDFQsfmzOuF/t31Lxlzs0Y9FHXIlFICJoq9g56sG2bSpOleMfWaoVsIURQ8iLlf XiYGFYm7Ete0nRbfBttlOSIcpN9YlyCoLREiAnwlCxwQlA2U4eGAXJvigNh5tWXmn8szga k9iKHWcGFLs2NIP3k5Y5RJIH4ILCW34= ARC-Authentication-Results: i=1; imf12.hostedemail.com; dkim=none; spf=pass (imf12.hostedemail.com: domain of byungchul@sk.com designates 166.125.252.92 as permitted sender) smtp.mailfrom=byungchul@sk.com; dmarc=none ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1740571314; a=rsa-sha256; cv=none; b=vEt+S5R+0wMUnrj1/58p8aP0VLf7qiNOw05mLXoGrkettAUYswfB1ZdDcRmSClG64vxN26 CVQ2Y/UW3Cc1tSPTlypSrCYuB9Ithj3WdDpqRfwz6UXHtDqq2VlkYlhEVssDvEROhwKEfK gAF/zL4A1sJzm2PakASgrIc8MiDQq9g= X-AuditID: a67dfc5b-3e1ff7000001d7ae-37-67bf02a70812 From: Byungchul Park To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com Subject: [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 23/25] mm/migrate: apply luf mechanism to unmapping during migration Date: Wed, 26 Feb 2025 21:01:30 +0900 Message-Id: <20250226120132.28469-23-byungchul@sk.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20250226120132.28469-1-byungchul@sk.com> References: <20250226113342.GB1935@system.software.com> <20250226120132.28469-1-byungchul@sk.com> X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFrrCLMWRmVeSWpSXmKPExsXC9ZZnke5ypv3pBouOGVjMWb+GzeLzhn9s Fl/X/2K2ePqpj8Xi8q45bBb31vxntTi/ay2rxY6l+5gsLh1YwGRxvPcAk8X8e5/ZLDZvmsps cXzKVEaL3z/msDnweXxv7WPx2DnrLrvHgk2lHptXaHlsWtXJ5rHp0yR2j3fnzrF7nJjxm8Xj /b6rbB5bf9l5NE69xubxeZNcAE8Ul01Kak5mWWqRvl0CV8b5/WdZChoCKy7sPMrYwDjJsYuR k0NCwESi6dMqJhj7y9EFrCA2m4C6xI0bP5lBbBEBM4mDrX/Yuxi5OJgFljFJ7D3RwAbiCAu0 Mkp82/cDyOHgYBFQlZjYVgLSwAvU0PriPRvEUHmJ1RsOgA3iBIr/2/2bHcQWEkiWaFn/mwVk joTAfTaJ9k/r2CEaJCUOrrjBMoGRdwEjwypGocy8stzEzBwTvYzKvMwKveT83E2MwLBeVvsn egfjpwvBhxgFOBiVeHgfnNmbLsSaWFZcmXuIUYKDWUmElzNzT7oQb0piZVVqUX58UWlOavEh RmkOFiVxXqNv5SlCAumJJanZqakFqUUwWSYOTqkGxjW6OpsO1jryLxfZJVdqah4ac86a5eJ/ ve61z1l0L7sfm3vzjsh85gahXuYVS1IyeQr7Xxy7vdf6QJ+gNoPjhot3D786374osHca+3qr gmksu8VOpJnU1bw22ulbvfXx2j+vWVcdmLOFl6uLmcftbMDE4C16DomuledCHWrqHm42M1z2 Ve3SayWW4oxEQy3mouJEAK0R8WtnAgAA X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFtrNLMWRmVeSWpSXmKPExsXC5WfdrLucaX+6wanXahZz1q9hs/i84R+b xdf1v5gtnn7qY7E4PPckq8XlXXPYLO6t+c9qcX7XWlaLHUv3MVlcOrCAyeJ47wEmi/n3PrNZ bN40ldni+JSpjBa/f8xhc+D3+N7ax+Kxc9Zddo8Fm0o9Nq/Q8ti0qpPNY9OnSewe786dY/c4 MeM3i8f7fVfZPBa/+MDksfWXnUfj1GtsHp83yQXwRnHZpKTmZJalFunbJXBlnN9/lqWgIbDi ws6jjA2Mkxy7GDk5JARMJL4cXcAKYrMJqEvcuPGTGcQWETCTONj6h72LkYuDWWAZk8TeEw1s II6wQCujxLd9P4AcDg4WAVWJiW0lIA28QA2tL96zQQyVl1i94QDYIE6g+L/dv9lBbCGBZImW 9b9ZJjByLWBkWMUokplXlpuYmWOqV5ydUZmXWaGXnJ+7iREYpMtq/0zcwfjlsvshRgEORiUe 3gdn9qYLsSaWFVfmHmKU4GBWEuHlzNyTLsSbklhZlVqUH19UmpNafIhRmoNFSZzXKzw1QUgg PbEkNTs1tSC1CCbLxMEp1cDIa29wK/ymecbRHynKTbPXdK+55+J/4aqCUeO2wj67Y9Y6R01f Mp+d9cJKqeW91/Kr712+X/T6c0QjrSdYatrufA6P3uJu20UPHly+9X39NIucoyav3M5XFW6J nG/ALjBV2/HTmyUlr4N2Tj99+PGt+zn/7Z37tUwOrgs9P1GHVelhrtAMH04HJZbijERDLeai 4kQAW73xwE4CAAA= X-CFilter-Loop: Reflected X-Rspamd-Queue-Id: F15EA4000E X-Stat-Signature: c6y7czxabuc6wfk33kxh5dmx1pb1wtws X-Rspam-User: 
X-Rspamd-Server: rspam01 X-HE-Tag: 1740571313-596752 X-HE-Meta: U2FsdGVkX1/jf5/wSr009QePmJ68y7CaZkfPDzRRUjytDvh1f+YV+kbV18/A2pAjPAef46QuLdcjVgh7jJw5lhkMGqKd6nnjx/z0fT1bBSDxHhtf8w2xqxbxZZgW85B9s5TFyjrVTAijMRDeWpXUyfz0IAmWDRAIFzUZB3i9Toio8auHN3KRlpXUcpvQSVEvU7DC/C3S8c66NhWH+1uxfJXBDPu6eiG2EsMZaNRso849SXxXhZGMonNxLjvnhxQ1vGO35HInksfx9fuyzcP6bb8eqdxcogEhD7RD1/zBPQE9ZiWwON1w/crx8OW87TS4V3YJpGxhKhe1/aIK5QwzeUWg10tBftLSvWupBawUYIQoWfWyXytKmzvGdv3UWxSeqXukL851hXd2X3FjhpIczzEaktN7J+Yn/c6a76xPyi1u5p1SPWsxfwH6zL6EJ0fkwYLoJQ3wGWcja9IrEeVtA9bk1S6vCktxDCP8y8K2+mUZ60sD43Qf8JBSXro+VK4NpfJVxaDz3LFEb68Sdd9z1//iRq1SRdTQzJgB37ZBLyJAY0mMdMMI5EZHszyS39wcmcZngZDT+7ZBU+vGmfpCPo1r50AMHXmnwmkB4YMwnB8ECtOq+pnOYGPM5vOV5/q0u3MyxxoxE7TZNYt9ZQTnTeUfrMplbMEVsRYeAEI9aTSRElhs77KZTlUE9Jiup1Ku0CvgRaGyD/Wzaf+6ULEw4Mnv2ly/S++iV+/g13n7jsPWZDoVZImIgbNkgeT5vRs1Qq+F/s9O0TN809y5GCuMSVI1qkZjLHNe2lBnhaEOVLtku49IZBSFOTzJeQTsrlqKoSwubOXuvxA8SM1I6KGSDKQMYs0LL24QoMVq0Gsp7cAyMt6K+SEEPV389OcM1dskasgjjTqXDYwkDclAdd6BN3fQrQL2B7J2SO5prDP6yQpVkd8SRdiI3iamojrOw9eNCBgIkQyCXEuvRKh7jYy 1hg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: A new mechanism, LUF(Lazy Unmap Flush), defers tlb flush until folios that have been unmapped and freed, eventually get allocated again. It's safe for folios that had been mapped read only and were unmapped, since the contents of the folios don't change while staying in pcp or buddy so we can still read the data through the stale tlb entries. Applied the mechanism to unmapping during migration. Signed-off-by: Byungchul Park --- include/linux/mm.h | 2 ++ include/linux/rmap.h | 2 +- mm/migrate.c | 66 ++++++++++++++++++++++++++++++++++---------- mm/rmap.c | 15 ++++++---- mm/swap.c | 2 +- 5 files changed, 64 insertions(+), 23 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 2fa5185880105..b41d7804a06a2 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1566,6 +1566,8 @@ static inline void folio_put(struct folio *folio) __folio_put(folio); } +void page_cache_release(struct folio *folio); + /** * folio_put_refs - Reduce the reference count on a folio. * @folio: The folio. 
diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 6abf7960077aa..bfccf2efb9000 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -675,7 +675,7 @@ static inline int folio_try_share_anon_rmap_pmd(struct folio *folio, int folio_referenced(struct folio *, int is_locked, struct mem_cgroup *memcg, unsigned long *vm_flags); -void try_to_migrate(struct folio *folio, enum ttu_flags flags); +bool try_to_migrate(struct folio *folio, enum ttu_flags flags); void try_to_unmap(struct folio *, enum ttu_flags flags); struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr, diff --git a/mm/migrate.c b/mm/migrate.c index 365c6daa8d1b1..7d6472cc236ae 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1164,7 +1164,8 @@ static void migrate_folio_undo_dst(struct folio *dst, bool locked, /* Cleanup src folio upon migration success */ static void migrate_folio_done(struct folio *src, - enum migrate_reason reason) + enum migrate_reason reason, + unsigned short luf_key) { /* * Compaction can migrate also non-LRU pages which are @@ -1175,16 +1176,31 @@ static void migrate_folio_done(struct folio *src, mod_node_page_state(folio_pgdat(src), NR_ISOLATED_ANON + folio_is_file_lru(src), -folio_nr_pages(src)); - if (reason != MR_MEMORY_FAILURE) - /* We release the page in page_handle_poison. */ + /* We release the page in page_handle_poison. */ + if (reason == MR_MEMORY_FAILURE) + luf_flush(luf_key); + else if (!luf_key) folio_put(src); + else { + /* + * Should be the last reference. + */ + if (unlikely(!folio_put_testzero(src))) + VM_WARN_ON(1); + + page_cache_release(src); + folio_unqueue_deferred_split(src); + mem_cgroup_uncharge(src); + free_frozen_pages(&src->page, folio_order(src), luf_key); + } } /* Obtain the lock on page, remove all ptes. */ static int migrate_folio_unmap(new_folio_t get_new_folio, free_folio_t put_new_folio, unsigned long private, struct folio *src, struct folio **dstp, enum migrate_mode mode, - enum migrate_reason reason, struct list_head *ret) + enum migrate_reason reason, struct list_head *ret, + bool *can_luf) { struct folio *dst; int rc = -EAGAIN; @@ -1200,7 +1216,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio, folio_clear_unevictable(src); /* free_pages_prepare() will clear PG_isolated. */ list_del(&src->lru); - migrate_folio_done(src, reason); + migrate_folio_done(src, reason, 0); return MIGRATEPAGE_SUCCESS; } @@ -1317,7 +1333,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio, /* Establish migration ptes */ VM_BUG_ON_FOLIO(folio_test_anon(src) && !folio_test_ksm(src) && !anon_vma, src); - try_to_migrate(src, mode == MIGRATE_ASYNC ? TTU_BATCH_FLUSH : 0); + *can_luf = try_to_migrate(src, mode == MIGRATE_ASYNC ? 
TTU_BATCH_FLUSH : 0); old_page_state |= PAGE_WAS_MAPPED; } @@ -1345,7 +1361,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio, static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private, struct folio *src, struct folio *dst, enum migrate_mode mode, enum migrate_reason reason, - struct list_head *ret) + struct list_head *ret, unsigned short luf_key) { int rc; int old_page_state = 0; @@ -1399,7 +1415,7 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private, if (anon_vma) put_anon_vma(anon_vma); folio_unlock(src); - migrate_folio_done(src, reason); + migrate_folio_done(src, reason, luf_key); return rc; out: @@ -1694,7 +1710,7 @@ static void migrate_folios_move(struct list_head *src_folios, struct list_head *ret_folios, struct migrate_pages_stats *stats, int *retry, int *thp_retry, int *nr_failed, - int *nr_retry_pages) + int *nr_retry_pages, unsigned short luf_key) { struct folio *folio, *folio2, *dst, *dst2; bool is_thp; @@ -1711,7 +1727,7 @@ static void migrate_folios_move(struct list_head *src_folios, rc = migrate_folio_move(put_new_folio, private, folio, dst, mode, - reason, ret_folios); + reason, ret_folios, luf_key); /* * The rules are: * Success: folio will be freed @@ -1788,7 +1804,11 @@ static int migrate_pages_batch(struct list_head *from, int rc, rc_saved = 0, nr_pages; LIST_HEAD(unmap_folios); LIST_HEAD(dst_folios); + LIST_HEAD(unmap_folios_luf); + LIST_HEAD(dst_folios_luf); bool nosplit = (reason == MR_NUMA_MISPLACED); + unsigned short luf_key; + bool can_luf; VM_WARN_ON_ONCE(mode != MIGRATE_ASYNC && !list_empty(from) && !list_is_singular(from)); @@ -1863,9 +1883,11 @@ static int migrate_pages_batch(struct list_head *from, continue; } + can_luf = false; rc = migrate_folio_unmap(get_new_folio, put_new_folio, private, folio, &dst, mode, reason, - ret_folios); + ret_folios, &can_luf); + /* * The rules are: * Success: folio will be freed @@ -1911,7 +1933,8 @@ static int migrate_pages_batch(struct list_head *from, /* nr_failed isn't updated for not used */ stats->nr_thp_failed += thp_retry; rc_saved = rc; - if (list_empty(&unmap_folios)) + if (list_empty(&unmap_folios) && + list_empty(&unmap_folios_luf)) goto out; else goto move; @@ -1925,8 +1948,13 @@ static int migrate_pages_batch(struct list_head *from, stats->nr_thp_succeeded += is_thp; break; case MIGRATEPAGE_UNMAP: - list_move_tail(&folio->lru, &unmap_folios); - list_add_tail(&dst->lru, &dst_folios); + if (can_luf) { + list_move_tail(&folio->lru, &unmap_folios_luf); + list_add_tail(&dst->lru, &dst_folios_luf); + } else { + list_move_tail(&folio->lru, &unmap_folios); + list_add_tail(&dst->lru, &dst_folios); + } break; default: /* @@ -1946,6 +1974,8 @@ static int migrate_pages_batch(struct list_head *from, stats->nr_thp_failed += thp_retry; stats->nr_failed_pages += nr_retry_pages; move: + /* Should be before try_to_unmap_flush() */ + luf_key = fold_unmap_luf(); /* Flush TLBs for all unmapped folios */ try_to_unmap_flush(); @@ -1959,7 +1989,11 @@ static int migrate_pages_batch(struct list_head *from, migrate_folios_move(&unmap_folios, &dst_folios, put_new_folio, private, mode, reason, ret_folios, stats, &retry, &thp_retry, - &nr_failed, &nr_retry_pages); + &nr_failed, &nr_retry_pages, 0); + migrate_folios_move(&unmap_folios_luf, &dst_folios_luf, + put_new_folio, private, mode, reason, + ret_folios, stats, &retry, &thp_retry, + &nr_failed, &nr_retry_pages, luf_key); } nr_failed += retry; stats->nr_thp_failed += thp_retry; @@ -1970,6 +2004,8 @@ static int 
migrate_pages_batch(struct list_head *from, /* Cleanup remaining folios */ migrate_folios_undo(&unmap_folios, &dst_folios, put_new_folio, private, ret_folios); + migrate_folios_undo(&unmap_folios_luf, &dst_folios_luf, + put_new_folio, private, ret_folios); return rc; } diff --git a/mm/rmap.c b/mm/rmap.c index a2dc002a9c33d..e645bb0dd44b5 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -2925,8 +2925,9 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, * * Tries to remove all the page table entries which are mapping this folio and * replace them with special swap entries. Caller must hold the folio lock. + * Return true if all the mappings are read-only, otherwise false. */ -void try_to_migrate(struct folio *folio, enum ttu_flags flags) +bool try_to_migrate(struct folio *folio, enum ttu_flags flags) { struct rmap_walk_control rwc = { .rmap_one = try_to_migrate_one, @@ -2944,11 +2945,11 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags) */ if (WARN_ON_ONCE(flags & ~(TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD | TTU_SYNC | TTU_BATCH_FLUSH))) - return; + return false; if (folio_is_zone_device(folio) && (!folio_is_device_private(folio) && !folio_is_device_coherent(folio))) - return; + return false; /* * During exec, a temporary VMA is setup and later moved. @@ -2968,10 +2969,12 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags) else rmap_walk(folio, &rwc); - if (can_luf_test()) + if (can_luf_test()) { fold_batch(tlb_ubc_luf, tlb_ubc_ro, true); - else - fold_batch(tlb_ubc, tlb_ubc_ro, true); + return true; + } + fold_batch(tlb_ubc, tlb_ubc_ro, true); + return false; } #ifdef CONFIG_DEVICE_PRIVATE diff --git a/mm/swap.c b/mm/swap.c index bdfede631aea9..21374892854eb 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -84,7 +84,7 @@ static void __page_cache_release(struct folio *folio, struct lruvec **lruvecp, * This path almost never happens for VM activity - pages are normally freed * in batches. But it gets used by networking - and for compound pages. 
*/ -static void page_cache_release(struct folio *folio) +void page_cache_release(struct folio *folio) { struct lruvec *lruvec = NULL; unsigned long flags;

From patchwork Wed Feb 26 12:01:31 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13992192
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 24/25] mm/vmscan: apply luf mechanism to unmapping during folio reclaim
Date: Wed, 26 Feb 2025 21:01:31 +0900
Message-Id: <20250226120132.28469-24-byungchul@sk.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20250226120132.28469-1-byungchul@sk.com>
References: <20250226113342.GB1935@system.software.com> <20250226120132.28469-1-byungchul@sk.com>

A new mechanism, LUF (Lazy Unmap Flush), defers the TLB flush for folios that have been unmapped and freed until they eventually get allocated again. This is safe for folios that had been mapped read-only and were then unmapped, since their contents cannot change while they sit in the pcp or buddy lists, so the data can still be read through the stale TLB entries.

Apply the mechanism to unmapping during folio reclaim.
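As a rough orientation for the shrink_folio_list() changes below, here is a self-contained userspace mock of the dual-batch drain this patch introduces: LUF-eligible folios gather in a second batch and are freed with the key returned by fold_unmap_luf(), while everything else keeps the existing flush-then-free path. All names and types in the mock are stand-ins for illustration, not the kernel implementation.

/*
 * Userspace mock of the dual-batch drain added to shrink_folio_list():
 * LUF-eligible folios collect in a second batch and are freed with the
 * key returned by fold_unmap_luf(), so their TLB flush is deferred.
 * Types and helpers are stubs for illustration.
 */
#include <stdio.h>
#include <stdbool.h>

#define BATCH 4

struct batch { int ids[BATCH]; int nr; };

/* mirrors folio_batch_add(): returns remaining space, 0 means "drain now" */
static int batch_add(struct batch *b, int id)
{
	b->ids[b->nr++] = id;
	return BATCH - b->nr;
}

static unsigned short mock_fold_unmap_luf(void) { return 7; }
static void mock_try_to_unmap_flush(void) { puts("conventional TLB flush"); }

static void drain(struct batch *plain, struct batch *luf)
{
	unsigned short key = mock_fold_unmap_luf();	/* taken before the flush */

	mock_try_to_unmap_flush();
	for (int i = 0; i < plain->nr; i++)
		printf("free folio %d (key 0)\n", plain->ids[i]);
	for (int i = 0; i < luf->nr; i++)
		printf("free folio %d (key %u, flush deferred)\n",
		       luf->ids[i], (unsigned)key);
	plain->nr = luf->nr = 0;
}

int main(void)
{
	struct batch plain = {0}, luf = {0};
	bool can_luf[6] = { true, false, true, true, false, true };

	for (int id = 0; id < 6; id++) {
		struct batch *b = can_luf[id] ? &luf : &plain;

		if (batch_add(b, id) == 0)
			drain(&plain, &luf);
	}
	drain(&plain, &luf);	/* finalize leftovers, as the patch does before demotion */
	return 0;
}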
Signed-off-by: Byungchul Park --- include/linux/rmap.h | 5 +++-- mm/rmap.c | 11 +++++++---- mm/vmscan.c | 37 ++++++++++++++++++++++++++++++++----- 3 files changed, 42 insertions(+), 11 deletions(-) diff --git a/include/linux/rmap.h b/include/linux/rmap.h index bfccf2efb9000..8002f4b2a2d14 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -676,7 +676,7 @@ int folio_referenced(struct folio *, int is_locked, struct mem_cgroup *memcg, unsigned long *vm_flags); bool try_to_migrate(struct folio *folio, enum ttu_flags flags); -void try_to_unmap(struct folio *, enum ttu_flags flags); +bool try_to_unmap(struct folio *, enum ttu_flags flags); struct page *make_device_exclusive(struct mm_struct *mm, unsigned long addr, void *owner, struct folio **foliop); @@ -811,8 +811,9 @@ static inline int folio_referenced(struct folio *folio, int is_locked, return 0; } -static inline void try_to_unmap(struct folio *folio, enum ttu_flags flags) +static inline bool try_to_unmap(struct folio *folio, enum ttu_flags flags) { + return false; } static inline int folio_mkclean(struct folio *folio) diff --git a/mm/rmap.c b/mm/rmap.c index e645bb0dd44b5..124ef59afa25e 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -2583,10 +2583,11 @@ static int folio_not_mapped(struct folio *folio) * Tries to remove all the page table entries which are mapping this * folio. It is the caller's responsibility to check if the folio is * still mapped if needed (use TTU_SYNC to prevent accounting races). + * Return true if all the mappings are read-only, otherwise false. * * Context: Caller must hold the folio lock. */ -void try_to_unmap(struct folio *folio, enum ttu_flags flags) +bool try_to_unmap(struct folio *folio, enum ttu_flags flags) { struct rmap_walk_control rwc = { .rmap_one = try_to_unmap_one, @@ -2605,10 +2606,12 @@ void try_to_unmap(struct folio *folio, enum ttu_flags flags) else rmap_walk(folio, &rwc); - if (can_luf_test()) + if (can_luf_test()) { fold_batch(tlb_ubc_luf, tlb_ubc_ro, true); - else - fold_batch(tlb_ubc, tlb_ubc_ro, true); + return true; + } + fold_batch(tlb_ubc, tlb_ubc_ro, true); + return false; } /* diff --git a/mm/vmscan.c b/mm/vmscan.c index f145c09629b97..a24d2d05df43a 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1102,14 +1102,17 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, struct reclaim_stat *stat, bool ignore_references) { struct folio_batch free_folios; + struct folio_batch free_folios_luf; LIST_HEAD(ret_folios); LIST_HEAD(demote_folios); unsigned int nr_reclaimed = 0, nr_demoted = 0; unsigned int pgactivate = 0; bool do_demote_pass; struct swap_iocb *plug = NULL; + unsigned short luf_key; folio_batch_init(&free_folios); + folio_batch_init(&free_folios_luf); memset(stat, 0, sizeof(*stat)); cond_resched(); do_demote_pass = can_demote(pgdat->node_id, sc); @@ -1121,6 +1124,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, enum folio_references references = FOLIOREF_RECLAIM; bool dirty, writeback; unsigned int nr_pages; + bool can_luf = false; cond_resched(); @@ -1354,7 +1358,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, if (folio_test_large(folio)) flags |= TTU_SYNC; - try_to_unmap(folio, flags); + can_luf = try_to_unmap(folio, flags); if (folio_mapped(folio)) { stat->nr_unmap_fail += nr_pages; if (!was_swapbacked && @@ -1498,6 +1502,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, * leave it off the LRU). 
*/ nr_reclaimed += nr_pages; + if (can_luf) + luf_flush(fold_unmap_luf()); continue; } } @@ -1530,6 +1536,19 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, nr_reclaimed += nr_pages; folio_unqueue_deferred_split(folio); + + if (can_luf) { + if (folio_batch_add(&free_folios_luf, folio) == 0) { + mem_cgroup_uncharge_folios(&free_folios); + mem_cgroup_uncharge_folios(&free_folios_luf); + luf_key = fold_unmap_luf(); + try_to_unmap_flush(); + free_unref_folios(&free_folios, 0); + free_unref_folios(&free_folios_luf, luf_key); + } + continue; + } + if (folio_batch_add(&free_folios, folio) == 0) { mem_cgroup_uncharge_folios(&free_folios); try_to_unmap_flush(); @@ -1564,9 +1583,21 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, list_add(&folio->lru, &ret_folios); VM_BUG_ON_FOLIO(folio_test_lru(folio) || folio_test_unevictable(folio), folio); + if (can_luf) + luf_flush(fold_unmap_luf()); } /* 'folio_list' is always empty here */ + /* + * Finalize this turn before demote_folio_list(). + */ + mem_cgroup_uncharge_folios(&free_folios); + mem_cgroup_uncharge_folios(&free_folios_luf); + luf_key = fold_unmap_luf(); + try_to_unmap_flush(); + free_unref_folios(&free_folios, 0); + free_unref_folios(&free_folios_luf, luf_key); + /* Migrate folios selected for demotion */ nr_demoted = demote_folio_list(&demote_folios, pgdat); nr_reclaimed += nr_demoted; @@ -1600,10 +1631,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, pgactivate = stat->nr_activate[0] + stat->nr_activate[1]; - mem_cgroup_uncharge_folios(&free_folios); - try_to_unmap_flush(); - free_unref_folios(&free_folios, 0); - list_splice(&ret_folios, folio_list); count_vm_events(PGACTIVATE, pgactivate);

From patchwork Wed Feb 26 12:01:32 2025
X-Patchwork-Submitter: Byungchul Park
X-Patchwork-Id: 13992190
From: Byungchul Park
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 25/25] mm/luf: implement luf debug feature
Date: Wed, 26 Feb 2025 21:01:32 +0900
Message-Id: <20250226120132.28469-25-byungchul@sk.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20250226120132.28469-1-byungchul@sk.com>
References: <20250226113342.GB1935@system.software.com> <20250226120132.28469-1-byungchul@sk.com>
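The commit message and diff that follow add per-page debug state (via page_ext) and check it wherever a page is about to become addressable, e.g. kmap(), pte_mkwrite() and page_to_virt(). As a hedged sketch of the check pattern only, the userspace mock below reduces the per-page bookkeeping to a plain array and the report to a warn-once message; apart from the names it explicitly borrows from the diff, everything here is an assumption made for illustration.

/*
 * Userspace sketch of the "warn once if a page is touched while a
 * deferred flush is still pending" pattern used by the debug hooks
 * in the diff below (lufd_check_pages() and friends).  The per-page
 * generation bookkeeping is reduced to a plain array; all names are
 * stand-ins.
 */
#include <stdio.h>
#include <stdbool.h>

#define NPAGES 8

static unsigned long page_pending_ugen[NPAGES];	/* 0 means nothing pending */
static unsigned long flush_done_ugen;			/* what has been flushed so far */
static bool warned_once;

/* mark a page freed with a deferred flush tagged by @ugen */
static void mock_mark_page(int pfn, unsigned long ugen)
{
	page_pending_ugen[pfn] = ugen;
}

/* called from places that are about to access the page's contents */
static void mock_check_page(int pfn)
{
	unsigned long pending = page_pending_ugen[pfn];

	if (pending && pending > flush_done_ugen && !warned_once) {
		warned_once = true;
		fprintf(stderr,
			"LUFD: page %d touched before flush (ugen %lu > done %lu)\n",
			pfn, pending, flush_done_ugen);
	}
}

int main(void)
{
	mock_mark_page(3, 10);		/* freed, flush deferred until ugen 10 */
	flush_done_ugen = 9;		/* the flush has not caught up yet */
	mock_check_page(3);		/* reports the problematic access once */
	flush_done_ugen = 10;
	mock_check_page(3);		/* now fine, nothing printed */
	return 0;
}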
We need a LUF debug feature to detect when LUF goes wrong, however unlikely that may be. As an RFC, this suggests a simple implementation that reports problematic situations caused by LUF.

Signed-off-by: Byungchul Park --- arch/riscv/include/asm/tlbflush.h | 3 + arch/riscv/mm/tlbflush.c | 35 ++++- arch/x86/include/asm/pgtable.h | 10 ++ arch/x86/include/asm/tlbflush.h | 3 + arch/x86/mm/pgtable.c | 10 ++ arch/x86/mm/tlb.c | 35 ++++- include/linux/highmem-internal.h | 5 + include/linux/mm.h | 20 ++- include/linux/mm_types.h | 16 +-- include/linux/mm_types_task.h | 16 +++ include/linux/sched.h | 5 + mm/highmem.c | 1 + mm/memory.c | 12 ++ mm/page_alloc.c | 34 ++++- mm/page_ext.c | 3 + mm/rmap.c | 229 ++++++++++++++++++++++++++++++ 16 files changed, 418 insertions(+), 19 deletions(-)

diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h index 936bf9ce0abd9..b927d134cda9b 100644 --- a/arch/riscv/include/asm/tlbflush.h +++ b/arch/riscv/include/asm/tlbflush.h @@ -68,6 +68,9 @@ bool arch_tlbbatch_check_done(struct arch_tlbflush_unmap_batch *batch, unsigned bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen); void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen); void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen); +#ifdef CONFIG_LUF_DEBUG +extern void print_lufd_arch(void); +#endif static inline void arch_tlbbatch_clear(struct arch_tlbflush_unmap_batch *batch) { diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c index 6ce44370a8e11..345846fbc2ecf 100644 --- a/arch/riscv/mm/tlbflush.c +++ b/arch/riscv/mm/tlbflush.c @@ -215,6 +215,25 @@ static int __init luf_init_arch(void) } early_initcall(luf_init_arch); +#ifdef CONFIG_LUF_DEBUG +static DEFINE_SPINLOCK(luf_debug_lock); +#define lufd_lock(f) spin_lock_irqsave(&luf_debug_lock, (f)) +#define lufd_unlock(f) spin_unlock_irqrestore(&luf_debug_lock, (f)) + +void print_lufd_arch(void) +{ + int cpu; + + pr_cont("LUFD ARCH:"); + for_each_cpu(cpu, cpu_possible_mask) + pr_cont(" %lu", atomic_long_read(per_cpu_ptr(&ugen_done, cpu))); + pr_cont("\n"); +} +#else +#define lufd_lock(f) do { (void)(f); } while(0) +#define lufd_unlock(f) do { (void)(f); } while(0) +#endif + /* * batch will not be updated.
*/ @@ -222,17 +241,22 @@ bool arch_tlbbatch_check_done(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen) { int cpu; + unsigned long flags; if (!ugen) goto out; + lufd_lock(flags); for_each_cpu(cpu, &batch->cpumask) { unsigned long done; done = atomic_long_read(per_cpu_ptr(&ugen_done, cpu)); - if (ugen_before(done, ugen)) + if (ugen_before(done, ugen)) { + lufd_unlock(flags); return false; + } } + lufd_unlock(flags); return true; out: return cpumask_empty(&batch->cpumask); @@ -242,10 +266,12 @@ bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen) { int cpu; + unsigned long flags; if (!ugen) goto out; + lufd_lock(flags); for_each_cpu(cpu, &batch->cpumask) { unsigned long done; @@ -253,6 +279,7 @@ bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, if (!ugen_before(done, ugen)) cpumask_clear_cpu(cpu, &batch->cpumask); } + lufd_unlock(flags); out: return cpumask_empty(&batch->cpumask); } @@ -261,10 +288,12 @@ void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen) { int cpu; + unsigned long flags; if (!ugen) return; + lufd_lock(flags); for_each_cpu(cpu, &batch->cpumask) { atomic_long_t *done = per_cpu_ptr(&ugen_done, cpu); unsigned long old = atomic_long_read(done); @@ -282,15 +311,18 @@ void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, */ atomic_long_cmpxchg(done, old, ugen); } + lufd_unlock(flags); } void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen) { int cpu; + unsigned long flags; if (!ugen) return; + lufd_lock(flags); for_each_cpu(cpu, mm_cpumask(mm)) { atomic_long_t *done = per_cpu_ptr(&ugen_done, cpu); unsigned long old = atomic_long_read(done); @@ -308,4 +340,5 @@ void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen) */ atomic_long_cmpxchg(done, old, ugen); } + lufd_unlock(flags); } diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 593f10aabd45a..414bcabb23b51 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -695,12 +695,22 @@ static inline pud_t pud_mkyoung(pud_t pud) return pud_set_flags(pud, _PAGE_ACCESSED); } +#ifdef CONFIG_LUF_DEBUG +pud_t pud_mkwrite(pud_t pud); +static inline pud_t __pud_mkwrite(pud_t pud) +{ + pud = pud_set_flags(pud, _PAGE_RW); + + return pud_clear_saveddirty(pud); +} +#else static inline pud_t pud_mkwrite(pud_t pud) { pud = pud_set_flags(pud, _PAGE_RW); return pud_clear_saveddirty(pud); } +#endif #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY static inline int pte_soft_dirty(pte_t pte) diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 58ad7e6989bb1..b667987dbd31b 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -297,6 +297,9 @@ extern bool arch_tlbbatch_check_done(struct arch_tlbflush_unmap_batch *batch, un extern bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen); extern void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen); extern void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen); +#ifdef CONFIG_LUF_DEBUG +extern void print_lufd_arch(void); +#endif static inline void arch_tlbbatch_clear(struct arch_tlbflush_unmap_batch *batch) { diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index 1fef5ad32d5a8..d0b7a1437214c 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -904,6 +904,7 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr) pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) { + 
lufd_check_pages(pte_page(pte), 0); if (vma->vm_flags & VM_SHADOW_STACK) return pte_mkwrite_shstk(pte); @@ -914,6 +915,7 @@ pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma) pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) { + lufd_check_pages(pmd_page(pmd), PMD_ORDER); if (vma->vm_flags & VM_SHADOW_STACK) return pmd_mkwrite_shstk(pmd); @@ -922,6 +924,14 @@ pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma) return pmd_clear_saveddirty(pmd); } +#ifdef CONFIG_LUF_DEBUG +pud_t pud_mkwrite(pud_t pud) +{ + lufd_check_pages(pud_page(pud), PUD_ORDER); + return __pud_mkwrite(pud); +} +#endif + void arch_check_zapped_pte(struct vm_area_struct *vma, pte_t pte) { /* diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index be6068b60c32d..99b3d54aa74d2 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -1283,6 +1283,25 @@ static int __init luf_init_arch(void) } early_initcall(luf_init_arch); +#ifdef CONFIG_LUF_DEBUG +static DEFINE_SPINLOCK(luf_debug_lock); +#define lufd_lock(f) spin_lock_irqsave(&luf_debug_lock, (f)) +#define lufd_unlock(f) spin_unlock_irqrestore(&luf_debug_lock, (f)) + +void print_lufd_arch(void) +{ + int cpu; + + pr_cont("LUFD ARCH:"); + for_each_cpu(cpu, cpu_possible_mask) + pr_cont(" %lu", atomic_long_read(per_cpu_ptr(&ugen_done, cpu))); + pr_cont("\n"); +} +#else +#define lufd_lock(f) do { (void)(f); } while(0) +#define lufd_unlock(f) do { (void)(f); } while(0) +#endif + /* * batch will not be updated. */ @@ -1290,17 +1309,22 @@ bool arch_tlbbatch_check_done(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen) { int cpu; + unsigned long flags; if (!ugen) goto out; + lufd_lock(flags); for_each_cpu(cpu, &batch->cpumask) { unsigned long done; done = atomic_long_read(per_cpu_ptr(&ugen_done, cpu)); - if (ugen_before(done, ugen)) + if (ugen_before(done, ugen)) { + lufd_unlock(flags); return false; + } } + lufd_unlock(flags); return true; out: return cpumask_empty(&batch->cpumask); @@ -1310,10 +1334,12 @@ bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen) { int cpu; + unsigned long flags; if (!ugen) goto out; + lufd_lock(flags); for_each_cpu(cpu, &batch->cpumask) { unsigned long done; @@ -1321,6 +1347,7 @@ bool arch_tlbbatch_diet(struct arch_tlbflush_unmap_batch *batch, if (!ugen_before(done, ugen)) cpumask_clear_cpu(cpu, &batch->cpumask); } + lufd_unlock(flags); out: return cpumask_empty(&batch->cpumask); } @@ -1329,10 +1356,12 @@ void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, unsigned long ugen) { int cpu; + unsigned long flags; if (!ugen) return; + lufd_lock(flags); for_each_cpu(cpu, &batch->cpumask) { atomic_long_t *done = per_cpu_ptr(&ugen_done, cpu); unsigned long old = atomic_long_read(done); @@ -1350,15 +1379,18 @@ void arch_tlbbatch_mark_ugen(struct arch_tlbflush_unmap_batch *batch, */ atomic_long_cmpxchg(done, old, ugen); } + lufd_unlock(flags); } void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen) { int cpu; + unsigned long flags; if (!ugen) return; + lufd_lock(flags); for_each_cpu(cpu, mm_cpumask(mm)) { atomic_long_t *done = per_cpu_ptr(&ugen_done, cpu); unsigned long old = atomic_long_read(done); @@ -1376,6 +1408,7 @@ void arch_mm_mark_ugen(struct mm_struct *mm, unsigned long ugen) */ atomic_long_cmpxchg(done, old, ugen); } + lufd_unlock(flags); } void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch) diff --git a/include/linux/highmem-internal.h b/include/linux/highmem-internal.h index dd100e849f5e0..0792530d1be7b 100644 --- a/include/linux/highmem-internal.h 
+++ b/include/linux/highmem-internal.h @@ -41,6 +41,7 @@ static inline void *kmap(struct page *page) { void *addr; + lufd_check_pages(page, 0); might_sleep(); if (!PageHighMem(page)) addr = page_address(page); @@ -161,6 +162,7 @@ static inline struct page *kmap_to_page(void *addr) static inline void *kmap(struct page *page) { + lufd_check_pages(page, 0); might_sleep(); return page_address(page); } @@ -177,11 +179,13 @@ static inline void kunmap(struct page *page) static inline void *kmap_local_page(struct page *page) { + lufd_check_pages(page, 0); return page_address(page); } static inline void *kmap_local_folio(struct folio *folio, size_t offset) { + lufd_check_folio(folio); return page_address(&folio->page) + offset; } @@ -204,6 +208,7 @@ static inline void __kunmap_local(const void *addr) static inline void *kmap_atomic(struct page *page) { + lufd_check_pages(page, 0); if (IS_ENABLED(CONFIG_PREEMPT_RT)) migrate_disable(); else diff --git a/include/linux/mm.h b/include/linux/mm.h index b41d7804a06a2..5304477e7da8e 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -46,6 +46,24 @@ extern int sysctl_page_lock_unfairness; void mm_core_init(void); void init_mm_internals(void); +#ifdef CONFIG_LUF_DEBUG +void lufd_check_folio(struct folio *f); +void lufd_check_pages(const struct page *p, unsigned int order); +void lufd_check_zone_pages(struct zone *zone, struct page *page, unsigned int order); +void lufd_check_queued_pages(void); +void lufd_queue_page_for_check(struct page *page, int order); +void lufd_mark_folio(struct folio *f, unsigned short luf_key); +void lufd_mark_pages(struct page *p, unsigned int order, unsigned short luf_key); +#else +static inline void lufd_check_folio(struct folio *f) {} +static inline void lufd_check_pages(const struct page *p, unsigned int order) {} +static inline void lufd_check_zone_pages(struct zone *zone, struct page *page, unsigned int order) {} +static inline void lufd_check_queued_pages(void) {} +static inline void lufd_queue_page_for_check(struct page *page, int order) {} +static inline void lufd_mark_folio(struct folio *f, unsigned short luf_key) {} +static inline void lufd_mark_pages(struct page *p, unsigned int order, unsigned short luf_key) {} +#endif + #ifndef CONFIG_NUMA /* Don't use mapnrs, do it properly */ extern unsigned long max_mapnr; @@ -115,7 +133,7 @@ extern int mmap_rnd_compat_bits __read_mostly; #endif #ifndef page_to_virt -#define page_to_virt(x) __va(PFN_PHYS(page_to_pfn(x))) +#define page_to_virt(x) ({ lufd_check_pages(x, 0); __va(PFN_PHYS(page_to_pfn(x)));}) #endif #ifndef lm_alias diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index a1d80ffafe338..30d29a6f9db4c 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -23,6 +23,10 @@ #include +#ifdef CONFIG_LUF_DEBUG +extern struct page_ext_operations luf_debug_ops; +#endif + #ifndef AT_VECTOR_SIZE_ARCH #define AT_VECTOR_SIZE_ARCH 0 #endif @@ -33,18 +37,6 @@ struct address_space; struct mem_cgroup; -#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH -struct luf_batch { - struct tlbflush_unmap_batch batch; - unsigned long ugen; - rwlock_t lock; -}; -void luf_batch_init(struct luf_batch *lb); -#else -struct luf_batch {}; -static inline void luf_batch_init(struct luf_batch *lb) {} -#endif - /* * Each physical page in the system has a struct page associated with * it to keep track of whatever it is we are using the page for at the diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h index a82aa80c0ba46..3b87f8674e528 100644 --- 
a/include/linux/mm_types_task.h +++ b/include/linux/mm_types_task.h @@ -10,6 +10,7 @@ #include #include +#include #include @@ -88,4 +89,19 @@ struct tlbflush_unmap_batch { #endif }; +#ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH +struct luf_batch { + struct tlbflush_unmap_batch batch; + unsigned long ugen; + rwlock_t lock; +}; +void luf_batch_init(struct luf_batch *lb); +#else +struct luf_batch {}; +static inline void luf_batch_init(struct luf_batch *lb) {} +#endif + +#if defined(CONFIG_LUF_DEBUG) +#define NR_LUFD_PAGES 512 +#endif #endif /* _LINUX_MM_TYPES_TASK_H */ diff --git a/include/linux/sched.h b/include/linux/sched.h index 96375274d0335..9cb8e6fa1b1b4 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1406,6 +1406,11 @@ struct task_struct { unsigned long luf_ugen; unsigned long zone_ugen; unsigned long wait_zone_ugen; +#if defined(CONFIG_LUF_DEBUG) + struct page *lufd_pages[NR_LUFD_PAGES]; + int lufd_pages_order[NR_LUFD_PAGES]; + int lufd_pages_nr; +#endif #endif struct tlbflush_unmap_batch tlb_ubc; diff --git a/mm/highmem.c b/mm/highmem.c index ef3189b36cadb..a323d5a655bf9 100644 --- a/mm/highmem.c +++ b/mm/highmem.c @@ -576,6 +576,7 @@ void *__kmap_local_page_prot(struct page *page, pgprot_t prot) { void *kmap; + lufd_check_pages(page, 0); /* * To broaden the usage of the actual kmap_local() machinery always map * pages when debugging is enabled and the architecture has no problems diff --git a/mm/memory.c b/mm/memory.c index 62137ab258d2c..26e8b73436eab 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -6259,6 +6259,18 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address, mapping = vma->vm_file->f_mapping; } +#ifdef CONFIG_LUF_DEBUG + if (luf_flush) { + /* + * If it has a VM_SHARED mapping, all the mms involved + * in the struct address_space should be luf_flush'ed. + */ + if (mapping) + luf_flush_mapping(mapping); + luf_flush_mm(mm); + } +#endif + if (unlikely(is_vm_hugetlb_page(vma))) ret = hugetlb_fault(vma->vm_mm, vma, address, flags); else diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 9a58d6f7a9609..8a114a4339d68 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -758,6 +758,8 @@ void luf_takeoff_end(struct zone *zone) VM_WARN_ON(current->zone_ugen); VM_WARN_ON(current->wait_zone_ugen); } + + lufd_check_queued_pages(); } /* @@ -853,8 +855,10 @@ bool luf_takeoff_check_and_fold(struct zone *zone, struct page *page) struct luf_batch *lb; unsigned long lb_ugen; - if (!luf_key) + if (!luf_key) { + lufd_check_pages(page, buddy_order(page)); return true; + } lb = &luf_batch[luf_key]; read_lock_irqsave(&lb->lock, flags); @@ -875,12 +879,15 @@ bool luf_takeoff_check_and_fold(struct zone *zone, struct page *page) if (!current->luf_ugen || ugen_before(current->luf_ugen, lb_ugen)) current->luf_ugen = lb_ugen; + lufd_queue_page_for_check(page, buddy_order(page)); return true; } zone_ugen = page_zone_ugen(zone, page); - if (!zone_ugen) + if (!zone_ugen) { + lufd_check_pages(page, buddy_order(page)); return true; + } /* * Should not be zero since zone-zone_ugen has been updated in @@ -888,17 +895,23 @@ bool luf_takeoff_check_and_fold(struct zone *zone, struct page *page) */ VM_WARN_ON(!zone->zone_ugen); - if (!ugen_before(READ_ONCE(zone->zone_ugen_done), zone_ugen)) + if (!ugen_before(READ_ONCE(zone->zone_ugen_done), zone_ugen)) { + lufd_check_pages(page, buddy_order(page)); return true; + } if (current->luf_no_shootdown) return false; + lufd_check_zone_pages(zone, page, buddy_order(page)); + /* * zone batched flush has been already set. 
*/ - if (current->zone_ugen) + if (current->zone_ugen) { + lufd_queue_page_for_check(page, buddy_order(page)); return true; + } /* * Others are already performing tlb shootdown for us. All we @@ -933,6 +946,7 @@ bool luf_takeoff_check_and_fold(struct zone *zone, struct page *page) atomic_long_set(&zone->nr_luf_pages, 0); fold_batch(tlb_ubc_takeoff, &zone->zone_batch, true); } + lufd_queue_page_for_check(page, buddy_order(page)); return true; } #endif @@ -1238,6 +1252,11 @@ static inline void __free_one_page(struct page *page, } else zone_ugen = page_zone_ugen(zone, page); + if (!zone_ugen) + lufd_check_pages(page, order); + else + lufd_check_zone_pages(zone, page, order); + while (order < MAX_PAGE_ORDER) { int buddy_mt = migratetype; unsigned long buddy_zone_ugen; @@ -1299,6 +1318,10 @@ static inline void __free_one_page(struct page *page, set_page_zone_ugen(page, zone_ugen); pfn = combined_pfn; order++; + if (!zone_ugen) + lufd_check_pages(page, order); + else + lufd_check_zone_pages(zone, page, order); } done_merging: @@ -3246,6 +3269,8 @@ void free_frozen_pages(struct page *page, unsigned int order, unsigned long pfn = page_to_pfn(page); int migratetype; + lufd_mark_pages(page, order, luf_key); + if (!pcp_allowed_order(order)) { __free_pages_ok(page, order, FPI_NONE, luf_key); return; @@ -3298,6 +3323,7 @@ void free_unref_folios(struct folio_batch *folios, unsigned short luf_key) unsigned long pfn = folio_pfn(folio); unsigned int order = folio_order(folio); + lufd_mark_folio(folio, luf_key); if (!free_pages_prepare(&folio->page, order)) continue; /* diff --git a/mm/page_ext.c b/mm/page_ext.c index 641d93f6af4c1..be40bc2a93378 100644 --- a/mm/page_ext.c +++ b/mm/page_ext.c @@ -89,6 +89,9 @@ static struct page_ext_operations *page_ext_ops[] __initdata = { #ifdef CONFIG_PAGE_TABLE_CHECK &page_table_check_ops, #endif +#ifdef CONFIG_LUF_DEBUG + &luf_debug_ops, +#endif }; unsigned long page_ext_size; diff --git a/mm/rmap.c b/mm/rmap.c index 124ef59afa25e..11bdbbc47ad11 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1161,6 +1161,235 @@ static bool should_defer_flush(struct mm_struct *mm, enum ttu_flags flags) } #endif /* CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH */ +#ifdef CONFIG_LUF_DEBUG + +static bool need_luf_debug(void) +{ + return true; +} + +static void init_luf_debug(void) +{ + /* Do nothing */ +} + +struct page_ext_operations luf_debug_ops = { + .size = sizeof(struct luf_batch), + .need = need_luf_debug, + .init = init_luf_debug, + .need_shared_flags = false, +}; + +static bool __lufd_check_zone_pages(struct page *page, int nr, + struct tlbflush_unmap_batch *batch, unsigned long ugen) +{ + int i; + + for (i = 0; i < nr; i++) { + struct page_ext *page_ext; + struct luf_batch *lb; + unsigned long lb_ugen; + unsigned long flags; + bool ret; + + page_ext = page_ext_get(page + i); + if (!page_ext) + continue; + + lb = (struct luf_batch *)page_ext_data(page_ext, &luf_debug_ops); + write_lock_irqsave(&lb->lock, flags); + lb_ugen = lb->ugen; + ret = arch_tlbbatch_done(&lb->batch.arch, &batch->arch); + write_unlock_irqrestore(&lb->lock, flags); + page_ext_put(page_ext); + + if (!ret || ugen_before(ugen, lb_ugen)) + return false; + } + return true; +} + +void lufd_check_zone_pages(struct zone *zone, struct page *page, unsigned int order) +{ + bool warn; + static bool once = false; + + if (!page || !zone) + return; + + warn = !__lufd_check_zone_pages(page, 1 << order, + &zone->zone_batch, zone->luf_ugen); + + if (warn && !READ_ONCE(once)) { + WRITE_ONCE(once, true); + VM_WARN(1, "LUFD: ugen(%lu) 
page(%p) order(%u)\n", + atomic_long_read(&luf_ugen), page, order); + print_lufd_arch(); + } +} + +static bool __lufd_check_pages(const struct page *page, int nr) +{ + int i; + + for (i = 0; i < nr; i++) { + struct page_ext *page_ext; + struct luf_batch *lb; + unsigned long lb_ugen; + unsigned long flags; + bool ret; + + page_ext = page_ext_get(page + i); + if (!page_ext) + continue; + + lb = (struct luf_batch *)page_ext_data(page_ext, &luf_debug_ops); + write_lock_irqsave(&lb->lock, flags); + lb_ugen = lb->ugen; + ret = arch_tlbbatch_diet(&lb->batch.arch, lb_ugen); + write_unlock_irqrestore(&lb->lock, flags); + page_ext_put(page_ext); + + if (!ret) + return false; + } + return true; +} + +void lufd_queue_page_for_check(struct page *page, int order) +{ + struct page **parray = current->lufd_pages; + int *oarray = current->lufd_pages_order; + + if (!page) + return; + + if (current->lufd_pages_nr >= NR_LUFD_PAGES) { + VM_WARN_ONCE(1, "LUFD: NR_LUFD_PAGES is too small.\n"); + return; + } + + *(parray + current->lufd_pages_nr) = page; + *(oarray + current->lufd_pages_nr) = order; + current->lufd_pages_nr++; +} + +void lufd_check_queued_pages(void) +{ + struct page **parray = current->lufd_pages; + int *oarray = current->lufd_pages_order; + int i; + + for (i = 0; i < current->lufd_pages_nr; i++) + lufd_check_pages(*(parray + i), *(oarray + i)); + current->lufd_pages_nr = 0; +} + +void lufd_check_folio(struct folio *folio) +{ + struct page *page; + int nr; + bool warn; + static bool once = false; + + if (!folio) + return; + + page = folio_page(folio, 0); + nr = folio_nr_pages(folio); + + warn = !__lufd_check_pages(page, nr); + + if (warn && !READ_ONCE(once)) { + WRITE_ONCE(once, true); + VM_WARN(1, "LUFD: ugen(%lu) page(%p) nr(%d)\n", + atomic_long_read(&luf_ugen), page, nr); + print_lufd_arch(); + } +} +EXPORT_SYMBOL(lufd_check_folio); + +void lufd_check_pages(const struct page *page, unsigned int order) +{ + bool warn; + static bool once = false; + + if (!page) + return; + + warn = !__lufd_check_pages(page, 1 << order); + + if (warn && !READ_ONCE(once)) { + WRITE_ONCE(once, true); + VM_WARN(1, "LUFD: ugen(%lu) page(%p) order(%u)\n", + atomic_long_read(&luf_ugen), page, order); + print_lufd_arch(); + } +} +EXPORT_SYMBOL(lufd_check_pages); + +static void __lufd_mark_pages(struct page *page, int nr, unsigned short luf_key) +{ + int i; + + for (i = 0; i < nr; i++) { + struct page_ext *page_ext; + struct luf_batch *lb; + + page_ext = page_ext_get(page + i); + if (!page_ext) + continue; + + lb = (struct luf_batch *)page_ext_data(page_ext, &luf_debug_ops); + fold_luf_batch(lb, &luf_batch[luf_key]); + page_ext_put(page_ext); + } +} + +void lufd_mark_folio(struct folio *folio, unsigned short luf_key) +{ + struct page *page; + int nr; + bool warn; + static bool once = false; + + if (!luf_key) + return; + + page = folio_page(folio, 0); + nr = folio_nr_pages(folio); + + warn = !__lufd_check_pages(page, nr); + __lufd_mark_pages(page, nr, luf_key); + + if (warn && !READ_ONCE(once)) { + WRITE_ONCE(once, true); + VM_WARN(1, "LUFD: ugen(%lu) page(%p) nr(%d)\n", + atomic_long_read(&luf_ugen), page, nr); + print_lufd_arch(); + } +} + +void lufd_mark_pages(struct page *page, unsigned int order, unsigned short luf_key) +{ + bool warn; + static bool once = false; + + if (!luf_key) + return; + + warn = !__lufd_check_pages(page, 1 << order); + __lufd_mark_pages(page, 1 << order, luf_key); + + if (warn && !READ_ONCE(once)) { + WRITE_ONCE(once, true); + VM_WARN(1, "LUFD: ugen(%lu) page(%p) order(%u)\n", + 
atomic_long_read(&luf_ugen), page, order); + print_lufd_arch(); + } +} +#endif + /** * page_address_in_vma - The virtual address of a page in this VMA. * @folio: The folio containing the page.