From patchwork Wed Feb 26 12:01:17 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Byungchul Park <byungchul@sk.com>
X-Patchwork-Id: 13992178
From: Byungchul Park <byungchul@sk.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com,
    mgorman@techsingularity.net, hughd@google.com, willy@infradead.org,
    david@redhat.com, peterz@infradead.org, luto@kernel.org,
    tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on mm-unstable as of Feb 21, 2025 10/25] mm: introduce APIs to check if the page allocation is tlb shootdownable
Date: Wed, 26 Feb 2025 21:01:17 +0900
Message-Id: <20250226120132.28469-10-byungchul@sk.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20250226120132.28469-1-byungchul@sk.com>
References: <20250226113342.GB1935@system.software.com>
 <20250226120132.28469-1-byungchul@sk.com>
Functionally, no change.  This is a preparation for the luf mechanism,
which needs to identify whether tlb shootdown can be performed at page
allocation time.  In a context with irq disabled, or in a non-task
context, tlb shootdown cannot be performed because it could deadlock.
Thus, the page allocator should be aware of whether tlb shootdown can
be performed on the pages it returns.

This patch introduces APIs that the pcp and buddy page allocators can
use to delimit the critical sections in which pages are taken off, and
to identify whether tlb shootdown can be performed.
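To illustrate the expected usage, the following is a minimal sketch of
the call pattern (illustration only, not part of this patch;
hypothetical_remove_page() and hypothetical_put_page_back() are
hypothetical placeholders for the real pcp/buddy list manipulation):

	static struct page *luf_aware_alloc(struct zone *zone,
					    unsigned int order)
	{
		struct page *page;
		unsigned long flags;

		/*
		 * Open the takeoff window.  The return value tells
		 * whether this context is allowed to perform tlb
		 * shootdown at all.
		 */
		luf_takeoff_start();

		spin_lock_irqsave(&zone->lock, flags);
		page = hypothetical_remove_page(zone, order);

		/*
		 * With zone->lock held and irq disabled: a page whose
		 * luf_key is still pending may leave the allocator
		 * only once its batch has been folded into
		 * current->tlb_ubc_takeoff.
		 */
		if (page && !luf_takeoff_check_and_fold(page)) {
			hypothetical_put_page_back(zone, page, order);
			page = NULL;
		}
		spin_unlock_irqrestore(&zone->lock, flags);

		/*
		 * Close the window.  The outermost luf_takeoff_end()
		 * performs try_to_unmap_flush_takeoff() if shootdown
		 * is allowed.
		 */
		luf_takeoff_end();

		return page;
	}

Note that the sections may nest; the per-task counters guarantee that
only the outermost pair performs the actual flush.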
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
 include/linux/sched.h |   5 ++
 mm/internal.h         |  14 ++++
 mm/page_alloc.c       | 159 ++++++++++++++++++++++++++++++++++++++++++
 mm/rmap.c             |   2 +-
 4 files changed, 179 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 86ef426644639..a3049ea5b3ad3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1400,6 +1400,11 @@ struct task_struct {
 	struct callback_head		cid_work;
 #endif
 
+#if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH)
+	int				luf_no_shootdown;
+	int				luf_takeoff_started;
+#endif
+
 	struct tlbflush_unmap_batch	tlb_ubc;
 	struct tlbflush_unmap_batch	tlb_ubc_takeoff;
 
diff --git a/mm/internal.h b/mm/internal.h
index 8ad7e86c1c0e2..bf16482bce2f5 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1598,6 +1598,20 @@ static inline void accept_page(struct page *page)
 {
 }
 #endif /* CONFIG_UNACCEPTED_MEMORY */
 
+#if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH)
+extern struct luf_batch luf_batch[];
+bool luf_takeoff_start(void);
+void luf_takeoff_end(void);
+bool luf_takeoff_no_shootdown(void);
+bool luf_takeoff_check(struct page *page);
+bool luf_takeoff_check_and_fold(struct page *page);
+#else
+static inline bool luf_takeoff_start(void) { return false; }
+static inline void luf_takeoff_end(void) {}
+static inline bool luf_takeoff_no_shootdown(void) { return true; }
+static inline bool luf_takeoff_check(struct page *page) { return true; }
+static inline bool luf_takeoff_check_and_fold(struct page *page) { return true; }
+#endif
+
 /* pagewalk.c */
 int walk_page_range_mm(struct mm_struct *mm, unsigned long start,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f3930a2a05cd3..f3cb02e36e770 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -622,6 +622,165 @@ compaction_capture(struct capture_control *capc, struct page *page,
 }
 #endif /* CONFIG_COMPACTION */
 
+#if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH)
+static bool no_shootdown_context(void)
+{
+	/*
+	 * If it performs with irq disabled, that might cause a deadlock.
+	 * Avoid tlb shootdown in this case.
+	 */
+	return !(!irqs_disabled() && in_task());
+}
+
+/*
+ * Can be called with zone lock released and irq enabled.
+ */
+bool luf_takeoff_start(void)
+{
+	unsigned long flags;
+	bool no_shootdown = no_shootdown_context();
+
+	local_irq_save(flags);
+
+	/*
+	 * It's the outermost luf_takeoff_start().
+	 */
+	if (!current->luf_takeoff_started)
+		VM_WARN_ON(current->luf_no_shootdown);
+
+	/*
+	 * current->luf_no_shootdown > 0 doesn't mean tlb shootdown is
+	 * not allowed at all.  However, it guarantees tlb shootdown is
+	 * possible once current->luf_no_shootdown == 0.  It might look
+	 * too conservative but for now do it this way for simplicity.
+	 */
+	if (no_shootdown || current->luf_no_shootdown)
+		current->luf_no_shootdown++;
+
+	current->luf_takeoff_started++;
+	local_irq_restore(flags);
+
+	return !no_shootdown;
+}
+
+/*
+ * Should be called within the same context as luf_takeoff_start().
+ */
+void luf_takeoff_end(void)
+{
+	unsigned long flags;
+	bool no_shootdown;
+	bool outmost = false;
+
+	local_irq_save(flags);
+	VM_WARN_ON(!current->luf_takeoff_started);
+
+	/*
+	 * Assume the context and irq flags are the same as those at
+	 * luf_takeoff_start().
+	 */
+	if (current->luf_no_shootdown)
+		current->luf_no_shootdown--;
+
+	no_shootdown = !!current->luf_no_shootdown;
+
+	current->luf_takeoff_started--;
+
+	/*
+	 * It's the outermost luf_takeoff_end().
+	 */
+	if (!current->luf_takeoff_started)
+		outmost = true;
+
+	local_irq_restore(flags);
+
+	if (no_shootdown)
+		goto out;
+
+	try_to_unmap_flush_takeoff();
+out:
+	if (outmost)
+		VM_WARN_ON(current->luf_no_shootdown);
+}
+
+/*
+ * Can be called with zone lock released and irq enabled.
+ */
+bool luf_takeoff_no_shootdown(void)
+{
+	bool no_shootdown = true;
+	unsigned long flags;
+
+	local_irq_save(flags);
+
+	/*
+	 * No way.  Delimit using luf_takeoff_{start,end}().
+	 */
+	if (unlikely(!current->luf_takeoff_started)) {
+		VM_WARN_ON(1);
+		goto out;
+	}
+	no_shootdown = current->luf_no_shootdown;
+out:
+	local_irq_restore(flags);
+	return no_shootdown;
+}
+
+/*
+ * Should be called with either zone lock held and irq disabled or pcp
+ * lock held.
+ */
+bool luf_takeoff_check(struct page *page)
+{
+	unsigned short luf_key = page_luf_key(page);
+
+	/*
+	 * No way.  Delimit using luf_takeoff_{start,end}().
+	 */
+	if (unlikely(!current->luf_takeoff_started)) {
+		VM_WARN_ON(1);
+		return false;
+	}
+
+	if (!luf_key)
+		return true;
+
+	return !current->luf_no_shootdown;
+}
+
+/*
+ * Should be called with either zone lock held and irq disabled or pcp
+ * lock held.
+ */
+bool luf_takeoff_check_and_fold(struct page *page)
+{
+	struct tlbflush_unmap_batch *tlb_ubc_takeoff = &current->tlb_ubc_takeoff;
+	unsigned short luf_key = page_luf_key(page);
+	struct luf_batch *lb;
+	unsigned long flags;
+
+	/*
+	 * No way.  Delimit using luf_takeoff_{start,end}().
+	 */
+	if (unlikely(!current->luf_takeoff_started)) {
+		VM_WARN_ON(1);
+		return false;
+	}
+
+	if (!luf_key)
+		return true;
+
+	if (current->luf_no_shootdown)
+		return false;
+
+	lb = &luf_batch[luf_key];
+	read_lock_irqsave(&lb->lock, flags);
+	fold_batch(tlb_ubc_takeoff, &lb->batch, false);
+	read_unlock_irqrestore(&lb->lock, flags);
+	return true;
+}
+#endif
+
 static inline void account_freepages(struct zone *zone, int nr_pages,
 				     int migratetype)
 {
diff --git a/mm/rmap.c b/mm/rmap.c
index 61366b4570c9a..40de03c8f73be 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -693,7 +693,7 @@ void fold_batch(struct tlbflush_unmap_batch *dst,
 /*
  * Use 0th entry as accumulated batch.
  */
-static struct luf_batch luf_batch[NR_LUF_BATCH];
+struct luf_batch luf_batch[NR_LUF_BATCH];
 
 static void luf_batch_init(struct luf_batch *lb)
 {