From patchwork Wed Feb 26 12:03:21 2025
X-Patchwork-Submitter: Byungchul Park <byungchul@sk.com>
X-Patchwork-Id: 13992203
From: Byungchul Park <byungchul@sk.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com,
    mgorman@techsingularity.net, hughd@google.com, willy@infradead.org,
    david@redhat.com, peterz@infradead.org, luto@kernel.org,
    tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, rjgolo@gmail.com
Subject: [RFC PATCH v12 based on v6.14-rc4 10/25] mm: introduce APIs to check
 if the page allocation is tlb shootdownable
Date: Wed, 26 Feb 2025 21:03:21 +0900
Message-Id: <20250226120336.29565-10-byungchul@sk.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20250226120336.29565-1-byungchul@sk.com>
References: <20250226113024.GA1935@system.software.com>
 <20250226120336.29565-1-byungchul@sk.com>
U0LgPpvEh85DUEdIShxccYNlAiPvAkaGVYxCmXlluYmZOSZ6GZV5mRV6yfm5mxiBQb2s9k/0 DsZPF4IPMQpwMCrx8D44szddiDWxrLgy9xCjBAezkggvZ+aedCHelMTKqtSi/Pii0pzU4kOM 0hwsSuK8Rt/KU4QE0hNLUrNTUwtSi2CyTBycUg2MvL/PXT7G8VdX5OZb3reF2/sdtqj2Rl5/ MHklb7zpdt0NTBf/fLA8sUHTVOSa+BI1N5W9CvFv1i2+n+idoLn7wdF7uee8Qj7ZG/8LPFun 0H7wgbPrMRe1mGnsJT89Am8zxPeHC81SLRLzfHUv1OvV9McXnvOfrZJznlT5QaqTcUZ29Jtf qiFWSizFGYmGWsxFxYkANVSxGmYCAAA= X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFtrFLMWRmVeSWpSXmKPExsXC5WfdrKvEvD/d4MZ0DYs569ewWXze8I/N 4uv6X8wWTz/1sVgcnnuS1eLyrjlsFvfW/Ge1OL9rLavFjqX7mCwuHVjAZHG89wCTxfx7n9ks Nm+aymxxfMpURovfP+awOfB7fG/tY/HYOesuu8eCTaUem1doeWxa1cnmsenTJHaPd+fOsXuc mPGbxeP9vqtsHotffGDy2PrLzqNx6jU2j8+b5AJ4o7hsUlJzMstSi/TtErgybmw9xlbw07hi /fRtzA2Mx7W6GDk5JARMJKZ/e8ACYrMJqEvcuPGTGcQWETCTONj6h72LkYuDWWAZk8TeEw1s XYwcHMIClRI7TseA1LAIqEp8uXOSEcTmBapf+XgeG8RMeYnVGw6AzeEEin+adgwsLiSQLLHz 9x+mCYxcCxgZVjGKZOaV5SZm5pjqFWdnVOZlVugl5+duYgSG6LLaPxN3MH657H6IUYCDUYmH 98GZvelCrIllxZW5hxglOJiVRHg5M/ekC/GmJFZWpRblxxeV5qQWH2KU5mBREuf1Ck9NEBJI TyxJzU5NLUgtgskycXBKNTA6mb2Z9zROaOuB2cx19w2TS85smnSY58WOj6XfmPgmTbcMED+X NvvLsoXrFn9d9OxmQ1HXhL51S9Yq1r69KTy9a+PUBnH3VddqVjTnKPfVs043U/57Ly3jnd1H M8O5jz4d5BEXTkpc/LPWwLDkucjWw+YPnP/Nt/kg+Xv/s4mMIbWyrlF3oz55KrEUZyQaajEX FScCAO0dzwxNAgAA X-CFilter-Loop: Reflected X-Rspam-User: X-Stat-Signature: edrktxwrfs36pxdhujp15fh1n19tdnbo X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: A8A7B1A0019 X-HE-Tag: 1740571430-493008 X-HE-Meta: 
Functionally, no change.

This is a preparation for the luf mechanism, which needs to identify
whether tlb shootdown can be performed at page allocation time.  In a
context with irqs disabled, or in a non-task context, tlb shootdown
cannot be performed because of deadlock issues.  Thus, the page
allocator should work aware of whether tlb shootdown can be performed
on the pages being returned.

This patch introduces APIs that the pcp or buddy page allocator can use
to delimit the critical sections where pages are taken off, and to
identify whether tlb shootdown can be performed.
Signed-off-by: Byungchul Park <byungchul@sk.com>
---
 include/linux/sched.h |   5 ++
 mm/internal.h         |  14 ++++
 mm/page_alloc.c       | 159 ++++++++++++++++++++++++++++++++++++++++++
 mm/rmap.c             |   2 +-
 4 files changed, 179 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 86ef426644639..a3049ea5b3ad3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1400,6 +1400,11 @@ struct task_struct {
 	struct callback_head		cid_work;
 #endif
 
+#if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH)
+	int				luf_no_shootdown;
+	int				luf_takeoff_started;
+#endif
+
 	struct tlbflush_unmap_batch	tlb_ubc;
 	struct tlbflush_unmap_batch	tlb_ubc_takeoff;
 
diff --git a/mm/internal.h b/mm/internal.h
index b52e14f86c436..5e67f009d23c6 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1580,6 +1580,20 @@ static inline void accept_page(struct page *page)
 {
 }
 #endif /* CONFIG_UNACCEPTED_MEMORY */
 
+#if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH)
+extern struct luf_batch luf_batch[];
+bool luf_takeoff_start(void);
+void luf_takeoff_end(void);
+bool luf_takeoff_no_shootdown(void);
+bool luf_takeoff_check(struct page *page);
+bool luf_takeoff_check_and_fold(struct page *page);
+#else
+static inline bool luf_takeoff_start(void) { return false; }
+static inline void luf_takeoff_end(void) {}
+static inline bool luf_takeoff_no_shootdown(void) { return true; }
+static inline bool luf_takeoff_check(struct page *page) { return true; }
+static inline bool luf_takeoff_check_and_fold(struct page *page) { return true; }
+#endif
 /* pagewalk.c */
 int walk_page_range_mm(struct mm_struct *mm, unsigned long start,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 27aeee0cfcf8f..a964a98fbad51 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -622,6 +622,165 @@ compaction_capture(struct capture_control *capc, struct page *page,
 }
 #endif /* CONFIG_COMPACTION */
 
+#if defined(CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH)
+static bool no_shootdown_context(void)
+{
+	/*
+	 * Performing tlb shootdown with irqs disabled or outside of
+	 * task context might cause a deadlock.  Avoid tlb shootdown in
+	 * these cases.
+	 */
+	return !(!irqs_disabled() && in_task());
+}
+
+/*
+ * Can be called with zone lock released and irq enabled.
+ */
+bool luf_takeoff_start(void)
+{
+	unsigned long flags;
+	bool no_shootdown = no_shootdown_context();
+
+	local_irq_save(flags);
+
+	/*
+	 * It's the outermost luf_takeoff_start().
+	 */
+	if (!current->luf_takeoff_started)
+		VM_WARN_ON(current->luf_no_shootdown);
+
+	/*
+	 * current->luf_no_shootdown > 0 doesn't mean tlb shootdown is
+	 * not allowed at all.  However, it guarantees tlb shootdown is
+	 * possible once current->luf_no_shootdown == 0.  It might look
+	 * too conservative but for now do it this way for simplicity.
+	 */
+	if (no_shootdown || current->luf_no_shootdown)
+		current->luf_no_shootdown++;
+
+	current->luf_takeoff_started++;
+	local_irq_restore(flags);
+
+	return !no_shootdown;
+}
+
+/*
+ * Should be called within the same context as luf_takeoff_start().
+ */
+void luf_takeoff_end(void)
+{
+	unsigned long flags;
+	bool no_shootdown;
+	bool outermost = false;
+
+	local_irq_save(flags);
+	VM_WARN_ON(!current->luf_takeoff_started);
+
+	/*
+	 * Assume the context and irq flags are the same as those at
+	 * luf_takeoff_start().
+	 */
+	if (current->luf_no_shootdown)
+		current->luf_no_shootdown--;
+
+	no_shootdown = !!current->luf_no_shootdown;
+
+	current->luf_takeoff_started--;
+
+	/*
+	 * It's the outermost luf_takeoff_end().
+	 */
+	if (!current->luf_takeoff_started)
+		outermost = true;
+
+	local_irq_restore(flags);
+
+	if (no_shootdown)
+		goto out;
+
+	try_to_unmap_flush_takeoff();
+out:
+	if (outermost)
+		VM_WARN_ON(current->luf_no_shootdown);
+}
+
+/*
+ * Can be called with zone lock released and irq enabled.
+ */
+bool luf_takeoff_no_shootdown(void)
+{
+	bool no_shootdown = true;
+	unsigned long flags;
+
+	local_irq_save(flags);
+
+	/*
+	 * No way.  Delimit using luf_takeoff_{start,end}().
+	 */
+	if (unlikely(!current->luf_takeoff_started)) {
+		VM_WARN_ON(1);
+		goto out;
+	}
+	no_shootdown = current->luf_no_shootdown;
+out:
+	local_irq_restore(flags);
+	return no_shootdown;
+}
+
+/*
+ * Should be called with either zone lock held and irq disabled or pcp
+ * lock held.
+ */
+bool luf_takeoff_check(struct page *page)
+{
+	unsigned short luf_key = page_luf_key(page);
+
+	/*
+	 * No way.  Delimit using luf_takeoff_{start,end}().
+	 */
+	if (unlikely(!current->luf_takeoff_started)) {
+		VM_WARN_ON(1);
+		return false;
+	}
+
+	if (!luf_key)
+		return true;
+
+	return !current->luf_no_shootdown;
+}
+
+/*
+ * Should be called with either zone lock held and irq disabled or pcp
+ * lock held.
+ */
+bool luf_takeoff_check_and_fold(struct page *page)
+{
+	struct tlbflush_unmap_batch *tlb_ubc_takeoff = &current->tlb_ubc_takeoff;
+	unsigned short luf_key = page_luf_key(page);
+	struct luf_batch *lb;
+	unsigned long flags;
+
+	/*
+	 * No way.  Delimit using luf_takeoff_{start,end}().
+	 */
+	if (unlikely(!current->luf_takeoff_started)) {
+		VM_WARN_ON(1);
+		return false;
+	}
+
+	if (!luf_key)
+		return true;
+
+	if (current->luf_no_shootdown)
+		return false;
+
+	lb = &luf_batch[luf_key];
+	read_lock_irqsave(&lb->lock, flags);
+	fold_batch(tlb_ubc_takeoff, &lb->batch, false);
+	read_unlock_irqrestore(&lb->lock, flags);
+	return true;
+}
+#endif
+
 static inline void account_freepages(struct zone *zone, int nr_pages,
 				     int migratetype)
 {
diff --git a/mm/rmap.c b/mm/rmap.c
index 72c5e665e59a4..1581b1a00f974 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -693,7 +693,7 @@ void fold_batch(struct tlbflush_unmap_batch *dst,
 /*
  * Use 0th entry as accumulated batch.
  */
-static struct luf_batch luf_batch[NR_LUF_BATCH];
+struct luf_batch luf_batch[NR_LUF_BATCH];
 
 static void luf_batch_init(struct luf_batch *lb)
 {