From patchwork Fri May 19 12:01:27 2023
X-Patchwork-Submitter: Baoquan He
X-Patchwork-Id: 13248260
Date: Fri, 19 May 2023 20:01:27 +0800
From: Baoquan He
To: Thomas Gleixner
Cc: "Russell King (Oracle)", Andrew Morton, linux-mm@kvack.org, Christoph Hellwig, Uladzislau Rezki, Lorenzo Stoakes, Peter Zijlstra, John Ogness, linux-arm-kernel@lists.infradead.org, Mark Rutland, Marc Zyngier, x86@kernel.org
Subject: [RFC PATCH 1/3] mm/vmalloc.c: try to flush vmap_area one by one
References: <87zg658fla.ffs@tglx> <87r0rg93z5.ffs@tglx> <87ilcs8zab.ffs@tglx> <87fs7w8z6y.ffs@tglx> <874joc8x7d.ffs@tglx> <87r0rg73wp.ffs@tglx> <87edng6qu8.ffs@tglx>
In-Reply-To: <87edng6qu8.ffs@tglx>
In the current __purge_vmap_area_lazy(), when flushing the TLB for the vmalloc area, the flush range is calculated as the [min:max] of the vas. That calculated range can be big because of the gaps between the vas. E.g. in the graph below, only 12 pages (4 from va_1, 8 from va_2) are mapped, while the calculated flush range is 58 pages:

      VA_1                          VA_2
   |....|-------------------------|........|
   10   14                        60       68

   . mapped; - not mapped.

Sometimes the calculated flush range can be surprisingly huge, because the vas may cross two kernel virtual address areas.
E.g. the vmalloc area and the kernel module area are very far away from each other on some architectures. So for systems which lack a cheap full TLB flush, flushing such a long range is a big problem (it takes time), and flushing the vas one by one becomes necessary.

Hence, introduce flush_tlb_kernel_vas() to flush the vas one by one, and add ARCH_HAS_FLUSH_TLB_KERNEL_VAS to indicate whether an architecture provides a flush_tlb_kernel_vas() implementation. Otherwise, keep the old way of calculating and flushing the whole range.

Signed-off-by: Thomas Gleixner
Signed-off-by: Baoquan He # Fix 'undefined reference to flush_tlb_kernel_vas' error
---
 arch/Kconfig              |  4 ++++
 arch/arm/Kconfig          |  1 +
 arch/arm/kernel/smp_tlb.c | 23 +++++++++++++++++++++++
 arch/x86/Kconfig          |  1 +
 arch/x86/mm/tlb.c         | 22 ++++++++++++++++++++++
 include/linux/vmalloc.h   |  8 ++++++++
 mm/vmalloc.c              | 32 ++++++++++++++++++++++----------
 7 files changed, 81 insertions(+), 10 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 205fd23e0cad..ca5413f1e4e0 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -270,6 +270,10 @@ config ARCH_HAS_SET_MEMORY
 config ARCH_HAS_SET_DIRECT_MAP
 	bool
 
+# Select if architecture provides flush_tlb_kernel_vas()
+config ARCH_HAS_FLUSH_TLB_KERNEL_VAS
+	bool
+
 #
 # Select if the architecture provides the arch_dma_set_uncached symbol to
 # either provide an uncached segment alias for a DMA allocation, or
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 0fb4b218f665..c4de7f38f9a7 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -10,6 +10,7 @@ config ARM
 	select ARCH_HAS_DMA_WRITE_COMBINE if !ARM_DMA_MEM_BUFFERABLE
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_FORTIFY_SOURCE
+	select ARCH_HAS_FLUSH_TLB_KERNEL_VAS
 	select ARCH_HAS_KEEPINITRD
 	select ARCH_HAS_KCOV
 	select ARCH_HAS_MEMBARRIER_SYNC_CORE
diff --git a/arch/arm/kernel/smp_tlb.c b/arch/arm/kernel/smp_tlb.c
index d4908b3736d8..22ec9b982cb1 100644
--- a/arch/arm/kernel/smp_tlb.c
+++ b/arch/arm/kernel/smp_tlb.c
@@ -7,6 +7,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -69,6 +70,19 @@ static inline void ipi_flush_tlb_kernel_range(void *arg)
 	local_flush_tlb_kernel_range(ta->ta_start, ta->ta_end);
 }
 
+static inline void local_flush_tlb_kernel_vas(struct list_head *vmap_list)
+{
+	struct vmap_area *va;
+
+	list_for_each_entry(va, vmap_list, list)
+		local_flush_tlb_kernel_range(va->va_start, va->va_end);
+}
+
+static inline void ipi_flush_tlb_kernel_vas(void *arg)
+{
+	local_flush_tlb_kernel_vas(arg);
+}
+
 static inline void ipi_flush_bp_all(void *ignored)
 {
 	local_flush_bp_all();
@@ -244,6 +258,15 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 	broadcast_tlb_a15_erratum();
 }
 
+void flush_tlb_kernel_vas(struct list_head *vmap_list, unsigned long num_entries)
+{
+	if (tlb_ops_need_broadcast())
+		on_each_cpu(ipi_flush_tlb_kernel_vas, vmap_list, 1);
+	else
+		local_flush_tlb_kernel_vas(vmap_list);
+	broadcast_tlb_a15_erratum();
+}
+
 void flush_bp_all(void)
 {
 	if (tlb_ops_need_broadcast())
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 53bab123a8ee..7d7a44810a0b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -77,6 +77,7 @@ config X86
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select ARCH_HAS_EARLY_DEBUG if KGDB
 	select ARCH_HAS_ELF_RANDOMIZE
+	select ARCH_HAS_FLUSH_TLB_KERNEL_VAS
 	select ARCH_HAS_FAST_MULTIPLIER
 	select ARCH_HAS_FORTIFY_SOURCE
 	select ARCH_HAS_GCOV_PROFILE_ALL
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 267acf27480a..c39d77eb37e4 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -1081,6 +1082,27 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 	}
 }
 
+static void do_flush_tlb_vas(void *arg)
+{
+	struct list_head *vmap_list = arg;
+	struct vmap_area *va;
+	unsigned long addr;
+
+	list_for_each_entry(va, vmap_list, list) {
+		/* flush the range one page at a time with 'invlpg' */
+		for (addr = va->va_start; addr < va->va_end; addr += PAGE_SIZE)
+			flush_tlb_one_kernel(addr);
+	}
+}
+
+void flush_tlb_kernel_vas(struct list_head *vmap_list, unsigned long num_entries)
+{
+	if (num_entries > tlb_single_page_flush_ceiling)
+		on_each_cpu(do_flush_tlb_all, NULL, 1);
+	else
+		on_each_cpu(do_flush_tlb_vas, vmap_list, 1);
+}
+
 /*
  * This can be used from process context to figure out what the value of
  * CR3 is without needing to do a (slow) __read_cr3().
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index c720be70c8dd..a9a1e488261d 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -295,4 +295,12 @@ bool vmalloc_dump_obj(void *object);
 static inline bool vmalloc_dump_obj(void *object) { return false; }
 #endif
 
+#if defined(CONFIG_ARCH_HAS_FLUSH_TLB_KERNEL_VAS)
+void flush_tlb_kernel_vas(struct list_head *list, unsigned long num_entries);
+#else
+static inline void flush_tlb_kernel_vas(struct list_head *list, unsigned long num_entries)
+{
+}
+#endif
+
 #endif /* _LINUX_VMALLOC_H */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index c0f80982eb06..31e8d9e93650 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1724,7 +1724,8 @@ static void purge_fragmented_blocks_allcpus(void);
  */
 static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
 {
-	unsigned long resched_threshold;
+	unsigned long resched_threshold, num_entries = 0, num_alias_entries = 0;
+	struct vmap_area alias_va = { .va_start = start, .va_end = end };
 	unsigned int num_purged_areas = 0;
 	struct list_head local_purge_list;
 	struct vmap_area *va, *n_va;
@@ -1736,18 +1737,29 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
 	list_replace_init(&purge_vmap_area_list, &local_purge_list);
 	spin_unlock(&purge_vmap_area_lock);
 
-	if (unlikely(list_empty(&local_purge_list)))
-		goto out;
+	start = min(start, list_first_entry(&local_purge_list, struct vmap_area, list)->va_start);
+	end = max(end, list_last_entry(&local_purge_list, struct vmap_area, list)->va_end);
+
+	if (IS_ENABLED(CONFIG_ARCH_HAS_FLUSH_TLB_KERNEL_VAS)) {
+		list_for_each_entry(va, &local_purge_list, list)
+			num_entries += (va->va_end - va->va_start) >> PAGE_SHIFT;
+
+		if (unlikely(!num_entries))
+			goto out;
+
+		if (alias_va.va_end > alias_va.va_start) {
+			num_alias_entries = (alias_va.va_end - alias_va.va_start) >> PAGE_SHIFT;
+			list_add(&alias_va.list, &local_purge_list);
+		}
 
-	start = min(start,
-		    list_first_entry(&local_purge_list,
-			struct vmap_area, list)->va_start);
+		flush_tlb_kernel_vas(&local_purge_list, num_entries + num_alias_entries);
 
-	end = max(end,
-		  list_last_entry(&local_purge_list,
-			struct vmap_area, list)->va_end);
+		if (num_alias_entries)
+			list_del(&alias_va.list);
+	} else {
+		flush_tlb_kernel_range(start, end);
+	}
 
-	flush_tlb_kernel_range(start, end);
 	resched_threshold = lazy_max_pages() << 1;
 
 	spin_lock(&free_vmap_area_lock);

From patchwork Fri May 19 12:02:10 2023
X-Patchwork-Submitter: Baoquan He
X-Patchwork-Id: 13248261
Date: Fri, 19 May 2023 20:02:10 +0800
From: Baoquan He
To: Thomas Gleixner
Cc: "Russell King (Oracle)", Andrew Morton, linux-mm@kvack.org, Christoph Hellwig, Uladzislau Rezki, Lorenzo Stoakes, Peter Zijlstra, John Ogness, linux-arm-kernel@lists.infradead.org, Mark Rutland, Marc Zyngier, x86@kernel.org
Subject: [RFC PATCH 2/3] mm/vmalloc.c: Only flush VM_FLUSH_RESET_PERMS area immediately
References: <87zg658fla.ffs@tglx> <87r0rg93z5.ffs@tglx> <87ilcs8zab.ffs@tglx> <87fs7w8z6y.ffs@tglx> <874joc8x7d.ffs@tglx> <87r0rg73wp.ffs@tglx> <87edng6qu8.ffs@tglx>
In-Reply-To: <87edng6qu8.ffs@tglx>

When freeing a vmalloc range, only the page tables are unmapped; the TLB flush is lazily deferred to a later point, until lazy_max_pages() is met or vmalloc() cannot find an available virtual memory region. However, when freeing VM_FLUSH_RESET_PERMS vmalloc memory, the TLB flush needs to be done immediately, before the pages are freed, and the direct map needs its permissions reset and its TLB entries flushed as well. Please see commit 868b104d7379 ("mm/vmalloc: Add flag for freeing of special permsissions") for more details.
In the current code, when freeing VM_FLUSH_RESET_PERMS memory, a lazy purge is also done to try to save a TLB flush later. In doing so, it merges the direct map range with the percpu vbq dirty ranges and all purge ranges by calculating a [min:max] flush range, which adds the huge gap between the direct map range and the vmalloc range to the final TLB flush range. So here, only flush the VM_FLUSH_RESET_PERMS area immediately, and leave the lazy flush to the normal points, e.g. when a new vmap_area is allocated or lazy_max_pages() is met.

Signed-off-by: Baoquan He
---
 mm/vmalloc.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 31e8d9e93650..87134dd8abc3 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2690,9 +2690,10 @@ static inline void set_area_direct_map(const struct vm_struct *area,
  */
 static void vm_reset_perms(struct vm_struct *area)
 {
-	unsigned long start = ULONG_MAX, end = 0;
+	unsigned long start = ULONG_MAX, end = 0, pages = 0;
 	unsigned int page_order = vm_area_page_order(area);
-	int flush_dmap = 0;
+	struct list_head local_flush_list;
+	struct vmap_area alias_va, va;
 	int i;
 
 	/*
@@ -2708,17 +2709,33 @@ static void vm_reset_perms(struct vm_struct *area)
 			page_size = PAGE_SIZE << page_order;
 			start = min(addr, start);
 			end = max(addr + page_size, end);
-			flush_dmap = 1;
 		}
 	}
 
+	va.va_start = (unsigned long)area->addr;
+	va.va_end = (unsigned long)(area->addr + area->size);
 	/*
 	 * Set direct map to something invalid so that it won't be cached if
 	 * there are any accesses after the TLB flush, then flush the TLB and
 	 * reset the direct map permissions to the default.
 	 */
 	set_area_direct_map(area, set_direct_map_invalid_noflush);
-	_vm_unmap_aliases(start, end, flush_dmap);
+	if (IS_ENABLED(CONFIG_ARCH_HAS_FLUSH_TLB_KERNEL_VAS)) {
+		if (end > start) {
+			pages = (end - start) >> PAGE_SHIFT;
+			alias_va.va_start = (unsigned long)start;
+			alias_va.va_end = (unsigned long)end;
+			list_add(&alias_va.list, &local_flush_list);
+		}
+
+		pages += area->size >> PAGE_SHIFT;
+		list_add(&va.list, &local_flush_list);
+
+		flush_tlb_kernel_vas(&local_flush_list, pages);
+	} else {
+		flush_tlb_kernel_range(start, end);
+		flush_tlb_kernel_range(va.va_start, va.va_end);
+	}
 	set_area_direct_map(area, set_direct_map_default_noflush);
 }

From patchwork Fri May 19 12:03:09 2023
X-Patchwork-Submitter: Baoquan He
X-Patchwork-Id: 13248262
Date: Fri, 19 May 2023 20:03:09 +0800
From: Baoquan He
To: Thomas Gleixner
Cc: "Russell King (Oracle)", Andrew Morton, linux-mm@kvack.org, Christoph Hellwig, Uladzislau Rezki, Lorenzo Stoakes, Peter Zijlstra, John Ogness, linux-arm-kernel@lists.infradead.org, Mark Rutland, Marc Zyngier, x86@kernel.org
Subject: [RFC PATCH 3/3] mm/vmalloc.c: change _vm_unmap_aliases() to do purge firstly
References: <87zg658fla.ffs@tglx> <87r0rg93z5.ffs@tglx> <87ilcs8zab.ffs@tglx> <87fs7w8z6y.ffs@tglx> <874joc8x7d.ffs@tglx> <87r0rg73wp.ffs@tglx> <87edng6qu8.ffs@tglx>
In-Reply-To: <87edng6qu8.ffs@tglx>

After a vb_free() invocation, the va is purged and put into the purge tree/list if the entire vmap_block is dirty. If it is not entirely dirty, the vmap_block stays in the percpu vmap_block_queue list, as in the two graphs below:

(1)
 |-----|------------|-----------|-------|
 |dirty|still mapped|   dirty   | free  |

(2)
 |------------------------------|-------|
 |            dirty             | free  |

In the current _vm_unmap_aliases(), to reclaim those unmapped ranges and flush them, it iterates the percpu vbq and calculates the flush range from each vmap_block, covering the two cases above.
Then it calls purge_fragmented_blocks_allcpus() to purge the case-(2) vmap_blocks, since no mapping exists in them any more, and puts the purged vmap_block vas into the purge tree/list. Then, in __purge_vmap_area_lazy(), it continues calculating the flush range from the purge list. Obviously, this takes the vmap_block vas of case (2) into account twice.

So move purge_fragmented_blocks_allcpus() up to purge the case-(2) vmap_block vas first; then only the dirty ranges of case (1) need to be iterated and counted. With this change, counting the dirty regions of case-(1) vmap_blocks now happens inside the vmap_purge_lock protected region, which makes the flush range calculation more reasonable and accurate by avoiding concurrent operations on other CPUs.

Also rename _vm_unmap_aliases() to vm_unmap_aliases(), since there is no caller left other than the old vm_unmap_aliases().

Signed-off-by: Baoquan He
---
 mm/vmalloc.c | 45 ++++++++++++++++++++-------------------------
 1 file changed, 20 insertions(+), 25 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 87134dd8abc3..9f7cbd6182ad 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2236,8 +2236,23 @@ static void vb_free(unsigned long addr, unsigned long size)
 	spin_unlock(&vb->lock);
 }
 
-static void _vm_unmap_aliases(unsigned long start, unsigned long end, int flush)
+/**
+ * vm_unmap_aliases - unmap outstanding lazy aliases in the vmap layer
+ *
+ * The vmap/vmalloc layer lazily flushes kernel virtual mappings primarily
+ * to amortize TLB flushing overheads. What this means is that any page you
+ * have now, may, in a former life, have been mapped into kernel virtual
+ * address by the vmap layer and so there might be some CPUs with TLB entries
+ * still referencing that page (additional to the regular 1:1 kernel mapping).
+ *
+ * vm_unmap_aliases flushes all such lazy mappings. After it returns, we can
+ * be sure that none of the pages we have control over will have any aliases
+ * from the vmap layer.
+ */
+void vm_unmap_aliases(void)
 {
+	unsigned long start = ULONG_MAX, end = 0;
+	bool flush = false;
 	int cpu;
 
 	if (unlikely(!vmap_initialized))
@@ -2245,6 +2260,9 @@ static void _vm_unmap_aliases(unsigned long start, unsigned long end, int flush)
 
 	might_sleep();
 
+	mutex_lock(&vmap_purge_lock);
+	purge_fragmented_blocks_allcpus();
+
 	for_each_possible_cpu(cpu) {
 		struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);
 		struct vmap_block *vb;
@@ -2262,40 +2280,17 @@ static void _vm_unmap_aliases(unsigned long start, unsigned long end, int flush)
 			start = min(s, start);
 			end = max(e, end);
 
-			flush = 1;
+			flush = true;
 		}
 		spin_unlock(&vb->lock);
 	}
 	rcu_read_unlock();
 
-	mutex_lock(&vmap_purge_lock);
-	purge_fragmented_blocks_allcpus();
 	if (!__purge_vmap_area_lazy(start, end) && flush)
 		flush_tlb_kernel_range(start, end);
 	mutex_unlock(&vmap_purge_lock);
 }
-
-/**
- * vm_unmap_aliases - unmap outstanding lazy aliases in the vmap layer
- *
- * The vmap/vmalloc layer lazily flushes kernel virtual mappings primarily
- * to amortize TLB flushing overheads. What this means is that any page you
- * have now, may, in a former life, have been mapped into kernel virtual
- * address by the vmap layer and so there might be some CPUs with TLB entries
- * still referencing that page (additional to the regular 1:1 kernel mapping).
- *
- * vm_unmap_aliases flushes all such lazy mappings. After it returns, we can
- * be sure that none of the pages we have control over will have any aliases
- * from the vmap layer.
- */
-void vm_unmap_aliases(void)
-{
-	unsigned long start = ULONG_MAX, end = 0;
-	int flush = 0;
-
-	_vm_unmap_aliases(start, end, flush);
-}
 EXPORT_SYMBOL_GPL(vm_unmap_aliases);
 
 /**