From: Samuel Holland
To: Palmer Dabbelt, linux-riscv@lists.infradead.org
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Alexandre Ghiti,
    Jisheng Zhang, Yunhui Cui, Samuel Holland
Subject: [PATCH v6 00/13] riscv: ASID-related and UP-related TLB flush enhancements
Date: Tue, 26 Mar 2024 21:49:41 -0700
Message-ID: <20240327045035.368512-1-samuel.holland@sifive.com>
X-Patchwork-Submitter: Samuel Holland
X-Patchwork-Id: 13605544

This series converts uniprocessor kernel builds to use the same TLB
flushing code as SMP builds, to take advantage of batching and existing
range- and ASID-based TLB flush optimizations. It optimizes out IPIs and
SBI calls based on the online CPU count, which also covers the scenario
where SMP was enabled at build time but only one CPU is present/online.
A final optimization is to use single-ASID flushes wherever possible, to
avoid unnecessary TLB misses for kernel mappings.

This series has a semantic conflict with the AIA patches that are in
linux-next due to the removal of the third parameter of
riscv_ipi_set_virq_range(), which is called from imsic_ipi_domain_init()
in drivers/irqchip/irq-riscv-imsic-early.c. The resolution is to remove
the extra argument from the call site.

Here are some numbers from D1 which show the performance impact:

v6.9-rc1:

System Benchmarks Partial Index              BASELINE       RESULT    INDEX
Execl Throughput                                 43.0        198.5     46.2
File Copy 1024 bufsize 2000 maxblocks          3960.0      73934.4    186.7
File Copy 256 bufsize 500 maxblocks            1655.0      20242.6    122.3
File Copy 4096 bufsize 8000 maxblocks          5800.0     197706.4    340.9
Pipe Throughput                               12440.0     176974.2    142.3
Pipe-based Context Switching                   4000.0      23626.8     59.1
Process Creation                                126.0        449.9     35.7
Shell Scripts (1 concurrent)                     42.4        544.4    128.4
Shell Scripts (16 concurrent)                     ---         35.3      ---
Shell Scripts (8 concurrent)                      6.0         71.6    119.3
System Call Overhead                          15000.0     248072.6    165.4
                                                                   ========
System Benchmarks Index Score (Partial Only)                          110.6

v6.9-rc1 + this patch series:

System Benchmarks Partial Index              BASELINE       RESULT    INDEX
Execl Throughput                                 43.0        196.8     45.8
File Copy 1024 bufsize 2000 maxblocks          3960.0      71782.2    181.3
File Copy 256 bufsize 500 maxblocks            1655.0      21269.4    128.5
File Copy 4096 bufsize 8000 maxblocks          5800.0     199424.0    343.8
Pipe Throughput                               12440.0     196468.6    157.9
Pipe-based Context Switching                   4000.0      24261.8     60.7
Process Creation                                126.0        459.0     36.4
Shell Scripts (1 concurrent)                     42.4        543.8    128.2
Shell Scripts (16 concurrent)                     ---         35.5      ---
Shell Scripts (8 concurrent)                      6.0         71.7    119.6
System Call Overhead                          15000.0     259415.2    172.9
                                                                   ========
System Benchmarks Index Score (Partial Only)                          113.0

Changes in v6:
 - Move riscv_tlb_remove_ptdesc() definition to fix 32-bit build
 - Clarify the commit message for patch 3 based on ML discussion
 - Clarify the commit message for patch 8 based on ML discussion
 - Rebased on v6.9-rc1

Changes in v5:
 - Rebase on v6.8-rc1 + riscv/for-next (for the fast GUP implementation)
 - Add patch for minor refactoring in asm/pgalloc.h
 - Also switch to riscv_use_sbi_for_rfence() in asm/pgalloc.h
 - Leave use_asid_allocator declared in asm/mmu_context.h

Changes in v4:
 - Fix a possible race between flush_icache_*() and SMP bringup
 - Refactor riscv_use_ipi_for_rfence() to make later changes cleaner
 - Optimize kernel TLB flushes with only one CPU online
 - Optimize global cache/TLB flushes with only one CPU online
 - Merge the two copies of __flush_tlb_range() and rely on the compiler
   to optimize out the broadcast path (both clang and gcc do this)
 - Merge the two copies of flush_tlb_all() and rely on constant folding
 - Only set tlb_flush_all_threshold when CONFIG_MMU=y.

Changes in v3:
 - Fixed a performance regression caused by executing sfence.vma in a
   loop on implementations affected by SiFive CIP-1200
 - Rebased on v6.7-rc1

Changes in v2:
 - Move the SMP/UP merge earlier in the series to avoid build issues
 - Make a copy of __flush_tlb_range() instead of adding ifdefs inside
 - local_flush_tlb_all() is the only function used on !MMU (smpboot.c)

Samuel Holland (13):
  riscv: Flush the instruction cache during SMP bringup
  riscv: Factor out page table TLB synchronization
  riscv: Use IPIs for remote cache/TLB flushes by default
  riscv: mm: Broadcast kernel TLB flushes only when needed
  riscv: Only send remote fences when some other CPU is online
  riscv: mm: Combine the SMP and UP TLB flush code
  riscv: Apply SiFive CIP-1200 workaround to single-ASID sfence.vma
  riscv: Avoid TLB flush loops when affected by SiFive CIP-1200
  riscv: mm: Introduce cntx2asid/cntx2version helper macros
  riscv: mm: Use a fixed layout for the MM context ID
  riscv: mm: Make asid_bits a local variable
  riscv: mm: Preserve global TLB entries when switching contexts
  riscv: mm: Always use an ASID to flush mm contexts

 arch/riscv/Kconfig                   |  2 +-
 arch/riscv/errata/sifive/errata.c    |  5 ++
 arch/riscv/include/asm/errata_list.h | 12 ++++-
 arch/riscv/include/asm/mmu.h         |  3 ++
 arch/riscv/include/asm/pgalloc.h     | 32 ++++++------
 arch/riscv/include/asm/sbi.h         |  4 ++
 arch/riscv/include/asm/smp.h         | 15 +-----
 arch/riscv/include/asm/tlbflush.h    | 52 ++++++++-----------
 arch/riscv/kernel/sbi-ipi.c          | 11 +++-
 arch/riscv/kernel/smp.c              | 11 +---
 arch/riscv/kernel/smpboot.c          |  7 +--
 arch/riscv/mm/Makefile               |  5 +-
 arch/riscv/mm/cacheflush.c           |  7 +--
 arch/riscv/mm/context.c              | 23 ++++-----
 arch/riscv/mm/tlbflush.c             | 75 ++++++++--------------------
 drivers/clocksource/timer-clint.c    |  2 +-
 16 files changed, 114 insertions(+), 152 deletions(-)
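
Note on reading the benchmark tables above: assuming the usual UnixBench
convention, each INDEX is RESULT / BASELINE * 10, and the overall score is
the geometric mean of the per-test indexes. The 16-concurrent shell test
has no baseline and is left out. The sketch below recomputes both scores
from the BASELINE and RESULT columns; the helper names are illustrative
and not part of the series.

```python
from math import prod

# (test, BASELINE, RESULT) rows copied from the tables above.
# "Shell Scripts (16 concurrent)" is omitted because it has no baseline.
V6_9_RC1 = [
    ("Execl Throughput", 43.0, 198.5),
    ("File Copy 1024 bufsize 2000 maxblocks", 3960.0, 73934.4),
    ("File Copy 256 bufsize 500 maxblocks", 1655.0, 20242.6),
    ("File Copy 4096 bufsize 8000 maxblocks", 5800.0, 197706.4),
    ("Pipe Throughput", 12440.0, 176974.2),
    ("Pipe-based Context Switching", 4000.0, 23626.8),
    ("Process Creation", 126.0, 449.9),
    ("Shell Scripts (1 concurrent)", 42.4, 544.4),
    ("Shell Scripts (8 concurrent)", 6.0, 71.6),
    ("System Call Overhead", 15000.0, 248072.6),
]

V6_9_RC1_PATCHED = [
    ("Execl Throughput", 43.0, 196.8),
    ("File Copy 1024 bufsize 2000 maxblocks", 3960.0, 71782.2),
    ("File Copy 256 bufsize 500 maxblocks", 1655.0, 21269.4),
    ("File Copy 4096 bufsize 8000 maxblocks", 5800.0, 199424.0),
    ("Pipe Throughput", 12440.0, 196468.6),
    ("Pipe-based Context Switching", 4000.0, 24261.8),
    ("Process Creation", 126.0, 459.0),
    ("Shell Scripts (1 concurrent)", 42.4, 543.8),
    ("Shell Scripts (8 concurrent)", 6.0, 71.7),
    ("System Call Overhead", 15000.0, 259415.2),
]


def index(baseline, result):
    """Per-test index: result relative to baseline, scaled by 10."""
    return result / baseline * 10.0


def score(rows):
    """Overall score: geometric mean of the per-test indexes."""
    indexes = [index(baseline, result) for _, baseline, result in rows]
    return prod(indexes) ** (1.0 / len(indexes))


if __name__ == "__main__":
    print(f"v6.9-rc1:               {score(V6_9_RC1):.1f}")
    print(f"v6.9-rc1 + this series: {score(V6_9_RC1_PATCHED):.1f}")
```

Because the score is a geometric mean, a change in one test moves it by
the same ratio no matter how large that test's index is.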