[v6,00/13] riscv: ASID-related and UP-related TLB flush enhancements

Message ID 20240327045035.368512-1-samuel.holland@sifive.com (mailing list archive)

Message

Samuel Holland March 27, 2024, 4:49 a.m. UTC
This series converts uniprocessor kernel builds to use the same TLB
flushing code as SMP builds, to take advantage of batching and existing
range- and ASID-based TLB flush optimizations. It optimizes out IPIs and
SBI calls based on the online CPU count, which also covers the scenario
where SMP was enabled at build time but only one CPU is present/online.
A final optimization is to use single-ASID flushes wherever possible, to
avoid unnecessary TLB misses for kernel mappings.
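
To illustrate the online-CPU check described above, here is a minimal
userspace sketch. It is not the code from the series: the enum and the
choose_flush_path() helper are hypothetical names, and the real kernel
would take the count from its own online CPU bookkeeping.

```c
#include <assert.h>

/* Hypothetical model of where a TLB flush has to be carried out. */
enum flush_path {
	FLUSH_LOCAL_ONLY,	/* local fence only, no IPI or SBI call */
	FLUSH_BROADCAST,	/* local fence plus remote fences */
};

/*
 * Pick the flush path from the number of online CPUs. A remote fence is
 * only needed when some other CPU is online, whether the kernel was
 * built as UP or as SMP with a single CPU present.
 */
enum flush_path choose_flush_path(unsigned int nr_online)
{
	assert(nr_online >= 1);	/* the calling CPU is always online */
	return nr_online > 1 ? FLUSH_BROADCAST : FLUSH_LOCAL_ONLY;
}
```

With one CPU online the sketch always selects the local path, which is
the case the series optimizes for both UP builds and single-CPU SMP
systems.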

This series has a semantic conflict with the AIA patches that are in
linux-next due to the removal of the third parameter of
riscv_ipi_set_virq_range(), which is called from imsic_ipi_domain_init()
in drivers/irqchip/irq-riscv-imsic-early.c. The resolution is to remove
the extra argument from the call site.

Here are some UnixBench numbers from D1 that show the performance impact:

v6.9-rc1:
 System Benchmarks Partial Index              BASELINE       RESULT    INDEX
 Execl Throughput                                 43.0        198.5     46.2
 File Copy 1024 bufsize 2000 maxblocks          3960.0      73934.4    186.7
 File Copy 256 bufsize 500 maxblocks            1655.0      20242.6    122.3
 File Copy 4096 bufsize 8000 maxblocks          5800.0     197706.4    340.9
 Pipe Throughput                               12440.0     176974.2    142.3
 Pipe-based Context Switching                   4000.0      23626.8     59.1
 Process Creation                                126.0        449.9     35.7
 Shell Scripts (1 concurrent)                     42.4        544.4    128.4
 Shell Scripts (16 concurrent)                     ---         35.3      ---
 Shell Scripts (8 concurrent)                      6.0         71.6    119.3
 System Call Overhead                          15000.0     248072.6    165.4
                                                                    ========
 System Benchmarks Index Score (Partial Only)                          110.6

v6.9-rc1 + this patch series:
 System Benchmarks Partial Index              BASELINE       RESULT    INDEX
 Execl Throughput                                 43.0        196.8     45.8
 File Copy 1024 bufsize 2000 maxblocks          3960.0      71782.2    181.3
 File Copy 256 bufsize 500 maxblocks            1655.0      21269.4    128.5
 File Copy 4096 bufsize 8000 maxblocks          5800.0     199424.0    343.8
 Pipe Throughput                               12440.0     196468.6    157.9
 Pipe-based Context Switching                   4000.0      24261.8     60.7
 Process Creation                                126.0        459.0     36.4
 Shell Scripts (1 concurrent)                     42.4        543.8    128.2
 Shell Scripts (16 concurrent)                     ---         35.5      ---
 Shell Scripts (8 concurrent)                      6.0         71.7    119.6
 System Call Overhead                          15000.0     259415.2    172.9
                                                                    ========
 System Benchmarks Index Score (Partial Only)                          113.0
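
For readers comparing the two tables, the relative change per row can
be worked out with a small helper. This is only an arithmetic sketch;
percent_change() is a hypothetical name and is not part of the series.

```c
#include <assert.h>

/*
 * Relative change from a "before" result to an "after" result, in
 * percent. Positive values mean the result went up with the series.
 */
double percent_change(double before, double after)
{
	assert(before > 0.0);	/* results in the tables are positive */
	return (after - before) / before * 100.0;
}
```

For example, percent_change(176974.2, 196468.6) gives roughly 11.0 for
Pipe Throughput, and percent_change(110.6, 113.0) gives roughly 2.2 for
the overall index score. File Copy 1024 comes out slightly negative
(about -2.9).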

Changes in v6:
 - Move riscv_tlb_remove_ptdesc() definition to fix 32-bit build
 - Clarify the commit message for patch 3 based on ML discussion
 - Clarify the commit message for patch 8 based on ML discussion
 - Rebased on v6.9-rc1

Changes in v5:
 - Rebase on v6.8-rc1 + riscv/for-next (for the fast GUP implementation)
 - Add patch for minor refactoring in asm/pgalloc.h
 - Also switch to riscv_use_sbi_for_rfence() in asm/pgalloc.h
 - Leave use_asid_allocator declared in asm/mmu_context.h

Changes in v4:
 - Fix a possible race between flush_icache_*() and SMP bringup
 - Refactor riscv_use_ipi_for_rfence() to make later changes cleaner
 - Optimize kernel TLB flushes with only one CPU online
 - Optimize global cache/TLB flushes with only one CPU online
 - Merge the two copies of __flush_tlb_range() and rely on the compiler
   to optimize out the broadcast path (both clang and gcc do this)
 - Merge the two copies of flush_tlb_all() and rely on constant folding
 - Only set tlb_flush_all_threshold when CONFIG_MMU=y

Changes in v3:
 - Fixed a performance regression caused by executing sfence.vma in a
   loop on implementations affected by SiFive CIP-1200
 - Rebased on v6.7-rc1

Changes in v2:
 - Move the SMP/UP merge earlier in the series to avoid build issues
 - Make a copy of __flush_tlb_range() instead of adding ifdefs inside
 - local_flush_tlb_all() is the only function used on !MMU (smpboot.c)

Samuel Holland (13):
  riscv: Flush the instruction cache during SMP bringup
  riscv: Factor out page table TLB synchronization
  riscv: Use IPIs for remote cache/TLB flushes by default
  riscv: mm: Broadcast kernel TLB flushes only when needed
  riscv: Only send remote fences when some other CPU is online
  riscv: mm: Combine the SMP and UP TLB flush code
  riscv: Apply SiFive CIP-1200 workaround to single-ASID sfence.vma
  riscv: Avoid TLB flush loops when affected by SiFive CIP-1200
  riscv: mm: Introduce cntx2asid/cntx2version helper macros
  riscv: mm: Use a fixed layout for the MM context ID
  riscv: mm: Make asid_bits a local variable
  riscv: mm: Preserve global TLB entries when switching contexts
  riscv: mm: Always use an ASID to flush mm contexts

 arch/riscv/Kconfig                   |  2 +-
 arch/riscv/errata/sifive/errata.c    |  5 ++
 arch/riscv/include/asm/errata_list.h | 12 ++++-
 arch/riscv/include/asm/mmu.h         |  3 ++
 arch/riscv/include/asm/pgalloc.h     | 32 ++++++------
 arch/riscv/include/asm/sbi.h         |  4 ++
 arch/riscv/include/asm/smp.h         | 15 +-----
 arch/riscv/include/asm/tlbflush.h    | 52 ++++++++-----------
 arch/riscv/kernel/sbi-ipi.c          | 11 +++-
 arch/riscv/kernel/smp.c              | 11 +---
 arch/riscv/kernel/smpboot.c          |  7 +--
 arch/riscv/mm/Makefile               |  5 +-
 arch/riscv/mm/cacheflush.c           |  7 +--
 arch/riscv/mm/context.c              | 23 ++++-----
 arch/riscv/mm/tlbflush.c             | 75 ++++++++--------------------
 drivers/clocksource/timer-clint.c    |  2 +-
 16 files changed, 114 insertions(+), 152 deletions(-)

Comments

patchwork-bot+linux-riscv@kernel.org May 14, 2024, 2 p.m. UTC | #1
Hello:

This series was applied to riscv/linux.git (for-next)
by Palmer Dabbelt <palmer@rivosinc.com>:

On Tue, 26 Mar 2024 21:49:41 -0700 you wrote:
> This series converts uniprocessor kernel builds to use the same TLB
> flushing code as SMP builds, to take advantage of batching and existing
> range- and ASID-based TLB flush optimizations. It optimizes out IPIs and
> SBI calls based on the online CPU count, which also covers the scenario
> where SMP was enabled at build time but only one CPU is present/online.
> A final optimization is to use single-ASID flushes wherever possible, to
> avoid unnecessary TLB misses for kernel mappings.
> 
> [...]

Here is the summary with links:
  - [v6,01/13] riscv: Flush the instruction cache during SMP bringup
    https://git.kernel.org/riscv/c/58661a30f1bc
  - [v6,02/13] riscv: Factor out page table TLB synchronization
    https://git.kernel.org/riscv/c/aaa56c8f378d
  - [v6,03/13] riscv: Use IPIs for remote cache/TLB flushes by default
    https://git.kernel.org/riscv/c/dc892fb44322
  - [v6,04/13] riscv: mm: Broadcast kernel TLB flushes only when needed
    https://git.kernel.org/riscv/c/038ac18aae93
  - [v6,05/13] riscv: Only send remote fences when some other CPU is online
    https://git.kernel.org/riscv/c/9546f00410ed
  - [v6,06/13] riscv: mm: Combine the SMP and UP TLB flush code
    https://git.kernel.org/riscv/c/c6026d35b6ab
  - [v6,07/13] riscv: Apply SiFive CIP-1200 workaround to single-ASID sfence.vma
    https://git.kernel.org/riscv/c/20e03d702e00
  - [v6,08/13] riscv: Avoid TLB flush loops when affected by SiFive CIP-1200
    https://git.kernel.org/riscv/c/d6dcdabafcd7
  - [v6,09/13] riscv: mm: Introduce cntx2asid/cntx2version helper macros
    https://git.kernel.org/riscv/c/74cd17792d28
  - [v6,10/13] riscv: mm: Use a fixed layout for the MM context ID
    https://git.kernel.org/riscv/c/f58e5dc45fa9
  - [v6,11/13] riscv: mm: Make asid_bits a local variable
    https://git.kernel.org/riscv/c/8d3e7613f97e
  - [v6,12/13] riscv: mm: Preserve global TLB entries when switching contexts
    https://git.kernel.org/riscv/c/8fc21cc672e8
  - [v6,13/13] riscv: mm: Always use an ASID to flush mm contexts
    https://git.kernel.org/riscv/c/daef19263fc1

You are awesome, thank you!