Message ID | 20241023162711.2579610-1-rppt@kernel.org (mailing list archive) |
---|---|
Series | x86/module: use large ROX pages for text allocations |
On Wed, 23 Oct 2024 19:27:03 +0300
Mike Rapoport <rppt@kernel.org> wrote:

> From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
>
> Hi,
>
> This is an updated version of execmem ROX caches.
>

FYI, I booted a kernel before and after applying these patches with my
change:

  https://lore.kernel.org/20241017113105.1edfa943@gandalf.local.home

Before these patches:

 # cat /sys/kernel/tracing/dyn_ftrace_total_info
57695 pages:231 groups: 9
ftrace boot update time = 14733459 (ns)
ftrace module total update time = 449016 (ns)

After:

 # cat /sys/kernel/tracing/dyn_ftrace_total_info
57708 pages:231 groups: 9
ftrace boot update time = 47195374 (ns)
ftrace module total update time = 592080 (ns)

Which caused boot time to slowdown by over 30ms. That may not seem like
much, but we are very concerned about boot time and are fighting every ms
we can get.

-- Steve
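For reference, the two dyn_ftrace_total_info readings above work out to a boot update time increase of 32461915 ns (roughly 32.5 ms) and a module update time increase of 143064 ns. A minimal sketch of that arithmetic follows; the helper names are made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Nanoseconds in one millisecond. */
#define NSEC_PER_MSEC 1000000ULL

/* Growth of an ftrace update time between two boots, in nanoseconds. */
static uint64_t update_delta_ns(uint64_t before_ns, uint64_t after_ns)
{
	/* This sketch only handles the "got slower" direction. */
	assert(after_ns >= before_ns);
	return after_ns - before_ns;
}

/* Whole milliseconds contained in a nanosecond count (rounds down). */
static uint64_t ns_to_ms(uint64_t ns)
{
	return ns / NSEC_PER_MSEC;
}
```

Calling `ns_to_ms(update_delta_ns(14733459, 47195374))` gives the "over 30ms" figure quoted in the message as a whole number of milliseconds.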
On Mon, Nov 18, 2024 at 01:25:01PM -0500, Steven Rostedt wrote:
> On Wed, 23 Oct 2024 19:27:03 +0300
> Mike Rapoport <rppt@kernel.org> wrote:
>
> > From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> >
> > Hi,
> >
> > This is an updated version of execmem ROX caches.
> >
>
> FYI, I booted a kernel before and after applying these patches with my
> change:
>
>   https://lore.kernel.org/20241017113105.1edfa943@gandalf.local.home
>
> Before these patches:
>
>  # cat /sys/kernel/tracing/dyn_ftrace_total_info
> 57695 pages:231 groups: 9
> ftrace boot update time = 14733459 (ns)
> ftrace module total update time = 449016 (ns)
>
> After:
>
>  # cat /sys/kernel/tracing/dyn_ftrace_total_info
> 57708 pages:231 groups: 9
> ftrace boot update time = 47195374 (ns)
> ftrace module total update time = 592080 (ns)
>
> Which caused boot time to slowdown by over 30ms. That may not seem like
> much, but we are very concerned about boot time and are fighting every ms
> we can get.

Hmm, looks like this change was lost in rebase :/

@Andrew, should I send it as a patch on top of mm-stable?
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 8da0e66ca22d..859902dd06fc 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -111,17 +111,22 @@ static int ftrace_verify_code(unsigned long ip, const char *old_code)
  */
 static int __ref
 ftrace_modify_code_direct(unsigned long ip, const char *old_code,
-			  const char *new_code)
+			  const char *new_code, struct module *mod)
 {
 	int ret = ftrace_verify_code(ip, old_code);
 	if (ret)
 		return ret;
 
 	/* replace the text with the new text */
-	if (ftrace_poke_late)
+	if (ftrace_poke_late) {
 		text_poke_queue((void *)ip, new_code, MCOUNT_INSN_SIZE, NULL);
-	else
+	} else if (!mod) {
 		text_poke_early((void *)ip, new_code, MCOUNT_INSN_SIZE);
+	} else {
+		mutex_lock(&text_mutex);
+		text_poke((void *)ip, new_code, MCOUNT_INSN_SIZE);
+		mutex_unlock(&text_mutex);
+	}
 	return 0;
 }
 
@@ -142,7 +147,7 @@ int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec, unsigned long ad
 	 * just modify the code directly.
 	 */
 	if (addr == MCOUNT_ADDR)
-		return ftrace_modify_code_direct(ip, old, new);
+		return ftrace_modify_code_direct(ip, old, new, mod);
 
 	/*
 	 * x86 overrides ftrace_replace_code -- this function will never be used
@@ -161,7 +166,7 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 	new = ftrace_call_replace(ip, addr);
 
 	/* Should only be called when module is loaded */
-	return ftrace_modify_code_direct(rec->ip, old, new);
+	return ftrace_modify_code_direct(rec->ip, old, new, NULL);
 }
 
 /*

> -- Steve
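The three-way choice that the diff introduces in ftrace_modify_code_direct() can be summarized with a small userspace sketch. The enum and function names below are hypothetical labels for illustration only; the `mod` argument stands in for the `struct module *` pointer, of which only the NULL check matters here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the three patching paths in the diff above. */
enum poke_path {
	POKE_QUEUE,	/* text_poke_queue(): ftrace_poke_late is set */
	POKE_EARLY,	/* text_poke_early(): no module was passed in */
	POKE_LOCKED,	/* text_poke() with text_mutex held: module text */
};

/*
 * Mirrors the if / else if / else chain after the patch.
 * The order matters: the late-poke check wins over the module check.
 */
static enum poke_path choose_poke_path(bool poke_late, const void *mod)
{
	if (poke_late)
		return POKE_QUEUE;
	if (!mod)
		return POKE_EARLY;
	return POKE_LOCKED;
}
```

In the diff, ftrace_make_nop() forwards its `mod` argument, so it can reach any of the three paths, while ftrace_make_call() always passes NULL and therefore never takes the text_poke() path.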
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>

Hi,

This is an updated version of execmem ROX caches.

v6: https://lore.kernel.org/all/20241016122424.1655560-1-rppt@kernel.org
* Fixed handling of alternatives for fineibt (kbuild bot)
* Restored usage of text_poke_early for ftrace boot time initialization
  (Steve)
* Made !module path in module_writable_address inline

v5: https://lore.kernel.org/all/20241009180816.83591-1-rppt@kernel.org
* Dropped check for !area in mas_for_each() loop (Kees Bakker)
* Dropped externs in include/linux/vmalloc.h (Christoph)
* Fixed handling of alternatives for CFI-enabled configs (Nathan)
* Fixed interaction with kmemleak (Sergey). It looks like execmem and
  kmemleak interaction should be improved further, but it's out of
  scope of this series.
* Added ARCH_HAS_EXECMEM_ROX configuration option to arch/Kconfig. The
  option serves two purposes:
  - make sure an architecture that uses ROX caches implements the
    execmem_fill_trapping_insns() callback (Christoph)
  - make sure the entire physical memory is mapped in the direct map
    (Dave)

v4: https://lore.kernel.org/all/20241007062858.44248-1-rppt@kernel.org
* Fix copy/paste error in loongarch (Huacai)

v3: https://lore.kernel.org/all/20240909064730.3290724-1-rppt@kernel.org
* Drop ftrace_swap_func(). It is not needed because mcount array lives
  in a data section (Peter)
* Update maple_tree usage (Liam)
* Set ->fill_trapping_insns pointer on init (Ard)
* Instead of using VM_FLUSH_RESET_PERMS for execmem cache, completely
  remove it from the direct map

v2: https://lore.kernel.org/all/20240826065532.2618273-1-rppt@kernel.org
* add comment why ftrace_swap_func() is needed (Steve)

Since RFC: https://lore.kernel.org/all/20240411160526.2093408-1-rppt@kernel.org
* update changelog about HUGE_VMAP allocations (Christophe)
* move module_writable_address() from x86 to modules core (Ingo)
* rename execmem_invalidate() to execmem_fill_trapping_insns() (Peter)
* call alternatives_smp_unlock() after module text in-place is up to
  date (Nadav)

= Original cover letter =

These patches add support for using large ROX pages for allocations of
executable memory on x86.

They address Andy's comments [1] about having executable mappings for code
that was not completely formed.

The approach taken is to allocate ROX memory along with writable but not
executable memory and use the writable copy to perform relocations and
alternatives patching. After the module text gets into its final shape, the
contents of the writable memory are copied into the actual ROX location
using text poking.

The allocations of the ROX memory use vmalloc(VM_ALLOW_HUGE_VMAP) to
allocate PMD aligned memory, fill that memory with invalid instructions and
in the end remap it as ROX. Portions of these large pages are handed out to
execmem_alloc() callers without any changes to the permissions. When the
memory is freed with execmem_free() it is invalidated again so that it won't
contain stale instructions.

The module memory allocation, x86 code dealing with relocations and
alternatives patching take into account the existence of the two copies, the
writable memory and the ROX memory at the actual allocated virtual address.
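The two-copy flow described in the cover letter can be modeled in userspace as a rough sketch. Everything below is an illustrative assumption rather than the series' code: plain byte buffers stand in for the ROX mapping and the writable copy, memcpy() stands in for text poking, and 0xCC (the x86 int3 opcode) is picked here as the trapping filler. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed filler byte: int3 on x86, standing in for "invalid instructions". */
#define TRAP_INSN 0xCC

/* Toy stand-in for one text allocation and its writable scratch copy. */
struct text_alloc {
	uint8_t *rox;	/* models the ROX mapping at the final address */
	uint8_t *rw;	/* models the writable, non-executable copy */
	size_t size;
};

/* Models filling memory with trapping instructions on allocation and free. */
static void fill_trapping_insns(uint8_t *buf, size_t size)
{
	memset(buf, TRAP_INSN, size);
}

/* Relocations and alternatives only ever touch the writable copy. */
static void apply_fixup(struct text_alloc *t, size_t off, uint8_t val)
{
	assert(off < t->size);
	t->rw[off] = val;
}

/* Once the text is final, copy it to the ROX location in one step. */
static void finalize_text(struct text_alloc *t)
{
	memcpy(t->rox, t->rw, t->size);
}
```

The point of the model is the ordering: until finalize_text() runs, the executable-side buffer holds only trapping bytes, so partially formed code is never visible at the final address.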
The patches are available at git:
https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=execmem/x86-rox/v6

[1] https://lore.kernel.org/all/a17c65c6-863f-4026-9c6f-a04b659e9ab4@app.fastmail.com

Mike Rapoport (Microsoft) (8):
  mm: vmalloc: group declarations depending on CONFIG_MMU together
  mm: vmalloc: don't account for number of nodes for HUGE_VMAP allocations
  asm-generic: introduce text-patching.h
  module: prepare to handle ROX allocations for text
  arch: introduce set_direct_map_valid_noflush()
  x86/module: prepare module loading for ROX allocations of text
  execmem: add support for cache of large ROX pages
  x86/module: enable ROX caches for module text on 64 bit

 arch/Kconfig                                 |   8 +
 arch/alpha/include/asm/Kbuild                |   1 +
 arch/arc/include/asm/Kbuild                  |   1 +
 .../include/asm/{patch.h => text-patching.h} |   0
 arch/arm/kernel/ftrace.c                     |   2 +-
 arch/arm/kernel/jump_label.c                 |   2 +-
 arch/arm/kernel/kgdb.c                       |   2 +-
 arch/arm/kernel/patch.c                      |   2 +-
 arch/arm/probes/kprobes/core.c               |   2 +-
 arch/arm/probes/kprobes/opt-arm.c            |   2 +-
 arch/arm64/include/asm/set_memory.h          |   1 +
 .../asm/{patching.h => text-patching.h}      |   0
 arch/arm64/kernel/ftrace.c                   |   2 +-
 arch/arm64/kernel/jump_label.c               |   2 +-
 arch/arm64/kernel/kgdb.c                     |   2 +-
 arch/arm64/kernel/patching.c                 |   2 +-
 arch/arm64/kernel/probes/kprobes.c           |   2 +-
 arch/arm64/kernel/traps.c                    |   2 +-
 arch/arm64/mm/pageattr.c                     |  10 +
 arch/arm64/net/bpf_jit_comp.c                |   2 +-
 arch/csky/include/asm/Kbuild                 |   1 +
 arch/hexagon/include/asm/Kbuild              |   1 +
 arch/loongarch/include/asm/Kbuild            |   1 +
 arch/loongarch/include/asm/set_memory.h      |   1 +
 arch/loongarch/mm/pageattr.c                 |  19 +
 arch/m68k/include/asm/Kbuild                 |   1 +
 arch/microblaze/include/asm/Kbuild           |   1 +
 arch/mips/include/asm/Kbuild                 |   1 +
 arch/nios2/include/asm/Kbuild                |   1 +
 arch/openrisc/include/asm/Kbuild             |   1 +
 .../include/asm/{patch.h => text-patching.h} |   0
 arch/parisc/kernel/ftrace.c                  |   2 +-
 arch/parisc/kernel/jump_label.c              |   2 +-
 arch/parisc/kernel/kgdb.c                    |   2 +-
 arch/parisc/kernel/kprobes.c                 |   2 +-
 arch/parisc/kernel/patch.c                   |   2 +-
 arch/powerpc/include/asm/kprobes.h           |   2 +-
 .../asm/{code-patching.h => text-patching.h} |   0
 arch/powerpc/kernel/crash_dump.c             |   2 +-
 arch/powerpc/kernel/epapr_paravirt.c         |   2 +-
 arch/powerpc/kernel/jump_label.c             |   2 +-
 arch/powerpc/kernel/kgdb.c                   |   2 +-
 arch/powerpc/kernel/kprobes.c                |   2 +-
 arch/powerpc/kernel/module_32.c              |   2 +-
 arch/powerpc/kernel/module_64.c              |   2 +-
 arch/powerpc/kernel/optprobes.c              |   2 +-
 arch/powerpc/kernel/process.c                |   2 +-
 arch/powerpc/kernel/security.c               |   2 +-
 arch/powerpc/kernel/setup_32.c               |   2 +-
 arch/powerpc/kernel/setup_64.c               |   2 +-
 arch/powerpc/kernel/static_call.c            |   2 +-
 arch/powerpc/kernel/trace/ftrace.c           |   2 +-
 arch/powerpc/kernel/trace/ftrace_64_pg.c     |   2 +-
 arch/powerpc/lib/code-patching.c             |   2 +-
 arch/powerpc/lib/feature-fixups.c            |   2 +-
 arch/powerpc/lib/test-code-patching.c        |   2 +-
 arch/powerpc/lib/test_emulate_step.c         |   2 +-
 arch/powerpc/mm/book3s32/mmu.c               |   2 +-
 arch/powerpc/mm/book3s64/hash_utils.c        |   2 +-
 arch/powerpc/mm/book3s64/slb.c               |   2 +-
 arch/powerpc/mm/kasan/init_32.c              |   2 +-
 arch/powerpc/mm/mem.c                        |   2 +-
 arch/powerpc/mm/nohash/44x.c                 |   2 +-
 arch/powerpc/mm/nohash/book3e_pgtable.c      |   2 +-
 arch/powerpc/mm/nohash/tlb.c                 |   2 +-
 arch/powerpc/mm/nohash/tlb_64e.c             |   2 +-
 arch/powerpc/net/bpf_jit_comp.c              |   2 +-
 arch/powerpc/perf/8xx-pmu.c                  |   2 +-
 arch/powerpc/perf/core-book3s.c              |   2 +-
 arch/powerpc/platforms/85xx/smp.c            |   2 +-
 arch/powerpc/platforms/86xx/mpc86xx_smp.c    |   2 +-
 arch/powerpc/platforms/cell/smp.c            |   2 +-
 arch/powerpc/platforms/powermac/smp.c        |   2 +-
 arch/powerpc/platforms/powernv/idle.c        |   2 +-
 arch/powerpc/platforms/powernv/smp.c         |   2 +-
 arch/powerpc/platforms/pseries/smp.c         |   2 +-
 arch/powerpc/xmon/xmon.c                     |   2 +-
 arch/riscv/errata/andes/errata.c             |   2 +-
 arch/riscv/errata/sifive/errata.c            |   2 +-
 arch/riscv/errata/thead/errata.c             |   2 +-
 arch/riscv/include/asm/set_memory.h          |   1 +
 .../include/asm/{patch.h => text-patching.h} |   0
 arch/riscv/include/asm/uprobes.h             |   2 +-
 arch/riscv/kernel/alternative.c              |   2 +-
 arch/riscv/kernel/cpufeature.c               |   3 +-
 arch/riscv/kernel/ftrace.c                   |   2 +-
 arch/riscv/kernel/jump_label.c               |   2 +-
 arch/riscv/kernel/patch.c                    |   2 +-
 arch/riscv/kernel/probes/kprobes.c           |   2 +-
 arch/riscv/mm/pageattr.c                     |  15 +
 arch/riscv/net/bpf_jit_comp64.c              |   2 +-
 arch/riscv/net/bpf_jit_core.c                |   2 +-
 arch/s390/include/asm/set_memory.h           |   1 +
 arch/s390/mm/pageattr.c                      |  11 +
 arch/sh/include/asm/Kbuild                   |   1 +
 arch/sparc/include/asm/Kbuild                |   1 +
 arch/um/kernel/um_arch.c                     |  16 +-
 arch/x86/Kconfig                             |   1 +
 arch/x86/entry/vdso/vma.c                    |   3 +-
 arch/x86/include/asm/alternative.h           |  14 +-
 arch/x86/include/asm/set_memory.h            |   1 +
 arch/x86/include/asm/text-patching.h         |   1 +
 arch/x86/kernel/alternative.c                | 181 ++++++----
 arch/x86/kernel/ftrace.c                     |  30 +-
 arch/x86/kernel/module.c                     |  45 ++-
 arch/x86/mm/init.c                           |  37 +-
 arch/x86/mm/pat/set_memory.c                 |   8 +
 arch/xtensa/include/asm/Kbuild               |   1 +
 include/asm-generic/text-patching.h          |   5 +
 include/linux/execmem.h                      |  37 ++
 include/linux/module.h                       |  16 +
 include/linux/moduleloader.h                 |   4 +
 include/linux/set_memory.h                   |   6 +
 include/linux/text-patching.h                |  15 +
 include/linux/vmalloc.h                      |  60 ++--
 kernel/module/debug_kmemleak.c               |   3 +-
 kernel/module/main.c                         |  74 +++-
 kernel/module/strict_rwx.c                   |   3 +
 mm/execmem.c                                 | 336 +++++++++++++++++-
 mm/internal.h                                |   1 +
 mm/vmalloc.c                                 |  14 +-
 121 files changed, 885 insertions(+), 247 deletions(-)
 rename arch/arm/include/asm/{patch.h => text-patching.h} (100%)
 rename arch/arm64/include/asm/{patching.h => text-patching.h} (100%)
 rename arch/parisc/include/asm/{patch.h => text-patching.h} (100%)
 rename arch/powerpc/include/asm/{code-patching.h => text-patching.h} (100%)
 rename arch/riscv/include/asm/{patch.h => text-patching.h} (100%)
 create mode 100644 include/asm-generic/text-patching.h
 create mode 100644 include/linux/text-patching.h

base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc