From patchwork Fri May 26 05:15:27 2023
From: Song Liu
Subject: [PATCH 1/3] module: Introduce module_alloc_type
Date: Thu, 25 May 2023 22:15:27 -0700
Message-ID: <20230526051529.3387103-2-song@kernel.org>
In-Reply-To: <20230526051529.3387103-1-song@kernel.org>
References: <20230526051529.3387103-1-song@kernel.org>
Introduce a memory-type-aware module_alloc_type(), which provides a unified
allocator for all architectures. This work was discussed in [1].

Each arch can configure the allocator to:

1. Specify module_vaddr and module_end
2. Randomize the module start address for KASLR
3. Call kasan_alloc_module_shadow()
4. Call kasan_reset_tag()
5. Set preferred and secondary module address ranges

Flags in enum mod_alloc_params_flags control the behavior of
module_alloc_type(). Specifically: MOD_ALLOC_FALLBACK lets
module_alloc_type() fall back to the existing module_alloc().
MOD_ALLOC_SET_MEMORY makes module_alloc_type() protect the memory before
returning it to the caller.

A module_allocator_init() call is added to start_kernel() to initialize
module_alloc_type().
Signed-off-by: Song Liu [1] https://lore.kernel.org/linux-mm/20221107223921.3451913-1-song@kernel.org/ --- include/linux/module.h | 6 + include/linux/moduleloader.h | 75 ++++++++++++ init/main.c | 1 + kernel/bpf/bpf_struct_ops.c | 10 +- kernel/bpf/core.c | 20 ++-- kernel/bpf/trampoline.c | 6 +- kernel/kprobes.c | 6 +- kernel/module/internal.h | 3 + kernel/module/main.c | 217 +++++++++++++++++++++++++++++++++-- kernel/module/strict_rwx.c | 4 + 10 files changed, 319 insertions(+), 29 deletions(-) diff --git a/include/linux/module.h b/include/linux/module.h index 9e56763dff81..948b8132a742 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -752,6 +752,8 @@ static inline bool is_livepatch_module(struct module *mod) void set_module_sig_enforced(void); +void __init module_allocator_init(void); + #else /* !CONFIG_MODULES... */ static inline struct module *__module_address(unsigned long addr) @@ -855,6 +857,10 @@ void *dereference_module_function_descriptor(struct module *mod, void *ptr) return ptr; } +static inline void __init module_allocator_init(void) +{ +} + #endif /* CONFIG_MODULES */ #ifdef CONFIG_SYSFS diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h index 03be088fb439..59c7114a7b65 100644 --- a/include/linux/moduleloader.h +++ b/include/linux/moduleloader.h @@ -32,6 +32,81 @@ void *module_alloc(unsigned long size); /* Free memory returned from module_alloc. */ void module_memfree(void *module_region); +#ifdef CONFIG_MODULES + +/* For mod_alloc_params.flags */ +enum mod_alloc_params_flags { + MOD_ALLOC_FALLBACK = (1 << 0), /* Fallback to module_alloc() */ + MOD_ALLOC_KASAN_MODULE_SHADOW = (1 << 1), /* Calls kasan_alloc_module_shadow() */ + MOD_ALLOC_KASAN_RESET_TAG = (1 << 2), /* Calls kasan_reset_tag() */ + MOD_ALLOC_SET_MEMORY = (1 << 3), /* The allocator calls set_memory_ on + * memory before returning it to the + * caller, so that the caller do not need + * to call set_memory_* again. 
This does + * not work for MOD_RO_AFTER_INIT. + */ +}; + +#define MOD_MAX_ADDR_SPACES 2 + +/** + * struct vmalloc_params - Parameters to call __vmalloc_node_range() + * @start: Address space range start + * @end: Address space range end + * @gfp_mask: The gfp_t mask used for this range + * @pgprot: The page protection for this range + * @vm_flags The vm_flag used for this range + */ +struct vmalloc_params { + unsigned long start; + unsigned long end; + gfp_t gfp_mask; + pgprot_t pgprot; + unsigned long vm_flags; +}; + +/** + * struct mod_alloc_params - Parameters for module allocation type + * @flags: Properties in mod_alloc_params_flags + * @granularity: The allocation granularity (PAGE/PMD) in bytes + * @alignment: The allocation alignment requirement + * @vmp: Parameters used to call vmalloc + * @fill: Function to fill allocated space. If NULL, use memcpy() + * @invalidate: Function to invalidate memory space. + * + * If @granularity > @alignment the allocation can reuse free space in + * previously allocated pages. If they are the same, then fresh pages + * have to be allocated. 
+ */ +struct mod_alloc_params { + unsigned int flags; + unsigned int granularity; + unsigned int alignment; + struct vmalloc_params vmp[MOD_MAX_ADDR_SPACES]; + void * (*fill)(void *dst, const void *src, size_t len); + void * (*invalidate)(void *ptr, size_t len); +}; + +struct mod_type_allocator { + struct mod_alloc_params params; +}; + +struct mod_allocators { + struct mod_type_allocator *types[MOD_MEM_NUM_TYPES]; +}; + +void *module_alloc_type(size_t size, enum mod_mem_type type); +void module_memfree_type(void *ptr, enum mod_mem_type type); +void module_memory_fill_type(void *dst, void *src, size_t len, enum mod_mem_type type); +void module_memory_invalidate_type(void *ptr, size_t len, enum mod_mem_type type); +void module_memory_protect(void *ptr, size_t len, enum mod_mem_type type); +void module_memory_unprotect(void *ptr, size_t len, enum mod_mem_type type); +void module_memory_force_protect(void *ptr, size_t len, enum mod_mem_type type); +void module_memory_force_unprotect(void *ptr, size_t len, enum mod_mem_type type); +void module_alloc_type_init(struct mod_allocators *allocators); + +#endif /* CONFIG_MODULES */ + /* Determines if the section name is an init section (that is only used during * module loading). 
*/ diff --git a/init/main.c b/init/main.c index af50044deed5..e05228cabde8 100644 --- a/init/main.c +++ b/init/main.c @@ -936,6 +936,7 @@ asmlinkage __visible void __init __no_sanitize_address __noreturn start_kernel(v sort_main_extable(); trap_init(); mm_core_init(); + module_allocator_init(); poking_init(); ftrace_init(); diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index d3f0a4825fa6..e4ec4be866cc 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -12,6 +12,7 @@ #include #include #include +#include enum bpf_struct_ops_state { BPF_STRUCT_OPS_STATE_INIT, @@ -512,7 +513,8 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, err = st_ops->validate(kdata); if (err) goto reset_unlock; - set_memory_rox((long)st_map->image, 1); + module_memory_protect(st_map->image, PAGE_SIZE, MOD_TEXT); + /* Let bpf_link handle registration & unregistration. * * Pair with smp_load_acquire() during lookup_elem(). @@ -521,7 +523,7 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, goto unlock; } - set_memory_rox((long)st_map->image, 1); + module_memory_protect(st_map->image, PAGE_SIZE, MOD_TEXT); err = st_ops->reg(kdata); if (likely(!err)) { /* This refcnt increment on the map here after @@ -544,8 +546,7 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, * there was a race in registering the struct_ops (under the same name) to * a sub-system through different struct_ops's maps. 
*/ - set_memory_nx((long)st_map->image, 1); - set_memory_rw((long)st_map->image, 1); + module_memory_unprotect(st_map->image, PAGE_SIZE, MOD_TEXT); reset_unlock: bpf_struct_ops_map_put_progs(st_map); @@ -907,4 +908,3 @@ int bpf_struct_ops_link_create(union bpf_attr *attr) kfree(link); return err; } - diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 7421487422d4..4c989a8fe8b8 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -860,7 +860,7 @@ static struct bpf_prog_pack *alloc_new_pack(bpf_jit_fill_hole_t bpf_fill_ill_ins GFP_KERNEL); if (!pack) return NULL; - pack->ptr = module_alloc(BPF_PROG_PACK_SIZE); + pack->ptr = module_alloc_type(BPF_PROG_PACK_SIZE, MOD_TEXT); if (!pack->ptr) { kfree(pack); return NULL; @@ -869,8 +869,7 @@ static struct bpf_prog_pack *alloc_new_pack(bpf_jit_fill_hole_t bpf_fill_ill_ins bitmap_zero(pack->bitmap, BPF_PROG_PACK_SIZE / BPF_PROG_CHUNK_SIZE); list_add_tail(&pack->list, &pack_list); - set_vm_flush_reset_perms(pack->ptr); - set_memory_rox((unsigned long)pack->ptr, BPF_PROG_PACK_SIZE / PAGE_SIZE); + module_memory_protect(pack->ptr, BPF_PROG_PACK_SIZE, MOD_TEXT); return pack; } @@ -884,11 +883,10 @@ void *bpf_prog_pack_alloc(u32 size, bpf_jit_fill_hole_t bpf_fill_ill_insns) mutex_lock(&pack_mutex); if (size > BPF_PROG_PACK_SIZE) { size = round_up(size, PAGE_SIZE); - ptr = module_alloc(size); + ptr = module_alloc_type(size, MOD_TEXT); if (ptr) { bpf_fill_ill_insns(ptr, size); - set_vm_flush_reset_perms(ptr); - set_memory_rox((unsigned long)ptr, size / PAGE_SIZE); + module_memory_protect(ptr, size, MOD_TEXT); } goto out; } @@ -922,7 +920,8 @@ void bpf_prog_pack_free(struct bpf_binary_header *hdr) mutex_lock(&pack_mutex); if (hdr->size > BPF_PROG_PACK_SIZE) { - module_memfree(hdr); + module_memfree_type(hdr, MOD_TEXT); + goto out; } @@ -946,7 +945,8 @@ void bpf_prog_pack_free(struct bpf_binary_header *hdr) if (bitmap_find_next_zero_area(pack->bitmap, BPF_PROG_CHUNK_COUNT, 0, BPF_PROG_CHUNK_COUNT, 0) == 0) { 
list_del(&pack->list); - module_memfree(pack->ptr); + module_memfree_type(pack->ptr, MOD_TEXT); + kfree(pack); } out: @@ -997,12 +997,12 @@ void bpf_jit_uncharge_modmem(u32 size) void *__weak bpf_jit_alloc_exec(unsigned long size) { - return module_alloc(size); + return module_alloc_type(size, MOD_TEXT); } void __weak bpf_jit_free_exec(void *addr) { - module_memfree(addr); + module_memfree_type(addr, MOD_TEXT); } struct bpf_binary_header * diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c index ac021bc43a66..fd2d46c9a295 100644 --- a/kernel/bpf/trampoline.c +++ b/kernel/bpf/trampoline.c @@ -13,6 +13,7 @@ #include #include #include +#include /* dummy _ops. The verifier will operate on target program's ops. */ const struct bpf_verifier_ops bpf_extension_verifier_ops = { @@ -440,7 +441,7 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut if (err < 0) goto out; - set_memory_rox((long)im->image, 1); + module_memory_protect(im->image, PAGE_SIZE, MOD_TEXT); WARN_ON(tr->cur_image && tr->selector == 0); WARN_ON(!tr->cur_image && tr->selector); @@ -462,8 +463,7 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut tr->fops->trampoline = 0; /* reset im->image memory attr for arch_prepare_bpf_trampoline */ - set_memory_nx((long)im->image, 1); - set_memory_rw((long)im->image, 1); + module_memory_unprotect(im->image, PAGE_SIZE, MOD_TEXT); goto again; } #endif diff --git a/kernel/kprobes.c b/kernel/kprobes.c index 00e177de91cc..daf47da3c96e 100644 --- a/kernel/kprobes.c +++ b/kernel/kprobes.c @@ -113,17 +113,17 @@ enum kprobe_slot_state { void __weak *alloc_insn_page(void) { /* - * Use module_alloc() so this page is within +/- 2GB of where the + * Use module_alloc_type() so this page is within +/- 2GB of where the * kernel image and loaded module images reside. This is required * for most of the architectures. * (e.g. x86-64 needs this to handle the %rip-relative fixups.) 
*/ - return module_alloc(PAGE_SIZE); + return module_alloc_type(PAGE_SIZE, MOD_TEXT); } static void free_insn_page(void *page) { - module_memfree(page); + module_memfree_type(page, MOD_TEXT); } struct kprobe_insn_cache kprobe_insn_slots = { diff --git a/kernel/module/internal.h b/kernel/module/internal.h index dc7b0160c480..b2e136326c4c 100644 --- a/kernel/module/internal.h +++ b/kernel/module/internal.h @@ -12,6 +12,7 @@ #include #include #include +#include #include #ifndef ARCH_SHF_SMALL @@ -392,3 +393,5 @@ static inline int same_magic(const char *amagic, const char *bmagic, bool has_cr return strcmp(amagic, bmagic) == 0; } #endif /* CONFIG_MODVERSIONS */ + +extern struct mod_allocators module_allocators; diff --git a/kernel/module/main.c b/kernel/module/main.c index ea7d0c7f3e60..0f9183f1ca9f 100644 --- a/kernel/module/main.c +++ b/kernel/module/main.c @@ -1203,11 +1203,11 @@ static bool mod_mem_use_vmalloc(enum mod_mem_type type) mod_mem_type_is_core_data(type); } -static void *module_memory_alloc(unsigned int size, enum mod_mem_type type) +static void *module_memory_alloc(size_t size, enum mod_mem_type type) { if (mod_mem_use_vmalloc(type)) return vzalloc(size); - return module_alloc(size); + return module_alloc_type(size, type); } static void module_memory_free(void *ptr, enum mod_mem_type type) @@ -1215,7 +1215,7 @@ static void module_memory_free(void *ptr, enum mod_mem_type type) if (mod_mem_use_vmalloc(type)) vfree(ptr); else - module_memfree(ptr); + module_memfree_type(ptr, type); } static void free_mod_mem(struct module *mod) @@ -1609,6 +1609,201 @@ void * __weak module_alloc(unsigned long size) NUMA_NO_NODE, __builtin_return_address(0)); } +struct mod_allocators module_allocators; + +static struct mod_type_allocator default_mod_type_allocator = { + .params = { + .flags = MOD_ALLOC_FALLBACK, + }, +}; + +void __init __weak module_alloc_type_init(struct mod_allocators *allocators) +{ + for_each_mod_mem_type(type) + allocators->types[type] = 
&default_mod_type_allocator; +} + +static void module_memory_enable_protection(void *ptr, size_t len, enum mod_mem_type type) +{ + int npages = DIV_ROUND_UP(len, PAGE_SIZE); + + switch (type) { + case MOD_TEXT: + case MOD_INIT_TEXT: + set_memory_rox((unsigned long)ptr, npages); + break; + case MOD_DATA: + case MOD_INIT_DATA: + set_memory_nx((unsigned long)ptr, npages); + break; + case MOD_RODATA: + set_memory_nx((unsigned long)ptr, npages); + set_memory_ro((unsigned long)ptr, npages); + break; + case MOD_RO_AFTER_INIT: + set_memory_ro((unsigned long)ptr, npages); + break; + default: + WARN_ONCE(true, "Unknown mod_mem_type: %d\n", type); + break; + } +} + +static void module_memory_disable_protection(void *ptr, size_t len, enum mod_mem_type type) +{ + int npages = DIV_ROUND_UP(len, PAGE_SIZE); + + switch (type) { + case MOD_TEXT: + case MOD_INIT_TEXT: + set_memory_nx((unsigned long)ptr, npages); + set_memory_rw((unsigned long)ptr, npages); + break; + case MOD_RODATA: + case MOD_RO_AFTER_INIT: + set_memory_rw((unsigned long)ptr, npages); + break; + case MOD_DATA: + case MOD_INIT_DATA: + break; + default: + WARN_ONCE(true, "Unknown mod_mem_type: %d\n", type); + break; + } +} + +void *module_alloc_type(size_t size, enum mod_mem_type type) +{ + struct mod_type_allocator *allocator; + struct mod_alloc_params *params; + void *ptr = NULL; + int i; + + if (WARN_ON_ONCE(type >= MOD_MEM_NUM_TYPES)) + return NULL; + + allocator = module_allocators.types[type]; + params = &allocator->params; + + if (params->flags & MOD_ALLOC_FALLBACK) + return module_alloc(size); + + for (i = 0; i < MOD_MAX_ADDR_SPACES; i++) { + struct vmalloc_params *vmp = ¶ms->vmp[i]; + + if (vmp->start == vmp->end) + continue; + + ptr = __vmalloc_node_range(size, params->alignment, vmp->start, vmp->end, + vmp->gfp_mask, vmp->pgprot, vmp->vm_flags, + NUMA_NO_NODE, __builtin_return_address(0)); + if (!ptr) + continue; + + if (params->flags & MOD_ALLOC_KASAN_MODULE_SHADOW) { + if (ptr && 
kasan_alloc_module_shadow(ptr, size, vmp->gfp_mask)) { + vfree(ptr); + return NULL; + } + } + + /* + * VM_FLUSH_RESET_PERMS is still needed here. This is + * because "size" is not available in module_memfree_type + * at the moment, so we cannot undo set_memory_rox in + * module_memfree_type. Once a better allocator is used, + * we can manually undo set_memory_rox, and thus remove + * VM_FLUSH_RESET_PERMS. + */ + set_vm_flush_reset_perms(ptr); + + if (params->flags & MOD_ALLOC_SET_MEMORY) + module_memory_enable_protection(ptr, size, type); + + if (params->flags & MOD_ALLOC_KASAN_RESET_TAG) + return kasan_reset_tag(ptr); + return ptr; + } + return NULL; +} + +void module_memfree_type(void *ptr, enum mod_mem_type type) +{ + module_memfree(ptr); +} + +void module_memory_fill_type(void *dst, void *src, size_t len, enum mod_mem_type type) +{ + struct mod_type_allocator *allocator; + struct mod_alloc_params *params; + + allocator = module_allocators.types[type]; + params = &allocator->params; + + if (params->fill) + params->fill(dst, src, len); + else + memcpy(dst, src, len); +} + +void module_memory_invalidate_type(void *dst, size_t len, enum mod_mem_type type) +{ + struct mod_type_allocator *allocator; + struct mod_alloc_params *params; + + allocator = module_allocators.types[type]; + params = &allocator->params; + + if (params->invalidate) + params->invalidate(dst, len); + else + memset(dst, 0, len); +} + +/* + * Protect memory allocated by module_alloc_type(). Called by users of + * module_alloc_type. This is a no-op with MOD_ALLOC_SET_MEMORY. + */ +void module_memory_protect(void *ptr, size_t len, enum mod_mem_type type) +{ + struct mod_alloc_params *params = &module_allocators.types[type]->params; + + if (params->flags & MOD_ALLOC_SET_MEMORY) + return; + module_memory_enable_protection(ptr, len, type); +} + +/* + * Unprotect memory allocated by module_alloc_type(). Called by users of + * module_alloc_type. This is a no-op with MOD_ALLOC_SET_MEMORY. 
+ */ +void module_memory_unprotect(void *ptr, size_t len, enum mod_mem_type type) +{ + struct mod_alloc_params *params = &module_allocators.types[type]->params; + + if (params->flags & MOD_ALLOC_SET_MEMORY) + return; + module_memory_disable_protection(ptr, len, type); +} + +/* + * Should only be used by arch code in cases where text_poke like + * solution is not ready yet + */ +void module_memory_force_protect(void *ptr, size_t len, enum mod_mem_type type) +{ + module_memory_enable_protection(ptr, len, type); +} + +/* + * Should only be used by arch code in cases where text_poke like + * solution is not ready yet + */ +void module_memory_force_unprotect(void *ptr, size_t len, enum mod_mem_type type) +{ + module_memory_disable_protection(ptr, len, type); +} + bool __weak module_init_section(const char *name) { return strstarts(name, ".init"); @@ -2241,7 +2436,7 @@ static int move_module(struct module *mod, struct load_info *info) t = type; goto out_enomem; } - memset(ptr, 0, mod->mem[type].size); + module_memory_invalidate_type(ptr, mod->mem[type].size, type); mod->mem[type].base = ptr; } @@ -2269,7 +2464,8 @@ static int move_module(struct module *mod, struct load_info *info) ret = -ENOEXEC; goto out_enomem; } - memcpy(dest, (void *)shdr->sh_addr, shdr->sh_size); + + module_memory_fill_type(dest, (void *)shdr->sh_addr, shdr->sh_size, type); } /* * Update the userspace copy's ELF section address to point to @@ -2471,9 +2667,9 @@ static void do_free_init(struct work_struct *w) llist_for_each_safe(pos, n, list) { initfree = container_of(pos, struct mod_initfree, node); - module_memfree(initfree->init_text); - module_memfree(initfree->init_data); - module_memfree(initfree->init_rodata); + module_memfree_type(initfree->init_text, MOD_INIT_TEXT); + module_memfree_type(initfree->init_data, MOD_INIT_DATA); + module_memfree_type(initfree->init_rodata, MOD_INIT_RODATA); kfree(initfree); } } @@ -3268,3 +3464,8 @@ static int module_debugfs_init(void) } 
module_init(module_debugfs_init); #endif + +void __init module_allocator_init(void) +{ + module_alloc_type_init(&module_allocators); +} diff --git a/kernel/module/strict_rwx.c b/kernel/module/strict_rwx.c index a2b656b4e3d2..65ff1b09dc84 100644 --- a/kernel/module/strict_rwx.c +++ b/kernel/module/strict_rwx.c @@ -16,6 +16,10 @@ static void module_set_memory(const struct module *mod, enum mod_mem_type type, { const struct module_memory *mod_mem = &mod->mem[type]; + /* The allocator already called set_memory_*, skip here. */ + if (module_allocators.types[type]->params.flags & MOD_ALLOC_SET_MEMORY) + return; + set_vm_flush_reset_perms(mod_mem->base); set_memory((unsigned long)mod_mem->base, mod_mem->size >> PAGE_SHIFT); }

From patchwork Fri May 26 05:15:28 2023
From: Song Liu
Subject: [PATCH 2/3] ftrace: Add swap_func to ftrace_process_locs()
Date: Thu, 25 May 2023 22:15:28 -0700
Message-ID: <20230526051529.3387103-3-song@kernel.org>
In-Reply-To: <20230526051529.3387103-1-song@kernel.org>

ftrace_process_locs() sorts the module mcount table, which lives in
read-only memory. Add ftrace_swap_func() so that architectures can use
an RO-memory-poke function to do the sorting.
Signed-off-by: Song Liu --- include/linux/ftrace.h | 2 ++ kernel/trace/ftrace.c | 13 ++++++++++++- 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h index b23bdd414394..fe443b8ed32c 100644 --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -1166,4 +1166,6 @@ unsigned long arch_syscall_addr(int nr); #endif /* CONFIG_FTRACE_SYSCALLS */ +void ftrace_swap_func(void *a, void *b, int n); + #endif /* _LINUX_FTRACE_H */ diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index 764668467155..f5ddc9d4cfb6 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -6430,6 +6430,17 @@ static void test_is_sorted(unsigned long *start, unsigned long count) } #endif +void __weak ftrace_swap_func(void *a, void *b, int n) +{ + unsigned long t; + + WARN_ON_ONCE(n != sizeof(t)); + + t = *((unsigned long *)a); + *(unsigned long *)a = *(unsigned long *)b; + *(unsigned long *)b = t; +} + static int ftrace_process_locs(struct module *mod, unsigned long *start, unsigned long *end) @@ -6455,7 +6466,7 @@ static int ftrace_process_locs(struct module *mod, */ if (!IS_ENABLED(CONFIG_BUILDTIME_MCOUNT_SORT) || mod) { sort(start, count, sizeof(*start), - ftrace_cmp_ips, NULL); + ftrace_cmp_ips, ftrace_swap_func); } else { test_is_sorted(start, count); } From patchwork Fri May 26 05:15:29 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Song Liu X-Patchwork-Id: 13256405 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9B0B63C04 for ; Fri, 26 May 2023 05:16:17 +0000 (UTC) Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 70E7E187 
From: Song Liu
Subject: [PATCH 3/3] x86/module: Use module_alloc_type
Date: Thu, 25 May 2023 22:15:29 -0700
Message-ID: <20230526051529.3387103-4-song@kernel.org>
In-Reply-To: <20230526051529.3387103-1-song@kernel.org>

Enable module_alloc_type to 1.
Allocate ROX data for MOD_TEXT and MOD_INIT_TEXT;
2. Allocate RO data for MOD_RODATA and MOD_INIT_RODATA;
3. Allocate RW data for other types.

Also, update users of module_alloc_type (BPF, ftrace, kprobe) to handle
these restrictions.

arch_prepare_bpf_trampoline() cannot jit directly into module memory yet,
so we have to use module_memory_force_[un]protect() in it.

Signed-off-by: Song Liu
---
 arch/x86/kernel/alternative.c  |  37 +++++++----
 arch/x86/kernel/ftrace.c       |  44 +++++++------
 arch/x86/kernel/kprobes/core.c |   8 +--
 arch/x86/kernel/module.c       | 114 +++++++++++++++++++++++----------
 arch/x86/kernel/unwind_orc.c   |  13 ++--
 arch/x86/net/bpf_jit_comp.c    |  22 +++++--
 kernel/bpf/core.c              |   6 +-
 7 files changed, 160 insertions(+), 84 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index f615e0cb6d93..bb4e6c3225bf 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -122,6 +122,17 @@ extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
 extern s32 __smp_locks[], __smp_locks_end[];
 void text_poke_early(void *addr, const void *opcode, size_t len);
 
+static void __init_or_module do_text_poke(void *addr, const void *opcode, size_t len)
+{
+	if (system_state < SYSTEM_RUNNING) {
+		text_poke_early(addr, opcode, len);
+	} else {
+		mutex_lock(&text_mutex);
+		text_poke(addr, opcode, len);
+		mutex_unlock(&text_mutex);
+	}
+}
+
 /*
  * Are we looking at a near JMP with a 1 or 4-byte displacement.
 */
@@ -331,7 +342,7 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
 
 		DUMP_BYTES(insn_buff, insn_buff_sz, "%px: final_insn: ", instr);
 
-		text_poke_early(instr, insn_buff, insn_buff_sz);
+		do_text_poke(instr, insn_buff, insn_buff_sz);
 
 next:
 		optimize_nops(instr, a->instrlen);
@@ -564,7 +575,7 @@ void __init_or_module noinline apply_retpolines(s32 *start, s32 *end)
 			optimize_nops(bytes, len);
 			DUMP_BYTES(((u8*)addr),  len, "%px: orig: ", addr);
 			DUMP_BYTES(((u8*)bytes), len, "%px: repl: ", addr);
-			text_poke_early(addr, bytes, len);
+			do_text_poke(addr, bytes, len);
 		}
 	}
 }
@@ -638,7 +649,7 @@ void __init_or_module noinline apply_returns(s32 *start, s32 *end)
 		if (len == insn.length) {
 			DUMP_BYTES(((u8*)addr),  len, "%px: orig: ", addr);
 			DUMP_BYTES(((u8*)bytes), len, "%px: repl: ", addr);
-			text_poke_early(addr, bytes, len);
+			do_text_poke(addr, bytes, len);
 		}
 	}
 }
@@ -674,7 +685,7 @@ static void poison_endbr(void *addr, bool warn)
 	 */
 	DUMP_BYTES(((u8*)addr), 4, "%px: orig: ", addr);
 	DUMP_BYTES(((u8*)&poison), 4, "%px: repl: ", addr);
-	text_poke_early(addr, &poison, 4);
+	do_text_poke(addr, &poison, 4);
 }
 
 /*
@@ -869,7 +880,7 @@ static int cfi_disable_callers(s32 *start, s32 *end)
 		if (!hash) /* nocfi callers */
 			continue;
 
-		text_poke_early(addr, jmp, 2);
+		do_text_poke(addr, jmp, 2);
 	}
 
 	return 0;
@@ -892,7 +903,7 @@ static int cfi_enable_callers(s32 *start, s32 *end)
 		if (!hash) /* nocfi callers */
 			continue;
 
-		text_poke_early(addr, mov, 2);
+		do_text_poke(addr, mov, 2);
 	}
 
 	return 0;
@@ -913,7 +924,7 @@ static int cfi_rand_preamble(s32 *start, s32 *end)
 			return -EINVAL;
 
 		hash = cfi_rehash(hash);
-		text_poke_early(addr + 1, &hash, 4);
+		do_text_poke(addr + 1, &hash, 4);
 	}
 
 	return 0;
@@ -932,9 +943,9 @@ static int cfi_rewrite_preamble(s32 *start, s32 *end)
			 addr, addr, 5, addr))
			return -EINVAL;
 
-		text_poke_early(addr, fineibt_preamble_start, fineibt_preamble_size);
+		do_text_poke(addr, fineibt_preamble_start, fineibt_preamble_size);
 		WARN_ON(*(u32 *)(addr + fineibt_preamble_hash) != 0x12345678);
-		text_poke_early(addr + fineibt_preamble_hash, &hash, 4);
+		do_text_poke(addr + fineibt_preamble_hash, &hash, 4);
 	}
 
 	return 0;
@@ -953,7 +964,7 @@ static int cfi_rand_callers(s32 *start, s32 *end)
 		hash = decode_caller_hash(addr);
 		if (hash) {
 			hash = -cfi_rehash(hash);
-			text_poke_early(addr + 2, &hash, 4);
+			do_text_poke(addr + 2, &hash, 4);
 		}
 	}
 
@@ -971,9 +982,9 @@ static int cfi_rewrite_callers(s32 *start, s32 *end)
 		addr -= fineibt_caller_size;
 		hash = decode_caller_hash(addr);
 		if (hash) {
-			text_poke_early(addr, fineibt_caller_start, fineibt_caller_size);
+			do_text_poke(addr, fineibt_caller_start, fineibt_caller_size);
 			WARN_ON(*(u32 *)(addr + fineibt_caller_hash) != 0x12345678);
-			text_poke_early(addr + fineibt_caller_hash, &hash, 4);
+			do_text_poke(addr + fineibt_caller_hash, &hash, 4);
 		}
 		/* rely on apply_retpolines() */
 	}
@@ -1243,7 +1254,7 @@ void __init_or_module apply_paravirt(struct paravirt_patch_site *start,
 		/* Pad the rest with nops */
 		add_nops(insn_buff + used, p->len - used);
-		text_poke_early(p->instr, insn_buff, p->len);
+		do_text_poke(p->instr, insn_buff, p->len);
 	}
 }
 extern struct paravirt_patch_site __start_parainstructions[],
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 5e7ead52cfdb..a41af9e49afb 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -119,8 +119,11 @@ ftrace_modify_code_direct(unsigned long ip, const char *old_code,
 	/* replace the text with the new text */
 	if (ftrace_poke_late)
 		text_poke_queue((void *)ip, new_code, MCOUNT_INSN_SIZE, NULL);
-	else
-		text_poke_early((void *)ip, new_code, MCOUNT_INSN_SIZE);
+	else {
+		mutex_lock(&text_mutex);
+		text_poke((void *)ip, new_code, MCOUNT_INSN_SIZE);
+		mutex_unlock(&text_mutex);
+	}
 	return 0;
 }
 
@@ -265,11 +268,11 @@ void arch_ftrace_update_code(int command)
 /* Module allocation simplifies allocating memory for code */
 static inline void *alloc_tramp(unsigned long size)
 {
-	return module_alloc(size);
+	return module_alloc_type(size, MOD_TEXT);
 }
 static inline void tramp_free(void *tramp)
 {
-	module_memfree(tramp);
+	module_memfree_type(tramp, MOD_TEXT);
 }
 #else
 /* Trampolines can only be created if modules are supported */
@@ -319,7 +322,6 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
 	unsigned long call_offset;
 	unsigned long jmp_offset;
 	unsigned long offset;
-	unsigned long npages;
 	unsigned long size;
 	unsigned long *ptr;
 	void *trampoline;
@@ -328,7 +330,6 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
 	unsigned const char op_ref[] = { 0x48, 0x8b, 0x15 };
 	unsigned const char retq[] = { RET_INSN_OPCODE, INT3_INSN_OPCODE };
 	union ftrace_op_code_union op_ptr;
-	int ret;
 
 	if (ops->flags & FTRACE_OPS_FL_SAVE_REGS) {
 		start_offset = (unsigned long)ftrace_regs_caller;
@@ -356,18 +357,16 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
 		return 0;
 
 	*tramp_size = size + RET_SIZE + sizeof(void *);
-	npages = DIV_ROUND_UP(*tramp_size, PAGE_SIZE);
 
 	/* Copy ftrace_caller onto the trampoline memory */
-	ret = copy_from_kernel_nofault(trampoline, (void *)start_offset, size);
-	if (WARN_ON(ret < 0))
+	if (WARN_ON(text_poke_copy(trampoline, (void *)start_offset, size) == NULL))
 		goto fail;
 
 	ip = trampoline + size;
 	if (cpu_feature_enabled(X86_FEATURE_RETHUNK))
 		__text_gen_insn(ip, JMP32_INSN_OPCODE, ip, x86_return_thunk, JMP32_INSN_SIZE);
 	else
-		memcpy(ip, retq, sizeof(retq));
+		text_poke_copy(ip, retq, sizeof(retq));
 
 	/* No need to test direct calls on created trampolines */
 	if (ops->flags & FTRACE_OPS_FL_SAVE_REGS) {
@@ -375,8 +374,7 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
 		ip = trampoline + (jmp_offset - start_offset);
 		if (WARN_ON(*(char *)ip != 0x75))
 			goto fail;
-		ret = copy_from_kernel_nofault(ip, x86_nops[2], 2);
-		if (ret < 0)
+		if (text_poke_copy(ip, x86_nops[2], 2) == NULL)
 			goto fail;
 	}
 
@@ -389,7 +387,7 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
 	 */
 	ptr = (unsigned long *)(trampoline + size + RET_SIZE);
-	*ptr = (unsigned long)ops;
+	text_poke_copy(ptr, &ops, sizeof(unsigned long));
 
 	op_offset -= start_offset;
 	memcpy(&op_ptr, trampoline + op_offset, OP_REF_SIZE);
@@ -405,7 +403,7 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
 	op_ptr.offset = offset;
 
 	/* put in the new offset to the ftrace_ops */
-	memcpy(trampoline + op_offset, &op_ptr, OP_REF_SIZE);
+	text_poke_copy(trampoline + op_offset, &op_ptr, OP_REF_SIZE);
 
 	/* put in the call to the function */
 	mutex_lock(&text_mutex);
@@ -415,15 +413,14 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size)
 	 * the depth accounting before the call already.
 	 */
 	dest = ftrace_ops_get_func(ops);
-	memcpy(trampoline + call_offset,
-	       text_gen_insn(CALL_INSN_OPCODE, trampoline + call_offset, dest),
-	       CALL_INSN_SIZE);
+	text_poke_copy_locked(trampoline + call_offset,
+			      text_gen_insn(CALL_INSN_OPCODE, trampoline + call_offset, dest),
+			      CALL_INSN_SIZE, false);
 	mutex_unlock(&text_mutex);
 
 	/* ALLOC_TRAMP flags lets us know we created it */
 	ops->flags |= FTRACE_OPS_FL_ALLOC_TRAMP;
 
-	set_memory_rox((unsigned long)trampoline, npages);
 	return (unsigned long)trampoline;
 fail:
 	tramp_free(trampoline);
@@ -667,4 +664,15 @@ void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
 }
 #endif
 
+void ftrace_swap_func(void *a, void *b, int n)
+{
+	unsigned long t;
+
+	WARN_ON_ONCE(n != sizeof(t));
+
+	t = *((unsigned long *)a);
+	text_poke_copy(a, b, sizeof(t));
+	text_poke_copy(b, &t, sizeof(t));
+}
+
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index f7f6042eb7e6..96f56e663cbe 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -414,16 +414,10 @@ void *alloc_insn_page(void)
 {
 	void *page;
 
-	page = module_alloc(PAGE_SIZE);
+	page = module_alloc_type(PAGE_SIZE, MOD_TEXT);
 	if (!page)
 		return NULL;
 
-	/*
-	 * TODO: Once additional kernel code protection mechanisms are set, ensure
-	 * that the page was not maliciously altered and it is still zeroed.
-	 */
-	set_memory_rox((unsigned long)page, 1);
-
 	return page;
 }
 
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index b05f62ee2344..80c2ee1a4f7f 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -67,24 +67,82 @@ static unsigned long int get_module_load_offset(void)
 
 void *module_alloc(unsigned long size)
 {
-	gfp_t gfp_mask = GFP_KERNEL;
-	void *p;
-
-	if (PAGE_ALIGN(size) > MODULES_LEN)
-		return NULL;
-
-	p = __vmalloc_node_range(size, MODULE_ALIGN,
-				 MODULES_VADDR + get_module_load_offset(),
-				 MODULES_END, gfp_mask, PAGE_KERNEL,
-				 VM_FLUSH_RESET_PERMS | VM_DEFER_KMEMLEAK,
-				 NUMA_NO_NODE, __builtin_return_address(0));
+	WARN(true, "x86 should not use module_alloc\n");
+	return NULL;
+}
 
-	if (p && (kasan_alloc_module_shadow(p, size, gfp_mask) < 0)) {
-		vfree(p);
-		return NULL;
-	}
+static void *x86_module_invalidate_text(void *ptr, size_t len)
+{
+	return text_poke_set(ptr, 0xcc, len);
+}
 
-	return p;
+static struct mod_type_allocator x86_mod_allocator_text = {
+	.params = {
+		.flags = MOD_ALLOC_KASAN_MODULE_SHADOW | MOD_ALLOC_SET_MEMORY,
+		.granularity = PAGE_SIZE,
+		.alignment = MODULE_ALIGN,
+		.fill = text_poke_copy,
+		.invalidate = x86_module_invalidate_text,
+	},
+};
+
+static struct mod_type_allocator x86_mod_allocator_rw_data = {
+	.params = {
+		.flags = MOD_ALLOC_KASAN_MODULE_SHADOW,
+		.granularity = PAGE_SIZE,
+		.alignment = MODULE_ALIGN,
+	},
+};
+
+static struct mod_type_allocator x86_mod_allocator_ro_data = {
+	.params = {
+		.flags = MOD_ALLOC_KASAN_MODULE_SHADOW | MOD_ALLOC_SET_MEMORY,
+		.granularity = PAGE_SIZE,
+		.alignment = MODULE_ALIGN,
+		.fill = text_poke_copy,
+		.invalidate = x86_module_invalidate_text,
+	},
+};
+
+void __init module_alloc_type_init(struct mod_allocators *allocators)
+{
+	struct mod_alloc_params *params = &x86_mod_allocator_text.params;
+	struct vmalloc_params *vmp = &params->vmp[0];
+
+	vmp->start = MODULES_VADDR + get_module_load_offset();
+	vmp->end = MODULES_END;
+	vmp->gfp_mask = GFP_KERNEL;
+	vmp->pgprot = PAGE_KERNEL_EXEC;
+	vmp->vm_flags = VM_FLUSH_RESET_PERMS | VM_DEFER_KMEMLEAK |
+		VM_ALLOW_HUGE_VMAP;
+
+	for_class_mod_mem_type(type, text)
+		allocators->types[type] = &x86_mod_allocator_text;
+
+	params = &x86_mod_allocator_rw_data.params;
+	vmp = &params->vmp[0];
+
+	vmp->start = MODULES_VADDR + get_module_load_offset();
+	vmp->end = MODULES_END;
+	vmp->gfp_mask = GFP_KERNEL;
+	vmp->pgprot = PAGE_KERNEL_EXEC;
+	vmp->vm_flags = VM_FLUSH_RESET_PERMS | VM_DEFER_KMEMLEAK;
+
+	allocators->types[MOD_DATA] = &x86_mod_allocator_rw_data;
+	allocators->types[MOD_INIT_DATA] = &x86_mod_allocator_rw_data;
+	allocators->types[MOD_RO_AFTER_INIT] = &x86_mod_allocator_rw_data;
+
+	params = &x86_mod_allocator_ro_data.params;
+	vmp = &params->vmp[0];
+
+	vmp->start = MODULES_VADDR + get_module_load_offset();
+	vmp->end = MODULES_END;
+	vmp->gfp_mask = GFP_KERNEL;
+	vmp->pgprot = PAGE_KERNEL_EXEC;
+	vmp->vm_flags = VM_FLUSH_RESET_PERMS | VM_DEFER_KMEMLEAK;
+
+	allocators->types[MOD_RODATA] = &x86_mod_allocator_ro_data;
+	allocators->types[MOD_INIT_RODATA] = &x86_mod_allocator_ro_data;
 }
 
 #ifdef CONFIG_X86_32
@@ -134,7 +192,6 @@ static int __write_relocate_add(Elf64_Shdr *sechdrs,
		   unsigned int symindex,
		   unsigned int relsec,
		   struct module *me,
-		   void *(*write)(void *dest, const void *src, size_t len),
		   bool apply)
 {
	unsigned int i;
@@ -202,14 +259,14 @@ static int __write_relocate_add(Elf64_Shdr *sechdrs,
					(int)ELF64_R_TYPE(rel[i].r_info), loc, val);
				return -ENOEXEC;
			}
-			write(loc, &val, size);
+			text_poke(loc, &val, size);
		} else {
			if (memcmp(loc, &val, size)) {
				pr_warn("x86/modules: Invalid relocation target, existing value does not match expected value for type %d, loc %p, val %Lx\n",
					(int)ELF64_R_TYPE(rel[i].r_info), loc, val);
				return -ENOEXEC;
			}
-			write(loc, &zero, size);
+			text_poke(loc, &zero, size);
		}
	}
	return 0;
@@ -230,22 +287,11 @@ static int write_relocate_add(Elf64_Shdr *sechdrs,
			      bool apply)
 {
 	int ret;
-	bool early = me->state == MODULE_STATE_UNFORMED;
-	void *(*write)(void *, const void *, size_t) = memcpy;
-
-	if (!early) {
-		write = text_poke;
-		mutex_lock(&text_mutex);
-	}
-
-	ret = __write_relocate_add(sechdrs, strtab, symindex, relsec, me,
-				   write, apply);
-
-	if (!early) {
-		text_poke_sync();
-		mutex_unlock(&text_mutex);
-	}
+	mutex_lock(&text_mutex);
+	ret = __write_relocate_add(sechdrs, strtab, symindex, relsec, me, apply);
+	text_poke_sync();
+	mutex_unlock(&text_mutex);
 
 	return ret;
 }
diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index 3ac50b7298d1..264188ec50c9 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -7,6 +7,7 @@
 #include
 #include
 #include
+#include
 
 #define orc_warn(fmt, ...) \
	printk_deferred_once(KERN_WARNING "WARNING: " fmt, ##__VA_ARGS__)
@@ -222,18 +223,22 @@ static void orc_sort_swap(void *_a, void *_b, int size)
 	struct orc_entry orc_tmp;
 	int *a = _a, *b = _b, tmp;
 	int delta = _b - _a;
+	int val;
 
 	/* Swap the .orc_unwind_ip entries: */
 	tmp = *a;
-	*a = *b + delta;
-	*b = tmp - delta;
+	val = *b + delta;
+	text_poke_copy(a, &val, sizeof(val));
+	val = tmp - delta;
+	text_poke_copy(b, &val, sizeof(val));
 
 	/* Swap the corresponding .orc_unwind entries: */
 	orc_a = cur_orc_table + (a - cur_orc_ip_table);
 	orc_b = cur_orc_table + (b - cur_orc_ip_table);
 
 	orc_tmp = *orc_a;
-	*orc_a = *orc_b;
-	*orc_b = orc_tmp;
+
+	text_poke_copy(orc_a, orc_b, sizeof(*orc_b));
+	text_poke_copy(orc_b, &orc_tmp, sizeof(orc_tmp));
 }
 
 static int orc_sort_cmp(const void *_a, const void *_b)
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 1056bbf55b17..846228fb12f2 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -226,7 +227,7 @@ static u8 simple_alu_opcodes[] = {
 static void jit_fill_hole(void *area, unsigned int size)
 {
 	/* Fill whole space with INT3 instructions */
-	memset(area, 0xcc, size);
+	text_poke_set(area, 0xcc, size);
 }
 
 int bpf_arch_text_invalidate(void *dst, size_t len)
@@ -2202,6 +2203,8 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
 		orig_call += X86_PATCH_SIZE;
 	}
 
+	module_memory_force_unprotect((void *)((unsigned long)image & PAGE_MASK),
+				      PAGE_SIZE, MOD_TEXT);
 	prog = image;
 
 	EMIT_ENDBR();
@@ -2238,20 +2241,24 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
 		emit_mov_imm64(&prog, BPF_REG_1, (long) im >> 32, (u32) (long) im);
 		if (emit_rsb_call(&prog, __bpf_tramp_enter, prog)) {
 			ret = -EINVAL;
-			goto cleanup;
+			goto reprotect_memory;
 		}
 	}
 
 	if (fentry->nr_links)
 		if (invoke_bpf(m, &prog, fentry, regs_off, run_ctx_off,
-			       flags & BPF_TRAMP_F_RET_FENTRY_RET))
-			return -EINVAL;
+			       flags & BPF_TRAMP_F_RET_FENTRY_RET)) {
+			ret = -EINVAL;
+			goto reprotect_memory;
+		}
 
 	if (fmod_ret->nr_links) {
 		branches = kcalloc(fmod_ret->nr_links, sizeof(u8 *),
				   GFP_KERNEL);
-		if (!branches)
-			return -ENOMEM;
+		if (!branches) {
+			ret = -ENOMEM;
+			goto reprotect_memory;
+		}
 
 		if (invoke_bpf_mod_ret(m, &prog, fmod_ret, regs_off,
				       run_ctx_off, branches)) {
@@ -2336,6 +2343,9 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
 cleanup:
 	kfree(branches);
+reprotect_memory:
+	module_memory_force_protect((void *)((unsigned long)image & PAGE_MASK),
+				    PAGE_SIZE, MOD_TEXT);
 	return ret;
 }
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 4c989a8fe8b8..90f09218d30f 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1092,8 +1092,10 @@ bpf_jit_binary_pack_alloc(unsigned int proglen, u8 **image_ptr,
 		return NULL;
 	}
 
-	/* Fill space with illegal/arch-dep instructions. */
-	bpf_fill_ill_insns(*rw_header, size);
+	/* bpf_fill_ill_insns is used to write to RO memory, so we cannot
+	 * use it on rw_header, use memset(0) instead.
+	 */
+	memset(*rw_header, 0, size);
 
 	(*rw_header)->size = size;
 	hole = min_t(unsigned int, size - (proglen + sizeof(*ro_header)),