From patchwork Thu May 19 20:20:30 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Song Liu X-Patchwork-Id: 12856010 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4139FC433EF for ; Thu, 19 May 2022 20:23:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244738AbiESUXz convert rfc822-to-8bit (ORCPT ); Thu, 19 May 2022 16:23:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60284 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244740AbiESUXx (ORCPT ); Thu, 19 May 2022 16:23:53 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3838BF1358 for ; Thu, 19 May 2022 13:23:38 -0700 (PDT) Received: from pps.filterd (m0148460.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 24JFNtXX013262 for ; Thu, 19 May 2022 13:23:37 -0700 Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3g5rgj29k6-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Thu, 19 May 2022 13:23:37 -0700 Received: from twshared8307.18.frc3.facebook.com (2620:10d:c085:108::4) by mail.thefacebook.com (2620:10d:c085:11d::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Thu, 19 May 2022 13:23:35 -0700 Received: by devbig932.frc1.facebook.com (Postfix, from userid 4523) id ECA637D58F62; Thu, 19 May 2022 13:20:45 -0700 (PDT) From: Song Liu To: , , CC: , , , , , , , Song Liu Subject: [PATCH v2 bpf-next 1/8] bpf: fill new bpf_prog_pack with illegal 
instructions Date: Thu, 19 May 2022 13:20:30 -0700 Message-ID: <20220519202037.2401584-2-song@kernel.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220519202037.2401584-1-song@kernel.org> References: <20220519202037.2401584-1-song@kernel.org> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: R7Pl3jJ6tFj4E4trieEdQrqIP7ulaKnU X-Proofpoint-ORIG-GUID: R7Pl3jJ6tFj4E4trieEdQrqIP7ulaKnU X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.874,Hydra:6.0.486,FMLib:17.11.64.514 definitions=2022-05-19_06,2022-05-19_03,2022-02-23_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net bpf_prog_pack enables sharing huge pages among multiple BPF programs. These pages are marked as executable before the JIT engine fills them with BPF programs. To make these pages safe, fill the whole bpf_prog_pack with illegal instructions before making it executable. Fixes: 57631054fae6 ("bpf: Introduce bpf_prog_pack allocator") Fixes: 33c9805860e5 ("bpf: Introduce bpf_jit_binary_pack_[alloc|finalize|free]") Signed-off-by: Song Liu --- kernel/bpf/core.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 9cc91f0f3115..2d0c9d4696ad 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -873,7 +873,7 @@ static size_t select_bpf_prog_pack_size(void) return size; } -static struct bpf_prog_pack *alloc_new_pack(void) +static struct bpf_prog_pack *alloc_new_pack(bpf_jit_fill_hole_t bpf_fill_ill_insns) { struct bpf_prog_pack *pack; @@ -886,6 +886,7 @@ static struct bpf_prog_pack *alloc_new_pack(void) kfree(pack); return NULL; } + bpf_fill_ill_insns(pack->ptr, bpf_prog_pack_size); bitmap_zero(pack->bitmap, bpf_prog_pack_size / BPF_PROG_CHUNK_SIZE); list_add_tail(&pack->list, &pack_list); @@ -895,7 +896,7 @@ static struct bpf_prog_pack *alloc_new_pack(void) return pack; } -static void *bpf_prog_pack_alloc(u32 size) +static void *bpf_prog_pack_alloc(u32 
size, bpf_jit_fill_hole_t bpf_fill_ill_insns) { unsigned int nbits = BPF_PROG_SIZE_TO_NBITS(size); struct bpf_prog_pack *pack; @@ -910,6 +911,7 @@ static void *bpf_prog_pack_alloc(u32 size) size = round_up(size, PAGE_SIZE); ptr = module_alloc(size); if (ptr) { + bpf_fill_ill_insns(ptr, size); set_vm_flush_reset_perms(ptr); set_memory_ro((unsigned long)ptr, size / PAGE_SIZE); set_memory_x((unsigned long)ptr, size / PAGE_SIZE); @@ -923,7 +925,7 @@ static void *bpf_prog_pack_alloc(u32 size) goto found_free_area; } - pack = alloc_new_pack(); + pack = alloc_new_pack(bpf_fill_ill_insns); if (!pack) goto out; @@ -1102,7 +1104,7 @@ bpf_jit_binary_pack_alloc(unsigned int proglen, u8 **image_ptr, if (bpf_jit_charge_modmem(size)) return NULL; - ro_header = bpf_prog_pack_alloc(size); + ro_header = bpf_prog_pack_alloc(size, bpf_fill_ill_insns); if (!ro_header) { bpf_jit_uncharge_modmem(size); return NULL; From patchwork Thu May 19 20:20:31 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Song Liu X-Patchwork-Id: 12856008 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 46370C433EF for ; Thu, 19 May 2022 20:23:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244703AbiESUXe convert rfc822-to-8bit (ORCPT ); Thu, 19 May 2022 16:23:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59762 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244702AbiESUXc (ORCPT ); Thu, 19 May 2022 16:23:32 -0400 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A2F67A2046 for ; Thu, 19 May 2022 13:23:31 -0700 (PDT) Received: from 
pps.filterd (m0089730.ppops.net [127.0.0.1]) by m0089730.ppops.net (8.17.1.5/8.17.1.5) with ESMTP id 24JFG3O9031223 for ; Thu, 19 May 2022 13:23:30 -0700 Received: from maileast.thefacebook.com ([163.114.130.16]) by m0089730.ppops.net (PPS) with ESMTPS id 3g5h5d4wmt-3 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Thu, 19 May 2022 13:23:30 -0700 Received: from twshared0725.22.frc3.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Thu, 19 May 2022 13:23:28 -0700 Received: by devbig932.frc1.facebook.com (Postfix, from userid 4523) id 78B3D7D58F63; Thu, 19 May 2022 13:20:47 -0700 (PDT) From: Song Liu To: , , CC: , , , , , , , Song Liu Subject: [PATCH v2 bpf-next 2/8] x86/alternative: introduce text_poke_set Date: Thu, 19 May 2022 13:20:31 -0700 Message-ID: <20220519202037.2401584-3-song@kernel.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220519202037.2401584-1-song@kernel.org> References: <20220519202037.2401584-1-song@kernel.org> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: r9I8OpQ8WaamQ1a8CydYK7Ce-HOXtev3 X-Proofpoint-ORIG-GUID: r9I8OpQ8WaamQ1a8CydYK7Ce-HOXtev3 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.874,Hydra:6.0.486,FMLib:17.11.64.514 definitions=2022-05-19_06,2022-05-19_03,2022-02-23_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net Introduce a memset like API for text_poke. This will be used to fill the unused RX memory with illegal instructions. 
Suggested-by: Peter Zijlstra (Intel) Acked-by: Peter Zijlstra (Intel) Signed-off-by: Song Liu --- arch/x86/include/asm/text-patching.h | 1 + arch/x86/kernel/alternative.c | 67 +++++++++++++++++++++++----- 2 files changed, 58 insertions(+), 10 deletions(-) diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h index d20ab0921480..1cc15528ce29 100644 --- a/arch/x86/include/asm/text-patching.h +++ b/arch/x86/include/asm/text-patching.h @@ -45,6 +45,7 @@ extern void *text_poke(void *addr, const void *opcode, size_t len); extern void text_poke_sync(void); extern void *text_poke_kgdb(void *addr, const void *opcode, size_t len); extern void *text_poke_copy(void *addr, const void *opcode, size_t len); +extern void *text_poke_set(void *addr, int c, size_t len); extern int poke_int3_handler(struct pt_regs *regs); extern void text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate); diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index d374cb3cf024..7563b5bc8328 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -994,7 +994,21 @@ static inline void unuse_temporary_mm(temp_mm_state_t prev_state) __ro_after_init struct mm_struct *poking_mm; __ro_after_init unsigned long poking_addr; -static void *__text_poke(void *addr, const void *opcode, size_t len) +static void text_poke_memcpy(void *dst, const void *src, size_t len) +{ + memcpy(dst, src, len); +} + +static void text_poke_memset(void *dst, const void *src, size_t len) +{ + int c = *(const int *)src; + + memset(dst, c, len); +} + +typedef void text_poke_f(void *dst, const void *src, size_t len); + +static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t len) { bool cross_page_boundary = offset_in_page(addr) + len > PAGE_SIZE; struct page *pages[2] = {NULL}; @@ -1059,7 +1073,7 @@ static void *__text_poke(void *addr, const void *opcode, size_t len) prev = use_temporary_mm(poking_mm); 
kasan_disable_current(); - memcpy((u8 *)poking_addr + offset_in_page(addr), opcode, len); + func((u8 *)poking_addr + offset_in_page(addr), src, len); kasan_enable_current(); /* @@ -1087,11 +1101,13 @@ static void *__text_poke(void *addr, const void *opcode, size_t len) (cross_page_boundary ? 2 : 1) * PAGE_SIZE, PAGE_SHIFT, false); - /* - * If the text does not match what we just wrote then something is - * fundamentally screwy; there's nothing we can really do about that. - */ - BUG_ON(memcmp(addr, opcode, len)); + if (func == text_poke_memcpy) { + /* + * If the text does not match what we just wrote then something is + * fundamentally screwy; there's nothing we can really do about that. + */ + BUG_ON(memcmp(addr, src, len)); + } local_irq_restore(flags); pte_unmap_unlock(ptep, ptl); @@ -1118,7 +1134,7 @@ void *text_poke(void *addr, const void *opcode, size_t len) { lockdep_assert_held(&text_mutex); - return __text_poke(addr, opcode, len); + return __text_poke(text_poke_memcpy, addr, opcode, len); } /** @@ -1137,7 +1153,7 @@ void *text_poke(void *addr, const void *opcode, size_t len) */ void *text_poke_kgdb(void *addr, const void *opcode, size_t len) { - return __text_poke(addr, opcode, len); + return __text_poke(text_poke_memcpy, addr, opcode, len); } /** @@ -1167,7 +1183,38 @@ void *text_poke_copy(void *addr, const void *opcode, size_t len) s = min_t(size_t, PAGE_SIZE * 2 - offset_in_page(ptr), len - patched); - __text_poke((void *)ptr, opcode + patched, s); + __text_poke(text_poke_memcpy, (void *)ptr, opcode + patched, s); + patched += s; + } + mutex_unlock(&text_mutex); + return addr; +} + +/** + * text_poke_set - memset into (an unused part of) RX memory + * @addr: address to modify + * @c: the byte to fill the area with + * @len: length to fill, could be more than 2x PAGE_SIZE + * + * This is useful to overwrite unused regions of RX memory with illegal + * instructions. 
+ */ +void *text_poke_set(void *addr, int c, size_t len) +{ + unsigned long start = (unsigned long)addr; + size_t patched = 0; + + if (WARN_ON_ONCE(core_kernel_text(start))) + return NULL; + + mutex_lock(&text_mutex); + while (patched < len) { + unsigned long ptr = start + patched; + size_t s; + + s = min_t(size_t, PAGE_SIZE * 2 - offset_in_page(ptr), len - patched); + + __text_poke(text_poke_memset, (void *)ptr, (void *)&c, s); patched += s; } mutex_unlock(&text_mutex); From patchwork Thu May 19 20:20:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Song Liu X-Patchwork-Id: 12856007 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 527B7C4321E for ; Thu, 19 May 2022 20:23:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244713AbiESUXd convert rfc822-to-8bit (ORCPT ); Thu, 19 May 2022 16:23:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59728 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244703AbiESUXc (ORCPT ); Thu, 19 May 2022 16:23:32 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 49C82AE267 for ; Thu, 19 May 2022 13:23:30 -0700 (PDT) Received: from pps.filterd (m0109331.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 24JFGCPM024392 for ; Thu, 19 May 2022 13:23:29 -0700 Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3g59tbpvp1-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Thu, 19 May 2022 13:23:29 -0700 Received: from 
twshared0725.22.frc3.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Thu, 19 May 2022 13:23:28 -0700 Received: by devbig932.frc1.facebook.com (Postfix, from userid 4523) id C3F337D58F64; Thu, 19 May 2022 13:20:49 -0700 (PDT) From: Song Liu To: , , CC: , , , , , , , Song Liu Subject: [PATCH v2 bpf-next 3/8] bpf: introduce bpf_arch_text_invalidate for bpf_prog_pack Date: Thu, 19 May 2022 13:20:32 -0700 Message-ID: <20220519202037.2401584-4-song@kernel.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220519202037.2401584-1-song@kernel.org> References: <20220519202037.2401584-1-song@kernel.org> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: _-Q9hjuB7cNLtWnrRqSb8ZoyQpDi_ONQ X-Proofpoint-GUID: _-Q9hjuB7cNLtWnrRqSb8ZoyQpDi_ONQ X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.874,Hydra:6.0.486,FMLib:17.11.64.514 definitions=2022-05-19_06,2022-05-19_03,2022-02-23_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net Introduce bpf_arch_text_invalidate and use it to fill the unused part of the bpf_prog_pack with illegal instructions when a BPF program is freed. 
Signed-off-by: Song Liu --- arch/x86/net/bpf_jit_comp.c | 5 +++++ include/linux/bpf.h | 1 + kernel/bpf/core.c | 8 ++++++++ 3 files changed, 14 insertions(+) diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index a2b6d197c226..f298b18a9a3d 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -228,6 +228,11 @@ static void jit_fill_hole(void *area, unsigned int size) memset(area, 0xcc, size); } +int bpf_arch_text_invalidate(void *dst, size_t len) +{ + return IS_ERR_OR_NULL(text_poke_set(dst, 0xcc, len)); +} + struct jit_context { int cleanup_addr; /* Epilogue code offset */ diff --git a/include/linux/bpf.h b/include/linux/bpf.h index c107392b0ba7..f6dfa416f892 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2364,6 +2364,7 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t, void *addr1, void *addr2); void *bpf_arch_text_copy(void *dst, void *src, size_t len); +int bpf_arch_text_invalidate(void *dst, size_t len); struct btf_id_set; bool btf_id_set_contains(const struct btf_id_set *set, u32 id); diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 2d0c9d4696ad..cacd8684c3c4 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -968,6 +968,9 @@ static void bpf_prog_pack_free(struct bpf_binary_header *hdr) nbits = BPF_PROG_SIZE_TO_NBITS(hdr->size); pos = ((unsigned long)hdr - (unsigned long)pack_ptr) >> BPF_PROG_CHUNK_SHIFT; + WARN_ONCE(bpf_arch_text_invalidate(hdr, hdr->size), + "bpf_prog_pack bug: missing bpf_arch_text_invalidate?\n"); + bitmap_clear(pack->bitmap, pos, nbits); if (bitmap_find_next_zero_area(pack->bitmap, bpf_prog_chunk_count(), 0, bpf_prog_chunk_count(), 0) == 0) { @@ -2740,6 +2743,11 @@ void * __weak bpf_arch_text_copy(void *dst, void *src, size_t len) return ERR_PTR(-ENOTSUPP); } +int __weak bpf_arch_text_invalidate(void *dst, size_t len) +{ + return -ENOTSUPP; +} + DEFINE_STATIC_KEY_FALSE(bpf_stats_enabled_key); EXPORT_SYMBOL(bpf_stats_enabled_key); From patchwork Thu May 
19 20:20:34 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Song Liu X-Patchwork-Id: 12856012 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B080FC433F5 for ; Thu, 19 May 2022 20:26:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244728AbiESU0e convert rfc822-to-8bit (ORCPT ); Thu, 19 May 2022 16:26:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36212 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244730AbiESU0b (ORCPT ); Thu, 19 May 2022 16:26:31 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AC7CF38BC4 for ; Thu, 19 May 2022 13:26:29 -0700 (PDT) Received: from pps.filterd (m0109332.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 24JKPFKG027317 for ; Thu, 19 May 2022 13:26:28 -0700 Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3g5b9vpb94-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Thu, 19 May 2022 13:26:28 -0700 Received: from twshared29473.14.frc2.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Thu, 19 May 2022 13:26:28 -0700 Received: by devbig932.frc1.facebook.com (Postfix, from userid 4523) id DE6BF7D58F6A; Thu, 19 May 2022 13:20:54 -0700 (PDT) From: Song Liu To: , , CC: , , , , , , , Song Liu Subject: [PATCH v2 bpf-next 5/8] bpf: use module_alloc_huge for bpf_prog_pack Date: Thu, 19 May 2022 
13:20:34 -0700 Message-ID: <20220519202037.2401584-6-song@kernel.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220519202037.2401584-1-song@kernel.org> References: <20220519202037.2401584-1-song@kernel.org> X-FB-Internal: Safe X-Proofpoint-GUID: TQG56qfWdHdsBPXNoySqwc__SKZ0yP_o X-Proofpoint-ORIG-GUID: TQG56qfWdHdsBPXNoySqwc__SKZ0yP_o X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.874,Hydra:6.0.486,FMLib:17.11.64.514 definitions=2022-05-19_06,2022-05-19_03,2022-02-23_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net Use module_alloc_huge for bpf_prog_pack so that BPF programs sit on PMD_SIZE pages. This benefits system performance by reducing the iTLB miss rate. A benchmark of a real web service workload shows this change gives another ~0.2% performance boost on top of PAGE_SIZE bpf_prog_pack (which improves system throughput by ~0.5%). Also, remove set_vm_flush_reset_perms() from alloc_new_pack() and use set_memory_[nx|rw] in bpf_prog_pack_free(). This is because VM_FLUSH_RESET_PERMS does not work with huge pages yet. [1] [1] https://lore.kernel.org/bpf/aeeeaf0b7ec63fdba55d4834d2f524d8bf05b71b.camel@intel.com/ Suggested-by: Rick Edgecombe Signed-off-by: Song Liu --- kernel/bpf/core.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index cacd8684c3c4..b64d91fcb0ba 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -857,7 +857,7 @@ static size_t select_bpf_prog_pack_size(void) void *ptr; size = BPF_HPAGE_SIZE * num_online_nodes(); - ptr = module_alloc(size); + ptr = module_alloc_huge(size); /* Test whether we can get huge pages. If not just use PAGE_SIZE * packs. 
@@ -881,7 +881,7 @@ static struct bpf_prog_pack *alloc_new_pack(bpf_jit_fill_hole_t bpf_fill_ill_ins GFP_KERNEL); if (!pack) return NULL; - pack->ptr = module_alloc(bpf_prog_pack_size); + pack->ptr = module_alloc_huge(bpf_prog_pack_size); if (!pack->ptr) { kfree(pack); return NULL; @@ -890,7 +890,6 @@ static struct bpf_prog_pack *alloc_new_pack(bpf_jit_fill_hole_t bpf_fill_ill_ins bitmap_zero(pack->bitmap, bpf_prog_pack_size / BPF_PROG_CHUNK_SIZE); list_add_tail(&pack->list, &pack_list); - set_vm_flush_reset_perms(pack->ptr); set_memory_ro((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE); set_memory_x((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE); return pack; @@ -909,10 +908,9 @@ static void *bpf_prog_pack_alloc(u32 size, bpf_jit_fill_hole_t bpf_fill_ill_insn if (size > bpf_prog_pack_size) { size = round_up(size, PAGE_SIZE); - ptr = module_alloc(size); + ptr = module_alloc_huge(size); if (ptr) { bpf_fill_ill_insns(ptr, size); - set_vm_flush_reset_perms(ptr); set_memory_ro((unsigned long)ptr, size / PAGE_SIZE); set_memory_x((unsigned long)ptr, size / PAGE_SIZE); } @@ -949,6 +947,8 @@ static void bpf_prog_pack_free(struct bpf_binary_header *hdr) mutex_lock(&pack_mutex); if (hdr->size > bpf_prog_pack_size) { + set_memory_nx((unsigned long)hdr, hdr->size / PAGE_SIZE); + set_memory_rw((unsigned long)hdr, hdr->size / PAGE_SIZE); module_memfree(hdr); goto out; } @@ -975,6 +975,8 @@ static void bpf_prog_pack_free(struct bpf_binary_header *hdr) if (bitmap_find_next_zero_area(pack->bitmap, bpf_prog_chunk_count(), 0, bpf_prog_chunk_count(), 0) == 0) { list_del(&pack->list); + set_memory_nx((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE); + set_memory_rw((unsigned long)pack->ptr, bpf_prog_pack_size / PAGE_SIZE); module_memfree(pack->ptr); kfree(pack); } From patchwork Thu May 19 20:20:35 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Song Liu X-Patchwork-Id: 12856033 
X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AF7A2C433EF for ; Thu, 19 May 2022 20:32:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229503AbiESUch convert rfc822-to-8bit (ORCPT ); Thu, 19 May 2022 16:32:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50324 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244742AbiESUca (ORCPT ); Thu, 19 May 2022 16:32:30 -0400 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5EB916B641 for ; Thu, 19 May 2022 13:32:30 -0700 (PDT) Received: from pps.filterd (m0001303.ppops.net [127.0.0.1]) by m0001303.ppops.net (8.17.1.5/8.17.1.5) with ESMTP id 24JKPI9m004676 for ; Thu, 19 May 2022 13:32:29 -0700 Received: from mail.thefacebook.com ([163.114.132.120]) by m0001303.ppops.net (PPS) with ESMTPS id 3g4myhxsam-4 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Thu, 19 May 2022 13:32:29 -0700 Received: from twshared24024.25.frc3.facebook.com (2620:10d:c085:108::4) by mail.thefacebook.com (2620:10d:c085:21d::5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Thu, 19 May 2022 13:32:27 -0700 Received: by devbig932.frc1.facebook.com (Postfix, from userid 4523) id E895B7D58F6B; Thu, 19 May 2022 13:20:57 -0700 (PDT) From: Song Liu To: , , CC: , , , , , , , Song Liu Subject: [PATCH v2 bpf-next 6/8] vmalloc: WARN for set_vm_flush_reset_perms() on huge pages Date: Thu, 19 May 2022 13:20:35 -0700 Message-ID: <20220519202037.2401584-7-song@kernel.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220519202037.2401584-1-song@kernel.org> References: 
<20220519202037.2401584-1-song@kernel.org> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: JBNTkgkJFgZJpkJtKZmML2meS8z045oU X-Proofpoint-GUID: JBNTkgkJFgZJpkJtKZmML2meS8z045oU X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.874,Hydra:6.0.486,FMLib:17.11.64.514 definitions=2022-05-19_06,2022-05-19_03,2022-02-23_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net VM_FLUSH_RESET_PERMS is not yet ready for huge pages, add a WARN to catch misuse soon. Suggested-by: Rick Edgecombe Signed-off-by: Song Liu --- include/linux/vmalloc.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index b159c2789961..5e0d0a60d9d5 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -238,6 +238,7 @@ static inline void set_vm_flush_reset_perms(void *addr) { struct vm_struct *vm = find_vm_area(addr); + WARN_ON_ONCE(is_vm_area_hugepages(addr)); if (vm) vm->flags |= VM_FLUSH_RESET_PERMS; } From patchwork Thu May 19 20:20:36 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Song Liu X-Patchwork-Id: 12856032 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D98D4C433FE for ; Thu, 19 May 2022 20:32:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239880AbiESUcZ convert rfc822-to-8bit (ORCPT ); Thu, 19 May 2022 16:32:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49808 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229503AbiESUcZ (ORCPT ); Thu, 19 May 2022 16:32:25 -0400 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com 
[67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0526E6B641 for ; Thu, 19 May 2022 13:32:23 -0700 (PDT) Received: from pps.filterd (m0089730.ppops.net [127.0.0.1]) by m0089730.ppops.net (8.17.1.5/8.17.1.5) with ESMTP id 24JKPmmn012733 for ; Thu, 19 May 2022 13:32:23 -0700 Received: from mail.thefacebook.com ([163.114.132.120]) by m0089730.ppops.net (PPS) with ESMTPS id 3g5h5d4yjh-5 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Thu, 19 May 2022 13:32:23 -0700 Received: from twshared19572.14.frc2.facebook.com (2620:10d:c085:108::8) by mail.thefacebook.com (2620:10d:c085:21d::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Thu, 19 May 2022 13:32:21 -0700 Received: by devbig932.frc1.facebook.com (Postfix, from userid 4523) id 2D80D7D58F79; Thu, 19 May 2022 13:21:00 -0700 (PDT) From: Song Liu To: , , CC: , , , , , , , Song Liu Subject: [PATCH v2 bpf-next 7/8] vmalloc: introduce huge_vmalloc_supported Date: Thu, 19 May 2022 13:20:36 -0700 Message-ID: <20220519202037.2401584-8-song@kernel.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220519202037.2401584-1-song@kernel.org> References: <20220519202037.2401584-1-song@kernel.org> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: X10zIGV79ziPUPsmHK014lXb9UmKjHpc X-Proofpoint-ORIG-GUID: X10zIGV79ziPUPsmHK014lXb9UmKjHpc X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.874,Hydra:6.0.486,FMLib:17.11.64.514 definitions=2022-05-19_06,2022-05-19_03,2022-02-23_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net huge_vmalloc_supported() exposes vmap_allow_huge so that users of vmalloc APIs could know whether vmalloc will return huge pages. 
Suggested-by: Rick Edgecombe Signed-off-by: Song Liu Reported-by: kernel test robot --- include/linux/vmalloc.h | 2 ++ mm/vmalloc.c | 5 +++++ 2 files changed, 7 insertions(+) diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 5e0d0a60d9d5..3268e7e875ff 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -242,11 +242,13 @@ static inline void set_vm_flush_reset_perms(void *addr) if (vm) vm->flags |= VM_FLUSH_RESET_PERMS; } +bool huge_vmalloc_supported(void); #else static inline void set_vm_flush_reset_perms(void *addr) { } +static inline bool huge_vmalloc_supported(void) { return false; } #endif /* for /proc/kcore */ diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 07da85ae825b..d3b11317b025 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -72,6 +72,11 @@ early_param("nohugevmalloc", set_nohugevmalloc); static const bool vmap_allow_huge = false; #endif /* CONFIG_HAVE_ARCH_HUGE_VMALLOC */ +bool huge_vmalloc_supported(void) +{ + return vmap_allow_huge; +} + bool is_vmalloc_addr(const void *x) { unsigned long addr = (unsigned long)kasan_reset_tag(x); From patchwork Thu May 19 20:20:37 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Song Liu X-Patchwork-Id: 12856011 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7AAB2C433EF for ; Thu, 19 May 2022 20:26:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244747AbiESU0c convert rfc822-to-8bit (ORCPT ); Thu, 19 May 2022 16:26:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36074 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244735AbiESU03 (ORCPT ); Thu, 19 May 2022 16:26:29 -0400 Received: from 
mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2A1ED2D1F4 for ; Thu, 19 May 2022 13:26:29 -0700 (PDT) Received: from pps.filterd (m0109333.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 24JKPGh6029522 for ; Thu, 19 May 2022 13:26:29 -0700 Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3g4d82ah2d-4 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Thu, 19 May 2022 13:26:28 -0700 Received: from twshared0725.22.frc3.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Thu, 19 May 2022 13:26:22 -0700 Received: by devbig932.frc1.facebook.com (Postfix, from userid 4523) id 78BE17D58F7C; Thu, 19 May 2022 13:21:02 -0700 (PDT) From: Song Liu To: , , CC: , , , , , , , Song Liu Subject: [PATCH v2 bpf-next 8/8] bpf: simplify select_bpf_prog_pack_size Date: Thu, 19 May 2022 13:20:37 -0700 Message-ID: <20220519202037.2401584-9-song@kernel.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220519202037.2401584-1-song@kernel.org> References: <20220519202037.2401584-1-song@kernel.org> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: gBnVVyMwyRT0WTu6pRA80UGgxLoXeV_L X-Proofpoint-ORIG-GUID: gBnVVyMwyRT0WTu6pRA80UGgxLoXeV_L X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.874,Hydra:6.0.486,FMLib:17.11.64.514 definitions=2022-05-19_06,2022-05-19_03,2022-02-23_01 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net Use huge_vmalloc_supported to simplify select_bpf_prog_pack_size, so that we don't allocate some huge pages and free them immediately. 
Suggested-by: Rick Edgecombe Signed-off-by: Song Liu Reported-by: kernel test robot Reported-by: kernel test robot --- kernel/bpf/core.c | 14 ++++---------- 1 file changed, 4 insertions(+), 10 deletions(-) diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index b64d91fcb0ba..62c8632a59a2 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -856,20 +856,14 @@ static size_t select_bpf_prog_pack_size(void) size_t size; void *ptr; - size = BPF_HPAGE_SIZE * num_online_nodes(); - ptr = module_alloc_huge(size); - - /* Test whether we can get huge pages. If not just use PAGE_SIZE - * packs. - */ - if (!ptr || !is_vm_area_hugepages(ptr)) { + if (huge_vmalloc_supported()) { + size = BPF_HPAGE_SIZE * num_online_nodes(); + bpf_prog_pack_mask = BPF_HPAGE_MASK; + } else { size = PAGE_SIZE; bpf_prog_pack_mask = PAGE_MASK; - } else { - bpf_prog_pack_mask = BPF_HPAGE_MASK; } - vfree(ptr); return size; }