From patchwork Mon Mar 21 15:28:48 2022
X-Patchwork-Submitter: Xu Kuohai
X-Patchwork-Id: 12787338
From: Xu Kuohai
Cc: Catalin Marinas, Will Deacon, Daniel Borkmann, Alexei Starovoitov,
 Zi Shen Lim, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
 John Fastabend, KP Singh, Julien Thierry, Mark Rutland, Hou Tao,
 Fuad Tabba, James Morse
Subject: [PATCH bpf-next v5 1/5] arm64: insn: add ldr/str with immediate offset
Date: Mon, 21 Mar 2022 11:28:48 -0400
Message-ID: <20220321152852.2334294-2-xukuohai@huawei.com>
In-Reply-To: <20220321152852.2334294-1-xukuohai@huawei.com>
X-Mailing-List: bpf@vger.kernel.org

This patch introduces ldr/str with immediate offset support to simplify
the JIT implementation of BPF LDX/STX instructions on arm64. Although
arm64 ldr/str immediate is available in pre-index, post-index and
unsigned offset forms, the unsigned offset form is sufficient for BPF,
so this patch only adds this type.
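For illustration only (not part of the patch below): a minimal standalone
sketch of the "unsigned offset" encoding this series generates. The
variable names and example operands are made up; the base opcodes are the
ldr_imm/str_imm values added to insn.h in this patch.

	/* Build "ldr x1, [x0, #16]" by hand.
	 * 0x39400000 is LDR (immediate, unsigned offset);
	 * 0x39000000 would be the STR counterpart.
	 */
	u32 rt = 1, rn = 0, byte_off = 16;
	u32 scale = 3;                 /* 0=byte, 1=half, 2=word, 3=doubleword */
	u32 imm12 = byte_off >> scale; /* byte offset must be size-aligned */
	u32 insn = 0x39400000;
	insn |= scale << 30;           /* size field, bits [31:30] */
	insn |= (imm12 & 0xfff) << 10; /* imm12 field, bits [21:10] */
	insn |= (rn & 0x1f) << 5;      /* base register Rn */
	insn |= rt & 0x1f;             /* target register Rt */
	/* insn == 0xf9400801, i.e. "ldr x1, [x0, #16]" */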
Signed-off-by: Xu Kuohai --- arch/arm64/include/asm/insn.h | 9 +++++ arch/arm64/lib/insn.c | 67 +++++++++++++++++++++++++++-------- 2 files changed, 62 insertions(+), 14 deletions(-) diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h index 0b6b31307e68..d507acfdf02d 100644 --- a/arch/arm64/include/asm/insn.h +++ b/arch/arm64/include/asm/insn.h @@ -200,6 +200,8 @@ enum aarch64_insn_size_type { enum aarch64_insn_ldst_type { AARCH64_INSN_LDST_LOAD_REG_OFFSET, AARCH64_INSN_LDST_STORE_REG_OFFSET, + AARCH64_INSN_LDST_LOAD_IMM_OFFSET, + AARCH64_INSN_LDST_STORE_IMM_OFFSET, AARCH64_INSN_LDST_LOAD_PAIR_PRE_INDEX, AARCH64_INSN_LDST_STORE_PAIR_PRE_INDEX, AARCH64_INSN_LDST_LOAD_PAIR_POST_INDEX, @@ -334,6 +336,7 @@ __AARCH64_INSN_FUNCS(load_pre, 0x3FE00C00, 0x38400C00) __AARCH64_INSN_FUNCS(store_post, 0x3FE00C00, 0x38000400) __AARCH64_INSN_FUNCS(load_post, 0x3FE00C00, 0x38400400) __AARCH64_INSN_FUNCS(str_reg, 0x3FE0EC00, 0x38206800) +__AARCH64_INSN_FUNCS(str_imm, 0x3FC00000, 0x39000000) __AARCH64_INSN_FUNCS(ldadd, 0x3F20FC00, 0x38200000) __AARCH64_INSN_FUNCS(ldclr, 0x3F20FC00, 0x38201000) __AARCH64_INSN_FUNCS(ldeor, 0x3F20FC00, 0x38202000) @@ -341,6 +344,7 @@ __AARCH64_INSN_FUNCS(ldset, 0x3F20FC00, 0x38203000) __AARCH64_INSN_FUNCS(swp, 0x3F20FC00, 0x38208000) __AARCH64_INSN_FUNCS(cas, 0x3FA07C00, 0x08A07C00) __AARCH64_INSN_FUNCS(ldr_reg, 0x3FE0EC00, 0x38606800) +__AARCH64_INSN_FUNCS(ldr_imm, 0x3FC00000, 0x39400000) __AARCH64_INSN_FUNCS(ldr_lit, 0xBF000000, 0x18000000) __AARCH64_INSN_FUNCS(ldrsw_lit, 0xFF000000, 0x98000000) __AARCH64_INSN_FUNCS(exclusive, 0x3F800000, 0x08000000) @@ -500,6 +504,11 @@ u32 aarch64_insn_gen_load_store_reg(enum aarch64_insn_register reg, enum aarch64_insn_register offset, enum aarch64_insn_size_type size, enum aarch64_insn_ldst_type type); +u32 aarch64_insn_gen_load_store_imm(enum aarch64_insn_register reg, + enum aarch64_insn_register base, + unsigned int imm, + enum aarch64_insn_size_type size, + enum aarch64_insn_ldst_type type); u32 aarch64_insn_gen_load_store_pair(enum aarch64_insn_register reg1, enum aarch64_insn_register reg2, enum aarch64_insn_register base, diff --git a/arch/arm64/lib/insn.c b/arch/arm64/lib/insn.c index 5e90887deec4..695d7368fadc 100644 --- a/arch/arm64/lib/insn.c +++ b/arch/arm64/lib/insn.c @@ -299,29 +299,24 @@ static u32 aarch64_insn_encode_register(enum aarch64_insn_register_type type, return insn; } +static const u32 aarch64_insn_ldst_size[] = { + [AARCH64_INSN_SIZE_8] = 0, + [AARCH64_INSN_SIZE_16] = 1, + [AARCH64_INSN_SIZE_32] = 2, + [AARCH64_INSN_SIZE_64] = 3, +}; + static u32 aarch64_insn_encode_ldst_size(enum aarch64_insn_size_type type, u32 insn) { u32 size; - switch (type) { - case AARCH64_INSN_SIZE_8: - size = 0; - break; - case AARCH64_INSN_SIZE_16: - size = 1; - break; - case AARCH64_INSN_SIZE_32: - size = 2; - break; - case AARCH64_INSN_SIZE_64: - size = 3; - break; - default: + if (type < AARCH64_INSN_SIZE_8 || type > AARCH64_INSN_SIZE_64) { pr_err("%s: unknown size encoding %d\n", __func__, type); return AARCH64_BREAK_FAULT; } + size = aarch64_insn_ldst_size[type]; insn &= ~GENMASK(31, 30); insn |= size << 30; @@ -504,6 +499,50 @@ u32 aarch64_insn_gen_load_store_reg(enum aarch64_insn_register reg, offset); } +u32 aarch64_insn_gen_load_store_imm(enum aarch64_insn_register reg, + enum aarch64_insn_register base, + unsigned int imm, + enum aarch64_insn_size_type size, + enum aarch64_insn_ldst_type type) +{ + u32 insn; + u32 shift; + + if (size < AARCH64_INSN_SIZE_8 || size > AARCH64_INSN_SIZE_64) { + pr_err("%s: 
unknown size encoding %d\n", __func__, type);
+		return AARCH64_BREAK_FAULT;
+	}
+
+	shift = aarch64_insn_ldst_size[size];
+	if (imm & ~(BIT(12 + shift) - BIT(shift))) {
+		pr_err("%s: invalid imm: %d\n", __func__, imm);
+		return AARCH64_BREAK_FAULT;
+	}
+
+	imm >>= shift;
+
+	switch (type) {
+	case AARCH64_INSN_LDST_LOAD_IMM_OFFSET:
+		insn = aarch64_insn_get_ldr_imm_value();
+		break;
+	case AARCH64_INSN_LDST_STORE_IMM_OFFSET:
+		insn = aarch64_insn_get_str_imm_value();
+		break;
+	default:
+		pr_err("%s: unknown load/store encoding %d\n", __func__, type);
+		return AARCH64_BREAK_FAULT;
+	}
+
+	insn = aarch64_insn_encode_ldst_size(size, insn);
+
+	insn = aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RT, insn, reg);
+
+	insn = aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RN, insn,
+					    base);
+
+	return aarch64_insn_encode_immediate(AARCH64_INSN_IMM_12, insn, imm);
+}
+
 u32 aarch64_insn_gen_load_store_pair(enum aarch64_insn_register reg1,
				      enum aarch64_insn_register reg2,
				      enum aarch64_insn_register base,

From patchwork Mon Mar 21 15:28:49 2022
X-Patchwork-Submitter: Xu Kuohai
X-Patchwork-Id: 12787339
From: Xu Kuohai
Subject: [PATCH bpf-next v5 2/5] bpf, arm64: Optimize BPF store/load using arm64 str/ldr(immediate offset)
Date: Mon, 21 Mar 2022 11:28:49 -0400
Message-ID: <20220321152852.2334294-3-xukuohai@huawei.com>
In-Reply-To: <20220321152852.2334294-1-xukuohai@huawei.com>

The current BPF store/load instruction is translated by the JIT into two
instructions.
The first instruction moves the immediate offset into a temporary
register. The second instruction uses this temporary register to do the
real store/load. In fact, arm64 supports addressing with immediate
offsets, so this patch introduces an optimization that uses the arm64
str/ldr instructions with immediate offset whenever the offset fits.

Example of generated instructions for r2 = *(u64 *)(r1 + 0):

without optimization:
mov x10, 0
ldr x1, [x0, x10]

with optimization:
ldr x1, [x0, 0]

If the offset is negative, not correctly aligned, or exceeds the maximum
value, fall back to using the temporary register.

Signed-off-by: Xu Kuohai
---
 arch/arm64/net/bpf_jit.h      |  14 ++++
 arch/arm64/net/bpf_jit_comp.c | 128 ++++++++++++++++++++++++++++++----
 2 files changed, 127 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/net/bpf_jit.h b/arch/arm64/net/bpf_jit.h
index dd59b5ad8fe4..3920213244f0 100644
--- a/arch/arm64/net/bpf_jit.h
+++ b/arch/arm64/net/bpf_jit.h
@@ -66,6 +66,20 @@
 #define A64_STR64(Xt, Xn, Xm) A64_LS_REG(Xt, Xn, Xm, 64, STORE)
 #define A64_LDR64(Xt, Xn, Xm) A64_LS_REG(Xt, Xn, Xm, 64, LOAD)
 
+/* Load/store register (immediate offset) */
+#define A64_LS_IMM(Rt, Rn, imm, size, type) \
+	aarch64_insn_gen_load_store_imm(Rt, Rn, imm, \
+					AARCH64_INSN_SIZE_##size, \
+					AARCH64_INSN_LDST_##type##_IMM_OFFSET)
+#define A64_STRBI(Wt, Xn, imm)  A64_LS_IMM(Wt, Xn, imm, 8, STORE)
+#define A64_LDRBI(Wt, Xn, imm)  A64_LS_IMM(Wt, Xn, imm, 8, LOAD)
+#define A64_STRHI(Wt, Xn, imm)  A64_LS_IMM(Wt, Xn, imm, 16, STORE)
+#define A64_LDRHI(Wt, Xn, imm)  A64_LS_IMM(Wt, Xn, imm, 16, LOAD)
+#define A64_STR32I(Wt, Xn, imm) A64_LS_IMM(Wt, Xn, imm, 32, STORE)
+#define A64_LDR32I(Wt, Xn, imm) A64_LS_IMM(Wt, Xn, imm, 32, LOAD)
+#define A64_STR64I(Xt, Xn, imm) A64_LS_IMM(Xt, Xn, imm, 64, STORE)
+#define A64_LDR64I(Xt, Xn, imm) A64_LS_IMM(Xt, Xn, imm, 64, LOAD)
+
 /* Load/store register pair */
 #define A64_LS_PAIR(Rt, Rt2, Rn, offset, ls, type) \
	aarch64_insn_gen_load_store_pair(Rt, Rt2, Rn, offset, \
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index e850c69e128c..06037607c04a 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -191,6 +191,47 @@ static bool is_addsub_imm(u32 imm)
 	return !(imm & ~0xfff) || !(imm & ~0xfff000);
 }
 
+/*
+ * There are 3 types of AArch64 LDR/STR (immediate) instruction:
+ * Post-index, Pre-index, Unsigned offset.
+ *
+ * For BPF ldr/str, the "unsigned offset" type is sufficient.
+ * + * "Unsigned offset" type LDR(immediate) format: + * + * 3 2 1 0 + * 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * |x x|1 1 1 0 0 1 0 1| imm12 | Rn | Rt | + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * scale + * + * "Unsigned offset" type STR(immediate) format: + * 3 2 1 0 + * 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * |x x|1 1 1 0 0 1 0 0| imm12 | Rn | Rt | + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * scale + * + * The offset is calculated from imm12 and scale in the following way: + * + * offset = (u64)imm12 << scale + */ +static bool is_lsi_offset(s16 offset, int scale) +{ + if (offset < 0) + return false; + + if (offset > (0xFFF << scale)) + return false; + + if (offset & ((1 << scale) - 1)) + return false; + + return true; +} + /* Tail call offset to jump into */ #if IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) #define PROLOGUE_OFFSET 8 @@ -971,19 +1012,38 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx, case BPF_LDX | BPF_PROBE_MEM | BPF_W: case BPF_LDX | BPF_PROBE_MEM | BPF_H: case BPF_LDX | BPF_PROBE_MEM | BPF_B: - emit_a64_mov_i(1, tmp, off, ctx); switch (BPF_SIZE(code)) { case BPF_W: - emit(A64_LDR32(dst, src, tmp), ctx); + if (is_lsi_offset(off, 2)) { + emit(A64_LDR32I(dst, src, off), ctx); + } else { + emit_a64_mov_i(1, tmp, off, ctx); + emit(A64_LDR32(dst, src, tmp), ctx); + } break; case BPF_H: - emit(A64_LDRH(dst, src, tmp), ctx); + if (is_lsi_offset(off, 1)) { + emit(A64_LDRHI(dst, src, off), ctx); + } else { + emit_a64_mov_i(1, tmp, off, ctx); + emit(A64_LDRH(dst, src, tmp), ctx); + } break; case BPF_B: - emit(A64_LDRB(dst, src, tmp), ctx); + if (is_lsi_offset(off, 0)) { + emit(A64_LDRBI(dst, src, off), ctx); + } else { + emit_a64_mov_i(1, tmp, off, ctx); + emit(A64_LDRB(dst, src, tmp), ctx); + } break; case BPF_DW: - emit(A64_LDR64(dst, src, tmp), ctx); + if (is_lsi_offset(off, 3)) { + emit(A64_LDR64I(dst, src, off), ctx); + } else { + emit_a64_mov_i(1, tmp, off, ctx); + emit(A64_LDR64(dst, src, tmp), ctx); + } break; } @@ -1011,20 +1071,39 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx, case BPF_ST | BPF_MEM | BPF_B: case BPF_ST | BPF_MEM | BPF_DW: /* Load imm to a register then store it */ - emit_a64_mov_i(1, tmp2, off, ctx); emit_a64_mov_i(1, tmp, imm, ctx); switch (BPF_SIZE(code)) { case BPF_W: - emit(A64_STR32(tmp, dst, tmp2), ctx); + if (is_lsi_offset(off, 2)) { + emit(A64_STR32I(tmp, dst, off), ctx); + } else { + emit_a64_mov_i(1, tmp2, off, ctx); + emit(A64_STR32(tmp, dst, tmp2), ctx); + } break; case BPF_H: - emit(A64_STRH(tmp, dst, tmp2), ctx); + if (is_lsi_offset(off, 1)) { + emit(A64_STRHI(tmp, dst, off), ctx); + } else { + emit_a64_mov_i(1, tmp2, off, ctx); + emit(A64_STRH(tmp, dst, tmp2), ctx); + } break; case BPF_B: - emit(A64_STRB(tmp, dst, tmp2), ctx); + if (is_lsi_offset(off, 0)) { + emit(A64_STRBI(tmp, dst, off), ctx); + } else { + emit_a64_mov_i(1, tmp2, off, ctx); + emit(A64_STRB(tmp, dst, tmp2), ctx); + } break; case BPF_DW: - emit(A64_STR64(tmp, dst, tmp2), ctx); + if (is_lsi_offset(off, 3)) { + emit(A64_STR64I(tmp, dst, off), ctx); + } else { + emit_a64_mov_i(1, tmp2, off, ctx); + emit(A64_STR64(tmp, dst, tmp2), ctx); + } break; } break; @@ -1034,19 +1113,38 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx, case BPF_STX | BPF_MEM | 
BPF_H:
 	case BPF_STX | BPF_MEM | BPF_B:
 	case BPF_STX | BPF_MEM | BPF_DW:
-		emit_a64_mov_i(1, tmp, off, ctx);
 		switch (BPF_SIZE(code)) {
 		case BPF_W:
-			emit(A64_STR32(src, dst, tmp), ctx);
+			if (is_lsi_offset(off, 2)) {
+				emit(A64_STR32I(src, dst, off), ctx);
+			} else {
+				emit_a64_mov_i(1, tmp, off, ctx);
+				emit(A64_STR32(src, dst, tmp), ctx);
+			}
 			break;
 		case BPF_H:
-			emit(A64_STRH(src, dst, tmp), ctx);
+			if (is_lsi_offset(off, 1)) {
+				emit(A64_STRHI(src, dst, off), ctx);
+			} else {
+				emit_a64_mov_i(1, tmp, off, ctx);
+				emit(A64_STRH(src, dst, tmp), ctx);
+			}
 			break;
 		case BPF_B:
-			emit(A64_STRB(src, dst, tmp), ctx);
+			if (is_lsi_offset(off, 0)) {
+				emit(A64_STRBI(src, dst, off), ctx);
+			} else {
+				emit_a64_mov_i(1, tmp, off, ctx);
+				emit(A64_STRB(src, dst, tmp), ctx);
+			}
 			break;
 		case BPF_DW:
-			emit(A64_STR64(src, dst, tmp), ctx);
+			if (is_lsi_offset(off, 3)) {
+				emit(A64_STR64I(src, dst, off), ctx);
+			} else {
+				emit_a64_mov_i(1, tmp, off, ctx);
+				emit(A64_STR64(src, dst, tmp), ctx);
+			}
 			break;
 		}
 		break;

From patchwork Mon Mar 21 15:28:50 2022
X-Patchwork-Submitter: Xu Kuohai
X-Patchwork-Id: 12787341
From: Xu Kuohai
Subject: [PATCH bpf-next v5 3/5] bpf, arm64: adjust the offset of str/ldr(immediate) to positive number
Date: Mon, 21 Mar 2022 11:28:50 -0400
Message-ID: <20220321152852.2334294-4-xukuohai@huawei.com>
In-Reply-To: <20220321152852.2334294-1-xukuohai@huawei.com>

The BPF STX/LDX instructions use offsets relative to the FP to address
stack space.
Since BPF_FP is located at the top of the frame, the offset is usually a
negative number. However, the arm64 str/ldr immediate instruction requires
a positive offset, so this patch tries to convert the offsets.

The method is to first find the negative offset furthest from the FP, add
it to the FP to calculate a bottom position, called FPB, and then adjust
the offsets of the other STX/LDX instructions so that they are relative to
FPB. FPB is kept in x27, a callee-saved arm64 register that the JIT does
not use yet.

Before adjusting the offsets, the patch checks every instruction to ensure
that the FP does not change at runtime. If the FP may change, no offset is
adjusted.

For example, for the following bpftrace command:

bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'

Without this patch, jited code (fragment):

   0:   bti     c
   4:   stp     x29, x30, [sp, #-16]!
   8:   mov     x29, sp
   c:   stp     x19, x20, [sp, #-16]!
  10:   stp     x21, x22, [sp, #-16]!
  14:   stp     x25, x26, [sp, #-16]!
  18:   mov     x25, sp
  1c:   mov     x26, #0x0                   // #0
  20:   bti     j
  24:   sub     sp, sp, #0x90
  28:   add     x19, x0, #0x0
  2c:   mov     x0, #0x0                    // #0
  30:   mov     x10, #0xffffffffffffff78    // #-136
  34:   str     x0, [x25, x10]
  38:   mov     x10, #0xffffffffffffff80    // #-128
  3c:   str     x0, [x25, x10]
  40:   mov     x10, #0xffffffffffffff88    // #-120
  44:   str     x0, [x25, x10]
  48:   mov     x10, #0xffffffffffffff90    // #-112
  4c:   str     x0, [x25, x10]
  50:   mov     x10, #0xffffffffffffff98    // #-104
  54:   str     x0, [x25, x10]
  58:   mov     x10, #0xffffffffffffffa0    // #-96
  5c:   str     x0, [x25, x10]
  60:   mov     x10, #0xffffffffffffffa8    // #-88
  64:   str     x0, [x25, x10]
  68:   mov     x10, #0xffffffffffffffb0    // #-80
  6c:   str     x0, [x25, x10]
  70:   mov     x10, #0xffffffffffffffb8    // #-72
  74:   str     x0, [x25, x10]
  78:   mov     x10, #0xffffffffffffffc0    // #-64
  7c:   str     x0, [x25, x10]
  80:   mov     x10, #0xffffffffffffffc8    // #-56
  84:   str     x0, [x25, x10]
  88:   mov     x10, #0xffffffffffffffd0    // #-48
  8c:   str     x0, [x25, x10]
  90:   mov     x10, #0xffffffffffffffd8    // #-40
  94:   str     x0, [x25, x10]
  98:   mov     x10, #0xffffffffffffffe0    // #-32
  9c:   str     x0, [x25, x10]
  a0:   mov     x10, #0xffffffffffffffe8    // #-24
  a4:   str     x0, [x25, x10]
  a8:   mov     x10, #0xfffffffffffffff0    // #-16
  ac:   str     x0, [x25, x10]
  b0:   mov     x10, #0xfffffffffffffff8    // #-8
  b4:   str     x0, [x25, x10]
  b8:   mov     x10, #0x8                   // #8
  bc:   ldr     x2, [x19, x10]
  [...]

With this patch, jited code (fragment):

   0:   bti     c
   4:   stp     x29, x30, [sp, #-16]!
   8:   mov     x29, sp
   c:   stp     x19, x20, [sp, #-16]!
  10:   stp     x21, x22, [sp, #-16]!
  14:   stp     x25, x26, [sp, #-16]!
  18:   stp     x27, x28, [sp, #-16]!
  1c:   mov     x25, sp
  20:   sub     x27, x25, #0x88
  24:   mov     x26, #0x0                   // #0
  28:   bti     j
  2c:   sub     sp, sp, #0x90
  30:   add     x19, x0, #0x0
  34:   mov     x0, #0x0                    // #0
  38:   str     x0, [x27]
  3c:   str     x0, [x27, #8]
  40:   str     x0, [x27, #16]
  44:   str     x0, [x27, #24]
  48:   str     x0, [x27, #32]
  4c:   str     x0, [x27, #40]
  50:   str     x0, [x27, #48]
  54:   str     x0, [x27, #56]
  58:   str     x0, [x27, #64]
  5c:   str     x0, [x27, #72]
  60:   str     x0, [x27, #80]
  64:   str     x0, [x27, #88]
  68:   str     x0, [x27, #96]
  6c:   str     x0, [x27, #104]
  70:   str     x0, [x27, #112]
  74:   str     x0, [x27, #120]
  78:   str     x0, [x27, #128]
  7c:   ldr     x2, [x19, #8]
  [...]
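To make the offset arithmetic above concrete, here is a small sketch of
where the #0x88 and #128 come from (illustrative only; the variable names
are not from the patch):

	/* The most negative FP offset used by the program above is -136,
	 * so the prologue materializes FPB as FP - 136:
	 *     sub x27, x25, #0x88
	 */
	int min_off    = -136;
	int fpb_offset = ALIGN_DOWN(-min_off, 8); /* 136, already 8-byte aligned */
	/* A store at [FP, #-8] is then rewritten as [FPB, #128]: */
	int off_adj    = -8 + fpb_offset;         /* 128 -> "str x0, [x27, #128]" */
	/* 128 is positive, 8-byte aligned and within the scaled 12-bit
	 * range, so is_lsi_offset(off_adj, 3) holds and a single str is
	 * emitted instead of a mov + str pair. */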
Signed-off-by: Xu Kuohai --- arch/arm64/net/bpf_jit_comp.c | 165 ++++++++++++++++++++++++++++------ 1 file changed, 138 insertions(+), 27 deletions(-) diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c index 06037607c04a..dacab86e0cdc 100644 --- a/arch/arm64/net/bpf_jit_comp.c +++ b/arch/arm64/net/bpf_jit_comp.c @@ -26,6 +26,7 @@ #define TMP_REG_2 (MAX_BPF_JIT_REG + 1) #define TCALL_CNT (MAX_BPF_JIT_REG + 2) #define TMP_REG_3 (MAX_BPF_JIT_REG + 3) +#define FP_BOTTOM (MAX_BPF_JIT_REG + 4) #define check_imm(bits, imm) do { \ if ((((imm) > 0) && ((imm) >> (bits))) || \ @@ -63,6 +64,7 @@ static const int bpf2a64[] = { [TCALL_CNT] = A64_R(26), /* temporary register for blinding constants */ [BPF_REG_AX] = A64_R(9), + [FP_BOTTOM] = A64_R(27), }; struct jit_ctx { @@ -73,6 +75,7 @@ struct jit_ctx { int exentry_idx; __le32 *image; u32 stack_size; + int fpb_offset; }; static inline void emit(const u32 insn, struct jit_ctx *ctx) @@ -218,7 +221,7 @@ static bool is_addsub_imm(u32 imm) * * offset = (u64)imm12 << scale */ -static bool is_lsi_offset(s16 offset, int scale) +static bool is_lsi_offset(int offset, int scale) { if (offset < 0) return false; @@ -234,9 +237,9 @@ static bool is_lsi_offset(s16 offset, int scale) /* Tail call offset to jump into */ #if IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) -#define PROLOGUE_OFFSET 8 +#define PROLOGUE_OFFSET 9 #else -#define PROLOGUE_OFFSET 7 +#define PROLOGUE_OFFSET 8 #endif static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf) @@ -248,6 +251,7 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf) const u8 r9 = bpf2a64[BPF_REG_9]; const u8 fp = bpf2a64[BPF_REG_FP]; const u8 tcc = bpf2a64[TCALL_CNT]; + const u8 fpb = bpf2a64[FP_BOTTOM]; const int idx0 = ctx->idx; int cur_offset; @@ -286,6 +290,7 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf) emit(A64_PUSH(r6, r7, A64_SP), ctx); emit(A64_PUSH(r8, r9, A64_SP), ctx); emit(A64_PUSH(fp, tcc, A64_SP), ctx); + emit(A64_PUSH(fpb, A64_R(28), A64_SP), ctx); /* Set up BPF prog stack base register */ emit(A64_MOV(1, fp, A64_SP), ctx); @@ -306,6 +311,8 @@ static int build_prologue(struct jit_ctx *ctx, bool ebpf_from_cbpf) emit(A64_BTI_J, ctx); } + emit(A64_SUB_I(1, fpb, fp, ctx->fpb_offset), ctx); + /* Stack must be multiples of 16B */ ctx->stack_size = round_up(prog->aux->stack_depth, 16); @@ -553,10 +560,13 @@ static void build_epilogue(struct jit_ctx *ctx) const u8 r8 = bpf2a64[BPF_REG_8]; const u8 r9 = bpf2a64[BPF_REG_9]; const u8 fp = bpf2a64[BPF_REG_FP]; + const u8 fpb = bpf2a64[FP_BOTTOM]; /* We're done with BPF stack */ emit(A64_ADD_I(1, A64_SP, A64_SP, ctx->stack_size), ctx); + /* Restore x27 and x28 */ + emit(A64_POP(fpb, A64_R(28), A64_SP), ctx); /* Restore fs (x25) and x26 */ emit(A64_POP(fp, A64_R(26), A64_SP), ctx); @@ -650,6 +660,8 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx, const u8 src = bpf2a64[insn->src_reg]; const u8 tmp = bpf2a64[TMP_REG_1]; const u8 tmp2 = bpf2a64[TMP_REG_2]; + const u8 fp = bpf2a64[BPF_REG_FP]; + const u8 fpb = bpf2a64[FP_BOTTOM]; const s16 off = insn->off; const s32 imm = insn->imm; const int i = insn - ctx->prog->insnsi; @@ -658,6 +670,9 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx, u8 jmp_cond; s32 jmp_offset; u32 a64_insn; + u8 src_adj; + u8 dst_adj; + int off_adj; int ret; switch (code) { @@ -1012,34 +1027,41 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx, case BPF_LDX | BPF_PROBE_MEM | BPF_W: case BPF_LDX | BPF_PROBE_MEM | 
BPF_H: case BPF_LDX | BPF_PROBE_MEM | BPF_B: + if (ctx->fpb_offset > 0 && src == fp) { + src_adj = fpb; + off_adj = off + ctx->fpb_offset; + } else { + src_adj = src; + off_adj = off; + } switch (BPF_SIZE(code)) { case BPF_W: - if (is_lsi_offset(off, 2)) { - emit(A64_LDR32I(dst, src, off), ctx); + if (is_lsi_offset(off_adj, 2)) { + emit(A64_LDR32I(dst, src_adj, off_adj), ctx); } else { emit_a64_mov_i(1, tmp, off, ctx); emit(A64_LDR32(dst, src, tmp), ctx); } break; case BPF_H: - if (is_lsi_offset(off, 1)) { - emit(A64_LDRHI(dst, src, off), ctx); + if (is_lsi_offset(off_adj, 1)) { + emit(A64_LDRHI(dst, src_adj, off_adj), ctx); } else { emit_a64_mov_i(1, tmp, off, ctx); emit(A64_LDRH(dst, src, tmp), ctx); } break; case BPF_B: - if (is_lsi_offset(off, 0)) { - emit(A64_LDRBI(dst, src, off), ctx); + if (is_lsi_offset(off_adj, 0)) { + emit(A64_LDRBI(dst, src_adj, off_adj), ctx); } else { emit_a64_mov_i(1, tmp, off, ctx); emit(A64_LDRB(dst, src, tmp), ctx); } break; case BPF_DW: - if (is_lsi_offset(off, 3)) { - emit(A64_LDR64I(dst, src, off), ctx); + if (is_lsi_offset(off_adj, 3)) { + emit(A64_LDR64I(dst, src_adj, off_adj), ctx); } else { emit_a64_mov_i(1, tmp, off, ctx); emit(A64_LDR64(dst, src, tmp), ctx); @@ -1070,36 +1092,43 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx, case BPF_ST | BPF_MEM | BPF_H: case BPF_ST | BPF_MEM | BPF_B: case BPF_ST | BPF_MEM | BPF_DW: + if (ctx->fpb_offset > 0 && dst == fp) { + dst_adj = fpb; + off_adj = off + ctx->fpb_offset; + } else { + dst_adj = dst; + off_adj = off; + } /* Load imm to a register then store it */ emit_a64_mov_i(1, tmp, imm, ctx); switch (BPF_SIZE(code)) { case BPF_W: - if (is_lsi_offset(off, 2)) { - emit(A64_STR32I(tmp, dst, off), ctx); + if (is_lsi_offset(off_adj, 2)) { + emit(A64_STR32I(tmp, dst_adj, off_adj), ctx); } else { emit_a64_mov_i(1, tmp2, off, ctx); emit(A64_STR32(tmp, dst, tmp2), ctx); } break; case BPF_H: - if (is_lsi_offset(off, 1)) { - emit(A64_STRHI(tmp, dst, off), ctx); + if (is_lsi_offset(off_adj, 1)) { + emit(A64_STRHI(tmp, dst_adj, off_adj), ctx); } else { emit_a64_mov_i(1, tmp2, off, ctx); emit(A64_STRH(tmp, dst, tmp2), ctx); } break; case BPF_B: - if (is_lsi_offset(off, 0)) { - emit(A64_STRBI(tmp, dst, off), ctx); + if (is_lsi_offset(off_adj, 0)) { + emit(A64_STRBI(tmp, dst_adj, off_adj), ctx); } else { emit_a64_mov_i(1, tmp2, off, ctx); emit(A64_STRB(tmp, dst, tmp2), ctx); } break; case BPF_DW: - if (is_lsi_offset(off, 3)) { - emit(A64_STR64I(tmp, dst, off), ctx); + if (is_lsi_offset(off_adj, 3)) { + emit(A64_STR64I(tmp, dst_adj, off_adj), ctx); } else { emit_a64_mov_i(1, tmp2, off, ctx); emit(A64_STR64(tmp, dst, tmp2), ctx); @@ -1113,34 +1142,41 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx, case BPF_STX | BPF_MEM | BPF_H: case BPF_STX | BPF_MEM | BPF_B: case BPF_STX | BPF_MEM | BPF_DW: + if (ctx->fpb_offset > 0 && dst == fp) { + dst_adj = fpb; + off_adj = off + ctx->fpb_offset; + } else { + dst_adj = dst; + off_adj = off; + } switch (BPF_SIZE(code)) { case BPF_W: - if (is_lsi_offset(off, 2)) { - emit(A64_STR32I(src, dst, off), ctx); + if (is_lsi_offset(off_adj, 2)) { + emit(A64_STR32I(src, dst_adj, off_adj), ctx); } else { emit_a64_mov_i(1, tmp, off, ctx); emit(A64_STR32(src, dst, tmp), ctx); } break; case BPF_H: - if (is_lsi_offset(off, 1)) { - emit(A64_STRHI(src, dst, off), ctx); + if (is_lsi_offset(off_adj, 1)) { + emit(A64_STRHI(src, dst_adj, off_adj), ctx); } else { emit_a64_mov_i(1, tmp, off, ctx); emit(A64_STRH(src, dst, tmp), ctx); } break; case BPF_B: - 
if (is_lsi_offset(off, 0)) { - emit(A64_STRBI(src, dst, off), ctx); + if (is_lsi_offset(off_adj, 0)) { + emit(A64_STRBI(src, dst_adj, off_adj), ctx); } else { emit_a64_mov_i(1, tmp, off, ctx); emit(A64_STRB(src, dst, tmp), ctx); } break; case BPF_DW: - if (is_lsi_offset(off, 3)) { - emit(A64_STR64I(src, dst, off), ctx); + if (is_lsi_offset(off_adj, 3)) { + emit(A64_STR64I(src, dst_adj, off_adj), ctx); } else { emit_a64_mov_i(1, tmp, off, ctx); emit(A64_STR64(src, dst, tmp), ctx); @@ -1167,6 +1203,79 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx, return 0; } +/* + * Return 0 if FP may change at runtime, otherwise find the minimum negative + * offset to FP, converts it to positive number, and align down to 8 bytes. + */ +static int find_fpb_offset(struct bpf_prog *prog) +{ + int i; + int offset = 0; + + for (i = 0; i < prog->len; i++) { + const struct bpf_insn *insn = &prog->insnsi[i]; + const u8 class = BPF_CLASS(insn->code); + const u8 mode = BPF_MODE(insn->code); + const u8 src = insn->src_reg; + const u8 dst = insn->dst_reg; + const s32 imm = insn->imm; + const s16 off = insn->off; + + switch (class) { + case BPF_STX: + case BPF_ST: + /* fp holds atomic operation result */ + if (class == BPF_STX && mode == BPF_ATOMIC && + ((imm == BPF_XCHG || + imm == (BPF_FETCH | BPF_ADD) || + imm == (BPF_FETCH | BPF_AND) || + imm == (BPF_FETCH | BPF_XOR) || + imm == (BPF_FETCH | BPF_OR)) && + src == BPF_REG_FP)) + return 0; + + if (mode == BPF_MEM && dst == BPF_REG_FP && + off < offset) + offset = insn->off; + break; + + case BPF_JMP32: + case BPF_JMP: + break; + + case BPF_LDX: + case BPF_LD: + /* fp holds load result */ + if (dst == BPF_REG_FP) + return 0; + + if (class == BPF_LDX && mode == BPF_MEM && + src == BPF_REG_FP && off < offset) + offset = off; + break; + + case BPF_ALU: + case BPF_ALU64: + default: + /* fp holds ALU result */ + if (dst == BPF_REG_FP) + return 0; + } + } + + if (offset < 0) { + /* + * safely be converted to a positive 'int', since insn->off + * is 's16' + */ + offset = -offset; + /* align down to 8 bytes */ + offset = ALIGN_DOWN(offset, 8); + } + + return offset; +} + static int build_body(struct jit_ctx *ctx, bool extra_pass) { const struct bpf_prog *prog = ctx->prog; @@ -1288,6 +1397,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog) goto out_off; } + ctx.fpb_offset = find_fpb_offset(prog); + /* * 1. Initial fake pass to compute ctx->idx and ctx->offset. 
*

From patchwork Mon Mar 21 15:28:51 2022
X-Patchwork-Submitter: Xu Kuohai
X-Patchwork-Id: 12787342
From: Xu Kuohai
Subject: [PATCH bpf-next v5 4/5] bpf/tests: Add tests for BPF_LDX/BPF_STX with different offsets
Date: Mon, 21 Mar 2022 11:28:51 -0400
Message-ID: <20220321152852.2334294-5-xukuohai@huawei.com>
In-Reply-To: <20220321152852.2334294-1-xukuohai@huawei.com>

This patch adds tests to verify the behavior of BPF_LDX/BPF_STX +
BPF_B/BPF_H/BPF_W/BPF_DW with negative offset, small positive offset,
large positive offset, and misaligned offset.

Tested on both big-endian and little-endian arm64 qemu, result:

test_bpf: Summary: 1026 PASSED, 0 FAILED, [1014/1014 JIT'ed]
test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [8/8 JIT'ed]
test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED

Signed-off-by: Xu Kuohai
---
 lib/test_bpf.c | 285 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 281 insertions(+), 4 deletions(-)

diff --git a/lib/test_bpf.c b/lib/test_bpf.c
index 0c5cb2d6436a..aa0c7c68b2be 100644
--- a/lib/test_bpf.c
+++ b/lib/test_bpf.c
@@ -53,6 +53,7 @@
 #define FLAG_EXPECTED_FAIL BIT(1)
 #define FLAG_SKB_FRAG BIT(2)
 #define FLAG_VERIFIER_ZEXT BIT(3)
+#define FLAG_LARGE_MEM BIT(4)
 
 enum {
 	CLASSIC = BIT(6),	/* Old BPF instructions only.
*/ @@ -7838,7 +7839,7 @@ static struct bpf_test tests[] = { }, /* BPF_LDX_MEM B/H/W/DW */ { - "BPF_LDX_MEM | BPF_B", + "BPF_LDX_MEM | BPF_B, base", .u.insns_int = { BPF_LD_IMM64(R1, 0x0102030405060708ULL), BPF_LD_IMM64(R2, 0x0000000000000008ULL), @@ -7878,7 +7879,56 @@ static struct bpf_test tests[] = { .stack_depth = 8, }, { - "BPF_LDX_MEM | BPF_H", + "BPF_LDX_MEM | BPF_B, negative offset", + .u.insns_int = { + BPF_LD_IMM64(R2, 0x8182838485868788ULL), + BPF_LD_IMM64(R3, 0x0000000000000088ULL), + BPF_ALU64_IMM(BPF_ADD, R1, 512), + BPF_STX_MEM(BPF_B, R1, R2, -256), + BPF_LDX_MEM(BPF_B, R0, R1, -256), + BPF_JMP_REG(BPF_JNE, R0, R3, 1), + BPF_ALU64_IMM(BPF_MOV, R0, 0), + BPF_EXIT_INSN(), + }, + INTERNAL | FLAG_LARGE_MEM, + { }, + { { 512, 0 } }, + .stack_depth = 0, + }, + { + "BPF_LDX_MEM | BPF_B, small positive offset", + .u.insns_int = { + BPF_LD_IMM64(R2, 0x8182838485868788ULL), + BPF_LD_IMM64(R3, 0x0000000000000088ULL), + BPF_STX_MEM(BPF_B, R1, R2, 256), + BPF_LDX_MEM(BPF_B, R0, R1, 256), + BPF_JMP_REG(BPF_JNE, R0, R3, 1), + BPF_ALU64_IMM(BPF_MOV, R0, 0), + BPF_EXIT_INSN(), + }, + INTERNAL | FLAG_LARGE_MEM, + { }, + { { 512, 0 } }, + .stack_depth = 0, + }, + { + "BPF_LDX_MEM | BPF_B, large positive offset", + .u.insns_int = { + BPF_LD_IMM64(R2, 0x8182838485868788ULL), + BPF_LD_IMM64(R3, 0x0000000000000088ULL), + BPF_STX_MEM(BPF_B, R1, R2, 4096), + BPF_LDX_MEM(BPF_B, R0, R1, 4096), + BPF_JMP_REG(BPF_JNE, R0, R3, 1), + BPF_ALU64_IMM(BPF_MOV, R0, 0), + BPF_EXIT_INSN(), + }, + INTERNAL | FLAG_LARGE_MEM, + { }, + { { 4096 + 16, 0 } }, + .stack_depth = 0, + }, + { + "BPF_LDX_MEM | BPF_H, base", .u.insns_int = { BPF_LD_IMM64(R1, 0x0102030405060708ULL), BPF_LD_IMM64(R2, 0x0000000000000708ULL), @@ -7918,7 +7968,72 @@ static struct bpf_test tests[] = { .stack_depth = 8, }, { - "BPF_LDX_MEM | BPF_W", + "BPF_LDX_MEM | BPF_H, negative offset", + .u.insns_int = { + BPF_LD_IMM64(R2, 0x8182838485868788ULL), + BPF_LD_IMM64(R3, 0x0000000000008788ULL), + BPF_ALU64_IMM(BPF_ADD, R1, 512), + BPF_STX_MEM(BPF_H, R1, R2, -256), + BPF_LDX_MEM(BPF_H, R0, R1, -256), + BPF_JMP_REG(BPF_JNE, R0, R3, 1), + BPF_ALU64_IMM(BPF_MOV, R0, 0), + BPF_EXIT_INSN(), + }, + INTERNAL | FLAG_LARGE_MEM, + { }, + { { 512, 0 } }, + .stack_depth = 0, + }, + { + "BPF_LDX_MEM | BPF_H, small positive offset", + .u.insns_int = { + BPF_LD_IMM64(R2, 0x8182838485868788ULL), + BPF_LD_IMM64(R3, 0x0000000000008788ULL), + BPF_STX_MEM(BPF_H, R1, R2, 256), + BPF_LDX_MEM(BPF_H, R0, R1, 256), + BPF_JMP_REG(BPF_JNE, R0, R3, 1), + BPF_ALU64_IMM(BPF_MOV, R0, 0), + BPF_EXIT_INSN(), + }, + INTERNAL | FLAG_LARGE_MEM, + { }, + { { 512, 0 } }, + .stack_depth = 0, + }, + { + "BPF_LDX_MEM | BPF_H, large positive offset", + .u.insns_int = { + BPF_LD_IMM64(R2, 0x8182838485868788ULL), + BPF_LD_IMM64(R3, 0x0000000000008788ULL), + BPF_STX_MEM(BPF_H, R1, R2, 8192), + BPF_LDX_MEM(BPF_H, R0, R1, 8192), + BPF_JMP_REG(BPF_JNE, R0, R3, 1), + BPF_ALU64_IMM(BPF_MOV, R0, 0), + BPF_EXIT_INSN(), + }, + INTERNAL | FLAG_LARGE_MEM, + { }, + { { 8192 + 16, 0 } }, + .stack_depth = 0, + }, + { + "BPF_LDX_MEM | BPF_H, unaligned positive offset", + .u.insns_int = { + BPF_LD_IMM64(R2, 0x8182838485868788ULL), + BPF_LD_IMM64(R3, 0x0000000000008788ULL), + BPF_STX_MEM(BPF_H, R1, R2, 13), + BPF_LDX_MEM(BPF_H, R0, R1, 13), + BPF_JMP_REG(BPF_JNE, R0, R3, 1), + BPF_ALU64_IMM(BPF_MOV, R0, 0), + BPF_EXIT_INSN(), + }, + INTERNAL | FLAG_LARGE_MEM, + { }, + { { 32, 0 } }, + .stack_depth = 0, + }, + { + "BPF_LDX_MEM | BPF_W, base", .u.insns_int = { BPF_LD_IMM64(R1, 0x0102030405060708ULL), 
BPF_LD_IMM64(R2, 0x0000000005060708ULL), @@ -7957,6 +8072,162 @@ static struct bpf_test tests[] = { { { 0, 0 } }, .stack_depth = 8, }, + { + "BPF_LDX_MEM | BPF_W, negative offset", + .u.insns_int = { + BPF_LD_IMM64(R2, 0x8182838485868788ULL), + BPF_LD_IMM64(R3, 0x0000000085868788ULL), + BPF_ALU64_IMM(BPF_ADD, R1, 512), + BPF_STX_MEM(BPF_W, R1, R2, -256), + BPF_LDX_MEM(BPF_W, R0, R1, -256), + BPF_JMP_REG(BPF_JNE, R0, R3, 1), + BPF_ALU64_IMM(BPF_MOV, R0, 0), + BPF_EXIT_INSN(), + }, + INTERNAL | FLAG_LARGE_MEM, + { }, + { { 512, 0 } }, + .stack_depth = 0, + }, + { + "BPF_LDX_MEM | BPF_W, small positive offset", + .u.insns_int = { + BPF_LD_IMM64(R2, 0x8182838485868788ULL), + BPF_LD_IMM64(R3, 0x0000000085868788ULL), + BPF_STX_MEM(BPF_W, R1, R2, 256), + BPF_LDX_MEM(BPF_W, R0, R1, 256), + BPF_JMP_REG(BPF_JNE, R0, R3, 1), + BPF_ALU64_IMM(BPF_MOV, R0, 0), + BPF_EXIT_INSN(), + }, + INTERNAL | FLAG_LARGE_MEM, + { }, + { { 512, 0 } }, + .stack_depth = 0, + }, + { + "BPF_LDX_MEM | BPF_W, large positive offset", + .u.insns_int = { + BPF_LD_IMM64(R2, 0x8182838485868788ULL), + BPF_LD_IMM64(R3, 0x0000000085868788ULL), + BPF_STX_MEM(BPF_W, R1, R2, 16384), + BPF_LDX_MEM(BPF_W, R0, R1, 16384), + BPF_JMP_REG(BPF_JNE, R0, R3, 1), + BPF_ALU64_IMM(BPF_MOV, R0, 0), + BPF_EXIT_INSN(), + }, + INTERNAL | FLAG_LARGE_MEM, + { }, + { { 16384 + 16, 0 } }, + .stack_depth = 0, + }, + { + "BPF_LDX_MEM | BPF_W, unaligned positive offset", + .u.insns_int = { + BPF_LD_IMM64(R2, 0x8182838485868788ULL), + BPF_LD_IMM64(R3, 0x0000000085868788ULL), + BPF_STX_MEM(BPF_W, R1, R2, 13), + BPF_LDX_MEM(BPF_W, R0, R1, 13), + BPF_JMP_REG(BPF_JNE, R0, R3, 1), + BPF_ALU64_IMM(BPF_MOV, R0, 0), + BPF_EXIT_INSN(), + }, + INTERNAL | FLAG_LARGE_MEM, + { }, + { { 32, 0 } }, + .stack_depth = 0, + }, + { + "BPF_LDX_MEM | BPF_DW, base", + .u.insns_int = { + BPF_LD_IMM64(R1, 0x0102030405060708ULL), + BPF_STX_MEM(BPF_DW, R10, R1, -8), + BPF_LDX_MEM(BPF_DW, R0, R10, -8), + BPF_JMP_REG(BPF_JNE, R0, R1, 1), + BPF_ALU64_IMM(BPF_MOV, R0, 0), + BPF_EXIT_INSN(), + }, + INTERNAL, + { }, + { { 0, 0 } }, + .stack_depth = 8, + }, + { + "BPF_LDX_MEM | BPF_DW, MSB set", + .u.insns_int = { + BPF_LD_IMM64(R1, 0x8182838485868788ULL), + BPF_STX_MEM(BPF_DW, R10, R1, -8), + BPF_LDX_MEM(BPF_DW, R0, R10, -8), + BPF_JMP_REG(BPF_JNE, R0, R1, 1), + BPF_ALU64_IMM(BPF_MOV, R0, 0), + BPF_EXIT_INSN(), + }, + INTERNAL, + { }, + { { 0, 0 } }, + .stack_depth = 8, + }, + { + "BPF_LDX_MEM | BPF_DW, negative offset", + .u.insns_int = { + BPF_LD_IMM64(R2, 0x8182838485868788ULL), + BPF_ALU64_IMM(BPF_ADD, R1, 512), + BPF_STX_MEM(BPF_DW, R1, R2, -256), + BPF_LDX_MEM(BPF_DW, R0, R1, -256), + BPF_JMP_REG(BPF_JNE, R0, R2, 1), + BPF_ALU64_IMM(BPF_MOV, R0, 0), + BPF_EXIT_INSN(), + }, + INTERNAL | FLAG_LARGE_MEM, + { }, + { { 512, 0 } }, + .stack_depth = 0, + }, + { + "BPF_LDX_MEM | BPF_DW, small positive offset", + .u.insns_int = { + BPF_LD_IMM64(R2, 0x8182838485868788ULL), + BPF_STX_MEM(BPF_DW, R1, R2, 256), + BPF_LDX_MEM(BPF_DW, R0, R1, 256), + BPF_JMP_REG(BPF_JNE, R0, R2, 1), + BPF_ALU64_IMM(BPF_MOV, R0, 0), + BPF_EXIT_INSN(), + }, + INTERNAL | FLAG_LARGE_MEM, + { }, + { { 512, 0 } }, + .stack_depth = 8, + }, + { + "BPF_LDX_MEM | BPF_DW, large positive offset", + .u.insns_int = { + BPF_LD_IMM64(R2, 0x8182838485868788ULL), + BPF_STX_MEM(BPF_DW, R1, R2, 32760), + BPF_LDX_MEM(BPF_DW, R0, R1, 32760), + BPF_JMP_REG(BPF_JNE, R0, R2, 1), + BPF_ALU64_IMM(BPF_MOV, R0, 0), + BPF_EXIT_INSN(), + }, + INTERNAL | FLAG_LARGE_MEM, + { }, + { { 32768, 0 } }, + .stack_depth = 0, + }, + { + "BPF_LDX_MEM | 
BPF_DW, unaligned positive offset",
+		.u.insns_int = {
+			BPF_LD_IMM64(R2, 0x8182838485868788ULL),
+			BPF_STX_MEM(BPF_DW, R1, R2, 13),
+			BPF_LDX_MEM(BPF_DW, R0, R1, 13),
+			BPF_JMP_REG(BPF_JNE, R0, R2, 1),
+			BPF_ALU64_IMM(BPF_MOV, R0, 0),
+			BPF_EXIT_INSN(),
+		},
+		INTERNAL | FLAG_LARGE_MEM,
+		{ },
+		{ { 32, 0 } },
+		.stack_depth = 0,
+	},
 	/* BPF_STX_MEM B/H/W/DW */
 	{
 		"BPF_STX_MEM | BPF_B",
@@ -14094,6 +14365,9 @@ static void *generate_test_data(struct bpf_test *test, int sub)
 	if (test->aux & FLAG_NO_DATA)
 		return NULL;
 
+	if (test->aux & FLAG_LARGE_MEM)
+		return kmalloc(test->test[sub].data_size, GFP_KERNEL);
+
 	/* Test case expects an skb, so populate one. Various
 	 * subtests generate skbs of different sizes based on
 	 * the same data.
 	 */
@@ -14137,7 +14411,10 @@ static void release_test_data(const struct bpf_test *test, void *data)
 	if (test->aux & FLAG_NO_DATA)
 		return;
 
-	kfree_skb(data);
+	if (test->aux & FLAG_LARGE_MEM)
+		kfree(data);
+	else
+		kfree_skb(data);
 }
 
 static int filter_length(int which)

From patchwork Mon Mar 21 15:28:52 2022
X-Patchwork-Submitter: Xu Kuohai
X-Patchwork-Id: 12787340
From: Xu Kuohai
Subject: [PATCH bpf-next v5 5/5] bpf, arm64: add load store test case for tail call
Date: Mon, 21 Mar 2022 11:28:52 -0400
Message-ID: <20220321152852.2334294-6-xukuohai@huawei.com>
In-Reply-To: <20220321152852.2334294-1-xukuohai@huawei.com>

Add a test case to ensure that the caller's and callee's fp offsets are
correct during a tail call.
Tested on both big-endian and little-endian arm64 qemu, result: test_bpf: Summary: 1026 PASSED, 0 FAILED, [1014/1014 JIT'ed] test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed] test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED Signed-off-by: Xu Kuohai --- lib/test_bpf.c | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) diff --git a/lib/test_bpf.c b/lib/test_bpf.c index aa0c7c68b2be..2a7836e115b4 100644 --- a/lib/test_bpf.c +++ b/lib/test_bpf.c @@ -14950,6 +14950,36 @@ static struct tail_call_test tail_call_tests[] = { }, .result = 10, }, + { + "Tail call load/store leaf", + .insns = { + BPF_ALU64_IMM(BPF_MOV, R1, 1), + BPF_ALU64_IMM(BPF_MOV, R2, 2), + BPF_ALU64_REG(BPF_MOV, R3, BPF_REG_FP), + BPF_STX_MEM(BPF_DW, R3, R1, -8), + BPF_STX_MEM(BPF_DW, R3, R2, -16), + BPF_LDX_MEM(BPF_DW, R0, BPF_REG_FP, -8), + BPF_JMP_REG(BPF_JNE, R0, R1, 3), + BPF_LDX_MEM(BPF_DW, R0, BPF_REG_FP, -16), + BPF_JMP_REG(BPF_JNE, R0, R2, 1), + BPF_ALU64_IMM(BPF_MOV, R0, 0), + BPF_EXIT_INSN(), + }, + .result = 0, + .stack_depth = 32, + }, + { + "Tail call load/store", + .insns = { + BPF_ALU64_IMM(BPF_MOV, R0, 3), + BPF_STX_MEM(BPF_DW, BPF_REG_FP, R0, -8), + TAIL_CALL(-1), + BPF_ALU64_IMM(BPF_MOV, R0, -1), + BPF_EXIT_INSN(), + }, + .result = 0, + .stack_depth = 16, + }, { "Tail call error path, max count reached", .insns = {