Message ID | 20201215121816.1048557-9-jackmanb@google.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Headers | show |
Series | Atomics for eBPF | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for bpf-next |
netdev/subject_prefix | success | Link |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 15634 this patch: 15634 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | warning | CHECK: Lines should not end with a '(' CHECK: No space is necessary after a cast WARNING: line length of 104 exceeds 80 columns WARNING: line length of 110 exceeds 80 columns WARNING: line length of 84 exceeds 80 columns |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 15302 this patch: 15302 |
netdev/header_inline | success | Link |
netdev/stable | success | Stable not CCed |
On 12/15/20 4:18 AM, Brendan Jackman wrote: > This adds two atomic opcodes, both of which include the BPF_FETCH > flag. XCHG without the BPF_FETCH flag would naturally encode > atomic_set. This is not supported because it would be of limited > value to userspace (it doesn't imply any barriers). CMPXCHG without > BPF_FETCH woulud be an atomic compare-and-write. We don't have such > an operation in the kernel so it isn't provided to BPF either. > > There are two significant design decisions made for the CMPXCHG > instruction: > > - To solve the issue that this operation fundamentally has 3 > operands, but we only have two register fields. Therefore the > operand we compare against (the kernel's API calls it 'old') is > hard-coded to be R0. x86 has similar design (and A64 doesn't > have this problem). > > A potential alternative might be to encode the other operand's > register number in the immediate field. > > - The kernel's atomic_cmpxchg returns the old value, while the C11 > userspace APIs return a boolean indicating the comparison > result. Which should BPF do? A64 returns the old value. x86 returns > the old value in the hard-coded register (and also sets a > flag). That means return-old-value is easier to JIT, so that's > what we use. > > Signed-off-by: Brendan Jackman <jackmanb@google.com> Ack with a minor comment below. Acked-by: Yonghong Song <yhs@fb.com> > --- > arch/x86/net/bpf_jit_comp.c | 8 ++++++++ > include/linux/filter.h | 2 ++ > include/uapi/linux/bpf.h | 4 +++- > kernel/bpf/core.c | 20 ++++++++++++++++++++ > kernel/bpf/disasm.c | 15 +++++++++++++++ > kernel/bpf/verifier.c | 19 +++++++++++++++++-- > tools/include/linux/filter.h | 2 ++ > tools/include/uapi/linux/bpf.h | 4 +++- > 8 files changed, 70 insertions(+), 4 deletions(-) > > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c > index eea7d8b0bb12..308241187582 100644 > --- a/arch/x86/net/bpf_jit_comp.c > +++ b/arch/x86/net/bpf_jit_comp.c > @@ -815,6 +815,14 @@ static int emit_atomic(u8 **pprog, u8 atomic_op, > /* src_reg = atomic_fetch_add(dst_reg + off, src_reg); */ > EMIT2(0x0F, 0xC1); > break; > + case BPF_XCHG: > + /* src_reg = atomic_xchg(dst_reg + off, src_reg); */ > + EMIT1(0x87); > + break; > + case BPF_CMPXCHG: > + /* r0 = atomic_cmpxchg(dst_reg + off, r0, src_reg); */ > + EMIT2(0x0F, 0xB1); > + break; > default: > pr_err("bpf_jit: unknown atomic opcode %02x\n", atomic_op); > return -EFAULT; > diff --git a/include/linux/filter.h b/include/linux/filter.h > index c3e87a63e0b8..16e0ba5e8937 100644 > --- a/include/linux/filter.h > +++ b/include/linux/filter.h > @@ -265,6 +265,8 @@ static inline bool insn_is_zext(const struct bpf_insn *insn) > * > * BPF_ADD *(uint *) (dst_reg + off16) += src_reg > * BPF_ADD | BPF_FETCH src_reg = atomic_fetch_add(dst_reg + off16, src_reg); > + * BPF_XCHG src_reg = atomic_xchg(dst_reg + off16, src_reg) > + * BPF_CMPXCHG r0 = atomic_cmpxchg(dst_reg + off16, r0, src_reg) > */ > > #define BPF_ATOMIC_OP(SIZE, OP, DST, SRC, OFF) \ > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index 760ae333a5ed..538b95472c8f 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -45,7 +45,9 @@ > #define BPF_EXIT 0x90 /* function return */ > > /* atomic op type fields (stored in immediate) */ > -#define BPF_FETCH 0x01 /* fetch previous value into src reg */ > +#define BPF_XCHG (0xe0 | BPF_FETCH) /* atomic exchange */ > +#define BPF_CMPXCHG (0xf0 | BPF_FETCH) /* atomic compare-and-write */ > +#define BPF_FETCH 0x01 /* not an opcode on its own, used to build others */ Although the above code works fine, I would suggest to put BPF_FETCH definition before BPF_XCHG and BPF_CMPXCHG, which makes more sense intuitively. > > /* Register numbers */ > enum { > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c [...] \ > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h > index 760ae333a5ed..538b95472c8f 100644 > --- a/tools/include/uapi/linux/bpf.h > +++ b/tools/include/uapi/linux/bpf.h > @@ -45,7 +45,9 @@ > #define BPF_EXIT 0x90 /* function return */ > > /* atomic op type fields (stored in immediate) */ > -#define BPF_FETCH 0x01 /* fetch previous value into src reg */ > +#define BPF_XCHG (0xe0 | BPF_FETCH) /* atomic exchange */ > +#define BPF_CMPXCHG (0xf0 | BPF_FETCH) /* atomic compare-and-write */ > +#define BPF_FETCH 0x01 /* not an opcode on its own, used to build others */ same here. > > /* Register numbers */ > enum { >
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index eea7d8b0bb12..308241187582 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -815,6 +815,14 @@ static int emit_atomic(u8 **pprog, u8 atomic_op, /* src_reg = atomic_fetch_add(dst_reg + off, src_reg); */ EMIT2(0x0F, 0xC1); break; + case BPF_XCHG: + /* src_reg = atomic_xchg(dst_reg + off, src_reg); */ + EMIT1(0x87); + break; + case BPF_CMPXCHG: + /* r0 = atomic_cmpxchg(dst_reg + off, r0, src_reg); */ + EMIT2(0x0F, 0xB1); + break; default: pr_err("bpf_jit: unknown atomic opcode %02x\n", atomic_op); return -EFAULT; diff --git a/include/linux/filter.h b/include/linux/filter.h index c3e87a63e0b8..16e0ba5e8937 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -265,6 +265,8 @@ static inline bool insn_is_zext(const struct bpf_insn *insn) * * BPF_ADD *(uint *) (dst_reg + off16) += src_reg * BPF_ADD | BPF_FETCH src_reg = atomic_fetch_add(dst_reg + off16, src_reg); + * BPF_XCHG src_reg = atomic_xchg(dst_reg + off16, src_reg) + * BPF_CMPXCHG r0 = atomic_cmpxchg(dst_reg + off16, r0, src_reg) */ #define BPF_ATOMIC_OP(SIZE, OP, DST, SRC, OFF) \ diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 760ae333a5ed..538b95472c8f 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -45,7 +45,9 @@ #define BPF_EXIT 0x90 /* function return */ /* atomic op type fields (stored in immediate) */ -#define BPF_FETCH 0x01 /* fetch previous value into src reg */ +#define BPF_XCHG (0xe0 | BPF_FETCH) /* atomic exchange */ +#define BPF_CMPXCHG (0xf0 | BPF_FETCH) /* atomic compare-and-write */ +#define BPF_FETCH 0x01 /* not an opcode on its own, used to build others */ /* Register numbers */ enum { diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 2b1d3fea03fc..4f22cff4491e 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -1630,6 +1630,16 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u64 *stack) (u32) SRC, (atomic_t *)(unsigned long) (DST + insn->off)); break; + case BPF_XCHG: + SRC = (u32) atomic_xchg( + (atomic_t *)(unsigned long) (DST + insn->off), + (u32) SRC); + break; + case BPF_CMPXCHG: + BPF_R0 = (u32) atomic_cmpxchg( + (atomic_t *)(unsigned long) (DST + insn->off), + (u32) BPF_R0, (u32) SRC); + break; default: goto default_label; } @@ -1647,6 +1657,16 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u64 *stack) (u64) SRC, (atomic64_t *)(unsigned long) (DST + insn->off)); break; + case BPF_XCHG: + SRC = (u64) atomic64_xchg( + (atomic64_t *)(unsigned long) (DST + insn->off), + (u64) SRC); + break; + case BPF_CMPXCHG: + BPF_R0 = (u64) atomic64_cmpxchg( + (atomic64_t *)(unsigned long) (DST + insn->off), + (u64) BPF_R0, (u64) SRC); + break; default: goto default_label; } diff --git a/kernel/bpf/disasm.c b/kernel/bpf/disasm.c index d2e20f6d0516..ee8d1132767b 100644 --- a/kernel/bpf/disasm.c +++ b/kernel/bpf/disasm.c @@ -167,6 +167,21 @@ void print_bpf_insn(const struct bpf_insn_cbs *cbs, BPF_SIZE(insn->code) == BPF_DW ? "64" : "", bpf_ldst_string[BPF_SIZE(insn->code) >> 3], insn->dst_reg, insn->off, insn->src_reg); + } else if (BPF_MODE(insn->code) == BPF_ATOMIC && + insn->imm == BPF_CMPXCHG) { + verbose(cbs->private_data, "(%02x) r0 = atomic%s_cmpxchg((%s *)(r%d %+d), r0, r%d)\n", + insn->code, + BPF_SIZE(insn->code) == BPF_DW ? "64" : "", + bpf_ldst_string[BPF_SIZE(insn->code) >> 3], + insn->dst_reg, insn->off, + insn->src_reg); + } else if (BPF_MODE(insn->code) == BPF_ATOMIC && + insn->imm == BPF_XCHG) { + verbose(cbs->private_data, "(%02x) r%d = atomic%s_xchg((%s *)(r%d %+d), r%d)\n", + insn->code, insn->src_reg, + BPF_SIZE(insn->code) == BPF_DW ? "64" : "", + bpf_ldst_string[BPF_SIZE(insn->code) >> 3], + insn->dst_reg, insn->off, insn->src_reg); } else { verbose(cbs->private_data, "BUG_%02x\n", insn->code); } diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 449b2da9b6db..b1226bcd9765 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -3606,11 +3606,14 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_insn *insn) { + int load_reg; int err; switch (insn->imm) { case BPF_ADD: case BPF_ADD | BPF_FETCH: + case BPF_XCHG: + case BPF_CMPXCHG: break; default: verbose(env, "BPF_ATOMIC uses invalid atomic opcode %02x\n", insn->imm); @@ -3632,6 +3635,13 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i if (err) return err; + if (insn->imm == BPF_CMPXCHG) { + /* Check comparison of R0 with memory location */ + err = check_reg_arg(env, BPF_REG_0, SRC_OP); + if (err) + return err; + } + if (is_pointer_value(env, insn->src_reg)) { verbose(env, "R%d leaks addr into mem\n", insn->src_reg); return -EACCES; @@ -3662,8 +3672,13 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i if (!(insn->imm & BPF_FETCH)) return 0; - /* check and record load of old value into src reg */ - err = check_reg_arg(env, insn->src_reg, DST_OP); + if (insn->imm == BPF_CMPXCHG) + load_reg = BPF_REG_0; + else + load_reg = insn->src_reg; + + /* check and record load of old value */ + err = check_reg_arg(env, load_reg, DST_OP); if (err) return err; diff --git a/tools/include/linux/filter.h b/tools/include/linux/filter.h index 7211ce9fba53..d75998b0d5ac 100644 --- a/tools/include/linux/filter.h +++ b/tools/include/linux/filter.h @@ -174,6 +174,8 @@ * * BPF_ADD *(uint *) (dst_reg + off16) += src_reg * BPF_ADD | BPF_FETCH src_reg = atomic_fetch_add(dst_reg + off16, src_reg); + * BPF_XCHG src_reg = atomic_xchg(dst_reg + off16, src_reg) + * BPF_CMPXCHG r0 = atomic_cmpxchg(dst_reg + off16, r0, src_reg) */ #define BPF_ATOMIC_OP(SIZE, OP, DST, SRC, OFF) \ diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 760ae333a5ed..538b95472c8f 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -45,7 +45,9 @@ #define BPF_EXIT 0x90 /* function return */ /* atomic op type fields (stored in immediate) */ -#define BPF_FETCH 0x01 /* fetch previous value into src reg */ +#define BPF_XCHG (0xe0 | BPF_FETCH) /* atomic exchange */ +#define BPF_CMPXCHG (0xf0 | BPF_FETCH) /* atomic compare-and-write */ +#define BPF_FETCH 0x01 /* not an opcode on its own, used to build others */ /* Register numbers */ enum {
This adds two atomic opcodes, both of which include the BPF_FETCH flag. XCHG without the BPF_FETCH flag would naturally encode atomic_set. This is not supported because it would be of limited value to userspace (it doesn't imply any barriers). CMPXCHG without BPF_FETCH woulud be an atomic compare-and-write. We don't have such an operation in the kernel so it isn't provided to BPF either. There are two significant design decisions made for the CMPXCHG instruction: - To solve the issue that this operation fundamentally has 3 operands, but we only have two register fields. Therefore the operand we compare against (the kernel's API calls it 'old') is hard-coded to be R0. x86 has similar design (and A64 doesn't have this problem). A potential alternative might be to encode the other operand's register number in the immediate field. - The kernel's atomic_cmpxchg returns the old value, while the C11 userspace APIs return a boolean indicating the comparison result. Which should BPF do? A64 returns the old value. x86 returns the old value in the hard-coded register (and also sets a flag). That means return-old-value is easier to JIT, so that's what we use. Signed-off-by: Brendan Jackman <jackmanb@google.com> --- arch/x86/net/bpf_jit_comp.c | 8 ++++++++ include/linux/filter.h | 2 ++ include/uapi/linux/bpf.h | 4 +++- kernel/bpf/core.c | 20 ++++++++++++++++++++ kernel/bpf/disasm.c | 15 +++++++++++++++ kernel/bpf/verifier.c | 19 +++++++++++++++++-- tools/include/linux/filter.h | 2 ++ tools/include/uapi/linux/bpf.h | 4 +++- 8 files changed, 70 insertions(+), 4 deletions(-)