Message ID | b1792bb3c51eb3e94b9d27e67665d3f2209bba7e.camel@linux.ibm.com (mailing list archive)
---|---
State | Not Applicable
Delegated to: | BPF
Series | What should BPF_CMPXCHG do when the values match?
Context | Check | Description
---|---|---
netdev/tree_selection | success | Not a local patch
On Tue, Feb 9, 2021 at 4:43 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> Hi,
>
> I'm implementing BPF_CMPXCHG for the s390x JIT and noticed that the
> doc, the interpreter and the X64 JIT do not agree on what the behavior
> should be when the values match.
>
> If the operand size is BPF_W, this matters, because, depending on the
> answer, the upper bits of R0 are either zeroed out or left intact.
>
> I made the experiment based on the following modification to the
> "atomic compare-and-exchange smoketest - 32bit" test on top of commit
> ee5cc0363ea0:
>
> --- a/tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c
> +++ b/tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c
> @@ -57,8 +57,8 @@
>  	BPF_MOV32_IMM(BPF_REG_1, 4),
>  	BPF_MOV32_IMM(BPF_REG_0, 3),
>  	BPF_ATOMIC_OP(BPF_W, BPF_CMPXCHG, BPF_REG_10, BPF_REG_1, -4),
> -	/* if (old != 3) exit(4); */
> -	BPF_JMP32_IMM(BPF_JEQ, BPF_REG_0, 3, 2),
> +	/* if ((u64)old != 3) exit(4); */
> +	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 3, 2),
>  	BPF_MOV32_IMM(BPF_REG_0, 4),
>  	BPF_EXIT_INSN(),
>  	/* if (val != 4) exit(5); */
>
> and got the following results:
>
> 1) Documentation: Upper bits of R0 are zeroed out - but it looks as if
>    there is a typo and either a period or the word "otherwise" is
>    missing?
>
> > If they match it is replaced with ``src_reg``, The value that was
> > there before is loaded back to ``R0``.
>
> 2) Interpreter + KVM: Upper bits of R0 are zeroed out (C semantics)
>
> 3) X64 JIT + KVM: Upper bits of R0 are untouched (cmpxchg semantics)
>
>    => 0xffffffffc0146bc7: lock cmpxchg %edi,-0x4(%rbp)
>       0xffffffffc0146bcc: cmp    $0x3,%rax
>    (gdb) p/x $rax
>    0x6bd5720c00000003
>    (gdb) x/d $rbp-4
>    0xffffc90001263d5c: 3
>
>       0xffffffffc0146bc7: lock cmpxchg %edi,-0x4(%rbp)
>    => 0xffffffffc0146bcc: cmp    $0x3,%rax
>    (gdb) p/x $rax
>    0x6bd5720c00000003
>
> 4) X64 JIT + TCG: Upper bits of R0 are zeroed out (qemu bug?)
>
>    => 0xffffffffc01441fc: lock cmpxchg %edi,-0x4(%rbp)
>       0xffffffffc0144201: cmp    $0x3,%rax
>    (gdb) p/x $rax
>    0x81776ea600000003
>    (gdb) x/d $rbp-4
>    0xffffc90001117d5c: 3
>
>       0xffffffffc01441fc: lock cmpxchg %edi,-0x4(%rbp)
>    => 0xffffffffc0144201: cmp    $0x3,%rax
>    (gdb) p/x $rax
>    $3 = 0x3
>
> So which option is the right one? In my opinion, it would be safer to
> follow what the interpreter does and zero out the upper bits.

Wow. What a find!
I thought that all 32-bit x86 ops zero-extend the dest register.
I agree that it's best to zero upper bits for cmpxchg as well.
I wonder whether compilers know about this exceptional behavior.
I believe the bpf backend considers full R0 to be used by bpf's cmpxchg.

Do you know what xchg does on x86? What about arm64 with cas?
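A minimal userspace reproducer for the x86-64 behavior (a sketch, not
part of the thread; on real hardware it should show the same "upper
bits survive a successful compare" result seen under KVM above, while
the mismatch case zero-extends):

	/* build: gcc -O2 cmpxchg_demo.c && ./cmpxchg_demo  (x86-64 only) */
	#include <stdint.h>
	#include <stdio.h>

	static uint64_t cmpxchg32(uint32_t *mem, uint64_t rax_in, uint32_t newval)
	{
		uint64_t rax = rax_in;

		/* compares eax with *mem; on a match, stores newval to *mem
		 * and leaves the accumulator alone; on a mismatch, loads *mem
		 * into eax (a 32-bit register write, which zero-extends rax)
		 */
		asm volatile("lock cmpxchgl %2, %1"
			     : "+a"(rax), "+m"(*mem)
			     : "r"(newval)
			     : "cc");
		return rax;
	}

	int main(void)
	{
		uint32_t val = 3;
		uint64_t rax;

		/* expected value (3) with garbage in the upper half of rax */
		rax = cmpxchg32(&val, 0xdeadbeef00000003ULL, 4);
		printf("match:    rax = %#llx\n", (unsigned long long)rax);

		val = 3;	/* reset; 5 != 3, so the compare fails */
		rax = cmpxchg32(&val, 0xdeadbeef00000005ULL, 4);
		printf("mismatch: rax = %#llx\n", (unsigned long long)rax);
		return 0;
	}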
On Tue, 2021-02-09 at 20:14 -0800, Alexei Starovoitov wrote:
> On Tue, Feb 9, 2021 at 4:43 PM Ilya Leoshkevich <iii@linux.ibm.com>
> wrote:
> >
> > Hi,
> >
> > I'm implementing BPF_CMPXCHG for the s390x JIT and noticed that the
> > doc, the interpreter and the X64 JIT do not agree on what the
> > behavior should be when the values match.
[...]
> > So which option is the right one? In my opinion, it would be safer
> > to follow what the interpreter does and zero out the upper bits.
>
> Wow. What a find!
> I thought that all 32-bit x86 ops zero-extend the dest register.

I think that's true, it's just that when the values match, cmpxchg is
specified to do nothing.

> I agree that it's best to zero upper bits for cmpxchg as well.

I will send a doc patch to clarify the wording then.

> I wonder whether compilers know about this exceptional behavior.

I'm not too familiar with the BPF LLVM backend, but at least CMPXCHG32
is defined in a similar way to XFALU32, so it should be fine. Maybe
Yonghong can comment on this further.

> I believe the bpf backend considers full R0 to be used by bpf's
> cmpxchg.

It's a little bit inconsistent at the moment.
I don't know why yet, but on s390 the subreg optimization kicks in and
I have to run with the following patch in order to avoid stack pointer
zero extension:

--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10588,6 +10588,7 @@ static int opt_subreg_zext_lo32_rnd_hi32(struct bpf_verifier_env *env,
 	for (i = 0; i < len; i++) {
 		int adj_idx = i + delta;
 		struct bpf_insn insn;
+		u8 load_reg;
 
 		insn = insns[adj_idx];
 		if (!aux[adj_idx].zext_dst) {
@@ -10630,9 +10631,29 @@ static int opt_subreg_zext_lo32_rnd_hi32(struct bpf_verifier_env *env,
 		if (!bpf_jit_needs_zext())
 			continue;
 
+		/* zext_dst means that we want to zero-extend whatever register
+		 * the insn defines, which is dst_reg most of the time, with
+		 * the notable exception of BPF_STX + BPF_ATOMIC + BPF_FETCH.
+		 */
+		if (BPF_CLASS(insn.code) == BPF_STX &&
+		    BPF_MODE(insn.code) == BPF_ATOMIC) {
+			/* BPF_STX + BPF_ATOMIC insns without BPF_FETCH do not
+			 * define any registers, therefore zext_dst cannot be
+			 * set.
+			 */
+			if (WARN_ON_ONCE(!(insn.imm & BPF_FETCH)))
+				return -EINVAL;
+			if (insn.imm == BPF_CMPXCHG)
+				load_reg = BPF_REG_0;
+			else
+				load_reg = insn.src_reg;
+		} else {
+			load_reg = insn.dst_reg;
+		}
+
 		zext_patch[0] = insn;
-		zext_patch[1].dst_reg = insn.dst_reg;
-		zext_patch[1].src_reg = insn.dst_reg;
+		zext_patch[1].dst_reg = load_reg;
+		zext_patch[1].src_reg = load_reg;
 		patch = zext_patch;
 		patch_len = 2;
 apply_patch_buffer:

However, this doesn't seem to affect x86_64.

> Do you know what xchg does on x86? What about arm64 with cas?

xchg always zeroes out the upper half.
Unlike x86_64's cmpxchg, arm64's cas is specified to always zero out
the upper half, even if the values match. I don't have access to an
ARMv8.1 machine to test this, but at least QEMU does behave this way.
s390's cs does not zero out the upper half, so we need to use llgfr in
addition (which doesn't sound like a big deal to me).
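To make the failure mode concrete: for a BPF_STX | BPF_ATOMIC insn,
dst_reg is the memory base register, not the register the fetch loads
into. A sketch of the resulting instruction stream (illustrative only;
BPF_ZEXT_REG() is the mov32-based zero-extension pseudo-insn from
include/linux/filter.h that, as I read the verifier, zext_patch[1] is
initialized from):

	/* R0 = expected, R1 = new value, the u32 lives at fp-4 */
	BPF_MOV32_IMM(BPF_REG_0, 3),
	BPF_MOV32_IMM(BPF_REG_1, 4),
	BPF_ATOMIC_OP(BPF_W, BPF_CMPXCHG, BPF_REG_10, BPF_REG_1, -4),
	/* Without the fix, the verifier would append
	 *	BPF_ZEXT_REG(BPF_REG_10),
	 * zero-extending the frame pointer (the insn's dst_reg) -- the
	 * "stack pointer zero extension" above. With the fix it appends:
	 */
	BPF_ZEXT_REG(BPF_REG_0),	/* CMPXCHG fetches into R0 */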
On Wed, Feb 10, 2021 at 5:28 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> On Tue, 2021-02-09 at 20:14 -0800, Alexei Starovoitov wrote:
> > On Tue, Feb 9, 2021 at 4:43 PM Ilya Leoshkevich <iii@linux.ibm.com>
> > wrote:
> > >
> > > Hi,
> > >
> > > I'm implementing BPF_CMPXCHG for the s390x JIT and noticed that
> > > the doc, the interpreter and the X64 JIT do not agree on what the
> > > behavior should be when the values match.
[...]
> > Wow. What a find!
> > I thought that all 32-bit x86 ops zero-extend the dest register.
>
> I think that's true, it's just that when the values match, cmpxchg is
> specified to do nothing.
>
> > I agree that it's best to zero upper bits for cmpxchg as well.
>
> I will send a doc patch to clarify the wording then.
>
> > I wonder whether compilers know about this exceptional behavior.
>
> I'm not too familiar with the BPF LLVM backend, but at least CMPXCHG32
> is defined in a similar way to XFALU32, so it should be fine. Maybe
> Yonghong can comment on this further.

I meant x86 backends in gcc and llvm. bpf backend in llvm I've already
checked.

> > I believe the bpf backend considers full R0 to be used by bpf's
> > cmpxchg.
>
> It's a little bit inconsistent at the moment. I don't know why yet,
> but on s390 the subreg optimization kicks in and I have to run with
> the following patch in order to avoid stack pointer zero extension:

makes sense. This is needed not only for cmpxchg, but for all bpf_fetch
variants, right?

> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
[...]
> +		/* zext_dst means that we want to zero-extend whatever register
> +		 * the insn defines, which is dst_reg most of the time, with
> +		 * the notable exception of BPF_STX + BPF_ATOMIC + BPF_FETCH.
> +		 */
> +		if (BPF_CLASS(insn.code) == BPF_STX &&
> +		    BPF_MODE(insn.code) == BPF_ATOMIC) {
> +			/* BPF_STX + BPF_ATOMIC insns without BPF_FETCH do not
> +			 * define any registers, therefore zext_dst cannot be
> +			 * set.
> +			 */
> +			if (WARN_ON_ONCE(!(insn.imm & BPF_FETCH)))
> +				return -EINVAL;

warn makes sense.

> +			if (insn.imm == BPF_CMPXCHG)
> +				load_reg = BPF_REG_0;
> +			else
> +				load_reg = insn.src_reg;

pls use ?:. I think it will read easier.
And submit it as an official patch. Please.

> +		} else {
> +			load_reg = insn.dst_reg;
> +		}
> +
> 		zext_patch[0] = insn;
> -		zext_patch[1].dst_reg = insn.dst_reg;
> -		zext_patch[1].src_reg = insn.dst_reg;
> +		zext_patch[1].dst_reg = load_reg;
> +		zext_patch[1].src_reg = load_reg;
> 		patch = zext_patch;
> 		patch_len = 2;
> apply_patch_buffer:
>
> However, this doesn't seem to affect x86_64.

Right, but it will affect x86-32. It doesn't implement atomics yet,
but would be good to keep zext correct.

> > Do you know what xchg does on x86? What about arm64 with cas?
>
> xchg always zeroes out the upper half.
> Unlike x86_64's cmpxchg, arm64's cas is specified to always zero out
> the upper half, even if the values match. I don't have access to an
> ARMv8.1 machine to test this, but at least QEMU does behave this way.
> s390's cs does not zero out the upper half, so we need to use llgfr in
> addition (which doesn't sound like a big deal to me).

thanks for checking!

Brendan,
could you please follow up with x64 jit fix to add 'mov eax,eax' for
u32-sized cmpxchg?
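A sketch of the kind of change being requested here (illustrative only,
written against my reading of the EMIT*() byte-emitter macros in
arch/x86/net/bpf_jit_comp.c; the actual fix was left to Brendan and may
differ):

	/* After emitting "lock cmpxchg" for a BPF_W BPF_CMPXCHG, force
	 * the zero-extension that the CPU skips when the comparison
	 * succeeds. A 32-bit register write clears bits 63:32, so
	 * "mov eax, eax" (0x89 0xc0) zero-extends rax (i.e. R0) in
	 * both the match and the mismatch case.
	 */
	if (BPF_SIZE(insn->code) == BPF_W && insn->imm == BPF_CMPXCHG)
		EMIT2(0x89, 0xC0);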
Thanks a lot for bringing this up Ilya!

On Wed, 10 Feb 2021 at 19:08, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
[...]
> Brendan,
> could you please follow up with x64 jit fix to add 'mov eax,eax' for
> u32-sized cmpxchg?

Ack, will do.
--- a/tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c
+++ b/tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c
@@ -57,8 +57,8 @@
 	BPF_MOV32_IMM(BPF_REG_1, 4),
 	BPF_MOV32_IMM(BPF_REG_0, 3),
 	BPF_ATOMIC_OP(BPF_W, BPF_CMPXCHG, BPF_REG_10, BPF_REG_1, -4),
-	/* if (old != 3) exit(4); */
-	BPF_JMP32_IMM(BPF_JEQ, BPF_REG_0, 3, 2),
+	/* if ((u64)old != 3) exit(4); */
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 3, 2),
 	BPF_MOV32_IMM(BPF_REG_0, 4),
 	BPF_EXIT_INSN(),
 	/* if (val != 4) exit(5); */
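For readers unfamiliar with the test macros, BPF_ATOMIC_OP() expands
roughly as follows (per include/linux/filter.h at the time, modulo
exact formatting); the field layout is what drives the zext discussion
above -- dst_reg is the memory base, while the fetched value lands in
R0:

	BPF_ATOMIC_OP(BPF_W, BPF_CMPXCHG, BPF_REG_10, BPF_REG_1, -4)
	/* expands to approximately: */
	((struct bpf_insn) {
		.code    = BPF_STX | BPF_SIZE(BPF_W) | BPF_ATOMIC,
		.dst_reg = BPF_REG_10,	/* memory base: the stack frame */
		.src_reg = BPF_REG_1,	/* value stored on a match */
		.off     = -4,		/* fp-4, the u32 operand */
		.imm     = BPF_CMPXCHG,	/* expected value is taken from R0;
					 * the old value is fetched back
					 * into R0 */
	})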