Message ID: d03d8c3305e311c6cb29924119b5eecae8370bbc.1738888641.git.yepeilin@google.com (mailing list archive)
State: New
Series: Introduce load-acquire and store-release BPF instructions
On Fri, 2025-02-07 at 02:05 +0000, Peilin Ye wrote: > Introduce BPF instructions with load-acquire and store-release > semantics, as discussed in [1]. The following new flags are defined: > > BPF_ATOMIC_LOAD 0x10 > BPF_ATOMIC_STORE 0x20 > BPF_ATOMIC_TYPE(imm) ((imm) & 0xf0) > > BPF_RELAXED 0x0 > BPF_ACQUIRE 0x1 > BPF_RELEASE 0x2 > BPF_ACQ_REL 0x3 > BPF_SEQ_CST 0x4 > > BPF_LOAD_ACQ (BPF_ATOMIC_LOAD | BPF_ACQUIRE) > BPF_STORE_REL (BPF_ATOMIC_STORE | BPF_RELEASE) > > A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm' > field set to BPF_LOAD_ACQ (0x11). > > Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction > with > the 'imm' field set to BPF_STORE_REL (0x22). > > Unlike existing atomic operations that only support BPF_W (32-bit) > and > BPF_DW (64-bit) size modifiers, load-acquires and store-releases also > support BPF_B (8-bit) and BPF_H (16-bit). An 8- or 16-bit load- > acquire > zero-extends the value before writing it to a 32-bit register, just > like > ARM64 instruction LDARH and friends. > > As an example, consider the following 64-bit load-acquire BPF > instruction: > > db 10 00 00 11 00 00 00 r0 = load_acquire((u64 *)(r1 + 0x0)) > > opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX > imm (0x00000011): BPF_LOAD_ACQ > > Similarly, a 16-bit BPF store-release: > > cb 21 00 00 22 00 00 00 store_release((u16 *)(r1 + 0x0), w2) > > opcode (0xcb): BPF_ATOMIC | BPF_H | BPF_STX > imm (0x00000022): BPF_STORE_REL > > In arch/{arm64,s390,x86}/net/bpf_jit_comp.c, have > bpf_jit_supports_insn(..., /*in_arena=*/true) return false for the > new > instructions, until the corresponding JIT compiler supports them. > > [1] > https://lore.kernel.org/all/20240729183246.4110549-1-yepeilin@google.com/ > > Acked-by: Eduard Zingerman <eddyz87@gmail.com> > Signed-off-by: Peilin Ye <yepeilin@google.com> > --- > arch/arm64/net/bpf_jit_comp.c | 4 +++ > arch/s390/net/bpf_jit_comp.c | 14 +++++--- > arch/x86/net/bpf_jit_comp.c | 4 +++ > include/linux/bpf.h | 11 ++++++ > include/linux/filter.h | 2 ++ > include/uapi/linux/bpf.h | 13 +++++++ > kernel/bpf/core.c | 63 ++++++++++++++++++++++++++++++-- > -- > kernel/bpf/disasm.c | 12 +++++++ > kernel/bpf/verifier.c | 45 ++++++++++++++++++++++-- > tools/include/uapi/linux/bpf.h | 13 +++++++ > 10 files changed, 168 insertions(+), 13 deletions(-) Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> s390x has a strong memory model, and the regular load and store instructions are atomic as long as operand addresses are aligned. IIUC the verifier already enforces this unless BPF_F_ANY_ALIGNMENT is set, in which case whoever loaded the program is responsible for the consequences: memory accesses that happen to be unaligned would not trigger an exception, but they would not be atomic either. So I can implement the new instructions as normal loads/stores after this series is merged.
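For concreteness: the ordering the two new instructions provide is that of the kernel's smp_load_acquire()/smp_store_release() pair, which is also what the interpreter changes in the patch below use. A minimal, purely illustrative kernel-C sketch (not code from this series) of the message-passing pattern they enable:

#include <linux/types.h>
#include <linux/atomic.h>	/* smp_load_acquire(), smp_store_release() */

static struct { u64 data; u64 ready; } shared;

/* Writer publishes data, then sets the flag with release semantics
 * (what BPF_STORE_REL expresses).
 */
static void producer(u64 v)
{
	shared.data = v;
	smp_store_release(&shared.ready, 1);
}

/* Reader observes the flag with acquire semantics (BPF_LOAD_ACQ); if it
 * sees ready == 1, it is guaranteed to also see data == v.  On s390x,
 * as noted above, both sides can be plain aligned loads/stores.
 */
static u64 consumer(void)
{
	if (smp_load_acquire(&shared.ready))
		return shared.data;
	return 0;
}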
Hi Ilya, On Fri, Feb 07, 2025 at 12:28:43PM +0100, Ilya Leoshkevich wrote: > Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Thanks! > s390x has a strong memory model, and the regular load and store > instructions are atomic as long as operand addresses are aligned. I see. > IIUC the verifier already enforces this unless BPF_F_ANY_ALIGNMENT > is set, in which case whoever loaded the program is responsible for the > consequences: memory accesses that happen to be unaligned would > not trigger an exception, but they would not be atomic either. The verifier rejects misaligned BPF_ATOMIC instructions since commit ca36960211eb ("bpf: allow xadd only on aligned memory"), even if BPF_F_ANY_ALIGNMENT is set - so this patch makes the verifier reject misaligned load-acquires and store-releases, too, to keep the behaviour consistent. Specifically, check_atomic_load() calls check_load_mem() (and check_atomic_store() calls check_store_reg()) with the @strict_alignment_once argument set to true. See also selftests load_acquire_misaligned() and store_release_misaligned() in PATCH 8/9. > So I can implement the new instructions as normal loads/stores after > this series is merged. That would be great! Thanks, Peilin Ye
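(For reference, here is a hedged sketch of roughly what such a misaligned-access test can look like. The __description/__failure/__msg/__imm_insn/__clobber_all helpers come from the selftest harness's bpf_misc.h, BPF_ATOMIC_OP() from include/linux/filter.h, and the actual load_acquire_misaligned() in PATCH 8/9 may differ in its details.)

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include "bpf_misc.h"	/* __description, __failure, __msg, __imm_insn, __clobber_all */

SEC("socket")
__description("load-acquire, misaligned")
__failure __msg("misaligned stack access")
__naked void load_acquire_misaligned_sketch(void)
{
	asm volatile (
	"r1 = 0;"
	"*(u64 *)(r10 - 8) = r1;"
	/* w0 = load_acquire((u32 *)(r10 - 7)); one byte off the aligned
	 * slot, so the verifier rejects it even under BPF_F_ANY_ALIGNMENT
	 */
	".8byte %[load_acquire_insn];"
	"exit;"
	:
	: __imm_insn(load_acquire_insn,
		     BPF_ATOMIC_OP(BPF_W, BPF_LOAD_ACQ, BPF_REG_0, BPF_REG_10, -7))
	: __clobber_all);
}

char _license[] SEC("license") = "GPL";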
On Thu, Feb 6, 2025 at 6:06 PM Peilin Ye <yepeilin@google.com> wrote: > > Introduce BPF instructions with load-acquire and store-release > semantics, as discussed in [1]. The following new flags are defined: > > BPF_ATOMIC_LOAD 0x10 > BPF_ATOMIC_STORE 0x20 > BPF_ATOMIC_TYPE(imm) ((imm) & 0xf0) > > BPF_RELAXED 0x0 > BPF_ACQUIRE 0x1 > BPF_RELEASE 0x2 > BPF_ACQ_REL 0x3 > BPF_SEQ_CST 0x4 I still don't like this. Earlier you said: > If yes, I think we either: > > (a) add more flags to imm<4-7>: maybe LOAD_SEQ_CST (0x3) and > STORE_SEQ_CST (0x6); need to skip OR (0x4) and AND (0x5) used by > RMW atomics > (b) specify memorder in imm<0-3> > > I chose (b) for fewer "What would be a good numerical value so that RMW > atomics won't need to use it in imm<4-7>?" questions to answer. > > If we're having dedicated fields for memorder, I think it's better to > define all possible values once and for all, just so that e.g. 0x2 will > always mean RELEASE in a memorder field. Initially I defined all six of > them [2], then Yonghong suggested dropping CONSUME [3]. I don't think we should be defining "all possible values", since these are the values that llvm and the C model support, but do we have any plans to support anything but ld_acq/st_rel ? I haven't heard anything. What even is the meaning of BPF_ATOMIC_LOAD | BPF_ACQ_REL ? What is the verifier supposed to do? reject for now? and then what? Map to what insn? These values might imply that bpf infra is supposed to map all the values to cpu instructions, but that's not what we're doing here. We're only dealing with two specific instructions. We're not defining a memory model for all future new instructions. pw-bot: cr
Hi Alexei, On Sat, Feb 08, 2025 at 01:30:46PM -0800, Alexei Starovoitov wrote: > > Introduce BPF instructions with load-acquire and store-release > > semantics, as discussed in [1]. The following new flags are defined: > > > > BPF_ATOMIC_LOAD 0x10 > > BPF_ATOMIC_STORE 0x20 > > BPF_ATOMIC_TYPE(imm) ((imm) & 0xf0) > > > > BPF_RELAXED 0x0 > > BPF_ACQUIRE 0x1 > > BPF_RELEASE 0x2 > > BPF_ACQ_REL 0x3 > > BPF_SEQ_CST 0x4 > > I still don't like this. > > Earlier you said: > > > If yes, I think we either: > > > > (a) add more flags to imm<4-7>: maybe LOAD_SEQ_CST (0x3) and > > STORE_SEQ_CST (0x6); need to skip OR (0x4) and AND (0x5) used by > > RMW atomics > > (b) specify memorder in imm<0-3> > > > > I chose (b) for fewer "What would be a good numerical value so that RMW > > atomics won't need to use it in imm<4-7>?" questions to answer. > > > > If we're having dedicated fields for memorder, I think it's better to > > define all possible values once and for all, just so that e.g. 0x2 will > > always mean RELEASE in a memorder field. Initially I defined all six of > > them [2], then Yonghong suggested dropping CONSUME [3]. > > I don't think we should be defining "all possible values", > since these are the values that llvm and C model supports, > but do we have any plans to support anything bug ld_acq/st_rel ? > I haven't heard anything. > What even the meaning of BPF_ATOMIC_LOAD | BPF_ACQ_REL ? > > What does the verifier suppose to do? reject for now? and then what? > Map to what insn? > > These values might imply that bpf infra is supposed to map all the values > to cpu instructions, but that's not what we're doing here. > We're only dealing with two specific instructions. > We're not defining a memory model for all future new instructions. Got it! In v3, I'll change it back to: #define BPF_LOAD_ACQ 0x10 #define BPF_STORE_REL 0x20 Thanks, Peilin Ye
On Sat, Feb 8, 2025 at 6:21 PM Peilin Ye <yepeilin@google.com> wrote: > > Hi Alexei, > > On Sat, Feb 08, 2025 at 01:30:46PM -0800, Alexei Starovoitov wrote: > > > Introduce BPF instructions with load-acquire and store-release > > > semantics, as discussed in [1]. The following new flags are defined: > > > > > > BPF_ATOMIC_LOAD 0x10 > > > BPF_ATOMIC_STORE 0x20 > > > BPF_ATOMIC_TYPE(imm) ((imm) & 0xf0) > > > > > > BPF_RELAXED 0x0 > > > BPF_ACQUIRE 0x1 > > > BPF_RELEASE 0x2 > > > BPF_ACQ_REL 0x3 > > > BPF_SEQ_CST 0x4 > > > > I still don't like this. > > > > Earlier you said: > > > > > If yes, I think we either: > > > > > > (a) add more flags to imm<4-7>: maybe LOAD_SEQ_CST (0x3) and > > > STORE_SEQ_CST (0x6); need to skip OR (0x4) and AND (0x5) used by > > > RMW atomics > > > (b) specify memorder in imm<0-3> > > > > > > I chose (b) for fewer "What would be a good numerical value so that RMW > > > atomics won't need to use it in imm<4-7>?" questions to answer. > > > > > > If we're having dedicated fields for memorder, I think it's better to > > > define all possible values once and for all, just so that e.g. 0x2 will > > > always mean RELEASE in a memorder field. Initially I defined all six of > > > them [2], then Yonghong suggested dropping CONSUME [3]. > > > > I don't think we should be defining "all possible values", > > since these are the values that llvm and C model supports, > > but do we have any plans to support anything bug ld_acq/st_rel ? > > I haven't heard anything. > > What even the meaning of BPF_ATOMIC_LOAD | BPF_ACQ_REL ? > > > > What does the verifier suppose to do? reject for now? and then what? > > Map to what insn? > > > > These values might imply that bpf infra is supposed to map all the values > > to cpu instructions, but that's not what we're doing here. > > We're only dealing with two specific instructions. > > We're not defining a memory model for all future new instructions. > > Got it! In v3, I'll change it back to: > > #define BPF_LOAD_ACQ 0x10 > #define BPF_STORE_REL 0x20 why not 1 and 2 ? All other bits are reserved and the verifier will make sure they're zero, so when/if we need to extend it then it wouldn't matter whether lower 4 bits are reserved or other bits. Say, we decide to support cmpwait_relaxed as a new insn. It can take the value 3 and arm64 JIT will map it to ldxr+wfe+... Then with this new load_acq and cmpwait_relaxed we can efficiently implement both smp_cond_load_relaxed and smp_cond_load_acquire.
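(To make the composition Alexei sketches concrete, here is a hypothetical-only sketch: neither bpf_load_acquire_u64() nor bpf_cmpwait_relaxed_u64() exists; they stand in for the load-acquire instruction proposed in this series and the cmpwait idea floated above, respectively.)

#include <linux/types.h>

/* Hypothetical wrappers around the would-be instructions. */
extern __u64 bpf_load_acquire_u64(const __u64 *ptr);			/* this series */
extern void bpf_cmpwait_relaxed_u64(const __u64 *ptr, __u64 val);	/* future idea */

/* A BPF analogue of smp_cond_load_acquire(ptr, VAL != expected): wait
 * (cheaply, thanks to cmpwait) until the value changes, and return it
 * with acquire ordering.
 */
static inline __u64 bpf_cond_load_acquire(const __u64 *ptr, __u64 expected)
{
	__u64 val;

	for (;;) {
		val = bpf_load_acquire_u64(ptr);
		if (val != expected)
			return val;
		/* park until *ptr may have changed, e.g. arm64 ldxr + wfe */
		bpf_cmpwait_relaxed_u64(ptr, expected);
	}
}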
On Sat, Feb 08, 2025 at 07:46:54PM -0800, Alexei Starovoitov wrote: > > > These values might imply that bpf infra is supposed to map all the values > > > to cpu instructions, but that's not what we're doing here. > > > We're only dealing with two specific instructions. > > > We're not defining a memory model for all future new instructions. > > > > Got it! In v3, I'll change it back to: > > > > #define BPF_LOAD_ACQ 0x10 > > #define BPF_STORE_REL 0x20 > > why not 1 and 2 ? > All other bits are reserved and the verifier will make sure they're zero, > so when/if we need to extend it then it wouldn't matter whether > lower 4 bits are reserved or other bits. > Say, we decide to support cmpwait_relaxed as a new insn. > It can take the value 3 and arm64 JIT will map it to ldxr+wfe+... > > Then with this new load_acq and cmpwait_relaxed we can efficiently > implement both smp_cond_load_relaxed and smp_cond_load_acquire. Ah, I see. When you suggested "LOAD_ACQ=1 and STORE_REL=2" earlier, I didn't realize you meant 1 and 2 in imm<0-3>. Thanks, Peilin Ye
Hi Alexei, On Sat, Feb 08, 2025 at 07:46:54PM -0800, Alexei Starovoitov wrote: > > Got it! In v3, I'll change it back to: > > > > #define BPF_LOAD_ACQ 0x10 > > #define BPF_STORE_REL 0x20 > > why not 1 and 2 ? I just realized that we can't do 1 and 2 because BPF_ADD | BPF_FETCH also equals 1. > All other bits are reserved and the verifier will make sure they're zero IOW, we can't tell if imm<4-7> is reserved or BPF_ADD (0x00). What would you suggest? Maybe: #define BPF_ATOMIC_LD_ST 0x10 #define BPF_LOAD_ACQ 0x1 #define BPF_STORE_REL 0x2 ? > , so when/if we need to extend it then it wouldn't matter whether > lower 4 bits are reserved or other bits. > Say, we decide to support cmpwait_relaxed as a new insn. > It can take the value 3 and arm64 JIT will map it to ldxr+wfe+... > > Then with this new load_acq and cmpwait_relaxed we can efficiently > implement both smp_cond_load_relaxed and smp_cond_load_acquire. Thanks, Peilin Ye
On Mon, Feb 10, 2025 at 10:51:11PM +0000, Peilin Ye wrote: > > > #define BPF_LOAD_ACQ 0x10 > > > #define BPF_STORE_REL 0x20 > > > > why not 1 and 2 ? > > I just realized that we can't do 1 and 2 because BPF_ADD | BPF_FETCH > also equals 1. > > > All other bits are reserved and the verifier will make sure they're zero > > IOW, we can't tell if imm<4-7> is reserved or BPF_ADD (0x00). What > would you suggest? Maybe: > > #define BPF_ATOMIC_LD_ST 0x10 > > #define BPF_LOAD_ACQ 0x1 > #define BPF_STORE_REL 0x2 > > ? Or, how about reusing 0xb in imm<4-7>: #define BPF_ATOMIC_LD_ST 0xb0 #define BPF_LOAD_ACQ 0x1 #define BPF_STORE_REL 0x2 0xb is BPF_MOV in BPFArithOp<>, and we'll never need it for BPF_ATOMIC. Instead of moving values between registers, we now "move" values from/to the memory - if I can think of it that way. - - - Or, do we want to start to use the remaining bits of the imm field (i.e. imm<8-31>) ? Thanks, Peilin Ye
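(For reference, the collisions being discussed follow directly from values that already exist in the uapi header; the asserts below define nothing new, they only restate current <linux/bpf.h> constants:)

#include <linux/bpf.h>	/* existing BPF_ADD, BPF_SUB, BPF_MOV, BPF_FETCH, ... */

_Static_assert((BPF_ADD | BPF_FETCH) == 0x01, "1 aliases BPF_ADD | BPF_FETCH (fetch-add)");
_Static_assert(BPF_SUB == 0x10, "0x10 aliases BPF_SUB");
_Static_assert((BPF_SUB | BPF_FETCH) == 0x11, "0x11 aliases BPF_SUB | BPF_FETCH");
_Static_assert(BPF_MOV == 0xb0, "0xb0 (BPF_MOV) has no RMW meaning to collide with");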
On Wed, Feb 12, 2025 at 2:14 PM Peilin Ye <yepeilin@google.com> wrote: > > On Mon, Feb 10, 2025 at 10:51:11PM +0000, Peilin Ye wrote: > > > > #define BPF_LOAD_ACQ 0x10 > > > > #define BPF_STORE_REL 0x20 so that was broken then, since BPF_SUB == 0x10 ? And the original thing was also completely broken for BPF_ATOMIC_LOAD | BPF_RELAXED == 0x10 == BPF_SUB ? so much for "let's define relaxed, acquire, release, acq_rel for completeness". :( > > > > > > why not 1 and 2 ? > > > > I just realized that we can't do 1 and 2 because BPF_ADD | BPF_FETCH > > also equals 1. > > > > > All other bits are reserved and the verifier will make sure they're zero > > > > IOW, we can't tell if imm<4-7> is reserved or BPF_ADD (0x00). What > > would you suggest? Maybe: > > > > #define BPF_ATOMIC_LD_ST 0x10 > > > > #define BPF_LOAD_ACQ 0x1 > > #define BPF_STORE_REL 0x2 This is also broken, since BPF_ATOMIC_LD_ST | BPF_LOAD_ACQ == 0x11 == BPF_SUB | BPF_FETCH. BPF_SUB | BPF_FETCH is invalid at the moment, but such aliasing is bad. > > > > ? > > Or, how about reusing 0xb in imm<4-7>: > > #define BPF_ATOMIC_LD_ST 0xb0 > > #define BPF_LOAD_ACQ 0x1 > #define BPF_STORE_REL 0x2 > > 0xb is BPF_MOV in BPFArithOp<>, and we'll never need it for BPF_ATOMIC. > Instead of moving values between registers, we now "move" values from/to > the memory - if I can think of it that way. and BPF_ATOMIC_LD_ST | BPF_LOAD_ACQ would == BPF_MOV | BPF_FETCH ? Not pretty and confusing. BPF_FETCH modifier means that "do whatever opcode says to do, like add in memory, but also return the value into insn->src_reg" Which doesn't fit this BPF_ATOMIC_LD_ST | BPF_LOAD_ACQ semantics which loads into _dst_reg_. How about: #define BPF_LOAD_ACQ 2 #define BPF_STORE_REL 3 and only use them with BPF_MOV like imm = BPF_MOV | BPF_LOAD_ACQ - is actual load acquire imm = BPF_MOV | BPF_STORE_REL - release Though 2 stands on its own, it's also equal to BPF_ADD | BPF_LOAD_ACQ, which is kinda ugly, so I don't like to use 2 alone. > Or, do we want to start to use the remaining bits of the imm field (i.e. > imm<8-31>) ? Maybe. Sort-of. Since #define BPF_CMPXCHG (0xf0 | BPF_FETCH) another option would be: #define BPF_LOAD_ACQ 0x100 #define BPF_STORE_REL 0x110 essentially extending op type to: BPF_ATOMIC_TYPE(imm) ((imm) & 0x1f0) All options are not great. I feel we need to step back. Is there an architecture that has sign-extending load-acquire ? Looks like arm doesn't, and I couldn't find any arch that does. Then maybe we should reconsider BPF_LDX/STX and use BPF_MODE to distinguish from normal ldx/stx #define BPF_ACQ_REL 0xe0 BPF_LDX | BPF_ACQ_REL | BPF_W BPF_STX | BPF_ACQ_REL | BPF_W ?
Hi Alexei, On Wed, Feb 12, 2025 at 09:55:43PM -0800, Alexei Starovoitov wrote: > > > > > #define BPF_LOAD_ACQ 0x10 > > > > > #define BPF_STORE_REL 0x20 > > so that was broken then, > since BPF_SUB 0x10 ? > > And original thing was also completely broken for > BPF_ATOMIC_LOAD | BPF_RELAXED == 0x10 == BPF_SUB ? > > so much for "lets define relaxed, acquire, > release, acq_rel for completeness". > :( > > > > > why not 1 and 2 ? > > > > > > I just realized To clarify, by "just realized" I meant I forgot BPF_ADD equals 0x00 until (I had coffee on) Monday :-) I wouldn't call it completely broken though. For full context, initially I picked [1] 0x1 and 0xb in imm<4-7> because: * 0x1 is BPF_SUB in BPFArithOp<>, and atomic SUB is implemented using NEG + ADD, quoting a comment in LLVM source: // atomic_load_sub can be represented as a neg followed // by an atomic_load_add. Though admittedly atomic SUB _could_ have its own insn. * 0xb is BPF_MOV, which is not applicable for atomic (memory) operations, as already discussed After discussing [2] this with Yonghong, I changed it to 0x1 and 0x2, because 0x2 is BPF_MUL and we are unlikely to support atomic multiplication. Then, following your suggestion to discuss the encoding on-list, I left this as an open topic in RFC v1 cover letter (then documented it in PATCH v1 8/8 and v2 9/9). TL;DR: I wasn't aware that you were against having "aliases" (I do still believe it's safe to pick 0xb). > > > that we can't do 1 and 2 because BPF_ADD | BPF_FETCH also equals > > > 1. > > > > > > > All other bits are reserved and the verifier will make sure they're zero > > > > > > IOW, we can't tell if imm<4-7> is reserved or BPF_ADD (0x00). What > > > would you suggest? Maybe: > > > > > > #define BPF_ATOMIC_LD_ST 0x10 > > > > > > #define BPF_LOAD_ACQ 0x1 > > > #define BPF_STORE_REL 0x2 > > This is also broken, since > BPF_ATOMIC_LD_ST | BPF_LOAD_ACQ == 0x11 == BPF_SUB | BPF_FETCH. > > BPF_SUB | BPF_FETCH is invalid at the moment, > but such aliasing is bad. > > > > ? > > > > Or, how about reusing 0xb in imm<4-7>: > > > > #define BPF_ATOMIC_LD_ST 0xb0 > > > > #define BPF_LOAD_ACQ 0x1 > > #define BPF_STORE_REL 0x2 > > > > 0xb is BPF_MOV in BPFArithOp<>, and we'll never need it for BPF_ATOMIC. > > Instead of moving values between registers, we now "move" values from/to > > the memory - if I can think of it that way. > > and BPF_ATOMIC_LD_ST | BPF_LOAD_ACQ would == BPF_MOV | BPF_FETCH ? > > Not pretty and confusing. > > BPF_FETCH modifier means that "do whatever opcode says to do, > like add in memory, but also return the value into insn->src_reg" > > Which doesn't fit this BPF_ATOMIC_LD_ST | BPF_LOAD_ACQ semantics > which loads into _dst_reg_. I think we can have different imm<0-3> "namespace"s depending on different imm<4-7> values? So that 0x1 in imm<0-3> means BPF_FETCH for existing RMW operations, and BPF_LOAD_ACQ for loads/stores. Just like (browsing instruction-set.rst) for "64-bit immediate instructions", the imm field means different things depending on the value in src_reg? If I change PATCH v2 9/9 to say the following in instruction-set.rst: ``` These operations are categorized based on the second lowest nibble (bits 4-7) of the 'imm' field: * ``ATOMIC_LD_ST`` indicates an atomic load or store operation (see `Atomic load and store operations`_). * All other defined values indicate an atomic read-modify-write operation, as described in the following section. ``` The section for loads/stores will have its own table explaining what imm<0-3> means. 
> How about: > #define BPF_LOAD_ACQ 2 > #define BPF_STORE_REL 3 > > and only use them with BPF_MOV like > > imm = BPF_MOV | BPF_LOAD_ACQ - is actual load acquire > imm = BPF_MOV | BPF_STORE_REL - release > > Thought 2 stands on its own, > it's also equal to BPF_ADD | BPF_LOAD_ACQ > which is kinda ugly, > so I don't like to use 2 alone. Totally agree - if we use 2 and 3 alone, zero in imm<4-7> would mean "reserved" for load_acq/store_rel, and "BPF_ADD" for add/fetch_add. > > Or, do we want to start to use the remaining bits of the imm field (i.e. > > imm<8-31>) ? > > Maybe. > Sort-of. > Since #define BPF_CMPXCHG (0xf0 | BPF_FETCH) > another option would be: > > #define BPF_LOAD_ACQ 0x100 > #define BPF_STORE_REL 0x110 > > essentially extending op type to: > BPF_ATOMIC_TYPE(imm) ((imm) & 0x1f0) Why, it sounds like a great idea! If we extend the op_type field from imm<4-7> to imm<4-11>, 256 numbers is more than we'll ever need? After all we'd still need to worry about e.g. cmpwait_relaxed you mentioned earlier. I am guessing that we'll want to put it under BPF_ATOMIC as well, since XCHG and CMPXCHG are here. If we take your approach, cmpwait_relaxed can be easily defined as e.g.: #define BPF_CMPWAIT_RELAXED 0x120 (FWIW, I was imagining a subtype/subclass flag in maybe imm<8-11> or imm<28-31> (or make it 8 bits for 256 subtypes/subclasses), so that 0x0 means read-modify-write subclass, then 0x1 means maybe load/store subclass" etc.) > All options are not great. > I feel we need to step back. > Is there an architecture that has sign extending load acquire ? IIUC, if I grep the LLVM source like: $ git grep -B 100 -A 100 getExtendForAtomicOps -- llvm/lib/Target/ \ | grep ISD::SIGN_EXTEND llvm/lib/Target/LoongArch/LoongArchISelLowering.h- return ISD::SIGN_EXTEND; llvm/lib/Target/Mips/MipsISelLowering.h- return ISD::SIGN_EXTEND; llvm/lib/Target/RISCV/RISCVISelLowering.h- return ISD::SIGN_EXTEND; So LoongArch, Mips and RISCV it seems? Semi-related, but it would be non-trivial (if not infeasible) to support both zext and sext load-acquire for LLVM BPF backend, because LLVM core expects each arch to pick from SIGN_EXTEND, ZERO_EXTEND and ANY_EXTEND for its atomic ops. See my earlier investigation: https://github.com/llvm/llvm-project/pull/108636#issuecomment-2433844760 > Looks like arm doesn't, and I couldn't find any arch that does. > Then maybe we should reconsider BPF_LDX/STX and use BPF_MODE > to distinguish from normal ldx/stx > > #define BPF_ACQ_REL 0xe0 > > BPF_LDX | BPF_ACQ_REL | BPF_W > BPF_STX | BPF_ACQ_REL | BPF_W > > ? [1] https://github.com/llvm/llvm-project/pull/108636#issuecomment-2398916882 [2] https://github.com/llvm/llvm-project/pull/108636#discussion_r1815927568 Thanks, Peilin Ye
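(As a side note on the zero- vs. sign-extension point above: the cover letter specifies zero-extension for 8/16-bit load-acquires, like arm64 LDARB/LDARH. A kernel-C sketch of the difference, with the sign-extending flavour shown only as a hypothetical:)

#include <linux/types.h>
#include <linux/atomic.h>

/* What a 16-bit BPF load-acquire does: the result is zero-extended. */
static u64 load_acquire_h_zext(u16 *p)
{
	return (u64)smp_load_acquire(p);	/* *p == 0x8000 -> 0x0000000000008000 */
}

/* Hypothetical sign-extending flavour (what LoongArch/MIPS/RISC-V atomics
 * prefer, per the grep above) -- not part of this series.
 */
static u64 load_acquire_h_sext(u16 *p)
{
	return (u64)(s16)smp_load_acquire(p);	/* *p == 0x8000 -> 0xffffffffffff8000 */
}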
> On Wed, Feb 12, 2025 at 09:55:43PM -0800, Alexei Starovoitov wrote: > > How about: > > #define BPF_LOAD_ACQ 2 > > #define BPF_STORE_REL 3 > > > > and only use them with BPF_MOV like > > > > imm = BPF_MOV | BPF_LOAD_ACQ - is actual load acquire > > imm = BPF_MOV | BPF_STORE_REL - release Based on everything discussed, should we proceed with the above suggestion? Specifically: #define BPF_LD_ST BPF_MOV /* 0xb0 */ #define BPF_LOAD_ACQ 0x2 #define BPF_STORE_REL 0x3 (And document that BPF_LD_ST cannot be used together with BPF_FETCH.) So that: 1. We avoid "aliasing" with BPF_SUB or BPF_MUL at all. 2. If we need to add cmpwait_relaxed, we can then expand imm<4-7> to e.g. imm<4-11> and do something similar to: XCHG 0x0e0 | FETCH CMPXCHG 0x0f0 | FETCH +CMPWAIT_RELAXED 0x100 So that <asm/cmpxchg.h> operations can "stay together". 3. In the hypothetical scenario where we need seq_cst loads/stores, we add new flags to imm<0-3>. Though considering that: * BPF_FETCH doesn't apply to loads/stores, and * BPF_LOAD_ACQ and BPF_STORE_REL don't apply to RMW operations * We only have 15 numbers for imm<0-3> flags I do think it makes sense to define BPF_LOAD_ACQ and BPF_STORE_REL as 1 and 2 (instead of 2 and 3). With proper documentation I believe it'll be clear that load/store and RMW are separate categories, with different ways of using imm<0-3> (or, different imm<0-3> "namespace"s). That said, I'm happy to do either 2 and 3, or 1 and 2. I'll start making changes for v3 and the LLVM PR, according to the description above (with BPF_LOAD_ACQ=2, BPF_STORE_REL=3). Please advise me of any further concerns. Thanks, Peilin Ye
On Fri, Feb 14, 2025 at 6:34 PM Peilin Ye <yepeilin@google.com> wrote: > > > On Wed, Feb 12, 2025 at 09:55:43PM -0800, Alexei Starovoitov wrote: > > > How about: > > > #define BPF_LOAD_ACQ 2 > > > #define BPF_STORE_REL 3 > > > > > > and only use them with BPF_MOV like > > > > > > imm = BPF_MOV | BPF_LOAD_ACQ - is actual load acquire > > > imm = BPF_MOV | BPF_STORE_REL - release > > Based on everything discussed, should we proceed with the above > suggestion? Specifically: > > #define BPF_LD_ST BPF_MOV /* 0xb0 */ The aliasing still bothers me. I hated doing it when going from cBPF to eBPF, but we only had 8-bit to work with. Here we have 32-bit. Aliases make disassemblers trickier, since value no longer translates to string as-is. It depends on the context. There is probably no use for BPF_MOV operation in atomic context, but by reusing BPF_ADD, BPF_XOR, etc in atomic we signed up ourselves for all of alu ops. That's why BPF_XCHG and BPF_CMPXCHG are outside of alu op range. So my preference is to do: #define BPF_LOAD_ACQ 0x100 #define BPF_STORE_REL 0x110 #define BPF_CMPWAIT_RELAXED 0x120 and keep growing it. We burn the first nibble, but so be it.
On Fri, Feb 14, 2025 at 07:04:13PM -0800, Alexei Starovoitov wrote: > > > > How about: > > > > #define BPF_LOAD_ACQ 2 > > > > #define BPF_STORE_REL 3 > > > > > > > > and only use them with BPF_MOV like > > > > > > > > imm = BPF_MOV | BPF_LOAD_ACQ - is actual load acquire > > > > imm = BPF_MOV | BPF_STORE_REL - release > > > > Based on everything discussed, should we proceed with the above > > suggestion? Specifically: > > > > #define BPF_LD_ST BPF_MOV /* 0xb0 */ > > The aliasing still bothers me. > I hated doing it when going from cBPF to eBPF, > but we only had 8-bit to work with. > Here we have 32-bit. > Aliases make disassemblers trickier, since value no longer > translates to string as-is. It depends on the context. > There is probably no use for BPF_MOV operation in atomic > context, but by reusing BPF_ADD, BPF_XOR, etc in atomic > we signed up ourselves for all of alu ops. I see. > That's why BPF_XCHG and BPF_CMPXCHG are outside > of alu op range. > > So my preference is to do: > #define BPF_LOAD_ACQ 0x100 > #define BPF_STORE_REL 0x110 > #define BPF_CMPWAIT_RELAXED 0x120 > > and keep growing it. > We burn the first nibble, but so be it. Got it! In instruction-set.rst I'll make it clear that imm<0-3> must be zero for load_acq/store_rel. Cheers, Peilin Ye
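(Note for readers skimming ahead: the patch below is still v2 and encodes BPF_LOAD_ACQ/BPF_STORE_REL as 0x11/0x22; the scheme agreed in this sub-thread would look roughly like the following in v3 -- reconstructed from the discussion here, not quoted from v3 itself:)

/* v3-style encoding sketch: a nibble above the existing RMW range, with
 * imm<0-3> required to be zero and reserved for future use.
 */
#define BPF_LOAD_ACQ	0x100	/* dst_reg = smp_load_acquire(src_reg + off16) */
#define BPF_STORE_REL	0x110	/* smp_store_release(dst_reg + off16, src_reg) */
/* room to grow, e.g. a future BPF_CMPWAIT_RELAXED 0x120 */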
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c index 8446848edddb..8c3b47d9e441 100644 --- a/arch/arm64/net/bpf_jit_comp.c +++ b/arch/arm64/net/bpf_jit_comp.c @@ -2667,8 +2667,12 @@ bool bpf_jit_supports_insn(struct bpf_insn *insn, bool in_arena) if (!in_arena) return true; switch (insn->code) { + case BPF_STX | BPF_ATOMIC | BPF_B: + case BPF_STX | BPF_ATOMIC | BPF_H: case BPF_STX | BPF_ATOMIC | BPF_W: case BPF_STX | BPF_ATOMIC | BPF_DW: + if (bpf_atomic_is_load_store(insn)) + return false; if (!cpus_have_cap(ARM64_HAS_LSE_ATOMICS)) return false; } diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c index 9d440a0b729e..0776dfde2dba 100644 --- a/arch/s390/net/bpf_jit_comp.c +++ b/arch/s390/net/bpf_jit_comp.c @@ -2919,10 +2919,16 @@ bool bpf_jit_supports_arena(void) bool bpf_jit_supports_insn(struct bpf_insn *insn, bool in_arena) { - /* - * Currently the verifier uses this function only to check which - * atomic stores to arena are supported, and they all are. - */ + if (!in_arena) + return true; + switch (insn->code) { + case BPF_STX | BPF_ATOMIC | BPF_B: + case BPF_STX | BPF_ATOMIC | BPF_H: + case BPF_STX | BPF_ATOMIC | BPF_W: + case BPF_STX | BPF_ATOMIC | BPF_DW: + if (bpf_atomic_is_load_store(insn)) + return false; + } return true; } diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index a43fc5af973d..f0c31c940fb8 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -3771,8 +3771,12 @@ bool bpf_jit_supports_insn(struct bpf_insn *insn, bool in_arena) if (!in_arena) return true; switch (insn->code) { + case BPF_STX | BPF_ATOMIC | BPF_B: + case BPF_STX | BPF_ATOMIC | BPF_H: case BPF_STX | BPF_ATOMIC | BPF_W: case BPF_STX | BPF_ATOMIC | BPF_DW: + if (bpf_atomic_is_load_store(insn)) + return false; if (insn->imm == (BPF_AND | BPF_FETCH) || insn->imm == (BPF_OR | BPF_FETCH) || insn->imm == (BPF_XOR | BPF_FETCH)) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index f3f50e29d639..96c936fd125f 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -990,6 +990,17 @@ static inline bool bpf_pseudo_func(const struct bpf_insn *insn) return bpf_is_ldimm64(insn) && insn->src_reg == BPF_PSEUDO_FUNC; } +/* Given a BPF_ATOMIC instruction @atomic_insn, return true if it is an + * atomic load or store, and false if it is a read-modify-write instruction. 
+ */ +static inline bool +bpf_atomic_is_load_store(const struct bpf_insn *atomic_insn) +{ + const s32 type = BPF_ATOMIC_TYPE(atomic_insn->imm); + + return type == BPF_ATOMIC_LOAD || type == BPF_ATOMIC_STORE; +} + struct bpf_prog_ops { int (*test_run)(struct bpf_prog *prog, const union bpf_attr *kattr, union bpf_attr __user *uattr); diff --git a/include/linux/filter.h b/include/linux/filter.h index a3ea46281595..e36812a5b01f 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -364,6 +364,8 @@ static inline bool insn_is_cast_user(const struct bpf_insn *insn) * BPF_XOR | BPF_FETCH src_reg = atomic_fetch_xor(dst_reg + off16, src_reg); * BPF_XCHG src_reg = atomic_xchg(dst_reg + off16, src_reg) * BPF_CMPXCHG r0 = atomic_cmpxchg(dst_reg + off16, r0, src_reg) + * BPF_LOAD_ACQ dst_reg = smp_load_acquire(src_reg + off16) + * BPF_STORE_REL smp_store_release(dst_reg + off16, src_reg) */ #define BPF_ATOMIC_OP(SIZE, OP, DST, SRC, OFF) \ diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index fff6cdb8d11a..e78306e6e2be 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -51,6 +51,19 @@ #define BPF_XCHG (0xe0 | BPF_FETCH) /* atomic exchange */ #define BPF_CMPXCHG (0xf0 | BPF_FETCH) /* atomic compare-and-write */ +#define BPF_ATOMIC_LOAD 0x10 +#define BPF_ATOMIC_STORE 0x20 +#define BPF_ATOMIC_TYPE(imm) ((imm) & 0xf0) + +#define BPF_RELAXED 0x00 +#define BPF_ACQUIRE 0x01 +#define BPF_RELEASE 0x02 +#define BPF_ACQ_REL 0x03 +#define BPF_SEQ_CST 0x04 + +#define BPF_LOAD_ACQ (BPF_ATOMIC_LOAD | BPF_ACQUIRE) /* load-acquire */ +#define BPF_STORE_REL (BPF_ATOMIC_STORE | BPF_RELEASE) /* store-release */ + enum bpf_cond_pseudo_jmp { BPF_MAY_GOTO = 0, }; diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index da729cbbaeb9..3f3127479a93 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -1663,14 +1663,17 @@ EXPORT_SYMBOL_GPL(__bpf_call_base); INSN_3(JMP, JSET, K), \ INSN_2(JMP, JA), \ INSN_2(JMP32, JA), \ + /* Atomic operations. */ \ + INSN_3(STX, ATOMIC, B), \ + INSN_3(STX, ATOMIC, H), \ + INSN_3(STX, ATOMIC, W), \ + INSN_3(STX, ATOMIC, DW), \ /* Store instructions. */ \ /* Register based. */ \ INSN_3(STX, MEM, B), \ INSN_3(STX, MEM, H), \ INSN_3(STX, MEM, W), \ INSN_3(STX, MEM, DW), \ - INSN_3(STX, ATOMIC, W), \ - INSN_3(STX, ATOMIC, DW), \ /* Immediate based. */ \ INSN_3(ST, MEM, B), \ INSN_3(ST, MEM, H), \ @@ -2152,24 +2155,33 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn) if (BPF_SIZE(insn->code) == BPF_W) \ atomic_##KOP((u32) SRC, (atomic_t *)(unsigned long) \ (DST + insn->off)); \ - else \ + else if (BPF_SIZE(insn->code) == BPF_DW) \ atomic64_##KOP((u64) SRC, (atomic64_t *)(unsigned long) \ (DST + insn->off)); \ + else \ + goto default_label; \ break; \ case BOP | BPF_FETCH: \ if (BPF_SIZE(insn->code) == BPF_W) \ SRC = (u32) atomic_fetch_##KOP( \ (u32) SRC, \ (atomic_t *)(unsigned long) (DST + insn->off)); \ - else \ + else if (BPF_SIZE(insn->code) == BPF_DW) \ SRC = (u64) atomic64_fetch_##KOP( \ (u64) SRC, \ (atomic64_t *)(unsigned long) (DST + insn->off)); \ + else \ + goto default_label; \ break; STX_ATOMIC_DW: STX_ATOMIC_W: + STX_ATOMIC_H: + STX_ATOMIC_B: switch (IMM) { + /* Atomic read-modify-write instructions support only W and DW + * size modifiers. 
+ */ ATOMIC_ALU_OP(BPF_ADD, add) ATOMIC_ALU_OP(BPF_AND, and) ATOMIC_ALU_OP(BPF_OR, or) @@ -2181,20 +2193,59 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn) SRC = (u32) atomic_xchg( (atomic_t *)(unsigned long) (DST + insn->off), (u32) SRC); - else + else if (BPF_SIZE(insn->code) == BPF_DW) SRC = (u64) atomic64_xchg( (atomic64_t *)(unsigned long) (DST + insn->off), (u64) SRC); + else + goto default_label; break; case BPF_CMPXCHG: if (BPF_SIZE(insn->code) == BPF_W) BPF_R0 = (u32) atomic_cmpxchg( (atomic_t *)(unsigned long) (DST + insn->off), (u32) BPF_R0, (u32) SRC); - else + else if (BPF_SIZE(insn->code) == BPF_DW) BPF_R0 = (u64) atomic64_cmpxchg( (atomic64_t *)(unsigned long) (DST + insn->off), (u64) BPF_R0, (u64) SRC); + else + goto default_label; + break; + /* Atomic load and store instructions support all size + * modifiers. + */ + case BPF_LOAD_ACQ: + switch (BPF_SIZE(insn->code)) { +#define LOAD_ACQUIRE(SIZEOP, SIZE) \ + case BPF_##SIZEOP: \ + DST = (SIZE)smp_load_acquire( \ + (SIZE *)(unsigned long)(SRC + insn->off)); \ + break; + LOAD_ACQUIRE(B, u8) + LOAD_ACQUIRE(H, u16) + LOAD_ACQUIRE(W, u32) + LOAD_ACQUIRE(DW, u64) +#undef LOAD_ACQUIRE + default: + goto default_label; + } + break; + case BPF_STORE_REL: + switch (BPF_SIZE(insn->code)) { +#define STORE_RELEASE(SIZEOP, SIZE) \ + case BPF_##SIZEOP: \ + smp_store_release( \ + (SIZE *)(unsigned long)(DST + insn->off), (SIZE)SRC); \ + break; + STORE_RELEASE(B, u8) + STORE_RELEASE(H, u16) + STORE_RELEASE(W, u32) + STORE_RELEASE(DW, u64) +#undef STORE_RELEASE + default: + goto default_label; + } break; default: diff --git a/kernel/bpf/disasm.c b/kernel/bpf/disasm.c index 309c4aa1b026..974d172d6735 100644 --- a/kernel/bpf/disasm.c +++ b/kernel/bpf/disasm.c @@ -267,6 +267,18 @@ void print_bpf_insn(const struct bpf_insn_cbs *cbs, BPF_SIZE(insn->code) == BPF_DW ? "64" : "", bpf_ldst_string[BPF_SIZE(insn->code) >> 3], insn->dst_reg, insn->off, insn->src_reg); + } else if (BPF_MODE(insn->code) == BPF_ATOMIC && + insn->imm == BPF_LOAD_ACQ) { + verbose(cbs->private_data, "(%02x) r%d = load_acquire((%s *)(r%d %+d))\n", + insn->code, insn->dst_reg, + bpf_ldst_string[BPF_SIZE(insn->code) >> 3], + insn->src_reg, insn->off); + } else if (BPF_MODE(insn->code) == BPF_ATOMIC && + insn->imm == BPF_STORE_REL) { + verbose(cbs->private_data, "(%02x) store_release((%s *)(r%d %+d), r%d)\n", + insn->code, + bpf_ldst_string[BPF_SIZE(insn->code) >> 3], + insn->dst_reg, insn->off, insn->src_reg); } else { verbose(cbs->private_data, "BUG_%02x\n", insn->code); } diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 82a5a4acf576..7ebc224bf9cb 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -579,6 +579,13 @@ static bool is_cmpxchg_insn(const struct bpf_insn *insn) insn->imm == BPF_CMPXCHG; } +static bool is_atomic_load_insn(const struct bpf_insn *insn) +{ + return BPF_CLASS(insn->code) == BPF_STX && + BPF_MODE(insn->code) == BPF_ATOMIC && + BPF_ATOMIC_TYPE(insn->imm) == BPF_ATOMIC_LOAD; +} + static int __get_spi(s32 off) { return (-off - 1) / BPF_REG_SIZE; @@ -3481,7 +3488,7 @@ static bool is_reg64(struct bpf_verifier_env *env, struct bpf_insn *insn, } if (class == BPF_STX) { - /* BPF_STX (including atomic variants) has multiple source + /* BPF_STX (including atomic variants) has one or more source * operands, one of which is a ptr. Check whether the caller is * asking about it. 
*/ @@ -4095,7 +4102,7 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx, * dreg still needs precision before this insn */ } - } else if (class == BPF_LDX) { + } else if (class == BPF_LDX || is_atomic_load_insn(insn)) { if (!bt_is_reg_set(bt, dreg)) return 0; bt_clear_reg(bt, dreg); @@ -7686,6 +7693,32 @@ static int check_atomic_rmw(struct bpf_verifier_env *env, return 0; } +static int check_atomic_load(struct bpf_verifier_env *env, + struct bpf_insn *insn) +{ + if (!atomic_ptr_type_ok(env, insn->src_reg, insn)) { + verbose(env, "BPF_ATOMIC loads from R%d %s is not allowed\n", + insn->src_reg, + reg_type_str(env, reg_state(env, insn->src_reg)->type)); + return -EACCES; + } + + return check_load_mem(env, insn, true, false, false, "atomic_load"); +} + +static int check_atomic_store(struct bpf_verifier_env *env, + struct bpf_insn *insn) +{ + if (!atomic_ptr_type_ok(env, insn->dst_reg, insn)) { + verbose(env, "BPF_ATOMIC stores into R%d %s is not allowed\n", + insn->dst_reg, + reg_type_str(env, reg_state(env, insn->dst_reg)->type)); + return -EACCES; + } + + return check_store_reg(env, insn, true); +} + static int check_atomic(struct bpf_verifier_env *env, struct bpf_insn *insn) { switch (insn->imm) { @@ -7700,6 +7733,10 @@ static int check_atomic(struct bpf_verifier_env *env, struct bpf_insn *insn) case BPF_XCHG: case BPF_CMPXCHG: return check_atomic_rmw(env, insn); + case BPF_LOAD_ACQ: + return check_atomic_load(env, insn); + case BPF_STORE_REL: + return check_atomic_store(env, insn); default: verbose(env, "BPF_ATOMIC uses invalid atomic opcode %02x\n", insn->imm); @@ -20445,7 +20482,9 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env) insn->code == (BPF_ST | BPF_MEM | BPF_W) || insn->code == (BPF_ST | BPF_MEM | BPF_DW)) { type = BPF_WRITE; - } else if ((insn->code == (BPF_STX | BPF_ATOMIC | BPF_W) || + } else if ((insn->code == (BPF_STX | BPF_ATOMIC | BPF_B) || + insn->code == (BPF_STX | BPF_ATOMIC | BPF_H) || + insn->code == (BPF_STX | BPF_ATOMIC | BPF_W) || insn->code == (BPF_STX | BPF_ATOMIC | BPF_DW)) && env->insn_aux_data[i + delta].ptr_type == PTR_TO_ARENA) { insn->code = BPF_STX | BPF_PROBE_ATOMIC | BPF_SIZE(insn->code); diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 2acf9b336371..4a20a125eb46 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -51,6 +51,19 @@ #define BPF_XCHG (0xe0 | BPF_FETCH) /* atomic exchange */ #define BPF_CMPXCHG (0xf0 | BPF_FETCH) /* atomic compare-and-write */ +#define BPF_ATOMIC_LOAD 0x10 +#define BPF_ATOMIC_STORE 0x20 +#define BPF_ATOMIC_TYPE(imm) ((imm) & 0xf0) + +#define BPF_RELAXED 0x00 +#define BPF_ACQUIRE 0x01 +#define BPF_RELEASE 0x02 +#define BPF_ACQ_REL 0x03 +#define BPF_SEQ_CST 0x04 + +#define BPF_LOAD_ACQ (BPF_ATOMIC_LOAD | BPF_ACQUIRE) /* load-acquire */ +#define BPF_STORE_REL (BPF_ATOMIC_STORE | BPF_RELEASE) /* store-release */ + enum bpf_cond_pseudo_jmp { BPF_MAY_GOTO = 0, };
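For completeness, here is a hedged example of building the two new instructions with the existing BPF_ATOMIC_OP() macro from include/linux/filter.h (whose comment block the patch above extends). The register and offset choices mirror the byte-level examples in the cover letter, and the imm values are this revision's 0x11/0x22 encoding; the helper function name is illustrative only.

#include <linux/filter.h>	/* BPF_ATOMIC_OP(), BPF_REG_*, struct bpf_insn */

static void emit_acq_rel_example(struct bpf_insn *insn)
{
	/* db 10 00 00 11 00 00 00:  r0 = load_acquire((u64 *)(r1 + 0x0)) */
	insn[0] = BPF_ATOMIC_OP(BPF_DW, BPF_LOAD_ACQ, BPF_REG_0, BPF_REG_1, 0);

	/* cb 21 00 00 22 00 00 00:  store_release((u16 *)(r1 + 0x0), w2) */
	insn[1] = BPF_ATOMIC_OP(BPF_H, BPF_STORE_REL, BPF_REG_1, BPF_REG_2, 0);
}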