Message ID | 1447195301-16757-3-git-send-email-yang.shi@linaro.org (mailing list archive)
---|---
State | New, archived
On Tue, 2015-11-10 at 14:41 -0800, Yang Shi wrote:
> aarch64 doesn't have native support for XADD instruction, implement it by
> the below instruction sequence:
>
> Load (dst + off) to a register
> Add src to it
> Store it back to (dst + off)

Not really what is needed?

See this BPF_XADD as an atomic_add() equivalent.
On 11/10/2015 4:08 PM, Eric Dumazet wrote:
> On Tue, 2015-11-10 at 14:41 -0800, Yang Shi wrote:
>> aarch64 doesn't have native support for XADD instruction, implement it by
>> the below instruction sequence:
>>
>> Load (dst + off) to a register
>> Add src to it
>> Store it back to (dst + off)
>
> Not really what is needed?
>
> See this BPF_XADD as an atomic_add() equivalent.

I see. Thanks. The documentation doesn't say too much about "exclusive" add.
If so it should need load-acquire/store-release.

I will rework it.

Yang
On Tue, Nov 10, 2015 at 04:26:02PM -0800, Shi, Yang wrote:
> On 11/10/2015 4:08 PM, Eric Dumazet wrote:
>> On Tue, 2015-11-10 at 14:41 -0800, Yang Shi wrote:
>>> aarch64 doesn't have native support for XADD instruction, implement it by
>>> the below instruction sequence:
>>>
>>> Load (dst + off) to a register
>>> Add src to it
>>> Store it back to (dst + off)
>>
>> Not really what is needed?
>>
>> See this BPF_XADD as an atomic_add() equivalent.
>
> I see. Thanks. The documentation doesn't say too much about "exclusive" add.
> If so it should need load-acquire/store-release.

I think doc is clear enough, but it can always be improved. Pls suggest a patch.
It's quite hard to write a test for atomicity in test_bpf framework, so
code review is the key. Eric, thanks for catching it!
Yang,

On Tue, Nov 10, 2015 at 4:42 PM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Tue, Nov 10, 2015 at 04:26:02PM -0800, Shi, Yang wrote:
>> On 11/10/2015 4:08 PM, Eric Dumazet wrote:
>>> On Tue, 2015-11-10 at 14:41 -0800, Yang Shi wrote:
>>>> aarch64 doesn't have native support for XADD instruction, implement it by
>>>> the below instruction sequence:

aarch64 supports atomic add in ARMv8.1.
For ARMv8(.0), please consider using LDXR/STXR sequence.

>>>> Load (dst + off) to a register
>>>> Add src to it
>>>> Store it back to (dst + off)
>>>
>>> Not really what is needed?
>>>
>>> See this BPF_XADD as an atomic_add() equivalent.
>>
>> I see. Thanks. The documentation doesn't say too much about "exclusive" add.
>> If so it should need load-acquire/store-release.
>
> I think doc is clear enough, but it can always be improved. Pls suggest a patch.
> It's quite hard to write a test for atomicity in test_bpf framework, so
> code review is the key. Eric, thanks for catching it!
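For reference, the ARMv8.0 exclusive-monitor pattern Z Lim is suggesting
looks roughly like the sketch below, modeled on the kernel's ll/sc atomics
(the function name is illustrative, not something from the patch under
review):

	/* Sketch: atomic 64-bit add built from an LDXR/STXR retry loop.
	 * STXR writes 0 to 'failed' on success; if another CPU broke the
	 * exclusive monitor in the meantime, we loop and retry.
	 */
	static inline void ll_sc_atomic64_add(long long *ptr, long long val)
	{
		long long tmp;
		int failed;

		asm volatile(
		"1:	ldxr	%0, %2\n"	/* load-exclusive *ptr */
		"	add	%0, %0, %3\n"
		"	stxr	%w1, %0, %2\n"	/* store-exclusive */
		"	cbnz	%w1, 1b"	/* retry on contention */
		: "=&r" (tmp), "=&r" (failed), "+Q" (*ptr)
		: "r" (val));
	}

On ARMv8.1, the LSE STADD instruction can replace the whole loop, which is
the trade-off Arnd picks up on below.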
On Tuesday 10 November 2015 18:52:45 Z Lim wrote:
> On Tue, Nov 10, 2015 at 4:42 PM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>> On Tue, Nov 10, 2015 at 04:26:02PM -0800, Shi, Yang wrote:
>>> On 11/10/2015 4:08 PM, Eric Dumazet wrote:
>>>> On Tue, 2015-11-10 at 14:41 -0800, Yang Shi wrote:
>>>>> aarch64 doesn't have native support for XADD instruction, implement it by
>>>>> the below instruction sequence:
>
> aarch64 supports atomic add in ARMv8.1.
> For ARMv8(.0), please consider using LDXR/STXR sequence.

Is it worth optimizing for the 8.1 case? It would add a bit of complexity
to make the code depend on the CPU feature, but it's certainly doable.

	Arnd
On Wed, Nov 11, 2015 at 09:49:48AM +0100, Arnd Bergmann wrote:
> On Tuesday 10 November 2015 18:52:45 Z Lim wrote:
>> [...]
>> aarch64 supports atomic add in ARMv8.1.
>> For ARMv8(.0), please consider using LDXR/STXR sequence.
>
> Is it worth optimizing for the 8.1 case? It would add a bit of complexity
> to make the code depend on the CPU feature, but it's certainly doable.

What's the atomicity required for? Put another way, what are we racing
with (I thought bpf was single-threaded)? Do we need to worry about
memory barriers?

Apologies if these are stupid questions, but all I could find was
samples/bpf/sock_example.c and it didn't help much :(

Will
On 11/11/2015 11:24 AM, Will Deacon wrote:
> On Wed, Nov 11, 2015 at 09:49:48AM +0100, Arnd Bergmann wrote:
>> [...]
>> Is it worth optimizing for the 8.1 case? It would add a bit of complexity
>> to make the code depend on the CPU feature, but it's certainly doable.
>
> What's the atomicity required for? Put another way, what are we racing
> with (I thought bpf was single-threaded)? Do we need to worry about
> memory barriers?
>
> Apologies if these are stupid questions, but all I could find was
> samples/bpf/sock_example.c and it didn't help much :(

The equivalent code, more readable in restricted C syntax (that can be
compiled by llvm), can be found in samples/bpf/sockex1_kern.c. So the
built-in __sync_fetch_and_add() will be translated into a BPF_XADD
insn variant.

What you can race against is that an eBPF map can be _shared_ by
multiple eBPF programs that are attached somewhere in the system, and
they could all update a particular entry/counter from the map at the
same time.

Best,
Daniel
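For concreteness, the sockex1_kern.c pattern Daniel mentions looks roughly
like this (a sketch from memory, not a verbatim copy of the sample): each
packet bumps a per-protocol counter in an array map that user space, and
potentially other attached programs, can touch concurrently:

	#include <uapi/linux/bpf.h>
	#include <uapi/linux/if_ether.h>
	#include <uapi/linux/ip.h>
	#include "bpf_helpers.h"

	/* array map shared with user space and other programs */
	struct bpf_map_def SEC("maps") my_map = {
		.type = BPF_MAP_TYPE_ARRAY,
		.key_size = sizeof(u32),
		.value_size = sizeof(long),
		.max_entries = 256,
	};

	SEC("socket1")
	int bpf_prog1(struct __sk_buff *skb)
	{
		int index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol));
		long *value;

		value = bpf_map_lookup_elem(&my_map, &index);
		if (value)
			/* llvm lowers this builtin to a BPF_XADD insn */
			__sync_fetch_and_add(value, skb->len);

		return 0;
	}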
Hi Daniel,

On Wed, Nov 11, 2015 at 11:42:11AM +0100, Daniel Borkmann wrote:
> On 11/11/2015 11:24 AM, Will Deacon wrote:
>> [...]
>> What's the atomicity required for? Put another way, what are we racing
>> with (I thought bpf was single-threaded)? Do we need to worry about
>> memory barriers?
>
> The equivalent code, more readable in restricted C syntax (that can be
> compiled by llvm), can be found in samples/bpf/sockex1_kern.c. So the
> built-in __sync_fetch_and_add() will be translated into a BPF_XADD
> insn variant.

Yikes, so the memory-model for BPF is based around the deprecated GCC
__sync builtins, that inherit their semantics from ia64? Any reason not
to use the C11-compatible __atomic builtins [1] as a base?

> What you can race against is that an eBPF map can be _shared_ by
> multiple eBPF programs that are attached somewhere in the system, and
> they could all update a particular entry/counter from the map at the
> same time.

Ok, so it does sound like eBPF needs to define/choose a memory-model and
I worry that riding on the back of __sync isn't necessarily the right
thing to do, particularly as it's fallen out of favour with the compiler
folks. On weakly-ordered architectures, it's also going to result in
heavy-weight barriers for all atomic operations.

Will

[1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
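The C11-style builtins Will refers to take an explicit memory-order
argument, so a plain counter bump does not have to pay for sequential
consistency (illustrative snippet, not from the thread):

	/* relaxed: atomic, but no ordering guarantees, no barriers */
	__atomic_fetch_add(&counter, 1, __ATOMIC_RELAXED);

	/* what the legacy __sync builtin implies */
	__atomic_fetch_add(&counter, 1, __ATOMIC_SEQ_CST);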
On 11/11/2015 12:58 PM, Will Deacon wrote:
> On Wed, Nov 11, 2015 at 11:42:11AM +0100, Daniel Borkmann wrote:
>> [...]
>> The equivalent code, more readable in restricted C syntax (that can be
>> compiled by llvm), can be found in samples/bpf/sockex1_kern.c. So the
>> built-in __sync_fetch_and_add() will be translated into a BPF_XADD
>> insn variant.
>
> Yikes, so the memory-model for BPF is based around the deprecated GCC
> __sync builtins, that inherit their semantics from ia64? Any reason not
> to use the C11-compatible __atomic builtins [1] as a base?

Hmm, gcc doesn't have an eBPF compiler backend, so this won't work on
gcc at all. The eBPF backend in LLVM recognizes the __sync_fetch_and_add()
keyword and maps that to a BPF_XADD version (BPF_W or BPF_DW). In the
interpreter (__bpf_prog_run()), as Eric mentioned, this maps to atomic_add()
and atomic64_add(), respectively. So the struct bpf_insn prog[] you saw
from sock_example.c can be regarded as one possible equivalent program
section output from the compiler.

> Ok, so it does sound like eBPF needs to define/choose a memory-model and
> I worry that riding on the back of __sync isn't necessarily the right
> thing to do, particularly as it's fallen out of favour with the compiler
> folks. On weakly-ordered architectures, it's also going to result in
> heavy-weight barriers for all atomic operations.
>
> Will
>
> [1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
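The interpreter mapping Daniel describes lives in __bpf_prog_run() in
kernel/bpf/core.c; at the time it read approximately as follows (quoted
from memory, so treat it as a sketch): a plain kernel atomic add, with no
return value and no implied barrier:

	STX_XADD_W: /* lock xadd *(u32 *)(dst_reg + off16) += src_reg */
		atomic_add((u32) SRC, (atomic_t *)(unsigned long)
			   (DST + insn->off));
		CONT;
	STX_XADD_DW: /* lock xadd *(u64 *)(dst_reg + off16) += src_reg */
		atomic64_add((u64) SRC, (atomic64_t *)(unsigned long)
			     (DST + insn->off));
		CONT;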
On Wed, Nov 11, 2015 at 01:21:04PM +0100, Daniel Borkmann wrote:
> [...]
> Hmm, gcc doesn't have an eBPF compiler backend, so this won't work on
> gcc at all. The eBPF backend in LLVM recognizes the __sync_fetch_and_add()
> keyword and maps that to a BPF_XADD version (BPF_W or BPF_DW). In the
> interpreter (__bpf_prog_run()), as Eric mentioned, this maps to atomic_add()
> and atomic64_add(), respectively. So the struct bpf_insn prog[] you saw
> from sock_example.c can be regarded as one possible equivalent program
> section output from the compiler.

Ok, so if I understand you correctly, then __sync_fetch_and_add() has
different semantics depending on the backend target. That seems counter
to the LLVM atomics documentation:

  http://llvm.org/docs/Atomics.html

which specifically calls out the __sync_* primitives as being
sequentially-consistent and requiring barriers on ARM (which isn't the
case for atomic[64]_add in the kernel).

If we re-use the __sync_* naming scheme in the source language, I don't
think we can overlay our own semantics in the backend. The
__sync_fetch_and_add primitive is also expected to return the old value,
which doesn't appear to be the case for BPF_XADD.

Will
On Wed, Nov 11, 2015 at 12:38:31PM +0000, Will Deacon wrote:
> Ok, so if I understand you correctly, then __sync_fetch_and_add() has
> different semantics depending on the backend target. That seems counter
> to the LLVM atomics documentation:
>
>   http://llvm.org/docs/Atomics.html
>
> which specifically calls out the __sync_* primitives as being
> sequentially-consistent and requiring barriers on ARM (which isn't the
> case for atomic[64]_add in the kernel).
>
> If we re-use the __sync_* naming scheme in the source language, I don't
> think we can overlay our own semantics in the backend. The
> __sync_fetch_and_add primitive is also expected to return the old value,
> which doesn't appear to be the case for BPF_XADD.

Yikes. That's double fail. Please don't do this.

If you use the __sync stuff (and I agree with Will, you should not) it
really _SHOULD_ be sequentially consistent, which means full barriers
all over the place.

And if you name something XADD (exchange and add, or fetch-add) then it
had better return the previous value.

atomic*_add() does neither.
On 11/11/2015 01:58 PM, Peter Zijlstra wrote:
> On Wed, Nov 11, 2015 at 12:38:31PM +0000, Will Deacon wrote:
>> [...]
>> If we re-use the __sync_* naming scheme in the source language, I don't
>> think we can overlay our own semantics in the backend. The
>> __sync_fetch_and_add primitive is also expected to return the old value,
>> which doesn't appear to be the case for BPF_XADD.
>
> Yikes. That's double fail. Please don't do this.
>
> If you use the __sync stuff (and I agree with Will, you should not) it
> really _SHOULD_ be sequentially consistent, which means full barriers
> all over the place.
>
> And if you name something XADD (exchange and add, or fetch-add) then it
> had better return the previous value.
>
> atomic*_add() does neither.

  unsigned int ui;
  unsigned long long ull;

  void foo(void)
  {
    (void) __sync_fetch_and_add(&ui, 1);
    (void) __sync_fetch_and_add(&ull, 1);
  }

So clang front-end translates this snippet into intermediate
representation of ...

  clang test.c -S -emit-llvm -o -
  [...]
  define void @foo() #0 {
    %1 = atomicrmw add i32* @ui, i32 1 seq_cst
    %2 = atomicrmw add i64* @ull, i64 1 seq_cst
    ret void
  }
  [...]

... which, if I see this correctly, then maps atomicrmw add {i32,i64}
in the BPF target into BPF_XADD as mentioned:

  // Atomics
  class XADD<bits<2> SizeOp, string OpcodeStr, PatFrag OpNode>
      : InstBPF<(outs GPR:$dst), (ins MEMri:$addr, GPR:$val),
                !strconcat(OpcodeStr, "\t$dst, $addr, $val"),
                [(set GPR:$dst, (OpNode ADDRri:$addr, GPR:$val))]> {
    bits<3> mode;
    bits<2> size;
    bits<4> src;
    bits<20> addr;

    let Inst{63-61} = mode;
    let Inst{60-59} = size;
    let Inst{51-48} = addr{19-16}; // base reg
    let Inst{55-52} = src;
    let Inst{47-32} = addr{15-0}; // offset

    let mode = 6;     // BPF_XADD
    let size = SizeOp;
    let BPFClass = 3; // BPF_STX
  }

  let Constraints = "$dst = $val" in {
    def XADD32 : XADD<0, "xadd32", atomic_load_add_32>;
    def XADD64 : XADD<3, "xadd64", atomic_load_add_64>;
    // undefined def XADD16 : XADD<1, "xadd16", atomic_load_add_16>;
    // undefined def XADD8  : XADD<2, "xadd8",  atomic_load_add_8>;
  }

I played a bit around with eBPF code to assign the __sync_fetch_and_add()
return value to a var and dump it to trace pipe, or use it as return code.
llvm compiles it (with the result assignment) and it looks like:

  [...]
  206: (b7) r3 = 3
  207: (db) lock *(u64 *)(r0 +0) += r3
  208: (bf) r1 = r10
  209: (07) r1 += -16
  210: (b7) r2 = 10
  211: (85) call 6   // r3 dumped here
  [...]

  [...]
  206: (b7) r5 = 3
  207: (db) lock *(u64 *)(r0 +0) += r5
  208: (bf) r1 = r10
  209: (07) r1 += -16
  210: (b7) r2 = 10
  211: (b7) r3 = 43
  212: (b7) r4 = 42
  213: (85) call 6   // r5 dumped here
  [...]

  [...]
  11: (b7) r0 = 3
  12: (db) lock *(u64 *)(r1 +0) += r0
  13: (95) exit   // r0 returned here
  [...]

What it seems is that we 'get back' the value (== 3 here in r3, r5, r0)
that we're adding, at least that's what seems to be generated wrt
register assignments. Hmm, the semantic differences of bpf target
should be documented somewhere for people writing eBPF programs to
be aware of.

Best,
Daniel
Hi Daniel,

Thanks for investigating this further.

On Wed, Nov 11, 2015 at 04:52:00PM +0100, Daniel Borkmann wrote:
> I played a bit around with eBPF code to assign the __sync_fetch_and_add()
> return value to a var and dump it to trace pipe, or use it as return code.
> llvm compiles it (with the result assignment) and it looks like:
> [...]
> What it seems is that we 'get back' the value (== 3 here in r3, r5, r0)
> that we're adding, at least that's what seems to be generated wrt
> register assignments. Hmm, the semantic differences of bpf target
> should be documented somewhere for people writing eBPF programs to
> be aware of.

If we're going to document it, a bug tracker might be a good place to
start. The behaviour, as it stands, is broken wrt the definition of the
__sync primitives. That is, there is no way to build __sync_fetch_and_add
out of BPF_XADD without changing its semantics. We could fix this by
either:

  (1) Defining BPF_XADD to match __sync_fetch_and_add (including memory
      barriers).

  (2) Introducing some new BPF_ atomics, that map to something like the
      C11 __atomic builtins and deprecating BPF_XADD in favour of these.

  (3) Introducing new source-language intrinsics to match what BPF can do
      (unlikely to be popular).

As it stands, I'm not especially keen on adding BPF_XADD to the arm64
JIT backend until we have at least (1) and preferably (2) as well.

Will
On Wed, Nov 11, 2015 at 04:23:41PM +0000, Will Deacon wrote:
> If we're going to document it, a bug tracker might be a good place to
> start. The behaviour, as it stands, is broken wrt the definition of the
> __sync primitives. That is, there is no way to build __sync_fetch_and_add
> out of BPF_XADD without changing its semantics.

BPF_XADD == atomic_add() in kernel. period.
we are not going to deprecate it or introduce something else.
Semantics of __sync* or atomic in C standard and/or gcc/llvm has nothing
to do with this.
arm64 JIT needs to JIT bpf_xadd insn equivalent to the code of atomic_add()
which is 'stadd' in armv8.1. The cpu check can be done by jit and for
older cpus just fall back to interpreter. trivial.

> We could fix this by either:
>
> (1) Defining BPF_XADD to match __sync_fetch_and_add (including memory
>     barriers).

nope.

> (2) Introducing some new BPF_ atomics, that map to something like the
>     C11 __atomic builtins and deprecating BPF_XADD in favour of these.

nope.

> (3) Introducing new source-language intrinsics to match what BPF can do
>     (unlikely to be popular).

llvm's __sync intrinsic is used temporarily until we have time to do
a new intrinsic in llvm that matches kernel's atomic_add() properly.
It will be done similar to llvm-bpf load_byte/word intrinsics.
Note that we've been hiding it under a lock_xadd() wrapper, like here:
https://github.com/iovisor/bcc/blob/master/examples/networking/tunnel_monitor/monitor.c#L130
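The lock_xadd() wrapper Alexei links to is, if memory serves, just a macro
that throws away the builtin's return value, so programs never come to
depend on the 'fetch' half that BPF_XADD doesn't implement:

	/* presumed shape of the bcc helper; see the link above for the
	 * authoritative definition */
	#define lock_xadd(ptr, val) ((void)__sync_fetch_and_add(ptr, val))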
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Date: Wed, 11 Nov 2015 09:27:00 -0800

> BPF_XADD == atomic_add() in kernel. period.
> we are not going to deprecate it or introduce something else.

Agreed, it makes no sense to try and tie C99 or whatever atomic
semantics to something that is already clearly defined to have
exactly kernel atomic_add() semantics.
On Wed, Nov 11, 2015 at 12:35:48PM -0500, David Miller wrote:
> From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Date: Wed, 11 Nov 2015 09:27:00 -0800
>
>> BPF_XADD == atomic_add() in kernel. period.
>> we are not going to deprecate it or introduce something else.
>
> Agreed, it makes no sense to try and tie C99 or whatever atomic
> semantics to something that is already clearly defined to have
> exactly kernel atomic_add() semantics.

... and which is emitted by LLVM when asked to compile __sync_fetch_and_add,
which has clearly defined (yet conflicting) semantics.

If the discrepancy is in LLVM (and it sounds like it is), then I'll raise
a bug over there instead.

Will
On Wed, Nov 11, 2015 at 12:35:48PM -0500, David Miller wrote:
> From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Date: Wed, 11 Nov 2015 09:27:00 -0800
>
>> BPF_XADD == atomic_add() in kernel. period.
>> we are not going to deprecate it or introduce something else.
>
> Agreed, it makes no sense to try and tie C99 or whatever atomic
> semantics to something that is already clearly defined to have
> exactly kernel atomic_add() semantics.

Dave, this really doesn't make any sense to me. __sync primitives have
well defined semantics and (e)BPF is violating this.

Furthermore, the fetch_and_add (or XADD) name has well defined
semantics, which (e)BPF also violates.

Atomicity is hard enough as it is, backends giving random interpretations
to them isn't helping anybody.

It also baffles me that Alexei is seemingly unwilling to change/rev the
(e)BPF instructions, which would be invisible to the regular user, yet he
does want to change the language itself, which will impact all 'scripts'.
On Wed, Nov 11, 2015 at 06:57:41PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 11, 2015 at 12:35:48PM -0500, David Miller wrote:
>> [...]
>> Agreed, it makes no sense to try and tie C99 or whatever atomic
>> semantics to something that is already clearly defined to have
>> exactly kernel atomic_add() semantics.
>
> Dave, this really doesn't make any sense to me. __sync primitives have
> well defined semantics and (e)BPF is violating this.

bpf_xadd was never meant to be __sync_fetch_and_add equivalent.
From the day one it meant to be atomic_add() as kernel does it.
I did piggy back on __sync in the llvm backend because it was the quick
and dirty way to move forward.
In retrospect I should have introduced a clean intrinsic for that instead,
but it's not too late to do it now. user space we can change at any time
unlike kernel.

> Furthermore, the fetch_and_add (or XADD) name has well defined
> semantics, which (e)BPF also violates.

bpf_xadd was also never meant to 'fetch'. It was void return from the
beginning.

> Atomicity is hard enough as it is, backends giving random interpretations
> to them isn't helping anybody.

no randomness. bpf_xadd == atomic_add() in kernel.
imo that is the simplest and cleanest interpretation one can have, no?

> It also baffles me that Alexei is seemingly unwilling to change/rev the
> (e)BPF instructions, which would be invisible to the regular user, yet he
> does want to change the language itself, which will impact all 'scripts'.

well, we cannot change it in kernel because it's ABI.
I'm not against adding new insns. We definitely can, but let's figure out why?
Is anything broken? No. So what new insns make sense?
Add a new one that does 'fetch_and_add'? What is the real use case it
will be used for?
Adding a new intrinsic to llvm is not a big deal. I'll add it as soon
as I have time to work on it or if somebody beats me to it I would be
glad to test it and apply it.
On Wed, Nov 11, 2015 at 10:11:33AM -0800, Alexei Starovoitov wrote:
> On Wed, Nov 11, 2015 at 06:57:41PM +0100, Peter Zijlstra wrote:
>> Dave, this really doesn't make any sense to me. __sync primitives have
>> well defined semantics and (e)BPF is violating this.
>
> bpf_xadd was never meant to be __sync_fetch_and_add equivalent.
> From the day one it meant to be atomic_add() as kernel does it.
> I did piggy back on __sync in the llvm backend because it was the quick
> and dirty way to move forward.
> In retrospect I should have introduced a clean intrinsic for that instead,
> but it's not too late to do it now. user space we can change at any time
> unlike kernel.

I would argue that breaking userspace (language in this case) is equally
bad. Programs that used to work will now no longer work.

>> Furthermore, the fetch_and_add (or XADD) name has well defined
>> semantics, which (e)BPF also violates.
>
> bpf_xadd was also never meant to 'fetch'. It was void return from the
> beginning.

Then why the 'X'? The XADD name does, and always has, meant eXchange-ADD,
which means it must have a return value.

You using the XADD name for something that is not in fact XADD is just
wrong.

>> Atomicity is hard enough as it is, backends giving random interpretations
>> to them isn't helping anybody.
>
> no randomness.

You mean every other backend translating __sync_fetch_and_add()
differently than you isn't random on your part?

> bpf_xadd == atomic_add() in kernel.
> imo that is the simplest and cleanest interpretation one can have, no?

Wrong though; if you'd named it BPF_ADD, sure. XADD, not so much. That
is 'randomly' co-opting something that has well defined meaning and
semantics with something else.

>> It also baffles me that Alexei is seemingly unwilling to change/rev the
>> (e)BPF instructions, which would be invisible to the regular user, yet he
>> does want to change the language itself, which will impact all 'scripts'.
>
> well, we cannot change it in kernel because it's ABI.

You can always rev it. Introduce a new set, and wait for users of the
old set to die, then remove it. We do that all the time with Linux ABI.

> I'm not against adding new insns. We definitely can, but let's figure out why?
> Is anything broken? No.

Yes, __sync_fetch_and_add() is broken when pulled through the eBPF
backend.

> So what new insns make sense?

Depends a bit on how fancy you want to go. If you want to support weakly
ordered architectures at full speed you'll need more (and more
complexity) than if you decide to not go that way.

The simplest option would be a fully ordered compare-and-swap operation.
That is enough to implement everything else (at a cost). The other
extreme is a weak ll/sc with an optimizer pass recognising various forms
to translate into 'better' native instructions.

> Add a new one that does 'fetch_and_add'? What is the real use case it
> will be used for?

Look at all the atomic_{add,dec}_return*() users in the kernel. A typical
example would be a reader-writer lock implementation. See
include/asm-generic/rwsem.h for examples.

> Adding a new intrinsic to llvm is not a big deal. I'll add it as soon
> as I have time to work on it or if somebody beats me to it I would be
> glad to test it and apply it.

This isn't a speed coding contest. You want to think about this
properly.
On Wed, Nov 11, 2015 at 07:31:28PM +0100, Peter Zijlstra wrote:
>> Adding a new intrinsic to llvm is not a big deal. I'll add it as soon
>> as I have time to work on it or if somebody beats me to it I would be
>> glad to test it and apply it.
>
> This isn't a speed coding contest. You want to think about this
> properly.

That is, I don't think you want to go add LLVM intrinsics at all. You
want to piggyback on the memory model work done by the C/C++11 people.

What you want to think about is what the memory model of your virtual
machine is and how many instructions you want to expose for that.

Concurrency is a right pain; a little time and effort now will save heaps
of pain down the road.
On Wed, Nov 11, 2015 at 07:31:28PM +0100, Peter Zijlstra wrote:
>>> Add a new one that does 'fetch_and_add'? What is the real use case it
>>> will be used for?
>>
>> Look at all the atomic_{add,dec}_return*() users in the kernel. A typical
>> example would be a reader-writer lock implementation. See
>> include/asm-generic/rwsem.h for examples.

Maybe a better example would be refcounting, where you free on 0:

	if (!fetch_add(&obj->ref, -1))
		free(obj);
On Wed, Nov 11, 2015 at 10:11:33AM -0800, Alexei Starovoitov wrote:
> bpf_xadd was never meant to be __sync_fetch_and_add equivalent.
> From the day one it meant to be atomic_add() as kernel does it.
> I did piggy back on __sync in the llvm backend because it was the quick
> and dirty way to move forward.
> In retrospect I should have introduced a clean intrinsic for that instead,
> but it's not too late to do it now. user space we can change at any time
> unlike kernel.

But it's not just "user space", it's the source language definition! I
also don't see how you can change it now, without simply rejecting the
__sync primitives outright.

> bpf_xadd was also never meant to 'fetch'. It was void return from the
> beginning.

Right, so it's just a misnomer.

> no randomness. bpf_xadd == atomic_add() in kernel.
> imo that is the simplest and cleanest interpretation one can have, no?

I don't really mind, as long as there is a semantic that everybody agrees
on. Really, I just want this to be consistent because memory models are a
PITA enough without having multiple interpretations flying around.

> well, we cannot change it in kernel because it's ABI.
> I'm not against adding new insns. We definitely can, but let's figure out why?
> Is anything broken? No. So what new insns make sense?

If you end up needing a suite of atomics, I would suggest the __atomic
builtins because they are likely to be more portable and more flexible
than trying to use the kernel memory model outside of the environment
for which it was developed. However, I agree with you that we can cross
that bridge when we get there.

> Adding a new intrinsic to llvm is not a big deal. I'll add it as soon
> as I have time to work on it or if somebody beats me to it I would be
> glad to test it and apply it.

I'm more interested in what you do about the existing intrinsic. Anyway,
I'll raise a ticket against LLVM so that they're aware (and maybe
somebody else will fix it :).

Will
On 11/11/2015 07:31 PM, Peter Zijlstra wrote:
> On Wed, Nov 11, 2015 at 10:11:33AM -0800, Alexei Starovoitov wrote:
>> bpf_xadd was never meant to be __sync_fetch_and_add equivalent.
>> From the day one it meant to be atomic_add() as kernel does it.
>> I did piggy back on __sync in the llvm backend because it was the quick
>> and dirty way to move forward.
>> In retrospect I should have introduced a clean intrinsic for that instead,
>> but it's not too late to do it now. user space we can change at any time
>> unlike kernel.
>
> I would argue that breaking userspace (language in this case) is equally
> bad. Programs that used to work will now no longer work.

Well, on that note, it's not like you just change the target to bpf in
your Makefile and can compile (& load into the kernel) anything you want
with it. You do have to write small, restricted programs from scratch
for a specific use-case with the limited set of helper functions and
intrinsics that are available from the kernel. So I don't think that
"Programs that used to work will now no longer work." holds if you
regard it as such.

> [...]
On Wed, Nov 11, 2015 at 07:44:27PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 11, 2015 at 07:31:28PM +0100, Peter Zijlstra wrote:
>>> Add a new one that does 'fetch_and_add'? What is the real use case it
>>> will be used for?
>>
>> Look at all the atomic_{add,dec}_return*() users in the kernel. A typical
>> example would be a reader-writer lock implementation. See
>> include/asm-generic/rwsem.h for examples.
>
> Maybe a better example would be refcounting, where you free on 0:
>
>	if (!fetch_add(&obj->ref, -1))
>		free(obj);

Urgh, too used to atomic_add_return(), which returns the post-op value.
That wants to be:

	if (fetch_add(&obj->ref, -1) == 1)
		free(obj);

Note that I would very much recommend _against_ encoding the post-op
thing in instructions. It works for reversible operations (like add) but
is pointless for irreversible operations (like or). That is, given
or_return(), you cannot reconstruct the state prior to the operation, so
or_return() provides less information than fetch_or().
From: Will Deacon <will.deacon@arm.com>
Date: Wed, 11 Nov 2015 17:44:01 +0000

> On Wed, Nov 11, 2015 at 12:35:48PM -0500, David Miller wrote:
>> From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
>> Date: Wed, 11 Nov 2015 09:27:00 -0800
>>
>>> BPF_XADD == atomic_add() in kernel. period.
>>> we are not going to deprecate it or introduce something else.
>>
>> Agreed, it makes no sense to try and tie C99 or whatever atomic
>> semantics to something that is already clearly defined to have
>> exactly kernel atomic_add() semantics.
>
> ... and which is emitted by LLVM when asked to compile __sync_fetch_and_add,
> which has clearly defined (yet conflicting) semantics.

Alexei clearly stated that he knows about this issue and will fully fix
this up in LLVM.

What more do you need to hear from him once he's stated that he is aware
and is working on it?

Meanwhile you should make your JIT emit what is expected, rather than
arguing to change the semantics.

Thanks.
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Date: Wed, 11 Nov 2015 10:11:33 -0800

> bpf_xadd was never meant to be __sync_fetch_and_add equivalent.
> From the day one it meant to be atomic_add() as kernel does it.

+1

> I did piggy back on __sync in the llvm backend because it was the quick
> and dirty way to move forward.
> In retrospect I should have introduced a clean intrinsic for that instead,
> but it's not too late to do it now. user space we can change at any time
> unlike kernel.

+1
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Wed, 11 Nov 2015 19:50:15 +0100

> Well, on that note, it's not like you just change the target to bpf
> in your Makefile and can compile (& load into the kernel) anything
> you want with it. You do have to write small, restricted programs
> from scratch for a specific use-case with the limited set of helper
> functions and intrinsics that are available from the kernel. So I
> don't think that "Programs that used to work will now no longer
> work." holds if you regard it as such.

+1

Strict C language semantics do not apply here at all; we are talking
about purposefully built modules of "C like" code that have any semantics
we want and make the most sense for us.

Maybe BPF_XADD is unfortunately named, but this is tangential to our
ability to choose what atomic operations mean and what semantics they
match up to.
On Wed, Nov 11, 2015 at 07:50:15PM +0100, Daniel Borkmann wrote:
> Well, on that note, it's not like you just change the target to bpf in
> your Makefile and can compile (& load into the kernel) anything you want
> with it. You do have to write small, restricted programs from scratch
> for a specific use-case with the limited set of helper functions and
> intrinsics that are available from the kernel. So I don't think that
> "Programs that used to work will now no longer work." holds if you
> regard it as such.

So I don't get this argument. If everything is so targeted, then why are
the BPF instructions an ABI?

If OTOH you're expected to be able to transfer these small proglets,
then too I would expect to transfer the source of these proglets.

You cannot argue both ways.
On 11/11/2015 08:23 PM, Peter Zijlstra wrote:
> On Wed, Nov 11, 2015 at 07:50:15PM +0100, Daniel Borkmann wrote:
>> [...]
>
> So I don't get this argument. If everything is so targeted, then why are
> the BPF instructions an ABI?
>
> If OTOH you're expected to be able to transfer these small proglets,
> then too I would expect to transfer the source of these proglets.
>
> You cannot argue both ways.

Ohh, I think we were talking past each other. ;) So, yeah, you'd likely
need to add new intrinsics that then map to the existing BPF_XADD
instructions, and perhaps emit a warning when __sync_fetch_and_add() is
being used to advise the developer to switch to the new intrinsics
instead. From the kernel ABI PoV nothing would change.
On Wed, Nov 11, 2015 at 07:54:15PM +0100, Peter Zijlstra wrote:
> Urgh, too used to atomic_add_return(), which returns the post-op value.
> That wants to be:
>
>	if (fetch_add(&obj->ref, -1) == 1)
>		free(obj);

this type of code will never be acceptable in bpf world. If C code does
cmpxchg-like things, it's clearly beyond bpf abilities. There are no
locks or support for locks in bpf design and will not be. We don't want
a program to grab a lock and then terminate automatically because it did
divide by zero. Programs are not allowed to directly allocate/free
memory either. We don't want dangling pointers. Therefore things like
memory barriers and the full set of atomics are not applicable in bpf
world.

The only goal for bpf_xadd (could have been named better, agreed) was to
do counters. Like counting packets or bytes or events. In all such cases
there is no need to do the 'fetch' part. Another reason for the lack of
a 'fetch' part is simplifying JITs. It's easier to emit an 'atomic_add'
equivalent than to emit 'atomic_add_return'.

The only shared data structure two programs can see is a map element.
They can increment counters via bpf_xadd or replace the whole map element
atomically via the bpf_map_update_elem() helper. That's it. If the
program needs to grab a lock, do some writes and release it, then
probably bpf is not suitable for such a use case.

The bpf programs should be "fast by design", meaning that there should
be no mechanisms in bpf architecture that would allow a program to slow
down other programs or the kernel in general.
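To make the two permitted patterns concrete, a sketch using the in-kernel
helper signatures of this era (the struct and all names are illustrative,
not taken from the thread):

	struct flow_stats {
		__u64 packets;
		__u64 bytes;
	};

	/* pattern 1: atomic counter bump on a shared element (BPF_XADD) */
	struct flow_stats *stats = bpf_map_lookup_elem(&flow_map, &key);
	if (stats) {
		__sync_fetch_and_add(&stats->packets, 1);
		__sync_fetch_and_add(&stats->bytes, skb->len);
	}

	/* pattern 2: replace the whole element atomically */
	struct flow_stats fresh = { .packets = 1, .bytes = skb->len };
	bpf_map_update_elem(&flow_map, &key, &fresh, BPF_ANY);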
On Wed, Nov 11, 2015 at 11:55:59AM -0800, Alexei Starovoitov wrote:
> Therefore things like memory barriers and the full set of atomics are
> not applicable in bpf world.

There are still plenty of wait-free constructs one can make using them.
Say a barrier/rendezvous construct for knowing when an event has
happened on all CPUs.

But if you really do not want any of that, I suppose that is a valid
choice.

Is even privileged (e)BPF not allowed things like this? I was thinking
the strict no-loops stuff was for unpriv (e)BPF only.
On Wed, Nov 11, 2015 at 11:21:35PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 11, 2015 at 11:55:59AM -0800, Alexei Starovoitov wrote:
>> Therefore things like memory barriers and the full set of atomics are
>> not applicable in bpf world.
>
> There are still plenty of wait-free constructs one can make using them.

yes, but all such lock-free algos are typically based on cmpxchg8b and a
tight loop, so it would be very hard for the verifier to prove termination
of such loops. I think when we'd need to add something like this, we'll
add a new bpf insn that will be membarrier+cmpxchg8b+check+loop as a
single insn, so it cannot be misused.
I don't know of any concrete use case yet. All possible though.

> Say a barrier/rendezvous construct for knowing when an event has
> happened on all CPUs.
>
> But if you really do not want any of that, I suppose that is a valid
> choice.

I do want it :) and I think in the future we'll add a bunch of
interesting stuff. May be including things like above. I just don't want
to rush things in just because x86 has such insn or because gcc has a
builtin for it. Like we discussed adding popcnt insn. It can be useful
in some cases, but doesn't seem to be worth the pain of adding it to
interpreter, JITs and llvm backends... as of today... May be tomorrow it
will be a must have.

> Is even privileged (e)BPF not allowed things like this? I was thinking
> the strict no-loops stuff was for unpriv (e)BPF only.

the only difference between unpriv and priv is the ability to send all
values (including kernel addresses) to user space (like tracing needs to
see all registers). The rest is the same. root should never crash the
kernel as well. If we relax even a little bit for root then the whole
bpf stuff is no better than a kernel module.
btw, support for mini loops was requested many times in the past.
I guess we'd have to add something like this, but it's tricky.
Mainly because control flow graph analysis becomes much more complicated.
On Wed, Nov 11, 2015 at 03:40:15PM -0800, Alexei Starovoitov wrote:
> On Wed, Nov 11, 2015 at 11:21:35PM +0100, Peter Zijlstra wrote:
>> There are still plenty of wait-free constructs one can make using them.
>
> yes, but all such lock-free algos are typically based on cmpxchg8b and a
> tight loop, so it would be very hard for the verifier to prove termination
> of such loops. I think when we'd need to add something like this, we'll
> add a new bpf insn that will be membarrier+cmpxchg8b+check+loop as a
> single insn, so it cannot be misused.
> I don't know of any concrete use case yet. All possible though.

So this is where the 'unconditional' atomic ops come in handy. Like the
x86:

	xchg, lock {xadd,add,sub,inc,dec,or,and,xor}

Those do not have a loop, and then you can create truly wait-free
things; even some applications of cmpxchg do not actually need the loop.

But this class of wait-free constructs is indeed significantly smaller
than the class of lock-less constructs.

> btw, support for mini loops was requested many times in the past.
> I guess we'd have to add something like this, but it's tricky.
> Mainly because control flow graph analysis becomes much more complicated.

Agreed, that does sound like an 'interesting' problem :-)

Something like:

	atomic_op(ptr, f)
	{
		for (;;) {
			val = *ptr;
			new = f(val);
			old = cmpxchg(ptr, val, new);
			if (old == val)
				break;
			cpu_relax();
		}
	}

might be castable as an instruction I suppose, but I'm not sure you have
function references in (e)BPF. The above is 'sane' if f is sane (although
there is a starvation case, which is why things like sparc (iirc) need an
increasing backoff instead of cpu_relax()).
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 49c1f1b..0b1d2d3 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -609,7 +609,21 @@ emit_cond_jmp:
 	case BPF_STX | BPF_XADD | BPF_W:
 	/* STX XADD: lock *(u64 *)(dst + off) += src */
 	case BPF_STX | BPF_XADD | BPF_DW:
-		goto notyet;
+		ctx->tmp_used = 1;
+		emit_a64_mov_i(1, tmp2, off, ctx);
+		switch (BPF_SIZE(code)) {
+		case BPF_W:
+			emit(A64_LDR32(tmp, dst, tmp2), ctx);
+			emit(A64_ADD(is64, tmp, tmp, src), ctx);
+			emit(A64_STR32(tmp, dst, tmp2), ctx);
+			break;
+		case BPF_DW:
+			emit(A64_LDR64(tmp, dst, tmp2), ctx);
+			emit(A64_ADD(is64, tmp, tmp, src), ctx);
+			emit(A64_STR64(tmp, dst, tmp2), ctx);
+			break;
+		}
+		break;
 
 	/* R0 = ntohx(*(size *)(((struct sk_buff *)R6)->data + imm)) */
 	case BPF_LD | BPF_ABS | BPF_W:
@@ -679,9 +693,6 @@ emit_cond_jmp:
 		}
 		break;
 	}
-notyet:
-	pr_info_once("*** NOT YET: opcode %02x ***\n", code);
-	return -EFAULT;
 
 	default:
 		pr_err_once("unknown opcode %02x\n", code);
aarch64 doesn't have native support for XADD instruction, implement it by
the below instruction sequence:

Load (dst + off) to a register
Add src to it
Store it back to (dst + off)

Signed-off-by: Yang Shi <yang.shi@linaro.org>
CC: Zi Shen Lim <zlim.lnx@gmail.com>
CC: Xi Wang <xi.wang@gmail.com>
---
 arch/arm64/net/bpf_jit_comp.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)