Message ID | 391d524c496acc97a8801d8bea80976f58485810.1700676682.git.dxu@dxuuu.xyz
---|---
State | New
Series | Add bpf_xdp_get_xfrm_state() kfunc
On 11/22/23 1:20 PM, Daniel Xu wrote:
> Switching to vmlinux.h definitions seems to make the verifier very
> unhappy with bitfield accesses. The error is:
>
>   ; md.u.md2.dir = direction;
>   33: (69) r1 = *(u16 *)(r2 +11)
>   misaligned stack access off (0x0; 0x0)+-64+11 size 2
>
> It looks like disabling CO-RE relocations seem to make the error go
> away.

Thanks for reporting. I did some preliminary investigation and the failure
is due to the fact that we do not support CORE-based bitfield stores yet.
Besides disabling CORE relocations as in this patch, there are a few ways
to do this:

- Change the code to avoid bitfield stores and use 1/2/4/8-byte stores.
  A little bit ugly but it should work.
- Use the to-be-supported 'preserve_static_offset' attribute
  (https://reviews.llvm.org/D133361) to preserve the offset. This might
  work (I didn't try it yet).
- Eduard did some early study trying to remove the CORE attribute
  (preserve_access_index) from UAPI structures. In this particular case,
  erspan_metadata is in /usr/include/linux/erspan.h.

We will also investigate whether we could support bitfield stores directly
with CORE.

>
> Co-developed-by: Antony Antony <antony.antony@secunet.com>
> Signed-off-by: Antony Antony <antony.antony@secunet.com>
> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
> ---
>  tools/testing/selftests/bpf/progs/test_tunnel_kern.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
> index 3065a716544d..ec7e04e012ae 100644
> --- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
> +++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
> @@ -6,6 +6,7 @@
>   * modify it under the terms of version 2 of the GNU General Public
>   * License as published by the Free Software Foundation.
>   */
> +#define BPF_NO_PRESERVE_ACCESS_INDEX

This is a temporary workaround and hopefully we can lift it in the near
future. Please add a comment here with the prefix 'Workaround' to explain
why this is needed; later on we can easily search for the keyword and
remember to tackle this.

>  #include "vmlinux.h"
>  #include <bpf/bpf_helpers.h>
>  #include <bpf/bpf_endian.h>
On Sat, Nov 25, 2023 at 4:52 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>
>
>
> > diff --git a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
> > index 3065a716544d..ec7e04e012ae 100644
> > --- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
> > +++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
> > @@ -6,6 +6,7 @@
> >   * modify it under the terms of version 2 of the GNU General Public
> >   * License as published by the Free Software Foundation.
> >   */
> > +#define BPF_NO_PRESERVE_ACCESS_INDEX
>
> This is a temporary workaround and hopefully we can lift it in the
> near future. Please add a comment here with prefix 'Workaround' to
> explain why this is needed and later on we can easily search the
> keyword and remember to tackle this.

I suspect we will forget to remove this "workaround" and people
will start copy pasting it.
Let's change the test instead to avoid bitfield access.
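For reference, a rough sketch of what avoiding the bitfield access in
erspan_set_tunnel() could look like with plain byte stores. The byte
offsets and bit positions used here are assumptions for illustration; the
real erspan_md2 bitfield layout is endian-dependent and must be taken from
include/uapi/linux/erspan.h, not from this sketch:

	/* Sketch under assumptions: on a little-endian build, hwid_upper is
	 * assumed to live in the low bits of the byte at offset 6 of struct
	 * erspan_md2, and dir/hwid at bits 3 and 4-7 of the byte at offset 7.
	 * Double-check the header (and the big-endian layout) before use.
	 * Whole-byte stores are only safe here if md was zeroed beforehand.
	 */
	__u8 *md2 = (__u8 *)&md.u.md2;

	md2[6] = (hwid >> 4) & 0x3;                               /* hwid_upper:2 */
	md2[7] = ((hwid & 0xf) << 4) | ((direction & 0x1) << 3);  /* hwid:4, dir:1 */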
On 11/25/23 7:54 PM, Alexei Starovoitov wrote:
> On Sat, Nov 25, 2023 at 4:52 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>> diff --git a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
>>> index 3065a716544d..ec7e04e012ae 100644
>>> --- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
>>> +++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
>>> @@ -6,6 +6,7 @@
>>>   * modify it under the terms of version 2 of the GNU General Public
>>>   * License as published by the Free Software Foundation.
>>>   */
>>> +#define BPF_NO_PRESERVE_ACCESS_INDEX
>> This is a temporary workaround and hopefully we can lift it in the
>> near future. Please add a comment here with prefix 'Workaround' to
>> explain why this is needed and later on we can easily search the
>> keyword and remember to tackle this.
> I suspect we will forget to remove this "workaround" and people
> will start copy pasting it.
> Let's change the test instead to avoid bitfield access.

Agree. Avoiding bitfield access is definitely a solution.

I just checked llvm preserve_static_offset (not merged yet), and it seems
to be able to fix the issue as well. Applying the patch
https://reviews.llvm.org/D133361 to the latest llvm-project, and with the
following patch on top of patch 6,

=====

diff --git a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
index ec7e04e012ae..11cbb12b4029 100644
--- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
@@ -6,7 +6,10 @@
  * modify it under the terms of version 2 of the GNU General Public
  * License as published by the Free Software Foundation.
  */
-#define BPF_NO_PRESERVE_ACCESS_INDEX
+#if __has_attribute(preserve_static_offset)
+struct __attribute__((preserve_static_offset)) erspan_md2;
+struct __attribute__((preserve_static_offset)) erspan_metadata;
+#endif
 #include "vmlinux.h"
 #include <bpf/bpf_helpers.h>
 #include <bpf/bpf_endian.h>
@@ -25,12 +28,12 @@
  * 172.16.1.200
  */
 #define ASSIGNED_ADDR_VETH1 0xac1001c8

 struct vxlanhdr {
 	__be32 vx_flags;
 	__be32 vx_vni;
 } __attribute__((packed));

 int bpf_skb_set_fou_encap(struct __sk_buff *skb_ctx,
 			  struct bpf_fou_encap *encap, int type) __ksym;
 int bpf_skb_get_fou_encap(struct __sk_buff *skb_ctx,
@@ -174,9 +177,13 @@ int erspan_set_tunnel(struct __sk_buff *skb)
 	__u8 hwid = 7;

 	md.version = 2;
+#if __has_attribute(preserve_static_offset)
 	md.u.md2.dir = direction;
 	md.u.md2.hwid = hwid & 0xf;
 	md.u.md2.hwid_upper = (hwid >> 4) & 0x3;
+#else
+	/* Change bit-field store to byte(s)-level stores. */
+#endif
 #endif

 	ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md));

====

Eduard, could you double check whether this is a valid use case
to solve this kind of issue with preserve_static_offset attribute?
On Sat, 2023-11-25 at 20:22 -0800, Yonghong Song wrote:
[...]
> --- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
> +++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
> @@ -6,7 +6,10 @@
>   * modify it under the terms of version 2 of the GNU General Public
>   * License as published by the Free Software Foundation.
>   */
> -#define BPF_NO_PRESERVE_ACCESS_INDEX
> +#if __has_attribute(preserve_static_offset)
> +struct __attribute__((preserve_static_offset)) erspan_md2;
> +struct __attribute__((preserve_static_offset)) erspan_metadata;
> +#endif
>  #include "vmlinux.h"
[...]
>  int bpf_skb_get_fou_encap(struct __sk_buff *skb_ctx,
> @@ -174,9 +177,13 @@ int erspan_set_tunnel(struct __sk_buff *skb)
>  	__u8 hwid = 7;
>
>  	md.version = 2;
> +#if __has_attribute(preserve_static_offset)
>  	md.u.md2.dir = direction;
>  	md.u.md2.hwid = hwid & 0xf;
>  	md.u.md2.hwid_upper = (hwid >> 4) & 0x3;
> +#else
> +	/* Change bit-field store to byte(s)-level stores. */
> +#endif
>  #endif
>
>  	ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md));
>
> ====
>
> Eduard, could you double check whether this is a valid use case
> to solve this kind of issue with preserve_static_offset attribute?

Tbh I'm not sure. This test passes with preserve_static_offset
because it suppresses preserve_access_index. In general clang
translates bitfield access to a set of IR statements like:

  C:
    struct foo {
      unsigned _;
      unsigned a:1;
      ...
    };
    ... foo->a ...

  IR:
    %a = getelementptr inbounds %struct.foo, ptr %0, i32 0, i32 1
    %bf.load = load i8, ptr %a, align 4
    %bf.clear = and i8 %bf.load, 1
    %bf.cast = zext i8 %bf.clear to i32

With preserve_static_offset the getelementptr+load are replaced by a
single statement which is preserved as-is till code generation,
thus load with align 4 is preserved.

On the other hand, I'm not sure that clang guarantees that load or
stores used for bitfield access would be always aligned according to
verifier expectations.

I think we should check if there are some clang knobs that prevent
generation of unaligned memory access. I'll take a look.
Hi, On Sun, Nov 26, 2023 at 10:14:21PM +0200, Eduard Zingerman wrote: > On Sat, 2023-11-25 at 20:22 -0800, Yonghong Song wrote: > [...] > > --- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c > > +++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c > > @@ -6,7 +6,10 @@ > > * modify it under the terms of version 2 of the GNU General Public > > * License as published by the Free Software Foundation. > > */ > > -#define BPF_NO_PRESERVE_ACCESS_INDEX > > +#if __has_attribute(preserve_static_offset) > > +struct __attribute__((preserve_static_offset)) erspan_md2; > > +struct __attribute__((preserve_static_offset)) erspan_metadata; > > +#endif > > #include "vmlinux.h" > [...] > > int bpf_skb_get_fou_encap(struct __sk_buff *skb_ctx, > > @@ -174,9 +177,13 @@ int erspan_set_tunnel(struct __sk_buff *skb) > > __u8 hwid = 7; > > > > md.version = 2; > > +#if __has_attribute(preserve_static_offset) > > md.u.md2.dir = direction; > > md.u.md2.hwid = hwid & 0xf; > > md.u.md2.hwid_upper = (hwid >> 4) & 0x3; > > +#else > > + /* Change bit-field store to byte(s)-level stores. */ > > +#endif > > #endif > > > > ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md)); > > > > ==== > > > > Eduard, could you double check whether this is a valid use case > > to solve this kind of issue with preserve_static_offset attribute? > > Tbh I'm not sure. This test passes with preserve_static_offset > because it suppresses preserve_access_index. In general clang > translates bitfield access to a set of IR statements like: > > C: > struct foo { > unsigned _; > unsigned a:1; > ... > }; > ... foo->a ... > > IR: > %a = getelementptr inbounds %struct.foo, ptr %0, i32 0, i32 1 > %bf.load = load i8, ptr %a, align 4 > %bf.clear = and i8 %bf.load, 1 > %bf.cast = zext i8 %bf.clear to i32 > > With preserve_static_offset the getelementptr+load are replaced by a > single statement which is preserved as-is till code generation, > thus load with align 4 is preserved. > > On the other hand, I'm not sure that clang guarantees that load or > stores used for bitfield access would be always aligned according to > verifier expectations. > > I think we should check if there are some clang knobs that prevent > generation of unaligned memory access. I'll take a look. Is there a reason to prefer fixing in compiler? I'm not opposed to it, but the downside to compiler fix is it takes years to propagate and sprinkles ifdefs into the code. Would it be possible to have an analogue of BPF_CORE_READ_BITFIELD()? Thanks, Daniel
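For context, the read-side helper mentioned here already exists in libbpf's
bpf_core_read.h; a small sketch of how it would read the same erspan fields
used in this test (the write-side counterpart is what the thread goes on to
discuss):

	#include <bpf/bpf_core_read.h>

	/* BPF_CORE_READ_BITFIELD() relocates the byte offset/size and shift
	 * amounts via CO-RE, so the emitted load matches the target layout. */
	__u8 dir  = BPF_CORE_READ_BITFIELD(&md.u.md2, dir);
	__u8 hwid = (BPF_CORE_READ_BITFIELD(&md.u.md2, hwid_upper) << 4) |
		    BPF_CORE_READ_BITFIELD(&md.u.md2, hwid);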
On Sun, 2023-11-26 at 18:04 -0600, Daniel Xu wrote:
[...]
> > Tbh I'm not sure. This test passes with preserve_static_offset
> > because it suppresses preserve_access_index. In general clang
> > translates bitfield access to a set of IR statements like:
> >
> > C:
> >   struct foo {
> >     unsigned _;
> >     unsigned a:1;
> >     ...
> >   };
> >   ... foo->a ...
> >
> > IR:
> >   %a = getelementptr inbounds %struct.foo, ptr %0, i32 0, i32 1
> >   %bf.load = load i8, ptr %a, align 4
> >   %bf.clear = and i8 %bf.load, 1
> >   %bf.cast = zext i8 %bf.clear to i32
> >
> > With preserve_static_offset the getelementptr+load are replaced by a
> > single statement which is preserved as-is till code generation,
> > thus load with align 4 is preserved.
> >
> > On the other hand, I'm not sure that clang guarantees that load or
> > stores used for bitfield access would be always aligned according to
> > verifier expectations.
> >
> > I think we should check if there are some clang knobs that prevent
> > generation of unaligned memory access. I'll take a look.
>
> Is there a reason to prefer fixing in compiler? I'm not opposed to it,
> but the downside to compiler fix is it takes years to propagate and
> sprinkles ifdefs into the code.
>
> Would it be possible to have an analogue of BPF_CORE_READ_BITFIELD()?

Well, the contraption below passes verification, tunnel selftest
appears to work. I might have messed up some shifts in the macro, though.

Still, if clang would peek unlucky BYTE_{OFFSET,SIZE} for a particular
field access might be unaligned.

---

diff --git a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
index 3065a716544d..41cd913ac7ff 100644
--- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
@@ -9,6 +9,7 @@
 #include "vmlinux.h"
 #include <bpf/bpf_helpers.h>
 #include <bpf/bpf_endian.h>
+#include <bpf/bpf_core_read.h>
 #include "bpf_kfuncs.h"
 #include "bpf_tracing_net.h"

@@ -144,6 +145,38 @@ int ip6gretap_get_tunnel(struct __sk_buff *skb)
 	return TC_ACT_OK;
 }

+#define BPF_CORE_WRITE_BITFIELD(s, field, new_val) ({ \
+	void *p = (void *)s + __CORE_RELO(s, field, BYTE_OFFSET); \
+	unsigned byte_size = __CORE_RELO(s, field, BYTE_SIZE); \
+	unsigned lshift = __CORE_RELO(s, field, LSHIFT_U64); \
+	unsigned rshift = __CORE_RELO(s, field, RSHIFT_U64); \
+	unsigned bit_size = (rshift - lshift); \
+	unsigned long long nval, val, hi, lo; \
+	\
+	asm volatile("" : "=r"(p) : "0"(p)); \
+	\
+	switch (byte_size) { \
+	case 1: val = *(unsigned char *)p; break; \
+	case 2: val = *(unsigned short *)p; break; \
+	case 4: val = *(unsigned int *)p; break; \
+	case 8: val = *(unsigned long long *)p; break; \
+	} \
+	hi = val >> (bit_size + rshift); \
+	hi <<= bit_size + rshift; \
+	lo = val << (bit_size + lshift); \
+	lo >>= bit_size + lshift; \
+	nval = new_val; \
+	nval <<= lshift; \
+	nval >>= rshift; \
+	val = hi | nval | lo; \
+	switch (byte_size) { \
+	case 1: *(unsigned char *)p = val; break; \
+	case 2: *(unsigned short *)p = val; break; \
+	case 4: *(unsigned int *)p = val; break; \
+	case 8: *(unsigned long long *)p = val; break; \
+	} \
+})
+
 SEC("tc")
 int erspan_set_tunnel(struct __sk_buff *skb)
 {
@@ -173,9 +206,9 @@ int erspan_set_tunnel(struct __sk_buff *skb)
 	__u8 hwid = 7;

 	md.version = 2;
-	md.u.md2.dir = direction;
-	md.u.md2.hwid = hwid & 0xf;
-	md.u.md2.hwid_upper = (hwid >> 4) & 0x3;
+	BPF_CORE_WRITE_BITFIELD(&md.u.md2, dir, direction);
+	BPF_CORE_WRITE_BITFIELD(&md.u.md2, hwid, (hwid & 0xf));
+	BPF_CORE_WRITE_BITFIELD(&md.u.md2, hwid_upper, (hwid >> 4) & 0x3);
 #endif

 	ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md));
@@ -214,8 +247,9 @@ int erspan_get_tunnel(struct __sk_buff *skb)
 		bpf_printk("\tindex %x\n", index);
 #else
 	bpf_printk("\tdirection %d hwid %x timestamp %u\n",
-		   md.u.md2.dir,
-		   (md.u.md2.hwid_upper << 4) + md.u.md2.hwid,
+		   BPF_CORE_READ_BITFIELD(&md.u.md2, dir),
+		   (BPF_CORE_READ_BITFIELD(&md.u.md2, hwid_upper) << 4) +
+		   BPF_CORE_READ_BITFIELD(&md.u.md2, hwid),
 		   bpf_ntohl(md.u.md2.timestamp));
 #endif

@@ -252,9 +286,9 @@ int ip4ip6erspan_set_tunnel(struct __sk_buff *skb)
 	__u8 hwid = 17;

 	md.version = 2;
-	md.u.md2.dir = direction;
-	md.u.md2.hwid = hwid & 0xf;
-	md.u.md2.hwid_upper = (hwid >> 4) & 0x3;
+	BPF_CORE_WRITE_BITFIELD(&md.u.md2, dir, direction);
+	BPF_CORE_WRITE_BITFIELD(&md.u.md2, hwid, (hwid & 0xf));
+	BPF_CORE_WRITE_BITFIELD(&md.u.md2, hwid_upper, (hwid >> 4) & 0x3);
 #endif

 	ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md));
@@ -294,8 +328,9 @@ int ip4ip6erspan_get_tunnel(struct __sk_buff *skb)
 		bpf_printk("\tindex %x\n", index);
 #else
 	bpf_printk("\tdirection %d hwid %x timestamp %u\n",
-		   md.u.md2.dir,
-		   (md.u.md2.hwid_upper << 4) + md.u.md2.hwid,
+		   BPF_CORE_READ_BITFIELD(&md.u.md2, dir),
+		   (BPF_CORE_READ_BITFIELD(&md.u.md2, hwid_upper) << 4) +
+		   BPF_CORE_READ_BITFIELD(&md.u.md2, hwid),
 		   bpf_ntohl(md.u.md2.timestamp));
 #endif
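One way to gain confidence in the shift arithmetic of a macro like this (a
sketch, assuming the macro above is used as-is in the selftest) is a
read-back check against the existing read-side helper:

	/* Round-trip sanity check: write via the proposed macro, read back
	 * via BPF_CORE_READ_BITFIELD() and compare. */
	BPF_CORE_WRITE_BITFIELD(&md.u.md2, hwid, hwid & 0xf);
	if (BPF_CORE_READ_BITFIELD(&md.u.md2, hwid) != (hwid & 0xf))
		bpf_printk("hwid bitfield round-trip mismatch\n");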
On 11/26/23 3:14 PM, Eduard Zingerman wrote: > On Sat, 2023-11-25 at 20:22 -0800, Yonghong Song wrote: > [...] >> --- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c >> +++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c >> @@ -6,7 +6,10 @@ >> * modify it under the terms of version 2 of the GNU General Public >> * License as published by the Free Software Foundation. >> */ >> -#define BPF_NO_PRESERVE_ACCESS_INDEX >> +#if __has_attribute(preserve_static_offset) >> +struct __attribute__((preserve_static_offset)) erspan_md2; >> +struct __attribute__((preserve_static_offset)) erspan_metadata; >> +#endif >> #include "vmlinux.h" > [...] >> int bpf_skb_get_fou_encap(struct __sk_buff *skb_ctx, >> @@ -174,9 +177,13 @@ int erspan_set_tunnel(struct __sk_buff *skb) >> __u8 hwid = 7; >> >> md.version = 2; >> +#if __has_attribute(preserve_static_offset) >> md.u.md2.dir = direction; >> md.u.md2.hwid = hwid & 0xf; >> md.u.md2.hwid_upper = (hwid >> 4) & 0x3; >> +#else >> + /* Change bit-field store to byte(s)-level stores. */ >> +#endif >> #endif >> >> ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md)); >> >> ==== >> >> Eduard, could you double check whether this is a valid use case >> to solve this kind of issue with preserve_static_offset attribute? > Tbh I'm not sure. This test passes with preserve_static_offset > because it suppresses preserve_access_index. In general clang > translates bitfield access to a set of IR statements like: > > C: > struct foo { > unsigned _; > unsigned a:1; > ... > }; > ... foo->a ... > > IR: > %a = getelementptr inbounds %struct.foo, ptr %0, i32 0, i32 1 > %bf.load = load i8, ptr %a, align 4 > %bf.clear = and i8 %bf.load, 1 > %bf.cast = zext i8 %bf.clear to i32 > > With preserve_static_offset the getelementptr+load are replaced by a > single statement which is preserved as-is till code generation, > thus load with align 4 is preserved. > > On the other hand, I'm not sure that clang guarantees that load or > stores used for bitfield access would be always aligned according to > verifier expectations. I think it should be true. The frontend does alignment analysis based on types and (packed vs. unpacked) and assign each load/store with proper alignment (like 'align 4' in the above). 'align 4' truely means the load itself is 4-byte aligned. Otherwise, it will be very confusing for arch's which do not support unaligned memory access (e.g. BPF). > > I think we should check if there are some clang knobs that prevent > generation of unaligned memory access. I'll take a look.
On 11/26/23 8:52 PM, Eduard Zingerman wrote: > On Sun, 2023-11-26 at 18:04 -0600, Daniel Xu wrote: > [...] >>> Tbh I'm not sure. This test passes with preserve_static_offset >>> because it suppresses preserve_access_index. In general clang >>> translates bitfield access to a set of IR statements like: >>> >>> C: >>> struct foo { >>> unsigned _; >>> unsigned a:1; >>> ... >>> }; >>> ... foo->a ... >>> >>> IR: >>> %a = getelementptr inbounds %struct.foo, ptr %0, i32 0, i32 1 >>> %bf.load = load i8, ptr %a, align 4 >>> %bf.clear = and i8 %bf.load, 1 >>> %bf.cast = zext i8 %bf.clear to i32 >>> >>> With preserve_static_offset the getelementptr+load are replaced by a >>> single statement which is preserved as-is till code generation, >>> thus load with align 4 is preserved. >>> >>> On the other hand, I'm not sure that clang guarantees that load or >>> stores used for bitfield access would be always aligned according to >>> verifier expectations. >>> >>> I think we should check if there are some clang knobs that prevent >>> generation of unaligned memory access. I'll take a look. >> Is there a reason to prefer fixing in compiler? I'm not opposed to it, >> but the downside to compiler fix is it takes years to propagate and >> sprinkles ifdefs into the code. >> >> Would it be possible to have an analogue of BPF_CORE_READ_BITFIELD()? > Well, the contraption below passes verification, tunnel selftest > appears to work. I might have messed up some shifts in the macro, though. I didn't test it. But from high level it should work. > > Still, if clang would peek unlucky BYTE_{OFFSET,SIZE} for a particular > field access might be unaligned. clang should pick a sensible BYTE_SIZE/BYTE_OFFSET to meet alignment requirement. This is also required for BPF_CORE_READ_BITFIELD. > > --- > > diff --git a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c > index 3065a716544d..41cd913ac7ff 100644 > --- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c > +++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c > @@ -9,6 +9,7 @@ > #include "vmlinux.h" > #include <bpf/bpf_helpers.h> > #include <bpf/bpf_endian.h> > +#include <bpf/bpf_core_read.h> > #include "bpf_kfuncs.h" > #include "bpf_tracing_net.h" > > @@ -144,6 +145,38 @@ int ip6gretap_get_tunnel(struct __sk_buff *skb) > return TC_ACT_OK; > } > > +#define BPF_CORE_WRITE_BITFIELD(s, field, new_val) ({ \ > + void *p = (void *)s + __CORE_RELO(s, field, BYTE_OFFSET); \ > + unsigned byte_size = __CORE_RELO(s, field, BYTE_SIZE); \ > + unsigned lshift = __CORE_RELO(s, field, LSHIFT_U64); \ > + unsigned rshift = __CORE_RELO(s, field, RSHIFT_U64); \ > + unsigned bit_size = (rshift - lshift); \ > + unsigned long long nval, val, hi, lo; \ > + \ > + asm volatile("" : "=r"(p) : "0"(p)); \ Use asm volatile("" : "+r"(p)) ? 
> + \ > + switch (byte_size) { \ > + case 1: val = *(unsigned char *)p; break; \ > + case 2: val = *(unsigned short *)p; break; \ > + case 4: val = *(unsigned int *)p; break; \ > + case 8: val = *(unsigned long long *)p; break; \ > + } \ > + hi = val >> (bit_size + rshift); \ > + hi <<= bit_size + rshift; \ > + lo = val << (bit_size + lshift); \ > + lo >>= bit_size + lshift; \ > + nval = new_val; \ > + nval <<= lshift; \ > + nval >>= rshift; \ > + val = hi | nval | lo; \ > + switch (byte_size) { \ > + case 1: *(unsigned char *)p = val; break; \ > + case 2: *(unsigned short *)p = val; break; \ > + case 4: *(unsigned int *)p = val; break; \ > + case 8: *(unsigned long long *)p = val; break; \ > + } \ > +}) I think this should be put in libbpf public header files but not sure where to put it. bpf_core_read.h although it is core write? But on the other hand, this is a uapi struct bitfield write, strictly speaking, CORE write is really unnecessary here. It would be great if we can relieve users from dealing with such unnecessary CORE writes. In that sense, for this particular case, I would prefer rewriting the code by using byte-level stores... > + > SEC("tc") > int erspan_set_tunnel(struct __sk_buff *skb) > { > @@ -173,9 +206,9 @@ int erspan_set_tunnel(struct __sk_buff *skb) > __u8 hwid = 7; > > md.version = 2; > - md.u.md2.dir = direction; > - md.u.md2.hwid = hwid & 0xf; > - md.u.md2.hwid_upper = (hwid >> 4) & 0x3; > + BPF_CORE_WRITE_BITFIELD(&md.u.md2, dir, direction); > + BPF_CORE_WRITE_BITFIELD(&md.u.md2, hwid, (hwid & 0xf)); > + BPF_CORE_WRITE_BITFIELD(&md.u.md2, hwid_upper, (hwid >> 4) & 0x3); > #endif > > ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md)); > @@ -214,8 +247,9 @@ int erspan_get_tunnel(struct __sk_buff *skb) > bpf_printk("\tindex %x\n", index); > #else > bpf_printk("\tdirection %d hwid %x timestamp %u\n", > - md.u.md2.dir, > - (md.u.md2.hwid_upper << 4) + md.u.md2.hwid, > + BPF_CORE_READ_BITFIELD(&md.u.md2, dir), > + (BPF_CORE_READ_BITFIELD(&md.u.md2, hwid_upper) << 4) + > + BPF_CORE_READ_BITFIELD(&md.u.md2, hwid), > bpf_ntohl(md.u.md2.timestamp)); > #endif > > @@ -252,9 +286,9 @@ int ip4ip6erspan_set_tunnel(struct __sk_buff *skb) > __u8 hwid = 17; > > md.version = 2; > - md.u.md2.dir = direction; > - md.u.md2.hwid = hwid & 0xf; > - md.u.md2.hwid_upper = (hwid >> 4) & 0x3; > + BPF_CORE_WRITE_BITFIELD(&md.u.md2, dir, direction); > + BPF_CORE_WRITE_BITFIELD(&md.u.md2, hwid, (hwid & 0xf)); > + BPF_CORE_WRITE_BITFIELD(&md.u.md2, hwid_upper, (hwid >> 4) & 0x3); > #endif > > ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md)); > @@ -294,8 +328,9 @@ int ip4ip6erspan_get_tunnel(struct __sk_buff *skb) > bpf_printk("\tindex %x\n", index); > #else > bpf_printk("\tdirection %d hwid %x timestamp %u\n", > - md.u.md2.dir, > - (md.u.md2.hwid_upper << 4) + md.u.md2.hwid, > + BPF_CORE_READ_BITFIELD(&md.u.md2, dir), > + (BPF_CORE_READ_BITFIELD(&md.u.md2, hwid_upper) << 4) + > + BPF_CORE_READ_BITFIELD(&md.u.md2, hwid), > bpf_ntohl(md.u.md2.timestamp)); > #endif >
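On the inline-asm detail raised above: both spellings are an empty asm
statement acting as an optimization barrier on p (the same barrier_var()-style
trick libbpf's read-side bitfield macro relies on); they only differ in how
the operand tie is expressed, as in this sketch:

	/* Output operand tied to input 0: p goes through a register and the
	 * compiler can no longer reason about where it came from. */
	asm volatile("" : "=r"(p) : "0"(p));

	/* Read-write operand form, equivalent effect with less typing. */
	asm volatile("" : "+r"(p));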
On 11/27/23 12:44 AM, Yonghong Song wrote: > > On 11/26/23 8:52 PM, Eduard Zingerman wrote: >> On Sun, 2023-11-26 at 18:04 -0600, Daniel Xu wrote: >> [...] >>>> Tbh I'm not sure. This test passes with preserve_static_offset >>>> because it suppresses preserve_access_index. In general clang >>>> translates bitfield access to a set of IR statements like: >>>> >>>> C: >>>> struct foo { >>>> unsigned _; >>>> unsigned a:1; >>>> ... >>>> }; >>>> ... foo->a ... >>>> >>>> IR: >>>> %a = getelementptr inbounds %struct.foo, ptr %0, i32 0, i32 1 >>>> %bf.load = load i8, ptr %a, align 4 >>>> %bf.clear = and i8 %bf.load, 1 >>>> %bf.cast = zext i8 %bf.clear to i32 >>>> >>>> With preserve_static_offset the getelementptr+load are replaced by a >>>> single statement which is preserved as-is till code generation, >>>> thus load with align 4 is preserved. >>>> >>>> On the other hand, I'm not sure that clang guarantees that load or >>>> stores used for bitfield access would be always aligned according to >>>> verifier expectations. >>>> >>>> I think we should check if there are some clang knobs that prevent >>>> generation of unaligned memory access. I'll take a look. >>> Is there a reason to prefer fixing in compiler? I'm not opposed to it, >>> but the downside to compiler fix is it takes years to propagate and >>> sprinkles ifdefs into the code. >>> >>> Would it be possible to have an analogue of BPF_CORE_READ_BITFIELD()? >> Well, the contraption below passes verification, tunnel selftest >> appears to work. I might have messed up some shifts in the macro, >> though. > > I didn't test it. But from high level it should work. > >> >> Still, if clang would peek unlucky BYTE_{OFFSET,SIZE} for a particular >> field access might be unaligned. > > clang should pick a sensible BYTE_SIZE/BYTE_OFFSET to meet > alignment requirement. This is also required for BPF_CORE_READ_BITFIELD. > >> >> --- >> >> diff --git a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c >> b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c >> index 3065a716544d..41cd913ac7ff 100644 >> --- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c >> +++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c >> @@ -9,6 +9,7 @@ >> #include "vmlinux.h" >> #include <bpf/bpf_helpers.h> >> #include <bpf/bpf_endian.h> >> +#include <bpf/bpf_core_read.h> >> #include "bpf_kfuncs.h" >> #include "bpf_tracing_net.h" >> @@ -144,6 +145,38 @@ int ip6gretap_get_tunnel(struct __sk_buff *skb) >> return TC_ACT_OK; >> } >> +#define BPF_CORE_WRITE_BITFIELD(s, field, new_val) ({ \ >> + void *p = (void *)s + __CORE_RELO(s, field, BYTE_OFFSET); \ >> + unsigned byte_size = __CORE_RELO(s, field, BYTE_SIZE); \ >> + unsigned lshift = __CORE_RELO(s, field, LSHIFT_U64); \ >> + unsigned rshift = __CORE_RELO(s, field, RSHIFT_U64); \ >> + unsigned bit_size = (rshift - lshift); \ >> + unsigned long long nval, val, hi, lo; \ >> + \ >> + asm volatile("" : "=r"(p) : "0"(p)); \ > > Use asm volatile("" : "+r"(p)) ? 
> >> + \ >> + switch (byte_size) { \ >> + case 1: val = *(unsigned char *)p; break; \ >> + case 2: val = *(unsigned short *)p; break; \ >> + case 4: val = *(unsigned int *)p; break; \ >> + case 8: val = *(unsigned long long *)p; break; \ >> + } \ >> + hi = val >> (bit_size + rshift); \ >> + hi <<= bit_size + rshift; \ >> + lo = val << (bit_size + lshift); \ >> + lo >>= bit_size + lshift; \ >> + nval = new_val; \ >> + nval <<= lshift; \ >> + nval >>= rshift; \ >> + val = hi | nval | lo; \ >> + switch (byte_size) { \ >> + case 1: *(unsigned char *)p = val; break; \ >> + case 2: *(unsigned short *)p = val; break; \ >> + case 4: *(unsigned int *)p = val; break; \ >> + case 8: *(unsigned long long *)p = val; break; \ >> + } \ >> +}) > > I think this should be put in libbpf public header files but not sure > where to put it. bpf_core_read.h although it is core write? > > But on the other hand, this is a uapi struct bitfield write, > strictly speaking, CORE write is really unnecessary here. It > would be great if we can relieve users from dealing with > such unnecessary CORE writes. In that sense, for this particular > case, I would prefer rewriting the code by using byte-level > stores... or preserve_static_offset to clearly mean to undo bitfield CORE ... [...]
On Sun, Nov 26, 2023 at 09:53:04PM -0800, Yonghong Song wrote: > > On 11/27/23 12:44 AM, Yonghong Song wrote: > > > > On 11/26/23 8:52 PM, Eduard Zingerman wrote: > > > On Sun, 2023-11-26 at 18:04 -0600, Daniel Xu wrote: > > > [...] > > > > > Tbh I'm not sure. This test passes with preserve_static_offset > > > > > because it suppresses preserve_access_index. In general clang > > > > > translates bitfield access to a set of IR statements like: > > > > > > > > > > C: > > > > > struct foo { > > > > > unsigned _; > > > > > unsigned a:1; > > > > > ... > > > > > }; > > > > > ... foo->a ... > > > > > > > > > > IR: > > > > > %a = getelementptr inbounds %struct.foo, ptr %0, i32 0, i32 1 > > > > > %bf.load = load i8, ptr %a, align 4 > > > > > %bf.clear = and i8 %bf.load, 1 > > > > > %bf.cast = zext i8 %bf.clear to i32 > > > > > > > > > > With preserve_static_offset the getelementptr+load are replaced by a > > > > > single statement which is preserved as-is till code generation, > > > > > thus load with align 4 is preserved. > > > > > > > > > > On the other hand, I'm not sure that clang guarantees that load or > > > > > stores used for bitfield access would be always aligned according to > > > > > verifier expectations. > > > > > > > > > > I think we should check if there are some clang knobs that prevent > > > > > generation of unaligned memory access. I'll take a look. > > > > Is there a reason to prefer fixing in compiler? I'm not opposed to it, > > > > but the downside to compiler fix is it takes years to propagate and > > > > sprinkles ifdefs into the code. > > > > > > > > Would it be possible to have an analogue of BPF_CORE_READ_BITFIELD()? > > > Well, the contraption below passes verification, tunnel selftest > > > appears to work. I might have messed up some shifts in the macro, > > > though. > > > > I didn't test it. But from high level it should work. > > > > > > > > Still, if clang would peek unlucky BYTE_{OFFSET,SIZE} for a particular > > > field access might be unaligned. > > > > clang should pick a sensible BYTE_SIZE/BYTE_OFFSET to meet > > alignment requirement. This is also required for BPF_CORE_READ_BITFIELD. > > > > > > > > --- > > > > > > diff --git a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c > > > b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c > > > index 3065a716544d..41cd913ac7ff 100644 > > > --- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c > > > +++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c > > > @@ -9,6 +9,7 @@ > > > #include "vmlinux.h" > > > #include <bpf/bpf_helpers.h> > > > #include <bpf/bpf_endian.h> > > > +#include <bpf/bpf_core_read.h> > > > #include "bpf_kfuncs.h" > > > #include "bpf_tracing_net.h" > > > @@ -144,6 +145,38 @@ int ip6gretap_get_tunnel(struct __sk_buff *skb) > > > return TC_ACT_OK; > > > } > > > +#define BPF_CORE_WRITE_BITFIELD(s, field, new_val) ({ \ > > > + void *p = (void *)s + __CORE_RELO(s, field, BYTE_OFFSET); \ > > > + unsigned byte_size = __CORE_RELO(s, field, BYTE_SIZE); \ > > > + unsigned lshift = __CORE_RELO(s, field, LSHIFT_U64); \ > > > + unsigned rshift = __CORE_RELO(s, field, RSHIFT_U64); \ > > > + unsigned bit_size = (rshift - lshift); \ > > > + unsigned long long nval, val, hi, lo; \ > > > + \ > > > + asm volatile("" : "=r"(p) : "0"(p)); \ > > > > Use asm volatile("" : "+r"(p)) ? 
> > > > > + \ > > > + switch (byte_size) { \ > > > + case 1: val = *(unsigned char *)p; break; \ > > > + case 2: val = *(unsigned short *)p; break; \ > > > + case 4: val = *(unsigned int *)p; break; \ > > > + case 8: val = *(unsigned long long *)p; break; \ > > > + } \ > > > + hi = val >> (bit_size + rshift); \ > > > + hi <<= bit_size + rshift; \ > > > + lo = val << (bit_size + lshift); \ > > > + lo >>= bit_size + lshift; \ > > > + nval = new_val; \ > > > + nval <<= lshift; \ > > > + nval >>= rshift; \ > > > + val = hi | nval | lo; \ > > > + switch (byte_size) { \ > > > + case 1: *(unsigned char *)p = val; break; \ > > > + case 2: *(unsigned short *)p = val; break; \ > > > + case 4: *(unsigned int *)p = val; break; \ > > > + case 8: *(unsigned long long *)p = val; break; \ > > > + } \ > > > +}) > > > > I think this should be put in libbpf public header files but not sure > > where to put it. bpf_core_read.h although it is core write? > > > > But on the other hand, this is a uapi struct bitfield write, > > strictly speaking, CORE write is really unnecessary here. It > > would be great if we can relieve users from dealing with > > such unnecessary CORE writes. In that sense, for this particular > > case, I would prefer rewriting the code by using byte-level > > stores... > or preserve_static_offset to clearly mean to undo bitfield CORE ... Ok, I will do byte-level rewrite for next revision. Just wondering, though: will bpftool be able to generate the appropriate annotations for uapi structs? IIUC uapi structs look the same in BTF as any other struct. > > [...] > Thanks, Daniel
On Mon, 2023-11-27 at 14:45 -0600, Daniel Xu wrote:
[...]
> IIUC uapi structs look the same in BTF as any other struct.

Yes, and all share the preserve_access_index attribute because of the way
attribute push/pop directives are generated in vmlinux.h.

> Just wondering, though: will bpftool be able to generate the appropriate
> annotations for uapi structs?

The problem is that there is no easy way to identify whether a structure is
uapi in DWARF (from which BTF is generated). One way to do this:
- modify pahole to check DW_AT_decl_file for each struct DWARF entry and
  generate some special decl tag in BTF;
- modify bpftool to interpret this tag as a marker to not generate
  preserve_access_index for a structure.

The drawback is that such behavior hardcodes some kernel-specific
assumptions both in pahole and in bpftool. It also remains to be seen
whether DW_AT_decl_file tags are consistent.

It might be the case that allowing excessive CO-RE relocations is a
better option (and maybe tweak something about bitfield access generation
to avoid issues like the one in this thread).

Thanks,
Eduard
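For readers wondering why every type from vmlinux.h carries the attribute:
bpftool wraps the entire generated header in a clang attribute push/pop,
roughly like the following (a paraphrased sketch of bpftool's output, not
an exact copy):

	#ifndef __VMLINUX_H__
	#define __VMLINUX_H__

	#ifndef BPF_NO_PRESERVE_ACCESS_INDEX
	#pragma clang attribute push (__attribute__((preserve_access_index)), apply_to = record)
	#endif

	/* ... every kernel struct/union definition, uapi types included ... */

	#ifndef BPF_NO_PRESERVE_ACCESS_INDEX
	#pragma clang attribute pop
	#endif

	#endif /* __VMLINUX_H__ */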
On Mon, Nov 27, 2023 at 02:45:11PM -0600, Daniel Xu wrote: > On Sun, Nov 26, 2023 at 09:53:04PM -0800, Yonghong Song wrote: > > > > On 11/27/23 12:44 AM, Yonghong Song wrote: > > > > > > On 11/26/23 8:52 PM, Eduard Zingerman wrote: > > > > On Sun, 2023-11-26 at 18:04 -0600, Daniel Xu wrote: > > > > [...] > > > > > > Tbh I'm not sure. This test passes with preserve_static_offset > > > > > > because it suppresses preserve_access_index. In general clang > > > > > > translates bitfield access to a set of IR statements like: > > > > > > > > > > > > C: > > > > > > struct foo { > > > > > > unsigned _; > > > > > > unsigned a:1; > > > > > > ... > > > > > > }; > > > > > > ... foo->a ... > > > > > > > > > > > > IR: > > > > > > %a = getelementptr inbounds %struct.foo, ptr %0, i32 0, i32 1 > > > > > > %bf.load = load i8, ptr %a, align 4 > > > > > > %bf.clear = and i8 %bf.load, 1 > > > > > > %bf.cast = zext i8 %bf.clear to i32 > > > > > > > > > > > > With preserve_static_offset the getelementptr+load are replaced by a > > > > > > single statement which is preserved as-is till code generation, > > > > > > thus load with align 4 is preserved. > > > > > > > > > > > > On the other hand, I'm not sure that clang guarantees that load or > > > > > > stores used for bitfield access would be always aligned according to > > > > > > verifier expectations. > > > > > > > > > > > > I think we should check if there are some clang knobs that prevent > > > > > > generation of unaligned memory access. I'll take a look. > > > > > Is there a reason to prefer fixing in compiler? I'm not opposed to it, > > > > > but the downside to compiler fix is it takes years to propagate and > > > > > sprinkles ifdefs into the code. > > > > > > > > > > Would it be possible to have an analogue of BPF_CORE_READ_BITFIELD()? > > > > Well, the contraption below passes verification, tunnel selftest > > > > appears to work. I might have messed up some shifts in the macro, > > > > though. > > > > > > I didn't test it. But from high level it should work. > > > > > > > > > > > Still, if clang would peek unlucky BYTE_{OFFSET,SIZE} for a particular > > > > field access might be unaligned. > > > > > > clang should pick a sensible BYTE_SIZE/BYTE_OFFSET to meet > > > alignment requirement. This is also required for BPF_CORE_READ_BITFIELD. 
> > > > > > > > > > > --- > > > > > > > > diff --git a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c > > > > b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c > > > > index 3065a716544d..41cd913ac7ff 100644 > > > > --- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c > > > > +++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c > > > > @@ -9,6 +9,7 @@ > > > > #include "vmlinux.h" > > > > #include <bpf/bpf_helpers.h> > > > > #include <bpf/bpf_endian.h> > > > > +#include <bpf/bpf_core_read.h> > > > > #include "bpf_kfuncs.h" > > > > #include "bpf_tracing_net.h" > > > > @@ -144,6 +145,38 @@ int ip6gretap_get_tunnel(struct __sk_buff *skb) > > > > return TC_ACT_OK; > > > > } > > > > +#define BPF_CORE_WRITE_BITFIELD(s, field, new_val) ({ \ > > > > + void *p = (void *)s + __CORE_RELO(s, field, BYTE_OFFSET); \ > > > > + unsigned byte_size = __CORE_RELO(s, field, BYTE_SIZE); \ > > > > + unsigned lshift = __CORE_RELO(s, field, LSHIFT_U64); \ > > > > + unsigned rshift = __CORE_RELO(s, field, RSHIFT_U64); \ > > > > + unsigned bit_size = (rshift - lshift); \ > > > > + unsigned long long nval, val, hi, lo; \ > > > > + \ > > > > + asm volatile("" : "=r"(p) : "0"(p)); \ > > > > > > Use asm volatile("" : "+r"(p)) ? > > > > > > > + \ > > > > + switch (byte_size) { \ > > > > + case 1: val = *(unsigned char *)p; break; \ > > > > + case 2: val = *(unsigned short *)p; break; \ > > > > + case 4: val = *(unsigned int *)p; break; \ > > > > + case 8: val = *(unsigned long long *)p; break; \ > > > > + } \ > > > > + hi = val >> (bit_size + rshift); \ > > > > + hi <<= bit_size + rshift; \ > > > > + lo = val << (bit_size + lshift); \ > > > > + lo >>= bit_size + lshift; \ > > > > + nval = new_val; \ > > > > + nval <<= lshift; \ > > > > + nval >>= rshift; \ > > > > + val = hi | nval | lo; \ > > > > + switch (byte_size) { \ > > > > + case 1: *(unsigned char *)p = val; break; \ > > > > + case 2: *(unsigned short *)p = val; break; \ > > > > + case 4: *(unsigned int *)p = val; break; \ > > > > + case 8: *(unsigned long long *)p = val; break; \ > > > > + } \ > > > > +}) > > > > > > I think this should be put in libbpf public header files but not sure > > > where to put it. bpf_core_read.h although it is core write? > > > > > > But on the other hand, this is a uapi struct bitfield write, > > > strictly speaking, CORE write is really unnecessary here. It > > > would be great if we can relieve users from dealing with > > > such unnecessary CORE writes. In that sense, for this particular > > > case, I would prefer rewriting the code by using byte-level > > > stores... > > or preserve_static_offset to clearly mean to undo bitfield CORE ... > > Ok, I will do byte-level rewrite for next revision. [...] This patch seems to work: https://pastes.dxuuu.xyz/0glrf9 . But I don't think it's very pretty. Also I'm seeing on the internet that people are saying the exact layout of bitfields is compiler dependent. So I am wondering if these byte sized writes are correct. For that matter, I am wondering how the GCC generated bitfield accesses line up with clang generated BPF bytecode. Or why uapi contains a bitfield. WDYT, should I send up v2 with this or should I do one of the other approaches in this thread? I am ok with any of the approaches. Thanks, Daniel
On 11/27/23 7:01 PM, Daniel Xu wrote: > On Mon, Nov 27, 2023 at 02:45:11PM -0600, Daniel Xu wrote: >> On Sun, Nov 26, 2023 at 09:53:04PM -0800, Yonghong Song wrote: >>> On 11/27/23 12:44 AM, Yonghong Song wrote: >>>> On 11/26/23 8:52 PM, Eduard Zingerman wrote: >>>>> On Sun, 2023-11-26 at 18:04 -0600, Daniel Xu wrote: >>>>> [...] >>>>>>> Tbh I'm not sure. This test passes with preserve_static_offset >>>>>>> because it suppresses preserve_access_index. In general clang >>>>>>> translates bitfield access to a set of IR statements like: >>>>>>> >>>>>>> C: >>>>>>> struct foo { >>>>>>> unsigned _; >>>>>>> unsigned a:1; >>>>>>> ... >>>>>>> }; >>>>>>> ... foo->a ... >>>>>>> >>>>>>> IR: >>>>>>> %a = getelementptr inbounds %struct.foo, ptr %0, i32 0, i32 1 >>>>>>> %bf.load = load i8, ptr %a, align 4 >>>>>>> %bf.clear = and i8 %bf.load, 1 >>>>>>> %bf.cast = zext i8 %bf.clear to i32 >>>>>>> >>>>>>> With preserve_static_offset the getelementptr+load are replaced by a >>>>>>> single statement which is preserved as-is till code generation, >>>>>>> thus load with align 4 is preserved. >>>>>>> >>>>>>> On the other hand, I'm not sure that clang guarantees that load or >>>>>>> stores used for bitfield access would be always aligned according to >>>>>>> verifier expectations. >>>>>>> >>>>>>> I think we should check if there are some clang knobs that prevent >>>>>>> generation of unaligned memory access. I'll take a look. >>>>>> Is there a reason to prefer fixing in compiler? I'm not opposed to it, >>>>>> but the downside to compiler fix is it takes years to propagate and >>>>>> sprinkles ifdefs into the code. >>>>>> >>>>>> Would it be possible to have an analogue of BPF_CORE_READ_BITFIELD()? >>>>> Well, the contraption below passes verification, tunnel selftest >>>>> appears to work. I might have messed up some shifts in the macro, >>>>> though. >>>> I didn't test it. But from high level it should work. >>>> >>>>> Still, if clang would peek unlucky BYTE_{OFFSET,SIZE} for a particular >>>>> field access might be unaligned. >>>> clang should pick a sensible BYTE_SIZE/BYTE_OFFSET to meet >>>> alignment requirement. This is also required for BPF_CORE_READ_BITFIELD. >>>> >>>>> --- >>>>> >>>>> diff --git a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c >>>>> b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c >>>>> index 3065a716544d..41cd913ac7ff 100644 >>>>> --- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c >>>>> +++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c >>>>> @@ -9,6 +9,7 @@ >>>>> #include "vmlinux.h" >>>>> #include <bpf/bpf_helpers.h> >>>>> #include <bpf/bpf_endian.h> >>>>> +#include <bpf/bpf_core_read.h> >>>>> #include "bpf_kfuncs.h" >>>>> #include "bpf_tracing_net.h" >>>>> @@ -144,6 +145,38 @@ int ip6gretap_get_tunnel(struct __sk_buff *skb) >>>>> return TC_ACT_OK; >>>>> } >>>>> +#define BPF_CORE_WRITE_BITFIELD(s, field, new_val) ({ \ >>>>> + void *p = (void *)s + __CORE_RELO(s, field, BYTE_OFFSET); \ >>>>> + unsigned byte_size = __CORE_RELO(s, field, BYTE_SIZE); \ >>>>> + unsigned lshift = __CORE_RELO(s, field, LSHIFT_U64); \ >>>>> + unsigned rshift = __CORE_RELO(s, field, RSHIFT_U64); \ >>>>> + unsigned bit_size = (rshift - lshift); \ >>>>> + unsigned long long nval, val, hi, lo; \ >>>>> + \ >>>>> + asm volatile("" : "=r"(p) : "0"(p)); \ >>>> Use asm volatile("" : "+r"(p)) ? 
>>>> >>>>> + \ >>>>> + switch (byte_size) { \ >>>>> + case 1: val = *(unsigned char *)p; break; \ >>>>> + case 2: val = *(unsigned short *)p; break; \ >>>>> + case 4: val = *(unsigned int *)p; break; \ >>>>> + case 8: val = *(unsigned long long *)p; break; \ >>>>> + } \ >>>>> + hi = val >> (bit_size + rshift); \ >>>>> + hi <<= bit_size + rshift; \ >>>>> + lo = val << (bit_size + lshift); \ >>>>> + lo >>= bit_size + lshift; \ >>>>> + nval = new_val; \ >>>>> + nval <<= lshift; \ >>>>> + nval >>= rshift; \ >>>>> + val = hi | nval | lo; \ >>>>> + switch (byte_size) { \ >>>>> + case 1: *(unsigned char *)p = val; break; \ >>>>> + case 2: *(unsigned short *)p = val; break; \ >>>>> + case 4: *(unsigned int *)p = val; break; \ >>>>> + case 8: *(unsigned long long *)p = val; break; \ >>>>> + } \ >>>>> +}) >>>> I think this should be put in libbpf public header files but not sure >>>> where to put it. bpf_core_read.h although it is core write? >>>> >>>> But on the other hand, this is a uapi struct bitfield write, >>>> strictly speaking, CORE write is really unnecessary here. It >>>> would be great if we can relieve users from dealing with >>>> such unnecessary CORE writes. In that sense, for this particular >>>> case, I would prefer rewriting the code by using byte-level >>>> stores... >>> or preserve_static_offset to clearly mean to undo bitfield CORE ... >> Ok, I will do byte-level rewrite for next revision. > [...] > > This patch seems to work: https://pastes.dxuuu.xyz/0glrf9 . > > But I don't think it's very pretty. Also I'm seeing on the internet that > people are saying the exact layout of bitfields is compiler dependent. Any reference for this (exact layout of bitfields is compiler dependent)? > So I am wondering if these byte sized writes are correct. For that > matter, I am wondering how the GCC generated bitfield accesses line up > with clang generated BPF bytecode. Or why uapi contains a bitfield. One thing for sure is memory layout of bitfields should be the same for both clang and gcc as it is determined by C standard. Register representation and how to manipulate could be different for different compilers. > > WDYT, should I send up v2 with this or should I do one of the other > approaches in this thread? Daniel, look at your patch, since we need to do CORE_READ for those bitfields any way, I think Eduard's patch with BPF_CORE_WRITE_BITFIELD does make sense and it also makes code easy to understand. Could you take Eduard's patch for now? Whether and where to put BPF_CORE_WRITE_BITFIELD macros can be decided later. > > I am ok with any of the approaches. > > Thanks, > Daniel >
On Mon, Nov 27, 2023 at 8:06 PM Yonghong Song <yonghong.song@linux.dev> wrote: > > > On 11/27/23 7:01 PM, Daniel Xu wrote: > > On Mon, Nov 27, 2023 at 02:45:11PM -0600, Daniel Xu wrote: > >> On Sun, Nov 26, 2023 at 09:53:04PM -0800, Yonghong Song wrote: > >>> On 11/27/23 12:44 AM, Yonghong Song wrote: > >>>> On 11/26/23 8:52 PM, Eduard Zingerman wrote: > >>>>> On Sun, 2023-11-26 at 18:04 -0600, Daniel Xu wrote: > >>>>> [...] > >>>>>>> Tbh I'm not sure. This test passes with preserve_static_offset > >>>>>>> because it suppresses preserve_access_index. In general clang > >>>>>>> translates bitfield access to a set of IR statements like: > >>>>>>> > >>>>>>> C: > >>>>>>> struct foo { > >>>>>>> unsigned _; > >>>>>>> unsigned a:1; > >>>>>>> ... > >>>>>>> }; > >>>>>>> ... foo->a ... > >>>>>>> > >>>>>>> IR: > >>>>>>> %a = getelementptr inbounds %struct.foo, ptr %0, i32 0, i32 1 > >>>>>>> %bf.load = load i8, ptr %a, align 4 > >>>>>>> %bf.clear = and i8 %bf.load, 1 > >>>>>>> %bf.cast = zext i8 %bf.clear to i32 > >>>>>>> > >>>>>>> With preserve_static_offset the getelementptr+load are replaced by a > >>>>>>> single statement which is preserved as-is till code generation, > >>>>>>> thus load with align 4 is preserved. > >>>>>>> > >>>>>>> On the other hand, I'm not sure that clang guarantees that load or > >>>>>>> stores used for bitfield access would be always aligned according to > >>>>>>> verifier expectations. > >>>>>>> > >>>>>>> I think we should check if there are some clang knobs that prevent > >>>>>>> generation of unaligned memory access. I'll take a look. > >>>>>> Is there a reason to prefer fixing in compiler? I'm not opposed to it, > >>>>>> but the downside to compiler fix is it takes years to propagate and > >>>>>> sprinkles ifdefs into the code. > >>>>>> > >>>>>> Would it be possible to have an analogue of BPF_CORE_READ_BITFIELD()? > >>>>> Well, the contraption below passes verification, tunnel selftest > >>>>> appears to work. I might have messed up some shifts in the macro, > >>>>> though. > >>>> I didn't test it. But from high level it should work. > >>>> > >>>>> Still, if clang would peek unlucky BYTE_{OFFSET,SIZE} for a particular > >>>>> field access might be unaligned. > >>>> clang should pick a sensible BYTE_SIZE/BYTE_OFFSET to meet > >>>> alignment requirement. This is also required for BPF_CORE_READ_BITFIELD. 
> >>>> > >>>>> --- > >>>>> > >>>>> diff --git a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c > >>>>> b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c > >>>>> index 3065a716544d..41cd913ac7ff 100644 > >>>>> --- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c > >>>>> +++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c > >>>>> @@ -9,6 +9,7 @@ > >>>>> #include "vmlinux.h" > >>>>> #include <bpf/bpf_helpers.h> > >>>>> #include <bpf/bpf_endian.h> > >>>>> +#include <bpf/bpf_core_read.h> > >>>>> #include "bpf_kfuncs.h" > >>>>> #include "bpf_tracing_net.h" > >>>>> @@ -144,6 +145,38 @@ int ip6gretap_get_tunnel(struct __sk_buff *skb) > >>>>> return TC_ACT_OK; > >>>>> } > >>>>> +#define BPF_CORE_WRITE_BITFIELD(s, field, new_val) ({ \ > >>>>> + void *p = (void *)s + __CORE_RELO(s, field, BYTE_OFFSET); \ > >>>>> + unsigned byte_size = __CORE_RELO(s, field, BYTE_SIZE); \ > >>>>> + unsigned lshift = __CORE_RELO(s, field, LSHIFT_U64); \ > >>>>> + unsigned rshift = __CORE_RELO(s, field, RSHIFT_U64); \ > >>>>> + unsigned bit_size = (rshift - lshift); \ > >>>>> + unsigned long long nval, val, hi, lo; \ > >>>>> + \ > >>>>> + asm volatile("" : "=r"(p) : "0"(p)); \ > >>>> Use asm volatile("" : "+r"(p)) ? > >>>> > >>>>> + \ > >>>>> + switch (byte_size) { \ > >>>>> + case 1: val = *(unsigned char *)p; break; \ > >>>>> + case 2: val = *(unsigned short *)p; break; \ > >>>>> + case 4: val = *(unsigned int *)p; break; \ > >>>>> + case 8: val = *(unsigned long long *)p; break; \ > >>>>> + } \ > >>>>> + hi = val >> (bit_size + rshift); \ > >>>>> + hi <<= bit_size + rshift; \ > >>>>> + lo = val << (bit_size + lshift); \ > >>>>> + lo >>= bit_size + lshift; \ > >>>>> + nval = new_val; \ > >>>>> + nval <<= lshift; \ > >>>>> + nval >>= rshift; \ > >>>>> + val = hi | nval | lo; \ > >>>>> + switch (byte_size) { \ > >>>>> + case 1: *(unsigned char *)p = val; break; \ > >>>>> + case 2: *(unsigned short *)p = val; break; \ > >>>>> + case 4: *(unsigned int *)p = val; break; \ > >>>>> + case 8: *(unsigned long long *)p = val; break; \ > >>>>> + } \ > >>>>> +}) > >>>> I think this should be put in libbpf public header files but not sure > >>>> where to put it. bpf_core_read.h although it is core write? > >>>> > >>>> But on the other hand, this is a uapi struct bitfield write, > >>>> strictly speaking, CORE write is really unnecessary here. It > >>>> would be great if we can relieve users from dealing with > >>>> such unnecessary CORE writes. In that sense, for this particular > >>>> case, I would prefer rewriting the code by using byte-level > >>>> stores... > >>> or preserve_static_offset to clearly mean to undo bitfield CORE ... > >> Ok, I will do byte-level rewrite for next revision. > > [...] > > > > This patch seems to work: https://pastes.dxuuu.xyz/0glrf9 . > > > > But I don't think it's very pretty. Also I'm seeing on the internet that > > people are saying the exact layout of bitfields is compiler dependent. > > Any reference for this (exact layout of bitfields is compiler dependent)? > > > So I am wondering if these byte sized writes are correct. For that > > matter, I am wondering how the GCC generated bitfield accesses line up > > with clang generated BPF bytecode. Or why uapi contains a bitfield. > > One thing for sure is memory layout of bitfields should be the same > for both clang and gcc as it is determined by C standard. Register > representation and how to manipulate could be different for different > compilers. 
> > > > > WDYT, should I send up v2 with this or should I do one of the other > > approaches in this thread? > > Daniel, look at your patch, since we need to do CORE_READ for > those bitfields any way, I think Eduard's patch with > BPF_CORE_WRITE_BITFIELD does make sense and it also makes code > easy to understand. Could you take Eduard's patch for now? > Whether and where to put BPF_CORE_WRITE_BITFIELD macros > can be decided later. bpf_core_read.h name is... let's say "historical" and was never meant to limit stuff there to read-only or anything like that. Think about it as just bpf_core.h where all the CO-RE-related stuff goes. So please put BPF_CORE_WRITE_BITFIELD there. > > > > > I am ok with any of the approaches. > > > > Thanks, > > Daniel > >
On Mon, Nov 27, 2023 at 08:06:01PM -0800, Yonghong Song wrote: > > On 11/27/23 7:01 PM, Daniel Xu wrote: > > On Mon, Nov 27, 2023 at 02:45:11PM -0600, Daniel Xu wrote: > > > On Sun, Nov 26, 2023 at 09:53:04PM -0800, Yonghong Song wrote: > > > > On 11/27/23 12:44 AM, Yonghong Song wrote: > > > > > On 11/26/23 8:52 PM, Eduard Zingerman wrote: > > > > > > On Sun, 2023-11-26 at 18:04 -0600, Daniel Xu wrote: > > > > > > [...] > > > > > > > > Tbh I'm not sure. This test passes with preserve_static_offset > > > > > > > > because it suppresses preserve_access_index. In general clang > > > > > > > > translates bitfield access to a set of IR statements like: > > > > > > > > > > > > > > > > C: > > > > > > > > struct foo { > > > > > > > > unsigned _; > > > > > > > > unsigned a:1; > > > > > > > > ... > > > > > > > > }; > > > > > > > > ... foo->a ... > > > > > > > > > > > > > > > > IR: > > > > > > > > %a = getelementptr inbounds %struct.foo, ptr %0, i32 0, i32 1 > > > > > > > > %bf.load = load i8, ptr %a, align 4 > > > > > > > > %bf.clear = and i8 %bf.load, 1 > > > > > > > > %bf.cast = zext i8 %bf.clear to i32 > > > > > > > > > > > > > > > > With preserve_static_offset the getelementptr+load are replaced by a > > > > > > > > single statement which is preserved as-is till code generation, > > > > > > > > thus load with align 4 is preserved. > > > > > > > > > > > > > > > > On the other hand, I'm not sure that clang guarantees that load or > > > > > > > > stores used for bitfield access would be always aligned according to > > > > > > > > verifier expectations. > > > > > > > > > > > > > > > > I think we should check if there are some clang knobs that prevent > > > > > > > > generation of unaligned memory access. I'll take a look. > > > > > > > Is there a reason to prefer fixing in compiler? I'm not opposed to it, > > > > > > > but the downside to compiler fix is it takes years to propagate and > > > > > > > sprinkles ifdefs into the code. > > > > > > > > > > > > > > Would it be possible to have an analogue of BPF_CORE_READ_BITFIELD()? > > > > > > Well, the contraption below passes verification, tunnel selftest > > > > > > appears to work. I might have messed up some shifts in the macro, > > > > > > though. > > > > > I didn't test it. But from high level it should work. > > > > > > > > > > > Still, if clang would peek unlucky BYTE_{OFFSET,SIZE} for a particular > > > > > > field access might be unaligned. > > > > > clang should pick a sensible BYTE_SIZE/BYTE_OFFSET to meet > > > > > alignment requirement. This is also required for BPF_CORE_READ_BITFIELD. 
> > > > > > > > > > > --- > > > > > > > > > > > > diff --git a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c > > > > > > b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c > > > > > > index 3065a716544d..41cd913ac7ff 100644 > > > > > > --- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c > > > > > > +++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c > > > > > > @@ -9,6 +9,7 @@ > > > > > > #include "vmlinux.h" > > > > > > #include <bpf/bpf_helpers.h> > > > > > > #include <bpf/bpf_endian.h> > > > > > > +#include <bpf/bpf_core_read.h> > > > > > > #include "bpf_kfuncs.h" > > > > > > #include "bpf_tracing_net.h" > > > > > > @@ -144,6 +145,38 @@ int ip6gretap_get_tunnel(struct __sk_buff *skb) > > > > > > return TC_ACT_OK; > > > > > > } > > > > > > +#define BPF_CORE_WRITE_BITFIELD(s, field, new_val) ({ \ > > > > > > + void *p = (void *)s + __CORE_RELO(s, field, BYTE_OFFSET); \ > > > > > > + unsigned byte_size = __CORE_RELO(s, field, BYTE_SIZE); \ > > > > > > + unsigned lshift = __CORE_RELO(s, field, LSHIFT_U64); \ > > > > > > + unsigned rshift = __CORE_RELO(s, field, RSHIFT_U64); \ > > > > > > + unsigned bit_size = (rshift - lshift); \ > > > > > > + unsigned long long nval, val, hi, lo; \ > > > > > > + \ > > > > > > + asm volatile("" : "=r"(p) : "0"(p)); \ > > > > > Use asm volatile("" : "+r"(p)) ? > > > > > > > > > > > + \ > > > > > > + switch (byte_size) { \ > > > > > > + case 1: val = *(unsigned char *)p; break; \ > > > > > > + case 2: val = *(unsigned short *)p; break; \ > > > > > > + case 4: val = *(unsigned int *)p; break; \ > > > > > > + case 8: val = *(unsigned long long *)p; break; \ > > > > > > + } \ > > > > > > + hi = val >> (bit_size + rshift); \ > > > > > > + hi <<= bit_size + rshift; \ > > > > > > + lo = val << (bit_size + lshift); \ > > > > > > + lo >>= bit_size + lshift; \ > > > > > > + nval = new_val; \ > > > > > > + nval <<= lshift; \ > > > > > > + nval >>= rshift; \ > > > > > > + val = hi | nval | lo; \ > > > > > > + switch (byte_size) { \ > > > > > > + case 1: *(unsigned char *)p = val; break; \ > > > > > > + case 2: *(unsigned short *)p = val; break; \ > > > > > > + case 4: *(unsigned int *)p = val; break; \ > > > > > > + case 8: *(unsigned long long *)p = val; break; \ > > > > > > + } \ > > > > > > +}) > > > > > I think this should be put in libbpf public header files but not sure > > > > > where to put it. bpf_core_read.h although it is core write? > > > > > > > > > > But on the other hand, this is a uapi struct bitfield write, > > > > > strictly speaking, CORE write is really unnecessary here. It > > > > > would be great if we can relieve users from dealing with > > > > > such unnecessary CORE writes. In that sense, for this particular > > > > > case, I would prefer rewriting the code by using byte-level > > > > > stores... > > > > or preserve_static_offset to clearly mean to undo bitfield CORE ... > > > Ok, I will do byte-level rewrite for next revision. > > [...] > > > > This patch seems to work: https://pastes.dxuuu.xyz/0glrf9 . > > > > But I don't think it's very pretty. Also I'm seeing on the internet that > > people are saying the exact layout of bitfields is compiler dependent. > > Any reference for this (exact layout of bitfields is compiler dependent)? > > > So I am wondering if these byte sized writes are correct. For that > > matter, I am wondering how the GCC generated bitfield accesses line up > > with clang generated BPF bytecode. Or why uapi contains a bitfield. 
> > One thing for sure is memory layout of bitfields should be the same > for both clang and gcc as it is determined by C standard. Register > representation and how to manipulate could be different for different > compilers. I was reading this thread: https://github.com/Lora-net/LoRaMac-node/issues/697. It's obviously not authoritative, but they sure sound confident! I think I've also heard it before a long time ago when I was working on adding bitfield support to bpftrace. [...]
On Tue, Nov 28, 2023 at 10:13:50AM -0600, Daniel Xu wrote: > On Mon, Nov 27, 2023 at 08:06:01PM -0800, Yonghong Song wrote: > > > > On 11/27/23 7:01 PM, Daniel Xu wrote: > > > On Mon, Nov 27, 2023 at 02:45:11PM -0600, Daniel Xu wrote: > > > > On Sun, Nov 26, 2023 at 09:53:04PM -0800, Yonghong Song wrote: > > > > > On 11/27/23 12:44 AM, Yonghong Song wrote: > > > > > > On 11/26/23 8:52 PM, Eduard Zingerman wrote: > > > > > > > On Sun, 2023-11-26 at 18:04 -0600, Daniel Xu wrote: > > > > > > > [...] > > > > > > > > > Tbh I'm not sure. This test passes with preserve_static_offset > > > > > > > > > because it suppresses preserve_access_index. In general clang > > > > > > > > > translates bitfield access to a set of IR statements like: > > > > > > > > > > > > > > > > > > C: > > > > > > > > > struct foo { > > > > > > > > > unsigned _; > > > > > > > > > unsigned a:1; > > > > > > > > > ... > > > > > > > > > }; > > > > > > > > > ... foo->a ... > > > > > > > > > > > > > > > > > > IR: > > > > > > > > > %a = getelementptr inbounds %struct.foo, ptr %0, i32 0, i32 1 > > > > > > > > > %bf.load = load i8, ptr %a, align 4 > > > > > > > > > %bf.clear = and i8 %bf.load, 1 > > > > > > > > > %bf.cast = zext i8 %bf.clear to i32 > > > > > > > > > > > > > > > > > > With preserve_static_offset the getelementptr+load are replaced by a > > > > > > > > > single statement which is preserved as-is till code generation, > > > > > > > > > thus load with align 4 is preserved. > > > > > > > > > > > > > > > > > > On the other hand, I'm not sure that clang guarantees that load or > > > > > > > > > stores used for bitfield access would be always aligned according to > > > > > > > > > verifier expectations. > > > > > > > > > > > > > > > > > > I think we should check if there are some clang knobs that prevent > > > > > > > > > generation of unaligned memory access. I'll take a look. > > > > > > > > Is there a reason to prefer fixing in compiler? I'm not opposed to it, > > > > > > > > but the downside to compiler fix is it takes years to propagate and > > > > > > > > sprinkles ifdefs into the code. > > > > > > > > > > > > > > > > Would it be possible to have an analogue of BPF_CORE_READ_BITFIELD()? > > > > > > > Well, the contraption below passes verification, tunnel selftest > > > > > > > appears to work. I might have messed up some shifts in the macro, > > > > > > > though. > > > > > > I didn't test it. But from high level it should work. > > > > > > > > > > > > > Still, if clang would peek unlucky BYTE_{OFFSET,SIZE} for a particular > > > > > > > field access might be unaligned. > > > > > > clang should pick a sensible BYTE_SIZE/BYTE_OFFSET to meet > > > > > > alignment requirement. This is also required for BPF_CORE_READ_BITFIELD. 
> > > > > > > > > > > > > --- > > > > > > > > > > > > > > diff --git a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c > > > > > > > b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c > > > > > > > index 3065a716544d..41cd913ac7ff 100644 > > > > > > > --- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c > > > > > > > +++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c > > > > > > > @@ -9,6 +9,7 @@ > > > > > > > #include "vmlinux.h" > > > > > > > #include <bpf/bpf_helpers.h> > > > > > > > #include <bpf/bpf_endian.h> > > > > > > > +#include <bpf/bpf_core_read.h> > > > > > > > #include "bpf_kfuncs.h" > > > > > > > #include "bpf_tracing_net.h" > > > > > > > @@ -144,6 +145,38 @@ int ip6gretap_get_tunnel(struct __sk_buff *skb) > > > > > > > return TC_ACT_OK; > > > > > > > } > > > > > > > +#define BPF_CORE_WRITE_BITFIELD(s, field, new_val) ({ \ > > > > > > > + void *p = (void *)s + __CORE_RELO(s, field, BYTE_OFFSET); \ > > > > > > > + unsigned byte_size = __CORE_RELO(s, field, BYTE_SIZE); \ > > > > > > > + unsigned lshift = __CORE_RELO(s, field, LSHIFT_U64); \ > > > > > > > + unsigned rshift = __CORE_RELO(s, field, RSHIFT_U64); \ > > > > > > > + unsigned bit_size = (rshift - lshift); \ > > > > > > > + unsigned long long nval, val, hi, lo; \ > > > > > > > + \ > > > > > > > + asm volatile("" : "=r"(p) : "0"(p)); \ > > > > > > Use asm volatile("" : "+r"(p)) ? > > > > > > > > > > > > > + \ > > > > > > > + switch (byte_size) { \ > > > > > > > + case 1: val = *(unsigned char *)p; break; \ > > > > > > > + case 2: val = *(unsigned short *)p; break; \ > > > > > > > + case 4: val = *(unsigned int *)p; break; \ > > > > > > > + case 8: val = *(unsigned long long *)p; break; \ > > > > > > > + } \ > > > > > > > + hi = val >> (bit_size + rshift); \ > > > > > > > + hi <<= bit_size + rshift; \ > > > > > > > + lo = val << (bit_size + lshift); \ > > > > > > > + lo >>= bit_size + lshift; \ > > > > > > > + nval = new_val; \ > > > > > > > + nval <<= lshift; \ > > > > > > > + nval >>= rshift; \ > > > > > > > + val = hi | nval | lo; \ > > > > > > > + switch (byte_size) { \ > > > > > > > + case 1: *(unsigned char *)p = val; break; \ > > > > > > > + case 2: *(unsigned short *)p = val; break; \ > > > > > > > + case 4: *(unsigned int *)p = val; break; \ > > > > > > > + case 8: *(unsigned long long *)p = val; break; \ > > > > > > > + } \ > > > > > > > +}) > > > > > > I think this should be put in libbpf public header files but not sure > > > > > > where to put it. bpf_core_read.h although it is core write? > > > > > > > > > > > > But on the other hand, this is a uapi struct bitfield write, > > > > > > strictly speaking, CORE write is really unnecessary here. It > > > > > > would be great if we can relieve users from dealing with > > > > > > such unnecessary CORE writes. In that sense, for this particular > > > > > > case, I would prefer rewriting the code by using byte-level > > > > > > stores... > > > > > or preserve_static_offset to clearly mean to undo bitfield CORE ... > > > > Ok, I will do byte-level rewrite for next revision. > > > [...] > > > > > > This patch seems to work: https://pastes.dxuuu.xyz/0glrf9 . > > > > > > But I don't think it's very pretty. Also I'm seeing on the internet that > > > people are saying the exact layout of bitfields is compiler dependent. > > > > Any reference for this (exact layout of bitfields is compiler dependent)? > > > > > So I am wondering if these byte sized writes are correct. 
For that > > > matter, I am wondering how the GCC generated bitfield accesses line up > > > with clang generated BPF bytecode. Or why uapi contains a bitfield. > > > > One thing for sure is memory layout of bitfields should be the same > > for both clang and gcc as it is determined by C standard. Register > > representation and how to manipulate could be different for different > > compilers. > > I was reading this thread: > https://github.com/Lora-net/LoRaMac-node/issues/697. It's obviously not > authoritative, but they sure sound confident! > > I think I've also heard it before a long time ago when I was working on > adding bitfield support to bpftrace. Wikipedia [0] also claims this: The layout of bit fields in a C struct is implementation-defined. For behavior that remains predictable across compilers, it may be preferable to emulate bit fields with a primitive and bit operators: [0]: https://en.wikipedia.org/wiki/Bit_field#C_programming_language
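To make that suggestion concrete, here is a minimal sketch (not from the thread or the cited article; the field names and bit positions are assumptions for illustration) of emulating two small bit fields with a plain byte and explicit shift/mask operators, so the layout is fixed by the code rather than left implementation-defined:

#include <stdint.h>

/* Hypothetical 1-bit 'flag' and 4-bit 'id' fields packed into one byte. */
#define OPT_FLAG_SHIFT 0
#define OPT_FLAG_MASK  (0x1u << OPT_FLAG_SHIFT)
#define OPT_ID_SHIFT   1
#define OPT_ID_MASK    (0xfu << OPT_ID_SHIFT)

static inline uint8_t opt_set_flag(uint8_t opts, uint8_t flag)
{
	/* Clear the field bits, then merge in the new value. */
	return (uint8_t)((opts & ~OPT_FLAG_MASK) |
			 (((uint32_t)flag << OPT_FLAG_SHIFT) & OPT_FLAG_MASK));
}

static inline uint8_t opt_set_id(uint8_t opts, uint8_t id)
{
	return (uint8_t)((opts & ~OPT_ID_MASK) |
			 (((uint32_t)id << OPT_ID_SHIFT) & OPT_ID_MASK));
}

static inline uint8_t opt_get_id(uint8_t opts)
{
	return (uint8_t)((opts & OPT_ID_MASK) >> OPT_ID_SHIFT);
}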
On Tue, 2023-11-28 at 10:13 -0600, Daniel Xu wrote: [...] > > One thing for sure is memory layout of bitfields should be the same > > for both clang and gcc as it is determined by C standard. Register > > representation and how to manipulate could be different for different > > compilers. > > I was reading this thread: > https://github.com/Lora-net/LoRaMac-node/issues/697. It's obviously not > authoritative, but they sure sound confident! > > I think I've also heard it before a long time ago when I was working on > adding bitfield support to bpftrace. > > > [...] Here is a citation from ISO/IEC 9899:201x (C11 standard) §6.7.2.1 (Structure and union specifiers), paragraph 11 (page 114 in my pdf): > An implementation may allocate any addressable storage unit large > enough to hold a bit-field. If enough space remains, a bit-field > that immediately follows another bit-field in a structure shall be > packed into adjacent bits of the same unit. If insufficient space > remains, whether a bit-field that does not fit is put into the next > unit or overlaps adjacent units is implementation-defined. The order > of allocation of bit-fields within a unit (high-order to low-order > or low-order to high-order) is implementation-defined. The alignment > of the addressable storage unit is unspecified.
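A tiny illustration (my own, not part of the standard text) of how the quoted rules split into what is guaranteed and what is implementation-defined:

struct ex {
	unsigned int a:3;	/* 'a' and 'b' fit into one addressable storage unit, so the
				 * standard guarantees they are packed into adjacent bits of
				 * that unit ... */
	unsigned int b:5;	/* ... but whether 'a' occupies the low-order or the high-order
				 * bits of the unit is implementation-defined, as is the size
				 * of the unit the implementation allocates. */
	unsigned int c:30;	/* does not fit in the bits left in a 32-bit unit; whether it
				 * starts a new unit or straddles the boundary is also
				 * implementation-defined. */
};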
On 11/28/23 11:17 AM, Daniel Xu wrote: > On Tue, Nov 28, 2023 at 10:13:50AM -0600, Daniel Xu wrote: >> On Mon, Nov 27, 2023 at 08:06:01PM -0800, Yonghong Song wrote: >>> On 11/27/23 7:01 PM, Daniel Xu wrote: >>>> On Mon, Nov 27, 2023 at 02:45:11PM -0600, Daniel Xu wrote: >>>>> On Sun, Nov 26, 2023 at 09:53:04PM -0800, Yonghong Song wrote: >>>>>> On 11/27/23 12:44 AM, Yonghong Song wrote: >>>>>>> On 11/26/23 8:52 PM, Eduard Zingerman wrote: >>>>>>>> On Sun, 2023-11-26 at 18:04 -0600, Daniel Xu wrote: >>>>>>>> [...] >>>>>>>>>> Tbh I'm not sure. This test passes with preserve_static_offset >>>>>>>>>> because it suppresses preserve_access_index. In general clang >>>>>>>>>> translates bitfield access to a set of IR statements like: >>>>>>>>>> >>>>>>>>>> C: >>>>>>>>>> struct foo { >>>>>>>>>> unsigned _; >>>>>>>>>> unsigned a:1; >>>>>>>>>> ... >>>>>>>>>> }; >>>>>>>>>> ... foo->a ... >>>>>>>>>> >>>>>>>>>> IR: >>>>>>>>>> %a = getelementptr inbounds %struct.foo, ptr %0, i32 0, i32 1 >>>>>>>>>> %bf.load = load i8, ptr %a, align 4 >>>>>>>>>> %bf.clear = and i8 %bf.load, 1 >>>>>>>>>> %bf.cast = zext i8 %bf.clear to i32 >>>>>>>>>> >>>>>>>>>> With preserve_static_offset the getelementptr+load are replaced by a >>>>>>>>>> single statement which is preserved as-is till code generation, >>>>>>>>>> thus load with align 4 is preserved. >>>>>>>>>> >>>>>>>>>> On the other hand, I'm not sure that clang guarantees that load or >>>>>>>>>> stores used for bitfield access would be always aligned according to >>>>>>>>>> verifier expectations. >>>>>>>>>> >>>>>>>>>> I think we should check if there are some clang knobs that prevent >>>>>>>>>> generation of unaligned memory access. I'll take a look. >>>>>>>>> Is there a reason to prefer fixing in compiler? I'm not opposed to it, >>>>>>>>> but the downside to compiler fix is it takes years to propagate and >>>>>>>>> sprinkles ifdefs into the code. >>>>>>>>> >>>>>>>>> Would it be possible to have an analogue of BPF_CORE_READ_BITFIELD()? >>>>>>>> Well, the contraption below passes verification, tunnel selftest >>>>>>>> appears to work. I might have messed up some shifts in the macro, >>>>>>>> though. >>>>>>> I didn't test it. But from high level it should work. >>>>>>> >>>>>>>> Still, if clang would peek unlucky BYTE_{OFFSET,SIZE} for a particular >>>>>>>> field access might be unaligned. >>>>>>> clang should pick a sensible BYTE_SIZE/BYTE_OFFSET to meet >>>>>>> alignment requirement. This is also required for BPF_CORE_READ_BITFIELD. 
>>>>>>> >>>>>>>> --- >>>>>>>> >>>>>>>> diff --git a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c >>>>>>>> b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c >>>>>>>> index 3065a716544d..41cd913ac7ff 100644 >>>>>>>> --- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c >>>>>>>> +++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c >>>>>>>> @@ -9,6 +9,7 @@ >>>>>>>> #include "vmlinux.h" >>>>>>>> #include <bpf/bpf_helpers.h> >>>>>>>> #include <bpf/bpf_endian.h> >>>>>>>> +#include <bpf/bpf_core_read.h> >>>>>>>> #include "bpf_kfuncs.h" >>>>>>>> #include "bpf_tracing_net.h" >>>>>>>> @@ -144,6 +145,38 @@ int ip6gretap_get_tunnel(struct __sk_buff *skb) >>>>>>>> return TC_ACT_OK; >>>>>>>> } >>>>>>>> +#define BPF_CORE_WRITE_BITFIELD(s, field, new_val) ({ \ >>>>>>>> + void *p = (void *)s + __CORE_RELO(s, field, BYTE_OFFSET); \ >>>>>>>> + unsigned byte_size = __CORE_RELO(s, field, BYTE_SIZE); \ >>>>>>>> + unsigned lshift = __CORE_RELO(s, field, LSHIFT_U64); \ >>>>>>>> + unsigned rshift = __CORE_RELO(s, field, RSHIFT_U64); \ >>>>>>>> + unsigned bit_size = (rshift - lshift); \ >>>>>>>> + unsigned long long nval, val, hi, lo; \ >>>>>>>> + \ >>>>>>>> + asm volatile("" : "=r"(p) : "0"(p)); \ >>>>>>> Use asm volatile("" : "+r"(p)) ? >>>>>>> >>>>>>>> + \ >>>>>>>> + switch (byte_size) { \ >>>>>>>> + case 1: val = *(unsigned char *)p; break; \ >>>>>>>> + case 2: val = *(unsigned short *)p; break; \ >>>>>>>> + case 4: val = *(unsigned int *)p; break; \ >>>>>>>> + case 8: val = *(unsigned long long *)p; break; \ >>>>>>>> + } \ >>>>>>>> + hi = val >> (bit_size + rshift); \ >>>>>>>> + hi <<= bit_size + rshift; \ >>>>>>>> + lo = val << (bit_size + lshift); \ >>>>>>>> + lo >>= bit_size + lshift; \ >>>>>>>> + nval = new_val; \ >>>>>>>> + nval <<= lshift; \ >>>>>>>> + nval >>= rshift; \ >>>>>>>> + val = hi | nval | lo; \ >>>>>>>> + switch (byte_size) { \ >>>>>>>> + case 1: *(unsigned char *)p = val; break; \ >>>>>>>> + case 2: *(unsigned short *)p = val; break; \ >>>>>>>> + case 4: *(unsigned int *)p = val; break; \ >>>>>>>> + case 8: *(unsigned long long *)p = val; break; \ >>>>>>>> + } \ >>>>>>>> +}) >>>>>>> I think this should be put in libbpf public header files but not sure >>>>>>> where to put it. bpf_core_read.h although it is core write? >>>>>>> >>>>>>> But on the other hand, this is a uapi struct bitfield write, >>>>>>> strictly speaking, CORE write is really unnecessary here. It >>>>>>> would be great if we can relieve users from dealing with >>>>>>> such unnecessary CORE writes. In that sense, for this particular >>>>>>> case, I would prefer rewriting the code by using byte-level >>>>>>> stores... >>>>>> or preserve_static_offset to clearly mean to undo bitfield CORE ... >>>>> Ok, I will do byte-level rewrite for next revision. >>>> [...] >>>> >>>> This patch seems to work: https://pastes.dxuuu.xyz/0glrf9 . >>>> >>>> But I don't think it's very pretty. Also I'm seeing on the internet that >>>> people are saying the exact layout of bitfields is compiler dependent. >>> Any reference for this (exact layout of bitfields is compiler dependent)? >>> >>>> So I am wondering if these byte sized writes are correct. For that >>>> matter, I am wondering how the GCC generated bitfield accesses line up >>>> with clang generated BPF bytecode. Or why uapi contains a bitfield. >>> One thing for sure is memory layout of bitfields should be the same >>> for both clang and gcc as it is determined by C standard. 
Register >>> representation and how to manipulate could be different for different >>> compilers. >> I was reading this thread: >> https://github.com/Lora-net/LoRaMac-node/issues/697. It's obviously not >> authoritative, but they sure sound confident! >> >> I think I've also heard it before a long time ago when I was working on >> adding bitfield support to bpftrace. > Wikipedia [0] also claims this: > > The layout of bit fields in a C struct is > implementation-defined. For behavior that remains predictable > across compilers, it may be preferable to emulate bit fields > with a primitive and bit operators: > > [0]: https://en.wikipedia.org/wiki/Bit_field#C_programming_language

Thanks for the information. I was truly not aware that bit field layout could differ between compilers. Does this mean source-level bitfield manipulation may not work?

Having a bitfield in a uapi struct is okay; the compiler should do the right thing when generating loads/stores for bitfields. Also, the networking bitfields describe the memory layout of data transferred on the wire, so their memory layout is fixed (although the little/big endian interpretation differs). BPF_CORE_WRITE_BITFIELD 'should' also be okay, since the offset/size etc. are taken from the compiler internals (from DWARF, to be more precise).

So it looks like BPF_CORE_WRITE_BITFIELD is the way to go. Please use it then.
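For illustration, a sketch of how the macro from earlier in the thread could be used (the struct and field names are hypothetical stand-ins, not the actual selftest change; it assumes the BPF_CORE_WRITE_BITFIELD definition above and <bpf/bpf_core_read.h> are in scope in the BPF program):

/* Hypothetical CO-RE-relocatable struct with bit fields, standing in for
 * whatever kernel/uapi type the program actually writes to.
 */
struct tun_opts {
	__u8 ver:2;
	__u8 dir:1;
	__u8 hwid:4;
} __attribute__((preserve_access_index));

static __always_inline void tun_opts_set(struct tun_opts *opts, __u8 dir, __u8 hwid)
{
	/* Each invocation expands to an aligned 1/2/4/8-byte load, shift/mask
	 * arithmetic, and a store of the same size, with the byte size/offset
	 * taken from CO-RE relocation info rather than hard-coded in the source.
	 */
	BPF_CORE_WRITE_BITFIELD(opts, dir, dir);
	BPF_CORE_WRITE_BITFIELD(opts, hwid, hwid & 0xf);
}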
diff --git a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
index 3065a716544d..ec7e04e012ae 100644
--- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
@@ -6,6 +6,7 @@
  * modify it under the terms of version 2 of the GNU General Public
  * License as published by the Free Software Foundation.
  */
+#define BPF_NO_PRESERVE_ACCESS_INDEX
 #include "vmlinux.h"
 #include <bpf/bpf_helpers.h>
 #include <bpf/bpf_endian.h>
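For context on why defining this macro before including vmlinux.h has the intended effect: a bpftool-generated vmlinux.h typically wraps all of its type definitions in a clang attribute push/pop guarded by this macro, roughly as in the sketch below (paraphrased from generated output, not part of this patch). Defining BPF_NO_PRESERVE_ACCESS_INDEX before the include therefore disables preserve_access_index, and hence CO-RE relocations, for every type coming from vmlinux.h.

/* Sketch of the guard a bpftool-generated vmlinux.h typically emits. */
#ifndef BPF_NO_PRESERVE_ACCESS_INDEX
#pragma clang attribute push (__attribute__((preserve_access_index)), apply_to = record)
#endif

/* ... generated kernel type definitions ... */

#ifndef BPF_NO_PRESERVE_ACCESS_INDEX
#pragma clang attribute pop
#endif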