Message ID | 20231208000531.19179-1-eddyz87@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | use preserve_static_offset in bpf uapi headers | expand |
On 12/7/23 4:05 PM, Eduard Zingerman wrote: > For certain program context types, the verifier applies the > verifier.c:convert_ctx_access() transformation. > It modifies ST/STX/LDX instructions that access program context. > convert_ctx_access() updates the offset field of these instructions > changing "virtual" offset by offset corresponding to data > representation in the running kernel. > > For this transformation to be applicable access to the context field > shouldn't use pointer arithmetics. For example, consider the read of > __sk_buff->pkt_type field. > If translated as a single ST instruction: > > r0 = *(u32 *)(r1 + 4); > > The verifier would accept such code and patch the offset in the > instruction, however, if translated as a pair of instructions: > > r1 += 4; > r0 = *(u32 *)(r1 + 0); > > The verifier would reject such code. > > Occasionally clang shuffling code during compilation might break > verifier expectations and cause verification errors, e.g. as in [0]. > Technically, this happens because each field read/write represented in > LLVM IR as two operations: address lookup + memory access, > and the compiler is free to move and substitute those independently. > For example, LLVM can rewrite C code below: > > __u32 v; > if (...) > v = sk_buff->pkt_type; > else > v = sk_buff->mark; > > As if it was written as so: > > __u32 v, *p; > if (...) > p = &sk_buff->pkt_type; // r0 = 4; (offset of pkt_type) > else > p = &sk_buff->mark; // r0 = 8; (offset of mark) > v = *p; // r1 += r0; > // r0 = *(u32 *)(r1 + 0) > > Which is a valid rewrite from the point of view of C semantics but won't > pass verification, because convert_ctx_access() can no longer replace > offset in 'r0 = *(u32 *)(r1 + 0)' with a constant. > > Recently, attribute preserve_static_offset was added to > clang [1] to tackle this problem. From its documentation: > > Clang supports the ``__attribute__((preserve_static_offset))`` > attribute for the BPF target. This attribute may be attached to a > struct or union declaration. Reading or writing fields of types having > such annotation is guaranteed to generate LDX/ST/STX instruction with > an offset corresponding to the field. > > The convert_ctx_access() transformation is applied when the context > parameter has one of the following types: > - __sk_buff > - bpf_cgroup_dev_ctx > - bpf_perf_event_data > - bpf_sk_lookup > - bpf_sock > - bpf_sock_addr > - bpf_sock_ops > - bpf_sockopt > - bpf_sysctl > - sk_msg_md > - sk_reuseport_md > - xdp_md All context types are defined in include/linux/bpf_types.h. The context type bpf_nf_ctx is missing. > > From my understanding, BPF programs typically access definitions of > these types in two ways: > - via uapi headers linux/bpf.h and linux/bpf_perf_event.h; and bpf_nf_ctx is defined in include/net/netfilter/nf_bpf_link.h and rely on vmlinux.h to provide the ctx struct definition. > - via vmlinux.h. > > This RFC seeks to mark with preserve_static_offset the definitions of > the relevant context types within uapi headers. > > The attribute is abstracted by '__bpf_ctx' macro. > As bpf.h and bpf_perf_event.h do not share any common include files, > this RFC opts to copy the same definition of '__bpf_ctx' in both > headers to avoid adding a new uapi header. > (Another tempting location for '__bpf_ctx' is compiler_types.h / > compiler-clang.h, but these headers are not exported as uapi). Previously I think we might use similar mechanism like vmlinux.h with push/pop preserve_static_offset attributes. But looks like there are many other structures in uapi bpf.h do not need preserve_static_offset. So I think your approach sounds okay. > > How to add the same definitions in vmlinux.h is an open question, > and most likely requires bpftool modification: > - Hard code generation of __bpf_ctx based on type names? > - Mark context types with some special > __attribute__((btf_decl_tag("preserve_static_offset"))) > and convert it to __attribute__((preserve_static_offset))? The number of context types is limited, I would just go through the first approach with hard coding the list of ctx types and mark them with preserve_static_offset attribute in vmlinux.h. > > Please suggest if any of the options above sound reasonable. > > [0] https://lore.kernel.org/bpf/CAA-VZPmxh8o8EBcJ=m-DH4ytcxDFmo0JKsm1p1gf40kS0CE3NQ@mail.gmail.com/T/#m4b9ce2ce73b34f34172328f975235fc6f19841b6 > [1] 030b8cb1561d ("[BPF] Attribute preserve_static_offset for structs") > git@github.com:llvm/llvm-project.git > > Eduard Zingerman (1): > bpf: Mark virtual BPF context structures as preserve_static_offset > > include/uapi/linux/bpf.h | 28 ++++++++++++++--------- > include/uapi/linux/bpf_perf_event.h | 8 ++++++- > tools/include/uapi/linux/bpf.h | 28 ++++++++++++++--------- > tools/include/uapi/linux/bpf_perf_event.h | 8 ++++++- > 4 files changed, 48 insertions(+), 24 deletions(-) >
On 08/12/2023 00:05, Eduard Zingerman wrote: > For certain program context types, the verifier applies the > verifier.c:convert_ctx_access() transformation. > It modifies ST/STX/LDX instructions that access program context. > convert_ctx_access() updates the offset field of these instructions > changing "virtual" offset by offset corresponding to data > representation in the running kernel. > > For this transformation to be applicable access to the context field > shouldn't use pointer arithmetics. For example, consider the read of > __sk_buff->pkt_type field. > If translated as a single ST instruction: > > r0 = *(u32 *)(r1 + 4); > > The verifier would accept such code and patch the offset in the > instruction, however, if translated as a pair of instructions: > > r1 += 4; > r0 = *(u32 *)(r1 + 0); > > The verifier would reject such code. > Sorry if this is a digression, but I'm trying to understand how this might intersect with vmlinux.h's #pragma clang attribute push (__attribute__((preserve_access_index)), apply_to = record Since that is currently applied to all structures in vmlinux.h, does that protect us from the above scenario when BPF code is compiled and #include's vmlinux.h (I suspect not from what you say below but just wanted to check)? I realize we get extra relocation info that we don't need since the offsets for these BPF context structures are recalcuated by the verifier, but given that clang needs to record the relocations, does it also constrain the generated code to avoid these "increment pointer, use zero offset" instruction patterns? Or can they still occur with preserve_access_index applied to the structure? Sorry, might be a naive question but it's not clear to me how (if at all) the mechanisms might interact. The reason I ask is if it was safe to assume that code generation would avoid such patterns with preserve_access_index, it might avoid needing to update vmlinux.h generation. > Occasionally clang shuffling code during compilation might break > verifier expectations and cause verification errors, e.g. as in [0]. > Technically, this happens because each field read/write represented in > LLVM IR as two operations: address lookup + memory access, > and the compiler is free to move and substitute those independently. > For example, LLVM can rewrite C code below: > > __u32 v; > if (...) > v = sk_buff->pkt_type; > else > v = sk_buff->mark; > > As if it was written as so: > > __u32 v, *p; > if (...) > p = &sk_buff->pkt_type; // r0 = 4; (offset of pkt_type) > else > p = &sk_buff->mark; // r0 = 8; (offset of mark) > v = *p; // r1 += r0; > // r0 = *(u32 *)(r1 + 0) > > Which is a valid rewrite from the point of view of C semantics but won't > pass verification, because convert_ctx_access() can no longer replace > offset in 'r0 = *(u32 *)(r1 + 0)' with a constant. > > Recently, attribute preserve_static_offset was added to > clang [1] to tackle this problem. From its documentation: > > Clang supports the ``__attribute__((preserve_static_offset))`` > attribute for the BPF target. This attribute may be attached to a > struct or union declaration. Reading or writing fields of types having > such annotation is guaranteed to generate LDX/ST/STX instruction with > an offset corresponding to the field. > > The convert_ctx_access() transformation is applied when the context > parameter has one of the following types: > - __sk_buff > - bpf_cgroup_dev_ctx > - bpf_perf_event_data > - bpf_sk_lookup > - bpf_sock > - bpf_sock_addr > - bpf_sock_ops > - bpf_sockopt > - bpf_sysctl > - sk_msg_md > - sk_reuseport_md > - xdp_md > > From my understanding, BPF programs typically access definitions of > these types in two ways: > - via uapi headers linux/bpf.h and linux/bpf_perf_event.h; > - via vmlinux.h. > > This RFC seeks to mark with preserve_static_offset the definitions of > the relevant context types within uapi headers. > > The attribute is abstracted by '__bpf_ctx' macro. > As bpf.h and bpf_perf_event.h do not share any common include files, > this RFC opts to copy the same definition of '__bpf_ctx' in both > headers to avoid adding a new uapi header. > (Another tempting location for '__bpf_ctx' is compiler_types.h / > compiler-clang.h, but these headers are not exported as uapi). > > How to add the same definitions in vmlinux.h is an open question, > and most likely requires bpftool modification: > - Hard code generation of __bpf_ctx based on type names? > - Mark context types with some special > __attribute__((btf_decl_tag("preserve_static_offset"))) > and convert it to __attribute__((preserve_static_offset))? > To me it seems like whatever mechanism supports identification of such structures would need to live in vmlinux BTF as ideally it should be possible to generate vmlinux.h purely from that BTF. That seems to argue for the declaration tag approach. Thanks! Alan > Please suggest if any of the options above sound reasonable. > > [0] https://lore.kernel.org/bpf/CAA-VZPmxh8o8EBcJ=m-DH4ytcxDFmo0JKsm1p1gf40kS0CE3NQ@mail.gmail.com/T/#m4b9ce2ce73b34f34172328f975235fc6f19841b6 > [1] 030b8cb1561d ("[BPF] Attribute preserve_static_offset for structs") > git@github.com:llvm/llvm-project.git > > Eduard Zingerman (1): > bpf: Mark virtual BPF context structures as preserve_static_offset > > include/uapi/linux/bpf.h | 28 ++++++++++++++--------- > include/uapi/linux/bpf_perf_event.h | 8 ++++++- > tools/include/uapi/linux/bpf.h | 28 ++++++++++++++--------- > tools/include/uapi/linux/bpf_perf_event.h | 8 ++++++- > 4 files changed, 48 insertions(+), 24 deletions(-) >
On Fri, 2023-12-08 at 12:27 +0000, Alan Maguire wrote: [...] > Sorry if this is a digression, but I'm trying to understand how > this might intersect with vmlinux.h's > > #pragma clang attribute push (__attribute__((preserve_access_index)), > apply_to = record > > Since that is currently applied to all structures in vmlinux.h, does > that protect us from the above scenario when BPF code is compiled and > #include's vmlinux.h (I suspect not from what you say below but just > wanted to check)? I realize we get extra relocation info that we don't > need since the offsets for these BPF context structures are recalcuated > by the verifier, but given that clang needs to record the relocations, > does it also constrain the generated code to avoid these "increment > pointer, use zero offset" instruction patterns? Or can they still occur > with preserve_access_index applied to the structure? Sorry, might be a > naive question but it's not clear to me how (if at all) the mechanisms > might interact. Unfortunately preserve_access_index does not save us from this problem. This is the case because field reads and writes are split as two LLVM IR instructions: getelementptr to get an address, and load/store to/from that address. The preserve_access_index transformation rewrites the getelementptr but does not touch load/store. For example, consider the following C code: /* #define __ctx __attribute__((preserve_static_offset)) */ /* #define __pai */ #define __ctx #define __pai __attribute__((preserve_access_index)) extern int magic2(int); struct bpf_sock { int bound_dev_if; int family; } __ctx __pai; struct bpf_sockopt { int _; struct bpf_sock *sk; int level; int optlen; } __ctx __pai; int known_load_sink_example_1(struct bpf_sockopt *ctx) { unsigned g = 0; switch (ctx->level) { case 10: g = magic2(ctx->sk->family); break; case 20: g = magic2(ctx->optlen); break; } return g % 2; } Here is how it is compiled: $ clang -g -O2 --target=bpf -mcpu=v3 -c e3.c -o - | llvm-objdump --no-show-raw-insn -Sdr - ... 0000000000000000 <known_load_sink_example_1>: ; switch (ctx->level) { 0: r2 = *(u32 *)(r1 + 0x10) 0000000000000000: CO-RE <byte_off> [2] struct bpf_sockopt::level (0:2) 1: if w2 == 0x14 goto +0x5 <LBB0_3> 2: w0 = 0x0 ; switch (ctx->level) { 3: if w2 != 0xa goto +0x8 <LBB0_5> ; g = magic2(ctx->sk->family); 4: r1 = *(u64 *)(r1 + 0x8) 0000000000000020: CO-RE <byte_off> [2] struct bpf_sockopt::sk (0:1) 5: r2 = 0x4 0000000000000028: CO-RE <byte_off> [7] struct bpf_sock::family (0:1) 6: goto +0x1 <LBB0_4> 0000000000000038 <LBB0_3>: 7: r2 = 0x14 0000000000000038: CO-RE <byte_off> [2] struct bpf_sockopt::optlen (0:3) 0000000000000040 <LBB0_4>: 8: r1 += r2 9: r1 = *(u32 *)(r1 + 0x0) <---------------- verifier error would 10: call -0x1 be reported for this insn 0000000000000050: R_BPF_64_32 magic2 ; return g % 2; 11: w0 &= 0x1 0000000000000060 <LBB0_5>: 12: exit > The reason I ask is if it was safe to assume that code generation would > avoid such patterns with preserve_access_index, it might avoid needing > to update vmlinux.h generation. In current LLVM implementation preserve_static_offset has priority over preserve_access_index. So changing __pai/__ctx definitions above helps. (And this priority of one attribute over the other was the reason to have preserve_static_offset as an attribute, not as btf_decl_tag("preserve_static_offset"). Although that is unfortunate for vmlinux.h, as we already have means to preserve decl tags). [...] > > How to add the same definitions in vmlinux.h is an open question, > > and most likely requires bpftool modification: > > - Hard code generation of __bpf_ctx based on type names? > > - Mark context types with some special > > __attribute__((btf_decl_tag("preserve_static_offset"))) > > and convert it to __attribute__((preserve_static_offset))? > > To me it seems like whatever mechanism supports identification of such > structures would need to live in vmlinux BTF as ideally it should be > possible to generate vmlinux.h purely from that BTF. That seems to argue > for the declaration tag approach. Tbh, I like the decl tag approach a bit more too. Although macro definition would be somewhat ridiculous: #if __has_attribute(preserve_static_offset) && defined(__bpf__) #define __bpf_ctx __attribute__((preserve_static_offset)) \ __attribute__((btf_decl_tag("preserve_static_offset"))) #else #define __bpf_ctx #endif Thanks, Eduard
On Thu, 2023-12-07 at 18:28 -0800, Yonghong Song wrote: [...] > All context types are defined in include/linux/bpf_types.h. > The context type bpf_nf_ctx is missing. convert_ctx_access() is not applied for bpf_nf_ctx. Searching through kernel code shows that BPF programs access this structure directly (net/netfilter/nf_bpf_link.c): static unsigned int nf_hook_run_bpf(void *bpf_prog, struct sk_buff *skb, const struct nf_hook_state *s) { const struct bpf_prog *prog = bpf_prog; struct bpf_nf_ctx ctx = { .state = s, .skb = skb, }; return bpf_prog_run(prog, &ctx); } I added __bpf_ctx only for types that are subject to convert_ctx_access() transformation. On the other hand, applying it to each context type should not hurt either. Which way would you prefer? [...] > > How to add the same definitions in vmlinux.h is an open question, > > and most likely requires bpftool modification: > > - Hard code generation of __bpf_ctx based on type names? > > - Mark context types with some special > > __attribute__((btf_decl_tag("preserve_static_offset"))) > > and convert it to __attribute__((preserve_static_offset))? > > The number of context types is limited, I would just go through > the first approach with hard coding the list of ctx types and > mark them with preserve_static_offset attribute in vmlinux.h. Tbh, I'm with Alan here, generic approach seems a tad nicer. Lets collect some more votes :)
On 08/12/2023 14:21, Eduard Zingerman wrote: > On Fri, 2023-12-08 at 12:27 +0000, Alan Maguire wrote: > [...] >> Sorry if this is a digression, but I'm trying to understand how >> this might intersect with vmlinux.h's >> >> #pragma clang attribute push (__attribute__((preserve_access_index)), >> apply_to = record >> >> Since that is currently applied to all structures in vmlinux.h, does >> that protect us from the above scenario when BPF code is compiled and >> #include's vmlinux.h (I suspect not from what you say below but just >> wanted to check)? I realize we get extra relocation info that we don't >> need since the offsets for these BPF context structures are recalcuated >> by the verifier, but given that clang needs to record the relocations, >> does it also constrain the generated code to avoid these "increment >> pointer, use zero offset" instruction patterns? Or can they still occur >> with preserve_access_index applied to the structure? Sorry, might be a >> naive question but it's not clear to me how (if at all) the mechanisms >> might interact. > > Unfortunately preserve_access_index does not save us from this problem. > This is the case because field reads and writes are split as two LLVM > IR instructions: getelementptr to get an address, and load/store > to/from that address. The preserve_access_index transformation > rewrites the getelementptr but does not touch load/store. > > For example, consider the following C code: > > /* #define __ctx __attribute__((preserve_static_offset)) */ > /* #define __pai */ > #define __ctx > #define __pai __attribute__((preserve_access_index)) > > extern int magic2(int); > > struct bpf_sock { > int bound_dev_if; > int family; > } __ctx __pai; > > struct bpf_sockopt { > int _; > struct bpf_sock *sk; > int level; > int optlen; > } __ctx __pai; > > int known_load_sink_example_1(struct bpf_sockopt *ctx) > { > unsigned g = 0; > switch (ctx->level) { > case 10: > g = magic2(ctx->sk->family); > break; > case 20: > g = magic2(ctx->optlen); > break; > } > return g % 2; > } > > Here is how it is compiled: > > $ clang -g -O2 --target=bpf -mcpu=v3 -c e3.c -o - | llvm-objdump --no-show-raw-insn -Sdr - > ... > 0000000000000000 <known_load_sink_example_1>: > ; switch (ctx->level) { > 0: r2 = *(u32 *)(r1 + 0x10) > 0000000000000000: CO-RE <byte_off> [2] struct bpf_sockopt::level (0:2) > 1: if w2 == 0x14 goto +0x5 <LBB0_3> > 2: w0 = 0x0 > ; switch (ctx->level) { > 3: if w2 != 0xa goto +0x8 <LBB0_5> > ; g = magic2(ctx->sk->family); > 4: r1 = *(u64 *)(r1 + 0x8) > 0000000000000020: CO-RE <byte_off> [2] struct bpf_sockopt::sk (0:1) > 5: r2 = 0x4 > 0000000000000028: CO-RE <byte_off> [7] struct bpf_sock::family (0:1) > 6: goto +0x1 <LBB0_4> > > 0000000000000038 <LBB0_3>: > 7: r2 = 0x14 > 0000000000000038: CO-RE <byte_off> [2] struct bpf_sockopt::optlen (0:3) > > 0000000000000040 <LBB0_4>: > 8: r1 += r2 > 9: r1 = *(u32 *)(r1 + 0x0) <---------------- verifier error would > 10: call -0x1 be reported for this insn > 0000000000000050: R_BPF_64_32 magic2 > ; return g % 2; > 11: w0 &= 0x1 > > 0000000000000060 <LBB0_5>: > 12: exit > >> The reason I ask is if it was safe to assume that code generation would >> avoid such patterns with preserve_access_index, it might avoid needing >> to update vmlinux.h generation. > > In current LLVM implementation preserve_static_offset has priority > over preserve_access_index. So changing __pai/__ctx definitions above helps. > (And this priority of one attribute over the other was the reason to > have preserve_static_offset as an attribute, not as > btf_decl_tag("preserve_static_offset"). Although that is unfortunate > for vmlinux.h, as we already have means to preserve decl tags). > Thanks for the explanation! > [...] > >>> How to add the same definitions in vmlinux.h is an open question, >>> and most likely requires bpftool modification: >>> - Hard code generation of __bpf_ctx based on type names? >>> - Mark context types with some special >>> __attribute__((btf_decl_tag("preserve_static_offset"))) >>> and convert it to __attribute__((preserve_static_offset))? >> >> To me it seems like whatever mechanism supports identification of such >> structures would need to live in vmlinux BTF as ideally it should be >> possible to generate vmlinux.h purely from that BTF. That seems to argue >> for the declaration tag approach. > > Tbh, I like the decl tag approach a bit more too. > Although macro definition would be somewhat ridiculous: > > #if __has_attribute(preserve_static_offset) && defined(__bpf__) > #define __bpf_ctx __attribute__((preserve_static_offset)) \ > __attribute__((btf_decl_tag("preserve_static_offset"))) > #else > #define __bpf_ctx > #endif > As macro definitions go, that's not that ridiculous ;-) If we add it to vmlinux.h, would be good to have a #ifdef BPF_NO_PRESERVE_STATIC_OFFSET #undef __bpf_ctx #define __bpf_ctx #endif ...too, just in case the user wanted to use CO-RE with any of the types covered. Thanks! Alan
On Fri, 2023-12-08 at 15:35 +0000, Alan Maguire wrote: [...] > > Tbh, I like the decl tag approach a bit more too. > > Although macro definition would be somewhat ridiculous: > > > > #if __has_attribute(preserve_static_offset) && defined(__bpf__) > > #define __bpf_ctx __attribute__((preserve_static_offset)) \ > > __attribute__((btf_decl_tag("preserve_static_offset"))) > > #else > > #define __bpf_ctx > > #endif > > > > As macro definitions go, that's not that ridiculous ;-) Fair enough :) > If we add it to vmlinux.h, would be good to have a > > #ifdef BPF_NO_PRESERVE_STATIC_OFFSET > #undef __bpf_ctx > #define __bpf_ctx > #endif > > ...too, just in case the user wanted to use CO-RE with any of the types > covered. Thanks! Will do. Thanks, Eduard.
On 12/8/23 6:34 AM, Eduard Zingerman wrote: > On Thu, 2023-12-07 at 18:28 -0800, Yonghong Song wrote: > [...] >> All context types are defined in include/linux/bpf_types.h. >> The context type bpf_nf_ctx is missing. > convert_ctx_access() is not applied for bpf_nf_ctx. Searching through > kernel code shows that BPF programs access this structure directly > (net/netfilter/nf_bpf_link.c): > > static unsigned int nf_hook_run_bpf(void *bpf_prog, struct sk_buff *skb, > const struct nf_hook_state *s) > { > const struct bpf_prog *prog = bpf_prog; > struct bpf_nf_ctx ctx = { > .state = s, > .skb = skb, > }; > > return bpf_prog_run(prog, &ctx); > } > > I added __bpf_ctx only for types that are subject to convert_ctx_access() > transformation. On the other hand, applying it to each context type > should not hurt either. Which way would you prefer? > > [...] The error message should happen here: check_mem_access ... } else if (reg->type == PTR_TO_CTX) { check_ptr_off_reg __check_ptr_off_reg if (!fixed_off_ok && reg->off) { verbose(env, "dereference of modified %s ptr R%d off=%d disallowed\n", reg_type_str(env, reg->type), regno, reg->off); return -EACCES; } ... So the verification error message will be emitted earlier, before convert_ctx_access. Could you double check? > >>> How to add the same definitions in vmlinux.h is an open question, >>> and most likely requires bpftool modification: >>> - Hard code generation of __bpf_ctx based on type names? >>> - Mark context types with some special >>> __attribute__((btf_decl_tag("preserve_static_offset"))) >>> and convert it to __attribute__((preserve_static_offset))? >> The number of context types is limited, I would just go through >> the first approach with hard coding the list of ctx types and >> mark them with preserve_static_offset attribute in vmlinux.h. > Tbh, I'm with Alan here, generic approach seems a tad nicer. > Lets collect some more votes :)
On 12/8/23 6:34 AM, Eduard Zingerman wrote: > On Thu, 2023-12-07 at 18:28 -0800, Yonghong Song wrote: > [...] >> All context types are defined in include/linux/bpf_types.h. >> The context type bpf_nf_ctx is missing. > convert_ctx_access() is not applied for bpf_nf_ctx. Searching through > kernel code shows that BPF programs access this structure directly > (net/netfilter/nf_bpf_link.c): > > static unsigned int nf_hook_run_bpf(void *bpf_prog, struct sk_buff *skb, > const struct nf_hook_state *s) > { > const struct bpf_prog *prog = bpf_prog; > struct bpf_nf_ctx ctx = { > .state = s, > .skb = skb, > }; > > return bpf_prog_run(prog, &ctx); > } > > I added __bpf_ctx only for types that are subject to convert_ctx_access() > transformation. On the other hand, applying it to each context type > should not hurt either. Which way would you prefer? > > [...] > >>> How to add the same definitions in vmlinux.h is an open question, >>> and most likely requires bpftool modification: >>> - Hard code generation of __bpf_ctx based on type names? >>> - Mark context types with some special >>> __attribute__((btf_decl_tag("preserve_static_offset"))) >>> and convert it to __attribute__((preserve_static_offset))? >> The number of context types is limited, I would just go through >> the first approach with hard coding the list of ctx types and >> mark them with preserve_static_offset attribute in vmlinux.h. > Tbh, I'm with Alan here, generic approach seems a tad nicer. > Lets collect some more votes :) I just want to propose to have less work :-) since we are only dealing with a few structs in bpf domain. Note that eventually generated vmlinux.h will be the same whether we use 'hard code' approach or the decl_tag approach. The difference is just how to do it: - dwarf/btf with decl tag -> bpftool vmlinux.h gen, or - dwarf/btf without decl tag + hardcoded bpf ctx info -> bpftool vmlinux.h gen If we intends to cover all uapi data structures (to prevent unnecessary CORE relocation, esp. for troublesome bitfield operations), hardcoded approach won't work and we may have to go to decl tag approach.
On Fri, Dec 8, 2023 at 9:30 AM Yonghong Song <yonghong.song@linux.dev> wrote: > > > On 12/8/23 6:34 AM, Eduard Zingerman wrote: > > On Thu, 2023-12-07 at 18:28 -0800, Yonghong Song wrote: > > [...] > >> All context types are defined in include/linux/bpf_types.h. > >> The context type bpf_nf_ctx is missing. > > convert_ctx_access() is not applied for bpf_nf_ctx. Searching through > > kernel code shows that BPF programs access this structure directly > > (net/netfilter/nf_bpf_link.c): > > > > static unsigned int nf_hook_run_bpf(void *bpf_prog, struct sk_buff *skb, > > const struct nf_hook_state *s) > > { > > const struct bpf_prog *prog = bpf_prog; > > struct bpf_nf_ctx ctx = { > > .state = s, > > .skb = skb, > > }; > > > > return bpf_prog_run(prog, &ctx); > > } > > > > I added __bpf_ctx only for types that are subject to convert_ctx_access() > > transformation. On the other hand, applying it to each context type > > should not hurt either. Which way would you prefer? > > > > [...] > > > >>> How to add the same definitions in vmlinux.h is an open question, > >>> and most likely requires bpftool modification: > >>> - Hard code generation of __bpf_ctx based on type names? > >>> - Mark context types with some special > >>> __attribute__((btf_decl_tag("preserve_static_offset"))) > >>> and convert it to __attribute__((preserve_static_offset))? > >> The number of context types is limited, I would just go through > >> the first approach with hard coding the list of ctx types and > >> mark them with preserve_static_offset attribute in vmlinux.h. > > Tbh, I'm with Alan here, generic approach seems a tad nicer. > > Lets collect some more votes :) > > I just want to propose to have less work :-) since we are only dealing > with a few structs in bpf domain. Note that eventually generated > vmlinux.h will be the same whether we use 'hard code' approach or the > decl_tag approach. The difference is just how to do it: - dwarf/btf with > decl tag -> bpftool vmlinux.h gen, or - dwarf/btf without decl tag + > hardcoded bpf ctx info -> bpftool vmlinux.h gen If we intends to cover > all uapi data structures (to prevent unnecessary CORE relocation, esp. > for troublesome bitfield operations), hardcoded approach won't work and > we may have to go to decl tag approach. +1 for simplicity of "hard code" approach. We've stopped adding new uapi "mirror" structs like __sk_buff long ago. The number of structs that need ctx rewrite will not increase.
On Fri, 2023-12-08 at 09:46 -0800, Alexei Starovoitov wrote: [...] > > I just want to propose to have less work :-) since we are only dealing > > with a few structs in bpf domain. Note that eventually generated > > vmlinux.h will be the same whether we use 'hard code' approach or the > > decl_tag approach. The difference is just how to do it: - dwarf/btf with > > decl tag -> bpftool vmlinux.h gen, or - dwarf/btf without decl tag + > > hardcoded bpf ctx info -> bpftool vmlinux.h gen If we intends to cover > > all uapi data structures (to prevent unnecessary CORE relocation, esp. > > for troublesome bitfield operations), hardcoded approach won't work and > > we may have to go to decl tag approach. > > +1 for simplicity of "hard code" approach. > We've stopped adding new uapi "mirror" structs like __sk_buff long ago. > The number of structs that need ctx rewrite will not increase. Ok, I'll submit V2 with changes in libbpf/bpftool to emit preserve_static_offset for predefined set of types.
On Fri, 2023-12-08 at 09:19 -0800, Yonghong Song wrote: > On 12/8/23 6:34 AM, Eduard Zingerman wrote: > > On Thu, 2023-12-07 at 18:28 -0800, Yonghong Song wrote: > > [...] > > > All context types are defined in include/linux/bpf_types.h. > > > The context type bpf_nf_ctx is missing. [...] > The error message should happen here: > > check_mem_access > ... > } else if (reg->type == PTR_TO_CTX) { > check_ptr_off_reg > __check_ptr_off_reg > if (!fixed_off_ok && reg->off) { > verbose(env, "dereference of modified %s ptr R%d off=%d disallowed\n", > reg_type_str(env, reg->type), regno, reg->off); > return -EACCES; > } > ... > > So the verification error message will be emitted earlier, before convert_ctx_access. > Could you double check? You are correct and I was unaware of this check. A simple test "r1 += 8; r0 = *(u64 *)(r1 + 0);" does indeed report an error. I'll make sure that every context type is annotated with preserve static offset, thank you for pointing this out. [...]