[bpf-next,0/1] use preserve_static_offset in bpf uapi headers

Message ID: 20231208000531.19179-1-eddyz87@gmail.com

Message

Eduard Zingerman Dec. 8, 2023, 12:05 a.m. UTC
For certain program context types, the verifier applies the
verifier.c:convert_ctx_access() transformation.
It modifies ST/STX/LDX instructions that access the program context:
convert_ctx_access() updates the offset field of these instructions,
replacing the "virtual" offset with the offset corresponding to the
data representation in the running kernel.

For this transformation to be applicable, access to a context field
must not use pointer arithmetic. For example, consider a read of the
__sk_buff->pkt_type field.
If translated as a single LDX instruction:

    r0 = *(u32 *)(r1 + 4);

the verifier accepts the code and patches the offset in the
instruction. However, if the read is translated as a pair of
instructions:

    r1 += 4;
    r0 = *(u32 *)(r1 + 0);

the verifier rejects the code.

Occasionally, clang's reshuffling of code during compilation can break
these verifier expectations and cause verification errors, e.g. as in [0].
Technically, this happens because each field read/write is represented
in LLVM IR as two operations: an address lookup plus a memory access,
and the compiler is free to move and substitute those independently.
For example, LLVM can rewrite the C code below:

    __u32 v;
    if (...)
      v = sk_buff->pkt_type;
    else
      v = sk_buff->mark;

As if it were written as follows:

    __u32 v, *p;
    if (...)
      p = &sk_buff->pkt_type;  // r0 = 4; (offset of pkt_type)
    else
      p = &sk_buff->mark;      // r0 = 8; (offset of mark)
    v = *p;                    // r1 += r0;
                               // r0 = *(u32 *)(r1 + 0)

This is a valid rewrite from the point of view of C semantics, but it
won't pass verification, because convert_ctx_access() can no longer
replace the offset in 'r0 = *(u32 *)(r1 + 0)' with a constant.

Recently, the attribute preserve_static_offset was added to
clang [1] to tackle this problem. From its documentation:

  Clang supports the ``__attribute__((preserve_static_offset))``
  attribute for the BPF target. This attribute may be attached to a
  struct or union declaration. Reading or writing fields of types having
  such annotation is guaranteed to generate LDX/ST/STX instruction with
  an offset corresponding to the field.
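
For illustration, a minimal sketch of how the attribute is attached to
a struct (the struct and its fields below are made up for the example
and are not the actual uapi definitions):

    /* Illustration only: with the attribute attached, reads and writes
     * of these fields compile to single fixed-offset LDX/ST/STX
     * instructions, never to "add then zero-offset access" pairs. */
    struct example_ctx {
        unsigned int pkt_type;
        unsigned int mark;
    } __attribute__((preserve_static_offset));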

The convert_ctx_access() transformation is applied when the context
parameter has one of the following types:
- __sk_buff
- bpf_cgroup_dev_ctx
- bpf_perf_event_data
- bpf_sk_lookup
- bpf_sock
- bpf_sock_addr
- bpf_sock_ops
- bpf_sockopt
- bpf_sysctl
- sk_msg_md
- sk_reuseport_md
- xdp_md

From my understanding, BPF programs typically access definitions of
these types in two ways:
- via uapi headers linux/bpf.h and linux/bpf_perf_event.h;
- via vmlinux.h.

This RFC seeks to mark the definitions of the relevant context types
in the uapi headers with preserve_static_offset.

The attribute is abstracted by the '__bpf_ctx' macro.
As bpf.h and bpf_perf_event.h do not share any common include files,
this RFC opts to copy the same definition of '__bpf_ctx' into both
headers to avoid adding a new uapi header.
(Another tempting location for '__bpf_ctx' is compiler_types.h /
 compiler-clang.h, but these headers are not exported as uapi.)
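
A sketch of one possible shape for the macro (the guard below is an
assumption for illustration, not necessarily the exact form used by
the patch):

    /* Sketch: expand to the attribute only when the compiler knows it,
     * so that other users of the uapi header are unaffected. */
    #ifdef __has_attribute
    #if __has_attribute(preserve_static_offset)
    #define __bpf_ctx __attribute__((preserve_static_offset))
    #endif
    #endif
    #ifndef __bpf_ctx
    #define __bpf_ctx
    #endif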

How to add the same definitions to vmlinux.h is an open question,
and most likely requires modifying bpftool:
- Hard code generation of __bpf_ctx based on type names?
- Mark context types with some special
  __attribute__((btf_decl_tag("preserve_static_offset")))
  and convert it to __attribute__((preserve_static_offset))?

Please advise whether any of the options above sounds reasonable.

[0] https://lore.kernel.org/bpf/CAA-VZPmxh8o8EBcJ=m-DH4ytcxDFmo0JKsm1p1gf40kS0CE3NQ@mail.gmail.com/T/#m4b9ce2ce73b34f34172328f975235fc6f19841b6
[1] 030b8cb1561d ("[BPF] Attribute preserve_static_offset for structs")
    git@github.com:llvm/llvm-project.git

Eduard Zingerman (1):
  bpf: Mark virtual BPF context structures as preserve_static_offset

 include/uapi/linux/bpf.h                  | 28 ++++++++++++++---------
 include/uapi/linux/bpf_perf_event.h       |  8 ++++++-
 tools/include/uapi/linux/bpf.h            | 28 ++++++++++++++---------
 tools/include/uapi/linux/bpf_perf_event.h |  8 ++++++-
 4 files changed, 48 insertions(+), 24 deletions(-)

Comments

Yonghong Song Dec. 8, 2023, 2:28 a.m. UTC | #1
On 12/7/23 4:05 PM, Eduard Zingerman wrote:
> For certain program context types, the verifier applies the
> verifier.c:convert_ctx_access() transformation.
> It modifies ST/STX/LDX instructions that access program context.
> convert_ctx_access() updates the offset field of these instructions
> changing "virtual" offset by offset corresponding to data
> representation in the running kernel.
>
> For this transformation to be applicable access to the context field
> shouldn't use pointer arithmetics. For example, consider the read of
> __sk_buff->pkt_type field.
> If translated as a single ST instruction:
>
>      r0 = *(u32 *)(r1 + 4);
>
> The verifier would accept such code and patch the offset in the
> instruction, however, if translated as a pair of instructions:
>
>      r1 += 4;
>      r0 = *(u32 *)(r1 + 0);
>
> The verifier would reject such code.
>
> Occasionally clang shuffling code during compilation might break
> verifier expectations and cause verification errors, e.g. as in [0].
> Technically, this happens because each field read/write represented in
> LLVM IR as two operations: address lookup + memory access,
> and the compiler is free to move and substitute those independently.
> For example, LLVM can rewrite C code below:
>
>      __u32 v;
>      if (...)
>        v = sk_buff->pkt_type;
>      else
>        v = sk_buff->mark;
>
> As if it was written as so:
>
>      __u32 v, *p;
>      if (...)
>        p = &sk_buff->pkt_type;  // r0 = 4; (offset of pkt_type)
>      else
>        p = &sk_buff->mark;      // r0 = 8; (offset of mark)
>      v = *p;                    // r1 += r0;
>                                 // r0 = *(u32 *)(r1 + 0)
>
> Which is a valid rewrite from the point of view of C semantics but won't
> pass verification, because convert_ctx_access() can no longer replace
> offset in 'r0 = *(u32 *)(r1 + 0)' with a constant.
>
> Recently, attribute preserve_static_offset was added to
> clang [1] to tackle this problem. From its documentation:
>
>    Clang supports the ``__attribute__((preserve_static_offset))``
>    attribute for the BPF target. This attribute may be attached to a
>    struct or union declaration. Reading or writing fields of types having
>    such annotation is guaranteed to generate LDX/ST/STX instruction with
>    an offset corresponding to the field.
>
> The convert_ctx_access() transformation is applied when the context
> parameter has one of the following types:
> - __sk_buff
> - bpf_cgroup_dev_ctx
> - bpf_perf_event_data
> - bpf_sk_lookup
> - bpf_sock
> - bpf_sock_addr
> - bpf_sock_ops
> - bpf_sockopt
> - bpf_sysctl
> - sk_msg_md
> - sk_reuseport_md
> - xdp_md

All context types are defined in include/linux/bpf_types.h.
The context type bpf_nf_ctx is missing.

>
>  From my understanding, BPF programs typically access definitions of
> these types in two ways:
> - via uapi headers linux/bpf.h and linux/bpf_perf_event.h;

and bpf_nf_ctx is defined in include/net/netfilter/nf_bpf_link.h
and relies on vmlinux.h to provide the ctx struct definition.

> - via vmlinux.h.
>
> This RFC seeks to mark with preserve_static_offset the definitions of
> the relevant context types within uapi headers.
>
> The attribute is abstracted by '__bpf_ctx' macro.
> As bpf.h and bpf_perf_event.h do not share any common include files,
> this RFC opts to copy the same definition of '__bpf_ctx' in both
> headers to avoid adding a new uapi header.
> (Another tempting location for '__bpf_ctx' is compiler_types.h /
>   compiler-clang.h, but these headers are not exported as uapi).

Previously I thought we might use a mechanism similar to vmlinux.h's,
with push/pop of the preserve_static_offset attribute. But it looks
like there are many other structures in uapi bpf.h that do not need
preserve_static_offset. So I think your approach sounds okay.

>
> How to add the same definitions in vmlinux.h is an open question,
> and most likely requires bpftool modification:
> - Hard code generation of __bpf_ctx based on type names?
> - Mark context types with some special
>    __attribute__((btf_decl_tag("preserve_static_offset")))
>    and convert it to __attribute__((preserve_static_offset))?

The number of context types is limited, so I would just go with the
first approach: hard-code the list of ctx types and mark them with the
preserve_static_offset attribute in vmlinux.h.
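
Purely as an illustration of that hard-coded route (the table name and
its placement inside bpftool are hypothetical):

    /* Hypothetical sketch: a fixed list the vmlinux.h generator could
     * match struct names against when deciding whether to emit the
     * preserve_static_offset annotation. */
    static const char * const bpf_ctx_type_names[] = {
        "__sk_buff", "bpf_cgroup_dev_ctx", "bpf_perf_event_data",
        "bpf_sk_lookup", "bpf_sock", "bpf_sock_addr", "bpf_sock_ops",
        "bpf_sockopt", "bpf_sysctl", "sk_msg_md", "sk_reuseport_md",
        "xdp_md",
    };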

>
> Please suggest if any of the options above sound reasonable.
>
> [0] https://lore.kernel.org/bpf/CAA-VZPmxh8o8EBcJ=m-DH4ytcxDFmo0JKsm1p1gf40kS0CE3NQ@mail.gmail.com/T/#m4b9ce2ce73b34f34172328f975235fc6f19841b6
> [1] 030b8cb1561d ("[BPF] Attribute preserve_static_offset for structs")
>      git@github.com:llvm/llvm-project.git
>
> Eduard Zingerman (1):
>    bpf: Mark virtual BPF context structures as preserve_static_offset
>
>   include/uapi/linux/bpf.h                  | 28 ++++++++++++++---------
>   include/uapi/linux/bpf_perf_event.h       |  8 ++++++-
>   tools/include/uapi/linux/bpf.h            | 28 ++++++++++++++---------
>   tools/include/uapi/linux/bpf_perf_event.h |  8 ++++++-
>   4 files changed, 48 insertions(+), 24 deletions(-)
>
Alan Maguire Dec. 8, 2023, 12:27 p.m. UTC | #2
On 08/12/2023 00:05, Eduard Zingerman wrote:
> For certain program context types, the verifier applies the
> verifier.c:convert_ctx_access() transformation.
> It modifies ST/STX/LDX instructions that access program context.
> convert_ctx_access() updates the offset field of these instructions
> changing "virtual" offset by offset corresponding to data
> representation in the running kernel.
> 
> For this transformation to be applicable access to the context field
> shouldn't use pointer arithmetics. For example, consider the read of
> __sk_buff->pkt_type field.
> If translated as a single ST instruction:
> 
>     r0 = *(u32 *)(r1 + 4);
> 
> The verifier would accept such code and patch the offset in the
> instruction, however, if translated as a pair of instructions:
> 
>     r1 += 4;
>     r0 = *(u32 *)(r1 + 0);
> 
> The verifier would reject such code.
>

Sorry if this is a digression, but I'm trying to understand how
this might intersect with vmlinux.h's

#pragma clang attribute push (__attribute__((preserve_access_index)),
apply_to = record)

Since that is currently applied to all structures in vmlinux.h, does
that protect us from the above scenario when BPF code is compiled and
#include's vmlinux.h (I suspect not from what you say below but just
wanted to check)? I realize we get extra relocation info that we don't
need since the offsets for these BPF context structures are recalculated
by the verifier, but given that clang needs to record the relocations,
does it also constrain the generated code to avoid these "increment
pointer, use zero offset" instruction patterns? Or can they still occur
with preserve_access_index applied to the structure? Sorry, might be a
naive question but it's not clear to me how (if at all) the mechanisms
might interact.

The reason I ask is that if it were safe to assume that code generation
avoids such patterns with preserve_access_index, it might save us from
having to update vmlinux.h generation.

> Occasionally clang shuffling code during compilation might break
> verifier expectations and cause verification errors, e.g. as in [0].
> Technically, this happens because each field read/write represented in
> LLVM IR as two operations: address lookup + memory access,
> and the compiler is free to move and substitute those independently.
> For example, LLVM can rewrite C code below:
> 
>     __u32 v;
>     if (...)
>       v = sk_buff->pkt_type;
>     else
>       v = sk_buff->mark;
> 
> As if it was written as so:
> 
>     __u32 v, *p;
>     if (...)
>       p = &sk_buff->pkt_type;  // r0 = 4; (offset of pkt_type)
>     else
>       p = &sk_buff->mark;      // r0 = 8; (offset of mark)
>     v = *p;                    // r1 += r0;
>                                // r0 = *(u32 *)(r1 + 0)
> 
> Which is a valid rewrite from the point of view of C semantics but won't
> pass verification, because convert_ctx_access() can no longer replace
> offset in 'r0 = *(u32 *)(r1 + 0)' with a constant.
> 
> Recently, attribute preserve_static_offset was added to
> clang [1] to tackle this problem. From its documentation:
> 
>   Clang supports the ``__attribute__((preserve_static_offset))``
>   attribute for the BPF target. This attribute may be attached to a
>   struct or union declaration. Reading or writing fields of types having
>   such annotation is guaranteed to generate LDX/ST/STX instruction with
>   an offset corresponding to the field.
> 
> The convert_ctx_access() transformation is applied when the context
> parameter has one of the following types:
> - __sk_buff
> - bpf_cgroup_dev_ctx
> - bpf_perf_event_data
> - bpf_sk_lookup
> - bpf_sock
> - bpf_sock_addr
> - bpf_sock_ops
> - bpf_sockopt
> - bpf_sysctl
> - sk_msg_md
> - sk_reuseport_md
> - xdp_md
> 
> From my understanding, BPF programs typically access definitions of
> these types in two ways:
> - via uapi headers linux/bpf.h and linux/bpf_perf_event.h;
> - via vmlinux.h.
> 
> This RFC seeks to mark with preserve_static_offset the definitions of
> the relevant context types within uapi headers.
> 
> The attribute is abstracted by '__bpf_ctx' macro. 
> As bpf.h and bpf_perf_event.h do not share any common include files,
> this RFC opts to copy the same definition of '__bpf_ctx' in both
> headers to avoid adding a new uapi header.
> (Another tempting location for '__bpf_ctx' is compiler_types.h /
>  compiler-clang.h, but these headers are not exported as uapi).
> 
> How to add the same definitions in vmlinux.h is an open question,
> and most likely requires bpftool modification:
> - Hard code generation of __bpf_ctx based on type names?
> - Mark context types with some special
>   __attribute__((btf_decl_tag("preserve_static_offset")))
>   and convert it to __attribute__((preserve_static_offset))?
>

To me it seems like whatever mechanism identifies such structures would
need to live in vmlinux BTF, as ideally it should be possible to
generate vmlinux.h purely from that BTF. That seems to argue for the
declaration tag approach.

Thanks!

Alan

> Please suggest if any of the options above sound reasonable.
> 
> [0] https://lore.kernel.org/bpf/CAA-VZPmxh8o8EBcJ=m-DH4ytcxDFmo0JKsm1p1gf40kS0CE3NQ@mail.gmail.com/T/#m4b9ce2ce73b34f34172328f975235fc6f19841b6
> [1] 030b8cb1561d ("[BPF] Attribute preserve_static_offset for structs")
>     git@github.com:llvm/llvm-project.git
> 
> Eduard Zingerman (1):
>   bpf: Mark virtual BPF context structures as preserve_static_offset
> 
>  include/uapi/linux/bpf.h                  | 28 ++++++++++++++---------
>  include/uapi/linux/bpf_perf_event.h       |  8 ++++++-
>  tools/include/uapi/linux/bpf.h            | 28 ++++++++++++++---------
>  tools/include/uapi/linux/bpf_perf_event.h |  8 ++++++-
>  4 files changed, 48 insertions(+), 24 deletions(-)
>
Eduard Zingerman Dec. 8, 2023, 2:21 p.m. UTC | #3
On Fri, 2023-12-08 at 12:27 +0000, Alan Maguire wrote:
[...]
> Sorry if this is a digression, but I'm trying to understand how
> this might intersect with vmlinux.h's
>
> #pragma clang attribute push (__attribute__((preserve_access_index)),
> apply_to = record
>
> Since that is currently applied to all structures in vmlinux.h, does
> that protect us from the above scenario when BPF code is compiled and
> #include's vmlinux.h (I suspect not from what you say below but just
> wanted to check)? I realize we get extra relocation info that we don't
> need since the offsets for these BPF context structures are recalcuated
> by the verifier, but given that clang needs to record the relocations,
> does it also constrain the generated code to avoid these "increment
> pointer, use zero offset" instruction patterns? Or can they still occur
> with preserve_access_index applied to the structure? Sorry, might be a
> naive question but it's not clear to me how (if at all) the mechanisms
> might interact.

Unfortunately preserve_access_index does not save us from this problem.
This is the case because field reads and writes are split into two LLVM
IR instructions: a getelementptr to compute the address, and a
load/store to/from that address. The preserve_access_index
transformation rewrites the getelementptr but does not touch the
load/store.

For example, consider the following C code:

    /* #define __ctx __attribute__((preserve_static_offset)) */
    /* #define __pai */
    #define __ctx
    #define __pai __attribute__((preserve_access_index))

    extern int magic2(int);

    struct bpf_sock {
      int bound_dev_if;
      int family;
    } __ctx __pai;

    struct bpf_sockopt {
      int _;
      struct bpf_sock *sk;
      int level;
      int optlen;
    } __ctx __pai;

    int known_load_sink_example_1(struct bpf_sockopt *ctx)
    {
      unsigned g = 0;
      switch (ctx->level) {
      case 10:
        g = magic2(ctx->sk->family);
        break;
      case 20:
        g = magic2(ctx->optlen);
        break;
      }
      return g % 2;
    }

Here is how it is compiled:

    $ clang -g -O2 --target=bpf -mcpu=v3 -c e3.c -o - | llvm-objdump --no-show-raw-insn -Sdr -
    ...
    0000000000000000 <known_load_sink_example_1>:
    ;   switch (ctx->level) {
           0:   r2 = *(u32 *)(r1 + 0x10)
            0000000000000000:  CO-RE <byte_off> [2] struct bpf_sockopt::level (0:2)
           1:   if w2 == 0x14 goto +0x5 <LBB0_3>
           2:   w0 = 0x0
    ;   switch (ctx->level) {
           3:   if w2 != 0xa goto +0x8 <LBB0_5>
    ;     g = magic2(ctx->sk->family);
           4:   r1 = *(u64 *)(r1 + 0x8)
            0000000000000020:  CO-RE <byte_off> [2] struct bpf_sockopt::sk (0:1)
           5:   r2 = 0x4
            0000000000000028:  CO-RE <byte_off> [7] struct bpf_sock::family (0:1)
           6:   goto +0x1 <LBB0_4>

    0000000000000038 <LBB0_3>:
           7:   r2 = 0x14
            0000000000000038:  CO-RE <byte_off> [2] struct bpf_sockopt::optlen (0:3)

    0000000000000040 <LBB0_4>:
           8:   r1 += r2
           9:   r1 = *(u32 *)(r1 + 0x0)  <---------------- verifier error would
          10:   call -0x1                                  be reported for this insn
            0000000000000050:  R_BPF_64_32  magic2
    ;   return g % 2;
          11:   w0 &= 0x1

    0000000000000060 <LBB0_5>:
          12:   exit

> The reason I ask is if it was safe to assume that code generation would
> avoid such patterns with preserve_access_index, it might avoid needing
> to update vmlinux.h generation.

In the current LLVM implementation preserve_static_offset takes
priority over preserve_access_index, so changing the __pai/__ctx
definitions above helps.
(And this priority of one attribute over the other was the reason to
 have preserve_static_offset as an attribute, not as
 btf_decl_tag("preserve_static_offset"). That is unfortunate for
 vmlinux.h, as we already have the means to preserve decl tags.)
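
(Concretely, flipping the definitions in the example above to the
commented-out variants is enough; since preserve_static_offset takes
priority, keeping __pai as preserve_access_index also works. A sketch:)

    /* Sketch: with preserve_static_offset in effect for the annotated
     * structs, clang emits fixed-offset LDX/ST/STX for their fields
     * instead of the "r1 += r2; load at offset 0" pattern above. */
    #define __ctx __attribute__((preserve_static_offset))
    #define __pai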

[...]

> > How to add the same definitions in vmlinux.h is an open question,
> > and most likely requires bpftool modification:
> > - Hard code generation of __bpf_ctx based on type names?
> > - Mark context types with some special
> >   __attribute__((btf_decl_tag("preserve_static_offset")))
> >   and convert it to __attribute__((preserve_static_offset))?
>
> To me it seems like whatever mechanism supports identification of such
> structures would need to live in vmlinux BTF as ideally it should be
> possible to generate vmlinux.h purely from that BTF. That seems to argue
> for the declaration tag approach.

Tbh, I like the decl tag approach a bit more too,
although the macro definition would be somewhat ridiculous:

    #if __has_attribute(preserve_static_offset) && defined(__bpf__)
    #define __bpf_ctx __attribute__((preserve_static_offset)) \
                      __attribute__((btf_decl_tag("preserve_static_offset")))
    #else
    #define __bpf_ctx
    #endif
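
For reference, a hypothetical use site would then look like this (field
subset reused from the example earlier in the thread):

    /* Hypothetical usage sketch: the struct carries both the codegen
     * attribute (consumed by clang) and the decl tag (preserved in BTF,
     * so bpftool could re-emit the annotation into vmlinux.h). */
    struct bpf_sockopt {
        int level;
        int optlen;
    } __bpf_ctx;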

Thanks,
Eduard
Eduard Zingerman Dec. 8, 2023, 2:34 p.m. UTC | #4
On Thu, 2023-12-07 at 18:28 -0800, Yonghong Song wrote:
[...]
> All context types are defined in include/linux/bpf_types.h.
> The context type bpf_nf_ctx is missing.

convert_ctx_access() is not applied to bpf_nf_ctx. Searching through
the kernel code shows that BPF programs access this structure directly
(net/netfilter/nf_bpf_link.c):

    static unsigned int nf_hook_run_bpf(void *bpf_prog, struct sk_buff *skb,
                        const struct nf_hook_state *s)
    {
        const struct bpf_prog *prog = bpf_prog;
        struct bpf_nf_ctx ctx = {
            .state = s,
            .skb = skb,
        };

        return bpf_prog_run(prog, &ctx);
    }

I added __bpf_ctx only for types that are subject to the
convert_ctx_access() transformation. On the other hand, applying it to
every context type should not hurt either. Which way would you prefer?

[...]

> > How to add the same definitions in vmlinux.h is an open question,
> > and most likely requires bpftool modification:
> > - Hard code generation of __bpf_ctx based on type names?
> > - Mark context types with some special
> >    __attribute__((btf_decl_tag("preserve_static_offset")))
> >    and convert it to __attribute__((preserve_static_offset))?
> 
> The number of context types is limited, I would just go through
> the first approach with hard coding the list of ctx types and
> mark them with preserve_static_offset attribute in vmlinux.h.

Tbh, I'm with Alan here; the generic approach seems a tad nicer.
Let's collect some more votes :)
Alan Maguire Dec. 8, 2023, 3:35 p.m. UTC | #5
On 08/12/2023 14:21, Eduard Zingerman wrote:
> On Fri, 2023-12-08 at 12:27 +0000, Alan Maguire wrote:
> [...]
>> Sorry if this is a digression, but I'm trying to understand how
>> this might intersect with vmlinux.h's
>>
>> #pragma clang attribute push (__attribute__((preserve_access_index)),
>> apply_to = record
>>
>> Since that is currently applied to all structures in vmlinux.h, does
>> that protect us from the above scenario when BPF code is compiled and
>> #include's vmlinux.h (I suspect not from what you say below but just
>> wanted to check)? I realize we get extra relocation info that we don't
>> need since the offsets for these BPF context structures are recalcuated
>> by the verifier, but given that clang needs to record the relocations,
>> does it also constrain the generated code to avoid these "increment
>> pointer, use zero offset" instruction patterns? Or can they still occur
>> with preserve_access_index applied to the structure? Sorry, might be a
>> naive question but it's not clear to me how (if at all) the mechanisms
>> might interact.
> 
> Unfortunately preserve_access_index does not save us from this problem.
> This is the case because field reads and writes are split as two LLVM
> IR instructions: getelementptr to get an address, and load/store
> to/from that address. The preserve_access_index transformation
> rewrites the getelementptr but does not touch load/store.
> 
> For example, consider the following C code:
> 
>     /* #define __ctx __attribute__((preserve_static_offset)) */
>     /* #define __pai */
>     #define __ctx
>     #define __pai __attribute__((preserve_access_index))
> 
>     extern int magic2(int);
> 
>     struct bpf_sock {
>       int bound_dev_if;
>       int family;
>     } __ctx __pai;
> 
>     struct bpf_sockopt {
>       int _;
>       struct bpf_sock *sk;
>       int level;
>       int optlen;
>     } __ctx __pai;
> 
>     int known_load_sink_example_1(struct bpf_sockopt *ctx)
>     {
>       unsigned g = 0;
>       switch (ctx->level) {
>       case 10:
>         g = magic2(ctx->sk->family);
>         break;
>       case 20:
>         g = magic2(ctx->optlen);
>         break;
>       }
>       return g % 2;
>     }
> 
> Here is how it is compiled:
> 
>     $ clang -g -O2 --target=bpf -mcpu=v3 -c e3.c -o - | llvm-objdump --no-show-raw-insn -Sdr -
>     ...
>     0000000000000000 <known_load_sink_example_1>:
>     ;   switch (ctx->level) {
>            0:   r2 = *(u32 *)(r1 + 0x10)
>             0000000000000000:  CO-RE <byte_off> [2] struct bpf_sockopt::level (0:2)
>            1:   if w2 == 0x14 goto +0x5 <LBB0_3>
>            2:   w0 = 0x0
>     ;   switch (ctx->level) {
>            3:   if w2 != 0xa goto +0x8 <LBB0_5>
>     ;     g = magic2(ctx->sk->family);
>            4:   r1 = *(u64 *)(r1 + 0x8)
>             0000000000000020:  CO-RE <byte_off> [2] struct bpf_sockopt::sk (0:1)
>            5:   r2 = 0x4
>             0000000000000028:  CO-RE <byte_off> [7] struct bpf_sock::family (0:1)
>            6:   goto +0x1 <LBB0_4>
> 
>     0000000000000038 <LBB0_3>:
>            7:   r2 = 0x14
>             0000000000000038:  CO-RE <byte_off> [2] struct bpf_sockopt::optlen (0:3)
> 
>     0000000000000040 <LBB0_4>:
>            8:   r1 += r2
>            9:   r1 = *(u32 *)(r1 + 0x0)  <---------------- verifier error would
>           10:   call -0x1                                  be reported for this insn
>             0000000000000050:  R_BPF_64_32  magic2
>     ;   return g % 2;
>           11:   w0 &= 0x1
> 
>     0000000000000060 <LBB0_5>:
>           12:   exit
> 
>> The reason I ask is if it was safe to assume that code generation would
>> avoid such patterns with preserve_access_index, it might avoid needing
>> to update vmlinux.h generation.
> 
> In current LLVM implementation preserve_static_offset has priority
> over preserve_access_index. So changing __pai/__ctx definitions above helps.
> (And this priority of one attribute over the other was the reason to
>  have preserve_static_offset as an attribute, not as
>  btf_decl_tag("preserve_static_offset"). Although that is unfortunate
>  for vmlinux.h, as we already have means to preserve decl tags).
>

Thanks for the explanation!

> [...]
> 
>>> How to add the same definitions in vmlinux.h is an open question,
>>> and most likely requires bpftool modification:
>>> - Hard code generation of __bpf_ctx based on type names?
>>> - Mark context types with some special
>>>   __attribute__((btf_decl_tag("preserve_static_offset")))
>>>   and convert it to __attribute__((preserve_static_offset))?
>>
>> To me it seems like whatever mechanism supports identification of such
>> structures would need to live in vmlinux BTF as ideally it should be
>> possible to generate vmlinux.h purely from that BTF. That seems to argue
>> for the declaration tag approach.
> 
> Tbh, I like the decl tag approach a bit more too.
> Although macro definition would be somewhat ridiculous:
> 
>     #if __has_attribute(preserve_static_offset) && defined(__bpf__)
>     #define __bpf_ctx __attribute__((preserve_static_offset)) \
>                       __attribute__((btf_decl_tag("preserve_static_offset")))
>     #else
>     #define __bpf_ctx
>     #endif
>

As macro definitions go, that's not that ridiculous ;-)

If we add it to vmlinux.h, it would be good to have a

#ifdef BPF_NO_PRESERVE_STATIC_OFFSET
#undef __bpf_ctx
#define __bpf_ctx
#endif

...too, just in case the user wanted to use CO-RE with any of the types
covered. Thanks!

Alan
Eduard Zingerman Dec. 8, 2023, 3:39 p.m. UTC | #6
On Fri, 2023-12-08 at 15:35 +0000, Alan Maguire wrote:
[...]
> > Tbh, I like the decl tag approach a bit more too.
> > Although macro definition would be somewhat ridiculous:
> > 
> >     #if __has_attribute(preserve_static_offset) && defined(__bpf__)
> >     #define __bpf_ctx __attribute__((preserve_static_offset)) \
> >                       __attribute__((btf_decl_tag("preserve_static_offset")))
> >     #else
> >     #define __bpf_ctx
> >     #endif
> > 
> 
> As macro definitions go, that's not that ridiculous ;-)

Fair enough :)

> If we add it to vmlinux.h, would be good to have a
> 
> #ifdef BPF_NO_PRESERVE_STATIC_OFFSET
> #undef __bpf_ctx
> #define __bpf_ctx
> #endif
> 
> ...too, just in case the user wanted to use CO-RE with any of the types
> covered. Thanks!

Will do.

Thanks,
Eduard.
Yonghong Song Dec. 8, 2023, 5:19 p.m. UTC | #7
On 12/8/23 6:34 AM, Eduard Zingerman wrote:
> On Thu, 2023-12-07 at 18:28 -0800, Yonghong Song wrote:
> [...]
>> All context types are defined in include/linux/bpf_types.h.
>> The context type bpf_nf_ctx is missing.
> convert_ctx_access() is not applied for bpf_nf_ctx. Searching through
> kernel code shows that BPF programs access this structure directly
> (net/netfilter/nf_bpf_link.c):
>
>      static unsigned int nf_hook_run_bpf(void *bpf_prog, struct sk_buff *skb,
>                          const struct nf_hook_state *s)
>      {
>          const struct bpf_prog *prog = bpf_prog;
>          struct bpf_nf_ctx ctx = {
>              .state = s,
>              .skb = skb,
>          };
>
>          return bpf_prog_run(prog, &ctx);
>      }
>
> I added __bpf_ctx only for types that are subject to convert_ctx_access()
> transformation. On the other hand, applying it to each context type
> should not hurt either. Which way would you prefer?
>
> [...]

The error message should happen here:

check_mem_access
  ...
  } else if (reg->type == PTR_TO_CTX) {
   check_ptr_off_reg
    __check_ptr_off_reg
         if (!fixed_off_ok && reg->off) {
                 verbose(env, "dereference of modified %s ptr R%d off=%d disallowed\n",
                         reg_type_str(env, reg->type), regno, reg->off);
                 return -EACCES;
         }
   ...

So the verification error message will be emitted earlier, before
convert_ctx_access(). Could you double-check?

>
>>> How to add the same definitions in vmlinux.h is an open question,
>>> and most likely requires bpftool modification:
>>> - Hard code generation of __bpf_ctx based on type names?
>>> - Mark context types with some special
>>>     __attribute__((btf_decl_tag("preserve_static_offset")))
>>>     and convert it to __attribute__((preserve_static_offset))?
>> The number of context types is limited, I would just go through
>> the first approach with hard coding the list of ctx types and
>> mark them with preserve_static_offset attribute in vmlinux.h.
> Tbh, I'm with Alan here, generic approach seems a tad nicer.
> Lets collect some more votes :)
Yonghong Song Dec. 8, 2023, 5:30 p.m. UTC | #8
On 12/8/23 6:34 AM, Eduard Zingerman wrote:
> On Thu, 2023-12-07 at 18:28 -0800, Yonghong Song wrote:
> [...]
>> All context types are defined in include/linux/bpf_types.h.
>> The context type bpf_nf_ctx is missing.
> convert_ctx_access() is not applied for bpf_nf_ctx. Searching through
> kernel code shows that BPF programs access this structure directly
> (net/netfilter/nf_bpf_link.c):
>
>      static unsigned int nf_hook_run_bpf(void *bpf_prog, struct sk_buff *skb,
>                          const struct nf_hook_state *s)
>      {
>          const struct bpf_prog *prog = bpf_prog;
>          struct bpf_nf_ctx ctx = {
>              .state = s,
>              .skb = skb,
>          };
>
>          return bpf_prog_run(prog, &ctx);
>      }
>
> I added __bpf_ctx only for types that are subject to convert_ctx_access()
> transformation. On the other hand, applying it to each context type
> should not hurt either. Which way would you prefer?
>
> [...]
>
>>> How to add the same definitions in vmlinux.h is an open question,
>>> and most likely requires bpftool modification:
>>> - Hard code generation of __bpf_ctx based on type names?
>>> - Mark context types with some special
>>>     __attribute__((btf_decl_tag("preserve_static_offset")))
>>>     and convert it to __attribute__((preserve_static_offset))?
>> The number of context types is limited, I would just go through
>> the first approach with hard coding the list of ctx types and
>> mark them with preserve_static_offset attribute in vmlinux.h.
> Tbh, I'm with Alan here, generic approach seems a tad nicer.
> Lets collect some more votes :)

I just want to propose to have less work :-), since we are only dealing
with a few structs in the bpf domain. Note that the eventually generated
vmlinux.h will be the same whether we use the 'hard code' approach or
the decl_tag approach. The difference is just how to do it:
- dwarf/btf with decl tag -> bpftool vmlinux.h gen, or
- dwarf/btf without decl tag + hardcoded bpf ctx info -> bpftool
  vmlinux.h gen

If we intend to cover all uapi data structures (to prevent unnecessary
CO-RE relocation, esp. for troublesome bitfield operations), the
hardcoded approach won't work and we may have to go with the decl tag
approach.
Alexei Starovoitov Dec. 8, 2023, 5:46 p.m. UTC | #9
On Fri, Dec 8, 2023 at 9:30 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>
>
> On 12/8/23 6:34 AM, Eduard Zingerman wrote:
> > On Thu, 2023-12-07 at 18:28 -0800, Yonghong Song wrote:
> > [...]
> >> All context types are defined in include/linux/bpf_types.h.
> >> The context type bpf_nf_ctx is missing.
> > convert_ctx_access() is not applied for bpf_nf_ctx. Searching through
> > kernel code shows that BPF programs access this structure directly
> > (net/netfilter/nf_bpf_link.c):
> >
> >      static unsigned int nf_hook_run_bpf(void *bpf_prog, struct sk_buff *skb,
> >                          const struct nf_hook_state *s)
> >      {
> >          const struct bpf_prog *prog = bpf_prog;
> >          struct bpf_nf_ctx ctx = {
> >              .state = s,
> >              .skb = skb,
> >          };
> >
> >          return bpf_prog_run(prog, &ctx);
> >      }
> >
> > I added __bpf_ctx only for types that are subject to convert_ctx_access()
> > transformation. On the other hand, applying it to each context type
> > should not hurt either. Which way would you prefer?
> >
> > [...]
> >
> >>> How to add the same definitions in vmlinux.h is an open question,
> >>> and most likely requires bpftool modification:
> >>> - Hard code generation of __bpf_ctx based on type names?
> >>> - Mark context types with some special
> >>>     __attribute__((btf_decl_tag("preserve_static_offset")))
> >>>     and convert it to __attribute__((preserve_static_offset))?
> >> The number of context types is limited, I would just go through
> >> the first approach with hard coding the list of ctx types and
> >> mark them with preserve_static_offset attribute in vmlinux.h.
> > Tbh, I'm with Alan here, generic approach seems a tad nicer.
> > Lets collect some more votes :)
>
> I just want to propose to have less work :-) since we are only dealing
> with a few structs in bpf domain. Note that eventually generated
> vmlinux.h will be the same whether we use 'hard code' approach or the
> decl_tag approach. The difference is just how to do it: - dwarf/btf with
> decl tag -> bpftool vmlinux.h gen, or - dwarf/btf without decl tag +
> hardcoded bpf ctx info -> bpftool vmlinux.h gen If we intends to cover
> all uapi data structures (to prevent unnecessary CORE relocation, esp.
> for troublesome bitfield operations), hardcoded approach won't work and
> we may have to go to decl tag approach.

+1 for the simplicity of the "hard code" approach.
We stopped adding new uapi "mirror" structs like __sk_buff long ago.
The number of structs that need ctx rewrite will not increase.
Eduard Zingerman Dec. 8, 2023, 8:35 p.m. UTC | #10
On Fri, 2023-12-08 at 09:46 -0800, Alexei Starovoitov wrote:
[...]
> > I just want to propose to have less work :-) since we are only dealing
> > with a few structs in bpf domain. Note that eventually generated
> > vmlinux.h will be the same whether we use 'hard code' approach or the
> > decl_tag approach. The difference is just how to do it: - dwarf/btf with
> > decl tag -> bpftool vmlinux.h gen, or - dwarf/btf without decl tag +
> > hardcoded bpf ctx info -> bpftool vmlinux.h gen If we intends to cover
> > all uapi data structures (to prevent unnecessary CORE relocation, esp.
> > for troublesome bitfield operations), hardcoded approach won't work and
> > we may have to go to decl tag approach.
> 
> +1 for simplicity of "hard code" approach.
> We've stopped adding new uapi "mirror" structs like __sk_buff long ago.
> The number of structs that need ctx rewrite will not increase.

Ok, I'll submit V2 with changes in libbpf/bpftool to emit
preserve_static_offset for a predefined set of types.
Eduard Zingerman Dec. 8, 2023, 8:54 p.m. UTC | #11
On Fri, 2023-12-08 at 09:19 -0800, Yonghong Song wrote:
> On 12/8/23 6:34 AM, Eduard Zingerman wrote:
> > On Thu, 2023-12-07 at 18:28 -0800, Yonghong Song wrote:
> > [...]
> > > All context types are defined in include/linux/bpf_types.h.
> > > The context type bpf_nf_ctx is missing.
[...]
> The error message should happen here:
> 
> check_mem_access
>   ...
>   } else if (reg->type == PTR_TO_CTX) {
>    check_ptr_off_reg
>     __check_ptr_off_reg
>          if (!fixed_off_ok && reg->off) {
>                  verbose(env, "dereference of modified %s ptr R%d off=%d disallowed\n",
>                          reg_type_str(env, reg->type), regno, reg->off);
>                  return -EACCES;
>          }
>    ...
> 
> So the verification error message will be emitted earlier, before convert_ctx_access.
> Could you double check?

You are correct, and I was unaware of this check. A simple test
"r1 += 8; r0 = *(u64 *)(r1 + 0);" does indeed report an error.
I'll make sure that every context type is annotated with
preserve_static_offset; thank you for pointing this out.

[...]