mbox series

[RFC,bpf-next,00/12] Use uapi kernel headers with vmlinux.h

Message ID 20221025222802.2295103-1-eddyz87@gmail.com (mailing list archive)
Headers show
Series Use uapi kernel headers with vmlinux.h | expand

Message

Eduard Zingerman Oct. 25, 2022, 10:27 p.m. UTC
Hi BPF community,

AFAIK there is a long standing feature request to use kernel headers
alongside `vmlinux.h` generated by `bpftool`. For example significant
effort was put to add an attribute `bpf_dominating_decl` (see [1]) to
clang, unfortunately this effort was stuck due to concerns regarding C
language semantics.

After some discussion with Alexei and Yonghong I'd like to request
your comments regarding a somewhat brittle and partial solution to
this issue that relies on adding `#ifndef FOO_H ... #endif` guards in
the generated `vmlinux.h`.

The basic idea
---

The goal of the patch set is to allow usage of header files from
`include/uapi` alongside `vmlinux.h` as follows:

  #include <uapi/linux/tcp.h>
  #include "vmlinux.h"

This goal is achieved by adding `#ifndef ... #endif` guards in
`vmlinux.h` around definitions that originate from the `include/uapi`
headers. The guards emitted match the guards used in the original
headers. E.g. as follows:

include/uapi/linux/tcp.h:

  #ifndef _UAPI_LINUX_TCP_H
  #define _UAPI_LINUX_TCP_H
  ...
  union tcp_word_hdr {
	struct tcphdr hdr;
	__be32        words[5];
  };
  ...
  #endif /* _UAPI_LINUX_TCP_H */

vmlinux.h:

  ...
  #ifndef _UAPI_LINUX_TCP_H

  union tcp_word_hdr {
	struct tcphdr hdr;
	__be32 words[5];
  };

  #endif /* _UAPI_LINUX_TCP_H */
  ...

To get to this state the following steps are necessary:
- "header guard" name should be identified for each header file;
- the correspondence between data type and it's header guard has to be
  encoded in BTF;
- `bpftool` should be adjusted to emit `#ifndef FOO_H ... #endif`
  brackets.

It is not possible to identify header guard names for all uapi headers
basing only on the file name. However a simple script could devised to
identify the guards basing on the file name and it's content. Thus it
is possible to obtain the list of header names with corresponding
header guards.

The correspondence between type and it's declaration file (header) is
available in DWARF as `DW_AT_decl_file` attribute. The
`DW_AT_decl_file` can be matched with the list of header guards
described above to obtain the header guard name for a specific type.

The `pahole` generates BTF using DWARF. It is possible to modify
`pahole` to accept the header guards list as an additional parameter
and to encode the header guard names in BTF.

Implementation details
---

Present patch-set implements these ideas as follows:
- A parameter `--header_guards_db` is added to `pahole`. If present it
  points to a file with a list of `<header> <guard>` records.
- `pahole` uses DWARF `DW_AT_decl_file` value to lookup the header
  guard for each type emitted to BTF. If header guard is present it is
  encoded alongside the type.
- Header guards are encoded in BTF as `BTF_DECL_TAG` records with a
  special prefix. The prefix "header_guard:" is added to a value of
  such tags. (Here `BTF_DECL_TAG` is used to avoid BTF binary format
  changes).
- A special script `infer_header_guards.pl` is added as a part of
  kbuild, it can infer header guard names for each UAPI header basing
  on the header content.
- This script is invoked from `link-vmlinux.sh` prior to BTF
  generation during kernel build. The output of the script is saved to
  a file, the file is passed to `pahole` as `--header_guards_db`
  parameter.
- `libbpf` is modified to aggregate `BTF_DECL_TAG` records for each
  type and to emit `#ifndef FOO_H ... #endif` brackets when
  "header_guard:" tag is present for a type.

Details for each patch in a set:
- libbpf: Deduplicate unambigous standalone forward declarations
- selftests/bpf: Tests for standalone forward BTF declarations deduplication

  There is a small number (63 for defconfig) of forward declarations
  that are not de-duplicated with the main type declaration under
  certain conditions. This hinders the header guard brackets
  generation. This patch addresses this de-duplication issue.

- libbpf: Support for BTF_DECL_TAG dump in C format
- selftests/bpf: Tests for BTF_DECL_TAG dump in C format

  Currently libbpf does not process BTF_DECL_TAG when btf is dumped in
  C format. This patch adds a hash table matching btf type ids with a
  list of decl tags to the struct btf_dump.
  The `btf_dump_emit_decl_tags` is not necessary for the overall
  patch-set to function but simplifies testing a bit.

- libbpf: Header guards for selected data structures in vmlinux.h
- selftests/bpf: Tests for header guards printing in BTF dump

  Adds option `emit_header_guards` to `struct btf_dump_opts`.
  When enabled the `btf_dump__dump_type` prints `#ifndef ... #endif`
  brackets around types for which header guard information is present
  in BTF.

- bpftool: Enable header guards generation

  Unconditionally enables `emit_header_guards` for BTF dump in C format.

- kbuild: Script to infer header guard values for uapi headers
- kbuild: Header guards for types from include/uapi/*.h in kernel BTF

  Adds `scripts/infer_header_guards.pl` and integrates it with
  `link-vmlinux.sh`.

- selftests/bpf: Script to verify uapi headers usage with vmlinux.h

  Adds a script `test_uapi_headers.py` that tests header guards with
  vmlinux.h by compiling a simple C snippet. The snippet looks as
  follows:
  
    #include <some_uapi_header.h>
    #include "vmlinux.h"
  
    __attribute__((section("tc"), used))
    int syncookie_tc(struct __sk_buff *skb) { return 0; }
  
  The list of headers to test comes from
  `tools/testing/selftests/bpf/good_uapi_headers.txt`.

- selftests/bpf: Known good uapi headers for test_uapi_headers.py

  The list of uapi headers that could be included alongside vmlinux.h.
  The headers are peeked from the following locations:
  - <headers-export-dir>/linux/*.h
  - <headers-export-dir>/linux/**/*.h
  This choice of locations is somewhat arbitrary.

- selftests/bpf: script for infer_header_guards.pl testing

  The test case for `scripts/infer_header_guards.pl`, verifies that
  header guards can be inferred for all uapi headers.

- There is also a patch for dwarves that adds `--header_guards_db`
  option (see [2]).

The `test_uapi_headers.py` is important as it demonstrates the
the necessary compiler flags:

clang ...                                  \
      -D__x86_64__                         \
      -Xclang -fwchar-type=short           \
      -Xclang -fno-signed-wchar            \
      -I{exported_kernel_headers}/include/ \
      ...

- `-fwchar-type=short` and `-fno-signed-wchar` had to be added because
  BPF target uses `int` for `wchar_t` by default and this differs from
  `vmlinux.h` definition of the type (at least for x86_64).
- `__x86_64__` had to be added for uapi headers that include
  `stddef.h` (the one that is supplied my CLANG itself), in order to
  define correct sizes for `size_t` and `ptrdiff_t`.
- The `{exported_kernel_headers}` stands for exported kernel headers
  directory (the headers obtained by `make headers_install` or via
  distribution package).

When it works
---

The mechanics described above works for a significant number of UAPI
headers. For example, for the test case above I chose the headers from
the following locations:
- linux/*.h
- linux/**/*.h
There are 759 such headers and for 677 of them the test described
above passes.

I excluded the headers from the following sub-directories as
potentially not interesting:

  asm          rdma   video xen
  asm-generic  misc   scsi
  drm          mtd    sound

Thus saving some time for both discussion and CI but the choice is
somewhat arbitrary. If I run `test_uapi_headers.py --test '*'` (all
headers) test passes for 834 out of 972 headers.

When it breaks
---

There several scenarios when this mechanics breaks.
Specifically I found the following cases:
- When uapi header includes some system header that conflicts with
  vmlinux.h.
- When uapi header itself conflicts with vmlinux.h.

Below are examples for both cases.

Conflict with system headers
----

The following uapi headers:
- linux/atmbr2684.h
- linux/bpfilter.h
- linux/gsmmux.h
- linux/icmp.h
- linux/if.h
- linux/if_arp.h
- linux/if_bonding.h
- linux/if_pppox.h
- linux/if_tunnel.h
- linux/ip6_tunnel.h
- linux/llc.h
- linux/mctp.h
- linux/mptcp.h
- linux/netdevice.h
- linux/netfilter/xt_RATEEST.h
- linux/netfilter/xt_hashlimit.h
- linux/netfilter/xt_physdev.h
- linux/netfilter/xt_rateest.h
- linux/netfilter_arp/arp_tables.h
- linux/netfilter_arp/arpt_mangle.h
- linux/netfilter_bridge.h
- linux/netfilter_bridge/ebtables.h
- linux/netfilter_ipv4/ip_tables.h
- linux/netfilter_ipv6/ip6_tables.h
- linux/route.h
- linux/wireless.h

Include the following system header:
- /usr/include/sys/socket.h (all via linux/if.h)

The sys/socket.h conflicts with vmlinux.h in:
- types: struct iovec, struct sockaddr, struct msghdr, ...
- constants: SOCK_STREAM, SOCK_DGRAM, ...

However, only two types are actually used:
- struct sockaddr
- struct sockaddr_storage (used only in linux/mptcp.h)

In 'vmlinux.h' this type originates from 'kernel/include/socket.h'
(non UAPI header), thus does not have a header guard.

The only workaround that I see is to:
- define a stub sys/socket.h as follows:

    #ifndef __BPF_SOCKADDR__
    #define __BPF_SOCKADDR__
    
    /* For __kernel_sa_family_t */
    #include <linux/socket.h>
    
    struct sockaddr {
        __kernel_sa_family_t sa_family;
        char sa_data[14];
    };
    
    #endif

- hardcode generation of __BPF_SOCKADDR__ bracket for
  'struct sockaddr' in vmlinux.h.

Another possibility is to move the definition of 'struct sockaddr'
from 'kernel/include/socket.h' to 'kernel/include/uapi/linux/socket.h',
but I expect that this won't fly with the mainline as it might break
the programs that include both 'linux/socket.h' and 'sys/socket.h'.

Conflict with vmlinux.h
----

Uapi header:
- linux/signal.h

Conflict with vmlinux.h in definition of 'struct sigaction'.
Defined in:
- vmlinux.h: kernel/include/linux/signal_types.h
- uapi:      kernel/arch/x86/include/asm/signal.h

Uapi headers:
- linux/tipc_sockets_diag.h
- linux/sock_diag.h

Conflict with vmlinux.h in definition of 'SOCK_DESTROY'.
Defined in:
- vmlinux.h: kernel/include/net/sock.h
- uapi:      kernel/include/uapi/linux/sock_diag.h
Constants seem to be unrelated.

And so on... I have details for many other headers but omit those for
brevity.

In conclusion
---

Except from the general feasibility I have a few questions:
- What UAPI headers are the candidates for such use? If there are some
  interesting headers currently not working with this patch-set some
  hacks have to be added (e.g. like with `linux/if.h`).
- Is it ok to encode header guards as special `BTF_DECL_TAG` or should
  I change the BTF format a bit to save some bytes.

Thanks,
Eduard


[1] https://reviews.llvm.org/D111307
    [clang] __attribute__ bpf_dominating_decl
[2] https://lore.kernel.org/dwarves/20221025220729.2293891-1-eddyz87@gmail.com/T/
    [RFC dwarves] pahole: Save header guard names when
                  --header_guards_db is passed

Eduard Zingerman (12):
  libbpf: Deduplicate unambigous standalone forward declarations
  selftests/bpf: Tests for standalone forward BTF declarations
    deduplication
  libbpf: Support for BTF_DECL_TAG dump in C format
  selftests/bpf: Tests for BTF_DECL_TAG dump in C format
  libbpf: Header guards for selected data structures in vmlinux.h
  selftests/bpf: Tests for header guards printing in BTF dump
  bpftool: Enable header guards generation
  kbuild: Script to infer header guard values for uapi headers
  kbuild: Header guards for types from include/uapi/*.h in kernel BTF
  selftests/bpf: Script to verify uapi headers usage with vmlinux.h
  selftests/bpf: Known good uapi headers for test_uapi_headers.py
  selftests/bpf: script for infer_header_guards.pl testing

 scripts/infer_header_guards.pl                | 191 +++++
 scripts/link-vmlinux.sh                       |  13 +-
 tools/bpf/bpftool/btf.c                       |   4 +-
 tools/lib/bpf/btf.c                           | 178 ++++-
 tools/lib/bpf/btf.h                           |   7 +-
 tools/lib/bpf/btf_dump.c                      | 232 +++++-
 .../selftests/bpf/good_uapi_headers.txt       | 677 ++++++++++++++++++
 tools/testing/selftests/bpf/prog_tests/btf.c  | 152 ++++
 .../selftests/bpf/prog_tests/btf_dump.c       |  11 +-
 .../bpf/progs/btf_dump_test_case_decl_tag.c   |  39 +
 .../progs/btf_dump_test_case_header_guards.c  |  94 +++
 .../bpf/test_uapi_header_guards_infer.sh      |  33 +
 .../selftests/bpf/test_uapi_headers.py        | 197 +++++
 13 files changed, 1816 insertions(+), 12 deletions(-)
 create mode 100755 scripts/infer_header_guards.pl
 create mode 100644 tools/testing/selftests/bpf/good_uapi_headers.txt
 create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_decl_tag.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_header_guards.c
 create mode 100755 tools/testing/selftests/bpf/test_uapi_header_guards_infer.sh
 create mode 100755 tools/testing/selftests/bpf/test_uapi_headers.py

Comments

Alexei Starovoitov Oct. 25, 2022, 11:46 p.m. UTC | #1
On Wed, Oct 26, 2022 at 01:27:49AM +0300, Eduard Zingerman wrote:
> 
> Include the following system header:
> - /usr/include/sys/socket.h (all via linux/if.h)
> 
> The sys/socket.h conflicts with vmlinux.h in:
> - types: struct iovec, struct sockaddr, struct msghdr, ...
> - constants: SOCK_STREAM, SOCK_DGRAM, ...
> 
> However, only two types are actually used:
> - struct sockaddr
> - struct sockaddr_storage (used only in linux/mptcp.h)
> 
> In 'vmlinux.h' this type originates from 'kernel/include/socket.h'
> (non UAPI header), thus does not have a header guard.
> 
> The only workaround that I see is to:
> - define a stub sys/socket.h as follows:
> 
>     #ifndef __BPF_SOCKADDR__
>     #define __BPF_SOCKADDR__
>     
>     /* For __kernel_sa_family_t */
>     #include <linux/socket.h>
>     
>     struct sockaddr {
>         __kernel_sa_family_t sa_family;
>         char sa_data[14];
>     };
>     
>     #endif
> 
> - hardcode generation of __BPF_SOCKADDR__ bracket for
>   'struct sockaddr' in vmlinux.h.

we don't need to hack sys/socket.h and can hardcode
#ifdef _SYS_SOCKET_H as header guard for sockaddr instead, right?
bits/socket.h has this:
#ifndef _SYS_SOCKET_H
# error "Never include <bits/socket.h> directly; use <sys/socket.h> instead."

So that ifdef is kinda stable.

> Another possibility is to move the definition of 'struct sockaddr'
> from 'kernel/include/socket.h' to 'kernel/include/uapi/linux/socket.h',
> but I expect that this won't fly with the mainline as it might break
> the programs that include both 'linux/socket.h' and 'sys/socket.h'.
> 
> Conflict with vmlinux.h
> ----
> 
> Uapi header:
> - linux/signal.h
> 
> Conflict with vmlinux.h in definition of 'struct sigaction'.
> Defined in:
> - vmlinux.h: kernel/include/linux/signal_types.h
> - uapi:      kernel/arch/x86/include/asm/signal.h
> 
> Uapi headers:
> - linux/tipc_sockets_diag.h
> - linux/sock_diag.h
> 
> Conflict with vmlinux.h in definition of 'SOCK_DESTROY'.

Interesting one!
I think we can hard code '#undef SOCK_DESTROY' in vmlinux.h

The goal is not to be able to mix arbitrary uapi header with
vmlinux.h, but only those that could be useful out of bpf progs.
Currently it's tcp.h and few other network related headers
because they have #define-s in them that are useful inside bpf progs.
As long as the solution covers this small subset we're fine.
Alan Maguire Oct. 26, 2022, 11:10 a.m. UTC | #2
On 25/10/2022 23:27, Eduard Zingerman wrote:
> Hi BPF community,
> 
> AFAIK there is a long standing feature request to use kernel headers
> alongside `vmlinux.h` generated by `bpftool`. For example significant
> effort was put to add an attribute `bpf_dominating_decl` (see [1]) to
> clang, unfortunately this effort was stuck due to concerns regarding C
> language semantics.
> 

Thanks for the details! It's a tricky problem to solve.

Before diving into this, can I ask if there's another way round this;
is there no way we could teach vmlinux.h generation which types to
skip via some kind of bootstrapping process?

For a bpf object foo.bpf.c that wants to include linux/tcp.h and 
vmlinux.h, something like this seems possible;

1. run the preprocessor (gcc -E) over the BPF program to generate 
a bootstrap header foo.bpf.exclude_types.h consisting of all the types mentioned
in it and associated includes, but not in vmlinux.h  It would need to -D__VMLINUX_H__ 
to avoid including vmlinux.h definitions and -D__BPF_EXCLUDE_TYPE_BOOTSTRAP__ to 
skip the programs themselves, which would need a guard around them I think:

#include <stddef.h>
#include <stdbool.h>
#include <vmlinux.h>
#include <linux/tcp.h>

#ifndef __BPF_EXCLUDE_TYPE_BOOTSTRAP__
//rest of program here
#endif

So as a result of this, we now have a single header file that contains all the types
that non-vmlinux.h include files define.

2. now to generate vmlinux.h, pass foo.bpf.exclude_types.h into "bpftool btf dump" as an
exclude header:

bpftool btf dump --exclude /tmp/foo.bpf.types.h file /sys/kernel/btf/vmlinux format c > vmlinux.h

bpftool would have to parse the exclude header for actual type definitions, spotting struct,
enum and typedef definitions. This is likely made easier by running the preprocessor
at least since formatting is probably quite uniform. vmlinux.h could simply emit forward declarations
for types described both in vmlinux BTF and in the exclude file.

So the build process would be

- start with empty vmlinux.h
- bootstrap a header consisting of the set of types to exclude via c preprocessor
- generate new vmlinux.h based on above
- build bpf program

Build processes for BPF objects already has bootstrapping elements like
this for generating vmlinux.h and skeletons, so while it's not perfect it might
be a simpler approach. There may be problems with this I'm not seeing though?

Thanks!

Alan

> After some discussion with Alexei and Yonghong I'd like to request
> your comments regarding a somewhat brittle and partial solution to
> this issue that relies on adding `#ifndef FOO_H ... #endif` guards in
> the generated `vmlinux.h`.
> 
> The basic idea
> ---
> 
> The goal of the patch set is to allow usage of header files from
> `include/uapi` alongside `vmlinux.h` as follows:
> 
>   #include <uapi/linux/tcp.h>
>   #include "vmlinux.h"
> 
> This goal is achieved by adding `#ifndef ... #endif` guards in
> `vmlinux.h` around definitions that originate from the `include/uapi`
> headers. The guards emitted match the guards used in the original
> headers. E.g. as follows:
> 
> include/uapi/linux/tcp.h:
> 
>   #ifndef _UAPI_LINUX_TCP_H
>   #define _UAPI_LINUX_TCP_H
>   ...
>   union tcp_word_hdr {
> 	struct tcphdr hdr;
> 	__be32        words[5];
>   };
>   ...
>   #endif /* _UAPI_LINUX_TCP_H */
> 
> vmlinux.h:
> 
>   ...
>   #ifndef _UAPI_LINUX_TCP_H
> 
>   union tcp_word_hdr {
> 	struct tcphdr hdr;
> 	__be32 words[5];
>   };
> 
>   #endif /* _UAPI_LINUX_TCP_H */
>   ...
> 
> To get to this state the following steps are necessary:
> - "header guard" name should be identified for each header file;
> - the correspondence between data type and it's header guard has to be
>   encoded in BTF;
> - `bpftool` should be adjusted to emit `#ifndef FOO_H ... #endif`
>   brackets.
> 
> It is not possible to identify header guard names for all uapi headers
> basing only on the file name. However a simple script could devised to
> identify the guards basing on the file name and it's content. Thus it
> is possible to obtain the list of header names with corresponding
> header guards.
> 
> The correspondence between type and it's declaration file (header) is
> available in DWARF as `DW_AT_decl_file` attribute. The
> `DW_AT_decl_file` can be matched with the list of header guards
> described above to obtain the header guard name for a specific type.
> 
> The `pahole` generates BTF using DWARF. It is possible to modify
> `pahole` to accept the header guards list as an additional parameter
> and to encode the header guard names in BTF.
> 
> Implementation details
> ---
> 
> Present patch-set implements these ideas as follows:
> - A parameter `--header_guards_db` is added to `pahole`. If present it
>   points to a file with a list of `<header> <guard>` records.
> - `pahole` uses DWARF `DW_AT_decl_file` value to lookup the header
>   guard for each type emitted to BTF. If header guard is present it is
>   encoded alongside the type.
> - Header guards are encoded in BTF as `BTF_DECL_TAG` records with a
>   special prefix. The prefix "header_guard:" is added to a value of
>   such tags. (Here `BTF_DECL_TAG` is used to avoid BTF binary format
>   changes).
> - A special script `infer_header_guards.pl` is added as a part of
>   kbuild, it can infer header guard names for each UAPI header basing
>   on the header content.
> - This script is invoked from `link-vmlinux.sh` prior to BTF
>   generation during kernel build. The output of the script is saved to
>   a file, the file is passed to `pahole` as `--header_guards_db`
>   parameter.
> - `libbpf` is modified to aggregate `BTF_DECL_TAG` records for each
>   type and to emit `#ifndef FOO_H ... #endif` brackets when
>   "header_guard:" tag is present for a type.
> 
> Details for each patch in a set:
> - libbpf: Deduplicate unambigous standalone forward declarations
> - selftests/bpf: Tests for standalone forward BTF declarations deduplication
> 
>   There is a small number (63 for defconfig) of forward declarations
>   that are not de-duplicated with the main type declaration under
>   certain conditions. This hinders the header guard brackets
>   generation. This patch addresses this de-duplication issue.
> 
> - libbpf: Support for BTF_DECL_TAG dump in C format
> - selftests/bpf: Tests for BTF_DECL_TAG dump in C format
> 
>   Currently libbpf does not process BTF_DECL_TAG when btf is dumped in
>   C format. This patch adds a hash table matching btf type ids with a
>   list of decl tags to the struct btf_dump.
>   The `btf_dump_emit_decl_tags` is not necessary for the overall
>   patch-set to function but simplifies testing a bit.
> 
> - libbpf: Header guards for selected data structures in vmlinux.h
> - selftests/bpf: Tests for header guards printing in BTF dump
> 
>   Adds option `emit_header_guards` to `struct btf_dump_opts`.
>   When enabled the `btf_dump__dump_type` prints `#ifndef ... #endif`
>   brackets around types for which header guard information is present
>   in BTF.
> 
> - bpftool: Enable header guards generation
> 
>   Unconditionally enables `emit_header_guards` for BTF dump in C format.
> 
> - kbuild: Script to infer header guard values for uapi headers
> - kbuild: Header guards for types from include/uapi/*.h in kernel BTF
> 
>   Adds `scripts/infer_header_guards.pl` and integrates it with
>   `link-vmlinux.sh`.
> 
> - selftests/bpf: Script to verify uapi headers usage with vmlinux.h
> 
>   Adds a script `test_uapi_headers.py` that tests header guards with
>   vmlinux.h by compiling a simple C snippet. The snippet looks as
>   follows:
>   
>     #include <some_uapi_header.h>
>     #include "vmlinux.h"
>   
>     __attribute__((section("tc"), used))
>     int syncookie_tc(struct __sk_buff *skb) { return 0; }
>   
>   The list of headers to test comes from
>   `tools/testing/selftests/bpf/good_uapi_headers.txt`.
> 
> - selftests/bpf: Known good uapi headers for test_uapi_headers.py
> 
>   The list of uapi headers that could be included alongside vmlinux.h.
>   The headers are peeked from the following locations:
>   - <headers-export-dir>/linux/*.h
>   - <headers-export-dir>/linux/**/*.h
>   This choice of locations is somewhat arbitrary.
> 
> - selftests/bpf: script for infer_header_guards.pl testing
> 
>   The test case for `scripts/infer_header_guards.pl`, verifies that
>   header guards can be inferred for all uapi headers.
> 
> - There is also a patch for dwarves that adds `--header_guards_db`
>   option (see [2]).
> 
> The `test_uapi_headers.py` is important as it demonstrates the
> the necessary compiler flags:
> 
> clang ...                                  \
>       -D__x86_64__                         \
>       -Xclang -fwchar-type=short           \
>       -Xclang -fno-signed-wchar            \
>       -I{exported_kernel_headers}/include/ \
>       ...
> 
> - `-fwchar-type=short` and `-fno-signed-wchar` had to be added because
>   BPF target uses `int` for `wchar_t` by default and this differs from
>   `vmlinux.h` definition of the type (at least for x86_64).
> - `__x86_64__` had to be added for uapi headers that include
>   `stddef.h` (the one that is supplied my CLANG itself), in order to
>   define correct sizes for `size_t` and `ptrdiff_t`.
> - The `{exported_kernel_headers}` stands for exported kernel headers
>   directory (the headers obtained by `make headers_install` or via
>   distribution package).
> 
> When it works
> ---
> 
> The mechanics described above works for a significant number of UAPI
> headers. For example, for the test case above I chose the headers from
> the following locations:
> - linux/*.h
> - linux/**/*.h
> There are 759 such headers and for 677 of them the test described
> above passes.
> 
> I excluded the headers from the following sub-directories as
> potentially not interesting:
> 
>   asm          rdma   video xen
>   asm-generic  misc   scsi
>   drm          mtd    sound
> 
> Thus saving some time for both discussion and CI but the choice is
> somewhat arbitrary. If I run `test_uapi_headers.py --test '*'` (all
> headers) test passes for 834 out of 972 headers.
> 
> When it breaks
> ---
> 
> There several scenarios when this mechanics breaks.
> Specifically I found the following cases:
> - When uapi header includes some system header that conflicts with
>   vmlinux.h.
> - When uapi header itself conflicts with vmlinux.h.
> 
> Below are examples for both cases.
> 
> Conflict with system headers
> ----
> 
> The following uapi headers:
> - linux/atmbr2684.h
> - linux/bpfilter.h
> - linux/gsmmux.h
> - linux/icmp.h
> - linux/if.h
> - linux/if_arp.h
> - linux/if_bonding.h
> - linux/if_pppox.h
> - linux/if_tunnel.h
> - linux/ip6_tunnel.h
> - linux/llc.h
> - linux/mctp.h
> - linux/mptcp.h
> - linux/netdevice.h
> - linux/netfilter/xt_RATEEST.h
> - linux/netfilter/xt_hashlimit.h
> - linux/netfilter/xt_physdev.h
> - linux/netfilter/xt_rateest.h
> - linux/netfilter_arp/arp_tables.h
> - linux/netfilter_arp/arpt_mangle.h
> - linux/netfilter_bridge.h
> - linux/netfilter_bridge/ebtables.h
> - linux/netfilter_ipv4/ip_tables.h
> - linux/netfilter_ipv6/ip6_tables.h
> - linux/route.h
> - linux/wireless.h
> 
> Include the following system header:
> - /usr/include/sys/socket.h (all via linux/if.h)
> 
> The sys/socket.h conflicts with vmlinux.h in:
> - types: struct iovec, struct sockaddr, struct msghdr, ...
> - constants: SOCK_STREAM, SOCK_DGRAM, ...
> 
> However, only two types are actually used:
> - struct sockaddr
> - struct sockaddr_storage (used only in linux/mptcp.h)
> 
> In 'vmlinux.h' this type originates from 'kernel/include/socket.h'
> (non UAPI header), thus does not have a header guard.
> 
> The only workaround that I see is to:
> - define a stub sys/socket.h as follows:
> 
>     #ifndef __BPF_SOCKADDR__
>     #define __BPF_SOCKADDR__
>     
>     /* For __kernel_sa_family_t */
>     #include <linux/socket.h>
>     
>     struct sockaddr {
>         __kernel_sa_family_t sa_family;
>         char sa_data[14];
>     };
>     
>     #endif
> 
> - hardcode generation of __BPF_SOCKADDR__ bracket for
>   'struct sockaddr' in vmlinux.h.
> 
> Another possibility is to move the definition of 'struct sockaddr'
> from 'kernel/include/socket.h' to 'kernel/include/uapi/linux/socket.h',
> but I expect that this won't fly with the mainline as it might break
> the programs that include both 'linux/socket.h' and 'sys/socket.h'.
> 
> Conflict with vmlinux.h
> ----
> 
> Uapi header:
> - linux/signal.h
> 
> Conflict with vmlinux.h in definition of 'struct sigaction'.
> Defined in:
> - vmlinux.h: kernel/include/linux/signal_types.h
> - uapi:      kernel/arch/x86/include/asm/signal.h
> 
> Uapi headers:
> - linux/tipc_sockets_diag.h
> - linux/sock_diag.h
> 
> Conflict with vmlinux.h in definition of 'SOCK_DESTROY'.
> Defined in:
> - vmlinux.h: kernel/include/net/sock.h
> - uapi:      kernel/include/uapi/linux/sock_diag.h
> Constants seem to be unrelated.
> 
> And so on... I have details for many other headers but omit those for
> brevity.
> 
> In conclusion
> ---
> 
> Except from the general feasibility I have a few questions:
> - What UAPI headers are the candidates for such use? If there are some
>   interesting headers currently not working with this patch-set some
>   hacks have to be added (e.g. like with `linux/if.h`).
> - Is it ok to encode header guards as special `BTF_DECL_TAG` or should
>   I change the BTF format a bit to save some bytes.
> 
> Thanks,
> Eduard
> 
> 
> [1] https://reviews.llvm.org/D111307
>     [clang] __attribute__ bpf_dominating_decl
> [2] https://lore.kernel.org/dwarves/20221025220729.2293891-1-eddyz87@gmail.com/T/
>     [RFC dwarves] pahole: Save header guard names when
>                   --header_guards_db is passed
> 
> Eduard Zingerman (12):
>   libbpf: Deduplicate unambigous standalone forward declarations
>   selftests/bpf: Tests for standalone forward BTF declarations
>     deduplication
>   libbpf: Support for BTF_DECL_TAG dump in C format
>   selftests/bpf: Tests for BTF_DECL_TAG dump in C format
>   libbpf: Header guards for selected data structures in vmlinux.h
>   selftests/bpf: Tests for header guards printing in BTF dump
>   bpftool: Enable header guards generation
>   kbuild: Script to infer header guard values for uapi headers
>   kbuild: Header guards for types from include/uapi/*.h in kernel BTF
>   selftests/bpf: Script to verify uapi headers usage with vmlinux.h
>   selftests/bpf: Known good uapi headers for test_uapi_headers.py
>   selftests/bpf: script for infer_header_guards.pl testing
> 
>  scripts/infer_header_guards.pl                | 191 +++++
>  scripts/link-vmlinux.sh                       |  13 +-
>  tools/bpf/bpftool/btf.c                       |   4 +-
>  tools/lib/bpf/btf.c                           | 178 ++++-
>  tools/lib/bpf/btf.h                           |   7 +-
>  tools/lib/bpf/btf_dump.c                      | 232 +++++-
>  .../selftests/bpf/good_uapi_headers.txt       | 677 ++++++++++++++++++
>  tools/testing/selftests/bpf/prog_tests/btf.c  | 152 ++++
>  .../selftests/bpf/prog_tests/btf_dump.c       |  11 +-
>  .../bpf/progs/btf_dump_test_case_decl_tag.c   |  39 +
>  .../progs/btf_dump_test_case_header_guards.c  |  94 +++
>  .../bpf/test_uapi_header_guards_infer.sh      |  33 +
>  .../selftests/bpf/test_uapi_headers.py        | 197 +++++
>  13 files changed, 1816 insertions(+), 12 deletions(-)
>  create mode 100755 scripts/infer_header_guards.pl
>  create mode 100644 tools/testing/selftests/bpf/good_uapi_headers.txt
>  create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_decl_tag.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_header_guards.c
>  create mode 100755 tools/testing/selftests/bpf/test_uapi_header_guards_infer.sh
>  create mode 100755 tools/testing/selftests/bpf/test_uapi_headers.py
>
Eduard Zingerman Oct. 26, 2022, 10:46 p.m. UTC | #3
On Tue, 2022-10-25 at 16:46 -0700, Alexei Starovoitov wrote:
> On Wed, Oct 26, 2022 at 01:27:49AM +0300, Eduard Zingerman wrote:
> > 
> > Include the following system header:
> > - /usr/include/sys/socket.h (all via linux/if.h)
> > 
> > The sys/socket.h conflicts with vmlinux.h in:
> > - types: struct iovec, struct sockaddr, struct msghdr, ...
> > - constants: SOCK_STREAM, SOCK_DGRAM, ...
> > 
> > However, only two types are actually used:
> > - struct sockaddr
> > - struct sockaddr_storage (used only in linux/mptcp.h)
> > 
> > In 'vmlinux.h' this type originates from 'kernel/include/socket.h'
> > (non UAPI header), thus does not have a header guard.
> > 
> > The only workaround that I see is to:
> > - define a stub sys/socket.h as follows:
> > 
> >     #ifndef __BPF_SOCKADDR__
> >     #define __BPF_SOCKADDR__
> >     
> >     /* For __kernel_sa_family_t */
> >     #include <linux/socket.h>
> >     
> >     struct sockaddr {
> >         __kernel_sa_family_t sa_family;
> >         char sa_data[14];
> >     };
> >     
> >     #endif
> > 
> > - hardcode generation of __BPF_SOCKADDR__ bracket for
> >   'struct sockaddr' in vmlinux.h.
> 
> we don't need to hack sys/socket.h and can hardcode
> #ifdef _SYS_SOCKET_H as header guard for sockaddr instead, right?
> bits/socket.h has this:
> #ifndef _SYS_SOCKET_H
> # error "Never include <bits/socket.h> directly; use <sys/socket.h> instead."
> 
> So that ifdef is kinda stable.

The `if.h` only uses two types from `sys/socket.h`, namely:
- `struct sockaddr`
- `struct sockaddr_storage`

However `sys/socket.h` itself defines more types, here is a complete
list of types from `sys/socket.h` that conflict with `vmlinux.h`
(generated for my x86_64 laptop):

Type name       System header
iovec           /usr/include/bits/types/struct_iovec.h
loff_t          /usr/include/sys/types.h
dev_t           /usr/include/sys/types.h
nlink_t         /usr/include/sys/types.h
timer_t         /usr/include/bits/types/timer_t.h
int16_t         /usr/include/bits/stdint-intn.h
int32_t         /usr/include/bits/stdint-intn.h
int64_t         /usr/include/bits/stdint-intn.h
u_int64_t       /usr/include/sys/types.h
sigset_t        /usr/include/bits/types/sigset_t.h
fd_set          /usr/include/sys/select.h
blkcnt_t        /usr/include/sys/types.h
SOCK_STREAM     /usr/include/bits/socket_type.h
SOCK_DGRAM      /usr/include/bits/socket_type.h
SOCK_RAW        /usr/include/bits/socket_type.h
SOCK_RDM        /usr/include/bits/socket_type.h
SOCK_SEQPACKET  /usr/include/bits/socket_type.h
SOCK_DCCP       /usr/include/bits/socket_type.h
SOCK_PACKET     /usr/include/bits/socket_type.h
sockaddr        /usr/include/bits/socket.h
msghdr          /usr/include/bits/socket.h
cmsghdr         /usr/include/bits/socket.h
linger          /usr/include/bits/socket.h
SHUT_RD         /usr/include/sys/socket.h
SHUT_WR         /usr/include/sys/socket.h
SHUT_RDWR       /usr/include/sys/socket.h

It would be safe to wrap the corresponding types in the vmlinux.h with
_SYS_SOCKET_H / _SYS_TYPES guards if the definitions above match
between libc and kernel. To my surprise not all of them match. Here is
the list of genuine conflicts (for typedefs I skip the intermediate
definitions and print the last typedef in the chain):

Type: dev_t
typedef unsigned int __u32                                vmlinux.h
typedef unsigned long int __dev_t         /usr/include/bits/types.h

Type: nlink_t
typedef unsigned int __u32                                vmlinux.h
typedef unsigned long int __nlink_t       /usr/include/bits/types.h

Type: timer_t
typedef int __kernel_timer _t                             vmlinux.h
typedef void *__timer_t                   /usr/include/bits/types.h

Type: sigset_t
typedef struct                                            vmlinux.h
{
  long unsigned int sig[1];
} sigset_t

typedef struct                 /usr/include/bits/types/__sigset_t.h
{
  unsigned long int __val[1024 / (8 * (sizeof(unsigned long int)))];
} __sigset_t

Type: fd_set
typedef struct                                            vmlinux.h
{
  long unsigned int fds_bits[16];
} __kernel_fd_set

typedef struct                            /usr/include/sys/select.h
{
  __fd_mask __fds_bits[1024 / (8 * ((int) (sizeof(__fd_mask))))];
} fd_set

Type: sigset_t
typedef struct                                            vmlinux.h 
{
  long unsigned int sig[1];
} sigset_t

typedef struct                 /usr/include/bits/types/__sigset_t.h
{
  unsigned long int __val[1024 / (8 * (sizeof(unsigned long int)))];
} __sigset_t

Type: msghdr
struct msghdr                                             vmlinux.h
{
  void *msg_name;
  int msg_namelen;
  int msg_inq;
  struct iov_iter msg_iter;
  union 
  {
    void *msg_control;
    void *msg_control_user;
  };
  bool msg_control_is_user : 1;
  bool msg_get_inq : 1;
  unsigned int msg_flags;
  __kernel_size_t msg_controllen;
  struct kiocb *msg_iocb;
  struct ubuf_info *msg_ubuf;
  struct sock *, struct sk_buff *, struct iov_iter *, size_tint;
}

struct msghdr                            /usr/include/bits/socket.h
{
  void *msg_name;
  socklen_t msg_namelen;
  struct iovec *msg_iov;
  size_t msg_iovlen;
  void *msg_control;
  size_t msg_controllen;
  int msg_flags;
}

> 
> > Another possibility is to move the definition of 'struct sockaddr'
> > from 'kernel/include/socket.h' to 'kernel/include/uapi/linux/socket.h',
> > but I expect that this won't fly with the mainline as it might break
> > the programs that include both 'linux/socket.h' and 'sys/socket.h'.
> > 
> > Conflict with vmlinux.h
> > ----
> > 
> > Uapi header:
> > - linux/signal.h
> > 
> > Conflict with vmlinux.h in definition of 'struct sigaction'.
> > Defined in:
> > - vmlinux.h: kernel/include/linux/signal_types.h
> > - uapi:      kernel/arch/x86/include/asm/signal.h
> > 
> > Uapi headers:
> > - linux/tipc_sockets_diag.h
> > - linux/sock_diag.h
> > 
> > Conflict with vmlinux.h in definition of 'SOCK_DESTROY'.
> 
> Interesting one!
> I think we can hard code '#undef SOCK_DESTROY' in vmlinux.h
> 
> The goal is not to be able to mix arbitrary uapi header with
> vmlinux.h, but only those that could be useful out of bpf progs.
> Currently it's tcp.h and few other network related headers
> because they have #define-s in them that are useful inside bpf progs.
> As long as the solution covers this small subset we're fine.

Well, tcp.h works :) It would be great if someone could list the
interesting headers.

Thanks,
Eduard
Eduard Zingerman Oct. 26, 2022, 11:54 p.m. UTC | #4
On Wed, 2022-10-26 at 12:10 +0100, Alan Maguire wrote:
> On 25/10/2022 23:27, Eduard Zingerman wrote:
> > Hi BPF community,
> > 
> > AFAIK there is a long standing feature request to use kernel headers
> > alongside `vmlinux.h` generated by `bpftool`. For example significant
> > effort was put to add an attribute `bpf_dominating_decl` (see [1]) to
> > clang, unfortunately this effort was stuck due to concerns regarding C
> > language semantics.
> > 
> 
> Thanks for the details! It's a tricky problem to solve.
> 
> Before diving into this, can I ask if there's another way round this;
> is there no way we could teach vmlinux.h generation which types to
> skip via some kind of bootstrapping process?
> 
> For a bpf object foo.bpf.c that wants to include linux/tcp.h and 
> vmlinux.h, something like this seems possible;
> 
> 1. run the preprocessor (gcc -E) over the BPF program to generate 
> a bootstrap header foo.bpf.exclude_types.h consisting of all the types mentioned
> in it and associated includes, but not in vmlinux.h  It would need to -D__VMLINUX_H__ 
> to avoid including vmlinux.h definitions and -D__BPF_EXCLUDE_TYPE_BOOTSTRAP__ to 
> skip the programs themselves, which would need a guard around them I think:
> 
> #include <stddef.h>
> #include <stdbool.h>
> #include <vmlinux.h>
> #include <linux/tcp.h>
> 
> #ifndef __BPF_EXCLUDE_TYPE_BOOTSTRAP__
> //rest of program here
> #endif
> 
> So as a result of this, we now have a single header file that contains all the types
> that non-vmlinux.h include files define.
> 
> 2. now to generate vmlinux.h, pass foo.bpf.exclude_types.h into "bpftool btf dump" as an
> exclude header:
> 
> bpftool btf dump --exclude /tmp/foo.bpf.types.h file /sys/kernel/btf/vmlinux format c > vmlinux.h
> 
> bpftool would have to parse the exclude header for actual type definitions, spotting struct,
> enum and typedef definitions. This is likely made easier by running the preprocessor
> at least since formatting is probably quite uniform. vmlinux.h could simply emit forward declarations
> for types described both in vmlinux BTF and in the exclude file.
> 
> So the build process would be
> 
> - start with empty vmlinux.h
> - bootstrap a header consisting of the set of types to exclude via c preprocessor
> - generate new vmlinux.h based on above
> - build bpf program
> 
> Build processes for BPF objects already has bootstrapping elements like
> this for generating vmlinux.h and skeletons, so while it's not perfect it might
> be a simpler approach. There may be problems with this I'm not seeing though?

I like the tool based approach more but I heard there were some
reservations about separate tool complicating the build process.
- the tool does not require changes to the kbuild;
- the tool is not limited to uapi headers;
- the tool could imitate the dominating declarations attribute and
  address the issue with definitions miss-match between vmlinux.h and
  system headers (see my reply to Alexei in the parallel sub-thread).
  To support this point it would have to work in a somewhat different
  order:
  - read/pre-process the input file;
  - remove from it the definitions that are also present in kernel BTF;
  - prepend vmlinux.h to the beginning of the file.

On the other hand I think that implementation would need a C parser /
type analysis step. Thus it would be harder to implement it as a part
of bpftool. But the prototype using something like [1] should be simple.

Thanks,
Eduard


[1] https://github.com/inducer/pycparserext

> 
> Thanks!
> 
> Alan
> 
> > After some discussion with Alexei and Yonghong I'd like to request
> > your comments regarding a somewhat brittle and partial solution to
> > this issue that relies on adding `#ifndef FOO_H ... #endif` guards in
> > the generated `vmlinux.h`.
> > 
> > The basic idea
> > ---
> > 
> > The goal of the patch set is to allow usage of header files from
> > `include/uapi` alongside `vmlinux.h` as follows:
> > 
> >   #include <uapi/linux/tcp.h>
> >   #include "vmlinux.h"
> > 
> > This goal is achieved by adding `#ifndef ... #endif` guards in
> > `vmlinux.h` around definitions that originate from the `include/uapi`
> > headers. The guards emitted match the guards used in the original
> > headers. E.g. as follows:
> > 
> > include/uapi/linux/tcp.h:
> > 
> >   #ifndef _UAPI_LINUX_TCP_H
> >   #define _UAPI_LINUX_TCP_H
> >   ...
> >   union tcp_word_hdr {
> > 	struct tcphdr hdr;
> > 	__be32        words[5];
> >   };
> >   ...
> >   #endif /* _UAPI_LINUX_TCP_H */
> > 
> > vmlinux.h:
> > 
> >   ...
> >   #ifndef _UAPI_LINUX_TCP_H
> > 
> >   union tcp_word_hdr {
> > 	struct tcphdr hdr;
> > 	__be32 words[5];
> >   };
> > 
> >   #endif /* _UAPI_LINUX_TCP_H */
> >   ...
> > 
> > To get to this state the following steps are necessary:
> > - "header guard" name should be identified for each header file;
> > - the correspondence between data type and it's header guard has to be
> >   encoded in BTF;
> > - `bpftool` should be adjusted to emit `#ifndef FOO_H ... #endif`
> >   brackets.
> > 
> > It is not possible to identify header guard names for all uapi headers
> > basing only on the file name. However a simple script could devised to
> > identify the guards basing on the file name and it's content. Thus it
> > is possible to obtain the list of header names with corresponding
> > header guards.
> > 
> > The correspondence between type and it's declaration file (header) is
> > available in DWARF as `DW_AT_decl_file` attribute. The
> > `DW_AT_decl_file` can be matched with the list of header guards
> > described above to obtain the header guard name for a specific type.
> > 
> > The `pahole` generates BTF using DWARF. It is possible to modify
> > `pahole` to accept the header guards list as an additional parameter
> > and to encode the header guard names in BTF.
> > 
> > Implementation details
> > ---
> > 
> > Present patch-set implements these ideas as follows:
> > - A parameter `--header_guards_db` is added to `pahole`. If present it
> >   points to a file with a list of `<header> <guard>` records.
> > - `pahole` uses DWARF `DW_AT_decl_file` value to lookup the header
> >   guard for each type emitted to BTF. If header guard is present it is
> >   encoded alongside the type.
> > - Header guards are encoded in BTF as `BTF_DECL_TAG` records with a
> >   special prefix. The prefix "header_guard:" is added to a value of
> >   such tags. (Here `BTF_DECL_TAG` is used to avoid BTF binary format
> >   changes).
> > - A special script `infer_header_guards.pl` is added as a part of
> >   kbuild, it can infer header guard names for each UAPI header basing
> >   on the header content.
> > - This script is invoked from `link-vmlinux.sh` prior to BTF
> >   generation during kernel build. The output of the script is saved to
> >   a file, the file is passed to `pahole` as `--header_guards_db`
> >   parameter.
> > - `libbpf` is modified to aggregate `BTF_DECL_TAG` records for each
> >   type and to emit `#ifndef FOO_H ... #endif` brackets when
> >   "header_guard:" tag is present for a type.
> > 
> > Details for each patch in a set:
> > - libbpf: Deduplicate unambigous standalone forward declarations
> > - selftests/bpf: Tests for standalone forward BTF declarations deduplication
> > 
> >   There is a small number (63 for defconfig) of forward declarations
> >   that are not de-duplicated with the main type declaration under
> >   certain conditions. This hinders the header guard brackets
> >   generation. This patch addresses this de-duplication issue.
> > 
> > - libbpf: Support for BTF_DECL_TAG dump in C format
> > - selftests/bpf: Tests for BTF_DECL_TAG dump in C format
> > 
> >   Currently libbpf does not process BTF_DECL_TAG when btf is dumped in
> >   C format. This patch adds a hash table matching btf type ids with a
> >   list of decl tags to the struct btf_dump.
> >   The `btf_dump_emit_decl_tags` is not necessary for the overall
> >   patch-set to function but simplifies testing a bit.
> > 
> > - libbpf: Header guards for selected data structures in vmlinux.h
> > - selftests/bpf: Tests for header guards printing in BTF dump
> > 
> >   Adds option `emit_header_guards` to `struct btf_dump_opts`.
> >   When enabled the `btf_dump__dump_type` prints `#ifndef ... #endif`
> >   brackets around types for which header guard information is present
> >   in BTF.
> > 
> > - bpftool: Enable header guards generation
> > 
> >   Unconditionally enables `emit_header_guards` for BTF dump in C format.
> > 
> > - kbuild: Script to infer header guard values for uapi headers
> > - kbuild: Header guards for types from include/uapi/*.h in kernel BTF
> > 
> >   Adds `scripts/infer_header_guards.pl` and integrates it with
> >   `link-vmlinux.sh`.
> > 
> > - selftests/bpf: Script to verify uapi headers usage with vmlinux.h
> > 
> >   Adds a script `test_uapi_headers.py` that tests header guards with
> >   vmlinux.h by compiling a simple C snippet. The snippet looks as
> >   follows:
> >   
> >     #include <some_uapi_header.h>
> >     #include "vmlinux.h"
> >   
> >     __attribute__((section("tc"), used))
> >     int syncookie_tc(struct __sk_buff *skb) { return 0; }
> >   
> >   The list of headers to test comes from
> >   `tools/testing/selftests/bpf/good_uapi_headers.txt`.
> > 
> > - selftests/bpf: Known good uapi headers for test_uapi_headers.py
> > 
> >   The list of uapi headers that could be included alongside vmlinux.h.
> >   The headers are peeked from the following locations:
> >   - <headers-export-dir>/linux/*.h
> >   - <headers-export-dir>/linux/**/*.h
> >   This choice of locations is somewhat arbitrary.
> > 
> > - selftests/bpf: script for infer_header_guards.pl testing
> > 
> >   The test case for `scripts/infer_header_guards.pl`, verifies that
> >   header guards can be inferred for all uapi headers.
> > 
> > - There is also a patch for dwarves that adds `--header_guards_db`
> >   option (see [2]).
> > 
> > The `test_uapi_headers.py` is important as it demonstrates the
> > the necessary compiler flags:
> > 
> > clang ...                                  \
> >       -D__x86_64__                         \
> >       -Xclang -fwchar-type=short           \
> >       -Xclang -fno-signed-wchar            \
> >       -I{exported_kernel_headers}/include/ \
> >       ...
> > 
> > - `-fwchar-type=short` and `-fno-signed-wchar` had to be added because
> >   BPF target uses `int` for `wchar_t` by default and this differs from
> >   `vmlinux.h` definition of the type (at least for x86_64).
> > - `__x86_64__` had to be added for uapi headers that include
> >   `stddef.h` (the one that is supplied my CLANG itself), in order to
> >   define correct sizes for `size_t` and `ptrdiff_t`.
> > - The `{exported_kernel_headers}` stands for exported kernel headers
> >   directory (the headers obtained by `make headers_install` or via
> >   distribution package).
> > 
> > When it works
> > ---
> > 
> > The mechanics described above works for a significant number of UAPI
> > headers. For example, for the test case above I chose the headers from
> > the following locations:
> > - linux/*.h
> > - linux/**/*.h
> > There are 759 such headers and for 677 of them the test described
> > above passes.
> > 
> > I excluded the headers from the following sub-directories as
> > potentially not interesting:
> > 
> >   asm          rdma   video xen
> >   asm-generic  misc   scsi
> >   drm          mtd    sound
> > 
> > Thus saving some time for both discussion and CI but the choice is
> > somewhat arbitrary. If I run `test_uapi_headers.py --test '*'` (all
> > headers) test passes for 834 out of 972 headers.
> > 
> > When it breaks
> > ---
> > 
> > There several scenarios when this mechanics breaks.
> > Specifically I found the following cases:
> > - When uapi header includes some system header that conflicts with
> >   vmlinux.h.
> > - When uapi header itself conflicts with vmlinux.h.
> > 
> > Below are examples for both cases.
> > 
> > Conflict with system headers
> > ----
> > 
> > The following uapi headers:
> > - linux/atmbr2684.h
> > - linux/bpfilter.h
> > - linux/gsmmux.h
> > - linux/icmp.h
> > - linux/if.h
> > - linux/if_arp.h
> > - linux/if_bonding.h
> > - linux/if_pppox.h
> > - linux/if_tunnel.h
> > - linux/ip6_tunnel.h
> > - linux/llc.h
> > - linux/mctp.h
> > - linux/mptcp.h
> > - linux/netdevice.h
> > - linux/netfilter/xt_RATEEST.h
> > - linux/netfilter/xt_hashlimit.h
> > - linux/netfilter/xt_physdev.h
> > - linux/netfilter/xt_rateest.h
> > - linux/netfilter_arp/arp_tables.h
> > - linux/netfilter_arp/arpt_mangle.h
> > - linux/netfilter_bridge.h
> > - linux/netfilter_bridge/ebtables.h
> > - linux/netfilter_ipv4/ip_tables.h
> > - linux/netfilter_ipv6/ip6_tables.h
> > - linux/route.h
> > - linux/wireless.h
> > 
> > Include the following system header:
> > - /usr/include/sys/socket.h (all via linux/if.h)
> > 
> > The sys/socket.h conflicts with vmlinux.h in:
> > - types: struct iovec, struct sockaddr, struct msghdr, ...
> > - constants: SOCK_STREAM, SOCK_DGRAM, ...
> > 
> > However, only two types are actually used:
> > - struct sockaddr
> > - struct sockaddr_storage (used only in linux/mptcp.h)
> > 
> > In 'vmlinux.h' this type originates from 'kernel/include/socket.h'
> > (non UAPI header), thus does not have a header guard.
> > 
> > The only workaround that I see is to:
> > - define a stub sys/socket.h as follows:
> > 
> >     #ifndef __BPF_SOCKADDR__
> >     #define __BPF_SOCKADDR__
> >     
> >     /* For __kernel_sa_family_t */
> >     #include <linux/socket.h>
> >     
> >     struct sockaddr {
> >         __kernel_sa_family_t sa_family;
> >         char sa_data[14];
> >     };
> >     
> >     #endif
> > 
> > - hardcode generation of __BPF_SOCKADDR__ bracket for
> >   'struct sockaddr' in vmlinux.h.
> > 
> > Another possibility is to move the definition of 'struct sockaddr'
> > from 'kernel/include/socket.h' to 'kernel/include/uapi/linux/socket.h',
> > but I expect that this won't fly with the mainline as it might break
> > the programs that include both 'linux/socket.h' and 'sys/socket.h'.
> > 
> > Conflict with vmlinux.h
> > ----
> > 
> > Uapi header:
> > - linux/signal.h
> > 
> > Conflict with vmlinux.h in definition of 'struct sigaction'.
> > Defined in:
> > - vmlinux.h: kernel/include/linux/signal_types.h
> > - uapi:      kernel/arch/x86/include/asm/signal.h
> > 
> > Uapi headers:
> > - linux/tipc_sockets_diag.h
> > - linux/sock_diag.h
> > 
> > Conflict with vmlinux.h in definition of 'SOCK_DESTROY'.
> > Defined in:
> > - vmlinux.h: kernel/include/net/sock.h
> > - uapi:      kernel/include/uapi/linux/sock_diag.h
> > Constants seem to be unrelated.
> > 
> > And so on... I have details for many other headers but omit those for
> > brevity.
> > 
> > In conclusion
> > ---
> > 
> > Except from the general feasibility I have a few questions:
> > - What UAPI headers are the candidates for such use? If there are some
> >   interesting headers currently not working with this patch-set some
> >   hacks have to be added (e.g. like with `linux/if.h`).
> > - Is it ok to encode header guards as special `BTF_DECL_TAG` or should
> >   I change the BTF format a bit to save some bytes.
> > 
> > Thanks,
> > Eduard
> > 
> > 
> > [1] https://reviews.llvm.org/D111307
> >     [clang] __attribute__ bpf_dominating_decl
> > [2] https://lore.kernel.org/dwarves/20221025220729.2293891-1-eddyz87@gmail.com/T/
> >     [RFC dwarves] pahole: Save header guard names when
> >                   --header_guards_db is passed
> > 
> > Eduard Zingerman (12):
> >   libbpf: Deduplicate unambigous standalone forward declarations
> >   selftests/bpf: Tests for standalone forward BTF declarations
> >     deduplication
> >   libbpf: Support for BTF_DECL_TAG dump in C format
> >   selftests/bpf: Tests for BTF_DECL_TAG dump in C format
> >   libbpf: Header guards for selected data structures in vmlinux.h
> >   selftests/bpf: Tests for header guards printing in BTF dump
> >   bpftool: Enable header guards generation
> >   kbuild: Script to infer header guard values for uapi headers
> >   kbuild: Header guards for types from include/uapi/*.h in kernel BTF
> >   selftests/bpf: Script to verify uapi headers usage with vmlinux.h
> >   selftests/bpf: Known good uapi headers for test_uapi_headers.py
> >   selftests/bpf: script for infer_header_guards.pl testing
> > 
> >  scripts/infer_header_guards.pl                | 191 +++++
> >  scripts/link-vmlinux.sh                       |  13 +-
> >  tools/bpf/bpftool/btf.c                       |   4 +-
> >  tools/lib/bpf/btf.c                           | 178 ++++-
> >  tools/lib/bpf/btf.h                           |   7 +-
> >  tools/lib/bpf/btf_dump.c                      | 232 +++++-
> >  .../selftests/bpf/good_uapi_headers.txt       | 677 ++++++++++++++++++
> >  tools/testing/selftests/bpf/prog_tests/btf.c  | 152 ++++
> >  .../selftests/bpf/prog_tests/btf_dump.c       |  11 +-
> >  .../bpf/progs/btf_dump_test_case_decl_tag.c   |  39 +
> >  .../progs/btf_dump_test_case_header_guards.c  |  94 +++
> >  .../bpf/test_uapi_header_guards_infer.sh      |  33 +
> >  .../selftests/bpf/test_uapi_headers.py        | 197 +++++
> >  13 files changed, 1816 insertions(+), 12 deletions(-)
> >  create mode 100755 scripts/infer_header_guards.pl
> >  create mode 100644 tools/testing/selftests/bpf/good_uapi_headers.txt
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_decl_tag.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_header_guards.c
> >  create mode 100755 tools/testing/selftests/bpf/test_uapi_header_guards_infer.sh
> >  create mode 100755 tools/testing/selftests/bpf/test_uapi_headers.py
> >
Andrii Nakryiko Oct. 27, 2022, 11:14 p.m. UTC | #5
On Tue, Oct 25, 2022 at 3:28 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> Hi BPF community,
>
> AFAIK there is a long standing feature request to use kernel headers
> alongside `vmlinux.h` generated by `bpftool`. For example significant
> effort was put to add an attribute `bpf_dominating_decl` (see [1]) to
> clang, unfortunately this effort was stuck due to concerns regarding C
> language semantics.
>

Maybe we should make another attempt to implement bpf_dominating_decl?
That seems like a more elegant solution than any other implemented or
discussed alternative. Yonghong, WDYT?

BTW, I suggest splitting libbpf btf_dedup and btf_dump changes into a
separate series and sending them as non-RFC sooner. Those improvements
are independent of all the header guards stuff, let's get them landed
sooner.

> After some discussion with Alexei and Yonghong I'd like to request
> your comments regarding a somewhat brittle and partial solution to
> this issue that relies on adding `#ifndef FOO_H ... #endif` guards in
> the generated `vmlinux.h`.
>

[...]

> Eduard Zingerman (12):
>   libbpf: Deduplicate unambigous standalone forward declarations
>   selftests/bpf: Tests for standalone forward BTF declarations
>     deduplication
>   libbpf: Support for BTF_DECL_TAG dump in C format
>   selftests/bpf: Tests for BTF_DECL_TAG dump in C format
>   libbpf: Header guards for selected data structures in vmlinux.h
>   selftests/bpf: Tests for header guards printing in BTF dump
>   bpftool: Enable header guards generation
>   kbuild: Script to infer header guard values for uapi headers
>   kbuild: Header guards for types from include/uapi/*.h in kernel BTF
>   selftests/bpf: Script to verify uapi headers usage with vmlinux.h
>   selftests/bpf: Known good uapi headers for test_uapi_headers.py
>   selftests/bpf: script for infer_header_guards.pl testing
>
>  scripts/infer_header_guards.pl                | 191 +++++
>  scripts/link-vmlinux.sh                       |  13 +-
>  tools/bpf/bpftool/btf.c                       |   4 +-
>  tools/lib/bpf/btf.c                           | 178 ++++-
>  tools/lib/bpf/btf.h                           |   7 +-
>  tools/lib/bpf/btf_dump.c                      | 232 +++++-
>  .../selftests/bpf/good_uapi_headers.txt       | 677 ++++++++++++++++++
>  tools/testing/selftests/bpf/prog_tests/btf.c  | 152 ++++
>  .../selftests/bpf/prog_tests/btf_dump.c       |  11 +-
>  .../bpf/progs/btf_dump_test_case_decl_tag.c   |  39 +
>  .../progs/btf_dump_test_case_header_guards.c  |  94 +++
>  .../bpf/test_uapi_header_guards_infer.sh      |  33 +
>  .../selftests/bpf/test_uapi_headers.py        | 197 +++++
>  13 files changed, 1816 insertions(+), 12 deletions(-)
>  create mode 100755 scripts/infer_header_guards.pl
>  create mode 100644 tools/testing/selftests/bpf/good_uapi_headers.txt
>  create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_decl_tag.c
>  create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_header_guards.c
>  create mode 100755 tools/testing/selftests/bpf/test_uapi_header_guards_infer.sh
>  create mode 100755 tools/testing/selftests/bpf/test_uapi_headers.py
>
> --
> 2.34.1
>
Yonghong Song Oct. 28, 2022, 1:33 a.m. UTC | #6
On 10/27/22 4:14 PM, Andrii Nakryiko wrote:
> On Tue, Oct 25, 2022 at 3:28 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>>
>> Hi BPF community,
>>
>> AFAIK there is a long standing feature request to use kernel headers
>> alongside `vmlinux.h` generated by `bpftool`. For example significant
>> effort was put to add an attribute `bpf_dominating_decl` (see [1]) to
>> clang, unfortunately this effort was stuck due to concerns regarding C
>> language semantics.
>>
> 
> Maybe we should make another attempt to implement bpf_dominating_decl?
> That seems like a more elegant solution than any other implemented or
> discussed alternative. Yonghong, WDYT?

I would say it would be very difficult for upstream to agree with
bpf_dominating_decl. We already have lots of discussions and we
likely won't be able to satisfy Aaron who wants us to emit
adequate diagnostics which will involve lots of other work
and he also thinks this is too far away from C standard and he
wants us to implement in a llvm/clang tool which is not what
we want.

> 
> BTW, I suggest splitting libbpf btf_dedup and btf_dump changes into a
> separate series and sending them as non-RFC sooner. Those improvements
> are independent of all the header guards stuff, let's get them landed
> sooner.
> 
>> After some discussion with Alexei and Yonghong I'd like to request
>> your comments regarding a somewhat brittle and partial solution to
>> this issue that relies on adding `#ifndef FOO_H ... #endif` guards in
>> the generated `vmlinux.h`.
>>
> 
> [...]
> 
>> Eduard Zingerman (12):
>>    libbpf: Deduplicate unambigous standalone forward declarations
>>    selftests/bpf: Tests for standalone forward BTF declarations
>>      deduplication
>>    libbpf: Support for BTF_DECL_TAG dump in C format
>>    selftests/bpf: Tests for BTF_DECL_TAG dump in C format
>>    libbpf: Header guards for selected data structures in vmlinux.h
>>    selftests/bpf: Tests for header guards printing in BTF dump
>>    bpftool: Enable header guards generation
>>    kbuild: Script to infer header guard values for uapi headers
>>    kbuild: Header guards for types from include/uapi/*.h in kernel BTF
>>    selftests/bpf: Script to verify uapi headers usage with vmlinux.h
>>    selftests/bpf: Known good uapi headers for test_uapi_headers.py
>>    selftests/bpf: script for infer_header_guards.pl testing
>>
>>   scripts/infer_header_guards.pl                | 191 +++++
>>   scripts/link-vmlinux.sh                       |  13 +-
>>   tools/bpf/bpftool/btf.c                       |   4 +-
>>   tools/lib/bpf/btf.c                           | 178 ++++-
>>   tools/lib/bpf/btf.h                           |   7 +-
>>   tools/lib/bpf/btf_dump.c                      | 232 +++++-
>>   .../selftests/bpf/good_uapi_headers.txt       | 677 ++++++++++++++++++
>>   tools/testing/selftests/bpf/prog_tests/btf.c  | 152 ++++
>>   .../selftests/bpf/prog_tests/btf_dump.c       |  11 +-
>>   .../bpf/progs/btf_dump_test_case_decl_tag.c   |  39 +
>>   .../progs/btf_dump_test_case_header_guards.c  |  94 +++
>>   .../bpf/test_uapi_header_guards_infer.sh      |  33 +
>>   .../selftests/bpf/test_uapi_headers.py        | 197 +++++
>>   13 files changed, 1816 insertions(+), 12 deletions(-)
>>   create mode 100755 scripts/infer_header_guards.pl
>>   create mode 100644 tools/testing/selftests/bpf/good_uapi_headers.txt
>>   create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_decl_tag.c
>>   create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_header_guards.c
>>   create mode 100755 tools/testing/selftests/bpf/test_uapi_header_guards_infer.sh
>>   create mode 100755 tools/testing/selftests/bpf/test_uapi_headers.py
>>
>> --
>> 2.34.1
>>
Andrii Nakryiko Oct. 28, 2022, 5:13 p.m. UTC | #7
On Thu, Oct 27, 2022 at 6:33 PM Yonghong Song <yhs@meta.com> wrote:
>
>
>
> On 10/27/22 4:14 PM, Andrii Nakryiko wrote:
> > On Tue, Oct 25, 2022 at 3:28 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> >>
> >> Hi BPF community,
> >>
> >> AFAIK there is a long standing feature request to use kernel headers
> >> alongside `vmlinux.h` generated by `bpftool`. For example significant
> >> effort was put to add an attribute `bpf_dominating_decl` (see [1]) to
> >> clang, unfortunately this effort was stuck due to concerns regarding C
> >> language semantics.
> >>
> >
> > Maybe we should make another attempt to implement bpf_dominating_decl?
> > That seems like a more elegant solution than any other implemented or
> > discussed alternative. Yonghong, WDYT?
>
> I would say it would be very difficult for upstream to agree with
> bpf_dominating_decl. We already have lots of discussions and we
> likely won't be able to satisfy Aaron who wants us to emit
> adequate diagnostics which will involve lots of other work
> and he also thinks this is too far away from C standard and he
> wants us to implement in a llvm/clang tool which is not what
> we want.

Ok, could we change the problem to detecting if some type is defined.
Would it be possible to have something like

#if !__is_type_defined(struct abc)
struct abc {
};
#endif

I think we talked about this and there were problems with this
approach, but I don't remember details and how insurmountable the
problem is. Having a way to check whether some type is defined would
be very useful even outside of -target bpf parlance, though, so maybe
it's the problem worth attacking?

>
> >
> > BTW, I suggest splitting libbpf btf_dedup and btf_dump changes into a
> > separate series and sending them as non-RFC sooner. Those improvements
> > are independent of all the header guards stuff, let's get them landed
> > sooner.
> >
> >> After some discussion with Alexei and Yonghong I'd like to request
> >> your comments regarding a somewhat brittle and partial solution to
> >> this issue that relies on adding `#ifndef FOO_H ... #endif` guards in
> >> the generated `vmlinux.h`.
> >>
> >
> > [...]
> >
> >> Eduard Zingerman (12):
> >>    libbpf: Deduplicate unambigous standalone forward declarations
> >>    selftests/bpf: Tests for standalone forward BTF declarations
> >>      deduplication
> >>    libbpf: Support for BTF_DECL_TAG dump in C format
> >>    selftests/bpf: Tests for BTF_DECL_TAG dump in C format
> >>    libbpf: Header guards for selected data structures in vmlinux.h
> >>    selftests/bpf: Tests for header guards printing in BTF dump
> >>    bpftool: Enable header guards generation
> >>    kbuild: Script to infer header guard values for uapi headers
> >>    kbuild: Header guards for types from include/uapi/*.h in kernel BTF
> >>    selftests/bpf: Script to verify uapi headers usage with vmlinux.h
> >>    selftests/bpf: Known good uapi headers for test_uapi_headers.py
> >>    selftests/bpf: script for infer_header_guards.pl testing
> >>
> >>   scripts/infer_header_guards.pl                | 191 +++++
> >>   scripts/link-vmlinux.sh                       |  13 +-
> >>   tools/bpf/bpftool/btf.c                       |   4 +-
> >>   tools/lib/bpf/btf.c                           | 178 ++++-
> >>   tools/lib/bpf/btf.h                           |   7 +-
> >>   tools/lib/bpf/btf_dump.c                      | 232 +++++-
> >>   .../selftests/bpf/good_uapi_headers.txt       | 677 ++++++++++++++++++
> >>   tools/testing/selftests/bpf/prog_tests/btf.c  | 152 ++++
> >>   .../selftests/bpf/prog_tests/btf_dump.c       |  11 +-
> >>   .../bpf/progs/btf_dump_test_case_decl_tag.c   |  39 +
> >>   .../progs/btf_dump_test_case_header_guards.c  |  94 +++
> >>   .../bpf/test_uapi_header_guards_infer.sh      |  33 +
> >>   .../selftests/bpf/test_uapi_headers.py        | 197 +++++
> >>   13 files changed, 1816 insertions(+), 12 deletions(-)
> >>   create mode 100755 scripts/infer_header_guards.pl
> >>   create mode 100644 tools/testing/selftests/bpf/good_uapi_headers.txt
> >>   create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_decl_tag.c
> >>   create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_header_guards.c
> >>   create mode 100755 tools/testing/selftests/bpf/test_uapi_header_guards_infer.sh
> >>   create mode 100755 tools/testing/selftests/bpf/test_uapi_headers.py
> >>
> >> --
> >> 2.34.1
> >>
Yonghong Song Oct. 28, 2022, 6:56 p.m. UTC | #8
On 10/28/22 10:13 AM, Andrii Nakryiko wrote:
> On Thu, Oct 27, 2022 at 6:33 PM Yonghong Song <yhs@meta.com> wrote:
>>
>>
>>
>> On 10/27/22 4:14 PM, Andrii Nakryiko wrote:
>>> On Tue, Oct 25, 2022 at 3:28 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>>>>
>>>> Hi BPF community,
>>>>
>>>> AFAIK there is a long standing feature request to use kernel headers
>>>> alongside `vmlinux.h` generated by `bpftool`. For example significant
>>>> effort was put to add an attribute `bpf_dominating_decl` (see [1]) to
>>>> clang, unfortunately this effort was stuck due to concerns regarding C
>>>> language semantics.
>>>>
>>>
>>> Maybe we should make another attempt to implement bpf_dominating_decl?
>>> That seems like a more elegant solution than any other implemented or
>>> discussed alternative. Yonghong, WDYT?
>>
>> I would say it would be very difficult for upstream to agree with
>> bpf_dominating_decl. We already have lots of discussions and we
>> likely won't be able to satisfy Aaron who wants us to emit
>> adequate diagnostics which will involve lots of other work
>> and he also thinks this is too far away from C standard and he
>> wants us to implement in a llvm/clang tool which is not what
>> we want.
> 
> Ok, could we change the problem to detecting if some type is defined.
> Would it be possible to have something like
> 
> #if !__is_type_defined(struct abc)
> struct abc {
> };
> #endif
> 
> I think we talked about this and there were problems with this
> approach, but I don't remember details and how insurmountable the
> problem is. Having a way to check whether some type is defined would
> be very useful even outside of -target bpf parlance, though, so maybe
> it's the problem worth attacking?

Yes, we discussed this before. This will need to add additional work
in preprocessor. I just made a discussion topic in llvm discourse

https://discourse.llvm.org/t/add-a-type-checking-macro-is-type-defined-type/66268

Let us see whether we can get some upstream agreement or not.

> 
>>
>>>
>>> BTW, I suggest splitting libbpf btf_dedup and btf_dump changes into a
>>> separate series and sending them as non-RFC sooner. Those improvements
>>> are independent of all the header guards stuff, let's get them landed
>>> sooner.
>>>
>>>> After some discussion with Alexei and Yonghong I'd like to request
>>>> your comments regarding a somewhat brittle and partial solution to
>>>> this issue that relies on adding `#ifndef FOO_H ... #endif` guards in
>>>> the generated `vmlinux.h`.
>>>>
>>>
>>> [...]
>>>
>>>> Eduard Zingerman (12):
>>>>     libbpf: Deduplicate unambigous standalone forward declarations
>>>>     selftests/bpf: Tests for standalone forward BTF declarations
>>>>       deduplication
>>>>     libbpf: Support for BTF_DECL_TAG dump in C format
>>>>     selftests/bpf: Tests for BTF_DECL_TAG dump in C format
>>>>     libbpf: Header guards for selected data structures in vmlinux.h
>>>>     selftests/bpf: Tests for header guards printing in BTF dump
>>>>     bpftool: Enable header guards generation
>>>>     kbuild: Script to infer header guard values for uapi headers
>>>>     kbuild: Header guards for types from include/uapi/*.h in kernel BTF
>>>>     selftests/bpf: Script to verify uapi headers usage with vmlinux.h
>>>>     selftests/bpf: Known good uapi headers for test_uapi_headers.py
>>>>     selftests/bpf: script for infer_header_guards.pl testing
>>>>
>>>>    scripts/infer_header_guards.pl                | 191 +++++
>>>>    scripts/link-vmlinux.sh                       |  13 +-
>>>>    tools/bpf/bpftool/btf.c                       |   4 +-
>>>>    tools/lib/bpf/btf.c                           | 178 ++++-
>>>>    tools/lib/bpf/btf.h                           |   7 +-
>>>>    tools/lib/bpf/btf_dump.c                      | 232 +++++-
>>>>    .../selftests/bpf/good_uapi_headers.txt       | 677 ++++++++++++++++++
>>>>    tools/testing/selftests/bpf/prog_tests/btf.c  | 152 ++++
>>>>    .../selftests/bpf/prog_tests/btf_dump.c       |  11 +-
>>>>    .../bpf/progs/btf_dump_test_case_decl_tag.c   |  39 +
>>>>    .../progs/btf_dump_test_case_header_guards.c  |  94 +++
>>>>    .../bpf/test_uapi_header_guards_infer.sh      |  33 +
>>>>    .../selftests/bpf/test_uapi_headers.py        | 197 +++++
>>>>    13 files changed, 1816 insertions(+), 12 deletions(-)
>>>>    create mode 100755 scripts/infer_header_guards.pl
>>>>    create mode 100644 tools/testing/selftests/bpf/good_uapi_headers.txt
>>>>    create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_decl_tag.c
>>>>    create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_header_guards.c
>>>>    create mode 100755 tools/testing/selftests/bpf/test_uapi_header_guards_infer.sh
>>>>    create mode 100755 tools/testing/selftests/bpf/test_uapi_headers.py
>>>>
>>>> --
>>>> 2.34.1
>>>>
Andrii Nakryiko Oct. 28, 2022, 9:35 p.m. UTC | #9
On Fri, Oct 28, 2022 at 11:57 AM Yonghong Song <yhs@meta.com> wrote:
>
>
>
> On 10/28/22 10:13 AM, Andrii Nakryiko wrote:
> > On Thu, Oct 27, 2022 at 6:33 PM Yonghong Song <yhs@meta.com> wrote:
> >>
> >>
> >>
> >> On 10/27/22 4:14 PM, Andrii Nakryiko wrote:
> >>> On Tue, Oct 25, 2022 at 3:28 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> >>>>
> >>>> Hi BPF community,
> >>>>
> >>>> AFAIK there is a long standing feature request to use kernel headers
> >>>> alongside `vmlinux.h` generated by `bpftool`. For example significant
> >>>> effort was put to add an attribute `bpf_dominating_decl` (see [1]) to
> >>>> clang, unfortunately this effort was stuck due to concerns regarding C
> >>>> language semantics.
> >>>>
> >>>
> >>> Maybe we should make another attempt to implement bpf_dominating_decl?
> >>> That seems like a more elegant solution than any other implemented or
> >>> discussed alternative. Yonghong, WDYT?
> >>
> >> I would say it would be very difficult for upstream to agree with
> >> bpf_dominating_decl. We already have lots of discussions and we
> >> likely won't be able to satisfy Aaron who wants us to emit
> >> adequate diagnostics which will involve lots of other work
> >> and he also thinks this is too far away from C standard and he
> >> wants us to implement in a llvm/clang tool which is not what
> >> we want.
> >
> > Ok, could we change the problem to detecting if some type is defined.
> > Would it be possible to have something like
> >
> > #if !__is_type_defined(struct abc)
> > struct abc {
> > };
> > #endif
> >
> > I think we talked about this and there were problems with this
> > approach, but I don't remember details and how insurmountable the
> > problem is. Having a way to check whether some type is defined would
> > be very useful even outside of -target bpf parlance, though, so maybe
> > it's the problem worth attacking?
>
> Yes, we discussed this before. This will need to add additional work
> in preprocessor. I just made a discussion topic in llvm discourse
>
> https://discourse.llvm.org/t/add-a-type-checking-macro-is-type-defined-type/66268
>
> Let us see whether we can get some upstream agreement or not.
>

Thanks for starting the conversation! I'll be following along.

> >
> >>
> >>>
> >>> BTW, I suggest splitting libbpf btf_dedup and btf_dump changes into a
> >>> separate series and sending them as non-RFC sooner. Those improvements
> >>> are independent of all the header guards stuff, let's get them landed
> >>> sooner.
> >>>
> >>>> After some discussion with Alexei and Yonghong I'd like to request
> >>>> your comments regarding a somewhat brittle and partial solution to
> >>>> this issue that relies on adding `#ifndef FOO_H ... #endif` guards in
> >>>> the generated `vmlinux.h`.
> >>>>
> >>>
> >>> [...]
> >>>
> >>>> Eduard Zingerman (12):
> >>>>     libbpf: Deduplicate unambigous standalone forward declarations
> >>>>     selftests/bpf: Tests for standalone forward BTF declarations
> >>>>       deduplication
> >>>>     libbpf: Support for BTF_DECL_TAG dump in C format
> >>>>     selftests/bpf: Tests for BTF_DECL_TAG dump in C format
> >>>>     libbpf: Header guards for selected data structures in vmlinux.h
> >>>>     selftests/bpf: Tests for header guards printing in BTF dump
> >>>>     bpftool: Enable header guards generation
> >>>>     kbuild: Script to infer header guard values for uapi headers
> >>>>     kbuild: Header guards for types from include/uapi/*.h in kernel BTF
> >>>>     selftests/bpf: Script to verify uapi headers usage with vmlinux.h
> >>>>     selftests/bpf: Known good uapi headers for test_uapi_headers.py
> >>>>     selftests/bpf: script for infer_header_guards.pl testing
> >>>>
> >>>>    scripts/infer_header_guards.pl                | 191 +++++
> >>>>    scripts/link-vmlinux.sh                       |  13 +-
> >>>>    tools/bpf/bpftool/btf.c                       |   4 +-
> >>>>    tools/lib/bpf/btf.c                           | 178 ++++-
> >>>>    tools/lib/bpf/btf.h                           |   7 +-
> >>>>    tools/lib/bpf/btf_dump.c                      | 232 +++++-
> >>>>    .../selftests/bpf/good_uapi_headers.txt       | 677 ++++++++++++++++++
> >>>>    tools/testing/selftests/bpf/prog_tests/btf.c  | 152 ++++
> >>>>    .../selftests/bpf/prog_tests/btf_dump.c       |  11 +-
> >>>>    .../bpf/progs/btf_dump_test_case_decl_tag.c   |  39 +
> >>>>    .../progs/btf_dump_test_case_header_guards.c  |  94 +++
> >>>>    .../bpf/test_uapi_header_guards_infer.sh      |  33 +
> >>>>    .../selftests/bpf/test_uapi_headers.py        | 197 +++++
> >>>>    13 files changed, 1816 insertions(+), 12 deletions(-)
> >>>>    create mode 100755 scripts/infer_header_guards.pl
> >>>>    create mode 100644 tools/testing/selftests/bpf/good_uapi_headers.txt
> >>>>    create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_decl_tag.c
> >>>>    create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_header_guards.c
> >>>>    create mode 100755 tools/testing/selftests/bpf/test_uapi_header_guards_infer.sh
> >>>>    create mode 100755 tools/testing/selftests/bpf/test_uapi_headers.py
> >>>>
> >>>> --
> >>>> 2.34.1
> >>>>
Alan Maguire Nov. 1, 2022, 4:01 p.m. UTC | #10
On 28/10/2022 22:35, Andrii Nakryiko wrote:
> On Fri, Oct 28, 2022 at 11:57 AM Yonghong Song <yhs@meta.com> wrote:
>>
>>
>>
>> On 10/28/22 10:13 AM, Andrii Nakryiko wrote:
>>> On Thu, Oct 27, 2022 at 6:33 PM Yonghong Song <yhs@meta.com> wrote:
>>>>
>>>>
>>>>
>>>> On 10/27/22 4:14 PM, Andrii Nakryiko wrote:
>>>>> On Tue, Oct 25, 2022 at 3:28 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>>>>>>
>>>>>> Hi BPF community,
>>>>>>
>>>>>> AFAIK there is a long standing feature request to use kernel headers
>>>>>> alongside `vmlinux.h` generated by `bpftool`. For example significant
>>>>>> effort was put to add an attribute `bpf_dominating_decl` (see [1]) to
>>>>>> clang, unfortunately this effort was stuck due to concerns regarding C
>>>>>> language semantics.
>>>>>>
>>>>>
>>>>> Maybe we should make another attempt to implement bpf_dominating_decl?
>>>>> That seems like a more elegant solution than any other implemented or
>>>>> discussed alternative. Yonghong, WDYT?
>>>>
>>>> I would say it would be very difficult for upstream to agree with
>>>> bpf_dominating_decl. We already have lots of discussions and we
>>>> likely won't be able to satisfy Aaron who wants us to emit
>>>> adequate diagnostics which will involve lots of other work
>>>> and he also thinks this is too far away from C standard and he
>>>> wants us to implement in a llvm/clang tool which is not what
>>>> we want.
>>>
>>> Ok, could we change the problem to detecting if some type is defined.
>>> Would it be possible to have something like
>>>
>>> #if !__is_type_defined(struct abc)
>>> struct abc {
>>> };
>>> #endif
>>>
>>> I think we talked about this and there were problems with this
>>> approach, but I don't remember details and how insurmountable the
>>> problem is. Having a way to check whether some type is defined would
>>> be very useful even outside of -target bpf parlance, though, so maybe
>>> it's the problem worth attacking?
>>
>> Yes, we discussed this before. This will need to add additional work
>> in preprocessor. I just made a discussion topic in llvm discourse
>>
>> https://discourse.llvm.org/t/add-a-type-checking-macro-is-type-defined-type/66268
>>
>> Let us see whether we can get some upstream agreement or not.
>>
> 
> Thanks for starting the conversation! I'll be following along.
>


I think this sort of approach assumes that vmlinux.h is included after
any uapi headers, and would guard type definitions with 

#if type_is_defined(foo)
struct foo {

};
#endif

...is that right? My concern is that the vmlinux.h definitions have
the CO-RE attributes. From a BPF perspective, would we like the vmlinux.h
definitions to dominate over UAPI definitions do you think, or does it
matter?

I was wondering if there might be yet another way to crack this;
if we did want the vmlinux.h type definitions to be authoritative
because they have the preserve access index attribute, and because
bpftool knows all vmlinux types, it could use that info to selectively
redefine those type names such that we avoid name clashes when later
including UAPI headers. Something like

#ifdef __VMLINUX_H__
//usual vmlinux.h type definitions
#endif /* __VMLINUX_H__ */

#ifdef __VMLINUX_ALIAS__
if !defined(timespec64)
#define timespec64 __VMLINUX_ALIAS__timespec64
#endif
// rest of the types define aliases here
#undef __VMLINUX_ALIAS__
#else /* unalias */
#if defined(timespec64)
#undef timespec64
#endif
// rest of types undef aliases here
#endif /* __VMLINUX_ALIAS__ */


Then the consumer does this:

#define __VMLINUX_ALIAS__
#include "vmlinux.h"
// include uapi headers
#include "vmlinux.h"

(the latter include of vmlinux.h is needed to undef all the type aliases)

I tried hacking up bpftool to support this aliasing scheme and while 
it is kind of hacky it does seem to work, aside from some issues with 
IPPROTO_* definitions - for the enumerated IPPROTO_ values linux/in.h does
this:

enum {
  IPPROTO_IP = 0,               /* Dummy protocol for TCP               */
#define IPPROTO_IP              IPPROTO_IP
  IPPROTO_ICMP = 1,             /* Internet Control Message Protocol    */
#define IPPROTO_ICMP            IPPROTO_ICMP


...so our enum value definitions for IPPROTO_ values clash with the above 
definitions. These could be individually ifdef-guarded if needed though I think.

I can send the proof-of-concept patch if it would help, I just wanted to 
check in case that might be a workable path too, since it just requires 
changes to bpftool (and changes to in.h).

Thanks!

Alan

 
>>>
>>>>
>>>>>
>>>>> BTW, I suggest splitting libbpf btf_dedup and btf_dump changes into a
>>>>> separate series and sending them as non-RFC sooner. Those improvements
>>>>> are independent of all the header guards stuff, let's get them landed
>>>>> sooner.
>>>>>
>>>>>> After some discussion with Alexei and Yonghong I'd like to request
>>>>>> your comments regarding a somewhat brittle and partial solution to
>>>>>> this issue that relies on adding `#ifndef FOO_H ... #endif` guards in
>>>>>> the generated `vmlinux.h`.
>>>>>>
>>>>>
>>>>> [...]
>>>>>
>>>>>> Eduard Zingerman (12):
>>>>>>     libbpf: Deduplicate unambigous standalone forward declarations
>>>>>>     selftests/bpf: Tests for standalone forward BTF declarations
>>>>>>       deduplication
>>>>>>     libbpf: Support for BTF_DECL_TAG dump in C format
>>>>>>     selftests/bpf: Tests for BTF_DECL_TAG dump in C format
>>>>>>     libbpf: Header guards for selected data structures in vmlinux.h
>>>>>>     selftests/bpf: Tests for header guards printing in BTF dump
>>>>>>     bpftool: Enable header guards generation
>>>>>>     kbuild: Script to infer header guard values for uapi headers
>>>>>>     kbuild: Header guards for types from include/uapi/*.h in kernel BTF
>>>>>>     selftests/bpf: Script to verify uapi headers usage with vmlinux.h
>>>>>>     selftests/bpf: Known good uapi headers for test_uapi_headers.py
>>>>>>     selftests/bpf: script for infer_header_guards.pl testing
>>>>>>
>>>>>>    scripts/infer_header_guards.pl                | 191 +++++
>>>>>>    scripts/link-vmlinux.sh                       |  13 +-
>>>>>>    tools/bpf/bpftool/btf.c                       |   4 +-
>>>>>>    tools/lib/bpf/btf.c                           | 178 ++++-
>>>>>>    tools/lib/bpf/btf.h                           |   7 +-
>>>>>>    tools/lib/bpf/btf_dump.c                      | 232 +++++-
>>>>>>    .../selftests/bpf/good_uapi_headers.txt       | 677 ++++++++++++++++++
>>>>>>    tools/testing/selftests/bpf/prog_tests/btf.c  | 152 ++++
>>>>>>    .../selftests/bpf/prog_tests/btf_dump.c       |  11 +-
>>>>>>    .../bpf/progs/btf_dump_test_case_decl_tag.c   |  39 +
>>>>>>    .../progs/btf_dump_test_case_header_guards.c  |  94 +++
>>>>>>    .../bpf/test_uapi_header_guards_infer.sh      |  33 +
>>>>>>    .../selftests/bpf/test_uapi_headers.py        | 197 +++++
>>>>>>    13 files changed, 1816 insertions(+), 12 deletions(-)
>>>>>>    create mode 100755 scripts/infer_header_guards.pl
>>>>>>    create mode 100644 tools/testing/selftests/bpf/good_uapi_headers.txt
>>>>>>    create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_decl_tag.c
>>>>>>    create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_header_guards.c
>>>>>>    create mode 100755 tools/testing/selftests/bpf/test_uapi_header_guards_infer.sh
>>>>>>    create mode 100755 tools/testing/selftests/bpf/test_uapi_headers.py
>>>>>>
>>>>>> --
>>>>>> 2.34.1
>>>>>>
Alexei Starovoitov Nov. 1, 2022, 6:35 p.m. UTC | #11
On Tue, Nov 1, 2022 at 9:02 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> >> Yes, we discussed this before. This will need to add additional work
> >> in preprocessor. I just made a discussion topic in llvm discourse
> >>
> >> https://discourse.llvm.org/t/add-a-type-checking-macro-is-type-defined-type/66268

That would be a great clang feature.
Thanks Yonghong!

> >>
> >> Let us see whether we can get some upstream agreement or not.
> >>
> >
> > Thanks for starting the conversation! I'll be following along.
> >
>
>
> I think this sort of approach assumes that vmlinux.h is included after
> any uapi headers, and would guard type definitions with
>
> #if type_is_defined(foo)
> struct foo {
>
> };
> #endif
>
> ...is that right? My concern is that the vmlinux.h definitions have
> the CO-RE attributes. From a BPF perspective, would we like the vmlinux.h
> definitions to dominate over UAPI definitions do you think, or does it
> matter?

I think it's totally fine to require #include "vmlinux.h" to be last.
The attr(preserve_access_index) is only useful for kernel internal
structs. uapi structs don't need it.

>
> I was wondering if there might be yet another way to crack this;
> if we did want the vmlinux.h type definitions to be authoritative
> because they have the preserve access index attribute, and because
> bpftool knows all vmlinux types, it could use that info to selectively
> redefine those type names such that we avoid name clashes when later
> including UAPI headers. Something like
>
> #ifdef __VMLINUX_H__
> //usual vmlinux.h type definitions
> #endif /* __VMLINUX_H__ */
>
> #ifdef __VMLINUX_ALIAS__
> if !defined(timespec64)
> #define timespec64 __VMLINUX_ALIAS__timespec64
> #endif
> // rest of the types define aliases here
> #undef __VMLINUX_ALIAS__
> #else /* unalias */
> #if defined(timespec64)
> #undef timespec64
> #endif
> // rest of types undef aliases here
> #endif /* __VMLINUX_ALIAS__ */
>
>
> Then the consumer does this:
>
> #define __VMLINUX_ALIAS__
> #include "vmlinux.h"
> // include uapi headers
> #include "vmlinux.h"
>
> (the latter include of vmlinux.h is needed to undef all the type aliases)

Sounds like a bunch of complexity for the use case that is not
clear to me.

>
> I tried hacking up bpftool to support this aliasing scheme and while
> it is kind of hacky it does seem to work, aside from some issues with
> IPPROTO_* definitions - for the enumerated IPPROTO_ values linux/in.h does
> this:
>
> enum {
>   IPPROTO_IP = 0,               /* Dummy protocol for TCP               */
> #define IPPROTO_IP              IPPROTO_IP
>   IPPROTO_ICMP = 1,             /* Internet Control Message Protocol    */
> #define IPPROTO_ICMP            IPPROTO_ICMP
>
>
> ...so our enum value definitions for IPPROTO_ values clash with the above
> definitions. These could be individually ifdef-guarded if needed though I think.

Including vmlinux.h last won't have this enum conflicts, right?

> I can send the proof-of-concept patch if it would help, I just wanted to
> check in case that might be a workable path too, since it just requires
> changes to bpftool (and changes to in.h).

I think changing the uapi header like in.h is no-go.
Touching anything in uapi is too much of a risk.
Eduard Zingerman Nov. 1, 2022, 7:21 p.m. UTC | #12
On Tue, 2022-11-01 at 11:35 -0700, Alexei Starovoitov wrote:
> On Tue, Nov 1, 2022 at 9:02 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> > 
> > > > Yes, we discussed this before. This will need to add additional work
> > > > in preprocessor. I just made a discussion topic in llvm discourse
> > > > 
> > > > https://discourse.llvm.org/t/add-a-type-checking-macro-is-type-defined-type/66268
> 
> That would be a great clang feature.
> Thanks Yonghong!
> 
> > > > 
> > > > Let us see whether we can get some upstream agreement or not.
> > > > 
> > > 
> > > Thanks for starting the conversation! I'll be following along.
> > > 
> > 
> > 
> > I think this sort of approach assumes that vmlinux.h is included after
> > any uapi headers, and would guard type definitions with
> > 
> > #if type_is_defined(foo)
> > struct foo {
> > 
> > };
> > #endif
> > 
> > ...is that right? My concern is that the vmlinux.h definitions have
> > the CO-RE attributes. From a BPF perspective, would we like the vmlinux.h
> > definitions to dominate over UAPI definitions do you think, or does it
> > matter?
> 
> I think it's totally fine to require #include "vmlinux.h" to be last.
> The attr(preserve_access_index) is only useful for kernel internal
> structs. uapi structs don't need it.
> 
> > 
> > I was wondering if there might be yet another way to crack this;
> > if we did want the vmlinux.h type definitions to be authoritative
> > because they have the preserve access index attribute, and because
> > bpftool knows all vmlinux types, it could use that info to selectively
> > redefine those type names such that we avoid name clashes when later
> > including UAPI headers. Something like
> > 
> > #ifdef __VMLINUX_H__
> > //usual vmlinux.h type definitions
> > #endif /* __VMLINUX_H__ */
> > 
> > #ifdef __VMLINUX_ALIAS__
> > if !defined(timespec64)
> > #define timespec64 __VMLINUX_ALIAS__timespec64
> > #endif
> > // rest of the types define aliases here
> > #undef __VMLINUX_ALIAS__
> > #else /* unalias */
> > #if defined(timespec64)
> > #undef timespec64
> > #endif
> > // rest of types undef aliases here
> > #endif /* __VMLINUX_ALIAS__ */
> > 
> > 
> > Then the consumer does this:
> > 
> > #define __VMLINUX_ALIAS__
> > #include "vmlinux.h"
> > // include uapi headers
> > #include "vmlinux.h"
> > 
> > (the latter include of vmlinux.h is needed to undef all the type aliases)
> 
> Sounds like a bunch of complexity for the use case that is not
> clear to me.

Well, my RFC is not shy of complexity :)
What Alan suggests should solve the confilicts described in [1] or any
other confilicts of such kind.

[1] https://lore.kernel.org/bpf/999da51bdf050f155ba299500061b3eb6e0dcd0d.camel@gmail.com/


> > 
> > I tried hacking up bpftool to support this aliasing scheme and while
> > it is kind of hacky it does seem to work, aside from some issues with
> > IPPROTO_* definitions - for the enumerated IPPROTO_ values linux/in.h does
> > this:
> > 
> > enum {
> >   IPPROTO_IP = 0,               /* Dummy protocol for TCP               */
> > #define IPPROTO_IP              IPPROTO_IP
> >   IPPROTO_ICMP = 1,             /* Internet Control Message Protocol    */
> > #define IPPROTO_ICMP            IPPROTO_ICMP
> > 
> > 
> > ...so our enum value definitions for IPPROTO_ values clash with the above
> > definitions. These could be individually ifdef-guarded if needed though I think.
> 
> Including vmlinux.h last won't have this enum conflicts, right?
> 
> > I can send the proof-of-concept patch if it would help, I just wanted to
> > check in case that might be a workable path too, since it just requires
> > changes to bpftool (and changes to in.h).
> 
> I think changing the uapi header like in.h is no-go.
> Touching anything in uapi is too much of a risk.
Alexei Starovoitov Nov. 1, 2022, 7:44 p.m. UTC | #13
On Tue, Nov 1, 2022 at 12:21 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Tue, 2022-11-01 at 11:35 -0700, Alexei Starovoitov wrote:
> > On Tue, Nov 1, 2022 at 9:02 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> > >
> > > > > Yes, we discussed this before. This will need to add additional work
> > > > > in preprocessor. I just made a discussion topic in llvm discourse
> > > > >
> > > > > https://discourse.llvm.org/t/add-a-type-checking-macro-is-type-defined-type/66268
> >
> > That would be a great clang feature.
> > Thanks Yonghong!
> >
> > > > >
> > > > > Let us see whether we can get some upstream agreement or not.
> > > > >
> > > >
> > > > Thanks for starting the conversation! I'll be following along.
> > > >
> > >
> > >
> > > I think this sort of approach assumes that vmlinux.h is included after
> > > any uapi headers, and would guard type definitions with
> > >
> > > #if type_is_defined(foo)
> > > struct foo {
> > >
> > > };
> > > #endif
> > >
> > > ...is that right? My concern is that the vmlinux.h definitions have
> > > the CO-RE attributes. From a BPF perspective, would we like the vmlinux.h
> > > definitions to dominate over UAPI definitions do you think, or does it
> > > matter?
> >
> > I think it's totally fine to require #include "vmlinux.h" to be last.
> > The attr(preserve_access_index) is only useful for kernel internal
> > structs. uapi structs don't need it.
> >
> > >
> > > I was wondering if there might be yet another way to crack this;
> > > if we did want the vmlinux.h type definitions to be authoritative
> > > because they have the preserve access index attribute, and because
> > > bpftool knows all vmlinux types, it could use that info to selectively
> > > redefine those type names such that we avoid name clashes when later
> > > including UAPI headers. Something like
> > >
> > > #ifdef __VMLINUX_H__
> > > //usual vmlinux.h type definitions
> > > #endif /* __VMLINUX_H__ */
> > >
> > > #ifdef __VMLINUX_ALIAS__
> > > if !defined(timespec64)
> > > #define timespec64 __VMLINUX_ALIAS__timespec64
> > > #endif
> > > // rest of the types define aliases here
> > > #undef __VMLINUX_ALIAS__
> > > #else /* unalias */
> > > #if defined(timespec64)
> > > #undef timespec64
> > > #endif
> > > // rest of types undef aliases here
> > > #endif /* __VMLINUX_ALIAS__ */
> > >
> > >
> > > Then the consumer does this:
> > >
> > > #define __VMLINUX_ALIAS__
> > > #include "vmlinux.h"
> > > // include uapi headers
> > > #include "vmlinux.h"
> > >
> > > (the latter include of vmlinux.h is needed to undef all the type aliases)
> >
> > Sounds like a bunch of complexity for the use case that is not
> > clear to me.
>
> Well, my RFC is not shy of complexity :)
> What Alan suggests should solve the confilicts described in [1] or any
> other confilicts of such kind.
>
> [1] https://lore.kernel.org/bpf/999da51bdf050f155ba299500061b3eb6e0dcd0d.camel@gmail.com/

I don't quite see how renaming all types in the 1st vmlinux.h
will help with name collisions inside enum {}.
The enums will conflict with 2nd vmlinux.h too.
Unless the proposal is to rename them as well,
but then what's the point?
Eduard Zingerman Nov. 11, 2022, 9:55 p.m. UTC | #14
On Fri, 2022-10-28 at 11:56 -0700, Yonghong Song wrote:
> > > [...]
> > 
> > Ok, could we change the problem to detecting if some type is defined.
> > Would it be possible to have something like
> > 
> > #if !__is_type_defined(struct abc)
> > struct abc {
> > };
> > #endif
> > 
> > I think we talked about this and there were problems with this
> > approach, but I don't remember details and how insurmountable the
> > problem is. Having a way to check whether some type is defined would
> > be very useful even outside of -target bpf parlance, though, so maybe
> > it's the problem worth attacking?
> 
> Yes, we discussed this before. This will need to add additional work
> in preprocessor. I just made a discussion topic in llvm discourse
> 
> https://discourse.llvm.org/t/add-a-type-checking-macro-is-type-defined-type/66268
> 
> Let us see whether we can get some upstream agreement or not.

I did a small investigation of this feature.

The main pre-requirement is construction of the symbol table during
source code pre-processing, which implies necessity to parse the
source code at the same time. It is technically possible in clang, as
lexing, pre-processing and AST construction happens at the same time
when in compilation mode.

The prototype is available here [1], it includes:
- Change in the pre-processor that adds an optional callback
  "IsTypeDefinedFn" & necessary parsing of __is_type_defined
  construct.
- Change in Sema module (responsible for parsing/AST & symbol table)
  that installs the appropriate "IsTypeDefinedFn" in the pre-processor
  instance.

However, this prototype builds a backward dependency between
pre-processor and semantic analysis. There are currently no such
dependencies in the clang code base.

This makes it impossible to do pre-processing and compilation
separately, e.g. consider the following example:

$ cat test.c

  struct foo { int x; };
  
  #if __is_type_defined(foo)
    const int x = 1;
  #else
    const int x = 2;
  #endif
  
$ clang -cc1 -ast-print test.c -o -

  struct foo {
      int x;
  };
  const int x = 1;

$ clang -E test.c -o -

  # ... some line directives ...
  struct foo { int x; };
  const int x = 2;

Note that __is_type_defined is computed to different value in the
first and second calls. This is so because semantic analysis (AST,
symbol table) is not done for -E.

It also breaks that C11 standard which clearly separates
pre-processing and semantic analysis phases, see [2] 5.1.1.2.

So, my conclusion is as follows: this is technically possible in clang
but has no chance to reach llvm upstream.

Thanks,
Eduard

[1] https://github.com/llvm/llvm-project/compare/main...eddyz87:llvm-project:is-type-defined-experiment
[2] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf


> 
> > 
> > > 
> > > > 
> > > > BTW, I suggest splitting libbpf btf_dedup and btf_dump changes into a
> > > > separate series and sending them as non-RFC sooner. Those improvements
> > > > are independent of all the header guards stuff, let's get them landed
> > > > sooner.
> > > > 
> > > > > After some discussion with Alexei and Yonghong I'd like to request
> > > > > your comments regarding a somewhat brittle and partial solution to
> > > > > this issue that relies on adding `#ifndef FOO_H ... #endif` guards in
> > > > > the generated `vmlinux.h`.
> > > > > 
> > > > 
> > > > [...]
> > > > 
> > > > > Eduard Zingerman (12):
> > > > >     libbpf: Deduplicate unambigous standalone forward declarations
> > > > >     selftests/bpf: Tests for standalone forward BTF declarations
> > > > >       deduplication
> > > > >     libbpf: Support for BTF_DECL_TAG dump in C format
> > > > >     selftests/bpf: Tests for BTF_DECL_TAG dump in C format
> > > > >     libbpf: Header guards for selected data structures in vmlinux.h
> > > > >     selftests/bpf: Tests for header guards printing in BTF dump
> > > > >     bpftool: Enable header guards generation
> > > > >     kbuild: Script to infer header guard values for uapi headers
> > > > >     kbuild: Header guards for types from include/uapi/*.h in kernel BTF
> > > > >     selftests/bpf: Script to verify uapi headers usage with vmlinux.h
> > > > >     selftests/bpf: Known good uapi headers for test_uapi_headers.py
> > > > >     selftests/bpf: script for infer_header_guards.pl testing
> > > > > 
> > > > >    scripts/infer_header_guards.pl                | 191 +++++
> > > > >    scripts/link-vmlinux.sh                       |  13 +-
> > > > >    tools/bpf/bpftool/btf.c                       |   4 +-
> > > > >    tools/lib/bpf/btf.c                           | 178 ++++-
> > > > >    tools/lib/bpf/btf.h                           |   7 +-
> > > > >    tools/lib/bpf/btf_dump.c                      | 232 +++++-
> > > > >    .../selftests/bpf/good_uapi_headers.txt       | 677 ++++++++++++++++++
> > > > >    tools/testing/selftests/bpf/prog_tests/btf.c  | 152 ++++
> > > > >    .../selftests/bpf/prog_tests/btf_dump.c       |  11 +-
> > > > >    .../bpf/progs/btf_dump_test_case_decl_tag.c   |  39 +
> > > > >    .../progs/btf_dump_test_case_header_guards.c  |  94 +++
> > > > >    .../bpf/test_uapi_header_guards_infer.sh      |  33 +
> > > > >    .../selftests/bpf/test_uapi_headers.py        | 197 +++++
> > > > >    13 files changed, 1816 insertions(+), 12 deletions(-)
> > > > >    create mode 100755 scripts/infer_header_guards.pl
> > > > >    create mode 100644 tools/testing/selftests/bpf/good_uapi_headers.txt
> > > > >    create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_decl_tag.c
> > > > >    create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_header_guards.c
> > > > >    create mode 100755 tools/testing/selftests/bpf/test_uapi_header_guards_infer.sh
> > > > >    create mode 100755 tools/testing/selftests/bpf/test_uapi_headers.py
> > > > > 
> > > > > --
> > > > > 2.34.1
> > > > >
Yonghong Song Nov. 14, 2022, 7:52 a.m. UTC | #15
On 11/11/22 1:55 PM, Eduard Zingerman wrote:
> On Fri, 2022-10-28 at 11:56 -0700, Yonghong Song wrote:
>>>> [...]
>>>
>>> Ok, could we change the problem to detecting if some type is defined.
>>> Would it be possible to have something like
>>>
>>> #if !__is_type_defined(struct abc)
>>> struct abc {
>>> };
>>> #endif
>>>
>>> I think we talked about this and there were problems with this
>>> approach, but I don't remember details and how insurmountable the
>>> problem is. Having a way to check whether some type is defined would
>>> be very useful even outside of -target bpf parlance, though, so maybe
>>> it's the problem worth attacking?
>>
>> Yes, we discussed this before. This will need to add additional work
>> in preprocessor. I just made a discussion topic in llvm discourse
>>
>> https://discourse.llvm.org/t/add-a-type-checking-macro-is-type-defined-type/66268
>>
>> Let us see whether we can get some upstream agreement or not.
> 
> I did a small investigation of this feature.
> 
> The main pre-requirement is construction of the symbol table during
> source code pre-processing, which implies necessity to parse the
> source code at the same time. It is technically possible in clang, as
> lexing, pre-processing and AST construction happens at the same time
> when in compilation mode.
> 
> The prototype is available here [1], it includes:
> - Change in the pre-processor that adds an optional callback
>    "IsTypeDefinedFn" & necessary parsing of __is_type_defined
>    construct.
> - Change in Sema module (responsible for parsing/AST & symbol table)
>    that installs the appropriate "IsTypeDefinedFn" in the pre-processor
>    instance.
> 
> However, this prototype builds a backward dependency between
> pre-processor and semantic analysis. There are currently no such
> dependencies in the clang code base.
> 
> This makes it impossible to do pre-processing and compilation
> separately, e.g. consider the following example:
> 
> $ cat test.c
> 
>    struct foo { int x; };
>    
>    #if __is_type_defined(foo)
>      const int x = 1;
>    #else
>      const int x = 2;
>    #endif
>    
> $ clang -cc1 -ast-print test.c -o -
> 
>    struct foo {
>        int x;
>    };
>    const int x = 1;
> 
> $ clang -E test.c -o -
> 
>    # ... some line directives ...
>    struct foo { int x; };
>    const int x = 2;

Is it any chance '-E' could output the same one as '-cc1 -ast-print'?
That is, even with -E we could do some semantics analysis
as well, using either current clang semantics analysis or creating
an minimal version of sema analysis in preprocessor itself?

> 
> Note that __is_type_defined is computed to different value in the
> first and second calls. This is so because semantic analysis (AST,
> symbol table) is not done for -E.
> 
> It also breaks that C11 standard which clearly separates
> pre-processing and semantic analysis phases, see [2] 5.1.1.2.
> 
> So, my conclusion is as follows: this is technically possible in clang
> but has no chance to reach llvm upstream.
> 
> Thanks,
> Eduard
> 
> [1] https://github.com/llvm/llvm-project/compare/main...eddyz87:llvm-project:is-type-defined-experiment
> [2] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf
> 
> 
>>
>>>
>>>>
>>>>>
>>>>> BTW, I suggest splitting libbpf btf_dedup and btf_dump changes into a
>>>>> separate series and sending them as non-RFC sooner. Those improvements
>>>>> are independent of all the header guards stuff, let's get them landed
>>>>> sooner.
>>>>>
>>>>>> After some discussion with Alexei and Yonghong I'd like to request
>>>>>> your comments regarding a somewhat brittle and partial solution to
>>>>>> this issue that relies on adding `#ifndef FOO_H ... #endif` guards in
>>>>>> the generated `vmlinux.h`.
>>>>>>
>>>>>
>>>>> [...]
>>>>>
>>>>>> Eduard Zingerman (12):
>>>>>>      libbpf: Deduplicate unambigous standalone forward declarations
>>>>>>      selftests/bpf: Tests for standalone forward BTF declarations
>>>>>>        deduplication
>>>>>>      libbpf: Support for BTF_DECL_TAG dump in C format
>>>>>>      selftests/bpf: Tests for BTF_DECL_TAG dump in C format
>>>>>>      libbpf: Header guards for selected data structures in vmlinux.h
>>>>>>      selftests/bpf: Tests for header guards printing in BTF dump
>>>>>>      bpftool: Enable header guards generation
>>>>>>      kbuild: Script to infer header guard values for uapi headers
>>>>>>      kbuild: Header guards for types from include/uapi/*.h in kernel BTF
>>>>>>      selftests/bpf: Script to verify uapi headers usage with vmlinux.h
>>>>>>      selftests/bpf: Known good uapi headers for test_uapi_headers.py
>>>>>>      selftests/bpf: script for infer_header_guards.pl testing
>>>>>>
>>>>>>     scripts/infer_header_guards.pl                | 191 +++++
>>>>>>     scripts/link-vmlinux.sh                       |  13 +-
>>>>>>     tools/bpf/bpftool/btf.c                       |   4 +-
>>>>>>     tools/lib/bpf/btf.c                           | 178 ++++-
>>>>>>     tools/lib/bpf/btf.h                           |   7 +-
>>>>>>     tools/lib/bpf/btf_dump.c                      | 232 +++++-
>>>>>>     .../selftests/bpf/good_uapi_headers.txt       | 677 ++++++++++++++++++
>>>>>>     tools/testing/selftests/bpf/prog_tests/btf.c  | 152 ++++
>>>>>>     .../selftests/bpf/prog_tests/btf_dump.c       |  11 +-
>>>>>>     .../bpf/progs/btf_dump_test_case_decl_tag.c   |  39 +
>>>>>>     .../progs/btf_dump_test_case_header_guards.c  |  94 +++
>>>>>>     .../bpf/test_uapi_header_guards_infer.sh      |  33 +
>>>>>>     .../selftests/bpf/test_uapi_headers.py        | 197 +++++
>>>>>>     13 files changed, 1816 insertions(+), 12 deletions(-)
>>>>>>     create mode 100755 scripts/infer_header_guards.pl
>>>>>>     create mode 100644 tools/testing/selftests/bpf/good_uapi_headers.txt
>>>>>>     create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_decl_tag.c
>>>>>>     create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_header_guards.c
>>>>>>     create mode 100755 tools/testing/selftests/bpf/test_uapi_header_guards_infer.sh
>>>>>>     create mode 100755 tools/testing/selftests/bpf/test_uapi_headers.py
>>>>>>
>>>>>> --
>>>>>> 2.34.1
>>>>>>
>
Eduard Zingerman Nov. 14, 2022, 9:13 p.m. UTC | #16
On Sun, 2022-11-13 at 23:52 -0800, Yonghong Song wrote:
> 
> On 11/11/22 1:55 PM, Eduard Zingerman wrote:
> > On Fri, 2022-10-28 at 11:56 -0700, Yonghong Song wrote:
> > > > > [...]
> > > > 
> > > > Ok, could we change the problem to detecting if some type is defined.
> > > > Would it be possible to have something like
> > > > 
> > > > #if !__is_type_defined(struct abc)
> > > > struct abc {
> > > > };
> > > > #endif
> > > > 
> > > > I think we talked about this and there were problems with this
> > > > approach, but I don't remember details and how insurmountable the
> > > > problem is. Having a way to check whether some type is defined would
> > > > be very useful even outside of -target bpf parlance, though, so maybe
> > > > it's the problem worth attacking?
> > > 
> > > Yes, we discussed this before. This will need to add additional work
> > > in preprocessor. I just made a discussion topic in llvm discourse
> > > 
> > > https://discourse.llvm.org/t/add-a-type-checking-macro-is-type-defined-type/66268
> > > 
> > > Let us see whether we can get some upstream agreement or not.
> > 
> > I did a small investigation of this feature.
> > 
> > The main pre-requirement is construction of the symbol table during
> > source code pre-processing, which implies necessity to parse the
> > source code at the same time. It is technically possible in clang, as
> > lexing, pre-processing and AST construction happens at the same time
> > when in compilation mode.
> > 
> > The prototype is available here [1], it includes:
> > - Change in the pre-processor that adds an optional callback
> >    "IsTypeDefinedFn" & necessary parsing of __is_type_defined
> >    construct.
> > - Change in Sema module (responsible for parsing/AST & symbol table)
> >    that installs the appropriate "IsTypeDefinedFn" in the pre-processor
> >    instance.
> > 
> > However, this prototype builds a backward dependency between
> > pre-processor and semantic analysis. There are currently no such
> > dependencies in the clang code base.
> > 
> > This makes it impossible to do pre-processing and compilation
> > separately, e.g. consider the following example:
> > 
> > $ cat test.c
> > 
> >    struct foo { int x; };
> >    
> >    #if __is_type_defined(foo)
> >      const int x = 1;
> >    #else
> >      const int x = 2;
> >    #endif
> >    
> > $ clang -cc1 -ast-print test.c -o -
> > 
> >    struct foo {
> >        int x;
> >    };
> >    const int x = 1;
> > 
> > $ clang -E test.c -o -
> > 
> >    # ... some line directives ...
> >    struct foo { int x; };
> >    const int x = 2;
> 
> Is it any chance '-E' could output the same one as '-cc1 -ast-print'?
> That is, even with -E we could do some semantics analysis
> as well, using either current clang semantics analysis or creating
> an minimal version of sema analysis in preprocessor itself?

Sema drives consumption of tokens from Preprocessor. Calls to
Preprocessor are done on a parsing recursive descent. Extracting a
stream of tokens would require an incremental parser instead.

A minimal version of such parser is possible to implement for C.
It might be the case that matching open / closing braces and
identifiers following 'struct' / 'union' / 'enum' keywords might be
almost sufficient but I need to try to be sure (e.g. it is more
complex for 'typedef').

I can work on it but I don't think there is a chance to upstream this work.

Thanks,
Eduard

> 
> > 
> > Note that __is_type_defined is computed to different value in the
> > first and second calls. This is so because semantic analysis (AST,
> > symbol table) is not done for -E.
> > 
> > It also breaks that C11 standard which clearly separates
> > pre-processing and semantic analysis phases, see [2] 5.1.1.2.
> > 
> > So, my conclusion is as follows: this is technically possible in clang
> > but has no chance to reach llvm upstream.
> > 
> > Thanks,
> > Eduard
> > 
> > [1] https://github.com/llvm/llvm-project/compare/main...eddyz87:llvm-project:is-type-defined-experiment
> > [2] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf
> > 
> > 
> > > 
> > > > 
> > > > > 
> > > > > > 
> > > > > > BTW, I suggest splitting libbpf btf_dedup and btf_dump changes into a
> > > > > > separate series and sending them as non-RFC sooner. Those improvements
> > > > > > are independent of all the header guards stuff, let's get them landed
> > > > > > sooner.
> > > > > > 
> > > > > > > After some discussion with Alexei and Yonghong I'd like to request
> > > > > > > your comments regarding a somewhat brittle and partial solution to
> > > > > > > this issue that relies on adding `#ifndef FOO_H ... #endif` guards in
> > > > > > > the generated `vmlinux.h`.
> > > > > > > 
> > > > > > 
> > > > > > [...]
> > > > > > 
> > > > > > > Eduard Zingerman (12):
> > > > > > >      libbpf: Deduplicate unambigous standalone forward declarations
> > > > > > >      selftests/bpf: Tests for standalone forward BTF declarations
> > > > > > >        deduplication
> > > > > > >      libbpf: Support for BTF_DECL_TAG dump in C format
> > > > > > >      selftests/bpf: Tests for BTF_DECL_TAG dump in C format
> > > > > > >      libbpf: Header guards for selected data structures in vmlinux.h
> > > > > > >      selftests/bpf: Tests for header guards printing in BTF dump
> > > > > > >      bpftool: Enable header guards generation
> > > > > > >      kbuild: Script to infer header guard values for uapi headers
> > > > > > >      kbuild: Header guards for types from include/uapi/*.h in kernel BTF
> > > > > > >      selftests/bpf: Script to verify uapi headers usage with vmlinux.h
> > > > > > >      selftests/bpf: Known good uapi headers for test_uapi_headers.py
> > > > > > >      selftests/bpf: script for infer_header_guards.pl testing
> > > > > > > 
> > > > > > >     scripts/infer_header_guards.pl                | 191 +++++
> > > > > > >     scripts/link-vmlinux.sh                       |  13 +-
> > > > > > >     tools/bpf/bpftool/btf.c                       |   4 +-
> > > > > > >     tools/lib/bpf/btf.c                           | 178 ++++-
> > > > > > >     tools/lib/bpf/btf.h                           |   7 +-
> > > > > > >     tools/lib/bpf/btf_dump.c                      | 232 +++++-
> > > > > > >     .../selftests/bpf/good_uapi_headers.txt       | 677 ++++++++++++++++++
> > > > > > >     tools/testing/selftests/bpf/prog_tests/btf.c  | 152 ++++
> > > > > > >     .../selftests/bpf/prog_tests/btf_dump.c       |  11 +-
> > > > > > >     .../bpf/progs/btf_dump_test_case_decl_tag.c   |  39 +
> > > > > > >     .../progs/btf_dump_test_case_header_guards.c  |  94 +++
> > > > > > >     .../bpf/test_uapi_header_guards_infer.sh      |  33 +
> > > > > > >     .../selftests/bpf/test_uapi_headers.py        | 197 +++++
> > > > > > >     13 files changed, 1816 insertions(+), 12 deletions(-)
> > > > > > >     create mode 100755 scripts/infer_header_guards.pl
> > > > > > >     create mode 100644 tools/testing/selftests/bpf/good_uapi_headers.txt
> > > > > > >     create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_decl_tag.c
> > > > > > >     create mode 100644 tools/testing/selftests/bpf/progs/btf_dump_test_case_header_guards.c
> > > > > > >     create mode 100755 tools/testing/selftests/bpf/test_uapi_header_guards_infer.sh
> > > > > > >     create mode 100755 tools/testing/selftests/bpf/test_uapi_headers.py
> > > > > > > 
> > > > > > > --
> > > > > > > 2.34.1
> > > > > > > 
> >
Alexei Starovoitov Nov. 14, 2022, 9:50 p.m. UTC | #17
On Mon, Nov 14, 2022 at 1:13 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Sun, 2022-11-13 at 23:52 -0800, Yonghong Song wrote:
> >
> > On 11/11/22 1:55 PM, Eduard Zingerman wrote:
> > > On Fri, 2022-10-28 at 11:56 -0700, Yonghong Song wrote:
> > > > > > [...]
> > > > >
> > > > > Ok, could we change the problem to detecting if some type is defined.
> > > > > Would it be possible to have something like
> > > > >
> > > > > #if !__is_type_defined(struct abc)
> > > > > struct abc {
> > > > > };
> > > > > #endif
> > > > >
> > > > > I think we talked about this and there were problems with this
> > > > > approach, but I don't remember details and how insurmountable the
> > > > > problem is. Having a way to check whether some type is defined would
> > > > > be very useful even outside of -target bpf parlance, though, so maybe
> > > > > it's the problem worth attacking?
> > > >
> > > > Yes, we discussed this before. This will need to add additional work
> > > > in preprocessor. I just made a discussion topic in llvm discourse
> > > >
> > > > https://discourse.llvm.org/t/add-a-type-checking-macro-is-type-defined-type/66268
> > > >
> > > > Let us see whether we can get some upstream agreement or not.
> > >
> > > I did a small investigation of this feature.
> > >
> > > The main pre-requirement is construction of the symbol table during
> > > source code pre-processing, which implies necessity to parse the
> > > source code at the same time. It is technically possible in clang, as
> > > lexing, pre-processing and AST construction happens at the same time
> > > when in compilation mode.
> > >
> > > The prototype is available here [1], it includes:
> > > - Change in the pre-processor that adds an optional callback
> > >    "IsTypeDefinedFn" & necessary parsing of __is_type_defined
> > >    construct.
> > > - Change in Sema module (responsible for parsing/AST & symbol table)
> > >    that installs the appropriate "IsTypeDefinedFn" in the pre-processor
> > >    instance.
> > >
> > > However, this prototype builds a backward dependency between
> > > pre-processor and semantic analysis. There are currently no such
> > > dependencies in the clang code base.
> > >
> > > This makes it impossible to do pre-processing and compilation
> > > separately, e.g. consider the following example:
> > >
> > > $ cat test.c
> > >
> > >    struct foo { int x; };
> > >
> > >    #if __is_type_defined(foo)
> > >      const int x = 1;
> > >    #else
> > >      const int x = 2;
> > >    #endif
> > >
> > > $ clang -cc1 -ast-print test.c -o -
> > >
> > >    struct foo {
> > >        int x;
> > >    };
> > >    const int x = 1;
> > >
> > > $ clang -E test.c -o -
> > >
> > >    # ... some line directives ...
> > >    struct foo { int x; };
> > >    const int x = 2;
> >
> > Is it any chance '-E' could output the same one as '-cc1 -ast-print'?
> > That is, even with -E we could do some semantics analysis
> > as well, using either current clang semantics analysis or creating
> > an minimal version of sema analysis in preprocessor itself?
>
> Sema drives consumption of tokens from Preprocessor. Calls to
> Preprocessor are done on a parsing recursive descent. Extracting a
> stream of tokens would require an incremental parser instead.
>
> A minimal version of such parser is possible to implement for C.
> It might be the case that matching open / closing braces and
> identifiers following 'struct' / 'union' / 'enum' keywords might be
> almost sufficient but I need to try to be sure (e.g. it is more
> complex for 'typedef').
>
> I can work on it but I don't think there is a chance to upstream this work.

Right. It's going to be C only.
C++ with namespaces and nested class decls won't work with simple
type parser.

On the other side if we're asking preprocessor to look for
'struct foo' and remember that 'foo' is a type
maybe we can add a regex-search instead?
It would be a bit more generic and will work for basic
union/struct foo definition?
Something like instead of:
#if __is_type_defined(foo)
use:
#if regex(struct[\t]+foo)

enums are harder in this approach, but higher chance to land?

regex() would mean "search for this pattern in the file until this line.

Or some other preprocessor "language" tricks?

For example:
The preprocessor would grep for 'struct *' in a single line
while processing a file and emit #define __secret_prefix_##$1
where $1 would be a capture from "single line regex".
Then later in the same file instead of:
#if __is_type_defined(foo)
use:
#ifdef __secret_prefix_foo

This "single line regex" may look like:
#if regex_in_any_later_line(struct[\t]+[a-zA-Z_]+) define __secret_prefix_$2
Eduard Zingerman Nov. 16, 2022, 2:01 a.m. UTC | #18
On Mon, 2022-11-14 at 13:50 -0800, Alexei Starovoitov wrote:
> On Mon, Nov 14, 2022 at 1:13 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > 
> > On Sun, 2022-11-13 at 23:52 -0800, Yonghong Song wrote:
> > > 
> > > On 11/11/22 1:55 PM, Eduard Zingerman wrote:
> > > > On Fri, 2022-10-28 at 11:56 -0700, Yonghong Song wrote:
> > > > > > > [...]
> > > > > > 
> > > > > > Ok, could we change the problem to detecting if some type is defined.
> > > > > > Would it be possible to have something like
> > > > > > 
> > > > > > #if !__is_type_defined(struct abc)
> > > > > > struct abc {
> > > > > > };
> > > > > > #endif
> > > > > > 
> > > > > > I think we talked about this and there were problems with this
> > > > > > approach, but I don't remember details and how insurmountable the
> > > > > > problem is. Having a way to check whether some type is defined would
> > > > > > be very useful even outside of -target bpf parlance, though, so maybe
> > > > > > it's the problem worth attacking?
> > > > > 
> > > > > Yes, we discussed this before. This will need to add additional work
> > > > > in preprocessor. I just made a discussion topic in llvm discourse
> > > > > 
> > > > > https://discourse.llvm.org/t/add-a-type-checking-macro-is-type-defined-type/66268
> > > > > 
> > > > > Let us see whether we can get some upstream agreement or not.
> > > > 
> > > > I did a small investigation of this feature.
> > > > 
> > > > The main pre-requirement is construction of the symbol table during
> > > > source code pre-processing, which implies necessity to parse the
> > > > source code at the same time. It is technically possible in clang, as
> > > > lexing, pre-processing and AST construction happens at the same time
> > > > when in compilation mode.
> > > > 
> > > > The prototype is available here [1], it includes:
> > > > - Change in the pre-processor that adds an optional callback
> > > >    "IsTypeDefinedFn" & necessary parsing of __is_type_defined
> > > >    construct.
> > > > - Change in Sema module (responsible for parsing/AST & symbol table)
> > > >    that installs the appropriate "IsTypeDefinedFn" in the pre-processor
> > > >    instance.
> > > > 
> > > > However, this prototype builds a backward dependency between
> > > > pre-processor and semantic analysis. There are currently no such
> > > > dependencies in the clang code base.
> > > > 
> > > > This makes it impossible to do pre-processing and compilation
> > > > separately, e.g. consider the following example:
> > > > 
> > > > $ cat test.c
> > > > 
> > > >    struct foo { int x; };
> > > > 
> > > >    #if __is_type_defined(foo)
> > > >      const int x = 1;
> > > >    #else
> > > >      const int x = 2;
> > > >    #endif
> > > > 
> > > > $ clang -cc1 -ast-print test.c -o -
> > > > 
> > > >    struct foo {
> > > >        int x;
> > > >    };
> > > >    const int x = 1;
> > > > 
> > > > $ clang -E test.c -o -
> > > > 
> > > >    # ... some line directives ...
> > > >    struct foo { int x; };
> > > >    const int x = 2;
> > > 
> > > Is it any chance '-E' could output the same one as '-cc1 -ast-print'?
> > > That is, even with -E we could do some semantics analysis
> > > as well, using either current clang semantics analysis or creating
> > > an minimal version of sema analysis in preprocessor itself?
> > 
> > Sema drives consumption of tokens from Preprocessor. Calls to
> > Preprocessor are done on a parsing recursive descent. Extracting a
> > stream of tokens would require an incremental parser instead.
> > 
> > A minimal version of such parser is possible to implement for C.
> > It might be the case that matching open / closing braces and
> > identifiers following 'struct' / 'union' / 'enum' keywords might be
> > almost sufficient but I need to try to be sure (e.g. it is more
> > complex for 'typedef').
> > 
> > I can work on it but I don't think there is a chance to upstream this work.
> 
> Right. It's going to be C only.
> C++ with namespaces and nested class decls won't work with simple
> type parser.
> 
> On the other side if we're asking preprocessor to look for
> 'struct foo' and remember that 'foo' is a type
> maybe we can add a regex-search instead?
> It would be a bit more generic and will work for basic
> union/struct foo definition?
> Something like instead of:
> #if __is_type_defined(foo)
> use:
> #if regex(struct[\t]+foo)
> 
> enums are harder in this approach, but higher chance to land?
> 
> regex() would mean "search for this pattern in the file until this line.
> 
> Or some other preprocessor "language" tricks?
> 

I talked to Yonhong today and he suggests to investigate whether pre-processor
changes could be made BPF target specific. E.g. there are extension points
in the clang pre-processor right now but those for tooling. There might be
a way to extend this mechanism to allow target specific pre-processor behavior.
I'll take a look and write another email here.

> For example:
> The preprocessor would grep for 'struct *' in a single line
> while processing a file and emit #define __secret_prefix_##$1
> where $1 would be a capture from "single line regex".
> Then later in the same file instead of:
> #if __is_type_defined(foo)
> use:
> #ifdef __secret_prefix_foo
> 
> This "single line regex" may look like:
> #if regex_in_any_later_line(struct[\t]+[a-zA-Z_]+) define __secret_prefix_$2