mbox series

[v4,bpf-next,00/11] bpf: support resilient split BTF

Message ID 20240517102246.4070184-1-alan.maguire@oracle.com (mailing list archive)
Headers show
Series bpf: support resilient split BTF | expand

Message

Alan Maguire May 17, 2024, 10:22 a.m. UTC
Split BPF Type Format (BTF) provides huge advantages in that kernel
modules only have to provide type information for types that they do not
share with the core kernel; for core kernel types, split BTF refers to
core kernel BTF type ids.  So for a STRUCT sk_buff, a module that
uses that structure (or a pointer to it) simply needs to refer to the
core kernel type id, saving the need to define the structure and its many
dependents.  This cuts down on duplication and makes BTF as compact
as possible.

However, there is a downside.  This scheme requires the references from
split BTF to base BTF to be valid not just at encoding time, but at use
time (when the module is loaded).  Even a small change in kernel types
can perturb the type ids in core kernel BTF, and due to pahole's
parallel processing of compilation units, even an unchanged kernel can
have different type ids if BTF is re-generated.  So we have a robustness
problem for split BTF for cases where a module is not always compiled at
the same time as the kernel.  This problem is particularly acute for
distros which generally want module builders to be able to compile a
module for the lifetime of a Linux stable-based release, and have it
continue to be valid over the lifetime of that release, even as changes
in data structures (and hence BTF types) accrue.  Today it's not
possible to generate BTF for modules that works beyond the initial
kernel it is compiled against - kernel bugfixes etc invalidate the split
BTF references to vmlinux BTF, and BTF is no longer usable for the
module.

The goal of this series is to provide options to provide additional
context for cases like this.  That context comes in the form of
distilled base BTF; it stands in for the base BTF, and contains
information about the types referenced from split BTF, but not their
full descriptions.  The modified split BTF will refer to type ids in
this .BTF.base section, and when the kernel loads such modules it
will use that .BTF.base to map references from split BTF to the
equivalent current vmlinux base BTF types.  Once this relocation
process has succeeded, the module BTF available in /sys/kernel/btf
will look exactly as if it was built with the current vmlinux;
references to base types will be fixed up etc.

A module builder - using this series along with the pahole changes -
can then build a module with distilled base BTF via an out-of-tree
module build, i.e.

make -C . M=path/2/module

The module will have a .BTF section (the split BTF) and a
.BTF.base section.  The latter is small in size - distilled base
BTF does not need full struct/union/enum information for named
types for example.  For 2667 modules built with distilled base BTF,
the average size observed was 1556 bytes (stddev 1563).  The overall
size added to this 2667 modules was 5.3Mb.

Note that for the in-tree modules, this approach is not needed as
split and base BTF in the case of in-tree modules are always built
and re-built together.

The series first focuses on generating split BTF with distilled base
BTF, and provides btf__parse_opts() which allows specification
of the section name from which to read BTF data, since we now have
both .BTF and .BTF.base sections that can contain such data.

Then we add support to resolve_btfids for generating the .BTF.ids
section with reference to the .BTF.base section - this ensures the
.BTF.ids match those used in the split/base BTF.

Finally the series provides the mechanism for relocating split BTF with
a new base; the distilled base BTF is used to map the references to base
BTF in the split BTF to the new base.  For the kernel, this relocation
process happens at module load time, and we relocate split BTF
references to point at types in the current vmlinux BTF.  As part of
this, .BTF.ids references need to be mapped also.

So concretely, what happens is

- we generate split BTF in the .BTF section of a module that refers to
  types in the .BTF.base section as base types; the latter are not full
  type descriptions but provide information about the base type.  So
  a STRUCT sk_buff would be represented as a FWD struct sk_buff in
  distilled base BTF for example.
- when the module is loaded, the split BTF is relocated with vmlinux
  BTF; in the case of the FWD struct sk_buff, we find the STRUCT sk_buff
  in vmlinux BTF and map all split BTF references to the distilled base
  FWD sk_buff, replacing them with references to the vmlinux BTF
  STRUCT sk_buff.

Support is also added to bpftool to be able to display split BTF
relative to its .BTF.base section, and also to display the relocated
form via the "-R path_to_base_btf".

A previous approach to this problem [1] utilized standalone BTF for such
cases - where the BTF is not defined relative to base BTF so there is no
relocation required.  The problem with that approach is that from
the verifier perspective, some types are special, and having a custom
representation of a core kernel type that did not necessarily match the
current representation is not tenable.  So the approach taken here was
to preserve the split BTF model while minimizing the representation of
the context needed to relocate split and current vmlinux BTF.

To generate distilled .BTF.base sections the associated dwarves
patch (to be applied on the "next" branch there) is needed.
Without it, things will still work but modules will not be built
with a .BTF.base section.

Changes since v3[3]:

- distill now checks for duplicate-named struct/unions and records
  them as a sized struct/union to help identify which of the
  multiple base BTF structs/unions it refers to (Eduard, patch 1)
- added test support for multiple name handling (Eduard, patch 2)
- simplified the string mapping when updating split BTF to use
  base BTF instead of distilled base.  Since the only string
  references split BTF can make to base BTF are the names of
  the base types, create a string map from distilled string
  offset -> base BTF string offset and update string offsets
  by visiting all strings in split BTF; this saves having to
  do costly searches of base BTF (Eduard, patch 7,10)
- fixed bpftool manpage and indentation issues (Quentin, patch 11)

Also explored Eduard's suggestion of doing an implicit fallback
to checking for .BTF.base section in btf__parse() when it is
called to get base BTF.  However while it is doable, it turned
out to be difficult operationally.  Since fallback is implicit
we do not know the source of the BTF - was it from .BTF or
.BTF.base? In bpftool, we want to try first standalone BTF,
then split, then split with distilled base.  Having a way
to explicitly request .BTF.base via btf__parse_opts() fits
that model better.

Changes since v2[4]:

- submitted patch to use --btf_features in Makefile.btf for pahole
  v1.26 and later separately (Andrii).  That has landed in bpf-next
  now.
- distilled base now encodes ENUM64 as fwd ENUM (size 8), eliminating
  the need for support for ENUM64 in btf__add_fwd (patch 1, Andrii)
- moved to distilling only named types, augmenting split BTF with
  associated reference types; this simplifies greatly the distilled
  base BTF and the mapping operation between distilled and base
  BTF when relocating (most of the series changes, Andrii)
- relocation now iterates over base BTF, looking for matches based
  on name in distilled BTF.  Distilled BTF is pre-sorted by name
  (Andrii, patch 8)
- removed most redundant compabitiliby checks aside from struct
  size for base types/embedded structs and kind compatibility
  (since we only match on name) (Andrii, patch 8)
- btf__parse_opts() now replaces btf_parse() internally in libbpf
  (Eduard, patch 3)

Changes since RFC [5]:

- updated terminology; we replace clunky "base reference" BTF with
  distilling base BTF into a .BTF.base section. Similarly BTF
  reconcilation becomes BTF relocation (Andrii, most patches)
- add distilled base BTF by default for out-of-tree modules
  (Alexei, patch 8)
- distill algorithm updated to record size of embedded struct/union
  by recording it as a 0-vlen STRUCT/UNION with size preserved
  (Andrii, patch 2)
- verify size match on relocation for such STRUCT/UNIONs (Andrii,
  patch 9)
- with embedded STRUCT/UNION recording size, we can have bpftool
  dump a header representation using .BTF.base + .BTF sections
  rather than special-casing and refusing to use "format c" for
  that case (patch 5)
- match enum with enum64 and vice versa (Andrii, patch 9)
- ensure that resolve_btfids works with BTF without .BTF.base
  section (patch 7)
- update tests to cover embedded types, arrays and function
  prototypes (patches 3, 12)

[1] https://lore.kernel.org/bpf/20231112124834.388735-14-alan.maguire@oracle.com/
[2] https://lore.kernel.org/bpf/20240501175035.2476830-1-alan.maguire@oracle.com/
[3] https://lore.kernel.org/bpf/20240510103052.850012-1-alan.maguire@oracle.com/
[4] https://lore.kernel.org/bpf/20240424154806.3417662-1-alan.maguire@oracle.com/
[5] https://lore.kernel.org/bpf/20240322102455.98558-1-alan.maguire@oracle.com/

Alan Maguire (11):
  libbpf: add btf__distill_base() creating split BTF with distilled base
    BTF
  selftests/bpf: test distilled base, split BTF generation
  libbpf: add btf__parse_opts() API for flexible BTF parsing
  bpftool: support displaying raw split BTF using base BTF section as
    base
  resolve_btfids: use .BTF.base ELF section as base BTF if -B option is
    used
  kbuild, bpf: add module-specific pahole/resolve_btfids flags for
    distilled base BTF
  libbpf: split BTF relocation
  selftests/bpf: extend distilled BTF tests to cover BTF relocation
  module, bpf: store BTF base pointer in struct module
  libbpf,bpf: share BTF relocate-related code with kernel
  bpftool: support displaying relocated-with-base split BTF

 include/linux/btf.h                           |  45 ++
 include/linux/module.h                        |   2 +
 kernel/bpf/Makefile                           |   8 +
 kernel/bpf/btf.c                              | 166 +++--
 kernel/module/main.c                          |   5 +-
 scripts/Makefile.btf                          |   7 +
 scripts/Makefile.modfinal                     |   4 +-
 .../bpf/bpftool/Documentation/bpftool-btf.rst |  15 +-
 tools/bpf/bpftool/bash-completion/bpftool     |   7 +-
 tools/bpf/bpftool/btf.c                       |  19 +-
 tools/bpf/bpftool/main.c                      |  14 +-
 tools/bpf/bpftool/main.h                      |   2 +
 tools/bpf/resolve_btfids/main.c               |  28 +-
 tools/lib/bpf/Build                           |   2 +-
 tools/lib/bpf/btf.c                           | 605 +++++++++++++-----
 tools/lib/bpf/btf.h                           |  59 ++
 tools/lib/bpf/btf_common.c                    | 143 +++++
 tools/lib/bpf/btf_relocate.c                  | 341 ++++++++++
 tools/lib/bpf/libbpf.map                      |   3 +
 tools/lib/bpf/libbpf_internal.h               |   3 +
 .../selftests/bpf/prog_tests/btf_distill.c    | 346 ++++++++++
 21 files changed, 1612 insertions(+), 212 deletions(-)
 create mode 100644 tools/lib/bpf/btf_common.c
 create mode 100644 tools/lib/bpf/btf_relocate.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/btf_distill.c

Comments

Alan Maguire May 17, 2024, 11:56 a.m. UTC | #1
On 17/05/2024 12:11, Arnaldo Carvalho de Melo wrote:
> On Fri, May 17, 2024, 7:23 AM Alan Maguire <alan.maguire@oracle.com
> <mailto:alan.maguire@oracle.com>> wrote:
> 
>     Split BPF Type Format (BTF) provides huge advantages in that kernel
>     modules only have to provide type information for types that they do not
>     share with the core kernel; for core kernel types, split BTF refers to
>     core kernel BTF type ids.  So for a STRUCT sk_buff, a module that
>     uses that structure (or a pointer to it) simply needs to refer to the
>     core kernel type id, saving the need to define the structure and its
>     many
>     dependents.  This cuts down on duplication and makes BTF as compact
>     as possible.
> 
>     However, there is a downside.  This scheme requires the references from
>     split BTF to base BTF to be valid not just at encoding time, but at use
>     time (when the module is loaded).  Even a small change in kernel types
>     can perturb the type ids in core kernel BTF, and due to pahole's
>     parallel processing of compilation units, even an unchanged kernel can
>     have different type ids if BTF is re-generated.
> 
> 
> 
> I think it would be informative to mention the recently added
> "reproducible_build" feature, i.e. rephrase to "... if the
> reproducible_build isn't selected via --btf_features..." in the relevant
> documentation.
>

Yeah, sorry this part should have been updated after the
reproducible_build feature landed.

> - Arnaldo
> 
> Sent from smartphone, still on my way back home from LSF/MM+BPF
> 
>       So we have a robustness
>     problem for split BTF for cases where a module is not always compiled at
>     the same time as the kernel.  This problem is particularly acute for
>     distros which generally want module builders to be able to compile a
>     module for the lifetime of a Linux stable-based release, and have it
>     continue to be valid over the lifetime of that release, even as changes
>     in data structures (and hence BTF types) accrue.  Today it's not
>     possible to generate BTF for modules that works beyond the initial
>     kernel it is compiled against - kernel bugfixes etc invalidate the split
>     BTF references to vmlinux BTF, and BTF is no longer usable for the
>     module.
> 
>     The goal of this series is to provide options to provide additional
>     context for cases like this.  That context comes in the form of
>     distilled base BTF; it stands in for the base BTF, and contains
>     information about the types referenced from split BTF, but not their
>     full descriptions.  The modified split BTF will refer to type ids in
>     this .BTF.base section, and when the kernel loads such modules it
>     will use that .BTF.base to map references from split BTF to the
>     equivalent current vmlinux base BTF types.  Once this relocation
>     process has succeeded, the module BTF available in /sys/kernel/btf
>     will look exactly as if it was built with the current vmlinux;
>     references to base types will be fixed up etc.
> 
>     A module builder - using this series along with the pahole changes -
>     can then build a module with distilled base BTF via an out-of-tree
>     module build, i.e.
> 
>     make -C . M=path/2/module
> 
>     The module will have a .BTF section (the split BTF) and a
>     .BTF.base section.  The latter is small in size - distilled base
>     BTF does not need full struct/union/enum information for named
>     types for example.  For 2667 modules built with distilled base BTF,
>     the average size observed was 1556 bytes (stddev 1563).  The overall
>     size added to this 2667 modules was 5.3Mb.
> 
>     Note that for the in-tree modules, this approach is not needed as
>     split and base BTF in the case of in-tree modules are always built
>     and re-built together.
> 
>     The series first focuses on generating split BTF with distilled base
>     BTF, and provides btf__parse_opts() which allows specification
>     of the section name from which to read BTF data, since we now have
>     both .BTF and .BTF.base sections that can contain such data.
> 
>     Then we add support to resolve_btfids for generating the .BTF.ids
>     section with reference to the .BTF.base section - this ensures the
>     .BTF.ids match those used in the split/base BTF.
> 
>     Finally the series provides the mechanism for relocating split BTF with
>     a new base; the distilled base BTF is used to map the references to base
>     BTF in the split BTF to the new base.  For the kernel, this relocation
>     process happens at module load time, and we relocate split BTF
>     references to point at types in the current vmlinux BTF.  As part of
>     this, .BTF.ids references need to be mapped also.
> 
>     So concretely, what happens is
> 
>     - we generate split BTF in the .BTF section of a module that refers to
>       types in the .BTF.base section as base types; the latter are not full
>       type descriptions but provide information about the base type.  So
>       a STRUCT sk_buff would be represented as a FWD struct sk_buff in
>       distilled base BTF for example.
>     - when the module is loaded, the split BTF is relocated with vmlinux
>       BTF; in the case of the FWD struct sk_buff, we find the STRUCT sk_buff
>       in vmlinux BTF and map all split BTF references to the distilled base
>       FWD sk_buff, replacing them with references to the vmlinux BTF
>       STRUCT sk_buff.
> 
>     Support is also added to bpftool to be able to display split BTF
>     relative to its .BTF.base section, and also to display the relocated
>     form via the "-R path_to_base_btf".
> 
>     A previous approach to this problem [1] utilized standalone BTF for such
>     cases - where the BTF is not defined relative to base BTF so there is no
>     relocation required.  The problem with that approach is that from
>     the verifier perspective, some types are special, and having a custom
>     representation of a core kernel type that did not necessarily match the
>     current representation is not tenable.  So the approach taken here was
>     to preserve the split BTF model while minimizing the representation of
>     the context needed to relocate split and current vmlinux BTF.
> 
>     To generate distilled .BTF.base sections the associated dwarves
>     patch (to be applied on the "next" branch there) is needed.
>     Without it, things will still work but modules will not be built
>     with a .BTF.base section.
> 
>     Changes since v3[3]:
> 
>     - distill now checks for duplicate-named struct/unions and records
>       them as a sized struct/union to help identify which of the
>       multiple base BTF structs/unions it refers to (Eduard, patch 1)
>     - added test support for multiple name handling (Eduard, patch 2)
>     - simplified the string mapping when updating split BTF to use
>       base BTF instead of distilled base.  Since the only string
>       references split BTF can make to base BTF are the names of
>       the base types, create a string map from distilled string
>       offset -> base BTF string offset and update string offsets
>       by visiting all strings in split BTF; this saves having to
>       do costly searches of base BTF (Eduard, patch 7,10)
>     - fixed bpftool manpage and indentation issues (Quentin, patch 11)
> 
>     Also explored Eduard's suggestion of doing an implicit fallback
>     to checking for .BTF.base section in btf__parse() when it is
>     called to get base BTF.  However while it is doable, it turned
>     out to be difficult operationally.  Since fallback is implicit
>     we do not know the source of the BTF - was it from .BTF or
>     .BTF.base? In bpftool, we want to try first standalone BTF,
>     then split, then split with distilled base.  Having a way
>     to explicitly request .BTF.base via btf__parse_opts() fits
>     that model better.
> 
>     Changes since v2[4]:
> 
>     - submitted patch to use --btf_features in Makefile.btf for pahole
>       v1.26 and later separately (Andrii).  That has landed in bpf-next
>       now.
>     - distilled base now encodes ENUM64 as fwd ENUM (size 8), eliminating
>       the need for support for ENUM64 in btf__add_fwd (patch 1, Andrii)
>     - moved to distilling only named types, augmenting split BTF with
>       associated reference types; this simplifies greatly the distilled
>       base BTF and the mapping operation between distilled and base
>       BTF when relocating (most of the series changes, Andrii)
>     - relocation now iterates over base BTF, looking for matches based
>       on name in distilled BTF.  Distilled BTF is pre-sorted by name
>       (Andrii, patch 8)
>     - removed most redundant compabitiliby checks aside from struct
>       size for base types/embedded structs and kind compatibility
>       (since we only match on name) (Andrii, patch 8)
>     - btf__parse_opts() now replaces btf_parse() internally in libbpf
>       (Eduard, patch 3)
> 
>     Changes since RFC [5]:
> 
>     - updated terminology; we replace clunky "base reference" BTF with
>       distilling base BTF into a .BTF.base section. Similarly BTF
>       reconcilation becomes BTF relocation (Andrii, most patches)
>     - add distilled base BTF by default for out-of-tree modules
>       (Alexei, patch 8)
>     - distill algorithm updated to record size of embedded struct/union
>       by recording it as a 0-vlen STRUCT/UNION with size preserved
>       (Andrii, patch 2)
>     - verify size match on relocation for such STRUCT/UNIONs (Andrii,
>       patch 9)
>     - with embedded STRUCT/UNION recording size, we can have bpftool
>       dump a header representation using .BTF.base + .BTF sections
>       rather than special-casing and refusing to use "format c" for
>       that case (patch 5)
>     - match enum with enum64 and vice versa (Andrii, patch 9)
>     - ensure that resolve_btfids works with BTF without .BTF.base
>       section (patch 7)
>     - update tests to cover embedded types, arrays and function
>       prototypes (patches 3, 12)
> 
>     [1]
>     https://lore.kernel.org/bpf/20231112124834.388735-14-alan.maguire@oracle.com/ <https://lore.kernel.org/bpf/20231112124834.388735-14-alan.maguire@oracle.com/>
>     [2]
>     https://lore.kernel.org/bpf/20240501175035.2476830-1-alan.maguire@oracle.com/ <https://lore.kernel.org/bpf/20240501175035.2476830-1-alan.maguire@oracle.com/>
>     [3]
>     https://lore.kernel.org/bpf/20240510103052.850012-1-alan.maguire@oracle.com/ <https://lore.kernel.org/bpf/20240510103052.850012-1-alan.maguire@oracle.com/>
>     [4]
>     https://lore.kernel.org/bpf/20240424154806.3417662-1-alan.maguire@oracle.com/ <https://lore.kernel.org/bpf/20240424154806.3417662-1-alan.maguire@oracle.com/>
>     [5]
>     https://lore.kernel.org/bpf/20240322102455.98558-1-alan.maguire@oracle.com/ <https://lore.kernel.org/bpf/20240322102455.98558-1-alan.maguire@oracle.com/>
> 
>     Alan Maguire (11):
>       libbpf: add btf__distill_base() creating split BTF with distilled base
>         BTF
>       selftests/bpf: test distilled base, split BTF generation
>       libbpf: add btf__parse_opts() API for flexible BTF parsing
>       bpftool: support displaying raw split BTF using base BTF section as
>         base
>       resolve_btfids: use .BTF.base ELF section as base BTF if -B option is
>         used
>       kbuild, bpf: add module-specific pahole/resolve_btfids flags for
>         distilled base BTF
>       libbpf: split BTF relocation
>       selftests/bpf: extend distilled BTF tests to cover BTF relocation
>       module, bpf: store BTF base pointer in struct module
>       libbpf,bpf: share BTF relocate-related code with kernel
>       bpftool: support displaying relocated-with-base split BTF
> 
>      include/linux/btf.h                           |  45 ++
>      include/linux/module.h                        |   2 +
>      kernel/bpf/Makefile                           |   8 +
>      kernel/bpf/btf.c                              | 166 +++--
>      kernel/module/main.c                          |   5 +-
>      scripts/Makefile.btf                          |   7 +
>      scripts/Makefile.modfinal                     |   4 +-
>      .../bpf/bpftool/Documentation/bpftool-btf.rst |  15 +-
>      tools/bpf/bpftool/bash-completion/bpftool     |   7 +-
>      tools/bpf/bpftool/btf.c                       |  19 +-
>      tools/bpf/bpftool/main.c                      |  14 +-
>      tools/bpf/bpftool/main.h                      |   2 +
>      tools/bpf/resolve_btfids/main.c               |  28 +-
>      tools/lib/bpf/Build                           |   2 +-
>      tools/lib/bpf/btf.c                           | 605 +++++++++++++-----
>      tools/lib/bpf/btf.h                           |  59 ++
>      tools/lib/bpf/btf_common.c                    | 143 +++++
>      tools/lib/bpf/btf_relocate.c                  | 341 ++++++++++
>      tools/lib/bpf/libbpf.map                      |   3 +
>      tools/lib/bpf/libbpf_internal.h               |   3 +
>      .../selftests/bpf/prog_tests/btf_distill.c    | 346 ++++++++++
>      21 files changed, 1612 insertions(+), 212 deletions(-)
>      create mode 100644 tools/lib/bpf/btf_common.c
>      create mode 100644 tools/lib/bpf/btf_relocate.c
>      create mode 100644 tools/testing/selftests/bpf/prog_tests/btf_distill.c
> 
>     -- 
>     2.31.1
> 
>
Eduard Zingerman May 17, 2024, 9:09 p.m. UTC | #2
On Fri, 2024-05-17 at 11:22 +0100, Alan Maguire wrote:
[...]

> Changes since v3[3]:
> 
> - distill now checks for duplicate-named struct/unions and records
>   them as a sized struct/union to help identify which of the
>   multiple base BTF structs/unions it refers to (Eduard, patch 1)

Hi Alan,

Sorry, a little bit more on this topic.
- In patch #1 two kinds of structs get BTF_KIND_STRUCT declaration
  with size: those embedded and those with ambiguous name.
- In patch #7 btf_relocate_map_distilled_base() unconditionally
  requires size field for BTF_KIND_STRUCT to match.

This might hinder portability in the following scenario:
- base type is referred to only by pointer;
- base type has ambiguous name (in the old kernel used for base BTF
  generation);
- base type size is different in the new kernel when module is loaded.

There is also a scenario when type name is not ambiguous in the old
kernel, but becomes ambiguous in the new kernel.

So, what I had in mind when commented in v3 was:
- if distilled base has FWD for a structure and an ambiguous name,
  fail to relocate;
- if distilled base has STRUCT for a structure, find a unique pair
  name/size or fail to relocate.

This covers scenario #1 but ignores scenario #2 and requires minimal
changes for v3 design.

An alternative would be to e.g. keep STRUCT with size for all
structures in the base BTF and to compute "embedded" flag during
relocation:
- if distilled base STRUCT is embedded, search for a unique pair
  name/size or fail to relocate;
- if distilled base STRUCT is not embedded, search for a uniquely
  named struct, if that fails search for a unique pair name/size,
  or fail to relocate.

If we consider above to much of a hassle, I think v3 design + size
check for STRUCT is better because it is a bit simpler.

Wdyt?

[...]
Eduard Zingerman May 18, 2024, 2:38 a.m. UTC | #3
On Fri, 2024-05-17 at 11:22 +0100, Alan Maguire wrote:

(Also, please note that CI fails for this series).

[...]

> Also explored Eduard's suggestion of doing an implicit fallback
> to checking for .BTF.base section in btf__parse() when it is
> called to get base BTF.  However while it is doable, it turned
> out to be difficult operationally.  Since fallback is implicit
> we do not know the source of the BTF - was it from .BTF or
> .BTF.base? In bpftool, we want to try first standalone BTF,
> then split, then split with distilled base.  Having a way
> to explicitly request .BTF.base via btf__parse_opts() fits
> that model better.

I don't think this is the case. Here is what I mean:
https://github.com/eddyz87/bpf/tree/distilled-base-alternative-parse-elf

The branch above is a modification for btf_parse_elf() and a few
reverts on top of this patch-set.

I modified btf_parse_elf() to follow the logic below:

| base_btf   | .BTF.base | Effect                                      |
| specified? | present?  |                                             |
|------------+-----------+---------------------------------------------|
| no         | no        | load btf from .BTF                          |
|------------+-----------+---------------------------------------------|
| yes        | no        | load btf from .BTF using base_btf as base   |
|            |           |                                             |
|------------+-----------+---------------------------------------------|
| no         | yes       | load btf from .BTF using .BTF.base as base  |
|            |           |                                             |
|------------+-----------+---------------------------------------------|
| yes        | yes       | load btf from .BTF using .BTF.base as base, |
|            |           | relocate btf against base_btf               |

When organized like that, there is no need to modify libbpf clients to
work with split BTF.

The `bpftool btf dump file ./btf_testmod.ko` would print non-relocated BTF.
The `bpftool btf -B ../../../vmlinux dump file ./btf_testmod.ko` would
print relocated BTF, no need for separate -R flag.
Imo, loading split BTF w/o relocation when .BTF.base is present
is interesting only for debug purposes and could be handled separately
as all building blocks are present in the library.

[...]
Alan Maguire May 20, 2024, 9:36 a.m. UTC | #4
On 17/05/2024 22:09, Eduard Zingerman wrote:
> On Fri, 2024-05-17 at 11:22 +0100, Alan Maguire wrote:
> [...]
> 
>> Changes since v3[3]:
>>
>> - distill now checks for duplicate-named struct/unions and records
>>   them as a sized struct/union to help identify which of the
>>   multiple base BTF structs/unions it refers to (Eduard, patch 1)
> 
> Hi Alan,
> 
> Sorry, a little bit more on this topic.
> - In patch #1 two kinds of structs get BTF_KIND_STRUCT declaration
>   with size: those embedded and those with ambiguous name.
> - In patch #7 btf_relocate_map_distilled_base() unconditionally
>   requires size field for BTF_KIND_STRUCT to match.
> 
> This might hinder portability in the following scenario:
> - base type is referred to only by pointer;
> - base type has ambiguous name (in the old kernel used for base BTF
>   generation);
> - base type size is different in the new kernel when module is loaded.
> 
> There is also a scenario when type name is not ambiguous in the old
> kernel, but becomes ambiguous in the new kernel.
> 
> So, what I had in mind when commented in v3 was:
> - if distilled base has FWD for a structure and an ambiguous name,
>   fail to relocate;
> - if distilled base has STRUCT for a structure, find a unique pair
>   name/size or fail to relocate.
> 
> This covers scenario #1 but ignores scenario #2 and requires minimal
> changes for v3 design.
> 
> An alternative would be to e.g. keep STRUCT with size for all
> structures in the base BTF and to compute "embedded" flag during
> relocation:
> - if distilled base STRUCT is embedded, search for a unique pair
>   name/size or fail to relocate;
> - if distilled base STRUCT is not embedded, search for a uniquely
>   named struct, if that fails search for a unique pair name/size,
>   or fail to relocate.
>

Hi Eduard,

IIRC I think Andrii suggested something like the above; then the idea
was to use a 0 STRUCT size for cases where we didn't care about matching
size rather than using a mix of FWDs and STRUCTs. As you say though,
conditions can be different in the target base such that we might have
wished we had recorded additional size information, and we can only
really know that at relocation time.

That being the case, I think the most robust approach is as you suggest
to always record STRUCT/UNION size at distillation time and then only
require size matches for the embedded and duplicate name-kind cases.
This requires moving the embeddedness/dup logic from distillation to
relocation.

> If we consider above to much of a hassle, I think v3 design + size
> check for STRUCT is better because it is a bit simpler.
> 
> Wdyt?
>

The main goal from my side is to try and ensure we maximize the
opportunities to reconcile split and base BTF where possible. The scheme
you suggest above - always recording size but selectively using it in
matching - seems like the best way to achieve that. Thanks!

Alan

> [...]
Alan Maguire May 21, 2024, 9:15 a.m. UTC | #5
On 18/05/2024 03:38, Eduard Zingerman wrote:
> On Fri, 2024-05-17 at 11:22 +0100, Alan Maguire wrote:
> 
> (Also, please note that CI fails for this series).
> 
> [...]
> 
>> Also explored Eduard's suggestion of doing an implicit fallback
>> to checking for .BTF.base section in btf__parse() when it is
>> called to get base BTF.  However while it is doable, it turned
>> out to be difficult operationally.  Since fallback is implicit
>> we do not know the source of the BTF - was it from .BTF or
>> .BTF.base? In bpftool, we want to try first standalone BTF,
>> then split, then split with distilled base.  Having a way
>> to explicitly request .BTF.base via btf__parse_opts() fits
>> that model better.
> 
> I don't think this is the case. Here is what I mean:
> https://github.com/eddyz87/bpf/tree/distilled-base-alternative-parse-elf
> 
> The branch above is a modification for btf_parse_elf() and a few
> reverts on top of this patch-set.
> 
> I modified btf_parse_elf() to follow the logic below:
> 
> | base_btf   | .BTF.base | Effect                                      |
> | specified? | present?  |                                             |
> |------------+-----------+---------------------------------------------|
> | no         | no        | load btf from .BTF                          |
> |------------+-----------+---------------------------------------------|
> | yes        | no        | load btf from .BTF using base_btf as base   |
> |            |           |                                             |
> |------------+-----------+---------------------------------------------|
> | no         | yes       | load btf from .BTF using .BTF.base as base  |
> |            |           |                                             |
> |------------+-----------+---------------------------------------------|
> | yes        | yes       | load btf from .BTF using .BTF.base as base, |
> |            |           | relocate btf against base_btf               |
> 
> When organized like that, there is no need to modify libbpf clients to
> work with split BTF.
> 
> The `bpftool btf dump file ./btf_testmod.ko` would print non-relocated BTF.
> The `bpftool btf -B ../../../vmlinux dump file ./btf_testmod.ko` would
> print relocated BTF, no need for separate -R flag.
> Imo, loading split BTF w/o relocation when .BTF.base is present
> is interesting only for debug purposes and could be handled separately
> as all building blocks are present in the library.
> 

This is a neat approach, and as you say it eliminates the need to modify
bpftool to handle distilled base BTF and relocation.  The only wrinkle
is resolve_btfids; we call resolve_btfids for modules with a "-B
vmlinux" argument, so in that case we'd be calling btf_parse_elf() with
both a split and base BTF. According to the approach outlined above,
we'd relocate split BTF - originally relative to .BTF.base - to be
relative to vmlinux BTF, but in the case of resolve_btfids we don't want
that relocation. We want the BTF ids to reflect the distilled base BTF
ids since they will be relocated later on module load along with the
split BTF references themselves.

We can handle this by having a -R flag to skip relocation; it would
simply ensure we first try calling btf__parse(), falling back to
btf__parse_split(); we need the fallback logic as it is possible the
pahole version didn't add .BTF.base sections. This logic would only be
activated for out-of-tree module builds so seems acceptable to me. If
that makes sense, with your permission I can rework the series to
include your BTF parsing patch.

Thanks!

Alan
> [...]
>
Eduard Zingerman May 21, 2024, 4:19 p.m. UTC | #6
On Tue, 2024-05-21 at 10:15 +0100, Alan Maguire wrote:

[...]

> This is a neat approach, and as you say it eliminates the need to modify
> bpftool to handle distilled base BTF and relocation.  The only wrinkle
> is resolve_btfids; we call resolve_btfids for modules with a "-B
> vmlinux" argument, so in that case we'd be calling btf_parse_elf() with
> both a split and base BTF. According to the approach outlined above,
> we'd relocate split BTF - originally relative to .BTF.base - to be
> relative to vmlinux BTF, but in the case of resolve_btfids we don't want
> that relocation. We want the BTF ids to reflect the distilled base BTF
> ids since they will be relocated later on module load along with the
> split BTF references themselves.

You are correct, I missed this detail, resolve_btfids needs distilled
base instead of vmlinux for out of tree modules.

> We can handle this by having a -R flag to skip relocation; it would
> simply ensure we first try calling btf__parse(), falling back to
> btf__parse_split(); we need the fallback logic as it is possible the
> pahole version didn't add .BTF.base sections. This logic would only be
> activated for out-of-tree module builds so seems acceptable to me. If
> that makes sense, with your permission I can rework the series to
> include your BTF parsing patch.

Makes sense to me, but I'm curious whether you and Andrii consider
this a good interface, compared to _opts version.

Thanks,
Eduard
Andrii Nakryiko May 21, 2024, 6:54 p.m. UTC | #7
On Tue, May 21, 2024 at 9:19 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Tue, 2024-05-21 at 10:15 +0100, Alan Maguire wrote:
>
> [...]
>
> > This is a neat approach, and as you say it eliminates the need to modify
> > bpftool to handle distilled base BTF and relocation.  The only wrinkle
> > is resolve_btfids; we call resolve_btfids for modules with a "-B
> > vmlinux" argument, so in that case we'd be calling btf_parse_elf() with
> > both a split and base BTF. According to the approach outlined above,
> > we'd relocate split BTF - originally relative to .BTF.base - to be
> > relative to vmlinux BTF, but in the case of resolve_btfids we don't want
> > that relocation. We want the BTF ids to reflect the distilled base BTF
> > ids since they will be relocated later on module load along with the
> > split BTF references themselves.
>
> You are correct, I missed this detail, resolve_btfids needs distilled
> base instead of vmlinux for out of tree modules.
>
> > We can handle this by having a -R flag to skip relocation; it would
> > simply ensure we first try calling btf__parse(), falling back to
> > btf__parse_split(); we need the fallback logic as it is possible the
> > pahole version didn't add .BTF.base sections. This logic would only be
> > activated for out-of-tree module builds so seems acceptable to me. If
> > that makes sense, with your permission I can rework the series to
> > include your BTF parsing patch.
>
> Makes sense to me, but I'm curious whether you and Andrii consider
> this a good interface, compared to _opts version.
>

Hey, sorry for delays, just getting back to reviewing patches, I will
go through this revision today.

I'm probably leaning towards not doing automatic relocations in
btf__parse(), tbh. Distilled BTF is a rather special kernel-specific
feature, if we need to teach resolve_btfids and bpftool to do
something extra for that case (i.e., call another API for relocation,
if necessary), then it's fine, doesn't seems like a problem.

Much worse is having to do some workarounds to prevent an API from
doing some undesired extra steps (like in resolve_btfids not wanting a
relocation). Orthogonality FTW, IMO.


> Thanks,
> Eduard
Eduard Zingerman May 21, 2024, 7:08 p.m. UTC | #8
On Tue, 2024-05-21 at 11:54 -0700, Andrii Nakryiko wrote:

[...]

> I'm probably leaning towards not doing automatic relocations in
> btf__parse(), tbh. Distilled BTF is a rather special kernel-specific
> feature, if we need to teach resolve_btfids and bpftool to do
> something extra for that case (i.e., call another API for relocation,
> if necessary), then it's fine, doesn't seems like a problem.

My point is that with current implementation it does not even make
sense to call btf__parse() for an ELF with distilled base,
because it would fail.

And selecting BTF encoding based on a few retries seems like a kludge
if there is a simple way to check if distilled base has to be used
(presence of the .BTF.base section).

> Much worse is having to do some workarounds to prevent an API from
> doing some undesired extra steps (like in resolve_btfids not wanting a
> relocation). Orthogonality FTW, IMO.

For resolve_btfids it is a bit different, imo.
It does want some base: for in-tree modules it wants vmlinux,
for out-of-tree it wants distilled base.
So it has to be adjusted either way.
Andrii Nakryiko May 21, 2024, 10:01 p.m. UTC | #9
On Tue, May 21, 2024 at 12:08 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Tue, 2024-05-21 at 11:54 -0700, Andrii Nakryiko wrote:
>
> [...]
>
> > I'm probably leaning towards not doing automatic relocations in
> > btf__parse(), tbh. Distilled BTF is a rather special kernel-specific
> > feature, if we need to teach resolve_btfids and bpftool to do
> > something extra for that case (i.e., call another API for relocation,
> > if necessary), then it's fine, doesn't seems like a problem.
>
> My point is that with current implementation it does not even make
> sense to call btf__parse() for an ELF with distilled base,
> because it would fail.

True (unless application loaded .BTF.base as stand-alone BTF first,
but it's pretty advanced scenario)

>
> And selecting BTF encoding based on a few retries seems like a kludge
> if there is a simple way to check if distilled base has to be used
> (presence of the .BTF.base section).

agreed

>
> > Much worse is having to do some workarounds to prevent an API from
> > doing some undesired extra steps (like in resolve_btfids not wanting a
> > relocation). Orthogonality FTW, IMO.
>
> For resolve_btfids it is a bit different, imo.
> It does want some base: for in-tree modules it wants vmlinux,
> for out-of-tree it wants distilled base.
> So it has to be adjusted either way.

Ok, so I read some more code and re-read your discussion w/ Alan. I
agree with your proposal, I think it's logical (even if relocation
does feel a bit "extra" for "parse"-like API, but ok, whatever).

I see what you are saying about resolve_btfids needing the changes
either way, and that's true. But instead of adding (unnecessary, IMO)
-R argument, resolve_btfids should be able to detect .BTF.base section
presence and infer that this is distilled BTF case, and thus proceed
with ignoring `-B <vmlinux>` argument (we can even complain that `-B
vmlinux` is specified if distilled BTF is used, not sure.
Eduard Zingerman May 21, 2024, 10:15 p.m. UTC | #10
On Tue, 2024-05-21 at 15:01 -0700, Andrii Nakryiko wrote:
> On Tue, May 21, 2024 at 12:08 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > 
> > On Tue, 2024-05-21 at 11:54 -0700, Andrii Nakryiko wrote:
> > 
> > [...]
> > 
> > > I'm probably leaning towards not doing automatic relocations in
> > > btf__parse(), tbh. Distilled BTF is a rather special kernel-specific
> > > feature, if we need to teach resolve_btfids and bpftool to do
> > > something extra for that case (i.e., call another API for relocation,
> > > if necessary), then it's fine, doesn't seems like a problem.
> > 
> > My point is that with current implementation it does not even make
> > sense to call btf__parse() for an ELF with distilled base,
> > because it would fail.
> 
> True (unless application loaded .BTF.base as stand-alone BTF first,
> but it's pretty advanced scenario)

In this scenario .BTF.base would be relocated against .BTF.base,
which is useless but not a failure.
Maybe having the _opts() variant with additional degree of control
(e.g. whether to ignore .BTF.base) is interesting as well.
On the other hand, for such use-cases libbpf provides btf__parse()
that accepts raw binary input, and application can extract ELF
contents by itself.

[...]

> I see what you are saying about resolve_btfids needing the changes
> either way, and that's true. But instead of adding (unnecessary, IMO)
> -R argument, resolve_btfids should be able to detect .BTF.base section
> presence and infer that this is distilled BTF case, and thus proceed
> with ignoring `-B <vmlinux>` argument (we can even complain that `-B
> vmlinux` is specified if distilled BTF is used, not sure.

+1 for complaining about -B vmlinux when .BTF.base should be used.
Andrii Nakryiko May 21, 2024, 10:36 p.m. UTC | #11
On Tue, May 21, 2024 at 3:15 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Tue, 2024-05-21 at 15:01 -0700, Andrii Nakryiko wrote:
> > On Tue, May 21, 2024 at 12:08 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > >
> > > On Tue, 2024-05-21 at 11:54 -0700, Andrii Nakryiko wrote:
> > >
> > > [...]
> > >
> > > > I'm probably leaning towards not doing automatic relocations in
> > > > btf__parse(), tbh. Distilled BTF is a rather special kernel-specific
> > > > feature, if we need to teach resolve_btfids and bpftool to do
> > > > something extra for that case (i.e., call another API for relocation,
> > > > if necessary), then it's fine, doesn't seems like a problem.
> > >
> > > My point is that with current implementation it does not even make
> > > sense to call btf__parse() for an ELF with distilled base,
> > > because it would fail.
> >
> > True (unless application loaded .BTF.base as stand-alone BTF first,
> > but it's pretty advanced scenario)
>
> In this scenario .BTF.base would be relocated against .BTF.base,
> which is useless but not a failure.
> Maybe having the _opts() variant with additional degree of control
> (e.g. whether to ignore .BTF.base) is interesting as well.
> On the other hand, for such use-cases libbpf provides btf__parse()
> that accepts raw binary input, and application can extract ELF
> contents by itself.

I was just being pedantic :) We can always add a new _opts() variant
later, if necessary. I don't think this scenario is going to be a real
scenario, so no need to worry about it too much.

>
> [...]
>
> > I see what you are saying about resolve_btfids needing the changes
> > either way, and that's true. But instead of adding (unnecessary, IMO)
> > -R argument, resolve_btfids should be able to detect .BTF.base section
> > presence and infer that this is distilled BTF case, and thus proceed
> > with ignoring `-B <vmlinux>` argument (we can even complain that `-B
> > vmlinux` is specified if distilled BTF is used, not sure.
>
> +1 for complaining about -B vmlinux when .BTF.base should be used.

ok, let's
Alan Maguire May 22, 2024, 4:16 p.m. UTC | #12
On 21/05/2024 23:01, Andrii Nakryiko wrote:
> On Tue, May 21, 2024 at 12:08 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>>
>> On Tue, 2024-05-21 at 11:54 -0700, Andrii Nakryiko wrote:
>>
>> [...]
>>
>>> I'm probably leaning towards not doing automatic relocations in
>>> btf__parse(), tbh. Distilled BTF is a rather special kernel-specific
>>> feature, if we need to teach resolve_btfids and bpftool to do
>>> something extra for that case (i.e., call another API for relocation,
>>> if necessary), then it's fine, doesn't seems like a problem.
>>
>> My point is that with current implementation it does not even make
>> sense to call btf__parse() for an ELF with distilled base,
>> because it would fail.
> 
> True (unless application loaded .BTF.base as stand-alone BTF first,
> but it's pretty advanced scenario)
> 
>>
>> And selecting BTF encoding based on a few retries seems like a kludge
>> if there is a simple way to check if distilled base has to be used
>> (presence of the .BTF.base section).
> 
> agreed
> 
>>
>>> Much worse is having to do some workarounds to prevent an API from
>>> doing some undesired extra steps (like in resolve_btfids not wanting a
>>> relocation). Orthogonality FTW, IMO.
>>
>> For resolve_btfids it is a bit different, imo.
>> It does want some base: for in-tree modules it wants vmlinux,
>> for out-of-tree it wants distilled base.
>> So it has to be adjusted either way.
> 
> Ok, so I read some more code and re-read your discussion w/ Alan. I
> agree with your proposal, I think it's logical (even if relocation
> does feel a bit "extra" for "parse"-like API, but ok, whatever).
> 
> I see what you are saying about resolve_btfids needing the changes
> either way, and that's true. But instead of adding (unnecessary, IMO)
> -R argument, resolve_btfids should be able to detect .BTF.base section
> presence and infer that this is distilled BTF case, and thus proceed
> with ignoring `-B <vmlinux>` argument (we can even complain that `-B
> vmlinux` is specified if distilled BTF is used, not sure.

That's a good idea; we already have ELF parsing in resolve_btfids, so as
we can detect the .BTF.base presence there easily, no need for a
parameter. I put together a quick proof-of-concept and it works well.

This simplifies things considerably; we can eliminate the bpftool
patches completely since ELF parsing does the right thing already, and
the resolve_btfids change is small, with no user interface-visible
changes there.

Thanks!

Alan