mbox series

[v1,bpf-next,0/6] Support stashing local kptrs with bpf_kptr_xchg

Message ID 20230309180111.1618459-1-davemarchevsky@fb.com (mailing list archive)
Headers show
Series Support stashing local kptrs with bpf_kptr_xchg | expand

Message

Dave Marchevsky March 9, 2023, 6:01 p.m. UTC
Local kptrs are kptrs allocated via bpf_obj_new with a type specified in program
BTF. A BPF program which creates a local kptr has exclusive control of the
lifetime of the kptr, and, prior to terminating, must:

  * free the kptr via bpf_obj_drop
  * If the kptr is a {list,rbtree} node, add the node to a {list, rbtree},
    thereby passing control of the lifetime to the collection

This series adds a third option:

  * stash the kptr in a map value using bpf_kptr_xchg

As indicated by the use of "stash" to describe this behavior, the intended use
of this feature is temporary storage of local kptrs. For example, a sched_ext
([0]) scheduler may want to create an rbtree node for each new cgroup on cgroup
init, but to add that node to the rbtree as part of a separate program which
runs on enqueue. Stashing the node in a map_value allows its lifetime to outlive
the execution of the cgroup_init program.

Behavior:

There is no semantic difference between adding a kptr to a graph collection and
"stashing" it in a map. In both cases exclusive ownership of the kptr's lifetime
is passed to some containing data structure, which is responsible for
bpf_obj_drop'ing it when the container goes away.

Since graph collections also expect exclusive ownership of the nodes they
contain, graph nodes cannot be both stashed in a map_value and contained by
their corresponding collection.

Implementation:

Two observations simplify the verifier changes for this feature. First, kptrs
("referenced kptrs" until a recent renaming) require registration of a
dtor function as part of their acquire/release semantics, so that a referenced
kptr which is placed in a map_value is properly released when the map goes away.
We want this exact behavior for local kptrs, but with bpf_obj_drop as the dtor
instead of a per-btf_id dtor.

The second observation is that, in terms of identification, "referenced kptr"
and "local kptr" already don't interfere with one another. Consider the
following example:

  struct node_data {
          long key;
          long data;
          struct bpf_rb_node node;
  };

  struct map_value {
          struct node_data __kptr *node;
  };

  struct {
          __uint(type, BPF_MAP_TYPE_ARRAY);
          __type(key, int);
          __type(value, struct map_value);
          __uint(max_entries, 1);
  } some_nodes SEC(".maps");

  struct map_value *mapval;
  struct node_data *res;
  int key = 0;

  res = bpf_obj_new(typeof(*res));
  if (!res) { /* err handling */ }

  mapval = bpf_map_lookup_elem(&some_nodes, &key);
  if (!mapval) { /* err handling */ }

  res = bpf_kptr_xchg(&mapval->node, res);
  if (res)
          bpf_obj_drop(res);

The __kptr tag identifies map_value's node as a referenced kptr, while the
PTR_TO_BTF_ID which bpf_obj_new returns - a type in some non-vmlinux,
non-module BTF - identifies res as a local kptr. Type tag on the pointer
indicates referenced kptr, while the type of the pointee indicates local kptr.
So using existing facilities we can tell the verifier about a "referenced kptr"
pointer to a "local kptr" pointee.

When kptr_xchg'ing a kptr into a map_value, the verifier can recognize local
kptr types and treat them like referenced kptrs with a properly-typed
bpf_obj_drop as a dtor.

Other implementation notes:
  * We don't need to do anything special to enforce "graph nodes cannot be
    both stashed in a map_value and contained by their corresponding collection"
    * bpf_kptr_xchg both returns and takes as input a (possibly-null) owning
      reference. It does not accept non-owning references as input by virtue
      of requiring a ref_obj_id. By definition, if a program has an owning
      ref to a node, the node isn't in a collection, so it's safe to pass
      ownership via bpf_kptr_xchg.

Summary of patches:

  * Patches 1 - 3 are small refactorings.
  * Patch 4 modifies BTF plumbing to support using bpf_obj_drop as a dtor
  * Patch 5 adds verifier plumbing to support MEM_ALLOC-flagged param for
    bpf_kptr_xchg
  * Patch 6 adds selftests exercising the new behavior

  [0]: https://lwn.net/Articles/916290/

Dave Marchevsky (6):
  bpf: verifier: Rename kernel_type_name helper to btf_type_name
  bpf: btf: Remove unused btf_field_info_type enum
  bpf: Change btf_record_find enum parameter to field_mask
  bpf: Support __kptr to local kptrs
  bpf: Allow local kptrs to be exchanged via bpf_kptr_xchg
  selftests/bpf: Add local kptr stashing test

 include/linux/bpf.h                           | 13 ++-
 include/linux/btf.h                           |  2 -
 kernel/bpf/btf.c                              | 40 +++++----
 kernel/bpf/helpers.c                          | 11 ++-
 kernel/bpf/syscall.c                          | 20 ++++-
 kernel/bpf/verifier.c                         | 24 ++++--
 .../bpf/prog_tests/local_kptr_stash.c         | 33 +++++++
 .../selftests/bpf/progs/local_kptr_stash.c    | 85 +++++++++++++++++++
 8 files changed, 193 insertions(+), 35 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/local_kptr_stash.c
 create mode 100644 tools/testing/selftests/bpf/progs/local_kptr_stash.c

Comments

patchwork-bot+netdevbpf@kernel.org March 10, 2023, 8:30 p.m. UTC | #1
Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Thu, 9 Mar 2023 10:01:05 -0800 you wrote:
> Local kptrs are kptrs allocated via bpf_obj_new with a type specified in program
> BTF. A BPF program which creates a local kptr has exclusive control of the
> lifetime of the kptr, and, prior to terminating, must:
> 
>   * free the kptr via bpf_obj_drop
>   * If the kptr is a {list,rbtree} node, add the node to a {list, rbtree},
>     thereby passing control of the lifetime to the collection
> 
> [...]

Here is the summary with links:
  - [v1,bpf-next,1/6] bpf: verifier: Rename kernel_type_name helper to btf_type_name
    https://git.kernel.org/bpf/bpf-next/c/b32a5dae44cc
  - [v1,bpf-next,2/6] bpf: btf: Remove unused btf_field_info_type enum
    https://git.kernel.org/bpf/bpf-next/c/a4aa38897b6a
  - [v1,bpf-next,3/6] bpf: Change btf_record_find enum parameter to field_mask
    https://git.kernel.org/bpf/bpf-next/c/74843b57ec70
  - [v1,bpf-next,4/6] bpf: Support __kptr to local kptrs
    (no matching commit)
  - [v1,bpf-next,5/6] bpf: Allow local kptrs to be exchanged via bpf_kptr_xchg
    (no matching commit)
  - [v1,bpf-next,6/6] selftests/bpf: Add local kptr stashing test
    (no matching commit)

You are awesome, thank you!