[v1,bpf-next,0/2] bpf: Add mmapable task_local storage

Message ID: 20231120175925.733167-1-davemarchevsky@fb.com

Dave Marchevsky Nov. 20, 2023, 5:59 p.m. UTC
This series adds support for mmap()ing individual task_local storage mapvals
into userspace. Two motivating use cases:

  * sched_ext ([0]) schedulers might want to act on 'scheduling hints'
    provided by userspace tasks. For example, a task can tag itself as
    latency-sensitive but not particularly computationally intensive, and
    the BPF scheduler can use this information to make better scheduling
    decisions. Similarly, a database task about to start a transaction
    can tag itself as doing so, without high overhead, by writing to the
    mmap'd mapval. In both cases the information is task-specific, and in
    the latter case it's preferable to avoid syscall overhead because the
    hint changes often.

  * The strobemeta ([1]) technique for reading thread_local storage is
    used by tracing programs at Meta to annotate tracing data with
    task-specific metadata. For example, a multithreaded webserver with
    a pool of worker threads preparing responses and other threads
    handling request connections might want to tag threads by type, and
    further tag worker threads with the feature flags enabled during
    request processing.
      * The strobemeta technique predates the task_local storage map and
        instead relies on reverse-engineering thread_local storage
        implementation specifics. The approach enabled here avoids much
        of this complexity.

The general thrust of this series' implementation is "simplest thing
that works". A userspace thread can mmap() a task_local storage map fd
and receive the map_value corresponding to its own task. In the future
we can support mmap()ing other threads' map_values via the offset
parameter or some other approach. Similarly, this series makes no
attempt to pack multiple map_values into a userspace-mappable page;
each map_value for a BPF_F_MMAPABLE task_local storage map is given its
own page. Neither of those potential improvements is necessary for the
motivating use cases above. Patch 1's summary digs deeper into
implementation details.
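
The mapping step described above can be sketched from userspace as
follows. This is a minimal sketch and rests on several assumptions.
`map_fd` is the fd of a task_local storage map created with
BPF_F_MMAPABLE, and map creation (which needs BTF and privileges) is
omitted. A single page mapped at offset 0 yields the calling thread's
map_value, following the one-page-per-map_value description. The
precise length and offset rules should be checked against patch 1. The
helper names are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map the calling thread's own map_value.
 * Assumes, per the cover letter, that each map_value gets its own
 * page and that offset 0 selects the caller's task. */
static void *map_own_task_value(int map_fd, size_t value_size)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p;

	/* One page per map_value: the value must fit in that page. */
	if (page <= 0 || value_size == 0 || value_size > (size_t)page) {
		errno = EINVAL;
		return NULL;
	}

	/* MAP_SHARED so stores are visible to the other side of the
	 * mapping rather than going to a private copy. */
	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED,
		 map_fd, 0);
	return p == MAP_FAILED ? NULL : p;
}

/* Undo map_own_task_value(). Returns 0 on success, -1 on error. */
static int unmap_own_task_value(void *p)
{
	return munmap(p, (size_t)sysconf(_SC_PAGESIZE));
}
```

In a real program `map_fd` would come from the loaded BPF object, for
example via libbpf's `bpf_map__fd()`. The returned pointer would be
cast to the mapval type shared with the BPF program.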

This series' changes to the generic local_storage implementation, which
is shared by cgroup_local storage and other local storage types, will
make extending this support to those types straightforward in the
future.

Summary of patches:
  * Patch 1 adds support for mmapable map_vals in generic
    bpf_local_storage infrastructure and uses the new feature in
    task_local storage
  * Patch 2 adds tests

  [0]: https://lore.kernel.org/bpf/20231111024835.2164816-1-tj@kernel.org/
  [1]: tools/testing/selftests/bpf/progs/strobemeta*

Dave Marchevsky (2):
  bpf: Support BPF_F_MMAPABLE task_local storage
  selftests/bpf: Add test exercising mmapable task_local_storage

 include/linux/bpf_local_storage.h             |  14 +-
 kernel/bpf/bpf_local_storage.c                | 145 +++++++++++---
 kernel/bpf/bpf_task_storage.c                 |  35 +++-
 kernel/bpf/syscall.c                          |   2 +-
 .../bpf/prog_tests/task_local_storage.c       | 177 ++++++++++++++++++
 .../bpf/progs/task_local_storage__mmap.c      |  59 ++++++
 .../bpf/progs/task_local_storage__mmap.h      |   7 +
 .../bpf/progs/task_local_storage__mmap_fail.c |  39 ++++
 8 files changed, 445 insertions(+), 33 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/task_local_storage__mmap.c
 create mode 100644 tools/testing/selftests/bpf/progs/task_local_storage__mmap.h
 create mode 100644 tools/testing/selftests/bpf/progs/task_local_storage__mmap_fail.c