Message ID:   20230801145414.418145-1-davemarchevsky@fb.com (mailing list archive)
State:        Changes Requested
Delegated to: BPF
Series:       [v1,bpf-next,1/2,RFC] bpf: Introduce BPF_F_VMA_NEXT flag for bpf_find_vma helper
On Tue, Aug 1, 2023 at 7:54 AM Dave Marchevsky <davemarchevsky@fb.com> wrote:
>
> At Meta we have a profiling daemon which periodically collects
> information on many hosts. This collection usually involves grabbing
> stacks (user and kernel) using perf_event BPF progs and later
> symbolicating them. For user stacks we try to use BPF_F_USER_BUILD_ID
> and rely on remote symbolication, but BPF_F_USER_BUILD_ID doesn't
> always succeed. In those cases we must fall back to digging around in
> /proc/PID/maps to map virtual address to (binary, offset). The
> /proc/PID/maps digging does not occur synchronously with stack
> collection, so the process might already be gone, in which case it
> won't have /proc/PID/maps and we will fail to symbolicate.
>
> This 'exited process problem' doesn't occur very often, as most of the
> prod services we care to profile are long-lived daemons, but there are
> enough usecases to warrant a workaround: a BPF program which can be
> optionally loaded at data collection time and essentially walks
> /proc/PID/maps. Currently this is done by walking the vma list:
>
>   struct vm_area_struct *mmap = BPF_CORE_READ(mm, mmap);
>   mmap_next = BPF_CORE_READ(mmap, vm_next); /* in a loop */
>
> Since commit 763ecb035029 ("mm: remove the vma linked list") there's no
> longer a vma linked list to walk. Walking the vma maple tree is not as
> simple as hopping struct vm_area_struct->vm_next. That commit replaces
> vm_next hopping with calls to the find_vma(mm, addr) helper function,
> which returns the vma containing addr, or, if no vma contains addr,
> the closest vma with a higher start addr.
>
> The BPF helper bpf_find_vma is unsurprisingly a thin wrapper around
> find_vma, with the major difference that no 'closest vma' is returned
> if there is no VMA containing a particular address. This prevents BPF
> programs from being able to use bpf_find_vma to iterate all vmas in a
> task in a reasonable way.
>
> This patch adds a BPF_F_VMA_NEXT flag to bpf_find_vma which restores
> 'closest vma' behavior when used. Because this is find_vma's default
> behavior, it's as straightforward as nerfing a 'vma contains addr'
> check on find_vma's retval.
>
> Also, change bpf_find_vma's address parameter to 'addr' instead of
> 'start'. The former is used in the documentation and more accurately
> describes the param.
>
> [
> RFC: This isn't an ideal solution for iteration of all vmas in a task
> in the long term, for a few reasons:
>
>   * In nmi context, a second call to bpf_find_vma will fail because
>     irq_work is busy, so we can't iterate all vmas
>   * It repeatedly takes and releases the mmap_read lock, where a
>     dedicated iterate_all_vmas(task) kfunc could just take it once
>     and hold it for all vmas
>
> My specific usecase doesn't do vma iteration in nmi context, and I
> think the 'closest vma' behavior can be useful here despite the
> locking inefficiencies.
>
> When Alexei and I discussed this offline, two alternatives that would
> provide similar functionality while addressing the above issues seemed
> reasonable:
>
>   * An open-coded iterator for task vmas. Similar to the existing
>     task_vma bpf_iter, but with no need to create a bpf_link and read
>     a bpf_iter fd from userspace.
>   * A new kfunc taking a callback, similar to bpf_find_vma, but
>     iterating over all vmas in one go.
>
> I think this patch is useful on its own since it's a fairly minimal
> change and fixes my usecase. Sending for early feedback and to
> solicit further thought about whether this should be dropped in
> favor of one of the above options.

- In theory this patch can work, but patch 2 didn't attempt to actually
  use it in a loop to iterate all vma-s. Which is a bit of a red flag
  for whether such iteration is practical (either via bpf_loop or
  bpf_for).

- This behavior of bpf_find_vma() feels too much like an implementation
  detail. find_vma will probably stay this way, since different parts of
  the kernel rely on it, but exposing it like BPF_F_VMA_NEXT leaks the
  implementation too much.

- Looking at task_vma_seq_get_next()... that's how vma iteration should
  be done, and I don't think a bpf prog can do it on its own. Because
  bpf_find_vma() drops the lock at every step, the problems described in
  that large comment will be hit sooner or later.

All concerns combined, I feel we'd better provide a new kfunc that
iterates vmas and drops the lock before invoking the callback. It can
be much simpler than task_vma_seq_get_next() if we don't drop the lock.
Maybe that's ok.

Doing it open-coded-iterators style is likely better:
bpf_iter_vma_new() kfunc will do
bpf_mmap_unlock_get_irq_work() + mmap_read_trylock(),
while bpf_iter_vma_destroy() will do bpf_mmap_unlock_mm().

I'd try the open-coded iter first. It's a good test for the iter infra.
bpf_iter_testmod_seq_new is an example of how to add a new iter.

Another issue with bpf_find_vma is .arg1_type = ARG_PTR_TO_BTF_ID.
It's not a trusted arg. We'd better move away from this legacy pointer.
bpf_iter_vma_new() should accept only a trusted ptr to task_struct.
fwiw bpf_get_current_task_btf_proto has
.ret_type = RET_PTR_TO_BTF_ID_TRUSTED, and it matters here.

The bpf prog might look like:

  task = bpf_get_current_task_btf();
  err = bpf_iter_vma_new(&it, task);
  while ((vma = bpf_iter_vma_next(&it))) ...;

assuming the lock is not dropped by _next.
On 8/1/23 4:41 PM, Alexei Starovoitov wrote:
> On Tue, Aug 1, 2023 at 7:54 AM Dave Marchevsky <davemarchevsky@fb.com> wrote:
>> [...]
>
> - This behavior of bpf_find_vma() feels too much implementation detail.
>   find_vma will probably stay this way, since different parts of the kernel
>   rely on it, but exposing it like BPF_F_VMA_NEXT leaks implementation too much.
>
> [...]

The only concern here that doesn't seem reasonable to me is the
"too much implementation detail". I agree with the rest, though, so I
will send a different series with the new implementation and point to
this discussion.
On 8/3/23 11:59 PM, David Marchevsky wrote:
> On 8/1/23 4:41 PM, Alexei Starovoitov wrote:
>> [...]
>
> The only concern here that doesn't seem reasonable to me is the
> "too much implementation detail". I agree with the rest, though,
> so will send a different series with new implementation and point
> to this discussion.
For reference, here is another use case for traversing vmas in a bpf
program, reported on the bcc mailing list:
  https://github.com/iovisor/bcc/pull/4679

The application may not have frame pointers, so the bpf program scans
the stack and reports any values that look like pointers into a user
text region. This is similar to how current arch code (e.g., x86)
reports a crash stack.
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 70da85200695..947187d76ebc 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5169,8 +5169,13 @@ union bpf_attr {
  *		function with *task*, *vma*, and *callback_ctx*.
  *		The *callback_fn* should be a static function and
  *		the *callback_ctx* should be a pointer to the stack.
- *		The *flags* is used to control certain aspects of the helper.
- *		Currently, the *flags* must be 0.
+ *		The *flags* is used to control certain aspects of the helper and
+ *		may be one of the following:
+ *
+ *		**BPF_F_VMA_NEXT**
+ *			If no vma contains *addr*, call *callback_fn* with the next vma,
+ *			i.e. the vma with lowest vm_start that is higher than *addr*.
+ *			This replicates behavior of kernel's find_vma helper.
  *
  *		The expected callback signature is
  *
@@ -6026,6 +6031,11 @@ enum {
 	BPF_F_EXCLUDE_INGRESS	= (1ULL << 4),
 };
 
+/* Flags for bpf_find_vma helper */
+enum {
+	BPF_F_VMA_NEXT		= (1ULL << 0),
+};
+
 #define __bpf_md_ptr(type, name)	\
 union {					\
 	type name;			\
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index c4ab9d6cdbe9..a8c87dcf36ad 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -777,7 +777,7 @@ static struct bpf_iter_reg task_vma_reg_info = {
 	.show_fdinfo		= bpf_iter_task_show_fdinfo,
 };
 
-BPF_CALL_5(bpf_find_vma, struct task_struct *, task, u64, start,
+BPF_CALL_5(bpf_find_vma, struct task_struct *, task, u64, addr,
 	   bpf_callback_t, callback_fn, void *, callback_ctx, u64, flags)
 {
 	struct mmap_unlock_irq_work *work = NULL;
@@ -785,10 +785,13 @@ BPF_CALL_5(bpf_find_vma, struct task_struct *, task, u64, start,
 	bool irq_work_busy = false;
 	struct mm_struct *mm;
 	int ret = -ENOENT;
+	bool vma_next;
 
-	if (flags)
+	if (flags & ~BPF_F_VMA_NEXT)
 		return -EINVAL;
 
+	vma_next = flags & BPF_F_VMA_NEXT;
+
 	if (!task)
 		return -ENOENT;
 
@@ -801,9 +804,10 @@ BPF_CALL_5(bpf_find_vma, struct task_struct *, task, u64, start,
 	if (irq_work_busy || !mmap_read_trylock(mm))
 		return -EBUSY;
 
-	vma = find_vma(mm, start);
+	vma = find_vma(mm, addr);
 
-	if (vma && vma->vm_start <= start && vma->vm_end > start) {
+	if (vma &&
+	    ((vma->vm_start <= addr && vma->vm_end > addr) || vma_next)) {
 		callback_fn((u64)(long)task, (u64)(long)vma,
 			    (u64)(long)callback_ctx, 0, 0);
 		ret = 0;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 70da85200695..947187d76ebc 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5169,8 +5169,13 @@ union bpf_attr {
  *		function with *task*, *vma*, and *callback_ctx*.
  *		The *callback_fn* should be a static function and
  *		the *callback_ctx* should be a pointer to the stack.
- *		The *flags* is used to control certain aspects of the helper.
- *		Currently, the *flags* must be 0.
+ *		The *flags* is used to control certain aspects of the helper and
+ *		may be one of the following:
+ *
+ *		**BPF_F_VMA_NEXT**
+ *			If no vma contains *addr*, call *callback_fn* with the next vma,
+ *			i.e. the vma with lowest vm_start that is higher than *addr*.
+ *			This replicates behavior of kernel's find_vma helper.
  *
  *		The expected callback signature is
  *
@@ -6026,6 +6031,11 @@ enum {
 	BPF_F_EXCLUDE_INGRESS	= (1ULL << 4),
 };
 
+/* Flags for bpf_find_vma helper */
+enum {
+	BPF_F_VMA_NEXT		= (1ULL << 0),
+};
+
 #define __bpf_md_ptr(type, name)	\
 union {					\
 	type name;			\
At Meta we have a profiling daemon which periodically collects
information on many hosts. This collection usually involves grabbing
stacks (user and kernel) using perf_event BPF progs and later
symbolicating them. For user stacks we try to use BPF_F_USER_BUILD_ID
and rely on remote symbolication, but BPF_F_USER_BUILD_ID doesn't
always succeed. In those cases we must fall back to digging around in
/proc/PID/maps to map virtual address to (binary, offset). The
/proc/PID/maps digging does not occur synchronously with stack
collection, so the process might already be gone, in which case it
won't have /proc/PID/maps and we will fail to symbolicate.

This 'exited process problem' doesn't occur very often, as most of the
prod services we care to profile are long-lived daemons, but there are
enough usecases to warrant a workaround: a BPF program which can be
optionally loaded at data collection time and essentially walks
/proc/PID/maps. Currently this is done by walking the vma list:

  struct vm_area_struct *mmap = BPF_CORE_READ(mm, mmap);
  mmap_next = BPF_CORE_READ(mmap, vm_next); /* in a loop */

Since commit 763ecb035029 ("mm: remove the vma linked list") there's no
longer a vma linked list to walk. Walking the vma maple tree is not as
simple as hopping struct vm_area_struct->vm_next. That commit replaces
vm_next hopping with calls to the find_vma(mm, addr) helper function,
which returns the vma containing addr, or, if no vma contains addr,
the closest vma with a higher start addr.

The BPF helper bpf_find_vma is unsurprisingly a thin wrapper around
find_vma, with the major difference that no 'closest vma' is returned
if there is no VMA containing a particular address. This prevents BPF
programs from being able to use bpf_find_vma to iterate all vmas in a
task in a reasonable way.

This patch adds a BPF_F_VMA_NEXT flag to bpf_find_vma which restores
'closest vma' behavior when used. Because this is find_vma's default
behavior, it's as straightforward as nerfing a 'vma contains addr'
check on find_vma's retval.

Also, change bpf_find_vma's address parameter to 'addr' instead of
'start'. The former is used in the documentation and more accurately
describes the param.

[
RFC: This isn't an ideal solution for iteration of all vmas in a task
in the long term, for a few reasons:

  * In nmi context, a second call to bpf_find_vma will fail because
    irq_work is busy, so we can't iterate all vmas
  * It repeatedly takes and releases the mmap_read lock, where a
    dedicated iterate_all_vmas(task) kfunc could just take it once
    and hold it for all vmas

My specific usecase doesn't do vma iteration in nmi context, and I
think the 'closest vma' behavior can be useful here despite the
locking inefficiencies.

When Alexei and I discussed this offline, two alternatives that would
provide similar functionality while addressing the above issues seemed
reasonable:

  * An open-coded iterator for task vmas. Similar to the existing
    task_vma bpf_iter, but with no need to create a bpf_link and read
    a bpf_iter fd from userspace.
  * A new kfunc taking a callback, similar to bpf_find_vma, but
    iterating over all vmas in one go.

I think this patch is useful on its own since it's a fairly minimal
change and fixes my usecase. Sending for early feedback and to
solicit further thought about whether this should be dropped in
favor of one of the above options.
]

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Cc: Nathan Slingerland <slinger@meta.com>
---
 include/uapi/linux/bpf.h       | 14 ++++++++++++--
 kernel/bpf/task_iter.c         | 12 ++++++++----
 tools/include/uapi/linux/bpf.h | 14 ++++++++++++--
 3 files changed, 32 insertions(+), 8 deletions(-)