Message ID | AM6PR03MB50806D2E13B3C81B0ECDB5B299E12@AM6PR03MB5080.eurprd03.prod.outlook.com (mailing list archive) |
---|---|
Headers | show |
Series | bpf: Add open-coded style process file iterator and bpf_fget_task() kfunc | expand |
I noticed an interesting problem in Kernel CI. For gcc-compiled kernel (x86_64-gcc, aarch64-gcc, s390x-gcc), the iters/task_file test case can pass. But for llvm-compiled kernel (x86_64-llvm-17, x86_64-llvm-18), the iters/task_file test case fails. The following is the error: ; if (task->parent->pid != parent_pid) @ iters_task_file.c:26 1: (79) r1 = *(u64 *)(r0 +1696) ; R0_w=trusted_ptr_task_struct() R1_w=rcu_ptr_or_null_task_struct(id=1) 2: (61) r1 = *(u32 *)(r1 +1672) R1 invalid mem access 'rcu_ptr_or_null_' I reproduced this error on my local laptop. With the same ebpf program, it can be loaded normally into gcc-compiled kernel, but loading it into llvm-compiled (make LLVM=1) kernel will result in an error. The behavior of the gcc-compiled kernel is inconsistent with the llvm-compiled kernel. After my analysis, this is because of the following reasons: Currently, llvm-compiled kernel can determine whether a pointer is rcu or rcu_or_null based on btf_type_tag and allowlist, but gcc-compiled kernel can only based on allowlist, because gcc does not support btf_type_tag (commit 6fcd486b3a0a "bpf: Refactor RCU enforcement in the verifier."). Since the verifier cannot determine which fields are safe to read, it is not feasible to set all untagged fields to PTR_UNTRUSTED, so just remove the PTR_TRUSTED (commit afeebf9f57a4 "bpf: Undo strict enforcement for walking untagged fields."). These end up leading to an interesting problem. static int check_ptr_to_btf_access(...) { ... } else if (in_rcu_cs(env) && !type_may_be_null(reg->type)) { if (type_is_rcu(env, reg, field_name, btf_id)) { (1) /* ignore __rcu tag and mark it MEM_RCU */ flag |= MEM_RCU; } else if (flag & MEM_RCU || (2) type_is_rcu_or_null(env, reg, field_name, btf_id)) { /* __rcu tagged pointers can be NULL */ flag |= MEM_RCU | PTR_MAYBE_NULL; /* We always trust them */ if (type_is_rcu_or_null(env, reg, field_name, btf_id) && flag & PTR_UNTRUSTED) flag &= ~PTR_UNTRUSTED; } else if (flag & (MEM_PERCPU | MEM_USER)) { (3) /* keep as-is */ } else { (4) /* walking unknown pointers yields old deprecated PTR_TO_BTF_ID */ clear_trusted_flags(&flag); } ... } In gcc-compiled kernel, since task->parent is not in BTF_TYPE_SAFE_RCU and BTF_TYPE_SAFE_RCU_OR_NULL, and since btf_type_tag is not supported, there is no MEM_RCU, branch (4) is reached, just remove PTR_TRUSTED. Although we cannot pass such a pointer to a kfunc with KF_TRUSTED_ARGS or KF_RCU, we can still dereference it. In llvm-compiled kernel, since task->parent is tagged with __rcu, branch (2) is reached and flag is set to MEM_RCU | PTR_MAYBE_NULL. This results in the "R1 invalid mem access rcu_ptr_or_null_" error mentioned earlier. If we remove the __rcu tag of task->parent, then there is no error, even with the llvm-compiled kernel, because branch (4) is reached like the gcc-compiled kernel. This leads to the interesting result that pointers not tagged with __rcu will have more freedom to dereference than pointers tagged with __rcu. This result is obviously incorrect because the pointer may be NULL whether or not it is tagged with __rcu. But how to fix this problem seems a bit tricky. It is not possible to mark every data structure member whether it may be NULL. If every pointer has PTR_MAYBE_NULL by default, chained dereferencing will become cumbersome. More discussion is welcome. Many thanks.
This patch series adds open-coded style process file iterator bpf_iter_task_file and bpf_fget_task() kfunc, and corresponding selftests test cases. In addition, since fs kfuncs is general and useful for scenarios other than LSM, this patch makes fs kfuncs available for SYSCALL program type. (In this version I did not remove the declarations in bpf_experimental.h as I guess these might be useful to others?) Although iter/task_file already exists, for CRIB we still need the open-coded iterator style process file iterator, and the same is true for other bpf iterators such as iter/tcp, iter/udp, etc. The traditional bpf iterator is more like a bpf version of procfs, but similar to procfs, it is not suitable for CRIB scenarios that need to obtain large amounts of complex, multi-level in-kernel information. The following is from previous discussions [1]. [1]: https://lore.kernel.org/bpf/AM6PR03MB5848CA34B5B68C90F210285E99B12@AM6PR03MB5848.eurprd03.prod.outlook.com/ This is because the context of bpf iterators is fixed and bpf iterators cannot be nested. This means that a bpf iterator program can only complete a specific small iterative dump task, and cannot dump multi-level data. An example, when we need to dump all the sockets of a process, we need to iterate over all the files (sockets) of the process, and iterate over the all packets in the queue of each socket, and iterate over all data in each packet. If we use bpf iterator, since the iterator can not be nested, we need to use socket iterator program to get all the basic information of all sockets (pass pid as filter), and then use packet iterator program to get the basic information of all packets of a specific socket (pass pid, fd as filter), and then use packet data iterator program to get all the data of a specific packet (pass pid, fd, packet index as filter). This would be complicated and require a lot of (each iteration) bpf program startup and exit (leading to poor performance). By comparison, open coded iterator is much more flexible, we can iterate in any context, at any time, and iteration can be nested, so we can achieve more flexible and more elegant dumping through open coded iterators. With open coded iterators, all of the above can be done in a single bpf program, and with nested iterators, everything becomes compact and simple. Also, bpf iterators transmit data to user space through seq_file, which involves a lot of open (bpf_iter_create), read, close syscalls, context switching, memory copying, and cannot achieve the performance of using ringbuf. Signed-off-by: Juntong Deng <juntong.deng@outlook.com> --- v7 -> v8: * Keep path_d_path_kfunc_non_lsm * Add back the const following extern v6 -> v7: * Fix argument index mistake * Remove __aligned(8) at bpf_iter_task_file_kern * Make the if statement that checks item->file closer to fget_task_next * Remove the const following extern * Keep bpf_fs_kfuncs_filter v5 -> v6: * Remove local variable in bpf_fget_task. * Remove KF_RCU_PROTECTED from bpf_iter_task_file_new. * Remove bpf_fs_kfunc_set from being available for TRACING. * Use get_task_struct in bpf_iter_task_file_new. * Use put_task_struct in bpf_iter_task_file_destroy. v4 -> v5: * Add file type checks in test cases for process file iterator and bpf_fget_task(). * Use fentry to synchronize tests instead of waiting in a loop. * Remove path_d_path_kfunc_non_lsm test case. * Replace task_lookup_next_fdget_rcu() with fget_task_next(). * Remove future merge conflict section in cover letter (resolved). v3 -> v4: * Make all kfuncs generic, not CRIB specific. * Move bpf_fget_task to fs/bpf_fs_kfuncs.c. * Remove bpf_iter_task_file_get_fd and bpf_get_file_ops_type. * Use struct bpf_iter_task_file_item * as the return value of bpf_iter_task_file_next. * Change fd to unsigned int type and add next_fd. * Add KF_RCU_PROTECTED to bpf_iter_task_file_new. * Make fs kfuncs available to SYSCALL and TRACING program types. * Update all relevant test cases. * Remove the discussion section from cover letter. v2 -> v3: * Move task_file open-coded iterator to kernel/bpf/helpers.c. * Fix duplicate error code 7 in test_bpf_iter_task_file(). * Add comment for case when bpf_iter_task_file_get_fd() returns -1. * Add future plans in commit message of "Add struct file related CRIB kfuncs". * Add Discussion section to cover letter. v1 -> v2: * Fix a type definition error in the fd parameter of bpf_fget_task() at crib_common.h. Juntong Deng (5): bpf: Introduce task_file open-coded iterator kfuncs selftests/bpf: Add tests for open-coded style process file iterator bpf: Add bpf_fget_task() kfunc bpf: Make fs kfuncs available for SYSCALL program type selftests/bpf: Add tests for bpf_fget_task() kfunc fs/bpf_fs_kfuncs.c | 32 +++++-- kernel/bpf/helpers.c | 3 + kernel/bpf/task_iter.c | 90 ++++++++++++++++++ .../testing/selftests/bpf/bpf_experimental.h | 15 +++ .../selftests/bpf/prog_tests/fs_kfuncs.c | 46 ++++++++++ .../testing/selftests/bpf/prog_tests/iters.c | 78 ++++++++++++++++ .../selftests/bpf/progs/fs_kfuncs_failure.c | 33 +++++++ .../selftests/bpf/progs/iters_task_file.c | 87 ++++++++++++++++++ .../bpf/progs/iters_task_file_failure.c | 91 +++++++++++++++++++ .../selftests/bpf/progs/test_fget_task.c | 63 +++++++++++++ 10 files changed, 530 insertions(+), 8 deletions(-) create mode 100644 tools/testing/selftests/bpf/progs/fs_kfuncs_failure.c create mode 100644 tools/testing/selftests/bpf/progs/iters_task_file.c create mode 100644 tools/testing/selftests/bpf/progs/iters_task_file_failure.c create mode 100644 tools/testing/selftests/bpf/progs/test_fget_task.c