Message ID | 20250125-bpf_dynptr_probe-v2-1-c42c87f97afe@outlook.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | bpf: Add probe_read_{kernel,user}_dynptr and copy_from_user_dynptr | expand |
On Sat, Jan 25, 2025 at 12:30 AM Levi Zim via B4 Relay <devnull+rsworktech.outlook.com@kernel.org> wrote: > > From: Levi Zim <rsworktech@outlook.com> > > This patch add a helper function bpf_probe_read_kernel_dynptr: > > long bpf_probe_read_kernel_dynptr(const struct bpf_dynptr *dst, > u32 offset, u32 size, const void *unsafe_ptr, u64 flags); We stopped adding helpers years ago. Only new kfuncs are allowed. This particular one doesn't look useful as-is. The same logic can be expressed with - create dynptr - dynptr_slice - copy_from_kernel pw-bot: cr
On 2025/1/26 00:58, Alexei Starovoitov wrote: > On Sat, Jan 25, 2025 at 12:30 AM Levi Zim via B4 Relay > <devnull+rsworktech.outlook.com@kernel.org> wrote: >> From: Levi Zim <rsworktech@outlook.com> >> >> This patch add a helper function bpf_probe_read_kernel_dynptr: >> >> long bpf_probe_read_kernel_dynptr(const struct bpf_dynptr *dst, >> u32 offset, u32 size, const void *unsafe_ptr, u64 flags); > We stopped adding helpers years ago. > Only new kfuncs are allowed. Sorry, I didn't know that. Just asking, is there any documentation/discussion about stopping adding helpers? I will switch the implementation to kfuncs in v3. > This particular one doesn't look useful as-is. > The same logic can be expressed with > - create dynptr > - dynptr_slice > - copy_from_kernel By copy_from_kernel I assume you mean bpf_probe_read_kernel. The problem with dynptr_slice_rdwr and probe_read_kernel is that they only support a compile-time constant size [1]. But in order to best utilize the space on a BPF ringbuf, it is possible to reserve a variable length of space as dynptr on a ringbuf with bpf_ringbuf_reserve_dynptr. Then currently we have no way to read a variable length of kernel memory into this dynptr, except doing it chunk by chunk[2], which is kinda awkward. That's the problem the new helpers trying to solve. And I am not the only one needing this kind of feature [3]. Andrii said it would be a straightforward addition as it is a super thin wrapper around existing functionality (we are just avoiding fixed buffer size restrictions of existing probe/copy_from APIs) [1]: https://elixir.bootlin.com/linux/v6.12.6/source/kernel/bpf/helpers.c#L2600-L2601 [2]: https://github.com/libbpf/libbpf-bootstrap/commit/046fad60df3e39540937b5ec6ee86054f33d3f28 [3]: https://github.com/libbpf/libbpf-rs/issues/1041 [4]: https://lore.kernel.org/bpf/CAEf4BzZctXJsR+TwMhmXNWnR0_BV802-3KJw226ZZt8St4xNkw@mail.gmail.com/ > pw-bot: cr
On Sat, Jan 25, 2025 at 5:05 PM Levi Zim <rsworktech@outlook.com> wrote: > > On 2025/1/26 00:58, Alexei Starovoitov wrote: > > On Sat, Jan 25, 2025 at 12:30 AM Levi Zim via B4 Relay > > <devnull+rsworktech.outlook.com@kernel.org> wrote: > >> From: Levi Zim <rsworktech@outlook.com> > >> > >> This patch add a helper function bpf_probe_read_kernel_dynptr: > >> > >> long bpf_probe_read_kernel_dynptr(const struct bpf_dynptr *dst, > >> u32 offset, u32 size, const void *unsafe_ptr, u64 flags); > > We stopped adding helpers years ago. > > Only new kfuncs are allowed. > > Sorry, I didn't know that. Just asking, is there any > documentation/discussion > about stopping adding helpers? > > I will switch the implementation to kfuncs in v3. > > > This particular one doesn't look useful as-is. > > The same logic can be expressed with > > - create dynptr > > - dynptr_slice > > - copy_from_kernel > > By copy_from_kernel I assume you mean bpf_probe_read_kernel. The problem > with dynptr_slice_rdwr and probe_read_kernel is that they only support a > compile-time constant size [1]. > > But in order to best utilize the space on a BPF ringbuf, it is possible > to reserve a > variable length of space as dynptr on a ringbuf with > bpf_ringbuf_reserve_dynptr. That makes sense. The commit log didn't call it out. Please spell out the motivation clearly. Also why bpf_probe_read_kernel_common ? Do we need to memset() it on failure?
On Sat, Jan 25, 2025 at 5:05 PM Levi Zim <rsworktech@outlook.com> wrote: > > On 2025/1/26 00:58, Alexei Starovoitov wrote: > > On Sat, Jan 25, 2025 at 12:30 AM Levi Zim via B4 Relay > > <devnull+rsworktech.outlook.com@kernel.org> wrote: > >> From: Levi Zim <rsworktech@outlook.com> > >> > >> This patch add a helper function bpf_probe_read_kernel_dynptr: > >> > >> long bpf_probe_read_kernel_dynptr(const struct bpf_dynptr *dst, > >> u32 offset, u32 size, const void *unsafe_ptr, u64 flags); > > We stopped adding helpers years ago. > > Only new kfuncs are allowed. > > Sorry, I didn't know that. Just asking, is there any > documentation/discussion > about stopping adding helpers? > > I will switch the implementation to kfuncs in v3. you might want to look at Documentation/bpf/kfuncs.rst as well > > > This particular one doesn't look useful as-is. > > The same logic can be expressed with > > - create dynptr > > - dynptr_slice > > - copy_from_kernel > > By copy_from_kernel I assume you mean bpf_probe_read_kernel. The problem > with dynptr_slice_rdwr and probe_read_kernel is that they only support a > compile-time constant size [1]. > > But in order to best utilize the space on a BPF ringbuf, it is possible > to reserve a > variable length of space as dynptr on a ringbuf with > bpf_ringbuf_reserve_dynptr. > > Then currently we have no way to read a variable length of kernel memory > into this dynptr, except doing it chunk by chunk[2], which is kinda awkward. > That's the problem the new helpers trying to solve. > And I am not the only one needing this kind of feature [3]. > > Andrii said it would be a straightforward addition as it is a super thin > wrapper > around existing functionality (we are just avoiding fixed buffer size > restrictions of > existing probe/copy_from APIs) Yep, exactly. All the probe/copy APIs we have require statically known dst buffer and, worst still, statically known read size, which makes it very hard and cumbersome to request data which size is known at runtime (e.g., process argv/environ data). So, in short, I think these new dynptr-based APIs are a good addition, but yeah, they'll have to be added as kfuncs. One thing that we need to figure out is how to deal with SKB/XDP dynptrs, we can't just reject them as destination buffers for all possible APIs. I think we'll need to develop a set of internal helpers that would handle skb/xdp dynptr cases almost transparently: - if skb/xdp memory is linear, we just memcpy/probe_read into it directly; - if it's not linear, we use on-the-stack small buffer (128 or 256 bytes, something like that), to do desired operation (memcpy or probe) into that buffer, and then using generic dynptr write functionality from that buffer into skb/xdp. We probably can start simple with excluding xdp/skb, but this should be the second step. > > [1]: > https://elixir.bootlin.com/linux/v6.12.6/source/kernel/bpf/helpers.c#L2600-L2601 > [2]: > https://github.com/libbpf/libbpf-bootstrap/commit/046fad60df3e39540937b5ec6ee86054f33d3f28 > [3]: https://github.com/libbpf/libbpf-rs/issues/1041 > [4]: > https://lore.kernel.org/bpf/CAEf4BzZctXJsR+TwMhmXNWnR0_BV802-3KJw226ZZt8St4xNkw@mail.gmail.com/ > > > pw-bot: cr
On Mon, Jan 27, 2025 at 5:04 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Sat, Jan 25, 2025 at 5:05 PM Levi Zim <rsworktech@outlook.com> wrote: > > > > On 2025/1/26 00:58, Alexei Starovoitov wrote: > > > On Sat, Jan 25, 2025 at 12:30 AM Levi Zim via B4 Relay > > > <devnull+rsworktech.outlook.com@kernel.org> wrote: > > >> From: Levi Zim <rsworktech@outlook.com> > > >> > > >> This patch add a helper function bpf_probe_read_kernel_dynptr: > > >> > > >> long bpf_probe_read_kernel_dynptr(const struct bpf_dynptr *dst, > > >> u32 offset, u32 size, const void *unsafe_ptr, u64 flags); > > > We stopped adding helpers years ago. > > > Only new kfuncs are allowed. > > > > Sorry, I didn't know that. Just asking, is there any > > documentation/discussion > > about stopping adding helpers? > > > > I will switch the implementation to kfuncs in v3. > > > > > This particular one doesn't look useful as-is. > > > The same logic can be expressed with > > > - create dynptr > > > - dynptr_slice > > > - copy_from_kernel > > > > By copy_from_kernel I assume you mean bpf_probe_read_kernel. The problem > > with dynptr_slice_rdwr and probe_read_kernel is that they only support a > > compile-time constant size [1]. > > > > But in order to best utilize the space on a BPF ringbuf, it is possible > > to reserve a > > variable length of space as dynptr on a ringbuf with > > bpf_ringbuf_reserve_dynptr. For our uprobes, we've run into similar issues around doing variable-sized bpf_probe_read_user() into ring buffers for our debugger [1]. Our use case is that we generate uprobes that recursively read data structures until we fill up a buffer. The verifier's insistence on knowing statically that a read fits into the buffer makes for awkward code, and makes it hard to pack the buffer fully; we have to split our reads into a couple of static size classes. Any chance there'd be interest in taking the opportunity to support dynamically-sized reads from userspace too? :) [1] https://side-eye.io > > That makes sense. The commit log didn't call it out. > Please spell out the motivation clearly. > Also why bpf_probe_read_kernel_common ? > Do we need to memset() it on failure? >
On Mon, Jan 27, 2025 at 2:54 PM Andrei Matei <andreimatei1@gmail.com> wrote: > > On Mon, Jan 27, 2025 at 5:04 PM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: > > > > On Sat, Jan 25, 2025 at 5:05 PM Levi Zim <rsworktech@outlook.com> wrote: > > > > > > On 2025/1/26 00:58, Alexei Starovoitov wrote: > > > > On Sat, Jan 25, 2025 at 12:30 AM Levi Zim via B4 Relay > > > > <devnull+rsworktech.outlook.com@kernel.org> wrote: > > > >> From: Levi Zim <rsworktech@outlook.com> > > > >> > > > >> This patch add a helper function bpf_probe_read_kernel_dynptr: > > > >> > > > >> long bpf_probe_read_kernel_dynptr(const struct bpf_dynptr *dst, > > > >> u32 offset, u32 size, const void *unsafe_ptr, u64 flags); > > > > We stopped adding helpers years ago. > > > > Only new kfuncs are allowed. > > > > > > Sorry, I didn't know that. Just asking, is there any > > > documentation/discussion > > > about stopping adding helpers? > > > > > > I will switch the implementation to kfuncs in v3. > > > > > > > This particular one doesn't look useful as-is. > > > > The same logic can be expressed with > > > > - create dynptr > > > > - dynptr_slice > > > > - copy_from_kernel > > > > > > By copy_from_kernel I assume you mean bpf_probe_read_kernel. The problem > > > with dynptr_slice_rdwr and probe_read_kernel is that they only support a > > > compile-time constant size [1]. > > > > > > But in order to best utilize the space on a BPF ringbuf, it is possible > > > to reserve a > > > variable length of space as dynptr on a ringbuf with > > > bpf_ringbuf_reserve_dynptr. > > For our uprobes, we've run into similar issues around doing variable-sized > bpf_probe_read_user() into ring buffers for our debugger [1]. Our use case > is that we generate uprobes that recursively read data structures until we > fill up a buffer. The verifier's insistence on knowing statically that a read > fits into the buffer makes for awkward code, and makes it hard to pack the > buffer fully; we have to split our reads into a couple of static size classes. > > Any chance there'd be interest in taking the opportunity to support > dynamically-sized reads from userspace too? :) That's bpf_probe_read_user_dynptr() from patch #2, no? But generally speaking, here's a list of new APIs that we'd need to cover all existing fixed buffer versions: - non-sleepable probe reads: bpf_probe_read_kernel_dynptr() bpf_probe_read_user_dynptr() bpf_probe_read_kernel_str_dynptr() bpf_probe_read_user_str_dynptr() - sleepable probe reads (copy_from_user): bpf_copy_from_user_dynptr() bpf_copy_from_user_str_dynptr() - and then we have complementary task-based APIs for non-current process: bpf_probe_read_user_task_dynptr() bpf_probe_read_user_str_task_dynptr() bpf_copy_from_user_task_dynptr() bpf_copy_from_user_str_task_dynptr() Jordan is working on non-dynptr version of bpf_copy_from_user_str_task(), once he's done with that, we'll add dynptr version, probably. > > [1] https://side-eye.io > > > > > That makes sense. The commit log didn't call it out. > > Please spell out the motivation clearly. > > Also why bpf_probe_read_kernel_common ? > > Do we need to memset() it on failure? > >
On 2025-01-28 07:09, Andrii Nakryiko wrote: > On Mon, Jan 27, 2025 at 2:54 PM Andrei Matei <andreimatei1@gmail.com> wrote: >> On Mon, Jan 27, 2025 at 5:04 PM Alexei Starovoitov >> <alexei.starovoitov@gmail.com> wrote: >>> On Sat, Jan 25, 2025 at 5:05 PM Levi Zim <rsworktech@outlook.com> wrote: >>>> On 2025/1/26 00:58, Alexei Starovoitov wrote: >>>> > On Sat, Jan 25, 2025 at 12:30 AM Levi Zim via B4 Relay >>>> > <devnull+rsworktech.outlook.com@kernel.org> wrote: >>>> >> From: Levi Zim <rsworktech@outlook.com> >>>> >> >>>> >> This patch add a helper function bpf_probe_read_kernel_dynptr: >>>> >> >>>> >> long bpf_probe_read_kernel_dynptr(const struct bpf_dynptr *dst, >>>> >> u32 offset, u32 size, const void *unsafe_ptr, u64 flags); >>>> > We stopped adding helpers years ago. >>>> > Only new kfuncs are allowed. >>>> >>>> Sorry, I didn't know that. Just asking, is there any >>>> documentation/discussion >>>> about stopping adding helpers? >>>> >>>> I will switch the implementation to kfuncs in v3. >>>> >>>> > This particular one doesn't look useful as-is. >>>> > The same logic can be expressed with >>>> > - create dynptr >>>> > - dynptr_slice >>>> > - copy_from_kernel >>>> >>>> By copy_from_kernel I assume you mean bpf_probe_read_kernel. The problem >>>> with dynptr_slice_rdwr and probe_read_kernel is that they only support a >>>> compile-time constant size [1]. >>>> >>>> But in order to best utilize the space on a BPF ringbuf, it is possible >>>> to reserve a >>>> variable length of space as dynptr on a ringbuf with >>>> bpf_ringbuf_reserve_dynptr. >> For our uprobes, we've run into similar issues around doing variable-sized >> bpf_probe_read_user() into ring buffers for our debugger [1]. Our use case >> is that we generate uprobes that recursively read data structures until we >> fill up a buffer. The verifier's insistence on knowing statically that a read >> fits into the buffer makes for awkward code, and makes it hard to pack the >> buffer fully; we have to split our reads into a couple of static size classes. >> >> Any chance there'd be interest in taking the opportunity to support >> dynamically-sized reads from userspace too? :) > That's bpf_probe_read_user_dynptr() from patch #2, no? > > But generally speaking, here's a list of new APIs that we'd need to > cover all existing fixed buffer versions: > > - non-sleepable probe reads: > > bpf_probe_read_kernel_dynptr() > bpf_probe_read_user_dynptr() > bpf_probe_read_kernel_str_dynptr() I think the _str_dynptr versions are probably not worth adding. For example, when we use probe_read_kernel_str, the length of the str is usually not known and we usually allocate a fixed size buffer for it. If we do know the length of the str beforehand, we can just use probe_read_kernel_dynptr. > bpf_probe_read_user_str_dynptr() > > - sleepable probe reads (copy_from_user): > > bpf_copy_from_user_dynptr() > bpf_copy_from_user_str_dynptr() > > - and then we have complementary task-based APIs for non-current process: > > bpf_probe_read_user_task_dynptr() > bpf_probe_read_user_str_task_dynptr() > bpf_copy_from_user_task_dynptr() > bpf_copy_from_user_str_task_dynptr() > > Jordan is working on non-dynptr version of > bpf_copy_from_user_str_task(), once he's done with that, we'll add > dynptr version, probably. > >> [1] https://side-eye.io >> >>> That makes sense. The commit log didn't call it out. >>> Please spell out the motivation clearly. >>> Also why bpf_probe_read_kernel_common ? >>> Do we need to memset() it on failure? >>>
On Mon, Jan 27, 2025 at 3:09 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > On Mon, Jan 27, 2025 at 2:54 PM Andrei Matei <andreimatei1@gmail.com> wrote: > > > > On Mon, Jan 27, 2025 at 5:04 PM Alexei Starovoitov > > <alexei.starovoitov@gmail.com> wrote: > > > > > > On Sat, Jan 25, 2025 at 5:05 PM Levi Zim <rsworktech@outlook.com> wrote: > > > > > > > > On 2025/1/26 00:58, Alexei Starovoitov wrote: > > > > > On Sat, Jan 25, 2025 at 12:30 AM Levi Zim via B4 Relay > > > > > <devnull+rsworktech.outlook.com@kernel.org> wrote: > > > > >> From: Levi Zim <rsworktech@outlook.com> > > > > >> > > > > >> This patch add a helper function bpf_probe_read_kernel_dynptr: > > > > >> > > > > >> long bpf_probe_read_kernel_dynptr(const struct bpf_dynptr *dst, > > > > >> u32 offset, u32 size, const void *unsafe_ptr, u64 flags); > > > > > We stopped adding helpers years ago. > > > > > Only new kfuncs are allowed. > > > > > > > > Sorry, I didn't know that. Just asking, is there any > > > > documentation/discussion > > > > about stopping adding helpers? > > > > > > > > I will switch the implementation to kfuncs in v3. > > > > > > > > > This particular one doesn't look useful as-is. > > > > > The same logic can be expressed with > > > > > - create dynptr > > > > > - dynptr_slice > > > > > - copy_from_kernel > > > > > > > > By copy_from_kernel I assume you mean bpf_probe_read_kernel. The problem > > > > with dynptr_slice_rdwr and probe_read_kernel is that they only support a > > > > compile-time constant size [1]. > > > > > > > > But in order to best utilize the space on a BPF ringbuf, it is possible > > > > to reserve a > > > > variable length of space as dynptr on a ringbuf with > > > > bpf_ringbuf_reserve_dynptr. > > > > For our uprobes, we've run into similar issues around doing variable-sized > > bpf_probe_read_user() into ring buffers for our debugger [1]. Our use case > > is that we generate uprobes that recursively read data structures until we > > fill up a buffer. The verifier's insistence on knowing statically that a read > > fits into the buffer makes for awkward code, and makes it hard to pack the > > buffer fully; we have to split our reads into a couple of static size classes. > > > > Any chance there'd be interest in taking the opportunity to support > > dynamically-sized reads from userspace too? :) > > That's bpf_probe_read_user_dynptr() from patch #2, no? > > But generally speaking, here's a list of new APIs that we'd need to > cover all existing fixed buffer versions: > > - non-sleepable probe reads: > > bpf_probe_read_kernel_dynptr() > bpf_probe_read_user_dynptr() > bpf_probe_read_kernel_str_dynptr() > bpf_probe_read_user_str_dynptr() > > - sleepable probe reads (copy_from_user): > > bpf_copy_from_user_dynptr() > bpf_copy_from_user_str_dynptr() > > - and then we have complementary task-based APIs for non-current process: > > bpf_probe_read_user_task_dynptr() > bpf_probe_read_user_str_task_dynptr() > bpf_copy_from_user_task_dynptr() > bpf_copy_from_user_str_task_dynptr() > > Jordan is working on non-dynptr version of > bpf_copy_from_user_str_task(), once he's done with that, we'll add > dynptr version, probably. This is quite a bunch of kfuncs. It doesn't look like adding _dynptr suffix and duplicating kfuncs approach scales. Let's make the existing helpers/kfuncs more flexible ? We can introduce a kfunc bpf_dynptr_buf() that checks that dynptr is not readonly and type == local or ringbuf and return dynptr->data as PTR_TO_MEM | dynptr_flag | VERIFIER_ADDS_SIZE_CHECK. Then allow bpf_probe_read_user/kernel/... all of them to accept this register type where PTR_TO_MEM is required while relaxing ARG_CONST_SIZE 2nd argument to ARG_ANYTHING. Then the verifier will insert an extra check if (arg1->size < arg2) before the call. Not only the bpf_probe_read_kernel/user, _str variants will work but things like bpf_strtol, bpf_strncmp, bpf_snprintf, bpf_get_stack will auto-magically work as well. I think those are quite valuable to make available with non-constant size. bpf_get_stack_*() directly into the ring buffer sounds very useful.
On 2025/1/28 06:04, Alexei Starovoitov wrote: > On Sat, Jan 25, 2025 at 5:05 PM Levi Zim <rsworktech@outlook.com> wrote: >> On 2025/1/26 00:58, Alexei Starovoitov wrote: >> > On Sat, Jan 25, 2025 at 12:30 AM Levi Zim via B4 Relay >> > <devnull+rsworktech.outlook.com@kernel.org> wrote: >> >> From: Levi Zim <rsworktech@outlook.com> >> >> >> >> This patch add a helper function bpf_probe_read_kernel_dynptr: >> >> >> >> long bpf_probe_read_kernel_dynptr(const struct bpf_dynptr *dst, >> >> u32 offset, u32 size, const void *unsafe_ptr, u64 flags); >> > We stopped adding helpers years ago. >> > Only new kfuncs are allowed. >> >> Sorry, I didn't know that. Just asking, is there any >> documentation/discussion >> about stopping adding helpers? >> >> I will switch the implementation to kfuncs in v3. >> >> > This particular one doesn't look useful as-is. >> > The same logic can be expressed with >> > - create dynptr >> > - dynptr_slice >> > - copy_from_kernel >> >> By copy_from_kernel I assume you mean bpf_probe_read_kernel. The problem >> with dynptr_slice_rdwr and probe_read_kernel is that they only support a >> compile-time constant size [1]. >> >> But in order to best utilize the space on a BPF ringbuf, it is possible >> to reserve a >> variable length of space as dynptr on a ringbuf with >> bpf_ringbuf_reserve_dynptr. > That makes sense. The commit log didn't call it out. > Please spell out the motivation clearly. Thanks for the advice! I will include it in v3. > Also why bpf_probe_read_kernel_common ? > Do we need to memset() it on failure? Since the current patch is basically a thin wrapper around bpf_probe_read_kernel, I think we'd better not deviate from the wrapped function.
On 2025/1/28 10:57, Alexei Starovoitov wrote: > On Mon, Jan 27, 2025 at 3:09 PM Andrii Nakryiko > <andrii.nakryiko@gmail.com> wrote: >> On Mon, Jan 27, 2025 at 2:54 PM Andrei Matei <andreimatei1@gmail.com> wrote: >>> On Mon, Jan 27, 2025 at 5:04 PM Alexei Starovoitov >>> <alexei.starovoitov@gmail.com> wrote: >>>> On Sat, Jan 25, 2025 at 5:05 PM Levi Zim <rsworktech@outlook.com> wrote: >>>>> On 2025/1/26 00:58, Alexei Starovoitov wrote: >>>>> > On Sat, Jan 25, 2025 at 12:30 AM Levi Zim via B4 Relay >>>>> > <devnull+rsworktech.outlook.com@kernel.org> wrote: >>>>> >> From: Levi Zim <rsworktech@outlook.com> >>>>> >> >>>>> >> This patch add a helper function bpf_probe_read_kernel_dynptr: >>>>> >> >>>>> >> long bpf_probe_read_kernel_dynptr(const struct bpf_dynptr *dst, >>>>> >> u32 offset, u32 size, const void *unsafe_ptr, u64 flags); >>>>> > We stopped adding helpers years ago. >>>>> > Only new kfuncs are allowed. >>>>> >>>>> Sorry, I didn't know that. Just asking, is there any >>>>> documentation/discussion >>>>> about stopping adding helpers? >>>>> >>>>> I will switch the implementation to kfuncs in v3. >>>>> >>>>> > This particular one doesn't look useful as-is. >>>>> > The same logic can be expressed with >>>>> > - create dynptr >>>>> > - dynptr_slice >>>>> > - copy_from_kernel >>>>> >>>>> By copy_from_kernel I assume you mean bpf_probe_read_kernel. The problem >>>>> with dynptr_slice_rdwr and probe_read_kernel is that they only support a >>>>> compile-time constant size [1]. >>>>> >>>>> But in order to best utilize the space on a BPF ringbuf, it is possible >>>>> to reserve a >>>>> variable length of space as dynptr on a ringbuf with >>>>> bpf_ringbuf_reserve_dynptr. >>> For our uprobes, we've run into similar issues around doing variable-sized >>> bpf_probe_read_user() into ring buffers for our debugger [1]. Our use case >>> is that we generate uprobes that recursively read data structures until we >>> fill up a buffer. The verifier's insistence on knowing statically that a read >>> fits into the buffer makes for awkward code, and makes it hard to pack the >>> buffer fully; we have to split our reads into a couple of static size classes. >>> >>> Any chance there'd be interest in taking the opportunity to support >>> dynamically-sized reads from userspace too? :) >> That's bpf_probe_read_user_dynptr() from patch #2, no? >> >> But generally speaking, here's a list of new APIs that we'd need to >> cover all existing fixed buffer versions: >> >> - non-sleepable probe reads: >> >> bpf_probe_read_kernel_dynptr() >> bpf_probe_read_user_dynptr() >> bpf_probe_read_kernel_str_dynptr() >> bpf_probe_read_user_str_dynptr() >> >> - sleepable probe reads (copy_from_user): >> >> bpf_copy_from_user_dynptr() >> bpf_copy_from_user_str_dynptr() >> >> - and then we have complementary task-based APIs for non-current process: >> >> bpf_probe_read_user_task_dynptr() >> bpf_probe_read_user_str_task_dynptr() >> bpf_copy_from_user_task_dynptr() >> bpf_copy_from_user_str_task_dynptr() >> >> Jordan is working on non-dynptr version of >> bpf_copy_from_user_str_task(), once he's done with that, we'll add >> dynptr version, probably. > This is quite a bunch of kfuncs. > It doesn't look like adding _dynptr suffix and duplicating > kfuncs approach scales. The _str_dynptr versions might not worth adding [1]. So only four read_{kernel,user}_dynptr and copy_from_user{,_task}_dynptr are needed, which seems manageable for now. But taking other helpers like bpf_strtol into account does quickly show that this approach is not scalable. > Let's make the existing helpers/kfuncs more flexible ? > > We can introduce a kfunc bpf_dynptr_buf() that checks that > dynptr is not readonly and type == local or ringbuf and > return dynptr->data as PTR_TO_MEM | dynptr_flag | VERIFIER_ADDS_SIZE_CHECK. > > Then allow bpf_probe_read_user/kernel/... all of them to accept > this register type where PTR_TO_MEM is required > while relaxing ARG_CONST_SIZE 2nd argument to ARG_ANYTHING. > Then the verifier will insert an extra check > if (arg1->size < arg2) > before the call. Nice idea. I will try this approach first. > > Not only the bpf_probe_read_kernel/user, _str variants will work > but things like bpf_strtol, bpf_strncmp, bpf_snprintf, bpf_get_stack > will auto-magically work as well. > > I think those are quite valuable to make available with non-constant size. > bpf_get_stack_*() directly into the ring buffer sounds very useful. [1]: https://lore.kernel.org/bpf/20250125-bpf_dynptr_probe-v2-0-c42c87f97afe@outlook.com/T/#m9700146d286a88abc0b25ef47041015ba6c477a3
diff --git a/include/linux/bpf.h b/include/linux/bpf.h index f3f50e29d63929acaf12c81f8356173f1f5e154b..9d5ae8b4b7d82c4523bf0ab041d4b76bf134a106 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1323,6 +1323,8 @@ u32 __bpf_dynptr_size(const struct bpf_dynptr_kern *ptr); const void *__bpf_dynptr_data(const struct bpf_dynptr_kern *ptr, u32 len); void *__bpf_dynptr_data_rw(const struct bpf_dynptr_kern *ptr, u32 len); bool __bpf_dynptr_is_rdonly(const struct bpf_dynptr_kern *ptr); +int bpf_dynptr_check_off_len(const struct bpf_dynptr_kern *ptr, u32 offset, u32 len); +enum bpf_dynptr_type bpf_dynptr_get_type(const struct bpf_dynptr_kern *ptr); #ifdef CONFIG_BPF_JIT int bpf_trampoline_link_prog(struct bpf_tramp_link *link, diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 2acf9b33637174bd16b1d12ccc6410c5f55a7ea9..2e08a59527ecf56732ea14ac34446b5eb25b5690 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -5805,6 +5805,21 @@ union bpf_attr { * 0 on success. * * **-ENOENT** if the bpf_local_storage cannot be found. + * + * long bpf_probe_read_kernel_dynptr(const struct bpf_dynptr *dst, u32 offset, u32 size, const void *unsafe_ptr, u64 flags) + * Description + * Safely attempt to read *size* bytes from kernel space address + * *unsafe_ptr* and store the data in *dst* starting from *offset*. + * *flags* is currently unused. + * Return + * 0 on success. + * + * **-E2BIG** if *offset* + *len* exceeds the length of *src*'s data + * + * **-EINVAL** if *src* is an invalid dynptr or doesn't support this + * support this helper, or if *flags* is not 0. + * + * Or other negative errors on failure reading kernel memory. */ #define ___BPF_FUNC_MAPPER(FN, ctx...) \ FN(unspec, 0, ##ctx) \ @@ -6019,6 +6034,7 @@ union bpf_attr { FN(user_ringbuf_drain, 209, ##ctx) \ FN(cgrp_storage_get, 210, ##ctx) \ FN(cgrp_storage_delete, 211, ##ctx) \ + FN(probe_read_kernel_dynptr, 212, ##ctx) \ /* */ /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index f27ce162427ab4040d2e2d2eb84a883fe57de59e..a736dc9e7be98571103ba404420be0da4dac4fbe 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -1678,7 +1678,7 @@ static void bpf_dynptr_set_type(struct bpf_dynptr_kern *ptr, enum bpf_dynptr_typ ptr->size |= type << DYNPTR_TYPE_SHIFT; } -static enum bpf_dynptr_type bpf_dynptr_get_type(const struct bpf_dynptr_kern *ptr) +enum bpf_dynptr_type bpf_dynptr_get_type(const struct bpf_dynptr_kern *ptr) { return (ptr->size & ~(DYNPTR_RDONLY_BIT)) >> DYNPTR_TYPE_SHIFT; } @@ -1714,7 +1714,7 @@ void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr) memset(ptr, 0, sizeof(*ptr)); } -static int bpf_dynptr_check_off_len(const struct bpf_dynptr_kern *ptr, u32 offset, u32 len) +int bpf_dynptr_check_off_len(const struct bpf_dynptr_kern *ptr, u32 offset, u32 len) { u32 size = __bpf_dynptr_size(ptr); @@ -1900,6 +1900,7 @@ const struct bpf_func_proto bpf_probe_read_user_proto __weak; const struct bpf_func_proto bpf_probe_read_user_str_proto __weak; const struct bpf_func_proto bpf_probe_read_kernel_proto __weak; const struct bpf_func_proto bpf_probe_read_kernel_str_proto __weak; +const struct bpf_func_proto bpf_probe_read_kernel_dynptr_proto __weak; const struct bpf_func_proto bpf_task_pt_regs_proto __weak; const struct bpf_func_proto * @@ -2031,6 +2032,9 @@ bpf_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) case BPF_FUNC_probe_read_kernel: return security_locked_down(LOCKDOWN_BPF_READ_KERNEL) < 0 ? NULL : &bpf_probe_read_kernel_proto; + case BPF_FUNC_probe_read_kernel_dynptr: + return security_locked_down(LOCKDOWN_BPF_READ_KERNEL) < 0 ? + NULL : &bpf_probe_read_kernel_dynptr_proto; case BPF_FUNC_probe_read_user_str: return &bpf_probe_read_user_str_proto; case BPF_FUNC_probe_read_kernel_str: diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index adc947587eb8132ebbd54778d2db937b3b8861de..75c9d1e8d04c3b8930ae81345f5586756ce8b5ec 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -248,6 +248,48 @@ const struct bpf_func_proto bpf_probe_read_kernel_proto = { .arg3_type = ARG_ANYTHING, }; +BPF_CALL_5(bpf_probe_read_kernel_dynptr, const struct bpf_dynptr_kern *, dst, + u32, offset, u32, size, void *, unsafe_ptr, u64, flags) +{ + enum bpf_dynptr_type type; + int err; + + if (!dst->data || __bpf_dynptr_is_rdonly(dst)) + return -EINVAL; + + err = bpf_dynptr_check_off_len(dst, offset, size); + if (err) + return err; + + type = bpf_dynptr_get_type(dst); + + switch (type) { + case BPF_DYNPTR_TYPE_LOCAL: + case BPF_DYNPTR_TYPE_RINGBUF: + if (flags) + return -EINVAL; + return bpf_probe_read_kernel_common(dst->data + dst->offset + offset, + size, unsafe_ptr); + case BPF_DYNPTR_TYPE_SKB: + case BPF_DYNPTR_TYPE_XDP: + return -EINVAL; + default: + WARN_ONCE(true, "%s: unknown dynptr type %d\n", __func__, type); + return -EFAULT; + } +} + +const struct bpf_func_proto bpf_probe_read_kernel_dynptr_proto = { + .func = bpf_probe_read_kernel_dynptr, + .gpl_only = true, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_DYNPTR | MEM_RDONLY, + .arg2_type = ARG_ANYTHING, + .arg3_type = ARG_ANYTHING, + .arg4_type = ARG_ANYTHING, + .arg5_type = ARG_ANYTHING, +}; + static __always_inline int bpf_probe_read_kernel_str_common(void *dst, u32 size, const void *unsafe_ptr) {