Message ID | 20221207205537.860248-1-joannelkoong@gmail.com (mailing list archive) |
---|---|
Series | Dynptr convenience helpers |
On Wed, Dec 07, 2022 at 12:55:31PM -0800, Joanne Koong wrote:
> This patchset is the 3rd in the dynptr series. The 1st can be found here [0]
> and the 2nd can be found here [1].
>
> In this patchset, the following convenience helpers are added for interacting
> with bpf dynamic pointers:
>
> * bpf_dynptr_data_rdonly
> * bpf_dynptr_trim
> * bpf_dynptr_advance
> * bpf_dynptr_is_null
> * bpf_dynptr_is_rdonly
> * bpf_dynptr_get_size
> * bpf_dynptr_get_offset
> * bpf_dynptr_clone
> * bpf_dynptr_iterator

This is great, but it really stretches uapi limits.
Please convert the above and those in [1] to kfuncs.
I know that there can be an argument made for consistency with existing dynptr uapi
helpers, but we got burned on them once and scrambled to add 'flags' argument.
kfuncs are unstable and can be adjusted/removed at any time later.
The verifier now supports dynptr in kfunc verification, so conversion should
be straightforward.
Thanks

>
> Please note that this patchset will be rebased on top of dynptr refactoring/fixes
> once that is landed upstream.
>
> [0] https://lore.kernel.org/bpf/20220523210712.3641569-1-joannelkoong@gmail.com/
> [1] https://lore.kernel.org/bpf/20221021011510.1890852-1-joannelkoong@gmail.com/
>
On Wed, Dec 7, 2022 at 5:54 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Dec 07, 2022 at 12:55:31PM -0800, Joanne Koong wrote:
> > This patchset is the 3rd in the dynptr series. The 1st can be found here [0]
> > and the 2nd can be found here [1].
> >
> > In this patchset, the following convenience helpers are added for interacting
> > with bpf dynamic pointers:
> >
> > * bpf_dynptr_data_rdonly
> > * bpf_dynptr_trim
> > * bpf_dynptr_advance
> > * bpf_dynptr_is_null
> > * bpf_dynptr_is_rdonly
> > * bpf_dynptr_get_size
> > * bpf_dynptr_get_offset
> > * bpf_dynptr_clone
> > * bpf_dynptr_iterator
>
> This is great, but it really stretches uapi limits.

Stretches in what sense? They are simple and straightforward getters,
and trim/advance/clone are fundamental modifiers needed to work
with a subset of a dynptr's overall memory area.

> Please convert the above and those in [1] to kfuncs.
> I know that there can be an argument made for consistency with existing dynptr uapi

Yeah, given we have bpf_dynptr_{read,write} and bpf_dynptr_data() as
BPF helpers, it makes sense to have basic things like is_null and
trim/advance/clone as BPF helpers as well, both for consistency and
because there is nothing unstable about them. We are not going to
remove dynptr as a concept; it's pretty well defined.

Out of the above list, perhaps only bpf_dynptr_iterator() might be
a candidate for a kfunc. Though, personally, it makes sense to me to
keep it as a BPF helper without the GPL restriction as well, given it is
meant for networking applications in the first place, and you don't
need to be GPL-compatible to write a useful networking BPF program, from
what I understand. But all the other ones are something you'd need to
make actual use of the dynptr concept in real-world BPF programs.

Can we please have those as BPF helpers, and we can decide to move the
slightly fancier bpf_dynptr_iterator() (and future dynptr-related
extras) into kfuncs?

> helpers, but we got burned on them once and scrambled to add 'flags' argument.
> kfuncs are unstable and can be adjusted/removed at any time later.

I don't see why we would ever remove any of the above list. They are
generic and fundamental to dynptr as a concept; they can't restrict
what dynptr can do in the future. Also, the GPL restriction of kfuncs
doesn't apply to these dynptr helpers either, IMO.

> The verifier now supports dynptr in kfunc verification, so conversion should
> be straightforward.
> Thanks
> >
> > Please note that this patchset will be rebased on top of dynptr refactoring/fixes
> > once that is landed upstream.
> >
> > [0] https://lore.kernel.org/bpf/20220523210712.3641569-1-joannelkoong@gmail.com/
> > [1] https://lore.kernel.org/bpf/20221021011510.1890852-1-joannelkoong@gmail.com/
> >
On Thu, Dec 8, 2022 at 4:42 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > On Wed, Dec 7, 2022 at 5:54 PM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: > > > > On Wed, Dec 07, 2022 at 12:55:31PM -0800, Joanne Koong wrote: > > > This patchset is the 3rd in the dynptr series. The 1st can be found here [0] > > > and the 2nd can be found here [1]. > > > > > > In this patchset, the following convenience helpers are added for interacting > > > with bpf dynamic pointers: > > > > > > * bpf_dynptr_data_rdonly > > > * bpf_dynptr_trim > > > * bpf_dynptr_advance > > > * bpf_dynptr_is_null > > > * bpf_dynptr_is_rdonly > > > * bpf_dynptr_get_size > > > * bpf_dynptr_get_offset > > > * bpf_dynptr_clone > > > * bpf_dynptr_iterator > > > > This is great, but it really stretches uapi limits. > > Stretches in what sense? They are simple and straightforward getters > and trim/advance/clone are fundamental modifiers to be able to work > with a subset of dynptr's overall memory area. > > > Please convert the above and those in [1] to kfuncs. > > I know that there can be an argument made for consistency with existing dynptr uapi > > yeah, given we have bpf_dynptr_{read,write} and bpf_dynptr_data() as > BPF helpers, it makes sense to have such basic things like is_null and > trim/advance/clone as BPF helpers as well. Both for consistency and > because there is nothing unstable about them. We are not going to > remove dynptr as a concept, it's pretty well defined. > > Out of the above list perhaps only move bpf_dynptr_iterator() might be > a candidate for kfunc. Though, personally, it makes sense to me to > keep it as BPF helper without GPL restriction as well, given it is > meant for networking applications in the first place, and you don't > need to be GPL-compatible to write useful networking BPF program, from > what I understand. But all the other ones is something you'd need to > make actual use of dynptr concept in real-world BPF programs. 
>
> Can we please have those as BPF helpers, and we can decide to move
> slightly fancier bpf_dynptr_iterator() (and future dynptr-related
> extras) into kfunc?

Sorry, uapi concerns are more important here.
non-gpl and consistency don't even come close.
We've been doing everything new as kfuncs and dynptr is not special.

> > helpers, but we got burned on them once and scrambled to add 'flags' argument.
> > kfuncs are unstable and can be adjusted/removed at any time later.
>
> I don't see why we would remove any of the above list ever? They are
> generic and fundamental to dynptr as a concept, they can't restrict
> what dynptr can do in the future.

It's not about removing them, but about changing them.

Just for example, the whole discussion of whether frags should
be handled transparently and how write is handled didn't inspire
confidence that there is a strong consensus on the semantics
of these new dynptr accessors.

Scrambling to add flags to dynptr helpers was another red flag.

All signs point to us not being ready to fix the dynptr API.
It will evolve and has to evolve without uapi pain.

kfuncs only. For everything. Please.
On Thu, Dec 8, 2022 at 5:30 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Thu, Dec 8, 2022 at 4:42 PM Andrii Nakryiko > <andrii.nakryiko@gmail.com> wrote: > > > > On Wed, Dec 7, 2022 at 5:54 PM Alexei Starovoitov > > <alexei.starovoitov@gmail.com> wrote: > > > > > > On Wed, Dec 07, 2022 at 12:55:31PM -0800, Joanne Koong wrote: > > > > This patchset is the 3rd in the dynptr series. The 1st can be found here [0] > > > > and the 2nd can be found here [1]. > > > > > > > > In this patchset, the following convenience helpers are added for interacting > > > > with bpf dynamic pointers: > > > > > > > > * bpf_dynptr_data_rdonly > > > > * bpf_dynptr_trim > > > > * bpf_dynptr_advance > > > > * bpf_dynptr_is_null > > > > * bpf_dynptr_is_rdonly > > > > * bpf_dynptr_get_size > > > > * bpf_dynptr_get_offset > > > > * bpf_dynptr_clone > > > > * bpf_dynptr_iterator > > > > > > This is great, but it really stretches uapi limits. > > > > Stretches in what sense? They are simple and straightforward getters > > and trim/advance/clone are fundamental modifiers to be able to work > > with a subset of dynptr's overall memory area. > > > > > Please convert the above and those in [1] to kfuncs. > > > I know that there can be an argument made for consistency with existing dynptr uapi > > > > yeah, given we have bpf_dynptr_{read,write} and bpf_dynptr_data() as > > BPF helpers, it makes sense to have such basic things like is_null and > > trim/advance/clone as BPF helpers as well. Both for consistency and > > because there is nothing unstable about them. We are not going to > > remove dynptr as a concept, it's pretty well defined. > > > > Out of the above list perhaps only move bpf_dynptr_iterator() might be > > a candidate for kfunc. 
Though, personally, it makes sense to me to > > keep it as BPF helper without GPL restriction as well, given it is > > meant for networking applications in the first place, and you don't > > need to be GPL-compatible to write useful networking BPF program, from > > what I understand. But all the other ones is something you'd need to > > make actual use of dynptr concept in real-world BPF programs. > > > > Can we please have those as BPF helpers, and we can decide to move > > slightly fancier bpf_dynptr_iterator() (and future dynptr-related > > extras) into kfunc? > > Sorry, uapi concerns are more important here. > non-gpl and consistency don't even come close. > We've been doing everything new as kfuncs and dynptr is not special. > > > > helpers, but we got burned on them once and scrambled to add 'flags' argument. > > > kfuncs are unstable and can be adjusted/removed at any time later. > > > > I don't see why we would remove any of the above list ever? They are > > generic and fundamental to dynptr as a concept, they can't restrict > > what dynptr can do in the future. > > It's not about removing them, but about changing them. > > Just for example the whole discussion of whether frags should > be handled transparently and how write is handled didn't inspire > confidence that there is a strong consensus on semantics > of these new dynptr accessors. > > Scrambling to add flags to dynptr helpers was another red flag. > > All signs are pointing out that we're not ready do fix dynptr api. > It will evolve and has to evolve without uapi pain. > > kfuncs only. For everything. Please. Thanks for your feedback, Alexei and Andrii. I share the same opinion as Andrii about helpers for the APIs that are straightforward (eg bpf_dynptr_get_offset), but I see your point as well about doing everything new as kfuncs. I'll change this to use kfuncs for v3.
On Thu, Dec 8, 2022 at 5:30 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Thu, Dec 8, 2022 at 4:42 PM Andrii Nakryiko > <andrii.nakryiko@gmail.com> wrote: > > > > On Wed, Dec 7, 2022 at 5:54 PM Alexei Starovoitov > > <alexei.starovoitov@gmail.com> wrote: > > > > > > On Wed, Dec 07, 2022 at 12:55:31PM -0800, Joanne Koong wrote: > > > > This patchset is the 3rd in the dynptr series. The 1st can be found here [0] > > > > and the 2nd can be found here [1]. > > > > > > > > In this patchset, the following convenience helpers are added for interacting > > > > with bpf dynamic pointers: > > > > > > > > * bpf_dynptr_data_rdonly > > > > * bpf_dynptr_trim > > > > * bpf_dynptr_advance > > > > * bpf_dynptr_is_null > > > > * bpf_dynptr_is_rdonly > > > > * bpf_dynptr_get_size > > > > * bpf_dynptr_get_offset > > > > * bpf_dynptr_clone > > > > * bpf_dynptr_iterator > > > > > > This is great, but it really stretches uapi limits. > > > > Stretches in what sense? They are simple and straightforward getters > > and trim/advance/clone are fundamental modifiers to be able to work > > with a subset of dynptr's overall memory area. > > > > > Please convert the above and those in [1] to kfuncs. > > > I know that there can be an argument made for consistency with existing dynptr uapi > > > > yeah, given we have bpf_dynptr_{read,write} and bpf_dynptr_data() as > > BPF helpers, it makes sense to have such basic things like is_null and > > trim/advance/clone as BPF helpers as well. Both for consistency and > > because there is nothing unstable about them. We are not going to > > remove dynptr as a concept, it's pretty well defined. > > > > Out of the above list perhaps only move bpf_dynptr_iterator() might be > > a candidate for kfunc. 
Though, personally, it makes sense to me to > > keep it as BPF helper without GPL restriction as well, given it is > > meant for networking applications in the first place, and you don't > > need to be GPL-compatible to write useful networking BPF program, from > > what I understand. But all the other ones is something you'd need to > > make actual use of dynptr concept in real-world BPF programs. > > > > Can we please have those as BPF helpers, and we can decide to move > > slightly fancier bpf_dynptr_iterator() (and future dynptr-related > > extras) into kfunc? > > Sorry, uapi concerns are more important here. What about the overall user experience and adoption? There is no clean way to ever move from unstable kfunc to a stable helper. BPF helpers also have the advantage of working on all architectures, whether that architecture supports kfuncs or not, whether it supports JIT or not. BPF helpers are also nicely self-discoverable and documented in include/uapi/linux/bpf.h, in one place where other BPF helpers are. This is a big deal, especially for non-expert BPF users (a vast majority of BPF users). > non-gpl and consistency don't even come close. > We've been doing everything new as kfuncs and dynptr is not special. I think dynptr is quite special. It's a very generic and fundamental concept, part of core BPF experience. It's a more dynamic counterpart to an inflexible statically sized `void * + size` pair of arguments sent to helpers for input or output memory regions. Dynptr has no inherent dependencies on BTF, kfuncs, trampolines, JIT, nothing. By requiring kfunc-based helpers we are significantly raising the obstacles towards adopting dynptr across a wide range of BPF applications. And the only advantage in return is that we get a hypothetical chance to change something in the future. But let's see if that will ever be necessary for the helpers Joanne is adding: 1. 
Generic accessors to check validity of *any* dynptr, and its
inherent properties like offset, available size, read-only property
(sometimes just as useful as bpf_ringbuf_query() is for ringbufs,
both for debugging and for various heuristics in production).

bpf_dynptr_is_null(struct bpf_dynptr *ptr)
long bpf_dynptr_get_size(struct bpf_dynptr *ptr)
long bpf_dynptr_get_offset(struct bpf_dynptr *ptr)
bpf_dynptr_is_rdonly(struct bpf_dynptr *ptr)

There is nothing to add or remove here. No flags, no change in semantics.

2. Manipulators to copy an existing dynptr's view and narrow it down to a
subset (e.g., for when you have a large memory blob, but need to
calculate hashes over a smaller subset, without destroying the original
dynptr, because it will be used later for some other access). We can
debate whether clone should get an offset or not, but it doesn't change
much (except usability in common cases). Again, nothing to add or
remove otherwise, and pretty fundamental for real use of the full power of
dynptr.

long bpf_dynptr_clone(struct bpf_dynptr *ptr, struct bpf_dynptr *clone, u32 offset)
long bpf_dynptr_trim(struct bpf_dynptr *ptr, u32 len)
long bpf_dynptr_advance(struct bpf_dynptr *ptr, u32 len)

3. This one is the only one I feel less strongly about, but mostly
because I can implement the same (even though less ergonomically, of
course) with bpf_loop() and bpf_dynptr_{clone,advance}.

long bpf_dynptr_iterator(struct bpf_dynptr *ptr, void *callback_fn, void *callback_ctx, u64 flags)

None of the above adds or changes any semantics of dynptr as a
concept. There is nothing that we'd need to change.

>
> > > helpers, but we got burned on them once and scrambled to add 'flags' argument.
> > > kfuncs are unstable and can be adjusted/removed at any time later.

It's unfair to block these helpers just because we decided to add
flags to one of the previous ones (before the final release).
And even if we hadn't managed to do it in time, the worst outcome would probably
be another variant of the BPF helper. Definitely something to avoid, but
not the end of the world. But as I pointed out above, this set of helpers
won't change, as they just complete the already established dynptr
ecosystem of helpers.

> >
> > I don't see why we would remove any of the above list ever? They are
> > generic and fundamental to dynptr as a concept, they can't restrict
> > what dynptr can do in the future.
>
> It's not about removing them, but about changing them.
>
> Just for example the whole discussion of whether frags should
> be handled transparently and how write is handled didn't inspire
> confidence that there is a strong consensus on semantics
> of these new dynptr accessors.

So let's start with acknowledging that skb and xdp buffer abstractions
as a logically contiguous memory area are inherently complex and
imperfect due to the way the kernel handles them for performance
and flexibility reasons.

Let's also note that the verifier knows the specific flavor of dynptr and thus
can enforce additional restrictions specifically for the SKB/XDP
flavor vs LOCAL/RINGBUF. So just because there is no perfect way to
handle all the SKB/XDP physical non-contiguity doesn't mean that the
dynptr concept itself is flawed or not well thought out. It's just
that for SKB/XDP there is no perfect solution. Dynptr doesn't change
anything here; rather, it actually simplifies a bunch of stuff,
especially for common scenarios.

I'd argue that for wider SKB/XDP dynptr adoption in the networking
world, those dynptr constructor helpers should be helpers and not
kfuncs as well. But I wish someone with more networking tie-ins
would argue this instead of me.

>
> Scrambling to add flags to dynptr helpers was another red flag.
>
> All signs are pointing out that we're not ready to fix dynptr api.

I disagree, it's an overly harsh generalization.

> It will evolve and has to evolve without uapi pain.
>
> kfuncs only. For everything. Please.

This is yet another generalized blanket statement I disagree with.
Over the years I've got the impression that the BPF subsystem is
generally a proud proponent of pragmatic, flexible, and common sense
engineering approaches, so this hard-and-fast rule with no room for
nuance sounds weird.

There are things that belong in fundamental and core BPF concepts, and
it makes sense to keep them as stable abstractions and helpers. And
there are various things (like interfacing into kernel mechanics, its
types and systems) which totally make sense to keep unstable.
On Mon, Dec 12, 2022 at 12:12 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > On Thu, Dec 8, 2022 at 5:30 PM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: > > > > On Thu, Dec 8, 2022 at 4:42 PM Andrii Nakryiko > > <andrii.nakryiko@gmail.com> wrote: > > > > > > On Wed, Dec 7, 2022 at 5:54 PM Alexei Starovoitov > > > <alexei.starovoitov@gmail.com> wrote: > > > > > > > > On Wed, Dec 07, 2022 at 12:55:31PM -0800, Joanne Koong wrote: > > > > > This patchset is the 3rd in the dynptr series. The 1st can be found here [0] > > > > > and the 2nd can be found here [1]. > > > > > > > > > > In this patchset, the following convenience helpers are added for interacting > > > > > with bpf dynamic pointers: > > > > > > > > > > * bpf_dynptr_data_rdonly > > > > > * bpf_dynptr_trim > > > > > * bpf_dynptr_advance > > > > > * bpf_dynptr_is_null > > > > > * bpf_dynptr_is_rdonly > > > > > * bpf_dynptr_get_size > > > > > * bpf_dynptr_get_offset > > > > > * bpf_dynptr_clone > > > > > * bpf_dynptr_iterator > > > > > > > > This is great, but it really stretches uapi limits. > > > > > > Stretches in what sense? They are simple and straightforward getters > > > and trim/advance/clone are fundamental modifiers to be able to work > > > with a subset of dynptr's overall memory area. > > > > > > > Please convert the above and those in [1] to kfuncs. > > > > I know that there can be an argument made for consistency with existing dynptr uapi > > > > > > yeah, given we have bpf_dynptr_{read,write} and bpf_dynptr_data() as > > > BPF helpers, it makes sense to have such basic things like is_null and > > > trim/advance/clone as BPF helpers as well. Both for consistency and > > > because there is nothing unstable about them. We are not going to > > > remove dynptr as a concept, it's pretty well defined. > > > > > > Out of the above list perhaps only move bpf_dynptr_iterator() might be > > > a candidate for kfunc. 
Though, personally, it makes sense to me to > > > keep it as BPF helper without GPL restriction as well, given it is > > > meant for networking applications in the first place, and you don't > > > need to be GPL-compatible to write useful networking BPF program, from > > > what I understand. But all the other ones is something you'd need to > > > make actual use of dynptr concept in real-world BPF programs. > > > > > > Can we please have those as BPF helpers, and we can decide to move > > > slightly fancier bpf_dynptr_iterator() (and future dynptr-related > > > extras) into kfunc? > > > > Sorry, uapi concerns are more important here. > > What about the overall user experience and adoption? > > There is no clean way to ever move from unstable kfunc to a stable helper. > > BPF helpers also have the advantage of working on all architectures, > whether that architecture supports kfuncs or not, whether it supports > JIT or not. Oh interesting, I didn't realize some architectures do not support kfuncs. Out of curiosity, can you elaborate on "no clean way to move from unstable kfunc to a stable helper"? If for example we needed to move something from kfunc -> helper, could we not just remove the code where we added it as a kfunc (eg defining a BTF_ID for it) and add it as a helper instead? > > BPF helpers are also nicely self-discoverable and documented in > include/uapi/linux/bpf.h, in one place where other BPF helpers are. > This is a big deal, especially for non-expert BPF users (a vast > majority of BPF users). > > > non-gpl and consistency don't even come close. > > We've been doing everything new as kfuncs and dynptr is not special. > > I think dynptr is quite special. It's a very generic and fundamental > concept, part of core BPF experience. It's a more dynamic counterpart > to an inflexible statically sized `void * + size` pair of arguments > sent to helpers for input or output memory regions. 
Dynptr has no > inherent dependencies on BTF, kfuncs, trampolines, JIT, nothing. > > By requiring kfunc-based helpers we are significantly raising the > obstacles towards adopting dynptr across a wide range of BPF > applications. > > And the only advantage in return is that we get a hypothetical chance > to change something in the future. But let's see if that will ever be > necessary for the helpers Joanne is adding: > > 1. Generic accessors to check validity of *any* dynptr, and it's > inherent properties like offset, available size, read-only property > (just as useful somethings as bpf_ringbuf_query() is for ringbufs, > both for debugging and for various heuristics in production). > > bpf_dynptr_is_null(struct bpf_dynptr *ptr) > long bpf_dynptr_get_size(struct bpf_dynptr *ptr) > long bpf_dynptr_get_offset(struct bpf_dynptr *ptr) > bpf_dynptr_is_rdonly(struct bpf_dynptr *ptr) > > There is nothing to add or remove here. No flags, no change in semantics. > > 2. Manipulators to copy existing dynptr's view and narrow it down to a > subset (e.g., for when you have a large memory blog, but need to > calculate hashes over smaller subset, without destroying original > dynptr, because it will be used later for some other access). We can > debate whether clone should get offset or not, but it doesn't change > much (except usability in common cases). Again, nothing to add or > remove otherwise, and pretty fundamental for real use of full power of > dynptr. > > long bpf_dynptr_clone(struct bpf_dynptr *ptr, struct bpf_dynptr > *clone, u32 offset) > long bpf_dynptr_trim(struct bpf_dynptr *ptr, u32 len) > long bpf_dynptr_advance(struct bpf_dynptr *ptr, u32 len) > > 3. This one is the only one I feel less strongly about, but mostly > because I can implement the same (even though less ergonomically, of > course) with bpf_loop() and bpf_dynptr_{clone,advance}. 
> > long bpf_dynptr_iterator(struct bpf_dynptr *ptr, void *callback_fn, > void *callback_ctx, u64 flags) > > > All of the above don't add or change any semantics to dynptr as a > concept. There is nothing that we'd need to change. > > > > > > > > helpers, but we got burned on them once and scrambled to add 'flags' argument. > > > > kfuncs are unstable and can be adjusted/removed at any time later. > > It's unfair to block these helpers just because we recided to add > flags to one of the previous ones (before the final release). And even > if we didn't managed to do it in time, the worst things would probably > be another variant of BPF helper. Definitely something to avoid, but > not end of the world. But as I pointed out above, this set of helpers > won't be change, as they just complete already established dynptr > ecosystem of helpers. > > > > > > > I don't see why we would remove any of the above list ever? They are > > > generic and fundamental to dynptr as a concept, they can't restrict > > > what dynptr can do in the future. > > > > It's not about removing them, but about changing them. > > > > Just for example the whole discussion of whether frags should > > be handled transparently and how write is handled didn't inspire > > confidence that there is a strong consensus on semantics > > of these new dynptr accessors. > > So let's start with acknowledging that skb and xdp buffer abstractions > as logically contiguous memory area are inherently complex and > non-perfect due to the way that kernel handles them for performance > and flexibility reasons. > > Let's also note that verifier knows specific flavor of dynptr and thus > can enforce additional restrictions based on specifically SKB/XDP > flavor vs LOCAL/RINGBUF. So just because there is no perfect way to > handle all the SKB/XDP physical non-contiguity, doesn't mean that the > dynptr concept itself is flawed or not well thought out. It's just > that for SKB/XDP there is no perfect solution. 
Dynptr doesn't change > anything here, rather it actually simplifies a bunch of stuff, > especially for common scenarios. > > I'd argue that for wider SKB/XDP dynptr adoption in the networking > world, those dynptr constructor helpers should be helpers and not > kfuncs as well. But I'd wish someone with more networking tie-ins > would argue this instead of me. I'm not that familiar with the semantics of bpf kfuncs, so to clarify: from a user API perspective, is there any difference in calling the function from the bpf program as a helper vs. kfunc? > > > > > Scrambling to add flags to dynptr helpers was another red flag. > > > > All signs are pointing out that we're not ready do fix dynptr api. > > I disagree, it's an overly harsh generalization. > > > It will evolve and has to evolve without uapi pain. > > > > kfuncs only. For everything. Please. > > This is yet another generalized blanket statement I disagree with. > Over the years I've got an impression that the BPF subsystem is > generally a proud proponent of pragmatic, flexible, and common sense > engineering approaches, so this hard-and-fast rule with no room for > nuance sounds weird. > > There are things that belong in fundamental and core BPF concepts, and > it makes sense to keep them as stable abstractions and helpers. And > there are various things (like interfacing into kernel mechanics, its > types and systems) which totally make sense to keep unstable. I agree with all of your points. I know Alexei is on PTO these next two weeks, so I will in the meantime table this and work on the dynptr memory allocation patchset and a dynptr documentation write-up. Thanks for the discussion!
On Tue, Dec 13, 2022 at 3:50 PM Joanne Koong <joannelkoong@gmail.com> wrote: > > On Mon, Dec 12, 2022 at 12:12 PM Andrii Nakryiko > <andrii.nakryiko@gmail.com> wrote: > > > > On Thu, Dec 8, 2022 at 5:30 PM Alexei Starovoitov > > <alexei.starovoitov@gmail.com> wrote: > > > > > > On Thu, Dec 8, 2022 at 4:42 PM Andrii Nakryiko > > > <andrii.nakryiko@gmail.com> wrote: > > > > > > > > On Wed, Dec 7, 2022 at 5:54 PM Alexei Starovoitov > > > > <alexei.starovoitov@gmail.com> wrote: > > > > > > > > > > On Wed, Dec 07, 2022 at 12:55:31PM -0800, Joanne Koong wrote: > > > > > > This patchset is the 3rd in the dynptr series. The 1st can be found here [0] > > > > > > and the 2nd can be found here [1]. > > > > > > > > > > > > In this patchset, the following convenience helpers are added for interacting > > > > > > with bpf dynamic pointers: > > > > > > > > > > > > * bpf_dynptr_data_rdonly > > > > > > * bpf_dynptr_trim > > > > > > * bpf_dynptr_advance > > > > > > * bpf_dynptr_is_null > > > > > > * bpf_dynptr_is_rdonly > > > > > > * bpf_dynptr_get_size > > > > > > * bpf_dynptr_get_offset > > > > > > * bpf_dynptr_clone > > > > > > * bpf_dynptr_iterator > > > > > > > > > > This is great, but it really stretches uapi limits. > > > > > > > > Stretches in what sense? They are simple and straightforward getters > > > > and trim/advance/clone are fundamental modifiers to be able to work > > > > with a subset of dynptr's overall memory area. > > > > > > > > > Please convert the above and those in [1] to kfuncs. > > > > > I know that there can be an argument made for consistency with existing dynptr uapi > > > > > > > > yeah, given we have bpf_dynptr_{read,write} and bpf_dynptr_data() as > > > > BPF helpers, it makes sense to have such basic things like is_null and > > > > trim/advance/clone as BPF helpers as well. Both for consistency and > > > > because there is nothing unstable about them. We are not going to > > > > remove dynptr as a concept, it's pretty well defined. 
> > > > > > > > Out of the above list perhaps only move bpf_dynptr_iterator() might be > > > > a candidate for kfunc. Though, personally, it makes sense to me to > > > > keep it as BPF helper without GPL restriction as well, given it is > > > > meant for networking applications in the first place, and you don't > > > > need to be GPL-compatible to write useful networking BPF program, from > > > > what I understand. But all the other ones is something you'd need to > > > > make actual use of dynptr concept in real-world BPF programs. > > > > > > > > Can we please have those as BPF helpers, and we can decide to move > > > > slightly fancier bpf_dynptr_iterator() (and future dynptr-related > > > > extras) into kfunc? > > > > > > Sorry, uapi concerns are more important here. > > > > What about the overall user experience and adoption? > > > > There is no clean way to ever move from unstable kfunc to a stable helper. > > > > BPF helpers also have the advantage of working on all architectures, > > whether that architecture supports kfuncs or not, whether it supports > > JIT or not. > > Oh interesting, I didn't realize some architectures do not support kfuncs. > > Out of curiosity, can you elaborate on "no clean way to move from > unstable kfunc to a stable helper"? If for example we needed to move > something from kfunc -> helper, could we not just remove the code > where we added it as a kfunc (eg defining a BTF_ID for it) and add it > as a helper instead? We could in the kernel. And make user life horrible. 
If, say, bpf_dynptr_is_null() is defined as a kfunc, it will be exposed
to the user's BPF application (actually, the user would have to find it in
the kernel and copy/paste the definition manually) as:

extern bool bpf_dynptr_is_null(const struct bpf_dynptr *p) __ksym;

When we "stabilize it" and make it a helper, it turns into the following
definition supplied by libbpf in its bpf_helper_defs.h header
(auto-generated from include/uapi/linux/bpf.h):

static bool (*bpf_dynptr_is_null)(const struct bpf_dynptr *p) = (void *) 777;

From the C source code perspective both will be called exactly the same way,
but the BPF assembly generated for them will be different. For the kfunc it
will be a `call -1;` instruction specially patched by libbpf, with the
embedded BTF object ID and BTF type ID corresponding to this kfunc. For
the BPF helper it will be simply `call 777;`. The two are processed by the
verifier very differently.

From the BPF program's standpoint it's impossible to support both ways of
calling the same bpf_dynptr_is_null(), because we get a naming conflict,
and there is no single BPF assembly instruction that would support
both ways.

You'd have to get really creative to transparently call this helper
without caring whether it is a kfunc or a BPF helper. Or you'd have to
compile and distribute two variants of the same BPF object file. Both
suck.

BPF CO-RE is nice and all, but we do it due to necessity, not because
it's fun and easy.

So if we migrate a kfunc to become a BPF helper, we'd most probably need
to come up with a new name for the helper, different from the kfunc's.
And it's currently not that easy to detect whether a kfunc is available
or not (see [0]).

[0] https://lore.kernel.org/bpf/de495e3a-cf06-ff85-1a4a-185621c9211a@linux.dev/

>
> >
> > BPF helpers are also nicely self-discoverable and documented in
> > include/uapi/linux/bpf.h, in one place where other BPF helpers are.
> > This is a big deal, especially for non-expert BPF users (a vast
> > majority of BPF users).
> > > > > non-gpl and consistency don't even come close. > > > We've been doing everything new as kfuncs and dynptr is not special. > > > > I think dynptr is quite special. It's a very generic and fundamental > > concept, part of core BPF experience. It's a more dynamic counterpart > > to an inflexible statically sized `void * + size` pair of arguments > > sent to helpers for input or output memory regions. Dynptr has no > > inherent dependencies on BTF, kfuncs, trampolines, JIT, nothing. > > > > By requiring kfunc-based helpers we are significantly raising the > > obstacles towards adopting dynptr across a wide range of BPF > > applications. > > > > And the only advantage in return is that we get a hypothetical chance > > to change something in the future. But let's see if that will ever be > > necessary for the helpers Joanne is adding: > > > > 1. Generic accessors to check validity of *any* dynptr, and it's > > inherent properties like offset, available size, read-only property > > (just as useful somethings as bpf_ringbuf_query() is for ringbufs, > > both for debugging and for various heuristics in production). > > > > bpf_dynptr_is_null(struct bpf_dynptr *ptr) > > long bpf_dynptr_get_size(struct bpf_dynptr *ptr) > > long bpf_dynptr_get_offset(struct bpf_dynptr *ptr) > > bpf_dynptr_is_rdonly(struct bpf_dynptr *ptr) > > > > There is nothing to add or remove here. No flags, no change in semantics. > > > > 2. Manipulators to copy existing dynptr's view and narrow it down to a > > subset (e.g., for when you have a large memory blog, but need to > > calculate hashes over smaller subset, without destroying original > > dynptr, because it will be used later for some other access). We can > > debate whether clone should get offset or not, but it doesn't change > > much (except usability in common cases). Again, nothing to add or > > remove otherwise, and pretty fundamental for real use of full power of > > dynptr. 
> > > > long bpf_dynptr_clone(struct bpf_dynptr *ptr, struct bpf_dynptr > > *clone, u32 offset) > > long bpf_dynptr_trim(struct bpf_dynptr *ptr, u32 len) > > long bpf_dynptr_advance(struct bpf_dynptr *ptr, u32 len) > > > > 3. This one is the only one I feel less strongly about, but mostly > > because I can implement the same (even though less ergonomically, of > > course) with bpf_loop() and bpf_dynptr_{clone,advance}. > > > > long bpf_dynptr_iterator(struct bpf_dynptr *ptr, void *callback_fn, > > void *callback_ctx, u64 flags) > > > > > > All of the above don't add or change any semantics to dynptr as a > > concept. There is nothing that we'd need to change. > > > > > > > > > > > > helpers, but we got burned on them once and scrambled to add 'flags' argument. > > > > > kfuncs are unstable and can be adjusted/removed at any time later. > > > > It's unfair to block these helpers just because we recided to add > > flags to one of the previous ones (before the final release). And even > > if we didn't managed to do it in time, the worst things would probably > > be another variant of BPF helper. Definitely something to avoid, but > > not end of the world. But as I pointed out above, this set of helpers > > won't be change, as they just complete already established dynptr > > ecosystem of helpers. > > > > > > > > > > I don't see why we would remove any of the above list ever? They are > > > > generic and fundamental to dynptr as a concept, they can't restrict > > > > what dynptr can do in the future. > > > > > > It's not about removing them, but about changing them. > > > > > > Just for example the whole discussion of whether frags should > > > be handled transparently and how write is handled didn't inspire > > > confidence that there is a strong consensus on semantics > > > of these new dynptr accessors. 
> > > > So let's start with acknowledging that skb and xdp buffer abstractions > > as logically contiguous memory area are inherently complex and > > non-perfect due to the way that kernel handles them for performance > > and flexibility reasons. > > > > Let's also note that verifier knows specific flavor of dynptr and thus > > can enforce additional restrictions based on specifically SKB/XDP > > flavor vs LOCAL/RINGBUF. So just because there is no perfect way to > > handle all the SKB/XDP physical non-contiguity, doesn't mean that the > > dynptr concept itself is flawed or not well thought out. It's just > > that for SKB/XDP there is no perfect solution. Dynptr doesn't change > > anything here, rather it actually simplifies a bunch of stuff, > > especially for common scenarios. > > > > I'd argue that for wider SKB/XDP dynptr adoption in the networking > > world, those dynptr constructor helpers should be helpers and not > > kfuncs as well. But I'd wish someone with more networking tie-ins > > would argue this instead of me. > > I'm not that familiar with the semantics of bpf kfuncs, so to clarify: > from a user API perspective, is there any difference in calling the > function from the bpf program as a helper vs. kfunc? I think I addressed that above, but let me know if not. > > > > > > > > > Scrambling to add flags to dynptr helpers was another red flag. > > > > > > All signs are pointing out that we're not ready do fix dynptr api. > > > > I disagree, it's an overly harsh generalization. > > > > > It will evolve and has to evolve without uapi pain. > > > > > > kfuncs only. For everything. Please. > > > > This is yet another generalized blanket statement I disagree with. > > Over the years I've got an impression that the BPF subsystem is > > generally a proud proponent of pragmatic, flexible, and common sense > > engineering approaches, so this hard-and-fast rule with no room for > > nuance sounds weird. 
> > > > There are things that belong in fundamental and core BPF concepts, and > > it makes sense to keep them as stable abstractions and helpers. And > > there are various things (like interfacing into kernel mechanics, its > > types and systems) which totally make sense to keep unstable. > > I agree with all of your points. I know Alexei is on PTO these next > two weeks, so I will in the meantime table this and work on the dynptr > memory allocation patchset and a dynptr documentation write-up. > > Thanks for the discussion! SGTM.
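For readers following the `call 777;` vs `call -1;` distinction in the email above, here is a minimal userspace sketch of how the two call instructions differ at the encoding level. The struct mirrors the field layout of struct bpf_insn, and the opcode and BPF_PSEUDO_KFUNC_CALL values follow include/uapi/linux/bpf.h and bpf_common.h. Helper ID 777 is the made-up number from the email, the function names are hypothetical, and the BTF type ID used in any example call is an arbitrary placeholder. How the compiler marks the instruction before relocation is glossed over, so treat this as an illustration rather than a description of libbpf internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Opcode bits and pseudo-call marker, values as in the kernel UAPI headers. */
#define BPF_JMP               0x05
#define BPF_CALL              0x80
#define BPF_PSEUDO_KFUNC_CALL 2

/* Userspace mirror of the struct bpf_insn field layout. */
struct insn {
	uint8_t code;
	uint8_t dst_reg:4;
	uint8_t src_reg:4;
	int16_t off;
	int32_t imm;
};

/* Stable helper call: imm is simply the helper ID ("call 777"). */
struct insn emit_helper_call(int32_t helper_id)
{
	struct insn i = { .code = BPF_JMP | BPF_CALL, .imm = helper_id };
	return i;
}

/*
 * kfunc call before loading: imm is -1 ("call -1").
 * Simplification (assumption): the kfunc marker is set up front here.
 */
struct insn emit_unresolved_kfunc_call(void)
{
	struct insn i = { .code = BPF_JMP | BPF_CALL,
			  .src_reg = BPF_PSEUDO_KFUNC_CALL, .imm = -1 };
	return i;
}

/* Loader step: embed the BTF type ID (imm) and BTF object index (off). */
void relocate_kfunc_call(struct insn *i, int32_t btf_type_id, int16_t btf_obj_idx)
{
	assert(i->code == (BPF_JMP | BPF_CALL));
	i->src_reg = BPF_PSEUDO_KFUNC_CALL;
	i->imm = btf_type_id;
	i->off = btf_obj_idx;
}

bool is_helper_call(const struct insn *i)
{
	return i->code == (BPF_JMP | BPF_CALL) && i->src_reg == 0;
}

bool is_kfunc_call(const struct insn *i)
{
	return i->code == (BPF_JMP | BPF_CALL) &&
	       i->src_reg == BPF_PSEUDO_KFUNC_CALL;
}
```

The point of the sketch is that the same C-level call site ends up as one of two incompatible instruction forms, so a single compiled object cannot cover both cases with one instruction.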
On Tue, Dec 13, 2022 at 4:57 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > On Tue, Dec 13, 2022 at 3:50 PM Joanne Koong <joannelkoong@gmail.com> wrote: > > > > On Mon, Dec 12, 2022 at 12:12 PM Andrii Nakryiko > > <andrii.nakryiko@gmail.com> wrote: > > > > > > On Thu, Dec 8, 2022 at 5:30 PM Alexei Starovoitov > > > <alexei.starovoitov@gmail.com> wrote: > > > > > > > > On Thu, Dec 8, 2022 at 4:42 PM Andrii Nakryiko > > > > <andrii.nakryiko@gmail.com> wrote: > > > > > > > > > > On Wed, Dec 7, 2022 at 5:54 PM Alexei Starovoitov > > > > > <alexei.starovoitov@gmail.com> wrote: > > > > > > > > > > > > On Wed, Dec 07, 2022 at 12:55:31PM -0800, Joanne Koong wrote: > > > > > > > This patchset is the 3rd in the dynptr series. The 1st can be found here [0] > > > > > > > and the 2nd can be found here [1]. > > > > > > > > > > > > > > In this patchset, the following convenience helpers are added for interacting > > > > > > > with bpf dynamic pointers: > > > > > > > > > > > > > > * bpf_dynptr_data_rdonly > > > > > > > * bpf_dynptr_trim > > > > > > > * bpf_dynptr_advance > > > > > > > * bpf_dynptr_is_null > > > > > > > * bpf_dynptr_is_rdonly > > > > > > > * bpf_dynptr_get_size > > > > > > > * bpf_dynptr_get_offset > > > > > > > * bpf_dynptr_clone > > > > > > > * bpf_dynptr_iterator > > > > > > > > > > > > This is great, but it really stretches uapi limits. > > > > > > > > > > Stretches in what sense? They are simple and straightforward getters > > > > > and trim/advance/clone are fundamental modifiers to be able to work > > > > > with a subset of dynptr's overall memory area. > > > > > > > > > > > Please convert the above and those in [1] to kfuncs. > > > > > > I know that there can be an argument made for consistency with existing dynptr uapi > > > > > > > > > > yeah, given we have bpf_dynptr_{read,write} and bpf_dynptr_data() as > > > > > BPF helpers, it makes sense to have such basic things like is_null and > > > > > trim/advance/clone as BPF helpers as well. 
Both for consistency and > > > > > because there is nothing unstable about them. We are not going to > > > > > remove dynptr as a concept, it's pretty well defined. > > > > > > > > > > Out of the above list perhaps only move bpf_dynptr_iterator() might be > > > > > a candidate for kfunc. Though, personally, it makes sense to me to > > > > > keep it as BPF helper without GPL restriction as well, given it is > > > > > meant for networking applications in the first place, and you don't > > > > > need to be GPL-compatible to write useful networking BPF program, from > > > > > what I understand. But all the other ones is something you'd need to > > > > > make actual use of dynptr concept in real-world BPF programs. > > > > > > > > > > Can we please have those as BPF helpers, and we can decide to move > > > > > slightly fancier bpf_dynptr_iterator() (and future dynptr-related > > > > > extras) into kfunc? > > > > > > > > Sorry, uapi concerns are more important here. > > > > > > What about the overall user experience and adoption? > > > > > > There is no clean way to ever move from unstable kfunc to a stable helper. > > > > > > BPF helpers also have the advantage of working on all architectures, > > > whether that architecture supports kfuncs or not, whether it supports > > > JIT or not. > > > > Oh interesting, I didn't realize some architectures do not support kfuncs. > > > > Out of curiosity, can you elaborate on "no clean way to move from > > unstable kfunc to a stable helper"? If for example we needed to move > > something from kfunc -> helper, could we not just remove the code > > where we added it as a kfunc (eg defining a BTF_ID for it) and add it > > as a helper instead? > > We could in the kernel. And make user life horrible. 
> > If, say, bpf_dynptr_is_null() is defined as kfunc, it will be exposed > (actually would have to be found in the kernel and definition would be > copy/pasted by user manually) to user's BPF application as: > > extern bool bpf_dynptr_is_null(const struct bpf_dynptr *p) __ksym; > > When we "stabilize it" and make it helper, it turns into the following > definition supplied by libbpf in its bpf_helper_defs.h header > (auto-generated from include/uapi/linux/bpf.h): > > static bool (*bpf_dynptr_is_null)(const struct bpf_dynptr *p) = (void *) 777; > > From C source code perspective both will be called exactly the same, > but BPF assembly generated for them will be different. For kfunc it > will be a specially patched by libbpf `call -1;` instruction with > embedded BTF object ID and BTF type ID corresponding to this kfunc. > For BPF helper it will be simply `call 777;`. Both are processed by > verifier very differently. > > From BPF program's standpoint it's impossible to support both ways of > calling the same bpf_dynptr_is_null(), because we get naming conflict, > and there is no single BPF assembly instruction that would support > both ways. > > You'd have to get really creative to transparently call this helper > without caring whether it is kfunc or BPF helper. Or you'd have to > compile and distribute two variants of the same BPF object file. Both > suck. BPF CO-RE is nice and all, but we do it due to necessity, not > because it's fun and easy. So if we migrate kfunc to become BPF > helper, we'd most probably would need to make a new name for a helper > that's different from kfunc. > > And it's currently not that easy to detect whether kfunc is available > or not (see [0]). > > [0] https://lore.kernel.org/bpf/de495e3a-cf06-ff85-1a4a-185621c9211a@linux.dev/ > > Thank you for the explanation! This is very helpful to know! 
> > > > > > > > > BPF helpers are also nicely self-discoverable and documented in > > > include/uapi/linux/bpf.h, in one place where other BPF helpers are. > > > This is a big deal, especially for non-expert BPF users (a vast > > > majority of BPF users). > > > > > > > non-gpl and consistency don't even come close. > > > > We've been doing everything new as kfuncs and dynptr is not special. > > > > > > I think dynptr is quite special. It's a very generic and fundamental > > > concept, part of core BPF experience. It's a more dynamic counterpart > > > to an inflexible statically sized `void * + size` pair of arguments > > > sent to helpers for input or output memory regions. Dynptr has no > > > inherent dependencies on BTF, kfuncs, trampolines, JIT, nothing. > > > > > > By requiring kfunc-based helpers we are significantly raising the > > > obstacles towards adopting dynptr across a wide range of BPF > > > applications. > > > > > > And the only advantage in return is that we get a hypothetical chance > > > to change something in the future. But let's see if that will ever be > > > necessary for the helpers Joanne is adding: > > > > > > 1. Generic accessors to check validity of *any* dynptr, and it's > > > inherent properties like offset, available size, read-only property > > > (just as useful somethings as bpf_ringbuf_query() is for ringbufs, > > > both for debugging and for various heuristics in production). > > > > > > bpf_dynptr_is_null(struct bpf_dynptr *ptr) > > > long bpf_dynptr_get_size(struct bpf_dynptr *ptr) > > > long bpf_dynptr_get_offset(struct bpf_dynptr *ptr) > > > bpf_dynptr_is_rdonly(struct bpf_dynptr *ptr) > > > > > > There is nothing to add or remove here. No flags, no change in semantics. > > > > > > 2. 
Manipulators to copy existing dynptr's view and narrow it down to a > > > subset (e.g., for when you have a large memory blog, but need to > > > calculate hashes over smaller subset, without destroying original > > > dynptr, because it will be used later for some other access). We can > > > debate whether clone should get offset or not, but it doesn't change > > > much (except usability in common cases). Again, nothing to add or > > > remove otherwise, and pretty fundamental for real use of full power of > > > dynptr. > > > > > > long bpf_dynptr_clone(struct bpf_dynptr *ptr, struct bpf_dynptr > > > *clone, u32 offset) > > > long bpf_dynptr_trim(struct bpf_dynptr *ptr, u32 len) > > > long bpf_dynptr_advance(struct bpf_dynptr *ptr, u32 len) > > > > > > 3. This one is the only one I feel less strongly about, but mostly > > > because I can implement the same (even though less ergonomically, of > > > course) with bpf_loop() and bpf_dynptr_{clone,advance}. > > > > > > long bpf_dynptr_iterator(struct bpf_dynptr *ptr, void *callback_fn, > > > void *callback_ctx, u64 flags) > > > > > > > > > All of the above don't add or change any semantics to dynptr as a > > > concept. There is nothing that we'd need to change. > > > > > > > > > > > > > > > > helpers, but we got burned on them once and scrambled to add 'flags' argument. > > > > > > kfuncs are unstable and can be adjusted/removed at any time later. > > > > > > It's unfair to block these helpers just because we recided to add > > > flags to one of the previous ones (before the final release). And even > > > if we didn't managed to do it in time, the worst things would probably > > > be another variant of BPF helper. Definitely something to avoid, but > > > not end of the world. But as I pointed out above, this set of helpers > > > won't be change, as they just complete already established dynptr > > > ecosystem of helpers. > > > > > > > > > > > > > I don't see why we would remove any of the above list ever? 
They are > > > > > generic and fundamental to dynptr as a concept, they can't restrict > > > > > what dynptr can do in the future. > > > > > > > > It's not about removing them, but about changing them. > > > > > > > > Just for example the whole discussion of whether frags should > > > > be handled transparently and how write is handled didn't inspire > > > > confidence that there is a strong consensus on semantics > > > > of these new dynptr accessors. > > > > > > So let's start with acknowledging that skb and xdp buffer abstractions > > > as logically contiguous memory area are inherently complex and > > > non-perfect due to the way that kernel handles them for performance > > > and flexibility reasons. > > > > > > Let's also note that verifier knows specific flavor of dynptr and thus > > > can enforce additional restrictions based on specifically SKB/XDP > > > flavor vs LOCAL/RINGBUF. So just because there is no perfect way to > > > handle all the SKB/XDP physical non-contiguity, doesn't mean that the > > > dynptr concept itself is flawed or not well thought out. It's just > > > that for SKB/XDP there is no perfect solution. Dynptr doesn't change > > > anything here, rather it actually simplifies a bunch of stuff, > > > especially for common scenarios. > > > > > > I'd argue that for wider SKB/XDP dynptr adoption in the networking > > > world, those dynptr constructor helpers should be helpers and not > > > kfuncs as well. But I'd wish someone with more networking tie-ins > > > would argue this instead of me. > > > > I'm not that familiar with the semantics of bpf kfuncs, so to clarify: > > from a user API perspective, is there any difference in calling the > > function from the bpf program as a helper vs. kfunc? > > I think I addressed that above, but let me know if not. > > > > > > > > > > > > > > Scrambling to add flags to dynptr helpers was another red flag. > > > > > > > > All signs are pointing out that we're not ready do fix dynptr api. 
> > > > > > I disagree, it's an overly harsh generalization. > > > > > > > It will evolve and has to evolve without uapi pain. > > > > > > > > kfuncs only. For everything. Please. > > > > > > This is yet another generalized blanket statement I disagree with. > > > Over the years I've got an impression that the BPF subsystem is > > > generally a proud proponent of pragmatic, flexible, and common sense > > > engineering approaches, so this hard-and-fast rule with no room for > > > nuance sounds weird. > > > > > > There are things that belong in fundamental and core BPF concepts, and > > > it makes sense to keep them as stable abstractions and helpers. And > > > there are various things (like interfacing into kernel mechanics, its > > > types and systems) which totally make sense to keep unstable. > > > > I agree with all of your points. I know Alexei is on PTO these next > > two weeks, so I will in the meantime table this and work on the dynptr > > memory allocation patchset and a dynptr documentation write-up. > > > > Thanks for the discussion! > > SGTM.
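The view-narrowing manipulators quoted above (clone, advance, trim) can be modeled in plain userspace C. This is only a sketch of the semantics as described in this thread: `toy_dynptr` and its functions are hypothetical stand-ins, not the kernel's struct bpf_dynptr, and the error codes and bounds checks are assumptions rather than what the patchset specifies.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a window [offset, offset + size) over some memory. */
struct toy_dynptr {
	uint8_t *data;    /* NULL models a null (invalid) dynptr */
	uint32_t offset;  /* start of the view within data */
	uint32_t size;    /* bytes available starting at offset */
	bool rdonly;
};

bool toy_dynptr_is_null(const struct toy_dynptr *p)
{
	return p->data == NULL;
}

long toy_dynptr_get_size(const struct toy_dynptr *p)
{
	return toy_dynptr_is_null(p) ? -EINVAL : (long)p->size;
}

long toy_dynptr_get_offset(const struct toy_dynptr *p)
{
	return toy_dynptr_is_null(p) ? -EINVAL : (long)p->offset;
}

/* Drop len bytes from the front of the view. */
long toy_dynptr_advance(struct toy_dynptr *p, uint32_t len)
{
	if (toy_dynptr_is_null(p))
		return -EINVAL;
	if (len > p->size)
		return -ERANGE;
	p->offset += len;
	p->size -= len;
	return 0;
}

/* Drop len bytes from the end of the view. */
long toy_dynptr_trim(struct toy_dynptr *p, uint32_t len)
{
	if (toy_dynptr_is_null(p))
		return -EINVAL;
	if (len > p->size)
		return -ERANGE;
	p->size -= len;
	return 0;
}

/* Copy the view into clone, starting offset bytes in; src is untouched. */
long toy_dynptr_clone(const struct toy_dynptr *src, struct toy_dynptr *clone,
		      uint32_t offset)
{
	assert(src != NULL && clone != NULL);
	if (toy_dynptr_is_null(src))
		return -EINVAL;
	if (offset > src->size)
		return -ERANGE;
	*clone = *src;
	return toy_dynptr_advance(clone, offset);
}
```

In this model, cloning a 100-byte view at offset 10 and trimming 50 bytes leaves a 40-byte window for, say, a hash computation, while the original still covers all 100 bytes for later use.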
On Mon, Dec 12, 2022 at 12:12:09PM -0800, Andrii Nakryiko wrote: > > There is no clean way to ever move from unstable kfunc to a stable helper. No clean way? Yet in the other email you proposed a way. Not pretty, but workable. I'm sure if ever there will be a need to stabilize the kfunc we will find a clean way to do it. Strongly arguing right now that this is an issue without doing the homework is not productive. > BPF helpers also have the advantage of working on all architectures, > whether that architecture supports kfuncs or not, whether it supports > JIT or not. Correct, but applying the same argument we should argue that all features must work in the interpreter as well, because not all architectures support JIT. This way struct-ops and bpf based TCP-CC would never be possible. Some JITs don't support tail calls with subprogs. freplace (bpf prog replacement) works when JITed only. bpf trampoline works on x86-64 only, while kfuncs work on more than one arch. Now compare the amount of .text that the kernel has to contain to support hundreds of helpers vs same amount of kfuncs. In the former it's a whole bunch of code that is there in the kernel in case bpf prog will call that helper. With 200+ helpers and half of them already deprecated we have quite a bit of dead code in the kernel that we cannot delete. While with kfunc approach there is no extra code that deals with conversion of the registers from bpf psABI to arch psABI. With kfuncs we generate this code on demand. > BPF helpers are also nicely self-discoverable and documented in > include/uapi/linux/bpf.h, in one place where other BPF helpers are. > This is a big deal, especially for non-expert BPF users (a vast > majority of BPF users). Good point. In general the kfuncs are not up to the level of documentation of helpers and we should work on improving that, but some kfuncs are better documented than helpers. So it's not black and white. Discoverability we discussed in the past. 
The task to automatically emit kfuncs into vmlinux.h is still not complete. Time to prioritize it higher. > > > non-gpl and consistency don't even come close. > > We've been doing everything new as kfuncs and dynptr is not special. > > I think dynptr is quite special. It's a very generic and fundamental > concept, part of core BPF experience. It's a more dynamic counterpart > to an inflexible statically sized `void * + size` pair of arguments > sent to helpers for input or output memory regions. Dynptr has no > inherent dependencies on BTF, kfuncs, trampolines, JIT, nothing. imo dynptr and kptr are more or less equivalent in terms of being core building blocks. kptrs are done via kfuncs, so dynptr can do just as well. > By requiring kfunc-based helpers we are significantly raising the > obstacles towards adopting dynptr across a wide range of BPF > applications. Sorry, but I have to disagree. kptr and dynptr are left and right hand. Both will work just fine as kfuncs. > And the only advantage in return is that we get a hypothetical chance > to change something in the future. But let's see if that will ever be > necessary for the helpers Joanne is adding: > > 1. Generic accessors to check validity of *any* dynptr, and it's > inherent properties like offset, available size, read-only property > (just as useful somethings as bpf_ringbuf_query() is for ringbufs, > both for debugging and for various heuristics in production). > > bpf_dynptr_is_null(struct bpf_dynptr *ptr) > long bpf_dynptr_get_size(struct bpf_dynptr *ptr) > long bpf_dynptr_get_offset(struct bpf_dynptr *ptr) > bpf_dynptr_is_rdonly(struct bpf_dynptr *ptr) > > There is nothing to add or remove here. No flags, no change in semantics. Disagree, since there is an obvious counter example. See all of bpf_get_current_task*(). Some of them are still used, but bpf_get_current_task vs bpf_get_current_task_btf is our acknowledgement of the fact that we suck in inventing uapi. 
It's the lesson that we've learned the hard way. Not going to repeat that mistake again. To be completely honest I expect that dynptr may get obsolete as the whole concept several years from now. We still don't have a single actual user of it. Just like kptr. Could be deprecated eventually just as well. > 3. This one is the only one I feel less strongly about, but mostly > because I can implement the same (even though less ergonomically, of > course) with bpf_loop() and bpf_dynptr_{clone,advance}. > > long bpf_dynptr_iterator(struct bpf_dynptr *ptr, void *callback_fn, > void *callback_ctx, u64 flags) Speaking of your upcoming inline iterators. Please make sure that you're adding them as kfuncs. We've made a mistake with bpf_loop. It's a stable helper, but inline iterators will immediately deprecate most uses of bpf_loop. If bpf_loop was a kfunc we would have deleted it. > Let's also note that verifier knows specific flavor of dynptr and thus > can enforce additional restrictions based on specifically SKB/XDP > flavor vs LOCAL/RINGBUF. So just because there is no perfect way to > handle all the SKB/XDP physical non-contiguity, doesn't mean that the > dynptr concept itself is flawed or not well thought out. It's just I think that's exactly what it means. dynptr concept is flawed. It's ok to add this flawed feature to the kernel right now, because we don't see a better way today, but that might change in the future and we gotta be able to fix our mistakes.
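Item 3 in the quoted list says a dynptr iterator can be built from bpf_loop() plus clone/advance. A rough userspace model of that composition follows. `toy_loop` imitates the callback contract documented for bpf_loop() (the callback returns 0 to continue or 1 to stop, and the return value is the number of iterations performed). Everything else (`toy_view`, `chunk_ctx`, the chunk size, the byte sum) is made up for illustration and does not come from the patchset.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef long (*toy_loop_cb)(uint32_t index, void *ctx);

/* Model of the bpf_loop() contract: run cb up to nr_loops times. */
long toy_loop(uint32_t nr_loops, toy_loop_cb cb, void *ctx, uint64_t flags)
{
	uint32_t i;

	if (flags)
		return -EINVAL;          /* no flags are defined in this model */
	for (i = 0; i < nr_loops; i++) {
		if (cb(i, ctx))
			return (long)i + 1;  /* count includes the stopping iteration */
	}
	return (long)i;
}

/* Hypothetical stand-in for a cloned dynptr view. */
struct toy_view {
	const uint8_t *data;
	uint32_t offset;
	uint32_t size;
};

struct chunk_ctx {
	struct toy_view view;  /* private copy ("clone"), consumed as we go */
	uint32_t chunk;        /* max bytes handled per callback */
	uint64_t sum;          /* running result */
};

/* One iteration: consume up to chunk bytes, then "advance" the view. */
long sum_chunk(uint32_t index, void *ctx)
{
	struct chunk_ctx *c = ctx;
	uint32_t n = c->view.size < c->chunk ? c->view.size : c->chunk;
	uint32_t i;

	(void)index;
	for (i = 0; i < n; i++)
		c->sum += c->view.data[c->view.offset + i];
	c->view.offset += n;
	c->view.size -= n;
	return c->view.size == 0;    /* 1 stops the loop once the view is empty */
}

/* Walk buf in chunk-sized pieces without touching the caller's view. */
uint64_t sum_in_chunks(const uint8_t *buf, uint32_t len, uint32_t chunk)
{
	struct chunk_ctx c = { .view = { buf, 0, len }, .chunk = chunk, .sum = 0 };
	uint32_t nr;

	assert(chunk > 0);
	nr = len / chunk + (len % chunk != 0);
	toy_loop(nr, sum_chunk, &c, 0);
	return c.sum;
}
```

The callback plus context struct is the less ergonomic part the email refers to: the loop state has to be threaded through `ctx` by hand instead of living in local variables.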
On Fri, Dec 16, 2022 at 9:35 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Mon, Dec 12, 2022 at 12:12:09PM -0800, Andrii Nakryiko wrote: > > > > There is no clean way to ever move from unstable kfunc to a stable helper. > > No clean way? Yet in the other email you proposed a way. > Not pretty, but workable. > I'm sure if ever there will be a need to stabilize the kfunc we will > find a clean way to do it. You can't have stable and unstable helper definition in the same .c file, so a workaround would involve having two separate .c files and statically linking them together just to be able to call one or the other within the same program. It's possible, but it's in no way clean or straightforward. And that was my point. > Strongly arguing right now that this is an issue without doing the homework > is not productive. Not sure what kind of extra homework I should do to be able to point out that what I said above and in previous emails is a real pain for users. > > > BPF helpers also have the advantage of working on all architectures, > > whether that architecture supports kfuncs or not, whether it supports > > JIT or not. > > Correct, but applying the same argument we should argue that > all features must work in the interpreter as well, because > not all architectures support JIT. > This way struct-ops and bpf based TCP-CC would never be possible. > Some JITs don't support tail calls with subprogs. > freplace (bpf prog replacement) works when JITed only. > bpf trampoline works on x86-64 only. > while kfuncs work on more than one arch. Where did I claim that *everything* should work everywhere? And yes, if we can make some feature work across JIT and interpreter *with no extra work*, then yes, we should strive to do it. > > Now compare the amount of .text that kernel has to contain > to support hundreds of helpers vs same amount of kfuncs. The amount of code is about the same for helpers vs kfuncs assuming they are used, though, right? 
So it comes down to being able to remove stuff, as you mention below. > In the former it's a whole bunch of code that is there in the kernel > in case bpf prog will call that helper. With 200+ helpers and half > of them already deprecated we have quite a bit of dead code in the kernel > that we cannot delete. So "half of them already deprecated" is news to me and a pretty strong statement. I just went scrolling through the helpers and lots of them seem as useful as they were when they were added. Completely ignoring networking helpers (which I don't use much at all, but that doesn't mean they are useless and deprecated, right?), I counted about 40 at least that I've used personally, and there are more helpers that are used in practice across various apps I've helped over time. > While with kfunc approach there is no extra code that deals with > conversion of the registers from bpf psABI to arch psABI. > With kfuncs we generate this code on demand. First time I'm hearing this .text size concern due to conversion of the registers from bpf psABI to arch psABI. Can you elaborate, please? I went spot checking, looked at a few helpers like bpf_map_lookup_elem, bpf_csum_diff, bpf_skb_store_bytes, etc. I couldn't guess what bloat you are talking about. And how many bytes are we talking about here? > > > BPF helpers are also nicely self-discoverable and documented in > > include/uapi/linux/bpf.h, in one place where other BPF helpers are. > > This is a big deal, especially for non-expert BPF users (a vast > > majority of BPF users). > > Good point. In general the kfuncs are not up to the level of > documentation of helpers and we should work on improving that, > but some of kfuncs are better documented than helpers. > So it's not black and white. I was not comparing the quality of documentation. I was saying all the helpers are nicely listed (with their doc comments, yes) in one place in UAPI, making it simple for users to discover. 
Documentation itself can and should be improved for both helpers and kfuncs as much as possible, of course. > > Discoverability we discussed in the past. > The task to automatically emit kfuncs into vmlinux.h is still not complete. > Time to prioritize it higher. > Yep. > > > > > non-gpl and consistency don't even come close. > > > We've been doing everything new as kfuncs and dynptr is not special. > > > > I think dynptr is quite special. It's a very generic and fundamental > > concept, part of core BPF experience. It's a more dynamic counterpart > > to an inflexible statically sized `void * + size` pair of arguments > > sent to helpers for input or output memory regions. Dynptr has no > > inherent dependencies on BTF, kfuncs, trampolines, JIT, nothing. > > imo dynptr and kptr are more or less equivalent in terms of being core > building blocks. > kptrs are done via kfuncs, so dynptr can do just as well. bpf_kptr_xchg() is a BPF helper, so kptr is not 100% done via kfuncs. (But I'm guessing you'll say it was a mistake and bpf_kptr_xchg() should have been a kfunc, but it's too late to change that, and it's just a counter example that proves the rule). But regardless, dynptr is modeled as black box with hidden state, and its API surface area is bigger (offset, size, is null or not, manipulations over those aspects; then there is skb/xdp abstraction to be taken care of for generic read/write). It has a wider *generic* API surface to be useful and effectively used. Kptr is a single pointer that can be NULL or not and you can check for that directly. The rest is BPF verifier magic that keeps track of types and "trustedness", and then you can use specific interfacing kfuncs to work with kernel objects (which as I said before, makes sense to keep unstable). Yes, both are fundamental. But they are not apples to apples. > > > By requiring kfunc-based helpers we are significantly raising the > > obstacles towards adopting dynptr across a wide range of BPF > > applications. 
> > Sorry, but I have to disagree. kptr and dynptr are left and right hand. > Both will work just fine as kfuncs. > Ok, let's agree to disagree. > > And the only advantage in return is that we get a hypothetical chance > > to change something in the future. But let's see if that will ever be > > necessary for the helpers Joanne is adding: > > > > 1. Generic accessors to check validity of *any* dynptr, and it's > > inherent properties like offset, available size, read-only property > > (just as useful somethings as bpf_ringbuf_query() is for ringbufs, > > both for debugging and for various heuristics in production). > > > > bpf_dynptr_is_null(struct bpf_dynptr *ptr) > > long bpf_dynptr_get_size(struct bpf_dynptr *ptr) > > long bpf_dynptr_get_offset(struct bpf_dynptr *ptr) > > bpf_dynptr_is_rdonly(struct bpf_dynptr *ptr) > > > > There is nothing to add or remove here. No flags, no change in semantics. > > Disagree, since there is an obvious counter example. I'm talking about *specific* dynptr helpers under discussion, and you are bringing up some other helpers as "counter examples". What kind of discussion is this? We'll keep branching out with more and more (at best) tangentially related arguments until I'm exhausted and just give up? > See all of bpf_get_current_task*(). > Some of them are still used, but > bpf_get_current_task vs bpf_get_current_task_btf is our acknowledgement > of the fact that we suck in inventing uapi. All *two* of them, bpf_get_current_task() and bpf_get_current_task_btf(), right? They are 2 years apart. bpf_get_current_task() was added before BTF era. It is still actively used today and there is nothing wrong with it. It works on older kernels just fine, even with BPF CO-RE (as backporting a few simple patches to generate BTF is simple and easy; not so much with BPF verifier changes to add native BTF support). I don't see much problem having both, they are not maintenance burden. > It's the lesson that we've learned the hard way. 
> Not going to repeat that mistake again. I'm not dismissing the burden of backwards compat and UAPI stability, you don't have to explain that to me. But I don't see it as a reason to suddenly make everything unstable, even concepts that are core parts of the BPF framework. > > To be completely honest I expect that dynptr may get obsolete > as the whole concept several years from now. > We still don't have a single actual user of it. > Just like kptr. Could be deprecated eventually just as well. > One can say similar things about any technology or API. It doesn't mean that it was a mistake to implement them in the first place (just like your example with bpf_get_current_task() -- it served and still serves its purpose). For dynptr, time will tell, but we are still missing important parts for wider adoption. Skb/xdp stuff will be great for networking. Ringbuf/local (and malloc one, when we get to it) dynptrs will be used by generic tracing apps, but it will have to be deployed more widely across all supported kernels to make sense (thinking about our fleet-wide profiler adoption, for example). And in general, adoption of new concepts takes time. > > 3. This one is the only one I feel less strongly about, but mostly > > because I can implement the same (even though less ergonomically, of > > course) with bpf_loop() and bpf_dynptr_{clone,advance}. > > > > long bpf_dynptr_iterator(struct bpf_dynptr *ptr, void *callback_fn, > > void *callback_ctx, u64 flags) > > Speaking of your upcoming inline iterators. > Please make sure that you're adding them as kfuncs. > We've made a mistake with bpf_loop. It's a stable helper, > but inline iterators will immediately deprecate most uses of bpf_loop. > If bpf_loop was a kfunc we would have deleted it. I'm afraid we'll have to have a similar discussion with iterators. For a generic fundamental number range iterator, which is a generalization of bounded loops and bpf_loop, I believe it should be in stable UAPI as well. 
For stuff like iterators over kernel objects (tasks, cgroups, etc) -- kfuncs make sense to me. But let's cross that bridge when we get there. > > > Let's also note that verifier knows specific flavor of dynptr and thus > > can enforce additional restrictions based on specifically SKB/XDP > > flavor vs LOCAL/RINGBUF. So just because there is no perfect way to > > handle all the SKB/XDP physical non-contiguity, doesn't mean that the > > dynptr concept itself is flawed or not well thought out. It's just > > I think that's exactly what it means. dynptr concept is flawed. > It's ok to add this flawed feature to the kernel right now, > because we don't see a better way today, but that might change > in the future and we gotta be able to fix our mistakes. "flawed", "mistakes", "deprecated", etc. You keep using this strongly negatively connotated language for things that were and are perfectly valid and working (and, most importantly, used and useful in practice), but somehow fell out of your favor. Is it really necessary to denigrate everything like that? It just distracts from the essence of the discussion.
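To make the accessor and modifier semantics debated in this message concrete, here is a minimal user-space model in plain C. It is only a sketch under stated assumptions, not the kernel implementation: `struct model_dynptr` and every `model_dynptr_*` function are hypothetical names, and the error convention (-EINVAL for a null dynptr, -ERANGE when `len` exceeds the remaining size) is an assumption, not something stated in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a dynptr: a window [offset, offset + size)
 * over some backing buffer. Not the kernel's struct bpf_dynptr. */
struct model_dynptr {
    unsigned char *data; /* NULL models a null (invalid) dynptr */
    uint32_t offset;     /* start of the visible window */
    uint32_t size;       /* bytes visible from offset */
    int rdonly;          /* non-zero if writes are not allowed */
};

int model_dynptr_is_null(const struct model_dynptr *p)
{
    assert(p != NULL);
    return p->data == NULL;
}

int model_dynptr_is_rdonly(const struct model_dynptr *p)
{
    return p->rdonly != 0;
}

long model_dynptr_get_size(const struct model_dynptr *p)
{
    return model_dynptr_is_null(p) ? -EINVAL : (long)p->size;
}

long model_dynptr_get_offset(const struct model_dynptr *p)
{
    return model_dynptr_is_null(p) ? -EINVAL : (long)p->offset;
}

/* Advance: drop len bytes from the front of the window (assumed semantics). */
int model_dynptr_advance(struct model_dynptr *p, uint32_t len)
{
    if (model_dynptr_is_null(p))
        return -EINVAL;
    if (len > p->size)
        return -ERANGE;
    p->offset += len;
    p->size -= len;
    return 0;
}

/* Trim: drop len bytes from the end of the window (assumed semantics). */
int model_dynptr_trim(struct model_dynptr *p, uint32_t len)
{
    if (model_dynptr_is_null(p))
        return -EINVAL;
    if (len > p->size)
        return -ERANGE;
    p->size -= len;
    return 0;
}

/* Clone: an independent view of the same memory; changes to the clone's
 * window do not affect the original. */
int model_dynptr_clone(const struct model_dynptr *src, struct model_dynptr *dst)
{
    if (model_dynptr_is_null(src))
        return -EINVAL;
    *dst = *src;
    return 0;
}
```

In this model, cloning a 16-byte dynptr, advancing the clone by 4 and trimming it by 2 leaves the clone at offset 4 with size 10, while the original still reports size 16. That is the "work with a subset of dynptr's overall memory area" use case described earlier in the thread.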
On Tue, Dec 20, 2022 at 11:31:25AM -0800, Andrii Nakryiko wrote:
> On Fri, Dec 16, 2022 at 9:35 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Mon, Dec 12, 2022 at 12:12:09PM -0800, Andrii Nakryiko wrote:
> > >
> > > There is no clean way to ever move from unstable kfunc to a stable helper.
> >
> > No clean way? Yet in the other email you proposed a way.
> > Not pretty, but workable.
> > I'm sure if ever there will be a need to stabilize the kfunc we will
> > find a clean way to do it.
>
> You can't have stable and unstable helper definition in the same .c
> file,

of course we can. uapi helpers vs kfuncs argument is not a black and white comparison. It's not just stable vs unstable. uapi has strict rules and helpers in uapi/bpf.h have to follow those rules. While kfuncs in terms of stability are equivalent to EXPORT_SYMBOL_GPL. Meaning they are largely unstable. The upstream kernel keeps changing those EXPORT_SYMBOL* functions, but distros can apply their own "stability rules". See Red Hat's kABI, for example. A distro can guarantee the stability of certain EXPORT_SYMBOL* for their customers, but that doesn't bind upstream development.

With uapi bpf helpers we have to guarantee their stability, while with kfuncs we can do whatever we want. Right now all kfuncs are unstable and to prove the point we changed them a couple of times already (nf_conn*). We also have bpf_obj_new_impl() kfunc which is equivalent to EXPORT_SYMBOL(__kmalloc). Hard to imagine a more stable and more fundamental function. Of course we want bpf programs to use bpf_obj_new() and assume that it's going to be available in all future kernel releases. But at the same time we're not bound by uapi rules. bpf_obj_new() will likely be stable, but not uapi stable. If we screw up (or find a better way to allocate memory in the future) we can change it.
We can invent our own deprecation rules for stable-ish kfuncs and invent our more-unstable-than-current-unstable rules for kfuncs that are too dependent on the kernel release.

> But regardless, dynptr is modeled as black box with hidden state, and
> its API surface area is bigger (offset, size, is null or not,
> manipulations over those aspects; then there is skb/xdp abstraction to
> be taken care of for generic read/write). It has a wider *generic* API
> surface to be useful and effectively used.

tbh dynptr as an abstraction of skb/xdp is not convincing. cilium created their own abstraction on top of skb and xdp and it's zero cost. While dynptr is not free, so xdp users are unlikely to use dynptr(xdp) for perf reasons. So I suspect it won't be a success story in the long run, but we can certainly try it out since they will be kfuncs and can be deprecated if maintenance outweighs the number of users.

> All *two* of them, bpf_get_current_task() and
> bpf_get_current_task_btf(), right? They are 2 years apart.
> bpf_get_current_task() was added before BTF era. It is still actively
> used today and there is nothing wrong with it. It works on older
> kernels just fine, even with BPF CO-RE (as backporting a few simple
> patches to generate BTF is simple and easy; not so much with BPF
> verifier changes to add native BTF support). I don't see much problem
> having both, they are not maintenance burden.

bpf_get_current_pid_tgid
bpf_get_current_uid_gid
bpf_get_current_comm
bpf_get_current_task
bpf_get_current_task_btf
bpf_get_current_cgroup_id
bpf_get_current_ancestor_cgroup_id
bpf_skb_ancestor_cgroup_id
bpf_sk_cgroup_id
bpf_sk_ancestor_cgroup_id

_are_ a maintenance burden. The verifier got smarter and we could have removed all of them, but uapi rules make it impossible. The bpf prog could have been enabled to access all these task_struct and cgroup fields directly. Likely without any kfuncs.
bpf_send_signal vs bpf_send_signal_thread
bpf_jiffies64 vs bpf_this_cpu_ptr
etc.

There are plenty of examples where uapi bpf helpers became a burden. They are working and will keep working, but we could have done a much better job if not for uapi. These are the examples where uapi rules are too strong for bpf development. Our pace of adding new features is high. The kernel uapi rules are too strict for us.

At one point DaveM declared a freeze on sizeof(struct sk_buff). It was a difficult, but correct decision. We have to declare a freeze on bpf helpers. 211 helpers that have to be maintained forever is a huge burden. All new features should use kfuncs and we need to figure out a deprecation and stability story for them. How to document kfuncs cleanly, how to discover them, etc.
On Sun, Dec 25, 2022 at 1:52 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Tue, Dec 20, 2022 at 11:31:25AM -0800, Andrii Nakryiko wrote: > > On Fri, Dec 16, 2022 at 9:35 AM Alexei Starovoitov > > <alexei.starovoitov@gmail.com> wrote: > > > > > > On Mon, Dec 12, 2022 at 12:12:09PM -0800, Andrii Nakryiko wrote: > > > > > > > > There is no clean way to ever move from unstable kfunc to a stable helper. > > > > > > No clean way? Yet in the other email you proposed a way. > > > Not pretty, but workable. > > > I'm sure if ever there will be a need to stabilize the kfunc we will > > > find a clean way to do it. > > > > You can't have stable and unstable helper definition in the same .c > > file, > > of course we can. > uapi helpers vs kfuncs argument is not a black and white comparison. > It's not just stable vs unstable. > uapi has strict rules and helpers in uapi/bpf.h have to follow those rules. > While kfuncs in terms of stability are equivalent to EXPORT_SYMBOL_GPL. > Meaning they are largely unstable. > The upsteam kernel keeps changing those EXPORT_SYMBOL* functions, > but distros can apply their own "stability rules". > See Redhat's kABI, for example. A distro can guarantee a stability > of certain EXPORT_SYMBOL* for their customers, but that doesn't bind > upstream development. > > With uapi bpf helpers we have to guarantee their stability, > while with kfuncs we can do whatever we want. Right now all kfuncs are > unstable and to prove the point we changed them couple times already (nf_conn*). > We also have bpf_obj_new_impl() kfunc which is equivalent to EXPORT_SYMBOL(__kmalloc). > Hard to imagine more stable and more fundamental function. > Of course we want bpf programs to use bpf_obj_new() and assume > that it's going to be available in all future kernel releases. > But at the same time we're not bound by uapi rules. > bpf_obj_new() will likely be stable, but not uapi stable. 
> If we screw up (or find better way to allocate memory in the future) > we can change it. > We can invent our own deprecation rules for stable-ish kfuncs and > invent our more-unstable-than-current-unstable rules for kfuncs that > are too much kernel release dependent. I'm talking about *mechanics* of having two incompatible definitions of functions with the same name, not the *concept* of stable vs unstable API. See [0] where I explained this as a reply to Joanne. [0] https://lore.kernel.org/bpf/CAEf4BzbRQLEjAFUkzzStv0c0=O+r9iZ8hq33sJB2RtSuGrGAEA@mail.gmail.com/ > > > But regardless, dynptr is modeled as black box with hidden state, and > > its API surface area is bigger (offset, size, is null or not, > > manipulations over those aspects; then there is skb/xdp abstraction to > > be taken care of for generic read/write). It has a wider *generic* API > > surface to be useful and effectively used. > > tbh dynptr as an abstraction of skb/xdp is not convincing. > cilium created their own abstraction on top of skb and xdp and it's zero cost. > While dynptr is not free, so xdp users unlikely to use dynptr(xdp) for perf reasons. > So I suspect it won't be a success story in the long run, but we > can certainly try it out since they will be kfuncs and can be deprecated > if maintenance outweighs the number of users. > > > All *two* of them, bpf_get_current_task() and > > bpf_get_current_task_btf(), right? They are 2 years apart. > > bpf_get_current_task() was added before BTF era. It is still actively > > used today and there is nothing wrong with it. It works on older > > kernels just fine, even with BPF CO-RE (as backporting a few simple > > patches to generate BTF is simple and easy; not so much with BPF > > verifier changes to add native BTF support). I don't see much problem > > having both, they are not maintenance burden. 
> > bpf_get_current_pid_tgid > bpf_get_current_uid_gid > bpf_get_current_comm > bpf_get_current_task > bpf_get_current_task_btf > bpf_get_current_cgroup_id > bpf_get_current_ancestor_cgroup_id > bpf_skb_ancestor_cgroup_id > bpf_sk_cgroup_id > bpf_sk_ancestor_cgroup_id > > _are_ a maintenance burden. bpf_get_current_pid_tgid() was added in 2015, slightly and uncritically touched by Daniel in 2016 and we never had any problems with it ever since. No updates, no maintenance. I don't remember much problem with other helpers in this list, but I didn't check each one. But we certainly have a different understanding of what "maintenance burden" is. If some code doesn't require constant change and doesn't prevent changes in some other parts of the system, it's not a maintenance burden. > The verifier got smarter and we could have removed all of them, > but uapi rules makes it impossible. > The bpf prog could have been enabled to access all these task_struct > and cgroup fields directly. Likely without any kfuncs. > > bpf_send_signal vs bpf_send_signal_thread > bpf_jiffies64 vs bpf_this_cpu_ptr > etc > there are plenty examples where uapi bpf helpers became a burden. > They are working and will keep working, but we could have done > much better job if not for uapi. > These are the examples where uapi rules are too strong for bpf development. > Our pace of adding new features is high. > The kernel uapi rules are too strict for us. I'm familiar with the burden of maintaining API stability and backwards compat. But it's not just about the library/system developer's convenience and burden, it's also about the end user's experience and convenience. BPF tool developers really appreciate when there are few less quirks to remember and work around across kernel versions, configurations, architectures, etc. It's the pain that kernel engineers working on BPF bleeding-edge don't experience in the BPF selftests environment. 
>
> At one point DaveM declared freeze on sizeof(struct sk_buff).
> It was a difficult, but correct decision.
> We have to declare freeze on bpf helpers.
> 211 helpers that have to be maintained forever is a huge burden.

I still don't get why we have to freeze anything and how exactly helpers are a burden.

And especially in this specific case of a few simple dynptr helpers, given that the other generic dynptr APIs are already BPF helpers, I just don't get it. Honestly, all I see from this discussion is that you've made up your mind and there is nothing that can be done to convince you.

The only argument, "BPF helpers are stable and thus a burden", is just not convincing and I'd even say is mostly false. There are no upsides to having dynptr helpers as kfuncs, as far as I'm concerned. But there are a bunch of downsides, even if some of those might be lifted in the future.

The unfortunate thing is that end users that are meant to benefit from all these helpers and them being "a standard API offering" are not well represented on the BPF mailing list. And my opinion and arguments as a proxy for theirs are clearly not enough.

> All new features should use kfuncs and we need to figure out a deprecation
> and stability story for them. How to document kfuncs cleanly,
> how to discover them, etc.
On Thu, Dec 29, 2022 at 03:10:22PM -0800, Andrii Nakryiko wrote: > On Sun, Dec 25, 2022 at 1:52 PM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: > > > > On Tue, Dec 20, 2022 at 11:31:25AM -0800, Andrii Nakryiko wrote: > > > On Fri, Dec 16, 2022 at 9:35 AM Alexei Starovoitov > > > <alexei.starovoitov@gmail.com> wrote: > > > > > > > > On Mon, Dec 12, 2022 at 12:12:09PM -0800, Andrii Nakryiko wrote: > > > > > > > > > > There is no clean way to ever move from unstable kfunc to a stable helper. > > > > > > > > No clean way? Yet in the other email you proposed a way. > > > > Not pretty, but workable. > > > > I'm sure if ever there will be a need to stabilize the kfunc we will > > > > find a clean way to do it. > > > > > > You can't have stable and unstable helper definition in the same .c > > > file, > > > > of course we can. > > uapi helpers vs kfuncs argument is not a black and white comparison. > > It's not just stable vs unstable. > > uapi has strict rules and helpers in uapi/bpf.h have to follow those rules. > > While kfuncs in terms of stability are equivalent to EXPORT_SYMBOL_GPL. > > Meaning they are largely unstable. > > The upsteam kernel keeps changing those EXPORT_SYMBOL* functions, > > but distros can apply their own "stability rules". > > See Redhat's kABI, for example. A distro can guarantee a stability > > of certain EXPORT_SYMBOL* for their customers, but that doesn't bind > > upstream development. > > > > With uapi bpf helpers we have to guarantee their stability, > > while with kfuncs we can do whatever we want. Right now all kfuncs are > > unstable and to prove the point we changed them couple times already (nf_conn*). > > We also have bpf_obj_new_impl() kfunc which is equivalent to EXPORT_SYMBOL(__kmalloc). > > Hard to imagine more stable and more fundamental function. > > Of course we want bpf programs to use bpf_obj_new() and assume > > that it's going to be available in all future kernel releases. 
> > But at the same time we're not bound by uapi rules. > > bpf_obj_new() will likely be stable, but not uapi stable. > > If we screw up (or find better way to allocate memory in the future) > > we can change it. > > We can invent our own deprecation rules for stable-ish kfuncs and > > invent our more-unstable-than-current-unstable rules for kfuncs that > > are too much kernel release dependent. > > I'm talking about *mechanics* of having two incompatible definitions > of functions with the same name, not the *concept* of stable vs > unstable API. See [0] where I explained this as a reply to Joanne. > > [0] https://lore.kernel.org/bpf/CAEf4BzbRQLEjAFUkzzStv0c0=O+r9iZ8hq33sJB2RtSuGrGAEA@mail.gmail.com/ Mechanics for kfuncs are much better than for helpers. extern bool bpf_dynptr_is_null(const struct bpf_dynptr *p) __ksym; will likely work with both gcc and clang. And if it doesn't we can fix it. While when gcc folks saw helpers: static bool (*bpf_dynptr_is_null)(const struct bpf_dynptr *p) = (void *) 777; they realized that it is a hack that abuses compiler optimizations. They even invented attr(kernel_helper) to workaround this issue. After a bunch of arguing gcc added support for this hack without attr, but it's going to be around forever... in gcc, in clang and in kernel. It's something that we could have fixed if it wasn't for uapi. Just one more example of unfixable mistake that causing issues to multiple projects. That's the core issue of kernel uapi rules: inability to fix mistakes. > > > > > But regardless, dynptr is modeled as black box with hidden state, and > > > its API surface area is bigger (offset, size, is null or not, > > > manipulations over those aspects; then there is skb/xdp abstraction to > > > be taken care of for generic read/write). It has a wider *generic* API > > > surface to be useful and effectively used. > > > > tbh dynptr as an abstraction of skb/xdp is not convincing. 
> > cilium created their own abstraction on top of skb and xdp and it's zero cost. > > While dynptr is not free, so xdp users unlikely to use dynptr(xdp) for perf reasons. > > So I suspect it won't be a success story in the long run, but we > > can certainly try it out since they will be kfuncs and can be deprecated > > if maintenance outweighs the number of users. > > > > > All *two* of them, bpf_get_current_task() and > > > bpf_get_current_task_btf(), right? They are 2 years apart. > > > bpf_get_current_task() was added before BTF era. It is still actively > > > used today and there is nothing wrong with it. It works on older > > > kernels just fine, even with BPF CO-RE (as backporting a few simple > > > patches to generate BTF is simple and easy; not so much with BPF > > > verifier changes to add native BTF support). I don't see much problem > > > having both, they are not maintenance burden. > > > > bpf_get_current_pid_tgid > > bpf_get_current_uid_gid > > bpf_get_current_comm > > bpf_get_current_task > > bpf_get_current_task_btf > > bpf_get_current_cgroup_id > > bpf_get_current_ancestor_cgroup_id > > bpf_skb_ancestor_cgroup_id > > bpf_sk_cgroup_id > > bpf_sk_ancestor_cgroup_id > > > > _are_ a maintenance burden. > > bpf_get_current_pid_tgid() was added in 2015, slightly and > uncritically touched by Daniel in 2016 and we never had any problems > with it ever since. No updates, no maintenance. I don't remember much > problem with other helpers in this list, but I didn't check each one. > > But we certainly have a different understanding of what "maintenance > burden" is. If some code doesn't require constant change and doesn't > prevent changes in some other parts of the system, it's not a > maintenance burden. As I said it's not about working today. If one doesn't touch code it will keep working. It's about being able to change it. The uapi bits we simply cannot change. 
> > > The verifier got smarter and we could have removed all of them, > > but uapi rules makes it impossible. > > The bpf prog could have been enabled to access all these task_struct > > and cgroup fields directly. Likely without any kfuncs. > > > > bpf_send_signal vs bpf_send_signal_thread > > bpf_jiffies64 vs bpf_this_cpu_ptr > > etc > > there are plenty examples where uapi bpf helpers became a burden. > > They are working and will keep working, but we could have done > > much better job if not for uapi. > > These are the examples where uapi rules are too strong for bpf development. > > Our pace of adding new features is high. > > The kernel uapi rules are too strict for us. > > I'm familiar with the burden of maintaining API stability and > backwards compat. But it's not just about the library/system libbpf 1.0 wasn't the smoothest example of deprecation. But we still did it despite all kinds of negative flame. With uapi helpers we cannot do any of that. No deprecation schemes. While kfuncs allow innovation. > developer's convenience and burden, it's also about the end user's > experience and convenience. BPF tool developers really appreciate when > there are few less quirks to remember and work around across kernel > versions, configurations, architectures, etc. It's the pain that > kernel engineers working on BPF bleeding-edge don't experience in the > BPF selftests environment. There is a trade off between users and developers. We want to make user experience as smooth as possible while preserve the speed of development for the kernel. uapi is in the way of that. > > > > At one point DaveM declared freeze on sizeof(struct sk_buff). > > It was a difficult, but correct decision. > > We have to declare freeze on bpf helpers. > > 211 helpers that have to be maintained forever is a huge burden. > > I still didn't get why we have to freeze anything and how exactly > helpers are a burden. 
> > But especially in this specific case of few simple dynptr helpers, > especially that other dynptrs generic APIs are already BPF helpers. I > just don't get it and honestly all I see from this discussion is that > you've made up your mind and there is nothing that can be done to > convince you. > > The only "BPF helpers are stable and thus a burden" argument is just > not convincing and I'd even say is mostly false. There are no upsides > to having dynptr helpers as kfuncs, as far as I'm concerned. The main and only upside for everything as kfunc is that we can change it. That's it. > But there > are a bunch of downsides, even if some of those might be lifted in the > future. imo ability to change outweighs all downsides, since downsides are fixable while inability to change is a burden. > The unfortunate thing is that end users that are meant to benefit from > all these helpers and them being "a standard API offering" are not > well represented on the BPF mailing list, unfortunately. And my > opinion and arguments as a proxy for theirs is clearly not enough. I also would like to hear what others on the list are thinking.
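The two declaration styles contrasted earlier in this message can be put side by side in a small file that also compiles as ordinary user-space C. This is a hedged sketch: the ID 777 is the placeholder used in the thread, not a real helper number; `helper_style_is_null` and `helper_style_id` are hypothetical names; and `__ksym` is defined as empty here only so the file builds outside a BPF toolchain (in BPF programs, libbpf's bpf_helpers.h provides the real definition).

```c
#include <stdbool.h>
#include <stdint.h>

/* Outside a BPF build there is no __ksym, so make it a no-op (assumption
 * for this sketch only; BPF programs get it from bpf_helpers.h). */
#ifndef __ksym
#define __ksym
#endif

struct bpf_dynptr; /* opaque for the purposes of this sketch */

/* Helper style: a function pointer whose "address" is just the helper ID.
 * A BPF compiler is expected to fold this into a direct call to ID 777,
 * which is the reliance on optimization criticized in the thread. */
static bool (*helper_style_is_null)(const struct bpf_dynptr *p) = (void *) 777;

/* kfunc style: a plain extern declaration, resolved by name at load time.
 * It is never called here, so no definition is needed to link. */
extern bool bpf_dynptr_is_null(const struct bpf_dynptr *p) __ksym;

/* User-space inspection only: the pointer carries nothing but the ID.
 * Calling through it outside the BPF VM would be invalid. */
uintptr_t helper_style_id(void)
{
    return (uintptr_t) helper_style_is_null;
}
```

The sketch only inspects the pointer value and never calls through it. The point is that the helper form encodes a number in a pointer, while the kfunc form is an ordinary symbol reference.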
On Thu, Dec 29, 2022 at 06:46:41PM -0800, Alexei Starovoitov wrote: > On Thu, Dec 29, 2022 at 03:10:22PM -0800, Andrii Nakryiko wrote: > > On Sun, Dec 25, 2022 at 1:52 PM Alexei Starovoitov > > <alexei.starovoitov@gmail.com> wrote: > > > > > > On Tue, Dec 20, 2022 at 11:31:25AM -0800, Andrii Nakryiko wrote: > > > > On Fri, Dec 16, 2022 at 9:35 AM Alexei Starovoitov > > > > <alexei.starovoitov@gmail.com> wrote: > > > > > > > > > > On Mon, Dec 12, 2022 at 12:12:09PM -0800, Andrii Nakryiko wrote: > > > > > > > > > > > > There is no clean way to ever move from unstable kfunc to a stable helper. > > > > > > > > > > No clean way? Yet in the other email you proposed a way. > > > > > Not pretty, but workable. > > > > > I'm sure if ever there will be a need to stabilize the kfunc we will > > > > > find a clean way to do it. > > > > > > > > You can't have stable and unstable helper definition in the same .c > > > > file, > > > > > > of course we can. > > > uapi helpers vs kfuncs argument is not a black and white comparison. > > > It's not just stable vs unstable. > > > uapi has strict rules and helpers in uapi/bpf.h have to follow those rules. > > > While kfuncs in terms of stability are equivalent to EXPORT_SYMBOL_GPL. > > > Meaning they are largely unstable. > > > The upsteam kernel keeps changing those EXPORT_SYMBOL* functions, > > > but distros can apply their own "stability rules". > > > See Redhat's kABI, for example. A distro can guarantee a stability > > > of certain EXPORT_SYMBOL* for their customers, but that doesn't bind > > > upstream development. This also sounds more in line with what was discussed at the maintainers summit [0]. "A BPF program that depends on kernel symbols is not really a user program anymore." Given that perspective, EXPORT_SYMBOL_GPL sounds like the correct equivalency to "public BPF symbols". 
[0]: https://lwn.net/Articles/908464/ > > > > > > With uapi bpf helpers we have to guarantee their stability, > > > while with kfuncs we can do whatever we want. Right now all kfuncs are > > > unstable and to prove the point we changed them couple times already (nf_conn*). > > > We also have bpf_obj_new_impl() kfunc which is equivalent to EXPORT_SYMBOL(__kmalloc). > > > Hard to imagine more stable and more fundamental function. > > > Of course we want bpf programs to use bpf_obj_new() and assume > > > that it's going to be available in all future kernel releases. > > > But at the same time we're not bound by uapi rules. > > > bpf_obj_new() will likely be stable, but not uapi stable. > > > If we screw up (or find better way to allocate memory in the future) > > > we can change it. > > > We can invent our own deprecation rules for stable-ish kfuncs and > > > invent our more-unstable-than-current-unstable rules for kfuncs that > > > are too much kernel release dependent. > > > > I'm talking about *mechanics* of having two incompatible definitions > > of functions with the same name, not the *concept* of stable vs > > unstable API. See [0] where I explained this as a reply to Joanne. > > > > [0] https://lore.kernel.org/bpf/CAEf4BzbRQLEjAFUkzzStv0c0=O+r9iZ8hq33sJB2RtSuGrGAEA@mail.gmail.com/ > > Mechanics for kfuncs are much better than for helpers. > > extern bool bpf_dynptr_is_null(const struct bpf_dynptr *p) __ksym; > > will likely work with both gcc and clang. > And if it doesn't we can fix it. > > While when gcc folks saw helpers: > > static bool (*bpf_dynptr_is_null)(const struct bpf_dynptr *p) = (void *) 777; > > they realized that it is a hack that abuses compiler optimizations. > They even invented attr(kernel_helper) to workaround this issue. > After a bunch of arguing gcc added support for this hack without attr, > but it's going to be around forever... in gcc, in clang and in kernel. > It's something that we could have fixed if it wasn't for uapi. 
> Just one more example of unfixable mistake that causing issues > to multiple projects. > That's the core issue of kernel uapi rules: inability to fix mistakes. > > > > > > > > But regardless, dynptr is modeled as black box with hidden state, and > > > > its API surface area is bigger (offset, size, is null or not, > > > > manipulations over those aspects; then there is skb/xdp abstraction to > > > > be taken care of for generic read/write). It has a wider *generic* API > > > > surface to be useful and effectively used. > > > > > > tbh dynptr as an abstraction of skb/xdp is not convincing. > > > cilium created their own abstraction on top of skb and xdp and it's zero cost. > > > While dynptr is not free, so xdp users unlikely to use dynptr(xdp) for perf reasons. > > > So I suspect it won't be a success story in the long run, but we > > > can certainly try it out since they will be kfuncs and can be deprecated > > > if maintenance outweighs the number of users. > > > > > > > All *two* of them, bpf_get_current_task() and > > > > bpf_get_current_task_btf(), right? They are 2 years apart. > > > > bpf_get_current_task() was added before BTF era. It is still actively > > > > used today and there is nothing wrong with it. It works on older > > > > kernels just fine, even with BPF CO-RE (as backporting a few simple > > > > patches to generate BTF is simple and easy; not so much with BPF > > > > verifier changes to add native BTF support). I don't see much problem > > > > having both, they are not maintenance burden. > > > > > > bpf_get_current_pid_tgid > > > bpf_get_current_uid_gid > > > bpf_get_current_comm > > > bpf_get_current_task > > > bpf_get_current_task_btf > > > bpf_get_current_cgroup_id > > > bpf_get_current_ancestor_cgroup_id > > > bpf_skb_ancestor_cgroup_id > > > bpf_sk_cgroup_id > > > bpf_sk_ancestor_cgroup_id > > > > > > _are_ a maintenance burden. 
> >
> > bpf_get_current_pid_tgid() was added in 2015, slightly and
> > uncritically touched by Daniel in 2016 and we never had any problems
> > with it ever since. No updates, no maintenance. I don't remember much
> > problem with other helpers in this list, but I didn't check each one.

You could argue that this is actually a point in favor of kfuncs. If we implement these as kfuncs and never touch them again, users will not need to change anything and will have the exact same experience as if it were in UAPI (minus being on platforms that don't support kfuncs, which is something we should work to fix in general). It will just work indefinitely, as long as we decide to support it.

The only time users will feel pain is if we do in fact have to change it: if we have to add a flags field, change the semantics to have different behavior, etc. I think Alexei's point is that we simply _can't_ do that if we're bound by UAPI. At least with kfuncs we have the choice to change it if we deem it necessary.

Taking bpf_get_current_task() as an example, I think it's better to have the debate be "should we keep supporting this / are users still using it?" rather than, "it's UAPI, there's nothing to even discuss". The point being that even if bpf_get_current_task() is still used, there may (and inevitably will) be other UAPI helpers that are useless and that we just can't remove.

> >
> > But we certainly have a different understanding of what "maintenance
> > burden" is. If some code doesn't require constant change and doesn't
> > prevent changes in some other parts of the system, it's not a
> > maintenance burden.
>
> As I said it's not about working today. If one doesn't touch code
> it will keep working.
> It's about being able to change it.
> The uapi bits we simply cannot change.
I think Michael Kerrisk's classic "Once upon an API" talk [1] provides a compelling, real-world example of this point:

[1]: https://kernel-recipes.org/en/2022/once-upon-an-api/

APIs can seem innocuous when you first add them, and then as you use them more and in different ways, your platform grows more featureful and things change, etc, you realize that the axioms upon which you designed your APIs in the first place are no longer true. prctl() started out as a dead-simple syscall where a child process would get a signal if its parent process dies. Over the years, it's morphed into a monstrosity [2] of a syscall with tons of odd behavior that's impossible [3] to fix even a decade+ after the API was first introduced due to the possibility of breaking applications that have come to rely on that nonsensical behavior. Never breaking user space is a great philosophy, but I don't think we need to inflict that same pain on ourselves for _kernel_ programs, which is what we're discussing here.

[2]: https://man7.org/linux/man-pages/man2/prctl.2.html
[3]: https://bugzilla.kernel.org/show_bug.cgi?id=43300#c22

I'm not trying to paint a false equivalency between prctl() and the helpers you enumerated in [4], because I agree with you that it's very unlikely that they'll change, but I also think it's impossible to know that for sure, and I do agree with Alexei that the "hypothetical chance to change something in the future" is hugely valuable. That being said, I comment more on the dynptr helpers down below.

[4]: https://lore.kernel.org/all/CAEf4BzZM0+j6DXMgu2o2UvjtzoOxcjsJtT8j-jqVZYvAqxc52g@mail.gmail.com/

> >
> > > The verifier got smarter and we could have removed all of them,
> > > but uapi rules makes it impossible.
> > > The bpf prog could have been enabled to access all these task_struct
> > > and cgroup fields directly. Likely without any kfuncs.
> > > > > > bpf_send_signal vs bpf_send_signal_thread > > > bpf_jiffies64 vs bpf_this_cpu_ptr > > > etc > > > there are plenty examples where uapi bpf helpers became a burden. > > > They are working and will keep working, but we could have done > > > much better job if not for uapi. > > > These are the examples where uapi rules are too strong for bpf development. > > > Our pace of adding new features is high. > > > The kernel uapi rules are too strict for us. > > > > I'm familiar with the burden of maintaining API stability and > > backwards compat. But it's not just about the library/system > > libbpf 1.0 wasn't the smoothest example of deprecation. > But we still did it despite all kinds of negative flame. > With uapi helpers we cannot do any of that. No deprecation schemes. > While kfuncs allow innovation. > > > developer's convenience and burden, it's also about the end user's > > experience and convenience. BPF tool developers really appreciate when > > there are few less quirks to remember and work around across kernel > > versions, configurations, architectures, etc. It's the pain that > > kernel engineers working on BPF bleeding-edge don't experience in the > > BPF selftests environment. > > There is a trade off between users and developers. We want to make user > experience as smooth as possible while preserve the speed of development > for the kernel. uapi is in the way of that. As illustrated in the prctl() example above, UAPI can get in the way of users as well. If we can't fix an API or its semantics, some users are stuck with that crappy behavior (while, admittedly, others get to enjoy the consistency of the weird / existing behavior not changing out from under them). I certainly see why there are strong reasons to have a stable UAPI for user space, but for kernel programs I don't think so. > > > > > > At one point DaveM declared freeze on sizeof(struct sk_buff). > > > It was a difficult, but correct decision. 
> > > We have to declare freeze on bpf helpers.
> > > 211 helpers that have to be maintained forever is a huge burden.

While I agree that we should freeze helpers at some point, I also think
we need to take care of a few things before that can or should formally
go into effect. You mentioned some things we should take care of in [5].
Automatically emitting kfuncs into vmlinux.h, properly documenting
kfuncs. I think that list is insufficient, and that we need:

[5]: https://lore.kernel.org/all/20221216173526.y3e5go6mgmjrv46l@MacBook-Pro-6.local/

1. A formal, build-enforced policy for documenting kfuncs, as we
   currently have for helpers (as you mentioned, minus the
   build-enforcement).

2. Emitting kfuncs into vmlinux.h, as you mentioned.

3. Allowing users to specify flags per-argument in kfuncs. In my opinion
   this is a big deficiency of kfuncs relative to helpers. This would mean
   e.g. getting rid of the __sz and __k hacks. I think it's fine for us to
   live with it for now while we're continuing to flesh-out and improve
   kfuncs (a process which is happening quickly), but IMO it's really not
   appropriate for it to be the official only way to add helpers. It's a
   beta feature :-)

4. Getting rid of KF_TRUSTED_ARGS and making that the default.

5. Ideally we could improve the story for _defining_ kfuncs as well,
   though IMO it's already far less painful than defining helpers. It would
   be nice if you could just tag a kfunc with something like a __bpf_kfunc
   macro and it would do the following:

   - Automatically disable the -Wmissing-prototypes warning. I doubt this
     is possible without adding some compiler features that let you do
     something like __attribute__(__nowarn__("Wmissing-prototypes")), so
     maybe this isn't a hard blocker, but more of a medium / long-term
     goal.
   - Add whatever other attributes we need for the kfuncs to be safe. For
     example, 'noinline' and '__used'. Even if the symbols are global,
     we'll probably need '__used' for LTO.
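To make item 5 a bit more concrete, here is a rough user-space sketch of
what such a tag could look like. Everything in it is an assumption made
for illustration: the definition of __bpf_kfunc, the DEMO_KFUNC_DEFS_BEGIN
and DEMO_KFUNC_DEFS_END bracket macros, struct demo_dynptr and
demo_dynptr_is_null() are hypothetical stand-ins, not kernel code. Since a
function attribute cannot silence -Wmissing-prototypes, the sketch brackets
the definitions with diagnostic pragmas instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical tag macro (name from the proposal, definition assumed).
 * 'used' keeps the symbol even if nothing references it (e.g. under LTO);
 * 'noinline' keeps a real out-of-line function to call.
 */
#define __bpf_kfunc __attribute__((used, noinline))

/*
 * Hypothetical bracket macros: the attribute above cannot disable
 * -Wmissing-prototypes, so the warning is suppressed around the block.
 */
#define DEMO_KFUNC_DEFS_BEGIN \
	_Pragma("GCC diagnostic push") \
	_Pragma("GCC diagnostic ignored \"-Wmissing-prototypes\"")
#define DEMO_KFUNC_DEFS_END _Pragma("GCC diagnostic pop")

/* Stand-in type for illustration only; not struct bpf_dynptr. */
struct demo_dynptr {
	void *data;
	size_t size;
};

DEMO_KFUNC_DEFS_BEGIN

/* Global function with no prior prototype, as a kfunc definition would be. */
__bpf_kfunc bool demo_dynptr_is_null(const struct demo_dynptr *p)
{
	return p == NULL || p->data == NULL;
}

DEMO_KFUNC_DEFS_END

/* Small caller that exercises the tagged function. */
int demo_run_checks(void);
int demo_run_checks(void)
{
	char buf[8];
	struct demo_dynptr empty = { NULL, 0 };
	struct demo_dynptr full = { buf, sizeof(buf) };

	assert(demo_dynptr_is_null(NULL));
	assert(demo_dynptr_is_null(&empty));
	assert(!demo_dynptr_is_null(&full));
	return 0;
}
```

With gcc or clang and -Wmissing-prototypes enabled, the tagged definition
should build without that warning, while the same function placed outside
the bracket would trigger it.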
Overall, my point is really that we still have some homework to do
before we can just unilaterally freeze helpers. We're getting close, but
IMO not quite there yet.

> >
> > I still didn't get why we have to freeze anything and how exactly
> > helpers are a burden.
> >
> > But especially in this specific case of few simple dynptr helpers,
> > especially that other dynptrs generic APIs are already BPF helpers. I
> > just don't get it and honestly all I see from this discussion is that
> > you've made up your mind and there is nothing that can be done to
> > convince you.
> >
> > The only "BPF helpers are stable and thus a burden" argument is just
> > not convincing and I'd even say is mostly false. There are no upsides
> > to having dynptr helpers as kfuncs, as far as I'm concerned.
>
> The main and only upside for everything as kfunc is that we can change it.
> That's it.
>
> > But there
> > are a bunch of downsides, even if some of those might be lifted in the
> > future.
>
> imo ability to change outweighs all downsides, since downsides are fixable
> while inability to change is a burden.
>
> > The unfortunate thing is that end users that are meant to benefit from
> > all these helpers and them being "a standard API offering" are not
> > well represented on the BPF mailing list, unfortunately. And my
> > opinion and arguments as a proxy for theirs is clearly not enough.
>
> I also would like to hear what others on the list are thinking.

The last thing I'll say is that everything I've said above is really in
regards to the more general debate of helpers vs. kfuncs. Specifically
for the dynptrs being added in this set, I agree with Andrii that it's
arguably an odd user experience for certain platforms to support
only specific parts of the dynptr API surface.

I'm not sure whether that's enough to warrant making them helpers
instead of kfuncs, but I do think it's not exactly an apples to apples
comparison with future features that today have no helper API presence.
Putting myself in the shoes of a dynptr user, I would be very surprised
and confused if all of a sudden, I couldn't use some of the core dynptr
APIs due to being on a platform that doesn't have kfunc support. My two
cents are that letting these dynptr functions stay as helpers, while
agreeing that kfuncs is the way forward (though I don't think Andrii
agrees with that even aside from just these dynptrs) is a reasonable
compromise that errs on the side of user-friendliness for dynptr users.

FWIW, I also don't think it's fair or logical to argue at this point in
the game that dynptrs as a concept are inherently flawed. They were super
useful for enabling the user ringbuf map type, which is a key part of
rhone / user-space scheduling in sched_ext, and I wouldn't be surprised
if ghOSt started using it as well, as a way to make scheduling decisions
without trapping into the kernel. Also, the attendees at LSFMM
generally seemed enthusiastic about dynptrs and user ringbuf, though I
admittedly don't know who's using either feature outside of rhone.

That being said, to reiterate, I personally agree that once we take care
of a few more things for kfuncs, they're 100% the way forward over
helpers. BPF programs are kernel programs, no UAPI pain should be
necessary.
On Fri, Dec 30, 2022 at 12:38:55PM -0600, David Vernet wrote: > On Thu, Dec 29, 2022 at 06:46:41PM -0800, Alexei Starovoitov wrote: > > On Thu, Dec 29, 2022 at 03:10:22PM -0800, Andrii Nakryiko wrote: > > > On Sun, Dec 25, 2022 at 1:52 PM Alexei Starovoitov > > > <alexei.starovoitov@gmail.com> wrote: > > > > > > > > On Tue, Dec 20, 2022 at 11:31:25AM -0800, Andrii Nakryiko wrote: > > > > > On Fri, Dec 16, 2022 at 9:35 AM Alexei Starovoitov > > > > > <alexei.starovoitov@gmail.com> wrote: > > > > > > > > > > > > On Mon, Dec 12, 2022 at 12:12:09PM -0800, Andrii Nakryiko wrote: > > > > > > > > > > > > > > There is no clean way to ever move from unstable kfunc to a stable helper. > > > > > > > > > > > > No clean way? Yet in the other email you proposed a way. > > > > > > Not pretty, but workable. > > > > > > I'm sure if ever there will be a need to stabilize the kfunc we will > > > > > > find a clean way to do it. > > > > > > > > > > You can't have stable and unstable helper definition in the same .c > > > > > file, > > > > > > > > of course we can. > > > > uapi helpers vs kfuncs argument is not a black and white comparison. > > > > It's not just stable vs unstable. > > > > uapi has strict rules and helpers in uapi/bpf.h have to follow those rules. > > > > While kfuncs in terms of stability are equivalent to EXPORT_SYMBOL_GPL. > > > > Meaning they are largely unstable. > > > > The upsteam kernel keeps changing those EXPORT_SYMBOL* functions, > > > > but distros can apply their own "stability rules". > > > > See Redhat's kABI, for example. A distro can guarantee a stability > > > > of certain EXPORT_SYMBOL* for their customers, but that doesn't bind > > > > upstream development. > > This also sounds more in line with what was discussed at the maintainers > summit [0]. "A BPF program that depends on kernel symbols is not really > a user program anymore." Given that perspective, EXPORT_SYMBOL_GPL > sounds like the correct equivalency to "public BPF symbols". 
> > [0]: https://lwn.net/Articles/908464/ > > > > > > > > > With uapi bpf helpers we have to guarantee their stability, > > > > while with kfuncs we can do whatever we want. Right now all kfuncs are > > > > unstable and to prove the point we changed them couple times already (nf_conn*). > > > > We also have bpf_obj_new_impl() kfunc which is equivalent to EXPORT_SYMBOL(__kmalloc). > > > > Hard to imagine more stable and more fundamental function. > > > > Of course we want bpf programs to use bpf_obj_new() and assume > > > > that it's going to be available in all future kernel releases. > > > > But at the same time we're not bound by uapi rules. > > > > bpf_obj_new() will likely be stable, but not uapi stable. > > > > If we screw up (or find better way to allocate memory in the future) > > > > we can change it. > > > > We can invent our own deprecation rules for stable-ish kfuncs and > > > > invent our more-unstable-than-current-unstable rules for kfuncs that > > > > are too much kernel release dependent. > > > > > > I'm talking about *mechanics* of having two incompatible definitions > > > of functions with the same name, not the *concept* of stable vs > > > unstable API. See [0] where I explained this as a reply to Joanne. > > > > > > [0] https://lore.kernel.org/bpf/CAEf4BzbRQLEjAFUkzzStv0c0=O+r9iZ8hq33sJB2RtSuGrGAEA@mail.gmail.com/ > > > > Mechanics for kfuncs are much better than for helpers. > > > > extern bool bpf_dynptr_is_null(const struct bpf_dynptr *p) __ksym; > > > > will likely work with both gcc and clang. > > And if it doesn't we can fix it. > > > > While when gcc folks saw helpers: > > > > static bool (*bpf_dynptr_is_null)(const struct bpf_dynptr *p) = (void *) 777; > > > > they realized that it is a hack that abuses compiler optimizations. > > They even invented attr(kernel_helper) to workaround this issue. > > After a bunch of arguing gcc added support for this hack without attr, > > but it's going to be around forever... 
in gcc, in clang and in kernel. > > It's something that we could have fixed if it wasn't for uapi. > > Just one more example of unfixable mistake that causing issues > > to multiple projects. > > That's the core issue of kernel uapi rules: inability to fix mistakes. > > > > > > > > > > > But regardless, dynptr is modeled as black box with hidden state, and > > > > > its API surface area is bigger (offset, size, is null or not, > > > > > manipulations over those aspects; then there is skb/xdp abstraction to > > > > > be taken care of for generic read/write). It has a wider *generic* API > > > > > surface to be useful and effectively used. > > > > > > > > tbh dynptr as an abstraction of skb/xdp is not convincing. > > > > cilium created their own abstraction on top of skb and xdp and it's zero cost. > > > > While dynptr is not free, so xdp users unlikely to use dynptr(xdp) for perf reasons. > > > > So I suspect it won't be a success story in the long run, but we > > > > can certainly try it out since they will be kfuncs and can be deprecated > > > > if maintenance outweighs the number of users. > > > > > > > > > All *two* of them, bpf_get_current_task() and > > > > > bpf_get_current_task_btf(), right? They are 2 years apart. > > > > > bpf_get_current_task() was added before BTF era. It is still actively > > > > > used today and there is nothing wrong with it. It works on older > > > > > kernels just fine, even with BPF CO-RE (as backporting a few simple > > > > > patches to generate BTF is simple and easy; not so much with BPF > > > > > verifier changes to add native BTF support). I don't see much problem > > > > > having both, they are not maintenance burden. 
> > > > > > > > bpf_get_current_pid_tgid > > > > bpf_get_current_uid_gid > > > > bpf_get_current_comm > > > > bpf_get_current_task > > > > bpf_get_current_task_btf > > > > bpf_get_current_cgroup_id > > > > bpf_get_current_ancestor_cgroup_id > > > > bpf_skb_ancestor_cgroup_id > > > > bpf_sk_cgroup_id > > > > bpf_sk_ancestor_cgroup_id > > > > > > > > _are_ a maintenance burden. > > > > > > bpf_get_current_pid_tgid() was added in 2015, slightly and > > > uncritically touched by Daniel in 2016 and we never had any problems > > > with it ever since. No updates, no maintenance. I don't remember much > > > problem with other helpers in this list, but I didn't check each one. > > You could argue that this actually a point in favor of kfuncs. If we > implement these as kfuncs and never touch them again, users will not > need to change anything and will have the same exact experience as if it > was in UAPI (minus being on platforms that don't support kfuncs, which > is something we should work to fix in general). It will just work > indefinitely, as long as we decide to support it. > > The only time there will be pain felt by users is if we in fact do > actually have to change it. If we have to add a flags field, or change > the semantics to have different behavior, etc. I think Alexei's point is > that we simply _can't_ do that if we're bound by UAPI. At least with > kfuncs we have the choice to change it if we deem it necessary. > > Taking bpf_get_current_task() as an example, I think it's better to have > the debate be "should we keep supporting this / are users still using > it?" rather than, "it's UAPI, there's nothing to even discuss". The > point being that even if bpf_get_current_task() is still used, there may > (and inevitably will) be other UAPI helpers that are useless and that we > just can't remove. > > > > > > > But we certainly have a different understanding of what "maintenance > > > burden" is. 
If some code doesn't require constant change and doesn't > > > prevent changes in some other parts of the system, it's not a > > > maintenance burden. > > > > As I said it's not about working today. If one doesn't touch code > > it will keep working. > > It's about being able to change it. > > The uapi bits we simply cannot change. > > I think Michael Kerrisk's classic "Once upon an API" talk [1] provides a > compelling, real-world example of this point: > > [1]: https://kernel-recipes.org/en/2022/once-upon-an-api/ > > APIs can seem innocuous when you first add them, and then as you use > them more and in different ways, your platform grows more featureful and > things change, etc, you realize that the axioms upon which you designed > your APIs in the first place are no longer true. prctl() started out as > a dead-simple syscall where a child process would get a signal if its > parent process dies. Over the years, it's morphed into a monstrosity [2] > of a syscall with tons of odd behavior that's impossible [3] to fix even > a decade+ after the API was first introduced due to the possibility of > breaking applications that have come to rely on that non-sensical > behavior. Never breaking user space is a great philosophy, but I don't > think we need to inflict that same pain on ourselves for _kernel_ > programs, which is what we're discussing here. > > [2]: https://man7.org/linux/man-pages/man2/prctl.2.html > [3]: https://bugzilla.kernel.org/show_bug.cgi?id=43300#c22 > > I'm not trying to paint a false equivalency between prctl() and the > helpers you enumerated in [4], because I agree with you that it's very > unlikely that they'll change, but I also think it's impossible to know > that for sure, and I do agree with Alexei that the "hypothetical chance > to change something in the future" is hugely valuable. That being said, > I comment more on the dynptr helpers down below. 
> > [4]: https://lore.kernel.org/all/CAEf4BzZM0+j6DXMgu2o2UvjtzoOxcjsJtT8j-jqVZYvAqxc52g@mail.gmail.com/ > > > > > > > > The verifier got smarter and we could have removed all of them, > > > > but uapi rules makes it impossible. > > > > The bpf prog could have been enabled to access all these task_struct > > > > and cgroup fields directly. Likely without any kfuncs. > > > > > > > > bpf_send_signal vs bpf_send_signal_thread > > > > bpf_jiffies64 vs bpf_this_cpu_ptr > > > > etc > > > > there are plenty examples where uapi bpf helpers became a burden. > > > > They are working and will keep working, but we could have done > > > > much better job if not for uapi. > > > > These are the examples where uapi rules are too strong for bpf development. > > > > Our pace of adding new features is high. > > > > The kernel uapi rules are too strict for us. > > > > > > I'm familiar with the burden of maintaining API stability and > > > backwards compat. But it's not just about the library/system > > > > libbpf 1.0 wasn't the smoothest example of deprecation. > > But we still did it despite all kinds of negative flame. > > With uapi helpers we cannot do any of that. No deprecation schemes. > > While kfuncs allow innovation. > > > > > developer's convenience and burden, it's also about the end user's > > > experience and convenience. BPF tool developers really appreciate when > > > there are few less quirks to remember and work around across kernel > > > versions, configurations, architectures, etc. It's the pain that > > > kernel engineers working on BPF bleeding-edge don't experience in the > > > BPF selftests environment. > > > > There is a trade off between users and developers. We want to make user > > experience as smooth as possible while preserve the speed of development > > for the kernel. uapi is in the way of that. > > As illustrated in the prctl() example above, UAPI can get in the way of > users as well. 
If we can't fix an API or its semantics, some users are > stuck with that crappy behavior (while, admittedly, others get to enjoy > the consistency of the weird / existing behavior not changing out from > under them). I certainly see why there are strong reasons to have a > stable UAPI for user space, but for kernel programs I don't think so. > > > > > > > > > At one point DaveM declared freeze on sizeof(struct sk_buff). > > > > It was a difficult, but correct decision. > > > > We have to declare freeze on bpf helpers. > > > > 211 helpers that have to be maintained forever is a huge burden. > > While I agree that we should freeze helpers at some point, I also think > we need to take care of a few things before that can or should formally > go into effect. You mentioned some things we should take care of in [5]. > Automatically emitting kfuncs into vmlinux.h, properly documenting > kfuncs. I think that list is insufficient, and that we need: > > [5]: https://lore.kernel.org/all/20221216173526.y3e5go6mgmjrv46l@MacBook-Pro-6.local/ All of the below are 'nice to have' to improve kfunc user experience, but certainly not 'must have'. > 1. A formal, build-enforced policy for documenting kfuncs, as we > currently have for helpers (as you mentioned, minus the > build-enforcement). That would be necessary only for stable-ish kfuncs. Like recently added bpf_obj_new. Unstable kfuncs would have to be documented differently and maybe not even documented. It will take time to figure it all out. > 2. Emitting kfuncs into vmlinux.h, as you mentioned. Key kfuncs are already in bpf_experimental.h Unstable kfuncs might go into vmlinux.h. Maybe all. Many ways to go about it. > 3. Allowing users to specify flags per-argument in kfuncs. In my opinion > this is a big deficiency of kfuncs relative to helpers. This would mean > e.g. getting rid of the __sz and __k hacks. 
I think it's fine for us to
> live with it for now while we're continuing to flesh-out and improve
> kfuncs (a process which is happening quickly), but IMO it's really not
> appropriate for it to be the official only way to add helpers. It's a
> beta feature :-)

This is a huge discussion on pros and cons and correct approach.
That might take years. We already had ~3 refactorings of how kfuncs are
represented in the kernel in the last ~2 years.
Is the 4th refactoring going to be final? Likely no.
It's a bit of wishful thinking that addressing today's problem will
somehow make everything nice and clean and then we will be ready to stop
adding helpers. We'll keep improving the infra for years to come.
There is no "end of the road" sign.

> 4. Getting rid of KF_TRUSTED_ARGS and making that the default.

We've been talking about this possibility for months.
Are you suggesting to keep adding helpers for another year or so?
We already have 91 kfuncs and 211 helpers.
If we were not asking all developers to use kfuncs we would have had
300+ helpers.

> 5. Ideally we could improve the story for _defining_ kfuncs as well,
> though IMO it's already far less painful than defining helpers. It would
> be nice if you could just tag a kfunc with something like a __bpf_kfunc
> macro and it would do the following:
>
> - Automatically disable the -Wmissing-prototypes warning. I doubt this
> is possible without adding some compiler features that let you do
> something like __attribute__(__nowarn__("Wmissing-prototypes")), so
> maybe this isn't a hard blocker, but more of a medium / long-term
> goal.
> - Add whatever other attributes we need for the kfuncs to be safe. For
> example, 'noinline' and '__used'. Even if the symbols are global,
> we'll probably need '__used' for LTO.

Would be nice, but that didn't stop the existing 91 kfuncs from appearing
and already being used in production.
Yes. kfuncs are already used in production.

> Overall, my point is really that we still have some homework to do > before we can just unilaterally freeze helpers. We're getting close, but > IMO not quite there yet. 91 vs 211 tells a different story. > > > > > > I still didn't get why we have to freeze anything and how exactly > > > helpers are a burden. > > > > > > But especially in this specific case of few simple dynptr helpers, > > > especially that other dynptrs generic APIs are already BPF helpers. I > > > just don't get it and honestly all I see from this discussion is that > > > you've made up your mind and there is nothing that can be done to > > > convince you. > > > > > > The only "BPF helpers are stable and thus a burden" argument is just > > > not convincing and I'd even say is mostly false. There are no upsides > > > to having dynptr helpers as kfuncs, as far as I'm concerned. > > > > The main and only upside for everything as kfunc is that we can change it. > > That's it. > > > > > But there > > > are a bunch of downsides, even if some of those might be lifted in the > > > future. > > > > imo ability to change outweighs all downsides, since downsides are fixable > > while inability to change is a burden. > > > > > The unfortunate thing is that end users that are meant to benefit from > > > all these helpers and them being "a standard API offering" are not > > > well represented on the BPF mailing list, unfortunately. And my > > > opinion and arguments as a proxy for theirs is clearly not enough. > > > > I also would like to hear what others on the list are thinking. > > The last thing I'll say is that everything I've said above is really in > regards to the more general debate of helpers vs. kfuncs. Specifically > for the dynptrs being added in this set, I agree with Andrii that it's > arguably an odd user experience for certain platforms to support > different only specific parts of the dynptr API surface. 
>
> I'm not sure whether that's enough to warrant making them helpers
> instead of kfuncs, but I do think it's not exactly an apples to apples
> comparison with future features that today have no helper API presence.
> Putting myself in the shoes of a dynptr user, I would be very surprised
> and confused if all of a sudden, I couldn't use some of the core dynptr
> APIs due to being on a platform that doesn't have kfunc support. My two
> cents are that letting these dynptr functions stay as helpers, while
> agreeing that kfuncs is the way forward (though I don't think Andrii
> agrees with that even aside from just these dynptrs) is a reasonable
> compromise that errs on the side of user-friendliness for dynptr users.

We already have this 'discrepancy' of both kfuncs and helpers for kptrs
(bpf_obj_new vs bpf_kptr_xchg) and so far no complaints.
Why is dynptr special?

> FWIW, I also don't think it's fair or logical to argue at this point in
> the game that dynptrs as a concept is inherently flawed. They were super
> useful for enabling the user ringbuf map type, which is a key part of
> rhone / user-space scheduling in sched_ext, and I wouldn't be surprised
> if ghOSt started using it as well as a way to make scheduling decisions
> without trapping into the kernel as well. Also, the attendees at LSFMM
> generally seemed enthusiastic about dynptrs and user ringbuf, though I
> admittedly don't know who's using either feature outside of rhone.

rhone doesn't have stability guarantees just like sched-ext doesn't have
them. To drive that point rhone and sched-ext should really be using
kfuncs. Otherwise somebody might point the finger at helpers and argue
that this somehow makes sched-ext stable.

> That being said, to reiterate, I personally agree that once we take care
> of a few more things for kfuncs, they're 100% the way forward over
> helpers. BPF programs are kernel programs, no UAPI pain should be
> necessary.

Similar arguments were made during sk_buff freeze... let's add a few more
fields that are going to be sooo useful and then we'll freeze sk_buff...
dynptr is trying to be that special snowflake.
bpf_rcu_read_lock was added as a kfunc. It's more fundamental than dynptr.
bpf_obj_new is a kfunc too. Also more fundamental than dynptr.
What is so special about dynptr that we need to make an exception for it?
On Fri, Dec 30, 2022 at 11:31:12AM -0800, Alexei Starovoitov wrote: > On Fri, Dec 30, 2022 at 12:38:55PM -0600, David Vernet wrote: > > On Thu, Dec 29, 2022 at 06:46:41PM -0800, Alexei Starovoitov wrote: > > > On Thu, Dec 29, 2022 at 03:10:22PM -0800, Andrii Nakryiko wrote: > > > > On Sun, Dec 25, 2022 at 1:52 PM Alexei Starovoitov > > > > <alexei.starovoitov@gmail.com> wrote: > > > > > > > > > > On Tue, Dec 20, 2022 at 11:31:25AM -0800, Andrii Nakryiko wrote: > > > > > > On Fri, Dec 16, 2022 at 9:35 AM Alexei Starovoitov > > > > > > <alexei.starovoitov@gmail.com> wrote: > > > > > > > > > > > > > > On Mon, Dec 12, 2022 at 12:12:09PM -0800, Andrii Nakryiko wrote: > > > > > > > > > > > > > > > > There is no clean way to ever move from unstable kfunc to a stable helper. > > > > > > > > > > > > > > No clean way? Yet in the other email you proposed a way. > > > > > > > Not pretty, but workable. > > > > > > > I'm sure if ever there will be a need to stabilize the kfunc we will > > > > > > > find a clean way to do it. > > > > > > > > > > > > You can't have stable and unstable helper definition in the same .c > > > > > > file, > > > > > > > > > > of course we can. > > > > > uapi helpers vs kfuncs argument is not a black and white comparison. > > > > > It's not just stable vs unstable. > > > > > uapi has strict rules and helpers in uapi/bpf.h have to follow those rules. > > > > > While kfuncs in terms of stability are equivalent to EXPORT_SYMBOL_GPL. > > > > > Meaning they are largely unstable. > > > > > The upsteam kernel keeps changing those EXPORT_SYMBOL* functions, > > > > > but distros can apply their own "stability rules". > > > > > See Redhat's kABI, for example. A distro can guarantee a stability > > > > > of certain EXPORT_SYMBOL* for their customers, but that doesn't bind > > > > > upstream development. > > > > This also sounds more in line with what was discussed at the maintainers > > summit [0]. 
"A BPF program that depends on kernel symbols is not really > > a user program anymore." Given that perspective, EXPORT_SYMBOL_GPL > > sounds like the correct equivalency to "public BPF symbols". > > > > [0]: https://lwn.net/Articles/908464/ > > > > > > > > > > > > With uapi bpf helpers we have to guarantee their stability, > > > > > while with kfuncs we can do whatever we want. Right now all kfuncs are > > > > > unstable and to prove the point we changed them couple times already (nf_conn*). > > > > > We also have bpf_obj_new_impl() kfunc which is equivalent to EXPORT_SYMBOL(__kmalloc). > > > > > Hard to imagine more stable and more fundamental function. > > > > > Of course we want bpf programs to use bpf_obj_new() and assume > > > > > that it's going to be available in all future kernel releases. > > > > > But at the same time we're not bound by uapi rules. > > > > > bpf_obj_new() will likely be stable, but not uapi stable. > > > > > If we screw up (or find better way to allocate memory in the future) > > > > > we can change it. > > > > > We can invent our own deprecation rules for stable-ish kfuncs and > > > > > invent our more-unstable-than-current-unstable rules for kfuncs that > > > > > are too much kernel release dependent. > > > > > > > > I'm talking about *mechanics* of having two incompatible definitions > > > > of functions with the same name, not the *concept* of stable vs > > > > unstable API. See [0] where I explained this as a reply to Joanne. > > > > > > > > [0] https://lore.kernel.org/bpf/CAEf4BzbRQLEjAFUkzzStv0c0=O+r9iZ8hq33sJB2RtSuGrGAEA@mail.gmail.com/ > > > > > > Mechanics for kfuncs are much better than for helpers. > > > > > > extern bool bpf_dynptr_is_null(const struct bpf_dynptr *p) __ksym; > > > > > > will likely work with both gcc and clang. > > > And if it doesn't we can fix it. 
> > > > > > While when gcc folks saw helpers: > > > > > > static bool (*bpf_dynptr_is_null)(const struct bpf_dynptr *p) = (void *) 777; > > > > > > they realized that it is a hack that abuses compiler optimizations. > > > They even invented attr(kernel_helper) to workaround this issue. > > > After a bunch of arguing gcc added support for this hack without attr, > > > but it's going to be around forever... in gcc, in clang and in kernel. > > > It's something that we could have fixed if it wasn't for uapi. > > > Just one more example of unfixable mistake that causing issues > > > to multiple projects. > > > That's the core issue of kernel uapi rules: inability to fix mistakes. > > > > > > > > > > > > > > But regardless, dynptr is modeled as black box with hidden state, and > > > > > > its API surface area is bigger (offset, size, is null or not, > > > > > > manipulations over those aspects; then there is skb/xdp abstraction to > > > > > > be taken care of for generic read/write). It has a wider *generic* API > > > > > > surface to be useful and effectively used. > > > > > > > > > > tbh dynptr as an abstraction of skb/xdp is not convincing. > > > > > cilium created their own abstraction on top of skb and xdp and it's zero cost. > > > > > While dynptr is not free, so xdp users unlikely to use dynptr(xdp) for perf reasons. > > > > > So I suspect it won't be a success story in the long run, but we > > > > > can certainly try it out since they will be kfuncs and can be deprecated > > > > > if maintenance outweighs the number of users. > > > > > > > > > > > All *two* of them, bpf_get_current_task() and > > > > > > bpf_get_current_task_btf(), right? They are 2 years apart. > > > > > > bpf_get_current_task() was added before BTF era. It is still actively > > > > > > used today and there is nothing wrong with it. 
It works on older > > > > > > kernels just fine, even with BPF CO-RE (as backporting a few simple > > > > > > patches to generate BTF is simple and easy; not so much with BPF > > > > > > verifier changes to add native BTF support). I don't see much problem > > > > > > having both, they are not maintenance burden. > > > > > > > > > > bpf_get_current_pid_tgid > > > > > bpf_get_current_uid_gid > > > > > bpf_get_current_comm > > > > > bpf_get_current_task > > > > > bpf_get_current_task_btf > > > > > bpf_get_current_cgroup_id > > > > > bpf_get_current_ancestor_cgroup_id > > > > > bpf_skb_ancestor_cgroup_id > > > > > bpf_sk_cgroup_id > > > > > bpf_sk_ancestor_cgroup_id > > > > > > > > > > _are_ a maintenance burden. > > > > > > > > bpf_get_current_pid_tgid() was added in 2015, slightly and > > > > uncritically touched by Daniel in 2016 and we never had any problems > > > > with it ever since. No updates, no maintenance. I don't remember much > > > > problem with other helpers in this list, but I didn't check each one. > > > > You could argue that this actually a point in favor of kfuncs. If we > > implement these as kfuncs and never touch them again, users will not > > need to change anything and will have the same exact experience as if it > > was in UAPI (minus being on platforms that don't support kfuncs, which > > is something we should work to fix in general). It will just work > > indefinitely, as long as we decide to support it. > > > > The only time there will be pain felt by users is if we in fact do > > actually have to change it. If we have to add a flags field, or change > > the semantics to have different behavior, etc. I think Alexei's point is > > that we simply _can't_ do that if we're bound by UAPI. At least with > > kfuncs we have the choice to change it if we deem it necessary. > > > > Taking bpf_get_current_task() as an example, I think it's better to have > > the debate be "should we keep supporting this / are users still using > > it?" 
rather than, "it's UAPI, there's nothing to even discuss". The > > point being that even if bpf_get_current_task() is still used, there may > > (and inevitably will) be other UAPI helpers that are useless and that we > > just can't remove. > > > > > > > > > > But we certainly have a different understanding of what "maintenance > > > > burden" is. If some code doesn't require constant change and doesn't > > > > prevent changes in some other parts of the system, it's not a > > > > maintenance burden. > > > > > > As I said it's not about working today. If one doesn't touch code > > > it will keep working. > > > It's about being able to change it. > > > The uapi bits we simply cannot change. > > > > I think Michael Kerrisk's classic "Once upon an API" talk [1] provides a > > compelling, real-world example of this point: > > > > [1]: https://kernel-recipes.org/en/2022/once-upon-an-api/ > > > > APIs can seem innocuous when you first add them, and then as you use > > them more and in different ways, your platform grows more featureful and > > things change, etc, you realize that the axioms upon which you designed > > your APIs in the first place are no longer true. prctl() started out as > > a dead-simple syscall where a child process would get a signal if its > > parent process dies. Over the years, it's morphed into a monstrosity [2] > > of a syscall with tons of odd behavior that's impossible [3] to fix even > > a decade+ after the API was first introduced due to the possibility of > > breaking applications that have come to rely on that non-sensical > > behavior. Never breaking user space is a great philosophy, but I don't > > think we need to inflict that same pain on ourselves for _kernel_ > > programs, which is what we're discussing here. 
> > > > [2]: https://man7.org/linux/man-pages/man2/prctl.2.html > > [3]: https://bugzilla.kernel.org/show_bug.cgi?id=43300#c22 > > > > I'm not trying to paint a false equivalency between prctl() and the > > helpers you enumerated in [4], because I agree with you that it's very > > unlikely that they'll change, but I also think it's impossible to know > > that for sure, and I do agree with Alexei that the "hypothetical chance > > to change something in the future" is hugely valuable. That being said, > > I comment more on the dynptr helpers down below. > > > > [4]: https://lore.kernel.org/all/CAEf4BzZM0+j6DXMgu2o2UvjtzoOxcjsJtT8j-jqVZYvAqxc52g@mail.gmail.com/ > > > > > > > > > > > The verifier got smarter and we could have removed all of them, > > > > > but uapi rules makes it impossible. > > > > > The bpf prog could have been enabled to access all these task_struct > > > > > and cgroup fields directly. Likely without any kfuncs. > > > > > > > > > > bpf_send_signal vs bpf_send_signal_thread > > > > > bpf_jiffies64 vs bpf_this_cpu_ptr > > > > > etc > > > > > there are plenty examples where uapi bpf helpers became a burden. > > > > > They are working and will keep working, but we could have done > > > > > much better job if not for uapi. > > > > > These are the examples where uapi rules are too strong for bpf development. > > > > > Our pace of adding new features is high. > > > > > The kernel uapi rules are too strict for us. > > > > > > > > I'm familiar with the burden of maintaining API stability and > > > > backwards compat. But it's not just about the library/system > > > > > > libbpf 1.0 wasn't the smoothest example of deprecation. > > > But we still did it despite all kinds of negative flame. > > > With uapi helpers we cannot do any of that. No deprecation schemes. > > > While kfuncs allow innovation. > > > > > > > developer's convenience and burden, it's also about the end user's > > > > experience and convenience. 
BPF tool developers really appreciate when > > > > there are few less quirks to remember and work around across kernel > > > > versions, configurations, architectures, etc. It's the pain that > > > > kernel engineers working on BPF bleeding-edge don't experience in the > > > > BPF selftests environment. > > > > > > There is a trade off between users and developers. We want to make user > > > experience as smooth as possible while preserve the speed of development > > > for the kernel. uapi is in the way of that. > > > > As illustrated in the prctl() example above, UAPI can get in the way of > > users as well. If we can't fix an API or its semantics, some users are > > stuck with that crappy behavior (while, admittedly, others get to enjoy > > the consistency of the weird / existing behavior not changing out from > > under them). I certainly see why there are strong reasons to have a > > stable UAPI for user space, but for kernel programs I don't think so. > > > > > > > > > > > > At one point DaveM declared freeze on sizeof(struct sk_buff). > > > > > It was a difficult, but correct decision. > > > > > We have to declare freeze on bpf helpers. > > > > > 211 helpers that have to be maintained forever is a huge burden. > > > > While I agree that we should freeze helpers at some point, I also think > > we need to take care of a few things before that can or should formally > > go into effect. You mentioned some things we should take care of in [5]. > > Automatically emitting kfuncs into vmlinux.h, properly documenting > > kfuncs. I think that list is insufficient, and that we need: > > > > [5]: https://lore.kernel.org/all/20221216173526.y3e5go6mgmjrv46l@MacBook-Pro-6.local/ > > All of the below are 'nice to have' to improve kfunc user experience, > but certainly not 'must have'. I certainly agree that what is 'must have' is subjective. > > > 1. 
A formal, build-enforced policy for documenting kfuncs, as we > > currently have for helpers (as you mentioned, minus the > > build-enforcement). > > That would be necessary only for stable-ish kfuncs. > Like recently added bpf_obj_new. > Unstable kfuncs would have to be documented differently and maybe not even documented. > It will take time to figure it all out. Why would we only want to make it necessary for stable-ish kfuncs? It's simpler and less open to interpretation to just have a blanket "you must document your kfuncs" policy. It seems pretty reasonable to expect people who are exporting public symbols that can be linked against by BPF programs to document those functions given that it takes no more than ~5 minutes? I also don't want to hijack the larger conversation here to discuss documentation. I think we all agree that documentation is important. We already have a pretty good kfuncs docs page [0] anyways. In my subjective opinion, _the_ platform for documenting public / exported BPF symbols should have a well-defined documentation story, but yes, arguing for it to be a blocker is maybe a stretch. [0]: https://docs.kernel.org/bpf/kfuncs.html > > 2. Emitting kfuncs into vmlinux.h, as you mentioned. > > Key kfuncs are already in bpf_experimental.h > Unstable kfuncs might go into vmlinux.h. > Maybe all. > Many ways to go about it. Agreed that there are many possibilities. In my once again subjective opinion it would be good to get this ironed out, but yes, arguably not a blocker. > > > 3. Allowing users to specify flags per-argument in kfuncs. In my opinion > > this is a big deficiency of kfuncs relative to helpers. This would mean > > e.g. getting rid of the __sz and __k hacks. I think it's fine for us to > > live with it for now while we're continuing to flesh-out and improve > > kfuncs (a process which is happening quickly), but IMO it's really not > > appropriate for it to be the official only way to add helpers. 
It's a > > beta feature :-) > > This is a huge discussion on pros and cons and correct approach. > That might take years. We already had ~3 refactoring of how kfuncs > are represented in the kernel in the last ~2 years. > Is 4th refactoring going to be final? Likely no. I don't think the fact that we'll never be done is a valid counterpoint to "are we ready now"? The first iteration of kfuncs was definitely not in a good enough state to freeze all helpers. The usability of kfuncs has improved drastically since then. The question isn't "when will be at a complete stopping point?", it's, "are we sufficiently ready now?". > It's a bit of wishful thinking that addressing today's problem will somehow > make everything nice and clean and then we will be ready to stop adding helpers. > We'll keep improving the infra for years to come. > There is no "end of the road" sign. Yes, there's no end of the road, but my point is that there are still pieces that we know we need to change, and which we know are temporary (__sz and __k being the main examples). *That being said*: I completely admit that this is all subjective. From a technical standpoint, there is nothing stopping us from freezing helpers. And honestly, I don't disagree with you that getting out of UAPI immediately and forever is a huge positive; possibly even to the point that it warrants us just doing it now. More below. > > > 4. Getting rid of KF_TRUSTED_ARGS and making that the default. > > We've been talking about this possibility for months. > Are you suggesting to keep adding helpers for another year or so? I think that kfuncs should be the norm for the vast majority of things being added, and hopefully for everything (I'm going to walk back my suggestion of adding these new dynptr functions as helpers). 
Honestly, my point was really just that I think the API for defining kfuncs needs to be improved before we can totally and completely freeze helpers due to the fact that we have __sz and __k, and don't have a consistent documentation story. That being said, __sz and __k are there, they work, and as you and I have both said at this point, whether or not they're "blockers" is subjective. So my answer to your question of "should we add helpers for another year or so" in my last reply would have been "absolutely not, unless we truly have no choice because of the lack of per-arg flags". After reading your reply, if you're worried that that policy won't be strictly enforced (meaning that we'll end up having to add helpers that easily could have just been kfuncs) then I agree that we should just do the hard freeze now. We've de-facto been doing that anyways for the last year. That being said, I really would hope that we could at least get some of the documentation story figured out. Even if it's just something as simple as spelling out a formal policy on our kfuncs docs page stipulating that you have to add a doxygen header and link it from a docs page, it would be nice to have some policy that puts kfuncs on a road to being as well documented as helpers. > We already have 91 kfuncs and 211 helpers. > If we were not asking all developers to use kfuncs we would have had 300+ helpers. Agreed that this would have been a _very_ unfortunate outcome. > > > 5. Ideally we could improve the story for _defining_ kfuncs as well, > > though IMO it's already far less painful than defining helpers. It would > > be nice if you could just tag a kfunc with something like a __bpf_kfunc > > macro and it would do the following: > > > > - Automatically disable the -Wmissing-prototypes warning. 
I doubt this > > is possible without adding some compiler features that let you do > > something like __attribute__(__nowarn__("Wmissing-prototypes")), so > > maybe this isn't a hard blocker, but more of a medium / long-term > > goal. > > - Add whatever other attributes we need for the kfuncs to be safe. For > > example, 'noinline' and '__used'. Even if the symbols are global, > > we'll probably need '__used' for LTO. > > would be nice, but that didn't stop existing 91 kfuncs to appear > and already used in production. > Yes. kfuncs are already used in production. This is something that would literally only take like 1-2 patches anyways. I'm happy to do it so we don't have to waste cycles thinking about it as a blocker for anything. > > > Overall, my point is really that we still have some homework to do > > before we can just unilaterally freeze helpers. We're getting close, but > > IMO not quite there yet. > > 91 vs 211 tells a different story. Yeah, the fact that we have 91 kfuncs is strong evidence that kfuncs are already in a good-enough place to just freeze helpers. Another counterpoint to my initial claim that not having per-arg flags could be problematic is that there are certain things that are global in kfuncs that are also global in helpers despite having per-arg modifiers. For example, the fact that you can only have one OBJ_RELEASE argument. And yet another is the fact that none of the helpers we've added in the last year relied on having per-arg modifiers, so in practice it hasn't been a problem. I think it's fair to say that if you just look at the data instead of from an "API cleanlines" perspective, having per-arg modifiers is not a blocker. Data wins over subjectivity, so as mentioned above, I'm willing to change my mind about per-arg modifiers being a blocker, especially with __sz and __k. > > > > > > > > I still didn't get why we have to freeze anything and how exactly > > > > helpers are a burden. 
> > > > > > > > But especially in this specific case of few simple dynptr helpers, > > > > especially that other dynptrs generic APIs are already BPF helpers. I > > > > just don't get it and honestly all I see from this discussion is that > > > > you've made up your mind and there is nothing that can be done to > > > > convince you. > > > > > > > > The only "BPF helpers are stable and thus a burden" argument is just > > > > not convincing and I'd even say is mostly false. There are no upsides > > > > to having dynptr helpers as kfuncs, as far as I'm concerned. > > > > > > The main and only upside for everything as kfunc is that we can change it. > > > That's it. > > > > > > > But there > > > > are a bunch of downsides, even if some of those might be lifted in the > > > > future. > > > > > > imo ability to change outweighs all downsides, since downsides are fixable > > > while inability to change is a burden. > > > > > > > The unfortunate thing is that end users that are meant to benefit from > > > > all these helpers and them being "a standard API offering" are not > > > > well represented on the BPF mailing list, unfortunately. And my > > > > opinion and arguments as a proxy for theirs is clearly not enough. > > > > > > I also would like to hear what others on the list are thinking. > > > > The last thing I'll say is that everything I've said above is really in > > regards to the more general debate of helpers vs. kfuncs. Specifically > > for the dynptrs being added in this set, I agree with Andrii that it's > > arguably an odd user experience for certain platforms to support > > different only specific parts of the dynptr API surface. > > > > I'm not sure whether that's enough to warrant making them helpers > > instead of kfuncs, but I do think it's not exactly an apples to apples > > comparison with future features that today have no helper API presence. 
> > Putting myself in the shoes of a dynptr user, I would be very surprised > > and confused if all of a sudden, I couldn't use some of the core dynptr > > APIs due to being on a platform that doesn't have kfunc support. My two > > cents are that letting these dynptr functions stay as helpers, while > > agreeing that kfuncs is the way forward (though I don't think Andrii > > agrees with that even aside from just these dynptrs) is a reasonable > > compromise that errs on the side of user-friendliness for dynptr users. > > We already have this 'discrepancy' of both kfuncs and helpers for kptrs > (bpf_obj_new vs bpf_kptr_xhcg) and so far no complains. > Why dynptr is special? Well, lack of usability in one case doesn't necessarily mean we should allow it in another. That said, the "usability" gains from having a helper really are minimal to the point of practically being negligible anyways. Part of me was trying to find a compromise here to move forward, but honestly, I do agree with you that we should aggressively make everything a kfunc unless we have a good reason not to, dynptr functions included. So I'm willing to walk this suggestion back as well -- let's just make these kfuncs. > > FWIW, I also don't think it's fair or logical to argue at this point in > > the game that dynptrs as a concept is inherently flawed. They were super > > useful for enabling the user ringbuf map type, which is a key part of > > rhone / user-space scheduling in sched_ext, and I wouldn't be surprised > > if ghOSt started using it as well as a way to make scheduling decisions > > without trapping into the kernel as well. Also, the attendees at LSFMM > > generally seemed enthusiastic about dynptrs and user ringbuf, though I > > admittedly don't know who's using either feature outside of rhone. > > rhone doesn't have stability guarantees just like sched-ext doesn't have them. > To drive that point rhone and sched-ext should really be using kfuncs. 
> Otherwise somebody might point the finger at helpers and argue that
> this is somehow makes sched-ext stable.

Also a reasonable point. My point above was really just a response to
your claim in [0] that dynptrs are flawed. It wasn't related to kfuncs
vs. helpers.

[0]: https://lore.kernel.org/all/20221216173526.y3e5go6mgmjrv46l@MacBook-Pro-6.local/

> > That being said, to reiterate, I personally agree that once we take care
> > of a few more things for kfuncs , they're 100% the way forward over
> > helpers. BPF programs are kernel programs, no UAPI pain should be
> > necessary.
>
> Similar arguments were made during sk_buff freeze... let's add few more fields
> that are going to be sooo useful and then we'll freeze sk_buff...
> dynptr is trying to be that special snow flake.

The main points of my initial response were not about dynptrs, they were
about how we define kfuncs. I agree there is nothing at all special
about dynptrs beyond the fact that they as a feature already have
helpers. Sure, let's add them as kfuncs. No reason to be beholden to the
UAPI restrictions.

> bpf_rcu_read_lock was added as a kfunc. It's more fundamental than dynptr.
> bpf_obj_new is a kfunc too. Also more fundamental than dynptr.
> What is so special about dynptr that we need to make an exception for it?

See above.
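To make item 5 and the __sz naming convention from this reply a bit more concrete, here is a minimal userspace sketch. It assumes GCC or Clang attribute syntax. The macro name __bpf_kfunc_sketch and the function toy_mem_sum are hypothetical and exist only for illustration; this is not the kernel's definition of any macro, and nothing here is verified by the BPF verifier.

```c
#include <stddef.h>

/*
 * Hypothetical tag in the spirit of the __bpf_kfunc macro proposed above:
 * 'used' keeps the symbol even if nothing in this file references it
 * (relevant for LTO), and 'noinline' keeps it a real, callable function.
 */
#define __bpf_kfunc_sketch __attribute__((used, noinline))

/*
 * An explicit prototype right before the definition is one way to keep
 * -Wmissing-prototypes quiet without any new compiler feature.
 */
unsigned int toy_mem_sum(const unsigned char *mem, size_t mem__sz);

/*
 * The "__sz" suffix on the second parameter mimics the kfunc naming
 * convention discussed in the thread: the name itself marks the argument
 * as the size of the memory region passed just before it. In this toy it
 * is only a naming choice; no verifier inspects it.
 */
__bpf_kfunc_sketch unsigned int toy_mem_sum(const unsigned char *mem,
					    size_t mem__sz)
{
	unsigned int sum = 0;

	for (size_t i = 0; i < mem__sz; i++)
		sum += mem[i];
	return sum;
}
```

The point of the sketch is only that a single tag can bundle the attributes a kfunc definition needs, while the per-argument information still lives in the parameter name rather than in flags.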
On Fri, Dec 30, 2022 at 03:00:21PM -0600, David Vernet wrote: > > > > > > Taking bpf_get_current_task() as an example, I think it's better to have > > > the debate be "should we keep supporting this / are users still using > > > it?" rather than, "it's UAPI, there's nothing to even discuss". The > > > point being that even if bpf_get_current_task() is still used, there may > > > (and inevitably will) be other UAPI helpers that are useless and that we > > > just can't remove. Sorry, missed this question in the previous reply. The answer is "it's UAPI, there's nothing to even discuss". It doesn't matter whether bpf_get_current_task() is used heavily or not used at all. The chance of breaking user space is what paralyzes the changes. Any change to uapi header file is looked at with a magnifying glass. There is no deprecation story for uapi. The definition and semantics of bpf helpers are frozen _forever_. And our uapi/bpf.h is not in a good company: ls -Sla include/uapi/linux/|head -rw-r--r-- 1 ast users 331159 Nov 3 08:32 nl80211.h -rw-r--r-- 1 ast users 265312 Dec 25 13:51 bpf.h -rw-r--r-- 1 ast users 118621 Dec 25 13:51 v4l2-controls.h -rw-r--r-- 1 ast users 99533 Dec 25 13:51 videodev2.h -rw-r--r-- 1 ast users 86460 Nov 29 11:15 ethtool.h "Freeze bpf helpers now" is a minimum we should do right now. We need to take aggressive steps to freeze the growth of the whole uapi/bpf.h Support for kfuncs was added in March 2021 in commit e6ac2450d6de ("bpf: Support bpf program calling kernel function") In almost 2 years we've learned a lot on how to verify them, how to use and extend them. The way they're defined in the kernel was refactored ~3 times. Right now do: git grep 'BTF_ID_FLAGS(func' to find all kfuncs. Including Documentation/bpf/kfuncs.rst that you've made great contribution to :) When I mentioned 91 kfunc in my previous reply I forgot to count another dozen kfuncs in sched-ext and another dozen in hid-bpf that are not in mainline yet. 
fuse-bpf will likely add their own kfuncs and so on. Your 'todo list' for kfuncs is absolutely correct. Are kfuncs a perfect substitute for helpers? No. They have downsides and we need to work on addressing downsides instead of growing bpf.h further. Are we ready to freeze bpf helpers? Absolutely yes. "please use kfuncs instead of helpers" was our recommendation for 9 month or so and now we need to make it an official rule. For bpf noobs it's certainly easier to add new prog type, new map type, new helper, but that gotta stop. Last prog type we added in May 2021 and we should try hard not to add one anymore. hid-bpf managed to do everything without new prog type. sched-ext is not adding new prog type either. This is great. We're breaking free from uapi constraints. With map types we are not doing so well: 9330986c03006 (Joanne Koong 2021-10-27 16:45:00 -0700 943) BPF_MAP_TYPE_BLOOM_FILTER, 583c1f420173f (David Vernet 2022-09-19 19:00:57 -0500 944) BPF_MAP_TYPE_USER_RINGBUF, c4bcfb38a95ed (Yonghong Song 2022-10-25 21:28:50 -0700 945) BPF_MAP_TYPE_CGRP_STORAGE, 99c55f7d47c0d (Alexei Starovoitov 2014-09-26 00:16:57 -0700 946) }; I wish these last three were not added as stable uapi. Right now we're getting close on defining new map types in unstable way. The bpf link lists and bpf rbtree are added through kfuncs (aka new generation data structures, aka graph apis). They don't have uapi values in 'enum bpf_map_type' and that's the most important part about them. Are we ready to freeze map prog types already? Probably not. Upcoming qp-trie comes to mind that looks very hard to do without new map type. I hope it will be the last stable map type. > > > I think Michael Kerrisk's classic "Once upon an API" talk [1] provides a > > > compelling, real-world example of this point: > > > > > > [1]: https://kernel-recipes.org/en/2022/once-upon-an-api/ This is great analogy. 
We need to learn from the "uapi pain" of others before us instead of learning it the hard way through our own mistakes. > I also don't want to hijack the larger conversation here to discuss > documentation. I think we all agree that documentation is important. We > already have a pretty good kfuncs docs page [0] anyways. In my > subjective opinion, _the_ platform for documenting public / exported BPF > symbols should have a well-defined documentation story, but yes, arguing > for it to be a blocker is maybe a stretch. ... > That being said, I really would hope that we could at least get some of > the documentation story figured out. Even if it's just something as > simple as spelling out a formal policy on our kfuncs docs page > stipulating that you have to add a doxygen header and link it from a > docs page, it would be nice to have some policy that puts kfuncs on a > road to being as well documented as helpers. The challenge of requiring the doc with a kfunc is that it can make kfunc look stable. We need the whole spectrum of kfuncs from pretty stable (like bpf_obj_new) to something very unstable (like bpf_kfunc_call_test_mem_len_fail2). We cannot require a doc with automatic .h for every kfunc. Therefore right now all kfuncs are completely unstable and stability story (including good doc and discoverability) is yet to be figured out. > > > > > 5. Ideally we could improve the story for _defining_ kfuncs as well, > > > though IMO it's already far less painful than defining helpers. It would > > > be nice if you could just tag a kfunc with something like a __bpf_kfunc > > > macro and it would do the following: > > > > > > - Automatically disable the -Wmissing-prototypes warning. I doubt this > > > is possible without adding some compiler features that let you do > > > something like __attribute__(__nowarn__("Wmissing-prototypes")), so > > > maybe this isn't a hard blocker, but more of a medium / long-term > > > goal. 
> > > - Add whatever other attributes we need for the kfuncs to be safe. For > > > example, 'noinline' and '__used'. Even if the symbols are global, > > > we'll probably need '__used' for LTO. > > > > would be nice, but that didn't stop existing 91 kfuncs to appear > > and already used in production. > > Yes. kfuncs are already used in production. > > This is something that would literally only take like 1-2 patches > anyways. I'm happy to do it so we don't have to waste cycles thinking > about it as a blocker for anything. Yeah. __bpf_kfunc tag would be nice to avoid this boilerplate. In addition to your 'kfunc todo list' I can add: 6. introduce polymorphic kfuncs We have helpers that have different implementation depending on prog type. All kfuncs have one-to-one match so far. We need kfuncs that would work differently depending on bpf prog context. 7. fine grained kfunc scope Right now a set of available kfuncs is determined by prog type. Same thing we do for helpers, but kfuncs already outpaced helpers. We need to be able to define a set of kfuncs for a pair (prog type, attach location) or something like that. hid-bpf and sched-ext folks asked for it. That would be similar to EXPORT_SYMBOL namespaces, but with strict enforcement for safety. > Another counterpoint to my initial claim that not having per-arg flags > could be problematic is that there are certain things that are global in > kfuncs that are also global in helpers despite having per-arg modifiers. > For example, the fact that you can only have one OBJ_RELEASE argument. > And yet another is the fact that none of the helpers we've added in the > last year relied on having per-arg modifiers, so in practice it hasn't > been a problem. Right. Right now we have OBJ_RELEASE flag for args of helpers, but that refactoring happened recently. Not that long ago all helpers with release semantic were hard coded in verifier.c. We're making progress in both helper and kfunc verification. 
We should be able to combine the code eventually.

> Part of me was trying to find a compromise here to move forward, but
> honestly, I do agree with you that we should aggressively make
> everything a kfunc unless we have a good reason not to, dynptr functions
> included. So I'm willing to walk this suggestion back as well -- let's
> just make these kfuncs.

Agree that any hard policy like 'only kfuncs from now on' gotta have its limits.
Maybe there will be a strong reason to add a new helper one day,
so we can keep the door open a tiny bit for an exception,
but for dynptr...
There are kfuncs with dynptr already (bpf_verify_pkcs7_signature)
So precedent is already made.

> Also a reasonable point. My point above was really just a response to
> your claim in [0] that dynptrs are flawed. It wasn't related to kfuncs
> vs. helpers.
>
> [0]: https://lore.kernel.org/all/20221216173526.y3e5go6mgmjrv46l@MacBook-Pro-6.local/

The flawed part of dynptr I was explaining here:
https://lore.kernel.org/all/20221225215210.ekmfhyczgubx4rih@macbook-pro-6.dhcp.thefacebook.com/

It's not that the whole concept of dynptr is flawed,
but using it as an abstraction on top of skb/xdp.
I don't believe that the extreme performance demands of xdp users are
compatible with 'lets verify in runtime' philosophy of dynptr.
I could be wrong. That's why I'm fine adding dynptr_on_top_of_xdp as kfuncs
and seeing it playing out, but certainly not as a stable helper.
iirc Martin and Kuba had concerns about bits of dynptr(skb | xdp) too.

With kfuncs we can iron out the issues while trying to use it whereas
with helpers we will be stuck for long time in endless mailing list arguments.
It's a win-win for everyone to switch everything to kfuncs.
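Item 7 above (a kfunc set defined per pair of prog type and attach location) can be pictured with a small lookup table. The following is a toy userspace model only: every enum value, struct, and function name in it is hypothetical, and it makes no claim about how the verifier does or will implement kfunc scoping.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical identifiers; real prog types and attach points differ. */
enum toy_prog_type { TOY_PROG_TRACING, TOY_PROG_STRUCT_OPS };
enum toy_attach { TOY_ATTACH_ANY, TOY_ATTACH_HID, TOY_ATTACH_SCHED };

/* One entry says: this function may be called from this (type, location). */
struct toy_kfunc_scope {
	const char *name;
	enum toy_prog_type prog_type;
	enum toy_attach attach;	/* TOY_ATTACH_ANY matches every location */
};

static const struct toy_kfunc_scope toy_scopes[] = {
	{ "toy_obj_new",        TOY_PROG_TRACING,    TOY_ATTACH_ANY   },
	{ "toy_hid_get_data",   TOY_PROG_TRACING,    TOY_ATTACH_HID   },
	{ "toy_sched_dispatch", TOY_PROG_STRUCT_OPS, TOY_ATTACH_SCHED },
};

/* Strict check: anything not listed for the pair is rejected. */
bool toy_kfunc_allowed(const char *name, enum toy_prog_type pt,
		       enum toy_attach at)
{
	for (size_t i = 0; i < sizeof(toy_scopes) / sizeof(toy_scopes[0]); i++) {
		const struct toy_kfunc_scope *s = &toy_scopes[i];

		if (strcmp(s->name, name) != 0 || s->prog_type != pt)
			continue;
		if (s->attach == TOY_ATTACH_ANY || s->attach == at)
			return true;
	}
	return false;
}
```

In this model a function scoped to prog type alone is the special case TOY_ATTACH_ANY, so narrowing to a (prog type, attach location) pair extends the table rather than replacing it.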
On 12/31/22 1:42 AM, Alexei Starovoitov wrote:
> On Fri, Dec 30, 2022 at 03:00:21PM -0600, David Vernet wrote:
>>>>
>>>> Taking bpf_get_current_task() as an example, I think it's better to have
>>>> the debate be "should we keep supporting this / are users still using
>>>> it?" rather than, "it's UAPI, there's nothing to even discuss". The
>>>> point being that even if bpf_get_current_task() is still used, there may
>>>> (and inevitably will) be other UAPI helpers that are useless and that we
>>>> just can't remove.
>
> Sorry, missed this question in the previous reply.
> The answer is "it's UAPI, there's nothing to even discuss".
> It doesn't matter whether bpf_get_current_task() is used heavily or not used at all.
> The chance of breaking user space is what paralyzes the changes.
> Any change to uapi header file is looked at with a magnifying glass.
> There is no deprecation story for uapi.
> The definition and semantics of bpf helpers are frozen _forever_.
> And our uapi/bpf.h is not in a good company:
> ls -Sla include/uapi/linux/|head
> -rw-r--r-- 1 ast users 331159 Nov 3 08:32 nl80211.h
> -rw-r--r-- 1 ast users 265312 Dec 25 13:51 bpf.h
> -rw-r--r-- 1 ast users 118621 Dec 25 13:51 v4l2-controls.h
> -rw-r--r-- 1 ast users 99533 Dec 25 13:51 videodev2.h
> -rw-r--r-- 1 ast users 86460 Nov 29 11:15 ethtool.h
>
> "Freeze bpf helpers now" is a minimum we should do right now.
> We need to take aggressive steps to freeze the growth of the whole uapi/bpf.h

Imho, freezing BPF helpers now is way too aggressive a step. One aspect which was
not discussed here is that unstable kfuncs will be a pain for user experience
compared to BPF helpers. Probably not for FB or G, who maintain their own limited
set of kernels, but for all others.
If there is valid reason that kfuncs will have to change one way or another, then BPF applications using them will have to carry the maintenance burden on their side to be able to support a variety of kernel versions with working around the kfunc quirks. So you're essentially outsourcing the problem from kernel to users, which will suck from a user experience (and add to development cost on their side). Ofc there is interest in keeping changes to a minimum, but it's not the same as BPF helpers where there is a significantly higher guarantee that things continue to keep working going forward. Today in Cilium we don't use any of the kfuncs, we might at some point when we see it necessary, but likely to a limited degree if sth cannot be solved as-is and only kfunc is present as a solution. But again, from a UX it's not great having to know that things can break anytime soon with newer kernels (things might already with verifier/LLVM upgrade and kfunc potentially adds yet another level). Generally speaking, I'm not against kfuncs but I suggest only making "freeze bpf helpers now" a soft freeze with a path forward for promoting some of the kfuncs which have been around and matured for a while and didn't need changes as stable BPF helpers to indicate their maturity level when we see it fit. So it's not a hard "no", but possible promotion when suitable. [...] > When I mentioned 91 kfunc in my previous reply I forgot to count another dozen kfuncs > in sched-ext and another dozen in hid-bpf that are not in mainline yet. > fuse-bpf will likely add their own kfuncs and so on. For the latter agree as well given from a bigger picture, they are mainly niche use cases at this point and in future. > Your 'todo list' for kfuncs is absolutely correct. Are kfuncs a perfect substitute > for helpers? No. They have downsides and we need to work on addressing downsides > instead of growing bpf.h further. > Are we ready to freeze bpf helpers? Absolutely yes. 
> "please use kfuncs instead of helpers" was our recommendation for 9 month or so > and now we need to make it an official rule. > For bpf noobs it's certainly easier to add new prog type, new map type, new helper, > but that gotta stop. > Last prog type we added in May 2021 and we should try hard not to add one anymore. > hid-bpf managed to do everything without new prog type. > sched-ext is not adding new prog type either. > This is great. We're breaking free from uapi constraints. [...] > The challenge of requiring the doc with a kfunc is that it can make kfunc > look stable. > We need the whole spectrum of kfuncs from pretty stable (like bpf_obj_new) > to something very unstable (like bpf_kfunc_call_test_mem_len_fail2). > We cannot require a doc with automatic .h for every kfunc. > Therefore right now all kfuncs are completely unstable and > stability story (including good doc and discoverability) is yet to be figured out. [...] Discoverability plus being able to know semantics from a user PoV to figure out when workarounds for older/newer kernels are required to be able to support both kernels. "something very unstable" sounds like it probably shouldn't even be merged in the first place, but generally speaking a spectrum from pretty stable to very unstable is imho repeating the same story as BPF helpers vs kfuncs. Saying a kfunc is 'pretty stable' is kind of hinting to users that it's close to UAPI, but yet it's unstable. It'll confuse even more. I'd rather have a path forward where those kfuncs get promoted to actual BPF helpers by then where we go and say, that kfunc has proven itself in production and from an API PoV that it is ready to be a proper BPF helper, and until this point it's unstable, expect things to change, period. If a kfunc actually changed for the kernels that users develop against, they need to go and figure out anyway as part of their development process (/ maintenance cost). 
> Agree that any hard policy like 'only kfuncs from now on' gotta have its limits. > Maybe there will be a strong reason to add a new helper one day, > so we can keep the door open a tiny bit for an exception, +1 > but for dynptr... > There are kfuncs with dynptr already (bpf_verify_pkcs7_signature) > So precedent is already made. bpf_verify_pkcs7_signature as kfunc also makes sense given wider-spread adoption (and ideally as part of an OSS project) is yet to be seen. >> Also a reasonable point. My point above was really just a response to >> your claim in [0] that dynptrs are flawed. It wasn't related to kfuncs >> vs. helpers. >> >> [0]: https://lore.kernel.org/all/20221216173526.y3e5go6mgmjrv46l@MacBook-Pro-6.local/ > > The flawed part of dynptr I was explaining here: > https://lore.kernel.org/all/20221225215210.ekmfhyczgubx4rih@macbook-pro-6.dhcp.thefacebook.com/ > > It's not that the whole concept of dynptr is flawed, > but using it as an abstraction on top of skb/xdp. > I don't believe that the extreme performance demands of xdp users are > compatible with 'lets verify in runtime' philosophy of dynptr. > I could be wrong. That's why I'm fine adding dynptr_on_top_of_xdp as kfuncs > and seeing it playing out, but certainly not as a stable helper. > iirc Martin and Kuba had concerns about bits of dynptr(skb | xdp) too. (My assumption was that you're adding it because you were planning to use it internally?) > With kfuncs we can iron out the issues while trying to use it whereas > with helpers we will be stuck for long time in endless mailing list arguments. > It's a win-win for everyone to switch everything to kfuncs. Thanks, Daniel
On Tue, Jan 03, 2023 at 12:43:58PM +0100, Daniel Borkmann wrote: > On 12/31/22 1:42 AM, Alexei Starovoitov wrote: > > On Fri, Dec 30, 2022 at 03:00:21PM -0600, David Vernet wrote: > > > > > > > > > > Taking bpf_get_current_task() as an example, I think it's better to have > > > > > the debate be "should we keep supporting this / are users still using > > > > > it?" rather than, "it's UAPI, there's nothing to even discuss". The > > > > > point being that even if bpf_get_current_task() is still used, there may > > > > > (and inevitably will) be other UAPI helpers that are useless and that we > > > > > just can't remove. > > > > Sorry, missed this question in the previous reply. > > The answer is "it's UAPI, there's nothing to even discuss". > > It doesn't matter whether bpf_get_current_task() is used heavily or not used at all. > > The chance of breaking user space is what paralyzes the changes. > > Any change to uapi header file is looked at with a magnifying glass. > > There is no deprecation story for uapi. > > The definition and semantics of bpf helpers are frozen _forever_. > > And our uapi/bpf.h is not in a good company: > > ls -Sla include/uapi/linux/|head > > -rw-r--r-- 1 ast users 331159 Nov 3 08:32 nl80211.h > > -rw-r--r-- 1 ast users 265312 Dec 25 13:51 bpf.h > > -rw-r--r-- 1 ast users 118621 Dec 25 13:51 v4l2-controls.h > > -rw-r--r-- 1 ast users 99533 Dec 25 13:51 videodev2.h > > -rw-r--r-- 1 ast users 86460 Nov 29 11:15 ethtool.h > > > > "Freeze bpf helpers now" is a minimum we should do right now. > > We need to take aggressive steps to freeze the growth of the whole uapi/bpf.h > > Imho, freezing BPF helpers now is way too aggressive step. One aspect which was > not discussed here is that unstable kfuncs will be a pain for user experience > compared to BPF helpers. Probably not for FB or G who maintain they own limited > set of kernels, but for all others. 
If there is valid reason that kfuncs will have > to change one way or another, then BPF applications using them will have to carry > the maintenance burden on their side to be able to support a variety of kernel > versions with working around the kfunc quirks. So you're essentially outsourcing > the problem from kernel to users, which will suck from a user experience (and add > to development cost on their side). It's actually the opposite. A small company that wants to use BPF needs to have a workaround/plan B for different kernels and different distros. That's why cilium and others have to detect availability of helpers and bpf features. One bpf prog for newer kernel and potentially completely different solution for older kernels. That's the biggest obstacle in bpf adoption: the required features are in the latest kernels, but companies have to support older kernels too. Now look at the problem from different angle: Detecting kfuncs is no different than detecting helpers. The bpf users has to have a workaround when helper/kfunc is not available. In that sense stability of the helpers vs instability of kfuncs is irrelevant. Both might not exist in a particular kernel. So if cilium starts to use kfunc it won't be extra development cost and bpf program writer experience using kfuncs vs using helpers is the same as well. But with kfuncs we can solve this bpf adoption issue. The helpers are not easily backportable and cannot be added in modules, so company's workarounds for older kernel are painful. While kfuncs are trivially added in a module. Let's take bpf_sock_destroy that Aditi wants to add as an example. If it's done as a helper the cilium would need to wait for the next kernel release and next distro release some years from now to actually use it at the customer site. If bpf_sock_destroy is added as kfunc you can ship an extra kernel module with just that kfunc to your customers. 
You can also attempt to convince a distro that this module with kfuncs should be certified, since the same kfunc is in upstream kernel. The customer can use cilium that relies on bpf_sock_destroy much sooner and likely there won't be a need to develop a completely different workaround for kernels without that kfunc. There is no need to actually backport bpf_sock_destroy to older kernels. As long as verifier infrastructure for kfuncs is feature rich all new kfuncs can be shipped by distro or by cilium in a module without affecting support contract of the main kernel. The verification of kfuncs is still actively evolving, but in not too distant future people will be able to ship/add kfuncs without touching the kernel. The faster the whole bpf community switches to 'use kfuncs for everything' model the faster the verification of them becomes solid and bpf adoption issue will be addressed. > Ofc there is interest in keeping changes to a > minimum, but it's not the same as BPF helpers where there is a significantly higher > guarantee that things continue to keep working going forward. Today in Cilium we > don't use any of the kfuncs, we might at some point when we see it necessary, but > likely to a limited degree if sth cannot be solved as-is and only kfunc is present > as a solution. But again, from a UX it's not great having to know that things can > break anytime soon with newer kernels (things might already with verifier/LLVM > upgrade and kfunc potentially adds yet another level). Generally speaking, I'm not > against kfuncs but I suggest only making "freeze bpf helpers now" a soft freeze > with a path forward for promoting some of the kfuncs which have been around and > matured for a while and didn't need changes as stable BPF helpers to indicate their > maturity level when we see it fit. So it's not a hard "no", but possible promotion > when suitable. The problem with 'soft' freeze that it's open to interpretation and abuse. 
It feels to me you're saying that cilium is not using kfuncs and therefore all cilium features additions are ok to be done as helpers. That doesn't sound fair to other bpf devs. > > [...] > > When I mentioned 91 kfunc in my previous reply I forgot to count another dozen kfuncs > > in sched-ext and another dozen in hid-bpf that are not in mainline yet. > > fuse-bpf will likely add their own kfuncs and so on. > > For the latter agree as well given from a bigger picture, they are mainly niche use > cases at this point and in future. I'd argue that cilium's bpf_sock_destroy is just as niche as sched-ext scheduling kfuncs. > > > Your 'todo list' for kfuncs is absolutely correct. Are kfuncs a perfect substitute > > for helpers? No. They have downsides and we need to work on addressing downsides > > instead of growing bpf.h further. > > Are we ready to freeze bpf helpers? Absolutely yes. > > "please use kfuncs instead of helpers" was our recommendation for 9 month or so > > and now we need to make it an official rule. > > For bpf noobs it's certainly easier to add new prog type, new map type, new helper, > > but that gotta stop. > > Last prog type we added in May 2021 and we should try hard not to add one anymore. > > hid-bpf managed to do everything without new prog type. > > sched-ext is not adding new prog type either. > > This is great. We're breaking free from uapi constraints. > [...] > > > The challenge of requiring the doc with a kfunc is that it can make kfunc > > look stable. > > We need the whole spectrum of kfuncs from pretty stable (like bpf_obj_new) > > to something very unstable (like bpf_kfunc_call_test_mem_len_fail2). > > We cannot require a doc with automatic .h for every kfunc. > > Therefore right now all kfuncs are completely unstable and > > stability story (including good doc and discoverability) is yet to be figured out. > [...] 
> > Discoverability plus being able to know semantics from a user PoV to figure out when > workarounds for older/newer kernels are required to be able to support both kernels. Sounds like your concern is that there could be a kfunc that changed its semantics, but kept the exact same name and arguments? Yeah. That would be bad, but we should prevent such patches from landing. It's up to us to define sane and user friendly deprecation of kfuncs. > "something very unstable" sounds like it probably shouldn't even be merged in the > first place, but generally speaking a spectrum from pretty stable to very unstable See bpf_kfunc_call_test_mem_len_fail2. It's very much 'very unstable'. It's a test function. Currently it's in net/bpf/test_run.c. It's there only because at that time we didn't have the ability to add kfuncs in modules. Soon we will move all test kfuncs from the main kernel to bpf_testmod.ko > is imho repeating the same story as BPF helpers vs kfuncs. Saying a kfunc is 'pretty > stable' is kind of hinting to users that it's close to UAPI, but yet it's unstable. correct. > It'll confuse even more. I'd rather have a path forward where those kfuncs get promoted Why confuse more? There are EXPORT_SYMBOLs like kmalloc that are quite stable, yet they can change. EXPORT_SYMBOL_GPL is an exact analogy to kfunc. > to actual BPF helpers by then where we go and say, that kfunc has proven itself in production > and from an API PoV that it is ready to be a proper BPF helper, and until this point "Proper BPF helper" model is broken. static void *(*bpf_map_lookup_elem)(void *map, const void *key) = (void *) 1; is a hack that works only when the compiler optimizes the code. See gcc's attr(kernel_helper) workaround. This 'proper helper' hack is the reason we cannot compile bpf programs with -O0. And because it's uapi we cannot even fix this. With kfuncs we will be able to compile with -O0 and debug bpf programs with better tools.
These tools don't exist yet, but we have a way forward whereas with helpers we are stuck with -O2. > it's unstable, expect things to change, period. If a kfunc actually changed for the > kernels that users develop against, they need to go and figure out anyway as part of > their development process (/ maintenance cost). The stable kfuncs will still use the same kfuncs mechanics: libbpf searches BTF and supplies kernel with btf_id of that kfunc before loading the bpf prog. We won't be hacking stable kfuncs into '= (void *) 1;' > > Agree that any hard policy like 'only kfuncs from now on' gotta have its limits. > > Maybe there will be a strong reason to add a new helper one day, > > so we can keep the door open a tiny bit for an exception, > > +1 > > > but for dynptr... > > There are kfuncs with dynptr already (bpf_verify_pkcs7_signature) > > So precedent is already made. > > bpf_verify_pkcs7_signature as kfunc also makes sense given wider-spread adoption (and > ideally as part of an OSS project) is yet to be seen. > > > > Also a reasonable point. My point above was really just a response to > > > your claim in [0] that dynptrs are flawed. It wasn't related to kfuncs > > > vs. helpers. > > > > > > [0]: https://lore.kernel.org/all/20221216173526.y3e5go6mgmjrv46l@MacBook-Pro-6.local/ > > > > The flawed part of dynptr I was explaining here: > > https://lore.kernel.org/all/20221225215210.ekmfhyczgubx4rih@macbook-pro-6.dhcp.thefacebook.com/ > > > > It's not that the whole concept of dynptr is flawed, > > but using it as an abstraction on top of skb/xdp. > > I don't believe that the extreme performance demands of xdp users are > > compatible with 'lets verify in runtime' philosophy of dynptr. > > I could be wrong. That's why I'm fine adding dynptr_on_top_of_xdp as kfuncs > > and seeing it playing out, but certainly not as a stable helper. > > iirc Martin and Kuba had concerns about bits of dynptr(skb | xdp) too. 
> > (My assumption was that you're adding it because you were planning to use > it internally?) The bar is not that some project wants to use this new feature, but rather that the feature looks useful and may potentially be used. As maintainers we make this judgement call every single day. When we make a mistake we should be able to fix it. With uapi we cannot fix our mistakes. > > With kfuncs we can iron out the issues while trying to use it whereas > > with helpers we will be stuck for long time in endless mailing list arguments. > > It's a win-win for everyone to switch everything to kfuncs. > > Thanks, > Daniel
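For readers following the '= (void *) 1' argument above, the declaration can be reproduced in plain user-space C. This is only a sketch under stated assumptions: the accessor helper_id_of_map_lookup() is a hypothetical name invented here, and nothing in it calls into the kernel. It shows that the "function pointer" carries a small integer (the helper ID) rather than a real address, which is why calling through it only makes sense once a BPF backend has folded the constant into a call instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the declaration quoted in the thread: the value stored in
 * the pointer is the helper ID, not the address of a callable function. */
static void *(*bpf_map_lookup_elem)(void *map, const void *key) = (void *) 1;

/* Hypothetical accessor (not a kernel or libbpf API): recover the integer
 * that a BPF backend would encode as the target of the call instruction. */
static uintptr_t helper_id_of_map_lookup(void)
{
	uintptr_t id = (uintptr_t)bpf_map_lookup_elem;

	/* ID 0 is the "unspec" slot in the helper enum, so a usable
	 * helper declaration never carries it. */
	assert(id != 0);
	return id;
}
```

In user space the pointer must never be called: there is no code at address 1. The accessor only reads the value back, which is all the sketch is meant to illustrate.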
On Fri, 30 Dec 2022 16:42:13 -0800 Alexei Starovoitov wrote:
> iirc Martin and Kuba had concerns about bits of dynptr(skb | xdp) too.
FWIW yes, I withdrew my objections because Joanne showed me some changes
which reduced LOC in user space even with the limited functionality.
But dynptrs are not the efficient skb/xdp buf abstraction I was hoping
for :(
On 1/4/23 12:51 AM, Alexei Starovoitov wrote: > On Tue, Jan 03, 2023 at 12:43:58PM +0100, Daniel Borkmann wrote: >> On 12/31/22 1:42 AM, Alexei Starovoitov wrote: >>> On Fri, Dec 30, 2022 at 03:00:21PM -0600, David Vernet wrote: >>>>>> >>>>>> Taking bpf_get_current_task() as an example, I think it's better to have >>>>>> the debate be "should we keep supporting this / are users still using >>>>>> it?" rather than, "it's UAPI, there's nothing to even discuss". The >>>>>> point being that even if bpf_get_current_task() is still used, there may >>>>>> (and inevitably will) be other UAPI helpers that are useless and that we >>>>>> just can't remove. >>> >>> Sorry, missed this question in the previous reply. >>> The answer is "it's UAPI, there's nothing to even discuss". >>> It doesn't matter whether bpf_get_current_task() is used heavily or not used at all. >>> The chance of breaking user space is what paralyzes the changes. >>> Any change to uapi header file is looked at with a magnifying glass. >>> There is no deprecation story for uapi. >>> The definition and semantics of bpf helpers are frozen _forever_. >>> And our uapi/bpf.h is not in a good company: >>> ls -Sla include/uapi/linux/|head >>> -rw-r--r-- 1 ast users 331159 Nov 3 08:32 nl80211.h >>> -rw-r--r-- 1 ast users 265312 Dec 25 13:51 bpf.h >>> -rw-r--r-- 1 ast users 118621 Dec 25 13:51 v4l2-controls.h >>> -rw-r--r-- 1 ast users 99533 Dec 25 13:51 videodev2.h >>> -rw-r--r-- 1 ast users 86460 Nov 29 11:15 ethtool.h >>> >>> "Freeze bpf helpers now" is a minimum we should do right now. >>> We need to take aggressive steps to freeze the growth of the whole uapi/bpf.h >> >> Imho, freezing BPF helpers now is way too aggressive step. One aspect which was >> not discussed here is that unstable kfuncs will be a pain for user experience >> compared to BPF helpers. Probably not for FB or G who maintain they own limited >> set of kernels, but for all others. 
If there is valid reason that kfuncs will have >> to change one way or another, then BPF applications using them will have to carry >> the maintenance burden on their side to be able to support a variety of kernel >> versions with working around the kfunc quirks. So you're essentially outsourcing >> the problem from kernel to users, which will suck from a user experience (and add >> to development cost on their side). > > It's actually the opposite. > A small company that wants to use BPF needs to have a workaround/plan B for > different kernels and different distros. > That's why cilium and others have to detect availability of helpers and bpf features. > One bpf prog for newer kernel and potentially completely different solution > for older kernels. > That's the biggest obstacle in bpf adoption: the required features are in > the latest kernels, but companies have to support older kernels too. > Now look at the problem from different angle: > Detecting kfuncs is no different than detecting helpers. > The bpf users has to have a workaround when helper/kfunc is not available. > In that sense stability of the helpers vs instability of kfuncs is irrelevant. > Both might not exist in a particular kernel. > So if cilium starts to use kfunc it won't be extra development cost and > bpf program writer experience using kfuncs vs using helpers is the same as well. But that was not the point I was making. What you describe above is the baseline cost which is there regardless of BPF helper vs kfunc.. detecting availability and having a workaround for older kernel if needed. The added cost is if kfunc changes over time for whichever valid reason, then you are essentially pushing the maintenance cost _from kernel to users_ when they need to keep track of that and implement workarounds specifically to make the kfunc work in their program for a set of kernels they plan to support, which they otherwise would /not/ have if it was a BPF helper. It raises the barrier from user side. 
Similarly, if users started out using a kfunc from a base kernel, and in future it gets removed given it's not stable, then a workaround (if possible) needs to be implemented for newer kernels - probably a rare occasion, but not impossible or something that can be ruled out entirely. So the stability of the helpers vs instability of kfuncs is relevant in that case, not for the case you describe above, and that is extra development cost on the user side. Generally, what I'm saying is, there needs to be a path forward where we are still open for both instead of completely freezing the former. > But with kfuncs we can solve this bpf adoption issue. > The helpers are not easily backportable and cannot be added in modules, > so company's workarounds for older kernel are painful. > While kfuncs are trivially added in a module. Maybe to a small degree. Shipping an out-of-tree kernel module is often a no-go from corp policy and there's nothing you can do about it in such a case. "trivially added" is a bit oversimplified as well.. depends on the kfunc of course, but potentially painful in terms of having to work around various changing kernel internals for your kfunc implementation and only possible if the kernel actually exposes the needed functionality to modules. While the adoption issue /can/ in some cases be solved, I don't think it will be widely practical to solve the adoption issue this way. Eventually only time will solve it when everyone is on a decent enough kernel as baseline, this is what is there today at least for the networking and tracing side where BPF is widely adopted and its available framework big enough to solve many use cases. Aside and independent of all that, kfuncs added in out of tree modules should be discouraged. After all we want developers to contribute back to the upstream kernel, and for a very long time we've had the stance that no extra functionality should be possible via out of tree module extensions.
> Let's take bpf_sock_destroy that Aditi wants to add as an example. > If it's done as a helper the cilium would need to wait for the next kernel release > and next distro release some years from now to actually use it at the customer site. Yeap, with some distros in K8s space being better than others, for example, some like Flatcar tend to be fairly up to date. With major LTS ones it takes 1+ years though. > If bpf_sock_destroy is added as kfunc you can ship an extra kernel module > with just that kfunc to your customers. You can also attempt to convince a distro > that this module with kfuncs should be certified, since the same kfunc is in upstream kernel. > The customer can use cilium that relies on bpf_sock_destroy much sooner > and likely there won't be a need to develop a completely different workaround > for kernels without that kfunc. See above wrt modules. Some larger users which run their own DC infra also build kernels for themselves, so in some cases it's possible and easier from corp policy PoV to just cherry-pick upstream commits and roll them into their own kernel build until they upgrade at some point to a base kernel where this comes by default. Some of the distro vendors build "hw enablement" kernels for cloud providers and there it is possible too to ask for backports on core functionality even if not in stable, it's a slow process however. [...] >> Ofc there is interest in keeping changes to a >> minimum, but it's not the same as BPF helpers where there is a significantly higher >> guarantee that things continue to keep working going forward. Today in Cilium we >> don't use any of the kfuncs, we might at some point when we see it necessary, but >> likely to a limited degree if sth cannot be solved as-is and only kfunc is present >> as a solution. 
But again, from a UX it's not great having to know that things can >> break anytime soon with newer kernels (things might already with verifier/LLVM >> upgrade and kfunc potentially adds yet another level). Generally speaking, I'm not >> against kfuncs but I suggest only making "freeze bpf helpers now" a soft freeze >> with a path forward for promoting some of the kfuncs which have been around and >> matured for a while and didn't need changes as stable BPF helpers to indicate their >> maturity level when we see it fit. So it's not a hard "no", but possible promotion >> when suitable. > > The problem with 'soft' freeze that it's open to interpretation and abuse. > It feels to me you're saying that cilium is not using kfuncs and > therefore all cilium features additions are ok to be done as helpers. > That doesn't sound fair to other bpf devs. I think you misread, lets not twist what I mentioned. All I was saying is that we should keep the door open for both to continue to co-exist; both have a place, both come with their advantages but also baggage. It's not that one is absolutely better than the other, and that maintenance baggage is either on our side or pushed towards users. [...] >> Discoverability plus being able to know semantics from a user PoV to figure out when >> workarounds for older/newer kernels are required to be able to support both kernels. > > Sounds like your concern is that there could be a kfunc that changed it semantics, > but kept exact same name and arguments? Yeah. That would be bad, but we should prevent > such patches from landing. It's up to us to define sane and user friendly deprecation of kfuncs. Yes, that is a concern. New kfunc and deprecation with eventual removal of the old one might be better in such case, agree. [...] >> is imho repeating the same story as BPF helpers vs kfuncs. Saying a kfunc is 'pretty >> stable' is kind of hinting to users that it's close to UAPI, but yet it's unstable. > > correct. 
> >> It'll confuse even more. I'd rather have a path forward where those kfuncs get promoted > > why confuse more? There are EXPORT_SYMBOL like kmalloc that are quite stable, > yet they can change. > EXPORT_SYMBOL_GPL is exact analogy to kfunc. They are quite stable because they are used in lots of places in-tree and changing would cause a ton of needless churn and merge conflicts for everyone, etc. You might not always have this kind of visibility on usage of kfuncs. The data you have is from your internal code base and what's in some of the larger OSS projects, but certainly a more limited/biased view. So as with 'soft' freeze this is just as well open to interpretation. "confuse more" because you declare it quite stable, yet not stable. Why is there fear to make them proper uapi then with the given known guarantees? From user side this guarantee is a good thing, not a bad thing. Mistakes were/are made all the time and learned from. Imagine syscall API is not stable anymore. Would you invest the cost to develop an application against it? Imho, it's one of BPF's strengths and we should keep the door open, not close it. >> to actual BPF helpers by then where we go and say, that kfunc has proven itself in production >> and from an API PoV that it is ready to be a proper BPF helper, and until this point > > "Proper BPF helper" model is broken. > static void *(*bpf_map_lookup_elem)(void *map, const void *key) = (void *) 1; > > is a hack that works only when compiler optimizes the code. > See gcc's attr(kernel_helper) workaround. > This 'proper helper' hack is the reason we cannot compile bpf programs with -O0. > And because it's uapi we cannot even fix this > With kfuncs we will be able to compile with -O0 and debug bpf programs with better tools. > These tools don't exist yet, but we have a way forward whereas with helpers > we are stuck with -O2. Better debugging tools are needed either way, independent of -O0 or -O2. 
I don't think -O0 is a requirement or barrier for that. It may open up possibilities for new tools, but production is still running with -O2. Proper BPF helper model is broken, but everyone relies on it, and will be for a very very long time to come, whether we like it or not. There is a larger ecosystem around BPF devs outside of kernel, and developers will use the existing means today. There are recommendations / guidelines that we can provide but we also don't have control over what developers are doing. Yet we should make their life easier, not harder. Better debugging possibilities should cater to everyone. Thanks, Daniel
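The "baseline cost" both sides agree on (check whether a helper or kfunc exists on the running kernel, otherwise take a workaround path) can be sketched in plain user-space C. Everything below is an assumption made for illustration: the table stands in for the names a kernel exposes via BTF, the IDs are made up, and lookup_kfunc_id() and choose_strategy() are hypothetical names, not libbpf functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the set of kfuncs a given kernel exposes. */
struct kfunc_entry {
	const char *name;
	int btf_id;	/* made-up ID, for illustration only */
};

/* Return the ID recorded for name, or -1 if this "kernel" does not have it. */
static int lookup_kfunc_id(const struct kfunc_entry *tbl, size_t n, const char *name)
{
	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (strcmp(tbl[i].name, name) == 0)
			return tbl[i].btf_id;
	return -1;
}

enum strategy { USE_KFUNC, USE_FALLBACK };

/* Pick the code path a loader would take on this "kernel". */
static enum strategy choose_strategy(const struct kfunc_entry *tbl, size_t n, const char *name)
{
	return lookup_kfunc_id(tbl, n, name) >= 0 ? USE_KFUNC : USE_FALLBACK;
}
```

The sketch only covers existence. The extra cost argued about in this message is the case it does not model: a kfunc that exists on two kernels under the same name but with different arguments or semantics, where a name lookup succeeds and the caller still needs per-kernel handling.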
On Thu, Dec 29, 2022 at 6:46 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Thu, Dec 29, 2022 at 03:10:22PM -0800, Andrii Nakryiko wrote: > > On Sun, Dec 25, 2022 at 1:52 PM Alexei Starovoitov > > <alexei.starovoitov@gmail.com> wrote: > > > > > > On Tue, Dec 20, 2022 at 11:31:25AM -0800, Andrii Nakryiko wrote: > > > > On Fri, Dec 16, 2022 at 9:35 AM Alexei Starovoitov > > > > <alexei.starovoitov@gmail.com> wrote: > > > > > > > > > > On Mon, Dec 12, 2022 at 12:12:09PM -0800, Andrii Nakryiko wrote: > > > > > > > > > > > > There is no clean way to ever move from unstable kfunc to a stable helper. > > > > > > > > > > No clean way? Yet in the other email you proposed a way. > > > > > Not pretty, but workable. > > > > > I'm sure if ever there will be a need to stabilize the kfunc we will > > > > > find a clean way to do it. > > > > > > > > You can't have stable and unstable helper definition in the same .c > > > > file, > > > > > > of course we can. > > > uapi helpers vs kfuncs argument is not a black and white comparison. > > > It's not just stable vs unstable. > > > uapi has strict rules and helpers in uapi/bpf.h have to follow those rules. > > > While kfuncs in terms of stability are equivalent to EXPORT_SYMBOL_GPL. > > > Meaning they are largely unstable. > > > The upsteam kernel keeps changing those EXPORT_SYMBOL* functions, > > > but distros can apply their own "stability rules". > > > See Redhat's kABI, for example. A distro can guarantee a stability > > > of certain EXPORT_SYMBOL* for their customers, but that doesn't bind > > > upstream development. > > > > > > With uapi bpf helpers we have to guarantee their stability, > > > while with kfuncs we can do whatever we want. Right now all kfuncs are > > > unstable and to prove the point we changed them couple times already (nf_conn*). > > > We also have bpf_obj_new_impl() kfunc which is equivalent to EXPORT_SYMBOL(__kmalloc). > > > Hard to imagine more stable and more fundamental function. 
> > > Of course we want bpf programs to use bpf_obj_new() and assume > > > that it's going to be available in all future kernel releases. > > > But at the same time we're not bound by uapi rules. > > > bpf_obj_new() will likely be stable, but not uapi stable. > > > If we screw up (or find better way to allocate memory in the future) > > > we can change it. > > > We can invent our own deprecation rules for stable-ish kfuncs and > > > invent our more-unstable-than-current-unstable rules for kfuncs that > > > are too much kernel release dependent. > > > > I'm talking about *mechanics* of having two incompatible definitions > > of functions with the same name, not the *concept* of stable vs > > unstable API. See [0] where I explained this as a reply to Joanne. > > > > [0] https://lore.kernel.org/bpf/CAEf4BzbRQLEjAFUkzzStv0c0=O+r9iZ8hq33sJB2RtSuGrGAEA@mail.gmail.com/ > > Mechanics for kfuncs are much better than for helpers. >> *mechanics* of having two incompatible definitions >> of functions with the same name, but you made it clear that no unstable kfunc will ever be promoted to BPF helper, so I see no point in arguing further > > extern bool bpf_dynptr_is_null(const struct bpf_dynptr *p) __ksym; > > will likely work with both gcc and clang. > And if it doesn't we can fix it. > > While when gcc folks saw helpers: > > static bool (*bpf_dynptr_is_null)(const struct bpf_dynptr *p) = (void *) 777; > > they realized that it is a hack that abuses compiler optimizations. > They even invented attr(kernel_helper) to workaround this issue. > After a bunch of arguing gcc added support for this hack without attr, > but it's going to be around forever... in gcc, in clang and in kernel. > It's something that we could have fixed if it wasn't for uapi. > Just one more example of unfixable mistake that causing issues > to multiple projects. > That's the core issue of kernel uapi rules: inability to fix mistakes. 
This is BPF ISA defining `call #N;` to call helper with ID N, which you agree that it (ISA) has to be stable, documented and standardized, right? Everything else is just how we expose those constants into C code and how libbpf deals with them. Libbpf could support new attribute or even extern-based convention, if necessary. But it wasn't necessary for years and only was brought up during GCC's attempt to invent a new convention here. And they successfully dealt with this challenge. > > > > > > > > But regardless, dynptr is modeled as black box with hidden state, and > > > > its API surface area is bigger (offset, size, is null or not, > > > > manipulations over those aspects; then there is skb/xdp abstraction to > > > > be taken care of for generic read/write). It has a wider *generic* API > > > > surface to be useful and effectively used. > > > > > > tbh dynptr as an abstraction of skb/xdp is not convincing. > > > cilium created their own abstraction on top of skb and xdp and it's zero cost. > > > While dynptr is not free, so xdp users unlikely to use dynptr(xdp) for perf reasons. > > > So I suspect it won't be a success story in the long run, but we > > > can certainly try it out since they will be kfuncs and can be deprecated > > > if maintenance outweighs the number of users. > > > > > > > All *two* of them, bpf_get_current_task() and > > > > bpf_get_current_task_btf(), right? They are 2 years apart. > > > > bpf_get_current_task() was added before BTF era. It is still actively > > > > used today and there is nothing wrong with it. It works on older > > > > kernels just fine, even with BPF CO-RE (as backporting a few simple > > > > patches to generate BTF is simple and easy; not so much with BPF > > > > verifier changes to add native BTF support). I don't see much problem > > > > having both, they are not maintenance burden. 
> > > > > > bpf_get_current_pid_tgid > > > bpf_get_current_uid_gid > > > bpf_get_current_comm > > > bpf_get_current_task > > > bpf_get_current_task_btf > > > bpf_get_current_cgroup_id > > > bpf_get_current_ancestor_cgroup_id > > > bpf_skb_ancestor_cgroup_id > > > bpf_sk_cgroup_id > > > bpf_sk_ancestor_cgroup_id > > > > > > _are_ a maintenance burden. > > > > bpf_get_current_pid_tgid() was added in 2015, slightly and > > uncritically touched by Daniel in 2016 and we never had any problems > > with it ever since. No updates, no maintenance. I don't remember much > > problem with other helpers in this list, but I didn't check each one. > > > > But we certainly have a different understanding of what "maintenance > > burden" is. If some code doesn't require constant change and doesn't > > prevent changes in some other parts of the system, it's not a > > maintenance burden. > > As I said it's not about working today. If one doesn't touch code Where do you see "working today"? Quoting myself, just few lines above: > > If some code doesn't require constant change and doesn't > > prevent changes in some other parts of the system, it's not a > > maintenance burden. Which of those helpers prevent us from doing something new? Which ones are slowing us down and by how much? > it will keep working. > It's about being able to change it. > The uapi bits we simply cannot change. Yes, we won't change existing helpers, but we can add new ones if we need to extend them. That's how APIs work. Yes, they need careful considerations when designing and implementing new APIs. Yes, mistakes do happen, that's just fact of life and par for the course of software development. Yes, we have to live with those mistakes. Nothing changed about that. But somehow libraries and kernel still produce stable APIs and maintain them because they clearly provide benefits to end users. > > > > > > The verifier got smarter and we could have removed all of them, > > > but uapi rules makes it impossible. 
> > > The bpf prog could have been enabled to access all these task_struct > > > and cgroup fields directly. Likely without any kfuncs. > > > > > > bpf_send_signal vs bpf_send_signal_thread > > > bpf_jiffies64 vs bpf_this_cpu_ptr > > > etc > > > there are plenty examples where uapi bpf helpers became a burden. > > > They are working and will keep working, but we could have done > > > much better job if not for uapi. > > > These are the examples where uapi rules are too strong for bpf development. > > > Our pace of adding new features is high. > > > The kernel uapi rules are too strict for us. > > > > I'm familiar with the burden of maintaining API stability and > > backwards compat. But it's not just about the library/system > > libbpf 1.0 wasn't the smoothest example of deprecation. > But we still did it despite all kinds of negative flame. > With uapi helpers we cannot do any of that. No deprecation schemes. > While kfuncs allow innovation. We'll get the same amount of flame when we try to change kfunc that's widely adopted. You are missing the point, though, in trying to pit BPF helpers against kfuncs. I'm not saying it has to always be BPF helpers and never kfuncs. Both have the right to exist. My point is that in some cases BPF helpers are better, in others - kfuncs are more adequate. Why is this so controversial? > > > developer's convenience and burden, it's also about the end user's > > experience and convenience. BPF tool developers really appreciate when > > there are few less quirks to remember and work around across kernel > > versions, configurations, architectures, etc. It's the pain that > > kernel engineers working on BPF bleeding-edge don't experience in the > > BPF selftests environment. > > There is a trade off between users and developers. We want to make user > experience as smooth as possible while preserve the speed of development > for the kernel. uapi is in the way of that. 
> > > > > > > At one point DaveM declared freeze on sizeof(struct sk_buff). > > > It was a difficult, but correct decision. > > > We have to declare freeze on bpf helpers. > > > 211 helpers that have to be maintained forever is a huge burden. > > > > I still didn't get why we have to freeze anything and how exactly > > helpers are a burden. > > > > But especially in this specific case of few simple dynptr helpers, > > especially that other dynptrs generic APIs are already BPF helpers. I > > just don't get it and honestly all I see from this discussion is that > > you've made up your mind and there is nothing that can be done to > > convince you. > > > > The only "BPF helpers are stable and thus a burden" argument is just > > not convincing and I'd even say is mostly false. There are no upsides > > to having dynptr helpers as kfuncs, as far as I'm concerned. > > The main and only upside for everything as kfunc is that we can change it. > That's it. And that's not reason enough to outlaw new BPF helpers wholesale. > > > But there > > are a bunch of downsides, even if some of those might be lifted in the > > future. > > imo ability to change outweighs all downsides, since downsides are fixable > while inability to change is a burden. I'm curious what's the mechanism when people disagree with your "imo" and have good reasons for that? Is there a scenario where opinion other than yours prevails even if you disagree with it? > > > The unfortunate thing is that end users that are meant to benefit from > > all these helpers and them being "a standard API offering" are not > > well represented on the BPF mailing list, unfortunately. And my > > opinion and arguments as a proxy for theirs is clearly not enough. > > I also would like to hear what others on the list are thinking.
On Fri, Dec 30, 2022 at 1:00 PM David Vernet <void@manifault.com> wrote: > > On Fri, Dec 30, 2022 at 11:31:12AM -0800, Alexei Starovoitov wrote: > > On Fri, Dec 30, 2022 at 12:38:55PM -0600, David Vernet wrote: > > > On Thu, Dec 29, 2022 at 06:46:41PM -0800, Alexei Starovoitov wrote: > > > > On Thu, Dec 29, 2022 at 03:10:22PM -0800, Andrii Nakryiko wrote: > > > > > On Sun, Dec 25, 2022 at 1:52 PM Alexei Starovoitov > > > > > <alexei.starovoitov@gmail.com> wrote: > > > > > > [...] > I don't think the fact that we'll never be done is a valid counterpoint > to "are we ready now"? The first iteration of kfuncs was definitely not > in a good enough state to freeze all helpers. The usability of kfuncs > has improved drastically since then. The question isn't "when will be at > a complete stopping point?", it's, "are we sufficiently ready now?". > > > It's a bit of wishful thinking that addressing today's problem will somehow > > make everything nice and clean and then we will be ready to stop adding helpers. > > We'll keep improving the infra for years to come. > > There is no "end of the road" sign. > > Yes, there's no end of the road, but my point is that there are still > pieces that we know we need to change, and which we know are temporary > (__sz and __k being the main examples). > > *That being said*: I completely admit that this is all subjective. From > a technical standpoint, there is nothing stopping us from freezing > helpers. And honestly, I don't disagree with you that getting out of > UAPI immediately and forever is a huge positive; possibly even to the "huge positive" for whom? for happy kernel engineers that only care about the latest version of everything in BPF selftests or samples/bpf? Sure. But let's think about poor end user. Let's as a hypothetical and trivial example think about dynptr and bpf_dynptr_is_null(). 
Basic dynptr is usable in an earlier kernel release than the bpf_dynptr_is_null() helper, so you could write a BPF app that does some work-arounds without using bpf_dynptr_is_null() on an old kernel, but happily switches to the new helper/kfunc, if available. With BPF helpers I can detect this on the BPF side, completely transparently to the user-space part of my app:

    struct bpf_dynptr dptr = ...;
    bool is_null = false;

    if (bpf_core_enum_value_exists(enum bpf_func_id, BPF_FUNC_dynptr_is_null)) {
            is_null = bpf_dynptr_is_null(&dptr);
    } else {
            struct bpf_dynptr_kern *kdptr = (void *)&dptr;

            is_null = !BPF_CORE_READ(kdptr, data);
    }

How do you detect the existence of a kfunc today? Preferably without doing extra work in user-space.

Now, let's say a kfunc changes its signature. Show me a short example of how you deal with that in BPF C code? Think about sched_ext. Right now it's so bleeding edge that you have to assume the very latest and freshest kernel code. So you know all the kfuncs that you need should exist, otherwise sched_ext doesn't work at all. Ok, happy place. Now a year or two passes by. Some kfuncs are added, some are changed. We still believe that BPF CO-RE (compile once - run everywhere) is good and we don't want to compile and distribute multiple versions of a BPF application, right? You'll want to do some extra (or more performant) stuff if the kernel is recent and has some new kfunc, but fall back to some default suboptimal behavior otherwise. How do you do that in a simple and straightforward way? But even worse is what if some critical kfunc is changed between kernel versions and you do *need* to support both versions.

Think about those aspects, because sched_ext will run into them almost inevitably soon after its inclusion into the kernel. One way or another there are some technical solutions of various degrees of creativity. And I'm actually not sure if I have a solution for a kfunc signature change at all.
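For readers unfamiliar with the general technique behind "detect whether a function exists without user-space help", here is a minimal sketch of weak-symbol detection. It is a user-space analogy only (plain C on an ELF toolchain such as GCC or Clang on Linux), not BPF code and not something taken from this thread. The names optional_fast_path(), slow_fallback() and do_work() are hypothetical. Whether the BPF toolchain offers an equivalent for kfunc externs, and since which libbpf and kernel versions, is an assumption to check against the libbpf documentation.

```c
#include <stddef.h>

/*
 * Hypothetical optional function. It is declared weak and deliberately
 * never defined in this file, so the linker resolves its address to NULL
 * instead of failing the link.
 */
extern int optional_fast_path(int x) __attribute__((weak));

/* Hypothetical fallback used when the optional function is absent. */
static int slow_fallback(int x)
{
	return x * 2;
}

/*
 * Use the optional function when it was linked in, otherwise fall back.
 * The address test is what makes the availability check possible without
 * any help from outside the program.
 */
int do_work(int x)
{
	if (optional_fast_path)
		return optional_fast_path(x);
	return slow_fallback(x);
}
```

Note that this covers only the existence check. It says nothing about a function whose signature changes between versions, which is the harder case raised above.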
Without BTF we could use two separate .c files and statically link them together, which would work because extern is untyped in pure C. But with BPF static linking we do have BTF information for each extern, and those BTF types will be incompatible for the same extern func. We can probably come up with some hacks and conventions, as usual, but better start thinking about them now. But hopefully you can empathize a bit more with poor end users that have to do hacks like this, and see why it matters to have the bpf_dynptr API defined as stable BPF helpers, with no extra dependencies on BTF in the kernel, on kfunc support for the architecture, and whatever other hidden dependencies we all forgot or haven't thought about yet (believe me, there will always be users trying to do something on some embedded system with "unusual" kernel configs or architectures). But again. Let me repeat my point *again*. BPF helpers and kfuncs are not mutually exclusive, both can and should exist and evolve. That's one of the main points which is somehow eluding this conversation. > point that it warrants us just doing it now. More below. > > > > > > 4. Getting rid of KF_TRUSTED_ARGS and making that the default. > > > > We've been talking about this possibility for months. > > Are you suggesting to keep adding helpers for another year or so? > > I think that kfuncs should be the norm for the vast majority of things > being added, and hopefully for everything (I'm going to walk back my > suggestion of adding these new dynptr functions as helpers). Honestly, > my point was really just that I think the API for defining kfuncs needs > to be improved before we can totally and completely freeze helpers due > to the fact that we have __sz and __k, and don't have a consistent > documentation story. That being said, __sz and __k are there, they work, > and as you and I have both said at this point, whether or not they're > "blockers" is subjective.
> > So my answer to your question of "should we add helpers for another year > or so" in my last reply would have been "absolutely not, unless we truly > have no choice because of the lack of per-arg flags". After reading your > reply, if you're worried that that policy won't be strictly enforced > (meaning that we'll end up having to add helpers that easily could have > just been kfuncs) then I agree that we should just do the hard freeze > now. We've de-facto been doing that anyways for the last year. > > That being said, I really would hope that we could at least get some of > the documentation story figured out. Even if it's just something as > simple as spelling out a formal policy on our kfuncs docs page > stipulating that you have to add a doxygen header and link it from a > docs page, it would be nice to have some policy that puts kfuncs on a > road to being as well documented as helpers. > > > We already have 91 kfuncs and 211 helpers. > > If we were not asking all developers to use kfuncs we would have had 300+ helpers. > > Agreed that this would have been a _very_ unfortunate outcome. Again, this is a wrong dichotomy. Just because there are 91 (out of which 25-ish are test-only kfuncs that should really be in bpf_testmod, but somehow that doesn't bother anyone) kfuncs, doesn't mean they would have to all be done as BPF helpers. dynptr is stable generic concept, it should be done as BPF helpers. ct, xfrm, hid-bpf are interfaces to kernel objects, they are perfectly fit with kfunc. There is no contradiction there. Just some questionable conclusions. > > > > > > 5. Ideally we could improve the story for _defining_ kfuncs as well, > > > though IMO it's already far less painful than defining helpers. It would > > > be nice if you could just tag a kfunc with something like a __bpf_kfunc > > > macro and it would do the following: > > > > > > - Automatically disable the -Wmissing-prototypes warning. 
I doubt this > > > is possible without adding some compiler features that let you do > > > something like __attribute__(__nowarn__("Wmissing-prototypes")), so > > > maybe this isn't a hard blocker, but more of a medium / long-term > > > goal. > > > - Add whatever other attributes we need for the kfuncs to be safe. For > > > example, 'noinline' and '__used'. Even if the symbols are global, > > > we'll probably need '__used' for LTO. > > > > would be nice, but that didn't stop existing 91 kfuncs to appear > > and already used in production. > > Yes. kfuncs are already used in production. > > This is something that would literally only take like 1-2 patches > anyways. I'm happy to do it so we don't have to waste cycles thinking > about it as a blocker for anything. > > > > > > Overall, my point is really that we still have some homework to do > > > before we can just unilaterally freeze helpers. We're getting close, but > > > IMO not quite there yet. > > > > 91 vs 211 tells a different story. > > Yeah, the fact that we have 91 kfuncs is strong evidence that kfuncs are > already in a good-enough place to just freeze helpers. > > Another counterpoint to my initial claim that not having per-arg flags > could be problematic is that there are certain things that are global in > kfuncs that are also global in helpers despite having per-arg modifiers. > For example, the fact that you can only have one OBJ_RELEASE argument. > And yet another is the fact that none of the helpers we've added in the > last year relied on having per-arg modifiers, so in practice it hasn't > been a problem. You are conflating "single flag per func" with "which arg it belongs to doesn't matter". There could be only one OBJ_RELEASE, but we need to know which argument it applies to. Sure, today we take a shortcut and say it should apply to the only ref_obj_id-enabled argument. 
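The shortcut described above (one function-wide release flag, applied to whichever argument carries a reference) can be modeled in a few lines of plain C. This is a toy sketch, not the verifier's actual logic: struct toy_func, struct toy_arg, TOY_F_RELEASE and toy_release_arg() are all hypothetical names invented for illustration. It shows that the rule resolves cleanly when exactly one argument carries a reference and has no answer when two do.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_ARGS 5
#define TOY_F_RELEASE 0x1u	/* hypothetical function-wide "releases a reference" flag */

struct toy_arg {
	bool has_ref_obj_id;	/* argument carries an acquired reference */
};

struct toy_func {
	unsigned int flags;
	size_t nargs;
	struct toy_arg args[TOY_MAX_ARGS];
};

enum { TOY_NO_RELEASE = -1, TOY_AMBIGUOUS = -2 };

/*
 * Return the index of the argument the release applies to, using the
 * "it must be the only reference-carrying argument" shortcut.
 * Returns TOY_NO_RELEASE when the flag is unset or no argument carries
 * a reference, and TOY_AMBIGUOUS when more than one argument does.
 */
int toy_release_arg(const struct toy_func *f)
{
	int found = TOY_NO_RELEASE;

	if (!(f->flags & TOY_F_RELEASE))
		return TOY_NO_RELEASE;

	for (size_t i = 0; i < f->nargs && i < TOY_MAX_ARGS; i++) {
		if (!f->args[i].has_ref_obj_id)
			continue;
		if (found != TOY_NO_RELEASE)
			return TOY_AMBIGUOUS;	/* second candidate: flag alone cannot decide */
		found = (int)i;
	}
	return found;
}
```

In this model the only way to get rid of TOY_AMBIGUOUS is to record the flag on the argument rather than on the function, which is the distinction being argued here.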
But think about some hypothetical kfunc:

    int do_something_weird(struct bpf_dynptr *dptr1, struct bpf_dynptr *dptr2)

If it has OBJ_RELEASE, which arg (dptr1 or dptr2) does it apply to? OBJ_RELEASE is still an argument flag. > > I think it's fair to say that if you just look at the data instead of > from an "API cleanliness" perspective, having per-arg modifiers is not a > blocker. Data wins over subjectivity, so as mentioned above, I'm willing > to change my mind about per-arg modifiers being a blocker, especially > with __sz and __k. > [...] > > > I'm not sure whether that's enough to warrant making them helpers > > > instead of kfuncs, but I do think it's not exactly an apples to apples > > > comparison with future features that today have no helper API presence. > > > Putting myself in the shoes of a dynptr user, I would be very surprised > > > and confused if all of a sudden, I couldn't use some of the core dynptr > > > APIs due to being on a platform that doesn't have kfunc support. My two > > > cents are that letting these dynptr functions stay as helpers, while > > > agreeing that kfuncs is the way forward (though I don't think Andrii > > > agrees with that even aside from just these dynptrs) is a reasonable > > > compromise that errs on the side of user-friendliness for dynptr users. > > > > We already have this 'discrepancy' of both kfuncs and helpers for kptrs > > (bpf_obj_new vs bpf_kptr_xchg) and so far no complaints. > > Why is dynptr special? > > Well, lack of usability in one case doesn't necessarily mean we should > allow it in another. That said, the "usability" gains from having a > helper really are minimal to the point of practically being negligible > anyways. Depends on perspective. If I were some humble dev trying to build a BPF-based tool that should work on x86, arm64, s390x, and riscv (or whatever other architecture), and the dynptr API is only based on kfuncs, I'm screwed.
I can't sponsor or do kfunc support for my favorite architecture, I'm stuck waiting for this to be done by someone some time, if ever. And all because we arbitrarily decided not to do BPF helper. From a good engineering perspective, if some functionality doesn't require dependency X to work in principle, it shouldn't depend on that feature X. Even if feature X is beloved BTF. > > Part of me was trying to find a compromise here to move forward, but > honestly, I do agree with you that we should aggressively make > everything a kfunc unless we have a good reason not to, dynptr functions > included. So I'm willing to walk this suggestion back as well -- let's > just make these kfuncs. How about the policy of "let's use common sense and decide on what's best in each particular case"? Isn't that the best policy? Blanket statements and hard-defined rules are easy to follow, but they do not produce best outcomes (IMO). > > > > FWIW, I also don't think it's fair or logical to argue at this point in > > > the game that dynptrs as a concept is inherently flawed. They were super > > > useful for enabling the user ringbuf map type, which is a key part of > > > rhone / user-space scheduling in sched_ext, and I wouldn't be surprised > > > if ghOSt started using it as well as a way to make scheduling decisions > > > without trapping into the kernel as well. Also, the attendees at LSFMM > > > generally seemed enthusiastic about dynptrs and user ringbuf, though I > > > admittedly don't know who's using either feature outside of rhone. > > > > rhone doesn't have stability guarantees just like sched-ext doesn't have them. > > To drive that point rhone and sched-ext should really be using kfuncs. > > Otherwise somebody might point the finger at helpers and argue that > > this is somehow makes sched-ext stable. > > Also a reasonable point. My point above was really just a response to > your claim in [0] that dynptrs are flawed. It wasn't related to kfuncs > vs. helpers. 
> > [0]: https://lore.kernel.org/all/20221216173526.y3e5go6mgmjrv46l@MacBook-Pro-6.local/ > > > > > > That being said, to reiterate, I personally agree that once we take care > > > of a few more things for kfuncs , they're 100% the way forward over > > > helpers. BPF programs are kernel programs, no UAPI pain should be > > > necessary. > > > > Similar arguments were made during sk_buff freeze... let's add few more fields > > that are going to be sooo useful and then we'll freeze sk_buff... > > dynptr is trying to be that special snow flake. > > The main points of my initial response were not about dynptrs, they were > about how we define kfuncs. I agree there is nothing at all special > about dynptrs beyond the fact that they as a feature already have > helpers. Sure, let's add them as kfuncs. No reason to be beholden to the > UAPI restrictions. > > > > > bpf_rcu_read_lock was added as a kfunc. It's more fundamental than dynptr. > > bpf_obj_new is a kfunc too. Also more fundamental than dynptr. > > What is so special about dynptr that we need to make an exception for it? > > See above.
On Fri, Dec 30, 2022 at 4:42 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Fri, Dec 30, 2022 at 03:00:21PM -0600, David Vernet wrote: > > > > > > > > Taking bpf_get_current_task() as an example, I think it's better to have > > > > the debate be "should we keep supporting this / are users still using > > > > it?" rather than, "it's UAPI, there's nothing to even discuss". The > > > > point being that even if bpf_get_current_task() is still used, there may > > > > (and inevitably will) be other UAPI helpers that are useless and that we > > > > just can't remove. > > Sorry, missed this question in the previous reply. [...] > > Part of me was trying to find a compromise here to move forward, but > > honestly, I do agree with you that we should aggressively make > > everything a kfunc unless we have a good reason not to, dynptr functions > > included. So I'm willing to walk this suggestion back as well -- let's > > just make these kfuncs. > > Agree that any hard policy like 'only kfuncs from now on' gotta have its limits. > Maybe there will be a strong reason to add a new helper one day, > so we can keep the door open a tiny bit for an exception, > but for dynptr... > There are kfuncs with dynptr already (bpf_verify_pkcs7_signature) > So precedent is already made. bpf_verify_pkcs7_signature() is using dynptr as a pointer to memory. It's a totally valid and intended use case, to pass memory area of statically unknown size, yes. But that's very different from having basic dynptr helpers like is_null() and trim/advance as kfunc. Such helpers are stable, they manipulate generic attributes of dynptr: size, offset, underlying memory pointer. There is nothing unstable and potentially changing about them. > > > Also a reasonable point. My point above was really just a response to > > your claim in [0] that dynptrs are flawed. It wasn't related to kfuncs > > vs. helpers. 
> > > > [0]: https://lore.kernel.org/all/20221216173526.y3e5go6mgmjrv46l@MacBook-Pro-6.local/ > > The flawed part of dynptr I was explaining here: > https://lore.kernel.org/all/20221225215210.ekmfhyczgubx4rih@macbook-pro-6.dhcp.thefacebook.com/ > > It's not that the whole concept of dynptr is flawed, > but using it as an abstraction on top of skb/xdp. From original exchange: > > > So just because there is no perfect way to > > > handle all the SKB/XDP physical non-contiguity, doesn't mean that the > > > dynptr concept itself is flawed or not well thought out. It's just > > > > I think that's exactly what it means. dynptr concept is flawed. Must be a lot of typos in here ;) because as written it clearly states that the whole concept of dynptr is flawed. But I'm glad we are finally on the same page at least on this point now. > I don't believe that the extreme performance demands of xdp users are > compatible with 'lets verify in runtime' philosophy of dynptr. > I could be wrong. That's why I'm fine adding dynptr_on_top_of_xdp as kfuncs > and seeing it playing out, but certainly not as a stable helper. > iirc Martin and Kuba had concerns about bits of dynptr(skb | xdp) too. > With kfuncs we can iron out the issues while trying to use it whereas > with helpers we will be stuck for long time in endless mailing list arguments. > It's a win-win for everyone to switch everything to kfuncs.
On Wed, Jan 4, 2023 at 6:25 AM Daniel Borkmann <daniel@iogearbox.net> wrote: > > On 1/4/23 12:51 AM, Alexei Starovoitov wrote: > > On Tue, Jan 03, 2023 at 12:43:58PM +0100, Daniel Borkmann wrote: > >> On 12/31/22 1:42 AM, Alexei Starovoitov wrote: > >>> On Fri, Dec 30, 2022 at 03:00:21PM -0600, David Vernet wrote: > >>>>>> > >>>>>> Taking bpf_get_current_task() as an example, I think it's better to have > >>>>>> the debate be "should we keep supporting this / are users still using > >>>>>> it?" rather than, "it's UAPI, there's nothing to even discuss". The > >>>>>> point being that even if bpf_get_current_task() is still used, there may > >>>>>> (and inevitably will) be other UAPI helpers that are useless and that we > >>>>>> just can't remove. > >>> +1 to all the things Daniel said about end user pains and barriers for adoption, glad I'm not the only one arguing this anymore. [...] > >> to actual BPF helpers by then where we go and say, that kfunc has proven itself in production > >> and from an API PoV that it is ready to be a proper BPF helper, and until this point > > > > "Proper BPF helper" model is broken. > > static void *(*bpf_map_lookup_elem)(void *map, const void *key) = (void *) 1; > > > > is a hack that works only when compiler optimizes the code. > > See gcc's attr(kernel_helper) workaround. > > This 'proper helper' hack is the reason we cannot compile bpf programs with -O0. > > And because it's uapi we cannot even fix this > > With kfuncs we will be able to compile with -O0 and debug bpf programs with better tools. > > These tools don't exist yet, but we have a way forward whereas with helpers > > we are stuck with -O2. > But specifically about how the BPF helper model is broken, that's at least an exaggeration. BPF helper call is defined at BPF ISA level, it has to be a `call <some constant>;`, and as long as compiler generates such code, it's all good. 
From C standpoint UAPI is just a function call: bpf_map_lookup_elem(&map, ...); As long as this compiles and generates proper `call 1;` assembly instruction, we are good. If/when both Clang and GCC support an alternative way to define helper and not as a static func pointer, -O0 builds (at least in the aspect of calling BPF helpers, I suspect other stuff will break still) will just work. And what's better, bpf_helper_defs.h would be able to pick the best option based on compiler's support with end users not having to care or notice the difference. This is not an UAPI problem at all. > Better debugging tools are needed either way, independent of -O0 or -O2. I don't > think -O0 is a requirement or barrier for that. It may open up possibilities for > new tools, but production is still running with -O2. Proper BPF helper model is > broken, but everyone relies on it, and will be for a very very long time to come, > whether we like it or not. There is a larger ecosystem around BPF devs outside of > kernel, and developers will use the existing means today. There are recommendations / > guidelines that we can provide but we also don't have control over what developers > are doing. Yet we should make their life easier, not harder. Better debugging > possibilities should cater to everyone. > > Thanks, > Daniel
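The distinction drawn in this message, between the stable contract (`call <some constant>;`) and the C-level convention used to carry that constant, can be illustrated with a toy user-space model. This is a sketch under stated assumptions, not kernel or libbpf code: the toy_* names and the ID values are hypothetical and do not correspond to the real enum bpf_func_id. The "program side" declaration mimics the static function pointer convention quoted above, and the pointer is only ever inspected for its numeric value, never called.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper IDs for this toy model only. */
enum toy_func_id {
	TOY_FUNC_unspec = 0,
	TOY_FUNC_add = 1,
	TOY_FUNC_max = 2,
	TOY_FUNC_CNT,
};

typedef long (*toy_helper_fn)(long a, long b);

static long toy_add(long a, long b) { return a + b; }
static long toy_max(long a, long b) { return a > b ? a : b; }

/* "Runtime" side: implementations are looked up purely by numeric ID. */
static const toy_helper_fn toy_helpers[TOY_FUNC_CNT] = {
	[TOY_FUNC_add] = toy_add,
	[TOY_FUNC_max] = toy_max,
};

/* Models `call #N;`: the ID is the whole contract. */
long toy_call(unsigned int id, long a, long b)
{
	if (id >= TOY_FUNC_CNT || !toy_helpers[id])
		return -1;
	return toy_helpers[id](a, b);
}

/*
 * "Program" side: the ID is smuggled through a function pointer whose
 * value is the ID itself. Calling this pointer directly in user space
 * would crash; it exists only to carry the constant.
 */
static long (*const toy_add_decl)(long, long) =
	(long (*)(long, long))(uintptr_t)TOY_FUNC_add;

/* Recover the ID from such a declaration. */
unsigned int toy_id_from_ptr(long (*fn)(long, long))
{
	return (unsigned int)(uintptr_t)fn;
}
```

In this model, replacing the pointer declaration with any other way of naming ID 1 leaves toy_call() and the table untouched, which is the sense in which the declaration style is a toolchain convention rather than part of the call contract.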
On Wed, Jan 04, 2023 at 03:25:00PM +0100, Daniel Borkmann wrote: > On 1/4/23 12:51 AM, Alexei Starovoitov wrote: > > On Tue, Jan 03, 2023 at 12:43:58PM +0100, Daniel Borkmann wrote: > > > On 12/31/22 1:42 AM, Alexei Starovoitov wrote: > > > > On Fri, Dec 30, 2022 at 03:00:21PM -0600, David Vernet wrote: > > > > > > > > > > > > > > Taking bpf_get_current_task() as an example, I think it's better to have > > > > > > > the debate be "should we keep supporting this / are users still using > > > > > > > it?" rather than, "it's UAPI, there's nothing to even discuss". The > > > > > > > point being that even if bpf_get_current_task() is still used, there may > > > > > > > (and inevitably will) be other UAPI helpers that are useless and that we > > > > > > > just can't remove. > > > > > > > > Sorry, missed this question in the previous reply. > > > > The answer is "it's UAPI, there's nothing to even discuss". > > > > It doesn't matter whether bpf_get_current_task() is used heavily or not used at all. > > > > The chance of breaking user space is what paralyzes the changes. > > > > Any change to uapi header file is looked at with a magnifying glass. > > > > There is no deprecation story for uapi. > > > > The definition and semantics of bpf helpers are frozen _forever_. > > > > And our uapi/bpf.h is not in a good company: > > > > ls -Sla include/uapi/linux/|head > > > > -rw-r--r-- 1 ast users 331159 Nov 3 08:32 nl80211.h > > > > -rw-r--r-- 1 ast users 265312 Dec 25 13:51 bpf.h > > > > -rw-r--r-- 1 ast users 118621 Dec 25 13:51 v4l2-controls.h > > > > -rw-r--r-- 1 ast users 99533 Dec 25 13:51 videodev2.h > > > > -rw-r--r-- 1 ast users 86460 Nov 29 11:15 ethtool.h > > > > > > > > "Freeze bpf helpers now" is a minimum we should do right now. > > > > We need to take aggressive steps to freeze the growth of the whole uapi/bpf.h > > > > > > Imho, freezing BPF helpers now is way too aggressive step. 
One aspect which was > > > not discussed here is that unstable kfuncs will be a pain for user experience > > > compared to BPF helpers. Probably not for FB or G who maintain their own limited > > > set of kernels, but for all others. If there is a valid reason that kfuncs will have > > > to change one way or another, then BPF applications using them will have to carry > > > the maintenance burden on their side to be able to support a variety of kernel > > > versions by working around the kfunc quirks. So you're essentially outsourcing > > > the problem from kernel to users, which will suck from a user experience perspective (and add > > > to development cost on their side). > > > > It's actually the opposite. > > A small company that wants to use BPF needs to have a workaround/plan B for > > different kernels and different distros. > > That's why cilium and others have to detect availability of helpers and bpf features. > > One bpf prog for newer kernel and potentially completely different solution > > for older kernels. > > That's the biggest obstacle in bpf adoption: the required features are in > > the latest kernels, but companies have to support older kernels too. > > Now look at the problem from a different angle: > > Detecting kfuncs is no different than detecting helpers. > > The bpf user has to have a workaround when helper/kfunc is not available. > > In that sense stability of the helpers vs instability of kfuncs is irrelevant. > > Both might not exist in a particular kernel. > > So if cilium starts to use kfunc it won't be extra development cost and > > bpf program writer experience using kfuncs vs using helpers is the same as well. > > But that was not the point I was making. What you describe above is the baseline > cost which is there regardless of BPF helper vs kfunc: detecting availability > and having a workaround for older kernel if needed.
The added cost is if kfunc > changes over time for whichever valid reason, then you are essentially pushing > the maintenance cost _from kernel to users_ when they need to keep track of that > and implement workarounds specifically to make the kfunc work in their program > for a set of kernels they plan to support, which they otherwise would /not/ have > if it was a BPF helper. It raises the barrier from user side. Similarly, if users > started out with using kfunc from a base kernel, and in future it might get > removed given it's not stable, then a workaround (if possible) needs to be > implemented for newer kernels - probably rare occasion but not impossible or > something that can be ruled out entirely. In theory it all makes sense assuming that kernel devs keep changing kfuncs to make users suffer. You're painting the kernel as malicious towards users whereas in reality it's exactly the opposite. When we add a kfunc we think just as hard about its usefulness. We don't have a deprecation strategy yet and that's the point I'm making: while we think about helpers as the only stable medium we won't be making progress in kfunc deprecation and kfunc stability areas. > So the stability of the helpers vs > instability of kfuncs is relevant in that case, not for the case you describe > above, and that is extra development cost on user side. Generally, what I'm saying > is, there needs to be a path forward where we are still open for both instead of > completely freezing the former. 'extra development cost on user side'... in theory. None of it happened in practice yet. kfuncs are the best answer to uapi rigidness we have. Maybe years from now we realize that the kfunc mechanism sucks too and we will replace it with something else. It's a possibility and opportunity to make our own decisions and fix our mistakes, whereas uapi rules we cannot change. > > But with kfuncs we can solve this bpf adoption issue.
> > The helpers are not easily backportable and cannot be added in modules, > > so company's workarounds for older kernel are painful. > > While kfuncs are trivially added in a module. > > Maybe to a small degree. Often shipping out-of-tree kernel module is generally > a no-go from corp policy and there's nothing you can do about it in such case. Often yes, but in many cases the customers are ok with additional ko-s when it's clear that there is a path forward to upstream the ko's functionality. In this case the kfuncs will be already upstream, so selling out-of-tree ko that implements what's already upstream is much easier. > "trivially added" is a bit oversimplified as well.. depends on the kfunc of course, > but potentially painful in terms of having to work around various changing kernel > internals for your kfunc implementation and only possible if kernel actually exposes > the needed functionality to modules. While the adoption issue /can/ in some cases be > solved, I don't think it will be widely practical to solve adoption issue. Eventually > only time will solve it when everyone is on decent enough kernel as baseline, this > is what is there today at least for networking and tracing side where BPF is widely > adopted and its available framework big enough to solve many use cases. Of course. The verification of kfuncs still rapidly evolves. Today we cannot claim that 6.1 kernel will be a stable base and the model of 'kfuncs in extra ko' will work from now on. The point that we need to stop thinking about helpers as the only stable option we have and align all our efforts behind kfuncs, define deprecation and stability rules. > Aside and independent of all that, kfuncs added in out of tree modules should be > discouraged. After all we want developers to contribute back to upstream kernel, > and for a very long time we've had the stance that no extra functionality should be > possible via out of tree module extensions. Right. 
That model worked until windows came along and started defining their own stable helpers with different func_id numbers. Now if cilium wants to run on linux and windows it still needs to use different bpf_helper_defs.h. The C code stays largely the same, but the numbers change and their semantics between OSes likely differ just enough to be annoying long term. The point is the stability of helpers is relative.

> > Let's take bpf_sock_destroy that Aditi wants to add as an example. If it's done as a helper the cilium would need to wait for the next kernel release and next distro release some years from now to actually use it at the customer site.
>
> Yeap, with some distros in K8s space being better than others, for example, some like Flatcar tend to be fairly up to date. With major LTS ones it takes 1+ years though.
>
> > If bpf_sock_destroy is added as kfunc you can ship an extra kernel module with just that kfunc to your customers. You can also attempt to convince a distro that this module with kfuncs should be certified, since the same kfunc is in upstream kernel. The customer can use cilium that relies on bpf_sock_destroy much sooner and likely there won't be a need to develop a completely different workaround for kernels without that kfunc.
>
> See above wrt modules. Some larger users which run their own DC infra also build kernels for themselves, so in some cases it's possible and easier from corp policy PoV to just cherry-pick upstream commits and roll them into their own kernel build until they upgrade at some point to a base kernel where this comes by default. Some of the distro vendors build "hw enablement" kernels for cloud providers and there it is possible too to ask for backports on core functionality even if not in stable, it's a slow process however.

Right. Redhat is backporting quite a bit of upstream bpf features into their official kernels and that's great. With kfuncs in ko-s it will become much easier.
No need to validate the whole kernel. The QA effort is smaller, code reviews are easier, etc. The kfuncs in ko-s will be easier on support team too, since any kernel crash is easier to attribute. "pls unload kfunc-ko and repeat your work".

>
> [...]
> > > Ofc there is interest in keeping changes to a minimum, but it's not the same as BPF helpers where there is a significantly higher guarantee that things continue to keep working going forward. Today in Cilium we don't use any of the kfuncs, we might at some point when we see it necessary, but likely to a limited degree if sth cannot be solved as-is and only kfunc is present as a solution. But again, from a UX it's not great having to know that things can break anytime soon with newer kernels (things might already with verifier/LLVM upgrade and kfunc potentially adds yet another level). Generally speaking, I'm not against kfuncs but I suggest only making "freeze bpf helpers now" a soft freeze with a path forward for promoting some of the kfuncs which have been around and matured for a while and didn't need changes as stable BPF helpers to indicate their maturity level when we see it fit. So it's not a hard "no", but possible promotion when suitable.
> >
> > The problem with 'soft' freeze is that it's open to interpretation and abuse. It feels to me you're saying that cilium is not using kfuncs and therefore all cilium features additions are ok to be done as helpers. That doesn't sound fair to other bpf devs.
>
> I think you misread, let's not twist what I mentioned. All I was saying is that we should keep the door open for both to continue to co-exist; both have a place, both come with their advantages but also baggage. It's not that one is absolutely better than the other, and that maintenance baggage is either on our side or pushed towards users.
>
> [...]
> > > Discoverability plus being able to know semantics from a user PoV to figure out when > > > workarounds for older/newer kernels are required to be able to support both kernels. > > > > Sounds like your concern is that there could be a kfunc that changed it semantics, > > but kept exact same name and arguments? Yeah. That would be bad, but we should prevent > > such patches from landing. It's up to us to define sane and user friendly deprecation of kfuncs. > > Yes, that is a concern. New kfunc and deprecation with eventual removal of the old > one might be better in such case, agree. > > [...] > > > is imho repeating the same story as BPF helpers vs kfuncs. Saying a kfunc is 'pretty > > > stable' is kind of hinting to users that it's close to UAPI, but yet it's unstable. > > > > correct. > > > > > It'll confuse even more. I'd rather have a path forward where those kfuncs get promoted > > > > why confuse more? There are EXPORT_SYMBOL like kmalloc that are quite stable, > > yet they can change. > > EXPORT_SYMBOL_GPL is exact analogy to kfunc. > > They are quite stable because they are used in lots of places in-tree and changing > would cause a ton of needless churn and merge conflicts for everyone, etc. You might > not always have this kind of visibility on usage of kfuncs. The data you have is > from your internal code base and what's in some of the larger OSS projects, but > certainly a more limited/biased view. So as with 'soft' freeze this is just as well open > to interpretation. "confuse more" because you declare it quite stable, yet not stable. > Why is there fear to make them proper uapi then with the given known guarantees? From > user side this guarantee is a good thing, not a bad thing. Mistakes were/are made all > the time and learned from. Imagine syscall API is not stable anymore. Would you invest > the cost to develop an application against it? Would you invest in developing application against unstable syscall API? Absolutely. 
People develop all tons of stuff on top of fuse-fs. People develop apps that interact with tracing bpf progs that are clearly unstable. They do suffer when kernel side changes and people accept that cost. BPF and tracing in general contributed to that mind change. In a datacenter quite a few user apps are tied to kernel internals. > Imho, it's one of BPF's strengths and > we should keep the door open, not close it. The strength of BPF was and still is that it has both stable and unstable interfaces. Roughly: networking is stable, tracing is unstable. The point is that to be stable one doesn't need to use helpers. We can make kfuncs stable too if we focus all our efforts this way and for that we need to abandon adding helpers though it's a pain short term. > > > > to actual BPF helpers by then where we go and say, that kfunc has proven itself in production > > > and from an API PoV that it is ready to be a proper BPF helper, and until this point > > > > "Proper BPF helper" model is broken. > > static void *(*bpf_map_lookup_elem)(void *map, const void *key) = (void *) 1; > > > > is a hack that works only when compiler optimizes the code. > > See gcc's attr(kernel_helper) workaround. > > This 'proper helper' hack is the reason we cannot compile bpf programs with -O0. > > And because it's uapi we cannot even fix this > > With kfuncs we will be able to compile with -O0 and debug bpf programs with better tools. > > These tools don't exist yet, but we have a way forward whereas with helpers > > we are stuck with -O2. > > Better debugging tools are needed either way, independent of -O0 or -O2. I don't > think -O0 is a requirement or barrier for that. It may open up possibilities for > new tools, but production is still running with -O2. Proper BPF helper model is > broken, but everyone relies on it, and will be for a very very long time to come, > whether we like it or not. 
There is a larger ecosystem around BPF devs outside of > kernel, and developers will use the existing means today. There are recommendations / > guidelines that we can provide but we also don't have control over what developers > are doing. Yet we should make their life easier, not harder. Fully fleshed out kfunc infra will make developers job easier. No one is advocating to make users suffer.
On Wed, Jan 04, 2023 at 10:43:37AM -0800, Andrii Nakryiko wrote: > > extern bool bpf_dynptr_is_null(const struct bpf_dynptr *p) __ksym; > > > > will likely work with both gcc and clang. > > And if it doesn't we can fix it. > > > > While when gcc folks saw helpers: > > > > static bool (*bpf_dynptr_is_null)(const struct bpf_dynptr *p) = (void *) 777; > > > > they realized that it is a hack that abuses compiler optimizations. > > They even invented attr(kernel_helper) to workaround this issue. > > After a bunch of arguing gcc added support for this hack without attr, > > but it's going to be around forever... in gcc, in clang and in kernel. > > It's something that we could have fixed if it wasn't for uapi. > > Just one more example of unfixable mistake that causing issues > > to multiple projects. > > That's the core issue of kernel uapi rules: inability to fix mistakes. > > This is BPF ISA defining `call #N;` to call helper with ID N, which > you agree that it (ISA) has to be stable, documented and standardized, > right? > > Everything else is just how we expose those constants into C code and > how libbpf deals with them. Libbpf could support new attribute or even > extern-based convention, if necessary. > > But it wasn't necessary for years and only was brought up during GCC's > attempt to invent a new convention here. And they successfully dealt > with this challenge. 'dealt with this challenge'? You mean didn't, right? gcc doesn't guarantee that '= (void *) 777;' will work even with optimization on. In clang we cannot guarantee that either. Nothing requires a compiler to do constant propagation. > > Yes, we won't change existing helpers, but we can add new ones if we > need to extend them. That's how APIs work. Yes, they need careful > considerations when designing and implementing new APIs. Yes, mistakes > do happen, that's just fact of life and par for the course of software > development. Yes, we have to live with those mistakes. Nothing changed > about that. 
>
> But somehow libraries and kernel still produce stable APIs and maintain them because they clearly provide benefits to end users.

Did you 'live with mistakes done in libbpf 0.x'? No. You've introduced libbpf 1.0 with incompatible api and some users suffered.

> We'll get the same amount of flame when we try to change kfunc that's widely adopted.

Of course. That's why we need to define a stability and deprecation plan for them.
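For readers following the declaration-style argument above, here is a minimal host-compilable sketch of the two forms being compared. It is an illustration only, not libbpf's actual header: `__ksym` is defined empty here so an ordinary compiler accepts it (in libbpf it expands to a section attribute), the name `bpf_dynptr_is_null_helper` is made up to avoid clashing with the extern, and 777 is the placeholder ID used in the message, not a real helper number.

```c
#include <stdbool.h>
#include <stdint.h>

struct bpf_dynptr;              /* opaque for this sketch */

/* Assumption: empty on a host compiler. libbpf defines __ksym as a
 * section attribute so the loader can resolve the symbol by name. */
#define __ksym

/* Helper style: a function pointer whose "address" is really the helper
 * ID. A call through it only becomes `call 777` if the compiler folds
 * the constant into the call site. */
static bool (*bpf_dynptr_is_null_helper)(const struct bpf_dynptr *p) =
	(void *) 777;

/* kfunc style: an ordinary extern declaration, resolved at load time.
 * It is never called here, so no definition is needed to link. */
extern bool bpf_dynptr_is_null(const struct bpf_dynptr *p) __ksym;

/* Recover the numeric ID hidden inside the helper "pointer". */
uintptr_t helper_id(void)
{
	return (uintptr_t)bpf_dynptr_is_null_helper;
}
```

The point of the sketch is only that the first form carries an integer disguised as a pointer, while the second is a regular symbol reference that works the same at any optimization level.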
On Wed, Jan 04, 2023 at 10:43:52AM -0800, Andrii Nakryiko wrote:
>
> struct bpf_dynptr dptr = ...;
> bool is_null = false;
>
> if (bpf_core_enum_value_exists(enum bpf_func_id, BPF_FUNC_dynptr_is_null)) {
>         is_null = bpf_dynptr_is_null(&dptr);
> } else {
>         struct bpf_dynptr_kern *kdptr = (void*)&dptr;
>         is_null = !!BPF_CORE_READ(kdptr, data);
> }
>
> How do you detect the existence of kfunc today? Preferably without doing extra work in user-space.
>
> Now, let's say kfunc changes its signature. Show me a short example on how you deal with that in BPF C code?

Didn't we add bpf_core_type_matches for func protos specifically to deal with function signature changes in the kernel after tracepoint args got swapped? I'm assuming the same mechanism will work for kfuncs. If not we can come up with a new one.

>
> Think about sched_ext. Right now it's so bleeding edge that you have to assume the very latest and freshest kernel code. So you know all the kfuncs that you need should exist otherwise sched_ext doesn't work at all. Ok, happy place.
>
> Now a year or two passes by. Some kfuncs are added, some are changed. We still believe that BPF CO-RE (compile once - run everywhere) is good and we don't want to compile and distribute multiple versions of BPF application, right? You'll want to do some extra (or more performant) stuff if kernel is recent and has some new kfunc, but fallback to some default suboptimal behavior otherwise. How do you do that in a simple and straightforward way?

With the help of CORE, of course. If it doesn't exist today we can add it.

> But even worse is what if some critical kfunc is changed between kernel versions and you do *need* to support both versions. Think about those aspects, because sched_ext will run into them almost inevitably soon after its inclusion into kernel.
>
>
> One way or another there are some technical solutions of various degrees of creativity.
And I'm actually not sure if I have a solution > for kfunc signature change at all. Without BTF we could use two > separate .c files and statically link them together, which would work > because extern is untyped in pure C. But with BPF static linking we do > have BTF information for each extern, and those BTF types will be > incompatible for the same extern func. > > We can probably come up with some hacks and conventions, as usual, but > better start thinking about them now. > > But hopefully you can empathize a bit more with poor end users that > have to do hack like this and why having bpf_dynptr API defined as > stable BPF helpers, with no extra dependencies on BTF in kernel, BTF is a reasonable dependency. You've just used it to detect whether helper exists or not. So it's fine to use the same to check whether kfunc exists or not. > > Depends on perspective. If I was some humble dev trying to build > BPF-based tool that should work on x86, arm64, s390x, and riscv (or > whatever other architecture), and dynptr API is only based on kfuncs, > I'm screwed. I can't sponsor or do kfunc support for my favorite > architecture, I'm stuck waiting for this to be done by someone some > time, if ever. If kfuncs and bpf trampoline don't work on a particular architecture that developer is likely screwed anyway. Dynptr is the last thing they would worry about.
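As a rough host-side analogy for the "does this kfunc exist?" question raised above, the sketch below uses a weak extern symbol and a fallback path. It assumes an ELF toolchain (GCC or Clang on Linux), where an unresolved weak function has a null address. All `demo_` names are hypothetical; in BPF C the comparable declaration would carry libbpf's `__ksym __weak` markers, assuming the loader and kernel in use support weak kfunc externs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel-internal view of a dynptr. */
struct demo_dynptr {
	void *data;
};

/* Weak declaration: if nothing provides this symbol at link time,
 * its address is NULL instead of causing a link error. */
extern bool demo_dynptr_is_null(const struct demo_dynptr *p)
	__attribute__((weak));

/* Use the "kfunc" when it exists, otherwise fall back to peeking at
 * the field directly, mirroring the helper/CO-RE fallback quoted above. */
bool dynptr_is_null_compat(const struct demo_dynptr *p)
{
	if (demo_dynptr_is_null)
		return demo_dynptr_is_null(p);
	return p->data == NULL;
}
```

Both branches give the same answer for the caller; which one runs depends only on whether the symbol was available, which is the property the thread is asking for.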
On Wed, Jan 04, 2023 at 10:44:07AM -0800, Andrii Nakryiko wrote:
> >
> > Agree that any hard policy like 'only kfuncs from now on' gotta have its limits. Maybe there will be a strong reason to add a new helper one day, so we can keep the door open a tiny bit for an exception, but for dynptr... There are kfuncs with dynptr already (bpf_verify_pkcs7_signature) So precedent is already made.
>
> bpf_verify_pkcs7_signature() is using dynptr as a pointer to memory. It's a totally valid and intended use case, to pass memory area of statically unknown size, yes.
>
> But that's very different from having basic dynptr helpers like is_null() and trim/advance as kfunc. Such helpers are stable, they manipulate generic attributes of dynptr: size, offset, underlying memory pointer. There is nothing unstable and potentially changing about them.

dynptr is defined in uapi as:

    struct bpf_dynptr {
            __u64 :64;
            __u64 :64;
    } __attribute__((aligned(8)));

So sizes, offset and memory pointer are not stable today and there is no need to stabilize this part of it.

> From original exchange:
>
> > > > So just because there is no perfect way to handle all the SKB/XDP physical non-contiguity, doesn't mean that the dynptr concept itself is flawed or not well thought out. It's just
> > >
> > > I think that's exactly what it means. dynptr concept is flawed.
>
> Must be a lot of typos in here ;) because as written it clearly states that the whole concept of dynptr is flawed.

Maybe we will realize a year from now that it is? We already have some exposure of dynptr in uapi. I think it's a safer bet to keep it to the minimum.
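To make the "opaque in uapi" point concrete, here is a small host-compilable sketch of the same layout. It assumes GCC or Clang (a struct made only of unnamed bit-fields is a compiler extension), uses `uint64_t` in place of the kernel's `__u64`, and the `_sketch` suffix marks the type as an illustration rather than the real uapi definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as the uapi definition quoted above: 16 bytes of storage,
 * no named members, so user code cannot touch size, offset or the
 * data pointer directly. */
struct bpf_dynptr_sketch {
	uint64_t :64;
	uint64_t :64;
} __attribute__((aligned(8)));

/* Only the footprint is part of the contract in this sketch. */
_Static_assert(sizeof(struct bpf_dynptr_sketch) == 16,
	       "two opaque 64-bit slots");
_Static_assert(_Alignof(struct bpf_dynptr_sketch) == 8,
	       "explicit 8-byte alignment");

size_t dynptr_sketch_size(void)
{
	return sizeof(struct bpf_dynptr_sketch);
}
```

Since nothing inside the struct is named, any accessor for size, offset or data has to go through a function, whether that function is a helper or a kfunc.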
On Wed, Jan 04, 2023 at 10:59:15AM -0800, Andrii Nakryiko wrote: > > > >> to actual BPF helpers by then where we go and say, that kfunc has proven itself in production > > >> and from an API PoV that it is ready to be a proper BPF helper, and until this point > > > > > > "Proper BPF helper" model is broken. > > > static void *(*bpf_map_lookup_elem)(void *map, const void *key) = (void *) 1; > > > > > > is a hack that works only when compiler optimizes the code. > > > See gcc's attr(kernel_helper) workaround. > > > This 'proper helper' hack is the reason we cannot compile bpf programs with -O0. > > > And because it's uapi we cannot even fix this > > > With kfuncs we will be able to compile with -O0 and debug bpf programs with better tools. > > > These tools don't exist yet, but we have a way forward whereas with helpers > > > we are stuck with -O2. > > > > But specifically about how the BPF helper model is broken, that's at > least an exaggeration. BPF helper call is defined at BPF ISA level, it > has to be a `call <some constant>;`, and as long as compiler generates > such code, it's all good. From C standpoint UAPI is just a function > call: > > bpf_map_lookup_elem(&map, ...); > > As long as this compiles and generates proper `call 1;` assembly > instruction, we are good. If/when both Clang and GCC support an > alternative way to define helper and not as a static func pointer, -O0 > builds (at least in the aspect of calling BPF helpers, I suspect other > stuff will break still) will just work. And what's better, > bpf_helper_defs.h would be able to pick the best option based on > compiler's support with end users not having to care or notice the > difference. Right and that's what gcc did with attribute((kernel_helper(1)), but we didn't like it because gcc and clang would diverge. Now you're arguing it's just a bpf_helper_defs.h change and we should have allowed it? 
Also consider that 'call <some constant>', or more precisely 'call absolute_address', as an instruction exists in only one CPU architecture. It's BPF ISA. It's a mistake that I made 8 years ago and the inability to fix it bothers me. Now we have 100 times more developers than we had 8 years ago. I expect 100 times more UAPI and ABI mistakes. Minimizing unfixable mistakes is what I'm after.
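For reference, the sketch below shows how a helper call and a kfunc call differ at the instruction level. It uses a local mirror of the kernel's `struct bpf_insn` layout and locally defined constants with an `SK_` prefix (the values match `BPF_JMP`, `BPF_CALL` and `BPF_PSEUDO_KFUNC_CALL` from the uapi header); a real program would include `<linux/bpf.h>` instead. The BTF id passed in the usage is an arbitrary example value.

```c
#include <stdint.h>

/* Local mirror of the 8-byte BPF instruction layout (illustration only). */
struct insn_sketch {
	uint8_t code;        /* opcode */
	uint8_t dst_reg:4;   /* destination register */
	uint8_t src_reg:4;   /* source register, or a pseudo-call marker */
	int16_t off;         /* signed offset */
	int32_t imm;         /* signed immediate */
};

#define SK_BPF_JMP               0x05
#define SK_BPF_CALL              0x80
#define SK_BPF_PSEUDO_KFUNC_CALL 2

/* Helper call: the immediate is the fixed helper ID from the uapi enum. */
struct insn_sketch helper_call(int32_t helper_id)
{
	struct insn_sketch insn = {
		.code = SK_BPF_JMP | SK_BPF_CALL,
		.imm = helper_id,
	};
	return insn;
}

/* kfunc call: same opcode, but src_reg marks it as a kfunc and the
 * immediate is a BTF id resolved at load time, not a frozen number. */
struct insn_sketch kfunc_call(int32_t btf_id)
{
	struct insn_sketch insn = {
		.code = SK_BPF_JMP | SK_BPF_CALL,
		.src_reg = SK_BPF_PSEUDO_KFUNC_CALL,
		.imm = btf_id,
	};
	return insn;
}
```

In this encoding the helper number is baked into the program, which is why it has to stay stable, while the kfunc immediate is filled in per kernel by the loader.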
On Wed, Jan 04, 2023 at 03:25:00PM +0100, Daniel Borkmann wrote: > On 1/4/23 12:51 AM, Alexei Starovoitov wrote: > > On Tue, Jan 03, 2023 at 12:43:58PM +0100, Daniel Borkmann wrote: > > > On 12/31/22 1:42 AM, Alexei Starovoitov wrote: > > > > On Fri, Dec 30, 2022 at 03:00:21PM -0600, David Vernet wrote: > > > > > > > > > > > > > > Taking bpf_get_current_task() as an example, I think it's better to have > > > > > > > the debate be "should we keep supporting this / are users still using > > > > > > > it?" rather than, "it's UAPI, there's nothing to even discuss". The > > > > > > > point being that even if bpf_get_current_task() is still used, there may > > > > > > > (and inevitably will) be other UAPI helpers that are useless and that we > > > > > > > just can't remove. > > > > > > > > Sorry, missed this question in the previous reply. > > > > The answer is "it's UAPI, there's nothing to even discuss". > > > > It doesn't matter whether bpf_get_current_task() is used heavily or not used at all. > > > > The chance of breaking user space is what paralyzes the changes. > > > > Any change to uapi header file is looked at with a magnifying glass. > > > > There is no deprecation story for uapi. > > > > The definition and semantics of bpf helpers are frozen _forever_. > > > > And our uapi/bpf.h is not in a good company: > > > > ls -Sla include/uapi/linux/|head > > > > -rw-r--r-- 1 ast users 331159 Nov 3 08:32 nl80211.h > > > > -rw-r--r-- 1 ast users 265312 Dec 25 13:51 bpf.h > > > > -rw-r--r-- 1 ast users 118621 Dec 25 13:51 v4l2-controls.h > > > > -rw-r--r-- 1 ast users 99533 Dec 25 13:51 videodev2.h > > > > -rw-r--r-- 1 ast users 86460 Nov 29 11:15 ethtool.h > > > > > > > > "Freeze bpf helpers now" is a minimum we should do right now. > > > > We need to take aggressive steps to freeze the growth of the whole uapi/bpf.h > > > > > > Imho, freezing BPF helpers now is way too aggressive step. 
One aspect which was > > > not discussed here is that unstable kfuncs will be a pain for user experience > > > compared to BPF helpers. Probably not for FB or G who maintain they own limited > > > set of kernels, but for all others. If there is valid reason that kfuncs will have > > > to change one way or another, then BPF applications using them will have to carry > > > the maintenance burden on their side to be able to support a variety of kernel > > > versions with working around the kfunc quirks. So you're essentially outsourcing > > > the problem from kernel to users, which will suck from a user experience (and add > > > to development cost on their side). > > > > It's actually the opposite. > > A small company that wants to use BPF needs to have a workaround/plan B for > > different kernels and different distros. > > That's why cilium and others have to detect availability of helpers and bpf features. > > One bpf prog for newer kernel and potentially completely different solution > > for older kernels. > > That's the biggest obstacle in bpf adoption: the required features are in > > the latest kernels, but companies have to support older kernels too. > > Now look at the problem from different angle: > > Detecting kfuncs is no different than detecting helpers. > > The bpf users has to have a workaround when helper/kfunc is not available. > > In that sense stability of the helpers vs instability of kfuncs is irrelevant. > > Both might not exist in a particular kernel. > > So if cilium starts to use kfunc it won't be extra development cost and > > bpf program writer experience using kfuncs vs using helpers is the same as well. > > But that was not the point I was making. What you describe above is the baseline > cost which is there regardless of BPF helper vs kfunc.. detecting availability > and having a workaround for older kernel if needed. 
> The added cost is if kfunc changes over time for whichever valid reason, then you are essentially pushing

But if there is a "valid reason" to change something, then it's better to have the _option_ to change it, no? IMHO that's the key point here. With kfuncs, "reasons" are allowed to be part of the discussion. With UAPI, there is nothing to discuss.

And that's the fundamental problem with having things in UAPI. Forever is a very long time. Do we really not want to have the option of changing or removing something after (e.g.) 20 years? 40 years? 60 years?

I agree with you that it's unambiguous that using kfuncs instead of helpers does shift some maintenance cost from the kernel to users, but IMO the point is that with kfuncs we at least have the ability to control that cost. Taking an extreme example, we could decide to support a kfunc for 30 years, and then deprecate it for 10 years, and then finally remove it. With UAPI, our children's children's children will have to support it. I don't think guaranteed stability is worth that cost. Not for symbols exported by the kernel, used by other kernel programs, which is fundamentally what BPF programs are.

Another way to look at it would be: do we expect tooling to support all kernel versions and features indefinitely? When we're on Linux 50.15, do we expect that there will be tooling that requires us to support bpf_get_current_task() instead of bpf_get_current_task_btf()? And even if there is a tool that needs it, is it worth the cost of keeping it around? With kfuncs the question would matter, even if it's "yes it's worth it". With UAPI, the question is meaningless.

I realize that I'm being a bit hyperbolic here, and it is not my intention to misrepresent any points made in favor of not freezing UAPI. I just think it's necessary to be hyperbolic when it comes to UAPI to really underscore the implications of using it.
There are very good reasons for having UAPI in general, but IMHO, those reasons don't apply to kernel programs, which is really what we're talking about here. > the maintenance cost _from kernel to users_ when they need to keep track of that > and implement workarounds specifically to make the kfunc work in their program > for a set of kernels they plan to support, which they otherwise would /not/ have > if it was a BPF helper. It raises the barrier from user side. Similarly, if users > started out with using kfunc from a base kernel, and in future it might get > removed given its not stable, then a workaround (if possible) needs to be > implemented for newer kernels - probably rare occasion but not impossible or > something that can be ruled out entirely. So the stability of the helpers vs > instability of kfuncs is relevant in that case, not for the case you describe > above, and that is extra development cost on user side. Generally, what I'm saying > is, there needs to be a path forward where we are still open for both instead of > completely freezing the former. Curious what you envision as the policy long term (i.e. after the path forward)? The reason I ask is that on the one hand we're claiming that kfuncs work for some things, while on the other we seem to be claiming that UAPI is _necessary_ for users to have guaranteed stability and adopt the platform (and I will preemptively apologize if I'm unintentionally misrepresenting your view by saying that). If we operate under the assumption that helpers are necessary for certain things due to its stability guarantees, whereas kfuncs are appropriate in some cases, I think that begs the question: what criteria are we using to decide when stability is really necessary? We could say "for core functionality", but how do we know that there aren't other users out there who are using "non-core-functionality" kfuncs instead of helpers? Why do we give stability to some users but not others? 
The fact that we don't have a crystal ball seems to be the central argument around why we need UAPI, but I think it's a fallacy to have that view at the same time as also supporting the existence of kfuncs. [...] > > > Discoverability plus being able to know semantics from a user PoV to figure out when > > > workarounds for older/newer kernels are required to be able to support both kernels. > > > > Sounds like your concern is that there could be a kfunc that changed it semantics, > > but kept exact same name and arguments? Yeah. That would be bad, but we should prevent > > such patches from landing. It's up to us to define sane and user friendly deprecation of kfuncs. > > Yes, that is a concern. New kfunc and deprecation with eventual removal of the old > one might be better in such case, agree. Agreed. With kfuncs, say that the scenario described comes to pass. We could have a hypothetical deprecation policy like the following: 1. Add the new kfunc with the changed semantics, arguments, etc, under a different name. 2. Deprecate the old kfunc for X years / releases, where X is whatever conservative deprecation value we deem appropriate (and one which we could always extend if need be). 3. Once we feel we're ready to remove the old kfunc, we remove it, rename the new (now old) kfunc from (1) to that name, and then keep the temporary name from the new-old kfunc in (1) as a wrapper / alias around it. That temporary alias can itself then be deprecated and removed after X years. All of this is carefully orchestrated, and we have the flexibility to be as conservative as we'd like in support of users. Maybe we decide that we can never stop supporting the original kfunc because it's too ubiquitous. It will surely depend on the policy we end up crafting for kfuncs, and will probably sometimes require a case-by-case determination, but at least we'll have the flexibility to choose. > > [...] > > > is imho repeating the same story as BPF helpers vs kfuncs. 
Saying a kfunc is 'pretty > > > stable' is kind of hinting to users that it's close to UAPI, but yet it's unstable. > > > > correct. > > > > > It'll confuse even more. I'd rather have a path forward where those kfuncs get promoted > > > > why confuse more? There are EXPORT_SYMBOL like kmalloc that are quite stable, > > yet they can change. > > EXPORT_SYMBOL_GPL is exact analogy to kfunc. > > They are quite stable because they are used in lots of places in-tree and changing > would cause a ton of needless churn and merge conflicts for everyone, etc. You might > not always have this kind of visibility on usage of kfuncs. The data you have is > from your internal code base and what's in some of the larger OSS projects, but > certainly a more limited/biased view. So as with 'soft' freeze this is just as well open > to interpretation. "confuse more" because you declare it quite stable, yet not stable. > Why is there fear to make them proper uapi then with the given known guarantees? From > user side this guarantee is a good thing, not a bad thing. Mistakes were/are made all > the time and learned from. Imagine syscall API is not stable anymore. Would you invest > the cost to develop an application against it? Imho, it's one of BPF's strengths and > we should keep the door open, not close it. But we're talking about _kernel_ programs here, not user programs. And from that perspective, one could argue that having kfuncs actually promotes more upstreaming of BPF programs for the exact reasons you're spelling out here, just as EXPORT_SYMBOL_GPL promotes the upstreaming of modules. Of course, it won't be the exact same as EXPORT_SYMBOL_GPL because we'll still come up with a well documented, reliable deprecation story, but the benefits of upstreaming the BPF program still apply. In general, I think BPF programs and the syscall layer is really an apples and oranges comparison. The kernel has internally never had a stable interface as Greg describes in [0]. 
I don't see why we'd frame BPF programs differently than any other kernel program in that regard. [0]: https://www.kernel.org/doc/Documentation/process/stable-api-nonsense.rst > > > to actual BPF helpers by then where we go and say, that kfunc has proven itself in production > > > and from an API PoV that it is ready to be a proper BPF helper, and until this point > > > > "Proper BPF helper" model is broken. > > static void *(*bpf_map_lookup_elem)(void *map, const void *key) = (void *) 1; > > > > is a hack that works only when compiler optimizes the code. > > See gcc's attr(kernel_helper) workaround. > > This 'proper helper' hack is the reason we cannot compile bpf programs with -O0. > > And because it's uapi we cannot even fix this > > With kfuncs we will be able to compile with -O0 and debug bpf programs with better tools. > > These tools don't exist yet, but we have a way forward whereas with helpers > > we are stuck with -O2. > > Better debugging tools are needed either way, independent of -O0 or -O2. I don't > think -O0 is a requirement or barrier for that. It may open up possibilities for I personally disagree that not being able to support -O0 is sane for a debugging tool, but IMHO that's not the main point. Rather, it's that what we have now is kind of a mess (I think we're all in agreement on that?), and we can never fix it because of UAPI. IMO, that is a sign that things need to change. > new tools, but production is still running with -O2. Proper BPF helper model is > broken, but everyone relies on it, and will be for a very very long time to come, > whether we like it or not. There is a larger ecosystem around BPF devs outside of > kernel, and developers will use the existing means today. There are recommendations / > guidelines that we can provide but we also don't have control over what developers > are doing. Yet we should make their life easier, not harder. Better debugging > possibilities should cater to everyone. > > Thanks, > Daniel
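The hypothetical three-step deprecation policy outlined in the message above can be pictured with a small plain-C sketch of steps 1 and 2, where a new function is added under a temporary name and the old one is kept as a thin wrapper during the deprecation window. The names `demo_trim` and `demo_trim_v2`, their signatures and the extra `flags` argument are all invented for illustration; no such kfuncs are proposed in the thread.

```c
/* Step 1: the new variant with the changed signature lands under a
 * different (temporary) name. Returns the remaining size, or -1 if
 * the request exceeds the current size. */
int demo_trim_v2(unsigned int size, unsigned int len, unsigned int flags)
{
	(void)flags;            /* reserved for future behavior */
	if (len > size)
		return -1;
	return (int)(size - len);
}

/* Step 2: the old entry point stays available while deprecated and
 * simply forwards to the new one with default flags. */
int demo_trim(unsigned int size, unsigned int len)
{
	return demo_trim_v2(size, len, 0);
}
```

Step 3 would then swap the roles: the new implementation takes over the original name and the temporary name becomes the alias that is itself retired later.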
On Wed, Jan 4, 2023 at 11:44 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Wed, Jan 04, 2023 at 10:43:37AM -0800, Andrii Nakryiko wrote: > > > extern bool bpf_dynptr_is_null(const struct bpf_dynptr *p) __ksym; > > > > > > will likely work with both gcc and clang. > > > And if it doesn't we can fix it. > > > > > > While when gcc folks saw helpers: > > > > > > static bool (*bpf_dynptr_is_null)(const struct bpf_dynptr *p) = (void *) 777; > > > > > > they realized that it is a hack that abuses compiler optimizations. > > > They even invented attr(kernel_helper) to workaround this issue. > > > After a bunch of arguing gcc added support for this hack without attr, > > > but it's going to be around forever... in gcc, in clang and in kernel. > > > It's something that we could have fixed if it wasn't for uapi. > > > Just one more example of unfixable mistake that causing issues > > > to multiple projects. > > > That's the core issue of kernel uapi rules: inability to fix mistakes. > > > > This is BPF ISA defining `call #N;` to call helper with ID N, which > > you agree that it (ISA) has to be stable, documented and standardized, > > right? > > > > Everything else is just how we expose those constants into C code and > > how libbpf deals with them. Libbpf could support new attribute or even > > extern-based convention, if necessary. > > > > But it wasn't necessary for years and only was brought up during GCC's > > attempt to invent a new convention here. And they successfully dealt > > with this challenge. > > 'dealt with this challenge'? You mean didn't, right? > gcc doesn't guarantee that '= (void *) 777;' will work even with optimization on. I don't use gcc-bpf, but given they dropped kernel_helper attribute, and given you said "After a bunch of arguing gcc added support for this hack without attr but it's going to be around forever..." I assumed it does work. Are you saying it doesn't? > In clang we cannot guarantee that either. 
It works today, if it ever regresses there will be a lot of noise and this regression will be fixed. So maybe technically it's not guaranteed, but in practice it will keep working.

We had a `const volatile` case recently, variables were not being put into .rodata section properly. GCC was changed to do it the same way as Clang so that all the existing apps can keep working.

> Nothing requires a compiler to do constant propagation.
>
> >
> > Yes, we won't change existing helpers, but we can add new ones if we
> > need to extend them. That's how APIs work. Yes, they need careful
> > considerations when designing and implementing new APIs. Yes, mistakes
> > do happen, that's just fact of life and par for the course of software
> > development. Yes, we have to live with those mistakes. Nothing changed
> > about that.
> >
> > But somehow libraries and kernel still produce stable APIs and
> > maintain them because they clearly provide benefits to end users.
>
> Did you 'live with mistakes done in libbpf 0.x' ? No.

for a long time yes. And it's not apples to apples comparison, with library it is possible to deprecate APIs, which is what we did. With lots of work and gradual transition, but did it.

If we couldn't pull this through, yeah, I would live with whatever APIs are there. And added new ones as a better replacement. As is always done for APIs, nothing new here.

Within 0.x and 1.x APIs are stable and we live with them. This API stability fear doesn't paralyze libbpf development, we still add new stable APIs, if they are considered useful and thought through enough.

> You've introduced libbpf 1.0 with incompatible api and some users suffered.

By "suffered" you mean a few systemd folks being grumpy about this? And having to do 100 lines of code changes ([0]) to support two incompatible major versions of libbpf *simultaneously*?

On the other hand we got a library with saner error propagation behavior and various API normalizations and additions.
Not too bad of a trade off.

Sure, deprecation is not easy or free, there was a lot of prep work, and some users had to adjust their code to use new APIs. But this is quite a tangent.

  [0] https://github.com/systemd/systemd/pull/24511/

>
> > We'll get the same amount of flame when we try to change kfunc that's
> > widely adopted.
>
> Of course. That's why we need to define a stability and deprecation
> plan for them.

Lots of things that need to be defined and figured out, but we are already quick to freeze BPF helpers.
On Wed, Jan 4, 2023 at 11:51 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Jan 04, 2023 at 10:43:52AM -0800, Andrii Nakryiko wrote:
> >
> > struct bpf_dynptr dptr = ...;
> > bool is_null = false;
> >
> > if (bpf_core_enum_value_exists(enum bpf_func_id, BPF_FUNC_dynptr_is_null)) {
> >     is_null = bpf_dynptr_is_null(&dptr);
> > } else {
> >     struct bpf_dynptr_kern *kdptr = (void *)&dptr;
> >     is_null = !BPF_CORE_READ(kdptr, data);
> > }
> >
> > How do you detect the existence of kfunc today? Preferably without
> > doing extra work in user-space.
> >
> > Now, let's say kfunc changes its signature. Show me a short example on
> > how you deal with that in BPF C code?
>
> Didn't we add bpf_core_type_matches for func protos specifically
> to deal with function signature changes in the kernel after tracepoint
> args got swapped?
> I'm assuming the same mechanism will work for kfuncs.
> If not we can come up with a new one.

It would be good if someone actually tried that and saw if it works, and if it doesn't, came up with an approach that does. Right now I just see hand-wavy arguments that BPF helpers and BPF kfuncs are equivalent in this regard. Which currently I'm afraid they are not.

>
> >
> > Think about sched_ext. Right now it's so bleeding edge that you have
> > to assume the very latest and freshest kernel code. So you know all
> > the kfuncs that you need should exist otherwise sched_ext doesn't work
> > at all. Ok, happy place.
> >
> > Now a year or two passes by. Some kfuncs are added, some are changed.
> > We still believe that BPF CO-RE (compile once - run everywhere) is
> > good and we don't want to compile and distribute multiple versions of
> > BPF application, right? You'll want to do some extra (or more
> > performant) stuff if kernel is recent and has some new kfunc, but
> > fallback to some default suboptimal behavior otherwise. How do you do
> > that in a simple and straightforward way?
>
> with a help of CORE, of course.
> If it doesn't exist today we can add it. > > > But even worse is what if > > some critical kfunc is changed between kernel versions and you do How about this one? I'm honestly curious to see someone try and figure out what works and what doesn't. > > *need* to support both versions. Think about those aspects, because > > sched_ext will run into them almost inevitably soon after its > > inclusion into kernel. > > > > > > One way or another there are some technical solution of various > > degrees of creativity. And I'm actually not sure if I have a solution > > for kfunc signature change at all. Without BTF we could use two > > separate .c files and statically link them together, which would work > > because extern is untyped in pure C. But with BPF static linking we do > > have BTF information for each extern, and those BTF types will be > > incompatible for the same extern func. > > > > We can probably come up with some hacks and conventions, as usual, but > > better start thinking about them now. > > > > But hopefully you can empathize a bit more with poor end users that > > have to do hack like this and why having bpf_dynptr API defined as > > stable BPF helpers, with no extra dependencies on BTF in kernel, > > BTF is a reasonable dependency. > You've just used it to detect whether helper exists or not. > So it's fine to use the same to check whether kfunc exists or not. BTFGen doesn't require kernel to be built with BTF, and yet I get BPF CO-RE stuff. But you are jumbling everything together. I don't need BPF CO-RE to build a useful BPF application that needs to use ringbuf+dynptr (think uprobe'ing of some app, USDTs, etc), yet we will require BTF for no reason. Just as you are afraid of not getting UAPI right because we can't anticipate possible changes, let's be just as much afraid of unnecessary dependencies, which can be a blocker or pain for some users in some situations. Isn't that fair? > > > > > Depends on perspective. 
If I was some humble dev trying to build > > BPF-based tool that should work on x86, arm64, s390x, and riscv (or > > whatever other architecture), and dynptr API is only based on kfuncs, > > I'm screwed. I can't sponsor or do kfunc support for my favorite > > architecture, I'm stuck waiting for this to be done by someone some > > time, if ever. > > If kfuncs and bpf trampoline don't work on a particular architecture > that developer is likely screwed anyway. Dynptr is the last thing they > would worry about. uprobe+dynptr+ringbuf is all I need for useful apps. Likely or not can be argued to the end of times.
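The "use the function if it exists, otherwise fall back" pattern argued over above can be illustrated with a rough user-space analogy based on ELF weak symbols. This is only a sketch under stated assumptions: `toy_dynptr`, `toy_dynptr_is_null` and the fallback are hypothetical names made up for illustration, nothing here is BPF code, and whether an equivalent mechanism works for kfuncs is exactly the open question in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dynptr; only the one field used below. */
struct toy_dynptr {
	void *data;
};

/*
 * Hypothetical optional function. Declaring it weak lets the program link
 * even when no definition is provided; its address is then NULL.
 */
extern bool toy_dynptr_is_null(const struct toy_dynptr *p) __attribute__((weak));

/* Fallback for when the optional function is absent: inspect the field. */
static bool toy_dynptr_is_null_fallback(const struct toy_dynptr *p)
{
	return p->data == NULL;
}

/* Prefer the optional function if it was linked in, otherwise fall back. */
static bool dynptr_is_null(const struct toy_dynptr *p)
{
	assert(p != NULL);
	if (toy_dynptr_is_null)
		return toy_dynptr_is_null(p);
	return toy_dynptr_is_null_fallback(p);
}
```

The caller only ever sees `dynptr_is_null()`, so the choice between the two code paths stays in one place. The sketch says nothing about the harder case raised above, where the function exists in both kernels but with different signatures.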
On Wed, Jan 4, 2023 at 12:03 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Wed, Jan 04, 2023 at 10:59:15AM -0800, Andrii Nakryiko wrote: > > > > > >> to actual BPF helpers by then where we go and say, that kfunc has proven itself in production > > > >> and from an API PoV that it is ready to be a proper BPF helper, and until this point > > > > > > > > "Proper BPF helper" model is broken. > > > > static void *(*bpf_map_lookup_elem)(void *map, const void *key) = (void *) 1; > > > > > > > > is a hack that works only when compiler optimizes the code. > > > > See gcc's attr(kernel_helper) workaround. > > > > This 'proper helper' hack is the reason we cannot compile bpf programs with -O0. > > > > And because it's uapi we cannot even fix this > > > > With kfuncs we will be able to compile with -O0 and debug bpf programs with better tools. > > > > These tools don't exist yet, but we have a way forward whereas with helpers > > > > we are stuck with -O2. > > > > > > > But specifically about how the BPF helper model is broken, that's at > > least an exaggeration. BPF helper call is defined at BPF ISA level, it > > has to be a `call <some constant>;`, and as long as compiler generates > > such code, it's all good. From C standpoint UAPI is just a function > > call: > > > > bpf_map_lookup_elem(&map, ...); > > > > As long as this compiles and generates proper `call 1;` assembly > > instruction, we are good. If/when both Clang and GCC support an > > alternative way to define helper and not as a static func pointer, -O0 > > builds (at least in the aspect of calling BPF helpers, I suspect other > > stuff will break still) will just work. And what's better, > > bpf_helper_defs.h would be able to pick the best option based on > > compiler's support with end users not having to care or notice the > > difference. > > Right and that's what gcc did with attribute((kernel_helper(1)), > but we didn't like it because gcc and clang would diverge. 
> Now you're arguing it's just a bpf_helper_defs.h change and we should > have allowed it? No, I'm saying if you feel so strongly that the current situation is bad and attribute-based approach is preferable (presumably to allow -O0 to work), then we can do that (both on GCC and Clang sides) and everything will work with no UAPI changes. And I did suggest a relatively clean approach with BPF_HELPER_DEF() ([0]) which would combine both old and new ways. But I personally have no problem with the current approach. You are bringing it up as an UAPI problem, which I'm claiming it is not. [0] https://lore.kernel.org/bpf/CAEf4BzYwRyXG1zE5BK1ZXmxLh+ZPU0=yQhNhpqr0JmfNA30tdQ@mail.gmail.com/ > > Also consider that 'call <some constant>' or more precise 'call absolute_address' > as an instruction exist in only one CPU architecture. It's BPF ISA. > It's a mistake that I made 8 years ago and inability to fix it bothers me. > Now we have 100 times more developers than we had 8 years ago. > I expect 100 time more UAPI and ABI mistakes. > Minimizing unfixable mistakes is what I'm after.
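To make the "`call <some constant>` is the contract, the C declaration is just a header convention" argument concrete, here is a toy user-space model. It is a sketch, not how the kernel or libbpf implement helpers: the `toy_*` names, the table, and both wrapper styles are assumptions invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef long (*toy_helper_fn)(long a, long b);

static long toy_add(long a, long b) { return a + b; }
static long toy_max(long a, long b) { return a > b ? a : b; }

/* In this model the numeric IDs are the only fixed part of the contract. */
enum toy_helper_id {
	TOY_HELPER_UNSPEC = 0,
	TOY_HELPER_ADD = 1,
	TOY_HELPER_MAX = 2,
	TOY_HELPER_COUNT,
};

static_assert(TOY_HELPER_ADD == 1, "helper IDs must never be renumbered");

static const toy_helper_fn toy_helpers[TOY_HELPER_COUNT] = {
	[TOY_HELPER_ADD] = toy_add,
	[TOY_HELPER_MAX] = toy_max,
};

/* Models `call N`: look the helper up by number and invoke it. */
static long toy_call(unsigned int id, long a, long b)
{
	if (id >= TOY_HELPER_COUNT || toy_helpers[id] == NULL)
		return -EINVAL;
	return toy_helpers[id](a, b);
}

/* Header convention A: a macro that names the ID. */
#define toy_add_v1(a, b) toy_call(TOY_HELPER_ADD, (a), (b))

/* Header convention B: an inline wrapper that names the same ID. */
static inline long toy_add_v2(long a, long b)
{
	return toy_call(TOY_HELPER_ADD, a, b);
}
```

Both conventions end up issuing the same numbered call, which is the sense in which the declaration style could change without touching the numbered interface. The model deliberately leaves out the compiler side of the dispute, namely whether a given declaration style is guaranteed to lower to such a call at `-O0`.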
On Wed, Jan 04, 2023 at 01:55:32PM -0800, Andrii Nakryiko wrote: [...] > > > Yes, we won't change existing helpers, but we can add new ones if we > > > need to extend them. That's how APIs work. Yes, they need careful > > > considerations when designing and implementing new APIs. Yes, mistakes > > > do happen, that's just fact of life and par for the course of software > > > development. Yes, we have to live with those mistakes. Nothing changed > > > about that. > > > > > > But somehow libraries and kernel still produce stable APIs and > > > maintain them because they clearly provide benefits to end users. > > > > Did you 'live with mistakes done in libbpf 0.x' ? No. > > for a long time yes. And it's not apples to apples comparison, with > library it is possible to deprecate APIs, which is what we did. With > lots of work and gradual transition, but did it. User space <-> kernel is not an apples to apples comparison with kernel <-> BPF programs either. Also, you're using the word "possible" here like it's a foregone conclusion. It is "possible" to deprecate BPF APIs as well, if we start using kfuncs going forward instead of adding to the UAPI boundary. > If we couldn't pull this through, yeah, I would live with whatever > APIs are there. And added new ones as a better replacement. As is > always done for APIs, nothing new here. The point is that you had a choice. > Within 0.x and 1.x APIs are stable and we live with them. This API > stability fear doesn't paralyze libbpf development, we still add new > stable APIs, if they are considered useful and thought through enough. Nobody is claiming that we can't have stable APIs. We're arguing in favor of being able to _choose_ which APIs to deprecate. Using your logic, you wouldn't have been able to deprecate _anything_ for fear of some user, somewhere being affected by it. I understand the sentiment, and I agree that it's very important to have conservative and predictable approaches to deprecation. 
What I don't think is important is to provide _indefinite_ guarantees for _all_ APIs between two different kernel contexts.

And to reiterate, as I've said a few times now but nobody seems to be responding to (unless I missed something), this is for kernel <-> kernel programs. We're not even talking about APIs that are available to user space. Let's at least be clear about the boundaries for which we're debating the merits of stability, because while some user space tooling would certainly be affected by choosing to freeze BPF helpers, kfuncs and BPF helpers are only ever invoked by _kernel_ programs.

> > You've introduced libbpf 1.0 with incompatible api and some users suffered.
>
> By "suffered" you mean a few systemd folks being grumpy about this?
> And having to do 100 lines of code changes ([0]) to support two
> incompatible major versions of libbpf *simultaneously*?
>
> On the other hand we got a library with saner error propagation
> behavior and various API normalizations and additions. Not too bad of
> a trade off.

This sounds like an argument in favor of why it is acceptable to deprecate some things? Why are some users allowed to feel "pain" (a term you've used in other threads), but other users who are affected by your choices are just "grumpy"? Also, what about the myriad hypothetical users you've never heard of (the ones who we're really protecting with UAPI) who had to deal with breaking API stability changes?

> Sure, deprecation is not easy or free, there was a lot of prep work,
> and some users had to adjust their code to use new APIs. But this is
> quite a tangent.

I don't see how this is tangential to the discussion -- it seems very relevant. From my perspective, the core of the discussion has been whether it's acceptable to shift _any_ of the burden of API stability to users. My point, and I believe Alexei's point as well, is that the answer is "it depends and it's a tradeoff", as you've essentially said here.
What I'm failing to understand is why your argument that there are tradeoffs applies here, but not for kernel <-> BPF kernel programs? I'm genuinely trying to understand what the distinction is, because from where I'm sitting it feels like we're being selective about when the unknown _threat_ of API instability automatically completely overrides our ability to choose our own deprecation and stability story (a stability story which is informed by our perception of an API's importance, usage, etc).

Note that my point here applies to something you've raised on other threads as well, such as on [0] where you (reasonably) reiterated this point:

[0]: https://lore.kernel.org/all/CAEf4BzY0aJNGT321Y7Fx01sjHAMT_ynu2-kN_8gB_UELvd7+vw@mail.gmail.com/

> But again. Let me repeat my point *again*. BPF helpers and kfuncs are
> not mutually exclusive, both can and should exist and evolve. That's
> one of the main points which is somehow eluding this conversation.

This is one of the big disconnects for me. If you argue that both BPF helpers and kfuncs can and should continue to coexist indefinitely, it feels like you're arguing for two incompatible points (and please correct me anywhere that I'm unintentionally misrepresenting your perspective here):

- On the one hand you're arguing that in some cases, _no_ API instability is acceptable. That in general, the main kernel <-> kernel BPF program API boundary is equivalent to UAPI, and that it's _never_ acceptable for us to ever, _ever_ deprecate certain APIs because _some_ users may be using them, and the possibility of APIs ever changing or being deprecated will impose an unacceptable pain to users which will make it too difficult to build tooling, and end up discouraging adoption onto BPF. It seems that you've been making this argument in favor of what you consider to be "core" BPF helpers such as bpf_dynptr_is_null(), etc.
- At the same time, on the other hand, you're arguing that _some_ of the API boundary between kernel <-> BPF program can be unstable. That it's acceptable for _some_ users and _some_ tooling to feel the pain of certain APIs changing. To perhaps extrapolate your point a bit further, you're arguing that niche / non-core kfuncs can be unstable, and that we don't have to worry about the unknown, hypothetical user who would feel pain from having to deal with them being deprecated, because they're not "core". Assuming that's all true, my question is: Why not just give ourselves the _option_ of being able to deem those core helpers as being indefinitely stable for the foreseeable future, and keep the unstable kfuncs to have the same stability guarantees as what they have today? In terms of _stability_ specifically (so ignoring other concerns you've raised, such as that we need BTF and BPF trampoline support for kfuncs -- not because they're irrelevant, but just to keep the discussion focused on stability), what do we gain by keeping the "core" / "stable" functions as BPF helpers, instead of just making them "super stable" kfuncs? At least then we have the option in the far-far-far future to deprecate them if they eventually, way later, become 100% obsolete. Plus you get the other benefits that Alexei mentioned such as potentially being able to backport them to older kernels by including them in modules, etc. Note that I'm not saying with 100% conviction that we don't have _any_ work to do before freezing helpers (though IMO we should just rip the bandaid and do it now), but I am arguing with strong conviction that once any of that precursor work is taken care of, there is no reason to use BPF helpers in place of kfuncs. At least, that's how I see it at this point. > [0] https://github.com/systemd/systemd/pull/24511/ > > > > > > We'll get the same amount of flame when we try to change kfunc that's > > > widely adopted. > > > > Of course. 
That's why we need to define a stability and deprecation
> > plan for them.
>
> Lots of things that need to be defined and figured out, but we are
> already quick to freeze BPF helpers.

I agree with you that it would be prudent for us to iron some of this out more concretely. In this discussion it seems like one of the key points of contention has been around stability, and that the lack of a concrete policy for kfuncs has largely (but not completely) been the cause for concern. Perhaps it would help clarify things if someone submitted a patch set that included a more formal kfunc stability proposal?
On 1/4/23 11:37 AM, Alexei Starovoitov wrote: > Would you invest in developing application against unstable syscall API? Absolutely. > People develop all tons of stuff on top of fuse-fs. People develop apps that interact > with tracing bpf progs that are clearly unstable. They do suffer when kernel side > changes and people accept that cost. BPF and tracing in general contributed to that mind change. > In a datacenter quite a few user apps are tied to kernel internals. > >> Imho, it's one of BPF's strengths and >> we should keep the door open, not close it. > The strength of BPF was and still is that it has both stable and unstable interfaces. > Roughly: networking is stable, tracing is unstable. > The point is that to be stable one doesn't need to use helpers. > We can make kfuncs stable too if we focus all our efforts this way and > for that we need to abandon adding helpers though it's a pain short term. > >>>> to actual BPF helpers by then where we go and say, that kfunc has proven itself in production >>>> and from an API PoV that it is ready to be a proper BPF helper, and until this point >>> "Proper BPF helper" model is broken. >>> static void *(*bpf_map_lookup_elem)(void *map, const void *key) = (void *) 1; >>> >>> is a hack that works only when compiler optimizes the code. >>> See gcc's attr(kernel_helper) workaround. >>> This 'proper helper' hack is the reason we cannot compile bpf programs with -O0. >>> And because it's uapi we cannot even fix this >>> With kfuncs we will be able to compile with -O0 and debug bpf programs with better tools. >>> These tools don't exist yet, but we have a way forward whereas with helpers >>> we are stuck with -O2. >> Better debugging tools are needed either way, independent of -O0 or -O2. I don't >> think -O0 is a requirement or barrier for that. It may open up possibilities for >> new tools, but production is still running with -O2. 
Proper BPF helper model is >> broken, but everyone relies on it, and will be for a very very long time to come, >> whether we like it or not. There is a larger ecosystem around BPF devs outside of >> kernel, and developers will use the existing means today. There are recommendations / >> guidelines that we can provide but we also don't have control over what developers >> are doing. Yet we should make their life easier, not harder. > Fully fleshed out kfunc infra will make developers job easier. No one is advocating > to make users suffer. It is a long discussion. I am replying on a thread with points that I have also been thinking about kfunc and helper. I think bpf helper is a kernel function but helpers need to be defined in a more tedious form. It requires to define bpf_func_proto and then wrap into BPF_CALL_x. It was not obvious for me to get around to understand the reason behind it. With kfunc, it is a more natural way for other kernel developers to expose subsystem features to bpf prog. In time, I believe we will be able to make kfunc has a similar experience as EXPORT_SYMBOL_*. Thus, for subsystem (hid, fuse, netdev...etc) exposing functions to bpf prog, I think it makes sense to stay with kfunc from now on. The subsystem is not exposing something like syscall as an uapi. bpf prog is part of the kernel in the sense that it extends that subsystem code. I don't think bpf needs to provide extra and more guarantee than the EXPORT_SYMBOL_* in term of api. That said, we should still review kfunc in a way that ensuring it is competent to the best of our knowledge at that point with the limited initial use cases at hand. I won't be surprised some of the existing EXPORT_SYMBOL_* kernel functions will be exposed to the bpf prog as kfunc as-is without any change in the future. For example, a few tcp cc kfuncs such as tcp_slow_start. They are likely stable without much change for a long time. It can be directly exposed as bpf kfunc. 
kfunc is a way to expose subsystem function without needing the bpf_func_proto and BPF_CALL_x quirks. When the function can be dual compiled later, the kfunc can also be inlined. If kfunc will be used for subsystem, it is very likely the number of kfunc will grow and exceed the bpf helpers soon. This seems to be a stronger need to work on the user experience problems about kfunc that have mentioned in this thread sooner than later. They have to be solved regardless. May be start with stable kfunc first. If the new helper is guaranteed stable, then why it cannot be kfunc but instead needs to go through the bpf_func_proto and BPF_CALL_x? In time, I hope the bpf helper support in the verifier can be quieted down (eg. check_helper_call vs check_kfunc_call) and focus energy into kfunc like inlining kfunc...etc.
On Thu, Jan 5, 2023 at 1:14 AM Martin KaFai Lau <martin.lau@linux.dev> wrote: > > On 1/4/23 11:37 AM, Alexei Starovoitov wrote: > > Would you invest in developing application against unstable syscall API? Absolutely. > > People develop all tons of stuff on top of fuse-fs. People develop apps that interact > > with tracing bpf progs that are clearly unstable. They do suffer when kernel side > > changes and people accept that cost. BPF and tracing in general contributed to that mind change. > > In a datacenter quite a few user apps are tied to kernel internals. > > > >> Imho, it's one of BPF's strengths and > >> we should keep the door open, not close it. > > The strength of BPF was and still is that it has both stable and unstable interfaces. > > Roughly: networking is stable, tracing is unstable. > > The point is that to be stable one doesn't need to use helpers. > > We can make kfuncs stable too if we focus all our efforts this way and > > for that we need to abandon adding helpers though it's a pain short term. > > > >>>> to actual BPF helpers by then where we go and say, that kfunc has proven itself in production > >>>> and from an API PoV that it is ready to be a proper BPF helper, and until this point > >>> "Proper BPF helper" model is broken. > >>> static void *(*bpf_map_lookup_elem)(void *map, const void *key) = (void *) 1; > >>> > >>> is a hack that works only when compiler optimizes the code. > >>> See gcc's attr(kernel_helper) workaround. > >>> This 'proper helper' hack is the reason we cannot compile bpf programs with -O0. > >>> And because it's uapi we cannot even fix this > >>> With kfuncs we will be able to compile with -O0 and debug bpf programs with better tools. > >>> These tools don't exist yet, but we have a way forward whereas with helpers > >>> we are stuck with -O2. > >> Better debugging tools are needed either way, independent of -O0 or -O2. I don't > >> think -O0 is a requirement or barrier for that. 
It may open up possibilities for > >> new tools, but production is still running with -O2. Proper BPF helper model is > >> broken, but everyone relies on it, and will be for a very very long time to come, > >> whether we like it or not. There is a larger ecosystem around BPF devs outside of > >> kernel, and developers will use the existing means today. There are recommendations / > >> guidelines that we can provide but we also don't have control over what developers > >> are doing. Yet we should make their life easier, not harder. > > Fully fleshed out kfunc infra will make developers job easier. No one is advocating > > to make users suffer. > > It is a long discussion. I am replying on a thread with points that I have also > been thinking about kfunc and helper. > > I think bpf helper is a kernel function but helpers need to be defined in a more > tedious form. It requires to define bpf_func_proto and then wrap into > BPF_CALL_x. It was not obvious for me to get around to understand the reason > behind it. With kfunc, it is a more natural way for other kernel developers to > expose subsystem features to bpf prog. In time, I believe we will be able to > make kfunc has a similar experience as EXPORT_SYMBOL_*. > > Thus, for subsystem (hid, fuse, netdev...etc) exposing functions to bpf prog, I > think it makes sense to stay with kfunc from now on. The subsystem is not > exposing something like syscall as an uapi. bpf prog is part of the kernel in > the sense that it extends that subsystem code. I don't think bpf needs to > provide extra and more guarantee than the EXPORT_SYMBOL_* in term of api. That > said, we should still review kfunc in a way that ensuring it is competent to the > best of our knowledge at that point with the limited initial use cases at hand. > I won't be surprised some of the existing EXPORT_SYMBOL_* kernel functions will > be exposed to the bpf prog as kfunc as-is without any change in the future. 
For > example, a few tcp cc kfuncs such as tcp_slow_start. They are likely stable > without much change for a long time. It can be directly exposed as bpf kfunc. > kfunc is a way to expose subsystem function without needing the bpf_func_proto > and BPF_CALL_x quirks. When the function can be dual compiled later, the kfunc > can also be inlined. > > If kfunc will be used for subsystem, it is very likely the number of kfunc will > grow and exceed the bpf helpers soon. This seems to be a stronger need to work > on the user experience problems about kfunc that have mentioned in this thread > sooner than later. They have to be solved regardless. May be start with stable > kfunc first. If the new helper is guaranteed stable, then why it cannot be kfunc > but instead needs to go through the bpf_func_proto and BPF_CALL_x? In time, I > hope the bpf helper support in the verifier can be quieted down (eg. > check_helper_call vs check_kfunc_call) and focus energy into kfunc like inlining > kfunc...etc. Sorry, I am late to this discussion. The way I read this is that kfuncs and helpers are implementation details and the real question is about the stability and mutability of the helper methods. I think there are two kinds of BPF program developers, and I might be oversimplifying to convey a point here: [1] Tracing people: They craft tracing programs and are more accustomed to probing deeper into kernel internals, handling variable renames and consequently will tolerate a kfunc changing its signature, being renamed or disappearing. [2] Network people: They are not accustomed to mutability the same way as the tracing people. If there is mutability here, these users will face a change in developer experience. I see two paths forward here: [a] We want to somewhat preserve the developer experience of [2] and we find a way to do somewhat stable APIs. 
kfuncs have the benefit that they are eventually mutable, but a longer stability guarantee for helpers used by [2] could ameliorate the pains of mutability. For example, something we could do for certain helpers is a deprecation story (e.g. a kfunc won't change for X kernel versions, or when we annotate kfuncs as deprecated, libbpf can warn users "this kfunc is going away in kernel version Z"). If this would be difficult to guarantee and we do care about developer experience, we might need to have some helpers exposed as UAPI.

[b] We accept the fact the user experience will change more for [2] and that's a trade-off we accept. IMHO, this is not ideal and while tracing folks have found a way to cope, it would be yet another thing to worry about for folks who are not used to it. There are things we can do to make it slightly less burdensome for the user by adding a shim in BPF headers (however, it won't solve problems for everyone, e.g. inline BPF or other languages, but will give them a template for their respective "shims").

Another thing to consider is whether there are use-cases where some users disable BTF (for whatever reason, like running BPF in a pacemaker :P or in extremely low memory cases).
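The deprecation story floated in [a] (a kfunc stays unchanged for a number of kernel versions, and tooling warns once it is marked as going away) could be sketched roughly as below. Everything here is hypothetical: the `toy_kfunc_policy` structure, the version encoding (loosely modeled on the kernel's `KERNEL_VERSION()` idea) and the warning text are illustrative assumptions, not an existing kernel or libbpf interface.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative version encoding: major in bits 16+, minor in bits 8..15. */
#define TOY_VERSION(major, minor) (((major) << 16) | ((minor) << 8))

enum toy_kfunc_status {
	TOY_KFUNC_OK,
	TOY_KFUNC_DEPRECATED,
	TOY_KFUNC_REMOVED,
};

/* Hypothetical per-kfunc deprecation metadata. */
struct toy_kfunc_policy {
	const char *name;
	unsigned int deprecated_in; /* first version that warns; 0 = never */
	unsigned int removed_in;    /* first version without it; 0 = never */
};

/* Classify a kfunc for the kernel version the program is running on. */
static enum toy_kfunc_status
toy_kfunc_check(const struct toy_kfunc_policy *p, unsigned int running)
{
	assert(p != NULL);
	if (p->removed_in && running >= p->removed_in)
		return TOY_KFUNC_REMOVED;
	if (p->deprecated_in && running >= p->deprecated_in)
		return TOY_KFUNC_DEPRECATED;
	return TOY_KFUNC_OK;
}

/* Emit the kind of warning a loader could print during the grace period. */
static void toy_kfunc_warn(const struct toy_kfunc_policy *p, unsigned int running)
{
	if (toy_kfunc_check(p, running) == TOY_KFUNC_DEPRECATED)
		fprintf(stderr, "warning: kfunc %s is going away in kernel version %u.%u\n",
			p->name, p->removed_in >> 16, (p->removed_in >> 8) & 0xff);
}
```

The gap between `deprecated_in` and `removed_in` plays the role of the "X kernel versions" window. How long that window should be, and who gets to set it, is the policy question the thread leaves open.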
Didn't find the best place to put this, so it will be here. I think it would be beneficial to discuss BPF helpers freeze in BPF office hours. So I took the liberty to put it up for next BPF office hours, 9am, Jan 12th 2023. I hope that some more people that have exposure to real-world BPF applications and pains associated with all that could join the discussion, but obviously anyone is welcome as well, no matter which way they are leaning. Please consider joining, see details on Zoom meeting at [0]. For the rest, please see below. I'll be out for a few days and won't be able to reply, my apologies. [0] https://docs.google.com/spreadsheets/d/1LfrDXZ9-fdhvPEp_LHkxAMYyxxpwBXjywWa0AejEveU/edit#gid=0 On Wed, Jan 4, 2023 at 3:47 PM David Vernet <void@manifault.com> wrote: > > On Wed, Jan 04, 2023 at 01:55:32PM -0800, Andrii Nakryiko wrote: > > [...] > > > > > Yes, we won't change existing helpers, but we can add new ones if we > > > > need to extend them. That's how APIs work. Yes, they need careful > > > > considerations when designing and implementing new APIs. Yes, mistakes > > > > do happen, that's just fact of life and par for the course of software > > > > development. Yes, we have to live with those mistakes. Nothing changed > > > > about that. > > > > > > > > But somehow libraries and kernel still produce stable APIs and > > > > maintain them because they clearly provide benefits to end users. > > > > > > Did you 'live with mistakes done in libbpf 0.x' ? No. > > > > for a long time yes. And it's not apples to apples comparison, with > > library it is possible to deprecate APIs, which is what we did. With > > lots of work and gradual transition, but did it. > > User space <-> kernel is not an apples to apples comparison with kernel > <-> BPF programs either. Also, you're using the word "possible" here > like it's a foregone conclusion. It is "possible" to deprecate BPF APIs > as well, if we start using kfuncs going forward instead of adding to the > UAPI boundary.
I'm not sure what to make out of this reply, to be honest. Yes, I think kernel and libraries are sufficiently different to not draw direct comparisons. No, I didn't claim anything about foregone conclusions. I think it's even possible to deprecate BPF helpers, if we really want to. In the end, technically, the only UAPI part about BPF helper is its ID. That should stay fixed. We do change over time which helpers are available in which program types. Yes, it would be really bad to change helper signature and I'd be very much against this, but from my perspective (and I'm sure others will disagree), it's in the realm of possibility to do gradual deprecation of some helpers. We'll leave BPF_FUNC_xxx enumerator intact, of course, but add a simple wrapper that will just return -ENOTSUP. E.g., Linus requested bpf_probe_read() to not exist and not be used, everyone agreed. Good opportunity? But really, we are going on so many tangents instead of addressing specific points. As I said early on in the discussion, this will be a discussion to exhaustion of one side or the other, unfortunately. > > > If we couldn't pull this through, yeah, I would live with whatever > > APIs are there. And added new ones as a better replacement. As is > > always done for APIs, nothing new here. > > The point is that you had a choice. The point is that UAPI stability is not the end of the world and paranoia is bad. We shouldn't get paralyzed because we add APIs. We do that to libbpf and APIs will stay stable within entire 1.x version. Yes, we don't have such a nice "luxury" with kernel, but see above. There are libraries that go to great lengths to keep old APIs, however broken or inconvenient they are. Yes, it's a pain, but it doesn't paralyze development. > > > Within 0.x and 1.x APIs are stable and we live with them. This API > > stability fear doesn't paralyze libbpf development, we still add new > > stable APIs, if they are considered useful and thought through enough.
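The gradual-deprecation idea raised above (keep the helper's numeric ID fixed, but route it to a stub that only returns an error) can be illustrated with a small userspace mock. This is only a sketch under stated assumptions: every name below (`mock_func_id`, `mock_helpers`, `mock_call`, and so on) is hypothetical and invented for illustration, and none of it is kernel code or an existing BPF API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper IDs. The numbering is the only part treated as frozen. */
enum mock_func_id {
    MOCK_FUNC_unspec   = 0,
    MOCK_FUNC_old_read = 1, /* deprecated, but the ID stays reserved */
    MOCK_FUNC_new_read = 2,
    MOCK_FUNC_MAX,
};

/* Compile-time guard: renumbering existing IDs would break old programs. */
static_assert(MOCK_FUNC_old_read == 1 && MOCK_FUNC_new_read == 2,
              "existing helper IDs must never be renumbered");

typedef long (*mock_helper_fn)(long arg);

/* Stand-in for a helper that is still supported. */
static long mock_new_read(long arg)
{
    return arg;
}

/* Stand-in for the "simple wrapper": the ID resolves, the call just fails. */
static long mock_deprecated_stub(long arg)
{
    (void)arg;
    return -ENOTSUP;
}

/* ID -> implementation table; deprecating a helper only swaps its entry. */
static const mock_helper_fn mock_helpers[MOCK_FUNC_MAX] = {
    [MOCK_FUNC_old_read] = mock_deprecated_stub,
    [MOCK_FUNC_new_read] = mock_new_read,
};

/* Dispatch by ID; unknown or unassigned IDs are rejected with -EINVAL. */
static long mock_call(int id, long arg)
{
    if (id <= MOCK_FUNC_unspec || id >= MOCK_FUNC_MAX || mock_helpers[id] == NULL)
        return -EINVAL;
    return mock_helpers[id](arg);
}
```

In this mock, a caller that still uses `MOCK_FUNC_old_read` gets `-ENOTSUP` back, while the ID space and all other helpers are untouched. Whether a real helper could be retired this way is exactly what the thread is debating.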
> > Nobody is claiming that we can't have stable APIs. We're arguing in > favor of being able to _choose_ which APIs to deprecate. Using your > logic, you wouldn't have been able to deprecate _anything_ for fear of > some user, somewhere being affected by it. I understand the sentiment, > and I agree that it's very important to have conservative and > predictable approaches to deprecation. What I don't think is important > is to provide _indefinite_ guarantees for _all_ APIs between two > different kernel contexts. > > And to reiterate, as I've said a few times now but nobody seems to be > responding to (unless I missed something), this is for kernel <-> kernel > programs. We're not even talking about APIs that are available to user > space. Let's at least be clear about the boundaries for which we're > debating the merits of stability, because while some user space tooling > would certainly be affected by choosing to freeze BPF helpers, kfuncs and > BPF helpers are only ever invoked by _kernel_ programs. I'm also for the choice. And freezing BPF helpers removes this choice. I want to have functionality that won't depend on arch-specific kfunc support, won't depend on BTF, etc. Think about it this way (and try to avoid the temptation to point out imperfections of analogy). How would you feel if Rust added slice support, and said that it will work in some super basic form everywhere. But some things, like deriving subslice or checking slice's size would be architecture-specific, they will initially work in Tier 1 supported architectures, maybe soon they might work on Tier 2, but unlikely to work on Tier 3, unless someone will do a bunch of highly technical work and signs up to maintain it going forward. Does this sound reasonable for something that is a stable and simple abstraction, which should feel like an integral part of the BPF framework? It doesn't have any ties into arch-specific details, it doesn't require debug information to be usable and efficient, etc. Alas.
Another example. I'm adding BPF open-coded iterators. One of them is fundamentally an improved (in terms of functionality and ergonomics) version of bpf_loop() and bounded BPF loop support. It consists of a black-box struct bpf_iter to keep state and three helpers: bpf_iter_range_new(), bpf_iter_range_next() and bpf_iter_range_destroy(). It can be used roughly like this:

    struct bpf_iter it;
    int N = ..., *v, i;

    bpf_iter_range_new(&it, 0, N);
    while ((v = bpf_iter_range_next(&it))) {
        i = *v;
        /* use i which will take values from 0 to N-1 */
    }

Not too bad, but a bit verbose. I'd like to add a simple macro to help write this a bit more naturally. Right now I know how to do it so that it looks roughly like this:

    bpf_for(i, 0, N, ({
        /* my code using i and any other local variables */
    }));

Here are a few concerns if I'm made to do these bpf_iter_xxx() functions as kfunc:

a) I'll have an ability to do this iteration only on architectures that do support kfunc, which is not *all* architectures that support BPF. So there are cases where I can write some BPF programs, kernel could be recent enough to support bpf_iter_*() APIs, but I won't be able to rely on them in my BPF application (which is some simple tool that doesn't need anything fancy from BPF, no BTF, no BPF trampoline, no nothing, I just want to trace some uprobes and USDTs, fetch some data from user-space app, do post-processing, maintain a few simple ARRAY and HASH maps, dump data through perfbuf/ringbuf). Why do I need to explain to customers why they can't use bpf_iter_*() even if they have a recent kernel? There is no reason for a simple looping construct to require all this extra baggage. ZERO.

b) I'd like to provide bpf_for() macro from libbpf. Well, whether you agree or not, but libbpf does provide stable APIs as well. bpf_for() can't be really stable because bpf_iter_*() funcs are declared unstable (and if they are stable, then why can't I make them BPF helpers).
If something changes, it will be on libbpf to come up with some creative ingenious workarounds. If they get removed -- oops, too bad, libbpf. Also given that kfuncs are not part of bpf_helper_defs.h (and shouldn't, they are unstable), I'll have to define __ksym definitions for necessary APIs somewhere in the same header where bpf_for() is defined. Luckily (I checked, not too lazy to try to solve problems end-to-end, would be happy for someone to reply to my specific request to do the same, but alas), it's ok to have multiple duplicated extern __ksym definitions. So it's annoying, but at least not impossible. I know what will come next: proposal to add some unstable headers and APIs to libbpf and stuff. It's another discussion, everything is possible, etc, etc. But I'm hoping that at least some people will garner a bit of empathy for consequences of these helpers vs kfunc choices. Just to reiterate. I have no problem with kfuncs per se. Task struct, ct, xfrm, whatever other things that are working with kernel objects -- totally makes sense to have them as kfunc. Totally. But concepts like dynptr (memory slice), for loop, etc. I see zero, absolutely zero, reason to dictate that they should be unstable and arch-specific. > > > > You've introduced libbpf 1.0 with incompatible api and some users suffered. > > > > By "suffered" you mean a few systemd folks being grumpy about this? > > And having to do 100 lines of code changes ([0]) to support two > > incompatible major versions of libbpf *simultaneously*? > > > > On the other hand we got a library with saner error propagation > > behavior and various API normalizations and additions. Not too bad of > > a trade off. > > This sounds like an argument in favor of why it is acceptable to > deprecate some things? Why are some users allowed to feel "pain" (a term > you've used in other threads), but other users who are affected by your > choices are just "grumpy"?
Also, what about the myriad hypothetical > users you've never heard of (the ones who we're really protecting with > UAPI) who had to deal with breaking API stability changes? I think you are twisting what I'm arguing for. I didn't say that everything should be stable, did I? I'm saying some things should be stable, like dynptr and for loop iterator. As for the libbpf deprecation process. I'm happy to discuss how it went and what could have been done better. But I don't think this thread is the place to discuss this. Please, ping me offline or start a separate thread. > > > Sure, deprecation is not easy or free, there was a lot of prep work, > > and some users had to adjust their code to use new APIs. But this is > > quite a tangent. > > I don't see how this is tangential to the discussion -- it seems very > relevant. From my perspective, the core of the discussion has been > whether it's acceptable to shift _any_ of the burden of API stability to > users. My point, and I believe Alexei's point as well, is that the > answer is "it depends and it's a tradeoff", as you've essentially said > here. Interesting. Alexei is saying "no more BPF helpers", and that has all the consequences I outlined above (and probably more I haven't thought about). Daniel is asking to have this "it depends" option by not taking such a hard line on BPF helpers freeze. From my perspective, the core of the discussion is whether stability of UAPI is the paramount issue that overshadows everything else or not. Me and Daniel are saying no, you and Alexei are arguing yes. > > What I'm failing to understand is why your argument that there are > tradeoffs applies here, but not for kernel <-> BPF kernel programs? 
I'm > genuinely trying to understand what the distinction is, because from > where I'm sitting it feels like we're being selective about when the > unknown _threat_ of API instability automatically completely overrides > our ability to choose our own deprecation and stability story (a > stability story which is informed by our perception of an API's > importance, usage, etc). There is some misunderstanding obviously. I'm all for flexibility and considering tradeoffs. But dictating "no more BPF helpers" is not that, it's the opposite of that. And yes, I do not believe that UAPI stability is the most important and the only aspect that should be taken into consideration. I really hope that specific points about dynptr and for loop iterator help you understand my position. It's not even so much a stability (though that matters for core concepts, obviously), but rather all the incidental complexities, dependencies, and limitations that come with kfuncs (and some, like arch-specific support, are fundamental; while others, like detecting their support are currently big hurdles, but could be solved; and let's solve them first, before taking these hard stances, not the other way around). > > Note that my point here applies to something you've raised on other > threads as well, such as on [0] where you (reasonably) reiterated this > point: > > [0]: https://lore.kernel.org/all/CAEf4BzY0aJNGT321Y7Fx01sjHAMT_ynu2-kN_8gB_UELvd7+vw@mail.gmail.com/ > > > But again. Let me repeat my point *again*. BPF helpers and kfuncs are > > not mutually exclusive, both can and should exist and evolve. That's > > one of the main points which is somehow eluding this conversation. > > This is one of the big disconnects for me.
If you argue that both BPF > helpers and kfuncs can and should continue to coexist indefinitely, it > feels like you're arguing for two incompatible points (and please > correct me anywhere that I'm unintentionally misrepresenting your > perspective here): > > - On the one hand you're arguing that in some cases, _no_ API > instability is acceptable. That in general, the main kernel <-> kernel > BPF program API boundary is equivalent to UAPI, and that it's _never_ > acceptable for us to ever, _ever_ deprecate certain APIs because you are being hyperbolic and overdramatic again for no good reason, "ever, _ever_" -- really? There is no such thing. > _some_ users may be using them, and the possibility of APIs ever > changing or being deprecated will impose an unacceptable pain to users > which will make it too difficult to build tooling, and end up > discouraging adoption onto BPF. It seems that you've been > making this argument in favor of what you consider to be "core" BPF > helpers such as bpf_dynptr_is_null(), etc. > > - At the same time, on the other hand, you're arguing that _some_ of the > API boundary between kernel <-> BPF program can be unstable. That it's > acceptable for _some_ users and _some_ tooling to feel the pain of > certain APIs changing. To perhaps extrapolate your point a bit > further, you're arguing that niche / non-core kfuncs can be unstable, > and that we don't have to worry about the unknown, hypothetical user > who would feel pain from having to deal with them being deprecated, > because they're not "core". > Yes, but I don't see the contradiction. If BPF map abstraction and its API was declared unstable (and made arch-specific, this is not a small detail which you conveniently want to ignore below), I as a user would think twice before using them. Depends on the situation and what I'm trying to do. Developing some app within Meta internally -- sure, I'd probably still go for it.
But building some tool like perf or retsnoop -- I'd think twice if I want to take dependency on BPF map (or dynptr for that matter), if it potentially limits the applicability of my application. But when we think about kfuncs that work with kernel object (task_struct, sockets, whatnot), yes, it's reasonable that we in BPF can't guarantee stability of those (though I'd very much hope that we wouldn't willy-nilly keep changing them for no good reason and do reasonable effort to isolate end users from some reasonable underlying changes to how task/socket/etc are handled within kernel). If tomorrow the kernel decides to drop socket abstraction, I don't think BPF subsystem should "emulate" it somehow (though even that depends, tbh). So yes, I don't see contradictions. With BPF map, dynptr, (some) iterators -- BPF controls its destiny, it can and should provide an unassuming interface, abstractions, APIs and stick to supporting them and not dictating arbitrary extra dependencies. > Assuming that's all true, my question is: > > Why not just give ourselves the _option_ of being able to deem those > core helpers as being indefinitely stable for the foreseeable future, > and keep the unstable kfuncs to have the same stability guarantees as > what they have today? In terms of _stability_ specifically (so ignoring > other concerns you've raised, such as that we need BTF and BPF > trampoline support for kfuncs -- not because they're irrelevant, but > just to keep the discussion focused on stability), what do we gain by Quite convenient to ignore very important limitations, of course. But hopefully I addressed your question above? > keeping the "core" / "stable" functions as BPF helpers, instead of just > making them "super stable" kfuncs? At least then we have the option in > the far-far-far future to deprecate them if they eventually, way later, > become 100% obsolete. 
Plus you get the other benefits that Alexei > mentioned such as potentially being able to backport them to older > kernels by including them in modules, etc. > > Note that I'm not saying with 100% conviction that we don't have _any_ > work to do before freezing helpers (though IMO we should just rip the > bandaid and do it now), but I am arguing with strong conviction that > once any of that precursor work is taken care of, there is no reason to > use BPF helpers in place of kfuncs. At least, that's how I see it at > this point. I disagree about ripping the bandaid and precluding the dynptr framework from being whole before we solve various problems I pointed out in [1] (which unfortunately was mostly ignored, it seems). And for the "for loop iterator", I absolutely do not want to have a useful generic abstraction for repeatable loop, that will have a few asterisks associated with it, dictating which arches and what kernel config values (beyond basic BPF ones) should be ensured to make iteration work. Kills any motivation to finish it. Imagine if HASH map didn't work on some new minor platform, even though basic BPF works there. How does that sound to you? [1] https://lore.kernel.org/all/CAEf4BzY0aJNGT321Y7Fx01sjHAMT_ynu2-kN_8gB_UELvd7+vw@mail.gmail.com/ > > > [0] https://github.com/systemd/systemd/pull/24511/ > > > > > > > > > We'll get the same amount of flame when we try to change kfunc that's > > > > widely adopted. > > > > > > Of course. That's why we need to define a stability and deprecation > > > plan for them. > > > > Lots of things that need to be defined and figured out, but we are > > already quick to freeze BPF helpers. > > I agree with you that it would be prudent for us to iron some of this > out more concretely. In this discussion it seems like one of the key > points of contention has been around stability, and that the lack of a > concrete policy for kfuncs has largely (but not completely) been the > cause for concern.
Perhaps it would help clarify things if someone > submitted a patch set that included a more formal kfunc stability > proposal? Stability isn't the only concern, hopefully I made this clear above.
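The open-coded range iterator described earlier in this message (a new/next/destroy triple plus a bpf_for()-style wrapper) can be mocked in plain userspace C to show the shape of the pattern. This is a sketch under stated assumptions, not the proposed kernel implementation: the `mock_*` names and the struct layout are hypothetical, and the macro takes an ordinary braced block through `__VA_ARGS__` rather than the `({ ... })` form shown in the email.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the opaque iterator state. */
struct mock_iter {
    int next; /* next value to hand out */
    int end;  /* exclusive upper bound */
    int val;  /* storage for the value most recently returned */
};

static void mock_iter_range_new(struct mock_iter *it, int start, int end)
{
    assert(it != NULL);
    it->next = start;
    it->end = end;
    it->val = 0;
}

/* Returns a pointer to the next value, or NULL once the range is exhausted. */
static int *mock_iter_range_next(struct mock_iter *it)
{
    if (it->next >= it->end)
        return NULL;
    it->val = it->next++;
    return &it->val;
}

static void mock_iter_range_destroy(struct mock_iter *it)
{
    it->next = it->end = 0;
}

/*
 * bpf_for()-like wrapper: i takes values start..end-1 and the trailing
 * block runs once per value. break and continue behave as in a normal
 * loop; a return inside the block would skip the destroy call.
 */
#define mock_for(i, start, end, ...)                          \
    do {                                                      \
        struct mock_iter it__;                                \
        int *v__;                                             \
        mock_iter_range_new(&it__, (start), (end));           \
        while ((v__ = mock_iter_range_next(&it__))) {         \
            (i) = *v__;                                       \
            __VA_ARGS__                                       \
        }                                                     \
        mock_iter_range_destroy(&it__);                       \
    } while (0)

/* Usage example: sum the integers in [start, end). */
static int sum_range(int start, int end)
{
    int i, sum = 0;

    mock_for(i, start, end, { sum += i; });
    return sum;
}
```

The wrapper hides the three calls behind one construct, which is why the stability of the underlying functions matters to whoever ships the macro: if any of the three changes or disappears, the macro has to absorb the difference.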
On Wed, Jan 4, 2023 at 4:13 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: > > On 1/4/23 11:37 AM, Alexei Starovoitov wrote: > > Would you invest in developing application against unstable syscall API? Absolutely. > > People develop all tons of stuff on top of fuse-fs. People develop apps that interact > > with tracing bpf progs that are clearly unstable. They do suffer when kernel side > > changes and people accept that cost. BPF and tracing in general contributed to that mind change. > > In a datacenter quite a few user apps are tied to kernel internals. > > > >> Imho, it's one of BPF's strengths and > >> we should keep the door open, not close it. > > The strength of BPF was and still is that it has both stable and unstable interfaces. > > Roughly: networking is stable, tracing is unstable. > > The point is that to be stable one doesn't need to use helpers. > > We can make kfuncs stable too if we focus all our efforts this way and > > for that we need to abandon adding helpers though it's a pain short term. > > > >>>> to actual BPF helpers by then where we go and say, that kfunc has proven itself in production > >>>> and from an API PoV that it is ready to be a proper BPF helper, and until this point > >>> "Proper BPF helper" model is broken. > >>> static void *(*bpf_map_lookup_elem)(void *map, const void *key) = (void *) 1; > >>> > >>> is a hack that works only when compiler optimizes the code. > >>> See gcc's attr(kernel_helper) workaround. > >>> This 'proper helper' hack is the reason we cannot compile bpf programs with -O0. > >>> And because it's uapi we cannot even fix this > >>> With kfuncs we will be able to compile with -O0 and debug bpf programs with better tools. > >>> These tools don't exist yet, but we have a way forward whereas with helpers > >>> we are stuck with -O2. > >> Better debugging tools are needed either way, independent of -O0 or -O2. I don't > >> think -O0 is a requirement or barrier for that. 
It may open up possibilities for > >> new tools, but production is still running with -O2. Proper BPF helper model is > >> broken, but everyone relies on it, and will be for a very very long time to come, > >> whether we like it or not. There is a larger ecosystem around BPF devs outside of > >> kernel, and developers will use the existing means today. There are recommendations / > >> guidelines that we can provide but we also don't have control over what developers > >> are doing. Yet we should make their life easier, not harder. > > Fully fleshed out kfunc infra will make developers job easier. No one is advocating > > to make users suffer. > > It is a long discussion. I am replying on a thread with points that I have also > been thinking about kfunc and helper. > > I think bpf helper is a kernel function but helpers need to be defined in a more > tedious form. It requires to define bpf_func_proto and then wrap into > BPF_CALL_x. It was not obvious for me to get around to understand the reason This is subjective and there is no point in arguing about that. I find BPF helper definitions more obvious and more discoverable, for example. But it doesn't matter what I prefer personally. Whatever the case might be, this is purely internal implementation detail that can be improved and unified much more between helpers and kfuncs, and it's way less important compared to stability and usability issues brought up in this thread, as it has no bearing on user's experience. > behind it. With kfunc, it is a more natural way for other kernel developers to > expose subsystem features to bpf prog. In time, I believe we will be able to > make kfunc has a similar experience as EXPORT_SYMBOL_*. The original goal for kfuncs was to just directly expose kernel functions as is, but then we ended up adding allowlists, tuning them, fixing them, reworking them. We are talking about different lists per different program types, etc. But again, this is internal matters. 
There is fundamentally no difference between how kfunc and helpers can/should be defined, they are both kernel functions with additional annotations. If we put work into it we can converge the mechanics of how they are defined. > > Thus, for subsystem (hid, fuse, netdev...etc) exposing functions to bpf prog, I > think it makes sense to stay with kfunc from now on. The subsystem is not > exposing something like syscall as an uapi. bpf prog is part of the kernel in > the sense that it extends that subsystem code. I don't think bpf needs to > provide extra and more guarantee than the EXPORT_SYMBOL_* in term of api. That > said, we should still review kfunc in a way that ensuring it is competent to the > best of our knowledge at that point with the limited initial use cases at hand. > I won't be surprised some of the existing EXPORT_SYMBOL_* kernel functions will > be exposed to the bpf prog as kfunc as-is without any change in the future. For > example, a few tcp cc kfuncs such as tcp_slow_start. They are likely stable > without much change for a long time. It can be directly exposed as bpf kfunc. > kfunc is a way to expose subsystem function without needing the bpf_func_proto > and BPF_CALL_x quirks. When the function can be dual compiled later, the kfunc > can also be inlined. > > If kfunc will be used for subsystem, it is very likely the number of kfunc will > grow and exceed the bpf helpers soon. This seems to be a stronger need to work > on the user experience problems about kfunc that have mentioned in this thread > sooner than later. They have to be solved regardless. May be start with stable > kfunc first. If the new helper is guaranteed stable, then why it cannot be kfunc > but instead needs to go through the bpf_func_proto and BPF_CALL_x? In time, I > hope the bpf helper support in the verifier can be quieted down (eg. > check_helper_call vs check_kfunc_call) and focus energy into kfunc like inlining > kfunc...etc.
On Thu, Jan 5, 2023 at 9:17 AM KP Singh <kpsingh@kernel.org> wrote: > > On Thu, Jan 5, 2023 at 1:14 AM Martin KaFai Lau <martin.lau@linux.dev> wrote: > > > > On 1/4/23 11:37 AM, Alexei Starovoitov wrote: > > > Would you invest in developing application against unstable syscall API? Absolutely. > > > People develop all tons of stuff on top of fuse-fs. People develop apps that interact > > > with tracing bpf progs that are clearly unstable. They do suffer when kernel side > > > changes and people accept that cost. BPF and tracing in general contributed to that mind change. > > > In a datacenter quite a few user apps are tied to kernel internals. > > > > > >> Imho, it's one of BPF's strengths and > > >> we should keep the door open, not close it. > > > The strength of BPF was and still is that it has both stable and unstable interfaces. > > > Roughly: networking is stable, tracing is unstable. > > > The point is that to be stable one doesn't need to use helpers. > > > We can make kfuncs stable too if we focus all our efforts this way and > > > for that we need to abandon adding helpers though it's a pain short term. > > > > > >>>> to actual BPF helpers by then where we go and say, that kfunc has proven itself in production > > >>>> and from an API PoV that it is ready to be a proper BPF helper, and until this point > > >>> "Proper BPF helper" model is broken. > > >>> static void *(*bpf_map_lookup_elem)(void *map, const void *key) = (void *) 1; > > >>> > > >>> is a hack that works only when compiler optimizes the code. > > >>> See gcc's attr(kernel_helper) workaround. > > >>> This 'proper helper' hack is the reason we cannot compile bpf programs with -O0. > > >>> And because it's uapi we cannot even fix this > > >>> With kfuncs we will be able to compile with -O0 and debug bpf programs with better tools. > > >>> These tools don't exist yet, but we have a way forward whereas with helpers > > >>> we are stuck with -O2. 
> > >> Better debugging tools are needed either way, independent of -O0 or -O2. I don't > > >> think -O0 is a requirement or barrier for that. It may open up possibilities for > > >> new tools, but production is still running with -O2. Proper BPF helper model is > > >> broken, but everyone relies on it, and will be for a very very long time to come, > > >> whether we like it or not. There is a larger ecosystem around BPF devs outside of > > >> kernel, and developers will use the existing means today. There are recommendations / > > >> guidelines that we can provide but we also don't have control over what developers > > >> are doing. Yet we should make their life easier, not harder. > > > Fully fleshed out kfunc infra will make developers job easier. No one is advocating > > > to make users suffer. > > > > It is a long discussion. I am replying on a thread with points that I have also > > been thinking about kfunc and helper. > > > > I think bpf helper is a kernel function but helpers need to be defined in a more > > tedious form. It requires to define bpf_func_proto and then wrap into > > BPF_CALL_x. It was not obvious for me to get around to understand the reason > > behind it. With kfunc, it is a more natural way for other kernel developers to > > expose subsystem features to bpf prog. In time, I believe we will be able to > > make kfunc has a similar experience as EXPORT_SYMBOL_*. > > > > Thus, for subsystem (hid, fuse, netdev...etc) exposing functions to bpf prog, I > > think it makes sense to stay with kfunc from now on. The subsystem is not > > exposing something like syscall as an uapi. bpf prog is part of the kernel in > > the sense that it extends that subsystem code. I don't think bpf needs to > > provide extra and more guarantee than the EXPORT_SYMBOL_* in term of api. That > > said, we should still review kfunc in a way that ensuring it is competent to the > > best of our knowledge at that point with the limited initial use cases at hand. 
> > I won't be surprised some of the existing EXPORT_SYMBOL_* kernel functions will > > be exposed to the bpf prog as kfunc as-is without any change in the future. For > > example, a few tcp cc kfuncs such as tcp_slow_start. They are likely stable > > without much change for a long time. It can be directly exposed as bpf kfunc. > > kfunc is a way to expose subsystem function without needing the bpf_func_proto > > and BPF_CALL_x quirks. When the function can be dual compiled later, the kfunc > > can also be inlined. > > > > If kfunc will be used for subsystem, it is very likely the number of kfunc will > > grow and exceed the bpf helpers soon. This seems to be a stronger need to work > > on the user experience problems about kfunc that have mentioned in this thread > > sooner than later. They have to be solved regardless. May be start with stable > > kfunc first. If the new helper is guaranteed stable, then why it cannot be kfunc > > but instead needs to go through the bpf_func_proto and BPF_CALL_x? In time, I > > hope the bpf helper support in the verifier can be quieted down (eg. > > check_helper_call vs check_kfunc_call) and focus energy into kfunc like inlining > > kfunc...etc. > > > Sorry, I am late to this discussion. The way I read this is that > kfuncs and helpers are implementation details and the real question is > about the stability and mutability of the helper methods. > > I think there are two kinds of BPF program developers, and I might be > oversimplifying to convey a point here: > > [1] Tracing people: They craft tracing programs and are more > accustomed to probing deeper into kernel internals, handling variable > renames and consequently will tolerate a kfunc changing its signature, > being renamed or disappearing. > > [2] Network people: They are not accustomed to mutability the same way > as the tracing people. If there is mutability here, these users will > face a change in developer experience. 
> > I see two paths forward here: As I mentioned in another reply, I took a liberty to add "BPF helpers freeze" as a topic for next BPF office hours. It's probably going to be a bit more productive to discuss it there. WDYT? > > [a] We want to somewhat preserve the developer experience of [2] and > we find a way to do somewhat stable APIs. kfuncs have the benefit that > they are eventually mutable, but a longer stability guarantee for > helpers used by [2] could ameliorate the pains of mutability. e.g. > something we could do for certain helpers is a deprecation story, e.g. > a kfunc won't change for X kernel versions, or when we annotate kfuncs > as deprecated, libbpf can warn users "this kfunc is going away in > kernel version Z"). > > If this would be difficult to guarantee and we do care about developer > experience, we might need to have some helpers exposed as UAPI. > > [b] We accept the fact the user experience will change more for [2] > and that's a trade-off we accept. IMHO, this is not ideal and while > tracing folks have found a way to cope, it would be yet another thing > to worry about for folks who are not used to it. > > There are things we can do to make it slightly less burdensome for the > user by adding a shim in BPF headers (however, it won't solve problems > for everyone though e.g. inline BPF, other languages but will give > them a template for their respective "shims"). > > Another thing to consider if there are use-cases where some users > disable BTF (for whatever reason, like running BPF in a pacemaker :P > or in extremely low memory cases). There are various embedded systems (which usually means stricter memory requirements and less mainstream architectures) and people are experimenting with them, trying to run libbpf-tools and such there, or building their own tracing tools. I keep getting Github issues in libbpf-bootstrap and libbpf about something not working on some embedded system and it's absolutely unclear why. 
I'd rather not have to debug stuff like this for dynptr or for the loop iterator.
[...]

> > I see two paths forward here:
>
> As I mentioned in another reply, I took a liberty to add "BPF helpers
> freeze" as a topic for next BPF office hours. It's probably going to
> be a bit more productive to discuss it there. WDYT?

Perfect, much easier to discuss during office hours. Thanks for adding it!

>
> >
> > [a] We want to somewhat preserve the developer experience of [2] and
> > we find a way to do somewhat stable APIs. kfuncs have the benefit that

[...]
On Thu, Jan 05, 2023 at 01:01:56PM -0800, Andrii Nakryiko wrote: > Didn't find the best place to put this, so it will be here. I think it > would be beneficial to discuss BPF helpers freeze in BPF office hours. > So I took the liberty to put it up for next BPF office hours, 9am, Jan > 12th 2022. I hope that some more people that have exposure to > real-world BPF application and pains associated with all that could > join the discussion, but obviously anyone is welcome as well, no > matter which way they are leaning. > > Please consider joining, see details on Zoom meeting at [0] > > For the rest, please see below. I'll be out for a few days and won't > be able to reply, my apologies. > > [0] https://docs.google.com/spreadsheets/d/1LfrDXZ9-fdhvPEp_LHkxAMYyxxpwBXjywWa0AejEveU/edit#gid=0 Thanks for adding it to the agenda. Hopefully we'll be able to converge faster on a call. There are several things to discuss: 1. whether or not to freeze helpers. 2. whether dynptr accessors should be helpers or kfuncs. 3. whether your future inline iterators should be helpers or kfuncs. 4. whether cilium's bpf_sock_destroy should be helper or kfunc. If we hard freeze helpers in 1 it automatically decides the fate for 2, 3, 4. We can soft freeze the helpers then 2,3,4 are up for discussion. Looks like the thread so far was primarily about 1. 4 was touched separately. Daniel hasn't replied yet to my suggestion for it to be kfunc. You insist that 2 and 3 must be helpers. No one seen the patches for 3. I've seen you whiteboard them. It's impossible for others to participate without patches, so let's postpone that. Let's try to focus this thread on 2 assuming both helpers and kfuncs are on the table for dynptrs... > conclusions. I think it's even possible to deprecate BPF helpers, if > we really want to. In the end, technically, the only UAPI part about > BPF helper is it's ID. That should stay fixed. We do change over time > which helpers are available in which program types. 
Yes, it would be > really bad to change helper signature and I'd be very much against > this, but from my perspective (and I'm sure others will disagree), > it's in the realm of possibility to do gradual deprecation of some > helpers. We'll leave BPF_FUNC_xxx enumerator intact, of course, but > add a simple wrapper that will just -ENOTSUP. Unfortunately you're completely wrong in the above paragraph. I suggest to read this Linus's rant first: https://lkml.org/lkml/2012/12/23/75 Everything that user space sees we cannot change. We can try to, but it will be reverted if users complain. That's why we never try unless there is a very strong reason like security issue. For example your last commit to uapi/bpf.h commit 8a76145a2ec2 ("bpf: explicitly define BPF_FUNC_xxx integer values") is a leap of faith. Though we tried to make it as transparent as possible and I googled BPF_FUNC_MAPPER before applying the patch to see in what weird ways people can use the macro, there is still a non zero chance that we would have to revert it if users complain loud enough. For example cilium has this bit of code: https://github.com/cilium/ebpf/blob/master/asm/func.go I suspect it's broken now, because you've changed 'FN' macro in that commit. Cilium folks are unlikely to complain and demand a revert, so we should be safe in this regard, but we cannot assume that for other users. It should be obvious that we cannot deprecate helpers with ENOTSUP or deprecate them in any other way. > E.g., Linus requested bpf_probe_read() to not exist and not be used, > everyone agreed. Good opportunity? It's an exception that proves the rule. 1. it's a security issue that's why uapi breakage was on the table. 2. it wasn't completely removed. See: #ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE case BPF_FUNC_probe_read: return security_locked_down(LOCKDOWN_BPF_READ_KERNEL) < 0 ? NULL : &bpf_probe_read_compat_proto; > The point is that UAPI stability is not the end of the world and > paranoia is bad. 
We shouldn't get paralyzed because we add APIs. We do > that to libbpf and APIs will stay stable within entire 1.x version. > Yes, we don't have such a nice "luxury" with kernel, but see above. Exactly. See above. There is no way at all to deprecate helpers. > > > > - On the one hand you're arguing that in some cases, _no_ API > > instability is acceptable. That in general, the main kernel <-> kernel > > BPF program API boundary is equivalent to UAPI, and that it's _never_ > > acceptable for us to ever, _ever_ deprecate certain APIs because > > you are being hyperbolic and overdramatic again for no good reason, > "ever, _ever_" -- really? There is no such thing. Andrii, it's really _ever_. You need to internalize that first before we discuss this topic again during office hours. > I'd probably still go for it. But building some tool like perf or > retsnoop -- I'd think twice if I want to take dependency on BPF map > (or dynptr for that matter), if it potentially limits the > applicability of my application. A quote from retsnoop readme: " NOTE: Retsnoop relies on BPF CO-RE technology, so please make sure your Linux kernel is built with CONFIG_DEBUG_INFO_BTF=y kernel config. Without this retsnoop will refuse to start. " and in calib_feat.bpf.c /* Detect if bpf_get_func_ip() helper is supported by the kernel. /* Detect if fentry/fexit re-entry protection is implemented. /* Detect if fexit is safe to use for long-running and sleepable /* Detect if bpf_get_branch_snapshot() helper is supported. /* Detect if BPF_MAP_TYPE_RINGBUF map is supported. /* Detect if BPF cookie is supported for kprobes. /* Detect if multi-attach kprobes are supported. If the feature is useful you will use it. In retsnoop and everywhere else. Regardless whether it's arch dependent, kernel dependent or unstable. 
> I disagree about ripping the bandaid and precluding dynptr framework > to be whole before we solve various problems I pointed out in [1] > (which unfortunately was mostly ignored, it seems). Let's look at your https://lore.kernel.org/all/CAEf4BzZM0+j6DXMgu2o2UvjtzoOxcjsJtT8j-jqVZYvAqxc52g@mail.gmail.com/ " 1. Generic accessors to check validity of *any* dynptr, and it's inherent properties like offset, available size, read-only property (just as useful somethings as bpf_ringbuf_query() is for ringbufs, both for debugging and for various heuristics in production). bpf_dynptr_is_null(struct bpf_dynptr *ptr) long bpf_dynptr_get_size(struct bpf_dynptr *ptr) long bpf_dynptr_get_offset(struct bpf_dynptr *ptr) bpf_dynptr_is_rdonly(struct bpf_dynptr *ptr) There is nothing to add or remove here. No flags, no change in semantics. " You're arguing that it's obviously stable material. Like: +BPF_CALL_1(bpf_dynptr_get_offset, struct bpf_dynptr_kern *, ptr) +{ + if (!ptr->data) + return -EINVAL; + + return ptr->offset; +} but we can do it now in native bpf code: static inline int bpf_dynptr_get_offset(const struct bpf_dynptr *uptr) { struct bpf_dynptr_kern *ptr = bpf_rdonly_cast(uptr, bpf_core_type_id_kernel(struct bpf_dynptr_kern)); if (!ptr->data) return -EINVAL; return ptr->offset; } No kernel changes necessary. No UAPI helpers. No kfuncs. CO-RE will take care of kernel version differences. Do you still insist that it should be a stable uapi helper ? > And for the "for loop iterator", I absolutely do not want to have a > useful generic abstraction for repeatable loop, that will have few > asterisks associated with them, dictating which arches and what kernel > config values (beyond basic BPF ones) should be ensured to make > iteration work. Kills any motivation to finish it. I'm really sad that you went down this ultimatum path. Essentially you're saying: "loop iterator has to be stable helper or I quit working on it." Say we cave in and accepted your demand. 
Later you do another ultimatum and we cannot cave in for whatever reason. You stay true to your words and quit BPF development. Now we're stuck with your uapi that we cannot change, cannot improve, but still have to maintain it _forever_ without you because you quit. That would suck. Let's get back to discussing technical merits without ultimatums. Ok?
On Thu, Jan 5, 2023 at 6:54 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Thu, Jan 05, 2023 at 01:01:56PM -0800, Andrii Nakryiko wrote: > > Didn't find the best place to put this, so it will be here. I think it > > would be beneficial to discuss BPF helpers freeze in BPF office hours. > > So I took the liberty to put it up for next BPF office hours, 9am, Jan > > 12th 2022. I hope that some more people that have exposure to > > real-world BPF application and pains associated with all that could > > join the discussion, but obviously anyone is welcome as well, no > > matter which way they are leaning. > > > > Please consider joining, see details on Zoom meeting at [0] > > > > For the rest, please see below. I'll be out for a few days and won't > > be able to reply, my apologies. > > > > [0] https://docs.google.com/spreadsheets/d/1LfrDXZ9-fdhvPEp_LHkxAMYyxxpwBXjywWa0AejEveU/edit#gid=0 > > Thanks for adding it to the agenda. > Hopefully we'll be able to converge faster on a call. Yep, hopefully. Looking forward to BPF office hours this week. > > There are several things to discuss: > 1. whether or not to freeze helpers. > 2. whether dynptr accessors should be helpers or kfuncs. > 3. whether your future inline iterators should be helpers or kfuncs. > 4. whether cilium's bpf_sock_destroy should be helper or kfunc. > > If we hard freeze helpers in 1 it automatically decides the fate for 2, 3, 4. > We can soft freeze the helpers then 2,3,4 are up for discussion. > Looks like the thread so far was primarily about 1. The thread started as 2 and got expanded to 1, but I agree that 2, 3, and 4 are all separate topics (just predicated on 1 being decided in favor of not freezing helpers). > 4 was touched separately. Daniel hasn't replied yet to my suggestion for it to be kfunc. > You insist that 2 and 3 must be helpers. > No one seen the patches for 3. I've seen you whiteboard them. 
It's impossible > for others to participate without patches, so let's postpone that. Sure, as I intended to do in [0], except if BPF helpers are hard-frozen, there would be no discussion to have. But hopefully it's clear that my example with iterators was about stability and generality of certain concepts (looping) and how libbpf has stable API expectations and responsibilities as well. [0] https://lore.kernel.org/bpf/CAEf4BzbVoiVSa1_49CMNu-q5NnOvmaaHsOWxed-nZo9rioooWg@mail.gmail.com/ > > Let's try to focus this thread on 2 assuming both helpers and kfuncs > are on the table for dynptrs... > > > conclusions. I think it's even possible to deprecate BPF helpers, if > > we really want to. In the end, technically, the only UAPI part about > > BPF helper is it's ID. That should stay fixed. We do change over time > > which helpers are available in which program types. Yes, it would be > > really bad to change helper signature and I'd be very much against > > this, but from my perspective (and I'm sure others will disagree), > > it's in the realm of possibility to do gradual deprecation of some > > helpers. We'll leave BPF_FUNC_xxx enumerator intact, of course, but > > add a simple wrapper that will just -ENOTSUP. > > Unfortunately you're completely wrong in the above paragraph. > I suggest to read this Linus's rant first: > https://lkml.org/lkml/2012/12/23/75 > > Everything that user space sees we cannot change. > We can try to, but it will be reverted if users complain. I very well might be and it was my opinion (which I explicitly acknowledged as certainly being controversial). This is a completely separate discussion, but on one hand we say it's fine to remove or change kfuncs, because kfuncs are only visible to BPF programs, which are kernel-to-kernel programs and user-space rules do not apply. On the other hand, BPF helpers are also only visible to BPF programs, the only user-space visible part is enum name and ID. Yet they are treated very differently. 
It's fine, but to me it's more of an issue of a user contract, rather than some technicality about being defined in some header. It feels like we should be able to define a contract that some range of IDs will be "unstable" in the sense that they might start eventually returning -ENOTSUP if we have reasonable confidence they are not useful anymore. But it's just my opinion, and no amount of shouting at me will change that fact. And as I said before, I don't think BPF helpers are a big maintenance liability in the first place. > That's why we never try unless there is a very strong reason like security issue. > > For example your last commit to uapi/bpf.h > commit 8a76145a2ec2 ("bpf: explicitly define BPF_FUNC_xxx integer values") > is a leap of faith. > Though we tried to make it as transparent as possible and > I googled BPF_FUNC_MAPPER before applying the patch to see in what weird ways > people can use the macro, there is still a non zero chance that > we would have to revert it if users complain loud enough. > > For example cilium has this bit of code: > https://github.com/cilium/ebpf/blob/master/asm/func.go > I suspect it's broken now, because you've changed 'FN' macro in that commit. > Cilium folks are unlikely to complain and demand a revert, so we should be safe > in this regard, but we cannot assume that for other users. Sure, all above is true and we discussed all that when reviewing that patch. And I liked that we could weigh pros and cons in that particular case, and hopefully can keep doing that. > > It should be obvious that we cannot deprecate helpers with ENOTSUP > or deprecate them in any other way. I'm fine with that. > > > E.g., Linus requested bpf_probe_read() to not exist and not be used, > > everyone agreed. Good opportunity? > > It's an exception that proves the rule. > 1. it's a security issue that's why uapi breakage was on the table. > 2. it wasn't completely removed. 
See: > > #ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE > case BPF_FUNC_probe_read: > return security_locked_down(LOCKDOWN_BPF_READ_KERNEL) < 0 ? > NULL : &bpf_probe_read_compat_proto; Sure, not disputing this. I do think that it's just another example emphasizing that the world is not black and white and there *has* to be nuance in every decision. > > > The point is that UAPI stability is not the end of the world and > > paranoia is bad. We shouldn't get paralyzed because we add APIs. We do > > that to libbpf and APIs will stay stable within entire 1.x version. > > Yes, we don't have such a nice "luxury" with kernel, but see above. > > Exactly. See above. There is no way at all to deprecate helpers. OK. > > > > > > > - On the one hand you're arguing that in some cases, _no_ API > > > instability is acceptable. That in general, the main kernel <-> kernel > > > BPF program API boundary is equivalent to UAPI, and that it's _never_ > > > acceptable for us to ever, _ever_ deprecate certain APIs because > > > > you are being hyperbolic and overdramatic again for no good reason, > > "ever, _ever_" -- really? There is no such thing. > > Andrii, it's really _ever_. You need to internalize that first > before we discuss this topic again during office hours. I'll try to. My point (somewhat subtle, perhaps) was that humans are very bad about planning 5-10-20-50 years ahead. So any "ever" is overdramatized and hyperbolic. There might be no BPF, Linux, or computers in current form in 50 years. I refuse to stress about not being able to remove BPF helpers in 50 years, sorry. > > > I'd probably still go for it. But building some tool like perf or > > retsnoop -- I'd think twice if I want to take dependency on BPF map > > (or dynptr for that matter), if it potentially limits the > > applicability of my application. 
> > A quote from retsnoop readme: > " > NOTE: Retsnoop relies on BPF CO-RE technology, so please make sure your Linux > kernel is built with CONFIG_DEBUG_INFO_BTF=y kernel config. Without this > retsnoop will refuse to start. > " > and in calib_feat.bpf.c > /* Detect if bpf_get_func_ip() helper is supported by the kernel. > /* Detect if fentry/fexit re-entry protection is implemented. > /* Detect if fexit is safe to use for long-running and sleepable > /* Detect if bpf_get_branch_snapshot() helper is supported. > /* Detect if BPF_MAP_TYPE_RINGBUF map is supported. > /* Detect if BPF cookie is supported for kprobes. > /* Detect if multi-attach kprobes are supported. > > If the feature is useful you will use it. In retsnoop and everywhere else. > Regardless whether it's arch dependent, kernel dependent or unstable. But I'm just a hostage of these BPF quirks and I very much would like not to be (or at the very least minimize them)! Do you think I'm happy that retsnoop won't work on so many different kernel configs and arches, even though retsnoop would be very useful there? I'm happy I don't make money off of retsnoop, so I can afford to just say "sorry, retsnoop won't work in your particular situation, too bad". But if I had a company and some product that relied on BPF, any such hurdle would be painful and result in extra support, maintenance, developer work, lost opportunity, hurdles in adoption, just headaches. "If the feature is useful you will use it" is missing the nuance again. Almost every feature can be worked around. And if some feature adds too many unnecessary complexities and/or dependencies, I might choose to just work around it. Or use some older feature that's less convenient, less performant, maybe more fragile, but works. 
E.g., instead of using bpf_ringbuf_reserve_dynptr() to minimize amount of data sent over ringbuf, I'll choose to do bigger fixed-sized chunk, lose efficiency, but not reduce a variety of kernels and systems that my app will work on. But in some other situation this extra efficiency might be the difference between product viability and death, so yeah, I'll take that hit and do the extra work. But again, as a BPF user I will feel as a hostage, knowing that it didn't *have* to be this way. That's why I'm fighting so passionately *to not add unnecessary dependencies and complications*. > > > I disagree about ripping the bandaid and precluding dynptr framework > > to be whole before we solve various problems I pointed out in [1] > > (which unfortunately was mostly ignored, it seems). > > Let's look at your > https://lore.kernel.org/all/CAEf4BzZM0+j6DXMgu2o2UvjtzoOxcjsJtT8j-jqVZYvAqxc52g@mail.gmail.com/ > " > 1. Generic accessors to check validity of *any* dynptr, and it's > inherent properties like offset, available size, read-only property > (just as useful somethings as bpf_ringbuf_query() is for ringbufs, > both for debugging and for various heuristics in production). > > bpf_dynptr_is_null(struct bpf_dynptr *ptr) > long bpf_dynptr_get_size(struct bpf_dynptr *ptr) > long bpf_dynptr_get_offset(struct bpf_dynptr *ptr) > bpf_dynptr_is_rdonly(struct bpf_dynptr *ptr) > > There is nothing to add or remove here. No flags, no change in semantics. > " > > You're arguing that it's obviously stable material. 
> Like: > +BPF_CALL_1(bpf_dynptr_get_offset, struct bpf_dynptr_kern *, ptr) > +{ > + if (!ptr->data) > + return -EINVAL; > + > + return ptr->offset; > +} > > but we can do it now in native bpf code: > > static inline int bpf_dynptr_get_offset(const struct bpf_dynptr *uptr) > { > struct bpf_dynptr_kern *ptr = bpf_rdonly_cast(uptr, bpf_core_type_id_kernel(struct bpf_dynptr_kern)); > > if (!ptr->data) > return -EINVAL; > > return ptr->offset; > } > > No kernel changes necessary. No UAPI helpers. No kfuncs. > CO-RE will take care of kernel version differences. > > Do you still insist that it should be a stable uapi helper ? Yes! bpf_rdonly_cast() is kfunc, with all the consequences. And we are not just exposing internal implementation details of dynptr, we *expect* users to know, care, and follow them. Neither is great. These simple helpers I can implement with BPF_CORE_READ() even, without kfunc dependency, as I already explained before. And it will even work on kernels with no CO-RE support, thanks to BTFgen. But I do not consider that a good approach and good API, sorry. Certainly doesn't make me feel like dynptr is a core first-class concept in BPF. And I actually have no such solution for bpf_dynptr_clone()/bpf_dynptr_advance()/bpf_dynptr_trim(), which is absolutely critical to make dynptr a standard interface for passing variable-sized chunks of memory to other helpers and kfuncs. > > > And for the "for loop iterator", I absolutely do not want to have a > > useful generic abstraction for repeatable loop, that will have few > > asterisks associated with them, dictating which arches and what kernel > > config values (beyond basic BPF ones) should be ensured to make > > iteration work. Kills any motivation to finish it. > > I'm really sad that you went down this ultimatum path. This wasn't my intent and that's not what I'm doing here. I'm explaining my motivation and how I feel about core concepts being part of stable BPF API offerings. 
And how the inflexible BPF freeze approach will hurt adoption. And yes, I'm afraid it might hurt even the addition of new features if people feel that their work can't be used universally because of arbitrary policies. Human factor is real. Don't be sad, but try to see the argument behind all the words and examples. > Essentially you're saying: "loop iterator has to be stable helper or > I quit working on it." > Say we cave in and accepted your demand. Later you do another ultimatum I hope you can "cave in" based on technical arguments and feedback from users of BPF technology, which have to deal with real-world aspects of all the BPF machinery. And then already have enough to care about, no need to make their life harder. I'm saying the loop iterator has to be a stable helper to be universally used and universally recommended as *the solution for doing repeatable work*. Without thinking about BTF, kfuncs, arch-specific stuff. Because there is no reason why a loop iterator would require any of that. > and we cannot cave in for whatever reason. You stay true to your words > and quit BPF development. Now we're stuck with your uapi that we cannot > change, cannot improve, but still have to maintain it _forever_ > without you because you quit. That would suck. This was always a risk for many years, that didn't stop BPF from gaining lots of useful functionality, even if we'd retrospectively would like to do some things differently. > Let's get back to discussing technical merits without ultimatums. Ok? That's what I've been (and still am) doing all this time.
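For readers following the dynptr part of this exchange, the accessors and modifiers under debate can be pictured with a small user-space model. This is only a sketch under stated assumptions: `struct dynptr_model` and the `dynptr_*` functions below are hypothetical stand-ins, not the kernel's `struct bpf_dynptr_kern` or the proposed helpers, and the `-ERANGE` error code is an assumption. Only the NULL-data check returning `-EINVAL` is taken from the `bpf_dynptr_get_offset` patch quoted above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model of a dynptr: a window into a buffer. */
struct dynptr_model {
	uint8_t *data;   /* NULL models a null (invalid) dynptr */
	uint32_t offset; /* start of the visible window within data */
	uint32_t size;   /* number of visible bytes starting at offset */
};

static int dynptr_is_null(const struct dynptr_model *p)
{
	return p->data == NULL;
}

/* Same shape as the quoted patch: reject a null dynptr, else report offset. */
static long dynptr_get_offset(const struct dynptr_model *p)
{
	if (!p->data)
		return -EINVAL;
	return p->offset;
}

static long dynptr_get_size(const struct dynptr_model *p)
{
	if (!p->data)
		return -EINVAL;
	return p->size;
}

/* Advance: drop len bytes from the front of the window. */
static int dynptr_advance(struct dynptr_model *p, uint32_t len)
{
	if (!p->data)
		return -EINVAL;
	if (len > p->size)
		return -ERANGE; /* assumed error code */
	p->offset += len;
	p->size -= len;
	return 0;
}

/* Trim: drop len bytes from the end of the window. */
static int dynptr_trim(struct dynptr_model *p, uint32_t len)
{
	if (!p->data)
		return -EINVAL;
	if (len > p->size)
		return -ERANGE; /* assumed error code */
	p->size -= len;
	return 0;
}

/* Clone: a second, independent window over the same memory. */
static int dynptr_clone(const struct dynptr_model *src, struct dynptr_model *dst)
{
	if (!src->data)
		return -EINVAL;
	*dst = *src;
	return 0;
}
```

In this model, cloning a 16-byte dynptr, advancing the clone by 4 and trimming it by 2 leaves the clone at offset 4 with size 10 while the original still covers all 16 bytes. That is the "work with a subset of the memory area" use case argued for in the message above.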
On Mon, Jan 9, 2023 at 9:47 AM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > On Thu, Jan 5, 2023 at 6:54 PM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: > > > > On Thu, Jan 05, 2023 at 01:01:56PM -0800, Andrii Nakryiko wrote: > > > Didn't find the best place to put this, so it will be here. I think it > > > would be beneficial to discuss BPF helpers freeze in BPF office hours. > > > So I took the liberty to put it up for next BPF office hours, 9am, Jan > > > 12th 2022. I hope that some more people that have exposure to > > > real-world BPF application and pains associated with all that could > > > join the discussion, but obviously anyone is welcome as well, no > > > matter which way they are leaning. > > > > > > Please consider joining, see details on Zoom meeting at [0] > > > > > > For the rest, please see below. I'll be out for a few days and won't > > > be able to reply, my apologies. > > > > > > [0] https://docs.google.com/spreadsheets/d/1LfrDXZ9-fdhvPEp_LHkxAMYyxxpwBXjywWa0AejEveU/edit#gid=0 > > > > Thanks for adding it to the agenda. > > Hopefully we'll be able to converge faster on a call. > > Yep, hopefully. Looking forward to BPF office hours this week. > > > > > There are several things to discuss: > > 1. whether or not to freeze helpers. > > 2. whether dynptr accessors should be helpers or kfuncs. > > 3. whether your future inline iterators should be helpers or kfuncs. > > 4. whether cilium's bpf_sock_destroy should be helper or kfunc. I think these are all big questions. Maybe we can start with some smaller questions? Here is a list of questions I have: 1. Do we want stable kfuncs (as stable as helpers)? Do we want almost stable kfuncs? Will most users of stable APIs be as happy with almost stable alternatives? 2. Do we decide the stability of a kfunc when it is first added? Or do we plan to promote (maybe also demote?) stability later? 3. Besides stability, what are the concerns with kfuncs? How hard is it to resolve them?
AFAICT, the concerns are: require BTF, require trampoline. Anything else? I guess we will never remove BTF dependency. Trampoline dependency is hard to resolve, but still possible? 4. We have feature-rich BPF with Linux-x86_64. Do we need some bare-minimal BPF, say for Linux-MIPS, or Windows-ARM, or even nvme-something? I guess this is also related to the BPF standard? Thanks, Song
On Tue, Jan 03, 2023 at 03:51:07PM -0800, Alexei Starovoitov wrote: > On Tue, Jan 03, 2023 at 12:43:58PM +0100, Daniel Borkmann wrote: > > Discoverability plus being able to know semantics from a user PoV to figure out when > > workarounds for older/newer kernels are required to be able to support both kernels. > > Sounds like your concern is that there could be a kfunc that changed it semantics, > but kept exact same name and arguments? Yeah. That would be bad, but we should prevent > such patches from landing. It's up to us to define sane and user friendly deprecation of kfuncs. I would advocate for adding versioning to BPF API (be it helpers or "stable" kfuncs). Right now we have two extremes: helpers that can't be changed/fixed/deprecated ever, and kfuncs that can be changed at any time, so the end users can't be sure new kernel won't break their stuff. Detecting and fixing the breakage can also be tricky: end users have to write different probes on a case-by-case basis, and sometimes it's not just a matter of checking the number of function parameters or presence of some definition (such difficulties happen when backporting drivers to older kernels, so I assume it may be an issue for BPF programs as well). Let's say we add a version number to the kernel, and the BPF program also has an API version number it's compiled for. Whenever something changes in the stable API on the kernel side, the version number is increased. At the same time, compatibility on the kernel side is preserved for some reasonable period of time (2 years, 5 years, whatever), which means that if the kernel loads a BPF program with an older version number, and that version is within the supported period of time, the kernel will behave in the old way, i.e. verify the old signature of a function, preserve the old behavior, etc. This approach has the following upsides: 1. End users can stop worrying that some function changes unexpectedly, and they can have a smoother migration plan. 2. 
Clear deprecation schedule.
3. Easy way to probe for needed functionality, it's just a matter of comparing numbers: the BPF program loader checks that the kernel is new enough, and the kernel checks that the BPF program's API is not too old.
4. Kernel maintainers will have a deprecation strategy.

Cons:
1. Arguably a maintenance burden to preserve compatibility on the kernel side, but I would say it's a balance between helpers (which are a maintenance burden forever) and kfuncs (which can be changed in every kernel version without keeping any compatibility). "Kfunc that changed its semantics is bad, we should prevent such patches" are just words, but if the developer needs to keep both versions for a while, it will serve as a calm-down mechanism to prevent changes that aren't really necessary. At the same time, the dead code will stop accumulating, because it can be removed according to the schedule.
2. Having a single version number complicates backporting features to older kernels: it would require backporting all previous features chronologically, even if there is no direct dependency. Having multiple version numbers (per feature) is cumbersome for the BPF program to declare. However, this issue is not new, it's already the case for BPF helpers (you can't backport new helpers while skipping others, because the numbers in the list must match).

The above description intentionally doesn't specify whether it should be applied to helpers or kfuncs, because it's a universal concept, and I would like to hear opinions about versioning without bias toward helpers or kfuncs.

Regarding freezing helpers, I think there should be a solution for deprecating obsolete stuff. There are historical examples of removing things from UAPI: removing i386 support, ipchains, devfs, IrDA subsystem, even a few architectures [1].
If we apply the versioning approach to helpers, we can make long-awaited incompatible changes in v1, keeping the current set of helpers as v0, used for programs that don't declare a version. Eventually (in 5 years, in 10 years, whatever sounds reasonable) we can drop v0 and remove the support for unversioned BPF programs altogether, similar to how other big things were removed from the kernel. Does it sound feasible?

[1]: https://lwn.net/Articles/748074/

> "Proper BPF helper" model is broken.
> static void *(*bpf_map_lookup_elem)(void *map, const void *key) = (void *) 1;
>
> is a hack that works only when compiler optimizes the code.

What if we replace codegen for helpers, so that it becomes something like this?

static inline void *bpf_map_lookup_elem(void *map, const void *key)
{
	// pseudocode alert!
	asm("call 1" : : "r1"(map), "r2"(key));
}

I.e. can we just throw in some inline BPF assembly that prepares registers and invokes a call instruction with the helper number? That should be portable across clang and gcc, so we could stop relying on optimizations. Any caveats?
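The load-time check behind the versioning proposal in this message can be sketched in plain C. Everything here is hypothetical and only illustrates the idea: `struct api_support`, `negotiate_api_version()` and the choice of error codes are assumptions for the sketch, not existing kernel or libbpf interfaces. Version 0 stands for programs that declare no version, as suggested above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical: the range of stable-API versions a given kernel still honors. */
struct api_support {
	uint32_t oldest_supported; /* anything older is past its deprecation window */
	uint32_t current;          /* newest API version this kernel implements */
};

/*
 * Returns the API version the kernel should emulate for the program,
 * or a negative errno if the program cannot be loaded.
 */
static int negotiate_api_version(const struct api_support *k, uint32_t prog_version)
{
	if (prog_version > k->current)
		return -EOPNOTSUPP; /* kernel is too old for this program */
	if (prog_version < k->oldest_supported)
		return -EINVAL;     /* program targets a retired API version */
	return (int)prog_version;   /* verify signatures and behave as in that version */
}
```

With a kernel that implements version 5 and still honors version 2, a program built for version 3 loads and gets version-3 semantics, a version-6 program is rejected because the kernel is too old, and a version-1 program is rejected because its API has been retired. Probing reduces to comparing two numbers on each side, which is the third upside listed in the proposal.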
On Wed, Jan 11, 2023 at 1:29 PM Song Liu <song@kernel.org> wrote: > > () > > On Mon, Jan 9, 2023 at 9:47 AM Andrii Nakryiko > <andrii.nakryiko@gmail.com> wrote: > > > > On Thu, Jan 5, 2023 at 6:54 PM Alexei Starovoitov > > <alexei.starovoitov@gmail.com> wrote: > > > > > > On Thu, Jan 05, 2023 at 01:01:56PM -0800, Andrii Nakryiko wrote: > > > > Didn't find the best place to put this, so it will be here. I think it > > > > would be beneficial to discuss BPF helpers freeze in BPF office hours. > > > > So I took the liberty to put it up for next BPF office hours, 9am, Jan > > > > 12th 2022. I hope that some more people that have exposure to > > > > real-world BPF application and pains associated with all that could > > > > join the discussion, but obviously anyone is welcome as well, no > > > > matter which way they are leaning. > > > > > > > > Please consider joining, see details on Zoom meeting at [0] > > > > > > > > For the rest, please see below. I'll be out for a few days and won't > > > > be able to reply, my apologies. > > > > > > > > [0] https://docs.google.com/spreadsheets/d/1LfrDXZ9-fdhvPEp_LHkxAMYyxxpwBXjywWa0AejEveU/edit#gid=0 > > > > > > Thanks for adding it to the agenda. > > > Hopefully we'll be able to converge faster on a call. > > > > Yep, hopefully. Looking forward to BPF office hours this week. > > > > > > > > There are several things to discuss: > > > 1. whether or not to freeze helpers. > > > 2. whether dynptr accessors should be helpers or kfuncs. > > > 3. whether your future inline iterators should be helpers or kfuncs. > > > 4. whether cilium's bpf_sock_destroy should be helper or kfunc. > > I think these are all big questions. Maybe we can start with some > smaller questions? Here is a list of questions I have: > > 1. Do we want stable kfuncs (as stable as helpers)? Do we want > almost stable kfuncs? Yes. We've touched on some of that earlier. 
We can talk about a range: unstable, deprecated, starting to deprecate, stable plus orthogonal versioning scheme. > Will most users of stable APIs be as happy > with almost stable alternatives? kfuncs are very much analogous to EXPORT_SYMBOL_GPL. There is no versioning scheme, nor deprecation scheme for that. Yet in-kernel and out-of-tree users have been dealing with it. There are kABI things that make things stable to various degrees. So 'happy' is relative. Using that analogy... In-kernel bpf progs won't care. unstable or not they will get carried along automatically when kfuncs change. Out of tree bpf progs can be divided to kernel dependent and kernel independent. The former are similar to in-tree with extra pain that can be mitigated with kfunc detection. The latter will always use stable with understandable deprecation path. Yet it's all in theory. In practice networking folks are using conntrack kfuncs and xfrm kfuncs assuming we will make it all work somehow, though right now we're saying kfuncs are unstable only. So 'happy' and 'pain' are relative depending on the usefulness of kfunc. If bpf prog needs a feature it will use it. If it's a shiny new feature, the prog authors might wait until kfunc stabilizes. Which is exactly the point. We can wish for something to be useful, but we won't know until we actually use it for real and not in some selftest. And it becomes chicken and egg. If it's a cool new feature the bpf prog wants it to be stable to rely on it later, but because it's so new it's not clear whether it's actually useful, so we shouldn't be declaring it stable and cause kernel pains. > 2. Do we decide the stability of a kfunc when it is first added? Or > do we plan to promote (maybe also demote?) stability later? Claiming that something is stable on day one is a subjective opinion of the developer who's adding that feature. 
There could even be a giant user space project next to it
attempting to use that feature, but we've seen that with other
uapi-s in the past.

> 3. Besides stability, what are the concerns with kfuncs? How hard
> is it to resolve them?
> AFAICT, the concerns are: require BTF, require trampoline.

Only the former. kfuncs do not require bpf trampoline.

$ git grep bpf_jit_supports_kfunc_call
arch/arm64/net/bpf_jit_comp.c:bool bpf_jit_supports_kfunc_call(void)
arch/loongarch/net/bpf_jit.c:bool bpf_jit_supports_kfunc_call(void)
arch/x86/net/bpf_jit_comp.c:bool bpf_jit_supports_kfunc_call(void)
arch/x86/net/bpf_jit_comp32.c:bool bpf_jit_supports_kfunc_call(void)

iirc I've seen the patches for risc-v and arm32.

> Anything else? I guess we will never remove BTF dependency.
> Trampoline dependency is hard to resolve, but still possible?
>
> 4. We have feature-rich BPF with Linux-x86_64. Do we need some
> bare-minimal BPF, say for Linux-MIPS, or Windows-ARM, or
> even nvme-something? I guess this is also related to the BPF
> standard?

It's not related to ISA standardization.
We're not even talking about BTF standardization.
Nor about psABI (calling convention and such).
It's going to happen much much later.
On Wed, Jan 11, 2023 at 2:57 PM Maxim Mikityanskiy <maxtram95@gmail.com> wrote:
>
> On Tue, Jan 03, 2023 at 03:51:07PM -0800, Alexei Starovoitov wrote:
> > On Tue, Jan 03, 2023 at 12:43:58PM +0100, Daniel Borkmann wrote:
> > > Discoverability plus being able to know semantics from a user PoV to figure out when
> > > workarounds for older/newer kernels are required to be able to support both kernels.
> >
> > Sounds like your concern is that there could be a kfunc that changed its semantics,
> > but kept exact same name and arguments? Yeah. That would be bad, but we should prevent
> > such patches from landing. It's up to us to define sane and user friendly deprecation of kfuncs.
>
> I would advocate for adding versioning to BPF API (be it helpers or
> "stable" kfuncs). Right now we have two extremes: helpers that can't be
> changed/fixed/deprecated ever, and kfuncs that can be changed at any
> time, so the end users can't be sure new kernel won't break their stuff.
> Detecting and fixing the breakage can also be tricky: end users have to
> write different probes on a case-by-case basis, and sometimes it's not
> just a matter of checking the number of function parameters or presence
> of some definition (such difficulties happen when backporting drivers to
> older kernels, so I assume it may be an issue for BPF programs as well).
>
> Let's say we add a version number to the kernel, and the BPF program
> also has an API version number it's compiled for. Whenever something
> changes in the stable API on the kernel side, the version number is
> increased. At the same time, compatibility on the kernel side is
> preserved for some reasonable period of time (2 years, 5 years,
> whatever), which means that if the kernel loads a BPF program with an
> older version number, and that version is within the supported period of
> time, the kernel will behave in the old way, i.e. verify the old
> signature of a function, preserve the old behavior, etc.

Right.
I think somebody proposed a version scheme for kfuncs already.
There were so many replies I've lost track.
But yes it's definitely on the table and we should consider it.
Something like libbpf.map
We can declare which stable features are supported in which "version".

> This approach has the following upsides:
>
> 1. End users can stop worrying that some function changes unexpectedly,
> and they can have a smoother migration plan.
>
> 2. Clear deprecation schedule.
>
> 3. Easy way to probe for needed functionality, it's just a matter of
> comparing numbers: the BPF program loader checks that the kernel is new
> enough, and the kernel checks that the BPF program's API is not too old.
>
> 4. Kernel maintainers will have a deprecation strategy.

+1

> Cons:
>
> 1. Arguably a maintenance burden to preserve compatibility on the
> kernel side, but I would say it's a balance between helpers (which are
> maintenance burden forever) and kfuncs (which can be changed in every
> kernel version without keeping any compatibility). "Kfunc that changed
> its semantics is bad, we should prevent such patches" are just words,
> but if the developer needs to keep both versions for a while, it will
> serve as a calm-down mechanism to prevent changes that aren't really
> necessary. At the same time, the dead code will stop accumulating,
> because it can be removed according to the schedule.

That sounds like 'pro' instead of 'con' to me :)

> 2. Having a single version number complicates backporting features to
> older kernels, it would require backporting all previous features
> chronologically, even if there is no direct dependency. Having multiple
> version numbers (per feature) is cumbersome for the BPF program to
> declare. However, this issue is not new, it's already the case for BPF
> helpers (you can't backport new helpers skipping some other, because the
> numbers in the list must match).

yeah.
I recall amazon linux or something else backported helpers out of order and that screwed up bpf progs. That was the reason we added numbers to the FN macro in uapi/bpf.h That will hopefully prevent such mistakes. But practically speaking... The distro that does out-of-order backporting and skips certain helpers is saying: I'm defining my own kABI equivalent for bpf progs. In that sense there is zero difference between helpers and kfuncs from distro point of view and from point of view of their customers. Both helpers and kfuncs are neither stable nor unstable. This discussion is only about pros and cons of the upstream kernel and bpf progs that consume upstream kernel. If we include hyperscalers in the discussion then all helpers and all kfuncs immediately become stable from point of view of their engineers. Big datacenters can maintain kernels with whatever helpers and kfuncs they need. > > The above description intentionally doesn't specify whether it should be > applied to helpers or kfuncs, because it's a universal concept, about > which I would like to hear opinions about versioning without bias to > helpers or kfuncs. > > Regarding freezing helpers, I think there should be a solution for > deprecating obsolete stuff. There are historical examples of removing > things from UAPI: removing i386 support, ipchains, devfs, IrDA > subsystem, even a few architectures [1]. If we apply the versioning > approach to helpers, we can make long-waiting incompatible changes in > v1, keeping the current set of helpers as v0, used for programs that > don't declare a version. Eventually (in 5 years, in 10 years, whatever > sounds reasonable) we can drop v0 and remove the support for unversioned > BPF programs altogether, similar to how other big things were removed > from the kernel. Does it sound feasible? Not to me. Breaking uapi in whichever way with whatever excuse is not on the table. We've documented our rules long ago: Q: Does BPF have a stable ABI? 
------------------------------
A: YES. BPF instructions, arguments to BPF programs, set of helper
functions and their arguments, recognized return codes are all part
of ABI.

> > "Proper BPF helper" model is broken.
> > static void *(*bpf_map_lookup_elem)(void *map, const void *key) = (void *) 1;
> >
> > is a hack that works only when compiler optimizes the code.
>
> What if we replace codegen for helpers, so that it becomes something
> like this?
>
> static inline void *bpf_map_lookup_elem(void *map, const void *key)
> {
>     // pseudocode alert!
>     asm("call 1" : : "r1"(map), "r2"(key));
> }
>
> I.e. can we just throw in some inline BPF assembly that prepares
> registers and invokes a call instruction with the helper number? That
> should be portable across clang and gcc, allowing to stop relying on
> optimizations.

Great idea!
It needs "=r" to capture R0 into the 'ret' variable and then it should work.
clang may have issues with such asm, but should be fixable.
gcc is less clear. iirc they had their own incompatible inline asm :(
It's a bigger issue.
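The out-of-order backport problem mentioned earlier in this reply, and the reason for writing the number next to each entry of the FN macro, can be illustrated with a small X-macro. This is a sketch, not the actual list from uapi/bpf.h: the mapper and enum names are hypothetical, and only the three IDs shown (0, 1, 3) are taken from the upstream numbering.

```c
#include <assert.h>

/*
 * Hypothetical helper list of a kernel that backported map_delete_elem
 * but skipped map_update_elem (upstream ID 2).
 */
#define BACKPORT_FUNC_MAPPER(FN) \
    FN(unspec, 0)                \
    FN(map_lookup_elem, 1)       \
    FN(map_delete_elem, 3)

/* Positional numbering: the ID is whatever slot the entry lands in. */
#define POSITIONAL_FN(name, id) POSITIONAL_##name,
enum positional_ids { BACKPORT_FUNC_MAPPER(POSITIONAL_FN) };

/* Explicit numbering: the ID is written next to the name. */
#define EXPLICIT_FN(name, id) EXPLICIT_##name = id,
enum explicit_ids { BACKPORT_FUNC_MAPPER(EXPLICIT_FN) };

/* Skipping an entry silently shifts every later positional ID. */
static_assert(POSITIONAL_map_delete_elem == 2, "shifted from 3 to 2");
/* With explicit numbers the ID stays at the upstream value. */
static_assert(EXPLICIT_map_delete_elem == 3, "stays at 3");
```

A program compiled against upstream headers calls helper 3 for a delete; on the positionally numbered backport that number would point at a different helper (or none), which is the kind of breakage described above.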
On Wed, Jan 11, 2023 at 8:24 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > [...] > > > > 1. Do we want stable kfuncs (as stable as helpers)? Do we want > > almost stable kfuncs? > > Yes. We've touched on some of that earlier. > We can talk about a range: > unstable, deprecated, starting to deprecate, stable > plus orthogonal versioning scheme. > > > Will most users of stable APIs be as happy > > with almost stable alternatives? > > kfuncs are very much analogous to EXPORT_SYMBOL_GPL. > There is no versioning scheme, nor deprecation scheme for that. > Yet in-kernel and out-of-tree users have been dealing with it. > There are kABI things that make things stable to various degrees. > So 'happy' is relative. > Using that analogy... > In-kernel bpf progs won't care. unstable or not they will get > carried along automatically when kfuncs change. > Out of tree bpf progs can be divided to kernel dependent > and kernel independent. The former are similar to in-tree > with extra pain that can be mitigated with kfunc detection. > The latter will always use stable with understandable deprecation path. > Yet it's all in theory. > In practice networking folks are using conntrack kfuncs and > xfrm kfuncs assuming we will make it all work somehow, > though right now we're saying kfuncs are unstable only. I think we need something more stable than EXPORT_SYMBOL_GPL, because: 1) there are more OOT bpf progs than OOT drivers; 2) some BPF developers (network people in KP's categories) have less kernel experience, and thus have a stronger preference for more stable APIs. The range of stability on top of EXPORT_SYMBOL_GPL could be really helpful for these users. > > So 'happy' and 'pain' are relative depending on the usefulness > of kfunc. If bpf prog needs a feature it will use it. > If it's a shiny new feature, the prog authors might wait > until kfunc stabilizes. > Which is exactly the point. 
> We can wish for something to be useful, but we won't know > until we actually use it for real and not in some selftest. > > And it becomes chicken and egg. If it's a cool new feature > the bpf prog wants it to be stable to rely on it later, > but because it's so new it's not clear whether it's actually useful, > so we shouldn't be declaring it stable and cause kernel pains. > > > 2. Do we decide the stability of a kfunc when it is first added? Or > > do we plan to promote (maybe also demote?) stability later? > > Claiming that something is stable on day one > is a subjective opinion of the developer who's adding that feature. > There could even be a giant user space project next to it > attempting to use that feature, but we've seen that with other > uapi-s in the past. With the range of stability, stable could mean "not going away for at least 5 years". Then claiming something is stable means "I/we will support it for at least 5 years". It is probably not too crazy to make this type of promises for some core APIs. > > > 3. Besides stability, what are the concerns with kfuncs? How hard > > is it to resolve them? > > AFAICT, the concerns are: require BTF, require trampoline. > > Only the former. kfuncs do not require bpf trampoline. > > $ git grep bpf_jit_supports_kfunc_call > arch/arm64/net/bpf_jit_comp.c:bool bpf_jit_supports_kfunc_call(void) > arch/loongarch/net/bpf_jit.c:bool bpf_jit_supports_kfunc_call(void) > arch/x86/net/bpf_jit_comp.c:bool bpf_jit_supports_kfunc_call(void) > arch/x86/net/bpf_jit_comp32.c:bool bpf_jit_supports_kfunc_call(void) > > iirc I've seen the patches for risc-v and arm32. Thanks for the correction. Reading commits that enabled kfunc for different archs, I think it is easier than enabling trampolines. AFAICT, more stability of some kfuncs and better availability of kfuncs should address most of the concerns. I would like to hear Andrii's thoughts on this. Thanks, Song > > > Anything else? I guess we will never remove BTF dependency. 
> > Trampoline dependency is hard to resolve, but still possible?
> >
> > 4. We have feature-rich BPF with Linux-x86_64. Do we need some
> > bare-minimal BPF, say for Linux-MIPS, or Windows-ARM, or
> > even nvme-something? I guess this is also related to the BPF
> > standard?
>
> It's not related to ISA standardization.
> We're not even talking about BTF standardization.
> Nor about psABI (calling convention and such).
> It's going to happen much much later.
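The "range of stability" idea from this message, where stable means "not going away for at least 5 years", can be written down as a small policy check. Everything here is hypothetical: the level names follow the range listed in the thread, while `struct kfunc_promise`, `kfunc_may_remove()` and the year-based bookkeeping are assumptions made for the sake of the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stability levels, following the range named in the thread. */
enum kfunc_stability {
    KFUNC_UNSTABLE,
    KFUNC_STABLE,
    KFUNC_DEPRECATING, /* "starting to deprecate" */
    KFUNC_DEPRECATED,
};

struct kfunc_promise {
    enum kfunc_stability level;
    int stable_since_year; /* year the kfunc was declared stable, unused if unstable */
};

#define STABLE_SUPPORT_YEARS 5 /* "will support it for at least 5 years" */

/* True if removing the kfunc in `year` would not break the promise made for it. */
static bool kfunc_may_remove(const struct kfunc_promise *p, int year)
{
    assert(p);

    if (p->level == KFUNC_UNSTABLE)
        return true; /* no promise was ever made */

    /* Once declared stable, the window applies even while deprecating. */
    return year - p->stable_since_year >= STABLE_SUPPORT_YEARS;
}
```

A kfunc declared stable in 2023 could not be removed in 2027 under this policy, but could be in 2028; an unstable one can go at any time.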
> On Wed, Jan 11, 2023 at 2:57 PM Maxim Mikityanskiy <maxtram95@gmail.com> wrote:
>>
>> On Tue, Jan 03, 2023 at 03:51:07PM -0800, Alexei Starovoitov wrote:
>> > On Tue, Jan 03, 2023 at 12:43:58PM +0100, Daniel Borkmann wrote:
>> > > Discoverability plus being able to know semantics from a user PoV to figure out when
>> > > workarounds for older/newer kernels are required to be able to support both kernels.
>> >
>> > Sounds like your concern is that there could be a kfunc that changed its semantics,
>> > but kept exact same name and arguments? Yeah. That would be bad, but we should prevent
>> > such patches from landing. It's up to us to define sane and user
>> > friendly deprecation of kfuncs.
>>
>> I would advocate for adding versioning to BPF API (be it helpers or
>> "stable" kfuncs). Right now we have two extremes: helpers that can't be
>> changed/fixed/deprecated ever, and kfuncs that can be changed at any
>> time, so the end users can't be sure new kernel won't break their stuff.
>> Detecting and fixing the breakage can also be tricky: end users have to
>> write different probes on a case-by-case basis, and sometimes it's not
>> just a matter of checking the number of function parameters or presence
>> of some definition (such difficulties happen when backporting drivers to
>> older kernels, so I assume it may be an issue for BPF programs as well).
>>
>> Let's say we add a version number to the kernel, and the BPF program
>> also has an API version number it's compiled for. Whenever something
>> changes in the stable API on the kernel side, the version number is
>> increased. At the same time, compatibility on the kernel side is
>> preserved for some reasonable period of time (2 years, 5 years,
>> whatever), which means that if the kernel loads a BPF program with an
>> older version number, and that version is within the supported period of
>> time, the kernel will behave in the old way, i.e.
>> verify the old signature of a function, preserve the old behavior, etc.
>
> Right. I think somebody proposed a version scheme for kfuncs already.
> There were so many replies I've lost track.
> But yes it's definitely on the table and
> we should consider it.
> Something like libbpf.map
> We can declare which stable features are supported in which "version".
>
>> This approach has the following upsides:
>>
>> 1. End users can stop worrying that some function changes unexpectedly,
>> and they can have a smoother migration plan.
>>
>> 2. Clear deprecation schedule.
>>
>> 3. Easy way to probe for needed functionality, it's just a matter of
>> comparing numbers: the BPF program loader checks that the kernel is new
>> enough, and the kernel checks that the BPF program's API is not too old.
>>
>> 4. Kernel maintainers will have a deprecation strategy.
>
> +1
>
>> Cons:
>>
>> 1. Arguably a maintenance burden to preserve compatibility on the
>> kernel side, but I would say it's a balance between helpers (which are
>> maintenance burden forever) and kfuncs (which can be changed in every
>> kernel version without keeping any compatibility). "Kfunc that changed
>> its semantics is bad, we should prevent such patches" are just words,
>> but if the developer needs to keep both versions for a while, it will
>> serve as a calm-down mechanism to prevent changes that aren't really
>> necessary. At the same time, the dead code will stop accumulating,
>> because it can be removed according to the schedule.
>
> That sounds like 'pro' instead of 'con' to me :)
>
>> 2. Having a single version number complicates backporting features to
>> older kernels, it would require backporting all previous features
>> chronologically, even if there is no direct dependency. Having multiple
>> version numbers (per feature) is cumbersome for the BPF program to
>> declare.
However, this issue is not new, it's already the case for BPF >> helpers (you can't backport new helpers skipping some other, because the >> numbers in the list must match). > > yeah. I recall amazon linux or something else backported > helpers out of order and that screwed up bpf progs. > That was the reason we added numbers to the FN macro in uapi/bpf.h > That will hopefully prevent such mistakes. > > But practically speaking... > The distro that does out-of-order backporting and skips > certain helpers is saying: I'm defining my own kABI equivalent > for bpf progs. > In that sense there is zero difference between helpers and kfuncs > from distro point of view and from point of view of their customers. > Both helpers and kfuncs are neither stable nor unstable. > > This discussion is only about pros and cons of the upstream kernel > and bpf progs that consume upstream kernel. > > If we include hyperscalers in the discussion then all > helpers and all kfuncs immediately become stable from > point of view of their engineers. > Big datacenters can maintain kernels with whatever helpers > and kfuncs they need. > >> >> The above description intentionally doesn't specify whether it should be >> applied to helpers or kfuncs, because it's a universal concept, about >> which I would like to hear opinions about versioning without bias to >> helpers or kfuncs. >> >> Regarding freezing helpers, I think there should be a solution for >> deprecating obsolete stuff. There are historical examples of removing >> things from UAPI: removing i386 support, ipchains, devfs, IrDA >> subsystem, even a few architectures [1]. If we apply the versioning >> approach to helpers, we can make long-waiting incompatible changes in >> v1, keeping the current set of helpers as v0, used for programs that >> don't declare a version. 
>> Eventually (in 5 years, in 10 years, whatever
>> sounds reasonable) we can drop v0 and remove the support for unversioned
>> BPF programs altogether, similar to how other big things were removed
>> from the kernel. Does it sound feasible?
>
> Not to me. Breaking uapi in whichever way with whatever excuse
> is not on the table.
> We've documented our rules long ago:
>
> Q: Does BPF have a stable ABI?
> ------------------------------
> A: YES. BPF instructions, arguments to BPF programs, set of helper
> functions and their arguments, recognized return codes are all part
> of ABI.
>
>> > "Proper BPF helper" model is broken.
>> > static void *(*bpf_map_lookup_elem)(void *map, const void *key) = (void *) 1;
>> >
>> > is a hack that works only when compiler optimizes the code.
>>
>> What if we replace codegen for helpers, so that it becomes something
>> like this?
>>
>> static inline void *bpf_map_lookup_elem(void *map, const void *key)
>> {
>>     // pseudocode alert!
>>     asm("call 1" : : "r1"(map), "r2"(key));
>> }
>>
>> I.e. can we just throw in some inline BPF assembly that prepares
>> registers and invokes a call instruction with the helper number? That
>> should be portable across clang and gcc, allowing to stop relying on
>> optimizations.
>
> Great idea!

+1

> It needs "=r" to capture R0 into the 'ret' variable and then it should work.
> clang may have issues with such asm, but should be fixable.
> gcc is less clear.

That inline assembly should work with GCC as it is now. Both compilers
use the same syntax for the `call' instruction.

> iirc they had their own incompatible inline asm :(
> It's a bigger issue.

We are taking care of that, by adding support to the GNU assembler to
also understand the pseudo-C syntax used by llvm. This covers both .s
files specified in the compilation line, and inline asm statements.

Should be ready soon.
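For reference, the helper-call idea quoted above, with the "=r" output Alexei asks for, might look roughly like the following when written in the pseudo-C assembly syntax used by llvm. This is a sketch under assumptions, not a proposal from the thread: it borrows the register-setup and clobber pattern that libbpf's `bpf_tail_call_static()` uses, the name `map_lookup_asm()` is hypothetical, and the non-BPF branch is a made-up stand-in (a 4-slot int array indexed by the key) whose only purpose is to let the file compile and run on a host compiler.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__bpf__)
/*
 * BPF target: load the arguments into r1/r2, call helper number 1
 * (BPF_FUNC_map_lookup_elem) and copy r0 into the "=r" output.
 * r0-r5 are listed as clobbered to match the BPF calling convention,
 * which also keeps the compiler from placing the operands there.
 */
static inline void *map_lookup_asm(void *map, const void *key)
{
    void *ret;

    asm volatile("r1 = %[map]\n\t"
                 "r2 = %[key]\n\t"
                 "call 1\n\t"
                 "%[ret] = r0"
                 : [ret]"=r"(ret)
                 : [map]"r"(map), [key]"r"(key)
                 : "r0", "r1", "r2", "r3", "r4", "r5", "memory");
    return ret;
}
#else
/*
 * Host stand-in (hypothetical): treat `map` as int[4] and `key` as a
 * pointer to an unsigned index, returning NULL when out of range.
 */
static inline void *map_lookup_asm(void *map, const void *key)
{
    unsigned int idx;

    assert(map && key);
    idx = *(const unsigned int *)key;
    return idx < 4 ? (void *)((int *)map + idx) : NULL;
}
#endif
```

Unlike the function-pointer definition, the BPF branch does not depend on the optimizer folding a constant pointer into a call instruction; the call number is spelled out in the assembly itself.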
On Fri, Jan 13, 2023 at 1:45 AM Jose E. Marchesi <jose.marchesi@oracle.com> wrote:
>
> > iirc they had their own incompatible inline asm :(
> > It's a bigger issue.
>
> We are taking care of that, by adding support to the GNU assembler to
> also understand the pseudo-C syntax used by llvm. This covers both .s
> files specified in the compilation line, and inline asm statements.
>
> Should be ready soon.

This is awesome! Thank you.