Message ID | cover.1631289870.git.lorenzo@kernel.org (mailing list archive)
---|---
Series | mvneta: introduce XDP multi-buffer support
On Fri, 10 Sep 2021 18:14:06 +0200 Lorenzo Bianconi wrote:
> The two following ebpf helpers (and related selftests) have been introduced:
> - bpf_xdp_adjust_data:
>   Move xdp_md->data and xdp_md->data_end pointers in subsequent fragments
>   according to the offset provided by the ebpf program. This helper can be
>   used to read/write values in frame payload.
> - bpf_xdp_get_buff_len:
>   Return the total frame size (linear + paged parts)
>
> More info about the main idea behind this approach can be found here [1][2].

Is there much critique of the skb helpers we have? My intuition would be to follow a similar paradigm from the API perspective. It may seem trivial to us to switch between the two, but "normal" users could easily be confused.

By skb paradigm I mean skb_pull_data() and bpf_skb_load/store_bytes().

Alternatively, how about we produce a variation on skb_header_pointer() (use an on-stack buffer, or direct access if the entire region is in one frag)?

bpf_xdp_adjust_data() seems to add cost to helpers and TBH I'm not sure how practical it would be to applications. My understanding is that the application is not supposed to make assumptions about the fragment geometry, meaning data can be split at any point. Parsing data arbitrarily split into buffers is hard if pull() is not an option, let alone making such parsing provably correct.

Won't applications end up building something like skb_header_pointer() based on bpf_xdp_adjust_data(), anyway? In which case why don't we provide them what they need?

say:

	void *xdp_mb_pointer(struct xdp_buff *xdp_md, u32 flags,
			     u32 offset, u32 len, void *stack_buf)

flags and offset can be squashed into one u64 as needed. The helper returns a pointer to packet data, either the real one or stack_buf. The verifier has to be taught that the return value is NULL or a pointer which is safe with offsets up to @len.

If the reason for access is write we'd also need:

	void *xdp_mb_pointer_flush(struct xdp_buff *xdp_md, u32 flags,
				   u32 offset, u32 len, void *stack_buf)

Same inputs; if the stack buffer was used it does the write back, otherwise it's a nop.

Sorry for the longish email if I'm missing something obvious and/or discussed earlier.

The other thing I wanted to double check - was the decision on program compatibility made? Is a new program type an option? It'd be extremely useful operationally to be able to depend on kernel enforcement.
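To make the proposal concrete: under the signatures sketched above (a draft, not a merged kernel API), a program rewriting a TCP destination port might look like the following. The fixed offset assumes IPv4 without options; SEC()/bpf_htons() are standard libbpf usage.

	SEC("xdp")
	int rewrite_dport(struct xdp_md *ctx)
	{
		struct tcphdr buf, *th;
		__u32 off = ETH_HLEN + sizeof(struct iphdr);

		/* Direct pointer when the header sits in one frag,
		 * otherwise the bytes land in buf on the stack.
		 */
		th = xdp_mb_pointer(ctx, 0, off, sizeof(buf), &buf);
		if (!th)
			return XDP_DROP;	/* region outside the frame */

		th->dest = bpf_htons(8080);	/* safe up to @len bytes */

		/* Nop if th points into the packet; copies buf back
		 * into the frags if the region straddled a boundary.
		 */
		xdp_mb_pointer_flush(ctx, 0, off, sizeof(buf), &buf);
		return XDP_PASS;
	}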
> On Fri, 10 Sep 2021 18:14:06 +0200 Lorenzo Bianconi wrote:
> > The two following ebpf helpers (and related selftests) have been introduced:
> > - bpf_xdp_adjust_data:
> >   Move xdp_md->data and xdp_md->data_end pointers in subsequent fragments
> >   according to the offset provided by the ebpf program. This helper can be
> >   used to read/write values in frame payload.
> > - bpf_xdp_get_buff_len:
> >   Return the total frame size (linear + paged parts)
> >
> > More info about the main idea behind this approach can be found here [1][2].
>
> Is there much critique of the skb helpers we have? My intuition would be to follow a similar paradigm from the API perspective. It may seem trivial to us to switch between the two but "normal" users could easily be confused.
>
> By skb paradigm I mean skb_pull_data() and bpf_skb_load/store_bytes().
>
> Alternatively how about we produce a variation on skb_header_pointer() (use on-stack buffer or direct access if the entire region is in one frag).
>
> bpf_xdp_adjust_data() seems to add cost to helpers and TBH I'm not sure how practical it would be to applications. My understanding is that the application is not supposed to make assumptions about the fragment geometry, meaning data can be split at any point. Parsing data arbitrarily split into buffers is hard if pull() is not an option, let alone making such parsing provably correct.
>
> Won't applications end up building something like skb_header_pointer() based on bpf_xdp_adjust_data(), anyway? In which case why don't we provide them what they need?

Please correct me if I am wrong: here you mean in bpf_xdp_adjust_data() we are moving the logic to read/write data across fragment boundaries to the caller, right?

I do not have a clear view about what could be a real use-case for the helper (maybe John can help on this), but similar to what you are suggesting, what about doing something like bpf_skb_load/store_bytes()?

- bpf_xdp_load_bytes(struct xdp_buff *xdp_md, u32 offset, u32 len, void *data)

- bpf_xdp_store_bytes(struct xdp_buff *xdp_md, u32 offset, u32 len, void *data)

The helper can take care of reading/writing across fragment boundaries and remove any layout info from the caller. The only downside here (as for bpf_skb_load/store_bytes()) is we need to copy. But in a real application, is it actually an issue? (since we have much less pps for xdp multi-buff). Moreover I do not know if this solution will require some verifier changes.

@John: can this approach work in your use-case?

Anyway I think we should try to get everyone on the same page here, since the helper can change according to the specific use-case. Since this series is on the agenda for LPC next week, I hope you and others who have an opinion about this will find the time to come and discuss it during the conference :)

Regards,
Lorenzo

> say:
>
>	void *xdp_mb_pointer(struct xdp_buff *xdp_md, u32 flags,
>			     u32 offset, u32 len, void *stack_buf)
>
> flags and offset can be squashed into one u64 as needed. Helper returns pointer to packet data, either real one or stack_buf. Verifier has to be taught that the return value is NULL or a pointer which is safe with offsets up to @len.
>
> If the reason for access is write we'd also need:
>
>	void *xdp_mb_pointer_flush(struct xdp_buff *xdp_md, u32 flags,
>				   u32 offset, u32 len, void *stack_buf)
>
> Same inputs, if stack buffer was used it does write back, otherwise nop.
>
> Sorry for the longish email if I'm missing something obvious and/or discussed earlier.
>
> The other thing I wanted to double check - was the decision on program compatibility made? Is a new program type an option? It'd be extremely useful operationally to be able to depend on kernel enforcement.
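For contrast, the copy-based pair proposed here would be used like this (argument order as in this message's draft, not necessarily as any later merged API; the offset again assumes IPv4 without options):

	SEC("xdp")
	int rewrite_dport_copy(struct xdp_md *ctx)
	{
		__be16 dport;
		__u32 off = ETH_HLEN + sizeof(struct iphdr) +
			    offsetof(struct tcphdr, dest);

		/* The helper hides the fragment layout entirely; the
		 * cost is an unconditional copy through the stack.
		 */
		if (bpf_xdp_load_bytes(ctx, off, sizeof(dport), &dport))
			return XDP_DROP;

		if (dport == bpf_htons(80)) {
			dport = bpf_htons(8080);
			if (bpf_xdp_store_bytes(ctx, off, sizeof(dport), &dport))
				return XDP_DROP;
		}
		return XDP_PASS;
	}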
On Fri, 17 Sep 2021 16:51:06 +0200 Lorenzo Bianconi wrote:
> > Is there much critique of the skb helpers we have? My intuition would be to follow a similar paradigm from the API perspective. It may seem trivial to us to switch between the two but "normal" users could easily be confused.
> >
> > By skb paradigm I mean skb_pull_data() and bpf_skb_load/store_bytes().
> >
> > Alternatively how about we produce a variation on skb_header_pointer() (use on-stack buffer or direct access if the entire region is in one frag).
> >
> > bpf_xdp_adjust_data() seems to add cost to helpers and TBH I'm not sure how practical it would be to applications. My understanding is that the application is not supposed to make assumptions about the fragment geometry, meaning data can be split at any point. Parsing data arbitrarily split into buffers is hard if pull() is not an option, let alone making such parsing provably correct.
> >
> > Won't applications end up building something like skb_header_pointer() based on bpf_xdp_adjust_data(), anyway? In which case why don't we provide them what they need?
>
> Please correct me if I am wrong: here you mean in bpf_xdp_adjust_data() we are moving the logic to read/write data across fragment boundaries to the caller, right?

Right.

> I do not have a clear view about what could be a real use-case for the helper (maybe John can help on this), but similar to what you are suggesting, what about doing something like bpf_skb_load/store_bytes()?
>
> - bpf_xdp_load_bytes(struct xdp_buff *xdp_md, u32 offset, u32 len, void *data)
>
> - bpf_xdp_store_bytes(struct xdp_buff *xdp_md, u32 offset, u32 len, void *data)
>
> The helper can take care of reading/writing across fragment boundaries and remove any layout info from the caller. The only downside here (as for bpf_skb_load/store_bytes()) is we need to copy.

If bpf_xdp_load_bytes() / bpf_xdp_store_bytes() works for most we can start with that. In all honesty I don't know what the exact use cases for looking at data are, either. I'm primarily worried about exposing the kernel internals too early.

What I'm imagining is that data access mostly works around bad header/data split or loading some simple >L4 headers. On one hand such headers will fall into one frag, so 99.9% of the time the copy isn't really required. But as I said I'm happy with load/store, to begin with.

> But in a real application, is it actually an issue? (since we have much less pps for xdp multi-buff). Moreover I do not know if this solution will require some verifier changes.
>
> @John: can this approach work in your use-case?
>
> Anyway I think we should try to get everyone on the same page here, since the helper can change according to the specific use-case. Since this series is on the agenda for LPC next week, I hope you and others who have an opinion about this will find the time to come and discuss it during the conference :)

Yup, I'll be there. I'm not good at thinking on my feet tho, hence sharing my comments now.
On Fri, Sep 17, 2021 at 11:33 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> > > Alternatively how about we produce a variation on skb_header_pointer() (use on-stack buffer or direct access if the entire region is in one frag).
>
> If bpf_xdp_load_bytes() / bpf_xdp_store_bytes() works for most we can start with that. In all honesty I don't know what the exact use cases for looking at data are, either. I'm primarily worried about exposing the kernel internals too early.

I don't mind the xdp equivalent of skb_load_bytes, but the skb_header_pointer() idea is superior. When we did xdp with data/data_end there was no refine_retval_range concept in the verifier (iirc, or we just missed that opportunity). We'd need something more advanced: a pointer with a valid range refined by the input argument 'len', or NULL. The verifier doesn't have such a thing yet, but it fits as a combination of value_or_null plus refine_retval_range.

The bpf_xdp_header_pointer() and bpf_skb_header_pointer() would probably simplify bpf programs as well. There would be no need to deal with data/data_end.
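From the program's point of view, the verifier contract Alexei describes would look roughly like this (bpf_xdp_header_pointer() is hypothetical here; no such helper exists):

	char buf[64], *p;

	p = bpf_xdp_header_pointer(ctx, off, sizeof(buf), buf);
	if (!p)			/* value_or_null: NULL check required first */
		return XDP_DROP;

	/* refine_retval_range: after the check the verifier knows
	 * p[0]..p[63] are valid, with no data/data_end comparison.
	 */
	return p[0] == 0x45 ? XDP_PASS : XDP_DROP;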
On Fri, 17 Sep 2021 11:43:07 -0700 Alexei Starovoitov wrote:
> > If bpf_xdp_load_bytes() / bpf_xdp_store_bytes() works for most we can start with that. In all honesty I don't know what the exact use cases for looking at data are, either. I'm primarily worried about exposing the kernel internals too early.
>
> I don't mind the xdp equivalent of skb_load_bytes, but the skb_header_pointer() idea is superior. When we did xdp with data/data_end there was no refine_retval_range concept in the verifier (iirc, or we just missed that opportunity). We'd need something more advanced: a pointer with a valid range refined by the input argument 'len', or NULL. The verifier doesn't have such a thing yet, but it fits as a combination of value_or_null plus refine_retval_range. The bpf_xdp_header_pointer() and bpf_skb_header_pointer() would probably simplify bpf programs as well. There would be no need to deal with data/data_end.

What are your thoughts on inlining? Can we inline the common case of the header being in the "head"? Otherwise data/end comparisons would be faster.
On Fri, Sep 17, 2021 at 12:00 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 17 Sep 2021 11:43:07 -0700 Alexei Starovoitov wrote:
> > > If bpf_xdp_load_bytes() / bpf_xdp_store_bytes() works for most we can start with that. In all honesty I don't know what the exact use cases for looking at data are, either. I'm primarily worried about exposing the kernel internals too early.
> >
> > I don't mind the xdp equivalent of skb_load_bytes, but the skb_header_pointer() idea is superior. When we did xdp with data/data_end there was no refine_retval_range concept in the verifier (iirc, or we just missed that opportunity). We'd need something more advanced: a pointer with a valid range refined by the input argument 'len', or NULL. The verifier doesn't have such a thing yet, but it fits as a combination of value_or_null plus refine_retval_range. The bpf_xdp_header_pointer() and bpf_skb_header_pointer() would probably simplify bpf programs as well. There would be no need to deal with data/data_end.
>
> What are your thoughts on inlining? Can we inline the common case of the header being in the "head"? Otherwise data/end comparisons would be faster.

Yeah. It can be inlined by the verifier. It would still look like a call from the bpf prog's pov, with llvm doing spill/fill of scratched regs, but it's minor.

Also we can use the same bpf_header_pointer(ctx, ...) helper for both xdp and skb program types. They will have different implementations underneath, but this might make it possible to write bpf programs that could work in both xdp and skb context. I believe cilium has fancy macros to achieve that.
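If such a unified helper existed, shared parsing code could indeed be compiled into both program types. A sketch, where bpf_header_pointer() is declared by hand precisely because no such helper (or helper ID) exists:

	#include <linux/bpf.h>
	#include <linux/if_ether.h>
	#include <linux/in.h>
	#include <linux/ip.h>
	#include <linux/pkt_cls.h>
	#include <bpf/bpf_helpers.h>

	/* Hypothetical unified helper - the ID below is a placeholder. */
	static void *(*bpf_header_pointer)(void *ctx, __u32 flags, __u32 off,
					   __u32 len, void *stack_buf) = (void *)0xffff;

	static __always_inline int l4_proto(void *ctx)
	{
		struct iphdr buf, *iph;

		/* Same call from either program type; the kernel would
		 * pick the xdp or skb implementation underneath.
		 */
		iph = bpf_header_pointer(ctx, 0, ETH_HLEN, sizeof(buf), &buf);
		return iph ? iph->protocol : -1;
	}

	SEC("xdp")
	int xdp_prog(struct xdp_md *ctx)
	{
		return l4_proto(ctx) == IPPROTO_TCP ? XDP_PASS : XDP_DROP;
	}

	SEC("tc")
	int tc_prog(struct __sk_buff *ctx)
	{
		return l4_proto(ctx) == IPPROTO_TCP ? TC_ACT_OK : TC_ACT_SHOT;
	}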
Alexei Starovoitov wrote:
> On Fri, Sep 17, 2021 at 12:00 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Fri, 17 Sep 2021 11:43:07 -0700 Alexei Starovoitov wrote:
> > > > If bpf_xdp_load_bytes() / bpf_xdp_store_bytes() works for most we can start with that. In all honesty I don't know what the exact use cases for looking at data are, either. I'm primarily worried about exposing the kernel internals too early.
> > >
> > > I don't mind the xdp equivalent of skb_load_bytes, but the skb_header_pointer() idea is superior. When we did xdp with data/data_end there was no refine_retval_range concept in the verifier (iirc, or we just missed that opportunity). We'd need something more advanced: a pointer with a valid range refined by the input argument 'len', or NULL. The verifier doesn't have such a thing yet, but it fits as a combination of value_or_null plus refine_retval_range. The bpf_xdp_header_pointer() and bpf_skb_header_pointer() would probably simplify bpf programs as well. There would be no need to deal with data/data_end.
> >
> > What are your thoughts on inlining? Can we inline the common case of the header being in the "head"? Otherwise data/end comparisons would be faster.
>
> Yeah. It can be inlined by the verifier. It would still look like a call from the bpf prog's pov, with llvm doing spill/fill of scratched regs, but it's minor.
>
> Also we can use the same bpf_header_pointer(ctx, ...) helper for both xdp and skb program types. They will have different implementations underneath, but this might make it possible to write bpf programs that could work in both xdp and skb context. I believe cilium has fancy macros to achieve that.

Hi,

First, a header_pointer() logic that works across skb and xdp seems like a great idea to me. I wonder though, instead of doing the copy into a new buffer for an offset past the initial frag like what is done in skb_header_pointer(), could we just walk the frags and point at the new offset? This is what we do on the socket side with bpf_msg_pull_data() for example. For XDP it should also work. The skb case would depend on clone state and things, so might be a bit more tricky there.

This has the advantage of only doing the copy when it's necessary. This can be useful for example when reading the tail of an IPsec packet. With a blind copy most packets will get hit with a copy. By just writing the pkt->data and pkt->data_end we can avoid this case.

Lorenz originally implemented something similar earlier and we had the refine retval logic. It failed on no-alu32 for some reason we could revisit. I didn't mind the current helper returning with the data pointer set to the start of the frag, so we stopped following up on it.

I agree though that the current implementation puts a lot on the BPF writer. So getting both cases covered - "I want to take pains in my BPF prog to avoid copies" and "I just want these bytes handled behind a single helper" - seems good to me.

Thanks,
John
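The lookup John describes could work roughly like this on the kernel side - a sketch of the idea only, not code from the series (xdp_get_shared_info_from_buff() is the accessor the multi-buff patches use for the frags array):

	static void *xdp_mb_pointer(struct xdp_buff *xdp, u32 offset, u32 len,
				    void *stack_buf)
	{
		struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
		u32 head_len = xdp->data_end - xdp->data;
		int i;

		if (offset + len <= head_len)
			return xdp->data + offset;	/* direct, no copy */

		offset -= head_len;
		for (i = 0; i < sinfo->nr_frags; i++) {
			skb_frag_t *frag = &sinfo->frags[i];
			u32 size = skb_frag_size(frag);

			if (offset + len <= size)
				return skb_frag_address(frag) + offset;
			if (offset < size)
				break;		/* straddles this frag */
			offset -= size;
		}
		/* Straddles fragments (or out of bounds): only now does
		 * the caller fall back to copying into stack_buf.
		 */
		return NULL;
	}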
John Fastabend <john.fastabend@gmail.com> writes:

> Alexei Starovoitov wrote:
>> On Fri, Sep 17, 2021 at 12:00 PM Jakub Kicinski <kuba@kernel.org> wrote:
>>> What are your thoughts on inlining? Can we inline the common case of the header being in the "head"? Otherwise data/end comparisons would be faster.
>>
>> Yeah. It can be inlined by the verifier. It would still look like a call from the bpf prog's pov, with llvm doing spill/fill of scratched regs, but it's minor.
>>
>> Also we can use the same bpf_header_pointer(ctx, ...) helper for both xdp and skb program types. They will have different implementations underneath, but this might make it possible to write bpf programs that could work in both xdp and skb context. I believe cilium has fancy macros to achieve that.
>
> Hi,
>
> First, a header_pointer() logic that works across skb and xdp seems like a great idea to me. I wonder though, instead of doing the copy into a new buffer for an offset past the initial frag like what is done in skb_header_pointer(), could we just walk the frags and point at the new offset? This is what we do on the socket side with bpf_msg_pull_data() for example. For XDP it should also work. The skb case would depend on clone state and things, so might be a bit more tricky there.
>
> This has the advantage of only doing the copy when it's necessary. This can be useful for example when reading the tail of an IPsec packet. With a blind copy most packets will get hit with a copy. By just writing the pkt->data and pkt->data_end we can avoid this case.
>
> Lorenz originally implemented something similar earlier and we had the refine retval logic. It failed on no-alu32 for some reason we could revisit. I didn't mind the current helper returning with the data pointer set to the start of the frag, so we stopped following up on it.
>
> I agree though that the current implementation puts a lot on the BPF writer. So getting both cases covered - "I want to take pains in my BPF prog to avoid copies" and "I just want these bytes handled behind a single helper" - seems good to me.

I'm OK with a bpf_header_pointer()-type helper - I quite like the in-kernel version of this for SKBs, so replicating it as a BPF helper would be great. But I'm a little worried about taking a performance hit.

I.e., if you do:

	ptr = bpf_header_pointer(pkt, offset, len, stack_ptr)
	*ptr = xxx;

then, if the helper ended up copying the data into the stack pointer, you didn't actually change anything in the packet, so you need to do a writeback.

Jakub suggested up-thread that this should be done with some kind of flush() helper. But you don't know whether the header_pointer() helper copied the data, so you always need to call the flush() helper, which will incur overhead. If the verifier can inline the helpers that will lower it, but will it be enough to make it negligible?

-Toke
On Sat, 18 Sep 2021 13:53:35 +0200 Toke Høiland-Jørgensen wrote:
> I'm OK with a bpf_header_pointer()-type helper - I quite like the in-kernel version of this for SKBs, so replicating it as a BPF helper would be great. But I'm a little worried about taking a performance hit.
>
> I.e., if you do:
>
>	ptr = bpf_header_pointer(pkt, offset, len, stack_ptr)
>	*ptr = xxx;
>
> then, if the helper ended up copying the data into the stack pointer, you didn't actually change anything in the packet, so you need to do a writeback.
>
> Jakub suggested up-thread that this should be done with some kind of flush() helper. But you don't know whether the header_pointer() helper copied the data, so you always need to call the flush() helper, which will incur overhead. If the verifier can inline the helpers that will lower it, but will it be enough to make it negligible?

Depends on the assumptions the program otherwise makes, right?

For reading I'd expect a *layout-independent* TC program would replace approximately:

	ptr = <some_ptr>;
	if (ptr + CONST >= md->ptr_end) {
		if (bpf_pull_data(md, off + CONST))
			return DROP;
		ptr = <some_ptr>;
		if (ptr + CONST >= md->ptr_end)
			return DROP; /* da hell? */
	}

With this (pre-inlining):

	ptr = bpf_header_pointer(md, offset, len, stack);
	if (!ptr)
		return DROP;

Post-inlining (assuming static validation of args to prevent wraps):

	if (md->ptr + args->off + args->len < md->ptr_end)
		ptr = md->ptr + args->off;
	else
		ptr = __bpf_header_pointer(md, offset, len, stack);
	if (!ptr)
		return DROP;

But that's based on guesswork so perhaps I'm off base.

Regarding the flush() I was expecting that most progs will not modify the packet (or at least won't modify most headers they load), so no point paying the price of tracking changes auto-magically.

In fact I don't think there is anything infra can do better for flushing than the prog itself:

	bool mod = false;

	ptr = bpf_header_pointer(...);
	...
	if (some_cond(...)) {
		change_packet(...);
		mod = true;
	}
	...
	if (mod)
		bpf_header_pointer_flush();

is simple enough.. to me.
Jakub Kicinski <kuba@kernel.org> writes:

> On Sat, 18 Sep 2021 13:53:35 +0200 Toke Høiland-Jørgensen wrote:
>> I'm OK with a bpf_header_pointer()-type helper - I quite like the in-kernel version of this for SKBs, so replicating it as a BPF helper would be great. But I'm a little worried about taking a performance hit.
>>
>> I.e., if you do:
>>
>>	ptr = bpf_header_pointer(pkt, offset, len, stack_ptr)
>>	*ptr = xxx;
>>
>> then, if the helper ended up copying the data into the stack pointer, you didn't actually change anything in the packet, so you need to do a writeback.
>>
>> Jakub suggested up-thread that this should be done with some kind of flush() helper. But you don't know whether the header_pointer() helper copied the data, so you always need to call the flush() helper, which will incur overhead. If the verifier can inline the helpers that will lower it, but will it be enough to make it negligible?
>
> Depends on the assumptions the program otherwise makes, right?
>
> For reading I'd expect a *layout-independent* TC program would replace approximately:
>
>	ptr = <some_ptr>;
>	if (ptr + CONST >= md->ptr_end) {
>		if (bpf_pull_data(md, off + CONST))
>			return DROP;
>		ptr = <some_ptr>;
>		if (ptr + CONST >= md->ptr_end)
>			return DROP; /* da hell? */
>	}
>
> With this (pre-inlining):
>
>	ptr = bpf_header_pointer(md, offset, len, stack);
>	if (!ptr)
>		return DROP;
>
> Post-inlining (assuming static validation of args to prevent wraps):
>
>	if (md->ptr + args->off + args->len < md->ptr_end)
>		ptr = md->ptr + args->off;
>	else
>		ptr = __bpf_header_pointer(md, offset, len, stack);
>	if (!ptr)
>		return DROP;
>
> But that's based on guesswork so perhaps I'm off base.

Yeah, that's more or less what I was thinking...

> Regarding the flush() I was expecting that most progs will not modify the packet (or at least won't modify most headers they load), so no point paying the price of tracking changes auto-magically.

... but I guess my mental model assumed that packet writes would be just as frequent as reads, in which case it would be a win if you could amend your example like:

> In fact I don't think there is anything infra can do better for flushing than the prog itself:
>
>	bool mod = false;
>
>	ptr = bpf_header_pointer(...);
>	...
>	if (some_cond(...)) {
>		change_packet(...);
>		mod = true;
>	}
>	...
>	if (mod)

to have an additional check like:

	if (mod && ptr == stack)

(or something to that effect). No?

-Toke

>		bpf_header_pointer_flush();
>
> is simple enough.. to me.
On Mon, 20 Sep 2021 23:01:48 +0200 Toke Høiland-Jørgensen wrote:
> > In fact I don't think there is anything infra can do better for flushing than the prog itself:
> >
> >	bool mod = false;
> >
> >	ptr = bpf_header_pointer(...);
> >	...
> >	if (some_cond(...)) {
> >		change_packet(...);
> >		mod = true;
> >	}
> >	...
> >	if (mod)
>
> to have an additional check like:
>
>	if (mod && ptr == stack)
>
> (or something to that effect). No?

Good point. Do you think we should have the kernel add/inline this optimization or have the user do it explicitly?

The draft API was:

	void *xdp_mb_pointer_flush(struct xdp_buff *xdp_md, u32 flags,
				   u32 offset, u32 len, void *stack_buf)

Which does not take the ptr returned by header_pointer(), but that's easy to add (well, easy other than the fact it'd be the 6th arg).

BTW I drafted the API this way to cater to the case where flush() is called without a prior call to header_pointer(). For when a packet trailer or header is populated directly from a map value. Dunno if that's actually useful, either.
Jakub Kicinski <kuba@kernel.org> writes:

> On Mon, 20 Sep 2021 23:01:48 +0200 Toke Høiland-Jørgensen wrote:
>>> In fact I don't think there is anything infra can do better for flushing than the prog itself:
>>>
>>>	bool mod = false;
>>>
>>>	ptr = bpf_header_pointer(...);
>>>	...
>>>	if (some_cond(...)) {
>>>		change_packet(...);
>>>		mod = true;
>>>	}
>>>	...
>>>	if (mod)
>>
>> to have an additional check like:
>>
>>	if (mod && ptr == stack)
>>
>> (or something to that effect). No?
>
> Good point. Do you think we should have the kernel add/inline this optimization or have the user do it explicitly?

Hmm, good question. On the one hand it seems like an easy optimisation to add, but on the other hand maybe the caller has other logic that can better know how/when to omit the check.

Hmm, but the helper needs to check it anyway, doesn't it? At least it can't just blindly memcpy() if the source and destination would be the same...

> The draft API was:
>
>	void *xdp_mb_pointer_flush(struct xdp_buff *xdp_md, u32 flags,
>				   u32 offset, u32 len, void *stack_buf)
>
> Which does not take the ptr returned by header_pointer(), but that's easy to add (well, easy other than the fact it'd be the 6th arg).

I guess we could play some trickery with stuffing offset/len/flags into one or two u64s to save an argument or two?

> BTW I drafted the API this way to cater to the case where flush() is called without a prior call to header_pointer(). For when a packet trailer or header is populated directly from a map value. Dunno if that's actually useful, either.

Ah, didn't think of that; so then it really becomes a generic xdp_store_bytes()-type helper? Might be useful, I suppose. Adding headers is certainly a fairly common occurrence, but dunno to what extent they'd be copied wholesale from a map (hadn't thought about doing that before either).

-Toke
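If the argument count does become the blocker, the packing Toke mentions could look something like this (purely illustrative; neither the macro nor the argument layout is from any patch):

	/* Pack offset and length into one u64 to stay within the
	 * five-argument limit of BPF helpers.
	 */
	#define PTR_ARG(off, len)	(((__u64)(off) << 32) | (__u32)(len))

	/* program side */
	ptr = xdp_mb_pointer(ctx, flags, PTR_ARG(off, len), stack_buf);

	/* helper side unpacking */
	u32 offset = arg >> 32;
	u32 len    = (u32)arg;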
On 21 Sep 2021, at 0:44, Toke Høiland-Jørgensen wrote:

> Jakub Kicinski <kuba@kernel.org> writes:
>
>> On Mon, 20 Sep 2021 23:01:48 +0200 Toke Høiland-Jørgensen wrote:
>>>> In fact I don't think there is anything infra can do better for flushing than the prog itself:
>>>>
>>>>	bool mod = false;
>>>>
>>>>	ptr = bpf_header_pointer(...);
>>>>	...
>>>>	if (some_cond(...)) {
>>>>		change_packet(...);
>>>>		mod = true;
>>>>	}
>>>>	...
>>>>	if (mod)
>>>
>>> to have an additional check like:
>>>
>>>	if (mod && ptr == stack)
>>>
>>> (or something to that effect). No?
>>
>> Good point. Do you think we should have the kernel add/inline this optimization or have the user do it explicitly?
>
> Hmm, good question. On the one hand it seems like an easy optimisation to add, but on the other hand maybe the caller has other logic that can better know how/when to omit the check.
>
> Hmm, but the helper needs to check it anyway, doesn't it? At least it can't just blindly memcpy() if the source and destination would be the same...
>
>> The draft API was:
>>
>>	void *xdp_mb_pointer_flush(struct xdp_buff *xdp_md, u32 flags,
>>				   u32 offset, u32 len, void *stack_buf)
>>
>> Which does not take the ptr returned by header_pointer(), but that's easy to add (well, easy other than the fact it'd be the 6th arg).
>
> I guess we could play some trickery with stuffing offset/len/flags into one or two u64s to save an argument or two?
>
>> BTW I drafted the API this way to cater to the case where flush() is called without a prior call to header_pointer(). For when a packet trailer or header is populated directly from a map value. Dunno if that's actually useful, either.
>
> Ah, didn't think of that; so then it really becomes a generic xdp_store_bytes()-type helper? Might be useful, I suppose. Adding headers is certainly a fairly common occurrence, but dunno to what extent they'd be copied wholesale from a map (hadn't thought about doing that before either).

Sorry for commenting late but I was busy and had to catch up on emails...

I like the idea, as these APIs are exactly what I proposed in April: https://lore.kernel.org/bpf/FD3E6E08-DE78-4FBA-96F6-646C93E88631@redhat.com/

I did not call it flush, as it can be used as a general function to copy data to a specific location.

//Eelco
On Tue, Sep 21, 2021 at 12:04 PM Eelco Chaudron <echaudro@redhat.com> wrote:
>
> On 21 Sep 2021, at 0:44, Toke Høiland-Jørgensen wrote:
>
> > Jakub Kicinski <kuba@kernel.org> writes:
> >
> >> On Mon, 20 Sep 2021 23:01:48 +0200 Toke Høiland-Jørgensen wrote:
> >>>> In fact I don't think there is anything infra can do better for flushing than the prog itself:
> >>>>
> >>>>	bool mod = false;
> >>>>
> >>>>	ptr = bpf_header_pointer(...);
> >>>>	...
> >>>>	if (some_cond(...)) {
> >>>>		change_packet(...);
> >>>>		mod = true;
> >>>>	}
> >>>>	...
> >>>>	if (mod)
> >>>
> >>> to have an additional check like:
> >>>
> >>>	if (mod && ptr == stack)
> >>>
> >>> (or something to that effect). No?
> >>
> >> Good point. Do you think we should have the kernel add/inline this optimization or have the user do it explicitly?
> >
> > Hmm, good question. On the one hand it seems like an easy optimisation to add, but on the other hand maybe the caller has other logic that can better know how/when to omit the check.
> >
> > Hmm, but the helper needs to check it anyway, doesn't it? At least it can't just blindly memcpy() if the source and destination would be the same...
> >
> >> The draft API was:
> >>
> >>	void *xdp_mb_pointer_flush(struct xdp_buff *xdp_md, u32 flags,
> >>				   u32 offset, u32 len, void *stack_buf)
> >>
> >> Which does not take the ptr returned by header_pointer(), but that's easy to add (well, easy other than the fact it'd be the 6th arg).
> >
> > I guess we could play some trickery with stuffing offset/len/flags into one or two u64s to save an argument or two?
> >
> >> BTW I drafted the API this way to cater to the case where flush() is called without a prior call to header_pointer(). For when a packet trailer or header is populated directly from a map value. Dunno if that's actually useful, either.
> >
> > Ah, didn't think of that; so then it really becomes a generic xdp_store_bytes()-type helper? Might be useful, I suppose. Adding headers is certainly a fairly common occurrence, but dunno to what extent they'd be copied wholesale from a map (hadn't thought about doing that before either).
>
> Sorry for commenting late but I was busy and had to catch up on emails...
>
> I like the idea, as these APIs are exactly what I proposed in April: https://lore.kernel.org/bpf/FD3E6E08-DE78-4FBA-96F6-646C93E88631@redhat.com/
>
> I did not call it flush, as it can be used as a general function to copy data to a specific location.

Here is some performance data (throughput) for this patch set on i40e (40 Gbit/s NIC). All using the xdp_rxq_info sample and NO multi-buffer packets.

With v14 only:

	XDP_DROP: +4%
	XDP_TX:   +1%
	XDP_PASS: -1%

With v14 plus multi-buffer support implemented in i40e courtesy of Tirtha:

	XDP_DROP: +3%
	XDP_TX:   -1%
	XDP_PASS: -2%

/Magnus

> //Eelco
On Mon, 20 Sept 2021 at 23:46, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > The draft API was:
> >
> >	void *xdp_mb_pointer_flush(struct xdp_buff *xdp_md, u32 flags,
> >				   u32 offset, u32 len, void *stack_buf)
> >
> > Which does not take the ptr returned by header_pointer(), but that's easy to add (well, easy other than the fact it'd be the 6th arg).
>
> I guess we could play some trickery with stuffing offset/len/flags into one or two u64s to save an argument or two?

Adding another pointer arg seems really hard to explain as an API. What happens if I pass the "wrong" ptr? What happens if I pass NULL?

How about this: instead of taking stack_ptr, take the return value from header_pointer as an argument. Then use the already existing (right ;P) inlining to do the following:

	if (md->ptr + args->off != ret_ptr)
		__pointer_flush(...)

This means that __pointer_flush has to deal with aliased memory though, so it would always have to memmove. Probably OK for the "slow" path?

Lorenz
On Thu, 16 Sept 2021 at 18:47, Jakub Kicinski <kuba@kernel.org> wrote:
>
> Won't applications end up building something like skb_header_pointer() based on bpf_xdp_adjust_data(), anyway? In which case why don't we provide them what they need?
>
> say:
>
>	void *xdp_mb_pointer(struct xdp_buff *xdp_md, u32 flags,
>			     u32 offset, u32 len, void *stack_buf)
>
> flags and offset can be squashed into one u64 as needed. Helper returns pointer to packet data, either real one or stack_buf. Verifier has to be taught that the return value is NULL or a pointer which is safe with offsets up to @len.
>
> If the reason for access is write we'd also need:
>
>	void *xdp_mb_pointer_flush(struct xdp_buff *xdp_md, u32 flags,
>				   u32 offset, u32 len, void *stack_buf)

Yes! This would be so much better than bpf_skb_load/store_bytes(), especially if we can use it for both XDP and skb contexts as stated elsewhere in this thread.
Lorenz Bauer <lmb@cloudflare.com> writes:

> On Thu, 16 Sept 2021 at 18:47, Jakub Kicinski <kuba@kernel.org> wrote:
>>
>> Won't applications end up building something like skb_header_pointer() based on bpf_xdp_adjust_data(), anyway? In which case why don't we provide them what they need?
>>
>> say:
>>
>>	void *xdp_mb_pointer(struct xdp_buff *xdp_md, u32 flags,
>>			     u32 offset, u32 len, void *stack_buf)
>>
>> flags and offset can be squashed into one u64 as needed. Helper returns pointer to packet data, either real one or stack_buf. Verifier has to be taught that the return value is NULL or a pointer which is safe with offsets up to @len.
>>
>> If the reason for access is write we'd also need:
>>
>>	void *xdp_mb_pointer_flush(struct xdp_buff *xdp_md, u32 flags,
>>				   u32 offset, u32 len, void *stack_buf)
>
> Yes! This would be so much better than bpf_skb_load/store_bytes(), especially if we can use it for both XDP and skb contexts as stated elsewhere in this thread.

Alright. Let's see if we can go this route, then :)

-Toke
Lorenz Bauer <lmb@cloudflare.com> writes:

> On Mon, 20 Sept 2021 at 23:46, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>> The draft API was:
>>>
>>>	void *xdp_mb_pointer_flush(struct xdp_buff *xdp_md, u32 flags,
>>>				   u32 offset, u32 len, void *stack_buf)
>>>
>>> Which does not take the ptr returned by header_pointer(), but that's easy to add (well, easy other than the fact it'd be the 6th arg).
>>
>> I guess we could play some trickery with stuffing offset/len/flags into one or two u64s to save an argument or two?
>
> Adding another pointer arg seems really hard to explain as an API. What happens if I pass the "wrong" ptr? What happens if I pass NULL?
>
> How about this: instead of taking stack_ptr, take the return value from header_pointer as an argument.

Hmm, that's a good point; I do think that passing the return value from header_pointer() is more natural as well (you're flushing the pointer you just wrote to, after all).

> Then use the already existing (right ;P) inlining to do the following:
>
>	if (md->ptr + args->off != ret_ptr)
>		__pointer_flush(...)

The inlining is orthogonal, though, right? The helper can do this check whether it's a proper CALL or not (although obviously for performance reasons we do want it to inline, at least eventually). In particular, I believe we can make progress on this patch series without working out the inlining :)

> This means that __pointer_flush has to deal with aliased memory though, so it would always have to memmove. Probably OK for the "slow" path?

Erm, not sure what you mean here? Yeah, flushing is going to take longer if you ended up using the stack pointer instead of writing directly to the packet. That's kinda intrinsic? Or am I misunderstanding you?

-Toke
On Wed, 29 Sept 2021 at 13:25, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> > Then use the already existing (right ;P) inlining to do the following:
> >
> >	if (md->ptr + args->off != ret_ptr)
> >		__pointer_flush(...)
>
> The inlining is orthogonal, though, right? The helper can do this check whether it's a proper CALL or not (although obviously for performance reasons we do want it to inline, at least eventually). In particular, I believe we can make progress on this patch series without working out the inlining :)

Yes, I was just worried that your answer would be "it's too expensive" ;)

> > This means that __pointer_flush has to deal with aliased memory though, so it would always have to memmove. Probably OK for the "slow" path?
>
> Erm, not sure what you mean here? Yeah, flushing is going to take longer if you ended up using the stack pointer instead of writing directly to the packet. That's kinda intrinsic? Or am I misunderstanding you?

I think I misunderstood your comment about memcpy to mean "want to avoid aliased memory for perf reasons". Never mind!
On Wed, 29 Sept 2021 at 13:10, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Lorenz Bauer <lmb@cloudflare.com> writes:
>
> > On Thu, 16 Sept 2021 at 18:47, Jakub Kicinski <kuba@kernel.org> wrote:
> >>
> >> Won't applications end up building something like skb_header_pointer() based on bpf_xdp_adjust_data(), anyway? In which case why don't we provide them what they need?
> >>
> >> say:
> >>
> >>	void *xdp_mb_pointer(struct xdp_buff *xdp_md, u32 flags,
> >>			     u32 offset, u32 len, void *stack_buf)
> >>
> >> flags and offset can be squashed into one u64 as needed. Helper returns pointer to packet data, either real one or stack_buf. Verifier has to be taught that the return value is NULL or a pointer which is safe with offsets up to @len.
> >>
> >> If the reason for access is write we'd also need:
> >>
> >>	void *xdp_mb_pointer_flush(struct xdp_buff *xdp_md, u32 flags,
> >>				   u32 offset, u32 len, void *stack_buf)
> >
> > Yes! This would be so much better than bpf_skb_load/store_bytes(), especially if we can use it for both XDP and skb contexts as stated elsewhere in this thread.
>
> Alright. Let's see if we can go this route, then :)

Something I forgot to mention: you could infer that an XDP program is mb-aware if it only does packet access via the helpers. Put another way, it might be nice if ctx->data wasn't accessible in mb XDP. That way I know that all packet access has to handle an mb-aware ctx (think pulling in functions via headers or even pre-compiled bpf libraries).
On Wed, 29 Sep 2021 11:36:33 +0100 Lorenz Bauer wrote:
> On Mon, 20 Sept 2021 at 23:46, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > > The draft API was:
> > >
> > >	void *xdp_mb_pointer_flush(struct xdp_buff *xdp_md, u32 flags,
> > >				   u32 offset, u32 len, void *stack_buf)
> > >
> > > Which does not take the ptr returned by header_pointer(), but that's easy to add (well, easy other than the fact it'd be the 6th arg).
> >
> > I guess we could play some trickery with stuffing offset/len/flags into one or two u64s to save an argument or two?
>
> Adding another pointer arg seems really hard to explain as an API. What happens if I pass the "wrong" ptr? What happens if I pass NULL?

Sure. We can leave the checking to the program then, but that ties our hands for the implementation changes later on. Not sure which pointer type will be chosen for the ret value, but it may allow error checking at verification.

> How about this: instead of taking stack_ptr, take the return value from header_pointer as an argument. Then use the already existing (right ;P) inlining to do the following:
>
>	if (md->ptr + args->off != ret_ptr)
>		__pointer_flush(...)

That only checks for the case where the pointer is in the "head" frag, and is not generally correct. You also need to check that the region actually falls within the first frag; otherwise the BPF stack may "happen" to follow the head page and the math will work out.

It would also be slower than Lorenzo's current code, which allows access to tail pages without copying.
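In other words, a correct inline fast path has to combine a bounds check with the pointer comparison - something like this sketch, where head_len stands for the length of the linear area:

	/* Pointer equality alone can lie: the stack buffer may happen
	 * to sit right after the head page. Only skip the flush when
	 * the region provably fits in the head.
	 */
	if (off + len <= head_len && ret_ptr == md->ptr + off)
		return;				/* modified in place */
	__pointer_flush(md, off, len, stack_buf);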
On Wed, 29 Sep 2021 14:25:09 +0200 Toke Høiland-Jørgensen wrote:
> >> I guess we could play some trickery with stuffing offset/len/flags into one or two u64s to save an argument or two?
> >
> > Adding another pointer arg seems really hard to explain as an API. What happens if I pass the "wrong" ptr? What happens if I pass NULL?
> >
> > How about this: instead of taking stack_ptr, take the return value from header_pointer as an argument.
>
> Hmm, that's a good point; I do think that passing the return value from header_pointer() is more natural as well (you're flushing the pointer you just wrote to, after all).

It is more natural but doesn't allow for the same level of optimization. Wouldn't consistent naming obviate the semantics?
On Wed, Sep 29, 2021 at 5:38 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
>
> On Wed, 29 Sept 2021 at 13:10, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >
> > Lorenz Bauer <lmb@cloudflare.com> writes:
> >
> > > On Thu, 16 Sept 2021 at 18:47, Jakub Kicinski <kuba@kernel.org> wrote:
> > >>
> > >> Won't applications end up building something like skb_header_pointer() based on bpf_xdp_adjust_data(), anyway? In which case why don't we provide them what they need?
> > >>
> > >> say:
> > >>
> > >>	void *xdp_mb_pointer(struct xdp_buff *xdp_md, u32 flags,
> > >>			     u32 offset, u32 len, void *stack_buf)
> > >>
> > >> flags and offset can be squashed into one u64 as needed. Helper returns pointer to packet data, either real one or stack_buf. Verifier has to be taught that the return value is NULL or a pointer which is safe with offsets up to @len.
> > >>
> > >> If the reason for access is write we'd also need:
> > >>
> > >>	void *xdp_mb_pointer_flush(struct xdp_buff *xdp_md, u32 flags,
> > >>				   u32 offset, u32 len, void *stack_buf)

I'm missing something. Why do we need a separate flush() helper? Can't we do:

	char buf[64], *p;

	p = xdp_mb_pointer(ctx, flags, off, len, buf);
	/* read/write p[] */
	if (p == buf)
		xdp_store_bytes(ctx, off, buf, len, flags);
On Wed, 29 Sep 2021 11:54:46 -0700 Alexei Starovoitov wrote:
> I'm missing something. Why do we need a separate flush() helper? Can't we do:
>
>	char buf[64], *p;
>
>	p = xdp_mb_pointer(ctx, flags, off, len, buf);
>	/* read/write p[] */
>	if (p == buf)
>		xdp_store_bytes(ctx, off, buf, len, flags);

Sure we can. That's what I meant by "leave the checking to the program". It's bike shedding at this point.
Jakub Kicinski <kuba@kernel.org> writes:

> On Wed, 29 Sep 2021 11:54:46 -0700 Alexei Starovoitov wrote:
>> I'm missing something. Why do we need a separate flush() helper? Can't we do:
>>
>>	char buf[64], *p;
>>
>>	p = xdp_mb_pointer(ctx, flags, off, len, buf);
>>	/* read/write p[] */
>>	if (p == buf)
>>		xdp_store_bytes(ctx, off, buf, len, flags);
>
> Sure we can. That's what I meant by "leave the checking to the program". It's bike shedding at this point.

Yeah, let's discuss the details once we have a patch :)

-Toke
> Jakub Kicinski <kuba@kernel.org> writes:
>
> > On Wed, 29 Sep 2021 11:54:46 -0700 Alexei Starovoitov wrote:
> >> I'm missing something. Why do we need a separate flush() helper? Can't we do:
> >>
> >>	char buf[64], *p;
> >>
> >>	p = xdp_mb_pointer(ctx, flags, off, len, buf);
> >>	/* read/write p[] */
> >>	if (p == buf)
> >>		xdp_store_bytes(ctx, off, buf, len, flags);
> >
> > Sure we can. That's what I meant by "leave the checking to the program". It's bike shedding at this point.
>
> Yeah, let's discuss the details once we have a patch :)
>
> -Toke

Hi,

I implemented the xdp_mb_pointer/xdp_mb_pointer_flush logic here (according to the current discussion):
https://github.com/LorenzoBianconi/bpf-next/commit/a5c61c0fa6cb05bab8caebd96aca5fbbd9510867

For the moment I have only defined two utility routines and I have not exported them as ebpf helpers, since I need to check what the missing bits are in the verifier code (but afaik this would be orthogonal with respect to the "helper code"):

- bpf_xdp_pointer  --> xdp_mb_pointer
- bpf_xdp_copy_buf --> xdp_mb_pointer_flush

In order to test them I have defined two new ebpf helpers (they use bpf_xdp_pointer/bpf_xdp_copy_buf internally):

- bpf_xdp_load_bytes
- bpf_xdp_store_bytes

In order to test bpf_xdp_load_bytes/bpf_xdp_store_bytes + bpf_xdp_pointer/bpf_xdp_copy_buf I added some selftests here:
https://github.com/LorenzoBianconi/bpf-next/commit/5661a491a890c00db744f2884b7ee3a6d0319384

Can you please check if the code above is aligned to the current requirements, or if it is missing something? If this code is fine, I guess we have two options here:

- integrate the commits above in the xdp multi-buff series (posting v15) and work on the verifier code in parallel (if the xdp_mb_pointer helper is not required from day 0)
- integrate the verifier changes in the xdp multi-buff series, drop the bpf_xdp_load_bytes helper (probably we will still need bpf_xdp_store_bytes) and introduce bpf_xdp_pointer as a new ebpf helper.

I am fine both ways. If we decide for the second option I would need some guidance on the verifier changes, since I am not so familiar with that code.

Regards,
Lorenzo
On Fri, 1 Oct 2021 11:03:58 +0200 Lorenzo Bianconi wrote:
> Can you please check if the code above is aligned to the current requirements, or if it is missing something? If this code is fine, I guess we have two options here:
>
> - integrate the commits above in the xdp multi-buff series (posting v15) and work on the verifier code in parallel (if the xdp_mb_pointer helper is not required from day 0)
> - integrate the verifier changes in the xdp multi-buff series, drop the bpf_xdp_load_bytes helper (probably we will still need bpf_xdp_store_bytes) and introduce bpf_xdp_pointer as a new ebpf helper.

It wasn't clear to me that we wanted bpf_xdp_load_bytes() to exist. But FWIW no preference here.
> On Fri, 1 Oct 2021 11:03:58 +0200 Lorenzo Bianconi wrote:
> > Can you please check if the code above is aligned to the current requirements, or if it is missing something? If this code is fine, I guess we have two options here:
> >
> > - integrate the commits above in the xdp multi-buff series (posting v15) and work on the verifier code in parallel (if the xdp_mb_pointer helper is not required from day 0)
> > - integrate the verifier changes in the xdp multi-buff series, drop the bpf_xdp_load_bytes helper (probably we will still need bpf_xdp_store_bytes) and introduce bpf_xdp_pointer as a new ebpf helper.
>
> It wasn't clear to me that we wanted bpf_xdp_load_bytes() to exist. But FWIW no preference here.

ack, same here. Any other opinion about it?

Regards,
Lorenzo
On 6 Oct 2021, at 11:32, Lorenzo Bianconi wrote:

>> On Fri, 1 Oct 2021 11:03:58 +0200 Lorenzo Bianconi wrote:
>>> Can you please check if the code above is aligned to the current requirements, or if it is missing something? If this code is fine, I guess we have two options here:
>>> - integrate the commits above in the xdp multi-buff series (posting v15) and work on the verifier code in parallel (if the xdp_mb_pointer helper is not required from day 0)
>>> - integrate the verifier changes in the xdp multi-buff series, drop the bpf_xdp_load_bytes helper (probably we will still need bpf_xdp_store_bytes) and introduce bpf_xdp_pointer as a new ebpf helper.
>>
>> It wasn't clear to me that we wanted bpf_xdp_load_bytes() to exist. But FWIW no preference here.
>
> ack, same here. Any other opinion about it?

I was under the impression that getting a pointer might be enough. But having played with the bpf ring buffers for a bit, it still might be handy to extract some data to be sent to userspace. So I would not mind keeping it.

//Eelco
> On 6 Oct 2021, at 11:32, Lorenzo Bianconi wrote:
>
> >> On Fri, 1 Oct 2021 11:03:58 +0200 Lorenzo Bianconi wrote:
> >>> Can you please check if the code above is aligned to the current requirements, or if it is missing something? If this code is fine, I guess we have two options here:
> >>> - integrate the commits above in the xdp multi-buff series (posting v15) and work on the verifier code in parallel (if the xdp_mb_pointer helper is not required from day 0)
> >>> - integrate the verifier changes in the xdp multi-buff series, drop the bpf_xdp_load_bytes helper (probably we will still need bpf_xdp_store_bytes) and introduce bpf_xdp_pointer as a new ebpf helper.
> >>
> >> It wasn't clear to me that we wanted bpf_xdp_load_bytes() to exist. But FWIW no preference here.
> >
> > ack, same here. Any other opinion about it?
>
> I was under the impression that getting a pointer might be enough. But having played with the bpf ring buffers for a bit, it still might be handy to extract some data to be sent to userspace. So I would not mind keeping it.

ok, so it seems we have a use-case for bpf_xdp_load_bytes(). If everybody agrees, I will post v15 with them included, and then we can work in parallel on a bpf_xdp_pointer ebpf helper.

Regards,
Lorenzo