Message ID | 20211211184143.142003-7-toke@redhat.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | BPF |
Headers | show |
Series | Add support for transmitting packets using XDP in bpf_prog_run() | expand |
On Sat, Dec 11, 2021 at 10:43 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > + > +static void bpf_test_run_xdp_teardown(struct bpf_test_timer *t) > +{ > + struct xdp_mem_info mem = { > + .id = t->xdp.pp->xdp_mem_id, > + .type = MEM_TYPE_PAGE_POOL, > + }; pls add a new line. > + xdp_unreg_mem_model(&mem); > +} > + > +static bool ctx_was_changed(struct xdp_page_head *head) > +{ > + return (head->orig_ctx.data != head->ctx.data || > + head->orig_ctx.data_meta != head->ctx.data_meta || > + head->orig_ctx.data_end != head->ctx.data_end); redundant () > bpf_test_timer_enter(&t); > old_ctx = bpf_set_run_ctx(&run_ctx.run_ctx); > do { > run_ctx.prog_item = &item; > - if (xdp) > + if (xdp && xdp_redirect) { > + ret = bpf_test_run_xdp_redirect(&t, prog, ctx); > + if (unlikely(ret < 0)) > + break; > + *retval = ret; > + } else if (xdp) { > *retval = bpf_prog_run_xdp(prog, ctx); Can we do this unconditionally without introducing a new uapi flag? I mean "return bpf_redirect()" was a nop under test_run. What kind of tests might break if it stops being a nop?
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: > On Sat, Dec 11, 2021 at 10:43 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: >> + >> +static void bpf_test_run_xdp_teardown(struct bpf_test_timer *t) >> +{ >> + struct xdp_mem_info mem = { >> + .id = t->xdp.pp->xdp_mem_id, >> + .type = MEM_TYPE_PAGE_POOL, >> + }; > > pls add a new line. > >> + xdp_unreg_mem_model(&mem); >> +} >> + >> +static bool ctx_was_changed(struct xdp_page_head *head) >> +{ >> + return (head->orig_ctx.data != head->ctx.data || >> + head->orig_ctx.data_meta != head->ctx.data_meta || >> + head->orig_ctx.data_end != head->ctx.data_end); > > redundant () > >> bpf_test_timer_enter(&t); >> old_ctx = bpf_set_run_ctx(&run_ctx.run_ctx); >> do { >> run_ctx.prog_item = &item; >> - if (xdp) >> + if (xdp && xdp_redirect) { >> + ret = bpf_test_run_xdp_redirect(&t, prog, ctx); >> + if (unlikely(ret < 0)) >> + break; >> + *retval = ret; >> + } else if (xdp) { >> *retval = bpf_prog_run_xdp(prog, ctx); > > Can we do this unconditionally without introducing a new uapi flag? > I mean "return bpf_redirect()" was a nop under test_run. > What kind of tests might break if it stops being a nop? Well, I view the existing mode of bpf_prog_test_run() with XDP as a way to write XDP unit tests: it allows you to submit a packet, run your XDP program on it, and check that it returned the right value and did the right modifications. This means if you XDP program does 'return bpf_redirect()', userspace will still get the XDP_REDIRECT value and so it can check correctness of your XDP program. With this flag the behaviour changes quite drastically, in that it will actually put packets on the wire instead of getting back the program return. So I think it makes more sense to make it a separate opt-in mode; the old behaviour can still be useful for checking XDP program behaviour. -Toke
On Mon, Dec 13, 2021 at 8:26 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: > > > On Sat, Dec 11, 2021 at 10:43 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > >> + > >> +static void bpf_test_run_xdp_teardown(struct bpf_test_timer *t) > >> +{ > >> + struct xdp_mem_info mem = { > >> + .id = t->xdp.pp->xdp_mem_id, > >> + .type = MEM_TYPE_PAGE_POOL, > >> + }; > > > > pls add a new line. > > > >> + xdp_unreg_mem_model(&mem); > >> +} > >> + > >> +static bool ctx_was_changed(struct xdp_page_head *head) > >> +{ > >> + return (head->orig_ctx.data != head->ctx.data || > >> + head->orig_ctx.data_meta != head->ctx.data_meta || > >> + head->orig_ctx.data_end != head->ctx.data_end); > > > > redundant () > > > >> bpf_test_timer_enter(&t); > >> old_ctx = bpf_set_run_ctx(&run_ctx.run_ctx); > >> do { > >> run_ctx.prog_item = &item; > >> - if (xdp) > >> + if (xdp && xdp_redirect) { > >> + ret = bpf_test_run_xdp_redirect(&t, prog, ctx); > >> + if (unlikely(ret < 0)) > >> + break; > >> + *retval = ret; > >> + } else if (xdp) { > >> *retval = bpf_prog_run_xdp(prog, ctx); > > > > Can we do this unconditionally without introducing a new uapi flag? > > I mean "return bpf_redirect()" was a nop under test_run. > > What kind of tests might break if it stops being a nop? > > Well, I view the existing mode of bpf_prog_test_run() with XDP as a way > to write XDP unit tests: it allows you to submit a packet, run your XDP > program on it, and check that it returned the right value and did the > right modifications. This means if you XDP program does 'return > bpf_redirect()', userspace will still get the XDP_REDIRECT value and so > it can check correctness of your XDP program. > > With this flag the behaviour changes quite drastically, in that it will > actually put packets on the wire instead of getting back the program > return. So I think it makes more sense to make it a separate opt-in > mode; the old behaviour can still be useful for checking XDP program > behaviour. Ok that all makes sense. How about using prog_run to feed the data into proper netdev? XDP prog may or may not attach to it (this detail is tbd) and prog_run would use prog_fd and ifindex to trigger RX (yes, receive) in that netdev. XDP prog will execute and will be able to perform all actions (not only XDP_REDIRECT). XDP_PASS would pass the packet to the stack, etc.
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: > On Mon, Dec 13, 2021 at 8:26 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: >> >> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: >> >> > On Sat, Dec 11, 2021 at 10:43 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: >> >> + >> >> +static void bpf_test_run_xdp_teardown(struct bpf_test_timer *t) >> >> +{ >> >> + struct xdp_mem_info mem = { >> >> + .id = t->xdp.pp->xdp_mem_id, >> >> + .type = MEM_TYPE_PAGE_POOL, >> >> + }; >> > >> > pls add a new line. >> > >> >> + xdp_unreg_mem_model(&mem); >> >> +} >> >> + >> >> +static bool ctx_was_changed(struct xdp_page_head *head) >> >> +{ >> >> + return (head->orig_ctx.data != head->ctx.data || >> >> + head->orig_ctx.data_meta != head->ctx.data_meta || >> >> + head->orig_ctx.data_end != head->ctx.data_end); >> > >> > redundant () >> > >> >> bpf_test_timer_enter(&t); >> >> old_ctx = bpf_set_run_ctx(&run_ctx.run_ctx); >> >> do { >> >> run_ctx.prog_item = &item; >> >> - if (xdp) >> >> + if (xdp && xdp_redirect) { >> >> + ret = bpf_test_run_xdp_redirect(&t, prog, ctx); >> >> + if (unlikely(ret < 0)) >> >> + break; >> >> + *retval = ret; >> >> + } else if (xdp) { >> >> *retval = bpf_prog_run_xdp(prog, ctx); >> > >> > Can we do this unconditionally without introducing a new uapi flag? >> > I mean "return bpf_redirect()" was a nop under test_run. >> > What kind of tests might break if it stops being a nop? >> >> Well, I view the existing mode of bpf_prog_test_run() with XDP as a way >> to write XDP unit tests: it allows you to submit a packet, run your XDP >> program on it, and check that it returned the right value and did the >> right modifications. This means if you XDP program does 'return >> bpf_redirect()', userspace will still get the XDP_REDIRECT value and so >> it can check correctness of your XDP program. >> >> With this flag the behaviour changes quite drastically, in that it will >> actually put packets on the wire instead of getting back the program >> return. So I think it makes more sense to make it a separate opt-in >> mode; the old behaviour can still be useful for checking XDP program >> behaviour. > > Ok that all makes sense. Great! > How about using prog_run to feed the data into proper netdev? > XDP prog may or may not attach to it (this detail is tbd) and > prog_run would use prog_fd and ifindex to trigger RX (yes, receive) > in that netdev. XDP prog will execute and will be able to perform > all actions (not only XDP_REDIRECT). > XDP_PASS would pass the packet to the stack, etc. Hmm, that's certainly an interesting idea! I don't think we can actually run the XDP hook on the netdev itself (since that is deep in the driver), but we can emulate it: we just need to do what this version of the patch is doing, but add handling of the other return codes. XDP_PASS could be supported by basically copying what cpumap is doing (turn the frames into skbs and call netif_receive_skb_list()), but XDP_TX would have to be implemented via ndo_xdp_xmit(), so it becomes equivalent to a REDIRECT back to the same interface. That's probably OK, though, right? I'll try this out for the next version, thanks for the idea! -Toke
On Mon, Dec 13, 2021 at 4:36 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: > > > On Mon, Dec 13, 2021 at 8:26 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > >> > >> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: > >> > >> > On Sat, Dec 11, 2021 at 10:43 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > >> >> + > >> >> +static void bpf_test_run_xdp_teardown(struct bpf_test_timer *t) > >> >> +{ > >> >> + struct xdp_mem_info mem = { > >> >> + .id = t->xdp.pp->xdp_mem_id, > >> >> + .type = MEM_TYPE_PAGE_POOL, > >> >> + }; > >> > > >> > pls add a new line. > >> > > >> >> + xdp_unreg_mem_model(&mem); > >> >> +} > >> >> + > >> >> +static bool ctx_was_changed(struct xdp_page_head *head) > >> >> +{ > >> >> + return (head->orig_ctx.data != head->ctx.data || > >> >> + head->orig_ctx.data_meta != head->ctx.data_meta || > >> >> + head->orig_ctx.data_end != head->ctx.data_end); > >> > > >> > redundant () > >> > > >> >> bpf_test_timer_enter(&t); > >> >> old_ctx = bpf_set_run_ctx(&run_ctx.run_ctx); > >> >> do { > >> >> run_ctx.prog_item = &item; > >> >> - if (xdp) > >> >> + if (xdp && xdp_redirect) { > >> >> + ret = bpf_test_run_xdp_redirect(&t, prog, ctx); > >> >> + if (unlikely(ret < 0)) > >> >> + break; > >> >> + *retval = ret; > >> >> + } else if (xdp) { > >> >> *retval = bpf_prog_run_xdp(prog, ctx); > >> > > >> > Can we do this unconditionally without introducing a new uapi flag? > >> > I mean "return bpf_redirect()" was a nop under test_run. > >> > What kind of tests might break if it stops being a nop? > >> > >> Well, I view the existing mode of bpf_prog_test_run() with XDP as a way > >> to write XDP unit tests: it allows you to submit a packet, run your XDP > >> program on it, and check that it returned the right value and did the > >> right modifications. This means if you XDP program does 'return > >> bpf_redirect()', userspace will still get the XDP_REDIRECT value and so > >> it can check correctness of your XDP program. > >> > >> With this flag the behaviour changes quite drastically, in that it will > >> actually put packets on the wire instead of getting back the program > >> return. So I think it makes more sense to make it a separate opt-in > >> mode; the old behaviour can still be useful for checking XDP program > >> behaviour. > > > > Ok that all makes sense. > > Great! > > > How about using prog_run to feed the data into proper netdev? > > XDP prog may or may not attach to it (this detail is tbd) and > > prog_run would use prog_fd and ifindex to trigger RX (yes, receive) > > in that netdev. XDP prog will execute and will be able to perform > > all actions (not only XDP_REDIRECT). > > XDP_PASS would pass the packet to the stack, etc. > > Hmm, that's certainly an interesting idea! I don't think we can actually > run the XDP hook on the netdev itself (since that is deep in the > driver), but we can emulate it: we just need to do what this version of > the patch is doing, but add handling of the other return codes. > > XDP_PASS could be supported by basically copying what cpumap is doing > (turn the frames into skbs and call netif_receive_skb_list()), but > XDP_TX would have to be implemented via ndo_xdp_xmit(), so it becomes > equivalent to a REDIRECT back to the same interface. That's probably OK, > though, right? Yep. Something like this. imo the individual BPF_F_TEST_XDP_DO_REDIRECT knob doesn't look right. It's tweaking the prog run from no side effects execution model to partial side effects. If we want to run xdp prog with side effects it probably should behave like normal execution on the netdev when it receives the packet. We might not even need to create a new netdev for that. I can imagine a bpf_prog_run operating on eth0 with a packet prepared by the user space. Like injecting a packet right into the driver and xdp part of it. If prog says XDP_PASS the packet will go up the stack like normal. So this mechanism could be used to inject packets into the stack. Obviously buffer management is an issue in the traditional NIC when a packet doesn't come from the wire. Also doing this in every driver would be a pain. So we need some common infra to inject the user packet into a netdev like it was received by this netdev. It could be a change for tuntap or for veth or not related to netdev at all. After XDP_PASS it doesn't need to be fast. skb will get allocated and the stack might see it as it arrived from ifindex=N regardless of the HW of that netdev. XDP_TX would xmit right out of that ifindex=netdev. and XDP_REDIRECT would redirect to a different netdev. At the end there will be less special cases and page_pool tweaks. Thought the patches 1-5 look fine, it still feels a bit custom just for this particular BPF_F_TEST_XDP_DO_REDIRECT use case. With more generic bpf_run_prog(xdp_prog_fd, ifindex_of_netdev) it might reduce custom handling.
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: > On Mon, Dec 13, 2021 at 4:36 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote: >> >> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: >> >> > On Mon, Dec 13, 2021 at 8:26 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: >> >> >> >> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: >> >> >> >> > On Sat, Dec 11, 2021 at 10:43 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: >> >> >> + >> >> >> +static void bpf_test_run_xdp_teardown(struct bpf_test_timer *t) >> >> >> +{ >> >> >> + struct xdp_mem_info mem = { >> >> >> + .id = t->xdp.pp->xdp_mem_id, >> >> >> + .type = MEM_TYPE_PAGE_POOL, >> >> >> + }; >> >> > >> >> > pls add a new line. >> >> > >> >> >> + xdp_unreg_mem_model(&mem); >> >> >> +} >> >> >> + >> >> >> +static bool ctx_was_changed(struct xdp_page_head *head) >> >> >> +{ >> >> >> + return (head->orig_ctx.data != head->ctx.data || >> >> >> + head->orig_ctx.data_meta != head->ctx.data_meta || >> >> >> + head->orig_ctx.data_end != head->ctx.data_end); >> >> > >> >> > redundant () >> >> > >> >> >> bpf_test_timer_enter(&t); >> >> >> old_ctx = bpf_set_run_ctx(&run_ctx.run_ctx); >> >> >> do { >> >> >> run_ctx.prog_item = &item; >> >> >> - if (xdp) >> >> >> + if (xdp && xdp_redirect) { >> >> >> + ret = bpf_test_run_xdp_redirect(&t, prog, ctx); >> >> >> + if (unlikely(ret < 0)) >> >> >> + break; >> >> >> + *retval = ret; >> >> >> + } else if (xdp) { >> >> >> *retval = bpf_prog_run_xdp(prog, ctx); >> >> > >> >> > Can we do this unconditionally without introducing a new uapi flag? >> >> > I mean "return bpf_redirect()" was a nop under test_run. >> >> > What kind of tests might break if it stops being a nop? >> >> >> >> Well, I view the existing mode of bpf_prog_test_run() with XDP as a way >> >> to write XDP unit tests: it allows you to submit a packet, run your XDP >> >> program on it, and check that it returned the right value and did the >> >> right modifications. This means if you XDP program does 'return >> >> bpf_redirect()', userspace will still get the XDP_REDIRECT value and so >> >> it can check correctness of your XDP program. >> >> >> >> With this flag the behaviour changes quite drastically, in that it will >> >> actually put packets on the wire instead of getting back the program >> >> return. So I think it makes more sense to make it a separate opt-in >> >> mode; the old behaviour can still be useful for checking XDP program >> >> behaviour. >> > >> > Ok that all makes sense. >> >> Great! >> >> > How about using prog_run to feed the data into proper netdev? >> > XDP prog may or may not attach to it (this detail is tbd) and >> > prog_run would use prog_fd and ifindex to trigger RX (yes, receive) >> > in that netdev. XDP prog will execute and will be able to perform >> > all actions (not only XDP_REDIRECT). >> > XDP_PASS would pass the packet to the stack, etc. >> >> Hmm, that's certainly an interesting idea! I don't think we can actually >> run the XDP hook on the netdev itself (since that is deep in the >> driver), but we can emulate it: we just need to do what this version of >> the patch is doing, but add handling of the other return codes. >> >> XDP_PASS could be supported by basically copying what cpumap is doing >> (turn the frames into skbs and call netif_receive_skb_list()), but >> XDP_TX would have to be implemented via ndo_xdp_xmit(), so it becomes >> equivalent to a REDIRECT back to the same interface. That's probably OK, >> though, right? > > Yep. Something like this. > imo the individual BPF_F_TEST_XDP_DO_REDIRECT knob doesn't look right. > It's tweaking the prog run from no side effects execution model > to partial side effects. > If we want to run xdp prog with side effects it probably should > behave like normal execution on the netdev when it receives the packet. > We might not even need to create a new netdev for that. > I can imagine a bpf_prog_run operating on eth0 with a packet prepared > by the user space. > Like injecting a packet right into the driver and xdp part of it. > If prog says XDP_PASS the packet will go up the stack like normal. > So this mechanism could be used to inject packets into the stack. > Obviously buffer management is an issue in the traditional NIC > when a packet doesn't come from the wire. > Also doing this in every driver would be a pain. > So we need some common infra to inject the user packet into a netdev > like it was received by this netdev. It could be a change for tuntap > or for veth or not related to netdev at all. What you're describing is basically what the cpumap code does; except it doesn't handle XDP_TX, and it doesn't do buffer management. But I already implemented the latter, and the former is straight-forward to do as a special-case XDP_REDIRECT. So my plan is to try this out and see what that looks like :) > After XDP_PASS it doesn't need to be fast. skb will get allocated > and the stack might see it as it arrived from ifindex=N regardless > of the HW of that netdev. > XDP_TX would xmit right out of that ifindex=netdev. > and XDP_REDIRECT would redirect to a different netdev. > At the end there will be less special cases and page_pool tweaks. > Thought the patches 1-5 look fine, it still feels a bit custom > just for this particular BPF_F_TEST_XDP_DO_REDIRECT use case. > With more generic bpf_run_prog(xdp_prog_fd, ifindex_of_netdev) > it might reduce custom handling. Yup, totally makes sense! -Toke
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index c26871263f1f..224a6e3261f5 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1225,6 +1225,8 @@ enum { /* If set, run the test on the cpu specified by bpf_attr.test.cpu */ #define BPF_F_TEST_RUN_ON_CPU (1U << 0) +/* If set, support performing redirection of XDP frames */ +#define BPF_F_TEST_XDP_DO_REDIRECT (1U << 1) /* type for BPF_ENABLE_STATS */ enum bpf_stats_type { diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig index d24d518ddd63..c8c920020d11 100644 --- a/kernel/bpf/Kconfig +++ b/kernel/bpf/Kconfig @@ -30,6 +30,7 @@ config BPF_SYSCALL select TASKS_TRACE_RCU select BINARY_PRINTF select NET_SOCK_MSG if NET + select PAGE_POOL if NET default n help Enable the bpf() system call that allows to manipulate BPF programs diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 46dd95755967..02bd0616e64d 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -14,6 +14,7 @@ #include <net/sock.h> #include <net/tcp.h> #include <net/net_namespace.h> +#include <net/page_pool.h> #include <linux/error-injection.h> #include <linux/smp.h> #include <linux/sock_diag.h> @@ -23,19 +24,34 @@ #include <trace/events/bpf_test_run.h> struct bpf_test_timer { - enum { NO_PREEMPT, NO_MIGRATE } mode; + enum { NO_PREEMPT, NO_MIGRATE, XDP } mode; u32 i; u64 time_start, time_spent; + struct { + struct xdp_buff *orig_ctx; + struct xdp_rxq_info rxq; + struct page_pool *pp; + u16 frame_cnt; + } xdp; }; static void bpf_test_timer_enter(struct bpf_test_timer *t) __acquires(rcu) { rcu_read_lock(); - if (t->mode == NO_PREEMPT) + switch (t->mode) { + case NO_PREEMPT: preempt_disable(); - else + break; + case XDP: + migrate_disable(); + xdp_set_return_frame_no_direct(); + t->xdp.frame_cnt = 0; + break; + case NO_MIGRATE: migrate_disable(); + break; + } t->time_start = ktime_get_ns(); } @@ -45,10 +61,18 @@ static void bpf_test_timer_leave(struct bpf_test_timer *t) { t->time_start = 0; - if (t->mode == NO_PREEMPT) + switch (t->mode) { + case NO_PREEMPT: preempt_enable(); - else + break; + case XDP: + xdp_do_flush(); + xdp_clear_return_frame_no_direct(); + fallthrough; + case NO_MIGRATE: migrate_enable(); + break; + } rcu_read_unlock(); } @@ -87,13 +111,161 @@ static bool bpf_test_timer_continue(struct bpf_test_timer *t, u32 repeat, int *e return false; } +/* We put this struct at the head of each page with a context and frame + * initialised when the page is allocated, so we don't have to do this on each + * repetition of the test run. + */ +struct xdp_page_head { + struct xdp_buff orig_ctx; + struct xdp_buff ctx; + struct xdp_frame frm; + u8 data[]; +}; + +#define TEST_XDP_FRAME_SIZE (PAGE_SIZE - sizeof(struct xdp_page_head) \ + - sizeof(struct skb_shared_info)) + +static void bpf_test_run_xdp_init_page(struct page *page, void *arg) +{ + struct xdp_page_head *head = phys_to_virt(page_to_phys(page)); + struct xdp_buff *new_ctx, *orig_ctx; + u32 headroom = XDP_PACKET_HEADROOM; + struct bpf_test_timer *t = arg; + size_t frm_len, meta_len; + struct xdp_frame *frm; + void *data; + + orig_ctx = t->xdp.orig_ctx; + frm_len = orig_ctx->data_end - orig_ctx->data_meta; + meta_len = orig_ctx->data - orig_ctx->data_meta; + headroom -= meta_len; + + new_ctx = &head->ctx; + frm = &head->frm; + data = &head->data; + memcpy(data + headroom, orig_ctx->data_meta, frm_len); + + xdp_init_buff(new_ctx, TEST_XDP_FRAME_SIZE, &t->xdp.rxq); + xdp_prepare_buff(new_ctx, data, headroom, frm_len, true); + new_ctx->data_meta = new_ctx->data + meta_len; + + xdp_update_frame_from_buff(new_ctx, frm); + frm->mem = new_ctx->rxq->mem; + + memcpy(&head->orig_ctx, new_ctx, sizeof(head->orig_ctx)); +} + +static int bpf_test_run_xdp_setup(struct bpf_test_timer *t, struct xdp_buff *orig_ctx) +{ + struct xdp_mem_info mem = {}; + struct page_pool *pp; + int err; + struct page_pool_params pp_params = { + .order = 0, + .flags = 0, + .pool_size = NAPI_POLL_WEIGHT * 2, + .nid = NUMA_NO_NODE, + .max_len = TEST_XDP_FRAME_SIZE, + .init_callback = bpf_test_run_xdp_init_page, + .init_arg = t, + }; + + pp = page_pool_create(&pp_params); + if (IS_ERR(pp)) + return PTR_ERR(pp); + + /* will copy 'mem->id' into pp->xdp_mem_id */ + err = xdp_reg_mem_model(&mem, MEM_TYPE_PAGE_POOL, pp); + if (err) { + page_pool_destroy(pp); + return err; + } + t->xdp.pp = pp; + + /* We create a 'fake' RXQ referencing the original dev, but with an + * xdp_mem_info pointing to our page_pool + */ + xdp_rxq_info_reg(&t->xdp.rxq, orig_ctx->rxq->dev, 0, 0); + t->xdp.rxq.mem.type = MEM_TYPE_PAGE_POOL; + t->xdp.rxq.mem.id = pp->xdp_mem_id; + t->xdp.orig_ctx = orig_ctx; + + return 0; +} + +static void bpf_test_run_xdp_teardown(struct bpf_test_timer *t) +{ + struct xdp_mem_info mem = { + .id = t->xdp.pp->xdp_mem_id, + .type = MEM_TYPE_PAGE_POOL, + }; + xdp_unreg_mem_model(&mem); +} + +static bool ctx_was_changed(struct xdp_page_head *head) +{ + return (head->orig_ctx.data != head->ctx.data || + head->orig_ctx.data_meta != head->ctx.data_meta || + head->orig_ctx.data_end != head->ctx.data_end); +} + +static void reset_ctx(struct xdp_page_head *head) +{ + if (likely(!ctx_was_changed(head))) + return; + + head->ctx.data = head->orig_ctx.data; + head->ctx.data_meta = head->orig_ctx.data_meta; + head->ctx.data_end = head->orig_ctx.data_end; + xdp_update_frame_from_buff(&head->ctx, &head->frm); +} + +static int bpf_test_run_xdp_redirect(struct bpf_test_timer *t, + struct bpf_prog *prog, struct xdp_buff *orig_ctx) +{ + struct xdp_page_head *head; + struct xdp_buff *ctx; + struct page *page; + int ret, err = 0; + + page = page_pool_dev_alloc_pages(t->xdp.pp); + if (!page) + return -ENOMEM; + + head = phys_to_virt(page_to_phys(page)); + reset_ctx(head); + ctx = &head->ctx; + + ret = bpf_prog_run_xdp(prog, ctx); + if (ret == XDP_REDIRECT) { + struct xdp_frame *frm = &head->frm; + + /* if program changed pkt bounds we need to update the xdp_frame */ + if (unlikely(ctx_was_changed(head))) + xdp_update_frame_from_buff(ctx, frm); + + err = xdp_do_redirect_frame(ctx->rxq->dev, ctx, frm, prog); + if (err) + ret = err; + } + if (ret != XDP_REDIRECT) + xdp_return_buff(ctx); + + if (++t->xdp.frame_cnt >= NAPI_POLL_WEIGHT) { + xdp_do_flush(); + t->xdp.frame_cnt = 0; + } + + return ret; +} + static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, - u32 *retval, u32 *time, bool xdp) + u32 *retval, u32 *time, bool xdp, bool xdp_redirect) { struct bpf_prog_array_item item = {.prog = prog}; struct bpf_run_ctx *old_ctx; struct bpf_cg_run_ctx run_ctx; - struct bpf_test_timer t = { NO_MIGRATE }; + struct bpf_test_timer t = { .mode = (xdp && xdp_redirect) ? XDP : NO_MIGRATE }; enum bpf_cgroup_storage_type stype; int ret; @@ -110,14 +282,26 @@ static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, if (!repeat) repeat = 1; + if (t.mode == XDP) { + ret = bpf_test_run_xdp_setup(&t, ctx); + if (ret) + return ret; + } + bpf_test_timer_enter(&t); old_ctx = bpf_set_run_ctx(&run_ctx.run_ctx); do { run_ctx.prog_item = &item; - if (xdp) + if (xdp && xdp_redirect) { + ret = bpf_test_run_xdp_redirect(&t, prog, ctx); + if (unlikely(ret < 0)) + break; + *retval = ret; + } else if (xdp) { *retval = bpf_prog_run_xdp(prog, ctx); - else + } else { *retval = bpf_prog_run(prog, ctx); + } } while (bpf_test_timer_continue(&t, repeat, &ret, time)); bpf_reset_run_ctx(old_ctx); bpf_test_timer_leave(&t); @@ -125,6 +309,9 @@ static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat, for_each_cgroup_storage_type(stype) bpf_cgroup_storage_free(item.cgroup_storage[stype]); + if (t.mode == XDP) + bpf_test_run_xdp_teardown(&t); + return ret; } @@ -663,7 +850,7 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr, ret = convert___skb_to_skb(skb, ctx); if (ret) goto out; - ret = bpf_test_run(prog, skb, repeat, &retval, &duration, false); + ret = bpf_test_run(prog, skb, repeat, &retval, &duration, false, false); if (ret) goto out; if (!is_l2) { @@ -757,6 +944,7 @@ static void xdp_convert_buff_to_md(struct xdp_buff *xdp, struct xdp_md *xdp_md) int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, union bpf_attr __user *uattr) { + bool do_redirect = (kattr->test.flags & BPF_F_TEST_XDP_DO_REDIRECT); u32 tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); u32 headroom = XDP_PACKET_HEADROOM; u32 size = kattr->test.data_size_in; @@ -773,6 +961,9 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, prog->expected_attach_type == BPF_XDP_CPUMAP) return -EINVAL; + if (kattr->test.flags & ~BPF_F_TEST_XDP_DO_REDIRECT) + return -EINVAL; + ctx = bpf_ctx_init(kattr, sizeof(struct xdp_md)); if (IS_ERR(ctx)) return PTR_ERR(ctx); @@ -781,7 +972,8 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, /* There can't be user provided data before the meta data */ if (ctx->data_meta || ctx->data_end != size || ctx->data > ctx->data_end || - unlikely(xdp_metalen_invalid(ctx->data))) + unlikely(xdp_metalen_invalid(ctx->data)) || + (do_redirect && (kattr->test.data_out || kattr->test.ctx_out))) goto free_ctx; /* Meta data is allocated from the headroom */ headroom -= ctx->data; @@ -807,7 +999,8 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr, if (repeat > 1) bpf_prog_change_xdp(NULL, prog); - ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration, true); + ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration, + true, do_redirect); /* We convert the xdp_buff back to an xdp_md before checking the return * code so the reference count of any held netdevice will be decremented * even if the test run failed. diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index c26871263f1f..224a6e3261f5 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -1225,6 +1225,8 @@ enum { /* If set, run the test on the cpu specified by bpf_attr.test.cpu */ #define BPF_F_TEST_RUN_ON_CPU (1U << 0) +/* If set, support performing redirection of XDP frames */ +#define BPF_F_TEST_XDP_DO_REDIRECT (1U << 1) /* type for BPF_ENABLE_STATS */ enum bpf_stats_type {
This adds support for doing real redirects when an XDP program returns XDP_REDIRECT in bpf_prog_run(). To achieve this, we create a page pool instance while setting up the test run, and feed pages from that into the XDP program. The setup cost of this is amortised over the number of repetitions specified by userspace. To support performance testing use case, we further optimise the setup step so that all pages in the pool are pre-initialised with the packet data, and pre-computed context and xdp_frame objects stored at the start of each page. This makes it possible to entirely avoid touching the page content on each XDP program invocation, and enables sending up to 11.5 Mpps/core on my test box. Because the data pages are recycled by the page pool, and the test runner doesn't re-initialise them for each run, subsequent invocations of the XDP program will see the packet data in the state it was after the last time it ran on that particular page. This means that an XDP program that modifies the packet before redirecting it has to be careful about which assumptions it makes about the packet content, but that is only an issue for the most naively written programs. Previous uses of bpf_prog_run() for XDP returned the modified packet data and return code to userspace, which is a different semantic then this new redirect mode. For this reason, the caller has to set the new BPF_F_TEST_XDP_DO_REDIRECT flag when calling bpf_prog_run() to opt in to the different semantics. Enabling this flag is only allowed if not setting ctx_out and data_out in the test specification, since it means frames will be redirected somewhere else, so they can't be returned. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> --- include/uapi/linux/bpf.h | 2 + kernel/bpf/Kconfig | 1 + net/bpf/test_run.c | 217 +++++++++++++++++++++++++++++++-- tools/include/uapi/linux/bpf.h | 2 + 4 files changed, 210 insertions(+), 12 deletions(-)