mbox series

[RFC,v8,00/20] bpf qdisc

Message ID 20240510192412.3297104-1-amery.hung@bytedance.com (mailing list archive)
Headers show
Series bpf qdisc | expand

Message

Amery Hung May 10, 2024, 7:23 p.m. UTC
This is the v8 of bpf qdisc patchset. While I would like to do more
testing and performance evaluation, I think posting it now may help
discussions in the upcoming LSF/MM/BPF.

* Overview *

This series supports implementing a qdisc using bpf struct_ops. bpf qdisc
aims to be a flexible and easy-to-use infrastructure that allows users to
quickly experiment with different scheduling algorithms/policies. It only
requires users to implement core qdisc logic using bpf and implements the
mundane part for them. In addition, the ability to easily communicate
between qdisc and other components will also bring new opportunities for
new applications and optimizations.

After discussion in the previous patchset [0], we swicthed to struct_ops
to take the benefit of struct_ops and avoid introducing a new abstraction
to users. In addition, three changes to bpf are introduced to make
bpf qdisc easier to program and performant.

* struct_ops changes *

To make struct_ops works better with bpf qdisc, two new changes are
introduced to bpf specifically for struct_ops programs. Frist, we
introduce "ref_acquired" postfix for arguments in stub functions [1] in
patch 1-2. It will allow Qdisc_ops->enqueue to acquire an referenced kptr
to an skb just once. Through the reference object tracking mechanism in
the verifier, we can make sure that the acquired skb will be either
enqueued or dropped. Besides, no duplicate references can be acquired.
Then, we allow a reference leak in struct_ops programs so that we can
return an skb naturally. This is done and tested in patch 3 and 4.

* Support adding skb to bpf graph *

Allowing users to enqueue an skb directly to a bpf collection improves
users' programming experience and performance of qdiscs. In the previous
patchset (v7), the user would need to allocate a local object, exchange
an skb kptr into the object and then add the object to a collection during
enqueue. The memory allocation in the fast path was hurting the
performance.

To allow adding skb to collection, we first introduced the support for
adding kernel objects to bpf list and rbtree (patch 5-8). Then, we
introduced exclusive-ownership graph nodes so that 1) we can fit
an rb node into an skb, and 2) make it possible for list node and rb node
to coexist in a union in skb (patch 9-12).

We evaluated the benefit of direct skb queueing by comparing the
throughput of simple fifo qdiscs implemented with v7 and v8 patchset.
Both qdisc use a bpf list as the fifo. The fifo v8 is included in the
selftests. While fifo v7 is identical in terms of the queueing logic,
it requires additional bpf_obj_new() and bpf_kptr_xchg() calls to enqueue
a local object containing a skb kptr. The test uses iperf3 to send and 
receive traffic on the qdisc added to the loopback device for 1 minute,
and we repeated it for five times. The result is shown below:

                                    Average throughput   stdev
fifo with indirect queueing (v7)    40.4 Gbps            0.91 Gbps
fifo with direct queueing (v8)      43.5 Gbps            0.24 Gbps

This part of the patchset (patch 5-12) is less tested and the approach may
be overcomplicated, so I especially would like to gather more feedback
before going further.

* Miscellaneous notes *

Finally, this patchset is based on 
34c58c89feb3 (Merge branch 'gve-ring-size-changes') in net-next.

The fq example in selftests requires bpf support of exchanging kptr into
allocated objects (local kptr), which Dave Marchevsky developed and
sent me as off-list patchset.

Todo:
  - Add more bpf testcases
  - Add testcases for bpf_skb_tc_classify and other qdisc ops
  - Add kfunc access control
  - Add support for statistics
  - Remove the requirement of explicit skb->dev restoration
  - Look into more ops in Qdisc_ops
  - Support updating Qdisc_ops

[0] https://lore.kernel.org/netdev/cover.1705432850.git.amery.hung@bytedance.com/

---
v8: Implement support of bpf qdisc using struct_ops
    Allow struct_ops to acquire referenced kptr via argument
    Allow struct_ops to release and return referenced kptr
    Support enqueuing sk_buff to bpf_rbtree/list
    Move examples from samples to selftests
    Add a classful qdisc selftest

v7: Reference skb using kptr to sk_buff instead of __sk_buff
    Use the new bpf rbtree/link to for skb queues
    Add reset and init programs
    Add a bpf fq qdisc sample
    Add a bpf netem qdisc sample

v6: switch to kptr based approach

v5: mv kernel/bpf/skb_map.c net/core/skb_map.c
    implement flow map as map-in-map
    rename bpf_skb_tc_classify() and move it to net/sched/cls_api.c
    clean up eBPF qdisc program context

v4: get rid of PIFO, use rbtree directly

v3: move priority queue from sch_bpf to skb map
    introduce skb map and its helpers
    introduce bpf_skb_classify()
    use netdevice notifier to reset skb's
    Rebase on latest bpf-next

v2: Rebase on latest net-next
    Make the code more complete (but still incomplete)

Amery Hung (20):
  bpf: Support passing referenced kptr to struct_ops programs
  selftests/bpf: Test referenced kptr arguments of struct_ops programs
  bpf: Allow struct_ops prog to return referenced kptr
  selftests/bpf: Test returning kptr from struct_ops programs
  bpf: Generate btf_struct_metas for kernel BTF
  bpf: Recognize kernel types as graph values
  bpf: Allow adding kernel objects to collections
  selftests/bpf: Test adding kernel object to bpf graph
  bpf: Find special BTF fields in union
  bpf: Introduce exclusive-ownership list and rbtree nodes
  bpf: Allow adding exclusive nodes to bpf list and rbtree
  selftests/bpf: Modify linked_list tests to work with macro-ified
    removes
  bpf: net_sched: Support implementation of Qdisc_ops in bpf
  bpf: net_sched: Add bpf qdisc kfuncs
  bpf: net_sched: Allow more optional methods in Qdisc_ops
  libbpf: Support creating and destroying qdisc
  selftests: Add a basic fifo qdisc test
  selftests: Add a bpf fq qdisc to selftest
  selftests: Add a bpf netem qdisc to selftest
  selftests: Add a prio bpf qdisc

 include/linux/bpf.h                           |  30 +-
 include/linux/bpf_verifier.h                  |   8 +-
 include/linux/btf.h                           |   5 +-
 include/linux/rbtree_types.h                  |   4 +
 include/linux/skbuff.h                        |   2 +
 include/linux/types.h                         |   4 +
 include/net/sch_generic.h                     |   8 +
 kernel/bpf/bpf_struct_ops.c                   |  17 +-
 kernel/bpf/btf.c                              | 255 +++++-
 kernel/bpf/helpers.c                          |  63 +-
 kernel/bpf/syscall.c                          |  22 +-
 kernel/bpf/verifier.c                         | 185 +++-
 net/sched/Makefile                            |   4 +
 net/sched/bpf_qdisc.c                         | 788 ++++++++++++++++++
 net/sched/sch_api.c                           |  19 +-
 net/sched/sch_generic.c                       |  11 +-
 tools/lib/bpf/libbpf.h                        |   5 +-
 tools/lib/bpf/netlink.c                       |  20 +-
 .../testing/selftests/bpf/bpf_experimental.h  |  59 +-
 .../selftests/bpf/bpf_testmod/bpf_testmod.c   |  29 +
 .../selftests/bpf/bpf_testmod/bpf_testmod.h   |  11 +
 .../selftests/bpf/prog_tests/bpf_qdisc.c      | 259 ++++++
 .../selftests/bpf/prog_tests/linked_list.c    |   6 +-
 .../prog_tests/test_struct_ops_kptr_return.c  |  87 ++
 .../prog_tests/test_struct_ops_ref_acquire.c  |  58 ++
 .../selftests/bpf/progs/bpf_qdisc_common.h    |  23 +
 .../selftests/bpf/progs/bpf_qdisc_fifo.c      |  83 ++
 .../selftests/bpf/progs/bpf_qdisc_fq.c        | 660 +++++++++++++++
 .../selftests/bpf/progs/bpf_qdisc_netem.c     | 236 ++++++
 .../selftests/bpf/progs/bpf_qdisc_prio.c      | 112 +++
 .../testing/selftests/bpf/progs/linked_list.c |  15 +
 .../testing/selftests/bpf/progs/linked_list.h |   8 +
 .../selftests/bpf/progs/linked_list_fail.c    |  46 +-
 .../bpf/progs/struct_ops_kptr_return.c        |  24 +
 ...uct_ops_kptr_return_fail__invalid_scalar.c |  24 +
 .../struct_ops_kptr_return_fail__local_kptr.c |  30 +
 ...uct_ops_kptr_return_fail__nonzero_offset.c |  23 +
 .../struct_ops_kptr_return_fail__wrong_type.c |  28 +
 .../bpf/progs/struct_ops_ref_acquire.c        |  27 +
 .../progs/struct_ops_ref_acquire_dup_ref.c    |  24 +
 .../progs/struct_ops_ref_acquire_ref_leak.c   |  19 +
 41 files changed, 3216 insertions(+), 125 deletions(-)
 create mode 100644 net/sched/bpf_qdisc.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_qdisc.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_struct_ops_kptr_return.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_struct_ops_ref_acquire.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_qdisc_common.h
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_qdisc_fifo.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_qdisc_fq.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_qdisc_netem.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_qdisc_prio.c
 create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_kptr_return.c
 create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__invalid_scalar.c
 create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__local_kptr.c
 create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__nonzero_offset.c
 create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_kptr_return_fail__wrong_type.c
 create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_ref_acquire.c
 create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_ref_acquire_dup_ref.c
 create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_ref_acquire_ref_leak.c