diff mbox series

[bpf-next,v1] docs/bpf: Add docs for BPF_PROG_TYPE_XDP

Message ID 20221212122400.64415-1-donald.hunter@gmail.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Headers show
Series [bpf-next,v1] docs/bpf: Add docs for BPF_PROG_TYPE_XDP | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 12 maintainers not CCed: netdev@vger.kernel.org kpsingh@kernel.org haoluo@google.com davem@davemloft.net song@kernel.org yhs@fb.com kuba@kernel.org martin.lau@linux.dev sdf@google.com john.fastabend@gmail.com jolsa@kernel.org hawk@kernel.org
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch warning WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ${{ matrix.test }} on ${{ matrix.arch }} with ${{ matrix.toolchain }}
bpf/vmtest-bpf-next-VM_Test-2 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-4 fail Logs for build for aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-5 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-7 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-8 success Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-9 success Logs for set-matrix

Commit Message

Donald Hunter Dec. 12, 2022, 12:24 p.m. UTC
Document XDP programs (BPF_PROG_TYPE_XDP) and the XDP data path.

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
---
 Documentation/bpf/prog_xdp.rst | 176 +++++++++++++++++++++++++++++++++
 1 file changed, 176 insertions(+)
 create mode 100644 Documentation/bpf/prog_xdp.rst

Comments

Toke Høiland-Jørgensen Dec. 12, 2022, 3:30 p.m. UTC | #1
Donald Hunter <donald.hunter@gmail.com> writes:

> Document XDP programs (BPF_PROG_TYPE_XDP) and the XDP data path.
>
> Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
> ---
>  Documentation/bpf/prog_xdp.rst | 176 +++++++++++++++++++++++++++++++++
>  1 file changed, 176 insertions(+)
>  create mode 100644 Documentation/bpf/prog_xdp.rst
>
> diff --git a/Documentation/bpf/prog_xdp.rst b/Documentation/bpf/prog_xdp.rst
> new file mode 100644
> index 000000000000..69b001a6c7d2
> --- /dev/null
> +++ b/Documentation/bpf/prog_xdp.rst
> @@ -0,0 +1,176 @@
> +.. SPDX-License-Identifier: GPL-2.0-only
> +.. Copyright (C) 2022 Red Hat, Inc.
> +
> +================
> +XDP BPF Programs
> +================
> +
> +XDP (eXpress Data Path) is a fast path in the kernel network stack. XDP allows
> +for packet processing by BPF programs before the packets traverse the L4-L7
> +network stack.

The 'L4-L7' thing is not really accurate, and it's not really relevant
here either, so just leave it out? Maybe:

"XDP (eXpress Data Path) is a fast path in the kernel network stack. XDP
allows for packet processing by BPF programs in the driver, before the
packets traverse the kernel networking stack"?


> Programs of type ``BPF_PROG_TYPE_XDP`` are attached to the XDP
> +hook of a specific interface in one of three modes:
> +
> +- ``SKB_MODE`` - The hook point is in the generic net device
> +- ``DRV_MODE`` - The hook point is in the driver for the interface
> +- ``HW_MODE`` - The BPF program is offloaded to the NIC

How about moving the attach mode stuff a bit later? When it's mentioned
this early it seems more important than it really is. Since the "modes"
is the next section, we could just leave it out of this intro altogether
(and move this paragraph into the beginning of the "XDP modes" section
below)?

> +The BPF program attached to an interface's XDP hook gets called for each L2
> +frame that is received on the interface. The program is passed a ``struct xdp_md
> +*ctx`` which gives access to the L2 data frame as well as some essential
> +metadata for the frame:
> +
> +.. code-block:: c
> +
> +    struct xdp_md {
> +            __u32 data;
> +            __u32 data_end;
> +            __u32 data_meta;
> +
> +            __u32 ingress_ifindex; /* rxq->dev->ifindex */
> +            __u32 rx_queue_index;  /* rxq->queue_index  */
> +            __u32 egress_ifindex;  /* txq->dev->ifindex */

Not sure it's relevant to show which kernel structures the data comes
from? Maybe change the comments to be English descriptions (like
"ingress ifindex", "ingress RXQ index"). We should also mention that
egress_ifindex is only available the program is attached to a devmap.
And, erm, the text should mention that XDP programs can be attached to a
devmap somewhere? :)

> +    };
> +
> +The BPF program can read and modify the frame before deciding what action should
> +be taken for the packet.

Do we explain how to do that anywhere? I.e., is the "direct data access"
thing explained in some other doc?

> The program returns one of the following action values
> +in order to tell the driver or net device how to process the packet (details in
> +:ref:`xdp_packet_actions`):
> +
> +- ``XDP_DROP`` - Drop the packet without any further processing
> +- ``XDP_PASS`` - Pass the packet to the kernel network stack for further
> +  processing
> +- ``XDP_TX`` - Transmit the packet out of the same interface
> +- ``XDP_REDIRECT`` - Redirect the packet to a specific destination
> +- ``XDP_ABORTED`` - Drop the packet and notify an exception state
> +
> +There are many BPF helper functions available to XDP programs for accessing and
> +modifying packet data, for interacting with the kernel networking stack and for
> +using BPF maps. `bpf-helpers(7)`_ describes the helpers available to XDP
> +programs.
> +
> +The `libxdp`_ library provides functions for attaching XDP programs to network
> +interfaces and for using ``AF_XDP`` sockets.
> +
> +XDP Modes
> +=========
> +
> +SKB Mode
> +--------
> +
> +An XDP program attached in SKB mode gets executed by the kernel network stack
> +*after* the driver has created a ``struct sk_buff`` (SKB) and passed it to the
> +networking stack. SKB mode is also referred to as *generic* mode and is always
> +available, whether or not the driver is XDP-enabled. An XDP program in SKB mode
> +is run by the netdev before classifiers or ``tc`` BPF programs are run.

I think we should add some text saying that the SKB mode has a
significant performance overhead compared to driver mode, and that the
TC hook in many cases is a better choice than using XDP in SKB mode.

I'd also move SKB mode below driver mode, to highlight that driver mode
is really the "main" XDP execution mode.

> +Driver Mode
> +-----------
> +
> +An XDP program attached in driver mode gets executed by the network driver for
> +an interface *before* the driver creates a ``struct sk_buff`` (SKB) for the
> +incoming packet. The XDP program runs immediately after the driver receives the
> +packet. This gives the XDP program an opportunity to entirely avoid the cost of
> +SKB creation and kernel network stack processing.
> +
> +Driver mode requires the driver to be XDP-enabled so is not always available.

Since this is supposed to be the authoritative documentation on XDP,
should we list which drivers support XDP here?

> +Hardware Mode
> +-------------
> +
> +Some devices may support hardware offload of BPF programs, which they do in a
> +hardware specific way.

"...which they do in a hardware-specific way, meaning that some features
and helpers are not available to offloaded programs." ?

Also, as above, should we mention that only the nfp driver supports this? 

> +.. _xdp_packet_actions:
> +
> +XDP Packet Actions
> +==================
> +
> +XDP_DROP
> +--------
> +
> +The ``XDP_DROP`` action tells the driver or netdev to drop the XDP frame without
> +any further processing.
> +
> +XDP_PASS
> +--------
> +
> +The ``XDP_PASS`` action tells the driver to convert the XDP frame into an SKB
> +and the driver or netdev to pass the SKB on to the kernel network stack for
> +normal processing.
> +
> +XDP_TX
> +------
> +
> +The ``XDP_TX`` action tells the driver or netdev to transmit the XDP frame out
> +of the associated interface.
> +
> +XDP_REDIRECT
> +------------
> +
> +The ``XDP_REDIRECT`` action tells the driver to redirect the packet for further
> +processing. There are several types of redirect available to the XDP program:
> +
> +- Redirect to another device by ifindex
> +- Redirect to another device using a devmap
> +- Redirect into an ``AF_XDP`` socket using an xskmap
> +- Redirect to another CPU using a cpumap, before delivering to the network stack
> +
> +The ``bpf_redirect()`` and ``bpf_redirect_map()`` helper functions are used
> +to set up the desired redirect destination before returning ``XDP_REDIRECT`` to
> +the driver.
> +
> +.. code-block:: c
> +
> +    long bpf_redirect(u32 ifindex, u64 flags)
> +
> +The ``bpf_redirect()`` helper function redirects the packet to the net device
> +identified by ``ifindex``.
> +
> +.. code-block:: c
> +
> +    long bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)
> +
> +The ``bpf_redirect_map()`` helper function redirects the packet to the
> +destination referenced by ``map`` at index ``key``. The type of destination
> +depends on the type ``map`` that is used:
> +
> +- ``BPF_MAP_TYPE_DEVMAP`` and ``BPF_MAP_TYPE_DEVMAP_HASH`` redirects the packet
> +  to another net device
> +- ``BPF_MAP_TYPE_CPUMAP`` redirects the packet processing to a specific CPU
> +- ``BPF_MAP_TYPE_XSKMAP`` redirects the packet to an ``AF_XDP`` socket. See
> +  ../networking/af_xdp.rst for more information.
> +
> +Detailed behaviour of ``bpf_redirect()`` and ``bpf_redirect_map()`` is described
> +in `bpf-helpers(7)`_. ``XDP_REDIRECT`` is described in more detail in
> +redirect.rst.
> +
> +XDP_ABORTED
> +-----------
> +
> +The ``XDP_ABORTED`` action tells the driver that the BPF program exited in an
> +exception state. The driver will drop the packet in the same way as if the BPF
> +program returned ``XDP_DROP`` but the ``trace_xdp_exception`` trace point is also
> +triggered.
> +
> +Examples
> +========
> +
> +An example XDP program that uses ``XDP_REDIRECT`` can be found in
> +`tools/testing/selftests/bpf/progs/xdp_redirect_multi_kern.c`_ and the
> +corresponding user space code in
> +`tools/testing/selftests/bpf/xdp_redirect_multi.c`_
> +
> +References
> +==========
> +
> +- https://github.com/xdp-project/xdp-tools
> +- https://github.com/xdp-project/xdp-tutorial
> +- https://docs.cilium.io/en/latest/bpf/progtypes

Should we incorporate (some of) the text from the Cilium doc instead of
linking to it (assuming the license is compatible, of course)?

-Toke
diff mbox series

Patch

diff --git a/Documentation/bpf/prog_xdp.rst b/Documentation/bpf/prog_xdp.rst
new file mode 100644
index 000000000000..69b001a6c7d2
--- /dev/null
+++ b/Documentation/bpf/prog_xdp.rst
@@ -0,0 +1,176 @@ 
+.. SPDX-License-Identifier: GPL-2.0-only
+.. Copyright (C) 2022 Red Hat, Inc.
+
+================
+XDP BPF Programs
+================
+
+XDP (eXpress Data Path) is a fast path in the kernel network stack. XDP allows
+for packet processing by BPF programs before the packets traverse the L4-L7
+network stack. Programs of type ``BPF_PROG_TYPE_XDP`` are attached to the XDP
+hook of a specific interface in one of three modes:
+
+- ``SKB_MODE`` - The hook point is in the generic net device
+- ``DRV_MODE`` - The hook point is in the driver for the interface
+- ``HW_MODE`` - The BPF program is offloaded to the NIC
+
+The BPF program attached to an interface's XDP hook gets called for each L2
+frame that is received on the interface. The program is passed a ``struct xdp_md
+*ctx`` which gives access to the L2 data frame as well as some essential
+metadata for the frame:
+
+.. code-block:: c
+
+    struct xdp_md {
+            __u32 data;
+            __u32 data_end;
+            __u32 data_meta;
+
+            __u32 ingress_ifindex; /* rxq->dev->ifindex */
+            __u32 rx_queue_index;  /* rxq->queue_index  */
+            __u32 egress_ifindex;  /* txq->dev->ifindex */
+    };
+
+The BPF program can read and modify the frame before deciding what action should
+be taken for the packet. The program returns one of the following action values
+in order to tell the driver or net device how to process the packet (details in
+:ref:`xdp_packet_actions`):
+
+- ``XDP_DROP`` - Drop the packet without any further processing
+- ``XDP_PASS`` - Pass the packet to the kernel network stack for further
+  processing
+- ``XDP_TX`` - Transmit the packet out of the same interface
+- ``XDP_REDIRECT`` - Redirect the packet to a specific destination
+- ``XDP_ABORTED`` - Drop the packet and notify an exception state
+
+There are many BPF helper functions available to XDP programs for accessing and
+modifying packet data, for interacting with the kernel networking stack and for
+using BPF maps. `bpf-helpers(7)`_ describes the helpers available to XDP
+programs.
+
+The `libxdp`_ library provides functions for attaching XDP programs to network
+interfaces and for using ``AF_XDP`` sockets.
+
+XDP Modes
+=========
+
+SKB Mode
+--------
+
+An XDP program attached in SKB mode gets executed by the kernel network stack
+*after* the driver has created a ``struct sk_buff`` (SKB) and passed it to the
+networking stack. SKB mode is also referred to as *generic* mode and is always
+available, whether or not the driver is XDP-enabled. An XDP program in SKB mode
+is run by the netdev before classifiers or ``tc`` BPF programs are run.
+
+Driver Mode
+-----------
+
+An XDP program attached in driver mode gets executed by the network driver for
+an interface *before* the driver creates a ``struct sk_buff`` (SKB) for the
+incoming packet. The XDP program runs immediately after the driver receives the
+packet. This gives the XDP program an opportunity to entirely avoid the cost of
+SKB creation and kernel network stack processing.
+
+Driver mode requires the driver to be XDP-enabled so is not always available.
+
+Hardware Mode
+-------------
+
+Some devices may support hardware offload of BPF programs, which they do in a
+hardware specific way.
+
+.. _xdp_packet_actions:
+
+XDP Packet Actions
+==================
+
+XDP_DROP
+--------
+
+The ``XDP_DROP`` action tells the driver or netdev to drop the XDP frame without
+any further processing.
+
+XDP_PASS
+--------
+
+The ``XDP_PASS`` action tells the driver to convert the XDP frame into an SKB
+and the driver or netdev to pass the SKB on to the kernel network stack for
+normal processing.
+
+XDP_TX
+------
+
+The ``XDP_TX`` action tells the driver or netdev to transmit the XDP frame out
+of the associated interface.
+
+XDP_REDIRECT
+------------
+
+The ``XDP_REDIRECT`` action tells the driver to redirect the packet for further
+processing. There are several types of redirect available to the XDP program:
+
+- Redirect to another device by ifindex
+- Redirect to another device using a devmap
+- Redirect into an ``AF_XDP`` socket using an xskmap
+- Redirect to another CPU using a cpumap, before delivering to the network stack
+
+The ``bpf_redirect()`` and ``bpf_redirect_map()`` helper functions are used
+to set up the desired redirect destination before returning ``XDP_REDIRECT`` to
+the driver.
+
+.. code-block:: c
+
+    long bpf_redirect(u32 ifindex, u64 flags)
+
+The ``bpf_redirect()`` helper function redirects the packet to the net device
+identified by ``ifindex``.
+
+.. code-block:: c
+
+    long bpf_redirect_map(struct bpf_map *map, u32 key, u64 flags)
+
+The ``bpf_redirect_map()`` helper function redirects the packet to the
+destination referenced by ``map`` at index ``key``. The type of destination
+depends on the type ``map`` that is used:
+
+- ``BPF_MAP_TYPE_DEVMAP`` and ``BPF_MAP_TYPE_DEVMAP_HASH`` redirects the packet
+  to another net device
+- ``BPF_MAP_TYPE_CPUMAP`` redirects the packet processing to a specific CPU
+- ``BPF_MAP_TYPE_XSKMAP`` redirects the packet to an ``AF_XDP`` socket. See
+  ../networking/af_xdp.rst for more information.
+
+Detailed behaviour of ``bpf_redirect()`` and ``bpf_redirect_map()`` is described
+in `bpf-helpers(7)`_. ``XDP_REDIRECT`` is described in more detail in
+redirect.rst.
+
+XDP_ABORTED
+-----------
+
+The ``XDP_ABORTED`` action tells the driver that the BPF program exited in an
+exception state. The driver will drop the packet in the same way as if the BPF
+program returned ``XDP_DROP`` but the ``trace_xdp_exception`` trace point is also
+triggered.
+
+Examples
+========
+
+An example XDP program that uses ``XDP_REDIRECT`` can be found in
+`tools/testing/selftests/bpf/progs/xdp_redirect_multi_kern.c`_ and the
+corresponding user space code in
+`tools/testing/selftests/bpf/xdp_redirect_multi.c`_
+
+References
+==========
+
+- https://github.com/xdp-project/xdp-tools
+- https://github.com/xdp-project/xdp-tutorial
+- https://docs.cilium.io/en/latest/bpf/progtypes
+
+.. Links
+.. _bpf-helpers(7): https://man7.org/linux/man-pages/man7/bpf-helpers.7.html
+.. _libxdp: https://github.com/xdp-project/xdp-tools/tree/master/lib/libxdp
+.. _tools/testing/selftests/bpf/progs/xdp_redirect_multi_kern.c:
+   https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/tools/testing/selftests/bpf/progs/xdp_redirect_multi_kern.c
+.. _tools/testing/selftests/bpf/xdp_redirect_multi.c:
+   https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/tools/testing/selftests/bpf/xdp_redirect_multi.c