diff mbox series

[bpf-next,v8,2/5] Documentation/bpf: Add documentation for BPF_PROG_RUN

Message ID 20220218175029.330224-3-toke@redhat.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Headers show
Series Add support for transmitting packets using XDP in bpf_prog_run() | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers success CCed 15 of 15 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch warning WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next success VM_Test

Commit Message

Toke Høiland-Jørgensen Feb. 18, 2022, 5:50 p.m. UTC
This adds documentation for the BPF_PROG_RUN command; a short overview of
the command itself, and a more verbose description of the "live packet"
mode for XDP introduced in the previous commit.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 Documentation/bpf/bpf_prog_run.rst | 120 +++++++++++++++++++++++++++++
 Documentation/bpf/index.rst        |   1 +
 2 files changed, 121 insertions(+)
 create mode 100644 Documentation/bpf/bpf_prog_run.rst

Comments

Alexei Starovoitov March 2, 2022, 7:04 p.m. UTC | #1
On Fri, Feb 18, 2022 at 06:50:26PM +0100, Toke Høiland-Jørgensen wrote:
> This adds documentation for the BPF_PROG_RUN command; a short overview of
> the command itself, and a more verbose description of the "live packet"
> mode for XDP introduced in the previous commit.

Overall the patch set looks great. The doc really helps.
One nit below.

> +- When running the program with multiple repetitions, the execution will happen
> +  in batches, where the program is executed multiple times in a loop, the result
> +  is saved, and other actions (like redirecting the packet or passing it to the
> +  networking stack) will happen for the whole batch after the execution. This is
> +  similar to how execution happens in driver-mode XDP for each hardware NAPI
> +  cycle. The batch size defaults to 64 packets (which is same as the NAPI batch
> +  size), but the batch size can be specified by userspace through the
> +  ``batch_size`` parameter, up to a maximum of 256 packets.

This paragraph is a bit confusing.
I've read it as the program can do only one kind of result per batch and
it will apply to the whole batch.
But the program can do XDP_PASS/REDIRECT in any order.
Can you make "the result is saved" a bit more clear?
Toke Høiland-Jørgensen March 2, 2022, 9:34 p.m. UTC | #2
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Fri, Feb 18, 2022 at 06:50:26PM +0100, Toke Høiland-Jørgensen wrote:
>> This adds documentation for the BPF_PROG_RUN command; a short overview of
>> the command itself, and a more verbose description of the "live packet"
>> mode for XDP introduced in the previous commit.
>
> Overall the patch set looks great. The doc really helps.

Great, thanks!

> One nit below.
>
>> +- When running the program with multiple repetitions, the execution will happen
>> +  in batches, where the program is executed multiple times in a loop, the result
>> +  is saved, and other actions (like redirecting the packet or passing it to the
>> +  networking stack) will happen for the whole batch after the execution. This is
>> +  similar to how execution happens in driver-mode XDP for each hardware NAPI
>> +  cycle. The batch size defaults to 64 packets (which is same as the NAPI batch
>> +  size), but the batch size can be specified by userspace through the
>> +  ``batch_size`` parameter, up to a maximum of 256 packets.
>
> This paragraph is a bit confusing.
> I've read it as the program can do only one kind of result per batch and
> it will apply to the whole batch.
> But the program can do XDP_PASS/REDIRECT in any order.
> Can you make "the result is saved" a bit more clear?

Yeah, re-reading it now, I see what you mean; will try to make it
clearer...

-Toke
diff mbox series

Patch

diff --git a/Documentation/bpf/bpf_prog_run.rst b/Documentation/bpf/bpf_prog_run.rst
new file mode 100644
index 000000000000..c561677081de
--- /dev/null
+++ b/Documentation/bpf/bpf_prog_run.rst
@@ -0,0 +1,120 @@ 
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================
+Running BPF programs from userspace
+===================================
+
+This document describes the ``BPF_PROG_RUN`` facility for running BPF programs
+from userspace.
+
+.. contents::
+    :local:
+    :depth: 2
+
+
+Overview
+--------
+
+The ``BPF_PROG_RUN`` command can be used through the ``bpf()`` syscall to
+execute a BPF program in the kernel and return the results to userspace. This
+can be used to unit test BPF programs against user-supplied context objects, and
+as way to explicitly execute programs in the kernel for their side effects. The
+command was previously named ``BPF_PROG_TEST_RUN``, and both constants continue
+to be defined in the UAPI header, aliased to the same value.
+
+The ``BPF_PROG_RUN`` command can be used to execute BPF programs of the
+following types:
+
+- ``BPF_PROG_TYPE_SOCKET_FILTER``
+- ``BPF_PROG_TYPE_SCHED_CLS``
+- ``BPF_PROG_TYPE_SCHED_ACT``
+- ``BPF_PROG_TYPE_XDP``
+- ``BPF_PROG_TYPE_SK_LOOKUP``
+- ``BPF_PROG_TYPE_CGROUP_SKB``
+- ``BPF_PROG_TYPE_LWT_IN``
+- ``BPF_PROG_TYPE_LWT_OUT``
+- ``BPF_PROG_TYPE_LWT_XMIT``
+- ``BPF_PROG_TYPE_LWT_SEG6LOCAL``
+- ``BPF_PROG_TYPE_FLOW_DISSECTOR``
+- ``BPF_PROG_TYPE_STRUCT_OPS``
+- ``BPF_PROG_TYPE_RAW_TRACEPOINT``
+- ``BPF_PROG_TYPE_SYSCALL``
+
+When using the ``BPF_PROG_RUN`` command, userspace supplies an input context
+object and (for program types operating on network packets) a buffer containing
+the packet data that the BPF program will operate on. The kernel will then
+execute the program and return the results to userspace. Note that programs will
+not have any side effects while being run in this mode; in particular, packets
+will not actually be redirected or dropped, the program return code will just be
+returned to userspace. A separate mode for live execution of XDP programs is
+provided, documented separately below.
+
+Running XDP programs in "live frame mode"
+-----------------------------------------
+
+The ``BPF_PROG_RUN`` command has a separate mode for running live XDP programs,
+which can be used to execute XDP programs in a way where packets will actually
+be processed by the kernel after the execution of the XDP program as if they
+arrived on a physical interface. This mode is activated by setting the
+``BPF_F_TEST_XDP_LIVE_FRAMES`` flag when supplying an XDP program to
+``BPF_PROG_RUN``. Earlier versions of the kernel did not reject invalid flags
+supplied to ``BPF_PROG_RUN`` for XDP programs. For this reason, another new
+flag, ``BPF_F_TEST_XDP_RESERVED`` is defined, which will simply be rejected if
+set. Userspace can use this for feature probing: if the reserved flag is
+rejected, live frame mode is supported by the running kernel.
+
+The live packet mode is optimised for high performance execution of the supplied
+XDP program many times (suitable for, e.g., running as a traffic generator),
+which means the semantics are not quite as straight-forward as the regular test
+run mode. Specifically:
+
+- When executing an XDP program in live frame mode, the result of the execution
+  will not be returned to userspace; instead, the kernel will perform the
+  operation indicated by the program's return code (drop the packet, redirect
+  it, etc). For this reason, setting the ``data_out`` or ``ctx_out`` attributes
+  in the syscall parameters when running in this mode will be rejected. In
+  addition, not all failures will be reported back to userspace directly;
+  specifically, only fatal errors in setup or during execution (like memory
+  allocation errors) will halt execution and return an error. If an error occurs
+  in packet processing, like a failure to redirect to a given interface,
+  execution will continue with the next repetition; these errors can be detected
+  via the same trace points as for regular XDP programs.
+
+- Userspace can supply an ifindex as part of the context object, just like in
+  the regular (non-live) mode. The XDP program will be executed as though the
+  packet arrived on this interface; i.e., the ``ingress_ifindex`` of the context
+  object will point to that interface. Furthermore, if the XDP program returns
+  ``XDP_PASS``, the packet will be injected into the kernel networking stack as
+  though it arrived on that ifindex, and if it returns ``XDP_TX``, the packet
+  will be transmitted *out* of that same interface. Do note, though, that
+  because the program execution is not happening in driver context, an
+  ``XDP_TX`` is actually turned into the same action as an ``XDP_REDIRECT`` to
+  that same interface (i.e., it will only work if the driver has support for the
+  ``ndo_xdp_xmit`` driver op).
+
+- When running the program with multiple repetitions, the execution will happen
+  in batches, where the program is executed multiple times in a loop, the result
+  is saved, and other actions (like redirecting the packet or passing it to the
+  networking stack) will happen for the whole batch after the execution. This is
+  similar to how execution happens in driver-mode XDP for each hardware NAPI
+  cycle. The batch size defaults to 64 packets (which is same as the NAPI batch
+  size), but the batch size can be specified by userspace through the
+  ``batch_size`` parameter, up to a maximum of 256 packets.
+
+- When setting up the test run, the kernel will initialise a pool of memory
+  pages of the same size as the batch size. Each memory page will be initialised
+  with the initial packet data supplied by userspace at ``BPF_PROG_RUN``
+  invocation. When possible, the pages will be recycled on future program
+  invocations, to improve performance. Pages will generally be recycled a full
+  batch at a time, except when a packet is dropped (by return code or because
+  of, say, a redirection error), in which case that page will be recycled
+  immediately. If a packet ends up being passed to the regular networking stack
+  (because the XDP program returns ``XDP_PASS``, or because it ends up being
+  redirected to an interface that injects it into the stack), the page will be
+  released and a new one will be allocated when the pool is empty.
+
+  When recycling, the page content is not rewritten; only the packet boundary
+  pointers (``data``, ``data_end`` and ``data_meta``) in the context object will
+  be reset to the original values. This means that if a program rewrites the
+  packet contents, it has to be prepared to see either the original content or
+  the modified version on subsequent invocations.
diff --git a/Documentation/bpf/index.rst b/Documentation/bpf/index.rst
index ef5c996547ec..96056a7447c7 100644
--- a/Documentation/bpf/index.rst
+++ b/Documentation/bpf/index.rst
@@ -21,6 +21,7 @@  that goes into great technical depth about the BPF Architecture.
    helpers
    programs
    maps
+   bpf_prog_run
    classic_vs_extended.rst
    bpf_licensing
    test_debug