Message ID | 20231019174944.3376335-12-sdf@google.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Series | xsk: TX metadata |
On Thu, 19 Oct 2023 at 19:50, Stanislav Fomichev <sdf@google.com> wrote:
>
> - how to use
> - how to query features
> - pointers to the examples
>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  Documentation/networking/index.rst           |  1 +
>  Documentation/networking/xsk-tx-metadata.rst | 77 ++++++++++++++++++++
>  2 files changed, 78 insertions(+)
>  create mode 100644 Documentation/networking/xsk-tx-metadata.rst
>
> diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
> index 2ffc5ad10295..f3c2566d6cad 100644
> --- a/Documentation/networking/index.rst
> +++ b/Documentation/networking/index.rst
> @@ -122,6 +122,7 @@ Refer to :ref:`netdev-FAQ` for a guide on netdev development process specifics.
>     xfrm_sync
>     xfrm_sysctl
>     xdp-rx-metadata
> +   xsk-tx-metadata
>
>  .. only:: subproject and html
>
> diff --git a/Documentation/networking/xsk-tx-metadata.rst b/Documentation/networking/xsk-tx-metadata.rst
> new file mode 100644
> index 000000000000..b7289f06745c
> --- /dev/null
> +++ b/Documentation/networking/xsk-tx-metadata.rst
> @@ -0,0 +1,77 @@
> +==================
> +AF_XDP TX Metadata
> +==================
> +
> +This document describes how to enable offloads when transmitting packets
> +via :doc:`af_xdp`. Refer to :doc:`xdp-rx-metadata` on how to access similar
> +metadata on the receive side.
> +
> +General Design
> +==============
> +
> +The headroom for the metadata is reserved via ``tx_metadata_len`` in
> +``struct xdp_umem_reg``. The metadata length is therefore the same for
> +every socket that shares the same umem. The metadata layout is a fixed UAPI,
> +refer to ``union xsk_tx_metadata`` in ``include/uapi/linux/if_xdp.h``.
> +Thus, generally, the ``tx_metadata_len`` field above should contain
> +``sizeof(union xsk_tx_metadata)``.
> +
> +The headroom and the metadata itself should be located right before
> +``xdp_desc->addr`` in the umem frame. Within a frame, the metadata
> +layout is as follows::
> +
> +           tx_metadata_len
> +     /                         \
> +    +-----------------+---------+----------------------------+
> +    | xsk_tx_metadata | padding |          payload           |
> +    +-----------------+---------+----------------------------+
> +                                ^
> +                                |
> +                          xdp_desc->addr
> +
> +An AF_XDP application can request headrooms larger than ``sizeof(struct
> +xsk_tx_metadata)``. The kernel will ignore the padding (and will still
> +use ``xdp_desc->addr - tx_metadata_len`` to locate
> +the ``xsk_tx_metadata``). For the frames that shouldn't carry
> +any metadata (i.e., the ones that don't have ``XDP_TX_METADATA`` option),
> +the metadata area is ignored by the kernel as well.
> +
> +The flags field enables the particular offload:
> +
> +- ``XDP_TX_METADATA_TIMESTAMP``: requests the device to put transmission
> +  timestamp into ``tx_timestamp`` field of ``union xsk_tx_metadata``.
> +- ``XDP_TX_METADATA_CHECKSUM``: requests the device to calculate L4
> +  checksum. ``csum_start`` specifies byte offset of there the checksumming

nit: of there -> where

> +  should start and ``csum_offset`` specifies byte offset where the
> +  device should store the computed checksum.
> +- ``XDP_TX_METADATA_CHECKSUM_SW``: requests checksum calculation to
> +  be done in software; this mode works only in ``XSK_COPY`` mode and
> +  is mostly intended for testing. Do not enable this option, it
> +  will negatively affect performance.
> +
> +Besides the flags above, in order to trigger the offloads, the first
> +packet's ``struct xdp_desc`` descriptor should set ``XDP_TX_METADATA``
> +bit in the ``options`` field. Also not that in a multi-buffer packet

nit: not -> note

> +only the first chunk should carry the metadata.
> +
> +Querying Device Capabilities
> +============================
> +
> +Every devices exports its offloads capabilities via netlink netdev family.
> +Refer to ``xsk-flags`` features bitmask in
> +``Documentation/netlink/specs/netdev.yaml``.
> +
> +- ``tx-timestamp``: device supports ``XDP_TX_METADATA_TIMESTAMP``
> +- ``tx-checksum``: device supports ``XDP_TX_METADATA_CHECKSUM``
> +
> +Note that every devices supports ``XDP_TX_METADATA_CHECKSUM_SW`` when
> +running in ``XSK_COPY`` mode.
> +
> +See ``tools/net/ynl/samples/netdev.c`` on how to query this information.
> +
> +Example
> +=======
> +
> +See ``tools/testing/selftests/bpf/xdp_hw_metadata.c`` for an example
> +program that handles TX metadata. Also see https://github.com/fomichev/xskgen
> +for a more bare-bones example.
> --
> 2.42.0.655.g421f12c284-goog
>
>
On 10/23, Magnus Karlsson wrote:
> On Thu, 19 Oct 2023 at 19:50, Stanislav Fomichev <sdf@google.com> wrote:
> >
> > - how to use
> > - how to query features
> > - pointers to the examples

[...]

> > +- ``XDP_TX_METADATA_CHECKSUM``: requests the device to calculate L4
> > +  checksum. ``csum_start`` specifies byte offset of there the checksumming
>
> nit: of there -> where

[...]

> > +Besides the flags above, in order to trigger the offloads, the first
> > +packet's ``struct xdp_desc`` descriptor should set ``XDP_TX_METADATA``
> > +bit in the ``options`` field. Also not that in a multi-buffer packet
>
> nit: not -> note

Thank you for both, will fix!
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index 2ffc5ad10295..f3c2566d6cad 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -122,6 +122,7 @@ Refer to :ref:`netdev-FAQ` for a guide on netdev development process specifics.
    xfrm_sync
    xfrm_sysctl
    xdp-rx-metadata
+   xsk-tx-metadata
 
 .. only:: subproject and html
 
diff --git a/Documentation/networking/xsk-tx-metadata.rst b/Documentation/networking/xsk-tx-metadata.rst
new file mode 100644
index 000000000000..b7289f06745c
--- /dev/null
+++ b/Documentation/networking/xsk-tx-metadata.rst
@@ -0,0 +1,77 @@
+==================
+AF_XDP TX Metadata
+==================
+
+This document describes how to enable offloads when transmitting packets
+via :doc:`af_xdp`. Refer to :doc:`xdp-rx-metadata` on how to access similar
+metadata on the receive side.
+
+General Design
+==============
+
+The headroom for the metadata is reserved via ``tx_metadata_len`` in
+``struct xdp_umem_reg``. The metadata length is therefore the same for
+every socket that shares the same umem. The metadata layout is a fixed UAPI,
+refer to ``union xsk_tx_metadata`` in ``include/uapi/linux/if_xdp.h``.
+Thus, generally, the ``tx_metadata_len`` field above should contain
+``sizeof(union xsk_tx_metadata)``.
+
+The headroom and the metadata itself should be located right before
+``xdp_desc->addr`` in the umem frame. Within a frame, the metadata
+layout is as follows::
+
+           tx_metadata_len
+     /                         \
+    +-----------------+---------+----------------------------+
+    | xsk_tx_metadata | padding |          payload           |
+    +-----------------+---------+----------------------------+
+                                ^
+                                |
+                          xdp_desc->addr
+
+An AF_XDP application can request headrooms larger than ``sizeof(struct
+xsk_tx_metadata)``. The kernel will ignore the padding (and will still
+use ``xdp_desc->addr - tx_metadata_len`` to locate
+the ``xsk_tx_metadata``). For the frames that shouldn't carry
+any metadata (i.e., the ones that don't have ``XDP_TX_METADATA`` option),
+the metadata area is ignored by the kernel as well.
+
+The flags field enables the particular offload:
+
+- ``XDP_TX_METADATA_TIMESTAMP``: requests the device to put transmission
+  timestamp into ``tx_timestamp`` field of ``union xsk_tx_metadata``.
+- ``XDP_TX_METADATA_CHECKSUM``: requests the device to calculate L4
+  checksum. ``csum_start`` specifies byte offset of there the checksumming
+  should start and ``csum_offset`` specifies byte offset where the
+  device should store the computed checksum.
+- ``XDP_TX_METADATA_CHECKSUM_SW``: requests checksum calculation to
+  be done in software; this mode works only in ``XSK_COPY`` mode and
+  is mostly intended for testing. Do not enable this option, it
+  will negatively affect performance.
+
+Besides the flags above, in order to trigger the offloads, the first
+packet's ``struct xdp_desc`` descriptor should set ``XDP_TX_METADATA``
+bit in the ``options`` field. Also not that in a multi-buffer packet
+only the first chunk should carry the metadata.
+
+Querying Device Capabilities
+============================
+
+Every devices exports its offloads capabilities via netlink netdev family.
+Refer to ``xsk-flags`` features bitmask in
+``Documentation/netlink/specs/netdev.yaml``.
+
+- ``tx-timestamp``: device supports ``XDP_TX_METADATA_TIMESTAMP``
+- ``tx-checksum``: device supports ``XDP_TX_METADATA_CHECKSUM``
+
+Note that every devices supports ``XDP_TX_METADATA_CHECKSUM_SW`` when
+running in ``XSK_COPY`` mode.
+
+See ``tools/net/ynl/samples/netdev.c`` on how to query this information.
+
+Example
+=======
+
+See ``tools/testing/selftests/bpf/xdp_hw_metadata.c`` for an example
+program that handles TX metadata. Also see https://github.com/fomichev/xskgen
+for a more bare-bones example.
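
For readers following the "General Design" text in the patch, the address arithmetic (``xdp_desc->addr - tx_metadata_len``) can be sketched in plain C. This is only an illustration under stated assumptions: ``struct demo_tx_metadata`` is a hypothetical stand-in and not the UAPI layout from ``include/uapi/linux/if_xdp.h``, and ``demo_meta_offset()`` / ``demo_write_meta()`` are made-up helper names that do not exist in the kernel or in libxdp.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the metadata type. The real definition lives in
 * include/uapi/linux/if_xdp.h and its layout may differ from this sketch. */
struct demo_tx_metadata {
	uint32_t flags;
	uint16_t csum_start;
	uint16_t csum_offset;
};

/* Offset of the metadata inside the umem. Per the patch text, the kernel
 * looks at desc_addr - tx_metadata_len, so any padding sits between the
 * metadata and the payload. */
static uint64_t demo_meta_offset(uint64_t desc_addr, uint32_t tx_metadata_len)
{
	assert(tx_metadata_len >= sizeof(struct demo_tx_metadata));
	assert(desc_addr >= tx_metadata_len);
	return desc_addr - tx_metadata_len;
}

/* Copy the metadata into the headroom in front of the payload. memcpy is
 * used so the sketch makes no alignment assumptions about the frame. */
static void demo_write_meta(uint8_t *umem, uint64_t desc_addr,
			    uint32_t tx_metadata_len,
			    const struct demo_tx_metadata *meta)
{
	uint64_t off = demo_meta_offset(desc_addr, tx_metadata_len);

	memcpy(umem + off, meta, sizeof(*meta));
}
```

With a payload at umem offset 256 and ``tx_metadata_len`` of 32, the metadata would start at offset 224 and the remaining 24 bytes before the payload would be padding. A real application would additionally set the ``XDP_TX_METADATA`` bit in the descriptor's ``options`` field, as the patch describes.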
- how to use
- how to query features
- pointers to the examples

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 Documentation/networking/index.rst           |  1 +
 Documentation/networking/xsk-tx-metadata.rst | 77 ++++++++++++++++++++
 2 files changed, 78 insertions(+)
 create mode 100644 Documentation/networking/xsk-tx-metadata.rst
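
As a rough illustration of what the ``csum_start`` / ``csum_offset`` pair from the patch asks for, the sketch below computes an RFC 1071 style Internet checksum in software. It rests on assumptions the patch text does not confirm: that ``csum_offset`` is relative to ``csum_start`` (the usual kernel partial-checksum convention), that the sum covers everything from ``csum_start`` to the end of the packet, and that the checksum field is pre-seeded by the sender (zero here, a pseudo-header sum for real TCP/UDP). ``demo_inet_csum()`` and ``demo_sw_csum()`` are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 ones'-complement sum over buf[0..len), folded to 16 bits and
 * inverted. Bytes are combined in network (big-endian) order. */
static uint16_t demo_inet_csum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8; /* pad odd byte with zero */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Assumed semantics: checksum everything from csum_start to the end of the
 * packet and store the result at csum_start + csum_offset, big-endian. */
static void demo_sw_csum(uint8_t *pkt, size_t pkt_len,
			 uint16_t csum_start, uint16_t csum_offset)
{
	size_t field = (size_t)csum_start + csum_offset;
	uint16_t csum;

	assert(field + 2 <= pkt_len);
	csum = demo_inet_csum(pkt + csum_start, pkt_len - csum_start);
	pkt[field] = (uint8_t)(csum >> 8);
	pkt[field + 1] = (uint8_t)(csum & 0xff);
}
```

A receiver can validate the result by running the same sum over the region including the stored checksum; a correct packet yields zero.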