From patchwork Thu Sep 14 21:04:44 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Stanislav Fomichev
X-Patchwork-Id: 13386051
X-Patchwork-Delegate: bpf@iogearbox.net
Date: Thu, 14 Sep 2023 14:04:44 -0700
In-Reply-To: <20230914210452.2588884-1-sdf@google.com>
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org
References: <20230914210452.2588884-1-sdf@google.com>
X-Mailer: git-send-email 2.42.0.459.ge4e396fd5e-goog
Message-ID: <20230914210452.2588884-2-sdf@google.com>
Subject: [PATCH bpf-next v2 1/9] xsk: Support tx_metadata_len
From: Stanislav Fomichev
To: bpf@vger.kernel.org
Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
	martin.lau@linux.dev, song@kernel.org, yhs@fb.com,
	john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com,
	haoluo@google.com, jolsa@kernel.org, kuba@kernel.org,
	toke@kernel.org, willemb@google.com, dsahern@kernel.org,
	magnus.karlsson@intel.com, bjorn@kernel.org,
	maciej.fijalkowski@intel.com, hawk@kernel.org,
	yoong.siang.song@intel.com, netdev@vger.kernel.org,
	xdp-hints@xdp-project.net

For zerocopy mode, tx_desc->addr can point to an arbitrary offset
and carry some TX metadata in the headroom. For copy mode, there
is currently no way to populate skb metadata.

Introduce new tx_metadata_len umem config option that indicates how many
bytes to treat as metadata. Metadata bytes come prior to tx_desc address
(same as in RX case).

The size of the metadata has the same constraints as XDP:
- less than 256 bytes
- 4-byte aligned
- non-zero (a length of 0 leaves TX metadata disabled)

This data is not interpreted in any way right now.

Signed-off-by: Stanislav Fomichev
---
 include/net/xdp_sock.h            |  1 +
 include/net/xsk_buff_pool.h       |  1 +
 include/uapi/linux/if_xdp.h       |  1 +
 net/xdp/xdp_umem.c                |  4 ++++
 net/xdp/xsk.c                     | 12 +++++++++++-
 net/xdp/xsk_buff_pool.c           |  1 +
 net/xdp/xsk_queue.h               | 17 ++++++++++-------
 tools/include/uapi/linux/if_xdp.h |  1 +
 8 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index 1617af380162..10993a05d220 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -28,6 +28,7 @@ struct xdp_umem {
 	struct user_struct *user;
 	refcount_t users;
 	u8 flags;
+	u8 tx_metadata_len;
 	bool zc;
 	struct page **pgs;
 	int id;
diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index b0bdff26fc88..1985ffaf9b0c 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -77,6 +77,7 @@ struct xsk_buff_pool {
 	u32 chunk_size;
 	u32 chunk_shift;
 	u32 frame_len;
+	u8 tx_metadata_len; /* inherited from umem */
 	u8 cached_need_wakeup;
 	bool uses_need_wakeup;
 	bool dma_need_sync;
diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index 8d48863472b9..2ecf79282c26 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -76,6 +76,7 @@ struct xdp_umem_reg {
 	__u32 chunk_size;
 	__u32 headroom;
 	__u32 flags;
+	__u32 tx_metadata_len;
 };
 
 struct xdp_statistics {
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 06cead2b8e34..333f3d53aad4 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -199,6 +199,9 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 	if (headroom >= chunk_size - XDP_PACKET_HEADROOM)
 		return -EINVAL;
 
+	if (mr->tx_metadata_len >= 256 || mr->tx_metadata_len % 4)
+		return -EINVAL;
+
 	umem->size = size;
 	umem->headroom = headroom;
 	umem->chunk_size = chunk_size;
@@ -207,6 +210,7 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 	umem->pgs = NULL;
 	umem->user = NULL;
 	umem->flags = mr->flags;
+	umem->tx_metadata_len = mr->tx_metadata_len;
 
 	INIT_LIST_HEAD(&umem->xsk_dma_list);
 	refcount_set(&umem->users, 1);
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 55f8b9b0e06d..5e479869ede1 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -1255,6 +1255,14 @@ struct xdp_umem_reg_v1 {
 	__u32 headroom;
 };
 
+struct xdp_umem_reg_v2 {
+	__u64 addr; /* Start of packet data area */
+	__u64 len; /* Length of packet data area */
+	__u32 chunk_size;
+	__u32 headroom;
+	__u32 flags;
+};
+
 static int xsk_setsockopt(struct socket *sock, int level, int optname,
 			  sockptr_t optval, unsigned int optlen)
 {
@@ -1298,8 +1306,10 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname,
 
 		if (optlen < sizeof(struct xdp_umem_reg_v1))
 			return -EINVAL;
-		else if (optlen < sizeof(mr))
+		else if (optlen < sizeof(struct xdp_umem_reg_v2))
 			mr_size = sizeof(struct xdp_umem_reg_v1);
+		else if (optlen < sizeof(mr))
+			mr_size = sizeof(struct xdp_umem_reg_v2);
 
 		if (copy_from_sockptr(&mr, optval, mr_size))
 			return -EFAULT;
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index b3f7b310811e..57c8d7100de8 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -85,6 +85,7 @@ struct xsk_buff_pool *xp_create_and_assign_umem(struct xdp_sock *xs,
 		XDP_PACKET_HEADROOM;
 	pool->umem = umem;
 	pool->addrs = umem->addrs;
+	pool->tx_metadata_len = umem->tx_metadata_len;
 	INIT_LIST_HEAD(&pool->free_list);
 	INIT_LIST_HEAD(&pool->xskb_list);
 	INIT_LIST_HEAD(&pool->xsk_tx_list);
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 13354a1e4280..c74a1372bcb9 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -143,15 +143,17 @@ static inline bool
xp_unused_options_set(u32 options) static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc) { - u64 offset = desc->addr & (pool->chunk_size - 1); + u64 addr = desc->addr - pool->tx_metadata_len; + u64 len = desc->len + pool->tx_metadata_len; + u64 offset = addr & (pool->chunk_size - 1); if (!desc->len) return false; - if (offset + desc->len > pool->chunk_size) + if (offset + len > pool->chunk_size) return false; - if (desc->addr >= pool->addrs_cnt) + if (addr >= pool->addrs_cnt) return false; if (xp_unused_options_set(desc->options)) @@ -162,16 +164,17 @@ static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool, static inline bool xp_unaligned_validate_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc) { - u64 addr = xp_unaligned_add_offset_to_addr(desc->addr); + u64 addr = xp_unaligned_add_offset_to_addr(desc->addr) - pool->tx_metadata_len; + u64 len = desc->len + pool->tx_metadata_len; if (!desc->len) return false; - if (desc->len > pool->chunk_size) + if (len > pool->chunk_size) return false; - if (addr >= pool->addrs_cnt || addr + desc->len > pool->addrs_cnt || - xp_desc_crosses_non_contig_pg(pool, addr, desc->len)) + if (addr >= pool->addrs_cnt || addr + len > pool->addrs_cnt || + xp_desc_crosses_non_contig_pg(pool, addr, len)) return false; if (xp_unused_options_set(desc->options)) diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h index 73a47da885dc..34411a2e5b6c 100644 --- a/tools/include/uapi/linux/if_xdp.h +++ b/tools/include/uapi/linux/if_xdp.h @@ -76,6 +76,7 @@ struct xdp_umem_reg { __u32 chunk_size; __u32 headroom; __u32 flags; + __u32 tx_metadata_len; }; struct xdp_statistics { From patchwork Thu Sep 14 21:04:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stanislav Fomichev X-Patchwork-Id: 13386052 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net 
(lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EE54526E29 for ; Thu, 14 Sep 2023 21:04:59 +0000 (UTC) Received: from mail-pg1-x54a.google.com (mail-pg1-x54a.google.com [IPv6:2607:f8b0:4864:20::54a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 44C2C26B2 for ; Thu, 14 Sep 2023 14:04:59 -0700 (PDT) Received: by mail-pg1-x54a.google.com with SMTP id 41be03b00d2f7-573fdb618eeso1277738a12.0 for ; Thu, 14 Sep 2023 14:04:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1694725499; x=1695330299; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=JY88GcOA/etpbruLYEM+JT5tusTRdLV0g0mY/cdkkZg=; b=CnM8Nz7id8vJtA9xv0zEErwRmheW82h885KNiaGDivQVA02I7GbTIDjjsu6yFm46SV 49HZsawgnIJoceNIjfKMGb7BH277JdzhYkQzgLgTHm/2+UTGWqiK0w0+Xxfjgo87eyeu QGuj0fHQF2IF67m4eNGuev0MKOungbVae1APZt41QmIEqwdghAVRScLwLuvxw5TBALsS qHlo0IT0vSPZE5Gf/weyoE/+cO2PpopfHRUmMUk8yxRA0Oos6MmHm9TC3GHu7qHN5e7K ELep5kzgw9oaOKoeZDuDGZhpIVsTmGQy/lKHcovSsI4AkzExMRIKRfxz1h+0Gvamibvp wqFg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1694725499; x=1695330299; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=JY88GcOA/etpbruLYEM+JT5tusTRdLV0g0mY/cdkkZg=; b=qdkc6xI8qxS7/A/dvfFhTK7Fc4mlhYfA55+puhQqJrYpLZY/bCxFnbF6S4vzjSuxnf uzgBSpjxlvgUIbA0rVXYyxbyxB2JbDH0zKdWpFwatt86cea2Z22wqzTNmxL51c/daUrT HHbv28Ub8/gL5hG5tFNKLGOB2b8vxsDeSGFUWLT4DwBPchgBuFL1gY5tA6dP1z8CVqBx fYBqJmBeEBuip0k7I7Xh6N3q6RARPNzij+HL2gvZICKKJQESARF82HhxvCseZCkzirKR 9I7QHlO9Yoz7E5yTB670xPpqqXiEdUdaZbMnyZlInksiIQt1W5ghAaAE1+rPFxzpqeFS OGDQ== X-Gm-Message-State: AOJu0Yz4wORidYSnF7aSD5OlX5F1bPi1jxWumMd03bKZRqlhLWslsLsM 
jV9wNZJDESfI8At9sY7p6PG6QydXe6XtuxtzvUf71+E8lL30fAspExKZxG5E50qMZ9hH7JoyetY ZwtmHkB8f8NhkAxX/QPJ+xHKN6IWV3yeJj/+1gEf7prAyoHVOXg== X-Google-Smtp-Source: AGHT+IFovq3jHj/M43xTv46pKrbsz8MjBTOkSMv/P0KMkZy9be2wuPayYcx/eSErtaFrdVJbfBGCJQ4= X-Received: from sdf.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5935]) (user=sdf job=sendgmr) by 2002:a63:d203:0:b0:577:f6aa:cc40 with SMTP id a3-20020a63d203000000b00577f6aacc40mr91612pgg.12.1694725498427; Thu, 14 Sep 2023 14:04:58 -0700 (PDT) Date: Thu, 14 Sep 2023 14:04:45 -0700 In-Reply-To: <20230914210452.2588884-1-sdf@google.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20230914210452.2588884-1-sdf@google.com> X-Mailer: git-send-email 2.42.0.459.ge4e396fd5e-goog Message-ID: <20230914210452.2588884-3-sdf@google.com> Subject: [PATCH bpf-next v2 2/9] xsk: add TX timestamp and TX checksum offload support From: Stanislav Fomichev To: bpf@vger.kernel.org Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev, song@kernel.org, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org, kuba@kernel.org, toke@kernel.org, willemb@google.com, dsahern@kernel.org, magnus.karlsson@intel.com, bjorn@kernel.org, maciej.fijalkowski@intel.com, hawk@kernel.org, yoong.siang.song@intel.com, netdev@vger.kernel.org, xdp-hints@xdp-project.net X-Patchwork-Delegate: bpf@iogearbox.net This change actually defines the (initial) metadata layout that should be used by AF_XDP userspace (xsk_tx_metadata). The first field is flags which requests appropriate offloads, followed by the offload-specific fields. The supported per-device offloads are exported via netlink (new xsk-flags). The offloads themselves are still implemented in a bit of a framework-y fashion that's left from my initial kfunc attempt. I'm introducing new xsk_tx_metadata_ops which drivers are supposed to implement. 
The drivers are also supposed to call
xsk_tx_metadata_request/xsk_tx_metadata_complete in the right places.
Since xsk_tx_metadata_{request,complete} are static inline, we don't
incur any extra overhead doing indirect calls.

The benefit of this scheme is as follows:
- keeps all metadata layout parsing away from driver code
- makes it easy to grep and see which drivers implement what
- no extra flags need to be maintained to keep track of which offloads
  are implemented; if the callback is implemented, the offload is
  supported (used by netlink reporting code)

Two offloads are defined right now:
1. XDP_TX_METADATA_CHECKSUM: skb-style csum_start+csum_offset
2. XDP_TX_METADATA_TIMESTAMP: writes TX timestamp back into metadata
   area upon completion (tx_timestamp field)

The offloads are also implemented for copy mode:
1. Extra XDP_TX_METADATA_CHECKSUM_SW to trigger skb_checksum_help; this
   might be useful as a reference implementation and for testing
2. XDP_TX_METADATA_TIMESTAMP writes SW timestamp from the skb
   destructor (note I'm reusing hwtstamps to pass metadata pointer)

The struct is forward-compatible and can be extended in the future
by appending more fields.

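For illustration only (not part of the patch), here is a minimal
userspace sketch of how a frame could request checksum offload with the
layout described above. The struct and flag definitions are local copies
of the ones proposed in this patch; the helper name request_tx_csum, its
length checks and the assumption that umem points at the start of an
aligned-mode UMEM area are made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local copies of the definitions proposed in this patch. */
#define XDP_TX_METADATA_TIMESTAMP	(1 << 0)
#define XDP_TX_METADATA_CHECKSUM	(1 << 1)
#define XDP_TX_METADATA			(1 << 1) /* desc->options bit */

struct xsk_tx_metadata {
	union {
		struct {
			uint32_t flags;
			uint16_t csum_start;  /* offset from desc->addr */
			uint16_t csum_offset; /* offset from csum_start */
		};
		struct {
			uint64_t tx_timestamp;
		} completion;
	};
};

struct xdp_desc {
	uint64_t addr;
	uint32_t len;
	uint32_t options;
};

/*
 * Hypothetical helper: write the metadata into the tx_metadata_len bytes
 * that precede the packet and mark the descriptor as carrying metadata.
 * Returns 0 on success, -1 if the request cannot be expressed.
 */
static int request_tx_csum(uint8_t *umem, struct xdp_desc *desc,
			   uint32_t tx_metadata_len,
			   uint16_t csum_start, uint16_t csum_offset)
{
	struct xsk_tx_metadata meta;

	assert(umem && desc);

	/* Assumption: the metadata area must hold the whole struct. */
	if (tx_metadata_len < sizeof(meta) || desc->addr < tx_metadata_len)
		return -1;

	/* Mirrors the bounds check done in xsk_build_skb for copy mode. */
	if ((uint32_t)csum_start + csum_offset + sizeof(uint16_t) > desc->len)
		return -1;

	memset(&meta, 0, sizeof(meta));
	meta.flags = XDP_TX_METADATA_CHECKSUM;
	meta.csum_start = csum_start;
	meta.csum_offset = csum_offset;

	/* The struct starts tx_metadata_len bytes before the packet data. */
	memcpy(umem + desc->addr - tx_metadata_len, &meta, sizeof(meta));
	desc->options |= XDP_TX_METADATA;
	return 0;
}
```

With tx_metadata_len = 16 and desc->addr = 256, a TCP/IPv4 frame would
pass csum_start = 34 (Ethernet plus IPv4 header) and csum_offset = 16,
and the struct would land at UMEM offset 240.
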
Signed-off-by: Stanislav Fomichev --- Documentation/netlink/specs/netdev.yaml | 20 +++++++ include/linux/netdevice.h | 27 +++++++++ include/linux/skbuff.h | 14 ++++- include/net/xdp_sock.h | 80 +++++++++++++++++++++++++ include/net/xdp_sock_drv.h | 13 ++++ include/net/xsk_buff_pool.h | 6 ++ include/uapi/linux/if_xdp.h | 40 +++++++++++++ include/uapi/linux/netdev.h | 16 +++++ net/core/netdev-genl.c | 12 +++- net/xdp/xsk.c | 39 ++++++++++++ net/xdp/xsk_queue.h | 2 +- tools/include/uapi/linux/if_xdp.h | 54 +++++++++++++++-- tools/include/uapi/linux/netdev.h | 15 +++++ 13 files changed, 330 insertions(+), 8 deletions(-) diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml index 1c7284fd535b..9002b37b7676 100644 --- a/Documentation/netlink/specs/netdev.yaml +++ b/Documentation/netlink/specs/netdev.yaml @@ -42,6 +42,19 @@ name: netdev doc: This feature informs if netdev implements non-linear XDP buffer support in ndo_xdp_xmit callback. + - + type: flags + name: xsk-flags + render-max: true + entries: + - + name: tx-timestamp + doc: + HW timestamping egress packets is supported by the driver. + - + name: tx-checksum + doc: + L3 checksum HW offload is supported by the driver. attribute-sets: - @@ -68,6 +81,12 @@ name: netdev type: u32 checks: min: 1 + - + name: xsk-features + doc: Bitmask of enabled AF_XDP features. + type: u64 + enum: xsk-flags + enum-as-flags: true operations: list: @@ -84,6 +103,7 @@ name: netdev - ifindex - xdp-features - xdp-zc-max-segs + - xsk-features dump: reply: *dev-all - diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 0896aaa91dd7..3f02aaa30590 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1647,6 +1647,31 @@ struct net_device_ops { struct netlink_ext_ack *extack); }; +/* + * This structure defines the AF_XDP TX metadata hooks for network devices. 
+ * The following hooks can be defined; unless noted otherwise, they are
+ * optional and can be filled with a null pointer.
+ *
+ * void (*tmo_request_timestamp)(void *priv)
+ *     This function is called when AF_XDP frame requested egress timestamp.
+ *
+ * u64 (*tmo_fill_timestamp)(void *priv)
+ *     This function is called when AF_XDP frame, that had requested
+ *     egress timestamp, received a completion. The hook needs to return
+ *     the actual HW timestamp.
+ *
+ * void (*tmo_request_checksum)(u16 csum_start, u16 csum_offset, void *priv)
+ *     This function is called when AF_XDP frame requested HW checksum
+ *     offload. csum_start indicates position where checksumming should start.
+ *     csum_offset indicates position where checksum should be stored.
+ *
+ */
+struct xsk_tx_metadata_ops {
+	void	(*tmo_request_timestamp)(void *priv);
+	u64	(*tmo_fill_timestamp)(void *priv);
+	void	(*tmo_request_checksum)(u16 csum_start, u16 csum_offset, void *priv);
+};
+
 /**
  * enum netdev_priv_flags - &struct net_device priv_flags
  *
@@ -1835,6 +1860,7 @@ enum netdev_ml_priv_type {
  *	@netdev_ops:	Includes several pointers to callbacks,
  *			if one wants to override the ndo_*() functions
  *	@xdp_metadata_ops:	Includes pointers to XDP metadata callbacks.
+ *	@xsk_tx_metadata_ops:	Includes pointers to AF_XDP TX metadata callbacks.
* @ethtool_ops: Management operations * @l3mdev_ops: Layer 3 master device operations * @ndisc_ops: Includes callbacks for different IPv6 neighbour @@ -2091,6 +2117,7 @@ struct net_device { unsigned long long priv_flags; const struct net_device_ops *netdev_ops; const struct xdp_metadata_ops *xdp_metadata_ops; + const struct xsk_tx_metadata_ops *xsk_tx_metadata_ops; int ifindex; unsigned short gflags; unsigned short hard_header_len; diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 4174c4b82d13..444d35dcd690 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -566,6 +566,15 @@ struct ubuf_info_msgzc { int mm_account_pinned_pages(struct mmpin *mmp, size_t size); void mm_unaccount_pinned_pages(struct mmpin *mmp); +/* Preserve some data across TX submission and completion. + * + * Note, this state is stored in the driver. Extending the layout + * might need some special care. + */ +struct xsk_tx_metadata_compl { + __u64 *tx_timestamp; +}; + /* This data is invariant across clones and lives at * the end of the header data, ie. at skb->end. */ @@ -578,7 +587,10 @@ struct skb_shared_info { /* Warning: this field is not always filled in (UFO)! */ unsigned short gso_segs; struct sk_buff *frag_list; - struct skb_shared_hwtstamps hwtstamps; + union { + struct skb_shared_hwtstamps hwtstamps; + struct xsk_tx_metadata_compl xsk_meta; + }; unsigned int gso_type; u32 tskey; diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index 10993a05d220..c438c614a8d0 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -90,6 +90,74 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp); int __xsk_map_redirect(struct xdp_sock *xs, struct xdp_buff *xdp); void __xsk_map_flush(void); +/** + * xsk_tx_metadata_to_compl - Save enough relevant metadata information + * to perform tx completion in the future. 
+ * @meta: pointer to AF_XDP metadata area
+ * @compl: pointer to output struct xsk_tx_metadata_compl
+ *
+ * This function should be called by the networking device when
+ * it prepares AF_XDP egress packet. The value of @compl should be stored
+ * and passed to xsk_tx_metadata_complete upon TX completion.
+ */
+static inline void xsk_tx_metadata_to_compl(struct xsk_tx_metadata *meta,
+					    struct xsk_tx_metadata_compl *compl)
+{
+	if (!meta)
+		return;
+
+	if (meta->flags & XDP_TX_METADATA_TIMESTAMP)
+		compl->tx_timestamp = &meta->completion.tx_timestamp;
+	else
+		compl->tx_timestamp = NULL;
+}
+
+/**
+ * xsk_tx_metadata_request - Evaluate AF_XDP TX metadata at submission
+ * and call appropriate xsk_tx_metadata_ops operation.
+ * @meta: pointer to AF_XDP metadata area
+ * @ops: pointer to struct xsk_tx_metadata_ops
+ * @priv: pointer to driver-private area
+ *
+ * This function should be called by the networking device when
+ * it prepares AF_XDP egress packet.
+ */
+static inline void xsk_tx_metadata_request(const struct xsk_tx_metadata *meta,
+					   const struct xsk_tx_metadata_ops *ops,
+					   void *priv)
+{
+	if (!meta)
+		return;
+
+	if (ops->tmo_request_timestamp)
+		if (meta->flags & XDP_TX_METADATA_TIMESTAMP)
+			ops->tmo_request_timestamp(priv);
+
+	if (ops->tmo_request_checksum)
+		if (meta->flags & XDP_TX_METADATA_CHECKSUM)
+			ops->tmo_request_checksum(meta->csum_start, meta->csum_offset, priv);
+}
+
+/**
+ * xsk_tx_metadata_complete - Evaluate AF_XDP TX metadata at completion
+ * and call appropriate xsk_tx_metadata_ops operation.
+ * @compl: pointer to completion metadata produced from xsk_tx_metadata_to_compl
+ * @ops: pointer to struct xsk_tx_metadata_ops
+ * @priv: pointer to driver-private area
+ *
+ * This function should be called by the networking device upon
+ * AF_XDP egress completion.
+ */
+static inline void xsk_tx_metadata_complete(struct xsk_tx_metadata_compl *compl,
+					    const struct xsk_tx_metadata_ops *ops,
+					    void *priv)
+{
+	if (!compl || !compl->tx_timestamp)
+		return;
+
+	*compl->tx_timestamp = ops->tmo_fill_timestamp(priv);
+}
+
 #else
 
 static inline int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
@@ -106,6 +174,18 @@ static inline void __xsk_map_flush(void)
 {
 }
 
+static inline void xsk_tx_metadata_request(const struct xsk_tx_metadata *meta,
+					   const struct xsk_tx_metadata_ops *ops,
+					   void *priv)
+{
+}
+
+static inline void xsk_tx_metadata_complete(struct xsk_tx_metadata_compl *compl,
+					    const struct xsk_tx_metadata_ops *ops,
+					    void *priv)
+{
+}
+
 #endif /* CONFIG_XDP_SOCKETS */
 
 #endif /* _LINUX_XDP_SOCK_H */
diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
index 1f6fc8c7a84c..e2558ac3e195 100644
--- a/include/net/xdp_sock_drv.h
+++ b/include/net/xdp_sock_drv.h
@@ -165,6 +165,14 @@ static inline void *xsk_buff_raw_get_data(struct xsk_buff_pool *pool, u64 addr)
 	return xp_raw_get_data(pool, addr);
 }
 
+static inline struct xsk_tx_metadata *xsk_buff_get_metadata(struct xsk_buff_pool *pool, u64 addr)
+{
+	if (!pool->tx_metadata_len)
+		return NULL;
+
+	return xp_raw_get_data(pool, addr) - pool->tx_metadata_len;
+}
+
 static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp, struct xsk_buff_pool *pool)
 {
 	struct xdp_buff_xsk *xskb = container_of(xdp, struct xdp_buff_xsk, xdp);
@@ -324,6 +332,11 @@ static inline void *xsk_buff_raw_get_data(struct xsk_buff_pool *pool, u64 addr)
 	return NULL;
 }
 
+static inline struct xsk_tx_metadata *xsk_buff_get_metadata(struct xsk_buff_pool *pool, u64 addr)
+{
+	return NULL;
+}
+
 static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp, struct xsk_buff_pool *pool)
 {
 }
diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index 1985ffaf9b0c..97f5cc10d79e 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -33,6 +33,7 @@ struct xdp_buff_xsk {
 };
 
 #define
XSK_CHECK_PRIV_TYPE(t) BUILD_BUG_ON(sizeof(t) > offsetofend(struct xdp_buff_xsk, cb)) +#define XSK_TX_COMPL_FITS(t) BUILD_BUG_ON(sizeof(struct xsk_tx_metadata_compl) > sizeof(t)) struct xsk_dma_map { dma_addr_t *dma_pages; @@ -234,4 +235,9 @@ static inline u64 xp_get_handle(struct xdp_buff_xsk *xskb) return xskb->orig_addr + (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT); } +static inline bool xp_tx_metadata_enabled(const struct xsk_buff_pool *pool) +{ + return pool->tx_metadata_len > 0; +} + #endif /* XSK_BUFF_POOL_H_ */ diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h index 2ecf79282c26..ecfd67988283 100644 --- a/include/uapi/linux/if_xdp.h +++ b/include/uapi/linux/if_xdp.h @@ -106,6 +106,43 @@ struct xdp_options { #define XSK_UNALIGNED_BUF_ADDR_MASK \ ((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1) +/* Request transmit timestamp. Upon completion, put it into tx_timestamp + * field of struct xsk_tx_metadata. + */ +#define XDP_TX_METADATA_TIMESTAMP (1 << 0) + +/* Request transmit checksum offload. Checksum start position and offset + * are communicated via csum_start and csum_offset fields of struct + * xsk_tx_metadata. + */ +#define XDP_TX_METADATA_CHECKSUM (1 << 1) + +/* Force checksum calculation in software. Can be used for testing or + * working around potential HW issues. This option causes performance + * degradation and only works in XDP_COPY mode. + */ +#define XDP_TX_METADATA_CHECKSUM_SW (1 << 2) + +struct xsk_tx_metadata { + union { + struct { + __u32 flags; + + /* XDP_TX_METADATA_CHECKSUM */ + + /* Offset from desc->addr where checksumming should start. */ + __u16 csum_start; + /* Offset from csum_start where checksum should be stored. */ + __u16 csum_offset; + }; + + struct { + /* XDP_TX_METADATA_TIMESTAMP */ + __u64 tx_timestamp; + } completion; + }; +}; + /* Rx/Tx descriptor */ struct xdp_desc { __u64 addr; @@ -122,4 +159,7 @@ struct xdp_desc { */ #define XDP_PKT_CONTD (1 << 0) +/* TX packet carries valid metadata. 
*/ +#define XDP_TX_METADATA (1 << 1) + #endif /* _LINUX_IF_XDP_H */ diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h index c1634b95c223..138e467a09d6 100644 --- a/include/uapi/linux/netdev.h +++ b/include/uapi/linux/netdev.h @@ -38,11 +38,27 @@ enum netdev_xdp_act { NETDEV_XDP_ACT_MASK = 127, }; +/** + * enum netdev_xsk_flags + * @NETDEV_XSK_FLAGS_TX_TIMESTAMP: HW timestamping egress packets is supported + * by the driver. + * @NETDEV_XSK_FLAGS_TX_CHECKSUM: L3 checksum HW offload is supported by the + * driver. + */ +enum netdev_xsk_flags { + NETDEV_XSK_FLAGS_TX_TIMESTAMP = 1, + NETDEV_XSK_FLAGS_TX_CHECKSUM = 2, + + /* private: */ + NETDEV_XSK_FLAGS_MASK = 3, +}; + enum { NETDEV_A_DEV_IFINDEX = 1, NETDEV_A_DEV_PAD, NETDEV_A_DEV_XDP_FEATURES, NETDEV_A_DEV_XDP_ZC_MAX_SEGS, + NETDEV_A_DEV_XSK_FEATURES, __NETDEV_A_DEV_MAX, NETDEV_A_DEV_MAX = (__NETDEV_A_DEV_MAX - 1) diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c index c1aea8b756b6..2821546e08de 100644 --- a/net/core/netdev-genl.c +++ b/net/core/netdev-genl.c @@ -12,15 +12,25 @@ static int netdev_nl_dev_fill(struct net_device *netdev, struct sk_buff *rsp, const struct genl_info *info) { + u64 xsk_flags = 0; void *hdr; hdr = genlmsg_iput(rsp, info); if (!hdr) return -EMSGSIZE; + if (netdev->xsk_tx_metadata_ops) { + if (netdev->xsk_tx_metadata_ops->tmo_fill_timestamp) + xsk_flags |= NETDEV_XSK_FLAGS_TX_TIMESTAMP; + if (netdev->xsk_tx_metadata_ops->tmo_request_checksum) + xsk_flags |= NETDEV_XSK_FLAGS_TX_CHECKSUM; + } + if (nla_put_u32(rsp, NETDEV_A_DEV_IFINDEX, netdev->ifindex) || nla_put_u64_64bit(rsp, NETDEV_A_DEV_XDP_FEATURES, - netdev->xdp_features, NETDEV_A_DEV_PAD)) { + netdev->xdp_features, NETDEV_A_DEV_PAD) || + nla_put_u64_64bit(rsp, NETDEV_A_DEV_XSK_FEATURES, + xsk_flags, NETDEV_A_DEV_PAD)) { genlmsg_cancel(rsp, hdr); return -EINVAL; } diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 5e479869ede1..44cc4f8560d3 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -543,6 +543,13 @@ 
static u32 xsk_get_num_desc(struct sk_buff *skb) static void xsk_destruct_skb(struct sk_buff *skb) { + struct xsk_tx_metadata_compl *compl = &skb_shinfo(skb)->xsk_meta; + + if (compl->tx_timestamp) { + /* sw completion timestamp, not a real one */ + *compl->tx_timestamp = ktime_get_tai_fast_ns(); + } + xsk_cq_submit_locked(xdp_sk(skb->sk), xsk_get_num_desc(skb)); sock_wfree(skb); } @@ -627,8 +634,10 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs, static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, struct xdp_desc *desc) { + struct xsk_tx_metadata *meta = NULL; struct net_device *dev = xs->dev; struct sk_buff *skb = xs->skb; + bool first_frag = false; int err; if (dev->priv_flags & IFF_TX_SKB_NO_LINEAR) { @@ -659,6 +668,8 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, kfree_skb(skb); goto free_err; } + + first_frag = true; } else { int nr_frags = skb_shinfo(skb)->nr_frags; struct page *page; @@ -681,12 +692,40 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, skb_add_rx_frag(skb, nr_frags, page, 0, len, 0); } + + if (first_frag && desc->options & XDP_TX_METADATA) { + if (unlikely(xs->pool->tx_metadata_len == 0)) { + err = -EINVAL; + goto free_err; + } + + meta = buffer - xs->pool->tx_metadata_len; + + if (meta->flags & XDP_TX_METADATA_CHECKSUM) { + if (unlikely(meta->csum_start + meta->csum_offset + + sizeof(__sum16) > len)) { + err = -EINVAL; + goto free_err; + } + + skb->csum_start = hr + meta->csum_start; + skb->csum_offset = meta->csum_offset; + skb->ip_summed = CHECKSUM_PARTIAL; + + if (unlikely(meta->flags & XDP_TX_METADATA_CHECKSUM_SW)) { + err = skb_checksum_help(skb); + if (err) + goto free_err; + } + } + } } skb->dev = dev; skb->priority = xs->sk.sk_priority; skb->mark = READ_ONCE(xs->sk.sk_mark); skb->destructor = xsk_destruct_skb; + xsk_tx_metadata_to_compl(meta, &skb_shinfo(skb)->xsk_meta); xsk_set_destructor_arg(skb); return skb; diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h index 
c74a1372bcb9..6f2d1621c992 100644 --- a/net/xdp/xsk_queue.h +++ b/net/xdp/xsk_queue.h @@ -137,7 +137,7 @@ static inline bool xskq_cons_read_addr_unchecked(struct xsk_queue *q, u64 *addr) static inline bool xp_unused_options_set(u32 options) { - return options & ~XDP_PKT_CONTD; + return options & ~(XDP_PKT_CONTD | XDP_TX_METADATA); } static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool, diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h index 34411a2e5b6c..53ceaae10dd1 100644 --- a/tools/include/uapi/linux/if_xdp.h +++ b/tools/include/uapi/linux/if_xdp.h @@ -26,11 +26,11 @@ */ #define XDP_USE_NEED_WAKEUP (1 << 3) /* By setting this option, userspace application indicates that it can - * handle multiple descriptors per packet thus enabling xsk core to split + * handle multiple descriptors per packet thus enabling AF_XDP to split * multi-buffer XDP frames into multiple Rx descriptors. Without this set - * such frames will be dropped by xsk. + * such frames will be dropped. */ -#define XDP_USE_SG (1 << 4) +#define XDP_USE_SG (1 << 4) /* Flags for xsk_umem_config flags */ #define XDP_UMEM_UNALIGNED_CHUNK_FLAG (1 << 0) @@ -106,6 +106,43 @@ struct xdp_options { #define XSK_UNALIGNED_BUF_ADDR_MASK \ ((1ULL << XSK_UNALIGNED_BUF_OFFSET_SHIFT) - 1) +/* Request transmit timestamp. Upon completion, put it into tx_timestamp + * field of union xsk_tx_metadata. + */ +#define XDP_TX_METADATA_TIMESTAMP (1 << 0) + +/* Request transmit checksum offload. Checksum start position and offset + * are communicated via csum_start and csum_offset fields of union + * xsk_tx_metadata. + */ +#define XDP_TX_METADATA_CHECKSUM (1 << 1) + +/* Force checksum calculation in software. Can be used for testing or + * working around potential HW issues. This option causes performance + * degradation and only works in XDP_COPY mode. 
+ */ +#define XDP_TX_METADATA_CHECKSUM_SW (1 << 2) + +struct xsk_tx_metadata { + union { + struct { + __u32 flags; + + /* XDP_TX_METADATA_CHECKSUM */ + + /* Offset from desc->addr where checksumming should start. */ + __u16 csum_start; + /* Offset from csum_start where checksum should be stored. */ + __u16 csum_offset; + }; + + struct { + /* XDP_TX_METADATA_TIMESTAMP */ + __u64 tx_timestamp; + } completion; + }; +}; + /* Rx/Tx descriptor */ struct xdp_desc { __u64 addr; @@ -113,9 +150,16 @@ struct xdp_desc { __u32 options; }; -/* Flag indicating packet constitutes of multiple buffers*/ +/* UMEM descriptor is __u64 */ + +/* Flag indicating that the packet continues with the buffer pointed out by the + * next frame in the ring. The end of the packet is signalled by setting this + * bit to zero. For single buffer packets, every descriptor has 'options' set + * to 0 and this maintains backward compatibility. + */ #define XDP_PKT_CONTD (1 << 0) -/* UMEM descriptor is __u64 */ +/* TX packet carries valid metadata. */ +#define XDP_TX_METADATA (1 << 1) #endif /* _LINUX_IF_XDP_H */ diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/netdev.h index c1634b95c223..e8fdc530dcc9 100644 --- a/tools/include/uapi/linux/netdev.h +++ b/tools/include/uapi/linux/netdev.h @@ -38,11 +38,26 @@ enum netdev_xdp_act { NETDEV_XDP_ACT_MASK = 127, }; +/** + * enum netdev_xsk_flags + * @NETDEV_XSK_FLAGS_TX_TIMESTAMP: HW timestamping egress packets is supported + * by the driver. + * @NETDEV_XSK_FLAGS_TX_CHECKSUM: L3 checksum HW offload is supported by the + * driver. 
+ */ +enum netdev_xsk_flags { + NETDEV_XSK_FLAGS_TX_TIMESTAMP = 1, + NETDEV_XSK_FLAGS_TX_CHECKSUM = 2, + + NETDEV_XSK_FLAGS_MASK = 3, +}; + enum { NETDEV_A_DEV_IFINDEX = 1, NETDEV_A_DEV_PAD, NETDEV_A_DEV_XDP_FEATURES, NETDEV_A_DEV_XDP_ZC_MAX_SEGS, + NETDEV_A_DEV_XSK_FEATURES, __NETDEV_A_DEV_MAX, NETDEV_A_DEV_MAX = (__NETDEV_A_DEV_MAX - 1) From patchwork Thu Sep 14 21:04:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stanislav Fomichev X-Patchwork-Id: 13386053 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AE1FE26E0F for ; Thu, 14 Sep 2023 21:05:01 +0000 (UTC) Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 462352700 for ; Thu, 14 Sep 2023 14:05:01 -0700 (PDT) Received: by mail-yb1-xb49.google.com with SMTP id 3f1490d57ef6-d80db590b1cso2747701276.0 for ; Thu, 14 Sep 2023 14:05:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1694725500; x=1695330300; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=s8lJ/wwRdtQO1FPJ3x7uMnAOgiGCvLMvp8mW8wlA0oo=; b=yHP0JIars7HQmA5a3s3DoCjzBZHqVmgW2MVO7hTfbOuBq/savYnh4pQiRGQIS+2Gf/ ESTS50FpbRbBtq5msFNYq4QAq12AcUxKz9ehsuY6hs0UQH5yAk2bEGxHvpH3TElY/Btm Yao+E6heXgMH+OF7xi/8gTUjMfC6kGBy4C6SBnqlRstX+ueWWsez6C/oGAdb5hbXLEMw 88xjt/zicDdFkvZWDccYUTP8VSrnPYB9yOxb2TEiwNWgI+ERvbpItMwdSXFFOX/MyE8V GmuFUHLZhHS7R4zzSygnjTlwL3ZiMC4WUY/lCYdEywdN5odB4DpimwvSB0NaVzoxUfSK L5rA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1694725500; x=1695330300; 
h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=s8lJ/wwRdtQO1FPJ3x7uMnAOgiGCvLMvp8mW8wlA0oo=; b=MsSvcILOnZnhZbNMBoOVzrUjpZPzOfZ2OjcC3ZGVqWu97HTQvilAFlCdVPnRXEj1D0 mF5egoaWS2TIKv7nhoqrlsz7IpkB0uw0NBCr9Fm3aUUr8xAsn4mdLDE2ehayIaZtXqkL lSYtPuq9UlSCLuyLdjZgm/FXliusIcrfN786QyxF5buu1Hhcj85a6E6siFbOGuEnr9cM GIj2dmqMivfIGi765/5OBkFp4VPwSMbmjKXU8EJGbzMEBS44c80yT6e0ohE7oZ+qMpKN D18Yxeo7PiJx7Va2aVI6h3j8BuCWA/TuMvfzG0DaMXGvHSFh5zMeycK2SMRbQ63t9W6U Qeiw== X-Gm-Message-State: AOJu0Yy42Ppw0MgilevxBOV/gCax4LSH8cINilzFi0JiGTqSUBNaygvQ jI0kFE52LRL2OedKo9u3GjEkFn9BnwPpfjwOLdXBkzsFQ1SbE5C2YkaHLEHL3pOhOpk1P/pEnOT PtVvmaCp6AMT2LdedDlRpy38BAIYGbZH2e83gOGEohImq7d7U4g== X-Google-Smtp-Source: AGHT+IGBVfjbSsKj7d7DebQnG2T8uSrtj2rRYcb2IwJR9AmePoOHHNUoBNQun7eAKz/+4xI5vDVwGVs= X-Received: from sdf.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5935]) (user=sdf job=sendgmr) by 2002:a25:d141:0:b0:d77:fb00:b246 with SMTP id i62-20020a25d141000000b00d77fb00b246mr87035ybg.1.1694725500261; Thu, 14 Sep 2023 14:05:00 -0700 (PDT) Date: Thu, 14 Sep 2023 14:04:46 -0700 In-Reply-To: <20230914210452.2588884-1-sdf@google.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20230914210452.2588884-1-sdf@google.com> X-Mailer: git-send-email 2.42.0.459.ge4e396fd5e-goog Message-ID: <20230914210452.2588884-4-sdf@google.com> Subject: [PATCH bpf-next v2 3/9] tools: ynl: print xsk-features from the sample From: Stanislav Fomichev To: bpf@vger.kernel.org Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev, song@kernel.org, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org, kuba@kernel.org, toke@kernel.org, willemb@google.com, dsahern@kernel.org, magnus.karlsson@intel.com, bjorn@kernel.org, maciej.fijalkowski@intel.com, hawk@kernel.org, 
yoong.siang.song@intel.com, netdev@vger.kernel.org, xdp-hints@xdp-project.net X-Patchwork-Delegate: bpf@iogearbox.net Regenerate the userspace specs and print xsk-features bitmask. Signed-off-by: Stanislav Fomichev --- tools/net/ynl/generated/netdev-user.c | 19 +++++++++++++++++++ tools/net/ynl/generated/netdev-user.h | 3 +++ tools/net/ynl/samples/netdev.c | 6 ++++++ 3 files changed, 28 insertions(+) diff --git a/tools/net/ynl/generated/netdev-user.c b/tools/net/ynl/generated/netdev-user.c index 68b408ca0f7f..f8dd6aa0ad97 100644 --- a/tools/net/ynl/generated/netdev-user.c +++ b/tools/net/ynl/generated/netdev-user.c @@ -45,12 +45,26 @@ const char *netdev_xdp_act_str(enum netdev_xdp_act value) return netdev_xdp_act_strmap[value]; } +static const char * const netdev_xsk_flags_strmap[] = { + [0] = "tx-timestamp", + [1] = "tx-checksum", +}; + +const char *netdev_xsk_flags_str(enum netdev_xsk_flags value) +{ + value = ffs(value) - 1; + if (value < 0 || value >= (int)MNL_ARRAY_SIZE(netdev_xsk_flags_strmap)) + return NULL; + return netdev_xsk_flags_strmap[value]; +} + /* Policies */ struct ynl_policy_attr netdev_dev_policy[NETDEV_A_DEV_MAX + 1] = { [NETDEV_A_DEV_IFINDEX] = { .name = "ifindex", .type = YNL_PT_U32, }, [NETDEV_A_DEV_PAD] = { .name = "pad", .type = YNL_PT_IGNORE, }, [NETDEV_A_DEV_XDP_FEATURES] = { .name = "xdp-features", .type = YNL_PT_U64, }, [NETDEV_A_DEV_XDP_ZC_MAX_SEGS] = { .name = "xdp-zc-max-segs", .type = YNL_PT_U32, }, + [NETDEV_A_DEV_XSK_FEATURES] = { .name = "xsk-features", .type = YNL_PT_U64, }, }; struct ynl_policy_nest netdev_dev_nest = { @@ -97,6 +111,11 @@ int netdev_dev_get_rsp_parse(const struct nlmsghdr *nlh, void *data) return MNL_CB_ERROR; dst->_present.xdp_zc_max_segs = 1; dst->xdp_zc_max_segs = mnl_attr_get_u32(attr); + } else if (type == NETDEV_A_DEV_XSK_FEATURES) { + if (ynl_attr_validate(yarg, attr)) + return MNL_CB_ERROR; + dst->_present.xsk_features = 1; + dst->xsk_features = mnl_attr_get_u64(attr); } } diff --git 
a/tools/net/ynl/generated/netdev-user.h b/tools/net/ynl/generated/netdev-user.h index 0952d3261f4d..b8c5cdb331b4 100644 --- a/tools/net/ynl/generated/netdev-user.h +++ b/tools/net/ynl/generated/netdev-user.h @@ -18,6 +18,7 @@ extern const struct ynl_family ynl_netdev_family; /* Enums */ const char *netdev_op_str(int op); const char *netdev_xdp_act_str(enum netdev_xdp_act value); +const char *netdev_xsk_flags_str(enum netdev_xsk_flags value); /* Common nested types */ /* ============== NETDEV_CMD_DEV_GET ============== */ @@ -48,11 +49,13 @@ struct netdev_dev_get_rsp { __u32 ifindex:1; __u32 xdp_features:1; __u32 xdp_zc_max_segs:1; + __u32 xsk_features:1; } _present; __u32 ifindex; __u64 xdp_features; __u32 xdp_zc_max_segs; + __u64 xsk_features; }; void netdev_dev_get_rsp_free(struct netdev_dev_get_rsp *rsp); diff --git a/tools/net/ynl/samples/netdev.c b/tools/net/ynl/samples/netdev.c index 06433400dddd..06377e3f1df5 100644 --- a/tools/net/ynl/samples/netdev.c +++ b/tools/net/ynl/samples/netdev.c @@ -38,6 +38,12 @@ static void netdev_print_device(struct netdev_dev_get_rsp *d, unsigned int op) printf(" %s", netdev_xdp_act_str(1 << i)); } + printf(" %llx:", d->xsk_features); + for (int i = 0; d->xsk_features >= 1U << i; i++) { + if (d->xsk_features & (1U << i)) + printf(" %s", netdev_xsk_flags_str(1 << i)); + } + printf(" xdp-zc-max-segs=%u", d->xdp_zc_max_segs); name = netdev_op_str(op); From patchwork Thu Sep 14 21:04:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stanislav Fomichev X-Patchwork-Id: 13386054 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 59621273D4 for ; Thu, 14 Sep 2023 21:05:04 +0000 (UTC) Received: from mail-yw1-x1149.google.com
(mail-yw1-x1149.google.com [IPv6:2607:f8b0:4864:20::1149]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9E59D26B7 for ; Thu, 14 Sep 2023 14:05:03 -0700 (PDT) Received: by mail-yw1-x1149.google.com with SMTP id 00721157ae682-59bdac026f7so19615857b3.0 for ; Thu, 14 Sep 2023 14:05:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1694725503; x=1695330303; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=aCsQuDvVQHI1u2mjJYNUcec6V7Rj3JRmHEvDI0G06CE=; b=v6WZLdcLom/puLF4UUgNPmkqZR+jG7eMPFE0irf6dCrDMWfGnKDGoRPhLMt1ve4LC4 6AszHoQl+nB16mnfBGyJVa7gVgrupDUeabY/IH7OOckqKEDHXoLvCCF3cAFMiZb5Mafv bUYqiPmND+6aX+qarS4BoIRr1hnre9r9il+48b+nfTEuypwVucISx0/2poWKmPu+GIb8 7aiq5hKChvcu4kHKnGfWPimdL/BECm4QRZalD15/YOpbV7OAbn9wCoisCVRFrszfc9dQ 3AWFZZLM0pARyEjqDMw56BNS5Gw7WsZtnQqWX75E7ljbJxX/947bslIot8WJqSn+418Y 8V7Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1694725503; x=1695330303; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=aCsQuDvVQHI1u2mjJYNUcec6V7Rj3JRmHEvDI0G06CE=; b=Rpp/mNbi+RzT+YWku1oQI3efaOIFCSC6ucul08dAT90AfUroPkygaVoPO22P1Hop+D 7KFhSOGjqWmRxpMWtto8DIzv6D1Jgm63bRBFQt5XYLjdoVSCwfG7LL0/wbLu9KC0vyA7 spcYLa6ynRf5iJbrJxL+hQuutMhYY5HM8FC5lw2mgb6W5AHmmQCNv9bPpHf2cz30J8J0 3TYTttrioPcvm55rsDD716NhbOYLKoEnBT1TJZ29CNy6k1QJjZ5N/wBlOzHsnC/GhXyj sJkp1/1j0NveomabDePYAL22nwJaBvMOVPIlJ6al+D9eH61lqVwI392OcUd4JMusBtbe xzEg== X-Gm-Message-State: AOJu0YwO3ZL+t2KH6PB4Dm494mHSGD+LSQO5OCC3FCcT64RONTmuy2Kq mIZ44vw1K/7Z/oIMSQLH0p385DUlVINlzMQ4JEDBMQuqc5sXw36QKoHpP58oDVOclXWFLfQjXDw CD+Y2jcQ6cLjTKPTOieMNtUxW5Ho/PM+zhKjgdx7Nb+0ac0vEMQ== X-Google-Smtp-Source: AGHT+IHPawcHgrIhXe2P7WOlUIcnQYy6wXiAF1aZw58LrmpkUMFKpdsD3nbkHUFxOeu9lBFJ9nFDWfA= X-Received: from sdf.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5935]) 
(user=sdf job=sendgmr) by 2002:a81:a08e:0:b0:59b:d9b8:9ae3 with SMTP id x136-20020a81a08e000000b0059bd9b89ae3mr146974ywg.10.1694725502305; Thu, 14 Sep 2023 14:05:02 -0700 (PDT) Date: Thu, 14 Sep 2023 14:04:47 -0700 In-Reply-To: <20230914210452.2588884-1-sdf@google.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20230914210452.2588884-1-sdf@google.com> X-Mailer: git-send-email 2.42.0.459.ge4e396fd5e-goog Message-ID: <20230914210452.2588884-5-sdf@google.com> Subject: [PATCH bpf-next v2 4/9] net/mlx5e: Implement AF_XDP TX timestamp and checksum offload From: Stanislav Fomichev To: bpf@vger.kernel.org Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev, song@kernel.org, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org, kuba@kernel.org, toke@kernel.org, willemb@google.com, dsahern@kernel.org, magnus.karlsson@intel.com, bjorn@kernel.org, maciej.fijalkowski@intel.com, hawk@kernel.org, yoong.siang.song@intel.com, netdev@vger.kernel.org, xdp-hints@xdp-project.net, Saeed Mahameed X-Patchwork-Delegate: bpf@iogearbox.net TX timestamp: - requires passing clock, not sure I'm passing the correct one (from cq->mdev), but the timestamp value looks convincing TX checksum: - looks like device does packet parsing (and doesn't accept custom start/offset), so I'm ignoring user offsets Cc: Saeed Mahameed Signed-off-by: Stanislav Fomichev --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 4 +- .../net/ethernet/mellanox/mlx5/core/en/xdp.c | 72 ++++++++++++++++--- .../net/ethernet/mellanox/mlx5/core/en/xdp.h | 11 ++- .../ethernet/mellanox/mlx5/core/en/xsk/tx.c | 17 ++++- .../net/ethernet/mellanox/mlx5/core/en_main.c | 1 + 5 files changed, 89 insertions(+), 16 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 86f2690c5e01..f64ceedcc665 100644 --- 
a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -476,10 +476,12 @@ struct mlx5e_xdp_info_fifo { struct mlx5e_xdpsq; struct mlx5e_xmit_data; +struct xsk_tx_metadata; typedef int (*mlx5e_fp_xmit_xdp_frame_check)(struct mlx5e_xdpsq *); typedef bool (*mlx5e_fp_xmit_xdp_frame)(struct mlx5e_xdpsq *, struct mlx5e_xmit_data *, - int); + int, + struct xsk_tx_metadata *); struct mlx5e_xdpsq { /* data path */ diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c index 12f56d0db0af..b3227b73fc0d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c @@ -103,7 +103,7 @@ mlx5e_xmit_xdp_buff(struct mlx5e_xdpsq *sq, struct mlx5e_rq *rq, xdptxd->dma_addr = dma_addr; if (unlikely(!INDIRECT_CALL_2(sq->xmit_xdp_frame, mlx5e_xmit_xdp_frame_mpwqe, - mlx5e_xmit_xdp_frame, sq, xdptxd, 0))) + mlx5e_xmit_xdp_frame, sq, xdptxd, 0, NULL))) return false; /* xmit_mode == MLX5E_XDP_XMIT_MODE_FRAME */ @@ -145,7 +145,7 @@ mlx5e_xmit_xdp_buff(struct mlx5e_xdpsq *sq, struct mlx5e_rq *rq, xdptxd->dma_addr = dma_addr; if (unlikely(!INDIRECT_CALL_2(sq->xmit_xdp_frame, mlx5e_xmit_xdp_frame_mpwqe, - mlx5e_xmit_xdp_frame, sq, xdptxd, 0))) + mlx5e_xmit_xdp_frame, sq, xdptxd, 0, NULL))) return false; /* xmit_mode == MLX5E_XDP_XMIT_MODE_PAGE */ @@ -261,6 +261,37 @@ const struct xdp_metadata_ops mlx5e_xdp_metadata_ops = { .xmo_rx_hash = mlx5e_xdp_rx_hash, }; +struct mlx5e_xsk_tx_complete { + struct mlx5_cqe64 *cqe; + struct mlx5e_cq *cq; +}; + +static u64 mlx5e_xsk_fill_timestamp(void *_priv) +{ + struct mlx5e_xsk_tx_complete *priv = _priv; + u64 ts; + + ts = get_cqe_ts(priv->cqe); + + if (mlx5_is_real_time_rq(priv->cq->mdev) || mlx5_is_real_time_sq(priv->cq->mdev)) + return mlx5_real_time_cyc2time(&priv->cq->mdev->clock, ts); + + return mlx5_timecounter_cyc2time(&priv->cq->mdev->clock, ts); +} + +static void 
mlx5e_xsk_request_checksum(u16 csum_start, u16 csum_offset, void *priv) +{ + struct mlx5_wqe_eth_seg *eseg = priv; + + /* HW/FW is doing parsing, so offsets are largely ignored. */ + eseg->cs_flags |= MLX5_ETH_WQE_L3_CSUM | MLX5_ETH_WQE_L4_CSUM; +} + +const struct xsk_tx_metadata_ops mlx5e_xsk_tx_metadata_ops = { + .tmo_fill_timestamp = mlx5e_xsk_fill_timestamp, + .tmo_request_checksum = mlx5e_xsk_request_checksum, +}; + /* returns true if packet was consumed by xdp */ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct bpf_prog *prog, struct mlx5e_xdp_buff *mxbuf) @@ -398,11 +429,11 @@ INDIRECT_CALLABLE_SCOPE int mlx5e_xmit_xdp_frame_check_mpwqe(struct mlx5e_xdpsq INDIRECT_CALLABLE_SCOPE bool mlx5e_xmit_xdp_frame(struct mlx5e_xdpsq *sq, struct mlx5e_xmit_data *xdptxd, - int check_result); + int check_result, struct xsk_tx_metadata *meta); INDIRECT_CALLABLE_SCOPE bool mlx5e_xmit_xdp_frame_mpwqe(struct mlx5e_xdpsq *sq, struct mlx5e_xmit_data *xdptxd, - int check_result) + int check_result, struct xsk_tx_metadata *meta) { struct mlx5e_tx_mpwqe *session = &sq->mpwqe; struct mlx5e_xdpsq_stats *stats = sq->stats; @@ -420,7 +451,7 @@ mlx5e_xmit_xdp_frame_mpwqe(struct mlx5e_xdpsq *sq, struct mlx5e_xmit_data *xdptx */ if (unlikely(sq->mpwqe.wqe)) mlx5e_xdp_mpwqe_complete(sq); - return mlx5e_xmit_xdp_frame(sq, xdptxd, 0); + return mlx5e_xmit_xdp_frame(sq, xdptxd, 0, meta); } if (!xdptxd->len) { skb_frag_t *frag = &xdptxdf->sinfo->frags[0]; @@ -450,6 +481,7 @@ mlx5e_xmit_xdp_frame_mpwqe(struct mlx5e_xdpsq *sq, struct mlx5e_xmit_data *xdptx * and it's safe to complete it at any time. 
*/ mlx5e_xdp_mpwqe_session_start(sq); + xsk_tx_metadata_request(meta, &mlx5e_xsk_tx_metadata_ops, &session->wqe->eth); } mlx5e_xdp_mpwqe_add_dseg(sq, p, stats); @@ -480,7 +512,7 @@ INDIRECT_CALLABLE_SCOPE int mlx5e_xmit_xdp_frame_check(struct mlx5e_xdpsq *sq) INDIRECT_CALLABLE_SCOPE bool mlx5e_xmit_xdp_frame(struct mlx5e_xdpsq *sq, struct mlx5e_xmit_data *xdptxd, - int check_result) + int check_result, struct xsk_tx_metadata *meta) { struct mlx5e_xmit_data_frags *xdptxdf = container_of(xdptxd, struct mlx5e_xmit_data_frags, xd); @@ -599,6 +631,8 @@ mlx5e_xmit_xdp_frame(struct mlx5e_xdpsq *sq, struct mlx5e_xmit_data *xdptxd, sq->pc++; } + xsk_tx_metadata_request(meta, &mlx5e_xsk_tx_metadata_ops, eseg); + sq->doorbell_cseg = cseg; stats->xmit++; @@ -608,7 +642,9 @@ mlx5e_xmit_xdp_frame(struct mlx5e_xdpsq *sq, struct mlx5e_xmit_data *xdptxd, static void mlx5e_free_xdpsq_desc(struct mlx5e_xdpsq *sq, struct mlx5e_xdp_wqe_info *wi, u32 *xsk_frames, - struct xdp_frame_bulk *bq) + struct xdp_frame_bulk *bq, + struct mlx5e_cq *cq, + struct mlx5_cqe64 *cqe) { struct mlx5e_xdp_info_fifo *xdpi_fifo = &sq->db.xdpi_fifo; u16 i; @@ -668,10 +704,24 @@ static void mlx5e_free_xdpsq_desc(struct mlx5e_xdpsq *sq, break; } - case MLX5E_XDP_XMIT_MODE_XSK: + case MLX5E_XDP_XMIT_MODE_XSK: { /* AF_XDP send */ + struct xsk_tx_metadata_compl *compl = NULL; + struct mlx5e_xsk_tx_complete priv = { + .cqe = cqe, + .cq = cq, + }; + + if (xp_tx_metadata_enabled(sq->xsk_pool)) { + xdpi = mlx5e_xdpi_fifo_pop(xdpi_fifo); + compl = &xdpi.xsk_meta; + + xsk_tx_metadata_complete(compl, &mlx5e_xsk_tx_metadata_ops, &priv); + } + (*xsk_frames)++; break; + } default: WARN_ON_ONCE(true); } @@ -720,7 +770,7 @@ bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq) sqcc += wi->num_wqebbs; - mlx5e_free_xdpsq_desc(sq, wi, &xsk_frames, &bq); + mlx5e_free_xdpsq_desc(sq, wi, &xsk_frames, &bq, cq, cqe); } while (!last_wqe); if (unlikely(get_cqe_opcode(cqe) != MLX5_CQE_REQ)) { @@ -767,7 +817,7 @@ void 
mlx5e_free_xdpsq_descs(struct mlx5e_xdpsq *sq) sq->cc += wi->num_wqebbs; - mlx5e_free_xdpsq_desc(sq, wi, &xsk_frames, &bq); + mlx5e_free_xdpsq_desc(sq, wi, &xsk_frames, &bq, NULL, NULL); } xdp_flush_frame_bulk(&bq); @@ -840,7 +890,7 @@ int mlx5e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames, } ret = INDIRECT_CALL_2(sq->xmit_xdp_frame, mlx5e_xmit_xdp_frame_mpwqe, - mlx5e_xmit_xdp_frame, sq, xdptxd, 0); + mlx5e_xmit_xdp_frame, sq, xdptxd, 0, NULL); if (unlikely(!ret)) { int j; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h index ecfe93a479da..e054db1e10f8 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h @@ -33,6 +33,7 @@ #define __MLX5_EN_XDP_H__ #include +#include #include "en.h" #include "en/txrx.h" @@ -82,7 +83,7 @@ enum mlx5e_xdp_xmit_mode { * num, page_1, page_2, ... , page_num. * * MLX5E_XDP_XMIT_MODE_XSK: - * none. + * frame.xsk_meta. 
*/ #define MLX5E_XDP_FIFO_ENTRIES2DS_MAX_RATIO 4 @@ -97,6 +98,7 @@ union mlx5e_xdp_info { u8 num; struct page *page; } page; + struct xsk_tx_metadata_compl xsk_meta; }; struct mlx5e_xsk_param; @@ -112,13 +114,16 @@ int mlx5e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames, u32 flags); extern const struct xdp_metadata_ops mlx5e_xdp_metadata_ops; +extern const struct xsk_tx_metadata_ops mlx5e_xsk_tx_metadata_ops; INDIRECT_CALLABLE_DECLARE(bool mlx5e_xmit_xdp_frame_mpwqe(struct mlx5e_xdpsq *sq, struct mlx5e_xmit_data *xdptxd, - int check_result)); + int check_result, + struct xsk_tx_metadata *meta)); INDIRECT_CALLABLE_DECLARE(bool mlx5e_xmit_xdp_frame(struct mlx5e_xdpsq *sq, struct mlx5e_xmit_data *xdptxd, - int check_result)); + int check_result, + struct xsk_tx_metadata *meta)); INDIRECT_CALLABLE_DECLARE(int mlx5e_xmit_xdp_frame_check_mpwqe(struct mlx5e_xdpsq *sq)); INDIRECT_CALLABLE_DECLARE(int mlx5e_xmit_xdp_frame_check(struct mlx5e_xdpsq *sq)); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c index 597f319d4770..a59199ed590d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c @@ -55,12 +55,16 @@ static void mlx5e_xsk_tx_post_err(struct mlx5e_xdpsq *sq, nopwqe = mlx5e_post_nop(&sq->wq, sq->sqn, &sq->pc); mlx5e_xdpi_fifo_push(&sq->db.xdpi_fifo, *xdpi); + if (xp_tx_metadata_enabled(sq->xsk_pool)) + mlx5e_xdpi_fifo_push(&sq->db.xdpi_fifo, + (union mlx5e_xdp_info) { .xsk_meta = {} }); sq->doorbell_cseg = &nopwqe->ctrl; } bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget) { struct xsk_buff_pool *pool = sq->xsk_pool; + struct xsk_tx_metadata *meta = NULL; union mlx5e_xdp_info xdpi; bool work_done = true; bool flush = false; @@ -93,12 +97,13 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget) xdptxd.dma_addr = xsk_buff_raw_get_dma(pool, desc.addr); xdptxd.data = xsk_buff_raw_get_data(pool, 
desc.addr); xdptxd.len = desc.len; + meta = xsk_buff_get_metadata(pool, desc.addr); xsk_buff_raw_dma_sync_for_device(pool, xdptxd.dma_addr, xdptxd.len); ret = INDIRECT_CALL_2(sq->xmit_xdp_frame, mlx5e_xmit_xdp_frame_mpwqe, mlx5e_xmit_xdp_frame, sq, &xdptxd, - check_result); + check_result, meta); if (unlikely(!ret)) { if (sq->mpwqe.wqe) mlx5e_xdp_mpwqe_complete(sq); @@ -106,6 +111,16 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget) mlx5e_xsk_tx_post_err(sq, &xdpi); } else { mlx5e_xdpi_fifo_push(&sq->db.xdpi_fifo, xdpi); + if (xp_tx_metadata_enabled(sq->xsk_pool)) { + struct xsk_tx_metadata_compl compl; + + xsk_tx_metadata_to_compl(meta, &compl); + XSK_TX_COMPL_FITS(void *); + + mlx5e_xdpi_fifo_push(&sq->db.xdpi_fifo, + (union mlx5e_xdp_info) + { .xsk_meta = compl }); + } } flush = true; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index a2ae791538ed..61109d2f4a1c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -5096,6 +5096,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev) netdev->netdev_ops = &mlx5e_netdev_ops; netdev->xdp_metadata_ops = &mlx5e_xdp_metadata_ops; + netdev->xsk_tx_metadata_ops = &mlx5e_xsk_tx_metadata_ops; mlx5e_dcbnl_build_netdev(netdev); From patchwork Thu Sep 14 21:04:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stanislav Fomichev X-Patchwork-Id: 13386055 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C597226E34 for ; Thu, 14 Sep 2023 21:05:05 +0000 (UTC) Received: from mail-pg1-x549.google.com (mail-pg1-x549.google.com [IPv6:2607:f8b0:4864:20::549]) by 
lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6F0B92700 for ; Thu, 14 Sep 2023 14:05:05 -0700 (PDT) Received: by mail-pg1-x549.google.com with SMTP id 41be03b00d2f7-577b9a2429cso1103373a12.3 for ; Thu, 14 Sep 2023 14:05:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1694725505; x=1695330305; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=tf19T9rRGxsZfukPPxPZNYc1yxG0C28TAql/7/AOXQo=; b=eiFAwx9Cp3H6u2ASKyqNC5QDFwljBq/blGWGUnE3GVh+0sRc6M0DTVqyuCP5kewW7D pStwu9HhmEjqNAWnhZLFVQcbVNhTKjGtBZtD0dgM4j6likQPuEaY9r2bWjAe5hKIyP4/ 5cYYIVw7oBHua2Q7EJYbp22WSlZGiIqK69kSeo7bXDpzXJ2mSJ+Ka6K3RBvsEhdlCZVj 8j/QkshqjutPNHCyCs1E7wiCIt0gUn84uBt5jrB3D2vz9TKEfKuXZp/gt30r46qlwl2T peHxyAWlO22YcFkyBv+eQYhkdE11Xe+bZtjwo3Z5LmHwkLPL25xuqz0MYJHxZTt4TJ0g mbfA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1694725505; x=1695330305; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=tf19T9rRGxsZfukPPxPZNYc1yxG0C28TAql/7/AOXQo=; b=RQztAMpKJZKcdexeFgIMXflkstiv4A9RLtbZkW+ll1vvqlDBnXxHwrh9VwXZXDatLr W63ffpNjvPfFIXcGCI1jypdyJvvfYo2bbqzAd8rAG3mbNPfxUGCdYVeXjRhSCxbyeACR 2Oin7qsxIdh6scaAz5IBphj5QvQseSlnSLxW3okPRKmQRcqmtMqTtbK75JdqvUtcaOxq FdqAeB8T31m09kasq1UFnXIYHR+Ofh0vQbMK9aYCAhXhXVup5uXwJJDD5swo4qxjb6LE 7ljcnPRKRJ7CElLE2Ja+9SIAUTJlx49l2E7L+iF1mLbr+CFa7oiNO0+Q1B9PowHSRrYV nMJA== X-Gm-Message-State: AOJu0YxAimq+44IJ0VzJWe5WGeSSXc0UfawKDb3TXW00uivyBUtDzwPw Wy+ZlbbIx7aRb0bxqPUs7TjZm/xgOjWxIKp9cnxCVwa6EuN1FsQV+YRTr+aSnwZSu3CuWPyDcoK cpJmnROOok2W+JcIkbdiNlcyu6E78xk4qoo9LWQgZHlAo5lvhMw== X-Google-Smtp-Source: AGHT+IFmD2hAX/VwTYHIHw/W7HhFYlJ6aK73uLJiGlxAHLq3ldCzLSUUUdBbFn8FvQzx2drbgbuGquY= X-Received: from sdf.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5935]) (user=sdf job=sendgmr) by 2002:a63:af06:0:b0:566:1c6:139b with SMTP 
id w6-20020a63af06000000b0056601c6139bmr145712pge.8.1694725504374; Thu, 14 Sep 2023 14:05:04 -0700 (PDT) Date: Thu, 14 Sep 2023 14:04:48 -0700 In-Reply-To: <20230914210452.2588884-1-sdf@google.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20230914210452.2588884-1-sdf@google.com> X-Mailer: git-send-email 2.42.0.459.ge4e396fd5e-goog Message-ID: <20230914210452.2588884-6-sdf@google.com> Subject: [PATCH bpf-next v2 5/9] selftests/xsk: Support tx_metadata_len From: Stanislav Fomichev To: bpf@vger.kernel.org Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev, song@kernel.org, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org, kuba@kernel.org, toke@kernel.org, willemb@google.com, dsahern@kernel.org, magnus.karlsson@intel.com, bjorn@kernel.org, maciej.fijalkowski@intel.com, hawk@kernel.org, yoong.siang.song@intel.com, netdev@vger.kernel.org, xdp-hints@xdp-project.net X-Patchwork-Delegate: bpf@iogearbox.net Add a new tx_metadata_len config field and propagate it to the UMEM registration setsockopt.
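For illustration only, here is a rough userspace sketch of what the reserved tx_metadata_len bytes might be used for. It ASSUMES the metadata occupies the tx_metadata_len bytes that end exactly at desc->addr, which this patch does not state. The demo_* names and struct demo_tx_meta are hypothetical stand-ins that mirror only the request fields of struct xsk_tx_metadata from patch 1/9; they are not part of the series.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the request half of struct xsk_tx_metadata. */
struct demo_tx_meta {
	uint32_t flags;
	uint16_t csum_start;
	uint16_t csum_offset;
};

/* Same bit position as XDP_TX_METADATA_CHECKSUM in this series. */
#define DEMO_TX_METADATA_CHECKSUM (1u << 1)

/*
 * ASSUMPTION: the metadata area is the tx_metadata_len bytes that end
 * exactly at desc->addr. Returns the UMEM offset where it would start.
 */
static uint64_t demo_meta_offset(uint64_t desc_addr, uint32_t tx_metadata_len)
{
	assert(desc_addr >= tx_metadata_len);
	return desc_addr - tx_metadata_len;
}

/* Copy a checksum request into the headroom in front of the packet. */
static void demo_fill_meta(uint8_t *umem, uint64_t desc_addr,
			   uint32_t tx_metadata_len,
			   uint16_t csum_start, uint16_t csum_offset)
{
	struct demo_tx_meta meta = {
		.flags = DEMO_TX_METADATA_CHECKSUM,
		.csum_start = csum_start,
		.csum_offset = csum_offset,
	};

	/* The reserved area must be large enough for the request. */
	assert(tx_metadata_len >= sizeof(meta));
	/* memcpy avoids any alignment assumption about desc_addr. */
	memcpy(umem + demo_meta_offset(desc_addr, tx_metadata_len),
	       &meta, sizeof(meta));
}
```

With a frame whose packet data starts at offset 256 and 16 bytes reserved, demo_fill_meta(umem, 256, 16, 34, 6) would write the request at offset 240.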
Signed-off-by: Stanislav Fomichev --- tools/testing/selftests/bpf/xsk.c | 3 +++ tools/testing/selftests/bpf/xsk.h | 1 + 2 files changed, 4 insertions(+) diff --git a/tools/testing/selftests/bpf/xsk.c b/tools/testing/selftests/bpf/xsk.c index d9fb2b730a2c..24f5313dbfde 100644 --- a/tools/testing/selftests/bpf/xsk.c +++ b/tools/testing/selftests/bpf/xsk.c @@ -115,6 +115,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg, cfg->frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE; cfg->frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM; cfg->flags = XSK_UMEM__DEFAULT_FLAGS; + cfg->tx_metadata_len = 0; return; } @@ -123,6 +124,7 @@ static void xsk_set_umem_config(struct xsk_umem_config *cfg, cfg->frame_size = usr_cfg->frame_size; cfg->frame_headroom = usr_cfg->frame_headroom; cfg->flags = usr_cfg->flags; + cfg->tx_metadata_len = usr_cfg->tx_metadata_len; } static int xsk_set_xdp_socket_config(struct xsk_socket_config *cfg, @@ -252,6 +254,7 @@ int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, mr.chunk_size = umem->config.frame_size; mr.headroom = umem->config.frame_headroom; mr.flags = umem->config.flags; + mr.tx_metadata_len = umem->config.tx_metadata_len; err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr)); if (err) { diff --git a/tools/testing/selftests/bpf/xsk.h b/tools/testing/selftests/bpf/xsk.h index d93200fdaa8d..bff8e50d7532 100644 --- a/tools/testing/selftests/bpf/xsk.h +++ b/tools/testing/selftests/bpf/xsk.h @@ -200,6 +200,7 @@ struct xsk_umem_config { __u32 frame_size; __u32 frame_headroom; __u32 flags; + __u32 tx_metadata_len; }; int xsk_attach_xdp_program(struct bpf_program *prog, int ifindex, u32 xdp_flags); From patchwork Thu Sep 14 21:04:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stanislav Fomichev X-Patchwork-Id: 13386056 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net 
[23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DFB9126E34 for ; Thu, 14 Sep 2023 21:05:07 +0000 (UTC) Received: from mail-pg1-x54a.google.com (mail-pg1-x54a.google.com [IPv6:2607:f8b0:4864:20::54a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8A3D42698 for ; Thu, 14 Sep 2023 14:05:07 -0700 (PDT) Received: by mail-pg1-x54a.google.com with SMTP id 41be03b00d2f7-577b9a2429cso1103404a12.3 for ; Thu, 14 Sep 2023 14:05:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1694725507; x=1695330307; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=14VbHOqDafwAEmRkcsvovpu96sXlLVwtTgqqhNnYgGk=; b=2pnCtxZXdZJR0V9aRR1eCAEE+FC567rt+rRr7MjcK5EUwCOCjXqZZyQ0kobuNl3jHr zIoeuA4aGHtV3zp4qnVQHsqhbR5u+PDfxJLdzs0A9qt9ZK/rCBkwb2K5l7w1K/0ctU08 ad59jHjzLZa+8AEzTnYK5Rb7yVQwZBlrNF4osIQlxChOh+ApiYHKMapkBm9iIetLD5C0 d/DmgPMDZUXrkloHOIexOJw0ItphHjX0WljgbCTQuPzhBy0udBofirr3ib3okluk+AYT bbmJdI3WVfH7H02lk4bL9OzxpAARGogGHLk/1CFU1CW1v1rz1lJyeQ3vKy8oNAwHX3Af 0KiQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1694725507; x=1695330307; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=14VbHOqDafwAEmRkcsvovpu96sXlLVwtTgqqhNnYgGk=; b=li1v08QBBqpyheTum5Rg0/uqHzSpOXs1tmydp+k858H92/fE87r6cCcib9cmB7i3Xx /tZHMJM8IC5MLqWUGHBBfqNsUBmYQkq/SYlZfCeepFJyz/fk5v3Ik2Ip5hxgVK3BO7h1 SN3dqdWHYaFlbHXMYaMH18JhORN9QF9fQxmseAZ9MlSV/wsGaRWpQ1NOSAQ0WCbbApCL qPYDC/oErSPAb7n6CiGtSa+pEH78feXTui+YVDxbfuN8KnDDCBA/uStHDxH9elwy5qRR mtxCxlga0P12KDRx9nS+LtvlDrQIInfjLTNroTKxsWJl6xvSifV4tSC+kCwKsaAip6g8 ozMA== X-Gm-Message-State: AOJu0YzngFrwH4omjoqGmtKuAiWJuQw5PNhecQ3B5xVhkTRur63S+6TB 
01qMKxDioTpmoVo2Deo/gMSlfh8FO258sX02ZgJ6ouRli3mXpSwLQ9CN4oFQ4b1VKuv91rfkTaM niXuUn/g6rrBSScuIj4OA35+kxw3HhXkhcxiaHs7YqGNPYOE/6w== X-Google-Smtp-Source: AGHT+IHwSi48qvJdhvmPfcZWPgqPC20HubxSk5DHX5wOCnxgL7h677co1BhFyfAM+WdU60ZWR+hFZKA= X-Received: from sdf.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5935]) (user=sdf job=sendgmr) by 2002:a63:be41:0:b0:563:84fc:f4dd with SMTP id g1-20020a63be41000000b0056384fcf4ddmr146499pgo.6.1694725506209; Thu, 14 Sep 2023 14:05:06 -0700 (PDT) Date: Thu, 14 Sep 2023 14:04:49 -0700 In-Reply-To: <20230914210452.2588884-1-sdf@google.com> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20230914210452.2588884-1-sdf@google.com> X-Mailer: git-send-email 2.42.0.459.ge4e396fd5e-goog Message-ID: <20230914210452.2588884-7-sdf@google.com> Subject: [PATCH bpf-next v2 6/9] selftests/bpf: Add csum helpers From: Stanislav Fomichev To: bpf@vger.kernel.org Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, martin.lau@linux.dev, song@kernel.org, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org, kuba@kernel.org, toke@kernel.org, willemb@google.com, dsahern@kernel.org, magnus.karlsson@intel.com, bjorn@kernel.org, maciej.fijalkowski@intel.com, hawk@kernel.org, yoong.siang.song@intel.com, netdev@vger.kernel.org, xdp-hints@xdp-project.net X-Patchwork-Delegate: bpf@iogearbox.net Checksum helpers will be used to calculate pseudo-header checksum in AF_XDP metadata selftests. 
The helpers mirror the existing kernel ones: - csum_tcpudp_magic : IPv4 pseudo header csum - csum_ipv6_magic : IPv6 pseudo header csum - csum_fold : fold csum and do one's complement Signed-off-by: Stanislav Fomichev --- tools/testing/selftests/bpf/network_helpers.h | 43 +++++++++++++++++++ 1 file changed, 43 insertions(+) diff --git a/tools/testing/selftests/bpf/network_helpers.h b/tools/testing/selftests/bpf/network_helpers.h index 5eccc67d1a99..654a854c9fb2 100644 --- a/tools/testing/selftests/bpf/network_helpers.h +++ b/tools/testing/selftests/bpf/network_helpers.h @@ -70,4 +70,47 @@ struct nstoken; */ struct nstoken *open_netns(const char *name); void close_netns(struct nstoken *token); + +static __u16 csum_fold(__u32 csum) +{ + csum = (csum & 0xffff) + (csum >> 16); + csum = (csum & 0xffff) + (csum >> 16); + + return (__u16)~csum; +} + +static inline __sum16 csum_tcpudp_magic(__be32 saddr, __be32 daddr, + __u32 len, __u8 proto, + __wsum csum) +{ + __u64 s = csum; + + s += (__u32)saddr; + s += (__u32)daddr; + s += htons(proto + len); + s = (s & 0xffffffff) + (s >> 32); + s = (s & 0xffffffff) + (s >> 32); + + return csum_fold((__u32)s); +} + +static inline __sum16 csum_ipv6_magic(const struct in6_addr *saddr, + const struct in6_addr *daddr, + __u32 len, __u8 proto, + __wsum csum) +{ + __u64 s = csum; + int i; + + for (i = 0; i < 4; i++) + s += (__u32)saddr->s6_addr32[i]; + for (i = 0; i < 4; i++) + s += (__u32)daddr->s6_addr32[i]; + s += htons(proto + len); + s = (s & 0xffffffff) + (s >> 32); + s = (s & 0xffffffff) + (s >> 32); + + return csum_fold((__u32)s); +} + #endif From patchwork Thu Sep 14 21:04:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stanislav Fomichev X-Patchwork-Id: 13386057 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C64F927716 for ; Thu, 14 Sep 2023 21:05:09 +0000 (UTC)
Date: Thu, 14 Sep 2023 14:04:50 -0700
In-Reply-To: <20230914210452.2588884-1-sdf@google.com>
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org
Mime-Version: 1.0
References: <20230914210452.2588884-1-sdf@google.com>
X-Mailer: git-send-email 2.42.0.459.ge4e396fd5e-goog
Message-ID: <20230914210452.2588884-8-sdf@google.com>
Subject: [PATCH bpf-next v2 7/9] selftests/bpf: Add TX side to xdp_metadata
From: Stanislav Fomichev
To: bpf@vger.kernel.org
Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
 martin.lau@linux.dev, song@kernel.org, yhs@fb.com,
 john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com,
 haoluo@google.com, jolsa@kernel.org, kuba@kernel.org, toke@kernel.org,
 willemb@google.com, dsahern@kernel.org, magnus.karlsson@intel.com,
 bjorn@kernel.org, maciej.fijalkowski@intel.com, hawk@kernel.org,
 yoong.siang.song@intel.com, netdev@vger.kernel.org,
 xdp-hints@xdp-project.net

Request a TX timestamp and make sure it is non-zero. Request TX checksum
offload (SW-only) and make sure the resulting checksum is the expected one.

Signed-off-by: Stanislav Fomichev --- .../selftests/bpf/prog_tests/xdp_metadata.c | 31 +++++++++++++++++-- 1 file changed, 28 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c index 626c461fa34d..f0da8fe93276 100644 --- a/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c +++ b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c @@ -57,6 +57,7 @@ static int open_xsk(int ifindex, struct xsk *xsk) .comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS, .frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE, .flags = XDP_UMEM_UNALIGNED_CHUNK_FLAG, + .tx_metadata_len = sizeof(struct xsk_tx_metadata), }; __u32 idx; u64 addr; @@ -138,6 +139,7 @@ static void ip_csum(struct iphdr *iph) static int generate_packet(struct xsk *xsk, __u16 dst_port) { + struct xsk_tx_metadata *meta; struct xdp_desc *tx_desc; struct udphdr *udph; struct ethhdr *eth; @@ -151,10 +153,14 @@ static int generate_packet(struct xsk *xsk, __u16 dst_port) return -1; tx_desc = xsk_ring_prod__tx_desc(&xsk->tx, idx); - tx_desc->addr = idx % (UMEM_NUM / 2) * UMEM_FRAME_SIZE; + tx_desc->addr = idx % (UMEM_NUM / 2) * UMEM_FRAME_SIZE + sizeof(struct xsk_tx_metadata); printf("%p: tx_desc[%u]->addr=%llx\n", xsk, idx, tx_desc->addr); data = xsk_umem__get_data(xsk->umem_area, tx_desc->addr); + meta = data - sizeof(struct xsk_tx_metadata); + memset(meta, 0, sizeof(*meta)); + meta->flags = XDP_TX_METADATA_TIMESTAMP; + eth = data; iph = (void *)(eth + 1); udph = (void *)(iph + 1); @@ -178,11 +184,17 @@ static int generate_packet(struct xsk *xsk, __u16 dst_port) udph->source = htons(AF_XDP_SOURCE_PORT); udph->dest = htons(dst_port); udph->len = htons(sizeof(*udph) + UDP_PAYLOAD_BYTES); - udph->check = 0; + udph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr, + ntohs(udph->len), IPPROTO_UDP, 0); memset(udph + 1, 0xAA, UDP_PAYLOAD_BYTES); + meta->flags |= XDP_TX_METADATA_CHECKSUM | XDP_TX_METADATA_CHECKSUM_SW; + meta->csum_start = 
sizeof(*eth) + sizeof(*iph); + meta->csum_offset = offsetof(struct udphdr, check); + tx_desc->len = sizeof(*eth) + sizeof(*iph) + sizeof(*udph) + UDP_PAYLOAD_BYTES; + tx_desc->options |= XDP_TX_METADATA; xsk_ring_prod__submit(&xsk->tx, 1); ret = sendto(xsk_socket__fd(xsk->socket), NULL, 0, MSG_DONTWAIT, NULL, 0); @@ -194,13 +206,21 @@ static int generate_packet(struct xsk *xsk, __u16 dst_port) static void complete_tx(struct xsk *xsk) { - __u32 idx; + struct xsk_tx_metadata *meta; __u64 addr; + void *data; + __u32 idx; if (ASSERT_EQ(xsk_ring_cons__peek(&xsk->comp, 1, &idx), 1, "xsk_ring_cons__peek")) { addr = *xsk_ring_cons__comp_addr(&xsk->comp, idx); printf("%p: complete tx idx=%u addr=%llx\n", xsk, idx, addr); + + data = xsk_umem__get_data(xsk->umem_area, addr); + meta = data - sizeof(struct xsk_tx_metadata); + + ASSERT_NEQ(meta->completion.tx_timestamp, 0, "tx_timestamp"); + xsk_ring_cons__release(&xsk->comp, 1); } } @@ -221,6 +241,7 @@ static int verify_xsk_metadata(struct xsk *xsk) const struct xdp_desc *rx_desc; struct pollfd fds = {}; struct xdp_meta *meta; + struct udphdr *udph; struct ethhdr *eth; struct iphdr *iph; __u64 comp_addr; @@ -257,6 +278,7 @@ static int verify_xsk_metadata(struct xsk *xsk) ASSERT_EQ(eth->h_proto, htons(ETH_P_IP), "eth->h_proto"); iph = (void *)(eth + 1); ASSERT_EQ((int)iph->version, 4, "iph->version"); + udph = (void *)(iph + 1); /* custom metadata */ @@ -270,6 +292,9 @@ static int verify_xsk_metadata(struct xsk *xsk) ASSERT_EQ(meta->rx_hash_type, 0, "rx_hash_type"); + /* checksum offload */ + ASSERT_EQ(udph->check, 0x1c72, "csum"); + xsk_ring_cons__release(&xsk->rx, 1); refill_rx(xsk, comp_addr); From patchwork Thu Sep 14 21:04:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stanislav Fomichev X-Patchwork-Id: 13386059 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 
	with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D2D7D273D8 for ; Thu, 14 Sep 2023 21:05:12 +0000 (UTC)
Date: Thu, 14 Sep 2023 14:04:51 -0700
In-Reply-To: <20230914210452.2588884-1-sdf@google.com>
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org
Mime-Version: 1.0
References: <20230914210452.2588884-1-sdf@google.com>
X-Mailer: git-send-email 2.42.0.459.ge4e396fd5e-goog
Message-ID: <20230914210452.2588884-9-sdf@google.com>
Subject: [PATCH bpf-next v2 8/9] selftests/bpf: Add TX side to xdp_hw_metadata
From: Stanislav Fomichev
To: bpf@vger.kernel.org
Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
 martin.lau@linux.dev, song@kernel.org, yhs@fb.com,
 john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com,
 haoluo@google.com, jolsa@kernel.org, kuba@kernel.org, toke@kernel.org,
 willemb@google.com, dsahern@kernel.org, magnus.karlsson@intel.com,
 bjorn@kernel.org, maciej.fijalkowski@intel.com, hawk@kernel.org,
 yoong.siang.song@intel.com, netdev@vger.kernel.org,
 xdp-hints@xdp-project.net

When we get a packet on port 9091, we swap src/dst and send it out.
At this point we also request the timestamp and checksum offloads.

Checksum offload is verified by looking at the tcpdump output on the
other side. The tool prints the pseudo-header csum and the final one it
expects. The final checksum actually matches the incoming packet's
checksum because we only flip the src/dst and don't change the payload.

Some other related changes:
- switched to zerocopy mode by default; new flag can be used to force
  old behavior
- request fixed tx_metadata_len headroom
- some other small fixes (umem size, fill idx+i, etc)

mvbz3:~# ./xdp_hw_metadata eth3
...
0x1062cb8: rx_desc[0]->addr=80100 addr=80100 comp_addr=80100
rx_hash: 0x2E1B50B9 with RSS type:0x2A
rx_timestamp: 1691436369532047139 (sec:1691436369.5320)
XDP RX-time: 1691436369261756803 (sec:1691436369.2618) delta sec:-0.2703 (-270290.336 usec)
AF_XDP time: 1691436369261878839 (sec:1691436369.2619) delta sec:0.0001 (122.036 usec)
0x1062cb8: ping-pong with csum=3b8e (want de7e) csum_start=54 csum_offset=6
0x1062cb8: complete tx idx=0 addr=10
0x1062cb8: tx_timestamp: 1691436369598419505 (sec:1691436369.5984)
0x1062cb8: complete rx idx=128 addr=80100

mvbz4:~# nc -Nu -q1 ${MVBZ3_LINK_LOCAL_IP}%eth3 9091

mvbz4:~# tcpdump -vvx -i eth3 udp
tcpdump: listening on eth3, link-type EN10MB (Ethernet), snapshot length 262144 bytes
12:26:09.301074 IP6 (flowlabel 0x35fa5, hlim 127, next-header UDP (17) payload length: 11) fe80::1270:fdff:fe48:1087.55807 > fe80::1270:fdff:fe48:1077.9091: [bad udp cksum 0x3b8e -> 0xde7e!]
UDP, length 3 0x0000: 6003 5fa5 000b 117f fe80 0000 0000 0000 0x0010: 1270 fdff fe48 1087 fe80 0000 0000 0000 0x0020: 1270 fdff fe48 1077 d9ff 2383 000b 3b8e 0x0030: 7864 70 12:26:09.301976 IP6 (flowlabel 0x35fa5, hlim 127, next-header UDP (17) payload length: 11) fe80::1270:fdff:fe48:1077.9091 > fe80::1270:fdff:fe48:1087.55807: [udp sum ok] UDP, length 3 0x0000: 6003 5fa5 000b 117f fe80 0000 0000 0000 0x0010: 1270 fdff fe48 1077 fe80 0000 0000 0000 0x0020: 1270 fdff fe48 1087 2383 d9ff 000b de7e 0x0030: 7864 70 Signed-off-by: Stanislav Fomichev --- tools/testing/selftests/bpf/xdp_hw_metadata.c | 202 +++++++++++++++++- 1 file changed, 192 insertions(+), 10 deletions(-) diff --git a/tools/testing/selftests/bpf/xdp_hw_metadata.c b/tools/testing/selftests/bpf/xdp_hw_metadata.c index 613321eb84c1..ab83d0ba6763 100644 --- a/tools/testing/selftests/bpf/xdp_hw_metadata.c +++ b/tools/testing/selftests/bpf/xdp_hw_metadata.c @@ -10,7 +10,9 @@ * - rx_hash * * TX: - * - TBD + * - UDP 9091 packets trigger TX reply + * - TX HW timestamp is requested and reported back upon completion + * - TX checksum is requested */ #include @@ -24,14 +26,17 @@ #include #include #include +#include #include #include #include #include +#include +#include #include "xdp_metadata.h" -#define UMEM_NUM 16 +#define UMEM_NUM 256 #define UMEM_FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE #define UMEM_SIZE (UMEM_FRAME_SIZE * UMEM_NUM) #define XDP_FLAGS (XDP_FLAGS_DRV_MODE | XDP_FLAGS_REPLACE) @@ -51,22 +56,24 @@ struct xsk *rx_xsk; const char *ifname; int ifindex; int rxq; +bool skip_tx; void test__fail(void) { /* for network_helpers.c */ } -static int open_xsk(int ifindex, struct xsk *xsk, __u32 queue_id) +static int open_xsk(int ifindex, struct xsk *xsk, __u32 queue_id, int flags) { int mmap_flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE; const struct xsk_socket_config socket_config = { .rx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS, .tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS, - .bind_flags = XDP_COPY, + 
.bind_flags = flags, }; const struct xsk_umem_config umem_config = { .fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS, .comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS, .frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE, - .flags = XDP_UMEM_UNALIGNED_CHUNK_FLAG, + .flags = XSK_UMEM__DEFAULT_FLAGS, + .tx_metadata_len = sizeof(struct xsk_tx_metadata), }; __u32 idx; u64 addr; @@ -108,7 +115,7 @@ static int open_xsk(int ifindex, struct xsk *xsk, __u32 queue_id) for (i = 0; i < UMEM_NUM / 2; i++) { addr = (UMEM_NUM / 2 + i) * UMEM_FRAME_SIZE; printf("%p: rx_desc[%d] -> %lx\n", xsk, i, addr); - *xsk_ring_prod__fill_addr(&xsk->fill, i) = addr; + *xsk_ring_prod__fill_addr(&xsk->fill, idx + i) = addr; } xsk_ring_prod__submit(&xsk->fill, ret); @@ -129,12 +136,22 @@ static void refill_rx(struct xsk *xsk, __u64 addr) __u32 idx; if (xsk_ring_prod__reserve(&xsk->fill, 1, &idx) == 1) { - printf("%p: complete idx=%u addr=%llx\n", xsk, idx, addr); + printf("%p: complete rx idx=%u addr=%llx\n", xsk, idx, addr); *xsk_ring_prod__fill_addr(&xsk->fill, idx) = addr; xsk_ring_prod__submit(&xsk->fill, 1); } } +static int kick_tx(struct xsk *xsk) +{ + return sendto(xsk_socket__fd(xsk->socket), NULL, 0, MSG_DONTWAIT, NULL, 0); +} + +static int kick_rx(struct xsk *xsk) +{ + return recvfrom(xsk_socket__fd(xsk->socket), NULL, 0, MSG_DONTWAIT, NULL, NULL); +} + #define NANOSEC_PER_SEC 1000000000 /* 10^9 */ static __u64 gettime(clockid_t clock_id) { @@ -228,6 +245,117 @@ static void verify_skb_metadata(int fd) printf("skb hwtstamp is not found!\n"); } +static bool complete_tx(struct xsk *xsk) +{ + struct xsk_tx_metadata *meta; + __u64 addr; + void *data; + __u32 idx; + + if (!xsk_ring_cons__peek(&xsk->comp, 1, &idx)) + return false; + + addr = *xsk_ring_cons__comp_addr(&xsk->comp, idx); + data = xsk_umem__get_data(xsk->umem_area, addr); + meta = data - sizeof(struct xsk_tx_metadata); + + printf("%p: complete tx idx=%u addr=%llx\n", xsk, idx, addr); + printf("%p: tx_timestamp: %llu (sec:%0.4f)\n", xsk, + 
meta->completion.tx_timestamp, + (double)meta->completion.tx_timestamp / NANOSEC_PER_SEC); + xsk_ring_cons__release(&xsk->comp, 1); + + return true; +} + +#define swap(a, b, len) do { \ + for (int i = 0; i < len; i++) { \ + __u8 tmp = ((__u8 *)a)[i]; \ + ((__u8 *)a)[i] = ((__u8 *)b)[i]; \ + ((__u8 *)b)[i] = tmp; \ + } \ +} while (0) + +static void ping_pong(struct xsk *xsk, void *rx_packet) +{ + struct xsk_tx_metadata *meta; + struct ipv6hdr *ip6h = NULL; + struct iphdr *iph = NULL; + struct xdp_desc *tx_desc; + struct udphdr *udph; + struct ethhdr *eth; + __sum16 want_csum; + void *data; + __u32 idx; + int ret; + int len; + + ret = xsk_ring_prod__reserve(&xsk->tx, 1, &idx); + if (ret != 1) { + printf("%p: failed to reserve tx slot\n", xsk); + return; + } + + tx_desc = xsk_ring_prod__tx_desc(&xsk->tx, idx); + tx_desc->addr = idx % (UMEM_NUM / 2) * UMEM_FRAME_SIZE + sizeof(struct xsk_tx_metadata); + data = xsk_umem__get_data(xsk->umem_area, tx_desc->addr); + + meta = data - sizeof(struct xsk_tx_metadata); + memset(meta, 0, sizeof(*meta)); + meta->flags = XDP_TX_METADATA_TIMESTAMP; + + eth = rx_packet; + + if (eth->h_proto == htons(ETH_P_IP)) { + iph = (void *)(eth + 1); + udph = (void *)(iph + 1); + } else if (eth->h_proto == htons(ETH_P_IPV6)) { + ip6h = (void *)(eth + 1); + udph = (void *)(ip6h + 1); + } else { + printf("%p: failed to detect IP version for ping pong %04x\n", xsk, eth->h_proto); + xsk_ring_prod__cancel(&xsk->tx, 1); + return; + } + + len = ETH_HLEN; + if (ip6h) + len += sizeof(*ip6h) + ntohs(ip6h->payload_len); + if (iph) + len += ntohs(iph->tot_len); + + swap(eth->h_dest, eth->h_source, ETH_ALEN); + if (iph) + swap(&iph->saddr, &iph->daddr, 4); + else + swap(&ip6h->saddr, &ip6h->daddr, 16); + swap(&udph->source, &udph->dest, 2); + + want_csum = udph->check; + if (ip6h) + udph->check = ~csum_ipv6_magic(&ip6h->saddr, &ip6h->daddr, + ntohs(udph->len), IPPROTO_UDP, 0); + else + udph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr, + 
ntohs(udph->len), IPPROTO_UDP, 0); + + meta->flags |= XDP_TX_METADATA_CHECKSUM; + if (iph) + meta->csum_start = sizeof(*eth) + sizeof(*iph); + else + meta->csum_start = sizeof(*eth) + sizeof(*ip6h); + meta->csum_offset = offsetof(struct udphdr, check); + + printf("%p: ping-pong with csum=%04x (want %04x) csum_start=%d csum_offset=%d\n", + xsk, ntohs(udph->check), ntohs(want_csum), meta->csum_start, meta->csum_offset); + + memcpy(data, rx_packet, len); /* don't share umem chunk for simplicity */ + tx_desc->options |= XDP_TX_METADATA; + tx_desc->len = len; + + xsk_ring_prod__submit(&xsk->tx, 1); +} + static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd, clockid_t clock_id) { const struct xdp_desc *rx_desc; @@ -250,6 +378,13 @@ static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd, clockid_t while (true) { errno = 0; + + for (i = 0; i < rxq; i++) { + ret = kick_rx(&rx_xsk[i]); + if (ret) + printf("kick_rx ret=%d\n", ret); + } + ret = poll(fds, rxq + 1, 1000); printf("poll: %d (%d) skip=%llu fail=%llu redir=%llu\n", ret, errno, bpf_obj->bss->pkts_skip, @@ -280,6 +415,22 @@ static int verify_metadata(struct xsk *rx_xsk, int rxq, int server_fd, clockid_t xsk, idx, rx_desc->addr, addr, comp_addr); verify_xdp_metadata(xsk_umem__get_data(xsk->umem_area, addr), clock_id); + + if (!skip_tx) { + /* mirror the packet back */ + ping_pong(xsk, xsk_umem__get_data(xsk->umem_area, addr)); + + ret = kick_tx(xsk); + if (ret) + printf("kick_tx ret=%d\n", ret); + + for (int j = 0; j < 500; j++) { + if (complete_tx(xsk)) + break; + usleep(10*1000); + } + } + xsk_ring_cons__release(&xsk->rx, 1); refill_rx(xsk, comp_addr); } @@ -404,21 +555,52 @@ static void timestamping_enable(int fd, int val) error(1, errno, "setsockopt(SO_TIMESTAMPING)"); } +static void usage(const char *prog) +{ + fprintf(stderr, + "usage: %s [OPTS] \n" + "OPTS:\n" + " -r don't generate AF_XDP reply (rx metadata only)\n" + " -c run in copy mode\n", + prog); +} + int main(int argc, char 
*argv[]) { + int bind_flags = XDP_USE_NEED_WAKEUP | XDP_ZEROCOPY; clockid_t clock_id = CLOCK_TAI; int server_fd = -1; + int opt; int ret; int i; struct bpf_program *prog; - if (argc != 2) { + while ((opt = getopt(argc, argv, "rc")) != -1) { + switch (opt) { + case 'r': + skip_tx = true; + break; + case 'c': + bind_flags = XDP_USE_NEED_WAKEUP | XDP_COPY; + break; + default: + usage(basename(argv[0])); + return 1; + } + } + + if (argc < 2) { fprintf(stderr, "pass device name\n"); return -1; } - ifname = argv[1]; + if (optind >= argc) { + usage(basename(argv[0])); + return 1; + } + + ifname = argv[optind]; ifindex = if_nametoindex(ifname); rxq = rxq_num(ifname); @@ -432,7 +614,7 @@ int main(int argc, char *argv[]) for (i = 0; i < rxq; i++) { printf("open_xsk(%s, %p, %d)\n", ifname, &rx_xsk[i], i); - ret = open_xsk(ifindex, &rx_xsk[i], i); + ret = open_xsk(ifindex, &rx_xsk[i], i, bind_flags); if (ret) error(1, -ret, "open_xsk"); From patchwork Thu Sep 14 21:04:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stanislav Fomichev X-Patchwork-Id: 13386058 X-Patchwork-Delegate: bpf@iogearbox.net Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B4FD127716 for ; Thu, 14 Sep 2023 21:05:13 +0000 (UTC) Received: from mail-pj1-x104a.google.com (mail-pj1-x104a.google.com [IPv6:2607:f8b0:4864:20::104a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3B0892701 for ; Thu, 14 Sep 2023 14:05:13 -0700 (PDT) Received: by mail-pj1-x104a.google.com with SMTP id 98e67ed59e1d1-26d50941f68so1300648a91.1 for ; Thu, 14 Sep 2023 14:05:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1694725513; x=1695330313; darn=vger.kernel.org; 
Date: Thu, 14 Sep 2023 14:04:52 -0700
In-Reply-To: <20230914210452.2588884-1-sdf@google.com>
Precedence: bulk
X-Mailing-List: bpf@vger.kernel.org
Mime-Version: 1.0
References: <20230914210452.2588884-1-sdf@google.com>
X-Mailer: git-send-email 2.42.0.459.ge4e396fd5e-goog
Message-ID: <20230914210452.2588884-10-sdf@google.com>
Subject: [PATCH bpf-next v2 9/9] xsk: document tx_metadata_len layout
From: Stanislav Fomichev
To: bpf@vger.kernel.org
Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
 martin.lau@linux.dev, song@kernel.org, yhs@fb.com,
 john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com,
 haoluo@google.com, jolsa@kernel.org, kuba@kernel.org, toke@kernel.org,
 willemb@google.com, dsahern@kernel.org, magnus.karlsson@intel.com,
 bjorn@kernel.org, maciej.fijalkowski@intel.com, hawk@kernel.org,
 yoong.siang.song@intel.com, netdev@vger.kernel.org,
 xdp-hints@xdp-project.net

- how to use
- how to query features
- pointers to the examples

Signed-off-by: Stanislav Fomichev
---
 Documentation/networking/index.rst           |  1 +
 Documentation/networking/xsk-tx-metadata.rst | 77 ++++++++++++++++++++
 2 files changed, 78 insertions(+)
 create mode 100644 Documentation/networking/xsk-tx-metadata.rst

diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index 5b75c3f7a137..9b2accb48df7 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -123,6 +123,7 @@ Refer to :ref:`netdev-FAQ` for a guide on netdev development process specifics.
    xfrm_sync
    xfrm_sysctl
    xdp-rx-metadata
+   xsk-tx-metadata
 
 .. only:: subproject and html
 
diff --git a/Documentation/networking/xsk-tx-metadata.rst b/Documentation/networking/xsk-tx-metadata.rst
new file mode 100644
index 000000000000..b7289f06745c
--- /dev/null
+++ b/Documentation/networking/xsk-tx-metadata.rst
@@ -0,0 +1,77 @@
+==================
+AF_XDP TX Metadata
+==================
+
+This document describes how to enable offloads when transmitting packets
+via :doc:`af_xdp`. Refer to :doc:`xdp-rx-metadata` on how to access similar
+metadata on the receive side.
+
+General Design
+==============
+
+The headroom for the metadata is reserved via ``tx_metadata_len`` in
+``struct xdp_umem_reg``. The metadata length is therefore the same for
+every socket that shares the same umem. The metadata layout is a fixed UAPI,
+refer to ``struct xsk_tx_metadata`` in ``include/uapi/linux/if_xdp.h``.
+Thus, generally, the ``tx_metadata_len`` field above should contain
+``sizeof(struct xsk_tx_metadata)``.
+
+The headroom and the metadata itself should be located right before
+``xdp_desc->addr`` in the umem frame. Within a frame, the metadata
+layout is as follows::
+
+             tx_metadata_len
+  /                           \
+  +-----------------+---------+----------------------------+
+  | xsk_tx_metadata | padding |          payload           |
+  +-----------------+---------+----------------------------+
+                              ^
+                              |
+                       xdp_desc->addr
+
+An AF_XDP application can request headrooms larger than ``sizeof(struct
+xsk_tx_metadata)``. The kernel will ignore the padding (and will still
+use ``xdp_desc->addr - tx_metadata_len`` to locate
+the ``xsk_tx_metadata``). For the frames that shouldn't carry
+any metadata (i.e., the ones that don't have ``XDP_TX_METADATA`` option),
+the metadata area is ignored by the kernel as well.
+
+The flags field enables the particular offload:
+
+- ``XDP_TX_METADATA_TIMESTAMP``: requests the device to put the transmission
+  timestamp into the ``tx_timestamp`` field of ``struct xsk_tx_metadata``.
+- ``XDP_TX_METADATA_CHECKSUM``: requests the device to calculate the L4
+  checksum. ``csum_start`` specifies the byte offset where the checksumming
+  should start and ``csum_offset`` specifies the byte offset where the
+  device should store the computed checksum.
+- ``XDP_TX_METADATA_CHECKSUM_SW``: requests checksum calculation to
+  be done in software; this mode works only in ``XDP_COPY`` mode and
+  is mostly intended for testing. Do not enable this option in production;
+  it will negatively affect performance.
+
+Besides the flags above, in order to trigger the offloads, the packet's
+first ``struct xdp_desc`` descriptor should set the ``XDP_TX_METADATA``
+bit in the ``options`` field. Also note that in a multi-buffer packet
+only the first chunk should carry the metadata.
+
+Querying Device Capabilities
+============================
+
+Every device exports its offload capabilities via the netlink netdev family.
+Refer to ``xsk-flags`` features bitmask in
+``Documentation/netlink/specs/netdev.yaml``.
+
+- ``tx-timestamp``: device supports ``XDP_TX_METADATA_TIMESTAMP``
+- ``tx-checksum``: device supports ``XDP_TX_METADATA_CHECKSUM``
+
+Note that every device supports ``XDP_TX_METADATA_CHECKSUM_SW`` when
+running in ``XDP_COPY`` mode.
+
+See ``tools/net/ynl/samples/netdev.c`` on how to query this information.
+
+Example
+=======
+
+See ``tools/testing/selftests/bpf/xdp_hw_metadata.c`` for an example
+program that handles TX metadata. Also see https://github.com/fomichev/xskgen
+for a more bare-bones example.
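
The frame layout described under "General Design" can be sketched in plain
C without any kernel headers. This is only an illustration under stated
assumptions: ``struct demo_tx_metadata``, the ``DEMO_*`` flag values and all
``demo_*`` helpers are hypothetical stand-ins and not the UAPI definitions
from this series. The sketch shows how the descriptor address is derived
from the frame base and how the metadata is then found
``tx_metadata_len`` bytes before it, including the case where the reserved
headroom is larger than the metadata itself.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the UAPI metadata structure. The field names
 * follow the selftests in this series; the exact layout is an assumption.
 */
struct demo_tx_metadata {
	uint32_t flags;
	uint16_t csum_start;
	uint16_t csum_offset;
};

/* Illustrative flag values, not the real UAPI constants. */
#define DEMO_TX_METADATA_TIMESTAMP (1u << 0)
#define DEMO_TX_METADATA_CHECKSUM  (1u << 1)

/* Allocate a zeroed buffer that plays the role of the umem area. */
static void *demo_alloc_umem(size_t size)
{
	return calloc(1, size);
}

/* The payload, and thus the descriptor address, starts after the headroom. */
static uint64_t demo_desc_addr(uint64_t frame_base, uint32_t tx_metadata_len)
{
	return frame_base + tx_metadata_len;
}

/* The metadata starts tx_metadata_len bytes before the descriptor address. */
static struct demo_tx_metadata *demo_tx_meta(void *umem_area,
					     uint64_t desc_addr,
					     uint32_t tx_metadata_len)
{
	assert(tx_metadata_len >= sizeof(struct demo_tx_metadata));
	assert(desc_addr >= tx_metadata_len);

	return (struct demo_tx_metadata *)((uint8_t *)umem_area +
					   desc_addr - tx_metadata_len);
}

/* Request a timestamp plus L4 checksum offload for one frame. */
static void demo_request_offloads(struct demo_tx_metadata *meta,
				  uint16_t l4_offset, uint16_t check_offset)
{
	memset(meta, 0, sizeof(*meta));
	meta->flags = DEMO_TX_METADATA_TIMESTAMP | DEMO_TX_METADATA_CHECKSUM;
	meta->csum_start = l4_offset;     /* where checksumming starts */
	meta->csum_offset = check_offset; /* where the result is stored */
}
```

With a 32-byte headroom and a frame that starts at offset 4096, the
descriptor address becomes 4128 while the metadata still sits at the start
of the frame; the bytes between the end of the metadata and the payload are
the padding from the diagram. For an IPv4/UDP frame without IP options,
``l4_offset`` would be 34 (Ethernet plus IPv4 header) and ``check_offset``
would be 6, matching ``offsetof(struct udphdr, check)`` in the selftests.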