
[RFC,v5] vmxnet3: Add XDP support.

Message ID 20221214000555.22785-1-u9012063@gmail.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series [RFC,v5] vmxnet3: Add XDP support.

Checks

Context Check Description
netdev/tree_selection success Guessed tree name to be net-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit fail Errors and warnings before: 38 this patch: 44
netdev/cc_maintainers warning 10 maintainers not CCed: bpf@vger.kernel.org edumazet@google.com ast@kernel.org davem@davemloft.net daniel@iogearbox.net kuba@kernel.org pabeni@redhat.com john.fastabend@gmail.com pv-drivers@vmware.com hawk@kernel.org
netdev/build_clang success Errors and warnings before: 6 this patch: 6
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn fail Errors and warnings before: 36 this patch: 42
netdev/checkpatch warning CHECK: Alignment should match open parenthesis WARNING: added, moved or deleted file(s), does MAINTAINERS need updating? WARNING: line length of 83 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

William Tu Dec. 14, 2022, 12:05 a.m. UTC
The patch adds native-mode XDP support: XDP DROP, PASS, TX, and REDIRECT.

Background:
The vmxnet3 rx path consists of three rings: ring0, ring1, and dataring.
For r0 and r1, buffers at r0 are allocated using alloc_skb APIs and DMA
mapped to the ring's descriptor. If LRO is enabled and the packet size is
larger than 3K, VMXNET3_MAX_SKB_BUF_SIZE, then r1 is used to map the rest
of the buffer beyond VMXNET3_MAX_SKB_BUF_SIZE. Each buffer in r1 is
allocated using alloc_page. So for LRO packets, the payload will be in one
buffer from r0 and multiple buffers from r1; for non-LRO packets, only one
descriptor in r0 is used, for packet sizes less than 3K.
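
For reference, a minimal sketch of the two allocation paths (simplified
from vmxnet3_rq_alloc_rx_buf; error handling omitted):

    /* r0: skb-backed buffers, DMA-mapped to the ring's descriptors */
    rbi->skb = __netdev_alloc_skb_ip_align(adapter->netdev, rbi->len,
                                           GFP_KERNEL);
    rbi->dma_addr = dma_map_single(&adapter->pdev->dev, rbi->skb->data,
                                   rbi->len, DMA_FROM_DEVICE);

    /* r1: page-backed buffers for the part of an LRO packet beyond 3K */
    rbi->page = alloc_page(GFP_ATOMIC);
    rbi->dma_addr = dma_map_page(&adapter->pdev->dev, rbi->page, 0,
                                 PAGE_SIZE, DMA_FROM_DEVICE);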

When receiving a packet, the first descriptor will have the sop (start of
packet) bit set, and the last descriptor will have the eop (end of packet)
bit set. Non-LRO packets will have only one descriptor with both sop and
eop set.
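
This single-descriptor case is what the XDP hook keys on; a minimal
sketch of the gate in the completion loop (matching the
vmxnet3_rq_rx_complete hunk in this patch):

    /* non-LRO frame: one descriptor carries the entire packet */
    if (rcd->sop && rcd->eop && adapter->xdp_enabled) {
            /* run the XDP program on this buffer */
    }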

Other than r0 and r1, the vmxnet3 dataring is specifically designed for
handling small packets, usually up to 128 bytes, defined in
VMXNET3_DEF_RXDATA_DESC_SIZE: the backend driver in ESXi simply copies the
packet into the ring's memory region in the front-end vmxnet3 driver, in
order to avoid memory mapping/unmapping overhead. In summary, by packet
size:
    A. < 128B: use dataring
    B. 128B - 3K: use ring0
    C. > 3K: use ring0 and ring1
As a result, the patch adds XDP support for packets using the dataring
and r0 (cases A and B), but not for large packets when LRO is enabled.
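
A sketch of how the rx path tells cases A and B apart (following
vmxnet3_run_xdp below; case C falls through to the normal stack):

    if (rxDataRingUsed) {   /* case A: payload copied into the dataring */
            data = &rq->data_ring.base[rcd->rxdIdx * rq->data_ring.desc_size];
            headroom = 0;
    } else {                /* case B: skb-backed buffer from ring0 */
            data = rbi->skb->data - XDP_PACKET_HEADROOM;
            headroom = XDP_PACKET_HEADROOM;
    }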

XDP Implementation:
When a user loads an XDP prog, the vmxnet3 driver checks the
configuration, such as mtu and lro, and re-allocates the rx buffers to
reserve the extra headroom, XDP_PACKET_HEADROOM, for the XDP frame. The
XDP prog is then associated with every rx queue of the device. Note that
when the dataring is used for small packets, vmxnet3 (the front-end
driver) doesn't control the buffer allocation; as a result, the XDP
frame's headroom is zero.
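
A sketch of the headroom reservation on the ring0 refill path (matching
the vmxnet3_rq_alloc_rx_buf hunk in this patch); the dataring has no
equivalent, hence zero headroom there:

    skb = __netdev_alloc_skb_ip_align(adapter->netdev, rbi->len, GFP_KERNEL);
    if (adapter->xdp_enabled)
            skb_reserve(skb, XDP_PACKET_HEADROOM);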

The receive side of XDP is implemented for cases A and B by invoking the
bpf program in vmxnet3_rq_rx_complete and handling its returned action.
The new vmxnet3_run_xdp function handles the difference between the
dataring and ring0 cases, and decides where the packet goes afterward.
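
In XDP API terms, the driver wraps the received buffer in an xdp_buff
and runs the program, roughly (see __vmxnet3_run_xdp in the patch):

    xdp_init_buff(&xdp, frame_sz, &rq->xdp_rxq);
    xdp_prepare_buff(&xdp, hard_start, headroom, rcd->len, true);
    act = bpf_prog_run_xdp(rq->xdp_bpf_prog, &xdp);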

For TX, vmxnet3 has a split-header design. Outgoing packets are parsed
first and the protocol headers (L2/L3/L4) are copied to the backend. The
rest of the payload is DMA-mapped. Since XDP_TX does not parse the
packet protocol, the entire XDP frame is DMA-mapped for transmission.
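
A sketch of that single-buffer mapping for XDP_TX (from
vmxnet3_xdp_xmit_frame in the patch; descriptor setup trimmed):

    tbi->dma_addr = dma_map_single(&adapter->pdev->dev, xdpf->data,
                                   xdpf->len, DMA_TO_DEVICE);
    gdesc->txd.addr = cpu_to_le64(tbi->dma_addr);
    gdesc->txd.hlen = 0;    /* no protocol headers copied to the backend */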

Performance:
Tested using two VMs inside one ESXi machine, using a single core on each
vmxnet3 device; the sender runs DPDK testpmd in tx-only mode attached to
the vmxnet3 device, sending 64B or 512B packets.

VM1 txgen:
$ dpdk-testpmd -l 0-3 -n 1 -- -i --nb-cores=3 \
--forward-mode=txonly --eth-peer=0,<mac addr of vm2>
option: add "--txonly-multi-flow"
option: use --txpkts=512 or --txpkts=64

VM2 running XDP:
$ ./samples/bpf/xdp_rxq_info -d ens160 -a <options> --skb-mode
$ ./samples/bpf/xdp_rxq_info -d ens160 -a <options>
options: XDP_DROP, XDP_PASS, XDP_TX

To test REDIRECT to cpu 0, use
$ ./samples/bpf/xdp_redirect_cpu -d ens160 -c 0 -e drop

Single core performance comparison with skb-mode.
64B:      skb-mode -> native-mode (with this patch)
XDP_DROP: 960Kpps -> 2.4Mpps
XDP_PASS: 240Kpps -> 499Kpps
XDP_TX:   683Kpps -> 2.3Mpps
REDIRECT: 389Kpps -> 449Kpps
Same performance compared to v2.

512B:     skb-mode -> native-mode v2 -> native-mode v3 (this patch)
XDP_DROP: 640Kpps -> 914Kpps -> 1.3Mpps
XDP_PASS: 220Kpps -> 240Kpps -> 280Kpps
XDP_TX:   483Kpps -> 886Kpps -> 1.3Mpps
REDIRECT: 365Kpps -> 1.2Mpps(?) -> 261Kpps

Good performance improvement over v2, due to skipping the skb
allocation.

Limitations:
a. LRO will be disabled when a user loads an XDP program
b. MTU will be checked and limited to
   VMXNET3_MAX_SKB_BUF_SIZE(3K) - XDP_PACKET_HEADROOM(256) -
   SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
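
For a rough sense of scale, assuming an x86-64 build where
SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) comes to 320 bytes
(exact values vary by arch/config):

    3072 (3K) - 256 (XDP_PACKET_HEADROOM) - 320 (shared info) = 2496 bytes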

Signed-off-by: William Tu <u9012063@gmail.com>
---
v4 -> v5:
- move XDP code to separate file: vmxnet3_xdp.{c, h},
  suggested by Guolin
- expose vmxnet3_rq_create_all and vmxnet3_adjust_rx_ring_size
- more tests using samples/bpf/{xdp1, xdp2, xdp_adjust_tail}
- add debug print
- rebase on commit 65e6af6cebe

v3 -> v4:
- code refactoring and improved commit message
- make dataring and non-dataring case clear
- in XDP_PASS, handle xdp.data and xdp.data_end change after
  bpf program executed
- now still working on internal testing
- v4 applied on net-next commit 65e6af6cebef

v2 -> v3:
- code refactoring: move the XDP processing to the front
  of the vmxnet3_rq_rx_complete, and minimize the places
  of changes to existing code.
- Performance improvement over BUF_SKB (512B) due to skipping
  skb allocation when DROP and TX.

v1 -> v2:
- Avoid skb allocation for small packet size (when dataring is used)
- Use rcu_read_lock/unlock instead of READ_ONCE
- Performance improvement over v1
- Merge xdp drop, tx, pass, and redirect into 1 patch

I tested the patch using the script below:
while [ true ]; do
timeout 20 ./samples/bpf/xdp_rxq_info -d ens160 -a XDP_DROP --skb-mode
timeout 20 ./samples/bpf/xdp_rxq_info -d ens160 -a XDP_DROP
timeout 20 ./samples/bpf/xdp_rxq_info -d ens160 -a XDP_PASS --skb-mode
timeout 20 ./samples/bpf/xdp_rxq_info -d ens160 -a XDP_PASS
timeout 20 ./samples/bpf/xdp_rxq_info -d ens160 -a XDP_TX --skb-mode
timeout 20 ./samples/bpf/xdp_rxq_info -d ens160 -a XDP_TX
timeout 20 ./samples/bpf/xdp_redirect_cpu -d ens160 -c 0 -e drop
timeout 20 ./samples/bpf/xdp_redirect_cpu -d ens160 -c 0 -e pass
done
---
 drivers/net/vmxnet3/Makefile          |   2 +-
 drivers/net/vmxnet3/vmxnet3_drv.c     |  48 ++-
 drivers/net/vmxnet3/vmxnet3_ethtool.c |  14 +
 drivers/net/vmxnet3/vmxnet3_int.h     |  20 ++
 drivers/net/vmxnet3/vmxnet3_xdp.c     | 454 ++++++++++++++++++++++++++
 drivers/net/vmxnet3/vmxnet3_xdp.h     |  39 +++
 6 files changed, 569 insertions(+), 8 deletions(-)
 create mode 100644 drivers/net/vmxnet3/vmxnet3_xdp.c
 create mode 100644 drivers/net/vmxnet3/vmxnet3_xdp.h

Comments

Alexander Duyck Dec. 14, 2022, 4:14 p.m. UTC | #1
On Tue, 2022-12-13 at 16:05 -0800, William Tu wrote:
<...>

> +int
> +vmxnet3_process_xdp(struct vmxnet3_adapter *adapter,
> +		    struct vmxnet3_rx_queue *rq,
> +		    struct Vmxnet3_RxCompDesc *rcd,
> +		    struct vmxnet3_rx_buf_info *rbi,
> +		    struct Vmxnet3_RxDesc *rxd,
> +		    bool *need_flush)
> +{
> +	struct bpf_prog *xdp_prog;
> +	dma_addr_t new_dma_addr;
> +	struct sk_buff *new_skb;
> +	bool rxDataRingUsed;
> +	int ret, act;
> +
> +	ret = VMXNET3_XDP_CONTINUE;
> +	if (unlikely(rcd->len == 0))
> +		return VMXNET3_XDP_TAKEN;
> +
> +	rxDataRingUsed = VMXNET3_RX_DATA_RING(adapter, rcd->rqID);
> +	rcu_read_lock();
> +	xdp_prog = rcu_dereference(rq->xdp_bpf_prog);
> +	if (!xdp_prog) {
> +		rcu_read_unlock();
> +		return VMXNET3_XDP_CONTINUE;
> +	}
> +	act = vmxnet3_run_xdp(rq, rbi, rcd, need_flush, rxDataRingUsed);
> +	rcu_read_unlock();
> +
> +	switch (act) {
> +	case XDP_PASS:
> +		ret = VMXNET3_XDP_CONTINUE;
> +		break;
> +	case XDP_DROP:
> +	case XDP_TX:
> +	case XDP_REDIRECT:
> +	case XDP_ABORTED:
> +	default:
> +		/* Reuse and remap the existing buffer. */
> +		ret = VMXNET3_XDP_TAKEN;
> +		if (rxDataRingUsed)
> +			return ret;
> +
> +		new_skb = rbi->skb;
> +		new_dma_addr =
> +			dma_map_single(&adapter->pdev->dev,
> +				       new_skb->data, rbi->len,
> +				       DMA_FROM_DEVICE);
> +		if (dma_mapping_error(&adapter->pdev->dev,
> +				      new_dma_addr)) {
> +			dev_kfree_skb(new_skb);
> +			rq->stats.drop_total++;
> +			return ret;
> +		}
> +		rbi->dma_addr = new_dma_addr;
> +		rxd->addr = cpu_to_le64(rbi->dma_addr);
> +		rxd->len = rbi->len;
> +	}
> +	return ret;
> +}

For XDP_DROP and XDP_ABORTED this makes sense. You might want to add a
trace point in the case of aborted just so you can catch such cases for
debug.
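
(A sketch of the usual pattern, as __vmxnet3_run_xdp later in this patch
already does for the aborted case:)

    case XDP_ABORTED:
            trace_xdp_exception(rq->adapter->netdev, rq->xdp_bpf_prog, act);
            rq->stats.xdp_aborted++;
            break;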

However for XDP_TX and XDP_REDIRECT, shouldn't both of those be calling
out to separate functions to either place the frame on a Tx ring or to
hand it off to xdp_do_redirect so that the frame gets routed where it
needs to go? Also, don't you run a risk of overwriting frames that
might be waiting on transmit?
William Tu Dec. 14, 2022, 9:55 p.m. UTC | #2
Thanks for taking a look at this patch!

<...>
>
> > <...>
>
> For XDP_DROP and XDP_ABORTED this makes sense. You might want to add a
> trace point in the case of aborted just so you can catch such cases for
> debug.
Good point, I will add the trace point.

>
> However for XDP_TX and XDP_REDIRECT, shouldn't both of those be calling
> out to separate functions to either place the frame on a Tx ring or to
> hand it off to xdp_do_redirect so that the frame gets routed where it
> needs to go? Also, don't you run a risk of overwriting frames that
> might be waiting on transmit?

Yes, I have XDP_TX and XDP_REDIRECT handled in other functions,
vmxnet3_run_xdp() and __vmxnet3_run_xdp().

How do I avoid overwriting frames that might be waiting on transmit?
I checked my vmxnet3_xdp_xmit_back and vmxnet3_xdp_xmit_frame;
since I call vmxnet3_xdp_xmit_frame in the rx context,
I think it should be ok?

Thanks
William
Alexander Duyck Dec. 14, 2022, 10:51 p.m. UTC | #3
On Wed, 2022-12-14 at 13:55 -0800, William Tu wrote:
> Thanks for taking a look at this patch!
> 
> <...>
> > 
> > > <...>
> > 
> > For XDP_DROP and XDP_ABORTED this makes sense. You might want to add a
> > trace point in the case of aborted just so you can catch such cases for
> > debug.
> Good point, I will add the trace point.
> 

You will probably want to add that trace point in __vmxnet3_run_xdp.

> > However for XDP_TX and XDP_REDIRECT, shouldn't both of those be calling
> > out to separate functions to either place the frame on a Tx ring or to
> > hand it off to xdp_do_redirect so that the frame gets routed where it
> > needs to go? Also, don't you run a risk of overwriting frames that
> > might be waiting on transmit?
> 
> Yes, I have XDP_TX and XDP_REDIRECT handled in other functions,
> vmxnet3_run_xdp() and __vmxnet3_run_xdp().

Okay, for the redirect case it looks like you address it by doing a
memcpy to a freshly allocated page, so that saves us the trouble in
that case.

> How do I avoid overwriting frames that might be waiting on transmit?
> I checked my vmxnet3_xdp_xmit_back and vmxnet3_xdp_xmit_frame;
> since I call vmxnet3_xdp_xmit_frame in the rx context,
> I think it should be ok?

I don't think you can guarantee that. Normally for TX you would want to
detach and replace the page unless you have some sort of other
recycling/reuse taking care of it for you. Normally that is handled via
page pool.

On the Intel parts I had gotten around that via our split buffer model
so we just switched to the other half of the page while the Tx sat on
the first half, and by the time we would have to check again we would
either detach the page or flip it back if it had already been freed by
the Tx path.
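
A hypothetical sketch of that split-buffer scheme (names illustrative,
not from vmxnet3):

    /* flip between the two halves of the page; only reuse it once the
     * Tx side has dropped its reference
     */
    if (page_ref_count(rx_buf->page) == 1) {
            rx_buf->page_offset ^= truesize;        /* reuse other half */
            page_ref_inc(rx_buf->page);
    } else {
            rx_buf->page = dev_alloc_page();        /* still in flight on Tx */
    }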

> +static int
> +vmxnet3_xdp_xmit_back(struct vmxnet3_adapter *adapter,
> +		      struct xdp_frame *xdpf,
> +		      struct sk_buff *skb)
> 

Also after re-reviewing this I was wondering why you have the skb
argument for this function? The only caller is passing NULL and I
wouldn't expect you to be passing an skb since you are working with XDP
buffers anyway. Seems like you could also drop the argument from
vmxnet3_xdp_xmit_frame() since you are only passing it NULL as well.
William Tu Dec. 15, 2022, 6:24 p.m. UTC | #4
On Wed, Dec 14, 2022 at 2:51 PM Alexander H Duyck
<alexander.duyck@gmail.com> wrote:
>
> On Wed, 2022-12-14 at 13:55 -0800, William Tu wrote:
> > Thanks for taking a look at this patch!
> >
> > <...>
> > >
> > > > <...>
> > >
> > > For XDP_DROP and XDP_ABORTED this makes sense. You might want to add a
> > > trace point in the case of aborted just so you can catch such cases for
> > > debug.
> > Good point, I will add the trace point.
> >
>
> You will probably want to add that trace point in __vmxnet3_run_xdp.

Yes, thanks.

>
> > > However for XDP_TX and XDP_REDIRECT, shouldn't both of those be calling
> > > out to separate functions to either place the frame on a Tx ring or to
> > > hand it off to xdp_do_redirect so that the frame gets routed where it
> > > needs to go? Also, don't you run a risk of overwriting frames that
> > > might be waiting on transmit?
> >
> > Yes, I have XDP_TX and XDP_REDIRECT handled in other functions,
> > vmxnet3_run_xdp() and __vmxnet3_run_xdp().
>
> Okay, for the redirect case it looks like you address it by doing a
> memcpy to a freshly allocated page, so that saves us the trouble in
> that case.
>
> > How do I avoid overwriting frames that might be waiting on transmit?
> > I checked my vmxnet3_xdp_xmit_back and vmxnet3_xdp_xmit_frame;
> > since I call vmxnet3_xdp_xmit_frame in the rx context,
> > I think it should be ok?
>
> I don't think you can guarantee that. Normally for TX you would want to
> detach and replace the page unless you have some sort of other
> recycling/reuse taking care of it for you. Normally that is handled via
> page pool.
>
> On the Intel parts I had gotten around that via our split buffer model
> so we just switched to the other half of the page while the Tx sat on
> the first half, and by the time we would have to check again we would
> either detach the page or flip it back if it had already been freed by
> the Tx path.
I see your point. So for XDP_TX, I can also do something like I did in
XDP_REDIRECT: memcpy to a freshly allocated page so the frame won't get
overwritten. The performance will probably suffer, though.
Do you suggest allocating a new page, or risking the buffer being overwritten?

>
> > +static int
> > +vmxnet3_xdp_xmit_back(struct vmxnet3_adapter *adapter,
> > +                   struct xdp_frame *xdpf,
> > +                   struct sk_buff *skb)
> >
>
> Also after re-reviewing this I was wondering why you have the skb
> argument for this function? The only caller is passing NULL and I
> wouldn't expect you to be passing an skb since you are working with XDP
> buffers anyway. Seems like you could also drop the argument from
> vmxnet3_xdp_xmit_frame() since you are only passing it NULL as well.

You're right! I don't need to pass skb here. I probably forgot to remove it
when refactoring the code. Will remove it in both places.
Thanks!

Regards,
William
Alexander Duyck Dec. 15, 2022, 7:59 p.m. UTC | #5
On Thu, Dec 15, 2022 at 10:25 AM William Tu <u9012063@gmail.com> wrote:
>
> On Wed, Dec 14, 2022 at 2:51 PM Alexander H Duyck
> <alexander.duyck@gmail.com> wrote:
> >
> > On Wed, 2022-12-14 at 13:55 -0800, William Tu wrote:
> > > Thanks for taking a look at this patch!
> > >
> > > <...>

> > > How do I avoid overwriting frames that might be waiting on transmit?
> > > I checked my vmxnet3_xdp_xmit_back and vmxnet3_xdp_xmit_frame;
> > > since I call vmxnet3_xdp_xmit_frame in the rx context,
> > > I think it should be ok?
> >
> > I don't think you can guarantee that. Normally for TX you would want to
> > detach and replace the page unless you have some sort of other
> > recycling/reuse taking care of it for you. Normally that is handled via
> > page pool.
> >
> > On the Intel parts I had gotten around that via our split buffer model
> > so we just switched to the other half of the page while the Tx sat on
> > the first half, and by the time we would have to check again we would
> > either detach the page or flip it back if it had already been freed by
> > the Tx path.
> I see your point. So for XDP_TX, I can also do something like I did in
> XDP_REDIRECT: memcpy to a freshly allocated page so the frame won't get
> overwritten. The performance will probably suffer, though.
> Do you suggest allocating a new page, or risking the buffer being overwritten?

This is one of the reasons for the page pool being used in other
drivers. You may want to look at going that route if you want to avoid
the memcpy. I don't think you can leave the page mapped without
risking that it will be overwritten.
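
A minimal sketch of what that could look like here, assuming a new
page_pool member on the rx queue (hypothetical field, not in this patch):

    struct page_pool_params pp_params = {
            .flags     = PP_FLAG_DMA_MAP,
            .order     = 0,
            .pool_size = rq->rx_ring[0].size,
            .nid       = NUMA_NO_NODE,
            .dev       = &adapter->pdev->dev,
            .dma_dir   = DMA_FROM_DEVICE,
    };

    rq->page_pool = page_pool_create(&pp_params);   /* hypothetical field */
    err = xdp_rxq_info_reg_mem_model(&rq->xdp_rxq, MEM_TYPE_PAGE_POOL,
                                     rq->page_pool);

Pages would then come from page_pool_dev_alloc_pages() and get recycled
by the pool instead of being memcpy'd or remapped.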

> >
> > > +static int
> > > +vmxnet3_xdp_xmit_back(struct vmxnet3_adapter *adapter,
> > > +                   struct xdp_frame *xdpf,
> > > +                   struct sk_buff *skb)
> > >
> >
> > Also after re-reviewing this I was wondering why you have the skb
> > argument for this function? The only caller is passing NULL and I
> > wouldn't expect you to be passing an skb since you are working with XDP
> > buffers anyway. Seems like you could also drop the argument from
> > vmxnet3_xdp_xmit_frame() since you are only passing it NULL as well.
>
> You're right! I don't need to pass skb here. I probably forgot to remove it
> when refactoring the code. Will remove it in both places.
> Thanks!

Sounds good. I will keep an eye out for v6.

Patch

diff --git a/drivers/net/vmxnet3/Makefile b/drivers/net/vmxnet3/Makefile
index a666a88ac1ff..f82870c10205 100644
--- a/drivers/net/vmxnet3/Makefile
+++ b/drivers/net/vmxnet3/Makefile
@@ -32,4 +32,4 @@ 
 
 obj-$(CONFIG_VMXNET3) += vmxnet3.o
 
-vmxnet3-objs := vmxnet3_drv.o vmxnet3_ethtool.o
+vmxnet3-objs := vmxnet3_drv.o vmxnet3_ethtool.o vmxnet3_xdp.o
diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index d3e7b27eb933..b55fec2ac2bf 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -28,6 +28,7 @@ 
 #include <net/ip6_checksum.h>
 
 #include "vmxnet3_int.h"
+#include "vmxnet3_xdp.h"
 
 char vmxnet3_driver_name[] = "vmxnet3";
 #define VMXNET3_DRIVER_DESC "VMware vmxnet3 virtual NIC driver"
@@ -351,7 +352,6 @@  vmxnet3_unmap_pkt(u32 eop_idx, struct vmxnet3_tx_queue *tq,
 	BUG_ON(VMXNET3_TXDESC_GET_EOP(&(tq->tx_ring.base[eop_idx].txd)) != 1);
 
 	skb = tq->buf_info[eop_idx].skb;
-	BUG_ON(skb == NULL);
 	tq->buf_info[eop_idx].skb = NULL;
 
 	VMXNET3_INC_RING_IDX_ONLY(eop_idx, tq->tx_ring.size);
@@ -592,6 +592,9 @@  vmxnet3_rq_alloc_rx_buf(struct vmxnet3_rx_queue *rq, u32 ring_idx,
 				rbi->skb = __netdev_alloc_skb_ip_align(adapter->netdev,
 								       rbi->len,
 								       GFP_KERNEL);
+				if (adapter->xdp_enabled)
+					skb_reserve(rbi->skb, XDP_PACKET_HEADROOM);
+
 				if (unlikely(rbi->skb == NULL)) {
 					rq->stats.rx_buf_alloc_failure++;
 					break;
@@ -1404,6 +1407,8 @@  vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq,
 	struct Vmxnet3_RxDesc rxCmdDesc;
 	struct Vmxnet3_RxCompDesc rxComp;
 #endif
+	bool need_flush = 0;
+
 	vmxnet3_getRxComp(rcd, &rq->comp_ring.base[rq->comp_ring.next2proc].rcd,
 			  &rxComp);
 	while (rcd->gen == rq->comp_ring.gen) {
@@ -1444,6 +1449,19 @@  vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq,
 			goto rcd_done;
 		}
 
+		if (unlikely(rcd->sop && rcd->eop && adapter->xdp_enabled)) {
+			int act = vmxnet3_process_xdp(adapter, rq, rcd, rbi,
+						      rxd, &need_flush);
+			ctx->skb = NULL;
+			switch (act) {
+			case VMXNET3_XDP_TAKEN:
+				goto rcd_done;
+			case VMXNET3_XDP_CONTINUE:
+			default:
+				break;
+			}
+		}
+
 		if (rcd->sop) { /* first buf of the pkt */
 			bool rxDataRingUsed;
 			u16 len;
@@ -1483,6 +1501,10 @@  vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq,
 				goto rcd_done;
 			}
 
+			if (adapter->xdp_enabled && !rxDataRingUsed)
+				skb_reserve(new_skb,
+					    XDP_PACKET_HEADROOM);
+
 			if (rxDataRingUsed) {
 				size_t sz;
 
@@ -1730,6 +1752,8 @@  vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq,
 		vmxnet3_getRxComp(rcd,
 				  &rq->comp_ring.base[rq->comp_ring.next2proc].rcd, &rxComp);
 	}
+	if (need_flush)
+		xdp_do_flush_map();
 
 	return num_pkts;
 }
@@ -1776,6 +1800,7 @@  vmxnet3_rq_cleanup(struct vmxnet3_rx_queue *rq,
 
 	rq->comp_ring.gen = VMXNET3_INIT_GEN;
 	rq->comp_ring.next2proc = 0;
+	rq->xdp_bpf_prog = NULL;
 }
 
 
@@ -1788,7 +1813,6 @@  vmxnet3_rq_cleanup_all(struct vmxnet3_adapter *adapter)
 		vmxnet3_rq_cleanup(&adapter->rx_queue[i], adapter);
 }
 
-
 static void vmxnet3_rq_destroy(struct vmxnet3_rx_queue *rq,
 			       struct vmxnet3_adapter *adapter)
 {
@@ -1832,6 +1856,8 @@  static void vmxnet3_rq_destroy(struct vmxnet3_rx_queue *rq,
 	kfree(rq->buf_info[0]);
 	rq->buf_info[0] = NULL;
 	rq->buf_info[1] = NULL;
+
+	vmxnet3_unregister_xdp_rxq(rq);
 }
 
 static void
@@ -1893,6 +1919,10 @@  vmxnet3_rq_init(struct vmxnet3_rx_queue *rq,
 	}
 	vmxnet3_rq_alloc_rx_buf(rq, 1, rq->rx_ring[1].size - 1, adapter);
 
+	/* always register, even if no XDP prog used */
+	if (vmxnet3_register_xdp_rxq(rq, adapter))
+		return -EINVAL;
+
 	/* reset the comp ring */
 	rq->comp_ring.next2proc = 0;
 	memset(rq->comp_ring.base, 0, rq->comp_ring.size *
@@ -1989,7 +2019,7 @@  vmxnet3_rq_create(struct vmxnet3_rx_queue *rq, struct vmxnet3_adapter *adapter)
 }
 
 
-static int
+int
 vmxnet3_rq_create_all(struct vmxnet3_adapter *adapter)
 {
 	int i, err = 0;
@@ -2585,7 +2615,8 @@  vmxnet3_setup_driver_shared(struct vmxnet3_adapter *adapter)
 	if (adapter->netdev->features & NETIF_F_RXCSUM)
 		devRead->misc.uptFeatures |= UPT1_F_RXCSUM;
 
-	if (adapter->netdev->features & NETIF_F_LRO) {
+	if (adapter->netdev->features & NETIF_F_LRO &&
+	    !adapter->xdp_enabled) {
 		devRead->misc.uptFeatures |= UPT1_F_LRO;
 		devRead->misc.maxNumRxSG = cpu_to_le16(1 + MAX_SKB_FRAGS);
 	}
@@ -3026,7 +3057,7 @@  vmxnet3_free_pci_resources(struct vmxnet3_adapter *adapter)
 }
 
 
-static void
+void
 vmxnet3_adjust_rx_ring_size(struct vmxnet3_adapter *adapter)
 {
 	size_t sz, i, ring0_size, ring1_size, comp_size;
@@ -3035,7 +3066,8 @@  vmxnet3_adjust_rx_ring_size(struct vmxnet3_adapter *adapter)
 		if (adapter->netdev->mtu <= VMXNET3_MAX_SKB_BUF_SIZE -
 					    VMXNET3_MAX_ETH_HDR_SIZE) {
 			adapter->skb_buf_size = adapter->netdev->mtu +
-						VMXNET3_MAX_ETH_HDR_SIZE;
+						VMXNET3_MAX_ETH_HDR_SIZE +
+						vmxnet3_xdp_headroom(adapter);
 			if (adapter->skb_buf_size < VMXNET3_MIN_T0_BUF_SIZE)
 				adapter->skb_buf_size = VMXNET3_MIN_T0_BUF_SIZE;
 
@@ -3563,7 +3595,6 @@  vmxnet3_reset_work(struct work_struct *data)
 	clear_bit(VMXNET3_STATE_BIT_RESETTING, &adapter->state);
 }
 
-
 static int
 vmxnet3_probe_device(struct pci_dev *pdev,
 		     const struct pci_device_id *id)
@@ -3585,6 +3616,8 @@  vmxnet3_probe_device(struct pci_dev *pdev,
 #ifdef CONFIG_NET_POLL_CONTROLLER
 		.ndo_poll_controller = vmxnet3_netpoll,
 #endif
+		.ndo_bpf = vmxnet3_xdp,
+		.ndo_xdp_xmit = vmxnet3_xdp_xmit,
 	};
 	int err;
 	u32 ver;
@@ -3900,6 +3933,7 @@  vmxnet3_probe_device(struct pci_dev *pdev,
 		goto err_register;
 	}
 
+	adapter->xdp_enabled = false;
 	vmxnet3_check_link(adapter, false);
 	return 0;
 
diff --git a/drivers/net/vmxnet3/vmxnet3_ethtool.c b/drivers/net/vmxnet3/vmxnet3_ethtool.c
index 18cf7c723201..6f542236b26e 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethtool.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethtool.c
@@ -76,6 +76,10 @@  vmxnet3_tq_driver_stats[] = {
 					 copy_skb_header) },
 	{ "  giant hdr",	offsetof(struct vmxnet3_tq_driver_stats,
 					 oversized_hdr) },
+	{ "  xdp xmit",		offsetof(struct vmxnet3_tq_driver_stats,
+					 xdp_xmit) },
+	{ "  xdp xmit err",	offsetof(struct vmxnet3_tq_driver_stats,
+					 xdp_xmit_err) },
 };
 
 /* per rq stats maintained by the device */
@@ -106,6 +110,16 @@  vmxnet3_rq_driver_stats[] = {
 					 drop_fcs) },
 	{ "  rx buf alloc fail", offsetof(struct vmxnet3_rq_driver_stats,
 					  rx_buf_alloc_failure) },
+	{ "     xdp packets", offsetof(struct vmxnet3_rq_driver_stats,
+				       xdp_packets) },
+	{ "     xdp tx", offsetof(struct vmxnet3_rq_driver_stats,
+				  xdp_tx) },
+	{ "     xdp redirects", offsetof(struct vmxnet3_rq_driver_stats,
+					 xdp_redirects) },
+	{ "     xdp drops", offsetof(struct vmxnet3_rq_driver_stats,
+				     xdp_drops) },
+	{ "     xdp aborted", offsetof(struct vmxnet3_rq_driver_stats,
+				       xdp_aborted) },
 };
 
 /* global stats maintained by the driver */
diff --git a/drivers/net/vmxnet3/vmxnet3_int.h b/drivers/net/vmxnet3/vmxnet3_int.h
index 3367db23aa13..5cf4033930d8 100644
--- a/drivers/net/vmxnet3/vmxnet3_int.h
+++ b/drivers/net/vmxnet3/vmxnet3_int.h
@@ -56,6 +56,8 @@ 
 #include <linux/if_arp.h>
 #include <linux/inetdevice.h>
 #include <linux/log2.h>
+#include <linux/bpf.h>
+#include <linux/skbuff.h>
 
 #include "vmxnet3_defs.h"
 
@@ -217,6 +219,9 @@  struct vmxnet3_tq_driver_stats {
 	u64 linearized;         /* # of pkts linearized */
 	u64 copy_skb_header;    /* # of times we have to copy skb header */
 	u64 oversized_hdr;
+
+	u64 xdp_xmit;
+	u64 xdp_xmit_err;
 };
 
 struct vmxnet3_tx_ctx {
@@ -285,6 +290,12 @@  struct vmxnet3_rq_driver_stats {
 	u64 drop_err;
 	u64 drop_fcs;
 	u64 rx_buf_alloc_failure;
+
+	u64 xdp_packets;	/* Total packets processed by XDP. */
+	u64 xdp_tx;
+	u64 xdp_redirects;
+	u64 xdp_drops;
+	u64 xdp_aborted;
 };
 
 struct vmxnet3_rx_data_ring {
@@ -307,6 +318,8 @@  struct vmxnet3_rx_queue {
 	struct vmxnet3_rx_buf_info     *buf_info[2];
 	struct Vmxnet3_RxQueueCtrl            *shared;
 	struct vmxnet3_rq_driver_stats  stats;
+	struct bpf_prog __rcu *xdp_bpf_prog;
+	struct xdp_rxq_info xdp_rxq;
 } __attribute__((__aligned__(SMP_CACHE_BYTES)));
 
 #define VMXNET3_DEVICE_MAX_TX_QUEUES 32
@@ -415,6 +428,7 @@  struct vmxnet3_adapter {
 	u16    tx_prod_offset;
 	u16    rx_prod_offset;
 	u16    rx_prod2_offset;
+	bool   xdp_enabled;
 };
 
 #define VMXNET3_WRITE_BAR0_REG(adapter, reg, val)  \
@@ -490,6 +504,12 @@  vmxnet3_tq_destroy_all(struct vmxnet3_adapter *adapter);
 void
 vmxnet3_rq_destroy_all(struct vmxnet3_adapter *adapter);
 
+int
+vmxnet3_rq_create_all(struct vmxnet3_adapter *adapter);
+
+void
+vmxnet3_adjust_rx_ring_size(struct vmxnet3_adapter *adapter);
+
 netdev_features_t
 vmxnet3_fix_features(struct net_device *netdev, netdev_features_t features);
 
diff --git a/drivers/net/vmxnet3/vmxnet3_xdp.c b/drivers/net/vmxnet3/vmxnet3_xdp.c
new file mode 100644
index 000000000000..2405b086cb18
--- /dev/null
+++ b/drivers/net/vmxnet3/vmxnet3_xdp.c
@@ -0,0 +1,454 @@ 
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Linux driver for VMware's vmxnet3 ethernet NIC.
+ * Copyright (C) 2008-2023, VMware, Inc. All Rights Reserved.
+ * Maintained by: pv-drivers@vmware.com
+ *
+ */
+
+#include "vmxnet3_int.h"
+#include "vmxnet3_xdp.h"
+
+static void
+vmxnet3_xdp_exchange_program(struct vmxnet3_adapter *adapter,
+			     struct bpf_prog *prog)
+{
+	struct vmxnet3_rx_queue *rq;
+	int i;
+
+	for (i = 0; i < adapter->num_rx_queues; i++) {
+		rq = &adapter->rx_queue[i];
+		rcu_assign_pointer(rq->xdp_bpf_prog, prog);
+	}
+	if (prog)
+		adapter->xdp_enabled = true;
+	else
+		adapter->xdp_enabled = false;
+}
+
+static int
+vmxnet3_xdp_set(struct net_device *netdev, struct netdev_bpf *bpf,
+		struct netlink_ext_ack *extack)
+{
+	struct vmxnet3_adapter *adapter = netdev_priv(netdev);
+	struct bpf_prog *new_bpf_prog = bpf->prog;
+	struct bpf_prog *old_bpf_prog;
+	bool need_update;
+	bool running;
+	int err = 0;
+
+	if (new_bpf_prog && netdev->mtu > VMXNET3_XDP_MAX_MTU) {
+		NL_SET_ERR_MSG_MOD(extack, "MTU too large for XDP");
+		return -EOPNOTSUPP;
+	}
+
+	old_bpf_prog = READ_ONCE(adapter->rx_queue[0].xdp_bpf_prog);
+	if (!new_bpf_prog && !old_bpf_prog) {
+		adapter->xdp_enabled = false;
+		return 0;
+	}
+	running = netif_running(netdev);
+	need_update = !!old_bpf_prog != !!new_bpf_prog;
+
+	if (running && need_update)
+		vmxnet3_quiesce_dev(adapter);
+
+	vmxnet3_xdp_exchange_program(adapter, new_bpf_prog);
+	if (old_bpf_prog)
+		bpf_prog_put(old_bpf_prog);
+
+	if (running && need_update) {
+		vmxnet3_reset_dev(adapter);
+		vmxnet3_rq_destroy_all(adapter);
+		vmxnet3_adjust_rx_ring_size(adapter);
+		err = vmxnet3_rq_create_all(adapter);
+		if (err) {
+			NL_SET_ERR_MSG_MOD(extack,
+				"failed to re-create rx queues for XDP.");
+			err = -EOPNOTSUPP;
+			goto out;
+		}
+		err = vmxnet3_activate_dev(adapter);
+		if (err) {
+			NL_SET_ERR_MSG_MOD(extack,
+				"failed to activate device for XDP.");
+			err = -EOPNOTSUPP;
+			goto out;
+		}
+		clear_bit(VMXNET3_STATE_BIT_RESETTING, &adapter->state);
+	}
+out:
+	return err;
+}
+
+/* This is the main xdp call used by kernel to set/unset eBPF program. */
+int
+vmxnet3_xdp(struct net_device *netdev, struct netdev_bpf *bpf)
+{
+	switch (bpf->command) {
+	case XDP_SETUP_PROG:
+		netdev_dbg(netdev, "XDP: set program to ");
+		return vmxnet3_xdp_set(netdev, bpf, bpf->extack);
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
+
+int
+vmxnet3_xdp_headroom(struct vmxnet3_adapter *adapter)
+{
+	if (adapter->xdp_enabled)
+		return VMXNET3_XDP_ROOM;
+	else
+		return 0;
+}
+
+void
+vmxnet3_unregister_xdp_rxq(struct vmxnet3_rx_queue *rq)
+{
+	xdp_rxq_info_unreg_mem_model(&rq->xdp_rxq);
+	xdp_rxq_info_unreg(&rq->xdp_rxq);
+}
+
+int
+vmxnet3_register_xdp_rxq(struct vmxnet3_rx_queue *rq,
+			 struct vmxnet3_adapter *adapter)
+{
+	int err;
+
+	err = xdp_rxq_info_reg(&rq->xdp_rxq, adapter->netdev, rq->qid, 0);
+	if (err < 0)
+		return err;
+
+	err = xdp_rxq_info_reg_mem_model(&rq->xdp_rxq, MEM_TYPE_PAGE_SHARED,
+					 NULL);
+	if (err < 0) {
+		xdp_rxq_info_unreg(&rq->xdp_rxq);
+		return err;
+	}
+	return 0;
+}
+
+static int
+vmxnet3_xdp_xmit_frame(struct vmxnet3_adapter *adapter,
+		       struct xdp_frame *xdpf,
+		       struct sk_buff *skb,
+		       struct vmxnet3_tx_queue *tq)
+{
+	struct vmxnet3_tx_buf_info *tbi = NULL;
+	union Vmxnet3_GenericDesc *gdesc;
+	struct vmxnet3_tx_ctx ctx;
+	int tx_num_deferred;
+	u32 buf_size;
+	int ret = 0;
+	u32 dw2;
+
+	if (vmxnet3_cmd_ring_desc_avail(&tq->tx_ring) == 0) {
+		tq->stats.tx_ring_full++;
+		ret = -ENOMEM;
+		goto exit;
+	}
+
+	dw2 = (tq->tx_ring.gen ^ 0x1) << VMXNET3_TXD_GEN_SHIFT;
+	dw2 |= xdpf->len;
+	ctx.sop_txd = tq->tx_ring.base + tq->tx_ring.next2fill;
+	gdesc = ctx.sop_txd;
+
+	buf_size = xdpf->len;
+	tbi = tq->buf_info + tq->tx_ring.next2fill;
+	tbi->map_type = VMXNET3_MAP_SINGLE;
+	tbi->dma_addr = dma_map_single(&adapter->pdev->dev,
+				       xdpf->data, buf_size,
+				       DMA_TO_DEVICE);
+	if (dma_mapping_error(&adapter->pdev->dev, tbi->dma_addr)) {
+		ret = -EFAULT;
+		goto exit;
+	}
+	tbi->len = buf_size;
+
+	gdesc = tq->tx_ring.base + tq->tx_ring.next2fill;
+	WARN_ON_ONCE(gdesc->txd.gen == tq->tx_ring.gen);
+
+	gdesc->txd.addr = cpu_to_le64(tbi->dma_addr);
+	gdesc->dword[2] = cpu_to_le32(dw2);
+
+	/* Setup the EOP desc */
+	gdesc->dword[3] = cpu_to_le32(VMXNET3_TXD_CQ | VMXNET3_TXD_EOP);
+
+	gdesc->txd.om = 0;
+	gdesc->txd.msscof = 0;
+	gdesc->txd.hlen = 0;
+	gdesc->txd.ti = 0;
+
+	tx_num_deferred = le32_to_cpu(tq->shared->txNumDeferred);
+	tq->shared->txNumDeferred += 1;
+	tx_num_deferred++;
+
+	vmxnet3_cmd_ring_adv_next2fill(&tq->tx_ring);
+
+	/* set the last buf_info for the pkt */
+	tbi->skb = skb;
+	tbi->sop_idx = ctx.sop_txd - tq->tx_ring.base;
+
+	dma_wmb();
+	gdesc->dword[2] = cpu_to_le32(le32_to_cpu(gdesc->dword[2]) ^
+						  VMXNET3_TXD_GEN);
+	if (tx_num_deferred >= le32_to_cpu(tq->shared->txThreshold)) {
+		tq->shared->txNumDeferred = 0;
+		VMXNET3_WRITE_BAR0_REG(adapter,
+				       VMXNET3_REG_TXPROD + tq->qid * 8,
+				       tq->tx_ring.next2fill);
+	}
+exit:
+	return ret;
+}
+
+static int
+vmxnet3_xdp_xmit_back(struct vmxnet3_adapter *adapter,
+		      struct xdp_frame *xdpf,
+		      struct sk_buff *skb)
+{
+	struct vmxnet3_tx_queue *tq;
+	struct netdev_queue *nq;
+	int err = 0, cpu;
+	int tq_number;
+
+	tq_number = adapter->num_tx_queues;
+	cpu = smp_processor_id();
+	tq = &adapter->tx_queue[cpu % tq_number];
+	if (tq->stopped)
+		return -ENETDOWN;
+
+	nq = netdev_get_tx_queue(adapter->netdev, tq->qid);
+
+	__netif_tx_lock(nq, cpu);
+	err = vmxnet3_xdp_xmit_frame(adapter, xdpf, skb, tq);
+	if (err)
+		goto exit;
+
+exit:
+	__netif_tx_unlock(nq);
+	return err;
+}
+
+int
+vmxnet3_xdp_xmit(struct net_device *dev,
+		 int n, struct xdp_frame **frames, u32 flags)
+{
+	struct vmxnet3_adapter *adapter;
+	struct vmxnet3_tx_queue *tq;
+	struct netdev_queue *nq;
+	int i, err, cpu;
+	int nxmit = 0;
+	int tq_number;
+
+	adapter = netdev_priv(dev);
+
+	if (unlikely(test_bit(VMXNET3_STATE_BIT_QUIESCED, &adapter->state)))
+		return -ENETDOWN;
+	if (unlikely(test_bit(VMXNET3_STATE_BIT_RESETTING, &adapter->state)))
+		return -EINVAL;
+
+	tq_number = adapter->num_tx_queues;
+	cpu = smp_processor_id();
+	tq = &adapter->tx_queue[cpu % tq_number];
+	if (tq->stopped)
+		return -ENETDOWN;
+
+	nq = netdev_get_tx_queue(adapter->netdev, tq->qid);
+
+	__netif_tx_lock(nq, cpu);
+	for (i = 0; i < n; i++) {
+		err = vmxnet3_xdp_xmit_frame(adapter, frames[i], NULL, tq);
+		if (err) {
+			tq->stats.xdp_xmit_err++;
+			break;
+		}
+		nxmit++;
+	}
+
+	tq->stats.xdp_xmit += nxmit;
+	__netif_tx_unlock(nq);
+
+	return nxmit;
+}
+
+static int
+__vmxnet3_run_xdp(struct vmxnet3_rx_queue *rq, void *data, int data_len,
+		  int headroom, int frame_sz, bool *need_xdp_flush,
+		  struct sk_buff *skb)
+{
+	struct xdp_frame *xdpf;
+	void *buf_hard_start;
+	struct xdp_buff xdp;
+	struct page *page;
+	void *orig_data;
+	int err, delta;
+	int delta_len;
+	u32 act;
+
+	buf_hard_start = data;
+	xdp_init_buff(&xdp, frame_sz, &rq->xdp_rxq);
+	xdp_prepare_buff(&xdp, buf_hard_start, headroom, data_len, true);
+	orig_data = xdp.data;
+
+	act = bpf_prog_run_xdp(rq->xdp_bpf_prog, &xdp);
+	rq->stats.xdp_packets++;
+
+	switch (act) {
+	case XDP_DROP:
+		rq->stats.xdp_drops++;
+		break;
+	case XDP_PASS:
+		/* bpf prog might change len and data position.
+		 * dataring does not use skb so not support this.
+		 */
+		delta = xdp.data - orig_data;
+		delta_len = (xdp.data_end - xdp.data) - data_len;
+		if (skb) {
+			skb_reserve(skb, delta);
+			skb_put(skb, delta_len);
+		}
+		break;
+	case XDP_TX:
+		xdpf = xdp_convert_buff_to_frame(&xdp);
+		if (!xdpf ||
+		    vmxnet3_xdp_xmit_back(rq->adapter, xdpf, NULL)) {
+			rq->stats.xdp_drops++;
+		} else {
+			rq->stats.xdp_tx++;
+		}
+		break;
+	case XDP_ABORTED:
+		trace_xdp_exception(rq->adapter->netdev, rq->xdp_bpf_prog,
+				    act);
+		rq->stats.xdp_aborted++;
+		break;
+	case XDP_REDIRECT:
+		page = alloc_page(GFP_ATOMIC);
+		if (!page) {
+			rq->stats.rx_buf_alloc_failure++;
+			return XDP_DROP;
+		}
+		xdp_init_buff(&xdp, PAGE_SIZE, &rq->xdp_rxq);
+		xdp_prepare_buff(&xdp, page_address(page),
+				 XDP_PACKET_HEADROOM,
+				 data_len, true);
+		memcpy(xdp.data, data, data_len);
+		err = xdp_do_redirect(rq->adapter->netdev, &xdp,
+				      rq->xdp_bpf_prog);
+		if (!err) {
+			rq->stats.xdp_redirects++;
+		} else {
+			__free_page(page);
+			rq->stats.xdp_drops++;
+		}
+		*need_xdp_flush = true;
+		break;
+	default:
+		bpf_warn_invalid_xdp_action(rq->adapter->netdev,
+					    rq->xdp_bpf_prog, act);
+		break;
+	}
+	return act;
+}
+
+static int
+vmxnet3_run_xdp(struct vmxnet3_rx_queue *rq, struct vmxnet3_rx_buf_info *rbi,
+		struct Vmxnet3_RxCompDesc *rcd, bool *need_flush,
+		bool rxDataRingUsed)
+{
+	struct vmxnet3_adapter *adapter;
+	struct ethhdr *ehdr;
+	int act = XDP_PASS;
+	void *data;
+	int sz;
+
+	adapter = rq->adapter;
+	if (rxDataRingUsed) {
+		sz = rcd->rxdIdx * rq->data_ring.desc_size;
+		data = &rq->data_ring.base[sz];
+		ehdr = data;
+		netdev_dbg(adapter->netdev,
+			   "XDP: rxDataRing packet size %d, eth proto 0x%x\n",
+			   rcd->len, ntohs(ehdr->h_proto));
+		act = __vmxnet3_run_xdp(rq, data, rcd->len, 0,
+					rq->data_ring.desc_size, need_flush,
+					NULL);
+	} else {
+		dma_unmap_single(&adapter->pdev->dev,
+				 rbi->dma_addr,
+				 rbi->len,
+				 DMA_FROM_DEVICE);
+		ehdr = (struct ethhdr *)rbi->skb->data;
+		netdev_dbg(adapter->netdev,
+			   "XDP: packet size %d, eth proto 0x%x\n",
+			   rcd->len, ntohs(ehdr->h_proto));
+		act = __vmxnet3_run_xdp(rq,
+					rbi->skb->data - XDP_PACKET_HEADROOM,
+					rcd->len, XDP_PACKET_HEADROOM,
+					rbi->len, need_flush, rbi->skb);
+	}
+	return act;
+}
+
+int
+vmxnet3_process_xdp(struct vmxnet3_adapter *adapter,
+		    struct vmxnet3_rx_queue *rq,
+		    struct Vmxnet3_RxCompDesc *rcd,
+		    struct vmxnet3_rx_buf_info *rbi,
+		    struct Vmxnet3_RxDesc *rxd,
+		    bool *need_flush)
+{
+	struct bpf_prog *xdp_prog;
+	dma_addr_t new_dma_addr;
+	struct sk_buff *new_skb;
+	bool rxDataRingUsed;
+	int ret, act;
+
+	ret = VMXNET3_XDP_CONTINUE;
+	if (unlikely(rcd->len == 0))
+		return VMXNET3_XDP_TAKEN;
+
+	rxDataRingUsed = VMXNET3_RX_DATA_RING(adapter, rcd->rqID);
+	rcu_read_lock();
+	xdp_prog = rcu_dereference(rq->xdp_bpf_prog);
+	if (!xdp_prog) {
+		rcu_read_unlock();
+		return VMXNET3_XDP_CONTINUE;
+	}
+	act = vmxnet3_run_xdp(rq, rbi, rcd, need_flush, rxDataRingUsed);
+	rcu_read_unlock();
+
+	switch (act) {
+	case XDP_PASS:
+		ret = VMXNET3_XDP_CONTINUE;
+		break;
+	case XDP_DROP:
+	case XDP_TX:
+	case XDP_REDIRECT:
+	case XDP_ABORTED:
+	default:
+		/* Reuse and remap the existing buffer. */
+		ret = VMXNET3_XDP_TAKEN;
+		if (rxDataRingUsed)
+			return ret;
+
+		new_skb = rbi->skb;
+		new_dma_addr =
+			dma_map_single(&adapter->pdev->dev,
+				       new_skb->data, rbi->len,
+				       DMA_FROM_DEVICE);
+		if (dma_mapping_error(&adapter->pdev->dev,
+				      new_dma_addr)) {
+			dev_kfree_skb(new_skb);
+			rq->stats.drop_total++;
+			return ret;
+		}
+		rbi->dma_addr = new_dma_addr;
+		rxd->addr = cpu_to_le64(rbi->dma_addr);
+		rxd->len = rbi->len;
+	}
+	return ret;
+}
diff --git a/drivers/net/vmxnet3/vmxnet3_xdp.h b/drivers/net/vmxnet3/vmxnet3_xdp.h
new file mode 100644
index 000000000000..6a3c662a4464
--- /dev/null
+++ b/drivers/net/vmxnet3/vmxnet3_xdp.h
@@ -0,0 +1,39 @@ 
+/* SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Linux driver for VMware's vmxnet3 ethernet NIC.
+ * Copyright (C) 2008-2023, VMware, Inc. All Rights Reserved.
+ * Maintained by: pv-drivers@vmware.com
+ *
+ */
+
+#ifndef _VMXNET3_XDP_H
+#define _VMXNET3_XDP_H
+
+#include <linux/filter.h>
+#include <linux/bpf_trace.h>
+#include <linux/netlink.h>
+#include <net/xdp.h>
+
+#include "vmxnet3_int.h"
+
+#define VMXNET3_XDP_ROOM (SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) + \
+				XDP_PACKET_HEADROOM)
+#define VMXNET3_XDP_MAX_MTU (VMXNET3_MAX_SKB_BUF_SIZE - VMXNET3_XDP_ROOM)
+
+#define VMXNET3_XDP_CONTINUE 0	/* Pass to the stack, ex: XDP_PASS. */
+#define VMXNET3_XDP_TAKEN 1	/* Skip the stack, ex: XDP_DROP/TX/REDIRECT */
+
+int vmxnet3_xdp(struct net_device *netdev, struct netdev_bpf *bpf);
+void vmxnet3_unregister_xdp_rxq(struct vmxnet3_rx_queue *rq);
+int vmxnet3_register_xdp_rxq(struct vmxnet3_rx_queue *rq,
+			     struct vmxnet3_adapter *adapter);
+int vmxnet3_xdp_headroom(struct vmxnet3_adapter *adapter);
+int vmxnet3_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
+		     u32 flags);
+int vmxnet3_process_xdp(struct vmxnet3_adapter *adapter,
+			struct vmxnet3_rx_queue *rq,
+			struct Vmxnet3_RxCompDesc *rcd,
+			struct vmxnet3_rx_buf_info *rbi,
+			struct Vmxnet3_RxDesc *rxd,
+			bool *need_flush);
+#endif