From patchwork Wed Dec 7 04:52:45 2022
X-Patchwork-Submitter: William Tu
X-Patchwork-Id: 13066610
X-Patchwork-Delegate: kuba@kernel.org
From: William Tu
To: netdev@vger.kernel.org
Cc: tuc@vmware.com, gyang@vmware.com, doshir@vmware.com
Subject: [RFC PATCH v4] vmxnet3: Add XDP support.
Date: Tue, 6 Dec 2022 20:52:45 -0800
Message-Id: <20221207045245.95782-1-u9012063@gmail.com>
X-Patchwork-State: RFC

The patch adds native-mode XDP support: XDP DROP, PASS, TX, and REDIRECT.

Background:
The vmxnet3 rx path consists of three rings: ring0, ring1, and dataring.
For r0 and r1, buffers in r0 are allocated using the alloc_skb APIs and
DMA mapped to the ring's descriptors. If LRO is enabled and the packet
size is larger than 3K (VMXNET3_MAX_SKB_BUF_SIZE), r1 is used to map the
part of the buffer beyond VMXNET3_MAX_SKB_BUF_SIZE. Each buffer in r1 is
allocated using alloc_page. So for LRO packets, the payload is placed in
one buffer from r0 and multiple buffers from r1; for non-LRO packets
smaller than 3K, only one descriptor in r0 is used.

When receiving a packet, the first descriptor has the sop (start of
packet) bit set and the last descriptor has the eop (end of packet) bit
set. Non-LRO packets have only one descriptor, with both sop and eop set.

Apart from r0 and r1, the vmxnet3 dataring is specifically designed for
small packets, usually up to 128 bytes (VMXNET3_DEF_RXDATA_DESC_SIZE): the
backend driver in ESXi simply copies the packet into the ring's memory
region in the front-end vmxnet3 driver, avoiding the memory
mapping/unmapping overhead. In summary, by packet size:

A. < 128B: use dataring
B. 128B - 3K: use ring0
C. > 3K: use ring0 and ring1

As a result, the patch adds XDP support for packets using the dataring and
r0 (cases A and B), but not for the large packets seen when LRO is enabled
(case C).

XDP Implementation:
When a user loads an XDP prog, the vmxnet3 driver checks the
configuration, such as MTU and LRO, and re-allocates the rx buffers with
the extra headroom, XDP_PACKET_HEADROOM, reserved for the XDP frame. The
XDP prog is then associated with every rx queue of the device. Note that
when the dataring is used for small packets, the front-end vmxnet3 driver
does not control the buffer allocation, so the XDP frame's headroom is
zero in that case.

The receive side of XDP is implemented for cases A and B by invoking the
bpf program in vmxnet3_rq_rx_complete and handling its returned action.
The new vmxnet3_run_xdp function handles the difference between the
dataring and ring0 cases and decides the packet's next journey.

For TX, vmxnet3 has a split-header design: outgoing packets are parsed
first and the protocol headers (L2/L3/L4) are copied to the backend, while
the rest of the payload is DMA mapped. Since XDP_TX does not parse the
packet protocol, the entire XDP frame is DMA mapped for transmission.

Performance:
Tested using two VMs inside one ESXi machine, with a single core on each
vmxnet3 device; the sender runs DPDK testpmd in tx-only mode attached to a
vmxnet3 device, sending 64B or 512B packets.

VM1 txgen:
$ dpdk-testpmd -l 0-3 -n 1 -- -i --nb-cores=3 \
  --forward-mode=txonly --eth-peer=0,
option: add "--txonly-multi-flow"
option: use --txpkts=512 or 64 byte

VM2 running XDP:
$ ./samples/bpf/xdp_rxq_info -d ens160 -a --skb-mode
$ ./samples/bpf/xdp_rxq_info -d ens160 -a
options: XDP_DROP, XDP_PASS, XDP_TX

To test REDIRECT to cpu 0, use:
$ ./samples/bpf/xdp_redirect_cpu -d ens160 -c 0 -e drop
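For reference (not part of this patch), the native XDP_DROP path exercised
above can also be reproduced with a minimal hand-written program; the file
name, program name, and interface name below are only illustrative:

/* xdp_drop.c: always drop.
 * Build with: clang -O2 -target bpf -c xdp_drop.c -o xdp_drop.o
 */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_drop_prog(struct xdp_md *ctx)
{
	return XDP_DROP;	/* drop every frame on the attached device */
}

char _license[] SEC("license") = "GPL";

Attach in native (driver) mode with iproute2, e.g.:
$ ip link set dev ens160 xdpdrv obj xdp_drop.o sec xdp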
Single core performance comparison with skb-mode.

64B:      skb-mode -> native-mode (with this patch)
XDP_DROP: 960Kpps  -> 2.4Mpps
XDP_PASS: 240Kpps  -> 499Kpps
XDP_TX:   683Kpps  -> 2.3Mpps
REDIRECT: 389Kpps  -> 449Kpps
Same performance compared to v2.

512B:     skb-mode -> native-mode v2 -> native-mode v3 (with this patch)
XDP_DROP: 640Kpps  -> 914Kpps  -> 1.3Mpps
XDP_PASS: 220Kpps  -> 240Kpps  -> 280Kpps
XDP_TX:   483Kpps  -> 886Kpps  -> 1.3Mpps
REDIRECT: 365Kpps  -> 1.2Mpps(?) -> 261Kpps
Good performance improvement over v2, due to skipping the skb allocation.

Limitations:
a. LRO will be disabled when users load an XDP program.
b. MTU will be checked and limited to
   VMXNET3_MAX_SKB_BUF_SIZE(3K) - XDP_PACKET_HEADROOM(256) -
   SKB_DATA_ALIGN(sizeof(struct skb_shared_info))

The patch is based on net-next commit 27f2533bcc6e909b85d3c1b738fa1f203ed8a835
("nfp: flower: support to offload pedit of IPv6 flowinfo fields")

Signed-off-by: William Tu
---
v3 -> v4:
- code refactoring and improved commit message
- make the dataring and non-dataring cases clear
- in XDP_PASS, handle xdp.data and xdp.data_end changes after the bpf
  program has executed
- still working on internal testing
- v4 applied on net-next commit 65e6af6cebef

v2 -> v3:
- code refactoring: move the XDP processing to the front of
  vmxnet3_rq_rx_complete, and minimize the changes to existing code
- performance improvement over BUF_SKB (512B) due to skipping the skb
  allocation for DROP and TX

v1 -> v2:
- avoid the skb allocation for small packets (when the dataring is used)
- use rcu_read_lock/unlock instead of READ_ONCE
- performance improvement over v1
- merge xdp drop, tx, pass, and redirect into one patch

I tested the patch using the script below:

while [ true ]; do
 timeout 20 ./samples/bpf/xdp_rxq_info -d ens160 -a XDP_DROP --skb-mode
 timeout 20 ./samples/bpf/xdp_rxq_info -d ens160 -a XDP_DROP
 timeout 20 ./samples/bpf/xdp_rxq_info -d ens160 -a XDP_PASS --skb-mode
 timeout 20 ./samples/bpf/xdp_rxq_info -d ens160 -a XDP_PASS
 timeout 20 ./samples/bpf/xdp_rxq_info -d ens160 -a XDP_TX --skb-mode
 timeout 20 ./samples/bpf/xdp_rxq_info -d ens160 -a XDP_TX
 timeout 20 ./samples/bpf/xdp_redirect_cpu -d ens160 -c 0 -e drop
 timeout 20 ./samples/bpf/xdp_redirect_cpu -d ens160 -c 0 -e pass
done
---
 drivers/net/vmxnet3/vmxnet3_drv.c     | 467 +++++++++++++++++++++++++-
 drivers/net/vmxnet3/vmxnet3_ethtool.c |  14 +
 drivers/net/vmxnet3/vmxnet3_int.h     |  19 ++
 3 files changed, 497 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index d3e7b27eb933..67dac427cd39 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -26,6 +26,10 @@
 #include
 #include
+#include
+#include
+#include
+#include
 
 #include "vmxnet3_int.h"
 
@@ -47,6 +51,13 @@ static int enable_mq = 1;
 
 static void
 vmxnet3_write_mac_addr(struct vmxnet3_adapter *adapter, const u8 *mac);
+static int
+vmxnet3_xdp_headroom(struct vmxnet3_adapter *adapter);
+static int
+vmxnet3_xdp_xmit_frame(struct vmxnet3_adapter *adapter,
+		       struct xdp_frame *xdpf,
+		       struct sk_buff *skb,
+		       struct vmxnet3_tx_queue *tq);
 
 /*
  * Enable/Disable the given intr
@@ -351,7 +362,6 @@ vmxnet3_unmap_pkt(u32 eop_idx, struct vmxnet3_tx_queue *tq,
 	BUG_ON(VMXNET3_TXDESC_GET_EOP(&(tq->tx_ring.base[eop_idx].txd)) != 1);
 
 	skb = tq->buf_info[eop_idx].skb;
-	BUG_ON(skb == NULL);
 	tq->buf_info[eop_idx].skb = NULL;
 
 	VMXNET3_INC_RING_IDX_ONLY(eop_idx, tq->tx_ring.size);
@@ -592,6 +602,9 @@ vmxnet3_rq_alloc_rx_buf(struct vmxnet3_rx_queue *rq, u32 ring_idx,
 				rbi->skb = __netdev_alloc_skb_ip_align(adapter->netdev,
 								       rbi->len,
 								       GFP_KERNEL);
+				if (adapter->xdp_enabled)
+
skb_reserve(rbi->skb, XDP_PACKET_HEADROOM); + if (unlikely(rbi->skb == NULL)) { rq->stats.rx_buf_alloc_failure++; break; @@ -1387,6 +1400,259 @@ vmxnet3_get_hdr_len(struct vmxnet3_adapter *adapter, struct sk_buff *skb, return (hlen + (hdr.tcp->doff << 2)); } +static int +vmxnet3_xdp_xmit(struct net_device *dev, + int n, struct xdp_frame **frames, u32 flags) +{ + struct vmxnet3_adapter *adapter; + struct vmxnet3_tx_queue *tq; + struct netdev_queue *nq; + int i, err, cpu; + int nxmit = 0; + int tq_number; + + adapter = netdev_priv(dev); + + if (unlikely(test_bit(VMXNET3_STATE_BIT_QUIESCED, &adapter->state))) + return -ENETDOWN; + if (unlikely(test_bit(VMXNET3_STATE_BIT_RESETTING, &adapter->state))) + return -EINVAL; + + tq_number = adapter->num_tx_queues; + cpu = smp_processor_id(); + tq = &adapter->tx_queue[cpu % tq_number]; + if (tq->stopped) { + return -ENETDOWN; + } + nq = netdev_get_tx_queue(adapter->netdev, tq->qid); + + __netif_tx_lock(nq, cpu); + for (i = 0; i < n; i++) { + err = vmxnet3_xdp_xmit_frame(adapter, frames[i], NULL, tq); + if (err) { + tq->stats.xdp_xmit_err++; + break; + } + nxmit++; + } + + tq->stats.xdp_xmit += nxmit; + __netif_tx_unlock(nq); + + return nxmit; +} + +static int +vmxnet3_xdp_xmit_back(struct vmxnet3_adapter *adapter, + struct xdp_frame *xdpf, + struct sk_buff *skb) +{ + struct vmxnet3_tx_queue *tq; + struct netdev_queue *nq; + int err = 0, cpu; + int tq_number; + + tq_number = adapter->num_tx_queues; + cpu = smp_processor_id(); + tq = &adapter->tx_queue[cpu % tq_number]; + if (tq->stopped) { + return -ENETDOWN; + } + nq = netdev_get_tx_queue(adapter->netdev, tq->qid); + + __netif_tx_lock(nq, cpu); + err = vmxnet3_xdp_xmit_frame(adapter, xdpf, skb, tq); + if (err) { + goto exit; + } +exit: + __netif_tx_unlock(nq); + return err; +} + +static int +vmxnet3_xdp_xmit_frame(struct vmxnet3_adapter *adapter, + struct xdp_frame *xdpf, + struct sk_buff *skb, + struct vmxnet3_tx_queue *tq) +{ + struct vmxnet3_tx_buf_info *tbi = NULL; + union Vmxnet3_GenericDesc *gdesc; + struct vmxnet3_tx_ctx ctx; + int tx_num_deferred; + u32 buf_size; + int ret = 0; + u32 dw2; + + if (vmxnet3_cmd_ring_desc_avail(&tq->tx_ring) == 0) { + tq->stats.tx_ring_full++; + ret = -ENOMEM; + goto exit; + } + + dw2 = (tq->tx_ring.gen ^ 0x1) << VMXNET3_TXD_GEN_SHIFT; + dw2 |= xdpf->len; + ctx.sop_txd = tq->tx_ring.base + tq->tx_ring.next2fill; + gdesc = ctx.sop_txd; + + buf_size = xdpf->len; + tbi = tq->buf_info + tq->tx_ring.next2fill; + tbi->map_type = VMXNET3_MAP_SINGLE; + tbi->dma_addr = dma_map_single(&adapter->pdev->dev, + xdpf->data, buf_size, + DMA_TO_DEVICE); + if (dma_mapping_error(&adapter->pdev->dev, tbi->dma_addr)) { + ret = -EFAULT; + goto exit; + } + tbi->len = buf_size; + + gdesc = tq->tx_ring.base + tq->tx_ring.next2fill; + BUG_ON(gdesc->txd.gen == tq->tx_ring.gen); + + gdesc->txd.addr = cpu_to_le64(tbi->dma_addr); + gdesc->dword[2] = cpu_to_le32(dw2); + + /* Setup the EOP desc */ + gdesc->dword[3] = cpu_to_le32(VMXNET3_TXD_CQ | VMXNET3_TXD_EOP); + + gdesc->txd.om = 0; + gdesc->txd.msscof = 0; + gdesc->txd.hlen = 0; + gdesc->txd.ti = 0; + + tx_num_deferred = le32_to_cpu(tq->shared->txNumDeferred); + tq->shared->txNumDeferred += 1; + tx_num_deferred++; + + vmxnet3_cmd_ring_adv_next2fill(&tq->tx_ring); + + /* set the last buf_info for the pkt */ + tbi->skb = skb; + tbi->sop_idx = ctx.sop_txd - tq->tx_ring.base; + + dma_wmb(); + gdesc->dword[2] = cpu_to_le32(le32_to_cpu(gdesc->dword[2]) ^ + VMXNET3_TXD_GEN); + if (tx_num_deferred >= le32_to_cpu(tq->shared->txThreshold)) { + 
tq->shared->txNumDeferred = 0; + VMXNET3_WRITE_BAR0_REG(adapter, + VMXNET3_REG_TXPROD + tq->qid * 8, + tq->tx_ring.next2fill); + } +exit: + return ret; +} + +static int +__vmxnet3_run_xdp(struct vmxnet3_rx_queue *rq, void *data, int data_len, + int headroom, int frame_sz, bool *need_xdp_flush, + struct sk_buff *skb) +{ + struct xdp_frame *xdpf; + void *buf_hard_start; + struct xdp_buff xdp; + struct page *page; + void *orig_data; + int err, delta; + int delta_len; + u32 act; + + buf_hard_start = data; + xdp_init_buff(&xdp, frame_sz, &rq->xdp_rxq); + xdp_prepare_buff(&xdp, buf_hard_start, headroom, data_len, false); + orig_data = xdp.data; + + act = bpf_prog_run_xdp(rq->xdp_bpf_prog, &xdp); + rq->stats.xdp_packets++; + + switch (act) { + case XDP_DROP: + rq->stats.xdp_drops++; + break; + case XDP_PASS: + /* bpf prog might change len and data position. + * dataring does not use skb so not support this. + */ + delta = xdp.data - orig_data; + delta_len = (xdp.data_end - xdp.data) - data_len; + if (skb) { + skb_reserve(skb, delta); + skb_put(skb, delta_len); + } + break; + case XDP_TX: + xdpf = xdp_convert_buff_to_frame(&xdp); + if (!xdpf || + vmxnet3_xdp_xmit_back(rq->adapter, xdpf, NULL)) { + rq->stats.xdp_drops++; + } else { + rq->stats.xdp_tx++; + } + break; + case XDP_ABORTED: + trace_xdp_exception(rq->adapter->netdev, rq->xdp_bpf_prog, + act); + rq->stats.xdp_aborted++; + break; + case XDP_REDIRECT: + page = alloc_page(GFP_ATOMIC); + if (!page) { + rq->stats.rx_buf_alloc_failure++; + return XDP_DROP; + } + xdp_init_buff(&xdp, PAGE_SIZE, &rq->xdp_rxq); + xdp_prepare_buff(&xdp, page_address(page), + XDP_PACKET_HEADROOM, + data_len, false); + memcpy(xdp.data, data, data_len); + err = xdp_do_redirect(rq->adapter->netdev, &xdp, + rq->xdp_bpf_prog); + if (!err) { + rq->stats.xdp_redirects++; + } else { + __free_page(page); + rq->stats.xdp_drops++; + } + *need_xdp_flush = true; + break; + default: + bpf_warn_invalid_xdp_action(rq->adapter->netdev, + rq->xdp_bpf_prog, act); + break; + } + return act; +} + +static int +vmxnet3_run_xdp(struct vmxnet3_rx_queue *rq, struct vmxnet3_rx_buf_info *rbi, + struct Vmxnet3_RxCompDesc *rcd, bool *need_flush, + bool rxDataRingUsed) +{ + struct vmxnet3_adapter *adapter; + int act = XDP_PASS; + void *data; + int sz; + + adapter = rq->adapter; + if (rxDataRingUsed) { + sz = rcd->rxdIdx * rq->data_ring.desc_size; + data = &rq->data_ring.base[sz]; + act = __vmxnet3_run_xdp(rq, data, rcd->len, 0, + rq->data_ring.desc_size, need_flush, + NULL); + } else { + dma_unmap_single(&adapter->pdev->dev, + rbi->dma_addr, + rbi->len, + DMA_FROM_DEVICE); + act = __vmxnet3_run_xdp(rq, rbi->skb->data, + rcd->len, XDP_PACKET_HEADROOM, + rbi->len, need_flush, rbi->skb); + } + return act; +} + static int vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq, struct vmxnet3_adapter *adapter, int quota) @@ -1404,6 +1670,8 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq, struct Vmxnet3_RxDesc rxCmdDesc; struct Vmxnet3_RxCompDesc rxComp; #endif + bool need_flush = 0; + vmxnet3_getRxComp(rcd, &rq->comp_ring.base[rq->comp_ring.next2proc].rcd, &rxComp); while (rcd->gen == rq->comp_ring.gen) { @@ -1444,6 +1712,60 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq, goto rcd_done; } + if (rcd->sop && rcd->eop && adapter->xdp_enabled) { + struct bpf_prog *xdp_prog; + bool rxDataRingUsed; + int act; + + if (unlikely(rcd->len == 0)) + goto rcd_done; + rxDataRingUsed = VMXNET3_RX_DATA_RING(adapter, + rcd->rqID); + rcu_read_lock(); + xdp_prog = rcu_dereference(rq->xdp_bpf_prog); + if 
(!xdp_prog) { + rcu_read_unlock(); + goto skip_xdp; + } + act = vmxnet3_run_xdp(rq, rbi, rcd, &need_flush, + rxDataRingUsed); + rcu_read_unlock(); + + switch (act) { + case XDP_PASS: + ctx->skb = NULL; + goto skip_xdp; + case XDP_DROP: + case XDP_TX: + case XDP_REDIRECT: + case XDP_ABORTED: + default: + /* Reuse and remap the existing buffer. + * No need to re-allocate. */ + if (rxDataRingUsed) + goto rcd_done; + + new_skb = rbi->skb; + new_dma_addr = + dma_map_single(&adapter->pdev->dev, + new_skb->data, rbi->len, + DMA_FROM_DEVICE); + if (dma_mapping_error(&adapter->pdev->dev, + new_dma_addr)) { + dev_kfree_skb(new_skb); + ctx->skb = NULL; + rq->stats.drop_total++; + goto rcd_done; + } + rbi->dma_addr = new_dma_addr; + rxd->addr = cpu_to_le64(rbi->dma_addr); + rxd->len = rbi->len; + ctx->skb = NULL; + skip_page_frags = true; + goto rcd_done; + } + } +skip_xdp: if (rcd->sop) { /* first buf of the pkt */ bool rxDataRingUsed; u16 len; @@ -1483,6 +1805,10 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq, goto rcd_done; } + if (adapter->xdp_enabled && !rxDataRingUsed) + skb_reserve(new_skb, + XDP_PACKET_HEADROOM); + if (rxDataRingUsed) { size_t sz; @@ -1730,6 +2056,8 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq, vmxnet3_getRxComp(rcd, &rq->comp_ring.base[rq->comp_ring.next2proc].rcd, &rxComp); } + if (need_flush) + xdp_do_flush_map(); return num_pkts; } @@ -1776,6 +2104,7 @@ vmxnet3_rq_cleanup(struct vmxnet3_rx_queue *rq, rq->comp_ring.gen = VMXNET3_INIT_GEN; rq->comp_ring.next2proc = 0; + rq->xdp_bpf_prog = NULL; } @@ -1788,6 +2117,32 @@ vmxnet3_rq_cleanup_all(struct vmxnet3_adapter *adapter) vmxnet3_rq_cleanup(&adapter->rx_queue[i], adapter); } +static void +vmxnet3_unregister_xdp_rxq(struct vmxnet3_rx_queue *rq) +{ + xdp_rxq_info_unreg_mem_model(&rq->xdp_rxq); + xdp_rxq_info_unreg(&rq->xdp_rxq); +} + +static int +vmxnet3_register_xdp_rxq(struct vmxnet3_rx_queue *rq, + struct vmxnet3_adapter *adapter) +{ + int err; + + err = xdp_rxq_info_reg(&rq->xdp_rxq, adapter->netdev, rq->qid, 0); + if (err < 0) { + return err; + } + + err = xdp_rxq_info_reg_mem_model(&rq->xdp_rxq, MEM_TYPE_PAGE_SHARED, + NULL); + if (err < 0) { + xdp_rxq_info_unreg(&rq->xdp_rxq); + return err; + } + return 0; +} static void vmxnet3_rq_destroy(struct vmxnet3_rx_queue *rq, struct vmxnet3_adapter *adapter) @@ -1832,6 +2187,8 @@ static void vmxnet3_rq_destroy(struct vmxnet3_rx_queue *rq, kfree(rq->buf_info[0]); rq->buf_info[0] = NULL; rq->buf_info[1] = NULL; + + vmxnet3_unregister_xdp_rxq(rq); } static void @@ -1893,6 +2250,10 @@ vmxnet3_rq_init(struct vmxnet3_rx_queue *rq, } vmxnet3_rq_alloc_rx_buf(rq, 1, rq->rx_ring[1].size - 1, adapter); + /* always register, even if no XDP prog used */ + if (vmxnet3_register_xdp_rxq(rq, adapter)) + return -EINVAL; + /* reset the comp ring */ rq->comp_ring.next2proc = 0; memset(rq->comp_ring.base, 0, rq->comp_ring.size * @@ -2585,7 +2946,8 @@ vmxnet3_setup_driver_shared(struct vmxnet3_adapter *adapter) if (adapter->netdev->features & NETIF_F_RXCSUM) devRead->misc.uptFeatures |= UPT1_F_RXCSUM; - if (adapter->netdev->features & NETIF_F_LRO) { + if (adapter->netdev->features & NETIF_F_LRO && + !adapter->xdp_enabled) { devRead->misc.uptFeatures |= UPT1_F_LRO; devRead->misc.maxNumRxSG = cpu_to_le16(1 + MAX_SKB_FRAGS); } @@ -3025,6 +3387,14 @@ vmxnet3_free_pci_resources(struct vmxnet3_adapter *adapter) pci_disable_device(adapter->pdev); } +static int +vmxnet3_xdp_headroom(struct vmxnet3_adapter *adapter) +{ + if (adapter->xdp_enabled) + return VMXNET3_XDP_ROOM; + else + return 
0; +} static void vmxnet3_adjust_rx_ring_size(struct vmxnet3_adapter *adapter) @@ -3035,7 +3405,8 @@ vmxnet3_adjust_rx_ring_size(struct vmxnet3_adapter *adapter) if (adapter->netdev->mtu <= VMXNET3_MAX_SKB_BUF_SIZE - VMXNET3_MAX_ETH_HDR_SIZE) { adapter->skb_buf_size = adapter->netdev->mtu + - VMXNET3_MAX_ETH_HDR_SIZE; + VMXNET3_MAX_ETH_HDR_SIZE + + vmxnet3_xdp_headroom(adapter); if (adapter->skb_buf_size < VMXNET3_MIN_T0_BUF_SIZE) adapter->skb_buf_size = VMXNET3_MIN_T0_BUF_SIZE; @@ -3563,6 +3934,93 @@ vmxnet3_reset_work(struct work_struct *data) clear_bit(VMXNET3_STATE_BIT_RESETTING, &adapter->state); } +static void +vmxnet3_xdp_exchange_program(struct vmxnet3_adapter *adapter, + struct bpf_prog *prog) +{ + struct vmxnet3_rx_queue *rq; + int i; + + for (i = 0; i < adapter->num_rx_queues; i++) { + rq = &adapter->rx_queue[i]; + rcu_assign_pointer(rq->xdp_bpf_prog, prog); + } + if (prog) + adapter->xdp_enabled = true; + else + adapter->xdp_enabled = false; +} + +static int +vmxnet3_xdp_set(struct net_device *netdev, struct netdev_bpf *bpf, + struct netlink_ext_ack *extack) +{ + struct vmxnet3_adapter *adapter = netdev_priv(netdev); + struct bpf_prog *new_bpf_prog = bpf->prog; + struct bpf_prog *old_bpf_prog; + bool use_dataring; + bool need_update; + bool running; + int err = 0; + + if (new_bpf_prog && netdev->mtu > VMXNET3_XDP_MAX_MTU) { + NL_SET_ERR_MSG_MOD(extack, "MTU too large for XDP"); + return -EOPNOTSUPP; + } + + old_bpf_prog = READ_ONCE(adapter->rx_queue[0].xdp_bpf_prog); + if (!new_bpf_prog && !old_bpf_prog) { + adapter->xdp_enabled = false; + return 0; + } + running = netif_running(netdev); + need_update = !!old_bpf_prog != !!new_bpf_prog; + + if (running && need_update) { + vmxnet3_quiesce_dev(adapter); + } + + vmxnet3_xdp_exchange_program(adapter, new_bpf_prog); + if (old_bpf_prog) { + bpf_prog_put(old_bpf_prog); + } + + if (running && need_update) { + vmxnet3_reset_dev(adapter); + vmxnet3_rq_destroy_all(adapter); + vmxnet3_adjust_rx_ring_size(adapter); + err = vmxnet3_rq_create_all(adapter); + if (err) { + NL_SET_ERR_MSG_MOD(extack, + "failed to re-create rx queues for XDP."); + err = -EOPNOTSUPP; + goto out; + } + err = vmxnet3_activate_dev(adapter); + if (err) { + NL_SET_ERR_MSG_MOD(extack, + "failed to activate device for XDP."); + err = -EOPNOTSUPP; + goto out; + } + clear_bit(VMXNET3_STATE_BIT_RESETTING, &adapter->state); + } +out: + return err; +} + +/* This is the main xdp call used by kernel to set/unset eBPF program. 
*/ +static int +vmxnet3_xdp(struct net_device *netdev, struct netdev_bpf *bpf) +{ + switch (bpf->command) { + case XDP_SETUP_PROG: + return vmxnet3_xdp_set(netdev, bpf, bpf->extack); + default: + return -EINVAL; + } + return 0; +} static int vmxnet3_probe_device(struct pci_dev *pdev, @@ -3585,6 +4043,8 @@ vmxnet3_probe_device(struct pci_dev *pdev, #ifdef CONFIG_NET_POLL_CONTROLLER .ndo_poll_controller = vmxnet3_netpoll, #endif + .ndo_bpf = vmxnet3_xdp, + .ndo_xdp_xmit = vmxnet3_xdp_xmit, }; int err; u32 ver; @@ -3900,6 +4360,7 @@ vmxnet3_probe_device(struct pci_dev *pdev, goto err_register; } + adapter->xdp_enabled = false; vmxnet3_check_link(adapter, false); return 0; diff --git a/drivers/net/vmxnet3/vmxnet3_ethtool.c b/drivers/net/vmxnet3/vmxnet3_ethtool.c index 18cf7c723201..6f542236b26e 100644 --- a/drivers/net/vmxnet3/vmxnet3_ethtool.c +++ b/drivers/net/vmxnet3/vmxnet3_ethtool.c @@ -76,6 +76,10 @@ vmxnet3_tq_driver_stats[] = { copy_skb_header) }, { " giant hdr", offsetof(struct vmxnet3_tq_driver_stats, oversized_hdr) }, + { " xdp xmit", offsetof(struct vmxnet3_tq_driver_stats, + xdp_xmit) }, + { " xdp xmit err", offsetof(struct vmxnet3_tq_driver_stats, + xdp_xmit_err) }, }; /* per rq stats maintained by the device */ @@ -106,6 +110,16 @@ vmxnet3_rq_driver_stats[] = { drop_fcs) }, { " rx buf alloc fail", offsetof(struct vmxnet3_rq_driver_stats, rx_buf_alloc_failure) }, + { " xdp packets", offsetof(struct vmxnet3_rq_driver_stats, + xdp_packets) }, + { " xdp tx", offsetof(struct vmxnet3_rq_driver_stats, + xdp_tx) }, + { " xdp redirects", offsetof(struct vmxnet3_rq_driver_stats, + xdp_redirects) }, + { " xdp drops", offsetof(struct vmxnet3_rq_driver_stats, + xdp_drops) }, + { " xdp aborted", offsetof(struct vmxnet3_rq_driver_stats, + xdp_aborted) }, }; /* global stats maintained by the driver */ diff --git a/drivers/net/vmxnet3/vmxnet3_int.h b/drivers/net/vmxnet3/vmxnet3_int.h index 3367db23aa13..24ac14c1abc9 100644 --- a/drivers/net/vmxnet3/vmxnet3_int.h +++ b/drivers/net/vmxnet3/vmxnet3_int.h @@ -56,6 +56,8 @@ #include #include #include +#include +#include #include "vmxnet3_defs.h" @@ -217,6 +219,9 @@ struct vmxnet3_tq_driver_stats { u64 linearized; /* # of pkts linearized */ u64 copy_skb_header; /* # of times we have to copy skb header */ u64 oversized_hdr; + + u64 xdp_xmit; + u64 xdp_xmit_err; }; struct vmxnet3_tx_ctx { @@ -285,6 +290,12 @@ struct vmxnet3_rq_driver_stats { u64 drop_err; u64 drop_fcs; u64 rx_buf_alloc_failure; + + u64 xdp_packets; /* Total packets processed by XDP. 
*/ + u64 xdp_tx; + u64 xdp_redirects; + u64 xdp_drops; + u64 xdp_aborted; }; struct vmxnet3_rx_data_ring { @@ -307,6 +318,8 @@ struct vmxnet3_rx_queue { struct vmxnet3_rx_buf_info *buf_info[2]; struct Vmxnet3_RxQueueCtrl *shared; struct vmxnet3_rq_driver_stats stats; + struct bpf_prog __rcu *xdp_bpf_prog; + struct xdp_rxq_info xdp_rxq; } __attribute__((__aligned__(SMP_CACHE_BYTES))); #define VMXNET3_DEVICE_MAX_TX_QUEUES 32 @@ -415,6 +428,7 @@ struct vmxnet3_adapter { u16 tx_prod_offset; u16 rx_prod_offset; u16 rx_prod2_offset; + bool xdp_enabled; }; #define VMXNET3_WRITE_BAR0_REG(adapter, reg, val) \ @@ -457,6 +471,11 @@ struct vmxnet3_adapter { #define VMXNET3_MAX_ETH_HDR_SIZE 22 #define VMXNET3_MAX_SKB_BUF_SIZE (3*1024) +#define VMXNET3_XDP_ROOM SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) + \ + XDP_PACKET_HEADROOM +#define VMXNET3_XDP_MAX_MTU VMXNET3_MAX_SKB_BUF_SIZE - VMXNET3_XDP_ROOM + + #define VMXNET3_GET_RING_IDX(adapter, rqID) \ ((rqID >= adapter->num_rx_queues && \ rqID < 2 * adapter->num_rx_queues) ? 1 : 0) \
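
Note (not part of the patch, added for clarity): with the definitions
above, the XDP MTU ceiling works out roughly as follows. The exact value
depends on architecture and kernel configuration; the numbers below assume
XDP_PACKET_HEADROOM of 256 bytes and
SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) of about 320 bytes on a
typical 64-bit build with 64-byte cache lines:

    VMXNET3_XDP_ROOM    ~= 320 + 256  = 576 bytes
    VMXNET3_XDP_MAX_MTU ~= 3072 - 576 = 2496 bytes

so vmxnet3_xdp_set() rejects attaching a program while the MTU is above
roughly 2.5K, matching limitation (b) in the commit message.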