From patchwork Thu Apr 18 11:36:13 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexander Lobakin
X-Patchwork-Id: 13634541
From: Alexander Lobakin
To: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Alexander Lobakin, Alexander Duyck, Yunsheng Lin,
 Jesper Dangaard Brouer, Ilias Apalodimas, Christoph Lameter,
 Vlastimil Babka, Andrew Morton, Przemek Kitszel,
 nex.sw.ncis.osdt.itp.upstreaming@intel.com, netdev@vger.kernel.org,
 intel-wired-lan@lists.osuosl.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Subject: [PATCH net-next v10 07/10] libeth: add Rx buffer management
Date: Thu, 18 Apr 2024 13:36:13 +0200
Message-ID: <20240418113616.1108566-8-aleksander.lobakin@intel.com>
X-Mailer: git-send-email 2.44.0
In-Reply-To: <20240418113616.1108566-1-aleksander.lobakin@intel.com>
References: <20240418113616.1108566-1-aleksander.lobakin@intel.com>

Add a couple intuitive helpers to hide Rx buffer implementation details
in the library and not multiplicate it between drivers. The settings are
sorta optimized for 100G+ NICs, but nothing really HW-specific here.
Use the new page_pool_dev_alloc() to dynamically switch between
split-page and full-page modes depending on MTU, page size, required
headroom etc. For example, on x86_64 with the default driver settings
each page is shared between 2 buffers.
Turning on XDP (not in this series) increases the headroom requirement
and pushes truesize past the 2048 boundary, so each buffer starts
getting a full page.
The "ceiling" limit is %PAGE_SIZE, as only order-0 pages are used to
avoid compound overhead. For the above architecture, this means a
maximum linear frame size of 3712 w/o XDP.
Note that &libeth_buf_queue is not a complete queue/ring structure for
now, rather a shim, but eventually the libeth-enabled drivers will move
to it, with iavf being the first one.

Signed-off-by: Alexander Lobakin
---
kernel-doc for libeth_fq::fp currently generates an "Excess struct
member" warning, here's a patch which fixes the script: [0]

[0] https://lore.kernel.org/linux-doc/20240411093208.2483580-1-aleksander.lobakin@intel.com
---
 drivers/net/ethernet/intel/libeth/Kconfig |   1 +
 include/net/libeth/rx.h                   | 117 ++++++++++++++++++++++
 drivers/net/ethernet/intel/libeth/rx.c    |  98 ++++++++++++++++++
 3 files changed, 216 insertions(+)

diff --git a/drivers/net/ethernet/intel/libeth/Kconfig b/drivers/net/ethernet/intel/libeth/Kconfig
index af970a63c227..480293b71dbc 100644
--- a/drivers/net/ethernet/intel/libeth/Kconfig
+++ b/drivers/net/ethernet/intel/libeth/Kconfig
@@ -3,6 +3,7 @@
 
 config LIBETH
 	tristate
+	select PAGE_POOL
 	help
 	  libeth is a common library containing routines shared between several
 	  drivers, but not yet promoted to the generic kernel API.
diff --git a/include/net/libeth/rx.h b/include/net/libeth/rx.h
index 0807e19f44b3..f29ea3e34c6c 100644
--- a/include/net/libeth/rx.h
+++ b/include/net/libeth/rx.h
@@ -4,8 +4,125 @@
 #ifndef __LIBETH_RX_H
 #define __LIBETH_RX_H
 
+#include
+
+#include
 #include
 
+/* Rx buffer management */
+
+/* Space reserved in front of each frame */
+#define LIBETH_SKB_HEADROOM (NET_SKB_PAD + NET_IP_ALIGN)
+/* Maximum headroom for worst-case calculations */
+#define LIBETH_MAX_HEADROOM LIBETH_SKB_HEADROOM
+/* Link layer / L2 overhead: Ethernet, 2 VLAN tags (C + S), FCS */
+#define LIBETH_RX_LL_LEN (ETH_HLEN + 2 * VLAN_HLEN + ETH_FCS_LEN)
+
+/* Always use order-0 pages */
+#define LIBETH_RX_PAGE_ORDER 0
+/* Pick a sane buffer stride and align to a cacheline boundary */
+#define LIBETH_RX_BUF_STRIDE SKB_DATA_ALIGN(128)
+/* HW-writeable space in one buffer: truesize - headroom/tailroom, aligned */
+#define LIBETH_RX_PAGE_LEN(hr) \
+	ALIGN_DOWN(SKB_MAX_ORDER(hr, LIBETH_RX_PAGE_ORDER), \
+		   LIBETH_RX_BUF_STRIDE)
+
+/**
+ * struct libeth_fqe - structure representing an Rx buffer (fill queue element)
+ * @page: page holding the buffer
+ * @offset: offset from the page start (to the headroom)
+ * @truesize: total space occupied by the buffer (w/ headroom and tailroom)
+ *
+ * Depending on the MTU, API switches between one-page-per-frame and shared
+ * page model (to conserve memory on bigger-page platforms). In case of the
+ * former, @offset is always 0 and @truesize is always %PAGE_SIZE.
+ */
+struct libeth_fqe {
+	struct page *page;
+	u32 offset;
+	u32 truesize;
+} __aligned_largest;
+
+/**
+ * struct libeth_fq - structure representing a buffer (fill) queue
+ * @fp: hotpath part of the structure
+ * @pp: &page_pool for buffer management
+ * @fqes: array of Rx buffers
+ * @truesize: size to allocate per buffer, w/overhead
+ * @count: number of descriptors/buffers the queue has
+ * @buf_len: HW-writeable length per each buffer
+ * @nid: ID of the closest NUMA node with memory
+ */
+struct libeth_fq {
+	struct_group_tagged(libeth_fq_fp, fp,
+		struct page_pool *pp;
+		struct libeth_fqe *fqes;
+
+		u32 truesize;
+		u32 count;
+	);
+
+	/* Cold fields */
+	u32 buf_len;
+	int nid;
+};
+
+int libeth_rx_fq_create(struct libeth_fq *fq, struct napi_struct *napi);
+void libeth_rx_fq_destroy(struct libeth_fq *fq);
+
+/**
+ * libeth_rx_alloc - allocate a new Rx buffer
+ * @fq: fill queue to allocate for
+ * @i: index of the buffer within the queue
+ *
+ * Return: DMA address to be passed to HW for Rx on successful allocation,
+ * %DMA_MAPPING_ERROR otherwise.
+ */
+static inline dma_addr_t libeth_rx_alloc(const struct libeth_fq_fp *fq, u32 i)
+{
+	struct libeth_fqe *buf = &fq->fqes[i];
+
+	buf->truesize = fq->truesize;
+	buf->page = page_pool_dev_alloc(fq->pp, &buf->offset, &buf->truesize);
+	if (unlikely(!buf->page))
+		return DMA_MAPPING_ERROR;
+
+	return page_pool_get_dma_addr(buf->page) + buf->offset +
+	       fq->pp->p.offset;
+}
+
+void libeth_rx_recycle_slow(struct page *page);
+
+/**
+ * libeth_rx_sync_for_cpu - synchronize or recycle buffer post DMA
+ * @fqe: buffer to process
+ * @len: frame length from the descriptor
+ *
+ * Process the buffer after it's written by HW. The regular path is to
+ * synchronize DMA for CPU, but in case of no data it will be immediately
+ * recycled back to its PP.
+ *
+ * Return: true when there's data to process, false otherwise.
+ */
+static inline bool libeth_rx_sync_for_cpu(const struct libeth_fqe *fqe,
+					  u32 len)
+{
+	struct page *page = fqe->page;
+
+	/* Very rare, but possible case. The most common reason:
+	 * the last fragment contained FCS only, which was then
+	 * stripped by the HW.
+	 */
+	if (unlikely(!len)) {
+		libeth_rx_recycle_slow(page);
+		return false;
+	}
+
+	page_pool_dma_sync_for_cpu(page->pp, page, fqe->offset, len);
+
+	return true;
+}
+
 /* Converting abstract packet type numbers into a software structure with
  * the packet parameters to do O(1) lookup on Rx.
  */
diff --git a/drivers/net/ethernet/intel/libeth/rx.c b/drivers/net/ethernet/intel/libeth/rx.c
index 879c4dfd6a4e..6221b88c34ac 100644
--- a/drivers/net/ethernet/intel/libeth/rx.c
+++ b/drivers/net/ethernet/intel/libeth/rx.c
@@ -3,6 +3,104 @@
 
 #include
 
+/* Rx buffer management */
+
+/**
+ * libeth_rx_hw_len - get the actual buffer size to be passed to HW
+ * @pp: &page_pool_params of the netdev to calculate the size for
+ * @max_len: maximum buffer size for a single descriptor
+ *
+ * Return: HW-writeable length per one buffer to pass it to the HW accounting:
+ * MTU the @dev has, HW required alignment, minimum and maximum allowed values,
+ * and system's page size.
+ */
+static u32 libeth_rx_hw_len(const struct page_pool_params *pp, u32 max_len)
+{
+	u32 len;
+
+	len = READ_ONCE(pp->netdev->mtu) + LIBETH_RX_LL_LEN;
+	len = ALIGN(len, LIBETH_RX_BUF_STRIDE);
+	len = min3(len, ALIGN_DOWN(max_len ? : U32_MAX, LIBETH_RX_BUF_STRIDE),
+		   pp->max_len);
+
+	return len;
+}
+
+/**
+ * libeth_rx_fq_create - create a PP with the default libeth settings
+ * @fq: buffer queue struct to fill
+ * @napi: &napi_struct covering this PP (no usage outside its poll loops)
+ *
+ * Return: %0 on success, -%errno on failure.
+ */
+int libeth_rx_fq_create(struct libeth_fq *fq, struct napi_struct *napi)
+{
+	struct page_pool_params pp = {
+		.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
+		.order = LIBETH_RX_PAGE_ORDER,
+		.pool_size = fq->count,
+		.nid = fq->nid,
+		.dev = napi->dev->dev.parent,
+		.netdev = napi->dev,
+		.napi = napi,
+		.dma_dir = DMA_FROM_DEVICE,
+		.offset = LIBETH_SKB_HEADROOM,
+	};
+	struct libeth_fqe *fqes;
+	struct page_pool *pool;
+
+	/* HW-writeable / syncable length per one page */
+	pp.max_len = LIBETH_RX_PAGE_LEN(pp.offset);
+
+	/* HW-writeable length per buffer */
+	fq->buf_len = libeth_rx_hw_len(&pp, fq->buf_len);
+	/* Buffer size to allocate */
+	fq->truesize = roundup_pow_of_two(SKB_HEAD_ALIGN(pp.offset +
+							 fq->buf_len));
+
+	pool = page_pool_create(&pp);
+	if (IS_ERR(pool))
+		return PTR_ERR(pool);
+
+	fqes = kvcalloc_node(fq->count, sizeof(*fqes), GFP_KERNEL, fq->nid);
+	if (!fqes)
+		goto err_buf;
+
+	fq->fqes = fqes;
+	fq->pp = pool;
+
+	return 0;
+
+err_buf:
+	page_pool_destroy(pool);
+
+	return -ENOMEM;
+}
+EXPORT_SYMBOL_NS_GPL(libeth_rx_fq_create, LIBETH);
+
+/**
+ * libeth_rx_fq_destroy - destroy a &page_pool created by libeth
+ * @fq: buffer queue to process
+ */
+void libeth_rx_fq_destroy(struct libeth_fq *fq)
+{
+	kvfree(fq->fqes);
+	page_pool_destroy(fq->pp);
+}
+EXPORT_SYMBOL_NS_GPL(libeth_rx_fq_destroy, LIBETH);
+
+/**
+ * libeth_rx_recycle_slow - recycle a libeth page from the NAPI context
+ * @page: page to recycle
+ *
+ * To be used on exceptions or rare cases not requiring fast inline recycling.
+ */
+void libeth_rx_recycle_slow(struct page *page)
+{
+	page_pool_recycle_direct(page->pp, page);
+}
+EXPORT_SYMBOL_NS_GPL(libeth_rx_recycle_slow, LIBETH);
+
 /* Converting abstract packet type numbers into a software structure with
  * the packet parameters to do O(1) lookup on Rx.
  */
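
Not part of the patch: the sizing arithmetic in libeth_rx_hw_len() and libeth_rx_fq_create() can be checked in userspace with a rough sketch like the one below. Every sk_* name and every SK_* constant is hypothetical and only approximates the kernel macros. The numbers are assumptions for x86_64 (4096-byte pages, 64-byte cachelines, NET_SKB_PAD + NET_IP_ALIGN = 64, and a 320-byte skb_shared_info, picked so that the result matches the 3712 figure quoted in the commit message). The max_len argument is treated as 0, i.e. no per-descriptor limit.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86_64 values, not taken from the patch itself */
#define SK_PAGE_SIZE	4096u	/* PAGE_SIZE */
#define SK_CACHELINE	64u	/* SMP_CACHE_BYTES */
#define SK_SHINFO	320u	/* sizeof(struct skb_shared_info), assumed */
#define SK_STRIDE	128u	/* LIBETH_RX_BUF_STRIDE for this cacheline */
/* ETH_HLEN + 2 * VLAN_HLEN + ETH_FCS_LEN, as in LIBETH_RX_LL_LEN */
#define SK_LL_LEN	(14u + 2u * 4u + 4u)

/* Round x up to a multiple of a; a must be a power of two */
static uint32_t sk_align_up(uint32_t x, uint32_t a)
{
	assert(a && !(a & (a - 1)));
	return (x + a - 1) & ~(a - 1);
}

/* Round x down to a multiple of a; a must be a power of two */
static uint32_t sk_align_down(uint32_t x, uint32_t a)
{
	assert(a && !(a & (a - 1)));
	return x & ~(a - 1);
}

/* Smallest power of two that is >= x */
static uint32_t sk_roundup_pow_of_two(uint32_t x)
{
	uint32_t r = 1;

	while (r < x)
		r <<= 1;
	return r;
}

/* Approximates LIBETH_RX_PAGE_LEN(hr): HW-writeable bytes in one page */
static uint32_t sk_page_len(uint32_t hr)
{
	uint32_t shinfo = sk_align_up(SK_SHINFO, SK_CACHELINE);

	return sk_align_down(SK_PAGE_SIZE - hr - shinfo, SK_STRIDE);
}

/* Approximates libeth_rx_hw_len() with max_len == 0 */
static uint32_t sk_hw_len(uint32_t hr, uint32_t mtu)
{
	uint32_t len = sk_align_up(mtu + SK_LL_LEN, SK_STRIDE);
	uint32_t cap = sk_page_len(hr);

	return len < cap ? len : cap;
}

/* Approximates the truesize computation in libeth_rx_fq_create() */
static uint32_t sk_truesize(uint32_t hr, uint32_t mtu)
{
	uint32_t head = sk_align_up(hr + sk_hw_len(hr, mtu), SK_CACHELINE) +
			sk_align_up(SK_SHINFO, SK_CACHELINE);

	return sk_roundup_pow_of_two(head);
}
```

Under these assumptions, sk_truesize(64, 1500) gives 2048 (two buffers per page), while a 256-byte headroom, which is what XDP would be expected to require, gives 4096 (one buffer per page). sk_page_len(64) gives 3712.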