From patchwork Thu Apr 4 15:44:00 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexander Lobakin
X-Patchwork-Id: 13618016
From: Alexander Lobakin
To: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Alexander Lobakin, Alexander Duyck, Yunsheng Lin,
 Jesper Dangaard Brouer, Ilias Apalodimas, Christoph Lameter,
 Vlastimil Babka, Andrew Morton,
 nex.sw.ncis.osdt.itp.upstreaming@intel.com, netdev@vger.kernel.org,
 intel-wired-lan@lists.osuosl.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Subject: [PATCH net-next v9 7/9] libeth: add Rx buffer management
Date: Thu, 4 Apr 2024 17:44:00 +0200
Message-ID: <20240404154402.3581254-8-aleksander.lobakin@intel.com>
X-Mailer: git-send-email 2.44.0
In-Reply-To: <20240404154402.3581254-1-aleksander.lobakin@intel.com>
References: <20240404154402.3581254-1-aleksander.lobakin@intel.com>

Add a couple of intuitive helpers to hide Rx buffer implementation details
in the library and avoid duplicating them across drivers. The settings are
roughly optimized for 100G+ NICs, but nothing here is really HW-specific.
Use the new page_pool_dev_alloc() to dynamically switch between
split-page and full-page modes depending on MTU, page size, required
headroom etc. For example, on x86_64 with the default driver settings
each page is shared between 2 buffers. Turning on XDP (not in this
series) increases the headroom requirement, which pushes truesize past
the 2048 boundary, so each buffer then gets a full page.
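
To make the split-page arithmetic concrete, here is a minimal userspace
sketch, not kernel code. It mirrors the
roundup_pow_of_two(SKB_HEAD_ALIGN(...)) step from libeth_rx_fq_create() in
this patch. The model_* names and every constant are assumptions made for
illustration: 4 KiB pages, 64-byte cachelines, a 320-byte struct
skb_shared_info, 64 bytes of default headroom, 256 bytes of headroom with
XDP, and a 1536-byte buffer (MTU 1500 plus 26 bytes of L2 overhead,
rounded up to a multiple of 128).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for x86_64; the kernel takes them from its own headers. */
#define MODEL_PAGE_SIZE		4096u	/* PAGE_SIZE */
#define MODEL_CACHE_BYTES	64u	/* SMP_CACHE_BYTES */
#define MODEL_SHINFO_SIZE	320u	/* sizeof(struct skb_shared_info) */

/* Round x up to a multiple of a; a must be a power of two. */
static uint32_t model_align(uint32_t x, uint32_t a)
{
	assert(a && !(a & (a - 1)));

	return (x + a - 1) & ~(a - 1);
}

/* Smallest power of two that is >= x, for x >= 1. */
static uint32_t model_roundup_pow2(uint32_t x)
{
	uint32_t p = 1;

	while (p < x)
		p <<= 1;

	return p;
}

/* Models roundup_pow_of_two(SKB_HEAD_ALIGN(headroom + buf_len)). */
static uint32_t model_truesize(uint32_t headroom, uint32_t buf_len)
{
	uint32_t head = model_align(headroom + buf_len, MODEL_CACHE_BYTES) +
			model_align(MODEL_SHINFO_SIZE, MODEL_CACHE_BYTES);

	return model_roundup_pow2(head);
}

/* Number of buffers of the given truesize that fit in one order-0 page. */
static uint32_t model_bufs_per_page(uint32_t truesize)
{
	return MODEL_PAGE_SIZE / truesize;
}
```

Under these assumptions, model_truesize(64, 1536) is 2048, so two buffers
share a page, while model_truesize(256, 1536) is 4096, so each buffer
takes a full page.
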
The "ceiling" limit is %PAGE_SIZE, as only order-0 pages are used to
avoid compound overhead. For the above architecture, this means a maximum
linear frame size of 3712 w/o XDP.
Note that &libeth_buf_queue is not a complete queue/ring structure for
now, rather a shim, but eventually the libeth-enabled drivers will move
to it, with iavf being the first one.

Signed-off-by: Alexander Lobakin
---
 drivers/net/ethernet/intel/libeth/Kconfig |   1 +
 include/net/libeth/rx.h                   | 117 ++++++++++++++++++++++
 drivers/net/ethernet/intel/libeth/rx.c    |  98 ++++++++++++++++++
 3 files changed, 216 insertions(+)

diff --git a/drivers/net/ethernet/intel/libeth/Kconfig b/drivers/net/ethernet/intel/libeth/Kconfig
index af970a63c227..480293b71dbc 100644
--- a/drivers/net/ethernet/intel/libeth/Kconfig
+++ b/drivers/net/ethernet/intel/libeth/Kconfig
@@ -3,6 +3,7 @@
 
 config LIBETH
 	tristate
+	select PAGE_POOL
 	help
 	  libeth is a common library containing routines shared between several
 	  drivers, but not yet promoted to the generic kernel API.
diff --git a/include/net/libeth/rx.h b/include/net/libeth/rx.h
index aaf9c2cdf7fd..3db2bda4eab6 100644
--- a/include/net/libeth/rx.h
+++ b/include/net/libeth/rx.h
@@ -4,8 +4,125 @@
 #ifndef __LIBETH_RX_H
 #define __LIBETH_RX_H
 
+#include
+
+#include
 #include
 
+/* Rx buffer management */
+
+/* Space reserved in front of each frame */
+#define LIBETH_SKB_HEADROOM	(NET_SKB_PAD + NET_IP_ALIGN)
+/* Maximum headroom for worst-case calculations */
+#define LIBETH_MAX_HEADROOM	LIBETH_SKB_HEADROOM
+/* Link layer / L2 overhead: Ethernet, 2 VLAN tags (C + S), FCS */
+#define LIBETH_RX_LL_LEN	(ETH_HLEN + 2 * VLAN_HLEN + ETH_FCS_LEN)
+
+/* Always use order-0 pages */
+#define LIBETH_RX_PAGE_ORDER	0
+/* Pick a sane buffer stride and align to a cacheline boundary */
+#define LIBETH_RX_BUF_STRIDE	SKB_DATA_ALIGN(128)
+/* HW-writeable space in one buffer: truesize - headroom/tailroom, aligned */
+#define LIBETH_RX_PAGE_LEN(hr)						\
+	ALIGN_DOWN(SKB_MAX_ORDER(hr, LIBETH_RX_PAGE_ORDER),		\
+		   LIBETH_RX_BUF_STRIDE)
+
+/**
+ * struct libeth_fqe - structure representing an Rx buffer
+ * @page: page holding the buffer
+ * @offset: offset from the page start (to the headroom)
+ * @truesize: total space occupied by the buffer (w/ headroom and tailroom)
+ *
+ * Depending on the MTU, API switches between one-page-per-frame and shared
+ * page model (to conserve memory on bigger-page platforms). In case of the
+ * former, @offset is always 0 and @truesize is always %PAGE_SIZE.
+ */
+struct libeth_fqe {
+	struct page *page;
+	u32 offset;
+	u32 truesize;
+} __aligned_largest;
+
+/**
+ * struct libeth_fq - structure representing a buffer queue
+ * @fp: hotpath part of the structure
+ * @pp: &page_pool for buffer management
+ * @fqes: array of Rx buffers
+ * @truesize: size to allocate per buffer, w/overhead
+ * @count: number of descriptors/buffers the queue has
+ * @buf_len: HW-writeable length per each buffer
+ * @nid: ID of the closest NUMA node with memory
+ */
+struct libeth_fq {
+	struct_group_tagged(libeth_fq_fp, fp,
+		struct page_pool *pp;
+		struct libeth_fqe *fqes;
+
+		u32 truesize;
+		u32 count;
+	);
+
+	/* Cold fields */
+	u32 buf_len;
+	int nid;
+};
+
+int libeth_rx_fq_create(struct libeth_fq *fq, struct napi_struct *napi);
+void libeth_rx_fq_destroy(struct libeth_fq *fq);
+
+/**
+ * libeth_rx_alloc - allocate a new Rx buffer
+ * @fq: buffer queue to allocate for
+ * @i: index of the buffer within the queue
+ *
+ * Return: DMA address to be passed to HW for Rx on successful allocation,
+ * %DMA_MAPPING_ERROR otherwise.
+ */
+static inline dma_addr_t libeth_rx_alloc(const struct libeth_fq_fp *fq, u32 i)
+{
+	struct libeth_fqe *buf = &fq->fqes[i];
+
+	buf->truesize = fq->truesize;
+	buf->page = page_pool_dev_alloc(fq->pp, &buf->offset, &buf->truesize);
+	if (unlikely(!buf->page))
+		return DMA_MAPPING_ERROR;
+
+	return page_pool_get_dma_addr(buf->page) + buf->offset +
+	       fq->pp->p.offset;
+}
+
+void libeth_rx_recycle_slow(struct page *page);
+
+/**
+ * libeth_rx_sync_for_cpu - synchronize or recycle buffer post DMA
+ * @fqe: buffer to process
+ * @len: frame length from the descriptor
+ *
+ * Process the buffer after it's written by HW. The regular path is to
+ * synchronize DMA for CPU, but in case of no data it will be immediately
+ * recycled back to its PP.
+ *
+ * Return: true when there's data to process, false otherwise.
+ */
+static inline bool libeth_rx_sync_for_cpu(const struct libeth_fqe *fqe,
+					  u32 len)
+{
+	struct page *page = fqe->page;
+
+	/* Very rare, but possible case. The most common reason:
+	 * the last fragment contained FCS only, which was then
+	 * stripped by the HW.
+	 */
+	if (unlikely(!len)) {
+		libeth_rx_recycle_slow(page);
+		return false;
+	}
+
+	page_pool_dma_sync_for_cpu(page->pp, page, fqe->offset, len);
+
+	return true;
+}
+
 /* Converting abstract packet type numbers into a software structure with
  * the packet parameters to do O(1) lookup on Rx.
  */
diff --git a/drivers/net/ethernet/intel/libeth/rx.c b/drivers/net/ethernet/intel/libeth/rx.c
index 86f17e29b47d..a557a1ebcbe5 100644
--- a/drivers/net/ethernet/intel/libeth/rx.c
+++ b/drivers/net/ethernet/intel/libeth/rx.c
@@ -3,6 +3,104 @@
 
 #include
 
+/* Rx buffer management */
+
+/**
+ * libeth_rx_hw_len - get the actual buffer size to be passed to HW
+ * @pp: &page_pool_params of the netdev to calculate the size for
+ * @max_len: maximum buffer size for a single descriptor
+ *
+ * Return: HW-writeable length per one buffer to pass it to the HW accounting:
+ * MTU the @dev has, HW required alignment, minimum and maximum allowed values,
+ * and system's page size.
+ */
+static u32 libeth_rx_hw_len(const struct page_pool_params *pp, u32 max_len)
+{
+	u32 len;
+
+	len = READ_ONCE(pp->netdev->mtu) + LIBETH_RX_LL_LEN;
+	len = ALIGN(len, LIBETH_RX_BUF_STRIDE);
+	len = min3(len, ALIGN_DOWN(max_len ? : U32_MAX, LIBETH_RX_BUF_STRIDE),
+		   pp->max_len);
+
+	return len;
+}
+
+/**
+ * libeth_rx_fq_create - create a PP with the default libeth settings
+ * @fq: buffer queue struct to fill
+ * @napi: &napi_struct covering this PP (no usage outside its poll loops)
+ *
+ * Return: %0 on success, -%errno on failure.
+ */
+int libeth_rx_fq_create(struct libeth_fq *fq, struct napi_struct *napi)
+{
+	struct page_pool_params pp = {
+		.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
+		.order = LIBETH_RX_PAGE_ORDER,
+		.pool_size = fq->count,
+		.nid = fq->nid,
+		.dev = napi->dev->dev.parent,
+		.netdev = napi->dev,
+		.napi = napi,
+		.dma_dir = DMA_FROM_DEVICE,
+		.offset = LIBETH_SKB_HEADROOM,
+	};
+	struct libeth_fqe *fqes;
+	struct page_pool *pool;
+
+	/* HW-writeable / syncable length per one page */
+	pp.max_len = LIBETH_RX_PAGE_LEN(pp.offset);
+
+	/* HW-writeable length per buffer */
+	fq->buf_len = libeth_rx_hw_len(&pp, fq->buf_len);
+	/* Buffer size to allocate */
+	fq->truesize = roundup_pow_of_two(SKB_HEAD_ALIGN(pp.offset +
+							 fq->buf_len));
+
+	pool = page_pool_create(&pp);
+	if (IS_ERR(pool))
+		return PTR_ERR(pool);
+
+	fqes = kvcalloc_node(fq->count, sizeof(*fqes), GFP_KERNEL, fq->nid);
+	if (!fqes)
+		goto err_buf;
+
+	fq->fqes = fqes;
+	fq->pp = pool;
+
+	return 0;
+
+err_buf:
+	page_pool_destroy(pool);
+
+	return -ENOMEM;
+}
+EXPORT_SYMBOL_NS_GPL(libeth_rx_fq_create, LIBETH);
+
+/**
+ * libeth_rx_fq_destroy - destroy a &page_pool created by libeth
+ * @fq: buffer queue to process
+ */
+void libeth_rx_fq_destroy(struct libeth_fq *fq)
+{
+	kvfree(fq->fqes);
+	page_pool_destroy(fq->pp);
+}
+EXPORT_SYMBOL_NS_GPL(libeth_rx_fq_destroy, LIBETH);
+
+/**
+ * libeth_rx_recycle_slow - recycle a libeth page from the NAPI context
+ * @page: page to recycle
+ *
+ * To be used on exceptions or rare cases not requiring fast inline recycling.
+ */
+void libeth_rx_recycle_slow(struct page *page)
+{
+	page_pool_recycle_direct(page->pp, page);
+}
+EXPORT_SYMBOL_NS_GPL(libeth_rx_recycle_slow, LIBETH);
+
 /* Converting abstract packet type numbers into a software structure with
  * the packet parameters to do O(1) lookup on Rx.
  */
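
The buffer length clamp in libeth_rx_hw_len() and the per-page ceiling
from LIBETH_RX_PAGE_LEN() can be checked with a small userspace model.
This is a sketch under stated assumptions, not the kernel implementation:
the model_* names are hypothetical, and the constants assume x86_64 with
4 KiB pages, 64-byte cachelines, a 320-byte struct skb_shared_info and
the default 64-byte headroom (NET_SKB_PAD of 64, NET_IP_ALIGN of 0).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for x86_64; the kernel takes them from its own headers. */
#define MODEL_PAGE_SIZE		4096u	/* PAGE_SIZE */
#define MODEL_CACHE_BYTES	64u	/* SMP_CACHE_BYTES */
#define MODEL_SHINFO_SIZE	320u	/* sizeof(struct skb_shared_info) */
#define MODEL_BUF_STRIDE	128u	/* LIBETH_RX_BUF_STRIDE */
/* ETH_HLEN + 2 * VLAN_HLEN + ETH_FCS_LEN */
#define MODEL_LL_LEN		(14u + 2u * 4u + 4u)

/* Round x up to a multiple of a; a must be a power of two. */
static uint32_t model_align_up(uint32_t x, uint32_t a)
{
	assert(a && !(a & (a - 1)));

	return (x + a - 1) & ~(a - 1);
}

/* Round x down to a multiple of a; a must be a power of two. */
static uint32_t model_align_down(uint32_t x, uint32_t a)
{
	assert(a && !(a & (a - 1)));

	return x & ~(a - 1);
}

/* Models LIBETH_RX_PAGE_LEN(hr): page minus headroom and shared info. */
static uint32_t model_page_len(uint32_t headroom)
{
	uint32_t shinfo = model_align_up(MODEL_SHINFO_SIZE, MODEL_CACHE_BYTES);

	return model_align_down(MODEL_PAGE_SIZE - headroom - shinfo,
				MODEL_BUF_STRIDE);
}

/* Models libeth_rx_hw_len(); hw_max == 0 means the HW sets no limit. */
static uint32_t model_hw_len(uint32_t mtu, uint32_t hw_max, uint32_t page_len)
{
	uint32_t len = model_align_up(mtu + MODEL_LL_LEN, MODEL_BUF_STRIDE);
	uint32_t cap = model_align_down(hw_max ? hw_max : UINT32_MAX,
					MODEL_BUF_STRIDE);

	if (len > cap)
		len = cap;
	if (len > page_len)
		len = page_len;

	return len;
}
```

Under these assumptions, model_page_len(64) is 3712, which matches the
maximum linear frame size quoted in the commit message. An MTU of 1500
gives a 1536-byte buffer, and an MTU of 9000 is clamped to 3712.
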