From patchwork Mon Oct 7 22:15:49 2024
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13825339
X-Patchwork-Delegate: kuba@kernel.org
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: David Wei, Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni, "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry
Subject: [PATCH v1 01/15] net: devmem: pull struct definitions out of ifdef
Date: Mon, 7 Oct 2024 15:15:49 -0700
Message-ID: <20241007221603.1703699-2-dw@davidwei.uk>
In-Reply-To: <20241007221603.1703699-1-dw@davidwei.uk>
References: <20241007221603.1703699-1-dw@davidwei.uk>

From: Pavel Begunkov

Don't hide structure definitions under conditional compilation; it only makes the code messier and harder to maintain. Move the struct dmabuf_genpool_chunk_owner definition out of the CONFIG_NET_DEVMEM ifdef, together with a bunch of trivial inlined helpers that use the structure.

Signed-off-by: Pavel Begunkov
Signed-off-by: David Wei
---
 net/core/devmem.h | 44 +++++++++++++++++---------------------
 1 file changed, 17 insertions(+), 27 deletions(-)

diff --git a/net/core/devmem.h b/net/core/devmem.h index 76099ef9c482..cf66e53b358f 100644 --- a/net/core/devmem.h +++ b/net/core/devmem.h @@ -44,7 +44,6 @@ struct net_devmem_dmabuf_binding { u32 id; }; -#if defined(CONFIG_NET_DEVMEM) /* Owner of the dma-buf chunks inserted into the gen pool.
Each scatterlist * entry from the dmabuf is inserted into the genpool as a chunk, and needs * this owner struct to keep track of some metadata necessary to create @@ -64,16 +63,6 @@ struct dmabuf_genpool_chunk_owner { struct net_devmem_dmabuf_binding *binding; }; -void __net_devmem_dmabuf_binding_free(struct net_devmem_dmabuf_binding *binding); -struct net_devmem_dmabuf_binding * -net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd, - struct netlink_ext_ack *extack); -void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding); -int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx, - struct net_devmem_dmabuf_binding *binding, - struct netlink_ext_ack *extack); -void dev_dmabuf_uninstall(struct net_device *dev); - static inline struct dmabuf_genpool_chunk_owner * net_iov_owner(const struct net_iov *niov) { @@ -91,6 +80,11 @@ net_iov_binding(const struct net_iov *niov) return net_iov_owner(niov)->binding; } +static inline u32 net_iov_binding_id(const struct net_iov *niov) +{ + return net_iov_owner(niov)->binding->id; +} + static inline unsigned long net_iov_virtual_addr(const struct net_iov *niov) { struct dmabuf_genpool_chunk_owner *owner = net_iov_owner(niov); @@ -99,10 +93,18 @@ static inline unsigned long net_iov_virtual_addr(const struct net_iov *niov) ((unsigned long)net_iov_idx(niov) << PAGE_SHIFT); } -static inline u32 net_iov_binding_id(const struct net_iov *niov) -{ - return net_iov_owner(niov)->binding->id; -} +#if defined(CONFIG_NET_DEVMEM) + +void __net_devmem_dmabuf_binding_free(struct net_devmem_dmabuf_binding *binding); +struct net_devmem_dmabuf_binding * +net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd, + struct netlink_ext_ack *extack); +void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding); +int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx, + struct net_devmem_dmabuf_binding *binding, + struct netlink_ext_ack *extack); +void dev_dmabuf_uninstall(struct net_device *dev); + static inline void net_devmem_dmabuf_binding_get(struct net_devmem_dmabuf_binding *binding) @@ -124,8 +126,6 @@ net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding); void net_devmem_free_dmabuf(struct net_iov *ppiov); #else -struct net_devmem_dmabuf_binding; - static inline void __net_devmem_dmabuf_binding_free(struct net_devmem_dmabuf_binding *binding) { @@ -165,16 +165,6 @@ net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding) static inline void net_devmem_free_dmabuf(struct net_iov *ppiov) { } - -static inline unsigned long net_iov_virtual_addr(const struct net_iov *niov) -{ - return 0; -} - -static inline u32 net_iov_binding_id(const struct net_iov *niov) -{ - return 0; -} #endif #endif /* _NET_DEVMEM_H */ From patchwork Mon Oct 7 22:15:50 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Wei X-Patchwork-Id: 13825340 X-Patchwork-Delegate: kuba@kernel.org Received: from mail-pf1-f170.google.com (mail-pf1-f170.google.com [209.85.210.170]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5315718CBF0 for ; Mon, 7 Oct 2024 22:16:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.170 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728339391; cv=none; 
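The layout patch 01 arrives at follows a common header pattern: type definitions and trivial inline accessors stay visible unconditionally, while only the functions with out-of-line implementations sit behind the config ifdef, with compiled-out stubs in the #else branch. A minimal stand-alone sketch of that pattern (illustrative CONFIG_FOO/foo names, not the devmem ones):

/* foo.h - illustrative header layout only, not actual kernel code */
#ifndef FOO_H
#define FOO_H

struct foo {
	int id;
};

/* Trivial inline helpers stay visible so that callers compile the same
 * way whether or not the feature is enabled.
 */
static inline int foo_id(const struct foo *f)
{
	return f->id;
}

#if defined(CONFIG_FOO)
/* Real implementations live in foo.c and exist only when enabled. */
int foo_register(struct foo *f);
void foo_unregister(struct foo *f);
#else
/* Compiled-out stubs keep call sites free of ifdefs. */
static inline int foo_register(struct foo *f) { return 0; }
static inline void foo_unregister(struct foo *f) { }
#endif

#endif /* FOO_H */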
[2a03:2880:ff:70::face:b00c]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-71df0cd14f8sm4895835b3a.65.2024.10.07.15.16.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Oct 2024 15:16:29 -0700 (PDT) From: David Wei To: io-uring@vger.kernel.org, netdev@vger.kernel.org Cc: David Wei , Jens Axboe , Pavel Begunkov , Jakub Kicinski , Paolo Abeni , "David S. Miller" , Eric Dumazet , Jesper Dangaard Brouer , David Ahern , Mina Almasry Subject: [PATCH v1 02/15] net: prefix devmem specific helpers Date: Mon, 7 Oct 2024 15:15:50 -0700 Message-ID: <20241007221603.1703699-3-dw@davidwei.uk> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20241007221603.1703699-1-dw@davidwei.uk> References: <20241007221603.1703699-1-dw@davidwei.uk> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org From: Pavel Begunkov Add prefixes to all helpers that are specific to devmem TCP, i.e. net_iov_binding[_id]. Signed-off-by: Pavel Begunkov Signed-off-by: David Wei --- net/core/devmem.c | 2 +- net/core/devmem.h | 6 +++--- net/ipv4/tcp.c | 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/net/core/devmem.c b/net/core/devmem.c index 11b91c12ee11..858982858f81 100644 --- a/net/core/devmem.c +++ b/net/core/devmem.c @@ -93,7 +93,7 @@ net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding) void net_devmem_free_dmabuf(struct net_iov *niov) { - struct net_devmem_dmabuf_binding *binding = net_iov_binding(niov); + struct net_devmem_dmabuf_binding *binding = net_devmem_iov_binding(niov); unsigned long dma_addr = net_devmem_get_dma_addr(niov); if (WARN_ON(!gen_pool_has_addr(binding->chunk_pool, dma_addr, diff --git a/net/core/devmem.h b/net/core/devmem.h index cf66e53b358f..80f38fe46930 100644 --- a/net/core/devmem.h +++ b/net/core/devmem.h @@ -75,14 +75,14 @@ static inline unsigned int net_iov_idx(const struct net_iov *niov) } static inline struct net_devmem_dmabuf_binding * -net_iov_binding(const struct net_iov *niov) +net_devmem_iov_binding(const struct net_iov *niov) { return net_iov_owner(niov)->binding; } -static inline u32 net_iov_binding_id(const struct net_iov *niov) +static inline u32 net_devmem_iov_binding_id(const struct net_iov *niov) { - return net_iov_owner(niov)->binding->id; + return net_devmem_iov_binding(niov)->id; } static inline unsigned long net_iov_virtual_addr(const struct net_iov *niov) diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 4f77bd862e95..5feef46426f4 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2493,7 +2493,7 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb, /* Will perform the exchange later */ dmabuf_cmsg.frag_token = tcp_xa_pool.tokens[tcp_xa_pool.idx]; - dmabuf_cmsg.dmabuf_id = net_iov_binding_id(niov); + dmabuf_cmsg.dmabuf_id = net_devmem_iov_binding_id(niov); offset += copy; remaining_len -= copy; From patchwork Mon Oct 7 22:15:51 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Wei X-Patchwork-Id: 13825341 X-Patchwork-Delegate: kuba@kernel.org Received: from mail-pj1-f46.google.com (mail-pj1-f46.google.com [209.85.216.46]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7CCD618D63B for ; Mon, 7 Oct 2024 22:16:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.46 ARC-Seal: 
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: David Wei, Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni, "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry
Subject: [PATCH v1 03/15] net: generalise net_iov chunk owners
Date: Mon, 7 Oct 2024 15:15:51 -0700
Message-ID: <20241007221603.1703699-4-dw@davidwei.uk>
In-Reply-To: <20241007221603.1703699-1-dw@davidwei.uk>
References: <20241007221603.1703699-1-dw@davidwei.uk>

From: Pavel Begunkov

Currently net_iov stores a pointer to struct dmabuf_genpool_chunk_owner, which serves as a useful abstraction to share data and provide a context. However, it is too devmem-specific, and we want to reuse it for other memory providers; for that we need to decouple net_iov from devmem. Make net_iov point to a new base structure called net_iov_area, which dmabuf_genpool_chunk_owner extends.

Signed-off-by: Pavel Begunkov
Signed-off-by: David Wei
---
 include/net/netmem.h | 21 ++++++++++++++++++++-
 net/core/devmem.c | 25 +++++++++++++------------
 net/core/devmem.h | 25 +++++++++----------------
 3 files changed, 42 insertions(+), 29 deletions(-)

diff --git a/include/net/netmem.h b/include/net/netmem.h index 8a6e20be4b9d..3795ded30d2c 100644 --- a/include/net/netmem.h +++ b/include/net/netmem.h @@ -24,11 +24,20 @@ struct net_iov { unsigned long __unused_padding; unsigned long pp_magic; struct page_pool *pp; - struct dmabuf_genpool_chunk_owner *owner; + struct net_iov_area *owner; unsigned long dma_addr; atomic_long_t pp_ref_count; }; +struct net_iov_area { + /* Array of net_iovs for this area. */ + struct net_iov *niovs; + size_t num_niovs; + + /* Offset into the dma-buf where this chunk starts.
*/ + unsigned long base_virtual; +}; + /* These fields in struct page are used by the page_pool and net stack: * * struct { @@ -54,6 +63,16 @@ NET_IOV_ASSERT_OFFSET(dma_addr, dma_addr); NET_IOV_ASSERT_OFFSET(pp_ref_count, pp_ref_count); #undef NET_IOV_ASSERT_OFFSET +static inline struct net_iov_area *net_iov_owner(const struct net_iov *niov) +{ + return niov->owner; +} + +static inline unsigned int net_iov_idx(const struct net_iov *niov) +{ + return niov - net_iov_owner(niov)->niovs; +} + /* netmem */ /** diff --git a/net/core/devmem.c b/net/core/devmem.c index 858982858f81..5c10cf0e2a18 100644 --- a/net/core/devmem.c +++ b/net/core/devmem.c @@ -32,14 +32,15 @@ static void net_devmem_dmabuf_free_chunk_owner(struct gen_pool *genpool, { struct dmabuf_genpool_chunk_owner *owner = chunk->owner; - kvfree(owner->niovs); + kvfree(owner->area.niovs); kfree(owner); } static dma_addr_t net_devmem_get_dma_addr(const struct net_iov *niov) { - struct dmabuf_genpool_chunk_owner *owner = net_iov_owner(niov); + struct dmabuf_genpool_chunk_owner *owner; + owner = net_devmem_iov_to_chunk_owner(niov); return owner->base_dma_addr + ((dma_addr_t)net_iov_idx(niov) << PAGE_SHIFT); } @@ -82,7 +83,7 @@ net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding) offset = dma_addr - owner->base_dma_addr; index = offset / PAGE_SIZE; - niov = &owner->niovs[index]; + niov = &owner->area.niovs[index]; niov->pp_magic = 0; niov->pp = NULL; @@ -250,9 +251,9 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd, goto err_free_chunks; } - owner->base_virtual = virtual; + owner->area.base_virtual = virtual; owner->base_dma_addr = dma_addr; - owner->num_niovs = len / PAGE_SIZE; + owner->area.num_niovs = len / PAGE_SIZE; owner->binding = binding; err = gen_pool_add_owner(binding->chunk_pool, dma_addr, @@ -264,17 +265,17 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd, goto err_free_chunks; } - owner->niovs = kvmalloc_array(owner->num_niovs, - sizeof(*owner->niovs), - GFP_KERNEL); - if (!owner->niovs) { + owner->area.niovs = kvmalloc_array(owner->area.num_niovs, + sizeof(*owner->area.niovs), + GFP_KERNEL); + if (!owner->area.niovs) { err = -ENOMEM; goto err_free_chunks; } - for (i = 0; i < owner->num_niovs; i++) { - niov = &owner->niovs[i]; - niov->owner = owner; + for (i = 0; i < owner->area.num_niovs; i++) { + niov = &owner->area.niovs[i]; + niov->owner = &owner->area; page_pool_set_dma_addr_netmem(net_iov_to_netmem(niov), net_devmem_get_dma_addr(niov)); } diff --git a/net/core/devmem.h b/net/core/devmem.h index 80f38fe46930..12b14377ed3f 100644 --- a/net/core/devmem.h +++ b/net/core/devmem.h @@ -10,6 +10,8 @@ #ifndef _NET_DEVMEM_H #define _NET_DEVMEM_H +#include + struct netlink_ext_ack; struct net_devmem_dmabuf_binding { @@ -50,34 +52,25 @@ struct net_devmem_dmabuf_binding { * allocations from this chunk. */ struct dmabuf_genpool_chunk_owner { - /* Offset into the dma-buf where this chunk starts. */ - unsigned long base_virtual; + struct net_iov_area area; + struct net_devmem_dmabuf_binding *binding; /* dma_addr of the start of the chunk. */ dma_addr_t base_dma_addr; - - /* Array of net_iovs for this chunk. 
*/ - struct net_iov *niovs; - size_t num_niovs; - - struct net_devmem_dmabuf_binding *binding; }; static inline struct dmabuf_genpool_chunk_owner * -net_iov_owner(const struct net_iov *niov) +net_devmem_iov_to_chunk_owner(const struct net_iov *niov) { - return niov->owner; -} + struct net_iov_area *owner = net_iov_owner(niov); -static inline unsigned int net_iov_idx(const struct net_iov *niov) -{ - return niov - net_iov_owner(niov)->niovs; + return container_of(owner, struct dmabuf_genpool_chunk_owner, area); } static inline struct net_devmem_dmabuf_binding * net_devmem_iov_binding(const struct net_iov *niov) { - return net_iov_owner(niov)->binding; + return net_devmem_iov_to_chunk_owner(niov)->binding; } static inline u32 net_devmem_iov_binding_id(const struct net_iov *niov) @@ -87,7 +80,7 @@ static inline u32 net_devmem_iov_binding_id(const struct net_iov *niov) static inline unsigned long net_iov_virtual_addr(const struct net_iov *niov) { - struct dmabuf_genpool_chunk_owner *owner = net_iov_owner(niov); + struct net_iov_area *owner = net_iov_owner(niov); return owner->base_virtual + ((unsigned long)net_iov_idx(niov) << PAGE_SHIFT);
From patchwork Mon Oct 7 22:15:52 2024
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13825342
X-Patchwork-Delegate: kuba@kernel.org
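The layering patch 03 introduces above, a generic net_iov_area embedded as a member of the devmem-specific chunk owner and recovered with container_of(), can be pictured with a small stand-alone sketch. The types below are simplified stand-ins for illustration, not the real kernel definitions:

#include <stddef.h>
#include <stdio.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct net_iov;

/* Generic area shared by all memory providers (simplified). */
struct net_iov_area {
	struct net_iov *niovs;
	size_t num_niovs;
	unsigned long base_virtual;
};

/* net_iov only knows about the generic base... */
struct net_iov {
	struct net_iov_area *owner;
};

/* ...while the devmem-specific owner embeds that base as a member... */
struct dmabuf_genpool_chunk_owner {
	struct net_iov_area area;
	void *binding;			/* stand-in for the dmabuf binding */
};

/* ...so devmem code can recover its own type from the generic pointer. */
static struct dmabuf_genpool_chunk_owner *
to_chunk_owner(const struct net_iov *niov)
{
	return container_of(niov->owner, struct dmabuf_genpool_chunk_owner, area);
}

int main(void)
{
	struct dmabuf_genpool_chunk_owner owner = { .area.num_niovs = 4 };
	struct net_iov niov = { .owner = &owner.area };

	printf("niovs in area: %zu\n", to_chunk_owner(&niov)->area.num_niovs);
	return 0;
}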
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: David Wei, Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni, "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry
Subject: [PATCH v1 04/15] net: page_pool: create hooks for custom page providers
Date: Mon, 7 Oct 2024 15:15:52 -0700
Message-ID: <20241007221603.1703699-5-dw@davidwei.uk>
In-Reply-To: <20241007221603.1703699-1-dw@davidwei.uk>
References: <20241007221603.1703699-1-dw@davidwei.uk>

From: Jakub Kicinski

Page providers which try to reuse the same pages will need to hold onto the ref even after the page is released from the pool: releasing the page from the pp only transfers the "ownership" reference from the pp to the provider, and the provider then waits for the other references to be gone before feeding the page back into the pool.
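To make the shape of these hooks concrete, here is a stand-alone sketch of the ops-table dispatch the message describes: the pool stores a provider ops pointer and branches through it on the slow allocation and release paths instead of hard-coding one provider. All names and types below are simplified stand-ins, not the kernel's:

#include <stdbool.h>
#include <stdlib.h>

struct fake_pool;

/* Stand-in for netmem_ref: an opaque handle to a buffer. */
typedef unsigned long netmem_ref;

/* Mirrors the idea of struct memory_provider_ops: the provider decides how
 * buffers are created and whether it keeps ownership when one is released.
 */
struct provider_ops {
	netmem_ref (*alloc_netmems)(struct fake_pool *pool);
	bool (*release_netmem)(struct fake_pool *pool, netmem_ref netmem);
	int (*init)(struct fake_pool *pool);
	void (*destroy)(struct fake_pool *pool);
};

struct fake_pool {
	const struct provider_ops *mp_ops;	/* NULL means no provider */
	void *mp_priv;				/* provider's private context */
};

/* Slow path: only ask "is a provider installed?", never which one. */
netmem_ref pool_alloc_slow(struct fake_pool *pool)
{
	if (pool->mp_ops)
		return pool->mp_ops->alloc_netmems(pool);
	return (netmem_ref)malloc(4096);	/* default page-like allocation */
}

/* Release path: the provider may keep the "ownership" reference and return
 * false, in which case the pool must not free the buffer itself.
 */
void pool_return(struct fake_pool *pool, netmem_ref netmem)
{
	bool put = true;

	if (pool->mp_ops)
		put = pool->mp_ops->release_netmem(pool, netmem);
	if (put)
		free((void *)netmem);
}

The point of the indirection is exactly what the message above states: a provider that wants to reuse pages can return false from release_netmem, keep the ownership reference, and feed the buffer back into the pool later.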
Signed-off-by: Jakub Kicinski [Pavel] Rebased, renamed callback, +converted devmem Signed-off-by: Pavel Begunkov Signed-off-by: David Wei --- include/net/page_pool/types.h | 9 +++++++++ net/core/devmem.c | 13 ++++++++++++- net/core/devmem.h | 2 ++ net/core/page_pool.c | 17 +++++++++-------- 4 files changed, 32 insertions(+), 9 deletions(-) diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h index c022c410abe3..8a35fe474adb 100644 --- a/include/net/page_pool/types.h +++ b/include/net/page_pool/types.h @@ -152,8 +152,16 @@ struct page_pool_stats { */ #define PAGE_POOL_FRAG_GROUP_ALIGN (4 * sizeof(long)) +struct memory_provider_ops { + netmem_ref (*alloc_netmems)(struct page_pool *pool, gfp_t gfp); + bool (*release_netmem)(struct page_pool *pool, netmem_ref netmem); + int (*init)(struct page_pool *pool); + void (*destroy)(struct page_pool *pool); +}; + struct pp_memory_provider_params { void *mp_priv; + const struct memory_provider_ops *mp_ops; }; struct page_pool { @@ -215,6 +223,7 @@ struct page_pool { struct ptr_ring ring; void *mp_priv; + const struct memory_provider_ops *mp_ops; #ifdef CONFIG_PAGE_POOL_STATS /* recycle stats are per-cpu to avoid locking */ diff --git a/net/core/devmem.c b/net/core/devmem.c index 5c10cf0e2a18..83d13eb441b6 100644 --- a/net/core/devmem.c +++ b/net/core/devmem.c @@ -117,6 +117,7 @@ void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding) WARN_ON(rxq->mp_params.mp_priv != binding); rxq->mp_params.mp_priv = NULL; + rxq->mp_params.mp_ops = NULL; rxq_idx = get_netdev_rx_queue_index(rxq); @@ -142,7 +143,7 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx, } rxq = __netif_get_rx_queue(dev, rxq_idx); - if (rxq->mp_params.mp_priv) { + if (rxq->mp_params.mp_ops) { NL_SET_ERR_MSG(extack, "designated queue already memory provider bound"); return -EEXIST; } @@ -160,6 +161,7 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx, return err; rxq->mp_params.mp_priv = binding; + rxq->mp_params.mp_ops = &dmabuf_devmem_ops; err = netdev_rx_queue_restart(dev, rxq_idx); if (err) @@ -169,6 +171,7 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx, err_xa_erase: rxq->mp_params.mp_priv = NULL; + rxq->mp_params.mp_ops = NULL; xa_erase(&binding->bound_rxqs, xa_idx); return err; @@ -388,3 +391,11 @@ bool mp_dmabuf_devmem_release_page(struct page_pool *pool, netmem_ref netmem) /* We don't want the page pool put_page()ing our net_iovs. 
*/ return false; } + +const struct memory_provider_ops dmabuf_devmem_ops = { + .init = mp_dmabuf_devmem_init, + .destroy = mp_dmabuf_devmem_destroy, + .alloc_netmems = mp_dmabuf_devmem_alloc_netmems, + .release_netmem = mp_dmabuf_devmem_release_page, +}; +EXPORT_SYMBOL(dmabuf_devmem_ops); diff --git a/net/core/devmem.h b/net/core/devmem.h index 12b14377ed3f..fbf7ec9a62cb 100644 --- a/net/core/devmem.h +++ b/net/core/devmem.h @@ -88,6 +88,8 @@ static inline unsigned long net_iov_virtual_addr(const struct net_iov *niov) #if defined(CONFIG_NET_DEVMEM) +extern const struct memory_provider_ops dmabuf_devmem_ops; + void __net_devmem_dmabuf_binding_free(struct net_devmem_dmabuf_binding *binding); struct net_devmem_dmabuf_binding * net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd, diff --git a/net/core/page_pool.c b/net/core/page_pool.c index a813d30d2135..c21c5b9edc68 100644 --- a/net/core/page_pool.c +++ b/net/core/page_pool.c @@ -284,10 +284,11 @@ static int page_pool_init(struct page_pool *pool, rxq = __netif_get_rx_queue(pool->slow.netdev, pool->slow.queue_idx); pool->mp_priv = rxq->mp_params.mp_priv; + pool->mp_ops = rxq->mp_params.mp_ops; } - if (pool->mp_priv) { - err = mp_dmabuf_devmem_init(pool); + if (pool->mp_ops) { + err = pool->mp_ops->init(pool); if (err) { pr_warn("%s() mem-provider init failed %d\n", __func__, err); @@ -584,8 +585,8 @@ netmem_ref page_pool_alloc_netmem(struct page_pool *pool, gfp_t gfp) return netmem; /* Slow-path: cache empty, do real allocation */ - if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_priv) - netmem = mp_dmabuf_devmem_alloc_netmems(pool, gfp); + if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_ops) + netmem = pool->mp_ops->alloc_netmems(pool, gfp); else netmem = __page_pool_alloc_pages_slow(pool, gfp); return netmem; @@ -676,8 +677,8 @@ void page_pool_return_page(struct page_pool *pool, netmem_ref netmem) bool put; put = true; - if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_priv) - put = mp_dmabuf_devmem_release_page(pool, netmem); + if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_ops) + put = pool->mp_ops->release_netmem(pool, netmem); else __page_pool_release_page_dma(pool, netmem); @@ -1010,8 +1011,8 @@ static void __page_pool_destroy(struct page_pool *pool) page_pool_unlist(pool); page_pool_uninit(pool); - if (pool->mp_priv) { - mp_dmabuf_devmem_destroy(pool); + if (pool->mp_ops) { + pool->mp_ops->destroy(pool); static_branch_dec(&page_pool_mem_providers); } From patchwork Mon Oct 7 22:15:53 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Wei X-Patchwork-Id: 13825343 X-Patchwork-Delegate: kuba@kernel.org Received: from mail-pg1-f171.google.com (mail-pg1-f171.google.com [209.85.215.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 19FE218E04E for ; Mon, 7 Oct 2024 22:16:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.171 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728339395; cv=none; b=C1pi3RKA0OXtdkVqYFYcYWa+Qd084kkXuKBKn5ey/B80/HfTEzWkIQ/xNNHAnWSACnL6o8yLmXhCRuGVJyxlufVcUrTtnbC7cCqZSEMJ7ZA+66XciVVwTOM9HIwLjTZXSCXjC5iy42D9lvKEEdtUabUc9H8eXtIMitBTRMf/x1k= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728339395; c=relaxed/simple; 
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: David Wei , Jens Axboe , Pavel Begunkov , Jakub Kicinski , Paolo Abeni , "David S.
Miller" , Eric Dumazet , Jesper Dangaard Brouer , David Ahern , Mina Almasry Subject: [PATCH v1 05/15] net: prepare for non devmem TCP memory providers Date: Mon, 7 Oct 2024 15:15:53 -0700 Message-ID: <20241007221603.1703699-6-dw@davidwei.uk> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20241007221603.1703699-1-dw@davidwei.uk> References: <20241007221603.1703699-1-dw@davidwei.uk> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org From: Pavel Begunkov There is a good bunch of places in generic paths assuming that the only page pool memory provider is devmem TCP. As we want to reuse the net_iov and provider infrastructure, we need to patch it up and explicitly check the provider type when we branch into devmem TCP code. Signed-off-by: Pavel Begunkov Signed-off-by: David Wei --- net/core/devmem.c | 4 ++-- net/core/page_pool_user.c | 15 +++++++++------ net/ipv4/tcp.c | 6 ++++++ 3 files changed, 17 insertions(+), 8 deletions(-) diff --git a/net/core/devmem.c b/net/core/devmem.c index 83d13eb441b6..b0733cf42505 100644 --- a/net/core/devmem.c +++ b/net/core/devmem.c @@ -314,10 +314,10 @@ void dev_dmabuf_uninstall(struct net_device *dev) unsigned int i; for (i = 0; i < dev->real_num_rx_queues; i++) { - binding = dev->_rx[i].mp_params.mp_priv; - if (!binding) + if (dev->_rx[i].mp_params.mp_ops != &dmabuf_devmem_ops) continue; + binding = dev->_rx[i].mp_params.mp_priv; xa_for_each(&binding->bound_rxqs, xa_idx, rxq) if (rxq == &dev->_rx[i]) { xa_erase(&binding->bound_rxqs, xa_idx); diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c index 48335766c1bf..0d6cb7fb562c 100644 --- a/net/core/page_pool_user.c +++ b/net/core/page_pool_user.c @@ -214,7 +214,7 @@ static int page_pool_nl_fill(struct sk_buff *rsp, const struct page_pool *pool, const struct genl_info *info) { - struct net_devmem_dmabuf_binding *binding = pool->mp_priv; + struct net_devmem_dmabuf_binding *binding; size_t inflight, refsz; void *hdr; @@ -244,8 +244,11 @@ page_pool_nl_fill(struct sk_buff *rsp, const struct page_pool *pool, pool->user.detach_time)) goto err_cancel; - if (binding && nla_put_u32(rsp, NETDEV_A_PAGE_POOL_DMABUF, binding->id)) - goto err_cancel; + if (pool->mp_ops == &dmabuf_devmem_ops) { + binding = pool->mp_priv; + if (nla_put_u32(rsp, NETDEV_A_PAGE_POOL_DMABUF, binding->id)) + goto err_cancel; + } genlmsg_end(rsp, hdr); @@ -353,16 +356,16 @@ void page_pool_unlist(struct page_pool *pool) int page_pool_check_memory_provider(struct net_device *dev, struct netdev_rx_queue *rxq) { - struct net_devmem_dmabuf_binding *binding = rxq->mp_params.mp_priv; + void *mp_priv = rxq->mp_params.mp_priv; struct page_pool *pool; struct hlist_node *n; - if (!binding) + if (!mp_priv) return 0; mutex_lock(&page_pools_lock); hlist_for_each_entry_safe(pool, n, &dev->page_pools, user.list) { - if (pool->mp_priv != binding) + if (pool->mp_priv != mp_priv) continue; if (pool->slow.queue_idx == get_netdev_rx_queue_index(rxq)) { diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 5feef46426f4..2140fa1ec9f8 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -277,6 +277,7 @@ #include #include #include +#include #include #include @@ -2475,6 +2476,11 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb, } niov = skb_frag_net_iov(frag); + if (niov->pp->mp_ops != &dmabuf_devmem_ops) { + err = -ENODEV; + goto out; + } + end = start + skb_frag_size(frag); copy = end - offset; From patchwork Mon Oct 7 22:15:54 2024 
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13825344
X-Patchwork-Delegate: kuba@kernel.org
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: David Wei, Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni, "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry
Subject: [PATCH v1 06/15] net: page_pool: add ->scrub mem provider callback
Date: Mon, 7 Oct 2024 15:15:54 -0700
Message-ID: <20241007221603.1703699-7-dw@davidwei.uk>
In-Reply-To: <20241007221603.1703699-1-dw@davidwei.uk>
References: <20241007221603.1703699-1-dw@davidwei.uk>

From: Pavel Begunkov

The page pool now waits for all ppiovs to return before destroying itself, and for that to happen the memory provider might need to push some buffers, flush caches and so on.

TODO: we'll try to get by without it before the final release.

Signed-off-by: Pavel Begunkov
Signed-off-by: David Wei
---
 include/net/page_pool/types.h | 1 +
 net/core/page_pool.c | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h index 8a35fe474adb..fd0376ad0d26 100644 --- a/include/net/page_pool/types.h +++ b/include/net/page_pool/types.h @@ -157,6 +157,7 @@ struct memory_provider_ops { bool (*release_netmem)(struct page_pool *pool, netmem_ref netmem); int (*init)(struct page_pool *pool); void (*destroy)(struct page_pool *pool); + void (*scrub)(struct page_pool *pool); }; struct pp_memory_provider_params {
diff --git a/net/core/page_pool.c b/net/core/page_pool.c index c21c5b9edc68..9a675e16e6a4 100644 --- a/net/core/page_pool.c +++ b/net/core/page_pool.c @@ -1038,6 +1038,9 @@ static void page_pool_empty_alloc_cache_once(struct page_pool *pool) static void page_pool_scrub(struct page_pool *pool) { + if (pool->mp_ops && pool->mp_ops->scrub) + pool->mp_ops->scrub(pool); + page_pool_empty_alloc_cache_once(pool); pool->destroy_cnt++;
From patchwork Mon Oct 7 22:15:55 2024
X-Patchwork-Submitter: David Wei
X-Patchwork-Id: 13825345
X-Patchwork-Delegate: kuba@kernel.org
From: David Wei
To: io-uring@vger.kernel.org, netdev@vger.kernel.org
Cc: David Wei, Jens Axboe, Pavel Begunkov, Jakub Kicinski, Paolo Abeni, "David S. Miller", Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry
Subject: [PATCH v1 07/15] net: page pool: add helper creating area from pages
Date: Mon, 7 Oct 2024 15:15:55 -0700
Message-ID: <20241007221603.1703699-8-dw@davidwei.uk>
In-Reply-To: <20241007221603.1703699-1-dw@davidwei.uk>
References: <20241007221603.1703699-1-dw@davidwei.uk>

From: Pavel Begunkov

Add a helper that takes an array of pages and initialises the passed-in memory provider's area with them, where each net_iov takes one page. It is also responsible for setting up the DMA mappings. We keep it in page_pool.c so as not to leak netmem details to outside providers like io_uring, which don't have access to netmem_priv.h and other private helpers.

Signed-off-by: Pavel Begunkov
Signed-off-by: David Wei
---
 include/net/page_pool/types.h | 17 ++++++++++
 net/core/page_pool.c | 61 +++++++++++++++++++++++++++++++++--
 2 files changed, 76 insertions(+), 2 deletions(-)

diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h index fd0376ad0d26..1180ad07423c 100644 --- a/include/net/page_pool/types.h +++ b/include/net/page_pool/types.h @@ -271,6 +271,11 @@ void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *), const struct xdp_mem_info *mem); void page_pool_put_page_bulk(struct page_pool *pool, void **data, int count); + +int page_pool_init_paged_area(struct page_pool *pool, + struct net_iov_area *area, struct page **pages); +void page_pool_release_area(struct page_pool *pool, + struct net_iov_area *area); #else static inline void page_pool_destroy(struct page_pool *pool) { @@ -286,6 +291,18 @@ static inline void page_pool_put_page_bulk(struct page_pool *pool, void **data, int count) { } + +static inline int page_pool_init_paged_area(struct page_pool *pool, + struct net_iov_area *area, + struct page **pages) +{ + return -EOPNOTSUPP; +} + +static inline void page_pool_release_area(struct page_pool *pool, + struct net_iov_area *area) +{ +} #endif void page_pool_put_unrefed_netmem(struct page_pool *pool, netmem_ref netmem,
diff --git a/net/core/page_pool.c b/net/core/page_pool.c index 9a675e16e6a4..112b6fe4b7ff 100644 --- a/net/core/page_pool.c +++ b/net/core/page_pool.c @@ -459,7 +459,8 @@ page_pool_dma_sync_for_device(const struct page_pool *pool, __page_pool_dma_sync_for_device(pool, netmem, dma_sync_size); } -static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem) +static bool page_pool_dma_map_page(struct page_pool *pool, netmem_ref netmem, + struct page *page) { dma_addr_t dma; @@ -468,7 +469,7 @@ static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem) * into page private data (i.e 32bit cpu with 64bit DMA caps) * This mapping is kept for lifetime of page, until leaving pool.
*/ - dma = dma_map_page_attrs(pool->p.dev, netmem_to_page(netmem), 0, + dma = dma_map_page_attrs(pool->p.dev, page, 0, (PAGE_SIZE << pool->p.order), pool->p.dma_dir, DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING); @@ -490,6 +491,11 @@ static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem) return false; } +static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem) +{ + return page_pool_dma_map_page(pool, netmem, netmem_to_page(netmem)); +} + static struct page *__page_pool_alloc_page_order(struct page_pool *pool, gfp_t gfp) { @@ -1154,3 +1160,54 @@ void page_pool_update_nid(struct page_pool *pool, int new_nid) } } EXPORT_SYMBOL(page_pool_update_nid); + +static void page_pool_release_page_dma(struct page_pool *pool, + netmem_ref netmem) +{ + __page_pool_release_page_dma(pool, netmem); +} + +int page_pool_init_paged_area(struct page_pool *pool, + struct net_iov_area *area, struct page **pages) +{ + struct net_iov *niov; + netmem_ref netmem; + int i, ret = 0; + + if (!pool->dma_map) + return -EOPNOTSUPP; + + for (i = 0; i < area->num_niovs; i++) { + niov = &area->niovs[i]; + netmem = net_iov_to_netmem(niov); + + page_pool_set_pp_info(pool, netmem); + if (!page_pool_dma_map_page(pool, netmem, pages[i])) { + ret = -EINVAL; + goto err_unmap_dma; + } + } + return 0; + +err_unmap_dma: + while (i--) { + netmem = net_iov_to_netmem(&area->niovs[i]); + page_pool_release_page_dma(pool, netmem); + } + return ret; +} + +void page_pool_release_area(struct page_pool *pool, + struct net_iov_area *area) +{ + int i; + + if (!pool->dma_map) + return; + + for (i = 0; i < area->num_niovs; i++) { + struct net_iov *niov = &area->niovs[i]; + + page_pool_release_page_dma(pool, net_iov_to_netmem(niov)); + } +} From patchwork Mon Oct 7 22:15:56 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Wei X-Patchwork-Id: 13825346 X-Patchwork-Delegate: kuba@kernel.org Received: from mail-pf1-f181.google.com (mail-pf1-f181.google.com [209.85.210.181]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C12F2190693 for ; Mon, 7 Oct 2024 22:16:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.181 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728339399; cv=none; b=en9AVLVz7kUFUtpNEWFqC3o6IwOPBVyfs1tRasGIK/wUmSo70OHI8BMOYwuUtqy74NSjtjYLVJuyRMQQPYfnSvkFwX50rC7KY31vLLJMnrYrqfhZcQaeoLiMUsmo2Te7qraaNhoUeq/eU1YsH4/eiVIT3sFkeg6es96PHuLwr3c= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728339399; c=relaxed/simple; bh=zIru/BLVEHCMXIaDhxnz8dkzIlpsBj3/Ne9ovJPqNQs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=WJ2fVOlyXhArbhHgcKcVUPVxrpyW4/SuRncYU1DUwRwfNCq9sCtNDTb4sTYuSOONuR9/B6lWTXkGG5XAzI7V19eaBVXNke+AEvS4gFUiDwL04j/13AIWywzILnIjZYJAoU3PQpM2FpOmgTxnMswuB3viQ/c2cRuHDlgss0IkD4Y= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=davidwei.uk; spf=none smtp.mailfrom=davidwei.uk; dkim=pass (2048-bit key) header.d=davidwei-uk.20230601.gappssmtp.com header.i=@davidwei-uk.20230601.gappssmtp.com header.b=jHgI35/Y; arc=none smtp.client-ip=209.85.210.181 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=davidwei.uk Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=davidwei.uk 
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=davidwei-uk.20230601.gappssmtp.com header.i=@davidwei-uk.20230601.gappssmtp.com header.b="jHgI35/Y" Received: by mail-pf1-f181.google.com with SMTP id d2e1a72fcca58-71df4620966so2273736b3a.0 for ; Mon, 07 Oct 2024 15:16:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=davidwei-uk.20230601.gappssmtp.com; s=20230601; t=1728339397; x=1728944197; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=JgRkWbTo4LNXX7PdXh2qka6BqUOW0eDFLQUXUzzANfY=; b=jHgI35/Ya45E+PZgZMDNdOqsTD4iYKjSuBrU3TBlK560JIg9GBDGyR34aWe+ZGqTY6 lnSN5mcseEtbDFWmfAyJY0tFGpGh22ooxLkejeimRCR+ahMjNikXwwZwR7DVYGXyXYYK jkMq3p9xoTeOBYbHzwEYTdB6BSrBlpAujksYYPNl8uZZDM+B212Yoay+yl2qo9ItEWHH pJz1zAjo4bdfmMQlGYZ85HPYN9BqknNrudMfX40eMhp4engvi5vETeimQgeL2Xwl79OE kgBbo6HHPyKf5DW10uJWnGa+ZXZup1+Uy+BGWhGK5dgSZpjEoY8vr8AoGxG5KVCkA3FR c4vA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1728339397; x=1728944197; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=JgRkWbTo4LNXX7PdXh2qka6BqUOW0eDFLQUXUzzANfY=; b=PYLv5WZmm/0WDel1FiNwhbxL5tn/B8DlzaUkI9u58+BQDKcufbjHwBId7m6IoaUEy3 gE+doN+Gsj7cAKErumHLEtH5iRaNTSYLCc/gUNhQ9a8NHr4sTEE9jhRYbSGkvfyrrrak 6/OiJW3XC1+dVGVFs9qmV/AhUTaY5IC01A0X6VoUCyGmM0cxZY1geN6ofMKPZMAa0TtI H0mdG5cIGd4nDA435hVrYOHeNKi3RIDqiJRQYdI1hsSdN4dftzaCv+Jx2M2IomNygCse 7TMEDdGcBZi8iQ/9m2YCXMaE4BxllUpn4D04kKpCRntufc5xJR+h/cpXHb0dCwUBQqS4 3grw== X-Forwarded-Encrypted: i=1; AJvYcCUc3azGxGGlQbRWIHUmSCZ7Pq6qHmogZ+L8+56EVinWzMexUHUv/nASZq6piOiTh0Qmwnor/Iw=@vger.kernel.org X-Gm-Message-State: AOJu0YybZxb2rBlbGJsKtnVFPB9dNQEQKX0fdHwybnNPsZlji7i6kQd6 CCWjVS5CUrVtlIt6yaYTj/UsCJnQj/UPjNfDeZkVbd7DiyFn4H9a1TH6JkTzg5U= X-Google-Smtp-Source: AGHT+IG0MogFJnFezjnGnU9rzQH0+lSqNIR7z+19K7U333m4qAPP5cH5BZLzZI0dCicMLH9pGlRSSQ== X-Received: by 2002:a05:6a00:170d:b0:71d:fea7:60c5 with SMTP id d2e1a72fcca58-71dfea7614dmr10901490b3a.19.1728339397202; Mon, 07 Oct 2024 15:16:37 -0700 (PDT) Received: from localhost (fwdproxy-prn-060.fbsv.net. [2a03:2880:ff:3c::face:b00c]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-71df0cccba9sm4914089b3a.60.2024.10.07.15.16.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Oct 2024 15:16:36 -0700 (PDT) From: David Wei To: io-uring@vger.kernel.org, netdev@vger.kernel.org Cc: David Wei , Jens Axboe , Pavel Begunkov , Jakub Kicinski , Paolo Abeni , "David S. Miller" , Eric Dumazet , Jesper Dangaard Brouer , David Ahern , Mina Almasry Subject: [PATCH v1 08/15] net: add helper executing custom callback from napi Date: Mon, 7 Oct 2024 15:15:56 -0700 Message-ID: <20241007221603.1703699-9-dw@davidwei.uk> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20241007221603.1703699-1-dw@davidwei.uk> References: <20241007221603.1703699-1-dw@davidwei.uk> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org From: Pavel Begunkov It's useful to have napi private bits and pieces like page pool's fast allocating cache, so that the hot allocation path doesn't have to do any additional synchronisation. 
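As a rough illustration of the kind of caller this enables (the napi_execute() helper itself is added by this patch below; the provider structure and fields named here are made up for the example and are not part of the series):

	/* illustrative sketch only, not part of the patch */
	struct my_provider {
		unsigned int napi_id;		/* napi that owns the page pool */
		unsigned int nr_fallback_bufs;	/* napi-private provider state */
	};

	struct my_refill {
		struct my_provider *prov;
		unsigned int nr_added;
	};

	static void my_refill_cb(void *data)
	{
		struct my_refill *refill = data;

		/* napi is held off here, so napi-private state is safe to touch */
		refill->prov->nr_fallback_bufs += refill->nr_added;
	}

	static void my_provider_grow(struct my_provider *prov, unsigned int nr)
	{
		struct my_refill refill = { .prov = prov, .nr_added = nr };

		/* slow control path: serialise with the napi owning the pool cache */
		napi_execute(prov->napi_id, my_refill_cb, &refill);
	}
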
In case of io_uring memory provider introduced in following patches, we keep the consumer end of the io_uring's refill queue private to napi as it's a hot path. However, from time to time we need to synchronise with the napi, for example to add more user memory or allocate fallback buffers. Add a helper function napi_execute that allows to run a custom callback from under napi context so that it can access and modify napi protected parts of io_uring. It works similar to busy polling and stops napi from running in the meantime, so it's supposed to be a slow control path. Signed-off-by: Pavel Begunkov Signed-off-by: David Wei --- include/net/busy_poll.h | 6 +++++ net/core/dev.c | 53 +++++++++++++++++++++++++++++++++++++++++ 2 files changed, 59 insertions(+) diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h index f03040baaefd..3fd9e65731e9 100644 --- a/include/net/busy_poll.h +++ b/include/net/busy_poll.h @@ -47,6 +47,7 @@ bool sk_busy_loop_end(void *p, unsigned long start_time); void napi_busy_loop(unsigned int napi_id, bool (*loop_end)(void *, unsigned long), void *loop_end_arg, bool prefer_busy_poll, u16 budget); +void napi_execute(unsigned napi_id, void (*cb)(void *), void *cb_arg); void napi_busy_loop_rcu(unsigned int napi_id, bool (*loop_end)(void *, unsigned long), @@ -63,6 +64,11 @@ static inline bool sk_can_busy_loop(struct sock *sk) return false; } +static inline void napi_execute(unsigned napi_id, + void (*cb)(void *), void *cb_arg) +{ +} + #endif /* CONFIG_NET_RX_BUSY_POLL */ static inline unsigned long busy_loop_current_time(void) diff --git a/net/core/dev.c b/net/core/dev.c index 1e740faf9e78..ba2f43cf5517 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -6497,6 +6497,59 @@ void napi_busy_loop(unsigned int napi_id, } EXPORT_SYMBOL(napi_busy_loop); +void napi_execute(unsigned napi_id, + void (*cb)(void *), void *cb_arg) +{ + struct napi_struct *napi; + bool done = false; + unsigned long val; + void *have_poll_lock = NULL; + + rcu_read_lock(); + + napi = napi_by_id(napi_id); + if (!napi) { + rcu_read_unlock(); + return; + } + + if (!IS_ENABLED(CONFIG_PREEMPT_RT)) + preempt_disable(); + for (;;) { + local_bh_disable(); + val = READ_ONCE(napi->state); + + /* If multiple threads are competing for this napi, + * we avoid dirtying napi->state as much as we can. 
+ */ + if (val & (NAPIF_STATE_DISABLE | NAPIF_STATE_SCHED | + NAPIF_STATE_IN_BUSY_POLL)) + goto restart; + + if (cmpxchg(&napi->state, val, + val | NAPIF_STATE_IN_BUSY_POLL | + NAPIF_STATE_SCHED) != val) + goto restart; + + have_poll_lock = netpoll_poll_lock(napi); + cb(cb_arg); + done = true; + gro_normal_list(napi); + local_bh_enable(); + break; +restart: + local_bh_enable(); + if (unlikely(need_resched())) + break; + cpu_relax(); + } + if (done) + busy_poll_stop(napi, have_poll_lock, false, 1); + if (!IS_ENABLED(CONFIG_PREEMPT_RT)) + preempt_enable(); + rcu_read_unlock(); +} + #endif /* CONFIG_NET_RX_BUSY_POLL */ static void napi_hash_add(struct napi_struct *napi) From patchwork Mon Oct 7 22:15:57 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Wei X-Patchwork-Id: 13825347 Received: from mail-pl1-f169.google.com (mail-pl1-f169.google.com [209.85.214.169]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1FD7118C92C for ; Mon, 7 Oct 2024 22:16:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.169 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728339400; cv=none; b=rfbBB5fBHBTuHNsnosTGXs5t/I0sb9e4alJJNG20MiJLFZ4MXa4sU2CQRRIiA1H9D+rDd099qiSQpaPs+PeFKDo+MopWnGPlv8+fyP/4RRcGWEgQlJINJXpPIZ1R6rIy2BFpZJ6zoUVOkDvK4UHkZ1w2sRT4jseI+orarymTPH4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728339400; c=relaxed/simple; bh=l0oLc5E/lEQH9b3nvawvtmwXlGXpKj5cniZ95/KQh2E=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Sm5euO59nUEgZgRa+C/7Dt5ywbctQlGMUmNHQPiq+6m+kSqAQ/HO9UTvMnBPunKAv4qO77YjUREFXy3xv61+BD4E5aysIwydLVR8C5qRVmH6wU5G6Z8GYbfSnVM+cPA+Ag/pLOBh4sWGdfHtRme/6rcSTeYRexlsGaGM7fu7jXE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=davidwei.uk; spf=none smtp.mailfrom=davidwei.uk; dkim=pass (2048-bit key) header.d=davidwei-uk.20230601.gappssmtp.com header.i=@davidwei-uk.20230601.gappssmtp.com header.b=HuA2RRM9; arc=none smtp.client-ip=209.85.214.169 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=davidwei.uk Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=davidwei.uk Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=davidwei-uk.20230601.gappssmtp.com header.i=@davidwei-uk.20230601.gappssmtp.com header.b="HuA2RRM9" Received: by mail-pl1-f169.google.com with SMTP id d9443c01a7336-20b7259be6fso54296745ad.0 for ; Mon, 07 Oct 2024 15:16:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=davidwei-uk.20230601.gappssmtp.com; s=20230601; t=1728339398; x=1728944198; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=8aGVWFZtH3nHvxILpOEkPzZeQKwzNT6cGzSV/XKDlHs=; b=HuA2RRM9hi2P3JXHZER/Ba14FrZwzemnSWyuwBvjFq+fdgMHcuPrvifHXGc984yVTC 5sGdlAUcQ2RJ9FqzePWzedXVGmq04YyxproVnSjo285rZx2wdpNufpdS9XkxlhZUdAh6 gbSWnz28XvIdzb4ITQpOB/YznzsTOv1F1ymspbG62vt/+7IYYVhyyaXV4GqFCy8nhAqz y7G49MqBW/8LfmwezlTgG//0p1yelKMaI2eIuVLVGZVxVYv6weFREtFdwRlrBxwAzgFa sWiLAk8ECGi4ENP6gLT0hhzshYpqolZaOwdb4tKEVIZjkoJGEiHV8quzkEFU6FvboZV8 KouA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; 
t=1728339398; x=1728944198; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=8aGVWFZtH3nHvxILpOEkPzZeQKwzNT6cGzSV/XKDlHs=; b=UbHhd7iJ/Ccv1jEuRmohydWpO9hMDyZ90t9tMViGkYUhEictLJLwb3ujXNk0XQ9o2P yJl5Ho9n1BMO5NU1wGGKJvV3EFrYL5R9mfUyxHOtghIF3ZTnJGirqVZe64LZ0ZH6jHtB ImkSKprXMiBNlgzcMIe09Enoat5oJJ+RWKVMKB7ZE3VJ6GKQjFzgwdtqjcuXI51PGpS3 BSIG5fgbCDBT8z3RzrcSgVHLSnNoaK2nGsxy1/GtSPrHnV6vlomLMT1a1rayNj6pXRJD iFLGPWsLFOcbWEkZRUaMr51t63O/WGIaGWK69KCEk77/6p6GSoZX476LOHgw/SGDecK0 mqfg== X-Forwarded-Encrypted: i=1; AJvYcCU939lYmQauPC0mn3A8fme5UAyBpjjZYYdfRAYMqhhtu6y9okltYLlgy0M6A8VSAuCElakmWDk=@vger.kernel.org X-Gm-Message-State: AOJu0YxYugUN9YxYpJBZADd0dEx/FXxQQAoTpfIwQgGlW/OAKVFMPGM1 evxYN6cUdcsXtw0e2bKACiyFJiF8MyMbNFjeHfWWHho+nJNVEv+0SWbB8uNH+Tw= X-Google-Smtp-Source: AGHT+IGmivCZIXUNtcCa1XuWCwElfZavkxmodB33xiReOy9o7hDvGzETtwtWPSKrtBGAiIn2/8XhvQ== X-Received: by 2002:a17:902:e84d:b0:20b:b26e:c149 with SMTP id d9443c01a7336-20bfe01d3c9mr199425915ad.29.1728339398451; Mon, 07 Oct 2024 15:16:38 -0700 (PDT) Received: from localhost (fwdproxy-prn-060.fbsv.net. [2a03:2880:ff:3c::face:b00c]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-20c139391e5sm44394065ad.133.2024.10.07.15.16.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Oct 2024 15:16:38 -0700 (PDT) From: David Wei To: io-uring@vger.kernel.org, netdev@vger.kernel.org Cc: David Wei , Jens Axboe , Pavel Begunkov , Jakub Kicinski , Paolo Abeni , "David S. Miller" , Eric Dumazet , Jesper Dangaard Brouer , David Ahern , Mina Almasry Subject: [PATCH v1 09/15] io_uring/zcrx: add interface queue and refill queue Date: Mon, 7 Oct 2024 15:15:57 -0700 Message-ID: <20241007221603.1703699-10-dw@davidwei.uk> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20241007221603.1703699-1-dw@davidwei.uk> References: <20241007221603.1703699-1-dw@davidwei.uk> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: David Wei Add a new object called an interface queue (ifq) that represents a net rx queue that has been configured for zero copy. Each ifq is registered using a new registration opcode IORING_REGISTER_ZCRX_IFQ. The refill queue is allocated by the kernel and mapped by userspace using a new offset IORING_OFF_RQ_RING, in a similar fashion to the main SQ/CQ. It is used by userspace to return buffers that it is done with, which will then be re-used by the netdev again. The main CQ ring is used to notify userspace of received data by using the upper 16 bytes of a big CQE as a new struct io_uring_zcrx_cqe. Each entry contains the offset + len to the data. For now, each io_uring instance only has a single ifq. 
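To make the above concrete, a rough userspace sketch of registration and refill queue mapping could look as follows. This is illustrative only and not part of the patch: it assumes the uapi additions from this series are visible through linux/io_uring.h, that the ring was created with IORING_SETUP_DEFER_TASKRUN and IORING_SETUP_CQE32, that the caller has CAP_NET_ADMIN, and it leaves the area descriptor behind area_ptr (added later in the series) to the caller.

	/* illustrative sketch only, not part of the patch */
	#include <stdint.h>
	#include <sys/mman.h>
	#include <sys/syscall.h>
	#include <unistd.h>
	#include <linux/io_uring.h>

	struct zcrx_rq {
		uint32_t *head;			/* consumed by the kernel */
		uint32_t *tail;			/* produced by userspace */
		struct io_uring_zcrx_rqe *rqes;
		uint32_t mask;
	};

	static int zcrx_register(int ring_fd, uint32_t ifindex, uint32_t rxq,
				 uint64_t area_ptr, struct zcrx_rq *rq)
	{
		struct io_uring_zcrx_ifq_reg reg = {
			.if_idx		= ifindex,
			.if_rxq		= rxq,
			.rq_entries	= 4096,
			.area_ptr	= area_ptr, /* -> struct io_uring_zcrx_area_reg */
		};
		void *ring;

		if (syscall(__NR_io_uring_register, ring_fd,
			    IORING_REGISTER_ZCRX_IFQ, &reg, 1))
			return -1;

		/* kernel fills reg.offsets; the refill queue has its own mmap offset */
		ring = mmap(NULL, reg.offsets.mmap_sz, PROT_READ | PROT_WRITE,
			    MAP_SHARED | MAP_POPULATE, ring_fd, IORING_OFF_RQ_RING);
		if (ring == MAP_FAILED)
			return -1;

		rq->head = (uint32_t *)((char *)ring + reg.offsets.head);
		rq->tail = (uint32_t *)((char *)ring + reg.offsets.tail);
		rq->rqes = (struct io_uring_zcrx_rqe *)((char *)ring + reg.offsets.rqes);
		rq->mask = reg.rq_entries - 1; /* rq_entries is rounded to a power of two */
		return 0;
	}
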
Signed-off-by: David Wei --- include/linux/io_uring_types.h | 3 + include/uapi/linux/io_uring.h | 43 ++++++++++ io_uring/Makefile | 1 + io_uring/io_uring.c | 7 ++ io_uring/memmap.c | 8 ++ io_uring/register.c | 7 ++ io_uring/zcrx.c | 147 +++++++++++++++++++++++++++++++++ io_uring/zcrx.h | 39 +++++++++ 8 files changed, 255 insertions(+) create mode 100644 io_uring/zcrx.c create mode 100644 io_uring/zcrx.h diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 3315005df117..ace7ac056d51 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -39,6 +39,8 @@ enum io_uring_cmd_flags { IO_URING_F_COMPAT = (1 << 12), }; +struct io_zcrx_ifq; + struct io_wq_work_node { struct io_wq_work_node *next; }; @@ -372,6 +374,7 @@ struct io_ring_ctx { struct io_alloc_cache rsrc_node_cache; struct wait_queue_head rsrc_quiesce_wq; unsigned rsrc_quiesce; + struct io_zcrx_ifq *ifq; u32 pers_next; struct xarray personalities; diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index adc2524fd8e3..567cdb89711e 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -457,6 +457,8 @@ struct io_uring_cqe { #define IORING_OFF_PBUF_RING 0x80000000ULL #define IORING_OFF_PBUF_SHIFT 16 #define IORING_OFF_MMAP_MASK 0xf8000000ULL +#define IORING_OFF_RQ_RING 0x20000000ULL +#define IORING_OFF_RQ_SHIFT 16 /* * Filled with the offset for mmap(2) @@ -595,6 +597,9 @@ enum io_uring_register_op { IORING_REGISTER_NAPI = 27, IORING_UNREGISTER_NAPI = 28, + /* register a netdev hw rx queue for zerocopy */ + IORING_REGISTER_ZCRX_IFQ = 29, + /* this goes last */ IORING_REGISTER_LAST, @@ -802,6 +807,44 @@ enum io_uring_socket_op { SOCKET_URING_OP_SETSOCKOPT, }; +/* Zero copy receive refill queue entry */ +struct io_uring_zcrx_rqe { + __u64 off; + __u32 len; + __u32 __pad; +}; + +struct io_uring_zcrx_cqe { + __u64 off; + __u64 __pad; +}; + +/* The bit from which area id is encoded into offsets */ +#define IORING_ZCRX_AREA_SHIFT 48 +#define IORING_ZCRX_AREA_MASK (~(((__u64)1 << IORING_ZCRX_AREA_SHIFT) - 1)) + +struct io_uring_zcrx_offsets { + __u32 head; + __u32 tail; + __u32 rqes; + __u32 mmap_sz; + __u64 __resv[2]; +}; + +/* + * Argument for IORING_REGISTER_ZCRX_IFQ + */ +struct io_uring_zcrx_ifq_reg { + __u32 if_idx; + __u32 if_rxq; + __u32 rq_entries; + __u32 flags; + + __u64 area_ptr; /* pointer to struct io_uring_zcrx_area_reg */ + struct io_uring_zcrx_offsets offsets; + __u64 __resv[3]; +}; + #ifdef __cplusplus } #endif diff --git a/io_uring/Makefile b/io_uring/Makefile index 61923e11c767..1a1184f3946a 100644 --- a/io_uring/Makefile +++ b/io_uring/Makefile @@ -10,6 +10,7 @@ obj-$(CONFIG_IO_URING) += io_uring.o opdef.o kbuf.o rsrc.o notif.o \ epoll.o statx.o timeout.o fdinfo.o \ cancel.o waitid.o register.o \ truncate.o memmap.o +obj-$(CONFIG_PAGE_POOL) += zcrx.o obj-$(CONFIG_IO_WQ) += io-wq.o obj-$(CONFIG_FUTEX) += futex.o obj-$(CONFIG_NET_RX_BUSY_POLL) += napi.o diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 3942db160f18..02856245af3c 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -97,6 +97,7 @@ #include "uring_cmd.h" #include "msg_ring.h" #include "memmap.h" +#include "zcrx.h" #include "timeout.h" #include "poll.h" @@ -2600,6 +2601,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx) return; mutex_lock(&ctx->uring_lock); + io_unregister_zcrx_ifqs(ctx); if (ctx->buf_data) __io_sqe_buffers_unregister(ctx); if (ctx->file_data) @@ -2772,6 +2774,11 @@ static __cold void 
io_ring_exit_work(struct work_struct *work) io_cqring_overflow_kill(ctx); mutex_unlock(&ctx->uring_lock); } + if (ctx->ifq) { + mutex_lock(&ctx->uring_lock); + io_shutdown_zcrx_ifqs(ctx); + mutex_unlock(&ctx->uring_lock); + } if (ctx->flags & IORING_SETUP_DEFER_TASKRUN) io_move_task_work_from_local(ctx); diff --git a/io_uring/memmap.c b/io_uring/memmap.c index a0f32a255fd1..4c384e8615f6 100644 --- a/io_uring/memmap.c +++ b/io_uring/memmap.c @@ -12,6 +12,7 @@ #include "memmap.h" #include "kbuf.h" +#include "zcrx.h" static void *io_mem_alloc_compound(struct page **pages, int nr_pages, size_t size, gfp_t gfp) @@ -223,6 +224,10 @@ static void *io_uring_validate_mmap_request(struct file *file, loff_t pgoff, io_put_bl(ctx, bl); return ptr; } + case IORING_OFF_RQ_RING: + if (!ctx->ifq) + return ERR_PTR(-EINVAL); + return ctx->ifq->rq_ring; } return ERR_PTR(-EINVAL); @@ -261,6 +266,9 @@ __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma) ctx->n_sqe_pages); case IORING_OFF_PBUF_RING: return io_pbuf_mmap(file, vma); + case IORING_OFF_RQ_RING: + return io_uring_mmap_pages(ctx, vma, ctx->ifq->rqe_pages, + ctx->ifq->n_rqe_pages); } return -EINVAL; diff --git a/io_uring/register.c b/io_uring/register.c index e3c20be5a198..3b221427e988 100644 --- a/io_uring/register.c +++ b/io_uring/register.c @@ -28,6 +28,7 @@ #include "kbuf.h" #include "napi.h" #include "eventfd.h" +#include "zcrx.h" #define IORING_MAX_RESTRICTIONS (IORING_RESTRICTION_LAST + \ IORING_REGISTER_LAST + IORING_OP_LAST) @@ -511,6 +512,12 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode, break; ret = io_unregister_napi(ctx, arg); break; + case IORING_REGISTER_ZCRX_IFQ: + ret = -EINVAL; + if (!arg || nr_args != 1) + break; + ret = io_register_zcrx_ifq(ctx, arg); + break; default: ret = -EINVAL; break; diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c new file mode 100644 index 000000000000..79d79b9b8df8 --- /dev/null +++ b/io_uring/zcrx.c @@ -0,0 +1,147 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include + +#include + +#include "io_uring.h" +#include "kbuf.h" +#include "memmap.h" +#include "zcrx.h" + +#define IO_RQ_MAX_ENTRIES 32768 + +#if defined(CONFIG_PAGE_POOL) && defined(CONFIG_INET) + +static int io_allocate_rbuf_ring(struct io_zcrx_ifq *ifq, + struct io_uring_zcrx_ifq_reg *reg) +{ + size_t off, size; + void *ptr; + + off = sizeof(struct io_uring); + size = off + sizeof(struct io_uring_zcrx_rqe) * reg->rq_entries; + + ptr = io_pages_map(&ifq->rqe_pages, &ifq->n_rqe_pages, size); + if (IS_ERR(ptr)) + return PTR_ERR(ptr); + + ifq->rq_ring = (struct io_uring *)ptr; + ifq->rqes = (struct io_uring_zcrx_rqe *)((char *)ptr + off); + return 0; +} + +static void io_free_rbuf_ring(struct io_zcrx_ifq *ifq) +{ + io_pages_unmap(ifq->rq_ring, &ifq->rqe_pages, &ifq->n_rqe_pages, true); + ifq->rq_ring = NULL; + ifq->rqes = NULL; +} + +static struct io_zcrx_ifq *io_zcrx_ifq_alloc(struct io_ring_ctx *ctx) +{ + struct io_zcrx_ifq *ifq; + + ifq = kzalloc(sizeof(*ifq), GFP_KERNEL); + if (!ifq) + return NULL; + + ifq->if_rxq = -1; + ifq->ctx = ctx; + return ifq; +} + +static void io_zcrx_ifq_free(struct io_zcrx_ifq *ifq) +{ + io_free_rbuf_ring(ifq); + kfree(ifq); +} + +int io_register_zcrx_ifq(struct io_ring_ctx *ctx, + struct io_uring_zcrx_ifq_reg __user *arg) +{ + struct io_uring_zcrx_ifq_reg reg; + struct io_zcrx_ifq *ifq; + size_t ring_sz, rqes_sz; + int ret; + + /* + * 1. Interface queue allocation. + * 2. It can observe data destined for sockets of other tasks. 
+ */ + if (!capable(CAP_NET_ADMIN)) + return -EPERM; + + /* mandatory io_uring features for zc rx */ + if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN && + ctx->flags & IORING_SETUP_CQE32)) + return -EINVAL; + if (ctx->ifq) + return -EBUSY; + if (copy_from_user(®, arg, sizeof(reg))) + return -EFAULT; + if (reg.__resv[0] || reg.__resv[1] || reg.__resv[2]) + return -EINVAL; + if (reg.if_rxq == -1 || !reg.rq_entries || reg.flags) + return -EINVAL; + if (reg.rq_entries > IO_RQ_MAX_ENTRIES) { + if (!(ctx->flags & IORING_SETUP_CLAMP)) + return -EINVAL; + reg.rq_entries = IO_RQ_MAX_ENTRIES; + } + reg.rq_entries = roundup_pow_of_two(reg.rq_entries); + + if (!reg.area_ptr) + return -EFAULT; + + ifq = io_zcrx_ifq_alloc(ctx); + if (!ifq) + return -ENOMEM; + + ret = io_allocate_rbuf_ring(ifq, ®); + if (ret) + goto err; + + ifq->rq_entries = reg.rq_entries; + ifq->if_rxq = reg.if_rxq; + + ring_sz = sizeof(struct io_uring); + rqes_sz = sizeof(struct io_uring_zcrx_rqe) * ifq->rq_entries; + reg.offsets.mmap_sz = ring_sz + rqes_sz; + reg.offsets.rqes = ring_sz; + reg.offsets.head = offsetof(struct io_uring, head); + reg.offsets.tail = offsetof(struct io_uring, tail); + + if (copy_to_user(arg, ®, sizeof(reg))) { + ret = -EFAULT; + goto err; + } + + ctx->ifq = ifq; + return 0; +err: + io_zcrx_ifq_free(ifq); + return ret; +} + +void io_unregister_zcrx_ifqs(struct io_ring_ctx *ctx) +{ + struct io_zcrx_ifq *ifq = ctx->ifq; + + lockdep_assert_held(&ctx->uring_lock); + + if (!ifq) + return; + + ctx->ifq = NULL; + io_zcrx_ifq_free(ifq); +} + +void io_shutdown_zcrx_ifqs(struct io_ring_ctx *ctx) +{ + lockdep_assert_held(&ctx->uring_lock); +} + +#endif diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h new file mode 100644 index 000000000000..4ef94e19d36b --- /dev/null +++ b/io_uring/zcrx.h @@ -0,0 +1,39 @@ +// SPDX-License-Identifier: GPL-2.0 +#ifndef IOU_ZC_RX_H +#define IOU_ZC_RX_H + +#include + +struct io_zcrx_ifq { + struct io_ring_ctx *ctx; + struct net_device *dev; + struct io_uring *rq_ring; + struct io_uring_zcrx_rqe *rqes; + u32 rq_entries; + + unsigned short n_rqe_pages; + struct page **rqe_pages; + + u32 if_rxq; +}; + +#if defined(CONFIG_PAGE_POOL) && defined(CONFIG_INET) +int io_register_zcrx_ifq(struct io_ring_ctx *ctx, + struct io_uring_zcrx_ifq_reg __user *arg); +void io_unregister_zcrx_ifqs(struct io_ring_ctx *ctx); +void io_shutdown_zcrx_ifqs(struct io_ring_ctx *ctx); +#else +static inline int io_register_zcrx_ifq(struct io_ring_ctx *ctx, + struct io_uring_zcrx_ifq_reg __user *arg) +{ + return -EOPNOTSUPP; +} +static inline void io_unregister_zcrx_ifqs(struct io_ring_ctx *ctx) +{ +} +static inline void io_shutdown_zcrx_ifqs(struct io_ring_ctx *ctx) +{ +} +#endif + +#endif From patchwork Mon Oct 7 22:15:58 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Wei X-Patchwork-Id: 13825348 Received: from mail-pl1-f170.google.com (mail-pl1-f170.google.com [209.85.214.170]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 77AF1192D9E for ; Mon, 7 Oct 2024 22:16:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.170 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728339401; cv=none; b=CEC8W8hso45C9Z68t/4SJSnVZA+LyqbOiCsFqG8RXcozEgKoRl3CFdF2pQq6n8w5L7ktl5FKmSP9z23k2brPExH/R24G+WY+QXsLZ3P2StZ979FvAod0Hv4DuLshvx8YQkgFnb2GvHqkK4obGA8jxZbxMQrDd3K+z/UYiyNTyVk= 
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728339401; c=relaxed/simple; bh=iQqYs6usFP25njJko5JRYcKx1VwHRiofw9tOmNkUUg4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=oBSLnrnQynz6mAFkBef3DEz16sMZ6JBtje0TQReo/ipSyrf/W07Tk+qN8ptF3Qey4ehBfj1pUsqcL/3+DKjdRkMHMJkrV3yfIwrvDndlyGbHt4KOMi8Tce+hJ1PYshnRxVnj22yLm6AMt2RDPZIQHJsVwJ2UC1/kH89NRWXdR98= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=davidwei.uk; spf=none smtp.mailfrom=davidwei.uk; dkim=pass (2048-bit key) header.d=davidwei-uk.20230601.gappssmtp.com header.i=@davidwei-uk.20230601.gappssmtp.com header.b=FlIOqRmw; arc=none smtp.client-ip=209.85.214.170 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=davidwei.uk Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=davidwei.uk Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=davidwei-uk.20230601.gappssmtp.com header.i=@davidwei-uk.20230601.gappssmtp.com header.b="FlIOqRmw" Received: by mail-pl1-f170.google.com with SMTP id d9443c01a7336-20b49ee353cso46362275ad.2 for ; Mon, 07 Oct 2024 15:16:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=davidwei-uk.20230601.gappssmtp.com; s=20230601; t=1728339400; x=1728944200; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=W6nAZRIB2D57h+CRarYB+vjyW8oJUse6eShdZAdmZOQ=; b=FlIOqRmwetFct0+1Ci7igyS/To0PC0h4A6vyT1VG5pRmDHXX/tlPCnUtq+vqQjtQHF EAypSWDZxy3+B7UzEP7CCVRTOX3mpc+3Q8LMLbyyZhA9ZVgG1KZwvQcilZijqJVfu68u xJfdBLEhHpiHv74GTgDsaivLs7moURT2lUVRl1h7famy0GukrdvflLcoPIrSdZTP9lOg Sd2OfzetyccaUwaKqvM65SQUq248heMO9WjI75khb38cmdMTr/0FO3MuHGDg0g5n6A4z 222/FltIO1nBqGDNCz1WHZPMd7ZonFChGjuAqHUFZRt2FNY7UEefGW6ZJxn//yqzP6Bd D1/A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1728339400; x=1728944200; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=W6nAZRIB2D57h+CRarYB+vjyW8oJUse6eShdZAdmZOQ=; b=haPcVIwkC6iOaElC2PDNW57GtspDmi5KWRBRqn4AYfiTGTz4i6TeNRQEhHrcHCNEPA MKnJKvchYQb7OVD09dsWmuc9XWUDAekXW2x/rXoXY7eg7ohGlLJS7KP9vowGChjCb4yX cFiUMOzGIFKlj7rVRUclPSSv5PE+NRg5NpF/oCZp+4gDCa6wr3K6fq2ZtZCR+r/SCHwa VfNV9ZeDAbZEy3b4VjrqfsCl6T8sDZYB1CPx34iVRYYT7LGOOEEFGFSIPMtX10jDQWSD STUR3plWbNYRJzWPw7gpkasjQFOlAWWrSOdJ1q9eeV4C3ADaZ/OewpR9gYninmYb5h+Q 2Avw== X-Forwarded-Encrypted: i=1; AJvYcCVKkOIXvpGKTH/9gqrubytO367/29plaNFl1/kVIlkuJytiSSDHYVYFYjNOgRoZ6QUIpkZTgTg=@vger.kernel.org X-Gm-Message-State: AOJu0YxazwewNQuXe7ulD21wTqwYkGOe8PFqEWwr5YqqNV1vHRUNc1+2 RrBQ653qTYX0EBFPp1zpXa74/lwu10VtzVRFkawvfYAjqCgkMQxCP0d5nZ5Icqk= X-Google-Smtp-Source: AGHT+IHlR+Ip6p5EQgh0qVKVZ71k4429xDchgj1lklos+meUcc/KSh5DkItonBDDj01eGwoTz0hezw== X-Received: by 2002:a17:903:2442:b0:207:1675:6709 with SMTP id d9443c01a7336-20bfd9a527bmr195733225ad.0.1728339399777; Mon, 07 Oct 2024 15:16:39 -0700 (PDT) Received: from localhost (fwdproxy-prn-018.fbsv.net. 
[2a03:2880:ff:12::face:b00c]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-20c139a125asm44245585ad.292.2024.10.07.15.16.39 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Oct 2024 15:16:39 -0700 (PDT) From: David Wei To: io-uring@vger.kernel.org, netdev@vger.kernel.org Cc: David Wei , Jens Axboe , Pavel Begunkov , Jakub Kicinski , Paolo Abeni , "David S. Miller" , Eric Dumazet , Jesper Dangaard Brouer , David Ahern , Mina Almasry Subject: [PATCH v1 10/15] io_uring/zcrx: add io_zcrx_area Date: Mon, 7 Oct 2024 15:15:58 -0700 Message-ID: <20241007221603.1703699-11-dw@davidwei.uk> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20241007221603.1703699-1-dw@davidwei.uk> References: <20241007221603.1703699-1-dw@davidwei.uk> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: David Wei Add io_zcrx_area that represents a region of userspace memory that is used for zero copy. During ifq registration, userspace passes in the uaddr and len of userspace memory, which is then pinned by the kernel. Each net_iov is mapped to one of these pages. The freelist is a spinlock protected list that keeps track of all the net_iovs/pages that aren't used. For now, there is only one area per ifq and area registration happens implicitly as part of ifq registration. There is no API for adding/removing areas yet. The struct for area registration is there for future extensibility once we support multiple areas and TCP devmem. Signed-off-by: Pavel Begunkov Signed-off-by: David Wei --- include/uapi/linux/io_uring.h | 9 ++++ io_uring/rsrc.c | 2 +- io_uring/rsrc.h | 1 + io_uring/zcrx.c | 93 ++++++++++++++++++++++++++++++++++- io_uring/zcrx.h | 16 ++++++ 5 files changed, 118 insertions(+), 3 deletions(-) diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index 567cdb89711e..ffd315d8c6b5 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -831,6 +831,15 @@ struct io_uring_zcrx_offsets { __u64 __resv[2]; }; +struct io_uring_zcrx_area_reg { + __u64 addr; + __u64 len; + __u64 rq_area_token; + __u32 flags; + __u32 __resv1; + __u64 __resv2[2]; +}; + /* * Argument for IORING_REGISTER_ZCRX_IFQ */ diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c index 453867add7ca..42606404019e 100644 --- a/io_uring/rsrc.c +++ b/io_uring/rsrc.c @@ -85,7 +85,7 @@ static int io_account_mem(struct io_ring_ctx *ctx, unsigned long nr_pages) return 0; } -static int io_buffer_validate(struct iovec *iov) +int io_buffer_validate(struct iovec *iov) { unsigned long tmp, acct_len = iov->iov_len + (PAGE_SIZE - 1); diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h index c032ca3436ca..e691e8ed849b 100644 --- a/io_uring/rsrc.h +++ b/io_uring/rsrc.h @@ -74,6 +74,7 @@ int io_register_rsrc_update(struct io_ring_ctx *ctx, void __user *arg, unsigned size, unsigned type); int io_register_rsrc(struct io_ring_ctx *ctx, void __user *arg, unsigned int size, unsigned int type); +int io_buffer_validate(struct iovec *iov); static inline void io_put_rsrc_node(struct io_ring_ctx *ctx, struct io_rsrc_node *node) { diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c index 79d79b9b8df8..8382129402ac 100644 --- a/io_uring/zcrx.c +++ b/io_uring/zcrx.c @@ -10,6 +10,7 @@ #include "kbuf.h" #include "memmap.h" #include "zcrx.h" +#include "rsrc.h" #define IO_RQ_MAX_ENTRIES 32768 @@ -40,6 +41,83 @@ static void io_free_rbuf_ring(struct io_zcrx_ifq *ifq) ifq->rqes = NULL; } +static void io_zcrx_free_area(struct io_zcrx_area *area) +{ + if (area->freelist) + 
kvfree(area->freelist); + if (area->nia.niovs) + kvfree(area->nia.niovs); + if (area->pages) { + unpin_user_pages(area->pages, area->nia.num_niovs); + kvfree(area->pages); + } + kfree(area); +} + +static int io_zcrx_create_area(struct io_ring_ctx *ctx, + struct io_zcrx_ifq *ifq, + struct io_zcrx_area **res, + struct io_uring_zcrx_area_reg *area_reg) +{ + struct io_zcrx_area *area; + int i, ret, nr_pages; + struct iovec iov; + + if (area_reg->flags || area_reg->rq_area_token) + return -EINVAL; + if (area_reg->__resv1 || area_reg->__resv2[0] || area_reg->__resv2[1]) + return -EINVAL; + if (area_reg->addr & ~PAGE_MASK || area_reg->len & ~PAGE_MASK) + return -EINVAL; + + iov.iov_base = u64_to_user_ptr(area_reg->addr); + iov.iov_len = area_reg->len; + ret = io_buffer_validate(&iov); + if (ret) + return ret; + + ret = -ENOMEM; + area = kzalloc(sizeof(*area), GFP_KERNEL); + if (!area) + goto err; + + area->pages = io_pin_pages((unsigned long)area_reg->addr, area_reg->len, + &nr_pages); + if (IS_ERR(area->pages)) { + ret = PTR_ERR(area->pages); + area->pages = NULL; + goto err; + } + area->nia.num_niovs = nr_pages; + + area->nia.niovs = kvmalloc_array(nr_pages, sizeof(area->nia.niovs[0]), + GFP_KERNEL | __GFP_ZERO); + if (!area->nia.niovs) + goto err; + + area->freelist = kvmalloc_array(nr_pages, sizeof(area->freelist[0]), + GFP_KERNEL | __GFP_ZERO); + if (!area->freelist) + goto err; + + for (i = 0; i < nr_pages; i++) { + area->freelist[i] = i; + } + + area->free_count = nr_pages; + area->ifq = ifq; + /* we're only supporting one area per ifq for now */ + area->area_id = 0; + area_reg->rq_area_token = (u64)area->area_id << IORING_ZCRX_AREA_SHIFT; + spin_lock_init(&area->freelist_lock); + *res = area; + return 0; +err: + if (area) + io_zcrx_free_area(area); + return ret; +} + static struct io_zcrx_ifq *io_zcrx_ifq_alloc(struct io_ring_ctx *ctx) { struct io_zcrx_ifq *ifq; @@ -55,6 +133,9 @@ static struct io_zcrx_ifq *io_zcrx_ifq_alloc(struct io_ring_ctx *ctx) static void io_zcrx_ifq_free(struct io_zcrx_ifq *ifq) { + if (ifq->area) + io_zcrx_free_area(ifq->area); + io_free_rbuf_ring(ifq); kfree(ifq); } @@ -62,6 +143,7 @@ static void io_zcrx_ifq_free(struct io_zcrx_ifq *ifq) int io_register_zcrx_ifq(struct io_ring_ctx *ctx, struct io_uring_zcrx_ifq_reg __user *arg) { + struct io_uring_zcrx_area_reg area; struct io_uring_zcrx_ifq_reg reg; struct io_zcrx_ifq *ifq; size_t ring_sz, rqes_sz; @@ -93,7 +175,7 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx, } reg.rq_entries = roundup_pow_of_two(reg.rq_entries); - if (!reg.area_ptr) + if (copy_from_user(&area, u64_to_user_ptr(reg.area_ptr), sizeof(area))) return -EFAULT; ifq = io_zcrx_ifq_alloc(ctx); @@ -104,6 +186,10 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx, if (ret) goto err; + ret = io_zcrx_create_area(ctx, ifq, &ifq->area, &area); + if (ret) + goto err; + ifq->rq_entries = reg.rq_entries; ifq->if_rxq = reg.if_rxq; @@ -118,7 +204,10 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx, ret = -EFAULT; goto err; } - + if (copy_to_user(u64_to_user_ptr(reg.area_ptr), &area, sizeof(area))) { + ret = -EFAULT; + goto err; + } ctx->ifq = ifq; return 0; err: diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h index 4ef94e19d36b..2fcbeb3d5501 100644 --- a/io_uring/zcrx.h +++ b/io_uring/zcrx.h @@ -3,10 +3,26 @@ #define IOU_ZC_RX_H #include +#include + +struct io_zcrx_area { + struct net_iov_area nia; + struct io_zcrx_ifq *ifq; + + u16 area_id; + struct page **pages; + + /* freelist */ + spinlock_t freelist_lock ____cacheline_aligned_in_smp; + u32 
free_count; + u32 *freelist; +}; struct io_zcrx_ifq { struct io_ring_ctx *ctx; struct net_device *dev; + struct io_zcrx_area *area; + struct io_uring *rq_ring; struct io_uring_zcrx_rqe *rqes; u32 rq_entries; From patchwork Mon Oct 7 22:15:59 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Wei X-Patchwork-Id: 13825349 Received: from mail-pl1-f175.google.com (mail-pl1-f175.google.com [209.85.214.175]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DB40D19340D for ; Mon, 7 Oct 2024 22:16:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.175 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728339404; cv=none; b=YFg++4r5gg9nEfUVW4vSg61RgCzCtta5NDtwfK1/qHNQrym96zH9cm0EZOIOjCQkhyxFTgmHPw/m+u9HQOSIRlfxSrU7m7L3SpGKK6ccUqXw6HhyuHeP+uAACSCF6QM+5qKOFSRzENNdhiS83ifgrpWLRz3PX019qPRf1FR03Qw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728339404; c=relaxed/simple; bh=ylDZLEz8geNgtOLThiFGfRQCFI9rwkwD2KrflKGTq5s=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=iXAtQCgj+B+IBCCo0Ne72SO20zCnlcnX90hWLdqaYiH98YAY25T5zBhEfKWeTbJNEwt5fD9McxUr5D6rwso+tSP0hqoUDfv4HJy+7yyYkCh4AhXqQEd8v7BCVXlqwEP1J3MWoylHw8K8XUuzWdf3ZH65+pEh6dRvalhMjHo6tvY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=davidwei.uk; spf=none smtp.mailfrom=davidwei.uk; dkim=pass (2048-bit key) header.d=davidwei-uk.20230601.gappssmtp.com header.i=@davidwei-uk.20230601.gappssmtp.com header.b=z7jpZelP; arc=none smtp.client-ip=209.85.214.175 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=davidwei.uk Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=davidwei.uk Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=davidwei-uk.20230601.gappssmtp.com header.i=@davidwei-uk.20230601.gappssmtp.com header.b="z7jpZelP" Received: by mail-pl1-f175.google.com with SMTP id d9443c01a7336-20b6c311f62so42950285ad.0 for ; Mon, 07 Oct 2024 15:16:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=davidwei-uk.20230601.gappssmtp.com; s=20230601; t=1728339401; x=1728944201; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=tQt2snHeeCUl2wmuoSme0YVWbdUzaN506EVWMGFosjk=; b=z7jpZelPEy03YeI2RZJRd6qcF7FcyvKXuwPzeatfbiavX9PVeWbcxo5GmNhxuZzqqY Rz+dD4M87/Hcq2mDfQsYY5SFtH/X+PRezrKFwg8X7agc8EIcpnIb7XdeYxUZq1QMuAG2 FRKAMybapMKm0H9+Qf0UV3RwhtjmxAxmRK7r4n0Q1JBM5rItYXY5LJ/glY04h9frXkJm w95bVtL5jt1CYDLV8X6xMiJtvCndcUb9sNOj6iq5hPowGT/wlVTh8xbcRKfK/uYWbBl8 dWpDIL9r04SHLBej1NJob4eQT5iepndgR6sHgyoroGG4sQWslXzLZ94mFxchcUfmsqT/ WbwA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1728339401; x=1728944201; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=tQt2snHeeCUl2wmuoSme0YVWbdUzaN506EVWMGFosjk=; b=A9WFmAwN4k9CWB1xODKOrJsNAdl9pEPxKWHSYhPF3ZjMZJ1FPn21HhUCFpKauKVBIw ey8VW/A1OTkGhWizUXE7dOpxhV0k77YTVL36Y7/RhMIj4mfJdPm4yC2UjMufNPilIqez vmw1gZzk64n1req862uCpQTMqm7J9Q27G0NXBUVirIy3hCY80VPEIdHOBPulAOU3Lbvx 
4QbokWoB0536bJ5XZkWWnFWInkWS7hY9TQKmLAzKR9/Bl3T8AMwXNhwgOpoIrAXorizv o3o7VWDD6aOxcg8bu9CouBnVGDgManKJ0Na6lH1mEXp1HFnbybTB8+pFmRCmuCoWdavf WYdw== X-Forwarded-Encrypted: i=1; AJvYcCUHw8OGxykiJ2Yh/wJx6pKMAl77nTB6TLjIfwhYgp+IBEELs99ijE3KcJihoGOhSAbTvBCsNqo=@vger.kernel.org X-Gm-Message-State: AOJu0Yy5OUU9MJ20elmXGSjbAx89RZEMpbjLhZEybnyZIzhRydc20RbN kTvRkBqjecPgnFJvJWla4YxkecHQcBIf3AReU3mVCbxKBmj82IJCVikOyjhq3cg= X-Google-Smtp-Source: AGHT+IFoaIh1llAgQrmo/PzVebBGvOzgjnBwWIYt06hJ4MGa6qg+e/TY3zmCQmbl8EIBVbLnZBEGRQ== X-Received: by 2002:a17:902:e745:b0:20b:c1e4:2d70 with SMTP id d9443c01a7336-20bfe294c52mr204002135ad.23.1728339401120; Mon, 07 Oct 2024 15:16:41 -0700 (PDT) Received: from localhost (fwdproxy-prn-054.fbsv.net. [2a03:2880:ff:36::face:b00c]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-20c1398b868sm44342415ad.244.2024.10.07.15.16.40 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Oct 2024 15:16:40 -0700 (PDT) From: David Wei To: io-uring@vger.kernel.org, netdev@vger.kernel.org Cc: David Wei , Jens Axboe , Pavel Begunkov , Jakub Kicinski , Paolo Abeni , "David S. Miller" , Eric Dumazet , Jesper Dangaard Brouer , David Ahern , Mina Almasry Subject: [PATCH v1 11/15] io_uring/zcrx: implement zerocopy receive pp memory provider Date: Mon, 7 Oct 2024 15:15:59 -0700 Message-ID: <20241007221603.1703699-12-dw@davidwei.uk> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20241007221603.1703699-1-dw@davidwei.uk> References: <20241007221603.1703699-1-dw@davidwei.uk> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Pavel Begunkov Implement a page pool memory provider for io_uring to receive in a zero copy fashion. For that, the provider allocates user pages wrapped into struct net_iovs, which are stored in a previously registered struct net_iov_area. Unlike with traditional receives, for which pages from a page pool can be deallocated right after the user receives data, e.g. via recv(2), we extend the lifetime by recycling buffers only after the user space acknowledges that it's done processing the data via the refill queue. Before handing buffers to the user, we mark them by bumping the refcount by a bias value IO_ZC_RX_UREF, which will be checked when the buffer is returned back. When the corresponding io_uring instance and/or page pool are destroyed, we'll force back all buffers that are currently in the user space in ->io_pp_zc_scrub by clearing the bias. Refcounting and lifetime: Initially, all buffers are considered unallocated and stored in ->freelist, at which point they are not yet directly exposed to the core page pool code and not accounted to page pool's pages_state_hold_cnt. The ->alloc_netmems callback will allocate them by placing them into the page pool's cache, setting the refcount to 1 as usual and adjusting pages_state_hold_cnt. Then, either the buffer is dropped and returns back to the page pool into the ->freelist via io_pp_zc_release_netmem, in which case the page pool will match hold_cnt for us with ->pages_state_release_cnt. Or more likely the buffer will go through the network/protocol stacks and end up in the corresponding socket's receive queue. From there the user can get it via a new io_uring request implemented in the following patches. As mentioned above, before giving a buffer to the user we bump the refcount by IO_ZC_RX_UREF.
Once the user is done with the buffer processing, it must return it back via the refill queue, from where our ->alloc_netmems implementation can grab it, check references, put IO_ZC_RX_UREF, and recycle the buffer if there are no more users left. As we place such buffers right back into the page pools fast cache and they didn't go through the normal pp release path, they are still considered "allocated" and no pp hold_cnt is required. For the same reason we dma sync buffers for the device in io_zc_add_pp_cache(). Signed-off-by: Pavel Begunkov Signed-off-by: David Wei --- include/linux/io_uring/net.h | 5 + io_uring/zcrx.c | 229 +++++++++++++++++++++++++++++++++++ io_uring/zcrx.h | 6 + 3 files changed, 240 insertions(+) diff --git a/include/linux/io_uring/net.h b/include/linux/io_uring/net.h index b58f39fed4d5..610b35b451fd 100644 --- a/include/linux/io_uring/net.h +++ b/include/linux/io_uring/net.h @@ -5,6 +5,11 @@ struct io_uring_cmd; #if defined(CONFIG_IO_URING) + +#if defined(CONFIG_PAGE_POOL) +extern const struct memory_provider_ops io_uring_pp_zc_ops; +#endif + int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags); #else diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c index 8382129402ac..6cd3dee8b90a 100644 --- a/io_uring/zcrx.c +++ b/io_uring/zcrx.c @@ -2,7 +2,11 @@ #include #include #include +#include +#include #include +#include +#include #include @@ -16,6 +20,13 @@ #if defined(CONFIG_PAGE_POOL) && defined(CONFIG_INET) +static inline struct io_zcrx_area *io_zcrx_iov_to_area(const struct net_iov *niov) +{ + struct net_iov_area *owner = net_iov_owner(niov); + + return container_of(owner, struct io_zcrx_area, nia); +} + static int io_allocate_rbuf_ring(struct io_zcrx_ifq *ifq, struct io_uring_zcrx_ifq_reg *reg) { @@ -101,6 +112,9 @@ static int io_zcrx_create_area(struct io_ring_ctx *ctx, goto err; for (i = 0; i < nr_pages; i++) { + struct net_iov *niov = &area->nia.niovs[i]; + + niov->owner = &area->nia; area->freelist[i] = i; } @@ -233,4 +247,219 @@ void io_shutdown_zcrx_ifqs(struct io_ring_ctx *ctx) lockdep_assert_held(&ctx->uring_lock); } +static bool io_zcrx_niov_put(struct net_iov *niov, int nr) +{ + return atomic_long_sub_and_test(nr, &niov->pp_ref_count); +} + +static bool io_zcrx_put_niov_uref(struct net_iov *niov) +{ + if (atomic_long_read(&niov->pp_ref_count) < IO_ZC_RX_UREF) + return false; + + return io_zcrx_niov_put(niov, IO_ZC_RX_UREF); +} + +static inline void io_zc_add_pp_cache(struct page_pool *pp, + struct net_iov *niov) +{ + netmem_ref netmem = net_iov_to_netmem(niov); + +#if defined(CONFIG_HAS_DMA) && defined(CONFIG_DMA_NEED_SYNC) + if (pp->dma_sync && dma_dev_need_sync(pp->p.dev)) { + dma_addr_t dma_addr = page_pool_get_dma_addr_netmem(netmem); + + dma_sync_single_range_for_device(pp->p.dev, dma_addr, + pp->p.offset, pp->p.max_len, + pp->p.dma_dir); + } +#endif + + page_pool_fragment_netmem(netmem, 1); + pp->alloc.cache[pp->alloc.count++] = netmem; +} + +static inline u32 io_zcrx_rqring_entries(struct io_zcrx_ifq *ifq) +{ + u32 entries; + + entries = smp_load_acquire(&ifq->rq_ring->tail) - ifq->cached_rq_head; + return min(entries, ifq->rq_entries); +} + +static struct io_uring_zcrx_rqe *io_zcrx_get_rqe(struct io_zcrx_ifq *ifq, + unsigned mask) +{ + unsigned int idx = ifq->cached_rq_head++ & mask; + + return &ifq->rqes[idx]; +} + +static void io_zcrx_ring_refill(struct page_pool *pp, + struct io_zcrx_ifq *ifq) +{ + unsigned int entries = io_zcrx_rqring_entries(ifq); + unsigned int mask = ifq->rq_entries - 1; + + entries = min_t(unsigned, entries, 
PP_ALLOC_CACHE_REFILL - pp->alloc.count); + if (unlikely(!entries)) + return; + + do { + struct io_uring_zcrx_rqe *rqe = io_zcrx_get_rqe(ifq, mask); + struct io_zcrx_area *area; + struct net_iov *niov; + unsigned niov_idx, area_idx; + + area_idx = rqe->off >> IORING_ZCRX_AREA_SHIFT; + niov_idx = (rqe->off & ~IORING_ZCRX_AREA_MASK) / PAGE_SIZE; + + if (unlikely(rqe->__pad || area_idx)) + continue; + area = ifq->area; + + if (unlikely(niov_idx >= area->nia.num_niovs)) + continue; + niov_idx = array_index_nospec(niov_idx, area->nia.num_niovs); + + niov = &area->nia.niovs[niov_idx]; + if (!io_zcrx_put_niov_uref(niov)) + continue; + io_zc_add_pp_cache(pp, niov); + } while (--entries); + + smp_store_release(&ifq->rq_ring->head, ifq->cached_rq_head); +} + +static void io_zcrx_refill_slow(struct page_pool *pp, struct io_zcrx_ifq *ifq) +{ + struct io_zcrx_area *area = ifq->area; + + spin_lock_bh(&area->freelist_lock); + while (area->free_count && pp->alloc.count < PP_ALLOC_CACHE_REFILL) { + struct net_iov *niov; + u32 pgid; + + pgid = area->freelist[--area->free_count]; + niov = &area->nia.niovs[pgid]; + + io_zc_add_pp_cache(pp, niov); + + pp->pages_state_hold_cnt++; + trace_page_pool_state_hold(pp, net_iov_to_netmem(niov), + pp->pages_state_hold_cnt); + } + spin_unlock_bh(&area->freelist_lock); +} + +static void io_zcrx_recycle_niov(struct net_iov *niov) +{ + struct io_zcrx_area *area = io_zcrx_iov_to_area(niov); + + spin_lock_bh(&area->freelist_lock); + area->freelist[area->free_count++] = net_iov_idx(niov); + spin_unlock_bh(&area->freelist_lock); +} + +static netmem_ref io_pp_zc_alloc_netmems(struct page_pool *pp, gfp_t gfp) +{ + struct io_zcrx_ifq *ifq = pp->mp_priv; + + /* pp should already be ensuring that */ + if (unlikely(pp->alloc.count)) + goto out_return; + + io_zcrx_ring_refill(pp, ifq); + if (likely(pp->alloc.count)) + goto out_return; + + io_zcrx_refill_slow(pp, ifq); + if (!pp->alloc.count) + return 0; +out_return: + return pp->alloc.cache[--pp->alloc.count]; +} + +static bool io_pp_zc_release_netmem(struct page_pool *pp, netmem_ref netmem) +{ + struct net_iov *niov; + + if (WARN_ON_ONCE(!netmem_is_net_iov(netmem))) + return false; + + niov = netmem_to_net_iov(netmem); + + if (io_zcrx_niov_put(niov, 1)) + io_zcrx_recycle_niov(niov); + return false; +} + +static void io_pp_zc_scrub(struct page_pool *pp) +{ + struct io_zcrx_ifq *ifq = pp->mp_priv; + struct io_zcrx_area *area = ifq->area; + int i; + + /* Reclaim back all buffers given to the user space. 
*/ + for (i = 0; i < area->nia.num_niovs; i++) { + struct net_iov *niov = &area->nia.niovs[i]; + int count; + + if (!io_zcrx_put_niov_uref(niov)) + continue; + io_zcrx_recycle_niov(niov); + + count = atomic_inc_return_relaxed(&pp->pages_state_release_cnt); + trace_page_pool_state_release(pp, net_iov_to_netmem(niov), count); + } +} + +static int io_pp_zc_init(struct page_pool *pp) +{ + struct io_zcrx_ifq *ifq = pp->mp_priv; + struct io_zcrx_area *area = ifq->area; + int ret; + + if (!ifq) + return -EINVAL; + if (pp->p.order != 0) + return -EINVAL; + if (!pp->p.napi) + return -EINVAL; + if (!pp->p.napi->napi_id) + return -EINVAL; + + ret = page_pool_init_paged_area(pp, &area->nia, area->pages); + if (ret) + return ret; + + ifq->napi_id = pp->p.napi->napi_id; + percpu_ref_get(&ifq->ctx->refs); + ifq->pp = pp; + return 0; +} + +static void io_pp_zc_destroy(struct page_pool *pp) +{ + struct io_zcrx_ifq *ifq = pp->mp_priv; + struct io_zcrx_area *area = ifq->area; + + page_pool_release_area(pp, &ifq->area->nia); + + ifq->pp = NULL; + ifq->napi_id = 0; + + if (WARN_ON_ONCE(area->free_count != area->nia.num_niovs)) + return; + percpu_ref_put(&ifq->ctx->refs); +} + +const struct memory_provider_ops io_uring_pp_zc_ops = { + .alloc_netmems = io_pp_zc_alloc_netmems, + .release_netmem = io_pp_zc_release_netmem, + .init = io_pp_zc_init, + .destroy = io_pp_zc_destroy, + .scrub = io_pp_zc_scrub, +}; + #endif diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h index 2fcbeb3d5501..67512fc69cc4 100644 --- a/io_uring/zcrx.h +++ b/io_uring/zcrx.h @@ -5,6 +5,9 @@ #include #include +#define IO_ZC_RX_UREF 0x10000 +#define IO_ZC_RX_KREF_MASK (IO_ZC_RX_UREF - 1) + struct io_zcrx_area { struct net_iov_area nia; struct io_zcrx_ifq *ifq; @@ -22,15 +25,18 @@ struct io_zcrx_ifq { struct io_ring_ctx *ctx; struct net_device *dev; struct io_zcrx_area *area; + struct page_pool *pp; struct io_uring *rq_ring; struct io_uring_zcrx_rqe *rqes; u32 rq_entries; + u32 cached_rq_head; unsigned short n_rqe_pages; struct page **rqe_pages; u32 if_rxq; + unsigned napi_id; }; #if defined(CONFIG_PAGE_POOL) && defined(CONFIG_INET) From patchwork Mon Oct 7 22:16:00 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Wei X-Patchwork-Id: 13825351 Received: from mail-pl1-f176.google.com (mail-pl1-f176.google.com [209.85.214.176]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4590E19993D for ; Mon, 7 Oct 2024 22:16:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.176 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728339406; cv=none; b=iXXq3A+fWvYZAa3Cmvq9QdK+wJA3T8R6DqKA/mYzyQQjkU4t//YDbaLBaA28zGSbjoXeJNPBRv5mCPQe3zNygg4CmRvS/oGhzPXlH660DjZ8FJTXqQu/BEr295eJ4cpV0X6hIJq/ibjqBxcmAFahjGyx/QcJHoLcQz1MRd9uu9I= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728339406; c=relaxed/simple; bh=pUIpk8i1wQ1Q/0z8XtuP2bi46yAb28sOlJ9m1HZABoU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Eyy44cz+GwIIg1ii872e4qyiwcatCyz+0kmxNWDwcCy+adnRH217fTTcBDJITxqn/tpaq4iVORmoO2M5zLMSTiBRAqYFc0vGClRV3GvKpIUToO1pzR6UGt9ATvYBIRcn6JOZRBuimqtMb+bOgjag1GwPZbdINbYpK3O7vCQYiI0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=davidwei.uk; spf=none smtp.mailfrom=davidwei.uk; dkim=pass (2048-bit key) 
header.d=davidwei-uk.20230601.gappssmtp.com header.i=@davidwei-uk.20230601.gappssmtp.com header.b=NjEK3HK2; arc=none smtp.client-ip=209.85.214.176 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=davidwei.uk Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=davidwei.uk Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=davidwei-uk.20230601.gappssmtp.com header.i=@davidwei-uk.20230601.gappssmtp.com header.b="NjEK3HK2" Received: by mail-pl1-f176.google.com with SMTP id d9443c01a7336-20b95359440so41944185ad.0 for ; Mon, 07 Oct 2024 15:16:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=davidwei-uk.20230601.gappssmtp.com; s=20230601; t=1728339402; x=1728944202; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=43HapEXGd6Lc5X5h6j6PBqWrgQK7wPQguqF5TQ9rKgg=; b=NjEK3HK2BUxW7VpwoHMC8v3dYHNpogoiYTd7DIDzSfkNgBFDIk1H0xG+9XeJuAfNw3 sFJNI+mSQ0ZWXD5jyOsTEhfeLaIW36Al50bTG+2+cZyOTJtQLMWyQnAro8KPoiAzLu89 9HhBU07uPgPfyDWyQK6i4CrgmcWqzvbfjlDEcGSL9DlffTh8BXkgX+CoMNJoGV7olsxo VnOCwW/nU6abk0xXgjfCo1CIOsyjYdX9yUvXbJbdHibUFggK0VPqZqqfHIxFp2lXu4WE CNXAF+ilVHNEOBwdSKoALp8XS6ngOKrczaBHUJVGhubEK0vUKnA8kr2BoAffp9OUlMzF dX3w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1728339402; x=1728944202; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=43HapEXGd6Lc5X5h6j6PBqWrgQK7wPQguqF5TQ9rKgg=; b=MugJDAC2GaG42okXriMCJin9sibr636Zicbj84tOnNgj0WL2aD/78+Jl+Vjv6XbD5Z 2r2q5JD0N1kMvREzsuYH2+NxPhhGU7Bmybmo2CnMP/ByJbvuNFYTnPwi09tOuR54AI3E KPNk/Y3/Un1u3zR4SNBkxk7nxkrvo/Itgf/ekESnju64t47zlYBXrICSQ9Z6bpv4WBGG KAFoM4B4+0c+W+TrNq882b3CO+ZSXcE3zgFOaLKq9D5Ld0dJoz5BothZEdsGs8R+m4zy o69WLO2+mser35JucjQu06dwWHInkHCxpau0O2EyY3Uo2vy3vbS+dqSa3UK0lbQTsud0 hSrQ== X-Forwarded-Encrypted: i=1; AJvYcCX8m+jnYbdASZYLlr4X9C/IvA8mhUiRdCFBc0fH3e/M1XDbsGOU6tQBpBAFwgy2t2r2kDu7wiE=@vger.kernel.org X-Gm-Message-State: AOJu0Ywh9fIPNSfbfghIdgPzL0cJ6A7DZro73vc3803FKCWi8ygmrBJA S04xtRWnc5o6dVf+3aozH0rLV2at+MZVcTUkqWz3INEKqaIFr+hxttbVAzFv9tvVSBsU/gTN60k J X-Google-Smtp-Source: AGHT+IHC+h7Hwu+SYTbh5uEKgKtA0wwimvtjrNHm4Z7C5dM9mz+BX1pjVZCPkdbYvZPp6CJp3hGbfg== X-Received: by 2002:a17:902:f541:b0:20b:983c:f095 with SMTP id d9443c01a7336-20bff04cdf0mr182503845ad.51.1728339402411; Mon, 07 Oct 2024 15:16:42 -0700 (PDT) Received: from localhost (fwdproxy-prn-039.fbsv.net. [2a03:2880:ff:27::face:b00c]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-20c138b236fsm44338085ad.19.2024.10.07.15.16.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Oct 2024 15:16:41 -0700 (PDT) From: David Wei To: io-uring@vger.kernel.org, netdev@vger.kernel.org Cc: David Wei , Jens Axboe , Pavel Begunkov , Jakub Kicinski , Paolo Abeni , "David S. 
Miller" , Eric Dumazet , Jesper Dangaard Brouer , David Ahern , Mina Almasry Subject: [PATCH v1 12/15] io_uring/zcrx: add io_recvzc request Date: Mon, 7 Oct 2024 15:16:00 -0700 Message-ID: <20241007221603.1703699-13-dw@davidwei.uk> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20241007221603.1703699-1-dw@davidwei.uk> References: <20241007221603.1703699-1-dw@davidwei.uk> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Add io_uring opcode OP_RECV_ZC for doing zero copy reads out of a socket. Only the connection should land on the specific rx queue set up for zero copy, and the socket must be handled by the io_uring instance that the rx queue was registered for zero copy with. That's because neither net_iovs / buffers from our queue can be read by outside applications, nor is zero copy possible if traffic for the zero copy connection goes to another queue. This coordination is outside of the scope of this patch series. Also, any traffic directed to the zero copy enabled queue is immediately visible to the application, which is why CAP_NET_ADMIN is required at the registration step. Of course, no data is actually read out of the socket; it has already been copied by the netdev into userspace memory via DMA. OP_RECV_ZC reads skbs out of the socket and checks that their frags are indeed net_iovs that belong to io_uring. A cqe is queued for each one of these frags. Recall that each cqe is a big cqe, with the top half being an io_uring_zcrx_cqe. The cqe res field contains the len or error. The lower IORING_ZCRX_AREA_SHIFT bits of the struct io_uring_zcrx_cqe::off field contain the offset relative to the start of the zero copy area. The upper part of the off field is trivially zero, and will be used to carry the area id. For now, there is no limit as to how much work each OP_RECV_ZC request does. It will attempt to drain a socket of all available data. This request always operates in multishot mode.
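For illustration, a consumer of these completions might look roughly like the sketch below (again not part of the patch). It builds on the registration sketch shown with patch 9: area_base is the start of the registered area, rq is the mapped refill queue, rq_area_token comes from the area registration added earlier in the series, consume_data() is an application-defined stand-in, 4 KiB pages are assumed, and the tail publication is shown without the release barrier a real implementation would use.

	/* illustrative sketch only, not part of the patch */
	static void handle_recvzc_cqe(struct io_uring_cqe *cqe, void *area_base,
				      uint64_t rq_area_token, struct zcrx_rq *rq)
	{
		/* with IORING_SETUP_CQE32, zcrx metadata is the top half of the big cqe */
		struct io_uring_zcrx_cqe *zcqe = (struct io_uring_zcrx_cqe *)(cqe + 1);
		uint64_t off_mask = ((uint64_t)1 << IORING_ZCRX_AREA_SHIFT) - 1;
		uint64_t off = zcqe->off & off_mask;	/* byte offset into the area */
		uint32_t idx;

		if (cqe->res <= 0)
			return;				/* error or nothing received */

		consume_data((char *)area_base + off, cqe->res); /* application-defined */

		/* hand the page-sized buffer back to the kernel via the refill queue */
		idx = *rq->tail & rq->mask;
		rq->rqes[idx].off = (off & ~((uint64_t)4096 - 1)) | rq_area_token;
		rq->rqes[idx].len = 4096;
		rq->rqes[idx].__pad = 0;	/* the kernel skips entries with __pad set */
		(*rq->tail)++;	/* a real consumer publishes this with a release store */
	}
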
Signed-off-by: David Wei --- include/uapi/linux/io_uring.h | 2 + io_uring/io_uring.h | 10 ++ io_uring/net.c | 78 +++++++++++++++ io_uring/opdef.c | 16 +++ io_uring/zcrx.c | 180 ++++++++++++++++++++++++++++++++++ io_uring/zcrx.h | 11 +++ 6 files changed, 297 insertions(+) diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index ffd315d8c6b5..c9c9877f2ba7 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -87,6 +87,7 @@ struct io_uring_sqe { union { __s32 splice_fd_in; __u32 file_index; + __u32 zcrx_ifq_idx; __u32 optlen; struct { __u16 addr_len; @@ -259,6 +260,7 @@ enum io_uring_op { IORING_OP_FTRUNCATE, IORING_OP_BIND, IORING_OP_LISTEN, + IORING_OP_RECV_ZC, /* this goes last, obviously */ IORING_OP_LAST, diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index c2acf6180845..8cec53a63c39 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -171,6 +171,16 @@ static inline bool io_get_cqe(struct io_ring_ctx *ctx, struct io_uring_cqe **ret return io_get_cqe_overflow(ctx, ret, false); } +static inline bool io_defer_get_uncommited_cqe(struct io_ring_ctx *ctx, + struct io_uring_cqe **cqe_ret) +{ + io_lockdep_assert_cq_locked(ctx); + + ctx->cq_extra++; + ctx->submit_state.cq_flush = true; + return io_get_cqe(ctx, cqe_ret); +} + static __always_inline bool io_fill_cqe_req(struct io_ring_ctx *ctx, struct io_kiocb *req) { diff --git a/io_uring/net.c b/io_uring/net.c index d08abcca89cc..482e138d2994 100644 --- a/io_uring/net.c +++ b/io_uring/net.c @@ -16,6 +16,7 @@ #include "net.h" #include "notif.h" #include "rsrc.h" +#include "zcrx.h" #if defined(CONFIG_NET) struct io_shutdown { @@ -89,6 +90,13 @@ struct io_sr_msg { */ #define MULTISHOT_MAX_RETRY 32 +struct io_recvzc { + struct file *file; + unsigned msg_flags; + u16 flags; + struct io_zcrx_ifq *ifq; +}; + int io_shutdown_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) { struct io_shutdown *shutdown = io_kiocb_to_cmd(req, struct io_shutdown); @@ -1193,6 +1201,76 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags) return ret; } +int io_recvzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) +{ + struct io_recvzc *zc = io_kiocb_to_cmd(req, struct io_recvzc); + unsigned ifq_idx; + + if (unlikely(sqe->file_index || sqe->addr2 || sqe->addr || + sqe->len || sqe->addr3)) + return -EINVAL; + + ifq_idx = READ_ONCE(sqe->zcrx_ifq_idx); + if (ifq_idx != 0) + return -EINVAL; + zc->ifq = req->ctx->ifq; + if (!zc->ifq) + return -EINVAL; + + /* All data completions are posted as aux CQEs. 
*/ + req->flags |= REQ_F_APOLL_MULTISHOT; + + zc->flags = READ_ONCE(sqe->ioprio); + zc->msg_flags = READ_ONCE(sqe->msg_flags); + if (zc->msg_flags) + return -EINVAL; + if (zc->flags & ~(IORING_RECVSEND_POLL_FIRST | IORING_RECV_MULTISHOT)) + return -EINVAL; + + +#ifdef CONFIG_COMPAT + if (req->ctx->compat) + zc->msg_flags |= MSG_CMSG_COMPAT; +#endif + return 0; +} + +int io_recvzc(struct io_kiocb *req, unsigned int issue_flags) +{ + struct io_recvzc *zc = io_kiocb_to_cmd(req, struct io_recvzc); + struct io_zcrx_ifq *ifq; + struct socket *sock; + int ret; + + if (!(req->flags & REQ_F_POLLED) && + (zc->flags & IORING_RECVSEND_POLL_FIRST)) + return -EAGAIN; + + sock = sock_from_file(req->file); + if (unlikely(!sock)) + return -ENOTSOCK; + ifq = req->ctx->ifq; + if (!ifq) + return -EINVAL; + + ret = io_zcrx_recv(req, ifq, sock, zc->msg_flags | MSG_DONTWAIT); + if (unlikely(ret <= 0) && ret != -EAGAIN) { + if (ret == -ERESTARTSYS) + ret = -EINTR; + + req_set_fail(req); + io_req_set_res(req, ret, 0); + + if (issue_flags & IO_URING_F_MULTISHOT) + return IOU_STOP_MULTISHOT; + return IOU_OK; + } + + if (issue_flags & IO_URING_F_MULTISHOT) + return IOU_ISSUE_SKIP_COMPLETE; + return -EAGAIN; +} + void io_send_zc_cleanup(struct io_kiocb *req) { struct io_sr_msg *zc = io_kiocb_to_cmd(req, struct io_sr_msg); diff --git a/io_uring/opdef.c b/io_uring/opdef.c index a2be3bbca5ff..599eb3ea5ff4 100644 --- a/io_uring/opdef.c +++ b/io_uring/opdef.c @@ -36,6 +36,7 @@ #include "waitid.h" #include "futex.h" #include "truncate.h" +#include "zcrx.h" static int io_no_issue(struct io_kiocb *req, unsigned int issue_flags) { @@ -513,6 +514,18 @@ const struct io_issue_def io_issue_defs[] = { .async_size = sizeof(struct io_async_msghdr), #else .prep = io_eopnotsupp_prep, +#endif + }, + [IORING_OP_RECV_ZC] = { + .needs_file = 1, + .unbound_nonreg_file = 1, + .pollin = 1, + .ioprio = 1, +#if defined(CONFIG_NET) + .prep = io_recvzc_prep, + .issue = io_recvzc, +#else + .prep = io_eopnotsupp_prep, #endif }, }; @@ -742,6 +755,9 @@ const struct io_cold_def io_cold_defs[] = { [IORING_OP_LISTEN] = { .name = "LISTEN", }, + [IORING_OP_RECV_ZC] = { + .name = "RECV_ZC", + }, }; const char *io_uring_get_opcode(u8 opcode) diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c index 6cd3dee8b90a..8166d8a2656e 100644 --- a/io_uring/zcrx.c +++ b/io_uring/zcrx.c @@ -7,6 +7,8 @@ #include #include #include +#include +#include #include @@ -20,6 +22,12 @@ #if defined(CONFIG_PAGE_POOL) && defined(CONFIG_INET) +struct io_zcrx_args { + struct io_kiocb *req; + struct io_zcrx_ifq *ifq; + struct socket *sock; +}; + static inline struct io_zcrx_area *io_zcrx_iov_to_area(const struct net_iov *niov) { struct net_iov_area *owner = net_iov_owner(niov); @@ -247,6 +255,11 @@ void io_shutdown_zcrx_ifqs(struct io_ring_ctx *ctx) lockdep_assert_held(&ctx->uring_lock); } +static void io_zcrx_get_buf_uref(struct net_iov *niov) +{ + atomic_long_add(IO_ZC_RX_UREF, &niov->pp_ref_count); +} + static bool io_zcrx_niov_put(struct net_iov *niov, int nr) { return atomic_long_sub_and_test(nr, &niov->pp_ref_count); @@ -462,4 +475,171 @@ const struct memory_provider_ops io_uring_pp_zc_ops = { .scrub = io_pp_zc_scrub, }; +static bool io_zcrx_queue_cqe(struct io_kiocb *req, struct net_iov *niov, + struct io_zcrx_ifq *ifq, int off, int len) +{ + struct io_uring_zcrx_cqe *rcqe; + struct io_zcrx_area *area; + struct io_uring_cqe *cqe; + u64 offset; + + if (!io_defer_get_uncommited_cqe(req->ctx, &cqe)) + return false; + + cqe->user_data = req->cqe.user_data; + cqe->res = len; + cqe->flags 
= IORING_CQE_F_MORE; + + area = io_zcrx_iov_to_area(niov); + offset = off + (net_iov_idx(niov) << PAGE_SHIFT); + rcqe = (struct io_uring_zcrx_cqe *)(cqe + 1); + rcqe->off = offset + ((u64)area->area_id << IORING_ZCRX_AREA_SHIFT); + memset(&rcqe->__pad, 0, sizeof(rcqe->__pad)); + return true; +} + +static int io_zcrx_recv_frag(struct io_kiocb *req, struct io_zcrx_ifq *ifq, + const skb_frag_t *frag, int off, int len) +{ + struct net_iov *niov; + + off += skb_frag_off(frag); + + if (unlikely(!skb_frag_is_net_iov(frag))) + return -EOPNOTSUPP; + + niov = netmem_to_net_iov(frag->netmem); + if (niov->pp->mp_ops != &io_uring_pp_zc_ops || + niov->pp->mp_priv != ifq) + return -EFAULT; + + if (!io_zcrx_queue_cqe(req, niov, ifq, off, len)) + return -ENOSPC; + io_zcrx_get_buf_uref(niov); + return len; +} + +static int +io_zcrx_recv_skb(read_descriptor_t *desc, struct sk_buff *skb, + unsigned int offset, size_t len) +{ + struct io_zcrx_args *args = desc->arg.data; + struct io_zcrx_ifq *ifq = args->ifq; + struct io_kiocb *req = args->req; + struct sk_buff *frag_iter; + unsigned start, start_off; + int i, copy, end, off; + int ret = 0; + + start = skb_headlen(skb); + start_off = offset; + + if (offset < start) + return -EOPNOTSUPP; + + for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { + const skb_frag_t *frag; + + if (WARN_ON(start > offset + len)) + return -EFAULT; + + frag = &skb_shinfo(skb)->frags[i]; + end = start + skb_frag_size(frag); + + if (offset < end) { + copy = end - offset; + if (copy > len) + copy = len; + + off = offset - start; + ret = io_zcrx_recv_frag(req, ifq, frag, off, copy); + if (ret < 0) + goto out; + + offset += ret; + len -= ret; + if (len == 0 || ret != copy) + goto out; + } + start = end; + } + + skb_walk_frags(skb, frag_iter) { + if (WARN_ON(start > offset + len)) + return -EFAULT; + + end = start + frag_iter->len; + if (offset < end) { + copy = end - offset; + if (copy > len) + copy = len; + + off = offset - start; + ret = io_zcrx_recv_skb(desc, frag_iter, off, copy); + if (ret < 0) + goto out; + + offset += ret; + len -= ret; + if (len == 0 || ret != copy) + goto out; + } + start = end; + } + +out: + if (offset == start_off) + return ret; + return offset - start_off; +} + +static int io_zcrx_tcp_recvmsg(struct io_kiocb *req, struct io_zcrx_ifq *ifq, + struct sock *sk, int flags) +{ + struct io_zcrx_args args = { + .req = req, + .ifq = ifq, + .sock = sk->sk_socket, + }; + read_descriptor_t rd_desc = { + .count = 1, + .arg.data = &args, + }; + int ret; + + lock_sock(sk); + ret = tcp_read_sock(sk, &rd_desc, io_zcrx_recv_skb); + if (ret <= 0) { + if (ret < 0 || sock_flag(sk, SOCK_DONE)) + goto out; + if (sk->sk_err) + ret = sock_error(sk); + else if (sk->sk_shutdown & RCV_SHUTDOWN) + goto out; + else if (sk->sk_state == TCP_CLOSE) + ret = -ENOTCONN; + else + ret = -EAGAIN; + } else if (sock_flag(sk, SOCK_DONE)) { + /* Make it to retry until it finally gets 0. 
*/ + ret = -EAGAIN; + } +out: + release_sock(sk); + return ret; +} + +int io_zcrx_recv(struct io_kiocb *req, struct io_zcrx_ifq *ifq, + struct socket *sock, unsigned int flags) +{ + struct sock *sk = sock->sk; + const struct proto *prot = READ_ONCE(sk->sk_prot); + + if (prot->recvmsg != tcp_recvmsg) + return -EPROTONOSUPPORT; + + sock_rps_record_flow(sk); + return io_zcrx_tcp_recvmsg(req, ifq, sk, flags); +} + #endif diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h index 67512fc69cc4..ddd68098122a 100644 --- a/io_uring/zcrx.h +++ b/io_uring/zcrx.h @@ -3,6 +3,7 @@ #define IOU_ZC_RX_H #include +#include #include #define IO_ZC_RX_UREF 0x10000 @@ -44,6 +45,8 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx, struct io_uring_zcrx_ifq_reg __user *arg); void io_unregister_zcrx_ifqs(struct io_ring_ctx *ctx); void io_shutdown_zcrx_ifqs(struct io_ring_ctx *ctx); +int io_zcrx_recv(struct io_kiocb *req, struct io_zcrx_ifq *ifq, + struct socket *sock, unsigned int flags); #else static inline int io_register_zcrx_ifq(struct io_ring_ctx *ctx, struct io_uring_zcrx_ifq_reg __user *arg) @@ -56,6 +59,14 @@ static inline void io_unregister_zcrx_ifqs(struct io_ring_ctx *ctx) static inline void io_shutdown_zcrx_ifqs(struct io_ring_ctx *ctx) { } +static inline int io_zcrx_recv(struct io_kiocb *req, struct io_zcrx_ifq *ifq, + struct socket *sock, unsigned int flags) +{ + return -EOPNOTSUPP; +} #endif +int io_recvzc(struct io_kiocb *req, unsigned int issue_flags); +int io_recvzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe); + #endif From patchwork Mon Oct 7 22:16:01 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Wei X-Patchwork-Id: 13825350 Received: from mail-pl1-f173.google.com (mail-pl1-f173.google.com [209.85.214.173]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 50E9119340A for ; Mon, 7 Oct 2024 22:16:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.173 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728339405; cv=none; b=GjxTutOOH6cS5HD4DFyLFMHFjimSmMOC98oxqznhSu/ZujZ2aGAt+pXeg05dtzwIiw0z9l5bjlMc7VYvJXSeweqyEQTB4C1HgHbt+lHSND+rIWVMMaiqDzRbN08FaDb2bw5+NYF0po1wPgatBQpK5c3V+rd7UW8Hdph1Q2fwsYo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728339405; c=relaxed/simple; bh=4quKQmRDyKq8kxGfVBu+0W6bFSjXmRzSt74BfF/rLpI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=anqtQ/1kqBhFeubTsX4yLaDC+uZGNAjTrXU4U87UhMRTGa3OZegKB2v2yYqV8IL9g2VbkETdZpBApeyUon3/7ocQXVvgnpAOg7YvehJkqtokoBtYL3yT0vt01MFqcewiT2TbFLvPB6wzwfH1UGvjeuJ3xEICu0w3iwoh3lGNzOQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=davidwei.uk; spf=none smtp.mailfrom=davidwei.uk; dkim=pass (2048-bit key) header.d=davidwei-uk.20230601.gappssmtp.com header.i=@davidwei-uk.20230601.gappssmtp.com header.b=U4qCmATg; arc=none smtp.client-ip=209.85.214.173 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=davidwei.uk Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=davidwei.uk Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=davidwei-uk.20230601.gappssmtp.com header.i=@davidwei-uk.20230601.gappssmtp.com header.b="U4qCmATg" Received: by mail-pl1-f173.google.com with SMTP id 
From: David Wei To: io-uring@vger.kernel.org, netdev@vger.kernel.org Cc: David Wei , Jens Axboe , Pavel Begunkov , Jakub Kicinski , Paolo Abeni , "David S. Miller" , Eric Dumazet , Jesper Dangaard Brouer , David Ahern , Mina Almasry Subject: [PATCH v1 13/15] io_uring/zcrx: add copy fallback Date: Mon, 7 Oct 2024 15:16:01 -0700 Message-ID: <20241007221603.1703699-14-dw@davidwei.uk> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20241007221603.1703699-1-dw@davidwei.uk> References: <20241007221603.1703699-1-dw@davidwei.uk> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Pavel Begunkov There are scenarios in which the zerocopy path might get a normal in-kernel buffer: it could be a mis-steered packet or simply the linear part of an skb. Another use case is to allow the driver to allocate kernel pages when it is out of zc buffers, which makes it more resilient to spikes in load and allows the user to choose the balance between the amount of memory provided and performance. At the moment we fail such requests. Instead, grab a buffer from the page pool, copy the data there, and return it to the user in the usual way.
Because the refill ring is private to the napi our page pool is running from, it's done by stopping the napi via napi_execute() helper. It grabs only one buffer, which is inefficient, and improving it is left for follow up patches. Signed-off-by: Pavel Begunkov Signed-off-by: David Wei --- io_uring/zcrx.c | 125 +++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 118 insertions(+), 7 deletions(-) diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c index 8166d8a2656e..d21e7017deb3 100644 --- a/io_uring/zcrx.c +++ b/io_uring/zcrx.c @@ -5,6 +5,8 @@ #include #include #include +#include +#include #include #include #include @@ -28,6 +30,11 @@ struct io_zcrx_args { struct socket *sock; }; +struct io_zc_refill_data { + struct io_zcrx_ifq *ifq; + struct net_iov *niov; +}; + static inline struct io_zcrx_area *io_zcrx_iov_to_area(const struct net_iov *niov) { struct net_iov_area *owner = net_iov_owner(niov); @@ -35,6 +42,13 @@ static inline struct io_zcrx_area *io_zcrx_iov_to_area(const struct net_iov *nio return container_of(owner, struct io_zcrx_area, nia); } +static inline struct page *io_zcrx_iov_page(const struct net_iov *niov) +{ + struct io_zcrx_area *area = io_zcrx_iov_to_area(niov); + + return area->pages[net_iov_idx(niov)]; +} + static int io_allocate_rbuf_ring(struct io_zcrx_ifq *ifq, struct io_uring_zcrx_ifq_reg *reg) { @@ -475,6 +489,34 @@ const struct memory_provider_ops io_uring_pp_zc_ops = { .scrub = io_pp_zc_scrub, }; +static void io_napi_refill(void *data) +{ + struct io_zc_refill_data *rd = data; + struct io_zcrx_ifq *ifq = rd->ifq; + netmem_ref netmem; + + if (WARN_ON_ONCE(!ifq->pp)) + return; + + netmem = page_pool_alloc_netmem(ifq->pp, GFP_ATOMIC | __GFP_NOWARN); + if (!netmem) + return; + if (WARN_ON_ONCE(!netmem_is_net_iov(netmem))) + return; + + rd->niov = netmem_to_net_iov(netmem); +} + +static struct net_iov *io_zc_get_buf_task_safe(struct io_zcrx_ifq *ifq) +{ + struct io_zc_refill_data rd = { + .ifq = ifq, + }; + + napi_execute(ifq->napi_id, io_napi_refill, &rd); + return rd.niov; +} + static bool io_zcrx_queue_cqe(struct io_kiocb *req, struct net_iov *niov, struct io_zcrx_ifq *ifq, int off, int len) { @@ -498,6 +540,45 @@ static bool io_zcrx_queue_cqe(struct io_kiocb *req, struct net_iov *niov, return true; } +static ssize_t io_zcrx_copy_chunk(struct io_kiocb *req, struct io_zcrx_ifq *ifq, + void *data, unsigned int offset, size_t len) +{ + size_t copy_size, copied = 0; + int ret = 0, off = 0; + struct page *page; + u8 *vaddr; + + do { + struct net_iov *niov; + + niov = io_zc_get_buf_task_safe(ifq); + if (!niov) { + ret = -ENOMEM; + break; + } + + page = io_zcrx_iov_page(niov); + vaddr = kmap_local_page(page); + copy_size = min_t(size_t, PAGE_SIZE, len); + memcpy(vaddr, data + offset, copy_size); + kunmap_local(vaddr); + + if (!io_zcrx_queue_cqe(req, niov, ifq, off, copy_size)) { + napi_pp_put_page(net_iov_to_netmem(niov)); + return -ENOSPC; + } + + io_zcrx_get_buf_uref(niov); + napi_pp_put_page(net_iov_to_netmem(niov)); + + offset += copy_size; + len -= copy_size; + copied += copy_size; + } while (offset < len); + + return copied ? 
copied : ret; +} + static int io_zcrx_recv_frag(struct io_kiocb *req, struct io_zcrx_ifq *ifq, const skb_frag_t *frag, int off, int len) { @@ -505,8 +586,24 @@ static int io_zcrx_recv_frag(struct io_kiocb *req, struct io_zcrx_ifq *ifq, off += skb_frag_off(frag); - if (unlikely(!skb_frag_is_net_iov(frag))) - return -EOPNOTSUPP; + if (unlikely(!skb_frag_is_net_iov(frag))) { + struct page *page = skb_frag_page(frag); + u32 p_off, p_len, t, copied = 0; + u8 *vaddr; + int ret = 0; + + skb_frag_foreach_page(frag, off, len, + page, p_off, p_len, t) { + vaddr = kmap_local_page(page); + ret = io_zcrx_copy_chunk(req, ifq, vaddr, p_off, p_len); + kunmap_local(vaddr); + + if (ret < 0) + return copied ? copied : ret; + copied += ret; + } + return copied; + } niov = netmem_to_net_iov(frag->netmem); if (niov->pp->mp_ops != &io_uring_pp_zc_ops || @@ -527,15 +624,29 @@ io_zcrx_recv_skb(read_descriptor_t *desc, struct sk_buff *skb, struct io_zcrx_ifq *ifq = args->ifq; struct io_kiocb *req = args->req; struct sk_buff *frag_iter; - unsigned start, start_off; + unsigned start, start_off = offset; int i, copy, end, off; int ret = 0; - start = skb_headlen(skb); - start_off = offset; + if (unlikely(offset < skb_headlen(skb))) { + ssize_t copied; + size_t to_copy; - if (offset < start) - return -EOPNOTSUPP; + to_copy = min_t(size_t, skb_headlen(skb) - offset, len); + copied = io_zcrx_copy_chunk(req, ifq, skb->data, offset, to_copy); + if (copied < 0) { + ret = copied; + goto out; + } + offset += copied; + len -= copied; + if (!len) + goto out; + if (offset != skb_headlen(skb)) + goto out; + } + + start = skb_headlen(skb); for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { const skb_frag_t *frag; From patchwork Mon Oct 7 22:16:02 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Wei X-Patchwork-Id: 13825352 Received: from mail-pl1-f171.google.com (mail-pl1-f171.google.com [209.85.214.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BDDF41D9A54 for ; Mon, 7 Oct 2024 22:16:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.171 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728339407; cv=none; b=loK0KzC78GUFCGAPhNBENt/cEoRod3vphphOA6aGFOMUEnuNtl6rfR9n3WYZcEdBaEsHXBuo1BZJzNJw6x3rbJ4omEcY7NPswZKii7FugyItLcdjT5Xhu0sVPZftAlSqD4VFt8m4OfuIk7JJ8MGGJjcTfPLbu732FMN1ChY3z7Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1728339407; c=relaxed/simple; bh=M/u32X7mE0eu2j1IfCto9rHwIaFfw+VZzzLEOHDgzC8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=acr38OC5ran+NQ2sj40cY/OsiZSmwQNOACroIy/IMqELFqPicXdNZFJ34EpmrMG5nurzRpZUNJbvprqI2cR1Y6ZspLCiyJ6IYqClo+M4Yq4ABZvuQ9AiyRxy69TCD1K7WGpYClQ6gziuvbc8ufzsVIeAHOFuovrUB8otmsDyMCQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=davidwei.uk; spf=none smtp.mailfrom=davidwei.uk; dkim=pass (2048-bit key) header.d=davidwei-uk.20230601.gappssmtp.com header.i=@davidwei-uk.20230601.gappssmtp.com header.b=Ohw8gW5U; arc=none smtp.client-ip=209.85.214.171 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=davidwei.uk Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=davidwei.uk Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) 
From: David Wei To: io-uring@vger.kernel.org, netdev@vger.kernel.org Cc: David Wei , Jens Axboe , Pavel Begunkov , Jakub Kicinski , Paolo Abeni , "David S. Miller" , Eric Dumazet , Jesper Dangaard Brouer , David Ahern , Mina Almasry Subject: [PATCH v1 14/15] io_uring/zcrx: set pp memory provider for an rx queue Date: Mon, 7 Oct 2024 15:16:02 -0700 Message-ID: <20241007221603.1703699-15-dw@davidwei.uk> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20241007221603.1703699-1-dw@davidwei.uk> References: <20241007221603.1703699-1-dw@davidwei.uk> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: David Wei Set the page pool memory provider for the rx queue configured for zero copy to io_uring. Then the rx queue is reset using netdev_rx_queue_restart() and netdev core + page pool will take care of filling the rx queue from the io_uring zero copy memory provider. For now, there is only one ifq so its destruction happens implicitly during io_uring cleanup.
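For orientation, the userspace side of this binding might look roughly like the sketch below. It fills the struct io_uring_zcrx_ifq_reg fields consumed here (if_idx, if_rxq, rq_entries) and registers them against the ring; the opcode name IORING_REGISTER_ZCRX_IFQ and the nr_args value are assumptions based on the earlier registration patch in this series, ring_fd is an already created io_uring fd, and CAP_NET_ADMIN is required as noted in patch 12.

#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/io_uring.h>	/* assumed to provide struct io_uring_zcrx_ifq_reg */

/* sketch only: bind an io_uring instance to rx queue 'rxq' of netdev 'ifindex' */
static int register_zcrx_ifq(int ring_fd, unsigned int ifindex, unsigned int rxq)
{
	struct io_uring_zcrx_ifq_reg reg;

	memset(&reg, 0, sizeof(reg));
	reg.if_idx = ifindex;	/* netdev whose rx queue is taken over */
	reg.if_rxq = rxq;	/* this queue's page pool gets the io_uring memory provider */
	reg.rq_entries = 4096;	/* refill ring size, arbitrary for the example */
	/* reg.area_ptr would describe the zero copy area; omitted here */

	/* opcode name assumed from the registration patch earlier in the series */
	return syscall(__NR_io_uring_register, ring_fd,
		       IORING_REGISTER_ZCRX_IFQ, &reg, 1);
}

On success the kernel looks up the netdev by if_idx, installs the io_uring page pool memory provider on the queue, and restarts it via netdev_rx_queue_restart(), as io_open_zc_rxq() below does.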
Signed-off-by: David Wei --- io_uring/zcrx.c | 84 +++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 82 insertions(+), 2 deletions(-) diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c index d21e7017deb3..7939f830cf5b 100644 --- a/io_uring/zcrx.c +++ b/io_uring/zcrx.c @@ -6,6 +6,8 @@ #include #include #include +#include +#include #include #include #include @@ -49,6 +51,63 @@ static inline struct page *io_zcrx_iov_page(const struct net_iov *niov) return area->pages[net_iov_idx(niov)]; } +static int io_open_zc_rxq(struct io_zcrx_ifq *ifq, unsigned ifq_idx) +{ + struct netdev_rx_queue *rxq; + struct net_device *dev = ifq->dev; + int ret; + + ASSERT_RTNL(); + + if (ifq_idx >= dev->num_rx_queues) + return -EINVAL; + ifq_idx = array_index_nospec(ifq_idx, dev->num_rx_queues); + + rxq = __netif_get_rx_queue(ifq->dev, ifq_idx); + if (rxq->mp_params.mp_priv) + return -EEXIST; + + ifq->if_rxq = ifq_idx; + rxq->mp_params.mp_ops = &io_uring_pp_zc_ops; + rxq->mp_params.mp_priv = ifq; + ret = netdev_rx_queue_restart(ifq->dev, ifq->if_rxq); + if (ret) { + rxq->mp_params.mp_ops = NULL; + rxq->mp_params.mp_priv = NULL; + ifq->if_rxq = -1; + } + return ret; +} + +static void io_close_zc_rxq(struct io_zcrx_ifq *ifq) +{ + struct netdev_rx_queue *rxq; + int err; + + if (ifq->if_rxq == -1) + return; + + rtnl_lock(); + if (WARN_ON_ONCE(ifq->if_rxq >= ifq->dev->num_rx_queues)) { + rtnl_unlock(); + return; + } + + rxq = __netif_get_rx_queue(ifq->dev, ifq->if_rxq); + + WARN_ON_ONCE(rxq->mp_params.mp_priv != ifq); + + rxq->mp_params.mp_ops = NULL; + rxq->mp_params.mp_priv = NULL; + + err = netdev_rx_queue_restart(ifq->dev, ifq->if_rxq); + if (err) + pr_devel("io_uring: can't restart a queue on zcrx close\n"); + + rtnl_unlock(); + ifq->if_rxq = -1; +} + static int io_allocate_rbuf_ring(struct io_zcrx_ifq *ifq, struct io_uring_zcrx_ifq_reg *reg) { @@ -169,9 +228,12 @@ static struct io_zcrx_ifq *io_zcrx_ifq_alloc(struct io_ring_ctx *ctx) static void io_zcrx_ifq_free(struct io_zcrx_ifq *ifq) { + io_close_zc_rxq(ifq); + if (ifq->area) io_zcrx_free_area(ifq->area); - + if (ifq->dev) + dev_put(ifq->dev); io_free_rbuf_ring(ifq); kfree(ifq); } @@ -227,7 +289,17 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx, goto err; ifq->rq_entries = reg.rq_entries; - ifq->if_rxq = reg.if_rxq; + + ret = -ENODEV; + rtnl_lock(); + ifq->dev = dev_get_by_index(current->nsproxy->net_ns, reg.if_idx); + if (!ifq->dev) + goto err_rtnl_unlock; + + ret = io_open_zc_rxq(ifq, reg.if_rxq); + if (ret) + goto err_rtnl_unlock; + rtnl_unlock(); ring_sz = sizeof(struct io_uring); rqes_sz = sizeof(struct io_uring_zcrx_rqe) * ifq->rq_entries; @@ -237,15 +309,20 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx, reg.offsets.tail = offsetof(struct io_uring, tail); if (copy_to_user(arg, ®, sizeof(reg))) { + io_close_zc_rxq(ifq); ret = -EFAULT; goto err; } if (copy_to_user(u64_to_user_ptr(reg.area_ptr), &area, sizeof(area))) { + io_close_zc_rxq(ifq); ret = -EFAULT; goto err; } ctx->ifq = ifq; return 0; + +err_rtnl_unlock: + rtnl_unlock(); err: io_zcrx_ifq_free(ifq); return ret; @@ -267,6 +344,9 @@ void io_unregister_zcrx_ifqs(struct io_ring_ctx *ctx) void io_shutdown_zcrx_ifqs(struct io_ring_ctx *ctx) { lockdep_assert_held(&ctx->uring_lock); + + if (ctx->ifq) + io_close_zc_rxq(ctx->ifq); } static void io_zcrx_get_buf_uref(struct net_iov *niov) From patchwork Mon Oct 7 22:16:03 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Wei X-Patchwork-Id: 13825353 Received: from 
From: David Wei To: io-uring@vger.kernel.org, netdev@vger.kernel.org Cc: David Wei , Jens Axboe , Pavel Begunkov , Jakub Kicinski , Paolo Abeni , "David S. Miller" , Eric Dumazet , Jesper Dangaard Brouer , David Ahern , Mina Almasry Subject: [PATCH v1 15/15] io_uring/zcrx: throttle receive requests Date: Mon, 7 Oct 2024 15:16:03 -0700 Message-ID: <20241007221603.1703699-16-dw@davidwei.uk> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20241007221603.1703699-1-dw@davidwei.uk> References: <20241007221603.1703699-1-dw@davidwei.uk> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Pavel Begunkov io_zc_rx_tcp_recvmsg() continues until it fails or there is nothing to receive. If the other side sends fast enough, we might get stuck in io_zc_rx_tcp_recvmsg() producing more and more CQEs but not letting the user handle them, which leads to unbounded latencies. Break out of it based on an arbitrarily chosen limit; the upper layer will either return to userspace or requeue the request. Signed-off-by: Pavel Begunkov Signed-off-by: David Wei --- io_uring/net.c | 5 ++++- io_uring/zcrx.c | 17 ++++++++++++++--- io_uring/zcrx.h | 6 ++++-- 3 files changed, 22 insertions(+), 6 deletions(-) diff --git a/io_uring/net.c b/io_uring/net.c index 482e138d2994..c99e62c7dcfb 100644 --- a/io_uring/net.c +++ b/io_uring/net.c @@ -1253,10 +1253,13 @@ int io_recvzc(struct io_kiocb *req, unsigned int issue_flags) if (!ifq) return -EINVAL; - ret = io_zcrx_recv(req, ifq, sock, zc->msg_flags | MSG_DONTWAIT); + ret = io_zcrx_recv(req, ifq, sock, zc->msg_flags | MSG_DONTWAIT, + issue_flags); if (unlikely(ret <= 0) && ret != -EAGAIN) { if (ret == -ERESTARTSYS) ret = -EINTR; + if (ret == IOU_REQUEUE) + return IOU_REQUEUE; req_set_fail(req); io_req_set_res(req, ret, 0); diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c index 7939f830cf5b..a78d82a2d404 100644 --- a/io_uring/zcrx.c +++ b/io_uring/zcrx.c @@ -26,10 +26,13 @@ #if defined(CONFIG_PAGE_POOL) && defined(CONFIG_INET) +#define IO_SKBS_PER_CALL_LIMIT 20 + struct io_zcrx_args { struct io_kiocb *req; struct io_zcrx_ifq *ifq; struct socket *sock; + unsigned nr_skbs; }; struct io_zc_refill_data { @@ -708,6 +711,9 @@ io_zcrx_recv_skb(read_descriptor_t *desc, struct sk_buff *skb, int i, copy, end, off; int ret = 0; + if (unlikely(args->nr_skbs++ > IO_SKBS_PER_CALL_LIMIT)) + return -EAGAIN; + if (unlikely(offset < skb_headlen(skb))) { ssize_t copied; size_t to_copy; @@ -785,7 +791,8 @@ io_zcrx_recv_skb(read_descriptor_t *desc, struct sk_buff *skb, } static int io_zcrx_tcp_recvmsg(struct io_kiocb *req, struct io_zcrx_ifq *ifq, - struct sock *sk, int flags) + struct sock *sk, int flags, + unsigned int issue_flags) { struct io_zcrx_args args = { .req = req, @@ -811,6 +818,9 @@ static int io_zcrx_tcp_recvmsg(struct io_kiocb *req, struct io_zcrx_ifq *ifq, ret = -ENOTCONN; else ret = -EAGAIN; + } else if (unlikely(args.nr_skbs > IO_SKBS_PER_CALL_LIMIT) && + (issue_flags & IO_URING_F_MULTISHOT)) { + ret = IOU_REQUEUE; }
else if (sock_flag(sk, SOCK_DONE)) { /* Make it to retry until it finally gets 0. */ ret = -EAGAIN; @@ -821,7 +831,8 @@ static int io_zcrx_tcp_recvmsg(struct io_kiocb *req, struct io_zcrx_ifq *ifq, } int io_zcrx_recv(struct io_kiocb *req, struct io_zcrx_ifq *ifq, - struct socket *sock, unsigned int flags) + struct socket *sock, unsigned int flags, + unsigned int issue_flags) { struct sock *sk = sock->sk; const struct proto *prot = READ_ONCE(sk->sk_prot); @@ -830,7 +841,7 @@ int io_zcrx_recv(struct io_kiocb *req, struct io_zcrx_ifq *ifq, return -EPROTONOSUPPORT; sock_rps_record_flow(sk); - return io_zcrx_tcp_recvmsg(req, ifq, sk, flags); + return io_zcrx_tcp_recvmsg(req, ifq, sk, flags, issue_flags); } #endif diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h index ddd68098122a..bb7ca61a251e 100644 --- a/io_uring/zcrx.h +++ b/io_uring/zcrx.h @@ -46,7 +46,8 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx, void io_unregister_zcrx_ifqs(struct io_ring_ctx *ctx); void io_shutdown_zcrx_ifqs(struct io_ring_ctx *ctx); int io_zcrx_recv(struct io_kiocb *req, struct io_zcrx_ifq *ifq, - struct socket *sock, unsigned int flags); + struct socket *sock, unsigned int flags, + unsigned int issue_flags); #else static inline int io_register_zcrx_ifq(struct io_ring_ctx *ctx, struct io_uring_zcrx_ifq_reg __user *arg) @@ -60,7 +61,8 @@ static inline void io_shutdown_zcrx_ifqs(struct io_ring_ctx *ctx) { } static inline int io_zcrx_recv(struct io_kiocb *req, struct io_zcrx_ifq *ifq, - struct socket *sock, unsigned int flags) + struct socket *sock, unsigned int flags, + unsigned int issue_flags) { return -EOPNOTSUPP; }