From patchwork Tue Nov 8 05:05:07 2022
X-Patchwork-Submitter: Jonathan Lemon
X-Patchwork-Id: 13035883
From: Jonathan Lemon
Subject: [PATCH v1 01/15] io_uring: add zctap ifq definition
Date: Mon, 7 Nov 2022 21:05:07 -0800
Message-ID: <20221108050521.3198458-2-jonathan.lemon@gmail.com>
In-Reply-To: <20221108050521.3198458-1-jonathan.lemon@gmail.com>
References: <20221108050521.3198458-1-jonathan.lemon@gmail.com>
X-Mailing-List: io-uring@vger.kernel.org

Add the structure definition for io_zctap_ifq, for use by lower-level
networking hooks.
Signed-off-by: Jonathan Lemon
---
 include/linux/io_uring_types.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index f5b687a787a3..39f20344d578 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -322,6 +322,7 @@ struct io_ring_ctx {
         struct io_mapped_ubuf *dummy_ubuf;
         struct io_rsrc_data *file_data;
         struct io_rsrc_data *buf_data;
+        struct io_zctap_ifq *zctap_ifq;

         struct delayed_work rsrc_put_work;
         struct llist_head rsrc_put_llist;
@@ -577,4 +578,14 @@ struct io_overflow_cqe {
         struct io_uring_cqe cqe;
 };

+struct io_zctap_ifq {
+        struct net_device *dev;
+        struct io_ring_ctx *ctx;
+        void *region;
+        struct ubuf_info *uarg;
+        u16 queue_id;
+        u16 id;
+        u16 fill_bgid;
+};
+
 #endif
From patchwork Tue Nov 8 05:05:08 2022
X-Patchwork-Submitter: Jonathan Lemon
X-Patchwork-Id: 13035884
From: Jonathan Lemon
Subject: [PATCH v1 02/15] netdevice: add SETUP_ZCTAP to the netdev_bpf structure
Date: Mon, 7 Nov 2022 21:05:08 -0800
Message-ID: <20221108050521.3198458-3-jonathan.lemon@gmail.com>
In-Reply-To: <20221108050521.3198458-1-jonathan.lemon@gmail.com>
References: <20221108050521.3198458-1-jonathan.lemon@gmail.com>
X-Mailing-List: io-uring@vger.kernel.org

This command requests that the network device set up or tear down an
interface queue backed by a region of user-supplied memory. The queue
will be managed by io_uring.

Signed-off-by: Jonathan Lemon
---
 include/linux/netdevice.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d45713a06568..1d1e10f4216f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -980,6 +980,7 @@ enum bpf_netdev_command {
         BPF_OFFLOAD_MAP_ALLOC,
         BPF_OFFLOAD_MAP_FREE,
         XDP_SETUP_XSK_POOL,
+        XDP_SETUP_ZCTAP,
 };

 struct bpf_prog_offload_ops;
@@ -1018,6 +1019,11 @@ struct netdev_bpf {
                         struct xsk_buff_pool *pool;
                         u16 queue_id;
                 } xsk;
+                /* XDP_SETUP_ZCTAP */
+                struct {
+                        struct io_zctap_ifq *ifq;
+                        u16 queue_id;
+                } zct;
         };
 };
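For illustration, a driver that implements ndo_bpf might dispatch the new
command as sketched below; the mydrv_* names are hypothetical stand-ins for
a real driver's queue management paths, not part of this series. Passing
ifq == NULL is the teardown request, mirroring __io_queue_mgmt() in the
next patch.

/* Hypothetical driver-side dispatch of XDP_SETUP_ZCTAP. */
static int mydrv_ndo_bpf(struct net_device *dev, struct netdev_bpf *bpf)
{
        struct mydrv_priv *priv = netdev_priv(dev);

        switch (bpf->command) {
        case XDP_SETUP_ZCTAP:
                /* ifq == NULL means tear the queue back down */
                if (!bpf->zct.ifq)
                        return mydrv_zctap_disable(priv, bpf->zct.queue_id);
                return mydrv_zctap_enable(priv, bpf->zct.ifq,
                                          bpf->zct.queue_id);
        default:
                return -EINVAL;
        }
}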
From patchwork Tue Nov 8 05:05:09 2022
X-Patchwork-Submitter: Jonathan Lemon
X-Patchwork-Id: 13035882
From: Jonathan Lemon
Subject: [PATCH v1 03/15] io_uring: add register ifq opcode
Date: Mon, 7 Nov 2022 21:05:09 -0800
Message-ID: <20221108050521.3198458-4-jonathan.lemon@gmail.com>
In-Reply-To: <20221108050521.3198458-1-jonathan.lemon@gmail.com>
References: <20221108050521.3198458-1-jonathan.lemon@gmail.com>
X-Mailing-List: io-uring@vger.kernel.org

Add initial support for hooking zero-copy interface queues into
io_uring. This command requests a user-managed queue from the
specified network device. Only the register opcode is included;
unregistration is currently done implicitly when the ring is removed.

Signed-off-by: Jonathan Lemon
---
 include/uapi/linux/io_uring.h |  15 ++++
 io_uring/Makefile             |   3 +-
 io_uring/io_uring.c           |   8 ++
 io_uring/zctap.c              | 134 ++++++++++++++++++++++++++++++++++
 io_uring/zctap.h              |   9 +++
 5 files changed, 168 insertions(+), 1 deletion(-)
 create mode 100644 io_uring/zctap.c
 create mode 100644 io_uring/zctap.h

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index ab7458033ee3..f65543595d71 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -490,6 +490,9 @@ enum {
         /* register a range of fixed file slots for automatic slot allocation */
         IORING_REGISTER_FILE_ALLOC_RANGE = 25,

+        /* register a network ifq for zerocopy RX */
+        IORING_REGISTER_IFQ = 26,
+
         /* this goes last */
         IORING_REGISTER_LAST
 };
@@ -666,6 +669,18 @@ struct io_uring_recvmsg_out {
         __u32 flags;
 };

+/*
+ * Argument for IORING_REGISTER_IFQ
+ */
+struct io_uring_ifq_req {
+        __u32 ifindex;
+        __u16 queue_id;
+        __u16 ifq_id;
+        __u16 fill_bgid;
+        __u16 region_id;
+        __u16 resv[2];
+};
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/io_uring/Makefile b/io_uring/Makefile
index 8cc8e5387a75..9d87e2e45ef9 100644
--- a/io_uring/Makefile
+++ b/io_uring/Makefile
@@ -7,5 +7,6 @@ obj-$(CONFIG_IO_URING) += io_uring.o xattr.o nop.o fs.o splice.o \
                                 openclose.o uring_cmd.o epoll.o \
                                 statx.o net.o msg_ring.o timeout.o \
                                 sqpoll.o fdinfo.o tctx.o poll.o \
-                                cancel.o kbuf.o rsrc.o rw.o opdef.o notif.o
+                                cancel.o kbuf.o rsrc.o rw.o opdef.o \
+                                notif.o zctap.o
 obj-$(CONFIG_IO_WQ) += io-wq.o
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index ac8c488e3077..0d67b0d05ef9 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -91,6 +91,7 @@
 #include "cancel.h"
 #include "net.h"
 #include "notif.h"
+#include "zctap.h"
 #include "timeout.h"
 #include "poll.h"
@@ -2799,6 +2800,7 @@ static __cold void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
         __io_cqring_overflow_flush(ctx, true);
         xa_for_each(&ctx->personalities, index, creds)
                 io_unregister_personality(ctx, index);
+        io_unregister_zctap_all(ctx);
         if (ctx->rings)
                 io_poll_remove_all(ctx, NULL, true);
         mutex_unlock(&ctx->uring_lock);
@@ -4031,6 +4033,12 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
                         break;
                 ret = io_register_file_alloc_range(ctx, arg);
                 break;
+        case IORING_REGISTER_IFQ:
+                ret = -EINVAL;
+                if (!arg || nr_args != 1)
+                        break;
+                ret = io_register_ifq(ctx, arg);
+                break;
         default:
                 ret = -EINVAL;
                 break;
diff --git a/io_uring/zctap.c b/io_uring/zctap.c
new file mode 100644
index 000000000000..2ba05110ea8a
--- /dev/null
+++ b/io_uring/zctap.c
@@ -0,0 +1,134 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+
+#include "io_uring.h"
+#include "zctap.h"
+
+#define NR_ZCTAP_IFQS 1
+
+typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf);
+
+static int __io_queue_mgmt(struct net_device *dev, struct io_zctap_ifq *ifq,
+                           u16 queue_id)
+{
+        struct netdev_bpf cmd;
+        bpf_op_t ndo_bpf;
+
+        ndo_bpf = dev->netdev_ops->ndo_bpf;
+        if (!ndo_bpf)
+                return -EINVAL;
+
+        cmd.command = XDP_SETUP_ZCTAP;
+        cmd.zct.ifq = ifq;
+        cmd.zct.queue_id = queue_id;
+
+        return ndo_bpf(dev, &cmd);
+}
+
+static int io_open_zctap_ifq(struct io_zctap_ifq *ifq, u16 queue_id)
+{
+        return __io_queue_mgmt(ifq->dev, ifq, queue_id);
+}
+
+static int io_close_zctap_ifq(struct io_zctap_ifq *ifq, u16 queue_id)
+{
+        return __io_queue_mgmt(ifq->dev, NULL, queue_id);
+}
+
+static struct io_zctap_ifq *io_zctap_ifq_alloc(struct io_ring_ctx *ctx)
+{
+        struct io_zctap_ifq *ifq;
+
+        ifq = kzalloc(sizeof(*ifq), GFP_KERNEL);
+        if (!ifq)
+                return NULL;
+
+        ifq->ctx = ctx;
+        ifq->queue_id = -1;
+        return ifq;
+}
+
+static void io_zctap_ifq_free(struct io_zctap_ifq *ifq)
+{
+        if (ifq->queue_id != -1)
+                io_close_zctap_ifq(ifq, ifq->queue_id);
+        if (ifq->dev)
+                dev_put(ifq->dev);
+        kfree(ifq);
+}
+
+int io_register_ifq(struct io_ring_ctx *ctx,
+                    struct io_uring_ifq_req __user *arg)
+{
+        struct io_uring_ifq_req req;
+        struct io_zctap_ifq *ifq;
+        int err;
+
+        if (copy_from_user(&req, arg, sizeof(req)))
+                return -EFAULT;
+
+        if (req.resv[0] || req.resv[1])
+                return -EINVAL;
+
+        if (req.ifq_id >= NR_ZCTAP_IFQS)
+                return -EINVAL;
+
+        if (ctx->zctap_ifq)
+                return -EBUSY;
+
+        ifq = io_zctap_ifq_alloc(ctx);
+        if (!ifq)
+                return -ENOMEM;
+
+        ifq->fill_bgid = req.fill_bgid;
+
+        err = -ENODEV;
+        ifq->dev = dev_get_by_index(&init_net, req.ifindex);
+        if (!ifq->dev)
+                goto out;
+
+        /* region attachment TBD */
+
+        err = io_open_zctap_ifq(ifq, req.queue_id);
+        if (err)
+                goto out;
+        ifq->queue_id = req.queue_id;
+
+        ctx->zctap_ifq = ifq;
+
+        return 0;
+
+out:
+        io_zctap_ifq_free(ifq);
+        return err;
+}
+
+int io_unregister_zctap_ifq(struct io_ring_ctx *ctx, unsigned long index)
+{
+        struct io_zctap_ifq *ifq;
+
+        ifq = ctx->zctap_ifq;
+        if (!ifq)
+                return -EINVAL;
+
+        ctx->zctap_ifq = NULL;
+        io_zctap_ifq_free(ifq);
+
+        return 0;
+}
+
+void io_unregister_zctap_all(struct io_ring_ctx *ctx)
+{
+        int i;
+
+        for (i = 0; i < NR_ZCTAP_IFQS; i++)
+                io_unregister_zctap_ifq(ctx, i);
+}
diff --git a/io_uring/zctap.h b/io_uring/zctap.h
new file mode 100644
index 000000000000..bbe4a509408b
--- /dev/null
+++ b/io_uring/zctap.h
@@ -0,0 +1,9 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifndef IOU_ZCTAP_H
+#define IOU_ZCTAP_H
+
+int io_register_ifq(struct io_ring_ctx *ctx,
+                    struct io_uring_ifq_req __user *arg);
+void io_unregister_zctap_all(struct io_ring_ctx *ctx);
+
+#endif
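From userspace, registration might look like the sketch below. It assumes
an existing ring fd and goes through a raw syscall, since liburing has no
wrapper for the new opcode; the interface name, queue, and buffer-group ids
are illustrative.

#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <net/if.h>
#include <linux/io_uring.h>

/* Sketch: register a zero-copy ifq on eth0, hardware queue 1, using
 * buffer group 2 as the fill ring. region_id names a fixed buffer
 * registered beforehand (used starting with the next patch).
 */
static int register_ifq(int ring_fd)
{
        struct io_uring_ifq_req req;

        memset(&req, 0, sizeof(req));
        req.ifindex = if_nametoindex("eth0");
        req.queue_id = 1;
        req.ifq_id = 0;                /* only one ifq supported for now */
        req.fill_bgid = 2;
        req.region_id = 0;

        return syscall(__NR_io_uring_register, ring_fd,
                       IORING_REGISTER_IFQ, &req, 1);
}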
From patchwork Tue Nov 8 05:05:10 2022
X-Patchwork-Submitter: Jonathan Lemon
X-Patchwork-Id: 13035891
From: Jonathan Lemon
Subject: [PATCH v1 04/15] io_uring: create a zctap region for a mapped buffer
Date: Mon, 7 Nov 2022 21:05:10 -0800
Message-ID: <20221108050521.3198458-5-jonathan.lemon@gmail.com>
In-Reply-To: <20221108050521.3198458-1-jonathan.lemon@gmail.com>
References: <20221108050521.3198458-1-jonathan.lemon@gmail.com>
X-Mailing-List: io-uring@vger.kernel.org

This function takes a memory region that was previously registered
with io_uring and assigns the whole region as the backing store for
the specified ifq, binding the pages to a specific device. The entire
region is registered instead of individual buffers, as this allows
the hardware to select the optimal buffer size for incoming packets.

The region is registered as part of the register_ifq opcode, instead
of separately, since the ifq ring requires memory when it is created.

Signed-off-by: Jonathan Lemon
---
 io_uring/zctap.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++-
 io_uring/zctap.h |  2 ++
 2 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/io_uring/zctap.c b/io_uring/zctap.c
index 2ba05110ea8a..0705f5056d07 100644
--- a/io_uring/zctap.c
+++ b/io_uring/zctap.c
@@ -6,16 +6,73 @@
 #include
 #include
 #include
+#include
 #include

 #include "io_uring.h"
 #include "zctap.h"
+#include "rsrc.h"
+#include "kbuf.h"

 #define NR_ZCTAP_IFQS 1

+struct ifq_region {
+        struct io_mapped_ubuf *imu;
+        int free_count;
+        int nr_pages;
+        u16 id;
+        struct page *freelist[];
+};
+
 typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf);

+static void io_remove_ifq_region(struct ifq_region *ifr)
+{
+        kvfree(ifr);
+}
+
+int io_provide_ifq_region(struct io_zctap_ifq *ifq, u16 id)
+{
+        struct io_ring_ctx *ctx = ifq->ctx;
+        struct io_mapped_ubuf *imu;
+        struct ifq_region *ifr;
+        int i, nr_pages;
+        struct page *page;
+
+        /* XXX for now, only allow one region per ifq.
+         */
+        if (ifq->region)
+                return -EFAULT;
+
+        if (unlikely(id >= ctx->nr_user_bufs))
+                return -EFAULT;
+        id = array_index_nospec(id, ctx->nr_user_bufs);
+        imu = ctx->user_bufs[id];
+
+        /* XXX check region is page aligned */
+        if (imu->ubuf & ~PAGE_MASK || imu->ubuf_end & ~PAGE_MASK)
+                return -EFAULT;
+
+        nr_pages = imu->nr_bvecs;
+        ifr = kvmalloc(struct_size(ifr, freelist, nr_pages), GFP_KERNEL);
+        if (!ifr)
+                return -ENOMEM;
+
+        ifr->nr_pages = nr_pages;
+        ifr->imu = imu;
+        ifr->free_count = nr_pages;
+        ifr->id = id;
+
+        for (i = 0; i < nr_pages; i++) {
+                page = imu->bvec[i].bv_page;
+                ifr->freelist[i] = page;
+        }
+
+        ifq->region = ifr;
+
+        return 0;
+}
+
 static int __io_queue_mgmt(struct net_device *dev, struct io_zctap_ifq *ifq,
                            u16 queue_id)
 {
@@ -60,6 +117,8 @@ static void io_zctap_ifq_free(struct io_zctap_ifq *ifq)
 {
         if (ifq->queue_id != -1)
                 io_close_zctap_ifq(ifq, ifq->queue_id);
+        if (ifq->region)
+                io_remove_ifq_region(ifq->region);
         if (ifq->dev)
                 dev_put(ifq->dev);
         kfree(ifq);
@@ -95,7 +154,9 @@ int io_register_ifq(struct io_ring_ctx *ctx,
         if (!ifq->dev)
                 goto out;

-        /* region attachment TBD */
+        err = io_provide_ifq_region(ifq, req.region_id);
+        if (err)
+                goto out;

         err = io_open_zctap_ifq(ifq, req.queue_id);
         if (err)
diff --git a/io_uring/zctap.h b/io_uring/zctap.h
index bbe4a509408b..bb44f8e972e8 100644
--- a/io_uring/zctap.h
+++ b/io_uring/zctap.h
@@ -6,4 +6,6 @@ int io_register_ifq(struct io_ring_ctx *ctx,
                     struct io_uring_ifq_req __user *arg);
 void io_unregister_zctap_all(struct io_ring_ctx *ctx);

+int io_provide_ifq_region(struct io_zctap_ifq *ifq, u16 id);
+
 #endif
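Since io_provide_ifq_region() rejects regions whose start or end is not
page aligned, userspace has to register a suitably aligned fixed buffer
first. A minimal sketch using liburing's buffer registration; the 4096
page size is an assumption:

#include <stdlib.h>
#include <sys/uio.h>
#include <liburing.h>

/* Register nr_pages of page-aligned memory as fixed buffer 0, so that
 * io_uring_ifq_req.region_id = 0 can reference it.
 */
static void *setup_region(struct io_uring *ring, size_t nr_pages)
{
        size_t len = nr_pages * 4096;
        struct iovec iov;
        void *mem;

        if (posix_memalign(&mem, 4096, len))   /* both ends page aligned */
                return NULL;

        iov.iov_base = mem;
        iov.iov_len = len;
        if (io_uring_register_buffers(ring, &iov, 1) < 0) {
                free(mem);
                return NULL;
        }
        return mem;
}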
From patchwork Tue Nov 8 05:05:11 2022
X-Patchwork-Submitter: Jonathan Lemon
X-Patchwork-Id: 13035894
From: Jonathan Lemon
Subject: [PATCH v1 05/15] io_uring: mark pages in ifq region with zctap information.
Date: Mon, 7 Nov 2022 21:05:11 -0800
Message-ID: <20221108050521.3198458-6-jonathan.lemon@gmail.com>
In-Reply-To: <20221108050521.3198458-1-jonathan.lemon@gmail.com>
References: <20221108050521.3198458-1-jonathan.lemon@gmail.com>
X-Mailing-List: io-uring@vger.kernel.org

The network stack passes up pages, which must be mapped to zctap
device buffers in order to get the reference count and other items.
Mark the page as private, and use the page_private field to record
the lookup and ownership information.

Signed-off-by: Jonathan Lemon
---
 io_uring/zctap.c | 61 ++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 56 insertions(+), 5 deletions(-)

diff --git a/io_uring/zctap.c b/io_uring/zctap.c
index 0705f5056d07..7426feee1e04 100644
--- a/io_uring/zctap.c
+++ b/io_uring/zctap.c
@@ -27,18 +27,68 @@ struct ifq_region {

 typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf);

+static void zctap_set_page_info(struct page *page, u64 info)
+{
+        set_page_private(page, info);
+}
+
+static u64 zctap_mk_page_info(u16 region_id, u16 pgid)
+{
+        return (u64)0xface << 48 | (u64)region_id << 16 | (u64)pgid;
+}
+
 static void io_remove_ifq_region(struct ifq_region *ifr)
 {
+        struct io_mapped_ubuf *imu;
+        struct page *page;
+        int i;
+
+        imu = ifr->imu;
+        for (i = 0; i < ifr->nr_pages; i++) {
+                page = imu->bvec[i].bv_page;
+
+                ClearPagePrivate(page);
+                set_page_private(page, 0);
+        }
+
         kvfree(ifr);
 }

+static int io_zctap_map_region(struct ifq_region *ifr)
+{
+        struct io_mapped_ubuf *imu;
+        struct page *page;
+        u64 info;
+        int i;
+
+        imu = ifr->imu;
+        for (i = 0; i < ifr->nr_pages; i++) {
+                page = imu->bvec[i].bv_page;
+                if (PagePrivate(page))
+                        goto out;
+                SetPagePrivate(page);
+                info = zctap_mk_page_info(ifr->id, i);
+                zctap_set_page_info(page, info);
+                ifr->freelist[i] = page;
+        }
+        return 0;
+
+out:
+        while (i--) {
+                page = imu->bvec[i].bv_page;
+                ClearPagePrivate(page);
+                set_page_private(page, 0);
+        }
+        return -EEXIST;
+}
+
 int io_provide_ifq_region(struct io_zctap_ifq *ifq, u16 id)
 {
         struct io_ring_ctx *ctx = ifq->ctx;
         struct io_mapped_ubuf *imu;
         struct ifq_region *ifr;
-        int i, nr_pages;
-        struct page *page;
+        int nr_pages;
+        int err;

         /* XXX for now, only allow one region per ifq.
          */
         if (ifq->region)
@@ -63,9 +113,10 @@ int io_provide_ifq_region(struct io_zctap_ifq *ifq, u16 id)
         ifr->free_count = nr_pages;
         ifr->id = id;

-        for (i = 0; i < nr_pages; i++) {
-                page = imu->bvec[i].bv_page;
-                ifr->freelist[i] = page;
+        err = io_zctap_map_region(ifr);
+        if (err) {
+                kvfree(ifr);
+                return err;
         }

         ifq->region = ifr;
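The page_private word packs a magic value and the lookup keys:

  bits 63..48: 0xface (magic)
  bits 47..32: unused (zero)
  bits 31..16: region id
  bits 15..0:  page id within the region

As a sketch, the matching decode helpers look like this; later patches in
the series add their own equivalents (zctap_page_id() and a magic check):

static u16 zctap_info_region(u64 info)
{
        return (info >> 16) & 0xffff;
}

static u16 zctap_info_pgid(u64 info)
{
        return info & 0xffff;
}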
From patchwork Tue Nov 8 05:05:12 2022
X-Patchwork-Submitter: Jonathan Lemon
X-Patchwork-Id: 13035892
From: Jonathan Lemon
Subject: [PATCH v1 06/15] io_uring: Provide driver API for zctap packet buffers.
Date: Mon, 7 Nov 2022 21:05:12 -0800
Message-ID: <20221108050521.3198458-7-jonathan.lemon@gmail.com>
In-Reply-To: <20221108050521.3198458-1-jonathan.lemon@gmail.com>
References: <20221108050521.3198458-1-jonathan.lemon@gmail.com>
X-Mailing-List: io-uring@vger.kernel.org

Introduce 'struct io_zctap_buf', representing a buffer used by the
network drivers, and the get/put functions a driver uses to obtain
and release these buffers. The code for these will be fleshed out
in a following patch.

Signed-off-by: Jonathan Lemon
---
 include/linux/io_uring.h | 47 ++++++++++++++++++++++++++++++++++++++++
 io_uring/zctap.c         | 23 ++++++++++++++++++++
 2 files changed, 70 insertions(+)

diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 43bc8a2edccf..97c1a2e37077 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -32,6 +32,13 @@ struct io_uring_cmd {
         u8 pdu[32]; /* available inline for free use */
 };

+struct io_zctap_buf {
+        dma_addr_t dma;
+        struct page *page;
+        atomic_t refcount;
+        u8 _pad[4];
+};
+
 #if defined(CONFIG_IO_URING)
 int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
                               struct iov_iter *iter, void *ioucmd);
@@ -44,6 +51,21 @@ void __io_uring_free(struct task_struct *tsk);
 void io_uring_unreg_ringfd(void);
 const char *io_uring_get_opcode(u8 opcode);

+struct io_zctap_ifq;
+struct io_zctap_buf *io_zctap_get_buf(struct io_zctap_ifq *ifq, int refc);
+void io_zctap_put_buf(struct io_zctap_ifq *ifq, struct io_zctap_buf *buf);
+void io_zctap_put_buf_refs(struct io_zctap_ifq *ifq, struct io_zctap_buf *buf,
+                           unsigned count);
+bool io_zctap_put_page(struct io_zctap_ifq *ifq, struct page *page);
+
+static inline dma_addr_t io_zctap_buf_dma(struct io_zctap_buf *buf)
+{
+        return buf->dma;
+}
+static inline struct page *io_zctap_buf_page(struct io_zctap_buf *buf)
+{
+        return buf->page;
+}
 static inline void io_uring_files_cancel(void)
 {
         if (current->io_uring) {
@@ -92,6 +114,31 @@ static inline const char *io_uring_get_opcode(u8 opcode)
 {
         return "";
 }
+static inline dma_addr_t io_zctap_buf_dma(struct io_zctap_buf *buf)
+{
+        return 0;
+}
+static inline struct page *io_zctap_buf_page(struct io_zctap_buf *buf)
+{
+        return NULL;
+}
+static inline struct io_zctap_buf *io_zctap_get_buf(struct io_zctap_ifq *ifq,
+                                                    int refc)
+{
+        return NULL;
+}
+static inline void io_zctap_put_buf(struct io_zctap_ifq *ifq,
+                                    struct io_zctap_buf *buf)
+{
+}
+static inline void io_zctap_put_buf_refs(struct io_zctap_ifq *ifq,
+                                         struct io_zctap_buf *buf,
+                                         unsigned count)
+{
+}
+static inline bool io_zctap_put_page(struct io_zctap_ifq *ifq,
+                                     struct page *page)
+{
+        return false;
+}
+
 #endif

 #endif
diff --git a/io_uring/zctap.c b/io_uring/zctap.c
index 7426feee1e04..69a04de87f8f 100644
--- a/io_uring/zctap.c
+++ b/io_uring/zctap.c
@@ -37,6 +37,29 @@ static u64 zctap_mk_page_info(u16 region_id, u16 pgid)
         return (u64)0xface << 48 | (u64)region_id << 16 | (u64)pgid;
 }

+struct io_zctap_buf *io_zctap_get_buf(struct io_zctap_ifq *ifq, int refc)
+{
+        return NULL;
+}
+EXPORT_SYMBOL(io_zctap_get_buf);
+
+void io_zctap_put_buf(struct io_zctap_ifq *ifq, struct io_zctap_buf *buf)
+{
+}
+EXPORT_SYMBOL(io_zctap_put_buf);
+
+void io_zctap_put_buf_refs(struct io_zctap_ifq *ifq, struct io_zctap_buf *buf,
+                           unsigned count)
+{
+}
+EXPORT_SYMBOL(io_zctap_put_buf_refs);
+
+bool io_zctap_put_page(struct io_zctap_ifq *ifq, struct page *page)
+{
+        return false;
+}
+EXPORT_SYMBOL(io_zctap_put_page);
+
 static void io_remove_ifq_region(struct ifq_region *ifr)
 {
         struct io_mapped_ubuf *imu;
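As an illustration of the intended driver usage, an RX refill path might
look like the sketch below; mydrv_rx_ring and its descriptor layout are
hypothetical, and the refcount bias passed as refc is fleshed out in
patch 8.

/* Hypothetical refill loop: take buffers with a driver bias of 1 and
 * post their DMA addresses to the hardware ring.
 */
static int mydrv_refill_rx(struct mydrv_rx_ring *ring, int budget)
{
        int n;

        for (n = 0; n < budget; n++) {
                struct io_zctap_buf *buf;

                buf = io_zctap_get_buf(ring->ifq, 1);
                if (!buf)
                        break;          /* fill queue empty; retry later */

                ring->desc[ring->head].addr = io_zctap_buf_dma(buf);
                ring->buf[ring->head] = buf;
                ring->head = (ring->head + 1) & ring->mask;
        }
        return n;
}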
From patchwork Tue Nov 8 05:05:13 2022
X-Patchwork-Submitter: Jonathan Lemon
X-Patchwork-Id: 13035887
From: Jonathan Lemon
Subject: [PATCH v1 07/15] io_uring: Allocate zctap device buffers and dma map them.
Date: Mon, 7 Nov 2022 21:05:13 -0800
Message-ID: <20221108050521.3198458-8-jonathan.lemon@gmail.com>
In-Reply-To: <20221108050521.3198458-1-jonathan.lemon@gmail.com>
References: <20221108050521.3198458-1-jonathan.lemon@gmail.com>
X-Mailing-List: io-uring@vger.kernel.org

The goal is to register a memory region with the device, and later
specify the desired packet buffer size. The code currently assumes
a page size.

Create the desired number of zctap buffers and DMA map them to the
target device, recording the DMA address for later use.

Hold a page reference while the page is DMA mapped.

Change the freelist from an array of page pointers to an array of
indices into the device buffer list.
Signed-off-by: Jonathan Lemon
---
 io_uring/zctap.c | 78 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 63 insertions(+), 15 deletions(-)

diff --git a/io_uring/zctap.c b/io_uring/zctap.c
index 69a04de87f8f..fe4bb3781636 100644
--- a/io_uring/zctap.c
+++ b/io_uring/zctap.c
@@ -18,11 +18,14 @@
 #define NR_ZCTAP_IFQS 1

 struct ifq_region {
+        struct io_zctap_ifq *ifq;      /* only for delayed_work */
         struct io_mapped_ubuf *imu;
         int free_count;
         int nr_pages;
         u16 id;
-        struct page *freelist[];
+
+        struct io_zctap_buf *buf;
+        u16 freelist[];
 };

 typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf);
@@ -60,49 +63,85 @@ bool io_zctap_put_page(struct io_zctap_ifq *ifq, struct page *page)
 }
 EXPORT_SYMBOL(io_zctap_put_page);

+static inline struct device *
+netdev2device(struct net_device *dev)
+{
+        return dev->dev.parent;        /* from SET_NETDEV_DEV() */
+}
+
 static void io_remove_ifq_region(struct ifq_region *ifr)
 {
-        struct io_mapped_ubuf *imu;
-        struct page *page;
+        struct device *device = netdev2device(ifr->ifq->dev);
+        struct io_zctap_buf *buf;
         int i;

-        imu = ifr->imu;
         for (i = 0; i < ifr->nr_pages; i++) {
-                page = imu->bvec[i].bv_page;
-
-                ClearPagePrivate(page);
-                set_page_private(page, 0);
+                buf = &ifr->buf[i];
+                set_page_private(buf->page, 0);
+                ClearPagePrivate(buf->page);
+                dma_unmap_page_attrs(device, buf->dma, PAGE_SIZE,
+                                     DMA_BIDIRECTIONAL,
+                                     DMA_ATTR_SKIP_CPU_SYNC);
+                put_page(buf->page);
         }

+        kvfree(ifr->buf);
         kvfree(ifr);
 }

-static int io_zctap_map_region(struct ifq_region *ifr)
+static int io_zctap_map_region(struct ifq_region *ifr, struct device *device)
 {
         struct io_mapped_ubuf *imu;
+        struct io_zctap_buf *buf;
         struct page *page;
+        dma_addr_t addr;
+        int i, err;
         u64 info;
-        int i;

         imu = ifr->imu;
         for (i = 0; i < ifr->nr_pages; i++) {
                 page = imu->bvec[i].bv_page;
-                if (PagePrivate(page))
+
+                if (PagePrivate(page)) {
+                        err = -EEXIST;
                         goto out;
+                }
+
                 SetPagePrivate(page);
                 info = zctap_mk_page_info(ifr->id, i);
                 zctap_set_page_info(page, info);
-                ifr->freelist[i] = page;
+
+                buf = &ifr->buf[i];
+                addr = dma_map_page_attrs(device, page, 0, PAGE_SIZE,
+                                          DMA_BIDIRECTIONAL,
+                                          DMA_ATTR_SKIP_CPU_SYNC);
+                if (dma_mapping_error(device, addr)) {
+                        set_page_private(page, 0);
+                        ClearPagePrivate(page);
+                        err = -ENOMEM;
+                        goto out;
+                }
+                buf->dma = addr;
+                buf->page = page;
+                atomic_set(&buf->refcount, 0);
+                get_page(page);
+
+                ifr->freelist[i] = i;
         }
         return 0;

 out:
         while (i--) {
                 page = imu->bvec[i].bv_page;
-                ClearPagePrivate(page);
                 set_page_private(page, 0);
+                ClearPagePrivate(page);
+                buf = &ifr->buf[i];
+                dma_unmap_page_attrs(device, buf->dma, PAGE_SIZE,
+                                     DMA_BIDIRECTIONAL,
+                                     DMA_ATTR_SKIP_CPU_SYNC);
+                put_page(page);
         }
-        return -EEXIST;
+        return err;
 }

 int io_provide_ifq_region(struct io_zctap_ifq *ifq, u16 id)
@@ -131,13 +170,22 @@ int io_provide_ifq_region(struct io_zctap_ifq *ifq, u16 id)
         if (!ifr)
                 return -ENOMEM;

+        ifr->buf = kvmalloc_array(nr_pages, sizeof(*ifr->buf), GFP_KERNEL);
+        if (!ifr->buf) {
+                kvfree(ifr);
+                return -ENOMEM;
+        }
+
         ifr->nr_pages = nr_pages;
         ifr->imu = imu;
         ifr->free_count = nr_pages;
         ifr->id = id;

-        err = io_zctap_map_region(ifr);
+        ifr->ifq = ifq;                /* XXX */
+
+        err = io_zctap_map_region(ifr, netdev2device(ifq->dev));
         if (err) {
+                kvfree(ifr->buf);
                 kvfree(ifr);
                 return err;
         }
From patchwork Tue Nov 8 05:05:14 2022
X-Patchwork-Submitter: Jonathan Lemon
X-Patchwork-Id: 13035885
From: Jonathan Lemon
Subject: [PATCH v1 08/15] io_uring: Add zctap buffer get/put functions and refcounting.
Date: Mon, 7 Nov 2022 21:05:14 -0800
Message-ID: <20221108050521.3198458-9-jonathan.lemon@gmail.com>
In-Reply-To: <20221108050521.3198458-1-jonathan.lemon@gmail.com>
References: <20221108050521.3198458-1-jonathan.lemon@gmail.com>
X-Mailing-List: io-uring@vger.kernel.org

Flesh out the driver API functions introduced earlier.

The driver gets a buffer with the specified reference count. If the
driver specifies a large refcount (bias), it decrements this as skb
fragments go up the stack, and releases the remaining references when
finished with the buffer.

When ownership of a fragment is transferred to the user, a user
refcount is incremented, and correspondingly decremented when the
fragment is returned. When all refcounts are released, the buffer is
safe to reuse.

The user/kernel split is needed to differentiate between "safe to
reuse the buffer" and "still in use by the kernel".

The locking here can likely be improved.

Signed-off-by: Jonathan Lemon
---
 io_uring/kbuf.c  |  13 +++++
 io_uring/kbuf.h  |   2 +
 io_uring/zctap.c | 131 ++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 145 insertions(+), 1 deletion(-)

diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 25cd724ade18..aadc664aaa87 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -188,6 +188,19 @@ void __user *io_buffer_select(struct io_kiocb *req, size_t *len,
         return ret;
 }

+/* Called from the network driver, in napi context.
+ */
+u64 io_zctap_buffer(struct io_kiocb *req, size_t *len)
+{
+        struct io_ring_ctx *ctx = req->ctx;
+        struct io_buffer_list *bl;
+        void __user *ret = NULL;
+
+        bl = io_buffer_get_list(ctx, req->buf_index);
+        if (likely(bl))
+                ret = io_ring_buffer_select(req, len, bl, IO_URING_F_UNLOCKED);
+        return (u64)ret;
+}
+
 static __cold int io_init_bl_list(struct io_ring_ctx *ctx)
 {
         int i;
diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h
index c23e15d7d3ca..b530e987b438 100644
--- a/io_uring/kbuf.h
+++ b/io_uring/kbuf.h
@@ -50,6 +50,8 @@ unsigned int __io_put_kbuf(struct io_kiocb *req, unsigned issue_flags);

 void io_kbuf_recycle_legacy(struct io_kiocb *req, unsigned issue_flags);

+u64 io_zctap_buffer(struct io_kiocb *req, size_t *len);
+
 static inline void io_kbuf_recycle_ring(struct io_kiocb *req)
 {
         /*
diff --git a/io_uring/zctap.c b/io_uring/zctap.c
index fe4bb3781636..0da9e6510f36 100644
--- a/io_uring/zctap.c
+++ b/io_uring/zctap.c
@@ -24,6 +24,8 @@ struct ifq_region {
         int nr_pages;
         u16 id;

+        spinlock_t freelist_lock;
+
         struct io_zctap_buf *buf;
         u16 freelist[];
 };
@@ -40,20 +42,146 @@ static u64 zctap_mk_page_info(u16 region_id, u16 pgid)
         return (u64)0xface << 48 | (u64)region_id << 16 | (u64)pgid;
 }

+static u64 zctap_page_info(const struct page *page)
+{
+        return page_private(page);
+}
+
+static u16 zctap_page_id(const struct page *page)
+{
+        return zctap_page_info(page) & 0xffff;
+}
+
+/* driver bias cannot be larger than this */
+#define IO_ZCTAP_UREF           0x10000
+#define IO_ZCTAP_KREF_MASK      (IO_ZCTAP_UREF - 1)
+
+/* return user refs back, indicate whether buffer is reusable */
+static bool io_zctap_put_buf_uref(struct io_zctap_buf *buf)
+{
+        if (atomic_read(&buf->refcount) < IO_ZCTAP_UREF) {
+                WARN_ONCE(1, "uref botch: %x < %x, id:%d page:%px\n",
+                          atomic_read(&buf->refcount), IO_ZCTAP_UREF,
+                          zctap_page_id(buf->page),
+                          buf->page);
+                return false;
+        }
+
+        return atomic_sub_and_test(IO_ZCTAP_UREF, &buf->refcount);
+}
+
+/* gets a user-supplied buffer from the fill queue */
+static struct io_zctap_buf *io_zctap_get_buffer(struct io_zctap_ifq *ifq,
+                                                u16 *buf_pgid)
+{
+        struct io_zctap_buf *buf;
+        struct ifq_region *ifr;
+        struct io_kiocb req;
+        int pgid, region_id;
+        size_t len = 0;
+        u64 addr;
+
+        ifr = ifq->region;
+retry:
+        req = (struct io_kiocb) {
+                .ctx = ifq->ctx,
+                .buf_index = ifq->fill_bgid,
+        };
+        /* IN: uses buf_index as buffer group.
+         * OUT: buf_index of actual buffer. (and req->buf_list set)
+         * (this comes from the user-supplied bufid)
+         */
+        addr = io_zctap_buffer(&req, &len);
+        if (!addr)
+                return NULL;
+
+        pgid = addr & 0xffff;
+        region_id = (addr >> 16) & 0xffff;
+        if (region_id) {
+                WARN_RATELIMIT(1, "region_id %d > max 1", region_id);
+                return NULL;
+        }
+
+        if (pgid >= ifr->nr_pages) {
+                WARN_RATELIMIT(1, "bufid %d > max %d", pgid, ifr->nr_pages);
+                return NULL;
+        }
+
+        buf = &ifr->buf[pgid];
+        if (!io_zctap_put_buf_uref(buf))
+                goto retry;
+
+        *buf_pgid = pgid;
+        return buf;
+}
+
+/* if on exit/teardown path, can skip this work */
+static void io_zctap_recycle_buf(struct ifq_region *ifr,
+                                 struct io_zctap_buf *buf)
+{
+        spin_lock(&ifr->freelist_lock);
+
+        ifr->freelist[ifr->free_count++] = buf - ifr->buf;
+
+        spin_unlock(&ifr->freelist_lock);
+}
+
 struct io_zctap_buf *io_zctap_get_buf(struct io_zctap_ifq *ifq, int refc)
 {
-        return NULL;
+        struct ifq_region *ifr = ifq->region;
+        struct io_zctap_buf *buf;
+        u16 pgid;
+
+        spin_lock(&ifr->freelist_lock);
+
+        buf = NULL;
+        if (ifr->free_count) {
+                pgid = ifr->freelist[--ifr->free_count];
+                buf = &ifr->buf[pgid];
+        }
+
+        spin_unlock(&ifr->freelist_lock);
+
+        if (!buf) {
+                buf = io_zctap_get_buffer(ifq, &pgid);
+                if (!buf)
+                        return NULL;
+        }
+
+        WARN_ON(atomic_read(&buf->refcount));
+        atomic_set(&buf->refcount, refc & IO_ZCTAP_KREF_MASK);
+
+        return buf;
 }
 EXPORT_SYMBOL(io_zctap_get_buf);

+/* called from driver and networking stack. */
 void io_zctap_put_buf(struct io_zctap_ifq *ifq, struct io_zctap_buf *buf)
 {
+        struct ifq_region *ifr = ifq->region;
+
+        /* XXX move to inline function later. */
+        if (!atomic_dec_and_test(&buf->refcount))
+                return;
+
+        io_zctap_recycle_buf(ifr, buf);
 }
 EXPORT_SYMBOL(io_zctap_put_buf);

+/* called from driver and networking stack. */
 void io_zctap_put_buf_refs(struct io_zctap_ifq *ifq, struct io_zctap_buf *buf,
                            unsigned count)
 {
+        struct ifq_region *ifr = ifq->region;
+        unsigned refs;
+
+        refs = atomic_read(&buf->refcount) & IO_ZCTAP_KREF_MASK;
+        WARN(refs < count, "driver refcount botch: %u < %u\n", refs, count);
+
+        if (!atomic_sub_and_test(count, &buf->refcount))
+                return;
+
+        io_zctap_recycle_buf(ifr, buf);
 }
 EXPORT_SYMBOL(io_zctap_put_buf_refs);

@@ -176,6 +304,7 @@ int io_provide_ifq_region(struct io_zctap_ifq *ifq, u16 id)
                 return -ENOMEM;
         }

+        spin_lock_init(&ifr->freelist_lock);
         ifr->nr_pages = nr_pages;
         ifr->imu = imu;
         ifr->free_count = nr_pages;
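To make the split concrete, here is one possible lifetime of a single
buffer under this scheme (numbers illustrative; the recvzc path that takes
the user reference arrives later in the series):

  io_zctap_get_buf(ifq, 4)                  refcount = 0x00004 (4 kernel refs)
  one frag handed to the user (+UREF)       refcount = 0x10004
  skb freed, io_zctap_put_buf()             refcount = 0x10003
  driver done: io_zctap_put_buf_refs(.., 3) refcount = 0x10000
  user returns the id via the fill ring:
    io_zctap_put_buf_uref()                 refcount = 0x00000 -> recycled

The buffer only reaches the freelist again once both the kernel refs (low
bits) and the user refs (multiples of IO_ZCTAP_UREF) have drained.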
From patchwork Tue Nov 8 05:05:15 2022
X-Patchwork-Submitter: Jonathan Lemon
X-Patchwork-Id: 13035896
From: Jonathan Lemon
Subject: [PATCH v1 09/15] skbuff: Introduce SKBFL_FIXED_FRAG and skb_fixed()
Date: Mon, 7 Nov 2022 21:05:15 -0800
Message-ID: <20221108050521.3198458-10-jonathan.lemon@gmail.com>
In-Reply-To: <20221108050521.3198458-1-jonathan.lemon@gmail.com>
References: <20221108050521.3198458-1-jonathan.lemon@gmail.com>
X-Mailing-List: io-uring@vger.kernel.org

When an skb marked as zerocopy goes up the network stack during RX,
it calls skb_orphan_frags_rx(). This is designed to catch TX zerocopy
data being redirected back up the stack, not new zerocopy fragments
coming up from the driver.

Currently, since the skb is marked as zerocopy, skb_copy_ubufs() is
called, defeating the point of zerocopy RX.

Have the driver mark the fragments as fixed, so they are not copied.
Signed-off-by: Jonathan Lemon
---
 include/linux/skbuff.h | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 59c9fd55699d..5d57e2c37529 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -516,6 +516,9 @@ enum {
          * use frags only up until ubuf_info is released
          */
         SKBFL_MANAGED_FRAG_REFS = BIT(4),
+
+        /* don't move or copy the fragment */
+        SKBFL_FIXED_FRAG = BIT(5),
 };

 #define SKBFL_ZEROCOPY_FRAG (SKBFL_ZEROCOPY_ENABLE | SKBFL_SHARED_FRAG)
@@ -1653,6 +1656,11 @@ static inline bool skb_zcopy_managed(const struct sk_buff *skb)
         return skb_shinfo(skb)->flags & SKBFL_MANAGED_FRAG_REFS;
 }

+static inline bool skb_fixed(const struct sk_buff *skb)
+{
+        return skb_shinfo(skb)->flags & SKBFL_FIXED_FRAG;
+}
+
 static inline bool skb_pure_zcopy_same(const struct sk_buff *skb1,
                                        const struct sk_buff *skb2)
 {
@@ -3089,7 +3097,7 @@ static inline int skb_orphan_frags(struct sk_buff *skb, gfp_t gfp_mask)
 /* Frags must be orphaned, even if refcounted, if skb might loop to rx path */
 static inline int skb_orphan_frags_rx(struct sk_buff *skb, gfp_t gfp_mask)
 {
-        if (likely(!skb_zcopy(skb)))
+        if (likely(!skb_zcopy(skb) || skb_fixed(skb)))
                 return 0;
         return skb_copy_ubufs(skb, gfp_mask);
 }
From patchwork Tue Nov 8 05:05:16 2022
X-Patchwork-Submitter: Jonathan Lemon
X-Patchwork-Id: 13035895
From: Jonathan Lemon
Subject: [PATCH v1 10/15] io_uring: Allocate a uarg for use by the ifq RX
Date: Mon, 7 Nov 2022 21:05:16 -0800
Message-ID: <20221108050521.3198458-11-jonathan.lemon@gmail.com>
In-Reply-To: <20221108050521.3198458-1-jonathan.lemon@gmail.com>
References: <20221108050521.3198458-1-jonathan.lemon@gmail.com>
X-Mailing-List: io-uring@vger.kernel.org

Create a static uarg which is attached to zerocopy RX buffers, and add
a callback to handle freeing the skb. As the skb is marked as zerocopy,
it bypasses the default network skb fragment destructor and uses our
callback. This handles our buffer refcounts, and releases the ZC
buffer back to the freelist.

Add the put_page() implementations, which release the fragments. These
may also be called by drivers during cleanup.

Signed-off-by: Jonathan Lemon
---
 io_uring/zctap.c | 64 ++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 56 insertions(+), 8 deletions(-)

diff --git a/io_uring/zctap.c b/io_uring/zctap.c
index 0da9e6510f36..10d74b8f7cef 100644
--- a/io_uring/zctap.c
+++ b/io_uring/zctap.c
@@ -30,6 +30,12 @@ struct ifq_region {
         u16 freelist[];
 };

+/* XXX get around not having "struct ubuf_info" defined in io_uring_types.h */
+struct io_zctap_ifq_priv {
+        struct io_zctap_ifq ifq;
+        struct ubuf_info uarg;
+};
+
 typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf);

 static void zctap_set_page_info(struct page *page, u64 info)
@@ -52,6 +58,16 @@ static u16 zctap_page_id(const struct page *page)
         return zctap_page_info(page) & 0xffff;
 }

+static bool zctap_page_magic(const struct page *page)
+{
+        return (zctap_page_info(page) >> 48) == 0xface;
+}
+
+static bool zctap_page_ours(struct page *page)
+{
+        return PagePrivate(page) && zctap_page_magic(page);
+}
+
 /* driver bias cannot be larger than this */
 #define IO_ZCTAP_UREF           0x10000
 #define IO_ZCTAP_KREF_MASK      (IO_ZCTAP_UREF - 1)
@@ -70,7 +86,9 @@ static bool io_zctap_put_buf_uref(struct io_zctap_buf *buf)
         return atomic_sub_and_test(IO_ZCTAP_UREF, &buf->refcount);
 }

-/* gets a user-supplied buffer from the fill queue */
+/* gets a user-supplied buffer from the fill queue
+ * note: may drain N entries, but still have no usable buffers
+ */
 static struct io_zctap_buf *io_zctap_get_buffer(struct io_zctap_ifq *ifq,
                                                 u16 *buf_pgid)
 {
@@ -185,9 +203,19 @@ void io_zctap_put_buf_refs(struct io_zctap_ifq *ifq, struct io_zctap_buf *buf,
 }
 EXPORT_SYMBOL(io_zctap_put_buf_refs);

+/* could be called by the stack as it drops/recycles the skbs */
 bool io_zctap_put_page(struct io_zctap_ifq *ifq, struct page *page)
 {
-        return false;
+        struct ifq_region *ifr;
+        u16 pgid;
+
+        if (!zctap_page_ours(page))
+                return false;
+
+        ifr = ifq->region;        /* only one */
+        pgid = zctap_page_id(page);
+        io_zctap_put_buf(ifq, &ifr->buf[pgid]);
+        return true;
 }
 EXPORT_SYMBOL(io_zctap_put_page);

@@ -351,17 +379,35 @@ static int io_close_zctap_ifq(struct io_zctap_ifq *ifq, u16 queue_id)
         return __io_queue_mgmt(ifq->dev, NULL, queue_id);
 }

+static void io_zctap_ifq_callback(struct sk_buff *skb, struct ubuf_info *uarg,
+                                  bool success)
+{
+        struct skb_shared_info *shinfo = skb_shinfo(skb);
+        struct io_zctap_ifq_priv *priv;
+        struct page *page;
+        int i;
+
+        priv = container_of(uarg, struct io_zctap_ifq_priv, uarg);
+
+        for (i = 0; i < shinfo->nr_frags; i++) {
+                page = skb_frag_page(&shinfo->frags[i]);
+                if (!io_zctap_put_page(&priv->ifq, page))
+                        __skb_frag_unref(&shinfo->frags[i], skb->pp_recycle);
+        }
+}
+
 static struct io_zctap_ifq *io_zctap_ifq_alloc(struct io_ring_ctx *ctx)
 {
-        struct io_zctap_ifq *ifq;
+        struct io_zctap_ifq_priv *priv;

-        ifq = kzalloc(sizeof(*ifq), GFP_KERNEL);
-        if (!ifq)
+        priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+        if (!priv)
                 return NULL;

-        ifq->ctx = ctx;
-        ifq->queue_id = -1;
-        return ifq;
+        priv->ifq.ctx = ctx;
+        priv->ifq.queue_id = -1;
+        priv->ifq.uarg = &priv->uarg;
+        return &priv->ifq;
 }

 static void io_zctap_ifq_free(struct io_zctap_ifq *ifq)
@@ -399,6 +445,8 @@ int io_register_ifq(struct io_ring_ctx *ctx,
                 return -ENOMEM;

         ifq->fill_bgid = req.fill_bgid;
+        ifq->uarg->callback = io_zctap_ifq_callback;
+        ifq->uarg->flags = SKBFL_ALL_ZEROCOPY | SKBFL_FIXED_FRAG;

         err = -ENODEV;
         ifq->dev = dev_get_by_index(&init_net, req.ifindex);
The bgid:bid identifying the buffer should later be placed in the ifq's fill ring, which returns the buffer back to the kernel. Signed-off-by: Jonathan Lemon --- include/uapi/linux/io_uring.h | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index f65543595d71..88f01bda12be 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -681,6 +681,14 @@ struct io_uring_ifq_req { __u16 resv[2]; }; +struct io_uring_zctap_iov { + __u32 off; + __u32 len; + __u16 bgid; + __u16 bid; + __u16 resv[2]; +}; + #ifdef __cplusplus } #endif From patchwork Tue Nov 8 05:05:18 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Lemon X-Patchwork-Id: 13035888 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D6FC0C4332F for ; Tue, 8 Nov 2022 05:05:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229854AbiKHFFf convert rfc822-to-8bit (ORCPT ); Tue, 8 Nov 2022 00:05:35 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38790 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229922AbiKHFFe (ORCPT ); Tue, 8 Nov 2022 00:05:34 -0500 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 020E5A46D for ; Mon, 7 Nov 2022 21:05:32 -0800 (PST) Received: from pps.filterd (m0089730.ppops.net [127.0.0.1]) by m0089730.ppops.net (8.17.1.5/8.17.1.5) with ESMTP id 2A7LKoEW027743 for ; Mon, 7 Nov 2022 21:05:32 -0800 Received: from mail.thefacebook.com ([163.114.132.120]) by m0089730.ppops.net (PPS) with ESMTPS id 3knkb8633e-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Mon, 07 Nov 2022 21:05:32 -0800 Received: from twshared27579.05.ash9.facebook.com (2620:10d:c085:108::4) by mail.thefacebook.com (2620:10d:c085:11d::4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Mon, 7 Nov 2022 21:05:30 -0800 Received: by devvm2494.atn0.facebook.com (Postfix, from userid 172786) id 2FE0323B2601F; Mon, 7 Nov 2022 21:05:22 -0800 (PST) From: Jonathan Lemon To: CC: Subject: [PATCH v1 12/15] io_uring: add OP_RECV_ZC command. Date: Mon, 7 Nov 2022 21:05:18 -0800 Message-ID: <20221108050521.3198458-13-jonathan.lemon@gmail.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20221108050521.3198458-1-jonathan.lemon@gmail.com> References: <20221108050521.3198458-1-jonathan.lemon@gmail.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: b4VyeoNHGonvQAkmAlXhOjD-7eCk8nbr X-Proofpoint-ORIG-GUID: b4VyeoNHGonvQAkmAlXhOjD-7eCk8nbr X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.219,Aquarius:18.0.895,Hydra:6.0.545,FMLib:17.11.122.1 definitions=2022-11-07_11,2022-11-07_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org The recvzc opcode uses a metadata buffer either supplied directly with buf/len, or indirectly from the buffer group. The expectation is that this buffer is then filled with an array of io_uring_zctap_iov structures, which point to the data in user-memory. 
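As a usage sketch (illustrative only: metadata_buf and consume() are stand-ins, and the completion result is taken to count the metadata bytes written, matching io_zctap_recv() later in this patch), the application walks the array once the completion arrives:

	/* walk the io_uring_zctap_iov array written into the metadata buffer */
	struct io_uring_zctap_iov *ziov = metadata_buf;
	unsigned int i, n = cqe->res / sizeof(*ziov);

	for (i = 0; i < n; i++) {
		void *data = zctap_iov_to_ptr(&ziov[i]);	/* helper sketched earlier */

		consume(data, ziov[i].len);
		/* ziov[i].bgid/bid is later recycled via the fill ring */
	}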
The opcode uses addr3 with an encoded format: addr3 = (readlen << 32) | (copy_bgid << 16); Readlen specifies the maximum amount of data which should be read from the socket. The amount of returned data is also limited by the number of iovs which the metadata area can hold. As a fallback, if the desired skb data is not already present in user memory, then a separate buffer is obtained from the copy_bgid and the data is copied into user memory, which is returned as an iov[] structure. This may happen due to system misconfiguration, imprecise header splitting, running the RECV_ZC opcode without hardware zero-copy support, or other network stack follies. Signed-off-by: Jonathan Lemon --- include/uapi/linux/io_uring.h | 1 + io_uring/net.c | 123 ++++++++++++ io_uring/opdef.c | 15 ++ io_uring/zctap.c | 340 ++++++++++++++++++++++++++++++++++ io_uring/zctap.h | 20 ++ 5 files changed, 499 insertions(+) diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index 88f01bda12be..6d20dfbf5bb1 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -215,6 +215,7 @@ enum io_uring_op { IORING_OP_URING_CMD, IORING_OP_SEND_ZC, IORING_OP_SENDMSG_ZC, + IORING_OP_RECV_ZC, /* this goes last, obviously */ IORING_OP_LAST, diff --git a/io_uring/net.c b/io_uring/net.c index 15dea91625e2..ef5ed6002751 100644 --- a/io_uring/net.c +++ b/io_uring/net.c @@ -16,6 +16,7 @@ #include "net.h" #include "notif.h" #include "rsrc.h" +#include "zctap.h" #if defined(CONFIG_NET) struct io_shutdown { @@ -67,6 +68,12 @@ struct io_sr_msg { struct io_kiocb *notif; }; +struct io_recvzc { + struct io_sr_msg sr; + u32 datalen; + u16 copy_bgid; +}; + #define IO_APOLL_MULTI_POLLED (REQ_F_APOLL_MULTISHOT | REQ_F_POLLED) int io_shutdown_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) @@ -908,6 +915,122 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags) return ret; } +int io_recvzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) +{ + struct io_recvzc *zc = io_kiocb_to_cmd(req, struct io_recvzc); + u64 recvzc_cmd; + + /* use addr3 in order to leverage io_recvmsg_prep */ + recvzc_cmd = READ_ONCE(sqe->addr3); + + if (recvzc_cmd & 0xffff) + return -EINVAL; + zc->copy_bgid = (recvzc_cmd >> 16) & 0xffff; + zc->datalen = recvzc_cmd >> 32; + + return io_recvmsg_prep(req, sqe); } + +int io_recvzc(struct io_kiocb *req, unsigned int issue_flags) +{ + struct io_recvzc *zc = io_kiocb_to_cmd(req, struct io_recvzc); + struct zctap_read_desc zrd; + struct msghdr msg; + struct socket *sock; + struct iovec iov; + unsigned int cflags; + unsigned flags; + int ret, min_ret = 0; + bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK; + size_t len = zc->sr.len; + + if (!(req->flags & REQ_F_POLLED) && + (zc->sr.flags & IORING_RECVSEND_POLL_FIRST)) + return -EAGAIN; + + sock = sock_from_file(req->file); + if (unlikely(!sock)) + return -ENOTSOCK; + +retry_multishot: + if (io_do_buffer_select(req)) { + void __user *buf; + + buf = io_buffer_select(req, &len, issue_flags); + if (!buf) + return -ENOBUFS; + zc->sr.buf = buf; + } + + ret = import_single_range(READ, zc->sr.buf, len, &iov, &msg.msg_iter); + if (unlikely(ret)) + goto out_free; + + msg.msg_name = NULL; + msg.msg_namelen = 0; + msg.msg_control = NULL; + msg.msg_get_inq = 1; + msg.msg_flags = 0; + msg.msg_controllen = 0; + msg.msg_iocb = NULL; + msg.msg_ubuf = NULL; + + flags = zc->sr.msg_flags; + if (force_nonblock) + flags |= MSG_DONTWAIT; + if (flags & MSG_WAITALL) + min_ret = iov_iter_count(&msg.msg_iter); + + zrd =
(struct zctap_read_desc) { + .iov_limit = msg_data_left(&msg), + .recv_limit = zc->datalen, + .iter = &msg.msg_iter, + .ctx = req->ctx, + .copy_bgid = zc->copy_bgid, + }; + + ret = io_zctap_recv(sock, &zrd, &msg, flags); + if (ret < min_ret) { + if (ret == -EAGAIN && force_nonblock) { + if ((req->flags & IO_APOLL_MULTI_POLLED) == IO_APOLL_MULTI_POLLED) { + io_kbuf_recycle(req, issue_flags); + return IOU_ISSUE_SKIP_COMPLETE; + } + + return -EAGAIN; + } + if (ret == -ERESTARTSYS) + ret = -EINTR; + if (ret > 0 && io_net_retry(sock, flags)) { + zc->sr.len -= ret; + zc->sr.buf += ret; + zc->sr.done_io += ret; + req->flags |= REQ_F_PARTIAL_IO; + return -EAGAIN; + } + req_set_fail(req); + } else if ((flags & MSG_WAITALL) && (msg.msg_flags & (MSG_TRUNC | MSG_CTRUNC))) { +out_free: + req_set_fail(req); + } + + if (ret > 0) + ret += zc->sr.done_io; + else if (zc->sr.done_io) + ret = zc->sr.done_io; + else + io_kbuf_recycle(req, issue_flags); + + cflags = io_put_kbuf(req, issue_flags); + if (msg.msg_inq) + cflags |= IORING_CQE_F_SOCK_NONEMPTY; + + if (!io_recv_finish(req, &ret, cflags, ret <= 0)) + goto retry_multishot; + + return ret; +} + void io_send_zc_cleanup(struct io_kiocb *req) { struct io_sr_msg *zc = io_kiocb_to_cmd(req, struct io_sr_msg); diff --git a/io_uring/opdef.c b/io_uring/opdef.c index 83dc0f9ad3b2..14b42811a78e 100644 --- a/io_uring/opdef.c +++ b/io_uring/opdef.c @@ -33,6 +33,7 @@ #include "poll.h" #include "cancel.h" #include "rw.h" +#include "zctap.h" static int io_no_issue(struct io_kiocb *req, unsigned int issue_flags) { @@ -521,6 +522,20 @@ const struct io_op_def io_op_defs[] = { .fail = io_sendrecv_fail, #else .prep = io_eopnotsupp_prep, +#endif + }, + [IORING_OP_RECV_ZC] = { + .name = "RECV_ZC", + .needs_file = 1, + .unbound_nonreg_file = 1, + .pollin = 1, + .buffer_select = 1, + .ioprio = 1, +#if defined(CONFIG_NET) + .prep = io_recvzc_prep, + .issue = io_recvzc, +#else + .prep = io_eopnotsupp_prep, #endif }, }; diff --git a/io_uring/zctap.c b/io_uring/zctap.c index 10d74b8f7cef..096b3dd5a8a3 100644 --- a/io_uring/zctap.c +++ b/io_uring/zctap.c @@ -7,6 +7,7 @@ #include #include #include +#include #include @@ -53,6 +54,11 @@ static u64 zctap_page_info(const struct page *page) return page_private(page); } +static u16 zctap_page_region_id(const struct page *page) +{ + return (zctap_page_info(page) >> 16) & 0xffff; +} + static u16 zctap_page_id(const struct page *page) { return zctap_page_info(page) & 0xffff; @@ -72,6 +78,14 @@ static bool zctap_page_ours(struct page *page) #define IO_ZCTAP_UREF 0x10000 #define IO_ZCTAP_KREF_MASK (IO_ZCTAP_UREF - 1) +static void io_zctap_get_buf_uref(struct ifq_region *ifr, u16 pgid) +{ + if (WARN_ON(pgid >= ifr->nr_pages)) + return; + + atomic_add(IO_ZCTAP_UREF, &ifr->buf[pgid].refcount); +} + /* return user refs back, indicate whether buffer is reusable */ static bool io_zctap_put_buf_uref(struct io_zctap_buf *buf) { @@ -396,6 +410,18 @@ static void io_zctap_ifq_callback(struct sk_buff *skb, struct ubuf_info *uarg, } } +static struct io_zctap_ifq *io_zctap_skb_ifq(struct sk_buff *skb) +{ + struct io_zctap_ifq_priv *priv; + struct ubuf_info *uarg = skb_zcopy(skb); + + if (uarg && uarg->callback == io_zctap_ifq_callback) { + priv = container_of(uarg, struct io_zctap_ifq_priv, uarg); + return &priv->ifq; + } + return NULL; +} + static struct io_zctap_ifq *io_zctap_ifq_alloc(struct io_ring_ctx *ctx) { struct io_zctap_ifq_priv *priv; @@ -492,3 +518,317 @@ void io_unregister_zctap_all(struct io_ring_ctx *ctx) for (i = 0; i < NR_ZCTAP_IFQS; i++) 
io_unregister_zctap_ifq(ctx, i); } + +static int __zctap_get_user_buffer(struct zctap_read_desc *zrd, int len) +{ + if (!zrd->buflen) { + zrd->req = (struct io_kiocb) { + .ctx = zrd->ctx, + .buf_index = zrd->copy_bgid, + }; + + zrd->buf = (u8 *)io_zctap_buffer(&zrd->req, &zrd->buflen); + zrd->offset = 0; + } + return len > zrd->buflen ? zrd->buflen : len; +} + +static int zctap_copy_data(struct zctap_read_desc *zrd, int len, u8 *kaddr) +{ + struct io_uring_zctap_iov zov; + u32 space; + int err; + + space = zrd->iov_space + sizeof(zov); + if (space > zrd->iov_limit) + return 0; + + len = __zctap_get_user_buffer(zrd, len); + if (!len) + return -ENOBUFS; + + err = copy_to_user(zrd->buf + zrd->offset, kaddr, len); + if (err) + return -EFAULT; + + zov = (struct io_uring_zctap_iov) { + .off = zrd->offset, + .len = len, + .bgid = zrd->copy_bgid, + .bid = zrd->req.buf_index, + }; + + if (copy_to_iter(&zov, sizeof(zov), zrd->iter) != sizeof(zov)) + return -EFAULT; + + zrd->offset += len; + zrd->buflen -= len; + zrd->iov_space = space; + + return len; +} + +static int zctap_copy_frag(struct zctap_read_desc *zrd, struct page *page, + int off, int len, struct io_uring_zctap_iov *zov) +{ + u8 *kaddr; + int err; + + len = __zctap_get_user_buffer(zrd, len); + if (!len) + return -ENOBUFS; + + kaddr = kmap(page) + off; + err = copy_to_user(zrd->buf + zrd->offset, kaddr, len); + kunmap(page); + + if (err) + return -EFAULT; + + *zov = (struct io_uring_zctap_iov) { + .off = zrd->offset, + .len = len, + .bgid = zrd->copy_bgid, + .bid = zrd->req.buf_index, + }; + + zrd->offset += len; + zrd->buflen -= len; + + return len; +} + +static int zctap_recv_frag(struct zctap_read_desc *zrd, + struct io_zctap_ifq *ifq, + const skb_frag_t *frag, int off, int len) +{ + struct io_uring_zctap_iov zov; + struct page *page; + u32 space; + int pgid; + + space = zrd->iov_space + sizeof(zov); + if (space > zrd->iov_limit) + return 0; + + page = skb_frag_page(frag); + off += skb_frag_off(frag); + + if (likely(ifq && ifq->ctx == zrd->ctx && zctap_page_ours(page))) { + pgid = zctap_page_id(page); + io_zctap_get_buf_uref(ifq->region, pgid); + zov = (struct io_uring_zctap_iov) { + .off = off, + .len = len, + .bgid = zctap_page_region_id(page), + .bid = pgid, + }; + } else { + len = zctap_copy_frag(zrd, page, off, len, &zov); + if (len <= 0) + return len; + } + + if (copy_to_iter(&zov, sizeof(zov), zrd->iter) != sizeof(zov)) + return -EFAULT; + + zrd->iov_space = space; + + return len; +} + +/* Our version of __skb_datagram_iter -- should work for UDP also. 
*/ +static int +zctap_recv_skb(read_descriptor_t *desc, struct sk_buff *skb, + unsigned int offset, size_t len) +{ + struct zctap_read_desc *zrd = desc->arg.data; + struct io_zctap_ifq *ifq; + unsigned start, start_off; + struct sk_buff *frag_iter; + int i, copy, end, off; + int ret = 0; + + if (zrd->iov_space >= zrd->iov_limit) { + desc->count = 0; + return 0; + } + if (len > zrd->recv_limit) + len = zrd->recv_limit; + + start = skb_headlen(skb); + start_off = offset; + + ifq = io_zctap_skb_ifq(skb); + + if (offset < start) { + copy = start - offset; + if (copy > len) + copy = len; + + /* copy out linear data */ + ret = zctap_copy_data(zrd, copy, skb->data + offset); + if (ret < 0) + goto out; + offset += ret; + len -= ret; + if (len == 0 || ret != copy) + goto out; + } + + for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { + const skb_frag_t *frag; + + WARN_ON(start > offset + len); + + frag = &skb_shinfo(skb)->frags[i]; + end = start + skb_frag_size(frag); + + if (offset < end) { + copy = end - offset; + if (copy > len) + copy = len; + + off = offset - start; + ret = zctap_recv_frag(zrd, ifq, frag, off, copy); + if (ret < 0) + goto out; + + offset += ret; + len -= ret; + if (len == 0 || ret != copy) + goto out; + } + start = end; + } + + skb_walk_frags(skb, frag_iter) { + WARN_ON(start > offset + len); + + end = start + frag_iter->len; + if (offset < end) { + copy = end - offset; + if (copy > len) + copy = len; + + off = offset - start; + ret = zctap_recv_skb(desc, frag_iter, off, copy); + if (ret < 0) + goto out; + + offset += ret; + len -= ret; + if (len == 0 || ret != copy) + goto out; + } + start = end; + } + +out: + if (offset == start_off) + return ret; + return offset - start_off; +} + +static int __io_zctap_tcp_read(struct sock *sk, struct zctap_read_desc *zrd) +{ + read_descriptor_t rd_desc = { + .arg.data = zrd, + .count = 1, + }; + + return tcp_read_sock(sk, &rd_desc, zctap_recv_skb); +} + +static int io_zctap_tcp_recvmsg(struct sock *sk, struct zctap_read_desc *zrd, + int flags, int *addr_len) +{ + size_t used; + long timeo; + int ret; + + ret = used = 0; + + lock_sock(sk); + + timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT); + while (zrd->recv_limit) { + ret = __io_zctap_tcp_read(sk, zrd); + if (ret < 0) + break; + if (!ret) { + if (used) + break; + if (sock_flag(sk, SOCK_DONE)) + break; + if (sk->sk_err) { + ret = sock_error(sk); + break; + } + if (sk->sk_shutdown & RCV_SHUTDOWN) + break; + if (sk->sk_state == TCP_CLOSE) { + ret = -ENOTCONN; + break; + } + if (!timeo) { + ret = -EAGAIN; + break; + } + if (!skb_queue_empty(&sk->sk_receive_queue)) + break; + sk_wait_data(sk, &timeo, NULL); + if (signal_pending(current)) { + ret = sock_intr_errno(timeo); + break; + } + continue; + } + zrd->recv_limit -= ret; + used += ret; + + if (!timeo) + break; + release_sock(sk); + lock_sock(sk); + + if (sk->sk_err || sk->sk_state == TCP_CLOSE || + (sk->sk_shutdown & RCV_SHUTDOWN) || + signal_pending(current)) + break; + } + + release_sock(sk); + + /* XXX, handle timestamping */ + + if (used) + return used; + + return ret; +} + +int io_zctap_recv(struct socket *sock, struct zctap_read_desc *zrd, + struct msghdr *msg, unsigned int flags) +{ + struct sock *sk = sock->sk; + const struct proto *prot; + int addr_len = 0; + int ret; + + if (flags & MSG_ERRQUEUE) + return -EOPNOTSUPP; + + prot = READ_ONCE(sk->sk_prot); + if (prot->recvmsg != tcp_recvmsg) + return -EPROTONOSUPPORT; + + sock_rps_record_flow(sk); + + ret = io_zctap_tcp_recvmsg(sk, zrd, flags, &addr_len); + if (ret >= 0) { + 
msg->msg_namelen = addr_len; + ret = zrd->iov_space; + } + return ret; +} diff --git a/io_uring/zctap.h b/io_uring/zctap.h index bb44f8e972e8..4db516707d19 100644 --- a/io_uring/zctap.h +++ b/io_uring/zctap.h @@ -2,10 +2,30 @@ #ifndef IOU_ZCTAP_H #define IOU_ZCTAP_H +struct zctap_read_desc { + struct io_ring_ctx *ctx; + struct iov_iter *iter; + u32 iov_space; + u32 iov_limit; + u32 recv_limit; + + struct io_kiocb req; + u8 *buf; + size_t offset; + size_t buflen; + + u16 copy_bgid; /* XXX move to register ifq? */ +}; + int io_register_ifq(struct io_ring_ctx *ctx, struct io_uring_ifq_req __user *arg); void io_unregister_zctap_all(struct io_ring_ctx *ctx); int io_provide_ifq_region(struct io_zctap_ifq *ifq, u16 id); +int io_recvzc(struct io_kiocb *req, unsigned int issue_flags); +int io_recvzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe); +int io_zctap_recv(struct socket *sock, struct zctap_read_desc *zrd, + struct msghdr *msg, unsigned int flags); + #endif From patchwork Tue Nov 8 05:05:19 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Lemon X-Patchwork-Id: 13035889 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A77AAC433FE for ; Tue, 8 Nov 2022 05:05:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232950AbiKHFFl convert rfc822-to-8bit (ORCPT ); Tue, 8 Nov 2022 00:05:41 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38912 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232641AbiKHFFi (ORCPT ); Tue, 8 Nov 2022 00:05:38 -0500 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D2E0216584 for ; Mon, 7 Nov 2022 21:05:37 -0800 (PST) Received: from pps.filterd (m0044010.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 2A7LKsBR018797 for ; Mon, 7 Nov 2022 21:05:37 -0800 Received: from maileast.thefacebook.com ([163.114.130.3]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3knk5mp8ta-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Mon, 07 Nov 2022 21:05:37 -0800 Received: from twshared5287.03.ash8.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:82::e) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Mon, 7 Nov 2022 21:05:36 -0800 Received: by devvm2494.atn0.facebook.com (Postfix, from userid 172786) id 3782923B26021; Mon, 7 Nov 2022 21:05:22 -0800 (PST) From: Jonathan Lemon To: CC: Subject: [PATCH v1 13/15] io_uring: Make remove_ifq_region a delayed work call Date: Mon, 7 Nov 2022 21:05:19 -0800 Message-ID: <20221108050521.3198458-14-jonathan.lemon@gmail.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20221108050521.3198458-1-jonathan.lemon@gmail.com> References: <20221108050521.3198458-1-jonathan.lemon@gmail.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: QuImj1wgOWOWGCaonBuYASmunAg2zaia X-Proofpoint-GUID: QuImj1wgOWOWGCaonBuYASmunAg2zaia X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.219,Aquarius:18.0.895,Hydra:6.0.545,FMLib:17.11.122.1 definitions=2022-11-07_11,2022-11-07_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: 
io-uring@vger.kernel.org The page backing store should not be removed until all outstanding packets are returned. The packets may be in flight, owned by the driver, or sitting in a socket buffer. The region holds a reference to the ifq, and when the ifq is closed, a delayed work item is scheduled which checks that all pages have been returned. When complete, the region releases the ifq reference so it can be freed. Currently, the work item will exit and leak pages after a timeout expires. This should not happen in normal operation. Signed-off-by: Jonathan Lemon --- include/linux/io_uring_types.h | 1 + io_uring/zctap.c | 77 +++++++++++++++++++++++++--------- 2 files changed, 59 insertions(+), 19 deletions(-) diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 39f20344d578..7d9895370875 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -583,6 +583,7 @@ struct io_zctap_ifq { struct io_ring_ctx *ctx; void *region; struct ubuf_info *uarg; + refcount_t refcount; u16 queue_id; u16 id; u16 fill_bgid; diff --git a/io_uring/zctap.c b/io_uring/zctap.c index 096b3dd5a8a3..262aa50de8c4 100644 --- a/io_uring/zctap.c +++ b/io_uring/zctap.c @@ -19,13 +19,14 @@ #define NR_ZCTAP_IFQS 1 struct ifq_region { - struct io_zctap_ifq *ifq; /* only for delayed_work */ - struct io_mapped_ubuf *imu; + struct io_zctap_ifq *ifq; int free_count; int nr_pages; u16 id; spinlock_t freelist_lock; + struct delayed_work release_work; + unsigned long delay_end; struct io_zctap_buf *buf; u16 freelist[]; @@ -37,6 +38,8 @@ struct io_zctap_ifq_priv { struct ubuf_info uarg; }; +static void io_zctap_ifq_put(struct io_zctap_ifq *ifq); + typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf); static void zctap_set_page_info(struct page *page, u64 info) @@ -239,11 +242,30 @@ netdev2device(struct net_device *dev) return dev->dev.parent; /* from SET_NETDEV_DEV() */ } -static void io_remove_ifq_region(struct ifq_region *ifr) +static void io_remove_ifq_region_work(struct work_struct *work) { + struct ifq_region *ifr = container_of( + to_delayed_work(work), struct ifq_region, release_work); struct device *device = netdev2device(ifr->ifq->dev); struct io_zctap_buf *buf; - int i; + int i, refs, count; + + count = 0; + for (i = 0; i < ifr->nr_pages; i++) { + buf = &ifr->buf[i]; + refs = atomic_read(&buf->refcount) & IO_ZCTAP_KREF_MASK; + if (refs) { + if (time_before(jiffies, ifr->delay_end)) { + schedule_delayed_work(&ifr->release_work, HZ); + return; + } + count++; + } + } + + if (count) + pr_debug("freeing ifr with %d/%d outstanding pages\n", + count, ifr->nr_pages); for (i = 0; i < ifr->nr_pages; i++) { buf = &ifr->buf[i]; @@ -255,20 +277,28 @@ static void io_remove_ifq_region(struct ifq_region *ifr) put_page(buf->page); } + io_zctap_ifq_put(ifr->ifq); kvfree(ifr->buf); kvfree(ifr); } -static int io_zctap_map_region(struct ifq_region *ifr, struct device *device) +static void io_remove_ifq_region(struct ifq_region *ifr) { - struct io_mapped_ubuf *imu; + ifr->delay_end = jiffies + HZ * 10; + INIT_DELAYED_WORK(&ifr->release_work, io_remove_ifq_region_work); + schedule_delayed_work(&ifr->release_work, 0); +} + +static int io_zctap_map_region(struct ifq_region *ifr, + struct io_mapped_ubuf *imu) +{ + struct device *device = netdev2device(ifr->ifq->dev); struct io_zctap_buf *buf; struct page *page; dma_addr_t addr; int i, err; u64 info; - imu = ifr->imu; for (i = 0; i < ifr->nr_pages; i++) { page = imu->bvec[i].bv_page; @@ -302,10 +332,10 @@ static int
io_zctap_map_region(struct ifq_region *ifr, struct device *device) out: while (i--) { - page = imu->bvec[i].bv_page; + buf = &ifr->buf[i]; + page = buf->page; set_page_private(page, 0); ClearPagePrivate(page); - buf = &ifr->buf[i]; dma_unmap_page_attrs(device, buf->dma, PAGE_SIZE, DMA_BIDIRECTIONAL, DMA_ATTR_SKIP_CPU_SYNC); @@ -348,13 +378,12 @@ int io_provide_ifq_region(struct io_zctap_ifq *ifq, u16 id) spin_lock_init(&ifr->freelist_lock); ifr->nr_pages = nr_pages; - ifr->imu = imu; ifr->free_count = nr_pages; ifr->id = id; + ifr->ifq = ifq; + ifr->delay_end = 0; - ifr->ifq = ifq; /* XXX */ - - err = io_zctap_map_region(ifr, netdev2device(ifq->dev)); + err = io_zctap_map_region(ifr, imu); if (err) { kvfree(ifr->buf); kvfree(ifr); @@ -362,6 +391,7 @@ int io_provide_ifq_region(struct io_zctap_ifq *ifq, u16 id) } ifq->region = ifr; + refcount_inc(&ifq->refcount); return 0; } @@ -436,15 +466,23 @@ static struct io_zctap_ifq *io_zctap_ifq_alloc(struct io_ring_ctx *ctx) return &priv->ifq; } -static void io_zctap_ifq_free(struct io_zctap_ifq *ifq) +static void io_zctap_ifq_put(struct io_zctap_ifq *ifq) +{ + if (!refcount_dec_and_test(&ifq->refcount)) + return; + + if (ifq->dev) + dev_put(ifq->dev); + kfree(ifq); +} + +static void io_zctap_ifq_close(struct io_zctap_ifq *ifq) { if (ifq->queue_id != -1) io_close_zctap_ifq(ifq, ifq->queue_id); if (ifq->region) io_remove_ifq_region(ifq->region); - if (ifq->dev) - dev_put(ifq->dev); - kfree(ifq); + io_zctap_ifq_put(ifq); } int io_register_ifq(struct io_ring_ctx *ctx, @@ -473,6 +511,7 @@ int io_register_ifq(struct io_ring_ctx *ctx, ifq->fill_bgid = req.fill_bgid; ifq->uarg->callback = io_zctap_ifq_callback; ifq->uarg->flags = SKBFL_ALL_ZEROCOPY | SKBFL_FIXED_FRAG; + refcount_set(&ifq->refcount, 1); err = -ENODEV; ifq->dev = dev_get_by_index(&init_net, req.ifindex); @@ -493,7 +532,7 @@ int io_register_ifq(struct io_ring_ctx *ctx, return 0; out: - io_zctap_ifq_free(ifq); + io_zctap_ifq_close(ifq); return err; } @@ -506,7 +545,7 @@ int io_unregister_zctap_ifq(struct io_ring_ctx *ctx, unsigned long index) return -EINVAL; ctx->zctap_ifq = NULL; - io_zctap_ifq_free(ifq); + io_zctap_ifq_close(ifq); return 0; } From patchwork Tue Nov 8 05:05:20 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Lemon X-Patchwork-Id: 13035890 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AB492C4332F for ; Tue, 8 Nov 2022 05:05:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233128AbiKHFFm convert rfc822-to-8bit (ORCPT ); Tue, 8 Nov 2022 00:05:42 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38920 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232693AbiKHFFj (ORCPT ); Tue, 8 Nov 2022 00:05:39 -0500 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D468C17414 for ; Mon, 7 Nov 2022 21:05:37 -0800 (PST) Received: from pps.filterd (m0148461.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 2A80Utrf007228 for ; Mon, 7 Nov 2022 21:05:37 -0800 Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3kqcmqse66-1 (version=TLSv1.2 
cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Mon, 07 Nov 2022 21:05:37 -0800 Received: from twshared5287.03.ash8.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Mon, 7 Nov 2022 21:05:36 -0800 Received: by devvm2494.atn0.facebook.com (Postfix, from userid 172786) id 3F51D23B26023; Mon, 7 Nov 2022 21:05:22 -0800 (PST) From: Jonathan Lemon To: CC: Subject: [PATCH v1 14/15] io_uring: Add a buffer caching mechanism for zctap. Date: Mon, 7 Nov 2022 21:05:20 -0800 Message-ID: <20221108050521.3198458-15-jonathan.lemon@gmail.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20221108050521.3198458-1-jonathan.lemon@gmail.com> References: <20221108050521.3198458-1-jonathan.lemon@gmail.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: uLN2KgYEjEVswIJzAphGSuyolc2n4ZmO X-Proofpoint-ORIG-GUID: uLN2KgYEjEVswIJzAphGSuyolc2n4ZmO X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.219,Aquarius:18.0.895,Hydra:6.0.545,FMLib:17.11.122.1 definitions=2022-11-07_11,2022-11-07_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org This is based on the same concept as the page pool. Here, there are 4 separate buffer sources:

  cache    - small (128) cache the driver can use locklessly
  ptr_ring - buffers freed through skb_release_data()
  fillq    - entries returned from the application
  freelist - spinlock protected pool of free entries

The driver first tries the lockless cache, before attempting to refill it from the ptr ring. If there are still no buffers, then the fill ring is examined, before going to the freelist. If the ptr_ring is full when buffers are released as the skb is dropped (or the driver returns the buffers), then they are placed back on the freelist. Signed-off-by: Jonathan Lemon --- io_uring/zctap.c | 128 ++++++++++++++++++++++++++++++++++++----------- 1 file changed, 99 insertions(+), 29 deletions(-) diff --git a/io_uring/zctap.c b/io_uring/zctap.c index 262aa50de8c4..c7897fe2ccf6 100644 --- a/io_uring/zctap.c +++ b/io_uring/zctap.c @@ -18,8 +18,12 @@ #define NR_ZCTAP_IFQS 1 +#define REGION_CACHE_COUNT 128 +#define REGION_REFILL_COUNT 64 + struct ifq_region { struct io_zctap_ifq *ifq; + int cache_count; int free_count; int nr_pages; u16 id; @@ -28,6 +32,10 @@ struct ifq_region { struct delayed_work release_work; unsigned long delay_end; + u16 cache[REGION_CACHE_COUNT]; + + struct ptr_ring ring; + struct io_zctap_buf *buf; u16 freelist[]; }; @@ -103,8 +111,29 @@ static bool io_zctap_put_buf_uref(struct io_zctap_buf *buf) return atomic_sub_and_test(IO_ZCTAP_UREF, &buf->refcount); } +/* if on exit/teardown path, can skip this work */ +static void io_zctap_recycle_buf(struct ifq_region *ifr, + struct io_zctap_buf *buf) +{ + int rc; + + if (in_serving_softirq()) + rc = ptr_ring_produce(&ifr->ring, buf); + else + rc = ptr_ring_produce_bh(&ifr->ring, buf); + + if (rc) { + spin_lock(&ifr->freelist_lock); + + ifr->freelist[ifr->free_count++] = buf - ifr->buf; + + spin_unlock(&ifr->freelist_lock); + } +} + /* gets a user-supplied buffer from the fill queue * note: may drain N entries, but still have no usable buffers + * XXX add retry limit?
*/ static struct io_zctap_buf *io_zctap_get_buffer(struct io_zctap_ifq *ifq, u16 *buf_pgid) @@ -150,40 +179,71 @@ static struct io_zctap_buf *io_zctap_get_buffer(struct io_zctap_ifq *ifq, return buf; } -/* if on exit/teardown path, can skip this work */ -static void io_zctap_recycle_buf(struct ifq_region *ifr, - struct io_zctap_buf *buf) +static int io_zctap_get_buffers(struct io_zctap_ifq *ifq, u16 *cache, int n) { - spin_lock(&ifr->freelist_lock); + struct io_zctap_buf *buf; + int i; - ifr->freelist[ifr->free_count++] = buf - ifr->buf; - - spin_unlock(&ifr->freelist_lock); + for (i = 0; i < n; i++) { + buf = io_zctap_get_buffer(ifq, &cache[i]); + if (!buf) + break; + } + return i; } struct io_zctap_buf *io_zctap_get_buf(struct io_zctap_ifq *ifq, int refc) { - struct ifq_region *ifr = ifq->region; struct io_zctap_buf *buf; + struct ifq_region *ifr; + int count; u16 pgid; + ifr = ifq->region; + if (ifr->cache_count) + goto out; + + if (!__ptr_ring_empty(&ifr->ring)) { + do { + buf = __ptr_ring_consume(&ifr->ring); + if (!buf) + break; + ifr->cache[ifr->cache_count++] = buf - ifr->buf; + } while (ifr->cache_count < REGION_REFILL_COUNT); + + if (ifr->cache_count) + goto out; + } + + count = io_zctap_get_buffers(ifq, ifr->cache, REGION_REFILL_COUNT); + ifr->cache_count += count; + + if (ifr->cache_count) + goto out; + spin_lock(&ifr->freelist_lock); - buf = NULL; - if (ifr->free_count) { - pgid = ifr->freelist[--ifr->free_count]; - buf = &ifr->buf[pgid]; - } + count = min_t(int, ifr->free_count, REGION_CACHE_COUNT); + ifr->free_count -= count; + ifr->cache_count += count; + memcpy(ifr->cache, &ifr->freelist[ifr->free_count], + count * sizeof(u16)); spin_unlock(&ifr->freelist_lock); - if (!buf) { - buf = io_zctap_get_buffer(ifq, &pgid); - if (!buf) - return NULL; - } + if (ifr->cache_count) + goto out; - WARN_ON(atomic_read(&buf->refcount)); + return NULL; + +out: + pgid = ifr->cache[--ifr->cache_count]; + buf = &ifr->buf[pgid]; + + WARN_RATELIMIT(atomic_read(&buf->refcount), + "pgid:%d refc:%d cache_count:%d\n", + pgid, atomic_read(&buf->refcount), + ifr->cache_count); atomic_set(&buf->refcount, refc & IO_ZCTAP_KREF_MASK); return buf; @@ -278,6 +338,7 @@ static void io_remove_ifq_region_work(struct work_struct *work) } io_zctap_ifq_put(ifr->ifq); + ptr_ring_cleanup(&ifr->ring, NULL); kvfree(ifr->buf); kvfree(ifr); } @@ -365,16 +426,18 @@ int io_provide_ifq_region(struct io_zctap_ifq *ifq, u16 id) if (imu->ubuf & ~PAGE_MASK || imu->ubuf_end & ~PAGE_MASK) return -EFAULT; + err = -ENOMEM; nr_pages = imu->nr_bvecs; ifr = kvmalloc(struct_size(ifr, freelist, nr_pages), GFP_KERNEL); if (!ifr) - return -ENOMEM; + goto fail; ifr->buf = kvmalloc_array(nr_pages, sizeof(*ifr->buf), GFP_KERNEL); - if (!ifr->buf) { - kvfree(ifr); - return -ENOMEM; - } + if (!ifr->buf) + goto fail_buf; + + if (ptr_ring_init(&ifr->ring, 1024, GFP_KERNEL)) + goto fail_ring; spin_lock_init(&ifr->freelist_lock); ifr->nr_pages = nr_pages; @@ -382,18 +445,25 @@ int io_provide_ifq_region(struct io_zctap_ifq *ifq, u16 id) ifr->id = id; ifr->ifq = ifq; ifr->delay_end = 0; + ifr->cache_count = 0; err = io_zctap_map_region(ifr, imu); - if (err) { - kvfree(ifr->buf); - kvfree(ifr); - return err; - } + if (err) + goto fail_map; ifq->region = ifr; refcount_inc(&ifq->refcount); return 0; + +fail_map: + ptr_ring_cleanup(&ifr->ring, NULL); +fail_ring: + kvfree(ifr->buf); +fail_buf: + kvfree(ifr); +fail: + return err; } static int __io_queue_mgmt(struct net_device *dev, struct io_zctap_ifq *ifq, From patchwork Tue Nov 8 05:05:21 2022 
Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jonathan Lemon X-Patchwork-Id: 13035893 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 12227C4332F for ; Tue, 8 Nov 2022 05:05:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232896AbiKHFFp convert rfc822-to-8bit (ORCPT ); Tue, 8 Nov 2022 00:05:45 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38932 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233095AbiKHFFl (ORCPT ); Tue, 8 Nov 2022 00:05:41 -0500 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 87F6913DC2 for ; Mon, 7 Nov 2022 21:05:39 -0800 (PST) Received: from pps.filterd (m0109334.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 2A7LKpss010010 for ; Mon, 7 Nov 2022 21:05:39 -0800 Received: from maileast.thefacebook.com ([163.114.130.3]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3knq54wajv-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Mon, 07 Nov 2022 21:05:39 -0800 Received: from twshared2001.03.ash8.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Mon, 7 Nov 2022 21:05:37 -0800 Received: by devvm2494.atn0.facebook.com (Postfix, from userid 172786) id 46E1F23B26025; Mon, 7 Nov 2022 21:05:22 -0800 (PST) From: Jonathan Lemon To: CC: Subject: [PATCH v1 15/15] io_uring: Notify the application as the fillq is drained. Date: Mon, 7 Nov 2022 21:05:21 -0800 Message-ID: <20221108050521.3198458-16-jonathan.lemon@gmail.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20221108050521.3198458-1-jonathan.lemon@gmail.com> References: <20221108050521.3198458-1-jonathan.lemon@gmail.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: t0aHdp26enFR9ERwmHglAucZKqu-VipG X-Proofpoint-GUID: t0aHdp26enFR9ERwmHglAucZKqu-VipG X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.219,Aquarius:18.0.895,Hydra:6.0.545,FMLib:17.11.122.1 definitions=2022-11-07_11,2022-11-07_02,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org Userspace maintains a free count of space available in the fillq, and only returns entries based on the available space. As the kernel removes these entries, it needs to notify the application so more buffers can be queued. Only one outstanding notifier per queue is used, and it provides the most recent count of entries removed from the queue. Also post a notifier when the NIC is unable to obtain any buffers. When this happens, the NIC may just drop packets or stall. 
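A rough userspace sketch of consuming this notification (refill_fill_ring() is a hypothetical application callback; the res encoding mirrors io_zctap_notify() in the diff below):

	/* notifier CQE: res = (bgid << 16) | count, where count is the
	 * number of fill queue entries the kernel consumed; count == 0
	 * signals that the NIC found the fill queue empty */
	if (cqe->flags & IORING_CQE_F_NOTIF) {
		unsigned int bgid = cqe->res >> 16;
		unsigned int count = cqe->res & 0xffff;

		refill_fill_ring(bgid, count);	/* queue more buffers */
	}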
Signed-off-by: Jonathan Lemon --- io_uring/zctap.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 58 insertions(+) diff --git a/io_uring/zctap.c b/io_uring/zctap.c index c7897fe2ccf6..e6c7ed85d4ee 100644 --- a/io_uring/zctap.c +++ b/io_uring/zctap.c @@ -15,6 +15,7 @@ #include "zctap.h" #include "rsrc.h" #include "kbuf.h" +#include "refs.h" #define NR_ZCTAP_IFQS 1 @@ -26,7 +27,9 @@ struct ifq_region { int cache_count; int free_count; int nr_pages; + int taken; u16 id; + bool empty; spinlock_t freelist_lock; struct delayed_work release_work; @@ -44,8 +47,14 @@ struct ifq_region { struct io_zctap_ifq_priv { struct io_zctap_ifq ifq; struct ubuf_info uarg; + struct io_kiocb req; }; +static struct io_kiocb *io_zctap_ifq_notifier(struct io_zctap_ifq *ifq) +{ + return &((struct io_zctap_ifq_priv *)ifq)->req; +} + static void io_zctap_ifq_put(struct io_zctap_ifq *ifq); typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf); @@ -131,6 +140,34 @@ static void io_zctap_recycle_buf(struct ifq_region *ifr, } } +struct io_zctap_notif { + struct file *file; + u64 udata; + int res; + int cflags; +}; + +static void io_zctap_post_notify(struct io_kiocb *req, bool *locked) +{ + struct io_zctap_notif *n = io_kiocb_to_cmd(req, struct io_zctap_notif); + + io_post_aux_cqe(req->ctx, n->udata, n->res, n->cflags, true); + io_req_task_complete(req, locked); +} + +static void io_zctap_notify(struct io_kiocb *req, int bgid, int count) +{ + struct io_zctap_notif *n = io_kiocb_to_cmd(req, struct io_zctap_notif); + + n->udata = 0xface0000; /* XXX */ + n->res = (bgid << 16) | count; + n->cflags = IORING_CQE_F_BUFFER|IORING_CQE_F_NOTIF; + + req_ref_get(req); + req->io_task_work.func = io_zctap_post_notify; + io_req_task_work_add(req); +} + /* gets a user-supplied buffer from the fill queue * note: may drain N entries, but still have no usable buffers * XXX add retry limit? 
*/ @@ -159,6 +196,7 @@ static struct io_zctap_buf *io_zctap_get_buffer(struct io_zctap_ifq *ifq, if (!addr) return NULL; + ifr->taken++; pgid = addr & 0xffff; region_id = (addr >> 16) & 0xffff; if (region_id) { @@ -196,6 +234,7 @@ struct io_zctap_buf *io_zctap_get_buf(struct io_zctap_ifq *ifq, int refc) { struct io_zctap_buf *buf; struct ifq_region *ifr; + struct io_kiocb *req; int count; u16 pgid; @@ -218,6 +257,12 @@ struct io_zctap_buf *io_zctap_get_buf(struct io_zctap_ifq *ifq, int refc) count = io_zctap_get_buffers(ifq, ifr->cache, REGION_REFILL_COUNT); ifr->cache_count += count; + req = io_zctap_ifq_notifier(ifq); + if (ifr->taken && atomic_read(&req->refs) == 1) { + io_zctap_notify(req, ifq->fill_bgid, ifr->taken); + ifr->taken = 0; + } + if (ifr->cache_count) goto out; @@ -234,11 +279,17 @@ struct io_zctap_buf *io_zctap_get_buf(struct io_zctap_ifq *ifq, int refc) if (ifr->cache_count) goto out; + if (!ifr->empty && atomic_read(&req->refs) == 1) { + io_zctap_notify(req, ifq->fill_bgid, 0); + ifr->empty = true; + } + return NULL; out: pgid = ifr->cache[--ifr->cache_count]; buf = &ifr->buf[pgid]; + ifr->empty = false; WARN_RATELIMIT(atomic_read(&buf->refcount), "pgid:%d refc:%d cache_count:%d\n", @@ -445,6 +496,8 @@ int io_provide_ifq_region(struct io_zctap_ifq *ifq, u16 id) ifr->id = id; ifr->ifq = ifq; ifr->delay_end = 0; + ifr->taken = 0; + ifr->empty = false; ifr->cache_count = 0; err = io_zctap_map_region(ifr, imu); @@ -533,6 +586,11 @@ static struct io_zctap_ifq *io_zctap_ifq_alloc(struct io_ring_ctx *ctx) priv->ifq.ctx = ctx; priv->ifq.queue_id = -1; priv->ifq.uarg = &priv->uarg; + + priv->req.ctx = ctx; + priv->req.task = current; + io_req_set_refcount(&priv->req); + return &priv->ifq; }
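Tying the series together, the recycle step itself might look like the sketch below. Assumptions: the fill ring is an ordinary provided-buffer ring driven with liburing's helpers, and each entry's addr field carries the (region_id << 16) | pgid encoding that io_zctap_get_buffer() decodes; none of this userspace code appears in the series itself.

	#include <stdint.h>
	#include <liburing.h>

	/* return one iov's backing buffer to the ifq's fill ring */
	static void zctap_recycle(struct io_uring_buf_ring *br, unsigned int mask,
				  const struct io_uring_zctap_iov *ziov)
	{
		/* region id in bits 16-31, page id in bits 0-15 */
		uint64_t addr = ((uint64_t)ziov->bgid << 16) | ziov->bid;

		io_uring_buf_ring_add(br, (void *)(uintptr_t)addr, 4096,
				      ziov->bid, mask, 0);
	}

A batch of such entries would then be published with io_uring_buf_ring_advance(), prompted by the notification CQEs added in this patch.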