From patchwork Tue Jun 28 18:56:23 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12898735
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S. Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn,
 Jens Axboe, kernel-team@fb.com, Pavel Begunkov
Subject: [RFC net-next v3 01/29] ipv4: avoid partial copy for zc
Date: Tue, 28 Jun 2022 19:56:23 +0100
Message-Id: <31cdb30c440efc9d4cebe196f4dc78d0c1484210.1653992701.git.asml.silence@gmail.com>

Even when zerocopy transmission is requested and possible,
__ip_append_data() will still copy a small chunk of data just because it
allocated some extra linear space (e.g. 148 bytes). It wastes CPU cycles
on the copy and iter manipulations, and also misaligns potentially
aligned data. Avoid such copies. And as a bonus we can allocate a
smaller skb.

Signed-off-by: Pavel Begunkov
---
 net/ipv4/ip_output.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 00b4bf26fd93..581d1e233260 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -969,7 +969,6 @@ static int __ip_append_data(struct sock *sk,
         struct inet_sock *inet = inet_sk(sk);
         struct ubuf_info *uarg = NULL;
         struct sk_buff *skb;
-
         struct ip_options *opt = cork->opt;
         int hh_len;
         int exthdrlen;
@@ -977,6 +976,7 @@ static int __ip_append_data(struct sock *sk,
         int copy;
         int err;
         int offset = 0;
+        bool zc = false;
         unsigned int maxfraglen, fragheaderlen, maxnonfragsize;
         int csummode = CHECKSUM_NONE;
         struct rtable *rt = (struct rtable *)cork->dst;
@@ -1025,6 +1025,7 @@ static int __ip_append_data(struct sock *sk,
                 if (rt->dst.dev->features & NETIF_F_SG &&
                     csummode == CHECKSUM_PARTIAL) {
                         paged = true;
+                        zc = true;
                 } else {
                         uarg->zerocopy = 0;
                         skb_zcopy_set(skb, uarg, &extra_uref);
@@ -1091,9 +1092,12 @@ static int __ip_append_data(struct sock *sk,
                     (fraglen + alloc_extra < SKB_MAX_ALLOC ||
                      !(rt->dst.dev->features & NETIF_F_SG)))
                         alloclen = fraglen;
-                else {
+                else if (!zc) {
                         alloclen = min_t(int, fraglen, MAX_HEADER);
                         pagedlen = fraglen - alloclen;
+                } else {
+                        alloclen = fragheaderlen + transhdrlen;
+                        pagedlen = datalen - transhdrlen;
                 }
                 alloclen += alloc_extra;
From patchwork Tue Jun 28 18:56:24 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12898736
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S. Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn,
 Jens Axboe, kernel-team@fb.com, Pavel Begunkov
Subject: [RFC net-next v3 02/29] ipv6: avoid partial copy for zc
Date: Tue, 28 Jun 2022 19:56:24 +0100
Message-Id: <9806f2103d0c0512155957ee57ead379a11d93bd.1653992701.git.asml.silence@gmail.com>

Even when zerocopy transmission is requested and possible,
__ip6_append_data() will still copy a small chunk of data just because
it allocated some extra linear space (e.g. 128 bytes). It wastes CPU
cycles on the copy and iter manipulations, and also misaligns
potentially aligned data. Avoid such copies. And as a bonus we can
allocate a smaller skb.

Signed-off-by: Pavel Begunkov
---
 net/ipv6/ip6_output.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 4081b12a01ff..6103cd9066ff 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1464,6 +1464,7 @@ static int __ip6_append_data(struct sock *sk,
         int copy;
         int err;
         int offset = 0;
+        bool zc = false;
         u32 tskey = 0;
         struct rt6_info *rt = (struct rt6_info *)cork->dst;
         struct ipv6_txoptions *opt = v6_cork->opt;
@@ -1549,6 +1550,7 @@ static int __ip6_append_data(struct sock *sk,
                 if (rt->dst.dev->features & NETIF_F_SG &&
                     csummode == CHECKSUM_PARTIAL) {
                         paged = true;
+                        zc = true;
                 } else {
                         uarg->zerocopy = 0;
                         skb_zcopy_set(skb, uarg, &extra_uref);
@@ -1630,9 +1632,12 @@ static int __ip6_append_data(struct sock *sk,
                     (fraglen + alloc_extra < SKB_MAX_ALLOC ||
                      !(rt->dst.dev->features & NETIF_F_SG)))
                         alloclen = fraglen;
-                else {
+                else if (!zc) {
                         alloclen = min_t(int, fraglen, MAX_HEADER);
                         pagedlen = fraglen - alloclen;
+                } else {
+                        alloclen = fragheaderlen + transhdrlen;
+                        pagedlen = datalen - transhdrlen;
                 }
                 alloclen += alloc_extra;
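
To make the saving concrete, a worked example with assumed sizes (the
128 bytes above corresponds to MAX_HEADER under a typical config; exact
values vary by kernel configuration):

/* Assumed: transhdrlen = 8 (UDP header), fragheaderlen = 40 (IPv6
 * header), datalen = 1400, so fraglen = fragheaderlen + datalen = 1448,
 * and MAX_HEADER = 128.
 *
 * Old path:  alloclen = min(fraglen, MAX_HEADER)     = 128
 *            pagedlen = fraglen - alloclen           = 1320
 *            -> 128 - 40 - 8 = 80 payload bytes get memcpy'ed into the
 *               linear part even though the rest goes zerocopy.
 *
 * New path:  alloclen = fragheaderlen + transhdrlen  = 48
 *            pagedlen = datalen - transhdrlen        = 1392
 *            -> only headers in the linear part: no partial payload
 *               copy, and a 48-byte allocation instead of 128.
 */
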
From patchwork Tue Jun 28 18:56:25 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12898743
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S. Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn,
 Jens Axboe, kernel-team@fb.com, Pavel Begunkov
Subject: [RFC net-next v3 03/29] skbuff: add SKBFL_DONT_ORPHAN flag
Date: Tue, 28 Jun 2022 19:56:25 +0100
Message-Id: <1def15f02ef8bcfe9fd80a4accd3c5af57675179.1653992701.git.asml.silence@gmail.com>

We don't want to list every single ubuf_info callback in
skb_orphan_frags(), so add a flag controlling the behaviour.

Signed-off-by: Pavel Begunkov
---
 include/linux/skbuff.h | 8 +++++---
 net/core/skbuff.c      | 2 +-
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index da96f0d3e753..eead3527bdaf 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -686,10 +686,13 @@ enum {
          * charged to the kernel memory.
          */
         SKBFL_PURE_ZEROCOPY = BIT(2),
+
+        SKBFL_DONT_ORPHAN = BIT(3),
 };
 
 #define SKBFL_ZEROCOPY_FRAG     (SKBFL_ZEROCOPY_ENABLE | SKBFL_SHARED_FRAG)
-#define SKBFL_ALL_ZEROCOPY      (SKBFL_ZEROCOPY_FRAG | SKBFL_PURE_ZEROCOPY)
+#define SKBFL_ALL_ZEROCOPY      (SKBFL_ZEROCOPY_FRAG | SKBFL_PURE_ZEROCOPY | \
+                                 SKBFL_DONT_ORPHAN)
 
 /*
  * The callback notifies userspace to release buffers when skb DMA is done in
@@ -3175,8 +3178,7 @@ static inline int skb_orphan_frags(struct sk_buff *skb, gfp_t gfp_mask)
 {
         if (likely(!skb_zcopy(skb)))
                 return 0;
-        if (!skb_zcopy_is_nouarg(skb) &&
-            skb_uarg(skb)->callback == msg_zerocopy_callback)
+        if (skb_shinfo(skb)->flags & SKBFL_DONT_ORPHAN)
                 return 0;
         return skb_copy_ubufs(skb, gfp_mask);
 }
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5b3559cb1d82..5b35791064d1 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1193,7 +1193,7 @@ static struct ubuf_info *msg_zerocopy_alloc(struct sock *sk, size_t size)
         uarg->len = 1;
         uarg->bytelen = size;
         uarg->zerocopy = 1;
-        uarg->flags = SKBFL_ZEROCOPY_FRAG;
+        uarg->flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN;
         refcount_set(&uarg->refcnt, 1);
         sock_hold(sk);
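
A hedged sketch of how a ubuf_info provider other than MSG_ZEROCOPY
would use the new flag (names prefixed my_ are hypothetical; only the
flag and the fields it touches come from this patch):

static void my_zc_callback(struct sk_buff *skb, struct ubuf_info *uarg,
                           bool success)
{
        /* release caller-owned pages / post a completion here */
}

static void my_uarg_init(struct ubuf_info *uarg)
{
        uarg->callback = my_zc_callback;
        /* without SKBFL_DONT_ORPHAN, skb_orphan_frags() would spare
         * only msg_zerocopy_callback users and copy everyone else's
         * pages
         */
        uarg->flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN;
        refcount_set(&uarg->refcnt, 1);
}
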
From patchwork Tue Jun 28 18:56:26 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12898744
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S. Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn,
 Jens Axboe, kernel-team@fb.com, Pavel Begunkov
Subject: [RFC net-next v3 04/29] skbuff: carry external ubuf_info in msghdr
Date: Tue, 28 Jun 2022 19:56:26 +0100
Message-Id: <1634a40ad0cf05eeee8dd9e88d89f1558704bf2c.1653992701.git.asml.silence@gmail.com>

Make it possible for network in-kernel callers like io_uring to pass in
a custom ubuf_info by setting it in a new field of struct msghdr.
Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c          | 4 ++++
 include/linux/socket.h | 7 +++++++
 net/compat.c           | 2 ++
 net/socket.c           | 6 ++++++
 4 files changed, 19 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 8e75539fdc1d..6a57a5ae18fb 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6230,6 +6230,8 @@ static int io_send(struct io_kiocb *req, unsigned int issue_flags)
         msg.msg_control = NULL;
         msg.msg_controllen = 0;
         msg.msg_namelen = 0;
+        msg.msg_ubuf = NULL;
+        msg.msg_managed_data = false;
 
         flags = sr->msg_flags;
         if (issue_flags & IO_URING_F_NONBLOCK)
@@ -6500,6 +6502,8 @@ static int io_recv(struct io_kiocb *req, unsigned int issue_flags)
         msg.msg_flags = 0;
         msg.msg_controllen = 0;
         msg.msg_iocb = NULL;
+        msg.msg_ubuf = NULL;
+        msg.msg_managed_data = false;
 
         flags = sr->msg_flags;
         if (force_nonblock)
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 17311ad9f9af..ba84ee614d5a 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -66,9 +66,16 @@ struct msghdr {
         };
         bool            msg_control_is_user : 1;
         bool            msg_get_inq : 1;/* return INQ after receive */
+        /*
+         * The data pages are pinned and won't be released before ->msg_ubuf
+         * is released. ->msg_iter should point to a bvec and ->msg_ubuf has
+         * to be non-NULL.
+         */
+        bool            msg_managed_data : 1;
         unsigned int    msg_flags;      /* flags on received message */
         __kernel_size_t msg_controllen; /* ancillary data buffer length */
         struct kiocb    *msg_iocb;      /* ptr to iocb for async requests */
+        struct ubuf_info *msg_ubuf;
 };
 
 struct user_msghdr {
diff --git a/net/compat.c b/net/compat.c
index 210fc3b4d0d8..435846fa85e0 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -80,6 +80,8 @@ int __get_compat_msghdr(struct msghdr *kmsg,
                 return -EMSGSIZE;
 
         kmsg->msg_iocb = NULL;
+        kmsg->msg_ubuf = NULL;
+        kmsg->msg_managed_data = false;
         *ptr = msg.msg_iov;
         *len = msg.msg_iovlen;
         return 0;
diff --git a/net/socket.c b/net/socket.c
index 2bc8773d9dc5..0963a02b1472 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2106,6 +2106,8 @@ int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags,
         msg.msg_control = NULL;
         msg.msg_controllen = 0;
         msg.msg_namelen = 0;
+        msg.msg_ubuf = NULL;
+        msg.msg_managed_data = false;
         if (addr) {
                 err = move_addr_to_kernel(addr, addr_len, &address);
                 if (err < 0)
@@ -2171,6 +2173,8 @@ int __sys_recvfrom(int fd, void __user *ubuf, size_t size, unsigned int flags,
         msg.msg_namelen = 0;
         msg.msg_iocb = NULL;
         msg.msg_flags = 0;
+        msg.msg_ubuf = NULL;
+        msg.msg_managed_data = false;
         if (sock->file->f_flags & O_NONBLOCK)
                 flags |= MSG_DONTWAIT;
         err = sock_recvmsg(sock, &msg, flags);
@@ -2409,6 +2413,8 @@ int __copy_msghdr_from_user(struct msghdr *kmsg,
                 return -EMSGSIZE;
 
         kmsg->msg_iocb = NULL;
+        kmsg->msg_ubuf = NULL;
+        kmsg->msg_managed_data = false;
         *uiov = msg.msg_iov;
         *nsegs = msg.msg_iovlen;
         return 0;
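
What a consumer of the new field might look like, as a hedged sketch
(the sender function is hypothetical; only msg_ubuf and msg_managed_data
come from this patch):

static int my_sendmsg_zc(struct socket *sock, struct ubuf_info *uarg,
                         struct bio_vec *bvec, unsigned int nr, size_t len)
{
        struct msghdr msg = { .msg_flags = MSG_ZEROCOPY };

        msg.msg_ubuf = uarg;            /* completions go to uarg->callback */
        msg.msg_managed_data = false;   /* page refs still taken by the stack */
        iov_iter_bvec(&msg.msg_iter, WRITE, bvec, nr, len);
        return sock_sendmsg(sock, &msg);
}

msg_managed_data stays false here; a later patch in the series defines
its semantics and teaches protocols to honour it.
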
lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234142AbiF1TAX (ORCPT ); Tue, 28 Jun 2022 15:00:23 -0400 Received: from mail-ed1-x52d.google.com (mail-ed1-x52d.google.com [IPv6:2a00:1450:4864:20::52d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2E65BDF5B; Tue, 28 Jun 2022 11:59:59 -0700 (PDT) Received: by mail-ed1-x52d.google.com with SMTP id e40so18931024eda.2; Tue, 28 Jun 2022 11:59:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=mNAl7aHtaBa/r+OrmETRXcPT/IbWKen65cbhqfyTMuk=; b=XzGizqos91CZ9HG5E3teB1nM0p5c7P2MErQhLJ8zGkP2eOQmvAmUunoTYuOetkQ62u w924Z+ChBUqErhBkJR1TBK3i2PhiqeDLtnN3ynk7AJxNPWIo84QYgUROay5gWXjlNGit DLBDrNJj29XtDocUxRtEY9+LLBzdfxfs0Pxx89KSjGfjjPiKNpq8EiaVQV+r/MvxGMxx d5mIybokNy+uat9yBdbYYIhPz0QrCUb8cyJDW1FqMHbQ/51egrIUL25WMTTtb1CLB3/o w7x6a/+sgSZSu0Fwx2qU5Z26j6AshHGYiRWxrBMm+Rf2zBDaxc3JB3RNQ9g/20jVQbsm sQQQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=mNAl7aHtaBa/r+OrmETRXcPT/IbWKen65cbhqfyTMuk=; b=nvv30mqo6HepTc/8Uj4KCWnS9+ZCRfhz79n2UztOIQGKgk/FEX/XrOk+2oTMbeKhPP F9aMTnHEq0+dZ4xDlugyRyiXqZ6ok6jsjbFRN0MRw8AxQvwvCQpWa/uzExZNq44luxb0 8YPcM7OeLtZivtqOhOEhcwDnRorCoiukaYtfNV12WujkpogCO+cGfqck9wWFJOUcj3x0 8EdcUQndVjADYtm2CSvHoj2Ki5vmFC6Zd5tOYB6d7drnP87UI9Wy5ptW2a0U4/Rzm9Uu QPpEWmrHGjcK/GDR3z3+oHK3TTgzsH8glzO36MzjSsiZRV1LUlIdPM7HSNd+1UnlzoBc EXIA== X-Gm-Message-State: AJIora+gCiAO1diPGvoEdj3CfC2T+ary6Mnnc/EvNNuCvKOW/r67UbVU p7D0vI2Y2EXf/TcLnexK0xLM7gaJKlIqGQ== X-Google-Smtp-Source: AGRyM1ur8kgCpV+3tyFwUj7TgowpM1DSKOSUF/XBNgpKQ89Sb0McR6iQywly0THS8KpazclC+9QQWw== X-Received: by 2002:a05:6402:448c:b0:435:9dcc:b8a5 with SMTP id er12-20020a056402448c00b004359dccb8a5mr24677698edb.287.1656442798496; Tue, 28 Jun 2022 11:59:58 -0700 (PDT) Received: from 127.0.0.1localhost (188.28.125.106.threembb.co.uk. [188.28.125.106]) by smtp.gmail.com with ESMTPSA id t21-20020a05640203d500b0043573c59ea0sm9758451edw.90.2022.06.28.11.59.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Jun 2022 11:59:58 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: "David S . Miller" , Jakub Kicinski , Jonathan Lemon , Willem de Bruijn , Jens Axboe , kernel-team@fb.com, Pavel Begunkov Subject: [RFC net-next v3 05/29] net: bvec specific path in zerocopy_sg_from_iter Date: Tue, 28 Jun 2022 19:56:27 +0100 Message-Id: <5143111391e771dc97237e2a5e6a74223ef8f15f.1653992701.git.asml.silence@gmail.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org X-Patchwork-State: RFC Add an bvec specialised and optimised path in zerocopy_sg_from_iter. It'll be used later for {get,put}_page() optimisations. 
Signed-off-by: Pavel Begunkov
---
 net/core/datagram.c | 47 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/net/core/datagram.c b/net/core/datagram.c
index 50f4faeea76c..5237cb533bb4 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -613,11 +613,58 @@ int skb_copy_datagram_from_iter(struct sk_buff *skb, int offset,
 }
 EXPORT_SYMBOL(skb_copy_datagram_from_iter);
 
+static int __zerocopy_sg_from_bvec(struct sock *sk, struct sk_buff *skb,
+                                   struct iov_iter *from, size_t length)
+{
+        int frag = skb_shinfo(skb)->nr_frags;
+        int ret = 0;
+        struct bvec_iter bi;
+        ssize_t copied = 0;
+        unsigned long truesize = 0;
+
+        bi.bi_size = min(from->count, length);
+        bi.bi_bvec_done = from->iov_offset;
+        bi.bi_idx = 0;
+
+        while (bi.bi_size && frag < MAX_SKB_FRAGS) {
+                struct bio_vec v = mp_bvec_iter_bvec(from->bvec, bi);
+
+                copied += v.bv_len;
+                truesize += PAGE_ALIGN(v.bv_len + v.bv_offset);
+                get_page(v.bv_page);
+                skb_fill_page_desc(skb, frag++, v.bv_page, v.bv_offset, v.bv_len);
+                bvec_iter_advance_single(from->bvec, &bi, v.bv_len);
+        }
+        if (bi.bi_size)
+                ret = -EMSGSIZE;
+
+        from->bvec += bi.bi_idx;
+        from->nr_segs -= bi.bi_idx;
+        from->count = bi.bi_size;
+        from->iov_offset = bi.bi_bvec_done;
+
+        skb->data_len += copied;
+        skb->len += copied;
+        skb->truesize += truesize;
+
+        if (sk && sk->sk_type == SOCK_STREAM) {
+                sk_wmem_queued_add(sk, truesize);
+                if (!skb_zcopy_pure(skb))
+                        sk_mem_charge(sk, truesize);
+        } else {
+                refcount_add(truesize, &skb->sk->sk_wmem_alloc);
+        }
+        return ret;
+}
+
 int __zerocopy_sg_from_iter(struct sock *sk, struct sk_buff *skb,
                             struct iov_iter *from, size_t length)
 {
         int frag = skb_shinfo(skb)->nr_frags;
 
+        if (iov_iter_is_bvec(from))
+                return __zerocopy_sg_from_bvec(sk, skb, from, length);
+
         while (length && iov_iter_count(from)) {
                 struct page *pages[MAX_SKB_FRAGS];
                 struct page *last_head = NULL;
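
Feeding this path requires a bvec-backed iterator. A hedged sketch of
the caller-side glue (helper name and pinning scheme are illustrative,
not part of the patch):

/* Sketch: turn caller-pinned pages into the bio_vec array that
 * __zerocopy_sg_from_bvec() walks. Assumes the pages were pinned
 * beforehand, e.g. with pin_user_pages(); bounds checks trimmed.
 */
static void my_fill_bvec(struct bio_vec *bv, struct page **pages, int n,
                         size_t off, size_t len)
{
        int i;

        for (i = 0; i < n; i++) {
                size_t seg = min_t(size_t, len, PAGE_SIZE - off);

                bv[i].bv_page = pages[i];
                bv[i].bv_offset = off;
                bv[i].bv_len = seg;
                len -= seg;
                off = 0;        /* only the first page is offset */
        }
}

The iterator is then initialised with iov_iter_bvec() and handed to
sendmsg via msg->msg_iter, as in the previous patch.
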
From patchwork Tue Jun 28 18:56:28 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12898740
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S. Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn,
 Jens Axboe, kernel-team@fb.com, Pavel Begunkov
Subject: [RFC net-next v3 06/29] net: optimise bvec-based zc page referencing
Date: Tue, 28 Jun 2022 19:56:28 +0100
Message-Id: <597c7c76624997d8933740e74e9b82d026bcfeff.1653992701.git.asml.silence@gmail.com>

Some users like io_uring can pass a bvec iterator to send and can also
implement page pinning more efficiently. Add a ->msg_managed_data
toggle in msghdr. When set, data pages are "managed" by upper layers,
i.e. refcounted and pinned by the caller, and will live at least until
->msg_ubuf is released. msghdr has to have a non-NULL ->msg_ubuf, and
->msg_iter should point to a bvec.

Protocols supporting the feature will propagate it by setting
SKBFL_MANAGED_FRAG_REFS, which means that the skb doesn't hold refs to
its frag pages and relies only on ubuf_info lifetime guarantees. It
should only be used with zerocopy skbs with ubuf_info set.

It's allowed to convert skbs from managed to normal by calling
skb_zcopy_downgrade_managed(). The function will take all needed page
references and clear the flag.
Signed-off-by: Pavel Begunkov
---
 include/linux/skbuff.h | 25 +++++++++++++++++++++++--
 net/core/datagram.c    |  7 ++++---
 net/core/skbuff.c      | 29 +++++++++++++++++++++++++--
 3 files changed, 54 insertions(+), 7 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index eead3527bdaf..5407cfd9cb89 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -688,11 +688,16 @@ enum {
         SKBFL_PURE_ZEROCOPY = BIT(2),
 
         SKBFL_DONT_ORPHAN = BIT(3),
+
+        /* page references are managed by the ubuf_info, so it's safe to
+         * use frags only up until ubuf_info is released
+         */
+        SKBFL_MANAGED_FRAG_REFS = BIT(4),
 };
 
 #define SKBFL_ZEROCOPY_FRAG     (SKBFL_ZEROCOPY_ENABLE | SKBFL_SHARED_FRAG)
 #define SKBFL_ALL_ZEROCOPY      (SKBFL_ZEROCOPY_FRAG | SKBFL_PURE_ZEROCOPY | \
-                                 SKBFL_DONT_ORPHAN)
+                                 SKBFL_DONT_ORPHAN | SKBFL_MANAGED_FRAG_REFS)
 
 /*
  * The callback notifies userspace to release buffers when skb DMA is done in
@@ -1809,6 +1814,11 @@ static inline bool skb_zcopy_pure(const struct sk_buff *skb)
         return skb_shinfo(skb)->flags & SKBFL_PURE_ZEROCOPY;
 }
 
+static inline bool skb_zcopy_managed(const struct sk_buff *skb)
+{
+        return skb_shinfo(skb)->flags & SKBFL_MANAGED_FRAG_REFS;
+}
+
 static inline bool skb_pure_zcopy_same(const struct sk_buff *skb1,
                                        const struct sk_buff *skb2)
 {
@@ -1883,6 +1893,14 @@ static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy_success)
         }
 }
 
+void __skb_zcopy_downgrade_managed(struct sk_buff *skb);
+
+static inline void skb_zcopy_downgrade_managed(struct sk_buff *skb)
+{
+        if (unlikely(skb_zcopy_managed(skb)))
+                __skb_zcopy_downgrade_managed(skb);
+}
+
 static inline void skb_mark_not_on_list(struct sk_buff *skb)
 {
         skb->next = NULL;
@@ -3491,7 +3509,10 @@ static inline void __skb_frag_unref(skb_frag_t *frag, bool recycle)
  */
 static inline void skb_frag_unref(struct sk_buff *skb, int f)
 {
-        __skb_frag_unref(&skb_shinfo(skb)->frags[f], skb->pp_recycle);
+        struct skb_shared_info *shinfo = skb_shinfo(skb);
+
+        if (!skb_zcopy_managed(skb))
+                __skb_frag_unref(&shinfo->frags[f], skb->pp_recycle);
 }
 
 /**
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 5237cb533bb4..a93c05156f56 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -631,7 +631,6 @@ static int __zerocopy_sg_from_bvec(struct sock *sk, struct sk_buff *skb,
 
                 copied += v.bv_len;
                 truesize += PAGE_ALIGN(v.bv_len + v.bv_offset);
-                get_page(v.bv_page);
                 skb_fill_page_desc(skb, frag++, v.bv_page, v.bv_offset, v.bv_len);
                 bvec_iter_advance_single(from->bvec, &bi, v.bv_len);
         }
@@ -660,11 +659,13 @@ static int __zerocopy_sg_from_bvec(struct sock *sk, struct sk_buff *skb,
 int __zerocopy_sg_from_iter(struct sock *sk, struct sk_buff *skb,
                             struct iov_iter *from, size_t length)
 {
-        int frag = skb_shinfo(skb)->nr_frags;
+        int frag;
 
-        if (iov_iter_is_bvec(from))
+        if (skb_zcopy_managed(skb))
                 return __zerocopy_sg_from_bvec(sk, skb, from, length);
 
+        frag = skb_shinfo(skb)->nr_frags;
+
         while (length && iov_iter_count(from)) {
                 struct page *pages[MAX_SKB_FRAGS];
                 struct page *last_head = NULL;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5b35791064d1..71870def129c 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -666,11 +666,18 @@ static void skb_release_data(struct sk_buff *skb)
                               &shinfo->dataref))
                 goto exit;
 
-        skb_zcopy_clear(skb, true);
+        if (skb_zcopy(skb)) {
+                bool skip_unref = shinfo->flags & SKBFL_MANAGED_FRAG_REFS;
+
+                skb_zcopy_clear(skb, true);
+                if (skip_unref)
+                        goto free_head;
+        }
 
         for (i = 0; i < shinfo->nr_frags; i++)
                 __skb_frag_unref(&shinfo->frags[i], skb->pp_recycle);
 
+free_head:
         if (shinfo->frag_list)
                 kfree_skb_list(shinfo->frag_list);
 
@@ -895,7 +902,10 @@ EXPORT_SYMBOL(skb_dump);
  */
 void skb_tx_error(struct sk_buff *skb)
 {
-        skb_zcopy_clear(skb, true);
+        if (skb) {
+                skb_zcopy_downgrade_managed(skb);
+                skb_zcopy_clear(skb, true);
+        }
 }
 EXPORT_SYMBOL(skb_tx_error);
 
@@ -1371,6 +1381,16 @@ int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb,
 }
 EXPORT_SYMBOL_GPL(skb_zerocopy_iter_stream);
 
+void __skb_zcopy_downgrade_managed(struct sk_buff *skb)
+{
+        int i;
+
+        skb_shinfo(skb)->flags &= ~SKBFL_MANAGED_FRAG_REFS;
+        for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
+                skb_frag_ref(skb, i);
+}
+EXPORT_SYMBOL_GPL(__skb_zcopy_downgrade_managed);
+
 static int skb_zerocopy_clone(struct sk_buff *nskb, struct sk_buff *orig,
                               gfp_t gfp_mask)
 {
@@ -1688,6 +1708,8 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 
         BUG_ON(skb_shared(skb));
 
+        skb_zcopy_downgrade_managed(skb);
+
         size = SKB_DATA_ALIGN(size);
 
         if (skb_pfmemalloc(skb))
@@ -3484,6 +3506,8 @@ void skb_split(struct sk_buff *skb, struct sk_buff *skb1, const u32 len)
         int pos = skb_headlen(skb);
         const int zc_flags = SKBFL_SHARED_FRAG | SKBFL_PURE_ZEROCOPY;
 
+        skb_zcopy_downgrade_managed(skb);
+
         skb_shinfo(skb1)->flags |= skb_shinfo(skb)->flags & zc_flags;
         skb_zerocopy_clone(skb1, skb, 0);
         if (len < pos)  /* Split line is inside header. */
@@ -3837,6 +3861,7 @@ int skb_append_pagefrags(struct sk_buff *skb, struct page *page,
         if (skb_can_coalesce(skb, i, page, offset)) {
                 skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], size);
         } else if (i < MAX_SKB_FRAGS) {
+                skb_zcopy_downgrade_managed(skb);
                 get_page(page);
                 skb_fill_page_desc(skb, i, page, offset, size);
         } else {
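
The conversion rule those call sites encode, restated as a hedged
sketch (the surrounding function is hypothetical; only the helper comes
from this patch):

/* Sketch: any path about to give frag pages a lifetime independent of
 * the ubuf_info must downgrade first. For normal skbs this is a no-op;
 * for managed ones it takes one page ref per frag and clears
 * SKBFL_MANAGED_FRAG_REFS, after which frags may be used as usual.
 */
static void my_prepare_for_frag_sharing(struct sk_buff *skb)
{
        skb_zcopy_downgrade_managed(skb);
        /* safe now: the skb owns its frag references */
}

This mirrors pskb_expand_head(), skb_split() and skb_append_pagefrags()
above, all of which may end up sharing or re-owning frags.
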
From patchwork Tue Jun 28 18:56:29 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12898741
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S. Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn,
 Jens Axboe, kernel-team@fb.com, Pavel Begunkov
Subject: [RFC net-next v3 07/29] net: don't track pfmemalloc for managed frags
Date: Tue, 28 Jun 2022 19:56:29 +0100
Message-Id: <5271342adcb77d39d148906bbbe215c1bbca3907.1653992701.git.asml.silence@gmail.com>

Managed pages contain pinned userspace pages and are controlled by
upper layers; there is no need to track skb->pfmemalloc for them.

Signed-off-by: Pavel Begunkov
---
 include/linux/skbuff.h | 28 +++++++++++++++++-----------
 net/core/datagram.c    |  7 +++++--
 2 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 5407cfd9cb89..6cca146be1f4 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2549,6 +2549,22 @@ static inline unsigned int skb_pagelen(const struct sk_buff *skb)
         return skb_headlen(skb) + __skb_pagelen(skb);
 }
 
+static inline void __skb_fill_page_desc_noacc(struct skb_shared_info *shinfo,
+                                              int i, struct page *page,
+                                              int off, int size)
+{
+        skb_frag_t *frag = &shinfo->frags[i];
+
+        /*
+         * Propagate page pfmemalloc to the skb if we can. The problem is
+         * that not all callers have unique ownership of the page but rely
+         * on page_is_pfmemalloc doing the right thing(tm).
+         */
+        frag->bv_page = page;
+        frag->bv_offset = off;
+        skb_frag_size_set(frag, size);
+}
+
 /**
  * __skb_fill_page_desc - initialise a paged fragment in an skb
  * @skb: buffer containing fragment to be initialised
@@ -2565,17 +2581,7 @@ static inline unsigned int skb_pagelen(const struct sk_buff *skb)
 static inline void __skb_fill_page_desc(struct sk_buff *skb, int i,
                                         struct page *page, int off, int size)
 {
-        skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
-
-        /*
-         * Propagate page pfmemalloc to the skb if we can. The problem is
-         * that not all callers have unique ownership of the page but rely
-         * on page_is_pfmemalloc doing the right thing(tm).
-         */
-        frag->bv_page = page;
-        frag->bv_offset = off;
-        skb_frag_size_set(frag, size);
-
+        __skb_fill_page_desc_noacc(skb_shinfo(skb), i, page, off, size);
         page = compound_head(page);
         if (page_is_pfmemalloc(page))
                 skb->pfmemalloc = true;
diff --git a/net/core/datagram.c b/net/core/datagram.c
index a93c05156f56..3c913a6342ad 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -616,7 +616,8 @@ EXPORT_SYMBOL(skb_copy_datagram_from_iter);
 static int __zerocopy_sg_from_bvec(struct sock *sk, struct sk_buff *skb,
                                    struct iov_iter *from, size_t length)
 {
-        int frag = skb_shinfo(skb)->nr_frags;
+        struct skb_shared_info *shinfo = skb_shinfo(skb);
+        int frag = shinfo->nr_frags;
         int ret = 0;
         struct bvec_iter bi;
         ssize_t copied = 0;
@@ -631,12 +632,14 @@ static int __zerocopy_sg_from_bvec(struct sock *sk, struct sk_buff *skb,
 
                 copied += v.bv_len;
                 truesize += PAGE_ALIGN(v.bv_len + v.bv_offset);
-                skb_fill_page_desc(skb, frag++, v.bv_page, v.bv_offset, v.bv_len);
+                __skb_fill_page_desc_noacc(shinfo, frag++, v.bv_page,
+                                           v.bv_offset, v.bv_len);
                 bvec_iter_advance_single(from->bvec, &bi, v.bv_len);
         }
         if (bi.bi_size)
                 ret = -EMSGSIZE;
 
+        shinfo->nr_frags = frag;
         from->bvec += bi.bi_idx;
         from->nr_segs -= bi.bi_idx;
         from->count = bi.bi_size;
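
Usage contrast between the two fillers, as a sketch (skb, page, off,
len and i are assumed to be set up by the caller):

/* Normal pages may come from pfmemalloc emergency reserves, so the
 * accounting variant must be used:
 */
__skb_fill_page_desc(skb, i, page, off, len);

/* Pinned userspace pages never come from the emergency reserves, so
 * the managed path can skip the check; note the caller updates
 * nr_frags itself, as __zerocopy_sg_from_bvec() now does:
 */
__skb_fill_page_desc_noacc(skb_shinfo(skb), i, page, off, len);
skb_shinfo(skb)->nr_frags = i + 1;
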
From patchwork Tue Jun 28 18:56:30 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12898739
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S. Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn,
 Jens Axboe, kernel-team@fb.com, Pavel Begunkov
Subject: [RFC net-next v3 08/29] skbuff: don't mix ubuf_info of different types
Date: Tue, 28 Jun 2022 19:56:30 +0100
Message-Id: <1e6515412ce815241c1d950f5d13f5b300e9edfb.1653992701.git.asml.silence@gmail.com>

We should not append MSG_ZEROCOPY requests to an skbuff whose ubuf_info
is not of the MSG_ZEROCOPY type; they are not compatible.

Signed-off-by: Pavel Begunkov
---
 net/core/skbuff.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 71870def129c..7e6fcb3cd817 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1222,6 +1222,10 @@ struct ubuf_info *msg_zerocopy_realloc(struct sock *sk, size_t size,
                 const u32 byte_limit = 1 << 19;         /* limit to a few TSO */
                 u32 bytelen, next;
 
+                /* there might be non MSG_ZEROCOPY users */
+                if (uarg->callback != msg_zerocopy_callback)
+                        return NULL;
+
                 /* realloc only when socket is locked (TCP, UDP cork),
                  * so uarg->len and sk_zckey access is serialized
                  */
From patchwork Tue Jun 28 18:56:31 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12898738
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S. Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn,
 Jens Axboe, kernel-team@fb.com, Pavel Begunkov
Subject: [RFC net-next v3 09/29] ipv4/udp: support zc with managed data
Date: Tue, 28 Jun 2022 19:56:31 +0100
Message-Id: <1904009c2af0197b922e413254ef2ff2c527f743.1653992701.git.asml.silence@gmail.com>

Teach ipv4/udp about managed data. Make it recognise and use
msg->msg_ubuf, and also set/propagate SKBFL_MANAGED_FRAG_REFS down to
skb_zerocopy_iter_dgram().
Signed-off-by: Pavel Begunkov
---
 net/ipv4/ip_output.c | 57 +++++++++++++++++++++++++++++++++-----------
 1 file changed, 43 insertions(+), 14 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 581d1e233260..3fd1bf675598 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1017,18 +1017,35 @@ static int __ip_append_data(struct sock *sk,
             (!exthdrlen || (rt->dst.dev->features & NETIF_F_HW_ESP_TX_CSUM)))
                 csummode = CHECKSUM_PARTIAL;
 
-        if (flags & MSG_ZEROCOPY && length && sock_flag(sk, SOCK_ZEROCOPY)) {
-                uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb));
-                if (!uarg)
-                        return -ENOBUFS;
-                extra_uref = !skb_zcopy(skb);   /* only ref on new uarg */
-                if (rt->dst.dev->features & NETIF_F_SG &&
-                    csummode == CHECKSUM_PARTIAL) {
-                        paged = true;
-                        zc = true;
-                } else {
-                        uarg->zerocopy = 0;
-                        skb_zcopy_set(skb, uarg, &extra_uref);
+        if ((flags & MSG_ZEROCOPY) && length) {
+                struct msghdr *msg = from;
+
+                if (getfrag == ip_generic_getfrag && msg->msg_ubuf) {
+                        if (skb_zcopy(skb) && msg->msg_ubuf != skb_zcopy(skb))
+                                return -EINVAL;
+
+                        /* Leave uarg NULL if can't zerocopy, callers should
+                         * be able to handle it.
+                         */
+                        if ((rt->dst.dev->features & NETIF_F_SG) &&
+                            csummode == CHECKSUM_PARTIAL) {
+                                paged = true;
+                                zc = true;
+                                uarg = msg->msg_ubuf;
+                        }
+                } else if (sock_flag(sk, SOCK_ZEROCOPY)) {
+                        uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb));
+                        if (!uarg)
+                                return -ENOBUFS;
+                        extra_uref = !skb_zcopy(skb);   /* only ref on new uarg */
+                        if (rt->dst.dev->features & NETIF_F_SG &&
+                            csummode == CHECKSUM_PARTIAL) {
+                                paged = true;
+                                zc = true;
+                        } else {
+                                uarg->zerocopy = 0;
+                                skb_zcopy_set(skb, uarg, &extra_uref);
+                        }
                 }
         }
 
@@ -1192,13 +1209,14 @@ static int __ip_append_data(struct sock *sk,
                                 err = -EFAULT;
                                 goto error;
                         }
-                } else if (!uarg || !uarg->zerocopy) {
+                } else if (!zc) {
                         int i = skb_shinfo(skb)->nr_frags;
 
                         err = -ENOMEM;
                         if (!sk_page_frag_refill(sk, pfrag))
                                 goto error;
 
+                        skb_zcopy_downgrade_managed(skb);
                         if (!skb_can_coalesce(skb, i, pfrag->page,
                                               pfrag->offset)) {
                                 err = -EMSGSIZE;
@@ -1223,7 +1241,18 @@ static int __ip_append_data(struct sock *sk,
                         skb->truesize += copy;
                         wmem_alloc_delta += copy;
                 } else {
-                        err = skb_zerocopy_iter_dgram(skb, from, copy);
+                        struct msghdr *msg = from;
+
+                        if (!skb_shinfo(skb)->nr_frags) {
+                                if (msg->msg_managed_data)
+                                        skb_shinfo(skb)->flags |= SKBFL_MANAGED_FRAG_REFS;
+                        } else {
+                                /* appending, don't mix managed and unmanaged */
+                                if (!msg->msg_managed_data)
+                                        skb_zcopy_downgrade_managed(skb);
+                        }
+
+                        err = skb_zerocopy_iter_dgram(skb, msg, copy);
                         if (err < 0)
                                 goto error;
                 }
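
On the userspace side, completions for zerocopy UDP sends surface on
the socket error queue. A minimal reader following the documented
MSG_ZEROCOPY conventions (IPv4 shown; error handling trimmed):

#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/errqueue.h>

/* Sketch: reap one zerocopy notification; [ee_info, ee_data] is the
 * range of completed sends, counted per MSG_ZEROCOPY call.
 */
static int read_zc_notification(int fd)
{
        char control[128];
        struct msghdr msg = { .msg_control = control,
                              .msg_controllen = sizeof(control) };
        struct sock_extended_err *serr;
        struct cmsghdr *cm;

        if (recvmsg(fd, &msg, MSG_ERRQUEUE) < 0)
                return -1;
        for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
                if (cm->cmsg_level != SOL_IP || cm->cmsg_type != IP_RECVERR)
                        continue;
                serr = (struct sock_extended_err *)CMSG_DATA(cm);
                if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY)
                        continue;
                /* sends ee_info..ee_data completed; a set
                 * SO_EE_CODE_ZEROCOPY_COPIED bit in ee_code means the
                 * kernel fell back to copying for this range */
        }
        return 0;
}
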
From patchwork Tue Jun 28 18:56:32 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12898737
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S. Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn,
 Jens Axboe, kernel-team@fb.com, Pavel Begunkov
Subject: [RFC net-next v3 10/29] ipv6/udp: support zc with managed data
Date: Tue, 28 Jun 2022 19:56:32 +0100
Message-Id: <4ac277fa467025f164b67a76dfb8e12ff6e8ee7d.1653992701.git.asml.silence@gmail.com>

Just as with ipv4/udp, make ipv6/udp take advantage of managed data and
propagate SKBFL_MANAGED_FRAG_REFS down to skb_zerocopy_iter_dgram().
Signed-off-by: Pavel Begunkov
---
 net/ipv6/ip6_output.c | 57 ++++++++++++++++++++++++++++++++-----------
 1 file changed, 43 insertions(+), 14 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 6103cd9066ff..f4138ce6eda3 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1542,18 +1542,35 @@ static int __ip6_append_data(struct sock *sk,
     rt->dst.dev->features & (NETIF_F_IPV6_CSUM | NETIF_F_HW_CSUM))
   csummode = CHECKSUM_PARTIAL;
 
- if (flags & MSG_ZEROCOPY && length && sock_flag(sk, SOCK_ZEROCOPY)) {
-  uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb));
-  if (!uarg)
-   return -ENOBUFS;
-  extra_uref = !skb_zcopy(skb); /* only ref on new uarg */
-  if (rt->dst.dev->features & NETIF_F_SG &&
-      csummode == CHECKSUM_PARTIAL) {
-   paged = true;
-   zc = true;
-  } else {
-   uarg->zerocopy = 0;
-   skb_zcopy_set(skb, uarg, &extra_uref);
+ if ((flags & MSG_ZEROCOPY) && length) {
+  struct msghdr *msg = from;
+
+  if (getfrag == ip_generic_getfrag && msg->msg_ubuf) {
+   if (skb_zcopy(skb) && msg->msg_ubuf != skb_zcopy(skb))
+    return -EINVAL;
+
+   /* Leave uarg NULL if can't zerocopy, callers should
+    * be able to handle it.
+    */
+   if ((rt->dst.dev->features & NETIF_F_SG) &&
+       csummode == CHECKSUM_PARTIAL) {
+    paged = true;
+    zc = true;
+    uarg = msg->msg_ubuf;
+   }
+  } else if (sock_flag(sk, SOCK_ZEROCOPY)) {
+   uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb));
+   if (!uarg)
+    return -ENOBUFS;
+   extra_uref = !skb_zcopy(skb); /* only ref on new uarg */
+   if (rt->dst.dev->features & NETIF_F_SG &&
+       csummode == CHECKSUM_PARTIAL) {
+    paged = true;
+    zc = true;
+   } else {
+    uarg->zerocopy = 0;
+    skb_zcopy_set(skb, uarg, &extra_uref);
+   }
   }
  }
 
@@ -1747,13 +1764,14 @@ static int __ip6_append_data(struct sock *sk,
     err = -EFAULT;
     goto error;
    }
-  } else if (!uarg || !uarg->zerocopy) {
+  } else if (!zc) {
    int i = skb_shinfo(skb)->nr_frags;
 
    err = -ENOMEM;
    if (!sk_page_frag_refill(sk, pfrag))
    goto error;
 
+   skb_zcopy_downgrade_managed(skb);
    if (!skb_can_coalesce(skb, i, pfrag->page,
            pfrag->offset)) {
     err = -EMSGSIZE;
@@ -1778,7 +1796,18 @@ static int __ip6_append_data(struct sock *sk,
    skb->truesize += copy;
    wmem_alloc_delta += copy;
   } else {
-   err = skb_zerocopy_iter_dgram(skb, from, copy);
+   struct msghdr *msg = from;
+
+   if (!skb_shinfo(skb)->nr_frags) {
+    if (msg->msg_managed_data)
+     skb_shinfo(skb)->flags |= SKBFL_MANAGED_FRAG_REFS;
+   } else {
+    /* appending, don't mix managed and unmanaged */
+    if (!msg->msg_managed_data)
+     skb_zcopy_downgrade_managed(skb);
+   }
+
+   err = skb_zerocopy_iter_dgram(skb, msg, copy);
    if (err < 0)
     goto error;
   }
From patchwork Tue Jun 28 18:56:33 2022
From: Pavel Begunkov
Subject: [RFC net-next v3 11/29] tcp: support zc with managed data
Date: Tue, 28 Jun 2022 19:56:33 +0100
Message-Id: <2d0c627c125cf1019096e1db04264e1cb6149dec.1653992701.git.asml.silence@gmail.com>

Also make TCP use managed data and propagate SKBFL_MANAGED_FRAG_REFS to
optimise frag page referencing.
Signed-off-by: Pavel Begunkov
---
 net/ipv4/tcp.c | 51 +++++++++++++++++++++++++++++++++-----------------
 1 file changed, 34 insertions(+), 17 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 9984d23a7f3e..832c1afcdbe7 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1202,17 +1202,23 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 
  flags = msg->msg_flags;
 
- if (flags & MSG_ZEROCOPY && size && sock_flag(sk, SOCK_ZEROCOPY)) {
+ if ((flags & MSG_ZEROCOPY) && size) {
   skb = tcp_write_queue_tail(sk);
-  uarg = msg_zerocopy_realloc(sk, size, skb_zcopy(skb));
-  if (!uarg) {
-   err = -ENOBUFS;
-   goto out_err;
-  }
-  zc = sk->sk_route_caps & NETIF_F_SG;
-  if (!zc)
-   uarg->zerocopy = 0;
+  if (msg->msg_ubuf) {
+   uarg = msg->msg_ubuf;
+   net_zcopy_get(uarg);
+   zc = sk->sk_route_caps & NETIF_F_SG;
+  } else if (sock_flag(sk, SOCK_ZEROCOPY)) {
+   uarg = msg_zerocopy_realloc(sk, size, skb_zcopy(skb));
+   if (!uarg) {
+    err = -ENOBUFS;
+    goto out_err;
+   }
+   zc = sk->sk_route_caps & NETIF_F_SG;
+   if (!zc)
+    uarg->zerocopy = 0;
+  }
  }
 
  if (unlikely(flags & MSG_FASTOPEN || inet_sk(sk)->defer_connect) &&
@@ -1335,8 +1341,13 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 
    copy = min_t(int, copy, pfrag->size - pfrag->offset);
 
-   if (tcp_downgrade_zcopy_pure(sk, skb) ||
-       !sk_wmem_schedule(sk, copy))
+   if (unlikely(skb_zcopy_pure(skb) || skb_zcopy_managed(skb))) {
+    if (tcp_downgrade_zcopy_pure(sk, skb))
+     goto wait_for_space;
+    skb_zcopy_downgrade_managed(skb);
+   }
+
+   if (!sk_wmem_schedule(sk, copy))
     goto wait_for_space;
 
    err = skb_copy_to_page_nocache(sk, &msg->msg_iter, skb,
@@ -1357,14 +1368,20 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
    pfrag->offset += copy;
   } else {
    /* First append to a fragless skb builds initial
-    * pure zerocopy skb
+    * zerocopy skb
     */
-   if (!skb->len)
+   if (!skb->len) {
+    if (msg->msg_managed_data)
+     skb_shinfo(skb)->flags |= SKBFL_MANAGED_FRAG_REFS;
     skb_shinfo(skb)->flags |= SKBFL_PURE_ZEROCOPY;
-
-   if (!skb_zcopy_pure(skb)) {
-    if (!sk_wmem_schedule(sk, copy))
-     goto wait_for_space;
+   } else {
+    /* appending, don't mix managed and unmanaged */
+    if (!msg->msg_managed_data)
+     skb_zcopy_downgrade_managed(skb);
+    if (!skb_zcopy_pure(skb)) {
+     if (!sk_wmem_schedule(sk, copy))
+      goto wait_for_space;
+    }
   }
 
   err = skb_zerocopy_iter_stream(sk, skb, msg, copy, uarg);
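For orientation, the invariant the TCP path above maintains can be summarised
as follows (a hedged sketch; SKBFL_MANAGED_FRAG_REFS and the helpers come from
earlier patches in this series, and the comments describe their use here):

    /* Illustrative only: what "managed" means for an skb built above.
     *
     * SKBFL_MANAGED_FRAG_REFS set   => frag pages are kept alive by the
     *                                  caller's ubuf_info; the skb takes
     *                                  no page refs of its own.
     * skb_zcopy_downgrade_managed() => take normal page refs now and
     *                                  clear the flag, so copy/append
     *                                  paths can safely mix in pages the
     *                                  caller does not manage.
     */
    if (msg->msg_managed_data)
        skb_shinfo(skb)->flags |= SKBFL_MANAGED_FRAG_REFS;
    else
        skb_zcopy_downgrade_managed(skb);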
From patchwork Tue Jun 28 18:56:34 2022
From: Pavel Begunkov
Subject: [RFC net-next v3 12/29] tcp: kill extra io_uring's uarg refcounting
Date: Tue, 28 Jun 2022 19:56:34 +0100

io_uring guarantees that the passed-in uarg stays alive until we return
from sendmsg, so there is no need to temporarily refcount-pin it in
tcp_sendmsg_locked().
Signed-off-by: Pavel Begunkov
---
 net/ipv4/tcp.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 832c1afcdbe7..3482c934eec8 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1207,7 +1207,6 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 
   if (msg->msg_ubuf) {
    uarg = msg->msg_ubuf;
-   net_zcopy_get(uarg);
    zc = sk->sk_route_caps & NETIF_F_SG;
   } else if (sock_flag(sk, SOCK_ZEROCOPY)) {
    uarg = msg_zerocopy_realloc(sk, size, skb_zcopy(skb));
@@ -1437,7 +1436,8 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
   tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
  }
 out_nopush:
- net_zcopy_put(uarg);
+ if (uarg && !msg->msg_ubuf)
+  net_zcopy_put(uarg);
  return copied + copied_syn;
 
 do_error:
@@ -1446,7 +1446,8 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
  if (copied + copied_syn)
   goto out;
 out_err:
- net_zcopy_put_abort(uarg, true);
+ if (uarg && !msg->msg_ubuf)
+  net_zcopy_put_abort(uarg, true);
  err = sk_stream_error(sk, flags, err);
  /* make sure we wake any epoll edge trigger waiter */
  if (unlikely(tcp_rtx_and_write_queues_empty(sk) && err == -EAGAIN)) {
From patchwork Tue Jun 28 18:56:35 2022
From: Pavel Begunkov
Subject: [RFC net-next v3 13/29] net: let callers provide extra ubuf_info refs
Date: Tue, 28 Jun 2022 19:56:35 +0100

Subsystems providing external ubufs to the net layer, i.e. ->msg_ubuf,
might have a better way to refcount it. For instance, io_uring can
amortise ref allocation. Add a way to pass one extra ref to ->msg_ubuf
into the network stack by setting the struct msghdr::msg_ubuf_ref bit.
Whoever consumes the ref should clear the flag. If it is not consumed,
it's the responsibility of the caller to put it. Make
__ip{,6}_append_data() use it.

Signed-off-by: Pavel Begunkov
---
 include/linux/socket.h | 1 +
 net/ipv4/ip_output.c   | 3 +++
 net/ipv6/ip6_output.c  | 3 +++
 3 files changed, 7 insertions(+)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index ba84ee614d5a..ae869dee82de 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -72,6 +72,7 @@ struct msghdr {
   * to be non-NULL.
   */
  bool msg_managed_data : 1;
+ bool msg_ubuf_ref : 1;
  unsigned int msg_flags; /* flags on received message */
  __kernel_size_t msg_controllen; /* ancillary data buffer length */
  struct kiocb *msg_iocb; /* ptr to iocb for async requests */
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 3fd1bf675598..d73ec0a73bd2 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1032,6 +1032,9 @@ static int __ip_append_data(struct sock *sk,
     paged = true;
     zc = true;
     uarg = msg->msg_ubuf;
+    /* we might've been given a free ref */
+    extra_uref = msg->msg_ubuf_ref;
+    msg->msg_ubuf_ref = false;
    }
   } else if (sock_flag(sk, SOCK_ZEROCOPY)) {
    uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb));
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index f4138ce6eda3..90bbaab21dbc 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1557,6 +1557,9 @@ static int __ip6_append_data(struct sock *sk,
     paged = true;
     zc = true;
     uarg = msg->msg_ubuf;
+    /* we might've been given a free ref */
+    extra_uref = msg->msg_ubuf_ref;
+    msg->msg_ubuf_ref = false;
    }
   } else if (sock_flag(sk, SOCK_ZEROCOPY)) {
    uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb));
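A sketch of the intended caller side of msg_ubuf_ref (hypothetical code, not
part of this patch): a subsystem that already holds an extra ubuf_info
reference can donate it and check afterwards whether the stack consumed it.

    /* Illustrative only: donate one already-taken ubuf_info reference
     * to the network stack.
     */
    struct msghdr msg = { .msg_flags = MSG_ZEROCOPY };

    msg.msg_ubuf = uarg;        /* caller-owned struct ubuf_info */
    msg.msg_ubuf_ref = true;    /* one extra ref is free to consume */

    ret = sock_sendmsg(sock, &msg);

    if (msg.msg_ubuf_ref) {
        /* the stack didn't take the donated ref (e.g. no zerocopy
         * path); the caller must put it itself
         */
        net_zcopy_put(uarg);
    }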
From patchwork Tue Jun 28 18:56:36 2022
From: Pavel Begunkov
Subject: [RFC net-next v3 14/29] io_uring: opcode independent fixed buf import
Date: Tue, 28 Jun 2022 19:56:36 +0100

Extract an opcode-independent helper from io_import_fixed for
initialising an iov_iter with a fixed buffer.

Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 6a57a5ae18fb..e47629adf3f7 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -3728,11 +3728,11 @@ static void kiocb_done(struct io_kiocb *req, ssize_t ret,
  }
 }
 
-static int __io_import_fixed(struct io_kiocb *req, int rw, struct iov_iter *iter,
-        struct io_mapped_ubuf *imu)
+static int __io_import_fixed(int rw, struct iov_iter *iter,
+        struct io_mapped_ubuf *imu,
+        u64 buf_addr, size_t len)
 {
- size_t len = req->rw.len;
- u64 buf_end, buf_addr = req->rw.addr;
+ u64 buf_end;
  size_t offset;
 
  if (unlikely(check_add_overflow(buf_addr, (u64)len, &buf_end)))
@@ -3802,7 +3802,7 @@ static int io_import_fixed(struct io_kiocb *req, int rw, struct iov_iter *iter,
   imu = READ_ONCE(ctx->user_bufs[index]);
   req->imu = imu;
  }
- return __io_import_fixed(req, rw, iter, imu);
+ return __io_import_fixed(rw, iter, imu, req->rw.addr, req->rw.len);
 }
 
 static int io_buffer_add_list(struct io_ring_ctx *ctx,
From patchwork Tue Jun 28 18:56:37 2022
From: Pavel Begunkov
Subject: [RFC net-next v3 15/29] io_uring: add zc notification infrastructure
Date: Tue, 28 Jun 2022 19:56:37 +0100
Message-Id: <4b2a76541e91194a146788bcd401f438f5b4b45d.1653992701.git.asml.silence@gmail.com>

Add the internal part of send zerocopy notifications. There are two main
structures. The first is struct io_notif, which embeds a struct ubuf_info
and maps 1:1 to it; io_uring will bind a number of zerocopy send requests
to it and ask to complete (aka flush) it. When flushed, and once all
attached requests and skbs complete, it'll generate one and only one CQE.
Notifiers are intended to be passed into the network layer as struct
msghdr::msg_ubuf.

The second concept is notification slots. The userspace will be able to
register an array of slots and subsequently address them by index. Slots
are independent of each other. Each slot can have only one notifier at a
time (called the active notifier) but many notifiers over its lifetime.
While active, a notifier is not going to post any completions, but
userspace can attach requests to it by specifying the corresponding slot
while issuing send zc requests.
Eventually, the userspace will want to "flush" the notifier, losing any
way to attach new requests to it; however, it can then use the next,
automatically added notifier of this slot, or of any other slot. When the
network layer is done with all enqueued skbs attached to a notifier and
no longer needs the user data specified in them, the flushed notifier
will post a CQE.

Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c | 156 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 156 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index e47629adf3f7..7d058deb5f73 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -371,6 +371,43 @@ struct io_ev_fd {
  struct rcu_head rcu;
 };
 
+#define IO_NOTIF_MAX_SLOTS (1U << 10)
+
+struct io_notif {
+ struct ubuf_info uarg;
+ struct io_ring_ctx *ctx;
+
+ /* cqe->user_data, io_notif_slot::tag if not overridden */
+ u64 tag;
+ /* see struct io_notif_slot::seq */
+ u32 seq;
+
+ union {
+  struct callback_head task_work;
+  struct work_struct commit_work;
+ };
+};
+
+struct io_notif_slot {
+ /*
+  * Current/active notifier. A slot holds only one active notifier at a
+  * time and keeps one reference to it. Flush releases the reference and
+  * lazily replaces it with a new notifier.
+  */
+ struct io_notif *notif;
+
+ /*
+  * Default ->user_data for this slot notifiers CQEs
+  */
+ u64 tag;
+ /*
+  * Notifiers of a slot live in generations, we create a new notifier
+  * only after flushing the previous one. Track the sequential number
+  * for all notifiers and copy it into notifiers's cqe->cflags
+  */
+ u32 seq;
+};
+
 #define BGID_ARRAY 64
 
 struct io_ring_ctx {
@@ -423,6 +460,8 @@ struct io_ring_ctx {
   unsigned  nr_user_files;
   unsigned  nr_user_bufs;
   struct io_mapped_ubuf **user_bufs;
+  struct io_notif_slot *notif_slots;
+  unsigned  nr_notif_slots;
 
   struct io_submit_state submit_state;
@@ -2749,6 +2788,121 @@ static __cold void io_free_req(struct io_kiocb *req)
  spin_unlock(&ctx->completion_lock);
 }
 
+static void __io_notif_complete_tw(struct callback_head *cb)
+{
+ struct io_notif *notif = container_of(cb, struct io_notif, task_work);
+ struct io_ring_ctx *ctx = notif->ctx;
+
+ spin_lock(&ctx->completion_lock);
+ io_fill_cqe_aux(ctx, notif->tag, 0, notif->seq);
+ io_commit_cqring(ctx);
+ spin_unlock(&ctx->completion_lock);
+ io_cqring_ev_posted(ctx);
+
+ percpu_ref_put(&ctx->refs);
+ kfree(notif);
+}
+
+static inline void io_notif_complete(struct io_notif *notif)
+{
+ __io_notif_complete_tw(&notif->task_work);
+}
+
+static void io_notif_complete_wq(struct work_struct *work)
+{
+ struct io_notif *notif = container_of(work, struct io_notif, commit_work);
+
+ io_notif_complete(notif);
+}
+
+static void io_uring_tx_zerocopy_callback(struct sk_buff *skb,
+       struct ubuf_info *uarg,
+       bool success)
+{
+ struct io_notif *notif = container_of(uarg, struct io_notif, uarg);
+
+ if (!refcount_dec_and_test(&uarg->refcnt))
+  return;
+ INIT_WORK(&notif->commit_work, io_notif_complete_wq);
+ queue_work(system_unbound_wq, &notif->commit_work);
+}
+
+static struct io_notif *io_alloc_notif(struct io_ring_ctx *ctx,
+           struct io_notif_slot *slot)
+ __must_hold(&ctx->uring_lock)
+{
+ struct io_notif *notif;
+
+ notif = kzalloc(sizeof(*notif), GFP_ATOMIC | __GFP_ACCOUNT);
+ if (!notif)
+  return NULL;
+
+ notif->seq = slot->seq++;
+ notif->tag = slot->tag;
+ notif->ctx = ctx;
+ notif->uarg.flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN;
+ notif->uarg.callback = io_uring_tx_zerocopy_callback;
+ /* master ref owned by io_notif_slot, will be dropped on flush */
+ refcount_set(&notif->uarg.refcnt, 1);
+ percpu_ref_get(&ctx->refs);
+ return notif;
+}
+
+__attribute__((unused))
+static inline struct io_notif *io_get_notif(struct io_ring_ctx *ctx,
+         struct io_notif_slot *slot)
+{
+ if (!slot->notif)
+  slot->notif = io_alloc_notif(ctx, slot);
+ return slot->notif;
+}
+
+__attribute__((unused))
+static inline struct io_notif_slot *io_get_notif_slot(struct io_ring_ctx *ctx,
+            int idx)
+ __must_hold(&ctx->uring_lock)
+{
+ if (idx >= ctx->nr_notif_slots)
+  return NULL;
+ idx = array_index_nospec(idx, ctx->nr_notif_slots);
+ return &ctx->notif_slots[idx];
+}
+
+static void io_notif_slot_flush(struct io_notif_slot *slot)
+ __must_hold(&ctx->uring_lock)
+{
+ struct io_notif *notif = slot->notif;
+
+ slot->notif = NULL;
+
+ if (WARN_ON_ONCE(in_interrupt()))
+  return;
+ /* drop slot's master ref */
+ if (refcount_dec_and_test(&notif->uarg.refcnt))
+  io_notif_complete(notif);
+}
+
+static __cold int io_notif_unregister(struct io_ring_ctx *ctx)
+ __must_hold(&ctx->uring_lock)
+{
+ int i;
+
+ if (!ctx->notif_slots)
+  return -ENXIO;
+
+ for (i = 0; i < ctx->nr_notif_slots; i++) {
+  struct io_notif_slot *slot = &ctx->notif_slots[i];
+
+  if (slot->notif)
+   io_notif_slot_flush(slot);
+ }
+
+ kvfree(ctx->notif_slots);
+ ctx->notif_slots = NULL;
+ ctx->nr_notif_slots = 0;
+ return 0;
+}
+
 static inline void io_remove_next_linked(struct io_kiocb *req)
 {
  struct io_kiocb *nxt = req->link;
@@ -11174,6 +11328,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
  }
 #endif
  WARN_ON_ONCE(!list_empty(&ctx->ltimeout_list));
+ WARN_ON_ONCE(ctx->notif_slots || ctx->nr_notif_slots);
 
  io_mem_free(ctx->rings);
  io_mem_free(ctx->sq_sqes);
@@ -11368,6 +11523,7 @@ static __cold void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
  __io_cqring_overflow_flush(ctx, true);
  xa_for_each(&ctx->personalities, index, creds)
   io_unregister_personality(ctx, index);
+ io_notif_unregister(ctx);
  mutex_unlock(&ctx->uring_lock);
 
  /* failed during ring init, it couldn't have issued any requests */
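To make the refcount choreography above easier to follow, here is a minimal
sketch of one notifier's lifetime (illustrative driver of this API, not part
of the patch; the helpers are those added above, the surrounding loop is
hypothetical):

    /* Illustrative only: one zerocopy send bound to a slot's notifier. */
    struct io_notif_slot *slot = io_get_notif_slot(ctx, idx);
    struct io_notif *notif = io_get_notif(ctx, slot);
        /* lazily allocated; refcnt == 1 is the slot's master ref */

    /* a zc send takes its own ref before handing uarg to the stack */
    refcount_inc(&notif->uarg.refcnt);
    msg.msg_ubuf = &notif->uarg;
    /* ... the sendmsg path attaches skbs; each completed skb ends up in
     * io_uring_tx_zerocopy_callback(), dropping one ref ...
     */

    /* flush: drop the slot's master ref; the CQE is posted once the
     * last skb-held ref is gone
     */
    io_notif_slot_flush(slot);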
From patchwork Tue Jun 28 18:56:38 2022
From: Pavel Begunkov
Subject: [RFC net-next v3 16/29] io_uring: cache struct io_notif
Date: Tue, 28 Jun 2022 19:56:38 +0100
Message-Id: <91a78581e59863bd45125195055a1712e1e202e3.1653992701.git.asml.silence@gmail.com>

kmalloc'ing struct io_notif is too expensive when done frequently, so
cache them as we do with many other resources in io_uring. Keep two
lists: the first, protected by ->uring_lock, is the one we take
notifiers from; the second, protected by ->completion_lock, is where
released notifiers are queued. We then splice one list into the other
when needed.
Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c | 68 +++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 61 insertions(+), 7 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 7d058deb5f73..422ff835bf36 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -381,6 +381,8 @@ struct io_notif {
  u64 tag;
  /* see struct io_notif_slot::seq */
  u32 seq;
+ /* hook into ctx->notif_list and ctx->notif_list_locked */
+ struct list_head cache_node;
 
  union {
   struct callback_head task_work;
@@ -469,6 +471,8 @@ struct io_ring_ctx {
   struct xarray  io_bl_xa;
   struct list_head io_buffers_cache;
+  /* struct io_notif cache protected by uring_lock */
+  struct list_head notif_list;
   struct list_head timeout_list;
   struct list_head ltimeout_list;
   struct list_head cq_overflow_list;
@@ -481,6 +485,9 @@ struct io_ring_ctx {
  /* IRQ completion list, under ->completion_lock */
  struct io_wq_work_list locked_free_list;
  unsigned int  locked_free_nr;
+ /* struct io_notif cache protected by completion_lock */
+ struct list_head notif_list_locked;
+ unsigned int  notif_locked_nr;
 
  const struct cred *sq_creds; /* cred used for __io_sq_thread() */
  struct io_sq_data *sq_data; /* if using sq thread polling */
@@ -1932,6 +1939,8 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
  INIT_WQ_LIST(&ctx->locked_free_list);
  INIT_DELAYED_WORK(&ctx->fallback_work, io_fallback_req_func);
  INIT_WQ_LIST(&ctx->submit_state.compl_reqs);
+ INIT_LIST_HEAD(&ctx->notif_list);
+ INIT_LIST_HEAD(&ctx->notif_list_locked);
  return ctx;
 err:
  kfree(ctx->dummy_ubuf);
@@ -2795,12 +2804,15 @@ static void __io_notif_complete_tw(struct callback_head *cb)
 
  spin_lock(&ctx->completion_lock);
  io_fill_cqe_aux(ctx, notif->tag, 0, notif->seq);
+
+ list_add(&notif->cache_node, &ctx->notif_list_locked);
+ ctx->notif_locked_nr++;
+
  io_commit_cqring(ctx);
  spin_unlock(&ctx->completion_lock);
  io_cqring_ev_posted(ctx);
 
  percpu_ref_put(&ctx->refs);
- kfree(notif);
 }
 
 static inline void io_notif_complete(struct io_notif *notif)
@@ -2827,21 +2839,62 @@ static void io_uring_tx_zerocopy_callback(struct sk_buff *skb,
  queue_work(system_unbound_wq, &notif->commit_work);
 }
 
+static void io_notif_splice_cached(struct io_ring_ctx *ctx)
+ __must_hold(&ctx->uring_lock)
+{
+ spin_lock(&ctx->completion_lock);
+ list_splice_init(&ctx->notif_list_locked, &ctx->notif_list);
+ ctx->notif_locked_nr = 0;
+ spin_unlock(&ctx->completion_lock);
+}
+
+static void io_notif_cache_purge(struct io_ring_ctx *ctx)
+ __must_hold(&ctx->uring_lock)
+{
+ io_notif_splice_cached(ctx);
+
+ while (!list_empty(&ctx->notif_list)) {
+  struct io_notif *notif = list_first_entry(&ctx->notif_list,
+    struct io_notif, cache_node);
+
+  list_del(&notif->cache_node);
+  kfree(notif);
+ }
+}
+
+static inline bool io_notif_has_cached(struct io_ring_ctx *ctx)
+ __must_hold(&ctx->uring_lock)
+{
+ if (likely(!list_empty(&ctx->notif_list)))
+  return true;
+ if (data_race(READ_ONCE(ctx->notif_locked_nr) <= IO_COMPL_BATCH))
+  return false;
+ io_notif_splice_cached(ctx);
+ return !list_empty(&ctx->notif_list);
+}
+
 static struct io_notif *io_alloc_notif(struct io_ring_ctx *ctx,
            struct io_notif_slot *slot)
  __must_hold(&ctx->uring_lock)
 {
  struct io_notif *notif;
 
- notif = kzalloc(sizeof(*notif), GFP_ATOMIC | __GFP_ACCOUNT);
- if (!notif)
-  return NULL;
+ if (likely(io_notif_has_cached(ctx))) {
+  notif = list_first_entry(&ctx->notif_list,
+    struct io_notif, cache_node);
+  list_del(&notif->cache_node);
+ } else {
+  notif = kzalloc(sizeof(*notif), GFP_ATOMIC | __GFP_ACCOUNT);
+  if (!notif)
+   return NULL;
+  /* pre-initialise some fields */
+  notif->ctx = ctx;
+  notif->uarg.flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN;
+  notif->uarg.callback = io_uring_tx_zerocopy_callback;
+ }
 
  notif->seq = slot->seq++;
  notif->tag = slot->tag;
- notif->ctx = ctx;
- notif->uarg.flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN;
- notif->uarg.callback = io_uring_tx_zerocopy_callback;
  /* master ref owned by io_notif_slot, will be dropped on flush */
  refcount_set(&notif->uarg.refcnt, 1);
  percpu_ref_get(&ctx->refs);
@@ -11330,6 +11383,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 
  WARN_ON_ONCE(!list_empty(&ctx->ltimeout_list));
  WARN_ON_ONCE(ctx->notif_slots || ctx->nr_notif_slots);
+ io_notif_cache_purge(ctx);
 
  io_mem_free(ctx->rings);
  io_mem_free(ctx->sq_sqes);
From patchwork Tue Jun 28 18:56:39 2022
From: Pavel Begunkov
Subject: [RFC net-next v3 17/29] io_uring: complete notifiers in tw
Date: Tue, 28 Jun 2022 19:56:39 +0100

We need a task context to post CQEs, but using a wq is too expensive.
Try to complete notifiers via task_work and fall back to the wq if that
fails.

Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 422ff835bf36..9ade0ea8552b 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -384,6 +384,8 @@ struct io_notif {
  /* hook into ctx->notif_list and ctx->notif_list_locked */
  struct list_head cache_node;
 
+ /* complete via tw if ->task is non-NULL, fallback to wq otherwise */
+ struct task_struct *task;
  union {
   struct callback_head task_work;
   struct work_struct commit_work;
@@ -2802,6 +2804,11 @@ static void __io_notif_complete_tw(struct callback_head *cb)
  struct io_notif *notif = container_of(cb, struct io_notif, task_work);
  struct io_ring_ctx *ctx = notif->ctx;
 
+ if (likely(notif->task)) {
+  io_put_task(notif->task, 1);
+  notif->task = NULL;
+ }
+
  spin_lock(&ctx->completion_lock);
  io_fill_cqe_aux(ctx, notif->tag, 0, notif->seq);
 
@@ -2835,6 +2842,14 @@ static void io_uring_tx_zerocopy_callback(struct sk_buff *skb,
 
  if (!refcount_dec_and_test(&uarg->refcnt))
   return;
+
+ if (likely(notif->task)) {
+  init_task_work(&notif->task_work, __io_notif_complete_tw);
+  if (likely(!task_work_add(notif->task, &notif->task_work,
+       TWA_SIGNAL)))
+   return;
+ }
+
  INIT_WORK(&notif->commit_work, io_notif_complete_wq);
  queue_work(system_unbound_wq, &notif->commit_work);
 }
@@ -2946,8 +2961,12 @@ static __cold int io_notif_unregister(struct io_ring_ctx *ctx)
  for (i = 0; i < ctx->nr_notif_slots; i++) {
   struct io_notif_slot *slot = &ctx->notif_slots[i];
 
-  if (slot->notif)
+  if (slot->notif) {
+   WARN_ON_ONCE(slot->notif->task);
+
+   slot->notif->task = NULL;
    io_notif_slot_flush(slot);
+  }
  }
 
  kvfree(ctx->notif_slots);
From patchwork Tue Jun 28 18:56:40 2022
From: Pavel Begunkov
Subject: [RFC net-next v3 18/29] io_uring: add notification slot registration
Date: Tue, 28 Jun 2022 19:56:40 +0100

Let userspace register and unregister notification slots.
Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c                 | 54 +++++++++++++++++++++++++++++++++++
 include/uapi/linux/io_uring.h | 16 +++++++++++
 2 files changed, 70 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 9ade0ea8552b..22427893549a 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -94,6 +94,8 @@
 #define IORING_MAX_CQ_ENTRIES (2 * IORING_MAX_ENTRIES)
 #define IORING_SQPOLL_CAP_ENTRIES_VALUE 8
 
+#define IORING_MAX_NOTIF_SLOTS (1U << 10)
+
 /* only define max */
 #define IORING_MAX_FIXED_FILES (1U << 20)
 #define IORING_MAX_RESTRICTIONS (IORING_RESTRICTION_LAST + \
@@ -2972,6 +2974,49 @@ static __cold int io_notif_unregister(struct io_ring_ctx *ctx)
  kvfree(ctx->notif_slots);
  ctx->notif_slots = NULL;
  ctx->nr_notif_slots = 0;
+ io_notif_cache_purge(ctx);
+ return 0;
+}
+
+static __cold int io_notif_register(struct io_ring_ctx *ctx,
+        void __user *arg, unsigned int size)
+ __must_hold(&ctx->uring_lock)
+{
+ struct io_uring_notification_slot __user *slots;
+ struct io_uring_notification_slot slot;
+ struct io_uring_notification_register reg;
+ unsigned i;
+
+ if (ctx->nr_notif_slots)
+  return -EBUSY;
+ if (size != sizeof(reg))
+  return -EINVAL;
+ if (copy_from_user(&reg, arg, sizeof(reg)))
+  return -EFAULT;
+ if (!reg.nr_slots || reg.nr_slots > IORING_MAX_NOTIF_SLOTS)
+  return -EINVAL;
+ if (reg.resv || reg.resv2 || reg.resv3)
+  return -EINVAL;
+
+ slots = u64_to_user_ptr(reg.data);
+ ctx->notif_slots = kvcalloc(reg.nr_slots, sizeof(ctx->notif_slots[0]),
+        GFP_KERNEL_ACCOUNT);
+ if (!ctx->notif_slots)
+  return -ENOMEM;
+
+ for (i = 0; i < reg.nr_slots; i++, ctx->nr_notif_slots++) {
+  struct io_notif_slot *notif_slot = &ctx->notif_slots[i];
+
+  if (copy_from_user(&slot, &slots[i], sizeof(slot))) {
+   io_notif_unregister(ctx);
+   return -EFAULT;
+  }
+  if (slot.resv[0] | slot.resv[1] | slot.resv[2]) {
+   io_notif_unregister(ctx);
+   return -EINVAL;
+  }
+  notif_slot->tag = slot.tag;
+ }
  return 0;
 }
 
@@ -13378,6 +13423,15 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
    break;
   ret = io_unregister_pbuf_ring(ctx, arg);
   break;
+ case IORING_REGISTER_NOTIFIERS:
+  ret = io_notif_register(ctx, arg, nr_args);
+  break;
+ case IORING_UNREGISTER_NOTIFIERS:
+  ret = -EINVAL;
+  if (arg || nr_args)
+   break;
+  ret = io_notif_unregister(ctx);
+  break;
  default:
   ret = -EINVAL;
   break;
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 53e7dae92e42..96193bbda2e4 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -417,6 +417,9 @@ enum {
  IORING_REGISTER_PBUF_RING  = 22,
  IORING_UNREGISTER_PBUF_RING  = 23,
 
+ IORING_REGISTER_NOTIFIERS  = 24,
+ IORING_UNREGISTER_NOTIFIERS  = 25,
+
  /* this goes last */
  IORING_REGISTER_LAST
 };
@@ -463,6 +466,19 @@ struct io_uring_rsrc_update2 {
  __u32 resv2;
 };
 
+struct io_uring_notification_slot {
+ __u64 tag;
+ __u64 resv[3];
+};
+
+struct io_uring_notification_register {
+ __u32 nr_slots;
+ __u32 resv;
+ __u64 resv2;
+ __u64 data;
+ __u64 resv3;
+};
+
 /* Skip updating fd indexes set to this value in the fd table */
 #define IORING_REGISTER_FILES_SKIP (-2)
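A sketch of how userspace might drive this registration, using the raw
io_uring_register(2) syscall and the uapi structures added above (assumes
headers updated with this patch; liburing support is not part of the series):

    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <linux/io_uring.h>

    /* Illustrative only: register 4 notification slots on a ring fd. */
    static int register_notif_slots(int ring_fd)
    {
        struct io_uring_notification_slot slots[4];
        struct io_uring_notification_register reg;

        memset(slots, 0, sizeof(slots));
        memset(&reg, 0, sizeof(reg));
        for (int i = 0; i < 4; i++)
            slots[i].tag = 0x1000 + i; /* default CQE user_data per slot */

        reg.nr_slots = 4;
        reg.data = (unsigned long)slots;

        /* nr_args carries sizeof(reg), which io_notif_register() checks */
        return syscall(__NR_io_uring_register, ring_fd,
                       IORING_REGISTER_NOTIFIERS, &reg, sizeof(reg));
    }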
From patchwork Tue Jun 28 18:56:41 2022
From: Pavel Begunkov
Subject: [RFC net-next v3 19/29] io_uring: rename IORING_OP_FILES_UPDATE
Date: Tue, 28 Jun 2022 19:56:41 +0100
Message-Id: <93e0583f37ea7fe64fac4aab782ed9266320666d.1653992701.git.asml.silence@gmail.com>

IORING_OP_FILES_UPDATE will become a more generic opcode serving
different resource types, so rename it to IORING_OP_RSRC_UPDATE and add
subtype handling.
Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c                 | 23 +++++++++++++++++------
 include/uapi/linux/io_uring.h | 12 +++++++++++-
 2 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 22427893549a..e9fc7e076c7f 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -730,6 +730,7 @@ struct io_rsrc_update {
 	u64				arg;
 	u32				nr_args;
 	u32				offset;
+	unsigned			type;
 };
 
 struct io_fadvise {
@@ -1280,7 +1281,7 @@ static const struct io_op_def io_op_defs[] = {
 	},
 	[IORING_OP_OPENAT] = {},
 	[IORING_OP_CLOSE] = {},
-	[IORING_OP_FILES_UPDATE] = {
+	[IORING_OP_RSRC_UPDATE] = {
 		.audit_skip		= 1,
 		.iopoll			= 1,
 	},
@@ -8268,7 +8269,7 @@ static int io_async_cancel(struct io_kiocb *req, unsigned int issue_flags)
 	return 0;
 }
 
-static int io_files_update_prep(struct io_kiocb *req,
+static int io_rsrc_update_prep(struct io_kiocb *req,
 				const struct io_uring_sqe *sqe)
 {
 	if (unlikely(req->flags & (REQ_F_FIXED_FILE | REQ_F_BUFFER_SELECT)))
@@ -8280,6 +8281,7 @@ static int io_files_update_prep(struct io_kiocb *req,
 	req->rsrc_update.nr_args = READ_ONCE(sqe->len);
 	if (!req->rsrc_update.nr_args)
 		return -EINVAL;
+	req->rsrc_update.type = READ_ONCE(sqe->ioprio);
 	req->rsrc_update.arg = READ_ONCE(sqe->addr);
 	return 0;
 }
@@ -8308,6 +8310,15 @@ static int io_files_update(struct io_kiocb *req, unsigned int issue_flags)
 	return 0;
 }
 
+static int io_rsrc_update(struct io_kiocb *req, unsigned int issue_flags)
+{
+	switch (req->rsrc_update.type) {
+	case IORING_RSRC_UPDATE_FILES:
+		return io_files_update(req, issue_flags);
+	}
+	return -EINVAL;
+}
+
 static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	switch (req->opcode) {
@@ -8352,8 +8363,8 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 		return io_openat_prep(req, sqe);
 	case IORING_OP_CLOSE:
 		return io_close_prep(req, sqe);
-	case IORING_OP_FILES_UPDATE:
-		return io_files_update_prep(req, sqe);
+	case IORING_OP_RSRC_UPDATE:
+		return io_rsrc_update_prep(req, sqe);
 	case IORING_OP_STATX:
 		return io_statx_prep(req, sqe);
 	case IORING_OP_FADVISE:
@@ -8661,8 +8672,8 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
 	case IORING_OP_CLOSE:
 		ret = io_close(req, issue_flags);
 		break;
-	case IORING_OP_FILES_UPDATE:
-		ret = io_files_update(req, issue_flags);
+	case IORING_OP_RSRC_UPDATE:
+		ret = io_rsrc_update(req, issue_flags);
 		break;
 	case IORING_OP_STATX:
 		ret = io_statx(req, issue_flags);
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 96193bbda2e4..5f574558b96c 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -162,7 +162,8 @@ enum io_uring_op {
 	IORING_OP_FALLOCATE,
 	IORING_OP_OPENAT,
 	IORING_OP_CLOSE,
-	IORING_OP_FILES_UPDATE,
+	IORING_OP_RSRC_UPDATE,
+	IORING_OP_FILES_UPDATE = IORING_OP_RSRC_UPDATE,
 	IORING_OP_STATX,
 	IORING_OP_READ,
 	IORING_OP_WRITE,
@@ -210,6 +211,7 @@ enum io_uring_op {
 #define IORING_TIMEOUT_ETIME_SUCCESS	(1U << 5)
 #define IORING_TIMEOUT_CLOCK_MASK	(IORING_TIMEOUT_BOOTTIME | IORING_TIMEOUT_REALTIME)
 #define IORING_TIMEOUT_UPDATE_MASK	(IORING_TIMEOUT_UPDATE | IORING_LINK_TIMEOUT_UPDATE)
+
 /*
  * sqe->splice_flags
  * extends splice(2) flags
@@ -258,6 +260,14 @@ enum io_uring_op {
  */
 #define IORING_ACCEPT_MULTISHOT	(1U << 0)
 
+
+/*
+ * IORING_OP_RSRC_UPDATE flags
+ */
+enum {
+	IORING_RSRC_UPDATE_FILES,
+};
+
 /*
  * IO completion data structure (Completion Queue Entry)
  */
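A hypothetical SQE-filling sketch for the renamed opcode (mine, not from the patch; it reuses the includes of the registration sketch earlier). The subtype travels in sqe->ioprio, and since IORING_RSRC_UPDATE_FILES is zero and the old opcode value is aliased, existing IORING_OP_FILES_UPDATE users keep working unchanged:

	/* Fill a raw SQE to install `nr` fds into the fixed-file table,
	 * starting at fixed-file index `off`. */
	static void prep_rsrc_update_files(struct io_uring_sqe *sqe,
					   const __s32 *fds, unsigned nr,
					   unsigned off)
	{
		memset(sqe, 0, sizeof(*sqe));
		sqe->opcode = IORING_OP_RSRC_UPDATE;
		sqe->ioprio = IORING_RSRC_UPDATE_FILES;	/* subtype, == 0 */
		sqe->addr = (__u64)(unsigned long)fds;	/* fd array */
		sqe->len = nr;				/* number of entries */
		sqe->off = off;				/* first table index */
	}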
From patchwork Tue Jun 28 18:56:42 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12898754
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, kernel-team@fb.com, Pavel Begunkov
Subject: [RFC net-next v3 20/29] io_uring: add zc notification flush requests
Date: Tue, 28 Jun 2022 19:56:42 +0100
Miller" , Jakub Kicinski , Jonathan Lemon , Willem de Bruijn , Jens Axboe , kernel-team@fb.com, Pavel Begunkov Subject: [RFC net-next v3 20/29] io_uring: add zc notification flush requests Date: Tue, 28 Jun 2022 19:56:42 +0100 Message-Id: X-Mailer: git-send-email 2.36.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org X-Patchwork-State: RFC Overlay notification control onto IORING_OP_RSRC_UPDATE (former IORING_OP_FILES_UPDATE). It allows to flush a range of zc notifications from slots with indexes [sqe->off, sqe->off+sqe->len). If sqe->arg is not zero, it also copies sqe->arg as a new tag for all flushed notifications. Note, it doesn't flush a notification of a slot if there was no requests attached to it (since last flush or registration). Signed-off-by: Pavel Begunkov --- fs/io_uring.c | 47 +++++++++++++++++++++++++++++++++++ include/uapi/linux/io_uring.h | 1 + 2 files changed, 48 insertions(+) diff --git a/fs/io_uring.c b/fs/io_uring.c index e9fc7e076c7f..a88c9c73ed1d 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -1284,6 +1284,7 @@ static const struct io_op_def io_op_defs[] = { [IORING_OP_RSRC_UPDATE] = { .audit_skip = 1, .iopoll = 1, + .ioprio = 1, }, [IORING_OP_STATX] = { .audit_skip = 1, @@ -2953,6 +2954,16 @@ static void io_notif_slot_flush(struct io_notif_slot *slot) io_notif_complete(notif); } +static inline void io_notif_slot_flush_submit(struct io_notif_slot *slot, + unsigned int issue_flags) +{ + if (!(issue_flags & IO_URING_F_UNLOCKED)) { + slot->notif->task = current; + io_get_task_refs(1); + } + io_notif_slot_flush(slot); +} + static __cold int io_notif_unregister(struct io_ring_ctx *ctx) __must_hold(&ctx->uring_lock) { @@ -8286,6 +8297,40 @@ static int io_rsrc_update_prep(struct io_kiocb *req, return 0; } +static int io_notif_update(struct io_kiocb *req, unsigned int issue_flags) +{ + struct io_ring_ctx *ctx = req->ctx; + unsigned len = req->rsrc_update.nr_args; + unsigned idx_end, idx = req->rsrc_update.offset; + int ret = 0; + + io_ring_submit_lock(ctx, issue_flags); + if (unlikely(check_add_overflow(idx, len, &idx_end))) { + ret = -EOVERFLOW; + goto out; + } + if (unlikely(idx_end > ctx->nr_notif_slots)) { + ret = -EINVAL; + goto out; + } + + for (; idx < idx_end; idx++) { + struct io_notif_slot *slot = &ctx->notif_slots[idx]; + + if (!slot->notif) + continue; + if (req->rsrc_update.arg) + slot->tag = req->rsrc_update.arg; + io_notif_slot_flush_submit(slot, issue_flags); + } +out: + io_ring_submit_unlock(ctx, issue_flags); + if (ret < 0) + req_set_fail(req); + __io_req_complete(req, issue_flags, ret, 0); + return 0; +} + static int io_files_update(struct io_kiocb *req, unsigned int issue_flags) { struct io_ring_ctx *ctx = req->ctx; @@ -8315,6 +8360,8 @@ static int io_rsrc_update(struct io_kiocb *req, unsigned int issue_flags) switch (req->rsrc_update.type) { case IORING_RSRC_UPDATE_FILES: return io_files_update(req, issue_flags); + case IORING_RSRC_UPDATE_NOTIF: + return io_notif_update(req, issue_flags); } return -EINVAL; } diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index 5f574558b96c..19b9d7a2da29 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -266,6 +266,7 @@ enum io_uring_op { */ enum { IORING_RSRC_UPDATE_FILES, + IORING_RSRC_UPDATE_NOTIF, }; /* From patchwork Tue Jun 28 18:56:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: 
From patchwork Tue Jun 28 18:56:43 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12898757
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, kernel-team@fb.com, Pavel Begunkov
Subject: [RFC net-next v3 21/29] io_uring: wire send zc request type
Date: Tue, 28 Jun 2022 19:56:43 +0100

Add a new io_uring opcode IORING_OP_SENDZC.
The main distinction from IORING_OP_SEND is that the user should specify a notification slot index in sqe::notification_idx, and that the buffers are safe to reuse only once the used notification has been flushed and completes.

Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c                 | 103 +++++++++++++++++++++++++++++++++-
 include/uapi/linux/io_uring.h |   5 ++
 2 files changed, 106 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index a88c9c73ed1d..4a1a1d43e9b3 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -716,6 +716,14 @@ struct io_sr_msg {
 	unsigned int			flags;
 };
 
+struct io_sendzc {
+	struct file			*file;
+	void __user			*buf;
+	size_t				len;
+	u16				slot_idx;
+	int				msg_flags;
+};
+
 struct io_open {
 	struct file			*file;
 	int				dfd;
@@ -1044,6 +1052,7 @@ struct io_kiocb {
 		struct io_socket	sock;
 		struct io_nop		nop;
 		struct io_uring_cmd	uring_cmd;
+		struct io_sendzc	msgzc;
 	};
 
 	u8				opcode;
@@ -1384,6 +1393,13 @@ static const struct io_op_def io_op_defs[] = {
 		.needs_async_setup	= 1,
 		.async_size		= uring_cmd_pdu_size(1),
 	},
+	[IORING_OP_SENDZC] = {
+		.needs_file		= 1,
+		.unbound_nonreg_file	= 1,
+		.pollout		= 1,
+		.audit_skip		= 1,
+		.ioprio			= 1,
+	},
 };
 
 /* requests with any of those set should undergo io_disarm_next() */
@@ -1525,6 +1541,8 @@ const char *io_uring_get_opcode(u8 opcode)
 		return "SOCKET";
 	case IORING_OP_URING_CMD:
 		return "URING_CMD";
+	case IORING_OP_SENDZC:
+		return "URING_SENDZC";
 	case IORING_OP_LAST:
 		return "INVALID";
 	}
@@ -2920,7 +2938,6 @@ static struct io_notif *io_alloc_notif(struct io_ring_ctx *ctx,
 	return notif;
 }
 
-__attribute__((unused))
 static inline struct io_notif *io_get_notif(struct io_ring_ctx *ctx,
 					    struct io_notif_slot *slot)
 {
@@ -2929,7 +2946,6 @@ static inline struct io_notif *io_get_notif(struct io_ring_ctx *ctx,
 	return slot->notif;
 }
 
-__attribute__((unused))
 static inline struct io_notif_slot *io_get_notif_slot(struct io_ring_ctx *ctx,
 						      int idx)
 	__must_hold(&ctx->uring_lock)
@@ -6546,6 +6562,83 @@ static int io_send(struct io_kiocb *req, unsigned int issue_flags)
 	return 0;
 }
 
+static int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+	struct io_sendzc *zc = &req->msgzc;
+
+	if (READ_ONCE(sqe->ioprio) || READ_ONCE(sqe->addr2) || READ_ONCE(sqe->__pad2[0]))
+		return -EINVAL;
+
+	zc->buf = u64_to_user_ptr(READ_ONCE(sqe->addr));
+	zc->len = READ_ONCE(sqe->len);
+	zc->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
+	zc->slot_idx = READ_ONCE(sqe->notification_idx);
+	if (zc->msg_flags & MSG_DONTWAIT)
+		req->flags |= REQ_F_NOWAIT;
+#ifdef CONFIG_COMPAT
+	if (req->ctx->compat)
+		zc->msg_flags |= MSG_CMSG_COMPAT;
+#endif
+	return 0;
+}
+
+static int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
+{
+	struct io_ring_ctx *ctx = req->ctx;
+	struct io_sendzc *zc = &req->msgzc;
+	struct io_notif_slot *notif_slot;
+	struct io_notif *notif;
+	struct msghdr msg;
+	struct iovec iov;
+	struct socket *sock;
+	unsigned msg_flags;
+	int ret, min_ret = 0;
+
+	if (issue_flags & IO_URING_F_UNLOCKED)
+		return -EAGAIN;
+	sock = sock_from_file(req->file);
+	if (unlikely(!sock))
+		return -ENOTSOCK;
+
+	notif_slot = io_get_notif_slot(ctx, zc->slot_idx);
+	if (!notif_slot)
+		return -EINVAL;
+	notif = io_get_notif(ctx, notif_slot);
+	if (!notif)
+		return -ENOMEM;
+
+	msg.msg_name = NULL;
+	msg.msg_control = NULL;
+	msg.msg_controllen = 0;
+	msg.msg_namelen = 0;
+	msg.msg_managed_data = 0;
+
+	ret = import_single_range(WRITE, zc->buf, zc->len, &iov, &msg.msg_iter);
+	if (unlikely(ret))
+		return ret;
+
+	msg_flags = zc->msg_flags | MSG_ZEROCOPY;
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		msg_flags |= MSG_DONTWAIT;
+	if (msg_flags & MSG_WAITALL)
+		min_ret = iov_iter_count(&msg.msg_iter);
+
+	msg.msg_flags = msg_flags;
+	msg.msg_ubuf = &notif->uarg;
+	ret = sock_sendmsg(sock, &msg);
+
+	if (unlikely(ret < min_ret)) {
+		if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK))
+			return -EAGAIN;
+		if (ret == -ERESTARTSYS)
+			ret = -EINTR;
+		req_set_fail(req);
+	}
+
+	__io_req_complete(req, issue_flags, ret, 0);
+	return 0;
+}
+
 static int __io_recvmsg_copy_hdr(struct io_kiocb *req,
 				 struct io_async_msghdr *iomsg)
 {
@@ -7064,6 +7157,7 @@ IO_NETOP_PREP_ASYNC(connect);
 IO_NETOP_PREP(accept);
 IO_NETOP_PREP(socket);
 IO_NETOP_PREP(shutdown);
+IO_NETOP_PREP(sendzc);
 IO_NETOP_FN(send);
 IO_NETOP_FN(recv);
 #endif /* CONFIG_NET */
@@ -8389,6 +8483,8 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	case IORING_OP_SENDMSG:
 	case IORING_OP_SEND:
 		return io_sendmsg_prep(req, sqe);
+	case IORING_OP_SENDZC:
+		return io_sendzc_prep(req, sqe);
 	case IORING_OP_RECVMSG:
 	case IORING_OP_RECV:
 		return io_recvmsg_prep(req, sqe);
@@ -8689,6 +8785,9 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
 	case IORING_OP_SEND:
 		ret = io_send(req, issue_flags);
 		break;
+	case IORING_OP_SENDZC:
+		ret = io_sendzc(req, issue_flags);
+		break;
 	case IORING_OP_RECVMSG:
 		ret = io_recvmsg(req, issue_flags);
 		break;
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 19b9d7a2da29..6c6f20ae5a95 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -61,6 +61,10 @@ struct io_uring_sqe {
 	union {
 		__s32	splice_fd_in;
 		__u32	file_index;
+		struct {
+			__u16	notification_idx;
+			__u16	__pad;
+		} __attribute__((packed));
 	};
 	union {
 		struct {
@@ -190,6 +194,7 @@ enum io_uring_op {
 	IORING_OP_GETXATTR,
 	IORING_OP_SOCKET,
 	IORING_OP_URING_CMD,
+	IORING_OP_SENDZC,
 
 	/* this goes last, obviously */
 	IORING_OP_LAST,
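To see the whole flow, a hypothetical userspace sketch (not from the patch) that queues a zerocopy send tied to a notification slot; field names follow the UAPI hunk above. Note the buffer may only be reused after the slot's notification CQE arrives, not after the send's own CQE:

	/* Queue a zerocopy send on socket `fd` using notification `slot`. */
	static void prep_sendzc(struct io_uring_sqe *sqe, int fd,
				const void *buf, size_t len, __u16 slot)
	{
		memset(sqe, 0, sizeof(*sqe));
		sqe->opcode = IORING_OP_SENDZC;
		sqe->fd = fd;
		sqe->addr = (__u64)(unsigned long)buf;
		sqe->len = (__u32)len;
		sqe->msg_flags = 0;		/* MSG_* flags, as for send(2) */
		sqe->notification_idx = slot;	/* new packed field above */
	}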
From patchwork Tue Jun 28 18:56:44 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12898755
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, kernel-team@fb.com, Pavel Begunkov
Subject: [RFC net-next v3 22/29] io_uring: account locked pages for non-fixed zc
Date: Tue, 28 Jun 2022 19:56:44 +0100
Message-Id: <7b68f8a5291bc512a225b5a876384ebd4dcda1dd.1653992701.git.asml.silence@gmail.com>

Fixed buffers are RLIMIT_MEMLOCK accounted; however, that doesn't cover iovec-based zerocopy sends. Do the accounting on the io_uring side.
Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 4a1a1d43e9b3..838030477456 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2825,7 +2825,13 @@ static void __io_notif_complete_tw(struct callback_head *cb)
 {
 	struct io_notif *notif = container_of(cb, struct io_notif, task_work);
 	struct io_ring_ctx *ctx = notif->ctx;
+	struct mmpin *mmp = &notif->uarg.mmp;
 
+	if (unlikely(mmp->user)) {
+		atomic_long_sub(mmp->num_pg, &mmp->user->locked_vm);
+		free_uid(mmp->user);
+		mmp->user = NULL;
+	}
 	if (likely(notif->task)) {
 		io_put_task(notif->task, 1);
 		notif->task = NULL;
@@ -6616,6 +6622,7 @@ static int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
 	ret = import_single_range(WRITE, zc->buf, zc->len, &iov, &msg.msg_iter);
 	if (unlikely(ret))
 		return ret;
+	mm_account_pinned_pages(&notif->uarg.mmp, zc->len);
 
 	msg_flags = zc->msg_flags | MSG_ZEROCOPY;
 	if (issue_flags & IO_URING_F_NONBLOCK)
From patchwork Tue Jun 28 18:56:45 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12898756
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, kernel-team@fb.com, Pavel Begunkov
Subject: [RFC net-next v3 23/29] io_uring: allow to pass addr into sendzc
Date: Tue, 28 Jun 2022 19:56:45 +0100
Message-Id: <228d4841af5eeb9a4b73955136559f18cb7e43a0.1653992701.git.asml.silence@gmail.com>

Allow specifying an address for zerocopy sends, making them more like sendto(2).

Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c                 | 16 +++++++++++++++-
 include/uapi/linux/io_uring.h |  2 +-
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 838030477456..a1e9405a3f1b 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -722,6 +722,8 @@ struct io_sendzc {
 	size_t				len;
 	u16				slot_idx;
 	int				msg_flags;
+	int				addr_len;
+	void __user			*addr;
 };
 
 struct io_open {
@@ -6572,7 +6574,7 @@ static int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_sendzc *zc = &req->msgzc;
 
-	if (READ_ONCE(sqe->ioprio) || READ_ONCE(sqe->addr2) || READ_ONCE(sqe->__pad2[0]))
+	if (READ_ONCE(sqe->ioprio) || READ_ONCE(sqe->__pad2[0]))
 		return -EINVAL;
 
 	zc->buf = u64_to_user_ptr(READ_ONCE(sqe->addr));
@@ -6581,6 +6583,9 @@ static int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	zc->slot_idx = READ_ONCE(sqe->notification_idx);
 	if (zc->msg_flags & MSG_DONTWAIT)
 		req->flags |= REQ_F_NOWAIT;
+	zc->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
+	zc->addr_len = READ_ONCE(sqe->addr_len);
+
 #ifdef CONFIG_COMPAT
 	if (req->ctx->compat)
 		zc->msg_flags |= MSG_CMSG_COMPAT;
@@ -6590,6 +6595,7 @@ static int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 
 static int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
 {
+	struct sockaddr_storage address;
 	struct io_ring_ctx *ctx = req->ctx;
 	struct io_sendzc *zc = &req->msgzc;
 	struct io_notif_slot *notif_slot;
@@ -6624,6 +6630,14 @@ static int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
 		return ret;
 	mm_account_pinned_pages(&notif->uarg.mmp, zc->len);
 
+	if (zc->addr) {
+		ret = move_addr_to_kernel(zc->addr, zc->addr_len, &address);
+		if (unlikely(ret < 0))
+			return ret;
+		msg.msg_name = (struct sockaddr *)&address;
+		msg.msg_namelen = zc->addr_len;
+	}
+
 	msg_flags = zc->msg_flags | MSG_ZEROCOPY;
 	if (issue_flags & IO_URING_F_NONBLOCK)
 		msg_flags |= MSG_DONTWAIT;
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 6c6f20ae5a95..689aa1444cd4 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -63,7 +63,7 @@ struct io_uring_sqe {
 		__u32	file_index;
 		struct {
 			__u16	notification_idx;
-			__u16	__pad;
+			__u16	addr_len;
 		} __attribute__((packed));
 	};
 	union {
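A hypothetical sendto(2)-style variant on top of the prep_sendzc() sketch from patch 21 (the struct sockaddr_in destination here is only an example, e.g. a connected-less UDP socket):

	#include <netinet/in.h>

	/* Zerocopy send to an explicit UDP destination. */
	static void prep_sendzc_to(struct io_uring_sqe *sqe, int fd,
				   const void *buf, size_t len, __u16 slot,
				   const struct sockaddr_in *dst)
	{
		prep_sendzc(sqe, fd, buf, len, slot);
		sqe->addr2 = (__u64)(unsigned long)dst;	/* destination address */
		sqe->addr_len = sizeof(*dst);		/* reuses the old __pad */
	}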
From patchwork Tue Jun 28 18:56:46 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12898759
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, kernel-team@fb.com, Pavel Begunkov
Subject: [RFC net-next v3 24/29] io_uring: add rsrc referencing for notifiers
Date: Tue, 28 Jun 2022 19:56:46 +0100
Miller" , Jakub Kicinski , Jonathan Lemon , Willem de Bruijn , Jens Axboe , kernel-team@fb.com, Pavel Begunkov Subject: [RFC net-next v3 24/29] io_uring: add rsrc referencing for notifiers Date: Tue, 28 Jun 2022 19:56:46 +0100 Message-Id: X-Mailer: git-send-email 2.36.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org X-Patchwork-State: RFC In preparation to zerocopy sends with fixed buffers make notifiers to reference the rsrc node to protect the used fixed buffers. We can't just grab it for a send request as notifiers can likely outlive requests that used it. Signed-off-by: Pavel Begunkov --- fs/io_uring.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index a1e9405a3f1b..07d09d06e8ab 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -378,6 +378,7 @@ struct io_ev_fd { struct io_notif { struct ubuf_info uarg; struct io_ring_ctx *ctx; + struct io_rsrc_node *rsrc_node; /* cqe->user_data, io_notif_slot::tag if not overridden */ u64 tag; @@ -1695,13 +1696,20 @@ static __cold void io_rsrc_refs_drop(struct io_ring_ctx *ctx) } } -static void io_rsrc_refs_refill(struct io_ring_ctx *ctx) +static __cold void io_rsrc_refs_refill(struct io_ring_ctx *ctx) __must_hold(&ctx->uring_lock) { ctx->rsrc_cached_refs += IO_RSRC_REF_BATCH; percpu_ref_get_many(&ctx->rsrc_node->refs, IO_RSRC_REF_BATCH); } +static inline void io_charge_rsrc_node(struct io_ring_ctx *ctx) +{ + ctx->rsrc_cached_refs--; + if (unlikely(ctx->rsrc_cached_refs < 0)) + io_rsrc_refs_refill(ctx); +} + static inline void io_req_set_rsrc_node(struct io_kiocb *req, struct io_ring_ctx *ctx, unsigned int issue_flags) @@ -1711,9 +1719,7 @@ static inline void io_req_set_rsrc_node(struct io_kiocb *req, if (!(issue_flags & IO_URING_F_UNLOCKED)) { lockdep_assert_held(&ctx->uring_lock); - ctx->rsrc_cached_refs--; - if (unlikely(ctx->rsrc_cached_refs < 0)) - io_rsrc_refs_refill(ctx); + io_charge_rsrc_node(ctx); } else { percpu_ref_get(&req->rsrc_node->refs); } @@ -2826,6 +2832,7 @@ static __cold void io_free_req(struct io_kiocb *req) static void __io_notif_complete_tw(struct callback_head *cb) { struct io_notif *notif = container_of(cb, struct io_notif, task_work); + struct io_rsrc_node *rsrc_node = notif->rsrc_node; struct io_ring_ctx *ctx = notif->ctx; struct mmpin *mmp = ¬if->uarg.mmp; @@ -2849,6 +2856,7 @@ static void __io_notif_complete_tw(struct callback_head *cb) spin_unlock(&ctx->completion_lock); io_cqring_ev_posted(ctx); + io_rsrc_put_node(rsrc_node, 1); percpu_ref_put(&ctx->refs); } @@ -2943,6 +2951,8 @@ static struct io_notif *io_alloc_notif(struct io_ring_ctx *ctx, /* master ref owned by io_notif_slot, will be dropped on flush */ refcount_set(¬if->uarg.refcnt, 1); percpu_ref_get(&ctx->refs); + notif->rsrc_node = ctx->rsrc_node; + io_charge_rsrc_node(ctx); return notif; } From patchwork Tue Jun 28 18:56:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 12898758 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1F385CCA47F for ; Tue, 28 Jun 2022 19:01:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232967AbiF1TBq (ORCPT ); 
From patchwork Tue Jun 28 18:56:47 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12898758
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, kernel-team@fb.com, Pavel Begunkov
Subject: [RFC net-next v3 25/29] io_uring: sendzc with fixed buffers
Date: Tue, 28 Jun 2022 19:56:47 +0100
Message-Id: <672444088a4a08b3b098a0edb60d2669ec253161.1653992701.git.asml.silence@gmail.com>

Allow zerocopy sends to use fixed buffers. There is an optimisation for this case: the network layer doesn't need to reference the pages (see SKBFL_MANAGED_FRAG_REFS), so io_uring has to ensure the validity of the fixed buffers until the notifier is released.
Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c                 | 39 +++++++++++++++++++++++++++++------
 include/uapi/linux/io_uring.h |  7 +++++++
 2 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 07d09d06e8ab..70b1f77ac64e 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -723,6 +723,7 @@ struct io_sendzc {
 	size_t				len;
 	u16				slot_idx;
 	int				msg_flags;
+	unsigned			zc_flags;
 	int				addr_len;
 	void __user			*addr;
 };
@@ -6580,11 +6581,14 @@ static int io_send(struct io_kiocb *req, unsigned int issue_flags)
 	return 0;
 }
 
+#define IO_SENDZC_VALID_FLAGS IORING_SENDZC_FIXED_BUF
+
 static int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_sendzc *zc = &req->msgzc;
+	struct io_ring_ctx *ctx = req->ctx;
 
-	if (READ_ONCE(sqe->ioprio) || READ_ONCE(sqe->__pad2[0]))
+	if (READ_ONCE(sqe->__pad2[0]))
 		return -EINVAL;
 
 	zc->buf = u64_to_user_ptr(READ_ONCE(sqe->addr));
@@ -6596,6 +6600,20 @@ static int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	zc->addr = u64_to_user_ptr(READ_ONCE(sqe->addr2));
 	zc->addr_len = READ_ONCE(sqe->addr_len);
 
+	zc->zc_flags = READ_ONCE(sqe->ioprio);
+	if (req->msgzc.zc_flags & ~IO_SENDZC_VALID_FLAGS)
+		return -EINVAL;
+
+	if (req->msgzc.zc_flags & IORING_SENDZC_FIXED_BUF) {
+		unsigned idx = READ_ONCE(sqe->buf_index);
+
+		if (unlikely(idx >= ctx->nr_user_bufs))
+			return -EFAULT;
+		idx = array_index_nospec(idx, ctx->nr_user_bufs);
+		req->imu = READ_ONCE(ctx->user_bufs[idx]);
+		io_req_set_rsrc_node(req, ctx, 0);
+	}
+
 #ifdef CONFIG_COMPAT
 	if (req->ctx->compat)
 		zc->msg_flags |= MSG_CMSG_COMPAT;
@@ -6633,12 +6651,21 @@ static int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
 	msg.msg_control = NULL;
 	msg.msg_controllen = 0;
 	msg.msg_namelen = 0;
-	msg.msg_managed_data = 0;
+	msg.msg_managed_data = 1;
 
-	ret = import_single_range(WRITE, zc->buf, zc->len, &iov, &msg.msg_iter);
-	if (unlikely(ret))
-		return ret;
-	mm_account_pinned_pages(&notif->uarg.mmp, zc->len);
+	if (req->msgzc.zc_flags & IORING_SENDZC_FIXED_BUF) {
+		ret = __io_import_fixed(WRITE, &msg.msg_iter, req->imu,
+					(u64)zc->buf, zc->len);
+		if (unlikely(ret))
+			return ret;
+	} else {
+		msg.msg_managed_data = 0;
+		ret = import_single_range(WRITE, zc->buf, zc->len, &iov,
+					  &msg.msg_iter);
+		if (unlikely(ret))
+			return ret;
+		mm_account_pinned_pages(&notif->uarg.mmp, zc->len);
+	}
 
 	if (zc->addr) {
 		ret = move_addr_to_kernel(zc->addr, zc->addr_len, &address);
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 689aa1444cd4..69100aa71448 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -274,6 +274,13 @@ enum {
 	IORING_RSRC_UPDATE_NOTIF,
 };
 
+/*
+ * IORING_OP_SENDZC flags
+ */
+enum {
+	IORING_SENDZC_FIXED_BUF = (1U << 0),
+};
+
 /*
  * IO completion data structure (Completion Queue Entry)
  */
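Extending the earlier sketches for the fixed-buffer case (hypothetical; `buf` must point inside the region registered at `buf_idx`, and the zc flags travel in sqe->ioprio, which the prep had rejected up to this patch):

	/* Zerocopy send from a registered (fixed) buffer. */
	static void prep_sendzc_fixed(struct io_uring_sqe *sqe, int fd,
				      const void *buf, size_t len, __u16 slot,
				      __u16 buf_idx)
	{
		prep_sendzc(sqe, fd, buf, len, slot);
		sqe->ioprio = IORING_SENDZC_FIXED_BUF;	/* zc_flags */
		sqe->buf_index = buf_idx;		/* registered buffer slot */
	}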
lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233357AbiF1TAg (ORCPT ); Tue, 28 Jun 2022 15:00:36 -0400 Received: from mail-ed1-x536.google.com (mail-ed1-x536.google.com [IPv6:2a00:1450:4864:20::536]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4425B1F2E5; Tue, 28 Jun 2022 12:00:25 -0700 (PDT) Received: by mail-ed1-x536.google.com with SMTP id ej4so18911320edb.7; Tue, 28 Jun 2022 12:00:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Ox9QeHiY6HWuPvJfgNGG6WaFTtO2T886numczpXJZTA=; b=bMbLJIsrR1Tb7cS5HHG5VF8QBma2TQ+bPnX0ptlXlGNl15a7RnZiH9s+R1YKDUQw4n 8K5th81a4IF76rwDN7Ww1IAcC80o42i9cViirk4ByviNLaTkbbZ9HzY60t0fgu+djiUl 8P4KdTeNmUyKmELfCW7GZWcwqDon1btdcbMCVxXAWfe0U7KHH98LoxfC8PJQHHozqw/+ CKR9b4ET8hgLBZWaBM2dfXRuOVToAUvI5dzntOFJaIdyE/kBvE311KHnLfXPy8mWURVn DMrQQhm5cSO3c0UF/AUpfYdSr9x/y/JlqQNJOVuQxifojkyATP1ZI+XjdZ3AW4tZl4G8 nl0A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Ox9QeHiY6HWuPvJfgNGG6WaFTtO2T886numczpXJZTA=; b=e33BnT1Fq23r/2sW6xzw40fwH5TzeE0WFjiEZT7PUpv92uw/Hs3S86FIqaQ4OWBHub 5UPue28qIrrY4hIyCtxMuRSe0XKGqZmFjpzUptXczhD1kUemWCDhGYEOwrntgG+aW81k 6EAOBC5kCXxQo5+ozAKQU4YSePmsG5SjHLL2iOtyEzS/hIPr7irpCKJZuiRmgnZHTmkr lpkkF2lpZOF8uKVpaqb4sGQ6uxhHJZMVzv7qc95t5TCeyDhY7LiZ+AdDbEq9PUvTHDcu hoPsxWZjY1fYzbkR2HbFWNW4+kD6huj94Va5x88/ZW04blcKl77iMESoWoxq1yg4rq91 bupA== X-Gm-Message-State: AJIora+uMiCFfVR+fodOWUo4olXcphCIBsGHOkvtZPQqIFkS+chOmF0G R3DPh74Nb3yCDbsgm90LR0m8l9WHF9Mw6g== X-Google-Smtp-Source: AGRyM1t6UXVD+LJFLiumTBQ1UH4oWqirtexXfZOToprQudgOKXHuHIJBoJDZGrflt4qFX8gvCKC6dA== X-Received: by 2002:a05:6402:4386:b0:437:6450:b41f with SMTP id o6-20020a056402438600b004376450b41fmr25035456edc.97.1656442823586; Tue, 28 Jun 2022 12:00:23 -0700 (PDT) Received: from 127.0.0.1localhost (188.28.125.106.threembb.co.uk. [188.28.125.106]) by smtp.gmail.com with ESMTPSA id t21-20020a05640203d500b0043573c59ea0sm9758451edw.90.2022.06.28.12.00.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 28 Jun 2022 12:00:23 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: "David S . Miller" , Jakub Kicinski , Jonathan Lemon , Willem de Bruijn , Jens Axboe , kernel-team@fb.com, Pavel Begunkov Subject: [RFC net-next v3 26/29] io_uring: flush notifiers after sendzc Date: Tue, 28 Jun 2022 19:56:48 +0100 Message-Id: X-Mailer: git-send-email 2.36.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org X-Patchwork-State: RFC Allow to flush notifiers as a part of sendzc request by setting IORING_SENDZC_FLUSH flag. When the sendzc request succeedes it will flush the used [active] notifier. 
Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c                 | 7 +++++--
 include/uapi/linux/io_uring.h | 1 +
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 70b1f77ac64e..f5fe2ab5622a 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6581,7 +6581,7 @@ static int io_send(struct io_kiocb *req, unsigned int issue_flags)
 	return 0;
 }
 
-#define IO_SENDZC_VALID_FLAGS IORING_SENDZC_FIXED_BUF
+#define IO_SENDZC_VALID_FLAGS (IORING_SENDZC_FIXED_BUF|IORING_SENDZC_FLUSH)
 
 static int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
@@ -6685,7 +6685,10 @@ static int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
 	msg.msg_ubuf = &notif->uarg;
 	ret = sock_sendmsg(sock, &msg);
 
-	if (unlikely(ret < min_ret)) {
+	if (likely(ret >= min_ret)) {
+		if (req->msgzc.zc_flags & IORING_SENDZC_FLUSH)
+			io_notif_slot_flush_submit(notif_slot, 0);
+	} else {
 		if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK))
 			return -EAGAIN;
 		if (ret == -ERESTARTSYS)
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 69100aa71448..7d77d90a5f8a 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -279,6 +279,7 @@ enum {
 	IORING_SENDZC_FIXED_BUF = (1U << 0),
+	IORING_SENDZC_FLUSH	= (1U << 1),
 };
 
 /*
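With this flag a single SQE can send and immediately flush the active notifier, saving the separate IORING_OP_RSRC_UPDATE flush request; a hypothetical variant of the earlier sketch:

	/* Send and, on success, flush the slot's active notification. */
	static void prep_sendzc_flush(struct io_uring_sqe *sqe, int fd,
				      const void *buf, size_t len, __u16 slot)
	{
		prep_sendzc(sqe, fd, buf, len, slot);
		sqe->ioprio |= IORING_SENDZC_FLUSH;	/* flush on success only */
	}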
From patchwork Tue Jun 28 18:56:49 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12898762
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, kernel-team@fb.com, Pavel Begunkov
Subject: [RFC net-next v3 27/29] io_uring: allow to override zc tag on flush
Date: Tue, 28 Jun 2022 19:56:49 +0100
Message-Id: <011c5487e38ceb5700351ef60d49eedb431f22e0.1653992701.git.asml.silence@gmail.com>

Add a new sendzc flag, IORING_SENDZC_OVERRIDE_TAG. When it is set and the request flushes a notification, the notification's tag is set to sqe->user_data. This adds a bit more flexibility by allowing notification tags to be specified on a per-request basis.

One use case is combining the new flag with IOSQE_CQE_SKIP_SUCCESS: either the request fails and we expect a CQE with a failure and no notification, or it succeeds, in which case there is no request completion but only a zc notification with the overridden tag. In other words, the described scheme posts exactly one CQE with user_data set to the current request's sqe->user_data.

Note 1: the flag has no effect if nothing is flushed, e.g. there was no IORING_SENDZC_FLUSH or the request failed.

Note 2: copying sqe->user_data may not be ideal, but we don't have spare space in the SQE to keep a second tag/user_data.
Signed-off-by: Pavel Begunkov
---
 fs/io_uring.c                 | 9 +++++++--
 include/uapi/linux/io_uring.h | 1 +
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index f5fe2ab5622a..08c98a4d9bd2 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6581,7 +6581,8 @@ static int io_send(struct io_kiocb *req, unsigned int issue_flags)
 	return 0;
 }
 
-#define IO_SENDZC_VALID_FLAGS (IORING_SENDZC_FIXED_BUF|IORING_SENDZC_FLUSH)
+#define IO_SENDZC_VALID_FLAGS (IORING_SENDZC_FIXED_BUF | IORING_SENDZC_FLUSH | \
+			       IORING_SENDZC_OVERRIDE_TAG)
 
 static int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
@@ -6686,7 +6687,11 @@ static int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
 	ret = sock_sendmsg(sock, &msg);
 
 	if (likely(ret >= min_ret)) {
-		if (req->msgzc.zc_flags & IORING_SENDZC_FLUSH)
+		unsigned zc_flags = req->msgzc.zc_flags;
+
+		if (zc_flags & IORING_SENDZC_OVERRIDE_TAG)
+			notif->tag = req->cqe.user_data;
+		if (zc_flags & IORING_SENDZC_FLUSH)
 			io_notif_slot_flush_submit(notif_slot, 0);
 	} else {
 		if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK))
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 7d77d90a5f8a..7533387f25d3 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -280,6 +280,7 @@ enum {
 	IORING_SENDZC_FIXED_BUF		= (1U << 0),
 	IORING_SENDZC_FLUSH		= (1U << 1),
+	IORING_SENDZC_OVERRIDE_TAG	= (1U << 2),
 };
 
 /*
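The one-CQE-per-send scheme from the commit message, as a hypothetical sketch on top of the earlier helpers:

	/* Exactly one CQE per send: on failure the request posts a CQE and
	 * nothing is flushed; on success the request CQE is skipped and the
	 * flushed notification carries `user_data` as its tag. */
	static void prep_sendzc_one_cqe(struct io_uring_sqe *sqe, int fd,
					const void *buf, size_t len, __u16 slot,
					__u64 user_data)
	{
		prep_sendzc(sqe, fd, buf, len, slot);
		sqe->ioprio |= IORING_SENDZC_FLUSH | IORING_SENDZC_OVERRIDE_TAG;
		sqe->flags |= IOSQE_CQE_SKIP_SUCCESS;
		sqe->user_data = user_data;	/* copied into the notification tag */
	}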
From patchwork Tue Jun 28 18:56:50 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12898761
From: Pavel Begunkov
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S . Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn, Jens Axboe, kernel-team@fb.com, Pavel Begunkov
Subject: [RFC net-next v3 28/29] io_uring: batch submission notif referencing
Date: Tue, 28 Jun 2022 19:56:50 +0100

Batch-get notifier references and use ->msg_ubuf_ref to hand off one reference per sendzc request to the network layer. This amortises the submission-side net_zcopy_get() atomics. Note that we always keep at least one reference in the cache, because we only do post-send checks on whether ->msg_ubuf_ref was consumed.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 32 +++++++++++++++++++++++++++++---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 08c98a4d9bd2..78990a130b66 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -374,6 +374,7 @@ struct io_ev_fd {
 };
 
 #define IO_NOTIF_MAX_SLOTS	(1U << 10)
+#define IO_NOTIF_REF_CACHE_NR	64
 
 struct io_notif {
 	struct ubuf_info	uarg;
@@ -384,6 +385,8 @@ struct io_notif {
 	u64			tag;
 	/* see struct io_notif_slot::seq */
 	u32			seq;
+	/* extra uarg->refcnt refs */
+	int			cached_refs;
 
 	/* hook into ctx->notif_list and ctx->notif_list_locked */
 	struct list_head	cache_node;
@@ -2949,14 +2952,30 @@ static struct io_notif *io_alloc_notif(struct io_ring_ctx *ctx,
 
 	notif->seq = slot->seq++;
 	notif->tag = slot->tag;
+	notif->cached_refs = IO_NOTIF_REF_CACHE_NR;
 	/* master ref owned by io_notif_slot, will be dropped on flush */
-	refcount_set(&notif->uarg.refcnt, 1);
+	refcount_set(&notif->uarg.refcnt, IO_NOTIF_REF_CACHE_NR + 1);
 	percpu_ref_get(&ctx->refs);
 	notif->rsrc_node = ctx->rsrc_node;
 	io_charge_rsrc_node(ctx);
 	return notif;
 }
 
+static inline void io_notif_consume_ref(struct io_notif *notif)
+	__must_hold(&ctx->uring_lock)
+{
+	notif->cached_refs--;
+
+	/*
+	 * Issue sends without looking at notif->cached_refs first, so we
+	 * always have to have at least one ref cached
+	 */
+	if (unlikely(!notif->cached_refs)) {
+		refcount_add(IO_NOTIF_REF_CACHE_NR, &notif->uarg.refcnt);
+		notif->cached_refs += IO_NOTIF_REF_CACHE_NR;
+	}
+}
+
 static inline struct io_notif *io_get_notif(struct io_ring_ctx *ctx,
 					    struct io_notif_slot *slot)
 {
@@ -2979,13 +2998,15 @@ static void io_notif_slot_flush(struct io_notif_slot *slot)
 	__must_hold(&ctx->uring_lock)
 {
 	struct io_notif *notif = slot->notif;
+	int refs = notif->cached_refs + 1;
 
 	slot->notif = NULL;
+	notif->cached_refs = 0;
 
 	if (WARN_ON_ONCE(in_interrupt()))
 		return;
-	/* drop slot's master ref */
-	if (refcount_dec_and_test(&notif->uarg.refcnt))
+	/* drop all cached refs and the slot's master ref */
+	if (refcount_sub_and_test(refs, &notif->uarg.refcnt))
 		io_notif_complete(notif);
 }
 
@@ -6653,6 +6674,7 @@ static int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
 	msg.msg_controllen = 0;
 	msg.msg_namelen = 0;
 	msg.msg_managed_data = 1;
+	msg.msg_ubuf_ref = 1;
 
 	if (req->msgzc.zc_flags & IORING_SENDZC_FIXED_BUF) {
 		ret = __io_import_fixed(WRITE, &msg.msg_iter, req->imu,
@@ -6686,6 +6708,10 @@ static int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
 	msg.msg_ubuf = &notif->uarg;
 
 	ret = sock_sendmsg(sock, &msg);
+	/* check if the send consumed an additional ref */
+	if (likely(!msg.msg_ubuf_ref))
+		io_notif_consume_ref(notif);
+
 	if (likely(ret >= min_ret)) {
 		unsigned zc_flags = req->msgzc.zc_flags;

From patchwork Tue Jun 28 18:56:51 2022
X-Patchwork-Submitter: Pavel Begunkov
X-Patchwork-Id: 12898763
From: Pavel Begunkov <asml.silence@gmail.com>
To: io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: "David S. Miller", Jakub Kicinski, Jonathan Lemon, Willem de Bruijn,
    Jens Axboe, kernel-team@fb.com, Pavel Begunkov
Subject: [RFC net-next v3 29/29] selftests/io_uring: test zerocopy send
Date: Tue, 28 Jun 2022 19:56:51 +0100
Message-Id: <6dd1916dfb3c474e951ea83895ce41778dd1e508.1653992701.git.asml.silence@gmail.com>

Add selftests for io_uring zerocopy sends and io_uring's notification
infrastructure. It is largely influenced by msg_zerocopy and reuses it
on the receive side.
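Before the full listing, this is roughly how a single zerocopy send goes
through the ring in the test. The helper below is a sketch built on the
raw-ring functions the test itself defines further down
(io_uring_get_sqe(), io_uring_prep_sendzc(), io_uring_submit()); it is
not part of the patch:

/* Sketch only: queue one zerocopy send from notification slot 0. */
static void send_one_zc(struct io_uring *ring, int sockfd,
			const void *buf, size_t len)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

	/* slot 0 must have been set up via IORING_REGISTER_NOTIFIERS */
	io_uring_prep_sendzc(sqe, sockfd, buf, len, 0 /* msg_flags */,
			     0 /* slot idx */, IORING_SENDZC_FLUSH);
	sqe->user_data = ZC_TAG;
	io_uring_submit(ring);
	/*
	 * Two CQEs follow: one for the request (user_data == ZC_TAG) and,
	 * because of IORING_SENDZC_FLUSH, one notification CQE carrying
	 * the slot's tag once the buffer is no longer referenced.
	 */
}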
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 tools/testing/selftests/net/Makefile          |   1 +
 .../selftests/net/io_uring_zerocopy_tx.c      | 605 ++++++++++++++++++
 .../selftests/net/io_uring_zerocopy_tx.sh     | 131 ++++
 3 files changed, 737 insertions(+)
 create mode 100644 tools/testing/selftests/net/io_uring_zerocopy_tx.c
 create mode 100755 tools/testing/selftests/net/io_uring_zerocopy_tx.sh

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 464df13831f2..f33a626220eb 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -60,6 +60,7 @@ TEST_GEN_FILES += cmsg_sender
 TEST_GEN_FILES += stress_reuseport_listen
 TEST_PROGS += test_vxlan_vnifiltering.sh
 TEST_GEN_FILES += bind_bhash_test
+TEST_GEN_FILES += io_uring_zerocopy_tx
 
 TEST_FILES := settings
 
diff --git a/tools/testing/selftests/net/io_uring_zerocopy_tx.c b/tools/testing/selftests/net/io_uring_zerocopy_tx.c
new file mode 100644
index 000000000000..899ddc84f8a9
--- /dev/null
+++ b/tools/testing/selftests/net/io_uring_zerocopy_tx.c
@@ -0,0 +1,605 @@
+/* SPDX-License-Identifier: MIT */
+/* based on linux-kernel/tools/testing/selftests/net/msg_zerocopy.c */
+#include <assert.h>
+#include <errno.h>
+#include <error.h>
+#include <fcntl.h>
+#include <limits.h>
+#include <sched.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <arpa/inet.h>
+#include <linux/errqueue.h>
+#include <linux/if_packet.h>
+#include <linux/io_uring.h>
+#include <linux/ipv6.h>
+#include <linux/socket.h>
+#include <linux/sockios.h>
+#include <net/if.h>
+#include <netinet/in.h>
+#include <netinet/ip.h>
+#include <netinet/ip6.h>
+#include <netinet/tcp.h>
+#include <netinet/udp.h>
+#include <signal.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <sys/resource.h>
+#include <sys/socket.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+#define NOTIF_TAG 0xfffffffULL
+#define NONZC_TAG 0
+#define ZC_TAG 1
+
+enum {
+	MODE_NONZC	= 0,
+	MODE_ZC		= 1,
+	MODE_ZC_FIXED	= 2,
+	MODE_MIXED	= 3,
+};
+
+static bool cfg_flush		= false;
+static bool cfg_cork		= false;
+static int  cfg_mode		= MODE_ZC_FIXED;
+static int  cfg_nr_reqs		= 8;
+static int  cfg_family		= PF_UNSPEC;
+static int  cfg_payload_len;
+static int  cfg_port		= 8000;
+static int  cfg_runtime_ms	= 4200;
+
+static socklen_t cfg_alen;
+static struct sockaddr_storage cfg_dst_addr;
+
+static char payload[IP_MAXPACKET] __attribute__((aligned(4096)));
+
+struct io_sq_ring {
+	unsigned *head;
+	unsigned *tail;
+	unsigned *ring_mask;
+	unsigned *ring_entries;
+	unsigned *flags;
+	unsigned *array;
+};
+
+struct io_cq_ring {
+	unsigned *head;
+	unsigned *tail;
+	unsigned *ring_mask;
+	unsigned *ring_entries;
+	struct io_uring_cqe *cqes;
+};
+
+struct io_uring_sq {
+	unsigned *khead;
+	unsigned *ktail;
+	unsigned *kring_mask;
+	unsigned *kring_entries;
+	unsigned *kflags;
+	unsigned *kdropped;
+	unsigned *array;
+	struct io_uring_sqe *sqes;
+
+	unsigned sqe_head;
+	unsigned sqe_tail;
+
+	size_t ring_sz;
+};
+
+struct io_uring_cq {
+	unsigned *khead;
+	unsigned *ktail;
+	unsigned *kring_mask;
+	unsigned *kring_entries;
+	unsigned *koverflow;
+	struct io_uring_cqe *cqes;
+
+	size_t ring_sz;
+};
+
+struct io_uring {
+	struct io_uring_sq sq;
+	struct io_uring_cq cq;
+	int ring_fd;
+};
+
+#ifdef __alpha__
+# ifndef __NR_io_uring_setup
+#  define __NR_io_uring_setup		535
+# endif
+# ifndef __NR_io_uring_enter
+#  define __NR_io_uring_enter		536
+# endif
+# ifndef __NR_io_uring_register
+#  define __NR_io_uring_register	537
+# endif
+#else /* !__alpha__ */
+# ifndef __NR_io_uring_setup
+#  define __NR_io_uring_setup		425
+# endif
+# ifndef __NR_io_uring_enter
+#  define __NR_io_uring_enter		426
+# endif
+# ifndef __NR_io_uring_register
+#  define __NR_io_uring_register	427
+# endif
+#endif
+
+#if defined(__x86_64) || defined(__i386__)
+#define read_barrier()	__asm__ __volatile__("":::"memory")
+#define write_barrier()	__asm__ __volatile__("":::"memory")
+#else
+
+#define read_barrier()	__sync_synchronize()
+#define write_barrier()	__sync_synchronize()
+#endif
+
+static int io_uring_setup(unsigned int entries, struct io_uring_params *p)
+{
+	return syscall(__NR_io_uring_setup, entries, p);
+}
+
+static int io_uring_enter(int fd, unsigned int to_submit,
+			  unsigned int min_complete,
+			  unsigned int flags, sigset_t *sig)
+{
+	return syscall(__NR_io_uring_enter, fd, to_submit, min_complete,
+			flags, sig, _NSIG / 8);
+}
+
+static int io_uring_register_buffers(struct io_uring *ring,
+				     const struct iovec *iovecs,
+				     unsigned nr_iovecs)
+{
+	int ret;
+
+	ret = syscall(__NR_io_uring_register, ring->ring_fd,
+		      IORING_REGISTER_BUFFERS, iovecs, nr_iovecs);
+	return (ret < 0) ? -errno : ret;
+}
+
+static int io_uring_register_notifications(struct io_uring *ring,
+					   unsigned nr,
+					   struct io_uring_notification_slot *slots)
+{
+	int ret;
+	struct io_uring_notification_register r = {
+		.nr_slots = nr,
+		.data = (unsigned long)slots,
+	};
+
+	ret = syscall(__NR_io_uring_register, ring->ring_fd,
+		      IORING_REGISTER_NOTIFIERS, &r, sizeof(r));
+	return (ret < 0) ? -errno : ret;
+}
+
+static int io_uring_mmap(int fd, struct io_uring_params *p,
+			 struct io_uring_sq *sq, struct io_uring_cq *cq)
+{
+	size_t size;
+	void *ptr;
+	int ret;
+
+	sq->ring_sz = p->sq_off.array + p->sq_entries * sizeof(unsigned);
+	ptr = mmap(0, sq->ring_sz, PROT_READ | PROT_WRITE,
+		   MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQ_RING);
+	if (ptr == MAP_FAILED)
+		return -errno;
+	sq->khead = ptr + p->sq_off.head;
+	sq->ktail = ptr + p->sq_off.tail;
+	sq->kring_mask = ptr + p->sq_off.ring_mask;
+	sq->kring_entries = ptr + p->sq_off.ring_entries;
+	sq->kflags = ptr + p->sq_off.flags;
+	sq->kdropped = ptr + p->sq_off.dropped;
+	sq->array = ptr + p->sq_off.array;
+
+	size = p->sq_entries * sizeof(struct io_uring_sqe);
+	sq->sqes = mmap(0, size, PROT_READ | PROT_WRITE,
+			MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQES);
+	if (sq->sqes == MAP_FAILED) {
+		ret = -errno;
+err:
+		munmap(sq->khead, sq->ring_sz);
+		return ret;
+	}
+
+	cq->ring_sz = p->cq_off.cqes + p->cq_entries * sizeof(struct io_uring_cqe);
+	ptr = mmap(0, cq->ring_sz, PROT_READ | PROT_WRITE,
+		   MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_CQ_RING);
+	if (ptr == MAP_FAILED) {
+		ret = -errno;
+		munmap(sq->sqes, p->sq_entries * sizeof(struct io_uring_sqe));
+		goto err;
+	}
+	cq->khead = ptr + p->cq_off.head;
+	cq->ktail = ptr + p->cq_off.tail;
+	cq->kring_mask = ptr + p->cq_off.ring_mask;
+	cq->kring_entries = ptr + p->cq_off.ring_entries;
+	cq->koverflow = ptr + p->cq_off.overflow;
+	cq->cqes = ptr + p->cq_off.cqes;
+	return 0;
+}
+
+static int io_uring_queue_init(unsigned entries, struct io_uring *ring,
+			       unsigned flags)
+{
+	struct io_uring_params p;
+	int fd, ret;
+
+	memset(ring, 0, sizeof(*ring));
+	memset(&p, 0, sizeof(p));
+	p.flags = flags;
+
+	fd = io_uring_setup(entries, &p);
+	if (fd < 0)
+		return fd;
+	ret = io_uring_mmap(fd, &p, &ring->sq, &ring->cq);
+	if (!ret)
+		ring->ring_fd = fd;
+	else
+		close(fd);
+	return ret;
+}
+
+static int io_uring_submit(struct io_uring *ring)
+{
+	struct io_uring_sq *sq = &ring->sq;
+	const unsigned mask = *sq->kring_mask;
+	unsigned ktail, submitted, to_submit;
+	int ret;
+
+	read_barrier();
+	if (*sq->khead != *sq->ktail) {
+		submitted = *sq->kring_entries;
+		goto submit;
+	}
+	if (sq->sqe_head == sq->sqe_tail)
+		return 0;
+
+	ktail = *sq->ktail;
+	to_submit = sq->sqe_tail - sq->sqe_head;
+	for (submitted = 0; submitted < to_submit; submitted++) {
+		read_barrier();
+		sq->array[ktail++ & mask] = sq->sqe_head++ & mask;
+	}
+	if (!submitted)
+		return 0;
+
+	if (*sq->ktail != ktail) {
+		write_barrier();
+		*sq->ktail = ktail;
+		write_barrier();
+	}
+submit:
+	ret = io_uring_enter(ring->ring_fd, submitted, 0,
+			     IORING_ENTER_GETEVENTS, NULL);
+	return ret < 0 ? -errno : ret;
+}
+
+static inline void io_uring_prep_send(struct io_uring_sqe *sqe, int sockfd,
+				      const void *buf, size_t len, int flags)
+{
+	memset(sqe, 0, sizeof(*sqe));
+	sqe->opcode = (__u8) IORING_OP_SEND;
+	sqe->fd = sockfd;
+	sqe->addr = (unsigned long) buf;
+	sqe->len = len;
+	sqe->msg_flags = (__u32) flags;
+}
+
+static inline void io_uring_prep_sendzc(struct io_uring_sqe *sqe, int sockfd,
+					const void *buf, size_t len, int flags,
+					unsigned slot_idx, unsigned zc_flags)
+{
+	io_uring_prep_send(sqe, sockfd, buf, len, flags);
+	sqe->opcode = (__u8) IORING_OP_SENDZC;
+	sqe->notification_idx = slot_idx;
+	sqe->ioprio = zc_flags;
+}
+
+static struct io_uring_sqe *io_uring_get_sqe(struct io_uring *ring)
+{
+	struct io_uring_sq *sq = &ring->sq;
+
+	if (sq->sqe_tail + 1 - sq->sqe_head > *sq->kring_entries)
+		return NULL;
+	return &sq->sqes[sq->sqe_tail++ & *sq->kring_mask];
+}
+
+static int io_uring_wait_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_ptr)
+{
+	struct io_uring_cq *cq = &ring->cq;
+	const unsigned mask = *cq->kring_mask;
+	unsigned head = *cq->khead;
+	int ret;
+
+	*cqe_ptr = NULL;
+	do {
+		read_barrier();
+		if (head != *cq->ktail) {
+			*cqe_ptr = &cq->cqes[head & mask];
+			break;
+		}
+		ret = io_uring_enter(ring->ring_fd, 0, 1,
+				     IORING_ENTER_GETEVENTS, NULL);
+		if (ret < 0)
+			return -errno;
+	} while (1);
+
+	return 0;
+}
+
+static inline void io_uring_cqe_seen(struct io_uring *ring)
+{
+	*(&ring->cq)->khead += 1;
+	write_barrier();
+}
+
+static unsigned long gettimeofday_ms(void)
+{
+	struct timeval tv;
+
+	gettimeofday(&tv, NULL);
+	return (tv.tv_sec * 1000) + (tv.tv_usec / 1000);
+}
+
+static void do_setsockopt(int fd, int level, int optname, int val)
+{
+	if (setsockopt(fd, level, optname, &val, sizeof(val)))
+		error(1, errno, "setsockopt %d.%d: %d", level, optname, val);
+}
+
+static int do_setup_tx(int domain, int type, int protocol)
+{
+	int fd;
+
+	fd = socket(domain, type, protocol);
+	if (fd == -1)
+		error(1, errno, "socket t");
+
+	do_setsockopt(fd, SOL_SOCKET, SO_SNDBUF, 1 << 21);
+
+	if (connect(fd, (void *) &cfg_dst_addr, cfg_alen))
+		error(1, errno, "connect");
+	return fd;
+}
+
+static void do_tx(int domain, int type, int protocol)
+{
+	struct io_uring_notification_slot b[1] = {{.tag = NOTIF_TAG}};
+	struct io_uring_sqe *sqe;
+	struct io_uring_cqe *cqe;
+	unsigned long packets = 0, bytes = 0;
+	struct io_uring ring;
+	struct iovec iov;
+	uint64_t tstop;
+	int i, fd, ret;
+	int compl_cqes = 0;
+
+	fd = do_setup_tx(domain, type, protocol);
+
+	ret = io_uring_queue_init(512, &ring, 0);
+	if (ret)
+		error(1, ret, "io_uring: queue init");
+
+	ret = io_uring_register_notifications(&ring, 1, b);
+	if (ret)
+		error(1, ret, "io_uring: tx ctx registration");
+
+	iov.iov_base = payload;
+	iov.iov_len = cfg_payload_len;
+
+	ret = io_uring_register_buffers(&ring, &iov, 1);
+	if (ret)
+		error(1, ret, "io_uring: buffer registration");
+
+	tstop = gettimeofday_ms() + cfg_runtime_ms;
+	do {
+		if (cfg_cork)
+			do_setsockopt(fd, IPPROTO_UDP, UDP_CORK, 1);
+
+		for (i = 0; i < cfg_nr_reqs; i++) {
+			unsigned zc_flags = 0;
+			unsigned buf_idx = 0;
+			unsigned slot_idx = 0;
+			unsigned mode = cfg_mode;
+			unsigned msg_flags = 0;
+
+			if (cfg_mode == MODE_MIXED)
+				mode = rand() % 3;
+
+			sqe = io_uring_get_sqe(&ring);
+
+			if (mode == MODE_NONZC) {
+				io_uring_prep_send(sqe, fd, payload,
+						   cfg_payload_len, msg_flags);
+				sqe->user_data = NONZC_TAG;
+			} else {
+				if (cfg_flush) {
+					zc_flags |= IORING_SENDZC_FLUSH;
+					compl_cqes++;
+				}
+				io_uring_prep_sendzc(sqe, fd, payload,
+						     cfg_payload_len,
+						     msg_flags, slot_idx, zc_flags);
+				if (mode == MODE_ZC_FIXED) {
+					sqe->ioprio |= IORING_SENDZC_FIXED_BUF;
+					sqe->buf_index = buf_idx;
+				}
+				sqe->user_data = ZC_TAG;
+			}
+		}
+
+		ret = io_uring_submit(&ring);
+		if (ret != cfg_nr_reqs)
+			error(1, ret, "submit");
+
+		for (i = 0; i < cfg_nr_reqs; i++) {
+			ret = io_uring_wait_cqe(&ring, &cqe);
+			if (ret)
+				error(1, ret, "wait cqe");
+
+			if (cqe->user_data == NOTIF_TAG) {
+				compl_cqes--;
+				i--;
+			} else if (cqe->user_data != NONZC_TAG &&
+				   cqe->user_data != ZC_TAG) {
+				error(1, cqe->res, "invalid user_data");
+			} else if (cqe->res <= 0 && cqe->res != -EAGAIN) {
+				error(1, cqe->res, "send failed");
+			} else {
+				if (cqe->res > 0) {
+					packets++;
+					bytes += cqe->res;
+				}
+				/* failed requests don't flush */
+				if (cfg_flush &&
+				    cqe->res <= 0 &&
+				    cqe->user_data == ZC_TAG)
+					compl_cqes--;
+			}
+			io_uring_cqe_seen(&ring);
+		}
+		if (cfg_cork)
+			do_setsockopt(fd, IPPROTO_UDP, UDP_CORK, 0);
+	} while (gettimeofday_ms() < tstop);
+
+	if (close(fd))
+		error(1, errno, "close");
+
+	fprintf(stderr, "tx=%lu (MB=%lu), tx/s=%lu (MB/s=%lu)\n",
+		packets, bytes >> 20,
+		packets / (cfg_runtime_ms / 1000),
+		(bytes >> 20) / (cfg_runtime_ms / 1000));
+
+	while (compl_cqes) {
+		ret = io_uring_wait_cqe(&ring, &cqe);
+		if (ret)
+			error(1, ret, "wait cqe");
+		io_uring_cqe_seen(&ring);
+		compl_cqes--;
+	}
+}
+
+static void do_test(int domain, int type, int protocol)
+{
+	int i;
+
+	for (i = 0; i < IP_MAXPACKET; i++)
+		payload[i] = 'a' + (i % 26);
+	do_tx(domain, type, protocol);
+}
+
+static void usage(const char *filepath)
+{
+	error(1, 0, "Usage: %s [-f] [-n] [-z0] [-s] "
+		    "(-4|-6) [-t