From patchwork Fri Apr 19 11:08:39 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 13636238 Received: from mail-ej1-f48.google.com (mail-ej1-f48.google.com [209.85.218.48]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5628F537FC; Fri, 19 Apr 2024 11:08:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.218.48 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1713524927; cv=none; b=bWtBXDPVz67MUswLsU1L8MBGkB+yDVmZx5aANoikLSYRvSL6hX5cr/61kSbE0IdKOHsBpQq5q4ylvqfiqp//cr0qDggIXMaj5/SL/CdxUGqSy0AvKV/sniMvY1FvdkGQDSGvPxoqrJDa966Lwot6NnCTCatGVzuCNgIU5KuIJ7o= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1713524927; c=relaxed/simple; bh=qnUyLGRF2KgQTOQZKAS+5p06lue4XABs0mTbEbV0xvs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Xj+f1qnh1pKXj/ujZvHAb7a+RvxDwoSQMTZEjBDw9KjfPjXNpRNqpGCPyqOuao5eJUrn4AGS+/klqYzJ1YHQ86bxGb40rxbqFERxy5hbE2iF6JpG4nZdlfinrxP/SV7mAFvcclfZE/hL0iUvuIGJ0jgg3D8k8EX35nT3jvabJ60= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=BO+b5Thi; arc=none smtp.client-ip=209.85.218.48 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="BO+b5Thi" Received: by mail-ej1-f48.google.com with SMTP id a640c23a62f3a-a5544fd07easo213215066b.0; Fri, 19 Apr 2024 04:08:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=gmail.com; s=20230601; t=1713524923; x=1714129723; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=OOjfjLTpACX+ktm4SIjFbQPdNROuvyJVkhcWINyu6zc=; b=BO+b5Thi5jCeIO83O8npYZUHQg+I6tSsDgXsVq6v4Os1XIlVY3sp0wUi7POo/wT01/ eIxutauMoZr+0ytwirfoM1isNAwv07JcZQzply0Y5F8SgI/mnSk7t7eW35xmqPunEG6s pBHxElTlcit3aS2YmHMGugbDmwbtmxm4GjjpNJ6DQvijTuMdsWmVSqhAr4thCO1wKuPb xJMnFF868rGS1pbqBbUn7P1HPPey/BQpupJQgmlm0VL2PL4wSw3L7vbjScBuZc4aQIao 2e6tfL2LO90gGpF9nm7SspQg+N8/cRepQXKBaNsx6mMq3g+WQT8v2k/IsSIhS4yqN8sn tdIQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1713524923; x=1714129723; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=OOjfjLTpACX+ktm4SIjFbQPdNROuvyJVkhcWINyu6zc=; b=r5EtSaeqhft4aBQQPDH6cx/ZW2t7Py7HyipB34IqB+VoJ4XPUPGkx28KPkwqnYuvtA azdewq2p2wh9kAXUnDH7vGExGFXwFmVpMv97i0VUpk97vOJC47hcwbwUuhGeVOoflgGB SJqztsREnftLQExv6p8yfVI9QTYSNL5o3DHOligZphk/6w14EMGhr+yVN+cFJiZ9MBE8 9zvP6Qy/JLMZC7K21D2TAoOgIfDugE7D4Cu2CDqc/pxVC1jpclJPUhLZZZYxGcYHa0/3 K1N4gdXdyHCHeVKMOoAyI14CipWZUic0xHqGp/KJxXd/+1KjLVcs6m04f01RJUElOwgR rZVw== X-Forwarded-Encrypted: i=1; AJvYcCXI22wvHzVTXwGgthfaNGtXI4AtpkaJ8ZFboemBwXfc9AtCl9RUVcUgZYYgwhz3zexHViuIR1mrnH/PuzPLOaoNkHcTDqcOJaqyD7PAS1fil1khz7/tHHzfNAUx X-Gm-Message-State: AOJu0YxuUd9q0NH3JlhZ99nu5jKAJ52sUR+9UHVLsRIEoQheF7f4jaNF pODGp33EVu10dD42TZyE+vFBWeVDM+V03hBZRucuJozWhUnjK7bLQHkVsQ== X-Google-Smtp-Source: AGHT+IEmBOo8ZD4NY+mkF+vWNE4N/26bHcNn7qxJJxwL9/Pi2Lxd8nenGK/ka+snHuSjnwh79g+UAw== X-Received: by 2002:a17:906:840c:b0:a55:6f32:63b2 with SMTP id n12-20020a170906840c00b00a556f3263b2mr1216223ejx.5.1713524922714; Fri, 19 Apr 2024 04:08:42 -0700 (PDT) Received: from 127.0.0.1localhost ([163.114.131.193]) by smtp.gmail.com with ESMTPSA id 
z13-20020a17090655cd00b00a4739efd7cesm2082525ejp.60.2024.04.19.04.08.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 19 Apr 2024 04:08:42 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org Cc: Jens Axboe , asml.silence@gmail.com, "David S . Miller" , Jakub Kicinski , David Ahern , Eric Dumazet , Willem de Bruijn , Jason Wang , Wei Liu , Paul Durrant , xen-devel@lists.xenproject.org, "Michael S . Tsirkin" , virtualization@lists.linux.dev, kvm@vger.kernel.org Subject: [PATCH io_uring-next/net-next v2 1/4] net: extend ubuf_info callback to ops structure Date: Fri, 19 Apr 2024 12:08:39 +0100 Message-ID: X-Mailer: git-send-email 2.44.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0

We'll need to associate additional callbacks with ubuf_info, so introduce a structure holding ubuf_info callbacks. Apart from the smarter io_uring notification management introduced in the next patches, it can be used to generalise msg_zerocopy_put_abort() and also to store ->sg_from_iter, which is currently passed in struct msghdr.
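For illustration only, here is a minimal userspace sketch of the pattern: the object points at a shared const ops table instead of carrying a single function pointer, and users are identified by comparing the ops pointer. It is not kernel code. All demo_* names are hypothetical stand-ins for struct ubuf_info and struct ubuf_info_ops; only the ->complete member mirrors the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_ubuf;

/* Hypothetical stand-in for struct ubuf_info_ops: one shared const table. */
struct demo_ubuf_ops {
	void (*complete)(struct demo_ubuf *uarg, bool success);
};

/* Hypothetical stand-in for struct ubuf_info: it points at an ops table
 * rather than embedding one callback pointer. */
struct demo_ubuf {
	const struct demo_ubuf_ops *ops;
	int completions;	/* demo-only bookkeeping */
	bool last_success;	/* demo-only bookkeeping */
};

static void demo_complete(struct demo_ubuf *uarg, bool success)
{
	uarg->completions++;
	uarg->last_success = success;
}

static const struct demo_ubuf_ops demo_ops = {
	.complete = demo_complete,
};

/* Same shape as net_zcopy_put(): tolerate NULL, dispatch through ops. */
static void demo_put(struct demo_ubuf *uarg)
{
	if (uarg) {
		assert(uarg->ops && uarg->ops->complete);
		uarg->ops->complete(uarg, true);
	}
}

/* Identify the owner by comparing ops pointers, the way the patch
 * compares against &msg_zerocopy_ubuf_ops. */
static bool demo_is_demo_ubuf(const struct demo_ubuf *uarg)
{
	return uarg->ops == &demo_ops;
}

/* Returns the completion count after one real put and one NULL put. */
int demo_ops_run(void)
{
	struct demo_ubuf u = { .ops = &demo_ops };

	demo_put(&u);
	demo_put(NULL);
	return (demo_is_demo_ubuf(&u) && u.last_success) ? u.completions : -1;
}
```

Adding another callback later only means adding a member to the ops table; the per-object structure stays one pointer wide.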
Reviewed-by: Jens Axboe Reviewed-by: David Ahern Signed-off-by: Pavel Begunkov Reviewed-by: Willem de Bruijn --- drivers/net/tap.c | 2 +- drivers/net/tun.c | 2 +- drivers/net/xen-netback/common.h | 5 ++--- drivers/net/xen-netback/interface.c | 2 +- drivers/net/xen-netback/netback.c | 11 ++++++++--- drivers/vhost/net.c | 8 ++++++-- include/linux/skbuff.h | 19 +++++++++++-------- io_uring/notif.c | 8 ++++++-- net/core/skbuff.c | 16 ++++++++++------ 9 files changed, 46 insertions(+), 27 deletions(-) diff --git a/drivers/net/tap.c b/drivers/net/tap.c index 9f0495e8df4d..bfdd3875fe86 100644 --- a/drivers/net/tap.c +++ b/drivers/net/tap.c @@ -754,7 +754,7 @@ static ssize_t tap_get_user(struct tap_queue *q, void *msg_control, skb_zcopy_init(skb, msg_control); } else if (msg_control) { struct ubuf_info *uarg = msg_control; - uarg->callback(NULL, uarg, false); + uarg->ops->complete(NULL, uarg, false); } dev_queue_xmit(skb); diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 0b3f21cba552..b7401d990680 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1906,7 +1906,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, skb_zcopy_init(skb, msg_control); } else if (msg_control) { struct ubuf_info *uarg = msg_control; - uarg->callback(NULL, uarg, false); + uarg->ops->complete(NULL, uarg, false); } skb_reset_network_header(skb); diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h index 1fcbd83f7ff2..17421da139f2 100644 --- a/drivers/net/xen-netback/common.h +++ b/drivers/net/xen-netback/common.h @@ -390,9 +390,8 @@ bool xenvif_rx_queue_tail(struct xenvif_queue *queue, struct sk_buff *skb); void xenvif_carrier_on(struct xenvif *vif); -/* Callback from stack when TX packet can be released */ -void xenvif_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *ubuf, - bool zerocopy_success); +/* Callbacks from stack when TX packet can be released */ +extern const struct ubuf_info_ops xenvif_ubuf_ops; static inline 
pending_ring_idx_t nr_pending_reqs(struct xenvif_queue *queue) { diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c index 7cff90aa8d24..65db5f14465f 100644 --- a/drivers/net/xen-netback/interface.c +++ b/drivers/net/xen-netback/interface.c @@ -593,7 +593,7 @@ int xenvif_init_queue(struct xenvif_queue *queue) for (i = 0; i < MAX_PENDING_REQS; i++) { queue->pending_tx_info[i].callback_struct = (struct ubuf_info_msgzc) - { { .callback = xenvif_zerocopy_callback }, + { { .ops = &xenvif_ubuf_ops }, { { .ctx = NULL, .desc = i } } }; queue->grant_tx_handle[i] = NETBACK_INVALID_HANDLE; diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index 48254fc07d64..5836995d6774 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -1157,7 +1157,7 @@ static int xenvif_handle_frag_list(struct xenvif_queue *queue, struct sk_buff *s uarg = skb_shinfo(skb)->destructor_arg; /* increase inflight counter to offset decrement in callback */ atomic_inc(&queue->inflight_packets); - uarg->callback(NULL, uarg, true); + uarg->ops->complete(NULL, uarg, true); skb_shinfo(skb)->destructor_arg = NULL; /* Fill the skb with the new (local) frags. 
*/ @@ -1279,8 +1279,9 @@ static int xenvif_tx_submit(struct xenvif_queue *queue) return work_done; } -void xenvif_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *ubuf_base, - bool zerocopy_success) +static void xenvif_zerocopy_callback(struct sk_buff *skb, + struct ubuf_info *ubuf_base, + bool zerocopy_success) { unsigned long flags; pending_ring_idx_t index; @@ -1313,6 +1314,10 @@ void xenvif_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *ubuf_base, xenvif_skb_zerocopy_complete(queue); } +const struct ubuf_info_ops xenvif_ubuf_ops = { + .complete = xenvif_zerocopy_callback, +}; + static inline void xenvif_tx_dealloc_action(struct xenvif_queue *queue) { struct gnttab_unmap_grant_ref *gop; diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index c64ded183f8d..f16279351db5 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -380,7 +380,7 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net, } } -static void vhost_zerocopy_callback(struct sk_buff *skb, +static void vhost_zerocopy_complete(struct sk_buff *skb, struct ubuf_info *ubuf_base, bool success) { struct ubuf_info_msgzc *ubuf = uarg_to_msgzc(ubuf_base); @@ -408,6 +408,10 @@ static void vhost_zerocopy_callback(struct sk_buff *skb, rcu_read_unlock_bh(); } +static const struct ubuf_info_ops vhost_ubuf_ops = { + .complete = vhost_zerocopy_complete, +}; + static inline unsigned long busy_clock(void) { return local_clock() >> 10; @@ -879,7 +883,7 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock) vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS; ubuf->ctx = nvq->ubufs; ubuf->desc = nvq->upend_idx; - ubuf->ubuf.callback = vhost_zerocopy_callback; + ubuf->ubuf.ops = &vhost_ubuf_ops; ubuf->ubuf.flags = SKBFL_ZEROCOPY_FRAG; refcount_set(&ubuf->ubuf.refcnt, 1); msg.msg_control = &ctl; diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 4072a7ee3859..a44954264746 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ 
-527,6 +527,11 @@ enum { #define SKBFL_ALL_ZEROCOPY (SKBFL_ZEROCOPY_FRAG | SKBFL_PURE_ZEROCOPY | \ SKBFL_DONT_ORPHAN | SKBFL_MANAGED_FRAG_REFS) +struct ubuf_info_ops { + void (*complete)(struct sk_buff *, struct ubuf_info *, + bool zerocopy_success); +}; + /* * The callback notifies userspace to release buffers when skb DMA is done in * lower device, the skb last reference should be 0 when calling this. @@ -536,8 +541,7 @@ enum { * The desc field is used to track userspace buffer index. */ struct ubuf_info { - void (*callback)(struct sk_buff *, struct ubuf_info *, - bool zerocopy_success); + const struct ubuf_info_ops *ops; refcount_t refcnt; u8 flags; }; @@ -1671,14 +1675,13 @@ static inline void skb_set_end_offset(struct sk_buff *skb, unsigned int offset) } #endif +extern const struct ubuf_info_ops msg_zerocopy_ubuf_ops; + struct ubuf_info *msg_zerocopy_realloc(struct sock *sk, size_t size, struct ubuf_info *uarg); void msg_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref); -void msg_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg, - bool success); - int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk, struct sk_buff *skb, struct iov_iter *from, size_t length); @@ -1766,13 +1769,13 @@ static inline void *skb_zcopy_get_nouarg(struct sk_buff *skb) static inline void net_zcopy_put(struct ubuf_info *uarg) { if (uarg) - uarg->callback(NULL, uarg, true); + uarg->ops->complete(NULL, uarg, true); } static inline void net_zcopy_put_abort(struct ubuf_info *uarg, bool have_uref) { if (uarg) { - if (uarg->callback == msg_zerocopy_callback) + if (uarg->ops == &msg_zerocopy_ubuf_ops) msg_zerocopy_put_abort(uarg, have_uref); else if (have_uref) net_zcopy_put(uarg); @@ -1786,7 +1789,7 @@ static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy_success) if (uarg) { if (!skb_zcopy_is_nouarg(skb)) - uarg->callback(skb, uarg, zerocopy_success); + uarg->ops->complete(skb, uarg, zerocopy_success); skb_shinfo(skb)->flags &= 
~SKBFL_ALL_ZEROCOPY; } diff --git a/io_uring/notif.c b/io_uring/notif.c index 3485437b207d..53532d78a947 100644 --- a/io_uring/notif.c +++ b/io_uring/notif.c @@ -23,7 +23,7 @@ void io_notif_tw_complete(struct io_kiocb *notif, struct io_tw_state *ts) io_req_task_complete(notif, ts); } -static void io_tx_ubuf_callback(struct sk_buff *skb, struct ubuf_info *uarg, +static void io_tx_ubuf_complete(struct sk_buff *skb, struct ubuf_info *uarg, bool success) { struct io_notif_data *nd = container_of(uarg, struct io_notif_data, uarg); @@ -43,6 +43,10 @@ static void io_tx_ubuf_callback(struct sk_buff *skb, struct ubuf_info *uarg, __io_req_task_work_add(notif, IOU_F_TWQ_LAZY_WAKE); } +static const struct ubuf_info_ops io_ubuf_ops = { + .complete = io_tx_ubuf_complete, +}; + struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx) __must_hold(&ctx->uring_lock) { @@ -62,7 +66,7 @@ struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx) nd->zc_report = false; nd->account_pages = 0; nd->uarg.flags = IO_NOTIF_UBUF_FLAGS; - nd->uarg.callback = io_tx_ubuf_callback; + nd->uarg.ops = &io_ubuf_ops; refcount_set(&nd->uarg.refcnt, 1); return notif; } diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 37c858dc11a6..0f4cc759824b 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1652,7 +1652,7 @@ static struct ubuf_info *msg_zerocopy_alloc(struct sock *sk, size_t size) return NULL; } - uarg->ubuf.callback = msg_zerocopy_callback; + uarg->ubuf.ops = &msg_zerocopy_ubuf_ops; uarg->id = ((u32)atomic_inc_return(&sk->sk_zckey)) - 1; uarg->len = 1; uarg->bytelen = size; @@ -1678,7 +1678,7 @@ struct ubuf_info *msg_zerocopy_realloc(struct sock *sk, size_t size, u32 bytelen, next; /* there might be non MSG_ZEROCOPY users */ - if (uarg->callback != msg_zerocopy_callback) + if (uarg->ops != &msg_zerocopy_ubuf_ops) return NULL; /* realloc only when socket is locked (TCP, UDP cork), @@ -1789,8 +1789,8 @@ static void __msg_zerocopy_callback(struct ubuf_info_msgzc *uarg) sock_put(sk); } 
-void msg_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg, - bool success) +static void msg_zerocopy_complete(struct sk_buff *skb, struct ubuf_info *uarg, + bool success) { struct ubuf_info_msgzc *uarg_zc = uarg_to_msgzc(uarg); @@ -1799,7 +1799,6 @@ void msg_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg, if (refcount_dec_and_test(&uarg->refcnt)) __msg_zerocopy_callback(uarg_zc); } -EXPORT_SYMBOL_GPL(msg_zerocopy_callback); void msg_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref) { @@ -1809,10 +1808,15 @@ void msg_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref) uarg_to_msgzc(uarg)->len--; if (have_uref) - msg_zerocopy_callback(NULL, uarg, true); + msg_zerocopy_complete(NULL, uarg, true); } EXPORT_SYMBOL_GPL(msg_zerocopy_put_abort); +const struct ubuf_info_ops msg_zerocopy_ubuf_ops = { + .complete = msg_zerocopy_complete, +}; +EXPORT_SYMBOL_GPL(msg_zerocopy_ubuf_ops); + int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb, struct msghdr *msg, int len, struct ubuf_info *uarg) From patchwork Fri Apr 19 11:08:40 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 13636239 Received: from mail-ed1-f41.google.com (mail-ed1-f41.google.com [209.85.208.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9A4B680BEC; Fri, 19 Apr 2024 11:08:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.208.41 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1713524928; cv=none; b=Zv3PXiZFmSTJ5WxJJAhucSq8ANUiEGzddWKS6Z5k+tHme2JTts8V279eErbIJ1DofMjwyhxAQNYdcjuHez2nUZaLOIEtH86S2hSicVv8dLDlE4vM43GDJRaquFl2VuDx97Xtbo9k3mk1yZTlbpbEFbFgljb67s2TvWZYM2EXhl4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1713524928; 
c=relaxed/simple; bh=1eVlmu4WPqTzS1vEzdsZXLCBN4erb9MBcCXhDGVcS9s=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=DXw7knYCgFHwF0L5wJGvbnaidAgmz9z7DlTsLFWuw91B+qp2JS6ePW+IjmQ+Pc8DgYrnSxdJqa4Q2UPhgOUayz4epM7NvMO3Sag5JSes3zEPbuT4dQ190T54Zkr/D2K99qWiuxVUHKjlPNhIcam7S+3HaO7nkQWdH5u0dtA0y1U= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=NITY6W+g; arc=none smtp.client-ip=209.85.208.41 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="NITY6W+g" Received: by mail-ed1-f41.google.com with SMTP id 4fb4d7f45d1cf-56e477db7fbso3145772a12.3; Fri, 19 Apr 2024 04:08:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1713524924; x=1714129724; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=gztq7q+tVBfKfiFB41TR2uivJxMSmSRECNayk7q9P5o=; b=NITY6W+gh/D6k7iDsgwSkeVVJBXE1otCGC5Qs7Y/aUhht6p4l8CPyy6z+86LY6WykK ii/G9SxkpOcBcH3omBR18JmP92kJLs64iew3KRFySE73gUk63FjufhnMTehmaepVw6Qz iZYLhsIfNClk5Tvo2UmkTtc4Sipv34JqaHguGPpnLdKZcxsQmAm+TUPo2dAg3pJ/m+WO jgFxc+3Wc1981wXwy0Wk0yxeCbKI3pOr3snFKRjbEfNZr/SzRhcFNZf9D+/elCNVLTv4 WHfaKTpvFVLJOJ14JkTr7zQnUv/HGFWmlC6pApw5lF8pBSx9Kkref/2nRyrdbxwmg5C6 FwQg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1713524924; x=1714129724; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; 
bh=gztq7q+tVBfKfiFB41TR2uivJxMSmSRECNayk7q9P5o=; b=vqDnicXStK5rcLIKwpwcitoR/JzDxsQ+UTg6PQrNkmm/tuzmHvrRBYx+sl+7d4gJYD hP8fSZD+ZDS3IwZvwXlhpdYqQWDJHNwD1ZwhFpv0CLDUwhGlK0XeJkZYydFeBioFiNXA 0ErTIJhU78Sa/MxszD5GuXTyX7LIaFXlhuWBoZ2g/t5RXWA4le4HqUo5mriHcIVEdm2v a72tT2jEMT/CW1XwMtgSuQ4gcQopD1SVBlBiYNflbyYcB7aZZq8PIAl8fX9QYQVYQjcp lx5JOzr71GCotFSaqv6y4akIgPjeWsZrSs42Fv1kghesRjWCcCzN6IvAWHAo66voldtF /LXA== X-Forwarded-Encrypted: i=1; AJvYcCXmZW+7128IdGKIkXIQCDpHJekebwzyLBo362c3Yz6L9wNct/owwfjAg5nPgZTXJs/yoAr0BE/Ow/q75GQ+DGmj/OoK4P2fYIvfrHFa94L/WveuxnIpCdZDdzof X-Gm-Message-State: AOJu0YwZlN6BTbEddHiDPqNhhLQqnVCi1z/6n3fjvKQL6FdYZKgwffLs B7Wupm9LslwvkpIfOT0wOTStIoGqDoIevETP68cjhRHyOY7Yk26ZU6hsDg== X-Google-Smtp-Source: AGHT+IFLPp89/CqqYt/oDY9TsD6MpTpW+WHomUzd8c9Q6sFQ45RxeiYZXRR2f3kaQNaQYK9HhlusCw== X-Received: by 2002:a17:907:9624:b0:a52:2a36:38bf with SMTP id gb36-20020a170907962400b00a522a3638bfmr1634494ejc.55.1713524924521; Fri, 19 Apr 2024 04:08:44 -0700 (PDT) Received: from 127.0.0.1localhost ([163.114.131.193]) by smtp.gmail.com with ESMTPSA id z13-20020a17090655cd00b00a4739efd7cesm2082525ejp.60.2024.04.19.04.08.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 19 Apr 2024 04:08:43 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org Cc: Jens Axboe , asml.silence@gmail.com, "David S . Miller" , Jakub Kicinski , David Ahern , Eric Dumazet , Willem de Bruijn , Jason Wang , Wei Liu , Paul Durrant , xen-devel@lists.xenproject.org, "Michael S . 
Tsirkin" , virtualization@lists.linux.dev, kvm@vger.kernel.org Subject: [PATCH io_uring-next/net-next v2 2/4] net: add callback for setting a ubuf_info to skb Date: Fri, 19 Apr 2024 12:08:40 +0100 Message-ID: X-Mailer: git-send-email 2.44.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0

At the moment an skb can only have one ubuf_info associated with it, which might be a performance problem for zerocopy sends in cases like TCP via io_uring. Add a callback for assigning a ubuf_info to an skb; this lets us implement smarter assignment later, such as linking ubuf_infos together.

Note that the callback is optional and must be compatible with skb_zcopy_set(), because the net stack might decide to clone an skb and take another reference to the ubuf_info whenever it wishes. Also, a correct implementation should always be able to bind to an skb without a prior ubuf_info; otherwise we could end up in a situation where the send cannot make progress.
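A rough userspace sketch of the dispatch described above, under simplifying assumptions: the demo_* types are hypothetical, the data copy step of skb_zerocopy_iter_stream() is left out, and the legacy branch folds the -EEXIST check and the final assignment together. The demo_link() callback is illustrative only and does not reflect any real implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_uarg;

/* Hypothetical stand-in for an skb: it tracks at most one uarg. */
struct demo_skb {
	struct demo_uarg *uarg;
};

struct demo_uarg_ops {
	/* Optional, may be NULL. */
	int (*link_skb)(struct demo_skb *skb, struct demo_uarg *uarg);
};

struct demo_uarg {
	const struct demo_uarg_ops *ops;
	int stacked;	/* demo only: set when chained behind another uarg */
};

/* Illustrative callback: always binds to an empty skb, otherwise
 * pretends to chain itself behind the uarg that is already there. */
static int demo_link(struct demo_skb *skb, struct demo_uarg *uarg)
{
	if (!skb->uarg)
		skb->uarg = uarg;
	else
		uarg->stacked = 1;
	return 0;
}

/* Use ->link_skb when provided, else fall back to the old
 * one-uarg-per-skb rule. */
int demo_attach(struct demo_skb *skb, struct demo_uarg *uarg)
{
	assert(skb && uarg && uarg->ops);

	if (uarg->ops->link_skb) {
		int err = uarg->ops->link_skb(skb, uarg);

		if (err)
			return err;
	} else {
		if (skb->uarg && skb->uarg != uarg)
			return -EEXIST;
		skb->uarg = uarg;
	}
	return 0;
}

int demo_link_run(void)
{
	static const struct demo_uarg_ops plain_ops = { .link_skb = NULL };
	static const struct demo_uarg_ops link_ops = { .link_skb = demo_link };
	struct demo_uarg a = { .ops = &plain_ops }, b = { .ops = &plain_ops };
	struct demo_uarg c = { .ops = &link_ops };
	struct demo_skb skb = { .uarg = NULL };

	if (demo_attach(&skb, &a) != 0)
		return -1;
	if (demo_attach(&skb, &a) != 0)		/* same uarg again is fine */
		return -2;
	if (demo_attach(&skb, &b) != -EEXIST)	/* second uarg, no callback */
		return -3;
	if (demo_attach(&skb, &c) != 0)		/* callback decides instead */
		return -4;
	return c.stacked;
}
```

Users that leave ->link_skb unset keep the existing behaviour, so the callback is opt-in.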
Reviewed-by: Jens Axboe Reviewed-by: David Ahern Signed-off-by: Pavel Begunkov Reviewed-by: Willem de Bruijn --- include/linux/skbuff.h | 2 ++ net/core/skbuff.c | 20 ++++++++++++++------ 2 files changed, 16 insertions(+), 6 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index a44954264746..f76825e5b92a 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -530,6 +530,8 @@ enum { struct ubuf_info_ops { void (*complete)(struct sk_buff *, struct ubuf_info *, bool zerocopy_success); + /* has to be compatible with skb_zcopy_set() */ + int (*link_skb)(struct sk_buff *skb, struct ubuf_info *uarg); }; /* diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 0f4cc759824b..0c8b82750000 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1824,11 +1824,18 @@ int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb, struct ubuf_info *orig_uarg = skb_zcopy(skb); int err, orig_len = skb->len; - /* An skb can only point to one uarg. This edge case happens when - * TCP appends to an skb, but zerocopy_realloc triggered a new alloc. - */ - if (orig_uarg && uarg != orig_uarg) - return -EEXIST; + if (uarg->ops->link_skb) { + err = uarg->ops->link_skb(skb, uarg); + if (err) + return err; + } else { + /* An skb can only point to one uarg. This edge case happens + * when TCP appends to an skb, but zerocopy_realloc triggered + * a new alloc. 
+ */ + if (orig_uarg && uarg != orig_uarg) + return -EEXIST; + } err = __zerocopy_sg_from_iter(msg, sk, skb, &msg->msg_iter, len); if (err == -EFAULT || (err == -EMSGSIZE && skb->len == orig_len)) { @@ -1842,7 +1849,8 @@ int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb, return err; } - skb_zcopy_set(skb, uarg, NULL); + if (!uarg->ops->link_skb) + skb_zcopy_set(skb, uarg, NULL); return skb->len - orig_len; } EXPORT_SYMBOL_GPL(skb_zerocopy_iter_stream); From patchwork Fri Apr 19 11:08:41 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 13636240 Received: from mail-ej1-f44.google.com (mail-ej1-f44.google.com [209.85.218.44]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 095DB81AC1; Fri, 19 Apr 2024 11:08:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.218.44 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1713524929; cv=none; b=L/b5ycQ573+DrCCSDHZm8n/wo6y9uBN9bR75hTFwZoGg3ufs/TavFkgUlFNh43zT18HEIqgsmrQ2nVylZx9Q9EqX8sQ/zYoFrfSBBAdz/WeGAiwMgU6KIbzgz5fsq0lPjjMidNd+fpyJCrEKaMbkXt/xfw7ScEifA9fy+Kt5nYk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1713524929; c=relaxed/simple; bh=5Ihm49/gOZVBFXJiRL0zEQqhoh6Us3hM99ueJay4DGY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=sEI4hmHNilJ9kNQEwXUMPxDh58dPlITmh4XCS+vUGES37Oc8ccEvea0QMCTjZDbTRWLKU46cHRcHI8fqqwEI3PMSxE6ElwQr8CDHNrdUmizsQ1KyuXgVOy+ra1hTklsObbFoenMbENvNuWvHfH5iIquFvM/Ggfli2MyIzQsUsdY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=VuaqpEZ8; arc=none smtp.client-ip=209.85.218.44 
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="VuaqpEZ8" Received: by mail-ej1-f44.google.com with SMTP id a640c23a62f3a-a5557e3ebcaso316023066b.1; Fri, 19 Apr 2024 04:08:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1713524926; x=1714129726; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=vg0pL2CVxV49Fgeqn+qahTZ1Z4PbsN7awrFUvCLM+ng=; b=VuaqpEZ8By+fANp/BNjRokpJtCpAz4TtawQ/iMeLOE5fxrhr94wW5ho4qdgc5vkUke b+tqUmnGBYZFmcKdkBN/okc007d6Rt2yc/T72pzmtEiEfVR+izzfy7+0iQBkVjGZc4ev xlbrSyJnVSX4m/XpPaL1NwyJsMlKOMxor8bH3l/fHx0Cl+JhkDKfVlnPkKpMgJE/iG5H kfJxFrOOs2wNX1DJFZuNNwYajUqJ2eBfl83jY8mHBXDvSUVjGCHGLGaSxEv3mIEC1oIX 9zsONMf2zRgz1nEykpT+gNum2jPf2Q4OxZRD0Badm02WFsE046SRBjeHbhLJTZHNS9gf FGRw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1713524926; x=1714129726; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=vg0pL2CVxV49Fgeqn+qahTZ1Z4PbsN7awrFUvCLM+ng=; b=h/DDq6k4BCyh1DmJ2j9Ben6Fc9iykijJyGGV9H6JuoaBKk4TD31eHx6ADvgcdOC9gK 8dBcdHD4QmFGsn31gj/CcM5RUvFQ/e5Bys7xE9tsGh+c5h2Aocl+WwVrnYCy62XbEYIM Q7SK1ALy0Z+akI4tlDljalPMcB/e54QNR2RqdNAZWgNGg1517+e/fNb4ZByRLkO+JCRA Ms2998mfZU7W21ZjOvn6zoSJielIFlUivCZMAoh+abGIMIaPmQAqeNoSzhMzasGgrxed 2CMXGbTqTB6MUYl3OT80q+Mf8S1PDotKUDEpEQf+ZO+I+I+oT3aolY61BU+W3uMfwlB/ w6rA== X-Forwarded-Encrypted: i=1; AJvYcCW4ghVD1kmHJHibObWPQmpRx6WxkzAv8TaH1cHHhkwKgLNmyt6HJReN2p6tgzxWdqFiVsWEGTTjfQt4RZyH3+okkpC/CczI22X6z+LZCYhV0ls7Z8jwHIabY+nv X-Gm-Message-State: 
AOJu0YzSHwAz9nSDOZm+touhx3v+hyGECrJY50wM2Xxrb33AwuqIdjVF dqTI+8PrirUvFsmYi14PkxOG6X2C6rojvlo1jP+e55lJi/dSQm38WV4mcg== X-Google-Smtp-Source: AGHT+IGlNNpKG3wGDzOQc1RYm8RL9LY87wF61Oxy7BMYiyIn9QMexWxwwOK9pfN0OrWYDJl4d93LDw== X-Received: by 2002:a17:907:7215:b0:a55:75f7:42fb with SMTP id dr21-20020a170907721500b00a5575f742fbmr4895587ejc.24.1713524926172; Fri, 19 Apr 2024 04:08:46 -0700 (PDT) Received: from 127.0.0.1localhost ([163.114.131.193]) by smtp.gmail.com with ESMTPSA id z13-20020a17090655cd00b00a4739efd7cesm2082525ejp.60.2024.04.19.04.08.44 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 19 Apr 2024 04:08:45 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org Cc: Jens Axboe , asml.silence@gmail.com, "David S . Miller" , Jakub Kicinski , David Ahern , Eric Dumazet , Willem de Bruijn , Jason Wang , Wei Liu , Paul Durrant , xen-devel@lists.xenproject.org, "Michael S . Tsirkin" , virtualization@lists.linux.dev, kvm@vger.kernel.org Subject: [PATCH io_uring-next/net-next v2 3/4] io_uring/notif: simplify io_notif_flush() Date: Fri, 19 Apr 2024 12:08:41 +0100 Message-ID: <19e41652c16718b946a5c80d2ad409df7682e47e.1713369317.git.asml.silence@gmail.com> X-Mailer: git-send-email 2.44.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 io_notif_flush() is partially duplicating io_tx_ubuf_complete(), so instead of duplicating it, make the flush call io_tx_ubuf_complete. 
Reviewed-by: Jens Axboe Signed-off-by: Pavel Begunkov --- io_uring/notif.c | 6 +++--- io_uring/notif.h | 9 +++------ 2 files changed, 6 insertions(+), 9 deletions(-) diff --git a/io_uring/notif.c b/io_uring/notif.c index 53532d78a947..26680176335f 100644 --- a/io_uring/notif.c +++ b/io_uring/notif.c @@ -9,7 +9,7 @@ #include "notif.h" #include "rsrc.h" -void io_notif_tw_complete(struct io_kiocb *notif, struct io_tw_state *ts) +static void io_notif_tw_complete(struct io_kiocb *notif, struct io_tw_state *ts) { struct io_notif_data *nd = io_notif_to_data(notif); @@ -23,8 +23,8 @@ void io_notif_tw_complete(struct io_kiocb *notif, struct io_tw_state *ts) io_req_task_complete(notif, ts); } -static void io_tx_ubuf_complete(struct sk_buff *skb, struct ubuf_info *uarg, - bool success) +void io_tx_ubuf_complete(struct sk_buff *skb, struct ubuf_info *uarg, + bool success) { struct io_notif_data *nd = container_of(uarg, struct io_notif_data, uarg); struct io_kiocb *notif = cmd_to_io_kiocb(nd); diff --git a/io_uring/notif.h b/io_uring/notif.h index 2e25a2fc77d1..2cf9ff6abd7a 100644 --- a/io_uring/notif.h +++ b/io_uring/notif.h @@ -21,7 +21,8 @@ struct io_notif_data { }; struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx); -void io_notif_tw_complete(struct io_kiocb *notif, struct io_tw_state *ts); +void io_tx_ubuf_complete(struct sk_buff *skb, struct ubuf_info *uarg, + bool success); static inline struct io_notif_data *io_notif_to_data(struct io_kiocb *notif) { @@ -33,11 +34,7 @@ static inline void io_notif_flush(struct io_kiocb *notif) { struct io_notif_data *nd = io_notif_to_data(notif); - /* drop slot's master ref */ - if (refcount_dec_and_test(&nd->uarg.refcnt)) { - notif->io_task_work.func = io_notif_tw_complete; - __io_req_task_work_add(notif, IOU_F_TWQ_LAZY_WAKE); - } + io_tx_ubuf_complete(NULL, &nd->uarg, true); } static inline int io_notif_account_mem(struct io_kiocb *notif, unsigned len) From patchwork Fri Apr 19 11:08:42 2024 Content-Type: text/plain; 
charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavel Begunkov X-Patchwork-Id: 13636241 Received: from mail-ej1-f41.google.com (mail-ej1-f41.google.com [209.85.218.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D9FC984D2E; Fri, 19 Apr 2024 11:08:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.218.41 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1713524931; cv=none; b=G2XKu6Wm6nZNWqNeN+PSYSerqpNz0/fnj9sH0qAn9RsC5cmo2kS7ztpp5nbP3khgb4/FYnHtk4M+/oGpXODdNtUQupurXPvWQsvOD3bQV00pIY+P0HpeT6KR1NXcF4PDnqK4vPqoUk3/RAUdJmMVacUOReNnjv5DCvORYcqQQUs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1713524931; c=relaxed/simple; bh=sgd0fnLt4r0RsBT24bZl6Ymc5Cr9jd8UGDKTw3oGxVo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=AfJtab+Ukz3wv3uutNRcZZvZN8lGL9IBlfKA/vA3ahYoFYyqsTql4BtDEzvxrZ0kxYYMMDvZIDhhojIeYEWK3m71hVJ0AhzGnyuvVbu4Qu6A0tNI4NsinKvKd7a5c1J38vKIK30A7ZNQUvSa5oXHmKAlxBC0I3nbn5totIF32UA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=FnQDYn+6; arc=none smtp.client-ip=209.85.218.41 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="FnQDYn+6" Received: by mail-ej1-f41.google.com with SMTP id a640c23a62f3a-a5561b88bb3so207744866b.0; Fri, 19 Apr 2024 04:08:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1713524928; 
x=1714129728; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=2fP3T1HRg4APmOPCr1i20fy3OAFGb9fSrnxRLKsDwOU=; b=FnQDYn+6iSLNMs9xOgGywna20u9bs/S15XW0t1y85QovUidQy5rFapaq06PRVB61Hk Cisnk/PfejXxYWgfSpmCJ28IPV9Mivl6M2WMkS+Pb1noomVBjvUeV4sf0Wgc8c8b2tWc UF82mCV/wBeqY4Do5hZbpSsY215JK2Oi78XqqAkxmviTt5CvQPjcS44UEAsAGavjNQ/0 qbra4Gcq8Qww4/QMR2GDzDNwLqKRZDuD/YoqACVSNNScWHLvvF2rg+pMMB1/34CNl3ak dQAmKbAW9Zgeu6M0nT9m6FG39TAitVhpTQglh3joIAz/Mbx/sZ8pL9PbPU4glUxxL8BP zjQw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1713524928; x=1714129728; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=2fP3T1HRg4APmOPCr1i20fy3OAFGb9fSrnxRLKsDwOU=; b=w6gTjJECk8WQifhJSTDRdbqyNhWVpdjSAJyM1d/KhmE0PW1b6f71+uufsHwasXB7k2 66vfygy8iSuBWAsZMil3UlYE7R6196kHSrVzx6MmtPhLwSQ0uDGFzXrrX9x7iPzUkjX5 i9QvEqmyCnmp9ziHQM5YgkJNH/6G9+j+AXFZgIIJjDn6g2oM94mWxkQmwaDy5eIvP02u BWzTTxVa2RIp+4MngWxKcseiVLXSJB3AJ9S2eH5u1ZdqRhjehFKhRfGsm+gzut0a28nd ethBqJSv+Q0CDi5SIvNKMAbJCOLCWOywBjTnuislRGdYL1/PONJrW1xwPh0+V1TAJV2K HMJA== X-Forwarded-Encrypted: i=1; AJvYcCVCR7jymh/4t7ZkELLLjMo2vzUr9nxKufjLZGj2P/zI15E821EankDuOILkJAMxj5BXcPsSPdLTEzjHyjC/dMBeVIHeNDlc6iu+2hNM7PHeow9DzxLaH2Ef/1Ul X-Gm-Message-State: AOJu0YyX5TYhEdvEm8jqzzPS1bOWVjvPHthlBe94Muhhe+0n/LjNjqPk E8fTy9h0bz2JWVjpAQ9YodKiPGs9JVNAFuLhw6ppvyR+PmWniXda3kYJOA== X-Google-Smtp-Source: AGHT+IFy2QsdrmLfOtT9EYc1USivrPyf0Bwi5GEdqAq0Fpn53sZdiNwWQaN42/Mf/SYmqQzPfik1yw== X-Received: by 2002:a17:906:a206:b0:a52:2c00:9850 with SMTP id r6-20020a170906a20600b00a522c009850mr1380280ejy.59.1713524927726; Fri, 19 Apr 2024 04:08:47 -0700 (PDT) Received: from 127.0.0.1localhost ([163.114.131.193]) by smtp.gmail.com with ESMTPSA id z13-20020a17090655cd00b00a4739efd7cesm2082525ejp.60.2024.04.19.04.08.46 
(version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 19 Apr 2024 04:08:46 -0700 (PDT) From: Pavel Begunkov To: io-uring@vger.kernel.org, netdev@vger.kernel.org Cc: Jens Axboe , asml.silence@gmail.com, "David S . Miller" , Jakub Kicinski , David Ahern , Eric Dumazet , Willem de Bruijn , Jason Wang , Wei Liu , Paul Durrant , xen-devel@lists.xenproject.org, "Michael S . Tsirkin" , virtualization@lists.linux.dev, kvm@vger.kernel.org Subject: [PATCH io_uring-next/net-next v2 4/4] io_uring/notif: implement notification stacking Date: Fri, 19 Apr 2024 12:08:42 +0100 Message-ID: X-Mailer: git-send-email 2.44.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0

The network stack allows only one ubuf_info per skb, and unlike MSG_ZEROCOPY, each io_uring zerocopy send will carry a separate ubuf_info. That means that send requests can't reuse a previously allocated skb and need to get one or more new ones. That's fine for large sends, but otherwise it would spam the stack with lots of skbs carrying just a little data each.

To help with that, implement linking notifications (i.e. io_uring wrappers around ubuf_info) into a list. Each is refcounted by skbs and the stack as usual. Additionally, all non-head entries keep a reference to the head, which they put down when their refcount hits 0. When the head has no more users, it'll efficiently put all notifications in a batch.

As mentioned previously about ->link_skb, the callback implementation always allows binding to an skb without a ubuf_info.
Reviewed-by: Jens Axboe Signed-off-by: Pavel Begunkov --- io_uring/notif.c | 71 +++++++++++++++++++++++++++++++++++++++++++----- io_uring/notif.h | 3 ++ 2 files changed, 67 insertions(+), 7 deletions(-) diff --git a/io_uring/notif.c b/io_uring/notif.c index 26680176335f..d58cdc01e691 100644 --- a/io_uring/notif.c +++ b/io_uring/notif.c @@ -9,18 +9,28 @@ #include "notif.h" #include "rsrc.h" +static const struct ubuf_info_ops io_ubuf_ops; + static void io_notif_tw_complete(struct io_kiocb *notif, struct io_tw_state *ts) { struct io_notif_data *nd = io_notif_to_data(notif); - if (unlikely(nd->zc_report) && (nd->zc_copied || !nd->zc_used)) - notif->cqe.res |= IORING_NOTIF_USAGE_ZC_COPIED; + do { + notif = cmd_to_io_kiocb(nd); - if (nd->account_pages && notif->ctx->user) { - __io_unaccount_mem(notif->ctx->user, nd->account_pages); - nd->account_pages = 0; - } - io_req_task_complete(notif, ts); + lockdep_assert(refcount_read(&nd->uarg.refcnt) == 0); + + if (unlikely(nd->zc_report) && (nd->zc_copied || !nd->zc_used)) + notif->cqe.res |= IORING_NOTIF_USAGE_ZC_COPIED; + + if (nd->account_pages && notif->ctx->user) { + __io_unaccount_mem(notif->ctx->user, nd->account_pages); + nd->account_pages = 0; + } + + nd = nd->next; + io_req_task_complete(notif, ts); + } while (nd); } void io_tx_ubuf_complete(struct sk_buff *skb, struct ubuf_info *uarg, @@ -39,12 +49,56 @@ void io_tx_ubuf_complete(struct sk_buff *skb, struct ubuf_info *uarg, if (!refcount_dec_and_test(&uarg->refcnt)) return; + if (nd->head != nd) { + io_tx_ubuf_complete(skb, &nd->head->uarg, success); + return; + } notif->io_task_work.func = io_notif_tw_complete; __io_req_task_work_add(notif, IOU_F_TWQ_LAZY_WAKE); } +static int io_link_skb(struct sk_buff *skb, struct ubuf_info *uarg) +{ + struct io_notif_data *nd, *prev_nd; + struct io_kiocb *prev_notif, *notif; + struct ubuf_info *prev_uarg = skb_zcopy(skb); + + nd = container_of(uarg, struct io_notif_data, uarg); + notif = cmd_to_io_kiocb(nd); + + if (!prev_uarg) { + 
net_zcopy_get(&nd->uarg); + skb_zcopy_init(skb, &nd->uarg); + return 0; + } + /* handle it separately as we can't link a notif to itself */ + if (unlikely(prev_uarg == &nd->uarg)) + return 0; + /* we can't join two links together, just request a fresh skb */ + if (unlikely(nd->head != nd || nd->next)) + return -EEXIST; + /* don't mix zc providers */ + if (unlikely(prev_uarg->ops != &io_ubuf_ops)) + return -EEXIST; + + prev_nd = container_of(prev_uarg, struct io_notif_data, uarg); + prev_notif = cmd_to_io_kiocb(prev_nd); + + /* make sure all notifications can be finished in the same task_work */ + if (unlikely(notif->ctx != prev_notif->ctx || + notif->task != prev_notif->task)) + return -EEXIST; + + nd->head = prev_nd->head; + nd->next = prev_nd->next; + prev_nd->next = nd; + net_zcopy_get(&nd->head->uarg); + return 0; +} + static const struct ubuf_info_ops io_ubuf_ops = { .complete = io_tx_ubuf_complete, + .link_skb = io_link_skb, }; struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx) @@ -65,6 +119,9 @@ struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx) nd = io_notif_to_data(notif); nd->zc_report = false; nd->account_pages = 0; + nd->next = NULL; + nd->head = nd; + nd->uarg.flags = IO_NOTIF_UBUF_FLAGS; nd->uarg.ops = &io_ubuf_ops; refcount_set(&nd->uarg.refcnt, 1); diff --git a/io_uring/notif.h b/io_uring/notif.h index 2cf9ff6abd7a..f3589cfef4a9 100644 --- a/io_uring/notif.h +++ b/io_uring/notif.h @@ -14,6 +14,9 @@ struct io_notif_data { struct file *file; struct ubuf_info uarg; + struct io_notif_data *next; + struct io_notif_data *head; + unsigned account_pages; bool zc_report; bool zc_used;
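
For readers who want to check the refcounting scheme from the commit message without building a kernel, here is a rough user-space model of it. It is only a sketch under stated assumptions: struct notif and the notif_*() helpers are hypothetical stand-ins for io_notif_data and the code in notif.c, a plain int replaces refcount_t, and there is no skb, task_work or locking. It shows just the three rules: every entry starts as its own head, a linked entry pins the head, and the head completes the whole chain once its count drops to zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct io_notif_data. */
struct notif {
	int refcnt;		/* plain int; the kernel uses refcount_t */
	struct notif *next;	/* next entry in the chain, NULL at the end */
	struct notif *head;	/* first entry of the chain, self if unlinked */
	int completed;		/* set once the entry has been "posted" */
};

/* Same initial state as io_alloc_notif() sets up: one ref, own head. */
static void notif_init(struct notif *n)
{
	n->refcnt = 1;
	n->next = NULL;
	n->head = n;
	n->completed = 0;
}

/*
 * Insert n right after prev, loosely following the tail of io_link_skb().
 * Returns -1 if n is already part of a chain (the patch returns -EEXIST).
 */
static int notif_link(struct notif *prev, struct notif *n)
{
	if (n->head != n || n->next)
		return -1;

	n->head = prev->head;
	n->next = prev->next;
	prev->next = n;
	n->head->refcnt++;	/* a non-head entry holds a ref on the head */
	return 0;
}

/* Complete every entry of the chain in one pass; returns how many. */
static int notif_flush(struct notif *head)
{
	int count = 0;

	for (struct notif *n = head; n; n = n->next) {
		n->completed = 1;
		count++;
	}
	return count;
}

/*
 * Drop one reference. A non-head entry that hits zero forwards the put
 * to its head; the head hitting zero flushes the whole chain.
 * Returns the number of entries completed by this call.
 */
static int notif_put(struct notif *n)
{
	if (--n->refcnt)
		return 0;
	if (n->head != n)
		return notif_put(n->head);
	return notif_flush(n);
}
```

With three entries a, b and c, linking b and then c behind a leaves a holding three references. Dropping b and c completes nothing, and the final put on a completes all three in one pass, which is the batching the commit message describes.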