From patchwork Wed Jun 15 16:20:11 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12882700 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CCD54CCA47E for ; Wed, 15 Jun 2022 16:20:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1354769AbiFOQU0 (ORCPT ); Wed, 15 Jun 2022 12:20:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51710 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1354819AbiFOQUY (ORCPT ); Wed, 15 Jun 2022 12:20:24 -0400 Received: from mail-qk1-x72c.google.com (mail-qk1-x72c.google.com [IPv6:2607:f8b0:4864:20::72c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7C104335; Wed, 15 Jun 2022 09:20:23 -0700 (PDT) Received: by mail-qk1-x72c.google.com with SMTP id p63so9080334qkd.10; Wed, 15 Jun 2022 09:20:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=YwLawl/4zZ8T1eVXZ9RLM+u5R3sy+ZfydBh5GxBZmMA=; b=IEZZioPZnHcgdCMQzwCI7x1L1X+xt0h3NkQLb1Z0k9Ui9/XLi9ULN5SjRrWEExpjjV V5EI6BvvX5bEfXc0N6VLtam4VtV1sZCRfhKoAZ4gJKQW9kAqeLMMGdwh+5Eq88gCWMA+ m46olnCRqkwYP+M+Grx9555jddX+graVhpjIOnTwWDIZjSoZP45SEzBlMtlz9Gew7F9E e/5s+ECy9EKTxtO29gBO3GUWYFMaopF683AIZtG0h830rKV7hG3NEE+CNc1dvoOjx1NI onJeMVhlXy4VazHop5GWsuJQ+uqUghOzQZiEuXwhZPfETlFoh/3D4NMsnuD7kaYdPAWZ u2aw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=YwLawl/4zZ8T1eVXZ9RLM+u5R3sy+ZfydBh5GxBZmMA=; b=mvkn9CnW70CkNaRf2g7lEwNfRj6hM3o9n2+HKxS7st26f8kcvZKOnpb3zxa8sT9D+9 3iCOThDKmk6Xf1vlbKrTf03Sgn5kDhUqGcMJthlfaMSW24zyWwmTxfYmE5q+BI7Ul5u+ b43Qfx2W5iXTriJxVeLTF7blGjSLeTRulUHHNEYHQln1aqesixVh6kAR5MYh3cO93VE4 e/ywIxDrNgqnZ8xIagwTFO04PTHVSFtrn5OkNnJ0wwybw0BazXHwxvNh+CKcsUwuLpbU S0qlVNPqQ0+nIYoO6NE8rLzt9edlflWXDQRLrPWMblNhwQjf4CLtNP4I2Ebo+Kq5GDfr dr6Q== X-Gm-Message-State: AJIora8gSqBBUjmMgdPFrl7l6Ft7c0DUnfgS7kwG0snwyYDzPxBymMML 4NL/FH8HN3XFkOrGq3F/2Ne2x1NMqVI= X-Google-Smtp-Source: AGRyM1vDUfJRUg1hpBu3q+Y6z7sKhw2ZVFIUJvo+3KEZB844Gz/Hmp7NxsITFwq1q98vP5KWUrLugg== X-Received: by 2002:a05:620a:1535:b0:6a6:8fdb:29e4 with SMTP id n21-20020a05620a153500b006a68fdb29e4mr389383qkk.126.1655310022299; Wed, 15 Jun 2022 09:20:22 -0700 (PDT) Received: from pop-os.attlocal.net ([2600:1700:65a0:ab60:a62b:b216:9d84:9f87]) by smtp.gmail.com with ESMTPSA id az7-20020a05620a170700b006a69ee117b6sm11893918qkb.97.2022.06.15.09.20.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 15 Jun 2022 09:20:21 -0700 (PDT) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, Cong Wang , Eric Dumazet , John Fastabend , Daniel Borkmann , Jakub Sitnicki Subject: [Patch bpf-next v4 1/4] tcp: introduce tcp_read_skb() Date: Wed, 15 Jun 2022 09:20:11 -0700 Message-Id: <20220615162014.89193-2-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220615162014.89193-1-xiyou.wangcong@gmail.com> References: <20220615162014.89193-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang This patch inroduces tcp_read_skb() based on tcp_read_sock(), a preparation for the next patch which actually introduces a new sock ops. TCP is special here, because it has tcp_read_sock() which is mainly used by splice(). tcp_read_sock() supports partial read and arbitrary offset, neither of them is needed for sockmap. Cc: Eric Dumazet Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Signed-off-by: Cong Wang Reviewed-by: Eric Dumazet --- include/net/tcp.h | 2 ++ net/ipv4/tcp.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 49 insertions(+) diff --git a/include/net/tcp.h b/include/net/tcp.h index 1e99f5c61f84..878544d0f8f9 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -669,6 +669,8 @@ void tcp_get_info(struct sock *, struct tcp_info *); /* Read 'sendfile()'-style from a TCP socket */ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc, sk_read_actor_t recv_actor); +int tcp_read_skb(struct sock *sk, read_descriptor_t *desc, + sk_read_actor_t recv_actor); void tcp_initialize_rcv_mss(struct sock *sk); diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 9984d23a7f3e..e4adbdb0a5c4 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1709,6 +1709,53 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc, } EXPORT_SYMBOL(tcp_read_sock); +int tcp_read_skb(struct sock *sk, read_descriptor_t *desc, + sk_read_actor_t recv_actor) +{ + struct tcp_sock *tp = tcp_sk(sk); + u32 seq = tp->copied_seq; + struct sk_buff *skb; + int copied = 0; + u32 offset; + + if (sk->sk_state == TCP_LISTEN) + return -ENOTCONN; + + while ((skb = tcp_recv_skb(sk, seq, &offset)) != NULL) { + int used; + + __skb_unlink(skb, &sk->sk_receive_queue); + used = recv_actor(desc, skb, 0, skb->len); + if (used <= 0) { + if (!copied) + copied = used; + break; + } + seq += used; + copied += used; + + if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) { + consume_skb(skb); + ++seq; + break; + } + consume_skb(skb); + if (!desc->count) + break; + WRITE_ONCE(tp->copied_seq, seq); + } + WRITE_ONCE(tp->copied_seq, seq); + + tcp_rcv_space_adjust(sk); + + /* Clean up data we have read: This will do ACK frames. */ + if (copied > 0) + tcp_cleanup_rbuf(sk, copied); + + return copied; +} +EXPORT_SYMBOL(tcp_read_skb); + int tcp_peek_len(struct socket *sock) { return tcp_inq(sock->sk); From patchwork Wed Jun 15 16:20:12 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12882701 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1100BC43334 for ; Wed, 15 Jun 2022 16:20:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1354937AbiFOQU1 (ORCPT ); Wed, 15 Jun 2022 12:20:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52912 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1354903AbiFOQU0 (ORCPT ); Wed, 15 Jun 2022 12:20:26 -0400 Received: from mail-qk1-x72f.google.com (mail-qk1-x72f.google.com [IPv6:2607:f8b0:4864:20::72f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DDBC7F59; Wed, 15 Jun 2022 09:20:24 -0700 (PDT) Received: by mail-qk1-x72f.google.com with SMTP id p63so9080397qkd.10; Wed, 15 Jun 2022 09:20:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=EnkVmrTCO6NOOQd7Nt3hcpjPX52yO6trP6VXuCvre70=; b=nuxnuMRyKYCwIVbC7OdsKiAiWCaAzUg2aLbknxwpDKGOEu++eCoP2GCySY4fCu7eCr EXg0jnwP5pCzIQAVoim0u3OwHWvlHPZ7Alp553IrmHm8imuX9vvTyLUJBisp4ezQygyw 4g8yb9BXR5LkTkWDyZXZH2N4tMehTPwobBy9Bb5nQ+6pr3Hz0hR7Yd9Nc6HDqHskCGI9 3C4rwRMlJZ9OUz02edHEeiblwkoMQ6XXO2ax56Vix60o22oqFkTGhgUyAyR+IfsgjDS1 WK9HDlNsYKhA8d7qnk8+3NTjg7JsfBmqphIlRpz/GPYrLxnPbjegAZH8f+FmIuWq5M2a ccqA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=EnkVmrTCO6NOOQd7Nt3hcpjPX52yO6trP6VXuCvre70=; b=wAAgnbf5X4soGA4WNol80aROK2OueneEoTw9ylHJQ961EAO8IvAU8A5WuuvgAyG0m2 9mfV/c7CS/gyyQImHbQb+tXNMhNt3hN+AfxKzw+8GLxO0g3mgLaL9RvQJjhXnH4D8wmE pvjssTMAPaFghJ8aa5d74ex4L33h1dU6QIfL6PKT6ibza+Tj+FZ5JlHGtg4kYLZzpD5E Gy8IhXg2s3w5HG04Ac/ZA6O5+GdXmNAS3ymeXUgqnal1xMVHCdsPhuFcbxXFALOHUKFy djjx17l/Iu7oQ7pVWVTesSaMr8eh3SytHDmaqKSrpRW/xUEFpazynyOUNFh0cydkKBxd ddhQ== X-Gm-Message-State: AJIora+Qow9nz4yI8BCshzVSklZBnXnf9RXAlBgCt7Dhy1ChiRS//0yF J2rKIX8pCw3SKk/aIfS7D82u9qMtKyY= X-Google-Smtp-Source: AGRyM1s8yaByijgYLRKg4qtYPvzakSiYBMeXi9C1hb8jdmMfoyAAT1emfoHLLjg1JuqB2v4BSg+OVA== X-Received: by 2002:a37:9fc9:0:b0:6a6:c16b:e634 with SMTP id i192-20020a379fc9000000b006a6c16be634mr391539qke.230.1655310023642; Wed, 15 Jun 2022 09:20:23 -0700 (PDT) Received: from pop-os.attlocal.net ([2600:1700:65a0:ab60:a62b:b216:9d84:9f87]) by smtp.gmail.com with ESMTPSA id az7-20020a05620a170700b006a69ee117b6sm11893918qkb.97.2022.06.15.09.20.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 15 Jun 2022 09:20:23 -0700 (PDT) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, Cong Wang , Eric Dumazet , John Fastabend , Daniel Borkmann , Jakub Sitnicki Subject: [Patch bpf-next v4 2/4] net: introduce a new proto_ops ->read_skb() Date: Wed, 15 Jun 2022 09:20:12 -0700 Message-Id: <20220615162014.89193-3-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220615162014.89193-1-xiyou.wangcong@gmail.com> References: <20220615162014.89193-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang Currently both splice() and sockmap use ->read_sock() to read skb from receive queue, but for sockmap we only read one entire skb at a time, so ->read_sock() is too conservative to use. Introduce a new proto_ops ->read_skb() which supports this sematic, with this we can finally pass the ownership of skb to recv actors. For non-TCP protocols, all ->read_sock() can be simply converted to ->read_skb(). Cc: Eric Dumazet Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Signed-off-by: Cong Wang --- include/linux/net.h | 4 ++++ include/net/tcp.h | 3 +-- include/net/udp.h | 3 +-- net/core/skmsg.c | 20 +++++--------------- net/ipv4/af_inet.c | 3 ++- net/ipv4/tcp.c | 9 +++------ net/ipv4/udp.c | 10 ++++------ net/ipv6/af_inet6.c | 3 ++- net/unix/af_unix.c | 23 +++++++++-------------- 9 files changed, 31 insertions(+), 47 deletions(-) diff --git a/include/linux/net.h b/include/linux/net.h index 12093f4db50c..a03485e8cbb2 100644 --- a/include/linux/net.h +++ b/include/linux/net.h @@ -152,6 +152,8 @@ struct module; struct sk_buff; typedef int (*sk_read_actor_t)(read_descriptor_t *, struct sk_buff *, unsigned int, size_t); +typedef int (*skb_read_actor_t)(struct sock *, struct sk_buff *); + struct proto_ops { int family; @@ -214,6 +216,8 @@ struct proto_ops { */ int (*read_sock)(struct sock *sk, read_descriptor_t *desc, sk_read_actor_t recv_actor); + /* This is different from read_sock(), it reads an entire skb at a time. */ + int (*read_skb)(struct sock *sk, skb_read_actor_t recv_actor); int (*sendpage_locked)(struct sock *sk, struct page *page, int offset, size_t size, int flags); int (*sendmsg_locked)(struct sock *sk, struct msghdr *msg, diff --git a/include/net/tcp.h b/include/net/tcp.h index 878544d0f8f9..3aa859c9a0fb 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -669,8 +669,7 @@ void tcp_get_info(struct sock *, struct tcp_info *); /* Read 'sendfile()'-style from a TCP socket */ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc, sk_read_actor_t recv_actor); -int tcp_read_skb(struct sock *sk, read_descriptor_t *desc, - sk_read_actor_t recv_actor); +int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor); void tcp_initialize_rcv_mss(struct sock *sk); diff --git a/include/net/udp.h b/include/net/udp.h index b83a00330566..47a0e3359771 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -305,8 +305,7 @@ struct sock *__udp6_lib_lookup(struct net *net, struct sk_buff *skb); struct sock *udp6_lib_lookup_skb(const struct sk_buff *skb, __be16 sport, __be16 dport); -int udp_read_sock(struct sock *sk, read_descriptor_t *desc, - sk_read_actor_t recv_actor); +int udp_read_skb(struct sock *sk, skb_read_actor_t recv_actor); /* UDP uses skb->dev_scratch to cache as much information as possible and avoid * possibly multiple cache miss on dequeue() diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 7e03f96e441b..f7f63b7d990c 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -1160,21 +1160,17 @@ static void sk_psock_done_strp(struct sk_psock *psock) } #endif /* CONFIG_BPF_STREAM_PARSER */ -static int sk_psock_verdict_recv(read_descriptor_t *desc, struct sk_buff *skb, - unsigned int offset, size_t orig_len) +static int sk_psock_verdict_recv(struct sock *sk, struct sk_buff *skb) { - struct sock *sk = (struct sock *)desc->arg.data; struct sk_psock *psock; struct bpf_prog *prog; int ret = __SK_DROP; - int len = orig_len; + int len = skb->len; /* clone here so sk_eat_skb() in tcp_read_sock does not drop our data */ skb = skb_clone(skb, GFP_ATOMIC); - if (!skb) { - desc->error = -ENOMEM; + if (!skb) return 0; - } rcu_read_lock(); psock = sk_psock(sk); @@ -1204,16 +1200,10 @@ static int sk_psock_verdict_recv(read_descriptor_t *desc, struct sk_buff *skb, static void sk_psock_verdict_data_ready(struct sock *sk) { struct socket *sock = sk->sk_socket; - read_descriptor_t desc; - if (unlikely(!sock || !sock->ops || !sock->ops->read_sock)) + if (unlikely(!sock || !sock->ops || !sock->ops->read_skb)) return; - - desc.arg.data = sk; - desc.error = 0; - desc.count = 1; - - sock->ops->read_sock(sk, &desc, sk_psock_verdict_recv); + sock->ops->read_skb(sk, sk_psock_verdict_recv); } void sk_psock_start_verdict(struct sock *sk, struct sk_psock *psock) diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 93da9f783bec..f615263855d0 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -1040,6 +1040,7 @@ const struct proto_ops inet_stream_ops = { .sendpage = inet_sendpage, .splice_read = tcp_splice_read, .read_sock = tcp_read_sock, + .read_skb = tcp_read_skb, .sendmsg_locked = tcp_sendmsg_locked, .sendpage_locked = tcp_sendpage_locked, .peek_len = tcp_peek_len, @@ -1067,7 +1068,7 @@ const struct proto_ops inet_dgram_ops = { .setsockopt = sock_common_setsockopt, .getsockopt = sock_common_getsockopt, .sendmsg = inet_sendmsg, - .read_sock = udp_read_sock, + .read_skb = udp_read_skb, .recvmsg = inet_recvmsg, .mmap = sock_no_mmap, .sendpage = inet_sendpage, diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index e4adbdb0a5c4..30350d33cd4f 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1709,8 +1709,7 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc, } EXPORT_SYMBOL(tcp_read_sock); -int tcp_read_skb(struct sock *sk, read_descriptor_t *desc, - sk_read_actor_t recv_actor) +int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor) { struct tcp_sock *tp = tcp_sk(sk); u32 seq = tp->copied_seq; @@ -1725,7 +1724,7 @@ int tcp_read_skb(struct sock *sk, read_descriptor_t *desc, int used; __skb_unlink(skb, &sk->sk_receive_queue); - used = recv_actor(desc, skb, 0, skb->len); + used = recv_actor(sk, skb); if (used <= 0) { if (!copied) copied = used; @@ -1740,9 +1739,7 @@ int tcp_read_skb(struct sock *sk, read_descriptor_t *desc, break; } consume_skb(skb); - if (!desc->count) - break; - WRITE_ONCE(tp->copied_seq, seq); + break; } WRITE_ONCE(tp->copied_seq, seq); diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index aa9f2ec3dc46..0a1e90b80e36 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1795,8 +1795,7 @@ struct sk_buff *__skb_recv_udp(struct sock *sk, unsigned int flags, } EXPORT_SYMBOL(__skb_recv_udp); -int udp_read_sock(struct sock *sk, read_descriptor_t *desc, - sk_read_actor_t recv_actor) +int udp_read_skb(struct sock *sk, skb_read_actor_t recv_actor) { int copied = 0; @@ -1818,7 +1817,7 @@ int udp_read_sock(struct sock *sk, read_descriptor_t *desc, continue; } - used = recv_actor(desc, skb, 0, skb->len); + used = recv_actor(sk, skb); if (used <= 0) { if (!copied) copied = used; @@ -1829,13 +1828,12 @@ int udp_read_sock(struct sock *sk, read_descriptor_t *desc, } kfree_skb(skb); - if (!desc->count) - break; + break; } return copied; } -EXPORT_SYMBOL(udp_read_sock); +EXPORT_SYMBOL(udp_read_skb); /* * This should be easy, if there is something there we diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index 70564ddccc46..1aea5ef9bdea 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -701,6 +701,7 @@ const struct proto_ops inet6_stream_ops = { .sendpage_locked = tcp_sendpage_locked, .splice_read = tcp_splice_read, .read_sock = tcp_read_sock, + .read_skb = tcp_read_skb, .peek_len = tcp_peek_len, #ifdef CONFIG_COMPAT .compat_ioctl = inet6_compat_ioctl, @@ -726,7 +727,7 @@ const struct proto_ops inet6_dgram_ops = { .getsockopt = sock_common_getsockopt, /* ok */ .sendmsg = inet6_sendmsg, /* retpoline's sake */ .recvmsg = inet6_recvmsg, /* retpoline's sake */ - .read_sock = udp_read_sock, + .read_skb = udp_read_skb, .mmap = sock_no_mmap, .sendpage = sock_no_sendpage, .set_peek_off = sk_set_peek_off, diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 654dcef7cfb3..3a96008ec331 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -741,10 +741,8 @@ static ssize_t unix_stream_splice_read(struct socket *, loff_t *ppos, unsigned int flags); static int unix_dgram_sendmsg(struct socket *, struct msghdr *, size_t); static int unix_dgram_recvmsg(struct socket *, struct msghdr *, size_t, int); -static int unix_read_sock(struct sock *sk, read_descriptor_t *desc, - sk_read_actor_t recv_actor); -static int unix_stream_read_sock(struct sock *sk, read_descriptor_t *desc, - sk_read_actor_t recv_actor); +static int unix_read_skb(struct sock *sk, skb_read_actor_t recv_actor); +static int unix_stream_read_skb(struct sock *sk, skb_read_actor_t recv_actor); static int unix_dgram_connect(struct socket *, struct sockaddr *, int, int); static int unix_seqpacket_sendmsg(struct socket *, struct msghdr *, size_t); @@ -798,7 +796,7 @@ static const struct proto_ops unix_stream_ops = { .shutdown = unix_shutdown, .sendmsg = unix_stream_sendmsg, .recvmsg = unix_stream_recvmsg, - .read_sock = unix_stream_read_sock, + .read_skb = unix_stream_read_skb, .mmap = sock_no_mmap, .sendpage = unix_stream_sendpage, .splice_read = unix_stream_splice_read, @@ -823,7 +821,7 @@ static const struct proto_ops unix_dgram_ops = { .listen = sock_no_listen, .shutdown = unix_shutdown, .sendmsg = unix_dgram_sendmsg, - .read_sock = unix_read_sock, + .read_skb = unix_read_skb, .recvmsg = unix_dgram_recvmsg, .mmap = sock_no_mmap, .sendpage = sock_no_sendpage, @@ -2487,8 +2485,7 @@ static int unix_dgram_recvmsg(struct socket *sock, struct msghdr *msg, size_t si return __unix_dgram_recvmsg(sk, msg, size, flags); } -static int unix_read_sock(struct sock *sk, read_descriptor_t *desc, - sk_read_actor_t recv_actor) +static int unix_read_skb(struct sock *sk, skb_read_actor_t recv_actor) { int copied = 0; @@ -2503,7 +2500,7 @@ static int unix_read_sock(struct sock *sk, read_descriptor_t *desc, if (!skb) return err; - used = recv_actor(desc, skb, 0, skb->len); + used = recv_actor(sk, skb); if (used <= 0) { if (!copied) copied = used; @@ -2514,8 +2511,7 @@ static int unix_read_sock(struct sock *sk, read_descriptor_t *desc, } kfree_skb(skb); - if (!desc->count) - break; + break; } return copied; @@ -2650,13 +2646,12 @@ static struct sk_buff *manage_oob(struct sk_buff *skb, struct sock *sk, } #endif -static int unix_stream_read_sock(struct sock *sk, read_descriptor_t *desc, - sk_read_actor_t recv_actor) +static int unix_stream_read_skb(struct sock *sk, skb_read_actor_t recv_actor) { if (unlikely(sk->sk_state != TCP_ESTABLISHED)) return -ENOTCONN; - return unix_read_sock(sk, desc, recv_actor); + return unix_read_skb(sk, recv_actor); } static int unix_stream_read_generic(struct unix_stream_read_state *state, From patchwork Wed Jun 15 16:20:13 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12882702 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 85E5CC433EF for ; Wed, 15 Jun 2022 16:20:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1355265AbiFOQUi (ORCPT ); Wed, 15 Jun 2022 12:20:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52932 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1354909AbiFOQU1 (ORCPT ); Wed, 15 Jun 2022 12:20:27 -0400 Received: from mail-qk1-x72d.google.com (mail-qk1-x72d.google.com [IPv6:2607:f8b0:4864:20::72d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 73F8E26F; Wed, 15 Jun 2022 09:20:26 -0700 (PDT) Received: by mail-qk1-x72d.google.com with SMTP id n197so9121926qke.1; Wed, 15 Jun 2022 09:20:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=2p5T12Sr5Y7Jc+Rm7GQI0KwEROHnmY0n8erF/Ry4hZE=; b=Rj1VPoRrO1P4DrfUCSaX6hMsSL/3dSgruT7Yb3qAUwUAWOJ03w/Sqi9UK+XpSU2JeV k0X5YpThk5qpKkxj1ZU+hBtJgEGaT9+AsWq2XWjglFWW6RDVPXkoK0WbNsKObhBqWX7T ldL3sgLz7/SBi5YvNz3MZotFO0iGwpUEOovX9k3CqOyznZmM0fYcQLKm7Im6lkXsoygh y+y2dDofHpgQaLdM9uap3NCHSfyENKxYIGGexlrL6VdLEHVVVz+OKdko706yOjcUfDGF 5+zvpK6UmHSnEc/f260HbWEQLNH3pm9wSi6z45uMqAt+bw1YGBqLB9tsfNv+37J7SPAH 2kpQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=2p5T12Sr5Y7Jc+Rm7GQI0KwEROHnmY0n8erF/Ry4hZE=; b=HK2knS7BTeLW2zwgGcFtb3M7/Jgj6JVjU1z/UfdfDvaDm2FeuAAvaI3dvQxfONHOS1 WT36ZC4E2UFLqmO12vfQrR4zvqrgJXO7dajHYlHzUeTk4LwEMKcD9T/Z6Z9HU5ejgED5 uU+9RMxdMy9QkCezpWpdTTvuxREMSWB6/7KIw2bimv3t4OL3mBSThprNz7qscM/E9BHb 3FtFth3Byi7eODYTGtahbdvX9Z3pU5ElNq3rg0YDQmZiurGHFxolbcykWSBqb68X+uHD c8utBlyp/pVRSnzaZAKNwGOoheGkPG/tNtjYj/UgMymjf8rOmnRfoImQHmPS10SKu6rA 3HAg== X-Gm-Message-State: AJIora8C7kyElo3Gtr5qRYODDYgrizGfRVH2cVuJmuIiaVNC/9v2LVLD Yido+8tWG9B5KJuUag6VzafwVd13SSQ= X-Google-Smtp-Source: AGRyM1vEeG2o/DKpdD/N2i6QI0WSqWBGz6hxPpP5xe5zrraLBj5goGYEKe3xA4hr8iC46S+ZzvjWqw== X-Received: by 2002:a05:620a:12ed:b0:6a6:b27e:a030 with SMTP id f13-20020a05620a12ed00b006a6b27ea030mr340424qkl.659.1655310025306; Wed, 15 Jun 2022 09:20:25 -0700 (PDT) Received: from pop-os.attlocal.net ([2600:1700:65a0:ab60:a62b:b216:9d84:9f87]) by smtp.gmail.com with ESMTPSA id az7-20020a05620a170700b006a69ee117b6sm11893918qkb.97.2022.06.15.09.20.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 15 Jun 2022 09:20:24 -0700 (PDT) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, Cong Wang , Eric Dumazet , John Fastabend , Daniel Borkmann , Jakub Sitnicki Subject: [Patch bpf-next v4 3/4] skmsg: get rid of skb_clone() Date: Wed, 15 Jun 2022 09:20:13 -0700 Message-Id: <20220615162014.89193-4-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220615162014.89193-1-xiyou.wangcong@gmail.com> References: <20220615162014.89193-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang With ->read_skb() now we have an entire skb dequeued from receive queue, now we just need to grab an addtional refcnt before passing its ownership to recv actors. And we should not touch them any more, particularly for skb->sk. Fortunately, skb->sk is already set for most of the protocols except UDP where skb->sk has been stolen, so we have to fix it up for UDP case. Cc: Eric Dumazet Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Signed-off-by: Cong Wang --- net/core/skmsg.c | 7 +------ net/ipv4/udp.c | 1 + 2 files changed, 2 insertions(+), 6 deletions(-) diff --git a/net/core/skmsg.c b/net/core/skmsg.c index f7f63b7d990c..8b248d289c11 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -1167,10 +1167,7 @@ static int sk_psock_verdict_recv(struct sock *sk, struct sk_buff *skb) int ret = __SK_DROP; int len = skb->len; - /* clone here so sk_eat_skb() in tcp_read_sock does not drop our data */ - skb = skb_clone(skb, GFP_ATOMIC); - if (!skb) - return 0; + skb_get(skb); rcu_read_lock(); psock = sk_psock(sk); @@ -1183,12 +1180,10 @@ static int sk_psock_verdict_recv(struct sock *sk, struct sk_buff *skb) if (!prog) prog = READ_ONCE(psock->progs.skb_verdict); if (likely(prog)) { - skb->sk = sk; skb_dst_drop(skb); skb_bpf_redirect_clear(skb); ret = bpf_prog_run_pin_on_cpu(prog, skb); ret = sk_psock_map_verd(ret, skb_bpf_redirect_fetch(skb)); - skb->sk = NULL; } if (sk_psock_verdict_apply(psock, skb, ret) < 0) len = 0; diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 0a1e90b80e36..b09936ccf709 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1817,6 +1817,7 @@ int udp_read_skb(struct sock *sk, skb_read_actor_t recv_actor) continue; } + WARN_ON(!skb_set_owner_sk_safe(skb, sk)); used = recv_actor(sk, skb); if (used <= 0) { if (!copied) From patchwork Wed Jun 15 16:20:14 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12882703 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5F72ECCA473 for ; Wed, 15 Jun 2022 16:20:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1355005AbiFOQUj (ORCPT ); Wed, 15 Jun 2022 12:20:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52944 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232224AbiFOQU2 (ORCPT ); Wed, 15 Jun 2022 12:20:28 -0400 Received: from mail-qv1-xf29.google.com (mail-qv1-xf29.google.com [IPv6:2607:f8b0:4864:20::f29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D497DB9F; Wed, 15 Jun 2022 09:20:27 -0700 (PDT) Received: by mail-qv1-xf29.google.com with SMTP id 89so9201359qvc.0; Wed, 15 Jun 2022 09:20:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=/YIsaXDf0Vla/CpgZ3fY3VVxEFB+BQX409ICdnuhF6U=; b=VWbaQ9TNHrSILW/Tu+SKzxXjIOWED+dR5EueVKPZTQ30IBefsNMY40tGO5VNSIO477 57QGs/mi21GcgH8t5BzjeCwqdwbhfMcY6QkKsvNKg7EUgXVjliI0bPg2LHkQaaP2J0Ci fDN/4M0/m1RA2NbDv1+qgU1ex9QYKCNqv0VmZp5bUPYKVIjNJDg4iOrwwz1JMnhmONM6 9Sh4i0Vo6eC2YqyrzIGjoaS0duTNg3DHsM7SJ0YYVa4agf3XLaFuZHFlavWC82K2sYS0 onJJtOH0nqVGKHMVAzamy5YHWCBq8zIhpexyPIhoF6HW2Z1o3ShioxC6OUH4AOL8NhDL Qssw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=/YIsaXDf0Vla/CpgZ3fY3VVxEFB+BQX409ICdnuhF6U=; b=8Ob1fvdwW/m3drOoLcSaWUyW7xXB8rNzdo0l0hlGgo8naJCP2uvHgPAgSM95UmGYWZ ZZDsIjbiVEp8hUwwSPCX8rAjrL6lcnwsIN8IrPDneueb0IkGCIA9mOYyo7G+aoljLzEE Hi6ouCL6LJAdssE2ftRpd5yf3mScvYUwjx6GeRKGjqUFxe5VciXoKF9kDekoRCN6IN9V ZchONTZJ0EBrtj7eD3oQZjMFiNVHT3GuU5NsyUKV6t9vIhM8gqmlY3kiv7rENNhdOLFX cw5CC5N6BGqhBQE4GYEbHsYAxnE24a5jLpRxd+3dSm6biTY+8ltk1TZOZM/h0QLKPdc/ rmBA== X-Gm-Message-State: AJIora8lUWyMvrjV7Hscmas2jqHqmPbRhYQzBl9LUZ9ywRQHycKnwUq5 g3Z4nYTY2DxZ3Q4Clz7qtY+q0b6Oe84= X-Google-Smtp-Source: AGRyM1v1bcONI72MGMSCwAE1ONnvO0gHfpciAlt2Y/i7YRF3X9aqNHQq/eYUL8ICwnyxn4UXF1QLJQ== X-Received: by 2002:a05:6214:509c:b0:467:d42b:3b43 with SMTP id kk28-20020a056214509c00b00467d42b3b43mr492473qvb.19.1655310026654; Wed, 15 Jun 2022 09:20:26 -0700 (PDT) Received: from pop-os.attlocal.net ([2600:1700:65a0:ab60:a62b:b216:9d84:9f87]) by smtp.gmail.com with ESMTPSA id az7-20020a05620a170700b006a69ee117b6sm11893918qkb.97.2022.06.15.09.20.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 15 Jun 2022 09:20:26 -0700 (PDT) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki Subject: [Patch bpf-next v4 4/4] skmsg: get rid of unncessary memset() Date: Wed, 15 Jun 2022 09:20:14 -0700 Message-Id: <20220615162014.89193-5-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220615162014.89193-1-xiyou.wangcong@gmail.com> References: <20220615162014.89193-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang We always allocate skmsg with kzalloc(), so there is no need to call memset(0) on it, the only thing we need from sk_msg_init() is sg_init_marker(). So introduce a new helper which is just kzalloc()+sg_init_marker(), this saves an unncessary memset(0) for skmsg on fast path. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Signed-off-by: Cong Wang --- net/core/skmsg.c | 23 +++++++++++++---------- 1 file changed, 13 insertions(+), 10 deletions(-) diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 8b248d289c11..4b297d67edb7 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -497,23 +497,27 @@ bool sk_msg_is_readable(struct sock *sk) } EXPORT_SYMBOL_GPL(sk_msg_is_readable); -static struct sk_msg *sk_psock_create_ingress_msg(struct sock *sk, - struct sk_buff *skb) +static struct sk_msg *alloc_sk_msg(void) { struct sk_msg *msg; - if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf) + msg = kzalloc(sizeof(*msg), __GFP_NOWARN | GFP_KERNEL); + if (unlikely(!msg)) return NULL; + sg_init_marker(msg->sg.data, NR_MSG_FRAG_IDS); + return msg; +} - if (!sk_rmem_schedule(sk, skb, skb->truesize)) +static struct sk_msg *sk_psock_create_ingress_msg(struct sock *sk, + struct sk_buff *skb) +{ + if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf) return NULL; - msg = kzalloc(sizeof(*msg), __GFP_NOWARN | GFP_KERNEL); - if (unlikely(!msg)) + if (!sk_rmem_schedule(sk, skb, skb->truesize)) return NULL; - sk_msg_init(msg); - return msg; + return alloc_sk_msg(); } static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb, @@ -590,13 +594,12 @@ static int sk_psock_skb_ingress(struct sk_psock *psock, struct sk_buff *skb, static int sk_psock_skb_ingress_self(struct sk_psock *psock, struct sk_buff *skb, u32 off, u32 len) { - struct sk_msg *msg = kzalloc(sizeof(*msg), __GFP_NOWARN | GFP_ATOMIC); + struct sk_msg *msg = alloc_sk_msg(); struct sock *sk = psock->sk; int err; if (unlikely(!msg)) return -EAGAIN; - sk_msg_init(msg); skb_set_owner_r(skb, sk); err = sk_psock_skb_ingress_enqueue(skb, off, len, psock, sk, msg); if (err < 0)