From patchwork Tue Mar 2 02:37:35 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12111191 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D07EDC433E6 for ; Tue, 2 Mar 2021 08:50:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8E29C64DE8 for ; Tue, 2 Mar 2021 08:50:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1377908AbhCBIoD (ORCPT ); Tue, 2 Mar 2021 03:44:03 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54786 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1443986AbhCBCil (ORCPT ); Mon, 1 Mar 2021 21:38:41 -0500 Received: from mail-oi1-x234.google.com (mail-oi1-x234.google.com [IPv6:2607:f8b0:4864:20::234]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 20EA5C06178B; Mon, 1 Mar 2021 18:37:55 -0800 (PST) Received: by mail-oi1-x234.google.com with SMTP id w69so20485861oif.1; Mon, 01 Mar 2021 18:37:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=R0Ojmw0D2lDoqGY2EK1tsZimcNKC42bH4E+kY1K6FpI=; b=AGcNBf4f7BqYj1puJ/t6NxUMt092Y/xARajCd9ETb+LFdXy0G5huHB+eASxwaheo5H irNR3fcNHyNhyyK5U+84LNEtf3s4E6HywMzhdx+/lZgOdGgUzXIlJprtzTuylJSUuvbH gWpchAGP16Y5yM5wICwQGxAk8npOeHsHp+5Qr/uunR5cg1E3VTewZJiTmwQgIRAeFDTS Ia1XPN/NFZ9qLlJ0C8hFIalXmis2N6Ehr/cLIC635KqrNzRuyJfRdbM9wrtB6g3g1JSl pZ7F2ISpYlP2bVttSw+W1XqYuyMZTScIP7mDhhCMmcI1bpjjuavuhNFfXn1S9mr8wATz v+gw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=R0Ojmw0D2lDoqGY2EK1tsZimcNKC42bH4E+kY1K6FpI=; b=dDue01tc8gL5E8q5zooLeFru3loH75rtpTeTOA2EskRjRaotCjWVUdjKZ0gAhtdqfb 4QQcHCVa2QCEajhKzhZKXw3lA/8K+cI1nM61hDykCR9gll+uBTDiX3WXAVuXpuWOBqU+ GxOOzXx2LXV7Mhntuljm56lx8roxFi9FEU77KzTbysDfeZ8/GiJewsgWpKxgmSblqC7Z 1fuiPE19hWrpgzUTBqRn+v/gL+7urVBrleuE2WCelxik4yDXJ5/l6qzjgl5HvNFHcaCm nKRATuvUtrWOveTsWNrKowChJzUBHcjOblnKJoEyv7H4cKJTRTnjaieG7Lu8ms9dwpim uKwQ== X-Gm-Message-State: AOAM530LSW5YYyX+lbeanHMHGxP2qOrGWj/PSpfiyxPHZbO9Fb1vh/OA nRMvwtCHyAKgSKLV3GAf1/YW4tc4a9+dDg== X-Google-Smtp-Source: ABdhPJwftY1RbFknra67NrwF8kDg5/YZztQimQK3jVrm8gdPFvLIKdRiMGIaeGXG21fhnFemFlfvwg== X-Received: by 2002:aca:2406:: with SMTP id n6mr1674534oic.56.1614652674393; Mon, 01 Mar 2021 18:37:54 -0800 (PST) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:1bb:8d29:39ef:5fe5]) by smtp.gmail.com with ESMTPSA id a30sm100058oiy.42.2021.03.01.18.37.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 01 Mar 2021 18:37:53 -0800 (PST) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v2 1/9] sock_map: introduce BPF_SK_SKB_VERDICT Date: Mon, 1 Mar 2021 18:37:35 -0800 Message-Id: <20210302023743.24123-2-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210302023743.24123-1-xiyou.wangcong@gmail.com> References: <20210302023743.24123-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang Reusing BPF_SK_SKB_STREAM_VERDICT is possible but its name is confusing and more importantly we still want to distinguish them from user-space. So we can just reuse the stream verdict code but introduce a new type of eBPF program, skb_verdict. Users are not allowed to set stream_verdict and skb_verdict at the same time. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- include/linux/skmsg.h | 3 +++ include/uapi/linux/bpf.h | 1 + kernel/bpf/syscall.c | 1 + net/core/skmsg.c | 4 +++- net/core/sock_map.c | 23 ++++++++++++++++++++++- tools/bpf/bpftool/common.c | 1 + tools/bpf/bpftool/prog.c | 1 + tools/include/uapi/linux/bpf.h | 1 + 8 files changed, 33 insertions(+), 2 deletions(-) diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index 6c09d94be2e9..451530d41af7 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -58,6 +58,7 @@ struct sk_psock_progs { struct bpf_prog *msg_parser; struct bpf_prog *stream_parser; struct bpf_prog *stream_verdict; + struct bpf_prog *skb_verdict; }; enum sk_psock_state_bits { @@ -442,6 +443,7 @@ static inline void psock_progs_drop(struct sk_psock_progs *progs) psock_set_prog(&progs->msg_parser, NULL); psock_set_prog(&progs->stream_parser, NULL); psock_set_prog(&progs->stream_verdict, NULL); + psock_set_prog(&progs->skb_verdict, NULL); } int sk_psock_tls_strp_read(struct sk_psock *psock, struct sk_buff *skb); @@ -489,5 +491,6 @@ static inline void skb_bpf_redirect_clear(struct sk_buff *skb) { skb->_sk_redir = 0; } + #endif /* CONFIG_NET_SOCK_MSG */ #endif /* _LINUX_SKMSG_H */ diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index b89af20cfa19..1a08ab00a45e 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -247,6 +247,7 @@ enum bpf_attach_type { BPF_XDP_CPUMAP, BPF_SK_LOOKUP, BPF_XDP, + BPF_SK_SKB_VERDICT, __MAX_BPF_ATTACH_TYPE }; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index c859bc46d06c..afa803a1553e 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -2941,6 +2941,7 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type) return BPF_PROG_TYPE_SK_MSG; case BPF_SK_SKB_STREAM_PARSER: case BPF_SK_SKB_STREAM_VERDICT: + case BPF_SK_SKB_VERDICT: return BPF_PROG_TYPE_SK_SKB; case BPF_LIRC_MODE2: return BPF_PROG_TYPE_LIRC_MODE2; diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 07f54015238a..5efd790f1b47 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -693,7 +693,7 @@ void sk_psock_drop(struct sock *sk, struct sk_psock *psock) rcu_assign_sk_user_data(sk, NULL); if (psock->progs.stream_parser) sk_psock_stop_strp(sk, psock); - else if (psock->progs.stream_verdict) + else if (psock->progs.stream_verdict || psock->progs.skb_verdict) sk_psock_stop_verdict(sk, psock); write_unlock_bh(&sk->sk_callback_lock); sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED); @@ -1010,6 +1010,8 @@ static int sk_psock_verdict_recv(read_descriptor_t *desc, struct sk_buff *skb, } skb_set_owner_r(skb, sk); prog = READ_ONCE(psock->progs.stream_verdict); + if (!prog) + prog = READ_ONCE(psock->progs.skb_verdict); if (likely(prog)) { skb_dst_drop(skb); skb_bpf_redirect_clear(skb); diff --git a/net/core/sock_map.c b/net/core/sock_map.c index dd53a7771d7e..3bddd9dd2da2 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -155,6 +155,8 @@ static void sock_map_del_link(struct sock *sk, strp_stop = true; if (psock->saved_data_ready && stab->progs.stream_verdict) verdict_stop = true; + if (psock->saved_data_ready && stab->progs.skb_verdict) + verdict_stop = true; list_del(&link->list); sk_psock_free_link(link); } @@ -227,7 +229,7 @@ static struct sk_psock *sock_map_psock_get_checked(struct sock *sk) static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs, struct sock *sk) { - struct bpf_prog *msg_parser, *stream_parser, *stream_verdict; + struct bpf_prog *msg_parser, *stream_parser, *stream_verdict, *skb_verdict; struct sk_psock *psock; int ret; @@ -256,6 +258,15 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs, } } + skb_verdict = READ_ONCE(progs->skb_verdict); + if (skb_verdict) { + skb_verdict = bpf_prog_inc_not_zero(skb_verdict); + if (IS_ERR(skb_verdict)) { + ret = PTR_ERR(skb_verdict); + goto out_put_msg_parser; + } + } + psock = sock_map_psock_get_checked(sk); if (IS_ERR(psock)) { ret = PTR_ERR(psock); @@ -265,6 +276,7 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs, if (psock) { if ((msg_parser && READ_ONCE(psock->progs.msg_parser)) || (stream_parser && READ_ONCE(psock->progs.stream_parser)) || + (skb_verdict && READ_ONCE(psock->progs.skb_verdict)) || (stream_verdict && READ_ONCE(psock->progs.stream_verdict))) { sk_psock_put(sk, psock); ret = -EBUSY; @@ -296,6 +308,9 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs, } else if (!stream_parser && stream_verdict && !psock->saved_data_ready) { psock_set_prog(&psock->progs.stream_verdict, stream_verdict); sk_psock_start_verdict(sk,psock); + } else if (!stream_verdict && skb_verdict && !psock->saved_data_ready) { + psock_set_prog(&psock->progs.skb_verdict, skb_verdict); + sk_psock_start_verdict(sk, psock); } write_unlock_bh(&sk->sk_callback_lock); return 0; @@ -304,6 +319,9 @@ static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs, out_drop: sk_psock_put(sk, psock); out_progs: + if (skb_verdict) + bpf_prog_put(skb_verdict); +out_put_msg_parser: if (msg_parser) bpf_prog_put(msg_parser); out_put_stream_parser: @@ -1468,6 +1486,9 @@ static int sock_map_prog_update(struct bpf_map *map, struct bpf_prog *prog, case BPF_SK_SKB_STREAM_VERDICT: pprog = &progs->stream_verdict; break; + case BPF_SK_SKB_VERDICT: + pprog = &progs->skb_verdict; + break; default: return -EOPNOTSUPP; } diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c index 65303664417e..1828bba19020 100644 --- a/tools/bpf/bpftool/common.c +++ b/tools/bpf/bpftool/common.c @@ -57,6 +57,7 @@ const char * const attach_type_name[__MAX_BPF_ATTACH_TYPE] = { [BPF_SK_SKB_STREAM_PARSER] = "sk_skb_stream_parser", [BPF_SK_SKB_STREAM_VERDICT] = "sk_skb_stream_verdict", + [BPF_SK_SKB_VERDICT] = "sk_skb_verdict", [BPF_SK_MSG_VERDICT] = "sk_msg_verdict", [BPF_LIRC_MODE2] = "lirc_mode2", [BPF_FLOW_DISSECTOR] = "flow_dissector", diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c index f2b915b20546..3f067d2d7584 100644 --- a/tools/bpf/bpftool/prog.c +++ b/tools/bpf/bpftool/prog.c @@ -76,6 +76,7 @@ enum dump_mode { static const char * const attach_type_strings[] = { [BPF_SK_SKB_STREAM_PARSER] = "stream_parser", [BPF_SK_SKB_STREAM_VERDICT] = "stream_verdict", + [BPF_SK_SKB_VERDICT] = "skb_verdict", [BPF_SK_MSG_VERDICT] = "msg_verdict", [BPF_FLOW_DISSECTOR] = "flow_dissector", [__MAX_BPF_ATTACH_TYPE] = NULL, diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index b89af20cfa19..1a08ab00a45e 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -247,6 +247,7 @@ enum bpf_attach_type { BPF_XDP_CPUMAP, BPF_SK_LOOKUP, BPF_XDP, + BPF_SK_SKB_VERDICT, __MAX_BPF_ATTACH_TYPE }; From patchwork Tue Mar 2 02:37:36 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12111193 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.9 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNWANTED_LANGUAGE_BODY, URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2BE66C43381 for ; Tue, 2 Mar 2021 08:50:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E343061494 for ; Tue, 2 Mar 2021 08:50:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1377929AbhCBIpK (ORCPT ); Tue, 2 Mar 2021 03:45:10 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54796 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1443987AbhCBCil (ORCPT ); Mon, 1 Mar 2021 21:38:41 -0500 Received: from mail-ot1-x32e.google.com (mail-ot1-x32e.google.com [IPv6:2607:f8b0:4864:20::32e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D9670C06178C; Mon, 1 Mar 2021 18:37:56 -0800 (PST) Received: by mail-ot1-x32e.google.com with SMTP id v12so17501922ott.10; Mon, 01 Mar 2021 18:37:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=j8szdQHqxNM49f5Dn5eqx7rGepeaUprOkfzillf7fmE=; b=poZJmL/g9qNymTsLQdBQPrzyHEh5sb3QIw+QIjIaNj4aaVqu5wtlGSHnfkQNwSK9wk UmKvg4YnbdrKHJmq+9Fh/QG9IbVD8Q1ntNs4XUjPAP5nv6fe6gzMIb73jrOz3TdWEhZd R0UWvTZnOBUvbPLmv/9rur8ACXSAdDkjOqyUdXtYQPDBT6gvwj3aBuOK7wJUmZeQnud2 NntMYqaEnLfftTSf9N7VMfqIoBw8MuKKFD9rp5V9hmp8BXyKMAvBc1d5FkYPQIdKqQp1 qLRzqbtLVEd4JX5z9qM1H6ZO7xXvhdW9rF3FEkci1s2/FcYhH3nNCH0Edqlf//J+5AQC J9ww== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=j8szdQHqxNM49f5Dn5eqx7rGepeaUprOkfzillf7fmE=; b=lA7yKQzTjzUNGAVqWulLAtCOe94FLtV/XRPl7YAb/npSU+Htg6qRWJATIC8lawFKZ7 qpVf26y9RUVmTng3+9xuXcF5YYX3ko6NbW5Txkvsu3HmwAJgvtH115iZjn68HkRisOub 0y1ogSo4+Ki8q1crobtY3BPeJCQuyDgBYtj9pE9X6rFHB9eh9hUNqmkIBzYnd59BQqS1 NC71nOdGMUssI1nozkup05vjD3PqZKY+0RrajxxDWxeB+z6PC8YqKr0d6E8y9tGnCfch FQ2S2LP8P41YI9Ct4TofL4NIOfsiIK4HBqEsuCyx7oDMuD8k9dmAqCRdBJ/bMh7pknXC +gHg== X-Gm-Message-State: AOAM5325cIKYmmmkLgM5Gm6IJCjFUPDer4yKg+iuRKwrpxaft5z0vdSS wIn9knMak0Y6S4jFbTw1Wh9+4IJeEjdt2w== X-Google-Smtp-Source: ABdhPJzM0P0aOvUFaxvob49yEobeyDrhf8CcYpkHn6GWpxio4HG51XnN1nEqhzPVzDK1OC3twZEegA== X-Received: by 2002:a05:6830:8d:: with SMTP id a13mr15857425oto.69.1614652676126; Mon, 01 Mar 2021 18:37:56 -0800 (PST) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:1bb:8d29:39ef:5fe5]) by smtp.gmail.com with ESMTPSA id a30sm100058oiy.42.2021.03.01.18.37.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 01 Mar 2021 18:37:55 -0800 (PST) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v2 2/9] sock: introduce sk_prot->update_proto() Date: Mon, 1 Mar 2021 18:37:36 -0800 Message-Id: <20210302023743.24123-3-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210302023743.24123-1-xiyou.wangcong@gmail.com> References: <20210302023743.24123-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang Currently sockmap calls into each protocol to update the struct proto and replace it. This certainly won't work when the protocol is implemented as a module, for example, AF_UNIX. Introduce a new ops sk->sk_prot->update_proto(), so each protocol can implement its own way to replace the struct proto. This also helps get rid of symbol dependencies on CONFIG_INET. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- include/linux/skmsg.h | 18 +++--------------- include/net/sock.h | 3 +++ include/net/tcp.h | 1 + include/net/udp.h | 1 + net/core/skmsg.c | 5 ----- net/core/sock_map.c | 24 ++++-------------------- net/ipv4/tcp_bpf.c | 23 ++++++++++++++++++++--- net/ipv4/tcp_ipv4.c | 3 +++ net/ipv4/udp.c | 3 +++ net/ipv4/udp_bpf.c | 14 ++++++++++++-- net/ipv6/tcp_ipv6.c | 3 +++ net/ipv6/udp.c | 3 +++ 12 files changed, 56 insertions(+), 45 deletions(-) diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index 451530d41af7..b5df69d5d397 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -98,6 +98,7 @@ struct sk_psock { void (*saved_close)(struct sock *sk, long timeout); void (*saved_write_space)(struct sock *sk); void (*saved_data_ready)(struct sock *sk); + int (*saved_update_proto)(struct sock *sk, bool restore); struct proto *sk_proto; struct sk_psock_work_state work_state; struct work_struct work; @@ -350,25 +351,12 @@ static inline void sk_psock_cork_free(struct sk_psock *psock) } } -static inline void sk_psock_update_proto(struct sock *sk, - struct sk_psock *psock, - struct proto *ops) -{ - /* Pairs with lockless read in sk_clone_lock() */ - WRITE_ONCE(sk->sk_prot, ops); -} - static inline void sk_psock_restore_proto(struct sock *sk, struct sk_psock *psock) { sk->sk_prot->unhash = psock->saved_unhash; - if (inet_csk_has_ulp(sk)) { - tcp_update_ulp(sk, psock->sk_proto, psock->saved_write_space); - } else { - sk->sk_write_space = psock->saved_write_space; - /* Pairs with lockless read in sk_clone_lock() */ - WRITE_ONCE(sk->sk_prot, psock->sk_proto); - } + if (psock->saved_update_proto) + psock->saved_update_proto(sk, true); } static inline void sk_psock_set_state(struct sk_psock *psock, diff --git a/include/net/sock.h b/include/net/sock.h index 636810ddcd9b..0e8577c917e8 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1184,6 +1184,9 @@ struct proto { void (*unhash)(struct sock *sk); void (*rehash)(struct sock *sk); int (*get_port)(struct sock *sk, unsigned short snum); +#ifdef CONFIG_BPF_SYSCALL + int (*update_proto)(struct sock *sk, bool restore); +#endif /* Keeping track of sockets in use */ #ifdef CONFIG_PROC_FS diff --git a/include/net/tcp.h b/include/net/tcp.h index 075de26f449d..2efa4e5ea23d 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -2203,6 +2203,7 @@ struct sk_psock; #ifdef CONFIG_BPF_SYSCALL struct proto *tcp_bpf_get_proto(struct sock *sk, struct sk_psock *psock); +int tcp_bpf_update_proto(struct sock *sk, bool restore); void tcp_bpf_clone(const struct sock *sk, struct sock *newsk); #endif /* CONFIG_BPF_SYSCALL */ diff --git a/include/net/udp.h b/include/net/udp.h index d4d064c59232..df7cc1edc200 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -518,6 +518,7 @@ static inline struct sk_buff *udp_rcv_segment(struct sock *sk, #ifdef CONFIG_BPF_SYSCALL struct sk_psock; struct proto *udp_bpf_get_proto(struct sock *sk, struct sk_psock *psock); +int udp_bpf_update_proto(struct sock *sk, bool restore); #endif #endif /* _UDP_H */ diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 5efd790f1b47..7dbd8344ec89 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -563,11 +563,6 @@ struct sk_psock *sk_psock_init(struct sock *sk, int node) write_lock_bh(&sk->sk_callback_lock); - if (inet_csk_has_ulp(sk)) { - psock = ERR_PTR(-EINVAL); - goto out; - } - if (sk->sk_user_data) { psock = ERR_PTR(-EBUSY); goto out; diff --git a/net/core/sock_map.c b/net/core/sock_map.c index 3bddd9dd2da2..13d2af5bb81c 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -184,26 +184,10 @@ static void sock_map_unref(struct sock *sk, void *link_raw) static int sock_map_init_proto(struct sock *sk, struct sk_psock *psock) { - struct proto *prot; - - switch (sk->sk_type) { - case SOCK_STREAM: - prot = tcp_bpf_get_proto(sk, psock); - break; - - case SOCK_DGRAM: - prot = udp_bpf_get_proto(sk, psock); - break; - - default: + if (!sk->sk_prot->update_proto) return -EINVAL; - } - - if (IS_ERR(prot)) - return PTR_ERR(prot); - - sk_psock_update_proto(sk, psock, prot); - return 0; + psock->saved_update_proto = sk->sk_prot->update_proto; + return sk->sk_prot->update_proto(sk, false); } static struct sk_psock *sock_map_psock_get_checked(struct sock *sk) @@ -570,7 +554,7 @@ static bool sock_map_redirect_allowed(const struct sock *sk) static bool sock_map_sk_is_suitable(const struct sock *sk) { - return sk_is_tcp(sk) || sk_is_udp(sk); + return !!sk->sk_prot->update_proto; } static bool sock_map_sk_state_allowed(const struct sock *sk) diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index 17c322b875fd..737726c8138c 100644 --- a/net/ipv4/tcp_bpf.c +++ b/net/ipv4/tcp_bpf.c @@ -601,19 +601,36 @@ static int tcp_bpf_assert_proto_ops(struct proto *ops) ops->sendpage == tcp_sendpage ? 0 : -ENOTSUPP; } -struct proto *tcp_bpf_get_proto(struct sock *sk, struct sk_psock *psock) +int tcp_bpf_update_proto(struct sock *sk, bool restore) { + struct sk_psock *psock = sk_psock(sk); int family = sk->sk_family == AF_INET6 ? TCP_BPF_IPV6 : TCP_BPF_IPV4; int config = psock->progs.msg_parser ? TCP_BPF_TX : TCP_BPF_BASE; + if (restore) { + if (inet_csk_has_ulp(sk)) { + tcp_update_ulp(sk, psock->sk_proto, psock->saved_write_space); + } else { + sk->sk_write_space = psock->saved_write_space; + /* Pairs with lockless read in sk_clone_lock() */ + WRITE_ONCE(sk->sk_prot, psock->sk_proto); + } + return 0; + } + + if (inet_csk_has_ulp(sk)) + return -EINVAL; + if (sk->sk_family == AF_INET6) { if (tcp_bpf_assert_proto_ops(psock->sk_proto)) - return ERR_PTR(-EINVAL); + return -EINVAL; tcp_bpf_check_v6_needs_rebuild(psock->sk_proto); } - return &tcp_bpf_prots[family][config]; + /* Pairs with lockless read in sk_clone_lock() */ + WRITE_ONCE(sk->sk_prot, &tcp_bpf_prots[family][config]); + return 0; } /* If a child got cloned from a listening socket that had tcp_bpf diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index daad4f99db32..21c9e262d07c 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -2806,6 +2806,9 @@ struct proto tcp_prot = { .hash = inet_hash, .unhash = inet_unhash, .get_port = inet_csk_get_port, +#ifdef CONFIG_BPF_SYSCALL + .update_proto = tcp_bpf_update_proto, +#endif .enter_memory_pressure = tcp_enter_memory_pressure, .leave_memory_pressure = tcp_leave_memory_pressure, .stream_memory_free = tcp_stream_memory_free, diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 4a0478b17243..dbd25b59ce0e 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -2849,6 +2849,9 @@ struct proto udp_prot = { .unhash = udp_lib_unhash, .rehash = udp_v4_rehash, .get_port = udp_v4_get_port, +#ifdef CONFIG_BPF_SYSCALL + .update_proto = udp_bpf_update_proto, +#endif .memory_allocated = &udp_memory_allocated, .sysctl_mem = sysctl_udp_mem, .sysctl_wmem_offset = offsetof(struct net, ipv4.sysctl_udp_wmem_min), diff --git a/net/ipv4/udp_bpf.c b/net/ipv4/udp_bpf.c index 7a94791efc1a..595836088e85 100644 --- a/net/ipv4/udp_bpf.c +++ b/net/ipv4/udp_bpf.c @@ -41,12 +41,22 @@ static int __init udp_bpf_v4_build_proto(void) } core_initcall(udp_bpf_v4_build_proto); -struct proto *udp_bpf_get_proto(struct sock *sk, struct sk_psock *psock) +int udp_bpf_update_proto(struct sock *sk, bool restore) { int family = sk->sk_family == AF_INET ? UDP_BPF_IPV4 : UDP_BPF_IPV6; + struct sk_psock *psock = sk_psock(sk); + + if (restore) { + sk->sk_write_space = psock->saved_write_space; + /* Pairs with lockless read in sk_clone_lock() */ + WRITE_ONCE(sk->sk_prot, psock->sk_proto); + return 0; + } if (sk->sk_family == AF_INET6) udp_bpf_check_v6_needs_rebuild(psock->sk_proto); - return &udp_bpf_prots[family]; + /* Pairs with lockless read in sk_clone_lock() */ + WRITE_ONCE(sk->sk_prot, &udp_bpf_prots[family]); + return 0; } diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index bd44ded7e50c..ea5be7e7fcb8 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -2134,6 +2134,9 @@ struct proto tcpv6_prot = { .hash = inet6_hash, .unhash = inet_unhash, .get_port = inet_csk_get_port, +#ifdef CONFIG_BPF_SYSCALL + .update_proto = tcp_bpf_update_proto, +#endif .enter_memory_pressure = tcp_enter_memory_pressure, .leave_memory_pressure = tcp_leave_memory_pressure, .stream_memory_free = tcp_stream_memory_free, diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index d25e5a9252fd..105ba0cf739d 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -1713,6 +1713,9 @@ struct proto udpv6_prot = { .unhash = udp_lib_unhash, .rehash = udp_v6_rehash, .get_port = udp_v6_get_port, +#ifdef CONFIG_BPF_SYSCALL + .update_proto = udp_bpf_update_proto, +#endif .memory_allocated = &udp_memory_allocated, .sysctl_mem = sysctl_udp_mem, .sysctl_wmem_offset = offsetof(struct net, ipv4.sysctl_udp_wmem_min), From patchwork Tue Mar 2 02:37:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12111189 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A504DC433DB for ; Tue, 2 Mar 2021 08:50:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 65A0A61494 for ; Tue, 2 Mar 2021 08:50:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1377885AbhCBIlx (ORCPT ); Tue, 2 Mar 2021 03:41:53 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54804 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1443988AbhCBCil (ORCPT ); Mon, 1 Mar 2021 21:38:41 -0500 Received: from mail-oi1-x22a.google.com (mail-oi1-x22a.google.com [IPv6:2607:f8b0:4864:20::22a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ACA85C061793; Mon, 1 Mar 2021 18:37:58 -0800 (PST) Received: by mail-oi1-x22a.google.com with SMTP id o3so20440864oic.8; Mon, 01 Mar 2021 18:37:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=1+nWbH2X4eUskfN7h2bAJQ0jwGDRaEfpm7mVkpF/v2w=; b=ROtSuaqMLfI54kztrhx/vsjFPBVqHWxu/JRtbSOqxVPOL2vSfNQ1tF5wrclB3nYran qgtGrmHIeMIKuwctWgN3e4e5BZOsT6J79dLp75zY/3QZb7gWhhDPhV8Bs9WGMfJtZeoz HoBhRstYP5lkW8nOVRchom9uagH0cOE2CXbZpwPS6aFkCBeUqu3S9uxRtoWCGsa1MJiu jia5Or8GlJvhT+C1/FNgA08i0rdtawCfq//mtrA0Mz1C/Dpji2XIxmaZqFE1rib8K1q6 UQFPupACZ+B2iFt8zXk0gXCfAPm7CpQj5lRQglC5f3j32/g7bV+yM8Zk0tzVSs2j5N0j 4jOg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=1+nWbH2X4eUskfN7h2bAJQ0jwGDRaEfpm7mVkpF/v2w=; b=dXT7m3X1IlTEXabW/GbXLzQFLTXk4egL7bVXoeBD+AnmZ7tgvV4Ay7hLLTqP+gm8Z0 bflFswIAVgKo2ShK9h+rmWmHSZD6pcg8aNRNIw6VqnPhiJ/r1wpCZqSAAF99PvNvtF6X iAwK37zrqbE9igTYTxlyhAWMcUoU+5jqBDx3d2szxik22sr5ChkDs+8MqLrevcwN13kV zp90Z0vhuwHJNgA0CeFGgy0ppyQyPoQjX3VUevYUGIfIqFa1TrdGfV8vlSemTKcX/CNV M+98DgQjySr8CwW6Te0pAkZT6QmryakXMJM0cacd2Ildn0+ychhJZsvQhOiyx3DKOhce se0A== X-Gm-Message-State: AOAM530jyX1q1bDcNAwnWz/cz1ZiS7+fH21QCKG3qUO5qxoXxYG2ehX1 YQO+B+KUbniJ/4AopPpfNFer8QMcxojqDQ== X-Google-Smtp-Source: ABdhPJzn+qeD0x3rs4Bf3/YXKrzZA893z5z/hfPUZCLTaRdr3pYNehgeCVFLYzm+EoUGsQ6Vzqf5tA== X-Received: by 2002:aca:d644:: with SMTP id n65mr1718463oig.118.1614652677938; Mon, 01 Mar 2021 18:37:57 -0800 (PST) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:1bb:8d29:39ef:5fe5]) by smtp.gmail.com with ESMTPSA id a30sm100058oiy.42.2021.03.01.18.37.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 01 Mar 2021 18:37:57 -0800 (PST) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v2 3/9] udp: implement ->sendmsg_locked() Date: Mon, 1 Mar 2021 18:37:37 -0800 Message-Id: <20210302023743.24123-4-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210302023743.24123-1-xiyou.wangcong@gmail.com> References: <20210302023743.24123-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang UDP already has udp_sendmsg() which takes lock_sock() inside. We have to build ->sendmsg_locked() on top of it, by adding a new parameter for whether the sock has been locked. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- include/net/udp.h | 1 + net/ipv4/af_inet.c | 1 + net/ipv4/udp.c | 30 +++++++++++++++++++++++------- 3 files changed, 25 insertions(+), 7 deletions(-) diff --git a/include/net/udp.h b/include/net/udp.h index df7cc1edc200..5264ba1439f9 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -292,6 +292,7 @@ int udp_get_port(struct sock *sk, unsigned short snum, int udp_err(struct sk_buff *, u32); int udp_abort(struct sock *sk, int err); int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len); +int udp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t len); int udp_push_pending_frames(struct sock *sk); void udp_flush_pending_frames(struct sock *sk); int udp_cmsg_send(struct sock *sk, struct msghdr *msg, u16 *gso_size); diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index a02ce89b56b5..d8c73a848c53 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -1071,6 +1071,7 @@ const struct proto_ops inet_dgram_ops = { .setsockopt = sock_common_setsockopt, .getsockopt = sock_common_getsockopt, .sendmsg = inet_sendmsg, + .sendmsg_locked = udp_sendmsg_locked, .recvmsg = inet_recvmsg, .mmap = sock_no_mmap, .sendpage = inet_sendpage, diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index dbd25b59ce0e..93db853601d7 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1024,7 +1024,7 @@ int udp_cmsg_send(struct sock *sk, struct msghdr *msg, u16 *gso_size) } EXPORT_SYMBOL_GPL(udp_cmsg_send); -int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) +static int __udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len, bool locked) { struct inet_sock *inet = inet_sk(sk); struct udp_sock *up = udp_sk(sk); @@ -1063,15 +1063,18 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) * There are pending frames. * The socket lock must be held while it's corked. */ - lock_sock(sk); + if (!locked) + lock_sock(sk); if (likely(up->pending)) { if (unlikely(up->pending != AF_INET)) { - release_sock(sk); + if (!locked) + release_sock(sk); return -EINVAL; } goto do_append_data; } - release_sock(sk); + if (!locked) + release_sock(sk); } ulen += sizeof(struct udphdr); @@ -1241,11 +1244,13 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) goto out; } - lock_sock(sk); + if (!locked) + lock_sock(sk); if (unlikely(up->pending)) { /* The socket is already corked while preparing it. */ /* ... which is an evident application bug. --ANK */ - release_sock(sk); + if (!locked) + release_sock(sk); net_dbg_ratelimited("socket already corked\n"); err = -EINVAL; @@ -1272,7 +1277,8 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) err = udp_push_pending_frames(sk); else if (unlikely(skb_queue_empty(&sk->sk_write_queue))) up->pending = 0; - release_sock(sk); + if (!locked) + release_sock(sk); out: ip_rt_put(rt); @@ -1302,8 +1308,18 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) err = 0; goto out; } + +int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) +{ + return __udp_sendmsg(sk, msg, len, false); +} EXPORT_SYMBOL(udp_sendmsg); +int udp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t len) +{ + return __udp_sendmsg(sk, msg, len, true); +} + int udp_sendpage(struct sock *sk, struct page *page, int offset, size_t size, int flags) { From patchwork Tue Mar 2 02:37:38 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12111201 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0F137C4332D for ; Tue, 2 Mar 2021 08:50:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D1FF264DE8 for ; Tue, 2 Mar 2021 08:50:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1377945AbhCBIph (ORCPT ); Tue, 2 Mar 2021 03:45:37 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54810 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1443989AbhCBCim (ORCPT ); Mon, 1 Mar 2021 21:38:42 -0500 Received: from mail-ot1-x332.google.com (mail-ot1-x332.google.com [IPv6:2607:f8b0:4864:20::332]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0AD8FC061794; Mon, 1 Mar 2021 18:38:00 -0800 (PST) Received: by mail-ot1-x332.google.com with SMTP id v12so17502035ott.10; Mon, 01 Mar 2021 18:38:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=GiB0NEAUfxf9GywTLDDCuaFjXeli5Cn9ox8fpeP4HHI=; b=WblbCjmTRubjTckCQ7GDNDx90zL0hklQRWqJhEtl1TlQe4ESx831lDFHR701MC1wBy Ii7Uzq2Rq+6MQbYC65UJsVzRp9nr9ogf9TtkShKZrK1KxjpSXP6dPQ9nJMrIDKWJQxR8 NQo2s5AgcEVBP0mqD6ESBG6S4bjqdCQLsn0HY81LVcv3YXnhwZr6uHbmw2xUjFbRSLB6 PgsnwMJMG/BqXojWEKH+FKlMf4eLU266gFgTf3/y1W+KRccaXaUPBkAAAsREPMbWEtdI 7u5aAGVf6UXK7RWUUXiBPktbqT9Ssau/t/BtJNYFq2e2HbM5YcmQDsSwRfCKlTRe11ZU 3lbA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=GiB0NEAUfxf9GywTLDDCuaFjXeli5Cn9ox8fpeP4HHI=; b=kM3zw2lLb/Jqmt6e4cPw6ytcwJYSUVGDDgoxg3oCPbAHmZXbUINXhp+SIPG2X0WJrE 2oPmgj0VyewE+6IoT0l+GZuEADoATk7Hj0lpy6qF9hx5hVz/Ujs9CZEX0fQ1KPOZpUGs ebKsXN+JECb8zZsn7TFZDqWg+iATC77A+EgUvTSoOAXISJUi18jUYo/PJyEkX12OFmaG +lJphcfMl1JBKYam9by/dTwewj2D/idDZPx/xLRTvCgCJhMaJzZxj2ZewvCZUNeTMtjr Nm80a5E0b7y3zPOj/XKy6VaTnEZLB4MOlx4/hNhGbYdhT87zmbrBUeOASBpclMGHbjee 5fpg== X-Gm-Message-State: AOAM533GyGvQsQAt5o5Qc2qaDrFi1ygNl2QT53UjvrGDCFcXlIsCHkzV BWktRPaZHoOmmL7ermvhNVJUzCe26Twrrg== X-Google-Smtp-Source: ABdhPJxtpkUitb0hkxcYiK7vaF8MpDkNE11YfGGPXUZwMGLLjg6JsbMhop2W7PTssiu8mNRXS5BnAQ== X-Received: by 2002:a9d:340b:: with SMTP id v11mr14334996otb.284.1614652679217; Mon, 01 Mar 2021 18:37:59 -0800 (PST) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:1bb:8d29:39ef:5fe5]) by smtp.gmail.com with ESMTPSA id a30sm100058oiy.42.2021.03.01.18.37.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 01 Mar 2021 18:37:58 -0800 (PST) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v2 4/9] udp: implement ->read_sock() for sockmap Date: Mon, 1 Mar 2021 18:37:38 -0800 Message-Id: <20210302023743.24123-5-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210302023743.24123-1-xiyou.wangcong@gmail.com> References: <20210302023743.24123-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- include/net/udp.h | 2 ++ net/ipv4/af_inet.c | 1 + net/ipv4/udp.c | 34 ++++++++++++++++++++++++++++++++++ 3 files changed, 37 insertions(+) diff --git a/include/net/udp.h b/include/net/udp.h index 5264ba1439f9..44a94cfc63b5 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -330,6 +330,8 @@ struct sock *__udp6_lib_lookup(struct net *net, struct sk_buff *skb); struct sock *udp6_lib_lookup_skb(const struct sk_buff *skb, __be16 sport, __be16 dport); +int udp_read_sock(struct sock *sk, read_descriptor_t *desc, + sk_read_actor_t recv_actor); /* UDP uses skb->dev_scratch to cache as much information as possible and avoid * possibly multiple cache miss on dequeue() diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index d8c73a848c53..df8e8e238756 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -1072,6 +1072,7 @@ const struct proto_ops inet_dgram_ops = { .getsockopt = sock_common_getsockopt, .sendmsg = inet_sendmsg, .sendmsg_locked = udp_sendmsg_locked, + .read_sock = udp_read_sock, .recvmsg = inet_recvmsg, .mmap = sock_no_mmap, .sendpage = inet_sendpage, diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 93db853601d7..54f24b1d4f65 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1798,6 +1798,40 @@ struct sk_buff *__skb_recv_udp(struct sock *sk, unsigned int flags, } EXPORT_SYMBOL(__skb_recv_udp); +int udp_read_sock(struct sock *sk, read_descriptor_t *desc, + sk_read_actor_t recv_actor) +{ + int copied = 0; + + while (1) { + int offset = 0, err; + struct sk_buff *skb; + + skb = __skb_recv_udp(sk, 0, 1, &offset, &err); + if (!skb) + break; + if (offset < skb->len) { + int used; + size_t len; + + len = skb->len - offset; + used = recv_actor(desc, skb, offset, len); + if (used <= 0) { + if (!copied) + copied = used; + break; + } else if (used <= len) { + copied += used; + offset += used; + } + } + if (!desc->count) + break; + } + + return copied; +} + /* * This should be easy, if there is something there we * return it, otherwise we block. From patchwork Tue Mar 2 02:37:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12111195 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2986FC43331 for ; Tue, 2 Mar 2021 08:50:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id EC42264DEC for ; Tue, 2 Mar 2021 08:50:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1377962AbhCBIp5 (ORCPT ); Tue, 2 Mar 2021 03:45:57 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54980 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1444090AbhCBCj1 (ORCPT ); Mon, 1 Mar 2021 21:39:27 -0500 Received: from mail-ot1-x330.google.com (mail-ot1-x330.google.com [IPv6:2607:f8b0:4864:20::330]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4DB22C061797; Mon, 1 Mar 2021 18:38:01 -0800 (PST) Received: by mail-ot1-x330.google.com with SMTP id e45so18583741ote.9; Mon, 01 Mar 2021 18:38:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=DKj01eWVlyMr7t8j3UxFWe5D/QCYMBC3DYw8jt3wc38=; b=FAgVdAJRK+htc197BZv5jzLqwNSdsPRuI5cQIlmQb8koGHt/4ATutydJCIL6a0sJq+ CZjipKEx7a9+aw7vCGKj4lJdqfBts52qGFT1u7J/rGFNIxpLK723SvRs81r7VQJF7chw 5uY2hCSapZpC63Uv4yLPraabrSWf1jK1yc3YTsCy7Onq1PLFrYMRvmZzg8AOcHI2C/Hi MkAtkQdYySKx3bOpkM8GdsmydWk5QaF8bxEqJpgm6owYJjkXAwcY5ii9KCCdVAOWqNXX r/KZ9GzZmzCnmTG907bv/M7z9A72bMzW4sgyJyjEEDc+Wg3RDZ6UpXnkqCkmDAMXj+B9 cQEA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=DKj01eWVlyMr7t8j3UxFWe5D/QCYMBC3DYw8jt3wc38=; b=pDNPWhbMMIMoNafjozq1IWCUWh7zBdMea/0LGVQkEQa/Q6e17G/AizQ2yOmB9pID/m 3K6HqdZ+6to/qiOJG4O66pOcmYSFqggvP37TVNMuK3IwIpHO4r6L75NhFIh3SqSPyZct SUQD3au6sGlnZ7Xf7AHh2kz6QRetXsY8nF0+AMtPa4kkrKMsQfypT5n6r6q1hJeJ+ZhS V+svHmjtauRcfinIVIms3vc70zE6EgEG9kyqhqdkIUXA5CezJgAomN38hHScZSgQ8wRK 7e4HBnDpS4XFCTSmA3wpZZv4rIaDpkmoR1fbOkOL66+gUAWXHc9j4zjn33hf214Ly0nH ie/g== X-Gm-Message-State: AOAM533IkEfSmDu3aQB6T5bnJFvgOn44dDvPhhw5BtRZc8lZv+N3PUUH dQlWMxW45p4nfDKWd3KfcBAmm7neiGD9CQ== X-Google-Smtp-Source: ABdhPJxl364A77qIEiNxquKriNtmsuqro4z1JbWTetSSemI97XdnatzgkUuasG3TrYcOMYE5NpzskQ== X-Received: by 2002:a9d:7519:: with SMTP id r25mr16133568otk.172.1614652680517; Mon, 01 Mar 2021 18:38:00 -0800 (PST) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:1bb:8d29:39ef:5fe5]) by smtp.gmail.com with ESMTPSA id a30sm100058oiy.42.2021.03.01.18.37.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 01 Mar 2021 18:38:00 -0800 (PST) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v2 5/9] udp: add ->read_sock() and ->sendmsg_locked() to ipv6 Date: Mon, 1 Mar 2021 18:37:39 -0800 Message-Id: <20210302023743.24123-6-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210302023743.24123-1-xiyou.wangcong@gmail.com> References: <20210302023743.24123-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang Similarly, udpv6_sendmsg() takes lock_sock() inside too, we have to build ->sendmsg_locked() on top of it. For ->read_sock(), we can just use udp_read_sock(). Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- include/net/ipv6.h | 1 + net/ipv4/udp.c | 1 + net/ipv6/af_inet6.c | 2 ++ net/ipv6/udp.c | 27 +++++++++++++++++++++------ 4 files changed, 25 insertions(+), 6 deletions(-) diff --git a/include/net/ipv6.h b/include/net/ipv6.h index bd1f396cc9c7..48b6850dae85 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -1119,6 +1119,7 @@ int inet6_hash_connect(struct inet_timewait_death_row *death_row, int inet6_sendmsg(struct socket *sock, struct msghdr *msg, size_t size); int inet6_recvmsg(struct socket *sock, struct msghdr *msg, size_t size, int flags); +int udpv6_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t len); /* * reassembly.c diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 54f24b1d4f65..717c543aaec3 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1831,6 +1831,7 @@ int udp_read_sock(struct sock *sk, read_descriptor_t *desc, return copied; } +EXPORT_SYMBOL(udp_read_sock); /* * This should be easy, if there is something there we diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index 1fb75f01756c..634ab3a825d7 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -714,7 +714,9 @@ const struct proto_ops inet6_dgram_ops = { .setsockopt = sock_common_setsockopt, /* ok */ .getsockopt = sock_common_getsockopt, /* ok */ .sendmsg = inet6_sendmsg, /* retpoline's sake */ + .sendmsg_locked = udpv6_sendmsg_locked, .recvmsg = inet6_recvmsg, /* retpoline's sake */ + .read_sock = udp_read_sock, .mmap = sock_no_mmap, .sendpage = sock_no_sendpage, .set_peek_off = sk_set_peek_off, diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 105ba0cf739d..4372597bc271 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -1272,7 +1272,7 @@ static int udp_v6_push_pending_frames(struct sock *sk) return err; } -int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) +static int __udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len, bool locked) { struct ipv6_txoptions opt_space; struct udp_sock *up = udp_sk(sk); @@ -1361,7 +1361,8 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) * There are pending frames. * The socket lock must be held while it's corked. */ - lock_sock(sk); + if (!locked) + lock_sock(sk); if (likely(up->pending)) { if (unlikely(up->pending != AF_INET6)) { release_sock(sk); @@ -1370,7 +1371,8 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) dst = NULL; goto do_append_data; } - release_sock(sk); + if (!locked) + release_sock(sk); } ulen += sizeof(struct udphdr); @@ -1533,11 +1535,13 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) goto out; } - lock_sock(sk); + if (!locked) + lock_sock(sk); if (unlikely(up->pending)) { /* The socket is already corked while preparing it. */ /* ... which is an evident application bug. --ANK */ - release_sock(sk); + if (!locked) + release_sock(sk); net_dbg_ratelimited("udp cork app bug 2\n"); err = -EINVAL; @@ -1562,7 +1566,8 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) if (err > 0) err = np->recverr ? net_xmit_errno(err) : 0; - release_sock(sk); + if (!locked) + release_sock(sk); out: dst_release(dst); @@ -1593,6 +1598,16 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) goto out; } +int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) +{ + return __udpv6_sendmsg(sk, msg, len, false); +} + +int udpv6_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t len) +{ + return __udpv6_sendmsg(sk, msg, len, true); +} + void udpv6_destroy_sock(struct sock *sk) { struct udp_sock *up = udp_sk(sk); From patchwork Tue Mar 2 02:37:40 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12111197 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9B3A0C433DB for ; Tue, 2 Mar 2021 08:50:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4DAE064DE8 for ; Tue, 2 Mar 2021 08:50:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1377981AbhCBIqK (ORCPT ); Tue, 2 Mar 2021 03:46:10 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54982 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1444091AbhCBCj2 (ORCPT ); Mon, 1 Mar 2021 21:39:28 -0500 Received: from mail-ot1-x32d.google.com (mail-ot1-x32d.google.com [IPv6:2607:f8b0:4864:20::32d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A6D5DC0617A7; Mon, 1 Mar 2021 18:38:02 -0800 (PST) Received: by mail-ot1-x32d.google.com with SMTP id h22so18607627otr.6; Mon, 01 Mar 2021 18:38:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=LRuBf2TNfdnuikXXNBubkWS73ESZs9RS8GJqwcp+Nmc=; b=YEvCBlSecCnhPWouAWkynbjuQ+fERjpIwc3Hu0haxJwV576f2Cl8Zp9lcgd3IiMJY5 8r8ewl1hs9joytJZ2T5NLbIK0Pn6mFDCY4u/t0PgQdJnPzlthEkjk3uesyqRMtXPnWfX MSDNuBrHfyepuZSdmFXFLm1WA4eckxf+6Tkg40bOB49FVw937COSFMBQnSnLbMRhNHhF dvNdQ47GFaYx7hY3vGayfSVJdrYc3sNMMsCc0sTEw/NDfuoTteKGrFNIG+lTSb3nmRSh qqst/d7fvOXy5dAwntbdnVRUdMDzHP0Y5DapwB8XeTrg+9p6ZD8s6QZyh/evAYMk5ULf Egiw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=LRuBf2TNfdnuikXXNBubkWS73ESZs9RS8GJqwcp+Nmc=; b=Fldy+60ZuYcgZwZJXNTFhUxfDEFsG4xVGbctUOp8I0s1eVYk+jXZvQ1DuRG3SUTOqA km8A3J9XNQ9S98gtl68yI+VtI3veJulwP9nvtmRlI2CtHAtRnVYSs7v/8vh4IS6bj71c 3k7V8IG1LR7LBgtERAItr44OT4AkVRzae3PIpmdkEN9zqPdgWTiUtwq/EoeFlJCygNAk EnDBTEe09KXXq8BOVfgZWnz+7JNsGNTvpKN4zglW028FK2U5c+qzr1qxX+UQXnv+NXzi ww1S4RZmWUYjrLfzt7IX+2tM0gZtCydntOOzTD3KMfSwEOGBodb+CmXRWwqE+Axxjl8V 8FqQ== X-Gm-Message-State: AOAM531X7S0585tf1QRL/D+mdExi70wzwtebLfbJQYMw3vAuqBOWV2ad yg5o+cAiZRvRme+Dxyfj8MDrcJBGqnfWvA== X-Google-Smtp-Source: ABdhPJzbRcpf9p3+I1jRG65fMON+meFBVXAJXhqj48eU9M/6Aumnt9Sihw2HPmXd9m8Q/NGmaLlJog== X-Received: by 2002:a9d:42c:: with SMTP id 41mr15595561otc.108.1614652681869; Mon, 01 Mar 2021 18:38:01 -0800 (PST) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:1bb:8d29:39ef:5fe5]) by smtp.gmail.com with ESMTPSA id a30sm100058oiy.42.2021.03.01.18.38.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 01 Mar 2021 18:38:01 -0800 (PST) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v2 6/9] skmsg: extract __tcp_bpf_recvmsg() and tcp_bpf_wait_data() Date: Mon, 1 Mar 2021 18:37:40 -0800 Message-Id: <20210302023743.24123-7-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210302023743.24123-1-xiyou.wangcong@gmail.com> References: <20210302023743.24123-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang Although these two functions are only used by TCP, they are not specific to TCP at all, both operate on skmsg and ingress_msg, so fit in net/core/skmsg.c very well. And we will need them for non-TCP, so rename and move them to skmsg.c and export them to modules. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- include/linux/skmsg.h | 4 ++ include/net/tcp.h | 2 - net/core/skmsg.c | 104 +++++++++++++++++++++++++++++++++++++++++ net/ipv4/tcp_bpf.c | 106 +----------------------------------------- net/tls/tls_sw.c | 4 +- 5 files changed, 112 insertions(+), 108 deletions(-) diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index b5df69d5d397..8c24495d8d33 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -126,6 +126,10 @@ int sk_msg_zerocopy_from_iter(struct sock *sk, struct iov_iter *from, struct sk_msg *msg, u32 bytes); int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from, struct sk_msg *msg, u32 bytes); +int sk_msg_wait_data(struct sock *sk, struct sk_psock *psock, int flags, + long timeo, int *err); +int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg, + int len, int flags); static inline void sk_msg_check_to_free(struct sk_msg *msg, u32 i, u32 bytes) { diff --git a/include/net/tcp.h b/include/net/tcp.h index 2efa4e5ea23d..31b1696c62ba 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -2209,8 +2209,6 @@ void tcp_bpf_clone(const struct sock *sk, struct sock *newsk); int tcp_bpf_sendmsg_redir(struct sock *sk, struct sk_msg *msg, u32 bytes, int flags); -int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock, - struct msghdr *msg, int len, int flags); #endif /* CONFIG_NET_SOCK_MSG */ #if !defined(CONFIG_BPF_SYSCALL) || !defined(CONFIG_NET_SOCK_MSG) diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 7dbd8344ec89..fa10d869a728 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -399,6 +399,110 @@ int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from, } EXPORT_SYMBOL_GPL(sk_msg_memcopy_from_iter); +int sk_msg_wait_data(struct sock *sk, struct sk_psock *psock, int flags, + long timeo, int *err) +{ + DEFINE_WAIT_FUNC(wait, woken_wake_function); + int ret = 0; + + if (sk->sk_shutdown & RCV_SHUTDOWN) + return 1; + + if (!timeo) + return ret; + + add_wait_queue(sk_sleep(sk), &wait); + sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk); + ret = sk_wait_event(sk, &timeo, + !list_empty(&psock->ingress_msg) || + !skb_queue_empty(&sk->sk_receive_queue), &wait); + sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk); + remove_wait_queue(sk_sleep(sk), &wait); + return ret; +} +EXPORT_SYMBOL_GPL(sk_msg_wait_data); + +/* Receive sk_msg from psock->ingress_msg to @msg. */ +int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg, + int len, int flags) +{ + struct iov_iter *iter = &msg->msg_iter; + int peek = flags & MSG_PEEK; + struct sk_msg *msg_rx; + int i, copied = 0; + + msg_rx = list_first_entry_or_null(&psock->ingress_msg, + struct sk_msg, list); + + while (copied != len) { + struct scatterlist *sge; + + if (unlikely(!msg_rx)) + break; + + i = msg_rx->sg.start; + do { + struct page *page; + int copy; + + sge = sk_msg_elem(msg_rx, i); + copy = sge->length; + page = sg_page(sge); + if (copied + copy > len) + copy = len - copied; + copy = copy_page_to_iter(page, sge->offset, copy, iter); + if (!copy) + return copied ? copied : -EFAULT; + + copied += copy; + if (likely(!peek)) { + sge->offset += copy; + sge->length -= copy; + if (!msg_rx->skb) + sk_mem_uncharge(sk, copy); + msg_rx->sg.size -= copy; + + if (!sge->length) { + sk_msg_iter_var_next(i); + if (!msg_rx->skb) + put_page(page); + } + } else { + /* Lets not optimize peek case if copy_page_to_iter + * didn't copy the entire length lets just break. + */ + if (copy != sge->length) + return copied; + sk_msg_iter_var_next(i); + } + + if (copied == len) + break; + } while (i != msg_rx->sg.end); + + if (unlikely(peek)) { + if (msg_rx == list_last_entry(&psock->ingress_msg, + struct sk_msg, list)) + break; + msg_rx = list_next_entry(msg_rx, list); + continue; + } + + msg_rx->sg.start = i; + if (!sge->length && msg_rx->sg.start == msg_rx->sg.end) { + list_del(&msg_rx->list); + if (msg_rx->skb) + consume_skb(msg_rx->skb); + kfree(msg_rx); + } + msg_rx = list_first_entry_or_null(&psock->ingress_msg, + struct sk_msg, list); + } + + return copied; +} +EXPORT_SYMBOL_GPL(sk_msg_recvmsg); + static struct sk_msg *sk_psock_create_ingress_msg(struct sock *sk, struct sk_buff *skb) { diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c index 737726c8138c..d2c5394bfb5e 100644 --- a/net/ipv4/tcp_bpf.c +++ b/net/ipv4/tcp_bpf.c @@ -10,86 +10,6 @@ #include #include -int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock, - struct msghdr *msg, int len, int flags) -{ - struct iov_iter *iter = &msg->msg_iter; - int peek = flags & MSG_PEEK; - struct sk_msg *msg_rx; - int i, copied = 0; - - msg_rx = list_first_entry_or_null(&psock->ingress_msg, - struct sk_msg, list); - - while (copied != len) { - struct scatterlist *sge; - - if (unlikely(!msg_rx)) - break; - - i = msg_rx->sg.start; - do { - struct page *page; - int copy; - - sge = sk_msg_elem(msg_rx, i); - copy = sge->length; - page = sg_page(sge); - if (copied + copy > len) - copy = len - copied; - copy = copy_page_to_iter(page, sge->offset, copy, iter); - if (!copy) - return copied ? copied : -EFAULT; - - copied += copy; - if (likely(!peek)) { - sge->offset += copy; - sge->length -= copy; - if (!msg_rx->skb) - sk_mem_uncharge(sk, copy); - msg_rx->sg.size -= copy; - - if (!sge->length) { - sk_msg_iter_var_next(i); - if (!msg_rx->skb) - put_page(page); - } - } else { - /* Lets not optimize peek case if copy_page_to_iter - * didn't copy the entire length lets just break. - */ - if (copy != sge->length) - return copied; - sk_msg_iter_var_next(i); - } - - if (copied == len) - break; - } while (i != msg_rx->sg.end); - - if (unlikely(peek)) { - if (msg_rx == list_last_entry(&psock->ingress_msg, - struct sk_msg, list)) - break; - msg_rx = list_next_entry(msg_rx, list); - continue; - } - - msg_rx->sg.start = i; - if (!sge->length && msg_rx->sg.start == msg_rx->sg.end) { - list_del(&msg_rx->list); - if (msg_rx->skb) - consume_skb(msg_rx->skb); - kfree(msg_rx); - } - msg_rx = list_first_entry_or_null(&psock->ingress_msg, - struct sk_msg, list); - } - - return copied; -} -EXPORT_SYMBOL_GPL(__tcp_bpf_recvmsg); - static int bpf_tcp_ingress(struct sock *sk, struct sk_psock *psock, struct sk_msg *msg, u32 apply_bytes, int flags) { @@ -243,28 +163,6 @@ static bool tcp_bpf_stream_read(const struct sock *sk) return !empty; } -static int tcp_bpf_wait_data(struct sock *sk, struct sk_psock *psock, - int flags, long timeo, int *err) -{ - DEFINE_WAIT_FUNC(wait, woken_wake_function); - int ret = 0; - - if (sk->sk_shutdown & RCV_SHUTDOWN) - return 1; - - if (!timeo) - return ret; - - add_wait_queue(sk_sleep(sk), &wait); - sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk); - ret = sk_wait_event(sk, &timeo, - !list_empty(&psock->ingress_msg) || - !skb_queue_empty(&sk->sk_receive_queue), &wait); - sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk); - remove_wait_queue(sk_sleep(sk), &wait); - return ret; -} - static int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock, int flags, int *addr_len) { @@ -284,13 +182,13 @@ static int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, } lock_sock(sk); msg_bytes_ready: - copied = __tcp_bpf_recvmsg(sk, psock, msg, len, flags); + copied = sk_msg_recvmsg(sk, psock, msg, len, flags); if (!copied) { int data, err = 0; long timeo; timeo = sock_rcvtimeo(sk, nonblock); - data = tcp_bpf_wait_data(sk, psock, flags, timeo, &err); + data = sk_msg_wait_data(sk, psock, flags, timeo, &err); if (data) { if (!sk_psock_queue_empty(psock)) goto msg_bytes_ready; diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 01d933ae5f16..1dcb34dfd56b 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -1789,8 +1789,8 @@ int tls_sw_recvmsg(struct sock *sk, skb = tls_wait_data(sk, psock, flags, timeo, &err); if (!skb) { if (psock) { - int ret = __tcp_bpf_recvmsg(sk, psock, - msg, len, flags); + int ret = sk_msg_recvmsg(sk, psock, msg, len, + flags); if (ret > 0) { decrypted += ret; From patchwork Tue Mar 2 02:37:41 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12111199 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D70CBC4332E for ; Tue, 2 Mar 2021 08:50:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9F7DD64DE4 for ; Tue, 2 Mar 2021 08:50:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1381793AbhCBIq3 (ORCPT ); Tue, 2 Mar 2021 03:46:29 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55044 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1444134AbhCBCjp (ORCPT ); Mon, 1 Mar 2021 21:39:45 -0500 Received: from mail-ot1-x32f.google.com (mail-ot1-x32f.google.com [IPv6:2607:f8b0:4864:20::32f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 13494C0617A9; Mon, 1 Mar 2021 18:38:04 -0800 (PST) Received: by mail-ot1-x32f.google.com with SMTP id 105so18645289otd.3; Mon, 01 Mar 2021 18:38:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=qGgIBOZxfhj4RQXaRAJW/z/oxERFBXXb52q9pb05GWg=; b=Xub6m/fqWy9UDYtuvuotNM4hF3wsViKZZy/0QYUrz2l4g2C00eAzj9YId2jPs8Hodh JlU++zKNbc8vtm9e5pUAoYTrAYaxvjono/Em+4cAps5lYmWWFadvKjIY1QzSFhBt6n5z DYDa/XUMOM8lbrRzNVzh0pihxBGtNhBfGHebdYO378rG82Z/ezXQwtHC7YptEXnJ9bqQ BaXVKNAJxswXI+o/zKtZ8pK5fGST1XM7K84n8sTW7HtKm93H+39Xls/r/P+izaV8UX8U 6uAPzJPPqcsIzeIyJ8RTxndu3l7H2YXAPi3uL1J2Mr/G51kL25SX/4vwTrude8EFjFt1 a72w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=qGgIBOZxfhj4RQXaRAJW/z/oxERFBXXb52q9pb05GWg=; b=OwbE/U/3jQyOW8fh9dTf7pfp7sayWr8rnhxYg/s6+4TiKGPElx9XiOMhYj6y+KG3X3 ReHvyanZw0s7iVBQMWHb0mbohyuGOy620zWUxOO8M/BxQ/yg7wIwQECPa/BcKSVObkAM fRvgSd8bf/2EJRNbaWYuJIwJ5ra+hLqnEsx3WvnhsT6tmeviRPFtJIzZzIFWtD15QF1F LYB/aH7SzaZYVQumpoxfdgIMGVltlv+ljFXoFeJ3GJQIoTXQZ5ydaATeGck5vHl+ZjLa JWZVgr3Uebplvr678x6A27lMHvJ23xwwGG304VsbCs7vJ4sbnEhSOV3THZMDrM+4vbVm hvgg== X-Gm-Message-State: AOAM53088v72wVdLQYj1UOi15pf3doYOtuj32MkhQfGjAuCZKWkDxp2D FCZZFA465wgPg+k08kGWw4SA8dnH9Bj18Q== X-Google-Smtp-Source: ABdhPJx3H6Ao6rDPhVOoJO9vXVYk78Y0zbb9gZGhYu9ZTh+L7vGMPVipRBeq3QQZ5Su3yA/pQ3A0BQ== X-Received: by 2002:a05:6830:1502:: with SMTP id k2mr11650604otp.166.1614652683268; Mon, 01 Mar 2021 18:38:03 -0800 (PST) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:1bb:8d29:39ef:5fe5]) by smtp.gmail.com with ESMTPSA id a30sm100058oiy.42.2021.03.01.18.38.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 01 Mar 2021 18:38:02 -0800 (PST) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v2 7/9] udp: implement udp_bpf_recvmsg() for sockmap Date: Mon, 1 Mar 2021 18:37:41 -0800 Message-Id: <20210302023743.24123-8-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210302023743.24123-1-xiyou.wangcong@gmail.com> References: <20210302023743.24123-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang We have to implement udp_bpf_recvmsg() to replace the ->recvmsg() to retrieve skmsg from ingress_msg. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- net/ipv4/udp_bpf.c | 64 +++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 63 insertions(+), 1 deletion(-) diff --git a/net/ipv4/udp_bpf.c b/net/ipv4/udp_bpf.c index 595836088e85..9a37ba056575 100644 --- a/net/ipv4/udp_bpf.c +++ b/net/ipv4/udp_bpf.c @@ -4,6 +4,68 @@ #include #include #include +#include + +#include "udp_impl.h" + +static struct proto *udpv6_prot_saved __read_mostly; + +static int sk_udp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, + int noblock, int flags, int *addr_len) +{ +#if IS_ENABLED(CONFIG_IPV6) + if (sk->sk_family == AF_INET6) + return udpv6_prot_saved->recvmsg(sk, msg, len, noblock, flags, + addr_len); +#endif + return udp_prot.recvmsg(sk, msg, len, noblock, flags, addr_len); +} + +static int udp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, + int nonblock, int flags, int *addr_len) +{ + struct sk_psock *psock; + int copied, ret; + + if (unlikely(flags & MSG_ERRQUEUE)) + return inet_recv_error(sk, msg, len, addr_len); + + psock = sk_psock_get(sk); + if (unlikely(!psock)) + return sk_udp_recvmsg(sk, msg, len, nonblock, flags, addr_len); + + lock_sock(sk); + if (sk_psock_queue_empty(psock)) { + ret = sk_udp_recvmsg(sk, msg, len, nonblock, flags, addr_len); + goto out; + } + +msg_bytes_ready: + copied = sk_msg_recvmsg(sk, psock, msg, len, flags); + if (!copied) { + int data, err = 0; + long timeo; + + timeo = sock_rcvtimeo(sk, nonblock); + data = sk_msg_wait_data(sk, psock, flags, timeo, &err); + if (data) { + if (!sk_psock_queue_empty(psock)) + goto msg_bytes_ready; + ret = sk_udp_recvmsg(sk, msg, len, nonblock, flags, addr_len); + goto out; + } + if (err) { + ret = err; + goto out; + } + copied = -EAGAIN; + } + ret = copied; +out: + release_sock(sk); + sk_psock_put(sk, psock); + return ret; +} enum { UDP_BPF_IPV4, @@ -11,7 +73,6 @@ enum { UDP_BPF_NUM_PROTS, }; -static struct proto *udpv6_prot_saved __read_mostly; static DEFINE_SPINLOCK(udpv6_prot_lock); static struct proto udp_bpf_prots[UDP_BPF_NUM_PROTS]; @@ -20,6 +81,7 @@ static void udp_bpf_rebuild_protos(struct proto *prot, const struct proto *base) *prot = *base; prot->unhash = sock_map_unhash; prot->close = sock_map_close; + prot->recvmsg = udp_bpf_recvmsg; } static void udp_bpf_check_v6_needs_rebuild(struct proto *ops) From patchwork Tue Mar 2 02:37:42 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12111203 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DE7D3C432C3 for ; Tue, 2 Mar 2021 08:50:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BFB9964DE8 for ; Tue, 2 Mar 2021 08:50:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1381813AbhCBIqm (ORCPT ); Tue, 2 Mar 2021 03:46:42 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55046 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1444135AbhCBCjp (ORCPT ); Mon, 1 Mar 2021 21:39:45 -0500 Received: from mail-ot1-x329.google.com (mail-ot1-x329.google.com [IPv6:2607:f8b0:4864:20::329]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8D555C0617AA; Mon, 1 Mar 2021 18:38:05 -0800 (PST) Received: by mail-ot1-x329.google.com with SMTP id v12so17502191ott.10; Mon, 01 Mar 2021 18:38:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=tCjVfOBpsxyYlXVTXml2C2itPEUYXHFIwvXt6TaQXvU=; b=vQlmYwLFDUKxl8+LJSykwCdcFPOBtdcAz1uBaqpMn0bzEfGN6NOLoCIyUDMRHMvAfM 1nGUrAg1bv+OO2ALNVCObNE5ak3cDdlCKRC5a9nPXA436404CahNQUcnzSx3i91nBPHR Hp6MEdtTGJezKLbcqzmuioe4CvQOHEoUXlVihRxWs0Y5Ddz1jvM365W6FlIKLfhiAlCK 6GTSq4OCKR0Sb+Po5ED2ifeUM21CgGShXqVJKhOuqcEjIuog3UfLsE08HLHC21J9fWa6 pUNLCgdbnBrvR/oZMvqIOr4hOirREC4JqcVvphOgK0voW8ukCT4TGrhBp0uZ8JartGmi 2Z0w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=tCjVfOBpsxyYlXVTXml2C2itPEUYXHFIwvXt6TaQXvU=; b=STGR4rLbkD0yA9QNB/lhYET5QsYTd8jCu9NKmYYxfRAbGaUztNfcRzxtx0wjETAbUZ xfjkAjtLCDcVspzWc/tiZgBowG7WLHDMaMPY/ftYpIv2af+4umKimCxaWVAuWaF1f5Q7 eFZWG5ayPm4MGUMUOFNBRaqxL1dlQrkrL0s+qN/YjERqWgFPCstHyhzatSNMR2zII6/3 au9UdwIqZaz5nBuQGPi7/pkZC4X6TY1ne2KZDzkZ4O6nA6WuwF4NbDDCpggNRylfch0k 3ymcAIFWicaM3Kkr+O4qM7/yza2iwbeWtHYh9ffAQR6fIm7YipsaNhn5SdBB0ERmVQrw pTRQ== X-Gm-Message-State: AOAM530cwbLVAzZbyrt9O1GaqLvijAdWxie/ZuHTWhyARjHzz44NCRp0 GhVGofskoe5K7ArA18JYiQtOV4kwzdO0sw== X-Google-Smtp-Source: ABdhPJyvxw9GreDsJcpj3JFfgeWqyCP/h0GXPqD+SvFLoTXh0nN65wp3IlR9PrFiNuxaNxliaTjfug== X-Received: by 2002:a9d:2e4:: with SMTP id 91mr16357634otl.60.1614652684830; Mon, 01 Mar 2021 18:38:04 -0800 (PST) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:1bb:8d29:39ef:5fe5]) by smtp.gmail.com with ESMTPSA id a30sm100058oiy.42.2021.03.01.18.38.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 01 Mar 2021 18:38:04 -0800 (PST) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v2 8/9] sock_map: update sock type checks for UDP Date: Mon, 1 Mar 2021 18:37:42 -0800 Message-Id: <20210302023743.24123-9-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210302023743.24123-1-xiyou.wangcong@gmail.com> References: <20210302023743.24123-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang Now UDP supports sockmap and redirection, we can safely update the sock type checks for it accordingly. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- net/core/sock_map.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/net/core/sock_map.c b/net/core/sock_map.c index 13d2af5bb81c..f7eee4b7b994 100644 --- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -549,7 +549,10 @@ static bool sk_is_udp(const struct sock *sk) static bool sock_map_redirect_allowed(const struct sock *sk) { - return sk_is_tcp(sk) && sk->sk_state != TCP_LISTEN; + if (sk_is_tcp(sk)) + return sk->sk_state != TCP_LISTEN; + else + return sk->sk_state == TCP_ESTABLISHED; } static bool sock_map_sk_is_suitable(const struct sock *sk) From patchwork Tue Mar 2 02:37:43 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cong Wang X-Patchwork-Id: 12111205 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1158DC43333 for ; Tue, 2 Mar 2021 08:50:23 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DB22661494 for ; Tue, 2 Mar 2021 08:50:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1381827AbhCBIqs (ORCPT ); Tue, 2 Mar 2021 03:46:48 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55072 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1444166AbhCBCj5 (ORCPT ); Mon, 1 Mar 2021 21:39:57 -0500 Received: from mail-oi1-x22f.google.com (mail-oi1-x22f.google.com [IPv6:2607:f8b0:4864:20::22f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E18FEC0617AB; Mon, 1 Mar 2021 18:38:06 -0800 (PST) Received: by mail-oi1-x22f.google.com with SMTP id x62so6193490oix.5; Mon, 01 Mar 2021 18:38:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=g3nptvPrG/Rja0vDg1eqjKadpsNbfe6SjFJEHfl1sz0=; b=pl7tpEzuYgw7xAUS2PXljzawtnR88XWzXZTy+ZPdEhS5ZMdNNsLcpyiSgeVD9qQ6Qz jSSGiUghjGhQUQY/OsTt/bFkjczkum43iYvveJxVE5svkHO9XR5lYRWlepFKkEXJ69Ix 91+PELGmSjSiDRwJlq0X+VB3/15PMgiEUq/TruDnEbPyDDpKMJuSeHtiOMWuMU+Ciqgh xR8SC4emtX97RzHmVVKL41Jo+48k4c7Yfhd9ZUQKdtJI+lZ0Qp40moNvV4bqVaps9YOB K6g8hZIWXm4RCt/GzezN927EFOoYIX0vJf1fuFmB7WUtVWCaO2IoHUeJaa2YxSM3Ccwy dsaQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=g3nptvPrG/Rja0vDg1eqjKadpsNbfe6SjFJEHfl1sz0=; b=uHz4p1aaVyntXvERodgMd5ctrjhXEeZ2jR5Ktb/Sf/bbl/fb+l95oZPttQPNruc7Vx YG1+BJanYGIu32BSLApC4e0cP9Y8kMyNF5thUKRForEk241hiimEfRM42u2rlB0j3xjF u80g1gIgmRZR1HehTmR1J0/eK51vAHZIwaHO6a5UtPrpfYK08Wjf14mrlxQ03muh9pip isS8Ch+4/RxkHs6PJY/9NfGQr8HH+/iJ/1D5kas2Y6wVXfKkCHX9+9g42tLCn2wOmD6S GJelT9WRrr+t84J+11A6+HsR1z7+Ttmok/OzgD6jct5VmK/RX5sIeqPMK9F1XACbe/E3 6TWA== X-Gm-Message-State: AOAM532HYxtAb2BsyjrRkFBoQfrLhxGsMURvacvwebpYbSdmAqZnlYgx kbBba6oSdqqRCfRcM5SXv8LoPNzFk+Uxww== X-Google-Smtp-Source: ABdhPJw4gLXpMKB5ZZ/RrdPRQKemKnRZ2Do0c0kpdg7xE9dbJNSsR0zhMH8e6uOo0Ur1aurea4lbVA== X-Received: by 2002:aca:a809:: with SMTP id r9mr1626335oie.163.1614652686127; Mon, 01 Mar 2021 18:38:06 -0800 (PST) Received: from unknown.attlocal.net ([2600:1700:65a0:ab60:1bb:8d29:39ef:5fe5]) by smtp.gmail.com with ESMTPSA id a30sm100058oiy.42.2021.03.01.18.38.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 01 Mar 2021 18:38:05 -0800 (PST) From: Cong Wang To: netdev@vger.kernel.org Cc: bpf@vger.kernel.org, duanxiongchun@bytedance.com, wangdongdong.6@bytedance.com, jiang.wang@bytedance.com, Cong Wang , John Fastabend , Daniel Borkmann , Jakub Sitnicki , Lorenz Bauer Subject: [Patch bpf-next v2 9/9] selftests/bpf: add a test case for udp sockmap Date: Mon, 1 Mar 2021 18:37:43 -0800 Message-Id: <20210302023743.24123-10-xiyou.wangcong@gmail.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210302023743.24123-1-xiyou.wangcong@gmail.com> References: <20210302023743.24123-1-xiyou.wangcong@gmail.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: bpf@iogearbox.net From: Cong Wang Add a test case to ensure redirection between two UDP sockets work. Cc: John Fastabend Cc: Daniel Borkmann Cc: Jakub Sitnicki Cc: Lorenz Bauer Signed-off-by: Cong Wang --- .../selftests/bpf/prog_tests/sockmap_listen.c | 140 ++++++++++++++++++ .../selftests/bpf/progs/test_sockmap_listen.c | 22 +++ 2 files changed, 162 insertions(+) diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c index c26e6bf05e49..a549ebd3b5a6 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c @@ -1563,6 +1563,142 @@ static void test_redir(struct test_sockmap_listen *skel, struct bpf_map *map, } } +static void udp_redir_to_connected(int family, int sotype, int sock_mapfd, + int verd_mapfd, enum redir_mode mode) +{ + const char *log_prefix = redir_mode_str(mode); + struct sockaddr_storage addr; + int c0, c1, p0, p1; + unsigned int pass; + socklen_t len; + int err, n; + u64 value; + u32 key; + char b; + + zero_verdict_count(verd_mapfd); + + p0 = socket_loopback(family, sotype | SOCK_NONBLOCK); + if (p0 < 0) + return; + len = sizeof(addr); + err = xgetsockname(p0, sockaddr(&addr), &len); + if (err) + goto close_peer0; + + c0 = xsocket(family, sotype | SOCK_NONBLOCK, 0); + if (c0 < 0) + goto close_peer0; + err = xconnect(c0, sockaddr(&addr), len); + if (err) + goto close_cli0; + err = xgetsockname(c0, sockaddr(&addr), &len); + if (err) + goto close_cli0; + err = xconnect(p0, sockaddr(&addr), len); + if (err) + goto close_cli0; + + p1 = socket_loopback(family, sotype | SOCK_NONBLOCK); + if (p1 < 0) + goto close_cli0; + err = xgetsockname(p1, sockaddr(&addr), &len); + if (err) + goto close_cli0; + + c1 = xsocket(family, sotype | SOCK_NONBLOCK, 0); + if (c1 < 0) + goto close_peer1; + err = xconnect(c1, sockaddr(&addr), len); + if (err) + goto close_cli1; + err = xgetsockname(c1, sockaddr(&addr), &len); + if (err) + goto close_cli1; + err = xconnect(p1, sockaddr(&addr), len); + if (err) + goto close_cli1; + + key = 0; + value = p0; + err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST); + if (err) + goto close_cli1; + + key = 1; + value = p1; + err = xbpf_map_update_elem(sock_mapfd, &key, &value, BPF_NOEXIST); + if (err) + goto close_cli1; + + n = write(c1, "a", 1); + if (n < 0) + FAIL_ERRNO("%s: write", log_prefix); + if (n == 0) + FAIL("%s: incomplete write", log_prefix); + if (n < 1) + goto close_cli1; + + key = SK_PASS; + err = xbpf_map_lookup_elem(verd_mapfd, &key, &pass); + if (err) + goto close_cli1; + if (pass != 1) + FAIL("%s: want pass count 1, have %d", log_prefix, pass); + + n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1); + if (n < 0) + FAIL_ERRNO("%s: read", log_prefix); + if (n == 0) + FAIL("%s: incomplete read", log_prefix); + +close_cli1: + xclose(c1); +close_peer1: + xclose(p1); +close_cli0: + xclose(c0); +close_peer0: + xclose(p0); +} + +static void udp_skb_redir_to_connected(struct test_sockmap_listen *skel, + struct bpf_map *inner_map, int family, + int sotype) +{ + int verdict = bpf_program__fd(skel->progs.prog_skb_verdict); + int verdict_map = bpf_map__fd(skel->maps.verdict_map); + int sock_map = bpf_map__fd(inner_map); + int err; + + err = xbpf_prog_attach(verdict, sock_map, BPF_SK_SKB_VERDICT, 0); + if (err) + return; + + skel->bss->test_ingress = false; + udp_redir_to_connected(family, sotype, sock_map, verdict_map, + REDIR_EGRESS); + skel->bss->test_ingress = true; + udp_redir_to_connected(family, sotype, sock_map, verdict_map, + REDIR_INGRESS); + + xbpf_prog_detach2(verdict, sock_map, BPF_SK_SKB_VERDICT); +} + +static void test_udp_redir(struct test_sockmap_listen *skel, struct bpf_map *map, + int family) +{ + const char *family_name, *map_name; + char s[MAX_TEST_NAME]; + + family_name = family_str(family); + map_name = map_type_str(map); + snprintf(s, sizeof(s), "%s %s %s", map_name, family_name, __func__); + if (!test__start_subtest(s)) + return; + udp_skb_redir_to_connected(skel, map, family, SOCK_DGRAM); +} + static void test_reuseport(struct test_sockmap_listen *skel, struct bpf_map *map, int family, int sotype) { @@ -1626,10 +1762,14 @@ void test_sockmap_listen(void) skel->bss->test_sockmap = true; run_tests(skel, skel->maps.sock_map, AF_INET); run_tests(skel, skel->maps.sock_map, AF_INET6); + test_udp_redir(skel, skel->maps.sock_map, AF_INET); + test_udp_redir(skel, skel->maps.sock_map, AF_INET6); skel->bss->test_sockmap = false; run_tests(skel, skel->maps.sock_hash, AF_INET); run_tests(skel, skel->maps.sock_hash, AF_INET6); + test_udp_redir(skel, skel->maps.sock_hash, AF_INET); + test_udp_redir(skel, skel->maps.sock_hash, AF_INET6); test_sockmap_listen__destroy(skel); } diff --git a/tools/testing/selftests/bpf/progs/test_sockmap_listen.c b/tools/testing/selftests/bpf/progs/test_sockmap_listen.c index fa221141e9c1..a39eba9f5201 100644 --- a/tools/testing/selftests/bpf/progs/test_sockmap_listen.c +++ b/tools/testing/selftests/bpf/progs/test_sockmap_listen.c @@ -29,6 +29,7 @@ struct { } verdict_map SEC(".maps"); static volatile bool test_sockmap; /* toggled by user-space */ +static volatile bool test_ingress; /* toggled by user-space */ SEC("sk_skb/stream_parser") int prog_stream_parser(struct __sk_buff *skb) @@ -55,6 +56,27 @@ int prog_stream_verdict(struct __sk_buff *skb) return verdict; } +SEC("sk_skb/skb_verdict") +int prog_skb_verdict(struct __sk_buff *skb) +{ + unsigned int *count; + __u32 zero = 0; + int verdict; + + if (test_sockmap) + verdict = bpf_sk_redirect_map(skb, &sock_map, zero, + test_ingress ? BPF_F_INGRESS : 0); + else + verdict = bpf_sk_redirect_hash(skb, &sock_hash, &zero, + test_ingress ? BPF_F_INGRESS : 0); + + count = bpf_map_lookup_elem(&verdict_map, &verdict); + if (count) + (*count)++; + + return verdict; +} + SEC("sk_msg") int prog_msg_verdict(struct sk_msg_md *msg) {