Message ID | 20210305015655.14249-4-xiyou.wangcong@gmail.com (mailing list archive)
---|---
State | Superseded |
Delegated to: | BPF |
Series | sockmap: introduce BPF_SK_SKB_VERDICT and support UDP
Context | Check | Description
---|---|---
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for bpf-next |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | warning | 10 maintainers not CCed: yoshfuji@linux-ipv6.org kuba@kernel.org davem@davemloft.net yhs@fb.com ast@kernel.org kpsingh@kernel.org songliubraving@fb.com kafai@fb.com andrii@kernel.org dsahern@kernel.org |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 280 this patch: 280 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | warning | WARNING: line length of 86 exceeds 80 columns |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 388 this patch: 388 |
netdev/header_inline | success | Link |
netdev/stable | success | Stable not CCed |
Cong Wang wrote:
> From: Cong Wang <cong.wang@bytedance.com>
>
> UDP already has udp_sendmsg() which takes lock_sock() inside.
> We have to build ->sendmsg_locked() on top of it, by adding
> a new parameter for whether the sock has been locked.
>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Jakub Sitnicki <jakub@cloudflare.com>
> Cc: Lorenz Bauer <lmb@cloudflare.com>
> Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> ---
>  include/net/udp.h  |  1 +
>  net/ipv4/af_inet.c |  1 +
>  net/ipv4/udp.c     | 30 +++++++++++++++++++++++-------
>  3 files changed, 25 insertions(+), 7 deletions(-)

[...]

> -int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
> +static int __udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len, bool locked)
> {

The lock_sock is also taken by BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK() in
udp_sendmsg(),

	if (cgroup_bpf_enabled(BPF_CGROUP_UDP4_SENDMSG) && !connected) {
		err = BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk,
					    (struct sockaddr *)usin, &ipc.addr);

so that will also need to be handled.

It also looks like sk_dst_set() wants the sock lock to be held, but I'm not
seeing how it's covered in the current code,

	static inline void
	__sk_dst_set(struct sock *sk, struct dst_entry *dst)
	{
		struct dst_entry *old_dst;

		sk_tx_queue_clear(sk);
		sk->sk_dst_pending_confirm = 0;
		old_dst = rcu_dereference_protected(sk->sk_dst_cache,
						    lockdep_sock_is_held(sk));
		rcu_assign_pointer(sk->sk_dst_cache, dst);
		dst_release(old_dst);
	}

I guess this could trip lockdep now; I'll dig a bit more Monday and see
if it's actually the case.

In general I don't really like code that wraps locks in 'if' branches
like this. It seems fragile to me. I didn't walk every path in the code
to see if a lock is taken in any of the called functions, but it looks
like ip_send_skb() can call into netfilter code and may try to take
the sock lock.

Do we need this locked send at all? We use it in sk_psock_backlog,
but that routine needs an optimization rewrite for TCP anyway. It's
dropping a lot of performance on the floor for no good reason.

.John
On Fri, Mar 5, 2021 at 5:21 PM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Cong Wang wrote:
> > From: Cong Wang <cong.wang@bytedance.com>
> >
> > UDP already has udp_sendmsg() which takes lock_sock() inside.
> > We have to build ->sendmsg_locked() on top of it, by adding
> > a new parameter for whether the sock has been locked.
> >
> > Cc: John Fastabend <john.fastabend@gmail.com>
> > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > Cc: Jakub Sitnicki <jakub@cloudflare.com>
> > Cc: Lorenz Bauer <lmb@cloudflare.com>
> > Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> > ---
> >  include/net/udp.h  |  1 +
> >  net/ipv4/af_inet.c |  1 +
> >  net/ipv4/udp.c     | 30 +++++++++++++++++++++++-------
> >  3 files changed, 25 insertions(+), 7 deletions(-)
>
> [...]
>
> > -int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
> > +static int __udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len, bool locked)
> > {
>
> The lock_sock is also taken by BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK() in
> udp_sendmsg(),
>
>         if (cgroup_bpf_enabled(BPF_CGROUP_UDP4_SENDMSG) && !connected) {
>                 err = BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk,
>                                             (struct sockaddr *)usin, &ipc.addr);
>
> so that will also need to be handled.

Indeed, good catch!

> It also looks like sk_dst_set() wants the sock lock to be held, but I'm not
> seeing how it's covered in the current code,
>
>         static inline void
>         __sk_dst_set(struct sock *sk, struct dst_entry *dst)
>         {
>                 struct dst_entry *old_dst;
>
>                 sk_tx_queue_clear(sk);
>                 sk->sk_dst_pending_confirm = 0;
>                 old_dst = rcu_dereference_protected(sk->sk_dst_cache,
>                                                     lockdep_sock_is_held(sk));
>                 rcu_assign_pointer(sk->sk_dst_cache, dst);
>                 dst_release(old_dst);
>         }

I do not see how __sk_dst_set() is called in udp_sendmsg().

> I guess this could trip lockdep now; I'll dig a bit more Monday and see
> if it's actually the case.
>
> In general I don't really like code that wraps locks in 'if' branches
> like this. It seems fragile to me. I didn't walk every path in the code

I do not like it either; actually I spent quite some time trying to
get rid of this lock_sock, and it is definitely not easy. The comment in
sk_psock_backlog() is clearly wrong: we do not lock_sock to keep
sk_socket, we lock it to protect other structures like ingress_{skb,msg}.

> to see if a lock is taken in any of the called functions, but it looks
> like ip_send_skb() can call into netfilter code and may try to take
> the sock lock.

Are you saying skb_send_sock_locked() is buggy? If so, clearly not
my fault.

> Do we need this locked send at all? We use it in sk_psock_backlog,
> but that routine needs an optimization rewrite for TCP anyway.
> It's dropping a lot of performance on the floor for no good reason.

At least for ingress_msg. It is not as easy as adding a queue lock here,
because we probably want to retrieve it atomically together with the
receive queue.

Thanks.
Cong Wang wrote:
> On Fri, Mar 5, 2021 at 5:21 PM John Fastabend <john.fastabend@gmail.com> wrote:
> >
> > Cong Wang wrote:
> > > From: Cong Wang <cong.wang@bytedance.com>
> > >
> > > UDP already has udp_sendmsg() which takes lock_sock() inside.
> > > We have to build ->sendmsg_locked() on top of it, by adding
> > > a new parameter for whether the sock has been locked.
> >
> > [...]
> >
> > The lock_sock is also taken by BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK() in
> > udp_sendmsg(),
> >
> >         if (cgroup_bpf_enabled(BPF_CGROUP_UDP4_SENDMSG) && !connected) {
> >                 err = BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk,
> >                                             (struct sockaddr *)usin, &ipc.addr);
> >
> > so that will also need to be handled.
>
> Indeed, good catch!

This is going to get tricky though, because we can't exactly drop the
lock and try to reclaim it. We would have no guarantee some other core
didn't grab the lock from the backlog side.

> > It also looks like sk_dst_set() wants the sock lock to be held, but I'm not
> > seeing how it's covered in the current code,
> >
> >         static inline void
> >         __sk_dst_set(struct sock *sk, struct dst_entry *dst)
> >         {
> >                 struct dst_entry *old_dst;
> >
> >                 sk_tx_queue_clear(sk);
> >                 sk->sk_dst_pending_confirm = 0;
> >                 old_dst = rcu_dereference_protected(sk->sk_dst_cache,
> >                                                     lockdep_sock_is_held(sk));
> >                 rcu_assign_pointer(sk->sk_dst_cache, dst);
> >                 dst_release(old_dst);
> >         }
>
> I do not see how __sk_dst_set() is called in udp_sendmsg().

The path I was probably looking at is,

	udp_sendmsg()
	  sk_dst_check()
	    sk_dst_reset()
	      sk_dst_set(sk, NULL)

but that does a cmpxchg; only __sk_dst_set() actually has the
lockdep_sock_is_held(sk) check. So it should be OK.

> > I guess this could trip lockdep now; I'll dig a bit more Monday and see
> > if it's actually the case.
> >
> > In general I don't really like code that wraps locks in 'if' branches
> > like this. It seems fragile to me. I didn't walk every path in the code
>
> I do not like it either; actually I spent quite some time trying to
> get rid of this lock_sock, and it is definitely not easy. The comment in
> sk_psock_backlog() is clearly wrong: we do not lock_sock to keep
> sk_socket, we lock it to protect other structures like ingress_{skb,msg}.

The comment comes from the early days, before psock was ref counted, and
can be removed.

> > to see if a lock is taken in any of the called functions, but it looks
> > like ip_send_skb() can call into netfilter code and may try to take
> > the sock lock.
>
> Are you saying skb_send_sock_locked() is buggy? If so, clearly not
> my fault.

Except this path only exists on the UDP side, I think:

	udp_sendmsg()
	  udp_send_skb()
	    ip_send_skb()
	      ...

TCP has some extra queuing logic in there that makes this work.

> > Do we need this locked send at all? We use it in sk_psock_backlog,
> > but that routine needs an optimization rewrite for TCP anyway.
> > It's dropping a lot of performance on the floor for no good reason.
>
> At least for ingress_msg. It is not as easy as adding a queue lock here,
> because we probably want to retrieve it atomically together with the
> receive queue.

Agree. I'll try a bit harder tomorrow and see if I can come up with
anything; I intended to do this today but got busy with some other
things. Best case is we find some way to drop that sock_lock altogether
here.

> Thanks.
```diff
diff --git a/include/net/udp.h b/include/net/udp.h
index df7cc1edc200..5264ba1439f9 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -292,6 +292,7 @@ int udp_get_port(struct sock *sk, unsigned short snum,
 int udp_err(struct sk_buff *, u32);
 int udp_abort(struct sock *sk, int err);
 int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len);
+int udp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t len);
 int udp_push_pending_frames(struct sock *sk);
 void udp_flush_pending_frames(struct sock *sk);
 int udp_cmsg_send(struct sock *sk, struct msghdr *msg, u16 *gso_size);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index a02ce89b56b5..d8c73a848c53 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1071,6 +1071,7 @@ const struct proto_ops inet_dgram_ops = {
 	.setsockopt	   = sock_common_setsockopt,
 	.getsockopt	   = sock_common_getsockopt,
 	.sendmsg	   = inet_sendmsg,
+	.sendmsg_locked	   = udp_sendmsg_locked,
 	.recvmsg	   = inet_recvmsg,
 	.mmap		   = sock_no_mmap,
 	.sendpage	   = inet_sendpage,
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 38952aaee3a1..424231e910a9 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1024,7 +1024,7 @@ int udp_cmsg_send(struct sock *sk, struct msghdr *msg, u16 *gso_size)
 }
 EXPORT_SYMBOL_GPL(udp_cmsg_send);
 
-int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
+static int __udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len, bool locked)
 {
 	struct inet_sock *inet = inet_sk(sk);
 	struct udp_sock *up = udp_sk(sk);
@@ -1063,15 +1063,18 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		 * There are pending frames.
 		 * The socket lock must be held while it's corked.
 		 */
-		lock_sock(sk);
+		if (!locked)
+			lock_sock(sk);
 		if (likely(up->pending)) {
 			if (unlikely(up->pending != AF_INET)) {
-				release_sock(sk);
+				if (!locked)
+					release_sock(sk);
 				return -EINVAL;
 			}
 			goto do_append_data;
 		}
-		release_sock(sk);
+		if (!locked)
+			release_sock(sk);
 	}
 
 	ulen += sizeof(struct udphdr);
@@ -1241,11 +1244,13 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		goto out;
 	}
 
-	lock_sock(sk);
+	if (!locked)
+		lock_sock(sk);
 	if (unlikely(up->pending)) {
 		/* The socket is already corked while preparing it. */
 		/* ... which is an evident application bug. --ANK */
-		release_sock(sk);
+		if (!locked)
+			release_sock(sk);
 
 		net_dbg_ratelimited("socket already corked\n");
 		err = -EINVAL;
@@ -1272,7 +1277,8 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		err = udp_push_pending_frames(sk);
 	else if (unlikely(skb_queue_empty(&sk->sk_write_queue)))
 		up->pending = 0;
-	release_sock(sk);
+	if (!locked)
+		release_sock(sk);
 
 out:
 	ip_rt_put(rt);
@@ -1302,8 +1308,18 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		err = 0;
 		goto out;
 	}
+
+int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
+{
+	return __udp_sendmsg(sk, msg, len, false);
+}
 EXPORT_SYMBOL(udp_sendmsg);
 
+int udp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t len)
+{
+	return __udp_sendmsg(sk, msg, len, true);
+}
+
 int udp_sendpage(struct sock *sk, struct page *page, int offset, size_t size,
 		 int flags)
 {
```