Message ID | 20211209013250.44347-1-kuniyu@amazon.co.jp (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [v2,net-next] tcp: Warn if sock_owned_by_user() is true in tcp_child_process(). | expand |
On Wed, Dec 8, 2021 at 5:33 PM Kuniyuki Iwashima <kuniyu@amazon.co.jp> wrote: > > While creating a child socket from ACK (not TCP Fast Open case), before > v2.3.41, we used to call bh_lock_sock() later than now; it was called just > before tcp_rcv_state_process(). The full socket was put into an accept > queue and exposed to other CPUs before bh_lock_sock() so that process > context might have acquired the lock by then. Thus, we had to check if any > process context was accessing the socket before tcp_rcv_state_process(). > I think you misunderstood me. I think this code is not dead yet, so I would : Not include a Fixes: tag to avoid unnecessary backports (of a patch and its revert) If you want to get syzbot coverage for few releases, especially with MPTCP and synflood, you can then submit a patch like the following. diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index cf913a66df17..19da6e442fca 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -843,6 +843,9 @@ int tcp_child_process(struct sock *parent, struct sock *child, * in main socket hash table and lock on listening * socket does not protect us more. */ + + /* Check if this code path is obsolete ? */ + WARN_ON_ONCE(1); __sk_add_backlog(child, skb); }
From: Eric Dumazet <edumazet@google.com> Date: Thu, 9 Dec 2021 00:00:35 -0800 > On Wed, Dec 8, 2021 at 5:33 PM Kuniyuki Iwashima <kuniyu@amazon.co.jp> wrote: >> >> While creating a child socket from ACK (not TCP Fast Open case), before >> v2.3.41, we used to call bh_lock_sock() later than now; it was called just >> before tcp_rcv_state_process(). The full socket was put into an accept >> queue and exposed to other CPUs before bh_lock_sock() so that process >> context might have acquired the lock by then. Thus, we had to check if any >> process context was accessing the socket before tcp_rcv_state_process(). >> > > I think you misunderstood me. > > I think this code is not dead yet, so I would : > > Not include a Fixes: tag to avoid unnecessary backports (of a patch > and its revert) > > If you want to get syzbot coverage for few releases, especially with > MPTCP and synflood, > you can then submit a patch like the following. Sorry, I got on the same page. Let me take a look at MPTCP, then if I still think it is dead code, I will submit the patch. Thank you. > > diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c > index cf913a66df17..19da6e442fca 100644 > --- a/net/ipv4/tcp_minisocks.c > +++ b/net/ipv4/tcp_minisocks.c > @@ -843,6 +843,9 @@ int tcp_child_process(struct sock *parent, struct > sock *child, > * in main socket hash table and lock on listening > * socket does not protect us more. > */ > + > + /* Check if this code path is obsolete ? */ > + WARN_ON_ONCE(1); > __sk_add_backlog(child, skb); > }
On Thu, 2021-12-09 at 20:07 +0900, Kuniyuki Iwashima wrote: > From: Eric Dumazet <edumazet@google.com> > Date: Thu, 9 Dec 2021 00:00:35 -0800 > > On Wed, Dec 8, 2021 at 5:33 PM Kuniyuki Iwashima <kuniyu@amazon.co.jp> wrote: > > > > > > While creating a child socket from ACK (not TCP Fast Open case), before > > > v2.3.41, we used to call bh_lock_sock() later than now; it was called just > > > before tcp_rcv_state_process(). The full socket was put into an accept > > > queue and exposed to other CPUs before bh_lock_sock() so that process > > > context might have acquired the lock by then. Thus, we had to check if any > > > process context was accessing the socket before tcp_rcv_state_process(). > > > > > > > I think you misunderstood me. > > > > I think this code is not dead yet, so I would : > > > > Not include a Fixes: tag to avoid unnecessary backports (of a patch > > and its revert) > > > > If you want to get syzbot coverage for few releases, especially with > > MPTCP and synflood, > > you can then submit a patch like the following. > > Sorry, I got on the same page. > Let me take a look at MPTCP, then if I still think it is dead code, I will > submit the patch. For the records, I think the 'else' branch should be reachble with MPTCP in some non trivial scenario, e.g. MPJ subflows 3WHS racing with setsockopt on the main MPTCP socket. I'm unsure if syzbot could catch that, as it needs mptcp endpoints configuration. Cheers, Paolo
From: Paolo Abeni <pabeni@redhat.com> Date: Thu, 09 Dec 2021 12:59:21 +0100 > On Thu, 2021-12-09 at 20:07 +0900, Kuniyuki Iwashima wrote: >> From: Eric Dumazet <edumazet@google.com> >> Date: Thu, 9 Dec 2021 00:00:35 -0800 >>> On Wed, Dec 8, 2021 at 5:33 PM Kuniyuki Iwashima <kuniyu@amazon.co.jp> wrote: >>>> >>>> While creating a child socket from ACK (not TCP Fast Open case), before >>>> v2.3.41, we used to call bh_lock_sock() later than now; it was called just >>>> before tcp_rcv_state_process(). The full socket was put into an accept >>>> queue and exposed to other CPUs before bh_lock_sock() so that process >>>> context might have acquired the lock by then. Thus, we had to check if any >>>> process context was accessing the socket before tcp_rcv_state_process(). >>>> >>> >>> I think you misunderstood me. >>> >>> I think this code is not dead yet, so I would : >>> >>> Not include a Fixes: tag to avoid unnecessary backports (of a patch >>> and its revert) >>> >>> If you want to get syzbot coverage for few releases, especially with >>> MPTCP and synflood, >>> you can then submit a patch like the following. >> >> Sorry, I got on the same page. >> Let me take a look at MPTCP, then if I still think it is dead code, I will >> submit the patch. > > For the records, I think the 'else' branch should be reachble with > MPTCP in some non trivial scenario, e.g. MPJ subflows 3WHS racing with > setsockopt on the main MPTCP socket. I'm unsure if syzbot could catch > that, as it needs mptcp endpoints configuration. Ah, I was wrong. Thanks for explaining! > > Cheers, > > Paolo
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index cf913a66df17..85b1e752da5d 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -833,18 +833,15 @@ int tcp_child_process(struct sock *parent, struct sock *child, sk_mark_napi_id(child, skb); tcp_segs_in(tcp_sk(child), skb); - if (!sock_owned_by_user(child)) { - ret = tcp_rcv_state_process(child, skb); - /* Wakeup parent, send SIGIO */ - if (state == TCP_SYN_RECV && child->sk_state != state) - parent->sk_data_ready(parent); - } else { - /* Alas, it is possible again, because we do lookup - * in main socket hash table and lock on listening - * socket does not protect us more. - */ - __sk_add_backlog(child, skb); - } + + /* The lock is held in sk_clone_lock() */ + WARN_ON_ONCE(sock_owned_by_user(child)); + + ret = tcp_rcv_state_process(child, skb); + + /* Wakeup parent, send SIGIO */ + if (state == TCP_SYN_RECV && child->sk_state != state) + parent->sk_data_ready(parent); bh_unlock_sock(child); sock_put(child);
While creating a child socket from ACK (not TCP Fast Open case), before v2.3.41, we used to call bh_lock_sock() later than now; it was called just before tcp_rcv_state_process(). The full socket was put into an accept queue and exposed to other CPUs before bh_lock_sock() so that process context might have acquired the lock by then. Thus, we had to check if any process context was accessing the socket before tcp_rcv_state_process(). We can see this code in tcp_v4_do_rcv() of v2.3.14. [0] if (sk->state == TCP_LISTEN) { struct sock *nsk; nsk = tcp_v4_hnd_req(sk, skb); ... if (nsk != sk) { bh_lock_sock(nsk); if (nsk->lock.users != 0) { ... sk_add_backlog(nsk, skb); bh_unlock_sock(nsk); return 0; } ... } } if (tcp_rcv_state_process(sk, skb, skb->h.th, skb->len)) goto reset; However, in 2.3.15, this lock.users test was replaced with BUG_TRAP() by mistake. [1] if (nsk != sk) { ... BUG_TRAP(nsk->lock.users == 0); ... ret = tcp_rcv_state_process(nsk, skb, skb->h.th, skb->len); ... bh_unlock_sock(nsk); ... return 0; } Fortunately, the test was back in 2.3.41. [2] Then, related code was packed into tcp_child_process() with comments, which remains until now. What is interesting in v2.3.41 is that the bh_lock_sock() was moved to tcp_create_openreq_child() and placed just after sock_lock_init(). Thus, the lock is never acquired until tcp_rcv_state_process() by process contexts. The bh_lock_sock() is now in sk_clone_lock() and the rule does not change. As of now, alas, it is not possible to reach the commented path by the change. This patch removes the unreachable path and adds a WARN_ON_ONCE() so that syzbot can validate if it is dead code or not. The WARN_ON_ONCE() could be removed if syzbot is happy for at least one release. [3] [0]: https://cdn.kernel.org/pub/linux/kernel/v2.3/linux-2.3.14.tar.gz [1]: https://cdn.kernel.org/pub/linux/kernel/v2.3/patch-2.3.15.gz [2]: https://cdn.kernel.org/pub/linux/kernel/v2.3/patch-2.3.41.gz [3]: https://lore.kernel.org/all/CANn89iL+YWbQDCTQU-D1nU4EePv07EyHvMPjFPkpH1ELyzg5MA@mail.gmail.com/ Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> --- I left Fixes: tag as a reference, but if it is unnecessary, please remove it. Changelog: v2: * Add a WARN_ON_ONCE() * Clarify TCP Fast Open is not the case v1: https://lore.kernel.org/all/20211208051633.49122-1-kuniyu@amazon.co.jp/ --- net/ipv4/tcp_minisocks.c | 21 +++++++++------------ 1 file changed, 9 insertions(+), 12 deletions(-)