Message ID | 20221114191619.124659-1-jakub@cloudflare.com (mailing list archive) |
---|---|
State | Accepted |
Commit | b68777d54fac21fc833ec26ea1a2a84f975ab035 |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net,v4] l2tp: Serialize access to sk_user_data with sk_callback_lock | expand |
On Mon, Nov 14, 2022 at 20:16:19 +0100, Jakub Sitnicki wrote: > sk->sk_user_data has multiple users, which are not compatible with each > other. Writers must synchronize by grabbing the sk->sk_callback_lock. > > l2tp currently fails to grab the lock when modifying the underlying tunnel > socket fields. Fix it by adding appropriate locking. > > We err on the side of safety and grab the sk_callback_lock also inside the > sk_destruct callback overridden by l2tp, even though there should be no > refs allowing access to the sock at the time when sk_destruct gets called. > > v4: > - serialize write to sk_user_data in l2tp sk_destruct > > v3: > - switch from sock lock to sk_callback_lock > - document write-protection for sk_user_data > > v2: > - update Fixes to point to origin of the bug > - use real names in Reported/Tested-by tags LGTM, thanks Jakub. > > Cc: Tom Parkin <tparkin@katalix.com> > Fixes: 3557baabf280 ("[L2TP]: PPP over L2TP driver core") > Reported-by: Haowei Yan <g1042620637@gmail.com> > Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> > --- > > This took me forever. Sorry about that. > > include/net/sock.h | 2 +- > net/l2tp/l2tp_core.c | 19 +++++++++++++------ > 2 files changed, 14 insertions(+), 7 deletions(-) > > diff --git a/include/net/sock.h b/include/net/sock.h > index 5db02546941c..e0517ecc6531 100644 > --- a/include/net/sock.h > +++ b/include/net/sock.h > @@ -323,7 +323,7 @@ struct sk_filter; > * @sk_tskey: counter to disambiguate concurrent tstamp requests > * @sk_zckey: counter to order MSG_ZEROCOPY notifications > * @sk_socket: Identd and reporting IO signals > - * @sk_user_data: RPC layer private data > + * @sk_user_data: RPC layer private data. Write-protected by @sk_callback_lock. > * @sk_frag: cached page frag > * @sk_peek_off: current peek_offset value > * @sk_send_head: front of stuff to transmit > diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c > index 7499c51b1850..754fdda8a5f5 100644 > --- a/net/l2tp/l2tp_core.c > +++ b/net/l2tp/l2tp_core.c > @@ -1150,8 +1150,10 @@ static void l2tp_tunnel_destruct(struct sock *sk) > } > > /* Remove hooks into tunnel socket */ > + write_lock_bh(&sk->sk_callback_lock); > sk->sk_destruct = tunnel->old_sk_destruct; > sk->sk_user_data = NULL; > + write_unlock_bh(&sk->sk_callback_lock); > > /* Call the original destructor */ > if (sk->sk_destruct) > @@ -1469,16 +1471,18 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net, > sock = sockfd_lookup(tunnel->fd, &ret); > if (!sock) > goto err; > - > - ret = l2tp_validate_socket(sock->sk, net, tunnel->encap); > - if (ret < 0) > - goto err_sock; > } > > + sk = sock->sk; > + write_lock(&sk->sk_callback_lock); > + > + ret = l2tp_validate_socket(sk, net, tunnel->encap); > + if (ret < 0) > + goto err_sock; > + > tunnel->l2tp_net = net; > pn = l2tp_pernet(net); > > - sk = sock->sk; > sock_hold(sk); > tunnel->sock = sk; > > @@ -1504,7 +1508,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net, > > setup_udp_tunnel_sock(net, sock, &udp_cfg); > } else { > - sk->sk_user_data = tunnel; > + rcu_assign_sk_user_data(sk, tunnel); > } > > tunnel->old_sk_destruct = sk->sk_destruct; > @@ -1518,6 +1522,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net, > if (tunnel->fd >= 0) > sockfd_put(sock); > > + write_unlock(&sk->sk_callback_lock); > return 0; > > err_sock: > @@ -1525,6 +1530,8 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net, > sock_release(sock); > else > sockfd_put(sock); > + > + write_unlock(&sk->sk_callback_lock); > err: > return ret; > } > -- > 2.38.1 >
Hello: This patch was applied to netdev/net.git (master) by David S. Miller <davem@davemloft.net>: On Mon, 14 Nov 2022 20:16:19 +0100 you wrote: > sk->sk_user_data has multiple users, which are not compatible with each > other. Writers must synchronize by grabbing the sk->sk_callback_lock. > > l2tp currently fails to grab the lock when modifying the underlying tunnel > socket fields. Fix it by adding appropriate locking. > > We err on the side of safety and grab the sk_callback_lock also inside the > sk_destruct callback overridden by l2tp, even though there should be no > refs allowing access to the sock at the time when sk_destruct gets called. > > [...] Here is the summary with links: - [net,v4] l2tp: Serialize access to sk_user_data with sk_callback_lock https://git.kernel.org/netdev/net/c/b68777d54fac You are awesome, thank you!
On Wed, Nov 16, 2022 at 5:30 AM <patchwork-bot+netdevbpf@kernel.org> wrote: > > Hello: > > This patch was applied to netdev/net.git (master) > by David S. Miller <davem@davemloft.net>: > > On Mon, 14 Nov 2022 20:16:19 +0100 you wrote: > > sk->sk_user_data has multiple users, which are not compatible with each > > other. Writers must synchronize by grabbing the sk->sk_callback_lock. > > > > l2tp currently fails to grab the lock when modifying the underlying tunnel > > socket fields. Fix it by adding appropriate locking. > > > > We err on the side of safety and grab the sk_callback_lock also inside the > > sk_destruct callback overridden by l2tp, even though there should be no > > refs allowing access to the sock at the time when sk_destruct gets called. > > > > [...] > > Here is the summary with links: > - [net,v4] l2tp: Serialize access to sk_user_data with sk_callback_lock > https://git.kernel.org/netdev/net/c/b68777d54fac > > I guess this patch has not been tested with LOCKDEP, right ? sk_callback_lock always needs _bh safety. I will send something like: diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c index 754fdda8a5f52e4e8e2c0f47331c3b22765033d0..a3b06a3cf68248f5ec7ae8be2a9711d0f482ac36 100644 --- a/net/l2tp/l2tp_core.c +++ b/net/l2tp/l2tp_core.c @@ -1474,7 +1474,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net, } sk = sock->sk; - write_lock(&sk->sk_callback_lock); + write_lock_bh(&sk->sk_callback_lock); ret = l2tp_validate_socket(sk, net, tunnel->encap); if (ret < 0) @@ -1522,7 +1522,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net, if (tunnel->fd >= 0) sockfd_put(sock); - write_unlock(&sk->sk_callback_lock); + write_unlock_bh(&sk->sk_callback_lock); return 0; err_sock: @@ -1531,7 +1531,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net, else sockfd_put(sock); - write_unlock(&sk->sk_callback_lock); + write_unlock_bh(&sk->sk_callback_lock); err: return ret; }
On Thu, Nov 17, 2022 at 01:07 AM -08, Eric Dumazet wrote: > On Wed, Nov 16, 2022 at 5:30 AM <patchwork-bot+netdevbpf@kernel.org> wrote: >> >> Hello: >> >> This patch was applied to netdev/net.git (master) >> by David S. Miller <davem@davemloft.net>: >> >> On Mon, 14 Nov 2022 20:16:19 +0100 you wrote: >> > sk->sk_user_data has multiple users, which are not compatible with each >> > other. Writers must synchronize by grabbing the sk->sk_callback_lock. >> > >> > l2tp currently fails to grab the lock when modifying the underlying tunnel >> > socket fields. Fix it by adding appropriate locking. >> > >> > We err on the side of safety and grab the sk_callback_lock also inside the >> > sk_destruct callback overridden by l2tp, even though there should be no >> > refs allowing access to the sock at the time when sk_destruct gets called. >> > >> > [...] >> >> Here is the summary with links: >> - [net,v4] l2tp: Serialize access to sk_user_data with sk_callback_lock >> https://git.kernel.org/netdev/net/c/b68777d54fac >> >> > > I guess this patch has not been tested with LOCKDEP, right ? > > sk_callback_lock always needs _bh safety. > > I will send something like: > > diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c > index 754fdda8a5f52e4e8e2c0f47331c3b22765033d0..a3b06a3cf68248f5ec7ae8be2a9711d0f482ac36 > 100644 > --- a/net/l2tp/l2tp_core.c > +++ b/net/l2tp/l2tp_core.c > @@ -1474,7 +1474,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel > *tunnel, struct net *net, > } > > sk = sock->sk; > - write_lock(&sk->sk_callback_lock); > + write_lock_bh(&sk->sk_callback_lock); > > ret = l2tp_validate_socket(sk, net, tunnel->encap); > if (ret < 0) > @@ -1522,7 +1522,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel > *tunnel, struct net *net, > if (tunnel->fd >= 0) > sockfd_put(sock); > > - write_unlock(&sk->sk_callback_lock); > + write_unlock_bh(&sk->sk_callback_lock); > return 0; > > err_sock: > @@ -1531,7 +1531,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel > *tunnel, struct net *net, > else > sockfd_put(sock); > > - write_unlock(&sk->sk_callback_lock); > + write_unlock_bh(&sk->sk_callback_lock); > err: > return ret; > } Hmm, weird. I double checked - I have PROVE_LOCKING enabled. Didn't see any lockdep reports when running selftests/net/l2tp.sh. I my defense - I thought _bh was not needed because l2tp_tunnel_register() gets called only in the process context. I mean, it's triggered by Netlink sendmsg, but that gets processed in-line AFAIU: netlink_sendmsg netlink_unicast ->netlink_rcv genl_rcv genl_rcv_msg genl_family_rcv_msg genl_family_rcv_msg_doit ->doit l2tp_nl_cmd_tunnel_create l2tp_tunnel_register
On Thu, Nov 17, 2022 at 1:07 AM Eric Dumazet <edumazet@google.com> wrote: > > On Wed, Nov 16, 2022 at 5:30 AM <patchwork-bot+netdevbpf@kernel.org> wrote: > > > > Hello: > > > > This patch was applied to netdev/net.git (master) > > by David S. Miller <davem@davemloft.net>: > > > > On Mon, 14 Nov 2022 20:16:19 +0100 you wrote: > > > sk->sk_user_data has multiple users, which are not compatible with each > > > other. Writers must synchronize by grabbing the sk->sk_callback_lock. > > > > > > l2tp currently fails to grab the lock when modifying the underlying tunnel > > > socket fields. Fix it by adding appropriate locking. > > > > > > We err on the side of safety and grab the sk_callback_lock also inside the > > > sk_destruct callback overridden by l2tp, even though there should be no > > > refs allowing access to the sock at the time when sk_destruct gets called. > > > > > > [...] > > > > Here is the summary with links: > > - [net,v4] l2tp: Serialize access to sk_user_data with sk_callback_lock > > https://git.kernel.org/netdev/net/c/b68777d54fac > > > > > > I guess this patch has not been tested with LOCKDEP, right ? > > sk_callback_lock always needs _bh safety. > > I will send something like: > > diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c > index 754fdda8a5f52e4e8e2c0f47331c3b22765033d0..a3b06a3cf68248f5ec7ae8be2a9711d0f482ac36 > 100644 > --- a/net/l2tp/l2tp_core.c > +++ b/net/l2tp/l2tp_core.c > @@ -1474,7 +1474,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel > *tunnel, struct net *net, > } > > sk = sock->sk; > - write_lock(&sk->sk_callback_lock); > + write_lock_bh(&sk->sk_callback_lock); Unfortunately this might still not work, because setup_udp_tunnel_sock->udp_encap_enable() probably could sleep in static_branch_inc() ? I will release the syzbot report, and let you folks work on a fix, thanks.
On Thu, Nov 17, 2022 at 1:45 AM Jakub Sitnicki <jakub@cloudflare.com> wrote: > > On Thu, Nov 17, 2022 at 01:07 AM -08, Eric Dumazet wrote: > > On Wed, Nov 16, 2022 at 5:30 AM <patchwork-bot+netdevbpf@kernel.org> wrote: > >> > >> Hello: > >> > >> This patch was applied to netdev/net.git (master) > >> by David S. Miller <davem@davemloft.net>: > >> > >> On Mon, 14 Nov 2022 20:16:19 +0100 you wrote: > >> > sk->sk_user_data has multiple users, which are not compatible with each > >> > other. Writers must synchronize by grabbing the sk->sk_callback_lock. > >> > > >> > l2tp currently fails to grab the lock when modifying the underlying tunnel > >> > socket fields. Fix it by adding appropriate locking. > >> > > >> > We err on the side of safety and grab the sk_callback_lock also inside the > >> > sk_destruct callback overridden by l2tp, even though there should be no > >> > refs allowing access to the sock at the time when sk_destruct gets called. > >> > > >> > [...] > >> > >> Here is the summary with links: > >> - [net,v4] l2tp: Serialize access to sk_user_data with sk_callback_lock > >> https://git.kernel.org/netdev/net/c/b68777d54fac > >> > >> > > > > I guess this patch has not been tested with LOCKDEP, right ? > > > > sk_callback_lock always needs _bh safety. > > > > I will send something like: > > > > diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c > > index 754fdda8a5f52e4e8e2c0f47331c3b22765033d0..a3b06a3cf68248f5ec7ae8be2a9711d0f482ac36 > > 100644 > > --- a/net/l2tp/l2tp_core.c > > +++ b/net/l2tp/l2tp_core.c > > @@ -1474,7 +1474,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel > > *tunnel, struct net *net, > > } > > > > sk = sock->sk; > > - write_lock(&sk->sk_callback_lock); > > + write_lock_bh(&sk->sk_callback_lock); > > > > ret = l2tp_validate_socket(sk, net, tunnel->encap); > > if (ret < 0) > > @@ -1522,7 +1522,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel > > *tunnel, struct net *net, > > if (tunnel->fd >= 0) > > sockfd_put(sock); > > > > - write_unlock(&sk->sk_callback_lock); > > + write_unlock_bh(&sk->sk_callback_lock); > > return 0; > > > > err_sock: > > @@ -1531,7 +1531,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel > > *tunnel, struct net *net, > > else > > sockfd_put(sock); > > > > - write_unlock(&sk->sk_callback_lock); > > + write_unlock_bh(&sk->sk_callback_lock); > > err: > > return ret; > > } > > Hmm, weird. I double checked - I have PROVE_LOCKING enabled. > Didn't see any lockdep reports when running selftests/net/l2tp.sh. > > I my defense - I thought _bh was not needed because > l2tp_tunnel_register() gets called only in the process context. I mean, > it's triggered by Netlink sendmsg, but that gets processed in-line > AFAIU: > > netlink_sendmsg > netlink_unicast > ->netlink_rcv > genl_rcv > genl_rcv_msg > genl_family_rcv_msg > genl_family_rcv_msg_doit > ->doit > l2tp_nl_cmd_tunnel_create > l2tp_tunnel_register Three different syzbot reports will help to better understand the issue, sorry it is 2am for me, I am not sure in which time zone you are in ...
On Thu, Nov 17, 2022 at 01:40 AM -08, Eric Dumazet wrote: > On Thu, Nov 17, 2022 at 1:07 AM Eric Dumazet <edumazet@google.com> wrote: >> >> On Wed, Nov 16, 2022 at 5:30 AM <patchwork-bot+netdevbpf@kernel.org> wrote: >> > >> > Hello: >> > >> > This patch was applied to netdev/net.git (master) >> > by David S. Miller <davem@davemloft.net>: >> > >> > On Mon, 14 Nov 2022 20:16:19 +0100 you wrote: >> > > sk->sk_user_data has multiple users, which are not compatible with each >> > > other. Writers must synchronize by grabbing the sk->sk_callback_lock. >> > > >> > > l2tp currently fails to grab the lock when modifying the underlying tunnel >> > > socket fields. Fix it by adding appropriate locking. >> > > >> > > We err on the side of safety and grab the sk_callback_lock also inside the >> > > sk_destruct callback overridden by l2tp, even though there should be no >> > > refs allowing access to the sock at the time when sk_destruct gets called. >> > > >> > > [...] >> > >> > Here is the summary with links: >> > - [net,v4] l2tp: Serialize access to sk_user_data with sk_callback_lock >> > https://git.kernel.org/netdev/net/c/b68777d54fac >> > >> > >> >> I guess this patch has not been tested with LOCKDEP, right ? >> >> sk_callback_lock always needs _bh safety. >> >> I will send something like: >> >> diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c >> index 754fdda8a5f52e4e8e2c0f47331c3b22765033d0..a3b06a3cf68248f5ec7ae8be2a9711d0f482ac36 >> 100644 >> --- a/net/l2tp/l2tp_core.c >> +++ b/net/l2tp/l2tp_core.c >> @@ -1474,7 +1474,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel >> *tunnel, struct net *net, >> } >> >> sk = sock->sk; >> - write_lock(&sk->sk_callback_lock); >> + write_lock_bh(&sk->sk_callback_lock); > > Unfortunately this might still not work, because > setup_udp_tunnel_sock->udp_encap_enable() probably could sleep in > static_branch_inc() ? > > I will release the syzbot report, and let you folks work on a fix, thanks. Ah, the problem is with pppol2tp_connect racing with itself. Thanks for the syzbot report. I will take a look. I live for debugging deadlocks :-)
On Thu, Nov 17, 2022 at 1:57 AM Jakub Sitnicki <jakub@cloudflare.com> wrote: > > On Thu, Nov 17, 2022 at 01:40 AM -08, Eric Dumazet wrote: > > On Thu, Nov 17, 2022 at 1:07 AM Eric Dumazet <edumazet@google.com> wrote: > >> > >> On Wed, Nov 16, 2022 at 5:30 AM <patchwork-bot+netdevbpf@kernel.org> wrote: > >> > > >> > Hello: > >> > > >> > This patch was applied to netdev/net.git (master) > >> > by David S. Miller <davem@davemloft.net>: > >> > > >> > On Mon, 14 Nov 2022 20:16:19 +0100 you wrote: > >> > > sk->sk_user_data has multiple users, which are not compatible with each > >> > > other. Writers must synchronize by grabbing the sk->sk_callback_lock. > >> > > > >> > > l2tp currently fails to grab the lock when modifying the underlying tunnel > >> > > socket fields. Fix it by adding appropriate locking. > >> > > > >> > > We err on the side of safety and grab the sk_callback_lock also inside the > >> > > sk_destruct callback overridden by l2tp, even though there should be no > >> > > refs allowing access to the sock at the time when sk_destruct gets called. > >> > > > >> > > [...] > >> > > >> > Here is the summary with links: > >> > - [net,v4] l2tp: Serialize access to sk_user_data with sk_callback_lock > >> > https://git.kernel.org/netdev/net/c/b68777d54fac > >> > > >> > > >> > >> I guess this patch has not been tested with LOCKDEP, right ? > >> > >> sk_callback_lock always needs _bh safety. > >> > >> I will send something like: > >> > >> diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c > >> index 754fdda8a5f52e4e8e2c0f47331c3b22765033d0..a3b06a3cf68248f5ec7ae8be2a9711d0f482ac36 > >> 100644 > >> --- a/net/l2tp/l2tp_core.c > >> +++ b/net/l2tp/l2tp_core.c > >> @@ -1474,7 +1474,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel > >> *tunnel, struct net *net, > >> } > >> > >> sk = sock->sk; > >> - write_lock(&sk->sk_callback_lock); > >> + write_lock_bh(&sk->sk_callback_lock); > > > > Unfortunately this might still not work, because > > setup_udp_tunnel_sock->udp_encap_enable() probably could sleep in > > static_branch_inc() ? > > > > I will release the syzbot report, and let you folks work on a fix, thanks. > > Ah, the problem is with pppol2tp_connect racing with itself. Thanks for > the syzbot report. I will take a look. I live for debugging deadlocks > :-) Hi Jakub, any updates on this issue ? (Other syzbot reports with the same root cause are piling up) Thanks
On Fri, Nov 18, 2022 at 02:28 AM -08, Eric Dumazet wrote: > On Thu, Nov 17, 2022 at 1:57 AM Jakub Sitnicki <jakub@cloudflare.com> wrote: >> >> On Thu, Nov 17, 2022 at 01:40 AM -08, Eric Dumazet wrote: >> > On Thu, Nov 17, 2022 at 1:07 AM Eric Dumazet <edumazet@google.com> wrote: >> >> >> >> On Wed, Nov 16, 2022 at 5:30 AM <patchwork-bot+netdevbpf@kernel.org> wrote: >> >> > >> >> > Hello: >> >> > >> >> > This patch was applied to netdev/net.git (master) >> >> > by David S. Miller <davem@davemloft.net>: >> >> > >> >> > On Mon, 14 Nov 2022 20:16:19 +0100 you wrote: >> >> > > sk->sk_user_data has multiple users, which are not compatible with each >> >> > > other. Writers must synchronize by grabbing the sk->sk_callback_lock. >> >> > > >> >> > > l2tp currently fails to grab the lock when modifying the underlying tunnel >> >> > > socket fields. Fix it by adding appropriate locking. >> >> > > >> >> > > We err on the side of safety and grab the sk_callback_lock also inside the >> >> > > sk_destruct callback overridden by l2tp, even though there should be no >> >> > > refs allowing access to the sock at the time when sk_destruct gets called. >> >> > > >> >> > > [...] >> >> > >> >> > Here is the summary with links: >> >> > - [net,v4] l2tp: Serialize access to sk_user_data with sk_callback_lock >> >> > https://git.kernel.org/netdev/net/c/b68777d54fac >> >> > >> >> > >> >> >> >> I guess this patch has not been tested with LOCKDEP, right ? >> >> >> >> sk_callback_lock always needs _bh safety. >> >> >> >> I will send something like: >> >> >> >> diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c >> >> index 754fdda8a5f52e4e8e2c0f47331c3b22765033d0..a3b06a3cf68248f5ec7ae8be2a9711d0f482ac36 >> >> 100644 >> >> --- a/net/l2tp/l2tp_core.c >> >> +++ b/net/l2tp/l2tp_core.c >> >> @@ -1474,7 +1474,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel >> >> *tunnel, struct net *net, >> >> } >> >> >> >> sk = sock->sk; >> >> - write_lock(&sk->sk_callback_lock); >> >> + write_lock_bh(&sk->sk_callback_lock); >> > >> > Unfortunately this might still not work, because >> > setup_udp_tunnel_sock->udp_encap_enable() probably could sleep in >> > static_branch_inc() ? >> > >> > I will release the syzbot report, and let you folks work on a fix, thanks. >> >> Ah, the problem is with pppol2tp_connect racing with itself. Thanks for >> the syzbot report. I will take a look. I live for debugging deadlocks >> :-) > > Hi Jakub, any updates on this issue ? (Other syzbot reports with the > same root cause are piling up) > > Thanks Sorry, I don't have anything yet. I have reserved time to work on it this afternoon (I'm in the CET timezone). Alternatively, I can send a revert right away and come back with fixed patch once I have that, if you prefer.
On Fri, Nov 18, 2022 at 3:00 AM Jakub Sitnicki <jakub@cloudflare.com> wrote: > > Sorry, I don't have anything yet. I have reserved time to work on it > this afternoon (I'm in the CET timezone). > > Alternatively, I can send a revert right away and come back with fixed > patch once I have that, if you prefer. No worries, this can wait for a fix, thanks.
On Fri, Nov 18, 2022 at 03:09 AM -08, Eric Dumazet wrote: > On Fri, Nov 18, 2022 at 3:00 AM Jakub Sitnicki <jakub@cloudflare.com> wrote: > >> >> Sorry, I don't have anything yet. I have reserved time to work on it >> this afternoon (I'm in the CET timezone). >> >> Alternatively, I can send a revert right away and come back with fixed >> patch once I have that, if you prefer. > > No worries, this can wait for a fix, thanks. Proposed fix now posted: https://lore.kernel.org/netdev/20221119130317.39158-1-jakub@cloudflare.com/
On Mon, Nov 14, 2022 at 08:16:19PM +0100, Jakub Sitnicki wrote: > sk->sk_user_data has multiple users, which are not compatible with each > other. Writers must synchronize by grabbing the sk->sk_callback_lock. > > l2tp currently fails to grab the lock when modifying the underlying tunnel > socket fields. Fix it by adding appropriate locking. > > We err on the side of safety and grab the sk_callback_lock also inside the > sk_destruct callback overridden by l2tp, even though there should be no > refs allowing access to the sock at the time when sk_destruct gets called. > > v4: > - serialize write to sk_user_data in l2tp sk_destruct > > v3: > - switch from sock lock to sk_callback_lock > - document write-protection for sk_user_data > > v2: > - update Fixes to point to origin of the bug > - use real names in Reported/Tested-by tags > > Cc: Tom Parkin <tparkin@katalix.com> > Fixes: 3557baabf280 ("[L2TP]: PPP over L2TP driver core") > Reported-by: Haowei Yan <g1042620637@gmail.com> > Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> > --- > > This took me forever. Sorry about that. > > include/net/sock.h | 2 +- > net/l2tp/l2tp_core.c | 19 +++++++++++++------ > 2 files changed, 14 insertions(+), 7 deletions(-) > > diff --git a/include/net/sock.h b/include/net/sock.h > index 5db02546941c..e0517ecc6531 100644 > --- a/include/net/sock.h > +++ b/include/net/sock.h > @@ -323,7 +323,7 @@ struct sk_filter; > * @sk_tskey: counter to disambiguate concurrent tstamp requests > * @sk_zckey: counter to order MSG_ZEROCOPY notifications > * @sk_socket: Identd and reporting IO signals > - * @sk_user_data: RPC layer private data > + * @sk_user_data: RPC layer private data. Write-protected by @sk_callback_lock. > * @sk_frag: cached page frag > * @sk_peek_off: current peek_offset value > * @sk_send_head: front of stuff to transmit > diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c > index 7499c51b1850..754fdda8a5f5 100644 > --- a/net/l2tp/l2tp_core.c > +++ b/net/l2tp/l2tp_core.c > @@ -1150,8 +1150,10 @@ static void l2tp_tunnel_destruct(struct sock *sk) > } > > /* Remove hooks into tunnel socket */ > + write_lock_bh(&sk->sk_callback_lock); > sk->sk_destruct = tunnel->old_sk_destruct; > sk->sk_user_data = NULL; > + write_unlock_bh(&sk->sk_callback_lock); > > /* Call the original destructor */ > if (sk->sk_destruct) Hi Jakub, I have a similar issue with vxlan driver. Similar with commit ad6c9986bcb6 ("vxlan: Fix GRO cells race condition between receive and link delete"). There is still a race condition on vxlan that when receive a packet while deleting a VXLAN device. In vxlan_ecn_decapsulate(), the vxlan_get_sk_family() call panic as sk is NULL. #0 [ffffa25ec6978a38] machine_kexec at ffffffff8c669757 #1 [ffffa25ec6978a90] __crash_kexec at ffffffff8c7c0a4d #2 [ffffa25ec6978b58] crash_kexec at ffffffff8c7c1c48 #3 [ffffa25ec6978b60] oops_end at ffffffff8c627f2b #4 [ffffa25ec6978b80] page_fault_oops at ffffffff8c678fcb #5 [ffffa25ec6978bd8] exc_page_fault at ffffffff8d109542 #6 [ffffa25ec6978c00] asm_exc_page_fault at ffffffff8d200b62 [exception RIP: vxlan_ecn_decapsulate+0x3b] RIP: ffffffffc1014e7b RSP: ffffa25ec6978cb0 RFLAGS: 00010246 RAX: 0000000000000008 RBX: ffff8aa000888000 RCX: 0000000000000000 RDX: 000000000000000e RSI: ffff8a9fc7ab803e RDI: ffff8a9fd1168700 RBP: ffff8a9fc7ab803e R8: 0000000000700000 R9: 00000000000010ae R10: ffff8a9fcb748980 R11: 0000000000000000 R12: ffff8a9fd1168700 R13: ffff8aa000888000 R14: 00000000002a0000 R15: 00000000000010ae ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffffa25ec6978ce8] vxlan_rcv at ffffffffc10189cd [vxlan] #8 [ffffa25ec6978d90] udp_queue_rcv_one_skb at ffffffff8cfb6507 #9 [ffffa25ec6978dc0] udp_unicast_rcv_skb at ffffffff8cfb6e45 #10 [ffffa25ec6978dc8] __udp4_lib_rcv at ffffffff8cfb8807 #11 [ffffa25ec6978e20] ip_protocol_deliver_rcu at ffffffff8cf76951 #12 [ffffa25ec6978e48] ip_local_deliver at ffffffff8cf76bde #13 [ffffa25ec6978ea0] __netif_receive_skb_one_core at ffffffff8cecde9b #14 [ffffa25ec6978ec8] process_backlog at ffffffff8cece139 #15 [ffffa25ec6978f00] __napi_poll at ffffffff8ceced1a #16 [ffffa25ec6978f28] net_rx_action at ffffffff8cecf1f3 #17 [ffffa25ec6978fa0] __softirqentry_text_start at ffffffff8d4000ca #18 [ffffa25ec6978ff0] do_softirq at ffffffff8c6fbdc3 --- <IRQ stack> --- > struct socket ffff8a9fd1168700 struct socket { state = SS_FREE, type = 0, flags = 0, file = 0xffff8a9fcb748000, sk = 0x0, ops = 0x0, So I'm wondering if we should also have locks in udp_tunnel_sock_release(). Or should we add a checking in sk state before calling vxlan_get_sk_family()? Thanks Hangbin
On Fri, Dec 02, 2022 at 05:50 PM +08, Hangbin Liu wrote: > On Mon, Nov 14, 2022 at 08:16:19PM +0100, Jakub Sitnicki wrote: >> sk->sk_user_data has multiple users, which are not compatible with each >> other. Writers must synchronize by grabbing the sk->sk_callback_lock. >> >> l2tp currently fails to grab the lock when modifying the underlying tunnel >> socket fields. Fix it by adding appropriate locking. >> >> We err on the side of safety and grab the sk_callback_lock also inside the >> sk_destruct callback overridden by l2tp, even though there should be no >> refs allowing access to the sock at the time when sk_destruct gets called. >> >> v4: >> - serialize write to sk_user_data in l2tp sk_destruct >> >> v3: >> - switch from sock lock to sk_callback_lock >> - document write-protection for sk_user_data >> >> v2: >> - update Fixes to point to origin of the bug >> - use real names in Reported/Tested-by tags >> >> Cc: Tom Parkin <tparkin@katalix.com> >> Fixes: 3557baabf280 ("[L2TP]: PPP over L2TP driver core") >> Reported-by: Haowei Yan <g1042620637@gmail.com> >> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> >> --- >> >> This took me forever. Sorry about that. >> >> include/net/sock.h | 2 +- >> net/l2tp/l2tp_core.c | 19 +++++++++++++------ >> 2 files changed, 14 insertions(+), 7 deletions(-) >> >> diff --git a/include/net/sock.h b/include/net/sock.h >> index 5db02546941c..e0517ecc6531 100644 >> --- a/include/net/sock.h >> +++ b/include/net/sock.h >> @@ -323,7 +323,7 @@ struct sk_filter; >> * @sk_tskey: counter to disambiguate concurrent tstamp requests >> * @sk_zckey: counter to order MSG_ZEROCOPY notifications >> * @sk_socket: Identd and reporting IO signals >> - * @sk_user_data: RPC layer private data >> + * @sk_user_data: RPC layer private data. Write-protected by @sk_callback_lock. >> * @sk_frag: cached page frag >> * @sk_peek_off: current peek_offset value >> * @sk_send_head: front of stuff to transmit >> diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c >> index 7499c51b1850..754fdda8a5f5 100644 >> --- a/net/l2tp/l2tp_core.c >> +++ b/net/l2tp/l2tp_core.c >> @@ -1150,8 +1150,10 @@ static void l2tp_tunnel_destruct(struct sock *sk) >> } >> >> /* Remove hooks into tunnel socket */ >> + write_lock_bh(&sk->sk_callback_lock); >> sk->sk_destruct = tunnel->old_sk_destruct; >> sk->sk_user_data = NULL; >> + write_unlock_bh(&sk->sk_callback_lock); >> >> /* Call the original destructor */ >> if (sk->sk_destruct) > > Hi Jakub, > > I have a similar issue with vxlan driver. Similar with commit > ad6c9986bcb6 ("vxlan: Fix GRO cells race condition between receive and link > delete"). There is still a race condition on vxlan that when receive a packet > while deleting a VXLAN device. In vxlan_ecn_decapsulate(), the > vxlan_get_sk_family() call panic as sk is NULL. > > #0 [ffffa25ec6978a38] machine_kexec at ffffffff8c669757 > #1 [ffffa25ec6978a90] __crash_kexec at ffffffff8c7c0a4d > #2 [ffffa25ec6978b58] crash_kexec at ffffffff8c7c1c48 > #3 [ffffa25ec6978b60] oops_end at ffffffff8c627f2b > #4 [ffffa25ec6978b80] page_fault_oops at ffffffff8c678fcb > #5 [ffffa25ec6978bd8] exc_page_fault at ffffffff8d109542 > #6 [ffffa25ec6978c00] asm_exc_page_fault at ffffffff8d200b62 > [exception RIP: vxlan_ecn_decapsulate+0x3b] > RIP: ffffffffc1014e7b RSP: ffffa25ec6978cb0 RFLAGS: 00010246 > RAX: 0000000000000008 RBX: ffff8aa000888000 RCX: 0000000000000000 > RDX: 000000000000000e RSI: ffff8a9fc7ab803e RDI: ffff8a9fd1168700 > RBP: ffff8a9fc7ab803e R8: 0000000000700000 R9: 00000000000010ae > R10: ffff8a9fcb748980 R11: 0000000000000000 R12: ffff8a9fd1168700 > R13: ffff8aa000888000 R14: 00000000002a0000 R15: 00000000000010ae > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 > #7 [ffffa25ec6978ce8] vxlan_rcv at ffffffffc10189cd [vxlan] > #8 [ffffa25ec6978d90] udp_queue_rcv_one_skb at ffffffff8cfb6507 > #9 [ffffa25ec6978dc0] udp_unicast_rcv_skb at ffffffff8cfb6e45 > #10 [ffffa25ec6978dc8] __udp4_lib_rcv at ffffffff8cfb8807 > #11 [ffffa25ec6978e20] ip_protocol_deliver_rcu at ffffffff8cf76951 > #12 [ffffa25ec6978e48] ip_local_deliver at ffffffff8cf76bde > #13 [ffffa25ec6978ea0] __netif_receive_skb_one_core at ffffffff8cecde9b > #14 [ffffa25ec6978ec8] process_backlog at ffffffff8cece139 > #15 [ffffa25ec6978f00] __napi_poll at ffffffff8ceced1a > #16 [ffffa25ec6978f28] net_rx_action at ffffffff8cecf1f3 > #17 [ffffa25ec6978fa0] __softirqentry_text_start at ffffffff8d4000ca > #18 [ffffa25ec6978ff0] do_softirq at ffffffff8c6fbdc3 > --- <IRQ stack> --- > >> struct socket ffff8a9fd1168700 > struct socket { > state = SS_FREE, > type = 0, > flags = 0, > file = 0xffff8a9fcb748000, > sk = 0x0, > ops = 0x0, > > So I'm wondering if we should also have locks in udp_tunnel_sock_release(). > Or should we add a checking in sk state before calling vxlan_get_sk_family()? This is how like to think about it: To know when it is safe to load vs->sock->sk->sk_family, we have to ask: 1. What ensures that the objects remain alive/valid in our scope? 2. What protects the objects from being mutated? In case of vxlan_sock object in the context of vxlan_ecn_decapsulate(): 1. We are in an RCU read side section (ip_local_deliver_finish). 2. RCU-protected objects are not to be mutated while readers exist. The classic "What is RCU, Fundamentally?" article explains it much better than I ever could: https://lwn.net/Articles/262464/ As to where the problem lies. I belive udp_tunnel_sock_release() is not keeping the (2) promise. After unpublishing the sk_user_data, we should wait for any existing readers accessing the vxlan_sock to finish with synchronize_rcu(), before releaseing the socket. That is: --- a/net/ipv4/udp_tunnel_core.c +++ b/net/ipv4/udp_tunnel_core.c @@ -176,6 +176,7 @@ EXPORT_SYMBOL_GPL(udp_tunnel_xmit_skb); void udp_tunnel_sock_release(struct socket *sock) { rcu_assign_sk_user_data(sock->sk, NULL); + synchronize_rcu(); kernel_sock_shutdown(sock, SHUT_RDWR); sock_release(sock); } Otherwise accessing vxlan_sock state doesn't look safe to me.
On Mon, Dec 05, 2022 at 11:24:39AM +0100, Jakub Sitnicki wrote: > > Hi Jakub, > > > > I have a similar issue with vxlan driver. Similar with commit > > ad6c9986bcb6 ("vxlan: Fix GRO cells race condition between receive and link > > delete"). There is still a race condition on vxlan that when receive a packet > > while deleting a VXLAN device. In vxlan_ecn_decapsulate(), the > > vxlan_get_sk_family() call panic as sk is NULL. > > > > So I'm wondering if we should also have locks in udp_tunnel_sock_release(). > > Or should we add a checking in sk state before calling vxlan_get_sk_family()? > > This is how like to think about it: > > To know when it is safe to load vs->sock->sk->sk_family, we have to ask: > > 1. What ensures that the objects remain alive/valid in our scope? > 2. What protects the objects from being mutated? > > In case of vxlan_sock object in the context of vxlan_ecn_decapsulate(): > > 1. We are in an RCU read side section (ip_local_deliver_finish). > 2. RCU-protected objects are not to be mutated while readers exist. > > The classic "What is RCU, Fundamentally?" article explains it much > better than I ever could: > > https://lwn.net/Articles/262464/ > > As to where the problem lies. I belive udp_tunnel_sock_release() is not > keeping the (2) promise. > > After unpublishing the sk_user_data, we should wait for any existing > readers accessing the vxlan_sock to finish with synchronize_rcu(), > before releaseing the socket. > > That is: > > --- a/net/ipv4/udp_tunnel_core.c > +++ b/net/ipv4/udp_tunnel_core.c > @@ -176,6 +176,7 @@ EXPORT_SYMBOL_GPL(udp_tunnel_xmit_skb); > void udp_tunnel_sock_release(struct socket *sock) > { > rcu_assign_sk_user_data(sock->sk, NULL); > + synchronize_rcu(); > kernel_sock_shutdown(sock, SHUT_RDWR); > sock_release(sock); > } > > > Otherwise accessing vxlan_sock state doesn't look safe to me. Hi Jakub, Thanks for your explain. As it's a little on my side, I will read your comments and try your suggestion tomorrow. Currently, I use the following draft patch to fix the vxlan issue. diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c index 2122747a0224..53259b0b07f3 100644 --- a/drivers/net/vxlan/vxlan_core.c +++ b/drivers/net/vxlan/vxlan_core.c @@ -4234,6 +4234,7 @@ static void vxlan_dellink(struct net_device *dev, struct list_head *head) struct vxlan_dev *vxlan = netdev_priv(dev); vxlan_flush(vxlan, true); + vxlan_sock_release(vxlan); list_del(&vxlan->next); unregister_netdevice_queue(dev, head); Cheers Hangbin
diff --git a/include/net/sock.h b/include/net/sock.h index 5db02546941c..e0517ecc6531 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -323,7 +323,7 @@ struct sk_filter; * @sk_tskey: counter to disambiguate concurrent tstamp requests * @sk_zckey: counter to order MSG_ZEROCOPY notifications * @sk_socket: Identd and reporting IO signals - * @sk_user_data: RPC layer private data + * @sk_user_data: RPC layer private data. Write-protected by @sk_callback_lock. * @sk_frag: cached page frag * @sk_peek_off: current peek_offset value * @sk_send_head: front of stuff to transmit diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c index 7499c51b1850..754fdda8a5f5 100644 --- a/net/l2tp/l2tp_core.c +++ b/net/l2tp/l2tp_core.c @@ -1150,8 +1150,10 @@ static void l2tp_tunnel_destruct(struct sock *sk) } /* Remove hooks into tunnel socket */ + write_lock_bh(&sk->sk_callback_lock); sk->sk_destruct = tunnel->old_sk_destruct; sk->sk_user_data = NULL; + write_unlock_bh(&sk->sk_callback_lock); /* Call the original destructor */ if (sk->sk_destruct) @@ -1469,16 +1471,18 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net, sock = sockfd_lookup(tunnel->fd, &ret); if (!sock) goto err; - - ret = l2tp_validate_socket(sock->sk, net, tunnel->encap); - if (ret < 0) - goto err_sock; } + sk = sock->sk; + write_lock(&sk->sk_callback_lock); + + ret = l2tp_validate_socket(sk, net, tunnel->encap); + if (ret < 0) + goto err_sock; + tunnel->l2tp_net = net; pn = l2tp_pernet(net); - sk = sock->sk; sock_hold(sk); tunnel->sock = sk; @@ -1504,7 +1508,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net, setup_udp_tunnel_sock(net, sock, &udp_cfg); } else { - sk->sk_user_data = tunnel; + rcu_assign_sk_user_data(sk, tunnel); } tunnel->old_sk_destruct = sk->sk_destruct; @@ -1518,6 +1522,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net, if (tunnel->fd >= 0) sockfd_put(sock); + write_unlock(&sk->sk_callback_lock); return 0; err_sock: @@ -1525,6 +1530,8 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net, sock_release(sock); else sockfd_put(sock); + + write_unlock(&sk->sk_callback_lock); err: return ret; }
sk->sk_user_data has multiple users, which are not compatible with each other. Writers must synchronize by grabbing the sk->sk_callback_lock. l2tp currently fails to grab the lock when modifying the underlying tunnel socket fields. Fix it by adding appropriate locking. We err on the side of safety and grab the sk_callback_lock also inside the sk_destruct callback overridden by l2tp, even though there should be no refs allowing access to the sock at the time when sk_destruct gets called. v4: - serialize write to sk_user_data in l2tp sk_destruct v3: - switch from sock lock to sk_callback_lock - document write-protection for sk_user_data v2: - update Fixes to point to origin of the bug - use real names in Reported/Tested-by tags Cc: Tom Parkin <tparkin@katalix.com> Fixes: 3557baabf280 ("[L2TP]: PPP over L2TP driver core") Reported-by: Haowei Yan <g1042620637@gmail.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> --- This took me forever. Sorry about that. include/net/sock.h | 2 +- net/l2tp/l2tp_core.c | 19 +++++++++++++------ 2 files changed, 14 insertions(+), 7 deletions(-)