Message ID | 20231108202819.1932920-1-sdf@google.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | [net] net: set SOCK_RCU_FREE before inserting socket into hashtable |
On Wed, Nov 8, 2023 at 9:28 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> We've started to see the following kernel traces:
>
> WARNING: CPU: 83 PID: 0 at net/core/filter.c:6641 sk_lookup+0x1bd/0x1d0
>
> Call Trace:
>  <IRQ>
>  __bpf_skc_lookup+0x10d/0x120
>  bpf_sk_lookup+0x48/0xd0
>  bpf_sk_lookup_tcp+0x19/0x20
>  bpf_prog_<redacted>+0x37c/0x16a3
>  cls_bpf_classify+0x205/0x2e0
>  tcf_classify+0x92/0x160
>  __netif_receive_skb_core+0xe52/0xf10
>  __netif_receive_skb_list_core+0x96/0x2b0
>  napi_complete_done+0x7b5/0xb70
>  <redacted>_poll+0x94/0xb0
>  net_rx_action+0x163/0x1d70
>  __do_softirq+0xdc/0x32e
>  asm_call_irq_on_stack+0x12/0x20
>  </IRQ>
>  do_softirq_own_stack+0x36/0x50
>  do_softirq+0x44/0x70
>
> I'm not 100% sure what is causing them. It might be some kernel change
> or a new code path in the bpf program. But looking at the code, I'm
> assuming the issue has been there for a while.
>
> __inet_hash can race with lockless (rcu) readers on the other cpus:
>
>   __inet_hash
>     __sk_nulls_add_node_rcu
>                                  <- (bpf triggers here)
>     sock_set_flag(SOCK_RCU_FREE)
>
> Let's move the SOCK_RCU_FREE part up a bit, before we are inserting
> the socket into hashtables. Note that the race is really harmless;
> the bpf callers are handling this situation (where the listener socket
> doesn't have SOCK_RCU_FREE set) correctly, so the only annoyance is
> a WARN_ONCE (so I'm not 100% sure whether it should wait until
> net-next instead).
>
> For the fixes tag, I'm using the original commit which added the flag.

When this commit added the flag, the precise location of the
sock_set_flag(sk, SOCK_RCU_FREE) call did not matter, because the
thread calling __inet_hash() owns a reference on sk.

SOCK_RCU_FREE was tested only at dismantle time.

Back then BPF was not yet able to perform lookups and double-check
whether SOCK_RCU_FREE was set or not.

Checking SOCK_RCU_FREE _after_ the lookup to infer if a refcount has
been taken came with commit 6acc9b432e67 ("bpf: Add helper to retrieve
socket in BPF").

I think we can be more precise and help future debugging, in case more
problems need investigation.

Can you augment the changelog and use a different Fixes: tag?

With that,

Reviewed-by: Eric Dumazet <edumazet@google.com>

>
> Fixes: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood")
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  net/ipv4/inet_hashtables.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
> index 598c1b114d2c..a532f749e477 100644
> --- a/net/ipv4/inet_hashtables.c
> +++ b/net/ipv4/inet_hashtables.c
> @@ -751,12 +751,12 @@ int __inet_hash(struct sock *sk, struct sock *osk)
> 		if (err)
> 			goto unlock;
> 	}
> +	sock_set_flag(sk, SOCK_RCU_FREE);
> 	if (IS_ENABLED(CONFIG_IPV6) && sk->sk_reuseport &&
> 	    sk->sk_family == AF_INET6)
> 		__sk_nulls_add_node_tail_rcu(sk, &ilb2->nulls_head);
> 	else
> 		__sk_nulls_add_node_rcu(sk, &ilb2->nulls_head);
> -	sock_set_flag(sk, SOCK_RCU_FREE);
> 	sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1);
> unlock:
> 	spin_unlock(&ilb2->lock);
> --
> 2.42.0.869.gea05f2083d-goog
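For context, the post-lookup check Eric refers to sits in net/core/filter.c
(the sk_lookup() frame in the trace above). Below is a hedged sketch of its
shape, paraphrased from the upstream tree rather than copied verbatim; the
function name full_sk_or_null() is invented for the illustration:

	/*
	 * Sketch of the check added by commit 6acc9b432e67; an
	 * approximation, not the verbatim net/core/filter.c source.
	 */
	static struct sock *full_sk_or_null(struct sock *sk)
	{
		struct sock *sk2 = sk_to_full_sk(sk);

		/* sk_to_full_sk() may return a request_sock's listener. */
		if (!sk_fullsock(sk2))
			sk2 = NULL;
		if (sk2 != sk) {
			/* Drop the refcount taken on the original socket. */
			sock_gen_put(sk);
			/* The listener is returned without a refcount, so it
			 * must be RCU-freed. A socket that is already hashed
			 * but does not yet have SOCK_RCU_FREE set trips this
			 * warning; that is the window the patch closes.
			 */
			if (unlikely(sk2 && !sock_flag(sk2, SOCK_RCU_FREE))) {
				WARN_ONCE(1, "Found non-RCU, unreferenced socket!");
				sk2 = NULL;
			}
		}
		return sk2;
	}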
On Wed, Nov 8, 2023 at 12:58 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Wed, Nov 8, 2023 at 9:28 PM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > We've started to see the following kernel traces:
> >
> > WARNING: CPU: 83 PID: 0 at net/core/filter.c:6641 sk_lookup+0x1bd/0x1d0
> >
> > Call Trace:
> >  <IRQ>
> >  __bpf_skc_lookup+0x10d/0x120
> >  bpf_sk_lookup+0x48/0xd0
> >  bpf_sk_lookup_tcp+0x19/0x20
> >  bpf_prog_<redacted>+0x37c/0x16a3
> >  cls_bpf_classify+0x205/0x2e0
> >  tcf_classify+0x92/0x160
> >  __netif_receive_skb_core+0xe52/0xf10
> >  __netif_receive_skb_list_core+0x96/0x2b0
> >  napi_complete_done+0x7b5/0xb70
> >  <redacted>_poll+0x94/0xb0
> >  net_rx_action+0x163/0x1d70
> >  __do_softirq+0xdc/0x32e
> >  asm_call_irq_on_stack+0x12/0x20
> >  </IRQ>
> >  do_softirq_own_stack+0x36/0x50
> >  do_softirq+0x44/0x70
> >
> > I'm not 100% sure what is causing them. It might be some kernel change
> > or a new code path in the bpf program. But looking at the code, I'm
> > assuming the issue has been there for a while.
> >
> > __inet_hash can race with lockless (rcu) readers on the other cpus:
> >
> >   __inet_hash
> >     __sk_nulls_add_node_rcu
> >                                  <- (bpf triggers here)
> >     sock_set_flag(SOCK_RCU_FREE)
> >
> > Let's move the SOCK_RCU_FREE part up a bit, before we are inserting
> > the socket into hashtables. Note that the race is really harmless;
> > the bpf callers are handling this situation (where the listener socket
> > doesn't have SOCK_RCU_FREE set) correctly, so the only annoyance is
> > a WARN_ONCE (so I'm not 100% sure whether it should wait until
> > net-next instead).
> >
> > For the fixes tag, I'm using the original commit which added the flag.
>
> When this commit added the flag, the precise location of the
> sock_set_flag(sk, SOCK_RCU_FREE) call did not matter, because the
> thread calling __inet_hash() owns a reference on sk.
>
> SOCK_RCU_FREE was tested only at dismantle time.
>
> Back then BPF was not yet able to perform lookups and double-check
> whether SOCK_RCU_FREE was set or not.
>
> Checking SOCK_RCU_FREE _after_ the lookup to infer if a refcount has
> been taken came with commit 6acc9b432e67 ("bpf: Add helper to retrieve
> socket in BPF").
>
> I think we can be more precise and help future debugging, in case more
> problems need investigation.
>
> Can you augment the changelog and use a different Fixes: tag?
>
> With that,
>
> Reviewed-by: Eric Dumazet <edumazet@google.com>

Sure, thank you for the timeline! Will resend shortly with the updated
changelog.

> >
> > Fixes: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood")
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >  net/ipv4/inet_hashtables.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
> > index 598c1b114d2c..a532f749e477 100644
> > --- a/net/ipv4/inet_hashtables.c
> > +++ b/net/ipv4/inet_hashtables.c
> > @@ -751,12 +751,12 @@ int __inet_hash(struct sock *sk, struct sock *osk)
> > 		if (err)
> > 			goto unlock;
> > 	}
> > +	sock_set_flag(sk, SOCK_RCU_FREE);
> > 	if (IS_ENABLED(CONFIG_IPV6) && sk->sk_reuseport &&
> > 	    sk->sk_family == AF_INET6)
> > 		__sk_nulls_add_node_tail_rcu(sk, &ilb2->nulls_head);
> > 	else
> > 		__sk_nulls_add_node_rcu(sk, &ilb2->nulls_head);
> > -	sock_set_flag(sk, SOCK_RCU_FREE);
> > 	sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1);
> > unlock:
> > 	spin_unlock(&ilb2->lock);
> > --
> > 2.42.0.869.gea05f2083d-goog
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 598c1b114d2c..a532f749e477 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -751,12 +751,12 @@ int __inet_hash(struct sock *sk, struct sock *osk)
 		if (err)
 			goto unlock;
 	}
+	sock_set_flag(sk, SOCK_RCU_FREE);
 	if (IS_ENABLED(CONFIG_IPV6) && sk->sk_reuseport &&
 	    sk->sk_family == AF_INET6)
 		__sk_nulls_add_node_tail_rcu(sk, &ilb2->nulls_head);
 	else
 		__sk_nulls_add_node_rcu(sk, &ilb2->nulls_head);
-	sock_set_flag(sk, SOCK_RCU_FREE);
 	sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1);
 unlock:
 	spin_unlock(&ilb2->lock);
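Why moving the flag above the insertion is sufficient: the socket only
becomes reachable to lockless readers once it is published on the nulls
list, and that publication (rcu_assign_pointer() inside the nulls-list
insert) has release semantics. A short illustrative sketch of the
ordering argument, assuming those standard RCU publication guarantees;
this is annotation, not kernel source:

	/* Writer (__inet_hash(), under ilb2->lock): */
	sock_set_flag(sk, SOCK_RCU_FREE);		/* (A) flag store */
	__sk_nulls_add_node_rcu(sk, &ilb2->nulls_head);	/* (B) release-publish */

	/*
	 * Reader (lockless lookup on another cpu): it can only reach sk
	 * by dereferencing the list head via rcu_dereference(). If the
	 * reader observes (B), it also observes stores ordered before it,
	 * including (A), so sock_flag(sk, SOCK_RCU_FREE) is true for any
	 * socket found in the hashtable and the WARN_ONCE cannot fire.
	 * With the old ordering, (A) happened after (B), leaving a window
	 * where a reader could find sk without the flag set.
	 */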
We've started to see the following kernel traces:

WARNING: CPU: 83 PID: 0 at net/core/filter.c:6641 sk_lookup+0x1bd/0x1d0

Call Trace:
 <IRQ>
 __bpf_skc_lookup+0x10d/0x120
 bpf_sk_lookup+0x48/0xd0
 bpf_sk_lookup_tcp+0x19/0x20
 bpf_prog_<redacted>+0x37c/0x16a3
 cls_bpf_classify+0x205/0x2e0
 tcf_classify+0x92/0x160
 __netif_receive_skb_core+0xe52/0xf10
 __netif_receive_skb_list_core+0x96/0x2b0
 napi_complete_done+0x7b5/0xb70
 <redacted>_poll+0x94/0xb0
 net_rx_action+0x163/0x1d70
 __do_softirq+0xdc/0x32e
 asm_call_irq_on_stack+0x12/0x20
 </IRQ>
 do_softirq_own_stack+0x36/0x50
 do_softirq+0x44/0x70

I'm not 100% sure what is causing them. It might be some kernel change
or a new code path in the bpf program. But looking at the code, I'm
assuming the issue has been there for a while.

__inet_hash can race with lockless (rcu) readers on the other cpus:

  __inet_hash
    __sk_nulls_add_node_rcu
                                 <- (bpf triggers here)
    sock_set_flag(SOCK_RCU_FREE)

Let's move the SOCK_RCU_FREE part up a bit, before we are inserting
the socket into hashtables. Note that the race is really harmless;
the bpf callers are handling this situation (where the listener socket
doesn't have SOCK_RCU_FREE set) correctly, so the only annoyance is
a WARN_ONCE (so I'm not 100% sure whether it should wait until
net-next instead).

For the fixes tag, I'm using the original commit which added the flag.

Fixes: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 net/ipv4/inet_hashtables.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
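For reference, the "bpf callers" mentioned above reach the lookup through
paths like the one in the stack trace (cls_bpf_classify -> bpf_sk_lookup_tcp
-> sk_lookup). A minimal, hypothetical tc/cls_bpf program exercising that
path could look as follows; the address and port values are invented for
the example, and this is not the program that produced the warning:

	// Hypothetical tc program doing a TCP socket lookup from the
	// classifier hook, i.e. the path shown in the trace above.
	#include <linux/bpf.h>
	#include <linux/pkt_cls.h>
	#include <bpf/bpf_helpers.h>
	#include <bpf/bpf_endian.h>

	SEC("tc")
	int lookup_example(struct __sk_buff *skb)
	{
		struct bpf_sock_tuple tuple = {
			.ipv4 = {
				.daddr = bpf_htonl(0x7f000001),	/* 127.0.0.1 */
				.dport = bpf_htons(8080),
			},
		};
		struct bpf_sock *sk;

		/* May return a listener found in the hashtable; the kernel
		 * relies on SOCK_RCU_FREE to hand back a listener without
		 * taking a refcount on it.
		 */
		sk = bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple.ipv4),
				       BPF_F_CURRENT_NETNS, 0);
		if (sk)
			bpf_sk_release(sk);

		return TC_ACT_OK;
	}

	char _license[] SEC("license") = "GPL";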