Message ID | 20230323200633.3175753-4-aditi.ghag@isovalent.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Headers | show |
Series | bpf-nex: Add socket destroy capability | expand |
On 03/23, Aditi Ghag wrote: > Previously, BPF TCP iterator was acquiring fast version of sock lock that > disables the BH. This introduced a circular dependency with code paths > that > later acquire sockets hash table bucket lock. > Replace the fast version of sock lock with slow that faciliates BPF > programs executed from the iterator to destroy TCP listening sockets > using the bpf_sock_destroy kfunc. > Here is a stack trace that motivated this change: > ``` > 1) sock_lock with BH disabled + bucket lock > lock_acquire+0xcd/0x330 > _raw_spin_lock_bh+0x38/0x50 > inet_unhash+0x96/0xd0 > tcp_set_state+0x6a/0x210 > tcp_abort+0x12b/0x230 > bpf_prog_f4110fb1100e26b5_iter_tcp6_server+0xa3/0xaa > bpf_iter_run_prog+0x1ff/0x340 > bpf_iter_tcp_seq_show+0xca/0x190 > bpf_seq_read+0x177/0x450 > vfs_read+0xc6/0x300 > ksys_read+0x69/0xf0 > do_syscall_64+0x3c/0x90 > entry_SYSCALL_64_after_hwframe+0x72/0xdc > 2) sock lock with BH enable > [ 1.499968] lock_acquire+0xcd/0x330 > [ 1.500316] _raw_spin_lock+0x33/0x40 > [ 1.500670] sk_clone_lock+0x146/0x520 > [ 1.501030] inet_csk_clone_lock+0x1b/0x110 > [ 1.501433] tcp_create_openreq_child+0x22/0x3f0 > [ 1.501873] tcp_v6_syn_recv_sock+0x96/0x940 > [ 1.502284] tcp_check_req+0x137/0x660 > [ 1.502646] tcp_v6_rcv+0xa63/0xe80 > [ 1.502994] ip6_protocol_deliver_rcu+0x78/0x590 > [ 1.503434] ip6_input_finish+0x72/0x140 > [ 1.503818] __netif_receive_skb_one_core+0x63/0xa0 > [ 1.504281] process_backlog+0x79/0x260 > [ 1.504668] __napi_poll.constprop.0+0x27/0x170 > [ 1.505104] net_rx_action+0x14a/0x2a0 > [ 1.505469] __do_softirq+0x165/0x510 > [ 1.505842] do_softirq+0xcd/0x100 > [ 1.506172] __local_bh_enable_ip+0xcc/0xf0 > [ 1.506588] ip6_finish_output2+0x2a8/0xb00 > [ 1.506988] ip6_finish_output+0x274/0x510 > [ 1.507377] ip6_xmit+0x319/0x9b0 > [ 1.507726] inet6_csk_xmit+0x12b/0x2b0 > [ 1.508096] __tcp_transmit_skb+0x549/0xc40 > [ 1.508498] tcp_rcv_state_process+0x362/0x1180 > ``` > Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com> Acked-by: Stanislav Fomichev <sdf@google.com> Don't need fixes because it doesn't trigger without your new bpf_sock_destroy? > --- > net/ipv4/tcp_ipv4.c | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c > index ea370afa70ed..f2d370a9450f 100644 > --- a/net/ipv4/tcp_ipv4.c > +++ b/net/ipv4/tcp_ipv4.c > @@ -2962,7 +2962,6 @@ static int bpf_iter_tcp_seq_show(struct seq_file > *seq, void *v) > struct bpf_iter_meta meta; > struct bpf_prog *prog; > struct sock *sk = v; > - bool slow; > uid_t uid; > int ret; > @@ -2970,7 +2969,7 @@ static int bpf_iter_tcp_seq_show(struct seq_file > *seq, void *v) > return 0; > if (sk_fullsock(sk)) > - slow = lock_sock_fast(sk); > + lock_sock(sk); > if (unlikely(sk_unhashed(sk))) { > ret = SEQ_SKIP; > @@ -2994,7 +2993,7 @@ static int bpf_iter_tcp_seq_show(struct seq_file > *seq, void *v) > unlock: > if (sk_fullsock(sk)) > - unlock_sock_fast(sk, slow); > + release_sock(sk); > return ret; > } > -- > 2.34.1
On 3/23/23 1:06 PM, Aditi Ghag wrote: > Previously, BPF TCP iterator was acquiring fast version of sock lock that > disables the BH. This introduced a circular dependency with code paths that > later acquire sockets hash table bucket lock. > Replace the fast version of sock lock with slow that faciliates BPF > programs executed from the iterator to destroy TCP listening sockets > using the bpf_sock_destroy kfunc. This batch should be moved before the bpf_sock_destroy patch.
> On Mar 24, 2023, at 2:45 PM, Stanislav Fomichev <sdf@google.com> wrote: > > On 03/23, Aditi Ghag wrote: >> Previously, BPF TCP iterator was acquiring fast version of sock lock that >> disables the BH. This introduced a circular dependency with code paths that >> later acquire sockets hash table bucket lock. >> Replace the fast version of sock lock with slow that faciliates BPF >> programs executed from the iterator to destroy TCP listening sockets >> using the bpf_sock_destroy kfunc. > >> Here is a stack trace that motivated this change: > >> ``` >> 1) sock_lock with BH disabled + bucket lock > >> lock_acquire+0xcd/0x330 >> _raw_spin_lock_bh+0x38/0x50 >> inet_unhash+0x96/0xd0 >> tcp_set_state+0x6a/0x210 >> tcp_abort+0x12b/0x230 >> bpf_prog_f4110fb1100e26b5_iter_tcp6_server+0xa3/0xaa >> bpf_iter_run_prog+0x1ff/0x340 >> bpf_iter_tcp_seq_show+0xca/0x190 >> bpf_seq_read+0x177/0x450 >> vfs_read+0xc6/0x300 >> ksys_read+0x69/0xf0 >> do_syscall_64+0x3c/0x90 >> entry_SYSCALL_64_after_hwframe+0x72/0xdc > >> 2) sock lock with BH enable > >> [ 1.499968] lock_acquire+0xcd/0x330 >> [ 1.500316] _raw_spin_lock+0x33/0x40 >> [ 1.500670] sk_clone_lock+0x146/0x520 >> [ 1.501030] inet_csk_clone_lock+0x1b/0x110 >> [ 1.501433] tcp_create_openreq_child+0x22/0x3f0 >> [ 1.501873] tcp_v6_syn_recv_sock+0x96/0x940 >> [ 1.502284] tcp_check_req+0x137/0x660 >> [ 1.502646] tcp_v6_rcv+0xa63/0xe80 >> [ 1.502994] ip6_protocol_deliver_rcu+0x78/0x590 >> [ 1.503434] ip6_input_finish+0x72/0x140 >> [ 1.503818] __netif_receive_skb_one_core+0x63/0xa0 >> [ 1.504281] process_backlog+0x79/0x260 >> [ 1.504668] __napi_poll.constprop.0+0x27/0x170 >> [ 1.505104] net_rx_action+0x14a/0x2a0 >> [ 1.505469] __do_softirq+0x165/0x510 >> [ 1.505842] do_softirq+0xcd/0x100 >> [ 1.506172] __local_bh_enable_ip+0xcc/0xf0 >> [ 1.506588] ip6_finish_output2+0x2a8/0xb00 >> [ 1.506988] ip6_finish_output+0x274/0x510 >> [ 1.507377] ip6_xmit+0x319/0x9b0 >> [ 1.507726] inet6_csk_xmit+0x12b/0x2b0 >> [ 1.508096] __tcp_transmit_skb+0x549/0xc40 >> [ 1.508498] tcp_rcv_state_process+0x362/0x1180 > >> ``` > >> Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com> > > Acked-by: Stanislav Fomichev <sdf@google.com> > > Don't need fixes because it doesn't trigger without your new > bpf_sock_destroy? That's right. > > >> --- >> net/ipv4/tcp_ipv4.c | 5 ++--- >> 1 file changed, 2 insertions(+), 3 deletions(-) > >> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c >> index ea370afa70ed..f2d370a9450f 100644 >> --- a/net/ipv4/tcp_ipv4.c >> +++ b/net/ipv4/tcp_ipv4.c >> @@ -2962,7 +2962,6 @@ static int bpf_iter_tcp_seq_show(struct seq_file *seq, void *v) >> struct bpf_iter_meta meta; >> struct bpf_prog *prog; >> struct sock *sk = v; >> - bool slow; >> uid_t uid; >> int ret; > >> @@ -2970,7 +2969,7 @@ static int bpf_iter_tcp_seq_show(struct seq_file *seq, void *v) >> return 0; > >> if (sk_fullsock(sk)) >> - slow = lock_sock_fast(sk); >> + lock_sock(sk); > >> if (unlikely(sk_unhashed(sk))) { >> ret = SEQ_SKIP; >> @@ -2994,7 +2993,7 @@ static int bpf_iter_tcp_seq_show(struct seq_file *seq, void *v) > >> unlock: >> if (sk_fullsock(sk)) >> - unlock_sock_fast(sk, slow); >> + release_sock(sk); >> return ret; > >> } >> -- >> 2.34.1
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index ea370afa70ed..f2d370a9450f 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -2962,7 +2962,6 @@ static int bpf_iter_tcp_seq_show(struct seq_file *seq, void *v) struct bpf_iter_meta meta; struct bpf_prog *prog; struct sock *sk = v; - bool slow; uid_t uid; int ret; @@ -2970,7 +2969,7 @@ static int bpf_iter_tcp_seq_show(struct seq_file *seq, void *v) return 0; if (sk_fullsock(sk)) - slow = lock_sock_fast(sk); + lock_sock(sk); if (unlikely(sk_unhashed(sk))) { ret = SEQ_SKIP; @@ -2994,7 +2993,7 @@ static int bpf_iter_tcp_seq_show(struct seq_file *seq, void *v) unlock: if (sk_fullsock(sk)) - unlock_sock_fast(sk, slow); + release_sock(sk); return ret; }
Previously, BPF TCP iterator was acquiring fast version of sock lock that disables the BH. This introduced a circular dependency with code paths that later acquire sockets hash table bucket lock. Replace the fast version of sock lock with slow that faciliates BPF programs executed from the iterator to destroy TCP listening sockets using the bpf_sock_destroy kfunc. Here is a stack trace that motivated this change: ``` 1) sock_lock with BH disabled + bucket lock lock_acquire+0xcd/0x330 _raw_spin_lock_bh+0x38/0x50 inet_unhash+0x96/0xd0 tcp_set_state+0x6a/0x210 tcp_abort+0x12b/0x230 bpf_prog_f4110fb1100e26b5_iter_tcp6_server+0xa3/0xaa bpf_iter_run_prog+0x1ff/0x340 bpf_iter_tcp_seq_show+0xca/0x190 bpf_seq_read+0x177/0x450 vfs_read+0xc6/0x300 ksys_read+0x69/0xf0 do_syscall_64+0x3c/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc 2) sock lock with BH enable [ 1.499968] lock_acquire+0xcd/0x330 [ 1.500316] _raw_spin_lock+0x33/0x40 [ 1.500670] sk_clone_lock+0x146/0x520 [ 1.501030] inet_csk_clone_lock+0x1b/0x110 [ 1.501433] tcp_create_openreq_child+0x22/0x3f0 [ 1.501873] tcp_v6_syn_recv_sock+0x96/0x940 [ 1.502284] tcp_check_req+0x137/0x660 [ 1.502646] tcp_v6_rcv+0xa63/0xe80 [ 1.502994] ip6_protocol_deliver_rcu+0x78/0x590 [ 1.503434] ip6_input_finish+0x72/0x140 [ 1.503818] __netif_receive_skb_one_core+0x63/0xa0 [ 1.504281] process_backlog+0x79/0x260 [ 1.504668] __napi_poll.constprop.0+0x27/0x170 [ 1.505104] net_rx_action+0x14a/0x2a0 [ 1.505469] __do_softirq+0x165/0x510 [ 1.505842] do_softirq+0xcd/0x100 [ 1.506172] __local_bh_enable_ip+0xcc/0xf0 [ 1.506588] ip6_finish_output2+0x2a8/0xb00 [ 1.506988] ip6_finish_output+0x274/0x510 [ 1.507377] ip6_xmit+0x319/0x9b0 [ 1.507726] inet6_csk_xmit+0x12b/0x2b0 [ 1.508096] __tcp_transmit_skb+0x549/0xc40 [ 1.508498] tcp_rcv_state_process+0x362/0x1180 ``` Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com> --- net/ipv4/tcp_ipv4.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)