Message ID | 20230321184541.1857363-4-aditi.ghag@isovalent.com (mailing list archive)
---|---
State | RFC |
Delegated to: | BPF |
Series | bpf-next: Add socket destroy capability
On 03/21, Aditi Ghag wrote:
> When sockets are destroyed in the BPF iterator context, sock
> lock is already acquired, so skip taking the lock. This allows
> TCP listening sockets to be destroyed from BPF programs.
>
> Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
> ---
>  net/ipv4/inet_hashtables.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
> index e41fdc38ce19..5543a3e0d1b4 100644
> --- a/net/ipv4/inet_hashtables.c
> +++ b/net/ipv4/inet_hashtables.c
> @@ -777,9 +777,11 @@ void inet_unhash(struct sock *sk)
>  		/* Don't disable bottom halves while acquiring the lock to
>  		 * avoid circular locking dependency on PREEMPT_RT.
>  		 */
> -		spin_lock(&ilb2->lock);
> +		if (!has_current_bpf_ctx())
> +			spin_lock(&ilb2->lock);
>  		if (sk_unhashed(sk)) {
> -			spin_unlock(&ilb2->lock);
> +			if (!has_current_bpf_ctx())
> +				spin_unlock(&ilb2->lock);

That's bucket lock, why do we have to skip it?

>  			return;
>  		}
>
> @@ -788,7 +790,8 @@ void inet_unhash(struct sock *sk)
>
>  		__sk_nulls_del_node_init_rcu(sk);
>  		sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
> -		spin_unlock(&ilb2->lock);
> +		if (!has_current_bpf_ctx())
> +			spin_unlock(&ilb2->lock);
>  	} else {
>  		spinlock_t *lock = inet_ehash_lockp(hashinfo, sk->sk_hash);
>
> --
> 2.34.1
> On Mar 21, 2023, at 2:31 PM, Stanislav Fomichev <sdf@google.com> wrote:
>
> On 03/21, Aditi Ghag wrote:
>> When sockets are destroyed in the BPF iterator context, sock
>> lock is already acquired, so skip taking the lock. This allows
>> TCP listening sockets to be destroyed from BPF programs.
>
>> Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
>> ---
>>  net/ipv4/inet_hashtables.c | 9 ++++++---
>>  1 file changed, 6 insertions(+), 3 deletions(-)
>
>> diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
>> index e41fdc38ce19..5543a3e0d1b4 100644
>> --- a/net/ipv4/inet_hashtables.c
>> +++ b/net/ipv4/inet_hashtables.c
>> @@ -777,9 +777,11 @@ void inet_unhash(struct sock *sk)
>>  		/* Don't disable bottom halves while acquiring the lock to
>>  		 * avoid circular locking dependency on PREEMPT_RT.
>>  		 */
>> -		spin_lock(&ilb2->lock);
>> +		if (!has_current_bpf_ctx())
>> +			spin_lock(&ilb2->lock);
>>  		if (sk_unhashed(sk)) {
>> -			spin_unlock(&ilb2->lock);
>> +			if (!has_current_bpf_ctx())
>> +				spin_unlock(&ilb2->lock);
>
> That's bucket lock, why do we have to skip it?

Because we take the bucket lock while iterating UDP sockets; see the first commit in this series around batching. But not every BPF context that could invoke this function acquires the lock first, so we can't always skip it.

>
>>  			return;
>>  		}
>
>> @@ -788,7 +790,8 @@ void inet_unhash(struct sock *sk)
>
>>  		__sk_nulls_del_node_init_rcu(sk);
>>  		sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
>> -		spin_unlock(&ilb2->lock);
>> +		if (!has_current_bpf_ctx())
>> +			spin_unlock(&ilb2->lock);
>>  	} else {
>>  		spinlock_t *lock = inet_ehash_lockp(hashinfo, sk->sk_hash);
>
>> --
>> 2.34.1
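[For readers following the objection: has_current_bpf_ctx() only reports that the current task is running some BPF program; it carries no information about which locks that program's caller holds. Roughly, per include/linux/bpf.h of this era (quoted as context, not part of the patch):

/* Approximate definition: true iff the current task is executing a
 * BPF program. Nothing here can distinguish a caller that already
 * holds ilb2->lock (the UDP iterator batching path) from one that
 * does not, which is the core of the objection above.
 */
static inline bool has_current_bpf_ctx(void)
{
	return !!current->bpf_ctx;
}
]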
On Tue, Mar 21, 2023 at 2:37 PM Aditi Ghag <aditi.ghag@isovalent.com> wrote:
>
> > On Mar 21, 2023, at 2:31 PM, Stanislav Fomichev <sdf@google.com> wrote:
> >
> > On 03/21, Aditi Ghag wrote:
> >> When sockets are destroyed in the BPF iterator context, sock
> >> lock is already acquired, so skip taking the lock. This allows
> >> TCP listening sockets to be destroyed from BPF programs.
> >
> >> Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
> >> ---
> >>  net/ipv4/inet_hashtables.c | 9 ++++++---
> >>  1 file changed, 6 insertions(+), 3 deletions(-)
> >
> >> diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
> >> index e41fdc38ce19..5543a3e0d1b4 100644
> >> --- a/net/ipv4/inet_hashtables.c
> >> +++ b/net/ipv4/inet_hashtables.c
> >> @@ -777,9 +777,11 @@ void inet_unhash(struct sock *sk)
> >>  		/* Don't disable bottom halves while acquiring the lock to
> >>  		 * avoid circular locking dependency on PREEMPT_RT.
> >>  		 */
> >> -		spin_lock(&ilb2->lock);
> >> +		if (!has_current_bpf_ctx())
> >> +			spin_lock(&ilb2->lock);
> >>  		if (sk_unhashed(sk)) {
> >> -			spin_unlock(&ilb2->lock);
> >> +			if (!has_current_bpf_ctx())
> >> +				spin_unlock(&ilb2->lock);
> >
> > That's bucket lock, why do we have to skip it?
>
> Because we take the bucket lock while iterating UDP sockets; see the first commit in this series around batching. But not every BPF context that could invoke this function acquires the lock first, so we can't always skip it.

This means the kernel will deadlock right before this patch is applied?

Please squash this patch into the patch adding this bug.

Also, ensuring locks are owned would be nice (full LOCKDEP support, instead of assumptions).
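[One way to encode the "ensure locks are owned" point would be a lockdep assertion instead of a silent skip. A sketch against the names used in the patch, not code from the series:

	/* Sketch: assert, rather than assume, that a BPF-context caller
	 * already holds the bucket lock before skipping the acquisition.
	 * With LOCKDEP enabled, a BPF caller that reaches here without
	 * the lock splats immediately instead of corrupting the bucket.
	 */
	if (has_current_bpf_ctx())
		lockdep_assert_held(&ilb2->lock);
	else
		spin_lock(&ilb2->lock);
]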
> On Mar 21, 2023, at 2:37 PM, Aditi Ghag <aditi.ghag@isovalent.com> wrote:
>
>> On Mar 21, 2023, at 2:31 PM, Stanislav Fomichev <sdf@google.com> wrote:
>>
>> On 03/21, Aditi Ghag wrote:
>>> When sockets are destroyed in the BPF iterator context, sock
>>> lock is already acquired, so skip taking the lock. This allows
>>> TCP listening sockets to be destroyed from BPF programs.
>>
>>> Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
>>> ---
>>>  net/ipv4/inet_hashtables.c | 9 ++++++---
>>>  1 file changed, 6 insertions(+), 3 deletions(-)
>>
>>> diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
>>> index e41fdc38ce19..5543a3e0d1b4 100644
>>> --- a/net/ipv4/inet_hashtables.c
>>> +++ b/net/ipv4/inet_hashtables.c
>>> @@ -777,9 +777,11 @@ void inet_unhash(struct sock *sk)
>>>  		/* Don't disable bottom halves while acquiring the lock to
>>>  		 * avoid circular locking dependency on PREEMPT_RT.
>>>  		 */
>>> -		spin_lock(&ilb2->lock);
>>> +		if (!has_current_bpf_ctx())
>>> +			spin_lock(&ilb2->lock);
>>>  		if (sk_unhashed(sk)) {
>>> -			spin_unlock(&ilb2->lock);
>>> +			if (!has_current_bpf_ctx())
>>> +				spin_unlock(&ilb2->lock);
>>
>> That's bucket lock, why do we have to skip it?
>
> Because we take the bucket lock while iterating UDP sockets; see the first commit in this series around batching. But not every BPF context that could invoke this function acquires the lock first, so we can't always skip it.

Sorry, my buttery fingers hit the send button too soon. You are right, it's the bucket and not the *sock* lock. The commit that adds batching releases the bucket lock. I'll take a look at the stack trace again.

>
>>
>>>  			return;
>>>  		}
>>
>>> @@ -788,7 +790,8 @@ void inet_unhash(struct sock *sk)
>>
>>>  		__sk_nulls_del_node_init_rcu(sk);
>>>  		sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
>>> -		spin_unlock(&ilb2->lock);
>>> +		if (!has_current_bpf_ctx())
>>> +			spin_unlock(&ilb2->lock);
>>>  	} else {
>>>  		spinlock_t *lock = inet_ehash_lockp(hashinfo, sk->sk_hash);
>>
>>> --
>>> 2.34.1
> On Mar 21, 2023, at 2:31 PM, Stanislav Fomichev <sdf@google.com> wrote:
>
> On 03/21, Aditi Ghag wrote:
>> When sockets are destroyed in the BPF iterator context, sock
>> lock is already acquired, so skip taking the lock. This allows
>> TCP listening sockets to be destroyed from BPF programs.
>
>> Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
>> ---
>>  net/ipv4/inet_hashtables.c | 9 ++++++---
>>  1 file changed, 6 insertions(+), 3 deletions(-)
>
>> diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
>> index e41fdc38ce19..5543a3e0d1b4 100644
>> --- a/net/ipv4/inet_hashtables.c
>> +++ b/net/ipv4/inet_hashtables.c
>> @@ -777,9 +777,11 @@ void inet_unhash(struct sock *sk)
>>  		/* Don't disable bottom halves while acquiring the lock to
>>  		 * avoid circular locking dependency on PREEMPT_RT.
>>  		 */
>> -		spin_lock(&ilb2->lock);
>> +		if (!has_current_bpf_ctx())
>> +			spin_lock(&ilb2->lock);
>>  		if (sk_unhashed(sk)) {
>> -			spin_unlock(&ilb2->lock);
>> +			if (!has_current_bpf_ctx())
>> +				spin_unlock(&ilb2->lock);
>
> That's bucket lock, why do we have to skip it?

Good catch! We shouldn't skip it. The issue is that the BPF TCP iterator acquires the sock fast lock, which disables BH, and then inet_unhash() acquires the bucket lock. That creates a dependency from a softirq-safe lock (the sock lock, also taken from softirq context) to a softirq-unsafe one (the bucket lock, elsewhere taken with BH enabled), which lockdep flags as a potential deadlock.

@Martin: I wonder if you ran into similar issues while working on the BPF TCP iterator batching changes for setsockopt?

Here is the truncated stack trace:

[ 1.494510] test_progs/118 [HC0[0]:SC0[4]:HE1:SE0] is trying to acquire:
[ 1.495123] ffff9ec9c184da10 (&h->lhash2[i].lock){+.+.}-{2:2}, at: inet_unhash+0x96/0xd0
[ 1.495867]
[ 1.495867] and this task is already holding:
[ 1.496401] ffff9ec9c23400b0 (slock-AF_INET6){+.-.}-{2:2}, at: __lock_sock_fast+0x33/0x70
[ 1.497148] which would create a new lock dependency:
[ 1.497619] (slock-AF_INET6){+.-.}-{2:2} -> (&h->lhash2[i].lock){+.+.}-{2:2}
[ 1.498278]
[ 1.498278] but this new dependency connects a SOFTIRQ-irq-safe lock:
[ 1.499011] (slock-AF_INET6){+.-.}-{2:2}
[ 1.499014]
[ 1.499014] ... which became SOFTIRQ-irq-safe at:
[ 1.499968] lock_acquire+0xcd/0x330
[ 1.500316] _raw_spin_lock+0x33/0x40
[ 1.500670] sk_clone_lock+0x146/0x520
[ 1.501030] inet_csk_clone_lock+0x1b/0x110
[ 1.501433] tcp_create_openreq_child+0x22/0x3f0
[ 1.501873] tcp_v6_syn_recv_sock+0x96/0x940
[ 1.502284] tcp_check_req+0x137/0x660
[ 1.502646] tcp_v6_rcv+0xa63/0xe80
[ 1.502994] ip6_protocol_deliver_rcu+0x78/0x590
[ 1.503434] ip6_input_finish+0x72/0x140
[ 1.503818] __netif_receive_skb_one_core+0x63/0xa0
:
:
[ 1.512311] to a SOFTIRQ-irq-unsafe lock:
[ 1.512773] (&h->lhash2[i].lock){+.+.}-{2:2}
[ 1.512776]
[ 1.512776] ... which became SOFTIRQ-irq-unsafe at:
[ 1.513691] ...
[ 1.513692] lock_acquire+0xcd/0x330
[ 1.514168] _raw_spin_lock+0x33/0x40
[ 1.514492] __inet_hash+0x4b/0x210
[ 1.514802] inet_csk_listen_start+0xe6/0x100
[ 1.515188] inet_listen+0x95/0x1d0
[ 1.515510] __sys_listen+0x69/0xb0
[ 1.515825] __x64_sys_listen+0x14/0x20
[ 1.516163] do_syscall_64+0x3c/0x90
[ 1.516481] entry_SYSCALL_64_after_hwframe+0x72/0xdc
:
:
[ 1.560494] ... acquired at:
[ 1.560634] lock_acquire+0xcd/0x330
[ 1.560804] _raw_spin_lock_bh+0x38/0x50
[ 1.561002] inet_unhash+0x96/0xd0
[ 1.561168] tcp_set_state+0x6a/0x210
[ 1.561352] tcp_abort+0x12b/0x230
[ 1.561518] bpf_prog_f4110fb1100e26b5_iter_tcp6_server+0xa3/0xaa
[ 1.561799] bpf_iter_run_prog+0x1ff/0x340
[ 1.561990] bpf_iter_tcp_seq_show+0xca/0x190
[ 1.562193] bpf_seq_read+0x177/0x450
[ 1.562363] vfs_read+0xc6/0x300
[ 1.562516] ksys_read+0x69/0xf0
[ 1.562665] do_syscall_64+0x3c/0x90
[ 1.562838] entry_SYSCALL_64_after_hwframe+0x72/0xdc

>
>>  			return;
>>  		}
>
>> @@ -788,7 +790,8 @@ void inet_unhash(struct sock *sk)
>
>>  		__sk_nulls_del_node_init_rcu(sk);
>>  		sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
>> -		spin_unlock(&ilb2->lock);
>> +		if (!has_current_bpf_ctx())
>> +			spin_unlock(&ilb2->lock);
>>  	} else {
>>  		spinlock_t *lock = inet_ehash_lockp(hashinfo, sk->sk_hash);
>
>> --
>> 2.34.1
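[To make the splat concrete, here is a reduced illustration of the reported dependency, with hypothetical locks standing in for the sock spinlock and ilb2->lock (illustrative kernel-style C, not code from this series):

static DEFINE_SPINLOCK(lock_a);	/* stands in for slock: also taken from softirq */
static DEFINE_SPINLOCK(lock_b);	/* stands in for ilb2->lock: taken with BH enabled */

static void bpf_iterator_path(void)	/* the new A -> B dependency */
{
	spin_lock_bh(&lock_a);		/* like lock_sock_fast() */
	spin_lock(&lock_b);		/* like tcp_abort() -> inet_unhash() */
	spin_unlock(&lock_b);
	spin_unlock_bh(&lock_a);
}

static void listen_path(void)		/* the pre-existing behavior */
{
	spin_lock(&lock_b);		/* like __inet_hash(): BHs stay enabled */
	/* A softirq needing lock_a can preempt this CPU right here. If
	 * another CPU sits in bpf_iterator_path() holding lock_a while
	 * spinning on lock_b, neither CPU can make progress: deadlock.
	 */
	spin_unlock(&lock_b);
}
]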
On 3/21/23 4:39 PM, Aditi Ghag wrote:
>
>> On Mar 21, 2023, at 2:31 PM, Stanislav Fomichev <sdf@google.com> wrote:
>>
>> On 03/21, Aditi Ghag wrote:
>>> When sockets are destroyed in the BPF iterator context, sock
>>> lock is already acquired, so skip taking the lock. This allows
>>> TCP listening sockets to be destroyed from BPF programs.
>>
>>> Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
>>> ---
>>>  net/ipv4/inet_hashtables.c | 9 ++++++---
>>>  1 file changed, 6 insertions(+), 3 deletions(-)
>>
>>> diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
>>> index e41fdc38ce19..5543a3e0d1b4 100644
>>> --- a/net/ipv4/inet_hashtables.c
>>> +++ b/net/ipv4/inet_hashtables.c
>>> @@ -777,9 +777,11 @@ void inet_unhash(struct sock *sk)
>>>  		/* Don't disable bottom halves while acquiring the lock to
>>>  		 * avoid circular locking dependency on PREEMPT_RT.
>>>  		 */
>>> -		spin_lock(&ilb2->lock);
>>> +		if (!has_current_bpf_ctx())
>>> +			spin_lock(&ilb2->lock);
>>>  		if (sk_unhashed(sk)) {
>>> -			spin_unlock(&ilb2->lock);
>>> +			if (!has_current_bpf_ctx())
>>> +				spin_unlock(&ilb2->lock);
>>
>> That's bucket lock, why do we have to skip it?
>
> Good catch! We shouldn't skip it. The issue is that the BPF TCP iterator acquires the sock fast lock, which disables BH, and then inet_unhash() acquires the bucket lock. That creates a dependency from a softirq-safe lock (the sock lock, also taken from softirq context) to a softirq-unsafe one (the bucket lock, elsewhere taken with BH enabled), which lockdep flags as a potential deadlock.
>
> @Martin: I wonder if you ran into similar issues while working on the BPF TCP iterator batching changes for setsockopt?

The existing helpers don't need to acquire the bucket lock. afaik, the destroy kfunc is the first.

[ Unrelated note, the bh disable was recently removed for lhash in commit 4f9bf2a2f5aa. ]

If I read this splat correctly, an easy solution, similar to the earlier reply in patch 2 for a different issue, is to use lock_sock() instead of lock_sock_fast() in bpf_iter_tcp_seq_show(). A sketch follows after the quoted trace below.

> Here is the truncated stack trace:
>
> [ 1.494510] test_progs/118 [HC0[0]:SC0[4]:HE1:SE0] is trying to acquire:
> [ 1.495123] ffff9ec9c184da10 (&h->lhash2[i].lock){+.+.}-{2:2}, at: inet_unhash+0x96/0xd0
> [ 1.495867]
> [ 1.495867] and this task is already holding:
> [ 1.496401] ffff9ec9c23400b0 (slock-AF_INET6){+.-.}-{2:2}, at: __lock_sock_fast+0x33/0x70
> [ 1.497148] which would create a new lock dependency:
> [ 1.497619] (slock-AF_INET6){+.-.}-{2:2} -> (&h->lhash2[i].lock){+.+.}-{2:2}
> [ 1.498278]
> [ 1.498278] but this new dependency connects a SOFTIRQ-irq-safe lock:
> [ 1.499011] (slock-AF_INET6){+.-.}-{2:2}
> [ 1.499014]
> [ 1.499014] ... which became SOFTIRQ-irq-safe at:
> [ 1.499968] lock_acquire+0xcd/0x330
> [ 1.500316] _raw_spin_lock+0x33/0x40
> [ 1.500670] sk_clone_lock+0x146/0x520
> [ 1.501030] inet_csk_clone_lock+0x1b/0x110
> [ 1.501433] tcp_create_openreq_child+0x22/0x3f0
> [ 1.501873] tcp_v6_syn_recv_sock+0x96/0x940
> [ 1.502284] tcp_check_req+0x137/0x660
> [ 1.502646] tcp_v6_rcv+0xa63/0xe80
> [ 1.502994] ip6_protocol_deliver_rcu+0x78/0x590
> [ 1.503434] ip6_input_finish+0x72/0x140
> [ 1.503818] __netif_receive_skb_one_core+0x63/0xa0
> :
> :
>
> [ 1.512311] to a SOFTIRQ-irq-unsafe lock:
> [ 1.512773] (&h->lhash2[i].lock){+.+.}-{2:2}
> [ 1.512776]
> [ 1.512776] ... which became SOFTIRQ-irq-unsafe at:
> [ 1.513691] ...
> [ 1.513692] lock_acquire+0xcd/0x330
> [ 1.514168] _raw_spin_lock+0x33/0x40
> [ 1.514492] __inet_hash+0x4b/0x210
> [ 1.514802] inet_csk_listen_start+0xe6/0x100
> [ 1.515188] inet_listen+0x95/0x1d0
> [ 1.515510] __sys_listen+0x69/0xb0
> [ 1.515825] __x64_sys_listen+0x14/0x20
> [ 1.516163] do_syscall_64+0x3c/0x90
> [ 1.516481] entry_SYSCALL_64_after_hwframe+0x72/0xdc
> :
> :
>
> [ 1.560494] ... acquired at:
> [ 1.560634] lock_acquire+0xcd/0x330
> [ 1.560804] _raw_spin_lock_bh+0x38/0x50
> [ 1.561002] inet_unhash+0x96/0xd0
> [ 1.561168] tcp_set_state+0x6a/0x210
> [ 1.561352] tcp_abort+0x12b/0x230
> [ 1.561518] bpf_prog_f4110fb1100e26b5_iter_tcp6_server+0xa3/0xaa
> [ 1.561799] bpf_iter_run_prog+0x1ff/0x340
> [ 1.561990] bpf_iter_tcp_seq_show+0xca/0x190
> [ 1.562193] bpf_seq_read+0x177/0x450
> [ 1.562363] vfs_read+0xc6/0x300
> [ 1.562516] ksys_read+0x69/0xf0
> [ 1.562665] do_syscall_64+0x3c/0x90
> [ 1.562838] entry_SYSCALL_64_after_hwframe+0x72/0xdc
>
>>
>>>  			return;
>>>  		}
>>
>>> @@ -788,7 +790,8 @@ void inet_unhash(struct sock *sk)
>>
>>>  		__sk_nulls_del_node_init_rcu(sk);
>>>  		sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
>>> -		spin_unlock(&ilb2->lock);
>>> +		if (!has_current_bpf_ctx())
>>> +			spin_unlock(&ilb2->lock);
>>>  	} else {
>>>  		spinlock_t *lock = inet_ehash_lockp(hashinfo, sk->sk_hash);
>>
>>> --
>>> 2.34.1
>
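[A minimal sketch of the change suggested above, assuming the surrounding code matches the upstream bpf_iter_tcp_seq_show() of this era (the sk_fullsock() check and the tcp_prog_seq_show() call are taken from that code); an untested sketch, not a patch from the series:

	/* lock_sock() takes socket ownership and releases the BH-disabling
	 * spinlock before returning, so the BPF program runs with BHs
	 * enabled and inet_unhash() no longer nests ilb2->lock under a
	 * softirq-safe lock.
	 */
	if (sk_fullsock(sk))
		lock_sock(sk);		/* was: slow = lock_sock_fast(sk); */

	ret = tcp_prog_seq_show(prog, &meta, v, uid);

	if (sk_fullsock(sk))
		release_sock(sk);	/* was: unlock_sock_fast(sk, slow); */
]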
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index e41fdc38ce19..5543a3e0d1b4 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -777,9 +777,11 @@ void inet_unhash(struct sock *sk)
 		/* Don't disable bottom halves while acquiring the lock to
 		 * avoid circular locking dependency on PREEMPT_RT.
 		 */
-		spin_lock(&ilb2->lock);
+		if (!has_current_bpf_ctx())
+			spin_lock(&ilb2->lock);
 		if (sk_unhashed(sk)) {
-			spin_unlock(&ilb2->lock);
+			if (!has_current_bpf_ctx())
+				spin_unlock(&ilb2->lock);
 			return;
 		}
 
@@ -788,7 +790,8 @@ void inet_unhash(struct sock *sk)
 
 		__sk_nulls_del_node_init_rcu(sk);
 		sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
-		spin_unlock(&ilb2->lock);
+		if (!has_current_bpf_ctx())
+			spin_unlock(&ilb2->lock);
 	} else {
 		spinlock_t *lock = inet_ehash_lockp(hashinfo, sk->sk_hash);
When sockets are destroyed in the BPF iterator context, sock
lock is already acquired, so skip taking the lock. This allows
TCP listening sockets to be destroyed from BPF programs.

Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
---
 net/ipv4/inet_hashtables.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)