
[bpf] bpf: fix recursive lock when verdict program return SK_PASS

Message ID 20241106124431.5583-1-mrpre@163.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Series [bpf] bpf: fix recursive lock when verdict program return SK_PASS

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for bpf
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 3 this patch: 3
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers fail 1 blamed authors not CCed: daniel@iogearbox.net; 3 maintainers not CCed: horms@kernel.org daniel@iogearbox.net bpf@vger.kernel.org
netdev/build_clang success Errors and warnings before: 3 this patch: 3
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 4 this patch: 4
netdev/checkpatch warning WARNING: From:/Signed-off-by: email name mismatch: 'From: mrpre <mrpre@163.com>' != 'Signed-off-by: Jiayuan Chen <mrpre@163.com>'
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-PR success PR summary
bpf/vmtest-bpf-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-VM_Test-2 success Logs for Unittests
bpf/vmtest-bpf-VM_Test-3 success Logs for Validate matrix.py
bpf/vmtest-bpf-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-VM_Test-5 success Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-VM_Test-4 success Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-VM_Test-9 success Logs for aarch64-gcc / test (test_verifier, false, 360) / test_verifier on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-12 success Logs for s390x-gcc / build-release
bpf/vmtest-bpf-VM_Test-10 success Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-VM_Test-6 success Logs for aarch64-gcc / test (test_maps, false, 360) / test_maps on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-8 success Logs for aarch64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-7 success Logs for aarch64-gcc / test (test_progs, false, 360) / test_progs on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-11 success Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-VM_Test-18 success Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-VM_Test-17 success Logs for set-matrix
bpf/vmtest-bpf-VM_Test-19 success Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-VM_Test-16 success Logs for s390x-gcc / veristat
bpf/vmtest-bpf-VM_Test-28 success Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17-O2
bpf/vmtest-bpf-VM_Test-34 success Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-27 success Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-33 success Logs for x86_64-llvm-17 / veristat
bpf/vmtest-bpf-VM_Test-35 success Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18-O2
bpf/vmtest-bpf-VM_Test-41 success Logs for x86_64-llvm-18 / veristat
bpf/vmtest-bpf-VM_Test-15 success Logs for s390x-gcc / test (test_verifier, false, 360) / test_verifier on s390x with gcc
bpf/vmtest-bpf-VM_Test-14 success Logs for s390x-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-VM_Test-13 success Logs for s390x-gcc / test (test_progs, false, 360) / test_progs on s390x with gcc
bpf/vmtest-bpf-VM_Test-21 success Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-24 success Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-20 success Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-25 success Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-23 success Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-26 success Logs for x86_64-gcc / veristat / veristat on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-29 success Logs for x86_64-llvm-17 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-32 success Logs for x86_64-llvm-17 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-22 success Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-36 success Logs for x86_64-llvm-18 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-30 success Logs for x86_64-llvm-17 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-31 success Logs for x86_64-llvm-17 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-37 success Logs for x86_64-llvm-18 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-40 success Logs for x86_64-llvm-18 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-39 success Logs for x86_64-llvm-18 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-38 success Logs for x86_64-llvm-18 / test (test_progs_cpuv4, false, 360) / test_progs_cpuv4 on x86_64 with llvm-18

Commit Message

Jiayuan Chen Nov. 6, 2024, 12:44 p.m. UTC
When the stream_verdict program returns SK_PASS, the received skb is placed
into the socket's own receive queue. On that path sk_callback_lock is taken
for reading while the same context already holds it for writing, so the
lock recurses and the system deadlocks. This issue has been present since
v6.9.

'''
sk_psock_strp_data_ready
    write_lock_bh(&sk->sk_callback_lock)
    strp_data_ready
      strp_read_sock
        read_sock -> tcp_read_sock
          strp_recv
            cb.rcv_msg -> sk_psock_strp_read
              # stream_verdict now returns SK_PASS with no peer sock assigned
              __SK_PASS = sk_psock_map_verd(SK_PASS, NULL)
              sk_psock_verdict_apply
                sk_psock_skb_ingress_self
                  sk_psock_skb_ingress_enqueue
                    sk_psock_data_ready
                      read_lock_bh(&sk->sk_callback_lock) <= deadlock

'''

This topic has been discussed before, but it has not been fixed.
Previous discussion:
https://lore.kernel.org/all/6684a5864ec86_403d20898@john.notmuch

Fixes: 6648e613226e ("bpf, skmsg: Fix NULL pointer dereference in sk_psock_skb_ingress_enqueue")
Reported-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
Signed-off-by: Jiayuan Chen <mrpre@163.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 net/core/skmsg.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Martin KaFai Lau Nov. 8, 2024, 9:03 p.m. UTC | #1
On 11/6/24 4:44 AM, mrpre wrote:
> When the stream_verdict program returns SK_PASS, it places the received skb
> into its own receive queue, but a recursive lock eventually occurs, leading
> to an operating system deadlock. This issue has been present since v6.9.
> 
> '''
> sk_psock_strp_data_ready
>      write_lock_bh(&sk->sk_callback_lock)
>      strp_data_ready
>        strp_read_sock
>          read_sock -> tcp_read_sock
>            strp_recv
>              cb.rcv_msg -> sk_psock_strp_read
>                # now stream_verdict return SK_PASS without peer sock assign
>                __SK_PASS = sk_psock_map_verd(SK_PASS, NULL)
>                sk_psock_verdict_apply
>                  sk_psock_skb_ingress_self
>                    sk_psock_skb_ingress_enqueue
>                      sk_psock_data_ready
>                        read_lock_bh(&sk->sk_callback_lock) <= dead lock
> 
> '''
> 
> This topic has been discussed before, but it has not been fixed.
> Previous discussion:
> https://lore.kernel.org/all/6684a5864ec86_403d20898@john.notmuch

Is the selftest included in this link still useful to reproduce this bug?
If yes, please include that also.

> 
> Fixes: 6648e613226e ("bpf, skmsg: Fix NULL pointer dereference in sk_psock_skb_ingress_enqueue")
> Reported-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
> Signed-off-by: Jiayuan Chen <mrpre@163.com>

Please also use your real name as the author (i.e. the email sender); the
patch needs a real author name as well. I manually fixed this in one of your
earlier lock_sock fixes before applying it.

pw-bot: cr

> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

The patch and the earlier discussion make sense to me.
John and JakubS, please help to take another look in the next respin.
Martin KaFai Lau Nov. 8, 2024, 9:07 p.m. UTC | #2
On 11/8/24 1:03 PM, Martin KaFai Lau wrote:
> On 11/6/24 4:44 AM, mrpre wrote:
>> When the stream_verdict program returns SK_PASS, it places the received skb
>> into its own receive queue, but a recursive lock eventually occurs, leading
>> to an operating system deadlock. This issue has been present since v6.9.
>>
>> '''
>> sk_psock_strp_data_ready
>>      write_lock_bh(&sk->sk_callback_lock)
>>      strp_data_ready
>>        strp_read_sock
>>          read_sock -> tcp_read_sock
>>            strp_recv
>>              cb.rcv_msg -> sk_psock_strp_read
>>                # now stream_verdict return SK_PASS without peer sock assign
>>                __SK_PASS = sk_psock_map_verd(SK_PASS, NULL)
>>                sk_psock_verdict_apply
>>                  sk_psock_skb_ingress_self
>>                    sk_psock_skb_ingress_enqueue
>>                      sk_psock_data_ready
>>                        read_lock_bh(&sk->sk_callback_lock) <= dead lock
>>
>> '''
>>
>> This topic has been discussed before, but it has not been fixed.
>> Previous discussion:
>> https://lore.kernel.org/all/6684a5864ec86_403d20898@john.notmuch
> 
> Is the selftest included in this link still useful to reproduce this bug?
> If yes, please include that also.
> 
>>
>> Fixes: 6648e613226e ("bpf, skmsg: Fix NULL pointer dereference in 
>> sk_psock_skb_ingress_enqueue")
>> Reported-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
>> Signed-off-by: Jiayuan Chen <mrpre@163.com>
> 
> Please also use the real name in the author (i.e. the email sender). The patch 
> needs a real author name also. I had manually fixed one of your earlier 
> lock_sock fix before applying.

and the bpf mailing list address has a typo in the original patch email... I 
fixed that in this reply.

> 
> pw-bot: cr
> 
>> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> 
> The patch and the earlier discussion make sense to me.
> John and JakubS, please help to take another look in the next respin.
> 
>
Jiayuan Chen Nov. 10, 2024, 5:04 a.m. UTC | #3
On Fri, Nov 08, 2024 at 01:07:57PM +0800, Martin KaFai Lau wrote:
> On 11/8/24 1:03 PM, Martin KaFai Lau wrote:
> > On 11/6/24 4:44 AM, mrpre wrote:
> > > When the stream_verdict program returns SK_PASS, it places the received skb
> > > into its own receive queue, but a recursive lock eventually occurs, leading
> > > to an operating system deadlock. This issue has been present since v6.9.
> > > 
> > > '''
> > > sk_psock_strp_data_ready
> > >      write_lock_bh(&sk->sk_callback_lock)
> > >      strp_data_ready
> > >        strp_read_sock
> > >          read_sock -> tcp_read_sock
> > >            strp_recv
> > >              cb.rcv_msg -> sk_psock_strp_read
> > >                # now stream_verdict return SK_PASS without peer sock assign
> > >                __SK_PASS = sk_psock_map_verd(SK_PASS, NULL)
> > >                sk_psock_verdict_apply
> > >                  sk_psock_skb_ingress_self
> > >                    sk_psock_skb_ingress_enqueue
> > >                      sk_psock_data_ready
> > >                        read_lock_bh(&sk->sk_callback_lock) <= dead lock
> > > 
> > > '''
> > > 
> > > This topic has been discussed before, but it has not been fixed.
> > > Previous discussion:
> > > https://lore.kernel.org/all/6684a5864ec86_403d20898@john.notmuch
> > 
> > Is the selftest included in this link still useful to reproduce this bug?
> > If yes, please include that also.
> > 
> > > 
> > > Fixes: 6648e613226e ("bpf, skmsg: Fix NULL pointer dereference in
> > > sk_psock_skb_ingress_enqueue")
> > > Reported-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
> > > Signed-off-by: Jiayuan Chen <mrpre@163.com>
> > 
> > Please also use the real name in the author (i.e. the email sender). The
> > patch needs a real author name also. I had manually fixed one of your
> > earlier lock_sock fix before applying.
> 
> and the bpf mailing list address has a typo in the original patch email... I
> fixed that in this reply.
> 
> > 
> > pw-bot: cr
> > 
> > > Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> > 
> > The patch and the earlier discussion make sense to me.
> > John and JakubS, please help to take another look in the next respin.
> > 
> > 
Hi Martin,

Thank you for the reminder. I've added a test case in the new patch, and
I found that the deadlock reproduces 100% of the time whenever the test
case is run. This is indeed a very dangerous defect.

New patch: https://lore.kernel.org/bpf/20241109150305.141759-1-mrpre@163.com/T/#t
(Additionally, I followed your guidance and used the correct names in the
new patch. Thanks again.)

Patch

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index b1dcbd3be89e..e90fbab703b2 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -1117,9 +1117,9 @@  static void sk_psock_strp_data_ready(struct sock *sk)
 		if (tls_sw_has_ctx_rx(sk)) {
 			psock->saved_data_ready(sk);
 		} else {
-			write_lock_bh(&sk->sk_callback_lock);
+			read_lock_bh(&sk->sk_callback_lock);
 			strp_data_ready(&psock->strp);
-			write_unlock_bh(&sk->sk_callback_lock);
+			read_unlock_bh(&sk->sk_callback_lock);
 		}
 	}
 	rcu_read_unlock();