Message ID | 20241020110345.1468595-1-zijianzhang@bytedance.com (mailing list archive) |
---|---|
Series | Fixes to bpf_msg_push/pop_data and test_sockmap |
zijianzhang@ wrote:
> From: Zijian Zhang <zijianzhang@bytedance.com>
>
> Several fixes to test_sockmap and added push/pop logic for msg_verify_data
> Before the fixes, some of the tests in test_sockmap are problematic,
> resulting in pseudo-correct result.
>
> 1. txmsg_pass is not set in some tests, as a result, no eBPF program is
>    attached to the sockmap.
> 2. In SENDPAGE, a wrong iov_length in test_send_large may result in some
>    test skippings and failures.
> 3. The calculation of total_bytes in msg_loop_rx is wrong, which may cause
>    msg_loop_rx end early and skip some data tests.
>
> Besides, for msg_verify_data, I added push/pop checking logic to function
> msg_verify_data and added more tests for different cases.

Thanks! Yep I think push/pop are not widely used anywhere unfortunately.
There are some interesting uses for push/pop to add/edit headers, but
I've not gotten there yet clearly.

>
> After that, I found that there are some bugs in bpf_msg_push_data,
> bpf_msg_pop_data and sk_msg_reset_curr, and fix them. I guess the reason
> why they have not been exposed is that because of the above problems, they
> will not be triggered.

Good. I'll review these quickly tonight/tomorrow and run some testing.
We don't currently have any longer running tests with push/pop.

>
> With the fixes, we can pass the sockmap test with data integrity test now.
> However, the fixes to test_sockmap expose more problems in sockhash test
> with SENDPAGE and ktls with SENDPAGE.
>
> The problem I observed,

Thanks for digging into these.

[...]

Hi Zijian,

On 10/24/24 6:06 AM, John Fastabend wrote:
> zijianzhang@ wrote:
>> From: Zijian Zhang <zijianzhang@bytedance.com>
>>
>> Several fixes to test_sockmap and added push/pop logic for msg_verify_data
>> Before the fixes, some of the tests in test_sockmap are problematic,
>> resulting in pseudo-correct result.
>>
>> 1. txmsg_pass is not set in some tests, as a result, no eBPF program is
>>    attached to the sockmap.
>> 2. In SENDPAGE, a wrong iov_length in test_send_large may result in some
>>    test skippings and failures.
>> 3. The calculation of total_bytes in msg_loop_rx is wrong, which may cause
>>    msg_loop_rx end early and skip some data tests.
>>
>> Besides, for msg_verify_data, I added push/pop checking logic to function
>> msg_verify_data and added more tests for different cases.
>
> Thanks! Yep I think push/pop are not widely used anywhere unfortunately.
> There are some interesting uses for push/pop to add/edit headers, but
> I've not gotten there yet clearly.
>
>> After that, I found that there are some bugs in bpf_msg_push_data,
>> bpf_msg_pop_data and sk_msg_reset_curr, and fix them. I guess the reason
>> why they have not been exposed is that because of the above problems, they
>> will not be triggered.
>
> Good. I'll review these quickly tonight/tomorrow and run some testing.
> We don't currently have any longer running tests with push/pop.

Looks like the series needs a rebase to latest bpf tree.

Thanks,
Daniel

On 10/24/24 7:43 AM, Daniel Borkmann wrote:
> Hi Zijian,
>
> On 10/24/24 6:06 AM, John Fastabend wrote:
>> zijianzhang@ wrote:
>>> From: Zijian Zhang <zijianzhang@bytedance.com>
>>>
>>> Several fixes to test_sockmap and added push/pop logic for
>>> msg_verify_data
>>> Before the fixes, some of the tests in test_sockmap are problematic,
>>> resulting in pseudo-correct result.
>>>
>>> 1. txmsg_pass is not set in some tests, as a result, no eBPF program is
>>>    attached to the sockmap.
>>> 2. In SENDPAGE, a wrong iov_length in test_send_large may result in some
>>>    test skippings and failures.
>>> 3. The calculation of total_bytes in msg_loop_rx is wrong, which may
>>>    cause
>>>    msg_loop_rx end early and skip some data tests.
>>>
>>> Besides, for msg_verify_data, I added push/pop checking logic to
>>> function
>>> msg_verify_data and added more tests for different cases.
>>
>> Thanks! Yep I think push/pop are not widely used anywhere unfortunately.
>> There are some interesting uses for push/pop to add/edit headers, but
>> I've not gotten there yet clearly.
>>

Thanks for the reviewing :)

>>> After that, I found that there are some bugs in bpf_msg_push_data,
>>> bpf_msg_pop_data and sk_msg_reset_curr, and fix them. I guess the reason
>>> why they have not been exposed is that because of the above problems,
>>> they
>>> will not be triggered.
>>
>> Good. I'll review these quickly tonight/tomorrow and run some testing.
>> We don't currently have any longer running tests with push/pop.
>
> Looks like the series needs a rebase to latest bpf tree.
>
> Thanks,
> Daniel

This series depends on my previous fixes to test_sockmap("Two fixes for
test_sockmap"), and they were merged to bpf/bpf-next.git (net branch) a
week ago. Shall I wait for merging of them to the latest bpf, and then
rebase?

Thanks,
Zijian

On 10/24/24 7:56 PM, Zijian Zhang wrote:
> On 10/24/24 7:43 AM, Daniel Borkmann wrote:
>> On 10/24/24 6:06 AM, John Fastabend wrote:
>>> zijianzhang@ wrote:
>>>> From: Zijian Zhang <zijianzhang@bytedance.com>
>>>>
>>>> Several fixes to test_sockmap and added push/pop logic for msg_verify_data
>>>> Before the fixes, some of the tests in test_sockmap are problematic,
>>>> resulting in pseudo-correct result.
>>>>
>>>> 1. txmsg_pass is not set in some tests, as a result, no eBPF program is
>>>>    attached to the sockmap.
>>>> 2. In SENDPAGE, a wrong iov_length in test_send_large may result in some
>>>>    test skippings and failures.
>>>> 3. The calculation of total_bytes in msg_loop_rx is wrong, which may cause
>>>>    msg_loop_rx end early and skip some data tests.
>>>>
>>>> Besides, for msg_verify_data, I added push/pop checking logic to function
>>>> msg_verify_data and added more tests for different cases.
>>>
>>> Thanks! Yep I think push/pop are not widely used anywhere unfortunately.
>>> There are some interesting uses for push/pop to add/edit headers, but
>>> I've not gotten there yet clearly.
>
> Thanks for the reviewing :)
>
>>>> After that, I found that there are some bugs in bpf_msg_push_data,
>>>> bpf_msg_pop_data and sk_msg_reset_curr, and fix them. I guess the reason
>>>> why they have not been exposed is that because of the above problems, they
>>>> will not be triggered.
>>>
>>> Good. I'll review these quickly tonight/tomorrow and run some testing.
>>> We don't currently have any longer running tests with push/pop.
>>
>> Looks like the series needs a rebase to latest bpf tree.
>>
>> Thanks,
>> Daniel
>
> This series depends on my previous fixes to test_sockmap("Two fixes for
> test_sockmap"), and they were merged to bpf/bpf-next.git (net branch) a
> week ago. Shall I wait for merging of them to the latest bpf, and then
> rebase?

Then this series also needs to be based against bpf-next, net branch (along
with PATCH bpf-next in $subj) so that the CI can pick it up.

Thanks,
Daniel

On 10/24/24 11:12 AM, Daniel Borkmann wrote:
>> This series depends on my previous fixes to test_sockmap("Two fixes for
>> test_sockmap"), and they were merged to bpf/bpf-next.git (net branch) a
>> week ago. Shall I wait for merging of them to the latest bpf, and then
>> rebase?
>
> Then this series also needs to be based against bpf-next, net branch (along
> with PATCH bpf-next in $subj) so that the CI can pick it up.
>
> Thanks,
> Daniel

I tried bpf-next, and found I still could not pass the apply test. Then, I
sent another series with bpf-next/next in $subj, and it also failed, but
the CI is running somehow. Not sure what is the right way to target
bpf-next, net branch?

Apologize for the flood of emails. Could you help me change the state of
the wrong patch series, so that it won't bother others?

Thanks,
Zijian

From: Zijian Zhang <zijianzhang@bytedance.com>

Several fixes to test_sockmap and added push/pop logic for msg_verify_data
Before the fixes, some of the tests in test_sockmap are problematic,
resulting in pseudo-correct result.

1. txmsg_pass is not set in some tests, as a result, no eBPF program is
   attached to the sockmap.
2. In SENDPAGE, a wrong iov_length in test_send_large may result in some
   test skippings and failures.
3. The calculation of total_bytes in msg_loop_rx is wrong, which may cause
   msg_loop_rx end early and skip some data tests.

Besides, for msg_verify_data, I added push/pop checking logic to function
msg_verify_data and added more tests for different cases.

After that, I found that there are some bugs in bpf_msg_push_data,
bpf_msg_pop_data and sk_msg_reset_curr, and fix them. I guess the reason
why they have not been exposed is that because of the above problems, they
will not be triggered.

With the fixes, we can pass the sockmap test with data integrity test now.
However, the fixes to test_sockmap expose more problems in sockhash test
with SENDPAGE and ktls with SENDPAGE.

The problem I observed,
1. In sockhash test, a NULL pointer kernel BUG will be reported for nearly
   every cork test. More inspections are needed for splice_to_socket.

BUG: kernel NULL pointer dereference, address: 0000000000000008
PGD 0 P4D 0
Oops: Oops: 0000 [#3] PREEMPT SMP PTI
CPU: 3 UID: 0 PID: 2122 Comm: test_sockmap 6.12.0-rc2.bm.1-amd64+ #98
Tainted: [D]=DIE
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
RIP: 0010:splice_to_socket+0x34a/0x480
Call Trace:
 <TASK>
 ? __die_body+0x1e/0x60
 ? page_fault_oops+0x159/0x4d0
 ? exc_page_fault+0x7e/0x180
 ? asm_exc_page_fault+0x26/0x30
 ? splice_to_socket+0x34a/0x480
 ? __memcg_slab_post_alloc_hook+0x205/0x3c0
 ? alloc_pipe_info+0xd6/0x1f0
 ? __kmalloc_noprof+0x37f/0x3b0
 direct_splice_actor+0x40/0x100
 splice_direct_to_actor+0xfd/0x290
 ? __pfx_direct_splice_actor+0x10/0x10
 do_splice_direct_actor+0x82/0xb0
 ? __pfx_direct_file_splice_eof+0x10/0x10
 do_splice_direct+0x13/0x20
 ? __pfx_direct_splice_actor+0x10/0x10
 do_sendfile+0x33c/0x3f0
 __x64_sys_sendfile64+0xa7/0xc0
 do_syscall_64+0x62/0x170
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
 </TASK>
Modules linked in:
CR2: 0000000000000008
---[ end trace 0000000000000000 ]---

2. txmsg_pass are not set before, and some tests are skipped. Now after the
   fixes, we have some failure cases now. More fixes are needed either for
   the selftest or the ktls kernel code.

 1/ 6 sockhash:ktls:txmsg test passthrough:OK
 2/ 6 sockhash:ktls:txmsg test redirect:OK
 3/ 1 sockhash:ktls:txmsg test redirect wait send mem:OK
 4/ 6 sockhash:ktls:txmsg test drop:OK
 5/ 6 sockhash:ktls:txmsg test ingress redirect:OK
 6/ 7 sockhash:ktls:txmsg test skb:OK
 7/12 sockhash:ktls:txmsg test apply:OK
 8/12 sockhash:ktls:txmsg test cork:OK
 9/ 3 sockhash:ktls:txmsg test hanging corks:OK
detected data corruption @iov[0]:0 17 != 00, 03 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
detected data corruption @iov[0]:0 17 != 00, 03 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
10/11 sockhash:ktls:txmsg test push_data:FAIL
detected data corruption @iov[0]:0 17 != 00, 00 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
detected data corruption @iov[0]:0 17 != 00, 00 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
detected data corruption @iov[0]:0 17 != 00, 03 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
detected data corruption @iov[0]:0 17 != 00, 03 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
detected data corruption @iov[0]:0 17 != 00, 03 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
detected data corruption @iov[0]:0 17 != 00, 03 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
detected data corruption @iov[0]:0 17 != 00, 03 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
detected data corruption @iov[0]:0 17 != 00, 03 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
11/17 sockhash:ktls:txmsg test pull-data:FAIL
recv failed(): Invalid argument
rx thread exited with err 1.
recv failed(): Invalid argument
rx thread exited with err 1.
recv failed(): Bad message
rx thread exited with err 1.
detected data corruption @iov[0]:0 17 != 00, 03 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
detected data corruption @iov[0]:0 17 != 00, 03 ?= 01
data verify msg failed: Unknown error -2001
rx thread exited with err 1.
12/ 9 sockhash:ktls:txmsg test pop-data:FAIL
recv failed(): Bad message
rx thread exited with err 1.
recv failed(): Bad message
rx thread exited with err 1.
13/ 6 sockhash:ktls:txmsg test push/pop data:FAIL
14/ 1 sockhash:ktls:txmsg test ingress parser:OK
15/ 0 sockhash:ktls:txmsg test ingress parser2:OK
Pass: 11 Fail: 17

Zijian Zhang (8):
  selftests/bpf: Add txmsg_pass to pull/push/pop in test_sockmap
  selftests/bpf: Fix SENDPAGE data logic in test_sockmap
  selftests/bpf: Fix total_bytes in msg_loop_rx in test_sockmap
  selftests/bpf: Add push/pop checking for msg_verify_data in test_sockmap
  selftests/bpf: Add more tests for test_txmsg_push_pop in test_sockmap
  bpf, sockmap: Several fixes to bpf_msg_push_data
  bpf, sockmap: Several fixes to bpf_msg_pop_data
  bpf, sockmap: Fix sk_msg_reset_curr

 net/core/filter.c                          |  89 +++++-----
 tools/testing/selftests/bpf/test_sockmap.c | 180 +++++++++++++++++++--
 2 files changed, 215 insertions(+), 54 deletions(-)
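
For readers following the cover letter, here is a rough userspace sketch of
what "push/pop checking" on the receive side can look like. It is not the code
from this series. It assumes the sender fills the message with an incrementing
byte pattern (0, 1, 2, ...), that one pop is applied first and one push second,
and that the contents of pushed bytes are not checked. All function names
(fill_pattern, model_pop, model_push, expected_rx_len, verify_edited) are
hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Assumed sender pattern: byte i of the message carries (unsigned char)i. */
static void fill_pattern(unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] = (unsigned char)i;
}

/* Model of a pop: drop `len` bytes at `start`; returns the new length. */
static size_t model_pop(unsigned char *buf, size_t buf_len,
			size_t start, size_t len)
{
	if (start > buf_len || len > buf_len - start)
		return buf_len;	/* out of range: leave the data untouched */
	memmove(buf + start, buf + start + len, buf_len - start - len);
	return buf_len - len;
}

/* Model of a push: open a `len`-byte gap at `start`; returns the new length.
 * The gap is zeroed here only to make the example deterministic. */
static size_t model_push(unsigned char *buf, size_t buf_len, size_t cap,
			 size_t start, size_t len)
{
	if (start > buf_len || len > cap - buf_len)
		return buf_len;	/* would not fit: leave the data untouched */
	memmove(buf + start + len, buf + start, buf_len - start);
	memset(buf + start, 0, len);
	return buf_len + len;
}

/* Bytes the receiver should wait for after one pop and one push. */
static size_t expected_rx_len(size_t sent, size_t pop_len, size_t push_len)
{
	return sent - pop_len + push_len;
}

/* Check received data against the pattern, accounting for the edits.
 * Returns -1 on success or the index of the first mismatching byte. */
static long verify_edited(const unsigned char *rx, size_t rx_len,
			  size_t pop_start, size_t pop_len,
			  size_t push_start, size_t push_len)
{
	for (size_t i = 0; i < rx_len; i++) {
		if (i >= push_start && i < push_start + push_len)
			continue;	/* pushed bytes: contents not checked */
		/* Undo the push, then the pop, to recover the sender index. */
		size_t p = i < push_start ? i : i - push_len;
		size_t o = p < pop_start ? p : p + pop_len;
		if (rx[i] != (unsigned char)o)
			return (long)i;
	}
	return -1;
}
```

With a 32-byte message, a pop of 8 bytes at offset 4 and a push of 3 bytes at
offset 2, the receiver in this model expects 27 bytes, and a verifier that
ignores the edits reports a mismatch at the first pushed byte. That is the kind
of false result the cover letter is guarding against.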