Message ID | 56d8ec28df901432e7bde4953795166ce2edd472.1719553101.git.tanggeliang@kylinos.cn (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | bugfixes for skmsg | expand |
On 6/28/24 1:47 PM, Geliang Tang wrote: > From: Geliang Tang <tanggeliang@kylinos.cn> > > Every time run this BPF selftests (./test_sockmap) on a Loongarch platform, > a Kernel panic occurs: > > ''' > Oops[#1]: > CPU: 20 PID: 23245 Comm: test_sockmap Tainted: G OE 6.10.0-rc2+ #32 > Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-UDK2018 > ... ... > ra: 90000000043a315c tcp_bpf_sendmsg+0x23c/0x420 > ERA: 900000000426cd1c sk_msg_memcopy_from_iter+0xbc/0x220 > CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) > PRMD: 0000000c (PPLV0 +PIE +PWE) > EUEN: 00000007 (+FPE +SXE +ASXE -BTE) > ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) > ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0) > BADV: 0000000000000040 > PRID: 0014c011 (Loongson-64bit, Loongson-3C5000) > Modules linked in: tls xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT > Process test_sockmap (pid: 23245, threadinfo=00000000aeb68043, task=...) > Stack : ... ... > ... > Call Trace: > [<900000000426cd1c>] sk_msg_memcopy_from_iter+0xbc/0x220 > [<90000000043a315c>] tcp_bpf_sendmsg+0x23c/0x420 > [<90000000041cafc8>] __sock_sendmsg+0x68/0xe0 > [<90000000041cc4bc>] ____sys_sendmsg+0x2bc/0x360 > [<90000000041cea18>] ___sys_sendmsg+0xb8/0x120 > [<90000000041cf1f8>] __sys_sendmsg+0x98/0x100 > [<90000000045b76ec>] do_syscall+0x8c/0xc0 > [<90000000030e1da4>] handle_syscall+0xc4/0x160 > > Code: ... > > ---[ end trace 0000000000000000 ]--- > ''' > > This crash is because a NULL pointer is passed to page_address() in > sk_msg_memcopy_from_iter(). Due to the difference in architecture, > page_address(0) will not trigger a panic on the X86 platform but will panic > on the Loogarch platform. So this bug was hidden on the x86 platform, but > now it is exposed on the Loogarch platform. > > This bug is a logic error indeed. In sk_msg_memcopy_from_iter(), an invalid > "sge" is always used: > > if (msg->sg.copybreak >= sge->length) { > msg->sg.copybreak = 0; > sk_msg_iter_var_next(i); > if (i == msg->sg.end) > break; > sge = sk_msg_elem(msg, i); > } > > If the value of i is 2, msg->sg.end is also 2 when entering this if block. > sk_msg_iter_var_next() increases i by 1, and now i is 3, which is no longer > equal to msg->sg.end. The break will not be triggered, and the next sge > obtained by sk_msg_elem(3) will be an invalid one. > > The correct approach is to check (i == msg->sg.end) first, and then invoke > sk_msg_iter_var_next() if they are not equal. > > Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") > Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> > --- > net/core/skmsg.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/core/skmsg.c b/net/core/skmsg.c > index 44952cdd1425..1906d0d0eeac 100644 > --- a/net/core/skmsg.c > +++ b/net/core/skmsg.c > @@ -378,9 +378,9 @@ int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from, > /* This is possible if a trim operation shrunk the buffer */ > if (msg->sg.copybreak >= sge->length) { > msg->sg.copybreak = 0; > - sk_msg_iter_var_next(i); > if (i == msg->sg.end) > break; > + sk_msg_iter_var_next(i); Reviewed-by: D. Wythe <alibuda@linux.alibaba.com> > sge = sk_msg_elem(msg, i); > } >
Hello, On Mon, 2024-07-01 at 17:00 +0800, D. Wythe wrote: > > > On 6/28/24 1:47 PM, Geliang Tang wrote: > > From: Geliang Tang <tanggeliang@kylinos.cn> > > > > Every time run this BPF selftests (./test_sockmap) on a Loongarch > > platform, > > a Kernel panic occurs: > > > > ''' > > Oops[#1]: > > CPU: 20 PID: 23245 Comm: test_sockmap Tainted: G OE 6.10.0- > > rc2+ #32 > > Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS > > Loongson-UDK2018 > > ... ... > > ra: 90000000043a315c tcp_bpf_sendmsg+0x23c/0x420 > > ERA: 900000000426cd1c sk_msg_memcopy_from_iter+0xbc/0x220 > > CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) > > PRMD: 0000000c (PPLV0 +PIE +PWE) > > EUEN: 00000007 (+FPE +SXE +ASXE -BTE) > > ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) > > ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0) > > BADV: 0000000000000040 > > PRID: 0014c011 (Loongson-64bit, Loongson-3C5000) > > Modules linked in: tls xt_CHECKSUM xt_MASQUERADE xt_conntrack > > ipt_REJECT > > Process test_sockmap (pid: 23245, threadinfo=00000000aeb68043, > > task=...) > > Stack : ... ... > > ... > > Call Trace: > > [<900000000426cd1c>] sk_msg_memcopy_from_iter+0xbc/0x220 > > [<90000000043a315c>] tcp_bpf_sendmsg+0x23c/0x420 > > [<90000000041cafc8>] __sock_sendmsg+0x68/0xe0 > > [<90000000041cc4bc>] ____sys_sendmsg+0x2bc/0x360 > > [<90000000041cea18>] ___sys_sendmsg+0xb8/0x120 > > [<90000000041cf1f8>] __sys_sendmsg+0x98/0x100 > > [<90000000045b76ec>] do_syscall+0x8c/0xc0 > > [<90000000030e1da4>] handle_syscall+0xc4/0x160 > > > > Code: ... > > > > ---[ end trace 0000000000000000 ]--- > > ''' > > > > This crash is because a NULL pointer is passed to page_address() in > > sk_msg_memcopy_from_iter(). Due to the difference in architecture, > > page_address(0) will not trigger a panic on the X86 platform but > > will panic > > on the Loogarch platform. So this bug was hidden on the x86 > > platform, but > > now it is exposed on the Loogarch platform. > > > > This bug is a logic error indeed. In sk_msg_memcopy_from_iter(), an > > invalid > > "sge" is always used: > > > > if (msg->sg.copybreak >= sge->length) { > > msg->sg.copybreak = 0; > > sk_msg_iter_var_next(i); > > if (i == msg->sg.end) > > break; > > sge = sk_msg_elem(msg, i); > > } > > > > If the value of i is 2, msg->sg.end is also 2 when entering this if > > block. > > sk_msg_iter_var_next() increases i by 1, and now i is 3, which is > > no longer > > equal to msg->sg.end. The break will not be triggered, and the next > > sge > > obtained by sk_msg_elem(3) will be an invalid one. > > > > The correct approach is to check (i == msg->sg.end) first, and then > > invoke > > sk_msg_iter_var_next() if they are not equal. > > > > Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg > > interface") > > Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> > > --- > > net/core/skmsg.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/net/core/skmsg.c b/net/core/skmsg.c > > index 44952cdd1425..1906d0d0eeac 100644 > > --- a/net/core/skmsg.c > > +++ b/net/core/skmsg.c > > @@ -378,9 +378,9 @@ int sk_msg_memcopy_from_iter(struct sock *sk, > > struct iov_iter *from, > > /* This is possible if a trim operation shrunk the > > buffer */ > > if (msg->sg.copybreak >= sge->length) { > > msg->sg.copybreak = 0; > > - sk_msg_iter_var_next(i); > > if (i == msg->sg.end) > > break; > > + sk_msg_iter_var_next(i); > Reviewed-by: D. Wythe <alibuda@linux.alibaba.com> Thanks for your review. But this change breaks test_sockmap. Will send a v4 to fix this. Changes Requested. -Geliang > > sge = sk_msg_elem(msg, i); > > } > > > > >
Hi John, On Mon, 2024-07-01 at 18:29 +0800, Geliang Tang wrote: > > > > > > > > Hello, > > > > > > > > > > > > > > > > On Mon, 2024-07-01 at 17:00 +0800, D. Wythe wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On 6/28/24 1:47 PM, Geliang Tang wrote: > > > > > > > > > > > > > > > > > > > > > > > > From: Geliang Tang > > > > > > > > > > > > > > > > > > > > > > > > <tanggeliang@kylinos.cn > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Every time run this BPF > > > > > > > > > > > > > > > > > > > > > > > > selftests > > > > > > > > > > > > > > > > > > > > > > > > (./test_sockmap) on a > > > > > > > > > > > > > > > > > > > > > > > > Loongarch > > > > > > > > > > > > > > > > > > > > > > > > platform, This issue seems related to "txmsg_cork". Every subtest of test_sockmap with a txmsg_cork value greater than 1 will trigger this issue. For example, this test_txmsg_cork_hangs test: static void test_txmsg_cork_hangs(int cgrp, struct sockmap_options *opt) { txmsg_pass = 1; txmsg_redir = 0; txmsg_cork = 4097; txmsg_apply = 4097; test_send_large(opt, cgrp); txmsg_pass = 0; txmsg_redir = 1; txmsg_apply = 0; txmsg_cork = 4097; test_send_large(opt, cgrp); txmsg_pass = 0; txmsg_redir = 1; txmsg_apply = 4097; txmsg_cork = 4097; test_send_large(opt, cgrp); } These tests will break the sk_msg sge iteration in sk_msg_memcopy_from_iter(). I added the following test code: ''' diff --git a/net/core/skmsg.c b/net/core/skmsg.c index bbf40b999713..e1fd40aa8586 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -374,6 +374,7 @@ int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from, void *to; do { + pr_info("%s i=%d end=%d\n", __func__, i, msg->sg.end); sge = sk_msg_elem(msg, i); /* This is possible if a trim operation shrunk the buffer */ if (msg->sg.copybreak >= sge->length) { -- ''' And got this logs when running "test hanging corks" test: ''' # 1/ 1 sockmap::txmsg test hanging corks:OK # 2/ 1 sockhash::txmsg test hanging corks:OK # 3/ 1 sockhash:ktls:txmsg test hanging corks:OK [ 55.751687] sk_msg_memcopy_from_iter i=0 end=8 [ 55.751712] sk_msg_memcopy_from_iter i=1 end=8 [ 55.751726] sk_msg_memcopy_from_iter i=2 end=8 [ 55.751769] sk_msg_memcopy_from_iter i=3 end=8 [ 55.751778] sk_msg_memcopy_from_iter i=4 end=8 [ 55.751787] sk_msg_memcopy_from_iter i=5 end=8 [ 55.751796] sk_msg_memcopy_from_iter i=6 end=8 [ 55.751805] sk_msg_memcopy_from_iter i=7 end=8 [ 55.752979] sk_msg_memcopy_from_iter i=8 end=16 [ 55.752988] sk_msg_memcopy_from_iter i=9 end=16 [ 55.752995] sk_msg_memcopy_from_iter i=10 end=16 [ 55.753002] sk_msg_memcopy_from_iter i=11 end=16 [ 55.753008] sk_msg_memcopy_from_iter i=12 end=16 [ 55.753015] sk_msg_memcopy_from_iter i=13 end=16 [ 55.753022] sk_msg_memcopy_from_iter i=14 end=16 [ 55.753028] sk_msg_memcopy_from_iter i=15 end=16 [ 56.087047] sk_msg_memcopy_from_iter i=0 end=1 [ 56.087075] sk_msg_memcopy_from_iter i=1 end=1 [ 56.087081] sk_msg_memcopy_from_iter i=3 end=1 [ 56.087086] sk_msg_memcopy_from_iter i=5 end=1 [ 56.087091] sk_msg_memcopy_from_iter i=7 end=1 [ 56.087095] sk_msg_memcopy_from_iter i=9 end=1 [ 56.087100] sk_msg_memcopy_from_iter i=11 end=1 [ 56.087105] sk_msg_memcopy_from_iter i=13 end=1 [ 56.087110] sk_msg_memcopy_from_iter i=15 end=1 [ 56.087115] sk_msg_memcopy_from_iter i=17 end=1 ''' When "i" is greater than "end", the sge we get is empty, sge->length is 0. If we access this sge, this issue will occur. Kernel panics on some machines. To fix this, we can't "break" the loop like I did in this patch itself. It will break test_sockmap with the following error logs: # 1/ 1 sockmap::txmsg test hanging corks:OK sendpage loop error: No space left on device msg_loop_tx: iov_count 1024 iov_buf 256 cnt 2 err -1 tx thread exited with err 1. # 2/ 1 sockhash::txmsg test hanging corks:FAIL # 3/ 1 sockhash:ktls:txmsg test hanging corks:OK Pass: 2 Fail: 1 We must "continue" the loop to get the next valid sge. A better fix is here: ''' diff --git a/net/core/skmsg.c b/net/core/skmsg.c index bbf40b999713..bbaf909d0f9c 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -376,13 +376,8 @@ int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from, do { sge = sk_msg_elem(msg, i); /* This is possible if a trim operation shrunk the buffer */ - if (msg->sg.copybreak >= sge->length) { - msg->sg.copybreak = 0; - sk_msg_iter_var_next(i); - if (i == msg->sg.end) - break; - sge = sk_msg_elem(msg, i); - } + if (msg->sg.copybreak >= sge->length) + goto next; buf_size = sge->length - msg->sg.copybreak; copy = (buf_size > bytes) ? bytes : buf_size; @@ -399,6 +394,7 @@ int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from, bytes -= copy; if (!bytes) break; +next: msg->sg.copybreak = 0; sk_msg_iter_var_next(i); } while (i != msg->sg.end);
diff --git a/net/core/skmsg.c b/net/core/skmsg.c index 44952cdd1425..1906d0d0eeac 100644 --- a/net/core/skmsg.c +++ b/net/core/skmsg.c @@ -378,9 +378,9 @@ int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from, /* This is possible if a trim operation shrunk the buffer */ if (msg->sg.copybreak >= sge->length) { msg->sg.copybreak = 0; - sk_msg_iter_var_next(i); if (i == msg->sg.end) break; + sk_msg_iter_var_next(i); sge = sk_msg_elem(msg, i); }