Message ID | 20250331081003.1503211-1-wangliang74@huawei.com |
---|---|
State | Not Applicable |
Series | [net] net/smc: fix general protection fault in __smc_diag_dump |
On 3/31/25 10:10 AM, Wang Liang wrote:
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index 3e6cb35baf25..454801188514 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -371,6 +371,7 @@ void smc_sk_init(struct net *net, struct sock *sk, int protocol)
>  	sk->sk_protocol = protocol;
>  	WRITE_ONCE(sk->sk_sndbuf, 2 * READ_ONCE(net->smc.sysctl_wmem));
>  	WRITE_ONCE(sk->sk_rcvbuf, 2 * READ_ONCE(net->smc.sysctl_rmem));
> +	smc->clcsock = NULL;
>  	INIT_WORK(&smc->tcp_listen_work, smc_tcp_listen_work);
>  	INIT_WORK(&smc->connect_work, smc_connect_work);
>  	INIT_DELAYED_WORK(&smc->conn.tx_work, smc_tx_work);

The syzkaller report has a few reproducers, have you tested this? AFAICS
the smc socket is already zeroed on allocation by sk_alloc().

/P
On 01.04.25 13:01, Paolo Abeni wrote:
> On 3/31/25 10:10 AM, Wang Liang wrote:
>> [...]
>> +	smc->clcsock = NULL;
>> [...]
>
> The syzkaller report has a few reproducers, have you tested this? AFAICS
> the smc socket is already zeroed on allocation by sk_alloc().

Yes. I also agree with you that the smc socket should already have been
zeroed. This commit only sets the member variable to NULL explicitly, and I
am not sure whether that fixes the problem.

Based on the following, the problem still seems to be reproducible:

"syzbot has tested the proposed patch but the reproducer is still
triggering an issue:
general protection fault in __smc_diag_dump"

To test a patch, follow the instructions at this link:
https://groups.google.com/g/syzkaller-bugs/c/YwENRImdcsk/m/wBJo6qGiCAAJ?pli=1

The following makes syzbot run the reproducer:

"If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing."

Zhu Yanjun
On 2025/4/1 19:01, Paolo Abeni wrote:
> On 3/31/25 10:10 AM, Wang Liang wrote:
>> [...]
>> +	smc->clcsock = NULL;
>> [...]
> The syzkaller report has a few reproducers, have you tested this? AFAICS
> the smc socket is already zeroed on allocation by sk_alloc().

Yes, I tested it with the C repro:
https://syzkaller.appspot.com/text?tag=ReproC&x=13d2dc98580000

The C repro comes from the 2025/02/27 15:16 crash at
https://syzkaller.appspot.com/bug?extid=271fed3ed6f24600c364

With my patch applied, the C repro no longer triggers the crash.

"The smc socket is already zeroed on allocation by sk_alloc()": that is
right. However, inet6_create() can indirectly modify smc->clcsock. The
sequence is:

__sys_socket
  __sys_socket_create
    sock_create
      __sock_create
        # pf->create
        inet6_create
          // init smc->clcsock = 0
          sk = sk_alloc()

          // set smc->clcsock to invalid address
          inet = inet_sk(sk);
          inet_assign_bit(IS_ICSK, sk, INET_PROTOSW_ICSK & answer_flags);
          inet6_set_bit(MC6_LOOP, sk);
          inet6_set_bit(MC6_ALL, sk);

          smc_inet_init_sock
            smc_sk_init
              // add sk to smc_hash
              smc_hash_sk
                sk_add_node(sk, head);
            smc_create_clcsk
              // set smc->clcsock
              sock_create_kern(..., &smc->clcsock);

Explicitly initializing smc->clcsock to NULL in smc_sk_init() therefore
fixes this crash scenario. If the problem still reproduces after this
patch, I guess it has a different cause, and fixing that in a separate
patch would be more appropriate.
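The userspace C sketch below makes the aliasing in this call chain concrete. It is only a model built on stated assumptions. `struct sock`, `struct inet_sock_model` and `struct smc_sock_model` are hypothetical, heavily reduced stand-ins; the real kernel structs have many more members. The inet-side field that lands on the offset of `clcsock` is modelled as one flag word, written the way inet_assign_bit()/inet6_set_bit() would write it, and the bit positions are made up. An LP64 target is assumed so that the two fields share an offset. calloc() stands in for the zeroing done by sk_alloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct sock {
	int sk_protocol;
};

/* The layout the inet layer assumes for every AF_INET6 sock. */
struct inet_sock_model {
	struct sock sk;
	unsigned long inet_flags; /* modelled flag word (assumption) */
};

/* The layout SMC uses for the very same allocation. */
struct socket;
struct smc_sock_model {
	struct sock sk;
	struct socket *clcsock; /* internal TCP socket */
};

/* In this model (LP64) both fields occupy the same bytes. */
_Static_assert(offsetof(struct inet_sock_model, inet_flags) ==
	       offsetof(struct smc_sock_model, clcsock),
	       "model assumes inet_flags and clcsock alias");

/*
 * Replays the call chain on one allocation and returns the clcsock
 * value SMC would see afterwards. A non-zero reset_in_smc_sk_init
 * applies the proposed one-line fix.
 */
static struct socket *clcsock_after_inet6_create(int reset_in_smc_sk_init)
{
	/* sk_alloc(): memory starts zeroed, so clcsock is NULL here. */
	struct smc_sock_model *smc = calloc(1, sizeof(*smc));
	unsigned long flags = (1UL << 3) | (1UL << 5); /* made-up bits */
	struct socket *seen;

	assert(smc != NULL);

	/* inet6_create(): flag bits are stored through the inet view. */
	memcpy((char *)smc + offsetof(struct inet_sock_model, inet_flags),
	       &flags, sizeof(flags));

	/* smc_sk_init(): the patch resets clcsock before hashing the sock. */
	if (reset_in_smc_sk_init)
		smc->clcsock = NULL;

	/* smc_diag: the same bytes are read back through the SMC view. */
	memcpy(&seen, (char *)smc + offsetof(struct smc_sock_model, clcsock),
	       sizeof(seen));
	free(smc);
	return seen;
}
```

Without the reset the returned pointer is non-NULL garbage, which a NULL check in a reader cannot catch. With the reset it is NULL again, and that is why an explicit assignment in smc_sk_init() matters even though sk_alloc() zeroes the object.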
On Wed, Apr 02, 2025 at 10:37:24AM +0800, Wang Liang wrote:
> [...]
> So initialize smc->clcsock to NULL explicitly in smc_sk_init() can fix
> this crash scene. If the problem can be reproduced after this patch, I
> guess it is not the same reason, and fix it by another patch is more
> appropriate.
>

This actually happens because the current smc_sock is not an inet_sock.
Two modules therefore modify the same offset in memory while interpreting
its structure differently. I previously proposed embedding an
inet(6)_sock at the beginning of smc_sock, but the community had some
objections...

I'm not sure about the community's current stance on this matter. If a
fix is absolutely necessary, my recommendation would still be to embed an
inet(6)_sock within the smc_sock structure.

D.
On 2025/4/2 15:20, D. Wythe wrote:
> On Wed, Apr 02, 2025 at 10:37:24AM +0800, Wang Liang wrote:
>> [...]
> This is actually because the current smc_sock is not an inet_sock,
> leading to two modules simultaneously modifying the same offset in
> memory but interpreting its structure differently. I previously proposed
> embedding an inet(6)_sock at the beginning of smc_sock, but the
> community had some objections...
>
> I'm not sure on the community's current stance on this matter, but if a
> fix is absolutely necessary, my recommendation would still be to embed
> an inet(6)_sock within the smc_sock structure
>
> D.

At present, I think initializing smc->clcsock in smc_sk_init() may be the
simplest and most effective method. :P
On 31.03.25 10:10, Wang Liang wrote:
> Syzbot reported a general protection fault:
> [...]
> Fix this by initialize smc->clcsock to NULL before add sk to smc_hash in
> smc_sk_init().
> [...]

I have to agree with this workaround, even though I see that it is not the
best solution. Thus, I'd like to give my R-b:

Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>

Btw. @D. Wythe, would you mind sending me the link to the proposal you
mentioned, please? Let me have a look. It seems I missed it.

Thanks,
Wenjia
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 3e6cb35baf25..454801188514 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -371,6 +371,7 @@ void smc_sk_init(struct net *net, struct sock *sk, int protocol)
 	sk->sk_protocol = protocol;
 	WRITE_ONCE(sk->sk_sndbuf, 2 * READ_ONCE(net->smc.sysctl_wmem));
 	WRITE_ONCE(sk->sk_rcvbuf, 2 * READ_ONCE(net->smc.sysctl_rmem));
+	smc->clcsock = NULL;
 	INIT_WORK(&smc->tcp_listen_work, smc_tcp_listen_work);
 	INIT_WORK(&smc->connect_work, smc_connect_work);
 	INIT_DELAYED_WORK(&smc->conn.tx_work, smc_tx_work);
Syzbot reported a general protection fault:

 CPU: 0 UID: 0 PID: 5830 Comm: syz-executor600 Not tainted 6.14.0-rc4-syzkaller-00090-gdd83757f6e68 #0
 RIP: 0010:smc_diag_msg_common_fill net/smc/smc_diag.c:44 [inline]
 RIP: 0010:__smc_diag_dump.constprop.0+0x3de/0x23d0 net/smc/smc_diag.c:89
 Call Trace:
  <TASK>
  smc_diag_dump_proto+0x26d/0x420 net/smc/smc_diag.c:217
  smc_diag_dump+0x84/0x90 net/smc/smc_diag.c:236
  netlink_dump+0x53c/0xd00 net/netlink/af_netlink.c:2318
  __netlink_dump_start+0x6ca/0x970 net/netlink/af_netlink.c:2433
  netlink_dump_start include/linux/netlink.h:340 [inline]
  smc_diag_handler_dump+0x1fb/0x240 net/smc/smc_diag.c:251
  __sock_diag_cmd net/core/sock_diag.c:249 [inline]
  sock_diag_rcv_msg+0x437/0x790 net/core/sock_diag.c:287
  netlink_rcv_skb+0x16b/0x440 net/netlink/af_netlink.c:2543
  netlink_unicast_kernel net/netlink/af_netlink.c:1322 [inline]
  netlink_unicast+0x53c/0x7f0 net/netlink/af_netlink.c:1348
  netlink_sendmsg+0x8b8/0xd70 net/netlink/af_netlink.c:1892
  sock_sendmsg_nosec net/socket.c:718 [inline]
  __sock_sendmsg net/socket.c:733 [inline]
  ____sys_sendmsg+0xaaf/0xc90 net/socket.c:2573
  ___sys_sendmsg+0x135/0x1e0 net/socket.c:2627
  __sys_sendmsg+0x16e/0x220 net/socket.c:2659
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
  entry_SYSCALL_64_after_hwframe+0x77/0x7f
  </TASK>

When an SMC socket is created, smc_inet_init_sock() first adds sk to
smc_hash via smc_hash_sk() and only then creates smc->clcsock.
smc_diag_dump_proto() may traverse smc_hash in between and visit
smc->clcsock before it has been created.

The sequence looks like this:

 (CPU1)                         | (CPU2)
 inet6_create()                 |
   smc_inet_init_sock()         |
     smc_sk_init()              |
       smc_hash_sk()            |
         head = &smc_hash->ht;  |
         sk_add_node(sk, head); |
                                | smc_diag_dump_proto
                                |   head = &smc_hash->ht;
                                |   sk_for_each(sk, head)
                                |     __smc_diag_dump()
                                |       visit smc->clcsock
     smc_create_clcsk()         |
       set smc->clcsock         |

Fix this by initializing smc->clcsock to NULL in smc_sk_init(), before sk
is added to smc_hash.
Reported-by: syzbot+271fed3ed6f24600c364@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=271fed3ed6f24600c364
Fixes: f16a7dd5cf27 ("smc: netlink interface for SMC sockets")
Signed-off-by: Wang Liang <wangliang74@huawei.com>
---
 net/smc/af_smc.c | 1 +
 1 file changed, 1 insertion(+)
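The small C model below replays the ordering argument from the commit message deterministically. It is a sketch and not the kernel code. The list-based `smc_hash_model`, the `port` field in `struct socket`, and all helper names are hypothetical. The interleaving from the diagram is replayed sequentially instead of with real threads. The dumper is also assumed to emit an entry without port data when clcsock is NULL, because a NULL initial value only helps if the reader tolerates NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the internal TCP socket. */
struct socket {
	unsigned short port;
};

/* Hypothetical, reduced SMC socket as seen by the diag dumper. */
struct smc_sock_model {
	struct smc_sock_model *next; /* stand-in for the smc_hash chain */
	struct socket *clcsock;      /* valid only after smc_create_clcsk() */
};

struct smc_hash_model {
	struct smc_sock_model *head;
};

/* CPU1, smc_sk_init() -> smc_hash_sk(): publish the socket. */
static void smc_sk_init_model(struct smc_hash_model *h,
			      struct smc_sock_model *smc)
{
	assert(h != NULL && smc != NULL);
	smc->clcsock = NULL; /* the fix: a known value BEFORE publication */
	smc->next = h->head;
	h->head = smc;
}

/* CPU2, smc_diag_dump_proto(): walk the hash, record one port per sock. */
static int smc_diag_dump_model(const struct smc_hash_model *h,
			       unsigned short *ports, int max)
{
	int n = 0;

	for (const struct smc_sock_model *s = h->head; s && n < max; s = s->next)
		/* Assumption: a NULL clcsock yields an entry without port info. */
		ports[n++] = s->clcsock ? s->clcsock->port : 0;
	return n;
}

/* CPU1, smc_create_clcsk(): attach the internal TCP socket. */
static void smc_create_clcsk_model(struct smc_sock_model *smc,
				   struct socket *clc)
{
	smc->clcsock = clc;
}
```

Suppose a dump runs between smc_sk_init_model() and smc_create_clcsk_model(). Even if the object previously held a stale pointer, the dump sees NULL and reports the socket without port data. A later dump sees the real socket. Because the model is sequential, it only shows why the initial value must be settled before sk_add_node(). It says nothing about the locking or memory ordering the real code relies on.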