Message ID | 20240118043210.47618-1-guwen@linux.alibaba.com (mailing list archive) |
---|---|
State | Accepted |
Commit | dbc153fd3c142909e564bb256da087e13fbf239c |
Delegated to: | Netdev Maintainers |
Series | [net,v2] net/smc: fix illegal rmb_desc access in SMC-D connection dump |
On Thu, Jan 18, 2024 at 12:32:10PM +0800, Wen Gu wrote:
>A crash was found when dumping SMC-D connections. It can be reproduced
>by following steps:
>
>[...]
>
>Fixes: 4b1b7d3b30a6 ("net/smc: add SMC-D diag support")
>Signed-off-by: Wen Gu <guwen@linux.alibaba.com>

Reviewed-by: Dust Li <dust.li@linux.alibaba.com>

Best regards,
Dust
On 18.01.24 05:32, Wen Gu wrote:
> A crash was found when dumping SMC-D connections. It can be reproduced
> by following steps:
>
> [...]
>
> It is possible that the connection is in process of being established
> when we dump it. Assumed that the connection has been registered in a
> link group by smc_conn_create() but the rmb_desc has not yet been
> initialized by smc_buf_create(), thus causing the illegal access to
> conn->rmb_desc. So fix it by checking before dump.
>
> [...]

That sounds reasonable to me! Thank you for the fix!

Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Hello:

This patch was applied to netdev/net.git (main)
by David S. Miller <davem@davemloft.net>:

On Thu, 18 Jan 2024 12:32:10 +0800 you wrote:
> A crash was found when dumping SMC-D connections. It can be reproduced
> by following steps:
>
> - run nginx/wrk test:
>   smc_run nginx
>   smc_run wrk -t 16 -c 1000 -d <duration> -H 'Connection: Close' <URL>
>
> [...]

Here is the summary with links:
  - [net,v2] net/smc: fix illegal rmb_desc access in SMC-D connection dump
    https://git.kernel.org/netdev/net/c/dbc153fd3c14

You are awesome, thank you!
diff --git a/net/smc/smc_diag.c b/net/smc/smc_diag.c
index 52f7c4f1e767..5a33908015f3 100644
--- a/net/smc/smc_diag.c
+++ b/net/smc/smc_diag.c
@@ -164,7 +164,7 @@ static int __smc_diag_dump(struct sock *sk, struct sk_buff *skb,
 	}
 	if (smc_conn_lgr_valid(&smc->conn) && smc->conn.lgr->is_smcd &&
 	    (req->diag_ext & (1 << (SMC_DIAG_DMBINFO - 1))) &&
-	    !list_empty(&smc->conn.lgr->list)) {
+	    !list_empty(&smc->conn.lgr->list) && smc->conn.rmb_desc) {
 		struct smc_connection *conn = &smc->conn;
 		struct smcd_diag_dmbinfo dinfo;
 		struct smcd_dev *smcd = conn->lgr->smcd;
A crash was found when dumping SMC-D connections. It can be reproduced
by the following steps:

- run nginx/wrk test:
  smc_run nginx
  smc_run wrk -t 16 -c 1000 -d <duration> -H 'Connection: Close' <URL>

- continuously dump SMC-D connections in parallel:
  watch -n 1 'smcss -D'

 BUG: kernel NULL pointer dereference, address: 0000000000000030
 CPU: 2 PID: 7204 Comm: smcss Kdump: loaded Tainted: G E 6.7.0+ #55
 RIP: 0010:__smc_diag_dump.constprop.0+0x5e5/0x620 [smc_diag]
 Call Trace:
  <TASK>
  ? __die+0x24/0x70
  ? page_fault_oops+0x66/0x150
  ? exc_page_fault+0x69/0x140
  ? asm_exc_page_fault+0x26/0x30
  ? __smc_diag_dump.constprop.0+0x5e5/0x620 [smc_diag]
  ? __kmalloc_node_track_caller+0x35d/0x430
  ? __alloc_skb+0x77/0x170
  smc_diag_dump_proto+0xd0/0xf0 [smc_diag]
  smc_diag_dump+0x26/0x60 [smc_diag]
  netlink_dump+0x19f/0x320
  __netlink_dump_start+0x1dc/0x300
  smc_diag_handler_dump+0x6a/0x80 [smc_diag]
  ? __pfx_smc_diag_dump+0x10/0x10 [smc_diag]
  sock_diag_rcv_msg+0x121/0x140
  ? __pfx_sock_diag_rcv_msg+0x10/0x10
  netlink_rcv_skb+0x5a/0x110
  sock_diag_rcv+0x28/0x40
  netlink_unicast+0x22a/0x330
  netlink_sendmsg+0x1f8/0x420
  __sock_sendmsg+0xb0/0xc0
  ____sys_sendmsg+0x24e/0x300
  ? copy_msghdr_from_user+0x62/0x80
  ___sys_sendmsg+0x7c/0xd0
  ? __do_fault+0x34/0x160
  ? do_read_fault+0x5f/0x100
  ? do_fault+0xb0/0x110
  ? __handle_mm_fault+0x2b0/0x6c0
  __sys_sendmsg+0x4d/0x80
  do_syscall_64+0x69/0x180
  entry_SYSCALL_64_after_hwframe+0x6e/0x76

It is possible that the connection is in the process of being established
when we dump it. Assume that the connection has been registered in a
link group by smc_conn_create() but the rmb_desc has not yet been
initialized by smc_buf_create(); dumping it at that point causes an
illegal access to conn->rmb_desc. So fix it by checking rmb_desc before
the dump.

Fixes: 4b1b7d3b30a6 ("net/smc: add SMC-D diag support")
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
---
v1->v2: corrected the commit in Fixes tag.
(https://lore.kernel.org/netdev/20240117122749.63785-1-guwen@linux.alibaba.com/)

 net/smc/smc_diag.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
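
The window described in the commit message can be illustrated outside the kernel. The following is a minimal user-space sketch, not the kernel code: `struct buf_desc`, `struct link_group`, `struct connection`, and `dump_dmbinfo()` are hypothetical stand-ins whose fields are chosen only for illustration. It models a connection that is already attached to a link group but whose receive buffer descriptor is still NULL, and shows how an extra pointer check in the dump condition lets the dump skip such a connection instead of dereferencing NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (illustration only). */
struct buf_desc {
	int len;
	unsigned long long token;
};

struct link_group {
	int is_smcd;
};

struct connection {
	struct link_group *lgr;    /* set once the connection joins a link group */
	struct buf_desc *rmb_desc; /* set later, when buffers are created */
};

/*
 * Copy the buffer token to *token_out and return 1 if the connection is
 * fully set up; return 0 and leave *token_out untouched otherwise.
 * The final "conn->rmb_desc" test mirrors the check added by the patch:
 * without it, a half-established connection would be dereferenced.
 */
static int dump_dmbinfo(const struct connection *conn,
			unsigned long long *token_out)
{
	assert(conn != NULL && token_out != NULL);

	if (conn->lgr && conn->lgr->is_smcd && conn->rmb_desc) {
		*token_out = conn->rmb_desc->token;
		return 1;
	}
	return 0;
}
```

Calling `dump_dmbinfo()` on a connection whose `rmb_desc` is still NULL returns 0 without touching the descriptor; once `rmb_desc` points at a descriptor, the same call returns 1 and reports its token. The sketch only shows the shape of the guard and says nothing about locking or memory ordering in the real code path.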