diff mbox series

[net,v2] net/smc: fix illegal rmb_desc access in SMC-D connection dump

Message ID 20240118043210.47618-1-guwen@linux.alibaba.com (mailing list archive)
State Accepted
Commit dbc153fd3c142909e564bb256da087e13fbf239c
Delegated to: Netdev Maintainers
Series [net,v2] net/smc: fix illegal rmb_desc access in SMC-D connection dump

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success SINGLE THREAD; Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1080 this patch: 1080
netdev/cc_maintainers success CCed 0 of 0 maintainers
netdev/build_clang success Errors and warnings before: 1095 this patch: 1095
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1095 this patch: 1095
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 8 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-01-18--06-00 (tests: 403)

Commit Message

Wen Gu Jan. 18, 2024, 4:32 a.m. UTC
A crash was found when dumping SMC-D connections. It can be reproduced
by the following steps:

- run nginx/wrk test:
  smc_run nginx
  smc_run wrk -t 16 -c 1000 -d <duration> -H 'Connection: Close' <URL>

- continuously dump SMC-D connections in parallel:
  watch -n 1 'smcss -D'

 BUG: kernel NULL pointer dereference, address: 0000000000000030
 CPU: 2 PID: 7204 Comm: smcss Kdump: loaded Tainted: G	E      6.7.0+ #55
 RIP: 0010:__smc_diag_dump.constprop.0+0x5e5/0x620 [smc_diag]
 Call Trace:
  <TASK>
  ? __die+0x24/0x70
  ? page_fault_oops+0x66/0x150
  ? exc_page_fault+0x69/0x140
  ? asm_exc_page_fault+0x26/0x30
  ? __smc_diag_dump.constprop.0+0x5e5/0x620 [smc_diag]
  ? __kmalloc_node_track_caller+0x35d/0x430
  ? __alloc_skb+0x77/0x170
  smc_diag_dump_proto+0xd0/0xf0 [smc_diag]
  smc_diag_dump+0x26/0x60 [smc_diag]
  netlink_dump+0x19f/0x320
  __netlink_dump_start+0x1dc/0x300
  smc_diag_handler_dump+0x6a/0x80 [smc_diag]
  ? __pfx_smc_diag_dump+0x10/0x10 [smc_diag]
  sock_diag_rcv_msg+0x121/0x140
  ? __pfx_sock_diag_rcv_msg+0x10/0x10
  netlink_rcv_skb+0x5a/0x110
  sock_diag_rcv+0x28/0x40
  netlink_unicast+0x22a/0x330
  netlink_sendmsg+0x1f8/0x420
  __sock_sendmsg+0xb0/0xc0
  ____sys_sendmsg+0x24e/0x300
  ? copy_msghdr_from_user+0x62/0x80
  ___sys_sendmsg+0x7c/0xd0
  ? __do_fault+0x34/0x160
  ? do_read_fault+0x5f/0x100
  ? do_fault+0xb0/0x110
  ? __handle_mm_fault+0x2b0/0x6c0
  __sys_sendmsg+0x4d/0x80
  do_syscall_64+0x69/0x180
  entry_SYSCALL_64_after_hwframe+0x6e/0x76

It is possible that the connection is in the process of being
established when we dump it. Assume that the connection has been
registered in a link group by smc_conn_create() but the rmb_desc has not
yet been initialized by smc_buf_create(); dumping it then causes an
illegal access to conn->rmb_desc. So fix it by checking rmb_desc before
the dump.

Fixes: 4b1b7d3b30a6 ("net/smc: add SMC-D diag support")
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
---
v1->v2: corrected the commit in Fixes tag.
(https://lore.kernel.org/netdev/20240117122749.63785-1-guwen@linux.alibaba.com/)

 net/smc/smc_diag.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Dust Li Jan. 18, 2024, 9:11 a.m. UTC | #1
On Thu, Jan 18, 2024 at 12:32:10PM +0800, Wen Gu wrote:
>A crash was found when dumping SMC-D connections. It can be reproduced
>by following steps:
>
>- run nginx/wrk test:
>  smc_run nginx
>  smc_run wrk -t 16 -c 1000 -d <duration> -H 'Connection: Close' <URL>
>
>- continuously dump SMC-D connections in parallel:
>  watch -n 1 'smcss -D'
>
> BUG: kernel NULL pointer dereference, address: 0000000000000030
> CPU: 2 PID: 7204 Comm: smcss Kdump: loaded Tainted: G	E      6.7.0+ #55
> RIP: 0010:__smc_diag_dump.constprop.0+0x5e5/0x620 [smc_diag]
> Call Trace:
>  <TASK>
>  ? __die+0x24/0x70
>  ? page_fault_oops+0x66/0x150
>  ? exc_page_fault+0x69/0x140
>  ? asm_exc_page_fault+0x26/0x30
>  ? __smc_diag_dump.constprop.0+0x5e5/0x620 [smc_diag]
>  ? __kmalloc_node_track_caller+0x35d/0x430
>  ? __alloc_skb+0x77/0x170
>  smc_diag_dump_proto+0xd0/0xf0 [smc_diag]
>  smc_diag_dump+0x26/0x60 [smc_diag]
>  netlink_dump+0x19f/0x320
>  __netlink_dump_start+0x1dc/0x300
>  smc_diag_handler_dump+0x6a/0x80 [smc_diag]
>  ? __pfx_smc_diag_dump+0x10/0x10 [smc_diag]
>  sock_diag_rcv_msg+0x121/0x140
>  ? __pfx_sock_diag_rcv_msg+0x10/0x10
>  netlink_rcv_skb+0x5a/0x110
>  sock_diag_rcv+0x28/0x40
>  netlink_unicast+0x22a/0x330
>  netlink_sendmsg+0x1f8/0x420
>  __sock_sendmsg+0xb0/0xc0
>  ____sys_sendmsg+0x24e/0x300
>  ? copy_msghdr_from_user+0x62/0x80
>  ___sys_sendmsg+0x7c/0xd0
>  ? __do_fault+0x34/0x160
>  ? do_read_fault+0x5f/0x100
>  ? do_fault+0xb0/0x110
>  ? __handle_mm_fault+0x2b0/0x6c0
>  __sys_sendmsg+0x4d/0x80
>  do_syscall_64+0x69/0x180
>  entry_SYSCALL_64_after_hwframe+0x6e/0x76
>
>It is possible that the connection is in process of being established
>when we dump it. Assumed that the connection has been registered in a
>link group by smc_conn_create() but the rmb_desc has not yet been
>initialized by smc_buf_create(), thus causing the illegal access to
>conn->rmb_desc. So fix it by checking before dump.
>
>Fixes: 4b1b7d3b30a6 ("net/smc: add SMC-D diag support")
>Signed-off-by: Wen Gu <guwen@linux.alibaba.com>

Reviewed-by: Dust Li <dust.li@linux.alibaba.com>

Best regards,
Dust

>---
>v2->v1: corrected the commit in Fixes tag.
>(https://lore.kernel.org/netdev/20240117122749.63785-1-guwen@linux.alibaba.com/)
>
> net/smc/smc_diag.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/net/smc/smc_diag.c b/net/smc/smc_diag.c
>index 52f7c4f1e767..5a33908015f3 100644
>--- a/net/smc/smc_diag.c
>+++ b/net/smc/smc_diag.c
>@@ -164,7 +164,7 @@ static int __smc_diag_dump(struct sock *sk, struct sk_buff *skb,
> 	}
> 	if (smc_conn_lgr_valid(&smc->conn) && smc->conn.lgr->is_smcd &&
> 	    (req->diag_ext & (1 << (SMC_DIAG_DMBINFO - 1))) &&
>-	    !list_empty(&smc->conn.lgr->list)) {
>+	    !list_empty(&smc->conn.lgr->list) && smc->conn.rmb_desc) {
> 		struct smc_connection *conn = &smc->conn;
> 		struct smcd_diag_dmbinfo dinfo;
> 		struct smcd_dev *smcd = conn->lgr->smcd;
>-- 
>2.32.0.3.g01195cf9f

Wenjia Zhang Jan. 18, 2024, 1:44 p.m. UTC | #2
On 18.01.24 05:32, Wen Gu wrote:
> A crash was found when dumping SMC-D connections. It can be reproduced
> by following steps:
> 
> - run nginx/wrk test:
>    smc_run nginx
>    smc_run wrk -t 16 -c 1000 -d <duration> -H 'Connection: Close' <URL>
> 
> - continuously dump SMC-D connections in parallel:
>    watch -n 1 'smcss -D'
> 
>   BUG: kernel NULL pointer dereference, address: 0000000000000030
>   CPU: 2 PID: 7204 Comm: smcss Kdump: loaded Tainted: G	E      6.7.0+ #55
>   RIP: 0010:__smc_diag_dump.constprop.0+0x5e5/0x620 [smc_diag]
>   Call Trace:
>    <TASK>
>    ? __die+0x24/0x70
>    ? page_fault_oops+0x66/0x150
>    ? exc_page_fault+0x69/0x140
>    ? asm_exc_page_fault+0x26/0x30
>    ? __smc_diag_dump.constprop.0+0x5e5/0x620 [smc_diag]
>    ? __kmalloc_node_track_caller+0x35d/0x430
>    ? __alloc_skb+0x77/0x170
>    smc_diag_dump_proto+0xd0/0xf0 [smc_diag]
>    smc_diag_dump+0x26/0x60 [smc_diag]
>    netlink_dump+0x19f/0x320
>    __netlink_dump_start+0x1dc/0x300
>    smc_diag_handler_dump+0x6a/0x80 [smc_diag]
>    ? __pfx_smc_diag_dump+0x10/0x10 [smc_diag]
>    sock_diag_rcv_msg+0x121/0x140
>    ? __pfx_sock_diag_rcv_msg+0x10/0x10
>    netlink_rcv_skb+0x5a/0x110
>    sock_diag_rcv+0x28/0x40
>    netlink_unicast+0x22a/0x330
>    netlink_sendmsg+0x1f8/0x420
>    __sock_sendmsg+0xb0/0xc0
>    ____sys_sendmsg+0x24e/0x300
>    ? copy_msghdr_from_user+0x62/0x80
>    ___sys_sendmsg+0x7c/0xd0
>    ? __do_fault+0x34/0x160
>    ? do_read_fault+0x5f/0x100
>    ? do_fault+0xb0/0x110
>    ? __handle_mm_fault+0x2b0/0x6c0
>    __sys_sendmsg+0x4d/0x80
>    do_syscall_64+0x69/0x180
>    entry_SYSCALL_64_after_hwframe+0x6e/0x76
> 
> It is possible that the connection is in process of being established
> when we dump it. Assumed that the connection has been registered in a
> link group by smc_conn_create() but the rmb_desc has not yet been
> initialized by smc_buf_create(), thus causing the illegal access to
> conn->rmb_desc. So fix it by checking before dump.
> 
> Fixes: 4b1b7d3b30a6 ("net/smc: add SMC-D diag support")
> Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
> ---
> v2->v1: corrected the commit in Fixes tag.
> (https://lore.kernel.org/netdev/20240117122749.63785-1-guwen@linux.alibaba.com/)
> 
>   net/smc/smc_diag.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/smc/smc_diag.c b/net/smc/smc_diag.c
> index 52f7c4f1e767..5a33908015f3 100644
> --- a/net/smc/smc_diag.c
> +++ b/net/smc/smc_diag.c
> @@ -164,7 +164,7 @@ static int __smc_diag_dump(struct sock *sk, struct sk_buff *skb,
>   	}
>   	if (smc_conn_lgr_valid(&smc->conn) && smc->conn.lgr->is_smcd &&
>   	    (req->diag_ext & (1 << (SMC_DIAG_DMBINFO - 1))) &&
> -	    !list_empty(&smc->conn.lgr->list)) {
> +	    !list_empty(&smc->conn.lgr->list) && smc->conn.rmb_desc) {
>   		struct smc_connection *conn = &smc->conn;
>   		struct smcd_diag_dmbinfo dinfo;
>   		struct smcd_dev *smcd = conn->lgr->smcd;

That sounds reasonable to me! Thank you for the fix!

Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>

patchwork-bot+netdevbpf@kernel.org Jan. 19, 2024, 12:10 p.m. UTC | #3
Hello:

This patch was applied to netdev/net.git (main)
by David S. Miller <davem@davemloft.net>:

On Thu, 18 Jan 2024 12:32:10 +0800 you wrote:
> A crash was found when dumping SMC-D connections. It can be reproduced
> by following steps:
> 
> - run nginx/wrk test:
>   smc_run nginx
>   smc_run wrk -t 16 -c 1000 -d <duration> -H 'Connection: Close' <URL>
> 
> [...]

Here is the summary with links:
  - [net,v2] net/smc: fix illegal rmb_desc access in SMC-D connection dump
    https://git.kernel.org/netdev/net/c/dbc153fd3c14

You are awesome, thank you!

Patch

diff --git a/net/smc/smc_diag.c b/net/smc/smc_diag.c
index 52f7c4f1e767..5a33908015f3 100644
--- a/net/smc/smc_diag.c
+++ b/net/smc/smc_diag.c
@@ -164,7 +164,7 @@  static int __smc_diag_dump(struct sock *sk, struct sk_buff *skb,
 	}
 	if (smc_conn_lgr_valid(&smc->conn) && smc->conn.lgr->is_smcd &&
 	    (req->diag_ext & (1 << (SMC_DIAG_DMBINFO - 1))) &&
-	    !list_empty(&smc->conn.lgr->list)) {
+	    !list_empty(&smc->conn.lgr->list) && smc->conn.rmb_desc) {
 		struct smc_connection *conn = &smc->conn;
 		struct smcd_diag_dmbinfo dinfo;
 		struct smcd_dev *smcd = conn->lgr->smcd;
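
The effect of the added "&& smc->conn.rmb_desc" condition can be
illustrated outside the kernel. The following is a minimal user-space
sketch, not the kernel code: struct mock_buf_desc, struct
mock_connection, struct mock_dmbinfo and mock_dump_dmbinfo() are
hypothetical stand-ins invented for illustration, and the in_link_group
flag only approximates the existing link group checks. It models a
connection that is already registered but whose rmb_desc is still NULL,
and shows the dump path skipping it instead of dereferencing the
pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel structures; only the fields
 * needed to illustrate the guard are modelled.
 */
struct mock_buf_desc {
	int len;			/* buffer length */
};

struct mock_connection {
	bool in_link_group;		/* set once the connection is registered */
	struct mock_buf_desc *rmb_desc;	/* NULL until buffers are created */
};

struct mock_dmbinfo {
	int rmb_len;
};

/* Returns true and fills *out only when it is safe to read rmb_desc. */
static bool mock_dump_dmbinfo(const struct mock_connection *conn,
			      struct mock_dmbinfo *out)
{
	assert(conn != NULL && out != NULL);

	/* The guard: skip connections whose rmb_desc is not set yet. */
	if (!conn->in_link_group || !conn->rmb_desc)
		return false;

	memset(out, 0, sizeof(*out));
	out->rmb_len = conn->rmb_desc->len;
	return true;
}
```

In this sketch a connection with in_link_group set and rmb_desc == NULL
makes mock_dump_dmbinfo() return false and leaves *out untouched; once
rmb_desc points at a descriptor, the same call fills in rmb_len. The
patch takes the same approach of skipping the DMB info for a
half-established connection rather than adding locking around the dump.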