[net] net/smc: fix invalid link access in dumping SMC-R connections

Message ID 1703662835-53416-1-git-send-email-guwen@linux.alibaba.com (mailing list archive)
State Accepted
Commit 9dbe086c69b8902c85cece394760ac212e9e4ccc
Delegated to: Netdev Maintainers
Series [net] net/smc: fix invalid link access in dumping SMC-R connections

Checks

Context                         Check    Description
netdev/series_format            success  Single patches do not need cover letters
netdev/tree_selection           success  Clearly marked for net
netdev/ynl                      success  SINGLE THREAD; Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            success  Fixes tag present in non-next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 1113 this patch: 1113
netdev/cc_maintainers           success  CCed 11 of 11 maintainers
netdev/build_clang              success  Errors and warnings before: 1140 this patch: 1140
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 1140 this patch: 1140
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 9 lines checked
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0

Commit Message

Wen Gu Dec. 27, 2023, 7:40 a.m. UTC
A crash was found when dumping SMC-R connections. It can be reproduced
with the following steps:

- environment: two RNICs on both sides.
- run SMC-R between the two sides; a link group of type
  SMC_LGR_SYMMETRIC will be created.
- set the first RNIC down on either side; the link group will then
  turn to SMC_LGR_ASYMMETRIC_LOCAL.
- run 'smcss -R' and the crash will be triggered.

 BUG: kernel NULL pointer dereference, address: 0000000000000010
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 8000000101fdd067 P4D 8000000101fdd067 PUD 10ce46067 PMD 0
 Oops: 0000 [#1] PREEMPT SMP PTI
 CPU: 3 PID: 1810 Comm: smcss Kdump: loaded Tainted: G W   E      6.7.0-rc6+ #51
 RIP: 0010:__smc_diag_dump.constprop.0+0x36e/0x620 [smc_diag]
 Call Trace:
  <TASK>
  ? __die+0x24/0x70
  ? page_fault_oops+0x66/0x150
  ? exc_page_fault+0x69/0x140
  ? asm_exc_page_fault+0x26/0x30
  ? __smc_diag_dump.constprop.0+0x36e/0x620 [smc_diag]
  smc_diag_dump_proto+0xd0/0xf0 [smc_diag]
  smc_diag_dump+0x26/0x60 [smc_diag]
  netlink_dump+0x19f/0x320
  __netlink_dump_start+0x1dc/0x300
  smc_diag_handler_dump+0x6a/0x80 [smc_diag]
  ? __pfx_smc_diag_dump+0x10/0x10 [smc_diag]
  sock_diag_rcv_msg+0x121/0x140
  ? __pfx_sock_diag_rcv_msg+0x10/0x10
  netlink_rcv_skb+0x5a/0x110
  sock_diag_rcv+0x28/0x40
  netlink_unicast+0x22a/0x330
  netlink_sendmsg+0x240/0x4a0
  __sock_sendmsg+0xb0/0xc0
  ____sys_sendmsg+0x24e/0x300
  ? copy_msghdr_from_user+0x62/0x80
  ___sys_sendmsg+0x7c/0xd0
  ? __do_fault+0x34/0x1a0
  ? do_read_fault+0x5f/0x100
  ? do_fault+0xb0/0x110
  __sys_sendmsg+0x4d/0x80
  do_syscall_64+0x45/0xf0
  entry_SYSCALL_64_after_hwframe+0x6e/0x76

When the first RNIC is set down, lgr->lnk[0] is cleared and an
asymmetric link is allocated in lgr->lnk[SMC_LINKS_PER_LGR_MAX - 1]
by smc_llc_alloc_alt_link(). When __smc_diag_dump() then dumps the SMC-R
connections, it accesses the now-invalid lgr->lnk[0] and dereferences a
NULL pointer. Fix it by accessing the right link, i.e. the one the
connection is using.

Fixes: f16a7dd5cf27 ("smc: netlink interface for SMC sockets")
Reported-by: henaumars <henaumars@sina.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7616
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
---
 net/smc/smc_diag.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Tony Lu Dec. 28, 2023, 9:32 a.m. UTC | #1
On Wed, Dec 27, 2023 at 03:40:35PM +0800, Wen Gu wrote:
> [...]
> 
> Fixes: f16a7dd5cf27 ("smc: netlink interface for SMC sockets")
> Reported-by: henaumars <henaumars@sina.com>
> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7616

What about using Link: http... here?

> Signed-off-by: Wen Gu <guwen@linux.alibaba.com>

Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>

> ---
>  net/smc/smc_diag.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/net/smc/smc_diag.c b/net/smc/smc_diag.c
> index a584613aca12..5cc376834c57 100644
> --- a/net/smc/smc_diag.c
> +++ b/net/smc/smc_diag.c
> @@ -153,8 +153,7 @@ static int __smc_diag_dump(struct sock *sk, struct sk_buff *skb,
>  			.lnk[0].link_id = link->link_id,
>  		};
>  
> -		memcpy(linfo.lnk[0].ibname,
> -		       smc->conn.lgr->lnk[0].smcibdev->ibdev->name,
> +		memcpy(linfo.lnk[0].ibname, link->smcibdev->ibdev->name,
>  		       sizeof(link->smcibdev->ibdev->name));
>  		smc_gid_be16_convert(linfo.lnk[0].gid, link->gid);
>  		smc_gid_be16_convert(linfo.lnk[0].peer_gid, link->peer_gid);
> -- 
> 2.43.0

Wen Gu Dec. 28, 2023, 11:02 a.m. UTC | #2
On 2023/12/28 17:32, Tony Lu wrote:
> On Wed, Dec 27, 2023 at 03:40:35PM +0800, Wen Gu wrote:
>> [...]
>>
>> Fixes: f16a7dd5cf27 ("smc: netlink interface for SMC sockets")
>> Reported-by: henaumars <henaumars@sina.com>
>> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7616
> 
> What about using Link: http... here?
> 

Thank you, Tony.

According to [1],

"
The Reported-by tag gives credit to people who find bugs and report them and it
hopefully inspires them to help us again in the future. The tag is intended for
bugs; please do not use it to credit feature requests. The tag should be followed
by a Closes: tag pointing to the report, unless the report is not available on
the web. The Link: tag can be used instead of Closes: if the patch fixes a part
of the issue(s) being reported.
"

So I guess the Closes: tag is fine here.

[1] https://docs.kernel.org/process/submitting-patches.html

>> Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
> 
> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
> 

Wenjia Zhang Jan. 3, 2024, 9:33 a.m. UTC | #3
On 27.12.23 08:40, Wen Gu wrote:
> [...]
> 
> Fixes: f16a7dd5cf27 ("smc: netlink interface for SMC sockets")
> Reported-by: henaumars <henaumars@sina.com>
> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7616
> Signed-off-by: Wen Gu <guwen@linux.alibaba.com>

That is a really good catch and a good description! Thank you, Wen Gu, for
fixing it!

Reviewed-and-tested-by: Wenjia Zhang <wenjia@linux.ibm.com>

patchwork-bot+netdevbpf@kernel.org Jan. 4, 2024, 1 a.m. UTC | #4
Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 27 Dec 2023 15:40:35 +0800 you wrote:
> A crash was found when dumping SMC-R connections. It can be reproduced
> by following steps:
> 
> - environment: two RNICs on both sides.
> - run SMC-R between two sides, now a SMC_LGR_SYMMETRIC type link group
>   will be created.
> - set the first RNIC down on either side and link group will turn to
>   SMC_LGR_ASYMMETRIC_LOCAL then.
> - run 'smcss -R' and the crash will be triggered.
> 
> [...]

Here is the summary with links:
  - [net] net/smc: fix invalid link access in dumping SMC-R connections
    https://git.kernel.org/netdev/net/c/9dbe086c69b8

You are awesome, thank you!
Patch

diff --git a/net/smc/smc_diag.c b/net/smc/smc_diag.c
index a584613aca12..5cc376834c57 100644
--- a/net/smc/smc_diag.c
+++ b/net/smc/smc_diag.c
@@ -153,8 +153,7 @@  static int __smc_diag_dump(struct sock *sk, struct sk_buff *skb,
 			.lnk[0].link_id = link->link_id,
 		};
 
-		memcpy(linfo.lnk[0].ibname,
-		       smc->conn.lgr->lnk[0].smcibdev->ibdev->name,
+		memcpy(linfo.lnk[0].ibname, link->smcibdev->ibdev->name,
 		       sizeof(link->smcibdev->ibdev->name));
 		smc_gid_be16_convert(linfo.lnk[0].gid, link->gid);
 		smc_gid_be16_convert(linfo.lnk[0].peer_gid, link->peer_gid);
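
The effect of the one-line change can be illustrated with a small userspace
model. This is only a sketch under stated assumptions, not kernel code: all
mock_* types, the MOCK_* constants and the helper names are hypothetical
stand-ins, a return value of -1 stands in for the NULL pointer dereference,
and the slot the connection ends up on after the first RNIC goes down is
chosen by the caller rather than taken from the SMC implementation. The
sketch also assumes that `link` in the diff refers to the link the
connection is currently using.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; the real definitions live in net/smc/ and differ. */
#define MOCK_LINKS_PER_LGR_MAX 3
#define MOCK_IB_NAME_MAX 64

struct mock_ib_device { char name[MOCK_IB_NAME_MAX]; };
struct mock_smc_ib_device { struct mock_ib_device *ibdev; };

struct mock_link {
	int link_id;
	struct mock_smc_ib_device *smcibdev; /* NULL once the link is cleared */
};

struct mock_lgr { struct mock_link lnk[MOCK_LINKS_PER_LGR_MAX]; };

struct mock_conn {
	struct mock_lgr *lgr;
	struct mock_link *lnk; /* link this connection currently uses */
};

/* Copy a link's device name into out (at least MOCK_IB_NAME_MAX bytes).
 * Returns -1 where the kernel would have dereferenced a NULL pointer. */
static int copy_ibname(const struct mock_link *link, char *out)
{
	assert(link && out);
	if (!link->smcibdev)
		return -1;
	memcpy(out, link->smcibdev->ibdev->name,
	       sizeof(link->smcibdev->ibdev->name));
	return 0;
}

/* Pre-patch lookup: always slot 0 of the link group. */
static int ibname_pre_patch(const struct mock_conn *conn, char *out)
{
	return copy_ibname(&conn->lgr->lnk[0], out);
}

/* Post-patch lookup: the connection's own link. */
static int ibname_post_patch(const struct mock_conn *conn, char *out)
{
	return copy_ibname(conn->lnk, out);
}

/* Model "first RNIC set down": clear slot 0 and move the connection to
 * a surviving slot backed by dev (slot choice is an assumption here). */
static void first_rnic_down(struct mock_lgr *lgr, struct mock_conn *conn,
			    int survivor_idx, struct mock_smc_ib_device *dev)
{
	struct mock_link *alt = &lgr->lnk[survivor_idx];

	memset(&lgr->lnk[0], 0, sizeof(lgr->lnk[0]));
	alt->link_id = survivor_idx + 1;
	alt->smcibdev = dev;
	conn->lnk = alt;
}
```

While slot 0 is populated, both lookups return the same device name. After
first_rnic_down() has run, ibname_pre_patch() fails because it still reads
the cleared slot 0, whereas ibname_post_patch() follows conn->lnk and keeps
working.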