
[net,v3] net/smc: Reset conn->lgr when link group registration fails

Message ID 1641364133-61284-1-git-send-email-guwen@linux.alibaba.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series [net,v3] net/smc: Reset conn->lgr when link group registration fails

Checks

Context                         Check    Description
netdev/tree_selection           success  Clearly marked for net
netdev/fixes_present            success  Fixes tag present in non-next series
netdev/subject_prefix           success  Link
netdev/cover_letter             success  Single patches do not need cover letters
netdev/patch_count              success  Link
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers           fail     1 blamed authors not CCed: ubraun@linux.ibm.com; 1 maintainers not CCed: ubraun@linux.ibm.com
netdev/build_clang              success  Errors and warnings before: 0 this patch: 0
netdev/module_param             success  Was 0 now: 0
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 0 this patch: 0
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 26 lines checked
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0

Commit Message

Wen Gu Jan. 5, 2022, 6:28 a.m. UTC
SMC connections might fail to be registered with a link group, for
example when no link can be found to assign to the connection during
its creation. As a result, connection creation returns a failure and
most resources related to the connection, such as conn->abort_work or
conn->lnk, are not allocated or initialized.

If smc_conn_free() is invoked later, it tries to access these
uninitialized resources, causing a panic.

Here is an example: an SMC-R connection failed to be registered with
a link group, so conn->lnk is NULL. The following crash happens when
smc_conn_free() tries to access conn->lnk in
smc_cdc_tx_dismiss_slots().

 BUG: kernel NULL pointer dereference, address: 0000000000000168
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] PREEMPT SMP PTI
 CPU: 4 PID: 68 Comm: kworker/4:1 Kdump: loaded Tainted: G E     5.16.0-rc5+ #52
 Workqueue: smc_hs_wq smc_listen_work [smc]
 RIP: 0010:smc_wr_tx_dismiss_slots+0x1e/0xc0 [smc]
 Call Trace:
  <TASK>
  smc_conn_free+0xd8/0x100 [smc]
  smc_lgr_cleanup_early+0x15/0x90 [smc]
  smc_listen_work+0x302/0x1230 [smc]
  ? process_one_work+0x25c/0x600
  process_one_work+0x25c/0x600
  worker_thread+0x4f/0x3a0
  ? process_one_work+0x600/0x600
  kthread+0x15d/0x1a0
  ? set_kthread_struct+0x40/0x40
  ret_from_fork+0x1f/0x30
  </TASK>

This patch fixes the problem by resetting conn->lgr to NULL if
smc_lgr_register_conn() exits abnormally, which avoids the crash
caused by accessing uninitialized resources in smc_conn_free().
In addition, a newly created link group is terminated if an SMC
connection cannot be registered with it.

Fixes: 56bc3b2094b4 ("net/smc: assign link to a new connection")
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
---
v1->v2:
- Reset conn->lgr to NULL in smc_lgr_register_conn().
- Only free a newly created link group.
v2->v3:
- Use __smc_lgr_terminate() instead of smc_lgr_schedule_free_work()
  to free the link group immediately.
---
 net/smc/smc_core.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

Comments

Dust Li Jan. 5, 2022, 7:54 a.m. UTC | #1
On Wed, Jan 05, 2022 at 02:28:53PM +0800, Wen Gu wrote:
>diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
>index 412bc85..0201f99 100644
>--- a/net/smc/smc_core.c
>+++ b/net/smc/smc_core.c
>@@ -171,8 +171,10 @@ static int smc_lgr_register_conn(struct smc_connection *conn, bool first)
> 
> 	if (!conn->lgr->is_smcd) {
> 		rc = smcr_lgr_conn_assign_link(conn, first);
>-		if (rc)
>+		if (rc) {
>+			conn->lgr = NULL;
> 			return rc;
>+		}
> 	}
> 	/* find a new alert_token_local value not yet used by some connection
> 	 * in this link group
>@@ -1835,8 +1837,14 @@ int smc_conn_create(struct smc_sock *smc, struct smc_init_info *ini)
> 		write_lock_bh(&lgr->conns_lock);
> 		rc = smc_lgr_register_conn(conn, true);
> 		write_unlock_bh(&lgr->conns_lock);
>-		if (rc)
>+		if (rc) {
>+			spin_lock_bh(lgr_lock);
>+			if (!list_empty(&lgr->list))
>+				list_del_init(&lgr->list);
>+			spin_unlock_bh(lgr_lock);
>+			__smc_lgr_terminate(lgr, true);

What about adding an smc_lgr_terminate() wrapper and putting list_del_init()
and __smc_lgr_terminate() into it?

> 			goto out;
>+		}
> 	}
> 	conn->local_tx_ctrl.common.type = SMC_CDC_MSG_TYPE;
> 	conn->local_tx_ctrl.len = SMC_WR_TX_SIZE;
>-- 
>1.8.3.1
Wen Gu Jan. 5, 2022, 8:55 a.m. UTC | #2
Thanks for your suggestion.

On 2022/1/5 3:54 pm, dust.li wrote:

>> -		if (rc)
>> +		if (rc) {
>> +			spin_lock_bh(lgr_lock);
>> +			if (!list_empty(&lgr->list))
>> +				list_del_init(&lgr->list);
>> +			spin_unlock_bh(lgr_lock);
>> +			__smc_lgr_terminate(lgr, true);
> 
> What about adding a smc_lgr_terminate() wrapper and put list_del_init()
> and __smc_lgr_terminate() into it ?

Adding a new wrapper is a good idea, but I think the logic here is relatively simple.
So instead of wrapping them, I coded them the same way smc_lgr_cleanup_early() does.

Thanks,
Wen Gu

> 
>> 			goto out;
>> +		}
>> 	}
>> 	conn->local_tx_ctrl.common.type = SMC_CDC_MSG_TYPE;
>> 	conn->local_tx_ctrl.len = SMC_WR_TX_SIZE;
>> -- 
>> 1.8.3.1
Karsten Graul Jan. 5, 2022, 1:25 p.m. UTC | #3
On 05/01/2022 09:55, Wen Gu wrote:
> On 2022/1/5 3:54 pm, dust.li wrote:
> 
>>> -        if (rc)
>>> +        if (rc) {
>>> +            spin_lock_bh(lgr_lock);
>>> +            if (!list_empty(&lgr->list))
>>> +                list_del_init(&lgr->list);
>>> +            spin_unlock_bh(lgr_lock);
>>> +            __smc_lgr_terminate(lgr, true);
>>
>> What about adding a smc_lgr_terminate() wrapper and put list_del_init()
>> and __smc_lgr_terminate() into it ?
> 
> Adding a new wrapper is a good idea. But I think the logic here is relatively simple.
> So instead of wrapping them, I coded them like what smc_lgr_cleanup_early() does.

It might look cleaner with the following changes:
- adapt smc_lgr_cleanup_early() to take only an lgr as parameter and remove its call to smc_conn_free()
- change smc_conn_abort() (currently the only caller of smc_lgr_cleanup_early()) to always
  call smc_conn_free() and, if (local_first), additionally call smc_lgr_cleanup_early()
  (holding a local copy of the lgr for this call)
- finally, call smc_lgr_cleanup_early(lgr) from smc_conn_create()

This should result in the same processing, but smc_conn_free() moves to smc_conn_abort(), which
looks like a better place for this call, and smc_lgr_cleanup_early() only takes care of an lgr.

What do you think? Did I miss something?
Wen Gu Jan. 6, 2022, 2:09 a.m. UTC | #4
Thanks for your suggestion.

On 2022/1/5 9:25 pm, Karsten Graul wrote:

> It might look cleaner with the following changes:
> - adopt smc_lgr_cleanup_early() to take only an lgr as parameter and remove the call to smc_conn_free()
> - change smc_conn_abort() (which is the only caller of smc_lgr_cleanup_early() right now), always
>    call smc_conn_free() and if (local_first) additionally call smc_lgr_cleanup_early()
>    (hold a local copy of the lgr for this call)
> - finally call smc_lgr_cleanup_early(lgr) from smc_conn_create()
> 
> This should be the same processing, but the smc_conn_free() is moved to smc_conn_abort() where
> it looks to be a better place for this call. And smc_lgr_cleanup_early() takes only care of an lgr.
> 

I think those are very good changes: they make smc_lgr_cleanup_early() handle only the link group
and make it more reusable.

> What do you think? Did I miss something?

I think it is better and more complete. I will improve the patch, test it, and then send a v4.

Thanks,
Wen Gu

Patch

diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index 412bc85..0201f99 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -171,8 +171,10 @@  static int smc_lgr_register_conn(struct smc_connection *conn, bool first)
 
 	if (!conn->lgr->is_smcd) {
 		rc = smcr_lgr_conn_assign_link(conn, first);
-		if (rc)
+		if (rc) {
+			conn->lgr = NULL;
 			return rc;
+		}
 	}
 	/* find a new alert_token_local value not yet used by some connection
 	 * in this link group
@@ -1835,8 +1837,14 @@  int smc_conn_create(struct smc_sock *smc, struct smc_init_info *ini)
 		write_lock_bh(&lgr->conns_lock);
 		rc = smc_lgr_register_conn(conn, true);
 		write_unlock_bh(&lgr->conns_lock);
-		if (rc)
+		if (rc) {
+			spin_lock_bh(lgr_lock);
+			if (!list_empty(&lgr->list))
+				list_del_init(&lgr->list);
+			spin_unlock_bh(lgr_lock);
+			__smc_lgr_terminate(lgr, true);
 			goto out;
+		}
 	}
 	conn->local_tx_ctrl.common.type = SMC_CDC_MSG_TYPE;
 	conn->local_tx_ctrl.len = SMC_WR_TX_SIZE;