diff mbox series

[net,2/2] net/smc: fix LGR and link use-after-free issue

Message ID 20241122071630.63707-3-guwen@linux.alibaba.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series two fixes for SMC | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 3 this patch: 3
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers fail 1 blamed authors not CCed: kgraul@linux.ibm.com; 1 maintainers not CCed: kgraul@linux.ibm.com
netdev/build_clang success Errors and warnings before: 3 this patch: 3
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 4 this patch: 4
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 53 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-11-25--00-00 (tests: 791)

Commit Message

Wen Gu Nov. 22, 2024, 7:16 a.m. UTC
We encountered a LGR/link use-after-free issue, which manifested as
the LGR/link refcnt reaching 0 early and entering the clear process,
making resource access unsafe.

 refcount_t: addition on 0; use-after-free.
 WARNING: CPU: 14 PID: 107447 at lib/refcount.c:25 refcount_warn_saturate+0x9c/0x140
 Workqueue: events smc_lgr_terminate_work [smc]
 Call trace:
  refcount_warn_saturate+0x9c/0x140
  __smc_lgr_terminate.part.45+0x2a8/0x370 [smc]
  smc_lgr_terminate_work+0x28/0x30 [smc]
  process_one_work+0x1b8/0x420
  worker_thread+0x158/0x510
  kthread+0x114/0x118

or

 refcount_t: underflow; use-after-free.
 WARNING: CPU: 6 PID: 93140 at lib/refcount.c:28 refcount_warn_saturate+0xf0/0x140
 Workqueue: smc_hs_wq smc_listen_work [smc]
 Call trace:
  refcount_warn_saturate+0xf0/0x140
  smcr_link_put+0x1cc/0x1d8 [smc]
  smc_conn_free+0x110/0x1b0 [smc]
  smc_conn_abort+0x50/0x60 [smc]
  smc_listen_find_device+0x75c/0x790 [smc]
  smc_listen_work+0x368/0x8a0 [smc]
  process_one_work+0x1b8/0x420
  worker_thread+0x158/0x510
  kthread+0x114/0x118

It is caused by repeated release of LGR/link refcnt. One suspect is that
smc_conn_free() is called repeatedly because some smc_conn_free() are not
protected by sock lock.

Calls under socklock        | Calls not under socklock
-------------------------------------------------------
lock_sock(sk)               | smc_conn_abort
smc_conn_free               | \- smc_conn_free
\- smcr_link_put            |    \- smcr_link_put (duplicated)
release_sock(sk)

So make sure smc_conn_free() is called under the sock lock.

Fixes: 8cf3f3e42374 ("net/smc: use helper smc_conn_abort() in listen processing")
Co-developed-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Co-developed-by: Kai <KaiShen@linux.alibaba.com>
Signed-off-by: Kai <KaiShen@linux.alibaba.com>
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
---
 net/smc/af_smc.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

Comments

Wenjia Zhang Nov. 22, 2024, 3:56 p.m. UTC | #1
On 22.11.24 08:16, Wen Gu wrote:
> We encountered a LGR/link use-after-free issue, which manifested as
> the LGR/link refcnt reaching 0 early and entering the clear process,
> making resource access unsafe.
>

How did you make sure that the refcount mentioned in the warning are the 
LGR/link refcnt, not &sk->sk_refcnt?

>   refcount_t: addition on 0; use-after-free.
>   WARNING: CPU: 14 PID: 107447 at lib/refcount.c:25 refcount_warn_saturate+0x9c/0x140
>   Workqueue: events smc_lgr_terminate_work [smc]
>   Call trace:
>    refcount_warn_saturate+0x9c/0x140
>    __smc_lgr_terminate.part.45+0x2a8/0x370 [smc]
>    smc_lgr_terminate_work+0x28/0x30 [smc]
>    process_one_work+0x1b8/0x420
>    worker_thread+0x158/0x510
>    kthread+0x114/0x118
> 
> or
> 
>   refcount_t: underflow; use-after-free.
>   WARNING: CPU: 6 PID: 93140 at lib/refcount.c:28 refcount_warn_saturate+0xf0/0x140
>   Workqueue: smc_hs_wq smc_listen_work [smc]
>   Call trace:
>    refcount_warn_saturate+0xf0/0x140
>    smcr_link_put+0x1cc/0x1d8 [smc]
>    smc_conn_free+0x110/0x1b0 [smc]
>    smc_conn_abort+0x50/0x60 [smc]
>    smc_listen_find_device+0x75c/0x790 [smc]
>    smc_listen_work+0x368/0x8a0 [smc]
>    process_one_work+0x1b8/0x420
>    worker_thread+0x158/0x510
>    kthread+0x114/0x118
> 
> It is caused by repeated release of LGR/link refcnt. One suspect is that
> smc_conn_free() is called repeatedly because some smc_conn_free() are not
> protected by sock lock.
> 
> Calls under socklock        | Calls not under socklock
> -------------------------------------------------------
> lock_sock(sk)               | smc_conn_abort
> smc_conn_free               | \- smc_conn_free
> \- smcr_link_put            |    \- smcr_link_put (duplicated)
> release_sock(sk)
> 
> So make sure smc_conn_free() is called under the sock lock.
> 

If I understand correctly, the fix could only solve a part of the 
problem, i.e. what the second call trace reported, right?

> Fixes: 8cf3f3e42374 ("net/smc: use helper smc_conn_abort() in listen processing")
> Co-developed-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
> Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
> Co-developed-by: Kai <KaiShen@linux.alibaba.com>
> Signed-off-by: Kai <KaiShen@linux.alibaba.com>
> Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
> ---
>   net/smc/af_smc.c | 25 +++++++++++++++++++++----
>   1 file changed, 21 insertions(+), 4 deletions(-)
> 
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index ed6d4d520bc7..e0a7a0151b11 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -973,7 +973,8 @@ static int smc_connect_decline_fallback(struct smc_sock *smc, int reason_code,
>   	return smc_connect_fallback(smc, reason_code);
>   }
>   
> -static void smc_conn_abort(struct smc_sock *smc, int local_first)
> +static void __smc_conn_abort(struct smc_sock *smc, int local_first,
> +			     bool locked)
>   {
>   	struct smc_connection *conn = &smc->conn;
>   	struct smc_link_group *lgr = conn->lgr;
> @@ -982,11 +983,27 @@ static void smc_conn_abort(struct smc_sock *smc, int local_first)
>   	if (smc_conn_lgr_valid(conn))
>   		lgr_valid = true;
>   
> -	smc_conn_free(conn);
> +	if (!locked) {
> +		lock_sock(&smc->sk);
> +		smc_conn_free(conn);
> +		release_sock(&smc->sk);
> +	} else {
> +		smc_conn_free(conn);
> +	}
>   	if (local_first && lgr_valid)
>   		smc_lgr_cleanup_early(lgr);
>   }
>   
> +static void smc_conn_abort(struct smc_sock *smc, int local_first)
> +{
> +	__smc_conn_abort(smc, local_first, false);
> +}
> +
> +static void smc_conn_abort_locked(struct smc_sock *smc, int local_first)
> +{
> +	__smc_conn_abort(smc, local_first, true);
> +}
> +
>   /* check if there is a rdma device available for this connection. */
>   /* called for connect and listen */
>   static int smc_find_rdma_device(struct smc_sock *smc, struct smc_init_info *ini)
> @@ -1352,7 +1369,7 @@ static int smc_connect_rdma(struct smc_sock *smc,
>   
>   	return 0;
>   connect_abort:
> -	smc_conn_abort(smc, ini->first_contact_local);
> +	smc_conn_abort_locked(smc, ini->first_contact_local);
>   	mutex_unlock(&smc_client_lgr_pending);
>   	smc->connect_nonblock = 0;
>   
> @@ -1454,7 +1471,7 @@ static int smc_connect_ism(struct smc_sock *smc,
>   
>   	return 0;
>   connect_abort:
> -	smc_conn_abort(smc, ini->first_contact_local);
> +	smc_conn_abort_locked(smc, ini->first_contact_local);
>   	mutex_unlock(&smc_server_lgr_pending);
>   	smc->connect_nonblock = 0;
>   

Why is smc_conn_abort_locked() only necessary for the smc_connect_work, 
not for the smc_listen_work?

Thanks,
Wenjia
Alexandra Winter Nov. 22, 2024, 4:03 p.m. UTC | #2
On 22.11.24 08:16, Wen Gu wrote:
> We encountered a LGR/link use-after-free issue, which manifested as
> the LGR/link refcnt reaching 0 early and entering the clear process,
> making resource access unsafe.
> 
>  refcount_t: addition on 0; use-after-free.
>  WARNING: CPU: 14 PID: 107447 at lib/refcount.c:25 refcount_warn_saturate+0x9c/0x140
>  Workqueue: events smc_lgr_terminate_work [smc]
>  Call trace:
>   refcount_warn_saturate+0x9c/0x140
>   __smc_lgr_terminate.part.45+0x2a8/0x370 [smc]
>   smc_lgr_terminate_work+0x28/0x30 [smc]
>   process_one_work+0x1b8/0x420
>   worker_thread+0x158/0x510
>   kthread+0x114/0x118
> 
> or
> 
>  refcount_t: underflow; use-after-free.
>  WARNING: CPU: 6 PID: 93140 at lib/refcount.c:28 refcount_warn_saturate+0xf0/0x140
>  Workqueue: smc_hs_wq smc_listen_work [smc]
>  Call trace:
>   refcount_warn_saturate+0xf0/0x140
>   smcr_link_put+0x1cc/0x1d8 [smc]
>   smc_conn_free+0x110/0x1b0 [smc]
>   smc_conn_abort+0x50/0x60 [smc]
>   smc_listen_find_device+0x75c/0x790 [smc]
>   smc_listen_work+0x368/0x8a0 [smc]
>   process_one_work+0x1b8/0x420
>   worker_thread+0x158/0x510
>   kthread+0x114/0x118
> 
> It is caused by repeated release of LGR/link refcnt. One suspect is that
> smc_conn_free() is called repeatedly because some smc_conn_free() are not
> protected by sock lock.
> 
> Calls under socklock        | Calls not under socklock
> -------------------------------------------------------
> lock_sock(sk)               | smc_conn_abort
> smc_conn_free               | \- smc_conn_free
> \- smcr_link_put            |    \- smcr_link_put (duplicated)
> release_sock(sk)
> 
> So make sure smc_conn_free() is called under the sock lock.
> 
> Fixes: 8cf3f3e42374 ("net/smc: use helper smc_conn_abort() in listen processing")
> Co-developed-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
> Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
> Co-developed-by: Kai <KaiShen@linux.alibaba.com>
> Signed-off-by: Kai <KaiShen@linux.alibaba.com>
> Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
> ---
>  net/smc/af_smc.c | 25 +++++++++++++++++++++----
>  1 file changed, 21 insertions(+), 4 deletions(-)
> 
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index ed6d4d520bc7..e0a7a0151b11 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -973,7 +973,8 @@ static int smc_connect_decline_fallback(struct smc_sock *smc, int reason_code,
>  	return smc_connect_fallback(smc, reason_code);
>  }
>  
> -static void smc_conn_abort(struct smc_sock *smc, int local_first)
> +static void __smc_conn_abort(struct smc_sock *smc, int local_first,
> +			     bool locked)
>  {
>  	struct smc_connection *conn = &smc->conn;
>  	struct smc_link_group *lgr = conn->lgr;
> @@ -982,11 +983,27 @@ static void smc_conn_abort(struct smc_sock *smc, int local_first)
>  	if (smc_conn_lgr_valid(conn))
>  		lgr_valid = true;
>  
> -	smc_conn_free(conn);
> +	if (!locked) {
> +		lock_sock(&smc->sk);
> +		smc_conn_free(conn);
> +		release_sock(&smc->sk);
> +	} else {
> +		smc_conn_free(conn);
> +	}
>  	if (local_first && lgr_valid)
>  		smc_lgr_cleanup_early(lgr);
>  }
>  
> +static void smc_conn_abort(struct smc_sock *smc, int local_first)
> +{
> +	__smc_conn_abort(smc, local_first, false);
> +}
> +
> +static void smc_conn_abort_locked(struct smc_sock *smc, int local_first)
> +{
> +	__smc_conn_abort(smc, local_first, true);
> +}
> +
>  /* check if there is a rdma device available for this connection. */
>  /* called for connect and listen */
>  static int smc_find_rdma_device(struct smc_sock *smc, struct smc_init_info *ini)
> @@ -1352,7 +1369,7 @@ static int smc_connect_rdma(struct smc_sock *smc,
>  
>  	return 0;
>  connect_abort:
> -	smc_conn_abort(smc, ini->first_contact_local);
> +	smc_conn_abort_locked(smc, ini->first_contact_local);
>  	mutex_unlock(&smc_client_lgr_pending);
>  	smc->connect_nonblock = 0;
>  
> @@ -1454,7 +1471,7 @@ static int smc_connect_ism(struct smc_sock *smc,
>  
>  	return 0;
>  connect_abort:
> -	smc_conn_abort(smc, ini->first_contact_local);
> +	smc_conn_abort_locked(smc, ini->first_contact_local);
>  	mutex_unlock(&smc_server_lgr_pending);
>  	smc->connect_nonblock = 0;
>  

I wonder if this can deadlock, when you take lock_sock so far down in the callchain.
example:
 smc_connect will first take lock_sock(sk) and then mutex_lock(&smc_server_lgr_pending);  (e.g. in smc_connect_ism())
wheras
smc_listen_work() will take mutex_lock(&smc_server_lgr_pending); and then lock_sock(sk) (in your __smc_conn_abort(,,false))

I am not sure whether this can be called on the same socket, but it looks suspicious to me.


All callers of smc_conn_abort() without socklock seem to originate from smc_listen_work(). 
That makes me think whether smc_listen_work() should do lock_sock(sk) on a higher level.

Do you have an example which function could collide with smc_listen_work()?
i.e. have you found a way to reproduce this?


Are you sure that all callers of smc_conn_free(), that are not smc_conn_abort(), do set the socklock?
It seems to me that the path of smc_conn_kill() is not covered by your solution.


Please excuse, that I am not deeply familiar with this code. 
I'm just trying to ask helpful questions.
Alexandra Winter Nov. 22, 2024, 4:11 p.m. UTC | #3
On 22.11.24 17:03, Alexandra Winter wrote:
> Are you sure that all callers of smc_conn_free(), that are not smc_conn_abort(), do set the socklock?
> It seems to me that the path of smc_conn_kill() is not covered by your solution.

My bad. smc_conn_kill() is called under socklock
Wen Gu Nov. 25, 2024, 6:46 a.m. UTC | #4
On 2024/11/22 23:56, Wenjia Zhang wrote:
> 
> 
> On 22.11.24 08:16, Wen Gu wrote:
>> We encountered a LGR/link use-after-free issue, which manifested as
>> the LGR/link refcnt reaching 0 early and entering the clear process,
>> making resource access unsafe.
>>
> 
> How did you make sure that the refcount mentioned in the warning are the LGR/link refcnt, not &sk->sk_refcnt?
> 
Because according to the panic stack, the UAF is found in smcr_link_put(),
and it is also found that the link has been cleared at that time (lnk has
been memset to all zero by __smcr_link_clear()).

>>   refcount_t: addition on 0; use-after-free.
>>   WARNING: CPU: 14 PID: 107447 at lib/refcount.c:25 refcount_warn_saturate+0x9c/0x140
>>   Workqueue: events smc_lgr_terminate_work [smc]
>>   Call trace:
>>    refcount_warn_saturate+0x9c/0x140
>>    __smc_lgr_terminate.part.45+0x2a8/0x370 [smc]
>>    smc_lgr_terminate_work+0x28/0x30 [smc]
>>    process_one_work+0x1b8/0x420
>>    worker_thread+0x158/0x510
>>    kthread+0x114/0x118
>>
>> or
>>
>>   refcount_t: underflow; use-after-free.
>>   WARNING: CPU: 6 PID: 93140 at lib/refcount.c:28 refcount_warn_saturate+0xf0/0x140
>>   Workqueue: smc_hs_wq smc_listen_work [smc]
>>   Call trace:
>>    refcount_warn_saturate+0xf0/0x140
>>    smcr_link_put+0x1cc/0x1d8 [smc]
>>    smc_conn_free+0x110/0x1b0 [smc]
>>    smc_conn_abort+0x50/0x60 [smc]
>>    smc_listen_find_device+0x75c/0x790 [smc]
>>    smc_listen_work+0x368/0x8a0 [smc]
>>    process_one_work+0x1b8/0x420
>>    worker_thread+0x158/0x510
>>    kthread+0x114/0x118
>>
>> It is caused by repeated release of LGR/link refcnt. One suspect is that
>> smc_conn_free() is called repeatedly because some smc_conn_free() are not
>> protected by sock lock.
>>
>> Calls under socklock        | Calls not under socklock
>> -------------------------------------------------------
>> lock_sock(sk)               | smc_conn_abort
>> smc_conn_free               | \- smc_conn_free
>> \- smcr_link_put            |    \- smcr_link_put (duplicated)
>> release_sock(sk)
>>
>> So make sure smc_conn_free() is called under the sock lock.
>>
> 
> If I understand correctly, the fix could only solve a part of the problem, i.e. what the second call trace reported, right?

I think that these panic stacks (there are some other variations that I haven't posted)
have the same root cause, that is the link/lgr refcnt reaches 0 early in the race situation,
making access to link/lgr related resources no longer safe.

The link/lgr refcnt was introduced by [1] & [2], the link refcnt is operated by link
itself and connections registered to it, and the lgr refcnt is operated by lgr itself,
links belong to it and connections registered to it. Through code analysis, the most
likely suspect is that smc_conn_free() duplicate put link/lgr refcnt because some
smc_conn_free() calls by smc_conn_abort() are not under the protection of sock lock,
so if they are called at the same time, there may be a race condition.

for example:

    __smc_lgr_terminate            | smc_listen_decline
    --------------------------------------------------------------
    lock_sock                      |
    smc_conn_kill                  | smc_conn_abort
     \- smc_conn_free              |  \- smc_conn_free
    release_sock                   |

[1] 61f434b0280e ("net/smc: Resolve the race between link group access and termination")
[2] 20c9398d3309 ("net/smc: Resolve the race between SMC-R link access and clear")

> 
>> Fixes: 8cf3f3e42374 ("net/smc: use helper smc_conn_abort() in listen processing")
>> Co-developed-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
>> Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
>> Co-developed-by: Kai <KaiShen@linux.alibaba.com>
>> Signed-off-by: Kai <KaiShen@linux.alibaba.com>
>> Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
>> ---
>>   net/smc/af_smc.c | 25 +++++++++++++++++++++----
>>   1 file changed, 21 insertions(+), 4 deletions(-)
>>
>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
>> index ed6d4d520bc7..e0a7a0151b11 100644
>> --- a/net/smc/af_smc.c
>> +++ b/net/smc/af_smc.c
>> @@ -973,7 +973,8 @@ static int smc_connect_decline_fallback(struct smc_sock *smc, int reason_code,
>>       return smc_connect_fallback(smc, reason_code);
>>   }
>> -static void smc_conn_abort(struct smc_sock *smc, int local_first)
>> +static void __smc_conn_abort(struct smc_sock *smc, int local_first,
>> +                 bool locked)
>>   {
>>       struct smc_connection *conn = &smc->conn;
>>       struct smc_link_group *lgr = conn->lgr;
>> @@ -982,11 +983,27 @@ static void smc_conn_abort(struct smc_sock *smc, int local_first)
>>       if (smc_conn_lgr_valid(conn))
>>           lgr_valid = true;
>> -    smc_conn_free(conn);
>> +    if (!locked) {
>> +        lock_sock(&smc->sk);
>> +        smc_conn_free(conn);
>> +        release_sock(&smc->sk);
>> +    } else {
>> +        smc_conn_free(conn);
>> +    }
>>       if (local_first && lgr_valid)
>>           smc_lgr_cleanup_early(lgr);
>>   }
>> +static void smc_conn_abort(struct smc_sock *smc, int local_first)
>> +{
>> +    __smc_conn_abort(smc, local_first, false);
>> +}
>> +
>> +static void smc_conn_abort_locked(struct smc_sock *smc, int local_first)
>> +{
>> +    __smc_conn_abort(smc, local_first, true);
>> +}
>> +
>>   /* check if there is a rdma device available for this connection. */
>>   /* called for connect and listen */
>>   static int smc_find_rdma_device(struct smc_sock *smc, struct smc_init_info *ini)
>> @@ -1352,7 +1369,7 @@ static int smc_connect_rdma(struct smc_sock *smc,
>>       return 0;
>>   connect_abort:
>> -    smc_conn_abort(smc, ini->first_contact_local);
>> +    smc_conn_abort_locked(smc, ini->first_contact_local);
>>       mutex_unlock(&smc_client_lgr_pending);
>>       smc->connect_nonblock = 0;
>> @@ -1454,7 +1471,7 @@ static int smc_connect_ism(struct smc_sock *smc,
>>       return 0;
>>   connect_abort:
>> -    smc_conn_abort(smc, ini->first_contact_local);
>> +    smc_conn_abort_locked(smc, ini->first_contact_local);
>>       mutex_unlock(&smc_server_lgr_pending);
>>       smc->connect_nonblock = 0;
> 
> Why is smc_conn_abort_locked() only necessary for the smc_connect_work, not for the smc_listen_work?
> 

Before this patch, the smc_conn_abort()->smc_conn_free() calls are not
protected by sock lock except in smc_conn_{rdma|ism}. So I add sock lock
protection inside the __smc_conn_abort() and introduce smc_conn_abort_locked()
(which means sock lock has been taken) for smc_conn_{rdma|ism}.

> Thanks,
> Wenjia

Thanks!
Wen Gu
Wen Gu Nov. 25, 2024, 10 a.m. UTC | #5
On 2024/11/23 00:03, Alexandra Winter wrote:
> 
> 
> On 22.11.24 08:16, Wen Gu wrote:
>> We encountered a LGR/link use-after-free issue, which manifested as
>> the LGR/link refcnt reaching 0 early and entering the clear process,
>> making resource access unsafe.
>>
>>   refcount_t: addition on 0; use-after-free.
>>   WARNING: CPU: 14 PID: 107447 at lib/refcount.c:25 refcount_warn_saturate+0x9c/0x140
>>   Workqueue: events smc_lgr_terminate_work [smc]
>>   Call trace:
>>    refcount_warn_saturate+0x9c/0x140
>>    __smc_lgr_terminate.part.45+0x2a8/0x370 [smc]
>>    smc_lgr_terminate_work+0x28/0x30 [smc]
>>    process_one_work+0x1b8/0x420
>>    worker_thread+0x158/0x510
>>    kthread+0x114/0x118
>>
>> or
>>
>>   refcount_t: underflow; use-after-free.
>>   WARNING: CPU: 6 PID: 93140 at lib/refcount.c:28 refcount_warn_saturate+0xf0/0x140
>>   Workqueue: smc_hs_wq smc_listen_work [smc]
>>   Call trace:
>>    refcount_warn_saturate+0xf0/0x140
>>    smcr_link_put+0x1cc/0x1d8 [smc]
>>    smc_conn_free+0x110/0x1b0 [smc]
>>    smc_conn_abort+0x50/0x60 [smc]
>>    smc_listen_find_device+0x75c/0x790 [smc]
>>    smc_listen_work+0x368/0x8a0 [smc]
>>    process_one_work+0x1b8/0x420
>>    worker_thread+0x158/0x510
>>    kthread+0x114/0x118
>>
>> It is caused by repeated release of LGR/link refcnt. One suspect is that
>> smc_conn_free() is called repeatedly because some smc_conn_free() are not
>> protected by sock lock.
>>
>> Calls under socklock        | Calls not under socklock
>> -------------------------------------------------------
>> lock_sock(sk)               | smc_conn_abort
>> smc_conn_free               | \- smc_conn_free
>> \- smcr_link_put            |    \- smcr_link_put (duplicated)
>> release_sock(sk)
>>
>> So make sure smc_conn_free() is called under the sock lock.
>>
>> Fixes: 8cf3f3e42374 ("net/smc: use helper smc_conn_abort() in listen processing")
>> Co-developed-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
>> Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
>> Co-developed-by: Kai <KaiShen@linux.alibaba.com>
>> Signed-off-by: Kai <KaiShen@linux.alibaba.com>
>> Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
>> ---
>>   net/smc/af_smc.c | 25 +++++++++++++++++++++----
>>   1 file changed, 21 insertions(+), 4 deletions(-)
>>
>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
>> index ed6d4d520bc7..e0a7a0151b11 100644
>> --- a/net/smc/af_smc.c
>> +++ b/net/smc/af_smc.c
>> @@ -973,7 +973,8 @@ static int smc_connect_decline_fallback(struct smc_sock *smc, int reason_code,
>>   	return smc_connect_fallback(smc, reason_code);
>>   }
>>   
>> -static void smc_conn_abort(struct smc_sock *smc, int local_first)
>> +static void __smc_conn_abort(struct smc_sock *smc, int local_first,
>> +			     bool locked)
>>   {
>>   	struct smc_connection *conn = &smc->conn;
>>   	struct smc_link_group *lgr = conn->lgr;
>> @@ -982,11 +983,27 @@ static void smc_conn_abort(struct smc_sock *smc, int local_first)
>>   	if (smc_conn_lgr_valid(conn))
>>   		lgr_valid = true;
>>   
>> -	smc_conn_free(conn);
>> +	if (!locked) {
>> +		lock_sock(&smc->sk);
>> +		smc_conn_free(conn);
>> +		release_sock(&smc->sk);
>> +	} else {
>> +		smc_conn_free(conn);
>> +	}
>>   	if (local_first && lgr_valid)
>>   		smc_lgr_cleanup_early(lgr);
>>   }
>>   
>> +static void smc_conn_abort(struct smc_sock *smc, int local_first)
>> +{
>> +	__smc_conn_abort(smc, local_first, false);
>> +}
>> +
>> +static void smc_conn_abort_locked(struct smc_sock *smc, int local_first)
>> +{
>> +	__smc_conn_abort(smc, local_first, true);
>> +}
>> +
>>   /* check if there is a rdma device available for this connection. */
>>   /* called for connect and listen */
>>   static int smc_find_rdma_device(struct smc_sock *smc, struct smc_init_info *ini)
>> @@ -1352,7 +1369,7 @@ static int smc_connect_rdma(struct smc_sock *smc,
>>   
>>   	return 0;
>>   connect_abort:
>> -	smc_conn_abort(smc, ini->first_contact_local);
>> +	smc_conn_abort_locked(smc, ini->first_contact_local);
>>   	mutex_unlock(&smc_client_lgr_pending);
>>   	smc->connect_nonblock = 0;
>>   
>> @@ -1454,7 +1471,7 @@ static int smc_connect_ism(struct smc_sock *smc,
>>   
>>   	return 0;
>>   connect_abort:
>> -	smc_conn_abort(smc, ini->first_contact_local);
>> +	smc_conn_abort_locked(smc, ini->first_contact_local);
>>   	mutex_unlock(&smc_server_lgr_pending);
>>   	smc->connect_nonblock = 0;
>>   
> 
> I wonder if this can deadlock, when you take lock_sock so far down in the callchain.
> example:
>   smc_connect will first take lock_sock(sk) and then mutex_lock(&smc_server_lgr_pending);  (e.g. in smc_connect_ism())
> wheras
> smc_listen_work() will take mutex_lock(&smc_server_lgr_pending); and then lock_sock(sk) (in your __smc_conn_abort(,,false))
> 
> I am not sure whether this can be called on the same socket, but it looks suspicious to me.
> 

IMHO this two paths can not occur on the same sk.

> 
> All callers of smc_conn_abort() without socklock seem to originate from smc_listen_work().
> That makes me think whether smc_listen_work() should do lock_sock(sk) on a higher level.
> 

Yes, I also think about this question, I guess it is because the new smc sock will be
accepted by userspace only after smc_listen_work() is completed. Before that, no userspace
operation occurs synchronously with it, so it is not protected by sock lock. But I am not
sure if there are other reasons, so I did not aggressively protect the entire smc_listen_work
with sock lock, but chose a conservative approach.

> Do you have an example which function could collide with smc_listen_work()?
> i.e. have you found a way to reproduce this?
> 

We discovered this during our fault injection testing where the rdma driver was rmmod/insmod
sporadically during the nginx/wrk 1K connections test.

e.g.

    __smc_lgr_terminate            | smc_listen_decline
    (caused by rmmod mlx5_ib)      | (caused by e.g. reg mr fail)
    --------------------------------------------------------------
    lock_sock                      |
    smc_conn_kill                  | smc_conn_abort
     \- smc_conn_free              |  \- smc_conn_free
    release_sock                   |

> 
> Are you sure that all callers of smc_conn_free(), that are not smc_conn_abort(), do set the socklock?
> It seems to me that the path of smc_conn_kill() is not covered by your solution.
> 

smc_conn_free is called in these places:

1. __smc_release (protected by sock lock)
2. smc_conn_abort (partially protected by sock lock)
3. smc_close_active_abort - smc_release(protected by sock lock)
                           - smc_conn_kill - __smc_lgr_terminate/smc_conn_abort_work(protected by sock lock)
4. smc_close_passive_work (protected by sock lock)

So only smc_conn_abort->smc_conn_free is not well protected by sock lock.

> 
> Please excuse, that I am not deeply familiar with this code.
> I'm just trying to ask helpful questions.

Thanks! :)
Alexandra Winter Nov. 25, 2024, 1:02 p.m. UTC | #6
On 25.11.24 11:00, Wen Gu wrote:
>> I wonder if this can deadlock, when you take lock_sock so far down in the callchain.
>> example:
>>   smc_connect will first take lock_sock(sk) and then mutex_lock(&smc_server_lgr_pending);  (e.g. in smc_connect_ism())
>> wheras
>> smc_listen_work() will take mutex_lock(&smc_server_lgr_pending); and then lock_sock(sk) (in your __smc_conn_abort(,,false))
>>
>> I am not sure whether this can be called on the same socket, but it looks suspicious to me.
>>
> 
> IMHO this two paths can not occur on the same sk.
> 
>>
>> All callers of smc_conn_abort() without socklock seem to originate from smc_listen_work().
>> That makes me think whether smc_listen_work() should do lock_sock(sk) on a higher level.
>>
> 
> Yes, I also think about this question, I guess it is because the new smc sock will be
> accepted by userspace only after smc_listen_work() is completed. Before that, no userspace
> operation occurs synchronously with it, so it is not protected by sock lock. But I am not
> sure if there are other reasons, so I did not aggressively protect the entire smc_listen_work
> with sock lock, but chose a conservative approach.
> 
>> Do you have an example which function could collide with smc_listen_work()?
>> i.e. have you found a way to reproduce this?
>>
> 
> We discovered this during our fault injection testing where the rdma driver was rmmod/insmod
> sporadically during the nginx/wrk 1K connections test.
> 
> e.g.
> 
>    __smc_lgr_terminate            | smc_listen_decline
>    (caused by rmmod mlx5_ib)      | (caused by e.g. reg mr fail)
>    --------------------------------------------------------------
>    lock_sock                      |
>    smc_conn_kill                  | smc_conn_abort
>     \- smc_conn_free              |  \- smc_conn_free
>    release_sock                   |


Thank you for the explanations. So the most suspicious scenario is
smc_listen_work() colliding with 
 __smc_lgr_terminate() -> smc_conn_kill() of the conn and smc socket that is just under 
construction by smc_listen_work() (without socklock).

I am wondering, if other parts of smc_listen_work() are allowed to run in parallel
with smc_conn_kill() of this smc socket??

My impression would be that the whole smc_listen_work() should be protected against
smc_conn_kill(), not only smc_conn_free.
Wenjia Zhang Nov. 26, 2024, 12:12 p.m. UTC | #7
On 25.11.24 07:46, Wen Gu wrote:
> 
> 
> On 2024/11/22 23:56, Wenjia Zhang wrote:
>>
>>
>> On 22.11.24 08:16, Wen Gu wrote:
>>> We encountered a LGR/link use-after-free issue, which manifested as
>>> the LGR/link refcnt reaching 0 early and entering the clear process,
>>> making resource access unsafe.
>>>
>>
>> How did you make sure that the refcount mentioned in the warning are 
>> the LGR/link refcnt, not &sk->sk_refcnt?
>>
> Because according to the panic stack, the UAF is found in smcr_link_put(),
> and it is also found that the link has been cleared at that time (lnk has
> been memset to all zero by __smcr_link_clear()).
> 
ok, I think you're right, I was distracted by the the sock_put() in 
function __smc_lgr_terminate()

>>>   refcount_t: addition on 0; use-after-free.
>>>   WARNING: CPU: 14 PID: 107447 at lib/refcount.c:25 
>>> refcount_warn_saturate+0x9c/0x140
>>>   Workqueue: events smc_lgr_terminate_work [smc]
>>>   Call trace:
>>>    refcount_warn_saturate+0x9c/0x140
>>>    __smc_lgr_terminate.part.45+0x2a8/0x370 [smc]
>>>    smc_lgr_terminate_work+0x28/0x30 [smc]
>>>    process_one_work+0x1b8/0x420
>>>    worker_thread+0x158/0x510
>>>    kthread+0x114/0x118
>>>
>>> or
>>>
>>>   refcount_t: underflow; use-after-free.
>>>   WARNING: CPU: 6 PID: 93140 at lib/refcount.c:28 
>>> refcount_warn_saturate+0xf0/0x140
>>>   Workqueue: smc_hs_wq smc_listen_work [smc]
>>>   Call trace:
>>>    refcount_warn_saturate+0xf0/0x140
>>>    smcr_link_put+0x1cc/0x1d8 [smc]
>>>    smc_conn_free+0x110/0x1b0 [smc]
>>>    smc_conn_abort+0x50/0x60 [smc]
>>>    smc_listen_find_device+0x75c/0x790 [smc]
>>>    smc_listen_work+0x368/0x8a0 [smc]
>>>    process_one_work+0x1b8/0x420
>>>    worker_thread+0x158/0x510
>>>    kthread+0x114/0x118
>>>
>>> It is caused by repeated release of LGR/link refcnt. One suspect is that
>>> smc_conn_free() is called repeatedly because some smc_conn_free() are 
>>> not
>>> protected by sock lock.
>>>
>>> Calls under socklock        | Calls not under socklock
>>> -------------------------------------------------------
>>> lock_sock(sk)               | smc_conn_abort
>>> smc_conn_free               | \- smc_conn_free
>>> \- smcr_link_put            |    \- smcr_link_put (duplicated)
>>> release_sock(sk)
>>>
>>> So make sure smc_conn_free() is called under the sock lock.
>>>
>>
>> If I understand correctly, the fix could only solve a part of the 
>> problem, i.e. what the second call trace reported, right?
> 
> I think that these panic stacks (there are some other variations that I 
> haven't posted)
> have the same root cause, that is the link/lgr refcnt reaches 0 early in 
> the race situation,
> making access to link/lgr related resources no longer safe.
> 
> The link/lgr refcnt was introduced by [1] & [2], the link refcnt is 
> operated by link
> itself and connections registered to it, and the lgr refcnt is operated 
> by lgr itself,
> links belong to it and connections registered to it. Through code 
> analysis, the most
> likely suspect is that smc_conn_free() duplicate put link/lgr refcnt 
> because some
> smc_conn_free() calls by smc_conn_abort() are not under the protection 
> of sock lock,
> so if they are called at the same time, there may be a race condition.
> 
> for example:
> 
>     __smc_lgr_terminate            | smc_listen_decline
>     --------------------------------------------------------------
>     lock_sock                      |
>     smc_conn_kill                  | smc_conn_abort
>      \- smc_conn_free              |  \- smc_conn_free
>     release_sock                   |
> 
> [1] 61f434b0280e ("net/smc: Resolve the race between link group access 
> and termination")
> [2] 20c9398d3309 ("net/smc: Resolve the race between SMC-R link access 
> and clear")
> 
I see, thx!
>>
>>> Fixes: 8cf3f3e42374 ("net/smc: use helper smc_conn_abort() in listen 
>>> processing")
>>> Co-developed-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
>>> Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
>>> Co-developed-by: Kai <KaiShen@linux.alibaba.com>
>>> Signed-off-by: Kai <KaiShen@linux.alibaba.com>
>>> Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
>>> ---
>>>   net/smc/af_smc.c | 25 +++++++++++++++++++++----
>>>   1 file changed, 21 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
>>> index ed6d4d520bc7..e0a7a0151b11 100644
>>> --- a/net/smc/af_smc.c
>>> +++ b/net/smc/af_smc.c
>>> @@ -973,7 +973,8 @@ static int smc_connect_decline_fallback(struct 
>>> smc_sock *smc, int reason_code,
>>>       return smc_connect_fallback(smc, reason_code);
>>>   }
>>> -static void smc_conn_abort(struct smc_sock *smc, int local_first)
>>> +static void __smc_conn_abort(struct smc_sock *smc, int local_first,
>>> +                 bool locked)
>>>   {
>>>       struct smc_connection *conn = &smc->conn;
>>>       struct smc_link_group *lgr = conn->lgr;
>>> @@ -982,11 +983,27 @@ static void smc_conn_abort(struct smc_sock 
>>> *smc, int local_first)
>>>       if (smc_conn_lgr_valid(conn))
>>>           lgr_valid = true;
>>> -    smc_conn_free(conn);
>>> +    if (!locked) {
>>> +        lock_sock(&smc->sk);
>>> +        smc_conn_free(conn);
>>> +        release_sock(&smc->sk);
>>> +    } else {
>>> +        smc_conn_free(conn);
>>> +    }
>>>       if (local_first && lgr_valid)
>>>           smc_lgr_cleanup_early(lgr);
>>>   }
>>> +static void smc_conn_abort(struct smc_sock *smc, int local_first)
>>> +{
>>> +    __smc_conn_abort(smc, local_first, false);
>>> +}
>>> +
>>> +static void smc_conn_abort_locked(struct smc_sock *smc, int 
>>> local_first)
>>> +{
>>> +    __smc_conn_abort(smc, local_first, true);
>>> +}
>>> +
>>>   /* check if there is a rdma device available for this connection. */
>>>   /* called for connect and listen */
>>>   static int smc_find_rdma_device(struct smc_sock *smc, struct 
>>> smc_init_info *ini)
>>> @@ -1352,7 +1369,7 @@ static int smc_connect_rdma(struct smc_sock *smc,
>>>       return 0;
>>>   connect_abort:
>>> -    smc_conn_abort(smc, ini->first_contact_local);
>>> +    smc_conn_abort_locked(smc, ini->first_contact_local);
>>>       mutex_unlock(&smc_client_lgr_pending);
>>>       smc->connect_nonblock = 0;
>>> @@ -1454,7 +1471,7 @@ static int smc_connect_ism(struct smc_sock *smc,
>>>       return 0;
>>>   connect_abort:
>>> -    smc_conn_abort(smc, ini->first_contact_local);
>>> +    smc_conn_abort_locked(smc, ini->first_contact_local);
>>>       mutex_unlock(&smc_server_lgr_pending);
>>>       smc->connect_nonblock = 0;
>>
>> Why is smc_conn_abort_locked() only necessary for the 
>> smc_connect_work, not for the smc_listen_work?
>>
> 
> Before this patch, the smc_conn_abort()->smc_conn_free() calls are not
> protected by sock lock except in smc_conn_{rdma|ism}. So I add sock lock
Do you mean here that the smc_conn_abort()->smc_conn_free() calls are 
protected in smc_listen_{rdma|ism}, right?
If it is, could you please point me out where they(the protection) are?
> protection inside the __smc_conn_abort() and introduce 
> smc_conn_abort_locked()
> (which means sock lock has been taken) for smc_conn_{rdma|ism}.
> 

Thanks,
Wenjia
Wenjia Zhang Nov. 26, 2024, 12:19 p.m. UTC | #8
On 26.11.24 13:12, Wenjia Zhang wrote:
> 
> 
> On 25.11.24 07:46, Wen Gu wrote:
>>
>>
>> On 2024/11/22 23:56, Wenjia Zhang wrote:
>>>
>>>
>>> On 22.11.24 08:16, Wen Gu wrote:
>>>> We encountered a LGR/link use-after-free issue, which manifested as
>>>> the LGR/link refcnt reaching 0 early and entering the clear process,
>>>> making resource access unsafe.
>>>>
>>>
>>> How did you make sure that the refcount mentioned in the warning are 
>>> the LGR/link refcnt, not &sk->sk_refcnt?
>>>
>> Because according to the panic stack, the UAF is found in 
>> smcr_link_put(),
>> and it is also found that the link has been cleared at that time (lnk has
>> been memset to all zero by __smcr_link_clear()).
>>
> ok, I think you're right, I was distracted by the the sock_put() in 
> function __smc_lgr_terminate()
> 
>>>>   refcount_t: addition on 0; use-after-free.
>>>>   WARNING: CPU: 14 PID: 107447 at lib/refcount.c:25 
>>>> refcount_warn_saturate+0x9c/0x140
>>>>   Workqueue: events smc_lgr_terminate_work [smc]
>>>>   Call trace:
>>>>    refcount_warn_saturate+0x9c/0x140
>>>>    __smc_lgr_terminate.part.45+0x2a8/0x370 [smc]
>>>>    smc_lgr_terminate_work+0x28/0x30 [smc]
>>>>    process_one_work+0x1b8/0x420
>>>>    worker_thread+0x158/0x510
>>>>    kthread+0x114/0x118
>>>>
>>>> or
>>>>
>>>>   refcount_t: underflow; use-after-free.
>>>>   WARNING: CPU: 6 PID: 93140 at lib/refcount.c:28 
>>>> refcount_warn_saturate+0xf0/0x140
>>>>   Workqueue: smc_hs_wq smc_listen_work [smc]
>>>>   Call trace:
>>>>    refcount_warn_saturate+0xf0/0x140
>>>>    smcr_link_put+0x1cc/0x1d8 [smc]
>>>>    smc_conn_free+0x110/0x1b0 [smc]
>>>>    smc_conn_abort+0x50/0x60 [smc]
>>>>    smc_listen_find_device+0x75c/0x790 [smc]
>>>>    smc_listen_work+0x368/0x8a0 [smc]
>>>>    process_one_work+0x1b8/0x420
>>>>    worker_thread+0x158/0x510
>>>>    kthread+0x114/0x118
>>>>
>>>> It is caused by repeated release of LGR/link refcnt. One suspect is 
>>>> that
>>>> smc_conn_free() is called repeatedly because some smc_conn_free() 
>>>> are not
>>>> protected by sock lock.
>>>>
>>>> Calls under socklock        | Calls not under socklock
>>>> -------------------------------------------------------
>>>> lock_sock(sk)               | smc_conn_abort
>>>> smc_conn_free               | \- smc_conn_free
>>>> \- smcr_link_put            |    \- smcr_link_put (duplicated)
>>>> release_sock(sk)
>>>>
>>>> So make sure smc_conn_free() is called under the sock lock.
>>>>
>>>
>>> If I understand correctly, the fix could only solve a part of the 
>>> problem, i.e. what the second call trace reported, right?
>>
>> I think that these panic stacks (there are some other variations that 
>> I haven't posted)
>> have the same root cause, that is the link/lgr refcnt reaches 0 early 
>> in the race situation,
>> making access to link/lgr related resources no longer safe.
>>
>> The link/lgr refcnt was introduced by [1] & [2], the link refcnt is 
>> operated by link
>> itself and connections registered to it, and the lgr refcnt is 
>> operated by lgr itself,
>> links belong to it and connections registered to it. Through code 
>> analysis, the most
>> likely suspect is that smc_conn_free() duplicate put link/lgr refcnt 
>> because some
>> smc_conn_free() calls by smc_conn_abort() are not under the protection 
>> of sock lock,
>> so if they are called at the same time, there may be a race condition.
>>
>> for example:
>>
>>     __smc_lgr_terminate            | smc_listen_decline
>>     --------------------------------------------------------------
>>     lock_sock                      |
>>     smc_conn_kill                  | smc_conn_abort
>>      \- smc_conn_free              |  \- smc_conn_free
>>     release_sock                   |
>>
>> [1] 61f434b0280e ("net/smc: Resolve the race between link group access 
>> and termination")
>> [2] 20c9398d3309 ("net/smc: Resolve the race between SMC-R link access 
>> and clear")
>>
> I see, thx!
>>>
>>>> Fixes: 8cf3f3e42374 ("net/smc: use helper smc_conn_abort() in listen 
>>>> processing")
>>>> Co-developed-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
>>>> Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
>>>> Co-developed-by: Kai <KaiShen@linux.alibaba.com>
>>>> Signed-off-by: Kai <KaiShen@linux.alibaba.com>
>>>> Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
>>>> ---
>>>>   net/smc/af_smc.c | 25 +++++++++++++++++++++----
>>>>   1 file changed, 21 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
>>>> index ed6d4d520bc7..e0a7a0151b11 100644
>>>> --- a/net/smc/af_smc.c
>>>> +++ b/net/smc/af_smc.c
>>>> @@ -973,7 +973,8 @@ static int smc_connect_decline_fallback(struct 
>>>> smc_sock *smc, int reason_code,
>>>>       return smc_connect_fallback(smc, reason_code);
>>>>   }
>>>> -static void smc_conn_abort(struct smc_sock *smc, int local_first)
>>>> +static void __smc_conn_abort(struct smc_sock *smc, int local_first,
>>>> +                 bool locked)
>>>>   {
>>>>       struct smc_connection *conn = &smc->conn;
>>>>       struct smc_link_group *lgr = conn->lgr;
>>>> @@ -982,11 +983,27 @@ static void smc_conn_abort(struct smc_sock 
>>>> *smc, int local_first)
>>>>       if (smc_conn_lgr_valid(conn))
>>>>           lgr_valid = true;
>>>> -    smc_conn_free(conn);
>>>> +    if (!locked) {
>>>> +        lock_sock(&smc->sk);
>>>> +        smc_conn_free(conn);
>>>> +        release_sock(&smc->sk);
>>>> +    } else {
>>>> +        smc_conn_free(conn);
>>>> +    }
>>>>       if (local_first && lgr_valid)
>>>>           smc_lgr_cleanup_early(lgr);
>>>>   }
>>>> +static void smc_conn_abort(struct smc_sock *smc, int local_first)
>>>> +{
>>>> +    __smc_conn_abort(smc, local_first, false);
>>>> +}
>>>> +
>>>> +static void smc_conn_abort_locked(struct smc_sock *smc, int 
>>>> local_first)
>>>> +{
>>>> +    __smc_conn_abort(smc, local_first, true);
>>>> +}
>>>> +
>>>>   /* check if there is a rdma device available for this connection. */
>>>>   /* called for connect and listen */
>>>>   static int smc_find_rdma_device(struct smc_sock *smc, struct 
>>>> smc_init_info *ini)
>>>> @@ -1352,7 +1369,7 @@ static int smc_connect_rdma(struct smc_sock *smc,
>>>>       return 0;
>>>>   connect_abort:
>>>> -    smc_conn_abort(smc, ini->first_contact_local);
>>>> +    smc_conn_abort_locked(smc, ini->first_contact_local);
>>>>       mutex_unlock(&smc_client_lgr_pending);
>>>>       smc->connect_nonblock = 0;
>>>> @@ -1454,7 +1471,7 @@ static int smc_connect_ism(struct smc_sock *smc,
>>>>       return 0;
>>>>   connect_abort:
>>>> -    smc_conn_abort(smc, ini->first_contact_local);
>>>> +    smc_conn_abort_locked(smc, ini->first_contact_local);
>>>>       mutex_unlock(&smc_server_lgr_pending);
>>>>       smc->connect_nonblock = 0;
>>>
>>> Why is smc_conn_abort_locked() only necessary for the 
>>> smc_connect_work, not for the smc_listen_work?
>>>
>>
>> Before this patch, the smc_conn_abort()->smc_conn_free() calls are not
>> protected by sock lock except in smc_conn_{rdma|ism}. So I add sock lock
> Do you mean here that the smc_conn_abort()->smc_conn_free() calls are 
> protected in smc_listen_{rdma|ism}, right?
> If it is, could you please point me out where they(the protection) are?

sorry, I see it, pls forget this question
>> protection inside the __smc_conn_abort() and introduce 
>> smc_conn_abort_locked()
>> (which means sock lock has been taken) for smc_conn_{rdma|ism}.
>>
> 
> Thanks,
> Wenjia
> 
>
Wen Gu Nov. 27, 2024, 7:03 a.m. UTC | #9
On 2024/11/25 21:02, Alexandra Winter wrote:
> 
> 
> On 25.11.24 11:00, Wen Gu wrote:
>>> I wonder if this can deadlock, when you take lock_sock so far down in the callchain.
>>> example:
>>>    smc_connect will first take lock_sock(sk) and then mutex_lock(&smc_server_lgr_pending);  (e.g. in smc_connect_ism())
>>> wheras
>>> smc_listen_work() will take mutex_lock(&smc_server_lgr_pending); and then lock_sock(sk) (in your __smc_conn_abort(,,false))
>>>
>>> I am not sure whether this can be called on the same socket, but it looks suspicious to me.
>>>
>>
>> IMHO this two paths can not occur on the same sk.
>>
>>>
>>> All callers of smc_conn_abort() without socklock seem to originate from smc_listen_work().
>>> That makes me think whether smc_listen_work() should do lock_sock(sk) on a higher level.
>>>
>>
>> Yes, I also think about this question, I guess it is because the new smc sock will be
>> accepted by userspace only after smc_listen_work() is completed. Before that, no userspace
>> operation occurs synchronously with it, so it is not protected by sock lock. But I am not
>> sure if there are other reasons, so I did not aggressively protect the entire smc_listen_work
>> with sock lock, but chose a conservative approach.
>>
>>> Do you have an example which function could collide with smc_listen_work()?
>>> i.e. have you found a way to reproduce this?
>>>
>>
>> We discovered this during our fault injection testing where the rdma driver was rmmod/insmod
>> sporadically during the nginx/wrk 1K connections test.
>>
>> e.g.
>>
>>     __smc_lgr_terminate            | smc_listen_decline
>>     (caused by rmmod mlx5_ib)      | (caused by e.g. reg mr fail)
>>     --------------------------------------------------------------
>>     lock_sock                      |
>>     smc_conn_kill                  | smc_conn_abort
>>      \- smc_conn_free              |  \- smc_conn_free
>>     release_sock                   |
> 
> 
> Thank you for the explanations. So the most suspicious scenario is
> smc_listen_work() colliding with
>   __smc_lgr_terminate() -> smc_conn_kill() of the conn and smc socket that is just under
> construction by smc_listen_work() (without socklock).
> 
> I am wondering, if other parts of smc_listen_work() are allowed to run in parallel
> with smc_conn_kill() of this smc socket??
> 
Ideally, smc_listen_work should be all covered by new_smc's sock lock, mutually
exclusive with other conn operations.

But I need to look deeper into the smc_listen_work() implementation to see if
all-covered by sock lock is feasible. At least some of the places already protected
by new_smc's sock lock need to be excluded or handled.

e.g.

smc_listen_work()
  \- smc_listen_out_xxx()
      \- smc_listen_out()
          \- smc_close_non_accepted() -> take the new_smc's sock lock.

> My impression would be that the whole smc_listen_work() should be protected against
> smc_conn_kill(), not only smc_conn_free.
> 

Thanks!
diff mbox series

Patch

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index ed6d4d520bc7..e0a7a0151b11 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -973,7 +973,8 @@  static int smc_connect_decline_fallback(struct smc_sock *smc, int reason_code,
 	return smc_connect_fallback(smc, reason_code);
 }
 
-static void smc_conn_abort(struct smc_sock *smc, int local_first)
+static void __smc_conn_abort(struct smc_sock *smc, int local_first,
+			     bool locked)
 {
 	struct smc_connection *conn = &smc->conn;
 	struct smc_link_group *lgr = conn->lgr;
@@ -982,11 +983,27 @@  static void smc_conn_abort(struct smc_sock *smc, int local_first)
 	if (smc_conn_lgr_valid(conn))
 		lgr_valid = true;
 
-	smc_conn_free(conn);
+	if (!locked) {
+		lock_sock(&smc->sk);
+		smc_conn_free(conn);
+		release_sock(&smc->sk);
+	} else {
+		smc_conn_free(conn);
+	}
 	if (local_first && lgr_valid)
 		smc_lgr_cleanup_early(lgr);
 }
 
+static void smc_conn_abort(struct smc_sock *smc, int local_first)
+{
+	__smc_conn_abort(smc, local_first, false);
+}
+
+static void smc_conn_abort_locked(struct smc_sock *smc, int local_first)
+{
+	__smc_conn_abort(smc, local_first, true);
+}
+
 /* check if there is a rdma device available for this connection. */
 /* called for connect and listen */
 static int smc_find_rdma_device(struct smc_sock *smc, struct smc_init_info *ini)
@@ -1352,7 +1369,7 @@  static int smc_connect_rdma(struct smc_sock *smc,
 
 	return 0;
 connect_abort:
-	smc_conn_abort(smc, ini->first_contact_local);
+	smc_conn_abort_locked(smc, ini->first_contact_local);
 	mutex_unlock(&smc_client_lgr_pending);
 	smc->connect_nonblock = 0;
 
@@ -1454,7 +1471,7 @@  static int smc_connect_ism(struct smc_sock *smc,
 
 	return 0;
 connect_abort:
-	smc_conn_abort(smc, ini->first_contact_local);
+	smc_conn_abort_locked(smc, ini->first_contact_local);
 	mutex_unlock(&smc_server_lgr_pending);
 	smc->connect_nonblock = 0;