Message ID | 20240413035150.3338977-1-shaozhengchao@huawei.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | [net] net/smc: fix potential sleeping issue in smc_switch_conns |
On 13.04.24 05:51, Zhengchao Shao wrote:
> Potential sleeping issue exists in the following processes:
> smc_switch_conns
> spin_lock_bh(&conn->send_lock)
> smc_switch_link_and_count
> smcr_link_put
> __smcr_link_clear
> smc_lgr_put
> __smc_lgr_free
> smc_lgr_free_bufs
> __smc_lgr_free_bufs
> smc_buf_free
> smcr_buf_free
> smcr_buf_unmap_link
> smc_ib_put_memory_region
> ib_dereg_mr
> ib_dereg_mr_user
> mr->device->ops.dereg_mr
> If scheduling exists when the IB driver implements .dereg_mr hook
> function, the bug "scheduling while atomic" will occur. For example,
> cxgb4 and efa driver. Use mutex lock instead of spin lock to fix it.
>
> Fixes: 20c9398d3309 ("net/smc: Resolve the race between SMC-R link access and clear")
> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
> ---
>  net/smc/af_smc.c   |  2 +-
>  net/smc/smc.h      |  2 +-
>  net/smc/smc_cdc.c  | 14 +++++++-------
>  net/smc/smc_core.c |  8 ++++----
>  net/smc/smc_tx.c   |  8 ++++----
>  5 files changed, 17 insertions(+), 17 deletions(-)
>

Hi Zhengchao,

If I understand correctly, the sleeping issue is not the core issue, it
looks like a kind of deadlock or kernel pointer dereference issue. Did
you get a crash? Do you have any backtrace? Why do you think the mutex
lock will fix it?

Thanks,
Wenjia
On Sat, 2024-04-13 at 11:51 +0800, Zhengchao Shao wrote:
> Potential sleeping issue exists in the following processes:
> smc_switch_conns
> spin_lock_bh(&conn->send_lock)
> smc_switch_link_and_count
> smcr_link_put
> __smcr_link_clear
> smc_lgr_put
> __smc_lgr_free
> smc_lgr_free_bufs
> __smc_lgr_free_bufs
> smc_buf_free
> smcr_buf_free
> smcr_buf_unmap_link
> smc_ib_put_memory_region
> ib_dereg_mr
> ib_dereg_mr_user
> mr->device->ops.dereg_mr
> If scheduling exists when the IB driver implements .dereg_mr hook
> function, the bug "scheduling while atomic" will occur. For example,
> cxgb4 and efa driver. Use mutex lock instead of spin lock to fix it.

I tried to inspect all the lock call sites, and it *looks* like they are
all in process context, so the switch should be feasible.

Still the fact that the existing lock is a BH variant is suspect.
Either the BH part was not needed or this can introduce subtle
regressions/issues.

I think this deserves at least a 3rd party testing.

Thanks,

Paolo
On 2024/4/16 20:06, Paolo Abeni wrote:
> On Sat, 2024-04-13 at 11:51 +0800, Zhengchao Shao wrote:
>> Potential sleeping issue exists in the following processes:
>> smc_switch_conns
>> spin_lock_bh(&conn->send_lock)
>> smc_switch_link_and_count
>> smcr_link_put
>> __smcr_link_clear
>> smc_lgr_put
>> __smc_lgr_free
>> smc_lgr_free_bufs
>> __smc_lgr_free_bufs
>> smc_buf_free
>> smcr_buf_free
>> smcr_buf_unmap_link
>> smc_ib_put_memory_region
>> ib_dereg_mr
>> ib_dereg_mr_user
>> mr->device->ops.dereg_mr
>> If scheduling exists when the IB driver implements .dereg_mr hook
>> function, the bug "scheduling while atomic" will occur. For example,
>> cxgb4 and efa driver. Use mutex lock instead of spin lock to fix it.
>
> I tried to inspect all the lock call sites, and it *looks* like they are
> all in process context, so the switch should be feasible.

There exist some calls from tasklet, where mutex lock is infeasible.
For example:
- tasklet -> smc_wr_tx_tasklet_fn -> smc_wr_tx_process_cqe -> pnd_snd.handler -> smc_cdc_tx_handler -> smc_tx_pending -> smc_tx_sndbuf_nonempty -> smcr_tx_sndbuf_nonempty -> spin_lock_bh(&conn->send_lock)
- tasklet -> smc_wr_rx_tasklet_fn -> smc_wr_rx_process_cqes -> smc_wr_rx_demultiplex -> smc_cdc_rx_handler -> smc_cdc_msg_validate -> spin_lock_bh(&conn->send_lock)

Thanks,
Guangguan Wang

>
> Still the fact that the existing lock is a BH variant is suspect.
> Either the BH part was not needed or this can introduce subtle
> regressions/issues.
>
> I think this deserves at least a 3rd party testing.
>
> Thanks,
>
> Paolo
>
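To make the constraint above concrete: tasklets run in softirq (atomic) context, where mutex_lock() is forbidden because it may sleep. The following is a minimal, self-contained sketch (a hypothetical demo module using the callback-based tasklet API, not taken from the SMC code) of the pattern that would trip this check once the tasklet runs:

```c
/*
 * Hypothetical demo module, not SMC code: taking a mutex from a tasklet.
 * Tasklets execute in softirq context and mutex_lock() may sleep, so
 * scheduling demo_tasklet triggers "BUG: sleeping function called from
 * invalid context" (with CONFIG_DEBUG_ATOMIC_SLEEP).
 */
#include <linux/interrupt.h>
#include <linux/module.h>
#include <linux/mutex.h>

static DEFINE_MUTEX(demo_mutex);

static void demo_tasklet_fn(struct tasklet_struct *t)
{
	mutex_lock(&demo_mutex);	/* invalid: may sleep in atomic context */
	mutex_unlock(&demo_mutex);
}

static DECLARE_TASKLET(demo_tasklet, demo_tasklet_fn);

static int __init demo_init(void)
{
	tasklet_schedule(&demo_tasklet);
	return 0;
}

static void __exit demo_exit(void)
{
	tasklet_kill(&demo_tasklet);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");
```

This is why the tasklet call chains listed above rule out simply turning conn->send_lock into a mutex.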
On 2024/4/13 11:51, Zhengchao Shao wrote:
> Potential sleeping issue exists in the following processes:
> smc_switch_conns
> spin_lock_bh(&conn->send_lock)
> smc_switch_link_and_count
> smcr_link_put
> __smcr_link_clear
> smc_lgr_put
> __smc_lgr_free
> smc_lgr_free_bufs
> __smc_lgr_free_bufs
> smc_buf_free
> smcr_buf_free
> smcr_buf_unmap_link
> smc_ib_put_memory_region
> ib_dereg_mr
> ib_dereg_mr_user
> mr->device->ops.dereg_mr
> If scheduling exists when the IB driver implements .dereg_mr hook
> function, the bug "scheduling while atomic" will occur. For example,
> cxgb4 and efa driver. Use mutex lock instead of spin lock to fix it.
>
> Fixes: 20c9398d3309 ("net/smc: Resolve the race between SMC-R link access and clear")
> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
> ---
>  net/smc/af_smc.c   |  2 +-
>  net/smc/smc.h      |  2 +-
>  net/smc/smc_cdc.c  | 14 +++++++-------
>  net/smc/smc_core.c |  8 ++++----
>  net/smc/smc_tx.c   |  8 ++++----
>  5 files changed, 17 insertions(+), 17 deletions(-)
>

Hi Zhengchao,

I doubt whether this bug really exists, as efa supports SRD QP while SMC-R relies on RC QP,
cxgb4 is an IWARP adaptor while SMC-R relies on ROCE adaptor.

Thanks,
Guangguan Wang
Hi Guangguan:

Thank you for your review. When I used the hns driver, I ran into the
problem of "scheduling while atomic". But the problem was tested on the
5.10 kernel branch, and I'm still trying to reproduce it using the
mainline.

Zhengchao Shao

On 2024/4/17 16:00, Guangguan Wang wrote:
>
>
> On 2024/4/13 11:51, Zhengchao Shao wrote:
>> Potential sleeping issue exists in the following processes:
>> smc_switch_conns
>> spin_lock_bh(&conn->send_lock)
>> smc_switch_link_and_count
>> smcr_link_put
>> __smcr_link_clear
>> smc_lgr_put
>> __smc_lgr_free
>> smc_lgr_free_bufs
>> __smc_lgr_free_bufs
>> smc_buf_free
>> smcr_buf_free
>> smcr_buf_unmap_link
>> smc_ib_put_memory_region
>> ib_dereg_mr
>> ib_dereg_mr_user
>> mr->device->ops.dereg_mr
>> If scheduling exists when the IB driver implements .dereg_mr hook
>> function, the bug "scheduling while atomic" will occur. For example,
>> cxgb4 and efa driver. Use mutex lock instead of spin lock to fix it.
>>
>> Fixes: 20c9398d3309 ("net/smc: Resolve the race between SMC-R link access and clear")
>> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
>> ---
>>  net/smc/af_smc.c   |  2 +-
>>  net/smc/smc.h      |  2 +-
>>  net/smc/smc_cdc.c  | 14 +++++++-------
>>  net/smc/smc_core.c |  8 ++++----
>>  net/smc/smc_tx.c   |  8 ++++----
>>  5 files changed, 17 insertions(+), 17 deletions(-)
>>
>
> Hi Zhengchao,
>
> I doubt whether this bug really exists, as efa supports SRD QP while SMC-R relies on RC QP,
> cxgb4 is an IWARP adaptor while SMC-R relies on ROCE adaptor.
>
> Thanks,
> Guangguan Wang
On 17.04.24 09:32, Guangguan Wang wrote:
>
>
> On 2024/4/16 20:06, Paolo Abeni wrote:
>> On Sat, 2024-04-13 at 11:51 +0800, Zhengchao Shao wrote:
>>> Potential sleeping issue exists in the following processes:
>>> smc_switch_conns
>>> spin_lock_bh(&conn->send_lock)
>>> smc_switch_link_and_count
>>> smcr_link_put
>>> __smcr_link_clear
>>> smc_lgr_put
>>> __smc_lgr_free
>>> smc_lgr_free_bufs
>>> __smc_lgr_free_bufs
>>> smc_buf_free
>>> smcr_buf_free
>>> smcr_buf_unmap_link
>>> smc_ib_put_memory_region
>>> ib_dereg_mr
>>> ib_dereg_mr_user
>>> mr->device->ops.dereg_mr
>>> If scheduling exists when the IB driver implements .dereg_mr hook
>>> function, the bug "scheduling while atomic" will occur. For example,
>>> cxgb4 and efa driver. Use mutex lock instead of spin lock to fix it.
>>
>> I tried to inspect all the lock call sites, and it *looks* like they are
>> all in process context, so the switch should be feasible.
>
> There exist some calls from tasklet, where mutex lock is infeasible.
> For example:
> - tasklet -> smc_wr_tx_tasklet_fn -> smc_wr_tx_process_cqe -> pnd_snd.handler -> smc_cdc_tx_handler -> smc_tx_pending -> smc_tx_sndbuf_nonempty -> smcr_tx_sndbuf_nonempty -> spin_lock_bh(&conn->send_lock)
> - tasklet -> smc_wr_rx_tasklet_fn -> smc_wr_rx_process_cqes -> smc_wr_rx_demultiplex -> smc_cdc_rx_handler -> smc_cdc_msg_validate -> spin_lock_bh(&conn->send_lock)
>
> Thanks,
> Guangguan Wang
>

Agree! Thank you, Guangguan, for the examples! If we can verify that
this bug exists, we should find other solutions.

>>
>> Still the fact that the existing lock is a BH variant is suspect.
>> Either the BH part was not needed or this can introduce subtle
>> regressions/issues.
>>
>> I think this deserves at least a 3rd party testing.
>>
>> Thanks,
>>
>> Paolo
>>
>
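For orientation on what "other solutions" could look like if the bug is confirmed: one commonly used pattern is to keep the spinlock for the state update and drop the final reference, whose release path may sleep, only after the lock is released. The sketch below uses hypothetical names throughout and is not a proposal made in this thread:

```c
/*
 * Illustrative sketch with hypothetical names -- not SMC code and not a
 * patch proposal.  Idea: never drop the last reference to an object
 * whose release path may sleep while a spinlock is held; hand the old
 * reference out of the locked section and put it afterwards.
 */
#include <linux/kernel.h>
#include <linux/kref.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct demo_link {
	struct kref refcnt;
};

static DEFINE_SPINLOCK(demo_conn_lock);
static struct demo_link *demo_cur_link;

static void demo_link_release(struct kref *ref)
{
	struct demo_link *lnk = container_of(ref, struct demo_link, refcnt);

	/* stand-in for teardown that may sleep, e.g. ib_dereg_mr() */
	kfree(lnk);
}

static void demo_switch_link(struct demo_link *new_lnk)
{
	struct demo_link *old_lnk;

	spin_lock_bh(&demo_conn_lock);
	old_lnk = demo_cur_link;	/* take over the old reference */
	demo_cur_link = new_lnk;
	spin_unlock_bh(&demo_conn_lock);

	/* a possibly sleeping release now runs outside the atomic section */
	if (old_lnk)
		kref_put(&old_lnk->refcnt, demo_link_release);
}
```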
On 17.04.24 10:29, shaozhengchao wrote:
>
> Hi Guangguan:
> Thank you for your review. When I used the hns driver, I ran into the
> problem of "scheduling while atomic". But the problem was tested on the
> 5.10 kernel branch, and I'm still trying to reproduce it using the
> mainline.
>
> Zhengchao Shao
>

Could you please try to reproduce the bug with the latest kernel? And
show more details (e.g. kernel log) on this bug?

Thanks,
Wenjia
On 2024/4/17 23:23, Wenjia Zhang wrote:
>
>
> On 17.04.24 10:29, shaozhengchao wrote:
>>
>> Hi Guangguan:
>> Thank you for your review. When I used the hns driver, I ran into the
>> problem of "scheduling while atomic". But the problem was tested on the
>> 5.10 kernel branch, and I'm still trying to reproduce it using the
>> mainline.
>>
>> Zhengchao Shao
>>
>

Hi Wenjia:
I will try to reproduce it. In addition, the last time I sent you a
issue about the smc-tool, do you have any idea?

Thank you
Zhengchao Shao

> Could you please try to reproduce the bug with the latest kernel? And
> show more details (e.g. kernel log) on this bug?
>
> Thanks,
> Wenjia
On 18.04.24 03:48, shaozhengchao wrote:
>
>
> On 2024/4/17 23:23, Wenjia Zhang wrote:
>>
>>
>> On 17.04.24 10:29, shaozhengchao wrote:
>>>
>>> Hi Guangguan:
>>> Thank you for your review. When I used the hns driver, I ran into the
>>> problem of "scheduling while atomic". But the problem was tested on the
>>> 5.10 kernel branch, and I'm still trying to reproduce it using the
>>> mainline.
>>>
>>> Zhengchao Shao
>>>
>>
> Hi Wenjia:
> I will try to reproduce it.

Thanks!

In addition, the last time I sent you a
> issue about the smc-tool, do you have any idea?
>

mhhh, I just see a patch from you on smc_hash_sk/smc_unhash_sk, and it
is already applied during my vacation and it does look good to me. If
you mean others, could you send me the link again please, I might have
missed out on it.

> Thank you
> Zhengchao Shao
>> Could you please try to reproduce the bug with the latest kernel? And
>> show more details (e.g. kernel log) on this bug?
>>
>> Thanks,
>> Wenjia
>
On 2024/4/18 15:50, Wenjia Zhang wrote:
>
>
> On 18.04.24 03:48, shaozhengchao wrote:
>>
>>
>> On 2024/4/17 23:23, Wenjia Zhang wrote:
>>>
>>>
>>> On 17.04.24 10:29, shaozhengchao wrote:
>>>>
>>>> Hi Guangguan:
>>>> Thank you for your review. When I used the hns driver, I ran into
>>>> the
>>>> problem of "scheduling while atomic". But the problem was tested on the
>>>> 5.10 kernel branch, and I'm still trying to reproduce it using the
>>>> mainline.
>>>>
>>>> Zhengchao Shao
>>>>
>>>
>> Hi Wenjia:
>> I will try to reproduce it.
>
> Thanks!
>
> In addition, the last time I sent you a
>> issue about the smc-tool, do you have any idea?
>>
>

Hi Wenjia:
I have sent it to you. Could you receive it?

Thank you.
Zhengchao Shao

> mhhh, I just see a patch from you on smc_hash_sk/smc_unhash_sk, and it
> is already applied during my vacation and it does look good to me. If
> you mean others, could you send me the link again please, I might have
> missed out on it.
>
>> Thank you
>> Zhengchao Shao
>>> Could you please try to reproduce the bug with the latest kernel? And
>>> show more details (e.g. kernel log) on this bug?
>>>
>>> Thanks,
>>> Wenjia
>>
>
On 18.04.24 13:01, shaozhengchao wrote:
>
>
> On 2024/4/18 15:50, Wenjia Zhang wrote:
>>
>>
>> On 18.04.24 03:48, shaozhengchao wrote:
>>>
>>>
>>> On 2024/4/17 23:23, Wenjia Zhang wrote:
>>>>
>>>>
>>>> On 17.04.24 10:29, shaozhengchao wrote:
>>>>>
>>>>> Hi Guangguan:
>>>>> Thank you for your review. When I used the hns driver, I ran
>>>>> into the
>>>>> problem of "scheduling while atomic". But the problem was tested on
>>>>> the
>>>>> 5.10 kernel branch, and I'm still trying to reproduce it using the
>>>>> mainline.
>>>>>
>>>>> Zhengchao Shao
>>>>>
>>>>
>>> Hi Wenjia:
>>> I will try to reproduce it.
>>
>> Thanks!
>>
>> In addition, the last time I sent you a
>>> issue about the smc-tool, do you have any idea?
>>>
>>
> Hi Wenjia:
> I have sent it to you. Could you receive it?
>
> Thank you.
> Zhengchao Shao

yes I see it, thank you. I'll look into it as soon as possible.
```diff
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index ad5bab6a44b6..c0a228def6da 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -386,7 +386,7 @@ static struct sock *smc_sock_alloc(struct net *net, struct socket *sock,
 	INIT_DELAYED_WORK(&smc->conn.tx_work, smc_tx_work);
 	INIT_LIST_HEAD(&smc->accept_q);
 	spin_lock_init(&smc->accept_q_lock);
-	spin_lock_init(&smc->conn.send_lock);
+	mutex_init(&smc->conn.send_lock);
 	sk->sk_prot->hash(sk);
 	mutex_init(&smc->clcsock_release_lock);
 	smc_init_saved_callbacks(smc);
diff --git a/net/smc/smc.h b/net/smc/smc.h
index 18c8b7870198..ba8efed240e3 100644
--- a/net/smc/smc.h
+++ b/net/smc/smc.h
@@ -194,7 +194,7 @@ struct smc_connection {
 	atomic_t	sndbuf_space;	/* remaining space in sndbuf */
 	u16		tx_cdc_seq;	/* sequence # for CDC send */
 	u16		tx_cdc_seq_fin;	/* sequence # - tx completed */
-	spinlock_t	send_lock;	/* protect wr_sends */
+	struct mutex	send_lock;	/* protect wr_sends */
 	atomic_t	cdc_pend_tx_wr; /* number of pending tx CDC wqe
 					 * - inc when post wqe,
 					 * - dec on polled tx cqe
diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
index 3c06625ceb20..f8ad0035905a 100644
--- a/net/smc/smc_cdc.c
+++ b/net/smc/smc_cdc.c
@@ -186,10 +186,10 @@ static int smcr_cdc_get_slot_and_msg_send(struct smc_connection *conn)
 	if (rc)
 		goto put_out;
 
-	spin_lock_bh(&conn->send_lock);
+	mutex_lock(&conn->send_lock);
 	if (link != conn->lnk) {
 		/* link of connection changed, try again one time*/
-		spin_unlock_bh(&conn->send_lock);
+		mutex_unlock(&conn->send_lock);
 		smc_wr_tx_put_slot(link,
 				   (struct smc_wr_tx_pend_priv *)pend);
 		smc_wr_tx_link_put(link);
@@ -199,7 +199,7 @@ static int smcr_cdc_get_slot_and_msg_send(struct smc_connection *conn)
 		goto again;
 	}
 	rc = smc_cdc_msg_send(conn, wr_buf, pend);
-	spin_unlock_bh(&conn->send_lock);
+	mutex_unlock(&conn->send_lock);
 put_out:
 	smc_wr_tx_link_put(link);
 	return rc;
@@ -214,9 +214,9 @@ int smc_cdc_get_slot_and_msg_send(struct smc_connection *conn)
 		return -EPIPE;
 
 	if (conn->lgr->is_smcd) {
-		spin_lock_bh(&conn->send_lock);
+		mutex_lock(&conn->send_lock);
 		rc = smcd_cdc_msg_send(conn);
-		spin_unlock_bh(&conn->send_lock);
+		mutex_unlock(&conn->send_lock);
 	} else {
 		rc = smcr_cdc_get_slot_and_msg_send(conn);
 	}
@@ -308,10 +308,10 @@ static void smc_cdc_msg_validate(struct smc_sock *smc, struct smc_cdc_msg *cdc,
 	if (diff < 0) {			/* diff larger than 0x7fff */
 		/* drop connection */
 		conn->out_of_sync = 1;	/* prevent any further receives */
-		spin_lock_bh(&conn->send_lock);
+		mutex_lock(&conn->send_lock);
 		conn->local_tx_ctrl.conn_state_flags.peer_conn_abort = 1;
 		conn->lnk = link;
-		spin_unlock_bh(&conn->send_lock);
+		mutex_unlock(&conn->send_lock);
 		sock_hold(&smc->sk); /* sock_put in abort_work */
 		if (!queue_work(smc_close_wq, &conn->abort_work))
 			sock_put(&smc->sk);
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index 9b84d5897aa5..21e0d95ab8c8 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -1083,9 +1083,9 @@ struct smc_link *smc_switch_conns(struct smc_link_group *lgr,
 		    smc->sk.sk_state == SMC_PEERFINCLOSEWAIT ||
 		    smc->sk.sk_state == SMC_PEERABORTWAIT ||
 		    smc->sk.sk_state == SMC_PROCESSABORT) {
-			spin_lock_bh(&conn->send_lock);
+			mutex_lock(&conn->send_lock);
 			smc_switch_link_and_count(conn, to_lnk);
-			spin_unlock_bh(&conn->send_lock);
+			mutex_unlock(&conn->send_lock);
 			continue;
 		}
 		sock_hold(&smc->sk);
@@ -1095,10 +1095,10 @@ struct smc_link *smc_switch_conns(struct smc_link_group *lgr,
 		if (rc)
 			goto err_out;
 		/* avoid race with smcr_tx_sndbuf_nonempty() */
-		spin_lock_bh(&conn->send_lock);
+		mutex_lock(&conn->send_lock);
 		smc_switch_link_and_count(conn, to_lnk);
 		rc = smc_switch_cursor(smc, pend, wr_buf);
-		spin_unlock_bh(&conn->send_lock);
+		mutex_unlock(&conn->send_lock);
 		sock_put(&smc->sk);
 		if (rc)
 			goto err_out;
diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
index 214ac3cbcf9a..b6790bd82b4e 100644
--- a/net/smc/smc_tx.c
+++ b/net/smc/smc_tx.c
@@ -573,7 +573,7 @@ static int smcr_tx_sndbuf_nonempty(struct smc_connection *conn)
 		return rc;
 	}
 
-	spin_lock_bh(&conn->send_lock);
+	mutex_lock(&conn->send_lock);
 	if (link != conn->lnk) {
 		/* link of connection changed, tx_work will restart */
 		smc_wr_tx_put_slot(link,
@@ -597,7 +597,7 @@
 	}
 
 out_unlock:
-	spin_unlock_bh(&conn->send_lock);
+	mutex_unlock(&conn->send_lock);
 	smc_wr_tx_link_put(link);
 	return rc;
 }
@@ -607,7 +607,7 @@ static int smcd_tx_sndbuf_nonempty(struct smc_connection *conn)
 	struct smc_cdc_producer_flags *pflags = &conn->local_tx_ctrl.prod_flags;
 	int rc = 0;
 
-	spin_lock_bh(&conn->send_lock);
+	mutex_lock(&conn->send_lock);
 	if (!pflags->urg_data_present)
 		rc = smc_tx_rdma_writes(conn, NULL);
 	if (!rc)
@@ -617,7 +617,7 @@
 		pflags->urg_data_pending = 0;
 		pflags->urg_data_present = 0;
 	}
-	spin_unlock_bh(&conn->send_lock);
+	mutex_unlock(&conn->send_lock);
 	return rc;
 }
 
```
Potential sleeping issue exists in the following processes:
smc_switch_conns
spin_lock_bh(&conn->send_lock)
smc_switch_link_and_count
smcr_link_put
__smcr_link_clear
smc_lgr_put
__smc_lgr_free
smc_lgr_free_bufs
__smc_lgr_free_bufs
smc_buf_free
smcr_buf_free
smcr_buf_unmap_link
smc_ib_put_memory_region
ib_dereg_mr
ib_dereg_mr_user
mr->device->ops.dereg_mr
If scheduling exists when the IB driver implements .dereg_mr hook
function, the bug "scheduling while atomic" will occur. For example,
cxgb4 and efa driver. Use mutex lock instead of spin lock to fix it.

Fixes: 20c9398d3309 ("net/smc: Resolve the race between SMC-R link access and clear")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
---
 net/smc/af_smc.c   |  2 +-
 net/smc/smc.h      |  2 +-
 net/smc/smc_cdc.c  | 14 +++++++-------
 net/smc/smc_core.c |  8 ++++----
 net/smc/smc_tx.c   |  8 ++++----
 5 files changed, 17 insertions(+), 17 deletions(-)
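To illustrate the failure mode the commit message describes, here is a hypothetical minimal reproducer (not the SMC call chain itself): any call that may sleep, issued while a spinlock is held, produces the "scheduling while atomic" splat. msleep() stands in for a .dereg_mr implementation that blocks:

```c
/*
 * Hypothetical reproducer of the pattern described above -- not SMC
 * code.  msleep() stands in for an IB driver's .dereg_mr hook that can
 * sleep; calling it under spin_lock_bh() yields
 * "BUG: scheduling while atomic" (or a sleeping-in-atomic splat when
 * CONFIG_DEBUG_ATOMIC_SLEEP is enabled).
 */
#include <linux/delay.h>
#include <linux/module.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(demo_send_lock);

static int __init demo_init(void)
{
	spin_lock_bh(&demo_send_lock);	/* atomic context starts here */
	msleep(10);			/* may sleep -> bug */
	spin_unlock_bh(&demo_send_lock);
	return 0;
}

static void __exit demo_exit(void)
{
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");
```

This is the splat the patch tries to avoid by switching send_lock to a mutex; the tasklet call paths cited earlier in the thread are what make that switch problematic.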