Message ID | 1646140644-121649-1-git-send-email-alibuda@linux.alibaba.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] net/smc: fix unexpected SMC_CLC_DECL_ERR_REGRMB error | expand |
在 2022/3/1 下午9:17, D. Wythe 写道: > From: "D. Wythe" <alibuda@linux.alibaba.com> > > Remove connections from link group is not synchronous with handling > SMC_LLC_DELETE_RKEY, which means that even the number of connections is > less that SMC_RMBS_PER_LGR_MAX, it does not mean that the connection can > register rtoken successfully later, in other words, the rtoken entry may > have not been released. This will cause an unexpected > SMC_CLC_DECL_ERR_REGRMB to be reported, and then ths smc connection have > to fallback to TCP. > IMHO, if there are SMC_RMBS_PER_LGR_MAX connections in the link group now, one of them is being removed and here comes a new connection at this moment, then: (1) without this patch, the new connection will be added into the old link group but fallback if the removing connection has not finished unregistering its rmb. (2) with this patch, a new link group will be created and the new connection will be added into the new link group. I am wondering if (1) should be considered as a issue, or just a bydesign? If it is a issue, I think this patch works, Thanks! Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Best Regards, Wen Gu
On Wed, Mar 02, 2022 at 03:28:32PM +0800, Wen Gu wrote: > > > 在 2022/3/1 下午9:17, D. Wythe 写道: > >From: "D. Wythe" <alibuda@linux.alibaba.com> > > > >Remove connections from link group is not synchronous with handling > >SMC_LLC_DELETE_RKEY, which means that even the number of connections is > >less that SMC_RMBS_PER_LGR_MAX, it does not mean that the connection can > >register rtoken successfully later, in other words, the rtoken entry may > >have not been released. This will cause an unexpected > >SMC_CLC_DECL_ERR_REGRMB to be reported, and then ths smc connection have > >to fallback to TCP. > > > > > IMHO, if there are SMC_RMBS_PER_LGR_MAX connections in the link group now, > one of them is being removed and here comes a new connection at this moment, > then: > > (1) without this patch, the new connection will be added into the old link group > but fallback if the removing connection has not finished unregistering its rmb. > > (2) with this patch, a new link group will be created and the new connection > will be added into the new link group. > > I am wondering if (1) should be considered as a issue, or just a bydesign? > If it is a issue, I think this patch works, Thanks! We should always be willing to improve the success rate of the SMC connection, creating a new group is not a side effect of this patch, it actually dues to the state bewteen connections that can not achieve clock synchronization. In fact, it can happen in any times. Thanks.
On 2022/3/2 3:51 pm, D. Wythe wrote: > We should always be willing to improve the success rate of the SMC > connection, creating a new group is not a side effect of this patch, it > actually dues to the state bewteen connections that can not achieve > clock synchronization. In fact, it can happen in any times. > > Thanks. OK, I understand. Thanks.
在 2022/3/1 下午9:17, D. Wythe 写道: > From: "D. Wythe" <alibuda@linux.alibaba.com> > > Remove connections from link group is not synchronous with handling > SMC_LLC_DELETE_RKEY, which means that even the number of connections is > less that SMC_RMBS_PER_LGR_MAX, it does not mean that the connection can > register rtoken successfully later, in other words, the rtoken entry may > have not been released. This will cause an unexpected > SMC_CLC_DECL_ERR_REGRMB to be reported, and then ths smc connection have > to fallback to TCP. > > We found that the main reason for the problem dues to following execution > sequence: > > Server Conn A: Server Conn B: Client Conn B: > > smc_lgr_unregister_conn > smc_lgr_register_conn > smc_clc_send_accept -> > smc_rtoken_add > smcr_buf_unuse > -> Client Conn A: > smc_rtoken_delete > > smc_lgr_unregister_conn() makes current link available to assigned to new > incoming connection, while smcr_buf_unuse() has not executed yet, which > means that smc_rtoken_add may fail because of insufficient rtoken_entry, > reversing their execution order will avoid this problem. > > Fixes: 3e034725c0d8 ("net/smc: common functions for RMBs and send buffers") > Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> > --- > net/smc/smc_core.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c > index 2f321d2..c9c3a68 100644 > --- a/net/smc/smc_core.c > +++ b/net/smc/smc_core.c > @@ -1161,8 +1161,8 @@ void smc_conn_free(struct smc_connection *conn) > cancel_work_sync(&conn->abort_work); > } > if (!list_empty(&lgr->list)) { > - smc_lgr_unregister_conn(conn); > smc_buf_unuse(conn, lgr); /* allow buffer reuse */ > + smc_lgr_unregister_conn(conn); > } > > if (!lgr->conns_num) I have two patch for this issue, and i missed one, I'll post it in v2 series.
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c index 2f321d2..c9c3a68 100644 --- a/net/smc/smc_core.c +++ b/net/smc/smc_core.c @@ -1161,8 +1161,8 @@ void smc_conn_free(struct smc_connection *conn) cancel_work_sync(&conn->abort_work); } if (!list_empty(&lgr->list)) { - smc_lgr_unregister_conn(conn); smc_buf_unuse(conn, lgr); /* allow buffer reuse */ + smc_lgr_unregister_conn(conn); } if (!lgr->conns_num)