[net,v2,0/3] net/smc: Fixes for race in smc link group termination

Message ID 1642063002-45688-1-git-send-email-guwen@linux.alibaba.com (mailing list archive)

Message

Wen Gu Jan. 13, 2022, 8:36 a.m. UTC
We recently hit crashes caused by a race between accessing and freeing
a link or link group during abnormal SMC link group termination. The
crashes can be reproduced by triggering abnormal link group termination
frequently, for example by repeatedly setting RNICs up and down.

This patch set addresses the race by extending the life cycle of links
and link groups so that they are not referenced after being cleared or
freed.
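
The general idea of extending an object's life cycle with a reference
count can be sketched in userspace C as below. This is only an
illustration under stated assumptions, not the code from the patches:
struct lgr, lgr_create(), lgr_hold(), lgr_put(), lgr_terminate() and the
freed_flag test hook are all hypothetical names, and C11 atomics stand in
for the kernel's reference counting primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a link group, not the kernel structure. */
struct lgr {
	atomic_int refcnt;	/* number of holders, creator included */
	bool terminated;	/* set once termination has started */
	int *freed_flag;	/* test hook: set to 1 when memory is released */
};

/* Allocate an object that starts with one reference owned by the creator. */
static struct lgr *lgr_create(int *freed_flag)
{
	struct lgr *l = malloc(sizeof(*l));

	if (!l)
		return NULL;
	atomic_init(&l->refcnt, 1);
	l->terminated = false;
	l->freed_flag = freed_flag;
	*freed_flag = 0;
	return l;
}

/* Take an extra reference; the caller must already hold a valid one. */
static void lgr_hold(struct lgr *l)
{
	atomic_fetch_add(&l->refcnt, 1);
}

/* Drop a reference; the memory is released only by the last holder. */
static void lgr_put(struct lgr *l)
{
	if (atomic_fetch_sub(&l->refcnt, 1) == 1) {
		*l->freed_flag = 1;
		free(l);
	}
}

/* Termination marks the object and drops only the creator's reference. */
static void lgr_terminate(struct lgr *l)
{
	l->terminated = true;
	lgr_put(l);
}
```

In this sketch, a user that called lgr_hold() before lgr_terminate() runs
can still read l->terminated safely afterwards, because the memory is
released only when that user calls lgr_put().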

v1 -> v2:
- Improve some comments.

- Move the code that wakes up the lgrs_deleted wait queue from
  smc_lgr_free() to __smc_lgr_free().

- Move the code that wakes up the links_deleted wait queue from
  smcr_link_clear() to __smcr_link_clear().

- Move the smc_ibdev_cnt_dec() and put_device() calls from
  smcr_link_clear() to __smcr_link_clear().

- Move smc_lgr_put() to the end of __smcr_link_clear().

- Call smc_lgr_put() after the 'out' label in smcr_link_init() when link
  initialization fails.

- Modify the location where smc connection holds the lgr or link.

    before:
      * hold lgr in smc_lgr_register_conn().
      * hold link in smcr_lgr_conn_assign_link().
    after:
      * hold both lgr and link in smc_conn_create().

  This makes the hold location symmetrical with the place where smc
  connections put the lgr or link, which is smc_conn_free().

- Initialize conn->freed to zero in smc_conn_create().
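
The symmetry between holding in the create path and putting in the free
path can be sketched as follows. This is a simplified, single-threaded
illustration with hypothetical names (struct obj, struct conn,
conn_create(), conn_free()); plain integer counters replace real
reference counts. Using the freed field to make a second conn_free()
call a no-op is an assumption about its purpose, not something stated
in this cover letter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a link or link group with a plain counter. */
struct obj {
	int refcnt;
};

/* Hypothetical connection that references one link group and one link. */
struct conn {
	struct obj *lgr;
	struct obj *lnk;
	int freed;	/* 0 until conn_free() has run once */
};

static void obj_hold(struct obj *o)
{
	o->refcnt++;
}

static void obj_put(struct obj *o)
{
	o->refcnt--;
}

/* Take both references in one place and start with freed == 0. */
static void conn_create(struct conn *c, struct obj *lgr, struct obj *lnk)
{
	c->lgr = lgr;
	c->lnk = lnk;
	c->freed = 0;
	obj_hold(lgr);
	obj_hold(lnk);
}

/* Drop both references in one place, at most once per connection. */
static void conn_free(struct conn *c)
{
	if (c->freed)
		return;		/* already freed: do not put twice */
	c->freed = 1;
	obj_put(c->lnk);
	obj_put(c->lgr);
}
```

Because every hold in conn_create() has exactly one matching put in
conn_free(), calling conn_free() twice in this sketch leaves the counters
balanced instead of dropping them below their starting values.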

Wen Gu (3):
  net/smc: Resolve the race between link group access and termination
  net/smc: Introduce a new conn->lgr validity check helper
  net/smc: Resolve the race between SMC-R link access and clear

 net/smc/af_smc.c   |   6 ++-
 net/smc/smc.h      |   1 +
 net/smc/smc_cdc.c  |   3 +-
 net/smc/smc_clc.c  |   2 +-
 net/smc/smc_core.c | 120 +++++++++++++++++++++++++++++++++++++++++------------
 net/smc/smc_core.h |  12 ++++++
 net/smc/smc_diag.c |   6 +--
 7 files changed, 118 insertions(+), 32 deletions(-)

Comments

Jakub Kicinski Jan. 13, 2022, 7:10 p.m. UTC | #1
On Thu, 13 Jan 2022 16:36:39 +0800 Wen Gu wrote:
> We recently hit crashes caused by a race between accessing and freeing
> a link or link group during abnormal SMC link group termination. The
> crashes can be reproduced by triggering abnormal link group termination
> frequently, for example by repeatedly setting RNICs up and down.
> 
> This patch set addresses the race by extending the life cycle of links
> and link groups so that they are not referenced after being cleared or
> freed.

Looks applied, thanks.