Message ID | 20230105120518.29776-1-mateusz.palczewski@intel.com (mailing list archive) |
---|---|
State | Awaiting Upstream |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net,v2] ice: Fix deadlock on the rtnl_mutex | expand |
On Thu, Jan 05, 2023 at 07:05:18AM -0500, Mateusz Palczewski wrote: > There is a deadlock on rtnl_mutex when attempting to take the lock > in unregister_netdev() after it has already been taken by > ethnl_set_channels(). This happened when unregister_netdev() was > called inside of ice_vsi_rebuild(). > Fix that by removing the unregister_netdev() usage and replace it with > ice_vsi_clear_rings() that deallocates the tx and rx rings for the VSI. > > Fixes: df0f847915b4 ("ice: Move common functions out of ice_main.c part 6/7") > Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> > --- > v2: Fixed goto unwind to remove code redundancy > --- > drivers/net/ethernet/intel/ice/ice_lib.c | 35 ++++++++++++------------ > 1 file changed, 17 insertions(+), 18 deletions(-) > I think that it will be beneficial to have lockdep trace in commit message too. Thanks, Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
On Thu, 5 Jan 2023 07:05:18 -0500 Mateusz Palczewski wrote: > ret = ice_vsi_alloc_q_vectors(vsi); > - if (ret) > - goto err_rings; > + if (ret){ > + ice_vsi_clear_rings(vsi); > + goto err_reset; > + } > > ret = ice_vsi_setup_vector_base(vsi); > if (ret) Why do cases which jump to err_vectors no need any changes?
Hi, Sorry for late response > > >> ret = ice_vsi_alloc_q_vectors(vsi); >> - if (ret) >> - goto err_rings; >> + if (ret){ >> + ice_vsi_clear_rings(vsi); >> + goto err_reset; >> + } >> >> ret = ice_vsi_setup_vector_base(vsi); >> if (ret) > >Why do cases which jump to err_vectors no need any changes? During my testing I saw no issues with cases when goto err_vectors were used. Did You manage to have other results? > Regards, Mateusz
On Thu, 12 Jan 2023 13:31:06 +0000 Palczewski, Mateusz wrote: > >Why do cases which jump to err_vectors no need any changes? > > During my testing I saw no issues with cases when goto err_vectors > were used. Did You manage to have other results? I'm just reviewing the code. Exhaustively testing all the cases is probably very hard, which is why we generally try to reason about the code from first principles. IOW "it didn't fail in my testing" is rarely a sufficient proof upstream.
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c index 94aa834cd9a6..e5e96dad3563 100644 --- a/drivers/net/ethernet/intel/ice/ice_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_lib.c @@ -3479,8 +3479,10 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, bool init_vsi) ice_vsi_set_num_qs(vsi, NULL); ret = ice_vsi_alloc_arrays(vsi); - if (ret < 0) - goto err_vsi; + if (ret < 0){ + ice_vsi_clear(vsi); + goto err_reset; + } ice_vsi_get_qs(vsi); @@ -3489,16 +3491,19 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, bool init_vsi) /* Initialize VSI struct elements and create VSI in FW */ ret = ice_vsi_init(vsi, init_vsi); - if (ret < 0) - goto err_vsi; - + if (ret < 0){ + ice_vsi_clear(vsi); + goto err_reset; + } switch (vtype) { case ICE_VSI_CTRL: case ICE_VSI_SWITCHDEV_CTRL: case ICE_VSI_PF: ret = ice_vsi_alloc_q_vectors(vsi); - if (ret) - goto err_rings; + if (ret){ + ice_vsi_clear_rings(vsi); + goto err_reset; + } ret = ice_vsi_setup_vector_base(vsi); if (ret) @@ -3544,8 +3549,10 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, bool init_vsi) break; case ICE_VSI_VF: ret = ice_vsi_alloc_q_vectors(vsi); - if (ret) - goto err_rings; + if (ret){ + ice_vsi_clear_rings(vsi); + goto err_reset; + } ret = ice_vsi_set_q_vectors_reg_idx(vsi); if (ret) @@ -3618,15 +3625,7 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, bool init_vsi) err_vectors: ice_vsi_free_q_vectors(vsi); -err_rings: - if (vsi->netdev) { - vsi->current_netdev_flags = 0; - unregister_netdev(vsi->netdev); - free_netdev(vsi->netdev); - vsi->netdev = NULL; - } -err_vsi: - ice_vsi_clear(vsi); +err_reset: set_bit(ICE_RESET_FAILED, pf->state); kfree(coalesce); return ret;
There is a deadlock on rtnl_mutex when attempting to take the lock in unregister_netdev() after it has already been taken by ethnl_set_channels(). This happened when unregister_netdev() was called inside of ice_vsi_rebuild(). Fix that by removing the unregister_netdev() usage and replace it with ice_vsi_clear_rings() that deallocates the tx and rx rings for the VSI. Fixes: df0f847915b4 ("ice: Move common functions out of ice_main.c part 6/7") Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> --- v2: Fixed goto unwind to remove code redundancy --- drivers/net/ethernet/intel/ice/ice_lib.c | 35 ++++++++++++------------ 1 file changed, 17 insertions(+), 18 deletions(-)