Message ID | 20250219202719.957100-4-sdf@fomichev.me (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | net: Hold netdev instance lock during ndo operations | expand |
2025-02-19, 12:27:10 -0800, Stanislav Fomichev wrote: > diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c > index 533e659b15b3..cf9bd08d04b2 100644 > --- a/drivers/net/ethernet/google/gve/gve_main.c > +++ b/drivers/net/ethernet/google/gve/gve_main.c > @@ -1886,7 +1886,7 @@ static void gve_turndown(struct gve_priv *priv) > netif_queue_set_napi(priv->dev, idx, > NETDEV_QUEUE_TYPE_TX, NULL); > > - napi_disable(&block->napi); > + napi_disable_locked(&block->napi); I don't think all the codepaths that can lead to gve_turndown have the required netdev_lock(): gve_resume -> gve_reset_recovery -> gve_turndown gve_user_reset -> gve_reset -> gve_reset_recovery (and nit:) There's also a few places in the series (bnxt, ethtool calling __netdev_update_features) where the lockdep annotation/_locked() variant gets introduced before the patch adding the corresponding lock.
On 02/20, Sabrina Dubroca wrote: > 2025-02-19, 12:27:10 -0800, Stanislav Fomichev wrote: > > diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c > > index 533e659b15b3..cf9bd08d04b2 100644 > > --- a/drivers/net/ethernet/google/gve/gve_main.c > > +++ b/drivers/net/ethernet/google/gve/gve_main.c > > @@ -1886,7 +1886,7 @@ static void gve_turndown(struct gve_priv *priv) > > netif_queue_set_napi(priv->dev, idx, > > NETDEV_QUEUE_TYPE_TX, NULL); > > > > - napi_disable(&block->napi); > > + napi_disable_locked(&block->napi); > > I don't think all the codepaths that can lead to gve_turndown have the > required netdev_lock(): > > gve_resume -> gve_reset_recovery -> gve_turndown Good catch, looks like suspend is missing the netdev lock as well, will add. > gve_user_reset -> gve_reset -> gve_reset_recovery I believe this should be covered by patch "net: ethtool: try to protect all callback with netdev instance lock", no? __dev_ethtool netdev_lock_ops ethtool_reset gve_user_reset Or is there some other reset path I'm missing? > (and nit:) There's also a few places in the series (bnxt, ethtool > calling __netdev_update_features) where the lockdep > annotation/_locked() variant gets introduced before the patch adding > the corresponding lock. This is mostly about ethtool patch and queue ops patch? The latter converts most of the napi/netif calls to _locked variant leaving a small window where some of the paths might be not properly locked. Not sure what to do about it, but probably nothing since everything is still rtnl_lock-protected and the issue is mostly about (temporary) wrong lockdep annotations? Any other suggestions? Thanks for the review!
2025-02-20, 09:00:24 -0800, Stanislav Fomichev wrote: > On 02/20, Sabrina Dubroca wrote: > > 2025-02-19, 12:27:10 -0800, Stanislav Fomichev wrote: > > > diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c > > > index 533e659b15b3..cf9bd08d04b2 100644 > > > --- a/drivers/net/ethernet/google/gve/gve_main.c > > > +++ b/drivers/net/ethernet/google/gve/gve_main.c > > > @@ -1886,7 +1886,7 @@ static void gve_turndown(struct gve_priv *priv) > > > netif_queue_set_napi(priv->dev, idx, > > > NETDEV_QUEUE_TYPE_TX, NULL); > > > > > > - napi_disable(&block->napi); > > > + napi_disable_locked(&block->napi); > > > > I don't think all the codepaths that can lead to gve_turndown have the > > required netdev_lock(): > > > > gve_resume -> gve_reset_recovery -> gve_turndown > Good catch, looks like suspend is missing the netdev lock as well, will > add. > > > gve_user_reset -> gve_reset -> gve_reset_recovery > I believe this should be covered by patch "net: ethtool: try to protect > all callback with netdev instance lock", no? > > __dev_ethtool > netdev_lock_ops > ethtool_reset > gve_user_reset Ah, right, sorry, I missed that. > Or is there some other reset path I'm missing? Looking at net/ethtool, maybe cmis_fw_update_reset? module_flash_fw_work -> ethtool_cmis_fw_update -> cmis_fw_update_reset -> ->reset() (no idea if it can ever be called for those drivers) > > (and nit:) There's also a few places in the series (bnxt, ethtool > > calling __netdev_update_features) where the lockdep > > annotation/_locked() variant gets introduced before the patch adding > > the corresponding lock. > > This is mostly about ethtool patch and queue ops patch? Patch 04 also adds a lockdep annotation to __netdev_update_features, which gets call (unlocked until the ethtool patch) from ethtool. > The latter > converts most of the napi/netif calls to _locked variant leaving > a small window where some of the paths might be not properly locked. > Not sure what to do about it, but probably nothing since everything > is still rtnl_lock-protected and the issue is mostly about (temporary) > wrong lockdep annotations? Yes, it's temporary (I didn't check the final bnxt patch to see if it covers all paths). > Any other suggestions? I guess the alternative would be introducing netdev_lock where it belongs before adding the lockdep annotations/switching to _locked() variants. Maybe it's not worth the pain of reworking this patchset if it ends up in the correct state anyway, I don't know. Probably more a question for the maintainers, depending on what they prefer. > Thanks for the review! Thanks
On 02/21, Sabrina Dubroca wrote: > 2025-02-20, 09:00:24 -0800, Stanislav Fomichev wrote: > > On 02/20, Sabrina Dubroca wrote: > > > 2025-02-19, 12:27:10 -0800, Stanislav Fomichev wrote: > > > > diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c > > > > index 533e659b15b3..cf9bd08d04b2 100644 > > > > --- a/drivers/net/ethernet/google/gve/gve_main.c > > > > +++ b/drivers/net/ethernet/google/gve/gve_main.c > > > > @@ -1886,7 +1886,7 @@ static void gve_turndown(struct gve_priv *priv) > > > > netif_queue_set_napi(priv->dev, idx, > > > > NETDEV_QUEUE_TYPE_TX, NULL); > > > > > > > > - napi_disable(&block->napi); > > > > + napi_disable_locked(&block->napi); > > > > > > I don't think all the codepaths that can lead to gve_turndown have the > > > required netdev_lock(): > > > > > > gve_resume -> gve_reset_recovery -> gve_turndown > > Good catch, looks like suspend is missing the netdev lock as well, will > > add. > > > > > gve_user_reset -> gve_reset -> gve_reset_recovery > > I believe this should be covered by patch "net: ethtool: try to protect > > all callback with netdev instance lock", no? > > > > __dev_ethtool > > netdev_lock_ops > > ethtool_reset > > gve_user_reset > > Ah, right, sorry, I missed that. > > > Or is there some other reset path I'm missing? > > Looking at net/ethtool, maybe cmis_fw_update_reset? > module_flash_fw_work -> ethtool_cmis_fw_update -> cmis_fw_update_reset -> ->reset() > > (no idea if it can ever be called for those drivers) Hmm, and this workqueue work doesn't grab rtnl_lock, interesting.. Let me add netdev_lock_ops just in case, won't hurt. > > > (and nit:) There's also a few places in the series (bnxt, ethtool > > > calling __netdev_update_features) where the lockdep > > > annotation/_locked() variant gets introduced before the patch adding > > > the corresponding lock. > > > > This is mostly about ethtool patch and queue ops patch? > > Patch 04 also adds a lockdep annotation to __netdev_update_features, > which gets call (unlocked until the ethtool patch) from ethtool. > > > The latter > > converts most of the napi/netif calls to _locked variant leaving > > a small window where some of the paths might be not properly locked. > > Not sure what to do about it, but probably nothing since everything > > is still rtnl_lock-protected and the issue is mostly about (temporary) > > wrong lockdep annotations? > > Yes, it's temporary (I didn't check the final bnxt patch to see if it > covers all paths). > > > Any other suggestions? > > I guess the alternative would be introducing netdev_lock where it > belongs before adding the lockdep annotations/switching to _locked() > variants. > > Maybe it's not worth the pain of reworking this patchset if it ends up > in the correct state anyway, I don't know. Probably more a question > for the maintainers, depending on what they prefer. SG! I did try to to do this initially (adding extra locking, then removing extra locking), but it seemed to only make everything more difficult to review (imo), so I reverted to a simplified way :-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 15c57a06ecaf..ead9fcaad6bd 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -11508,7 +11508,7 @@ static int bnxt_request_irq(struct bnxt *bp) if (rc) break; - netif_napi_set_irq(&bp->bnapi[i]->napi, irq->vector); + netif_napi_set_irq_locked(&bp->bnapi[i]->napi, irq->vector); irq->requested = 1; if (zalloc_cpumask_var(&irq->cpu_mask, GFP_KERNEL)) { @@ -11557,9 +11557,9 @@ static void bnxt_del_napi(struct bnxt *bp) for (i = 0; i < bp->cp_nr_rings; i++) { struct bnxt_napi *bnapi = bp->bnapi[i]; - __netif_napi_del(&bnapi->napi); + __netif_napi_del_locked(&bnapi->napi); } - /* We called __netif_napi_del(), we need + /* We called __netif_napi_del_locked(), we need * to respect an RCU grace period before freeing napi structures. */ synchronize_net(); @@ -11578,12 +11578,12 @@ static void bnxt_init_napi(struct bnxt *bp) cp_nr_rings--; for (i = 0; i < cp_nr_rings; i++) { bnapi = bp->bnapi[i]; - netif_napi_add_config(bp->dev, &bnapi->napi, poll_fn, - bnapi->index); + netif_napi_add_config_locked(bp->dev, &bnapi->napi, poll_fn, + bnapi->index); } if (BNXT_CHIP_TYPE_NITRO_A0(bp)) { bnapi = bp->bnapi[cp_nr_rings]; - netif_napi_add(bp->dev, &bnapi->napi, bnxt_poll_nitroa0); + netif_napi_add_locked(bp->dev, &bnapi->napi, bnxt_poll_nitroa0); } } @@ -11604,7 +11604,7 @@ static void bnxt_disable_napi(struct bnxt *bp) cpr->sw_stats->tx.tx_resets++; if (bnapi->in_reset) cpr->sw_stats->rx.rx_resets++; - napi_disable(&bnapi->napi); + napi_disable_locked(&bnapi->napi); } } @@ -11626,7 +11626,7 @@ static void bnxt_enable_napi(struct bnxt *bp) INIT_WORK(&cpr->dim.work, bnxt_dim_work); cpr->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE; } - napi_enable(&bnapi->napi); + napi_enable_locked(&bnapi->napi); } } diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c index 533e659b15b3..cf9bd08d04b2 100644 --- a/drivers/net/ethernet/google/gve/gve_main.c +++ b/drivers/net/ethernet/google/gve/gve_main.c @@ -1886,7 +1886,7 @@ static void gve_turndown(struct gve_priv *priv) netif_queue_set_napi(priv->dev, idx, NETDEV_QUEUE_TYPE_TX, NULL); - napi_disable(&block->napi); + napi_disable_locked(&block->napi); } for (idx = 0; idx < priv->rx_cfg.num_queues; idx++) { int ntfy_idx = gve_rx_idx_to_ntfy(priv, idx); @@ -1897,7 +1897,7 @@ static void gve_turndown(struct gve_priv *priv) netif_queue_set_napi(priv->dev, idx, NETDEV_QUEUE_TYPE_RX, NULL); - napi_disable(&block->napi); + napi_disable_locked(&block->napi); } /* Stop tx queues */ @@ -1925,7 +1925,7 @@ static void gve_turnup(struct gve_priv *priv) if (!gve_tx_was_added_to_block(priv, idx)) continue; - napi_enable(&block->napi); + napi_enable_locked(&block->napi); if (idx < priv->tx_cfg.num_queues) netif_queue_set_napi(priv->dev, idx, @@ -1953,7 +1953,7 @@ static void gve_turnup(struct gve_priv *priv) if (!gve_rx_was_added_to_block(priv, idx)) continue; - napi_enable(&block->napi); + napi_enable_locked(&block->napi); netif_queue_set_napi(priv->dev, idx, NETDEV_QUEUE_TYPE_RX, &block->napi); diff --git a/drivers/net/ethernet/google/gve/gve_utils.c b/drivers/net/ethernet/google/gve/gve_utils.c index 30fef100257e..ace9b8698021 100644 --- a/drivers/net/ethernet/google/gve/gve_utils.c +++ b/drivers/net/ethernet/google/gve/gve_utils.c @@ -110,13 +110,13 @@ void gve_add_napi(struct gve_priv *priv, int ntfy_idx, { struct gve_notify_block *block = &priv->ntfy_blocks[ntfy_idx]; - netif_napi_add(priv->dev, &block->napi, gve_poll); - netif_napi_set_irq(&block->napi, block->irq); + netif_napi_add_locked(priv->dev, &block->napi, gve_poll); + netif_napi_set_irq_locked(&block->napi, block->irq); } void gve_remove_napi(struct gve_priv *priv, int ntfy_idx) { struct gve_notify_block *block = &priv->ntfy_blocks[ntfy_idx]; - netif_napi_del(&block->napi); + netif_napi_del_locked(&block->napi); } diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c index d857ae3c8f06..07f4670d69d5 100644 --- a/drivers/net/netdevsim/netdev.c +++ b/drivers/net/netdevsim/netdev.c @@ -664,7 +664,7 @@ nsim_queue_mem_alloc(struct net_device *dev, void *per_queue_mem, int idx) goto err_free; if (!ns->rq_reset_mode) - netif_napi_add_config(dev, &qmem->rq->napi, nsim_poll, idx); + netif_napi_add_config_locked(dev, &qmem->rq->napi, nsim_poll, idx); return 0; @@ -681,7 +681,7 @@ static void nsim_queue_mem_free(struct net_device *dev, void *per_queue_mem) page_pool_destroy(qmem->pp); if (qmem->rq) { if (!ns->rq_reset_mode) - netif_napi_del(&qmem->rq->napi); + netif_napi_del_locked(&qmem->rq->napi); page_pool_destroy(qmem->rq->page_pool); nsim_queue_free(qmem->rq); } @@ -693,9 +693,11 @@ nsim_queue_start(struct net_device *dev, void *per_queue_mem, int idx) struct nsim_queue_mem *qmem = per_queue_mem; struct netdevsim *ns = netdev_priv(dev); + netdev_assert_locked(dev); + if (ns->rq_reset_mode == 1) { ns->rq[idx]->page_pool = qmem->pp; - napi_enable(&ns->rq[idx]->napi); + napi_enable_locked(&ns->rq[idx]->napi); return 0; } @@ -703,15 +705,15 @@ nsim_queue_start(struct net_device *dev, void *per_queue_mem, int idx) * here we want to test various call orders. */ if (ns->rq_reset_mode == 2) { - netif_napi_del(&ns->rq[idx]->napi); - netif_napi_add_config(dev, &qmem->rq->napi, nsim_poll, idx); + netif_napi_del_locked(&ns->rq[idx]->napi); + netif_napi_add_config_locked(dev, &qmem->rq->napi, nsim_poll, idx); } else if (ns->rq_reset_mode == 3) { - netif_napi_add_config(dev, &qmem->rq->napi, nsim_poll, idx); - netif_napi_del(&ns->rq[idx]->napi); + netif_napi_add_config_locked(dev, &qmem->rq->napi, nsim_poll, idx); + netif_napi_del_locked(&ns->rq[idx]->napi); } ns->rq[idx] = qmem->rq; - napi_enable(&ns->rq[idx]->napi); + napi_enable_locked(&ns->rq[idx]->napi); return 0; } @@ -721,7 +723,9 @@ static int nsim_queue_stop(struct net_device *dev, void *per_queue_mem, int idx) struct nsim_queue_mem *qmem = per_queue_mem; struct netdevsim *ns = netdev_priv(dev); - napi_disable(&ns->rq[idx]->napi); + netdev_assert_locked(dev); + + napi_disable_locked(&ns->rq[idx]->napi); if (ns->rq_reset_mode == 1) { qmem->pp = ns->rq[idx]->page_pool; diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 30daef442a13..79d68ebfa606 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2726,7 +2726,7 @@ static inline void netdev_assert_locked_or_invisible(struct net_device *dev) static inline bool netdev_need_ops_lock(struct net_device *dev) { - bool ret = false; + bool ret = !!dev->queue_mgmt_ops; #if IS_ENABLED(CONFIG_NET_SHAPER) ret |= !!dev->netdev_ops->net_shaper_ops; diff --git a/net/core/netdev_rx_queue.c b/net/core/netdev_rx_queue.c index ddd54e1e7289..7419c41fd3cb 100644 --- a/net/core/netdev_rx_queue.c +++ b/net/core/netdev_rx_queue.c @@ -30,6 +30,8 @@ int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq_idx) goto err_free_new_mem; } + netdev_lock(dev); + err = qops->ndo_queue_mem_alloc(dev, new_mem, rxq_idx); if (err) goto err_free_old_mem; @@ -52,6 +54,8 @@ int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq_idx) qops->ndo_queue_mem_free(dev, old_mem); + netdev_unlock(dev); + kvfree(old_mem); kvfree(new_mem); @@ -76,6 +80,7 @@ int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq_idx) qops->ndo_queue_mem_free(dev, new_mem); err_free_old_mem: + netdev_unlock(dev); kvfree(old_mem); err_free_new_mem:
For the drivers that use queue management API, switch to the mode where core stack holds the netdev instance lock. This affects the following drivers: - bnxt - gve - netdevsim Originally I locked only start/stop, but switched to holding the lock over all iterations to make them look atomic to the device (feels like it should be easier to reason about). Cc: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 16 +++++++-------- drivers/net/ethernet/google/gve/gve_main.c | 8 ++++---- drivers/net/ethernet/google/gve/gve_utils.c | 6 +++--- drivers/net/netdevsim/netdev.c | 22 ++++++++++++--------- include/linux/netdevice.h | 2 +- net/core/netdev_rx_queue.c | 5 +++++ 6 files changed, 34 insertions(+), 25 deletions(-)