Message ID | 20250116215530.158886-11-saeed@kernel.org (mailing list archive) |
---|---|
State | Deferred |
Delegated to: | Netdev Maintainers |
Series | [net-next,01/11] net: Kconfig NET_DEVMEM selects GENERIC_ALLOCATOR
On Thu, 16 Jan 2025 13:55:28 -0800 Saeed Mahameed wrote:
> +static const struct netdev_queue_mgmt_ops mlx5e_queue_mgmt_ops = {
> +        .ndo_queue_mem_size = sizeof(struct mlx5_qmgmt_data),
> +        .ndo_queue_mem_alloc = mlx5e_queue_mem_alloc,
> +        .ndo_queue_mem_free = mlx5e_queue_mem_free,
> +        .ndo_queue_start = mlx5e_queue_start,
> +        .ndo_queue_stop = mlx5e_queue_stop,
> +};

We need to pay off some technical debt we accrued before we merge more
queue ops implementations. Specifically the locking needs to move from
under rtnl. Sorry, this is not going in for 6.14.
On 16 Jan 15:21, Jakub Kicinski wrote:
>On Thu, 16 Jan 2025 13:55:28 -0800 Saeed Mahameed wrote:
>> +static const struct netdev_queue_mgmt_ops mlx5e_queue_mgmt_ops = {
>> +        .ndo_queue_mem_size = sizeof(struct mlx5_qmgmt_data),
>> +        .ndo_queue_mem_alloc = mlx5e_queue_mem_alloc,
>> +        .ndo_queue_mem_free = mlx5e_queue_mem_free,
>> +        .ndo_queue_start = mlx5e_queue_start,
>> +        .ndo_queue_stop = mlx5e_queue_stop,
>> +};
>
>We need to pay off some technical debt we accrued before we merge more
>queue ops implementations. Specifically the locking needs to move from
>under rtnl. Sorry, this is not going in for 6.14.

What technical debt accrued ? I haven't seen any changes in queue API since
bnxt and gve got merged, what changed since then ?

mlx5 doesn't require rtnl if this is because of the assert, I can remove
it. I don't understand what this series is being deferred for, please
elaborate, what do I need to do to get it accepted ?

Thanks,
Saeed.

>--
>pw-bot: defer
On Thu, 16 Jan 2025 15:46:43 -0800 Saeed Mahameed wrote:
> >We need to pay off some technical debt we accrued before we merge more
> >queue ops implementations. Specifically the locking needs to move from
> >under rtnl. Sorry, this is not going in for 6.14.
>
> What technical debt accrued ? I haven't seen any changes in queue API since
> bnxt and gve got merged, what changed since then ?
>
> mlx5 doesn't require rtnl if this is because of the assert, I can remove
> it. I don't understand what this series is being deferred for, please
> elaborate, what do I need to do to get it accepted ?

Remove the dependency on rtnl_lock _in the core kernel_.
On 01/16, Jakub Kicinski wrote:
> On Thu, 16 Jan 2025 15:46:43 -0800 Saeed Mahameed wrote:
> > >We need to pay off some technical debt we accrued before we merge more
> > >queue ops implementations. Specifically the locking needs to move from
> > >under rtnl. Sorry, this is not going in for 6.14.
> >
> > What technical debt accrued ? I haven't seen any changes in queue API since
> > bnxt and gve got merged, what changed since then ?
> >
> > mlx5 doesn't require rtnl if this is because of the assert, I can remove
> > it. I don't understand what this series is being deferred for, please
> > elaborate, what do I need to do to get it accepted ?
>
> Remove the dependency on rtnl_lock _in the core kernel_.

IIUC, we want queue API to move away from rtnl and use only (new) netdev
lock. Otherwise, removing this dependency in the future might be
complicated. I'll talk to Jakub so we can maybe get something out early
in the next merge window so you can retest the mlx5 changes on top. Will
that work? (Unless, Saeed, you want to look into that core locking part
yourself.)
On Thu, 23 Jan 2025 16:39:05 -0800 Stanislav Fomichev wrote:
> > > What technical debt accrued ? I haven't seen any changes in queue API since
> > > bnxt and gve got merged, what changed since then ?
> > >
> > > mlx5 doesn't require rtnl if this is because of the assert, I can remove
> > > it. I don't understand what this series is being deferred for, please
> > > elaborate, what do I need to do to get it accepted ?
> >
> > Remove the dependency on rtnl_lock _in the core kernel_.
>
> IIUC, we want queue API to move away from rtnl and use only (new) netdev
> lock. Otherwise, removing this dependency in the future might be
> complicated.

Correct. We only have one driver now which reportedly works (gve).
Let's pull queues under optional netdev_lock protection.
Then we can use queue mgmt op support as a carrot for drivers
to convert / test the netdev_lock protection... "compliance".

I added netdev_lock protection for NAPI before the merge window.
Queues are configured in much more ad-hoc fashion, so I think
the best way to make queue changes netdev_lock safe would be to
wrap all driver ops which are currently under rtnl_lock with
netdev_lock.
On 23 Jan 16:55, Jakub Kicinski wrote:
>On Thu, 23 Jan 2025 16:39:05 -0800 Stanislav Fomichev wrote:
>> > > What technical debt accrued ? I haven't seen any changes in queue API since
>> > > bnxt and gve got merged, what changed since then ?
>> > >
>> > > mlx5 doesn't require rtnl if this is because of the assert, I can remove
>> > > it. I don't understand what this series is being deferred for, please
>> > > elaborate, what do I need to do to get it accepted ?
>> >
>> > Remove the dependency on rtnl_lock _in the core kernel_.
>>
>> IIUC, we want queue API to move away from rtnl and use only (new) netdev
>> lock. Otherwise, removing this dependency in the future might be
>> complicated.
>
>Correct. We only have one driver now which reportedly works (gve).
>Let's pull queues under optional netdev_lock protection.
>Then we can use queue mgmt op support as a carrot for drivers
>to convert / test the netdev_lock protection... "compliance".
>
>I added netdev_lock protection for NAPI before the merge window.
>Queues are configured in much more ad-hoc fashion, so I think
>the best way to make queue changes netdev_lock safe would be to
>wrap all driver ops which are currently under rtnl_lock with
>netdev_lock.

Are you expecting drivers to hold netdev_lock internally?
I was thinking something more scalable, queue_mgmt API to take
netdev_lock, and any other place in the stack that can access
"netdev queue config" e.g ethtool/netlink/netdev_ops should grab
netdev_lock as well, this is better for the future when we want to
reduce rtnl usage in the stack to protect single netdev ops where
netdev_lock will be sufficient, otherwise you will have to wait for ALL
drivers to properly use netdev_lock internally to even start thinking of
getting rid of rtnl from some parts of the core stack.
On Thu, 23 Jan 2025 19:11:23 -0800 Saeed Mahameed wrote:
> On 23 Jan 16:55, Jakub Kicinski wrote:
> >> IIUC, we want queue API to move away from rtnl and use only (new) netdev
> >> lock. Otherwise, removing this dependency in the future might be
> >> complicated.
> >
> >Correct. We only have one driver now which reportedly works (gve).
> >Let's pull queues under optional netdev_lock protection.
> >Then we can use queue mgmt op support as a carrot for drivers
> >to convert / test the netdev_lock protection... "compliance".
> >
> >I added netdev_lock protection for NAPI before the merge window.
> >Queues are configured in much more ad-hoc fashion, so I think
> >the best way to make queue changes netdev_lock safe would be to
> >wrap all driver ops which are currently under rtnl_lock with
> >netdev_lock.
>
> Are you expecting drivers to hold netdev_lock internally?
> I was thinking something more scalable, queue_mgmt API to take
> netdev_lock, and any other place in the stack that can access
> "netdev queue config" e.g ethtool/netlink/netdev_ops should grab
> netdev_lock as well, this is better for the future when we want to
> reduce rtnl usage in the stack to protect single netdev ops where
> netdev_lock will be sufficient, otherwise you will have to wait for ALL
> drivers to properly use netdev_lock internally to even start thinking of
> getting rid of rtnl from some parts of the core stack.

Agreed, expecting drivers to get the locking right internally is easier
short term but messy long term. I'm thinking opt-in for drivers to have
netdev_lock taken by the core. Probably around all ops which today hold
rtnl_lock, to keep the expectations simple.

net_shaper and queue_mgmt ops can require that drivers that support
them opt-in and these ops can hold just the netdev_lock, no rtnl_lock.
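A minimal sketch of the opt-in wrapping described above: the core keeps today's rtnl_lock and additionally takes the per-netdev instance lock around a driver op for drivers that declare support, so queue_mgmt and net_shaper ops can later rely on netdev_lock alone. The `request_ops_lock` flag and the wrapper name are assumptions for illustration only; `netdev_lock()`/`netdev_unlock()` are the helpers referenced in the thread for the NAPI protection work.

```c
#include <linux/netdevice.h>
#include <linux/rtnetlink.h>

/* Illustrative only: core-level locking around an ndo for drivers that
 * opt in, in addition to today's rtnl_lock.  "request_ops_lock" is a
 * hypothetical opt-in flag, not an in-tree field; netdev_lock() and
 * netdev_unlock() wrap the per-netdev instance mutex.
 */
static int dev_open_maybe_ops_locked(struct net_device *dev)
{
	int ret;

	ASSERT_RTNL();			/* existing locking stays in place */

	if (dev->request_ops_lock)	/* driver opted in to core locking */
		netdev_lock(dev);

	ret = dev->netdev_ops->ndo_open(dev);

	if (dev->request_ops_lock)
		netdev_unlock(dev);

	return ret;
}
```

In this shape a driver opts in once (e.g. at probe time) and otherwise keeps its existing assumptions; drivers that do not opt in see no behavioral change.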
On 24 Jan 07:26, Jakub Kicinski wrote:
>On Thu, 23 Jan 2025 19:11:23 -0800 Saeed Mahameed wrote:
>> On 23 Jan 16:55, Jakub Kicinski wrote:
>> >> IIUC, we want queue API to move away from rtnl and use only (new) netdev
>> >> lock. Otherwise, removing this dependency in the future might be
>> >> complicated.
>> >
>> >Correct. We only have one driver now which reportedly works (gve).
>> >Let's pull queues under optional netdev_lock protection.
>> >Then we can use queue mgmt op support as a carrot for drivers
>> >to convert / test the netdev_lock protection... "compliance".
>> >
>> >I added netdev_lock protection for NAPI before the merge window.
>> >Queues are configured in much more ad-hoc fashion, so I think
>> >the best way to make queue changes netdev_lock safe would be to
>> >wrap all driver ops which are currently under rtnl_lock with
>> >netdev_lock.
>>
>> Are you expecting drivers to hold netdev_lock internally?
>> I was thinking something more scalable, queue_mgmt API to take
>> netdev_lock, and any other place in the stack that can access
>> "netdev queue config" e.g ethtool/netlink/netdev_ops should grab
>> netdev_lock as well, this is better for the future when we want to
>> reduce rtnl usage in the stack to protect single netdev ops where
>> netdev_lock will be sufficient, otherwise you will have to wait for ALL
>> drivers to properly use netdev_lock internally to even start thinking of
>> getting rid of rtnl from some parts of the core stack.
>
>Agreed, expecting drivers to get the locking right internally is easier
>short term but messy long term. I'm thinking opt-in for drivers to have
>netdev_lock taken by the core. Probably around all ops which today hold
>rtnl_lock, to keep the expectations simple.
>

Why opt-in? I don't see any overhead of taking netdev_lock by default in
rtnl_lock flows.

>net_shaper and queue_mgmt ops can require that drivers that support
>them opt-in and these ops can hold just the netdev_lock, no rtnl_lock.
On Fri, 24 Jan 2025 11:34:54 -0800 Saeed Mahameed wrote:
> On 24 Jan 07:26, Jakub Kicinski wrote:
> >> Are you expecting drivers to hold netdev_lock internally?
> >> I was thinking something more scalable, queue_mgmt API to take
> >> netdev_lock, and any other place in the stack that can access
> >> "netdev queue config" e.g ethtool/netlink/netdev_ops should grab
> >> netdev_lock as well, this is better for the future when we want to
> >> reduce rtnl usage in the stack to protect single netdev ops where
> >> netdev_lock will be sufficient, otherwise you will have to wait for ALL
> >> drivers to properly use netdev_lock internally to even start thinking of
> >> getting rid of rtnl from some parts of the core stack.
> >
> >Agreed, expecting drivers to get the locking right internally is easier
> >short term but messy long term. I'm thinking opt-in for drivers to have
> >netdev_lock taken by the core. Probably around all ops which today hold
> >rtnl_lock, to keep the expectations simple.
>
> Why opt-in? I don't see any overhead of taking netdev_lock by default in
> rtnl_lock flows.

We could, depends on how close we take the dev lock to the ndo vs to
rtnl_lock. Some drivers may call back into the stack so if we're not
careful enough we'll get flooded by static analysis reports saying that
we had deadlocked some old Sun driver :(

Then there are SW upper drivers like bonding which we'll need at the
very least lockdep nesting allocations for.

Would be great to solve all these issues, but IMHO not a hard
requirement, we can at least start with opt in. Unless always taking
the lock gives us some worthwhile invariant I haven't considered?
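The stacked-device concern above is essentially a lockdep nesting-annotation problem: if upper and lower netdevs share one lock class, taking the lower device's lock while holding the upper's needs an explicit annotation or lockdep reports a false-positive deadlock. A generic sketch, assuming the per-netdev mutex (`netdev->lock`) behind the `netdev_lock()` helpers and the standard `mutex_lock_nested()` annotation; the helper below is made up for illustration:

```c
#include <linux/mutex.h>
#include <linux/netdevice.h>

/* Illustrative only: a SW upper device (e.g. bonding) reconfiguring a
 * lower device while both use the same per-netdev lock class.  The
 * nested acquisition must be annotated so lockdep does not treat it as
 * a deadlock of the class with itself.
 */
static void upper_configures_lower(struct net_device *upper,
				   struct net_device *lower)
{
	mutex_lock(&upper->lock);
	/* same lock class as upper->lock: tell lockdep this nesting is fine */
	mutex_lock_nested(&lower->lock, SINGLE_DEPTH_NESTING);

	/* ... upper driver adjusts the lower device's configuration ... */

	mutex_unlock(&lower->lock);
	mutex_unlock(&upper->lock);
}
```

Without the annotation, every upper/lower pairing would show up in lockdep and static analysis output, which is part of why starting with opt-in is the more conservative path.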
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 340ed7d3feac..1e03f2afe625 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -5489,6 +5489,101 @@ static const struct netdev_stat_ops mlx5e_stat_ops = {
         .get_base_stats = mlx5e_get_base_stats,
 };
 
+struct mlx5_qmgmt_data {
+        struct mlx5e_channel *c;
+        struct mlx5e_channel_param cparam;
+};
+
+static int mlx5e_queue_mem_alloc(struct net_device *dev, void *newq, int queue_index)
+{
+        struct mlx5_qmgmt_data *new = (struct mlx5_qmgmt_data *)newq;
+        struct mlx5e_priv *priv = netdev_priv(dev);
+        struct mlx5e_channels *chs = &priv->channels;
+        struct mlx5e_params params = chs->params;
+        struct mlx5_core_dev *mdev;
+        int err;
+
+        ASSERT_RTNL();
+        mutex_lock(&priv->state_lock);
+        if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) {
+                err = -ENODEV;
+                goto unlock;
+        }
+
+        if (queue_index >= chs->num) {
+                err = -ERANGE;
+                goto unlock;
+        }
+
+        if (MLX5E_GET_PFLAG(&chs->params, MLX5E_PFLAG_TX_PORT_TS) ||
+            chs->params.ptp_rx ||
+            chs->params.xdp_prog ||
+            priv->htb) {
+                netdev_err(priv->netdev,
+                           "Cloning channels with Port/rx PTP, XDP or HTB is not supported\n");
+                err = -EOPNOTSUPP;
+                goto unlock;
+        }
+
+        mdev = mlx5_sd_ch_ix_get_dev(priv->mdev, queue_index);
+        err = mlx5e_build_channel_param(mdev, &params, &new->cparam);
+        if (err) {
+                return err;
+                goto unlock;
+        }
+
+        err = mlx5e_open_channel(priv, queue_index, &params, NULL, &new->c);
+unlock:
+        mutex_unlock(&priv->state_lock);
+        return err;
+}
+
+static void mlx5e_queue_mem_free(struct net_device *dev, void *mem)
+{
+        struct mlx5_qmgmt_data *data = (struct mlx5_qmgmt_data *)mem;
+
+        /* not supposed to happen since mlx5e_queue_start never fails
+         * but this is how this should be implemented just in case
+         */
+        if (data->c)
+                mlx5e_close_channel(data->c);
+}
+
+static int mlx5e_queue_stop(struct net_device *dev, void *oldq, int queue_index)
+{
+        /* mlx5e_queue_start does not fail, we stop the old queue there */
+        return 0;
+}
+
+static int mlx5e_queue_start(struct net_device *dev, void *newq, int queue_index)
+{
+        struct mlx5_qmgmt_data *new = (struct mlx5_qmgmt_data *)newq;
+        struct mlx5e_priv *priv = netdev_priv(dev);
+        struct mlx5e_channel *old;
+
+        mutex_lock(&priv->state_lock);
+
+        /* stop and close the old */
+        old = priv->channels.c[queue_index];
+        mlx5e_deactivate_priv_channels(priv);
+        /* close old before activating new, to avoid napi conflict */
+        mlx5e_close_channel(old);
+
+        /* start the new */
+        priv->channels.c[queue_index] = new->c;
+        mlx5e_activate_priv_channels(priv);
+        mutex_unlock(&priv->state_lock);
+        return 0;
+}
+
+static const struct netdev_queue_mgmt_ops mlx5e_queue_mgmt_ops = {
+        .ndo_queue_mem_size = sizeof(struct mlx5_qmgmt_data),
+        .ndo_queue_mem_alloc = mlx5e_queue_mem_alloc,
+        .ndo_queue_mem_free = mlx5e_queue_mem_free,
+        .ndo_queue_start = mlx5e_queue_start,
+        .ndo_queue_stop = mlx5e_queue_stop,
+};
+
 static void mlx5e_build_nic_netdev(struct net_device *netdev)
 {
         struct mlx5e_priv *priv = netdev_priv(netdev);
@@ -5499,6 +5594,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
         SET_NETDEV_DEV(netdev, mdev->device);
 
         netdev->netdev_ops = &mlx5e_netdev_ops;
+        netdev->queue_mgmt_ops = &mlx5e_queue_mgmt_ops;
         netdev->xdp_metadata_ops = &mlx5e_xdp_metadata_ops;
         netdev->xsk_tx_metadata_ops = &mlx5e_xsk_tx_metadata_ops;
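For context, a rough sketch of the restart flow these ops plug into, loosely modeled on the core's `netdev_rx_queue_restart()`; error unwinding is trimmed and the real core code differs in detail. For mlx5 above, `ndo_queue_mem_alloc` opens the new channel, `ndo_queue_stop` is a no-op, and `ndo_queue_start` swaps the channels under `priv->state_lock`.

```c
#include <linux/netdevice.h>
#include <linux/slab.h>

/* Sketch only: allocate per-queue memory blobs of the driver-declared
 * size, stand up the new queue, stop the old one, activate the new one,
 * then release the old queue's resources.
 */
static int queue_restart_sketch(struct net_device *dev, int idx)
{
	const struct netdev_queue_mgmt_ops *ops = dev->queue_mgmt_ops;
	void *new_mem, *old_mem;
	int err;

	new_mem = kvzalloc(ops->ndo_queue_mem_size, GFP_KERNEL);
	old_mem = kvzalloc(ops->ndo_queue_mem_size, GFP_KERNEL);
	if (!new_mem || !old_mem) {
		err = -ENOMEM;
		goto out;
	}

	err = ops->ndo_queue_mem_alloc(dev, new_mem, idx);	/* mlx5e_queue_mem_alloc() */
	if (err)
		goto out;

	err = ops->ndo_queue_stop(dev, old_mem, idx);		/* no-op for mlx5 */
	if (err) {
		ops->ndo_queue_mem_free(dev, new_mem);
		goto out;
	}

	err = ops->ndo_queue_start(dev, new_mem, idx);		/* activates the new channel */
	ops->ndo_queue_mem_free(dev, old_mem);			/* old queue resources, if any */
out:
	kvfree(new_mem);
	kvfree(old_mem);
	return err;
}
```

Under the direction discussed in the thread, this path would run under netdev_lock rather than relying on rtnl_lock alone, which is why the `ASSERT_RTNL()` in `mlx5e_queue_mem_alloc()` is the piece Saeed offers to drop.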