
[net-next,00/15] Rate management on traffic classes + misc

Message ID 20250209101716.112774-1-tariqt@nvidia.com (mailing list archive)

Message

Tariq Toukan Feb. 9, 2025, 10:17 a.m. UTC

Hi,

This patchset consists of multiple features from the team to the mlx5
core and Eth drivers.

The first 5 patches by Carolina are V7 of the feature that adds rate
management support on traffic classes in devlink and mlx5, more details
below [1].

Patches 6-8 by William reduce the memory consumption for representors to
achieve better scalability.

Patches 9-10 by Akiva expose ICM memory consumption per function.

Patches 11-13 by Amir expose helpful information on RSS resources in the
devlink RX reporter diagnose output.

Patches 14-15 are simple enhancements by Alex Lazar.

Regards,
Tariq


[1]
This is V7 of the feature. Find V6 here:
https://lore.kernel.org/all/20241209210950.290129-1-tariqt@nvidia.com/

This feature extends the devlink-rate API to support traffic class (TC)
bandwidth management, enabling more granular control over traffic
shaping and rate limiting across multiple TCs. The API now allows users
to specify bandwidth proportions for different traffic classes in a
single command. This is particularly useful for managing Enhanced
Transmission Selection (ETS) for groups of Virtual Functions (VFs),
allowing precise bandwidth allocation across traffic classes.
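
As a rough illustration of the proportional allocation described above,
the sketch below converts per-TC percentages into bandwidth shares of a
parent rate. It is a userspace model in Python, not the devlink or mlx5
implementation; the function name tc_shares_mbps and the 10 Gbit/s
figure are assumptions made for the example.

```python
def tc_shares_mbps(parent_rate_mbps, tc_bw_percent):
    """Split a parent rate across traffic classes by percentage.

    parent_rate_mbps: rate of the parent node in Mbit/s (illustrative).
    tc_bw_percent: dict mapping TC index -> percentage share.
    Returns a dict mapping TC index -> nominal share in Mbit/s.
    """
    return {
        tc: parent_rate_mbps * percent / 100
        for tc, percent in tc_bw_percent.items()
    }


if __name__ == "__main__":
    # Hypothetical split: 20% for TC 0 and 80% for TC 5.
    shares = tc_shares_mbps(10_000, {0: 20, 5: 80})
    for tc, mbps in sorted(shares.items()):
        print(f"TC {tc}: {mbps} Mbit/s")
```

The sketch models only the nominal split. Under ETS the shares are
typically guarantees under contention, so a TC may use more bandwidth
while other classes are idle.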

Additionally, it refines the QoS handling in net/mlx5 to support TC
arbitration and bandwidth management on vports and rate nodes.

Discussions on traffic class shaping in net-shapers began in V5 [2],
where we discussed with maintainers whether net-shapers should support
traffic classes and how this could be implemented.

Later, after further conversations with Paolo Abeni and Simon Horman,
Cosmin provided an update [3], confirming that net-shapers' tree-based
hierarchy aligns well with traffic classes when treated as distinct
subsets of netdev queues. Since mlx5 enforces a 1:1 mapping between TX
queues and traffic classes, this approach seems feasible, though some
open questions remain regarding queue reconfiguration and certain mlx5
scheduling behaviors.

[2]
https://lore.kernel.org/netdev/20241204220931.254964-1-tariqt@nvidia.com/
[3]
https://lore.kernel.org/netdev/67df1a562614b553dcab043f347a0d7c5393ff83.camel@nvidia.com/

V7:
- Fixed disabling tc-bw on leaf nodes that did not have tc-bw
  configured.
- Fixed an issue where tc-bw was disabled on a node with assigned
  vports, ensuring that vport->qos.sched_node->parent is correctly
  updated with the cloned node.
- Declared a constant for the maximum allowed Traffic Class index in
  devlink rate.
- Added a range check to validate rate-tc-index.
- Added documentation for the tc-bw argument.
- Added a validation check to ensure that the total bandwidth assigned to
  all traffic classes sums to 100.
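
The two checks added in V7 (the index range and the total of 100) can be
modelled in a few lines. This is a hedged Python sketch rather than the
kernel code: MAX_TC_INDEX = 7 assumes eight traffic classes, and
validate_tc_bw is a hypothetical helper name.

```python
MAX_TC_INDEX = 7  # assumption: eight traffic classes, indexed 0..7


def validate_tc_bw(tc_bw_percent):
    """Validate a TC index -> percentage mapping.

    Raises ValueError if an index is out of range, a share is outside
    0..100, or the shares do not add up to exactly 100.
    """
    for tc, percent in tc_bw_percent.items():
        if not 0 <= tc <= MAX_TC_INDEX:
            raise ValueError(f"TC index {tc} out of range 0..{MAX_TC_INDEX}")
        if not 0 <= percent <= 100:
            raise ValueError(f"share {percent} for TC {tc} not in 0..100")

    total = sum(tc_bw_percent.values())
    if total != 100:
        raise ValueError(f"shares sum to {total}, expected 100")


if __name__ == "__main__":
    validate_tc_bw({0: 20, 5: 80})  # accepted: indices valid, total is 100
    try:
        validate_tc_bw({0: 50, 1: 40})  # rejected: total is 90
    except ValueError as err:
        print("rejected:", err)
```

TCs missing from the mapping are treated as having a share of 0 in this
model; whether the real interface requires every TC to be listed is not
something the sketch tries to settle.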

Akiva Goldberger (2):
  net/mlx5: Rename and move mlx5_esw_query_vport_vhca_id
  net/mlx5: Expose ICM consumption per function

Alexei Lazar (2):
  net/mlx5: Extend Ethtool loopback selftest to support non-linear SKB
  net/mlx5: XDP, Enable TX side XDP multi-buffer support

Amir Tzin (3):
  net/mlx5e: Move RQs diagnose to a dedicated function
  net/mlx5e: Add direct TIRs to devlink rx reporter diagnose
  net/mlx5e: Expose RSS via devlink rx reporter diagnose

Carolina Jubran (5):
  devlink: Extend devlink rate API with traffic classes bandwidth
    management
  net/mlx5: Add no-op implementation for setting tc-bw on rate objects
  net/mlx5: Add support for setting tc-bw on nodes
  net/mlx5: Add traffic class scheduling support for vport QoS
  net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw

William Tu (3):
  net/mlx5e: reduce the max log mpwrq sz for ECPF and reps
  net/mlx5e: reduce rep rxq depth to 256 for ECPF
  net/mlx5e: set the tx_queue_len for pfifo_fast

 Documentation/netlink/specs/devlink.yaml      |  36 +-
 .../networking/devlink/devlink-port.rst       |   7 +
 Documentation/networking/devlink/mlx5.rst     |   4 +
 .../net/ethernet/mellanox/mlx5/core/devlink.c |   2 +
 .../mellanox/mlx5/core/diag/reporter_vnic.c   |  46 +
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |   3 -
 .../ethernet/mellanox/mlx5/core/en/params.c   |  16 +-
 .../ethernet/mellanox/mlx5/core/en/params.h   |   1 -
 .../mellanox/mlx5/core/en/reporter_rx.c       | 119 ++-
 .../mellanox/mlx5/core/en/reporter_tx.c       |   1 -
 .../net/ethernet/mellanox/mlx5/core/en/rss.c  |  15 +
 .../net/ethernet/mellanox/mlx5/core/en/rss.h  |   3 +
 .../ethernet/mellanox/mlx5/core/en/rx_res.c   |   9 +-
 .../ethernet/mellanox/mlx5/core/en/rx_res.h   |   5 +
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  49 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  29 -
 .../net/ethernet/mellanox/mlx5/core/en_rep.c  |   5 +
 .../ethernet/mellanox/mlx5/core/en_selftest.c |   3 +
 .../net/ethernet/mellanox/mlx5/core/esw/qos.c | 815 +++++++++++++++++-
 .../net/ethernet/mellanox/mlx5/core/esw/qos.h |   4 +
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |  13 +-
 .../mellanox/mlx5/core/eswitch_offloads.c     |  29 +-
 .../ethernet/mellanox/mlx5/core/lib/fs_ttc.c  |  19 +
 .../ethernet/mellanox/mlx5/core/lib/fs_ttc.h  |   1 +
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |   2 +
 .../net/ethernet/mellanox/mlx5/core/vport.c   |  25 +
 include/net/devlink.h                         |   9 +
 include/uapi/linux/devlink.h                  |   4 +
 net/devlink/netlink_gen.c                     |  16 +-
 net/devlink/netlink_gen.h                     |   2 +
 net/devlink/rate.c                            | 127 +++
 31 files changed, 1274 insertions(+), 145 deletions(-)


base-commit: acdefab0dcbc3833b5a734ab80d792bb778517a0

Comments

Jakub Kicinski Feb. 12, 2025, 3:36 a.m. UTC | #1
On Sun, 9 Feb 2025 12:17:01 +0200 Tariq Toukan wrote:
> This feature extends the devlink-rate API to support traffic class (TC)
> bandwidth management, enabling more granular control over traffic
> shaping and rate limiting across multiple TCs. The API now allows users
> to specify bandwidth proportions for different traffic classes in a
> single command. This is particularly useful for managing Enhanced
> Transmission Selection (ETS) for groups of Virtual Functions (VFs),
> allowing precise bandwidth allocation across traffic classes.
> 
> Additionally, it refines the QoS handling in net/mlx5 to support TC
> arbitration and bandwidth management on vports and rate nodes.
> 
> Discussions on traffic class shaping in net-shapers began in V5 [2],
> where we discussed with maintainers whether net-shapers should support
> traffic classes and how this could be implemented.
> 
> Later, after further conversations with Paolo Abeni and Simon Horman,
> Cosmin provided an update [3], confirming that net-shapers' tree-based
> hierarchy aligns well with traffic classes when treated as distinct
> subsets of netdev queues. Since mlx5 enforces a 1:1 mapping between TX
> queues and traffic classes, this approach seems feasible, though some
> open questions remain regarding queue reconfiguration and certain mlx5
> scheduling behaviors.

/trim CC, add Carolina.

I don't understand what the plan is for shapers. As you say at netdev
level the classes will likely be associated with queues, so there isn't
much to do. So how will we represent the TCs based on classification?
I appreciate you working with Paolo and Simon, but from my perspective
none of the questions have been answered.

I'm not even asking you to write the code, just to have a solid plan.


Tariq, LMK if you want me to apply the patches starting from patch 6.
The rest of the series looks good. Two process notes, FWIW:
 - pretty sure we agreed in the past that it's okay to have patches
   which significantly extend uAPI or core to be handled outside of
   the main "driver update stream"; what "significant" means may take
   a bit of trial and error but I can't think of any misunderstandings 
   so far
 - from my PoV it'd be perfectly fine if you were to submit multiple
   series at once if they are independent. Just as long as there's not
   more than 15 patches for either tree outstanding. But I understand
   that comes at its own cost

Tariq Toukan Feb. 12, 2025, 11:08 a.m. UTC | #2
On 12/02/2025 5:36, Jakub Kicinski wrote:
> On Sun, 9 Feb 2025 12:17:01 +0200 Tariq Toukan wrote:

..

> Tariq, LMK if you want me to apply the patches starting from patch 6.

Yes, please apply.

> The rest of the series looks good. Two process notes, FWIW:
>   - pretty sure we agreed in the past that it's okay to have patches
>     which significantly extend uAPI or core to be handled outside of
>     the main "driver update stream"; what "significant" means may take
>     a bit of trial and error but I can't think of any misunderstandings
>     so far
>   - from my PoV it'd be perfectly fine if you were to submit multiple
>     series at once if they are independent. Just as long as there's not
>     more than 15 patches for either tree outstanding. But I understand
>     that comes at its own cost
> 

Ack.
That sounds good.
Can be helpful in some cases.

Regards,
Tariq

patchwork-bot+netdevbpf@kernel.org Feb. 12, 2025, 7:20 p.m. UTC | #3
Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Sun, 9 Feb 2025 12:17:01 +0200 you wrote:
> Hi,
> 
> This patchset consists of multiple features from the team to the mlx5
> core and Eth drivers.
> 
> [...]

Here is the summary with links:
  - [net-next,01/15] devlink: Extend devlink rate API with traffic classes bandwidth management
    (no matching commit)
  - [net-next,02/15] net/mlx5: Add no-op implementation for setting tc-bw on rate objects
    (no matching commit)
  - [net-next,03/15] net/mlx5: Add support for setting tc-bw on nodes
    (no matching commit)
  - [net-next,04/15] net/mlx5: Add traffic class scheduling support for vport QoS
    (no matching commit)
  - [net-next,05/15] net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw
    (no matching commit)
  - [net-next,06/15] net/mlx5e: reduce the max log mpwrq sz for ECPF and reps
    https://git.kernel.org/netdev/net-next/c/e1d68ea58c7e
  - [net-next,07/15] net/mlx5e: reduce rep rxq depth to 256 for ECPF
    https://git.kernel.org/netdev/net-next/c/b9cc8f9d7008
  - [net-next,08/15] net/mlx5e: set the tx_queue_len for pfifo_fast
    https://git.kernel.org/netdev/net-next/c/a38cc5706fb9
  - [net-next,09/15] net/mlx5: Rename and move mlx5_esw_query_vport_vhca_id
    https://git.kernel.org/netdev/net-next/c/38b3d42e5afa
  - [net-next,10/15] net/mlx5: Expose ICM consumption per function
    https://git.kernel.org/netdev/net-next/c/b820864335c8
  - [net-next,11/15] net/mlx5e: Move RQs diagnose to a dedicated function
    https://git.kernel.org/netdev/net-next/c/913175b3f919
  - [net-next,12/15] net/mlx5e: Add direct TIRs to devlink rx reporter diagnose
    https://git.kernel.org/netdev/net-next/c/99c55284e85b
  - [net-next,13/15] net/mlx5e: Expose RSS via devlink rx reporter diagnose
    https://git.kernel.org/netdev/net-next/c/896c92aa7429
  - [net-next,14/15] net/mlx5: Extend Ethtool loopback selftest to support non-linear SKB
    https://git.kernel.org/netdev/net-next/c/95b9606b15bb
  - [net-next,15/15] net/mlx5: XDP, Enable TX side XDP multi-buffer support
    https://git.kernel.org/netdev/net-next/c/1a9304859b3a

You are awesome, thank you!

Tariq Toukan Feb. 12, 2025, 8:19 p.m. UTC | #4
On 12/02/2025 5:36, Jakub Kicinski wrote:
> On Sun, 9 Feb 2025 12:17:01 +0200 Tariq Toukan wrote:
>> This feature extends the devlink-rate API to support traffic class (TC)
>> bandwidth management, enabling more granular control over traffic
>> shaping and rate limiting across multiple TCs. The API now allows users
>> to specify bandwidth proportions for different traffic classes in a
>> single command. This is particularly useful for managing Enhanced
>> Transmission Selection (ETS) for groups of Virtual Functions (VFs),
>> allowing precise bandwidth allocation across traffic classes.
>>
>> Additionally, it refines the QoS handling in net/mlx5 to support TC
>> arbitration and bandwidth management on vports and rate nodes.
>>
>> Discussions on traffic class shaping in net-shapers began in V5 [2],
>> where we discussed with maintainers whether net-shapers should support
>> traffic classes and how this could be implemented.
>>
>> Later, after further conversations with Paolo Abeni and Simon Horman,
>> Cosmin provided an update [3], confirming that net-shapers' tree-based
>> hierarchy aligns well with traffic classes when treated as distinct
>> subsets of netdev queues. Since mlx5 enforces a 1:1 mapping between TX
>> queues and traffic classes, this approach seems feasible, though some
>> open questions remain regarding queue reconfiguration and certain mlx5
>> scheduling behaviors.
> 
> /trim CC, add Carolina.
> 
> I don't understand what the plan is for shapers. As you say at netdev
> level the classes will likely be associated with queues, so there isn't
> much to do. So how will we represent the TCs based on classification?
> I appreciate you working with Paolo and Simon, but from my perspective
> none of the questions have been answered.
> 
> I'm not even asking you to write the code, just to have a solid plan.
> 

This is WIP. We'll share more details soon.

Tariq.