Message ID | 20240406010538.220167-1-parav@nvidia.com (mailing list archive)
Series     | devlink: Add port function attribute for IO EQs
On 2024/4/6 3:05, Parav Pandit wrote:
> Currently, PCI SFs and VFs use IO event queues to deliver netdev per
> channel events. The number of netdev channels is a function of IO
> event queues. In the second scenario of an RDMA device, the
> completion vectors are also a function of IO event queues. Currently,
> an administrator on the hypervisor has no means to provision the
> number of IO event queues for the SF device or the VF device.
> Device/firmware determines some arbitrary value for these IO event
> queues. Due to this, the SF netdev channels are unpredictable, and
> consequently, so is the performance.
>
> This short series introduces a new port function attribute: max_io_eqs.
> The goal is to provide administrators at the hypervisor level with the
> ability to provision the maximum number of IO event queues for a
> function. This gives the administrator the control to provision the
> right number of IO event queues and have predictable performance.
>
> Examples of an administrator provisioning (setting) the maximum number
> of IO event queues when using switchdev mode:
>
> $ devlink port show pci/0000:06:00.0/1
> pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
>   function:
>     hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 10
>
> $ devlink port function set pci/0000:06:00.0/1 max_io_eqs 20
>
> $ devlink port show pci/0000:06:00.0/1
> pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
>   function:
>     hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 20
>
> This sets the corresponding maximum IO event queues of the function
> before it is enumerated. Thus, when the VF/SF driver reads the
> capability from the device, it sees the value provisioned by the
> hypervisor. The driver is then able to configure the number of channels
> for the net device, as well as the number of completion vectors
> for the RDMA device. The device/firmware also honors the provisioned
> value, hence any VF/SF driver attempting to create IO EQs
> beyond the provisioned value results in an error.
>
> With the above setting now, the administrator is able to achieve 2x
> performance on SFs with 20 channels. In the second example, when the
> SF was provisioned for a container with 2 CPUs, the administrator
> provisioned only 2 IO event queues, thereby saving device resources.
>

The following paragraph is the same as the above paragraph?

> With the above settings now in place, the administrator achieved 2x
> performance with the SF device with 20 channels. In the second example,
> when the SF was provisioned for a container with 2 CPUs, the administrator
> provisioned only 2 IO event queues, thereby saving device resources.
>
> changelog:
> v2->v3:
> - limited to 80 chars per line in devlink
> - fixed comments from Jakub in mlx5 driver to fix missing mutex unlock
>   on error path
> v1->v2:
> - limited comment to 80 chars per line in header file
> - fixed set function variables for reverse christmas tree
> - fixed comments from Kalesh
> - fixed missing kfree in get call
> - returning error code for get cmd failure
> - fixed error msg copy paste error in set on cmd failure
>
> Parav Pandit (2):
>   devlink: Support setting max_io_eqs
>   mlx5/core: Support max_io_eqs for a function
>
>  .../networking/devlink/devlink-port.rst       | 33 +++++++
>  .../mellanox/mlx5/core/esw/devlink_port.c     |  4 +
>  .../net/ethernet/mellanox/mlx5/core/eswitch.h |  7 ++
>  .../mellanox/mlx5/core/eswitch_offloads.c     | 97 +++++++++++++++++++
>  include/net/devlink.h                         | 14 +++
>  include/uapi/linux/devlink.h                  |  1 +
>  net/devlink/port.c                            | 53 ++++++++++
>  7 files changed, 209 insertions(+)
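[Editor's note: for readers skimming the archive, below is a minimal driver-side sketch of the interface the diffstat points at (include/net/devlink.h plus the mlx5 eswitch files): two per-port-function callbacks that devlink would invoke for "devlink port function show/set ... max_io_eqs". The callback names follow the patch titles, but the exact prototypes and the mydrv_*() helpers are assumptions for illustration, not the merged devlink or mlx5 code.]

/*
 * Illustrative sketch only. Prototypes and mydrv_*() helpers are
 * assumed; see the merged patches for the authoritative definitions.
 */
#include <linux/errno.h>
#include <linux/netlink.h>
#include <net/devlink.h>

/* Hypothetical device-specific helpers; firmware commands would go here. */
static int mydrv_query_max_io_eqs(struct devlink_port *port, u32 *max_eqs);
static int mydrv_program_max_io_eqs(struct devlink_port *port, u32 max_eqs);

static int mydrv_port_fn_max_io_eqs_get(struct devlink_port *port,
					u32 *max_eqs,
					struct netlink_ext_ack *extack)
{
	/* Report the currently provisioned limit back to devlink. */
	if (mydrv_query_max_io_eqs(port, max_eqs)) {
		NL_SET_ERR_MSG(extack, "Failed to query max IO EQs");
		return -EIO;
	}
	return 0;
}

static int mydrv_port_fn_max_io_eqs_set(struct devlink_port *port,
					u32 max_eqs,
					struct netlink_ext_ack *extack)
{
	/* Program the limit before the VF/SF function is enumerated. */
	if (mydrv_program_max_io_eqs(port, max_eqs)) {
		NL_SET_ERR_MSG(extack, "Failed to set max IO EQs");
		return -EIO;
	}
	return 0;
}

/* Wired into the port's devlink_port_ops alongside the existing
 * hw_addr/roce callbacks, so the cover letter's devlink commands
 * reach the callbacks above.
 */
static const struct devlink_port_ops mydrv_port_ops = {
	.port_fn_max_io_eqs_get = mydrv_port_fn_max_io_eqs_get,
	.port_fn_max_io_eqs_set = mydrv_port_fn_max_io_eqs_set,
};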
> From: Zhu Yanjun <yanjun.zhu@linux.dev>
> Sent: Saturday, April 6, 2024 2:36 PM
>
> On 2024/4/6 3:05, Parav Pandit wrote:
> > Currently, PCI SFs and VFs use IO event queues to deliver netdev per
> > channel events. The number of netdev channels is a function of IO
> > event queues. In the second scenario of an RDMA device, the
> > completion vectors are also a function of IO event queues. Currently,
> > an administrator on the hypervisor has no means to provision the
> > number of IO event queues for the SF device or the VF device.
> > Device/firmware determines some arbitrary value for these IO event
> > queues. Due to this, the SF netdev channels are unpredictable, and
> > consequently, so is the performance.
> >
> > This short series introduces a new port function attribute: max_io_eqs.
> > The goal is to provide administrators at the hypervisor level with the
> > ability to provision the maximum number of IO event queues for a
> > function. This gives the administrator the control to provision the
> > right number of IO event queues and have predictable performance.
> >
> > Examples of an administrator provisioning (setting) the maximum number
> > of IO event queues when using switchdev mode:
> >
> > $ devlink port show pci/0000:06:00.0/1
> > pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
> >   function:
> >     hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 10
> >
> > $ devlink port function set pci/0000:06:00.0/1 max_io_eqs 20
> >
> > $ devlink port show pci/0000:06:00.0/1
> > pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
> >   function:
> >     hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 20
> >
> > This sets the corresponding maximum IO event queues of the function
> > before it is enumerated. Thus, when the VF/SF driver reads the
> > capability from the device, it sees the value provisioned by the
> > hypervisor. The driver is then able to configure the number of
> > channels for the net device, as well as the number of completion
> > vectors for the RDMA device. The device/firmware also honors the
> > provisioned value, hence any VF/SF driver attempting to create IO EQs
> > beyond the provisioned value results in an error.
> >
> > With the above setting now, the administrator is able to achieve 2x
> > performance on SFs with 20 channels. In the second example, when the
> > SF was provisioned for a container with 2 CPUs, the administrator
> > provisioned only 2 IO event queues, thereby saving device resources.
> >
>
> The following paragraph is the same as the above paragraph?

Ah, yes. I forgot to remove one of them while doing minor grammar changes.

> > With the above settings now in place, the administrator achieved 2x
> > performance with the SF device with 20 channels. In the second
> > example, when the SF was provisioned for a container with 2 CPUs, the
> > administrator provisioned only 2 IO event queues, thereby saving
> > device resources.
> >
> > changelog:
> > v2->v3:
> > - limited to 80 chars per line in devlink
> > - fixed comments from Jakub in mlx5 driver to fix missing mutex unlock
> >   on error path
> > v1->v2:
> > - limited comment to 80 chars per line in header file
> > - fixed set function variables for reverse christmas tree
> > - fixed comments from Kalesh
> > - fixed missing kfree in get call
> > - returning error code for get cmd failure
> > - fixed error msg copy paste error in set on cmd failure
> >
> > Parav Pandit (2):
> >   devlink: Support setting max_io_eqs
> >   mlx5/core: Support max_io_eqs for a function
> >
> >  .../networking/devlink/devlink-port.rst       | 33 +++++++
> >  .../mellanox/mlx5/core/esw/devlink_port.c     |  4 +
> >  .../net/ethernet/mellanox/mlx5/core/eswitch.h |  7 ++
> >  .../mellanox/mlx5/core/eswitch_offloads.c     | 97 +++++++++++++++++++
> >  include/net/devlink.h                         | 14 +++
> >  include/uapi/linux/devlink.h                  |  1 +
> >  net/devlink/port.c                            | 53 ++++++++++
> >  7 files changed, 209 insertions(+)
Hello:

This series was applied to netdev/net-next.git (main)
by David S. Miller <davem@davemloft.net>:

On Sat, 6 Apr 2024 04:05:36 +0300 you wrote:
> Currently, PCI SFs and VFs use IO event queues to deliver netdev per
> channel events. The number of netdev channels is a function of IO
> event queues. In the second scenario of an RDMA device, the
> completion vectors are also a function of IO event queues. Currently,
> an administrator on the hypervisor has no means to provision the
> number of IO event queues for the SF device or the VF device.
> Device/firmware determines some arbitrary value for these IO event
> queues. Due to this, the SF netdev channels are unpredictable, and
> consequently, so is the performance.
>
> [...]

Here is the summary with links:
  - [net-next,v4,1/2] devlink: Support setting max_io_eqs
    https://git.kernel.org/netdev/net-next/c/5af3e3876d56
  - [net-next,v4,2/2] mlx5/core: Support max_io_eqs for a function
    https://git.kernel.org/netdev/net-next/c/93197c7c509d

You are awesome, thank you!