[net-next,v2,0/4] net/mlx5: Introduce devlink param to disable SF aux dev probe

Message ID 1644571221-237302-1-git-send-email-moshe@nvidia.com (mailing list archive)

Message

Moshe Shemesh Feb. 11, 2022, 9:20 a.m. UTC
Currently an SF device has all its aux devices enabled by default. Once
loaded, a user who wants to disable some of them needs to perform a
devlink reload. This operation helps to reclaim memory that was not
supposed to be used, but the time lost in disabling and enabling again
cannot be recovered by this approach[1].
Therefore, introduce a new devlink generic parameter for the PCI PF which
spawns SF devices. This parameter sets a flag in order to disable all
auxiliary devices of the SF, i.e. all child auxiliary devices of the SF
for RDMA, eth and vdpa-net are disabled by default and hence no device
initialization is done at probe stage.

The settings introduced here should work whether or not the ESW and the
PF are on the same host.

Example 1: When ESW and SF hosting PF are the same:

Disable SF aux dev probe:
$ devlink dev param set pci/0000:08:00.0 name enable_sfs_aux_devs \
              value false cmode runtime

Create SF:
$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
$ devlink port function set pci/0000:08:00.0/32768 \
               hw_addr 00:00:00:00:00:11 state active

Now depending on the use case, the user can enable specific auxiliary
device(s). For example:

$ devlink dev param set auxiliary/mlx5_core.sf.1 \
              name enable_vnet value true cmode driverinit

Afterwards, the user needs to reload the SF in order for the SF to come up
with the specific configuration:

$ devlink dev reload auxiliary/mlx5_core.sf.1
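
The steps of Example 1 can be strung together as a small script. This is
only a sketch: the helper names (run, sf_bringup) and the DRY_RUN switch are
hypothetical, the device names and port index are simply the ones used
above, and enable_sfs_aux_devs is the parameter proposed by this series, so
the commands only work on a kernel with the series applied.

```shell
#!/bin/sh
# Sketch of Example 1 as a script. With DRY_RUN=1 (the default here) the
# commands are only printed; set DRY_RUN=0 to execute them (needs root).
# Assumption: names below are taken from the example and differ per system.
DRY_RUN=${DRY_RUN:-1}
PF=pci/0000:08:00.0
SF_PORT=$PF/32768
SF_DEV=auxiliary/mlx5_core.sf.1

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

sf_bringup() {
    # 1. Keep new SFs from probing their auxiliary devices.
    run devlink dev param set "$PF" name enable_sfs_aux_devs value false cmode runtime
    # 2. Create and activate the SF.
    run devlink port add "$PF" flavour pcisf pfnum 0 sfnum 11
    run devlink port function set "$SF_PORT" hw_addr 00:00:00:00:00:11 state active
    # 3. Enable only the wanted auxiliary device, then reload the SF once.
    run devlink dev param set "$SF_DEV" name enable_vnet value true cmode driverinit
    run devlink dev reload "$SF_DEV"
}

sf_bringup
```

The point of the ordering is that the SF is reloaded exactly once, after the
wanted auxiliary device has been selected.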


Example 2: When ESW and PF are on different hosts:

Disable probing of the SF's child auxiliary devices for the specified PF
on the host:
$ devlink dev param set pci/0000:04:00.0 name enable_sfs_aux_devs \
               value false cmode runtime

Create SF on ESW side:
$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11 \
               controller 1
$ devlink port function set pci/0000:08:00.0/32768 \
               hw_addr 00:00:00:00:00:11 state active

When SF device appears on the host:
$ devlink dev param set auxiliary/mlx5_core.sf.1 \
               name enable_vnet value true cmode driverinit
$ devlink dev reload auxiliary/mlx5_core.sf.1

changelog:
v1->v2:
 - updated example to make clear SF port and SF device creation PFs
 - added example when SF port and device creation PFs are on different
   hosts

[1]
An mlx5 devlink reload takes about 2 seconds, which means that with
256 SFs we are talking about ~8.5 minutes.
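
As a quick check of the estimate in [1], the arithmetic can be reproduced in
a few lines of shell. This is only a sketch; the variable names are
illustrative and both inputs are the approximate figures quoted above.

```shell
#!/bin/sh
# Estimate the total time spent on per-SF devlink reloads.
# Both inputs are the approximate figures from footnote [1].
reload_secs=2     # approximate duration of one mlx5 devlink reload
num_sfs=256       # number of SFs to reconfigure

total_secs=$((reload_secs * num_sfs))
# POSIX shell arithmetic is integer-only, so report minutes plus seconds.
echo "total: ${total_secs}s = $((total_secs / 60))m $((total_secs % 60))s"
```

This prints "total: 512s = 8m 32s", i.e. roughly 8.5 minutes if the reloads
are performed one after another.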

Shay Drory (4):
  net/mlx5: Split function_setup() to enable and open functions
  net/mlx5: Delete redundant default assignment of runtime devlink
    params
  devlink: Add new "enable_sfs_aux_devs" generic device param
  net/mlx5: Support enable_sfs_aux_devs devlink param

 .../networking/devlink/devlink-params.rst     |   5 +
 drivers/net/ethernet/mellanox/mlx5/core/dev.c |  16 ++
 .../net/ethernet/mellanox/mlx5/core/devlink.c |  51 ++---
 .../net/ethernet/mellanox/mlx5/core/eswitch.c |   3 +
 .../net/ethernet/mellanox/mlx5/core/health.c  |   5 +-
 .../net/ethernet/mellanox/mlx5/core/main.c    | 183 +++++++++++++++---
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |   6 +
 .../mellanox/mlx5/core/sf/dev/driver.c        |  13 +-
 .../ethernet/mellanox/mlx5/core/sf/devlink.c  |  40 ++++
 .../ethernet/mellanox/mlx5/core/sf/hw_table.c |   7 +
 .../net/ethernet/mellanox/mlx5/core/sf/priv.h |   2 +
 include/linux/mlx5/driver.h                   |   1 +
 include/net/devlink.h                         |   4 +
 net/core/devlink.c                            |   5 +
 14 files changed, 284 insertions(+), 57 deletions(-)

Comments

Jakub Kicinski Feb. 12, 2022, 1:12 a.m. UTC | #1
On Fri, 11 Feb 2022 11:20:17 +0200 Moshe Shemesh wrote:
> v1->v2:
>  - updated example to make clear SF port and SF device creation PFs
>  - added example when SF port and device creation PFs are on different hosts

How does this address my comments?

We will not define Linux APIs based on what your firmware can or 
cannot do today. Can we somehow avoid having another frustrating
and drawn out discussion that hinges on that point?

Otherwise, why the global policy and all the hoops to jump thru?
User wants a device with a vnet, give them a device with a vnet.

You left out from your steps how ESW learns that the device has 
to be spawned. Given there's some form of communication between
user intent and ESW the location of the two is completely irrelevant.
You were right to treat the two cases as equivalent in the cover 
letter for v1.
Parav Pandit Feb. 14, 2022, 2:45 p.m. UTC | #2
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Saturday, February 12, 2022 6:43 AM
> 
> On Fri, 11 Feb 2022 11:20:17 +0200 Moshe Shemesh wrote:
> > v1->v2:
> >  - updated example to make clear SF port and SF device creation PFs
> >  - added example when SF port and device creation PFs are on different
> > hosts
> 
> How does this address my comments?
>
Which one?
Your suggestion in [1] to specify vnet during port function spawning time?
Or 
Your suggestion in [2] to add "noprobe" option?

> We will not define Linux APIs based on what your firmware can or cannot do
> today. 
Sure. I just answered and clarified what the device is capable of doing.
We can possibly enhance the fw if it looks correct.

> Otherwise, why the global policy and all the hoops to jump thru?
> User wants a device with a vnet, give them a device with a vnet.
>
User wants a device with specific attributes.
Do you suggest tunneling those params at port spawning time, to share them with a different host via fw?
Some are HW/FW capabilities, and some are hints.
Some are hints because vnet also uses some eth resources by its very nature of being vnet.

Let's discuss two use cases.
Use case 1:
User wants the params of [3] set to the values below.
eth=false, vnet=false, rdma=true, roce=false, io_eq_size=4, event_eq_size=256, max_macs=1.

Use case 2:
User wants the params of [3] set to the values below.
eth=true, vnet=false, rdma=true, roce=true, rest=don't care.

Last year, when we added "roce" in [4] on the eswitch side, you commented in [4] to leave the decision on the SF side.
Based on this feedback, you can see growth of such params on the SF side in [5], [6] and reusing existing params in [7].
(instead of doing them on port spawning side)

Port spawning time attributes should cover at minimum the following attributes of [3]:
(a) enable_vnet,eth,roce,rdma,iwarp (b) io_eq_size, (c) event_eq_size (d) max_macs.

Do you agree that the above list is worth adding as port function attributes?
If not, it's not addressing the user needs.

Did you get a chance to read my reply in [8]?
In the future, when a user wants to change the cpu affinity of an SF, the user needs to perform a devlink reload.
Wouldn't the params of [3] + any new devlink params also benefit from a single devlink reload?
For example, which and how many cpus to use is something best decided depending on the workload and use case.

> You left out from your steps how ESW learns that the device has to be
> spawned. 
I read the above note a few times, but didn't understand. Can you please explain?

> Given there's some form of communication between user intent and
> ESW the location of the two is completely irrelevant.
I find it difficult to have all attributes on the port function, especially knobs which are very host specific.
A few valid knobs that I see on the host side are
(a) cpu affinity mask
(b) number of msix vectors to consume within driver internally vs map to user space

At present I see knobs on both sides.
Saeed is offline this week, and I want to gather his feedback as well on passing hints from port spawning side to host side.
Parav Pandit Feb. 24, 2022, 3:44 a.m. UTC | #3
Hi Jakub,

> From: Parav Pandit
> Sent: Monday, February 14, 2022 8:16 PM
> 
> > From: Jakub Kicinski <kuba@kernel.org>
> > Sent: Saturday, February 12, 2022 6:43 AM
> >
> > On Fri, 11 Feb 2022 11:20:17 +0200 Moshe Shemesh wrote:
> > > v1->v2:
> > >  - updated example to make clear SF port and SF device creation PFs
> > >  - added example when SF port and device creation PFs are on
> > > different hosts
> >
> > How does this address my comments?
> >
> Which one?
> Your suggestion in [1] to specify vnet during port function spawning time?
> Or
> Your suggestion in [2] to add "noprobe" option?

> Saeed is offline this week, and I want to gather his feedback as well on passing
> hints from port spawning side to host side.

Saeed is back. Saeed, others and I discussed the per SF knob further.
An option to indicate that the SF device should not be initialized at port spawning time is very useful to users.

So, how about crafting the UAPI below?
Example:

Esw side:
$ devlink port add <dev> flavour pcisf .. 
$ devlink port function set <port> hw_addr 00:11:22:33:44:55 initialize false/true ...

The "initialize" option indicates whether a function should fully initialize or not at driver level on the host side.
When initialize=false, the device will spawn but not initialize until a user on the host initializes it using the existing devlink reload API on this SF device.

For example, on host side, a user will be able to do,
$ devlink dev auxiliary/mlx5_core.sf.2 resource/param set
$ devlink dev reload auxiliary/mlx5_core.sf.2