Message ID: 20221019140943.18851-1-simon.horman@corigine.com (mailing list archive)
Series: nfp: support VF multi-queues configuration
On Wed, 19 Oct 2022 16:09:40 +0200 Simon Horman wrote:
> this short series adds the max_vf_queue generic devlink device parameter,
> the intention of this is to allow configuration of the number of queues
> associated with VFs, and facilitates having VFs with different queue
> counts.
>
> The series also adds support for multi-queue VFs to the nfp driver
> and support for the max_vf_queue feature described above.

I appreciate CCing a wider group this time, but my concerns about using
devlink params for resource allocation still stand. I don't remember
anyone refuting that.

https://lore.kernel.org/all/20220921063448.5b0dd32b@kernel.org/
On Wed, Oct 19, 2022 at 06:01:06PM -0700, Jakub Kicinski wrote:
> On Wed, 19 Oct 2022 16:09:40 +0200 Simon Horman wrote:
> > this short series adds the max_vf_queue generic devlink device parameter,
<...>
> I appreciate CCing a wider group this time, but my concerns about using
> devlink params for resource allocation still stand. I don't remember
> anyone refuting that.
>
> https://lore.kernel.org/all/20220921063448.5b0dd32b@kernel.org/

Sorry this part was neglected, we'll take a look into the resource APIs.
Thanks.
On 20 Oct 09:35, Yinjun Zhang wrote:
> On Wed, Oct 19, 2022 at 06:01:06PM -0700, Jakub Kicinski wrote:
<...>
> Sorry this part was neglected, we'll take a look into the resource APIs.
> Thanks.

The problem with this is that this should be a per-function parameter.
Devlink params or resources are not the right place for this, as this
should be a configuration of a specific devlink object that is not the
parent device (namely the devlink port function); otherwise we will have
to deal with ugly string parsing to address the specific VF attributes.

Let's use devlink port:
https://www.kernel.org/doc/html/latest/networking/devlink/devlink-port.html

devlink ports have attributes, and we should extend those attributes to
act like devlink parameters:

devlink port function set DEV/PORT_INDEX [ queue_count count ] ...

https://man7.org/linux/man-pages/man8/devlink-port.8.html

Alternatively you should also consider limiting VF MSI-X, as we did in mlx5:
https://patchwork.kernel.org/project/linux-rdma/cover/20210314124256.70253-1-leon@kernel.org/
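For concreteness, the port-function extension sketched above might be
exercised like this on the command line; note that `queue_count` is
Saeed's proposed attribute, not an existing devlink keyword, and the
device address and port index are placeholders:
```
# Hypothetical extension: "queue_count" is not an upstream devlink
# port-function attribute; device/port names are placeholders.
devlink port show pci/0000:82:00.0/1
devlink port function set pci/0000:82:00.0/1 queue_count 8
```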
On Tue, 25 Oct 2022 08:51:41 +0100, Saeed Mahameed wrote:
> On 20 Oct 09:35, Yinjun Zhang wrote:
<...>
> The problem with this is that this should be a per-function parameter.
> Devlink params or resources are not the right place for this, as this
> should be a configuration of a specific devlink object that is not the
> parent device (namely the devlink port function); otherwise we will have
> to deal with ugly string parsing to address the specific VF attributes.
>
> Let's use devlink port:
> https://www.kernel.org/doc/html/latest/networking/devlink/devlink-port.html
>
> devlink ports have attributes, and we should extend those attributes to
> act like devlink parameters:
>
> devlink port function set DEV/PORT_INDEX [ queue_count count ] ...
>
> https://man7.org/linux/man-pages/man8/devlink-port.8.html

Although the vf-max-queue is a per-VF property, it's configured from the
PF's perspective, so that the overall queue resource can be reallocated
among VFs. So a devlink object attached to the PF is used for the
configuration, and resource seems more appropriate than param.

> Alternatively you should also consider limiting VF MSI-X, as we did in mlx5:
> https://patchwork.kernel.org/project/linux-rdma/cover/20210314124256.70253-1-leon@kernel.org/

MSI-X is not the limitation in our case.
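For reference, the mlx5 MSI-X limiting that Saeed points to was merged as
per-VF sysfs knobs (see Documentation/ABI/testing/sysfs-bus-pci). A usage
sketch, with placeholder PCI addresses:
```
# Per-VF MSI-X assignment as merged for mlx5; PCI addresses are examples.
echo 0  > /sys/bus/pci/devices/0000:82:00.0/sriov_drivers_autoprobe
echo 2  > /sys/bus/pci/devices/0000:82:00.0/sriov_numvfs
cat       /sys/bus/pci/devices/0000:82:00.0/sriov_vf_total_msix
echo 32 > /sys/bus/pci/devices/0000:82:00.2/sriov_vf_msix_count  # VF 0
echo 8  > /sys/bus/pci/devices/0000:82:00.3/sriov_vf_msix_count  # VF 1
```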
On 25 Oct 10:41, Yinjun Zhang wrote:
> On Tue, 25 Oct 2022 08:51:41 +0100, Saeed Mahameed wrote:
<...>
> Although the vf-max-queue is a per-VF property, it's configured from the
> PF's perspective, so that the overall queue resource can be reallocated
> among VFs. So a devlink object attached to the PF is used for the
> configuration, and resource seems more appropriate than param.

devlink port function is an object that's exposed on the PF. It will give
you a handle on the PF side to every sub-function (VF/SF) exposed via the
PF.

Can you provide an example of how you imagine the resource vf-max-queue
API will look?
On Tue, 25 Oct 2022 12:05:14 +0100, Saeed Mahameed wrote:
> On 25 Oct 10:41, Yinjun Zhang wrote:
<...>
> devlink port function is an object that's exposed on the PF. It will give
> you a handle on the PF side to every sub-function (VF/SF) exposed via the
> PF.

Sorry, I thought you meant each VF creates a devlink obj. So still one
devlink obj, and each VF registers a devlink port, right? But the
configuration is supposed to be done before the VFs are created; it may
not be appropriate to register ports before the relevant VFs are created,
I think.

> Can you provide an example of how you imagine the resource vf-max-queue
> API will look?

Two options. One is from the VF's perspective; you configure the VFs one
by one, very straightforward:
```
pci/xxxx:xx:xx.x:
  name max_q size 128 unit entry
  resources:
    name VF0 size 1 unit entry size_min 1 size_max 128 size_gran 1
    name VF1 size 1 unit entry size_min 1 size_max 128 size_gran 1
    ...
```
The other is from the queue's perspective; several classes are supported,
not very flexible:
```
pci/xxxx:xx:xx.x:
  name max_q_class size 128 unit entry
  resources:
    # means how many VFs possess a max-q-number of 16/8/../1 respectively
    name _16 size 0 unit entry size_min 0 size_max 128 size_gran 1
    name _8 size 0 unit entry size_min 0 size_max 128 size_gran 1
    ...
    name _1 size 0 unit entry size_min 0 size_max 128 size_gran 1
```
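If a driver registered option one's layout, it could presumably be driven
with the existing `devlink resource` commands; the `/max_q/VFx` paths
follow the sketch above and are hypothetical, and the device address is a
placeholder:
```
# Hypothetical use of the existing devlink resource interface with the
# max_q/VFx layout sketched above; device address is a placeholder.
devlink resource show pci/0000:01:00.0
devlink resource set pci/0000:01:00.0 path /max_q/VF0 size 16
devlink resource set pci/0000:01:00.0 path /max_q/VF1 size 4
devlink dev reload pci/0000:01:00.0   # resource changes apply on reload
```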
On 25 Oct 11:39, Yinjun Zhang wrote:
> On Tue, 25 Oct 2022 12:05:14 +0100, Saeed Mahameed wrote:
<...>
> Sorry, I thought you meant each VF creates a devlink obj. So still one
> devlink obj, and each VF registers a devlink port, right? But the
> configuration is supposed to be done before the VFs are created; it may
> not be appropriate to register ports before the relevant VFs are created,
> I think.

Usually you create the VFs unbound, configure them, and then bind them.
Otherwise a query will have to cover every possible VF, which for some
vendors can be thousands! It's better to work on created but not yet
deployed VFs.

> Two options. One is from the VF's perspective; you configure the VFs one
> by one, very straightforward:
> ```
> pci/xxxx:xx:xx.x:
>   name max_q size 128 unit entry
>   resources:
>     name VF0 size 1 unit entry size_min 1 size_max 128 size_gran 1
>     name VF1 size 1 unit entry size_min 1 size_max 128 size_gran 1
>     ...

The above semantics are really weird: VF0 can't be a sub-resource of max_q!

Sorry, I can't think of a way where devlink resource semantics can work
for VF resource allocation. Unless a VF becomes a resource and its q_table
becomes a sub-resource of that VF, which means you will have to register
each VF as a resource individually.

Note that I called the resource "q_table" and not "max_queues", since
semantically max_queues is a parameter, whereas q_table can be looked at
as a sub-resource of the VF; the q_table size decides the max_queues a VF
will accept. So there you go! Arghh, weird.. just make it an attribute of
devlink port function and name it max_q as god intended it to be ;). Fix
your FW to allow changing the VF max queue count for unbound VFs if needed.

> The other is from the queue's perspective; several classes are supported,
> not very flexible:
> ```
> pci/xxxx:xx:xx.x:
>   name max_q_class size 128 unit entry
>   resources:
>     # means how many VFs possess a max-q-number of 16/8/../1 respectively
>     name _16 size 0 unit entry size_min 0 size_max 128 size_gran 1
<...>

Weirder.
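Rendered in the same style as Yinjun's sketches, the VF-as-resource layout
Saeed describes (each VF registered individually, with q_table as its
sub-resource) would presumably read something like this; sizes are
illustrative:
```
pci/xxxx:xx:xx.x:
  name VF0 size 1 unit entry
  resources:
    name q_table size 16 unit entry size_min 1 size_max 128 size_gran 1
  name VF1 size 1 unit entry
  resources:
    name q_table size 4 unit entry size_min 1 size_max 128 size_gran 1
  ...
```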
On Wed, 26 Oct 2022 15:22:21 +0100 Saeed Mahameed wrote:
> > Sorry, I thought you meant each VF creates a devlink obj. So still one
> > devlink obj, and each VF registers a devlink port, right? But the
> > configuration is supposed to be done before the VFs are created; it may
> > not be appropriate to register ports before the relevant VFs are
> > created, I think.
>
> Usually you create the VFs unbound, configure them, and then bind them.
> Otherwise a query will have to cover every possible VF, which for some
> vendors can be thousands! It's better to work on created but not yet
> deployed VFs.

And the vendors who need to configure before spawning will do what,
exactly? Let's be practical.

> The above semantics are really weird: VF0 can't be a sub-resource of
> max_q!
>
> Sorry, I can't think of a way where devlink resource semantics can work
> for VF resource allocation.

It's just an API section. New forms of configuration can be added.
In fact they should be, so we can stop having this conversation.

> Unless a VF becomes a resource and its q_table becomes a sub-resource of
> that VF, which means you will have to register each VF as a resource
> individually.
>
> Note that I called the resource "q_table" and not "max_queues", since
> semantically max_queues is a parameter, whereas q_table can be looked at
> as a sub-resource of the VF; the q_table size decides the max_queues a VF
> will accept. So there you go!

Somewhere along the way you missed the other requirement to also allow
configuring a guaranteed count, which came from brcm at some point.

> Arghh, weird.. just make it an attribute of devlink port function and
> name it max_q as god intended it to be ;)

Let's not equate what fits the odd world of Melvidia FW with god.

> Fix your FW to allow changing the VF max queue count for unbound VFs if
> needed.

Not every device out there is all FW. Thankfully.
On 26 Oct 13:07, Jakub Kicinski wrote:
> On Wed, 26 Oct 2022 15:22:21 +0100 Saeed Mahameed wrote:
<...>
> And the vendors who need to configure before spawning will do what,
> exactly? Let's be practical.
<...>
> Somewhere along the way you missed the other requirement to also allow
> configuring a guaranteed count, which came from brcm at some point.

max_q and guarantee can be two separate attributes of a devlink port.
Anyway, see below before you answer, because I don't really care whether
it's devlink port or resource, but please don't allow the two suggested
devlink resource options to make it upstream as is. Please, let's be
practical and correct.

> > Fix your FW to allow changing the VF max queue count for unbound VFs
> > if needed.
>
> Not every device out there is all FW. Thankfully.

This has nothing to do with FW; all devices should be flexible enough to
allow configuring VFs after spawning.

Anyway, my point is that max_q isn't a resource, period; you have to come
up with the "q_table" concept. And sure, change the API to make it fit
your needs, that's acceptable when you want it to be acceptable, fine, but
please don't allow wrong semantics. I am interested in this feature for
mlx5 as well; I don't care whether it's going to be resource or port, all
I care about is using the right semantics. For that I suggested the
devlink port option and the right semantics for the devlink resource
option, so I am being practical.
On Wed, 26 Oct 2022 15:22:21 +0100, Saeed Mahameed wrote:
> On 25 Oct 11:39, Yinjun Zhang wrote:
<...>
> Usually you create the VFs unbound, configure them, and then bind them.
> Otherwise a query will have to cover every possible VF, which for some
> vendors can be thousands! It's better to work on created but not yet
> deployed VFs.

Usually creating and binding are not separated; that's why
`sriov_drivers_autoprobe` defaults to true, unless some particular
configuration requires the separation, like mlx5's MSI-X practice.

> The above semantics are really weird: VF0 can't be a sub-resource of
> max_q!

Sorry, I admit the naming is not appropriate here. It should be
"total_q_for_VF" for the parent resource, and "q_for_VFx" for the
sub-resources.

> Note that I called the resource "q_table" and not "max_queues", since
> semantically max_queues is a parameter, whereas q_table can be looked at
> as a sub-resource of the VF; the q_table size decides the max_queues a VF
> will accept. So there you go!

Queue itself is a kind of resource, so why "q_table"? Just because the
unit is entry? I think we need to introduce a new generic unit, so that
its usage won't be limited.

> Arghh, weird.. just make it an attribute of devlink port function and
> name it max_q as god intended it to be ;). Fix your FW to allow changing
> the VF max queue count for unbound VFs if needed.

It's not a FW constraint. The reason I don't prefer the port way is that
it needs:
1. separating VF creation and binding, which needs extra operations
2. registering extra ports for the VFs
Both can be avoided when using the resource way.

> > The other is from the queue's perspective; several classes are
> > supported, not very flexible:
<...>
> Weirder.

Yes, kind of obscure. The intention is to avoid configuring the VFs one by
one, especially when there are thousands of them. Any better idea is
welcome.
On Thu, Oct 27, 2022 at 02:11:55AM +0000, Yinjun Zhang wrote:
> On Wed, 26 Oct 2022 15:22:21 +0100, Saeed Mahameed wrote:
<...>
> Usually creating and binding are not separated; that's why
> `sriov_drivers_autoprobe` defaults to true.

No, the situation is completely the opposite in a world which heavily uses
SR-IOV. Data centers which rely on SR-IOV to provide VMs to customers
separate creation and bind, as they are two different stages in a
container/VM's lifetime. Creation is done when the physical server is
booted, and bind is done when a user requests specific properties for a
VM. Various container orchestration frameworks do it for them.

> that's why `sriov_drivers_autoprobe` defaults to true.

That is not true either. The default value is chosen to keep the kernel's
behavior backward compatible.

> unless some particular configuration requires the separation, like
> mlx5's MSI-X practice.

And that is not true either. I did the MLNX implementation to be aligned
with the PCI spec and the in-kernel PCI subsystem implementation. Our
device can change MSI-X on the fly, and from a HW perspective unbind is
not important.

Saeed is right: "Usually you create the VFs unbound, configure them, and
then bind them."

Thanks
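The create -> configure -> bind flow Leon describes uses standard SR-IOV
sysfs knobs; a minimal sketch, assuming placeholder PCI addresses and
driver name:
```
# Create VFs unbound, configure them, then bind; the PF address, VF
# address, and driver name are placeholders.
echo 0 > /sys/bus/pci/devices/0000:82:00.0/sriov_drivers_autoprobe
echo 4 > /sys/bus/pci/devices/0000:82:00.0/sriov_numvfs   # VFs exist, unbound
# ... per-VF configuration (queues, MSI-X, MAC, ...) happens here ...
echo 0000:82:00.2 > /sys/bus/pci/drivers/nfp/bind          # deploy VF 0
```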
On Thu, Oct 27, 2022 at 02:11:55AM +0000, Yinjun Zhang wrote:
> On Wed, 26 Oct 2022 15:22:21 +0100, Saeed Mahameed wrote:
<...>
> > > The other is from the queue's perspective; several classes are
> > > supported, not very flexible:
<...>
> > Weirder.
>
> Yes, kind of obscure. The intention is to avoid configuring the VFs one
> by one, especially when there are thousands of them. Any better idea is
> welcome.

In parallel, in the netdev world we extended the netlink UAPI to receive
ranges and gave users an option to write something like "cmd ... 1-100
...", which in the kernel was translated to a loop from 1 to 100. Of
course, we are talking about very specific fields (IDs) which can receive
a range.

It is just an idea of how devlink could be extended to support batch
configuration of the same values.

Thanks
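In devlink terms, such a range syntax might look like the line below; this
is purely illustrative, and neither the range form nor the `queue_count`
attribute exists in devlink as of this thread:
```
# Purely hypothetical batch syntax in the spirit of Leon's suggestion.
devlink port function set pci/0000:01:00.0/1-100 queue_count 8
```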
On 27 Oct 02:11, Yinjun Zhang wrote:
> On Wed, 26 Oct 2022 15:22:21 +0100, Saeed Mahameed wrote:
<...>
> Queue itself is a kind of resource, so why "q_table"? Just because the
> unit is entry? I think we need to introduce a new generic unit, so that
> its usage won't be limited.

It's all about semantics: yes, a queue is a resource, but max_q is not.
If you want to go with queues as a resource, then you will have to start
assigning individual queues to VFs one by one.. hence a q_table per VF
will make it easier to control the queue table size per VF, with a max
size and a guaranteed size.

> It's not a FW constraint. The reason I don't prefer the port way is that
> it needs:
> 1. separating VF creation and binding, which needs extra operations
> 2. registering extra ports for the VFs
> Both can be avoided when using the resource way.

Thanks, good to know it's not a FW/ASIC constraint. I am trying to push
for one unified orchestration model for all VFs, SFs, and the upcoming
Intel SIOV functions:

create->configure->deploy. This aligns with all standard virtualization
orchestration models: libvirt, k8s, etc.

Again, I am worried we will have to support a config query for ALL
possible functions prior to creation.

Anyway, I am flexible. I am OK with having a configure option prior to
creation as long as it doesn't create clutter or user confusion, and it's
semantically correct. We can also extend the devlink port API to allow
configure ops on "future" ports, and we can always extend the API to
accept a yaml file, as an extension of what Jakub suggested at LPC, to
avoid one-by-one configuration.
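For comparison, the create -> configure -> deploy sequence is already how
mlx5 sub-functions are orchestrated upstream (see
Documentation/networking/devlink/devlink-port.rst); addresses and indices
below are examples:
```
# SF lifecycle as documented for mlx5; addresses/indices are examples.
devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
devlink port function set pci/0000:06:00.0/32768 hw_addr 00:00:00:00:88:88
devlink port function set pci/0000:06:00.0/32768 state active
```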
On Thu, 27 Oct 2022 09:46:54 +0100, Saeed Mahameed wrote:
<...>
> If you want to go with queues as a resource, then you will have to start
> assigning individual queues to VFs one by one.. hence a q_table per VF
> will make it easier to control the queue table size per VF, with a max
> size and a guaranteed size.

Excuse my foolishness, I still don't get your q_table. What I want is to
allocate a certain amount of queues from a queue pool to different VFs;
can you provide an example of q_table?

<...>
> Anyway, I am flexible. I am OK with having a configure option prior to
> creation as long as it doesn't create clutter or user confusion, and
> it's semantically correct.

Thanks for your OK, and thanks to Leon's explanation, I understand your
create->config->deploy proposal. But I have to say the resource way
doesn't break it: you can configure after creating, and it's not
constrained to that, you can configure before creating as well.

> We can also extend the devlink port API to allow configure ops on
> "future" ports, and we can always extend the API to accept a yaml file,
> as an extension of what Jakub suggested at LPC, to avoid one-by-one
> configuration.
On 27 Oct 09:46, Yinjun Zhang wrote:
> On Thu, 27 Oct 2022 09:46:54 +0100, Saeed Mahameed wrote:
<...>
> Excuse my foolishness, I still don't get your q_table. What I want is to
> allocate a certain amount of queues from a queue pool to different VFs;
> can you provide an example of q_table?

Queue pool and queue table are the same concept. A queue as a resource is
an individual entity, so it can't be used as the abstraction. For
simplicity you can just call the resource "queues", plural, maybe..

Also, do we want to manage different queue types separately?
Rx/Tx/CQ/EQ/command, etc.? How about the individual max sizes of these
queues?

BTW, do you plan to have further customization per VF? Not resource-
oriented customization in particular, but more like capability-control
types of configs, for example enabling/disabling crypto offloads per VF,
admin policies, etc. If so, I would be happy if we collaborated on
defining the APIs.

<...>
> Thanks for your OK, and thanks to Leon's explanation, I understand your
> create->config->deploy proposal. But I have to say the resource way
> doesn't break it: you can configure after creating, and it's not
> constrained to that, you can configure before creating as well.

Think about the poor admin who will need to have different config steps
for different port flavors.
On 2022/10/27 18:49, Saeed Mahameed wrote:
> On 27 Oct 09:46, Yinjun Zhang wrote:
<...>
> Queue pool and queue table are the same concept. A queue as a resource is
> an individual entity, so it can't be used as the abstraction. For
> simplicity you can just call the resource "queues", plural, maybe..

I see, you're always standing at the VF's point of view, so "queues" is
just one of the VF's properties, and the port way sounds better indeed.
And I'm standing at the queues' point of view, and VFs happen to be their
recipients.

> Also, do we want to manage different queue types separately?
> Rx/Tx/CQ/EQ/command, etc.?

We don't expect those variants in our case for now.

> How about the individual max sizes of these queues?
> BTW, do you plan to have further customization per VF? Not resource-
> oriented customization in particular, but more like capability-control
> types of configs, for example enabling/disabling crypto offloads per VF,
> admin policies, etc. If so, I would be happy if we collaborated on
> defining the APIs.

Not scheduled either. I think it's better to bring the port param back;
it's more flexible than port function attributes and can avoid frequent
changes to the uAPI.
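If the port-param route were taken, the existing `devlink port param`
interface could presumably carry it; a sketch where the `max_q` parameter
name is hypothetical and the device/port are placeholders:
```
# Hypothetical per-port parameter via the existing "devlink port param"
# interface; "max_q" is not an upstream parameter.
devlink port param set pci/0000:01:00.0/1 name max_q value 16 cmode runtime
devlink port param show pci/0000:01:00.0/1 name max_q
```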