Message ID | 20230206153603.2801791-2-simon.horman@corigine.com (mailing list archive) |
---|---|
State | RFC |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | devlink: expose port function commands to assign VFs to multiple devlink | expand |
On Mon, 6 Feb 2023 16:36:02 +0100 Simon Horman wrote: > +VF assignment setup > +--------------------------- > +In some cases, NICs could have multiple physical ports per PF. Users can assign VFs to > +different ports. Please make sure you run make htmldocs when changing docs, this will warn. > +- Get count of VFs assigned to physical port:: > + > + $ devlink port show pci/0000:82:00.0/0 > + pci/0000:82:00.0/0: type eth netdev enp130s0np0 flavour physical port 0 splittable true lanes 4 Physical port has VFs? My knee jerk reaction is that allocating resources via devlink is fine but this seems to lean a bit into forwarding. How do other vendors do it? What's the mapping of VFs to ports? What do you suggest should happen when user enables switchdev mode?
On Mon, Feb 06, 2023 at 06:42:27PM -0800, Jakub Kicinski wrote: > On Mon, 6 Feb 2023 16:36:02 +0100 Simon Horman wrote: > > +VF assignment setup > > +--------------------------- > > +In some cases, NICs could have multiple physical ports per PF. Users can assign VFs to > > +different ports. > > Please make sure you run make htmldocs when changing docs, > this will warn. Thanks, will do. > > +- Get count of VFs assigned to physical port:: > > + > > + $ devlink port show pci/0000:82:00.0/0 > > + pci/0000:82:00.0/0: type eth netdev enp130s0np0 flavour physical port 0 splittable true lanes 4 > > Physical port has VFs? My knee jerk reaction is that allocating > resources via devlink is fine but this seems to lean a bit into > forwarding. How do other vendors do it? What's the mapping of VFs > to ports? We are not aware of any non-vendor-specific mechanism. If there is one we'd gladly consider it. > What do you suggest should happen when user enables switchdev mode? Thanks for pointing this out. It ought to be documented. Prior to this patch-set, for all NFP application firmwares supported by upstream (and I'm not implying there is anything sneaky elsewhere - I am honestly not aware of any such things), the embedded switch on the NIC (which as you know is software running on the NIC), allows forwarding of packets between VFs and physical ports without any partitioning - other than what policy may implement. This remains the behaviour if the feature proposed by this patch-set is not enabled, either because it is unsupported by the application firmware, or has not been enabled, i.e. using devlink. If the feature is enabled, then in effect there is a partition between VFs on one physical port and those on another. And some sort of forwarding in software (on the host) is required. It is my understanding that this is the dominant behaviour amongst multi-port NICs from other vendors which, by default (and perhaps can only), associate VFs with specific physical ports. This is my understanding of things. And I believe this applies in both switchdev and legacy mode.
On Mon, Feb 06, 2023 at 06:42:27PM -0800, Jakub Kicinski wrote: > On Mon, 6 Feb 2023 16:36:02 +0100 Simon Horman wrote: > > +VF assignment setup > > +--------------------------- > > +In some cases, NICs could have multiple physical ports per PF. Users can assign VFs to > > +different ports. > > Please make sure you run make htmldocs when changing docs, > this will warn. > > > +- Get count of VFs assigned to physical port:: > > + > > + $ devlink port show pci/0000:82:00.0/0 > > + pci/0000:82:00.0/0: type eth netdev enp130s0np0 flavour physical port 0 splittable true lanes 4 > > Physical port has VFs? My knee jerk reaction is that allocating > resources via devlink is fine but this seems to lean a bit into > forwarding. How do other vendors do it? What's the mapping of VFs > to ports? I don't understand the meaning of VFs here. If we are talking about PCI VFs, other vendors follow PCI spec "9.3.3.3.1 VF Enable" section, which talks about having one bit to enable all VFs at once. All these VFs will have separate netdevs. > > What do you suggest should happen when user enables switchdev mode? >
On Wed, Feb 08, 2023 at 01:21:22PM +0200, Leon Romanovsky wrote: > On Mon, Feb 06, 2023 at 06:42:27PM -0800, Jakub Kicinski wrote: > > On Mon, 6 Feb 2023 16:36:02 +0100 Simon Horman wrote: > > > +VF assignment setup > > > +--------------------------- > > > +In some cases, NICs could have multiple physical ports per PF. Users can assign VFs to > > > +different ports. > > > > Please make sure you run make htmldocs when changing docs, > > this will warn. > > > > > +- Get count of VFs assigned to physical port:: > > > + > > > + $ devlink port show pci/0000:82:00.0/0 > > > + pci/0000:82:00.0/0: type eth netdev enp130s0np0 flavour physical port 0 splittable true lanes 4 > > > > Physical port has VFs? My knee jerk reaction is that allocating > > resources via devlink is fine but this seems to lean a bit into > > forwarding. How do other vendors do it? What's the mapping of VFs > > to ports? > > I don't understand the meaning of VFs here. If we are talking about PCI > VFs, other vendors follow PCI spec "9.3.3.3.1 VF Enable" section, which > talks about having one bit to enable all VFs at once. All these VFs will > have separate netdevs. Yes, that is the case here too (before and after). What we are talking about is the association of VFs to physical ports (in the case where a NIC has more than one physical port).
Mon, Feb 06, 2023 at 04:36:02PM CET, simon.horman@corigine.com wrote: >From: Fei Qin <fei.qin@corigine.com> > >Multiple physical ports of the same NIC may share the single >PCI address. In some cases, assigning VFs to different physical >ports can be demanded, especially under high-traffic scenario. >Load balancing can be realized in virtualised use¬cases through >distributing packets between different physical ports with LAGs >of VFs which are assigned to those physical ports. > >This patch adds new attribute "vf_count" to 'devlink port function' >API which only can be shown and configured under devlink ports >with flavor "DEVLINK_PORT_FLAVOUR_PHYSICAL". I have to be missing something. That is the meaning of "assigning VF" to a physical port? Why there should be any relationship between physical port and VF other than configured forwarding (using TC for example)? This seems very wrong. Preliminary NAK. > >e.g. >$ devlink port show pci/0000:82:00.0/0 >pci/0000:82:00.0/0: type eth netdev enp130s0np0 flavour physical >port 0 splittable true lanes 4 > function: > vf_count 0 > >$ devlink port function set pci/0000:82:00.0/0 vf_count 3 > >$ devlink port show pci/0000:82:00.0/0 >pci/0000:82:00.0/0: type eth netdev enp130s0np0 flavour physical >port 0 splittable true lanes 4 > function: > vf_count 3 > >Signed-off-by: Fei Qin <fei.qin@corigine.com> >Signed-off-by: Simon Horman <simon.horman@corigine.com>
Wed, Feb 08, 2023 at 12:36:53PM CET, simon.horman@corigine.com wrote: >On Wed, Feb 08, 2023 at 01:21:22PM +0200, Leon Romanovsky wrote: >> On Mon, Feb 06, 2023 at 06:42:27PM -0800, Jakub Kicinski wrote: >> > On Mon, 6 Feb 2023 16:36:02 +0100 Simon Horman wrote: >> > > +VF assignment setup >> > > +--------------------------- >> > > +In some cases, NICs could have multiple physical ports per PF. Users can assign VFs to >> > > +different ports. >> > >> > Please make sure you run make htmldocs when changing docs, >> > this will warn. >> > >> > > +- Get count of VFs assigned to physical port:: >> > > + >> > > + $ devlink port show pci/0000:82:00.0/0 >> > > + pci/0000:82:00.0/0: type eth netdev enp130s0np0 flavour physical port 0 splittable true lanes 4 >> > >> > Physical port has VFs? My knee jerk reaction is that allocating >> > resources via devlink is fine but this seems to lean a bit into >> > forwarding. How do other vendors do it? What's the mapping of VFs >> > to ports? >> >> I don't understand the meaning of VFs here. If we are talking about PCI >> VFs, other vendors follow PCI spec "9.3.3.3.1 VF Enable" section, which >> talks about having one bit to enable all VFs at once. All these VFs will >> have separate netdevs. > >Yes, that is the case here too (before and after). > >What we are talking about is the association of VFs to physical ports >(in the case where a NIC has more than one physical port). What is "the association"?
On Wed, Feb 08, 2023 at 12:36:53PM +0100, Simon Horman wrote: > On Wed, Feb 08, 2023 at 01:21:22PM +0200, Leon Romanovsky wrote: > > On Mon, Feb 06, 2023 at 06:42:27PM -0800, Jakub Kicinski wrote: > > > On Mon, 6 Feb 2023 16:36:02 +0100 Simon Horman wrote: > > > > +VF assignment setup > > > > +--------------------------- > > > > +In some cases, NICs could have multiple physical ports per PF. Users can assign VFs to > > > > +different ports. > > > > > > Please make sure you run make htmldocs when changing docs, > > > this will warn. > > > > > > > +- Get count of VFs assigned to physical port:: > > > > + > > > > + $ devlink port show pci/0000:82:00.0/0 > > > > + pci/0000:82:00.0/0: type eth netdev enp130s0np0 flavour physical port 0 splittable true lanes 4 > > > > > > Physical port has VFs? My knee jerk reaction is that allocating > > > resources via devlink is fine but this seems to lean a bit into > > > forwarding. How do other vendors do it? What's the mapping of VFs > > > to ports? > > > > I don't understand the meaning of VFs here. If we are talking about PCI > > VFs, other vendors follow PCI spec "9.3.3.3.1 VF Enable" section, which > > talks about having one bit to enable all VFs at once. All these VFs will > > have separate netdevs. > > Yes, that is the case here too (before and after). > > What we are talking about is the association of VFs to physical ports > (in the case where a NIC has more than one physical port). We have devices with multiple ports too, but don't have such issues. So it will help if you can provide more context here. I'm failing to see connection between physical ports and physical VFs. Are you saying that physical ports are actual PCI VFs, which spans L2 VFs, which you want to assign to another port (PF)? Thanks
On Wed, Feb 08, 2023 at 01:53:48PM +0200, Leon Romanovsky wrote: > On Wed, Feb 08, 2023 at 12:36:53PM +0100, Simon Horman wrote: > > On Wed, Feb 08, 2023 at 01:21:22PM +0200, Leon Romanovsky wrote: > > > On Mon, Feb 06, 2023 at 06:42:27PM -0800, Jakub Kicinski wrote: > > > > On Mon, 6 Feb 2023 16:36:02 +0100 Simon Horman wrote: > > > > > +VF assignment setup > > > > > +--------------------------- > > > > > +In some cases, NICs could have multiple physical ports per PF. Users can assign VFs to > > > > > +different ports. > > > > > > > > Please make sure you run make htmldocs when changing docs, > > > > this will warn. > > > > > > > > > +- Get count of VFs assigned to physical port:: > > > > > + > > > > > + $ devlink port show pci/0000:82:00.0/0 > > > > > + pci/0000:82:00.0/0: type eth netdev enp130s0np0 flavour physical port 0 splittable true lanes 4 > > > > > > > > Physical port has VFs? My knee jerk reaction is that allocating > > > > resources via devlink is fine but this seems to lean a bit into > > > > forwarding. How do other vendors do it? What's the mapping of VFs > > > > to ports? > > > > > > I don't understand the meaning of VFs here. If we are talking about PCI > > > VFs, other vendors follow PCI spec "9.3.3.3.1 VF Enable" section, which > > > talks about having one bit to enable all VFs at once. All these VFs will > > > have separate netdevs. > > > > Yes, that is the case here too (before and after). > > > > What we are talking about is the association of VFs to physical ports > > (in the case where a NIC has more than one physical port). > > We have devices with multiple ports too, but don't have such issues. > So it will help if you can provide more context here. > > I'm failing to see connection between physical ports and physical VFs. > > Are you saying that physical ports are actual PCI VFs, which spans L2 VFs, > which you want to assign to another port (PF)? No, a physical port is not a VF (nor a PF, FWIIW). The topic here is about two modes of behaviour. One, where VFs are associated with physical ports - conceptually one might think of this as some VFs and one phys port being plugged into one VEB, while other VFs and another phys port are plugged into another VEB. I believe this is the mode on most multi-port NICs. And another mode where all VFs are associated with one physical port, even if more than one is present. The NFP currently implements this model. This patch is about allowing NICs, in particular the NFP based NICs, to switch modes.
On Wed, Feb 08, 2023 at 12:40:45PM +0100, Jiri Pirko wrote: > Mon, Feb 06, 2023 at 04:36:02PM CET, simon.horman@corigine.com wrote: > >From: Fei Qin <fei.qin@corigine.com> > > > >Multiple physical ports of the same NIC may share the single > >PCI address. In some cases, assigning VFs to different physical > >ports can be demanded, especially under high-traffic scenario. > >Load balancing can be realized in virtualised use¬cases through > >distributing packets between different physical ports with LAGs > >of VFs which are assigned to those physical ports. > > > >This patch adds new attribute "vf_count" to 'devlink port function' > >API which only can be shown and configured under devlink ports > >with flavor "DEVLINK_PORT_FLAVOUR_PHYSICAL". > > I have to be missing something. That is the meaning of "assigning VF" > to a physical port? Why there should be any relationship between > physical port and VF other than configured forwarding (using TC for > example)? > > This seems very wrong. Preliminary NAK. Of course if TC is involved, then we have flexibility. What we are talking about here is primarily legacy mode. And the behaviour described would, when enabled allow NFP based NICs to behave more like most other multi-port NICs. That is, we can envisage a VEB with some VFs and one physical port. And anther with other VFs and another physical port. This is as opposed to a single VEB with all VFs, as is currently the case on NFP based NICs (but not most other multi-port NICs).
On Wed, Feb 08, 2023 at 12:41:45PM +0100, Jiri Pirko wrote: > Wed, Feb 08, 2023 at 12:36:53PM CET, simon.horman@corigine.com wrote: > >On Wed, Feb 08, 2023 at 01:21:22PM +0200, Leon Romanovsky wrote: > >> On Mon, Feb 06, 2023 at 06:42:27PM -0800, Jakub Kicinski wrote: > >> > On Mon, 6 Feb 2023 16:36:02 +0100 Simon Horman wrote: > >> > > +VF assignment setup > >> > > +--------------------------- > >> > > +In some cases, NICs could have multiple physical ports per PF. Users can assign VFs to > >> > > +different ports. > >> > > >> > Please make sure you run make htmldocs when changing docs, > >> > this will warn. > >> > > >> > > +- Get count of VFs assigned to physical port:: > >> > > + > >> > > + $ devlink port show pci/0000:82:00.0/0 > >> > > + pci/0000:82:00.0/0: type eth netdev enp130s0np0 flavour physical port 0 splittable true lanes 4 > >> > > >> > Physical port has VFs? My knee jerk reaction is that allocating > >> > resources via devlink is fine but this seems to lean a bit into > >> > forwarding. How do other vendors do it? What's the mapping of VFs > >> > to ports? > >> > >> I don't understand the meaning of VFs here. If we are talking about PCI > >> VFs, other vendors follow PCI spec "9.3.3.3.1 VF Enable" section, which > >> talks about having one bit to enable all VFs at once. All these VFs will > >> have separate netdevs. > > > >Yes, that is the case here too (before and after). > > > >What we are talking about is the association of VFs to physical ports > >(in the case where a NIC has more than one physical port). > > What is "the association"? My current explanation - and I'm sure I can dig and find others - is to association == plugged into same VEB. But I feel that description will lead to further confusion :(
Wed, Feb 08, 2023 at 01:07:54PM CET, simon.horman@corigine.com wrote: >On Wed, Feb 08, 2023 at 12:40:45PM +0100, Jiri Pirko wrote: >> Mon, Feb 06, 2023 at 04:36:02PM CET, simon.horman@corigine.com wrote: >> >From: Fei Qin <fei.qin@corigine.com> >> > >> >Multiple physical ports of the same NIC may share the single >> >PCI address. In some cases, assigning VFs to different physical >> >ports can be demanded, especially under high-traffic scenario. >> >Load balancing can be realized in virtualised use¬cases through >> >distributing packets between different physical ports with LAGs >> >of VFs which are assigned to those physical ports. >> > >> >This patch adds new attribute "vf_count" to 'devlink port function' >> >API which only can be shown and configured under devlink ports >> >with flavor "DEVLINK_PORT_FLAVOUR_PHYSICAL". >> >> I have to be missing something. That is the meaning of "assigning VF" >> to a physical port? Why there should be any relationship between >> physical port and VF other than configured forwarding (using TC for >> example)? >> >> This seems very wrong. Preliminary NAK. > >Of course if TC is involved, then we have flexibility. > >What we are talking about here is primarily legacy mode. I don't see any reason to add knobs for purpose of supporting the legacy mode, sorry. If you need this functionality, use TC. >And the behaviour described would, when enabled allow NFP based NICs >to behave more like most other multi-port NICs. > >That is, we can envisage a VEB with some VFs and one physical port. >And anther with other VFs and another physical port. > >This is as opposed to a single VEB with all VFs, as is currently >the case on NFP based NICs (but not most other multi-port NICs). >
On Wed, Feb 08, 2023 at 01:34:19PM +0100, Jiri Pirko wrote: > Wed, Feb 08, 2023 at 01:07:54PM CET, simon.horman@corigine.com wrote: > >On Wed, Feb 08, 2023 at 12:40:45PM +0100, Jiri Pirko wrote: > >> Mon, Feb 06, 2023 at 04:36:02PM CET, simon.horman@corigine.com wrote: > >> >From: Fei Qin <fei.qin@corigine.com> > >> > > >> >Multiple physical ports of the same NIC may share the single > >> >PCI address. In some cases, assigning VFs to different physical > >> >ports can be demanded, especially under high-traffic scenario. > >> >Load balancing can be realized in virtualised use¬cases through > >> >distributing packets between different physical ports with LAGs > >> >of VFs which are assigned to those physical ports. > >> > > >> >This patch adds new attribute "vf_count" to 'devlink port function' > >> >API which only can be shown and configured under devlink ports > >> >with flavor "DEVLINK_PORT_FLAVOUR_PHYSICAL". > >> > >> I have to be missing something. That is the meaning of "assigning VF" > >> to a physical port? Why there should be any relationship between > >> physical port and VF other than configured forwarding (using TC for > >> example)? > >> > >> This seems very wrong. Preliminary NAK. > > > >Of course if TC is involved, then we have flexibility. > > > >What we are talking about here is primarily legacy mode. > > I don't see any reason to add knobs for purpose of supporting the legacy > mode, sorry. > > If you need this functionality, use TC. I understand your point, even if I don't agree in this case. > >And the behaviour described would, when enabled allow NFP based NICs > >to behave more like most other multi-port NICs. > > > >That is, we can envisage a VEB with some VFs and one physical port. > >And anther with other VFs and another physical port. > > > >This is as opposed to a single VEB with all VFs, as is currently > >the case on NFP based NICs (but not most other multi-port NICs). > >
On 08 Feb 13:05, Simon Horman wrote: >On Wed, Feb 08, 2023 at 01:53:48PM +0200, Leon Romanovsky wrote: >> On Wed, Feb 08, 2023 at 12:36:53PM +0100, Simon Horman wrote: >> > On Wed, Feb 08, 2023 at 01:21:22PM +0200, Leon Romanovsky wrote: >> > > On Mon, Feb 06, 2023 at 06:42:27PM -0800, Jakub Kicinski wrote: >> > > > On Mon, 6 Feb 2023 16:36:02 +0100 Simon Horman wrote: >> > > > > +VF assignment setup >> > > > > +--------------------------- >> > > > > +In some cases, NICs could have multiple physical ports per PF. Users can assign VFs to >> > > > > +different ports. >> > > > >> > > > Please make sure you run make htmldocs when changing docs, >> > > > this will warn. >> > > > >> > > > > +- Get count of VFs assigned to physical port:: >> > > > > + >> > > > > + $ devlink port show pci/0000:82:00.0/0 >> > > > > + pci/0000:82:00.0/0: type eth netdev enp130s0np0 flavour physical port 0 splittable true lanes 4 >> > > > >> > > > Physical port has VFs? My knee jerk reaction is that allocating >> > > > resources via devlink is fine but this seems to lean a bit into >> > > > forwarding. How do other vendors do it? What's the mapping of VFs >> > > > to ports? >> > > >> > > I don't understand the meaning of VFs here. If we are talking about PCI >> > > VFs, other vendors follow PCI spec "9.3.3.3.1 VF Enable" section, which >> > > talks about having one bit to enable all VFs at once. All these VFs will >> > > have separate netdevs. >> > >> > Yes, that is the case here too (before and after). >> > >> > What we are talking about is the association of VFs to physical ports >> > (in the case where a NIC has more than one physical port). >> >> We have devices with multiple ports too, but don't have such issues. >> So it will help if you can provide more context here. >> >> I'm failing to see connection between physical ports and physical VFs. >> >> Are you saying that physical ports are actual PCI VFs, which spans L2 VFs, >> which you want to assign to another port (PF)? > >No, a physical port is not a VF (nor a PF, FWIIW). > >The topic here is about two modes of behaviour. > >One, where VFs are associated with physical ports - conceptually one might >think of this as some VFs and one phys port being plugged into one VEB, >while other VFs and another phys port are plugged into another VEB. > >I believe this is the mode on most multi-port NICs. > >And another mode where all VFs are associated with one physical port, >even if more than one is present. The NFP currently implements this model. > >This patch is about allowing NICs, in particular the NFP based NICs, >to switch modes. I don't understand the difference between the two modes, 1) "where VFs are associated with physical ports" 2) "another mode where all VFs are associated with one physical port" anyway here how it works for ConnectX devices, and i think the model should be generalized to others as it simplifies the user life in my opinion. Each physical port is represented by a PCI PF function. devlink dev show pci/0000:81:00.0 #port 1 pci/0000:81:00.1 #port 2 when you enable sriov, you enable it on a specific PF, eventually port, meaning those vfs are only associated with that port. # enable vfs on port 1 echo 4 > /sys/bus/pci/devices/0000:81:00.0/sriov_numvfs # enable vfs on port 2 echo 4 > /sys/bus/pci/devices/0000:81:00.1/sriov_numvfs $ devlink dev show # port 1 PF pci/0000:81:00.0 # port 2 PF pci/0000:81:00.1 # port 1 VFs pci/0000:81:00.2 pci/0000:81:00.3 pci/0000:81:00.4 pci/0000:81:00.5 # port 2 VFs pci/0000:81:01.2 pci/0000:81:01.3 pci/0000:81:01.4 pci/0000:81:01.5 The pci address enumeration is device specific, but i don't think user should care.
On Wed, 8 Feb 2023 13:37:08 -0800 Saeed Mahameed wrote: > I don't understand the difference between the two modes, > 1) "where VFs are associated with physical ports" > 2) "another mode where all VFs are associated with one physical port" > > anyway here how it works for ConnectX devices, and i think the model should > be generalized to others as it simplifies the user life in my opinion. I'm guessing the version of the NFP Simon posted this for behaves much like CX3 / mlx4. One PF, multiple Ethernet ports.
On Wed, 8 Feb 2023 13:34:19 +0100 Jiri Pirko wrote: >> Of course if TC is involved, then we have flexibility. >> >> What we are talking about here is primarily legacy mode. > > I don't see any reason to add knobs for purpose of supporting the legacy > mode, sorry. > > If you need this functionality, use TC. Agreed, I seem to remember that mlx4 had some custom module param to do exactly the same thing. But this is a new addition so we should just say no.
On 08 Feb 15:35, Jakub Kicinski wrote: >On Wed, 8 Feb 2023 13:37:08 -0800 Saeed Mahameed wrote: >> I don't understand the difference between the two modes, >> 1) "where VFs are associated with physical ports" >> 2) "another mode where all VFs are associated with one physical port" >> >> anyway here how it works for ConnectX devices, and i think the model should >> be generalized to others as it simplifies the user life in my opinion. > >I'm guessing the version of the NFP Simon posted this for behaves >much like CX3 / mlx4. One PF, multiple Ethernet ports. Then the question is, can they do PF per port and avoid such complex APIs ?
On Wed, 8 Feb 2023 16:55:12 -0800, Saeed Mahameed wrote: > On 08 Feb 15:35, Jakub Kicinski wrote: > >On Wed, 8 Feb 2023 13:37:08 -0800 Saeed Mahameed wrote: > >> I don't understand the difference between the two modes, > >> 1) "where VFs are associated with physical ports" > >> 2) "another mode where all VFs are associated with one physical port" > >> > >> anyway here how it works for ConnectX devices, and i think the model > should > >> be generalized to others as it simplifies the user life in my opinion. > > > >I'm guessing the version of the NFP Simon posted this for behaves > >much like CX3 / mlx4. One PF, multiple Ethernet ports. > > Then the question is, can they do PF per port and avoid such complex APIs ? > To answer your last question, it needs silicon support, so we can't for some old products. Then let me clarify something more for this patch-set's purpose. Indeed, one port per PF is current mainstream. In this case, all the VFs created from PF0 use physical port 0 as the uplink port(outlet to external world), and all the VFs from PF1 use p1 as the uplink port. Let me call them two switch-sets. And they're isolated, you can't make the traffic input from VFs of PF0 output to p1 or VFs of PF1, right? Even with TC in switchdev mode, the two switch-sets are still isolated, right? Correct me if I'm wrong here. And the posted configuration in this patch-set is useless in this case, it's for one PF with multi ports. Let me take NFP implementation for example here, all the VFs created from the single PF use p0 as the uplink port by default. In legacy mode, by no means we can choose other ports as outlet. So what we're doing here is try to simulate one-port-per-PF case, to split one switch-set to several switch-sets with every physical port as the uplink port respectively, by grouping the VFs and assigning them to physical ports.
Thu, Feb 09, 2023 at 03:20:48AM CET, yinjun.zhang@corigine.com wrote: >On Wed, 8 Feb 2023 16:55:12 -0800, Saeed Mahameed wrote: >> On 08 Feb 15:35, Jakub Kicinski wrote: >> >On Wed, 8 Feb 2023 13:37:08 -0800 Saeed Mahameed wrote: >> >> I don't understand the difference between the two modes, >> >> 1) "where VFs are associated with physical ports" >> >> 2) "another mode where all VFs are associated with one physical port" >> >> >> >> anyway here how it works for ConnectX devices, and i think the model >> should >> >> be generalized to others as it simplifies the user life in my opinion. >> > >> >I'm guessing the version of the NFP Simon posted this for behaves >> >much like CX3 / mlx4. One PF, multiple Ethernet ports. >> >> Then the question is, can they do PF per port and avoid such complex APIs ? >> > >To answer your last question, it needs silicon support, so we can't for some old products. > >Then let me clarify something more for this patch-set's purpose. >Indeed, one port per PF is current mainstream. In this case, all the VFs created from PF0 >use physical port 0 as the uplink port(outlet to external world), and all the VFs from PF1 >use p1 as the uplink port. Let me call them two switch-sets. And they're isolated, you can't >make the traffic input from VFs of PF0 output to p1 or VFs of PF1, right? Even with TC in >switchdev mode, the two switch-sets are still isolated, right? Correct me if I'm wrong here. >And the posted configuration in this patch-set is useless in this case, it's for one PF with >multi ports. > >Let me take NFP implementation for example here, all the VFs created from the single PF >use p0 as the uplink port by default. In legacy mode, by no means we can choose other Legacy is legacy. I believe it is like 5 years already no knobs for legacy mode are accepted. You should not use it for new features. Why this is any different? Implement TC offloading and then you can ballance the hell out of the thing :) >ports as outlet. So what we're doing here is try to simulate one-port-per-PF case, to split >one switch-set to several switch-sets with every physical port as the uplink port respectively, >by grouping the VFs and assigning them to physical ports.
On Thu, 9 Feb 2023 16:15:58 +0100, Jiri Pirko wrote: > Thu, Feb 09, 2023 at 03:20:48AM CET, yinjun.zhang@corigine.com wrote: > > > >Let me take NFP implementation for example here, all the VFs created from the single PF > >use p0 as the uplink port by default. In legacy mode, by no means we can choose other > > Legacy is legacy. I believe it is like 5 years already no knobs for > legacy mode are accepted. You should not use it for new features. > Why this is any different? > > Implement TC offloading and then you can ballance the hell out of the > thing :) I understand in switchdev mode, the fine-grained manipulation by TC can do it. While legacy has fixed forwarding rule, and we hope it can be implemented without too much involved configuration from user if they only want legacy forwarding. As multi-port mapping to one PF NIC is scarce, maybe we should implement is as vendor specific configuration, make sense? > > > >ports as outlet. So what we're doing here is try to simulate one-port-per-PF case, to split > >one switch-set to several switch-sets with every physical port as the uplink port respectively, > >by grouping the VFs and assigning them to physical ports.
On Fri, 10 Feb 2023 02:14:27 +0000 Yinjun Zhang wrote: > I understand in switchdev mode, the fine-grained manipulation by TC can do it. > While legacy has fixed forwarding rule, and we hope it can be implemented without > too much involved configuration from user if they only want legacy forwarding. > > As multi-port mapping to one PF NIC is scarce, maybe we should implement is as > vendor specific configuration, make sense? Vendor extension or not we are disallowing adding configuration for legacy SR-IOV mode. We want people to move to switchdev mode, otherwise we'll have to keep extending both for ever.
Fri, Feb 10, 2023 at 03:14:27AM CET, yinjun.zhang@corigine.com wrote: >On Thu, 9 Feb 2023 16:15:58 +0100, Jiri Pirko wrote: >> Thu, Feb 09, 2023 at 03:20:48AM CET, yinjun.zhang@corigine.com wrote: >> > >> >Let me take NFP implementation for example here, all the VFs created from the single PF >> >use p0 as the uplink port by default. In legacy mode, by no means we can choose other >> >> Legacy is legacy. I believe it is like 5 years already no knobs for >> legacy mode are accepted. You should not use it for new features. >> Why this is any different? >> >> Implement TC offloading and then you can ballance the hell out of the >> thing :) > >I understand in switchdev mode, the fine-grained manipulation by TC can do it. >While legacy has fixed forwarding rule, and we hope it can be implemented without >too much involved configuration from user if they only want legacy forwarding. > >As multi-port mapping to one PF NIC is scarce, maybe we should implement is as >vendor specific configuration, make sense? No, it does not make sense what so ever. You want to extend legacy, which is no longer an option (for many years). If you need this feature, implement switchdev mode for your device. Simple as that. I think this was clearly stated in multiple emails in this thread, I don't follow why it needs to be repeated. > >> >> >> >ports as outlet. So what we're doing here is try to simulate one-port-per-PF case, to split >> >one switch-set to several switch-sets with every physical port as the uplink port respectively, >> >by grouping the VFs and assigning them to physical ports.
diff --git a/Documentation/networking/devlink/devlink-port.rst b/Documentation/networking/devlink/devlink-port.rst index 3da590953ce8..5c3996bce6d9 100644 --- a/Documentation/networking/devlink/devlink-port.rst +++ b/Documentation/networking/devlink/devlink-port.rst @@ -128,6 +128,9 @@ Users may also set the RoCE capability of the function using Users may also set the function as migratable using 'devlink port function set migratable' command. +Users may also assign VFs to physical ports using +'devlink port function set vf_count' command. + Function attributes =================== @@ -240,6 +243,27 @@ Attach VF to the VM. Start the VM. Perform live migration. + +VF assignment setup +--------------------------- +In some cases, NICs could have multiple physical ports per PF. Users can assign VFs to +different ports. + +- Get count of VFs assigned to physical port:: + + $ devlink port show pci/0000:82:00.0/0 + pci/0000:82:00.0/0: type eth netdev enp130s0np0 flavour physical port 0 splittable true lanes 4 + function: + vf_count 0 + +- Set count of VFs assigned to physical port:: + $ devlink port function set pci/0000:82:00.0/0 vf_count 3 + + $ devlink port show pci/0000:82:00.0/0 + pci/0000:82:00.0/0: type eth netdev enp130s0np0 flavour physical port 0 splittable true lanes 4 + function: + vf_count 3 + Subfunction ============ diff --git a/include/net/devlink.h b/include/net/devlink.h index 2e85a5970a32..3e98fa3d251f 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -1491,6 +1491,27 @@ struct devlink_ops { int (*port_fn_migratable_set)(struct devlink_port *devlink_port, bool enable, struct netlink_ext_ack *extack); + + /** + * @port_fn_vf_count_get: Port function's VF count get function + * + * Get assigned VF count of a function managed by the devlink port, + * should only be used for DEVLINK_PORT_FLAVOUR_PHYSICAL. + * Return -EOPNOTSUPP if port function vf_count setup is not supported. + */ + int (*port_fn_vf_count_get)(struct devlink_port *port, u16 *vf_count, + struct netlink_ext_ack *extack); + + /** + * @port_fn_vf_count_set: Port function's VF count set function + * + * Set assigned VF count of a function managed by the devlink port, + * should only be used for DEVLINK_PORT_FLAVOUR_PHYSICAL. + * Return -EOPNOTSUPP if port function vf_count setup is not supported. + */ + int (*port_fn_vf_count_set)(struct devlink_port *port, u16 vf_count, + struct netlink_ext_ack *extack); + /** * port_new() - Add a new port function of a specified flavor * @devlink: Devlink instance diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h index 3782d4219ac9..774e17f6100b 100644 --- a/include/uapi/linux/devlink.h +++ b/include/uapi/linux/devlink.h @@ -676,6 +676,7 @@ enum devlink_port_function_attr { DEVLINK_PORT_FN_ATTR_STATE, /* u8 */ DEVLINK_PORT_FN_ATTR_OPSTATE, /* u8 */ DEVLINK_PORT_FN_ATTR_CAPS, /* bitfield32 */ + DEVLINK_PORT_FN_ATTR_VF_COUNT, /* u16 */ __DEVLINK_PORT_FUNCTION_ATTR_MAX, DEVLINK_PORT_FUNCTION_ATTR_MAX = __DEVLINK_PORT_FUNCTION_ATTR_MAX - 1 diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c index 97d30ea98b00..6dac8b562232 100644 --- a/net/devlink/leftover.c +++ b/net/devlink/leftover.c @@ -141,6 +141,7 @@ static const struct nla_policy devlink_function_nl_policy[DEVLINK_PORT_FUNCTION_ DEVLINK_PORT_FN_STATE_ACTIVE), [DEVLINK_PORT_FN_ATTR_CAPS] = NLA_POLICY_BITFIELD32(DEVLINK_PORT_FN_CAPS_VALID_MASK), + [DEVLINK_PORT_FN_ATTR_VF_COUNT] = { .type = NLA_U16 }, }; #define ASSERT_DEVLINK_PORT_REGISTERED(devlink_port) \ @@ -520,6 +521,35 @@ static int devlink_port_fn_caps_fill(const struct devlink_ops *ops, return 0; } +static int devlink_port_fn_vf_count_fill(const struct devlink_ops *ops, + struct devlink_port *devlink_port, + struct sk_buff *msg, + struct netlink_ext_ack *extack, + bool *msg_updated) +{ + u16 vf_count; + int err; + + if (!ops->port_fn_vf_count_get || + devlink_port->attrs.flavour != DEVLINK_PORT_FLAVOUR_PHYSICAL) + return 0; + + err = ops->port_fn_vf_count_get(devlink_port, &vf_count, extack); + if (err) { + if (err == -EOPNOTSUPP) + return 0; + return err; + } + + err = nla_put_u16(msg, DEVLINK_PORT_FN_ATTR_VF_COUNT, vf_count); + if (err) + return err; + + *msg_updated = true; + + return 0; +} + static int devlink_sb_tc_index_get_from_info(struct devlink_sb *devlink_sb, struct genl_info *info, @@ -871,6 +901,16 @@ static int devlink_port_fn_caps_set(struct devlink_port *devlink_port, return 0; } +static int devlink_port_fn_vf_count_set(struct devlink_port *devlink_port, + const struct nlattr *attr, + struct netlink_ext_ack *extack) +{ + const struct devlink_ops *ops = devlink_port->devlink->ops; + u16 vf_count = nla_get_u16(attr); + + return ops->port_fn_vf_count_set(devlink_port, vf_count, extack); +} + static int devlink_nl_port_function_attrs_put(struct sk_buff *msg, struct devlink_port *port, struct netlink_ext_ack *extack) @@ -893,6 +933,11 @@ devlink_nl_port_function_attrs_put(struct sk_buff *msg, struct devlink_port *por &msg_updated); if (err) goto out; + + err = devlink_port_fn_vf_count_fill(ops, port, msg, extack, &msg_updated); + if (err) + goto out; + err = devlink_port_fn_state_fill(ops, port, msg, extack, &msg_updated); out: if (err || !msg_updated) @@ -1219,6 +1264,19 @@ static int devlink_port_function_validate(struct devlink_port *devlink_port, "Function does not support state setting"); return -EOPNOTSUPP; } + attr = tb[DEVLINK_PORT_FN_ATTR_VF_COUNT]; + if (attr) { + if (!ops->port_fn_vf_count_set) { + NL_SET_ERR_MSG_ATTR(extack, attr, + "Function doesn't support VF assignment"); + return -EOPNOTSUPP; + } + if (devlink_port->attrs.flavour != DEVLINK_PORT_FLAVOUR_PHYSICAL) { + NL_SET_ERR_MSG_ATTR(extack, attr, + "VFs assignment supported for physical ports only"); + return -EOPNOTSUPP; + } + } attr = tb[DEVLINK_PORT_FN_ATTR_CAPS]; if (attr) { struct nla_bitfield32 caps; @@ -1278,6 +1336,13 @@ static int devlink_port_function_set(struct devlink_port *port, return err; } + attr = tb[DEVLINK_PORT_FN_ATTR_VF_COUNT]; + if (attr) { + err = devlink_port_fn_vf_count_set(port, attr, extack); + if (err) + return err; + } + /* Keep this as the last function attribute set, so that when * multiple port function attributes are set along with state, * Those can be applied first before activating the state.