diff mbox series

[v4,05/23] ice: Add devlink params support

Message ID 20210406210125.241-6-shiraz.saleem@intel.com (mailing list archive)
State Superseded
Headers show
Series Add Intel Ethernet Protocol Driver for RDMA (irdma) | expand

Commit Message

Shiraz Saleem April 6, 2021, 9:01 p.m. UTC
Add a new generic runtime devlink parameter 'rdma_protocol'
and use it in ice PCI driver. Configuration changes
result in unplugging the auxiliary RDMA device and re-plugging
it with updated values for irdma auxiiary driver to consume at
drv.probe()

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
---
 .../networking/devlink/devlink-params.rst          |  6 ++
 Documentation/networking/devlink/ice.rst           | 13 +++
 drivers/net/ethernet/intel/ice/ice_devlink.c       | 92 +++++++++++++++++++++-
 drivers/net/ethernet/intel/ice/ice_devlink.h       |  5 ++
 drivers/net/ethernet/intel/ice/ice_main.c          |  2 +
 include/net/devlink.h                              |  4 +
 net/core/devlink.c                                 |  5 ++
 7 files changed, 125 insertions(+), 2 deletions(-)

Comments

Jason Gunthorpe April 7, 2021, 2:57 p.m. UTC | #1
On Tue, Apr 06, 2021 at 04:01:07PM -0500, Shiraz Saleem wrote:
> Add a new generic runtime devlink parameter 'rdma_protocol'
> and use it in ice PCI driver. Configuration changes
> result in unplugging the auxiliary RDMA device and re-plugging
> it with updated values for irdma auxiiary driver to consume at
> drv.probe()
> 
> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
>  .../networking/devlink/devlink-params.rst          |  6 ++
>  Documentation/networking/devlink/ice.rst           | 13 +++
>  drivers/net/ethernet/intel/ice/ice_devlink.c       | 92 +++++++++++++++++++++-
>  drivers/net/ethernet/intel/ice/ice_devlink.h       |  5 ++
>  drivers/net/ethernet/intel/ice/ice_main.c          |  2 +
>  include/net/devlink.h                              |  4 +
>  net/core/devlink.c                                 |  5 ++
>  7 files changed, 125 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/networking/devlink/devlink-params.rst b/Documentation/networking/devlink/devlink-params.rst
> index 54c9f10..0b454c3 100644
> +++ b/Documentation/networking/devlink/devlink-params.rst
> @@ -114,3 +114,9 @@ own name.
>         will NACK any attempt of other host to reset the device. This parameter
>         is useful for setups where a device is shared by different hosts, such
>         as multi-host setup.
> +   * - ``rdma_protocol``
> +     - string
> +     - Selects the RDMA protocol selected for multi-protocol devices.
> +        - ``iwarp`` iWARP
> +	- ``roce`` RoCE
> +	- ``ib`` Infiniband

I'm still not sure this belongs in devlink. 

What about devices that support roce and iwarp concurrently?

There is nothing at the protocol level that precludes this - doesn't
this device allow it?

I know Parav is looking at the general problem of how to customize
what aux devices are created, that may be a better fit for this.

Can you remove the devlink parts to make progress?

Jason
Shiraz Saleem April 7, 2021, 8:58 p.m. UTC | #2
> Subject: Re: [PATCH v4 05/23] ice: Add devlink params support
> 
> On Tue, Apr 06, 2021 at 04:01:07PM -0500, Shiraz Saleem wrote:
> > Add a new generic runtime devlink parameter 'rdma_protocol'
> > and use it in ice PCI driver. Configuration changes result in
> > unplugging the auxiliary RDMA device and re-plugging it with updated
> > values for irdma auxiiary driver to consume at
> > drv.probe()
> >
> > Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
> >  .../networking/devlink/devlink-params.rst          |  6 ++
> >  Documentation/networking/devlink/ice.rst           | 13 +++
> >  drivers/net/ethernet/intel/ice/ice_devlink.c       | 92 +++++++++++++++++++++-
> >  drivers/net/ethernet/intel/ice/ice_devlink.h       |  5 ++
> >  drivers/net/ethernet/intel/ice/ice_main.c          |  2 +
> >  include/net/devlink.h                              |  4 +
> >  net/core/devlink.c                                 |  5 ++
> >  7 files changed, 125 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/networking/devlink/devlink-params.rst
> > b/Documentation/networking/devlink/devlink-params.rst
> > index 54c9f10..0b454c3 100644
> > +++ b/Documentation/networking/devlink/devlink-params.rst
> > @@ -114,3 +114,9 @@ own name.
> >         will NACK any attempt of other host to reset the device. This parameter
> >         is useful for setups where a device is shared by different hosts, such
> >         as multi-host setup.
> > +   * - ``rdma_protocol``
> > +     - string
> > +     - Selects the RDMA protocol selected for multi-protocol devices.
> > +        - ``iwarp`` iWARP
> > +	- ``roce`` RoCE
> > +	- ``ib`` Infiniband
> 
> I'm still not sure this belongs in devlink.

I believe you suggested we use devlink for protocol switch.

> 
> What about devices that support roce and iwarp concurrently?
> 
> There is nothing at the protocol level that precludes this - doesn't this device allow
> it?

Nope. This device doesn’t support both protocols concurrently on same PCI function.

Maybe then it makes sense to move this protocol switch as driver specific devlink?

> 
> I know Parav is looking at the general problem of how to customize what aux
> devices are created, that may be a better fit for this.
> 
> Can you remove the devlink parts to make progress?
> 

It is important since otherwise the customer will have no way to use RoCEv2 on this device.

Shiraz
Jason Gunthorpe April 7, 2021, 10:46 p.m. UTC | #3
On Wed, Apr 07, 2021 at 08:58:25PM +0000, Saleem, Shiraz wrote:
> > Subject: Re: [PATCH v4 05/23] ice: Add devlink params support
> > 
> > On Tue, Apr 06, 2021 at 04:01:07PM -0500, Shiraz Saleem wrote:
> > > Add a new generic runtime devlink parameter 'rdma_protocol'
> > > and use it in ice PCI driver. Configuration changes result in
> > > unplugging the auxiliary RDMA device and re-plugging it with updated
> > > values for irdma auxiiary driver to consume at
> > > drv.probe()
> > >
> > > Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
> > >  .../networking/devlink/devlink-params.rst          |  6 ++
> > >  Documentation/networking/devlink/ice.rst           | 13 +++
> > >  drivers/net/ethernet/intel/ice/ice_devlink.c       | 92 +++++++++++++++++++++-
> > >  drivers/net/ethernet/intel/ice/ice_devlink.h       |  5 ++
> > >  drivers/net/ethernet/intel/ice/ice_main.c          |  2 +
> > >  include/net/devlink.h                              |  4 +
> > >  net/core/devlink.c                                 |  5 ++
> > >  7 files changed, 125 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/Documentation/networking/devlink/devlink-params.rst
> > > b/Documentation/networking/devlink/devlink-params.rst
> > > index 54c9f10..0b454c3 100644
> > > +++ b/Documentation/networking/devlink/devlink-params.rst
> > > @@ -114,3 +114,9 @@ own name.
> > >         will NACK any attempt of other host to reset the device. This parameter
> > >         is useful for setups where a device is shared by different hosts, such
> > >         as multi-host setup.
> > > +   * - ``rdma_protocol``
> > > +     - string
> > > +     - Selects the RDMA protocol selected for multi-protocol devices.
> > > +        - ``iwarp`` iWARP
> > > +	- ``roce`` RoCE
> > > +	- ``ib`` Infiniband
> > 
> > I'm still not sure this belongs in devlink.
> 
> I believe you suggested we use devlink for protocol switch.

Yes, devlink is the right place, but selecting a *single* protocol
doesn't seem right, or general enough.

Parav is talking about generic ways to customize the aux devices
created and that would seem to serve the same function as this.

> > I know Parav is looking at the general problem of how to customize what aux
> > devices are created, that may be a better fit for this.
> > 
> > Can you remove the devlink parts to make progress?
> 
> It is important since otherwise the customer will have no way to use RoCEv2 on this device.

I'm not saying to not having it eventually, I'm just getting tired of
looking at 23 patches. You can argue it out after

I'm also half thinking of applying this under driver/staging or
CONFIG_BROKEN or something just because I am getting sick of looking
at it.

Jason
Shiraz Saleem April 12, 2021, 2:50 p.m. UTC | #4
> Subject: Re: [PATCH v4 05/23] ice: Add devlink params support
> 
> On Wed, Apr 07, 2021 at 08:58:25PM +0000, Saleem, Shiraz wrote:
> > > Subject: Re: [PATCH v4 05/23] ice: Add devlink params support
> > >
> > > On Tue, Apr 06, 2021 at 04:01:07PM -0500, Shiraz Saleem wrote:
> > > > Add a new generic runtime devlink parameter 'rdma_protocol'
> > > > and use it in ice PCI driver. Configuration changes result in
> > > > unplugging the auxiliary RDMA device and re-plugging it with
> > > > updated values for irdma auxiiary driver to consume at
> > > > drv.probe()
> > > >
> > > > Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
> > > >  .../networking/devlink/devlink-params.rst          |  6 ++
> > > >  Documentation/networking/devlink/ice.rst           | 13 +++
> > > >  drivers/net/ethernet/intel/ice/ice_devlink.c       | 92
> +++++++++++++++++++++-
> > > >  drivers/net/ethernet/intel/ice/ice_devlink.h       |  5 ++
> > > >  drivers/net/ethernet/intel/ice/ice_main.c          |  2 +
> > > >  include/net/devlink.h                              |  4 +
> > > >  net/core/devlink.c                                 |  5 ++
> > > >  7 files changed, 125 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/Documentation/networking/devlink/devlink-params.rst
> > > > b/Documentation/networking/devlink/devlink-params.rst
> > > > index 54c9f10..0b454c3 100644
> > > > +++ b/Documentation/networking/devlink/devlink-params.rst
> > > > @@ -114,3 +114,9 @@ own name.
> > > >         will NACK any attempt of other host to reset the device. This parameter
> > > >         is useful for setups where a device is shared by different hosts, such
> > > >         as multi-host setup.
> > > > +   * - ``rdma_protocol``
> > > > +     - string
> > > > +     - Selects the RDMA protocol selected for multi-protocol devices.
> > > > +        - ``iwarp`` iWARP
> > > > +	- ``roce`` RoCE
> > > > +	- ``ib`` Infiniband
> > >
> > > I'm still not sure this belongs in devlink.
> >
> > I believe you suggested we use devlink for protocol switch.
> 
> Yes, devlink is the right place, but selecting a *single* protocol doesn't seem right,
> or general enough.
> 
> Parav is talking about generic ways to customize the aux devices created and that
> would seem to serve the same function as this.

Is there an RFC or something posted for us to look at?

> 
> > > I know Parav is looking at the general problem of how to customize
> > > what aux devices are created, that may be a better fit for this.
> > >
> > > Can you remove the devlink parts to make progress?
> >
> > It is important since otherwise the customer will have no way to use RoCEv2 on
> this device.
> 
> I'm not saying to not having it eventually, I'm just getting tired of looking at 23
> patches. You can argue it out after
> 

Since this device cannot do concurrent protocols per function,
is having a driver specific devlink to switch between the 2 also a NACK?

There is something similar in another PCI driver.
https://www.kernel.org/doc/html/latest/networking/devlink/qed.html

Can we not adapt to Parav's solution (if it's a good fit) when it becomes available?

Shiraz
Parav Pandit April 12, 2021, 7:07 p.m. UTC | #5
> From: Saleem, Shiraz <shiraz.saleem@intel.com>
> Sent: Monday, April 12, 2021 8:21 PM
> 
> > Subject: Re: [PATCH v4 05/23] ice: Add devlink params support
> >
> > On Wed, Apr 07, 2021 at 08:58:25PM +0000, Saleem, Shiraz wrote:
> > > > Subject: Re: [PATCH v4 05/23] ice: Add devlink params support
> > > >
> > > > On Tue, Apr 06, 2021 at 04:01:07PM -0500, Shiraz Saleem wrote:
> > > > > Add a new generic runtime devlink parameter 'rdma_protocol'
> > > > > and use it in ice PCI driver. Configuration changes result in
> > > > > unplugging the auxiliary RDMA device and re-plugging it with
> > > > > updated values for irdma auxiiary driver to consume at
> > > > > drv.probe()
> > > > >
> > > > > Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
> > > > >  .../networking/devlink/devlink-params.rst          |  6 ++
> > > > >  Documentation/networking/devlink/ice.rst           | 13 +++
> > > > >  drivers/net/ethernet/intel/ice/ice_devlink.c       | 92
> > +++++++++++++++++++++-
> > > > >  drivers/net/ethernet/intel/ice/ice_devlink.h       |  5 ++
> > > > >  drivers/net/ethernet/intel/ice/ice_main.c          |  2 +
> > > > >  include/net/devlink.h                              |  4 +
> > > > >  net/core/devlink.c                                 |  5 ++
> > > > >  7 files changed, 125 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/Documentation/networking/devlink/devlink-params.rst
> > > > > b/Documentation/networking/devlink/devlink-params.rst
> > > > > index 54c9f10..0b454c3 100644
> > > > > +++ b/Documentation/networking/devlink/devlink-params.rst
> > > > > @@ -114,3 +114,9 @@ own name.
> > > > >         will NACK any attempt of other host to reset the device. This
> parameter
> > > > >         is useful for setups where a device is shared by different hosts,
> such
> > > > >         as multi-host setup.
> > > > > +   * - ``rdma_protocol``
> > > > > +     - string
> > > > > +     - Selects the RDMA protocol selected for multi-protocol devices.
> > > > > +        - ``iwarp`` iWARP
> > > > > +	- ``roce`` RoCE
> > > > > +	- ``ib`` Infiniband
> > > >
> > > > I'm still not sure this belongs in devlink.
> > >
> > > I believe you suggested we use devlink for protocol switch.
> >
> > Yes, devlink is the right place, but selecting a *single* protocol
> > doesn't seem right, or general enough.
> >
> > Parav is talking about generic ways to customize the aux devices
> > created and that would seem to serve the same function as this.
> 
> Is there an RFC or something posted for us to look at?
I do not have polished RFC content ready yet.
But coping the full config sequence snippet from the internal draft (changed for ice example) here as I like to discuss with you in this context.

# (1) show auxiliary device types supported by a given devlink device.
# applies to pci pf,vf,sf. (in general at devlink instance).
$ devlink dev auxdev show pci/0000:06.00.0
pci/0000:06.00.0:
  current:
    roce eth
  new:
  supported:
    roce eth iwarp

# (2) enable iwarp and ethernet type of aux devices and disable roce.
$ devlink dev auxdev set pci/0000:06:00.0 roce off iwarp on

# (3) now see which aux devices will be enable on next reload.
$ devlink dev auxdev show pci/0000:06:00.0
pci/0000:06:00.0:
  current:
    roce eth
  new:
    eth iwarp
  supported:
    roce eth iwarp

# (4) now reload the device and see which aux devices are created.
At this point driver undergoes reconfig for removal of roce and adding iwarp.
$ devlink reload pci/0000:06:00.0

# (5) verify which are the aux devices now activated.
$ devlink dev auxdev show pci/0000:06:00.0
pci/0000:06:00.0:
  current:
    roce eth
  new:
  supported:
    roce eth iwarp

Above 'new' section is only shown when its set. (similar to devlink resource).
In command two vendor driver can fail the call when iwarp and roce both enabled by user.
mlx5_core doesn't have iwarp, but its vdpa type of device. And for other cases we just want to enable roce disabling eth and vdpa.
Parav Pandit April 13, 2021, 4:03 a.m. UTC | #6
> From: Parav Pandit <parav@nvidia.com>
> Sent: Tuesday, April 13, 2021 12:38 AM
> 
> > From: Saleem, Shiraz <shiraz.saleem@intel.com>
> > Sent: Monday, April 12, 2021 8:21 PM
> >
> > > Subject: Re: [PATCH v4 05/23] ice: Add devlink params support
> > >
> > > On Wed, Apr 07, 2021 at 08:58:25PM +0000, Saleem, Shiraz wrote:
> > > > > Subject: Re: [PATCH v4 05/23] ice: Add devlink params support
> > > > >
> > > > > On Tue, Apr 06, 2021 at 04:01:07PM -0500, Shiraz Saleem wrote:
> > > > > > Add a new generic runtime devlink parameter 'rdma_protocol'
> > > > > > and use it in ice PCI driver. Configuration changes result in
> > > > > > unplugging the auxiliary RDMA device and re-plugging it with
> > > > > > updated values for irdma auxiiary driver to consume at
> > > > > > drv.probe()
> > > > > >
> > > > > > Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
> > > > > >  .../networking/devlink/devlink-params.rst          |  6 ++
> > > > > >  Documentation/networking/devlink/ice.rst           | 13 +++
> > > > > >  drivers/net/ethernet/intel/ice/ice_devlink.c       | 92
> > > +++++++++++++++++++++-
> > > > > >  drivers/net/ethernet/intel/ice/ice_devlink.h       |  5 ++
> > > > > >  drivers/net/ethernet/intel/ice/ice_main.c          |  2 +
> > > > > >  include/net/devlink.h                              |  4 +
> > > > > >  net/core/devlink.c                                 |  5 ++
> > > > > >  7 files changed, 125 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git
> > > > > > a/Documentation/networking/devlink/devlink-params.rst
> > > > > > b/Documentation/networking/devlink/devlink-params.rst
> > > > > > index 54c9f10..0b454c3 100644
> > > > > > +++ b/Documentation/networking/devlink/devlink-params.rst
> > > > > > @@ -114,3 +114,9 @@ own name.
> > > > > >         will NACK any attempt of other host to reset the
> > > > > > device. This
> > parameter
> > > > > >         is useful for setups where a device is shared by
> > > > > > different hosts,
> > such
> > > > > >         as multi-host setup.
> > > > > > +   * - ``rdma_protocol``
> > > > > > +     - string
> > > > > > +     - Selects the RDMA protocol selected for multi-protocol devices.
> > > > > > +        - ``iwarp`` iWARP
> > > > > > +	- ``roce`` RoCE
> > > > > > +	- ``ib`` Infiniband
> > > > >
> > > > > I'm still not sure this belongs in devlink.
> > > >
> > > > I believe you suggested we use devlink for protocol switch.
> > >
> > > Yes, devlink is the right place, but selecting a *single* protocol
> > > doesn't seem right, or general enough.
> > >
> > > Parav is talking about generic ways to customize the aux devices
> > > created and that would seem to serve the same function as this.
> >
> > Is there an RFC or something posted for us to look at?
> I do not have polished RFC content ready yet.
> But coping the full config sequence snippet from the internal draft (changed
> for ice example) here as I like to discuss with you in this context.
> 
> # (1) show auxiliary device types supported by a given devlink device.
> # applies to pci pf,vf,sf. (in general at devlink instance).
> $ devlink dev auxdev show pci/0000:06.00.0
> pci/0000:06.00.0:
>   current:
>     roce eth
>   new:
>   supported:
>     roce eth iwarp
> 
> # (2) enable iwarp and ethernet type of aux devices and disable roce.
> $ devlink dev auxdev set pci/0000:06:00.0 roce off iwarp on
> 
> # (3) now see which aux devices will be enable on next reload.
> $ devlink dev auxdev show pci/0000:06:00.0
> pci/0000:06:00.0:
>   current:
>     roce eth
>   new:
>     eth iwarp
>   supported:
>     roce eth iwarp
> 
> # (4) now reload the device and see which aux devices are created.
> At this point driver undergoes reconfig for removal of roce and adding iwarp.
> $ devlink reload pci/0000:06:00.0
> 
> # (5) verify which are the aux devices now activated.
> $ devlink dev auxdev show pci/0000:06:00.0
> pci/0000:06:00.0:
>   current:
>     roce eth
This was copy paste error from my command #1.
It should be "roce eth" should be "iwarp eth".
This is because in step #3, iwarp was enabled.

>   new:
>   supported:
>     roce eth iwarp
> 
> Above 'new' section is only shown when its set. (similar to devlink resource).
> In command two vendor driver can fail the call when iwarp and roce both
> enabled by user.
> mlx5_core doesn't have iwarp, but its vdpa type of device. And for other
> cases we just want to enable roce disabling eth and vdpa.
Shiraz Saleem April 13, 2021, 2:40 p.m. UTC | #7
> Subject: RE: [PATCH v4 05/23] ice: Add devlink params support
> 
> 
> 
> > From: Saleem, Shiraz <shiraz.saleem@intel.com>
> > Sent: Monday, April 12, 2021 8:21 PM
> >
> > > Subject: Re: [PATCH v4 05/23] ice: Add devlink params support
> > >
> > > On Wed, Apr 07, 2021 at 08:58:25PM +0000, Saleem, Shiraz wrote:
> > > > > Subject: Re: [PATCH v4 05/23] ice: Add devlink params support
> > > > >
> > > > > On Tue, Apr 06, 2021 at 04:01:07PM -0500, Shiraz Saleem wrote:
> > > > > > Add a new generic runtime devlink parameter 'rdma_protocol'
> > > > > > and use it in ice PCI driver. Configuration changes result in
> > > > > > unplugging the auxiliary RDMA device and re-plugging it with
> > > > > > updated values for irdma auxiiary driver to consume at
> > > > > > drv.probe()
> > > > > >
> > > > > > Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
> > > > > >  .../networking/devlink/devlink-params.rst          |  6 ++
> > > > > >  Documentation/networking/devlink/ice.rst           | 13 +++
> > > > > >  drivers/net/ethernet/intel/ice/ice_devlink.c       | 92
> > > +++++++++++++++++++++-
> > > > > >  drivers/net/ethernet/intel/ice/ice_devlink.h       |  5 ++
> > > > > >  drivers/net/ethernet/intel/ice/ice_main.c          |  2 +
> > > > > >  include/net/devlink.h                              |  4 +
> > > > > >  net/core/devlink.c                                 |  5 ++
> > > > > >  7 files changed, 125 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git
> > > > > > a/Documentation/networking/devlink/devlink-params.rst
> > > > > > b/Documentation/networking/devlink/devlink-params.rst
> > > > > > index 54c9f10..0b454c3 100644
> > > > > > +++ b/Documentation/networking/devlink/devlink-params.rst
> > > > > > @@ -114,3 +114,9 @@ own name.
> > > > > >         will NACK any attempt of other host to reset the
> > > > > > device. This
> > parameter
> > > > > >         is useful for setups where a device is shared by
> > > > > > different hosts,
> > such
> > > > > >         as multi-host setup.
> > > > > > +   * - ``rdma_protocol``
> > > > > > +     - string
> > > > > > +     - Selects the RDMA protocol selected for multi-protocol devices.
> > > > > > +        - ``iwarp`` iWARP
> > > > > > +	- ``roce`` RoCE
> > > > > > +	- ``ib`` Infiniband
> > > > >
> > > > > I'm still not sure this belongs in devlink.
> > > >
> > > > I believe you suggested we use devlink for protocol switch.
> > >
> > > Yes, devlink is the right place, but selecting a *single* protocol
> > > doesn't seem right, or general enough.
> > >
> > > Parav is talking about generic ways to customize the aux devices
> > > created and that would seem to serve the same function as this.
> >
> > Is there an RFC or something posted for us to look at?
> I do not have polished RFC content ready yet.
> But coping the full config sequence snippet from the internal draft (changed for ice
> example) here as I like to discuss with you in this context.

Thanks Parav! Some comments below.

> 
> # (1) show auxiliary device types supported by a given devlink device.
> # applies to pci pf,vf,sf. (in general at devlink instance).
> $ devlink dev auxdev show pci/0000:06.00.0
> pci/0000:06.00.0:
>   current:
>     roce eth
>   new:
>   supported:
>     roce eth iwarp
> 
> # (2) enable iwarp and ethernet type of aux devices and disable roce.
> $ devlink dev auxdev set pci/0000:06:00.0 roce off iwarp on
> 
> # (3) now see which aux devices will be enable on next reload.
> $ devlink dev auxdev show pci/0000:06:00.0
> pci/0000:06:00.0:
>   current:
>     roce eth
>   new:
>     eth iwarp
>   supported:
>     roce eth iwarp
> 
> # (4) now reload the device and see which aux devices are created.
> At this point driver undergoes reconfig for removal of roce and adding iwarp.
> $ devlink reload pci/0000:06:00.0

I see this is modeled like devlink resource.

Do we really to need a PCI driver re-init to switch the type of the auxdev hanging off the PCI dev?

Why not just allow the setting to apply dynamically during a 'set' itself with an unplug/plug
of the auxdev with correct type.

Shiraz
Parav Pandit April 13, 2021, 5:36 p.m. UTC | #8
> From: Saleem, Shiraz <shiraz.saleem@intel.com>
> Sent: Tuesday, April 13, 2021 8:11 PM
[..]

> > > > Parav is talking about generic ways to customize the aux devices
> > > > created and that would seem to serve the same function as this.
> > >
> > > Is there an RFC or something posted for us to look at?
> > I do not have polished RFC content ready yet.
> > But coping the full config sequence snippet from the internal draft
> > (changed for ice
> > example) here as I like to discuss with you in this context.
> 
> Thanks Parav! Some comments below.
> 
> >
> > # (1) show auxiliary device types supported by a given devlink device.
> > # applies to pci pf,vf,sf. (in general at devlink instance).
> > $ devlink dev auxdev show pci/0000:06.00.0
> > pci/0000:06.00.0:
> >   current:
> >     roce eth
> >   new:
> >   supported:
> >     roce eth iwarp
> >
> > # (2) enable iwarp and ethernet type of aux devices and disable roce.
> > $ devlink dev auxdev set pci/0000:06:00.0 roce off iwarp on
> >
> > # (3) now see which aux devices will be enable on next reload.
> > $ devlink dev auxdev show pci/0000:06:00.0
> > pci/0000:06:00.0:
> >   current:
> >     roce eth
> >   new:
> >     eth iwarp
> >   supported:
> >     roce eth iwarp
> >
> > # (4) now reload the device and see which aux devices are created.
> > At this point driver undergoes reconfig for removal of roce and adding
> iwarp.
> > $ devlink reload pci/0000:06:00.0
> 
> I see this is modeled like devlink resource.
> 
> Do we really to need a PCI driver re-init to switch the type of the auxdev
> hanging off the PCI dev?
> 
I don't see a need to re-init the whole PCI driver. Since only aux device config is changed only that piece to get reloaded.

> Why not just allow the setting to apply dynamically during a 'set' itself with an
> unplug/plug of the auxdev with correct type.
> 
This suggestion came up in the internal discussion too.
However such task needs to synchronize with devlink reload command and also with driver remove() sequence.
So locking wise and depending on amount of config change, it is close to what reload will do.
For example other resource config or other params setting also to take effect.
So to avoid defining multiple config sequence, doing as part of already existing devlink reload, it brings simple sequence to user.

For example,
1. enable/disable desired aux devices
2. configure device resources
3. set some device params
4. do devlink reload and apply settings done in #1 to #3
Shiraz Saleem April 14, 2021, 12:21 a.m. UTC | #9
> Subject: RE: [PATCH v4 05/23] ice: Add devlink params support
> 
> 
> 
> > From: Saleem, Shiraz <shiraz.saleem@intel.com>
> > Sent: Tuesday, April 13, 2021 8:11 PM
> [..]
> 
> > > > > Parav is talking about generic ways to customize the aux devices
> > > > > created and that would seem to serve the same function as this.
> > > >
> > > > Is there an RFC or something posted for us to look at?
> > > I do not have polished RFC content ready yet.
> > > But coping the full config sequence snippet from the internal draft
> > > (changed for ice
> > > example) here as I like to discuss with you in this context.
> >
> > Thanks Parav! Some comments below.
> >
> > >
> > > # (1) show auxiliary device types supported by a given devlink device.
> > > # applies to pci pf,vf,sf. (in general at devlink instance).
> > > $ devlink dev auxdev show pci/0000:06.00.0
> > > pci/0000:06.00.0:
> > >   current:
> > >     roce eth
> > >   new:
> > >   supported:
> > >     roce eth iwarp
> > >
> > > # (2) enable iwarp and ethernet type of aux devices and disable roce.
> > > $ devlink dev auxdev set pci/0000:06:00.0 roce off iwarp on
> > >
> > > # (3) now see which aux devices will be enable on next reload.
> > > $ devlink dev auxdev show pci/0000:06:00.0
> > > pci/0000:06:00.0:
> > >   current:
> > >     roce eth
> > >   new:
> > >     eth iwarp
> > >   supported:
> > >     roce eth iwarp
> > >
> > > # (4) now reload the device and see which aux devices are created.
> > > At this point driver undergoes reconfig for removal of roce and
> > > adding
> > iwarp.
> > > $ devlink reload pci/0000:06:00.0
> >
> > I see this is modeled like devlink resource.
> >
> > Do we really to need a PCI driver re-init to switch the type of the
> > auxdev hanging off the PCI dev?
> >
> I don't see a need to re-init the whole PCI driver. Since only aux device config is
> changed only that piece to get reloaded.

But that is what mlx5 and other implementations does on reload no? i.e. a PCI driver reinit.
I can see an ice implementation of reload morphing to similar over time to support a new config
that requires a true reinit of PCI driver entities.

> 
> > Why not just allow the setting to apply dynamically during a 'set'
> > itself with an unplug/plug of the auxdev with correct type.
> >
> This suggestion came up in the internal discussion too.
> However such task needs to synchronize with devlink reload command and also
> with driver remove() sequence.
> So locking wise and depending on amount of config change, it is close to what
> reload will do.

Holding this mutex across the auxiliary device unplug/plug in "set" wont cut it?
https://elixir.bootlin.com/linux/v5.12-rc7/source/drivers/net/ethernet/mellanox/mlx5/core/main.c#L1304

> For example other resource config or other params setting also to take effect.
> So to avoid defining multiple config sequence, doing as part of already existing
> devlink reload, it brings simple sequence to user.
> 
> For example,
> 1. enable/disable desired aux devices
> 2. configure device resources
> 3. set some device params
> 4. do devlink reload and apply settings done in #1 to #3

Sure. But a user might also just want to operate on just an auxiliary device configuration change. As in #1.
And he ends up having everything hanging off the PF to get blown out, including potentially
the VFs. That feels like too big a hammer.

Shiraz
Parav Pandit April 14, 2021, 5:27 a.m. UTC | #10
+ Jiri.

> From: Saleem, Shiraz <shiraz.saleem@intel.com>
> Sent: Wednesday, April 14, 2021 5:51 AM
> 
> > Subject: RE: [PATCH v4 05/23] ice: Add devlink params support
> >
> >
> >
> > > From: Saleem, Shiraz <shiraz.saleem@intel.com>
> > > Sent: Tuesday, April 13, 2021 8:11 PM
> > [..]
> >
> > > > > > Parav is talking about generic ways to customize the aux
> > > > > > devices created and that would seem to serve the same function as
> this.
> > > > >
> > > > > Is there an RFC or something posted for us to look at?
> > > > I do not have polished RFC content ready yet.
> > > > But coping the full config sequence snippet from the internal
> > > > draft (changed for ice
> > > > example) here as I like to discuss with you in this context.
> > >
> > > Thanks Parav! Some comments below.
> > >
> > > >
> > > > # (1) show auxiliary device types supported by a given devlink device.
> > > > # applies to pci pf,vf,sf. (in general at devlink instance).
> > > > $ devlink dev auxdev show pci/0000:06.00.0
> > > > pci/0000:06.00.0:
> > > >   current:
> > > >     roce eth
> > > >   new:
> > > >   supported:
> > > >     roce eth iwarp
> > > >
> > > > # (2) enable iwarp and ethernet type of aux devices and disable roce.
> > > > $ devlink dev auxdev set pci/0000:06:00.0 roce off iwarp on
> > > >
> > > > # (3) now see which aux devices will be enable on next reload.
> > > > $ devlink dev auxdev show pci/0000:06:00.0
> > > > pci/0000:06:00.0:
> > > >   current:
> > > >     roce eth
> > > >   new:
> > > >     eth iwarp
> > > >   supported:
> > > >     roce eth iwarp
> > > >
> > > > # (4) now reload the device and see which aux devices are created.
> > > > At this point driver undergoes reconfig for removal of roce and
> > > > adding
> > > iwarp.
> > > > $ devlink reload pci/0000:06:00.0
> > >
> > > I see this is modeled like devlink resource.
> > >
> > > Do we really to need a PCI driver re-init to switch the type of the
> > > auxdev hanging off the PCI dev?
> > >
> > I don't see a need to re-init the whole PCI driver. Since only aux
> > device config is changed only that piece to get reloaded.
> 
> But that is what mlx5 and other implementations does on reload no? i.e. a
> PCI driver reinit.
Currently yes, reload does PCI re-init.
However I am not seeing the value of reload if no config (param, resource, auxdev) is changed.

> I can see an ice implementation of reload morphing to similar over time to
> support a new config that requires a true reinit of PCI driver entities.
> 
Sure.

> >
> > > Why not just allow the setting to apply dynamically during a 'set'
> > > itself with an unplug/plug of the auxdev with correct type.
> > >
> > This suggestion came up in the internal discussion too.
> > However such task needs to synchronize with devlink reload command and
> > also with driver remove() sequence.
> > So locking wise and depending on amount of config change, it is close
> > to what reload will do.
> 
> Holding this mutex across the auxiliary device unplug/plug in "set" wont cut
> it?
> https://elixir.bootlin.com/linux/v5.12-
> rc7/source/drivers/net/ethernet/mellanox/mlx5/core/main.c#L1304
> 
Currently devlink reload for mlx5 is source of lockdep assert, use after free access and a deadlock in net ns. :-(
Multiple of us (Leon, Saeed, Moshe) working on it resolve it.
So I want to stay away from intf_mutex for now.

> > For example other resource config or other params setting also to take
> effect.
> > So to avoid defining multiple config sequence, doing as part of
> > already existing devlink reload, it brings simple sequence to user.
> >
> > For example,
> > 1. enable/disable desired aux devices
> > 2. configure device resources
> > 3. set some device params
> > 4. do devlink reload and apply settings done in #1 to #3
> 
> Sure. But a user might also just want to operate on just an auxiliary device
> configuration change. As in #1.
> And he ends up having everything hanging off the PF to get blown out,
> including potentially the VFs. That feels like too big a hammer.
This is certainly not desired.

If we want aux device enable/disable to take effect when its done without reload than above flow should be redefined as,

1. configure device resources (optional)
2. set some device params (optional)
3. enable/disable desired aux devices

Step-3 needs to apply the settings of (1) and (2) without user doing devlink reload.
devlink core doesn't know on step #3, that reload_down() and reload_up() to be done.
So driver internally needs to implement reload_down(), up() on callback of #3.
This builds parallel framework to devlink reload.

Jiri,
What do you think of it?
Leon Romanovsky April 18, 2021, 11:51 a.m. UTC | #11
On Wed, Apr 14, 2021 at 12:21:08AM +0000, Saleem, Shiraz wrote:
> > Subject: RE: [PATCH v4 05/23] ice: Add devlink params support

<...>

> > > Why not just allow the setting to apply dynamically during a 'set'
> > > itself with an unplug/plug of the auxdev with correct type.
> > >
> > This suggestion came up in the internal discussion too.
> > However such task needs to synchronize with devlink reload command and also
> > with driver remove() sequence.
> > So locking wise and depending on amount of config change, it is close to what
> > reload will do.
> 
> Holding this mutex across the auxiliary device unplug/plug in "set" wont cut it?
> https://elixir.bootlin.com/linux/v5.12-rc7/source/drivers/net/ethernet/mellanox/mlx5/core/main.c#L1304

Like Parav said, we are working to fix it and already have one working
solution, unfortunately it has one eyebrow raising change and we are
trying another one.

You can take a look here to get sense of the scope:
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=devlink-core

Thanks
diff mbox series

Patch

diff --git a/Documentation/networking/devlink/devlink-params.rst b/Documentation/networking/devlink/devlink-params.rst
index 54c9f10..0b454c3 100644
--- a/Documentation/networking/devlink/devlink-params.rst
+++ b/Documentation/networking/devlink/devlink-params.rst
@@ -114,3 +114,9 @@  own name.
        will NACK any attempt of other host to reset the device. This parameter
        is useful for setups where a device is shared by different hosts, such
        as multi-host setup.
+   * - ``rdma_protocol``
+     - string
+     - Selects the RDMA protocol selected for multi-protocol devices.
+        - ``iwarp`` iWARP
+	- ``roce`` RoCE
+	- ``ib`` Infiniband
diff --git a/Documentation/networking/devlink/ice.rst b/Documentation/networking/devlink/ice.rst
index a432dc4..2e04c99 100644
--- a/Documentation/networking/devlink/ice.rst
+++ b/Documentation/networking/devlink/ice.rst
@@ -193,3 +193,16 @@  Users can request an immediate capture of a snapshot via the
     0000000000000210 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 
     $ devlink region delete pci/0000:01:00.0/device-caps snapshot 1
+
+Parameters
+==========
+
+The ``ice`` driver implements the following generic and driver-specific
+parameters.
+
+.. list-table:: Generic parameters implemented
+
+   * - Name
+     - Mode
+   * - ``rdma_protocol``
+     - runtime
diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.c b/drivers/net/ethernet/intel/ice/ice_devlink.c
index cf685ee..de03eb8 100644
--- a/drivers/net/ethernet/intel/ice/ice_devlink.c
+++ b/drivers/net/ethernet/intel/ice/ice_devlink.c
@@ -449,6 +449,64 @@  static int ice_devlink_info_get(struct devlink *devlink,
 	.flash_update = ice_devlink_flash_update,
 };
 
+static int
+ice_devlink_rdma_prot_get(struct devlink *devlink, u32 id,
+			  struct devlink_param_gset_ctx *ctx)
+{
+	struct ice_pf *pf = devlink_priv(devlink);
+	struct iidc_core_dev_info *cdev_info =
+		ice_find_cdev_info_by_id(pf, IIDC_RDMA_ID);
+
+	if (cdev_info->rdma_protocol == IIDC_RDMA_PROTOCOL_IWARP)
+		strcpy(ctx->val.vstr, "iwarp");
+	else
+		strcpy(ctx->val.vstr, "roce");
+
+	return 0;
+}
+
+static int
+ice_devlink_rdma_prot_set(struct devlink *devlink, u32 id,
+			  struct devlink_param_gset_ctx *ctx)
+{
+	struct ice_pf *pf = devlink_priv(devlink);
+	struct iidc_core_dev_info *cdev_info =
+		ice_find_cdev_info_by_id(pf, IIDC_RDMA_ID);
+	enum iidc_rdma_protocol prot = !strcmp(ctx->val.vstr, "iwarp") ?
+					IIDC_RDMA_PROTOCOL_IWARP :
+					IIDC_RDMA_PROTOCOL_ROCEV2;
+
+	if (cdev_info->rdma_protocol != prot) {
+		ice_unplug_aux_devs(pf);
+		cdev_info->rdma_protocol = prot;
+		ice_plug_aux_devs(pf);
+	}
+
+	return 0;
+}
+
+static int
+ice_devlink_rdma_prot_validate(struct devlink *devlink, u32 id,
+			       union devlink_param_value val,
+			       struct netlink_ext_ack *extack)
+{
+	char *value = val.vstr;
+
+	if (!strcmp(value, "iwarp") || !strcmp(value, "roce"))
+		return 0;
+
+	NL_SET_ERR_MSG_MOD(extack, "\"iwarp\" and \"roce\" are the only supported values");
+
+	return -EINVAL;
+}
+
+static const struct devlink_param ice_devlink_params[] = {
+	DEVLINK_PARAM_GENERIC(RDMA_PROTOCOL, BIT(DEVLINK_PARAM_CMODE_RUNTIME),
+			      ice_devlink_rdma_prot_get,
+			      ice_devlink_rdma_prot_set,
+			      ice_devlink_rdma_prot_validate),
+};
+
 static void ice_devlink_free(void *devlink_ptr)
 {
 	devlink_free((struct devlink *)devlink_ptr);
@@ -491,15 +549,31 @@  int ice_devlink_register(struct ice_pf *pf)
 {
 	struct devlink *devlink = priv_to_devlink(pf);
 	struct device *dev = ice_pf_to_dev(pf);
+	union devlink_param_value value;
 	int err;
 
 	err = devlink_register(devlink, dev);
+	if (err)
+		goto err;
+
+	err = devlink_params_register(devlink, ice_devlink_params,
+				      ARRAY_SIZE(ice_devlink_params));
 	if (err) {
-		dev_err(dev, "devlink registration failed: %d\n", err);
-		return err;
+		devlink_unregister(devlink);
+		goto err;
 	}
 
+	strcpy(value.vstr, "iwarp");
+	devlink_param_driverinit_value_set(devlink,
+					   DEVLINK_PARAM_GENERIC_ID_RDMA_PROTOCOL,
+					   value);
+
 	return 0;
+
+err:
+	dev_err(dev, "devlink registration failed: %d\n", err);
+
+	return err;
 }
 
 /**
@@ -510,10 +584,24 @@  int ice_devlink_register(struct ice_pf *pf)
  */
 void ice_devlink_unregister(struct ice_pf *pf)
 {
+	devlink_params_unregister(priv_to_devlink(pf), ice_devlink_params,
+				  ARRAY_SIZE(ice_devlink_params));
 	devlink_unregister(priv_to_devlink(pf));
 }
 
 /**
+ * ice_devlink_params_publish - Publish devlink param
+ * @pf: the PF structure to cleanup
+ *
+ * Publish previously registered devlink parameters after driver
+ * is initialized
+ */
+void ice_devlink_params_publish(struct ice_pf *pf)
+{
+	devlink_params_publish(priv_to_devlink(pf));
+}
+
+/**
  * ice_devlink_create_port - Create a devlink port for this VSI
  * @vsi: the VSI to create a port for
  *
diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.h b/drivers/net/ethernet/intel/ice/ice_devlink.h
index e07e744..e7239fa 100644
--- a/drivers/net/ethernet/intel/ice/ice_devlink.h
+++ b/drivers/net/ethernet/intel/ice/ice_devlink.h
@@ -4,10 +4,15 @@ 
 #ifndef _ICE_DEVLINK_H_
 #define _ICE_DEVLINK_H_
 
+enum ice_devlink_param_id {
+	ICE_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX,
+};
+
 struct ice_pf *ice_allocate_pf(struct device *dev);
 
 int ice_devlink_register(struct ice_pf *pf);
 void ice_devlink_unregister(struct ice_pf *pf);
+void ice_devlink_params_publish(struct ice_pf *pf);
 int ice_devlink_create_port(struct ice_vsi *vsi);
 void ice_devlink_destroy_port(struct ice_vsi *vsi);
 
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 3d750ba..3e3a9cf 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -4346,6 +4346,8 @@  static void ice_print_wake_reason(struct ice_pf *pf)
 		dev_warn(dev, "RDMA is not supported on this device\n");
 	}
 
+	ice_devlink_params_publish(pf);
+
 	return 0;
 
 err_init_aux_unroll:
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 853420d..09e4d76 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -498,6 +498,7 @@  enum devlink_param_generic_id {
 	DEVLINK_PARAM_GENERIC_ID_RESET_DEV_ON_DRV_PROBE,
 	DEVLINK_PARAM_GENERIC_ID_ENABLE_ROCE,
 	DEVLINK_PARAM_GENERIC_ID_ENABLE_REMOTE_DEV_RESET,
+	DEVLINK_PARAM_GENERIC_ID_RDMA_PROTOCOL,
 
 	/* add new param generic ids above here*/
 	__DEVLINK_PARAM_GENERIC_ID_MAX,
@@ -538,6 +539,9 @@  enum devlink_param_generic_id {
 #define DEVLINK_PARAM_GENERIC_ENABLE_REMOTE_DEV_RESET_NAME "enable_remote_dev_reset"
 #define DEVLINK_PARAM_GENERIC_ENABLE_REMOTE_DEV_RESET_TYPE DEVLINK_PARAM_TYPE_BOOL
 
+#define DEVLINK_PARAM_GENERIC_RDMA_PROTOCOL_NAME "rdma_protocol"
+#define DEVLINK_PARAM_GENERIC_RDMA_PROTOCOL_TYPE DEVLINK_PARAM_TYPE_STRING
+
 #define DEVLINK_PARAM_GENERIC(_id, _cmodes, _get, _set, _validate)	\
 {									\
 	.id = DEVLINK_PARAM_GENERIC_ID_##_id,				\
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 737b61c..1bb3865 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3766,6 +3766,11 @@  static int devlink_nl_cmd_flash_update(struct sk_buff *skb,
 		.name = DEVLINK_PARAM_GENERIC_ENABLE_REMOTE_DEV_RESET_NAME,
 		.type = DEVLINK_PARAM_GENERIC_ENABLE_REMOTE_DEV_RESET_TYPE,
 	},
+	{
+		.id = DEVLINK_PARAM_GENERIC_ID_RDMA_PROTOCOL,
+		.name = DEVLINK_PARAM_GENERIC_RDMA_PROTOCOL_NAME,
+		.type = DEVLINK_PARAM_GENERIC_RDMA_PROTOCOL_TYPE,
+	},
 };
 
 static int devlink_param_generic_verify(const struct devlink_param *param)