diff mbox series

[v2,05/23] ice: Add devlink params support

Message ID 20210324000007.1450-6-shiraz.saleem@intel.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series Add Intel Ethernet Protocol Driver for RDMA (irdma) | expand

Checks

Context Check Description
netdev/tree_selection success Guessing tree name failed - patch did not apply

Commit Message

Shiraz Saleem March 23, 2021, 11:59 p.m. UTC
Add two new runtime RDMA related devlink parameters to ice
driver. 'rdma_resource_limits_sel' is driver-specific
while 'rdma_protocol' is generic. Configuration changes
result in unplugging the auxiliary RDMA device and re-plugging
it with updated values for irdma auxiiary driver to consume at
drv.probe()

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
---
 .../networking/devlink/devlink-params.rst          |   6 +
 Documentation/networking/devlink/ice.rst           |  35 +++++
 drivers/net/ethernet/intel/ice/ice_devlink.c       | 146 ++++++++++++++++++++-
 drivers/net/ethernet/intel/ice/ice_devlink.h       |   6 +
 drivers/net/ethernet/intel/ice/ice_main.c          |   2 +
 include/net/devlink.h                              |   4 +
 net/core/devlink.c                                 |   5 +
 7 files changed, 202 insertions(+), 2 deletions(-)

Comments

Parav Pandit March 24, 2021, 4:50 a.m. UTC | #1
Hi Shiraz,

> From: Shiraz Saleem <shiraz.saleem@intel.com>
> Sent: Wednesday, March 24, 2021 5:30 AM
> 
> Add two new runtime RDMA related devlink parameters to ice driver.
> 'rdma_resource_limits_sel' is driver-specific while 'rdma_protocol' is generic.
> Configuration changes result in unplugging the auxiliary RDMA device and re-
> plugging it with updated values for irdma auxiiary driver to consume at
> drv.probe()
> 
> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
> ---
>  .../networking/devlink/devlink-params.rst          |   6 +
>  Documentation/networking/devlink/ice.rst           |  35 +++++
>  drivers/net/ethernet/intel/ice/ice_devlink.c       | 146
> ++++++++++++++++++++-
>  drivers/net/ethernet/intel/ice/ice_devlink.h       |   6 +
>  drivers/net/ethernet/intel/ice/ice_main.c          |   2 +
>  include/net/devlink.h                              |   4 +
>  net/core/devlink.c                                 |   5 +
>  7 files changed, 202 insertions(+), 2 deletions(-)
> 

[..]
> +.. list-table:: Driver-specific parameters implemented
> +   :widths: 5 5 5 85
> +
> +   * - Name
> +     - Type
> +     - Mode
> +     - Description
> +   * - ``rdma_resource_limits_sel``
> +     - string
> +     - runtime
> +     - Selector to limit the RDMA resources configured for the device. The
> range
> +       is between 0 and 7 with a default value equal to 3. Each selector
> supports
> +       up to the value specified in the table.
> +          - 0: 128 QPs
> +          - 1: 1K QPs
> +          - 2: 2K QPs
> +          - 3: 4K QPs
> +          - 4: 16K QPs
> +          - 5: 64K QPs
> +          - 6: 128K QPs
> +          - 7: 256K QPs

Resources are better represented as devlink resource.
Such as,

$ devlink resource set pci/0000:06:00.0 /rdma/max_qps 16384
$ devlink resource set pci/0000:06:00.0 /rdma/max_cqs 8192
$ devlink resource set pci/0000:06:00.0 /rdma/max_mrs 16384

Please move from param to resource.
Shiraz Saleem March 25, 2021, 8:10 p.m. UTC | #2
> Subject: RE: [PATCH v2 05/23] ice: Add devlink params support
> 
> Hi Shiraz,
> 
> > From: Shiraz Saleem <shiraz.saleem@intel.com>
> > Sent: Wednesday, March 24, 2021 5:30 AM
> >
> > Add two new runtime RDMA related devlink parameters to ice driver.
> > 'rdma_resource_limits_sel' is driver-specific while 'rdma_protocol' is generic.
> > Configuration changes result in unplugging the auxiliary RDMA device
> > and re- plugging it with updated values for irdma auxiiary driver to
> > consume at
> > drv.probe()
> >
> > Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
> > ---
> >  .../networking/devlink/devlink-params.rst          |   6 +
> >  Documentation/networking/devlink/ice.rst           |  35 +++++
> >  drivers/net/ethernet/intel/ice/ice_devlink.c       | 146
> > ++++++++++++++++++++-
> >  drivers/net/ethernet/intel/ice/ice_devlink.h       |   6 +
> >  drivers/net/ethernet/intel/ice/ice_main.c          |   2 +
> >  include/net/devlink.h                              |   4 +
> >  net/core/devlink.c                                 |   5 +
> >  7 files changed, 202 insertions(+), 2 deletions(-)
> >
> 
> [..]
> > +.. list-table:: Driver-specific parameters implemented
> > +   :widths: 5 5 5 85
> > +
> > +   * - Name
> > +     - Type
> > +     - Mode
> > +     - Description
> > +   * - ``rdma_resource_limits_sel``
> > +     - string
> > +     - runtime
> > +     - Selector to limit the RDMA resources configured for the
> > + device. The
> > range
> > +       is between 0 and 7 with a default value equal to 3. Each
> > + selector
> > supports
> > +       up to the value specified in the table.
> > +          - 0: 128 QPs
> > +          - 1: 1K QPs
> > +          - 2: 2K QPs
> > +          - 3: 4K QPs
> > +          - 4: 16K QPs
> > +          - 5: 64K QPs
> > +          - 6: 128K QPs
> > +          - 7: 256K QPs
> 
> Resources are better represented as devlink resource.
> Such as,
> 
> $ devlink resource set pci/0000:06:00.0 /rdma/max_qps 16384 $ devlink resource
> set pci/0000:06:00.0 /rdma/max_cqs 8192 $ devlink resource set pci/0000:06:00.0
> /rdma/max_mrs 16384
> 

Hi Parav - Thank you for the feedback.

Maybe I am missing something but I see that a devlink hot reload is required to enforce the update?
There isn't really a de-init required of PCI driver entities in this case for this rdma param.
But only an unplug, plug of the auxdev with new value. Intuitively it feels more runtime-ish.

There is also a device powerof2 requirement on the maxqp which I don't see enforceable as it stands.

This is not super-critical for the initial submission but a nice to have. But I do want to brainstorm options.. 

Shiraz
Parav Pandit March 26, 2021, 5:42 a.m. UTC | #3
> From: Saleem, Shiraz <shiraz.saleem@intel.com>
> Sent: Friday, March 26, 2021 1:40 AM
> > Subject: RE: [PATCH v2 05/23] ice: Add devlink params support

[..]
> >
> > Resources are better represented as devlink resource.
> > Such as,
> >
> > $ devlink resource set pci/0000:06:00.0 /rdma/max_qps 16384 $ devlink
> > resource set pci/0000:06:00.0 /rdma/max_cqs 8192 $ devlink resource
> > set pci/0000:06:00.0 /rdma/max_mrs 16384
> >
> 
> Hi Parav - Thank you for the feedback.
> 
> Maybe I am missing something but I see that a devlink hot reload is required
> to enforce the update?
It isn't mandatory to reload, but yes either reload or driver unbind/bind is needed as you suggested below.

> There isn't really a de-init required of PCI driver entities in this case for this
> rdma param.
> But only an unplug, plug of the auxdev with new value. Intuitively it feels
> more runtime-ish.
> 
Driver unbind/bind to reflect new limits is ok for cases where it is not time sensitive.
For mlx5 use cases, user expects device to be provisioned in < few msecs.
And driver unbind/bind are sub-optimal to achieve it from time and memory wise.
So mlx5 driver prefers to stay away from driver unbind/bind steps.
So I am working on series to not create class aux devices by default for SFs.
Rather to create them on reload.
Something like, 
$ devlink dev param set pci/0000:06:00.0 name pcisf_classes value false cmode driverinit
$ devlink dev class set auxiliary/mlx5_core.sf.4 rdma true
$ devlink resource set auxiliary/mlx5_core.sf.4 path rdma/max_qps size 200000
$ devlink dev reload auxiliary/mlx5_core.sf.4
This last command will create the mlx5_core.rdma.4 aux device and when its bound to driver, it will create IB device with right max_qp.
So rdma device is created only once, instead of twice using unbind/bind sequence.

This may not be possible for the PF/VF device due to backward compatibility and their different usage in system.

> There is also a device powerof2 requirement on the maxqp which I don't see
> enforceable as it stands.
>
Right, but similar to size_params.size_max, size_granularity, I believe size_params can be extended for alignment restriction.

> This is not super-critical for the initial submission but a nice to have. But I do
> want to brainstorm options..
> 
If max_qp is truly a dynamic value that doesn't require device recreation,
extending existing rdma resource command seems more useful to end user than doing unbind/bind.
Jason Gunthorpe March 29, 2021, 4:07 p.m. UTC | #4
On Thu, Mar 25, 2021 at 08:10:13PM +0000, Saleem, Shiraz wrote:

> Maybe I am missing something but I see that a devlink hot reload is
> required to enforce the update?  There isn't really a de-init
> required of PCI driver entities in this case for this rdma param.
> But only an unplug, plug of the auxdev with new value. Intuitively
> it feels more runtime-ish.
> 
> There is also a device powerof2 requirement on the maxqp which I
> don't see enforceable as it stands.
> 
> This is not super-critical for the initial submission but a nice to
> have. But I do want to brainstorm options..

devlink upai often seems to be an adventure, can you submit this
driver without devlink (or any other uapis) then debate how to add
them in as followup patches?

Jason
Shiraz Saleem March 29, 2021, 4:25 p.m. UTC | #5
> Subject: Re: [PATCH v2 05/23] ice: Add devlink params support
> 
> On Thu, Mar 25, 2021 at 08:10:13PM +0000, Saleem, Shiraz wrote:
> 
> > Maybe I am missing something but I see that a devlink hot reload is
> > required to enforce the update?  There isn't really a de-init required
> > of PCI driver entities in this case for this rdma param.
> > But only an unplug, plug of the auxdev with new value. Intuitively it
> > feels more runtime-ish.
> >
> > There is also a device powerof2 requirement on the maxqp which I don't
> > see enforceable as it stands.
> >
> > This is not super-critical for the initial submission but a nice to
> > have. But I do want to brainstorm options..
> 
> devlink upai often seems to be an adventure, can you submit this driver without
> devlink (or any other uapis) then debate how to add them in as followup patches?
> 

Yes to removing this particular param 'resource_limits_sel' from first submission.
I think Parav has given some good pointers. Will review internally and continue the discussion.
And submit follow on patch for it.

W.r.t protocol selection, without devlink, user cannot configure roce (iwarp is default) on the function.
So its good to get that in if possible. It seemed people were ok with our approach on it.

Shiraz
diff mbox series

Patch

diff --git a/Documentation/networking/devlink/devlink-params.rst b/Documentation/networking/devlink/devlink-params.rst
index 54c9f10..0b454c3 100644
--- a/Documentation/networking/devlink/devlink-params.rst
+++ b/Documentation/networking/devlink/devlink-params.rst
@@ -114,3 +114,9 @@  own name.
        will NACK any attempt of other host to reset the device. This parameter
        is useful for setups where a device is shared by different hosts, such
        as multi-host setup.
+   * - ``rdma_protocol``
+     - string
+     - Selects the RDMA protocol selected for multi-protocol devices.
+        - ``iwarp`` iWARP
+	- ``roce`` RoCE
+	- ``ib`` Infiniband
diff --git a/Documentation/networking/devlink/ice.rst b/Documentation/networking/devlink/ice.rst
index a432dc4..28ac332 100644
--- a/Documentation/networking/devlink/ice.rst
+++ b/Documentation/networking/devlink/ice.rst
@@ -193,3 +193,38 @@  Users can request an immediate capture of a snapshot via the
     0000000000000210 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 
     $ devlink region delete pci/0000:01:00.0/device-caps snapshot 1
+
+Parameters
+==========
+
+The ``ice`` driver implements the following generic and driver-specific
+parameters.
+
+.. list-table:: Generic parameters implemented
+
+   * - Name
+     - Mode
+   * - ``rdma_protocol``
+     - runtime
+
+.. list-table:: Driver-specific parameters implemented
+   :widths: 5 5 5 85
+
+   * - Name
+     - Type
+     - Mode
+     - Description
+   * - ``rdma_resource_limits_sel``
+     - string
+     - runtime
+     - Selector to limit the RDMA resources configured for the device. The range
+       is between 0 and 7 with a default value equal to 3. Each selector supports
+       up to the value specified in the table.
+          - 0: 128 QPs
+          - 1: 1K QPs
+          - 2: 2K QPs
+          - 3: 4K QPs
+          - 4: 16K QPs
+          - 5: 64K QPs
+          - 6: 128K QPs
+          - 7: 256K QPs
diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.c b/drivers/net/ethernet/intel/ice/ice_devlink.c
index cf685ee..3e53cc4 100644
--- a/drivers/net/ethernet/intel/ice/ice_devlink.c
+++ b/drivers/net/ethernet/intel/ice/ice_devlink.c
@@ -449,6 +449,113 @@  static int ice_devlink_info_get(struct devlink *devlink,
 	.flash_update = ice_devlink_flash_update,
 };
 
+static int
+ice_devlink_rdma_limits_sel_get(struct devlink *devlink, u32 id,
+				struct devlink_param_gset_ctx *ctx)
+{
+	struct ice_pf *pf = devlink_priv(devlink);
+	struct iidc_core_dev_info *cdev_info =
+		ice_find_cdev_info_by_id(pf, IIDC_RDMA_ID);
+
+	ctx->val.vu8 = cdev_info->rdma_limits_sel;
+
+	return 0;
+}
+
+static int
+ice_devlink_rdma_limits_sel_set(struct devlink *devlink, u32 id,
+				struct devlink_param_gset_ctx *ctx)
+{
+	struct ice_pf *pf = devlink_priv(devlink);
+	struct iidc_core_dev_info *cdev_info =
+		ice_find_cdev_info_by_id(pf, IIDC_RDMA_ID);
+
+	if (ctx->val.vu8 != cdev_info->rdma_limits_sel) {
+		ice_unplug_aux_devs(pf);
+		cdev_info->rdma_limits_sel = ctx->val.vu8;
+		ice_plug_aux_devs(pf);
+	}
+
+	return 0;
+}
+
+static int
+ice_devlink_rdma_limits_sel_validate(struct devlink *devlink, u32 id,
+				     union devlink_param_value val,
+				     struct netlink_ext_ack *extack)
+{
+	if (val.vu8 > IIDC_RDMA_LIMITS_SEL_7) {
+		NL_SET_ERR_MSG_MOD(extack, "RDMA resource limits selector range is (0-7)");
+		return -ERANGE;
+	}
+
+	return 0;
+}
+
+static int
+ice_devlink_rdma_prot_get(struct devlink *devlink, u32 id,
+			  struct devlink_param_gset_ctx *ctx)
+{
+	struct ice_pf *pf = devlink_priv(devlink);
+	struct iidc_core_dev_info *cdev_info =
+		ice_find_cdev_info_by_id(pf, IIDC_RDMA_ID);
+
+	if (cdev_info->rdma_protocol == IIDC_RDMA_PROTOCOL_IWARP)
+		strcpy(ctx->val.vstr, "iwarp");
+	else
+		strcpy(ctx->val.vstr, "roce");
+
+	return 0;
+}
+
+static int
+ice_devlink_rdma_prot_set(struct devlink *devlink, u32 id,
+			  struct devlink_param_gset_ctx *ctx)
+{
+	struct ice_pf *pf = devlink_priv(devlink);
+	struct iidc_core_dev_info *cdev_info =
+		ice_find_cdev_info_by_id(pf, IIDC_RDMA_ID);
+	enum iidc_rdma_protocol prot = !strcmp(ctx->val.vstr, "iwarp") ?
+					IIDC_RDMA_PROTOCOL_IWARP :
+					IIDC_RDMA_PROTOCOL_ROCEV2;
+
+	if (cdev_info->rdma_protocol != prot) {
+		ice_unplug_aux_devs(pf);
+		cdev_info->rdma_protocol = prot;
+		ice_plug_aux_devs(pf);
+	}
+
+	return 0;
+}
+
+static int
+ice_devlink_rdma_prot_validate(struct devlink *devlink, u32 id,
+			       union devlink_param_value val,
+			       struct netlink_ext_ack *extack)
+{
+	char *value = val.vstr;
+
+	if (!strcmp(value, "iwarp") || !strcmp(value, "roce"))
+		return 0;
+
+	NL_SET_ERR_MSG_MOD(extack, "\"iwarp\" and \"roce\" are the only supported values");
+
+	return -EINVAL;
+}
+
+static const struct devlink_param ice_devlink_params[] = {
+	DEVLINK_PARAM_DRIVER(ICE_DEVLINK_PARAM_ID_RDMA_LIMITS_SELECTOR,
+			     "rdma_resource_limits_sel", DEVLINK_PARAM_TYPE_U8,
+			     BIT(DEVLINK_PARAM_CMODE_RUNTIME),
+			     ice_devlink_rdma_limits_sel_get,
+			     ice_devlink_rdma_limits_sel_set,
+			     ice_devlink_rdma_limits_sel_validate),
+	DEVLINK_PARAM_GENERIC(RDMA_PROTOCOL, BIT(DEVLINK_PARAM_CMODE_RUNTIME),
+			      ice_devlink_rdma_prot_get,
+			      ice_devlink_rdma_prot_set,
+			      ice_devlink_rdma_prot_validate),
+};
+
 static void ice_devlink_free(void *devlink_ptr)
 {
 	devlink_free((struct devlink *)devlink_ptr);
@@ -491,15 +598,36 @@  int ice_devlink_register(struct ice_pf *pf)
 {
 	struct devlink *devlink = priv_to_devlink(pf);
 	struct device *dev = ice_pf_to_dev(pf);
+	union devlink_param_value value;
 	int err;
 
 	err = devlink_register(devlink, dev);
+	if (err)
+		goto err;
+
+	err = devlink_params_register(devlink, ice_devlink_params,
+				      ARRAY_SIZE(ice_devlink_params));
 	if (err) {
-		dev_err(dev, "devlink registration failed: %d\n", err);
-		return err;
+		devlink_unregister(devlink);
+		goto err;
 	}
 
+	value.vu8 = IIDC_RDMA_LIMITS_SEL_3;
+	devlink_param_driverinit_value_set(devlink,
+					   ICE_DEVLINK_PARAM_ID_RDMA_LIMITS_SELECTOR,
+					   value);
+
+	strcpy(value.vstr, "iwarp");
+	devlink_param_driverinit_value_set(devlink,
+					   DEVLINK_PARAM_GENERIC_ID_RDMA_PROTOCOL,
+					   value);
+
 	return 0;
+
+err:
+	dev_err(dev, "devlink registration failed: %d\n", err);
+
+	return err;
 }
 
 /**
@@ -510,10 +638,24 @@  int ice_devlink_register(struct ice_pf *pf)
  */
 void ice_devlink_unregister(struct ice_pf *pf)
 {
+	devlink_params_unregister(priv_to_devlink(pf), ice_devlink_params,
+				  ARRAY_SIZE(ice_devlink_params));
 	devlink_unregister(priv_to_devlink(pf));
 }
 
 /**
+ * ice_devlink_params_publish - Publish devlink param
+ * @pf: the PF structure to cleanup
+ *
+ * Publish previously registered devlink parameters after driver
+ * is initialized
+ */
+void ice_devlink_params_publish(struct ice_pf *pf)
+{
+	devlink_params_publish(priv_to_devlink(pf));
+}
+
+/**
  * ice_devlink_create_port - Create a devlink port for this VSI
  * @vsi: the VSI to create a port for
  *
diff --git a/drivers/net/ethernet/intel/ice/ice_devlink.h b/drivers/net/ethernet/intel/ice/ice_devlink.h
index e07e744..865f797 100644
--- a/drivers/net/ethernet/intel/ice/ice_devlink.h
+++ b/drivers/net/ethernet/intel/ice/ice_devlink.h
@@ -4,10 +4,16 @@ 
 #ifndef _ICE_DEVLINK_H_
 #define _ICE_DEVLINK_H_
 
+enum ice_devlink_param_id {
+	ICE_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX,
+	ICE_DEVLINK_PARAM_ID_RDMA_LIMITS_SELECTOR,
+};
+
 struct ice_pf *ice_allocate_pf(struct device *dev);
 
 int ice_devlink_register(struct ice_pf *pf);
 void ice_devlink_unregister(struct ice_pf *pf);
+void ice_devlink_params_publish(struct ice_pf *pf);
 int ice_devlink_create_port(struct ice_vsi *vsi);
 void ice_devlink_destroy_port(struct ice_vsi *vsi);
 
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 4b03157..789aa39 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -4346,6 +4346,8 @@  static void ice_print_wake_reason(struct ice_pf *pf)
 		dev_warn(dev, "RDMA is not supported on this device\n");
 	}
 
+	ice_devlink_params_publish(pf);
+
 	return 0;
 
 err_init_aux_unroll:
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 853420d..09e4d76 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -498,6 +498,7 @@  enum devlink_param_generic_id {
 	DEVLINK_PARAM_GENERIC_ID_RESET_DEV_ON_DRV_PROBE,
 	DEVLINK_PARAM_GENERIC_ID_ENABLE_ROCE,
 	DEVLINK_PARAM_GENERIC_ID_ENABLE_REMOTE_DEV_RESET,
+	DEVLINK_PARAM_GENERIC_ID_RDMA_PROTOCOL,
 
 	/* add new param generic ids above here*/
 	__DEVLINK_PARAM_GENERIC_ID_MAX,
@@ -538,6 +539,9 @@  enum devlink_param_generic_id {
 #define DEVLINK_PARAM_GENERIC_ENABLE_REMOTE_DEV_RESET_NAME "enable_remote_dev_reset"
 #define DEVLINK_PARAM_GENERIC_ENABLE_REMOTE_DEV_RESET_TYPE DEVLINK_PARAM_TYPE_BOOL
 
+#define DEVLINK_PARAM_GENERIC_RDMA_PROTOCOL_NAME "rdma_protocol"
+#define DEVLINK_PARAM_GENERIC_RDMA_PROTOCOL_TYPE DEVLINK_PARAM_TYPE_STRING
+
 #define DEVLINK_PARAM_GENERIC(_id, _cmodes, _get, _set, _validate)	\
 {									\
 	.id = DEVLINK_PARAM_GENERIC_ID_##_id,				\
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 737b61c..1bb3865 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -3766,6 +3766,11 @@  static int devlink_nl_cmd_flash_update(struct sk_buff *skb,
 		.name = DEVLINK_PARAM_GENERIC_ENABLE_REMOTE_DEV_RESET_NAME,
 		.type = DEVLINK_PARAM_GENERIC_ENABLE_REMOTE_DEV_RESET_TYPE,
 	},
+	{
+		.id = DEVLINK_PARAM_GENERIC_ID_RDMA_PROTOCOL,
+		.name = DEVLINK_PARAM_GENERIC_RDMA_PROTOCOL_NAME,
+		.type = DEVLINK_PARAM_GENERIC_RDMA_PROTOCOL_TYPE,
+	},
 };
 
 static int devlink_param_generic_verify(const struct devlink_param *param)