diff mbox series

[for-next,v2] RDMA/bnxt_re: Congestion control settings using debugfs hook

Message ID 1737301535-6599-1-git-send-email-selvin.xavier@broadcom.com (mailing list archive)
State New

Commit Message

Selvin Xavier Jan. 19, 2025, 3:45 p.m. UTC
Implement routines to set and get the congestion control settings.
This enables users to tune the settings for their network.

Only the GEN 0 version of the parameters is currently supported.
Reading these files queries the firmware and reports the values
currently programmed. Writing to the files sends commands that
update the congestion control settings.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
v1 -> v2:
  Addressed Leon's comments
     - Rename debugfs file "g" to "run_avg_weight_g"
     - Fix the indentation errors
     - Remove the unnecessary error message during the read entry point
     - Fix the return value

 drivers/infiniband/hw/bnxt_re/bnxt_re.h |   2 +
 drivers/infiniband/hw/bnxt_re/debugfs.c | 212 +++++++++++++++++++++++++++++++-
 drivers/infiniband/hw/bnxt_re/debugfs.h |  15 +++
 3 files changed, 228 insertions(+), 1 deletion(-)
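
For illustration, here is a minimal user-space sketch of how the files described in the commit message might be read and written. The helper names cc_param_read() and cc_param_write() are hypothetical. The example path in the note below assumes that debugfs is mounted at /sys/kernel/debug and that the driver's root directory is named bnxt_re; neither is shown in this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: read one decimal value from a cc_config file.
 * Returns 0 on success or a negative errno-style code.
 */
static int cc_param_read(const char *path, unsigned long *val)
{
	char buf[32];
	char *end;
	FILE *f;

	assert(path && val);

	f = fopen(path, "r");
	if (!f)
		return -errno;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -EIO;
	}
	fclose(f);

	errno = 0;
	*val = strtoul(buf, &end, 10);
	if (errno || end == buf)
		return -EINVAL;
	return 0;
}

/*
 * Hypothetical helper: write one value to a cc_config file.
 * The value is short enough to be flushed as one write when the
 * stream is closed, which matches a handler that parses each write
 * on its own.
 */
static int cc_param_write(const char *path, unsigned long val)
{
	FILE *f;
	int rc = 0;

	assert(path);

	f = fopen(path, "w");
	if (!f)
		return -errno;
	if (fprintf(f, "%lu", val) < 0)
		rc = -EIO;
	if (fclose(f) != 0 && !rc)
		rc = -errno;
	return rc;
}
```

Under those assumptions, a caller would pass a path such as /sys/kernel/debug/bnxt_re/<PCI device name>/cc_config/run_avg_weight_g to cc_param_write() and then call cc_param_read() on the same path to see the value the firmware reports back.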

Comments

Jason Gunthorpe Jan. 20, 2025, 4:40 p.m. UTC | #1
On Sun, Jan 19, 2025 at 07:45:35AM -0800, Selvin Xavier wrote:
> Implements routines to set and get different settings  of
> the congestion control. This will enable the users to modify
> the settings according to their network.

Should something like this be in debugfs though?

bnxt_qplib_modify_cc() is just sending a firmware command, seems like
this should belong to fwctl?

Additionally there may be interest in some common way to control CC
for RDMA..

Jason
Selvin Xavier Jan. 21, 2025, 10:40 a.m. UTC | #2
On Mon, Jan 20, 2025 at 10:10 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Sun, Jan 19, 2025 at 07:45:35AM -0800, Selvin Xavier wrote:
> > Implements routines to set and get different settings  of
> > the congestion control. This will enable the users to modify
> > the settings according to their network.
>
> Should something like this be in debugfs though?
Since these are Broadcom-specific parameters, I thought it's better for
them to be under debugfs. I also used a similar implementation in mlx5
as a reference.
>
> bnxt_qplib_modify_cc() is just sending a firmware command, seems like
> this should belong to fwctl?
Agree. We can move to this model once fwctl is accepted. For now, it
is important for us to support our customers with an immediate
solution. Customers are asking for this support.

>
> Additionally there may be interest in some common way to control CC
> for RDMA..

Do you think there are common parameters for multiple vendors here? I
think enable/disable is an option.

>
> Jason
Jason Gunthorpe Jan. 21, 2025, 3:31 p.m. UTC | #3
On Tue, Jan 21, 2025 at 04:10:33PM +0530, Selvin Xavier wrote:
> On Mon, Jan 20, 2025 at 10:10 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Sun, Jan 19, 2025 at 07:45:35AM -0800, Selvin Xavier wrote:
> > > Implements routines to set and get different settings  of
> > > the congestion control. This will enable the users to modify
> > > the settings according to their network.
> >
> > Should something like this be in debugfs though?
> Since these are Broadcom specific parameters, i thought its better to
> be under debugfs. Also I took the reference of a similar
> implementation in mlx5.

debugfs is disabled in a lot of deployments, it is a big part of why
we are doing fwctl. If you know it works for your cases, debugfs is
pretty open ended..

> > bnxt_qplib_modify_cc() is just sending a firmware command, seems like
> > this should belong to fwctl?
> Agree. We can move to this model once fwctl is accepted. For now, it
> is important for us to support our customers with an immediate
> solution. Customers are asking for this support.

Well, fwctl can be accepted when you guys come through with an
implementation :)
 
> > Additionally there may be interest in some common way to control CC
> > for RDMA..
> 
> Do you think there are common parameters for multiple vendors here? I
> think enable/disable is an option.

I haven't seen much commonality here, every site seems to have their
own totally different stuff right now.

Jason
Selvin Xavier Jan. 22, 2025, 8:09 a.m. UTC | #4
On Tue, Jan 21, 2025 at 9:01 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Jan 21, 2025 at 04:10:33PM +0530, Selvin Xavier wrote:
> > On Mon, Jan 20, 2025 at 10:10 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Sun, Jan 19, 2025 at 07:45:35AM -0800, Selvin Xavier wrote:
> > > > Implements routines to set and get different settings  of
> > > > the congestion control. This will enable the users to modify
> > > > the settings according to their network.
> > >
> > > Should something like this be in debugfs though?
> > Since these are Broadcom specific parameters, i thought its better to
> > be under debugfs. Also I took the reference of a similar
> > implementation in mlx5.
>
> debugfs is disabled in a lot of deployments, it is a big part of why
> we are doing fwctl. If you know it works for you cases, debugfs is
> pretty open ended..
The main use case for this debugfs support is evaluation customers
tuning the settings for their networks. So debugfs should be okay.
>
> > > bnxt_qplib_modify_cc() is just sending a firmware command, seems like
> > > this should belong to fwctl?
> > Agree. We can move to this model once fwctl is accepted. For now, it
> > is important for us to support our customers with an immediate
> > solution. Customers are asking for this support.
>
> Well, fwctl can be accepted when you guys come through with an
> implementation :)
>
> > > Additionally there may be interest in some common way to control CC
> > > for RDMA..
> >
> > Do you think there are common parameters for multiple vendors here? I
> > think enable/disable is an option.
>
> I haven't seen much commonality here, every site seems to have their
> own totally different stuff right now.
>
> Jason
Jason Gunthorpe Jan. 22, 2025, 3:22 p.m. UTC | #5
On Wed, Jan 22, 2025 at 01:39:16PM +0530, Selvin Xavier wrote:
> On Tue, Jan 21, 2025 at 9:01 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Tue, Jan 21, 2025 at 04:10:33PM +0530, Selvin Xavier wrote:
> > > On Mon, Jan 20, 2025 at 10:10 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > >
> > > > On Sun, Jan 19, 2025 at 07:45:35AM -0800, Selvin Xavier wrote:
> > > > > Implements routines to set and get different settings  of
> > > > > the congestion control. This will enable the users to modify
> > > > > the settings according to their network.
> > > >
> > > > Should something like this be in debugfs though?
> > > Since these are Broadcom specific parameters, i thought its better to
> > > be under debugfs. Also I took the reference of a similar
> > > implementation in mlx5.
> >
> > debugfs is disabled in a lot of deployments, it is a big part of why
> > we are doing fwctl. If you know it works for you cases, debugfs is
> > pretty open ended..
> The main use case for this debugfs support is for evaluation customers and
> the tuning for their network. So debugfs should be okay.

In my experience it makes no difference. If the customer is using
secure boot then they are always using secure boot, except perhaps in
small lab systems.

Are you certain this is useful, not just "should be okay"?

Jason
Selvin Xavier Jan. 23, 2025, 5:20 a.m. UTC | #6
On Wed, Jan 22, 2025 at 8:52 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Jan 22, 2025 at 01:39:16PM +0530, Selvin Xavier wrote:
> > On Tue, Jan 21, 2025 at 9:01 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Tue, Jan 21, 2025 at 04:10:33PM +0530, Selvin Xavier wrote:
> > > > On Mon, Jan 20, 2025 at 10:10 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > > >
> > > > > On Sun, Jan 19, 2025 at 07:45:35AM -0800, Selvin Xavier wrote:
> > > > > > Implements routines to set and get different settings  of
> > > > > > the congestion control. This will enable the users to modify
> > > > > > the settings according to their network.
> > > > >
> > > > > Should something like this be in debugfs though?
> > > > Since these are Broadcom specific parameters, i thought its better to
> > > > be under debugfs. Also I took the reference of a similar
> > > > implementation in mlx5.
> > >
> > > debugfs is disabled in a lot of deployments, it is a big part of why
> > > we are doing fwctl. If you know it works for you cases, debugfs is
> > > pretty open ended..
> > The main use case for this debugfs support is for evaluation customers and
> > the tuning for their network. So debugfs should be okay.
>
> In my experience it makes no difference, if the customer is using
> secure boot then they are always using secure boot expect in small lab
> systems perhaps.
>
> Are you certain this is useful not just "should be okay" ?
Yes. It is useful, based on the requests we have seen from our customers.
>
> Jason

Patch

diff --git a/drivers/infiniband/hw/bnxt_re/bnxt_re.h b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
index b91a85a..b33b04e 100644
--- a/drivers/infiniband/hw/bnxt_re/bnxt_re.h
+++ b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
@@ -232,6 +232,8 @@  struct bnxt_re_dev {
 	unsigned long			event_bitmap;
 	struct bnxt_qplib_cc_param	cc_param;
 	struct workqueue_struct		*dcb_wq;
+	struct dentry                   *cc_config;
+	struct bnxt_re_dbg_cc_config_params *cc_config_params;
 };
 
 #define to_bnxt_re_dev(ptr, member)	\
diff --git a/drivers/infiniband/hw/bnxt_re/debugfs.c b/drivers/infiniband/hw/bnxt_re/debugfs.c
index 7c47039..f4dd2fb 100644
--- a/drivers/infiniband/hw/bnxt_re/debugfs.c
+++ b/drivers/infiniband/hw/bnxt_re/debugfs.c
@@ -22,6 +22,23 @@ 
 
 static struct dentry *bnxt_re_debugfs_root;
 
+static const char * const bnxt_re_cc_gen0_name[] = {
+	"enable_cc",
+	"run_avg_weight_g",
+	"num_phase_per_state",
+	"init_cr",
+	"init_tr",
+	"tos_ecn",
+	"tos_dscp",
+	"alt_vlan_pcp",
+	"alt_vlan_dscp",
+	"rtt",
+	"cc_mode",
+	"tcp_cp",
+	"tx_queue",
+	"inactivity_cp",
+};
+
 static inline const char *bnxt_re_qp_state_str(u8 state)
 {
 	switch (state) {
@@ -110,19 +127,212 @@  void bnxt_re_debug_rem_qpinfo(struct bnxt_re_dev *rdev, struct bnxt_re_qp *qp)
 	debugfs_remove(qp->dentry);
 }
 
+static int map_cc_config_offset_gen0_ext0(u32 offset, struct bnxt_qplib_cc_param *ccparam, u32 *val)
+{
+	u64 map_offset;
+
+	map_offset = BIT(offset);
+
+	switch (map_offset) {
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_ENABLE_CC:
+		*val = ccparam->enable;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_G:
+		*val = ccparam->g;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_NUMPHASEPERSTATE:
+		*val = ccparam->nph_per_state;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_INIT_CR:
+		*val = ccparam->init_cr;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_INIT_TR:
+		*val = ccparam->init_tr;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_TOS_ECN:
+		*val = ccparam->tos_ecn;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_TOS_DSCP:
+		*val = ccparam->tos_dscp;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_ALT_VLAN_PCP:
+		*val = ccparam->alt_vlan_pcp;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_ALT_TOS_DSCP:
+		*val = ccparam->alt_tos_dscp;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_RTT:
+		*val = ccparam->rtt;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_CC_MODE:
+		*val = ccparam->cc_mode;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_TCP_CP:
+		*val = ccparam->tcp_cp;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static ssize_t bnxt_re_cc_config_get(struct file *filp, char __user *buffer,
+				     size_t usr_buf_len, loff_t *ppos)
+{
+	struct bnxt_re_cc_param *dbg_cc_param = filp->private_data;
+	struct bnxt_re_dev *rdev = dbg_cc_param->rdev;
+	struct bnxt_qplib_cc_param ccparam = {};
+	u32 offset = dbg_cc_param->offset;
+	char buf[16];
+	u32 val;
+	int rc;
+
+	rc = bnxt_qplib_query_cc_param(&rdev->qplib_res, &ccparam);
+	if (rc)
+		return rc;
+
+	rc = map_cc_config_offset_gen0_ext0(offset, &ccparam, &val);
+	if (rc)
+		return rc;
+
+	rc = snprintf(buf, sizeof(buf), "%u\n", val);
+	if (rc < 0)
+		return rc;
+
+	return simple_read_from_buffer(buffer, usr_buf_len, ppos, (u8 *)(buf), rc);
+}
+
+static void bnxt_re_fill_gen0_ext0(struct bnxt_qplib_cc_param *ccparam, u32 offset, u32 val)
+{
+	u32 modify_mask;
+
+	modify_mask = BIT(offset);
+
+	switch (modify_mask) {
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_ENABLE_CC:
+		ccparam->enable = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_G:
+		ccparam->g = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_NUMPHASEPERSTATE:
+		ccparam->nph_per_state = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_INIT_CR:
+		ccparam->init_cr = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_INIT_TR:
+		ccparam->init_tr = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_TOS_ECN:
+		ccparam->tos_ecn = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_TOS_DSCP:
+		ccparam->tos_dscp = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_ALT_VLAN_PCP:
+		ccparam->alt_vlan_pcp = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_ALT_TOS_DSCP:
+		ccparam->alt_tos_dscp = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_RTT:
+		ccparam->rtt = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_CC_MODE:
+		ccparam->cc_mode = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_TCP_CP:
+		ccparam->tcp_cp = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_TX_QUEUE:
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_INACTIVITY_CP:
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_TIME_PER_PHASE:
+		ccparam->time_pph = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_PKTS_PER_PHASE:
+		ccparam->pkts_pph = val;
+		break;
+	}
+
+	ccparam->mask = modify_mask;
+}
+
+static int bnxt_re_configure_cc(struct bnxt_re_dev *rdev, u32 gen_ext, u32 offset, u32 val)
+{
+	struct bnxt_qplib_cc_param ccparam = { };
+
+	/* Supporting only Gen 0 now */
+	if (gen_ext == CC_CONFIG_GEN0_EXT0)
+		bnxt_re_fill_gen0_ext0(&ccparam, offset, val);
+	else
+		return -EINVAL;
+
+	bnxt_qplib_modify_cc(&rdev->qplib_res, &ccparam);
+	return 0;
+}
+
+static ssize_t bnxt_re_cc_config_set(struct file *filp, const char __user *buffer,
+				     size_t count, loff_t *ppos)
+{
+	struct bnxt_re_cc_param *dbg_cc_param = filp->private_data;
+	struct bnxt_re_dev *rdev = dbg_cc_param->rdev;
+	u32 offset = dbg_cc_param->offset;
+	u8 cc_gen = dbg_cc_param->cc_gen;
+	u32 val;
+	int rc;
+
+	/*
+	 * kstrtou32_from_user() bounds the copy internally; copying 'count'
+	 * bytes into a fixed 16-byte stack buffer could overflow it.
+	 */
+	rc = kstrtou32_from_user(buffer, count, 0, &val);
+	if (rc)
+		return rc;
+
+	rc = bnxt_re_configure_cc(rdev, cc_gen, offset, val);
+	return rc ? rc : count;
+}
+
+static const struct file_operations bnxt_re_cc_config_ops = {
+	.owner = THIS_MODULE,
+	.open = simple_open,
+	.read = bnxt_re_cc_config_get,
+	.write = bnxt_re_cc_config_set,
+};
+
 void bnxt_re_debugfs_add_pdev(struct bnxt_re_dev *rdev)
 {
 	struct pci_dev *pdev = rdev->en_dev->pdev;
+	struct bnxt_re_dbg_cc_config_params *cc_params;
+	int i;
 
 	rdev->dbg_root = debugfs_create_dir(dev_name(&pdev->dev), bnxt_re_debugfs_root);
 
 	rdev->qp_debugfs = debugfs_create_dir("QPs", rdev->dbg_root);
+	rdev->cc_config = debugfs_create_dir("cc_config", rdev->dbg_root);
+	rdev->cc_config_params = kzalloc(sizeof(*cc_params), GFP_KERNEL);
+	if (!rdev->cc_config_params)
+		return;
+	for (i = 0; i < BNXT_RE_CC_PARAM_GEN0; i++) {
+		struct bnxt_re_cc_param *tmp_params = &rdev->cc_config_params->gen0_parms[i];
+
+		tmp_params->rdev = rdev;
+		tmp_params->offset = i;
+		tmp_params->cc_gen = CC_CONFIG_GEN0_EXT0;
+		tmp_params->dentry = debugfs_create_file(bnxt_re_cc_gen0_name[i], 0400,
+							 rdev->cc_config, tmp_params,
+							 &bnxt_re_cc_config_ops);
+	}
 }
 
 void bnxt_re_debugfs_rem_pdev(struct bnxt_re_dev *rdev)
 {
 	debugfs_remove_recursive(rdev->qp_debugfs);
-
+	debugfs_remove_recursive(rdev->cc_config);
+	kfree(rdev->cc_config_params);
 	debugfs_remove_recursive(rdev->dbg_root);
 	rdev->dbg_root = NULL;
 }
diff --git a/drivers/infiniband/hw/bnxt_re/debugfs.h b/drivers/infiniband/hw/bnxt_re/debugfs.h
index cd3be0a9..3374097 100644
--- a/drivers/infiniband/hw/bnxt_re/debugfs.h
+++ b/drivers/infiniband/hw/bnxt_re/debugfs.h
@@ -18,4 +18,19 @@  void bnxt_re_debugfs_rem_pdev(struct bnxt_re_dev *rdev);
 void bnxt_re_register_debugfs(void);
 void bnxt_re_unregister_debugfs(void);
 
+#define CC_CONFIG_GEN_EXT(x, y)	(((x) << 16) | (y))
+#define CC_CONFIG_GEN0_EXT0  CC_CONFIG_GEN_EXT(0, 0)
+
+#define BNXT_RE_CC_PARAM_GEN0	__ffs(CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_INACTIVITY_CP)
+
+struct bnxt_re_cc_param {
+	struct bnxt_re_dev *rdev;
+	struct dentry *dentry;
+	u32 offset;
+	u8 cc_gen;
+};
+
+struct bnxt_re_dbg_cc_config_params {
+	struct bnxt_re_cc_param        gen0_parms[BNXT_RE_CC_PARAM_GEN0];
+};
 #endif
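
As a rough illustration of the index-to-mask mapping the patch relies on (each file's offset i is turned into BIT(i) and matched against a modify-mask value), here is a stand-alone user-space sketch. It assumes that the CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_* values are consecutive single bits starting at bit 0, in the same order as bnxt_re_cc_gen0_name[]; those values are not shown in this patch. The helpers cc_index_to_mask() and cc_name_to_mask() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* File names in the order used by bnxt_re_cc_gen0_name[] in the patch. */
static const char *const cc_gen0_name[] = {
	"enable_cc", "run_avg_weight_g", "num_phase_per_state", "init_cr",
	"init_tr", "tos_ecn", "tos_dscp", "alt_vlan_pcp", "alt_vlan_dscp",
	"rtt", "cc_mode", "tcp_cp", "tx_queue", "inactivity_cp",
};

#define CC_GEN0_COUNT (sizeof(cc_gen0_name) / sizeof(cc_gen0_name[0]))

/* Same idea as the kernel's BIT(): table entry i selects mask bit i. */
static uint32_t cc_index_to_mask(size_t index)
{
	assert(index < 32);
	return (uint32_t)1 << index;
}

/* Hypothetical helper: look up a file name; returns its mask bit or 0. */
static uint32_t cc_name_to_mask(const char *name)
{
	size_t i;

	for (i = 0; i < CC_GEN0_COUNT; i++) {
		if (strcmp(name, cc_gen0_name[i]) == 0)
			return cc_index_to_mask(i);
	}
	return 0;
}
```

Under the stated assumption, "enable_cc" maps to bit 0 and "rtt" to bit 9. If the firmware header orders the mask bits differently, the name table and the mask definitions would have to be kept in sync by hand.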