Message ID | 1737301535-6599-1-git-send-email-selvin.xavier@broadcom.com (mailing list archive) |
---|---|
State | New |
Series | [for-next,v2] RDMA/bnxt_re: Congestion control settings using debugfs hook |
On Sun, Jan 19, 2025 at 07:45:35AM -0800, Selvin Xavier wrote:
> Implements routines to set and get different settings of
> the congestion control. This will enable the users to modify
> the settings according to their network.

Should something like this be in debugfs though?

bnxt_qplib_modify_cc() is just sending a firmware command, seems like
this should belong to fwctl?

Additionally there may be interest in some common way to control CC
for RDMA..

Jason
On Mon, Jan 20, 2025 at 10:10 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Sun, Jan 19, 2025 at 07:45:35AM -0800, Selvin Xavier wrote:
> > Implements routines to set and get different settings of
> > the congestion control. This will enable the users to modify
> > the settings according to their network.
>
> Should something like this be in debugfs though?
Since these are Broadcom specific parameters, I thought it's better to
be under debugfs. Also I took the reference of a similar
implementation in mlx5.
>
> bnxt_qplib_modify_cc() is just sending a firmware command, seems like
> this should belong to fwctl?
Agree. We can move to this model once fwctl is accepted. For now, it
is important for us to support our customers with an immediate
solution. Customers are asking for this support.
>
> Additionally there may be interest in some common way to control CC
> for RDMA..
Do you think there are common parameters for multiple vendors here? I
think enable/disable is an option.
>
> Jason
On Tue, Jan 21, 2025 at 04:10:33PM +0530, Selvin Xavier wrote:
> On Mon, Jan 20, 2025 at 10:10 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Sun, Jan 19, 2025 at 07:45:35AM -0800, Selvin Xavier wrote:
> > > Implements routines to set and get different settings of
> > > the congestion control. This will enable the users to modify
> > > the settings according to their network.
> >
> > Should something like this be in debugfs though?
> Since these are Broadcom specific parameters, I thought it's better to
> be under debugfs. Also I took the reference of a similar
> implementation in mlx5.

debugfs is disabled in a lot of deployments, it is a big part of why
we are doing fwctl. If you know it works for your cases, debugfs is
pretty open ended..

> > bnxt_qplib_modify_cc() is just sending a firmware command, seems like
> > this should belong to fwctl?
> Agree. We can move to this model once fwctl is accepted. For now, it
> is important for us to support our customers with an immediate
> solution. Customers are asking for this support.

Well, fwctl can be accepted when you guys come through with an
implementation :)

> > Additionally there may be interest in some common way to control CC
> > for RDMA..
>
> Do you think there are common parameters for multiple vendors here? I
> think enable/disable is an option.

I haven't seen much commonality here, every site seems to have their
own totally different stuff right now.

Jason
On Tue, Jan 21, 2025 at 9:01 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Jan 21, 2025 at 04:10:33PM +0530, Selvin Xavier wrote:
> > On Mon, Jan 20, 2025 at 10:10 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Sun, Jan 19, 2025 at 07:45:35AM -0800, Selvin Xavier wrote:
> > > > Implements routines to set and get different settings of
> > > > the congestion control. This will enable the users to modify
> > > > the settings according to their network.
> > >
> > > Should something like this be in debugfs though?
> > Since these are Broadcom specific parameters, I thought it's better to
> > be under debugfs. Also I took the reference of a similar
> > implementation in mlx5.
>
> debugfs is disabled in a lot of deployments, it is a big part of why
> we are doing fwctl. If you know it works for your cases, debugfs is
> pretty open ended..
The main use case for this debugfs support is for evaluation customers and
the tuning for their network. So debugfs should be okay.
>
> > > bnxt_qplib_modify_cc() is just sending a firmware command, seems like
> > > this should belong to fwctl?
> > Agree. We can move to this model once fwctl is accepted. For now, it
> > is important for us to support our customers with an immediate
> > solution. Customers are asking for this support.
>
> Well, fwctl can be accepted when you guys come through with an
> implementation :)
>
> > > Additionally there may be interest in some common way to control CC
> > > for RDMA..
> >
> > Do you think there are common parameters for multiple vendors here? I
> > think enable/disable is an option.
>
> I haven't seen much commonality here, every site seems to have their
> own totally different stuff right now.
>
> Jason
On Wed, Jan 22, 2025 at 01:39:16PM +0530, Selvin Xavier wrote:
> On Tue, Jan 21, 2025 at 9:01 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Tue, Jan 21, 2025 at 04:10:33PM +0530, Selvin Xavier wrote:
> > > On Mon, Jan 20, 2025 at 10:10 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > >
> > > > On Sun, Jan 19, 2025 at 07:45:35AM -0800, Selvin Xavier wrote:
> > > > > Implements routines to set and get different settings of
> > > > > the congestion control. This will enable the users to modify
> > > > > the settings according to their network.
> > > >
> > > > Should something like this be in debugfs though?
> > > Since these are Broadcom specific parameters, I thought it's better to
> > > be under debugfs. Also I took the reference of a similar
> > > implementation in mlx5.
> >
> > debugfs is disabled in a lot of deployments, it is a big part of why
> > we are doing fwctl. If you know it works for your cases, debugfs is
> > pretty open ended..
> The main use case for this debugfs support is for evaluation customers and
> the tuning for their network. So debugfs should be okay.

In my experience it makes no difference, if the customer is using
secure boot then they are always using secure boot except in small lab
systems perhaps.

Are you certain this is useful, not just "should be okay"?

Jason
On Wed, Jan 22, 2025 at 8:52 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Wed, Jan 22, 2025 at 01:39:16PM +0530, Selvin Xavier wrote:
> > On Tue, Jan 21, 2025 at 9:01 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Tue, Jan 21, 2025 at 04:10:33PM +0530, Selvin Xavier wrote:
> > > > On Mon, Jan 20, 2025 at 10:10 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > > >
> > > > > On Sun, Jan 19, 2025 at 07:45:35AM -0800, Selvin Xavier wrote:
> > > > > > Implements routines to set and get different settings of
> > > > > > the congestion control. This will enable the users to modify
> > > > > > the settings according to their network.
> > > > >
> > > > > Should something like this be in debugfs though?
> > > > Since these are Broadcom specific parameters, I thought it's better to
> > > > be under debugfs. Also I took the reference of a similar
> > > > implementation in mlx5.
> > >
> > > debugfs is disabled in a lot of deployments, it is a big part of why
> > > we are doing fwctl. If you know it works for your cases, debugfs is
> > > pretty open ended..
> > The main use case for this debugfs support is for evaluation customers and
> > the tuning for their network. So debugfs should be okay.
>
> In my experience it makes no difference, if the customer is using
> secure boot then they are always using secure boot except in small lab
> systems perhaps.
>
> Are you certain this is useful, not just "should be okay"?
Yes. It is useful based on the requests we have seen from our customers.
>
> Jason
diff --git a/drivers/infiniband/hw/bnxt_re/bnxt_re.h b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
index b91a85a..b33b04e 100644
--- a/drivers/infiniband/hw/bnxt_re/bnxt_re.h
+++ b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
@@ -232,6 +232,8 @@ struct bnxt_re_dev {
 	unsigned long event_bitmap;
 	struct bnxt_qplib_cc_param cc_param;
 	struct workqueue_struct *dcb_wq;
+	struct dentry *cc_config;
+	struct bnxt_re_dbg_cc_config_params *cc_config_params;
 };
 
 #define to_bnxt_re_dev(ptr, member) \
diff --git a/drivers/infiniband/hw/bnxt_re/debugfs.c b/drivers/infiniband/hw/bnxt_re/debugfs.c
index 7c47039..f4dd2fb 100644
--- a/drivers/infiniband/hw/bnxt_re/debugfs.c
+++ b/drivers/infiniband/hw/bnxt_re/debugfs.c
@@ -22,6 +22,23 @@
 
 static struct dentry *bnxt_re_debugfs_root;
 
+static const char * const bnxt_re_cc_gen0_name[] = {
+	"enable_cc",
+	"run_avg_weight_g",
+	"num_phase_per_state",
+	"init_cr",
+	"init_tr",
+	"tos_ecn",
+	"tos_dscp",
+	"alt_vlan_pcp",
+	"alt_vlan_dscp",
+	"rtt",
+	"cc_mode",
+	"tcp_cp",
+	"tx_queue",
+	"inactivity_cp",
+};
+
 static inline const char *bnxt_re_qp_state_str(u8 state)
 {
 	switch (state) {
@@ -110,19 +127,212 @@ void bnxt_re_debug_rem_qpinfo(struct bnxt_re_dev *rdev, struct bnxt_re_qp *qp)
 	debugfs_remove(qp->dentry);
 }
 
+static int map_cc_config_offset_gen0_ext0(u32 offset, struct bnxt_qplib_cc_param *ccparam, u32 *val)
+{
+	u64 map_offset;
+
+	map_offset = BIT(offset);
+
+	switch (map_offset) {
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_ENABLE_CC:
+		*val = ccparam->enable;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_G:
+		*val = ccparam->g;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_NUMPHASEPERSTATE:
+		*val = ccparam->nph_per_state;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_INIT_CR:
+		*val = ccparam->init_cr;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_INIT_TR:
+		*val = ccparam->init_tr;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_TOS_ECN:
+		*val = ccparam->tos_ecn;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_TOS_DSCP:
+		*val = ccparam->tos_dscp;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_ALT_VLAN_PCP:
+		*val = ccparam->alt_vlan_pcp;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_ALT_TOS_DSCP:
+		*val = ccparam->alt_tos_dscp;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_RTT:
+		*val = ccparam->rtt;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_CC_MODE:
+		*val = ccparam->cc_mode;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_TCP_CP:
+		*val = ccparam->tcp_cp;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static ssize_t bnxt_re_cc_config_get(struct file *filp, char __user *buffer,
+				     size_t usr_buf_len, loff_t *ppos)
+{
+	struct bnxt_re_cc_param *dbg_cc_param = filp->private_data;
+	struct bnxt_re_dev *rdev = dbg_cc_param->rdev;
+	struct bnxt_qplib_cc_param ccparam = {};
+	u32 offset = dbg_cc_param->offset;
+	char buf[16];
+	u32 val;
+	int rc;
+
+	rc = bnxt_qplib_query_cc_param(&rdev->qplib_res, &ccparam);
+	if (rc)
+		return rc;
+
+	rc = map_cc_config_offset_gen0_ext0(offset, &ccparam, &val);
+	if (rc)
+		return rc;
+
+	rc = snprintf(buf, sizeof(buf), "%d\n", val);
+	if (rc < 0)
+		return rc;
+
+	return simple_read_from_buffer(buffer, usr_buf_len, ppos, (u8 *)(buf), rc);
+}
+
+static void bnxt_re_fill_gen0_ext0(struct bnxt_qplib_cc_param *ccparam, u32 offset, u32 val)
+{
+	u32 modify_mask;
+
+	modify_mask = BIT(offset);
+
+	switch (modify_mask) {
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_ENABLE_CC:
+		ccparam->enable = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_G:
+		ccparam->g = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_NUMPHASEPERSTATE:
+		ccparam->nph_per_state = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_INIT_CR:
+		ccparam->init_cr = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_INIT_TR:
+		ccparam->init_tr = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_TOS_ECN:
+		ccparam->tos_ecn = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_TOS_DSCP:
+		ccparam->tos_dscp = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_ALT_VLAN_PCP:
+		ccparam->alt_vlan_pcp = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_ALT_TOS_DSCP:
+		ccparam->alt_tos_dscp = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_RTT:
+		ccparam->rtt = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_CC_MODE:
+		ccparam->cc_mode = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_TCP_CP:
+		ccparam->tcp_cp = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_TX_QUEUE:
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_INACTIVITY_CP:
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_TIME_PER_PHASE:
+		ccparam->time_pph = val;
+		break;
+	case CMDQ_MODIFY_ROCE_CC_MODIFY_MASK_PKTS_PER_PHASE:
+		ccparam->pkts_pph = val;
+		break;
+	}
+
+	ccparam->mask = modify_mask;
+}
+
+static int bnxt_re_configure_cc(struct bnxt_re_dev *rdev, u32 gen_ext, u32 offset, u32 val)
+{
+	struct bnxt_qplib_cc_param ccparam = { };
+
+	/* Supporting only Gen 0 now */
+	if (gen_ext == CC_CONFIG_GEN0_EXT0)
+		bnxt_re_fill_gen0_ext0(&ccparam, offset, val);
+	else
+		return -EINVAL;
+
+	bnxt_qplib_modify_cc(&rdev->qplib_res, &ccparam);
+	return 0;
+}
+
+static ssize_t bnxt_re_cc_config_set(struct file *filp, const char __user *buffer,
+				     size_t count, loff_t *ppos)
+{
+	struct bnxt_re_cc_param *dbg_cc_param = filp->private_data;
+	struct bnxt_re_dev *rdev = dbg_cc_param->rdev;
+	u32 offset = dbg_cc_param->offset;
+	u8 cc_gen = dbg_cc_param->cc_gen;
+	char buf[16];
+	u32 val;
+	int rc;
+
+	if (copy_from_user(buf, buffer, count))
+		return -EFAULT;
+
+	buf[count] = '\0';
+	if (kstrtou32(buf, 0, &val))
+		return -EINVAL;
+
+	rc = bnxt_re_configure_cc(rdev, cc_gen, offset, val);
+	return rc ? rc : count;
+}
+
+static const struct file_operations bnxt_re_cc_config_ops = {
+	.owner = THIS_MODULE,
+	.open = simple_open,
+	.read = bnxt_re_cc_config_get,
+	.write = bnxt_re_cc_config_set,
+};
+
 void bnxt_re_debugfs_add_pdev(struct bnxt_re_dev *rdev)
 {
 	struct pci_dev *pdev = rdev->en_dev->pdev;
+	struct bnxt_re_dbg_cc_config_params *cc_params;
+	int i;
 
 	rdev->dbg_root = debugfs_create_dir(dev_name(&pdev->dev), bnxt_re_debugfs_root);
 
 	rdev->qp_debugfs = debugfs_create_dir("QPs", rdev->dbg_root);
+	rdev->cc_config = debugfs_create_dir("cc_config", rdev->dbg_root);
+
+	rdev->cc_config_params = kzalloc(sizeof(*cc_params), GFP_KERNEL);
+
+	for (i = 0; i < BNXT_RE_CC_PARAM_GEN0; i++) {
+		struct bnxt_re_cc_param *tmp_params = &rdev->cc_config_params->gen0_parms[i];
+
+		tmp_params->rdev = rdev;
+		tmp_params->offset = i;
+		tmp_params->cc_gen = CC_CONFIG_GEN0_EXT0;
+		tmp_params->dentry = debugfs_create_file(bnxt_re_cc_gen0_name[i], 0400,
+							 rdev->cc_config, tmp_params,
+							 &bnxt_re_cc_config_ops);
+	}
 }
 
 void bnxt_re_debugfs_rem_pdev(struct bnxt_re_dev *rdev)
 {
 	debugfs_remove_recursive(rdev->qp_debugfs);
-
+	debugfs_remove_recursive(rdev->cc_config);
+	kfree(rdev->cc_config_params);
 	debugfs_remove_recursive(rdev->dbg_root);
 	rdev->dbg_root = NULL;
 }
diff --git a/drivers/infiniband/hw/bnxt_re/debugfs.h b/drivers/infiniband/hw/bnxt_re/debugfs.h
index cd3be0a9..3374097 100644
--- a/drivers/infiniband/hw/bnxt_re/debugfs.h
+++ b/drivers/infiniband/hw/bnxt_re/debugfs.h
@@ -18,4 +18,19 @@ void bnxt_re_debugfs_rem_pdev(struct bnxt_re_dev *rdev);
 void bnxt_re_register_debugfs(void);
 void bnxt_re_unregister_debugfs(void);
 
+#define CC_CONFIG_GEN_EXT(x, y)	(((x) << 16) | (y))
+#define CC_CONFIG_GEN0_EXT0  CC_CONFIG_GEN_EXT(0, 0)
+
+#define BNXT_RE_CC_PARAM_GEN0  14
+
+struct bnxt_re_cc_param {
+	struct bnxt_re_dev *rdev;
+	struct dentry *dentry;
+	u32 offset;
+	u8 cc_gen;
+};
+
+struct bnxt_re_dbg_cc_config_params {
+	struct bnxt_re_cc_param	gen0_parms[BNXT_RE_CC_PARAM_GEN0];
+};
 #endif
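
The set path in the diff turns a debugfs file index into a single-bit modify mask and stores the parsed value in the matching field. A rough user-space model of that idea is sketched below. It is not driver code: `struct cc_param_model`, `cc_fill()`, `cc_parse_u32()` and the `MODEL_MASK_*` values are hypothetical, only three parameters are modelled, and the masks are assumed to be consecutive bits in the same order as the file-name table. The parser also rejects input that does not fit the 16-byte scratch buffer, a check the set handler in the diff does not make before `copy_from_user()`.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_BIT(n) (1u << (n))

/* Assumed mask values: one bit per parameter, in file-table order. */
enum {
	MODEL_MASK_ENABLE_CC = MODEL_BIT(0),
	MODEL_MASK_G         = MODEL_BIT(1),
	MODEL_MASK_INIT_CR   = MODEL_BIT(3),
};

/* Hypothetical stand-in for the driver's CC parameter structure. */
struct cc_param_model {
	uint32_t mask;
	uint32_t enable;
	uint32_t g;
	uint32_t init_cr;
};

/* Map a file index to its mask bit and store the value in that field. */
static int cc_fill(struct cc_param_model *p, unsigned int offset, uint32_t val)
{
	if (offset >= 32)
		return -EINVAL;

	switch (MODEL_BIT(offset)) {
	case MODEL_MASK_ENABLE_CC:
		p->enable = val;
		break;
	case MODEL_MASK_G:
		p->g = val;
		break;
	case MODEL_MASK_INIT_CR:
		p->init_cr = val;
		break;
	default:
		/* Unlike the driver's fill routine, unknown bits are rejected. */
		return -EINVAL;
	}

	p->mask = MODEL_BIT(offset);
	return 0;
}

/* Parse a u32 (base auto-detected) from at most 15 bytes of input. */
static int cc_parse_u32(const char *in, size_t count, uint32_t *val)
{
	char buf[16];
	char *end;
	unsigned long long v;

	/* Leave room for the terminating NUL before copying. */
	if (count == 0 || count >= sizeof(buf))
		return -EINVAL;

	memcpy(buf, in, count);
	buf[count] = '\0';

	errno = 0;
	v = strtoull(buf, &end, 0);
	if (errno || end == buf || v > UINT32_MAX)
		return -EINVAL;

	/* Accept one trailing newline, as produced by `echo`. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	*val = (uint32_t)v;
	return 0;
}
```

In this model a write of "0x10\n" to the file with index 1 would parse to 16 and end up in `g` with `mask` set to bit 1; an index with no matching case is refused instead of being sent on with only the mask set.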
Implements routines to set and get different settings of
the congestion control. This will enable the users to modify
the settings according to their network. Currently only the
GEN 0 version of the parameters is supported. Reading these files
queries the firmware and reports the values currently programmed.
Writing to the files sends commands that update the congestion
control settings.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
v1 -> v2:
 Addressed Leon's comments
 - rename debugfs file "g" to "run_avg_weight_g"
 - Fix the indentation errors
 - Remove the unnecessary error message during the read entry point
 - Fix the return value

 drivers/infiniband/hw/bnxt_re/bnxt_re.h |   2 +
 drivers/infiniband/hw/bnxt_re/debugfs.c | 212 +++++++++++++++++++++++++++++++-
 drivers/infiniband/hw/bnxt_re/debugfs.h |  15 +++
 3 files changed, 228 insertions(+), 1 deletion(-)
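
From user space, each parameter is a one-value text file: a read returns the current value followed by a newline, and a write takes a number. A minimal sketch of helpers for that pattern follows. `cc_file_read()` and `cc_file_write()` are hypothetical names, and nothing here is specific to bnxt_re; the helpers work on any file that holds a single unsigned integer.

```c
#include <errno.h>
#include <stdio.h>

/* Read one unsigned value from a single-value text file. */
static int cc_file_read(const char *path, unsigned int *val)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -errno;

	ok = fscanf(f, "%u", val) == 1;
	fclose(f);
	return ok ? 0 : -EINVAL;
}

/* Write one unsigned value, newline-terminated, to a text file. */
static int cc_file_write(const char *path, unsigned int val)
{
	FILE *f = fopen(path, "w");
	int rc = 0;

	if (!f)
		return -errno;

	if (fprintf(f, "%u\n", val) < 0)
		rc = -EIO;

	/* Buffered data is flushed on close, so write errors can surface here. */
	if (fclose(f) != 0 && rc == 0)
		rc = -errno;

	return rc;
}
```

With debugfs mounted at `/sys/kernel/debug`, the path would look something like `/sys/kernel/debug/bnxt_re/<pci-device>/cc_config/enable_cc`. The `cc_config` directory and the file names come from the patch; the `bnxt_re` root directory name is an assumption, since the patch only shows `bnxt_re_debugfs_root` and not where it is created. The files are created with mode 0400, so root access is needed.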