From patchwork Tue Mar 4 14:15:25 2025
X-Patchwork-Submitter: Leon Romanovsky
X-Patchwork-Id: 14000820
From: Leon Romanovsky
To: Jason Gunthorpe
Cc: Patrisious Haddad, linux-rdma@vger.kernel.org, Mark Bloch,
    netdev@vger.kernel.org, Saeed Mahameed, Tariq Toukan
Subject: [PATCH mlx5-next 1/5] RDMA/mlx5: Add optional counters for RDMA_TX/RX_packets/bytes
Date: Tue, 4 Mar 2025 16:15:25 +0200

From: Patrisious Haddad

Add the following optional counters: rdma_tx_packets, rdma_tx_bytes,
rdma_rx_packets and rdma_rx_bytes, which count all RDMA packets and
bytes sent and received per link.

Note that the packet and byte counters of each direction share a single
hardware flow counter, so that flow counter is only reset once both
counters of the direction have been removed. From the user's
perspective, however, each counter can still be enabled and disabled
separately.
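To illustrate the sharing, here is a minimal sketch (not part of the
patch; report_shared_opfc() is a hypothetical helper) of how one mlx5
flow counter backs both statistics: mlx5_fc_query() returns the packet
and byte values of the same counter together, and the optional-counter
type only selects which of the two is reported.

/*
 * Sketch: read one shared flow counter and report either its packet
 * or its byte value, depending on the optional-counter type.
 * is_rdma_bytes_counter() is the predicate added by this patch.
 */
static int report_shared_opfc(struct mlx5_core_dev *mdev,
                              struct mlx5_fc *fc, u32 type, u64 *value)
{
        u64 packets, bytes;
        int ret;

        /* one hardware query yields both halves of the shared counter */
        ret = mlx5_fc_query(mdev, fc, &packets, &bytes);
        if (ret)
                return ret;

        *value = is_rdma_bytes_counter(type) ? bytes : packets;
        return 0;
}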
The counters can be enabled with:

  sudo rdma stat set link rocep8s0f0/1 optional-counters rdma_tx_packets

and read back with:

  rdma stat -j show link rocep8s0f0/1

Signed-off-by: Patrisious Haddad
Reviewed-by: Mark Bloch
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/hw/mlx5/counters.c | 86 ++++++++++++++++++++++++++-
 drivers/infiniband/hw/mlx5/fs.c       | 46 +++++++++++++-
 drivers/infiniband/hw/mlx5/mlx5_ib.h  |  4 ++
 include/linux/mlx5/device.h           |  4 +-
 4 files changed, 133 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/counters.c b/drivers/infiniband/hw/mlx5/counters.c
index 4f6c1968a2ee..e2a54c1dbc7a 100644
--- a/drivers/infiniband/hw/mlx5/counters.c
+++ b/drivers/infiniband/hw/mlx5/counters.c
@@ -140,6 +140,13 @@ static const struct mlx5_ib_counter rdmatx_cnp_op_cnts[] = {
 	INIT_OP_COUNTER(cc_tx_cnp_pkts, CC_TX_CNP_PKTS),
 };
 
+static const struct mlx5_ib_counter packets_op_cnts[] = {
+	INIT_OP_COUNTER(rdma_tx_packets, RDMA_TX_PACKETS),
+	INIT_OP_COUNTER(rdma_tx_bytes, RDMA_TX_BYTES),
+	INIT_OP_COUNTER(rdma_rx_packets, RDMA_RX_PACKETS),
+	INIT_OP_COUNTER(rdma_rx_bytes, RDMA_RX_BYTES),
+};
+
 static int mlx5_ib_read_counters(struct ib_counters *counters,
 				 struct ib_counters_read_attr *read_attr,
 				 struct uverbs_attr_bundle *attrs)
@@ -427,6 +434,15 @@ static int do_get_hw_stats(struct ib_device *ibdev,
 	return num_counters;
 }
 
+static bool is_rdma_bytes_counter(u32 type)
+{
+	if (type == MLX5_IB_OPCOUNTER_RDMA_TX_BYTES ||
+	    type == MLX5_IB_OPCOUNTER_RDMA_RX_BYTES)
+		return true;
+
+	return false;
+}
+
 static int do_get_op_stat(struct ib_device *ibdev,
 			  struct rdma_hw_stats *stats,
 			  u32 port_num, int index)
@@ -434,7 +450,7 @@ static int do_get_op_stat(struct ib_device *ibdev,
 	struct mlx5_ib_dev *dev = to_mdev(ibdev);
 	const struct mlx5_ib_counters *cnts;
 	const struct mlx5_ib_op_fc *opfcs;
-	u64 packets = 0, bytes;
+	u64 packets, bytes;
 	u32 type;
 	int ret;
 
@@ -453,8 +469,11 @@ static int do_get_op_stat(struct ib_device *ibdev,
 	if (ret)
 		return ret;
 
+	if (is_rdma_bytes_counter(type))
+		stats->value[index] = bytes;
+	else
+		stats->value[index] = packets;
 out:
-	stats->value[index] = packets;
 	return index;
 }
 
@@ -677,6 +696,12 @@ static void mlx5_ib_fill_counters(struct mlx5_ib_dev *dev,
 			descs[j].priv = &rdmatx_cnp_op_cnts[i].type;
 		}
 	}
+
+	for (i = 0; i < ARRAY_SIZE(packets_op_cnts); i++, j++) {
+		descs[j].name = packets_op_cnts[i].name;
+		descs[j].flags |= IB_STAT_FLAG_OPTIONAL;
+		descs[j].priv = &packets_op_cnts[i].type;
+	}
 }
 
@@ -727,6 +752,8 @@ static int __mlx5_ib_alloc_counters(struct mlx5_ib_dev *dev,
 
 	num_op_counters = ARRAY_SIZE(basic_op_cnts);
 
+	num_op_counters += ARRAY_SIZE(packets_op_cnts);
+
 	if (MLX5_CAP_FLOWTABLE(dev->mdev,
 			       ft_field_support_2_nic_receive_rdma.bth_opcode))
 		num_op_counters += ARRAY_SIZE(rdmarx_cnp_op_cnts);
@@ -756,10 +783,47 @@ static int __mlx5_ib_alloc_counters(struct mlx5_ib_dev *dev,
 	return -ENOMEM;
 }
 
+/*
+ * Check whether the given flow counter type shares its flow counter with
+ * another type. If it does, and that other type's flow counter was already
+ * created, return true and return the shared counter through @opfc;
+ * otherwise return false.
+ */
+static bool mlx5r_is_opfc_shared_and_in_use(struct mlx5_ib_op_fc *opfcs,
+					    u32 type,
+					    struct mlx5_ib_op_fc **opfc)
+{
+	u32 shared_fc_type;
+
+	switch (type) {
+	case MLX5_IB_OPCOUNTER_RDMA_TX_PACKETS:
+		shared_fc_type = MLX5_IB_OPCOUNTER_RDMA_TX_BYTES;
+		break;
+	case MLX5_IB_OPCOUNTER_RDMA_TX_BYTES:
+		shared_fc_type = MLX5_IB_OPCOUNTER_RDMA_TX_PACKETS;
+		break;
+	case MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS:
+		shared_fc_type = MLX5_IB_OPCOUNTER_RDMA_RX_BYTES;
+		break;
+	case MLX5_IB_OPCOUNTER_RDMA_RX_BYTES:
+		shared_fc_type = MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS;
+		break;
+	default:
+		return false;
+	}
+
+	*opfc = &opfcs[shared_fc_type];
+	if (!(*opfc)->fc)
+		return false;
+
+	return true;
+}
+
 static void mlx5_ib_dealloc_counters(struct mlx5_ib_dev *dev)
 {
 	u32 in[MLX5_ST_SZ_DW(dealloc_q_counter_in)] = {};
 	int num_cnt_ports = dev->num_ports;
+	struct mlx5_ib_op_fc *in_use_opfc;
 	int i, j;
 
 	if (is_mdev_switchdev_mode(dev->mdev))
@@ -781,11 +845,16 @@ static void mlx5_ib_dealloc_counters(struct mlx5_ib_dev *dev)
 			if (!dev->port[i].cnts.opfcs[j].fc)
 				continue;
 
+			if (mlx5r_is_opfc_shared_and_in_use(
+				    dev->port[i].cnts.opfcs, j, &in_use_opfc))
+				goto skip;
+
 			if (IS_ENABLED(CONFIG_INFINIBAND_USER_ACCESS))
 				mlx5_ib_fs_remove_op_fc(dev,
 					&dev->port[i].cnts.opfcs[j], j);
 			mlx5_fc_destroy(dev->mdev, dev->port[i].cnts.opfcs[j].fc);
+skip:
 			dev->port[i].cnts.opfcs[j].fc = NULL;
 		}
 	}
@@ -979,8 +1048,8 @@ static int mlx5_ib_modify_stat(struct ib_device *device, u32 port,
 			       unsigned int index, bool enable)
 {
 	struct mlx5_ib_dev *dev = to_mdev(device);
+	struct mlx5_ib_op_fc *opfc, *in_use_opfc;
 	struct mlx5_ib_counters *cnts;
-	struct mlx5_ib_op_fc *opfc;
 	u32 num_hw_counters, type;
 	int ret;
 
@@ -1004,6 +1073,13 @@ static int mlx5_ib_modify_stat(struct ib_device *device, u32 port,
 	if (opfc->fc)
 		return -EEXIST;
 
+	if (mlx5r_is_opfc_shared_and_in_use(cnts->opfcs, type,
+					    &in_use_opfc)) {
+		opfc->fc = in_use_opfc->fc;
+		opfc->rule[0] = in_use_opfc->rule[0];
+		return 0;
+	}
+
 	opfc->fc = mlx5_fc_create(dev->mdev, false);
 	if (IS_ERR(opfc->fc))
 		return PTR_ERR(opfc->fc);
@@ -1019,8 +1095,12 @@ static int mlx5_ib_modify_stat(struct ib_device *device, u32 port,
 	if (!opfc->fc)
 		return -EINVAL;
 
+	if (mlx5r_is_opfc_shared_and_in_use(cnts->opfcs, type, &in_use_opfc))
+		goto out;
+
 	mlx5_ib_fs_remove_op_fc(dev, opfc, type);
 	mlx5_fc_destroy(dev->mdev, opfc->fc);
+out:
 	opfc->fc = NULL;
 	return 0;
 }

diff --git a/drivers/infiniband/hw/mlx5/fs.c b/drivers/infiniband/hw/mlx5/fs.c
index 6ae2801fa13f..93b229e9aab3 100644
--- a/drivers/infiniband/hw/mlx5/fs.c
+++ b/drivers/infiniband/hw/mlx5/fs.c
@@ -802,10 +802,12 @@ static struct mlx5_ib_flow_prio *get_flow_table(struct mlx5_ib_dev *dev,
 enum {
 	RDMA_RX_ECN_OPCOUNTER_PRIO,
 	RDMA_RX_CNP_OPCOUNTER_PRIO,
+	RDMA_RX_PKTS_BYTES_OPCOUNTER_PRIO,
 };
 
 enum {
 	RDMA_TX_CNP_OPCOUNTER_PRIO,
+	RDMA_TX_PKTS_BYTES_OPCOUNTER_PRIO,
 };
 
 static int set_vhca_port_spec(struct mlx5_ib_dev *dev, u32 port_num,
@@ -869,6 +871,29 @@ static int set_cnp_spec(struct mlx5_ib_dev *dev, u32 port_num,
 	return 0;
 }
 
+/* Return the prio to use for the given optional counter type. Bytes-type
+ * counters use the prio of the corresponding packets type, since the two
+ * share the same resources.
+ */
+static struct mlx5_ib_flow_prio *get_opfc_prio(struct mlx5_ib_dev *dev,
+					       u32 type)
+{
+	u32 prio_type;
+
+	switch (type) {
+	case MLX5_IB_OPCOUNTER_RDMA_TX_BYTES:
+		prio_type = MLX5_IB_OPCOUNTER_RDMA_TX_PACKETS;
+		break;
+	case MLX5_IB_OPCOUNTER_RDMA_RX_BYTES:
+		prio_type = MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS;
+		break;
+	default:
+		prio_type = type;
+	}
+
+	return &dev->flow_db->opfcs[prio_type];
+}
+
 int mlx5_ib_fs_add_op_fc(struct mlx5_ib_dev *dev, u32 port_num,
 			 struct mlx5_ib_op_fc *opfc,
 			 enum mlx5_ib_optional_counter_type type)
@@ -923,6 +948,20 @@ int mlx5_ib_fs_add_op_fc(struct mlx5_ib_dev *dev, u32 port_num,
 		priority = RDMA_TX_CNP_OPCOUNTER_PRIO;
 		break;
 
+	case MLX5_IB_OPCOUNTER_RDMA_TX_PACKETS:
+	case MLX5_IB_OPCOUNTER_RDMA_TX_BYTES:
+		spec_num = 1;
+		fn_type = MLX5_FLOW_NAMESPACE_RDMA_TX_COUNTERS;
+		priority = RDMA_TX_PKTS_BYTES_OPCOUNTER_PRIO;
+		break;
+
+	case MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS:
+	case MLX5_IB_OPCOUNTER_RDMA_RX_BYTES:
+		spec_num = 1;
+		fn_type = MLX5_FLOW_NAMESPACE_RDMA_RX_COUNTERS;
+		priority = RDMA_RX_PKTS_BYTES_OPCOUNTER_PRIO;
+		break;
+
 	default:
 		err = -EOPNOTSUPP;
 		goto free;
@@ -934,7 +973,7 @@ int mlx5_ib_fs_add_op_fc(struct mlx5_ib_dev *dev, u32 port_num,
 		goto free;
 	}
 
-	prio = &dev->flow_db->opfcs[type];
+	prio = get_opfc_prio(dev, type);
 	if (!prio->flow_table) {
 		prio = _get_prio(dev, ns, prio, priority,
 				 dev->num_ports * MAX_OPFC_RULES, 1, 0, 0);
@@ -976,11 +1015,14 @@ void mlx5_ib_fs_remove_op_fc(struct mlx5_ib_dev *dev,
 			     struct mlx5_ib_op_fc *opfc,
 			     enum mlx5_ib_optional_counter_type type)
 {
+	struct mlx5_ib_flow_prio *prio;
 	int i;
 
+	prio = get_opfc_prio(dev, type);
+
 	for (i = 0; i < MAX_OPFC_RULES && opfc->rule[i]; i++) {
 		mlx5_del_flow_rules(opfc->rule[i]);
-		put_flow_table(dev, &dev->flow_db->opfcs[type], true);
+		put_flow_table(dev, prio, true);
 	}
 }

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index a0138bdfa389..24b18942762c 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -294,6 +294,10 @@ enum mlx5_ib_optional_counter_type {
 	MLX5_IB_OPCOUNTER_CC_RX_CE_PKTS,
 	MLX5_IB_OPCOUNTER_CC_RX_CNP_PKTS,
 	MLX5_IB_OPCOUNTER_CC_TX_CNP_PKTS,
+	MLX5_IB_OPCOUNTER_RDMA_TX_PACKETS,
+	MLX5_IB_OPCOUNTER_RDMA_TX_BYTES,
+	MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS,
+	MLX5_IB_OPCOUNTER_RDMA_RX_BYTES,
 
 	MLX5_IB_OPCOUNTER_MAX,
 };

diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 8fe56d0362c6..63f0d9fb94b4 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -1532,8 +1532,8 @@ static inline u16 mlx5_to_sw_pkey_sz(int pkey_sz)
 	return MLX5_MIN_PKEY_TABLE_SIZE << pkey_sz;
 }
 
-#define MLX5_RDMA_RX_NUM_COUNTERS_PRIOS 2
-#define MLX5_RDMA_TX_NUM_COUNTERS_PRIOS 1
+#define MLX5_RDMA_RX_NUM_COUNTERS_PRIOS 3
+#define MLX5_RDMA_TX_NUM_COUNTERS_PRIOS 2
 #define MLX5_BY_PASS_NUM_REGULAR_PRIOS 16
 #define MLX5_BY_PASS_NUM_DONT_TRAP_PRIOS 16
 #define MLX5_BY_PASS_NUM_MULTICAST_PRIOS 1

From patchwork Tue Mar 4 14:15:26 2025
X-Patchwork-Submitter: Leon Romanovsky
X-Patchwork-Id: 14000824
From: Leon Romanovsky
To: Jason Gunthorpe
Cc: Patrisious Haddad, linux-rdma@vger.kernel.org, Mark Bloch
Subject: [PATCH rdma-next 2/5] RDMA/core: Create and destroy rdma_counter using rdma_zalloc_drv_obj()
Date: Tue, 4 Mar 2025 16:15:26 +0200
Message-ID: <2d1583cdf8a21e816996597a4d382008f08309b2.1741097408.git.leonro@nvidia.com>

From: Patrisious Haddad

Change rdma_counter allocation to use rdma_zalloc_drv_obj() instead of
allocating it explicitly in the core, so that the counter can be
contained inside driver specific structures. Adjust all drivers that use
it to have their containing structure, and add a driver specific
initialization operation.

This change is needed to allow upcoming patches to implement
optional-counters binding, where each driver specific counter struct
will maintain its bound optional-counters.
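For reference, here is a condensed sketch of the allocation pattern this
patch adopts (assembled from the hunks below, not a verbatim excerpt):
the driver embeds struct rdma_counter at offset zero of its container,
declares the container's size with INIT_RDMA_OBJ_SIZE() so the core
allocates the full object via rdma_zalloc_drv_obj(), and recovers the
container with container_of().

/* Sketch of the driver-allocated object pattern used by this patch. */
struct mlx5_rdma_counter {
        /* must be first; INIT_RDMA_OBJ_SIZE() asserts it sits at offset 0 */
        struct rdma_counter rdma_counter;
        /* driver private state can follow, e.g. per-QP opfc bookkeeping */
};

static inline struct mlx5_rdma_counter *
to_mcounter(struct rdma_counter *counter)
{
        return container_of(counter, struct mlx5_rdma_counter, rdma_counter);
}

static const struct ib_device_ops example_stats_ops = {
        /* tells the core how many bytes rdma_zalloc_drv_obj() must allocate */
        INIT_RDMA_OBJ_SIZE(rdma_counter, mlx5_rdma_counter, rdma_counter),
};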
Signed-off-by: Patrisious Haddad
Reviewed-by: Mark Bloch
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/core/counters.c    |  4 +++-
 drivers/infiniband/core/device.c      |  2 ++
 drivers/infiniband/hw/mlx5/counters.c |  8 ++++++++
 drivers/infiniband/hw/mlx5/counters.h | 11 +++++++++++
 include/rdma/ib_verbs.h               |  6 ++++++
 5 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
index af59486fe418..981d5a28614a 100644
--- a/drivers/infiniband/core/counters.c
+++ b/drivers/infiniband/core/counters.c
@@ -149,13 +149,15 @@ static struct rdma_counter *alloc_and_bind(struct ib_device *dev, u32 port,
 	if (!dev->ops.counter_dealloc || !dev->ops.counter_alloc_stats)
 		return NULL;
 
-	counter = kzalloc(sizeof(*counter), GFP_KERNEL);
+	counter = rdma_zalloc_drv_obj(dev, rdma_counter);
 	if (!counter)
 		return NULL;
 
 	counter->device = dev;
 	counter->port = port;
 
+	dev->ops.counter_init(counter);
+
 	rdma_restrack_new(&counter->res, RDMA_RESTRACK_COUNTER);
 	counter->stats = dev->ops.counter_alloc_stats(counter);
 	if (!counter->stats)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 8feb22089cbb..bfb10c9a553f 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2678,6 +2678,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
 	SET_DEVICE_OP(dev_ops, counter_alloc_stats);
 	SET_DEVICE_OP(dev_ops, counter_bind_qp);
 	SET_DEVICE_OP(dev_ops, counter_dealloc);
+	SET_DEVICE_OP(dev_ops, counter_init);
 	SET_DEVICE_OP(dev_ops, counter_unbind_qp);
 	SET_DEVICE_OP(dev_ops, counter_update_stats);
 	SET_DEVICE_OP(dev_ops, create_ah);
@@ -2792,6 +2793,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
 	SET_OBJ_SIZE(dev_ops, ib_srq);
 	SET_OBJ_SIZE(dev_ops, ib_ucontext);
 	SET_OBJ_SIZE(dev_ops, ib_xrcd);
+	SET_OBJ_SIZE(dev_ops, rdma_counter);
 }
 EXPORT_SYMBOL(ib_set_device_ops);

diff --git a/drivers/infiniband/hw/mlx5/counters.c b/drivers/infiniband/hw/mlx5/counters.c
index e2a54c1dbc7a..018bb96bdbf4 100644
--- a/drivers/infiniband/hw/mlx5/counters.c
+++ b/drivers/infiniband/hw/mlx5/counters.c
@@ -1105,6 +1105,8 @@ static int mlx5_ib_modify_stat(struct ib_device *device, u32 port,
 	return 0;
 }
 
+static void mlx5_ib_counter_init(struct rdma_counter *counter) {}
+
 static const struct ib_device_ops hw_stats_ops = {
 	.alloc_hw_port_stats = mlx5_ib_alloc_hw_port_stats,
 	.get_hw_stats = mlx5_ib_get_hw_stats,
@@ -1115,6 +1117,9 @@ static const struct ib_device_ops hw_stats_ops = {
 	.counter_update_stats = mlx5_ib_counter_update_stats,
 	.modify_hw_stat = IS_ENABLED(CONFIG_INFINIBAND_USER_ACCESS) ?
 			  mlx5_ib_modify_stat : NULL,
+	.counter_init = mlx5_ib_counter_init,
+
+	INIT_RDMA_OBJ_SIZE(rdma_counter, mlx5_rdma_counter, rdma_counter),
 };
 
 static const struct ib_device_ops hw_switchdev_vport_op = {
@@ -1129,6 +1134,9 @@ static const struct ib_device_ops hw_switchdev_stats_ops = {
 	.counter_dealloc = mlx5_ib_counter_dealloc,
 	.counter_alloc_stats = mlx5_ib_counter_alloc_stats,
 	.counter_update_stats = mlx5_ib_counter_update_stats,
+	.counter_init = mlx5_ib_counter_init,
+
+	INIT_RDMA_OBJ_SIZE(rdma_counter, mlx5_rdma_counter, rdma_counter),
 };
 
 static const struct ib_device_ops counters_ops = {

diff --git a/drivers/infiniband/hw/mlx5/counters.h b/drivers/infiniband/hw/mlx5/counters.h
index 6bcaaa52e2b2..f153901a43be 100644
--- a/drivers/infiniband/hw/mlx5/counters.h
+++ b/drivers/infiniband/hw/mlx5/counters.h
@@ -8,6 +8,17 @@
 
 #include "mlx5_ib.h"
 
+struct mlx5_rdma_counter {
+	struct rdma_counter rdma_counter;
+};
+
+static inline struct mlx5_rdma_counter *
+to_mcounter(struct rdma_counter *counter)
+{
+	return container_of(counter, struct mlx5_rdma_counter, rdma_counter);
+}
+
 int mlx5_ib_counters_init(struct mlx5_ib_dev *dev);
 void mlx5_ib_counters_cleanup(struct mlx5_ib_dev *dev);
 void mlx5_ib_counters_clear_description(struct ib_counters *counters);

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 9941f4185c79..90e93297d59e 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2665,6 +2665,11 @@ struct ib_device_ops {
 	 */
 	int (*counter_update_stats)(struct rdma_counter *counter);
 
+	/**
+	 * counter_init - Initialize the driver specific rdma counter struct.
+	 */
+	void (*counter_init)(struct rdma_counter *counter);
+
 	/**
 	 * Allows rdma drivers to add their own restrack attributes
 	 * dumped via 'rdma stat' iproute2 command.
@@ -2716,6 +2721,7 @@ struct ib_device_ops {
 	DECLARE_RDMA_OBJ_SIZE(ib_srq);
 	DECLARE_RDMA_OBJ_SIZE(ib_ucontext);
 	DECLARE_RDMA_OBJ_SIZE(ib_xrcd);
+	DECLARE_RDMA_OBJ_SIZE(rdma_counter);
 };
 
 struct ib_core_device {

From patchwork Tue Mar 4 14:15:27 2025
X-Patchwork-Submitter: Leon Romanovsky
X-Patchwork-Id: 14000821
From: Leon Romanovsky
To: Jason Gunthorpe
Cc: Patrisious Haddad, linux-rdma@vger.kernel.org, Mark Bloch
Subject: [PATCH rdma-next 3/5] RDMA/core: Add support to optional-counters binding configuration
Date: Tue, 4 Mar 2025 16:15:27 +0200

From: Patrisious Haddad

Whenever a new counter is created, save inside it the user requested
configuration for optional-counters binding: for manual configuration it
is requested directly by the user, and for automatic configuration it
depends on whether the automatic binding was enabled with or without
optional-counters binding.

This flag will later be used by the driver to determine whether to bind
the optional-counters as well when binding the counter to a QP. It
indicates that when binding counters to a QP we also want the currently
enabled link optional-counters to be bound.
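A condensed sketch of how the new knob flows through the core
(set_mode_from_netlink() is a hypothetical wrapper; the attribute and
call names are taken from the hunks below):

/* Sketch: parse the new u8 attribute and hand it to auto mode, where it
 * is stored in port_counter->mode.bind_opcnt and inherited by every
 * counter that auto mode later allocates for this port.
 */
static int set_mode_from_netlink(struct nlattr *tb[], struct ib_device *dev,
                                 u32 port, struct netlink_ext_ack *extack)
{
        bool opcnt = false;
        u32 mask = 0;

        /* attribute added by this patch; absent means "don't bind" */
        if (tb[RDMA_NLDEV_ATTR_STAT_OPCOUNTER_ENABLED])
                opcnt = !!nla_get_u8(
                        tb[RDMA_NLDEV_ATTR_STAT_OPCOUNTER_ENABLED]);

        if (tb[RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK])
                mask = nla_get_u32(tb[RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK]);

        return rdma_counter_set_auto_mode(dev, port, mask, opcnt, extack);
}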
Signed-off-by: Patrisious Haddad
Reviewed-by: Mark Bloch
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/core/counters.c | 28 +++++++++++++++++++---------
 drivers/infiniband/core/nldev.c    | 18 ++++++++++++++++--
 include/rdma/rdma_counter.h        |  5 ++++-
 include/uapi/rdma/rdma_netlink.h   |  2 ++
 4 files changed, 41 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
index 981d5a28614a..b270a208214e 100644
--- a/drivers/infiniband/core/counters.c
+++ b/drivers/infiniband/core/counters.c
@@ -12,7 +12,8 @@
 
 static int __counter_set_mode(struct rdma_port_counter *port_counter,
 			      enum rdma_nl_counter_mode new_mode,
-			      enum rdma_nl_counter_mask new_mask)
+			      enum rdma_nl_counter_mask new_mask,
+			      bool bind_opcnt)
 {
 	if (new_mode == RDMA_COUNTER_MODE_AUTO) {
 		if (new_mask & (~ALL_AUTO_MODE_MASKS))
@@ -23,6 +24,7 @@ static int __counter_set_mode(struct rdma_port_counter *port_counter,
 
 	port_counter->mode.mode = new_mode;
 	port_counter->mode.mask = new_mask;
+	port_counter->mode.bind_opcnt = bind_opcnt;
 	return 0;
 }
 
@@ -41,6 +43,7 @@ static int __counter_set_mode(struct rdma_port_counter *port_counter,
  */
 int rdma_counter_set_auto_mode(struct ib_device *dev, u32 port,
 			       enum rdma_nl_counter_mask mask,
+			       bool bind_opcnt,
 			       struct netlink_ext_ack *extack)
 {
 	struct rdma_port_counter *port_counter;
@@ -59,12 +62,13 @@ int rdma_counter_set_auto_mode(struct ib_device *dev, u32 port,
 			RDMA_COUNTER_MODE_NONE;
 
 	if (port_counter->mode.mode == mode &&
-	    port_counter->mode.mask == mask) {
+	    port_counter->mode.mask == mask &&
+	    port_counter->mode.bind_opcnt == bind_opcnt) {
 		ret = 0;
 		goto out;
 	}
 
-	ret = __counter_set_mode(port_counter, mode, mask);
+	ret = __counter_set_mode(port_counter, mode, mask, bind_opcnt);
 
 out:
 	mutex_unlock(&port_counter->lock);
@@ -140,7 +144,8 @@ int rdma_counter_modify(struct ib_device *dev, u32 port,
 
 static struct rdma_counter *alloc_and_bind(struct ib_device *dev, u32 port,
 					   struct ib_qp *qp,
-					   enum rdma_nl_counter_mode mode)
+					   enum rdma_nl_counter_mode mode,
+					   bool bind_opcnt)
 {
 	struct rdma_port_counter *port_counter;
 	struct rdma_counter *counter;
@@ -168,7 +173,7 @@ static struct rdma_counter *alloc_and_bind(struct ib_device *dev, u32 port,
 	switch (mode) {
 	case RDMA_COUNTER_MODE_MANUAL:
 		ret = __counter_set_mode(port_counter, RDMA_COUNTER_MODE_MANUAL,
-					 0);
+					 0, bind_opcnt);
 		if (ret) {
 			mutex_unlock(&port_counter->lock);
 			goto err_mode;
@@ -187,6 +192,7 @@ static struct rdma_counter *alloc_and_bind(struct ib_device *dev, u32 port,
 	mutex_unlock(&port_counter->lock);
 
 	counter->mode.mode = mode;
+	counter->mode.bind_opcnt = bind_opcnt;
 	kref_init(&counter->kref);
 	mutex_init(&counter->lock);
 
@@ -215,7 +221,8 @@ static void rdma_counter_free(struct rdma_counter *counter)
 	port_counter->num_counters--;
 	if (!port_counter->num_counters &&
 	    (port_counter->mode.mode == RDMA_COUNTER_MODE_MANUAL))
-		__counter_set_mode(port_counter, RDMA_COUNTER_MODE_NONE, 0);
+		__counter_set_mode(port_counter, RDMA_COUNTER_MODE_NONE, 0,
+				   false);
 
 	mutex_unlock(&port_counter->lock);
 
@@ -347,7 +354,8 @@ int rdma_counter_bind_qp_auto(struct ib_qp *qp, u32 port)
 			return ret;
 		}
 	} else {
-		counter = alloc_and_bind(dev, port, qp, RDMA_COUNTER_MODE_AUTO);
+		counter = alloc_and_bind(dev, port, qp, RDMA_COUNTER_MODE_AUTO,
+					 port_counter->mode.bind_opcnt);
 		if (!counter)
 			return -ENOMEM;
 	}
@@ -560,7 +568,7 @@ int rdma_counter_bind_qpn_alloc(struct ib_device *dev, u32 port,
 		goto err;
 	}
 
-	counter = alloc_and_bind(dev, port, qp, RDMA_COUNTER_MODE_MANUAL);
+	counter = alloc_and_bind(dev, port, qp, RDMA_COUNTER_MODE_MANUAL, true);
 	if (!counter) {
 		ret = -ENOMEM;
 		goto err;
@@ -615,13 +623,15 @@ int rdma_counter_unbind_qpn(struct ib_device *dev, u32 port,
 
 int rdma_counter_get_mode(struct ib_device *dev, u32 port,
 			  enum rdma_nl_counter_mode *mode,
-			  enum rdma_nl_counter_mask *mask)
+			  enum rdma_nl_counter_mask *mask,
+			  bool *opcnt)
 {
 	struct rdma_port_counter *port_counter;
 
 	port_counter = &dev->port_data[port].port_counter;
 	*mode = port_counter->mode.mode;
 	*mask = port_counter->mode.mask;
+	*opcnt = port_counter->mode.bind_opcnt;
 
 	return 0;
 }

diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nldev.c
index cb987ab0177c..a872643e8039 100644
--- a/drivers/infiniband/core/nldev.c
+++ b/drivers/infiniband/core/nldev.c
@@ -171,6 +171,7 @@ static const struct nla_policy nldev_policy[RDMA_NLDEV_ATTR_MAX] = {
 	[RDMA_NLDEV_ATTR_PARENT_NAME] = { .type = NLA_NUL_STRING },
 	[RDMA_NLDEV_ATTR_NAME_ASSIGN_TYPE] = { .type = NLA_U8 },
 	[RDMA_NLDEV_ATTR_EVENT_TYPE] = { .type = NLA_U8 },
+	[RDMA_NLDEV_ATTR_STAT_OPCOUNTER_ENABLED] = { .type = NLA_U8 },
 };
 
 static int put_driver_name_print_type(struct sk_buff *msg, const char *name,
@@ -2028,6 +2029,7 @@ static int nldev_stat_set_mode_doit(struct sk_buff *msg,
 				    struct ib_device *device, u32 port)
 {
 	u32 mode, mask = 0, qpn, cntn = 0;
+	bool opcnt = false;
 	int ret;
 
 	/* Currently only counter for QP is supported */
@@ -2035,12 +2037,17 @@ static int nldev_stat_set_mode_doit(struct sk_buff *msg,
 	    nla_get_u32(tb[RDMA_NLDEV_ATTR_STAT_RES]) != RDMA_NLDEV_ATTR_RES_QP)
 		return -EINVAL;
 
+	if (tb[RDMA_NLDEV_ATTR_STAT_OPCOUNTER_ENABLED])
+		opcnt = !!nla_get_u8(
+			tb[RDMA_NLDEV_ATTR_STAT_OPCOUNTER_ENABLED]);
+
 	mode = nla_get_u32(tb[RDMA_NLDEV_ATTR_STAT_MODE]);
 	if (mode == RDMA_COUNTER_MODE_AUTO) {
 		if (tb[RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK])
 			mask = nla_get_u32(
 				tb[RDMA_NLDEV_ATTR_STAT_AUTO_MODE_MASK]);
-		return rdma_counter_set_auto_mode(device, port, mask, extack);
+		return rdma_counter_set_auto_mode(device, port, mask, opcnt,
+						  extack);
 	}
 
 	if (!tb[RDMA_NLDEV_ATTR_RES_LQPN])
@@ -2358,6 +2365,7 @@ static int stat_get_doit_qp(struct sk_buff *skb, struct nlmsghdr *nlh,
 	struct ib_device *device;
 	struct sk_buff *msg;
 	u32 index, port;
+	bool opcnt;
 	int ret;
 
 	if (tb[RDMA_NLDEV_ATTR_STAT_COUNTER_ID])
@@ -2393,7 +2401,7 @@ static int stat_get_doit_qp(struct sk_buff *skb, struct nlmsghdr *nlh,
 		goto err_msg;
 	}
 
-	ret = rdma_counter_get_mode(device, port, &mode, &mask);
+	ret = rdma_counter_get_mode(device, port, &mode, &mask, &opcnt);
 	if (ret)
 		goto err_msg;
 
@@ -2410,6 +2418,12 @@ static int stat_get_doit_qp(struct sk_buff *skb, struct nlmsghdr *nlh,
 		goto err_msg;
 	}
 
+	if ((mode == RDMA_COUNTER_MODE_AUTO) &&
+	    nla_put_u8(msg, RDMA_NLDEV_ATTR_STAT_OPCOUNTER_ENABLED, opcnt)) {
+		ret = -EMSGSIZE;
+		goto err_msg;
+	}
+
 	nlmsg_end(msg, nlh);
 	ib_device_put(device);
 	return rdma_nl_unicast(sock_net(skb->sk), msg, NETLINK_CB(skb).portid);

diff --git a/include/rdma/rdma_counter.h b/include/rdma/rdma_counter.h
index 45d5481a7846..74e635409ff7 100644
--- a/include/rdma/rdma_counter.h
+++ b/include/rdma/rdma_counter.h
@@ -23,6 +23,7 @@ struct rdma_counter_mode {
 	enum rdma_nl_counter_mode mode;
 	enum rdma_nl_counter_mask mask;
 	struct auto_mode_param param;
+	bool bind_opcnt;
 };
 
 struct rdma_port_counter {
@@ -47,6 +48,7 @@ void rdma_counter_init(struct ib_device *dev);
 void rdma_counter_release(struct ib_device *dev);
 int rdma_counter_set_auto_mode(struct ib_device *dev, u32 port,
 			       enum rdma_nl_counter_mask mask,
+			       bool bind_opcnt,
 			       struct netlink_ext_ack *extack);
 int rdma_counter_bind_qp_auto(struct ib_qp *qp, u32 port);
 int rdma_counter_unbind_qp(struct ib_qp *qp, bool force);
@@ -61,7 +63,8 @@ int rdma_counter_unbind_qpn(struct ib_device *dev, u32 port,
 			    u32 qp_num, u32 counter_id);
 int rdma_counter_get_mode(struct ib_device *dev, u32 port,
 			  enum rdma_nl_counter_mode *mode,
-			  enum rdma_nl_counter_mask *mask);
+			  enum rdma_nl_counter_mask *mask,
+			  bool *opcnt);
 int rdma_counter_modify(struct ib_device *dev, u32 port,
 			unsigned int index, bool enable);

diff --git a/include/uapi/rdma/rdma_netlink.h b/include/uapi/rdma/rdma_netlink.h
index 9f9cf20c1cd8..f41f0228fcd0 100644
--- a/include/uapi/rdma/rdma_netlink.h
+++ b/include/uapi/rdma/rdma_netlink.h
@@ -580,6 +580,8 @@ enum rdma_nldev_attr {
 	RDMA_NLDEV_ATTR_EVENT_TYPE,		/* u8 */
 
 	RDMA_NLDEV_SYS_ATTR_MONITOR_MODE,	/* u8 */
+
+	RDMA_NLDEV_ATTR_STAT_OPCOUNTER_ENABLED,	/* u8 */
 	/*
 	 * Always the end
 	 */

From patchwork Tue Mar 4 14:15:28 2025
X-Patchwork-Submitter: Leon Romanovsky
X-Patchwork-Id: 14000822
From: Leon Romanovsky
To: Jason Gunthorpe
Cc: Patrisious Haddad, linux-rdma@vger.kernel.org, Mark Bloch
Subject: [PATCH rdma-next 4/5] RDMA/core: Pass port to counter bind/unbind operations
Date: Tue, 4 Mar 2025 16:15:28 +0200

From: Patrisious Haddad

This will be useful for the next patches in the series, since the port
number is needed for optional-counters binding and unbinding.

Note that this change is needed because, at the time these operations
are performed, qp->port isn't necessarily initialized yet and so can't
be used instead.

Signed-off-by: Patrisious Haddad
Reviewed-by: Mark Bloch
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/core/counters.c    | 20 ++++++++++----------
 drivers/infiniband/core/verbs.c       |  2 +-
 drivers/infiniband/hw/mlx5/counters.c |  4 ++--
 include/rdma/ib_verbs.h               |  5 +++--
 include/rdma/rdma_counter.h           |  2 +-
 5 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c
index b270a208214e..e6ec7b7a40af 100644
--- a/drivers/infiniband/core/counters.c
+++ b/drivers/infiniband/core/counters.c
@@ -93,7 +93,7 @@ static void auto_mode_init_counter(struct rdma_counter *counter,
 }
 
 static int __rdma_counter_bind_qp(struct rdma_counter *counter,
-				  struct ib_qp *qp)
+				  struct ib_qp *qp, u32 port)
 {
 	int ret;
 
@@ -104,7 +104,7 @@ static int __rdma_counter_bind_qp(struct rdma_counter *counter,
 		return -EOPNOTSUPP;
 
 	mutex_lock(&counter->lock);
-	ret = qp->device->ops.counter_bind_qp(counter, qp);
+	ret = qp->device->ops.counter_bind_qp(counter, qp, port);
 	mutex_unlock(&counter->lock);
 
 	return ret;
@@ -196,7 +196,7 @@ static struct rdma_counter *alloc_and_bind(struct ib_device *dev, u32 port,
 	kref_init(&counter->kref);
 	mutex_init(&counter->lock);
 
-	ret = __rdma_counter_bind_qp(counter, qp);
+	ret = __rdma_counter_bind_qp(counter, qp, port);
 	if (ret)
 		goto err_mode;
 
@@ -247,7 +247,7 @@ static bool auto_mode_match(struct ib_qp *qp, struct rdma_counter *counter,
 	return match;
 }
 
-static int __rdma_counter_unbind_qp(struct ib_qp *qp)
+static int __rdma_counter_unbind_qp(struct ib_qp *qp, u32 port)
 {
 	struct rdma_counter *counter = qp->counter;
 	int ret;
@@ -256,7 +256,7 @@ static int __rdma_counter_unbind_qp(struct ib_qp *qp)
 		return -EOPNOTSUPP;
 
 	mutex_lock(&counter->lock);
-	ret = qp->device->ops.counter_unbind_qp(qp);
+	ret = qp->device->ops.counter_unbind_qp(qp, port);
 	mutex_unlock(&counter->lock);
 
 	return ret;
@@ -348,7 +348,7 @@ int rdma_counter_bind_qp_auto(struct ib_qp *qp, u32 port)
 
 	counter = rdma_get_counter_auto_mode(qp, port);
 	if (counter) {
-		ret = __rdma_counter_bind_qp(counter, qp);
+		ret = __rdma_counter_bind_qp(counter, qp, port);
 		if (ret) {
 			kref_put(&counter->kref, counter_release);
 			return ret;
@@ -368,7 +368,7 @@ int rdma_counter_bind_qp_auto(struct ib_qp *qp, u32 port)
  * @force:
  *   true - Decrease the counter ref-count anyway (e.g., qp destroy)
  */
-int rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
+int rdma_counter_unbind_qp(struct ib_qp *qp, u32 port, bool force)
 {
 	struct rdma_counter *counter = qp->counter;
 	int ret;
@@ -376,7 +376,7 @@ int rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
 	if (!counter)
 		return -EINVAL;
 
-	ret = __rdma_counter_unbind_qp(qp);
+	ret = __rdma_counter_unbind_qp(qp, port);
 	if (ret && !force)
 		return ret;
 
@@ -523,7 +523,7 @@ int rdma_counter_bind_qpn(struct ib_device *dev, u32 port,
 		goto err_task;
 	}
 
-	ret = __rdma_counter_bind_qp(counter, qp);
+	ret = __rdma_counter_bind_qp(counter, qp, port);
 	if (ret)
 		goto err_task;
 
@@ -614,7 +614,7 @@ int rdma_counter_unbind_qpn(struct ib_device *dev, u32 port,
 		goto out;
 	}
 
-	ret = rdma_counter_unbind_qp(qp, false);
+	ret = rdma_counter_unbind_qp(qp, port, false);
 
 out:
 	rdma_restrack_put(&qp->res);

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index dc40001072a5..c5e78bbefbd0 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -2105,7 +2105,7 @@ int ib_destroy_qp_user(struct ib_qp *qp, struct ib_udata *udata)
 	if (!qp->uobject)
 		rdma_rw_cleanup_mrs(qp);
 
-	rdma_counter_unbind_qp(qp, true);
+	rdma_counter_unbind_qp(qp, qp->port, true);
 	ret = qp->device->ops.destroy_qp(qp, udata);
 	if (ret) {
 		if (sec)

diff --git a/drivers/infiniband/hw/mlx5/counters.c b/drivers/infiniband/hw/mlx5/counters.c
index 018bb96bdbf4..d826f03b6ec5 100644
--- a/drivers/infiniband/hw/mlx5/counters.c
+++ b/drivers/infiniband/hw/mlx5/counters.c
@@ -562,7 +562,7 @@ static int mlx5_ib_counter_dealloc(struct rdma_counter *counter)
 }
 
 static int mlx5_ib_counter_bind_qp(struct rdma_counter *counter,
-				   struct ib_qp *qp)
+				   struct ib_qp *qp, u32 port)
 {
 	struct mlx5_ib_dev *dev = to_mdev(qp->device);
 	int err;
@@ -594,7 +594,7 @@ static int mlx5_ib_counter_bind_qp(struct rdma_counter *counter,
 	return err;
 }
 
-static int mlx5_ib_counter_unbind_qp(struct ib_qp *qp)
+static int mlx5_ib_counter_unbind_qp(struct ib_qp *qp, u32 port)
 {
 	return mlx5_ib_qp_set_counter(qp, NULL);
 }

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 90e93297d59e..d42eae69d9a8 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2644,12 +2644,13 @@ struct ib_device_ops {
 	 * @counter - The counter to be bound. If counter->id is zero then
 	 *   the driver needs to allocate a new counter and set counter->id
 	 */
-	int (*counter_bind_qp)(struct rdma_counter *counter, struct ib_qp *qp);
+	int (*counter_bind_qp)(struct rdma_counter *counter, struct ib_qp *qp,
+			       u32 port);
 	/**
 	 * counter_unbind_qp - Unbind the qp from the dynamically-allocated
 	 *   counter and bind it onto the default one
 	 */
-	int (*counter_unbind_qp)(struct ib_qp *qp);
+	int (*counter_unbind_qp)(struct ib_qp *qp, u32 port);
 	/**
 	 * counter_dealloc -De-allocate the hw counter
 	 */

diff --git a/include/rdma/rdma_counter.h b/include/rdma/rdma_counter.h
index 74e635409ff7..4204d08a010a 100644
--- a/include/rdma/rdma_counter.h
+++ b/include/rdma/rdma_counter.h
@@ -51,7 +51,7 @@ int rdma_counter_set_auto_mode(struct ib_device *dev, u32 port,
 			       bool bind_opcnt,
 			       struct netlink_ext_ack *extack);
 int rdma_counter_bind_qp_auto(struct ib_qp *qp, u32 port);
-int rdma_counter_unbind_qp(struct ib_qp *qp, bool force);
+int rdma_counter_unbind_qp(struct ib_qp *qp, u32 port, bool force);
 
 int rdma_counter_query_stats(struct rdma_counter *counter);
 u64 rdma_counter_get_hwstat_value(struct ib_device *dev, u32 port, u32 index);

From patchwork Tue Mar 4 14:15:29 2025
X-Patchwork-Submitter: Leon Romanovsky
X-Patchwork-Id: 14000823
From: Leon Romanovsky
To: Jason Gunthorpe
Cc: Patrisious Haddad, linux-rdma@vger.kernel.org, Mark Bloch,
    netdev@vger.kernel.org, Saeed Mahameed, Tariq Toukan
Subject: [PATCH mlx5-next 5/5] RDMA/mlx5: Support optional-counters binding for QPs
Date: Tue, 4 Mar 2025 16:15:29 +0200

From: Patrisious Haddad

Add support for optional-counters binding to a QP: when a bind operation
is requested, the driver checks the counter's optional-counter binding
state to decide whether the optional counters should also be bound to
this QP.

The optional-counter binding is done by simply adding a steering rule
for the specific optional-counter condition, with an additional match on
that QP number.

Note that the per-QP optional-counter rules are handled at an earlier
prio than the per-device counters. Per-device counter correctness is
maintained by the core, which is responsible for summing the active
counters when reading a device counter and for adding them to the
history count when they are deallocated.
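A condensed sketch of the per-QP match this description refers to
(set_per_qp_match() is a hypothetical extraction for illustration; the
actual hunk below inlines this logic in add_op_fc_rules()): the per-QP
rule is an ordinary optional-counter rule plus a misc-parameters match
on the QP number, source_sqn on the TX side and bth_dst_qp on the RX
side, with a flow counter as the rule's destination.

/* Sketch: narrow an optional-counter flow spec to a single QP. */
static void set_per_qp_match(struct mlx5_flow_spec *spec, u32 qp_num,
                             bool is_tx)
{
        if (is_tx) {
                /* TX: match packets sent from this QP's send queue */
                MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria,
                                 misc_parameters.source_sqn);
                MLX5_SET(fte_match_param, spec->match_value,
                         misc_parameters.source_sqn, qp_num);
        } else {
                /* RX: match packets whose BTH destination QP is this QP */
                MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria,
                                 misc_parameters.bth_dst_qp);
                MLX5_SET(fte_match_param, spec->match_value,
                         misc_parameters.bth_dst_qp, qp_num);
        }
        spec->match_criteria_enable |= MLX5_MATCH_MISC_PARAMETERS;
}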
Signed-off-by: Patrisious Haddad
Reviewed-by: Mark Bloch
Signed-off-by: Leon Romanovsky
---
 drivers/infiniband/hw/mlx5/counters.c |  99 +++++-
 drivers/infiniband/hw/mlx5/counters.h |   9 +
 drivers/infiniband/hw/mlx5/fs.c       | 428 +++++++++++++++++++++++++-
 drivers/infiniband/hw/mlx5/mlx5_ib.h  |  16 +
 include/linux/mlx5/device.h           |   4 +-
 5 files changed, 545 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/counters.c b/drivers/infiniband/hw/mlx5/counters.c
index d826f03b6ec5..7d32b8c6c1a5 100644
--- a/drivers/infiniband/hw/mlx5/counters.c
+++ b/drivers/infiniband/hw/mlx5/counters.c
@@ -437,12 +437,49 @@ static int do_get_hw_stats(struct ib_device *ibdev,
 static bool is_rdma_bytes_counter(u32 type)
 {
 	if (type == MLX5_IB_OPCOUNTER_RDMA_TX_BYTES ||
-	    type == MLX5_IB_OPCOUNTER_RDMA_RX_BYTES)
+	    type == MLX5_IB_OPCOUNTER_RDMA_RX_BYTES ||
+	    type == MLX5_IB_OPCOUNTER_RDMA_TX_BYTES_PER_QP ||
+	    type == MLX5_IB_OPCOUNTER_RDMA_RX_BYTES_PER_QP)
 		return true;
 
 	return false;
 }
 
+static int do_per_qp_get_op_stat(struct rdma_counter *counter)
+{
+	struct mlx5_ib_dev *dev = to_mdev(counter->device);
+	const struct mlx5_ib_counters *cnts = get_counters(dev, counter->port);
+	struct mlx5_rdma_counter *mcounter = to_mcounter(counter);
+	int i, ret, index, num_hw_counters;
+	u64 packets = 0, bytes = 0;
+
+	for (i = MLX5_IB_OPCOUNTER_CC_RX_CE_PKTS_PER_QP;
+	     i <= MLX5_IB_OPCOUNTER_RDMA_RX_BYTES_PER_QP; i++) {
+		if (!mcounter->fc[i])
+			continue;
+
+		ret = mlx5_fc_query(dev->mdev, mcounter->fc[i],
+				    &packets, &bytes);
+		if (ret)
+			return ret;
+
+		num_hw_counters = cnts->num_q_counters +
+				  cnts->num_cong_counters +
+				  cnts->num_ext_ppcnt_counters;
+
+		index = i - MLX5_IB_OPCOUNTER_CC_RX_CE_PKTS_PER_QP +
+			num_hw_counters;
+
+		if (is_rdma_bytes_counter(i))
+			counter->stats->value[index] = bytes;
+		else
+			counter->stats->value[index] = packets;
+
+		clear_bit(index, counter->stats->is_disabled);
+	}
+	return 0;
+}
+
 static int do_get_op_stat(struct ib_device *ibdev,
 			  struct rdma_hw_stats *stats,
 			  u32 port_num, int index)
@@ -542,19 +579,30 @@ static int mlx5_ib_counter_update_stats(struct rdma_counter *counter)
 {
 	struct mlx5_ib_dev *dev = to_mdev(counter->device);
 	const struct mlx5_ib_counters *cnts = get_counters(dev, counter->port);
+	int ret;
+
+	ret = mlx5_ib_query_q_counters(dev->mdev, cnts, counter->stats,
+				       counter->id);
+	if (ret)
+		return ret;
+
+	if (!counter->mode.bind_opcnt)
+		return 0;
 
-	return mlx5_ib_query_q_counters(dev->mdev, cnts,
-					counter->stats, counter->id);
+	return do_per_qp_get_op_stat(counter);
 }
 
 static int mlx5_ib_counter_dealloc(struct rdma_counter *counter)
 {
+	struct mlx5_rdma_counter *mcounter = to_mcounter(counter);
 	struct mlx5_ib_dev *dev = to_mdev(counter->device);
 	u32 in[MLX5_ST_SZ_DW(dealloc_q_counter_in)] = {};
 
 	if (!counter->id)
 		return 0;
 
+	WARN_ON(!xa_empty(&mcounter->qpn_opfc_xa));
+	mlx5r_fs_destroy_fcs(dev, counter);
 	MLX5_SET(dealloc_q_counter_in, in, opcode,
 		 MLX5_CMD_OP_DEALLOC_Q_COUNTER);
 	MLX5_SET(dealloc_q_counter_in, in, counter_set_id, counter->id);
@@ -585,8 +633,14 @@ static int mlx5_ib_counter_bind_qp(struct rdma_counter *counter,
 	if (err)
 		goto fail_set_counter;
 
+	err = mlx5r_fs_bind_op_fc(qp, counter, port);
+	if (err)
+		goto fail_bind_op_fc;
+
 	return 0;
 
+fail_bind_op_fc:
+	mlx5_ib_qp_set_counter(qp, NULL);
 fail_set_counter:
 	mlx5_ib_counter_dealloc(counter);
 	counter->id = 0;
@@ -596,7 +650,20 @@ static int mlx5_ib_counter_bind_qp(struct rdma_counter *counter,
 
 static int mlx5_ib_counter_unbind_qp(struct ib_qp *qp, u32 port)
 {
-	return mlx5_ib_qp_set_counter(qp, NULL);
+	struct rdma_counter *counter = qp->counter;
+	int err;
+
+	mlx5r_fs_unbind_op_fc(qp, counter);
+
+	err = mlx5_ib_qp_set_counter(qp, NULL);
+	if (err)
+		goto fail_set_counter;
+
+	return 0;
+
+fail_set_counter:
+	mlx5r_fs_bind_op_fc(qp, counter, port);
+	return err;
 }
 
 static void mlx5_ib_fill_counters(struct mlx5_ib_dev *dev,
@@ -789,9 +856,8 @@ static int __mlx5_ib_alloc_counters(struct mlx5_ib_dev *dev,
  * created, return true and return the shared counter through @opfc;
  * otherwise return false.
  */
-static bool mlx5r_is_opfc_shared_and_in_use(struct mlx5_ib_op_fc *opfcs,
-					    u32 type,
-					    struct mlx5_ib_op_fc **opfc)
+bool mlx5r_is_opfc_shared_and_in_use(struct mlx5_ib_op_fc *opfcs, u32 type,
+				     struct mlx5_ib_op_fc **opfc)
 {
 	u32 shared_fc_type;
 
@@ -808,6 +874,18 @@ static bool mlx5r_is_opfc_shared_and_in_use(struct mlx5_ib_op_fc *opfcs,
 	case MLX5_IB_OPCOUNTER_RDMA_RX_BYTES:
 		shared_fc_type = MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS;
 		break;
+	case MLX5_IB_OPCOUNTER_RDMA_TX_PACKETS_PER_QP:
+		shared_fc_type = MLX5_IB_OPCOUNTER_RDMA_TX_BYTES_PER_QP;
+		break;
+	case MLX5_IB_OPCOUNTER_RDMA_TX_BYTES_PER_QP:
+		shared_fc_type = MLX5_IB_OPCOUNTER_RDMA_TX_PACKETS_PER_QP;
+		break;
+	case MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS_PER_QP:
+		shared_fc_type = MLX5_IB_OPCOUNTER_RDMA_RX_BYTES_PER_QP;
+		break;
+	case MLX5_IB_OPCOUNTER_RDMA_RX_BYTES_PER_QP:
+		shared_fc_type = MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS_PER_QP;
+		break;
 	default:
 		return false;
 	}
@@ -1105,7 +1183,12 @@ static int mlx5_ib_modify_stat(struct ib_device *device, u32 port,
 	return 0;
 }
 
-static void mlx5_ib_counter_init(struct rdma_counter *counter) {}
+static void mlx5_ib_counter_init(struct rdma_counter *counter)
+{
+	struct mlx5_rdma_counter *mcounter = to_mcounter(counter);
+
+	xa_init(&mcounter->qpn_opfc_xa);
+}
 
 static const struct ib_device_ops hw_stats_ops = {
 	.alloc_hw_port_stats = mlx5_ib_alloc_hw_port_stats,

diff --git a/drivers/infiniband/hw/mlx5/counters.h b/drivers/infiniband/hw/mlx5/counters.h
index f153901a43be..4c2421bcf876 100644
--- a/drivers/infiniband/hw/mlx5/counters.h
+++ b/drivers/infiniband/hw/mlx5/counters.h
@@ -9,8 +9,15 @@
 
 #include "mlx5_ib.h"
 
+struct mlx5_per_qp_opfc {
+	struct mlx5_ib_op_fc opfcs[MLX5_IB_OPCOUNTER_MAX];
+};
+
 struct mlx5_rdma_counter {
 	struct rdma_counter rdma_counter;
+
+	struct mlx5_fc *fc[MLX5_IB_OPCOUNTER_MAX];
+	struct xarray qpn_opfc_xa;
 };
 
 static inline struct mlx5_rdma_counter *
@@ -25,4 +32,6 @@ void mlx5_ib_counters_clear_description(struct ib_counters *counters);
 int mlx5_ib_flow_counters_set_data(struct ib_counters *ibcounters,
 				   struct mlx5_ib_create_flow *ucmd);
 u16 mlx5_ib_get_counters_id(struct mlx5_ib_dev *dev, u32 port_num);
+bool mlx5r_is_opfc_shared_and_in_use(struct mlx5_ib_op_fc *opfcs, u32 type,
+				     struct mlx5_ib_op_fc **opfc);
 #endif /* _MLX5_IB_COUNTERS_H */

diff --git a/drivers/infiniband/hw/mlx5/fs.c b/drivers/infiniband/hw/mlx5/fs.c
index 93b229e9aab3..3069090874a1 100644
--- a/drivers/infiniband/hw/mlx5/fs.c
+++ b/drivers/infiniband/hw/mlx5/fs.c
@@ -800,12 +800,17 @@ static struct mlx5_ib_flow_prio *get_flow_table(struct mlx5_ib_dev *dev,
 }
 
 enum {
+	RDMA_RX_ECN_OPCOUNTER_PER_QP_PRIO,
+	RDMA_RX_CNP_OPCOUNTER_PER_QP_PRIO,
+	RDMA_RX_PKTS_BYTES_OPCOUNTER_PER_QP_PRIO,
 	RDMA_RX_ECN_OPCOUNTER_PRIO,
 	RDMA_RX_CNP_OPCOUNTER_PRIO,
 	RDMA_RX_PKTS_BYTES_OPCOUNTER_PRIO,
 };
 
 enum {
+	RDMA_TX_CNP_OPCOUNTER_PER_QP_PRIO,
+	RDMA_TX_PKTS_BYTES_OPCOUNTER_PER_QP_PRIO,
 	RDMA_TX_CNP_OPCOUNTER_PRIO,
 	RDMA_TX_PKTS_BYTES_OPCOUNTER_PRIO,
 };
@@ -887,6 +892,12 @@ static struct mlx5_ib_flow_prio *get_opfc_prio(struct mlx5_ib_dev *dev,
 	case MLX5_IB_OPCOUNTER_RDMA_RX_BYTES:
 		prio_type = MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS;
 		break;
+	case MLX5_IB_OPCOUNTER_RDMA_TX_BYTES_PER_QP:
+		prio_type = MLX5_IB_OPCOUNTER_RDMA_TX_PACKETS_PER_QP;
+		break;
+	case MLX5_IB_OPCOUNTER_RDMA_RX_BYTES_PER_QP:
+		prio_type = MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS_PER_QP;
+		break;
 	default:
 		prio_type = type;
 	}
@@ -894,6 +905,315 @@ static struct mlx5_ib_flow_prio *get_opfc_prio(struct mlx5_ib_dev *dev,
 	return &dev->flow_db->opfcs[prio_type];
 }
 
+static void put_per_qp_prio(struct mlx5_ib_dev *dev,
+			    enum mlx5_ib_optional_counter_type type)
+{
+	enum mlx5_ib_optional_counter_type per_qp_type;
+	struct mlx5_ib_flow_prio *prio;
+
+	switch (type) {
+	case MLX5_IB_OPCOUNTER_CC_RX_CE_PKTS:
+		per_qp_type = MLX5_IB_OPCOUNTER_CC_RX_CE_PKTS_PER_QP;
+		break;
+	case MLX5_IB_OPCOUNTER_CC_RX_CNP_PKTS:
+		per_qp_type = MLX5_IB_OPCOUNTER_CC_RX_CNP_PKTS_PER_QP;
+		break;
+	case MLX5_IB_OPCOUNTER_CC_TX_CNP_PKTS:
+		per_qp_type = MLX5_IB_OPCOUNTER_CC_TX_CNP_PKTS_PER_QP;
+		break;
+	case MLX5_IB_OPCOUNTER_RDMA_TX_PACKETS:
+		per_qp_type = MLX5_IB_OPCOUNTER_RDMA_TX_PACKETS_PER_QP;
+		break;
+	case MLX5_IB_OPCOUNTER_RDMA_TX_BYTES:
+		per_qp_type = MLX5_IB_OPCOUNTER_RDMA_TX_BYTES_PER_QP;
+		break;
+	case MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS:
+		per_qp_type = MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS_PER_QP;
+		break;
+	case MLX5_IB_OPCOUNTER_RDMA_RX_BYTES:
+		per_qp_type = MLX5_IB_OPCOUNTER_RDMA_RX_BYTES_PER_QP;
+		break;
+	default:
+		return;
+	}
+
+	prio = get_opfc_prio(dev, per_qp_type);
+	put_flow_table(dev, prio, true);
+}
+
+static int get_per_qp_prio(struct mlx5_ib_dev *dev,
+			   enum mlx5_ib_optional_counter_type type)
+{
+	enum mlx5_ib_optional_counter_type per_qp_type;
+	enum mlx5_flow_namespace_type fn_type;
+	struct mlx5_flow_namespace *ns;
+	struct mlx5_ib_flow_prio *prio;
+	int priority;
+
+	switch (type) {
+	case MLX5_IB_OPCOUNTER_CC_RX_CE_PKTS:
+		fn_type = MLX5_FLOW_NAMESPACE_RDMA_RX_COUNTERS;
+		priority = RDMA_RX_ECN_OPCOUNTER_PER_QP_PRIO;
+		per_qp_type = MLX5_IB_OPCOUNTER_CC_RX_CE_PKTS_PER_QP;
+		break;
+	case MLX5_IB_OPCOUNTER_CC_RX_CNP_PKTS:
+		fn_type = MLX5_FLOW_NAMESPACE_RDMA_RX_COUNTERS;
+		priority = RDMA_RX_CNP_OPCOUNTER_PER_QP_PRIO;
+		per_qp_type = MLX5_IB_OPCOUNTER_CC_RX_CNP_PKTS_PER_QP;
+		break;
+	case MLX5_IB_OPCOUNTER_CC_TX_CNP_PKTS:
+		fn_type = MLX5_FLOW_NAMESPACE_RDMA_TX_COUNTERS;
+		priority = RDMA_TX_CNP_OPCOUNTER_PER_QP_PRIO;
+		per_qp_type = MLX5_IB_OPCOUNTER_CC_TX_CNP_PKTS_PER_QP;
+		break;
+	case MLX5_IB_OPCOUNTER_RDMA_TX_PACKETS:
+		fn_type = MLX5_FLOW_NAMESPACE_RDMA_TX_COUNTERS;
+		priority = RDMA_TX_PKTS_BYTES_OPCOUNTER_PER_QP_PRIO;
+		per_qp_type = MLX5_IB_OPCOUNTER_RDMA_TX_PACKETS_PER_QP;
+		break;
+	case MLX5_IB_OPCOUNTER_RDMA_TX_BYTES:
+		fn_type = MLX5_FLOW_NAMESPACE_RDMA_TX_COUNTERS;
+		priority = RDMA_TX_PKTS_BYTES_OPCOUNTER_PER_QP_PRIO;
+		per_qp_type = MLX5_IB_OPCOUNTER_RDMA_TX_BYTES_PER_QP;
+		break;
+	case MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS:
+		fn_type = MLX5_FLOW_NAMESPACE_RDMA_RX_COUNTERS;
+		priority = RDMA_RX_PKTS_BYTES_OPCOUNTER_PER_QP_PRIO;
+		per_qp_type = MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS_PER_QP;
+		break;
+	case MLX5_IB_OPCOUNTER_RDMA_RX_BYTES:
+		fn_type = MLX5_FLOW_NAMESPACE_RDMA_RX_COUNTERS;
+		priority = RDMA_RX_PKTS_BYTES_OPCOUNTER_PER_QP_PRIO;
+		per_qp_type = MLX5_IB_OPCOUNTER_RDMA_RX_BYTES_PER_QP;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	ns = mlx5_get_flow_namespace(dev->mdev, fn_type);
+	if (!ns)
+		return -EOPNOTSUPP;
+
+	prio = get_opfc_prio(dev, per_qp_type);
+	if (prio->flow_table)
+		return 0;
+
+	prio = _get_prio(dev, ns, prio, priority, MLX5_FS_MAX_POOL_SIZE, 1, 0, 0);
+	if (IS_ERR(prio))
+		return PTR_ERR(prio);
+
+	prio->refcount = 1;
+
+	return 0;
+}
+
+static struct mlx5_per_qp_opfc *
+get_per_qp_opfc(struct mlx5_rdma_counter *mcounter, u32 qp_num, bool *new)
+{
+	struct mlx5_per_qp_opfc *per_qp_opfc;
+
+	*new = false;
+
+	per_qp_opfc = xa_load(&mcounter->qpn_opfc_xa, qp_num);
+	if (per_qp_opfc)
+		return per_qp_opfc;
+	per_qp_opfc = kzalloc(sizeof(*per_qp_opfc), GFP_KERNEL);
+
+	if (!per_qp_opfc)
+		return NULL;
+
+	*new = true;
+	return per_qp_opfc;
+}
+
+static int add_op_fc_rules(struct mlx5_ib_dev *dev,
+			   struct mlx5_rdma_counter *mcounter,
+			   struct mlx5_per_qp_opfc *per_qp_opfc,
+			   struct mlx5_ib_flow_prio *prio,
+			   enum mlx5_ib_optional_counter_type type,
+			   u32 qp_num, u32 port_num)
+{
+	struct mlx5_ib_op_fc *opfc = &per_qp_opfc->opfcs[type], *in_use_opfc;
+	struct mlx5_flow_act flow_act = {};
+	struct mlx5_flow_destination dst;
+	struct mlx5_flow_spec *spec;
+	int i, err, spec_num;
+	bool is_tx;
+
+	if (opfc->fc)
+		return -EEXIST;
+
+	if (mlx5r_is_opfc_shared_and_in_use(per_qp_opfc->opfcs, type,
+					    &in_use_opfc)) {
+		opfc->fc = in_use_opfc->fc;
+		opfc->rule[0] = in_use_opfc->rule[0];
+		return 0;
+	}
+
+	opfc->fc = mcounter->fc[type];
+
+	spec = kcalloc(MAX_OPFC_RULES, sizeof(*spec), GFP_KERNEL);
+	if (!spec) {
+		err = -ENOMEM;
+		goto null_fc;
+	}
+
+	switch (type) {
+	case MLX5_IB_OPCOUNTER_CC_RX_CE_PKTS_PER_QP:
+		if (set_ecn_ce_spec(dev, port_num, &spec[0],
+				    MLX5_FS_IPV4_VERSION) ||
+		    set_ecn_ce_spec(dev, port_num, &spec[1],
+				    MLX5_FS_IPV6_VERSION)) {
+			err = -EOPNOTSUPP;
+			goto free_spec;
+		}
+		spec_num = 2;
+		is_tx = false;
+
+		MLX5_SET_TO_ONES(fte_match_param, spec[1].match_criteria,
+				 misc_parameters.bth_dst_qp);
+		MLX5_SET(fte_match_param, spec[1].match_value,
+			 misc_parameters.bth_dst_qp, qp_num);
+		spec[1].match_criteria_enable |= MLX5_MATCH_MISC_PARAMETERS;
+		break;
+	case MLX5_IB_OPCOUNTER_CC_RX_CNP_PKTS_PER_QP:
+		if (!MLX5_CAP_FLOWTABLE(
+			    dev->mdev,
+			    ft_field_support_2_nic_receive_rdma.bth_opcode) ||
+		    set_cnp_spec(dev, port_num, &spec[0])) {
+			err = -EOPNOTSUPP;
+			goto free_spec;
+		}
+		spec_num = 1;
+		is_tx = false;
+		break;
+	case MLX5_IB_OPCOUNTER_CC_TX_CNP_PKTS_PER_QP:
+		if (!MLX5_CAP_FLOWTABLE(
+			    dev->mdev,
+			    ft_field_support_2_nic_transmit_rdma.bth_opcode) ||
+		    set_cnp_spec(dev, port_num, &spec[0])) {
+			err = -EOPNOTSUPP;
+			goto free_spec;
+		}
+		spec_num = 1;
+		is_tx = true;
+		break;
+	case MLX5_IB_OPCOUNTER_RDMA_TX_PACKETS_PER_QP:
+	case MLX5_IB_OPCOUNTER_RDMA_TX_BYTES_PER_QP:
+		spec_num = 1;
+		is_tx = true;
+		break;
+	case MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS_PER_QP:
+	case MLX5_IB_OPCOUNTER_RDMA_RX_BYTES_PER_QP:
+		spec_num = 1;
+		is_tx = false;
+		break;
+	default:
+		err = -EINVAL;
+		goto free_spec;
+	}
+
+	if (is_tx) {
+		MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria,
+				 misc_parameters.source_sqn);
+		MLX5_SET(fte_match_param, spec->match_value,
+			 misc_parameters.source_sqn, qp_num);
+	} else {
+		MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria,
+				 misc_parameters.bth_dst_qp);
+		MLX5_SET(fte_match_param, spec->match_value,
+			 misc_parameters.bth_dst_qp, qp_num);
+	}
+
+	spec->match_criteria_enable |= MLX5_MATCH_MISC_PARAMETERS;
+
+	dst.type = MLX5_FLOW_DESTINATION_TYPE_COUNTER;
+	dst.counter = opfc->fc;
+
+	flow_act.action =
+		MLX5_FLOW_CONTEXT_ACTION_COUNT | MLX5_FLOW_CONTEXT_ACTION_ALLOW;
+
+	for (i = 0; i < spec_num; i++) {
+		opfc->rule[i] = mlx5_add_flow_rules(prio->flow_table, &spec[i],
+						    &flow_act, &dst, 1);
+		if (IS_ERR(opfc->rule[i])) {
+			err = PTR_ERR(opfc->rule[i]);
+			goto del_rules;
+		}
+	}
+	prio->refcount += spec_num;
+
+	err = xa_err(xa_store(&mcounter->qpn_opfc_xa, qp_num, per_qp_opfc,
+			      GFP_KERNEL));
+	if (err)
+		goto del_rules;
+
+	kfree(spec);
+
+	return 0;
+
+del_rules:
+	while (i--)
+		mlx5_del_flow_rules(opfc->rule[i]);
+	put_flow_table(dev, prio, false);
+free_spec:
+	kfree(spec);
+null_fc:
+	opfc->fc = NULL;
+	return err;
+}
+
+static bool is_fc_shared_and_in_use(struct mlx5_rdma_counter *mcounter,
+				    u32 type, struct mlx5_fc **fc)
+{
+	u32 shared_fc_type;
+
+	switch (type) {
+	case MLX5_IB_OPCOUNTER_RDMA_TX_PACKETS_PER_QP:
+		shared_fc_type = MLX5_IB_OPCOUNTER_RDMA_TX_BYTES_PER_QP;
+		break;
+	case MLX5_IB_OPCOUNTER_RDMA_TX_BYTES_PER_QP:
+		shared_fc_type = MLX5_IB_OPCOUNTER_RDMA_TX_PACKETS_PER_QP;
+		break;
+	case MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS_PER_QP:
+		shared_fc_type = MLX5_IB_OPCOUNTER_RDMA_RX_BYTES_PER_QP;
+		break;
+	case MLX5_IB_OPCOUNTER_RDMA_RX_BYTES_PER_QP:
+		shared_fc_type = MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS_PER_QP;
+		break;
+	default:
+		return false;
+	}
+
+	*fc = mcounter->fc[shared_fc_type];
+	if (!(*fc))
+		return false;
+
+	return true;
+}
+
+void mlx5r_fs_destroy_fcs(struct mlx5_ib_dev *dev,
+			  struct rdma_counter *counter)
+{
+	struct mlx5_rdma_counter *mcounter = to_mcounter(counter);
+	struct mlx5_fc *in_use_fc;
+	int i;
+
+	for (i = MLX5_IB_OPCOUNTER_CC_RX_CE_PKTS_PER_QP;
+	     i <= MLX5_IB_OPCOUNTER_RDMA_RX_BYTES_PER_QP; i++) {
+		if (!mcounter->fc[i])
+			continue;
+
+		if (is_fc_shared_and_in_use(mcounter, i, &in_use_fc)) {
+			mcounter->fc[i] = NULL;
+			continue;
+		}
+
+		mlx5_fc_destroy(dev->mdev, mcounter->fc[i]);
+		mcounter->fc[i] = NULL;
+	}
+}
+
 int mlx5_ib_fs_add_op_fc(struct mlx5_ib_dev *dev, u32 port_num,
 			 struct mlx5_ib_op_fc *opfc,
 			 enum mlx5_ib_optional_counter_type type)
@@ -975,11 +1295,15 @@ int mlx5_ib_fs_add_op_fc(struct mlx5_ib_dev *dev, u32 port_num,
 
 	prio = get_opfc_prio(dev, type);
 	if (!prio->flow_table) {
+		err = get_per_qp_prio(dev, type);
+		if (err)
+			goto free;
+
 		prio = _get_prio(dev, ns, prio, priority,
 				 dev->num_ports * MAX_OPFC_RULES, 1, 0, 0);
 		if (IS_ERR(prio)) {
 			err = PTR_ERR(prio);
-			goto free;
+			goto put_prio;
 		}
 	}
 
@@ -1006,6 +1330,8 @@ int mlx5_ib_fs_add_op_fc(struct mlx5_ib_dev *dev, u32 port_num,
 	for (i -= 1; i >= 0; i--)
 		mlx5_del_flow_rules(opfc->rule[i]);
 	put_flow_table(dev, prio, false);
+put_prio:
+	put_per_qp_prio(dev, type);
 free:
 	kfree(spec);
 	return err;
@@ -1024,6 +1350,106 @@ void mlx5_ib_fs_remove_op_fc(struct mlx5_ib_dev *dev,
 		mlx5_del_flow_rules(opfc->rule[i]);
 		put_flow_table(dev, prio, true);
 	}
+
+	put_per_qp_prio(dev, type);
+}
+
+void mlx5r_fs_unbind_op_fc(struct ib_qp *qp, struct rdma_counter *counter)
+{
+	struct mlx5_rdma_counter *mcounter = to_mcounter(counter);
+	struct mlx5_ib_dev *dev = to_mdev(counter->device);
+	struct mlx5_per_qp_opfc *per_qp_opfc;
+	struct mlx5_ib_op_fc *in_use_opfc;
+	struct mlx5_ib_flow_prio *prio;
+	int i, j;
+
+	per_qp_opfc = xa_load(&mcounter->qpn_opfc_xa, qp->qp_num);
+	if (!per_qp_opfc)
+		return;
+
+	for (i = MLX5_IB_OPCOUNTER_CC_RX_CE_PKTS_PER_QP;
+	     i <= MLX5_IB_OPCOUNTER_RDMA_RX_BYTES_PER_QP; i++) {
+		if (!per_qp_opfc->opfcs[i].fc)
+			continue;
+
+		if (mlx5r_is_opfc_shared_and_in_use(per_qp_opfc->opfcs, i,
+						    &in_use_opfc)) {
+			per_qp_opfc->opfcs[i].fc = NULL;
+			continue;
+		}
+
+		for (j = 0; j < MAX_OPFC_RULES; j++) {
+			if (!per_qp_opfc->opfcs[i].rule[j])
+				continue;
+			mlx5_del_flow_rules(per_qp_opfc->opfcs[i].rule[j]);
+			prio = get_opfc_prio(dev, i);
+			put_flow_table(dev, prio, true);
+		}
+		per_qp_opfc->opfcs[i].fc = NULL;
+	}
+
+	kfree(per_qp_opfc);
xa_erase(&mcounter->qpn_opfc_xa, qp->qp_num); +} + +int mlx5r_fs_bind_op_fc(struct ib_qp *qp, struct rdma_counter *counter, + u32 port) +{ + struct mlx5_rdma_counter *mcounter = to_mcounter(counter); + struct mlx5_ib_dev *dev = to_mdev(qp->device); + struct mlx5_per_qp_opfc *per_qp_opfc; + struct mlx5_ib_flow_prio *prio; + struct mlx5_ib_counters *cnts; + struct mlx5_ib_op_fc *opfc; + struct mlx5_fc *in_use_fc; + int i, err, per_qp_type; + bool new; + + if (!counter->mode.bind_opcnt) + return 0; + + cnts = &dev->port[port - 1].cnts; + + for (i = 0; i <= MLX5_IB_OPCOUNTER_RDMA_RX_BYTES; i++) { + opfc = &cnts->opfcs[i]; + if (!opfc->fc) + continue; + + per_qp_type = i + MLX5_IB_OPCOUNTER_CC_RX_CE_PKTS_PER_QP; + prio = get_opfc_prio(dev, per_qp_type); + WARN_ON(!prio->flow_table); + + if (is_fc_shared_and_in_use(mcounter, per_qp_type, &in_use_fc)) + mcounter->fc[per_qp_type] = in_use_fc; + + if (!mcounter->fc[per_qp_type]) { + mcounter->fc[per_qp_type] = mlx5_fc_create(dev->mdev, + false); + if (IS_ERR(mcounter->fc[per_qp_type])) + return PTR_ERR(mcounter->fc[per_qp_type]); + } + + per_qp_opfc = get_per_qp_opfc(mcounter, qp->qp_num, &new); + if (!per_qp_opfc) { + err = -ENOMEM; + goto free_fc; + } + err = add_op_fc_rules(dev, mcounter, per_qp_opfc, prio, + per_qp_type, qp->qp_num, port); + if (err) + goto del_rules; + } + + return 0; + +del_rules: + mlx5r_fs_unbind_op_fc(qp, counter); + if (new) + kfree(per_qp_opfc); +free_fc: + if (xa_empty(&mcounter->qpn_opfc_xa)) + mlx5r_fs_destroy_fcs(dev, counter); + return err; } static void set_underlay_qp(struct mlx5_ib_dev *dev, diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h index 24b18942762c..84a1f07d46a7 100644 --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h @@ -299,6 +299,14 @@ enum mlx5_ib_optional_counter_type { MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS, MLX5_IB_OPCOUNTER_RDMA_RX_BYTES, + MLX5_IB_OPCOUNTER_CC_RX_CE_PKTS_PER_QP, + MLX5_IB_OPCOUNTER_CC_RX_CNP_PKTS_PER_QP, + MLX5_IB_OPCOUNTER_CC_TX_CNP_PKTS_PER_QP, + MLX5_IB_OPCOUNTER_RDMA_TX_PACKETS_PER_QP, + MLX5_IB_OPCOUNTER_RDMA_TX_BYTES_PER_QP, + MLX5_IB_OPCOUNTER_RDMA_RX_PACKETS_PER_QP, + MLX5_IB_OPCOUNTER_RDMA_RX_BYTES_PER_QP, + MLX5_IB_OPCOUNTER_MAX, }; @@ -891,6 +899,14 @@ void mlx5_ib_fs_remove_op_fc(struct mlx5_ib_dev *dev, struct mlx5_ib_op_fc *opfc, enum mlx5_ib_optional_counter_type type); +int mlx5r_fs_bind_op_fc(struct ib_qp *qp, struct rdma_counter *counter, + u32 port); + +void mlx5r_fs_unbind_op_fc(struct ib_qp *qp, struct rdma_counter *counter); + +void mlx5r_fs_destroy_fcs(struct mlx5_ib_dev *dev, + struct rdma_counter *counter); + struct mlx5_ib_multiport_info; struct mlx5_ib_multiport { diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index 63f0d9fb94b4..344124644697 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -1532,8 +1532,8 @@ static inline u16 mlx5_to_sw_pkey_sz(int pkey_sz) return MLX5_MIN_PKEY_TABLE_SIZE << pkey_sz; } -#define MLX5_RDMA_RX_NUM_COUNTERS_PRIOS 3 -#define MLX5_RDMA_TX_NUM_COUNTERS_PRIOS 2 +#define MLX5_RDMA_RX_NUM_COUNTERS_PRIOS 6 +#define MLX5_RDMA_TX_NUM_COUNTERS_PRIOS 4 #define MLX5_BY_PASS_NUM_REGULAR_PRIOS 16 #define MLX5_BY_PASS_NUM_DONT_TRAP_PRIOS 16 #define MLX5_BY_PASS_NUM_MULTICAST_PRIOS 1
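
A note on the shared-counter design above: a hardware flow counter
(struct mlx5_fc) accumulates packets and bytes together, which is why
each direction's rdma_*_packets and rdma_*_bytes optional counters are
backed by a single mlx5_fc, and why is_fc_shared_and_in_use() lets the
second type of a pair reuse the counter its sibling already created.
A minimal read-side sketch, illustration only and not part of the
patch, assuming an already-created counter and a hypothetical helper
name (read_dir_pair):

	#include <linux/mlx5/driver.h>
	#include <linux/mlx5/fs.h>

	/*
	 * One mlx5_fc carries both values for a direction, so a single
	 * mlx5_fc_query() call yields the packet count and the byte
	 * count that are then exposed under two separate stat names.
	 */
	static int read_dir_pair(struct mlx5_core_dev *mdev, struct mlx5_fc *fc,
				 u64 *packets, u64 *bytes)
	{
		return mlx5_fc_query(mdev, fc, packets, bytes);
	}

This is also why mlx5r_fs_destroy_fcs() only calls mlx5_fc_destroy()
once the sibling type no longer references the shared counter.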
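
Similarly, the qp_num bookkeeping in the bind/unbind paths is stock
XArray usage. A self-contained sketch of the same lookup-or-allocate
and erase pattern, with a hypothetical qp_state standing in for
struct mlx5_per_qp_opfc:

	#include <linux/slab.h>
	#include <linux/xarray.h>

	/* Hypothetical per-QP state; the driver keeps mlx5_per_qp_opfc here. */
	struct qp_state {
		u64 cookie;
	};

	static DEFINE_XARRAY(qpn_xa);

	/* Return existing state for a QP number, or allocate and store it. */
	static struct qp_state *qp_state_get(u32 qpn)
	{
		struct qp_state *st = xa_load(&qpn_xa, qpn);

		if (st)
			return st;

		st = kzalloc(sizeof(*st), GFP_KERNEL);
		if (!st)
			return NULL;

		if (xa_err(xa_store(&qpn_xa, qpn, st, GFP_KERNEL))) {
			kfree(st);
			return NULL;
		}
		return st;
	}

	/* Drop the state on unbind; xa_erase() returns the removed entry. */
	static void qp_state_put(u32 qpn)
	{
		kfree(xa_erase(&qpn_xa, qpn));
	}

The XArray keys state by QP number without any driver-side hashing or
locking boilerplate (it takes its own internal lock), which is what
lets the patch manage all per-QP rules through one qpn_opfc_xa per
counter.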