
[1/2] IB/mlx5: posting klm/mtt list inline in the send queue for reg_wr

Message ID 1509295101-14081-2-git-send-email-idanb@mellanox.com (mailing list archive)
State Changes Requested

Commit Message

Idan Burstein Oct. 29, 2017, 4:38 p.m. UTC
From: Idan Burstein <idanb@mellanox.com>

Like most kernel RDMA ULPs, NVMe over Fabrics in its secure
"register_always" mode registers and invalidates the user buffer
for each IO.
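
For reference, the per-IO flow on the ULP side looks roughly like the
sketch below (generic kernel verbs usage, not code from this patch;
error handling, completion handling and DMA mapping are omitted):

#include <rdma/ib_verbs.h>
#include <linux/scatterlist.h>

/* Sketch only: qp, mr and the IO scatterlist come from the ULP. */
static int example_reg_and_inv(struct ib_qp *qp, struct ib_mr *mr,
			       struct scatterlist *sgl, int nents)
{
	struct ib_reg_wr reg_wr = {};
	struct ib_send_wr inv_wr = {};
	struct ib_send_wr *bad_wr;
	int n, ret;

	/* Lay out the buffer's pages in the MR; this page list is what
	 * the driver later emits as the MTT/KLM list of the UMR WQE. */
	n = ib_map_mr_sg(mr, sgl, nents, NULL, PAGE_SIZE);
	if (n < nents)
		return n < 0 ? n : -EINVAL;

	/* Registration work request, posted before every IO. */
	reg_wr.wr.opcode = IB_WR_REG_MR;
	reg_wr.mr = mr;
	reg_wr.key = mr->rkey;
	reg_wr.access = IB_ACCESS_LOCAL_WRITE |
			IB_ACCESS_REMOTE_READ |
			IB_ACCESS_REMOTE_WRITE;
	ret = ib_post_send(qp, &reg_wr.wr, &bad_wr);
	if (ret)
		return ret;

	/* ... the IO itself runs against mr->rkey ... */

	/* Local invalidation once the IO completes, so the key cannot
	 * be reused or abused afterwards. */
	inv_wr.opcode = IB_WR_LOCAL_INV;
	inv_wr.ex.invalidate_rkey = mr->rkey;
	return ib_post_send(qp, &inv_wr, &bad_wr);
}

The registration WQE that mlx5 builds for the IB_WR_REG_MR post above
is what this patch makes cheaper.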

Today the mlx5 driver posts the registration work request with a
scatter/gather entry that points to the MTT/KLM list.

The fetch of the MTT/KLM list becomes the bottleneck for the number
of IO operations the NVMe over Fabrics host driver can issue on a
single adapter, as shown below.

This patch adds support for an inline registration work request for
MTT/KLM lists of size <= 64B.
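
To put the 64B threshold in numbers (descriptor sizes assumed from
the mlx5 MTT/KLM layouts, 8B per MTT entry and 16B per KLM entry):

    64B / 8B per MTT entry  = 8 page descriptors
    64B / 16B per KLM entry = 4 klm descriptors

so with 4KB pages, registrations covering up to 32KB of page-aligned
data can have their page lists posted inline.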

The result for NVMe over Fabrics is an increase of up to 3.6x for
small IOs, as shown below; I expect other ULPs (e.g. iSER, NFS over
RDMA) to see a performance gain as well.

The following results were taken against a single nvme over fabrics
subsystem with a single namespace backed by null_blk:

Block Size       s/g reg_wr            inline reg_wr
++++++++++     +++++++++++++++        ++++++++++++++++
512B            1446.6K/8.57%          5224.2K/76.21%
1KB             1390.6K/8.5%           4656.7K/71.69%
2KB             1343.8K/8.6%           3410.3K/38.96%
4KB             1254.8K/8.39%          2033.6K/15.86%
8KB             1079.5K/7.7%           1143.1K/7.27%
16KB            603.4K/3.64%           593.8K/3.4%
32KB            294.8K/2.04%           293.7K/1.98%
64KB            138.2K/1.32%           141.6K/1.26%

Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Idan Burstein <idanb@mellanox.com>
---
 drivers/infiniband/hw/mlx5/qp.c | 41 +++++++++++++++++++++++++++++++++++------
 1 file changed, 35 insertions(+), 6 deletions(-)

Comments

Sagi Grimberg Oct. 29, 2017, 5:09 p.m. UTC | #1
> Like most kernel RDMA ULPs, NVMe over Fabrics in its secure
> "register_always" mode registers and invalidates the user buffer
> for each IO.

Again, let's drop the "secure mode" language.

> diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
> index acb79d3..b25b93d 100644
> --- a/drivers/infiniband/hw/mlx5/qp.c
> +++ b/drivers/infiniband/hw/mlx5/qp.c
> @@ -53,6 +53,7 @@ enum {
>   
>   enum {
>   	MLX5_IB_SQ_STRIDE	= 6,
> +	MLX5_IB_SQ_UMR_INLINE_THRESHOLD = 64,
>   };

Is this a device capability? Is it true for all mlx5 devices?
Can we get it from the FW instead?

Otherwise, looks good to me,

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Roland Dreier April 19, 2018, 9:11 p.m. UTC | #2
> The fetch of the MTT/KLM list becomes the bottleneck for the number
> of IO operations the NVMe over Fabrics host driver can issue on a
> single adapter, as shown below.
>
> This patch adds support for an inline registration work request for
> MTT/KLM lists of size <= 64B.

It looks like this patch never made it upstream, and I don't see any
discussion or reason why it might have been dropped.

This still seems like a good idea - should we respin it against the latest tree?

 - R.
Doug Ledford April 20, 2018, 2:04 a.m. UTC | #3
On Thu, 2018-04-19 at 14:11 -0700, roland wrote:
> > The fetch of the MTT/KLM list becomes the bottleneck for the number
> > of IO operations the NVMe over Fabrics host driver can issue on a
> > single adapter, as shown below.
> > 
> > This patch adds support for an inline registration work request for
> > MTT/KLM lists of size <= 64B.
> 
> It looks like this patch never made it upstream, and I don't see any
> discussion or reason why it might have been dropped.
> 
> This still seems like a good idea - should we respin it against the latest tree?

If you want it included, then you'll have to respin it, if for no other
reason than the fact that it isn't readily available in patchwork and
that's what we're using to track/grab patches nowadays.  I suppose I
could go hunting for it, but that's a good way to grab an out-of-date
patch and make a mistake by accident...
Max Gurtovoy April 20, 2018, 7:07 p.m. UTC | #4
Thanks Roland.
Our team successfully ran the patch again last week with NVMe-oF/RDMA 
on ConnectX-4 and ConnectX-5. We'll push it through Jason's PR with 
all the great results published.

-Max.

On 4/20/2018 5:04 AM, Doug Ledford wrote:
> On Thu, 2018-04-19 at 14:11 -0700, roland wrote:
>>> The fetch of the MTT/KLM list becomes the bottleneck for the number
>>> of IO operations the NVMe over Fabrics host driver can issue on a
>>> single adapter, as shown below.
>>>
>>> This patch adds support for an inline registration work request for
>>> MTT/KLM lists of size <= 64B.
>>
>> It looks like this patch never made it upstream, and I don't see any
>> discussion or reason why it might have been dropped.
>>
>> This still seems like a good idea - should we respin it against the latest tree?
> 
> If you want it included, then you'll have to respin it, if for no other
> reason than the fact that it isn't readily available in patchwork and
> that's what we're using to track/grab patches nowadays.  I suppose I
> could go hunting for it, but that's a good way to grab an out-of-date
> patch and make a mistake by accident...
> 


Patch

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index acb79d3..b25b93d 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -53,6 +53,7 @@  enum {
 
 enum {
 	MLX5_IB_SQ_STRIDE	= 6,
+	MLX5_IB_SQ_UMR_INLINE_THRESHOLD = 64,
 };
 
 static const u32 mlx5_ib_opcode[] = {
@@ -295,7 +296,8 @@  static int sq_overhead(struct ib_qp_init_attr *attr)
 			max(sizeof(struct mlx5_wqe_atomic_seg) +
 			    sizeof(struct mlx5_wqe_raddr_seg),
 			    sizeof(struct mlx5_wqe_umr_ctrl_seg) +
-			    sizeof(struct mlx5_mkey_seg));
+			    sizeof(struct mlx5_mkey_seg) +
+			    MLX5_IB_SQ_UMR_INLINE_THRESHOLD / MLX5_IB_UMR_OCTOWORD);
 		break;
 
 	case IB_QPT_XRC_TGT:
@@ -3164,13 +3166,15 @@  static __be64 sig_mkey_mask(void)
 }
 
 static void set_reg_umr_seg(struct mlx5_wqe_umr_ctrl_seg *umr,
-			    struct mlx5_ib_mr *mr)
+			    struct mlx5_ib_mr *mr, bool umr_inline)
 {
 	int size = mr->ndescs * mr->desc_size;
 
 	memset(umr, 0, sizeof(*umr));
 
 	umr->flags = MLX5_UMR_CHECK_NOT_FREE;
+	if (umr_inline)
+		umr->flags |= MLX5_UMR_INLINE;
 	umr->xlt_octowords = cpu_to_be16(get_xlt_octo(size));
 	umr->mkey_mask = frwr_mkey_mask();
 }
@@ -3341,6 +3345,24 @@  static void set_reg_data_seg(struct mlx5_wqe_data_seg *dseg,
 	dseg->lkey = cpu_to_be32(pd->ibpd.local_dma_lkey);
 }
 
+static void set_reg_umr_inline_seg(void *seg, struct mlx5_ib_qp *qp,
+				   struct mlx5_ib_mr *mr, int mr_list_size)
+{
+	void *qend = qp->sq.qend;
+	void *addr = mr->descs;
+	int copy;
+
+	if (unlikely(seg + mr_list_size > qend)) {
+		copy = qend - seg;
+		memcpy(seg, addr, copy);
+		addr += copy;
+		mr_list_size -= copy;
+		seg = mlx5_get_send_wqe(qp, 0);
+	}
+	memcpy(seg, addr, mr_list_size);
+	seg += mr_list_size;
+}
+
 static __be32 send_ieth(struct ib_send_wr *wr)
 {
 	switch (wr->opcode) {
@@ -3735,6 +3757,8 @@  static int set_reg_wr(struct mlx5_ib_qp *qp,
 {
 	struct mlx5_ib_mr *mr = to_mmr(wr->mr);
 	struct mlx5_ib_pd *pd = to_mpd(qp->ibqp.pd);
+	int mr_list_size = mr->ndescs * mr->desc_size;
+	bool umr_inline = mr_list_size <= MLX5_IB_SQ_UMR_INLINE_THRESHOLD;
 
 	if (unlikely(wr->wr.send_flags & IB_SEND_INLINE)) {
 		mlx5_ib_warn(to_mdev(qp->ibqp.device),
@@ -3742,7 +3766,7 @@  static int set_reg_wr(struct mlx5_ib_qp *qp,
 		return -EINVAL;
 	}
 
-	set_reg_umr_seg(*seg, mr);
+	set_reg_umr_seg(*seg, mr, umr_inline);
 	*seg += sizeof(struct mlx5_wqe_umr_ctrl_seg);
 	*size += sizeof(struct mlx5_wqe_umr_ctrl_seg) / 16;
 	if (unlikely((*seg == qp->sq.qend)))
@@ -3754,9 +3778,14 @@  static int set_reg_wr(struct mlx5_ib_qp *qp,
 	if (unlikely((*seg == qp->sq.qend)))
 		*seg = mlx5_get_send_wqe(qp, 0);
 
-	set_reg_data_seg(*seg, mr, pd);
-	*seg += sizeof(struct mlx5_wqe_data_seg);
-	*size += (sizeof(struct mlx5_wqe_data_seg) / 16);
+	if (umr_inline) {
+		set_reg_umr_inline_seg(*seg, qp, mr, mr_list_size);
+		*size += get_xlt_octo(mr_list_size);
+	} else {
+		set_reg_data_seg(*seg, mr, pd);
+		*seg += sizeof(struct mlx5_wqe_data_seg);
+		*size += (sizeof(struct mlx5_wqe_data_seg) / 16);
+	}
 
 	return 0;
 }