
[for-next,9/9] RDMA/hns: Fix different dgids mapping to the same dip_idx

Message ID 20240906093444.3571619-10-huangjunxian6@hisilicon.com (mailing list archive)
State Superseded
Series RDMA/hns: Bugfixes and one improvement

Commit Message

Junxian Huang Sept. 6, 2024, 9:34 a.m. UTC
From: Feng Fang <fangfeng4@huawei.com>

The DIP algorithm requires a one-to-one mapping between dgid and dip_idx.
Currently a queue 'spare_idx' is used to store the QPNs of QPs that use
the DIP algorithm, and for a new dgid a QPN is taken from spare_idx as
the dip_idx. This method lacks a mechanism for deduplicating QPNs, which
may result in different dgids sharing the same dip_idx and break the
one-to-one mapping requirement.

This patch replaces spare_idx with two new bitmaps: qpn_bitmap records
QPNs that are not currently used as a dip_idx, and dip_idx_bitmap records
QPNs that are. Besides, introduce a reference count for each dip_idx to
indicate the number of QPs using it. When creating a DIP QP, its QPN is
recorded in qpn_bitmap unless it is already set in dip_idx_bitmap. If the
QP has a new dgid, an unused QPN is taken from qpn_bitmap as the dip_idx
and its bit is moved to dip_idx_bitmap; otherwise the existing dip_idx is
reused and its reference count is incremented. When destroying a DIP QP,
the reference count is decremented; once it reaches 0, the bit is cleared
in dip_idx_bitmap and set in qpn_bitmap again.

Fixes: eb653eda1e91 ("RDMA/hns: Bugfix for incorrect association between dip_idx and dgid")
Fixes: f91696f2f053 ("RDMA/hns: Support congestion control type selection according to the FW")
Signed-off-by: Feng Fang <fangfeng4@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
---
 drivers/infiniband/hw/hns/hns_roce_device.h |  6 +--
 drivers/infiniband/hw/hns/hns_roce_hw_v2.c  | 58 ++++++++++++++++++---
 drivers/infiniband/hw/hns/hns_roce_hw_v2.h  |  1 +
 drivers/infiniband/hw/hns/hns_roce_qp.c     | 16 ++++--
 4 files changed, 67 insertions(+), 14 deletions(-)
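
Condensed, the bookkeeping described above amounts to moving a single bit
between the two maps. A minimal sketch distilled from the patch below
(locking, the dgid list walk and driver plumbing are omitted; the helper
names are illustrative, not part of the patch):

#include <linux/bitmap.h>
#include <linux/errno.h>
#include <linux/types.h>

/* Pick a dip_idx for a new dgid: take any QPN that is not already
 * serving as a dip_idx and move its bit from qpn_bitmap to
 * dip_idx_bitmap.
 */
static int dip_idx_alloc(unsigned long *qpn_bitmap,
			 unsigned long *dip_idx_bitmap,
			 u32 num_qps, u32 *dip_idx)
{
	u32 idx = find_first_bit(qpn_bitmap, num_qps);

	if (idx >= num_qps)
		return -ENOENT;

	clear_bit(idx, qpn_bitmap);
	set_bit(idx, dip_idx_bitmap);
	*dip_idx = idx;
	return 0;
}

/* The last QP referencing this dip_idx is gone: make the QPN
 * available as a dip_idx again.
 */
static void dip_idx_free(unsigned long *qpn_bitmap,
			 unsigned long *dip_idx_bitmap, u32 dip_idx)
{
	clear_bit(dip_idx, dip_idx_bitmap);
	set_bit(dip_idx, qpn_bitmap);
}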

Comments

Leon Romanovsky Sept. 10, 2024, 1:12 p.m. UTC | #1
On Fri, Sep 06, 2024 at 05:34:44PM +0800, Junxian Huang wrote:
> From: Feng Fang <fangfeng4@huawei.com>
> 
> The DIP algorithm requires a one-to-one mapping between dgid and dip_idx.
> Currently a queue 'spare_idx' is used to store the QPNs of QPs that use
> the DIP algorithm, and for a new dgid a QPN is taken from spare_idx as
> the dip_idx. This method lacks a mechanism for deduplicating QPNs, which
> may result in different dgids sharing the same dip_idx and break the
> one-to-one mapping requirement.
> 
> This patch replaces spare_idx with two new bitmaps: qpn_bitmap records
> QPNs that are not currently used as a dip_idx, and dip_idx_bitmap records
> QPNs that are. Besides, introduce a reference count for each dip_idx to
> indicate the number of QPs using it. When creating a DIP QP, its QPN is
> recorded in qpn_bitmap unless it is already set in dip_idx_bitmap. If the
> QP has a new dgid, an unused QPN is taken from qpn_bitmap as the dip_idx
> and its bit is moved to dip_idx_bitmap; otherwise the existing dip_idx is
> reused and its reference count is incremented. When destroying a DIP QP,
> the reference count is decremented; once it reaches 0, the bit is cleared
> in dip_idx_bitmap and set in qpn_bitmap again.
> 
> Fixes: eb653eda1e91 ("RDMA/hns: Bugfix for incorrect association between dip_idx and dgid")
> Fixes: f91696f2f053 ("RDMA/hns: Support congestion control type selection according to the FW")
> Signed-off-by: Feng Fang <fangfeng4@huawei.com>
> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
> ---
>  drivers/infiniband/hw/hns/hns_roce_device.h |  6 +--
>  drivers/infiniband/hw/hns/hns_roce_hw_v2.c  | 58 ++++++++++++++++++---
>  drivers/infiniband/hw/hns/hns_roce_hw_v2.h  |  1 +
>  drivers/infiniband/hw/hns/hns_roce_qp.c     | 16 ++++--
>  4 files changed, 67 insertions(+), 14 deletions(-)

It is a strange implementation; the double bitmap and refcount look like
open-coding of some basic coding patterns. Let's hold off on applying it
for now.

Thanks
Junxian Huang Oct. 17, 2024, 1:21 p.m. UTC | #2
On 2024/9/10 21:12, Leon Romanovsky wrote:
> On Fri, Sep 06, 2024 at 05:34:44PM +0800, Junxian Huang wrote:
>> From: Feng Fang <fangfeng4@huawei.com>
>>
>> The DIP algorithm requires a one-to-one mapping between dgid and dip_idx.
>> Currently a queue 'spare_idx' is used to store the QPNs of QPs that use
>> the DIP algorithm, and for a new dgid a QPN is taken from spare_idx as
>> the dip_idx. This method lacks a mechanism for deduplicating QPNs, which
>> may result in different dgids sharing the same dip_idx and break the
>> one-to-one mapping requirement.
>>
>> This patch replaces spare_idx with two new bitmaps: qpn_bitmap records
>> QPNs that are not currently used as a dip_idx, and dip_idx_bitmap records
>> QPNs that are. Besides, introduce a reference count for each dip_idx to
>> indicate the number of QPs using it. When creating a DIP QP, its QPN is
>> recorded in qpn_bitmap unless it is already set in dip_idx_bitmap. If the
>> QP has a new dgid, an unused QPN is taken from qpn_bitmap as the dip_idx
>> and its bit is moved to dip_idx_bitmap; otherwise the existing dip_idx is
>> reused and its reference count is incremented. When destroying a DIP QP,
>> the reference count is decremented; once it reaches 0, the bit is cleared
>> in dip_idx_bitmap and set in qpn_bitmap again.
>>
>> Fixes: eb653eda1e91 ("RDMA/hns: Bugfix for incorrect association between dip_idx and dgid")
>> Fixes: f91696f2f053 ("RDMA/hns: Support congestion control type selection according to the FW")
>> Signed-off-by: Feng Fang <fangfeng4@huawei.com>
>> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
>> ---
>>  drivers/infiniband/hw/hns/hns_roce_device.h |  6 +--
>>  drivers/infiniband/hw/hns/hns_roce_hw_v2.c  | 58 ++++++++++++++++++---
>>  drivers/infiniband/hw/hns/hns_roce_hw_v2.h  |  1 +
>>  drivers/infiniband/hw/hns/hns_roce_qp.c     | 16 ++++--
>>  4 files changed, 67 insertions(+), 14 deletions(-)
> 
> It is a strange implementation; the double bitmap and refcount look like
> open-coding of some basic coding patterns. Let's hold off on applying it
> for now.
> 

Hi Leon, it's been a while since this patch was sent. Is it okay to apply it now?

Regarding your question about the double bitmaps: we need them because there
are 3 states to track (see the sketch after this list):
1) the context hasn't been created;
2) the context has been created but is not used as a dip_ctx;
3) the context is being used as a dip_ctx.
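
For illustration, the two bitmaps encode those states as below (a sketch
only; the helper is hypothetical and not part of the patch):

#include <linux/bitops.h>
#include <linux/types.h>

/* State of QPN n under the patch's encoding:
 *
 *   qpn_bitmap[n]  dip_idx_bitmap[n]  state
 *   0              0                  1) context not created yet
 *   1              0                  2) created, free to serve as a dip_idx
 *   0              1                  3) currently serving as a dip_idx
 */
static bool qpn_free_for_dip(const unsigned long *qpn_bitmap,
			     const unsigned long *dip_idx_bitmap, u32 qpn)
{
	return test_bit(qpn, qpn_bitmap) && !test_bit(qpn, dip_idx_bitmap);
}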

Junxian

> Thanks
>
Leon Romanovsky Oct. 29, 2024, 12:40 p.m. UTC | #3
On Thu, Oct 17, 2024 at 09:21:45PM +0800, Junxian Huang wrote:
> 
> 
> On 2024/9/10 21:12, Leon Romanovsky wrote:
> > On Fri, Sep 06, 2024 at 05:34:44PM +0800, Junxian Huang wrote:
> >> From: Feng Fang <fangfeng4@huawei.com>
> >>
> >> The DIP algorithm requires a one-to-one mapping between dgid and dip_idx.
> >> Currently a queue 'spare_idx' is used to store the QPNs of QPs that use
> >> the DIP algorithm, and for a new dgid a QPN is taken from spare_idx as
> >> the dip_idx. This method lacks a mechanism for deduplicating QPNs, which
> >> may result in different dgids sharing the same dip_idx and break the
> >> one-to-one mapping requirement.
> >>
> >> This patch replaces spare_idx with two new bitmaps: qpn_bitmap records
> >> QPNs that are not currently used as a dip_idx, and dip_idx_bitmap records
> >> QPNs that are. Besides, introduce a reference count for each dip_idx to
> >> indicate the number of QPs using it. When creating a DIP QP, its QPN is
> >> recorded in qpn_bitmap unless it is already set in dip_idx_bitmap. If the
> >> QP has a new dgid, an unused QPN is taken from qpn_bitmap as the dip_idx
> >> and its bit is moved to dip_idx_bitmap; otherwise the existing dip_idx is
> >> reused and its reference count is incremented. When destroying a DIP QP,
> >> the reference count is decremented; once it reaches 0, the bit is cleared
> >> in dip_idx_bitmap and set in qpn_bitmap again.
> >>
> >> Fixes: eb653eda1e91 ("RDMA/hns: Bugfix for incorrect association between dip_idx and dgid")
> >> Fixes: f91696f2f053 ("RDMA/hns: Support congestion control type selection according to the FW")
> >> Signed-off-by: Feng Fang <fangfeng4@huawei.com>
> >> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
> >> ---
> >>  drivers/infiniband/hw/hns/hns_roce_device.h |  6 +--
> >>  drivers/infiniband/hw/hns/hns_roce_hw_v2.c  | 58 ++++++++++++++++++---
> >>  drivers/infiniband/hw/hns/hns_roce_hw_v2.h  |  1 +
> >>  drivers/infiniband/hw/hns/hns_roce_qp.c     | 16 ++++--
> >>  4 files changed, 67 insertions(+), 14 deletions(-)
> > 
> > It is a strange implementation; the double bitmap and refcount look like
> > open-coding of some basic coding patterns. Let's hold off on applying it
> > for now.
> > 
> 
> Hi Leon, it's been a while since this patch was sent. Is it okay to apply it now?

Not in this implementation. I think that xarray + tag will give you what
you are looking for without the need to open-code it.

Thanks

> 
> Regarding your question about the double bitmaps: we need them because there
> are 3 states to track:
> 1) the context hasn't been created;
> 2) the context has been created but is not used as a dip_ctx;
> 3) the context is being used as a dip_ctx.
> 
> Junxian
> 
> > Thanks
> > 
>
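
For reference, Leon's suggestion maps onto the kernel xarray API, where a
per-entry mark can tag indices that are in use. A minimal sketch of that
pattern, assuming a refcounted dip entry stored at index dip_idx (names,
locking and the dgid lookup are illustrative; this is not the eventual
rework):

#include <linux/xarray.h>
#include <linux/refcount.h>
#include <linux/slab.h>
#include <linux/string.h>

#define DIP_IN_USE XA_MARK_0	/* index currently serves as a dip_idx */

struct dip_entry {
	u8 dgid[16];		/* GID_LEN_V2 bytes in the driver */
	refcount_t qp_cnt;
};

static DEFINE_XARRAY(dip_xa);

/* Reuse the entry at @idx or install a new one for @dgid. */
static int dip_get(u32 idx, const u8 *dgid)
{
	struct dip_entry *e = xa_load(&dip_xa, idx);

	if (e) {
		refcount_inc(&e->qp_cnt);
		return 0;
	}

	e = kzalloc(sizeof(*e), GFP_KERNEL);
	if (!e)
		return -ENOMEM;
	memcpy(e->dgid, dgid, sizeof(e->dgid));
	refcount_set(&e->qp_cnt, 1);

	if (xa_err(xa_store(&dip_xa, idx, e, GFP_KERNEL))) {
		kfree(e);
		return -ENOMEM;
	}
	xa_set_mark(&dip_xa, idx, DIP_IN_USE);
	return 0;
}

/* Drop a reference; erase the entry when the last user is gone. */
static void dip_put(u32 idx)
{
	struct dip_entry *e = xa_load(&dip_xa, idx);

	if (e && refcount_dec_and_test(&e->qp_cnt)) {
		xa_erase(&dip_xa, idx);	/* clears the marks as well */
		kfree(e);
	}
}

With the entry itself carrying the refcount and the mark flagging in-use
indices, lookup, tagging and iteration (e.g. xa_for_each_marked()) live in
one structure, removing the need for the open-coded double bitmap.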

Patch

diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 0b1e21cb6d2d..adc65d383cf1 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -490,9 +490,8 @@  struct hns_roce_bank {
 };
 
 struct hns_roce_idx_table {
-	u32 *spare_idx;
-	u32 head;
-	u32 tail;
+	unsigned long *qpn_bitmap;
+	unsigned long *dip_idx_bitmap;
 };
 
 struct hns_roce_qp_table {
@@ -656,6 +655,7 @@  struct hns_roce_qp {
 	enum hns_roce_cong_type	cong_type;
 	u8			tc_mode;
 	u8			priority;
+	struct hns_roce_dip *dip;
 };
 
 struct hns_roce_ib_iboe {
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
index 085461713fa9..19a4bf80a080 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
@@ -4706,21 +4706,24 @@  static int get_dip_ctx_idx(struct ib_qp *ibqp, const struct ib_qp_attr *attr,
 {
 	const struct ib_global_route *grh = rdma_ah_read_grh(&attr->ah_attr);
 	struct hns_roce_dev *hr_dev = to_hr_dev(ibqp->device);
-	u32 *spare_idx = hr_dev->qp_table.idx_table.spare_idx;
-	u32 *head =  &hr_dev->qp_table.idx_table.head;
-	u32 *tail =  &hr_dev->qp_table.idx_table.tail;
+	unsigned long *dip_idx_bitmap = hr_dev->qp_table.idx_table.dip_idx_bitmap;
+	unsigned long *qpn_bitmap = hr_dev->qp_table.idx_table.qpn_bitmap;
+	struct hns_roce_qp *hr_qp = to_hr_qp(ibqp);
 	struct hns_roce_dip *hr_dip;
 	unsigned long flags;
 	int ret = 0;
+	u32 idx;
 
 	spin_lock_irqsave(&hr_dev->dip_list_lock, flags);
 
-	spare_idx[*tail] = ibqp->qp_num;
-	*tail = (*tail == hr_dev->caps.num_qps - 1) ? 0 : (*tail + 1);
+	if (!test_bit(ibqp->qp_num, dip_idx_bitmap))
+		set_bit(ibqp->qp_num, qpn_bitmap);
 
 	list_for_each_entry(hr_dip, &hr_dev->dip_list, node) {
 		if (!memcmp(grh->dgid.raw, hr_dip->dgid, GID_LEN_V2)) {
 			*dip_idx = hr_dip->dip_idx;
+			hr_dip->qp_cnt++;
+			hr_qp->dip = hr_dip;
 			goto out;
 		}
 	}
@@ -4734,9 +4737,21 @@  static int get_dip_ctx_idx(struct ib_qp *ibqp, const struct ib_qp_attr *attr,
 		goto out;
 	}
 
+	idx = find_first_bit(qpn_bitmap, hr_dev->caps.num_qps);
+	if (idx < hr_dev->caps.num_qps) {
+		*dip_idx = idx;
+		clear_bit(idx, qpn_bitmap);
+		set_bit(idx, dip_idx_bitmap);
+	} else {
+		ret = -ENOENT;
+		kfree(hr_dip);
+		goto out;
+	}
+
 	memcpy(hr_dip->dgid, grh->dgid.raw, sizeof(grh->dgid.raw));
-	hr_dip->dip_idx = *dip_idx = spare_idx[*head];
-	*head = (*head == hr_dev->caps.num_qps - 1) ? 0 : (*head + 1);
+	hr_dip->dip_idx = *dip_idx;
+	hr_dip->qp_cnt++;
+	hr_qp->dip = hr_dip;
 	list_add_tail(&hr_dip->node, &hr_dev->dip_list);
 
 out:
@@ -5597,12 +5612,41 @@  static int hns_roce_v2_destroy_qp_common(struct hns_roce_dev *hr_dev,
 	return ret;
 }
 
+static void put_dip_ctx_idx(struct hns_roce_dev *hr_dev,
+			    struct hns_roce_qp *hr_qp)
+{
+	unsigned long *dip_idx_bitmap = hr_dev->qp_table.idx_table.dip_idx_bitmap;
+	unsigned long *qpn_bitmap = hr_dev->qp_table.idx_table.qpn_bitmap;
+	struct hns_roce_dip *hr_dip = hr_qp->dip;
+	unsigned long flags;
+
+	spin_lock_irqsave(&hr_dev->dip_list_lock, flags);
+
+	if (hr_dip) {
+		hr_dip->qp_cnt--;
+		if (!hr_dip->qp_cnt) {
+			clear_bit(hr_dip->dip_idx, dip_idx_bitmap);
+			set_bit(hr_dip->dip_idx, qpn_bitmap);
+
+			list_del(&hr_dip->node);
+		} else {
+			hr_dip = NULL;
+		}
+	}
+
+	spin_unlock_irqrestore(&hr_dev->dip_list_lock, flags);
+	kfree(hr_dip);
+}
+
 int hns_roce_v2_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
 {
 	struct hns_roce_dev *hr_dev = to_hr_dev(ibqp->device);
 	struct hns_roce_qp *hr_qp = to_hr_qp(ibqp);
 	int ret;
 
+	if (hr_qp->cong_type == CONG_TYPE_DIP)
+		put_dip_ctx_idx(hr_dev, hr_qp);
+
 	ret = hns_roce_v2_destroy_qp_common(hr_dev, hr_qp, udata);
 	if (ret)
 		ibdev_err_ratelimited(&hr_dev->ib_dev,
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.h b/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
index c65f68a14a26..3804882bb5b4 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.h
@@ -1342,6 +1342,7 @@  struct hns_roce_v2_priv {
 struct hns_roce_dip {
 	u8 dgid[GID_LEN_V2];
 	u32 dip_idx;
+	u32 qp_cnt;
 	struct list_head node; /* all dips are on a list */
 };
 
diff --git a/drivers/infiniband/hw/hns/hns_roce_qp.c b/drivers/infiniband/hw/hns/hns_roce_qp.c
index 6b03ba671ff8..bc278b735736 100644
--- a/drivers/infiniband/hw/hns/hns_roce_qp.c
+++ b/drivers/infiniband/hw/hns/hns_roce_qp.c
@@ -1546,11 +1546,18 @@  int hns_roce_init_qp_table(struct hns_roce_dev *hr_dev)
 	unsigned int reserved_from_bot;
 	unsigned int i;
 
-	qp_table->idx_table.spare_idx = kcalloc(hr_dev->caps.num_qps,
-					sizeof(u32), GFP_KERNEL);
-	if (!qp_table->idx_table.spare_idx)
+	qp_table->idx_table.qpn_bitmap = bitmap_zalloc(hr_dev->caps.num_qps,
+						       GFP_KERNEL);
+	if (!qp_table->idx_table.qpn_bitmap)
 		return -ENOMEM;
 
+	qp_table->idx_table.dip_idx_bitmap = bitmap_zalloc(hr_dev->caps.num_qps,
+							   GFP_KERNEL);
+	if (!qp_table->idx_table.dip_idx_bitmap) {
+		bitmap_free(qp_table->idx_table.qpn_bitmap);
+		return -ENOMEM;
+	}
+
 	mutex_init(&qp_table->scc_mutex);
 	mutex_init(&qp_table->bank_mutex);
 	xa_init(&hr_dev->qp_table_xa);
@@ -1580,5 +1587,6 @@  void hns_roce_cleanup_qp_table(struct hns_roce_dev *hr_dev)
 		ida_destroy(&hr_dev->qp_table.bank[i].ida);
 	mutex_destroy(&hr_dev->qp_table.bank_mutex);
 	mutex_destroy(&hr_dev->qp_table.scc_mutex);
-	kfree(hr_dev->qp_table.idx_table.spare_idx);
+	bitmap_free(hr_dev->qp_table.idx_table.qpn_bitmap);
+	bitmap_free(hr_dev->qp_table.idx_table.dip_idx_bitmap);
 }