[rfc,3/3] nvmet-rdma: assign cq completion vector based on the port allowed cpus

Message ID 1499007694-7231-4-git-send-email-sagi@grimberg.me (mailing list archive)
State RFC

Commit Message

Sagi Grimberg July 2, 2017, 3:01 p.m. UTC
We first take a cpu assignment from the port's configured cpulist
(spread uniformly across it) using the rdma core API.

If the device does not expose a vector affinity mask, or we
couldn't find a match, we fall back to the old behavior, as we
don't have sufficient information to do the "correct" vector
assignment.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/target/rdma.c | 40 ++++++++++++++++++++++++++++------------
 1 file changed, 28 insertions(+), 12 deletions(-)

Comments

Christoph Hellwig July 13, 2017, 3:50 p.m. UTC | #1
We really shouldn't be doing any of this in NVMe I think.  We'll need
to go back to the cq pool API first.  The last version I had was here:

	http://git.infradead.org/users/hch/rdma.git/shortlog/refs/heads/rdma-cq

and then do the affinity in common code.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sagi Grimberg July 13, 2017, 4:37 p.m. UTC | #2
> We really shouldn't be doing any of this in NVMe I think.  We'll need
> to go back to the cq pool API first.  The last version I had was here:
> 
> 	http://git.infradead.org/users/hch/rdma.git/shortlog/refs/heads/rdma-cq

Not against the idea, though the ULP will need to pass a desired cpu
affinity.
Christoph Hellwig July 13, 2017, 5:16 p.m. UTC | #3
On Thu, Jul 13, 2017 at 07:37:13PM +0300, Sagi Grimberg wrote:
>
>> We really shouldn't be doing any of this in NVMe I think.  We'll need
>> to go back to the cq pool API first.  The last version I had was here:
>>
>> 	http://git.infradead.org/users/hch/rdma.git/shortlog/refs/heads/rdma-cq
>
> Not against the idea, though the ULP will need to pass a desired cpu
> affinity.

Yes.
Chuck Lever July 13, 2017, 5:19 p.m. UTC | #4
> On Jul 13, 2017, at 11:50 AM, Christoph Hellwig <hch@lst.de> wrote:
> 
> We really shouldn't be doing any of this in NVMe I think.  We'll need
> to go back to the cq pool API first.  The last version I had was here:
> 
> 	http://git.infradead.org/users/hch/rdma.git/shortlog/refs/heads/rdma-cq
> 
> and then do the affinity in common code.

This seems to address the problem I mentioned to you with properly
estimating send CQ size when using the rdma_rw API. If these are
going to be merged soon, I can drop the new API I proposed here:

http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=commit;h=2156cb956101da854f64918066710ff4e7affc5b


--
Chuck Lever



Christoph Hellwig July 13, 2017, 5:24 p.m. UTC | #5
On Thu, Jul 13, 2017 at 01:19:00PM -0400, Chuck Lever wrote:
> 
> > On Jul 13, 2017, at 11:50 AM, Christoph Hellwig <hch@lst.de> wrote:
> > 
> > We really shouldn't be doing any of this in NVMe I think.  We'll need
> > to go back to the cq pool API first.  The last version I had was here:
> > 
> > 	http://git.infradead.org/users/hch/rdma.git/shortlog/refs/heads/rdma-cq
> > 
> > and then do the affinity in common code.
> 
> This seems to address the problem I mentioned to you with properly
> estimating send CQ size when using the rdma_rw API. If these are
> going to be merged soon, I can drop the new API I proposed here:
> 
> http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=commit;h=2156cb956101da854f64918066710ff4e7affc5b

I'd need to find time to get back to it, and I have a few big
chunks on my todo list.  Any chance you (or someone else interested)
could take the series over?
Chuck Lever July 13, 2017, 5:31 p.m. UTC | #6
> On Jul 13, 2017, at 1:24 PM, Christoph Hellwig <hch@lst.de> wrote:
> 
> On Thu, Jul 13, 2017 at 01:19:00PM -0400, Chuck Lever wrote:
>> 
>>> On Jul 13, 2017, at 11:50 AM, Christoph Hellwig <hch@lst.de> wrote:
>>> 
>>> We really shouldn't be doing any of this in NVMe I think.  We'll need
>>> to go back to the cq pool API first.  The last version I had was here:
>>> 
>>> 	http://git.infradead.org/users/hch/rdma.git/shortlog/refs/heads/rdma-cq
>>> 
>>> and then do the affinity in common code.
>> 
>> This seems to address the problem I mentioned to you with properly
>> estimating send CQ size when using the rdma_rw API. If these are
>> going to be merged soon, I can drop the new API I proposed here:
>> 
>> http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=commit;h=2156cb956101da854f64918066710ff4e7affc5b
> 
> I'd need to find time to get back to it, and I have a few big
> chunks on my todo list.  Any chance you (or someone else interested)
> could take the series over?

Seems like it's right in Sagi's ballpark, and might be a pre-req
for his affinity work. If he's not interested, I can take it.


--
Chuck Lever



Patch

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 56a4cba690b5..a1725d3e174a 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -889,27 +889,43 @@  nvmet_rdma_find_get_device(struct rdma_cm_id *cm_id)
 	return NULL;
 }
 
-static int nvmet_rdma_create_queue_ib(struct nvmet_rdma_queue *queue)
+static int nvmet_rdma_assign_vector(struct nvmet_rdma_queue *queue)
 {
-	struct ib_qp_init_attr qp_attr;
-	struct nvmet_rdma_device *ndev = queue->dev;
-	int comp_vector, nr_cqe, ret, i;
+	struct ib_device *dev = queue->dev->device;
+	struct nvmet_port *port = queue->port;
+	int vec, cpu;
 
 	/*
-	 * Spread the io queues across completion vectors,
-	 * but still keep all admin queues on vector 0.
+	 * Spread the io queues across port cpus,
+	 * but still keep all admin queues on cpu 0.
 	 */
-	comp_vector = !queue->host_qid ? 0 :
-		queue->idx % ndev->device->num_comp_vectors;
+	cpu = !queue->host_qid ? 0 : port->cpus[queue->idx % port->nr_cpus];
+
+	if (ib_find_cpu_vector(dev, cpu, &vec))
+		return vec;
+
+	pr_debug("device %s could not provide vector to match cpu %d\n",
+			dev->name, cpu);
+	/*
+	 * No corresponding vector affinity found, fallback to
+	 * the old behavior where we spread vectors all over...
+	 */
+	return !queue->host_qid ? 0 : queue->idx % dev->num_comp_vectors;
+}
+
+static int nvmet_rdma_create_queue_ib(struct nvmet_rdma_queue *queue)
+{
+	struct ib_qp_init_attr qp_attr;
+	struct nvmet_rdma_device *ndev = queue->dev;
+	int nr_cqe, ret, i;
 
 	/*
 	 * Reserve CQ slots for RECV + RDMA_READ/RDMA_WRITE + RDMA_SEND.
 	 */
 	nr_cqe = queue->recv_queue_size + 2 * queue->send_queue_size;
 
-	queue->cq = ib_alloc_cq(ndev->device, queue,
-			nr_cqe + 1, comp_vector,
-			IB_POLL_WORKQUEUE);
+	queue->cq = ib_alloc_cq(ndev->device, queue, nr_cqe + 1,
+			nvmet_rdma_assign_vector(queue), IB_POLL_WORKQUEUE);
 	if (IS_ERR(queue->cq)) {
 		ret = PTR_ERR(queue->cq);
 		pr_err("failed to create CQ cqe= %d ret= %d\n",
@@ -1080,6 +1096,7 @@  nvmet_rdma_alloc_queue(struct nvmet_rdma_device *ndev,
 	INIT_WORK(&queue->release_work, nvmet_rdma_release_queue_work);
 	queue->dev = ndev;
 	queue->cm_id = cm_id;
+	queue->port = cm_id->context;
 
 	spin_lock_init(&queue->state_lock);
 	queue->state = NVMET_RDMA_Q_CONNECTING;
@@ -1198,7 +1215,6 @@  static int nvmet_rdma_queue_connect(struct rdma_cm_id *cm_id,
 		ret = -ENOMEM;
 		goto put_device;
 	}
-	queue->port = cm_id->context;
 
 	if (queue->host_qid == 0) {
 		/* Let inflight controller teardown complete */