[4/5] IB/core: cache the CQ completion vector

Message ID	20200317134030.152833-5-maxg@mellanox.com (mailing list archive)
State	Superseded
Headers	show Return-Path: <SRS0=Q14g=5C=vger.kernel.org=linux-rdma-owner@kernel.org> From: Max Gurtovoy <maxg@mellanox.com> To: linux-nvme@lists.infradead.org, sagi@grimberg.me, hch@lst.de, loberman@redhat.com, bvanassche@acm.org, linux-rdma@vger.kernel.org Cc: kbusch@kernel.org, leonro@mellanox.com, jgg@mellanox.com, dledford@redhat.com, idanb@mellanox.com, shlomin@mellanox.com, oren@mellanox.com, vladimirk@mellanox.com, Max Gurtovoy <maxg@mellanox.com> Subject: [PATCH 4/5] IB/core: cache the CQ completion vector Date: Tue, 17 Mar 2020 15:40:29 +0200 Message-Id: <20200317134030.152833-5-maxg@mellanox.com> In-Reply-To: <20200317134030.152833-1-maxg@mellanox.com> References: <20200317134030.152833-1-maxg@mellanox.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk
Series	nvmet-rdma/srpt: SRQ per completion vector \| expand [0/5] nvmet-rdma/srpt: SRQ per completion vector [1/5] IB/core: add a simple SRQ set per PD [3/5] nvmet-rdma: use SRQ per completion vector [4/5] IB/core: cache the CQ completion vector [5/5] RDMA/srpt: use SRQ per completion vector

Max Gurtovoy March 17, 2020, 1:40 p.m. UTC

In some cases, e.g. when using ib_alloc_cq_any, one would like to know
the completion vector that eventually assigned to the CQ. Cache this
value during CQ creation.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
---
 drivers/infiniband/core/cq.c | 1 +
 include/rdma/ib_verbs.h      | 1 +
 2 files changed, 2 insertions(+)

Chuck Lever March 17, 2020, 3:19 p.m. UTC | #1

Hi Max-

> On Mar 17, 2020, at 9:40 AM, Max Gurtovoy <maxg@mellanox.com> wrote:
> 
> In some cases, e.g. when using ib_alloc_cq_any, one would like to know
> the completion vector that eventually assigned to the CQ. Cache this
> value during CQ creation.

I'm confused by the mention of the ib_alloc_cq_any() API here.

Is your design somehow dependent on the way the current ib_alloc_cq_any()
selects comp_vectors? The contract for _any() is that it is an API for
callers that simply do not care about what comp_vector is chosen. There's
no guarantee that the _any() comp_vector allocator will continue to use
round-robin in the future, for instance.

If you want to guarantee that there is an SRQ for each comp_vector and a
comp_vector for each SRQ, stick with a CQ allocation API that enables
explicit selection of the comp_vector value, and cache that value in the
caller, not in the core data structures.

> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
> ---
> drivers/infiniband/core/cq.c | 1 +
> include/rdma/ib_verbs.h      | 1 +
> 2 files changed, 2 insertions(+)
> 
> diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
> index 4f25b24..a7cbf52 100644
> --- a/drivers/infiniband/core/cq.c
> +++ b/drivers/infiniband/core/cq.c
> @@ -217,6 +217,7 @@ struct ib_cq *__ib_alloc_cq_user(struct ib_device *dev, void *private,
> 	cq->device = dev;
> 	cq->cq_context = private;
> 	cq->poll_ctx = poll_ctx;
> +	cq->comp_vector = comp_vector;
> 	atomic_set(&cq->usecnt, 0);
> 
> 	cq->wc = kmalloc_array(IB_POLL_BATCH, sizeof(*cq->wc), GFP_KERNEL);
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index fc8207d..0d61772 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -1558,6 +1558,7 @@ struct ib_cq {
> 	struct ib_device       *device;
> 	struct ib_ucq_object   *uobject;
> 	ib_comp_handler   	comp_handler;
> +	u32			comp_vector;
> 	void                  (*event_handler)(struct ib_event *, void *);
> 	void                   *cq_context;
> 	int               	cqe;
> -- 
> 1.8.3.1
> 

--
Chuck Lever
chucklever@gmail.com

Max Gurtovoy March 17, 2020, 3:41 p.m. UTC | #2

On 3/17/2020 5:19 PM, Chuck Lever wrote:
> Hi Max-
>
>> On Mar 17, 2020, at 9:40 AM, Max Gurtovoy <maxg@mellanox.com> wrote:
>>
>> In some cases, e.g. when using ib_alloc_cq_any, one would like to know
>> the completion vector that eventually assigned to the CQ. Cache this
>> value during CQ creation.
> I'm confused by the mention of the ib_alloc_cq_any() API here.

Can't the user use ib_alloc_cq_any() and still want to know what is the 
final comp vector for his needs ?

>
> Is your design somehow dependent on the way the current ib_alloc_cq_any()
> selects comp_vectors? The contract for _any() is that it is an API for
> callers that simply do not care about what comp_vector is chosen. There's
> no guarantee that the _any() comp_vector allocator will continue to use
> round-robin in the future, for instance.

it's fine. I just want to make sure that I'll spread the SRQ's equally.

>
> If you want to guarantee that there is an SRQ for each comp_vector and a
> comp_vector for each SRQ, stick with a CQ allocation API that enables
> explicit selection of the comp_vector value, and cache that value in the
> caller, not in the core data structures.

I'm Ok with that as well. This is exactly what we do in the nvmf/rdma 
but I wanted to stick also with the SRP target approach.

Bart,

Any objection to remove the call for ib_alloc_cq_any() from ib_srpt and 
use ib_alloc_cq() ?


>
>
>> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
>> ---
>> drivers/infiniband/core/cq.c | 1 +
>> include/rdma/ib_verbs.h      | 1 +
>> 2 files changed, 2 insertions(+)
>>
>> diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
>> index 4f25b24..a7cbf52 100644
>> --- a/drivers/infiniband/core/cq.c
>> +++ b/drivers/infiniband/core/cq.c
>> @@ -217,6 +217,7 @@ struct ib_cq *__ib_alloc_cq_user(struct ib_device *dev, void *private,
>> 	cq->device = dev;
>> 	cq->cq_context = private;
>> 	cq->poll_ctx = poll_ctx;
>> +	cq->comp_vector = comp_vector;
>> 	atomic_set(&cq->usecnt, 0);
>>
>> 	cq->wc = kmalloc_array(IB_POLL_BATCH, sizeof(*cq->wc), GFP_KERNEL);
>> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
>> index fc8207d..0d61772 100644
>> --- a/include/rdma/ib_verbs.h
>> +++ b/include/rdma/ib_verbs.h
>> @@ -1558,6 +1558,7 @@ struct ib_cq {
>> 	struct ib_device       *device;
>> 	struct ib_ucq_object   *uobject;
>> 	ib_comp_handler   	comp_handler;
>> +	u32			comp_vector;
>> 	void                  (*event_handler)(struct ib_event *, void *);
>> 	void                   *cq_context;
>> 	int               	cqe;
>> -- 
>> 1.8.3.1
>>
> --
> Chuck Lever
> chucklever@gmail.com
>
>
>

Chuck Lever March 17, 2020, 8:36 p.m. UTC | #3

> On Mar 17, 2020, at 11:41 AM, Max Gurtovoy <maxg@mellanox.com> wrote:
> 
> 
> On 3/17/2020 5:19 PM, Chuck Lever wrote:
>> Hi Max-
>> 
>>> On Mar 17, 2020, at 9:40 AM, Max Gurtovoy <maxg@mellanox.com> wrote:
>>> 
>>> In some cases, e.g. when using ib_alloc_cq_any, one would like to know
>>> the completion vector that eventually assigned to the CQ. Cache this
>>> value during CQ creation.
>> I'm confused by the mention of the ib_alloc_cq_any() API here.
> 
> Can't the user use ib_alloc_cq_any() and still want to know what is the final comp vector for his needs ?

If your caller cares about the final cv value, then it should not use
the _any API. The point of _any is that the caller really does not care,
the cv value is hidden because it is not consequential. Your design
breaks that assumption/contract.


>> Is your design somehow dependent on the way the current ib_alloc_cq_any()
>> selects comp_vectors? The contract for _any() is that it is an API for
>> callers that simply do not care about what comp_vector is chosen. There's
>> no guarantee that the _any() comp_vector allocator will continue to use
>> round-robin in the future, for instance.
> 
> it's fine. I just want to make sure that I'll spread the SRQ's equally.

The _any algorithm is simplistic, it spreads cvs for the system as a whole.
All devices, all callers. You're going to get some bad degenerate cases
as soon as you have multiple users of the SRQ facility.

So, you really want to have a more specialized comp_vector selector for
the SRQ facility; one that is careful to spread cvs per device, independent
of the global allocator, which is good enough for normal cases.

I think your tests perform well simply because there was no other contender
for comp_vectors on your test system.


>> If you want to guarantee that there is an SRQ for each comp_vector and a
>> comp_vector for each SRQ, stick with a CQ allocation API that enables
>> explicit selection of the comp_vector value, and cache that value in the
>> caller, not in the core data structures.
> 
> I'm Ok with that as well. This is exactly what we do in the nvmf/rdma but I wanted to stick also with the SRP target approach.
> 
> Bart,
> 
> Any objection to remove the call for ib_alloc_cq_any() from ib_srpt and use ib_alloc_cq() ?
> 
> 
>> 
>> 
>>> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
>>> ---
>>> drivers/infiniband/core/cq.c | 1 +
>>> include/rdma/ib_verbs.h      | 1 +
>>> 2 files changed, 2 insertions(+)
>>> 
>>> diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
>>> index 4f25b24..a7cbf52 100644
>>> --- a/drivers/infiniband/core/cq.c
>>> +++ b/drivers/infiniband/core/cq.c
>>> @@ -217,6 +217,7 @@ struct ib_cq *__ib_alloc_cq_user(struct ib_device *dev, void *private,
>>> 	cq->device = dev;
>>> 	cq->cq_context = private;
>>> 	cq->poll_ctx = poll_ctx;
>>> +	cq->comp_vector = comp_vector;
>>> 	atomic_set(&cq->usecnt, 0);
>>> 
>>> 	cq->wc = kmalloc_array(IB_POLL_BATCH, sizeof(*cq->wc), GFP_KERNEL);
>>> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
>>> index fc8207d..0d61772 100644
>>> --- a/include/rdma/ib_verbs.h
>>> +++ b/include/rdma/ib_verbs.h
>>> @@ -1558,6 +1558,7 @@ struct ib_cq {
>>> 	struct ib_device       *device;
>>> 	struct ib_ucq_object   *uobject;
>>> 	ib_comp_handler   	comp_handler;
>>> +	u32			comp_vector;
>>> 	void                  (*event_handler)(struct ib_event *, void *);
>>> 	void                   *cq_context;
>>> 	int               	cqe;
>>> -- 
>>> 1.8.3.1
>>> 
>> --
>> Chuck Lever
>> chucklever@gmail.com
>> 
>> 
>> 

--
Chuck Lever
chucklever@gmail.com

Max Gurtovoy March 17, 2020, 10:18 p.m. UTC | #4

On 3/17/2020 10:36 PM, Chuck Lever wrote:
>
>> On Mar 17, 2020, at 11:41 AM, Max Gurtovoy <maxg@mellanox.com> wrote:
>>
>>
>> On 3/17/2020 5:19 PM, Chuck Lever wrote:
>>> Hi Max-
>>>
>>>> On Mar 17, 2020, at 9:40 AM, Max Gurtovoy <maxg@mellanox.com> wrote:
>>>>
>>>> In some cases, e.g. when using ib_alloc_cq_any, one would like to know
>>>> the completion vector that eventually assigned to the CQ. Cache this
>>>> value during CQ creation.
>>> I'm confused by the mention of the ib_alloc_cq_any() API here.
>> Can't the user use ib_alloc_cq_any() and still want to know what is the final comp vector for his needs ?
> If your caller cares about the final cv value, then it should not use
> the _any API. The point of _any is that the caller really does not care,
> the cv value is hidden because it is not consequential. Your design
> breaks that assumption/contract.

How come it breaks ?

If the ULP want to let the rdma-core layer to allocate the optimal 
vector and rely on it to do so, why is it wrong to know the final vector 
assigned ?

I can remove it and change the SRP target to use ib_alloc_cq but it 
doesn't break the contract.

>
>>> Is your design somehow dependent on the way the current ib_alloc_cq_any()
>>> selects comp_vectors? The contract for _any() is that it is an API for
>>> callers that simply do not care about what comp_vector is chosen. There's
>>> no guarantee that the _any() comp_vector allocator will continue to use
>>> round-robin in the future, for instance.
>> it's fine. I just want to make sure that I'll spread the SRQ's equally.
> The _any algorithm is simplistic, it spreads cvs for the system as a whole.
> All devices, all callers. You're going to get some bad degenerate cases
> as soon as you have multiple users of the SRQ facility.

how come ? This facility is per PD that is created by the ULP.


>
> So, you really want to have a more specialized comp_vector selector for
> the SRQ facility; one that is careful to spread cvs per device, independent
> of the global allocator, which is good enough for normal cases.
>
> I think your tests perform well simply because there was no other contender
> for comp_vectors on your test system.

For the testing result I did I used NVMf target that uses ib_alloc_cq so 
it would be good anyway.

According to your theory ib_alloc_cq_any will not perform well and have 
degenerate cases anyway regardless the SRQ feature that was intended to 
save resources and stay with great performance.

>
>
>>> If you want to guarantee that there is an SRQ for each comp_vector and a
>>> comp_vector for each SRQ, stick with a CQ allocation API that enables
>>> explicit selection of the comp_vector value, and cache that value in the
>>> caller, not in the core data structures.
>> I'm Ok with that as well. This is exactly what we do in the nvmf/rdma but I wanted to stick also with the SRP target approach.
>>
>> Bart,
>>
>> Any objection to remove the call for ib_alloc_cq_any() from ib_srpt and use ib_alloc_cq() ?
>>
>>
>>>
>>>> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
>>>> ---
>>>> drivers/infiniband/core/cq.c | 1 +
>>>> include/rdma/ib_verbs.h      | 1 +
>>>> 2 files changed, 2 insertions(+)
>>>>
>>>> diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
>>>> index 4f25b24..a7cbf52 100644
>>>> --- a/drivers/infiniband/core/cq.c
>>>> +++ b/drivers/infiniband/core/cq.c
>>>> @@ -217,6 +217,7 @@ struct ib_cq *__ib_alloc_cq_user(struct ib_device *dev, void *private,
>>>> 	cq->device = dev;
>>>> 	cq->cq_context = private;
>>>> 	cq->poll_ctx = poll_ctx;
>>>> +	cq->comp_vector = comp_vector;
>>>> 	atomic_set(&cq->usecnt, 0);
>>>>
>>>> 	cq->wc = kmalloc_array(IB_POLL_BATCH, sizeof(*cq->wc), GFP_KERNEL);
>>>> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
>>>> index fc8207d..0d61772 100644
>>>> --- a/include/rdma/ib_verbs.h
>>>> +++ b/include/rdma/ib_verbs.h
>>>> @@ -1558,6 +1558,7 @@ struct ib_cq {
>>>> 	struct ib_device       *device;
>>>> 	struct ib_ucq_object   *uobject;
>>>> 	ib_comp_handler   	comp_handler;
>>>> +	u32			comp_vector;
>>>> 	void                  (*event_handler)(struct ib_event *, void *);
>>>> 	void                   *cq_context;
>>>> 	int               	cqe;
>>>> -- 
>>>> 1.8.3.1
>>>>
>>> --
>>> Chuck Lever
>>> chucklever@gmail.com
>>>
>>>
>>>
> --
> Chuck Lever
> chucklever@gmail.com
>
>
>

Bart Van Assche March 17, 2020, 10:50 p.m. UTC | #5

On 2020-03-17 08:41, Max Gurtovoy wrote:
> On 3/17/2020 5:19 PM, Chuck Lever wrote:
>> If you want to guarantee that there is an SRQ for each comp_vector and a
>> comp_vector for each SRQ, stick with a CQ allocation API that enables
>> explicit selection of the comp_vector value, and cache that value in the
>> caller, not in the core data structures.
> 
> I'm Ok with that as well. This is exactly what we do in the nvmf/rdma
> but I wanted to stick also with the SRP target approach.
> 
> Any objection to remove the call for ib_alloc_cq_any() from ib_srpt and
> use ib_alloc_cq() ?
Hi Max,

Wasn't that call introduced by Chuck (see also commit 20cf4e026730
("rdma: Enable ib_alloc_cq to spread work over a device's comp_vectors")
# v5.4)? Anyway, I'm fine with the proposed change.

Thanks,

Bart.

Max Gurtovoy March 17, 2020, 11:26 p.m. UTC | #6

On 3/18/2020 12:50 AM, Bart Van Assche wrote:
> On 2020-03-17 08:41, Max Gurtovoy wrote:
>> On 3/17/2020 5:19 PM, Chuck Lever wrote:
>>> If you want to guarantee that there is an SRQ for each comp_vector and a
>>> comp_vector for each SRQ, stick with a CQ allocation API that enables
>>> explicit selection of the comp_vector value, and cache that value in the
>>> caller, not in the core data structures.
>> I'm Ok with that as well. This is exactly what we do in the nvmf/rdma
>> but I wanted to stick also with the SRP target approach.
>>
>> Any objection to remove the call for ib_alloc_cq_any() from ib_srpt and
>> use ib_alloc_cq() ?
> Hi Max,
>
> Wasn't that call introduced by Chuck (see also commit 20cf4e026730
> ("rdma: Enable ib_alloc_cq to spread work over a device's comp_vectors")
> # v5.4)? Anyway, I'm fine with the proposed change.

Yes this was introduced by Chuck, but no contract is broken :)

I'll update srpt in V2.


> Thanks,
>
> Bart.

[4/5] IB/core: cache the CQ completion vector

Commit Message

Comments

Patch