Message ID | 20200317134030.152833-5-maxg@mellanox.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | nvmet-rdma/srpt: SRQ per completion vector | expand |
Hi Max- > On Mar 17, 2020, at 9:40 AM, Max Gurtovoy <maxg@mellanox.com> wrote: > > In some cases, e.g. when using ib_alloc_cq_any, one would like to know > the completion vector that eventually assigned to the CQ. Cache this > value during CQ creation. I'm confused by the mention of the ib_alloc_cq_any() API here. Is your design somehow dependent on the way the current ib_alloc_cq_any() selects comp_vectors? The contract for _any() is that it is an API for callers that simply do not care about what comp_vector is chosen. There's no guarantee that the _any() comp_vector allocator will continue to use round-robin in the future, for instance. If you want to guarantee that there is an SRQ for each comp_vector and a comp_vector for each SRQ, stick with a CQ allocation API that enables explicit selection of the comp_vector value, and cache that value in the caller, not in the core data structures. > Signed-off-by: Max Gurtovoy <maxg@mellanox.com> > --- > drivers/infiniband/core/cq.c | 1 + > include/rdma/ib_verbs.h | 1 + > 2 files changed, 2 insertions(+) > > diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c > index 4f25b24..a7cbf52 100644 > --- a/drivers/infiniband/core/cq.c > +++ b/drivers/infiniband/core/cq.c > @@ -217,6 +217,7 @@ struct ib_cq *__ib_alloc_cq_user(struct ib_device *dev, void *private, > cq->device = dev; > cq->cq_context = private; > cq->poll_ctx = poll_ctx; > + cq->comp_vector = comp_vector; > atomic_set(&cq->usecnt, 0); > > cq->wc = kmalloc_array(IB_POLL_BATCH, sizeof(*cq->wc), GFP_KERNEL); > diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h > index fc8207d..0d61772 100644 > --- a/include/rdma/ib_verbs.h > +++ b/include/rdma/ib_verbs.h > @@ -1558,6 +1558,7 @@ struct ib_cq { > struct ib_device *device; > struct ib_ucq_object *uobject; > ib_comp_handler comp_handler; > + u32 comp_vector; > void (*event_handler)(struct ib_event *, void *); > void *cq_context; > int cqe; > -- > 1.8.3.1 > -- Chuck Lever chucklever@gmail.com
On 3/17/2020 5:19 PM, Chuck Lever wrote: > Hi Max- > >> On Mar 17, 2020, at 9:40 AM, Max Gurtovoy <maxg@mellanox.com> wrote: >> >> In some cases, e.g. when using ib_alloc_cq_any, one would like to know >> the completion vector that eventually assigned to the CQ. Cache this >> value during CQ creation. > I'm confused by the mention of the ib_alloc_cq_any() API here. Can't the user use ib_alloc_cq_any() and still want to know what is the final comp vector for his needs ? > > Is your design somehow dependent on the way the current ib_alloc_cq_any() > selects comp_vectors? The contract for _any() is that it is an API for > callers that simply do not care about what comp_vector is chosen. There's > no guarantee that the _any() comp_vector allocator will continue to use > round-robin in the future, for instance. it's fine. I just want to make sure that I'll spread the SRQ's equally. > > If you want to guarantee that there is an SRQ for each comp_vector and a > comp_vector for each SRQ, stick with a CQ allocation API that enables > explicit selection of the comp_vector value, and cache that value in the > caller, not in the core data structures. I'm Ok with that as well. This is exactly what we do in the nvmf/rdma but I wanted to stick also with the SRP target approach. Bart, Any objection to remove the call for ib_alloc_cq_any() from ib_srpt and use ib_alloc_cq() ? > > >> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> >> --- >> drivers/infiniband/core/cq.c | 1 + >> include/rdma/ib_verbs.h | 1 + >> 2 files changed, 2 insertions(+) >> >> diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c >> index 4f25b24..a7cbf52 100644 >> --- a/drivers/infiniband/core/cq.c >> +++ b/drivers/infiniband/core/cq.c >> @@ -217,6 +217,7 @@ struct ib_cq *__ib_alloc_cq_user(struct ib_device *dev, void *private, >> cq->device = dev; >> cq->cq_context = private; >> cq->poll_ctx = poll_ctx; >> + cq->comp_vector = comp_vector; >> atomic_set(&cq->usecnt, 0); >> >> cq->wc = kmalloc_array(IB_POLL_BATCH, sizeof(*cq->wc), GFP_KERNEL); >> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h >> index fc8207d..0d61772 100644 >> --- a/include/rdma/ib_verbs.h >> +++ b/include/rdma/ib_verbs.h >> @@ -1558,6 +1558,7 @@ struct ib_cq { >> struct ib_device *device; >> struct ib_ucq_object *uobject; >> ib_comp_handler comp_handler; >> + u32 comp_vector; >> void (*event_handler)(struct ib_event *, void *); >> void *cq_context; >> int cqe; >> -- >> 1.8.3.1 >> > -- > Chuck Lever > chucklever@gmail.com > > >
> On Mar 17, 2020, at 11:41 AM, Max Gurtovoy <maxg@mellanox.com> wrote: > > > On 3/17/2020 5:19 PM, Chuck Lever wrote: >> Hi Max- >> >>> On Mar 17, 2020, at 9:40 AM, Max Gurtovoy <maxg@mellanox.com> wrote: >>> >>> In some cases, e.g. when using ib_alloc_cq_any, one would like to know >>> the completion vector that eventually assigned to the CQ. Cache this >>> value during CQ creation. >> I'm confused by the mention of the ib_alloc_cq_any() API here. > > Can't the user use ib_alloc_cq_any() and still want to know what is the final comp vector for his needs ? If your caller cares about the final cv value, then it should not use the _any API. The point of _any is that the caller really does not care, the cv value is hidden because it is not consequential. Your design breaks that assumption/contract. >> Is your design somehow dependent on the way the current ib_alloc_cq_any() >> selects comp_vectors? The contract for _any() is that it is an API for >> callers that simply do not care about what comp_vector is chosen. There's >> no guarantee that the _any() comp_vector allocator will continue to use >> round-robin in the future, for instance. > > it's fine. I just want to make sure that I'll spread the SRQ's equally. The _any algorithm is simplistic, it spreads cvs for the system as a whole. All devices, all callers. You're going to get some bad degenerate cases as soon as you have multiple users of the SRQ facility. So, you really want to have a more specialized comp_vector selector for the SRQ facility; one that is careful to spread cvs per device, independent of the global allocator, which is good enough for normal cases. I think your tests perform well simply because there was no other contender for comp_vectors on your test system. >> If you want to guarantee that there is an SRQ for each comp_vector and a >> comp_vector for each SRQ, stick with a CQ allocation API that enables >> explicit selection of the comp_vector value, and cache that value in the >> caller, not in the core data structures. > > I'm Ok with that as well. This is exactly what we do in the nvmf/rdma but I wanted to stick also with the SRP target approach. > > Bart, > > Any objection to remove the call for ib_alloc_cq_any() from ib_srpt and use ib_alloc_cq() ? > > >> >> >>> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> >>> --- >>> drivers/infiniband/core/cq.c | 1 + >>> include/rdma/ib_verbs.h | 1 + >>> 2 files changed, 2 insertions(+) >>> >>> diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c >>> index 4f25b24..a7cbf52 100644 >>> --- a/drivers/infiniband/core/cq.c >>> +++ b/drivers/infiniband/core/cq.c >>> @@ -217,6 +217,7 @@ struct ib_cq *__ib_alloc_cq_user(struct ib_device *dev, void *private, >>> cq->device = dev; >>> cq->cq_context = private; >>> cq->poll_ctx = poll_ctx; >>> + cq->comp_vector = comp_vector; >>> atomic_set(&cq->usecnt, 0); >>> >>> cq->wc = kmalloc_array(IB_POLL_BATCH, sizeof(*cq->wc), GFP_KERNEL); >>> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h >>> index fc8207d..0d61772 100644 >>> --- a/include/rdma/ib_verbs.h >>> +++ b/include/rdma/ib_verbs.h >>> @@ -1558,6 +1558,7 @@ struct ib_cq { >>> struct ib_device *device; >>> struct ib_ucq_object *uobject; >>> ib_comp_handler comp_handler; >>> + u32 comp_vector; >>> void (*event_handler)(struct ib_event *, void *); >>> void *cq_context; >>> int cqe; >>> -- >>> 1.8.3.1 >>> >> -- >> Chuck Lever >> chucklever@gmail.com >> >> >> -- Chuck Lever chucklever@gmail.com
On 3/17/2020 10:36 PM, Chuck Lever wrote: > >> On Mar 17, 2020, at 11:41 AM, Max Gurtovoy <maxg@mellanox.com> wrote: >> >> >> On 3/17/2020 5:19 PM, Chuck Lever wrote: >>> Hi Max- >>> >>>> On Mar 17, 2020, at 9:40 AM, Max Gurtovoy <maxg@mellanox.com> wrote: >>>> >>>> In some cases, e.g. when using ib_alloc_cq_any, one would like to know >>>> the completion vector that eventually assigned to the CQ. Cache this >>>> value during CQ creation. >>> I'm confused by the mention of the ib_alloc_cq_any() API here. >> Can't the user use ib_alloc_cq_any() and still want to know what is the final comp vector for his needs ? > If your caller cares about the final cv value, then it should not use > the _any API. The point of _any is that the caller really does not care, > the cv value is hidden because it is not consequential. Your design > breaks that assumption/contract. How come it breaks ? If the ULP want to let the rdma-core layer to allocate the optimal vector and rely on it to do so, why is it wrong to know the final vector assigned ? I can remove it and change the SRP target to use ib_alloc_cq but it doesn't break the contract. > >>> Is your design somehow dependent on the way the current ib_alloc_cq_any() >>> selects comp_vectors? The contract for _any() is that it is an API for >>> callers that simply do not care about what comp_vector is chosen. There's >>> no guarantee that the _any() comp_vector allocator will continue to use >>> round-robin in the future, for instance. >> it's fine. I just want to make sure that I'll spread the SRQ's equally. > The _any algorithm is simplistic, it spreads cvs for the system as a whole. > All devices, all callers. You're going to get some bad degenerate cases > as soon as you have multiple users of the SRQ facility. how come ? This facility is per PD that is created by the ULP. > > So, you really want to have a more specialized comp_vector selector for > the SRQ facility; one that is careful to spread cvs per device, independent > of the global allocator, which is good enough for normal cases. > > I think your tests perform well simply because there was no other contender > for comp_vectors on your test system. For the testing result I did I used NVMf target that uses ib_alloc_cq so it would be good anyway. According to your theory ib_alloc_cq_any will not perform well and have degenerate cases anyway regardless the SRQ feature that was intended to save resources and stay with great performance. > > >>> If you want to guarantee that there is an SRQ for each comp_vector and a >>> comp_vector for each SRQ, stick with a CQ allocation API that enables >>> explicit selection of the comp_vector value, and cache that value in the >>> caller, not in the core data structures. >> I'm Ok with that as well. This is exactly what we do in the nvmf/rdma but I wanted to stick also with the SRP target approach. >> >> Bart, >> >> Any objection to remove the call for ib_alloc_cq_any() from ib_srpt and use ib_alloc_cq() ? >> >> >>> >>>> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> >>>> --- >>>> drivers/infiniband/core/cq.c | 1 + >>>> include/rdma/ib_verbs.h | 1 + >>>> 2 files changed, 2 insertions(+) >>>> >>>> diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c >>>> index 4f25b24..a7cbf52 100644 >>>> --- a/drivers/infiniband/core/cq.c >>>> +++ b/drivers/infiniband/core/cq.c >>>> @@ -217,6 +217,7 @@ struct ib_cq *__ib_alloc_cq_user(struct ib_device *dev, void *private, >>>> cq->device = dev; >>>> cq->cq_context = private; >>>> cq->poll_ctx = poll_ctx; >>>> + cq->comp_vector = comp_vector; >>>> atomic_set(&cq->usecnt, 0); >>>> >>>> cq->wc = kmalloc_array(IB_POLL_BATCH, sizeof(*cq->wc), GFP_KERNEL); >>>> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h >>>> index fc8207d..0d61772 100644 >>>> --- a/include/rdma/ib_verbs.h >>>> +++ b/include/rdma/ib_verbs.h >>>> @@ -1558,6 +1558,7 @@ struct ib_cq { >>>> struct ib_device *device; >>>> struct ib_ucq_object *uobject; >>>> ib_comp_handler comp_handler; >>>> + u32 comp_vector; >>>> void (*event_handler)(struct ib_event *, void *); >>>> void *cq_context; >>>> int cqe; >>>> -- >>>> 1.8.3.1 >>>> >>> -- >>> Chuck Lever >>> chucklever@gmail.com >>> >>> >>> > -- > Chuck Lever > chucklever@gmail.com > > >
On 2020-03-17 08:41, Max Gurtovoy wrote: > On 3/17/2020 5:19 PM, Chuck Lever wrote: >> If you want to guarantee that there is an SRQ for each comp_vector and a >> comp_vector for each SRQ, stick with a CQ allocation API that enables >> explicit selection of the comp_vector value, and cache that value in the >> caller, not in the core data structures. > > I'm Ok with that as well. This is exactly what we do in the nvmf/rdma > but I wanted to stick also with the SRP target approach. > > Any objection to remove the call for ib_alloc_cq_any() from ib_srpt and > use ib_alloc_cq() ? Hi Max, Wasn't that call introduced by Chuck (see also commit 20cf4e026730 ("rdma: Enable ib_alloc_cq to spread work over a device's comp_vectors") # v5.4)? Anyway, I'm fine with the proposed change. Thanks, Bart.
On 3/18/2020 12:50 AM, Bart Van Assche wrote: > On 2020-03-17 08:41, Max Gurtovoy wrote: >> On 3/17/2020 5:19 PM, Chuck Lever wrote: >>> If you want to guarantee that there is an SRQ for each comp_vector and a >>> comp_vector for each SRQ, stick with a CQ allocation API that enables >>> explicit selection of the comp_vector value, and cache that value in the >>> caller, not in the core data structures. >> I'm Ok with that as well. This is exactly what we do in the nvmf/rdma >> but I wanted to stick also with the SRP target approach. >> >> Any objection to remove the call for ib_alloc_cq_any() from ib_srpt and >> use ib_alloc_cq() ? > Hi Max, > > Wasn't that call introduced by Chuck (see also commit 20cf4e026730 > ("rdma: Enable ib_alloc_cq to spread work over a device's comp_vectors") > # v5.4)? Anyway, I'm fine with the proposed change. Yes this was introduced by Chuck, but no contract is broken :) I'll update srpt in V2. > Thanks, > > Bart.
diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c index 4f25b24..a7cbf52 100644 --- a/drivers/infiniband/core/cq.c +++ b/drivers/infiniband/core/cq.c @@ -217,6 +217,7 @@ struct ib_cq *__ib_alloc_cq_user(struct ib_device *dev, void *private, cq->device = dev; cq->cq_context = private; cq->poll_ctx = poll_ctx; + cq->comp_vector = comp_vector; atomic_set(&cq->usecnt, 0); cq->wc = kmalloc_array(IB_POLL_BATCH, sizeof(*cq->wc), GFP_KERNEL); diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index fc8207d..0d61772 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1558,6 +1558,7 @@ struct ib_cq { struct ib_device *device; struct ib_ucq_object *uobject; ib_comp_handler comp_handler; + u32 comp_vector; void (*event_handler)(struct ib_event *, void *); void *cq_context; int cqe;
In some cases, e.g. when using ib_alloc_cq_any, one would like to know the completion vector that eventually assigned to the CQ. Cache this value during CQ creation. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> --- drivers/infiniband/core/cq.c | 1 + include/rdma/ib_verbs.h | 1 + 2 files changed, 2 insertions(+)