mlx5 + SRP: max_qp_sz mismatch

Message ID 53FF51BE.1080103@acm.org (mailing list archive)
State Rejected

Commit Message

Bart Van Assche Aug. 28, 2014, 3:58 p.m. UTC
On 08/27/14 13:28, Eli Cohen wrote:
> On 08/26/14 18:10, Sagi Grimberg wrote:
>>
>> Since I don't know how the true send queue size can be computed from the
>> device capabilities at the moment, I can suggest a fix to srpt to
>> retry with srp_sq_size/2 (and so on until it succeeds...)
>>
> The device capabilities provide the maximum number of send work
> requests that the device supports but the actual number of work
> requests that can be supported in a specific case depends on other
> characteristics of the work requests. For example, in the case of
> Connect-IB, the actual number depends on the number of s/g entries,
> the transport type, etc. This is in compliance with the IB spec:
> 
> 11.2.1.2 QUERY HCA
> Description:
> Returns the attributes for the specified HCA.
> The maximum values defined in this section are guaranteed
> not-to-exceed values. It is possible for an implementation to allocate
> some HCA resources from the same space. In that case, the maximum
> values returned are not guaranteed for all of those resources
> simultaneously.
> 
> So, a well written application should try smaller values if it fails
> with ENOMEM.
 
Hello Mark,

It would help if you could test the patch below. Sorry, but I don't
have access to a Connect-IB setup myself.

Thanks,

Bart.

Reported-by: Mark Lehrer <lehrer@gmail.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/infiniband/ulp/srpt/ib_srpt.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Mark Lehrer Sept. 2, 2014, 5:36 p.m. UTC | #1
I just got back from a few days in Denver; I'll give it a try ASAP.

We also have a ton of ConnectX-3s and a few ConnectX-2s. I'll give
it a quick try on those too, just for fun. And if anyone ever needs to
test something against one of these (and the test isn't prohibitively
difficult to set up), I would be happy to give it a try.

Thanks,
Mark



Patch

diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index fe09f27..3ffaf4e 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -2091,6 +2091,7 @@  static int srpt_create_ch_ib(struct srpt_rdma_ch *ch)
 	if (!qp_init)
 		goto out;
 
+retry:
 	ch->cq = ib_create_cq(sdev->device, srpt_completion, NULL, ch,
 			      ch->rq_size + srp_sq_size, 0);
 	if (IS_ERR(ch->cq)) {
@@ -2114,6 +2115,13 @@  static int srpt_create_ch_ib(struct srpt_rdma_ch *ch)
 	ch->qp = ib_create_qp(sdev->pd, qp_init);
 	if (IS_ERR(ch->qp)) {
 		ret = PTR_ERR(ch->qp);
+		if (ret == -ENOMEM) {
+			srp_sq_size /= 2;
+			if (srp_sq_size >= MIN_SRPT_SQ_SIZE) {
+				ib_destroy_cq(ch->cq);
+				goto retry;
+			}
+		}
 		printk(KERN_ERR "failed to create_qp ret= %d\n", ret);
 		goto err_destroy_cq;
 	}
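
The retry logic in the patch above (halve the send queue size on -ENOMEM
until creation succeeds or a minimum is reached) can be sketched in plain
userspace C. This is only an illustration, not the srpt code:
fake_create_qp(), create_qp_with_backoff(), the device_limit parameter and
the value of MIN_SQ_SIZE are all hypothetical stand-ins for
ib_create_qp(), the retry loop, the real hardware limit and
MIN_SRPT_SQ_SIZE.

```c
#include <errno.h>

/* Assumed floor for the send queue size; plays the role of
 * MIN_SRPT_SQ_SIZE in the patch. The value here is an assumption. */
#define MIN_SQ_SIZE 16

/* Hypothetical mock of queue pair creation: succeeds only if the
 * requested send queue size fits within what the device can really
 * provide, which may be less than the advertised maximum. */
static int fake_create_qp(int sq_size, int device_limit)
{
	return sq_size > device_limit ? -ENOMEM : 0;
}

/* Try to create a QP, halving the send queue size after each -ENOMEM.
 * Returns the size that succeeded, or a negative errno on failure. */
static int create_qp_with_backoff(int requested, int device_limit)
{
	int sq_size = requested;	/* local copy; the request is not modified */

	for (;;) {
		int ret = fake_create_qp(sq_size, device_limit);

		if (ret == 0)
			return sq_size;
		if (ret != -ENOMEM)
			return ret;	/* other errors are not retried */

		sq_size /= 2;
		if (sq_size < MIN_SQ_SIZE)
			return -ENOMEM;	/* give up below the floor */
	}
}
```

With this mock, create_qp_with_backoff(4096, 1000) tries 4096, 2048 and
1024, then succeeds with 512. Unlike the patch, which halves the global
srp_sq_size module parameter, the sketch shrinks a local copy, so a
failure on one channel would not reduce the size used by later channels;
whether that is preferable for srpt is a design question this sketch does
not settle.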