diff mbox series

[for-rc] RDMA/rxe: Fix rnr retry behavior

Message ID 20220512194901.76696-1-rpearsonhpe@gmail.com (mailing list archive)
State Superseded
Headers show
Series [for-rc] RDMA/rxe: Fix rnr retry behavior | expand

Commit Message

Bob Pearson May 12, 2022, 7:49 p.m. UTC
Currently the completer tasklet when it sets the retransmit timer or the
nak timer sets the same flag (qp->req.need_retry) so that if either
timer fires it will attempt to perform a retry flow on the send queue.
This has the effect of responding to an RNR NAK at the first retransmit
timer event which does not allow for the requested rnr timeout.

This patch adds a new flag (qp->req.need_rnr_timeout) which, if set,
prevents a retry flow until the rnr nak timer fires.

This patch fixes rnr retry errors which can be observed by running the
pyverbs test suite 50-100X. With this patch applied they do not occur.

Link: https://lore.kernel.org/linux-rdma/a8287823-1408-4273-bc22-99a0678db640@gmail.com/
Fixes: 8700e3e7c485 ("Soft RoCE (RXE) - The software RoCE driver")
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_comp.c  | 4 +---
 drivers/infiniband/sw/rxe/rxe_qp.c    | 1 +
 drivers/infiniband/sw/rxe/rxe_req.c   | 6 ++++--
 drivers/infiniband/sw/rxe/rxe_verbs.h | 1 +
 4 files changed, 7 insertions(+), 5 deletions(-)


base-commit: c5eb0a61238dd6faf37f58c9ce61c9980aaffd7a

Comments

Zhu Yanjun May 13, 2022, 2:40 a.m. UTC | #1
在 2022/5/13 3:49, Bob Pearson 写道:
> Currently the completer tasklet when it sets the retransmit timer or the
> nak timer sets the same flag (qp->req.need_retry) so that if either
> timer fires it will attempt to perform a retry flow on the send queue.
> This has the effect of responding to an RNR NAK at the first retransmit
> timer event which does not allow for the requested rnr timeout.
> 
> This patch adds a new flag (qp->req.need_rnr_timeout) which, if set,
> prevents a retry flow until the rnr nak timer fires.
> 
> This patch fixes rnr retry errors which can be observed by running the
> pyverbs test suite 50-100X. With this patch applied they do not occur.

Do you mean that running run_tests.py for 50-100times can reproduce this 
bug? I want to reproduce this problem.

Thanks a lot.
Zhu Yanjun

> 
> Link: https://lore.kernel.org/linux-rdma/a8287823-1408-4273-bc22-99a0678db640@gmail.com/
> Fixes: 8700e3e7c485 ("Soft RoCE (RXE) - The software RoCE driver")
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> ---
>   drivers/infiniband/sw/rxe/rxe_comp.c  | 4 +---
>   drivers/infiniband/sw/rxe/rxe_qp.c    | 1 +
>   drivers/infiniband/sw/rxe/rxe_req.c   | 6 ++++--
>   drivers/infiniband/sw/rxe/rxe_verbs.h | 1 +
>   4 files changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
> index 138b3e7d3a5f..bc668cb211b1 100644
> --- a/drivers/infiniband/sw/rxe/rxe_comp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
> @@ -733,9 +733,7 @@ int rxe_completer(void *arg)
>   				if (qp->comp.rnr_retry != 7)
>   					qp->comp.rnr_retry--;
>   
> -				qp->req.need_retry = 1;
> -				pr_debug("qp#%d set rnr nak timer\n",
> -					 qp_num(qp));
> +				qp->req.need_rnr_timeout = 1;
>   				mod_timer(&qp->rnr_nak_timer,
>   					  jiffies + rnrnak_jiffies(aeth_syn(pkt)
>   						& ~AETH_TYPE_MASK));
> diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
> index 62acf890af6c..1c962468714e 100644
> --- a/drivers/infiniband/sw/rxe/rxe_qp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
> @@ -513,6 +513,7 @@ static void rxe_qp_reset(struct rxe_qp *qp)
>   	atomic_set(&qp->ssn, 0);
>   	qp->req.opcode = -1;
>   	qp->req.need_retry = 0;
> +	qp->req.need_rnr_timeout = 0;
>   	qp->req.noack_pkts = 0;
>   	qp->resp.msn = 0;
>   	qp->resp.opcode = -1;
> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
> index ae5fbc79dd5c..770ae4279f73 100644
> --- a/drivers/infiniband/sw/rxe/rxe_req.c
> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
> @@ -103,7 +103,8 @@ void rnr_nak_timer(struct timer_list *t)
>   {
>   	struct rxe_qp *qp = from_timer(qp, t, rnr_nak_timer);
>   
> -	pr_debug("qp#%d rnr nak timer fired\n", qp_num(qp));
> +	qp->req.need_retry = 1;
> +	qp->req.need_rnr_timeout = 0;
>   	rxe_run_task(&qp->req.task, 1);
>   }
>   
> @@ -624,10 +625,11 @@ int rxe_requester(void *arg)
>   		qp->req.need_rd_atomic = 0;
>   		qp->req.wait_psn = 0;
>   		qp->req.need_retry = 0;
> +		qp->req.need_rnr_timeout = 0;
>   		goto exit;
>   	}
>   
> -	if (unlikely(qp->req.need_retry)) {
> +	if (unlikely(qp->req.need_retry && !qp->req.need_rnr_timeout)) {
>   		req_retry(qp);
>   		qp->req.need_retry = 0;
>   	}
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
> index e7eff1ca75e9..ab3186478c3f 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
> @@ -123,6 +123,7 @@ struct rxe_req_info {
>   	int			need_rd_atomic;
>   	int			wait_psn;
>   	int			need_retry;
> +	int			need_rnr_timeout;
>   	int			noack_pkts;
>   	struct rxe_task		task;
>   };
> 
> base-commit: c5eb0a61238dd6faf37f58c9ce61c9980aaffd7a
Tom Talpey May 13, 2022, 1:04 p.m. UTC | #2
On 5/12/2022 3:49 PM, Bob Pearson wrote:
> Currently the completer tasklet when it sets the retransmit timer or the
> nak timer sets the same flag (qp->req.need_retry) so that if either
> timer fires it will attempt to perform a retry flow on the send queue.
> This has the effect of responding to an RNR NAK at the first retransmit
> timer event which does not allow for the requested rnr timeout.
> 
> This patch adds a new flag (qp->req.need_rnr_timeout) which, if set,
> prevents a retry flow until the rnr nak timer fires.

The new name is a little confusing, nobody "needs" an RNR timeout. :)
But it seems really odd (and maybe fragile) to have two flags. To me,
setting need_retry to "-1", or anything but 0 or 1, would be better.
After all, if the RNR NAK timer goes off, the code will generally retry, 
right? So it seems more logical to merge these.

> This patch fixes rnr retry errors which can be observed by running the
> pyverbs test suite 50-100X. With this patch applied they do not occur.
> 
> Link: https://lore.kernel.org/linux-rdma/a8287823-1408-4273-bc22-99a0678db640@gmail.com/
> Fixes: 8700e3e7c485 ("Soft RoCE (RXE) - The software RoCE driver")
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> ---
>   drivers/infiniband/sw/rxe/rxe_comp.c  | 4 +---
>   drivers/infiniband/sw/rxe/rxe_qp.c    | 1 +
>   drivers/infiniband/sw/rxe/rxe_req.c   | 6 ++++--
>   drivers/infiniband/sw/rxe/rxe_verbs.h | 1 +
>   4 files changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
> index 138b3e7d3a5f..bc668cb211b1 100644
> --- a/drivers/infiniband/sw/rxe/rxe_comp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
> @@ -733,9 +733,7 @@ int rxe_completer(void *arg)
>   				if (qp->comp.rnr_retry != 7)
>   					qp->comp.rnr_retry--;
>   
> -				qp->req.need_retry = 1;
> -				pr_debug("qp#%d set rnr nak timer\n",
> -					 qp_num(qp));
> +				qp->req.need_rnr_timeout = 1;

Suggest req.need_rnr_retry = -1  (and keep the useful pr_debug!)

>   				mod_timer(&qp->rnr_nak_timer,
>   					  jiffies + rnrnak_jiffies(aeth_syn(pkt)
>   						& ~AETH_TYPE_MASK));
> diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
> index 62acf890af6c..1c962468714e 100644
> --- a/drivers/infiniband/sw/rxe/rxe_qp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
> @@ -513,6 +513,7 @@ static void rxe_qp_reset(struct rxe_qp *qp)
>   	atomic_set(&qp->ssn, 0);
>   	qp->req.opcode = -1;
>   	qp->req.need_retry = 0;
> +	qp->req.need_rnr_timeout = 0;
>   	qp->req.noack_pkts = 0;
>   	qp->resp.msn = 0;
>   	qp->resp.opcode = -1;
> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
> index ae5fbc79dd5c..770ae4279f73 100644
> --- a/drivers/infiniband/sw/rxe/rxe_req.c
> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
> @@ -103,7 +103,8 @@ void rnr_nak_timer(struct timer_list *t)
>   {
>   	struct rxe_qp *qp = from_timer(qp, t, rnr_nak_timer);
>   
> -	pr_debug("qp#%d rnr nak timer fired\n", qp_num(qp));
> +	qp->req.need_retry = 1;
> +	qp->req.need_rnr_timeout = 0;

Simply setting need_retry = 1 would suffice, if suggestion accepted.

>   	rxe_run_task(&qp->req.task, 1);
>   }
>   
> @@ -624,10 +625,11 @@ int rxe_requester(void *arg)
>   		qp->req.need_rd_atomic = 0;
>   		qp->req.wait_psn = 0;
>   		qp->req.need_retry = 0;
> +		qp->req.need_rnr_timeout = 0;
>   		goto exit;
>   	}
>   
> -	if (unlikely(qp->req.need_retry)) {
> +	if (unlikely(qp->req.need_retry && !qp->req.need_rnr_timeout)) {

This would become (unlikely (qp->req.rnr_retry > 0)) ...

>   		req_retry(qp);
>   		qp->req.need_retry = 0;
>   	}
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
> index e7eff1ca75e9..ab3186478c3f 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
> @@ -123,6 +123,7 @@ struct rxe_req_info {
>   	int			need_rd_atomic;
>   	int			wait_psn;
>   	int			need_retry;
> +	int			need_rnr_timeout;

Drop

>   	int			noack_pkts;
>   	struct rxe_task		task;
>   };
> 
> base-commit: c5eb0a61238dd6faf37f58c9ce61c9980aaffd7a


Tom.
Zhu Yanjun May 13, 2022, 1:38 p.m. UTC | #3
在 2022/5/13 10:40, Yanjun Zhu 写道:
> 在 2022/5/13 3:49, Bob Pearson 写道:
>> Currently the completer tasklet when it sets the retransmit timer or the
>> nak timer sets the same flag (qp->req.need_retry) so that if either
>> timer fires it will attempt to perform a retry flow on the send queue.
>> This has the effect of responding to an RNR NAK at the first retransmit
>> timer event which does not allow for the requested rnr timeout.
>>
>> This patch adds a new flag (qp->req.need_rnr_timeout) which, if set,
>> prevents a retry flow until the rnr nak timer fires.
>>
>> This patch fixes rnr retry errors which can be observed by running the
>> pyverbs test suite 50-100X. With this patch applied they do not occur.
> 
> Do you mean that running run_tests.py for 50-100times can reproduce this 
> bug? I want to reproduce this problem.

Running run_tests.py for 50-100times, I can not reproduce this problem

Zhu Yanjun

> 
> Thanks a lot.
> Zhu Yanjun
> 
>>
>> Link: 
>> https://lore.kernel.org/linux-rdma/a8287823-1408-4273-bc22-99a0678db640@gmail.com/ 
>>
>> Fixes: 8700e3e7c485 ("Soft RoCE (RXE) - The software RoCE driver")
>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>> ---
>>   drivers/infiniband/sw/rxe/rxe_comp.c  | 4 +---
>>   drivers/infiniband/sw/rxe/rxe_qp.c    | 1 +
>>   drivers/infiniband/sw/rxe/rxe_req.c   | 6 ++++--
>>   drivers/infiniband/sw/rxe/rxe_verbs.h | 1 +
>>   4 files changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c 
>> b/drivers/infiniband/sw/rxe/rxe_comp.c
>> index 138b3e7d3a5f..bc668cb211b1 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_comp.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
>> @@ -733,9 +733,7 @@ int rxe_completer(void *arg)
>>                   if (qp->comp.rnr_retry != 7)
>>                       qp->comp.rnr_retry--;
>> -                qp->req.need_retry = 1;
>> -                pr_debug("qp#%d set rnr nak timer\n",
>> -                     qp_num(qp));
>> +                qp->req.need_rnr_timeout = 1;
>>                   mod_timer(&qp->rnr_nak_timer,
>>                         jiffies + rnrnak_jiffies(aeth_syn(pkt)
>>                           & ~AETH_TYPE_MASK));
>> diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c 
>> b/drivers/infiniband/sw/rxe/rxe_qp.c
>> index 62acf890af6c..1c962468714e 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_qp.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
>> @@ -513,6 +513,7 @@ static void rxe_qp_reset(struct rxe_qp *qp)
>>       atomic_set(&qp->ssn, 0);
>>       qp->req.opcode = -1;
>>       qp->req.need_retry = 0;
>> +    qp->req.need_rnr_timeout = 0;
>>       qp->req.noack_pkts = 0;
>>       qp->resp.msn = 0;
>>       qp->resp.opcode = -1;
>> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c 
>> b/drivers/infiniband/sw/rxe/rxe_req.c
>> index ae5fbc79dd5c..770ae4279f73 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_req.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
>> @@ -103,7 +103,8 @@ void rnr_nak_timer(struct timer_list *t)
>>   {
>>       struct rxe_qp *qp = from_timer(qp, t, rnr_nak_timer);
>> -    pr_debug("qp#%d rnr nak timer fired\n", qp_num(qp));
>> +    qp->req.need_retry = 1;
>> +    qp->req.need_rnr_timeout = 0;
>>       rxe_run_task(&qp->req.task, 1);
>>   }
>> @@ -624,10 +625,11 @@ int rxe_requester(void *arg)
>>           qp->req.need_rd_atomic = 0;
>>           qp->req.wait_psn = 0;
>>           qp->req.need_retry = 0;
>> +        qp->req.need_rnr_timeout = 0;
>>           goto exit;
>>       }
>> -    if (unlikely(qp->req.need_retry)) {
>> +    if (unlikely(qp->req.need_retry && !qp->req.need_rnr_timeout)) {
>>           req_retry(qp);
>>           qp->req.need_retry = 0;
>>       }
>> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h 
>> b/drivers/infiniband/sw/rxe/rxe_verbs.h
>> index e7eff1ca75e9..ab3186478c3f 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
>> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
>> @@ -123,6 +123,7 @@ struct rxe_req_info {
>>       int            need_rd_atomic;
>>       int            wait_psn;
>>       int            need_retry;
>> +    int            need_rnr_timeout;
>>       int            noack_pkts;
>>       struct rxe_task        task;
>>   };
>>
>> base-commit: c5eb0a61238dd6faf37f58c9ce61c9980aaffd7a
>
Bob Pearson May 13, 2022, 3:33 p.m. UTC | #4
On 5/13/22 08:04, Tom Talpey wrote:
> 
> On 5/12/2022 3:49 PM, Bob Pearson wrote:
>> Currently the completer tasklet when it sets the retransmit timer or the
>> nak timer sets the same flag (qp->req.need_retry) so that if either
>> timer fires it will attempt to perform a retry flow on the send queue.
>> This has the effect of responding to an RNR NAK at the first retransmit
>> timer event which does not allow for the requested rnr timeout.
>>
>> This patch adds a new flag (qp->req.need_rnr_timeout) which, if set,
>> prevents a retry flow until the rnr nak timer fires.
> 
> The new name is a little confusing, nobody "needs" an RNR timeout. :)
> But it seems really odd (and maybe fragile) to have two flags. To me,
> setting need_retry to "-1", or anything but 0 or 1, would be better.
> After all, if the RNR NAK timer goes off, the code will generally retry, right? So it seems more logical to merge these.
I am trying to cleanup pyverbs runs which sometimes fail with rnr retry errors.
In this specific case the rnr timeout value is set a lot longer than the retry timeout.
As discussed earlier the retry timer is never cleared so it continuously fires at about
40 times a second. The failing test is intentionally setting up a situation where the
receiver is not ready and then it is. The retry timeout usually acts as though it
were the rnr retry and 'fixes' the problem and the test passes. Since the retry timer
is just randomly firing it sometimes happens too soon and the receiver isn't ready yet
which leads to the failed test.

Logic says that if you receive an rnr nak the target is waiting for that specific send
packet to be resent and no other so we shouldn't be responding to a spurious retry
timeout.

You are correct that the retry sequence can be shared but it shouldn't start until the
rnr timer has fired once an rnr nak is seen. This patch does exactly that by blocking
the retry sequence until the rnr timer fires.

If you don't like 'need_rnr_timeout' perhaps you would accept 'wait_rnr_timeout' instead.
Overloading need_retry is less clear IMHO. You don't need the retry until the rnr timer
has fired then you do. If you really want to just use one variable we are basically
implementing a state machine with 3 states and it should get an enum to define the states.

Once this issue is resolved I will fix the spurious retries to make them a lot less likely.
> 
>> This patch fixes rnr retry errors which can be observed by running the
>> pyverbs test suite 50-100X. With this patch applied they do not occur.
>>
>> Link: https://lore.kernel.org/linux-rdma/a8287823-1408-4273-bc22-99a0678db640@gmail.com/
>> Fixes: 8700e3e7c485 ("Soft RoCE (RXE) - The software RoCE driver")
>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>> ---
>>   drivers/infiniband/sw/rxe/rxe_comp.c  | 4 +---
>>   drivers/infiniband/sw/rxe/rxe_qp.c    | 1 +
>>   drivers/infiniband/sw/rxe/rxe_req.c   | 6 ++++--
>>   drivers/infiniband/sw/rxe/rxe_verbs.h | 1 +
>>   4 files changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
>> index 138b3e7d3a5f..bc668cb211b1 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_comp.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
>> @@ -733,9 +733,7 @@ int rxe_completer(void *arg)
>>                   if (qp->comp.rnr_retry != 7)
>>                       qp->comp.rnr_retry--;
>>   -                qp->req.need_retry = 1;
>> -                pr_debug("qp#%d set rnr nak timer\n",
>> -                     qp_num(qp));
>> +                qp->req.need_rnr_timeout = 1;
> 
> Suggest req.need_rnr_retry = -1  (and keep the useful pr_debug!)
> 
>>                   mod_timer(&qp->rnr_nak_timer,
>>                         jiffies + rnrnak_jiffies(aeth_syn(pkt)
>>                           & ~AETH_TYPE_MASK));
>> diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
>> index 62acf890af6c..1c962468714e 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_qp.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
>> @@ -513,6 +513,7 @@ static void rxe_qp_reset(struct rxe_qp *qp)
>>       atomic_set(&qp->ssn, 0);
>>       qp->req.opcode = -1;
>>       qp->req.need_retry = 0;
>> +    qp->req.need_rnr_timeout = 0;
>>       qp->req.noack_pkts = 0;
>>       qp->resp.msn = 0;
>>       qp->resp.opcode = -1;
>> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
>> index ae5fbc79dd5c..770ae4279f73 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_req.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
>> @@ -103,7 +103,8 @@ void rnr_nak_timer(struct timer_list *t)
>>   {
>>       struct rxe_qp *qp = from_timer(qp, t, rnr_nak_timer);
>>   -    pr_debug("qp#%d rnr nak timer fired\n", qp_num(qp));
>> +    qp->req.need_retry = 1;
>> +    qp->req.need_rnr_timeout = 0;
> 
> Simply setting need_retry = 1 would suffice, if suggestion accepted.
> 
>>       rxe_run_task(&qp->req.task, 1);
>>   }
>>   @@ -624,10 +625,11 @@ int rxe_requester(void *arg)
>>           qp->req.need_rd_atomic = 0;
>>           qp->req.wait_psn = 0;
>>           qp->req.need_retry = 0;
>> +        qp->req.need_rnr_timeout = 0;
>>           goto exit;
>>       }
>>   -    if (unlikely(qp->req.need_retry)) {
>> +    if (unlikely(qp->req.need_retry && !qp->req.need_rnr_timeout)) {
> 
> This would become (unlikely (qp->req.rnr_retry > 0)) ...
> 
>>           req_retry(qp);
>>           qp->req.need_retry = 0;
>>       }
>> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
>> index e7eff1ca75e9..ab3186478c3f 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
>> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
>> @@ -123,6 +123,7 @@ struct rxe_req_info {
>>       int            need_rd_atomic;
>>       int            wait_psn;
>>       int            need_retry;
>> +    int            need_rnr_timeout;
> 
> Drop
> 
>>       int            noack_pkts;
>>       struct rxe_task        task;
>>   };
>>
>> base-commit: c5eb0a61238dd6faf37f58c9ce61c9980aaffd7a
> 
> 
> Tom.
Bob Pearson May 13, 2022, 3:42 p.m. UTC | #5
On 5/13/22 08:38, Yanjun Zhu wrote:
> 
> 
> 在 2022/5/13 10:40, Yanjun Zhu 写道:
>> 在 2022/5/13 3:49, Bob Pearson 写道:
>>> Currently the completer tasklet when it sets the retransmit timer or the
>>> nak timer sets the same flag (qp->req.need_retry) so that if either
>>> timer fires it will attempt to perform a retry flow on the send queue.
>>> This has the effect of responding to an RNR NAK at the first retransmit
>>> timer event which does not allow for the requested rnr timeout.
>>>
>>> This patch adds a new flag (qp->req.need_rnr_timeout) which, if set,
>>> prevents a retry flow until the rnr nak timer fires.
>>>
>>> This patch fixes rnr retry errors which can be observed by running the
>>> pyverbs test suite 50-100X. With this patch applied they do not occur.
>>
>> Do you mean that running run_tests.py for 50-100times can reproduce this bug? I want to reproduce this problem.
> 
> Running run_tests.py for 50-100times, I can not reproduce this problem

I see them consistently. It is just luck of timing. If you add print statements
to show when the retry timer fires and when the rnr timer fires you should see that
all the retry sequences are a result of the retry timer and not the rnr timer.
This means that you are not allowing the requested rnr delay for the responder.
In my traces most of the time the retry timer fires and the test ends and the qp
is torn down deleting the rnr timer before it ever times out. Occasionally the
retry timer fires too early and the test fails. This is still a bug and it should
be fixed.

Bob
> 
> Zhu Yanjun
> 
>>
>> Thanks a lot.
>> Zhu Yanjun
>>
>>>
>>> Link: https://lore.kernel.org/linux-rdma/a8287823-1408-4273-bc22-99a0678db640@gmail.com/
>>> Fixes: 8700e3e7c485 ("Soft RoCE (RXE) - The software RoCE driver")
>>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>>> ---
>>>   drivers/infiniband/sw/rxe/rxe_comp.c  | 4 +---
>>>   drivers/infiniband/sw/rxe/rxe_qp.c    | 1 +
>>>   drivers/infiniband/sw/rxe/rxe_req.c   | 6 ++++--
>>>   drivers/infiniband/sw/rxe/rxe_verbs.h | 1 +
>>>   4 files changed, 7 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
>>> index 138b3e7d3a5f..bc668cb211b1 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_comp.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
>>> @@ -733,9 +733,7 @@ int rxe_completer(void *arg)
>>>                   if (qp->comp.rnr_retry != 7)
>>>                       qp->comp.rnr_retry--;
>>> -                qp->req.need_retry = 1;
>>> -                pr_debug("qp#%d set rnr nak timer\n",
>>> -                     qp_num(qp));
>>> +                qp->req.need_rnr_timeout = 1;
>>>                   mod_timer(&qp->rnr_nak_timer,
>>>                         jiffies + rnrnak_jiffies(aeth_syn(pkt)
>>>                           & ~AETH_TYPE_MASK));
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
>>> index 62acf890af6c..1c962468714e 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_qp.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
>>> @@ -513,6 +513,7 @@ static void rxe_qp_reset(struct rxe_qp *qp)
>>>       atomic_set(&qp->ssn, 0);
>>>       qp->req.opcode = -1;
>>>       qp->req.need_retry = 0;
>>> +    qp->req.need_rnr_timeout = 0;
>>>       qp->req.noack_pkts = 0;
>>>       qp->resp.msn = 0;
>>>       qp->resp.opcode = -1;
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
>>> index ae5fbc79dd5c..770ae4279f73 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_req.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
>>> @@ -103,7 +103,8 @@ void rnr_nak_timer(struct timer_list *t)
>>>   {
>>>       struct rxe_qp *qp = from_timer(qp, t, rnr_nak_timer);
>>> -    pr_debug("qp#%d rnr nak timer fired\n", qp_num(qp));
>>> +    qp->req.need_retry = 1;
>>> +    qp->req.need_rnr_timeout = 0;
>>>       rxe_run_task(&qp->req.task, 1);
>>>   }
>>> @@ -624,10 +625,11 @@ int rxe_requester(void *arg)
>>>           qp->req.need_rd_atomic = 0;
>>>           qp->req.wait_psn = 0;
>>>           qp->req.need_retry = 0;
>>> +        qp->req.need_rnr_timeout = 0;
>>>           goto exit;
>>>       }
>>> -    if (unlikely(qp->req.need_retry)) {
>>> +    if (unlikely(qp->req.need_retry && !qp->req.need_rnr_timeout)) {
>>>           req_retry(qp);
>>>           qp->req.need_retry = 0;
>>>       }
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
>>> index e7eff1ca75e9..ab3186478c3f 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
>>> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
>>> @@ -123,6 +123,7 @@ struct rxe_req_info {
>>>       int            need_rd_atomic;
>>>       int            wait_psn;
>>>       int            need_retry;
>>> +    int            need_rnr_timeout;
>>>       int            noack_pkts;
>>>       struct rxe_task        task;
>>>   };
>>>
>>> base-commit: c5eb0a61238dd6faf37f58c9ce61c9980aaffd7a
>>
Tom Talpey May 13, 2022, 5:43 p.m. UTC | #6
On 5/13/2022 11:33 AM, Bob Pearson wrote:
> On 5/13/22 08:04, Tom Talpey wrote:
>>
>> On 5/12/2022 3:49 PM, Bob Pearson wrote:
>>> Currently the completer tasklet when it sets the retransmit timer or the
>>> nak timer sets the same flag (qp->req.need_retry) so that if either
>>> timer fires it will attempt to perform a retry flow on the send queue.
>>> This has the effect of responding to an RNR NAK at the first retransmit
>>> timer event which does not allow for the requested rnr timeout.
>>>
>>> This patch adds a new flag (qp->req.need_rnr_timeout) which, if set,
>>> prevents a retry flow until the rnr nak timer fires.
>>
>> The new name is a little confusing, nobody "needs" an RNR timeout. :)
>> But it seems really odd (and maybe fragile) to have two flags. To me,
>> setting need_retry to "-1", or anything but 0 or 1, would be better.
>> After all, if the RNR NAK timer goes off, the code will generally retry, right? So it seems more logical to merge these.
> I am trying to cleanup pyverbs runs which sometimes fail with rnr retry errors.
> In this specific case the rnr timeout value is set a lot longer than the retry timeout.

Interesting. I was not aware that was possible, and I'm skeptical.
But perhaps that's another matter.

> As discussed earlier the retry timer is never cleared so it continuously fires at about
> 40 times a second. The failing test is intentionally setting up a situation where the
> receiver is not ready and then it is. The retry timeout usually acts as though it
> were the rnr retry and 'fixes' the problem and the test passes. Since the retry timer
> is just randomly firing it sometimes happens too soon and the receiver isn't ready yet
> which leads to the failed test.
> 
> Logic says that if you receive an rnr nak the target is waiting for that specific send
> packet to be resent and no other so we shouldn't be responding to a spurious retry
> timeout.
> 
> You are correct that the retry sequence can be shared but it shouldn't start until the
> rnr timer has fired once an rnr nak is seen. This patch does exactly that by blocking
> the retry sequence until the rnr timer fires.
> 
> If you don't like 'need_rnr_timeout' perhaps you would accept 'wait_rnr_timeout' instead.
> Overloading need_retry is less clear IMHO. You don't need the retry until the rnr timer
> has fired then you do. If you really want to just use one variable we are basically
> implementing a state machine with 3 states and it should get an enum to define the states.

I guess I do prefer the latter enum approach. To elaborate on what I
meant by fragile, it appears that with the two-field approach, only
three states are valid. It can never be the case that both are set,
yet here is your hunk in rxe_requester:

   if (unlikely(qp->req.need_retry && !qp->req.need_rnr_timeout)) {

The part after the && will always evaluate to true, right? The code
clears need_rnr_timeout wherever it sets need_retry. So this is testing
for an impossible case.

That said, if you don't agree, the name needs to be less confusing.
I guess I'd suggest "rnr_timer_running".

> Once this issue is resolved I will fix the spurious retries to make them a lot less likely.

Excellent. RoCE suffers from congestion enough without pouring
gasoline on the fire.

Tom.

>>
>>> This patch fixes rnr retry errors which can be observed by running the
>>> pyverbs test suite 50-100X. With this patch applied they do not occur.
>>>
>>> Link: https://lore.kernel.org/linux-rdma/a8287823-1408-4273-bc22-99a0678db640@gmail.com/
>>> Fixes: 8700e3e7c485 ("Soft RoCE (RXE) - The software RoCE driver")
>>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>>> ---
>>>    drivers/infiniband/sw/rxe/rxe_comp.c  | 4 +---
>>>    drivers/infiniband/sw/rxe/rxe_qp.c    | 1 +
>>>    drivers/infiniband/sw/rxe/rxe_req.c   | 6 ++++--
>>>    drivers/infiniband/sw/rxe/rxe_verbs.h | 1 +
>>>    4 files changed, 7 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
>>> index 138b3e7d3a5f..bc668cb211b1 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_comp.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
>>> @@ -733,9 +733,7 @@ int rxe_completer(void *arg)
>>>                    if (qp->comp.rnr_retry != 7)
>>>                        qp->comp.rnr_retry--;
>>>    -                qp->req.need_retry = 1;
>>> -                pr_debug("qp#%d set rnr nak timer\n",
>>> -                     qp_num(qp));
>>> +                qp->req.need_rnr_timeout = 1;
>>
>> Suggest req.need_rnr_retry = -1  (and keep the useful pr_debug!)
>>
>>>                    mod_timer(&qp->rnr_nak_timer,
>>>                          jiffies + rnrnak_jiffies(aeth_syn(pkt)
>>>                            & ~AETH_TYPE_MASK));
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
>>> index 62acf890af6c..1c962468714e 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_qp.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
>>> @@ -513,6 +513,7 @@ static void rxe_qp_reset(struct rxe_qp *qp)
>>>        atomic_set(&qp->ssn, 0);
>>>        qp->req.opcode = -1;
>>>        qp->req.need_retry = 0;
>>> +    qp->req.need_rnr_timeout = 0;
>>>        qp->req.noack_pkts = 0;
>>>        qp->resp.msn = 0;
>>>        qp->resp.opcode = -1;
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
>>> index ae5fbc79dd5c..770ae4279f73 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_req.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
>>> @@ -103,7 +103,8 @@ void rnr_nak_timer(struct timer_list *t)
>>>    {
>>>        struct rxe_qp *qp = from_timer(qp, t, rnr_nak_timer);
>>>    -    pr_debug("qp#%d rnr nak timer fired\n", qp_num(qp));
>>> +    qp->req.need_retry = 1;
>>> +    qp->req.need_rnr_timeout = 0;
>>
>> Simply setting need_retry = 1 would suffice, if suggestion accepted.
>>
>>>        rxe_run_task(&qp->req.task, 1);
>>>    }
>>>    @@ -624,10 +625,11 @@ int rxe_requester(void *arg)
>>>            qp->req.need_rd_atomic = 0;
>>>            qp->req.wait_psn = 0;
>>>            qp->req.need_retry = 0;
>>> +        qp->req.need_rnr_timeout = 0;
>>>            goto exit;
>>>        }
>>>    -    if (unlikely(qp->req.need_retry)) {
>>> +    if (unlikely(qp->req.need_retry && !qp->req.need_rnr_timeout)) {
>>
>> This would become (unlikely (qp->req.rnr_retry > 0)) ...
>>
>>>            req_retry(qp);
>>>            qp->req.need_retry = 0;
>>>        }
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
>>> index e7eff1ca75e9..ab3186478c3f 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
>>> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
>>> @@ -123,6 +123,7 @@ struct rxe_req_info {
>>>        int            need_rd_atomic;
>>>        int            wait_psn;
>>>        int            need_retry;
>>> +    int            need_rnr_timeout;
>>
>> Drop
>>
>>>        int            noack_pkts;
>>>        struct rxe_task        task;
>>>    };
>>>
>>> base-commit: c5eb0a61238dd6faf37f58c9ce61c9980aaffd7a
>>
>>
>> Tom.
> 
>
Bob Pearson May 13, 2022, 6:22 p.m. UTC | #7
On 5/13/22 08:38, Yanjun Zhu wrote:
> 
> 
> 在 2022/5/13 10:40, Yanjun Zhu 写道:
>> 在 2022/5/13 3:49, Bob Pearson 写道:
>>> Currently the completer tasklet when it sets the retransmit timer or the
>>> nak timer sets the same flag (qp->req.need_retry) so that if either
>>> timer fires it will attempt to perform a retry flow on the send queue.
>>> This has the effect of responding to an RNR NAK at the first retransmit
>>> timer event which does not allow for the requested rnr timeout.
>>>
>>> This patch adds a new flag (qp->req.need_rnr_timeout) which, if set,
>>> prevents a retry flow until the rnr nak timer fires.
>>>
>>> This patch fixes rnr retry errors which can be observed by running the
>>> pyverbs test suite 50-100X. With this patch applied they do not occur.
>>
>> Do you mean that running run_tests.py for 50-100times can reproduce this bug? I want to reproduce this problem.
> 
> Running run_tests.py for 50-100times, I can not reproduce this problem

I went back and looked. The occasionally failing test case is:

	test_rdmacm_async_traffic_external_qp

This test is normally skipped for rxe because of a hack in <path to rdma-core>/tests/base.py. I sent you
a patch that enables these tests a few days ago.

It randomly causes an rnr retry about 1/3 of the time. Most of these are repaired by the retry timer
going off. A few ~1-2% are not.

Bob
> 
> Zhu Yanjun
> 
>>
>> Thanks a lot.
>> Zhu Yanjun
>>
>>>
>>> Link: https://lore.kernel.org/linux-rdma/a8287823-1408-4273-bc22-99a0678db640@gmail.com/
>>> Fixes: 8700e3e7c485 ("Soft RoCE (RXE) - The software RoCE driver")
>>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>>> ---
>>>   drivers/infiniband/sw/rxe/rxe_comp.c  | 4 +---
>>>   drivers/infiniband/sw/rxe/rxe_qp.c    | 1 +
>>>   drivers/infiniband/sw/rxe/rxe_req.c   | 6 ++++--
>>>   drivers/infiniband/sw/rxe/rxe_verbs.h | 1 +
>>>   4 files changed, 7 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
>>> index 138b3e7d3a5f..bc668cb211b1 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_comp.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
>>> @@ -733,9 +733,7 @@ int rxe_completer(void *arg)
>>>                   if (qp->comp.rnr_retry != 7)
>>>                       qp->comp.rnr_retry--;
>>> -                qp->req.need_retry = 1;
>>> -                pr_debug("qp#%d set rnr nak timer\n",
>>> -                     qp_num(qp));
>>> +                qp->req.need_rnr_timeout = 1;
>>>                   mod_timer(&qp->rnr_nak_timer,
>>>                         jiffies + rnrnak_jiffies(aeth_syn(pkt)
>>>                           & ~AETH_TYPE_MASK));
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
>>> index 62acf890af6c..1c962468714e 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_qp.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
>>> @@ -513,6 +513,7 @@ static void rxe_qp_reset(struct rxe_qp *qp)
>>>       atomic_set(&qp->ssn, 0);
>>>       qp->req.opcode = -1;
>>>       qp->req.need_retry = 0;
>>> +    qp->req.need_rnr_timeout = 0;
>>>       qp->req.noack_pkts = 0;
>>>       qp->resp.msn = 0;
>>>       qp->resp.opcode = -1;
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
>>> index ae5fbc79dd5c..770ae4279f73 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_req.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
>>> @@ -103,7 +103,8 @@ void rnr_nak_timer(struct timer_list *t)
>>>   {
>>>       struct rxe_qp *qp = from_timer(qp, t, rnr_nak_timer);
>>> -    pr_debug("qp#%d rnr nak timer fired\n", qp_num(qp));
>>> +    qp->req.need_retry = 1;
>>> +    qp->req.need_rnr_timeout = 0;
>>>       rxe_run_task(&qp->req.task, 1);
>>>   }
>>> @@ -624,10 +625,11 @@ int rxe_requester(void *arg)
>>>           qp->req.need_rd_atomic = 0;
>>>           qp->req.wait_psn = 0;
>>>           qp->req.need_retry = 0;
>>> +        qp->req.need_rnr_timeout = 0;
>>>           goto exit;
>>>       }
>>> -    if (unlikely(qp->req.need_retry)) {
>>> +    if (unlikely(qp->req.need_retry && !qp->req.need_rnr_timeout)) {
>>>           req_retry(qp);
>>>           qp->req.need_retry = 0;
>>>       }
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
>>> index e7eff1ca75e9..ab3186478c3f 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
>>> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
>>> @@ -123,6 +123,7 @@ struct rxe_req_info {
>>>       int            need_rd_atomic;
>>>       int            wait_psn;
>>>       int            need_retry;
>>> +    int            need_rnr_timeout;
>>>       int            noack_pkts;
>>>       struct rxe_task        task;
>>>   };
>>>
>>> base-commit: c5eb0a61238dd6faf37f58c9ce61c9980aaffd7a
>>
Bob Pearson May 13, 2022, 6:32 p.m. UTC | #8
On 5/13/22 12:43, Tom Talpey wrote:
> On 5/13/2022 11:33 AM, Bob Pearson wrote:
>> On 5/13/22 08:04, Tom Talpey wrote:
>>>
>>> On 5/12/2022 3:49 PM, Bob Pearson wrote:
>>>> Currently the completer tasklet when it sets the retransmit timer or the
>>>> nak timer sets the same flag (qp->req.need_retry) so that if either
>>>> timer fires it will attempt to perform a retry flow on the send queue.
>>>> This has the effect of responding to an RNR NAK at the first retransmit
>>>> timer event which does not allow for the requested rnr timeout.
>>>>
>>>> This patch adds a new flag (qp->req.need_rnr_timeout) which, if set,
>>>> prevents a retry flow until the rnr nak timer fires.
>>>
>>> The new name is a little confusing, nobody "needs" an RNR timeout. :)
>>> But it seems really odd (and maybe fragile) to have two flags. To me,
>>> setting need_retry to "-1", or anything but 0 or 1, would be better.
>>> After all, if the RNR NAK timer goes off, the code will generally retry, right? So it seems more logical to merge these.
>> I am trying to cleanup pyverbs runs which sometimes fail with rnr retry errors.
>> In this specific case the rnr timeout value is set a lot longer than the retry timeout.
> 
> Interesting. I was not aware that was possible, and I'm skeptical.
> But perhaps that's another matter.

It would make sense. The retry timer handles NIC to NIC failures which are fast.
Basically a network transit time or two. RNR recovery requires software to intervene
and post a recv WR.
 
> 
>> As discussed earlier the retry timer is never cleared so it continuously fires at about
>> 40 times a second. The failing test is intentionally setting up a situation where the
>> receiver is not ready and then it is. The retry timeout usually acts as though it
>> were the rnr retry and 'fixes' the problem and the test passes. Since the retry timer
>> is just randomly firing it sometimes happens too soon and the receiver isn't ready yet
>> which leads to the failed test.
>>
>> Logic says that if you receive an rnr nak the target is waiting for that specific send
>> packet to be resent and no other so we shouldn't be responding to a spurious retry
>> timeout.
>>
>> You are correct that the retry sequence can be shared but it shouldn't start until the
>> rnr timer has fired once an rnr nak is seen. This patch does exactly that by blocking
>> the retry sequence until the rnr timer fires.
>>
>> If you don't like 'need_rnr_timeout' perhaps you would accept 'wait_rnr_timeout' instead.
>> Overloading need_retry is less clear IMHO. You don't need the retry until the rnr timer
>> has fired then you do. If you really want to just use one variable we are basically
>> implementing a state machine with 3 states and it should get an enum to define the states.
> 
> I guess I do prefer the latter enum approach. To elaborate on what I
> meant by fragile, it appears that with the two-field approach, only
> three states are valid. It can never be the case that both are set,
> yet here is your hunk in rxe_requester:
> 
>   if (unlikely(qp->req.need_retry && !qp->req.need_rnr_timeout)) {
> 
> The part after the && will always evaluate to true, right? The code
> clears need_rnr_timeout wherever it sets need_retry. So this is testing
> for an impossible case.

Wrong. In the RNR path things are as you say but if the retry timer fires,
independently of receiving an RNR NAK, need_retry is set and need_rnr_timeout
may or may not be true. If we did receive an RNR NAK then we should ignore
the state of the retry timer until the rnr timer goes off.

Bob
> 
> That said, if you don't agree, the name needs to be less confusing.
> I guess I'd suggest "rnr_timer_running".
> 
>> Once this issue is resolved I will fix the spurious retries to make them a lot less likely.
> 
> Excellent. RoCE suffers from congestion enough without pouring
> gasoline on the fire.
> 
> Tom.
> 
>>>
>>>> This patch fixes rnr retry errors which can be observed by running the
>>>> pyverbs test suite 50-100X. With this patch applied they do not occur.
>>>>
>>>> Link: https://lore.kernel.org/linux-rdma/a8287823-1408-4273-bc22-99a0678db640@gmail.com/
>>>> Fixes: 8700e3e7c485 ("Soft RoCE (RXE) - The software RoCE driver")
>>>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>>>> ---
>>>>    drivers/infiniband/sw/rxe/rxe_comp.c  | 4 +---
>>>>    drivers/infiniband/sw/rxe/rxe_qp.c    | 1 +
>>>>    drivers/infiniband/sw/rxe/rxe_req.c   | 6 ++++--
>>>>    drivers/infiniband/sw/rxe/rxe_verbs.h | 1 +
>>>>    4 files changed, 7 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
>>>> index 138b3e7d3a5f..bc668cb211b1 100644
>>>> --- a/drivers/infiniband/sw/rxe/rxe_comp.c
>>>> +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
>>>> @@ -733,9 +733,7 @@ int rxe_completer(void *arg)
>>>>                    if (qp->comp.rnr_retry != 7)
>>>>                        qp->comp.rnr_retry--;
>>>>    -                qp->req.need_retry = 1;
>>>> -                pr_debug("qp#%d set rnr nak timer\n",
>>>> -                     qp_num(qp));
>>>> +                qp->req.need_rnr_timeout = 1;
>>>
>>> Suggest req.need_rnr_retry = -1  (and keep the useful pr_debug!)
>>>
>>>>                    mod_timer(&qp->rnr_nak_timer,
>>>>                          jiffies + rnrnak_jiffies(aeth_syn(pkt)
>>>>                            & ~AETH_TYPE_MASK));
>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
>>>> index 62acf890af6c..1c962468714e 100644
>>>> --- a/drivers/infiniband/sw/rxe/rxe_qp.c
>>>> +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
>>>> @@ -513,6 +513,7 @@ static void rxe_qp_reset(struct rxe_qp *qp)
>>>>        atomic_set(&qp->ssn, 0);
>>>>        qp->req.opcode = -1;
>>>>        qp->req.need_retry = 0;
>>>> +    qp->req.need_rnr_timeout = 0;
>>>>        qp->req.noack_pkts = 0;
>>>>        qp->resp.msn = 0;
>>>>        qp->resp.opcode = -1;
>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
>>>> index ae5fbc79dd5c..770ae4279f73 100644
>>>> --- a/drivers/infiniband/sw/rxe/rxe_req.c
>>>> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
>>>> @@ -103,7 +103,8 @@ void rnr_nak_timer(struct timer_list *t)
>>>>    {
>>>>        struct rxe_qp *qp = from_timer(qp, t, rnr_nak_timer);
>>>>    -    pr_debug("qp#%d rnr nak timer fired\n", qp_num(qp));
>>>> +    qp->req.need_retry = 1;
>>>> +    qp->req.need_rnr_timeout = 0;
>>>
>>> Simply setting need_retry = 1 would suffice, if suggestion accepted.
>>>
>>>>        rxe_run_task(&qp->req.task, 1);
>>>>    }
>>>>    @@ -624,10 +625,11 @@ int rxe_requester(void *arg)
>>>>            qp->req.need_rd_atomic = 0;
>>>>            qp->req.wait_psn = 0;
>>>>            qp->req.need_retry = 0;
>>>> +        qp->req.need_rnr_timeout = 0;
>>>>            goto exit;
>>>>        }
>>>>    -    if (unlikely(qp->req.need_retry)) {
>>>> +    if (unlikely(qp->req.need_retry && !qp->req.need_rnr_timeout)) {
>>>
>>> This would become (unlikely (qp->req.rnr_retry > 0)) ...
>>>
>>>>            req_retry(qp);
>>>>            qp->req.need_retry = 0;
>>>>        }
>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
>>>> index e7eff1ca75e9..ab3186478c3f 100644
>>>> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
>>>> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
>>>> @@ -123,6 +123,7 @@ struct rxe_req_info {
>>>>        int            need_rd_atomic;
>>>>        int            wait_psn;
>>>>        int            need_retry;
>>>> +    int            need_rnr_timeout;
>>>
>>> Drop
>>>
>>>>        int            noack_pkts;
>>>>        struct rxe_task        task;
>>>>    };
>>>>
>>>> base-commit: c5eb0a61238dd6faf37f58c9ce61c9980aaffd7a
>>>
>>>
>>> Tom.
>>
>>
Tom Talpey May 13, 2022, 7:08 p.m. UTC | #9
On 5/13/2022 2:32 PM, Bob Pearson wrote:
> On 5/13/22 12:43, Tom Talpey wrote:
>> On 5/13/2022 11:33 AM, Bob Pearson wrote:
>>> On 5/13/22 08:04, Tom Talpey wrote:
>>>>
>>>> On 5/12/2022 3:49 PM, Bob Pearson wrote:
>>>>> Currently the completer tasklet when it sets the retransmit timer or the
>>>>> nak timer sets the same flag (qp->req.need_retry) so that if either
>>>>> timer fires it will attempt to perform a retry flow on the send queue.
>>>>> This has the effect of responding to an RNR NAK at the first retransmit
>>>>> timer event which does not allow for the requested rnr timeout.
>>>>>
>>>>> This patch adds a new flag (qp->req.need_rnr_timeout) which, if set,
>>>>> prevents a retry flow until the rnr nak timer fires.
>>>>
>>>> The new name is a little confusing, nobody "needs" an RNR timeout. :)
>>>> But it seems really odd (and maybe fragile) to have two flags. To me,
>>>> setting need_retry to "-1", or anything but 0 or 1, would be better.
>>>> After all, if the RNR NAK timer goes off, the code will generally retry, right? So it seems more logical to merge these.
>>> I am trying to cleanup pyverbs runs which sometimes fail with rnr retry errors.
>>> In this specific case the rnr timeout value is set a lot longer than the retry timeout.
>>
>> Interesting. I was not aware that was possible, and I'm skeptical.
>> But perhaps that's another matter.
> 
> It would make sense. The retry timer handles NIC to NIC failures which are fast.
> Basically a network transit time or two. RNR recovery requires software to intervene
> and post a recv WR.
>   
>>
>>> As discussed earlier the retry timer is never cleared so it continuously fires at about
>>> 40 times a second. The failing test is intentionally setting up a situation where the
>>> receiver is not ready and then it is. The retry timeout usually acts as though it
>>> were the rnr retry and 'fixes' the problem and the test passes. Since the retry timer
>>> is just randomly firing it sometimes happens too soon and the receiver isn't ready yet
>>> which leads to the failed test.
>>>
>>> Logic says that if you receive an rnr nak the target is waiting for that specific send
>>> packet to be resent and no other so we shouldn't be responding to a spurious retry
>>> timeout.
>>>
>>> You are correct that the retry sequence can be shared but it shouldn't start until the
>>> rnr timer has fired once an rnr nak is seen. This patch does exactly that by blocking
>>> the retry sequence until the rnr timer fires.
>>>
>>> If you don't like 'need_rnr_timeout' perhaps you would accept 'wait_rnr_timeout' instead.
>>> Overloading need_retry is less clear IMHO. You don't need the retry until the rnr timer
>>> has fired then you do. If you really want to just use one variable we are basically
>>> implementing a state machine with 3 states and it should get an enum to define the states.
>>
>> I guess I do prefer the latter enum approach. To elaborate on what I
>> meant by fragile, it appears that with the two-field approach, only
>> three states are valid. It can never be the case that both are set,
>> yet here is your hunk in rxe_requester:
>>
>>    if (unlikely(qp->req.need_retry && !qp->req.need_rnr_timeout)) {
>>
>> The part after the && will always evaluate to true, right? The code
>> clears need_rnr_timeout wherever it sets need_retry. So this is testing
>> for an impossible case.
> 
> Wrong. In the RNR path things are as you say but if the retry timer fires,
> independently of receiving an RNR NAK, need_retry is set and need_rnr_timeout
> may or may not be true. If we did receive an RNR NAK then we should ignore
> the state of the retry timer until the rnr timer goes off.

Ok, you're correct that it's possible that need_retry may be determined,
and at just that moment before the retry happens, an RNR NAK arrives and
starts the second timer. These fields aren't protected by locks so it
all seems super-racy. But anyway...

This really screams for clarity in the code, and again, I'll suggest
that the new field name is confusing.

Another name suggestion might be "need_retry_after_delay", which rhymes
with the existing "need_retry".

Tom.

> Bob
>>
>> That said, if you don't agree, the name needs to be less confusing.
>> I guess I'd suggest "rnr_timer_running".
>>
>>> Once this issue is resolved I will fix the spurious retries to make them a lot less likely.
>>
>> Excellent. RoCE suffers from congestion enough without pouring
>> gasoline on the fire.
>>
>> Tom.
>>
>>>>
>>>>> This patch fixes rnr retry errors which can be observed by running the
>>>>> pyverbs test suite 50-100X. With this patch applied they do not occur.
>>>>>
>>>>> Link: https://lore.kernel.org/linux-rdma/a8287823-1408-4273-bc22-99a0678db640@gmail.com/
>>>>> Fixes: 8700e3e7c485 ("Soft RoCE (RXE) - The software RoCE driver")
>>>>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>>>>> ---
>>>>>     drivers/infiniband/sw/rxe/rxe_comp.c  | 4 +---
>>>>>     drivers/infiniband/sw/rxe/rxe_qp.c    | 1 +
>>>>>     drivers/infiniband/sw/rxe/rxe_req.c   | 6 ++++--
>>>>>     drivers/infiniband/sw/rxe/rxe_verbs.h | 1 +
>>>>>     4 files changed, 7 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
>>>>> index 138b3e7d3a5f..bc668cb211b1 100644
>>>>> --- a/drivers/infiniband/sw/rxe/rxe_comp.c
>>>>> +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
>>>>> @@ -733,9 +733,7 @@ int rxe_completer(void *arg)
>>>>>                     if (qp->comp.rnr_retry != 7)
>>>>>                         qp->comp.rnr_retry--;
>>>>>     -                qp->req.need_retry = 1;
>>>>> -                pr_debug("qp#%d set rnr nak timer\n",
>>>>> -                     qp_num(qp));
>>>>> +                qp->req.need_rnr_timeout = 1;
>>>>
>>>> Suggest req.need_rnr_retry = -1  (and keep the useful pr_debug!)
>>>>
>>>>>                     mod_timer(&qp->rnr_nak_timer,
>>>>>                           jiffies + rnrnak_jiffies(aeth_syn(pkt)
>>>>>                             & ~AETH_TYPE_MASK));
>>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
>>>>> index 62acf890af6c..1c962468714e 100644
>>>>> --- a/drivers/infiniband/sw/rxe/rxe_qp.c
>>>>> +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
>>>>> @@ -513,6 +513,7 @@ static void rxe_qp_reset(struct rxe_qp *qp)
>>>>>         atomic_set(&qp->ssn, 0);
>>>>>         qp->req.opcode = -1;
>>>>>         qp->req.need_retry = 0;
>>>>> +    qp->req.need_rnr_timeout = 0;
>>>>>         qp->req.noack_pkts = 0;
>>>>>         qp->resp.msn = 0;
>>>>>         qp->resp.opcode = -1;
>>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
>>>>> index ae5fbc79dd5c..770ae4279f73 100644
>>>>> --- a/drivers/infiniband/sw/rxe/rxe_req.c
>>>>> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
>>>>> @@ -103,7 +103,8 @@ void rnr_nak_timer(struct timer_list *t)
>>>>>     {
>>>>>         struct rxe_qp *qp = from_timer(qp, t, rnr_nak_timer);
>>>>>     -    pr_debug("qp#%d rnr nak timer fired\n", qp_num(qp));
>>>>> +    qp->req.need_retry = 1;
>>>>> +    qp->req.need_rnr_timeout = 0;
>>>>
>>>> Simply setting need_retry = 1 would suffice, if suggestion accepted.
>>>>
>>>>>         rxe_run_task(&qp->req.task, 1);
>>>>>     }
>>>>>     @@ -624,10 +625,11 @@ int rxe_requester(void *arg)
>>>>>             qp->req.need_rd_atomic = 0;
>>>>>             qp->req.wait_psn = 0;
>>>>>             qp->req.need_retry = 0;
>>>>> +        qp->req.need_rnr_timeout = 0;
>>>>>             goto exit;
>>>>>         }
>>>>>     -    if (unlikely(qp->req.need_retry)) {
>>>>> +    if (unlikely(qp->req.need_retry && !qp->req.need_rnr_timeout)) {
>>>>
>>>> This would become (unlikely (qp->req.rnr_retry > 0)) ...
>>>>
>>>>>             req_retry(qp);
>>>>>             qp->req.need_retry = 0;
>>>>>         }
>>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
>>>>> index e7eff1ca75e9..ab3186478c3f 100644
>>>>> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
>>>>> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
>>>>> @@ -123,6 +123,7 @@ struct rxe_req_info {
>>>>>         int            need_rd_atomic;
>>>>>         int            wait_psn;
>>>>>         int            need_retry;
>>>>> +    int            need_rnr_timeout;
>>>>
>>>> Drop
>>>>
>>>>>         int            noack_pkts;
>>>>>         struct rxe_task        task;
>>>>>     };
>>>>>
>>>>> base-commit: c5eb0a61238dd6faf37f58c9ce61c9980aaffd7a
>>>>
>>>>
>>>> Tom.
>>>
>>>
> 
>
Zhu Yanjun May 14, 2022, 2:30 a.m. UTC | #10
在 2022/5/14 2:22, Bob Pearson 写道:
> On 5/13/22 08:38, Yanjun Zhu wrote:
>>
>> 在 2022/5/13 10:40, Yanjun Zhu 写道:
>>> 在 2022/5/13 3:49, Bob Pearson 写道:
>>>> Currently the completer tasklet when it sets the retransmit timer or the
>>>> nak timer sets the same flag (qp->req.need_retry) so that if either
>>>> timer fires it will attempt to perform a retry flow on the send queue.
>>>> This has the effect of responding to an RNR NAK at the first retransmit
>>>> timer event which does not allow for the requested rnr timeout.
>>>>
>>>> This patch adds a new flag (qp->req.need_rnr_timeout) which, if set,
>>>> prevents a retry flow until the rnr nak timer fires.
>>>>
>>>> This patch fixes rnr retry errors which can be observed by running the
>>>> pyverbs test suite 50-100X. With this patch applied they do not occur.
>>> Do you mean that running run_tests.py for 50-100times can reproduce this bug? I want to reproduce this problem.
>> Running run_tests.py for 50-100times, I can not reproduce this problem
> I went back and looked. The occasionally failing test case is:
>
> 	test_rdmacm_async_traffic_external_qp
>
> This test is normally skipped for rxe because of a hack in <path to rdma-core>/tests/base.py. I sent you
> a patch that enables these tests a few days ago.

What is the patch? Can you send me it again?

And what is the symptoms when this problem occurs?

I want to check if I can reproduce this problem.


Thanks and Regards,

Zhu Yanjun

>
> It randomly causes an rnr retry about 1/3 of the time. Most of these are repaired by the retry timer
> going off. A few ~1-2% are not.
>
> Bob
>> Zhu Yanjun
>>
>>> Thanks a lot.
>>> Zhu Yanjun
>>>
>>>> Link: https://lore.kernel.org/linux-rdma/a8287823-1408-4273-bc22-99a0678db640@gmail.com/
>>>> Fixes: 8700e3e7c485 ("Soft RoCE (RXE) - The software RoCE driver")
>>>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>>>> ---
>>>>    drivers/infiniband/sw/rxe/rxe_comp.c  | 4 +---
>>>>    drivers/infiniband/sw/rxe/rxe_qp.c    | 1 +
>>>>    drivers/infiniband/sw/rxe/rxe_req.c   | 6 ++++--
>>>>    drivers/infiniband/sw/rxe/rxe_verbs.h | 1 +
>>>>    4 files changed, 7 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
>>>> index 138b3e7d3a5f..bc668cb211b1 100644
>>>> --- a/drivers/infiniband/sw/rxe/rxe_comp.c
>>>> +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
>>>> @@ -733,9 +733,7 @@ int rxe_completer(void *arg)
>>>>                    if (qp->comp.rnr_retry != 7)
>>>>                        qp->comp.rnr_retry--;
>>>> -                qp->req.need_retry = 1;
>>>> -                pr_debug("qp#%d set rnr nak timer\n",
>>>> -                     qp_num(qp));
>>>> +                qp->req.need_rnr_timeout = 1;
>>>>                    mod_timer(&qp->rnr_nak_timer,
>>>>                          jiffies + rnrnak_jiffies(aeth_syn(pkt)
>>>>                            & ~AETH_TYPE_MASK));
>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
>>>> index 62acf890af6c..1c962468714e 100644
>>>> --- a/drivers/infiniband/sw/rxe/rxe_qp.c
>>>> +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
>>>> @@ -513,6 +513,7 @@ static void rxe_qp_reset(struct rxe_qp *qp)
>>>>        atomic_set(&qp->ssn, 0);
>>>>        qp->req.opcode = -1;
>>>>        qp->req.need_retry = 0;
>>>> +    qp->req.need_rnr_timeout = 0;
>>>>        qp->req.noack_pkts = 0;
>>>>        qp->resp.msn = 0;
>>>>        qp->resp.opcode = -1;
>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
>>>> index ae5fbc79dd5c..770ae4279f73 100644
>>>> --- a/drivers/infiniband/sw/rxe/rxe_req.c
>>>> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
>>>> @@ -103,7 +103,8 @@ void rnr_nak_timer(struct timer_list *t)
>>>>    {
>>>>        struct rxe_qp *qp = from_timer(qp, t, rnr_nak_timer);
>>>> -    pr_debug("qp#%d rnr nak timer fired\n", qp_num(qp));
>>>> +    qp->req.need_retry = 1;
>>>> +    qp->req.need_rnr_timeout = 0;
>>>>        rxe_run_task(&qp->req.task, 1);
>>>>    }
>>>> @@ -624,10 +625,11 @@ int rxe_requester(void *arg)
>>>>            qp->req.need_rd_atomic = 0;
>>>>            qp->req.wait_psn = 0;
>>>>            qp->req.need_retry = 0;
>>>> +        qp->req.need_rnr_timeout = 0;
>>>>            goto exit;
>>>>        }
>>>> -    if (unlikely(qp->req.need_retry)) {
>>>> +    if (unlikely(qp->req.need_retry && !qp->req.need_rnr_timeout)) {
>>>>            req_retry(qp);
>>>>            qp->req.need_retry = 0;
>>>>        }
>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
>>>> index e7eff1ca75e9..ab3186478c3f 100644
>>>> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
>>>> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
>>>> @@ -123,6 +123,7 @@ struct rxe_req_info {
>>>>        int            need_rd_atomic;
>>>>        int            wait_psn;
>>>>        int            need_retry;
>>>> +    int            need_rnr_timeout;
>>>>        int            noack_pkts;
>>>>        struct rxe_task        task;
>>>>    };
>>>>
>>>> base-commit: c5eb0a61238dd6faf37f58c9ce61c9980aaffd7a
Bob Pearson May 14, 2022, 3:25 a.m. UTC | #11
On 5/13/22 21:30, Yanjun Zhu wrote:
> 
> 在 2022/5/14 2:22, Bob Pearson 写道:
>> On 5/13/22 08:38, Yanjun Zhu wrote:
>>>
>>> 在 2022/5/13 10:40, Yanjun Zhu 写道:
>>>> 在 2022/5/13 3:49, Bob Pearson 写道:
>>>>> Currently the completer tasklet when it sets the retransmit timer or the
>>>>> nak timer sets the same flag (qp->req.need_retry) so that if either
>>>>> timer fires it will attempt to perform a retry flow on the send queue.
>>>>> This has the effect of responding to an RNR NAK at the first retransmit
>>>>> timer event which does not allow for the requested rnr timeout.
>>>>>
>>>>> This patch adds a new flag (qp->req.need_rnr_timeout) which, if set,
>>>>> prevents a retry flow until the rnr nak timer fires.
>>>>>
>>>>> This patch fixes rnr retry errors which can be observed by running the
>>>>> pyverbs test suite 50-100X. With this patch applied they do not occur.
>>>> Do you mean that running run_tests.py for 50-100times can reproduce this bug? I want to reproduce this problem.
>>> Running run_tests.py for 50-100times, I can not reproduce this problem
>> I went back and looked. The occasionally failing test case is:
>>
>>     test_rdmacm_async_traffic_external_qp
>>
>> This test is normally skipped for rxe because of a hack in <path to rdma-core>/tests/base.py. I sent you
>> a patch that enables these tests a few days ago.
> 
> What is the patch? Can you send me it again?

diff --git a/tests/base.py b/tests/base.py

index 93568c7f..03b5601e 100644

--- a/tests/base.py

+++ b/tests/base.py

@@ -245,10 +245,11 @@ class RDMATestCase(unittest.TestCase):

         self._add_gids_per_port(ctx, dev, self.ib_port)

 

     def _get_ip_mac(self, dev, port, idx):

-        if not os.path.exists('/sys/class/infiniband/{}/device/net/'.format(dev)):

-            self.args.append([dev, port, idx, None, None])

-            return

-        net_name = self.get_net_name(dev)

+        #if not os.path.exists('/sys/class/infiniband/{}/device/net/'.format(dev)):

+            #self.args.append([dev, port, idx, None, None])

+            #return

+        #net_name = self.get_net_name(dev)

+        net_name = "enp0s3"

         try:

             ip_addr, mac_addr = self.get_ip_mac_address(net_name)

         except (KeyError, IndexError):


Save this to a file (e.g. net_name.patch) and type git apply net_name.patch.

Before doing this change "enp0s3" to the correct name for the Ethernet adapter you are
using in your system. tests/base.py assumes that all roce devices have a file with a path

/sys/class/infiniband/rxe0/device/net/

But this is not always the case and rxe does not have that path. This breaks all the
rdmacm test cases in pyverbs.

> 
> And what is the symptoms when this problem occurs?

When I run

# sudo ./build/bin/run_tests -k test_rdmacm_async_traffic_external_qp several times

I will eventually see an rnr timeout error. You may not have this experience but you
can add print statements to the retransmit_timer and rnr_timer routines in rxe_comp.c
and rxe_req.c respectively. You should see that without the below patch you will
never see the rnr_timer fire and you will see lots of retransmit_timer fires.

The current code responds to the retry timer by initiating a send queue retry and
the test will usually pass if it has sent an RNR NAK and tear down the rnr timer
before it have a chance to fire.

Bob
> 
> I want to check if I can reproduce this problem.
> 
> 
> Thanks and Regards,
> 
> Zhu Yanjun

Thanks

Bob
> 
>>
>> It randomly causes an rnr retry about 1/3 of the time. Most of these are repaired by the retry timer
>> going off. A few ~1-2% are not.
>>
>> Bob
>>> Zhu Yanjun
>>>
>>>> Thanks a lot.
>>>> Zhu Yanjun
>>>>
>>>>> Link: https://lore.kernel.org/linux-rdma/a8287823-1408-4273-bc22-99a0678db640@gmail.com/
>>>>> Fixes: 8700e3e7c485 ("Soft RoCE (RXE) - The software RoCE driver")
>>>>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>>>>> ---
>>>>>    drivers/infiniband/sw/rxe/rxe_comp.c  | 4 +---
>>>>>    drivers/infiniband/sw/rxe/rxe_qp.c    | 1 +
>>>>>    drivers/infiniband/sw/rxe/rxe_req.c   | 6 ++++--
>>>>>    drivers/infiniband/sw/rxe/rxe_verbs.h | 1 +
>>>>>    4 files changed, 7 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
>>>>> index 138b3e7d3a5f..bc668cb211b1 100644
>>>>> --- a/drivers/infiniband/sw/rxe/rxe_comp.c
>>>>> +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
>>>>> @@ -733,9 +733,7 @@ int rxe_completer(void *arg)
>>>>>                    if (qp->comp.rnr_retry != 7)
>>>>>                        qp->comp.rnr_retry--;
>>>>> -                qp->req.need_retry = 1;
>>>>> -                pr_debug("qp#%d set rnr nak timer\n",
>>>>> -                     qp_num(qp));
>>>>> +                qp->req.need_rnr_timeout = 1;
>>>>>                    mod_timer(&qp->rnr_nak_timer,
>>>>>                          jiffies + rnrnak_jiffies(aeth_syn(pkt)
>>>>>                            & ~AETH_TYPE_MASK));
>>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
>>>>> index 62acf890af6c..1c962468714e 100644
>>>>> --- a/drivers/infiniband/sw/rxe/rxe_qp.c
>>>>> +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
>>>>> @@ -513,6 +513,7 @@ static void rxe_qp_reset(struct rxe_qp *qp)
>>>>>        atomic_set(&qp->ssn, 0);
>>>>>        qp->req.opcode = -1;
>>>>>        qp->req.need_retry = 0;
>>>>> +    qp->req.need_rnr_timeout = 0;
>>>>>        qp->req.noack_pkts = 0;
>>>>>        qp->resp.msn = 0;
>>>>>        qp->resp.opcode = -1;
>>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
>>>>> index ae5fbc79dd5c..770ae4279f73 100644
>>>>> --- a/drivers/infiniband/sw/rxe/rxe_req.c
>>>>> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
>>>>> @@ -103,7 +103,8 @@ void rnr_nak_timer(struct timer_list *t)
>>>>>    {
>>>>>        struct rxe_qp *qp = from_timer(qp, t, rnr_nak_timer);
>>>>> -    pr_debug("qp#%d rnr nak timer fired\n", qp_num(qp));
>>>>> +    qp->req.need_retry = 1;
>>>>> +    qp->req.need_rnr_timeout = 0;
>>>>>        rxe_run_task(&qp->req.task, 1);
>>>>>    }
>>>>> @@ -624,10 +625,11 @@ int rxe_requester(void *arg)
>>>>>            qp->req.need_rd_atomic = 0;
>>>>>            qp->req.wait_psn = 0;
>>>>>            qp->req.need_retry = 0;
>>>>> +        qp->req.need_rnr_timeout = 0;
>>>>>            goto exit;
>>>>>        }
>>>>> -    if (unlikely(qp->req.need_retry)) {
>>>>> +    if (unlikely(qp->req.need_retry && !qp->req.need_rnr_timeout)) {
>>>>>            req_retry(qp);
>>>>>            qp->req.need_retry = 0;
>>>>>        }
>>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
>>>>> index e7eff1ca75e9..ab3186478c3f 100644
>>>>> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
>>>>> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
>>>>> @@ -123,6 +123,7 @@ struct rxe_req_info {
>>>>>        int            need_rd_atomic;
>>>>>        int            wait_psn;
>>>>>        int            need_retry;
>>>>> +    int            need_rnr_timeout;
>>>>>        int            noack_pkts;
>>>>>        struct rxe_task        task;
>>>>>    };
>>>>>
>>>>> base-commit: c5eb0a61238dd6faf37f58c9ce61c9980aaffd7a
diff mbox series

Patch

diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index 138b3e7d3a5f..bc668cb211b1 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -733,9 +733,7 @@  int rxe_completer(void *arg)
 				if (qp->comp.rnr_retry != 7)
 					qp->comp.rnr_retry--;
 
-				qp->req.need_retry = 1;
-				pr_debug("qp#%d set rnr nak timer\n",
-					 qp_num(qp));
+				qp->req.need_rnr_timeout = 1;
 				mod_timer(&qp->rnr_nak_timer,
 					  jiffies + rnrnak_jiffies(aeth_syn(pkt)
 						& ~AETH_TYPE_MASK));
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 62acf890af6c..1c962468714e 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -513,6 +513,7 @@  static void rxe_qp_reset(struct rxe_qp *qp)
 	atomic_set(&qp->ssn, 0);
 	qp->req.opcode = -1;
 	qp->req.need_retry = 0;
+	qp->req.need_rnr_timeout = 0;
 	qp->req.noack_pkts = 0;
 	qp->resp.msn = 0;
 	qp->resp.opcode = -1;
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index ae5fbc79dd5c..770ae4279f73 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -103,7 +103,8 @@  void rnr_nak_timer(struct timer_list *t)
 {
 	struct rxe_qp *qp = from_timer(qp, t, rnr_nak_timer);
 
-	pr_debug("qp#%d rnr nak timer fired\n", qp_num(qp));
+	qp->req.need_retry = 1;
+	qp->req.need_rnr_timeout = 0;
 	rxe_run_task(&qp->req.task, 1);
 }
 
@@ -624,10 +625,11 @@  int rxe_requester(void *arg)
 		qp->req.need_rd_atomic = 0;
 		qp->req.wait_psn = 0;
 		qp->req.need_retry = 0;
+		qp->req.need_rnr_timeout = 0;
 		goto exit;
 	}
 
-	if (unlikely(qp->req.need_retry)) {
+	if (unlikely(qp->req.need_retry && !qp->req.need_rnr_timeout)) {
 		req_retry(qp);
 		qp->req.need_retry = 0;
 	}
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index e7eff1ca75e9..ab3186478c3f 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -123,6 +123,7 @@  struct rxe_req_info {
 	int			need_rd_atomic;
 	int			wait_psn;
 	int			need_retry;
+	int			need_rnr_timeout;
 	int			noack_pkts;
 	struct rxe_task		task;
 };