diff mbox series

RDMA/rxe: Delete error messages triggered by incoming Read requests

Message ID 20220829054413.1630495-1-matsuda-daisuke@fujitsu.com (mailing list archive)
State Superseded
Headers show
Series RDMA/rxe: Delete error messages triggered by incoming Read requests | expand

Commit Message

Daisuke Matsuda (Fujitsu) Aug. 29, 2022, 5:44 a.m. UTC
An incoming Read request causes multiple Read responses. If a user MR to
copy data from is unavailable or responder cannot send a reply, then the
error messages can be printed for each response attempt, resulting in
message overflow.

Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
---
 drivers/infiniband/sw/rxe/rxe_resp.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

Comments

Cheng Xu Aug. 29, 2022, 5:51 a.m. UTC | #1
On 8/29/22 1:44 PM, Daisuke Matsuda wrote:
> An incoming Read request causes multiple Read responses. If a user MR to
> copy data from is unavailable or responder cannot send a reply, then the
> error messages can be printed for each response attempt, resulting in
> message overflow.
> 
> Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> ---
>  drivers/infiniband/sw/rxe/rxe_resp.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
> index b36ec5c4d5e0..4b3e8aec2fb8 100644
> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>  
>  	err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
>  			  payload, RXE_FROM_MR_OBJ);
> -	if (err)
> -		pr_err("Failed copying memory\n");

The err is set but not used. Below is better:

-  	err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
-  			  payload, RXE_FROM_MR_OBJ);
+	rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
+		    payload, RXE_FROM_MR_OBJ);

Thanks,
Cheng Xu

>  	if (mr)
>  		rxe_put(mr);
>  
> @@ -823,10 +821,8 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>  	}
>  
>  	err = rxe_xmit_packet(qp, &ack_pkt, skb);
> -	if (err) {
> -		pr_err("Failed sending RDMA reply.\n");
> +	if (err)
>  		return RESPST_ERR_RNR;
> -	}
>  
>  	res->read.va += payload;
>  	res->read.resid -= payload;
Zhu Yanjun Aug. 29, 2022, 6:31 a.m. UTC | #2
On Mon, Aug 29, 2022 at 1:45 PM Daisuke Matsuda
<matsuda-daisuke@fujitsu.com> wrote:
>
> An incoming Read request causes multiple Read responses. If a user MR to
> copy data from is unavailable or responder cannot send a reply, then the
> error messages can be printed for each response attempt, resulting in
> message overflow.
>
> Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> ---
>  drivers/infiniband/sw/rxe/rxe_resp.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
> index b36ec5c4d5e0..4b3e8aec2fb8 100644
> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>
>         err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
>                           payload, RXE_FROM_MR_OBJ);
> -       if (err)
> -               pr_err("Failed copying memory\n");

pr_err_once is better?

>         if (mr)
>                 rxe_put(mr);
>
> @@ -823,10 +821,8 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>         }
>
>         err = rxe_xmit_packet(qp, &ack_pkt, skb);
> -       if (err) {
> -               pr_err("Failed sending RDMA reply.\n");

The same with the above

Zhu Yanjun

> +       if (err)
>                 return RESPST_ERR_RNR;
> -       }
>
>         res->read.va += payload;
>         res->read.resid -= payload;
> --
> 2.31.1
>
Leon Romanovsky Aug. 29, 2022, 7:02 a.m. UTC | #3
On Mon, Aug 29, 2022 at 02:31:00PM +0800, Zhu Yanjun wrote:
> On Mon, Aug 29, 2022 at 1:45 PM Daisuke Matsuda
> <matsuda-daisuke@fujitsu.com> wrote:
> >
> > An incoming Read request causes multiple Read responses. If a user MR to
> > copy data from is unavailable or responder cannot send a reply, then the
> > error messages can be printed for each response attempt, resulting in
> > message overflow.
> >
> > Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> > ---
> >  drivers/infiniband/sw/rxe/rxe_resp.c | 6 +-----
> >  1 file changed, 1 insertion(+), 5 deletions(-)
> >
> > diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
> > index b36ec5c4d5e0..4b3e8aec2fb8 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> > +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> > @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp,
> >
> >         err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
> >                           payload, RXE_FROM_MR_OBJ);
> > -       if (err)
> > -               pr_err("Failed copying memory\n");
> 
> pr_err_once is better?

Like Jason said, there shouldn't any prints in network triggered flows.

Thanks
Zhu Yanjun Aug. 29, 2022, 7:13 a.m. UTC | #4
On Mon, Aug 29, 2022 at 3:02 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Mon, Aug 29, 2022 at 02:31:00PM +0800, Zhu Yanjun wrote:
> > On Mon, Aug 29, 2022 at 1:45 PM Daisuke Matsuda
> > <matsuda-daisuke@fujitsu.com> wrote:
> > >
> > > An incoming Read request causes multiple Read responses. If a user MR to
> > > copy data from is unavailable or responder cannot send a reply, then the
> > > error messages can be printed for each response attempt, resulting in
> > > message overflow.
> > >
> > > Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> > > ---
> > >  drivers/infiniband/sw/rxe/rxe_resp.c | 6 +-----
> > >  1 file changed, 1 insertion(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
> > > index b36ec5c4d5e0..4b3e8aec2fb8 100644
> > > --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> > > +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> > > @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp,
> > >
> > >         err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
> > >                           payload, RXE_FROM_MR_OBJ);
> > > -       if (err)
> > > -               pr_err("Failed copying memory\n");
> >
> > pr_err_once is better?
>
> Like Jason said, there shouldn't any prints in network triggered flows.
>

Ok, thanks.

Zhu Yanjun
> Thanks
Li Zhijian Aug. 29, 2022, 7:36 a.m. UTC | #5
On 29/08/2022 13:44, Daisuke Matsuda wrote:
> An incoming Read request causes multiple Read responses. If a user MR to
> copy data from is unavailable or responder cannot send a reply, then the
> error messages can be printed for each response attempt, resulting in
> message overflow.
>
> Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> ---
>   drivers/infiniband/sw/rxe/rxe_resp.c | 6 +-----
>   1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
> index b36ec5c4d5e0..4b3e8aec2fb8 100644
> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>   
>   	err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
>   			  payload, RXE_FROM_MR_OBJ);
> -	if (err)
> -		pr_err("Failed copying memory\n");
Not relate to this patch.
I'm wondering why this err is ignored, rxe_mr_copy() does the real execution or rxe_mr_copy() would never fail ?
IMO, when err happens, responder shall notify the request anyhow.

Thanks
Zhijian

>   	if (mr)
>   		rxe_put(mr);
>   
> @@ -823,10 +821,8 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>   	}
>   
>   	err = rxe_xmit_packet(qp, &ack_pkt, skb);
> -	if (err) {
> -		pr_err("Failed sending RDMA reply.\n");
> +	if (err)
>   		return RESPST_ERR_RNR;
> -	}
>   
>   	res->read.va += payload;
>   	res->read.resid -= payload;
Daisuke Matsuda (Fujitsu) Aug. 29, 2022, 10:21 a.m. UTC | #6
On Monday, August 29, 2022 4:36 PM, Li Zhijian wrote:
> On 29/08/2022 13:44, Daisuke Matsuda wrote:
> > An incoming Read request causes multiple Read responses. If a user MR to
> > copy data from is unavailable or responder cannot send a reply, then the
> > error messages can be printed for each response attempt, resulting in
> > message overflow.
> >
> > Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> > ---
> >   drivers/infiniband/sw/rxe/rxe_resp.c | 6 +-----
> >   1 file changed, 1 insertion(+), 5 deletions(-)
> >
> > diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
> > index b36ec5c4d5e0..4b3e8aec2fb8 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> > +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> > @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp,
> >
> >   	err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
> >   			  payload, RXE_FROM_MR_OBJ);
> > -	if (err)
> > -		pr_err("Failed copying memory\n");
> Not relate to this patch.
> I'm wondering why this err is ignored, rxe_mr_copy() does the real execution or rxe_mr_copy() would never fail ?
> IMO, when err happens, responder shall notify the request anyhow.

Practically, I have never seen rxe_mr_copy() failed before,
but I agree the implementation may be incorrect as you mentioned.

As far as I tested, responder replied with the requested amount of payloads
even when rxe_mr_copy() is modified to fail. In this case, 
requester may mistakenly believe that they get data correctly.

For more details, see IB Specification Vol 1-Revision-1.5 Ch.9.7.5.1.3 (page.334).

Daisuke Matsuda

> 
> Thanks
> Zhijian
> 
> >   	if (mr)
> >   		rxe_put(mr);
> >
> > @@ -823,10 +821,8 @@ static enum resp_states read_reply(struct rxe_qp *qp,
> >   	}
> >
> >   	err = rxe_xmit_packet(qp, &ack_pkt, skb);
> > -	if (err) {
> > -		pr_err("Failed sending RDMA reply.\n");
> > +	if (err)
> >   		return RESPST_ERR_RNR;
> > -	}
> >
> >   	res->read.va += payload;
> >   	res->read.resid -= payload;
Li Zhijian Aug. 30, 2022, 9:44 a.m. UTC | #7
On 29/08/2022 18:21, Matsuda, Daisuke/松田 大輔 wrote:
> On Monday, August 29, 2022 4:36 PM, Li Zhijian wrote:
>> On 29/08/2022 13:44, Daisuke Matsuda wrote:
>>> An incoming Read request causes multiple Read responses. If a user MR to
>>> copy data from is unavailable or responder cannot send a reply, then the
>>> error messages can be printed for each response attempt, resulting in
>>> message overflow.
>>>
>>> Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
>>> ---
>>>    drivers/infiniband/sw/rxe/rxe_resp.c | 6 +-----
>>>    1 file changed, 1 insertion(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
>>> index b36ec5c4d5e0..4b3e8aec2fb8 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
>>> @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>>
>>>    	err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
>>>    			  payload, RXE_FROM_MR_OBJ);
>>> -	if (err)
>>> -		pr_err("Failed copying memory\n");
>> Not relate to this patch.
>> I'm wondering why this err is ignored, rxe_mr_copy() does the real execution or rxe_mr_copy() would never fail ?
>> IMO, when err happens, responder shall notify the request anyhow.
> Practically, I have never seen rxe_mr_copy() failed before,
> but I agree the implementation may be incorrect as you mentioned.
>
> As far as I tested, responder replied with the requested amount of payloads
> even when rxe_mr_copy() is modified to fail. In this case,
> requester may mistakenly believe that they get data correctly.
>
> For more details, see IB Specification Vol 1-Revision-1.5 Ch.9.7.5.1.3 (page.334).

it seems it's suitable to reply NAK code "REMOTE ACCESS ERROR" to the requester side
by returning RESPST_ERR_RKEY_VIOLATION here.

see "9.7.5.2.4 REMOTE ACCESS ERROR" and "9.7.4.1.5 RESPONDER R_KEY VALIDATION"



>
> Daisuke Matsuda
>
>> Thanks
>> Zhijian
>>
>>>    	if (mr)
>>>    		rxe_put(mr);
>>>
>>> @@ -823,10 +821,8 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>>    	}
>>>
>>>    	err = rxe_xmit_packet(qp, &ack_pkt, skb);
>>> -	if (err) {
>>> -		pr_err("Failed sending RDMA reply.\n");
>>> +	if (err)
>>>    		return RESPST_ERR_RNR;
>>> -	}
>>>
>>>    	res->read.va += payload;
>>>    	res->read.resid -= payload;
Li Zhijian Aug. 30, 2022, 9:54 a.m. UTC | #8
On 30/08/2022 17:44, Li Zhijian wrote:
>
>
> On 29/08/2022 18:21, Matsuda, Daisuke/松田 大輔 wrote:
>> On Monday, August 29, 2022 4:36 PM, Li Zhijian wrote:
>>> On 29/08/2022 13:44, Daisuke Matsuda wrote:
>>>> An incoming Read request causes multiple Read responses. If a user MR to
>>>> copy data from is unavailable or responder cannot send a reply, then the
>>>> error messages can be printed for each response attempt, resulting in
>>>> message overflow.
>>>>
>>>> Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
>>>> ---
>>>>    drivers/infiniband/sw/rxe/rxe_resp.c | 6 +-----
>>>>    1 file changed, 1 insertion(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
>>>> index b36ec5c4d5e0..4b3e8aec2fb8 100644
>>>> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
>>>> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
>>>> @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>>>
>>>>        err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
>>>>                  payload, RXE_FROM_MR_OBJ);

Looks i was missing the the faulting point inside rxe_mr_copy()

mr_check_range() is the only point to return an error inside rxe_mr_copy()
and mr_check_range() would never fail in this moment, since it is always tested by RESPST_CHK_RKEY
before calling read_reply()

so it's safe to remove this print, and add some comments ?

Thanks
Zhijian



>>>> -    if (err)
>>>> -        pr_err("Failed copying memory\n");
>>> Not relate to this patch.
>>> I'm wondering why this err is ignored, rxe_mr_copy() does the real execution or rxe_mr_copy() would never fail ?
>>> IMO, when err happens, responder shall notify the request anyhow.
>> Practically, I have never seen rxe_mr_copy() failed before,
>> but I agree the implementation may be incorrect as you mentioned.
>>
>> As far as I tested, responder replied with the requested amount of payloads
>> even when rxe_mr_copy() is modified to fail. In this case,
>> requester may mistakenly believe that they get data correctly.
>>
>> For more details, see IB Specification Vol 1-Revision-1.5 Ch.9.7.5.1.3 (page.334).
>
> it seems it's suitable to reply NAK code "REMOTE ACCESS ERROR" to the requester side
> by returning RESPST_ERR_RKEY_VIOLATION here.
>
> see "9.7.5.2.4 REMOTE ACCESS ERROR" and "9.7.4.1.5 RESPONDER R_KEY VALIDATION"
>
>
>
>>
>> Daisuke Matsuda
>>
>>> Thanks
>>> Zhijian
>>>
>>>>        if (mr)
>>>>            rxe_put(mr);
>>>>
>>>> @@ -823,10 +821,8 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>>>        }
>>>>
>>>>        err = rxe_xmit_packet(qp, &ack_pkt, skb);
>>>> -    if (err) {
>>>> -        pr_err("Failed sending RDMA reply.\n");
>>>> +    if (err)
>>>>            return RESPST_ERR_RNR;
>>>> -    }
>>>>
>>>>        res->read.va += payload;
>>>>        res->read.resid -= payload;
>
diff mbox series

Patch

diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index b36ec5c4d5e0..4b3e8aec2fb8 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -811,8 +811,6 @@  static enum resp_states read_reply(struct rxe_qp *qp,
 
 	err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
 			  payload, RXE_FROM_MR_OBJ);
-	if (err)
-		pr_err("Failed copying memory\n");
 	if (mr)
 		rxe_put(mr);
 
@@ -823,10 +821,8 @@  static enum resp_states read_reply(struct rxe_qp *qp,
 	}
 
 	err = rxe_xmit_packet(qp, &ack_pkt, skb);
-	if (err) {
-		pr_err("Failed sending RDMA reply.\n");
+	if (err)
 		return RESPST_ERR_RNR;
-	}
 
 	res->read.va += payload;
 	res->read.resid -= payload;