Message ID | 20220829054413.1630495-1-matsuda-daisuke@fujitsu.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | RDMA/rxe: Delete error messages triggered by incoming Read requests | expand |
On 8/29/22 1:44 PM, Daisuke Matsuda wrote: > An incoming Read request causes multiple Read responses. If a user MR to > copy data from is unavailable or responder cannot send a reply, then the > error messages can be printed for each response attempt, resulting in > message overflow. > > Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> > --- > drivers/infiniband/sw/rxe/rxe_resp.c | 6 +----- > 1 file changed, 1 insertion(+), 5 deletions(-) > > diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c > index b36ec5c4d5e0..4b3e8aec2fb8 100644 > --- a/drivers/infiniband/sw/rxe/rxe_resp.c > +++ b/drivers/infiniband/sw/rxe/rxe_resp.c > @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp, > > err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt), > payload, RXE_FROM_MR_OBJ); > - if (err) > - pr_err("Failed copying memory\n"); The err is set but not used. Below is better: - err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt), - payload, RXE_FROM_MR_OBJ); + rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt), + payload, RXE_FROM_MR_OBJ); Thanks, Cheng Xu > if (mr) > rxe_put(mr); > > @@ -823,10 +821,8 @@ static enum resp_states read_reply(struct rxe_qp *qp, > } > > err = rxe_xmit_packet(qp, &ack_pkt, skb); > - if (err) { > - pr_err("Failed sending RDMA reply.\n"); > + if (err) > return RESPST_ERR_RNR; > - } > > res->read.va += payload; > res->read.resid -= payload;
On Mon, Aug 29, 2022 at 1:45 PM Daisuke Matsuda <matsuda-daisuke@fujitsu.com> wrote: > > An incoming Read request causes multiple Read responses. If a user MR to > copy data from is unavailable or responder cannot send a reply, then the > error messages can be printed for each response attempt, resulting in > message overflow. > > Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> > --- > drivers/infiniband/sw/rxe/rxe_resp.c | 6 +----- > 1 file changed, 1 insertion(+), 5 deletions(-) > > diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c > index b36ec5c4d5e0..4b3e8aec2fb8 100644 > --- a/drivers/infiniband/sw/rxe/rxe_resp.c > +++ b/drivers/infiniband/sw/rxe/rxe_resp.c > @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp, > > err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt), > payload, RXE_FROM_MR_OBJ); > - if (err) > - pr_err("Failed copying memory\n"); pr_err_once is better? > if (mr) > rxe_put(mr); > > @@ -823,10 +821,8 @@ static enum resp_states read_reply(struct rxe_qp *qp, > } > > err = rxe_xmit_packet(qp, &ack_pkt, skb); > - if (err) { > - pr_err("Failed sending RDMA reply.\n"); The same with the above Zhu Yanjun > + if (err) > return RESPST_ERR_RNR; > - } > > res->read.va += payload; > res->read.resid -= payload; > -- > 2.31.1 >
On Mon, Aug 29, 2022 at 02:31:00PM +0800, Zhu Yanjun wrote: > On Mon, Aug 29, 2022 at 1:45 PM Daisuke Matsuda > <matsuda-daisuke@fujitsu.com> wrote: > > > > An incoming Read request causes multiple Read responses. If a user MR to > > copy data from is unavailable or responder cannot send a reply, then the > > error messages can be printed for each response attempt, resulting in > > message overflow. > > > > Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> > > --- > > drivers/infiniband/sw/rxe/rxe_resp.c | 6 +----- > > 1 file changed, 1 insertion(+), 5 deletions(-) > > > > diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c > > index b36ec5c4d5e0..4b3e8aec2fb8 100644 > > --- a/drivers/infiniband/sw/rxe/rxe_resp.c > > +++ b/drivers/infiniband/sw/rxe/rxe_resp.c > > @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp, > > > > err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt), > > payload, RXE_FROM_MR_OBJ); > > - if (err) > > - pr_err("Failed copying memory\n"); > > pr_err_once is better? Like Jason said, there shouldn't any prints in network triggered flows. Thanks
On Mon, Aug 29, 2022 at 3:02 PM Leon Romanovsky <leon@kernel.org> wrote: > > On Mon, Aug 29, 2022 at 02:31:00PM +0800, Zhu Yanjun wrote: > > On Mon, Aug 29, 2022 at 1:45 PM Daisuke Matsuda > > <matsuda-daisuke@fujitsu.com> wrote: > > > > > > An incoming Read request causes multiple Read responses. If a user MR to > > > copy data from is unavailable or responder cannot send a reply, then the > > > error messages can be printed for each response attempt, resulting in > > > message overflow. > > > > > > Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> > > > --- > > > drivers/infiniband/sw/rxe/rxe_resp.c | 6 +----- > > > 1 file changed, 1 insertion(+), 5 deletions(-) > > > > > > diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c > > > index b36ec5c4d5e0..4b3e8aec2fb8 100644 > > > --- a/drivers/infiniband/sw/rxe/rxe_resp.c > > > +++ b/drivers/infiniband/sw/rxe/rxe_resp.c > > > @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp, > > > > > > err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt), > > > payload, RXE_FROM_MR_OBJ); > > > - if (err) > > > - pr_err("Failed copying memory\n"); > > > > pr_err_once is better? > > Like Jason said, there shouldn't any prints in network triggered flows. > Ok, thanks. Zhu Yanjun > Thanks
On 29/08/2022 13:44, Daisuke Matsuda wrote: > An incoming Read request causes multiple Read responses. If a user MR to > copy data from is unavailable or responder cannot send a reply, then the > error messages can be printed for each response attempt, resulting in > message overflow. > > Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> > --- > drivers/infiniband/sw/rxe/rxe_resp.c | 6 +----- > 1 file changed, 1 insertion(+), 5 deletions(-) > > diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c > index b36ec5c4d5e0..4b3e8aec2fb8 100644 > --- a/drivers/infiniband/sw/rxe/rxe_resp.c > +++ b/drivers/infiniband/sw/rxe/rxe_resp.c > @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp, > > err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt), > payload, RXE_FROM_MR_OBJ); > - if (err) > - pr_err("Failed copying memory\n"); Not relate to this patch. I'm wondering why this err is ignored, rxe_mr_copy() does the real execution or rxe_mr_copy() would never fail ? IMO, when err happens, responder shall notify the request anyhow. Thanks Zhijian > if (mr) > rxe_put(mr); > > @@ -823,10 +821,8 @@ static enum resp_states read_reply(struct rxe_qp *qp, > } > > err = rxe_xmit_packet(qp, &ack_pkt, skb); > - if (err) { > - pr_err("Failed sending RDMA reply.\n"); > + if (err) > return RESPST_ERR_RNR; > - } > > res->read.va += payload; > res->read.resid -= payload;
On Monday, August 29, 2022 4:36 PM, Li Zhijian wrote: > On 29/08/2022 13:44, Daisuke Matsuda wrote: > > An incoming Read request causes multiple Read responses. If a user MR to > > copy data from is unavailable or responder cannot send a reply, then the > > error messages can be printed for each response attempt, resulting in > > message overflow. > > > > Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> > > --- > > drivers/infiniband/sw/rxe/rxe_resp.c | 6 +----- > > 1 file changed, 1 insertion(+), 5 deletions(-) > > > > diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c > > index b36ec5c4d5e0..4b3e8aec2fb8 100644 > > --- a/drivers/infiniband/sw/rxe/rxe_resp.c > > +++ b/drivers/infiniband/sw/rxe/rxe_resp.c > > @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp, > > > > err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt), > > payload, RXE_FROM_MR_OBJ); > > - if (err) > > - pr_err("Failed copying memory\n"); > Not relate to this patch. > I'm wondering why this err is ignored, rxe_mr_copy() does the real execution or rxe_mr_copy() would never fail ? > IMO, when err happens, responder shall notify the request anyhow. Practically, I have never seen rxe_mr_copy() failed before, but I agree the implementation may be incorrect as you mentioned. As far as I tested, responder replied with the requested amount of payloads even when rxe_mr_copy() is modified to fail. In this case, requester may mistakenly believe that they get data correctly. For more details, see IB Specification Vol 1-Revision-1.5 Ch.9.7.5.1.3 (page.334). Daisuke Matsuda > > Thanks > Zhijian > > > if (mr) > > rxe_put(mr); > > > > @@ -823,10 +821,8 @@ static enum resp_states read_reply(struct rxe_qp *qp, > > } > > > > err = rxe_xmit_packet(qp, &ack_pkt, skb); > > - if (err) { > > - pr_err("Failed sending RDMA reply.\n"); > > + if (err) > > return RESPST_ERR_RNR; > > - } > > > > res->read.va += payload; > > res->read.resid -= payload;
On 29/08/2022 18:21, Matsuda, Daisuke/松田 大輔 wrote: > On Monday, August 29, 2022 4:36 PM, Li Zhijian wrote: >> On 29/08/2022 13:44, Daisuke Matsuda wrote: >>> An incoming Read request causes multiple Read responses. If a user MR to >>> copy data from is unavailable or responder cannot send a reply, then the >>> error messages can be printed for each response attempt, resulting in >>> message overflow. >>> >>> Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> >>> --- >>> drivers/infiniband/sw/rxe/rxe_resp.c | 6 +----- >>> 1 file changed, 1 insertion(+), 5 deletions(-) >>> >>> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c >>> index b36ec5c4d5e0..4b3e8aec2fb8 100644 >>> --- a/drivers/infiniband/sw/rxe/rxe_resp.c >>> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c >>> @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp, >>> >>> err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt), >>> payload, RXE_FROM_MR_OBJ); >>> - if (err) >>> - pr_err("Failed copying memory\n"); >> Not relate to this patch. >> I'm wondering why this err is ignored, rxe_mr_copy() does the real execution or rxe_mr_copy() would never fail ? >> IMO, when err happens, responder shall notify the request anyhow. > Practically, I have never seen rxe_mr_copy() failed before, > but I agree the implementation may be incorrect as you mentioned. > > As far as I tested, responder replied with the requested amount of payloads > even when rxe_mr_copy() is modified to fail. In this case, > requester may mistakenly believe that they get data correctly. > > For more details, see IB Specification Vol 1-Revision-1.5 Ch.9.7.5.1.3 (page.334). it seems it's suitable to reply NAK code "REMOTE ACCESS ERROR" to the requester side by returning RESPST_ERR_RKEY_VIOLATION here. see "9.7.5.2.4 REMOTE ACCESS ERROR" and "9.7.4.1.5 RESPONDER R_KEY VALIDATION" > > Daisuke Matsuda > >> Thanks >> Zhijian >> >>> if (mr) >>> rxe_put(mr); >>> >>> @@ -823,10 +821,8 @@ static enum resp_states read_reply(struct rxe_qp *qp, >>> } >>> >>> err = rxe_xmit_packet(qp, &ack_pkt, skb); >>> - if (err) { >>> - pr_err("Failed sending RDMA reply.\n"); >>> + if (err) >>> return RESPST_ERR_RNR; >>> - } >>> >>> res->read.va += payload; >>> res->read.resid -= payload;
On 30/08/2022 17:44, Li Zhijian wrote: > > > On 29/08/2022 18:21, Matsuda, Daisuke/松田 大輔 wrote: >> On Monday, August 29, 2022 4:36 PM, Li Zhijian wrote: >>> On 29/08/2022 13:44, Daisuke Matsuda wrote: >>>> An incoming Read request causes multiple Read responses. If a user MR to >>>> copy data from is unavailable or responder cannot send a reply, then the >>>> error messages can be printed for each response attempt, resulting in >>>> message overflow. >>>> >>>> Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> >>>> --- >>>> drivers/infiniband/sw/rxe/rxe_resp.c | 6 +----- >>>> 1 file changed, 1 insertion(+), 5 deletions(-) >>>> >>>> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c >>>> index b36ec5c4d5e0..4b3e8aec2fb8 100644 >>>> --- a/drivers/infiniband/sw/rxe/rxe_resp.c >>>> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c >>>> @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp, >>>> >>>> err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt), >>>> payload, RXE_FROM_MR_OBJ); Looks i was missing the the faulting point inside rxe_mr_copy() mr_check_range() is the only point to return an error inside rxe_mr_copy() and mr_check_range() would never fail in this moment, since it is always tested by RESPST_CHK_RKEY before calling read_reply() so it's safe to remove this print, and add some comments ? Thanks Zhijian >>>> - if (err) >>>> - pr_err("Failed copying memory\n"); >>> Not relate to this patch. >>> I'm wondering why this err is ignored, rxe_mr_copy() does the real execution or rxe_mr_copy() would never fail ? >>> IMO, when err happens, responder shall notify the request anyhow. >> Practically, I have never seen rxe_mr_copy() failed before, >> but I agree the implementation may be incorrect as you mentioned. >> >> As far as I tested, responder replied with the requested amount of payloads >> even when rxe_mr_copy() is modified to fail. In this case, >> requester may mistakenly believe that they get data correctly. >> >> For more details, see IB Specification Vol 1-Revision-1.5 Ch.9.7.5.1.3 (page.334). > > it seems it's suitable to reply NAK code "REMOTE ACCESS ERROR" to the requester side > by returning RESPST_ERR_RKEY_VIOLATION here. > > see "9.7.5.2.4 REMOTE ACCESS ERROR" and "9.7.4.1.5 RESPONDER R_KEY VALIDATION" > > > >> >> Daisuke Matsuda >> >>> Thanks >>> Zhijian >>> >>>> if (mr) >>>> rxe_put(mr); >>>> >>>> @@ -823,10 +821,8 @@ static enum resp_states read_reply(struct rxe_qp *qp, >>>> } >>>> >>>> err = rxe_xmit_packet(qp, &ack_pkt, skb); >>>> - if (err) { >>>> - pr_err("Failed sending RDMA reply.\n"); >>>> + if (err) >>>> return RESPST_ERR_RNR; >>>> - } >>>> >>>> res->read.va += payload; >>>> res->read.resid -= payload; >
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c index b36ec5c4d5e0..4b3e8aec2fb8 100644 --- a/drivers/infiniband/sw/rxe/rxe_resp.c +++ b/drivers/infiniband/sw/rxe/rxe_resp.c @@ -811,8 +811,6 @@ static enum resp_states read_reply(struct rxe_qp *qp, err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt), payload, RXE_FROM_MR_OBJ); - if (err) - pr_err("Failed copying memory\n"); if (mr) rxe_put(mr); @@ -823,10 +821,8 @@ static enum resp_states read_reply(struct rxe_qp *qp, } err = rxe_xmit_packet(qp, &ack_pkt, skb); - if (err) { - pr_err("Failed sending RDMA reply.\n"); + if (err) return RESPST_ERR_RNR; - } res->read.va += payload; res->read.resid -= payload;
An incoming Read request causes multiple Read responses. If a user MR to copy data from is unavailable or responder cannot send a reply, then the error messages can be printed for each response attempt, resulting in message overflow. Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> --- drivers/infiniband/sw/rxe/rxe_resp.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-)