diff mbox series

rxe: fix error completion wr_id and qp_num

Message ID 20181025194057.16973-1-sagi@grimberg.me (mailing list archive)
State Accepted
Commit e48d8ed9c6193502d849b35767fd18e20bbd7ba2
Headers show
Series rxe: fix error completion wr_id and qp_num | expand

Commit Message

Sagi Grimberg Oct. 25, 2018, 7:40 p.m. UTC
Error completions must still contain a valid wr_id and
qp_num such that the consumer can rely on. Correctly
fill these fields in receive error completions.

Reported-by: Walker Benjamin <benjamin.walker@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/infiniband/sw/rxe/rxe_resp.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

Comments

Zhu Yanjun Oct. 26, 2018, 8:23 a.m. UTC | #1
On 2018/10/26 3:40, Sagi Grimberg wrote:
> Error completions must still contain a valid wr_id and
> qp_num such that the consumer can rely on. Correctly
> fill these fields in receive error completions.
>
> Reported-by: Walker Benjamin <benjamin.walker@intel.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Tested in my linux environment. This patch worked well.

Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>

> ---
>   drivers/infiniband/sw/rxe/rxe_resp.c | 13 +++++++++----
>   1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
> index aa5833318372..3438f0653df0 100644
> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> @@ -841,11 +841,16 @@ static enum resp_states do_complete(struct rxe_qp *qp,
>   
>   	memset(&cqe, 0, sizeof(cqe));
>   
> -	wc->wr_id		= wqe->wr_id;
> -	wc->status		= qp->resp.status;
> -	wc->qp			= &qp->ibqp;
> +	if (qp->rcq->is_user) {
> +		uwc->status             = qp->resp.status;
> +		uwc->qp_num             = qp->ibqp.qp_num;
> +		uwc->wr_id              = wqe->wr_id;
> +	} else {
> +		wc->status              = qp->resp.status;
> +		wc->qp                  = &qp->ibqp;
> +		wc->wr_id               = wqe->wr_id;
> +	}
>   
> -	/* fields after status are not required for errors */
>   	if (wc->status == IB_WC_SUCCESS) {
>   		wc->opcode = (pkt->mask & RXE_IMMDT_MASK &&
>   				pkt->mask & RXE_WRITE_MASK) ?
Sasha Kotchubievsky Oct. 29, 2018, 8:16 a.m. UTC | #2
Hi,

I just checked IB sepc. It looks like, IB spec in relevant chapter
(11.4.2) doesn't have any requirement for qp_num to be valid in case of
error. Only wr_id and status must be valid.

I think, wr_id and status should be valid in existing implementation.
struct rxe_cqe {
        union {
                struct ib_wc            ibwc;
                struct ib_uverbs_wc     uibwc;
        };
};
In both cases wr_id and status are in the same places
So lines
wc->wr_id		= wqe->wr_id;
wc->status		= qp->resp.status
set the valid values for rxe too.

But I agree, that at the first glance, the code looks invalid and some
refactoring can be done.

Regards
Sasha

On 25/10/2018 22:40, Sagi Grimberg wrote:
> Error completions must still contain a valid wr_id and
> qp_num such that the consumer can rely on. Correctly
> fill these fields in receive error completions.
> 
> Reported-by: Walker Benjamin <benjamin.walker@intel.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> ---
>  drivers/infiniband/sw/rxe/rxe_resp.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
> index aa5833318372..3438f0653df0 100644
> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> @@ -841,11 +841,16 @@ static enum resp_states do_complete(struct rxe_qp *qp,
>  
>  	memset(&cqe, 0, sizeof(cqe));
>  
> -	wc->wr_id		= wqe->wr_id;
> -	wc->status		= qp->resp.status;
> -	wc->qp			= &qp->ibqp;
> +	if (qp->rcq->is_user) {
> +		uwc->status             = qp->resp.status;
> +		uwc->qp_num             = qp->ibqp.qp_num;
> +		uwc->wr_id              = wqe->wr_id;
> +	} else {
> +		wc->status              = qp->resp.status;
> +		wc->qp                  = &qp->ibqp;
> +		wc->wr_id               = wqe->wr_id;
> +	}
>  
> -	/* fields after status are not required for errors */
>  	if (wc->status == IB_WC_SUCCESS) {
>  		wc->opcode = (pkt->mask & RXE_IMMDT_MASK &&
>  				pkt->mask & RXE_WRITE_MASK) ?
>
Doug Ledford Nov. 6, 2018, 9:24 p.m. UTC | #3
On Mon, 2018-10-29 at 10:16 +0200, Sasha Kotchubievsky wrote:
> I think, wr_id and status should be valid in existing implementation.
> struct rxe_cqe {
>         union {
>                 struct ib_wc            ibwc;
>                 struct ib_uverbs_wc     uibwc;
>         };
> };
> In both cases wr_id and status are in the same places
> So lines
> wc->wr_id		= wqe->wr_id;

The original bug report clearly called out wr_id as being invalid in
their specific scenario.

> wc->status		= qp->resp.status
> set the valid values for rxe too.
> 
> But I agree, that at the first glance, the code looks invalid and some
> refactoring can be done.

Personally, I like Sagi's patch.  It's clear and unambiguous and you
don't need to understand that there is a union in play making things
work.  And the original code obviously got the qp setting for user space
wrong, period.  You can argue that it doesn't need to provide the qpnum
to user space, but the original code set it to the internal qp address
for both user and kernel space and needs to be fixed.

I'm inclined to take the patch, especially since it has verified test
results to go along with it.  So, applied to for-next, please post any
fixups as incremental patches.

> Regards
> Sasha
> 
> On 25/10/2018 22:40, Sagi Grimberg wrote:
> > Error completions must still contain a valid wr_id and
> > qp_num such that the consumer can rely on. Correctly
> > fill these fields in receive error completions.
> > 
> > Reported-by: Walker Benjamin <benjamin.walker@intel.com>
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> > ---
> >  drivers/infiniband/sw/rxe/rxe_resp.c | 13 +++++++++----
> >  1 file changed, 9 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
> > index aa5833318372..3438f0653df0 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> > +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> > @@ -841,11 +841,16 @@ static enum resp_states do_complete(struct rxe_qp *qp,
> >  
> >  	memset(&cqe, 0, sizeof(cqe));
> >  
> > -	wc->wr_id		= wqe->wr_id;
> > -	wc->status		= qp->resp.status;
> > -	wc->qp			= &qp->ibqp;
> > +	if (qp->rcq->is_user) {
> > +		uwc->status             = qp->resp.status;
> > +		uwc->qp_num             = qp->ibqp.qp_num;
> > +		uwc->wr_id              = wqe->wr_id;
> > +	} else {
> > +		wc->status              = qp->resp.status;
> > +		wc->qp                  = &qp->ibqp;
> > +		wc->wr_id               = wqe->wr_id;
> > +	}
> >  
> > -	/* fields after status are not required for errors */
> >  	if (wc->status == IB_WC_SUCCESS) {
> >  		wc->opcode = (pkt->mask & RXE_IMMDT_MASK &&
> >  				pkt->mask & RXE_WRITE_MASK) ?
> >
Ben Walker Nov. 7, 2018, 9:16 p.m. UTC | #4
On Tue, 2018-11-06 at 16:24 -0500, Doug Ledford wrote:
> On Mon, 2018-10-29 at 10:16 +0200, Sasha Kotchubievsky wrote:
> > I think, wr_id and status should be valid in existing implementation.
> > struct rxe_cqe {
> >         union {
> >                 struct ib_wc            ibwc;
> >                 struct ib_uverbs_wc     uibwc;
> >         };
> > };
> > In both cases wr_id and status are in the same places
> > So lines
> > wc->wr_id		= wqe->wr_id;
> 
> The original bug report clearly called out wr_id as being invalid in
> their specific scenario.

I thought it was, but only qp_num was actually invalid. My interpretation of
wr_id was dependent on an earlier use of qp_num and I didn't realize that was
the fundamental problem.

> 
> > wc->status		= qp->resp.status
> > set the valid values for rxe too.
> > 
> > But I agree, that at the first glance, the code looks invalid and some
> > refactoring can be done.
> 
> Personally, I like Sagi's patch.  It's clear and unambiguous and you
> don't need to understand that there is a union in play making things
> work.  And the original code obviously got the qp setting for user space
> wrong, period.  You can argue that it doesn't need to provide the qpnum
> to user space, but the original code set it to the internal qp address
> for both user and kernel space and needs to be fixed.

The man page for ibv_poll_cq says qp_num should be valid. I understand the verbs
specification may indicate otherwise, but given what the man page says and how
easy it is to make qp_num valid, I am very glad to see that this is the approach
being taken. That said, I've since reworked my code to avoid needing the qp_num
on the error path and everything is working both with and without this patch.

Thanks for the help everyone.

> 
> I'm inclined to take the patch, especially since it has verified test
> results to go along with it.  So, applied to for-next, please post any
> fixups as incremental patches.
> 
> > Regards
> > Sasha
> > 
> > On 25/10/2018 22:40, Sagi Grimberg wrote:
> > > Error completions must still contain a valid wr_id and
> > > qp_num such that the consumer can rely on. Correctly
> > > fill these fields in receive error completions.
> > > 
> > > Reported-by: Walker Benjamin <benjamin.walker@intel.com>
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> > > ---
> > >  drivers/infiniband/sw/rxe/rxe_resp.c | 13 +++++++++----
> > >  1 file changed, 9 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c
> > > b/drivers/infiniband/sw/rxe/rxe_resp.c
> > > index aa5833318372..3438f0653df0 100644
> > > --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> > > +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> > > @@ -841,11 +841,16 @@ static enum resp_states do_complete(struct rxe_qp
> > > *qp,
> > >  
> > >  	memset(&cqe, 0, sizeof(cqe));
> > >  
> > > -	wc->wr_id		= wqe->wr_id;
> > > -	wc->status		= qp->resp.status;
> > > -	wc->qp			= &qp->ibqp;
> > > +	if (qp->rcq->is_user) {
> > > +		uwc->status             = qp->resp.status;
> > > +		uwc->qp_num             = qp->ibqp.qp_num;
> > > +		uwc->wr_id              = wqe->wr_id;
> > > +	} else {
> > > +		wc->status              = qp->resp.status;
> > > +		wc->qp                  = &qp->ibqp;
> > > +		wc->wr_id               = wqe->wr_id;
> > > +	}
> > >  
> > > -	/* fields after status are not required for errors */
> > >  	if (wc->status == IB_WC_SUCCESS) {
> > >  		wc->opcode = (pkt->mask & RXE_IMMDT_MASK &&
> > >  				pkt->mask & RXE_WRITE_MASK) ?
> > > 
> 
>
diff mbox series

Patch

diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index aa5833318372..3438f0653df0 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -841,11 +841,16 @@  static enum resp_states do_complete(struct rxe_qp *qp,
 
 	memset(&cqe, 0, sizeof(cqe));
 
-	wc->wr_id		= wqe->wr_id;
-	wc->status		= qp->resp.status;
-	wc->qp			= &qp->ibqp;
+	if (qp->rcq->is_user) {
+		uwc->status             = qp->resp.status;
+		uwc->qp_num             = qp->ibqp.qp_num;
+		uwc->wr_id              = wqe->wr_id;
+	} else {
+		wc->status              = qp->resp.status;
+		wc->qp                  = &qp->ibqp;
+		wc->wr_id               = wqe->wr_id;
+	}
 
-	/* fields after status are not required for errors */
 	if (wc->status == IB_WC_SUCCESS) {
 		wc->opcode = (pkt->mask & RXE_IMMDT_MASK &&
 				pkt->mask & RXE_WRITE_MASK) ?