diff mbox

[libmlx5] fix err return values to match ibv_post_send expectations

Message ID 1470418439-59245-1-git-send-email-jarod@redhat.com (mailing list archive)
State Not Applicable
Headers show

Commit Message

Jarod Wilson Aug. 5, 2016, 5:33 p.m. UTC
The man page for ibv_post_send says:

   RETURN VALUE

       ibv_post_send() returns 0 on success, or the value of errno on failure
       (which indicates the failure reason).

QEMU looks for the return value, and in the ENOMEM case, waits and
retries, but with mlx5, it ends up dropping requests and hanging, because
of the unexpected -1 return instead of ENOMEM.

The fix is simple: set err = E<whatever> instead of -1, and eliminate use
of errno = in _mlx5_post_send, just have mlx5_post_send assign the return
from _mlx5_post_send in errno instead. This fix has been confirmed to
resolves the issues seen with QEMU.

Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Tested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 src/qp.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

Comments

Jason Gunthorpe Aug. 5, 2016, 6:01 p.m. UTC | #1
On Fri, Aug 05, 2016 at 01:33:59PM -0400, Jarod Wilson wrote:
> -	return _mlx5_post_send(ibqp, wr, bad_wr);
> +	errno = _mlx5_post_send(ibqp, wr, bad_wr);
> +	return errno;

Why still set errno? It was wrong in the first place..

Likely every use of errno in this provider should be reviewed.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarod Wilson Aug. 6, 2016, 6:48 p.m. UTC | #2
On Fri, Aug 05, 2016 at 12:01:07PM -0600, Jason Gunthorpe wrote:
> On Fri, Aug 05, 2016 at 01:33:59PM -0400, Jarod Wilson wrote:
> > -	return _mlx5_post_send(ibqp, wr, bad_wr);
> > +	errno = _mlx5_post_send(ibqp, wr, bad_wr);
> > +	return errno;
> 
> Why still set errno? It was wrong in the first place..
> 
> Likely every use of errno in this provider should be reviewed.

Wasn't sure if something else might actually consume errno elsewhere,
since it's a global, and the other code path in that function set errno.
Could certainly just stick with returning the _mlx5_post_send() result
directly.
Jason Gunthorpe Aug. 8, 2016, 4:30 p.m. UTC | #3
On Sat, Aug 06, 2016 at 02:48:23PM -0400, Jarod Wilson wrote:
> On Fri, Aug 05, 2016 at 12:01:07PM -0600, Jason Gunthorpe wrote:
> > On Fri, Aug 05, 2016 at 01:33:59PM -0400, Jarod Wilson wrote:
> > > -	return _mlx5_post_send(ibqp, wr, bad_wr);
> > > +	errno = _mlx5_post_send(ibqp, wr, bad_wr);
> > > +	return errno;
> > 
> > Why still set errno? It was wrong in the first place..
> > 
> > Likely every use of errno in this provider should be reviewed.
> 
> Wasn't sure if something else might actually consume errno elsewhere,
> since it's a global, and the other code path in that function set errno.
> Could certainly just stick with returning the _mlx5_post_send() result
> directly.

If the man page doesn't document the function destroying errno, then
it should preserve it. Do not just randomly set errno.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarod Wilson Aug. 8, 2016, 4:52 p.m. UTC | #4
On Mon, Aug 08, 2016 at 10:30:18AM -0600, Jason Gunthorpe wrote:
> On Sat, Aug 06, 2016 at 02:48:23PM -0400, Jarod Wilson wrote:
> > On Fri, Aug 05, 2016 at 12:01:07PM -0600, Jason Gunthorpe wrote:
> > > On Fri, Aug 05, 2016 at 01:33:59PM -0400, Jarod Wilson wrote:
> > > > -	return _mlx5_post_send(ibqp, wr, bad_wr);
> > > > +	errno = _mlx5_post_send(ibqp, wr, bad_wr);
> > > > +	return errno;
> > > 
> > > Why still set errno? It was wrong in the first place..
> > > 
> > > Likely every use of errno in this provider should be reviewed.
> > 
> > Wasn't sure if something else might actually consume errno elsewhere,
> > since it's a global, and the other code path in that function set errno.
> > Could certainly just stick with returning the _mlx5_post_send() result
> > directly.
> 
> If the man page doesn't document the function destroying errno, then
> it should preserve it. Do not just randomly set errno.

$ man ibv_post_send
...
RETURN VALUE
       ibv_post_send() returns 0 on success, or the value of errno on
       failure (which indicates the failure reason).
...

This is a little bit terse, but looks to me like it means to say do set
errno in ibv_post_send. I really am not terribly familiar with this code
though, so I have no idea for sure if my interpretation is correct. Looks
like libmlx4 does NOT twiddle errno though. I can certainly update the
patch if consensus is that errno shouldn't be touched. Looks like
mlx5_post_send() will need some additional work as well if that's the case
though.
Jason Gunthorpe Aug. 8, 2016, 5:33 p.m. UTC | #5
On Mon, Aug 08, 2016 at 12:52:13PM -0400, Jarod Wilson wrote:
> > If the man page doesn't document the function destroying errno, then
> > it should preserve it. Do not just randomly set errno.
> 
> $ man ibv_post_send
> ...
> RETURN VALUE
>        ibv_post_send() returns 0 on success, or the value of errno on
>        failure (which indicates the failure reason).
> ...
> 
> This is a little bit terse, but looks to me like it means to say do set
> errno in ibv_post_send. I really am not terribly familiar with this
> code

Yes, our man pages are a little baroque.

The phrase 'returns the value of errno' means it returns a E* value,
not errno itself and leaves errno alone.

Using the phrase 'it returns an error number' would be clearer and
more consistent with glibc documentation.

> patch if consensus is that errno shouldn't be touched. Looks like
> mlx5_post_send() will need some additional work as well if that's the case
> though.

Most likely.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/src/qp.c b/src/qp.c
index 51e1176..2ad3ac0 100644
--- a/src/qp.c
+++ b/src/qp.c
@@ -590,8 +590,7 @@  static inline int _mlx5_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
 		if (unlikely(wr->opcode < 0 ||
 		    wr->opcode >= sizeof mlx5_ib_opcode / sizeof mlx5_ib_opcode[0])) {
 			mlx5_dbg(fp, MLX5_DBG_QP_SEND, "bad opcode %d\n", wr->opcode);
-			errno = EINVAL;
-			err = -1;
+			err = EINVAL;
 			*bad_wr = wr;
 			goto out;
 		}
@@ -599,8 +598,7 @@  static inline int _mlx5_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
 		if (unlikely(mlx5_wq_overflow(&qp->sq, nreq,
 					      to_mcq(qp->ibv_qp->send_cq)))) {
 			mlx5_dbg(fp, MLX5_DBG_QP_SEND, "work queue overflow\n");
-			errno = ENOMEM;
-			err = -1;
+			err = ENOMEM;
 			*bad_wr = wr;
 			goto out;
 		}
@@ -608,8 +606,7 @@  static inline int _mlx5_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
 		if (unlikely(wr->num_sge > qp->sq.max_gs)) {
 			mlx5_dbg(fp, MLX5_DBG_QP_SEND, "max gs exceeded %d (max = %d)\n",
 				 wr->num_sge, qp->sq.max_gs);
-			errno = ENOMEM;
-			err = -1;
+			err = ENOMEM;
 			*bad_wr = wr;
 			goto out;
 		}
@@ -918,7 +915,8 @@  int mlx5_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
 	}
 #endif
 
-	return _mlx5_post_send(ibqp, wr, bad_wr);
+	errno = _mlx5_post_send(ibqp, wr, bad_wr);
+	return errno;
 }
 
 int mlx5_bind_mw(struct ibv_qp *qp, struct ibv_mw *mw,