diff mbox series

[for-rdma-core,v2] pyverbs: Fix wrong rkey in test_qp_ex_rc_bind_mw

Message ID 20220519155810.28803-1-rpearsonhpe@gmail.com (mailing list archive)
State Accepted
Delegated to: Jason Gunthorpe
Headers show
Series [for-rdma-core,v2] pyverbs: Fix wrong rkey in test_qp_ex_rc_bind_mw | expand

Commit Message

Bob Pearson May 19, 2022, 3:58 p.m. UTC
The current test_qp_ex_rc_bind_mw in tests/test_qpex.py uses an incorrect
value for the new_rkey based on the old mr.rkey. This patch fixes that
behavior by basing the new rkey on the old mw.rkey instead.

Before this patch the test will fail for the rxe driver about 1 in 256
tries since randomly that is the freguency of new_rkeys which have the
same 8 bit key portion as the current mw which is not allowed. With
this patch those errors do not occur.

Fixes: 9fca2824b5ec ("tests: Retrieve tests that generates mlx5 CQE errors")
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
v2
  Added a Fixes: line per Edward. It is the latest change to the test.
  Changed the subject line from RDMA/core: to pyverbs:
  Changed for-next to for-rdma-core per Zhu

 tests/test_qpex.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jason Gunthorpe July 4, 2022, 1:24 p.m. UTC | #1
On Thu, May 19, 2022 at 10:58:11AM -0500, Bob Pearson wrote:
> The current test_qp_ex_rc_bind_mw in tests/test_qpex.py uses an incorrect
> value for the new_rkey based on the old mr.rkey. This patch fixes that
> behavior by basing the new rkey on the old mw.rkey instead.
> 
> Before this patch the test will fail for the rxe driver about 1 in 256
> tries since randomly that is the freguency of new_rkeys which have the
> same 8 bit key portion as the current mw which is not allowed. With
> this patch those errors do not occur.

While this patch seems correct, I don't understand these remarks.

IBA says:

 After the bind operation completes, the R_Key must consist of the 24
 bit index associated with the Type 2 Memory Window and the 8 bit key
 supplied by the Con- sumer in the Post Send Bind Memory Window Work
 Request.

Meaning the bind should only be processing the lower variant bits in
the first place, and there should be no condition where the bind could
fail since all varient bits are always legal.

Bind does not allow changing the upper fixed bit - only allocation can
change those. So if rxe kernel side is changing the upper bits it is
also buggy. IMHO it would be appropriate to fail the operation if
given rkey has unmatched upper bits.

Jason
Bob Pearson July 14, 2022, 8:20 p.m. UTC | #2
On 7/4/22 08:24, Jason Gunthorpe wrote:
> On Thu, May 19, 2022 at 10:58:11AM -0500, Bob Pearson wrote:
>> The current test_qp_ex_rc_bind_mw in tests/test_qpex.py uses an incorrect
>> value for the new_rkey based on the old mr.rkey. This patch fixes that
>> behavior by basing the new rkey on the old mw.rkey instead.
>>
>> Before this patch the test will fail for the rxe driver about 1 in 256
>> tries since randomly that is the freguency of new_rkeys which have the
>> same 8 bit key portion as the current mw which is not allowed. With
>> this patch those errors do not occur.
> 
> While this patch seems correct, I don't understand these remarks.
> 
> IBA says:
> 
>  After the bind operation completes, the R_Key must consist of the 24
>  bit index associated with the Type 2 Memory Window and the 8 bit key
>  supplied by the Con- sumer in the Post Send Bind Memory Window Work
>  Request.
> 
> Meaning the bind should only be processing the lower variant bits in
> the first place, and there should be no condition where the bind could
> fail since all varient bits are always legal.
> 
> Bind does not allow changing the upper fixed bit - only allocation can
> change those. So if rxe kernel side is changing the upper bits it is
> also buggy. IMHO it would be appropriate to fail the operation if
> given rkey has unmatched upper bits.
> 
> Jason

The root cause of this failure was that the script generated the new mw rkey
from the mr rkey which had nothing to do with it. The current mw implementation
enforces a rule that the new mw rkey have a different key portion than the old
mw rkey. Since the mr rkey is changing independently from the mw rkeys this rule
will randomly cause an error approx 1 out of 256 times. You are questioning the
validity of this rule.

According to the IBA there are two types of mws type 1 and 2. Type 1 mws are bound
through a user space api ibv_bind_mw() while type 2 mws are bound with a sq operation.
In the kernel there is no implementation of the type 1 bind API so it is implemented
by mapping it into a sq operation.

The sq operation has 3 key parameters, the mr lkey, the original mw rkey and the new
key portion of the mw rkey. Only the low order 8 bits of this parameter are used.

For type 1 mws the operation returns a new rkey key portion which is guaranteed to be
different than the original one. According to the man page the user space app must
save the old rkey and if the operation completes with a failure it must fix up the
key to the old value. Checking the key is different in this case is just paranoia since
the rdma-core library is supposed to change it.

For type 2 mws, as you say, the key portion is entirely up to the consumer so any value
is valid. The current test that the key portion is different can be dropped. There is
a slight problem though which is fixed by the test even though it is not proper. C10-68
requires that as soon as a bind operation is complete that no operations referring to
the previous binding shall succeed without error. The problem is how do you know
if the rkey is the same for the new and old bindings are the same. The sane way to do
this is to make sure they are different. This is how all the examples of use cases I
have seen proceed.

Bob
diff mbox series

Patch

diff --git a/tests/test_qpex.py b/tests/test_qpex.py
index 8f3f338e..a4c99910 100644
--- a/tests/test_qpex.py
+++ b/tests/test_qpex.py
@@ -300,7 +300,7 @@  class QpExTestCase(RDMATestCase):
             if ex.error_code == errno.EOPNOTSUPP:
                 raise unittest.SkipTest('Memory Window allocation is not supported')
             raise ex
-        new_key = inc_rkey(server.mr.rkey)
+        new_key = inc_rkey(mw.rkey)
         server.qp.wr_bind_mw(mw, new_key, bind_info)
         server.qp.wr_complete()
         u.poll_cq(server.cq)