Message ID | 20220519155810.28803-1-rpearsonhpe@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | Jason Gunthorpe |
Headers | show |
Series | [for-rdma-core,v2] pyverbs: Fix wrong rkey in test_qp_ex_rc_bind_mw | expand |
On Thu, May 19, 2022 at 10:58:11AM -0500, Bob Pearson wrote: > The current test_qp_ex_rc_bind_mw in tests/test_qpex.py uses an incorrect > value for the new_rkey based on the old mr.rkey. This patch fixes that > behavior by basing the new rkey on the old mw.rkey instead. > > Before this patch the test will fail for the rxe driver about 1 in 256 > tries since randomly that is the freguency of new_rkeys which have the > same 8 bit key portion as the current mw which is not allowed. With > this patch those errors do not occur. While this patch seems correct, I don't understand these remarks. IBA says: After the bind operation completes, the R_Key must consist of the 24 bit index associated with the Type 2 Memory Window and the 8 bit key supplied by the Con- sumer in the Post Send Bind Memory Window Work Request. Meaning the bind should only be processing the lower variant bits in the first place, and there should be no condition where the bind could fail since all varient bits are always legal. Bind does not allow changing the upper fixed bit - only allocation can change those. So if rxe kernel side is changing the upper bits it is also buggy. IMHO it would be appropriate to fail the operation if given rkey has unmatched upper bits. Jason
On 7/4/22 08:24, Jason Gunthorpe wrote: > On Thu, May 19, 2022 at 10:58:11AM -0500, Bob Pearson wrote: >> The current test_qp_ex_rc_bind_mw in tests/test_qpex.py uses an incorrect >> value for the new_rkey based on the old mr.rkey. This patch fixes that >> behavior by basing the new rkey on the old mw.rkey instead. >> >> Before this patch the test will fail for the rxe driver about 1 in 256 >> tries since randomly that is the freguency of new_rkeys which have the >> same 8 bit key portion as the current mw which is not allowed. With >> this patch those errors do not occur. > > While this patch seems correct, I don't understand these remarks. > > IBA says: > > After the bind operation completes, the R_Key must consist of the 24 > bit index associated with the Type 2 Memory Window and the 8 bit key > supplied by the Con- sumer in the Post Send Bind Memory Window Work > Request. > > Meaning the bind should only be processing the lower variant bits in > the first place, and there should be no condition where the bind could > fail since all varient bits are always legal. > > Bind does not allow changing the upper fixed bit - only allocation can > change those. So if rxe kernel side is changing the upper bits it is > also buggy. IMHO it would be appropriate to fail the operation if > given rkey has unmatched upper bits. > > Jason The root cause of this failure was that the script generated the new mw rkey from the mr rkey which had nothing to do with it. The current mw implementation enforces a rule that the new mw rkey have a different key portion than the old mw rkey. Since the mr rkey is changing independently from the mw rkeys this rule will randomly cause an error approx 1 out of 256 times. You are questioning the validity of this rule. According to the IBA there are two types of mws type 1 and 2. Type 1 mws are bound through a user space api ibv_bind_mw() while type 2 mws are bound with a sq operation. In the kernel there is no implementation of the type 1 bind API so it is implemented by mapping it into a sq operation. The sq operation has 3 key parameters, the mr lkey, the original mw rkey and the new key portion of the mw rkey. Only the low order 8 bits of this parameter are used. For type 1 mws the operation returns a new rkey key portion which is guaranteed to be different than the original one. According to the man page the user space app must save the old rkey and if the operation completes with a failure it must fix up the key to the old value. Checking the key is different in this case is just paranoia since the rdma-core library is supposed to change it. For type 2 mws, as you say, the key portion is entirely up to the consumer so any value is valid. The current test that the key portion is different can be dropped. There is a slight problem though which is fixed by the test even though it is not proper. C10-68 requires that as soon as a bind operation is complete that no operations referring to the previous binding shall succeed without error. The problem is how do you know if the rkey is the same for the new and old bindings are the same. The sane way to do this is to make sure they are different. This is how all the examples of use cases I have seen proceed. Bob
diff --git a/tests/test_qpex.py b/tests/test_qpex.py index 8f3f338e..a4c99910 100644 --- a/tests/test_qpex.py +++ b/tests/test_qpex.py @@ -300,7 +300,7 @@ class QpExTestCase(RDMATestCase): if ex.error_code == errno.EOPNOTSUPP: raise unittest.SkipTest('Memory Window allocation is not supported') raise ex - new_key = inc_rkey(server.mr.rkey) + new_key = inc_rkey(mw.rkey) server.qp.wr_bind_mw(mw, new_key, bind_info) server.qp.wr_complete() u.poll_cq(server.cq)
The current test_qp_ex_rc_bind_mw in tests/test_qpex.py uses an incorrect value for the new_rkey based on the old mr.rkey. This patch fixes that behavior by basing the new rkey on the old mw.rkey instead. Before this patch the test will fail for the rxe driver about 1 in 256 tries since randomly that is the freguency of new_rkeys which have the same 8 bit key portion as the current mw which is not allowed. With this patch those errors do not occur. Fixes: 9fca2824b5ec ("tests: Retrieve tests that generates mlx5 CQE errors") Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> --- v2 Added a Fixes: line per Edward. It is the latest change to the test. Changed the subject line from RDMA/core: to pyverbs: Changed for-next to for-rdma-core per Zhu tests/test_qpex.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)